Model estimates with balanced repeated replication (BRR) based standard errors
brrmodel varlist [weight] [if exp] [in range] [, brrweight(varlist) fay(#) deff deft cmd(command name) level(#) dof(#) or command_options ]
pweights are allowed; see help weights. Analysis weights are required, and must be specified with the command, or by the svyset pweight command.
Description
brrmodel estimates regression-type models for complex survey data. Standard errors are calculated using a series of user-supplied replication weights, by the balanced repeated replication (BRR) method. This is an alternate method to the Taylor series linearization methods used by Stata's svy-based commands.
It will run OLS regression, logit/probit, ordered logit/probit, multinomial logit, and Poisson regression models.
Options
brrweight() specifies the list of variables that contain the replicate weights for the dataset. The standard errors for the model are based on the variation in the estimates generated across the various weights.
A set of brrweights is required for the analysis. Once the brrweights are specified, they are stored as a characteristic of the dataset and need not be respecified in subsequent commands.
fay() specifies the k value that should be used for weighting the estimates, based on Fay's method. The default is zero, meaning that simple averaging will be used. As with the replicate weights, the value for fay() is stored as a characteristic of the dataset once it is specified, and need not be re-specified in subsequent commands.
dof() specifies the degrees of freedom for model fit and t-statistics. The default is to use the number of replications.
deff and deft request that design effects deff and deft be displayed with the model estimates. See [R] svymean for details.
cmd() specifies the model estimation command. Valid options are regress, logit, probit, logistic, oprobit, ologit, mlogit, and poisson. The default is OLS regression.
or specifies that coefficients from a logit model should be displayed as odds-ratios. See logit.
Example command and output
. brrmodel income region sex asset [pw=wgt] , brrw(bwgt*)
OLS estimates with BRR-based standard errors
Analysis weight:      wgt                 Number of obs      =    1904
Replicate weights:    bwgt*               Population size    =   10738
Number of replicates: 20                  Degrees of freedom =      20
k (Fay's method):     0.000               F(   3,     18)    =   86.40
                                          Prob > F           =  0.0000
                                          R-squared          =  0.0057
------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |   .6214526    .116294     5.34   0.000     .3788676    .8640376
         sex |  -1.121041   .2671536    -4.20   0.000    -1.678313   -.5637684
       asset |   .1274654   .0077771    16.39   0.000     .1112427    .1436881
       _cons |   37.61598   .5194737    72.41   0.000     36.53238    38.69959
------------------------------------------------------------------------------
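The t statistic and confidence interval in the region row can be checked by hand. The following Python sketch is illustrative only and is not part of brrmodel. It assumes the interval is computed as coefficient plus or minus t times the standard error, with t taken from Student's t distribution on the reported 20 degrees of freedom. The critical value 2.086 is hard-coded from standard tables, and the function name is hypothetical.

```python
# Assumption: CI = coef +/- t_crit * se, with t_crit from Student's t on the
# reported degrees of freedom (20 in the example output).
T_CRIT_95_DF20 = 2.086  # two-sided 95% critical value for 20 df, from tables


def conf_interval(coef, se, t_crit):
    """Return (lower, upper) bounds of a symmetric t-based interval."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


# Values copied from the "region" row of the example output.
coef, se = 0.6214526, 0.116294

t_stat = coef / se
lower, upper = conf_interval(coef, se, T_CRIT_95_DF20)

print(f"t = {t_stat:.2f}")
print(f"95% CI = [{lower:.4f}, {upper:.4f}]")
```

Under these assumptions the result agrees with the displayed t statistic and interval to about four decimal places, which is consistent with the degrees of freedom defaulting to the number of replicates.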
Saved Results
brrmodel is an estimation command, so it saves the model estimates and the BRR-based (co)variance matrix in e(b) and e(V), and creates e(sample) to reflect the estimation sample. It also stores design effects in e(deff) and e(deft), and the simple-random-sampling-without-replacement (co)variance matrix in e(V_srs).
svytest will estimate adjusted Wald linear hypothesis tests after BRR model estimation. (brrmodel specifies the estimation command as "svybrrmodel" in order to allow svytest to function.)
Scalars e(N_strata) and e(N_psu) are set in order to allow svytest to operate correctly. N_strata is set to the degrees of freedom for the model (by default the number of replicates), and N_psu is set to twice the degrees of freedom.
predict after an estimation should work as documented for the relevant command. Warning: this has not been tested extensively.
Methods and formulae
Point estimates are calculated using aweights, and are identical to those produced by Stata's svy-based commands. The variance matrix of the estimates is formed by calculating
V = c \sum_{i=1}^{G} (B - B(i)) (B - B(i))'
where B is the estimated coefficient vector based on the full sample weights, B(i) is the estimated coefficient vector using the i'th set of replicate weights, G is the number of replicates, and c is a constant defined as:
1 / G for standard BRR (i.e. fay==0), or
1 / (G*(1-k)^2) for Fay's method.
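As a rough numerical illustration of the formula above, the following Python sketch accumulates the outer products of the deviations B - B(i) and scales them by c. It is not the brrmodel implementation. The function name, argument names, and the coefficient values are made up for the example, and the restriction 0 <= k < 1 is an assumption added only to keep c finite.

```python
def brr_variance(b_full, b_reps, fay_k=0.0):
    """Return V = c * sum_i (B - B(i)) (B - B(i))' as a nested list.

    b_full : coefficient vector from the full-sample weights (B)
    b_reps : list of coefficient vectors, one per replicate weight (B(i))
    fay_k  : Fay's k; 0 gives standard BRR
    """
    g = len(b_reps)
    if g == 0:
        raise ValueError("at least one replicate estimate is required")
    if not 0.0 <= fay_k < 1.0:  # assumption: keeps c finite
        raise ValueError("fay_k must satisfy 0 <= k < 1")

    c = 1.0 / (g * (1.0 - fay_k) ** 2)  # reduces to 1/G when k == 0
    p = len(b_full)
    v = [[0.0] * p for _ in range(p)]
    for b_i in b_reps:
        d = [bf - bi for bf, bi in zip(b_full, b_i)]  # B - B(i)
        for r in range(p):
            for s in range(p):
                v[r][s] += c * d[r] * d[s]
    return v


# Made-up estimates: two coefficients, four replicates.
b_full = [1.0, 2.0]
b_reps = [[1.5, 2.0], [0.5, 2.5], [1.0, 1.5], [1.0, 2.0]]

v_brr = brr_variance(b_full, b_reps)             # c = 1/4
v_fay = brr_variance(b_full, b_reps, fay_k=0.5)  # c = 1/(4 * 0.25) = 1
std_errs = [v_brr[j][j] ** 0.5 for j in range(len(b_full))]
```

The standard errors are the square roots of the diagonal of V. With k = 0.5 the constant c is four times larger than under standard BRR, so for the same set of replicate estimates every entry of V is scaled by four.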
Acknowledgements
I would like to thank Bobby Gutierrez at StataCorp for advice on implementation of BRR.
Author
Nick Winter Cornell University nw53@cornell.edu