------------------------------------------------------------------------------- help for brrmean, brrtotal, brrratio -------------------------------------------------------------------------------

Estimate means, totals, ratios, and proportions for survey data, with balanced repeated replication (BRR) based standard errors

brrmean varlist [weight] [if exp] [in range] [, common_options]

brrtotal varlist [weight] [if exp] [in range] [, common_options]

brrratio varname [/] varname [varname [/] varname ...] [weight] [if exp] [in range] [, common_options]

where common_options are

brrweight(varlist) fay(#) dof(#)

by(varlist) [complete|available] nolabel level(#) ci deff deft meff meft obs size

brrmean, brrratio, and brrtotal typed without arguments redisplay previous results. Any of the following options can be used when redisplaying results:

level(#) ci deff deft meff meft obs size

All of these commands allow pweights; see help weights.

Description

brrmean, brrtotal, and brrratio produce estimates of population means, totals, ratios, and proportions. Standard errors are calculated from a series of user-supplied replicate weights, using the balanced repeated replication (BRR) method. This is an alternative to the Taylor series linearization methods used by Stata's svy-based commands.

Estimates for multiple subpopulations can be obtained using the by() option. An if qualifier gives estimates for a single subpopulation. (Note that with the BRR method, use of if or in produces correct estimates for the relevant subpopulation.)

svytest will operate after these commands, as it does after their svy-based equivalents.

Options

brrweight() specifies the list of variables that contain the replicate weights for the dataset. The standard errors for the model are based on the variation in the estimates generated across the various weights.

A set of brrweights is required for the analysis. Once the brrweights are specified, they are stored as a characteristic of the dataset and need not be respecified in subsequent commands.

fay() specifies the k value that should be used for weighting the estimates, based on Fay's method. The default is zero, meaning that simple averaging will be used. As with the replicate weights, the value for fay() is stored as a characteristic of the dataset once it is specified, and need not be re-specified in subsequent commands.
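As a hedged illustration of fay(), the lines below reuse the variable names from the Examples section (birthwgt, wgt, bw*). The value 0.5 is only an assumption made for the sketch; the appropriate k is whatever was used when the replicate weights were constructed.

```stata
* Assumption: the replicate weights bw* were built with Fay's k = 0.5
brrmean birthwgt [pw=wgt], brrweight(bw*) fay(.5)

* The replicate weights and k are now stored with the dataset,
* so a later command can omit both options
brrtotal birthwgt [pw=wgt]
```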

dof() specifies the degrees of freedom for model fit and t-statistics. The default is to use the number of replications.

by(varlist) specifies that estimates be computed for the subpopulations defined by different values of the variable(s) in the varlist.

nolabel can only be specified when by() is specified. nolabel requests that numeric values rather than value labels be used to label output for subpopulations. By default, value labels are used.

[complete|available] specifies how missing values are to be handled. complete specifies that only observations with complete data should be used. available specifies that all available nonmissing values be used for each estimate.

If neither complete nor available is specified, available is the default when there are missing values and there are two or more variables in the varlist (or four or more for brrratio). complete must be specified to compute the covariance or to use svytest after running the command; see help svytest.
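A minimal sketch of the two missing-value modes follows. birthwgt, wgt, and bw* come from the Examples section; age is a hypothetical second variable introduced only for illustration.

```stata
* available (the default with two or more variables and missing values):
* each mean uses every nonmissing value of its own variable
brrmean birthwgt age [pw=wgt], brrweight(bw*) available

* complete: only observations nonmissing on both birthwgt and age are used,
* which is required before computing covariances or running svytest
brrmean birthwgt age [pw=wgt], complete
```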

level(#) specifies the confidence level (i.e., nominal coverage rate), in percent, for confidence intervals; see help level.

ci requests that confidence intervals be displayed. If no display options are specified then, by default, confidence intervals are displayed.

deff requests that the design-effect measure deff be displayed. If no display options are specified then, by default, deff is displayed.

deft requests that the design-effect measure deft be displayed. See [R] svymean for a discussion on deff and deft.

meff requests that the meff measure of misspecification effects be displayed.

meft requests that the meft measure of misspecification effects be displayed. See [R] svymean for a discussion of meff and meft.

obs requests that the number of observations used for the computation of the estimate be displayed for each row of estimates.

size requests that the estimate of the (sub)population size be displayed for each row of estimates. The (sub)population size estimate equals the sum of the weights for those observations used for the mean/total/ratio estimate.
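Because the display options also work when redisplaying results, one possible workflow is to estimate once and then request different output without re-estimating. The sketch below uses the variable names from the Examples section and an arbitrary 90% level.

```stata
* Estimate subpopulation means once
brrmean birthwgt [pw=wgt], brrweight(bw*) by(race)

* Redisplay the stored results with 90% confidence intervals,
* design effects, observation counts, and estimated subpopulation sizes
brrmean, level(90) ci deff obs size
```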

Examples

. brrmean birthwgt [pw=wgt], brrw(bw*)

. brrmean birthwgt, by(race)

. brrmean birthwgt if race==1

. brrratio hdresult/tcresult

Methods and formulae

Point estimates are calculated using aweights, and are identical to those produced by Stata's svy-based commands. The variance matrix of the estimates is formed by calculating

V = c * SUM_{i=1}^{G} [ (B - B(i))(B - B(i))' ]

where B is the estimated coefficient vector based on the full sample weights, B(i) is the estimated coefficient vector using the i'th set of replicate weights, G is the number of replicates, and c is a constant defined as:

1 / G for standard BRR (i.e. fay==0), or

1 / (G*(1-k)^2) for Fay's method.
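For a single mean, the formula can be sketched by hand as follows. This is an illustration under stated assumptions, not the code the commands actually run: it reuses birthwgt, wgt, and bw* from the Examples section, treats B as a scalar, and sets k to 0 (standard BRR). The local macro names are arbitrary.

```stata
* Full-sample point estimate B, computed with aweights
quietly summarize birthwgt [aw=wgt]
local b_full = r(mean)

* Accumulate the squared deviations (B - B(i))^2 over the replicates
local ss = 0
local G = 0
foreach w of varlist bw* {
    quietly summarize birthwgt [aw=`w']
    local ss = `ss' + (`b_full' - r(mean))^2
    local G = `G' + 1
}

* c = 1 / (G*(1-k)^2); with k = 0 this reduces to 1/G
local k = 0
local V = `ss' / (`G' * (1 - `k')^2)
display "BRR variance = " `V' "   SE = " sqrt(`V')
```

Run as a do-file so that the local macros persist across lines. Setting k to the Fay coefficient used when the replicate weights were built gives the Fay's-method variance.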

Acknowledgements

These commands consist largely of the ado-file code from official Stata's svy_x command, which underlies svymean, svytotal, and svyratio. They are modified to calculate (co)variances differently. I would like to thank Bobby Gutierrez at StataCorp for advice on implementation of BRR.

Author

Nick Winter
Cornell University
nw53@cornell.edu