help reweight -------------------------------------------------------------------------------

Title

reweight -- Reweights survey variables using external aggregate totals

Syntax

reweight varlist [if] [in] , sweight(varname) nweight(newvar) total(matrix) dfunction(name) [svalues(matrix) tolerance(#) niter(#) ntries(#) upbound(#) lowbound(#) mlowbounds(#) mupbounds(#)]

Description

reweight calibrates survey data to external aggregate totals. The methodology closely follows Deville and Sarndal (1992), and the recursive algorithm that implements the calibration is from Creedy (2003).

Options for reweight

sweight(varname) is required and specifies a numeric variable for the original survey weights.

nweight(newvar) is required and specifies the name of the new variable that will contain the calibrated weights.

total(matrix) is required and specifies a Stata 1xK matrix containing the user-provided totals, entered in the same order as the K calibrating variables in varlist.

dfunction(name) specifies the distance function used to compute the new weights. The allowed distance functions are the chi-squared (type "chi2"), the Deville and Sarndal (type "ds"), and three more, which we define as type-A (type "a"), type-B (type "b") and type-C (type "c") distance functions. See Pacifico (2010) for details.

svalues(matrix) specifies user-provided starting values. Starting values must be put in a Stata 1xK matrix following the same order as the variables in varlist. The default is a vector with the Lagrange multipliers obtained from the chi-squared distance function.

tolerance(#) specifies the tolerance level used to assess convergence. The default is tolerance(0.000001). reweight employs a double criterion to assess convergence. The first is that the difference between the estimated and the external totals must be lower than the tolerance level. The second is that, from one iteration to the next, the percentage variation in the estimated distance between the new and the original weights must be lower than the tolerance level for each observation in the sample.

niter(#) specifies the maximum number of iterations. The default is niter(50).

ntries(#) specifies the maximum number of "tries" when the algorithm does not achieve convergence within the maximum number of iterations. This option can be useful when the external totals differ significantly from the survey totals. In such situations the algorithm automatically restarts with new random starting values, up to # times. The default is ntries(0).
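
As a sketch of how the convergence options combine, the do-file fragment below tightens the tolerance, raises the iteration cap and allows up to five restarts. It reuses the data set and the totals matrix t from the Example section. The variable name wtight and the specific option values are illustrative assumptions, not recommendations.

```stata
* Illustrative sketch: option values are chosen for demonstration only
use http://fmwww.bc.edu/RePEc/bocode/r/reweight, clear
matrix t=(50 \ 20 \ 230 \ 35)

* Type-A distance function with a stricter tolerance, more iterations,
* and up to 5 automatic restarts if convergence is not achieved
reweight x1 x2 x3 x4, sweight(weight) nweight(wtight) total(t) dfunction(a) ///
    tolerance(0.00000001) niter(200) ntries(5)
```

The /// line continuation works in do-files. When typing interactively, enter the command on a single line.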

upbound(#) specifies the upper bound of the ratio between the new and the original weight when using the Deville and Sarndal distance function. The default is upbound(3). This value must be greater than 1.

lowbound(#) specifies the lower bound of the ratio between the new and the original weight when using the Deville and Sarndal distance function. The default is lowbound(0.2). This value must be between 0 and 1.
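
To illustrate the two bounds, the sketch below calibrates with the Deville and Sarndal distance function while restricting the ratio of new to original weights to a narrower interval than the default. It reuses the data set and totals matrix from the Example section. The names wds2 and ratio and the bound values 0.5 and 2 are assumptions made for illustration.

```stata
* Illustrative sketch: bound values are assumptions, not recommendations
use http://fmwww.bc.edu/RePEc/bocode/r/reweight, clear
matrix t=(50 \ 20 \ 230 \ 35)

* Keep each new weight between 0.5 and 2 times the original weight
reweight x1 x2 x3 x4, sweight(weight) nweight(wds2) total(t) dfunction(ds) ///
    upbound(2) lowbound(0.5)

* Inspect the ratio of new to original weights
generate double ratio = wds2/weight
summarize ratio
```

If the calibration converges, the minimum and maximum reported by summarize should lie within the chosen bounds.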

mlowbound(#) and mupbound(#) are relevant only for the DS distance function when the option ntries(#) is in effect. In this case, if the recursion does not achieve convergence, the routine starts again with a new set of starting values and new random bounds. mlowbound(#) specifies the maximum deviation from the highest value of the lower bound and mupbound(#) specifies the maximum deviation from the lowest value of the upper bound. As an example, if mlowbound(#) is set to 0.5, then the new random value for the lower bound will be drawn from a uniform distribution in the range 0.5-1, and if mupbound(#) is set to 5, then the new random value for the upper bound will be drawn from a uniform distribution in the range 1-5. The defaults are 0.1 and 6, respectively.

Example

Consider the following example from Creedy (2003). id is the identification number of each unit included in the survey, x1, x2, x3 and x4 are variables included in the survey, and weight is the vector of original survey weights:

. use http://fmwww.bc.edu/RePEc/bocode/r/reweight, clear
. list

     id   x1   x2   x3   x4   weight
      1    1    1    0    0        3
      2    0    1    0    0        3
      3    1    0    2    0        5
      4    0    0    6    1        4
      5    1    0    4    1        2
      6    1    1    0    0        5
      7    1    0    5    0        5
      8    0    0    6    1        4
      9    0    1    0    0        3
     10    0    0    3    1        3
     11    1    0    2    0        5
     12    1    1    0    1        4
     13    1    0    3    1        4
     14    1    0    4    0        3
     15    0    0    5    0        5
     16    0    1    0    1        3
     17    1    0    2    1        4
     18    0    0    6    0        5
     19    1    0    4    1        4
     20    0    1    0    0        3

The survey weights produce the following aggregate totals:

. tabstat x1 x2 x3 x4 [w=weight], s(su)

  stats    x1    x2    x3    x4
    sum    44    24   213    32

Now, let us assume that external information on these variables is available, and that the real population totals are:

  stats    x1    x2    x3    x4
           50    20   230    35

In this case, reweight can be used to calibrate the original survey weights so that the new estimated totals will be equal to the population totals:

. matrix t=(50 \ 20 \ 230 \ 35)
. reweight x1 x2 x3 x4, sw(weight) nw(wchi2) tot(t) df(chi2)
. reweight x1 x2 x3 x4, sw(weight) nw(wa) tot(t) df(a)
. reweight x1 x2 x3 x4, sw(weight) nw(wb) tot(t) df(b)
. reweight x1 x2 x3 x4, sw(weight) nw(wc) tot(t) df(c)
. reweight x1 x2 x3 x4, sw(weight) nw(wds) tot(t) df(ds)

. list w*

  weight   wchi2      wa      wb      wc     wds
       3   2.753   2.674   2.654   2.697   2.706
       3   2.109   2.228   2.260   2.193   2.178
       5   5.945   5.998   6.012   5.982   5.976
       4   4.005   3.944   3.926   3.963   3.974
       2   2.484   2.514   2.521   2.505   2.501
       5   4.589   4.456   4.423   4.495   4.510
       5   5.752   5.729   5.717   5.739   5.747
       4   4.005   3.944   3.926   3.963   3.974
       3   2.109   2.228   2.260   2.193   2.178
       3   3.120   3.086   3.074   3.098   3.106
       5   5.945   5.998   6.012   5.982   5.976
       4   3.985   3.814   3.762   3.870   3.897
       4   5.019   5.108   5.136   5.080   5.065
       3   3.490   3.490   3.487   3.491   3.494
       5   4.678   4.665   4.666   4.667   4.665
       3   2.345   2.370   2.380   2.360   2.355
       4   5.070   5.191   5.232   5.150   5.128
       5   4.614   4.603   4.604   4.603   4.600
       4   4.967   5.028   5.043   5.010   5.001
       3   2.109   2.228   2.260   2.193   2.178

These are the same values as those reported in Creedy (2003).
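
One way to check a calibration is to recompute the weighted totals with the new weights, mirroring the tabstat call used earlier for the original weights. The sketch below assumes the variable wchi2 created in the example above. The helper variable wx2 is a hypothetical name introduced only for this illustration.

```stata
* Weighted totals under the chi-squared calibrated weights;
* these should come out close to the external totals in t
tabstat x1 x2 x3 x4 [w=wchi2], s(su)

* Equivalent manual check for a single variable (x2)
generate double wx2 = wchi2*x2
quietly summarize wx2
display r(sum)
```

The same check can be repeated for wa, wb, wc and wds by substituting the weight variable.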

References

Creedy, J., 2003. Survey Reweighting for Tax Microsimulation Modelling, Treasury Working Paper Series 03/17, New Zealand Treasury.

Deville, J.C. and Sarndal, C.E., 1992. Calibration estimators in survey sampling, Journal of the American Statistical Association 87 (418) 376-382, American Statistical Association.

Pacifico, D., 2010. reweight: A Stata module to reweight survey data to external totals, CAPPaper n. 79.

Author

This command was written by Daniele Pacifico (daniele.pacifico@tesoro), Italian Department of the Treasury. Comments and suggestions are welcome.

Also see

Manual: [R] reweight

Online: [R] reweight