help reweight
-------------------------------------------------------------------------------

Title

reweight -- Reweights survey variables using external aggregate totals

Syntax

reweight varlist [if] [in], sweight(varname) nweight(newvar) total(matrix) dfunction(name) [svalues(matrix) tolerance(#) niter(#) ntries(#) upbound(#) lowbound(#) mlowbounds(#) mupbounds(#)]

Description

reweight calibrates survey data to external aggregate totals. The methodology closely follows Deville and Sarndal (1992), and the recursive algorithm that implements the calibration is from Creedy (2003).

Options for reweight

sweight(varname) is required and specifies a numeric variable for the original survey weights.

nweight(newvar) is required and defines the name of the new variable containing the calibrated weights.

total(matrix) is required and specifies a Stata 1xK matrix containing the user-provided totals, entered in the same order as the K calibrating variables in varlist.

dfunction(name) specifies the distance function to be used when computing the new weights. The allowed distance functions are the chi-squared (type "chi2"), the Deville and Sarndal (type "ds") and three more, which we define as type-A (type "a"), type-B (type "b") and type-C (type "c") distance functions. See Pacifico (2010) for details.

svalues(matrix) specifies user-provided starting values. Starting values must be put in a Stata 1xK matrix following the same order as the variables in varlist. The default is a vector with the Lagrange multipliers obtained from the chi-squared distance function.
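The default starting values mentioned above can be illustrated with a short Mata sketch. It assumes that the chi-squared case takes the standard linear calibration form of Deville and Sarndal (1992), in which the new weight is the original weight times (1 + x'lambda); the exact formula and sign convention used inside reweight are not documented here, so treat this as an illustration rather than the command's implementation. The names X, d, t, lambda and w are hypothetical, and the data and totals are those of the Example section.

```stata
use http://fmwww.bc.edu/RePEc/bocode/r/reweight, clear

mata:
// calibrating variables and original survey weights from the example data
X = st_data(., ("x1", "x2", "x3", "x4"))
d = st_data(., "weight")

// external totals, in the same order as the calibrating variables
t = (50 \ 20 \ 230 \ 35)

// Lagrange multipliers (assumed linear form): solve (X'DX) lambda = t - X'd
lambda = invsym(cross(X, d, X)) * (t - cross(X, d))

// calibrated weights implied by the linear form
w = d :* (1 :+ X * lambda)

lambda
w
end
```

The wchi2 column listed in the Example section is consistent with this linear form, so w should be close to those values if the assumption holds.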

tolerance(#) specifies the tolerance level used to assess convergence. The default is tolerance(0.000001). reweight employs a double criterion to assess convergence. The first is that the difference between the estimated and the external totals must be lower than the tolerance level. The second is that, from one iteration to the next, the percentage variation of the estimated distance between the new and the original weights must be lower than the tolerance level for each observation in the sample.

niter(#) specifies the maximum number of iterations. The default is niter(50).

ntries(#) specifies the maximum number of "tries" when the algorithm does not achieve convergence within the maximum number of iterations. This option can be useful when the external totals are significantly different from the survey totals. In such situations the algorithm automatically restarts with new random starting values up to # times. The default is ntries(0).

upbound(#) specifies the upper bound of the ratio between the new and the original weight when using the Deville and Sarndal distance function. The default is upbound(3). Note that this value must be greater than 1.

lowbound(#) specifies the lower bound of the ratio between the new and the original weight when using the Deville and Sarndal distance function. The default is lowbound(0.2). Note that this value must be between 0 and 1.

mlowbound(#) and mupbound(#) are relevant only for the DS distance function and only when the option ntries(#) is in effect. In this case, if the recursion does not achieve convergence, the routine starts again with a new set of starting values and new random bounds. mlowbound(#) specifies the maximum deviation from the highest value of the lower bound and mupbound(#) specifies the maximum deviation from the lowest value of the upper bound. As an example, if mlowbound(#) is set to 0.5, then the new random value for the lower bound will be drawn from a uniform distribution in the range 0.5-1, and if mupbound(#) is set to 5, then the new random value for the upper bound will be drawn from a uniform distribution in the range 1-5. The defaults are 0.1 and 6, respectively.
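As a sketch of how the optional settings combine, the call below uses the DS distance function with tighter bounds than the defaults, a larger iteration limit and up to five restarts. The bound values and the variable name wds_tight are arbitrary choices made for illustration, and convergence with these settings is not guaranteed. The data and totals are those used in the Example section.

```stata
use http://fmwww.bc.edu/RePEc/bocode/r/reweight, clear
matrix t = (50 \ 20 \ 230 \ 35)

* DS distance: weight ratios restricted to [0.5, 2], at most 100 iterations,
* and up to 5 restarts with new random starting values if convergence fails
reweight x1 x2 x3 x4, sweight(weight) nweight(wds_tight) total(t) ///
    dfunction(ds) lowbound(0.5) upbound(2) niter(100) ntries(5)
```

The /// line continuation works in a do-file; when typing interactively, enter the command on a single line.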

Example

Consider the following example from Creedy (2003).

id is the identification number of each unit included in the survey, x1, x2, x3 and x4 are variables included in the survey, and weight is the vector of original survey weights:

    . use http://fmwww.bc.edu/RePEc/bocode/r/reweight, clear
    . list

    id   x1   x2   x3   x4   weight
     1    1    1    0    0        3
     2    0    1    0    0        3
     3    1    0    2    0        5
     4    0    0    6    1        4
     5    1    0    4    1        2
     6    1    1    0    0        5
     7    1    0    5    0        5
     8    0    0    6    1        4
     9    0    1    0    0        3
    10    0    0    3    1        3
    11    1    0    2    0        5
    12    1    1    0    1        4
    13    1    0    3    1        4
    14    1    0    4    0        3
    15    0    0    5    0        5
    16    0    1    0    1        3
    17    1    0    2    1        4
    18    0    0    6    0        5
    19    1    0    4    1        4
    20    0    1    0    0        3

The survey weights produce the following aggregate totals:

    . tabstat x1 x2 x3 x4 [w=weight], s(su)

    stats     x1    x2    x3    x4
    sum       44    24   213    32

Now, let us assume that external information on these variables is available, and that the real population totals are:

    stats     x1    x2    x3    x4
              50    20   230    35

In this case, reweight can be used to calibrate the original survey weights so that the new estimated totals will be equal to the population totals:

    . matrix t=(50 \ 20 \ 230 \ 35)
    . reweight x1 x2 x3 x4, sw(weight) nw(wchi2) tot(t) df(chi2)
    . reweight x1 x2 x3 x4, sw(weight) nw(wa) tot(t) df(a)
    . reweight x1 x2 x3 x4, sw(weight) nw(wb) tot(t) df(b)
    . reweight x1 x2 x3 x4, sw(weight) nw(wc) tot(t) df(c)
    . reweight x1 x2 x3 x4, sw(weight) nw(wds) tot(t) df(ds)

    . list w*

    weight   wchi2      wa      wb      wc     wds
         3   2.753   2.674   2.654   2.697   2.706
         3   2.109   2.228   2.260   2.193   2.178
         5   5.945   5.998   6.012   5.982   5.976
         4   4.005   3.944   3.926   3.963   3.974
         2   2.484   2.514   2.521   2.505   2.501
         5   4.589   4.456   4.423   4.495   4.510
         5   5.752   5.729   5.717   5.739   5.747
         4   4.005   3.944   3.926   3.963   3.974
         3   2.109   2.228   2.260   2.193   2.178
         3   3.120   3.086   3.074   3.098   3.106
         5   5.945   5.998   6.012   5.982   5.976
         4   3.985   3.814   3.762   3.870   3.897
         4   5.019   5.108   5.136   5.080   5.065
         3   3.490   3.490   3.487   3.491   3.494
         5   4.678   4.665   4.666   4.667   4.665
         3   2.345   2.370   2.380   2.360   2.355
         4   5.070   5.191   5.232   5.150   5.128
         5   4.614   4.603   4.604   4.603   4.600
         4   4.967   5.028   5.043   5.010   5.001
         3   2.109   2.228   2.260   2.193   2.178

These are the same values as in Creedy (2003).
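A quick way to check the calibration is to recompute the weighted totals with each set of new weights, using the same tabstat call as for the original weights. The sketch below assumes the five reweight commands above have been run; ratio_ds is a hypothetical variable name introduced only for this check.

```stata
* weighted totals under each set of calibrated weights;
* each should match the external totals 50, 20, 230 and 35 up to the tolerance
foreach w of varlist wchi2 wa wb wc wds {
    tabstat x1 x2 x3 x4 [w=`w'], s(su)
}

* ratio of new to original weights for the DS distance function;
* with the default bounds it should lie between 0.2 and 3
generate double ratio_ds = wds/weight
summarize ratio_ds
```

In the listing above, the ratio wds/weight ranges from roughly 0.73 to 1.28, well inside the default bounds.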

References

Creedy, J., 2003. Survey Reweighting for Tax Microsimulation Modelling, Treasury Working Paper Series 03/17, New Zealand Treasury.

Deville, J.C. and Sarndal, C.E., 1992. Calibration estimators in survey sampling, Journal of the American Statistical Association 87 (418) 376-382, American Statistical Association.

Pacifico, D., 2010. reweight: A Stata module to reweight survey data to external totals, CAPPaper N.79.

Author

This command was written by Daniele Pacifico (daniele.pacifico@tesoro), Italian Department of the Treasury. Comments and suggestions are welcome.

Also see

Manual: [R] reweight

Online: [R] reweight