help reweight -------------------------------------------------------------------------------
Title
reweight -- Reweights survey variables using external information
Syntax
reweight varlist , weight(varname) nweight(newvar) total(matrix) dfunction(name) [pchange(#) upbound(#) lowbound(#) ntbounds(#) ntries(#) svalues(matrix)]
Description
reweight uses external information to reweight microdata surveys. The command is based on the procedures proposed in Creedy (2004) and Deville and Särndal (1992).
Options for reweight
weight(varname) is required and specifies a numeric variable for the original survey weights.
nweight(newvar) is required and defines the name of the variable that will hold the new weights.
total(matrix) is required and identifies a Stata column vector with the new totals gathered from external information. Note that the order of these numbers must follow the order of the variables in varlist.
dfunction(name) specifies the distance function used to compute the new weights. The allowed distance functions are the chi-squared function (type "chi2"), the Deville and Särndal function (type "ds"), and two more, which we call function A (type "a") and function B (type "b"). The default is the chi-squared distance function. All of these distance functions are explained in Creedy (2004).
pchange(#) specifies the tolerance level used by Newton's method when computing the new weights. Newton's method is used with the A, B and DS distance functions. Convergence is based on two criteria: the percentage change in the updated distance function and the difference in the estimated totals with respect to the previous iteration. The default is pchange(0.00001).
upbound(#) specifies the upper bound used with the Deville and Särndal distance function. The default is upbound(5). This value must be greater than 1.
lowbound(#) specifies the lower bound used with the Deville and Särndal distance function. The default is lowbound(0.2). This value must be smaller than 1.
ntries(#) specifies the maximum number of tries for the distance functions that use Newton's method. This option can be useful when the new totals differ significantly from the survey totals. In such situations the algorithm may fail to converge, in which case it automatically restarts with new (random) starting values. After # tries, however, the algorithm stops instead of trying again with new starting values. The default is ntries(5).
ntbounds(#) specifies the number of tries before new random bounds are drawn with the Deville and Särndal distance function. This option may be useful when the new totals differ significantly from the original ones. In this case the algorithm may fail to converge with the specified bounds, and it restarts with a new set of starting values. Depending on the number specified in ntbounds(#), after # tries the algorithm sets both new starting values and new random bounds.
As an example, if the algorithm overflows after k iterations and the number in ntries(#) is greater than 1, the algorithm restarts from iteration 1 with new starting values. If the algorithm again fails to converge with the new starting values and the user has specified ntbounds(2), it restarts with new starting values and new random bounds. If the user has specified ntbounds(3) instead, one more failure (the third) is needed before new random bounds are set.
svalues(matrix) specifies starting values for the distance functions that use Newton's method. Starting values must be supplied as a Stata column vector, following the order of the variables in varlist. The default is the vector of Lagrange multipliers obtained from the solution of the minimization problem with the chi-squared distance function.
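The chi-squared case mentioned under dfunction() and svalues() has a well-known closed-form solution, which can be reproduced by hand. The following is a minimal sketch, not the source code of reweight: it assumes the standard linear calibration result, in which each new weight equals the old weight times (1 + x'lambda) and lambda solves (X'WX) lambda = t - X'w. The names XWX, cur, lambda, b, ones, xlambda and w_manual are introduced here purely for illustration.

```stata
* Illustrative sketch only: assumes the closed-form chi-squared solution
* w_new = w * (1 + x'lambda); this is not taken from the reweight source.
use http://fmwww.bc.edu/repec/bocode/r/reweight.dta, clear

* Target totals, in the same order as the variables x1 x2 x3 x4
matrix t = (50 \ 20 \ 230 \ 35)

* X'WX: weighted cross-product matrix of the calibration variables
matrix accum XWX = x1 x2 x3 x4 [iweight=weight], noconstant

* Current weighted totals as a 1 x 4 row vector (ones' * W * X)
generate byte ones = 1
matrix vecaccum cur = ones x1 x2 x3 x4 [iweight=weight], noconstant

* Lagrange multipliers as a 4 x 1 column vector
matrix lambda = invsym(XWX) * (t - cur')

* Linear index x'lambda for each observation, then the new weights
matrix b = lambda'
matrix colnames b = x1 x2 x3 x4
matrix score double xlambda = b
generate double w_manual = weight * (1 + xlambda)
```

If reweight follows this closed form for the chi-squared function, w_manual should agree with the weights produced by dfunction(chi2) up to numerical precision, and lambda should correspond to the default starting values described under svalues().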
Example
Consider the following example from Creedy (2003), where id is the identification number of each unit included in the survey, x1, x2, x3 and x4 are survey variables, and weight holds the original survey weights:
    use http://fmwww.bc.edu/repec/bocode/r/reweight.dta, clear
    list

        id   x1   x2   x3   x4   weight
         1    1    1    0    0        3
         2    0    1    0    0        3
         3    1    0    2    0        5
         4    0    0    6    1        4
         5    1    0    4    1        2
         6    1    1    0    0        5
         7    1    0    5    0        5
         8    0    0    6    1        4
         9    0    1    0    0        3
        10    0    0    3    1        3
        11    1    0    2    0        5
        12    1    1    0    1        4
        13    1    0    3    1        4
        14    1    0    4    0        3
        15    0    0    5    0        5
        16    0    1    0    1        3
        17    1    0    2    1        4
        18    0    0    6    0        5
        19    1    0    4    1        4
        20    0    1    0    0        3
The vector of survey weights produces the following aggregate totals:
    tabstat x1 x2 x3 x4 [w=weight], s(su)

       stats     x1    x2    x3    x4
         sum     44    24   213    32
Now, let us assume that external information on these variables is available, and that the true totals are:
       stats     x1    x2    x3    x4
         sum     50    20   230    35
In this case, reweight can be used to adjust the survey weights so that the new survey totals match the true totals:
    matrix t=(50 \ 20 \ 230 \ 35)
    reweight x1 x2 x3 x4, w(weight) nw(wchi2) tot(t) df(chi2)
    reweight x1 x2 x3 x4, w(weight) nw(wds) tot(t) df(ds) low(0.2) up(3)
    reweight x1 x2 x3 x4, w(weight) nw(wa) tot(t) df(a)
    reweight x1 x2 x3 x4, w(weight) nw(wb) tot(t) df(b)
    tabstat x1 x2 x3 x4 [w=wds], s(su)

       stats     x1    x2    x3    x4
         sum     50    20   230    35

    browse w*
The new weights are the same as those reported in Creedy (2003):
    weight       wchi2         wds          wa          wb
         3   2.7534503   2.7057046   2.6738135   2.6540317
         3   2.1091624   2.1776459   2.2284116   2.2600965
         5   5.9451664   5.9762224   5.9975662   6.0123387
         4   4.0052762   3.9737666   3.9440418   3.9259422
         2   2.483622    2.5006367   2.5139863   2.521438
         5   4.5890838   4.5095077   4.4563559   4.4233862
         5   5.7521965   5.7469223   5.7291728   5.716967
         4   4.0052762   3.9737666   3.9440418   3.9259422
         3   2.1091624   2.1776459   2.2284116   2.2600965
         3   3.1197391   3.1055051   3.0862549   3.0740943
         5   5.9451664   5.9762224   5.9975662   6.0123387
         4   3.9852951   3.8966345   3.8144997   3.7619213
         4   5.0187026   5.0647032   5.1083784   5.1356056
         3   3.4899119   3.4936742   3.4899599   3.4872875
         5   4.6783835   4.6649212   4.6654181   4.665873
         3   2.3446835   2.3552152   2.370096    2.3803712
         4   5.0701612   5.1284983   5.1907285   5.2318093
         5   4.6140602   4.6001223   4.6025412   4.6043357
         4   4.9672439   5.0012735   5.0279726   5.0428759
         3   2.1091624   2.1776459   2.2284116   2.2600965
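To confirm that every set of new weights, and not only wds, reproduces the external totals, the tabstat check can be repeated in a loop. This is a small sketch that assumes the four reweight calls above have already created wchi2, wds, wa and wb; the loop macro w is just an illustrative name.

```stata
* Assumes wchi2, wds, wa and wb were created by the reweight calls above
foreach w of varlist wchi2 wds wa wb {
    display as text "Totals using `w':"
    * Weighted sums of the calibration variables under the current weights
    tabstat x1 x2 x3 x4 [aweight=`w'], s(sum)
}
```

Each table should show totals that match the external values up to the convergence tolerance.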
References
Creedy, J., 2004. Reweighting Household Surveys for Tax Microsimulation Modelling: An Application to the New Zealand Household Economic Survey. Australian Journal of Labour Economics 7 (1) 71-88, Centre for Labour Market Research.
Creedy, J., 2003. Survey Reweighting for Tax Microsimulation Modelling, Treasury Working Paper Series 03/17, New Zealand Treasury.
Deville, J.C. and Särndal, C.E., 1992. Calibration estimators in survey sampling, Journal of the American Statistical Association 87 (418) 376-382, American Statistical Association.
Author
This command was written by Daniele Pacifico (daniele.pacifico@tesoro), Italian Department of the Treasury. Comments and suggestions are welcome.
Also see
Manual: [R] reweight
Online: [R] reweight