{smcl}
{* 10Jan2011/}
{cmd:help reweight}{right:}
{hline}

{title:Title}

{p2colset 5 17 19 2}{...}
{p2col :{hi:reweight} {hline 2}}Reweights survey variables using external information{p_end}
{p2colreset}{...}

{title:Syntax}

{p 8 15 2}
{cmd:reweight} {varlist} {cmd:,} {cmdab:w:eight(}{varname}{cmd:)} {cmdab:nw:eight(}{newvar}{cmd:)} {cmdab:tot:al(}{it:matrix}{cmd:)} {cmdab:df:unction(}{it:name}{cmd:)}
[{cmdab:pch:ange(}#{cmd:)} {cmdab:up:bound(}#{cmd:)} {cmdab:low:bound(}#{cmd:)} {cmdab:ntb:ounds(}#{cmd:)} {cmdab:nt:ries(}#{cmd:)} {cmdab:sv:alues(}{it:matrix}{cmd:)}]

{title:Description}

{pstd}
{cmd:reweight} uses external information to reweight survey microdata. The command is based on the procedure proposed in Creedy (2004) and Deville and Särndal (1992).

{title:Options for reweight}

{phang}
{opth weight(varname)} is required and specifies a numeric variable containing the original survey weights.

{phang}
{opth nweight(newvar)} is required and specifies the name of the variable that will hold the new weights.

{phang}
{opt total(matrix)} is required and identifies a Stata column vector with the new totals gathered from external information. The order of these totals must follow the order of the variables in {varlist}.

{phang}
{opt dfunction(name)} specifies the distance function to be used when computing the new weights. The allowed distance functions are the chi-squared function ({cmd:chi2}), the Deville and Särndal function ({cmd:ds}), and two more, which we call function A ({cmd:a}) and function B ({cmd:b}). The default is the chi-squared distance function. All these distance functions are explained in Creedy (2004).

{phang}
{opt pchange(#)} specifies the tolerance level when using Newton's method to compute the new weights. Newton's method is used with the A, B and DS distance functions. Convergence is based on two criteria: the percentage change in the updated distance function and the differences in the estimated totals with respect to the previous iteration. The default is {opt pchange(0.00001)}.

{phang}
{opt upbound(#)} specifies the upper bound when using the Deville and Särndal distance function. The default is {cmd:upbound(5)}. This value must be greater than 1.

{phang}
{opt lowbound(#)} specifies the lower bound when using the Deville and Särndal distance function. The default is {cmd:lowbound(0.2)}. This value must be smaller than 1.

{phang}
{opt ntries(#)} specifies the maximum number of "tries" with the distance functions that use Newton's method. This option can be useful when the new totals are significantly different from the survey totals. In such situations the algorithm may fail to converge, in which case it automatically restarts with new (random) starting values. After # tries, however, the algorithm stops instead of drawing new starting values again. The default is {opt ntries(5)}.

{phang}
{opt ntbounds(#)} specifies the number of tries before new random bounds are drawn with the Deville and Särndal distance function. This option may be useful when the new totals are significantly different from the original ones. In this case the algorithm may fail to converge with the specified bounds, and it restarts with a new set of starting values. After the number of tries specified in {cmd:ntbounds(#)}, the algorithm draws both new starting values and new random bounds.

{p 8 8 2}
As an example, if the algorithm overflows after k iterations and the number in {opt ntries(#)} is greater than 1, the algorithm restarts from iteration 1 with new starting values. If it again fails to converge with the new starting values and the user has specified {cmd:ntbounds(2)}, the algorithm restarts once more with new starting values and new random bounds. If the user has instead specified {cmd:ntbounds(3)}, one more failure (the third) is needed before new random bounds are drawn.

{phang}
{opt svalues(matrix)} specifies starting values for the distance functions that use Newton's method. Starting values must be in a Stata column vector, following the order of the variables in {varlist}. The default is a vector with the Lagrange multipliers obtained from the solution of the minimization problem with the chi-squared distance function.

{title:Example}

{pstd}
Consider the following example from Creedy (2003). {cmd:id} is the identification number of each unit included in the survey, {cmd:x1}, {cmd:x2}, {cmd:x3} and {cmd:x4} are variables included in the survey, and {cmd:weight} is the vector of original survey weights:

{cmd}
        use http://fmwww.bc.edu/repec/bocode/r/reweight.dta, clear
        list id x1 x2 x3 x4 weight

         1    1    1    0    0    3
         2    0    1    0    0    3
         3    1    0    2    0    5
         4    0    0    6    1    4
         5    1    0    4    1    2
         6    1    1    0    0    5
         7    1    0    5    0    5
         8    0    0    6    1    4
         9    0    1    0    0    3
        10    0    0    3    1    3
        11    1    0    2    0    5
        12    1    1    0    1    4
        13    1    0    3    1    4
        14    1    0    4    0    3
        15    0    0    5    0    5
        16    0    1    0    1    3
        17    1    0    2    1    4
        18    0    0    6    0    5
        19    1    0    4    1    4
        20    0    1    0    0    3
{txt}

{pstd}
The vector of survey weights produces the following aggregate totals:

{cmd}
        tabstat x1 x2 x3 x4 [w=weight], s(su)

        stats     x1     x2     x3     x4
        sum       44     24    213     32
{txt}

{pstd}
Now, let us assume that external information on these variables is available, and that the true totals are:

{cmd}
        stats     x1     x2     x3     x4
        sum       50     20    230     35
{txt}

{pstd}
In this case, {cmd:reweight} can be used to adjust the survey weights so that the new survey totals match the true totals:

{cmd}
        matrix t=(50 \ 20 \ 230 \ 35)
        reweight x1 x2 x3 x4, w(weight) nw(wchi2) tot(t) df(chi2)
        reweight x1 x2 x3 x4, w(weight) nw(wds) tot(t) df(ds) low(0.2) up(3)
        reweight x1 x2 x3 x4, w(weight) nw(wa) tot(t) df(a)
        reweight x1 x2 x3 x4, w(weight) nw(wb) tot(t) df(b)
{txt}

{cmd}
        tabstat x1 x2 x3 x4 [w=wds], s(su)

        stats     x1     x2     x3     x4
        sum       50     20    230     35

        browse w*
{txt}

{pstd}
This gives the same values as in Creedy (2003):

{cmd}
        weight      wchi2        wds         wa         wb
             3  2.7534503  2.7057046  2.6738135  2.6540317
             3  2.1091624  2.1776459  2.2284116  2.2600965
             5  5.9451664  5.9762224  5.9975662  6.0123387
             4  4.0052762  3.9737666  3.9440418  3.9259422
             2   2.483622  2.5006367  2.5139863   2.521438
             5  4.5890838  4.5095077  4.4563559  4.4233862
             5  5.7521965  5.7469223  5.7291728   5.716967
             4  4.0052762  3.9737666  3.9440418  3.9259422
             3  2.1091624  2.1776459  2.2284116  2.2600965
             3  3.1197391  3.1055051  3.0862549  3.0740943
             5  5.9451664  5.9762224  5.9975662  6.0123387
             4  3.9852951  3.8966345  3.8144997  3.7619213
             4  5.0187026  5.0647032  5.1083784  5.1356056
             3  3.4899119  3.4936742  3.4899599  3.4872875
             5  4.6783835  4.6649212  4.6654181   4.665873
             3  2.3446835  2.3552152   2.370096  2.3803712
             4  5.0701612  5.1284983  5.1907285  5.2318093
             5  4.6140602  4.6001223  4.6025412  4.6043357
             4  4.9672439  5.0012735  5.0279726  5.0428759
             3  2.1091624  2.1776459  2.2284116  2.2600965
{txt}

{title:References}

{phang}
Creedy, J., 2004. {it:Reweighting Household Surveys for Tax Microsimulation Modelling: An Application to the New Zealand Household Economic Survey}. Australian Journal of Labour Economics 7 (1) 71-88, Centre for Labour Market Research.

{phang}
Creedy, J., 2003. {it:Survey Reweighting for Tax Microsimulation Modelling}, Treasury Working Paper Series 03/17, New Zealand Treasury.

{phang}
Deville, J.C. and Särndal, C.E., 1992. {it:Calibration estimators in survey sampling}, Journal of the American Statistical Association 87 (418) 376-382, American Statistical Association.

{title:Author}

{phang}
This command was written by Daniele Pacifico (daniele.pacifico@tesoro), Italian Department of the Treasury. Comments and suggestions are welcome.
{p_end}

{title:Also see}

{psee}
Manual: {bf:[R] reweight}

{psee}
Online: {manhelp reweight R}{p_end}
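
{title:Cross-check of the chi-squared weights}

{pstd}
The following is an illustrative sketch, not the command's own code. It assumes that the chi-squared distance function leads to the usual closed-form solution, in which each new weight equals the original weight times (1 + x'lambda), with lambda = (X'SX)^-1 (t - X's) and S the diagonal matrix of original weights. Under that assumption the {cmd:chi2} weights can be checked by hand in Mata, without Newton's method. The variable name {cmd:wcheck} is hypothetical; the dataset and the totals are those of the example above.

{cmd}
        * Load the example data and the external totals
        use http://fmwww.bc.edu/repec/bocode/r/reweight.dta, clear
        matrix t = (50 \ 20 \ 230 \ 35)

        * Assumed closed form: lambda = (X'SX)^-1 (t - X's), w = s(1 + X*lambda)
        * wcheck is a hypothetical name for the hand-computed weights
        mata:
            X = st_data(., ("x1", "x2", "x3", "x4"))
            s = st_data(., "weight")
            t = st_matrix("t")
            lambda = invsym(cross(X, s, X)) * (t - cross(X, s))
            w = s :* (1 :+ X * lambda)
            st_store(., st_addvar("double", "wcheck"), w)
        end

        * The weighted totals should now equal the entries of t
        tabstat x1 x2 x3 x4 [w=wcheck], s(su)
{txt}

{pstd}
If the assumption holds, {cmd:wcheck} should be close to the {cmd:wchi2} column shown in the example, up to rounding.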