help calibrate
-------------------------------------------------------------------------------

Title

calibrate -- Calibrates survey datasets to population totals

Syntax

calibrate , marginals(varlist) poptot(matrix) entrywt(varname) exitwt(varname) [options]

calibrate takes a sampling weight and converts it to a calibration weight. The variables being calibrated to are listed in marginals, and the population totals used in the calibration are in a row matrix poptot.

options                Description
-------------------------------------------------------------------------
Required
  entrywt(varname)     the entry weight (selection weight)
  exitwt(varname)      the exit (calibrated) weight
  marginals(varlist)   variables used for calibration
  poptot(matrix)       row matrix of population totals

Options
  method(method)       specifies the calibration method
  qvar(varname)        scaling variable (default equals 1)
  print()              controls the amount of printing
  graphs(graphs)       controls the graphs produced
  outc(varname)        binary variable used for method nonresp to
                         indicate response
  sampvars(varlist)    additional variables used for the non-response
                         methods
  tolerance(real)      tolerance used in the iterative methods
                         (logistic and blinear)
  maxit(real)          maximum number of iterations used in the
                         iterative methods (logistic and blinear)
  lbound(real)         minimum value of the ratio of exitwt to entrywt
                         for method blinear
  ubound(real)         maximum value of the ratio of exitwt to entrywt
                         for method blinear
-------------------------------------------------------------------------

Description

calibrate calibrates survey datasets to external totals. Seven methods are available. The linear and logistic methods are equivalent to Methods 1 and 2 of Deville and Särndal (1992). The bounded linear method (blinear) is an iterative method that uses the linear method while also constraining the ratio of the exit weight to the entry weight to lie between specified limits (cf. Singh and Mohl, 1996).

The non-response methods (nrSS, nr2A, nr2B and nr2C) assume the dataset contains both responders and non-responders. They calibrate the responders to population-level information on the variables in marginals, while using information about the selected sample on the variables in sampvars. Method nrSS is the single-step procedure in Chapter 8 of Särndal and Lundström (2005). Methods nr2A and nr2B are their two-step procedures with one difference: any negative intermediate weights obtained after the first step are set to zero. Method nr2C is related to the other two-step methods. Details of method nr2C are available on request.
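For orientation, the calibration problem behind the linear and logistic methods can be written out as below. This is a sketch in the standard notation of Deville and Särndal (1992), not a transcription of the program's code: the symbols are introduced here for illustration, and it is an assumption that qvar plays the role of q_k and that the exit weight is computed in exactly this form.

```latex
% Notation (introduced here, not taken from the program):
%   s    the sample              d_k  entry weight (entrywt)
%   w_k  exit weight (exitwt)    x_k  vector of marginals for case k
%   q_k  scaling variable        t_x  vector of population totals (poptot)
\begin{align*}
\text{Calibration equations:}\quad
  & \sum_{k \in s} w_k \mathbf{x}_k = \mathbf{t}_x \\[4pt]
\text{Method 1 (linear):}\quad
  & w_k = d_k \left( 1 + q_k \mathbf{x}_k^{\top} \boldsymbol{\lambda} \right),
  \qquad
  \boldsymbol{\lambda} =
  \Bigl( \sum_{k \in s} d_k q_k \mathbf{x}_k \mathbf{x}_k^{\top} \Bigr)^{-1}
  \Bigl( \mathbf{t}_x - \sum_{k \in s} d_k \mathbf{x}_k \Bigr) \\[4pt]
\text{Method 2:}\quad
  & w_k = d_k \exp\left( q_k \mathbf{x}_k^{\top} \boldsymbol{\lambda} \right),
  \qquad
  \boldsymbol{\lambda} \text{ chosen so that the calibration equations hold}
\end{align*}
```

Under Method 1 the solution is available in closed form but the factor in brackets can be negative. Under Method 2 the exponential factor is always positive, and the vector of coefficients has to be found iteratively.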

In the special case where the calibration variables are all categorical and the scaling variable is a constant, the logistic method is equivalent to raking (Deming and Stephan, 1940). This case can also be dealt with using the maxentropy program.

entrywt is the selection weight of the individual case. It will usually be the reciprocal of the selection probability. If it has been scaled (for example, to sum to the sample size), it will usually be advisable to rescale it to sum to the population size. The weight exitwt will be generated (or replaced if it already exists). The population totals are held in the row matrix poptot. The calibration variables (marginals) should be numeric. Categorical variables will usually need to be converted to indicator variables.
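The rescaling suggested above can be done with standard Stata commands before calling calibrate. The following is a minimal sketch: the variable names scaled_wt and entry_wt are hypothetical, and the population size of 8,000,000 is only an illustrative value.

```stata
* Hypothetical names: scaled_wt is a weight that sums to the sample size,
* entry_wt is the rescaled weight to pass to entrywt().
* 8000000 is an illustrative population size.
quietly summarize scaled_wt
generate double entry_wt = scaled_wt * (8000000 / r(sum))

* Check: the rescaled weight should now sum to the population size
quietly summarize entry_wt
display %15.0fc r(sum)
```

The rescaled variable can then be supplied as entrywt(entry_wt).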

Options

method(method) specifies the calibration method. linear is the default. Other methods are logistic, blinear, or the non-response methods: nrSS, nr2A, nr2B and nr2C.

qvar(varname) is related to the importance of the observation. See Deville and Särndal (1992) for further details. When using one of the non-response methods, it is usually advisable to use the default value of qvar.

print() controls the amount of printing. Options are none (the default), final (which summarises the final weights) and all (which summarises the weights after each iteration). When the method is linear or nonresp the options final and all are equivalent.

graphs(graphs) controls the number of graphs produced. Options are none (the default), final (which produces a histogram of the exit weight) and all. The option all produces two additional graphs: a scatterplot of the exit weight against the entry weight, and a histogram either of the ratio of the exit weight to the entry weight (for methods linear, blinear or nonresp) or of the logarithm of the ratio of the exit weight to the entry weight (for method logistic).

outc(varname) is a binary variable equal to 1 if the case corresponds to a responder and 0 otherwise. This is required when a non-response method is used and is ignored otherwise.

sampvars(varlist) is a list of variables that are available on the complete sample, both responders and non-responders. This is required when a non-response method is used and is ignored otherwise. Variables in marginals should not be included in sampvars.

tolerance(real) specifies the tolerance for the iterative methods.

maxit(real) specifies the maximum number of iterations to be used by the iterative methods. The default is 15.

lbound(real) puts a lower bound on the ratio of exitwt to entrywt for method blinear. The default is 0.2.

ubound(real) puts an upper bound on the ratio of exitwt to entrywt for method blinear. The default is 5.
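As an illustration of how method(), lbound() and ubound() combine, here is a sketch of a bounded linear calibration on the multistage dataset used in the Examples section. The bounds 0.5 and 2 are arbitrary illustrative values, wt3 and wratio are hypothetical variable names, and whether a solution exists within these bounds depends on the data. The /// continuation assumes the commands are run from a do-file.

```stata
* Set-up, following the Examples section of this help file
use http://www.stata-press.com/data/r9/multistage, clear
tabulate sex, generate(isex)
tabulate race, generate(irace)
matrix M = [4000000, 4000000, 7000000]

* Bounded linear calibration; the bounds 0.5 and 2 are illustrative only
calibrate , marginals(isex1 isex2 irace1) poptot(M) entrywt(sampwgt) ///
    exitwt(wt3) method(blinear) lbound(0.5) ubound(2) print(final)

* Inspect the ratio of exit weight to entry weight against the bounds
generate double wratio = wt3 / sampwgt
summarize wratio
```

If the method converges, the minimum and maximum reported by summarize would be expected to lie within the requested bounds.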

Warnings and problems

Calibration can result in negative weights. If this happens calibrate will give a warning. (Note that the method logistic ensures that calibration weights are positive.) It will also give a warning if the calibration matrix is found to be singular. This is usually a consequence of collinearity among the marginal variables, and the solution is usually to re-calibrate after omitting variables.

Note also that there is no guarantee that a solution to the calibration equations exists.
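Independently of the warning, the exit weights can be inspected directly after calibration. A minimal sketch, assuming the exit weight was saved under the name wt1 as in the Examples section:

```stata
* wt1 is assumed to be the exit weight created by an earlier call to calibrate
count if wt1 < 0          // number of cases with a negative calibrated weight
summarize wt1, detail     // range and percentiles of the calibrated weights
```

If the count is not zero, options described in this help file include switching to method(logistic), or using method(blinear) with a positive lbound().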

It is also worth noting that the method calibrate uses to solve the calibration equations involves calculating the inverse of a matrix using the command invsym. This limits the number of calibration constraints that can be used to the maximum size of the matrix. There could also be some problems if the problem is almost singular.

A further problem might occur when using the logistic method. This method uses Newton-Raphson to solve the calibration equations, and might fail to converge, especially if the initial estimate is not close to the solution. The initial estimate calibrate uses is calculated from the selection weights. Newton-Raphson might fail if the selection weights have been scaled (for example, to sum to the sample size). Rescaling them to sum to the population size will sometimes be a solution.
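For readers who want to see why the starting point matters, a generic Newton-Raphson iteration for the calibration equations is sketched below. It assumes the exponential weight form of Method 2 of Deville and Särndal (1992), and the notation is introduced here for illustration. The starting value and stopping rule that calibrate itself uses are not reproduced, so this should not be read as the program's exact algorithm.

```latex
% Notation (introduced here): d_k entry weight, x_k vector of marginals,
% q_k scaling variable, t_x population totals, s the sample.
\begin{align*}
\boldsymbol{\phi}(\boldsymbol{\lambda})
  &= \sum_{k \in s} d_k \exp\left( q_k \mathbf{x}_k^{\top} \boldsymbol{\lambda} \right) \mathbf{x}_k
     - \mathbf{t}_x
  && \text{(calibration equations: } \boldsymbol{\phi} = \mathbf{0} \text{)} \\
\mathbf{J}(\boldsymbol{\lambda})
  &= \sum_{k \in s} d_k q_k \exp\left( q_k \mathbf{x}_k^{\top} \boldsymbol{\lambda} \right)
     \mathbf{x}_k \mathbf{x}_k^{\top}
  && \text{(Jacobian of } \boldsymbol{\phi} \text{)} \\
\boldsymbol{\lambda}^{(m+1)}
  &= \boldsymbol{\lambda}^{(m)}
     - \mathbf{J}\bigl(\boldsymbol{\lambda}^{(m)}\bigr)^{-1}
       \boldsymbol{\phi}\bigl(\boldsymbol{\lambda}^{(m)}\bigr)
  && \text{(Newton-Raphson update)}
\end{align*}
```

Each update inverts the matrix J, which is why near-collinear marginals cause trouble, and the iteration is only guaranteed to behave well when the starting value is reasonably close to the solution.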

Examples

These examples calibrate the multistage dataset. The population consists of 8,000,000 high school seniors. Assume it is known that it is 50% male and 50% female, and contains 7,000,000 white seniors.

    . use http://www.stata-press.com/data/r9/multistage

Convert the categorical variables sex and race into binary indicator variables.

    . tab sex, gen(isex)
    . tab race, gen(irace)

Make a row matrix of population totals (male, female, white).

    . matrix M=[4000000, 4000000, 7000000]

An example of linear calibration creating an exit weight called wt1:

    . calibrate , marginals(isex1 isex2 irace1) poptot(M) entrywt(sampwgt) exitwt(wt1)

An example of linear calibration with additional printing and graphs:

    . calibrate , marginals(isex1 isex2 irace1) poptot(M) entrywt(sampwgt) exitwt(wt1) print(all) graphs(all)

To check that the weighted sex and race distributions are correct:

    . tab sex [iweight=wt1]
    . tab race [iweight=wt1]

It is possible to calibrate to continuous variables. Suppose it is also known that the average weight is 160lbs (so the total weight is 1,280,000,000lbs).

    . matrix M=[4000000, 4000000, 7000000, 1280000000]

Linear, logistic or bounded linear calibration can be used. An example of logistic (with printing turned on) is:

    . calibrate , marginals(isex1 isex2 irace1 weight) poptot(M) entrywt(sampwgt) exitwt(wt2) method(logistic) print(all)

Checks:

    . tab sex [iweight=wt2]
    . tab race [iweight=wt2]
    . summ weight [iweight=wt2]
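A further check, not part of the original examples, is to compute the weighted totals of all the calibration variables in one step and compare them with the target matrix. This sketch uses Stata's total command and assumes wt2 and the four-element matrix M from the logistic example above are still in memory.

```stata
* Weighted totals of the calibration variables under the exit weight wt2
total isex1 isex2 irace1 weight [pweight=wt2]

* The targets, for comparison with the totals reported above
matrix list M
```

If calibration succeeded, the estimated totals should agree with the entries of M up to the convergence tolerance.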

Saved results

calibrate saves the following in r():

Scalars

  r(N)          number of observations
  r(mean)       mean exit weight
  r(min)        minimum exit weight
  r(max)        maximum exit weight
  r(entdeff)    approximate design effect (one plus the coefficient of
                  variation) of the entry weights
  r(exitdeff)   approximate design effect (one plus the coefficient of
                  variation) of the exit weights
  r(sclmin)     minimum exit weight after re-scaling to have a mean of one
  r(sclmax)     maximum exit weight after re-scaling to have a mean of one
  r(sclsd)      standard deviation of exit weights after re-scaling to
                  have a mean of one

Matrices

  r(Bhat)       coefficients of the variables in marginals used in the
                  equation calculating the exit weight
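The saved results can be retrieved with standard Stata commands immediately after calibrate has run. A short sketch, assuming the set-up from the Examples section (indicator variables and the three-element matrix M) is in memory; the matrix name B is hypothetical.

```stata
calibrate , marginals(isex1 isex2 irace1) poptot(M) entrywt(sampwgt) exitwt(wt1)

* List everything left behind in r()
return list

* Compare the approximate design effects before and after calibration
display "entry deff = " r(entdeff) "   exit deff = " r(exitdeff)

* Copy the coefficient matrix before another command overwrites r()
matrix B = r(Bhat)
matrix list B
```

Copying r(Bhat) into a named matrix is a precaution, since results in r() are replaced by the next r-class command.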

Also see

Calibration can be thought of as a generalisation of post-stratification. The program calibest generalises Stata's post-stratification estimation commands.

References

Deming, W. E., and F. F. Stephan. 1940. On a least squares adjustment of a sample frequency table when the expected marginal totals are known. Annals of Mathematical Statistics 11: 427-444.

Deville, J.-C., and C.-E. Särndal. 1992. Calibration estimators in survey sampling. Journal of the American Statistical Association 87: 376-382.

Särndal, C.-E., and S. Lundström. 2005. Estimation in Surveys with Nonresponse. New York: Wiley.

Singh, A. C., and C. A. Mohl. 1996. Understanding calibration estimators in survey sampling. Survey Methodology 22: 107-115.

Author

John D'Souza
National Centre for Social Research
London, England, UK
John.D'Souza@natcen.ac.uk