-------------------------------------------------------------------------------
help for dsweight and scdsweight                                   (Roger Newson)
-------------------------------------------------------------------------------

Generate direct standardization weights for input to estimation commands

dsweight stanvarlist [if] [in] [weight] [using filename] , generate(newvarname) [ groupvars(varlist) by(varlist) nocomplete missing tfreqvar(varname) sorted float fast ]

scdsweight stanvarlist [if] [in] [weight] [using filename] , generate(newvarname) scenvar(varname) [ by(varlist) nocomplete missing tfreqvar(varname) sorted float fast ]

where stanvarlist is a varlist specifying a list of standardization variables.

Description

dsweight generates direct standardization weights for input as pweights to estimation commands, standardizing the joint distribution of a list of standardization variables to a standard target population, possibly within groups defined by value combinations of a list of group variables. A direct standardization weight is defined as the ratio between the frequency of a combination of values of the standardization variables in the standard target population and the frequency of the same combination of values in the sample or group. The standard target population may be the full sample, or a by-group defined by a combination of values of the by-variables, or it may be defined using a dataset with 1 observation per combination of the standardization variables (and of the by-variables, if by() is specified), and data on the frequencies of these combinations in the standard target population. scdsweight is a version of dsweight for generating scenario direct standardization weights, which can be input as scenario weights to the SSC package scsomersd.
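To make the ratio definition concrete, here is a minimal Python sketch of the default case, where the standard target population is the full sample and the groups are defined by one group variable. It is an illustration of the arithmetic only, not the Stata code of dsweight; it assumes every observation has weight 1, and the function name ds_weights and the toy data are hypothetical.

```python
from collections import Counter

def ds_weights(strata, groups):
    """Direct standardization weights with the full sample as the target.

    strata: value of the standardization variable for each observation
    groups: value of the group variable for each observation
    Weight = (frequency of the stratum in the target population)
             / (frequency of the same stratum within the observation's group)
    """
    target = Counter(strata)                 # stratum frequencies in the full sample
    in_group = Counter(zip(groups, strata))  # stratum frequencies within each group
    return [target[s] / in_group[(g, s)] for g, s in zip(groups, strata)]

# Hypothetical toy data: age group (1, 2) and smoking status (0, 1)
agegp = [1, 1, 2, 2, 2, 1]
smoke = [0, 1, 0, 0, 1, 1]
w = ds_weights(agegp, smoke)
print(w)
```

Here each group's weights sum to 6, the size of the target population, and within each group the weighted count of each age group equals its count in the full sample, which is what "standardizing the joint distribution" means.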

Options for dsweight and scdsweight

generate(newvarname) must be present. It specifies the name of a new variable to be generated, containing the direct standardization weights.

groupvars(varlist) (dsweight only) specifies a list of variables, whose value combinations will be groups, within which the joint distribution of the standardization variables in the stanvarlist will be standardized, using the sampling probability weights, to the joint distribution of the standardization variables in the target population. If groupvars() is absent, then the standardization weights will standardize the joint distribution of the standardization variables in the full input sample to the standard target population. The full input sample is the set of all observations in the dataset (or in the by-group if by() is specified) for which the values of all standardization variables and all group variables are non-missing, and which are not excluded by the if and/or in qualifiers.

scenvar(varname) (scdsweight only) specifies a binary scenario-indicator variable, with values 0 and 1, indicating whether an observation is present in the scenario for which the scenario direct standardization weights will be calculated. These scenario direct standardization weights are equal to zero for observations not in the scenario, and equal to direct standardization weights for observations in the scenario, standardizing the distribution of the standardization variables for observations in the scenario to the standard population. They may be input, as scenario-specific weights, to the scsomersd package, downloadable from SSC, which uses rank methods to compare the distributions of outcomes between scenarios. An example of a scenario-comparison rank statistic is the population attributable risk, which may be either crude or age-standardized.
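The rule for scenario weights can be sketched in a few lines of Python. This is an illustration under stated assumptions, not scdsweight itself: the standard population is taken to be the full sample, every observation has weight 1, and the function name scenario_ds_weights and the toy data are hypothetical.

```python
from collections import Counter

def scenario_ds_weights(strata, scen):
    """Scenario direct standardization weights (illustrative sketch).

    Observations with scen == 0 get weight 0. Observations with scen == 1 get
    (stratum frequency in the full sample) / (stratum frequency in the scenario).
    """
    target = Counter(strata)  # standard population: the full sample (assumption)
    in_scen = Counter(s for s, c in zip(strata, scen) if c == 1)
    return [target[s] / in_scen[s] if c == 1 else 0.0
            for s, c in zip(strata, scen)]

# Hypothetical toy data: age group and a non-smoking scenario indicator
agegp = [1, 1, 2, 2, 2, 1]
nonsmoke = [1, 0, 1, 1, 0, 0]
w = scenario_ds_weights(agegp, nonsmoke)
print(w)
```

The observations outside the scenario get zero weight, and the weights inside the scenario sum to 6, the size of the standard population.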

by(varlist) specifies a list of by-variables, whose value combinations (missing or non-missing) specify the by-groups. The standardization weights are calculated independently within each by-group. If a using dataset is specified, then the by-variables must be present in this using dataset and, together with the standardization variables, they must uniquely identify the observations in the using dataset. If a using dataset is not specified, then, within each by-group, the generated standardization weights will standardize the joint distribution of the standardization variables to its distribution in the full input sample of that by-group.
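The phrase "calculated independently within each by-group" can be illustrated by splitting the data by the by-variable and applying the same ratio rule to each piece. This Python sketch assumes no using dataset and unit weights; the function names ds_weights and ds_weights_by and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def ds_weights(strata, groups):
    """Ratio weights with the (sub)sample itself as the target population."""
    target = Counter(strata)
    in_group = Counter(zip(groups, strata))
    return [target[s] / in_group[(g, s)] for g, s in zip(groups, strata)]

def ds_weights_by(strata, groups, by):
    """Apply ds_weights separately within each by-group."""
    rows = defaultdict(list)  # by-value -> list of observation indices
    for i, b in enumerate(by):
        rows[b].append(i)
    w = [None] * len(strata)
    for idx in rows.values():
        sub = ds_weights([strata[i] for i in idx], [groups[i] for i in idx])
        for i, wi in zip(idx, sub):
            w[i] = wi
    return w

strata = [1, 1, 2, 2, 1, 2, 2, 1, 2]
groups = [0, 1, 0, 1, 0, 0, 0, 1, 1]
by = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
w = ds_weights_by(strata, groups, by)
print(w)
```

In by-group B, each group's weights sum to 5, the size of that by-group, so every group is standardized to the stratum distribution of its own by-group rather than of the pooled sample.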

nocomplete specifies that each group specified by the groupvars() option (or the scenario specified by the scenvar() option) need not contain the full list of value combinations of the standardization variables. If nocomplete is absent, then dsweight and scdsweight check that each combination of values of the standardization variables is present in each combination of values of the groupvars() variables, or in the scenario specified by the scenvar() variable, within each by-group if by() is specified. If this condition is not met, then dsweight or scdsweight will fail.
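The completeness condition checked when nocomplete is absent can be expressed as a set comparison. The Python sketch below lists the (group, stratum) combinations that would violate it within a single by-group; the function name check_complete is hypothetical, and the exact form of the error that dsweight reports is not modeled.

```python
def check_complete(strata, groups):
    """Return (group, stratum) combinations absent from the data.

    The condition holds when every stratum value seen in the sample
    also occurs in every group, i.e. when the returned list is empty.
    """
    all_strata = set(strata)
    present = set(zip(groups, strata))
    return sorted((g, s) for g in set(groups) for s in all_strata
                  if (g, s) not in present)

# Group 1 has no observation in stratum 2, so the check fails
missing = check_complete([1, 2, 1, 1], [0, 0, 1, 1])
print(missing)
```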

missing specifies that the generated standardization weights, in the variable named by generate(), may have missing values in the input sample, even if the group (or scenario) variables and standardization variables are non-missing. This may be because the sum of weights in the sample, group or scenario is zero, or because a using dataset is specified and does not contain an observation with the current combination of the standardization variables. If missing is not specified, and some standardization weights in the input sample are missing, then dsweight or scdsweight will fail.

tfreqvar(varname) specifies the name of a variable, in the using dataset, containing the frequencies (or sums of weights) of the corresponding combination of standardization variables in the standard target population. If tfreqvar() is not specified, and a using dataset is specified, then dsweight or scdsweight looks for a variable named _freq. Such a variable will usually be present if the using dataset has been created by the Stata command contract, or by the SSC package xcontract.
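When a using dataset supplies the target frequencies, the numerator of each weight is looked up rather than counted. In this Python sketch a dictionary plays the role of the using dataset, with its values standing in for the _freq variable; strata absent from it get None, the analogue of a missing weight that, without the missing option, would make dsweight fail. The function name ds_weights_using and the target frequencies are hypothetical, and unit weights are assumed.

```python
from collections import Counter

def ds_weights_using(strata, groups, tfreq):
    """Weights standardizing each group to an external target population.

    tfreq maps each stratum to its frequency in the target population.
    Strata not found in tfreq get None (a missing weight).
    """
    in_group = Counter(zip(groups, strata))
    return [tfreq[s] / in_group[(g, s)] if s in tfreq else None
            for g, s in zip(groups, strata)]

# Hypothetical target: 10 subjects in age group 1 and 30 in age group 2;
# age group 3 does not appear in the target dataset.
tfreq = {1: 10, 2: 30}
w = ds_weights_using([1, 2, 2, 3], [0, 0, 0, 0], tfreq)
print(w)
```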

sorted functions as the option of the same name for merge. It specifies that the observations in the using dataset are already sorted by the standardization variables (or by the by-variables and the standardization variables if by() is specified), so there is no need for Stata to sort them before use. This may save some computational time.

float specifies that the output variable specified by generate() will be of storage type float or lower. If float is not specified, then the output variable will be generated as type double. Note that the output variable will be compressed after being generated (using compress) to the lowest type possible without loss of precision, whether or not the user specifies float.

fast is an option for programmers. It specifies that dsweight or scdsweight will take no action to restore the existing dataset in memory in the event of failure, or if the user presses Break. If fast is not specified, then dsweight and scdsweight will take this action, which uses an amount of time depending on the size of the dataset in memory.

Remarks

dsweight works on the same principle as dstdize. However, dsweight creates weights that can be input to estimation commands as pweights, in order to estimate a wide range of directly-standardized parameters (not only rates and proportions). scdsweight is intended for use with the scsomersd package, which the user can download from SSC, and which calculates rank statistics for comparing scenarios. The user must also download the SSC packages somersd and expgen, if scsomersd is to work.
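The shared principle with dstdize can be checked numerically: the weighted mean of an outcome within a group, using these weights, equals the classical directly standardized mean, that is, the stratum-specific means of the group averaged with the target stratum frequencies as the standard. This Python sketch assumes the full sample as the target population and unit weights; all function names and data are hypothetical.

```python
from collections import Counter

def ds_weights(strata, groups):
    target = Counter(strata)
    in_group = Counter(zip(groups, strata))
    return [target[s] / in_group[(g, s)] for g, s in zip(groups, strata)]

def group_weighted_mean(y, w, groups, g):
    """Mean of y in group g, weighted by the standardization weights."""
    num = sum(yi * wi for yi, wi, gi in zip(y, w, groups) if gi == g)
    den = sum(wi for wi, gi in zip(w, groups) if gi == g)
    return num / den

def direct_std_rate(y, strata, groups, g):
    """Classical direct standardization of the mean of y in group g,
    using the full-sample stratum frequencies as the standard."""
    target = Counter(strata)
    total = sum(target.values())
    rate = 0.0
    for s, n in target.items():
        ys = [yi for yi, si, gi in zip(y, strata, groups) if si == s and gi == g]
        rate += n / total * sum(ys) / len(ys)
    return rate

strata = [1, 1, 2, 2, 2, 1]
groups = [0, 1, 0, 0, 1, 1]
y = [1, 0, 0, 1, 1, 0]
w = ds_weights(strata, groups)
for g in (0, 1):
    print(g, group_weighted_mean(y, w, groups, g), direct_std_rate(y, strata, groups, g))
```

Because the weighted mean works for any outcome, the same weights can feed regress, censlope or other commands accepting pweights, which is the advantage over computing only standardized rates and proportions.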

Examples

The following examples make use of the xcontract command, which can be downloaded from SSC, and is an extended version of contract.

Set-up:

        . use http://www.stata-press.com/data/r11/lbw.dta, clear
        . gene agegp=age
        . recode agegp (0/19=1) (20/29=2) (30/max=3)
        . lab def agegp 1 "<20" 2 "20-29" 3 "30+"
        . lab val agegp agegp
        . lab var agegp "Age group"
        . describe
        . tab agegp, m

The following example creates and lists standardization weights, standardizing the children of smoking and non-smoking mothers to the age group distribution in the total sample, and then uses regress, with the standardization weights as sampling probability weights, to estimate an effect of maternal smoking on birth weight, standardized by age group. We then use censlope, part of the SSC package somersd, to estimate an age-standardized median difference in birth weight between the babies of smoking and non-smoking mothers.

        . dsweight agegp, groupvars(smoke) gene(swei1)
        . xcontract smoke agegp swei1, list(, abbr(32) sepby(smoke))
        . regress bwt smoke [pweight=swei1]
        . censlope bwt smoke [pweight=swei1], transf(z) tdist

The following example creates a dataset agpfreq1, with 1 observation per maternal age group and data on the frequencies of that maternal age group in the children of non-smoking mothers. We then use dsweight to create sampling probability weights, standardizing the children of smokers and non-smokers to the maternal age group distribution of non-smokers, and display these weights using xcontract. We then use regress to estimate the effect of smoking in a hypothetical population where smoking and non-smoking mothers have the age distribution of non-smokers in the sample. Finally, we use censlope to estimate an age-standardized median difference in birth weight between the babies of smoking and non-smoking mothers.

        . xcontract agegp if smoke==0, list(, abbr(32)) saving(agpfreq1,replace)
        . dsweight agegp using agpfreq1, groupvars(smoke) gene(swei2)
        . xcontract smoke agegp swei2, list(, abbr(32) sepby(smoke))
        . regress bwt smoke [pweight=swei2]
        . censlope bwt smoke [pweight=swei2], transf(z) tdist

The following example demonstrates the use of the scdsweight module to compute scenario direct standardization weights for use with the scsomersd package, downloadable from SSC. We define a scenario indicator variable nonsmoke, indicating that a subject is a non-smoker. We then use scdsweight to define scenario direct standardization weights, stored in a new variable swei3, equal to age-standardization weights for children of non-smokers and to zero for children of smokers. Finally, we use scsomersd to compare two scenarios, the real-world scenario and a fantasy scenario in which all mothers are non-smokers and the age-group distribution stays the same, and estimate a population attributable risk, equal to the difference between the proportions of babies with low birth weight in the real-world scenario and in the fantasy scenario.

        . gene nonsmoke=1-smoke
        . scdsweight agegp, scenvar(nonsmoke) gene(swei3)
        . xcontract smoke nonsmoke agegp swei3, list(, abbr(32) sepby(smoke))
        . scsomersd low [pwei=1], sweight(swei3) transf(z) tdist
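The point estimate of the population attributable risk in this example is a difference between two weighted proportions. The Python sketch below reproduces only that point estimate on hypothetical toy data, not the confidence interval that scsomersd computes. It assumes, as the [pwei=1] in the example suggests, that the real-world scenario gives every observation weight 1, and that the standard population for the scenario weights is the full sample. The function names are hypothetical.

```python
from collections import Counter

def scenario_ds_weights(strata, scen):
    target = Counter(strata)  # standard population: the full sample (assumption)
    in_scen = Counter(s for s, c in zip(strata, scen) if c == 1)
    return [target[s] / in_scen[s] if c == 1 else 0.0
            for s, c in zip(strata, scen)]

def weighted_prop(y, w):
    """Weighted proportion of a 0/1 outcome."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

# Hypothetical toy data in place of the lbw dataset
agegp = [1, 1, 2, 2, 2, 1]
smoke = [0, 1, 0, 0, 1, 1]
low = [0, 1, 0, 1, 1, 0]
nonsmoke = [1 - s for s in smoke]

p_real = weighted_prop(low, [1.0] * len(low))  # real-world scenario, unit weights
p_fantasy = weighted_prop(low, scenario_ds_weights(agegp, nonsmoke))  # all non-smokers
par = p_real - p_fantasy
print(p_real, p_fantasy, par)
```

For a binary outcome, a scenario comparison of the Somers' D type reduces to this difference in proportions, because P(Y1 > Y2) - P(Y1 < Y2) = p1(1 - p2) - (1 - p1)p2 = p1 - p2 when Y1 and Y2 are independent draws from the two scenarios.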

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK.
Email: r.newson@imperial.ac.uk

Also see

   Manual:  [D] merge, [D] contract, [R] dstdize

  On-line:  help for merge, contract, dstdize
            help for xcontract, somersd, censlope, scsomersd, expgen if installed