Title

iimpute -- Incremental simple (or multiple separate) imputation(s) of a set of variables

Syntax

iimpute varlist [, options]

options                     Description
-------------------------------------------------------------------------
additional(varlist)         additional variables to include in the imputation model
contextvars(varlist)        a set of variables identifying different electoral contexts
                            (by default all cases are treated as part of the same context)
stackid(varname)            a variable identifying different "stacks", for which values will
                            be separately imputed if iimpute is issued after stacking
nostack                     override the default behavior that treats each stack as a
                            separate context
minofrange(#)               minimum value of the item range (used for recoding imputed values)
maxofrange(#)               maximum value of the item range (used for recoding imputed values)
iprefix(name)               prefix for generated imputed variables (default is "i_")
mprefix(name)               prefix for generated variables indicating original missingness
                            of a variable (default is "m_")
mcountname(name)            name of a generated variable reporting original number of
                            missing items (default is "_iimpute_mc")
mimputedcountname(name)     name of a generated variable reporting number of missing items
                            after imputation (default is "_iimpute_mic")
noinflate                   do not inflate the variance of imputed values to match the
                            variance of original item values (default is to add random
                            perturbations to these values, as required)
round                       round each final value (after inflation, unless that was
                            suppressed) to the nearest integer (default is to leave values
                            unrounded)
limitdiag(#)                number of contexts for which to display full diagnostics (these
                            can be quite voluminous) as imputation progresses (default is
                            to display diagnostics for all contexts)
replace                     drops all original variables in varlist after imputation

Description

Though iimpute can impute missing values for a single variable (by calling Stata's impute, but with various options as described below), its primary function is to impute multiple variables according to an incremental procedure which, if required, is applied separately to electoral contexts identified by contextvars:

1) Within each context, observations are split into groups, based on the number of missing items. Observations for which only one variable has a missing value are processed first, and so on.

2) Within each of the above groups, variables are ranked according to the number of missing observations. Variables with fewer missing observations are processed first, and so on.

3) According to the order defined in step 2 (and within each group defined in step 1), variables are imputed through simple imputation (using Stata's impute command).

This implements the incremental nature of the procedure. Since observations with fewer missing variables are imputed first, and (within each group) items with fewer missing observations are imputed first, later imputations (that have to impute more data) will use a more complete (partially imputed) dataset.

The imputation model is based on all valid values of variables in varlist, plus all variables specified in the additional() option. The additional variables are crucial for imputing those observations where all variables in varlist have missing values (though there might be theoretical reasons for basing imputation only on the values of other members of a battery).

Please note that Stata's impute command's regsample() option is used, with a dummy variable generated from the actual value of contextvars. This means that the sample used in the imputation model is the whole electoral context and not only the restricted group defined in step 1.

NOTE that the number of independent variables upon which to base the imputation (the total of varlist and additional) is limited to 30 because that is the limit for Stata's impute command. This limitation might lead the user to prefer to issue the iimpute command after genstacks and genyhats have reduced the number of independent variables in the dataset.

4) The variance of imputed item values is then inflated to match the variance of original item values, as recommended in the literature. If this is not wanted, the noinflate option should be employed.

5) Imputed values are finally rounded, if the round option is specified. Specifying the minofrange() and/or maxofrange() options further constrains the imputed values to a specific range. While such options are not useful when imputing heterogeneous variables, they can be useful when a battery of analogous items is being imputed. This may suggest calling iimpute multiple times with different settings for these constraints. By default no constraint is applied.

The iimpute command can be issued before or after stacking. If issued after stacking, by default it treats each stack as a separate context to take into account along with any higher-level contexts. However, the nostack option can be employed to force iimpute to ignore the stack-specific contexts. In addition, the iimpute command can be employed with or without distinguishing between higher-level contexts, if any (that is, with or without the contextvars option), depending on what makes methodological sense.
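The processing order in steps 1 and 2 depends on how missingness is distributed across cases and across items, so it can help to inspect both before imputing. The following is a minimal sketch using official Stata commands only; it assumes a hypothetical battery whose variable names begin with ptv, and nmiss is a hypothetical new variable name. It does not reproduce what iimpute does internally.

```stata
* Sketch only: ptv* and nmiss are hypothetical names
* Number of battery items missing for each case (the groups of step 1)
egen nmiss = rowmiss(ptv*)
tabulate nmiss

* Number of missing observations for each item (the ranking of step 2)
misstable summarize ptv*
```

Cases with nmiss equal to 1 correspond to the group that is processed first, and the items that misstable reports with the fewest missing observations are the ones imputed first within each group.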

Multiple Imputation

It is possible to impute multiple different datasets by using Stata's set seed command to supply a different seed for the random number generator that iimpute calls when it inflates the variance of the imputed values returned by Stata's impute. Each dataset created in this way needs to be separately saved before changing the seed to impute a different dataset. The resulting datasets can be imported into Stata's mi or used to arrive at separate estimates that are then combined manually. NOTE that, if Stata's set seed command is not employed, the separate datasets will still be different from each other (each invocation draws from the random number generator in whatever state earlier commands have left it, so the perturbations added when inflating the variance differ on each occasion), but these differences will not be reliably replicable.
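The workflow described above can be sketched as a do-file loop. This is an illustration under stated assumptions rather than a prescribed procedure: the dataset name mydata, the seed values, the number of datasets (5), and the file names imputed1 to imputed5 are all hypothetical, and the iimpute options are taken from the syntax table above.

```stata
* Sketch only: mydata, the seeds, and the imputed# file names are hypothetical
forvalues m = 1/5 {
    use mydata, clear                 // reload the unimputed data each time
    local seed = 1000 + `m'
    set seed `seed'                   // a different, recorded seed per dataset
    iimpute ptv*, contextvars(cid) minofrange(0) maxofrange(10) round
    save imputed`m', replace          // save before the seed changes
}
```

Recording the seeds in the do-file is what makes each saved dataset replicable; reloading the original data inside the loop ensures that every imputation starts from the same unimputed values.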

Options

additional(varlist) if specified, additional variables to include in the imputation model beyond those in varlist. These additional variables will not have any missing values imputed.

contextvars(varlist) if specified, variables whose combinations identify different electoral contexts (default is to treat all cases as part of the same context)

stackid(varname) if specified, a variable identifying different "stacks" for which values will be separately imputed in the absence of the nostack option. The default is to use the "genstacks_stack" variable if the iimpute command is issued after stacking.

nostack if present, overrides the default behavior of treating each stack as a separate context (has no effect if the iimpute command is issued before stacking).

minofrange(#) if specified, minimum value of the item range (used for constraining imputed values).

maxofrange(#) if specified, maximum value of the item range (used for constraining imputed values).

iprefix(name) if specified, prefix for generated imputed variables (default is "i_")

mprefix(name) if specified, prefix for generated variables that indicate original missingness of a variable (default is "m_")

mcountname(name) if specified, name of a generated variable reporting number of missing items before imputation (default is "_iimpute_mc")

mimputedcountname(name) if specified, name of a generated variable reporting number of missing items after imputation, which could still be non-zero if all variables in the imputation model are missing for certain cases (default is "_iimpute_mic")

noinflate if specified, do not inflate the variance of imputed values to match the variance of original item values (default is to add random perturbations to these values, as required)

round if specified, round each final value (after inflation, if any) to the closest integer (default is to leave values unrounded)

limitdiag(#) if specified, limits the number of contexts for which full diagnostics are displayed to # (default is to display diagnostics for all contexts, which can be quite voluminous)

replace if specified, drops all original variables for which imputed versions have been created (default is to keep original as well as new variables)

Examples

The following command imputes PTVs stored in variables whose names begin with ptv (using standard Stata variable list conventions) in a dataset where observations are nested in contexts defined by cid. The imputation model is based only on the PTV variables. Imputed values will be rounded to the nearest integer between 0 and 10. The data are assumed not to be already stacked.

. iimpute ptv*, context(cid) min(0) max(10) round

The following command imputes variables ptv and lrresp in a dataset that has already been stacked and where observations are nested in contexts defined by cid. The imputation model is based on these variables plus a variety of y-hat affinity variables and one party-level variable (seats). Imputed values will not be constrained in any way. Such a command might well be issued prior to a call on gendist to create Euclidean distances between lrresp (if that was left-right respondent location) and a battery of party location variables.

. iimpute ptv lrresp, additional(y_class-y_churchatt seats) contextvars(cid)

Generated variables

iimpute saves the following variables and variable sets:

i_name1 i_name2 ... a set of variables with names matching the original variables (which are left unchanged) for which missing data has been imputed.

m_name1 m_name2 ... a set of dummy variables indicating whether each specific variable was imputed in a specific observation (i.e. was originally missing).

_iimpute_mc a variable showing the original count of missing items for each case.

_iimpute_mic a variable showing the count of items that are still missing for each case after imputation. This might happen, e.g., if the variables specified in additional also have mostly missing values on the same observations where all variables in varlist are missing.

NOTE that a subsequent invocation of iimpute will replace _iimpute_mc and _iimpute_mic with new counts of missing values for that invocation of iimpute. So the user should save these values after issuing the previous command, if they will be of later interest.
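One way to keep the counts from an earlier invocation is to give them distinct names when they are created. The sketch below assumes that the mcountname() and mimputedcountname() options behave as documented under Options; the batteries ptv* and lr* and the names mc_ptv and mic_ptv are hypothetical.

```stata
* Sketch only: battery names and count-variable names are hypothetical
* First battery: store the counts under names specific to this call
iimpute ptv*, contextvars(cid) mcountname(mc_ptv) mimputedcountname(mic_ptv)

* Second battery: uses the default names _iimpute_mc and _iimpute_mic,
* leaving mc_ptv and mic_ptv from the first call untouched
iimpute lr*, contextvars(cid)
```

If the first call has already been issued with the default names, renaming the two variables with Stata's rename command before the next call should serve the same purpose.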