{smcl} {cmd:help iimpute} {hline} {title:Title} {p2colset 5 20 22 2}{...} {p2col :iimpute {hline 2}}Incremental simple (or multiple separate) imputation(s) of a set of variables{p_end} {p2colreset}{...} {title:Syntax} {p 8 16 2} {opt iimpute} {varlist} [{cmd:,} {it:options}] {synoptset 25 tabbed}{...} {synopthdr} {synoptline} {synopt :{opt add:itional(varlist)}}additional variables to include in the imputation model{p_end} {synopt :{opt con:textvars(varlist)}}a set of variables identifying different electoral contexts (by default all cases are treated as part of the same context).{p_end} {synopt :{opt sta:ckid(varname)}}a variable identifying different "stacks", for which values will be separately imputed if {cmd:iimpute} is issued after stacking.{p_end} {synopt :{opt nos:tack}}override the default behavior that treats each stack as a separate context).{p_end} {synopt :{opt min:ofrange(#)}}minimum value of the item range (used for recoding imputed values){p_end} {synopt :{opt max:ofrange(#)}}maximum value of the item range (used for recoding imputed values){p_end} {synopt :{opt ipr:efix(name)}}prefix for generated imputed variables (default is "i_"){p_end} {synopt :{opt mpr:efix(name)}}prefix for generated variables indicating original missingness of a variable (default is "m_"){p_end} {synopt :{opt mco:untname(name)}}name of a generated variable reporting original number of missing items (default is "_iimpute_mc"){p_end} {synopt :{opt mim:putedcountname(name)}}name of a generated variable reporting number of missing items after imputation (default is "_iimpute_mic"){p_end} {synopt :{opt noi:nflate}}do not inflate the variance of imputed values to match the variance of original item values (default is to add random perturbations to these values, as required).{p_end} {synopt :{opt rou:nd}}round each final value (after inflation, unless that was suppressed) to the nearest integer (default is to leave values unrounded).{p_end} {synopt :{opt lim:itdiag(#)}}number of contexts for which to display full diagnostics (these can be quite voluminous) as imputation progresses (default is to display diagnostics for all contexts).{p_end} {synopt :{opt rep:lace}}drops all original variables in {it:{bf:varlist}} after imputation.{p_end} {synoptline} {title:Description} {pstd} Though {cmd:iimpute} can impute missing values for a single variable (by calling Stata's {cmd:impute}, but with various options as described below) its primary function is to impute multiple variables according to an incremental procedure which - if required - is applied separately to electoral contexts identified by {it:contextvars}: {pstd}1) Within each context, observations are split into groups, based on the number of missing items. Observations for which only one variable has a missing value are processed first, and so on. {pstd}2) Within each of the above groups, variables are ranked according to the number of missing observations. Variables with fewer missing observations are processed first, and so on. {pstd}3) According to the order defined in step 2 (and within each group defined in step 1), variables are imputed through simple imputation (using Stata's {cmd:impute} command). {pmore}This implements the incremental nature of the procedure. Since observations with fewer missing variables are imputed first, and (within each group) items with fewer missing observations are imputed first, later imputations (that have to impute more data) will use a more complete (partially imputed) dataset. {pmore}The imputation model is based on all valid values of variables in {it:varlist}, plus all variables specified in the {cmd:additional()} option, which - understandably - would be crucial for imputation of those observations where all variables in {it:varlist} have missing values (but there might be theoretical reasons for basing imputation only on the values of other members of a battery). {pmore}Please note that Stata's {bf:{help impute:impute}} command's {cmd:regsample()} option is used, with a dummy variable generated from the actual value of {it:contextvar}. This means that the sample used in the imputation model is the whole electoral context and not only the restricted group defined in step 1. {pmore}NOTE that the number of independent variables upon which to base the imputation (the total of {it:{bf:varlist}} and {cmd:additional}) is limited to 30 because that is the limit for Stata's {cmd:impute} command. This limitation might lead the user to prefer to issue the {cmd:iimpute} command after {bf:{help genstacks:genstacks}} and {bf:{help genyhats:genyhats}} have reduced the number of indeps in the dataset. {pstd}4) The variance of imputed item values is then inflated to match the variance of original item values, as recommended in the literature. If this is not wanted then the option {cmd:noinflate} should be employed. {pstd}5) Imputed values are finally rounded, if {cmd:round} is optioned. Specifying the {cmd:minofrange()} and/or {cmd:maxofrange()} options further constrains the imputed values to a specific range. While such options are not useful when imputing heterogeneous variables,they can be useful when a battery of analogous items is being imputed. This may suggest calling {cmd:iimpute} multiple times with different settings for these constraints. By default no constraint is applied. {pstd} The {cmd:iimpute} command can be issued before or after stacking. If issued after stacking, by default it treats each stack as a separate context to take into account along with any higher-level contexts. However, the {cmd:nostack} option can be employed to force {cmd:iimpute} to ignore the stack-specific contexts. In addition, the {cmd:iimpute} command can be employed with or without distinguishing between higher-level contexts, if any, (with or without the {cmd:contextvars} option) depending on what makes methodological sense.{break} {title:Multiple Imputation} {pstd}It is possible to impute multiple different datasets by using Stata's {bf:{help set seed:set_seed}} command to supply a different seed for the random number generator called by {cmd:iimpute} that inflates the variance of the imputed values returned by Stata's {cmd:impute}. Each dataset created in this way needs to be separately saved before changing the seed to impute a different dataset. The resulting datasets can be imported into Stata's {bf:{help mi:mi}} or used to arrive at separate estimates that are then combined manually. NOTE that, if Stata's {cmd:seed} command is not employed, the separate datasets will still be different from each other (a different dataset would be created on each occasion because by default Stata employs a different random seed each time it inflates the variance of imputed values), but these differences will not be replicable. {title:Options} {phang} {opth additional(varlist)} if specified, additional variables to include in the imputation model beyond those in {it:varlist}. These additional variables will not have any missing values imputed. {phang} {opth contextvars(varlist)} if specified, variables whose combinations identify different electoral contexts (default is to treat all cases as part of the same context) {phang} {opth stackid(varname)} if specified, a variable identifying different "stacks" for which values will be separately imputed in the absence of the {cmd:nostack} option. The default is to use the "genstacks_stack" variable if the {cmd:iimpute} command is issued after stacking. {phang} {opt nostack} if present, overrides the default behavior of treating each stack as a separate context (has no effect if the {cmd:iimpute} command is issued before stacking). {phang} {opth minofrange(name)} if specified, minimum value of the item range (used for constraining imputed values).{p_end} {phang} {opth maxofrange(name)} if specified, maximum value of the item range (used for constraining imputed values).{p_end} {phang} {opth iprefix(name)} if specified, prefix for generated imputed variables (default is "i_"){p_end} {phang} {opth mprefix(name)} if specified, prefix for generating variables that indicate original missingness of a variable (default is "m_"){p_end} {phang} {opth mcountname(name)} if specified, name of a generated variable reporting number of missing items before imputation (default is "_iimpute_mc"){p_end} {phang} {opth mimputedcountname(name)} if specified, name of a generated variable reporting number of missing items after imputation, which could still be non-zero if all variables in the imputation model are missing for certain cases (default is "_iimpute_mic"){p_end} {phang} {opt noinflate} if specified, do not inflate the variance of imputed values to match the variance of original item values (default is to add random perturbations to these values, as required){p_end} {phang} {opt round} if specified, round each final value (after inflation, if any) to the closest integer (default is to leave values unrounded){p_end} {phang} {opth limitdiag(#)} if specified, limits the number of contexts for which full diagnostics are displayed to # (default is to display diagnostics for all contexts, which can be quite voluminous){p_end} {phang} {opt replace} if specified, drops all original variables for which imputed versions have been created (default is to keep original as well as new variables){p_end} {title:Examples:} {pstd}The following command imputes PTVs stored in variables whose names begin with {it:ptv}, (using standard Stata variable variable list conventions) in a dataset where observations are nested in contexts defined by {it:cid}. The imputation model is based only on the PTV variables. Imputed values will be rounded to the nearest integer between 0 and 10. The data are assumed to not be already stacked.{p_end}{break} {phang2}{cmd:. iimpute ptv*, context(cid) min(0) max(10) round} {p_end} {pstd}The following command imputes variables {it:ptv} and {it:lrresp} in a dataset that had already been stacked and where observations are nested in contexts defined by {it:cid}. The imputation model is based on these variables plus a variety of y-hat affinity varlables and one party-level variable (seats). Imputed values will not be constrained in any way. Such a command might well be issued prior to a call on gendist to create euclidean distances between lrresp (if that was left-right respondent location) and a battery of party location variables.{p_end}{break} {phang2}{cmd:. iimpute ptv lrresp, additional(y_class-y_churchatt seats) contextvars(cid)} {p_end} {title:Generated variables} {pstd} {cmd:iimpute} saves the following variables and variable sets: {synoptset 20 tabbed}{...} {synopt:i_{it:name1} i_{it:name2} ...} a set of variables with names matching the original variables (which are left unchanged) for which missing data has been imputed.{p_end} {synopt:m_{it:name1} m_{it:name2} ...} a set of dummy variables indicating whether each specific variable was imputed in a specific observation (i.e. was originally missing).{p_end} {synopt:_iimpute_mc} a variable showing the original count of missing items for each case.{p_end} {synopt:_iimpute_mic} a variable showing the count of items that are still missing for each case after imputation. This might happen, eg., if the variables specified in {it:additional} also have mostly missing values on the same observations where all variables in {it:varlist} are missing.{p_end} {phang} NOTE that a subsequent invocation of {cmd:iimpute} will replace {it:_iimpute_mc} and {it:_iimpute_mic} with new counts of missing values for that invocation of {cmd:iimpute}. So the user should save these values after issuing the previous command, if they will be of later interest.