help for mim                          P Royston, JC Galati, JB Carlin & IR White
-------------------------------------------------------------------------------

Title

    mim -- A prefix command for analysing and manipulating multiply imputed datasets

Syntax

    mim [, mim_options] : command

    mim [, replay_options]

    mim_options            Description
    -------------------------------------------------------------------------
    General
    * category(cat_type)   where cat_type is fit, manip or combine; specifies
                             whether command is estimation, data manipulation
                             or one whose (scalar) results are to be combined
                             using Rubin's rules
      noisily              display output from execution of command within
                             each of the imputed datasets

    Estimation (valid only for estimation commands)
      dots                 display progress dots during model fitting
      from(#)              fit model, starting from imputation #
      to(#)                fit model, ending with imputation #
      storebv              fill e(b), e(V) etc. with multiple-imputation
                             estimates

    Manipulation (valid only for data manipulation commands)
    + sortorder(varlist)   one or more variables that uniquely identify the
                             observations in a given imputed dataset following
                             each execution of command

    Combination (valid for a wide range of Stata commands)
      est(est_spec)        specifies the scalar (called est) to be combined
                             across imputations
      se(se_spec)          specifies the standard error of est to be combined
                             across imputations
      byvar                uses byvar (rather than the default, statsby) to
                             extract and store est and its SE in each
                             imputation
    -------------------------------------------------------------------------
    * only necessary for estimation and data manipulation commands not listed
      under Description
    + not valid for append and reshape; MANDATORY for all other data
      manipulation commands.

    replay_options         Description
    -------------------------------------------------------------------------
      clearbv              clears e(b), e(V) etc., but leaves other mim
                             estimates intact
      j(#)                 fills e(b), e(V) etc. with estimates corresponding
                             to imputed dataset #
      mcerror              displays a table of Monte Carlo standard errors for
                             quantities in the table of regression coefficients
      storebv              same as for estimation, unless the j option is
                             specified
      reporting_options    level and eform options supported by command
    -------------------------------------------------------------------------

    xi is allowed as a prefix to mim, but not as a prefix to command; see xi.
    svy is allowed as a prefix to command; see svy.
    version is allowed as a prefix to command; see version.

Description

mim is a prefix command for working with multiply-imputed (MIM) datasets, where command can be any of a wide range of Stata commands. The function that mim performs depends on the category of the command passed to it: estimation, data manipulation, post-estimation or utility. A limited range of commands can be used with mim without specifying the category() mim_option. These are:

    Estimation: regress, mean, proportion, ratio, logistic, logit, ologit, mlogit, probit, oprobit, poisson, glm, binreg, nbreg, gnbreg, blogit, clogit, cnreg, mvreg, rreg, qreg, iqreg, sqreg, bsqreg, stcox, streg, xtgee, xtreg, xtlogit, xtnbreg, xtpoisson, xtmixed, svy: regress, svy: mean, svy: proportion, svy: ratio, svy: logistic, svy: logit, svy: ologit, svy: mlogit, svy: probit, svy: oprobit, svy: poisson, stepwise

    Post-estimation: lincom, testparm, predict

    Data manipulation: reshape, append, merge

    Utility: check, genmiss

With one exception, command is specified with its full usual syntax. The exception is merge, where only one "using" file is allowed. Also, command may be one of two internal utility commands, check and genmiss, for which the required syntaxes are

    mim: check [varlist]

    mim: genmiss varname

respectively (see Utility commands for more details regarding these two commands).

Note that the command stepwise expects the syntax of Stata's stepwise command, and is itself a 'prefix' command. It uses P-values from Wald tests to decide whether to include or exclude variables in a model.

Further Stata estimation and data manipulation commands can be used with mim by specifying the mim_option category(cat_type), where cat_type may be fit for estimation commands, manip for data manipulation commands or combine for combining scalar estimates and their SEs according to Rubin's rules. See Combining estimates using Rubin's rules under Examples for more details of mim, category(combine), and the section of the same name under Remarks for a warning about combining estimates in this way. Use of mim in these ways is at the user's discretion, and the results are not guaranteed.

The dataset structure used by mim is a stacked format. In Stata 11 it may be either the new flong style or that created by Royston's ice command (if installed). Details of the dataset format may be found under MIM dataset format below. Also, please study the following remarks on how mim functions under different versions of Stata.

mim and Stata 11

With Stata 11, mim recognizes the 'old' ice-style format variables (_mi and _mj) and the new mi-style variables (_mi_id and _mi_m). Note that multiply imputed data created by ice can be imported into the mi flong style by using the command mi import ice, clear automatic. The automatic option ensures that the imputed variables are correctly registered. If you omit the option, you may encounter difficulties.

If mim is called by a Stata version below 11.0, it recognizes only _mi and _mj as format variables. If called by Stata version 11.0 or higher, mim first looks for _mi and _mj. If it fails to find them, it checks for an mi-style data structure and if necessary converts the data to style flong (see mi set and mi convert). Note that the flong style persists after mim has finished. Finally, if neither type of formatting is found, mim gives up and issues an error message.

In what follows, the format variables are called _mi_id and _mi_m, with the implicit understanding that if the data are in the ice format, we mean _mi and _mj, respectively.

With Stata 11, if the data are in mi format and mim creates new variables, e.g. with the mim: predict newvar command, make sure you keep such variables unregistered. To avoid possible data loss in Stata 11 when working with mim, do NOT convert the data to a different mi style using mi convert.

When mim starts, it checks and reports which format is being used.
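The detection order described above can be sketched as follows. This is only an illustration of the documented behaviour, not mim's actual source code; the local macro fmt and the two-row toy dataset are assumptions introduced so that the snippet runs on its own.

```stata
* Toy data (hypothetical): an mi-style stack with no ice-style variables.
clear
set obs 2
generate byte _mi_id = _n
generate byte _mi_m  = 0

* Look for the ice-style variables first, then for the mi-style ones.
* The exact option prevents _mi from matching _mi_id by abbreviation.
local fmt "none"
capture confirm variable _mi _mj, exact
if _rc == 0 {
    local fmt "ice"
}
else {
    capture confirm variable _mi_id _mi_m, exact
    if _rc == 0 {
        local fmt "mi"
    }
}
display as text "format detected: `fmt'"
```

With real data created by ice, running mi import ice, clear automatic first (as recommended above) should leave the mi-style variables in place, so the second branch would apply.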

Options

        +---------+
    ----+ General +----------------------------------------------------------

category(cat_type) specifies the type of command that is being passed to mim: estimation (category(fit)), data manipulation (category(manip)) or a command whose scalar results are to be combined using Rubin's rules (category(combine)).

noisily specifies that the results of applying command to each of the individual imputed datasets should be displayed.

        +------------+
    ----+ Estimation +-------------------------------------------------------

dots specifies that progress dots should be displayed.

from(#) fits the specified model starting from imputation # (i.e. for _mi_m >= #). # must be an integer between 1 and m, the maximum value of _mi_m in the dataset. Default # is 1.

storebv specifies that the standard list of returned results for estimation commands be filled using the multiple-imputation results. In particular, this forces the multiple-imputation coefficient and covariance matrix estimates into e(b) and e(V), respectively, enabling application, at the user's own discretion, of Stata post-estimation commands that use these quantities directly (see Replay of estimation results [advanced] for further details).

to(#) fits the specified model from the imputation given by from() up to imputation #. # must be an integer between 2 and m, where m is the maximum value of _mi_m in the dataset. Note that if # > m then # is assumed to equal m and no error is raised. Default # is m.

        +--------------+
    ----+ Manipulation +-----------------------------------------------------

sortorder(varlist) specifies a list of one or more variables that uniquely identify the observations in each of the datasets in a mim-compatible dataset. For data manipulation, the variables must together uniquely identify the observations in each dataset AFTER command has been applied to it. Note that varlist cannot include _mi_id, since the _mi_m and _mi_id variables are dropped from each dataset prior to the call to command.

        +-------------+
    ----+ Combination +------------------------------------------------------

byvar specifies that byvar be used to execute the required stata_cmd in each imputation and store the required statistic (and optionally, its SE) in new variable(s), to be combined by mim according to Rubin's rules. The default is to use statsby. Use of byvar affects the syntax of the options est() and se(); see below.

est(est_spec) specifies the scalar est to be combined across imputations. est_spec depends on whether the byvar option is used or not. By default, statsby is used to compute est from stata_cmd according to est_spec.

The following table shows what est_spec looks like when the estimand, est, is a regression coefficient, its SE, or a quantity (usually a scalar) returned by stata_cmd in either an e() or an r() result:

    ------------------------------------------------------------------------
    Type of estimand (est)          statsby (default)    byvar
    ------------------------------------------------------------------------
    Regression coefficient          [eq]_b[varname]      b(varname)
    SE of regression coefficient    [eq]_se[varname]     se(varname)
    Quantity returned in e()        e(quantityname)      e(quantityname)
    Quantity returned in r()        r(quantityname)      r(quantityname)
    ------------------------------------------------------------------------

The optional eq refers to an 'equation'; eq may be ##, that is, a literal # followed by an equation number (e.g. #1), or an equation name. byvar does not currently support multiple equations.

se(se_spec) specifies the standard error of est to be used with Rubin's rules. Note that se() is optional; if omitted, only the mean of est across imputations is calculated. se_spec follows the same rules as est_spec (see est() above).

        +--------+
    ----+ Replay +-----------------------------------------------------------

clearbv specifies that the additional items returned using the storebv or j() options be cleared, but that all other estimation results returned by mim be left intact.

j(#) specifies that the standard results returned by estimation commands be filled using the estimates from the last fit of an estimation command applied to the #th imputed dataset, and that these estimates be replayed.

mcerror displays a table of Monte Carlo (MC) standard errors for the quantities presented in the main table of multiple-imputation results. The MC standard errors measure the uncertainty in the estimated quantities due to the use of a finite number m of imputations. In general, MC error decreases as m is increased. The MC error for the regression coefficients is computed as the square root of the between-imputation variance (B) divided by the square root of the number of imputations, i.e. sqrt(B/m). For the other quantities, jackknife estimates (leaving out one imputation each time) (Efron & Gong 1983) are presented. The mcerror option may not be combined with replay options other than reporting_options, nor may it be specified at model-fitting time.

storebv is the same as for estimation, unless the j() option is specified.

reporting_options specifies level() and eform options supported by command.

There are no mim_options for mim: check and mim: genmiss. mim: predict allows options appropriate to predict after command; see Notes on mim: predict for further information.

Remarks

Remarks are presented under the headings MIM dataset format, Display of regression results, Combining estimates using Rubin's rules, Notes on mim: predict, and Score labels in -mlogit-.

MIM dataset format

For a multiply-imputed dataset to be compatible with mim, the dataset must contain:

    a numeric variable called _mi_m whose values identify the individual dataset to which each observation belongs,

    a numeric variable called _mi_id whose values identify the observations within each individual dataset.

Moreover, if the original data with missing values are to be stored in the dta file, then those observations must be identified with the value _mi_m==0, while imputed datasets are identified using positive _mi_m values. In particular, the dataset in the stack identified by _mi_m==0 is ignored for the purpose of model fitting with mim. For convenience, a multiply-imputed dataset satisfying the above requirements is called a MIM dataset.

The requirements above have been kept as simple as possible. They allow a set of multiply-imputed datasets stored in separate files to be stacked into the format required by mim using only the basic data processing commands generate, append and replace. (Nevertheless, for convenience, a dedicated command mimstack has been provided for this purpose.)

An example of a multiply imputed dataset in mim-compatible format is shown below. The original data consist of a completely observed variable y and a variable x with missing values in the 3rd, 4th and 6th observations, and there are 2 imputed copies of the original dataset in the stack.

    _mi_m   _mi_id      y          x
    ----------------------------------
      0       1        1.1       105
      0       2        9.2       106
      0       3        1.1         .
      0       4        2.3         .
      0       5        7.5       108
      0       6        7.9         .
      1       1        1.1       105
      1       2        9.2       106
      1       3        1.1       109.796
      1       4        2.3       110.456
      1       5        7.5       108
      1       6        7.9       102.243
      2       1        1.1       105
      2       2        9.2       106
      2       3        1.1       107.952
      2       4        2.3       115.968
      2       5        7.5       108
      2       6        7.9       114.479
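As a minimal sketch of the stacking described above, the following builds the example stack using only input, generate, replace and append, and then checks the two format requirements. The imputed values are simply typed in from the table (in practice they would come from the imputation software), and the tempfile names orig and imp1 are illustrative assumptions.

```stata
* Original data with missing x in observations 3, 4 and 6
clear
input y x
1.1 105
9.2 106
1.1 .
2.3 .
7.5 108
7.9 .
end
generate _mi_id = _n        // identifies observations within a dataset
generate _mi_m  = 0         // 0 marks the original (unimputed) data
tempfile orig imp1
save `orig'

* First imputed copy (values copied from the table, not estimated here)
replace x = 109.796 in 3
replace x = 110.456 in 4
replace x = 102.243 in 6
replace _mi_m = 1
save `imp1'

* Second imputed copy
replace x = 107.952 in 3
replace x = 115.968 in 4
replace x = 114.479 in 6
replace _mi_m = 2

* Stack the three datasets and check the format requirements
append using `orig'
append using `imp1'
sort _mi_m _mi_id
isid _mi_m _mi_id                   // the pair must identify every row
assert !missing(x) if _mi_m > 0     // no missing values in imputed copies
list, sepby(_mi_m)
```

The isid and assert lines are cheap sanity checks; mim: check (see Utility commands) performs a more detailed integrity check.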

Display of regression results

mim displays parameter estimates (obtained by Rubin's rules; see Model fitting) and their standard errors, taking into account between- and within-imputation variation. Confidence intervals and test statistics for regression coefficients are based on the t distribution with estimated degrees of freedom (d.f.) obtained using the method of Barnard and Rubin. The final entry for each parameter estimate in the model is "FMI", standing for "fraction of missing information". For each predictor, the FMI is a function of the ratio of the between- to within-imputation variance of the estimated coefficient and its d.f.:

    FMI = [r + 2/(d.f. + 3)]/(r + 1)

where r is the "relative increase in variance due to non-response" (Rubin). Since d.f. is always positive, FMI lies between 0 and 1, and since d.f. is usually considerably larger than 3, FMI is approximately r/(r + 1). The larger the value of FMI, the greater the loss of information (hence loss of precision) that has been induced in the estimated coefficient by the missing data.
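A small numerical sketch of the FMI formula follows. All numbers are hypothetical, and the d.f. is simply taken as given rather than computed by the Barnard and Rubin method. The sketch also assumes the usual Rubin's-rules definitions r = (1 + 1/m)B/W and T = W + (1 + 1/m)B, which this help text does not spell out.

```stata
* Hypothetical inputs for one regression coefficient
scalar m  = 5        // number of imputations
scalar W  = 0.040    // within-imputation variance
scalar B  = 0.010    // between-imputation variance
scalar df = 60       // d.f., taken as given for this illustration

* Assumed standard definitions (not quoted from mim's source code)
scalar T   = W + (1 + 1/m)*B        // total variance
scalar rvi = (1 + 1/m)*B/W          // r, relative increase in variance

* FMI as given in the formula above, and its large-d.f. approximation
scalar fmi    = (rvi + 2/(df + 3))/(rvi + 1)
scalar approx = rvi/(rvi + 1)

display "r = " %6.4f rvi "   FMI = " %6.4f fmi "   r/(r+1) = " %6.4f approx
```

With these inputs r is 0.3, the FMI is roughly 0.255 and the approximation r/(r + 1) is roughly 0.231; the gap between the two shrinks as the d.f. grows.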

It is important to remember that the reported FMI is an estimate. For a small number of imputations, the estimate may be imprecise. Just how imprecise may be gauged to some extent by increasing the number of imputations, refitting the model in mim and inspecting the resulting FMI.

Combining estimates using Rubin's rules

While statistical theory guarantees the asymptotic normality of regression coefficients estimated by maximum likelihood, the same guarantee does not apply to estimates in general. One should be aware that combining estimates across imputations using Rubin's rules may not always make sense. In particular, Rubin's rules assume that the sampling distribution of the estimate is approximately normal, with standard deviation given by the corresponding SE (if supplied). It may be appropriate to transform the scale of the parameter (e.g. Fisher's transform for the correlation coefficient) before obtaining MI combined estimates.

Notes on mim: predict

The syntax of mim: predict is

    mim: predict newvarname [, predict_options]

where predict_options are options appropriate to predict for command, the regression command just run by mim. Note that mim: predict can only predict one new variable (newvarname) at a time. Thus syntaxes of predict that allow one to predict several variables at once are disallowed. The most obvious example is mlogit. For example, suppose y was a 3-level categorical outcome variable, coded 1, 2, 3, and a model of the form mim: mlogit y explanatory_variables had just been fit. The command

    . mim: predict yhat1 yhat2 yhat3, xb

would result in an error message (too many variables specified), whereas following regular mlogit, it would be valid. The solution with mim: predict is

    . mim: predict yhat1, outcome(1) xb
    . mim: predict yhat2, outcome(2) xb
    . mim: predict yhat3, outcome(3) xb

The default action for mim: predict is the same as the default for predict after command. For example, when command is logit, mim: predict produces the event probability, not the linear predictor.
The option xb must be included to obtain the linear predictor. The values returned in the imputed datasets (_mj > 0) use imputation-specific parameter estimates and (if appropriate) the imputed covariate values. The values returned in the _mj = 0 section of the dataset are obtained by combining the predictions from the imputed datasets using Rubin's rules.

As just mentioned, the across-imputation average of whatever is being predicted is stored in imputation 0 (_mj = 0). Note, however, that if after fitting (say) a mim: logit model you do mim: predict p and mim: predict xb, xb, then logit(p) = xb for _mj > 0 but not for _mj = 0. The behaviour is logical (the average of a non-linear transformation is not the transformation of the average), but should nevertheless be borne in mind.

There may be better ways to perform multiple-imputation inference for a desired predicted quantity, particularly when the latter is a highly non-linear function of the original model parameters. In the case of logistic regression, for example, a user might prefer to combine on the linear predictor scale before obtaining inferences for predicted probabilities by back-transformation, i.e. mim: predict xb, xb followed by gen p = invlogit(xb), which will not give the same results as mim: predict p. There appears to be no clear statistical theory to guide these decisions.

Score labels in -mlogit-

It is legal in Stata for score labels to contain periods (UK English: full stops). For example,

    . label define edulbl 1 "Less than H.S." 2 "H.S." 3 "Assoc. or higher"
    . label values edu edulbl

is perfectly valid. Such labels define equation names when used with the mlogit command. However, Stata does not allow them to be transferred "manually" to matrices, a feature which would stop mim in its tracks. To avoid the problem, mim converts the periods in such labels to underscores when reporting mlogit model equations.

Saved results

After model fitting, mim returns results in e() as follows.
    Result               Description
    -------------------------------------------------------------------------
    Matrices
      e(MIM_Q)           coefficient estimates
      e(MIM_T)           total covariance matrix estimate
      e(MIM_TLRR)        Li-Raghunathan-Rubin (1991) estimate of total
                           covariance matrix
      e(MIM_W)           within imputation covariance matrix estimate
      e(MIM_B)           between imputation covariance matrix estimate
      e(MIM_dfvec)       vector of MI degrees of freedom
      e(MIM_lambda)      vector of fraction of missing information (FMI)
      e(MIM_r)           vector of increase in variance due to missing
                           information

    Scalars
      e(MIM_dfmin)       minimum of e(MIM_dfvec)
      e(MIM_dfmax)       maximum of e(MIM_dfvec)
      e(MIM_Nmin)        minimum number of observations used in estimation
      e(MIM_Nmax)        maximum number of observations used in estimation

    Macros
      e(MIM_m)           number of imputed datasets used in estimation
      e(MIM_levels)      values of _mi_m variable used in estimation
      e(MIM_prefix)      value of e(prefix) returned by command
      e(MIM_prefix2)     mim
      e(MIM_cmd)         the name of the estimation command specified in
                           command
      e(MIM_depvar)      value of e(depvar) returned by command
      e(MIM_title)       value of e(title) returned by command
      e(MIM_properties)  value of e(properties) returned by command
      e(MIM_eform)       value of e(eform) returned by command

    Additional results (returned when storebv option is specified)
      e(b)               equal to e(MIM_Q)
      e(V)               equal to e(MIM_T)
      e(N)               equal to e(MIM_Nmin)
      e(sample)          equal to 1 for observations in the estimation
                           sample, 0 otherwise
      e(cmd)             equal to e(MIM_cmd)
      e(depvar)          equal to e(MIM_depvar)
      e(df_r)            equal to e(MIM_dfmin)
      e(properties)      equal to e(MIM_properties)
    -------------------------------------------------------------------------

Examples

Examples and accompanying remarks are given under the headings Model fitting, Data manipulation, Post-estimation, Replay of estimation results [advanced], Utility commands, and Combining estimates using Rubin's rules.

Model fitting

When invoked for model fitting, mim applies command to each of the imputed datasets in the current MIM dataset, and then combines the individual estimates using Rubin's rules for multiple-imputation-based inferences. In most cases fitting a statistical model to a multiply-imputed dataset with mim is simply a matter of loading the MIM-format dataset into Stata and executing the desired estimation command, prefixing it with the mim prefix. Several examples are provided below.

    . use mymimdataset1, clear
    . mim: regress y x1 x2 x3 x4

    . use mymimdataset2, clear
    . mim: logistic y x1 x2, coef

    . use mymimdataset3, clear
    . xi: mim: glm low age lwt i.race smoke ptl ht ui, f(bin) l(logit) le(90)
    . xi: mim: stepwise, pr(0.05): glm low age lwt (i.race) smoke ptl ht ui, f(bin) l(logit) le(90)

    . use mymimdataset4, clear
    . mim: svy: proportion heartatk
    . mim: svy: logistic heartatk age weight height
    . mim, noi: svy jackknife, nodots: logit highbp height weight age age2 female black, or

    . use mymimdataset5, clear
    . mim: xtmixed gsp private emp water other unemp || region: R.state, l(90)

Additionally, other Stata estimation commands may be fitted to a MIM dataset using the category(fit) option of mim. Two examples are given below.

    . use mymimdataset6, clear
    . mim, cat(fit): mvprobit (private = years logptax loginc) (vote = years logptax loginc), nolog

    . use mymimdataset7, clear
    . mim, cat(fit): MyNewCommand y x1 x2

Data manipulation

The stacked dataset format used by mim allows simple data manipulation such as generating and replacing variables to be performed using existing Stata commands. More complex data manipulation tasks, particularly those that alter the number of observations in each of the imputed datasets, usually require more detailed programming. For convenience, three common tasks, namely reshaping, appending and merging datasets, can be accomplished by prefixing the relevant command with mim. The first two are straightforward, and in most instances will be applied by simply prefixing the usual syntax with mim.

    . use mymimdataset7, clear
    . mim: reshape wide income, i(id) j(year)
    . mim: reshape long

    . use mymimdataset8, clear
    . mim: append using mymimdataset9

Merging two mim-compatible datasets requires a little further explanation, since it requires that the sortorder() option be specified to mim. This option is necessary so that mim can generate a new _mi_id variable once merging is complete. For example, suppose that mymimdataset10 is a mim-compatible dataset containing patient details, with each patient having a unique id, and mymimdataset11 is a second stacked dataset containing additional longitudinal measurements on each patient, with each measurement uniquely identified by the two variables id and time. Merging these data into a single dataset would usually be accomplished by a match-merge on the id variable. However, once merging is complete, the observations in the merged dataset are uniquely identified by the pair of variables id and time. Using mim the merge would be accomplished as follows:

    . use mymimdataset10, clear
    . mim, sortorder(id time): merge id using mymimdataset11

Additionally, other Stata commands that manipulate either a single dataset or a master/using pair of datasets may be applied to a multiply-imputed dataset using the category() option of mim. This is most likely to be of interest when command is a user-written program designed to accomplish a project-specific task.

    . use mymimdataset12, clear
    . mim, category(manip) so(id): mystatacmd x1 x2 x3

Post-estimation

In general, Stata's standard post-estimation methods cannot be directly applied with multiply-imputed data. Methods relying on likelihood comparisons (lrtest) are not applicable because multiple imputation does not involve calculation of likelihood functions for the data. Furthermore, application of a post-estimation command directly to the multiple-imputation estimates will not in general produce valid simultaneous inferences for multiple parameters, since applying Rubin's rules to the vector of parameter estimates and their associated variance-covariance matrices does not work reliably (Li et al, 1991). Performing inferences for target parameters that are scalar (unidimensional) is however easily accomplished using Rubin's rules, and this has enabled us to create multiple-imputation versions of lincom and predict. In addition, we have implemented the method of Li et al (1991) to create a mim-specific version of testparm, which allows the testing of null hypotheses relating to a vector of parameters. Examples of the use of mim: lincom, mim: testparm and mim: predict are given below. For other post-estimation tasks see the additional remarks under Replay of estimation results [advanced].

Warning: mim: lincom has an anomalous feature. Stata's lincom following logistic behaves atypically compared with other Stata regression commands such as stcox.
If you wish to get odds ratio estimates with mim: logistic followed by mim: lincom, you should specify the model as mim: logit ..., or and the lincom command as mim: lincom exp, or.

    . use mymimdataset2, clear
    . mim: logit y x1 x2
    . mim: lincom x1 + 2 * x2
    . mim: lincom x1 + x2, or
    . mim: testparm _all
    . mim: predict yhat, xb
    . mim: predict yhatse, stdp

Replay of estimation results [advanced]

Multiple-imputation estimates may be replayed by simply typing mim at the command line. If the estimates for a given imputed dataset have previously been called up using the j(#) option, the overall (Rubin's rules) estimates may be re-displayed by typing mim, storebv or mim, clearbv. A level(#) option and any eform options supported by command may be specified during replay.

    . use mymimdataset2, clear
    . mim: logit y x1 x2
    . mim, or l(90)

Multiple-imputation estimates may be copied into e(b), e(V) etc. by specifying the storebv option during replay. Note that use of multiple-imputation estimates in this way is at the user's discretion, and validity of the results is not guaranteed. In particular, forcing the multiple-imputation estimates into e(b) and e(V) allows application of a Stata post-estimation command directly to the multiple-imputation estimates. While this may be valid in specific cases, it is certainly not valid in general (see Post-estimation for additional comments).

    . mim, storebv

(Note that the storebv option may also be specified during model fitting.)

Alternatively, by specifying the j(#) option of mim, the estimates corresponding to the application of command to one of the individual imputed datasets are copied into their usual place in e() (that is, into e(b), e(V) etc.). command can also be replayed directly in this situation; for example,

    . mim: logit y x1 x2
    . mim, j(1)
    . logit, or

displays the estimated odds ratios for imputation #1.

The facility to replay individual estimates has been incorporated with extensibility in mind, particularly with regard to post-estimation. The most likely application is to loop over the individual estimates, replaying and capturing necessary quantities from each set of results in turn, and then combining these in some way, where the standard approach for simple scalar estimation would be to use Rubin's rules.

    . use mymimdataset2, clear
    . mim: logit y x1 x2
    . local levels `"`e(MIM_levels)'"'
    . foreach j of local levels {
    .     quietly mim, j(`j')
    .     ... apply some post-estimation command or capture some stored results here ...
    . }
    . ... combine results from individual estimations using Rubin's rules ...

Finally, to avoid inadvertent application of a Stata post-estimation command to estimates copied into e(b), e(V) etc. using either the j(#) or storebv option, the clearbv option is provided to allow one to clear these estimates when finished (without losing the multiple-imputation estimates from memory). It is recommended always to make use of this facility.

    . mim, clearbv

Utility commands

The check command provides a detailed integrity check of a multiply imputed dataset in stacked format. The main checks are that non-missing values must be constant across imputed datasets and that all missing values must have been imputed. Note that the utility commands are only applicable when the original dataset with missing values has been included in the stacked dataset (see MIM dataset format).

    . use mymimdataset12, clear
    . mim: check

Alternatively, the check can be restricted to selected variables.

    . mim: check x1 x2 x3 x4 x5

The genmiss command generates a missing indicator variable for a specified variable.

    . mim: genmiss x1

In this case the generated indicator variable is called _mim_x1 (and in general the naming convention used is to prefix varname with _mim_).

Combining estimates using Rubin's rules

Some simple examples of mim, category(combine) may help to clarify how to use this powerful facility. One small point to note: the degrees of freedom used in calculating the t-statistic for confidence intervals are slightly larger according to mim, category(combine) than to mim when fitting regression models. The result is that mim, category(combine) gives slightly narrower confidence intervals.

1. The mean of x with its SE and 95% CI computed in different ways

    Using the default calculating tool (statsby):

    . mim, cat(combine) est(_b[x]) se(_se[x]) : mean x
    . mim, cat(combine) est(_b[_cons]) se(_se[_cons]) : regress x
    . mim, cat(combine) est(r(mean)) se(sqrt(r(Var)/r(N))) : ameans x

    Note the use of an expression for the SE of the mean, namely se(sqrt(r(Var)/r(N))). statsby allows this flexibility but byvar doesn't.

    Using the alternative calculating tool (byvar):

    . mim, cat(combine) byvar est(b(x)) se(se(x)) : mean x
    . mim, cat(combine) byvar est(b(_cons)) se(se(_cons)) : regress x

2. Area under a ROC curve

    The aim is to fit a logistic regression of y on x1 and x2, compute the AUROC (area under the ROC curve) for the resulting linear predictor in each imputation, combine the AUROC values across imputations, and report the mean AUROC with its SE and 95% CI.

    . mim: logit y x1 x2
    . mim: predict xb
    . mim, cat(combine) est(r(area)) se(r(se)) : roctab y xb
    . mim, cat(combine) byvar est(r(area)) se(r(se)) : roctab y xb

    We have noticed that byvar is substantially faster than statsby in some examples; in the roctab example just given, it takes one third of the time taken by statsby.
    The reason appears to be that statsby executes stata_cmd first for the entire dataset, then for each imputation, whereas byvar only does it for each imputation.

3. Using a sequence of Stata commands

    Note the feature of byvar that stata_cmd can be a sequence of Stata commands, separated by @. The feature is not available with statsby.

    For example, the mean AUROC in the second example above could be obtained by the following single command:

    . mim, cat(combine) byvar est(r(area)) : logit y x1 x2 @ lroc, nograph

    Since lroc does not return the SE of the AUROC, the se() option of mim, category(combine) is omitted and only the mean AUROC is reported.

4. Combining estimates of a parameter from a multi-equation model

    This is purely a pedagogic example, since mim reports combined results for all parameters of a multi-equation model anyway:

    . mim, cat(combine) est([ln_p]_b[_cons]) se([ln_p]_se[_cons]) : streg x1 x2, distribution(weibull)

Authors

John C. Galati & John B. Carlin, Clinical Epidemiology & Biostatistics Unit, Murdoch Children's Research Institute & University of Melbourne
john.carlin@mcri.edu.au

Patrick Royston, MRC Clinical Trials Unit, London.
pr@ctu.mrc.ac.uk

References

Carlin JB, Galati JC and Royston P. 2008. A new framework for managing and analyzing multiply imputed data in Stata. Stata Journal 8(1): 49-67.

Carlin JB, Li N, Greenwood P and Coffey C. 2003. Tools for analyzing multiple imputed datasets. Stata Journal 3(3): 226-244.

Efron B, Gong G. 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37: 36-48.

Li KH, Raghunathan TE, Rubin DB. 1991. Large-sample significance levels from multiply-imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association 86: 1065-1073.

Royston P. 2004. Multiple imputation of missing values. Stata Journal 4(3): 227-241.

Royston P. 2005. Multiple imputation of missing values: update. Stata Journal 5(2): 188-201.

Royston P. 2005. Multiple imputation of missing values: update of ice. Stata Journal 5(4): 527-536.

Royston P. 2007. Multiple imputation of missing values: further update of ice, with an emphasis on interval censoring. Stata Journal 7(4): 445-464.

Royston P, Carlin JB and White IR. 2009. Multiple imputation of missing values: new features for mim. Stata Journal, to appear.

Also see

Online: help for mim, mimstack, mi