help for mim                         P Royston, JC Galati, JB Carlin & IR White
-------------------------------------------------------------------------------


Title

    mim -- A prefix command for analysing and manipulating multiply imputed
    datasets


Syntax

        mim [, mim_options] : command

        mim [, replay_options]



    mim_options            Description
    -------------------------------------------------------------------------
    General
    * category(cat_type)   where cat_type is fit, manip or combine - specify
                             whether command is estimation, data manipulation
                             or one whose (scalar) results are to be combined
                             using Rubin's rules
      noisily              display output from execution of command within
                             each of the imputed datasets

    Estimation (valid only for estimation commands)
      dots                 display progress dots during model fitting
      from(#)              fit model, starting from imputation #
      to(#)                fit model, ending with imputation #
      storebv              fills e(b), e(V) etc. with multiple-imputation
                             estimates

    Manipulation (valid only for data manipulation commands)
    + sortorder(varlist)   one or more variables that uniquely identify the
                             observations in a given imputed dataset
                             following each execution of command

    Combination (valid for a wide range of Stata commands)
      est(est_spec)        specifies the scalar (called est) to be combined
                             across imputations
      se(se_spec)          specifies the standard error of est to be combined
                             across imputations
      byvar                uses byvar (rather than the default, statsby) to
                             extract and store est and its SE in each
                             imputation

    -------------------------------------------------------------------------
    * only necessary for estimation and data manipulation commands not listed
      under Description
    + not valid for append and reshape; MANDATORY for all other data
      manipulation commands.


    replay_options         Description
    -------------------------------------------------------------------------
      clearbv              clears e(b), e(V) etc., but leaves other mim
                             estimates intact
      j(#)                 fills e(b), e(V) etc. with estimates corresponding
                             to imputed dataset #
      mcerror              displays a table of Monte Carlo standard errors
                             for quantities in the table of regression
                             coefficients
      storebv              same as for estimation, unless j option is
                             specified
      reporting_options    level and eform options supported by command
    -------------------------------------------------------------------------

    xi is allowed as a prefix to mim, but not as a prefix to command; see xi.
    svy is allowed as a prefix to command; see svy.
    version is allowed as a prefix to command; see version.



Description

    mim is a prefix command for working with multiply-imputed (MIM) datasets,
    where command can be any of a wide range of Stata commands. The function
    that mim performs depends on the category of command passed to mim:
    estimation, data manipulation, post-estimation or utility. A limited
    range of commands can be used with mim without specifying the category()
    mim_option. These are:

        Estimation:  regress, mean, proportion, ratio, logistic, logit, 
        ologit, mlogit, probit, oprobit, poisson, glm, binreg, nbreg, gnbreg,
        blogit, clogit, cnreg, mvreg, rreg, qreg, iqreg, sqreg, bsqreg, 
        stcox, streg, xtgee, xtreg, xtlogit, xtnbreg, xtpoisson, xtmixed, 
        svy:regress, svy:mean, svy:proportion, svy:ratio, svy:logistic, 
        svy:logit, svy:ologit, svy:mlogit, svy:probit, svy:oprobit, 
        svy:poisson, stepwise

        Post Estimation:  lincom, testparm, predict

        Data Manipulation:  reshape, append, merge

        Utility:  check, genmiss

    With one exception, command is specified with its full usual syntax. The
    exception is merge, where only one "using" file is allowed. Also, command
    may be one of two internal utility commands, check and genmiss, where the
    required syntaxes are

        mim : check [varlist]

        mim : genmiss varname

    respectively (see Utility commands for more details regarding these two
    commands).

    Note that stepwise expects the syntax of Stata's stepwise command and is
    itself a prefix command. It uses P-values from Wald tests to decide
    whether to include or exclude variables in a model.

    Further Stata estimation and data manipulation commands can be used with
    mim by specifying the mim_option category(cat_type), where cat_type may
    be fit for estimation commands, manip for data manipulation commands or
    combine for combining scalar estimates and their SEs according to
    Rubin's rules. See Combining estimates using Rubin's rules under Examples
    for more details of mim, category(combine), and the section of the same
    name under Remarks for a warning about combining estimates in this way.
    Use of mim in these ways is at the user's discretion, and the results are
    not guaranteed.

    The dataset structure used by mim is a stacked format. In Stata 11 it may
    be either the new mi flong style or the format created by Royston's ice
    command (if installed). Details of the dataset format may be found under
    MIM dataset format below. Also, please study the following remarks on how
    mim functions under different versions of Stata.

    mim and Stata 11

    With Stata 11, mim recognizes the 'old' ice-style format variables (_mi
    and _mj) and the new mi-style variables (_mi_id and _mi_m). Note that
    multiply imputed data created by ice can be imported into the mi flong
    style by using the command mi import ice, clear automatic. The automatic
    option ensures that the imputed variables are correctly registered. If
    you omit the option, you may encounter difficulties.

    If mim is called by a Stata version below 11.0, it recognizes only _mi
    and _mj as format variables. If called by Stata version 11.0 or higher,
    mim first looks for _mi and _mj. If it fails to find them, it checks for
    an mi-style data structure and if necessary converts the data to style
    flong (see mi set and mi convert). Note that the flong style persists
    after mim has finished. Finally, if neither type of formatting is found,
    mim gives up and issues an error message.

    In what follows, the format variables are called _mi_id and _mi_m with
    the implicit understanding that if the data are in the ice format, we
    mean _mi and _mj, respectively.

    With Stata 11, if the data are in mi format and mim creates new
    variables, e.g. with the mim: predict newvar command, make sure you keep
    such variables unregistered. To avoid possible data loss in Stata 11 when
    working with mim, do NOT convert the data to a different mi style using 
    mi convert.

    When mim starts, it checks and reports which format is being used.


Options

        +---------+
    ----+ General +----------------------------------------------------------

    category specifies the type of command that is being passed to mim:
        estimation (category fit), data manipulation (category manip), or a
        command whose (scalar) results are to be combined using Rubin's rules
        (category combine).

    noisily specifies that the results of the application of command to each
        of the individual imputed datasets should be displayed.

        +------------+
    ----+ Estimation +-------------------------------------------------------

    dots specifies that progress dots should be displayed.

    from(#) fits the specified model from imputation # (i.e. for _mi_m >= #).
        # must be an integer between 1 and m, the maximum value of _mi_m in
        the dataset.  Default # is 1.

    storebv specifies that the standard list of returned results for
        estimation commands be filled using the multiple-imputation results.
        In particular, this forces the multiple-imputation coefficient and
        covariance matrix estimates into e(b) and e(V), respectively, so that
        Stata post-estimation commands that use these quantities directly can
        be applied at the user's own discretion (see Replay of estimation
        results [advanced] for further details).

    to(#) fits the specified model between imputation from() and imputation
        #.  # must be an integer between 2 and m, where m is the maximum
        value of _mi_m in the dataset. Note that if # > m then # is assumed
        to equal m and no error is raised. Default # is m.

        +--------------+
    ----+ Manipulation +-----------------------------------------------------

    sortorder specifies a list of one or more variables that uniquely
        identify the observations in each of the datasets in a mim-compatible
        dataset; for data manipulation, this option must specify a list of
        variables that together uniquely identify the observations in each
        dataset AFTER command has been applied to the given dataset (note
        that varlist cannot include _mi_id, since the _mi_m and _mi_id
        variables are dropped from each dataset prior to the call to
        command).

        +-------------+
    ----+ Combination +------------------------------------------------------

    byvar specifies that byvar be used to execute command in each imputation
        and store the required statistic (and optionally, its SE) in new
        variable(s), to be combined by mim according to Rubin's rules. The
        default is to use statsby. Use of byvar affects the syntax of the
        options est() and se(); see below.

    est(est_spec) specifies the scalar est to be combined across imputations.
        est_spec depends on whether the byvar option is used or not. By
        default, statsby is used to compute est from command according to
        est_spec.

        The following table shows what est_spec looks like when the estimand,
        est, is a regression coefficient, its SE, or a quantity (usually a
        scalar) returned by command in either an e() or an r() result:

        ---------------------------------------------------------------
        Type of estimand (est)        statsby (default)        byvar     
        ---------------------------------------------------------------
        Regression coefficient        [eq]_b[varname]        b(varname) 
        SE of regression coefficient  [eq]_se[varname]      se(varname) 
        Quantity returned in e()       e(quantityname)  e(quantityname) 
        Quantity returned in r()       r(quantityname)  r(quantityname) 
        ---------------------------------------------------------------

        The optional eq refers to an 'equation'; eq may be ##, where # is an
        equation number, or an equation name. byvar does not currently
        support multiple equations.

    se(se_spec) specifies the standard error of est to be used with Rubin's
        rules. Note that se() is optional; if omitted, only the mean of est
        across imputations is calculated. se_spec follows the same rules as
        est_spec (see est() above).

        +--------+
    ----+ Replay +-----------------------------------------------------------

    clearbv specifies that the additional items returned using the storebv or
        j options be cleared, but that all other estimation results returned
        by mim be left intact.

    j(#) specifies that the standard results returned by estimation commands
        be filled using the estimates from the last fit of an estimation
        command applied to the #th imputed dataset, and that these estimates
        be replayed.

    mcerror displays a table of Monte Carlo standard errors for the
        quantities presented in the main table of multiple-imputation
        results. The MC standard errors measure the uncertainty in the
        estimated quantities due to the use of a finite number m of
        imputations. In general, MC error decreases as m is increased. The
        MC error for the regression coefficients is computed as the square
        root of the between-imputation variance (B) divided by the square
        root of the number of imputations, i.e. sqrt(B/m). For the other
        quantities, jackknife estimates (leaving out one imputation each
        time) (Efron & Gong 1983) are presented. The mcerror option may not
        be combined with replay options other than reporting_options, nor
        may it be specified at model-fitting time.

    storebv has the same effect as for estimation, unless the j option is
        also specified.

    reporting_options specifies level() and eform options supported by
        command.
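
    The two Monte Carlo error rules described under mcerror can be sketched
    in Mata as follows. This is an illustration under stated assumptions, not
    mim's own code: the per-imputation estimates b and SEs se are made-up
    numbers, the function rubin_se() and the names mc_coef and mc_se are
    hypothetical, the jackknife is applied to the Rubin's-rules total SE as
    one example of an "other quantity", and mim's implementation may differ
    in detail.

```stata
mata:
// Hypothetical estimates of one coefficient (b) and its SE (se)
// from m = 5 imputed datasets
b  = (0.52 \ 0.47 \ 0.55 \ 0.50 \ 0.49)
se = (0.10 \ 0.11 \ 0.09 \ 0.10 \ 0.10)
m  = rows(b)

// Coefficient: sqrt(B)/sqrt(m), B = between-imputation variance
B       = variance(b)
mc_coef = sqrt(B)/sqrt(m)

// Rubin's-rules total SE: sqrt(W + (1 + 1/k)B)
real scalar rubin_se(real colvector b, real colvector se)
{
    real scalar k
    k = rows(b)
    return(sqrt(mean(se:^2) + (1 + 1/k)*variance(b)))
}

// Leave-one-imputation-out jackknife MC error of the total SE
theta = J(m, 1, .)
for (i = 1; i <= m; i++) {
    keep = (1::m) :!= i
    theta[i] = rubin_se(select(b, keep), select(se, keep))
}
mc_se = sqrt((m - 1)/m * sum((theta :- mean(theta)):^2))

printf("MC error of coefficient: %9.5f\n", mc_coef)
printf("MC error of its SE:      %9.5f\n", mc_se)
end
```

    For a simple average such as the combined coefficient, the leave-one-out
    jackknife reproduces sqrt(B/m) exactly, which is why the closed-form rule
    is enough there; the jackknife earns its keep for non-linear quantities
    such as the SE.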


    There are no mim_options for mim: check and mim: genmiss.  mim: predict
    allows options appropriate to predict after command - see Notes on mim:
    predict for further information.


Remarks

    Remarks are presented under the headings MIM dataset format, Display of
    regression results, Combining estimates using Rubin's rules, Notes on
    mim: predict, and Score labels in -mlogit-.

    MIM dataset format

    For a multiply-imputed dataset to be compatible with mim, the dataset
    must contain:

        a numeric variable called _mi_m whose values identify the individual
            dataset to which each observation belongs,
        a numeric variable called _mi_id whose values identify the
            observations within each individual dataset.

    Moreover, if the original data with missing values are to be stored in
    the dta file, then those observations must be identified with the value
    _mi_m==0, while imputed datasets are identified using positive _mi_m
    values. In particular, the dataset in the stack identified by _mi_m==0 is
    ignored for the purpose of model fitting with mim. For convenience, a
    multiply-imputed dataset satisfying the above requirements is called a
    MIM dataset.

    The requirements above have been kept as simple as possible. They allow a
    set of multiply-imputed datasets stored in separate files to be stacked
    into the format required by mim using only the basic data processing
    commands generate, append and replace. (Nevertheless, for convenience, a
    dedicated command mimstack has been provided for this purpose.)

    An example of a multiply imputed dataset in mim-compatible format is
    shown below. The original data consist of a completely observed variable
    y and a variable x with missing values in the 3rd, 4th and 6th
    observations, and there are 2 imputed copies of the original dataset in
    the stack.

                     _mi_m      _mi_id       y        x    
                      ----------------------------------
                         0        1      1.1        105 
                         0        2      9.2        106 
                         0        3      1.1          . 
                         0        4      2.3          . 
                         0        5      7.5        108 
                         0        6      7.9          . 
                         1        1      1.1        105 
                         1        2      9.2        106  
                         1        3      1.1    109.796  
                         1        4      2.3    110.456  
                         1        5      7.5        108  
                         1        6      7.9    102.243  
                         2        1      1.1        105  
                         2        2      9.2        106 
                         2        3      1.1    107.952 
                         2        4      2.3    115.968 
                         2        5      7.5        108 
                         2        6      7.9    114.479 
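
    As noted above, a MIM dataset can be assembled with basic commands. The
    sketch below rebuilds the example stack shown in the table from scratch
    using input, generate, save, replace and append (mimstack is the
    dedicated alternative). It assumes it is run as a do-file, so that the
    tempfile names persist between commands, and it clears any data already
    in memory.

```stata
* Original data (_mi_m == 0): x is missing in observations 3, 4 and 6
clear
input double(y x)
1.1 105
9.2 106
1.1 .
2.3 .
7.5 108
7.9 .
end
generate int _mi_id = _n
generate int _mi_m  = 0
tempfile orig imp1 imp2
save `orig'

* Imputation 1: same observations, imputed values of x filled in
replace _mi_m = 1
replace x = 109.796 in 3
replace x = 110.456 in 4
replace x = 102.243 in 6
save `imp1'

* Imputation 2: start from imputation 1 and overwrite only the imputed cells
replace _mi_m = 2
replace x = 107.952 in 3
replace x = 115.968 in 4
replace x = 114.479 in 6
save `imp2'

* Stack the original and the imputed copies, as in the table above
use `orig', clear
append using `imp1'
append using `imp2'
sort _mi_m _mi_id
list _mi_m _mi_id y x, sepby(_mi_m) noobs
```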


    Display of regression results

    mim displays parameter estimates (obtained by Rubin's rules - see Model
    fitting) and their standard errors, taking into account between- and
    within-imputation variation.  Confidence intervals and test statistics
    for regression coefficients are based on the t distribution with
    estimated degrees of freedom (d.f.) obtained using the method of Barnard
    and Rubin. The final entry for each parameter estimate in the model is
    "FMI", standing for "fraction of missing information". For each
    predictor, the FMI is a function of the ratio of the between- to
    within-imputation variance of the estimated coefficient and its d.f.:

        FMI = [r + 2/(d.f. + 3)]/(r + 1)

    where r = (1 + 1/m)B/W is the "relative increase in variance due to
    non-response" (Rubin), B and W being the between- and within-imputation
    variances of the estimated coefficient and m the number of imputations.
    Since d.f. is always positive, FMI lies between 0 and 1, and since d.f.
    is usually considerably larger than 3, FMI is approximately r/(r + 1).
    The larger the value of FMI, the greater the loss of information (hence
    loss of precision) that has been induced in the estimated coefficient by
    the missing data.
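
    To make the FMI formula concrete, the sketch below applies Rubin's rules
    to a single coefficient by hand. The five estimates and SEs are
    hypothetical, and for simplicity the degrees of freedom use Rubin's
    (1987) large-sample formula (m - 1)(1 + 1/r)^2 rather than the
    Barnard-Rubin method that mim reports, so df and FMI can differ from
    mim's output for the same inputs. The scalar names (mi_Qbar, mi_B and so
    on) are arbitrary, and the clear at the top discards the data in memory.

```stata
* Hypothetical estimates of one coefficient (b) and its SE (se) from
* m = 5 imputed datasets
clear
input double(b se)
0.52 0.10
0.47 0.11
0.55 0.09
0.50 0.10
0.49 0.10
end

quietly summarize b
scalar mi_m    = r(N)
scalar mi_Qbar = r(mean)                      // combined estimate
scalar mi_B    = r(Var)                       // between-imputation variance
generate double u = se^2
quietly summarize u
scalar mi_W    = r(mean)                      // mean within-imputation variance
scalar mi_T    = mi_W + (1 + 1/mi_m)*mi_B     // total variance
scalar mi_r    = (1 + 1/mi_m)*mi_B/mi_W       // relative increase in variance
scalar mi_df   = (mi_m - 1)*(1 + 1/mi_r)^2    // Rubin (1987) df, not Barnard-Rubin
scalar mi_fmi  = (mi_r + 2/(mi_df + 3))/(mi_r + 1)

display "estimate = " %7.4f mi_Qbar "   SE = " %7.4f sqrt(mi_T)
display "df = " %7.1f mi_df "   FMI = " %6.4f mi_fmi
```

    Here d.f. is large, so FMI is close to r/(r + 1), as the text above
    anticipates.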

    It is important to remember that the reported FMI is an estimate.  For a
    small number of imputations, the estimate may be imprecise.  Just how
    imprecise may be gauged to some extent by increasing the number of
    imputations, refitting the model in mim and inspecting the resulting FMI.

    Combining estimates using Rubin's rules

    While statistical theory guarantees the asymptotic normality of
    regression coefficients estimated by maximum likelihood, the same
    guarantee does not apply in general. One should be aware that combining
    estimates across imputations using Rubin's rules may not always make
    sense. In particular, it assumes that the sampling distribution of the
    estimate is approximately normal, with standard deviation given by the
    corresponding SE (if supplied). It may be appropriate to transform the
    scale of the parameter (e.g. Fisher's transform for the correlation
    coefficient) before obtaining MI combined estimates.

    Notes on mim: predict

    The syntax of mim: predict is

        mim: predict newvarname [, predict_options]

    where predict_options are options appropriate to predict for command,
    the regression command just run by mim. Note that mim: predict can
    predict only one new variable (newvarname) at a time, so syntaxes of
    predict that create several variables at once are disallowed. The most
    obvious example is mlogit. Suppose y is a 3-level categorical outcome
    variable, coded 1, 2, 3, and a model of the form mim: mlogit y
    explanatory_variables has just been fit. The command

        . mim: predict yhat1 yhat2 yhat3, xb

    would result in an error message (too many variables specified), whereas
    it would be valid following regular mlogit. The solution with mim:
    predict is

        . mim: predict yhat1, outcome(1) xb
        . mim: predict yhat2, outcome(2) xb
        . mim: predict yhat3, outcome(3) xb

    The default action for mim: predict is the same as the default for
    predict after command. For example, when command is logit, mim: predict
    produces the event probability, not the linear predictor; the xb option
    must be included to obtain the linear predictor. The values returned in
    the imputed datasets (_mi_m > 0) use imputation-specific parameter
    estimates and (if appropriate) the imputed covariate values. The values
    returned in the _mi_m == 0 section of the dataset are obtained by
    combining the predictions from the imputed datasets using Rubin's rules.

    As just mentioned, the across-imputation average of whatever is being
    predicted is stored in imputation 0 (_mi_m == 0). Note, however, that if
    after fitting (say) a mim: logit model you run mim: predict p and mim:
    predict xb, xb, then logit(p) = xb for _mi_m > 0 but not for
    _mi_m == 0. The behaviour is logical, but should nevertheless be borne
    in mind.
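
    Why logit(p) = xb fails in the _mi_m == 0 rows is easy to see
    numerically: the logistic transform does not commute with averaging. In
    the sketch below the two linear predictors for a single observation are
    hypothetical, and the _mi_m == 0 value is taken to be the plain mean of
    the per-imputation predictions, which is what combining a point
    prediction by Rubin's rules amounts to.

```stata
* Hypothetical linear predictors for one observation in m = 2 imputations
local xb1 = -1
local xb2 = 2

* Mean of the per-imputation probabilities (what mim: predict p stores in _mi_m == 0)
local p_avg = (invlogit(`xb1') + invlogit(`xb2'))/2

* Back-transform of the mean linear predictor (mim: predict xb, xb, then invlogit)
local p_back = invlogit((`xb1' + `xb2')/2)

display "mean of invlogit(xb):  " %6.4f `p_avg'
display "invlogit(mean of xb):  " %6.4f `p_back'
```

    The first value is about 0.575 and the second about 0.622, so the two
    routes to a combined predicted probability genuinely differ.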

    There may be better ways to perform multiple-imputation inference for a
    desired predicted quantity, particularly when the latter is a highly
    non-linear function of the original model parameters. In the case of
    logistic regression, for example, a user might prefer to combine on the
    linear predictor scale before obtaining inferences for predicted
    probabilities by back-transformation, i.e. mim: predict xb, xb followed
    by generate p = invlogit(xb), which will not give the same results as
    mim: predict p. There appears to be no clear statistical theory to guide
    these decisions.

    Score labels in -mlogit-

    It is legal in Stata for value labels to contain periods (UK English:
    full stops). For example,

        . label define edulbl 1 "Less than H.S." 2 "H.S." 3 "Assoc. or higher"
        . label values edu edulbl

    is perfectly valid. Such labels define equation names when used with the
    mlogit command. However, Stata does not allow them to be transferred
    "manually" to matrices, a restriction that would otherwise stop mim in
    its tracks. To avoid the problem, mim converts the periods in such labels
    to underscores when reporting mlogit model equations.


Saved results

    After model fitting, mim returns results in e() as follows.

    Result                 Description
    -------------------------------------------------------------------------
      Matrices
        e(MIM_Q)           coefficient estimates
        e(MIM_T)           total covariance matrix estimate
        e(MIM_TLRR)        Li-Raghunathan-Rubin (1999) estimate of total
                             covariance matrix
        e(MIM_W)           within-imputation covariance matrix estimate
        e(MIM_B)           between-imputation covariance matrix estimate
        e(MIM_dfvec)       vector of MI degrees of freedom
        e(MIM_lambda)      vector of fractions of missing information (FMI)
        e(MIM_r)           vector of relative increases in variance due to
                             missing information

      Scalars
        e(MIM_dfmin)       minimum of e(MIM_dfvec)
        e(MIM_dfmax)       maximum of e(MIM_dfvec)
        e(MIM_Nmin)        minimum number of observations used in estimation
        e(MIM_Nmax)        maximum number of observations used in estimation

      Macros
        e(MIM_m)           number of imputed datasets used in estimation
        e(MIM_levels)      values of _mi_m variable used in estimation
        e(MIM_prefix)      value of e(prefix) returned by command
        e(MIM_prefix2)     mim
        e(MIM_cmd)         the name of the estimation command specified in
                             command
        e(MIM_depvar)      value of e(depvar) returned by command
        e(MIM_title)       value of e(title) returned by command
        e(MIM_properties)  value of e(properties) returned by command
        e(MIM_eform)       value of e(eform) returned by command

      Additional results (returned when storebv option is specified)
        e(b)               equal to e(MIM_Q)
        e(V)               equal to e(MIM_T)
        e(N)               equal to e(MIM_Nmin)
        e(sample)          equal to 1 for observations in the estimation
                             sample, 0 otherwise
        e(cmd)             equal to e(MIM_cmd)
        e(depvar)          equal to e(MIM_depvar)
        e(df_r)            equal to e(MIM_dfmin)
        e(properties)      equal to e(MIM_properties)
    -------------------------------------------------------------------------


Examples

    Examples and accompanying remarks are given under the headings Model
    fitting, Data manipulation, Post-estimation, Replay of estimation
    results [advanced], Utility commands, and Combining estimates using
    Rubin's rules.

    Model fitting

    When invoked for model fitting, mim applies command to each of the
    imputed datasets in the current MIM dataset, and then combines the
    individual estimates using Rubin's rules for multiple-imputation-based
    inferences. In most cases, fitting a statistical model to a
    multiply-imputed dataset with mim is simply a matter of loading the
    MIM-format dataset into Stata and executing the desired estimation
    command, prefixed by mim. Several examples are provided below.

        . use mymimdataset1, clear
        . mim: regress y x1 x2 x3 x4

        . use mymimdataset2, clear
        . mim: logistic y x1 x2, coef

        . use mymimdataset3, clear
        . xi: mim: glm low age lwt i.race smoke ptl ht ui, f(bin) l(logit)
              le(90)
        . xi: mim: stepwise, pr(0.05): glm low age lwt (i.race) smoke ptl ht
              ui, f(bin) l(logit) le(90)

        . use mymimdataset4, clear
        . mim: svy: proportion heartatk
        . mim: svy: logistic heartatk age weight height
        . mim, noi: svy jackknife, nodots: logit highbp height weight age
              age2 female black, or

        . use mymimdataset5, clear
        . mim: xtmixed gsp private emp water other unemp || region: R.state,
              l(90)

    Additionally, other Stata estimation commands may be fitted to a MIM
    dataset using the category(fit) option of mim. Two examples are given
    below.

        . use mymimdataset6, clear
        . mim, cat(fit): mvprobit (private = years logptax loginc)
              (vote=years logptax loginc), nolog

        . use mymimdataset7, clear
        . mim, cat(fit): MyNewCommand y x1 x2

    Data manipulation

    The stacked dataset format used by mim allows simple data manipulation,
    such as generating and replacing variables, to be performed using
    existing Stata commands. More complex data manipulation tasks,
    particularly those that alter the number of observations in each of the
    imputed datasets, usually require more detailed programming. For
    convenience, three common tasks, namely reshaping, appending and merging
    datasets, can be accomplished by prefixing the relevant command with
    mim. The first two are straightforward, and in most instances are
    applied by simply prefixing the usual syntax with mim.

        . use mymimdataset7, clear
        . mim: reshape wide income, i(id) j(year)
        . mim: reshape long

        . use mymimdataset8, clear
        . mim: append using mymimdataset9

    Merging two mim-compatible datasets requires a little further
    explanation, since it requires that the sortorder() option be specified
    to mim. This option is necessary so that mim can generate a new _mi_id
    variable once merging is complete. For example, suppose that
    mymimdataset10 is a mim-compatible dataset containing patient details,
    with each patient having a unique id, and mymimdataset11 is a second
    stacked dataset containing additional longitudinal measurements on each
    patient, with each measurement uniquely identified by the two variables
    id and time. Merging these data into a single dataset would usually be
    accomplished by a match-merge on the id variable. However, once merging
    is complete, the observations in the merged dataset are identified by
    the pair of variables id and time. Using mim, the merge would be
    accomplished as follows:

        . use mymimdataset10, clear
        . mim, sortorder(id time): merge id using mymimdataset11

    Additionally, other Stata commands that manipulate either a single
    dataset or a master/using pair of datasets may be applied to a
    multiply-imputed dataset using the category() option of mim. This is
    most likely to be of interest when command is a user-written program
    designed to accomplish a project-specific task.

        . use mymimdataset12, clear
        . mim, category(manip) so(id): mystatacmd x1 x2 x3

    Post-estimation

    In general, Stata's standard post-estimation methods cannot be applied
    directly to multiply-imputed data. Methods relying on likelihood
    comparisons (lrtest) are not applicable because multiple imputation does
    not involve calculation of likelihood functions for the data.
    Furthermore, applying a post-estimation command directly to the
    multiple-imputation estimates will not in general produce valid
    simultaneous inferences for multiple parameters, since applying Rubin's
    rules to the vector of parameter estimates and their associated
    variance-covariance matrices does not work reliably (Li et al., 1991).
    Performing inferences for target parameters that are scalar
    (unidimensional) is, however, easily accomplished using Rubin's rules,
    and this has enabled us to create multiple-imputation versions of lincom
    and predict. In addition, we have implemented the method of Li et al.
    (1991) to create a mim-specific version of testparm, which allows the
    testing of null hypotheses relating to a vector of parameters. Examples
    of the use of mim: lincom, mim: testparm and mim: predict are given
    below. For other post-estimation tasks, see the additional remarks under
    Replay of estimation results [advanced].

    Warning: mim: lincom has an anomalous feature. Stata's lincom following
    logistic behaves atypically compared with other Stata regression
    commands such as stcox. If you wish to get odds ratio estimates with
    mim: logistic followed by mim: lincom, you should instead specify the
    model as mim: logit ..., or and the lincom command as mim: lincom exp,
    or, where exp is the linear combination of interest.

        . use mymimdataset2, clear
        . mim: logit y x1 x2
        . mim: lincom x1 + 2 * x2
        . mim: lincom x1 + x2, or
        . mim: testparm _all
        . mim: predict yhat, xb
        . mim: predict yhatse, stdp

    Replay of estimation results [advanced]

    Multiple-imputation estimates may be replayed by simply typing mim at
    the command line. If the estimates for a given imputed dataset have
    previously been called up using the j(#) option, the overall (Rubin's
    rules) estimates may be redisplayed by typing mim, storebv or mim,
    clearbv. A level(#) option and any eform options supported by command
    may be specified during replay.

        . use mymimdataset2, clear
        . mim: logit y x1 x2
        . mim, or l(90)

    Multiple-imputation estimates may be copied into e(b), e(V) etc. by
    specifying the storebv option during replay. Note that use of
    multiple-imputation estimates in this way is at the user's discretion,
    and validity of the results is not guaranteed. In particular, forcing
    the multiple-imputation estimates into e(b) and e(V) allows a Stata
    post-estimation command to be applied directly to the
    multiple-imputation estimates. While this may be valid in specific
    cases, it is certainly not valid in general (see Post-estimation for
    additional comments).

        . mim, storebv

    (Note that the storebv option may also be specified during model
    fitting.)

    Alternatively, by specifying the j(#) option of mim, the estimates
    corresponding to the application of command to one of the individual
    imputed datasets are copied into their usual place in e() (that is, into
    e(b), e(V) etc.). command can also be replayed directly in this
    situation; for example,

        . mim: logit y x1 x2
        . mim, j(1)
        . logit, or

    displays the estimated odds ratios for imputation 1.

    The facility to replay individual estimates has been incorporated with
    extensibility in mind, particularly with regard to post-estimation. The
    most likely application is to loop over the individual estimates,
    replaying and capturing necessary quantities from each set of results in
    turn, and then combining these in some way; for simple scalar
    estimation, the standard approach would be to use Rubin's rules.

        . use mymimdataset2, clear
        . mim: logit y x1 x2
        . local levels `"`e(MIM_levels)'"'
        . foreach j of local levels {
        .     quietly mim, j(`j')
        .     ... apply some post-estimation command or capture some stored
                  results here ...
        . }
        . combine results from individual estimations using Rubin's rules ...

    Finally, to avoid inadvertent application of a Stata post-estimation
    command to estimates copied into e(b), e(V) etc. using either the j(#)
    or storebv option, the clearbv option is provided to allow one to clear
    these estimates when finished (without losing the multiple-imputation
    estimates from memory). It is recommended always to make use of this
    facility.
{phang}. mim, clearbv{p_end}

Utility commands

{pstd}The check command performs a detailed integrity check of a multiply imputed dataset in stacked format. The main checks are that non-missing values are constant across the imputed datasets and that all missing values have been imputed. Note that the utility commands apply only when the original dataset with missing values has been included in the stacked dataset (see MIM dataset format).{p_end}

{phang}. use mymimdataset12, clear{p_end}
{phang}. mim: check{p_end}

{pstd}Alternatively, the check can be restricted to selected variables:{p_end}

{phang}. mim: check x1 x2 x3 x4 x5{p_end}

{pstd}The genmiss command generates a missing-value indicator variable for a specified variable:{p_end}

{phang}. mim: genmiss x1{p_end}

{pstd}In this case the generated indicator variable is called _mim_x1. In general, the indicator is named by prefixing varname with _mim_.{p_end}
Combining estimates using Rubin's rules

{pstd}Some simple examples of mim, category(combine) may help to clarify how to use this powerful facility. One small point to note: the degrees of freedom used in calculating the t-statistic for confidence intervals are slightly larger with mim, category(combine) than with mim when fitting regression models. Because a t distribution with more degrees of freedom has a smaller critical value, mim, category(combine) gives slightly narrower confidence intervals.{p_end}

{pstd}1. The mean of x with its SE and 95% CI, computed in different ways{p_end}

{pmore}Using the default calculating tool (statsby):{p_end}

{pmore}. mim, cat(combine) est(_b[x]) se(_se[x]) : mean x{p_end}
{pmore}. mim, cat(combine) est(_b[_cons]) se(_se[_cons]) : regress x{p_end}
{pmore}. mim, cat(combine) est(r(mean)) se(sqrt(r(Var)/r(N))) : ameans x{p_end}

{pmore}Note the use of an expression for the SE of the mean, namely se(sqrt(r(Var)/r(N))). statsby allows this flexibility, but byvar does not.{p_end}

{pmore}Using the alternative calculating tool (byvar):{p_end}

{pmore}. mim, cat(combine) byvar est(b(x)) se(se(x)) : mean x{p_end}
{pmore}. mim, cat(combine) byvar est(b(_cons)) se(se(_cons)) : regress x{p_end}

{pstd}2. Area under a ROC curve{p_end}

{pmore}The aim is to fit a logistic regression of y on x1 and x2, compute the AUROC (area under the ROC curve) for the resulting linear predictor in each imputation, combine the AUROC values across imputations, and report the mean AUROC with its SE and 95% CI. Either of the last two commands below may be used: the first uses statsby and the second uses byvar.{p_end}

{pmore}. mim: logit y x1 x2{p_end}
{pmore}. mim: predict xb, xb{p_end}
{pmore}. mim, cat(combine) est(r(area)) se(r(se)) : roctab y xb{p_end}
{pmore}. mim, cat(combine) byvar est(r(area)) se(r(se)) : roctab y xb{p_end}

{pmore}We have noticed that byvar is substantially faster than statsby in some examples. In the roctab example just given, byvar took about one third of the time taken by statsby. The reason appears to be that statsby executes stata_cmd first for the entire dataset and then for each imputation, whereas byvar executes it only for each imputation.{p_end}

{pstd}3. Using a sequence of Stata commands{p_end}

{pmore}A feature of byvar is that stata_cmd can be a sequence of Stata commands separated by @. This feature is not available with statsby.{p_end}

{pmore}For example, the mean AUROC in the second example above could be obtained with the following single command:{p_end}

{pmore}. mim, cat(combine) byvar est(r(area)) : logit y x1 x2 @ lroc, nograph{p_end}

{pmore}Since lroc does not return the SE of the AUROC, the se() option of mim, category(combine) is omitted, and only the mean AUROC is reported.{p_end}
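
{pmore}The combination performed by mim, category(combine) in the examples above can also be reproduced by hand with the j(#) replay loop described under Replay of estimation results. The following is a minimal sketch, not mim's own code. It assumes the same dataset and logistic model as that loop and takes the coefficient of x1 as the scalar of interest. The scalar names beginning with rr_ are hypothetical. With m imputations, estimates Q_j and squared standard errors U_j, Rubin's rules give:{p_end}

{pmore}1. the combined estimate Qbar (the mean of the Q_j);{p_end}
{pmore}2. the within-imputation variance W (the mean of the U_j);{p_end}
{pmore}3. the between-imputation variance B = sum_j (Q_j - Qbar)^2/(m - 1);{p_end}
{pmore}4. the total variance T = W + (1 + 1/m)B.{p_end}

{pmore}The sketch uses Rubin's large-sample degrees of freedom, (m - 1)[1 + W/((1 + 1/m)B)]^2. mim's own degrees of freedom may differ (see the remark at the start of this section), so the confidence interval need not agree exactly with mim's output.{p_end}

{pmore}. use mymimdataset2, clear{p_end}
{pmore}. mim: logit y x1 x2{p_end}
{pmore}. local levels `"`e(MIM_levels)'"'{p_end}
{pmore}. local m : word count `levels'{p_end}
{pmore}. * accumulate the estimates and squared SEs of _b[x1] (rr_* names are hypothetical){p_end}
{pmore}. scalar rr_Qsum = 0{p_end}
{pmore}. scalar rr_Usum = 0{p_end}
{pmore}. foreach j of local levels { {p_end}
{pmore}.     quietly mim, j(`j'){p_end}
{pmore}.     scalar rr_Q`j' = _b[x1]{p_end}
{pmore}.     scalar rr_Qsum = rr_Qsum + _b[x1]{p_end}
{pmore}.     scalar rr_Usum = rr_Usum + _se[x1]^2{p_end}
{pmore}. }{p_end}
{pmore}. * clear the single-imputation estimates from e(b), e(V); this also redisplays mim's combined estimates{p_end}
{pmore}. mim, clearbv{p_end}
{pmore}. * Rubin's rules: combined estimate, within- and between-imputation variance{p_end}
{pmore}. scalar rr_Qbar = rr_Qsum/`m'{p_end}
{pmore}. scalar rr_W = rr_Usum/`m'{p_end}
{pmore}. scalar rr_B = 0{p_end}
{pmore}. foreach j of local levels { {p_end}
{pmore}.     scalar rr_B = rr_B + (rr_Q`j' - rr_Qbar)^2/(`m' - 1){p_end}
{pmore}. }{p_end}
{pmore}. scalar rr_T = rr_W + (1 + 1/`m')*rr_B{p_end}
{pmore}. * Rubin's large-sample degrees of freedom; missing if rr_B is 0{p_end}
{pmore}. scalar rr_df = (`m' - 1)*(1 + rr_W/((1 + 1/`m')*rr_B))^2{p_end}
{pmore}. display "estimate = " rr_Qbar "   SE = " sqrt(rr_T) "   df = " rr_df{p_end}
{pmore}. display "95% CI: " rr_Qbar - invttail(rr_df, 0.025)*sqrt(rr_T) " to " rr_Qbar + invttail(rr_df, 0.025)*sqrt(rr_T){p_end}

{pmore}As stated earlier in this help file, mim, clearbv redisplays the overall (Rubin's rules) estimates after j(#) has been used. You can therefore compare the hand-computed estimate and SE for x1 directly with mim's own output.{p_end}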
{pstd}4. Combining estimates of a parameter from a multi-equation model{p_end}

{pmore}This is purely a pedagogic example, since mim reports combined results for all parameters of a multi-equation model anyway:{p_end}

{phang2}. mim, cat(combine) est([ln_p]_b[_cons]) se([ln_p]_se[_cons]) : streg x1 x2, distribution(weibull){p_end}

Authors

{pstd}John C. Galati & John B. Carlin, Clinical Epidemiology & Biostatistics Unit, Murdoch Children’s Research Institute & University of Melbourne{break}
john.carlin@mcri.edu.au{p_end}

{pstd}Patrick Royston, MRC Clinical Trials Unit, London.{break}
pr@ctu.mrc.ac.uk{p_end}

References

{phang}Carlin JB, Galati JC and Royston P. 2008. A new framework for managing and analyzing multiply imputed data in Stata. Stata Journal 8(1): 49-67.{p_end}

{phang}Carlin JB, Li N, Greenwood P and Coffey C. 2003. Tools for analyzing multiple imputed datasets. Stata Journal 3(3): 226-244.{p_end}

{phang}Efron B and Gong G. 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37: 36-48.{p_end}

{phang}Li KH, Raghunathan TE and Rubin DB. 1991. Large-sample significance levels from multiply-imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association 86: 1065-1073.{p_end}

{phang}Royston P. 2004. Multiple imputation of missing values. Stata Journal 4(3): 227-241.{p_end}

{phang}Royston P. 2005. Multiple imputation of missing values: update. Stata Journal 5(2): 188-201.{p_end}

{phang}Royston P. 2005. Multiple imputation of missing values: update of ice. Stata Journal 5(4): 527-536.{p_end}

{phang}Royston P. 2007. Multiple imputation of missing values: further update of ice, with an emphasis on interval censoring. Stata Journal 7(4): 445-464.{p_end}

{phang}Royston P, Carlin JB and White IR. 2009. Multiple imputation of missing values: new features for mim. Stata Journal, to appear.{p_end}

Also see

{pstd} Online:  help for mim, mimstack, mi