-------------------------------------------------------------------------------
help for  inorm                                                  John C. Galati
-------------------------------------------------------------------------------

Multiple-imputation via data-augmentation under a multivariate normal model

inorm em varlist [if exp] [in range] [, xvars(varlist) maxits(#) criterion(#) ridge(#) mu(string) sigma(string) echo mata ]

inorm da varlist [if exp] [in range] using filename [, xvars(varlist) m(#) its(#) burnin(#) ridge(#) mu(string) sigma(string) seed1(#) seed2(#) replace mata ]

Description

inorm creates multiple imputed copies of an incomplete dataset under a multivariate normal model. It consists of two subcommands: em, which provides initial estimates via the EM algorithm, and da, which performs imputation via data augmentation. The em subcommand returns its estimates in the matrices r(mu) and r(sigma), which give a starting point for the data-augmentation stage. After imputation with the da subcommand, the imputed datasets are stored in long format in a single Stata dataset specified by filename, suitable for analysis using the prefix command mim (if installed).
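
The two-stage workflow described above might look like the following minimal sketch. The variable names (y1, y2, y3), the matrix names (mu0, sigma0) and the filename imputed are hypothetical. Passing the em estimates explicitly through mu() and sigma() is an assumption made for illustration; the Examples section calls da directly after em without these options.

    * hypothetical incomplete variables y1 y2 y3
    * estimate starting values by EM and keep copies of the returned matrices
    inorm em y1 y2 y3
    matrix mu0 = r(mu)
    matrix sigma0 = r(sigma)

    * draw 5 imputed datasets, starting data augmentation from the EM estimates
    inorm da y1 y2 y3 using imputed, m(5) mu(mu0) sigma(sigma0) seed1(12345) replace

    * the imputed copies are stacked in one file; _mj is the identifier used in the Examples
    use imputed, clear
    tabulate _mj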

This command is a Stata implementation of the Windows freeware NORM, written by Joe Schafer (http://www.stat.psu.edu/~jls/misoftwa.html#mi), and is based on original Fortran code provided by him. For further details the user is referred to JL Schafer, Analysis of Incomplete Multivariate Data, Chapman & Hall 1997.

inorm is intended to run in either of two modes, one using a "plug-in" software component that requires a Windows environment, and the other using Mata code that should be platform-independent. The Mata version appears to be slow and it has not been extensively tested.

Options

burnin specifies an initial number of iterations to perform before commencing draws of the imputed datasets. The default is zero. In general, burnin + its iterations will be performed prior to drawing the set of imputations for the first dataset, and its iterations will be performed between each subsequent draw of a set of imputed values for the dataset.

criterion specifies a convergence criterion for the em algorithm. The default is "0.000001", meaning the algorithm terminates when either the maxits number of iterations has been performed, or no entry in either the coefficient vector or covariance matrix estimate changes by more than 0.000001 times the previous value from one iteration to the next.

echo specifies that an estimate of the loglikelihood at each em iteration should be echoed to the screen.

its specifies the number of iterations between draws of imputed datasets. The default is 50.

m specifies the number of imputed copies of the dataset to be created. The default is 2.

mata specifies that the Mata version of inorm should be used in preference to the plugin version. This choice applies on Windows only; on other operating systems the plugin version is not available.

maxits specifies the maximum number of iterations of the em algorithm to be performed. The default is 1000 iterations.

mu specifies the name of a Stata matrix containing an initial estimate for the mean of the multivariate normal model.

replace specifies that filename may be overwritten if it exists.

ridge specifies a ridge hyperparameter for the data-dependent ridge prior described in JL Schafer, Analysis of Incomplete Multivariate Data, Chapman & Hall 1997.

seed1 specifies a first seed for the random number generator.

seed2 specifies a second seed for the random number generator.

sigma specifies the name of a Stata matrix containing an initial estimate of the covariance matrix for the multivariate normal model.

xvars gives a list of covariates that are completely observed on the estimation subsample (defined by the optional if or in clauses). The variables in varlist are modelled using a multivariate normal distribution conditional on the values of the xvars variables.
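
To make the iteration schedule implied by burnin, its and m concrete, the following sketch works through the arithmetic stated in the burnin description above. It does not call inorm. The option values (burnin 100, its 50, m 5) and the local macro names are chosen purely for illustration.

    * hypothetical option values, for illustration only
    local burnin 100
    local its 50
    local m 5

    * imputation j is drawn after burnin + j*its data-augmentation iterations
    forvalues j = 1/`m' {
        local draw_at = `burnin' + `j'*`its'
        display "imputation `j' drawn after iteration `draw_at'"
    }

    * total number of iterations performed over the whole run
    display "total iterations: " `burnin' + `m'*`its'

Under this reading, the total run length grows linearly in m, so doubling the number of imputations roughly doubles the time spent in the da stage once burn-in is complete.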

Examples

We provide an example analysis that reproduces results displayed by Paul Allison in his monograph "Missing Data" (Sage Publications, 2002). (The table references below correspond to tables in Chapters 4 and 5 of the book.) We are grateful to Rodrigo Alfaro of the Central Bank of Chile for suggesting the use of this example.

    * Table 4.3
    use http://www.ats.ucla.edu/stat/examples/md/usnews.dta, clear
    inorm em _all
    mat std = diag(vecdiag(r(sigma)))
    mat std = vecdiag(cholesky(std))
    mat T4_3 = r(mu)', std'
    mat list T4_3

    * Table 4.4
    mat T4_4 = r(rho)
    mat list T4_4, f(%4.3f)

    * Tables 5.3, 5.4
    inorm da _all using tmp, m(5) seed1(20081003) replace

    use tmp, clear
    * first look at regression results in each imputed dataset
    forvalues k=1/5 {
        quietly reg gradrat csat lenroll private stufac rmbrd if _mj==`k'
        est store T`k'
    }
    est table T*, se b(%6.3f)

    * obtain combined MI estimates cf. Tables 5.3, 5.4
    mim: reg gradrat csat lenroll private stufac rmbrd
    mim, mcerror

    * compare with results using larger number of imputations to stabilise MC error
    use http://www.ats.ucla.edu/stat/examples/md/usnews.dta, clear
    inorm em _all
    inorm da _all using tmp, m(20) seed1(20081013) replace
    use tmp, clear
    mim: reg gradrat csat lenroll private stufac rmbrd
    mim, mcerror

Also see

Online: help for mim