-------------------------------------------------------------------------------
help for inorm                                                   John C. Galati
-------------------------------------------------------------------------------

Multiple-imputation via data-augmentation under a multivariate normal model

    inorm em varlist [if exp] [in range] [, xvars(varlist) maxits(#) criterion(#) ridge(#) mu(string) sigma(string) echo mata]

    inorm da varlist [if exp] [in range] using filename [, xvars(varlist) m(#) its(#) burnin(#) ridge(#) mu(string) sigma(string) seed1(#) seed2(#) replace mata]

Description

inorm is a command for creating multiple imputed copies of an incomplete dataset under a multivariate normal model. It consists of two subcommands, em and da, which provide initial likelihood estimation (em) and imputation via data-augmentation (da), respectively. The em command is used to produce initial estimates, returned in matrices r(mu) and r(sigma), that give a starting point for the data-augmentation stage. After imputation with the da command, the imputed datasets are stored in a long format, suitable for analysis using the prefix command mim (if installed), in a single Stata dataset specified by filename.

This command is a Stata implementation of the Windows freeware NORM, written by Joe Schafer (http://www.stat.psu.edu/~jls/misoftwa.html#mi), and is based on original Fortran code provided by him. For further details the user is referred to JL Schafer, Analysis of Incomplete Multivariate Data, Chapman & Hall 1997.

inorm is intended to run in either of two modes, one using a "plug-in" software component that requires a Windows environment, and the other using Mata code that should be platform-independent. The Mata version appears to be slow and it has not been extensively tested.
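
As a minimal sketch of the two-stage workflow described above, the em estimates can be copied into named matrices and passed to da through the mu() and sigma() options. The variable names y1, y2 and y3, the matrix names mu0 and sigma0, and the file name imputed are hypothetical, and it is an assumption here that r(mu) and r(sigma) can be supplied to mu() and sigma() without reshaping.

    * y1 y2 y3 are hypothetical incomplete variables
    inorm em y1 y2 y3
    * copy the em estimates before another command overwrites r()
    matrix mu0 = r(mu)
    matrix sigma0 = r(sigma)
    * start data augmentation from the em estimates
    inorm da y1 y2 y3 using imputed, mu(mu0) sigma(sigma0) m(5) replace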

Options

burnin specifies an initial number of iterations to perform before commencing draws of the imputed datasets. The default is zero. In general, burnin + its iterations will be performed prior to drawing the set of imputations for the first dataset, and its iterations will be performed between each subsequent draw of a set of imputed values for the dataset.

criterion specifies a convergence criterion for the em algorithm. The default is "0.000001", meaning the algorithm terminates when either the maxits number of iterations has been performed, or no entry in either the coefficient vector or covariance matrix estimate changes by more than 0.000001 times the previous value from one iteration to the next.

echo specifies that an estimate of the log likelihood at each em iteration should be echoed to the screen.

its specifies the number of iterations between draws of imputed datasets. The default is 50.

m specifies the number of imputed copies of the dataset to be created. The default is 2.

mata specifies that the Mata version of inorm should be used in preference to the plugin version (on Windows; on other operating systems the plugin version is not available).

maxits specifies the maximum number of iterations of the em algorithm to be performed. The default is 1000 iterations.

mu specifies the name of a Stata matrix containing an initial estimate for the mean of the multivariate normal model.

replace specifies that filename may be overwritten if it exists.

ridge specifies a ridge hyperparameter for the data-dependent ridge prior, as described in JL Schafer, Analysis of Incomplete Multivariate Data, Chapman & Hall 1997.

seed1 specifies a first seed for the random number generator.

seed2 specifies a second seed for the random number generator.

sigma specifies the name of a Stata matrix containing an initial estimate of the covariance matrix for the multivariate normal model.

xvars gives a list of covariates that are completely observed on the estimation subsample (defined by the optional if or in clauses). The variables in varlist are modelled using a multivariate normal distribution conditional on the values of the xvars variables.
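
To illustrate how several of these options combine, the sketch below first runs em with a tighter convergence criterion and a larger iteration limit, then requests 5 imputations with a burn-in of 100 iterations and 50 iterations between draws. By the rule given under burnin, the first imputed dataset is drawn after 100 + 50 = 150 iterations and each of the remaining four after a further 50, for 100 + 5 x 50 = 350 iterations in total. The variable names (y1, y2 and y3 incomplete; age and sex completely observed), the file name imputed and the seed value are hypothetical.

    * tighter em convergence, echoing the log likelihood at each iteration
    inorm em y1 y2 y3, xvars(age sex) maxits(2000) criterion(0.00000001) echo
    * y1 y2 y3 are modelled conditional on the covariates age and sex
    inorm da y1 y2 y3 using imputed, xvars(age sex) m(5) burnin(100) its(50) seed1(12345) replace
    * total number of iterations implied by burnin(100), its(50) and m(5)
    display 100 + 5*50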

Examples

We provide an example analysis that reproduces results displayed by Paul Allison in his monograph "Missing Data" (Sage Publications, 2002). (The table references below correspond to tables in Chapters 4 and 5 of the book.) We are grateful to Rodrigo Alfaro of the Central Bank of Chile for suggesting the use of this example.

    * Table 4.3
    use http://www.ats.ucla.edu/stat/examples/md/usnews.dta, clear
    inorm em _all
    mat std = diag(vecdiag(r(sigma)))
    mat std = vecdiag(cholesky(std))
    mat T4_3 = r(mu)', std'
    mat list T4_3

    * Table 4.4
    mat T4_4 = r(rho)
    mat list T4_4, f(%4.3f)

    * Tables 5.3, 5.4
    inorm da _all using tmp, m(5) seed1(20081003) replace

    use tmp, clear
    * first look at regression results in each imputed dataset
    forvalues k=1/5 {
        quietly reg gradrat csat lenroll private stufac rmbrd if _mj==`k'
        est store T`k'
    }
    est table T*, se b(%6.3f)

    * obtain combined MI estimates cf. Tables 5.3, 5.4
    mim: reg gradrat csat lenroll private stufac rmbrd
    mim, mcerror

    * compare with results using larger number of imputations to stabilise MC error
    use http://www.ats.ucla.edu/stat/examples/md/usnews.dta, clear
    inorm em _all
    inorm da _all using tmp, m(20) seed1(20081013) replace
    use tmp, clear
    mim: reg gradrat csat lenroll private stufac rmbrd
    mim, mcerror

Also see

Online: help for mim