------------------------------------------------------------------------------- help for mimstack JC Galati, P Royston & JB Carlin -------------------------------------------------------------------------------

Title

mimstack -- A command for stacking a multiply-imputed dataset into the format required by mim.

Syntax

mimstack, m(#) sortorder(varlist) istub(string) | ifiles(string) [ nomj0 clear ]

Description

mimstack is a utility command that transforms a multiply-imputed dataset stored in separate files into the format required by mim.

Options

m specifies the number of imputed datasets

sortorder specifies a list of one or more variables that uniquely identify the observations in each of the datasets to be stacked

istub specifies the filename stub of the `m'+1 datasets to be stacked, `istub'0.dta, `istub'1.dta, ..., `istub'`m'.dta, where `istub'0.dta contains the original data with missing values and the remaining files contain the `m' imputed copies of the data. If the nomj0 option is specified, only the `m' imputed datasets are stacked into mim-compatible format.

ifiles specifies a space-separated list of `m'+1 datasets to be stacked, with the first filename specifying the file containing the original data with missing values, and the remaining `m' filenames specifying the files containing the imputed datasets. If the nomj0 option is specified, only the `m' files containing the imputed data are specified.

nomj0 specifies that the original data is not to be stacked with the imputed datasets

clear allows the current dataset to be discarded

Remarks

For users:

The first use of mimstack addresses the problem where the user has some number of imputed copies of a dataset (10 for example), possibly generated using some other software, and these are stored in separate data files, say "myfile0.dta", "myfile1.dta", ..., "myfile10.dta", where "myfile0.dta" contains the original data with missing values. The datasets are stacked into a mim-compatible dataset using the istub option as follows

. mimstack, m(10) so("idno") istub(myfile) clear

Here sortorder must contain a list of variables that uniquely identify the observations in each of the datasets. Upon successful completion of the stack operation, the resulting mim-compatible dataset is the current dataset in memory. Note that the clear option is required only if the current data has not been saved but is to be discarded. If the user wishes to stack only the imputed datasets (a practice that is strongly discouraged by the authors, since details of the values that were originally missing will be lost), then this is done using the nomj0 option as follows

. mimstack, m(10) so("idno") nomj0 istub(myfile)

In this case the file "myfile0.dta" is not required. Note that certain mim functions (predict, for example) require that the original data be stored with the imputed data in the stacked dataset.
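
Because sortorder must uniquely identify the observations in every file, it may be worth checking this before stacking. The following is a minimal sketch, assuming the files "myfile0.dta", ..., "myfile10.dta" and the identifier idno from the example above. It uses Stata's isid command, which exits with an error if the listed variables do not uniquely identify the observations. Note that use with the clear option discards the data currently in memory.

. forvalues j = 0/10 {
.     * load each file in turn, discarding the data in memory
.     use myfile`j', clear
.     * stop with an error if idno is not a unique identifier
.     isid idno
. }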

For programmers:

While it is desirable to do as much processing of a mim dataset as possible with the complete stacked dataset in memory, there are occasions where one may have valid reasons for needing to process each of the imputed datasets separately in memory (this is how the manip option of mim is implemented, for example). In this scenario, mimstack may be used to restack the individual datasets after processing using the ifiles option, assuming that the changes to the individual datasets have been captured in separate temporary files and the _mi variable has been regenerated in each dataset to reflect any changes in the number of observations in the datasets:

. mimstack, m(`m') so("_mi") ifiles(`"`ifilelist'"') clear

Here m is the number of imputed datasets in the stack and ifilelist is a local macro containing the list of m+1 temporary filenames, separated by spaces. If the path or names of the temporary files contain spaces, each filename must be enclosed in compound quotes.

Note that the most natural approach to handling the above type of processing is to loop over the individual _mj values and, at each iteration, preserve the mim dataset, keep the individual dataset required, process it, save the results to a temporary file and then restore. For large m, however, this is very inefficient. A preferable method is to temporarily save the mim dataset to disk and then read in each individual dataset one at a time via the using clause of the use command as follows:

. quietly levelsof _mj, local(levels)
. local m : word count `levels'
. tempfile mimfile
. quietly save `mimfile'
. foreach j of local levels {
.     use _all if _mj==`j' using `mimfile', clear
.     ...
.     tempfile tfile`j'
.     quietly save `tfile`j''
.     local ifilelist `"`ifilelist' `tfile`j''"'
. }
. mimstack, m(`m') so("_mi") ifiles(`"`ifilelist'"') clear
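
The processing step elided as "..." in the loop above depends on the application. As a purely illustrative sketch, suppose it drops some observations, so that _mi has to be regenerated before the dataset is saved. The variables excluded and idno are hypothetical, and renumbering _mi as the within-dataset observation number is an assumption about how _mi should be rebuilt, not something stated in this help file:

. use _all if _mj==`j' using `mimfile', clear
. * hypothetical processing step that changes the number of observations
. drop if excluded == 1
. * assumed regeneration of _mi: renumber observations within this dataset
. sort idno
. quietly replace _mi = _n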

Also see

Online: help for mim