{smcl} {* 12feb2007}{...} {hline} help for {hi: mimstack} {right:JC Galati, P Royston & JB Carlin} {hline} {title:Title} {p 8 8 2} {cmd:mimstack} {hline 2} A command for stacking a multiply-imputed dataset into the format required by {help mim}. {title:Syntax} {p 8 17 2} {cmd:mimstack}{cmd:,} {cmd:m(}{it:#}{cmd:)} {cmdab:so:rtorder(}{it:varlist}{cmd:)} {cmd:istub(}{it:string}{cmd:)} | {cmd:ifiles(}{it:string}{cmd:)} [ {cmd:nomj0} {cmd:clear} ] {title:Description} {p 4 4 2} {cmd:mimstack} is a utility command which transforms a multiply-imputed dataset stored in seperate files into the format required by {help mim}. {title:Options} {p 4 8 2} {cmd:m} specifies the number of imputed datasets {p 4 8 2} {cmdab:so:rtorder} specifies a list of one or more variables that uniquely identify the observations in each of the datasets to be stacked {p 4 8 2} {cmd:istub} specifies a list of `m'+1 datasets to be stacked, `istub'0.dta, `istub'0.dta, ..., `istub'`m'.dta, where `istub'0.dta contains the original data with missing values and the remaining files contain the `m' imputed copies of the data, unless the {cmd:nomj0} option is specified, in which case only the imputed datasets are stacked into {cmd:mim}-compatible format {p 4 8 2} {cmd:ifiles} specifies a space-separated list of `m'+1 datasets to be stacked, with the first filename specifying the file containing the original data with missing values, and the remaining `m' filenames specifying the files containind the imputed datasets, unless the {cmd:nomj0} option is specified, in which case only the `m' files containing the imputed data are specified {p 4 8 2} {cmd:nomj0} specifies that the original data is not to be stacked with the imputed datasets {p 4 8 2} {cmd:clear} allows the current dataset to be discarded {title:Remarks} {p 4 4 2} {it:For users:} {p 4 4 2} The first use of {cmd:mimstack} addresses the problem where the user has some number of imputed copies of a dataset (10 for example), possibly generated using some other software, and these are stored in separate data files, say "myfile0.dta", "myfile1.dta", ..., "myfile10.dta", where "myfile0.dta" contains the original data with missing values. The datasets are stacked into a {cmd:mim}-compatible dataset using the {cmd:istub} option as follows {p 4 4 2} {cmd:. mimstack, m(10) so("idno") istub(myfile) clear} {p 4 4 2} Here {cmdab:so:rtorder} must contain a list of variables that uniquely identify the observations in each of the datasets. Upon succesful completion of the stack operation, the resulting {help mim} compatible dataset is the current dataset in memory. Note that the {cmd:clear} option is only required if the current data has not been saved, but is to be discarded. If the user optionally wishes to stack only the imputed datasets (a practice that is strongly discouraged by the authors, since details of the values that were originally missing will be lost), then this is done using the nomj0 option as follows {p 4 4 2} {cmd:. mimstack, m(10) so("idno") nomj0 istub(myfile)} {p 4 4 2} In this case the file "myfile0.dta" is not required. Note that certain {help mim} functions ({cmd:predict}, for example) require that the original data be stored with the imputed data in the stacked dataset. {p 4 4 2} {it:For programmers:} {p 4 4 2} While it is desirable to do as much processing of a mim dataset with the complete stacked dataset in memory, there are occasions where one may have valid reasons for needing to process each of the imputed datasets separately in memory (this is how the {cmd:manip} option of {help mim} is implemented, for example). In this scenario, {cmd:mimstack} may be used to restack the individual datasets post processing using the {cmd:ifiles} option, assuming that the changes to the individual datasets have been captured in separate temporary files and the {cmd:_mi} variable has been regenerated in each dataset to reflect any changes in the number of observations in the datasets: {p 4 4 2} {cmd:. mimstack, m(`m') so("_mi") ifiles(`"`ifilelist'"') clear} {p 4 4 2} Here {it:m} is the number of imputed datasets in the stack and {it:ifilelist} is a local macro containing the list of {it:m+1} temporary filenames, with each filename being separated by a space, and, if the path or names of the temporary files contain spaces, each filename enclosed in compound quotes. {p 4 4 2} Note that while the most natural approach to handling the above type of processing is by looping over the individual _mj values, and at each iteration, preserving the mim dataset, keeping the individual dataset required, processing it, saving the results to a temporary file and then restoring; for large m this is very inefficient. A preferable method is to temporarily save the mim dataset to disk and then read in each individual dataset one at a time via the {cmd:using} clause of the {cmd:use} command as follows: {p 4 4 2} {cmd:. quietly levelsof _mj, local(levels)} {p_end} {p 4 4 2} {cmd:. local m : word count `levels'} {p_end} {p 4 4 2} {cmd:. tempfile mimfile} {p_end} {p 4 4 2} {cmd:. quietly save `mimfile'} {p_end} {p 4 4 2} {cmd:. foreach j of local levels {c -(}} {p_end} {p 4 4 2} {cmd:.}{space 5}{cmd:use _all if _mj==`j' using `mimfile', clear} {p_end} {p 4 4 2} {cmd:.}{space 5}{cmd:...} {p_end} {p 4 4 2} {cmd:.}{space 5}{cmd:tempfile tfile`j'} {p_end} {p 4 4 2} {cmd:.}{space 5}{cmd:quietly save `tfile`j''} {p_end} {p 4 4 2} {cmd:.}{space 5}{cmd:local ifilelist `"`ifilelist' `tfile`j''"'} {p_end} {p 4 4 2} {cmd:. {c )-}} {p_end} {p 4 4 2} {cmd:. mimstack `mimfile', m(`m') so("_mi") ifiles(`"`ifilelist'"') clear} {p_end} {title:Also see} {p 4 4 2} Online: help for {help mim} . {p_end}