Multiple Imputation using the Approximate Bayesian Bootstrap (hotdeck) with weights
whotdeck [varlist] [using] [if exp] [in range] , predmis(string) [ by(varlist) store replace output noise quiet generate(varname) command(string) parms(string) impute(#) ]
Description
whotdeck tabulates the missing data patterns within the varlist. A row of data with a missing value in any of the varlist variables is defined as a `missing line' of data; similarly, a `complete line' is one in which all the varlist variables contain data. The whotdeck procedure replaces the varlist variables in the `missing lines' with the corresponding values from the `complete lines'. Because missing data are imputed stochastically rather than deterministically, with sampling weights determined by a logistic regression model of the missing data process, whotdeck should be run several times within a multiple imputation sequence.
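The split into `missing lines' and `complete lines', and the tabulation of missing data patterns, can be sketched as follows. This is an illustration only, not whotdeck's code: it assumes each line holds the varlist values with None marking a missing value, and the '+'/'.' pattern encoding is a hypothetical display format rather than whotdeck's actual output.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Assumption: a line holds the varlist values; None marks a missing value.
Line = Sequence[Optional[float]]


def pattern(line: Line) -> str:
    """Encode a line as a pattern string: '+' = observed, '.' = missing."""
    return "".join("." if value is None else "+" for value in line)


def split_lines(lines: Sequence[Line]) -> Tuple[List[int], List[int]]:
    """Return the indices of the complete lines and of the missing lines."""
    complete = [i for i, line in enumerate(lines) if all(v is not None for v in line)]
    missing = [i for i, line in enumerate(lines) if any(v is None for v in line)]
    return complete, missing


def tabulate_patterns(lines: Sequence[Line]) -> Counter:
    """Count how many lines share each missing data pattern."""
    return Counter(pattern(line) for line in lines)


lines = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0)]
print(tabulate_patterns(lines))  # Counter({'++': 2, '.+': 1, '+.': 1})
print(split_lines(lines))        # ([0, 3], [1, 2])
```

Lines 1 and 2 have different patterns but are both `missing lines', so each would receive all of its varlist values from a single complete line.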
This is an adapted form of the Approximate Bayesian Bootstrap method of Rubin and Schenker (1986). First, a bootstrap sample of lines is drawn with replacement from the entire dataset. Then the logistic regression model specified by predmis() is fitted to estimate the missingness weights. Finally, nmiss lines are sampled with replacement, using these weights, from the nobs complete lines of data, where nmiss is the number of missing lines and nobs is the number of complete lines.
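The three steps above can be sketched in Python as below. The help file does not say exactly how the fitted logistic model is turned into sampling weights, so two choices here are assumptions: the model is fitted to the bootstrap sample, and each complete line is weighted by its fitted probability of being a missing line (so donors resembling the missing lines are drawn more often). The logistic fit uses plain gradient ascent purely to keep the sketch dependency-free; it is not how Stata's logit estimates the model. The predictor variables (the predmis() variables) are assumed to be fully observed.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, iters=500, lr=0.1):
    """Fit logit P(y = 1 | x) = b0 + b'x by batch gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)  # beta[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def weighted_abb(lines, predictors, rng):
    """One weighted approximate Bayesian bootstrap imputation (illustrative sketch)."""
    n = len(lines)
    is_missing = [any(v is None for v in line) for line in lines]
    # Step 1: bootstrap sample of all lines, drawn with replacement.
    boot = [rng.randrange(n) for _ in range(n)]
    # Step 2: fit the missingness model (assumed: on the bootstrap sample).
    beta = fit_logistic([predictors[i] for i in boot],
                        [1.0 if is_missing[i] else 0.0 for i in boot])
    # Step 3 (assumed weighting rule): fitted P(missing) for each complete line.
    complete = [i for i in range(n) if not is_missing[i]]
    missing = [i for i in range(n) if is_missing[i]]
    weights = [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], predictors[i])))
               for i in complete]
    # Step 4: draw nmiss donors, with replacement, from the nobs complete lines.
    donors = rng.choices(complete, weights=weights, k=len(missing))
    imputed = [list(line) for line in lines]
    for target, donor in zip(missing, donors):
        imputed[target] = list(lines[donor])  # the whole line of varlist values is replaced
    return imputed


lines = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0), (7.0, 8.0)]
predictors = [(0.0,), (1.0,), (1.0,), (0.0,), (1.0,)]  # hypothetical predmis() variable
print(weighted_abb(lines, predictors, random.Random(1)))
```

Calling weighted_abb repeatedly with the same generator gives a different imputed dataset each time, which is why the procedure is repeated within a multiple imputation sequence.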
One major assumption of the whotdeck procedure is that the missing data are either missing completely at random (MCAR) or missing at random (MAR) conditional on the logistic regression model of the missingness mechanism specified in the predmis() option. Additionally, the sampling can be stratified using the by() option.
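Stratification with by() means a missing line may only receive values from a complete line in the same stratum. A minimal sketch of that restriction follows; the per-line weights are taken as given (for example, from a missingness model), and raising an error for a stratum that has missing lines but no complete lines is an assumed behaviour, not something the help file specifies.

```python
import random
from collections import defaultdict


def stratified_donors(strata, is_complete, weights, rng):
    """Map each missing line to a donor drawn from the complete lines of its own stratum."""
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    donor_of = {}
    for members in groups.values():
        complete = [i for i in members if is_complete[i]]
        missing = [i for i in members if not is_complete[i]]
        if not missing:
            continue  # nothing to impute in this stratum
        if not complete:
            # Assumed behaviour: refuse rather than borrow donors from another stratum.
            raise ValueError("stratum has missing lines but no complete lines")
        draws = rng.choices(complete, weights=[weights[i] for i in complete], k=len(missing))
        donor_of.update(zip(missing, draws))
    return donor_of


strata = ["m", "m", "f", "f", "f"]        # e.g. the values of a by(sex) variable
is_complete = [True, False, True, True, False]
weights = [1.0, 1.0, 0.2, 0.8, 1.0]       # hypothetical missingness-model weights
print(stratified_donors(strata, is_complete, weights, random.Random(0)))
```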
If a dataset has a multivariate missingness pattern, it may contain very few complete lines of data. whotdeck will not work well in such circumstances, because every missing line must draw its imputed values from this small pool of complete lines. More elaborate methods replace only the missing values, rather than the whole row, with imputed values; these multivariate multiple imputation methods are discussed by Schafer (1997).
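To see how quickly complete lines become scarce, assume (purely for illustration) that each of k varlist variables is missing independently with the same probability p. The expected fraction of complete lines is then:

```latex
\Pr(\text{line is complete}) = \prod_{j=1}^{k} (1 - p) = (1 - p)^{k},
\qquad k = 10,\; p = 0.1:\quad (0.9)^{10} \approx 0.349
```

So even when every variable is 90% observed, only about 35% of lines would be complete, and every missing line would have to be filled from that subset.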
Options
predmis(string) is required and specifies the variables that form the linear predictor of the logistic regression model for missingness.
by(varlist) specifies categorical variables defining strata within which the imputation is to be carried out.
store specifies that the imputed datasets be saved to disk.
using specifies the root of the filenames of the imputed datasets. The default is "imp", so the datasets are saved as imp1.dta, imp2.dta, and so on.
command(string) specifies the analysis performed on every imputed dataset.
noise specifies that the output of each individual analysis be displayed. By default, only the combined estimates are displayed.
parms(string) specifies the parameters of interest from the analysis. If the command is a regression command, the parameter list can include a subset of the variables specified in the regression command. The final output consists of the combined estimates of these parameters. For non-standard estimation commands, parms() looks at the estimation matrix e(b) and requires its column names to identify the coefficients of interest.
impute(#) specifies the number of imputations to be used.
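The help text says the final output consists of combined estimates of the parms() parameters across the impute(#) analyses but does not state the combining formula. The standard choice for multiple imputation is Rubin's rules, sketched below for a single parameter; that whotdeck uses exactly these rules is an assumption. The sketch returns only the combined estimate and its standard error, omitting the degrees-of-freedom adjustment.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine one parameter's estimates from m imputed datasets with Rubin's rules.

    estimates: point estimates, one per imputation
    variances: squared standard errors, one per imputation
    Returns the combined estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # combined point estimate
    w_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, math.sqrt(total)


est, se = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(se, 4))  # 2.0 1.354
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations, so a larger impute(#) gives a slightly smaller penalty.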
Examples
whotdeck y x, predmis(sex age) command(logit y x) parms(x _cons) impute(5)
Do not save the imputed datasets, but carry out a logistic regression on each imputed dataset and display the combined coefficients for x and the constant term of the model. Use the variables sex and age in the model for predicting missingness.
Author
Adrian Mander, GlaxoSmithKline, Harlow, UK.
Email: adrian.p.mander@gsk.com
Also see
On-line: help for hotdeck (if installed).