-------------------------------------------------------------------------------
help for whotdeck
-------------------------------------------------------------------------------

Multiple Imputation using the Approximate Bayesian Bootstrap (hotdeck) with weights

whotdeck [varlist] [using] [if exp] [in exp] , predmis(string) [ by(varlist) store replace output noise quiet generate(varname) command(string) parms(string) impute(#) ]

Description

whotdeck tabulates the missing data patterns within the varlist. A row of data with a missing value in any of the variables in the varlist is defined as a `missing line' of data; similarly, a `complete line' is one where all the variables in the varlist contain data. The whotdeck procedure replaces the varlist variables in the `missing lines' with the corresponding values from the `complete lines'. Because missing data are imputed stochastically rather than deterministically, with sampling weights determined by a logistic regression model of the missing data process, whotdeck should be used several times within a multiple imputation sequence.
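The split into `missing lines' and `complete lines' can be illustrated outside Stata. The following is a minimal Python sketch, not whotdeck's own code; the toy data, the use of None for a missing value, and the `+`/`.` pattern notation are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy data: None marks a missing value.
rows = [
    {"y": 1, "x": 2.0},
    {"y": None, "x": 1.5},
    {"y": 0, "x": None},
    {"y": 1, "x": 0.5},
    {"y": None, "x": None},
]
varlist = ["y", "x"]


def pattern(row, varlist):
    """Return a string such as '+.' ('+' = observed, '.' = missing)."""
    return "".join("." if row[v] is None else "+" for v in varlist)


# Tabulate the missing data patterns within the varlist.
patterns = Counter(pattern(r, varlist) for r in rows)

# A missing line has a missing value in ANY varlist variable;
# a complete line has data in ALL of them.
missing_lines = [r for r in rows if any(r[v] is None for v in varlist)]
complete_lines = [r for r in rows if all(r[v] is not None for v in varlist)]

for pat, count in sorted(patterns.items()):
    print(pat, count)
print("missing lines:", len(missing_lines), "complete lines:", len(complete_lines))
```

Note that a line missing only one variable still counts as a missing line, so all of its varlist values are replaced during imputation.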

This is an adapted form of the Approximate Bayesian Bootstrap method of Rubin and Schenker (1986). First, a bootstrap sample of lines is drawn with replacement from the entire dataset. Next, the logistic regression model specified by predmis is fitted to estimate the missingness weights. Finally, nmiss lines are sampled with replacement, using these weights, from the nobs complete lines of data, where nmiss is the number of missing lines and nobs is the number of complete lines.
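The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not whotdeck's implementation: the help text does not say exactly how the fitted model is turned into sampling weights, so the sketch assumes a donor's weight is its fitted probability of being a missing line, and it assumes donors are taken from the complete lines of the bootstrap sample. All function names and the toy data are hypothetical, and plain gradient ascent stands in for a proper logit fit.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, z, steps=2000, lr=0.1):
    """Fit P(z = 1 | x) by gradient ascent; an intercept is added here."""
    k = len(X[0]) + 1
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for xi, zi in zip(X, z):
            xi1 = [1.0] + list(xi)
            p = sigmoid(sum(b * v for b, v in zip(beta, xi1)))
            for j in range(k):
                grad[j] += (zi - p) * xi1[j]
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def is_missing_line(row, varlist):
    return any(row[v] is None for v in varlist)


def impute_once(rows, varlist, predvars, rng):
    """One stochastic imputation; returns a new list of rows."""
    # Step 1: bootstrap sample of lines from the entire dataset.
    boot = [rng.choice(rows) for _ in rows]
    # Step 2: fit the missingness model on the bootstrap sample.
    X = [[r[p] for p in predvars] for r in boot]
    z = [1 if is_missing_line(r, varlist) else 0 for r in boot]
    beta = fit_logistic(X, z)
    # Step 3: weighted draw, with replacement, of nmiss donors from the
    # complete lines. ASSUMPTION: donor weight = fitted P(missing line).
    donors = [r for r in boot if not is_missing_line(r, varlist)]
    if not donors:
        raise ValueError("no complete lines in the bootstrap sample")
    weights = [
        max(sigmoid(beta[0] + sum(b * r[p] for b, p in zip(beta[1:], predvars))), 1e-12)
        for r in donors
    ]
    recipients = [i for i, r in enumerate(rows) if is_missing_line(r, varlist)]
    drawn = rng.choices(donors, weights=weights, k=len(recipients))
    # Step 4: copy the donor's varlist values into each missing line.
    out = [dict(r) for r in rows]
    for i, donor in zip(recipients, drawn):
        for v in varlist:
            out[i][v] = donor[v]
    return out


# Hypothetical toy data: y is missing more often when sex == 1.
rng = random.Random(2024)
rows = []
for _ in range(60):
    sex = rng.randint(0, 1)
    row = {"sex": sex, "age": rng.uniform(2.0, 7.0),  # age in decades
           "y": rng.randint(0, 1), "x": rng.gauss(0.0, 1.0)}
    if rng.random() < 0.15 + 0.30 * sex:
        row["y"] = None
    rows.append(row)

imputed = impute_once(rows, ["y", "x"], ["sex", "age"], rng)
# No missing lines should remain after imputation.
print(sum(is_missing_line(r, ["y", "x"]) for r in imputed))
```

Calling impute_once repeatedly with the same random generator yields different imputed datasets, which is what makes the procedure suitable for multiple imputation.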

One major assumption of the whotdeck procedure is that the missing data are either missing completely at random (MCAR) or missing at random (MAR) conditional on the logistic regression model of the missingness mechanism specified in the predmis option. Additionally, the sampling can be stratified using the by option.

If a dataset contains a multivariate missingness pattern then it may contain very few complete lines of data. The whotdeck procedure will not work very well in such circumstances. There are more elaborate methods that replace only the missing values, rather than the whole row, with imputed values. These multivariate multiple imputation methods are discussed by Schafer (1997).

Options

predmis(string) specifies the linear predictor of the missingness model. This option is required.

by(varlist) specifies categorical variables defining strata within which the imputation is to be carried out.

store specifies whether the imputed datasets are saved to disk.

using specifies the root of the imputed datasets' filenames. The default is "imp" and hence the datasets will be saved as imp1.dta, imp2.dta, ....

command(string) specifies the analysis performed on every imputed dataset.

noise specifies whether the individual analyses are displayed. By default only the combined estimates are displayed.

parms(string) specifies the parameters of interest from the analysis. If the command is a regression command then the parameter list can include a subset of the variables specified in the regression command. The final output consists of the combined estimates of these parameters. For non-standard commands that are still "regression" commands, the parms option looks at the estimation matrix e(b) and uses its column names to identify the coefficients of interest.

impute(#) specifies the number of imputations to be used.
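The options above refer to combined estimates without stating the combination rule. The following Python sketch assumes the standard multiple imputation rules (Rubin's rules), which may differ in detail from what whotdeck computes; the function name and the input numbers are hypothetical.

```python
import math
from statistics import mean, variance


def combine_estimates(estimates, std_errors):
    """Combine one parameter's estimates across m imputed datasets.

    ASSUMPTION: standard Rubin's rules. Returns (combined estimate,
    combined standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # combined point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical coefficients for x from impute(5) analyses.
est, se = combine_estimates([0.52, 0.47, 0.55, 0.50, 0.49],
                            [0.11, 0.12, 0.10, 0.11, 0.12])
print(round(est, 3), round(se, 3))
```

Under this rule the combined standard error is never smaller than the root of the average within-imputation variance, since the between-imputation term reflects the extra uncertainty due to the missing data.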

Examples

whotdeck y x, predmis(sex age) command(logit y x) parms(x _cons) impute(5)

Do not save the imputed datasets, but carry out a logistic regression on each imputed dataset and display the combined coefficients for x and the constant term of the model. Use the variables age and sex in the model for predicting missingness.

Author

Adrian Mander, Glaxo Smithkline, Harlow, UK.

Email: adrian.p.mander@gsk.com

Also see

On-line: help for hotdeck (if installed).