-------------------------------------------------------------------------------
help for whotdeck
-------------------------------------------------------------------------------

Multiple Imputation using the Approximate Bayesian Bootstrap (hotdeck) with weights

whotdeck [varlist] [using] [if exp] [in exp] , predmis(string) [ by(varlist) store replace output noise quiet generate(varname) command(string) parms(string) impute(#) ]

Description

whotdeck will tabulate the missing data patterns within the varlist. A row of data with missing values in any of the variables in the varlist is defined as a `missing line' of data; similarly, a `complete line' is one where all the variables in the varlist contain data. The whotdeck procedure replaces the varlist variables in the `missing lines' with the corresponding values in the `complete lines'.

whotdeck should be used several times within a multiple imputation sequence since missing data are imputed stochastically, with weights determined by a logistic regression model of the missing data process, rather than deterministically.

This is an adapted form of the Approximate Bayesian Bootstrap method of Rubin and Schenker (1986). First, a bootstrap sample of lines is drawn with replacement from the entire dataset. Then a logistic regression model specified by predmis is fitted to estimate the missingness weights. Finally, nmiss lines are sampled, using the weights, with replacement from the nobs complete lines of data.

One major assumption of the whotdeck procedure is that the missing data are either missing completely at random (MCAR) or missing at random (MAR) conditional on the logistic regression model of the missingness mechanism specified in the option predmis. Additionally, the sampling can be stratified using the by option.

If a dataset contains a multivariate missingness pattern then it may contain very few complete lines of data. The whotdeck procedure will not work very well in such circumstances. There are more elaborate methods that replace only the missing values, rather than the whole row, with imputed values. These multivariate multiple imputation methods are discussed by Schafer (1997).
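The resampling steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the code that whotdeck runs: the function name weighted_abb_draw, the use of None to mark a missing value, and the weight_of callback are all hypothetical. In particular, this help text does not say how the fitted logistic model is turned into sampling weights, so the sketch takes each donor line's weight from a user-supplied function instead of fitting predmis. It also assumes that the donors are the complete lines that appear in the bootstrap sample.

```python
import random


def weighted_abb_draw(rows, varlist, weight_of, rng):
    """Return one imputed copy of rows (a list of dicts).

    None marks a missing value. weight_of(row) is a hypothetical stand-in
    for the weight that the fitted missingness model would give a donor.
    """
    def is_complete(row):
        return all(row[v] is not None for v in varlist)

    # Step 1: bootstrap sample of lines, with replacement, from the dataset.
    boot = [rng.choice(rows) for _ in rows]

    # Step 2 (stand-in): keep the complete lines of the bootstrap sample as
    # donors and attach a sampling weight to each one.
    donors = [r for r in boot if is_complete(r)]
    if not donors:
        raise ValueError("bootstrap sample contains no complete lines")
    weights = [weight_of(r) for r in donors]

    # Step 3: for every missing line, draw one donor with replacement using
    # the weights and copy all varlist values from it (whole-row replacement).
    imputed = []
    for row in rows:
        filled = dict(row)
        if not is_complete(row):
            donor = rng.choices(donors, weights=weights, k=1)[0]
            for v in varlist:
                filled[v] = donor[v]
        imputed.append(filled)
    return imputed
```

Calling the function several times with the same random.Random instance gives several different imputed copies, which mirrors the advice to run the procedure more than once within a multiple imputation sequence.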

Options

predmis(string) specifies the linear predictor of the missingness model.

by(varlist) specifies categorical variables defining strata within which the imputation is to be carried out.

store specifies whether the imputed datasets are saved to disk.

using specifies the root of the imputed datasets filenames. The default is "imp" and hence the datasets will be saved as imp1.dta, imp2.dta, ....

command(string) specifies the analysis performed on every imputed dataset.

noise specifies whether the individual analyses are displayed. By default the combined estimates are displayed.

parms(string) specifies the parameters of interest from the analysis. If the command is a regression command then the parameter list can include a subset of the variables specified in the regression command. The final output consists of the combined estimates of these parameters. For non-standard commands that are "regression" commands the parms option looks at the estimation matrix e(b) and requires the column names to identify the coefficients of interest.

impute(#) specifies the number of imputations to be used.
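This help text does not state how the per-imputation results are combined into the final estimates. A common choice in multiple imputation is Rubin's rules, and the Python sketch below shows that calculation for a single parameter. It assumes Rubin's rules are the relevant method; the function name combine_estimates and its arguments are hypothetical and are not part of whotdeck.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Combine one parameter across m imputed analyses with Rubin's rules.

    estimates: the point estimate from each imputed dataset.
    variances: the squared standard error from each imputed dataset.
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)          # combined point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

With impute(5), each list would hold five values per parameter named in parms(), for example the coefficient of x and its squared standard error from each of the five fits.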

Examples

whotdeck y x, predmis(sex age) command(logit y x) parms(x _cons) impute(5)

Do not save the imputed datasets, but carry out a logistic regression on each imputed dataset and display the combined coefficients for x and the constant term of the model. Use the variables age and sex in the model for predicting missingness.

Author

Adrian Mander, Glaxo Smithkline, Harlow, UK.

Email: adrian.p.mander@gsk.com

Also see

On-line: help for hotdeck (if installed).