-------------------------------------------------------------------------------
help mltcooksd                             Katja Moehring and Alexander Schmidt
-------------------------------------------------------------------------------

Cook's D and DFBETAs after mixed models (beta version)

Syntax

mltcooksd [, keepvar(prefix) counter graph slabel fixed random approx]

mltcooksd is part of the mlt (multilevel tools) package.

Description

mltcooksd estimates Cook's D and DFBETAs for the level-two units in two-level mixed models fitted with xtmixed, xtmelogit or xtmepoisson (Stata version 11 or above). Cook's D describes the influence that excluding a single level-two unit has on the estimated model parameters. DFBETAs describe the influence that a single level-two unit has on the estimated coefficient of each independent variable in the model.

By default, mltcooksd reports Cook's D for the whole model (random + fixed part). The options fixed and random add separate estimates of Cook's D for the fixed and the random part of the model, respectively. See Snijders and Berkhof (2008: 158) for the formulas of Cook's D.

For models with a random part, Cook's D and DFBETAs cannot be computed from the matrices stored after the regression. mltcooksd therefore takes an empirical approach: it re-estimates the model once for each level-two unit, excluding that unit, and calculates Cook's D and DFBETAs from this series of models. This approach follows Van der Meer et al. (2006).
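The leave-one-out idea can be sketched by hand as follows. This is only an illustration of the principle, not the code used inside mltcooksd. It assumes that the example data set is loaded, that Country is a numeric identifier with integer values, and it uses a hypothetical name prefix LOO so that models stored by mltcooksd are not overwritten.

```stata
* Sketch only: refit the model once per level-two unit, leaving that unit out.
* Assumes Country is numeric with integer values; LOO is a hypothetical prefix.
quietly xtmixed gr_incdiff sex age incperc rgdppc gini || Country: , mle var
scalar b_gini_full = _b[gini]

quietly levelsof Country, local(units)
foreach j of local units {
    quietly xtmixed gr_incdiff sex age incperc rgdppc gini if Country != `j' || Country: , mle var
    estimates store LOO`j'
    * Raw change in one fixed-effect coefficient when unit j is excluded
    display "without unit `j': change in b[gini] = " _b[gini] - b_gini_full
}
```

The displayed differences are raw changes in one coefficient; they are not yet standardized and are therefore not the DFBETAs that mltcooksd reports.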

mltcooksd displays and applies cutoff values for Cook's D and DFBETAs. These cutoff values are based on Belsley et al. (1980: 13). The cutoff value for Cook's D is 4/n and the cutoff value for DFBETAs is 2/sqrt(n), where n is the number of level-two units.
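As a worked illustration of the two cutoff rules, the following lines compute them for a hypothetical number of level-two units (n = 25 is an assumption for illustration, not the number of countries in the example data).

```stata
* Hypothetical example: n = 25 level-two units
local n = 25
* Cutoff for Cook's D: 4/n
display 4/`n'
* Cutoff for DFBETAs: 2/sqrt(n)
display 2/sqrt(`n')
```

With n = 25 the cutoffs are 0.16 for Cook's D and 0.4 for DFBETAs. Both cutoffs shrink as the number of level-two units grows.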

mltcooksd stores each estimated model. The command mltshowm produces an estimation table for all models whose Cook's D value lies above the cutoff. To display other models estimated by mltcooksd, consult the list of stored models (estimates dir). The names of all models stored by mltcooksd begin with the letters WJ, followed by the number of the left-out level-two unit; e.g., WJ1 is the model estimated without unit J=1.
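Stored models can also be inspected with Stata's standard estimates commands. The sketch below assumes that mltcooksd has already been run and that level-two units numbered 1 and 2 exist, so that the stored models WJ1 and WJ2 are available.

```stata
* List all stored estimation results, including the WJ* models
estimates dir
* Compare the models estimated without unit 1 and without unit 2
estimates table WJ1 WJ2, b(%9.3f) se(%9.3f)
```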

Options

keepvar(prefix) specifies that mltcooksd keeps the variables containing the Cook's D and DFBETAs values. You must specify a prefix, which is used in the variable names.

counter specifies that mltcooksd displays the estimated time remaining until the program finishes. Depending on the model, mltcooksd can run for a long time. The first estimate is shown after the first model has been estimated; mltcooksd then refines the estimate after each further estimation.

graph specifies that mltcooksd produces a box plot showing the distribution of DFBETAs for each independent variable in the model.

slabel suppresses the value labels of the level-two units in the graph (if specified) and in the listing of Cook's D and DFBETAs.

fixed lists Cook's D for the fixed part of the model separately.

random lists Cook's D for the random part of the model separately.

approx computes an approximation of Cook's D and DFBETAs (following Snijders and Berkhof 2008; Snijders and Bosker 1999). The approximation is much faster to compute than the full re-estimation. The option is intended for use after xtmelogit and xtmepoisson. Details: only one iteration is performed for each model, starting from the coefficient vector of the full model (one-step estimator). Further iterations are performed only if the model does not converge. mltcooksd does not use the algorithms proposed in Snijders and Berkhof (2008) (IGLS, RIGLS, Fisher scoring) but the same algorithm that was used to compute the full model (in most cases the default, Stata's modified Newton-Raphson).
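The one-step idea can be approximated by hand as sketched below. This is not the internal code of mltcooksd. The variables y, x1, x2 and id are hypothetical (a binary outcome, two covariates and a numeric level-two identifier), and the use of refineopts(iterate(0)) to skip the refinement of starting values is an assumption about how to stay as close as possible to a single iteration.

```stata
* Hypothetical variables: binary outcome y, covariates x1 x2, level-two id
quietly xtmelogit y x1 x2 || id:
* Keep the full-model coefficient vector as starting values
matrix b_full = e(b)

* Refit without unit 1 (hypothetical), starting from the full-model estimates
* and allowing a single iteration
xtmelogit y x1 x2 if id != 1 || id: , from(b_full) iterate(1) refineopts(iterate(0))
```

Stata will normally report that convergence was not achieved after a single iteration; the resulting coefficients are the one-step approximation to the leave-one-out estimates.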

Examples

Load data set (ISSP 2006)
    . net get mlt
    . use redistribution.dta

Multilevel regression of "Support for income redistribution"
    . xtmixed gr_incdiff sex age incperc rgdppc gini || Country: , mle var

Estimate Cook's D and DFBETAs (fixed and random part separately)
    . mltcooksd, fixed random counter

References

David Belsley, Edwin Kuh and Roy Welsch (1980): Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley.

ISSP (2006): International Social Survey Programme - Role of Government IV, GESIS StudyNo: ZA4700, Edition 1.0, doi:10.4232/1.4700.

Tom Snijders and Johannes Berkhof (2008): Diagnostic Checks for Multilevel Models. In Handbook of Multilevel Analysis, edited by J. De Leeuw and E. Meijer. New York: Springer.

Tom A.B. Snijders and Roel J. Bosker (1999): Multilevel Analysis. An Introduction to Basic and Advanced Multilevel Modeling. London: Sage.

Tom Van der Meer, Manfred Te Grotenhuis and Ben Pelzer (2006): Influential Cases in Multilevel Modeling: A Methodological Comment. American Sociological Review 75(1), 173-178.

Authors

Katja Moehring, GK SOCLIFE, University of Cologne, moehring@wiso.uni-koeln.de, www.katjamoehring.de.

Alexander Schmidt, GK SOCLIFE and Chair for Empirical Economic and Social Research, University of Cologne, alex@alexanderwschmidt.de, www.alexanderwschmidt.de.

Also see