Test the impact of sampling weights in regression analysis
Syntax
wgttest depvar [varlist] [if exp] [in range], wgt(wgtvar) [cmd(estimation_command) prefix(string) testopt(test_options) nonoise estimation_options]
by ... : may be used with wgttest; see help by.
Description
Whether to use sampling weights (pweights) in regression analysis should be carefully evaluated. Often, the weights have little influence on the parameter estimates (see e.g. Winship and Radbill, 1994, to learn when and why). In such cases, unweighted estimates are preferable because they are more efficient than the weighted estimates.
wgttest performs a test proposed by DuMouchel and Duncan (1983) to evaluate the significance of the impact of sampling weights on estimation results. First, a regression model of depvar on varlist is estimated that includes wgtvar and the first-order interactions between varlist and wgtvar as additional covariates. Second, the coefficients of these additional covariates (i.e. wgtvar and the interactions) are jointly tested against zero using a standard F test (as provided by the post-estimation command test). "If the F test is not significant, then the weighted and unweighted estimates are not significantly different and the analyst can proceed by using unweighted OLS. Weighted and unweighted estimates are significantly different if the F test is significant" (Winship and Radbill, 1994: 248). In the latter case, the unweighted estimators are probably biased by sample selection and the weighted estimators are preferable. Be aware, however, that significant differences between weighted and unweighted estimates may also be due to model misspecification.
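The two steps described above can also be carried out by hand, which may help clarify what is being tested. The following is a minimal sketch, not the internal code of wgttest. It assumes a dataset with the variables wage, education, experience, and pwt; the interaction variable names pwt_educ and pwt_exp are hypothetical and chosen only for illustration.

```stata
* Assumed variables: wage (outcome), education and experience (covariates),
* pwt (sampling weight). pwt_educ and pwt_exp are hypothetical names.

* Step 1: build the first-order interactions between the covariates and the weight
generate double pwt_educ = pwt * education
generate double pwt_exp  = pwt * experience

* Fit the unweighted model with the weight and the interactions as extra covariates
regress wage education experience pwt pwt_educ pwt_exp

* Step 2: joint F test that the weight and all interactions have zero coefficients
test pwt pwt_educ pwt_exp

* Remove the helper variables again
drop pwt_educ pwt_exp
```

Note that the regression itself is unweighted: the weight enters only as a covariate, so the F test asks whether the coefficients vary with the weight.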
Options
cmd(estimation_command) allows users to choose a command other than regress for model estimation. Technically, wgttest will work with most, if not all, estimation commands. This does not mean, however, that the test is always valid: DuMouchel and Duncan (1983), who proposed the test, discuss it solely within the framework of linear regression.
nonoise suppresses the estimation results.
prefix(string) allows users to choose a prefix other than _I for the interaction variables. The length of the prefix is restricted to 4 characters. Note that the interaction variables will only be created temporarily.
testopt(test_options) may be used to pass options through to the post-estimation command test.
wgt(wgtvar) specifies the sampling weights (mandatory).
estimation_options are passed through to the estimation command.
Examples
Test the impact of the sampling weights (variable pwt) for a linear regression model of wages on education and work experience:
. wgttest wage education experience, wgt(pwt)
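A possible follow-up workflow is sketched below. It uses only the nonoise option documented above and the standard regress command, and it assumes the same variables as the example (wage, education, experience, pwt). Which model to report remains a judgment call that also depends on the model specification.

```stata
* Show only the test result and suppress the auxiliary regression output
wgttest wage education experience, wgt(pwt) nonoise

* If the F test is significant, the weighted estimates may be preferable
regress wage education experience [pweight = pwt]

* If the F test is not significant, the unweighted estimates may be used
regress wage education experience
```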
References
DuMouchel, W. H. and G. J. Duncan (1983). Using Sample Survey Weights in Multiple Regression Analyses of Stratified Samples. Journal of the American Statistical Association 78: 535-543.
Winship, C. and L. Radbill (1994). Sampling Weights and Regression Analysis. Sociological Methods and Research 23: 230-257.
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see
Manual: [U] 23 Estimation and post-estimation commands, [U] 29 Overview of Stata estimation commands, [R] test, [R] regress