Title
khb Decomposition of effects in non-linear probability models using the KHB-method
Syntax
khb model-type depvar key-vars || z-vars [if] [in] [weight] [ , options ]
options                Description
-------------------------------------------------------------------------
Main
  concomitant(varlist) concomitants
  disentangle          disentangle difference for each z-var
  summary              summary of decomposition
  or                   exponentiated coefficients
  vce(vcetype)         vcetype may be robust, cluster clustvar
  ape                  decomposition using average partial (marginal) effects
  continuous           treat dummy variable as continuous when using ape-method
  notable              suppress coefficient table
  verbose              show restricted and full model
  keep                 keep residuals of z-vars
  xstandard            standardize key-vars
  zstandard            standardize z-vars

Model-type specific
  outcome(outcome)     outcome used for decomposition when model-type is mlogit
  baseoutcome(#)       value of depvar that will be the base outcome when model-type is mlogit
  group(varname)       necessary option for model-types rologit and clogit; see help of these models
  other                all options allowed for the specified model-type
-------------------------------------------------------------------------
model-type can be any of regress, logit, ologit, probit, oprobit, cloglog, slogit, scobit, rologit, clogit, xtlogit, xtprobit and mlogit. Other models might also produce output but for the time being this output is considered to be "experimental".
depvar is the name of the dependent variable, key-vars is a varlist holding the name(s) of the variable(s) to be decomposed, and z-vars is a varlist holding the name(s) of control variables of interest.
Factor variables are allowed for key-vars, except when option xstandard is specified. Factor variables for z-vars are allowed only in Stata 12 or higher.
aweights, fweights, iweights, and pweights are allowed if they are allowed in the specified model-type; see weight.
Description
khb applies the KHB method, developed to compare the estimated coefficients of two nested non-linear probability models (Karlson/Holm/Breen 2011; Breen/Karlson/Holm 2010). An important use of the technique is to decompose the total effect of a variable into a direct part and an indirect or spurious part. The method was developed for binary logit and probit models, but this command also covers other nonlinear probability models (ordered and multinomial) and linear regression. Contrary to other decomposition methods, the KHB-method gives unbiased decompositions, decomposes effects of both discrete and continuous variables, and provides analytically derived statistical tests for many models of the GLM family.
In linear regression models, decomposing the total effect into direct and indirect/spurious effects is straightforward. The decomposition is done by comparing the estimated coefficient of a key variable of interest (key-var) between a reduced model without a control variable Z and a full model with one or more Z-variables added. The difference between the estimated coefficients of the key-variable in the two models expresses the amount by which the effect of the key-variable is confounded by the z-variable(s). If the control variable is hypothesized to be a consequence of the key-variable, the difference is commonly termed the "indirect effect"; if the control variable is hypothesized to be a cause of the key-variable, the difference is termed the "spurious effect".
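As a rough illustration of this linear decomposition, the following sketch uses auto.dta, which ships with Stata. Treating weight as the key-var and length as the z-var is an arbitrary assumption made only for this example; the scalar names are likewise hypothetical. In OLS the difference between the reduced and the full coefficient equals the product of the z-var's coefficient in the full model and the key-var's coefficient in an auxiliary regression of the z-var on the key-var.

```stata
* Illustration only: variable roles are arbitrary, not taken from the KHB papers
sysuse auto, clear

* Reduced model: key-var only, gives the total effect
regress price weight
scalar b_total = _b[weight]

* Full model: key-var plus z-var, gives the direct effect
regress price weight length
scalar b_direct = _b[weight]
scalar b_z      = _b[length]

* Auxiliary regression of the z-var on the key-var
regress length weight
scalar a_zx = _b[weight]

* The difference between reduced and full coefficient is the
* indirect/spurious part; in OLS it equals a_zx * b_z
display "total    = " scalar(b_total)
display "direct   = " scalar(b_direct)
display "indirect = " scalar(b_total) - scalar(b_direct)
display "a * b_z  = " scalar(a_zx) * scalar(b_z)
```

The last two displayed values should agree up to numerical precision, because all three regressions use the same observations.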
The strategy described for linear models cannot be used in the context of nonlinear probability models such as logit and probit, because the estimated coefficients of these models are not comparable between different models. The reason is a rescaling of the model induced by a property of these models: the coefficients and the error variance are not separately identified. The KHB-method solves this problem. It allows the comparison of effects of nested models for many models of the GLM framework, including logit, probit, ologit, oprobit, and mlogit. The basic idea of the method is to compare the full model with a reduced model that substitutes some Z-variables by the residuals of the Z-variables from a regression of the Z-variables on the key-vars (see Karlson/Holm/Breen 2011 for explanations and details). The method consequently allows separation of the change in the coefficient that is due to confounding and the change that is due to rescaling.
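The residualization idea can be sketched by hand, without khb. The example below again assumes auto.dta (shipped with Stata), with foreign as a binary outcome, weight as the key-var, and mpg as the z-var; these roles and all scalar names are hypothetical choices for illustration. It is not khb's implementation and computes no standard errors.

```stata
* Illustration only: variable roles are arbitrary choices for this sketch
sysuse auto, clear

* Step 1: residualize the z-var on the key-var
regress mpg weight
scalar a_zx = _b[weight]
predict double mpg_res, residuals

* Step 2: full model with the original z-var
logit foreign weight mpg
scalar b_full  = _b[weight]
scalar b_z     = _b[mpg]
scalar ll_full = e(ll)

* Step 3: reduced model with the residualized z-var in place of the z-var
logit foreign weight mpg_res
scalar b_reduced  = _b[weight]
scalar ll_reduced = e(ll)

* Both models fit equally well, so their coefficients share one scale
display "log likelihoods: " scalar(ll_full) " and " scalar(ll_reduced)
display "difference due to confounding = " scalar(b_reduced) - scalar(b_full)
display "a_zx * b_z                    = " scalar(a_zx) * scalar(b_z)
```

Because the reduced model is only a reparameterization of the full model, the two log likelihoods coincide and the difference in the key-var's coefficient reflects confounding rather than rescaling.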
The KHB-method also allows the inclusion of variables that control for confounding influences on the decomposition. These variables are named concomitants in Karlson/Holm (2011) and Breen/Karlson/Holm (2010). They do not play the role of Z-variables in the y*-x relationship; rather, they are included to ensure that the effects in both the full model and the reduced model are not confounded by them.
The KHB-method is primarily intended to be used for various variants of logit and probit models. However, it can be also used for linear regression, in which case it returns the same results as the standard technique. khb is then just a convenient way to do the decomposition with one single command.
Note that using regress as model-type for binary dependent variables boils down to using a linear probability model for the decomposition. However, the interpretation of decompositions in linear probability models is unknown, and may not reflect the parameters of interest (in particular the indirect effect). Caution should consequently be exercised, and the authors do not recommend using khb for linear probability models until the properties of these models have been explored formally.
A worked example using khb appears in Breen/Karlson/Holm (2010).
Options
summary requests the provision of a decomposition summary for all key-vars. By default, khb reports the effects of all key variables along with standard errors in terms of the estimated coefficients. With option summary, khb also presents a table holding the "confounding ratios", the "percentage reduction due to confounding" and the "rescale factor". The confounding ratio measures the impact of confounding net of rescaling. The percentage reduction measures the percentage change in the coefficient of each key-var attributable to confounding net of scaling. Finally, the rescale factor measures the impact of rescaling, net of confounding.
disentangle requests a table that shows how much of the difference between the full and reduced model is contributed by each single z-variable.
notable suppresses the display of the coefficient table. It is normally used together with the options summary and/or disentangle.
concomitant(varlist) is used to specify control variables that are not z-variables. Factor variables are allowed.
vce(vcetype) specifies the type of standard error reported. It defaults to Stata's default for the specified model-type. Standard errors for indirect effects are estimated using a method discussed by Sobel (1982). The option vce() sets the standard errors for total and direct effects and controls the type of standard error that enters into Sobel's method. Supported types are robust and cluster clustvar; see help vce_option.
ape is used to decompose the key-vars using average partial effects (average marginal effects). khb uses margins to compute the average partial effects. For model-types ologit and oprobit, khb uses the average partial effect on the probability for the first outcome unless outcome() is specified; see ologit_postestimation for various ways to specify outcome(). Note that with APE the calculated indirect effect is not constant across outcomes. This is a well-known property of ordered choice models (see Greene/Hensher 2010).
or exponentiates the estimated coefficients and hence shows odds ratios for logit models. The exponentiated coefficient of the reduced model is then the product of the exponentiated coefficient of the full model and the exponentiated difference.
verbose is used to show the complete output of the full and restricted models that are used to estimate the decomposition. This is especially useful for detecting problems that occur in the intermediate steps of the estimation.
keep is used to keep the residuals of the z-variables, i.e. the z-variables net of confounding. These residuals are included as independent variables in the reduced model.
continuous By default, average partial effects for dummy variables are based on unit effects. Specifying continuous treats dummy variables as continuous variables. See margins for details about this option.
xstandard is used to standardize the key-vars.
zstandard is used to standardize the z-vars.
outcome(outcome) specifies the outcome for which the decomposition is to be calculated. This takes effect for multinomial response models (mlogit) and, if option ape is specified, for ordered response models. outcome() can be specified using
#1, #2, ..., where #1 means the first category of the dependent variable, #2 means the second category, etc.;
the values of the dependent variable; or
the value labels of the dependent variable if they exist.
baseoutcome(#) can be used for model-type mlogit. It specifies the value of depvar to be treated as the base outcome. The default is to choose the most frequent outcome. The option can be used together with outcome() to fully control the contrast for which the decomposition is done.
Example(s)
. use dlsy_khb.dta
. khb logit univ fses || abil
. khb probit univ fses || abil
. khb logit univ fses || abil, c(intact boy)
. khb logit univ fses || abil, summary
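A few further calls may help to show how the options described above combine. This is a sketch only: it assumes that khb is installed and that dlsy_khb.dta is in the working directory, and it reuses the variables from the examples above. The particular combinations of options are illustrative choices, not recommendations from the authors.

```stata
* Assumes khb is installed and dlsy_khb.dta is in the working directory
use dlsy_khb.dta, clear

* Decomposition with concomitants, reported as average partial effects
khb logit univ fses || abil, concomitant(intact boy) ape

* Odds ratios instead of logit coefficients
khb logit univ fses || abil, concomitant(intact boy) or

* Robust standard errors plus the summary table, showing both underlying models
khb logit univ fses || abil, vce(robust) summary verbose
```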
References
Breen, R./Karlson, K.B./Holm, A. (Forthcoming). Total, direct, and indirect effects in logit models. Accepted for publication in: Sociological Methods and Research.
Greene, W.H./Hensher, D.A. (2010): Modeling Ordered Choices: A Primer. New York: Cambridge University Press.
Karlson, K.B./Holm, A./Breen, R. (2011): Comparing Regression Coefficients Between Same-sample Nested Models using Logit and Probit. A New Method. Sociological Methodology 42:286-313.
Karlson, K.B./Holm, A. (2011): Decomposing primary and secondary effects: A new decomposition method. Research in Stratification and Social Mobility 29:221-237.
Kohler, U./Karlson, K.B./Holm, A. (2011): Comparing coefficients of nested nonlinear probability models. The Stata Journal 11:420-438.
Also see
Manual: [R] margins
Online: help for margins, ldecomp (if installed)
Web: Stata's Home
Author
Ulrich Kohler (kohler@wzb.eu) and Kristian Karlson (kbk@dpu.dk)
Please send bug reports and questions regarding the program to Ulrich Kohler. Questions regarding the KHB method itself are handled by Kristian Karlson.