help for ldecomp -------------------------------------------------------------------------------

Title

ldecomp -- Decomposes total effects in logistic regression into direct and indirect effects.

Syntax

ldecomp depvar [control_var1 [...]] [if] [in] [weight] , direct(varname) indirect(varlist) [ at(control_var1 # [; control_var2 #] [...]) obspr predpr predodds or rindirect normal range(# #) nip(#) interactions nolegend nodecomp nobootstrap bootstrap_options ]

fweights, pweights, and iweights are allowed when the nobootstrap option is specified.

Description

ldecomp decomposes the total effect of a categorical variable in logistic regression into direct and indirect effects, using a method by Erikson et al. (2005) and a generalization of this method by Buis (2008). Say our dependent variable is whether or not someone attends college, and we are interested in decomposing the total effect of class background (high or low). We suspect that part of the total effect can be explained by differences between the classes in performance during high school: higher class children do better at high school, and children who do better at high school are more likely to attend college. This is the indirect effect. The direct effect of class is the effect while controlling for performance: higher class children are more likely to attend college even if they have the same performance during high school.

There are two ways in which one can see the impact of differences in the distribution of performance across classes, and thus get the indirect effect. One can fix the logistic regression coefficients to be equal to the coefficients of the lower class, and compare the proportion attending college in the group with a distribution of performance equal to that of the lower class with the proportion attending college in the group with a distribution of performance equal to that of the higher class. This way the only difference between the two groups is the distribution of performance. Call this method 1. Alternatively, one can make the same comparison but fix the logistic regression coefficients to be equal to the coefficients of the higher class. Call this method 2.

Similarly, one can control for performance, and thus get the direct effect, by fixing the distribution of performance to be equal to the distribution of performance of the higher class (method 1) or the lower class (method 2). Once these distributions are fixed one can compare the proportion attending college of the group with the logistic regression coefficients of the lower class with the proportion attending college of the group with the logistic regression coefficients of the higher class.

If these direct and indirect effects are represented as odds ratios, then the total effect is the product of the indirect and the direct effect, as can be seen in equations 1 (method 1) and 2 (method 2). In each equation the first ratio is the indirect effect and the second ratio is the direct effect. O represents the odds of attending college; the first subscript refers to the distribution of performance and the second subscript to the logistic regression coefficients.

$$\frac{O_{hl}}{O_{ll}} \times \frac{O_{hh}}{O_{hl}} = \frac{O_{hh}}{O_{ll}} \tag{1}$$

$$\frac{O_{hh}}{O_{lh}} \times \frac{O_{lh}}{O_{ll}} = \frac{O_{hh}}{O_{ll}} \tag{2}$$

If these direct and indirect effects are represented as log odds ratios, then the total effect is the sum of the indirect and the direct effect, as can be seen in equations 1' (method 1) and 2' (method 2).

$$\ln\left(\frac{O_{hl}}{O_{ll}}\right) + \ln\left(\frac{O_{hh}}{O_{hl}}\right) = \ln\left(\frac{O_{hh}}{O_{ll}}\right) \tag{1'}$$

$$\ln\left(\frac{O_{hh}}{O_{lh}}\right) + \ln\left(\frac{O_{lh}}{O_{ll}}\right) = \ln\left(\frac{O_{hh}}{O_{ll}}\right) \tag{2'}$$
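The identities in equations 1, 2, 1' and 2' can be checked numerically. The following is a minimal Python sketch, not part of ldecomp; the function name decompose and the four odds are made up for illustration. Keys are (distribution of performance, coefficients), matching the subscript order used above.

```python
import math

def decompose(odds):
    """Split the total odds ratio into indirect and direct parts.

    odds[(d, c)] is the odds of attending college for the group with the
    performance distribution of class d and the coefficients of class c.
    """
    total = odds["h", "h"] / odds["l", "l"]
    method1 = {
        "indirect": odds["h", "l"] / odds["l", "l"],  # coefficients fixed at lower class
        "direct": odds["h", "h"] / odds["h", "l"],    # distribution fixed at higher class
    }
    method2 = {
        "indirect": odds["h", "h"] / odds["l", "h"],  # coefficients fixed at higher class
        "direct": odds["l", "h"] / odds["l", "l"],    # distribution fixed at lower class
    }
    return total, method1, method2

# Made-up odds, for illustration only
odds = {("l", "l"): 0.25, ("h", "l"): 0.40, ("l", "h"): 0.60, ("h", "h"): 1.00}
total, m1, m2 = decompose(odds)

for label, m in (("method 1", m1), ("method 2", m2)):
    product = m["indirect"] * m["direct"]                     # equations 1 and 2
    log_sum = math.log(m["indirect"]) + math.log(m["direct"])  # equations 1' and 2'
    print(label, product, log_sum, math.log(total))
```

Both methods reproduce the same total effect, but they generally split it differently between the indirect and the direct part.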

By default ldecomp shows the decomposition in terms of log odds ratios for both method 1 and method 2, and computes standard errors using the bootstrap. If the rindirect option is specified, ldecomp also shows estimates of the size of the indirect effect relative to the total effect, using method 1, method 2, and the average of the two, together with their standard errors, which are also computed using the bootstrap.
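As a rough illustration of the relative size of the indirect effect, the Python sketch below divides the indirect log odds ratio by the total log odds ratio for each method and averages the two. This exact formula is an assumption based on the description above, not taken from the ldecomp source; the function name and the odds are made up.

```python
import math

def relative_indirect(o_ll, o_hl, o_lh, o_hh):
    """Indirect effect as a share of the total effect, on the log odds ratio scale.

    The first subscript is the distribution of performance, the second the
    coefficients. Returns (method 1, method 2, average of the two).
    """
    total = math.log(o_hh / o_ll)
    method1 = math.log(o_hl / o_ll) / total  # indirect effect from equation 1'
    method2 = math.log(o_hh / o_lh) / total  # indirect effect from equation 2'
    return method1, method2, (method1 + method2) / 2

# Made-up odds, for illustration only
share1, share2, share_avg = relative_indirect(o_ll=0.25, o_hl=0.40, o_lh=0.60, o_hh=1.00)
print(share1, share2, share_avg)
```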

For this decomposition to work one needs a whole set of odds of attending college. Some belong to groups that actually exist in the data, like the group with the distribution of performance and the logistic regression coefficients of the lower class. Others belong to counterfactual groups that do not exist in the data, like the group with the distribution of performance of the lower class and the logistic regression coefficients of the higher class. All these odds are computed by transforming the average predicted probability of each actual or counterfactual group. Erikson et al. (2005) and Buis (2008) differ in the way these average probabilities are computed. Erikson et al. assume that performance is normally distributed and numerically integrate over this distribution. Buis makes no assumption about this distribution: the predicted probabilities are computed for each observation and then averaged, which effectively integrates over the empirical distribution of performance instead of over a normal distribution.
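The averaging over the empirical distribution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ldecomp implementation: the performance values and the logit coefficients (intercept, slope) per class are hypothetical, and there is a single performance variable and no control variables.

```python
import math
from statistics import mean

def invlogit(x):
    """Inverse logit: turns a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def counterfactual_odds(performance, intercept, slope):
    """Odds for a group with the given performance values and coefficients."""
    # Average the predicted probabilities over the observed performance values,
    # i.e. integrate over the empirical distribution, then convert to odds.
    p = mean(invlogit(intercept + slope * x) for x in performance)
    return p / (1.0 - p)

# Hypothetical data and coefficients, for illustration only
perf = {"l": [-1.2, -0.5, 0.0, 0.3, 0.9], "h": [-0.4, 0.2, 0.8, 1.1, 1.7]}
coef = {"l": (-1.0, 0.8), "h": (-0.2, 0.8)}

# Keys are (distribution of performance, coefficients)
odds = {(d, c): counterfactual_odds(perf[d], *coef[c]) for d in "lh" for c in "lh"}
for key, value in sorted(odds.items()):
    print(key, round(value, 3))
```

The entries ("l", "l") and ("h", "h") correspond to groups that exist in the data; ("h", "l") and ("l", "h") are the counterfactual groups.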

Unlike Erikson et al. (2005), ldecomp also allows one to add control variables. While computing this decomposition, each control variable is by default fixed at its mean, unless a value is specified in the at() option.

Options

direct(varname) specifies the variable whose total effect we want to decompose into a direct and an indirect effect. This has to be a categorical variable; each value of varname is assumed to represent a group.

indirect(varlist) specifies the variable(s) through which the indirect effect occurs. By default multiple variables are allowed and these can be from any distribution. If the normal option is specified only one variable can be entered, and this variable is assumed to be normally distributed.

at(control_var1 # [; control_var2 #] [...]) specifies the values at which the control variables are to be fixed. The default is to fix each control variable at its mean.

obspr specifies that a table of the observed proportions is to be displayed.

predpr specifies that a table of predicted and counterfactual proportions is to be displayed. If the normal option is not specified the diagonal elements of this table will be exactly the same as the observed proportions.

predodds specifies that a table of predicted and counterfactual odds is to be displayed.

or specifies that the decomposition is displayed in terms of odds-ratios instead of log odds-ratios.

rindirect specifies that the relative contributions of the indirect effects to the total effect (in terms of log odds ratios) are to be displayed.

normal specifies that the predicted and counterfactual proportions are to be computed according to the method specified by Erikson et al. (2005). This means that the variable specified in indirect() is assumed to be normally distributed. This option was primarily added for compatibility with Erikson et al. (2005). By default the method by Buis (2008) is used, which allows multiple variables to be specified in indirect() and makes no assumptions about the distribution of these variables.

range(# #) specifies the range over which the numerical integration of the variable in varlist is to be performed. The default lower bound is the minimum of that variable minus 10% of its range, and the default upper bound is its maximum plus 10% of its range. This option can only be specified with the normal option because in the default method there is no need for numerical integration.

nip(#) specifies the number of integration points used in the numerical integration of the variable in varlist. The default is 1000. This option can only be specified with the normal option because in the default method there is no need for numerical integration.
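To make the roles of range() and nip() concrete, here is a minimal Python sketch of averaging a predicted probability over a normal distribution by numerical integration. The midpoint rule, the renormalisation over the truncated range, and all function names are assumptions made for illustration; the help file does not state which integration rule ldecomp uses.

```python
import math

def default_range(values):
    """Default integration range: minimum and maximum, each padded by 10% of the range."""
    lo, hi = min(values), max(values)
    pad = 0.10 * (hi - lo)
    return lo - pad, hi + pad

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def average_probability(intercept, slope, mu, sigma, lo, hi, nip=1000):
    """Average logit probability over a normal(mu, sigma) performance distribution.

    Uses a midpoint rule with nip equally wide intervals on [lo, hi] and
    divides by the normal mass inside that range.
    """
    width = (hi - lo) / nip
    num = den = 0.0
    for i in range(nip):
        x = lo + (i + 0.5) * width            # midpoint of the i-th interval
        w = normal_pdf(x, mu, sigma) * width  # approximate probability mass there
        num += w / (1.0 + math.exp(-(intercept + slope * x)))
        den += w
    return num / den

# Hypothetical inputs, for illustration only
lo, hi = default_range([-2.5, -1.0, 0.0, 0.4, 2.5])
p = average_probability(intercept=-0.5, slope=0.8, mu=0.0, sigma=1.0, lo=lo, hi=hi)
print(lo, hi, p, p / (1.0 - p))
```

The last printed value is the odds that would enter the decomposition for that combination of distribution and coefficients.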

interactions specifies that interactions between the categories of the variable specified in direct() and the variable(s) specified in indirect() are to be included in the model. In other words, the effects of the variables specified in indirect() on the dependent variable are allowed to differ across the categories of the variable specified in direct(). This option was primarily added for compatibility with Erikson et al. (2005).

nolegend suppresses a legend that is by default displayed at the bottom of the main table.

nodecomp prevents ldecomp from displaying the table of decompositions, which can be useful in combination with the obspr, predpr, and/or predodds options.

nobootstrap prevents ldecomp from using bootstrap to calculate standard errors.

bootstrap_options The following options of bootstrap are allowed: reps(#), strata(varlist), size(#), cluster(varlist), idcluster(newvar), saving(filename[, suboptions]), bca, mse, level(#), nodots, seed(#), and jackknifeopts(jkopts).
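The general idea behind bootstrap standard errors can be sketched as follows. This Python example is an illustration only: it bootstraps a simple log odds ratio of observed proportions, whereas ldecomp re-estimates the whole decomposition in each replication through Stata's bootstrap command. The data, the function names, and the defaults for reps and seed are made up.

```python
import math
import random
from statistics import stdev

def log_odds_ratio(rows):
    """rows: list of (group, outcome) with group 'l' or 'h' and outcome 0 or 1."""
    def odds(group):
        ys = [y for g, y in rows if g == group]
        p = sum(ys) / len(ys)
        return p / (1 - p)
    return math.log(odds("h") / odds("l"))

def bootstrap_se(rows, stat, reps=200, seed=12345):
    """Standard deviation of stat over reps resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed, so the result is reproducible
    draws = []
    for _ in range(reps):
        sample = [rng.choice(rows) for _ in rows]  # same size as the original data
        draws.append(stat(sample))
    return stdev(draws)

# Made-up data: 60 of 200 lower class and 120 of 200 higher class children attend
rows = [("l", 1)] * 60 + [("l", 0)] * 140 + [("h", 1)] * 120 + [("h", 0)] * 80
estimate = log_odds_ratio(rows)
se = bootstrap_se(rows, log_odds_ratio)
print(estimate, se)
```

The reps and seed arguments play the same role here as the reps(#) and seed(#) options listed above.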

Example

. use wisconsin.dta, clear
. ldecomp college, direct(ocf57) indirect(hsrankq)

Author

Maarten L. Buis
Universitaet Tuebingen
Institut fuer Soziologie
maarten.buis@uni-tuebingen.de

References

Buis, M.L. (2009). Direct and indirect effects in a logit model. http://www.maartenbuis.nl/wp/ldecomp.html

Erikson, R., J.H. Goldthorpe, M. Jackson, M. Yaish, and D.R. Cox (2005). On class differentials in educational attainment. Proceedings of the National Academy of Sciences, 102(27): 9730-9732.

Jackson, M., R. Erikson, J. Goldthorpe, and M. Yaish (2007). Primary and secondary effects in class differentials in educational attainment: The transition to A-level courses in England and Wales. Acta Sociologica, 50(3): 211-229.

Also see

Online: logit bootstrap bootstrap_postestimation jackknife

If installed: fairlie gdecomp