{smcl} {* 29Jan2008} help for {hi:ldecomp} {hline} {title:Title} {p2colset 5 16 18 2}{...} {p2col :{hi: ldecomp} {hline 2}}Decomposes total effects in logistic regresion into direct and indirect effects.{p_end} {p2colreset}{...} {title:Syntax} {p 8 15 2} {cmd:ldecomp} {it:depvar} [ {it:control_var1} [...]] } {ifin} {weight} {cmd:,} {opt d:irect(varname)} {opt i:ndirect(varlist)} [ {opt at(control_var1 # [; control_var2 #] [...])} {opt obs:pr} {opt predp:r} {opt predo:dds} {opt or} {opt ri:ndirect} {opt norm:al} {opt range(# #)} {opt nip(#)} {opt int:eractions} {opt noleg:end} {opt nodec:omp} {opt noboot:strap} {it:{help ldecomp##bootopt:bootstrap_options}} ] {p 8 8} fweights, pweights, and iweights are allowed when the {cmd:nobootstrap} option is specified. {title:Description} {pstd} {cmd:ldecomp} decomposes the total effects of a categorical variable in logistic regresion into direct and indirect effects using a method method by Erikson et al. (2005) and a generalization of this method by Buis (2008). Say our dependent variable is whether or not someone attends college, and we are interested in decomposing the total effect of class background (high or low). We suspect that part of the total effect can be explained by differences between the classes in performance during high school: higher class children do better at high school and children that do better at high school are more likely to attend college. This is the indirect effect. The direct effect of class is the effect while controlling for the performance: higher class children are more likely to attend college even if they have the same performance during high school. {pstd} There are two ways in which one can see the impact of differences in the distribution of performance across classes, and thus get the indirect effect. One can fix the logistic regresion coefficients to be equal to the coefficients for the lower class and compare the proportion attending college of the group with a distribution of performance equal to the lower class and proportion attending college of the group with a distribution of performance equal to the higher class. This way the only difference between the two groups is the distribution of performance. Call this method 1. Alternatively, one can make the same comparison but fix the logistic regression coefficients to be equal to the coefficients of the higher class. Call this method 2. {pstd} Similarly, one can control for performance, and thus get the direct effect, by fixing the distribution of performance to be equal to the distribution of performance of the higher class (method 1) or the lower class (method 2). Once these distributions are fixed one can compare the proportion attending college of the group with the logistic regression coefficients of the lower class with the proportion attending college of the group with the logistic regression coefficients of the higher class. {pstd} If these direct and indirect effects are represented as odds ratios than the total effect is the product of the the direct and indirect effect, as can be seen in equations 1 (method 1) and 2 (method 2). The O represents the odds of attending college, the first subscript the distribution of performance and the second subscript the logistic regression coefficients. {col 8}O_hl{col 17}O_hh{col 26}O_hh {col 7}{hline 6} X {col 16}{hline 6} = {hline 6} (1) {col 8}O_ll{col 17}O_hl{col 26}O_ll {col 8}O_hh{col 17}O_lh{col 26}O_hh {col 7}{hline 6} X {col 16}{hline 6} = {hline 6} (2) {col 8}O_lh{col 17}O_ll{col 26}O_ll {pstd} If these direct and indirect effects are represented as log odds ratios than the total effect is the sum of the the direct and indirect effect, as can be seen in equations 1' (method 1) and 2' (method 2). {col 9}{c TLC}{col 16}{c TRC}{col 22}{c TLC}{col 29}{c TRC}{col 35}{c TLC}{col 42}{c TRC} {col 9}{c |} O_hl {c |}{col 22}{c |} O_hh {c |}{col 35}{c |} O_hh {c |} {col 7}ln{c |}{hline 6}{c |} + ln{c |}{hline 6}{c |} = ln{c |}{hline 6}{c |} (1') {col 9}{c |} O_ll {c |}{col 22}{c |} O_hl {c |}{col 35}{c |} O_ll {c |} {col 9}{c BLC}{col 16}{c BRC}{col 22}{c BLC}{col 29}{c BRC}{col 35}{c BLC}{col 42}{c BRC} {col 9}{c TLC}{col 16}{c TRC}{col 22}{c TLC}{col 29}{c TRC}{col 35}{c TLC}{col 42}{c TRC} {col 9}{c |} O_hh {c |}{col 22}{c |} O_lh {c |}{col 35}{c |} O_hh {c |} {col 7}ln{c |}{hline 6}{c |} + ln{c |}{hline 6}{c |} = ln{c |}{hline 6}{c |} (2') {col 9}{c |} O_lh {c |}{col 22}{c |} O_ll {c |}{col 35}{c |} O_ll {c |} {col 9}{c BLC}{col 16}{c BRC}{col 22}{c BLC}{col 29}{c BRC}{col 35}{c BLC}{col 42}{c BRC} {pstd} By default {cmd:ldecomp} shows the decomposition in terms of log odds ratios for both method 1 and 2, and computes standard errors using {helpb bootstrap}. By specifying the {cmd:relindir} option {cmd:ldecomp} will also show estimates of the size of the indirect relative to the total effect, using method 1 and 2 and the average of these two, and their standard errors, also computed using the bootstrap. {pstd} In order for this decomposition to work one needs a whole set of odds of attending college, both of groups that actually exist in the data like the group with the distribution of performance and the logistic regression coefficients of the lower class, and of groups that don't exist in the data, like the group with the distribution of performance of the lower class and the logistic regression coefficients of the higher class. All these odds are computed by transforming the average predicted probability of all these actual and counterfactual groups. Erikson et al. (2005) and Buis (2008) differ with respect to the way these average probabilities are computed: Erikson et al. assume that ability is normaly distributed and numerically integrate over this distribution, while Buis makes no assumption about this distribution and just computes the predicted probabilities and then computes the mean, effectively integrating over the empirical distribution of performance instead of over a normal distribution. {pstd} Other than Erikson et al. (2005) {cmd:ldecomp} also allows one to add control variables. While computing this decompostion these will by default be fixed at their mean value if no value was specified in the {cmd:at()} option {title:Options} {phang} {opt direct(varname)} specifies the variable whose direct effect we want to decompose into an indirect and total effect. This has to be a categorical variable, each value of varnameis assumed to represent a group. {phang} {opt indirect(varlist)} specifies the variable(s) through which the indirect effect occurs. By default multiple variables are allowed and these can be from any distribution. If the {cmd:normal} option is specified only one variable can be entered, and this variable is assumed to be normally distributed. {phang} {opt at(control_var1 # [; control_var2 #] [...])} specifies the values at which the control variables are to be fixed. The default is to fix the value of control variable at its mean value. {phang} {cmd:obspr} specifies that a table of the observed proportions are to be displayed. {phang} {cmd:predpr} specifies that a table of predicted and counterfactual proportions is to be displayed. If the {cmd:normal} option is not specified the diagonal elements of this table will be exactly the same as the observed proportions. {phang} {cmd:predodds} specifies that a table of predicted and counterfactual odds is to be displayed. {phang} {cmd:or} specifies that the decomposition is displayed in terms of odds-ratios instead of log odds-ratios. {phang} {cmd:rindirect} specifies that the relative contributions of the indirect effects to the total effect (in terms of log odds ratios) is to be displayed. {phang} {cmd:normal} specifies that the predicted and counterfactual proportions are to be computed according to the method specified by Erikson et al. (2005). This means that the variable specified in {cmd:indirect()} is assumed to be normally distributed. This option was primarily added for compatibility with Erikson et al. (2005). By default the method by Buis (2008) is used, which allows multiple variables to be specified in {cmd:indirect()} and makes no assumptions about the distribution of these variables. {phang} {opt range(##)} specifies the range over which the numerical integration of varlist is to be performed. The default is the minimum of the variable in varlist minus 10% of the range of varlist and the maximum of varlist plus 10% of the range of varlist. This option can only be specified with the {cmd:normal} option because in the default method there is no need for numerical integration. {phang} {opt nip(#)} specifies the number of integration points used in the numerical integration of the variable in varlist. The default is 1000. This option can only be specified with the {cmd:normal} option because in the default method there is no need for numerical integration. {phang} {cmd: interactions} specifies that interactions between the categories of the variable specified in {cmd: direct()} and the variable(s) specified in {cmd: indirect()}. In other words the effects of the variables specified in {cmd: indirect()} on the dependent variable are allowed to differ from one another for each category of the variable specified in {cmd: direct()}. This option was primarily added for compatibility with Erikson et al. (2005). {phang} {cmd: nolegend} suppresses a legend that is by default displayed at the bottom of the main table. {phang} {cmd: nodecomp} prevents {cmd: ldecomp} from displaying the table of decompositions, which can be useful in combination with the {cmd: obspr}, {cmd: predpr}, and/or {cmd: predodds} options. {phang} {cmd: nobootstrap} prevents {cmd:ldecomp} from using {cmd:bootstrap} to calculate standard errors. {marker bootopt}{...} {phang} {it:bootstrap_options} The following options of {helpb bootstrap} are allowed: {opt r:eps(#)}, {opt str:ata(varlist)}, {opt si:ze(#)}, {opt cl:uster(varlist)}, {opt id:cluster(newvar)}, {bf:saving}({it:filename}[, {it:suboptions}]), {opt bca}, {opt mse}, {opt l:evel(#)}, {opt nodots}, {opt seed(#)}, and {opt jack:knifeopts(jkopts)}. {title:Example} {phang}{cmd:. use wisconsin.dta, clear}{p_end} {phang}{cmd:. ldecomp college , direct(ocf57) indirect(hsrankq)}{p_end} {title:Author} {p 4 4} Maarten L. Buis{break} Universitaet Tuebingen{break} Instituet fur Soziologie{break} maarten.buis@uni-tuebingen.de {p_end} {title:References} {p 4 8 2}Buis, M.L. (2009). Direct and indirect effects in a logit model. {browse "http://www.maartenbuis.nl/wp/ldecomp.html"} {p 4 8 2}Erikson, R, J.H. Goldthorpe, M. Jackson, M. Yaish, D.R. Cox (2005). On class differentials in educational attainment. {it:Proceedings of the National Academy of Science}, 102(27): 9730-9732. {p 4 8 2}Jackson, M, R. Erikson, J. Goldthorpe, M. Yaish (2007). Primary and secondary effects in class differentials in educational attainment: The transition to A-level courses in England and Wales. {it:Acta Sociologica}, 50(3): 211-229. {title:Also see} {psee} Online: {helpb logit} {helpb bootstrap} {helpb bootstrap_postestimation} {helpb jackknife} {psee} If installed: {helpb fairlie} {helpb gdecomp} {p_end}