help for ldecomp
-------------------------------------------------------------------------------

Title

    ldecomp -- Decomposes total effects in logistic regression into direct and
                 indirect effects.

Syntax

        ldecomp depvar [control_var1 [...]] [if] [in] [weight] ,
               direct(varname) indirect(varlist) [ at(control_var1 # [;
               control_var2 #] [...]) obspr predpr predodds or rindirect
               normal range(# #) nip(#) interactions nolegend nodecomp
               nobootstrap bootstrap_options ]

        fweights, pweights, and iweights are allowed when the nobootstrap
        option is specified.

Description

    ldecomp decomposes the total effect of a categorical variable in
    logistic regression into direct and indirect effects, using a method
    proposed by Erikson et al. (2005) and a generalization of this method by
    Buis (2008). Say our dependent variable is whether or not someone attends
    college, and we are interested in decomposing the total effect of class
    background (high or low). We suspect that part of the total effect can be
    explained by differences between the classes in performance during high
    school: higher class children do better at high school and children that
    do better at high school are more likely to attend college.  This is the
    indirect effect. The direct effect of class is the effect while
    controlling for the performance: higher class children are more likely to
    attend college even if they have the same performance during high school.

    There are two ways in which one can see the impact of differences in the
    distribution of performance across classes, and thus get the indirect
    effect. One can fix the logistic regression coefficients to be equal to
    the coefficients for the lower class and compare the proportion attending
    college of the group with a distribution of performance equal to the
    lower class with the proportion attending college of the group with a
    distribution of performance equal to the higher class.  This way the only
    difference between the two groups is the distribution of performance.
    Call this method 1. Alternatively, one can make the same comparison but
    fix the logistic regression coefficients to be equal to the coefficients
    of the higher class. Call this method 2.

    Similarly, one can control for performance, and thus get the direct
    effect, by fixing the distribution of performance to be equal to the
    distribution of performance of the higher class (method 1) or the lower
    class (method 2). Once these distributions are fixed one can compare the
    proportion attending college of the group with the logistic regression
    coefficients of the lower class with the proportion attending college of
    the group with the logistic regression coefficients of the higher class.

    If these direct and indirect effects are represented as odds ratios, then
    the total effect is the product of the direct and indirect effects, as
    can be seen in equations 1 (method 1) and 2 (method 2). O represents the
    odds of attending college; the first subscript indicates the distribution
    of performance and the second subscript the logistic regression
    coefficients (l for the lower class, h for the higher class).

       O_hl     O_hh     O_hh
      ------ X ------ = ------              (1)
       O_ll     O_hl     O_ll

       O_hh     O_lh     O_hh
      ------ X ------ = ------              (2)
       O_lh     O_ll     O_ll

    If these direct and indirect effects are represented as log odds ratios,
    then the total effect is the sum of the direct and indirect effects, as
    can be seen in equations 1' (method 1) and 2' (method 2).

        +      +     +      +     +      +
        | O_hl |     | O_hh |     | O_hh |
      ln|------| + ln|------| = ln|------|  (1')
        | O_ll |     | O_hl |     | O_ll |
        +      +     +      +     +      +

        +      +     +      +     +      +
        | O_hh |     | O_lh |     | O_hh |
      ln|------| + ln|------| = ln|------|  (2')
        | O_lh |     | O_ll |     | O_ll |
        +      +     +      +     +      +
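
    As a purely illustrative check of equations 1, 2, 1', and 2', suppose
    (hypothetically; these are not estimates from any data) that O_ll = 0.25,
    O_hl = 0.5, O_lh = 0.4, and O_hh = 1. The arithmetic can be reproduced
    with Stata's display command:

        . * hypothetical odds, chosen only to illustrate the identities
        . * method 1: indirect times direct odds ratio
        . display (0.5/0.25) * (1/0.5)
        . * method 2: indirect times direct odds ratio
        . display (1/0.4) * (0.4/0.25)
        . * total odds ratio
        . display 1/0.25
        . * method 1 in log odds ratios, followed by the total
        . display ln(0.5/0.25) + ln(1/0.5)
        . display ln(1/0.25)

    Under both methods the product equals the total odds ratio of 4, but the
    split differs: the indirect odds ratio is 2 under method 1 and 2.5 under
    method 2.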

    By default ldecomp shows the decomposition in terms of log odds ratios
    for both method 1 and 2, and computes standard errors using the
    bootstrap. By specifying the rindirect option, ldecomp will also show
    estimates of the size of the indirect effect relative to the total
    effect, using method 1, method 2, and the average of the two, together
    with their standard errors, also computed using the bootstrap.

    In order for this decomposition to work one needs a whole set of odds of
    attending college, both of groups that actually exist in the data like
    the group with the distribution of performance and the logistic
    regression coefficients of the lower class, and of groups that don't
    exist in the data, like the group with the distribution of performance of
    the lower class and the logistic regression coefficients of the higher
    class. All these odds are computed by transforming the average predicted
    probability of all these actual and counterfactual groups. Erikson et al.
    (2005) and Buis (2008) differ with respect to the way these average
    probabilities are computed: Erikson et al. assume that performance is
    normally distributed and numerically integrate over this distribution,
    while Buis makes no assumption about this distribution and simply
    averages the predicted probabilities, effectively integrating over the
    empirical distribution of performance instead of over a normal
    distribution.
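
    A minimal sketch of the two approaches, reusing the dataset and variable
    names from the Example section (so they are assumptions here); the second
    call presumes that hsrankq is approximately normally distributed, and the
    value passed to nip() is arbitrary:

        . * default, Buis (2008): average over the empirical distribution
        . ldecomp college, direct(ocf57) indirect(hsrankq)
        . * Erikson et al. (2005): numerical integration over a normal
        . ldecomp college, direct(ocf57) indirect(hsrankq) normal nip(2000)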

    Unlike Erikson et al. (2005), ldecomp also allows one to add control
    variables.  While computing this decomposition these will by default be
    fixed at their mean value, unless a value is specified in the at()
    option.
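
    For instance, assuming a hypothetical binary control variable named
    female (it is not part of the documented example), the control could be
    held at 0 rather than at its mean:

        . * female is a hypothetical control variable, used for illustration
        . ldecomp college female, direct(ocf57) indirect(hsrankq) at(female 0)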


Options

    direct(varname) specifies the variable whose total effect we want to
        decompose into a direct and an indirect effect. This has to be a
        categorical variable; each value of varname is assumed to represent
        a group.

    indirect(varlist) specifies the variable(s) through which the indirect
        effect occurs. By default multiple variables are allowed and these
        can be from any distribution. If the normal option is specified only
        one variable can be entered, and this variable is assumed to be
        normally distributed.

    at(control_var1 # [; control_var2 #] [...]) specifies the values at which
        the control variables are to be fixed. The default is to fix each
        control variable at its mean value.

    obspr specifies that a table of the observed proportions is to be
        displayed.

    predpr specifies that a table of predicted and counterfactual proportions
        is to be displayed.  If the normal option is not specified the
        diagonal elements of this table will be exactly the same as the
        observed proportions.

    predodds specifies that a table of predicted and counterfactual odds is
        to be displayed.

    or specifies that the decomposition is displayed in terms of odds ratios
        instead of log odds ratios.

    rindirect specifies that the relative contributions of the indirect
        effects to the total effect (in terms of log odds ratios) are to be
        displayed.

    normal specifies that the predicted and counterfactual proportions are to
        be computed according to the method specified by Erikson et al.
        (2005). This means that the variable specified in indirect() is
        assumed to be normally distributed. This option was primarily added
        for compatibility with Erikson et al. (2005). By default the method
        by Buis (2008) is used, which allows multiple variables to be
        specified in indirect() and makes no assumptions about the
        distribution of these variables.

    range(# #) specifies the range over which the numerical integration of
        the variable specified in indirect() is to be performed. The default
        runs from the minimum of that variable minus 10% of its range to its
        maximum plus 10% of its range. This option can only be specified with
        the normal option because in the default method there is no need for
        numerical integration.

    nip(#) specifies the number of integration points used in the numerical
        integration of the variable specified in indirect(). The default is
        1000. This option can only be specified with the normal option
        because in the default method there is no need for numerical
        integration.

    interactions specifies that interactions between the categories of the
        variable specified in direct() and the variable(s) specified in
        indirect() are to be included in the model. In other words, the
        effects of the variables specified in indirect() on the dependent
        variable are allowed to differ across the categories of the variable
        specified in direct(). This option was primarily added for
        compatibility with Erikson et al. (2005).

    nolegend suppresses a legend that is by default displayed at the bottom
        of the main table.

    nodecomp prevents ldecomp from displaying the table of decompositions,
        which can be useful in combination with the obspr, predpr, and/or
        predodds options.

    nobootstrap prevents ldecomp from using the bootstrap to calculate
        standard errors.

    bootstrap_options are options of bootstrap. The following are allowed:
        reps(#), strata(varlist), size(#), cluster(varlist),
        idcluster(newvar), saving(filename[, suboptions]), bca, mse,
        level(#), nodots, seed(#), and jackknifeopts(jkopts).


Example

    . use wisconsin.dta, clear
    . ldecomp college , direct(ocf57) indirect(hsrankq)
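
    The calls below are further sketches using the same dataset, which is
    assumed to be available; they only combine options documented above, and
    the values passed to reps() and seed() are arbitrary.

    . * report the decomposition as odds ratios
    . ldecomp college, direct(ocf57) indirect(hsrankq) or
    . * add the size of the indirect effect relative to the total effect
    . ldecomp college, direct(ocf57) indirect(hsrankq) rindirect
    . * show only the tables of observed and predicted proportions
    . ldecomp college, direct(ocf57) indirect(hsrankq) obspr predpr nodecomp
    . * skip the bootstrap
    . ldecomp college, direct(ocf57) indirect(hsrankq) nobootstrap
    . * reproducible bootstrap with 200 replications
    . ldecomp college, direct(ocf57) indirect(hsrankq) reps(200) seed(12345)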


Author

    Maarten L. Buis
    Universitaet Tuebingen
    Institut fuer Soziologie
    maarten.buis@uni-tuebingen.de


References

    Buis, M.L. (2009). Direct and indirect effects in a logit model.  
        http://www.maartenbuis.nl/wp/ldecomp.html

    Erikson, R., J.H. Goldthorpe, M. Jackson, M. Yaish, and D.R. Cox (2005).
        On class differentials in educational attainment.  Proceedings of the
        National Academy of Sciences, 102(27): 9730-9732.

    Jackson, M., R. Erikson, J. Goldthorpe, and M. Yaish (2007). Primary and
        secondary effects in class differentials in educational attainment:
        The transition to A-level courses in England and Wales. Acta
        Sociologica, 50(3): 211-229.


Also see

    Online: logit bootstrap bootstrap_postestimation jackknife

    If installed: fairlie gdecomp