help oaxaca9
-------------------------------------------------------------------------------

Title

    oaxaca9 -- Blinder-Oaxaca decomposition of outcome differentials


Syntax

        oaxaca9 depvar [indepvars] [if] [in] [weight] , by(groupvar) [
               options ]


    options                    Description
    -------------------------------------------------------------------------
    Main
      by(groupvar)             specifies the groups; by() is required
      swap                     swap groups
      detail[(dlist)]          display detailed decomposition
      adjust(varlist)          adjustment for selection variables

    Decomposition type
      threefold[(reverse)]     three-fold decomposition; the default
      weight(# [# ...])        two-fold decomposition based on specified
                                 weights
      pooled[(model_opts)]     two-fold decomposition based on pooled model
                                 including groupvar
      omega[(model_opts)]      two-fold decomposition based on pooled model
                                 excluding groupvar
      reference(name)          two-fold decomposition based on stored model
      split                    split unexplained part of two-fold
                                 decomposition

    X-Values
      x1(names_and_values)     provide custom X-values for Group 1
      x2(names_and_values)     provide custom X-values for Group 2
      categorical(clist)       identify dummy variable sets and apply
                                 deviation contrast transform

    SE/SVY
      svy[(svyspec)]           survey data estimation
      vce(vcetype)             vcetype may be analytic, robust, cluster
                                 clustvar, bootstrap, or jackknife
      cluster(varname)         adjust standard errors for intragroup
                                 correlation (Stata 9)
      fixed[(varlist)]         assume non-stochastic regressors
      suest[(name)] | nosuest  do/do not use suest to obtain joint variance
                                 matrix
      nose                     suppress computation of standard errors

    Models
      model1(model_opts)       estimation details for the Group 1 model
      model2(model_opts)       estimation details for the Group 2 model
      noisily                  display model estimation output

    Reporting
      xb                       display table with coefficients and means
      level(#)                 set confidence level; default is level(95)
      eform                    report exponentiated results
      nolegend                 suppress legend
    -------------------------------------------------------------------------
    bootstrap, by, jackknife, statsby, and xi are allowed; see prefix.
    Weights are not allowed with the bootstrap prefix.
    aweights are not allowed with the jackknife prefix.
    vce(), cluster(), and weights are not allowed with the svy option.
    fweights, aweights, pweights, and iweights are allowed; see weight.


Description

    oaxaca9 computes the so-called Blinder-Oaxaca decomposition, which is
    often used to analyze wage gaps by sex or race. depvar is the outcome
    variable of interest (e.g. log wages) and indepvars are predictors (e.g.
    education, work experience, etc.). groupvar identifies the groups to be
    compared. For methods and formulas see Jann (2008).

    oaxaca9 typed without arguments replays the last results, optionally
    applying xb, level(), eform, or nolegend.
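
    A minimal sketch of replaying results, assuming the example data from
    the Examples section are in memory:

        . oaxaca9 lnwage educ exper tenure, by(female)
        . oaxaca9, xb
        . oaxaca9, level(90)

    The second and third calls redisplay the stored results, first with the
    table of coefficients and means and then with 90% confidence intervals.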


Options

        +------+
    ----+ Main +-------------------------------------------------------------

    by(groupvar) specifies the groupvar that defines the two groups that will
        be compared. by() is required.

    swap reverses the order of the groups.

    detail[(dlist)] requests that the detailed results for the individual
        predictors be reported. Use dlist to subsume the results for sets of
        regressors (results for variables not appearing in dlist are listed
        individually). The syntax for dlist is

            name:varlist [, name:varlist ...]

        The usual shorthand conventions apply to the varlists specified in
        dlist (see help varlist; additionally, _cons is allowed). For
        example, specify detail(exp:exp*) to subsume exp (experience) and
        exp2 (experience squared).  name is any valid Stata name and labels
        the set.
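
        As an illustrative sketch (the set name work is arbitrary, and the
        variables are those of the example data used in the Examples
        section), the following should report exper and tenure as one set
        and educ individually:

            . oaxaca9 lnwage educ exper tenure, by(female) detail(work:exper tenure)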

    adjust(varlist) causes the differential to be adjusted by the
        contribution of the specified variables before performing the
        decomposition. This is useful, for example, if the specified
        variables are selection terms. Note that adjust() is not needed for 
        heckman models.

        +--------------------+
    ----+ Decomposition type +-----------------------------------------------

    threefold[(reverse)] computes the three-fold decomposition. This is the
        default unless weight(), pooled, omega, or reference() is specified.
        The decomposition is expressed from the viewpoint of Group 2. Specify
        threefold(reverse) to express the decomposition from the viewpoint of
        Group 1.

    weight(# [# ...]) computes the two-fold decomposition where # [# ...] are
        the weights given to Group 1 relative to Group 2 in determining the
        reference coefficients (weights are recycled if there are more
        coefficients than weights). For example, weight(1) uses the Group 1
        coefficients as the reference coefficients, weight(0) uses the Group
        2 coefficients.

    pooled[(model_opts)] computes the two-fold decomposition using the
        coefficients from a pooled model over both groups as the reference
        coefficients. groupvar is included in the pooled model as an
        additional control variable. Estimation details may be specified in
        parentheses; see the model1() option below.

    omega[(model_opts)] computes the two-fold decomposition using the
        coefficients from a pooled model over both groups as the reference
        coefficients (without including groupvar as a control variable in the
        pooled model). Estimation details may be specified in parentheses;
        see the model1() option below.

    reference(name) computes the two-fold decomposition using the
        coefficients from a stored model. name is the name under which the
        model was stored; see estimates store. Do not combine the reference()
        option with bootstrap or jackknife methods.
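
        A sketch of one possible workflow, assuming the example data from
        the Examples section (the name ref is arbitrary). Here the stored
        model is a pooled regression that does not include groupvar:

            . regress lnwage educ exper tenure
            . estimates store ref
            . oaxaca9 lnwage educ exper tenure, by(female) reference(ref)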

    split causes the "unexplained" component in the two-fold decomposition to
        be split into a part related to Group 1 and a part related to Group
        2. split is effective only if specified with weight(), pooled, omega,
        or reference().
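
        For example, assuming the example data from the Examples section:

            . oaxaca9 lnwage educ exper tenure, by(female) pooled split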

    Only one of threefold, weight(), pooled, omega, and reference() is
    allowed.

        +----------+
    ----+ X-Values +---------------------------------------------------------

    x1(names_and_values) and x2(names_and_values) provide custom values for
        specific predictors to be used for Group 1 and Group 2 in the
        decomposition. The default is to use the group means of the
        predictors.  The syntax for names_and_values is

            varname [=] value [[,] varname [=] value ... ]

        Example: x1(educ 12 exp 30)
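
        A sketch of a complete call, assuming the example data from the
        Examples section (the value 12 is purely illustrative). Both groups
        are evaluated at 12 years of education; predictors not listed
        should keep their group means:

            . oaxaca9 lnwage educ exper tenure, by(female) x1(educ 12) x2(educ 12)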

    categorical(clist) identifies sets of dummy variables representing
        categorical variables and transforms the coefficients so that the
        results of the decomposition are invariant to the choice of the
        (omitted) base category (deviation contrast transform). The syntax
        for clist is

            varlist [, varlist ... ]

        where each varlist must contain indicator (0/1) variables for all
        categories including the base category (that is, a base category
        indicator variable must exist in the data). To generate a suitable
        set of indicator variables use, for example,

            tabulate catvar, generate(stubname) [ nofreq ]

        where catvar is the categorical variable and the indicator variables
        will be named stubname1, stubname2, ... (nofreq may be used to
        suppress the frequency table; see help tabulate).

        The variables of a set specified in categorical() are added to the
        indepvars (unless at least one of the variables of the set already
        appears in indepvars), omitting the first variable of the set to
        prevent collinearity for model estimation (i.e. the first variable is
        used to represent the base category). Change the order of the
        variables or explicitly specify the desired terms in indepvars to
        change the base category.

        The deviation contrast transform can also be applied to interactions
        between a categorical and a continuous variable. Specify the
        continuous variable in parentheses at the end of the list in this
        case, i.e.

            varlist (varname) [, ... ]

        and also include a list for the main effects. Example:

            categorical(d1 d2 d3, xd1 xd2 xd3 (x))

        where x is the continuous variable, and d1 etc. and xd1 etc. are the
        main effects and interaction effects.
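
        A sketch of the full workflow, assuming the data contain a
        categorical variable isco and no other variables whose names begin
        with occ (the variable name and the stub occ are assumptions made
        for illustration):

            . tabulate isco, generate(occ) nofreq
            . oaxaca9 lnwage educ exper tenure, by(female) categorical(occ*)

        Because none of the occ variables appears in indepvars, the set
        should be added to the models automatically, with the first
        variable (occ1) representing the base category.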

        +--------+
    ----+ SE/SVY +-----------------------------------------------------------

    svy[([vcetype] [, svy_options])] executes oaxaca9 while accounting for
        the survey settings identified by svyset (this is essentially
        equivalent to applying the svy prefix command, although the svy
        prefix is not allowed with oaxaca9 due to some technical issues).
        vcetype and svy_options are as described in help svy.

    vce(vcetype) specifies the type of standard errors reported. vcetype may
        be analytic (the default), robust, cluster clustvar, bootstrap, or
        jackknife; see [R] vce_option.

    cluster(varname) adjusts standard errors for intragroup correlation; this
        is Stata 9 syntax for vce(cluster varname).

    fixed[(varlist)] identifies fixed regressors (all regressors if specified
        without an argument; experimental factors are an example of fixed
        regressors). The default is to treat regressors as stochastic.
        Stochastic regressors inflate the standard errors of the
        decomposition components.

    suest[(name)] forces the use of suest to obtain the covariances between
        the models/groups. suest is implied by pooled, omega, reference(),
        svy, vce(cluster), and cluster(). Specify suest(name) to store
        suest's estimation results under the given name using estimates
        store. nosuest prevents the use of suest; this may bias the standard
        errors.

    nose suppresses the computation of standard errors.

        +------------------+
    ----+ Model estimation +-------------------------------------------------

    model1(model_opts) and model2(model_opts) specify the estimation details
        for the two group-specific models. The syntax for model_opts is

            [estcom] [, store(name) addrhs(spec) estcom_options ]

        where estcom is the estimation command to be used and estcom_options
        are options allowed by estcom. The default estimation command is
        regress. store(name) stores the model's estimation results under the
        given name using estimates store. addrhs(spec) adds spec to the
        right-hand side of the model; for example, use addrhs() to add extra
        variables to the model. Examples:

            model1(heckman, select(varlist_s) twostep)

            model1(ivregress 2sls, addrhs((varlist2=varlist_iv)))

        Technical notes:

          o oaxaca9 uses the first equation for the decomposition if a model
            contains multiple equations.

          o Coefficients that occur in only one of the models are assumed to
            be zero for the other group. It is important, however, that the
            associated variables contain non-missing values for all
            observations in both groups.
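
        A sketch of storing a group model for later inspection, assuming
        the example data from the Examples section (the name m1 is
        arbitrary):

            . oaxaca9 lnwage educ exper tenure, by(female) model1(regress, store(m1))
            . estimates replay m1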

    noisily displays the models' estimation output.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

    xb displays a table containing the regression coefficients and predictor
        values on which the decomposition is based.

    level(#) specifies the confidence level, as a percentage, for confidence
        intervals.  The default is level(95) or as set by set level.

    eform specifies that the results be displayed in exponentiated form.

    nolegend suppresses the legend for the regressor sets defined by the
        detail() option.


Examples

        . use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta

        . oaxaca9 lnwage educ exper tenure, by(female)

        . oaxaca9 lnwage educ exper tenure, by(female) weight(1)

        . oaxaca9 lnwage educ exper tenure, by(female) pooled

        . svyset [pw=wt]
        . oaxaca9 lnwage educ exper tenure, by(female) svy

        . oaxaca9 lnwage educ exper tenure, by(female) vce(bootstrap)


Saved Results

    Scalars   
      e(N)           number of observations
      e(N_1)         number of observations in Group 1
      e(N_2)         number of observations in Group 2
      e(N_clust)     number of clusters

    Macros    
      e(cmd)         oaxaca9
      e(depvar)      name of dependent variable
      e(by)          name of group variable
      e(group_1)     value of group variable for Group 1
      e(group_2)     value of group variable for Group 2
      e(title)       Blinder-Oaxaca decomposition
      e(model)       type of decomposition
      e(weights)     weights specified in the weight() option
      e(refcoefs)    equation name used in e(b0) for the reference
                       coefficients
      e(detail)      detail, if detailed results were requested
      e(legend)      regressor sets defined by the detail() option
      e(adjust)      names of adjustment variables
      e(fixed)       names of fixed variables
      e(suest)       suest, if suest was used
      e(wtype)       weight type
      e(wexp)        weight expression
      e(clustvar)    name of cluster variable
      e(vce)         vcetype specified in vce()
      e(vcetype)     title used to label Std. Err.
      e(properties)  b V

    Matrices  
      e(b)           decomposition results
      e(V)           variance-covariance matrix of decomposition results
      e(b0)          vector containing coefficients and X-values
      e(V0)          variance-covariance matrix of e(b0)

    Functions 
      e(sample)      marks estimation sample
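
    A sketch of inspecting the saved results after estimation, assuming the
    example data from the Examples section:

        . oaxaca9 lnwage educ exper tenure, by(female)
        . ereturn list
        . matrix list e(b)
        . display e(N_1) " " e(N_2)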


References

    Jann, Ben (2008). The Blinder-Oaxaca decomposition for linear regression
        models. The Stata Journal 8(4): 453-479.

    Working paper version available from: 
    http://ideas.repec.org/p/ets/wpaper/5.html


Author

    Ben Jann, ETH Zurich, jannb@ethz.ch

