help oaxaca9


oaxaca9 -- Blinder-Oaxaca decomposition of outcome differentials


oaxaca9 depvar [indepvars] [if] [in] [weight] , by(groupvar) [ options ]

options                      Description
-------------------------------------------------------------------------
Main
  by(groupvar)               specifies the groups; by() is required
  swap                       swap groups
  detail[(dlist)]            display detailed decomposition
  adjust(varlist)            adjustment for selection variables

Decomposition type
  threefold[(reverse)]       three-fold decomposition; the default
  weight(# [# ...])          two-fold decomposition based on specified weights
  pooled[(model_opts)]       two-fold decomposition based on pooled model
                               including groupvar
  omega[(model_opts)]        two-fold decomposition based on pooled model
                               excluding groupvar
  reference(name)            two-fold decomposition based on stored model
  split                      split unexplained part of two-fold decomposition

X-Values
  x1(names_and_values)       provide custom X-values for Group 1
  x2(names_and_values)       provide custom X-values for Group 2
  categorical(clist)         identify dummy variable sets and apply deviation
                               contrast transform

SE/SVY
  svy[(svyspec)]             survey data estimation
  vce(vcetype)               vcetype may be analytic, robust, cluster clustvar,
                               bootstrap, or jackknife
  cluster(varname)           adjust standard errors for intragroup correlation
                               (Stata 9)
  fixed[(varlist)]           assume non-stochastic regressors
  suest[(name)] | nosuest    do/do not use suest to obtain joint variance matrix
  nose                       suppress computation of standard errors

Models
  model1(model_opts)         estimation details for the Group 1 model
  model2(model_opts)         estimation details for the Group 2 model
  noisily                    display model estimation output

Reporting
  xb                         display table with coefficients and means
  level(#)                   set confidence level; default is level(95)
  eform                      report exponentiated results
  nolegend                   suppress legend
-------------------------------------------------------------------------
bootstrap, by, jackknife, statsby, and xi are allowed; see prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
vce(), cluster(), and weights are not allowed with the svy option.
fweights, aweights, pweights, and iweights are allowed; see weight.


oaxaca9 computes the so-called Blinder-Oaxaca decomposition, which is often used to analyze wage gaps by sex or race. depvar is the outcome variable of interest (e.g. log wages) and indepvars are predictors (e.g. education, work experience, etc.). groupvar identifies the groups to be compared. For methods and formulas see Jann (2008).

oaxaca9 typed without arguments replays the last results, optionally applying xb, level(), eform, or nolegend.
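As a minimal sketch of replaying results (the variable names follow the examples in this help file and are assumed to exist in the data in memory):

    . oaxaca9 lnwage educ exper tenure, by(female)
    . * redisplay the stored results, adding the coefficients/means table and 90% intervals
    . oaxaca9, xb level(90)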


    +------+
----+ Main +-------------------------------------------------------------

by(groupvar) specifies the groupvar that defines the two groups that will be compared. by() is required.

swap reverses the order of the groups.

detail[(dlist)] requests that the detailed results for the individual predictors be reported. Use dlist to subsume the results for sets of regressors (results for variables not appearing in dlist are listed individually). The syntax for dlist is

name:varlist [, name:varlist ...]

The usual shorthand conventions apply to the varlists specified in dlist (see help varlist; additionally, _cons is allowed). For example, specify detail(exp:exp*) to subsume exp (experience) and exp2 (experience squared). name is any valid Stata name and labels the set.
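For instance, assuming the data contain exper and a hypothetical squared term exper2, the two experience terms could be reported as one set labeled experience, while educ and tenure are listed individually:

    . * exper2 is a hypothetical variable holding experience squared
    . oaxaca9 lnwage educ exper exper2 tenure, by(female) detail(experience:exper exper2)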

adjust(varlist) causes the differential to be adjusted by the contribution of the specified variables before performing the decomposition. This is useful, for example, if the specified variables are selection terms. Note that adjust() is not needed for heckman models.
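A sketch of how adjust() might be used, assuming a selection term has already been generated by the user and is included among the predictors (mills is a hypothetical variable name, not something oaxaca9 creates):

    . * mills is a hypothetical, previously generated selection term
    . oaxaca9 lnwage educ exper tenure mills, by(female) adjust(mills)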

    +--------------------+
----+ Decomposition type +-----------------------------------------------

threefold[(reverse)] computes the three-fold decomposition. This is the default unless weight(), pooled, omega, or reference() is specified. The decomposition is expressed from the viewpoint of Group 2. Specify threefold(reverse) to express the decomposition from the viewpoint of Group 1.

weight(# [# ...]) computes the two-fold decomposition where # [# ...] are the weights given to Group 1 relative to Group 2 in determining the reference coefficients (weights are recycled if there are more coefficients than weights). For example, weight(1) uses the Group 1 coefficients as the reference coefficients, weight(0) uses the Group 2 coefficients.
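Weights between 0 and 1 mix the two coefficient vectors. As an illustration, equal weights take the average of the Group 1 and Group 2 coefficients as the reference coefficients:

    . * weight(0.5): Group 1 and Group 2 coefficients contribute equally
    . oaxaca9 lnwage educ exper tenure, by(female) weight(0.5)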

pooled[(model_opts)] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients. groupvar is included in the pooled model as an additional control variable. Estimation details may be specified in parentheses; see the model1() option below.

omega[(model_opts)] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients (without including groupvar as a control variable in the pooled model). Estimation details may be specified in parentheses; see the model1() option below.

reference(name) computes the two-fold decomposition using the coefficients from a stored model. name is the name under which the model was stored; see estimates store. Do not combine the reference() option with bootstrap or jackknife methods.
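A sketch of the workflow, assuming the reference model is fitted and stored before oaxaca9 is called. The model shown and the name refmodel are purely illustrative; which model is an appropriate reference is a substantive choice:

    . * illustrative reference model estimated on all observations
    . regress lnwage educ exper tenure
    . estimates store refmodel
    . oaxaca9 lnwage educ exper tenure, by(female) reference(refmodel)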

split causes the "unexplained" component in the two-fold decomposition to be split into a part related to Group 1 and a part related to Group 2. split is effective only if specified with weight(), pooled, omega, or reference().

Only one of threefold, weight(), pooled, omega, and reference() is allowed.

    +----------+
----+ X-Values +---------------------------------------------------------

x1(names_and_values) and x2(names_and_values) provide custom values for specific predictors to be used for Group 1 and Group 2 in the decomposition. The default is to use the group means of the predictors. The syntax for names_and_values is

varname [=] value [[,] varname [=] value ... ]

Example: x1(educ 12 exp 30)

categorical(clist) identifies sets of dummy variables representing categorical variables and transforms the coefficients so that the results of the decomposition are invariant to the choice of the (omitted) base category (deviation contrast transform). The syntax for clist is

varlist [, varlist ... ]

where each varlist must contain indicator (0/1) variables for all categories including the base category (that is, a base category indicator variable must exist in the data). To generate a suitable set of indicator variables use, for example,

tabulate catvar, generate(stubname) [ nofreq ]

where catvar is the categorical variable and the indicator variables will be named stubname1, stubname2, ... (nofreq may be used to suppress the frequency table; see help tabulate).

The variables of a set specified in categorical() are added to the indepvars (unless at least one of the variables of the set already appears in indepvars), omitting the first variable of the set to prevent collinearity for model estimation (i.e. the first variable is used to represent the base category). Change the order of the variables or explicitly specify the desired terms in indepvars to change the base category.
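Putting these steps together, a sketch under the assumption that region is a (hypothetical) categorical variable with four categories:

    . * region is hypothetical; tabulate creates the indicators reg1-reg4
    . tabulate region, generate(reg) nofreq
    . * reg1 is listed first, so it represents the base category in the models
    . oaxaca9 lnwage educ exper tenure, by(female) detail categorical(reg1 reg2 reg3 reg4)

Listing a different indicator first, for example categorical(reg4 reg1 reg2 reg3), would make that category the base category for estimation.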

The deviation contrast transform can also be applied to interactions between a categorical and a continuous variable. Specify the continuous variable in parentheses at the end of the list in this case, i.e.

varlist (varname) [, ... ]

and also include a list for the main effects. Example:

categorical(d1 d2 d3, xd1 xd2 xd3 (x))

where x is the continuous variable, and d1 etc. and xd1 etc. are the main effects and interaction effects.

    +--------+
----+ SE/SVY +-----------------------------------------------------------

svy[([vcetype] [, svy_options])] executes oaxaca9 while accounting for the survey settings identified by svyset (this is essentially equivalent to applying the svy prefix command, although the svy prefix is not allowed with oaxaca9 due to some technical issues). vcetype and svy_options are as described in help svy.

vce(vcetype) specifies the type of standard errors reported. vcetype may be analytic (the default), robust, cluster clustvar, bootstrap, or jackknife; see [R] vce_option.

cluster(varname) adjusts standard errors for intragroup correlation; this is Stata 9 syntax for vce(cluster clustvar).
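For example, assuming a hypothetical cluster identifier hhid, the two forms below request the same cluster adjustment:

    . * hhid is a hypothetical variable identifying the clusters
    . oaxaca9 lnwage educ exper tenure, by(female) vce(cluster hhid)
    . * Stata 9 syntax
    . oaxaca9 lnwage educ exper tenure, by(female) cluster(hhid)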

fixed[(varlist)] identifies fixed regressors (all regressors if specified without an argument); experimental factors are an example of fixed regressors. The default is to treat regressors as stochastic. Stochastic regressors inflate the standard errors of the decomposition components.

suest[(name)] forces the use of suest to obtain the covariances between the models/groups. suest is implied by pooled, omega, reference(), svy, vce(cluster), and cluster(). Specify suest(name) to store suest's estimation results under name; see estimates store. nosuest prevents the use of suest, which may cause biased standard errors.

nose suppresses the computation of standard errors.

    +------------------+
----+ Model estimation +-------------------------------------------------

model1(model_opts) and model2(model_opts) specify the estimation details for the two group-specific models. The syntax for model_opts is

[estcom] [, store(name) addrhs(spec) estcom_options ]

where estcom is the estimation command to be used and estcom_options are options allowed by estcom. The default estimation command is regress. store(name) stores the model's estimation results under name; see estimates store. addrhs(spec) adds spec to the "right-hand side" of the model. For example, use addrhs() to add extra variables to the model. Examples:

model1(heckman, select(varlist_s) twostep)

model1(ivregress 2sls, addrhs((varlist2=varlist_iv)))
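Embedded in a full command, such a specification might look as follows. This is a sketch only: lfp, age, married, and kids are hypothetical variables for the selection equation, and heck2 is an arbitrary name for the stored results.

    . * Group 2 model estimated by heckman; Group 1 uses the default (regress)
    . oaxaca9 lnwage educ exper tenure, by(female) model2(heckman, select(lfp = age married kids) twostep store(heck2))
    . * redisplay the stored Group 2 model
    . estimates replay heck2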

Technical notes:

o oaxaca9 uses the first equation for the decomposition if a model contains multiple equations.

o Coefficients that occur in only one of the models are assumed to be zero for the other group. It is important, however, that the associated variables contain non-missing values for all observations in both groups.

noisily displays the models' estimation output.

    +-----------+
----+ Reporting +--------------------------------------------------------

xb displays a table containing the regression coefficients and predictor values on which the decomposition is based.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

eform specifies that the results be displayed in exponentiated form.
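With a log outcome such as lnwage, exponentiated results can be read as ratios rather than log differences; a log difference of 0.2, for instance, corresponds to a ratio of exp(0.2), or about 1.22. A sketch, using the variables from the examples in this help file:

    . oaxaca9 lnwage educ exper tenure, by(female) eform
    . * or apply eform when replaying previously computed results
    . oaxaca9, eform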

nolegend suppresses the legend for the regressor sets defined by the detail() option.


. use

. oaxaca9 lnwage educ exper tenure, by(female)

. oaxaca9 lnwage educ exper tenure, by(female) weight(1)

. oaxaca9 lnwage educ exper tenure, by(female) pooled

. svyset [pw=wt]

. oaxaca9 lnwage educ exper tenure, by(female) svy

. oaxaca9 lnwage educ exper tenure, by(female) vce(bootstrap)

Saved Results

Scalars
  e(N)            number of observations
  e(N_1)          number of observations in Group 1
  e(N_2)          number of observations in Group 2
  e(N_clust)      number of clusters

Macros
  e(cmd)          oaxaca9
  e(depvar)       name of dependent variable
  e(by)           name of group variable
  e(group_1)      value of group variable for Group 1
  e(group_2)      value of group variable for Group 2
  e(title)        Blinder-Oaxaca decomposition
  e(model)        type of decomposition
  e(weights)      weights specified in the weight() option
  e(refcoefs)     equation name used in e(b0) for the reference coefficients
  e(detail)       detail, if detailed results were requested
  e(legend)       regressor sets defined by the detail() option
  e(adjust)       names of adjustment variables
  e(fixed)        names of fixed variables
  e(suest)        suest, if suest was used
  e(wtype)        weight type
  e(wexp)         weight expression
  e(clustvar)     name of cluster variable
  e(vce)          vcetype specified in vce()
  e(vcetype)      title used to label Std. Err.
  e(properties)   b V

Matrices
  e(b)            decomposition results
  e(V)            variance-covariance matrix of decomposition results
  e(b0)           vector containing coefficients and X-values
  e(V0)           variance-covariance matrix of e(b0)

Functions
  e(sample)       marks estimation sample
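The saved results can be inspected with standard Stata commands after estimation. A brief sketch, using the variables from the examples in this help file:

    . oaxaca9 lnwage educ exper tenure, by(female)
    . * list everything stored in e()
    . ereturn list
    . * decomposition results and group sizes
    . matrix list e(b)
    . display "Group 1: " e(N_1) "   Group 2: " e(N_2)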


Jann, Ben (2008). The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal 8(4): 453-479.

Working paper version available from:


Ben Jann, ETH Zurich,

Also see