help for oaxaca8

-------------------------------------------------------------------------------
A newer version of this software is available from the SSC Archive as oaxaca.
-------------------------------------------------------------------------------

Decomposition of outcome differentials

oaxaca8 est1 est2 [, common_options oaxaca8_options ]

oaxaca2 varlist [weight] [if exp] [in range] , by(groupvar) [ common_options oaxaca2_options ]

common_options            Description
----------------------------------------------------------------------
weight(wgt [wgt ...])     specify weights for the two-fold decomposition;
                            wgt is # or omega
detail[(dlist)]           display detailed results for the regressors
adjust(varlist)           adjustment for selection variables
fixed[(varlist)]          assume fixed regressors
level(#)                  set the confidence level
eform                     display results in exponentiated form
tf                        display three-fold decomposition
nose                      suppress computation of standard errors
esave                     save results in e()
----------------------------------------------------------------------
where dlist is name = varlist [, name = varlist ...]

oaxaca8_options           Description
----------------------------------------------------------------------
reference(ref [ref ...])  specify reference estimates
asis                      do not change the order of the models
----------------------------------------------------------------------

oaxaca2_options           Description
----------------------------------------------------------------------
by(groupvar)              specifies the groups; by() is not optional
pooled                    request decomposition based on pooled model
includeby                 include groupvar in the pooled model
noisily                   display model estimates
cmd(cmd [cmd ...])        set the estimation command, default: regress
cmdopts(opts [opts ...])  options for model estimation
addvars(vars [vars ...])  additional regressors for individual models
----------------------------------------------------------------------
aweights, fweights, iweights, and pweights are allowed with oaxaca2
(depending on the estimation command used); see help weight.


Given the results from two models previously estimated and stored by estimates store, oaxaca8 computes the so-called Blinder-Oaxaca decomposition of the mean outcome differential. An example is the decomposition of the gender wage gap into an "explained" portion due to differences in endowments and an "unexplained" portion due to differences in coefficients. est1 is the name of the stored estimates for the first group (e.g. males), and est2 is the name of the stored estimates for the second group (e.g. females).

oaxaca8 can display different variants of the decomposition and also provides standard errors. See the methods and formulas section for details.

oaxaca2 is a wrapper for oaxaca8. It first estimates the group models and then performs the decomposition. oaxaca2 is suitable for use with bootstrap (also see the esave option).

oaxaca8 requires Stata 8.2 or higher. A Stata 7 decomposition package is available from the SSC Archive as decompose. Also see decomp by Ian Watson. Packages to compute decompositions of changes in outcome differentials are smithwelch and jmpierce.


    +----------------+
----+ common_options +---------------------------------------------------

weight(wgt [wgt ...]), where wgt is either # or omega, specifies the weight given to the parameters of the high-outcome group in the two-fold decomposition. A separate decomposition is computed for each wgt specified. For example, weight(0 1) displays one decomposition with the low-outcome group coefficients as the reference and another with the high-outcome group coefficients as the reference. Specifying weight(omega) causes oaxaca8 to compute the reference parameters from the data, as explained in the methods and formulas section. The weight(omega) option makes sense only in the context of OLS regression. Furthermore, note that with this decomposition the interpretation of the detailed results for the "unexplained" part (see the detail option) is problematic.
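For a scalar wgt, the two-fold decomposition uses the reference coefficients b* = w*b1 + (1-w)*b2 (the case W = wI in the methods and formulas section). The following is a minimal Python sketch of that arithmetic, not oaxaca8's implementation; the function twofold() and the toy means and coefficients are hypothetical, and group 1 is assumed to be the high-outcome group.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def twofold(x1, x2, b1, b2, w):
    """Two-fold decomposition with reference b* = w*b1 + (1-w)*b2.

    x1, x2: regressor means, constant included as 1.0
    b1, b2: coefficients of the high (1) and low (2) outcome group
    w:      weight given to the coefficients of group 1
    Returns (explained Q, unexplained U) with Q + U = x1'b1 - x2'b2.
    """
    bstar = [w * c1 + (1 - w) * c2 for c1, c2 in zip(b1, b2)]
    dx = [m1 - m2 for m1, m2 in zip(x1, x2)]
    explained = dot(dx, bstar)                            # Q = (x1-x2)'b*
    unexplained = dot(x1, b1) - dot(x2, b2) - explained   # U = R - Q
    return explained, unexplained


# Hypothetical means and coefficients: constant first, then education
x1, x2 = [1.0, 12.0], [1.0, 10.0]
b1, b2 = [1.0, 0.10], [0.8, 0.08]

for w in (1, 0.5, 0):  # analogous to weight(1 0.5 0)
    q, u = twofold(x1, x2, b1, b2, w)
    print(f"w={w}: explained={q:.3f} unexplained={u:.3f}")
```

With these numbers R = 0.6; w=1 puts 0.200 into the explained part and w=0 puts 0.160 there, which corresponds to Q=E+CE and Q=E in the methods and formulas section.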

detail[(dlist)] requests that the detailed decomposition results for the individual regressors be reported. Use dlist to sum up the results for specific groups of regressors under a common name (variables not appearing in dlist are listed individually). The usual shorthand conventions apply to the varlists specified in dlist (see help varlist). For example, if the models contain exp (experience) and exp2 (experience squared), specify detail(exp=exp*) to report their joint contribution under the name exp.

A cautionary note: For the "unexplained" part of the differential, the subdivision into separate contributions is sensitive to locational transformations of the regressors (see, e.g., Oaxaca and Ransom 1999). The results are thus arbitrary unless the regressors have natural zero points. A related problem is that the results for categorical variables depend on the choice of the reference category. A solution to the reference category problem is provided by the devcon package from the SSC Archive.
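A minimal Python sketch of the detailed results and of the cautionary note above. It assumes the per-regressor split of the E and C terms of the three-fold decomposition (group 2 coefficients and means as weights), which may not match the exact layout of oaxaca8's detailed output; detailed(), subsume(), shift_regressor(), and all numbers are hypothetical, and Python's fnmatchcase wildcards stand in for Stata's varlist shorthand.

```python
from fnmatch import fnmatchcase


def detailed(names, x1, x2, b1, b2):
    """Per-regressor endowment and coefficient contributions:
    E_k = (x1k - x2k)*b2k and C_k = x2k*(b1k - b2k)."""
    endow = {n: (m1 - m2) * c2 for n, m1, m2, c2 in zip(names, x1, x2, b2)}
    coef = {n: m2 * (c1 - c2) for n, m2, c1, c2 in zip(names, x2, b1, b2)}
    return endow, coef


def subsume(contrib, groups):
    """Sum contributions within groups {group_name: wildcard_pattern}, in the
    spirit of detail(exp=exp*); unmatched regressors are kept individually."""
    out, used = {}, set()
    for gname, pattern in groups.items():
        members = [n for n in contrib if fnmatchcase(n, pattern)]
        out[gname] = sum(contrib[n] for n in members)
        used.update(members)
    out.update({n: v for n, v in contrib.items() if n not in used})
    return out


def shift_regressor(names, x1, x2, b1, b2, var, c):
    """Replace var by var + c in both groups. In a linear model with a
    constant, the slopes stay the same and each intercept absorbs -c*slope."""
    k, i = names.index(var), names.index("_cons")
    x1s, x2s, b1s, b2s = list(x1), list(x2), list(b1), list(b2)
    x1s[k] += c
    x2s[k] += c
    b1s[i] -= c * b1[k]
    b2s[i] -= c * b2[k]
    return x1s, x2s, b1s, b2s


# Hypothetical means and coefficients
names = ["_cons", "educ", "exp", "exp2"]
x1 = [1.0, 12.0, 20.0, 500.0]
x2 = [1.0, 11.0, 15.0, 300.0]
b1 = [1.0, 0.08, 0.04, -0.0006]
b2 = [0.9, 0.07, 0.03, -0.0005]

endow, coef = detailed(names, x1, x2, b1, b2)
print(subsume(endow, {"exp": "exp*"}))  # exp and exp2 reported jointly

# Measure educ as years beyond 8: the educ and _cons parts of the
# coefficient component change, but their total does not.
_, coef_shift = detailed(names, *shift_regressor(names, x1, x2, b1, b2, "educ", -8.0))
print(coef["educ"], coef_shift["educ"])
print(sum(coef.values()), sum(coef_shift.values()))
```

Shifting educ by a constant moves 0.08 between the educ and _cons entries of the coefficient component (0.11 versus 0.03 for educ) while the total stays the same, which is exactly the arbitrariness described in the cautionary note.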

adjust(varlist) may be used to adjust the outcome differential for the effects of certain variables (e.g. selection variables) before computing the decomposition.

fixed[(varlist)] indicates that certain regressors are fixed. The default is to treat all regressors as stochastic. If fixed is specified without arguments, all regressors are assumed to be fixed. Using this option has implications for the computation of the standard errors of the decomposition components.

level(#) specifies the confidence level, in percent terms, for the confidence intervals of the computed statistics; see help level.

eform causes the results to be displayed in exponentiated form.
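For a log outcome such as lnwage, the components add up on the log scale (R = E + C + CE), so after exponentiation they combine multiplicatively: exp(R) = exp(E)*exp(C)*exp(CE). A short Python check of that identity with hypothetical component values; it illustrates the arithmetic only and does not reproduce oaxaca8's output (math.prod requires Python 3.8 or later).

```python
import math

# Hypothetical three-fold components of a log-wage differential
components = {"E": 0.16, "C": 0.40, "CE": 0.04}
R = sum(components.values())

# Exponentiate each component separately, as in an eform-style display
ratios = {name: math.exp(v) for name, v in components.items()}

# The exponentiated parts multiply (rather than add) to exp(R)
product = math.prod(ratios.values())
print(ratios)
print(math.exp(R), product)
```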

tf specifies that the three-fold decomposition be displayed in any case, i.e. even if two-fold decompositions are requested via weight() or reference().

nose suppresses the calculation of standard errors.

esave specifies that the results be returned in e(). This is useful, e.g., if you want to use bootstrap with oaxaca8. Note that the off-diagonal elements of e(V) are set to zero because oaxaca8 does not provide the covariances among the decomposition components. Therefore, do not apply lincom or similar techniques to the returned results, and do not use predict.

    +-----------------+
----+ oaxaca8_options +--------------------------------------------------

reference(ref1 [ref2 ...]) specifies reference estimates to be used with the two-fold decomposition. ref1, ref2, etc. refer to the names of the stored models. A separate decomposition is computed for each model specified. Note that no standard errors will be computed for the "unexplained" part in these decompositions.

asis instructs oaxaca8 not to change the order of the models. By default, oaxaca8 rearranges the models so that the mean differential is positive.

    +-----------------+
----+ oaxaca2_options +--------------------------------------------------

by(groupvar) defines the groups between which the decomposition is to be performed. groupvar must take on exactly two distinct values.

pooled displays a decomposition based on a pooled model over both groups.

includeby specifies that groupvar (see the by() option) be included as a control variable in the pooled model.

noisily causes the estimates of the individual models to be displayed.

cmd(cmd [cmd ...]) specifies the estimation commands for the models (see estcom). The default command is regress. For example, specify cmd(ivreg) to use ivreg instead. Specify more than one command if different commands are to be used for the two groups. For example, cmd(regress ivreg) would use regress for the first group and ivreg for the second.

cmdopts("opts" ["opts" ...]) may be used to specify sets of options for the model estimation commands. opts must be enclosed in quotes if it contains spaces. If only one set of options is specified, it is added to all models. For example, specify cmdopts("robust nocons") to add the options robust and nocons to all models. Alternatively, cmdopts("robust nocons" "hc3") would add robust nocons to the first model and hc3 to the second. Finally, cmdopts("hc3" "") would add hc3 to the first model and nothing to the second.
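The rule shared by cmd() and cmdopts() (a single entry is applied to every model; otherwise the entries are matched to the models in order) can be sketched in Python as follows. distribute() is a hypothetical helper, and raising an error when the number of entries matches neither case is an assumption, not documented oaxaca2 behavior.

```python
def distribute(sets, n_models=2):
    """Map command or option sets to models: a single set is recycled for
    every model, otherwise the i-th set goes to the i-th model.
    Raising on any other length is an assumption of this sketch."""
    if len(sets) == 1:
        return list(sets) * n_models
    if len(sets) != n_models:
        raise ValueError(f"expected 1 or {n_models} sets, got {len(sets)}")
    return list(sets)


print(distribute(["robust nocons"]))         # robust nocons for both models
print(distribute(["robust nocons", "hc3"]))  # first model, second model
print(distribute(["hc3", ""]))               # hc3 for the first model only
print(distribute(["regress", "ivreg"]))      # like cmd(regress ivreg)
```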

addvars("vars" ["vars" ...]) specifies additional variables to be added to individual models. For example, addvars("" "lambda") would add variable lambda to the second model.


Step 1: Estimate and store the models

. regress lnwage educ exp exp2 if female==0
. estimates store male
. regress lnwage educ exp exp2 if female==1
. estimates store female

Step 2: Compute the decomposition

- three-fold decomposition (endowments, coefficients, interaction)

. oaxaca8 male female

- various parametrizations of the two-fold decomposition (explained, unexplained)

. oaxaca8 male female, weight(1 0.5 0 omega)

Usage of oaxaca2: steps 1 and 2 in one command

. oaxaca2 lnwage educ exp exp2, by(female)

Bootstrapping (Stata 8)

. bs "oaxaca2 lnwage educ exp exp2, by(female) esave nose" _b

Bootstrapping (Stata 9)

. bootstrap _b: oaxaca2 lnwage educ exp exp2, by(female) esave nose

(Note that the nose option in the bootstrap examples is not essential. However, bootstrap executes faster if nose is specified.)

Saved Results

oaxaca8 saves in r():


Scalars
  r(pred1)       mean linear prediction from first group
  r(se_pred1)    standard error of prediction from first group
  r(pred2)       mean linear prediction from second group
  r(se_pred2)    standard error of prediction from second group
  r(diff)        difference between mean predictions
  r(se_diff)     standard error of difference


Matrices
  r(D)           results of the decompositions
  r(VD)          variances of the results in r(D)
  r(B1)          coefficients from the first model
  r(VB1)         variance-covariance matrix from the first model
  r(B2)          coefficients from the second model
  r(VB2)         variance-covariance matrix from the second model
  r(X1)          means of the regressors for the first group
  r(VX1)         variance-covariance matrix of the means of the regressors
                   for the first group
  r(X2)          means of the regressors for the second group
  r(VX2)         variance-covariance matrix of the means of the regressors
                   for the second group

If esave is specified, oaxaca8 additionally saves in e():


Scalars
  e(N)           total number of cases
  e(N1)          number of cases in first group
  e(N2)          number of cases in second group
  e(pred1)       mean linear prediction from first group
  e(se_pred1)    standard error of prediction from first group
  e(pred2)       mean linear prediction from second group
  e(se_pred2)    standard error of prediction from second group
  e(diff)        difference between mean predictions
  e(se_diff)     standard error of difference


Macros
  e(cmd)         "oaxaca8"


Matrices
  e(b)           decomposition results
  e(V)           variances of decomposition results (covariances set to 0)


Functions
  e(sample)      estimation sample

Methods and Formulas

The three-fold decomposition

The following linear models are given:

Y1 = X1b1 + e1
Y2 = X2b2 + e2

for some outcome variable Y in two groups 1 and 2. As long as E(e1)=E(e2)=0, the mean outcome difference between the two groups can be decomposed as

R = x1'b1 - x2'b2 = (x1-x2)'b2 + x2'(b1-b2) + (x1-x2)'(b1-b2) = E + C + CE

where x1 and x2 are the vectors of means of the regressors (including the constants) for the two groups (e.g. see Winsborough and Dickinson 1971, Jones and Kelley 1984, Daymont and Andrisani 1984). In other words, R is decomposed into a part due to differences in endowments (E), a part due to differences in coefficients, including the intercept (C), and a third part due to the interaction between coefficients and endowments (CE).
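A minimal Python sketch of the three-fold decomposition, assuming the means x1, x2 and coefficients b1, b2 are already available (for example from r(X1), r(X2), r(B1), and r(B2)); threefold() and the toy numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def threefold(x1, x2, b1, b2):
    """Return (E, C, CE) such that x1'b1 - x2'b2 = E + C + CE."""
    dx = [m1 - m2 for m1, m2 in zip(x1, x2)]
    db = [c1 - c2 for c1, c2 in zip(b1, b2)]
    E = dot(dx, b2)    # endowments, valued at group-2 coefficients
    C = dot(x2, db)    # coefficients incl. intercept, at group-2 means
    CE = dot(dx, db)   # interaction of both differences
    return E, C, CE


# Hypothetical means (constant first) and coefficients
x1, x2 = [1.0, 12.0], [1.0, 10.0]
b1, b2 = [1.0, 0.10], [0.8, 0.08]

E, C, CE = threefold(x1, x2, b1, b2)
R = dot(x1, b1) - dot(x2, b2)
print(f"E={E:.3f} C={C:.3f} CE={CE:.3f} R={R:.3f}")
```

Here E=0.160, C=0.400, and CE=0.040, which sum to R=0.600.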

The two-fold decomposition

Depending on the model that is assumed to be the "true" model (i.e. the "absence-of-discrimination" model), the terms of the three-fold decomposition may be used to determine the "explained" (Q) and "unexplained" (U; e.g. discrimination) parts of the differential (the question is how to allocate the interaction term CE). Oaxaca (1973) proposed assuming either the low group model or the high group model as the no-discrimination model, which implies Q=E and U=C+CE, or Q=E+CE and U=C, respectively. More generally, the coefficients of the "true" model may be expressed as

b* = Wb1+(I-W)b2

where I is an identity matrix and W is a matrix of weights. Analogously, the decomposition may be written as

R = (x1-x2)'[Wb1+(I-W)b2] + [x1'(I-W)+x2'W](b1-b2)

In the two cases proposed by Oaxaca (1973), W is a null matrix or equals I, respectively (W=I is also suggested by Blinder 1973). Furthermore, W may be wI, where w is a scalar reflecting the weight given to the coefficients of the first group (Reimers 1983 proposed w=.5, Cotton 1988 proposed using the relative group size). Use the weight() option to specify w.

Alternatively, Neumark (1988) proposed using the coefficients from a pooled model for both groups, which implies that

W = diag(b*-b2) diag(b1-b2)^-1

and

R = (x1-x2)'b* + [x1'(b1-b*)+x2'(b*-b2)]

where b* is the vector of the coefficients from the pooled model. However, other coefficient vectors may also make sense. Use the reference() option to specify such a reference model.
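The same quantities for an arbitrary reference vector b*, together with the diagonal weights it implies, in a minimal Python sketch. The pooled coefficients bstar below are invented for illustration rather than estimated; reference_twofold() and implied_weights() are hypothetical helpers.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def reference_twofold(x1, x2, b1, b2, bstar):
    """Q = (x1-x2)'b*  and  U = x1'(b1-b*) + x2'(b*-b2)."""
    Q = dot([m1 - m2 for m1, m2 in zip(x1, x2)], bstar)
    U = (dot(x1, [c1 - cs for c1, cs in zip(b1, bstar)])
         + dot(x2, [cs - c2 for cs, c2 in zip(bstar, b2)]))
    return Q, U


def implied_weights(b1, b2, bstar):
    """Diagonal of W = diag(b*-b2) diag(b1-b2)^-1; needs b1k != b2k."""
    return [(cs - c2) / (c1 - c2) for c1, c2, cs in zip(b1, b2, bstar)]


x1, x2 = [1.0, 12.0], [1.0, 10.0]
b1, b2 = [1.0, 0.10], [0.8, 0.08]
bstar = [0.85, 0.09]  # hypothetical pooled-model coefficients

Q, U = reference_twofold(x1, x2, b1, b2, bstar)
w = implied_weights(b1, b2, bstar)
print(Q, U, Q + U)  # Q + U equals R = x1'b1 - x2'b2
print(w)            # a separate weight for each coefficient
```

Unlike the scalar weight() case, the implied weights differ across coefficients (0.25 for the constant and 0.5 for the slope here).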

In the context of OLS regression, the method proposed by Neumark is equivalent to using the weighting matrix

W = (X1'X1 + X2'X2)^-1 (X1'X1)

where X1 and X2 are the matrices of observed values for the two samples (Oaxaca and Ransom 1994). This approach is implemented via the weight(omega) option.
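The equivalence follows from X1'y1 = X1'X1 b1 and X2'y2 = X2'X2 b2, so the pooled OLS vector (X1'X1 + X2'X2)^-1 (X1'y1 + X2'y2) equals Wb1 + (I-W)b2. The pure-Python sketch below (no linear-algebra library assumed) checks this numerically; its pooled regression contains only the common regressors and no group indicator. The data and helper names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * pv for v, pv in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(X, y):
    """OLS coefficients b = (X'X)^-1 X'y."""
    Xt = transpose(X)
    return matvec(inverse(matmul(Xt, X)), matvec(Xt, y))


def omega(X1, X2):
    """W = (X1'X1 + X2'X2)^-1 (X1'X1)."""
    A1 = matmul(transpose(X1), X1)
    A2 = matmul(transpose(X2), X2)
    S = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return matmul(inverse(S), A1)


# Hypothetical data: constant and years of schooling
X1 = [[1.0, e] for e in (10, 12, 14, 16)]
y1 = [2.0, 2.3, 2.5, 2.9]
X2 = [[1.0, e] for e in (9, 10, 12)]
y2 = [1.7, 1.9, 2.1]

b1, b2 = ols(X1, y1), ols(X2, y2)
W = omega(X1, X2)
k = len(b1)
I_minus_W = [[float(i == j) - W[i][j] for j in range(k)] for i in range(k)]
bstar_w = [a + b for a, b in zip(matvec(W, b1), matvec(I_minus_W, b2))]
bstar_pooled = ols(X1 + X2, y1 + y2)
print(bstar_w, bstar_pooled)  # agree up to rounding
```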

Standard errors

The variances/standard errors of the components are computed according to the method detailed in Jann (2005). For the case of fixed regressors, also see Oaxaca and Ransom (1998). The variances and covariances of the coefficients are taken from the e(V) matrices of the models. The variance-covariance matrices of the means of the regressors in the models are estimated according to standard formulas (cross-product matrix of deviations divided by N*(N-1)) unless pweights or clusters are applied or a specific survey design is set (see help svyset). In the latter cases, the variance-covariance matrices are estimated using the svymean command. Note that standard errors cannot be computed for the U term if the non-discriminating coefficients are taken from a reference model specified via the reference() option. Use bootstrap to derive the standard errors in this case.
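A minimal Python sketch of the standard formula for the variance-covariance matrix of the regressor means, plus a first-order approximation of Var(E) for E = (x1-x2)'b2 that treats the two groups as independent: (x1-x2)'V(b2)(x1-x2) + b2'[V(x1)+V(x2)]b2. That Var(E) expression is an assumption in the spirit of Jann (2005), not a transcription of oaxaca8's code; with fixed regressors, V(x1) and V(x2) are zero and only the first term remains. The data and V(b2) are hypothetical.

```python
def mean_cov(X):
    """Variance-covariance matrix of the column means of X:
    cross-product matrix of deviations divided by N*(N-1)."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    dev = [[row[j] - means[j] for j in range(k)] for row in X]
    return [[sum(d[a] * d[b] for d in dev) / (n * (n - 1)) for b in range(k)]
            for a in range(k)]


def quad(v, A):
    """Quadratic form v'Av."""
    return sum(v[i] * A[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


def var_endowments(x1, x2, b2, Vb2, Vx1, Vx2):
    """Approximate Var(E) for E = (x1-x2)'b2 (independent groups assumed)."""
    dx = [m1 - m2 for m1, m2 in zip(x1, x2)]
    Vx = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(Vx1, Vx2)]
    return quad(dx, Vb2) + quad(b2, Vx)


# Hypothetical data: constant and years of schooling
X1 = [[1.0, e] for e in (10, 12, 14, 16)]
X2 = [[1.0, e] for e in (9, 10, 12)]
x1 = [sum(col) / len(X1) for col in zip(*X1)]
x2 = [sum(col) / len(X2) for col in zip(*X2)]
b2 = [0.8, 0.08]
Vb2 = [[0.04, -0.003], [-0.003, 0.0003]]  # hypothetical e(V) of group 2

Vx1, Vx2 = mean_cov(X1), mean_cov(X2)
print(Vx1)  # zero row and column for the constant
var_stochastic = var_endowments(x1, x2, b2, Vb2, Vx1, Vx2)
zero = [[0.0, 0.0], [0.0, 0.0]]
var_fixed = var_endowments(x1, x2, b2, Vb2, zero, zero)  # fixed regressors
print(var_stochastic ** 0.5, var_fixed ** 0.5)
```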

Selection models

Assume that a selection variable S appears in the models. If the variable is marked by specifying adjust(S), the differential will be adjusted for selection, i.e.

R_s = x1'b1 - x2'b2 - (s1bs1 - s2bs2)

where s1 and s2 are the group means of S and bs1 and bs2 are its coefficients; oaxaca8 then decomposes R_s instead of R. Note that it is not necessary to use the adjust option if the models were estimated with heckman. See Dolton and Makepeace (1986) or Neuman and Oaxaca (2004) for more sophisticated approaches to dealing with selection.
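A minimal Python sketch of the adjustment, assuming the selection variable (here a hypothetical regressor named lambda, e.g. an inverse Mills ratio from a first-stage model) is part of both models; adjusted_differential() and the numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def adjusted_differential(names, x1, x2, b1, b2, adjust):
    """R_s = x1'b1 - x2'b2 - sum over S in adjust of (s1*bs1 - s2*bs2)."""
    R = dot(x1, b1) - dot(x2, b2)
    for s in adjust:
        k = names.index(s)
        R -= x1[k] * b1[k] - x2[k] * b2[k]
    return R


# Hypothetical models that include a selection term named lambda
names = ["_cons", "educ", "lambda"]
x1, x2 = [1.0, 12.0, 0.4], [1.0, 10.0, 0.9]
b1, b2 = [1.0, 0.10, -0.2], [0.8, 0.08, -0.3]

R_raw = dot(x1, b1) - dot(x2, b2)
R_s = adjusted_differential(names, x1, x2, b1, b2, ["lambda"])
print(R_raw, R_s)
```

Here the raw differential R = 0.79 shrinks to R_s = 0.60 once the contribution of lambda is removed.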

If a specific regressor (or a selection variable) appears only in one model, the corresponding coefficient and the mean of the regressor will be set to zero for the other group.
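A minimal Python sketch of this zero-filling rule, matching regressors by name; align() and the example models are hypothetical.

```python
def align(est1, est2):
    """Align two models given as {name: (mean, coef)} on the union of their
    regressors; a regressor missing from one model gets mean 0 and
    coefficient 0 for that group."""
    names = list(est1) + [n for n in est2 if n not in est1]
    x1 = [est1.get(n, (0.0, 0.0))[0] for n in names]
    b1 = [est1.get(n, (0.0, 0.0))[1] for n in names]
    x2 = [est2.get(n, (0.0, 0.0))[0] for n in names]
    b2 = [est2.get(n, (0.0, 0.0))[1] for n in names]
    return names, x1, x2, b1, b2


# Hypothetical models: lambda enters only the second one, as with
# addvars("" "lambda")
male = {"_cons": (1.0, 1.0), "educ": (12.0, 0.10)}
female = {"_cons": (1.0, 0.8), "educ": (10.0, 0.08), "lambda": (0.9, -0.3)}

names, x1, x2, b1, b2 = align(male, female)
print(names)   # ['_cons', 'educ', 'lambda']
print(x1, b1)  # lambda entries are 0.0 for the first group
```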


References

Blinder, A.S. (1973). Wage Discrimination: Reduced Form and Structural Estimates. The Journal of Human Resources 8: 436-455.
Cotton, J. (1988). On the Decomposition of Wage Differentials. The Review of Economics and Statistics 70: 236-243.
Daymont, T.N., Andrisani, P.J. (1984). Job Preferences, College Major, and the Gender Gap in Earnings. The Journal of Human Resources 19: 408-428.
Dolton, P.J., Makepeace, G.H. (1986). Sample Selection and Male-Female Earnings Differentials in the Graduate Labour Market. Oxford Economic Papers 38: 317-341.
Jann, B. (2005). Standard Errors for the Blinder–Oaxaca Decomposition. http://repec.org/dsug2005/oaxaca_se_handout.pdf.
Jones, F.L., Kelley, J. (1984). Decomposing Differences Between Groups. A Cautionary Note on Measuring Discrimination. Sociological Methods and Research 12: 323-343.
Neuman, S., Oaxaca, R.L. (2004). Wage decompositions with selectivity-corrected wage equations: A methodological note. Journal of Economic Inequality 2: 3-10.
Neumark, D. (1988). Employers' Discriminatory Behavior and the Estimation of Wage Discrimination. The Journal of Human Resources 23: 279-295.
Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14: 693-709.
Oaxaca, R.L., Ransom, M.R. (1994). On discrimination and the decomposition of wage differentials. Journal of Econometrics 61: 5-21.
Oaxaca, R.L., Ransom, M.R. (1998). Calculation of approximate variances for wage decomposition differentials. Journal of Economic and Social Measurement 24: 55-61.
Oaxaca, R.L., Ransom, M.R. (1999). Identification in Detailed Wage Decompositions. The Review of Economics and Statistics 81: 154-157.
Reimers, C.W. (1983). Labor Market Discrimination Against Hispanic and Black Men. The Review of Economics and Statistics 65: 570-579.
Winsborough, H.H., Dickinson, P. (1971). Components of Negro-White Income Differences. Proceedings of the American Statistical Association, Social Statistics Section: 6-8.


Ben Jann, ETH Zurich, jannb@ethz.ch

Also see

Online: help for regress, estimates, heckman, devcon (if installed),