-------------------------------------------------------------------------------
help for oaxaca8
-------------------------------------------------------------------------------

A newer version of this software is available from the SSC Archive as oaxaca.

-------------------------------------------------------------------------------

Decomposition of outcome differentials

oaxaca8 est1 est2 [, common_options oaxaca8_options]

oaxaca2 varlist [weight] [if exp] [in range] , by(groupvar) [common_options oaxaca2_options]

common_options             Description
----------------------------------------------------------------------
weight(wgt [wgt ...])      specify weights for the two-fold decomposition; wgt is # or omega
detail[(dlist)]            display detailed results for the regressors
adjust(varlist)            adjustment for selection variables
fixed[(varlist)]           assume fixed regressors
level(#)                   set the confidence level
eform                      display results in exponentiated form
tf                         display three-fold decomposition
nose                       suppress computation of standard errors
esave                      save results in e()
----------------------------------------------------------------------
where dlist is name=varlist [, name=varlist ...]

oaxaca8_options            Description
----------------------------------------------------------------------
reference(ref [ref ...])   specify reference estimates
asis                       do not change the order of the models
----------------------------------------------------------------------

oaxaca2_options            Description
----------------------------------------------------------------------
by(groupvar)               specifies the groups; by() is not optional
pooled                     request decomposition based on pooled model
includeby                  include groupvar in the pooled model
noisily                    display model estimates
cmd(cmd [cmd ...])         set the estimation command, default: regress
cmdopts(opts [opts ...])   options for model estimation
addvars(vars [vars ...])   additional regressors for individual models
----------------------------------------------------------------------
aweights, fweights, iweights, and pweights are allowed with oaxaca2 (depending on the estimation command used); see help weight.

Description

Given the results from two models previously estimated and stored by estimates store, oaxaca8 computes the so-called Blinder-Oaxaca decomposition of the mean outcome differential. An example is the decomposition of the gender wage gap into an "explained" portion due to differences in endowments and an "unexplained" portion due to differences in coefficients. est1 refers to the name of the stored estimates for the first group (e.g. males), est2 is the name of the stored estimates for the second group (e.g. females).

oaxaca8 can display different variants of the decomposition and also provides standard errors. See the methods and formulas section for details.

oaxaca2 is a wrapper for oaxaca8. It first estimates the group models and then performs the decomposition. oaxaca2 is suitable for use with bootstrap (also see the esave option).

oaxaca8 requires Stata 8.2 or higher. A Stata 7 decomposition package is available from the SSC Archive as decompose. Also see decomp by Ian Watson. Packages to compute decompositions of changes in outcome differentials are smithwelch and jmpierce.

    +----------------+
----+ common_options +---------------------------------------------------

weight(wgt [wgt ...]), where wgt is either # or omega, specifies the weight given to the parameters of the high-outcome group for the two-fold decomposition. A separate decomposition is computed for each specified wgt. For example, weight(0 1) displays a decomposition with the low-outcome group coefficients as reference and a decomposition with the high-outcome group coefficients as reference. Specifying weight(omega) causes oaxaca8 to compute the reference parameters from the data as explained in the methods and formulas section. The weight(omega) option makes sense only in the context of OLS regression. Furthermore, note that the interpretation of the detailed results for the "unexplained" part (see the detail option) is problematic with this decomposition.

detail[(dlist)] requests that the detailed decomposition results for the individual regressors be reported. Use dlist to subsume the results for specific groups of regressors (variables not appearing in dlist are listed individually). The usual shorthand conventions apply to the varlists specified in dlist (see help varlist). For example, specify detail(exp=exp*) if the models contain exp (experience) and exp2 (experience squared).

A cautionary note: For the "unexplained" part of the differential, the subdivision into separate contributions is sensitive to locational transformations of the regressors (see, e.g., Oaxaca and Ransom 1999). The results are thus arbitrary unless the regressors have natural zero points. A related problem is that the results for categorical variables depend on the choice of the reference category. A solution to the reference category problem is provided by the devcon package from the SSC Archive.
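The sensitivity to locational transformations can be illustrated with a small numeric sketch. It is written in Python rather than Stata, is not part of the package, and uses made-up intercepts, slopes, and means. It assumes the detailed "unexplained" part is computed as x2'(b1-b2), i.e. with the first group's coefficients as reference. Shifting the regressor moves the contribution between the regressor and the constant, while the total stays the same.

```python
# Hypothetical one-regressor models y = a + b*x for two groups (illustrative numbers).
a1, b1 = 1.0, 0.10   # group 1 intercept and slope
a2, b2 = 0.9, 0.08   # group 2 intercept and slope
m2 = 11.0            # group 2 mean of x


def detailed_unexplained(a1, b1, a2, b2, m2):
    """Split U = x2'(b1-b2) into the intercept part and the part attributed to x."""
    return a1 - a2, m2 * (b1 - b2)


# Contributions on the original scale of x.
const_part, x_part = detailed_unexplained(a1, b1, a2, b2, m2)

# Shift x by c (x_new = x - c): slopes are unchanged, intercepts absorb b*c.
c = 10.0
const_shift, x_shift = detailed_unexplained(
    a1 + b1 * c, b1, a2 + b2 * c, b2, m2 - c
)

print("before shift:", round(const_part, 4), round(x_part, 4))
print("after shift: ", round(const_shift, 4), round(x_shift, 4))
print("totals:      ", round(const_part + x_part, 4), round(const_shift + x_shift, 4))
```

The share attributed to x differs between the two parametrizations although both describe the same fitted models, which is why the detailed results need a natural zero point to be interpretable.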

adjust(varlist) may be used to adjust the outcome differential for the effects of certain variables (e.g. selection variables) before computing the decomposition.

fixed[(varlist)] indicates that certain regressors are fixed. The default is to treat all regressors as stochastic. If fixed is specified without arguments, all regressors are assumed to be fixed. Using this option has implications for the computation of the standard errors of the decomposition components.

level(#) specifies the confidence level, in percent terms, for the confidence intervals of the computed statistics; see help level.

eform causes the results to be displayed in exponentiated form.

tf specifies that the three-fold decomposition be displayed in any case.

nose suppresses the calculation of standard errors.

esave specifies that the results be returned in e(). This is useful, e.g., if you want to use bootstrap with oaxaca8. Note that the off-diagonal elements in e(V) will be set to zero since oaxaca8 does not provide the covariances among the various decomposition components. Do not apply lincom or similar techniques to the returned results. Also do not use predict.

    +-----------------+
----+ oaxaca8_options +--------------------------------------------------

reference(ref1 [ref2 ...]) specifies reference estimates to be used with the two-fold decomposition. ref1, ref2, etc. refer to the names of the stored models. A separate decomposition is computed for each model specified. Note that no standard errors will be computed for the "unexplained" part in these decompositions.

asis instructs oaxaca8 not to change the order of the models. By default, oaxaca8 rearranges the models so that the mean differential is positive.

    +-----------------+
----+ oaxaca2_options +--------------------------------------------------

by(groupvar) defines the groups between which the decomposition is to be performed. groupvar is to take on two unique values.

pooled displays a decomposition based on a pooled model over both groups.

includeby specifies that groupvar (see the by() option) be included as a control variable in the pooled model.

noisily causes the estimates of the individual models to be displayed.

cmd(cmd [cmd ...]) specifies the estimation commands for the models (see estcom). The default command is regress. For example, specify cmd(ivreg) to use ivreg instead. Specify more than one command if different commands are to be used for the two groups. For example, cmd(regress ivreg) would use regress for the first group and ivreg for the second.

cmdopts("opts" ["opts" ...]) may be used to specify sets of options for the model estimation commands. opts must be enclosed in quotes if it contains spaces. If only one set of options is specified, it is added to all models. For example, specify cmdopts("robust nocons") to add the options robust and nocons to all models. Alternatively, cmdopts("robust nocons" "hc3") would add robust nocons to the first model and hc3 to the second. Finally, cmdopts("hc3" "") would add hc3 to the first model and nothing to the second.

addvars("vars" ["vars" ...]) specifies additional variables to be added to individual models. For example, addvars("" "lambda") would add variable lambda to the second model.

Example

Step 1: Estimate and store the models

. regress lnwage educ exp exp2 if female==0
. estimates store male
. regress lnwage educ exp exp2 if female==1
. estimates store female

Step 2: Compute the decomposition

- three-fold decomposition (endowments, coefficients, interaction)

. oaxaca8 male female

- various parametrizations of the two-fold decomposition (explained, unexplained)

. oaxaca8 male female, weight(1 0.5 0 omega)

Usage of oaxaca2: steps 1 and 2 in one command

. oaxaca2 lnwage educ exp exp2, by(female)

Bootstrapping (Stata 8)

. bs "oaxaca2 lnwage educ exp exp2, by(female) esave nose" _b

Bootstrapping (Stata 9)

. bootstrap _b: oaxaca2 lnwage educ exp exp2, by(female) esave nose

(Note that the nose option in the bootstrap examples is not essential. However, bootstrap executes faster if nose is specified.)

Saved Results

oaxaca8 saves in r():

Scalars:

r(pred1)      mean linear prediction from first group
r(se_pred1)   standard error of prediction from first group
r(pred2)      mean linear prediction from second group
r(se_pred2)   standard error of prediction from second group
r(diff)       difference between mean predictions
r(se_diff)    standard error of difference

Matrices:

r(D)          results of the decompositions
r(VD)         variances of the results in r(D)
r(B1)         coefficients from the first model
r(VB1)        variance-covariance matrix from the first model
r(B2)         coefficients from the second model
r(VB2)        variance-covariance matrix from the second model
r(X1)         means of the regressors for the first group
r(VX1)        variance-covariance matrix of the means of the regressors for the first group
r(X2)         means of the regressors for the second group
r(VX2)        variance-covariance matrix of the means of the regressors for the second group

If esave is specified, oaxaca8 additionally saves in e():

Scalars:

e(N)          total number of cases
e(N1)         number of cases in first group
e(N2)         number of cases in second group
e(pred1)      mean linear prediction from first group
e(se_pred1)   standard error of prediction from first group
e(pred2)      mean linear prediction from second group
e(se_pred2)   standard error of prediction from second group
e(diff)       difference between mean predictions
e(se_diff)    standard error of difference

Macros:

e(cmd)        containing "oaxaca8"

Matrices:

e(b)          decomposition results
e(V)          variances of decomposition results (covariances set to 0)

Functions:

e(sample)     estimation sample

Methods and Formulas

The three-fold decomposition

The following linear models are given:

    Y1 = X1 b1 + e1
    Y2 = X2 b2 + e2

for some outcome variable Y in two groups 1 and 2. As long as E(e1)=E(e2)=0, the mean outcome difference between the two groups can be decomposed as

    R = x1'b1 - x2'b2 = (x1-x2)'b2 + x2'(b1-b2) + (x1-x2)'(b1-b2) = E + C + CE

where x1 and x2 are the vectors of means of the regressors (including the constants) for the two groups (e.g. see Winsborough and Dickinson 1971, Jones and Kelley 1984, Daymont and Andrisani 1984). In other words, R is decomposed into one part that is due to differences in endowments (E), one part that is due to differences in coefficients (including the intercept) (C), and a third part that is due to interaction between coefficients and endowments (CE).
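As a numeric check of the three-fold identity, the following Python sketch (not part of the package) computes E, C, and CE from hypothetical mean vectors and coefficient vectors and confirms that they add up to R. All numbers are invented for illustration; in practice the means and coefficients would come from the two estimated models.

```python
# Hypothetical regressor means (constant first) and coefficients for two groups.
x1 = [1.0, 12.0, 20.0]   # group 1 means: constant, educ, exp
x2 = [1.0, 11.0, 15.0]   # group 2 means
b1 = [1.0, 0.10, 0.02]   # group 1 coefficients
b2 = [0.9, 0.08, 0.01]   # group 2 coefficients


def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


R = dot(x1, b1) - dot(x2, b2)        # mean outcome differential
E = dot(sub(x1, x2), b2)             # endowments
C = dot(x2, sub(b1, b2))             # coefficients (including the intercept)
CE = dot(sub(x1, x2), sub(b1, b2))   # interaction

print("R  =", round(R, 4))
print("E  =", round(E, 4))
print("C  =", round(C, 4))
print("CE =", round(CE, 4))
print("E + C + CE =", round(E + C + CE, 4))
```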

The two-fold decomposition

Depending on the model that is assumed to be the "true" model (i.e. the "absence-of-discrimination" model), the terms of the three-fold decomposition may be used to determine the "explained" (Q) and "unexplained" (U; e.g. discrimination) parts of the differential (the question is how to allocate the interaction term CE). Oaxaca (1973) proposed assuming either the low group model or the high group model as the no-discrimination model, which implies that Q=E and U=C+CE and Q=E+CE and U=C, respectively. More generally, the coefficients of the "true" model may be expressed as

    b* = W b1 + (I-W) b2

where I is an identity matrix and W is a matrix of weights. Analogously, the decomposition may be written as

    R = (x1-x2)'[W b1 + (I-W) b2] + [x1'(I-W) + x2'W](b1-b2)

In the two cases proposed by Oaxaca (1973), W is a null matrix or equals I, respectively (W=I is also suggested by Blinder 1973). Furthermore, W may be wI, where w is a scalar reflecting the weight given to the coefficients for the first group (Reimers 1983 proposed w=.5, Cotton 1988 proposed using the relative group size). Use the weight() option to specify w.

Alternatively, Neumark (1988) proposed using the coefficients from a pooled model for both groups, which implies that

    W = diag(b*-b2) diag(b1-b2)^-1

or

    R = (x1-x2)'b* + [x1'(b1-b*) + x2'(b*-b2)]

where b* is the vector of the coefficients from the pooled model. However, other coefficient vectors may also make sense. Use the reference() option to specify such a reference model.

In the context of OLS regression, the method proposed by Neumark is equivalent to using the weighting matrix

    W = (X1'X1 + X2'X2)^-1 (X1'X1)

where X1 and X2 are the matrices of observed values for the two samples (Oaxaca and Ransom 1994). This approach is implemented via the weight(omega) option.
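The scalar-weight case W = wI can be sketched numerically as follows. This is an illustrative Python snippet, not part of the package, and the means and coefficients are hypothetical. The function name two_fold is an assumption introduced here. For each w the explained part Q and the unexplained part U add up to the same differential R; only the split changes.

```python
# Hypothetical regressor means (constant first) and coefficients for two groups.
x1 = [1.0, 12.0, 20.0]
x2 = [1.0, 11.0, 15.0]
b1 = [1.0, 0.10, 0.02]
b2 = [0.9, 0.08, 0.01]


def two_fold(x1, x2, b1, b2, w):
    """Return (Q, U) for W = w*I, where w weights the group 1 coefficients."""
    # Reference coefficients b* = w*b1 + (1-w)*b2.
    b_star = [w * a + (1 - w) * b for a, b in zip(b1, b2)]
    # Explained part: (x1-x2)'b*.
    Q = sum((m1 - m2) * bs for m1, m2, bs in zip(x1, x2, b_star))
    # Unexplained part: [x1'(1-w) + x2'w](b1-b2).
    U = sum(((1 - w) * m1 + w * m2) * (a - b)
            for m1, m2, a, b in zip(x1, x2, b1, b2))
    return Q, U


R = sum(m * b for m, b in zip(x1, b1)) - sum(m * b for m, b in zip(x2, b2))

for w in (0.0, 0.5, 1.0):
    Q, U = two_fold(x1, x2, b1, b2, w)
    print("w =", w, " Q =", round(Q, 4), " U =", round(U, 4), " Q+U =", round(Q + U, 4))
```

With w=0 the explained part equals E from the three-fold decomposition, and with w=1 the unexplained part equals C, matching the two cases attributed to Oaxaca (1973) in the text.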

Standard errors

The variances/standard errors of the components are computed according to the method detailed in Jann (2005). For the case of fixed regressors, also see Oaxaca and Ransom (1998). The variances and covariances of the coefficients are taken from the e(V) matrices of the models. The variance-covariance matrices of the means of the regressors in the models are estimated according to standard formulas (cross-product matrix of deviations divided by N*(N-1)) unless pweights or clusters are applied or a specific survey design is set (see help svyset). In the latter cases, the variance-covariance matrices are estimated using the svymean command. Note that standard errors cannot be computed for the U term if the non-discriminating coefficients are taken from a reference model specified via the reference() option. Use bootstrap to derive the standard errors in this case.
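The standard formula for the variance-covariance matrix of the regressor means (cross-product matrix of deviations divided by N*(N-1)) can be sketched in Python as below. This is only an illustration of that one formula for the unweighted, non-clustered case, using a tiny made-up data set; it is not the package's code and does not cover the full standard error computation described in Jann (2005). The function name mean_vcov is an assumption.

```python
def mean_vcov(rows):
    """Return (means, V), where V is the cross-product matrix of deviations
    from the means divided by N*(N-1)."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    V = [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n * (n - 1))
            for j in range(k)
        ]
        for i in range(k)
    ]
    return means, V


# Hypothetical observations on two regressors (e.g. educ and exp).
data = [
    [12.0, 5.0],
    [10.0, 8.0],
    [14.0, 2.0],
    [12.0, 9.0],
]

means, V = mean_vcov(data)
print("means:", means)
for row in V:
    print([round(v, 4) for v in row])
```

The diagonal of V holds the squared standard errors of the means; the off-diagonal elements are the covariances between the means.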

Selection models

Assume that a selection variable S appears in the models. If the variable is marked by specifying adjust(S), the differential will be adjusted for selection, i.e.

    R_s = x1'b1 - x2'b2 - (s1 bs1 - s2 bs2)

where s1 and s2 are the means of S and bs1 and bs2 are the coefficients of S, and oaxaca8 will decompose R_s instead of R. Note that it is not necessary to use the adjust option if the models were estimated with heckman. See Dolton and Makepeace (1986) or Neuman and Oaxaca (2004) for more sophisticated approaches to dealing with selection.

If a specific regressor (or a selection variable) appears only in one model, the corresponding coefficient and the mean of the regressor will be set to zero for the other group.
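The adjustment can be sketched numerically in Python (illustrative only, with hypothetical means and coefficients; not part of the package). The sketch assumes that S is the last element of each vector, so that x1'b1 and x2'b2 include the selection term before the adjustment. Subtracting the selection terms then leaves the differential implied by the remaining regressors.

```python
# Hypothetical means and coefficients; the last element belongs to the
# selection variable S (constant first, then educ, then S).
x1 = [1.0, 12.0, 0.3]
b1 = [1.0, 0.10, 0.2]
x2 = [1.0, 11.0, 0.5]
b2 = [0.9, 0.08, 0.1]


def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


# Unadjusted differential, including the selection terms.
R = dot(x1, b1) - dot(x2, b2)

# Means and coefficients of S in the two groups.
s1, bs1 = x1[-1], b1[-1]
s2, bs2 = x2[-1], b2[-1]

# Adjusted differential R_s = x1'b1 - x2'b2 - (s1*bs1 - s2*bs2).
R_s = R - (s1 * bs1 - s2 * bs2)

# The same quantity computed from the non-selection regressors only.
R_rest = dot(x1[:-1], b1[:-1]) - dot(x2[:-1], b2[:-1])

print("R   =", round(R, 4))
print("R_s =", round(R_s, 4))
print("differential without S terms =", round(R_rest, 4))
```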

References

Blinder, A.S. (1973). Wage Discrimination: Reduced Form and Structural Estimates. The Journal of Human Resources 8: 436-455.

Cotton, J. (1988). On the Decomposition of Wage Differentials. The Review of Economics and Statistics 70: 236-243.

Daymont, T.N., Andrisani, P.J. (1984). Job Preferences, College Major, and the Gender Gap in Earnings. The Journal of Human Resources 19: 408-428.

Dolton, P.J., Makepeace, G.H. (1986). Sample Selection and Male-Female Earnings Differentials in the Graduate Labour Market. Oxford Economic Papers 38: 317-341.

Jann, B. (2005). Standard Errors for the Blinder–Oaxaca Decomposition: http://repec.org/dsug2005/oaxaca_se_handout.pdf.

Jones, F.L., Kelley, J. (1984). Decomposing Differences Between Groups. A Cautionary Note on Measuring Discrimination. Sociological Methods and Research 12: 323-343.

Neuman, S., Oaxaca, R.L. (2004). Wage decompositions with selectivity-corrected wage equations: A methodological note. Journal of Economic Inequality 2: 3-10.

Neumark, D. (1988). Employers' Discriminatory Behavior and the Estimation of Wage Discrimination. The Journal of Human Resources 23: 279-295.

Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14: 693-709.

Oaxaca, R.L., Ransom, M.R. (1994). On discrimination and the decomposition of wage differentials. Journal of Econometrics 61: 5-21.

Oaxaca, R.L., Ransom, M.R. (1998). Calculation of approximate variances for wage decomposition differentials. Journal of Economic and Social Measurement 24: 55-61.

Oaxaca, R.L., Ransom, M.R. (1999). Identification in Detailed Wage Decompositions. The Review of Economics and Statistics 81: 154-157.

Reimers, C.W. (1983). Labor Market Discrimination Against Hispanic and Black Men. The Review of Economics and Statistics 65: 570-579.

Winsborough, H.H., Dickinson, P. (1971). Components of Negro-White Income Differences. Proceedings of the American Statistical Association, Social Statistics Section: 6-8.

Author

Ben Jann, ETH Zurich, jannb@ethz.ch

Also see

Online: help for regress, estimates, heckman, devcon (if installed),