-------------------------------------------------------------------------------
help for decompose
-------------------------------------------------------------------------------

Decomposition of wage differentials

Standard syntax:

    decompose varlist [weight] [if exp] [in range], by(varname) [detail estimates lambda(varname) noisy gpooled npooled regress_options]

aweights, fweights, iweights, and pweights are allowed; see help weights.

Alternative syntax:

    decompose, save(high|low|pooled)

    decompose [, detail estimates lambda(varname)]

Description

Given the results from two regressions (one for each of two groups), decompose computes several decompositions of the outcome variable difference. The decompositions show how much of the gap is due to differing endowments between the two groups, and how much is due to discrimination. Usually this is applied to wage differentials using Mincer-type earnings equations.

Standard syntax (varlist and by(varname) specified): Regression models will be estimated for each category of varname prior to the computation of the decomposition.

Alternative syntax: Results from stand-alone estimation commands may be saved using decompose, save(). The command decompose (without varlist, by, or save) will capture these results and compute the decomposition.

See decomp by Ian Watson for a similar package.

Options

Common options:

detail additionally displays decomposition results for the individual variables.

estimates additionally displays a table of regression coefficients and means.

lambda(varname) reduces the mean prediction by the effect of varname at its mean. This might be reasonable if varname is a selection variable.

Standard syntax options:

by(varname) specifies the grouping variable (which may be numeric or string). The group with the highest mean on the dependent variable will be compared to each of the other groups.

noisy switches on regression output.

npooled deactivates the estimation of pooled regression models (which are required for the Neumark decomposition; see Methods and Formulas below).

gpooled requests the estimation of a pooled model over all groups rather than casewise pooled models (note: if by(varname) specifies only two groups, this has no effect).

regress_options control the regression estimation; see help regress.

Alternative syntax options:

save() saves the coefficients, means, and the number of cases (or the sum of weights, respectively) of the preceding estimation. Use save(high) for the high group (i.e., the group with the higher mean on the dependent variable), save(low) for the low group, and save(pooled) for the pooled model over both groups. The right-hand-side varlists of the high and low models do not need to be identical (e.g., if a selection term is included in only one model; note that a pooled model cannot be considered in this case).

Examples

Standard syntax:

. decompose lnwage educ exp exp2, by(female) detail estimates

. decompose lnwage educ exp exp2 lbda [pweight=1/prob], by(female) lambda(lbda)

Alternative syntax:

. regress lnwage educ exp exp2 [fweight=pop] if female==0
. decompose, save(high)
. regress lnwage educ exp exp2 [fweight=pop] if female==1
. decompose, save(low)
. regress lnwage educ exp exp2 [fweight=pop] if inlist(female,0,1)
. decompose, save(pooled)
. decompose

. regress lnwage educ exp exp2 if female==0
. decompose, save(high)
. regress lnwage educ exp exp2 lbda if female==1
. decompose, save(low)
. decompose, lambda(lbda) detail

Saved Results

    r(fH)       proportion of obs. (or sum of wgts) in high group (scalar)
    r(pred)     vector of mean predictions
    r(decomp)   detailed decomposition matrix
    r(xb)       matrix of coefficients and means

Methods and Formulas

Let y1 and y2 be the means of the dependent variable Y, x1 and x2 the row vectors of the means of the explanatory variables X1,...,Xk, and b1 and b2 the column vectors of the coefficients for group 1 (high) and group 2 (low). The raw differential y1-y2 may then be expressed as

    R = y1-y2 = (x1-x2)b2 + x2(b1-b2) + (x1-x2)(b1-b2) = E + C + CE

(Winsborough/Dickenson 1971; Jones/Kelley 1984; Daymont/Andrisani 1984), i.e., R is decomposed into a part due to differences in endowments (E), a part due to differences in coefficients (including the intercept) (C), and a part due to interaction between coefficients and endowments (CE). Depending on the model which is assumed to be non-discriminating, these terms may be used to determine the "unexplained" (U; discrimination) and the "explained" (V) part of the differential (the question is how to allocate the interaction term CE). Oaxaca (1973) proposed to assume either the low group model or the high group model as non-discriminating, which leads to U=C+CE and V=E or U=C and V=E+CE, respectively. More generally, the decomposition may be written as

    y1-y2 = (x1-x2)[D*b1+(I-D)*b2] + [x1*(I-D)+x2*D](b1-b2)

where I is an identity matrix and D is a diagonal matrix of weights. The first term is V and the second term is U. In the two cases proposed by Oaxaca (1973), D is a null matrix or equals I, respectively (D=I is also what Blinder 1973 suggested). Reimers (1983) proposed to use the mean coefficients between the low and the high model, i.e., the diagonal elements of D equal 0.5. Cotton (1988) proposed to weight the coefficients by group size, i.e., the diagonal elements of D equal fH, where fH is the relative proportion of subjects in the high group (or sum of weights, if weights are applied). Finally, Neumark (1988) proposed to estimate a pooled model over both groups, which leads to D=diag(bP-b2)*diag(b1-b2)^-1 or

    y1-y2 = (x1-x2)bP + [x1(b1-bP)+x2(bP-b2)]

where bP is the column vector of the coefficients in the pooled model.
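
The formulas above can be checked numerically. The following is a minimal Python sketch, not part of decompose: the group means, the coefficient vectors (including the pooled vector bP), and the share fH are made-up illustrative values, and all function names are hypothetical. The first element of each vector stands for the constant (mean 1, coefficient equal to the intercept).

```python
# Illustrative sketch only: all numbers and helper names are assumptions,
# not output or internals of decompose.

def dot(x, b):
    """Product of a row vector of means and a column vector of coefficients."""
    return sum(xi * bi for xi, bi in zip(x, b))

def diff(a, b):
    """Elementwise difference a - b."""
    return [u - v for u, v in zip(a, b)]

def threefold(x1, x2, b1, b2):
    """Return (E, C, CE) so that y1 - y2 = E + C + CE."""
    dx, db = diff(x1, x2), diff(b1, b2)
    E = dot(dx, b2)    # endowments: (x1-x2)b2
    C = dot(x2, db)    # coefficients incl. intercept: x2(b1-b2)
    CE = dot(dx, db)   # interaction: (x1-x2)(b1-b2)
    return E, C, CE

def weighted(x1, x2, b1, b2, d):
    """Return (U, V) for the diagonal elements d of the weight matrix D."""
    dx, db = diff(x1, x2), diff(b1, b2)
    b_ref = [di * u + (1 - di) * v for di, u, v in zip(d, b1, b2)]
    x_mix = [(1 - di) * u + di * v for di, u, v in zip(d, x1, x2)]
    V = dot(dx, b_ref)   # explained: (x1-x2)[D*b1+(I-D)*b2]
    U = dot(x_mix, db)   # unexplained: [x1*(I-D)+x2*D](b1-b2)
    return U, V

def neumark(x1, x2, b1, b2, bP):
    """Return (U, V) using pooled coefficients bP as the reference."""
    V = dot(diff(x1, x2), bP)
    U = dot(x1, diff(b1, bP)) + dot(x2, diff(bP, b2))
    return U, V

# Made-up group means (constant, educ, exp) and coefficients.
x1, b1 = [1.0, 12.0, 20.0], [1.00, 0.10, 0.020]   # high group
x2, b2 = [1.0, 11.0, 15.0], [0.90, 0.08, 0.010]   # low group
bP = [0.95, 0.09, 0.015]                          # assumed pooled model
fH = 0.6                                          # assumed share of high group
k = len(x1)

R = dot(x1, b1) - dot(x2, b2)
E, C, CE = threefold(x1, x2, b1, b2)
results = {
    "Oaxaca, low group (D=0)": weighted(x1, x2, b1, b2, [0.0] * k),
    "Oaxaca/Blinder, high group (D=I)": weighted(x1, x2, b1, b2, [1.0] * k),
    "Reimers (D=0.5*I)": weighted(x1, x2, b1, b2, [0.5] * k),
    "Cotton (D=fH*I)": weighted(x1, x2, b1, b2, [fH] * k),
    "Neumark (pooled)": neumark(x1, x2, b1, b2, bP),
}

print("R=%.3f E=%.3f C=%.3f CE=%.3f" % (R, E, C, CE))
for name, (U, V) in results.items():
    print("%-34s U=%.3f V=%.3f" % (name, U, V))
```

In every row U and V sum to R; the methods differ only in how the interaction term CE is split between the two parts.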

decompose calculates and displays R, E, C, CE, as well as U and V according to the methods described. The coefficient vectors are taken from "e(b)" returned by the estimation commands; the means of the explanatory variables and group sizes are calculated for "e(sample)" using summarize (weighted if necessary).

Treatment of selection variables: Assume that a selection variable XS appears in both models. If it is not marked out by lambda(XS), it will be treated just as any other variable. If it is marked out, however, the group means of Y will be adjusted for selection, that is

    yS1 = y1 - xS1*bS1
    yS2 = y2 - xS2*bS2

where xS1 and xS2 are the group means of XS, and bS1 and bS2 the corresponding coefficients. The raw differential will then be

RS = yS1 - yS2 = y1 - y2 - (xS1*bS1 - xS2*bS2)
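
As a quick arithmetic check of the adjustment, the following minimal Python sketch computes RS both ways. It is not part of decompose; the function name and all values (group means of Y, means and coefficients of XS) are made up for illustration.

```python
# Illustrative sketch only: values and the function name are assumptions.

def selection_adjusted_gap(y1, y2, xS1, bS1, xS2, bS2):
    """Return RS = yS1 - yS2 after removing the selection term from each group mean."""
    yS1 = y1 - xS1 * bS1   # adjusted mean of the high group
    yS2 = y2 - xS2 * bS2   # adjusted mean of the low group
    return yS1 - yS2

y1, y2 = 2.60, 1.93       # made-up group means of Y
xS1, bS1 = 0.3, -0.2      # made-up mean and coefficient of XS, high group
xS2, bS2 = 0.5, -0.1      # made-up mean and coefficient of XS, low group

RS = selection_adjusted_gap(y1, y2, xS1, bS1, xS2, bS2)
RS_alt = y1 - y2 - (xS1 * bS1 - xS2 * bS2)   # second form of the same expression

print("raw gap R = %.3f, adjusted gap RS = %.3f" % (y1 - y2, RS))
```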

Now assume that the selection variable XS appears in only one model (as is possible via the alternative syntax). If XS is not marked out, its effect will be fully included in the explained part V in any case (this is accomplished by assuming xS=0 in the other model and bS1=bS2); see Dolton/Makepeace (1986) for an alternative treatment, which I have not yet incorporated. If it is marked out, the mean of the corresponding group will be adjusted for selection as described above.
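
To see why the effect of such a variable ends up entirely in V, the following minimal Python sketch applies the general weighted formula to the XS term alone. It assumes that the term enters the formula like any other variable once xS=0 and bS1=bS2 are imposed; the values and the function name are made up, and XS is taken to appear in the low model only.

```python
# Illustrative sketch only: values and the function name are assumptions.

def term_contribution(x1, x2, b1, b2, d):
    """Return the (U, V) contribution of one variable for weight d on the diagonal of D."""
    V = (x1 - x2) * (d * b1 + (1 - d) * b2)
    U = (x1 * (1 - d) + x2 * d) * (b1 - b2)
    return U, V

xS_low, bS_low = 0.5, -0.1   # made-up mean and coefficient, low model
xS_high = 0.0                # assumed: XS absent from the high model
bS_high = bS_low             # assumed: equal coefficients in both models

contributions = {}
for d in (0.0, 0.5, 0.6, 1.0):
    contributions[d] = term_contribution(xS_high, xS_low, bS_high, bS_low, d)
    U, V = contributions[d]
    print("d=%.1f  U=%.3f  V=%.3f" % (d, U, V))
```

Because b1-b2 is zero for this term, its U contribution vanishes for every choice of weight, and the whole effect (xS1-xS2)*bS is counted as explained.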

References

Blinder, A.S. (1973). Wage Discrimination: Reduced Form and Structural Estimates. The Journal of Human Resources 8: 436-455.
Cotton, J. (1988). On the Decomposition of Wage Differentials. The Review of Economics and Statistics 70: 236-243.
Daymont, T.N., Andrisani, P.J. (1984). Job Preferences, College Major, and the Gender Gap in Earnings. The Journal of Human Resources 19: 408-428.
Dolton, P.J., Makepeace, G.H. (1986). Sample Selection and Male-Female Earnings Differentials in the Graduate Labour Market. Oxford Economic Papers 38: 317-341.
Jones, F.L., Kelley, J. (1984). Decomposing Differences Between Groups. A Cautionary Note on Measuring Discrimination. Sociological Methods and Research 12: 323-343.
Neumark, D. (1988). Employers' Discriminatory Behavior and the Estimation of Wage Discrimination. The Journal of Human Resources 23: 279-295.
Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14: 693-709.
Reimers, C.W. (1983). Labor Market Discrimination Against Hispanic and Black Men. The Review of Economics and Statistics 65: 570-579.
Winsborough, H.H., Dickenson, P. (1971). Components of Negro-White Income Differences. Proceedings of the American Statistical Association, Social Statistics Section: 6-8.

Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also see

    Manual:   [R] regress
    On-line:  help for regress