-------------------------------------------------------------------------------
help for decompose
-------------------------------------------------------------------------------

Decomposition of wage differentials


Standard syntax:

        decompose varlist [weight] [if exp] [in range] , by(varname) [ detail
              estimates lambda(varname) noisy gpooled npooled regress_options ]

aweights, fweights, iweights, and pweights are allowed; see help weights.

Alternative syntax:

        decompose , save(high | low | pooled)
        decompose [ , detail estimates lambda(varname) ]


Description

Given the results from two regressions (one for each of two groups), decompose
computes several decompositions of the outcome variable difference. The
decompositions show how much of the gap is due to differing endowments between
the two groups, and how much is due to discrimination. Usually this is applied
to wage differentials using Mincer-type earnings equations.

Standard syntax (varlist and by(varname) specified):  Regression models will be
estimated for each category of varname prior to the computation of the
decomposition.

Alternative syntax: Results from stand-alone estimation commands may be saved
using decompose, save(). The command decompose (without varlist, by(), or
save()) then retrieves these results and computes the decomposition.

See decomp by Ian Watson for a similar package.


Options

Common options:

detail additionally displays decomposition results for individual variables.

estimates additionally displays a table of regression coefficients and means.

lambda(varname) subtracts from the mean prediction the effect of varname at
    its mean, i.e., its coefficient times its group mean (see methods and
    formulas below). This might be reasonable if varname is a selection
    variable.

Standard syntax options:

by(varname) specifies the grouping variable (which may be numeric or string).
    The group with the highest mean on the dependent variable will be compared
    to each of the other groups.

noisy switches on regression output.

npooled deactivates the estimation of pooled regression models (which are
    required for the Neumark decomposition; see methods and formulas below).

gpooled requests the estimation of a pooled model over all groups rather than
    casewise pooled models (note: if by(varname) only specifies two groups this
    will have no effect).

regress_options control the regression estimation; see help regress.
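
A minimal sketch of how the pooling options might be combined with a grouping
variable that has more than two categories. The variable region is
hypothetical (it does not appear elsewhere in this help file); lnwage, educ,
exp, and exp2 are the illustrative variables from the examples below.

```stata
* region: hypothetical grouping variable with three or more categories

* default: casewise pooled models
decompose lnwage educ exp exp2, by(region)

* gpooled: one pooled model over all groups
decompose lnwage educ exp exp2, by(region) gpooled

* npooled: skip the pooled models needed for the Neumark decomposition
decompose lnwage educ exp exp2, by(region) npooled
```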

Alternative syntax options:

save() saves the coefficients, means and the number of cases (or the sum of
    weights, respectively) of the preceding estimation. Use save(high) for the
    high group (i.e. the group with the higher mean on the dependent variable),
    save(low) for the low group, and save(pooled) for the pooled model over
    both groups. The right-hand-side varlists of the high and low models do not
    necessarily need to be identical (if, e.g., a selection term is included in
    one model; note that the consideration of a pooled model is not possible in
    this case).


Examples

Standard syntax:

        . decompose lnwage educ exp exp2, by(female) detail estimates

        . decompose lnwage educ exp exp2 lbda [pweight=1/prob], by(female)
            lambda(lbda)

Alternative syntax:

        . regress lnwage educ exp exp2 [fweight=pop] if female==0
        . decompose, save(high)
        . regress lnwage educ exp exp2 [fweight=pop] if female==1
        . decompose, save(low)
        . regress lnwage educ exp exp2 [fweight=pop] if inlist(female,0,1)
        . decompose, save(pooled)
        . decompose

        . regress lnwage educ exp exp2 if female==0
        . decompose, save(high)
        . regress lnwage educ exp exp2 lbda if female==1
        . decompose, save(low)
        . decompose, lambda(lbda) detail


Saved Results

r(fH)     proportion of observations (or of the sum of weights) in the high
          group (scalar)
r(pred)   vector of mean predictions
r(decomp) detailed decomposition matrix
r(xb)     matrix of coefficients and means
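
The following sketch shows one way the saved results might be retrieved after
a call using the standard syntax. The variables are the illustrative ones from
the examples above. The layout of the matrices is not described here, so the
sketch only copies and lists them.

```stata
decompose lnwage educ exp exp2, by(female) detail

* copy the r() results right away; a later r-class command may replace them
scalar fH     = r(fH)
matrix pred   = r(pred)
matrix decomp = r(decomp)
matrix xb     = r(xb)

display "proportion in high group: " fH
matrix list pred
matrix list decomp
matrix list xb
```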


Methods and Formulas

Let y1 and y2 be the means of the dependent variable Y, x1 and x2 the row
vectors of the means of the explanatory variables X1,...,Xk, and b1 and b2 the
column vectors of the coefficients for group 1 (high) and group 2 (low).  The
raw differential y1-y2 may then be expressed as

    R = y1-y2 = (x1-x2)b2 + x2(b1-b2) + (x1-x2)(b1-b2) = E + C + CE

(Winsborough/Dickinson 1971; Jones/Kelley 1984; Daymont/Andrisani 1984), i.e.,
R is decomposed into a part due to differences in endowments (E), a part due to
differences in coefficients, including the intercept (C), and a part due to the
interaction between coefficients and endowments (CE). Depending on which model
is assumed to be non-discriminating, these terms may be used to determine the
"unexplained" part (U; discrimination) and the "explained" part (V) of the
differential; the question is how to allocate the interaction term CE. Oaxaca
(1973) proposed assuming either the low group model or the high group model to
be non-discriminating, which leads to U=C+CE and V=E, or to U=C and V=E+CE,
respectively. More generally, the decomposition may be written as

    y1-y2 = (x1-x2)[D*b1+(I-D)*b2] + [x1*(I-D)+x2*D](b1-b2)

where I is an identity matrix and D is a diagonal matrix of weights. In the two
cases proposed by Oaxaca (1973), D is a null matrix or equals I, respectively
(D=I is also what Blinder 1973 suggested). Reimers (1983) proposed using the
mean of the coefficients of the low and the high model, i.e., the diagonal
elements of D equal 0.5. Cotton (1988) proposed weighting the coefficients by
group size, i.e., the diagonal elements of D equal fH, where fH is the
proportion of subjects in the high group (or of the sum of weights, if weights
are applied). Finally, Neumark (1988) proposed estimating a pooled model over
both groups, which leads to D=diag(bP-b2)*diag(b1-b2)^-1 or

    y1-y2 = (x1-x2)bP + [x1(b1-bP)+x2(bP-b2)]

where bP is the column vector of the coefficients in the pooled model.
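
As a rough illustration of the formulas above, the following do-file sketch
computes the decompositions by hand. It assumes Stata's auto example data as a
stand-in for wage data (price as Y, mpg and weight as X, foreign==1 treated as
group 1); these choices are illustrative only and are not part of decompose.
A scalar weight d stands for D=d*I. For every choice of D, V+U should
reproduce R.

```stata
sysuse auto, clear

* coefficients as column vectors (slopes first, constant last)
quietly regress price mpg weight if foreign == 1
matrix b1 = e(b)'
quietly regress price mpg weight if foreign == 0
matrix b2 = e(b)'
quietly regress price mpg weight
matrix bP = e(b)'

* means as row vectors, with a trailing 1 for the constant
quietly mean mpg weight if foreign == 1
matrix x1 = (e(b), 1)
quietly mean mpg weight if foreign == 0
matrix x2 = (e(b), 1)

* three-fold decomposition R = E + C + CE
matrix E  = (x1 - x2) * b2
matrix C  = x2 * (b1 - b2)
matrix CE = (x1 - x2) * (b1 - b2)
scalar R  = E[1,1] + C[1,1] + CE[1,1]
display "E = " E[1,1] "   C = " C[1,1] "   CE = " CE[1,1] "   R = " R

* proportion of observations in group 1 (Cotton's weight fH)
quietly count if foreign == 1
local n1 = r(N)
quietly count
local fH = `n1' / r(N)

* D = d*I: Oaxaca (d = 0 or 1), Reimers (d = 0.5), Cotton (d = fH)
foreach d in 0 1 0.5 `fH' {
    matrix V = (x1 - x2) * (`d' * b1 + (1 - `d') * b2)
    matrix U = ((1 - `d') * x1 + `d' * x2) * (b1 - b2)
    display "d = `d':  V = " V[1,1] "   U = " U[1,1]
}

* Neumark: pooled coefficients bP as the reference
matrix V = (x1 - x2) * bP
matrix U = x1 * (b1 - bP) + x2 * (bP - b2)
display "Neumark:  V = " V[1,1] "   U = " U[1,1]
```

With d = 0 the output should satisfy V = E and U = C + CE, and with d = 1 it
should satisfy V = E + CE and U = C, matching the two Oaxaca variants.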

decompose calculates and displays R, E, C, CE, as well as U and V according to
the methods described. The coefficient vectors are taken from "e(b)" returned
by the estimation commands; the means of the explanatory variables and the
group sizes are calculated for "e(sample)" using summarize (weighted if
necessary).
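
The following do-file sketch illustrates the kind of computation described
here for a single group. It is not the actual code of decompose. It assumes
the auto example data and uses turn as an analytic weight purely for
illustration.

```stata
sysuse auto, clear

* weighted regression for one group; e(b) is a 1 x 3 row vector
quietly regress price mpg weight [aweight = turn] if foreign == 1
matrix b = e(b)

* weighted means over the estimation sample, one variable at a time
matrix x = J(1, 3, 1)          // last element stays 1 for the constant
local j = 0
foreach v in mpg weight {
    local ++j
    quietly summarize `v' [aweight = turn] if e(sample)
    matrix x[1, `j'] = r(mean)
}
scalar sumw = r(sum_w)         // sum of weights in this group

* mean prediction for the group
matrix yhat = x * b'
display "mean prediction = " yhat[1,1] "   sum of weights = " sumw
```

Because the model includes a constant, the mean prediction should equal the
weighted mean of price in the estimation sample.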

Treatment of selection variables: Assume that a selection variable XS appears
in both models. If it is not marked out by lambda(XS) it will be treated just
as any other variable. If it is marked out, however, the group means of Y will
be adjusted for selection, that is

    yS1 = y1 - xS1*bS1
    yS2 = y2 - xS2*bS2

where xS1 and xS2 are the group means of XS, and bS1 and bS2 the corresponding
coefficients. The raw differential will then be

    RS = yS1 - yS2 = y1 - y2 - (xS1*bS1 - xS2*bS2)
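
A small sketch of this adjustment, computed by hand. It assumes the auto
example data, with headroom serving only as a stand-in for a selection term
XS (it is not a real selection variable), and foreign==1 treated as group 1.
The scalars are indexed by the value of foreign, so index 0 corresponds to
group 2 in the notation above.

```stata
sysuse auto, clear

forvalues g = 0/1 {
    quietly regress price mpg weight headroom if foreign == `g'
    scalar bS`g' = _b[headroom]            // coefficient of the stand-in XS
    quietly summarize headroom if e(sample)
    scalar xS`g' = r(mean)                 // group mean of XS
    quietly summarize price if e(sample)
    scalar y`g'  = r(mean)                 // group mean of Y
    scalar yS`g' = y`g' - xS`g' * bS`g'    // mean adjusted for selection
}

display "R  = " y1 - y0
display "RS = " yS1 - yS0
```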

Now assume that the selection variable XS appears in only one model (which is
possible with the alternative syntax). If XS is not marked out, its effect will
be fully attributed to the explained part V in any case (this is accomplished
by assuming xS=0 in the other model and bS1=bS2); see Dolton/Makepeace (1986)
for an alternative treatment, which is not yet implemented. If XS is marked
out, the mean of the corresponding group will be adjusted for selection as
described above.


References

Blinder, A.S. (1973). Wage Discrimination: Reduced Form and Structural
    Estimates. The Journal of Human Resources 8: 436-455.
Cotton, J. (1988). On the Decomposition of Wage Differentials. The Review of
    Economics and Statistics 70: 236-243.
Daymont, T.N., Andrisani, P.J. (1984). Job Preferences, College Major, and the
    Gender Gap in Earnings. The Journal of Human Resources 19: 408-428.
Dolton, P.J., Makepeace, G.H. (1986). Sample Selection and Male-Female Earnings
    Differentials in the Graduate Labour Market. Oxford Economic Papers 38:
    317-341.
Jones, F.L., Kelley, J. (1984). Decomposing Differences Between Groups. A
    Cautionary Note on Measuring Discrimination. Sociological Methods and
    Research 12: 323-343.
Neumark, D. (1988). Employers' Discriminatory Behavior and the Estimation of
    Wage Discrimination. The Journal of Human Resources 23: 279-295.
Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets.
    International Economic Review 14: 693-709.
Reimers, C.W. (1983). Labor Market Discrimination Against Hispanic and Black
    Men. The Review of Economics and Statistics 65: 570-579.
Winsborough, H.H., Dickinson, P. (1971). Components of Negro-White Income
    Differences. Proceedings of the American Statistical Association, Social
    Statistics Section: 6-8.


Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


Also see

Manual:  [R] regress
On-line:  help for regress