{smcl}
{* 22aug2002}{...}
{hline}
help for {hi:decompose}
{hline}

{title:Decomposition of wage differentials}


{p} Standard syntax:

{p 8 14}{cmd:decompose} {it:varlist} [{it:weight}] [{cmd:if} {it:exp}]
 [{cmd:in} {it:range}] {cmd:,} {cmd:by(}{it:varname}{cmd:)}
 [ {cmdab:d:etail} {cmdab:e:stimates} {cmdab:la:mbda}{cmd:(}{it:varname}{cmd:)}
 {cmdab:n:oisy} {cmdab:gp:ooled} {cmdab:np:ooled} {it:regress_options} ]

{p} {cmd:aweight}s, {cmd:fweight}s, {cmd:iweight}s, and {cmd:pweight}s are allowed;
see help {help weights}.

{p} Alternative syntax:

{p 8 14}{cmd:decompose} {cmd:,} {cmdab:s:ave}{cmd:(}{cmdab:h:igh} | {cmdab:l:ow} |
 {cmdab:p:ooled}{cmd:)}{p_end}
{p 8 14}{cmd:decompose} [ {cmd:,} {cmdab:d:etail} {cmdab:e:stimates}
 {cmdab:la:mbda}{cmd:(}{it:varname}{cmd:)} ]


{title:Description}

{p}Given the results from two regressions (one for each of two groups),
{cmd:decompose} computes several decompositions of the difference in the outcome variable. The
decompositions show how much of the gap is due to differing endowments between the
two groups and how much is due to differences in coefficients (discrimination). Typically,
this is applied to wage differentials using Mincer-type earnings equations.

{p}Standard syntax ({it:varlist} and {cmd:by(}{it:varname}{cmd:)} specified):
Regression models will be estimated for each category of {it:varname} prior to the
computation of the decomposition.

{p}Alternative syntax: Results from stand-alone estimation commands may be saved
using {cmd:decompose, save()}. The command {cmd:decompose}
(without {it:varlist}, {cmd:by} or {cmd:save}) will capture these
results and compute the decomposition.

{p}See {net "describe http://fmwww.bc.edu/RePEc/bocode/d/decomp":decomp}
by Ian Watson for a similar package.


{title:Options}

{p}Common options:

{p 0 4}{cmd:detail} additionally displays the decomposition results for the individual variables.

{p 0 4}{cmd:estimates} additionally displays a table of the regression coefficients and
means.

{p 0 4}{cmd:lambda(}{it:varname}{cmd:)} reduces the mean prediction by the effect of
{it:varname} at its mean. This might be reasonable if {it:varname} is a selection
variable.

{p}Standard syntax options:

{p 0 4}{cmd:by(}{it:varname}{cmd:)} specifies the grouping variable (which may be
numeric or string). The group with the highest mean on the dependent variable will be
compared to each of the other groups.

{p 0 4}{cmd:noisy} switches on regression output.

{p 0 4}{cmd:npooled} suppresses the estimation of the pooled regression models (which are
required for the Neumark decomposition; see Methods and Formulas below).

{p 0 4}{cmd:gpooled} requests the estimation of a single pooled model over all groups rather
than separate pooled models for each pairwise comparison (note: this has no effect if
{cmd:by(}{it:varname}{cmd:)} defines only two groups).
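
{p 4 4}For instance, with a three-category grouping variable ({cmd:ethnicity} being a
placeholder name), a single pooled model over all groups could be requested as

{p 8 12}{inp:. decompose lnwage educ exp exp2, by(ethnicity) gpooled}{p_end}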

{p 0 4}{it:regress_options} control the regression estimation; see help {help regress}.

{p}Alternative syntax options:

{p 0 4}{cmd:save()} saves the coefficients, the means, and the number of cases (or the sum
of weights) from the preceding estimation. Use {cmd:save(high)} for the
high group (i.e. the group with the higher mean on the dependent variable),
{cmd:save(low)} for the low group, and {cmd:save(pooled)} for the pooled model over
both groups. The right-hand-side varlists of the high and low models need not
be identical (e.g. if a selection term is included in only one
model; note that a pooled model cannot be used in this case).


{title:Examples}

{p} Standard syntax:

{p 8 12}{inp:. decompose lnwage educ exp exp2, by(female) detail estimates}

{p 8 12}{inp:. decompose lnwage educ exp exp2 lbda [pweight=1/prob], by(female) lambda(lbda)}
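
{p}One way in which a selection variable such as {inp:lbda} in the example above might be
constructed is a Heckman-type two-step correction (a sketch only; {inp:employed},
{inp:age}, and {inp:married} are placeholder names for a selection indicator and
exclusion variables; see also help {help heckman}):

{p 8 12}{inp:. probit employed educ age married}{p_end}
{p 8 12}{inp:. predict double xbsel, xb}{p_end}
{p 8 12}{inp:. generate double lbda = normalden(xbsel)/normal(xbsel) if employed==1}{p_end}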

{p} Alternative syntax:

{p 8 12}{inp:. regress lnwage educ exp exp2 [fweight=pop] if female==0}{p_end}
{p 8 12}{inp:. decompose, save(high)}{p_end}
{p 8 12}{inp:. regress lnwage educ exp exp2 [fweight=pop] if female==1}{p_end}
{p 8 12}{inp:. decompose, save(low)}{p_end}
{p 8 12}{inp:. regress lnwage educ exp exp2 [fweight=pop] if inlist(female,0,1)}{p_end}
{p 8 12}{inp:. decompose, save(pooled)}{p_end}
{p 8 12}{inp:. decompose}

{p 8 12}{inp:. regress lnwage educ exp exp2 if female==0}{p_end}
{p 8 12}{inp:. decompose, save(high)}{p_end}
{p 8 12}{inp:. regress lnwage educ exp exp2 lbda if female==1}{p_end}
{p 8 12}{inp:. decompose, save(low)}{p_end}
{p 8 12}{inp:. decompose, lambda(lbda) detail}


{title:Saved Results}

{p}{cmd:r(fH)} {space 3} proportion of obs. (or sum of wgts) in high group (scalar){p_end}
{p}{cmd:r(pred)} {space 1} vector of mean predictions{p_end}
{p}{cmd:r(decomp)} detailed decomposition matrix{p_end}
{p}{cmd:r(xb)} {space 3} matrix of coefficients and means
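
{p}For example, the saved results may be inspected after estimation as follows:

{p 8 12}{inp:. decompose lnwage educ exp exp2, by(female)}{p_end}
{p 8 12}{inp:. return list}{p_end}
{p 8 12}{inp:. display "proportion in high group: " r(fH)}{p_end}
{p 8 12}{inp:. matrix list r(decomp)}{p_end}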


{title:Methods and Formulas}

{p}Let y1 and y2 be the means of the dependent variable Y, {bf:x}1 and {bf:x}2 the
row vectors of the means of the explanatory variables X1,...,Xk, and {bf:b}1 and
{bf:b}2 the column vectors of the coefficients for group 1 (high) and group 2 (low).
The raw differential y1-y2 may then be expressed as

{p 4 4}R = y1-y2 = ({bf:x}1-{bf:x}2){bf:b}2 + {bf:x}2({bf:b}1-{bf:b}2) +
({bf:x}1-{bf:x}2)({bf:b}1-{bf:b}2) = E + C + CE

{p}(Winsborough/Dickenson 1971; Jones/Kelley 1984; Daymont/Andrisani 1984), i.e., R
is decomposed into a part due to differences in endowments (E), a part due to
differences in coefficients (including the intercept) (C), and a part due to the
interaction between coefficients and endowments (CE). Depending on which model is
assumed to be non-discriminating, these terms may be used to determine the
"unexplained" (U; discrimination) and the "explained" (V) part of the differential
(the question is how to allocate the interaction term CE). Oaxaca (1973) proposed to
assume either the low-group model or the high-group model to be non-discriminating,
which leads to U=C+CE and V=E, or U=C and V=E+CE, respectively. More generally, the
decomposition may be written as

{p 4 4}y1-y2 = ({bf:x}1-{bf:x}2)[{bf:D}*{bf:b}1+({bf:I}-{bf:D})*{bf:b}2] +
[{bf:x}1*({bf:I}-{bf:D})+{bf:x}2*{bf:D}]({bf:b}1-{bf:b}2)

{p}where {bf:I} is an identity matrix and {bf:D} is a diagonal matrix of weights. In
the two cases proposed by Oaxaca (1973), {bf:D} is a null matrix or equals {bf:I},
respectively ({bf:D}={bf:I} is also what Blinder 1973 suggested). Reimers (1983)
proposed to use the mean coefficients of the low and the high model, i.e. the
diagonal elements of {bf:D} equal 0.5. Cotton (1988) proposed to weight the
coefficients by group size, i.e. the diagonal elements of {bf:D} equal fH, where
fH is the relative proportion of subjects (or the relative sum of weights, if
weights are applied) in the high group. Finally, Neumark (1988) proposed to
estimate a pooled model over both groups, which leads to
{bf:D}=diag({bf:b}P-{bf:b}2)*diag({bf:b}1-{bf:b}2)^-1 or

{p 4 4}y1-y2 = ({bf:x}1-{bf:x}2){bf:b}P +
[{bf:x}1({bf:b}1-{bf:b}P)+{bf:x}2({bf:b}P-{bf:b}2)]

{p}where {bf:b}P is the column vector of the coefficients in the pooled model.

{p}{cmd:decompose} calculates and displays R, E, C, CE, as well as U and V according to
the methods described above. The coefficient vectors are taken from "e(b)" as returned by
the estimation commands; the means of the explanatory variables and the group sizes are
calculated over "e(sample)" using {help summarize} (weighted if necessary).
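
{p}For illustration, the threefold decomposition of R into E, C, and CE, together with the
Reimers weighting (diagonal elements of {bf:D} equal to 0.5), may be replicated by hand using
Stata's matrix commands. The following is a sketch only, reusing the placeholder variable
names from the examples above; the 4 in {inp:I(4)} reflects the three regressors plus the
constant:

{p 8 12}{inp:. regress lnwage educ exp exp2 if female==0}{p_end}
{p 8 12}{inp:. matrix b1 = e(b)}{p_end}
{p 8 12}{inp:. generate byte touse1 = e(sample)}{p_end}
{p 8 12}{inp:. regress lnwage educ exp exp2 if female==1}{p_end}
{p 8 12}{inp:. matrix b2 = e(b)}{p_end}
{p 8 12}{inp:. generate byte touse2 = e(sample)}{p_end}
{p 8 12}{inp:. mean educ exp exp2 if touse1}{p_end}
{p 8 12}{inp:. matrix x1 = (e(b), J(1,1,1))}{p_end}
{p 8 12}{inp:. mean educ exp exp2 if touse2}{p_end}
{p 8 12}{inp:. matrix x2 = (e(b), J(1,1,1))}{p_end}
{p 8 12}{inp:. matrix E  = (x1-x2)*b2'}{p_end}
{p 8 12}{inp:. matrix C  = x2*(b1-b2)'}{p_end}
{p 8 12}{inp:. matrix CE = (x1-x2)*(b1-b2)'}{p_end}
{p 8 12}{inp:. matrix D  = 0.5*I(4)}{p_end}
{p 8 12}{inp:. matrix V  = (x1-x2)*(D*b1' + (I(4)-D)*b2')}{p_end}
{p 8 12}{inp:. matrix U  = (x1*(I(4)-D) + x2*D)*(b1-b2)'}{p_end}
{p 8 12}{inp:. display "E = " E[1,1] "  C = " C[1,1] "  CE = " CE[1,1]}{p_end}
{p 8 12}{inp:. display "V = " V[1,1] "  U = " U[1,1]}{p_end}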

{p}Treatment of selection variables: Assume that a selection variable XS appears in
both models. If it is not marked out by {cmd:lambda(}XS{cmd:)}, it will be treated
just like any other variable. If it is marked out, however, the group means of Y will be
adjusted for selection, that is,

{p 4 4}yS1 = y1 - xS1*bS1{p_end}
{p 4 4}yS2 = y2 - xS2*bS2

{p}where xS1 and xS2 are the group means of XS, and bS1 and bS2 the corresponding
coefficients. The raw differential will then be

{p 4 4}RS = yS1 - yS2 = y1 - y2 - (xS1*bS1 - xS2*bS2)

{p}Now assume that the selection variable XS appears in only one model (as is possible
with the alternative syntax). If XS is not marked out, its effect
will be fully contained in the explained part V in any case (this is accomplished by
assuming xS=0 in the other model and bS1=bS2) (see Dolton/Makepeace 1986 for an
alternative treatment which I have not yet incorporated). If it is marked out,
the mean of the corresponding group will be adjusted for selection as described
above.
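
{p}As an illustration of this adjustment (a sketch using the placeholder names from the last
example above), the selection-adjusted mean of the low group, yS2 = y2 - xS2*bS2, could be
computed by hand as

{p 8 12}{inp:. quietly regress lnwage educ exp exp2 lbda if female==1}{p_end}
{p 8 12}{inp:. scalar bS2 = _b[lbda]}{p_end}
{p 8 12}{inp:. quietly summarize lbda if e(sample), meanonly}{p_end}
{p 8 12}{inp:. scalar xS2 = r(mean)}{p_end}
{p 8 12}{inp:. quietly summarize lnwage if e(sample), meanonly}{p_end}
{p 8 12}{inp:. display "selection-adjusted mean: " r(mean) - xS2*bS2}{p_end}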


{title:References}

{p 0 4}Blinder, A.S. (1973). Wage Discrimination: Reduced Form and Structural
Estimates. The Journal of Human Resources 8: 436-455.{p_end}
{p 0 4}Cotton, J. (1988). On the Decomposition of Wage Differentials. The Review of
Economics and Statistics 70: 236-243.{p_end}
{p 0 4}Daymont, T.N., Andrisani, P.J. (1984). Job Preferences, College Major, and the
Gender Gap in Earnings. The Journal of Human Resources 19: 408-428.{p_end}
{p 0 4}Dolton, P.J., Makepeace, G.H. (1986). Sample Selection and Male-Female Earnings
Differentials in the Graduate Labour Market. Oxford Economic Papers 38: 317-341.{p_end}
{p 0 4}Jones, F.L., Kelley, J. (1984). Decomposing Differences Between Groups. A Cautionary
Note on Measuring Discrimination. Sociological Methods and Research 12: 323-343.{p_end}
{p 0 4}Neumark, D. (1988). Employers' Discriminatory Behavior and the Estimation of
Wage Discrimination. The Journal of Human Resources 23: 279-295.{p_end}
{p 0 4}Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets.
International Economic Review 14: 693-709.{p_end}
{p 0 4}Reimers, C.W. (1983). Labor Market Discrimination Against Hispanic and Black Men.
The Review of Economics and Statistics 65: 570-579.{p_end}
{p 0 4}Winsborough, H.H., Dickenson, P. (1971). Components of Negro-White Income
Differences. Proceedings of the American Statistical
Association, Social Statistics Section: 6-8.


{title:Author}

{p}Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


{title:Also see}

Manual:  {hi:[R] regress}
{p 0 19}On-line:  help for {help regress}{p_end}