{smcl}
{* 25oct2006}{...}
{hline}
help for {hi:jmpierce2}
{hline}
{title:Second-order Juhn-Murphy-Pierce decomposition}
{p 8 15 2}
{cmd:jmpierce2} {it:est11} {it:est21} {it:est12} {it:est22}
[ {cmd:,}
{bind:{cmdab:b:enchmark:(}{cmd:1}|{cmd:2}|{it:est1bm} {it:est2bm}{cmd:)}}
{bind:{cmdab:r:eference:(}{cmd:1}|{cmd:2}|{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}}
{cmdab:d:etail}[{cmd:(}{it:dlist}{cmd:)}]
{cmdab:par:ametric}
{bind:{cmdab:res:iduals:(}{it:newvar1 newvar2}|{it:prefix}{cmd:)}}
{bind:{cmdab:rank:s:(}{it:newvar1 newvar2}|{it:prefix}{cmd:)}}
{cmdab:non:otes} {cmd:nopreserve} ]
{p 4 4 2} where {it:dlist} is
{p 15 15 2}
{it:name1} {cmd:=} {it:varlist1} [ {cmd:,} {it:name2} {cmd:=} {it:varlist2}
[{cmd:,} {it:...} ] ]
{title:Description}
{p 4 4 2}
{cmd:jmpierce2} computes the decomposition of differences in mean outcome
differentials proposed by Juhn, Murphy and Pierce (1991). An example is
the decomposition of the change of the black-white or the male-female wage
differential over time (Juhn, Murphy and Pierce 1991; Blau and Kahn 1997)
or the decomposition of differences in the male-female wage differential
between countries (Blau and Kahn 1992, 1996; OECD 2002).
{p 4 4 2}
{it:est11}, {it:est21}, {it:est12}, and {it:est22} specify the
previously fitted and stored regression estimates to be used with the
decomposition (see help {help estimates store}). The model estimated last
may be indicated by a period (.), even if it has not yet been stored.
{it:est11} and {it:est21} specify the group 1 estimate (e.g. male, white)
and the group 2 estimate (e.g. female, black) for the first sample (e.g.
time point 1, country A); {it:est12} and {it:est22} are the group
estimates for the second sample (time point 2, country B). Note that the
estimation samples ({cmd:e(sample)}) of the specified models determine the
relevant observations for the decomposition. Group 1 and group 2 must not
overlap.
{p 4 4 2}
See the {help smithwelch} package (available from the SSC archive; type
{net "describe http://fmwww.bc.edu/repec/bocode/s/smithwelch":ssc describe smithwelch})
for an alternative approach to decomposing differences in differentials.
{p 4 4 2}{hi:Warning:} {cmd:jmpierce2} is intended for use with models that have
been estimated by the {help regress} command. Use {cmd:jmpierce2} with other
models at your own risk.
{title:Options}
{p 4 8 2}
{cmd:benchmark(1}|{cmd:2}|{it:est1bm} {it:est2bm}{cmd:)} specifies (the
estimates for) the "benchmark" sample. {cmd:benchmark(1)} signifies that
sample 1 is the benchmark sample and {it:est11} and {it:est21} are the
benchmark estimates. Analogously, {it:est12} and {it:est22} are used as the
benchmark if you specify {cmd:benchmark(2)}. Alternatively, use
{bind:{cmd:benchmark(}{it:est1bm} {it:est2bm}{cmd:)}} to provide the
estimates from another sample to be used as the benchmark (e.g. the pooled
sample over all time points or countries). If {cmd:benchmark()} is omitted,
an extended decomposition containing interaction terms for simultaneous
changes in quantities and prices is computed. See the Methods and Formulas
Section below.
{p 4 8 2}
{cmd:reference(}{cmd:1}|{cmd:2}|{it:estref1} {it:estref2}
[{it:estrefbm}]{cmd:)} determines the reference coefficients and reference
residual distributions within the samples to be used with the
decomposition. The default is {cmd:reference(1)}, meaning that the
coefficients from the first group (i.e. {it:est11} and
{it:est12}) are used; {cmd:reference(2)} uses the group 2 estimates
({it:est21} and {it:est22}). Alternatively, specify
{bind:{cmd:reference(}{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}}
to provide other reference estimates (e.g. models based on the pooled
samples over both groups). {it:estrefbm} is required only if
{bind:{cmd:benchmark(}{it:est1bm} {it:est2bm}{cmd:)}} is specified.
{p 4 8 2}
{cmd:detail}[{cmd:(}{it:dlist}{cmd:)}] requests that detailed
decomposition results for the individual regressors be reported (applies
only to the decomposition of the change in the "predicted gap"; see the
Methods and Formulas Section below). Use {it:dlist} to combine the results
for groups of regressors into single entries (variables not appearing in
{it:dlist} are listed individually). The usual shorthand conventions apply
to the {it:varlist}s specified in {it:dlist} (see help {help varlist}). For
example, specify {cmd:detail(exp=exp*)} to report one combined entry if the
models contain {cmd:exp} (experience) and {cmd:exp2} (experience squared).
{p 4 8 2}
{cmd:parametric} causes {cmd:jmpierce2} to compute the decomposition using
standardized residuals and residual standard deviations. The default is to
apply a nonparametric approach based on the relative ranks of the
residuals and the inverse residual distribution functions.
{p 4 8 2}
{cmd:residuals(}{it:newvar1 newvar2}|{it:prefix}{cmd:)} saves the imputed
hypothetical residuals as variables ({it:newvar1} or {it:prefix}{cmd:1}
for the first sample, {it:newvar2} or {it:prefix}{cmd:2} for the second
sample).
{p 4 8 2}
{cmd:ranks(}{it:newvar1 newvar2}|{it:prefix}{cmd:)} saves the computed
percentile ranks as variables ({it:newvar1} or {it:prefix}{cmd:1} for
the first sample, {it:newvar2} or {it:prefix}{cmd:2} for the second sample).
{p 4 8 2}
{cmd:nonotes} suppresses the display of the legend.
{p 4 8 2}
{cmd:nopreserve} is a technical option. {cmd:jmpierce2} internally preserves the
data (see help {help preserve}) and then drops all unused observations to
speed up the computations. However, if {cmd:nopreserve} is specified,
{cmd:jmpierce2} skips preserving the data and keeps the unused observations in
memory. {cmd:nopreserve} may make sense if there are only a few unused
observations or if {cmd:parametric} is specified.
{title:Examples}
{com}. regress lnwage educ exp exp2 if sex==0 & year==1
. estimates store male1
. regress lnwage educ exp exp2 if sex==1 & year==1
. estimates store female1
. regress lnwage educ exp exp2 if sex==0 & year==2
. estimates store male2
. regress lnwage educ exp exp2 if sex==1 & year==2
. estimates store female2
. jmpierce2 male1 female1 male2 female2
{txt}
{com}. jmpierce2 male1 female1 male2 female2, benchmark(1)
{txt}
{com}. generate byte year2 = year==2
. regress lnwage educ exp exp2 year2 if sex==0 & (year==1 | year==2)
. estimates store male12
. regress lnwage educ exp exp2 year2 if sex==1 & (year==1 | year==2)
. estimates store female12
. jmpierce2 male1 female1 male2 female2, benchmark(male12 female12)
{txt}
{com}. regress lnwage educ exp exp2 if year==1
. estimates store pooled1
. regress lnwage educ exp exp2 if year==2
. estimates store pooled2
. jmpierce2 male1 female1 male2 female2, reference(pooled1 pooled2)
{txt}
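{p 4 4 2}
The following lines sketch how the remaining options might be combined with
the models stored above. They are illustrative only and have not been run
against real data; {cmd:hres} and {cmd:prank} are arbitrary prefixes chosen
for this example. The first command requests a detailed decomposition with
{cmd:exp} and {cmd:exp2} combined into one entry, the second uses the
parametric approach, and the third saves the hypothetical residuals and the
percentile ranks, which should then be available as {cmd:hres1},
{cmd:hres2}, {cmd:prank1}, and {cmd:prank2}.{p_end}
{com}. jmpierce2 male1 female1 male2 female2, benchmark(1) detail(exp=exp*)
{txt}
{com}. jmpierce2 male1 female1 male2 female2, benchmark(1) parametric
{txt}
{com}. * hres and prank are hypothetical prefixes, not names required by jmpierce2
. jmpierce2 male1 female1 male2 female2, benchmark(1) residuals(hres) ranks(prank)
. summarize hres1 hres2 prank1 prank2
{txt}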
{title:Saved Results}
{p 4 4 2}
Matrices:
{p 4 15 2}{cmd:r(D)}{space 7}Decomposition of differentials{p_end}
{p 4 15 2}{cmd:r(DD)}{space 6}Decomposition of difference in differentials{p_end}
{p 4 15 2}{cmd:r(E)}{space 7}Decomposition of difference in predicted gap{p_end}
{p 4 15 2}{cmd:r(U)}{space 7}Decomposition of difference in residual gap{p_end}
{p 4 15 2}{cmd:r(b1)}{space 6}Parameter vector for sample 1{p_end}
{p 4 15 2}{cmd:r(b2)}{space 6}Parameter vector for sample 2{p_end}
{p 4 15 2}{cmd:r(b3)}{space 6}Parameter vector for benchmark sample (if provided){p_end}
{p 4 15 2}{cmd:r(dX1)}{space 5}Vector of quantity differences for sample 1{p_end}
{p 4 15 2}{cmd:r(dX2)}{space 5}Vector of quantity differences for sample 2{p_end}
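{p 4 4 2}
As a minimal sketch, assuming the models from the Examples section have been
stored, the saved matrices can be inspected with standard Stata commands.
Results in {cmd:r()} are replaced by the next r-class command, so it is
usually safest to copy a matrix before doing anything else. The matrix name
{cmd:DD} below is an arbitrary choice for this example.{p_end}
{com}. jmpierce2 male1 female1 male2 female2, benchmark(1)
. return list
. * copy the result into a regular matrix (DD is a hypothetical name)
. matrix DD = r(DD)
. matrix list DD
{txt}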
{title:Methods and Formulas}
{p 4 4 2}
Consider the linear model{p_end}
{p 8 8 2}y_t = x_t'b_t + e_t,  E(e_t) = 0{p_end}
{p 4 4 2}
where y_t is the vector of outcomes (e.g. log hourly wages) at time t, x_t
is the data matrix (the values of the regressors), b_t is the coefficient
vector, and e_t is the vector of residuals. The model can be reformulated
as{p_end}
{p 8 8 2}y_t = x_t'b_t + r_t*s_t{p_end}
{p 4 4 2}
where s_t represents the standard deviation of the residuals and
r_t is the vector of standardized residuals. Thus, the equation
now has a two-component residual, that is, the residuals are expressed
as a function of the general residual inequality at time t and the
positions of the residuals in the residual distribution.
{p 4 4 2}
Given two groups (e.g. males and females), the mean outcome differential
between the two groups can then be decomposed as follows:{p_end}
{p 8 8 2}dy_t = dx_t'b_t + dr_t*s_t{p_end}
{p 4 4 2}
where dy_t is the difference in mean outcomes between the groups, dx_t is
the vector of group differences in the means of the regressors, and dr_t is
the group difference in mean standardized residuals. The first term, E = dx_t'b_t,
is the "predicted gap". It reflects the "explained" part of the
differential due to differences in "observed quantities" (aka "endowments"
aka regressors). The second term, U = dr_t*s_t, is the "residual gap" and
reflects the "unexplained" part of the differential (due to differences in
"unobserved quantities", their "unobserved prices", and discrimination).
It is easy to see that the "predicted gap" and the "residual gap"
are equivalent to the explained part and the unexplained part in the standard
Blinder-Oaxaca decomposition (see, e.g., help {help oaxaca}; available
from the SSC Archive, type
{net "describe http://fmwww.bc.edu/repec/bocode/o/oaxaca":ssc describe oaxaca}).
{p 4 4 2}
Now, given two time points t=1 and t=2 (or, e.g., two countries), the
{it:change} in the outcome differential can be written as{p_end}
{p 8 8 2}dy_2 - dy_1 = [dx_2'b_2 - dx_1'b_1] + [dr_2*s_2 - dr_1*s_1]{p_end}
{p 4 4 2}
where the first part on the right-hand side of the equation is the change
in the "predicted gap" (dE) and the second part is the change in the
"residual gap" (dU). The two terms can be further decomposed into
{p_end}
{p 8 8 2}dE = (dx_2-dx_1)'b_1 + dx_1'(b_2-b_1) + (dx_2-dx_1)'(b_2-b_1){p_end}
{p 4 4 2}and{p_end}
{p 8 8 2}dU = (dr_2-dr_1)s_1 + dr_1(s_2-s_1) + (dr_2-dr_1)(s_2-s_1){p_end}
{p 4 4 2}
The first term in the decomposition of dE reflects the portion of the
change in the "predicted gap" that is explained by changes in the group
differences in "observed quantities" (aka endowments) and the second term
is the part that is due to changes in "observed prices" (aka
coefficients). The third term is an adjustment term accounting for the
interaction effect induced by the simultaneous change in quantities and
prices. Similarly, the first term in the decomposition of dU, sometimes
called the "gap effect", reflects the change that is due to changes in the
group differences in residual positions (i.e. changes in the group
differences in "unobserved quantities" and changes in discrimination) and
the second term is the part due to changes in residual inequality (i.e.
changes in "unobserved prices" for the "unobserved quantities"). The last
term again adjusts for interaction.
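{p 4 4 2}
The three-term split of dE can be checked with a small numeric sketch. The
values below are hypothetical and treat dx and b as scalars (a single
regressor) purely for illustration; they are not taken from any data set.
Worked by hand, the total is 0.3*0.12 - 0.5*0.10 = -0.014, which equals the
sum of the quantity effect (-0.020), the price effect (0.010), and the
interaction (-0.004). The decomposition of dU has the same structure, with
dr in place of dx and s in place of b.{p_end}
{com}. * hypothetical scalar values, chosen only for illustration
. scalar dx1 = 0.5
. scalar dx2 = 0.3
. scalar b1 = 0.10
. scalar b2 = 0.12
. display "dE          = " dx2*b2 - dx1*b1
. display "quantities  = " (dx2-dx1)*b1
. display "prices      = " dx1*(b2-b1)
. display "interaction = " (dx2-dx1)*(b2-b1)
. display "sum         = " (dx2-dx1)*b1 + dx1*(b2-b1) + (dx2-dx1)*(b2-b1)
{txt}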
{p 4 4 2}
It is common practice to reduce the three terms in the decompositions
above to two by employing the coefficient vector and residual
variation from a "benchmark" sample. Let b_B be the benchmark
coefficient vector and s_B the benchmark residual standard deviation. The
decompositions may then be written as{p_end}
{p 8 8 2}dE = (dx_2-dx_1)'b_B + [dx_2'(b_2-b_B) + dx_1'(b_B-b_1)]{p_end}
{p 4 4 2}and{p_end}
{p 8 8 2}dU = (dr_2-dr_1)s_B + [dr_2(s_2-s_B) + dr_1(s_B-s_1)]{p_end}
{p 4 4 2}
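If one of the two time points is the benchmark, the formulas simplify.
With t=1 as the benchmark (b_B = b_1, s_B = s_1), they reduce to the
parametrization applied by Juhn, Murphy and Pierce (1991), that is{p_end}
{p 8 8 2}dE = (dx_2-dx_1)'b_1 + dx_2'(b_2-b_1){p_end}
{p 8 8 2}dU = (dr_2-dr_1)s_1 + dr_2(s_2-s_1){p_end}
{p 4 4 2}
With t=2 as the benchmark (b_B = b_2, s_B = s_2), they reduce to the
parametrization applied by, e.g., Blau and Kahn (1997), that is{p_end}
{p 8 8 2}dE = (dx_2-dx_1)'b_2 + dx_1'(b_2-b_1){p_end}
{p 8 8 2}dU = (dr_2-dr_1)s_2 + dr_1(s_2-s_1){p_end}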
{p 4 4 2}
An alternative would be, for example, to use the pooled sample over all
time points as the benchmark sample. Note that in this case it is
reasonable to include year dummies in the models for the benchmark sample
(see, e.g., OECD 2002:103).
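{p 4 4 2}
A small numeric sketch may help to show that the choice of benchmark changes
how dE is split but not its total. The scalar values below are hypothetical
(a single regressor, with the benchmark coefficient set between the two
period coefficients). Worked by hand, the quantity effect is
(0.3-0.5)*0.11 = -0.022 and the price effect is
0.3*(0.12-0.11) + 0.5*(0.11-0.10) = 0.008, which again sum to
dE = 0.3*0.12 - 0.5*0.10 = -0.014.{p_end}
{com}. * hypothetical scalar values, chosen only for illustration
. scalar dx1 = 0.5
. scalar dx2 = 0.3
. scalar b1 = 0.10
. scalar b2 = 0.12
. scalar bB = 0.11
. display "quantities = " (dx2-dx1)*bB
. display "prices     = " dx2*(b2-bB) + dx1*(bB-b1)
. display "dE         = " dx2*b2 - dx1*b1
{txt}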
{p 4 4 2}
{it:Nonparametric implementation of the decomposition of dU}
{p 4 4 2}
By definition, e_t = r_t*s_t. Therefore, dr_1*s_1 is simply the group
difference in mean residuals at t=1 and dr_2*s_2 is the difference in mean
residuals at t=2. But what about dr_1*s_2 or dr_2*s_1? One obvious
solution would be to estimate the residual standard deviations and the
standardized residuals for both time points and then multiply the standard
deviation of one time point with the mean difference in standardized
residuals of the other. {cmd:jmpierce2} applies this approach if the
{cmd:parametric} option is specified. The disadvantage of the parametric
approach is that differences in distributional shape (apart from the
variance of the distribution) are neglected. Therefore, Juhn et al. (1991)
proposed the following nonparametric procedure, which is the default
procedure in {cmd:jmpierce2}. Let F_t() be the distribution function of the
residuals at time t. Furthermore, let q_t represent the positions of the
residuals in the residual distribution at time t (see help
{help relrank}; available from the SSC Archive, type
{net "describe http://fmwww.bc.edu/repec/bocode/r/relrank":ssc describe relrank}),
that is{p_end}
{p 8 8 2}q_t = F_t(e_t){p_end}
{p 4 4 2}Furthermore{p_end}
{p 8 8 2}e_t = F[-1]_t(q_t){p_end}
{p 4 4 2}
where F[-1]_t() stands for the inverse of F_t() (see help {help invcdf};
available from the SSC Archive, type
{net "describe http://fmwww.bc.edu/repec/bocode/i/invcdf":ssc describe invcdf}).
Applying the inverse distribution function of one time point to the
residual ranks of the other leads to a nonparametric version of the
decomposition of dU. For example, dr_1*s_2 is obtained by assigning each
individual at t=1 a percentile number corresponding to their position in the
residual distribution of t=1, then using these relative ranks to derive
hypothetical residuals for the t=1 individuals given the t=2 residual
distribution function, and finally computing the group difference in the
means of these hypothetical residuals.
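{p 4 4 2}
For comparison, the simpler parametric calculation described above can be
sketched by hand. This is an untested illustration under several
assumptions, not a reproduction of the internal code of {cmd:jmpierce2}: it
reuses the variables and stored estimates from the Examples section, takes
group 1 (males) as the reference within each sample, ignores weights, and
defines the group difference as group 1 minus group 2. The names
{cmd:xb_ref}, {cmd:r1}, {cmd:s1}, {cmd:s2}, {cmd:r1_g1}, and {cmd:r1_g2}
are hypothetical.{p_end}
{com}. * reference model for t=1: coefficients and residual s.d. of group 1
. estimates restore male1
. scalar s1 = e(rmse)
. predict double xb_ref if year==1, xb
. * standardized residuals for both groups at t=1
. generate double r1 = (lnwage - xb_ref)/s1
. * residual s.d. of the reference model for t=2
. estimates restore male2
. scalar s2 = e(rmse)
. quietly summarize r1 if sex==0
. scalar r1_g1 = r(mean)
. quietly summarize r1 if sex==1
. scalar r1_g2 = r(mean)
. * assumed sign convention: group 1 minus group 2
. display "dr_1*s_1 = " (r1_g1 - r1_g2)*s1
. display "dr_1*s_2 = " (r1_g1 - r1_g2)*s2
{txt}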
{p 4 4 2}
{it:Reference coefficients and reference residual distribution}
{p 4 4 2}
For each time point, a reference model must be specified to determine the
coefficients and residual distribution to be used in the decomposition.
The default is to use {it:est11} and {it:est12} as the reference models
(see the {cmd:reference()} option). From a technical point of view, two
situations have to be distinguished. First, the reference model may be
the group 1 model ({cmd:reference(1)}) or the group 2 model
({cmd:reference(2)}). In these cases, the coefficients of that model are
used to compute the residuals for both groups, but only the observations
in the reference group are used to determine the residual distribution
function. Second, the reference model may be some other model (e.g. a
pooled model over both groups). In this case, the coefficients from the
reference model are again used to compute the residuals for both groups.
The residual distribution function, however, is not derived from these
residuals. It is instead computed using the pooled residuals from the two
group-specific models.
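{p 4 4 2}
The two situations can be illustrated with the models from the Examples
section. The lines below are a sketch and have not been run against real
data; {cmd:pooled12} is a hypothetical name for a model pooled over both
groups and both time points, which serves as the third argument
({it:estrefbm}) because an external benchmark is specified.{p_end}
{com}. * reference model is the group 2 model within each sample
. jmpierce2 male1 female1 male2 female2, reference(2)
{txt}
{com}. * reference models pooled over both groups, with an external benchmark
. regress lnwage educ exp exp2 year2 if year==1 | year==2
. estimates store pooled12
. jmpierce2 male1 female1 male2 female2, benchmark(male12 female12) reference(pooled1 pooled2 pooled12)
{txt}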
{p 4 4 2}
Technical notes:
{p 8 10 2}
- {cmd:jmpierce2} does not require all models to contain the exact same set of
regressors. Coefficients not appearing in a model are simply assumed to be
zero for that model. However, it is important that all regressors are
defined (i.e. non-missing) for all observations used with the
decomposition. Thus, even if a regressor does not appear in an individual
model, the regressor must contain valid values for the observations in the
estimation sample of that model.
{p 8 10 2}
- {cmd:jmpierce2} computes residuals as the differences between the values of
the model's dependent variable and the model's linear predictions (using
{help matrix score}). If the models have been estimated using weighted
data, {cmd:jmpierce2} will take account of these weights in its computations.
In the {cmd:parametric} mode, {cmd:jmpierce2} will use the value of
{cmd:e(rmse)} as the model's residual standard deviation. If
multiple-equation models or models with ancillary parameters are used with
{cmd:jmpierce2}, only the first equation in {cmd:e(b)} is taken into account.
{title:References}
{p 4 8 2}
Juhn, Chinhui, Kevin M. Murphy, Brooks Pierce (1991). Accounting for the
Slowdown in Black-White Wage Convergence. Pp. 107-143 in: Workers and
Their Wages, ed. by Marvin Kosters, Washington, DC: AEI Press.
{p_end}
{p 4 8 2}
Blau, Francine D., Lawrence M. Kahn (1992). The Gender Earnings Gap:
Learning from International Comparisons. American Economic Review 82:
533-538.
{p_end}
{p 4 8 2}
Blau, Francine D., Lawrence M. Kahn (1996). Wage Structure and Gender
Earnings Differentials: an International Comparison. Economica 63:
S29-S62.
{p_end}
{p 4 8 2}
Blau, Francine D., Lawrence M. Kahn (1997). Swimming Upstream: Trends in
the Gender Wage Differential in the 1980s. Journal of Labor Economics 15:
1-42.
{p_end}
{p 4 8 2}
OECD (2002). Employment Outlook, Chapter 2. Paris.
{p_end}
{title:Author}
{p 4 4 2}
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
{title:Also see}
{p 4 13 2}
Online: help for {help regress}, {help estimates}, {help cumul},
{help smithwelch} (if installed), {help jmp} (if installed), {help oaxaca} (if installed),
{help invcdf} (if installed), {help relrank} (if installed)