Juhn-Murphy-Pierce decomposition
jmpierce est1 est2 [ , reference(0|1|2|estref) statistics(statlist) blocks(blist) save(newvar1 newvar2|prefix) residuals(newvar) ]
where blist is
name1 = varlist1 [, name2 = varlist2 [, ...] ]
Description
jmpierce computes the decomposition of differences between two outcome distributions introduced by Juhn, Murphy and Pierce (1993; also see Blau and Kahn 1996) from models previously fitted and stored by estimates store. Examples are the decomposition of changes in the income distribution over time, the decomposition of male-female wage differentials, or the decomposition of wage inequality differences between countries.
est1 is the name of the estimates related to the first distribution (e.g. the distribution in country A, among males, or at time point t); est2 is the name of the estimates related to the second distribution (e.g. the distribution in country B, among females, or at time point t-1). Note that the samples underlying est1 and est2 must be disjoint, that is, no observation may belong to both samples.
The model estimated last may be indicated by a period (.), even if it has not yet been stored.
See the oaxaca package and the decompose package (available from the SSC archive; type ssc describe oaxaca and ssc describe decompose) for programs to compute Oaxaca-Blinder type decompositions. See the jmpierce2 package for the decomposition of changes of differentials over time (ssc describe jmpierce2).
Warning: jmpierce is intended for use with models that have been estimated by the regress command. Use jmpierce with other models at your own risk.
Options
reference() specifies the reference or benchmark model. The default is reference(0), meaning that the average coefficients from est1 and est2 are used as the reference prices and the average residual distribution from est1 and est2 is used as the reference residual distribution. reference(1) uses est1 and reference(2) uses est2 as the benchmark, i.e. the coefficients from the chosen model are used as the reference prices and its residuals are used to determine the reference residual distribution. Alternatively, specify reference(estref), where estref is the name of the reference model. In this case, the coefficients from estref are used as the reference prices and the average residual distribution from est1 and est2 is used as the reference residual distribution. See the "Methods and Formulas" section for more details.
statistics(statlist) specifies the summary statistics for which the decomposition is to be displayed. The default is statistics(mean). Specify, for example, statistics(p25 p50 p75) to compute the decomposition for the 25th, 50th, and 75th percentiles. Available statistics are
    statname   definition
    --------------------------------------------------
    mean       mean
    sd         standard deviation
    median     median (same as p50)
    p5         5th percentile
    p10        10th percentile
    p25        25th percentile
    p50        50th percentile (same as median)
    p75        75th percentile
    p90        90th percentile
    p95        95th percentile
    iqr        interquartile range (same as d7525)
    d9010      p90 - p10
    d7525      p75 - p25 (same as iqr)
    d9050      p90 - p50
    d5010      p50 - p10
    --------------------------------------------------
blocks(blist) reports the quantity effect (Q) of specified blocks of variables. Unless the decomposition is conducted at the mean, the results will most likely depend on the order of the blocks.
save(newvar1 newvar2|prefix) creates a variable reflecting the hypothetical outcome distribution under the condition of fixed prices and fixed unobservables (called newvar1 or prefix1, respectively) and a variable reflecting the hypothetical outcome distribution under the condition of fixed unobservables (called newvar2 or prefix2, respectively).
residuals(newvar) creates a variable containing the hypothetical residuals called newvar.
Examples
Decomposition of the gender wage gap using the residuals/prices of the male model as benchmark:
    . regress lnwage educ exp exp2 if sex==1
    . estimates store male
    . regress lnwage educ exp exp2 if sex==2
    . estimates store female
    . jmpierce male female, reference(1) statistics(mean median)
... subdividing the quantity effect Q between education and experience:
. jmpierce male female, reference(1) blocks(educ=educ, exp=exp*)
... using the average residual distribution and the prices of a pooled model as benchmark:
    . regress lnwage educ exp exp2 if sex==1 | sex==2
    . jmpierce male female, ref(.)
Saved Results
Matrices:
    r(D)         The components of the decomposition(s)
    r(stats1),
    r(stats2)    The summary statistics for the hypothetical and raw distributions
    r(Qblocks)   Quantity effect by (blocks of) variables
Methods and Formulas
The exposition closely follows Juhn, Murphy and Pierce (1993). Consider the models
    y_1 = x_1b_1 + u_1
    y_2 = x_2b_2 + u_2
where y_1 and y_2 are the vectors of the values of the dependent variable in two samples, x_1 and x_2 are the data matrices (observable quantities), b_1 and b_2 are the vectors of estimated coefficients (observable prices) and u_1 and u_2 are the residuals (unmeasured prices and quantities).
Let F_1(.) and F_2(.) denote the cumulative distribution functions of the residuals. For example,
p_i1 = F_1(u_i1|x_i1)
is the percentile of an individual residual in the residual distribution of model 1. By definition we can write
u_i1 = F_1[-1](p_i1|x_i1)
where F_1[-1](.) is the inverse of the cumulative distribution function.
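The mapping between a residual, its percentile, and back can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from jmpierce itself: the distribution function is the unconditional empirical one (conditioning on x is ignored), the inverse is the plain left-continuous inverse without averaging over flat regions, and the function names ecdf_ranks and inverse_ecdf are hypothetical.

```python
import math
from bisect import bisect_right

def ecdf_ranks(u):
    """p_i = share of residuals less than or equal to u_i (empirical CDF)."""
    s = sorted(u)
    n = len(s)
    return [bisect_right(s, v) / n for v in u]

def inverse_ecdf(sample, p):
    """Smallest value in sample whose empirical CDF is at least p."""
    s = sorted(sample)
    n = len(s)
    # small tolerance guards against floating-point noise in p * n
    k = min(n, max(1, math.ceil(p * n - 1e-12)))
    return s[k - 1]

# residuals of "model 1" (made-up numbers)
u1 = [0.3, -1.2, 0.8, -0.1]
p1 = ecdf_ranks(u1)

# inverting with the same distribution recovers the residuals
recovered = [inverse_ecdf(u1, p) for p in p1]

# inverting with another (reference) distribution yields hypothetical residuals
ref = [-2.0, -1.0, 0.0, 1.0, 2.0]
mapped = [inverse_ecdf(ref, p) for p in p1]
```

Applying the inverse of a different distribution to the same percentiles is what produces the hypothetical residuals used in the remainder of this section.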
Next, assume that F(.) is a reference residual distribution (e.g. the average residual distribution over both samples) and that b is an estimate of benchmark coefficients (e.g. the coefficients from a pooled model over the whole sample). We can then determine hypothetical outcomes with varying quantities between the groups but fixed prices (coefficients) and a fixed residual distribution as
    y1_i1 = x_i1b + F[-1](p_i1|x_i1)
    y1_i2 = x_i2b + F[-1](p_i2|x_i2)
Furthermore, the hypothetical outcomes with varying quantities and varying prices but a fixed residual distribution are given as
    y2_i1 = x_i1b_1 + F[-1](p_i1|x_i1)
    y2_i2 = x_i2b_2 + F[-1](p_i2|x_i2)
Finally, the outcomes with varying quantities, varying prices and a varying residual distribution can be determined as
    y3_i1 = x_i1b_1 + F_1[-1](p_i1|x_i1)
    y3_i2 = x_i2b_2 + F_2[-1](p_i2|x_i2)
These last outcomes are simply the originally observed values, that is:
    y3_i1 = y_i1 = x_i1b_1 + u_i1
    y3_i2 = y_i2 = x_i2b_2 + u_i2
Let a capital letter stand for a summary statistic of the distribution of the variable denoted by the corresponding lower-case letter. For instance, Y may be the mean or the interquartile range of the distribution of y. The differential Y_1-Y_2 can then be decomposed as
    T = Y_1-Y_2
      = [Y1_1-Y1_2] + [(Y2_1-Y2_2) - (Y1_1-Y1_2)] + [(Y3_1-Y3_2) - (Y2_1-Y2_2)]
      = Q + P + U
That is, the total difference (T) can be attributed to differences in observable quantities (Q), differences in observable prices (P), and differences in unobservable quantities and prices (U).
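The construction of the three sets of outcomes and the resulting components can be traced in a short Python sketch. It is an illustration under stated assumptions, not jmpierce's implementation: the coefficients and data are made up rather than fitted, percentiles are unconditional empirical ranks, the inverse is a simple left-continuous inverse, and all function names (xb, residuals, jmp_decompose) are hypothetical. The benchmark here is model 1, in the spirit of reference(1).

```python
import math
from bisect import bisect_right
from statistics import mean

def xb(x, b):
    """Linear prediction; b[0] is the constant, b[1:] match the regressors."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

def residuals(xs, ys, b):
    return [y - xb(x, b) for x, y in zip(xs, ys)]

def ranks(u):
    """Empirical percentile of each residual within its own sample."""
    s = sorted(u)
    return [bisect_right(s, v) / len(s) for v in u]

def inv(ref, p):
    """Left-continuous inverse of the empirical CDF of ref (an assumption)."""
    s = sorted(ref)
    k = min(len(s), max(1, math.ceil(p * len(s) - 1e-12)))
    return s[k - 1]

def jmp_decompose(x1, y1, b1, x2, y2, b2, b_ref, u_ref, stat=mean):
    """Return T, Q, P, U for the summary statistic stat."""
    summary = []
    for xs, ys, b in ((x1, y1, b1), (x2, y2, b2)):
        p = ranks(residuals(xs, ys, b))
        f = [inv(u_ref, pi) for pi in p]                  # F[-1](p_i)
        ya = [xb(x, b_ref) + fi for x, fi in zip(xs, f)]  # y1: fixed prices, fixed F
        yb = [xb(x, b) + fi for x, fi in zip(xs, f)]      # y2: own prices, fixed F
        summary.append((stat(ya), stat(yb), stat(ys)))    # y3 equals observed y
    (a1, m1, o1), (a2, m2, o2) = summary
    Q = a1 - a2
    P = (m1 - m2) - (a1 - a2)
    U = (o1 - o2) - (m1 - m2)
    return {"T": o1 - o2, "Q": Q, "P": P, "U": U}

# toy data: one regressor per observation, made-up coefficients
x1, y1, b1 = [[10], [12], [16]], [2.1, 2.2, 2.9], [1.0, 0.10]
x2, y2, b2 = [[9], [12], [14]], [1.6, 2.0, 2.1], [0.9, 0.08]

res = jmp_decompose(x1, y1, b1, x2, y2, b2,
                    b_ref=b1, u_ref=residuals(x1, y1, b1))
```

For this toy data the mean difference T of 0.5 splits into roughly Q = 0.10, P = 0.33 and U = 0.07. Passing another function as stat, for example statistics.stdev, decomposes a different summary statistic with the same hypothetical outcomes.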
Technical notes:
- jmpierce's method to invert the empirical distribution function uses averages where the function is flat (the same method is used by summarize and pctile). Also see the invcdf package (available from the SSC archive; type ssc d invcdf). The choice of the inversion method may have a significant impact on the decomposition results (especially in small samples).
- reference(0) (the default) causes jmpierce to use the average coefficients from est1 and est2 as the reference coefficients. The "average" coefficients are derived by computing a simple arithmetic mean, that is b = (b_1+b_2)/2.
- reference(0) or reference(estref), where estref is neither est1 nor est2, causes jmpierce to use the average residual distribution from est1 and est2 as the reference residual distribution. The "average" residual distribution is computed by pooling the residuals from est1 and est2.
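The three notes above can be sketched in Python as follows. The inversion rule is an assumption about what "averages where the function is flat" means: when n*p is an integer k, the k-th and (k+1)-th order statistics are averaged; otherwise the order statistic at the next higher rank is taken. The function names are hypothetical and the numbers are made up.

```python
import math

def inverse_ecdf_avg(sample, p):
    """Invert the empirical CDF, averaging where it is flat (assumed rule)."""
    s = sorted(sample)
    n = len(s)
    pos = p * n
    k = round(pos)
    if abs(pos - k) < 1e-9 and 1 <= k < n:
        # CDF equals p on the whole interval [s[k-1], s[k]): take the midpoint
        return (s[k - 1] + s[k]) / 2
    k = min(n, max(1, math.ceil(pos - 1e-9)))
    return s[k - 1]

def average_coefficients(b1, b2):
    """Simple arithmetic mean of two coefficient vectors: b = (b_1+b_2)/2."""
    return [(c1 + c2) / 2 for c1, c2 in zip(b1, b2)]

def pooled_residuals(u1, u2):
    """Reference residual distribution obtained by pooling both samples."""
    return sorted(u1 + u2)

ref_b = average_coefficients([1.0, 0.10], [0.9, 0.08])
ref_u = pooled_residuals([0.1, 0.0, 0.3], [-0.02, 0.14, 0.08])
median_ref = inverse_ecdf_avg(ref_u, 0.5)
```

Because the residuals are pooled rather than averaged percentile by percentile, the larger of the two samples contributes more observations to the reference distribution.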
References
Juhn, Chinhui, Kevin M. Murphy, Brooks Pierce (1993). Wage Inequality and the Rise in Returns to Skill. Journal of Political Economy 101(3): 410-442.
Blau, Francine D., Lawrence M. Kahn (1996). International Differences in Male Wage Inequality: Institutions versus Market Forces. Journal of Political Economy 104(4): 791-837.
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see
Online: help for regress, estimates, cumul, pctile, oaxaca (if installed), decompose (if installed), jmpierce2 (if installed)