{smcl}
{* 25oct2006}{...}
{hline}
help for {hi:smithwelch}
{hline}

{title:Trend decomposition of outcome differentials}

{p 8 15 2}
{cmd:smithwelch} {it:est11} {it:est21} {it:est12} {it:est22}
[{cmd:,}
{bind:{cmdab:b:enchmark:(}{cmd:1}|{cmd:2}|{it:est1bm} {it:est2bm}{cmd:)}}
{bind:{cmdab:r:eference:(}{cmd:1}|{cmd:2}|{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}}
{cmdab:d:etail}[{cmd:(}{it:dlist}{cmd:)}]
{cmdab:a:djust:(}{it:varlist}{cmd:)}
{cmd:eform}
{cmdab:non:otes}
]

{p 4 4 2}
where {it:dlist} is

{p 15 15 2}
{it:name1} {cmd:=} {it:varlist1} [ {cmd:,} {it:name2} {cmd:=} {it:varlist2} [{cmd:,} {it:...} ] ]


{title:Description}

{p 4 4 2}
{cmd:smithwelch} computes decompositions of differences in mean outcome
differentials. Smith and Welch (1989) used such decomposition techniques in
their analysis of the change in the black-white wage differential over time.
An alternative application would be the decomposition of country differences
in the male-female wage gap. Also see Lee (2000) and Heckman et al. (2000).

{p 4 4 2}
{it:est11}, {it:est21}, {it:est12}, and {it:est22} specify the previously
fitted and stored regression estimates to be used with the decomposition (see
help {help estimates store}). The model estimated last may be indicated by a
period ({cmd:.}), even if it has not yet been stored. {it:est11} and
{it:est21} specify the group 1 estimates (e.g. male, black) and the group 2
estimates (e.g. female, white) for the first sample (e.g. time point 1,
country A); {it:est12} and {it:est22} are the group estimates for the second
sample (e.g. time point 2, country B). Note that the estimation samples
({cmd:e(sample)}) of the specified models determine the observations used in
the decomposition. Group 1 and group 2 must not overlap.
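
{p 4 4 2}
For illustration, a minimal sketch of the period shorthand, assuming that the
estimates {cmd:male1}, {cmd:female1}, and {cmd:male2} have already been stored
as in the Examples below (the variable names are those of the Examples and are
purely illustrative). The group 2 model for the second sample is fitted last
and is then referred to by a period instead of a stored name:

{com}. * assumes male1, female1, and male2 are stored as in the Examples below
. regress lnwage educ exp exp2 if sex==1 & year==2
. smithwelch male1 female1 male2 .
{txt}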

{p 4 4 2}
See the {help jmpierce2} package (available from the SSC archive; type
{net "describe http://fmwww.bc.edu/repec/bocode/j/jmpierce2":ssc describe jmpierce2})
for an alternative approach to the decomposition of differences in
differentials. See the {help oaxaca} package (type
{net "describe http://fmwww.bc.edu/repec/bocode/o/oaxaca":ssc describe oaxaca})
for a program to compute single differential decompositions.


{title:Options}

{p 4 8 2}
{cmd:benchmark(1}|{cmd:2}|{it:est1bm} {it:est2bm}{cmd:)} specifies (the
estimates for) the "benchmark" sample. {cmd:benchmark(1)} signifies that
sample 1 is the benchmark sample and {it:est11} and {it:est21} are the
benchmark estimates. Analogously, {it:est12} and {it:est22} are used as the
benchmark if you specify {cmd:benchmark(2)}. Alternatively, use
{bind:{cmd:benchmark(}{it:est1bm} {it:est2bm}{cmd:)}} to provide the estimates
from another sample to be used as the benchmark (e.g. the pooled sample over
all time points or countries). If {cmd:benchmark()} is omitted, an extended
decomposition containing interaction terms for simultaneous changes in
endowments {it:and} coefficients is computed. See the Methods and Formulas
section below.

{p 4 8 2}
{cmd:reference(}{cmd:1}|{cmd:2}|{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}
determines the reference coefficients within the samples to be used with the
decomposition. {cmd:reference(1)} means that the coefficients from the first
group (i.e. {it:est11} and {it:est12}) are used; {cmd:reference(2)} uses the
group 2 estimates ({it:est21} and {it:est22}). Alternatively, specify
{bind:{cmd:reference(}{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}} to
provide other reference estimates (e.g. models based on the pooled samples
over both groups). {it:estrefbm} is required only if
{bind:{cmd:benchmark(}{it:est1bm} {it:est2bm}{cmd:)}} is specified.
If {cmd:reference()} is omitted, an extended decomposition containing
interaction terms for the combined effect of differences in endowments
{it:and} coefficients is computed. See the Methods and Formulas section below.

{p 4 8 2}
{cmd:detail}[{cmd:(}{it:dlist}{cmd:)}] requests that detailed decomposition
results for the individual regressors be reported. Use {it:dlist} to subsume
the results for specific groups of regressors (variables not appearing in
{it:dlist} are listed individually). The usual shorthand conventions apply to
the {it:varlist}s specified in {it:dlist} (see help {help varlist}). For
example, specify {cmd:detail(exp=exp*)} if the models contain {cmd:exp}
(experience) and {cmd:exp2} (experience squared). Note that the detailed
results for the effects of changes or differences in coefficients may depend
arbitrarily on the scaling of the regressors.

{p 4 8 2}
{cmd:adjust(}{it:varlist}{cmd:)} may be used to adjust the outcome
differentials for the effects of certain variables (e.g. selection variables)
before computing the decomposition.

{p 4 8 2}
{cmd:eform} causes the results to be displayed in exponentiated form.

{p 4 8 2}
{cmd:nonotes} suppresses the display of the legend.


{title:Examples}

{com}. regress lnwage educ exp exp2 if sex==0 & year==1
. estimates store male1
. regress lnwage educ exp exp2 if sex==1 & year==1
. estimates store female1
. regress lnwage educ exp exp2 if sex==0 & year==2
. estimates store male2
. regress lnwage educ exp exp2 if sex==1 & year==2
. estimates store female2
. smithwelch male1 female1 male2 female2
{txt}

{com}. smithwelch male1 female1 male2 female2, benchmark(1) reference(1)
{txt}

{com}. generate byte year2 = year==2
. regress lnwage educ exp exp2 year2 if sex==0 & (year==1 | year==2)
. estimates store male12
. regress lnwage educ exp exp2 year2 if sex==1 & (year==1 | year==2)
. estimates store female12
. smithwelch male1 female1 male2 female2, benchmark(male12 female12)
{txt}

{com}. regress lnwage educ exp exp2 if year==1
. estimates store pooled1
. regress lnwage educ exp exp2 if year==2
. estimates store pooled2
. smithwelch male1 female1 male2 female2, reference(pooled1 pooled2)
{txt}


{title:Saved Results}

{p 4 4 2}
Matrices:

{p 4 22 2}{cmd:r(D)}{space 14}Decomposition of individual differentials{p_end}
{p 4 22 2}{cmd:r(DD)}{space 13}Decomposition of difference in differentials{p_end}
{p 4 22 2}{cmd:r(b11)} ... {cmd:r(b22)} Parameter vectors{p_end}
{p 4 22 2}{cmd:r(X11)} ... {cmd:r(X22)} Vectors of means of regressors{p_end}
{p 4 22 2}{cmd:r(b1b)}, {cmd:r(b2b)}{space 4}Parameter vectors for benchmark sample (if provided){p_end}
{p 4 22 2}{cmd:r(br1)}, {cmd:r(br2)}{space 4}Reference parameter vectors (if provided){p_end}
{p 4 22 2}{cmd:r(brb)}{space 12}Reference parameter vector for benchmark sample (if provided){p_end}


{title:Methods and Formulas}

{p 4 4 2}
Consider the linear model

        Y_gt = X_gt'b_gt + e_gt,   E(e_gt) = 0,   g = 1,2,   t = 1,2,

{p 4 4 2}
where Y_gt is a vector of outcomes (e.g. log hourly wages) for group g at
time t, X_gt is the data matrix (the values of the regressors), b_gt is a
coefficients vector, and e_gt is the vector of residuals. The group
differential in mean outcome at time t can be decomposed as follows (also see
help {help oaxaca}, if installed):

        dy_t = y_1t - y_2t
             = x_1t'b_1t - x_2t'b_2t
             = (x_1t-x_2t)'b_2t + x_2t'(b_1t-b_2t) + (x_1t-x_2t)'(b_1t-b_2t)
             = dx_t'b_2t + x_2t'db_t + dx_t'db_t
             = E + C + EC

{p 4 4 2}
where y_gt and x_gt symbolize group means and the "d" prefix indicates group
differences. Thus, the mean outcome differential is decomposed into a part
that is due to group differences in characteristics or "endowments" (E), a
part that is due to differences in coefficients (including the intercept)
(C), and a correction term capturing the interaction effect of differences in
endowments and coefficients (EC).
The first term, E, measures the change in mean outcome for group 2 if,
everything else equal, group 2 had the group 1 endowment levels. The second
term, C, measures the change in mean outcome for group 2 if group 2 retained
its own endowment levels but had the group 1 coefficients. The last term, EC,
quantifies the additional effect that is due to the combined differences in
endowments and coefficients.

{p 4 4 2}
Now suppose that we want to analyze the change in the outcome differential
over time (or compare the outcome differentials for different countries). The
change in the differential from t=1 to t=2 can be written as the sum of the
changes in the decomposition components E, C, and EC:

        dy_2 - dy_1 = [dx_2'b_22 - dx_1'b_21]
                    + [x_22'db_2 - x_21'db_1]
                    + [dx_2'db_2 - dx_1'db_1]
                    = dE + dC + dEC

{p 4 4 2}
Each of the three terms can again be divided into a part due to changes in
the x's, a part due to changes in the b's, and an interaction effect
accounting for the simultaneous change in the x's and b's:

        dE  = (dx_2-dx_1)'b_21 + dx_1'(b_22-b_21) + (dx_2-dx_1)'(b_22-b_21)
        dC  = (x_22-x_21)'db_1 + x_21'(db_2-db_1) + (x_22-x_21)'(db_2-db_1)
        dEC = (dx_2-dx_1)'db_1 + dx_1'(db_2-db_1) + (dx_2-dx_1)'(db_2-db_1)
                    (E)                (C)                   (EC)

{p 4 4 2}
{it:Specifying reference models for the group differentials}

{p 4 4 2}
It is common practice to remove the interaction term in the decomposition of
the group differentials by specifying "reference" coefficients to be used with
the decomposition (for example, the pooled estimates over both groups). Let
b_rt indicate the reference coefficients vector at time t.
The decomposition of the outcome differential at time t can then be written
as:

        dy_t = dx_t'b_rt + [x_1t'(b_1t-b_rt) + x_2t'(b_rt-b_2t)]
             = E + C

{p 4 4 2}
Accordingly, the difference in differentials may be expressed as
dy_2 - dy_1 = dE + dC, with

        dE = (dx_2-dx_1)'b_r1 + dx_1'(b_r2-b_r1) + (dx_2-dx_1)'(b_r2-b_r1)

        dC = [(x_12-x_11)'(b_11-b_r1) + (x_22-x_21)'(b_r1-b_21)]
           + [x_11'((b_12-b_r2)-(b_11-b_r1)) + x_21'((b_r2-b_22)-(b_r1-b_21))]
           + [(x_12-x_11)'((b_12-b_r2)-(b_11-b_r1))
            + (x_22-x_21)'((b_r2-b_22)-(b_r1-b_21))]

{p 4 4 2}
Note that the equations simplify considerably if the reference coefficients
are the coefficients from the first group or the second group. For example, if
b_rt = b_1t:

        dy_t = dx_t'b_1t + x_2t'(b_1t-b_2t)

        dy_2 - dy_1 = dE + dC

        dE = (dx_2-dx_1)'b_11 + dx_1'(b_12-b_11) + (dx_2-dx_1)'(b_12-b_11)
        dC = (x_22-x_21)'db_1 + x_21'(db_2-db_1) + (x_22-x_21)'(db_2-db_1)

{p 4 4 2}
{it:Specifying a benchmark sample}

{p 4 4 2}
Similarly, the number of terms in the decomposition of the change in
differentials can be reduced by specifying a "benchmark" sample. Let b_1b and
b_2b be the coefficient vectors from the benchmark sample for group 1 and
group 2, and let db_b = b_1b - b_2b. The decomposition of the difference in
differentials then is:

        dy_2 - dy_1 = dE + dC + dEC

        dE  = (dx_2-dx_1)'b_2b + [dx_2'(b_22-b_2b) + dx_1'(b_2b-b_21)]
        dC  = (x_22-x_21)'db_b + [x_22'(db_2-db_b) + x_21'(db_b-db_1)]
        dEC = (dx_2-dx_1)'db_b + [dx_2'(db_2-db_b) + dx_1'(db_b-db_1)]

{p 4 4 2}
Again, the formulas simplify if one of the two time points is the benchmark.
For example, if b_gb = b_g1:

        dE  = (dx_2-dx_1)'b_21 + dx_2'(b_22-b_21)
        dC  = (x_22-x_21)'db_1 + x_22'(db_2-db_1)
        dEC = (dx_2-dx_1)'db_1 + dx_2'(db_2-db_1)

{p 4 4 2}
Note that, if the benchmark estimates are the estimates from the pooled sample
over both time points (or, e.g., over all time points if there are more than
two), it seems reasonable to include time point dummies in the models.
While this is unproblematic for the decomposition of dE, it may have unwanted
effects on the decomposition of dC, because the time point dummies will appear
in the first term of the decomposition of dC. (Within each sample the dummies
take the same value for both groups, so they drop out of the group
differences dx_t, but not out of x_22-x_21.) A better solution would be to
introduce the time point dummies implicitly by using the {help areg} command
for the benchmark estimates.

{p 4 4 2}
{it:Specifying reference models {sf:and} a benchmark sample}

{p 4 4 2}
If both reference and benchmark models are specified, the formulas may be
written as:

        dy_2 - dy_1 = dE + dC

        dE = (dx_2-dx_1)'b_rb + [dx_2'(b_r2-b_rb) + dx_1'(b_rb-b_r1)]

        dC = [(x_12-x_11)'(b_1b-b_rb) + (x_22-x_21)'(b_rb-b_2b)]
           + [x_12'((b_12-b_r2)-(b_1b-b_rb)) + x_22'((b_r2-b_22)-(b_rb-b_2b))
            + x_11'((b_1b-b_rb)-(b_11-b_r1)) + x_21'((b_rb-b_2b)-(b_r1-b_21))]

{p 4 4 2}
where b_rb is the reference coefficients vector from the benchmark sample.
Using the second group estimates as the reference estimates and the first time
point as the benchmark yields the parametrization applied by Smith and Welch
(1989):

        dy_2 - dy_1 = dE + dC

        dE = (dx_2-dx_1)'b_21 + dx_2'(b_22-b_21)
                  (1.i)             (1.iii)

        dC = (x_12-x_11)'(b_11-b_21) + x_12'((b_12-b_22)-(b_11-b_21))
                     (1.ii)                        (1.iv)

{p 4 4 2}
The numbers in parentheses beneath the decomposition components correspond to
the equation numbers in Smith and Welch (1989:529). Furthermore, note that
Smith and Welch use different indices (12 is 1, 22 is 2, 11 is 3, 21 is 4).

{p 4 4 2}
Technical notes:

{p 8 10 2}
- {cmd:smithwelch} does not require all models to contain the exact same set
of regressors. Coefficients not appearing in a model are simply assumed to be
zero for that model. However, it is important that all regressors are defined
(i.e. non-missing) for all observations used with the decomposition. Thus,
even if a regressor does not appear in an individual model, the regressor must
contain valid values for the observations in the estimation sample of that
model.

{p 8 10 2}
- If the models were estimated using weighted data (see help {help weight}),
{cmd:smithwelch} takes these weights into account when computing the means of
the regressors.

{p 8 10 2}
- If multiple-equation models or models with ancillary parameters are used
with {cmd:smithwelch}, only the first equation in {cmd:e(b)} is taken into
account.


{title:References}

{p 4 8 2}
Heckman, James J., Thomas M. Lyons, and Petra E. Todd (2000). Understanding
Black-White Wage Differentials, 1960-1990. American Economic Review 90:
344-349.
{p_end}

{p 4 8 2}
Lee, Sang-Hyop (2000). On Decomposing Changes in Male-Female Wage Gap.
Working Paper No. 00-12. University of Hawaii at Manoa.
{p_end}

{p 4 8 2}
Smith, James P., and Finis R. Welch (1989). Black Economic Progress After
Myrdal. Journal of Economic Literature 27: 519-564.
{p_end}


{title:Author}

{p 4 4 2}
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


{title:Also see}

{p 4 13 2}
Online: help for {help regress}, {help estimates}, {help jmpierce2} (if
installed), {help oaxaca} (if installed)