{smcl}
{* 25oct2006}{...}
{hline}
help for {hi:smithwelch}
{hline}

{title:Trend decomposition of outcome differentials}

{p 8 15 2}
 {cmd:smithwelch} {it:est11} {it:est21}  {it:est12} {it:est22}
 [{cmd:,}
 {bind:{cmdab:b:enchmark:(}{cmd:1}|{cmd:2}|{it:est1bm} {it:est2bm}{cmd:)}}
 {bind:{cmdab:r:eference:(}{cmd:1}|{cmd:2}|{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}}
 {cmdab:d:etail}[{cmd:(}{it:dlist}{cmd:)}]
 {cmdab:a:djust:(}{it:varlist}{cmd:)}
 {cmd:eform}
 {cmdab:non:otes}  ]


{p 4 4 2} where {it:dlist} is

{p 15 15 2}
 {it:name1} {cmd:=} {it:varlist1} [ {cmd:,} {it:name2} {cmd:=} {it:varlist2}
 [{cmd:,} {it:...} ] ]


{title:Description}

{p 4 4 2}
 {cmd:smithwelch} computes decompositions of differences in mean outcome
 differentials. Smith and Welch (1989) used such decomposition techniques
 in their analysis of the change in the black-white wage differential over
 time. An alternative application would be the decomposition of country
 differences in the male-female wage gap. Also see Lee (2000) and Heckman
 et al. (2000).

{p 4 4 2}
{it:est11}, {it:est21}, {it:est12}, and {it:est22} specify the
 previously fitted and stored regression estimates to be used with the
 decomposition (see help {help estimates store}). The model estimated last
 may be indicated by a period (.), even if it has not yet been stored.
 {it:est11} and {it:est21} specify the group 1 estimates (e.g. male, black)
 and the group 2 estimates (e.g. female, white) for the first sample (e.g.
 time point 1, country A), {it:est12} and {it:est22} are the group
 estimates for the second sample (time point 2, country B). Note that the
 estimation samples ({cmd:e(sample)}) of the specified models determine the
 relevant observations for the decomposition. Group 1 and group 2 must not
 overlap.

{p 4 4 2}
 See the {help jmpierce2} package (available from the SSC archive; type
  {net "describe http://fmwww.bc.edu/repec/bocode/j/jmpierce2":ssc describe jmpierce2})
 for an alternative approach for the decomposition of differences in
 differentials. See the {help oaxaca} package (type
  {net "describe http://fmwww.bc.edu/repec/bocode/o/oaxaca":ssc describe oaxaca})
 for a program to compute single differential decompositions.


{title:Options}

{p 4 8 2}
 {cmd:benchmark(1}|{cmd:2}|{it:est1bm} {it:est2bm}{cmd:)} specifies (the
 estimates for) the "benchmark" sample. {cmd:benchmark(1)} signifies that
 sample 1 is the benchmark sample and {it:est11} and {it:est21} are the
 benchmark estimates. Analogously, {it:est12} and {it:est22} are used as the
 benchmark, if you specify {cmd:benchmark(2)}. Alternatively, use
  {bind:{cmd:benchmark(}{it:est1bm} {it:est2bm}{cmd:)}} to provide the
 estimates from another sample to be used as the benchmark (e.g. the pooled
 sample over all time points or countries). If {cmd:benchmark()} is
 omitted, an extended decomposition containing interaction terms for
 simultaneous changes in endowments {it:and} coefficients is computed. See the
 Methods and Formulas Section below.

{p 4 8 2}
 {cmd:reference(}{cmd:1}|{cmd:2}|{it:estref1} {it:estref2}
 [{it:estrefbm}]{cmd:)} determines the reference coefficients within the
 samples to be used with the decomposition. {cmd:reference(1)} means that
 the coefficients from the first group (i.e. {it:est11} and {it:est12}) are
 used; {cmd:reference(2)} uses the group 2 estimates ({it:est21} and
 {it:est22}). Alternatively, specify
  {bind:{cmd:reference(}{it:estref1} {it:estref2} [{it:estrefbm}]{cmd:)}}
 to provide other reference estimates (e.g. models based on the pooled
 samples over both groups). {it:estrefbm} is required only if
  {bind:{cmd:benchmark(}{it:est1bm} {it:est2bm}{cmd:)}} is specified. If
 {cmd:reference()} is omitted, an extended decomposition containing
 interaction terms for the combined effect of differences in endowments
 {it:and} coefficients is computed. See the Methods and Formulas Section
 below.

{p 4 8 2}
 {cmd:detail}[{cmd:(}{it:dlist}{cmd:)}] requests that detailed
 decomposition results for the individual regressors be reported. Use
 {it:dlist} to subsume the results
 for specific groups of regressors (variables not appearing in {it:dlist}
 are listed individually). The usual shorthand conventions apply to the
 {it:varlist}s specified in {it:dlist} (see help {it:varlist}). For
 example, specify {cmd:detail(exp=exp*)} if the models contain {cmd:exp}
 (experience) and {cmd:exp2} (experience squared). Note that individual
 results concerning the effect of changes/differences in coefficients
 may arbitrarily depend on the scaling of the regressors.

{p 4 8 2}
 {cmd:adjust(}{it:varlist}{cmd:)} may be used to adjust the outcome
 differentials for the effects of certain variables (e.g. selection
 variables) before computing the decomposition.

{p 4 8 2}
 {cmd:eform} causes the results to be displayed in exponentiated form.

{p 4 8 2}
 {cmd:nonotes} suppresses the display of the legend.



{title:Examples}

        {com}. regress lnwage educ exp exp2 if sex==0 & year==1
        . estimates store male1
        . regress lnwage educ exp exp2 if sex==1 & year==1
        . estimates store female1
        . regress lnwage educ exp exp2 if sex==0 & year==2
        . estimates store male2
        . regress lnwage educ exp exp2 if sex==1 & year==2
        . estimates store female2
        . smithwelch male1 female1 male2 female2
        {txt}

        {com}. smithwelch male1 female1 male2 female2, benchmark(1) reference(1)
        {txt}

        {com}. generate byte year2 = year==2
        . regress lnwage educ exp exp2 year2 if sex==0 & (year==1 | year==2)
        . estimates store male12
        . regress lnwage educ exp exp2 year2 if sex==1 & (year==1 | year==2)
        . estimates store female12
        . smithwelch male1 female1 male2 female2, benchmark(male12 female12)
        {txt}

        {com}. regress lnwage educ exp exp2 if year==1
        . estimates store pooled1
        . regress lnwage educ exp exp2 if year==2
        . estimates store pooled2
        . smithwelch male1 female1 male2 female2, reference(pooled1 pooled2)
        {txt}


{title:Saved Results}

{p 4 4 2}
Matrices:

{p 4 22 2}{cmd:r(D)}{space 14}Decomposition of individual differentials{p_end}
{p 4 22 2}{cmd:r(DD)}{space 13}Decomposition of difference in differentials{p_end}
{p 4 22 2}{cmd:r(b11)} ... {cmd:r(b22)} Parameter vectors{p_end}
{p 4 22 2}{cmd:r(X11)} ... {cmd:r(X22)} Vectors of means of regressors{p_end}
{p 4 22 2}{cmd:r(b1b)}, {cmd:r(b2b)}{space 4}Parameter vectors for benchmark
 sample (if provided){p_end}
{p 4 22 2}{cmd:r(br1)}, {cmd:r(br2)}{space 4}Reference parameter vectors (if
 provided){p_end}
{p 4 22 2}{cmd:r(brb)}{space 12}Reference parameter vector for
 benchmark sample (if provided){p_end}


{title:Methods and Formulas}

{p 4 4 2}
 Consider the linear model

       Y_gt = X_gt'b_gt + e_gt,  E(e_gt) = 0,  g = 1,2  t = 1,2,

{p 4 4 2}
 where Y_gt is a vector of outcomes (e.g. log hourly wages) for group g at
 time t, X_gt is the data matrix (the values of the regressors), b_gt is a
 coefficients vector, and e_gt is the vector of residuals. The group
 differential in mean outcome at time t can be decomposed as follows (also
 see help {help oaxaca}, if installed):

       dy_t = y_1t - y_2t = x_1t'b_1t - x_2t'b_2t

            = (x_1t-x_2t)'b_2t + x_2t'(b_1t-b_2t) + (x_1t-x_2t)'(b_1t-b_2t)

            =     dx_t'b_2t    +    x_2t'db_t     +       dx_t'db_t

            =         E        +         C        +           EC

{p 4 4 2}
 where y_gt and x_gt symbolize group means and the "d" prefix indicates
 group differences. Thus, the mean outcome differential is decomposed into
 a part that is due to group differences in characteristics or "endowments"
 (E), a part that is due to differences in coefficients (including the
 intercept) (C), and a correction term capturing the interaction effect of
 differences in endowments and coefficients (EC). The fist term, E,
 measures the change in mean outcome for group 2 if, everything
 else equal, group 2 had the group 1 endowment levels. The second term, C, measures
 the change in mean outcome for group 2 if group 2 retained its own
 endowment levels, but had the group 1 coefficients. The last term, EC,
 quantifies the additional effect that is due to the combined differences in
 in endowments and coefficients.

{p 4 4 2}
 Now suppose that we want to analyze the change in the outcome differential
 over time (or compare the outcome differentials for different countries).
 The change in the differential from t=1 to t=2 can be written as the sum
 of the changes in the decomposition components E, C, and CE:

        dy_2 - dy_1 = [dx_2'b_22 - dx_1'b_21] + [x_22'db_2 - x_21'db_1]

                      + [dx_2'db_2 - dx_1'db_1]

                    = dE + dC + dEC

{p 4 4 2}
 Each of the three terms can again be divided into a part due to changes
 in the x's, a part due to changes in the b's, and an interaction effect
 accounting for the simultaneous change in the x's and b's:

        dE  = (dx_2-dx_1)'b_21 + dx_1'(b_22-b_21) + (dx_2-dx_1)'(b_22-b_21)

        dC  = (x_22-x_21)'db_1 + x_21'(db_2-db_1) + (x_22-x_21)'(db_2-db_1)

        dEC = (dx_2-dx_1)'db_1 + dx_1'(db_2-db_1) + (dx_2-dx_1)'(db_2-db_1)

                     (E)               (C)                   (CE)


{p 4 4 2}
 {it:Specifying reference models for the group differentials}

{p 4 4 2}
 It is common practice to remove the interaction term in the
 decomposition of the group differentials by specifying "reference"
 coefficients to be used with the decomposition (for example, the
 pooled estimates over both groups). Let b_rt indicate the
 reference coefficients vector at time t. The decomposition of the outcome
 differential at time t can then be written as:

       dy_t = dx_t'b_rt + [x_1t'(b_1t-b_rt) + x_2t'(b_rt-b_2t)]

            =     E     +                   C

{p 4 4 2}
 Accordingly, the difference in differentials may be expressed as

        dy_2 - dy_1 = dE + dC

    with

        dE = (dx_2-dx_1)'b_r1 + dx_1'(b_r2-b_r1) + (dx_2-dx_1)'(b_r2-b_r1)

        dC = [(x_12-x_11)'(b_11-b_r1) + (x_22-x_21)'(b_r1-b_21)]

             + [x_11'((b_12-b_r2)-(b_11-b_r1))

                + x_21'((b_r2-b_22)-(b_r1-b_21))]

             + [(x_12-x_11)'((b_12-b_r2)-(b_11-b_r1))

                + (x_22-x_21)'((b_r2-b_22)-(b_r1-b_21))]

{p 4 4 2}
 Note that the equations simplify a lot if the reference coefficients are
 the coefficients from the first group or the second group. For example, if
 b_rt=b_1t:

        dy_t = dx_t'b_1t + x_2t'(b_1t-b_2t)

        dy_2 - dy_1 = dE + dC

        dE = (dx_2-dx_1)'b_11 + dx_1'(b_12-b_11) + (dx_2-dx_1)'(b_12-b_11)

        dC = (x_22-x_21)'db_1 + x_21'(db_2-db_1) + (x_22-x_21)'(db_2-db_1)


{p 4 4 2}
 {it:Specifying a benchmark sample}

{p 4 4 2}
 Similarly, the number of terms in the decomposition of the change in
 differentials can be reduced by specifying a "benchmark"
 sample. Let b_1b and b_2b be the coefficient vectors from the benchmark
 sample for group 1 and group 2. The decomposition of the difference in
 differentials then is:

        dy_2 - dy_1 = dE + dC + dEC

        dE  = (dx_2-dx_1)'b_2b + [dx_2'(b_22-b_2b) + dx_1'(b_2b-b_21)]

        dC  = (x_22-x_21)'db_b + [x_22'(db_2-db_b) + x_21'(db_b-db_1)]

        dEC = (dx_2-dx_1)'db_b + [dx_2'(db_2-db_b) + dx_1'(db_b-db_1)]

{p 4 4 2}
 Again, the formulas simplify if one of the two time points is the benchmark.
 For example, if b_gb=b_g1:

        dE  = (dx_2-dx_1)'b_21 + dx_2'(b_22-b_21)

        dC  = (x_22-x_21)'db_1 + x_22'(db_2-db_1)

        dEC = (dx_2-dx_1)'db_1 + dx_2'(db_2-db_1)

{p 4 4 2}
 Note that, if the benchmark estimates are the estimates from the pooled sample
 over both time points (or, e.g., all time points if there are more than
 two time points), it seems reasonable to include time point dummies in the
 models. While this is unproblematic for the decomposition of dE, it may
 have unwanted effects on the decomposition of dC (because the year
 dummies will appear in the first term of the decomposition of dC). A
 better solution would be to implicitly introduce the year dummies using the
 {help areg} command for the benchmark estimates.


{p 4 4 2}
 {it:Specifying reference models {sf:and} a benchmark sample}

{p 4 4 2}
 If reference and benchmark models both are specified, the formulas may be
 written as:

        dy_2 - dy_1 = dE + dC

        dE = (dx_2-dx_1)'b_rb + [dx_2'(b_r2-b_rb) + dx_1'(b_rb-b_r1)]

        dC = [(x_12-x_11)'(b_1b-b_rb) + (x_22-x_21)'(b_rb-b_2b)]

             + [x_12'((b_12-b_r2)-(b_1b-b_rb))

                + x_22'((b_r2-b_22)-(b_rb-b_2b))

                + x_11'((b_1b-b_rb)-(b_11-b_r1))

                + x_21'((b_rb-b_2b)-(b_r1-b_21))]

{p 4 4 2}
 where b_rb is the reference coefficients vector from the benchmark sample.
 Using the second group estimates as the reference estimates and the
 first time point as the benchmark yields the parametrization applied by
 Smith and Welch (1989):

        dy_2 - dy_1 = dE + dC

        dE = (dx_2-dx_1)'b_21 + dx_2'(b_22-b_21)

                   (1.i)            (1.iii)

        dC = (x_12-x_11)'(b_11-b_21) + x_12'((b_12-b_22)-(b_11-b_21))

                     (1.ii)                      (1.iv)

{p 4 4 2}
 The numbers in parentheses beneath the decomposition components
 correspond to the equation numbers in Smith and Welch (1989:529).
 Furthermore, note that Smith and Welch use different indices
 (12 is 1, 22 is 2, 11 is 3, 21 is 4).


{p 4 4 2}
 Technical notes:

{p 8 10 2}
 - {cmd:smithwelch} does not require all models to contain the exact same
 set of regressors. Coefficients not appearing in a model are simply
 assumed to be zero for that model. However, it is important that all
 regressors are defined (i.e. non-missing) for all observations used with
 the decomposition. Thus, even if a regressor does not appear in an
 individual model, the regressor must contain valid values for the
 observations in the estimation sample of that model.

{p 8 10 2}
 - If the models were estimated using weighted data (see help
  {help weight}), {cmd:smithwelch} will take account of these weights in
 its computations of the means of the regressors.

{p 8 10 2}
 - If multiple-equation models or models with ancillary parameters are used
 with {cmd:smithwelch}, only the first equation in {cmd:e(b)} is taken into
 account.


{title:References}

{p 4 8 2}
Heckman, James J., Thomas M. Lyons, Petra E. Todd (2000). Understanding
Black-White Wage Differentials, 1960-1990. American Economic Review 90:
344-349.
{p_end}
{p 4 8 2}
Lee, Sang-Hyop (2000). On Decomposing Changes in Male-Female Wage Gap.
Working Paper No. 00-12. University of Hawaii at Manoa.
 {p_end}
{p 4 8 2}
Smith, James P., Finis R. Welch (1989). Black Economic Progress After
Myrdal. Journal of Economic Literature 27: 519-564.
 {p_end}


{title:Author}

{p 4 4 2}
 Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


{title:Also see}

{p 4 13 2}
 Online:  help for {help regress}, {help estimates}, {help jmpierce2} (if installed),
 {help oaxaca} (if installed)