{smcl}
{* 05may2008}{...}
{hline}
help for {hi:oaxaca8}
{hline}
{hline}
{p 0 0 2}
A newer version of this software is available from the SSC Archive as
{bf:{net "describe http://fmwww.bc.edu/RePEc/bocode/o/oaxaca":oaxaca}}.
{p_end}
{hline}

{title:Decomposition of outcome differentials}

{p 8 14 2}{cmd:oaxaca8} {it:est1} {it:est2} [{cmd:,}
 {it:{help oaxaca8##com0:common_options}}
 {it:{help oaxaca8##oax0:oaxaca8_options}} ]

{p 8 14 2}{cmd:oaxaca2} {it:varlist} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] ,
 {cmd:by(}{it:groupvar}{cmd:)} [
 {it:{help oaxaca8##com0:common_options}}
 {it:{help oaxaca8##oax20:oaxaca2_options}} ]

{marker com0}
    {it:{help oaxaca8##com:common_options}}{col 31}Description
    {hline 70}
    {cmdab:w:eight:(}{it:wgt} [{it:wgt ...}]{cmd:)}{col 31}{...}
specify weights for the two-fold
{col 35}decomposition; {it:wgt} is {it:#} or {cmdab:o:mega}
    {cmdab:d:etail}[{cmd:(}{it:dlist}{cmd:)}]{col 31}{...}
display detailed results for the regressors
    {cmdab:a:djust}{cmd:(}{it:varlist}{cmd:)}{col 31}{...}
adjustment for selection variables
    {cmdab:fix:ed}[{cmd:(}{it:varlist}{cmd:)}]{col 31}{...}
assume fixed regressors
    {cmdab:l:evel:(}{it:#}{cmd:)}{col 31}{...}
set the confidence level
    {cmd:eform}{col 31}{...}
display results in exponentiated form
    {cmd:tf}{col 31}{...}
display three-fold decomposition
    {cmd:nose}{col 31}{...}
suppress computation of standard errors
    {cmdab:es:ave}{col 31}{...}
save results in {cmd:e()}
    {hline 70}
    where {it:dlist} is{col 31}{...}
{it:name} {cmd:=} {it:varlist} [{cmd:,} {it:name} {cmd:=} {it:varlist} {it:...}]

{marker oax0}
    {it:{help oaxaca8##oax:oaxaca8_options}}{col 31}Description
    {hline 70}
    {cmdab:r:eference:(}{it:ref} [{it:ref ...}]{cmd:)}{col 31}{...}
specify reference estimates
    {cmd:asis}{col 31}{...}
do not change the order of the models
    {hline 70}

{marker oax20}
    {it:{help oaxaca8##oax2:oaxaca2_options}}{col 31}Description
    {hline 70}
    {cmd:by(}{it:groupvar}{cmd:)}{col 31}{...}
specifies the groups; {cmd:by()} is not optional
    {cmdab:p:ooled}{col 31}{...}
request decomposition based on pooled model
    {cmdab:i:ncludeby}{col 31}{...}
include {it:groupvar} in the pooled model
    {cmdab:noi:sily}{col 31}{...}
display model estimates
    {cmd:cmd(}{it:cmd} [{it:cmd} ...]{cmd:)}{col 31}{...}
set the estimation command, default: {cmd:regress}
    {cmdab:cmdo:pts(}{it:opts} [{it:opts} ...]{cmd:)}{col 31}{...}
options for model estimation
    {cmdab:addv:ars(}{it:vars} [{it:vars} ...]{cmd:)}{col 31}{...}
additional regressors for individual models
    {hline 70}

{p 4 4 2}
{cmd:aweight}s, {cmd:fweight}s, {cmd:iweight}s, and {cmd:pweight}s are allowed
with {cmd:oaxaca2} (depending on the estimation command used); see help {help weight}.

{title:Description}

{p 4 4 2}
Given the results from two models previously estimated and stored by
{bf:{help estimates store}}, {cmd:oaxaca8} computes the so-called Blinder-Oaxaca
decomposition of the mean outcome differential. An example is the decomposition
of the gender wage gap into an "explained" portion due to differences in
endowments and an "unexplained" portion due to differences in coefficients.
{it:est1} refers to the name of the stored estimates for the first group
(e.g. males), {it:est2} is the name of the stored estimates for the second group
(e.g. females).

{p 4 4 2}
{cmd:oaxaca8} can display different variants of the decomposition and also
provides standard errors. See the methods and formulas section for details.

{p 4 4 2}
{cmd:oaxaca2} is a wrapper for {cmd:oaxaca8}. It first estimates the group models
and then performs the decomposition. {cmd:oaxaca2} is suitable for use with
{bf:{help bootstrap}} (also see the {cmd:esave} option).

{p 4 4 2}
{cmd:oaxaca8} requires Stata 8.2 or higher. A Stata 7 decomposition package is
available from the SSC Archive as
{bf:{net "describe http://fmwww.bc.edu/RePEc/bocode/d/decompose":decompose}}.
Also see {bf:{net "describe http://fmwww.bc.edu/RePEc/bocode/d/decomp":decomp}}
by Ian Watson.
Packages to compute decompositions of changes in outcome differentials are
{bf:{net "describe http://fmwww.bc.edu/repec/bocode/s/smithwelch":smithwelch}} and
{bf:{net "describe http://fmwww.bc.edu/repec/bocode/j/jmpierce":jmpierce}}.

{title:Options}

{marker com}
{it:{dlgtab:common_options}}

{p 4 8 2}
{cmd:weight(}{it:wgt} [{it:wgt ...}]{cmd:)}, where {it:wgt} is either {it:#} or
{cmd:omega}, specifies the weight given to the parameters of the high-outcome
group for the two-fold decomposition. A separate decomposition is computed for
each specified {it:wgt}. For example, {cmd:weight(0 1)} displays a decomposition
with the low-outcome group coefficients as reference and a decomposition with
the high group parameters as a reference. Specifying {cmd:weight(omega)} causes
{cmd:oaxaca8} to compute the reference parameters from the data as explained in
the methods and formulas section. The {cmd:weight(omega)} option makes sense only
in the context of OLS regression. Furthermore, note that the interpretation of
the detailed results for the "unexplained" part (see the {cmd:detail} option) is
problematic with this decomposition.

{p 4 8 2}{cmd:detail}[{cmd:(}{it:dlist}{cmd:)}] requests that the detailed
decomposition results for the individual regressors be reported. Use {it:dlist}
to subsume the results for specific groups of regressors (variables not
appearing in {it:dlist} are listed individually). The usual shorthand
conventions apply to the {it:varlist}s specified in {it:dlist} (see help
{help varlist}). For example, specify {cmd:detail(exp=exp*)} if the models
contain {cmd:exp} (experience) and {cmd:exp2} (experience squared).

{p 8 8 2}A cautionary note: For the "unexplained" part of the differential, the
subdivision into separate contributions is sensitive to locational
transformations of the regressors (see, e.g., Oaxaca and Ransom 1999). The
results are thus arbitrary unless the regressors have natural zero points.
A related problem is that the results for categorical variables depend on the
choice of the reference category. A solution to the reference category problem
is provided by the
{bf:{net "describe http://fmwww.bc.edu/RePEc/bocode/d/devcon":devcon}} package
from the SSC Archive.

{p 4 8 2}
{cmd:adjust(}{it:varlist}{cmd:)} may be used to adjust the outcome differential
for the effects of certain variables (e.g. selection variables) before computing
the decomposition.

{p 4 8 2}
{cmd:fixed}[{cmd:(}{it:varlist}{cmd:)}] indicates that certain regressors are
fixed. The default is to treat all regressors as stochastic. If {cmd:fixed} is
specified without arguments, all regressors are assumed to be fixed. Using this
option has implications for the computation of the standard errors of the
decomposition components.

{p 4 8 2}{cmd:level(}{it:#}{cmd:)} specifies the confidence level, in percent
terms, for the confidence intervals of the computed statistics; see help
{help level}.

{p 4 8 2}{cmd:eform} causes the results to be displayed in exponentiated form.

{p 4 8 2}{cmd:tf} specifies that the three-fold decomposition be displayed in any
case.

{p 4 8 2}
{cmd:nose} suppresses the calculation of standard errors.

{p 4 8 2}
{cmd:esave} specifies that the results be returned in {cmd:e()}. This is useful,
e.g., if you want to use {bf:{help bootstrap}} with {cmd:oaxaca8}. Note that the
off-diagonal elements in {cmd:e(V)} will be set to zero since {cmd:oaxaca8} does
not provide the covariances among the various decomposition components. Do not
apply {bf:{help lincom}} or similar techniques to the returned results. Also do
not use {bf:{help predict}}.{p_end}

{marker oax}
{it:{dlgtab:oaxaca8_options}}

{p 4 8 2}
{cmd:reference(}{it:ref1} [{it:ref2 ...}]{cmd:)} specifies reference estimates to
be used with the two-fold decomposition. {it:ref1}, {it:ref2}, etc. refer to the
names of the stored models. A separate decomposition is computed for each model
specified.
Note that no standard errors will be computed for the "unexplained" part in
these decompositions.

{p 4 8 2}
{cmd:asis} instructs {cmd:oaxaca8} not to change the order of the models. By
default, {cmd:oaxaca8} rearranges the models so that the mean differential is
positive.{p_end}

{marker oax2}
{it:{dlgtab:oaxaca2_options}}

{p 4 8 2}{cmd:by(}{it:groupvar}{cmd:)} defines the groups between which the
decomposition is to be performed. {it:groupvar} must take on exactly two unique
values.

{p 4 8 2}{cmd:pooled} displays a decomposition based on a pooled model over both
groups.

{p 4 8 2}{cmd:includeby} specifies that {it:groupvar} (see the {cmd:by()} option)
be included as a control variable in the pooled model.

{p 4 8 2}{cmd:noisily} causes the estimates of the individual models to be
displayed.

{p 4 8 2}{cmd:cmd(}{it:cmd} [{it:cmd} ...]{cmd:)} specifies the estimation
commands for the models (see {help estcom}). The default command is
{bf:{cmd:regress}}. For example, specify {cmd:cmd(ivreg)} to use
{bf:{help ivreg}} instead. Specify more than one command if different commands
are to be used for the two groups. For example, {cmd:cmd(regress ivreg)} would
use {cmd:regress} for the first group and {cmd:ivreg} for the second.

{p 4 8 2}{cmd:cmdopts(}"{it:opts}" ["{it:opts}" ...]{cmd:)} may be used to specify
sets of options for the model estimation commands. {it:opts} must be enclosed in
quotes if it contains spaces. If only one set of options is specified, it is
added to all models. For example, specify {cmd:cmdopts("robust nocons")} to add
the options {cmd:robust} and {cmd:nocons} to all models. Alternatively,
{cmd:cmdopts("robust nocons" "hc3")} would add {cmd:robust nocons} to the first
model and {cmd:hc3} to the second. Finally, {cmd:cmdopts("hc3" "")} would add
{cmd:hc3} to the first model and nothing to the second.

{p 4 8 2}{cmd:addvars(}"{it:vars}" ["{it:vars}" ...]{cmd:)} specifies additional
variables to be added to individual models.
For example, {cmd:addvars("" "lambda")} would add variable {cmd:lambda} to the
second model.

{title:Example}

{p 4 4 2}Step 1: Estimate and store the models

        {com}. regress lnwage educ exp exp2 if female==0
        . estimates store male
        . regress lnwage educ exp exp2 if female==1
        . estimates store female{txt}

{p 4 4 2}Step 2: Compute the decomposition

{p 6 8 2}- three-fold decomposition (endowments, coefficients, interaction)

        {com}. oaxaca8 male female{txt}

{p 6 8 2}- various parametrizations of the two-fold decomposition (explained,
unexplained)

        {com}. oaxaca8 male female, weight(1 0.5 0 omega){txt}

{p 4 4 2}Usage of {cmd:oaxaca2}: steps 1 and 2 in one command

        {com}. oaxaca2 lnwage educ exp exp2, by(female){txt}

{p 4 4 2}Bootstrapping (Stata 8)

        {com}. bs "oaxaca2 lnwage educ exp exp2, by(female) esave nose" _b{txt}

{p 4 4 2}Bootstrapping (Stata 9)

        {com}. bootstrap _b: oaxaca2 lnwage educ exp exp2, by(female) esave nose{txt}

{p 4 4 2}(Note that the {cmd:nose} option in the bootstrap examples is not
essential. However, {cmd:bootstrap} executes faster if {cmd:nose} is specified.)
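
{p 4 4 2}Detailed results and saved results (a further sketch, assuming the
models {cmd:male} and {cmd:female} from Step 1 are still stored; variable and
model names are the illustrative ones used above). The first command requests
the two-fold decomposition for both reference groups and subsumes {cmd:exp} and
{cmd:exp2} under one heading; the remaining commands are standard Stata commands
for inspecting what was left behind in {cmd:r()}.

        {com}. oaxaca8 male female, weight(0 1) detail(exp=exp*)
        . display r(diff)
        . return list
        . matrix list r(D){txt}
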

{title:Saved Results}

{p 4 4 2}{cmd:oaxaca8} saves in {cmd:r()}:

{p 4 4 2}Scalars:

{p 4 16 2}{cmd:r(pred1)}{space 4}mean linear prediction from first group{p_end}
{p 4 16 2}{cmd:r(se_pred1)}{space 1}standard error of prediction from first group{p_end}
{p 4 16 2}{cmd:r(pred2)}{space 4}mean linear prediction from second group{p_end}
{p 4 16 2}{cmd:r(se_pred2)}{space 1}standard error of prediction from second group{p_end}
{p 4 16 2}{cmd:r(diff)}{space 5}difference between mean predictions{p_end}
{p 4 16 2}{cmd:r(se_diff)}{space 2}standard error of difference{p_end}

{p 4 4 2}Matrices:

{p 4 16 2}{cmd:r(D)}{space 8}results of the decompositions{p_end}
{p 4 16 2}{cmd:r(VD)}{space 7}variances of the results in {cmd:r(D)}{p_end}
{p 4 16 2}{cmd:r(B1)}{space 7}coefficients from the first model{p_end}
{p 4 16 2}{cmd:r(VB1)}{space 6}variance-covariance matrix from the first model{p_end}
{p 4 16 2}{cmd:r(B2)}{space 7}coefficients from the second model{p_end}
{p 4 16 2}{cmd:r(VB2)}{space 6}variance-covariance matrix from the second model{p_end}
{p 4 16 2}{cmd:r(X1)}{space 7}means of the regressors for the first group{p_end}
{p 4 16 2}{cmd:r(VX1)}{space 6}variance-covariance matrix of the means of the regressors for the first group{p_end}
{p 4 16 2}{cmd:r(X2)}{space 7}means of the regressors for the second group{p_end}
{p 4 16 2}{cmd:r(VX2)}{space 6}variance-covariance matrix of the means of the regressors for the second group{p_end}

{p 4 4 2}If {cmd:esave} is specified, {cmd:oaxaca8} additionally saves in {cmd:e()}:

{p 4 4 2}Scalars:

{p 4 16 2}{cmd:e(N)}{space 8}total number of cases{p_end}
{p 4 16 2}{cmd:e(N1)}{space 7}number of cases in first group{p_end}
{p 4 16 2}{cmd:e(N2)}{space 7}number of cases in second group{p_end}
{p 4 16 2}{cmd:e(pred1)}{space 4}mean linear prediction from first group{p_end}
{p 4 16 2}{cmd:e(se_pred1)}{space 1}standard error of prediction from first group{p_end}
{p 4 16 2}{cmd:e(pred2)}{space 4}mean linear prediction from second group{p_end}
{p 4 16 2}{cmd:e(se_pred2)}{space 1}standard error of prediction from second group{p_end}
{p 4 16 2}{cmd:e(diff)}{space 5}difference between mean predictions{p_end}
{p 4 16 2}{cmd:e(se_diff)}{space 2}standard error of difference{p_end}

{p 4 4 2}Macros:

{p 4 16 2}{cmd:e(cmd)}{space 6}containing "{cmd:oaxaca8}"{p_end}

{p 4 4 2}Matrices:

{p 4 12 2}{cmd:e(b)}{space 8}decomposition results{p_end}
{p 4 12 2}{cmd:e(V)}{space 8}variances of decomposition results (covariances set to 0){p_end}

{p 4 4 2}Functions:

{p 4 12 2}{cmd:e(sample)}{space 3}estimation sample{p_end}

{title:Methods and Formulas}

    {it:The three-fold decomposition}

{p 4 4 2}
The following linear models are given:

        {bf:Y}1 = {bf:X}1{bf:b}1 + {bf:e}1
        {bf:Y}2 = {bf:X}2{bf:b}2 + {bf:e}2

{p 4 4 2}
for some outcome variable Y in two groups 1 and 2. As long as
E({bf:e}1)=E({bf:e}2)=0, the mean outcome difference between the two groups can
be decomposed as

{p 8 8 2}
R = {bf:x}1'{bf:b}1 - {bf:x}2'{bf:b}2
= ({bf:x}1-{bf:x}2)'{bf:b}2 + {bf:x}2'({bf:b}1-{bf:b}2) + ({bf:x}1-{bf:x}2)'({bf:b}1-{bf:b}2)
= E + C + CE

{p 4 4 2}
where {bf:x}1 and {bf:x}2 are the vectors of means of the regressors (including
the constants) for the two groups (e.g. see Winsborough and Dickinson 1971,
Jones and Kelley 1984, Daymont and Andrisani 1984). In other words, R is
decomposed into one part that is due to differences in endowments (E), one part
that is due to differences in coefficients (including the intercept) (C), and a
third part that is due to interaction between coefficients and endowments (CE).

    {it:The two-fold decomposition}

{p 4 4 2}
Depending on the model that is assumed to be the "true" model (i.e. the
"absence-of-discrimination" model), the terms of the three-fold decomposition
may be used to determine the "explained" (Q) and "unexplained" (U; e.g.
discrimination) parts of the differential (the question is how to allocate the
interaction term CE).
Oaxaca (1973) proposed assuming either the low group model or the high group
model as the no-discrimination model, which implies Q=E and U=C+CE in the first
case and Q=E+CE and U=C in the second. More generally, the coefficients of the
"true" model may be expressed as

{p 8 8 2}
{bf:b}* = {bf:W}{bf:b}1+({bf:I}-{bf:W}){bf:b}2

{p 4 4 2}
where {bf:I} is an identity matrix and {bf:W} is a matrix of weights.
Analogously, the decomposition may be written as

{p 8 8 2}
R = ({bf:x}1-{bf:x}2)'[{bf:W}{bf:b}1+({bf:I}-{bf:W}){bf:b}2]
+ [{bf:x}1'({bf:I}-{bf:W})+{bf:x}2'{bf:W}]({bf:b}1-{bf:b}2)

{p 4 4 2}In the two cases proposed by Oaxaca (1973), {bf:W} is a null matrix or
equals {bf:I}, respectively ({bf:W}={bf:I} is also suggested by Blinder 1973).
Furthermore, {bf:W} may be w{bf:I}, where w is a scalar reflecting the weight
given to the coefficients for the first group (Reimers 1983 proposed w=.5,
Cotton 1988 proposed using the relative group size). Use the {cmd:weight()}
option to specify w.

{p 4 4 2}Alternatively, Neumark (1988) proposed using the coefficients from a
pooled model for both groups, which implies that

{p 8 8 2}
{bf:W} = diag({bf:b}*-{bf:b}2) diag({bf:b}1-{bf:b}2)^-1

{p 4 4 2}
or

{p 8 8 2}
R = ({bf:x}1-{bf:x}2)'{bf:b}* + [{bf:x}1'({bf:b}1-{bf:b}*)+{bf:x}2'({bf:b}*-{bf:b}2)]

{p 4 4 2}
where {bf:b}* is the vector of the coefficients from the pooled model. However,
other coefficient vectors may also make sense. Use the {cmd:reference()} option
to specify such a reference model.

{p 4 4 2}In the context of OLS regression, the method proposed by Neumark is
equivalent to using the weighting matrix

{p 8 8 2}{bf:W} = ({bf:X}1'{bf:X}1 + {bf:X}2'{bf:X}2)^-1 ({bf:X}1'{bf:X}1)

{p 4 4 2}
where {bf:X}1 and {bf:X}2 are the matrices of observed values for the two
samples (Oaxaca and Ransom 1994). This approach is implemented via the
{cmd:weight(omega)} option.
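
{p 4 4 2}A small numeric sketch of the formulas above, using Stata's matrix
commands. All numbers are made up for illustration (one regressor plus a
constant) and the matrix names are arbitrary; nothing here is output from
{cmd:oaxaca8}.

        {com}. matrix x1 = (12 \ 1)
        . matrix b1 = (0.10 \ 1.0)
        . matrix x2 = (10 \ 1)
        . matrix b2 = (0.08 \ 0.8)
        . matrix R  = x1'*b1 - x2'*b2
        . matrix E  = (x1 - x2)'*b2
        . matrix C  = x2'*(b1 - b2)
        . matrix CE = (x1 - x2)'*(b1 - b2)
        . matrix bstar = 0.5*b1 + 0.5*b2
        . matrix Q  = (x1 - x2)'*bstar
        . matrix U  = (0.5*x1 + 0.5*x2)'*(b1 - b2)
        . matrix list R
        . matrix list E
        . matrix list C
        . matrix list CE
        . matrix list Q
        . matrix list U{txt}

{p 4 4 2}By hand, R = 2.2 - 1.6 = .6, which splits into E = .16, C = .4, and
CE = .04. With {bf:W} = .5{bf:I} (the weight proposed by Reimers 1983), the
two-fold parts are Q = .18 and U = .42, which again sum to R. Setting the weight
to 0 or 1 instead gives Q = E = .16 with U = C + CE = .44, or Q = E + CE = .2
with U = C = .4, respectively.
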

    {it:Standard errors}

{p 4 4 2}The variances/standard errors of the components are computed according
to the method detailed in Jann (2005). For the case of fixed regressors, also
see Oaxaca and Ransom (1998). The variances and covariances of the coefficients
are taken from the {cmd:e(V)} matrices of the models. The variance-covariance
matrices of the means of the regressors in the models are estimated according to
standard formulas (cross-product matrix of deviations divided by N*(N-1)) unless
{cmd:pweight}s or clusters are applied or a specific survey design is set (see
help {bf:{help svyset}}). In the latter cases, the variance-covariance matrices
are estimated using the {bf:{help svymean}} command. Note that standard errors
cannot be computed for the U term if the non-discriminating coefficients are
taken from a reference model specified via the {cmd:reference()} option. Use
{bf:{help bootstrap}} to derive the standard errors in this case.

    {it:Selection models}

{p 4 4 2}
Assume that a selection variable S appears in the models. If the variable is
marked by specifying {cmd:adjust(}S{cmd:)}, the differential will be adjusted
for selection, i.e.

{p 8 8 2}R_s = {bf:x}1'{bf:b}1 - {bf:x}2'{bf:b}2 - (s1bs1 - s2bs2)

{p 4 4 2}
where s1 and s2 are the means of S and bs1 and bs2 are the coefficients of S,
and {cmd:oaxaca8} will decompose R_s instead of R. Note that it is not necessary
to use the {cmd:adjust} option if the models were estimated with
{bf:{help heckman}}. See Dolton and Makepeace (1986) or Neuman and Oaxaca (2004)
for more sophisticated approaches to dealing with selection.

{p 4 4 2}
If a specific regressor (or a selection variable) appears only in one model, the
corresponding coefficient and the mean of the regressor will be set to zero for
the other group.

{title:References}

{p 4 8 2}Blinder, A.S. (1973). Wage Discrimination: Reduced Form and Structural
Estimates. The Journal of Human Resources 8: 436-455.{p_end}
{p 4 8 2}Cotton, J. (1988).
On the Decomposition of Wage Differentials. The Review of Economics and Statistics 70: 236-243.{p_end} {p 4 8 2}Daymont, T.N., Andrisani, P.J. (1984). Job Preferences, College Major, and the Gender Gap in Earnings. The Journal of Human Resources 19: 408-428.{p_end} {p 4 8 2}Dolton, P.J., Makepeace, G.H. (1986). Sample Selection and Male-Female Earnings Differentials in the Graduate Labour Market. Oxford Economic Papers 38: 317-341.{p_end} {p 4 8 2}Jann, B. (2005). Standard Errors for the Blinder–Oaxaca Decomposition: {browse "http://repec.org/dsug2005/oaxaca_se_handout.pdf"}.{p_end} {p 4 8 2}Jones, F.L., Kelley, J. (1984). Decomposing Differences Between Groups. A Cautionary Note on Measuring Discrimination. Sociological Methods and Research 12: 323-343.{p_end} {p 4 8 2}Neuman, S., Oaxaca, R.L. (2004). Wage decompositions with selectivity-corrected wage equations: A methodological note. Journal of Economic Inequality 2: 3-10.{p_end} {p 4 8 2}Neumark, D. (1988). Employers' Discriminatory Behavior and the Estimation of Wage Discrimination. The Journal of Human Resources 23: 279-295.{p_end} {p 4 8 2}Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14: 693-709.{p_end} {p 4 8 2}Oaxaca, R.L., Ransom, M.R. (1994). On discrimination and the decomposition of wage differentials. Journal of Econometrics 61: 5-21.{p_end} {p 4 8 2}Oaxaca, R.L., Ransom, M.R. (1998). Calculation of approximate variances for wage decomposition differentials. Journal of Economic and Social Measurement 24: 55-61.{p_end} {p 4 8 2}Oaxaca, R.L., Ransom, M.R. (1999). Identification in Detailed Wage Decompositions. The Review of Economics and Statistics 81: 154-157.{p_end} {p 4 8 2}Reimers, C.W. (1983). Labor Market Discrimination Against Hispanic and Black Men. The Review of Economics and Statistics 65: 570-579.{p_end} {p 4 8 2}Winsborough, H.H., Dickinson, P. (1971). Components of Negro-White Income Differences. 
Proceedings of the American Statistical Association, Social Statistics Section: 6-8. {title:Author} {p 4 4 2}Ben Jann, ETH Zurich, jannb@ethz.ch {title:Also see} {p 4 13 2} Online: help for {bf:{help regress}}, {bf:{help estimates}}, {bf:{help heckman}}, {bf:{help devcon}} (if installed), {bf:{help smithwelch}} (if installed), {bf:{help jmpierce}} (if installed)