{smcl} {* 24apr2023}{...} {hi:help oaxaca}{...} {right:{browse "http://github.com/benjann/oaxaca/"}} {hline} {title:Title} {pstd}{hi:oaxaca} {hline 2} Blinder-Oaxaca decomposition of outcome differentials {title:Syntax} {p 8 15 2} {cmd:oaxaca} {depvar} [{it:indepvars}] {ifin} {weight} {cmd:,} {opt by(groupvar)} [ {help oaxaca##opt:{it:options}} ] where {it:indepvars} is {it:term} [{it:term} {it:...}] with {it:term} as {varlist} or {cmd:(}[{help oaxaca##subsume:{it:name}}{cmd::}] {varlist}{cmd:)} and {varlist} may contain {cmdab:n:ormalize(}{help oaxaca##norm:{it:spec}}{cmd:)} {synoptset 25 tabbed}{...} {marker opt}{synopthdr:options} {synoptline} {syntab :Main} {synopt :{opt by(groupvar)}}specifies the groups; {cmd:by()} is required {p_end} {synopt :{opt swap}}swap groups {p_end} {synopt :{opt lin:ear}}linear decomposition; the default {p_end} {synopt :{opt logit}}logit decomposition {p_end} {synopt :{opt probit}}probit decomposition {p_end} {synopt :{cmdab:nod:etail}}suppress detailed decomposition {p_end} {synopt :{opt a:djust(varlist)}}adjustment for selection variables {p_end} {syntab :Decomposition type} {synopt :{cmdab:three:fold}[{cmd:(}{cmdab:r:everse}{cmd:)}]}three-fold decomposition; the default {p_end} {synopt :{opt w:eight(# [# ...])}}two-fold decomposition using specified weights {p_end} {synopt :{cmdab:p:ooled}[{cmd:(}{it:{help oaxaca##mopts:model_opts}}{cmd:)}]}two-fold decomposition using pooled model including {it:groupvar} {p_end} {synopt :{cmdab:o:mega}[{cmd:(}{it:{help oaxaca##mopts:model_opts}}{cmd:)}]}two-fold decomposition using pooled model excluding {it:groupvar} {p_end} {synopt :{opt ref:erence(name)}}two-fold decomposition using stored model {p_end} {synopt :{opt split}}split unexplained part of two-fold decomposition {p_end} {syntab :SE/SVY} {synopt :{cmd:svy}[{cmd:(}{it:{help oaxaca##svy:svyspec}}{cmd:)}]}survey data estimation {p_end} {synopt :{opth vce(vcetype)}}{it:vcetype} may be {opt analytic}, {opt r:obust}, {opt 
cl:uster}{space 1}{it:clustvar}, {opt boot:strap}, or {opt jack:knife} {p_end} {synopt :{opt cl:uster(varname)}}adjust standard errors for intragroup correlation (Stata 9) {p_end} {synopt :{cmdab:fix:ed}[{cmd:(}{it:varlist}{cmd:)}]}assume non-stochastic regressors {p_end} {synopt : {cmd:suest}[{cmd:(}{it:name}{cmd:)}] | {cmd:nosuest}}do/do not use {helpb suest} to obtain joint variance matrix {p_end} {synopt :{opt nose}}suppress computation of standard errors {p_end} {syntab :Models} {synopt :{cmd:model1(}{it:{help oaxaca##mopts:model_opts}}{cmd:)}}estimation details for the Group 1 model {p_end} {synopt :{cmd:model2(}{it:{help oaxaca##mopts:model_opts}}{cmd:)}}estimation details for the Group 2 model {p_end} {synopt :{opt noi:sily}}display model estimation output {p_end} {synopt :{opt relax}}do not stop on dropped coefficients/zero variances {p_end} {synopt :{it:estopts}}options passed through to all models {p_end} {syntab :X-Values (linear decomposition only)} {synopt :{cmd:x1(}{it:{help oaxaca##x1x2:names_and_values}}{cmd:)}}provide custom X-values for Group 1 {p_end} {synopt :{cmd:x2(}{it:{help oaxaca##x1x2:names_and_values}}{cmd:)}}provide custom X-values for Group 2 {p_end} {syntab :Reporting} {synopt :{opt xb}}display table with coefficients and means {p_end} {synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)} {p_end} {synopt :{opt eform}}report exponentiated results {p_end} {synopt :{opt noh:eader}}suppress table header {p_end} {synopt :{opt nodef:initions}}suppress information on definition of decomposition terms {p_end} {synopt :{opt notab:le}}suppress coefficients table {p_end} {synopt :{opt nole:gend}}suppress legend on variable sets in detailed decomposition {p_end} {synoptline} {p 4 6 2} {cmd:bootstrap}, {cmd:by}, {cmd:jackknife}, {cmd:statsby}, and {cmd:xi} are allowed; see {help prefix}. {p_end} {p 4 6 2} Weights are not allowed with the {helpb bootstrap} prefix. 
{p_end} {p 4 6 2} {cmd:aweight}s are not allowed with the {helpb jackknife} prefix. {p_end} {p 4 6 2} {cmd:vce()}, {cmd:cluster()}, and weights are not allowed with the {cmd:svy} option. {p_end} {p 4 6 2} {cmd:fweight}s, {cmd:aweight}s, {cmd:pweight}s, and {cmd:iweight}s are allowed; see {help weight}; {cmd:aweight}s are not allowed with {cmd:logit} or {cmd:probit} {p_end} {title:Description} {pstd} {cmd:oaxaca} computes the so-called Blinder-Oaxaca decomposition, which is often used to analyze wage gaps by sex or race. {it:depvar} is the outcome variable of interest (e.g. log wages) and {it:indepvars} are predictors (e.g. education, work experience, etc.). {it:groupvar} identifies the groups to be compared. The standard errors of the decomposition components are computed using the delta method and take into account the variability induced by stochastic regressors. For methods and formulas see Jann (2008). {pstd} {cmd:oaxaca} also supports the non-linear decomposition for binary dependent variables proposed by Yun (2004). See the {cmd:logit} and {cmd:probit} options. An alternative non-linear decomposition for binary dependent variables, suggested by Fairlie (2005), is available as {helpb fairlie} from the SSC Archive (see {net "describe fairlie, from(http://fmwww.bc.edu/repec/bocode/f/)":ssc describe fairlie}). {pstd} {cmd:oaxaca} typed without arguments replays the last results, optionally applying {cmd:xb}, {cmd:level()}, {cmd:eform}, or {cmd:nolegend}. {marker subsume} {title:Subsume results for sets of variables} {pstd}Decomposition results can be aggregated for subsets of variables using syntax {it:...} {cmd:(}[{it:name}{cmd::}] {varlist}{cmd:)} {it:...} {pstd} where {it:name} provides a label for the subset (the name of the first variable in the subset is used as label if {it:name} is omitted). For example, you could type {com}. oaxaca lnwage educ (expten: exper tenure), by(female){txt} {pstd} to subsume the contributions of {cmd:exper} and {cmd:tenure}. 
In addition to variable names, {cmd:_cons} and {cmd:_offset} can be specified as part of a subset. {marker norm} {title:Normalization of categorical variables} {pstd} For categorical regressors, the detailed decomposition results depend on the choice of the (omitted) base category. A solution is to compute the decomposition based on "normalized" effects, i.e. effects that are expressed as deviation contrasts from the grand mean (Yun 2005). To "normalize" the effects for a set of indicator variables representing a categorical variable, include the indicator variables in the list of regressors using syntax {it:...} {opt n:ormalize(spec)} {it:...} {pstd}where {it:spec} is usually just the list of indicator variables. Note that an indicator variable has to be supplied for every category (including the base category). For example, you could type {com}. tabulate isco, generate(isco) nofreq . oaxaca lnwage educ exper normalize(isco1-isco9), by(female){txt} {pstd}The {cmd:tabulate, generate()} command is a convenient way to generate a set of indicator variables from a categorical variable (such as the 9 major group ISCO-88 job classification). The base category to be omitted from model estimation can be designated using the {cmd:b.} operator, but this should not affect the decomposition results. For example, you could type {com:... normalize(married b.single divorced) ...} {pstd}The first variable is taken if no base category is marked. {pstd}Note that {cmd:normalize()} is allowed within subsumed variable sets. For example, you could type {com:... (family: kids6 normalize(married b.single divorced)) ...} {pstd}Normalization can also be applied to interactions between a categorical variable and a continuous variable. In this case, type {cmd:#} followed by the name of the continuous variable at the end in {cmd:normalize()}. 
Because usually you would also want to normalize the main effects you should supply two {cmd:normalize()} statements, one for the main effects and one for the interactions. Example: Suppose {cmd:d1}, {cmd:d2}, and {cmd:d3} are indicator variables representing a categorical variable and {cmd:d1x}, {cmd:d2x}, and {cmd:d3x} are interactions of these indicators with a continuous variable {cmd:x}. You could then type {com:... x normalize(d1 d2 d3) normalize(d1x d2x d3x # x) ...} {pstd} To aggregate all decomposition terms related to {cmd:x} and {cmd:d}, you could type {com:... (xd: x normalize(d1 d2 d3) normalize(d1x d2x d3x # x)) ...} {title:Options} {dlgtab:Main} {phang} {opt by(groupvar)} specifies the {it:groupvar} that defines the two groups that are to be compared. {cmd:by()} is required. {phang} {opt swap} reverses the order of the groups.{p_end} {phang} {opt linear} causes the standard linear decomposition to be computed. This is the default. The estimation command for the group models defaults to {helpb regress}. {phang} {opt logit} causes the non-linear decomposition for a binary dependent variable to be computed using the weighting method described by Yun (2004). The estimation command for the group models defaults to {helpb logit}. {phang} {opt probit} causes the non-linear decomposition for a binary dependent variable to be computed using the weighting method described by Yun (2004). The estimation command for the group models defaults to {helpb probit}. {pstd} Only one of {opt linear}, {opt logit}, or {opt probit} is allowed. {phang} {opt nodetail} suppresses the detailed results and only computes the overall decomposition. {phang} {opt adjust(varlist)} causes the group differential to be adjusted by the contribution of the specified variables before computing the decomposition. This is useful, for example, if the specified variables are selection terms. Note that {cmd:adjust()} is not needed if {helpb heckman} is used to estimate the models. 
{cmd:_offset} is allowed in {cmd:adjust()}. {dlgtab:Decomposition type} {phang} {cmd:threefold}[{cmd:(}{cmdab:reverse}{cmd:)}] computes the three-fold decomposition. This is the default. The decomposition is expressed from the viewpoint of Group 2. Specify {cmdab:threefold(reverse)} to express the decomposition from the viewpoint of Group 1. {phang} {opt weight(# [# ...])} computes the two-fold decomposition where {it:#} [{it:# ...}] are the weights given to Group 1 relative to Group 2 in determining the reference coefficients (weights are recycled if there are more coefficients than weights). For example, {cmd:weight(1)} uses the Group 1 coefficients as the reference coefficients, {cmd:weight(0)} uses the Group 2 coefficients. {phang} {cmd:pooled}[{cmd:(}{it:{help oaxaca##mopts:model_opts}}{cmd:)}] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients. {it:groupvar} is included in the pooled model as an additional control variable. Estimation details may be specified in parentheses; see the {helpb oaxaca##mopts:model1()} option below. {phang} {opt omega}[{cmd:(}{it:{help oaxaca##mopts:model_opts}}{cmd:)}] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients (without including {it:groupvar} as a control variable). Estimation details may be specified in parentheses; see the {helpb oaxaca##mopts:model1()} option below. {phang} {opt reference(name)} computes the two-fold decomposition using the coefficients from a stored model. {it:name} is the name under which the model was stored; see {helpb estimates store}. It is suggested not to combine {cmd:reference()} with {cmd:vce(bootstrap)} or {cmd:vce(jackknife)}. {phang} {opt split} causes the "unexplained" component in the two-fold decomposition to be split into a part related to Group 1 and a part related to Group 2. 
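{pstd}As a minimal sketch of how the decomposition-type options above might be requested, assuming the example data set and the variables {cmd:lnwage}, {cmd:educ}, {cmd:exper}, {cmd:tenure}, and {cmd:female} used in the Examples section; {cmd:weight(0.5)} is an illustrative choice that gives the coefficients of both groups equal weight, and combining {cmd:pooled} with {cmd:split} is shown only as one possible two-fold variant:

{com}. use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
. oaxaca lnwage educ exper tenure, by(female) threefold(reverse)
. oaxaca lnwage educ exper tenure, by(female) weight(0.5)
. oaxaca lnwage educ exper tenure, by(female) pooled split{txt}
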
{pstd}Only one of {cmd:threefold}, {cmd:weight()}, {cmd:pooled}, {cmd:omega}, and {cmd:reference()} is allowed. {dlgtab:X-Values} {marker x1x2} {phang} {opt x1(names_and_values)} and {opt x2(names_and_values)} provide custom values for specific predictors to be used for Group 1 and Group 2 in the decomposition (only allowed with linear decomposition). The default is to use the group means of the predictors. The syntax for {it:names_and_values} is {p 12 16 2}{it:varname} [{cmd:=}] {it:value} [[{cmd:,}] {it:varname} [{cmd:=}] {it:value} {it:...} ] {pmore}Example: {cmd:x1(educ 12 exp 30)} {p_end} {dlgtab:SE/SVY} {marker svy} {phang} {cmd:svy}[{cmd:(}[{it:vcetype}] [{cmd:,} {it:svy_options}]{cmd:)}] executes {cmd:oaxaca} while accounting for the survey settings identified by {helpb svyset} (this is essentially equivalent to applying the {helpb svy} prefix command, although the {helpb svy} prefix is not allowed with {cmd:oaxaca} due to some technical issues). {it:vcetype} and {it:svy_options} are as described in help {helpb svy}. {phang} {opt vce(vcetype)} specifies the type of standard errors reported. {it:vcetype} may be {opt analytic} (the default), {opt robust}, {opt cluster}{space 1}{it:clustvar}, {opt bootstrap}, or {opt jackknife}; see {help vce_option:{bf:[R]}{space 1}{it:vce_option}}. {phang} {opt cluster(varname)} adjusts standard errors for intragroup correlation; this is Stata 9 syntax for {cmd:vce(cluster}{space 1}{it:clustvar}{cmd:)}. {phang} {cmd:fixed}[{cmd:(}{it:varlist}{cmd:)}] identifies fixed regressors (all if specified without argument; experimental factors are an example of fixed regressors). The default is to treat regressors as stochastic. Stochastic regressors inflate the standard errors of the decomposition components. {phang} {cmd:suest}[{cmd:(}{it:name}{cmd:)}] enforces using {helpb suest} to obtain the covariances between the models/groups. 
{cmd:suest} is implied by {cmd:pooled}, {cmd:omega}, {cmd:reference()}, {cmd:svy}, {cmd:vce(cluster)}, and {cmd:cluster()}. Specify {cmd:suest(}{it:name}{cmd:)} to save {helpb suest}'s estimation results under {it:name} using {helpb estimates store}. {cmd:nosuest} prevents applying {helpb suest} (this may cause biased standard errors). {phang} {opt nose} suppresses the computation of standard errors. {dlgtab:Model estimation} {marker mopts} {phang} {cmd:model1(}{it:model_opts}{cmd:)} and {cmd:model2(}{it:model_opts}{cmd:)} specify the estimation details for the two group models. The syntax for {it:model_opts} is {p 12 16 2}[{it:{help estimation_commands:estcom}}] [{cmd:,} {opt sto:re(name)} {opt add:rhs(spec)} {it:estcom_options} ] {pmore}where {it:estcom} is the estimation command to be used and {it:estcom_options} are options allowed by {it:estcom}. {opt store(name)} saves the model's estimation results under {it:name} using {helpb estimates store}. {opt addrhs(spec)} adds {it:spec} to the "right-hand side" of the model. For example, use {cmd:addrhs()} to add extra variables to the model. Examples: {cmd:model1(heckman, select(}{it:varlist_s}{cmd:) twostep)} {cmd:model1(ivregress 2sls, addrhs((}{it:varlist2}{cmd:=}{it:varlist_iv}{cmd:)))} {pmore}Note that {cmd:oaxaca} uses the first equation if a model contains multiple equations. Furthermore, coefficients that only occur in one of the models are assumed zero in the other model. It is required, however, that the associated variables contain non-missing values for all observations in both groups. {phang} {opt noisily} displays the models' estimation output. (Note that, depending on context, these models will be estimated in a way such that the displayed standard errors are not valid.) {phang} {opt relax} causes {cmd:oaxaca} to continue its computations even if coefficients are dropped from the models (e.g. due to collinearity) or if some coefficients have zero variances. 
The default is to return an error in such a situation. {phang} {it:estopts} are common options to be passed through to the models. {dlgtab:Reporting} {phang} {opt xb} displays a table containing the regression coefficients and predictor values on which the decomposition is based. {phang} {opt level(#)} specifies the confidence level, as a percentage, for confidence intervals. The default is {cmd:level(95)} or as set by {helpb set level}. {phang} {opt eform} specifies that the results be displayed in exponentiated form. {phang} {opt noheader} suppresses the display of the table header. {phang} {opt nodefinitions} suppresses the display of the definitions of the decomposition terms in the table header. {phang} {opt notable} suppresses the display of the output table containing the decomposition results. {phang} {opt nolegend} suppresses the display of the legend about the sets of independent variables in the detailed decomposition. {title:Examples} {com}. {stata "use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta"} . {stata oaxaca lnwage educ exper tenure, by(female)} . {stata oaxaca lnwage educ exper tenure, by(female) weight(1)} . {stata oaxaca lnwage educ exper tenure, by(female) pooled} . {stata svyset [pw=wt]} . {stata oaxaca lnwage educ exper tenure, by(female) pooled svy} . {stata oaxaca lnwage educ exper tenure, by(female) pooled vce(bootstrap)} . {stata tabulate isco, nofreq generate(isco)}{txt} . {stata "oaxaca lnwage educ exper tenure normalize(isco?), by(female) pooled"} . {stata "use http://fmwww.bc.edu/RePEc/bocode/h/homecomp.dta, clear"} . 
{stata "oaxaca homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black) logit pooled":oaxaca homecomp female age (educ:hsgrad somecol college)} {stata "oaxaca homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black) logit pooled":(marstat:married prevmar) if white==1|black==1, by(black)} {stata "oaxaca homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black) logit pooled":logit pooled} {txt} {title:Saved Results} {synoptset 15 tabbed}{...} {p2col 5 15 19 2: Scalars}{p_end} {synopt:{cmd:e(N)}}number of observations {p_end} {synopt:{cmd:e(N_1)}}number of observations in Group 1 {p_end} {synopt:{cmd:e(N_2)}}number of observations in Group 2 {p_end} {synopt:{cmd:e(N_clust)}}number of clusters {p_end} {synopt:{cmd:e(k_eq)}}number of equations in {cmd:e(b)} {p_end} {synoptset 15 tabbed}{...} {p2col 5 15 19 2: Macros}{p_end} {synopt:{cmd:e(cmd)}}{cmd:oaxaca} {p_end} {synopt:{cmd:e(cmdline)}}command as typed {p_end} {synopt:{cmd:e(title)}}{cmd:Blinder-Oaxaca decomposition} {p_end} {synopt:{cmd:e(by)}}name of group variable {p_end} {synopt:{cmd:e(group_1)}}value of group variable for Group 1 {p_end} {synopt:{cmd:e(group_2)}}value of group variable for Group 2 {p_end} {synopt:{cmd:e(depvar)}}name of dependent variable {p_end} {synopt:{cmd:e(model)}}{cmd:linear}, {cmd:logit}, or {cmd:probit} {p_end} {synopt:{cmd:e(threefold)}}{cmd:threefold}, {cmd:threefold(reverse)}, or empty {p_end} {synopt:{cmd:e(weights)}}weights specified by {cmd:weight()} or empty {p_end} {synopt:{cmd:e(split)}}{cmd:split} or empty {p_end} {synopt:{cmd:e(refcoefs)}}{cmd:pooled}, {cmd:omega}, name of reference model, or empty {p_end} {synopt:{cmd:e(legend)}}definitions of regressor sets {p_end} {synopt:{cmd:e(normalized)}}normalized indicator sets {p_end} {synopt:{cmd:e(adjust)}}names of adjustment variables {p_end} {synopt:{cmd:e(fixed)}}{cmd:fixed}, 
{cmd:fixed(}{it:varlist}{cmd:)}, or empty {p_end} {synopt:{cmd:e(suest)}}{cmd:suest} or empty {p_end} {synopt:{cmd:e(wtype)}}weight type {p_end} {synopt:{cmd:e(wexp)}}weight expression {p_end} {synopt:{cmd:e(clustvar)}}name of cluster variable {p_end} {synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()} {p_end} {synopt:{cmd:e(vcetype)}}title used to label Std. Err. {p_end} {synopt:{cmd:e(properties)}}{cmd:b V} {p_end} {synoptset 15 tabbed}{...} {p2col 5 15 19 2: Matrices}{p_end} {synopt:{cmd:e(b)}}decomposition results {p_end} {synopt:{cmd:e(V)}}variance-covariance matrix of decomposition results {p_end} {synopt:{cmd:e(b0)}}vector containing coefficients and X-values {p_end} {synopt:{cmd:e(V0)}}variance-covariance matrix of {cmd:e(b0)} {p_end} {synoptset 15 tabbed}{...} {p2col 5 15 19 2: Functions}{p_end} {synopt:{cmd:e(sample)}}marks estimation sample{p_end} {p2colreset}{...} {title:References} {phang} Fairlie, Robert W. (2005). An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement 30: 305-316. DOI: {browse "https://doi.org/10.3233/JEM-2005-0259":10.3233/JEM-2005-0259} [Working paper: {browse "https://www.iza.org/publications/dp/1917/"}] {phang} Jann, Ben (2008). The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal 8(4): 453-479. DOI: {browse "https://doi.org/10.1177/1536867X0800800401":10.1177/1536867X0800800401} [Working paper: {browse "http://ideas.repec.org/p/ets/wpaper/5.html"}] {phang} Yun, Myeong-Su (2004). Decomposing differences in the first moment. Economics Letters 82: 275-280. DOI: {browse "https://doi.org/10.1016/j.econlet.2003.09.008":10.1016/j.econlet.2003.09.008} [Working paper: {browse "https://www.iza.org/publications/dp/877/"}] {phang} Yun, Myeong-Su (2005). A Simple Solution to the Identification Problem in Detailed Wage Decompositions. Economic Inquiry 43: 766-772. 
DOI: {browse "https://doi.org/10.1093/ei/cbi053":10.1093/ei/cbi053} [Working paper: {browse "https://www.iza.org/publications/dp/836/"}] {title:Author} {p 4 4 2}Ben Jann, University of Bern, ben.jann@unibe.ch {title:Also see} {p 4 13 2} Online: help for {helpb regress}, {helpb logit}, {helpb probit}, {helpb heckman}, {helpb suest}, {helpb svyset}; {helpb fairlie} (if installed)