{smcl}
{* 27jan2009}{...}
{hi:help oaxaca9}
{hline}

{title:Title}

{pstd}{hi:oaxaca9} {hline 2} Blinder-Oaxaca decomposition of outcome differentials


{title:Syntax}

{p 8 15 2}
{cmd:oaxaca9} {depvar} [{indepvars}] {ifin} {weight}
{cmd:,} {opt by(groupvar)} [ {help oaxaca9##opt:{it:options}} ]

{synoptset 25 tabbed}{...}
{marker opt}{synopthdr:options}
{synoptline}
{syntab :Main}
{synopt :{opt by(groupvar)}}specifies the groups; {cmd:by()} is required
{p_end}
{synopt :{opt swap}}swap groups
{p_end}
{synopt :{cmdab:d:etail}[{cmd:(}{it:{help oaxaca9##dlist:dlist}}{cmd:)}]}display detailed decomposition
{p_end}
{synopt :{opt a:djust(varlist)}}adjustment for selection variables
{p_end}

{syntab :Decomposition type}
{synopt :{cmdab:three:fold}[{cmd:(}{cmdab:r:everse}{cmd:)}]}three-fold decomposition; the default
{p_end}
{synopt :{opt w:eight(# [# ...])}}two-fold decomposition based on specified weights
{p_end}
{synopt :{cmdab:p:ooled}[{cmd:(}{it:{help oaxaca9##mopts:model_opts}}{cmd:)}]}two-fold decomposition based on pooled model including {it:groupvar}
{p_end}
{synopt :{cmdab:o:mega}[{cmd:(}{it:{help oaxaca9##mopts:model_opts}}{cmd:)}]}two-fold decomposition based on pooled model excluding {it:groupvar}
{p_end}
{synopt :{opt ref:erence(name)}}two-fold decomposition based on stored model
{p_end}
{synopt :{opt split}}split unexplained part of two-fold decomposition
{p_end}

{syntab :X-Values}
{synopt :{cmd:x1(}{it:{help oaxaca9##x1x2:names_and_values}}{cmd:)}}provide custom X-values for Group 1
{p_end}
{synopt :{cmd:x2(}{it:{help oaxaca9##x1x2:names_and_values}}{cmd:)}}provide custom X-values for Group 2
{p_end}
{synopt :{cmdab:cat:egorical(}{it:{help oaxaca9##clist:clist}}{cmd:)}}identify dummy variable sets and apply deviation contrast transform
{p_end}

{syntab :SE/SVY}
{synopt :{cmd:svy}[{cmd:(}{it:{help oaxaca9##svy:svyspec}}{cmd:)}]}survey data estimation
{p_end}
{synopt :{opth vce(vcetype)}}{it:vcetype} may be {opt analytic}, {opt r:obust}, {opt cl:uster}{space 1}{it:clustvar}, {opt boot:strap}, or {opt jack:knife}
{p_end}
{synopt :{opt cl:uster(varname)}}adjust standard errors for intragroup correlation (Stata 9)
{p_end}
{synopt :{cmdab:fix:ed}[{cmd:(}{it:varlist}{cmd:)}]}assume non-stochastic regressors
{p_end}
{synopt : {cmd:suest}[{cmd:(}{it:name}{cmd:)}] | {cmd:nosuest}}do/do not use {helpb suest} to obtain joint variance matrix
{p_end}
{synopt :{opt nose}}suppress computation of standard errors
{p_end}

{syntab :Models}
{synopt :{cmd:model1(}{it:{help oaxaca9##mopts:model_opts}}{cmd:)}}estimation details for the Group 1 model
{p_end}
{synopt :{cmd:model2(}{it:{help oaxaca9##mopts:model_opts}}{cmd:)}}estimation details for the Group 2 model
{p_end}
{synopt :{opt noi:sily}}display model estimation output
{p_end}

{syntab :Reporting}
{synopt :{opt xb}}display table with coefficients and means
{p_end}
{synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)}
{p_end}
{synopt :{opt eform}}report exponentiated results
{p_end}
{synopt :{opt nole:gend}}suppress legend
{p_end}
{synoptline}
{p 4 6 2}
{cmd:bootstrap}, {cmd:by}, {cmd:jackknife}, {cmd:statsby}, and {cmd:xi} are allowed; see {help prefix}.
{p_end}
{p 4 6 2}
Weights are not allowed with the {helpb bootstrap} prefix.
{p_end}
{p 4 6 2}
{cmd:aweight}s are not allowed with the {helpb jackknife} prefix.
{p_end}
{p 4 6 2}
{cmd:vce()}, {cmd:cluster()}, and weights are not allowed with the {cmd:svy} option.
{p_end}
{p 4 6 2}
{cmd:fweight}s, {cmd:aweight}s, {cmd:pweight}s, and {cmd:iweight}s are allowed; see {help weight}.
{p_end}


{title:Description}

{pstd}
{cmd:oaxaca9} computes the so-called Blinder-Oaxaca decomposition, which is often used to analyze wage gaps by sex or race. {it:depvar} is the outcome variable of interest (e.g. log wages) and {it:indepvars} are predictors (e.g. education and work experience). {it:groupvar} identifies the groups to be compared. For methods and formulas see Jann (2008).
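
{pstd}
As a rough orientation only (a sketch in shorthand notation chosen for this help file, not a full statement of the method in Jann (2008)), let X1 and X2 denote the vectors of predictor means, including the constant, for Group 1 and Group 2, and let b1 and b2 denote the corresponding coefficient vectors. Assuming linear models estimated with a constant, the difference in mean outcomes R = mean(Y1) - mean(Y2) equals X1'b1 - X2'b2, and the default three-fold decomposition, expressed from the viewpoint of Group 2, can be written as

{p 8 12 2}R = (X1 - X2)'b2 + X2'(b1 - b2) + (X1 - X2)'(b1 - b2)

{pstd}
where the first term is the part attributed to group differences in the predictors (endowments), the second is the part attributed to differences in the coefficients, and the third is the interaction between the two. Expanding the right-hand side confirms that the three terms add up to X1'b1 - X2'b2.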

{pstd}
{cmd:oaxaca9} typed without arguments replays the last results, optionally applying {cmd:xb}, {cmd:level()}, {cmd:eform}, or {cmd:nolegend}.


{title:Options}

{dlgtab:Main}

{phang}
{opt by(groupvar)} specifies the {it:groupvar} that defines the two groups that will be compared. {cmd:by()} is required.

{phang}
{opt swap} reverses the order of the groups.{p_end}

{marker dlist}
{phang}{cmd:detail}[{cmd:(}{it:dlist}{cmd:)}] requests that the detailed results for the individual predictors be reported. Use {it:dlist} to subsume the results for sets of regressors (results for variables not appearing in {it:dlist} are listed individually). The syntax for {it:dlist} is

{p 12 16 2}{it:name}{cmd::}{it:varlist} [{cmd:,} {it:name}{cmd::}{it:varlist} {it:...}]

{pmore}
The usual shorthand conventions apply to the {it:varlist}s specified in {it:dlist} (see help {it:{help varlist}}; additionally, {cmd:_cons} is allowed). For example, specify {cmd:detail(exp:exp*)} to subsume {cmd:exp} (experience) and {cmd:exp2} (experience squared). {it:name} is any valid Stata name and labels the set.

{phang}
{opt adjust(varlist)} causes the differential to be adjusted by the contribution of the specified variables before performing the decomposition. This is useful, for example, if the specified variables are selection terms. Note that {cmd:adjust()} is not needed for {helpb heckman} models.

{dlgtab:Decomposition type}

{phang}
{cmd:threefold}[{cmd:(}{cmdab:reverse}{cmd:)}] computes the three-fold decomposition. This is the default unless {cmd:weight()}, {cmd:pooled}, {cmd:omega}, or {cmd:reference()} is specified. The decomposition is expressed from the viewpoint of Group 2. Specify {cmdab:threefold(reverse)} to express the decomposition from the viewpoint of Group 1.

{phang}
{opt weight(# [# ...])} computes the two-fold decomposition where {it:#} [{it:# ...}] are the weights given to Group 1 relative to Group 2 in determining the reference coefficients (weights are recycled if there are more coefficients than weights). For example, {cmd:weight(1)} uses the Group 1 coefficients as the reference coefficients, {cmd:weight(0)} uses the Group 2 coefficients.

{phang}
{cmd:pooled}[{cmd:(}{it:{help oaxaca9##mopts:model_opts}}{cmd:)}] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients. {it:groupvar} is included in the pooled model as an additional control variable. Estimation details may be specified in parentheses; see the {helpb oaxaca9##mopts:model1()} option below.

{phang}
{opt omega}[{cmd:(}{it:{help oaxaca9##mopts:model_opts}}{cmd:)}] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients (without including {it:groupvar} as a control variable in the pooled model). Estimation details may be specified in parentheses; see the {helpb oaxaca9##mopts:model1()} option below.

{phang}
{opt reference(name)} computes the two-fold decomposition using the coefficients from a stored model. {it:name} is the name under which the model was stored; see {helpb estimates store}. Do not combine the {cmd:reference()} option with bootstrap or jackknife methods.

{phang}
{opt split} causes the "unexplained" component in the two-fold decomposition to be split into a part related to Group 1 and a part related to Group 2. {opt split} is effective only if specified with {cmd:weight()}, {cmd:pooled}, {cmd:omega}, or {cmd:reference()}.

{pstd}Only one of {cmd:threefold}, {cmd:weight()}, {cmd:pooled}, {cmd:omega}, and {cmd:reference()} is allowed.
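
{pstd}
To illustrate how the two-fold options relate to one another, here is a minimal sketch in shorthand notation chosen for this help file. Let X1 and X2 be the vectors of predictor means (including the constant) and b1 and b2 the coefficient vectors of Group 1 and Group 2, and let br be the reference coefficients. The two-fold decomposition of the mean outcome difference R can then be written as

{p 8 12 2}R = (X1 - X2)'br + [ X1'(b1 - br) + X2'(br - b2) ]

{pstd}
where the first term is the explained part and the bracketed term is the unexplained part; {cmd:split} reports the two bracketed summands separately. With a single weight w in {cmd:weight()}, br = w*b1 + (1 - w)*b2, so that {cmd:weight(1)} gives br = b1 and {cmd:weight(0)} gives br = b2. The commands below show how these options might be used with the example dataset from the Examples section. They assume that {cmd:female} is coded 0/1; the name {cmd:male} for the stored model is a hypothetical label chosen for illustration.

{phang2}{cmd:. use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta}{p_end}
{phang2}{cmd:. oaxaca9 lnwage educ exper tenure, by(female) weight(0.5) split}{p_end}
{phang2}{cmd:. oaxaca9 lnwage educ exper tenure, by(female) omega}{p_end}
{phang2}{cmd:. regress lnwage educ exper tenure if female==0}{p_end}
{phang2}{cmd:. estimates store male}{p_end}
{phang2}{cmd:. oaxaca9 lnwage educ exper tenure, by(female) reference(male)}{p_end}

{pstd}
The second command weights both groups' coefficients equally and splits the unexplained part, the third uses a pooled model without {it:groupvar}, and the last three store a model and then pass it to {cmd:reference()}.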

{dlgtab:X-Values}

{marker x1x2}
{phang}
{opt x1(names_and_values)} and {opt x2(names_and_values)} provide custom values for specific predictors to be used for Group 1 and Group 2 in the decomposition. The default is to use the group means of the predictors. The syntax for {it:names_and_values} is

{p 12 16 2}{it:varname} [{cmd:=}] {it:value} [[{cmd:,}] {it:varname} [{cmd:=}] {it:value} {it:...} ]

{pmore}Example: {cmd:x1(educ 12 exp 30)}
{p_end}

{marker clist}
{phang}
{opt categorical(clist)} identifies sets of dummy variables representing categorical variables and transforms the coefficients so that the results of the decomposition are invariant to the choice of the (omitted) base category (deviation contrast transform). The syntax for {it:clist} is

{p 12 16 2}{it:varlist} [{cmd:,} {it:varlist} {it:...} ]

{pmore}where each {it:varlist} must contain indicator (0/1) variables for all categories including the base category (that is, a base category indicator variable must exist in the data). To generate a suitable set of indicator variables use, for example,

{p 12 16 2}{cmd:tabulate} {it:catvar}{cmd:, generate(}{it:stubname}{cmd:)} [ {cmd:nofreq} ]

{pmore}where {it:catvar} is the categorical variable and the indicator variables will be named {it:stubname}{cmd:1}, {it:stubname}{cmd:2}, ... ({cmd:nofreq} may be used to suppress the frequency table; see help {helpb tabulate_oneway:tabulate}).

{pmore}The variables of a set specified in {cmd:categorical()} are added to the {it:indepvars} (unless at least one of the variables of the set already appears in {it:indepvars}), omitting the first variable of the set to prevent collinearity for model estimation (i.e. the first variable is used to represent the base category). Change the order of the variables or explicitly specify the desired terms in {it:indepvars} to change the base category.

{pmore}The deviation contrast transform can also be applied to interactions between a categorical and a continuous variable.
Specify the continuous variable in parentheses at the end of the list in this case, i.e.

{p 12 16 2}{it:varlist} {cmd:(}{it:varname}{cmd:)} [{cmd:,} {it:...} ]

{pmore}and also include a list for the main effects. Example:

{p 12 16 2}{cmd:categorical(d1 d2 d3, xd1 xd2 xd3 (x))}

{pmore}where {cmd:x} is the continuous variable, and {cmd:d1} etc. and {cmd:xd1} etc. are the main effects and interaction effects.

{dlgtab:SE/SVY}

{marker svy}
{phang}
{cmd:svy}[{cmd:(}[{it:vcetype}] [{cmd:,} {it:svy_options}]{cmd:)}] executes {cmd:oaxaca9} while accounting for the survey settings identified by {helpb svyset} (this is essentially equivalent to applying the {helpb svy} prefix command, although the {helpb svy} prefix is not allowed with {cmd:oaxaca9} for technical reasons). {it:vcetype} and {it:svy_options} are as described in help {helpb svy}.

{phang}
{opt vce(vcetype)} specifies the type of standard errors reported. {it:vcetype} may be {opt analytic} (the default), {opt robust}, {opt cluster}{space 1}{it:clustvar}, {opt bootstrap}, or {opt jackknife}; see {help vce_option:{bf:[R]}{space 1}{it:vce_option}}.

{phang}
{opt cluster(varname)} adjusts standard errors for intragroup correlation; this is Stata 9 syntax for {cmd:vce(cluster}{space 1}{it:clustvar}{cmd:)}.

{phang}
{cmd:fixed}[{cmd:(}{it:varlist}{cmd:)}] identifies fixed regressors (all regressors if specified without an argument; experimental factors are an example of fixed regressors). The default is to treat regressors as stochastic. Stochastic regressors inflate the standard errors of the decomposition components.

{phang}
{cmd:suest}[{cmd:(}{it:name}{cmd:)}] enforces using {helpb suest} to obtain the covariances between the models/groups. {cmd:suest} is implied by {cmd:pooled}, {cmd:omega}, {cmd:reference()}, {cmd:svy}, {cmd:vce(cluster)}, and {cmd:cluster()}. Specify {cmd:suest(}{it:name}{cmd:)} to save {helpb suest}'s estimation results under name {it:name} using {helpb estimates store}.
{cmd:nosuest} prevents applying {helpb suest}, which may cause biased standard errors.

{phang}
{opt nose} suppresses the computation of standard errors.

{dlgtab:Model estimation}

{marker mopts}
{phang}
{cmd:model1(}{it:model_opts}{cmd:)} and {cmd:model2(}{it:model_opts}{cmd:)} specify the estimation details for the two group-specific models. The syntax for {it:model_opts} is

{p 12 16 2}[{it:{help estimation_commands:estcom}}] [{cmd:,} {opt sto:re(name)} {opt add:rhs(spec)} {it:estcom_options} ]

{pmore}where {it:estcom} is the estimation command to be used and {it:estcom_options} are options allowed by {it:estcom}. The default estimation command is {helpb regress}. {opt store(name)} saves the model's estimation results under name {it:name} using {helpb estimates store}. {opt addrhs(spec)} adds {it:spec} to the "right-hand side" of the model. For example, use {cmd:addrhs()} to add extra variables to the model. Examples:

            {cmd:model1(heckman, select(}{it:varlist_s}{cmd:) twostep)}

            {cmd:model1(ivregress 2sls, addrhs((}{it:varlist2}{cmd:=}{it:varlist_iv}{cmd:)))}

{pmore}Technical notes:

{phang2}
{space 2}o{space 1}{cmd:oaxaca9} uses the first equation for the decomposition if a model contains multiple equations.

{phang2}
{space 2}o{space 1}Coefficients that occur in only one of the models are assumed to be zero for the other group. It is important, however, that the associated variables contain non-missing values for all observations in both groups.

{phang}
{opt noi:sily} displays the models' estimation output.

{dlgtab:Reporting}

{phang}
{opt xb} displays a table containing the regression coefficients and predictor values on which the decomposition is based.

{phang}
{opt level(#)} specifies the confidence level, as a percentage, for confidence intervals. The default is {cmd:level(95)} or as set by {helpb set level}.

{phang}
{opt eform} specifies that the results be displayed in exponentiated form.

{phang}
{opt nolegend} suppresses the legend for the regressor sets defined by the {cmd:detail()} option.


{title:Examples}

        {com}. {stata "use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta"}

        {com}. {stata oaxaca9 lnwage educ exper tenure, by(female)}{txt}

        {com}. {stata oaxaca9 lnwage educ exper tenure, by(female) weight(1)}{txt}

        {com}. {stata oaxaca9 lnwage educ exper tenure, by(female) pooled}{txt}

        {com}. {stata svyset [pw=wt]}{txt}

        {com}. {stata oaxaca9 lnwage educ exper tenure, by(female) svy}{txt}

        {com}. {stata oaxaca9 lnwage educ exper tenure, by(female) vce(bootstrap)}{txt}


{title:Saved Results}

{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Scalars}{p_end}
{synopt:{cmd:e(N)}}number of observations
{p_end}
{synopt:{cmd:e(N_1)}}number of observations in Group 1
{p_end}
{synopt:{cmd:e(N_2)}}number of observations in Group 2
{p_end}
{synopt:{cmd:e(N_clust)}}number of clusters
{p_end}

{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Macros}{p_end}
{synopt:{cmd:e(cmd)}}{cmd:oaxaca9}
{p_end}
{synopt:{cmd:e(depvar)}}name of dependent variable
{p_end}
{synopt:{cmd:e(by)}}name of group variable
{p_end}
{synopt:{cmd:e(group_1)}}value of group variable for Group 1
{p_end}
{synopt:{cmd:e(group_2)}}value of group variable for Group 2
{p_end}
{synopt:{cmd:e(title)}}{cmd:Blinder-Oaxaca decomposition}
{p_end}
{synopt:{cmd:e(model)}}type of decomposition
{p_end}
{synopt:{cmd:e(weights)}}weights specified in the {cmd:weight()} option
{p_end}
{synopt:{cmd:e(refcoefs)}}equation name used in {cmd:e(b0)} for the reference coefficients
{p_end}
{synopt:{cmd:e(detail)}}{cmd:detail}, if detailed results were requested
{p_end}
{synopt:{cmd:e(legend)}}regressor sets defined by the {cmd:detail()} option
{p_end}
{synopt:{cmd:e(adjust)}}names of adjustment variables
{p_end}
{synopt:{cmd:e(fixed)}}names of fixed variables
{p_end}
{synopt:{cmd:e(suest)}}{cmd:suest}, if {cmd:suest} was used
{p_end}
{synopt:{cmd:e(wtype)}}weight type
{p_end}
{synopt:{cmd:e(wexp)}}weight expression
{p_end}
{synopt:{cmd:e(clustvar)}}name of cluster variable
{p_end}
{synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}
{p_end}
{synopt:{cmd:e(vcetype)}}title used to label Std. Err.
{p_end}
{synopt:{cmd:e(properties)}}{cmd:b V}
{p_end}

{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Matrices}{p_end}
{synopt:{cmd:e(b)}}decomposition results
{p_end}
{synopt:{cmd:e(V)}}variance-covariance matrix of decomposition results
{p_end}
{synopt:{cmd:e(b0)}}vector containing coefficients and X-values
{p_end}
{synopt:{cmd:e(V0)}}variance-covariance matrix of {cmd:e(b0)}
{p_end}

{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Functions}{p_end}
{synopt:{cmd:e(sample)}}marks estimation sample{p_end}
{p2colreset}{...}


{title:References}

{phang}
Jann, Ben (2008). The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal 8(4): 453-479.

{pstd}
Working paper version available from: {browse "http://ideas.repec.org/p/ets/wpaper/5.html"}


{title:Author}

{p 4 4 2}Ben Jann, ETH Zurich, jannb@ethz.ch


{title:Also see}

{p 4 13 2}
Online: help for {helpb regress}, {helpb heckman}, {helpb suest}, {helpb svyset}