help oaxaca
-------------------------------------------------------------------------------

Title

oaxaca -- Blinder-Oaxaca decomposition of outcome differentials

Syntax

oaxaca depvar [indepvars] [if] [in] [weight], by(groupvar) [options]

where

indepvars is term [term ...], with

term being varlist or ([name:] varlist), and

varlist possibly containing normalize(spec)

options                     Description
-------------------------------------------------------------------------
Main
  by(groupvar)              specifies the groups; by() is required
  swap                      swap groups
  linear                    linear decomposition; the default
  logit                     logit decomposition
  probit                    probit decomposition
  nodetail                  suppress detailed decomposition
  adjust(varlist)           adjustment for selection variables

Decomposition type
  threefold[(reverse)]      three-fold decomposition; the default
  weight(# [# ...])         two-fold decomposition using specified weights
  pooled[(model_opts)]      two-fold decomposition using pooled model including groupvar
  omega[(model_opts)]       two-fold decomposition using pooled model excluding groupvar
  reference(name)           two-fold decomposition using stored model
  split                     split unexplained part of two-fold decomposition

SE/SVY
  svy[(svyspec)]            survey data estimation
  vce(vcetype)              vcetype may be analytic, robust, cluster clustvar,
                              bootstrap, or jackknife
  cluster(varname)          adjust standard errors for intragroup correlation (Stata 9)
  fixed[(varlist)]          assume non-stochastic regressors
  suest[(name)] | nosuest   do/do not use suest to obtain joint variance matrix
  nose                      suppress computation of standard errors

Models
  model1(model_opts)        estimation details for the Group 1 model
  model2(model_opts)        estimation details for the Group 2 model
  noisily                   display model estimation output
  relax                     do not stop on dropped coefficients/zero variances
  estopts                   options passed through to all models

X-Values (linear decomposition only)
  x1(names_and_values)      provide custom X-values for Group 1
  x2(names_and_values)      provide custom X-values for Group 2

Reporting
  xb                        display table with coefficients and means
  level(#)                  set confidence level; default is level(95)
  eform                     report exponentiated results
  nolegend                  suppress legend
-------------------------------------------------------------------------
bootstrap, by, jackknife, statsby, and xi are allowed; see prefix. Weights are not allowed with the bootstrap prefix. aweights are not allowed with the jackknife prefix. vce(), cluster(), and weights are not allowed with the svy option.
fweights, aweights, pweights, and iweights are allowed; see weight; aweights are not allowed with logit or probit.

Description

oaxaca computes the so-called Blinder-Oaxaca decomposition, which is often used to analyze wage gaps by sex or race. depvar is the outcome variable of interest (e.g. log wages) and indepvars are predictors (e.g. education, work experience, etc.). groupvar identifies the groups to be compared. The standard errors of the decomposition components are computed using the delta method and take into account the variability induced by stochastic regressors. For methods and formulas see Jann (2008).
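As a reminder of what is being decomposed, the mean outcome gap can be written as follows, assuming linear group models Y_g = X_g'b_g + e_g with E(e_g) = 0 for g = 1, 2, where X_g includes a constant (notation adapted from Jann 2008, with groups labeled 1 and 2):

```latex
R = \mathrm{E}(Y_1) - \mathrm{E}(Y_2) = \mathrm{E}(X_1)'\beta_1 - \mathrm{E}(X_2)'\beta_2
```

In practice, the group means of the predictors and the estimated group coefficients are plugged in for E(X_g) and beta_g.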

oaxaca also supports the non-linear decomposition for binary dependent variables proposed by Yun (2004). See the logit and probit options. An alternative non-linear decomposition for binary dependent variables, suggested by Fairlie (2005), is available as fairlie from the SSC Archive (see ssc describe fairlie).

oaxaca typed without arguments replays the last results, optionally applying xb, level(), eform, or nolegend.
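A minimal do-file sketch of estimating once and then replaying with different reporting options. It assumes that the oaxaca package is installed and that the example dataset from the Examples section can be downloaded.

```stata
* Load the example data used in the Examples section
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Estimate the default three-fold decomposition once
oaxaca lnwage educ exper tenure, by(female)

* Replay the stored results with different reporting options
oaxaca, xb                  // add the table of coefficients and X-values
oaxaca, eform level(90)     // exponentiated results, 90% confidence intervals
```

Replaying only redisplays the stored results, so the group models are not re-estimated.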

Subsume results for sets of variables

Decomposition results can be aggregated for subsets of variables using the syntax

... ([name:] varlist) ...

where

name provides a label for the subset (if name is omitted, the name of the first variable in the subset is used as the label). For example, you could type

. oaxaca lnwage educ (expten: exper tenure), by(female)

to subsume the contributions of exper and tenure. Apart from variable names, _cons and _offset can also be specified as part of a subset.
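The sketch below contrasts detailed and subsumed output on the example data. It assumes that oaxaca is installed and that the dataset URL from the Examples section is reachable.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* One row per regressor in the detailed results
oaxaca lnwage educ exper tenure, by(female)

* exper and tenure reported jointly under the label "expten"
oaxaca lnwage educ (expten: exper tenure), by(female)

* No name given: the subset is labeled after its first variable, "exper"
oaxaca lnwage educ (exper tenure), by(female)
```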

Normalization of categorical variables

For categorical regressors, the detailed decomposition results depend on the choice of the (omitted) base category. A solution is to compute the decomposition based on "normalized" effects, i.e. effects that are expressed as deviation contrasts from the grand mean (Yun 2005). To "normalize" the effects for a set of indicator variables representing a categorical variable, include the indicator variables in the list of regressors using the syntax

... normalize(spec) ...

where

spec usually is simply the list of indicator variables. Note that an indicator variable has to be supplied for every category (including the base category). For example, you could type

. tabulate isco, generate(isco) nofreq
. oaxaca lnwage educ exper normalize(isco1-isco9), by(female)

The tabulate, generate() command is a convenient way to generate a set of indicator variables from a categorical variable (such as the 9 major groups of the ISCO-88 job classification). The base category to be omitted from model estimation can be designated using the b. operator, but this should not affect the decomposition results. For example, you could type

... normalize(married b.single divorced) ...

If no base category is marked, the first variable is used as the base category.
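To illustrate what "normalized" means, assume a categorical regressor with K categories coded by indicators D_1, ..., D_K, where category 1 is the omitted base, so that beta_1 = 0 and alpha is the intercept. Following the deviation-contrast idea of Yun (2005), one way to write the normalized effects is:

```latex
\bar\beta = \frac{1}{K}\sum_{k=1}^{K}\beta_k, \qquad
\tilde\beta_k = \beta_k - \bar\beta \quad (k = 1,\dots,K), \qquad
\tilde\alpha = \alpha + \bar\beta
```

Since the normalized intercept plus any normalized category effect equals the original intercept plus that category's effect, predictions are unchanged. The normalized effects sum to zero and no longer depend on which category was omitted: choosing category j as the base shifts every beta_k and beta-bar by the same amount, and the shift cancels in the differences.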

Note that normalize() is allowed within subsumed variable sets. For example, you could type

... (family: kids6 normalize(married b.single divorced)) ...

Normalization can also be applied to interactions between a categorical variable and a continuous variable. In this case, type # followed by the name of the continuous variable at the end of normalize(). Because you would usually also want to normalize the main effects, you should supply two normalize() statements, one for the main effects and one for the interactions. Example: Suppose d1, d2, and d3 are indicator variables representing a categorical variable and d1x, d2x, and d3x are interactions of these indicators with a continuous variable x. You could then type

... normalize(d1 d2 d3) normalize(d1x d2x d3x # x) ...
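Here is a sketch of the interaction case on the example data. It assumes, as the examples above suggest, that oaxaca.dta contains 0/1 indicators single, married, and divorced that together cover every observation. The generated variable names singleX, marriedX, and divorcedX are hypothetical. No base category is marked, so the first variable of each normalize() list (single and singleX) serves as the base.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Interactions of the marital-status indicators with experience
* (hypothetical variable names singleX, marriedX, divorcedX)
foreach v in single married divorced {
    generate `v'X = `v' * exper
}

* Normalize the main effects and the interactions; "# exper" names the
* continuous variable that the indicators are interacted with
oaxaca lnwage educ exper normalize(single married divorced)      ///
    normalize(singleX marriedX divorcedX # exper), by(female)
```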

Options

    +------+
----+ Main +-------------------------------------------------------------

by(groupvar) specifies the groupvar that defines the two groups that are to be compared. by() is required.

swap reverses the order of the groups.

linear causes the standard linear decomposition to be computed. This is the default. The estimation command for the group models defaults to regress.

logit causes the non-linear decomposition for a binary dependent variable to be computed using the weighting method described by Yun (2004). The estimation command for the group models defaults to logit.

probit causes the non-linear decomposition for a binary dependent variable to be computed using the weighting method described by Yun (2004). The estimation command for the group models defaults to probit.

Only one of linear, logit, or probit is allowed.
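As a sketch, the logit example from the Examples section can be rerun with probit weighting instead; only the model option changes. This assumes that the homecomp example dataset is reachable.

```stata
use http://fmwww.bc.edu/RePEc/bocode/h/homecomp.dta, clear

* Yun (2004) non-linear decomposition with probit group models
oaxaca homecomp female age (educ: hsgrad somecol college)        ///
    (marstat: married prevmar) if white==1 | black==1,          ///
    by(black) probit pooled
```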

nodetail suppresses the detailed results and only computes the overall decomposition.

adjust(varlist) causes the group differential to be adjusted by the contribution of the specified variables before the decomposition is computed. This is useful, for example, if the specified variables are selection terms. Note that adjust() is not needed if heckman is used to estimate the models. _offset is allowed in adjust().

    +--------------------+
----+ Decomposition type +-----------------------------------------------

threefold[(reverse)] computes the three-fold decomposition. This is the default. The decomposition is expressed from the viewpoint of Group 2. Specify threefold(reverse) to express the decomposition from the viewpoint of Group 1.
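For reference, here is the three-fold decomposition of the gap R = E(X_1)'beta_1 - E(X_2)'beta_2 for linear models, following Jann (2008). E denotes endowments, C coefficients, and I the interaction. The reverse line is offered as the presumed Group 1 counterpart and has not been checked against the implementation. Multiplying out either line gives E(X_1)'beta_1 - E(X_2)'beta_2.

```latex
\begin{aligned}
\text{default:}\quad R &= \underbrace{[\mathrm{E}(X_1)-\mathrm{E}(X_2)]'\beta_2}_{E}
  + \underbrace{\mathrm{E}(X_2)'(\beta_1-\beta_2)}_{C}
  + \underbrace{[\mathrm{E}(X_1)-\mathrm{E}(X_2)]'(\beta_1-\beta_2)}_{I} \\
\text{reverse:}\quad R &= [\mathrm{E}(X_1)-\mathrm{E}(X_2)]'\beta_1
  + \mathrm{E}(X_1)'(\beta_1-\beta_2)
  - [\mathrm{E}(X_1)-\mathrm{E}(X_2)]'(\beta_1-\beta_2)
\end{aligned}
```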

weight(# [# ...]) computes the two-fold decomposition, where # [# ...] are the weights given to Group 1 relative to Group 2 in determining the reference coefficients (weights are recycled if there are more coefficients than weights). For example, weight(1) uses the Group 1 coefficients as the reference coefficients, and weight(0) uses the Group 2 coefficients.
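Here is a sketch of the two-fold decomposition in the same notation, for linear models and following Jann (2008). W is a diagonal matrix holding the weights given to Group 1, and I is the identity matrix. The reference coefficients beta* split the gap into an explained part Q and an unexplained part U:

```latex
\begin{aligned}
\beta^* &= W\beta_1 + (I - W)\beta_2 \\
R &= \underbrace{[\mathrm{E}(X_1)-\mathrm{E}(X_2)]'\beta^*}_{Q}
   + \underbrace{\mathrm{E}(X_1)'(\beta_1-\beta^*) + \mathrm{E}(X_2)'(\beta^*-\beta_2)}_{U}
\end{aligned}
```

weight(1) gives W = I and hence beta* = beta_1. weight(0) gives W = 0 and beta* = beta_2. weight(0.5) averages the two coefficient vectors. The two terms of U are presumably the Group 1 and Group 2 parts that split reports separately.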

pooled[(model_opts)] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients. groupvar is included in the pooled model as an additional control variable. Estimation details may be specified in parentheses; see the model1() option below.

omega[(model_opts)] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients (without including groupvar as a control variable). Estimation details may be specified in parentheses; see the model1() option below.
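The following sketch runs the same model under several choices of reference coefficients so that the explained and unexplained parts can be compared (example data as in the Examples section).

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

oaxaca lnwage educ exper tenure, by(female) weight(1)     // Group 1 coefficients
oaxaca lnwage educ exper tenure, by(female) weight(0)     // Group 2 coefficients
oaxaca lnwage educ exper tenure, by(female) weight(0.5)   // average of both groups
oaxaca lnwage educ exper tenure, by(female) pooled        // pooled model, female included
oaxaca lnwage educ exper tenure, by(female) omega         // pooled model, female excluded
```

The total difference is the same in all five runs. Only its division into explained and unexplained parts changes.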

reference(name) computes the two-fold decomposition using the coefficients from a stored model. name is the name under which the model was stored; see estimates store. Combining reference() with vce(bootstrap) or vce(jackknife) is not recommended.
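The sketch below shows how to use reference(): estimate a model over both groups yourself, store it, and pass its name. Excluding female from the stored model mirrors the omega specification. The stored-model name all is arbitrary.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Reference model estimated over both groups (female not included)
regress lnwage educ exper tenure
estimates store all

* Two-fold decomposition using the stored coefficients as reference
oaxaca lnwage educ exper tenure, by(female) reference(all)
```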

split causes the "unexplained" component in the two-fold decomposition to be split into a part related to Group 1 and a part related to Group 2.

Only one of threefold, weight(), pooled, omega, and reference() is allowed.

    +----------+
----+ X-Values +---------------------------------------------------------

x1(names_and_values) and x2(names_and_values) provide custom values for specific predictors to be used for Group 1 and Group 2 in the decomposition (only allowed with the linear decomposition). The default is to use the group means of the predictors. The syntax for names_and_values is

varname [=] value [[,] varname [=] value ...]

Example:

x1(educ 12 exp 30)

    +--------+
----+ SE/SVY +-----------------------------------------------------------

svy[([vcetype] [, svy_options])] executes oaxaca while accounting for the survey settings identified by svyset (this is essentially equivalent to applying the svy prefix command, although the svy prefix is not allowed with oaxaca due to some technical issues). vcetype and svy_options are as described in help svy.

vce(vcetype) specifies the type of standard errors reported. vcetype may be analytic (the default), robust, cluster clustvar, bootstrap, or jackknife; see [R] vce_option.

cluster(varname) adjusts standard errors for intragroup correlation; this is Stata 9 syntax for vce(cluster clustvar).
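Here is a sketch of the robust and cluster-robust variants on the example data. Clustering on occupation (isco) is purely illustrative; choose the clustering variable that matches your sampling design.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Heteroskedasticity-robust standard errors
oaxaca lnwage educ exper tenure, by(female) vce(robust)

* Cluster-robust standard errors (illustrative clustering on isco);
* vce(cluster) implies suest, as described under the suest option
oaxaca lnwage educ exper tenure, by(female) vce(cluster isco)
```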

fixed[(varlist)] identifies fixed regressors (all regressors if specified without an argument; experimental factors are an example of fixed regressors). The default is to treat regressors as stochastic. Stochastic regressors inflate the standard errors of the decomposition components.
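The sketch below compares the standard error of the endowments component with stochastic and with fixed regressors. The coefficient name overall:endowments is assumed from the labels that oaxaca displays for the default three-fold decomposition. Because stochastic regressors add variance, you would expect the first standard error to be the larger one.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Default: regressors treated as stochastic
quietly oaxaca lnwage educ exper tenure, by(female)
display "SE of endowments, stochastic X: " _se[overall:endowments]

* All regressors treated as fixed
quietly oaxaca lnwage educ exper tenure, by(female) fixed
display "SE of endowments, fixed X:      " _se[overall:endowments]
```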

suest[(name)] enforces using suest to obtain the covariances between the models/groups. suest is implied by pooled, omega, reference(), svy, vce(cluster), and cluster(). Specify suest(name) to save suest's estimation results under name using estimates store. nosuest prevents applying suest (this may cause biased standard errors).
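As a sketch, the joint estimation results can be kept for later inspection. The name joint is arbitrary.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* pooled implies suest; suest(joint) additionally stores the suest results
oaxaca lnwage educ exper tenure, by(female) pooled suest(joint)

* Redisplay the stored joint estimation results
estimates replay joint
```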

nose suppresses the computation of standard errors.

    +------------------+
----+ Model estimation +-------------------------------------------------

model1(model_opts) and model2(model_opts) specify the estimation details for the two group models. The syntax for model_opts is

[estcom] [, store(name) addrhs(spec) estcom_options]

where

estcom is the estimation command to be used and estcom_options are options allowed by estcom. store(name) saves the model's estimation results under name using estimates store. addrhs(spec) adds spec to the "right-hand side" of the model. For example, use addrhs() to add extra variables to the model. Examples:

model1(heckman, select(varlist_s) twostep)

model1(ivregress 2sls, addrhs((varlist2=varlist_iv)))

Note that oaxaca uses the first equation if a model contains multiple equations. Furthermore, coefficients that only occur in one of the models are assumed to be zero in the other model. It is required, however, that the associated variables contain non-missing values for all observations in both groups.
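The following sketch shows a selection-corrected decomposition for the wage example. It assumes, without confirmation here, that oaxaca.dta contains a labor-force participation indicator lfp and the regressors age, married, divorced, kids6, and kids714, and that lnwage is missing for non-participants. Only the Group 2 (female == 1) model is estimated by heckman; the Group 1 model defaults to regress. As noted under adjust(), no adjustment option is needed when heckman is used.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Heckman two-step selection model for Group 2; only its first (outcome)
* equation enters the decomposition
oaxaca lnwage educ exper tenure, by(female)                                   ///
    model2(heckman, twostep select(lfp = age married divorced kids6 kids714)) ///
    noisily
```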

noisily displays the models' estimation output.

relax causes oaxaca to continue its computations even if coefficients are dropped from the models (e.g. due to collinearity) or if some coefficients have zero variances. The default is to return an error in such a situation.

estopts are common options to be passed through to the models.

    +-----------+
----+ Reporting +--------------------------------------------------------

xb displays a table containing the regression coefficients and predictor values on which the decomposition is based.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

eform specifies that the results be displayed in exponentiated form.

nolegend suppresses the legend about the sets of independent variables.

Examples

. use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta

. oaxaca lnwage educ exper tenure, by(female)

. oaxaca lnwage educ exper tenure, by(female) weight(1)

. oaxaca lnwage educ exper tenure, by(female) pooled

. svyset [pw=wt]
. oaxaca lnwage educ exper tenure, by(female) pooled svy

. oaxaca lnwage educ exper tenure, by(female) pooled vce(bootstrap)

. tabulate isco, nofreq generate(isco)
. oaxaca lnwage educ exper tenure normalize(isco?), by(female) pooled

. use http://fmwww.bc.edu/RePEc/bocode/h/homecomp.dta, clear
. oaxaca homecomp female age (educ: hsgrad somecol college) (marstat: married prevmar) if white==1|black==1, by(black) logit pooled

Saved results

Scalars

    e(N)            number of observations
    e(N_1)          number of observations in Group 1
    e(N_2)          number of observations in Group 2
    e(N_clust)      number of clusters

Macros

    e(cmd)          oaxaca
    e(title)        Blinder-Oaxaca decomposition
    e(by)           name of group variable
    e(group_1)      value of group variable for Group 1
    e(group_2)      value of group variable for Group 2
    e(depvar)       name of dependent variable
    e(model)        linear, logit, or probit
    e(threefold)    threefold, threefold(reverse), or empty
    e(weights)      weights specified by weight() or empty
    e(refcoefs)     pooled, omega, name of reference model, or empty
    e(legend)       definitions of regressor sets
    e(normalized)   normalized indicator sets
    e(adjust)       names of adjustment variables
    e(fixed)        fixed, fixed(varlist), or empty
    e(suest)        suest or empty
    e(wtype)        weight type
    e(wexp)         weight expression
    e(clustvar)     name of cluster variable
    e(vce)          vcetype specified in vce()
    e(vcetype)      title used to label Std. Err.
    e(properties)   b V

Matrices

    e(b)            decomposition results
    e(V)            variance-covariance matrix of decomposition results
    e(b0)           vector containing coefficients and X-values
    e(V0)           variance-covariance matrix of e(b0)

Functions

    e(sample)       marks estimation sample
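Here is a sketch of working with the saved results. The equation and column names used below (overall:group_1, overall:difference, and so on) are assumed from the labels that matrix list e(b) displays for the default three-fold decomposition. Check them against your own output.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
oaxaca lnwage educ exper tenure, by(female)

matrix list e(b)        // all decomposition results
display "N = " e(N) " (Group 1: " e(N_1) ", Group 2: " e(N_2) ")"

* The three-fold components should add up to the total difference
display _b[overall:difference] - (_b[overall:endowments] +     ///
    _b[overall:coefficients] + _b[overall:interaction])
```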

References

Fairlie, Robert W. (2005). An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement 30: 305-316.

Jann, Ben (2008). The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal 8(4): 453-479. [Working paper version available from: http://ideas.repec.org/p/ets/wpaper/5.html]

Yun, Myeong-Su (2004). Decomposing differences in the first moment. Economics Letters 82: 275-280.

Yun, Myeong-Su (2005). A simple solution to the identification problem in detailed wage decompositions. Economic Inquiry 43: 766-772.

Author

Ben Jann, Institute of Sociology, University of Bern, jann@soz.unibe.ch

Also see

Online: help for regress, logit, probit, heckman, suest, svyset; fairlie