help oaxaca9
-------------------------------------------------------------------------------

Title

    oaxaca9 -- Blinder-Oaxaca decomposition of outcome differentials

Syntax

    oaxaca9 depvar [indepvars] [if] [in] [weight], by(groupvar) [options]

    options                     Description
    -------------------------------------------------------------------------
    Main
      by(groupvar)              specifies the groups; by() is required
      swap                      swap groups
      detail[(dlist)]           display detailed decomposition
      adjust(varlist)           adjustment for selection variables

    Decomposition type
      threefold[(reverse)]      three-fold decomposition; the default
      weight(# [# ...])         two-fold decomposition based on specified
                                  weights
      pooled[(model_opts)]      two-fold decomposition based on pooled model
                                  including groupvar
      omega[(model_opts)]       two-fold decomposition based on pooled model
                                  excluding groupvar
      reference(name)           two-fold decomposition based on stored model
      split                     split unexplained part of two-fold
                                  decomposition

    X-Values
      x1(names_and_values)      provide custom X-values for Group 1
      x2(names_and_values)      provide custom X-values for Group 2
      categorical(clist)        identify dummy variable sets and apply
                                  deviation contrast transform

    SE/SVY
      svy[(svyspec)]            survey data estimation
      vce(vcetype)              vcetype may be analytic, robust,
                                  cluster clustvar, bootstrap, or jackknife
      cluster(varname)          adjust standard errors for intragroup
                                  correlation (Stata 9)
      fixed[(varlist)]          assume non-stochastic regressors
      suest[(name)] | nosuest   do/do not use suest to obtain joint variance
                                  matrix
      nose                      suppress computation of standard errors

    Models
      model1(model_opts)        estimation details for the Group 1 model
      model2(model_opts)        estimation details for the Group 2 model
      noisily                   display model estimation output

    Reporting
      xb                        display table with coefficients and means
      level(#)                  set confidence level; default is level(95)
      eform                     report exponentiated results
      nolegend                  suppress legend
    -------------------------------------------------------------------------
    bootstrap, by, jackknife, statsby, and xi are allowed; see prefix.
    Weights are not allowed with the bootstrap prefix.
    aweights are not allowed with the jackknife prefix.
    vce(), cluster(), and weights are not allowed with the svy option.
    fweights, aweights, pweights, and iweights are allowed; see weight.

Description

    oaxaca9 computes the so-called Blinder-Oaxaca decomposition, which is often used to analyze wage gaps by sex or race. depvar is the outcome variable of interest (e.g. log wages) and indepvars are predictors (e.g. education, work experience, etc.). groupvar identifies the groups to be compared. For methods and formulas see Jann (2008).

    oaxaca9 typed without arguments replays the last results, optionally applying xb, level(), eform, or nolegend.
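    The replay behaviour can be sketched as follows. This is a minimal illustration, assuming the dataset from the Examples section is reachable and contains the variables used there:

```stata
* Estimate once; the results stay in memory as e() results
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
oaxaca9 lnwage educ exper tenure, by(female)

* Replay the same results with 90% confidence intervals
oaxaca9, level(90)

* Replay again, adding the table of coefficients and X-values
oaxaca9, xb
```

    Replaying only changes how the stored results are displayed; the models are not re-estimated.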

Options

        +------+
    ----+ Main +-------------------------------------------------------------

    by(groupvar) specifies the groupvar that defines the two groups that will be compared. by() is required.

    swap reverses the order of the groups.

    detail[(dlist)] requests that the detailed results for the individual predictors be reported. Use dlist to subsume the results for sets of regressors (results for variables not appearing in dlist are listed individually). The syntax for dlist is

        name:varlist [, name:varlist ...]

    The usual shorthand conventions apply to the varlists specified in dlist (see help varlist; additionally, _cons is allowed). For example, specify detail(exp:exp*) to subsume exp (experience) and exp2 (experience squared). name is any valid Stata name and labels the set.
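    A short sketch of the two forms of detail, assuming the dataset from the Examples section. The set label worktime is a hypothetical name chosen for illustration:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Detailed decomposition: results are reported for each predictor
oaxaca9 lnwage educ exper tenure, by(female) detail

* Subsume exper and tenure under one set labelled "worktime"
* (hypothetical label); educ is not in dlist, so it is listed individually
oaxaca9 lnwage educ exper tenure, by(female) detail(worktime: exper tenure)
```

    The legend printed below the results shows which variables each set contains; nolegend suppresses it.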

    adjust(varlist) causes the differential to be adjusted by the contribution of the specified variables before performing the decomposition. This is useful, for example, if the specified variables are selection terms. Note that adjust() is not needed for heckman models.

        +--------------------+
    ----+ Decomposition type +-----------------------------------------------

    threefold[(reverse)] computes the three-fold decomposition. This is the default unless weight(), pooled, omega, or reference() is specified. The decomposition is expressed from the viewpoint of Group 2. Specify threefold(reverse) to express the decomposition from the viewpoint of Group 1.

    weight(# [# ...]) computes the two-fold decomposition where # [# ...] are the weights given to Group 1 relative to Group 2 in determining the reference coefficients (weights are recycled if there are more coefficients than weights). For example, weight(1) uses the Group 1 coefficients as the reference coefficients, weight(0) uses the Group 2 coefficients.
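    The effect of the weights can be explored with a sketch like the following, assuming the dataset from the Examples section. The value 0.5 is an illustrative choice that gives both groups equal weight:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Reference coefficients taken from Group 1
oaxaca9 lnwage educ exper tenure, by(female) weight(1)

* Reference coefficients taken from Group 2
oaxaca9 lnwage educ exper tenure, by(female) weight(0)

* Equal weighting of the two groups' coefficients (illustrative value)
oaxaca9 lnwage educ exper tenure, by(female) weight(0.5)
```

    Comparing the three outputs shows how sensitive the explained and unexplained parts are to the choice of reference coefficients.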

    pooled[(model_opts)] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients. groupvar is included in the pooled model as an additional control variable. Estimation details may be specified in parentheses; see the model1() option below.

    omega[(model_opts)] computes the two-fold decomposition using the coefficients from a pooled model over both groups as the reference coefficients (without including groupvar as a control variable in the pooled model). Estimation details may be specified in parentheses; see the model1() option below.

    reference(name) computes the two-fold decomposition using the coefficients from a stored model. name is the name under which the model was stored; see estimates store. Do not combine the reference() option with bootstrap or jackknife methods.
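    A sketch of the reference() workflow, assuming the dataset from the Examples section. The stored name refmodel and the choice to fit the reference model on the observations with female == 0 are illustrative assumptions, not a recommendation:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Fit the model that supplies the reference coefficients
* (here, illustratively, on the observations with female == 0)
regress lnwage educ exper tenure if female == 0

* Store it under a name (refmodel is a hypothetical name)
estimates store refmodel

* Two-fold decomposition using the stored coefficients as reference
oaxaca9 lnwage educ exper tenure, by(female) reference(refmodel)
```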

    split causes the "unexplained" component in the two-fold decomposition to be split into a part related to Group 1 and a part related to Group 2. split is effective only if specified with weight(), pooled, omega, or reference().

    Only one of threefold, weight(), pooled, omega, and reference() is allowed.

        +----------+
    ----+ X-Values +---------------------------------------------------------

    x1(names_and_values) and x2(names_and_values) provide custom values for specific predictors to be used for Group 1 and Group 2 in the decomposition. The default is to use the group means of the predictors. The syntax for names_and_values is

        varname [=] value [[,] varname [=] value ...]
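    As a sketch of the syntax, assuming the dataset from the Examples section. The values 12 and 10 are arbitrary illustrative numbers:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Evaluate Group 1 at educ = 12 and exper = 10 instead of its group means;
* tenure for Group 1 and all Group 2 predictors stay at their group means.
* xb displays the coefficients and the X-values actually used.
oaxaca9 lnwage educ exper tenure, by(female) x1(educ=12, exper=10) xb
```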

    categorical(clist) identifies sets of dummy variables representing categorical variables and transforms the coefficients so that the results of the decomposition are invariant to the choice of the (omitted) base category (deviation contrast transform). The syntax for clist is

        varlist [, varlist ...]

    where each varlist must contain indicator (0/1) variables for all categories including the base category (that is, a base category indicator variable must exist in the data). To generate a suitable set of indicator variables use, for example,

        tabulate catvar, generate(stubname) [nofreq]

    where catvar is the categorical variable and the indicator variables will be named stubname1, stubname2, ... (nofreq may be used to suppress the frequency table; see help tabulate).

    The variables of a set specified in categorical() are added to the indepvars (unless at least one of the variables of the set already appears in indepvars), omitting the first variable of the set to prevent collinearity for model estimation (i.e. the first variable is used to represent the base category). Change the order of the variables or explicitly specify the desired terms in indepvars to change the base category.

    The deviation contrast transform can also be applied to interactions between a categorical and a continuous variable. Specify the continuous variable in parentheses at the end of the list in this case, i.e.

        varlist (varname) [, ...]

    and also include a list for the main effects. Example:

        categorical(d1 d2 d3, xd1 xd2 xd3 (x))

    where x is the continuous variable, and d1 etc. and xd1 etc. are the main effects and interaction effects.

        +--------+
    ----+ SE/SVY +-----------------------------------------------------------

    svy[([vcetype] [, svy_options])] executes oaxaca9 while accounting for the survey settings identified by svyset (this is essentially equivalent to applying the svy prefix command, although the svy prefix is not allowed with oaxaca9 due to some technical issues). vcetype and svy_options are as described in help svy.

    vce(vcetype) specifies the type of standard errors reported. vcetype may be analytic (the default), robust, cluster clustvar, bootstrap, or jackknife; see [R] vce_option.

    cluster(varname) adjusts standard errors for intragroup correlation; this is Stata 9 syntax for vce(cluster clustvar).

    fixed[(varlist)] identifies fixed regressors (all if specified without argument; experimental factors are an example of fixed regressors). The default is to treat regressors as stochastic. Stochastic regressors inflate the standard errors of the decomposition components.

    suest[(name)] enforces using suest to obtain the covariances between the models/groups. suest is implied by pooled, omega, reference(), svy, vce(cluster), and cluster(). Specify suest(name) to save suest's estimation results under name name using estimates store. nosuest prevents applying suest, which may cause biased standard errors.

    nose suppresses the computation of standard errors.

        +------------------+
    ----+ Model estimation +-------------------------------------------------

    model1(model_opts) and model2(model_opts) specify the estimation details for the two group-specific models. The syntax for model_opts is

        [estcom] [, store(name) addrhs(spec) estcom_options]

    where estcom is the estimation command to be used and estcom_options are options allowed by estcom. The default estimation command is regress. store(name) saves the model's estimation results under name name using estimates store. addrhs(spec) adds spec to the "right-hand side" of the model. For example, use addrhs() to add extra variables to the model. Examples:

        model1(heckman, select(varlist_s) twostep)

        model1(ivregress 2sls, addrhs((varlist2=varlist_iv)))

    Technical notes:

      o  oaxaca9 uses the first equation for the decomposition if a model contains multiple equations.

      o  Coefficients that occur in one of the models only are assumed zero for the other group. It is important, however, that the associated variables contain non-missing values for all observations in both groups.
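    The store() suboption can be sketched as follows, assuming the dataset from the Examples section. The names g1 and g2 are hypothetical, and regress is written out although it is the default:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Keep the two group-specific models under the names g1 and g2
* (hypothetical names) so they can be inspected afterwards
oaxaca9 lnwage educ exper tenure, by(female) ///
    model1(regress, store(g1)) model2(regress, store(g2))

* Compare the stored group-specific coefficients side by side
estimates table g1 g2
```

    Storing the models is a convenient way to check the group-specific coefficients without specifying noisily.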

    noisily displays the models' estimation output.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

    xb displays a table containing the regression coefficients and predictor values on which the decomposition is based.

    level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

    eform specifies that the results be displayed in exponentiated form.

    nolegend suppresses the legend for the regressor sets defined by the detail() option.

Examples

    . use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta

    . oaxaca9 lnwage educ exper tenure, by(female)

    . oaxaca9 lnwage educ exper tenure, by(female) weight(1)

    . oaxaca9 lnwage educ exper tenure, by(female) pooled

    . svyset [pw=wt]
    . oaxaca9 lnwage educ exper tenure, by(female) svy

    . oaxaca9 lnwage educ exper tenure, by(female) vce(bootstrap)
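    A further sketch illustrates the categorical() option described under X-Values. It assumes that the example dataset contains a categorical occupation variable named isco with fewer than ten categories; if it does not, substitute any categorical variable:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Create one indicator per category: isco1, isco2, ...
* (isco is assumed to exist in the example data)
tabulate isco, nofreq generate(isco)

* isco? matches the generated indicators but not isco itself.
* The set is added to the model automatically, with the first
* indicator omitted as the base category for estimation.
oaxaca9 lnwage educ exper tenure, by(female) detail categorical(isco?)
```

    Because of the deviation contrast transform, the detailed results for the indicator set do not depend on which category was omitted.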

Saved Results

    Scalars
      e(N)           number of observations
      e(N_1)         number of observations in Group 1
      e(N_2)         number of observations in Group 2
      e(N_clust)     number of clusters

    Macros
      e(cmd)         oaxaca9
      e(depvar)      name of dependent variable
      e(by)          name of group variable
      e(group_1)     value of group variable for Group 1
      e(group_2)     value of group variable for Group 2
      e(title)       Blinder-Oaxaca decomposition
      e(model)       type of decomposition
      e(weights)     weights specified in the weight() option
      e(refcoefs)    equation name used in e(b0) for the reference
                       coefficients
      e(detail)      detail, if detailed results were requested
      e(legend)      regressor sets defined by the detail() option
      e(adjust)      names of adjustment variables
      e(fixed)       names of fixed variables
      e(suest)       suest, if suest was used
      e(wtype)       weight type
      e(wexp)        weight expression
      e(clustvar)    name of cluster variable
      e(vce)         vcetype specified in vce()
      e(vcetype)     title used to label Std. Err.
      e(properties)  b V

    Matrices
      e(b)           decomposition results
      e(V)           variance-covariance matrix of decomposition results
      e(b0)          vector containing coefficients and X-values
      e(V0)          variance-covariance matrix of e(b0)

    Functions
      e(sample)      marks estimation sample
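    The saved results can be inspected after estimation with standard Stata commands. A minimal sketch, assuming the dataset from the Examples section:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
oaxaca9 lnwage educ exper tenure, by(female)

* Group sizes and total number of observations
display "Group 1: " e(N_1) "  Group 2: " e(N_2) "  Total: " e(N)

* Decomposition results and their variance-covariance matrix
matrix list e(b)
matrix list e(V)

* Number of observations in the estimation sample
count if e(sample)
```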

References

    Jann, Ben (2008). The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal 8(4): 453-479.

    Working paper version available from: http://ideas.repec.org/p/ets/wpaper/5.html

Author

    Ben Jann, ETH Zurich, jannb@ethz.ch

Also see