help for gdecomp

Decomposition of outcome differentials after nonlinear models


gdecomp groupvar [, options ] : estimation_command


groupvar specifies a binary (numeric) variable identifying the two groups;

estimation_command (see help estcom) should begin with logit, logistic, probit, poisson, or nbreg;

options are dxweight(high|low) reverse eform level(#) noheader nocoef dummies(varlist_1 [\ varlist_2 ..])


gdecomp implements a generalized Blinder-Oaxaca decomposition that applies to categorical and count outcomes (and, correspondingly, to nonlinear regression models). First, estimation_command is estimated separately in the two groups of groupvar. Then the observed difference in the dependent variable of estimation_command between the groups defined by groupvar is decomposed into three parts: (1) a part due to differences in endowments (labeled by E), (2) a part due to differences in marginal effects (labeled by C), and (3) a part due to differences in baseline predictions or constants (labeled by U). See the Methods and Formulas section below.

Typed without arguments, gdecomp replays the estimation results. gdecomp shares all features of estimation commands; see help estcom for details.

Before using gdecomp, please install the latest version of margeff. (The latest version is 2.0.1, dated 15 September 2006.) Other packages that carry out Blinder-Oaxaca decompositions are listed at the bottom of this help file.


dxweight(high|low) affects the calculation of the endowment effect. If dxweight(high) is specified, then differences in endowments are weighted with the marginal effects from the high-outcome group. If dxweight(low) is specified, then differences in endowments are weighted with the marginal effects from the low-outcome group. The default is dxweight(high).

reverse tells gdecomp that the group with the lower average of the outcome variable should be treated as the high-outcome group. By default, gdecomp defines the high-outcome group to be the group with the larger observed mean of the outcome variable. The default behavior generalizes the idea that average earnings are higher in the high-outcome group. The reverse option should be used only if high values of the outcome variable indicate outcomes that are "negatively" valued (that is, outcomes that decrease subjective utility). Do not use this option if high values of the outcome variable record high salaries or being in the labor force; use this option if high values of the outcome variable record being unemployed.
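The group assignment rule described for reverse can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not gdecomp's own code; the function name assign_groups and the example means are hypothetical.

```python
def assign_groups(mean_by_group, reverse=False):
    """Return (high_group, low_group) from a dict {group value: observed mean}.

    Hypothetical helper: by default the group with the larger observed mean
    of the outcome is the high-outcome group; reverse=True swaps the roles.
    """
    if len(mean_by_group) != 2:
        raise ValueError("groupvar must identify exactly two groups")
    # Sort the two group values by their observed outcome mean.
    low, high = sorted(mean_by_group, key=mean_by_group.get)
    return (low, high) if reverse else (high, low)


# Hypothetical observed means of a binary outcome in groups 0 and 1.
means = {0: 0.18, 1: 0.31}
default_pair = assign_groups(means)                 # (1, 0)
reversed_pair = assign_groups(means, reverse=True)  # (0, 1)
```

With the outcome coded so that 1 means "unemployed", the reversed assignment treats group 0 (lower unemployment rate) as the high-outcome group.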

eform tells gdecomp that the dependent variable is the natural logarithm of the outcome variable, so that correct marginal effects (changes in the exponential of the linear prediction) can be calculated. This option is useful if the dependent variable is the logarithm of wage. Warning: this option does not request that the results be displayed in exponentiated form.
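As a rough illustration of what "changes in the exponential of the linear prediction" means, the derivative of exp(xb) with respect to x_j is b_j*exp(xb). The Python sketch below evaluates this at a single point. The function name and all numbers are hypothetical, and the help text does not say whether gdecomp evaluates effects at the means or averages them over observations, so treat this as an assumption-laden sketch.

```python
import math


def eform_marginal_effects(coefs, constant, x):
    """Marginal effects of each x_j on exp(xb), evaluated at the point x.

    Hypothetical helper: uses d exp(xb) / d x_j = b_j * exp(xb).
    """
    xb = constant + sum(b * v for b, v in zip(coefs, x))
    return [b * math.exp(xb) for b in coefs]


# Hypothetical log-wage regression: coefficients on schooling and experience.
coefs = [0.08, 0.02]
constant = 1.5
x_point = [12.0, 10.0]  # evaluation point (for example, group means)
effects = eform_marginal_effects(coefs, constant, x_point)
```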

level(#) specifies the confidence level, in percent terms, for the confidence intervals of the computed statistics; see help level.

noheader suppresses the display of overall and variable-level decomposition results.

nocoef suppresses the display of the decomposition results for the variables, and forces gdecomp to display the E, C and U components (without their standard errors).

dummies(varlist_1 [\ varlist_2 ... ]) modifies the calculation of marginal effects for dummy variables. Here, varlist_1 [\ varlist_2 ... ] are lists of dummy variables, where all dummies of a list indicate different categories of the same underlying categorical variable. Let xvar be a categorical variable with K+1 (K>1) categories. In this case, not xvar, but K dummies - say, D1, ..., DK - are included in the regression model. The estimated marginal effects for these K dummies may be misleading (see the example in the help file for margeff). The correct result is obtained if one specifies the dummies(D*) option.

Methods and Formulas

Let y1 and y0 be the means of the dependent variable Y in the high-outcome and the low-outcome groups, respectively (thus y1>y0). Let x1 and x0 be the row vectors of the means of the explanatory variables X1,...,Xk, m1 and m0 the column vectors of the marginal effects in groups 1 and 0, and a1 and a0 the baseline predictions in groups 1 and 0.

If the dxweight(high|low) option is omitted or dxweight(high) is specified, then the raw differential y1-y0 is approximated as

y1-y0 = (x1-x0)m1 + x0(m1-m0) + a1-a0

If, however, the dxweight(low) option is specified, then the raw differential y1-y0 is approximated as

y1-y0 = (x1-x0)m0 + x0(m1-m0) + a1-a0

Whichever weighting is chosen, the first term on the right-hand side is the endowments effect (E), the second term is the coefficient effect (C), and the third term is the difference due to differences in "constants" (the unexplained part, U).
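The two formulas can be made concrete with a small numeric sketch in Python. It is not gdecomp's implementation: the function name decompose and every number below are hypothetical, and the group means, marginal effects, and baseline predictions are taken as given rather than estimated.

```python
def decompose(x1, x0, m1, m0, a1, a0, dxweight="high"):
    """Return (E, C, U) following the formulas printed above.

    Hypothetical helper. x1, x0: means of the explanatory variables;
    m1, m0: marginal effects; a1, a0: baseline predictions
    (1 = high-outcome group, 0 = low-outcome group).
    """
    if dxweight not in ("high", "low"):
        raise ValueError("dxweight must be 'high' or 'low'")
    # Endowment differences are weighted with m1 (high) or m0 (low).
    m_e = m1 if dxweight == "high" else m0
    E = sum((p - q) * m for p, q, m in zip(x1, x0, m_e))
    # Differences in marginal effects are weighted with x0, as printed.
    C = sum(q * (u - v) for q, u, v in zip(x0, m1, m0))
    U = a1 - a0
    return E, C, U


# Hypothetical inputs for two explanatory variables.
x1, x0 = [12.0, 0.5], [10.0, 0.25]
m1, m0 = [0.03, 0.2], [0.02, 0.1]
a1, a0 = 0.1, 0.05

E_high, C_high, U_high = decompose(x1, x0, m1, m0, a1, a0, dxweight="high")
E_low, C_low, U_low = decompose(x1, x0, m1, m0, a1, a0, dxweight="low")
print(f"high: E={E_high:.3f} C={C_high:.3f} U={U_high:.3f}")
print(f"low:  E={E_low:.3f} C={C_low:.3f} U={U_low:.3f}")
```

Only E changes between the two settings. The sketch follows the formulas as printed, including the x0 weight in C under both settings, and does not check that the three parts add up to an observed differential.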


Tamás Bartus, Corvinus University, Budapest

Also see

On-line: help for gdecomp, decompose, oaxaca (if installed)