help fairlie


fairlie -- Nonlinear decomposition of binary outcome differentials


fairlie depvar indepvars [if] [in] [weight], by(groupvar) [ options ]

where the syntax for indepvars is

term [term ...]

with term as a variable name or, alternatively, a set of variables specified as

([name:] varlist)

name is any valid Stata name and labels the set. If name is omitted, the name of the first variable is used to label the set.

options                 Description
-------------------------------------------------------------------------
by(groupvar)            specify the groups (required); groupvar must be 0/1
reps(#)                 number of decomposition replications; default is 100
nodots                  suppress the replication dots
ro                      randomize ordering of variables in the detailed
                          decomposition
reference(#)            specify the reference model; # must be 0 (use group 0
                          model) or 1 (use group 1 model); default is 0
pooled[(varlist)]       use a pooled model as reference; varlist is added to
                          the model if specified
probit                  use a probit model; default is to use a logit model
noest                   suppress model estimation output
saveest(name)           store model estimation results under name
level(#)                set confidence level; default is level(95)
nolegend                suppress the legend
estopts                 options passed through to the internal call of logit
                          or probit
-------------------------------------------------------------------------
fweights, pweights, and iweights are allowed; see help weight.


fairlie computes the nonlinear decomposition of binary outcome differentials proposed by Fairlie (1999, 2003, 2005). That is, fairlie computes the difference in Pr(depvar!=0) between the two groups defined by groupvar and quantifies the contribution of group differences in the indepvars to the outcome differential. Furthermore, fairlie estimates the separate contributions of the individual independent variables (or groups of independent variables). fairlie also reports standard errors for these separate contributions. Note that the covariances are set to zero. Therefore, do not use post-estimation commands such as test or lincom after fairlie.
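The aggregate part of the decomposition can be sketched in a few lines. The following is a minimal Python illustration, not fairlie's actual code; the function name and data layout are hypothetical, and the logistic CDF corresponds to the default logit model:

```python
import numpy as np
from scipy.special import expit  # logistic CDF F(.)

def explained_part(X0, X1, beta):
    """Part of the differential in Pr(depvar!=0) attributable to group
    differences in the regressors, evaluated at the reference
    coefficients beta (e.g., the group-0 logit estimates):
    mean F(X0*beta) - mean F(X1*beta)."""
    return expit(X0 @ beta).mean() - expit(X1 @ beta).mean()
```

With beta set to zero, both mean predicted probabilities are 0.5 and the explained part vanishes, as expected.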

The implementation of the decomposition technique closely follows the suggestions provided by Fairlie (2003); see the References below. If weights are specified, a modified algorithm is used for the computation of the detailed decomposition (see below).

The decomposition technique involves one-to-one matching of cases between the two groups. If the groups differ in size, a random sample is drawn from the larger group. Because the results depend on the specific sample drawn, the process is repeated and mean results are reported. Use reps() to specify the desired number of replications. Set the random-number seed for replicable results; see set seed.

The separate contributions of individual independent variables (or groups of independent variables) may be sensitive to the ordering of the variables. If results are sensitive to ordering, use the ro option, described below, to randomize the ordering of variables in each replication and thus approximate average results over all possible orderings.
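Taken together, the matching and sequential-switching steps of one replication can be sketched as follows. This is a hedged Python illustration under simplifying assumptions, not fairlie's actual code: matching is done by ranking observations on their predicted probabilities, the constant is treated like any other column, and all names are hypothetical:

```python
import numpy as np
from scipy.special import expit  # logistic CDF F(.)

def detailed_contributions(X0, X1, beta, rng, randomize=False):
    """One replication of the detailed decomposition (sketch).

    Draws a random subsample (without replacement) from the larger
    group so both matrices have the same number of rows, ranks both by
    predicted probability (one-to-one matching), then switches the
    variables from group-1 values to group-0 values one after another,
    recording the induced change in the mean predicted probability.
    With randomize=True the switching order is permuted, as with the
    ro option."""
    n = min(len(X0), len(X1))
    if len(X0) > n:
        X0 = X0[rng.choice(len(X0), n, replace=False)]
    if len(X1) > n:
        X1 = X1[rng.choice(len(X1), n, replace=False)]
    # one-to-one matching: rank both groups by predicted probability
    X0 = X0[np.argsort(expit(X0 @ beta))]
    X1 = X1[np.argsort(expit(X1 @ beta))]
    k = X0.shape[1]
    order = rng.permutation(k) if randomize else np.arange(k)
    Z = X1.copy()
    contrib = np.empty(k)
    for j in order:
        before = expit(Z @ beta).mean()
        Z[:, j] = X0[:, j]  # switch variable j to group-0 values
        contrib[j] = expit(Z @ beta).mean() - before
    return contrib
```

Note that the individual contributions depend on the switching order, but their sum telescopes to the total explained difference in mean predicted probabilities regardless of the order — which is why randomizing the order changes the detailed results but not their total.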

Alternative decomposition approaches for binary response variables are provided, e.g., by Gomulka and Stern (1990) and Yun (2003).

Algorithm for weighted data: Fairlie's algorithm for the detailed decomposition is based on matching observations from the two groups, where the groups are balanced by drawing a random sample (without replacement) from the larger group. The goal of the matching is to generate a hypothetical sample in which the distributions of some of the variables stem from the first group and some from the second group. With weighted data the original algorithm cannot be used, since different weights would have to be applied to the different variables in the hypothetical sample.

However, an appropriate hypothetical sample can be constructed by matching samples from both groups in which the sampling probabilities are proportional to the weights. In the present implementation, the sizes of the two subsamples are set to half the total sample size over both groups, and observations are drawn with replacement. The choice of the subsample size is arbitrary, but this does not matter much, since the precision of the results depends only on the "grand total" of sampled observations, which is a function of the subsample size and the number of decomposition replications as set by the reps() option. That is, a smaller (larger) subsample size can be counterbalanced by an increase (a decrease) in the number of replications. The results from the original algorithm and from the algorithm for weighted data are numerically different, but they have the same expectation if the weights are uninformative (i.e., if the weights are equal for all observations or are independent of the observations).
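The weighted construction of the hypothetical sample can be sketched as follows; this is a minimal Python illustration of the sampling step only, not fairlie's actual code, and all names are hypothetical:

```python
import numpy as np

def weighted_match(X0, w0, X1, w1, rng):
    """Draw a matched pair of subsamples under weights (sketch).

    Samples from *both* groups with replacement, with sampling
    probabilities proportional to the weights w0 and w1; each subsample
    gets half the combined sample size (an arbitrary choice that can be
    counterbalanced via the number of replications)."""
    n = (len(X0) + len(X1)) // 2
    i0 = rng.choice(len(X0), size=n, replace=True, p=w0 / w0.sum())
    i1 = rng.choice(len(X1), size=n, replace=True, p=w1 / w1.sum())
    return X0[i0], X1[i1]
```

The returned pair of equal-sized matrices can then be fed into the same sequential-switching step as in the unweighted case.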


by(groupvar) defines the groups between which the decomposition is to be performed. groupvar must be 0/1.

reps(#) specifies the number of decomposition replications to be performed. The default is 100.

nodots suppresses the display of replication dots.

ro causes the ordering of variables to be randomized in the detailed decomposition. The default is to estimate the separate contributions of the individual independent variables (or groups of independent variables) one after another in the specified order. Results may be sensitive to this ordering. Specifying the ro option randomizes the order of the variables in each replication and thus approximates average results over all possible orderings. This is recommended if the results are sensitive to the ordering of variables. It may also be useful to increase the number of replications when using this option.

reference(#) specifies the reference estimates to be used with the decomposition. reference(0), the default, indicates that the coefficients from the groupvar==0 model are used. reference(1) specifies that the coefficients from the groupvar==1 model are used.

pooled[(varlist)] specifies that the coefficients from the pooled model over all cases be used for the decomposition. (Note that the cases used in the pooled model are not necessarily restricted to the non-missing cases of groupvar. This is reasonable because it is sometimes desirable to include cases from other groups as well. Use if and in to restrict the sample of the pooled model.) Optionally, varlist will be added as additional control variables to the pooled model. Often, pooled(groupvar) is a good choice. In any case, it is important that the reference group in the pooled model is the group for which groupvar==0.

probit specifies that the probit command is used for model estimation. The default is to use logit.

noest suppresses the display of the model estimates.

saveest(name) stores the model estimation results under name using estimates store.

level(#); see estimation options.

nolegend suppresses the legend for the variable sets.

estopts are options passed through to the internal call of logit or probit.


. use

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black)

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black) pooled(black)

. generate black2 = black==1 if white==1|black==1

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar), by(black2) pooled(black latino asian natamer)

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1 [pw=wgt], by(black)

Saved Results

Scalars
  e(N)        number of observations
  e(N_0)      number of observations for which groupvar==0
  e(N_1)      number of observations for which groupvar==1
  e(N_match)  sample size used for one-to-one matching
  e(reps)     number of decomposition replications
  e(pr_0)     outcome probability for groupvar==0
  e(pr_1)     outcome probability for groupvar==1
  e(diff)     differential e(pr_0)-e(pr_1)
  e(expl)     total contribution of group differences in regressors

Macros
  e(cmd)        fairlie
  e(depvar)     name of dependent variable
  e(by)         name of group variable
  e(_cmd)       command used for model estimation (logit or probit)
  e(wtype)      weight type
  e(wexp)       weight expression
  e(reference)  reference estimates (0, 1, or pooled)
  e(legend)     definitions of regressor sets
  e(ro)         ro, if the random-order option was specified
  e(properties) b V

Matrices
  e(b)   detailed decomposition results
  e(V)   variances for e(b) (covariances are set to zero)
  e(_b)  reference coefficients
  e(_V)  variance-covariance matrix of e(_b)

Functions
  e(sample)  marks estimation sample


Fairlie, Robert W. (1999). The Absence of the African-American Owned Business: An Analysis of the Dynamics of Self-Employment. Journal of Labor Economics 17(1): 80-108.

Fairlie, Robert W. (2003). An Extension of the Blinder-Oaxaca Decomposition Technique to Logit and Probit Models. Economic Growth Center, Yale University, Discussion Paper No. 873.

Fairlie, Robert W. (2005). An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement 30: 305-316.

Gomulka, Joanna, and Nicholas Stern (1990). The Employment of Married Women in the United Kingdom 1970-83. Economica 57: 171-199.

Yun, Myeong-Su (2003). Decomposing Differences in the First Moment. IZA Discussion Paper No. 877.


Ben Jann, ETH Zurich,

Thanks for citing this software as follows:

Jann, B. (2006). fairlie: Stata module to generate nonlinear decomposition of binary outcome differentials. Available from


I thank Sonia Bhalotra, Rob Fairlie, Julia Horstschräer, George Leckie, and Steven Samuels for their comments and suggestions.

Also see