help fairlie
-------------------------------------------------------------------------------

Title

fairlie -- Nonlinear decomposition of binary outcome differentials

Syntax

fairlie depvar indepvars [if] [in] [weight], by(groupvar) [options]

where the syntax for indepvars is

term [term ...]

with term as a variable name or, alternatively, a set of variables specified as

([name:] varlist)

name is any valid Stata name and labels the set. If name is omitted, the name of the first variable is used to label the set.

options            Description
-------------------------------------------------------------------------
by(groupvar)       specify the groups (required); groupvar must be 0/1
reps(#)            number of decomposition replications; default is 100
nodots             suppress the replication dots
ro                 randomize ordering of variables in the detailed
                   decomposition
reference(#)       specify the reference model; # must be 0 (use group 0
                   model) or 1 (use group 1 model); default is 0
pooled[(varlist)]  use a pooled model as reference; varlist is added to
                   the model if specified
probit             use a probit model; default is to use a logit model
noest              suppress model estimation output
saveest(name)      store model estimation results under name
level(#)           set confidence level; default is level(95)
nolegend           suppress legend
estopts            options passed through to the internal call of logit
                   or probit
-------------------------------------------------------------------------
fweights, pweights, and iweights are allowed; see weight.

Description

fairlie computes the nonlinear decomposition of binary outcome differentials proposed by Fairlie (1999, 2003, 2005). That is, fairlie computes the difference in Pr(depvar!=0) between the two groups defined by groupvar and quantifies the contribution of group differences in the indepvars to the outcome differential. Furthermore, fairlie estimates the separate contributions of the individual independent variables (or groups of independent variables). fairlie also reports standard errors for these separate contributions. Note that the covariances are set to zero. Therefore, do not use post-estimation commands such as test or lincom after fairlie.

The implementation of the decomposition technique closely follows the suggestions provided by Fairlie (2003). The paper can be downloaded from http://ssrn.com/abstract=497302. If weights are specified, a modified algorithm is used for the computation of the detailed decomposition (see below).

The decomposition technique involves one-to-one matching of cases between the two groups. If the groups have different sizes, a sample is drawn from the larger group. Since the results depend on the specific sample, the process is repeated and mean results are reported. Use reps() to specify the number of desired replications. Set the random-number seed for replicable results; see help generate.

The separate contributions from independent variables or groups of independent variables may be sensitive to the ordering of variables. If results are sensitive to ordering, use the ro option described below to randomize the ordering of variables, thus approximating results over all possible orderings.

Alternative decomposition approaches for binary response variables are provided, e.g., by Gomulka and Stern (1990) and Yun (2003).
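As a minimal sketch of the advice on replications and the random-number seed, the commands below fix the seed before calling fairlie and raise the number of replications. The seed value 12345 and the choice of reps(500) are arbitrary illustrations, not recommendations; the dataset and variables are those used in the Examples section. The /// line continuation assumes the commands are run from a do-file.

```stata
* Load the example data used in the Examples section
use http://fmwww.bc.edu/RePEc/bocode/h/homecomp.dta, clear

* Fix the random-number seed so that the sampled matches are reproducible
* (12345 is an arbitrary illustrative value)
set seed 12345

* Request 500 replications instead of the default 100
fairlie homecomp female age (educ:hsgrad somecol college) ///
    (marstat:married prevmar) if white==1|black==1, by(black) reps(500)
```

Rerunning the same commands with the same seed should reproduce the reported means; changing the seed shows how much the results vary with the particular samples drawn.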

Algorithm for weighted data: The algorithm by Fairlie for the detailed decomposition is based on matching observations from the two groups, where the groups are balanced by drawing a random sample (without replacement) from the larger group. The goal of the matching is to generate a hypothetical sample in which the distributions of some of the variables stem from the first group and some from the second group. In the case of weighted data, the original algorithm cannot be used, since different weights would have to be applied to the different variables in the hypothetical sample. However, an appropriate hypothetical sample can be constructed by matching samples from both groups where the sampling probabilities are proportional to the weights.

In the present implementation, the sizes of the two sub-samples are set to half the total sample size over both groups, and observations are drawn with replacement. The choice of the sub-sample size is arbitrary, but this matters little because the precision of the results depends only on the "grand total" of sampled observations, which is a function of the sub-sample size and the number of decomposition replications as set by the reps() option. That is, a smaller (larger) sub-sample size can be counterbalanced by an increase (a decrease) in the number of replications. The results from the original algorithm and from the algorithm for weighted data are numerically different, but they have the same expectation if the weights are uninformative (i.e., if the weights are equal for all observations or if the weights are independent of the observations).

Options

by(groupvar) defines the groups between which the decomposition is to be performed. groupvar must be 0/1.

reps(#) specifies the number of decomposition replications to be performed. The default is 100.

nodots suppresses the display of replication dots.

ro causes the ordering of variables to be randomized in the detailed decomposition. The default is to estimate the separate contributions of the individual independent variables (or groups of independent variables) one after another in the specified order. Note that results are sensitive to this ordering. Specifying the ro option will randomize the order of the variables in each replication and, therefore, approximate average results over all possible orderings. This is recommended if the results are sensitive to the ordering of variables. It may also be useful to increase the number of replications when using this option.

reference(#) specifies the reference estimates to be used with the decomposition. reference(0), the default, indicates that the coefficients from the groupvar==0 model are used. reference(1) specifies that the coefficients from the groupvar==1 model are used.

pooled[(varlist)] specifies that the coefficients from the pooled model over all cases be used for the decomposition. (Note that the cases used in the pooled model are not necessarily restricted to the non-missing cases of groupvar. This is reasonable because it is sometimes desirable to include cases from other groups as well. Use if and in to restrict the sample of the pooled model.) Optionally, varlist will be added as additional control variables to the pooled model. Often, pooled(groupvar) is a good choice. In any case, it is important that the reference group in the pooled model is the group for which groupvar==0.

probit specifies that the probit command is used for model estimation. The default is to use logit.

noest suppresses the display of the model estimates.

saveest(name) stores the model estimation results under name using estimates store.

level(#); see estimation options.

nolegend suppresses the legend for the variable sets.

estopts are options passed through to the internal call of logit or probit.
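The options described above can be combined. The following is a sketch only, assuming the example dataset from the Examples section is already in memory: it randomizes the variable ordering with ro, raises the number of replications as suggested for that option, suppresses the replication dots, and stores the model estimates. The name refmodel, the seed, and reps(1000) are arbitrary illustrative choices. The /// line continuation assumes a do-file.

```stata
* Fix the seed so the random orderings and samples are reproducible
* (arbitrary illustrative value)
set seed 12345

* Randomized ordering (ro) with more replications than the default,
* no replication dots, and model estimates stored as "refmodel"
* (refmodel is a hypothetical name chosen for this sketch)
fairlie homecomp female age (educ:hsgrad somecol college) ///
    (marstat:married prevmar) if white==1|black==1, ///
    by(black) ro reps(1000) nodots saveest(refmodel)

* List the estimation results currently held in memory
estimates dir
```

Since saveest() relies on estimates store, the stored results should appear in the listing produced by estimates dir.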

Examples

. use http://fmwww.bc.edu/RePEc/bocode/h/homecomp.dta

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black)

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1, by(black) pooled(black)

. generate black2 = black==1 if white==1|black==1

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar), by(black2) pooled(black latino asian natamer)

. fairlie homecomp female age (educ:hsgrad somecol college) (marstat:married prevmar) if white==1|black==1 [pw=wgt], by(black)

Saved Results

Scalars

e(N)           number of observations
e(N_0)         number of obs for which groupvar==0
e(N_1)         number of obs for which groupvar==1
e(N_match)     sample size used for one-to-one matching
e(reps)        number of decomposition replications
e(pr_0)        outcome probability for groupvar==0
e(pr_1)        outcome probability for groupvar==1
e(diff)        differential e(pr_0)-e(pr_1)
e(expl)        total contribution of group differences in regressors

Macros

e(cmd)         fairlie
e(depvar)      name of dependent variable
e(by)          name of group variable
e(_cmd)        command used for model estimation (logit or probit)
e(wtype)       weight type
e(wexp)        weight expression
e(reference)   reference estimates (0, 1, or pooled)
e(legend)      definitions of regressor sets
e(ro)          ro, if the random order option was specified
e(properties)  b V

Matrices

e(b)           detailed decomposition results
e(V)           variances for e(b) (covariances are set to zero)
e(_b)          reference coefficients
e(_V)          variance-covariance matrix of e(_b)

Functions

e(sample)      marks estimation sample
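The saved results listed above can be inspected with standard Stata commands once fairlie has run. The lines below are a minimal sketch, assuming a fairlie call has just completed. The ratio e(expl)/e(diff) is offered here only as one possible summary of the share of the differential attributed to group differences in regressors; it is not a quantity that fairlie itself is documented to report.

```stata
* Show all results left behind by the most recent fairlie call
ereturn list

* Raw differential and total contribution of the regressors
display "differential:    " e(diff)
display "explained:       " e(expl)

* Illustrative summary: explained part as a share of the differential
display "share explained: " e(expl)/e(diff)

* Detailed contributions and their variances (covariances are zero)
matrix list e(b)
matrix list e(V)
```

Because the covariances in e(V) are set to zero, these matrices are suited to inspecting the individual contributions, not to joint tests across them.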

References

Fairlie, Robert W. (1999). The Absence of the African-American Owned Business: An Analysis of the Dynamics of Self-Employment. Journal of Labor Economics 17(1): 80-108.

Fairlie, Robert W. (2003). An Extension of the Blinder-Oaxaca Decomposition Technique to Logit and Probit Models. Economic Growth Center, Yale University Discussion Paper No. 873.

Fairlie, Robert W. (2005). An Extension of the Blinder-Oaxaca Decomposition Technique to Logit and Probit Models. Journal of Economic and Social Measurement 30: 305-316.

Gomulka, Joanna, and Nicholas Stern (1990). The Employment of Married Women in the United Kingdom 1970-83. Economica 57: 171-199.

Yun, Myeong-Su (2003). Decomposing Differences in the First Moment. IZA Discussion Paper No. 877.

Author

Ben Jann, ETH Zurich, jannb@ethz.ch

Thanks for citing this software as follows:

Jann, B. (2006). fairlie: Stata module to generate nonlinear decomposition of binary outcome differentials. Available from http://ideas.repec.org/c/boc/bocode/s456727.html.

Acknowledgments

I thank Sonia Bhalotra, Rob Fairlie, Julia Horstschräer, George Leckie, and Steven Samuels for their comments and suggestions.

Also see