help ebalance
-------------------------------------------------------------------------------

Title

ebalance -- Entropy reweighting to create balanced samples

Syntax

ebalance [treat] covar [if] [in] [, options]

    options                   Description
    -------------------------------------------------------------------------
    Main
      targets(numlist)        set balance constraints for covariates; default is targets(1)
      manualtargets(numlist)  alternative for manual specification of balance constraints
      basewt(varname)         variable with base weights; default is base weight of 1 for all units
      normconst(real)         set normalization constant; default is normconst(1.0)

    Advanced
      wttreat                 accept base weights for treated units; basewt() is required
      generate(newvar)        specify varname for variable that stores the entropy balancing weights
      keep(filename)          specify filename of a dataset that stores the balance table
      replace                 overwrite existing dataset
      maxiter(#)              set maximum number of iterations; default is maxiter(20)
      tolerance(real)         set tolerance level for convergence; default is tolerance(.015)
    -------------------------------------------------------------------------
    covar is a varlist that may include factor variables, see fvvarlist.

Description

ebalance implements entropy balancing, a data preprocessing procedure that reweights a dataset such that the covariate distributions in the reweighted data satisfy a set of specified moment conditions (see Hainmueller 2012 for details). This can be useful to create balanced samples in observational studies with a binary treatment, where the control group data can be reweighted to match the covariate moments in the treatment group. Entropy balancing can also be used to reweight a survey sample to known characteristics from a target population. ebalance can adjust for differences in the first, second, and third moments of the covariate distributions (i.e., covariate means, variances, and skewness). Moments of the joint distribution can also be adjusted by including interaction terms for the covariates. The weights that result from entropy balancing can be passed to any standard model to subsequently analyze the reweighted data.
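For orientation, the optimization problem behind the reweighting can be written as follows. This is a sketch in the notation of Hainmueller (2012), not a description of the internals of ebalance: the symbols w_i (weights), q_i (base weights), c_{ri}(X_i) (moment functions of the covariates), m_r (target moments), and \lambda_r (multipliers) do not appear elsewhere in this help file, and the paper normalizes the weights to sum to one, whereas the scale of the weights returned by ebalance is governed by normconst().

```latex
\min_{w_i} \; H(w) = \sum_{\{i \mid D=0\}} w_i \log\!\left(\frac{w_i}{q_i}\right)
\quad \text{subject to} \quad
\sum_{\{i \mid D=0\}} w_i \, c_{ri}(X_i) = m_r \;\; (r = 1, \dots, R),
\qquad
\sum_{\{i \mid D=0\}} w_i = 1,
\qquad
w_i \ge 0 .

% The solution has an exponential-tilting form in the R multipliers:
w_i^{*} =
\frac{q_i \exp\!\left(-\sum_{r=1}^{R} \lambda_r \, c_{ri}(X_i)\right)}
     {\sum_{\{i \mid D=0\}} q_i \exp\!\left(-\sum_{r=1}^{R} \lambda_r \, c_{ri}(X_i)\right)}
```

Because the weights depend on only R multipliers, the search is over a low-dimensional vector rather than over one weight per control unit.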

Required

treat varname that specifies the binary treatment variable. Values should be 1 for treated and 0 for control units. By default ebalance will reweight the data from the control units to match the moments computed from the data of the treated units. If the user has only a single data group (e.g. a survey sample) that should be reweighted to match some known moments (e.g. from a target population), then the manualtargets() option should be used and the treat variable should be omitted (see manualtargets() for details).

covar varlist that specifies the covariates to be balanced on. At least one variable should be specified.

Options

        +------+
    ----+ Main +-------------------------------------------------------------

targets(numlist) specifies the highest order of moment constraints (1, 2, or 3) for each variable specified in covar. For example, tar(3 1 2) means that the adjustment includes the 1st, 2nd, and 3rd moment for the first covariate, the 1st moment for the second covariate, and the 1st and 2nd moment for the third covariate. By adjustment we mean that the control group data will be reweighted such that the specified moments match the values of the same moments in the treatment group data. The length of numlist should be identical to the number of covariates specified in covar, except when only a single number is specified, which means that all the covariates will be adjusted to the same highest order as specified by that number, e.g. tar(1) is equivalent to tar(1 1 1) if three covariates are used in covar. Note that for a binary covariate only its first moment will be considered, regardless of what number is specified for it in targets(), since matching the 1st moment is sufficient to balance higher moments.
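One way to see what a given targets() specification does is to compare weighted moments across groups after the call. The lines below are a minimal sketch, assuming the cps1re74 example data used in the Examples section and the default weight variable _webal; the tabstat call is only an illustrative check and is not part of ebalance.

```stata
* Sketch: inspect the moments requested by targets(3 2 1)
sysuse cps1re74, clear
ebalance treat age educ black, targets(3 2 1)

* Weighted mean, variance, and skewness by treatment group.
* Requested moments should agree closely across groups:
*   age   -> mean, variance, skewness
*   educ  -> mean, variance
*   black -> mean (binary, so the mean is sufficient)
tabstat age educ black [aweight=_webal], by(treat) ///
    statistics(mean variance skewness) nototal
```

Moments that were not requested (for example the skewness of educ) are not constrained and may still differ between the groups.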

manualtargets(numlist) if the user has only a single data group (e.g. a survey sample), the manualtargets() option can be used to reweight the data such that it matches some user-specified target moments for the covariates. For example, manualtargets(25 10 0.8) implies that the balancing weights will be chosen such that the means of the 1st, 2nd, and 3rd covariate in covar will match 25, 10, and 0.8, respectively. The length of numlist should be identical to the number of covariates specified in covar. Since there is only a single group, no treat variable should be used. The manualtargets() option is not compatible with targets() and wttreat.
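A short sketch of the single-group case, reusing the command from the Examples section and adding a check of the reweighted means. It assumes the cps1re74 example data and the default weight variable _webal; the tabstat line is an illustrative check rather than something ebalance requires.

```stata
* Sketch: reweight a single group to user-specified means
sysuse cps1re74, clear

* Treat the control units as the single "survey" sample; no treat variable is given
ebalance age educ black hispan if treat==0, manualtargets(28 10 0.1 0.1)

* The weighted means of the reweighted units should be close to 28, 10, 0.1, 0.1
tabstat age educ black hispan if treat==0 [aweight=_webal], statistics(mean)
```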

basewt(varname) a varname that specifies a variable with survey base weights. If not specified, the default is to set all base weights to 1. If specified, the base weights for the control units are taken from basewt(varname), but the base weights for the treated units are still set to 1 unless wttreat is also specified. In the latter case, the base weights for all units are taken from basewt(varname).

normconst(real) a real number that specifies the normalizing constant (the default is 1). The resulting ebalance weights for the control units are multiplied by this number, e.g. normconst(2) means that the total of the ebalance weights for the control units is two times the total of the weights for the treated units.
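The effect of normconst() can be checked by summing the weights within each group. This is a minimal sketch that reuses the base-weights command from the Examples section; basew is the constant base-weight variable created there, and the tabstat line is an illustrative check only.

```stata
* Sketch: compare weight totals after normconst(2)
sysuse cps1re74, clear
generate basew = 1
ebalance treat age educ black, targets(3) basewt(basew) normconst(2)

* Sum of the weights by group: the control total should be
* about twice the treated total
tabstat _webal, by(treat) statistics(sum) nototal
```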

        +----------+
    ----+ Advanced +---------------------------------------------------------

wttreat specifies that survey weights for treated units should be taken into consideration. The weights are stored in the variable specified by basewt(). Not compatible with manualtargets(). See basewt().

generate(newvar) creates a new variable newvar that stores the estimated balancing weights. If not specified, the weights are stored in a variable named _webal by default. Note that _webal will be replaced when ebalance is called again.

keep(filename) saves a dataset with the balance table in the file filename.dta, which will hold the following variables (balance table for a single group is slightly different):

Xname: covariate that was balanced on

mean_Tr: mean of the treated units

mean_Co_Pre: mean of the raw control units

mean_Co_Post: mean of the reweighted control units

var_Tr: variance of the treated units

var_Co_Pre: variance of the raw control units

var_Co_Post: variance of the reweighted control units

skew_Tr: skewness of the treated units

skew_Co_Pre: skewness of the raw control units

skew_Co_Post: skewness of the reweighted control units

sdiff_Pre: standardized difference between treated and raw control groups

sdiff_Post: standardized difference between treated and reweighted control groups

replace permits keep() to overwrite an existing dataset.
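The saved balance table is an ordinary Stata dataset, so it can be opened and listed like any other. The sketch below assumes the two-group case (so the variable names are those listed above), the cps1re74 example data, and a writable working directory; baltable is just an example filename.

```stata
* Sketch: save the balance table and look at it
sysuse cps1re74, clear
ebalance treat age educ black, targets(2) keep(baltable) replace

* Open the table without losing the data in memory
preserve
use baltable, clear
list Xname mean_Tr mean_Co_Pre mean_Co_Post sdiff_Pre sdiff_Post, noobs
restore
```

Comparing sdiff_Pre with sdiff_Post gives a quick per-covariate summary of how much the reweighting changed the imbalance.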

maxiter(#) specifies the maximum number of iterations for the algorithm. Usually the default setting of 20 iterations should be sufficient, but the maximum number of iterations can be increased if no convergence is achieved. Notice that increasing the number of iterations will not help to achieve convergence if the algorithm fails because too many potentially collinear moment constraints are specified. In such cases, one should lower the order of the moment conditions by resetting targets() or dropping variables in covar. Another option is to relax the tolerance level. Note that even if convergence is not achieved within the maximum number of iterations, ebalance will still return the weights obtained in the last iteration.

tolerance(real) specifies the tolerance level for the convergence of the algorithm. The tolerance level refers to the maximum deviation across the specified moment constraints. Convergence is achieved if all specified moments match within the specified tolerance level.
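Since ebalance still returns weights when the algorithm does not converge, it can be worth checking the convergence flag before using them. The following is a sketch under the assumption that the saved results e(convg) and e(maxdiff) behave as documented in the Saved results section; the retry value maxiter(50) is arbitrary and there is no guarantee that it is sufficient.

```stata
* Sketch: check convergence and retry with more iterations
sysuse cps1re74, clear
ebalance treat age educ black, targets(3) maxiter(15)
display "converged: " e(convg) "   max deviation: " e(maxdiff)

if e(convg) == 0 {
    * Try more iterations first; relax tolerance() or lower targets()
    * only if this does not help
    ebalance treat age educ black, targets(3) maxiter(50)
    display "converged: " e(convg) "   max deviation: " e(maxdiff)
}
```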

Examples

Load example data (Lalonde Dataset)

    . sysuse cps1re74

Basic syntax

    . ebalance treat age educ black, tar(1)

tar(1), short for tar(1 1 1), means that the control units are reweighted to satisfy the balance constraints that the 1st moments (means) of age, educ and black match the corresponding moments of the treated units.

. ebalance treat age educ black, tar(3 2 1)

The control units are reweighted to satisfy the balance constraints that the 1st, 2nd, and 3rd moment (means, variances, and skewness) of age, the 1st and 2nd moment of educ, and the 1st moment of black match the corresponding moments of the treated units. Since black is binary, adjusting its 1st moment is sufficient to adjust the higher moments.

New variable and higher order constraints

    . ebalance treat age educ black, g(ebw1) tar(1)
    . ebalance treat age educ black, g(ebw2) tar(3 1 1)

The two commands store the estimated balancing weights in the newly generated variables ebw1 and ebw2 respectively. figure1 and figure2 display the kernel densities of age for the treatment and control group data in the two cases and show how balancing constraints may affect the reweighted covariate distributions.
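A comparison of this kind can be drawn with twoway kdensity. The lines below are only a sketch of one possible plot and are not necessarily how the figures referenced above were produced; the legend labels and axis titles are illustrative choices, and the /// line continuations require running the code from a do-file.

```stata
* Sketch: density of age for treated units and for controls under both sets of weights
sysuse cps1re74, clear
ebalance treat age educ black, generate(ebw1) targets(1)
ebalance treat age educ black, generate(ebw2) targets(3 1 1)

twoway (kdensity age if treat==1)                      ///
       (kdensity age if treat==0 [aweight=ebw1])       ///
       (kdensity age if treat==0 [aweight=ebw2]),      ///
       legend(order(1 "Treated" 2 "Controls, tar(1)" 3 "Controls, tar(3 1 1)")) ///
       xtitle("age") ytitle("Density")
```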

Interactions

    . gen ageXblack = age*black
    . ebalance treat educ age black ageXblack, tar(1)
    . bysort black: tabstat age [aweight=_webal], by(treat) s(N me v) nototal

By including interaction terms, covariates will be balanced across subsample groups. In the above case, for example, age is balanced within both the black and non-black subgroups. The same result can be achieved by using the functionality for factor variables (see fvvarlist for details) as follows:

    . ebalance treat educ black##c.age, tar(1)
    . bysort black: tabstat age [aweight=_webal], by(treat) s(N me v) nototal

Save balance table

    . ebalance treat age educ black, tar(2) k(baltable) rep

The balance table for treated and control units of both raw data and reweighted data is saved as baltable.dta for further use.

Estimation after reweighting

    . reg re78 treat age educ black re74 re75 u74 u75
    . ebalance treat age educ black re74 re75 u74 u75, tar(2)
    . svyset [pweight= _webal]
    . svy: reg re78 treat

We first run a simple regression controlling for all the covariates in the Lalonde dataset. The estimate of the treatment effect is rather far from the experimental target answer of $1,794. Then we use ebalance to adjust the 1st and 2nd moments of the covariates for the control group. The subsequent regression based on the reweighted data generates an estimate with much less bias (see Hainmueller 2012 for details).

Example for Single Group

    . ebalance age educ black hispan if treat==0, manual(28 10 0.1 0.1)

If the user has only a single data group, for example a survey sample that should be reweighted to match some known target moments, then the manualtargets() option should be used to specify the moment constraints. Here we use this option so that the control units are reweighted such that the means of age, educ, black and hispan are equal to 28, 10, 0.1 and 0.1, respectively. Note that no treatment variable is specified in this case since there is only one group.

Base Weights

    . gen basew=1
    . ebalance treat age educ black, tar(3) basewt(basew) norm(2)

The basewt(basew) option is used to pass user-supplied base weights. Moreover, the norm(2) option is used to set the total of the weights for the control units to two times the total of the weights for the treated units.

    . replace basew=5 if treat==1 & age>30
    . ebalance treat age educ black, tar(3) basewt(basew) norm(2) wttr

When wttreat is also specified, the base weights for treated units are also taken from basewt(basew). Because of this, the result is slightly different from above.

Optimization settings

    . ebalance treat age educ black, tar(3) maxi(15)
    . ebalance treat age educ black, tar(3) maxi(15) tol(1)

In the first example, the optimization does not converge to the default tolerance within the 15 iterations allowed, so ebalance returns the weights from the last iteration, which already come quite close. In the second example, tol() is increased to relax the convergence criterion.

Saved results

By default, ebalance stores the following results in e(), which can be displayed by typing ereturn list after ebalance has finished (see also ereturn).

Scalars
    e(convg)      whether convergence is achieved (1 = achieved; 0 = not)
    e(maxdiff)    maximum deviation across the specified moment constraints

Macros
    e(cmd)        ebalance
    e(title)      Entropy Balance
    e(cmdline)    command as typed

Matrices
    e(lambdas)    coefficient vector
    e(moments)    sample moments of the treated
    e(preBal)     balance table before reweighting
    e(postBal)    balance table after reweighting

Functions
    e(sample)     marks estimation sample

-----------------------------------------------------------------------------
e(lambdas) and e(moments) are in the same order as the adjusted covariates are shown in the Data Setup section, i.e. the first order moment constraints come first; then the second; then the third.
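The saved results can be inspected with standard Stata commands after ebalance has run. This is a minimal sketch assuming the cps1re74 example data; the matrix name P is an arbitrary local choice, and the layout of the matrices is whatever ebalance stores.

```stata
* Sketch: inspect the results stored in e()
sysuse cps1re74, clear
ebalance treat age educ black, targets(2)

ereturn list                 // overview of everything stored in e()
matrix list e(preBal)        // balance table before reweighting
matrix list e(postBal)       // balance table after reweighting
matrix list e(lambdas)       // coefficient vector

* Copy a result to a regular matrix for further use
matrix P = e(postBal)
display "rows in postBal: " rowsof(P)
```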

References

Hainmueller, J. 2012. "Entropy Balancing: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies." Political Analysis 20(1): 25-46.

Zaslavsky, A. 1988. "Representing local reweighting area adjustments of households." Survey Methodology 14(2): 265-288.

Ireland, C. and Kullback, S. 1968. "Contingency tables with given marginals." Biometrika 55: 179-188.

Kullback, S. 1959. Information Theory and Statistics. New York: Wiley.

Authors

Jens Hainmueller, jhainm@mit.edu MIT

Yiqing Xu, xyq@mit.edu MIT