help imb -------------------------------------------------------------------------------


imb -- Measure of (Im)balance for CEM


imb varlist [if] [in] [, options]

options               Description
-------------------------------------------------------------------------
treatment(varname)    name of the treatment variable
breaks(string)        method used to generate cutpoints
miname(string)        filename root of the imputed datasets, if in separate files
misets(integer)       number of imputed datasets, if in separate files
impvar(string)        name of imputed dataset variable, if in stack/flong format
useweights            use the cem_weights when computing imbalance


imb returns several measures of covariate imbalance between the treatment and control groups. It reports a multivariate L1 distance, univariate L1 distances, differences in means, and differences in empirical quantiles. The L1 measures are computed by coarsening the data according to breaks and comparing the treated and control groups across the resulting multivariate histogram. See Iacus, King and Porro (2008) for more details on this measure.
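The multivariate L1 computation described above can be sketched in a few lines. This is an illustration in Python rather than the Stata source of imb: the function names, the toy data, and the cutpoints are all hypothetical, and the formula assumed is one half of the summed absolute differences between the treated and control relative frequencies over the cells of the coarsened histogram.

```python
from bisect import bisect_right
from collections import Counter


def coarsen(value, cutpoints):
    """Return the index of the bin that value falls into (cutpoints sorted ascending)."""
    return bisect_right(cutpoints, value)


def l1_imbalance(rows, treated, cutpoints):
    """Multivariate L1 distance between treated and control histograms.

    rows      : list of tuples, one covariate tuple per observation
    treated   : list of 0/1 flags, same length as rows
    cutpoints : one sorted list of cutpoints per covariate
    """
    cells_t, cells_c = Counter(), Counter()
    for row, t in zip(rows, treated):
        # Each observation lands in one cell of the multivariate histogram.
        cell = tuple(coarsen(v, cuts) for v, cuts in zip(row, cutpoints))
        (cells_t if t else cells_c)[cell] += 1
    n_t, n_c = sum(cells_t.values()), sum(cells_c.values())
    cells = set(cells_t) | set(cells_c)
    # Half the summed absolute difference in relative frequencies, so 0 <= L1 <= 1.
    return 0.5 * sum(abs(cells_t[c] / n_t - cells_c[c] / n_c) for c in cells)


# Hypothetical data: (age, education) with one cutpoint per covariate.
rows = [(25, 10), (30, 12), (45, 16), (50, 16)]
cutpoints = [[40], [14]]
print(l1_imbalance(rows, [1, 1, 0, 0], cutpoints))  # 1.0
print(l1_imbalance(rows, [1, 0, 1, 0], cutpoints))  # 0.0
```

A value of 0 means the two coarsened histograms coincide, and a value of 1 means they share no cells.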


        +------+
    ----+ Main +-------------------------------------------------------------

varlist is a list of variables to be included as covariates.

        +---------+
    ----+ Options +----------------------------------------------------------

treatment(varname) sets the treatment variable used for the imbalance checks.

breaks(string) sets the automatic coarsening algorithm used to generate cutpoints. If cem or imb has already been run and an r(L1_breaks) is available, that method is the default; otherwise the default is "scott". The choice of method matters less than using the same method for every comparison, since L1 values computed under different breaks are not comparable.
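To give a sense of what an automatic coarsening rule does, here is a minimal Python sketch of equal-width cutpoints with the bin width taken from Scott's rule, h = 3.49 * sd * n^(-1/3). Whether the "scott" option in imb uses exactly this constant and rounding is an assumption; the function name and data are hypothetical.

```python
import math
import statistics


def scott_cutpoints(values):
    """Equal-width cutpoints with bin width from Scott's rule (assumed form)."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Scott's rule: bin width shrinks with n^(1/3) and scales with the spread.
    h = 3.49 * statistics.stdev(values) * n ** (-1 / 3)
    if h == 0:
        return [lo, hi]  # constant variable: a single bin
    bins = max(1, math.ceil((hi - lo) / h))
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# Hypothetical covariate: the integers 1..100.
cuts = scott_cutpoints(list(range(1, 101)))
print(len(cuts) - 1)  # number of bins
```

Because the cutpoints depend on the sample, reusing the same rule before and after matching keeps the two L1 values on a comparable footing.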

miname(string) gives the root of the filenames of the imputed datasets when they are stored in separate files. The files should be in the working directory. For example, if miname is "imputed", the files should be named "imputed1.dta", "imputed2.dta", and so on.

misets(integer) gives the number of imputed datasets used for matching when they are stored in separate files.

impvar(string) names the variable identifying the imputation to which each observation belongs when the imputed data are stacked in one dataset (the Stata default).

useweights makes imb use the weights from the output of cem. This is useful for checking balance after running cem.
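The effect of weighting on the balance check can be illustrated with a small Python sketch. It assumes that the weights simply replace raw counts when the coarsened histogram is built, and that control weights follow the usual CEM formula (m_C/m_T) * (m_T^s/m_C^s) within stratum s; how imb applies the weights internally is not stated here, so treat both as assumptions. All names and data are hypothetical.

```python
from collections import defaultdict


def weighted_l1(cells, treated, weights):
    """L1 distance between weighted treated and control cell distributions.

    cells   : already-coarsened cell label for each observation
    treated : 0/1 flag for each observation
    weights : nonnegative weight for each observation
    """
    mass_t, mass_c = defaultdict(float), defaultdict(float)
    for cell, t, w in zip(cells, treated, weights):
        (mass_t if t else mass_c)[cell] += w
    tot_t, tot_c = sum(mass_t.values()), sum(mass_c.values())
    keys = set(mass_t) | set(mass_c)
    return 0.5 * sum(
        abs(mass_t.get(k, 0.0) / tot_t - mass_c.get(k, 0.0) / tot_c) for k in keys
    )


# Two strata: "a" holds 2 treated and 1 control, "b" holds 1 treated and 3 controls.
cells = ["a", "a", "b", "a", "b", "b", "b"]
treated = [1, 1, 1, 0, 0, 0, 0]

# Unweighted comparison of the raw histograms.
raw = weighted_l1(cells, treated, [1.0] * 7)

# Assumed CEM-style weights: treated units get 1, controls get
# (m_C / m_T) * (m_T^s / m_C^s) in their stratum s.
w = [1.0, 1.0, 1.0, (4 / 3) * (2 / 1), (4 / 3) * (1 / 3), (4 / 3) * (1 / 3), (4 / 3) * (1 / 3)]
balanced = weighted_l1(cells, treated, w)
print(raw, balanced)
```

In this toy case the weighted control distribution matches the treated distribution on the matching strata, so the weighted distance drops to (numerically) zero while the unweighted one does not.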

Saved Results

Scalars r(L1) multivariate imbalance measure

Matrices r(imbal) matrix of univariate imbalance measures

Strings r(L1_breaks) break method used for L1 distance

References and Distribution

cem is licensed under GPL2. For more information, see:

For a full reference on Coarsened Exact Matching, see:

Stefano M. Iacus, Gary King, and Giuseppe Porro, "Matching for Causal Inference Without Balance Checking", copy at <>

To report bugs or give comments, please contact Matthew Blackwell <>.