help for nnmatch

Nearest Neighbor Matching Estimation for Average Treatment Effects

nnmatch depvar treatvar varlist_nnmatch [if exp] [in range] [pw] [, tc(ate |att |atc) m(#) metric(maha |matname) exact(varlist_ex) biasadj(bias |varlist_adj) robust(#_v) population level(#) keep(filename) replace]

depvar, varlist_nnmatch, and elements of biasadj(varlist_adj) and exact(varlist_ex) must be numeric variables. treatvar must be a {0,1} variable.

Description

nnmatch estimates the average treatment effect on depvar by comparing outcomes between treated and control observations (as defined by treatvar), using nearest neighbor matching across the variables defined in varlist_nnmatch. nnmatch can estimate the treatment effect for the treated observations, for the controls, or for the sample as a whole. The program pairs each observation with its closest m matches in the opposite treatment group to provide an estimate of the counterfactual treatment outcome. The program allows for matching over a multi-dimensional set of variables (varlist_nnmatch), giving options for the weighting matrix to be used in determining the optimal matches. It also allows exact matching (or as close as possible) on a subset of variables. In addition, the program allows for bias correction of the treatment effect, and estimation of either the sample or population variance, with or without assuming a constant treatment effect (homoskedasticity). Finally, it allows observations to be used as a match more than once, thus making the order of matching irrelevant. See Imbens et al. (Stata Journal 2004) for greater detail.

Options

tc(ate|att|atc) specifies which treatment effect is to be estimated:

ate: the average treatment effect;
att: the average treatment effect for the treated; or
atc: the average treatment effect for the controls.

If no option is specified, the average treatment effect, ate, is assumed. In this case all observations are matched to their nearest m neighbors of the opposite treatment group. In estimating the att or atc, only the treated or controls, respectively, are matched.

m(#) specifies the number of matches to be made per observation. If two observations of the opposite treatment group are equally close to that being matched, both will be used. Thus the number of matches per observation will be greater than or equal to m. If the average treatment effect is selected, m must be less than or equal to the smaller of N0 and N1, where N0 is the number of control observations in the dataset, and N1 is the number of treatment observations. If tc(att) is selected, m need only be less than or equal to N0; if tc(atc) is selected, m must be less than or equal to N1. If m(#) is not specified, 1 is assumed.
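The tie rule described above can be illustrated with a short Python sketch. This is not nnmatch's code: the function name nearest_matches and the use of exact equality to detect ties are assumptions made for illustration only.

```python
def nearest_matches(dist_to_candidates, m):
    """Return the indices of the m closest candidates, keeping every
    candidate tied at the m-th smallest distance (hypothetical helper)."""
    if m < 1 or m > len(dist_to_candidates):
        raise ValueError("m must be between 1 and the number of candidates")
    ordered = sorted(dist_to_candidates)
    cutoff = ordered[m - 1]  # distance of the m-th closest candidate
    # Assumption: ties are detected by exact equality of distances.
    return [j for j, d in enumerate(dist_to_candidates) if d <= cutoff]


# Candidates 1 and 2 are tied at the cutoff, so m = 2 yields three matches.
distances = [0.5, 1.0, 1.0, 3.0]
print(nearest_matches(distances, 2))  # [0, 1, 2]
```

Because tied candidates are all retained, the number of matches returned can exceed m, as the option description states.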

metric(maha|matname) specifies the weighting matrix to be used when k, the number of elements of varlist_nnmatch, is greater than 1. The metric option specifies the relative weight to be placed on each variable in varlist_nnmatch in defining nearest neighbor matches. Two options are available:

(1) metric(maha) specifies the Mahalanobis metric, the inverse of the sample variance/covariance matrix of the k variables in varlist_nnmatch.
(2) metric(matname) allows for a user-defined weight matrix matname, where matname is an already-specified k-dimensional, symmetric and positive semi-definite matrix.

If no option is specified, the default is to use the k x k diagonal matrix of the inverse sample standard errors of the k variables in varlist_nnmatch.
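To make the role of the weight matrix concrete, the following Python sketch computes the distance between two covariate vectors under a given k x k weight matrix V. It assumes the distance is the square root of the quadratic form (x - z)' V (x - z), which is the usual form for such metrics; the function name weighted_distance and the example matrix are hypothetical and are not taken from nnmatch.

```python
import math


def weighted_distance(x, z, V):
    """Return sqrt((x - z)' V (x - z)) for lists x, z and a k x k matrix V
    given as a list of rows (hypothetical helper, assumed distance form)."""
    diff = [a - b for a, b in zip(x, z)]
    k = len(diff)
    quad = sum(diff[r] * V[r][c] * diff[c] for r in range(k) for c in range(k))
    return math.sqrt(quad)


# Illustrative diagonal weight matrix: the first variable is down-weighted.
V = [[0.25, 0.0],
     [0.0, 1.0]]
d = weighted_distance([2.0, 1.0], [0.0, 1.0], V)
print(d)  # 1.0
```

A larger diagonal entry makes differences in that variable count more heavily when neighbors are ranked; off-diagonal entries, as in the Mahalanobis case, account for correlation between variables.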

exact(varlist_ex) allows the user to specify exact matching (or as exact as possible) on one or more variables. The exact-matching variables need not overlap with the elements of varlist_nnmatch. In practice, however, the exact matching option adds these variables to the original k x k varlist_nnmatch matrix, but in the weight matrix multiplies each exact element by 1000 relative to the weights placed on the elements of varlist_nnmatch. (Regardless of the metric() option chosen for the varlist_nnmatch variables, the exact match variables are normalized via the default option, the inverse sample standard errors.) Because for each matched observation there may not exist members of the opposite treatment group with equal values of the exact-matching variables, matching may not be exact across the full dataset. The output lists the percentage of matches (across the paired observations, greater than or equal to N*m in number) that match exactly.
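The effect of the factor of 1000 can be seen in a small Python sketch. It assumes diagonal weights and assumes that the factor multiplies the weight on each exact-matching variable; the function augmented_distance and all numbers are hypothetical and only illustrate why a mismatch on an exact variable dominates the distance.

```python
import math

EXACT_MULTIPLIER = 1000  # factor stated in the option description


def augmented_distance(x_i, x_j, w_match, e_i, e_j, w_exact):
    """Distance over matching variables plus heavily weighted exact variables.
    Assumption: all weights are diagonal (one weight per variable)."""
    quad = sum(w * (a - b) ** 2 for w, a, b in zip(w_match, x_i, x_j))
    quad += sum(EXACT_MULTIPLIER * w * (a - b) ** 2
                for w, a, b in zip(w_exact, e_i, e_j))
    return math.sqrt(quad)


# Candidate A is closer on x but differs on the exact variable;
# candidate B is farther on x but agrees on the exact variable.
d_a = augmented_distance([0.0], [0.1], [1.0], [1], [0], [1.0])
d_b = augmented_distance([0.0], [2.0], [1.0], [1], [1], [1.0])
print(d_b < d_a)  # True
```

Candidate B is preferred despite being farther on x. If no candidate agreed on the exact variable, the closest inexact candidate would still be chosen, which is why matching may not be exact across the full dataset.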

biasadj(bias | varlist_adj) The simple matching estimator estimates the average treatment effect by calculating the average, over the N observations being matched, of the difference between the depvar outcome for observation i and the average outcome of its m matches in the opposite treatment group. However, the simple matching estimator will be biased if matching is not exact. This option regression-adjusts the results using either the original matching variable(s), varlist_nnmatch (if biasadj(bias) is selected), or a newly specified set of variables, varlist_adj (if biasadj(varlist_adj) is chosen).
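The simple (unadjusted) matching estimator described above can be sketched in Python for the ate case. The sketch assumes a single matching variable with absolute difference as the distance, and keeps ties at the m-th distance; the function name simple_matching_ate and the data are hypothetical, and the regression adjustment itself is not shown.

```python
def simple_matching_ate(y, t, x, m=1):
    """Average, over all observations, of the treated-minus-control difference
    between each observation and the mean outcome of its matches.
    Assumptions: one matching variable, absolute-difference distance."""
    n = len(y)
    effects = []
    for i in range(n):
        # Candidates are the observations in the opposite treatment group.
        others = [j for j in range(n) if t[j] != t[i]]
        dists = sorted(abs(x[j] - x[i]) for j in others)
        cutoff = dists[m - 1]  # distance of the m-th closest candidate
        matches = [j for j in others if abs(x[j] - x[i]) <= cutoff]
        y_match = sum(y[j] for j in matches) / len(matches)
        # The matched mean stands in for the unobserved counterfactual outcome.
        effects.append(y[i] - y_match if t[i] == 1 else y_match - y[i])
    return sum(effects) / n


y = [5, 7, 2, 3]
t = [1, 1, 0, 0]
x = [1.0, 2.0, 1.1, 2.2]
print(simple_matching_ate(y, t, x, m=1))  # 3.5
```

In this example the matches are not exact (for instance, x = 1.0 is paired with x = 1.1), which is the situation in which the bias adjustment is intended to help.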

robust(#_v) By default nnmatch assumes homoskedastic errors (a constant treatment effect). However, the user can allow for heteroskedastic errors by selecting the robust(#_v) option. The program does this by conducting a second matching process (again across the elements of varlist_nnmatch), this time matching observations in the same treatment group, to compare variability in outcomes (depvar) for observations with approximately the same varlist_nnmatch values. robust(#_v) allows the user to choose how many matches are used in this process. If robust is not selected, or #_v equals zero, homoskedastic errors are estimated.

population allows the user to specify the calculation of the population variance rather than the sample variance. If population is not selected, sample variance is assumed.

level(#) specifies the confidence level, in percent, for confidence intervals. The default is level(95) or as set by set level.

keep(filename) replace In the estimation process, nnmatch creates a temporary dataset that holds, for each observation i being matched and each of its matches j, a new observation containing i's outcome (depvar) and matching-variable (varlist_nnmatch) values, together with the outcome and varlist_nnmatch values of the match. Thus, the new dataset will hold at least (but potentially more than) N*m observations. If biasadj(varlist_adj) or exact(varlist_ex) is selected, the temporary dataset will also hold these values for each observation i and its match(es) j. keep(filename) allows the user to save this temporary dataset.

If keep(filename) is selected, each observation of filename.dta will hold the following variables:

t: The treatment group indicator, treatvar, for the observation being matched, i.
y: The observed outcome variable, depvar(i).
x: The varlist_nnmatch values for observation i.
id: The identification code for the observation being matched, i. (When the command nnmatch is given, the program creates a temporary variable, id = {1,2,...N}, based on the original sort order.)
index: The identification code for j, the match for observation i.
dist: The estimated distance between observation i and its match j, based on the varlist_nnmatch values of each and the selected weight matrix.
k_m: The number of times observation i is itself used as a match for any observation l of the opposite treatment group, each time weighted by the total number of matches for the given observation l. (For example, if observation i is one of three matches for observation l, it receives a value of 1/3 for that match. k_m(i) is the sum, across all observations l, of this value. Thus the sum of k_m across all observations i will equal N (or N0 or N1, if the atc or att, respectively, is estimated).) Note that this value refers to i's use as a match, not to its matches j, so the value of k_m is equal across all observations in the temporary dataset that pertain to the matching of observation i.
w_id: Weight for observation i, if weights are selected.
w_index: Weight of observation j, the match for observation i, if weights are selected.
`y'_0: The inferred depvar value if observation i were in the control group. (If observation i is in fact a control observation, `y'_0 = `y'(i). If i is a treated observation, `y'_0 = `y'(j).)
`y'_1: Inferred depvar value if i were in the treated group.
`x'_0m: Values of varlist_nnmatch for i's `control' observation. Namely, if i is a control observation, `x'_0m = x_i for each element x of varlist_nnmatch. If i is a treatment observation, `x'_0m will equal x_j.
`x'_1m: Values of varlist_nnmatch for i's `treatment' observation.
`b'_0b: Values of the bias-adjustment variables (if biasadj(varlist_adj) is selected) for i's `control' observation, where `b' represents each element of the bias-adjustment variables.
`b'_1b: Bias-adjustment variables for i's `treatment' observation.
`e'_0e: Values of the exact-matching variables (if exact(varlist_ex) is selected) for i's `control' observation, where `e' represents each element of the exact-matching variables.
`e'_1e: Exact-matching variables for i's `treatment' observation.
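The definition of k_m can be checked with a short Python sketch that accumulates it from a table of matches. The mapping from each matched observation l to its list of matches, and the function name match_counts, are hypothetical; exact fractions are used so that the totals can be compared without rounding.

```python
from fractions import Fraction


def match_counts(matches):
    """Compute k_m for every observation used as a match.
    matches maps each matched observation l to the list of its matches;
    each use contributes 1 / (number of matches of l)."""
    k_m = {}
    for l, js in matches.items():
        for j in js:
            k_m[j] = k_m.get(j, Fraction(0)) + Fraction(1, len(js))
    return k_m


# Observations 1-3 are matched; observation 2 has three tied matches.
matches = {1: [4], 2: [4, 5, 6], 3: [5]}
k_m = match_counts(matches)
print(k_m[4], k_m[5], k_m[6])  # 4/3 4/3 1/3
print(sum(k_m.values()))       # 3
```

Each matched observation distributes a total weight of 1 across its matches, so the values of k_m sum to the number of observations being matched (3 here).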

replace If keep(filename) is selected, and filename.dta already exists, replace must be selected for filename.dta to be saved.

Weights

nnmatch allows users to define weights, used as pweights. In particular, when determining the m closest matches, if weights are selected, the program will choose the closest observations, j, such that their summed weights are equal to or just exceed m. If robust standard errors are chosen, a similar approach will be taken in choosing the closest matches in the variance calculation step.
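The weighted selection rule can be sketched in Python as follows. The function name weighted_matches and the (distance, weight, id) tuple layout are hypothetical, and ties in distance are not treated specially here, which is a simplifying assumption.

```python
def weighted_matches(candidates, m):
    """Take candidates in order of increasing distance until their summed
    weights equal or just exceed m.
    candidates is a list of (distance, weight, id) tuples (assumed layout)."""
    chosen, total = [], 0.0
    for dist, weight, obs_id in sorted(candidates):
        if total >= m:
            break  # the weight target has already been reached
        chosen.append(obs_id)
        total += weight
    return chosen, total


candidates = [(0.9, 2.0, "c"), (0.2, 0.5, "a"), (1.5, 1.0, "d"), (0.4, 1.0, "b")]
print(weighted_matches(candidates, 2))  # (['a', 'b', 'c'], 3.5)
```

With m = 2, the two closest candidates carry a combined weight of only 1.5, so a third candidate is added and the summed weight then exceeds m.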

Examples: match

nnmatch y t x1 x2
nnmatch y t x1, m(3)
nnmatch y t x1 x2, tc(att)
nnmatch y t x1 x2, tc(atc) met(maha) bias(bias) robust(4)
nnmatch y t x1 x2, met(matname) bias(x1 x3) keep(artdata) replace
nnmatch y t x1 x2 [w=w], met(matname) bias(x1 x3) exact(x4) pop