-------------------------------------------------------------------------------
help for pstest
-------------------------------------------------------------------------------

Covariate imbalance testing and graphing

pstest [varlist] [if exp] [in range] [, treated(varname) both raw mweight(varname) support(varname) notable label onlysig nodist graph hist graph_options ]

Description

pstest calculates and optionally graphs several measures of the balancing of the variables in varlist between two groups (if varlist is not specified, pstest will look for the variables that were specified in the latest call of psmatch2 or of pstest). In particular it can be used to gauge comparability in terms of varlist between:

1. Two matched samples (the default).

pstest can be called directly after psmatch2, or it can be fed matching weights via option mweight to assess the extent of balancing achieved on the two matched samples. A particularly useful way to use pstest is in search of a matching method and set of matching parameters that achieves good balancing; psmatch2 can be called repeatedly prefixed by quietly and the extent of corresponding balancing can each time be displayed by calling pstest.

2. Any two samples (option raw).

pstest can be called to assess the comparability of any two groups. This may be before performing matching, or completely unrelated to matching purposes. (The groups are in any case referred to as Treated and Controls, but they could be males and females, employed and non-employed etc.).

3. Two samples before and after having performed matching (option both).

In this case pstest compares the extent of balancing between the two samples before and after having performed matching.
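
The search for a well-balancing specification described under point 1 can be sketched as a loop. This is only an illustrative sketch: the variable names (treated, age, gender, foreign, exper, wage) are borrowed from the Examples section, and the choice of neighbor() values is an arbitrary assumption.

```stata
* Sketch: compare balancing across a few nearest-neighbor settings.
* Variable names and the neighbor() values are illustrative assumptions.
foreach k of numlist 1 3 5 {
    quietly psmatch2 treated age gender foreign exper, outcome(wage) neighbor(`k')
    quietly pstest age gender foreign exper
    display "neighbor(`k'): mean |bias| = " %6.2f r(meanbias) ///
            ", pseudo R2 = " %6.3f r(r2)
}
```

Lower values of r(meanbias) and r(r2) indicate better balancing on the matched samples; which summary to prioritise is left to the analyst.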

For each variable in varlist it calculates:

(a) t-tests for equality of means in the two samples (before and after matching if option both is specified). T-tests are based on a regression of the variable on a treatment indicator. Before matching or on raw samples, this is an unweighted regression on the whole sample. After matching, the regression is weighted using the matching weight variable _weight (or the user-given weight variable in mweight) and is based on the on-support sample;

(b) the standardised percentage bias. If option both is specified, the standardised percentage bias is shown before and after matching, together with the achieved percentage reduction in abs(bias). The standardised % bias is the difference of the sample means in the treated and non-treated (full or matched) sub-samples, expressed as a percentage of the square root of the average of the sample variances in the treated and non-treated groups (formulae from Rosenbaum and Rubin, 1985).
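
Written out, the description in (b) corresponds to the following formula. The symbols are notation introduced here for illustration: \bar{x}_T and \bar{x}_C are the sample means of a covariate in the treated and non-treated groups, and s_T^2 and s_C^2 the corresponding sample variances. The text above does not state which sample (full or matched) the variances are taken from after matching, so that is left open; the second line is the natural reading of "percentage reduction in abs(bias)" and should be treated as an assumption.

```latex
\[
  \text{bias} \;=\; 100 \cdot
  \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\tfrac{1}{2}\left(s_T^2 + s_C^2\right)}}
\]
\[
  \text{reduction in } |\text{bias}| \;=\; 100 \cdot
  \frac{|\text{bias}_{\text{before}}| - |\text{bias}_{\text{after}}|}{|\text{bias}_{\text{before}}|}
\]
```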

It also calculates overall measures of covariate imbalance:

(a) Pseudo R2 from probit estimation of the conditional treatment probability (propensity score) on all the variables in varlist on raw samples, matched samples (default) or both before and after matching. Also displayed are the corresponding P-values of the likelihood-ratio test of the joint insignificance of all the regressors (before and after matching if option both is specified);

(b) the mean and median bias as summary indicators of the distribution of the abs(bias) (before and after matching if option both is specified).
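
For raw samples, overall measure (a) can be approximated by hand with Stata's probit command, whose stored results e(r2_p) and e(p) hold the pseudo R2 and the P-value of the likelihood-ratio test. This is a sketch under the assumption that the raw-sample calculation is a plain unweighted probit on the whole sample; variable names are taken from the Examples section.

```stata
* Sketch: pseudo R2 and LR test on the raw (unmatched) sample.
* Assumes treated is a 0/1 indicator; variable names are illustrative.
quietly probit treated age gender foreign exper
display "pseudo R2 = " %6.3f e(r2_p) ", p > chi2 = " %6.3f e(p)

* The corresponding figures as returned by pstest
quietly pstest age gender foreign exper, raw treated(treated)
display "r(r2)     = " %6.3f r(r2)   ", r(chiprob) = " %6.3f r(chiprob)
```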

Optionally pstest graphs the extent of covariate imbalance in terms of standardised percentage differences using dot charts (option graph) or histograms (option hist).

If psmatch2 has been called with a varlist, typing pstest (or pstest, both) directly afterwards is enough to inspect the extent of covariate balancing in the matched samples.

If option both is specified, pstest returns the following diagnostics of covariate balancing before and after matching: r(meanbiasbef) and r(meanbiasaft), the mean absolute standardised bias; r(medbiasbef) and r(medbiasaft), the median absolute standardised bias; r(r2bef) and r(r2aft), the pseudo R2 from probit estimation; and r(chiprobbef) and r(chiprobaft), the P-value of the likelihood-ratio test. If the two groups are compared only once (matched samples as default, or two unmatched samples if option raw is specified), pstest returns r(meanbias), r(medbias), r(r2) and r(chiprob). pstest always returns in r(exog) the names of the variables for which it has tested the extent of balancing.
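
A minimal sketch of how these returned results might be inspected after a call with option both. The variable names follow the Examples section and are illustrative.

```stata
* Sketch: retrieve the diagnostics left behind by pstest, both.
quietly psmatch2 treated age gender foreign exper, outcome(wage)
quietly pstest, both
return list
display "mean |bias| before: " %6.2f r(meanbiasbef) ///
        "  after: " %6.2f r(meanbiasaft)
display "variables tested: `r(exog)'"
```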

Important notes

pstest only considers balancing for the treated, even if called after psmatch2 with option ate.

Spline matching (psmatch2, spline) and the default (tricube) local linear regression matching (psmatch2, llr) first smooth the outcome and then perform nearest neighbor matching. pstest does not make sense in these cases, since more non-treated observations are used to calculate the counterfactual outcome than the nearest neighbor alone.

Detailed Syntax

Matched samples:

pstest [varlist] [if exp] [in range] [, treated(varname) mweight(varname) support(varname) notable nodist label onlysig graph hist graph_options ]

Raw samples:

pstest [varlist] [if exp] [in range] , raw treated(varname) [ notable nodist label onlysig graph hist graph_options ]

Before and after matching:

pstest [varlist] [if exp] [in range] , both [ treated(varname) mweight(varname) support(varname) notable nodist label graph hist graph_options ]

Options

treated(varname) Treatment (or group) indicator (0/1). If option raw is not specified, default is _treated left behind from the latest psmatch2 call.

both Requires comparability to be assessed both before and after matching. Default is only after matching.

raw Requires comparability to be assessed between any two (unweighted) groups. This can be before wishing to perform matching, but also unrelated to matching purposes, e.g. to quickly assess how randomisation has worked.

mweight(varname) Weight of matches. If option raw is not specified, default is _weight left behind from the latest psmatch2 call.

support(varname) Common support indicator (0/1). If option raw is not specified, default is _support left behind from the latest psmatch2 call.

notable Do not display the table with the individual covariate imbalance indicators (standardised percentage bias, t-tests, and if option both is specified achieved percentage reduction in absolute bias) for each variable in varlist.

label Display variable labels instead of variable names in the variable-by-variable table.

onlysig In the variable-by-variable table only display those variables which are significantly unbalanced (p<=0.10). This option is ignored if option both is specified.

nodist Do not display the distribution summary of the absolute standardised percentage bias across all variables in varlist.

graph Display a graphical summary of covariate imbalance via a dot chart, showing the standardised percentage bias for each covariate. If option both is specified, information before and after matching is displayed in the same dot chart. If more than 30 covariates are specified, they are not labelled.

hist Display a graphical summary of covariate imbalance via a histogram, showing the distribution of the standardised percentage bias across covariates. If option both is specified, imbalance before and after matching is displayed in two histograms. Recommended for a large number of covariates.

graph_options Additional options can be specified for the relevant graph type (dot graph or histogram). Useful examples are yscale(range(numlist)), ylabel(numlist) or legend(off) for the former and bin(#) for the latter.

Examples

. pstest age gender foreign exper, t(training) mw(_weight) onlysig graph

. pstest age foreign exper if district==1, raw t(male) label hist

. psmatch2 treated age gender foreign exper, outcome(wage)
. pstest
. pstest, both

Also see

The commands psmatch2, psgraph.

Background Reading

Rosenbaum, P.R. and Rubin, D.B. (1985), "Constructing a Control Group Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score", The American Statistician 39(1), 33-38.

Author

Edwin Leuven, University of Oslo. If you observe any problems, please email e.leuven@gmail.com.