Title
gpscore -- Estimation of the generalized propensity score
Syntax
gpscore varlist [if] [in] [weight], t(varname) gpscore(newvar) predict(newvar) sigma(newvar) cutpoints(varname) index(string) nq_gps(#) [t_transf(transformation) normal_test(test) norm_level(#) test_varlist(varlist) test(type) flag(#) detail]
fweights, iweights, and pweights are allowed; see weight.
Description
gpscore assumes that the conditional distribution of the treatment given the control variables in varlist is normal and estimates its parameters by maximum likelihood. It then assesses the validity of the assumed normal model with a user-specified goodness-of-fit test and estimates the generalized propensity score (GPS). The estimated GPS is defined as R = r(T,X), where r(.,.) is the conditional density of the treatment given the covariates, T is the observed treatment, and X is the vector of observed covariates. Finally, gpscore tests the balancing property by using the algorithm suggested by Hirano and Imbens (2004) and informs the user whether and to what extent the balancing property is supported by the data.
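To make the definition R = r(T,X) concrete, the Python sketch below fits a normal model for the treatment and evaluates the fitted density at each unit's observed treatment. It is a minimal illustration under stated assumptions, not gpscore's implementation: it assumes a single covariate x, a linear conditional mean b0 + b1*x, and a constant variance; the function names fit_normal_linear and gps and the data are hypothetical. Under normality, the maximum likelihood estimates of b0 and b1 coincide with least squares, and the ML estimate of sigma divides the residual sum of squares by n.

```python
# Minimal sketch (assumption: one covariate, T | x ~ N(b0 + b1*x, sigma^2)).
# Hypothetical helper names; this is not the code used by gpscore.
from statistics import NormalDist, fmean


def fit_normal_linear(x, t):
    """Return ML estimates (b0, b1, sigma) for T = b0 + b1*x + e, e ~ N(0, sigma^2)."""
    n = len(x)
    mx, mt = fmean(x), fmean(t)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    b1 = sxt / sxx                     # least squares slope = ML slope under normality
    b0 = mt - b1 * mx
    rss = sum((ti - b0 - b1 * xi) ** 2 for xi, ti in zip(x, t))
    sigma = (rss / n) ** 0.5           # ML variance divides by n, not n - 2
    return b0, b1, sigma


def gps(x, t, b0, b1, sigma):
    """R_i = r(T_i, X_i): fitted normal density evaluated at the observed treatment."""
    return [NormalDist(b0 + b1 * xi, sigma).pdf(ti) for xi, ti in zip(x, t)]


# Hypothetical data: covariate x and continuous treatment t.
x = [0.0, 1.0, 2.0, 3.0]
t = [1.0, 2.0, 4.0, 5.0]
b0, b1, sigma = fit_normal_linear(x, t)
r = gps(x, t, b0, b1, sigma)
```

In this sketch, b0 + b1*x plays roughly the role of the fitted values stored by predict(), sigma the role of the estimate stored by sigma(), and r the role of the variable created by gpscore().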
Options
Required
t(varname) specifies that varname is the treatment variable.
gpscore(newvar) specifies the variable name for the estimated GPS, which is added to the dataset.
predict(newvar) creates a new variable to hold the fitted values of the treatment variable.
sigma(newvar) creates a new variable to hold the maximum likelihood estimate of the conditional standard deviation of the treatment given the covariates.
cutpoints(varname) divides the set of potential treatment values into intervals according to the sample distribution of the treatment variable, cutting at the treatment quantiles stored in varname.
index(string) specifies the representative point of the treatment variable at which the GPS has to be evaluated within each treatment interval. string identifies either the mean (string = mean) or a percentile (string = p1, ..., p100) of the treatment.
nq_gps(#) specifies that, within each treatment interval, the values of the GPS evaluated at the representative point index(string) are to be divided into # (1 <= # <= 100) intervals, defined by the quantiles of those GPS values.
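Together, cutpoints(), index(), and nq_gps() define the blocks within which balance is inspected. The Python sketch below illustrates one reading of those steps for a single treatment interval: assign units to treatment intervals, take the median (as with index(p50)) or mean of the treatment within the interval, evaluate every unit's GPS at that point, and split the resulting values into nq_gps groups. Assumptions, all hypothetical: the cut points are sorted upper bounds and each interval includes its upper bound; only "mean" and "p50" are handled; groups are formed by rank with ties broken by position, which may differ from Stata's quantile rules; the function names and data are illustrative only.

```python
# Sketch of interval assignment and GPS blocking (hypothetical helpers, not gpscore code).
from bisect import bisect_left
from statistics import NormalDist, fmean, median


def interval_of(t, cuts):
    """Index of the treatment interval containing t; cuts are sorted upper bounds."""
    return bisect_left(cuts, t)  # first k with t <= cuts[k]


def representative_point(ts, index="p50"):
    """Representative treatment value of an interval (only 'mean' and 'p50' here)."""
    if index == "mean":
        return fmean(ts)
    if index == "p50":
        return median(ts)
    raise ValueError("this sketch only handles 'mean' and 'p50'")


def gps_at(point, mus, sigma):
    """GPS of every unit evaluated at the same treatment value: r(point, X_i)."""
    return [NormalDist(mu, sigma).pdf(point) for mu in mus]


def quantile_blocks(values, nq):
    """Assign each value to one of nq roughly equal-sized groups by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    blocks = [0] * len(values)
    for rank, i in enumerate(order):
        blocks[i] = rank * nq // len(values)
    return blocks


# Hypothetical inputs: observed treatment, fitted means (predict()), ML sigma (sigma()).
t = [5.0, 10.0, 20.0, 30.0, 60.0, 90.0]
mus = [8.0, 12.0, 18.0, 35.0, 50.0, 80.0]
sigma = 15.0
cuts = [20.0, 90.0]                              # hypothetical cut points
groups = [interval_of(ti, cuts) for ti in t]     # -> [0, 0, 0, 1, 1, 1]

in_first = [ti for ti, g in zip(t, groups) if g == 0]
point = representative_point(in_first, "p50")   # median of [5, 10, 20] -> 10.0
values = gps_at(point, mus, sigma)
blocks = quantile_blocks(values, 3)              # -> [2, 2, 1, 1, 0, 0]
```

Units with fitted means close to the representative point receive the largest GPS values and therefore land in the highest block.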
Optional
t_transf(transformation) specifies the transformation of the treatment variable used in estimating the GPS. The default transformation is the identity function. The supported transformations are the logarithmic transformation, t_transf(ln); the zero-skewness log transformation, t_transf(lnskew0); the zero-skewness Box-Cox transformation, t_transf(bcskew0); and the Box-Cox transformation, t_transf(boxcox).
normal_test(test) specifies the goodness-of-fit test that gpscore will perform to assess the validity of the assumed normal distribution model for the treatment conditional on the covariates. By default, gpscore performs the Kolmogorov-Smirnov test (normal_test(ksmirnov)). Possible alternatives are the Shapiro-Francia test, normal_test(sfrancia); the Shapiro-Wilk test, normal_test(swilk); and the Stata skewness and kurtosis test for normality, normal_test(sktest).
norm_level(#) sets the significance level of the goodness-of-fit test for normality. The default is norm_level(0.05).
test_varlist(varlist) specifies that the extent of covariate balancing has to be inspected for each variable of varlist. The default varlist consists of the variables used to estimate the GPS. This option is useful when the covariates include categorical variables, because it allows the balancing property to be tested for the omitted category as well.
test(type) specifies whether the balancing property has to be tested using a standard two-sided t test (the default) or a Bayes-factor-based method.
flag(#) controls whether gpscore performs the goodness-of-fit test for normality and the balancing test. With a value other than the default, gpscore estimates the GPS without performing either test. The default, flag(1), means that both the normal distribution model and the balancing property are tested; the default is recommended.
detail displays more detailed output showing the results of the goodness-of-fit test for normality, some summary statistics of the distribution of the GPS evaluated at the representative point of each treatment interval, and the results of the balancing test within each treatment interval.
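The two kinds of tests controlled by these options can also be sketched in Python. ks_statistic computes the Kolmogorov-Smirnov distance D between the empirical distribution of standardized residuals and the standard normal CDF, the quantity underlying normal_test(ksmirnov); it returns only D, and p-values from the standard KS tables do not account for the mean and sigma having been estimated. two_sample_t computes a two-sample t statistic for a difference in covariate means, the kind of comparison the default two-sided t test makes within a GPS block (in Hirano and Imbens's approach, units whose treatment falls in the interval versus units whose treatment does not). The equal-variance form used here is the default of Stata's ttest; whether gpscore uses this exact form is an assumption, and both function names are hypothetical.

```python
# Sketch of the normality check and the balancing t statistic (hypothetical helpers).
from math import sqrt
from statistics import NormalDist, fmean, variance


def ks_statistic(residuals, sigma):
    """Kolmogorov-Smirnov D of residuals / sigma against the N(0, 1) CDF."""
    z = sorted(r / sigma for r in residuals)
    n = len(z)
    phi = NormalDist().cdf
    # Compare the CDF with the empirical step function just after and just before each point.
    return max(max((i + 1) / n - phi(zi), phi(zi) - i / n) for i, zi in enumerate(z))


def two_sample_t(a, b):
    """Equal-variance two-sample t statistic for mean(a) - mean(b); needs >= 2 values each."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical usage.
d = ks_statistic([-1.0, 0.0, 1.0], sigma=1.0)
t_stat = two_sample_t([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])  # equals sqrt(1.5)
```

In practice, D would be converted to a p-value and compared with norm_level(), and a large |t| in any block would signal that the covariate is not balanced there.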
Remarks
Please remember to use the update query command before running this program to make sure you have an up-to-date version of Stata installed. Otherwise, this program may not run properly.
The treatment has to be continuous.
Make sure that the variables in varlist do not contain missing values.
Examples
. #delimit ;
. gpscore agew ownhs male tixbot workthen yearw,
.     t(prize) gpscore(mygps) predict(hat_treat) sigma(hat_sd)
.     cutpoints(cut) index(p50) nq_gps(5)
. ;
. #delimit ;
. gpscore agew ownhs male tixbot workthen yearw,
.     t(prize) gpscore(mygps) predict(hat_treat) sigma(hat_sd)
.     cutpoints(cut) index(p50) nq_gps(5)
.     t_transf(ln) norm_level(0.01)
. ;
. #delimit ;
. gpscore agew ownhs male tixbot workthen yearw,
.     t(prize) gpscore(mygps) predict(hat_treat) sigma(hat_sd)
.     cutpoints(cut) index(p50) nq_gps(5)
.     t_transf(ln) norm_level(0.01) test(Bayes_factor)
. ;
Reference
Hirano, K., and G. W. Imbens. 2004. The propensity score with continuous treatments. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, ed. A. Gelman and X.-L. Meng, 73-84. West Sussex, England: Wiley InterScience.
Authors
Michela Bia
Laboratorio Riccardo Revelli, Centre for Employment Studies, Collegio Carlo Alberto
michela.bia@laboratoriorevelli.it
Alessandra Mattei
Department of Statistics "Giuseppe Parenti", University of Florence
mattei@ds.unifi.it
Also see
Article: Stata Journal, volume 8, number 3: st0150
Online: doseresponse, doseresponse_model