{smcl}
{* *! version 9 02Oct2024}{...}

help gpsmdbal
{hline}

{title:Title}

{pstd} {cmd:gpsmdbal} {hline 2} Balancing property test for the multidimensional generalized propensity score (GPS)

{title:Syntax}

{phang2} {cmd:gpsmdbal} {varlist}(min=1){cmd:,} {opt cutpoints(numlist integer max=1)}  {opt index(string)} {opt nq_gpsmd(numlist max=1)} {opt discrtreat(string)} [{opt ptile(string)} {opt obs_notsup(string)} {opt gpsmdtequalt(string)} {opt ln(varlist)} {opt level(numlist max=1)}]

{phang} {it: varlist}: The variables for which the balancing property has to be assessed.

{title:Description}

{pstd}
{cmd:gpsmdbal} tests whether conditioning on the generalized propensity score (GPS) effectively removes differences in the means of the exogenous covariates between groups treated with different doses.
The procedure is the multidimensional analog to the t-tests for equality of means before and after matching implemented in {help pstest} for binary treatments (Leuven and Sianesi 2003).
Similarly to {cmd:gpsmdcomsup}, it relies on partitioning the treatment into an arbitrary number of subsets and iteratively considering one subset as "the treatment group".
The equality of means between the control and treatment groups is then tested before and after adjusting for the propensity score.

{pstd}
The command should be used together with {cmd:gpsmd}, {cmd:gpsmdcomsup}, and {cmd:gpsmdpolest} to estimate the dose-response function. 

{pstd}
{it:Note}: {cmd:gpsmd} must be invoked before invoking {cmd:gpsmdbal}. 

{title:Options}

{phang}{opt cutpoints(numlist integer max=1)}: the number of discrete intervals into which each dimension of the treatment is partitioned.

{phang}{opt index(string)}: the representative point {bf:t_{it:d}} of each discrete subset at which the GPS is calculated. It can be "mean" (the mean) or "p50" (the median).

{phang}{opt nq_gpsmd(numlist max=1)}: the number of discrete subsets of the GPS.

{phang}{opt discrtreat(string)}: the program discretizes the treatment into a user-defined number of subsets. It also generates a variable storing the discrete subset to which each observation belongs.
In {opt discrtreat(string)} the user must specify the name of this variable.

{phang}{opt obs_notsup(string)}: the string with the name of the dummy variable generated by the command {cmd:gpsmdcomsup}. The variable indicates whether the observation is included in the common support or not. 
If specified, {cmd:gpsmdcomsup} must have been run before invoking {cmd:gpsmdbal}. If it is not specified, {cmd:gpsmdbal} will perform the analysis using the entire sample.

{phang}{opt ptile(string)}: the program generates the discrete subsets of the treatment by calculating the Cartesian product of the discrete intervals of the dimensions. 
In addition, the program generates a variable for each dimension where it stores the discrete subset of the dimension to which an observation belongs. 
In {opt ptile(string)} the user must specify the first characters for the name of these variables. The default is {cmd:__ptile}.

{phang}{opt gpsmdtequalt(string)}: the user may want to inspect the distribution of the GPS calculated at the representative point of the discrete subsets of the treatment, g({bf:t_{it:d}}, {bf:Z_i}).
When {opt gpsmdtequalt(string)} is specified, the program generates one variable for each discrete subset of the treatment storing the GPS calculated at the representative point of that discrete subset.
These variables are named {it: gpsmdtequalt#} where {it: gpsmdtequalt} is the name specified in {opt gpsmdtequalt(string)} and # stands for the number of the discrete subset. By default, the program does not generate these variables.

{phang}{opt ln(varlist)}: the treatment dimensions that have to be log-transformed.

{phang}{opt level(numlist max=1)}: the p-value threshold used when printing the table of unadjusted and adjusted differences in means. The program prints the table twice: in full ({cmd:r(Tabmeandiff)}) and with the cells whose p-value is higher than the threshold set to missing ({cmd:r(Tabmeandiff2)}). The default is 0.05.

{title:Examples}
{hline}
{pstd}Setup

{phang2}Setting the dataset:

{phang3}{cmd:. clear all}

{phang3}{cmd:. set obs 1200}

{phang2}Generating independent variables for the propensity score estimation:

{phang3}{cmd:. set seed 13131}

{phang3}{cmd:. gen X1 = 1* rnormal(0,1)}

{phang3}{cmd:. gen X2 = 2* rnormal(0,1)}

{phang3}{cmd:. gen X3 = 3* rnormal(0,1)}

{phang3}{cmd:. gen X4 = 4* rnormal(0,1)}

{phang3}{cmd:. gen X5 = 5* rnormal(0,1)}

{phang3}{cmd:. gen X6 = 6* rnormal(0,1)}

{phang3}{cmd:. gen X7 = 7* rnormal(0,1)}

{phang2}Generating the treatment dimensions:

{phang3}{cmd:. matrix R = (25, 2 \2, 25)}

{phang3}{cmd:. drawnorm V1 V2, cov(R)}

{phang3}{cmd:. gen T1= 1*X1 + .5*X2 + 1*X3 + .5*X4 + 1*X5 + .5*X6 + 1*X7 + V1}

{phang3}{cmd:. gen T2= .5*X1 + 1*X2 + .5*X3 + 1*X4 + .5*X5 + 1*X6 + .5*X7 + V2}

{phang2}The estimation:

{phang3}{cmd:. gpsmd T1 T2, exogenous(X1 X2 X3 X4 X5 X6 X7) gpsmd(GPS)}

{phang3}{cmd:. gpsmdcomsup T1 T2, exogenous(X1 X2 X3 X4 X5 X6 X7) index("p50") cutpoints(2) obs_notsup(Commonsupport)}

{phang3}{cmd:. gpsmdbal X1 X2 X3 X4 X5 X6 X7,  index("p50") cutpoints(2) nq_gpsmd(4) discrtreat(Discretetreat) obs_notsup(Commonsupport)}
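
{phang2}A variant using some of the optional arguments. This is only a sketch based on the syntax diagram above, meant as an alternative to the previous call rather than a follow-up. The names {cmd:Dtreat}, {cmd:pt}, and {cmd:gpsAtT} are arbitrary and are assumed not to exist in the data yet:

{phang3}{cmd:. gpsmdbal X1 X2 X3 X4 X5 X6 X7, index("mean") cutpoints(2) nq_gpsmd(4) discrtreat(Dtreat) ptile(pt) gpsmdtequalt(gpsAtT) level(0.1)}

{phang2}Going by the option descriptions, this call should use the entire sample (no {opt obs_notsup()}), evaluate the GPS at the mean of each discrete subset, create {cmd:Dtreat}, one {cmd:pt#} variable per treatment dimension, and one {cmd:gpsAtT#} variable per discrete subset, and set to missing the cells whose p-value is higher than 0.1 in the second printed table.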

{hline}

{title:Stored results}

{p2col 5 20 24 2: Variables}{p_end}
{p 6 8 2}{cmd:gpsmdbal} generates the following list of variables: 

{p 8 10 2}- One variable named as specified in {opt discrtreat}. This variable reports, for each observation, the discrete set of the treatment to which the observation is assigned.

{p 8 10 2}- One variable named {it:`ptile'#} for every dimension # of the treatment.
The {it:`ptile'#} variable stores the number of the discrete subset of dimension # to which the observation is assigned. The variable {it:`discrtreat'} is generated as the Cartesian product of the {it:`ptile'#} variables.

{pstd}
{cmd:gpsmdbal} stores the following in {cmd:r()}. Some objects are simply copied from the {cmd:gpsmd} results:

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:r(NofDiscTreat)}}macro with the number of discrete treatments{p_end}
{synopt:{cmd:r(cmd)}}macro with the name of the command just invoked ({cmd:gpsmdbal}){p_end}
{synopt:{cmd:r(cmdline)}}macro with the {it:cmdline}. This macro reports the command just invoked, including options and specifications{p_end}
{synopt:{cmd:r(DimensionsFS)}}macro with the names of the dimensions used in calculating the propensity score. It differs from {cmd:r(Dimensions)} only if the {opt ln(varlist)} option is used.{p_end}
{synopt:{cmd:r(LNVarCreated)}}if the {opt ln(varlist)} option is specified, {cmd:r(LNVarCreated)} contains the list of the variables generated{p_end}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Matrices}{p_end}
{synopt:{cmd:r(Tabmeandiff)}}matrix having one row for each variable the user wanted to test and two columns for each discrete treatment. In the cells, the program reports the mean differences before and after adjusting for the GPS.{p_end}
{synopt:{cmd:r(Tabmeandiff2)}}{cmd:r(Tabmeandiff)} where the cells whose p-value is higher than the threshold specified in {opt level()} are set to missing.{p_end}
{synopt:{cmd:r(TableImpRes)}}matrix having one row for each variable the user wanted to test and two columns for each discrete treatment. In the cells, the program reports the p-value of the test before and after adjusting for the GPS.{p_end}
{synopt:{cmd:r(TableImpRes2)}}{cmd:r(TableImpRes)} where p-values higher than the threshold specified in {opt level()} are set to missing.{p_end}
{synopt:{cmd:r(ResultAdj#)}}for each discrete subset # of the treatment, the program generates a matrix reporting the results of the t-test for the adjusted mean. The first column reports the t statistic, the second column reports the p-value, and the third column reports the degrees of freedom. There is one row for each variable that the user wanted to test.{p_end}
{synopt:{cmd:r(Result#)}}for each discrete subset # of the treatment, the program generates a matrix reporting the results of the t-test before the adjustment. There is one column for every r-class result of {help ttest} plus one for the estimated difference, and one row for each variable that the user wanted to test. {cmd:r(ResultAdj#)} and {cmd:r(Result#)} are somewhat redundant, because {cmd:r(TableImpRes)} already includes the essential information.{p_end}
{synopt:{cmd:r(Chosenpoint#)}}the program reports, for each discrete subset of the treatment #, a matrix storing the representative treatment vector {bf:t_{it:d}} chosen.{p_end}
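
{pstd}
A minimal sketch of how these results might be retrieved after the {cmd:gpsmdbal} call in the Examples section. Because {cmd:r()} is overwritten by the next r-class command, the matrices are copied first. The matrix names {cmd:P}, {cmd:D}, and {cmd:C1} are arbitrary, and the sketch assumes that the discrete subsets are numbered from 1:

{phang3}{cmd:. return list}

{phang3}{cmd:. matrix P = r(TableImpRes)}

{phang3}{cmd:. matrix D = r(Tabmeandiff)}

{phang3}{cmd:. matrix C1 = r(Chosenpoint1)}

{phang3}{cmd:. matrix list P}

{phang3}{cmd:. matrix list D}

{phang3}{cmd:. matrix list C1}

{phang3}{cmd:. tabulate Discretetreat}

{pstd}
{cmd:tabulate} is run last because it is itself an r-class command and would replace the contents of {cmd:r()}.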


{title:Bibliography and Sources}

{p}Egger, Peter H., and Maximilian von Ehrlich. 2013. ‘Generalized Propensity Scores for Multiple Continuous Treatment Variables’. {it:Economics Letters} 119 (1): 32–34.

{p}Egger, Peter H., and Peter Egger. 2016. ‘Heterogeneous Effects of Tariff and Nontariff Policy Barriers in General Equilibrium’. Beiträge zur Jahrestagung des Vereins für Socialpolitik 2016: Demographischer Wandel - Session: Trade Barriers, No. G18-V3, ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Leibniz-Informationszentrum Wirtschaft, Kiel und Hamburg.

{p}Egger, Peter H., Maximilian von Ehrlich, and Douglas R. Nelson. 2020. ‘The Trade Effects of Skilled versus Unskilled Migration’. {it:Journal of Comparative Economics} 48 (2): 448–64.

{p}Leuven, Edwin, and Barbara Sianesi. 2003. ‘PSMATCH2: Stata Module to Perform Full Mahalanobis and Propensity Score Matching, Common Support Graphing, and Covariate Imbalance Testing’.