{smcl}
{* *! version 2.0  23jul2021}{...}

{cmd: help ggt}
{hline}

{title:Title}

{p2colset 5 12 12 2}{...}
{p2col :{hi:ggt} {hline 2}}Geweke, Gowrisankaran, and Town Model Quality Estimator{p_end}
{p2colreset}{...}

{title:Syntax}

{p 8 18 2}
{cmdab:ggt,} 
outcomevar({it:varname})
orgchar({it:varname})
indID({it:varname})
orgID({it:varname})
choicechar({it:varlist})
[
{it:options}]


{title:Description}

{pstd}
This program estimates the parameters of the Geweke, Gowrisankaran, and Town (2003), "GGT," model.
The GGT model estimates the posterior distribution of organizational performance where there are 
many organizations from which individuals can choose to receive services. In this framework, individuals 
may select organizations based, in part, on information that is unobserved to the researcher and is 
correlated with the binary outcome. If this is the case, then standard approaches to inferring 
organization performance will yield biased estimates. The GGT model corrects for this unobserved selection 
allowing for flexible correlation in the error structure across the organizational choice and outcome equations. 
The estimation approach is Bayesian. In sum, the model combines an organization choice multinomial probit model 
with an individual outcome binary probit model, allowing for correlation across equations for each individual. As noted in GGT,
some possible applications for this model include: hospital quality based on mortality, school performance based on graduation 
rates, prison rehabilitation programs based on recidivism rates, and job training programs based on incidence of harassment complaints. 

{pstd}
The parameters are estimated using Bayesian inference through Markov chain Monte Carlo techniques to simulate 
parameters and latent variables conditional on data to determine the posterior distribution of parameters. 
While we present the basics of the model in the associated auxiliary file "{bf:ggt_methods.pdf}," (accessed via the command: {cmd: ssc desc ggt}), 
we encourage all users of this Stata function to read the GGT 
paper to fully understand the model, assumptions underneath the model, and parameters used in the estimation. 

{pstd}
To speed up the computation process, the program code calls an included C plugin file which estimates the 
parameters via MCMC Gibbs Sampling. 



{title:Methods and Equations}

{pstd}
We include a brief, yet important, explanation of the appropriate GGT model in the auxiliary file "{bf:ggt_methods.pdf,}" which can be obtained
via command {cmd: ssc desc ggt}. This file shows which variables and parameters are references in the calling for the ggt Stata function. 

{pstd} 
{bf:Technical Note:} Users may notice some slight differences in the model description of prior 
distributions from that in GGT Section 2.2. These do not change the model but do make the Stata 
code more tractable. We also describe these changes in the {bf:"ggt_methods.pdf"} file. 


{title:Options}

{dlgtab:Required Model Variables}

{phang}
{opt outcomevar(varname)} is required. It is the name of the variable that indicates the individual outcomes
in the binary probit model. This variable needs to be 0 or 1 for each individual. 

{phang}
{opt orgchoice(varname)} is required. It is the name of the variable that indicates the organization 
that each individual selects/chooses. This variable should be 0,1 and should sum to 1 for each 
individual.
 
{phang}
{opt indID(varname)} is required. It provides a unique identifier for each individual.
 
{phang}
{opt orgID(varname)} is required. It provides a unique identifier for each organization.

{phang}
{opt choicechar(varlist)} is required. It specifies the name of the variables that should be 
included in the choice equation. These are the Z variables referenced in {bf:"ggt_methods.pdf"} auxiliary file. 


{dlgtab:Optional Model Variables}

{phang}
{opt orgchar(varlist)} specifies the name of the variables that hold the different organization 
characteristics. These are the k and l variables referenced in {bf:"ggt_methods.pdf"} auxiliary file.
The maximum number of variables in this varlist is 10. The variables must be categorical in nature 
and can either be of Stata type string or factor. 
	
{phang}
{opt indchar(varlist)} specifies the name of the variables that should be included in the individual 
outcome probit equation. These are the X variables referenced in {bf:"ggt_methods.pdf"} auxiliary file.
The maximum number of variables in this varlist is 20. 
 
{dlgtab:Optional Model Parameters}

{phang}
{opt niter(integer)} is the number of iterations for Gibbs sampling. The default is 100000. 

{phang} 
{opt alphapriorvar(real)} is the diagonal elements of the alpha prior variance-covariance matrix. 
The default is 1. 

{phang}
{opt gammapriorvar(real)} is the diagonal elements of the gamma prior variance-covariance matrix. 
The default is 1.		

{phang}
{opt deltapriorvar(real)} is the sigma_gamma^2 term in the prior distribution of delta. See footnote 17 in GGT for information 
on choosing this value. The default is 0.038416.

{phang}
{opt priortau(real, integer)} is the hyper-parameters for the organization characteristic variance 
hierarchical prior distributions. ggt allows users to specify the s2 and v terms in the hierarchical prior, 		
s2 / tauo2 ~ chi2(v) for organization characteristic, o. These terms are all referenced in referenced in {bf:"ggt_methods.pdf}" auxiliary file. 
The first and second numbers in priortau() are s2 and v respectively. The default is priortau(1.25,5). Users must specify both elements 
if choosing to use this option. 

{phang}
{opt noselection} option should be specified if the user does not want to apply the selection 
correction. In this case, the program 	will simply estimate the parameters in GGT equation (1). Note: 
The code will also estimate alpha solely for the purpose of comparison. 

{phang}
{opt noconstant} option should be specified if the user does not want to include a constant in 
the outcome probit equation, i.e. gamma will not include a constant term. 

{dlgtab:Reporting}

{phang}
{opt savedraws} option will save a .csv file in the directory which holds every 100 draws of each parameter 
via the MCMC Gibbs Sampling routine. 



{title:Examples}

{pstd}
In this section, we present the data structure necessary to run {cmd:ggt} along with 
3 examples to demonstrate different calls to the program. Please see the auxiliary file {bf:"ggt_examples.pdf"} for more 
detailed explanation of the following data and examples. We provide the sample data, {bf:"ggt_test_data.dta"} 
as an additional auxiliary file as well. 

{bf:Data Structure}

{pstd}
Assume we are interested in hospital quality. The dataset {bf:"ggt_test_data.dta"} contains data on 
300 patients and 8 hospitals. The variables include the individual patient identifier, {it:indnumber}, and the hospital identifier,
{it:hospnum}. The patient specific variables are {it:mortality} and {it:severity}.
The individual choice variables are {it:dist} and {it:dist2} representing the distance from each patient to each 
hospital along with its square (normalized to have similar scales, necessary since the priors are the same).  We also have 
hospital characteristic variables, {it:hosp_size} and {it:hosp_ownership}.

{pstd}
In the Stata dataset, there should be an observation for each individual-hospital pair, even if the 
individual did not choose that hospital. For example, with 300 patients and 8 hospitals, we have 
300*8=2400 observations in the data. Please see the {bf:"ggt_test_data.dta"} auxiliary file to explore the appropriate dataset format. 

{pstd}{cmd:. use ggt_test_data.dta}{p_end}

{bf:Example 1}

{pstd}
Suppose we want to estimate the selection-correction hospital quality measures using all the default settings. 

{pstd}{cmd:. ggt, outcomevar(mortality) orgchoice(hosp_choice) indID(indnumber) orgID(hospnum) choicechar(dist dist2)}{p_end}
{pstd}{txt}complete

{pstd}
This will apply the selection model using {it:dist} and {it:dist2} as the choice characteristics. 
Since we did not specify the {opt indchar} option, the code will assume only a constant and the hospital choice 
for the individual probit model. Additionally, since we did not specify {opt orgchar}, the code will assume 
no correlation across hospitals via hospital size or ownership. The sampling algorithm will assume the 
default prior variance options and number of iterations.

{pstd}
The output on the screen will be the summary statistics for the estimated beta draws via the MCMC Gibbs 
sampler. The variable "q_n" respresents the quality for hospital ID, 'n'. The number of observations in 
this example is 900- this comes from the default 100000 iterations, saving only every 100th draw, and 
deleting the first 10000 draws as burn-in. 

{pstd}
{bf:Note:} The code may take several minutes to complete running due to its computational complexity. Once the 
code is complete, the word "complete" will display on the Stata screen. 

{bf:Example 2}

{pstd}
Now, suppose we want to include the severity measure in the morality equation and we also want to allow hospital correlation based on size and ownership. 
Additionally, we want to rescale the prior variances based on the structure of the data. Specifically, we want the prior variance of alpha to be 5, 
the prior variance of gamma to be 3, selection term for delta to be .1, and the parameters for the hyperpriors to be 1 and 5.  Finally, we want to save 
the draws for each of the parameters in a csv file to the directory. 

{pstd}{cmd:. ggt, outcomevar(mortality) orgchoice(hosp_choice) indID(indnumber) orgID(hospnum) choicechar(dist dist2) indchar(severity) orgchar(hosp_size hosp_ownership) alphapriorvar(5) gammapriorvar(3) deltapriorvar(.1) priortau(1,5) savedraws }{p_end}

{pstd}complete

{bf:Example 3}

{pstd}
Finally, suppose we wish to compare the results to the case where we do not apply the selection 
correction. In this case, the program simply estimates equation (1) in GGT. We can still specify all the 
options, but the code will only use those that are necessary. e.g., since the nonselection model assumes 
that delta=0, then specifying deltapriorvar is unnecessary. 

{pstd}
Note: Even though the equation we wish to estimate does not depend on patient-organization choice 
characteristics, the code will still require choice characteristics in its estimation of alpha. 

{pstd}{cmd:. ggt, outcomevar(mortality) orgchoice(hosp_choice) indID(indnumber) orgID(hospnum) choicechar(dist dist2) indchar(severity) orgchar(hosp_size hosp_ownership) alphapriorvar(5) gammapriorvar(3) priortau(1,5) noselection}{p_end}
{pstd}{txt}complete


{title:References}

Geweke, J., Gowrisankaran, G., & Town, R. J. (2003). Bayesian inference for hospital quality in a selection model. Econometrica, 71(4), 1215-1238.

{p2colreset}{...}