{smcl} {* *! version 1.0.0 6feb2022}{...} {vieweralsosee "[R] heckman" "help heckman"}{...} {vieweralsosee "[R] xtheckman" "help xtheckman"}{...} {viewerjumpto "Syntax" "gtsheckman##syntax"}{...} {viewerjumpto "Description" "gtsheckman##description"}{...} {viewerjumpto "Options" "gtsheckman##options"}{...} {viewerjumpto "Stored results" "gtsheckman##results"}{...} {viewerjumpto "Examples" "gtsheckman##examples"}{...} {viewerjumpto "Author" "gtsheckman##author"}{...} {viewerjumpto "References" "gtsheckman##references"}{...} {title:Title} {phang} {* phang is short for p 4 8 2} {bf:gtsheckman} {hline 2} A generalized two-step Heckman selection model {marker syntax}{...} {title:Syntax} {phang} {cmd:gtsheckman} {depvar} [{it:{help varlist:indepvars}}] {ifin} {cmd:,} {cmd: {opt sel:ect} (}{it:{help varlist:depvar_s}} {cmd:=} {it:{help varlist:varlist_s}}{cmd:)} [{it:options}] {phang} As in {helpb heckman}, {it:depvar} is the dependent variable, subject to sample selection, {it:indepvars} is the list of independent regressors, {it:depvar_s} is the binary selection indicator, and {it:varlist_s} is the list of independent regressors in the selection equation. {synoptset 20 tabbed}{...} {synopthdr} {synoptline} {syntab:Model} {synopt :*{opt sel:ect()}}specifies the selection equation: dependent and independent variables{p_end} {p2col:{bf:het(}{it:{help varlist:varlist}}{bf:)}}independent variables to model the variance in the selection equation{p_end} {p2col:{bf:clp(}{it:{help varlist:varlist}}{bf:)}}independent variables to be interacted with the inverse mills ratio{p_end} {syntab:SE/Robust} {p2col:{bf:vce(}{it:{help vcetype:vcetype}}{bf:)}}{it:vcetype} may be {cmd:robust} or {cmd:cluster} {it:clustervar}{p_end} {syntab:Reporting} {p2col:{bf:lambda}}generates the (scaled) inverse mills ratio (lambda) as a variable{p_end} {syntab:Maximize} {p2col:{it:maximize_options}}controls the maximization process; seldom used{p_end} {synoptline} {pstd} *{opt sel:ect()} is required. {marker description}{...} {title:Description} {pstd} {cmd:gtsheckman} fits regression models with selection by using Heckman's two-step consistent estimator. It is similar to the two step consistent {helpb heckman} estimator, but allows for heteroskedasticity in the first step and a more general specification of the control function. Moreover it provides both heteroskedastic robust inference as well as cluster robust inference. Therefore this command encompasses the two step consistent {helpb heckman} estimator as a special case. The methodology was proposed and studied by Carlson and Joshi (2022). {marker options}{...} {title:Options} {dlgtab:Model} {phang} {opt select(depvar_s = varlist_s)} specifies the variables for the selection equation. It is an integral part of specifying a Heckman model and is required. the selection equation should contain at least one variables that is not in the outcome equation. {it:depvar_s} should be coded as 0 or 1, with 0 indicating an observation not selected and 1 indicating a selected observation. {phang} {opt het(varlist)} specifies the independent variables in the variance function for the heteroskedastic probit estimator in the first stage. {phang} {opt clp(varlist)} specifies the independent variables to be interacted with lambda (inverse mills ratio) in the control function in the second stage. {dlgtab:SE/Robust} {phang} {opt vce(vcetype)} specifies the stype of standard errors reported, which includes types that are robust to some kinds of misspecification ({cmd:robust}), and that allow for intragroup correlation ({cmd:cluster} {it:clustervar}). {dlgtab:Reporting} {phang} {opt lambda} generates the (scaled) inverse mills ratio as a new variable named {cmd:lambda}. The inverse mills ratio is calculated from the first stage selection equation estimates, and will be scaled by the inverse of the conditional variance estimates when the option {cmd: het()} is specified. Some post estimation commands (like {helpb margins} and {helpb predict}) will require the {cmd:lambda} option be specified. {dlgtab:Maximization} {phang} {it:maximize_options} controls the maximization process; see help {help maximize}. Use of them is likely to be rare. {marker examples}{...} {title:Examples} {pstd}Setup{p_end} {phang2}{cmd:. use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz, clear}{p_end} {pstd}Obtain Heckman's two-step consistent estimates{p_end} {phang2}{cmd:. gtsheckman lwage educ exper expersq, select(inlf = educ exper expersq age nwifeinc kidslt6 kidsge6)}{p_end} {pstd}Obtain Heckman's two-step consistent estimates with heteroskedastic robust standard errors{p_end} {phang2}{cmd:. gtsheckman lwage educ exper expersq, select(inlf = educ exper expersq age nwifeinc kidslt6 kidsge6) vce(robust)}{p_end} {pstd}Obtain Heckman's two-step consistent estimates with heteroskedasticity in the sample selection equation and robust standard errors{p_end} {phang2}{cmd:. gtsheckman lwage educ exper expersq, select(inlf = educ exper expersq age nwifeinc kidslt6 kidsge6) het(educ kidslt6 kidsge6) vce(robust)}{p_end} {pstd}Obtain Heckman's two-step consistent estimates with heteroskedasticity in the sample selection equation and covariance, and robust standard errors{p_end} {phang2}{cmd:. gtsheckman lwage educ exper expersq, select(inlf = educ exper expersq age nwifeinc kidslt6 kidsge6) het(educ kidslt6 kidsge6) clp(educ kidslt6 kidsge6) vce(robust)}{p_end} {pstd} Additional examples can be found in the gtsheckman_examples.do do-file. {marker results}{...} {title:Stored results} {pstd} {cmd:gtsheckman} stores the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Scalars}{p_end} {synopt:{cmd:e(N)}}number of observations{p_end} {synopt:{cmd:e(N_selected)}}number of selected observations{p_end} {synopt:{cmd:e(N_nonselected)}}number of nonselected observations{p_end} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Macros}{p_end} {synopt:{cmd:e(cmd)}}{cmd:gtsheckman}{p_end} {synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}{p_end} {synopt:{cmd:e(vcetype)}}title used to label Std. Err.{p_end} {synopt:{cmd:e(properties)}}{cmd:b V}{p_end} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Matrices}{p_end} {synopt:{cmd:e(b)}}coefficient vector{p_end} {synopt:{cmd:e(V)}}variance-covariance matrix of the estimators{p_end} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Functions}{p_end} {synopt:{cmd:e(sample)}}estimation sample{p_end} {p2colreset}{...} {marker author}{...} {title:Author} {pstd} Alyssa H. Carlson{break}Department of Economics, University of Missouri{break} carlsonah@missouri.edu{break}{browse "https://carlsonah.mufaculty.umsystem.edu/"} {marker references} {title:References} {phang} Carlson, A. H., and Joshi, R. 2024. Sample Selection in Linear Panel Data Models with Heterogenous Coefficents. {it:Journal of Applied Econometrics} 39(2):237-255. URL: {browse "https://doi.org/10.1002/jae.3022"}