Template-Type: ReDIF-Paper 1.0
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank Group
Author-Email: kbjarkefur@worldbank.org
Author-Name: Luiza Cardoso de Andrade
Author-Workplace-Name: The World Bank Group
Author-Name: Benjamin Daniels
Author-Workplace-Name: The World Bank Group
Author-Name: Mrijan Rimal
Author-Workplace-Name: The World Bank Group
Title: ietoolkit: How DIME Analytics develops Stata code from primary data work
Abstract: Over the years, the complexity of data work in development research has grown exponentially, and standardized workflows are needed so that researchers and data analysts can work simultaneously on multiple projects. -ietoolkit- was developed to standardize and simplify best practices for data management and analysis across the 100+ members of the World Bank's Development Research Group, Impact Evaluations team (DIME). It includes a standardized project folder structure; standardized Stata 'boilerplate' code; standardized balance tables, graphs, and matching procedures; and modified dropping and saving commands with built-in safety checks. The presentation will outline how the -ietoolkit- structure is meant to serve as a guide for projects to move their data through the analysis process in a standardized way, and will offer a brief introduction to the other commands. The intent is for many projects within one organization to share a predictable workflow, so that researchers and data analysts can move between multiple projects and support other teams easily and rapidly without expending time relearning idiosyncratic project organization structures and standards. These tools are developed open-source on GitHub and are publicly available.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Bjarkefur.pdf
Handle: RePEc:boc:scon19:12
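
Illustrative only (not part of the abstract): a minimal sketch of the boilerplate and balance-table steps described above; the variables (treatment, age, income, educ) and file name are hypothetical, and options are abridged from the package documentation.

    * Apply standardized version and memory settings at the top of a do-file
    ieboilstart, version(13.1)
    `r(version)'

    * Balance table across treatment arms, exported to LaTeX
    iebaltab age income educ, grpvar(treatment) savetex("balance.tex") replace
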
Template-Type: ReDIF-Paper 1.0
Author-Name: Benjamin Daniels
Author-Workplace-Name: World Bank Development Research Group, Impact Evaluations (DIME)
Author-Email: bdaniels@worldbank.org
Author-Name: Luiza Cardoso de Andrade
Author-Workplace-Name: The World Bank Group
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank Group
Title: iefieldkit: Stata commands for primary data collection and cleaning
Abstract: Data collection and cleaning workflows involve highly repetitive but extremely important processes. -iefieldkit- was developed to standardize and simplify best practices for high-quality primary data collection across the 100+ members of the World Bank's Development Research Group, Impact Evaluations team (DIME). It automates: error-checking for electronic ODK-based survey modules, such as those implemented in SurveyCTO; duplicate checking and resolution; data cleaning, including renaming, labeling, recoding, and survey harmonization; and codebook creation. The presentation will outline how the -iefieldkit- package is intended to provide a data collection workflow skeleton for nearly any type of primary data collection, from questionnaire design to data import. A key feature of many -iefieldkit- commands is their use of spreadsheet-based workflows, which reduce repetitive coding in Stata and document corrections and cleaning in a human-readable format. This enables rapid review of data quality in a standardized process, with the goal of producing maximally clean primary data for the downstream data construction and analysis phases in a transparent and accessible manner. These tools are developed open-source on GitHub and are publicly available.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Daniels.pdf
Handle: RePEc:boc:scon19:11
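
Illustrative only: a sketch of the duplicate-resolution and codebook steps, where the ID variable and file names are hypothetical and the syntax is abridged from the package documentation.

    * Flag and resolve duplicate survey IDs through a human-readable Excel report
    ieduplicates hhid using "duplicates_report.xlsx", uniquevars(key)

    * Create a cleaning template, then apply the documented corrections
    iecodebook template using "cleaning.xlsx"
    iecodebook apply using "cleaning.xlsx"
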
Template-Type: ReDIF-Paper 1.0
Author-Name: Billy Buchanan
Author-Workplace-Name: Fayette County Public Schools
Author-Email: william@williambuchanan.net
Title: Barrel Aged Software Development: brewscheme as a four-year old
Abstract: The term software development implies some type of change over time. While Stata goes through extraordinary steps to support backwards compatibility, user-contributors may not always see a need to continue developing programs shared with the community. How do you know if/when you should add programs or functionality to an existing package? Is it easy/practical to extend existing Stata code, or is it easier to refactor everything from the ground up? What can you do to make it easier to extend existing code? While -brewscheme- may have started as a relatively simple package with a couple of commands and limited functionality, in the four years since it was introduced it has grown into a multifunctional library of tools that make it easier to create customized visualizations in Stata while being mindful of color sight impairments. I will share my experience, what I have learned, and the strategies I used to address these questions in the context of the development of the -brewscheme- package. I will also show what the additional features do that the original -brewscheme- did not.
Creation-Date: 20190802
File-URL: https://wbuchanan.github.io/stataConference2019/#/
Handle: RePEc:boc:scon19:30

Template-Type: ReDIF-Paper 1.0
Author-Name: Phil Ender
Author-Workplace-Name: UCLA Retired
Author-Email: ender@ucla.edu
Title: Simulating Baboon Behavior using Stata
Abstract: This presentation originated from a field study of the behavior of feral baboons in Tanzania. The field study made use of behavior sampling methods, including on-the-moment (instantaneous) and thru-the-moment (one-zero) sampling. Some primatologists have critiqued behavior sampling as not reflecting true frequency or duration. A Monte Carlo simulation study was performed to compare behavior sampling with actual frequency and duration.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Ender.pdf
Handle: RePEc:boc:scon19:99

Template-Type: ReDIF-Paper 1.0
Author-Name: Barbara Williams
Author-Workplace-Name: Virginia Mason Medical Center
Author-Email: barbara.williams@virginiamason.org
Title: Using cluster analysis to understand complex data sets: experience from a national nursing consortium
Abstract: Cluster analysis is a type of exploratory data analysis for classifying observations and identifying distinct groups. It may be useful for complex data sets where commonly used regression modeling approaches are inadequate due to outliers, complex interactions, or violations of assumptions. In health care, the complex effect of nursing factors (including staffing levels, experience, and contract status), hospital size, and patient characteristics on patient safety (including pressure ulcers and falls) has not been well understood. In this presentation, I will explore the use of Stata's cluster analysis (cluster) to describe five groups of hospital units with distinct characteristics that predict patient pressure ulcers and hospital falls in relation to the employment of supplemental registered nurses (SRNs) in a national nursing database. The use of SRNs is a common practice among hospitals to fill gaps in nurse staffing. But the relationship between the use of SRNs and patient outcomes varies widely, with some groups reporting a positive relationship and other groups an adverse one. The purpose of this presentation is to identify the advantages and disadvantages of cluster analysis and other methods when analyzing non-normally distributed, non-linear data with unpredictable interactions.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Williams.pptx
Handle: RePEc:boc:scon19:20
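
Illustrative only: Stata's built-in cluster suite can produce the kind of five-group partition the abstract describes; the unit-level variables here are hypothetical.

    * Partition hospital units into five groups on selected measures
    cluster kmeans staffing_level rn_experience pct_srn unit_size, k(5) name(unitgrp)

    * Profile each group before using it as a predictor
    tabstat staffing_level rn_experience pct_srn, by(unitgrp) statistics(mean sd)
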
Template-Type: ReDIF-Paper 1.0
Author-Name: Karl X.Y. Zou
Author-Workplace-Name: Texas A&M University
Author-Email: Xinyuan.Zou@tamu.edu
Author-Name: Mark Fossett
Title: The Individual Process of Neighborhood Change and Residential Segregation in 1940 - An Implication of Discrete-Choice Model
Abstract: Using the 1940 restricted census microdata, this study develops discrete-choice models to investigate how individual and household characteristics, along with features of the neighborhood of residence, affect individual choices of residential outcomes in US cities. This study makes several innovations: (1) We take advantage of 100% census microdata on the whole population of the cities to establish discrete-choice models estimating the attributes of alternatives (e.g., neighborhoods) and personal characteristics simultaneously. (2) We set out a routine for restructuring personal records into the data structure required by a discrete-choice model and then test whether its assumptions are violated. (3) We assess the extent and importance of discrimination and of residential preferences through the model specification. The results suggest that both in-group racial and class preferences can explain the individual process of neighborhood change. All groups practice some degree of out-group avoidance based on race and social class. Such phenomena are more pronounced in multi-racial cities.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Zou.pptx
Handle: RePEc:boc:scon19:42

Template-Type: ReDIF-Paper 1.0
Author-Name: Giovanni Cerulli
Author-Workplace-Name: IRCrES-CNR, National Research Council of Italy
Author-Email: giovanni.cerulli@ircres.cnr.it
Title: Extending the difference-in-differences (DID) to settings with many treated units and same intervention time: Model and Stata implementation
Abstract: The difference-in-differences (DID) estimator is popular for estimating average treatment effects in causal inference studies. Under the common support assumption, DID overcomes the problem of unobservable selection using panel, time, and/or location fixed effects, together with knowledge of the pre/post intervention times. Several extensions of DID have recently been proposed: (i) the Synthetic Control Method (SCM) applies when a long pre- and post-intervention time series is available, only one unit is treated, and the intervention occurs at a specific time (implemented in Stata via SYNTH by Hainmueller, Abadie, and Diamond, 2014); (ii) an extension to binary time-varying treatment with many treated units has also been proposed and implemented in Stata via TVDIFF (Cerulli and Ventura, 2018). However, a command to accommodate a setting with many treated units and the same intervention time is still lacking. In this presentation, I propose a potential outcome model for this latter setting and provide a Stata implementation via the new routine FTMTDIFF (standing for fixed-time multiple treated DID). I will close with some guidelines for future DID developments.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Cerulli.pdf
Handle: RePEc:boc:scon19:26
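
The abstract does not show FTMTDIFF syntax, so the sketch below is only the standard two-way fixed-effects DID baseline that such commands extend; the panel variables are hypothetical.

    * Two-way fixed-effects DID: many treated units, common adoption date
    xtset id year
    generate did = treated * post    // treated and post are hypothetical dummies
    xtreg y did i.year, fe vce(cluster id)
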
Template-Type: ReDIF-Paper 1.0
Author-Name: Austin Nichols
Author-Workplace-Name: Abt Associates
Author-Email: austinnichols@gmail.com
Author-Name: Andrew Goodman-Bacon
Author-Workplace-Name: Vanderbilt University
Author-Name: Thomas Goldring
Author-Workplace-Name: Georgia Policy Labs
Title: Bacon decomposition for understanding differences-in-differences with variation in treatment timing
Abstract: In applications of a difference-in-differences (DD) model, researchers often exploit natural experiments with variation in onset, comparing outcomes across groups of units that receive treatment starting at different times. Goodman-Bacon (2019) shows that this DD estimator is a weighted average of all possible two-group/two-period DD estimators in the data. The -bacon- command performs this decomposition and graphs all two-by-two DD estimates against their weights, displaying all the identifying variation behind the overall DD estimate. Given the widespread use of the two-way fixed-effects DD model, -bacon- has broad applicability across domains and will help researchers understand how much of a given DD estimate comes from different sources of variation.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Goodman-Bacon.pptx
Handle: RePEc:boc:scon19:46

Template-Type: ReDIF-Paper 1.0
Author-Name: Choonjoo Lee
Author-Workplace-Name: Korea National Defense University
Author-Email: bloom.rampike@gmail.com
Title: The matching problem using Stata
Abstract: The main purpose of this presentation is to discuss an algorithm for the matching problem. As an example, the K-cycle kidney exchange problem is defined and solved using a user-written Stata program.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Lee.pdf
Handle: RePEc:boc:scon19:43

Template-Type: ReDIF-Paper 1.0
Author-Name: Joseph Terza
Author-Workplace-Name: Department of Economics, Indiana University Purdue University Indianapolis
Author-Email: jvterza@iupui.edu
Title: Mata implementation of Gauss-Legendre quadrature in the M-estimation context: Correcting for sample-selection bias in a generic nonlinear setting
Abstract: Many contexts in empirical econometrics require non-closed-form integration for appropriate modeling and estimation design. Applied researchers often avoid such correct but computationally demanding specifications and opt for simpler, misspecified modeling designs. The presentation will detail a newly developed Mata implementation of a relatively simple numerical integration technique: Gauss-Legendre quadrature. Although this Mata code is applicable in a variety of circumstances, it was mainly written for use in M-estimation when the relevant objective function (e.g., the likelihood function) involves integration at the observation level. As inputs, the user supplies a vector-valued integrand function (e.g., a vector of sample log-likelihood integrands) and a matrix of upper and lower integration limits. The code outputs the corresponding vector of integrals (e.g., the vector of observation-specific log-likelihood values). To illustrate the use of this Mata implementation, we conduct an empirical analysis of classical sample-selection bias in the estimation of wage offer regressions. We estimate a nonlinear version of the model based on the modeling approach suggested by Terza (Econometric Reviews, 2009), which requires numerical integration. This model is juxtaposed with the classical linear sample-selection specification of Heckman (Annals of Economic and Social Measurement, 1976), for which numerical integration is not required.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Terza.pdf
Handle: RePEc:boc:scon19:31
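
This is not Terza's implementation (which takes vector-valued integrands and a matrix of limits); it is a minimal fixed five-node Mata sketch of Gauss-Legendre quadrature for a scalar integrand, showing the affine map of nodes from [-1,1] to [a,b].

    mata:
    real scalar gl5(pointer(real scalar function) scalar f,
                    real scalar a, real scalar b)
    {
        real rowvector x, w
        real scalar    s, i

        // Five-point Gauss-Legendre nodes and weights on [-1,1]
        x = (-.9061798459, -.5384693101, 0, .5384693101, .9061798459)
        w = ( .2369268851,  .4786286705, .5688888889, .4786286705, .2369268851)
        s = 0
        for (i=1; i<=5; i++) {
            // Map node i into [a,b], evaluate the integrand, weight, accumulate
            s = s + w[i]*(*f)((b-a)/2*x[i] + (a+b)/2)
        }
        return ((b-a)/2*s)
    }
    real scalar dens(real scalar t) return(normalden(t))
    gl5(&dens(), 0, 1)    // approximately normal(1) - normal(0) = .3413
    end
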
Template-Type: ReDIF-Paper 1.0
Author-Name: Carlos Dorantes
Author-Workplace-Name: Tec de Monterrey
Author-Email: cdorante@tec.mx
Title: A practical application of the mvport package: CAPM-based optimal portfolios
Abstract: The mvport package has commands for financial portfolio optimization and portfolio backtesting. I present a practical implementation of a CAPM-based strategy to select stocks, apply different optimization settings, and evaluate the resulting portfolios. The presentation illustrates how to automate the process through a simple do-file that makes it easy to change parameters (e.g., stock list, market index, risk-free rate) using an Excel interface. The program automates the following: a) data collection, b) CAPM model estimation for all stocks, c) selection of stocks based on CAPM parameters, d) portfolio optimization with different configurations, and e) portfolio backtesting. For data collection, the getsymbols and freduse commands are used to get online price data for all the S&P 500 stocks and the risk-free rate. For each stock, two competing CAPM models are estimated: one using a simple regression and one using an autoregressive conditional heteroscedasticity (ARCH) model. The CAPM parameters are used to select stocks. Then the mvport package is used to optimize different configurations of the portfolio. Finally, the performance of each portfolio configuration is calculated and compared with the market portfolio.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Dorantes.pdf
Handle: RePEc:boc:scon19:50
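
Illustrative only: step (b) for a single stock could look like the following, where the excess-return variables are hypothetical and mvport's own commands are not shown.

    * CAPM beta via simple OLS regression of stock on market excess returns
    regress stock_exret mkt_exret

    * Competing CAPM estimate with ARCH(1)/GARCH(1) errors
    arch stock_exret mkt_exret, arch(1) garch(1)
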
Template-Type: ReDIF-Paper 1.0
Author-Name: Tim Schmidt
Author-Workplace-Name: Discover Financial Services
Author-Email: timothyschmidt@discover.com
Title: Tools to analyze interest rates and value bonds
Abstract: Bond markets contain a wealth of information about investor preferences and expectations. However, extracting such information from market interest rates can be computationally burdensome. I introduce a suite of new Stata commands to aid finance professionals and researchers in using Stata to analyze the term structure of interest rates and value bonds. The genspot command uses a bootstrap methodology to construct a spot rate curve from a yield curve of market interest rates under a no-arbitrage assumption. The genfwd command generates a forward rate curve from a spot rate curve, allowing researchers to infer market participants’ expectations of future interest rates. Finally, the pricebond command uses forward rates to value a bond with user-specified terms.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Schmidt.pdf
Handle: RePEc:boc:scon19:21
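
The no-arbitrage logic behind forward-rate generation can be checked by hand; in the sketch below the spot rates are made-up inputs, not genfwd syntax.

    * One-year forward rate one year ahead, implied by 1- and 2-year spot rates:
    * (1+s2)^2 = (1+s1)*(1+f), so f = (1+s2)^2/(1+s1) - 1
    scalar s1 = 0.02
    scalar s2 = 0.025
    display "implied forward = " (1 + s2)^2/(1 + s1) - 1    // about .0300
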
Template-Type: ReDIF-Paper 1.0
Author-Name: Mustafa Karakaplan
Author-Email: mukarakaplan@yahoo.com
Title: Panel Stochastic Frontier Models with Endogeneity in Stata
Abstract: I introduce xtsfkk, a new Stata command for fitting panel stochastic frontier models with endogeneity. The advantage of xtsfkk is that it can control for endogenous variables in the frontier and/or in the inefficiency term in a longitudinal setting. Hence, xtsfkk performs better than standard panel frontier methodologies such as xtfrontier, which overlook endogeneity by design.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Karakaplan.pptx
Handle: RePEc:boc:scon19:53

Template-Type: ReDIF-Paper 1.0
Author-Name: Fernando Rios-Avila
Author-Workplace-Name: Levy Economics Institute
Author-Email: friosavi@levy.org
Title: Recentered Influence Functions (RIF) in Stata: RIF-Regression and RIF-Decomposition
Abstract: Recentered Influence Functions (RIF) are statistical tools popularized by Firpo, Fortin, and Lemieux (2009) for analyzing unconditional partial effects (UPE) on quantiles in a regression analysis framework (unconditional quantile regressions). The flexibility and simplicity of this tool, however, have opened the possibility of extending the analysis to other distributional statistics, using linear regressions or decomposition approaches. In this paper, I introduce three Stata commands to facilitate the use of recentered influence functions in the analysis of outcome distributions: rifvar() is an egen extension used to create RIFs for a large set of distributional statistics; rifhdreg facilitates the estimation of RIF regressions, enabling the use of high-dimensional fixed effects; and oaxaca_rif implements Oaxaca-Blinder-type decomposition analysis.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Rios-Avila.pdf
Handle: RePEc:boc:scon19:22
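
A sketch of how two of the commands might be called, with hypothetical variables; the options shown are assumptions and may differ from the package help files.

    * RIF of the Gini coefficient via the egen extension
    egen rif_gini = rifvar(wage), gini

    * RIF regression at the median with high-dimensional fixed effects
    rifhdreg wage educ exper, rif(q(50)) absorb(industry)
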
Template-Type: ReDIF-Paper 1.0
Author-Name: Thomas Zylkin
Author-Workplace-Name: University of Richmond
Author-Email: tzylkin@richmond.edu
Title: Verifying the Existence of Maximum Likelihood Estimates in Generalized Linear Models
Abstract: There has been considerable ambiguity over how to verify whether estimates from nonlinear models "exist" and what can be done if they do not. This is the so-called "separation" problem. We characterize the problem in detail across a wide range of generalized linear models and introduce a novel method for dealing with it in the presence of high-dimensional fixed effects, as are often recommended for gravity models of international trade and in other common panel data settings. We have included these methods in a new Stata command for HDFE-Poisson estimation called -ppmlhdfe-. We have also created a suite of test cases that developers may use in the future to test whether their estimation packages correctly identify instances of separation. These projects are joint with Sergio Correia and Paulo Guimaraes. We have written two papers on these topics and have also created a website with example code and data illustrating the separation issue and how we solve it. Please see our GitHub for more details: https://github.com/sergiocorreia/ppmlhdfe/
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Zylkin.pdf
Handle: RePEc:boc:scon19:47
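
A typical gravity-style call, with hypothetical variable names; ppmlhdfe follows reghdfe-style absorb() and vce() syntax.

    * Poisson pseudo-ML with high-dimensional fixed effects;
    * separated observations are detected and handled by the command
    ppmlhdfe trade ln_dist contiguity, absorb(exporter#year importer#year) vce(cluster pair)
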
Template-Type: ReDIF-Paper 1.0
Author-Name: Austin Nichols
Author-Workplace-Name: Abt Associates
Author-Email: austinnichols@gmail.com
Title: Unbiased IV in Stata
Abstract: A well-known result is that exactly identified IV has no moments, even in the ideal case of an experimental design (i.e., a randomized controlled trial with imperfect compliance). This result no longer holds when the sign of the first stage is known, however. I describe a Stata implementation of an unbiased estimator for instrumental variables models with a single endogenous regressor where the sign of one or more first-stage coefficients is known (due to Andrews and Armstrong, 2017), and its finite sample properties under alternative error structures.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Nichols.pdf
Handle: RePEc:boc:scon19:44
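
For contrast, the conventional just-identified IV fit in official Stata, with hypothetical variables; the sign restriction that delivers unbiasedness plays no role here.

    * Conventional 2SLS with one endogenous regressor d and one instrument z
    ivregress 2sls y (d = z), vce(robust)

    * Inspect the first-stage sign that the unbiased estimator exploits
    regress d z, vce(robust)
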
Template-Type: ReDIF-Paper 1.0
Author-Name: Di Liu
Author-Workplace-Name: StataCorp
Title: Using lasso and related estimators for prediction
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Liu.pdf
Creation-Date: 20190802
Handle: RePEc:boc:scon19:2

Template-Type: ReDIF-Paper 1.0
Author-Name: David Drukker
Author-Workplace-Name: StataCorp
Title: Inference after lasso model selection
Abstract: The increasing availability of high-dimensional data and increasing interest in more realistic functional forms have sparked a renewed interest in automated methods for selecting the covariates to include in a model. I discuss the promises and perils of model selection and pay special attention to estimators that provide reliable inference after model selection. I will demonstrate how to use Stata 16's new features for double selection, partialing out, and cross-fit partialing out to estimate the effects of variables of interest while using lasso methods to select control variables.
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Drukker.pdf
Creation-Date: 20190802
Handle: RePEc:boc:scon19:3
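
A minimal sketch of the three Stata 16 estimators named above, with hypothetical variables: y is the outcome, d the variable of interest, and x1-x100 the candidate controls.

    * Double selection
    dsregress y d, controls(x1-x100)

    * Partialing out and cross-fit partialing out
    poregress y d, controls(x1-x100)
    xporegress y d, controls(x1-x100)
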
Template-Type: ReDIF-Paper 1.0
Author-Name: Joseph Canner
Author-Workplace-Name: Johns Hopkins University School of Medicine, Department of Surgery
Author-Email: jcanner1@jhmi.edu
Author-Name: Hwanhee Hong
Author-Workplace-Name: Duke University Medical Center, Department of Biostatistics and Bioinformatics
Author-Name: Tianjing Li
Author-Workplace-Name: Johns Hopkins University Bloomberg School of Public Health, Department of Epidemiology
Title: Uncovering the true variability in meta-analysis results using resampling methods
Abstract: Traditionally, meta-analyses are performed using a single effect estimate from each included study, resulting in a single combined effect estimate and confidence interval. However, a number of processes can give rise to multiple effect estimates from each study, such as multiple individuals extracting study data, the use of different analysis methods for dealing with missing data or dropouts, and the use of different types of endpoints for measuring the same outcome. Depending on the number of studies and the number of possible estimates per study, the number of combinations of studies for which a meta-analysis could be performed can run into the thousands. Accordingly, meta-analysts need a tool that can iterate through all of these possible combinations (or a reasonably sized sample thereof), compute an effect estimate for each, and summarize the distribution of the effect estimates and standard errors across all combinations. We have developed a Stata command, -resmeta-, for this purpose that can generate results for 10,000 combinations in a few seconds. The command handles both continuous and categorical data, allows a variable number of estimates per study, and has options to compute a variety of different estimates and standard errors. In the presentation we will cover case studies where this approach was applied, considerations for more general application of the approach, command syntax and options, and different ways of summarizing the results and evaluating different sources of variability in the results.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Canner.pptx
Handle: RePEc:boc:scon19:28

Template-Type: ReDIF-Paper 1.0
Author-Name: Theodore Karrison
Author-Workplace-Name: University of Chicago and NRG Oncology
Author-Email: tkarrison@health.bsd.uchicago.edu
Author-Name: James Dignam
Author-Workplace-Name: University of Chicago and NRG Oncology
Title: Comparing Treatments in the Presence of Competing Risks Based on Life Years Lost
Abstract: Competing risks are frequently encountered in medical research. Examples are clinical trials in head-and-neck and prostate cancer, where deaths from cancer and deaths from other causes are competing risks. Andersen (Stat in Med, 2013) showed that the area under the cause-j cumulative incidence curve from 0 to t* can be interpreted as the number of life years lost (LYL) due to cause j before time t*. LYL can be estimated and compared in Stata using either the pseudo-observations approach described in Overgaard, Andersen, and Parner (Stata Journal, 2015) or a modification of a routine by Pepe and Mori (Stat in Med, 1993) for testing the equality of cumulative incidence curves. We describe an application of the method to the DeCIDE trial, a phase III randomized clinical trial of induction chemotherapy plus chemoradiotherapy vs. chemoradiotherapy alone in patients with locally advanced head-and-neck cancer. We present simulation results demonstrating that the pseudo-observations and Pepe-Mori approaches yield similar results. We also evaluate the power of comparing life years lost relative to standard procedures for analyzing competing risks data, including cause-specific logrank tests (Freidlin and Korn; Stat in Med, 2005) and the Fine-Gray model (Fine and Gray; JASA, 1999).
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Karrison.pptx
Handle: RePEc:boc:scon19:24

Template-Type: ReDIF-Paper 1.0
Author-Name: Ben Adarkwa Dwamena
Author-Workplace-Name: University of Michigan Medical School
Author-Email: bdwamena@med.umich.edu
Title: Hierarchical Summary ROC Analysis: A frequentist-Bayesian colloquy in Stata
Abstract: Meta-analysis of diagnostic accuracy studies requires more advanced methods than meta-analysis of intervention studies. Hierarchical or multilevel modelling accounts for the bivariate nature of the data, for both within- and between-study heterogeneity, and for threshold variability. The hierarchical summary receiver operating characteristic (HSROC) and bivariate random-effects models are currently recommended by the Cochrane Collaboration. The bivariate model focuses on estimating summary sensitivity and specificity and, as a generalized linear mixed model, is estimable in most statistical software, including Stata. The HSROC approach models the implicit threshold and diagnostic accuracy for each study as random effects and includes a shape or scale parameter that enables asymmetry in the SROC by allowing accuracy to vary with the implicit threshold. As a generalized non-linear mixed model, it has not previously been directly estimable in Stata, though it is possible with WinBUGS and SAS PROC NLMIXED, or indirectly by extrapolating its parameters from the bivariate model in Stata. This talk will demonstrate for the first time how the HSROC model can be fitted in Stata using ML programming and the recently introduced bayesmh command. Using a publicly available dataset, I will show the comparability of Stata results with those obtained with WinBUGS and SAS PROC NLMIXED.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Dwamena.pdf
Handle: RePEc:boc:scon19:48

Template-Type: ReDIF-Paper 1.0
Author-Name: Ercio Munoz
Author-Workplace-Name: CUNY Graduate Center and Stone Center on Socio-economic Inequality
Author-Email: emunozsaavedra@gc.cuny.edu
Author-Name: Salvatore Morelli
Author-Workplace-Name: CUNY Graduate Center and Stone Center on Socio-economic Inequality
Title: kmr: A Command to Correct Survey Weights for Unit Nonresponse using Group's Response Rates
Abstract: This article describes kmr, a Stata command to estimate a micro compliance function using group-level nonresponse rates, following Korinek, Mistiaen, and Ravallion (2007, Journal of Econometrics 136: 213-235), which can be used to correct survey weights for unit nonresponse. We illustrate the use of kmr with an empirical example using the Current Population Survey and state-level nonresponse rates.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Munoz.pdf
Handle: RePEc:boc:scon19:13

Template-Type: ReDIF-Paper 1.0
Author-Name: Matthew Masten
Author-Workplace-Name: Duke University
Author-Email: matt.masten@duke.edu
Author-Name: Alexandre Poirier
Author-Workplace-Name: Georgetown University
Title: tesensitivity: A Stata Package for Assessing the Unconfoundedness Assumption
Abstract: This talk will discuss a new set of methods for quantifying the robustness of treatment effects estimated under the unconfoundedness assumption (also known as selection on observables or conditional ignorability). Specifically, we estimate bounds on the ATE, the ATT, and the QTE under nonparametric relaxations of unconfoundedness indexed by a scalar sensitivity parameter c. These deviations allow for limited selection on unobservables, depending on the value of c. For large enough c, these bounds equal the no-assumptions bounds. Our methods allow for both continuous and discrete outcomes, but require discrete treatments. We implement these methods in a new Stata package, tesensitivity, for easy use in practice. We illustrate how to use this package and these methods with an empirical application to the National Supported Work Demonstration program.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Masten.pdf
Handle: RePEc:boc:scon19:51

Template-Type: ReDIF-Paper 1.0
Author-Name: Abigail S. Baldridge
Author-Workplace-Name: Northwestern University
Title: Connecting Stata and Microsoft Word using StatTag for collaborative reproducibility
Abstract: Although Stata can render output and reports to Microsoft Word, PDF, and HTML files, Stata users must sometimes transcribe statistical content into separate Microsoft Word documents (for example, documents drafted by colleagues in Word or documents that must be prepared in Word), a process that is error-prone, irreproducible, and inefficient. This talk will illustrate how StatTag (www.stattag.org), an open-source, free, and user-friendly program that we developed, addresses this problem. Since its introduction in 2016, StatTag has undergone substantial improvements and refinements. StatTag establishes a bidirectional link between Stata files and a Word document and supports a reproducible pipeline even when (1) statistical results must be included and updated in Word documents that were never generated from Stata; and (2) text in Word files generated from Stata has departed substantially from the original content, for example, through tracked changes or comments. We will demonstrate how to use StatTag to connect Stata and Word files so that all files can be edited separately, but statistical content (values, tables, figures, and verbatim output) can be updated automatically in Word. Using practical examples, we will also illustrate how to use StatTag to view, edit, and rerun Stata code directly from Word.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Baldridge.pptx
Handle: RePEc:boc:scon19:4

Template-Type: ReDIF-Paper 1.0
Author-Name: Debora Giovannelli
Author-Email: debora.giovannelli@gmail.com
Title: Postestimation Analysis with Stata by SPost13 commands of Survey Data analyzed by MNLM
Abstract: Data from a brand survey were analyzed with a regression model for nominal outcomes, also known as the Multinomial Logit Model (MNLM). The MNLM is a multivariate member of the class of Generalized Linear Models (GLM) popularized by McCullagh and Nelder (1982) and widely used in many different fields (the social sciences, biomedical sciences, epidemiology, public health, genetics, zoology, and education, but also marketing research, survey analysis, and product/process/service quality control). Interpreting these regression models requires background knowledge that is not always common, especially in business application fields. Data must be readable by anyone who has the responsibility to make serious decisions, which can strongly influence not only the business of a company but also the safety and quality of its products, processes, and services. The scope of this presentation is to show and highlight the advantages of the SPost13 commands, developed by J. Scott Long and Jeremy Freese, as very useful tools for easing the interpretation of results from this regression model for nominal response variables.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Giovannelli.pdf
Handle: RePEc:boc:scon19:38
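
Illustrative only: a minimal SPost13 workflow after a multinomial logit, with hypothetical survey variables.

    * Multinomial logit for brand choice
    mlogit brand price age female, baseoutcome(1)

    * SPost13 postestimation: coefficient listings and marginal effects
    listcoef, help
    mchange
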
Template-Type: ReDIF-Paper 1.0
Author-Name: Bob Wen
Author-Workplace-Name: Clemson University
Author-Email: shihaow@clemson.edu
Title: The Causal Effects of Wages on Labour Supply for Married Women -- Evidence from American Couples
Abstract: Using individual-level panel data from the PSID, we consistently estimate the causal effects of own wages on interior labour supply for married women who were between 17 and 55 years old in 2005 and surveyed every two years until 2015. We first discuss the representative married woman's utility maximisation problem subject to her budget constraint, which connects her husband's wages and non-labour income to her labour supply decisions through the couple relationship. Guided by the optimal hours-of-work equation and comparative statics, we start our empirical analysis with a pooled OLS, holding relevant factors constant. We then address the endogeneity problem due to sample selection by adding the selection variable (the inverse Mills ratio from a probit selection regression) to the hours-of-work equation. In addition, we control for individual heterogeneity (such as the married women's preference for work, ability, and family tradition) and for the simultaneity of labour supply and labour demand using panel data fixed-effects 2SLS with demand shifters as instruments for the endogenous variables in the labour supply equation. We find that: (1) The causal effects of wages on labour supply (the hours-wage elasticities) drop from 0.29 in the pooled OLS to 0.16 in the panel data fixed-effects 2SLS model after we account for sample selection, individual heterogeneity, and simultaneous equations bias. (2) Holding other factors constant, a 1% increase in married women's wages raises their hours of work by 0.16% on average. (3) Part-time female workers are more responsive to wage changes than their full-time counterparts. (4) There is evidence of backwards-bending labour supply curves.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Wen.pdf
Handle: RePEc:boc:scon19:27

Template-Type: ReDIF-Paper 1.0
Author-Name: Joseph Canner
Author-Workplace-Name: Johns Hopkins University School of Medicine, Department of Surgery
Author-Email: jcanner1@jhmi.edu
Author-Name: Krisztian Sebestyen
Author-Workplace-Name: Johns Hopkins University School of Medicine, Department of Surgery
Title: Fitting generalized linear models when the data exceeds available memory
Abstract: Despite the increase in random access memory (RAM) capacity and the decrease in RAM prices in the years since Stata was first released, the size of data sets has grown in recent years and can still exceed available RAM. This is particularly true for those using Stata on a personal laptop or desktop rather than an enterprise server. Accordingly, there is a need for statistical tools that can read small chunks of data from disk, perform calculations on those chunks, accumulate intermediate results, and produce final results that are the same as those obtained by performing the entire calculation in memory. The most ubiquitous statistical method is the generalized linear model (GLM), and mathematical methods have been available for many years to update the QR or Cholesky decomposition matrices with small chunks of data. Thomas Lumley's R function bigglm uses Fortran routines published by Alan J. Miller in 1992 and freely available as Algorithm AS 274. We have developed -bigglm- for Stata using the same routines, as well as expanding the library of available family and link functions. The current version can read Stata datasets as well as import data from an ODBC source. In the presentation we will discuss the limitations of the current approach and suggest areas for improvement.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Canner.pdf
Handle: RePEc:boc:scon19:49

Template-Type: ReDIF-Paper 1.0
Author-Name: Fernando Rios-Avila
Author-Workplace-Name: Levy Economics Institute
Author-Email: friosa@gmail.com
Title: Estimation of Varying Coefficient models in Stata
Abstract: Non-parametric regressions are a powerful statistical tool for modeling relationships between dependent and independent variables with minimal assumptions on the underlying functional forms. However, these models have two main weaknesses: First, their added flexibility creates a curse of dimensionality, even with a modest set of independent variables. Second, while larger samples can address this weakness, the procedures available for model selection, in particular cross-validation, are computationally intensive in large samples. An alternative is semiparametric regression modeling, which combines the flexibility of non-parametric models with the structure of standard models. In this presentation, I introduce a set of programs that estimate a semiparametric model known as the varying coefficient model. The proposed modules estimate linear models where the coefficients on the independent variables are assumed to be smooth functions of a single running variable z, using local linear kernel estimation. The current set of modules can be used to: (1) estimate the optimal bandwidth for the semiparametric model using cross-validation; (2) estimate the model at a predefined set of reference points, with three alternative standard-error estimators; (3) obtain model predictions as well as a set of diagnostic and specification tests; and (4) plot the coefficients, and their rates of change, with respect to the running variable at the selected reference points.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Rios-Avila_poster.pdf
Handle: RePEc:boc:scon19:6

Template-Type: ReDIF-Paper 1.0
Author-Name: Lakshika Tennakoon
Author-Workplace-Name: Division of General Surgery, Section of Trauma & Critical Care
Author-Email: lakshika@stanford.edu
Author-Name: David Spain
Author-Workplace-Name: Division of General Surgery, Section of Trauma & Critical Care
Author-Name: Lisa M Knowlton
Author-Workplace-Name: Division of General Surgery, Section of Trauma & Critical Care
Title: Psychiatric Morbidity in Physically Injured Children and Adolescents: A National Evaluation
Abstract: Background: Mental health disorders are among the leading causes of disability worldwide. Studies have demonstrated that most adult mental health disorders begin in childhood and adolescence. Aims: We hypothesized that psychiatric disorders are common among hospitalized pediatric trauma patients and that they are associated with poor outcomes. Methods: The KIDS Inpatient Sample 2012 was queried to provide national estimates for pediatric trauma. Patients aged 1 year and above were included. Psychiatric diagnoses were defined using ICD-9-CM codes and the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition. Unadjusted and adjusted analyses were performed using Stata 15.2. Results: Of the 6.7 million children and adolescents admitted to hospital in 2012, 141,561 (2.12%) had a primary diagnosis of trauma, and 17.3% of those (n=23,312) had a psychiatric diagnosis. Patients with a psychiatric disorder were older than patients without one (mean age: 16.3 vs 12.2 years, p<0.001) and were more often male (76.1% vs 68%, p<0.001) and white (58.2% vs 54%, p<0.001). The highest prevalence of psychiatric disorders (58.1%) was reported in the 15-19 years age group. Patients with a psychiatric diagnosis had multiple injuries (44.3%), isolated extremity fractures (18%), isolated other injuries (15%), and head injuries (14.8%). Overall mortality was lower for injured patients with a psychiatric disorder in both unadjusted and adjusted analyses (0.6 vs 1.3; aOR=0.98, p<0.001). Conclusion: Psychiatric diagnoses are surprisingly common among pediatric trauma patients. Increased vigilance and counseling are needed for this population.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Tennakoon.pdf
Handle: RePEc:boc:scon19:41