Template-Type: ReDIF-Paper 1.0
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank Group
Author-Email: kbjarkefur@worldbank.org
Author-Name: Luiza Cardoso de Andrade
Author-Workplace-Name: The World Bank Group
Author-Name: Benjamin Daniels
Author-Workplace-Name: The World Bank Group
Author-Name: Mrijan Rimal
Author-Workplace-Name: The World Bank Group
Title: ietoolkit: How DIME Analytics develops Stata code from primary data work
Abstract: Over the years, the complexity of data work in development research has grown exponentially, and standardized workflows are needed so that researchers and data analysts can work simultaneously on multiple projects. -ietoolkit- was developed to standardize and simplify best practices for data management and analysis across the 100+ members of the World Bank's Development Research Group, Impact Evaluations team (DIME). It includes a standardized project folder structure; standardized Stata 'boilerplate' code; standardized balance tables, graphs, and matching procedures; and modified dropping and saving commands with built-in safety checks. The presentation will outline how the -ietoolkit- structure is meant to serve as a guide for projects to move their data through the analysis process in a standardized way, and will offer a brief introduction to the other commands. The intent is for many projects within one organization to share a predictable workflow, so that researchers and data analysts can move between multiple projects and support other teams easily and rapidly without expending time relearning idiosyncratic project organization structures and standards. These tools are developed open-source on GitHub and are publicly available.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Bjarkefur.pdf
Handle: RePEc:boc:scon19:12
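
Illustrative only (not part of the abstract): a minimal sketch of the boilerplate and balance-table steps described above; the variables (treatment, age, income, educ) and file name are hypothetical, and options are abridged from the package documentation.

    * Apply standardized version and memory settings at the top of a do-file
    ieboilstart, version(13.1)
    `r(version)'

    * Balance table across treatment arms, exported to LaTeX
    iebaltab age income educ, grpvar(treatment) savetex("balance.tex") replace
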
Template-Type: ReDIF-Paper 1.0
Author-Name: Benjamin Daniels
Author-Workplace-Name: World Bank Development Research Group, Impact Evaluations (DIME)
Author-Email: bdaniels@worldbank.org
Author-Name: Luiza Cardoso de Andrade
Author-Workplace-Name: The World Bank Group
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank Group
Title: iefieldkit: Stata commands for primary data collection and cleaning
Abstract: Data collection and cleaning workflows involve highly repetitive but extremely important processes. -iefieldkit- was developed to standardize and simplify best practices for high-quality primary data collection across the 100+ members of the World Bank's Development Research Group, Impact Evaluations team (DIME). It automates: error-checking for electronic ODK-based survey modules, such as those implemented in SurveyCTO; duplicate checking and resolution; data cleaning, including renaming, labeling, recoding, and survey harmonization; and codebook creation. The presentation will outline how the -iefieldkit- package is intended to provide a data collection workflow skeleton for nearly any type of primary data collection, from questionnaire design to data import. A key feature of many -iefieldkit- commands is their use of spreadsheet-based workflows, which reduce repetitive coding in Stata and document corrections and cleaning in a human-readable format. This enables rapid review of data quality in a standardized process, with the goal of producing maximally clean primary data for the downstream data construction and analysis phases in a transparent and accessible manner. These tools are developed open-source on GitHub and are publicly available.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Daniels.pdf
Handle: RePEc:boc:scon19:11
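
Illustrative only: a sketch of the duplicate-resolution and codebook steps, where the ID variable and file names are hypothetical and the syntax is abridged from the package documentation.

    * Flag and resolve duplicate survey IDs through a human-readable Excel report
    ieduplicates hhid using "duplicates_report.xlsx", uniquevars(key)

    * Create a cleaning template, then apply the documented corrections
    iecodebook template using "cleaning.xlsx"
    iecodebook apply using "cleaning.xlsx"
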
Template-Type: ReDIF-Paper 1.0
Author-Name: Billy Buchanan
Author-Workplace-Name: Fayette County Public Schools
Author-Email: william@williambuchanan.net
Title: Barrel Aged Software Development: brewscheme as a four-year old
Abstract: The term software development implies some type of change over time. While Stata goes through extraordinary steps to support backwards compatibility, user-contributors may not always see a need to continue developing programs shared with the community. How do you know if/when you should add programs or functionality to an existing package? Is it easy/practical to extend existing Stata code, or is it easier to refactor everything from the ground up? What can you do to make it easier to extend existing code? While -brewscheme- may have started as a relatively simple package with a couple of commands and limited functionality, in the four years since it was introduced it has grown into a multifunctional library of tools that make it easier to create customized visualizations in Stata while being mindful of color sight impairments. I will share my experience, what I have learned, and the strategies I used to address these questions in the context of the development of the -brewscheme- package. I will also show what the additional features do that the original -brewscheme- did not.
Creation-Date: 20190802
File-URL: https://wbuchanan.github.io/stataConference2019/#/
Handle: RePEc:boc:scon19:30

Template-Type: ReDIF-Paper 1.0
Author-Name: Phil Ender
Author-Workplace-Name: UCLA Retired
Author-Email: ender@ucla.edu
Title: Simulating Baboon Behavior using Stata
Abstract: This presentation originated from a field study of the behavior of feral baboons in Tanzania. The field study made use of behavior sampling methods, including on-the-moment (instantaneous) and thru-the-moment (one-zero) sampling. Some primatologists have critiqued behavior sampling as not reflecting true frequency or duration. A Monte Carlo simulation study was performed to compare behavior sampling with actual frequency and duration.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Ender.pdf
Handle: RePEc:boc:scon19:99

Template-Type: ReDIF-Paper 1.0
Author-Name: Barbara Williams
Author-Workplace-Name: Virginia Mason Medical Center
Author-Email: barbara.williams@virginiamason.org
Title: Using cluster analysis to understand complex data sets: experience from a national nursing consortium
Abstract: Cluster analysis is a type of exploratory data analysis for classifying observations and identifying distinct groups. It may be useful for complex data sets where commonly used regression modeling approaches are inadequate due to outliers, complex interactions, or violations of assumptions. In health care, the complex effect of nursing factors (including staffing levels, experience, and contract status), hospital size, and patient characteristics on patient safety (including pressure ulcers and falls) has not been well understood. In this presentation, I will explore the use of Stata's cluster analysis (cluster) to describe five groups of hospital units with distinct characteristics that predict patient pressure ulcers and hospital falls in relation to the employment of supplemental registered nurses (SRNs) in a national nursing database. The use of SRNs is a common practice among hospitals to fill gaps in nurse staffing. But the relationship between the use of SRNs and patient outcomes varies widely, with some groups reporting a positive relationship and other groups an adverse one. The purpose of this presentation is to identify the advantages and disadvantages of cluster analysis and other methods when analyzing non-normally distributed, non-linear data with unpredictable interactions.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Williams.pptx
Handle: RePEc:boc:scon19:20
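
Illustrative only: Stata's built-in cluster suite can produce the kind of five-group partition the abstract describes; the unit-level variables here are hypothetical.

    * Partition hospital units into five groups on selected measures
    cluster kmeans staffing_level rn_experience pct_srn unit_size, k(5) name(unitgrp)

    * Profile each group before using it as a predictor
    tabstat staffing_level rn_experience pct_srn, by(unitgrp) statistics(mean sd)
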
Template-Type: ReDIF-Paper 1.0
Author-Name: Karl X.Y. Zou
Author-Workplace-Name: Texas A&M University
Author-Email: Xinyuan.Zou@tamu.edu
Author-Name: Mark Fossett
Title: The Individual Process of Neighborhood Change and Residential Segregation in 1940 - An Implication of Discrete-Choice Model
Abstract: Using the 1940 restricted census microdata, this study develops discrete-choice models to investigate how individual and household characteristics, along with features of the neighborhood of residence, affect individual choices of residential outcomes in US cities. This study makes several innovations: (1) We take advantage of 100% census microdata on the whole population of the cities to establish discrete-choice models estimating the attributes of alternatives (e.g., neighborhoods) and personal characteristics simultaneously. (2) We set out a routine for restructuring personal records into the data structure required by a discrete-choice model and then test whether its assumptions are violated. (3) We assess the extent and importance of discrimination and of residential preferences through the model specification. The results suggest that both in-group racial and class preferences can explain the individual process of neighborhood change. All groups practice some degree of out-group avoidance based on race and social class. Such phenomena are more pronounced in multi-racial cities.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Zou.pptx
Handle: RePEc:boc:scon19:42

Template-Type: ReDIF-Paper 1.0
Author-Name: Giovanni Cerulli
Author-Workplace-Name: IRCrES-CNR, National Research Council of Italy
Author-Email: giovanni.cerulli@ircres.cnr.it
Title: Extending the difference-in-differences (DID) to settings with many treated units and same intervention time: Model and Stata implementation
Abstract: The difference-in-differences (DID) estimator is popular for estimating average treatment effects in causal inference studies. Under the common support assumption, DID overcomes the problem of unobservable selection using panel, time, and/or location fixed effects, together with knowledge of the pre/post intervention times. Several extensions of DID have recently been proposed: (i) the Synthetic Control Method (SCM) applies when a long pre- and post-intervention time series is available, only one unit is treated, and the intervention occurs at a specific time (implemented in Stata via SYNTH by Hainmueller, Abadie, and Diamond, 2014); (ii) an extension to binary time-varying treatment with many treated units has also been proposed and implemented in Stata via TVDIFF (Cerulli and Ventura, 2018). However, a command to accommodate a setting with many treated units and the same intervention time is still lacking. In this presentation, I propose a potential outcome model for this latter setting and provide a Stata implementation via the new routine FTMTDIFF (standing for fixed-time multiple treated DID). I will close with some guidelines for future DID developments.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Cerulli.pdf
Handle: RePEc:boc:scon19:26
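
The abstract does not show FTMTDIFF syntax, so the sketch below is only the standard two-way fixed-effects DID baseline that such commands extend; the panel variables are hypothetical.

    * Two-way fixed-effects DID: many treated units, common adoption date
    xtset id year
    generate did = treated * post    // treated and post are hypothetical dummies
    xtreg y did i.year, fe vce(cluster id)
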
Template-Type: ReDIF-Paper 1.0
Author-Name: Austin Nichols
Author-Workplace-Name: Abt Associates
Author-Email: austinnichols@gmail.com
Author-Name: Andrew Goodman-Bacon
Author-Workplace-Name: Vanderbilt University
Author-Name: Thomas Goldring
Author-Workplace-Name: Georgia Policy Labs
Title: Bacon decomposition for understanding differences-in-differences with variation in treatment timing
Abstract: In applications of a difference-in-differences (DD) model, researchers often exploit natural experiments with variation in onset, comparing outcomes across groups of units that receive treatment starting at different times. Goodman-Bacon (2019) shows that this DD estimator is a weighted average of all possible two-group/two-period DD estimators in the data. The -bacon- command performs this decomposition and graphs all two-by-two DD estimates against their weights, displaying all the identifying variation behind the overall DD estimate. Given the widespread use of the two-way fixed-effects DD model, -bacon- has broad applicability across domains and will help researchers understand how much of a given DD estimate comes from different sources of variation.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Goodman-Bacon.pptx
Handle: RePEc:boc:scon19:46

Template-Type: ReDIF-Paper 1.0
Author-Name: Choonjoo Lee
Author-Workplace-Name: Korea National Defense University
Author-Email: bloom.rampike@gmail.com
Title: The matching problem using Stata
Abstract: The main purpose of this presentation is to discuss an algorithm for the matching problem. As an example, the K-cycle kidney exchange problem is defined and solved using a user-written Stata program.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Lee.pdf
Handle: RePEc:boc:scon19:43

Template-Type: ReDIF-Paper 1.0
Author-Name: Joseph Terza
Author-Workplace-Name: Department of Economics, Indiana University Purdue University Indianapolis
Author-Email: jvterza@iupui.edu
Title: Mata implementation of Gauss-Legendre quadrature in the M-estimation context: Correcting for sample-selection bias in a generic nonlinear setting
Abstract: Many contexts in empirical econometrics require non-closed-form integration for appropriate modeling and estimation design. Applied researchers often avoid such correct but computationally demanding specifications and opt for simpler, misspecified modeling designs. The presentation will detail a newly developed Mata implementation of a relatively simple numerical integration technique: Gauss-Legendre quadrature. Although this Mata code is applicable in a variety of circumstances, it was mainly written for use in M-estimation when the relevant objective function (e.g., the likelihood function) involves integration at the observation level. As inputs, the user supplies a vector-valued integrand function (e.g., a vector of sample log-likelihood integrands) and a matrix of upper and lower integration limits. The code outputs the corresponding vector of integrals (e.g., the vector of observation-specific log-likelihood values). To illustrate the use of this Mata implementation, we conduct an empirical analysis of classical sample-selection bias in the estimation of wage offer regressions. We estimate a nonlinear version of the model based on the modeling approach suggested by Terza (Econometric Reviews, 2009), which requires numerical integration. This model is juxtaposed with the classical linear sample-selection specification of Heckman (Annals of Economic and Social Measurement, 1976), for which numerical integration is not required.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Terza.pdf
Handle: RePEc:boc:scon19:31
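
This is not Terza's implementation (which takes vector-valued integrands and a matrix of limits); it is a minimal fixed five-node Mata sketch of Gauss-Legendre quadrature for a scalar integrand, showing the affine map of nodes from [-1,1] to [a,b].

    mata:
    real scalar gl5(pointer(real scalar function) scalar f,
                    real scalar a, real scalar b)
    {
        real rowvector x, w
        real scalar    s, i

        // Five-point Gauss-Legendre nodes and weights on [-1,1]
        x = (-.9061798459, -.5384693101, 0, .5384693101, .9061798459)
        w = ( .2369268851,  .4786286705, .5688888889, .4786286705, .2369268851)
        s = 0
        for (i=1; i<=5; i++) {
            // Map node i into [a,b], evaluate the integrand, weight, accumulate
            s = s + w[i]*(*f)((b-a)/2*x[i] + (a+b)/2)
        }
        return ((b-a)/2*s)
    }
    real scalar dens(real scalar t) return(normalden(t))
    gl5(&dens(), 0, 1)    // approximately normal(1) - normal(0) = .3413
    end
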
Template-Type: ReDIF-Paper 1.0
Author-Name: Carlos Dorantes
Author-Workplace-Name: Tec de Monterrey
Author-Email: cdorante@tec.mx
Title: A practical application of the mvport package: CAPM-based optimal portfolios
Abstract: The mvport package has commands for financial portfolio optimization and portfolio backtesting. I present a practical implementation of a CAPM-based strategy to select stocks, apply different optimization settings, and evaluate the resulting portfolios. The presentation illustrates how to automate the process through a simple do-file that makes it easy to change parameters (e.g., stock list, market index, risk-free rate) using an Excel interface. The program automates the following: a) data collection, b) CAPM model estimation for all stocks, c) selection of stocks based on CAPM parameters, d) portfolio optimization with different configurations, and e) portfolio backtesting. For data collection, the getsymbols and freduse commands are used to get online price data for all the S&P 500 stocks and the risk-free rate. For each stock, two competing CAPM models are estimated: one using a simple regression and one using an autoregressive conditional heteroscedasticity (ARCH) model. The CAPM parameters are used to select stocks. Then the mvport package is used to optimize different configurations of the portfolio. Finally, the performance of each portfolio configuration is calculated and compared with the market portfolio.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Dorantes.pdf
Handle: RePEc:boc:scon19:50
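
Illustrative only: step (b) for a single stock could look like the following, where the excess-return variables are hypothetical and mvport's own commands are not shown.

    * CAPM beta via simple OLS regression of stock on market excess returns
    regress stock_exret mkt_exret

    * Competing CAPM estimate with ARCH(1)/GARCH(1) errors
    arch stock_exret mkt_exret, arch(1) garch(1)
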
Template-Type: ReDIF-Paper 1.0
Author-Name: Tim Schmidt
Author-Workplace-Name: Discover Financial Services
Author-Email: timothyschmidt@discover.com
Title: Tools to analyze interest rates and value bonds
Abstract: Bond markets contain a wealth of information about investor preferences and expectations. However, extracting such information from market interest rates can be computationally burdensome. I introduce a suite of new Stata commands to aid finance professionals and researchers in using Stata to analyze the term structure of interest rates and value bonds. The genspot command uses a bootstrap methodology to construct a spot rate curve from a yield curve of market interest rates under a no-arbitrage assumption. The genfwd command generates a forward rate curve from a spot rate curve, allowing researchers to infer market participants’ expectations of future interest rates. Finally, the pricebond command uses forward rates to value a bond with user-specified terms.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Schmidt.pdf
Handle: RePEc:boc:scon19:21
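
The no-arbitrage logic behind forward-rate generation can be checked by hand; in the sketch below the spot rates are made-up inputs, not genfwd syntax.

    * One-year forward rate one year ahead, implied by 1- and 2-year spot rates:
    * (1+s2)^2 = (1+s1)*(1+f), so f = (1+s2)^2/(1+s1) - 1
    scalar s1 = 0.02
    scalar s2 = 0.025
    display "implied forward = " (1 + s2)^2/(1 + s1) - 1    // about .0300
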
Template-Type: ReDIF-Paper 1.0
Author-Name: Mustafa Karakaplan
Author-Email: mukarakaplan@yahoo.com
Title: Panel Stochastic Frontier Models with Endogeneity in Stata
Abstract: I introduce xtsfkk, a new Stata command for fitting panel stochastic frontier models with endogeneity. The advantage of xtsfkk is that it can control for endogenous variables in the frontier and/or in the inefficiency term in a longitudinal setting. Hence, xtsfkk performs better than standard panel frontier methodologies such as xtfrontier, which overlook endogeneity by design.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Karakaplan.pptx
Handle: RePEc:boc:scon19:53

Template-Type: ReDIF-Paper 1.0
Author-Name: Fernando Rios-Avila
Author-Workplace-Name: Levy Economics Institute
Author-Email: friosavi@levy.org
Title: Recentered Influence Functions (RIF) in Stata: RIF-Regression and RIF-Decomposition
Abstract: Recentered Influence Functions (RIF) are statistical tools popularized by Firpo, Fortin, and Lemieux (2009) for analyzing unconditional partial effects (UPE) on quantiles in a regression analysis framework (unconditional quantile regressions). The flexibility and simplicity of this tool, however, have opened the possibility of extending the analysis to other distributional statistics, using linear regressions or decomposition approaches. In this paper, I introduce three Stata commands to facilitate the use of recentered influence functions in the analysis of outcome distributions: rifvar() is an egen extension used to create RIFs for a large set of distributional statistics; rifhdreg facilitates the estimation of RIF regressions, enabling the use of high-dimensional fixed effects; and oaxaca_rif implements Oaxaca-Blinder-type decomposition analysis.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Rios-Avila.pdf
Handle: RePEc:boc:scon19:22
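
A sketch of how two of the commands might be called, with hypothetical variables; the options shown are assumptions and may differ from the package help files.

    * RIF of the Gini coefficient via the egen extension
    egen rif_gini = rifvar(wage), gini

    * RIF regression at the median with high-dimensional fixed effects
    rifhdreg wage educ exper, rif(q(50)) absorb(industry)
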
Template-Type: ReDIF-Paper 1.0
Author-Name: Thomas Zylkin
Author-Workplace-Name: University of Richmond
Author-Email: tzylkin@richmond.edu
Title: Verifying the Existence of Maximum Likelihood Estimates in Generalized Linear Models
Abstract: There has been considerable ambiguity over how to verify whether estimates from nonlinear models "exist" and what can be done if they do not. This is the so-called "separation" problem. We characterize the problem in detail across a wide range of generalized linear models and introduce a novel method for dealing with it in the presence of high-dimensional fixed effects, as are often recommended for gravity models of international trade and in other common panel data settings. We have included these methods in a new Stata command for HDFE-Poisson estimation called -ppmlhdfe-. We have also created a suite of test cases that developers may use in the future to test whether their estimation packages correctly identify instances of separation. These projects are joint with Sergio Correia and Paulo Guimaraes. We have written two papers on these topics and have also created a website with example code and data illustrating the separation issue and how we solve it. Please see our GitHub for more details: https://github.com/sergiocorreia/ppmlhdfe/
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Zylkin.pdf
Handle: RePEc:boc:scon19:47
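
A typical gravity-style call, with hypothetical variable names; ppmlhdfe follows reghdfe-style absorb() and vce() syntax.

    * Poisson pseudo-ML with high-dimensional fixed effects;
    * separated observations are detected and handled by the command
    ppmlhdfe trade ln_dist contiguity, absorb(exporter#year importer#year) vce(cluster pair)
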
Template-Type: ReDIF-Paper 1.0
Author-Name: Austin Nichols
Author-Workplace-Name: Abt Associates
Author-Email: austinnichols@gmail.com
Title: Unbiased IV in Stata
Abstract: A well-known result is that exactly identified IV has no moments, even in the ideal case of an experimental design (i.e., a randomized controlled trial with imperfect compliance). This result no longer holds when the sign of the first stage is known, however. I describe a Stata implementation of an unbiased estimator for instrumental variables models with a single endogenous regressor where the sign of one or more first-stage coefficients is known (due to Andrews and Armstrong, 2017), and its finite sample properties under alternative error structures.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Nichols.pdf
Handle: RePEc:boc:scon19:44
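
For contrast, the conventional just-identified IV fit in official Stata, with hypothetical variables; the sign restriction that delivers unbiasedness plays no role here.

    * Conventional 2SLS with one endogenous regressor d and one instrument z
    ivregress 2sls y (d = z), vce(robust)

    * Inspect the first-stage sign that the unbiased estimator exploits
    regress d z, vce(robust)
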
Template-Type: ReDIF-Paper 1.0
Author-Name: Di Liu
Author-Workplace-Name: StataCorp
Title: Using lasso and related estimators for prediction
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Liu.pdf
Creation-Date: 20190802
Handle: RePEc:boc:scon19:2

Template-Type: ReDIF-Paper 1.0
Author-Name: David Drukker
Author-Workplace-Name: StataCorp
Title: Inference after lasso model selection
Abstract: The increasing availability of high-dimensional data and increasing interest in more realistic functional forms have sparked a renewed interest in automated methods for selecting the covariates to include in a model. I discuss the promises and perils of model selection and pay special attention to estimators that provide reliable inference after model selection. I will demonstrate how to use Stata 16's new features for double selection, partialing out, and cross-fit partialing out to estimate the effects of variables of interest while using lasso methods to select control variables.
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Drukker.pdf
Creation-Date: 20190802
Handle: RePEc:boc:scon19:3
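
A minimal sketch of the three Stata 16 estimators named above, with hypothetical variables: y is the outcome, d the variable of interest, and x1-x100 the candidate controls.

    * Double selection
    dsregress y d, controls(x1-x100)

    * Partialing out and cross-fit partialing out
    poregress y d, controls(x1-x100)
    xporegress y d, controls(x1-x100)
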
Template-Type: ReDIF-Paper 1.0
Author-Name: Joseph Canner
Author-Workplace-Name: Johns Hopkins University School of Medicine, Department of Surgery
Author-Email: jcanner1@jhmi.edu
Author-Name: Hwanhee Hong
Author-Workplace-Name: Duke University Medical Center, Department of Biostatistics and Bioinformatics
Author-Name: Tianjing Li
Author-Workplace-Name: Johns Hopkins University Bloomberg School of Public Health, Department of Epidemiology
Title: Uncovering the true variability in meta-analysis results using resampling methods
Abstract: Traditionally, meta-analyses are performed using a single effect estimate from each included study, resulting in a single combined effect estimate and confidence interval. However, a number of processes can give rise to multiple effect estimates from each study, such as multiple individuals extracting study data, the use of different analysis methods for dealing with missing data or dropouts, and the use of different types of endpoints for measuring the same outcome. Depending on the number of studies and the number of possible estimates per study, the number of combinations of studies for which a meta-analysis could be performed can run into the thousands. Accordingly, meta-analysts need a tool that can iterate through all of these possible combinations (or a reasonably sized sample thereof), compute an effect estimate for each, and summarize the distribution of the effect estimates and standard errors across all combinations. We have developed a Stata command, -resmeta-, for this purpose that can generate results for 10,000 combinations in a few seconds. The command handles both continuous and categorical data, allows a variable number of estimates per study, and has options to compute a variety of different estimates and standard errors. In the presentation we will cover case studies where this approach was applied, considerations for more general application of the approach, command syntax and options, and different ways of summarizing the results and evaluating different sources of variability in the results.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Canner.pptx
Handle: RePEc:boc:scon19:28

Template-Type: ReDIF-Paper 1.0
Author-Name: Theodore Karrison
Author-Workplace-Name: University of Chicago and NRG Oncology
Author-Email: tkarrison@health.bsd.uchicago.edu
Author-Name: James Dignam
Author-Workplace-Name: University of Chicago and NRG Oncology
Title: Comparing Treatments in the Presence of Competing Risks Based on Life Years Lost
Abstract: Competing risks are frequently encountered in medical research. Examples are clinical trials in head-and-neck and prostate cancer, where deaths from cancer and deaths from other causes are competing risks. Andersen (Stat in Med, 2013) showed that the area under the cause-j cumulative incidence curve from 0 to t* can be interpreted as the number of life years lost (LYL) due to cause j before time t*. LYL can be estimated and compared in Stata using either the pseudo-observations approach described in Overgaard, Andersen, and Parner (Stata Journal, 2015) or a modification of a routine by Pepe and Mori (Stat in Med, 1993) for testing the equality of cumulative incidence curves. We describe an application of the method to the DeCIDE trial, a phase III randomized clinical trial of induction chemotherapy plus chemoradiotherapy vs. chemoradiotherapy alone in patients with locally advanced head-and-neck cancer. We present simulation results demonstrating that the pseudo-observations and Pepe-Mori approaches yield similar results. We also evaluate the power of comparing life years lost relative to standard procedures for analyzing competing risks data, including cause-specific logrank tests (Freidlin and Korn; Stat in Med, 2005) and the Fine-Gray model (Fine and Gray; JASA, 1999).
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Karrison.pptx
Handle: RePEc:boc:scon19:24

Template-Type: ReDIF-Paper 1.0
Author-Name: Ben Adarkwa Dwamena
Author-Workplace-Name: University of Michigan Medical School
Author-Email: bdwamena@med.umich.edu
Title: Hierarchical Summary ROC Analysis: A frequentist-Bayesian colloquy in Stata
Abstract: Meta-analysis of diagnostic accuracy studies requires more advanced methods than meta-analysis of intervention studies. Hierarchical or multilevel modelling accounts for the bivariate nature of the data, for both within- and between-study heterogeneity, and for threshold variability. The hierarchical summary receiver operating characteristic (HSROC) and bivariate random-effects models are currently recommended by the Cochrane Collaboration. The bivariate model focuses on estimating summary sensitivity and specificity and, as a generalized linear mixed model, is estimable in most statistical software, including Stata. The HSROC approach models the implicit threshold and diagnostic accuracy for each study as random effects and includes a shape or scale parameter that enables asymmetry in the SROC by allowing accuracy to vary with the implicit threshold. As a generalized non-linear mixed model, it has not previously been directly estimable in Stata, though it is possible with WinBUGS and SAS PROC NLMIXED, or indirectly by extrapolating its parameters from the bivariate model in Stata. This talk will demonstrate for the first time how the HSROC model can be fitted in Stata using ML programming and the recently introduced bayesmh command. Using a publicly available dataset, I will show the comparability of Stata results with those obtained with WinBUGS and SAS PROC NLMIXED.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Dwamena.pdf
Handle: RePEc:boc:scon19:48

Template-Type: ReDIF-Paper 1.0
Author-Name: Ercio Munoz
Author-Workplace-Name: CUNY Graduate Center and Stone Center on Socio-economic Inequality
Author-Email: emunozsaavedra@gc.cuny.edu
Author-Name: Salvatore Morelli
Author-Workplace-Name: CUNY Graduate Center and Stone Center on Socio-economic Inequality
Title: kmr: A Command to Correct Survey Weights for Unit Nonresponse using Group's Response Rates
Abstract: This article describes kmr, a Stata command to estimate a micro compliance function using group-level nonresponse rates, following Korinek, Mistiaen, and Ravallion (2007, Journal of Econometrics 136: 213-235), which can be used to correct survey weights for unit nonresponse. We illustrate the use of kmr with an empirical example using the Current Population Survey and state-level nonresponse rates.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Munoz.pdf
Handle: RePEc:boc:scon19:13

Template-Type: ReDIF-Paper 1.0
Author-Name: Matthew Masten
Author-Workplace-Name: Duke University
Author-Email: matt.masten@duke.edu
Author-Name: Alexandre Poirier
Author-Workplace-Name: Georgetown University
Title: tesensitivity: A Stata Package for Assessing the Unconfoundedness Assumption
Abstract: This talk will discuss a new set of methods for quantifying the robustness of treatment effects estimated under the unconfoundedness assumption (also known as selection on observables or conditional ignorability). Specifically, we estimate bounds on the ATE, the ATT, and the QTE under nonparametric relaxations of unconfoundedness indexed by a scalar sensitivity parameter c. These deviations allow for limited selection on unobservables, depending on the value of c. For large enough c, these bounds equal the no-assumptions bounds. Our methods allow for both continuous and discrete outcomes, but require discrete treatments. We implement these methods in a new Stata package, tesensitivity, for easy use in practice. We illustrate how to use this package and these methods with an empirical application to the National Supported Work Demonstration program.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Masten.pdf
Handle: RePEc:boc:scon19:51

Template-Type: ReDIF-Paper 1.0
Author-Name: Abigail S. Baldridge
Author-Workplace-Name: Northwestern University
Title: Connecting Stata and Microsoft Word using StatTag for collaborative reproducibility
Abstract: Although Stata can render output and reports to Microsoft Word, PDF, and HTML files, Stata users must sometimes transcribe statistical content into separate Microsoft Word documents (for example, documents drafted by colleagues in Word or documents that must be prepared in Word), a process that is error-prone, irreproducible, and inefficient. This talk will illustrate how StatTag (www.stattag.org), an open-source, free, and user-friendly program that we developed, addresses this problem. Since its introduction in 2016, StatTag has undergone substantial improvements and refinements. StatTag establishes a bidirectional link between Stata files and a Word document and supports a reproducible pipeline even when (1) statistical results must be included and updated in Word documents that were never generated from Stata; and (2) text in Word files generated from Stata has departed substantially from the original content, for example, through tracked changes or comments. We will demonstrate how to use StatTag to connect Stata and Word files so that all files can be edited separately, but statistical content (values, tables, figures, and verbatim output) can be updated automatically in Word. Using practical examples, we will also illustrate how to use StatTag to view, edit, and rerun Stata code directly from Word.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Baldridge.pptx
Handle: RePEc:boc:scon19:4

Template-Type: ReDIF-Paper 1.0
Author-Name: Debora Giovannelli
Author-Email: debora.giovannelli@gmail.com
Title: Postestimation Analysis with Stata by SPost13 commands of Survey Data analyzed by MNLM
Abstract: Data from a brand survey were analyzed with a regression model for nominal outcomes, also known as the Multinomial Logit Model (MNLM). The MNLM is a multivariate member of the class of Generalized Linear Models (GLM) popularized by McCullagh and Nelder (1982) and widely used in many different fields (the social sciences, biomedical sciences, epidemiology, public health, genetics, zoology, and education, but also marketing research, survey analysis, and product/process/service quality control). Interpreting these regression models requires background knowledge that is not always common, especially in business application fields. Data must be readable by anyone who has the responsibility to make serious decisions, which can strongly influence not only the business of a company but also the safety and quality of its products, processes, and services. The scope of this presentation is to show and highlight the advantages of the SPost13 commands, developed by J. Scott Long and Jeremy Freese, as very useful tools for easing the interpretation of results from this regression model for nominal response variables.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Giovannelli.pdf
Handle: RePEc:boc:scon19:38
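
Illustrative only: a minimal SPost13 workflow after a multinomial logit, with hypothetical survey variables.

    * Multinomial logit for brand choice
    mlogit brand price age female, baseoutcome(1)

    * SPost13 postestimation: coefficient listings and marginal effects
    listcoef, help
    mchange
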
Template-Type: ReDIF-Paper 1.0
Author-Name: Bob Wen
Author-Workplace-Name: Clemson University
Author-Email: shihaow@clemson.edu
Title: The Causal Effects of Wages on Labour Supply for Married Women -- Evidence from American Couples
Abstract: Using individual-level panel data from the PSID, we consistently estimate the causal effects of own wages on interior labour supply for married women who were between 17 and 55 years old in 2005 and surveyed every two years until 2015. We first discuss the representative married woman's utility maximisation problem subject to her budget constraint, which connects her husband's wages and non-labour income to her labour supply decisions through the couple relationship. Guided by the optimal hours-of-work equation and comparative statics, we start our empirical analysis with a pooled OLS, holding relevant factors constant. We then address the endogeneity problem due to sample selection by adding the selection variable (the inverse Mills ratio from a probit selection regression) to the hours-of-work equation. In addition, we control for individual heterogeneity (such as the married women's preference for work, ability, and family tradition) and for the simultaneity of labour supply and labour demand using panel data fixed-effects 2SLS with demand shifters as instruments for the endogenous variables in the labour supply equation. We find that: (1) The causal effects of wages on labour supply (the hours-wage elasticities) drop from 0.29 in the pooled OLS to 0.16 in the panel data fixed-effects 2SLS model after we account for sample selection, individual heterogeneity, and simultaneous equations bias. (2) Holding other factors constant, a 1% increase in married women's wages raises their hours of work by 0.16% on average. (3) Part-time female workers are more responsive to wage changes than their full-time counterparts. (4) There is evidence of backwards-bending labour supply curves.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Wen.pdf
Handle: RePEc:boc:scon19:27

Template-Type: ReDIF-Paper 1.0
Author-Name: Joseph Canner
Author-Workplace-Name: Johns Hopkins University School of Medicine, Department of Surgery
Author-Email: jcanner1@jhmi.edu
Author-Name: Krisztian Sebestyen
Author-Workplace-Name: Johns Hopkins University School of Medicine, Department of Surgery
Title: Fitting generalized linear models when the data exceeds available memory
Abstract: Despite the increase in random access memory (RAM) capacity and the decrease in RAM prices in the years since Stata was first released, the size of data sets has grown in recent years and can still exceed available RAM. This is particularly true for those using Stata on a personal laptop or desktop rather than an enterprise server. Accordingly, there is a need for statistical tools that can read small chunks of data from disk, perform calculations on those chunks, accumulate intermediate results, and produce final results that are the same as those obtained by performing the entire calculation in memory. The most ubiquitous statistical method is the generalized linear model (GLM), and mathematical methods have been available for many years to update the QR or Cholesky decomposition matrices with small chunks of data. Thomas Lumley's R function bigglm uses Fortran routines published by Alan J. Miller in 1992 and freely available as Algorithm AS 274. We have developed -bigglm- for Stata using the same routines, as well as expanding the library of available family and link functions. The current version can read Stata datasets as well as import data from an ODBC source. In the presentation we will discuss the limitations of the current approach and suggest areas for improvement.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Canner.pdf
Handle: RePEc:boc:scon19:49

Template-Type: ReDIF-Paper 1.0
Author-Name: Fernando Rios-Avila
Author-Workplace-Name: Levy Economics Institute
Author-Email: friosa@gmail.com
Title: Estimation of Varying Coefficient models in Stata
Abstract: Non-parametric regressions are a powerful statistical tool for modeling relationships between dependent and independent variables with minimal assumptions on the underlying functional forms. However, these models have two main weaknesses: First, their added flexibility creates a curse of dimensionality, even with a modest set of independent variables. Second, while larger samples can address this weakness, the procedures available for model selection, in particular cross-validation, are computationally intensive in large samples. An alternative is semiparametric regression modeling, which combines the flexibility of non-parametric models with the structure of standard models. In this presentation, I introduce a set of programs that estimate a semiparametric model known as the varying coefficient model. The proposed modules estimate linear models where the coefficients on the independent variables are assumed to be smooth functions of a single running variable z, using local linear kernel estimation. The current set of modules can be used to: (1) estimate the optimal bandwidth for the semiparametric model using cross-validation; (2) estimate the model at a predefined set of reference points, with three alternative standard-error estimators; (3) obtain model predictions as well as a set of diagnostic and specification tests; and (4) plot the coefficients, and their rates of change, with respect to the running variable at the selected reference points.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Rios-Avila_poster.pdf
Handle: RePEc:boc:scon19:6

Template-Type: ReDIF-Paper 1.0
Author-Name: Lakshika Tennakoon
Author-Workplace-Name: Division of General Surgery, Section of Trauma & Critical Care
Author-Email: lakshika@stanford.edu
Author-Name: David Spain
Author-Workplace-Name: Division of General Surgery, Section of Trauma & Critical Care
Author-Name: Lisa M Knowlton
Author-Workplace-Name: Division of General Surgery, Section of Trauma & Critical Care
Title: Psychiatric Morbidity in Physically Injured Children and Adolescents: A National Evaluation
Abstract: Background: Mental health disorders are among the leading causes of disability worldwide. Studies have demonstrated that most adult mental health disorders begin in childhood and adolescence. Aims: We hypothesized that psychiatric disorders are common among hospitalized pediatric trauma patients and that they are associated with poor outcomes. Methods: The KIDS Inpatient Sample 2012 was queried to provide national estimates for pediatric trauma. Patients aged 1 year and above were included. Psychiatric diagnoses were defined using ICD-9-CM codes and the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition. Unadjusted and adjusted analyses were performed using Stata 15.2. Results: Of the 6.7 million children and adolescents admitted to hospital in 2012, 141,561 (2.12%) had a primary diagnosis of trauma, and 17.3% of those (n=23,312) had a psychiatric diagnosis. Patients with a psychiatric disorder were older than patients without one (mean age: 16.3 vs 12.2 years, p<0.001) and were more often male (76.1% vs 68%, p<0.001) and white (58.2% vs 54%, p<0.001). The highest prevalence of psychiatric disorders (58.1%) was reported in the 15-19 years age group. Patients with a psychiatric diagnosis had multiple injuries (44.3%), isolated extremity fractures (18%), isolated other injuries (15%), and head injuries (14.8%). Overall mortality was lower for injured patients with a psychiatric disorder in both unadjusted and adjusted analyses (0.6 vs 1.3; aOR=0.98, p<0.001). Conclusion: Psychiatric diagnoses are surprisingly common among pediatric trauma patients. Increased vigilance and counseling are needed for this population.
Creation-Date: 20190802
File-URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Tennakoon.pdf
Handle: RePEc:boc:scon19:41