Template-Type: ReDIF-Paper 1.0
Title: Resultssets in resultsframes in Stata 16-plus
File-URL: http://repec.org/lsug2022/uk2022_newson.pdf
File-URL: http://repec.org/lsug2022/uk2022_newson_examples.zip
Author-Name: Roger Newson
Author-Workplace-Name: Cancer Prevention Group, School of Cancer & Pharmaceutical Sciences, King's College London
Author-Person: pne37
Abstract: A resultsset is a Stata dataset created as output by a Stata command. It may be listed and/or saved in a disk file and/or written over an existing dataset in memory, and/or (in Stata Versions 16 or higher) written to a data frame (or resultsframe) in memory, without damaging any existing data frames. Commands creating resultssets include parmest, parmby, xcontract, xcollapse, descsave, xsvmat, and xdir. Commands useful for processing resultsframes include xframeappend, fraddinby, and invdesc. We survey the ways in which resultsset processing has been changed by resultsframes.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:01

Template-Type: ReDIF-Paper 1.0
Title: A suite of Stata programs for analysing simulation studies
File-URL: http://repec.org/lsug2022/uk2022_marley-zagar.pptx
Author-Name: Ella Marley-Zagar
Author-Workplace-Name: MRC Clinical Trials Unit at UCL, London, UK
Author-Name: Ian R. White
Author-Workplace-Name: MRC Clinical Trials Unit at UCL, London, UK
Author-Person: pwh62
Author-Name: Tim P. Morris
Author-Workplace-Name: MRC Clinical Trials Unit at UCL, London, UK
Abstract: Simulation studies are used in a variety of disciplines to evaluate the properties of statistical methods. Simulation studies involve creating data by random sampling, typically from known probability distributions, with the aim of assessing the robustness and accuracy of new statistical techniques by comparing them to some known truth. We introduce the siman suite for the analysis of simulation results, a set of Stata programs that offer data manipulation, analysis and graphics to process, explore and visualise the results of simulation studies.
 siman expects a sensibly structured dataset of simulation study estimates, with input variables being in ‘long’ or ‘wide’ format, string or numeric. The estimates data can be reshaped by siman reshape to enable data exploration.
 The key commands include siman analyse to estimate and tabulate performance; graphs to explore the estimates data (siman scatter, siman swarm, siman zipplot, siman blandaltman, siman comparemethodsscatter); and a variety of graphs to visualise the performance measures (siman nestloop, siman lollyplot, siman trellis) in the form of scatter plots, swarm plots, zip plots, Bland–Altman plots, nested-loop plots, lollyplots and trellis graphs (see Morris et al., 2019).
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:02

Template-Type: ReDIF-Paper 1.0
Title: Cook’s distance measures for panel data models
File-URL: http://repec.org/lsug2022/uk2022_vincent.pdf
Author-Name: David Vincent 
Author-Workplace-Name: David Vincent Economics
Abstract: Influential observations in regression analysis are datapoints whose deletion has a large impact on the estimated coefficients. The usual diagnostics for assessing the influence of each datapoint are designed for least squares regression and independent observations and are not appropriate when estimating panel data models.
 The purpose of this presentation is to describe a new command, cooksd2, which extends the traditional Cook’s (1977) distance measure to determine the influence of each datapoint when applying the fixed, random and between-effects regression estimators. The approach is based on the framework developed by Christensen, Pearson and Johnson (1992) and also reports the influence of an entire subject or group of datapoints, following the methods described by Banerjee & Frees (1997).
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:03

Template-Type: ReDIF-Paper 1.0
Title: Bayesian multilevel modeling
File-URL: http://repec.org/lsug2022/uk2022_marchenko.pdf
Author-Name: Yulia Marchenko
Author-Workplace-Name: StataCorp
Abstract: In multilevel or hierarchical data, which include longitudinal, cross-sectional, and repeated-measures data, observations belong to different groups. Groups may represent different levels of hierarchy such as hospitals, doctors nested within hospitals, and patients nested within doctors nested within hospitals. Multilevel models incorporate group-specific effects in the regression model and assume that they vary randomly across groups according to some a priori distribution, commonly a normal distribution. This assumption makes multilevel models natural candidates for Bayesian analysis. Bayesian multilevel models additionally assume that other model parameters such as regression coefficients and variance components (variances of group-specific effects) are also random.
 In this presentation, I will discuss some of the advantages of Bayesian multilevel modeling over classical frequentist estimation. I will cover some basic random-intercept and random-coefficients modeling using the bayes: mixed command. I will then demonstrate more advanced model fitting by using the new-in-Stata-17 multilevel syntax of the bayesmh command, including multivariate and nonlinear multilevel models.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:04

Template-Type: ReDIF-Paper 1.0
Title: Bias-corrected estimation of linear dynamic panel data models 
File-URL: http://repec.org/lsug2022/uk2022_kripfganz.pdf
Author-Name: Sebastian Kripfganz
Author-Workplace-Name: University of Exeter Business School
Author-Person: pkr246
Author-Name: Jörg Breitung
Author-Workplace-Name: University of Cologne
Author-Person: pbr526
Abstract: In the presence of unobserved group-specific heterogeneity, the conventional fixed-effects and random-effects estimators for linear panel data models are biased when the model contains a lagged dependent variable and the number of time periods is small. We present a computationally simple bias-corrected estimator with attractive finite-sample properties, which is implemented in our new xtdpdbc Stata package. The estimator relies neither on instrumental variables nor on specific assumptions about the initial observations. Because it is a method-of-moments estimator, standard errors are readily available from asymptotic theory. Higher-order lags of the dependent variable can be accommodated as well. A useful test for the correct model specification is the Arellano–Bond test for residual autocorrelation. The random-effects versus fixed-effects assumption can be tested using a Hansen overidentification test or a generalized Hausman test. The user can also specify a hybrid model, in which only a subset of the exogenous regressors satisfies a random-effects assumption.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:05

Template-Type: ReDIF-Paper 1.0
Title: Impact of proximity to gas production activity on birth outcomes across the US
File-URL: http://repec.org/lsug2022/uk2022_baum.pdf
Author-Name: Christopher F. Baum
Author-Workplace-Name: Boston College
Author-Person: pba1
Author-Name: Hailee Schuele
Author-Workplace-Name: Boston College
Author-Name: Philip J. Landrigan
Author-Workplace-Name: Boston College
Author-Name: Summer Sherburne Hawkins
Author-Workplace-Name: Boston College
Abstract: Despite mounting evidence on the health effects of natural gas development (NGD), including hydraulic fracturing (“fracking”), existing research has been constrained to high-producing states, limiting generalizability. We examined the impacts of prenatal exposure to NGD production activity in all gas-producing US states on birth outcomes overall and by race/ethnicity. Mata routines were developed to link 185,376 NGD production facilities in 28 US states and their distance-weighted monthly output with county population centroids via geocoding. These data were then merged with 2005–2018 county-level microdata natality files on 33,849,409 singleton births from 1,984 counties in 28 states, using nine-month county-level averages of NGD production by both conventional and unconventional production methods, based on month/year of birth.
  Linear regression models were estimated to examine the impact of prenatal exposure to NGD production activity on birth weight and gestational age, while logistic regression models were used for the dichotomous outcomes of low birth weight (LBW), preterm birth, and small for gestational age (SGA). Overall, prenatal exposure to NGD production activity increased adverse birth outcomes. We found that a 10% increase in NGD production in a county decreased mean birth weight by 1.48 grams. A significant interaction by race/ethnicity revealed that a 10% increase in NGD production decreased birth weight for infants born to Black women by 10.19 grams and Asian women by 2.76 grams, with no significant reductions in birth weight for infants born to women from other racial/ethnic groups. Although effect sizes were small, results were highly consistent. NGD production decreases infant birth weight, particularly for those born to minoritized mothers.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:06

Template-Type: ReDIF-Paper 1.0
Title: Estimating Compulsory Schooling Impacts on Labour Market Outcomes in Mexico using Fuzzy Regression Discontinuity Design (RDD) with parametric and non-parametric analyses
File-URL: http://repec.org/lsug2022/uk2022_leon1.pdf
Author-Name: Erendira Leon Bravo
Author-Workplace-Name: University of Westminster
Abstract: This study estimates the impacts on labour market outcomes of the 1993 compulsory schooling reform in Mexico. A well-known problem in this analysis is the endogeneity between schooling and labour market outcomes due to unobservable characteristics that could jointly determine them. There is also heterogeneity in the empirical evidence on the effectiveness of such schooling policies among developing and developed countries, perhaps due to the different contexts and identification strategies used. Some studies use Instrumental Variables (IV) and Difference in differences (D-i-D) methods to tackle endogeneity issues. Most analyses use a Regression Discontinuity Design (RDD) approach with polynomials of different orders in the year of birth (i.e., cubic or quartic order), whereas few studies use months of birth, which give more accurate and robust estimates because they allow more schooling variation within a year.
 The impact of the Mexican policy is analysed in this study through a fuzzy RDD approach with the use of Stata for the period 2009 to 2017. It addresses endogeneity by exploiting the age cohort discontinuities in months of birth, for more robust estimation, as an exogenous source of education variation. Fuzzy RDD then compares schooling and labour market outcomes between the birth cohorts exposed and those not exposed to the reform. The fuzziness accounts for the imperfect compliance by using the random assignment of the exposure to the policy.
 Stata allows the plotting of discontinuity graphs between cohorts, as well as the McCrary test, to validate the use of this methodology. It also facilitates parametric and non-parametric analyses. The empirical evidence suggests that the 1993 compulsory schooling law, although raising average school attendance, was an insufficient policy to impact labour market outcomes in Mexico. The analysis contributes to the limited literature on the returns to compulsory schooling that uses a rigorous RDD methodology in developed and developing countries.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:07

Template-Type: ReDIF-Paper 1.0
Title: Bias Adjusted Three Step Latent Class Analysis Using R and the gsem Command in Stata
File-URL: http://repec.org/lsug2022/uk2022_tompsett.pdf
Abstract: In this presentation we will describe a means to perform bias adjusted latent class analysis using three step methodology. This method is often performed using MPLUS, LATENT GOLD, or specific functions in Stata. Here we will describe a novel means to perform this analysis using the poLCA package in R to perform the first two steps, and the gsem command in Stata to perform the third step. This methodology is applied to a case study involving performing causal analysis by integrating inverse probability of treatment weights into the methodology. We will also demonstrate how to obtain estimates of the average causal effect of exposure on a latent class using the margins command with robust standard errors. Our aim is to broaden awareness of three step latent class methods and causal analysis, and offer means to perform this methodology for users of R, for which there currently is little software available.
Author-Name: Daniel Tompsett 
Author-Workplace-Name: University College London
Author-Name: Bianca De Stavola
Author-Workplace-Name: University College London
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:08

Template-Type: ReDIF-Paper 1.0
Title: Distributed Lag Non-Linear Models (DLNMs) in Stata
File-URL: http://repec.org/lsug2022/uk2022_tobias.pptx
Author-Name: Aurelio Tobias 
Author-Workplace-Name: Spanish Research Council (CSIC), Barcelona, Spain
Author-Person: pto220
Author-Name: Ben Armstrong
Author-Workplace-Name: Spanish Research Council (CSIC), Barcelona, Spain
Author-Name: Antonio Gasparrini
Author-Workplace-Name: Spanish Research Council (CSIC), Barcelona, Spain
Abstract: The distributed lag non-linear models (DLNMs) represent a modelling framework to flexibly describe associations showing potentially non-linear and delayed effects in time-series data. This methodology rests on the definition of a crossbasis, a bi-dimensional functional space combining two sets of basis functions, which specify the relationships in the dimensions of predictor and lags, respectively. DLNMs have been widely used in environmental epidemiology to investigate the short-term associations between environmental exposures, such as weather variables or air pollution, and health outcomes, such as mortality counts or disease-specific hospital admissions. We implemented the DLNMs framework in Stata through the crossbasis command to generate the basis variables that can be fitted in a broad range of regression models. In addition, the post estimation commands crossbgraph and crossbslices allow interpreting the results, emphasizing graphical representation, after the regression model fit. We present an overview of the capabilities of these new user-developed commands and describe the practical steps to fit and interpret DLNMs with an example of real data to represent the relationship between temperature and mortality in London during the period 2002-2006.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:09

Template-Type: ReDIF-Paper 1.0
Title: Advanced Data visualizations with Stata: Part III
File-URL: http://repec.org/lsug2022/uk2022_naqvi.pdf
Author-Name: Asjad Naqvi
Author-Workplace-Name: Austrian Institute for Economic Research (WIFO)
Author-Person: pna493
Abstract: The presentation will showcase recent developments in complex data visualizations with Stata. These include various types of polar plots, for example, spider plots, sunburst charts, circular bar graphs, and various visualizations with spatial data, including bi-variate maps, gridded waffle charts, and map clippings. Updates for several Stata packages including joyplot, bimap, streamplot, and clipgeo will be presented and suggestions for improving Stata’s graph capabilities will be discussed.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:10

Template-Type: ReDIF-Paper 1.0
Title: Grinding axes: Axis scales, labels and ticks
File-URL: http://repec.org/lsug2022/uk2022_cox.pptx
Author-Name: Nick Cox
Author-Workplace-Name: Durham University, UK
Author-Person: pco34
Abstract: This is a round-up of not quite utterly obvious tips and tricks for graph axes, using both official and community-contributed commands. Ever needed
 a logarithmic scale but found default labels undesirable?
 a slightly non-standard scale such as logit, reciprocal or root?
 a tick to be suppressed?
 labels between ticks, not at them?
 automagic choice of “nice” labels under your control?
 Community-contributed commands mentioned will include mylabels, myticks, nicelabels, niceloglabels, qplot and transplot.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:11

Template-Type: ReDIF-Paper 1.0
Title: Exchangeably weighted bootstrap schemes
File-URL: http://repec.org/lsug2022/uk2022_vankerm-handout.pdf
File-URL: http://repec.org/lsug2022/uk2022_vankerm-slides.pdf
Author-Name: Philippe van Kerm
Author-Workplace-Name: LISER and University of Luxembourg
Author-Person: pva19
Abstract: The exchangeably weighted bootstrap is one of the many variants of bootstrap resampling schemes. Rather than directly drawing observations with replacement from the data, weighted bootstrap schemes generate vectors of replication weights to form bootstrap replications. Various ways to generate the replication weights can be adopted and some choices bring practical computational advantages. This talk demonstrates how easily such schemes can be implemented and where they are particularly useful, and introduces the exbsample command which facilitates their implementation.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:12

Template-Type: ReDIF-Paper 1.0
Title: Improving fitting and predictions for flexible parametric survival models
File-URL: http://repec.org/lsug2022/uk2022_lambert.html
Author-Name: Paul Lambert
Author-Workplace-Name: University of Leicester, UK 
Author-Workplace-Name: Karolinska Institutet, Sweden
Abstract: Flexible parametric survival models have been available in Stata since 2000 with Patrick Royston’s stpm command. I developed stpm2 in 2008, which added various extensions. However, the command is old and does not take advantage of some of the features Stata has added over the years. I will introduce stpm3, which has been completely rewritten and adds a number of useful features, including:
 Full support for factor variables (including for time-dependent effects).
 Use of extended functions within a varlist. Incorporate various functions (splines, fractional polynomial functions, etc.) directly within a varlist. These also work when including interactions and time-dependent effects.
 Easier and more intuitive predictions. These fully synchronize with the extended functions, making predictions for complex models with multiple interactions/non-linear effects incredibly simple. Make predictions for specific covariate patterns and perform various types of contrasts.
 Directly save predictions to one or more frames. This separates the data used for analysis from the data used for predictions.
 Obtain various marginal estimates using standsurv. This synchronizes with stpm3 factor variables and extended functions, making marginal estimates much easier and less prone to user mistakes for complex models.
 Model on the log(hazard) scale.
 Do all the above for standard survival models, competing risk models, multistate models and relative survival models all within the same framework.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:13

Template-Type: ReDIF-Paper 1.0
Title: sttex – a new dynamic document command for Stata and LaTeX
File-URL: http://repec.org/lsug2022/uk2022_jann.pdf
Author-Name: Ben Jann
Author-Workplace-Name: University of Bern
Author-Person: pja61
Abstract: In this talk, I will introduce a new command for processing a dynamic LaTeX document in Stata, i.e., a document containing both LaTeX paragraphs and Stata code. A key feature of the new command is that it tracks changes in the Stata code and executes the code only when needed, allowing for an efficient workflow. The command is useful for creating automated statistical reports, writing articles with data analysis, preparing slides for a methods course or a conference talk, or even writing a complete textbook with examples of applications.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:14

Template-Type: ReDIF-Paper 1.0
Title: Custom estimation tables
File-URL: http://repec.org/lsug2022/uk2022_pitblado.pdf
Author-Name: Jeff Pitblado
Author-Workplace-Name: StataCorp
Abstract: This presentation illustrates how to construct custom tables from one or more estimation commands. I demonstrate how to add custom labels for significant coefficients and make targeted style edits to cells in the table using the following commands:
 collect get,
 collect dir,
 collect dims, 
 collect levelsof,
 collect label list,
 collect label values,
 collect layout,
 collect query header,
 collect style header,
 collect style showbase,
 collect style row,
 collect style cell,
 collect query column,
 collect style column,
 collect stars,
 collect query,
 column collect preview,
 etable.
 I begin with a description of what constitutes a collection and how items (numeric and string results) in a collection are tagged (identified) and conclude with a simple workflow to enable users to build their own custom tables from estimation commands. This presentation motivates the construction of estimation tables and concludes with the convenience command etable.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:15

Template-Type: ReDIF-Paper 1.0
Title: The Impact of a Government Pay Reform in Mexico on the Public Sector Wage Gap
File-URL: http://repec.org/lsug2022/uk2022_leon-bravo.pdf
Author-Name: Erendira Leon Bravo
Author-Workplace-Name: University of Westminster
Author-Name: Barry Reilly
Author-Workplace-Name: University of Sussex
Author-Person: pre338
Abstract: This study exploits the 2018 Federal Pay Reform on the Remuneration of Public Servants in Mexico to estimate its impact on the public-private sector wage gap across the unconditional wage distribution in a developing country context. This policy uses both payment cuts and freezes for public sector workers.
 Using cross-sectional data from 2017 to 2019, both the mean and unconditional quantile (UQ) regression models within a Difference-in-Differences (D-i-D) framework are estimated. Stata allows the use of UQ regressions based on the Re-centred Influence Function (RIF) to centre the IF around the statistic of interest (e.g., the population mean ‘µ’, E[Y]) and not zero (i.e., re-weighting the observations) for generating the RIF-quantiles. The RIF average effects are interpreted at different quantiles of the unconditional wage distribution (e.g., the 5th, 95th percentiles or other intermediate quantiles).
 Then, the D-i-D approach implemented through Stata provides the effects of the reform before and after the policy intervention. It also deals with the endogeneity of employment selection by taking into account the differences in the unobservable effects of the public-private employment sector selection pre-treatment and post-treatment; such unobservables are differenced out to mitigate the concerns about potential selection bias.
 Robustness checks are also executed with Stata, such as cohort fixed effects with a pseudo-panel dataset, a two-step model within a Heckman framework, the Hansen J-statistic to test orthogonality, an IV-based model, an individual-level fixed effects (FE) model with a panel dataset, and a placebo-in-time test.
 Although there is some evidence that public sector employees anticipated the introduction of the policy, it reduced the public sector pay gap strongly among the lower-paid workers of the unconditional pay distribution. The UQ effects of this policy change on the public–private sectoral wage gap contribute to the limited literature for both developed and developing countries.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:16

Template-Type: ReDIF-Paper 1.0
Title: mixrandregret: A command for fitting mixed random regret minimization models using Stata
File-URL: http://repec.org/lsug2022/uk2022_gutierrez-vargas.pdf
Author-Name: Álvaro A. Gutiérrez-Vargas
Author-Workplace-Name: Research Centre for Operation Research and Statistics (ORSTAT), KU Leuven
Author-Name: Ziyue Zhu 
Author-Workplace-Name: Research Centre for Operation Research and Statistics (ORSTAT), KU Leuven
Author-Name: Martina Vandebroek
Author-Workplace-Name: Research Centre for Operation Research and Statistics (ORSTAT), KU Leuven
Abstract: Stata has a strong suite of survey data-analysis references and tools and remains the primary choice for researchers working with survey data. On the other hand, R is the primary choice for data visualization in many academic papers, given its flexibility, especially when using the ggplot2 package based on the design philosophy of The Grammar of Graphics. An unfulfilled need for many researchers is innovatively presenting survey data-analysis results without feeling limited by working within one statistical software only. This presentation discusses a workflow of using Stata for analysis and exporting the results through the postfile commands, then handing the data off to R to create a rich array of figures. As a proof of concept, the presentation will show results from an ongoing health economics research project from the Philippines of around 200,000 observations from national income and expenditure survey data to create publication-quality dumbbell plots, concentration curves, and Pen’s parades. Finally, the presentation will briefly describe how to share code and results in a public repository like Github.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:17

Template-Type: ReDIF-Paper 1.0
Title: Illuminating the factor and dependence structure in large panel models
File-URL: http://repec.org/lsug2022/uk2022_ditzen.pdf
Author-Name: Jan Ditzen
Author-Workplace-Name: Free University of Bozen-Bolzano
Author-Person: pdi434
Abstract: In panel models, a precise understanding of the number of common factors and of dependence across the cross-sectional dimension is key for any applied work. This talk will give an overview of how to estimate the number of common factors and how to test for cross-sectional dependence. It does so by presenting two community-contributed commands: xtnumfac and xtcd2. xtnumfac implements 10 different methods to estimate the number of factors, among them the popular methods by Bai & Ng (2002) and Ahn & Horenstein (2013). The degree of cross-section dependence is investigated using xtcd2. xtcd2 implements three different tests for cross-section dependence, based on Pesaran (2015), Juodis & Reese (2021) and Pesaran & Xie (2021). The talk includes a review of the theory, a discussion of the commands and empirical examples.
Creation-Date: 20220910
Handle: RePEc:boc:lsug22:18