Template-Type: ReDIF-Paper 1.0 Title: Balance and variance inflation checks for completeness-propensity weights File-URL: http://repec.org/lsug2024/UK24_Newson.pdf File-URL: http://repec.org/lsug2024/UK24_Newson_example1.do Author-Name: Roger Newson Author-Workplace-Name: Cancer Prevention Group, School of Cancer & Pharmaceutical Sciences, King's College London Author-Person: pne37 Author-Email: roger.newson@kcl.ac.uk Abstract: Inverse treatment-propensity weights are a standard method for adjusting for predictors of exposure to a treatment. As a treatment-propensity score is a balancing score, it makes sense to do balance checks on the corresponding treatment-propensity weights. It is also a good idea to do variance-inflation checks, to estimate how much the propensity weights might inflate the variance of an estimated treatment effect, in the pessimistic scenario in which the weights are not really necessary. In Stata, the SSC package somersd can be used for balance checks, and the SSC package haif can be used for variance-inflation checks. It is argued that balance and variance-inflation checks are also necessary in the case of completeness-propensity weights, which are intended to remove imbalance in predictors of completeness between the subsample with complete data and the full sample of subjects with complete or incomplete data. However, the usage of somersd, scsomersd, and haif must be modified, because we are removing imbalance between the complete sample and the full sample, instead of between the treated subsample and the untreated subsample. An example will be presented, from a clinical trial in which the author was involved, and in which nearly a quarter of randomized subjects had no final outcome data. A post-hoc sensitivity analysis is presented, using inverse-completeness-probability weights.
Creation-Date: 20240916 Handle: RePEc:boc:lsug24:01 Template-Type: ReDIF-Paper 1.0 Title: Using GitHub for collaborative analysis File-URL: http://repec.org/lsug2024/UK24_Middleton-Dalby.pptx Author-Name: Chloe Middleton-Dalby Author-Workplace-Name: Adelphi Real World Author-Email: chloe.middleton-dalby@adelphigroup.com Author-Name: Liane Gillespie-Akar Author-Workplace-Name: Adelphi Real World Abstract: Recent trends have led to an increased importance being placed upon formal quality control processes for analysis conducted within the pharmaceutical industry and beyond. While a key feature of Stata is reproducibility through do-files and automated reporting, there are limited built-in tools for version control, code review, and collaborative analysis. Git is a distributed version control system, widely used by software development teams for collaborative programming, change tracking, and enforcement of best practices. Git keeps a record of all changes to a codebase over time, providing the ability to easily revert to a previous state, manage temporary branches, and combine code written by multiple people. Services such as GitHub build on the Git framework, providing tools to conduct code review, host source files, and manage projects. We present an overview of Git and GitHub, and explain how we use it for Stata projects at Adelphi Real World: an organisation specialising in the collection and analysis of real-world healthcare data from physicians, patients, and caregivers. We share an example project to outline the benefits of code review for both data integrity and as a training tool. We also discuss how, through implementing a software-development-like approach to the creation of ado-files, we can enhance the process of creating new programs in Stata and gain confidence in the robustness and quality of our commands.
Creation-Date: 20240916 Handle: RePEc:boc:lsug24:02 Template-Type: ReDIF-Paper 1.0 Title: My favorite overlooked life savers in Stata File-URL: http://repec.org/lsug2024/UK24_Ditzen.pdf Author-Name: Jan Ditzen Author-Workplace-Name: Free University of Bozen-Bolzano Author-Person: pdi434 Abstract: Everyone loves a good testing, estimation, or graphical community-contributed package. However, a successful empirical project relies on many small, overlooked but priceless programs. In this talk I will present three of my personal life savers. 1. adotools: adotools has four main uses. It allows the user to create and maintain a library of adopaths. Paths can be dynamically added to and removed from a running Stata session. When removing an ado-path, all ado-programs located in the folder are cleared from memory. adotools can also reset all user-specified adopaths. 2. psimulate2: Ever wanted to run Monte Carlo simulations in parallel? With psimulate2 you can, and there are (almost) no setup costs at all. psimulate2 splits the number of repetitions into equal chunks, spreads them over multiple instances of Stata, and reduces the time to run Monte Carlo simulations. It also allows macros to be returned and can save and append simulation results directly into a .dta file or frame. It can be run on Windows, Unix, and Mac. 3. xtgetpca: Extracting principal components in panel data is common; however, no Stata solution exists. xtgetpca fills this gap. It allows for different types of standardization, removal of fixed effects, and unbalanced panels. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:03 Template-Type: ReDIF-Paper 1.0 Title: Professional statistical development: What, why, and how File-URL: http://repec.org/lsug2024/UK24_Marchenko.pdf Author-Name: Yulia Marchenko Author-Workplace-Name: StataCorp Abstract: In this presentation, I will talk about professional statistical software development in Stata and the challenges of producing and supporting a statistical software package.
I will share some of my experience on how to produce high-quality software, including verification, certification, and reproducibility of the results, and on how to write efficient and stable Stata code. I will also discuss some of the aspects of commercial software development such as clear and comprehensive documentation, consistent specifications, concise and transparent output, extensive error checks, and more. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:04 Template-Type: ReDIF-Paper 1.0 Title: Stata, medical statistics, and me... File-URL: http://repec.org/lsug2024/UK24_DeStavola.pdf Author-Name: Bianca de Stavola Author-Workplace-Name: University College London Abstract: In this talk, I will use personal recollections to revisit the challenges many public health researchers have faced since the birth of Stata in 1985. I will discuss how, from the 1990s onwards, the increasing demands for data management and analysis were met by Stata developers and the broader Stata community, particularly Michael Hills. Additionally, I will review how Stata's expansion in scope and capacity with each new version has enhanced our ability to train new generations of medical statisticians and epidemiologists. Finally, I will reflect on current and future challenges. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:05 Template-Type: ReDIF-Paper 1.0 Title: Estimating the wage premia of refugee immigrants File-URL: http://repec.org/lsug2024/UK24_Baum.pdf Author-Name: Christopher F. Baum Author-Workplace-Name: Boston College Author-Person: pba1 Author-Email: baum@bc.edu Author-Name: Hans Lööf Author-Workplace-Name: Royal Institute of Technology, Stockholm Author-Person: plf1 Author-Name: Andreas Stephan Author-Workplace-Name: Linnaeus University Author-Person: pst185 Author-Name: Klaus F.
Zimmermann Author-Workplace-Name: UNU-MERIT, Maastricht University Author-Person: pzi13 Abstract: In this case study, we examine the wage earnings of fully employed previous refugee immigrants in Sweden. Using administrative employer-employee data from 1990 onwards, about 100,000 refugee immigrants who arrived between 1980 and 1996 and were granted asylum are compared to a matched sample of native-born workers using coarsened exact matching. Applying recentered influence function (RIF) quantile regressions to wage earnings for the period 2011–2015, the occupational-task-based Oaxaca–Blinder decomposition approach shows that refugees perform better than natives at the median wage, controlling for individual and firm characteristics. The RIF-quantile approach provides better insights for the analysis of these wage differentials than the standard regression model employed in earlier versions of the study. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:06 Template-Type: ReDIF-Paper 1.0 Title: compmed: A new command for estimating causal mediation effects with non-adherence to treatment allocation File-URL: http://repec.org/lsug2024/UK24_Ster.pptx Author-Name: Anca Chis Ster Author-Workplace-Name: King's College London Author-Name: Sabine Landau Author-Workplace-Name: King's College London Author-Name: Richard Emsley Author-Workplace-Name: King's College London Abstract: In clinical trials, a standard intention-to-treat analysis will unbiasedly estimate the causal effect of treatment offer, but ignores the impact of participant non-adherence. To account for this, one can estimate a complier-average causal effect (CACE), the average causal effect of treatment receipt in the principal stratum of participants who would comply with their randomisation allocation. Evaluating how interventions lead to changes in the outcome (the mechanism) is also key for the development of more effective interventions.
A mediation analysis aims to decompose a total treatment effect into an indirect effect, one that operates via changing the mediator, and a direct effect. To identify mediation effects with non-adherence, it has been shown that the CACE can be decomposed into a direct effect, the Complier-Average Natural Direct Effect (CANDE), and a mediated effect, the Complier-Average Causal Mediated Effect (CACME). These can be estimated with linear Structural Equation Models (SEMs) with Instrumental Variables. However, obtaining estimates of the CACME and CANDE in Stata requires (1) correct fitting of the SEM in Stata and (2) correct identification of the pathways that correspond to the CACME and CANDE. To address these challenges, we introduce a new command, compmed, which allows users to perform the relevant SEM fitting for estimating the CACME and CANDE using a single, more intuitive, and user-friendly interface. compmed requires the user to specify only the continuous outcome, continuous mediator, treatment receipt, and randomisation variables. Estimates, standard errors, and 95% confidence intervals are reported for all effects. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:07 Template-Type: ReDIF-Paper 1.0 Title: Causal Mediation File-URL: http://repec.org/lsug2024/UK24_MacDonald.pdf Author-Name: Kristin MacDonald Author-Workplace-Name: StataCorp Abstract: Causal inference aims to identify and quantify a causal effect. With traditional causal inference methods, we can estimate the overall effect of a treatment on an outcome. When we want to better understand a causal effect, we can use causal mediation analysis to decompose the effect into a direct effect of the treatment on the outcome and an indirect effect through another variable, the mediator. Causal mediation analysis can be performed in many situations—the outcome and mediator variables may be continuous, binary, or count, and the treatment variable may be binary, multivalued, or continuous.
In this talk, I will introduce the framework for causal mediation analysis and demonstrate how to perform this analysis with the mediate command, which was introduced in Stata 18. Examples will include various combinations of outcome, mediator, and treatment types. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:08 Template-Type: ReDIF-Paper 1.0 Title: Imputation when data cannot be pooled File-URL: http://repec.org/lsug2024/UK24_Orsini.pdf Author-Name: Nicola Orsini Author-Workplace-Name: Karolinska Institutet Author-Person: por11 Author-Email: nicola.orsini@ki.se Author-Name: Robert Thiesmeier Author-Workplace-Name: Karolinska Institutet Author-Name: Matteo Bottai Author-Workplace-Name: Karolinska Institutet Abstract: Distributed data networks are increasingly used to study human health across different populations and countries. Analyses are commonly performed at each study site to avoid the transfer of individual data between study sites due to legal and logistical barriers. Despite many benefits, however, a frequent challenge in such networks is the absence of key variables of interest at one or more study sites. Current imputation methods require the availability of individual data from the involved studies to impute missing data. This creates a need for methods that can impute data in one study using only information that can be easily and freely shared within a data network. To address this need, we introduce a new Stata command, mi impute from, designed to impute missing variables in a single study using a linear predictor and the related variance/covariance matrix from an imputation model fit from one or multiple external studies. In this presentation, the syntax of mi impute from will be presented along with motivating examples from health-related research.
Creation-Date: 20240916 Handle: RePEc:boc:lsug24:09 Template-Type: ReDIF-Paper 1.0 Title: Scalable high-dimensional non-parametric density estimation, with Bayesian applications File-URL: http://repec.org/lsug2024/UK24_Grant.pdf Author-Name: Robert Grant Author-Workplace-Name: BayesCamp Ltd Author-Email: robert@bayescamp.com Abstract: Few methods have been proposed for flexible, non-parametric density estimation, and they do not scale well to high-dimensional problems. We describe a new approach based on smoothed trees called the kudzu density (Grant 2022). This fits the little-known density estimation tree (Ram & Gray 2011) to a dataset and convolves the edges with inverse logistic functions, which are in the class of computationally minimal smooth ramps. New Stata commands provide tree fitting, kudzu tuning, estimates of joint, marginal and cumulative densities, and pseudo-random numbers. Results will be shown for fidelity and computational cost. Preliminary results will also be shown for ensembles of kudzu under bagging and boosting. Kudzu densities are useful for Bayesian model updating where models have many unknowns, require rapid update, datasets are large, and posteriors have no guarantee of convexity and unimodality. The input “dataset” is the posterior sample from a previous analysis. This is demonstrated with a real-life large dataset. A new command outputs code to use the kudzu prior in bayesmh evaluators, BUGS, and Stan. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:10 Template-Type: ReDIF-Paper 1.0 Title: Thirty graphical tips Stata users should know, revisited File-URL: http://repec.org/lsug2024/UK24_Cox.zip Author-Name: Nick Cox Author-Workplace-Name: Durham University, UK Author-Person: pco34 Abstract: In 2010 I gave a talk at the London meeting presenting thirty graphical tips. The display materials remain accessible on Stata's website, but are awkward to view, as they are based on a series of .smcl files.
I will recycle the title, and some of the tips, and add new ones, covering some of what you, your students, or your research team should know when coding graphics for mainstream tasks. The theme of "thirty" matches this 30th London meeting, and to a good enough approximation my 33 years as a Stata user. The talk mixes examples from official and community-contributed commands and details both large and small. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:11 Template-Type: ReDIF-Paper 1.0 Title: Fancy graphics: Small multiples carpentry File-URL: http://repec.org/lsug2024/UK24_VanKerm.pdf Author-Name: Philippe van Kerm Author-Workplace-Name: LISER and University of Luxembourg Author-Person: pva19 Abstract: Using 'small multiples' in data visualization and statistical graphics consists in combining repeated, small-sized diagrams to display variations in data patterns or associations across a series of units. Sometimes the small multiples are mere replications of identical plots, but with different plot elements highlighted. Small displays are typically arranged on a grid and the overall appearance is, as Tufte puts it, akin to the sequence of frames of a movie when ordering follows a time dimension. Creating diagrams for use in gridded 'small multiples' is easy with Stata's graphics combination commands. The grid pattern can, however, be limiting. The talk will present tips and tricks for building small multiple diagrams and illustrate some coding strategies for arranging individual frames in the most flexible way, opening up some creative possibilities of data visualization.
Creation-Date: 20240916 Handle: RePEc:boc:lsug24:12 Template-Type: ReDIF-Paper 1.0 Title: Robust testing for serial correlation in linear panel-data models File-URL: http://repec.org/lsug2024/UK24_Kripfganz.pdf Author-Name: Sebastian Kripfganz Author-Workplace-Name: University of Exeter Business School Author-Email: S.Kripfganz@exeter.ac.uk Author-Person: pkr246 Abstract: Serial correlation tests are essential parts of standard model specification toolkits. For static panel models with strictly exogenous regressors, a variety of tests are readily available. However, their underlying assumptions can be very restrictive. For models with predetermined or endogenous regressors, including dynamic panel models, the Arellano–Bond (1991, Review of Economic Studies) test is predominantly used, but it has low power against certain alternatives. While more powerful alternatives exist, they are underused in empirical practice. The recently developed Jochmans (2020, Journal of Applied Econometrics) portmanteau test yields substantial power gains when the time horizon is very short, but it can quickly lose its advantage even for time dimensions that are still widely considered small. I propose a new test based on a combination of short and longer differences, which overcomes this shortcoming and can be shown to have superior power against a wide range of stationary and nonstationary alternatives. It does not lose power as the process under the alternative approaches a random walk—unlike the Arellano–Bond test—and it is robust to large variances of the unit-specific error component—unlike the Jochmans portmanteau test. I present a new Stata command that flexibly implements these (and more) tests for serial correlation in linear error component panel-data models. The command can be run as a postestimation command after a variety of estimators, including generalized method of moments, maximum likelihood, and bias-corrected estimation.
Creation-Date: 20240916 Handle: RePEc:boc:lsug24:13 Template-Type: ReDIF-Paper 1.0 Title: The Oaxaca-Blinder decomposition in Stata: an update File-URL: http://repec.org/lsug2024/UK24_Jann.pdf File-URL: http://repec.org/lsug2024/UK24_Jann-geoplot.pdf Author-Name: Ben Jann Author-Workplace-Name: University of Bern Author-Person: pja61 Abstract: In 2008, I published the Stata command -oaxaca-, which implements the popular Oaxaca-Blinder (OB) decomposition technique. This technique is used to analyze differences in outcomes between groups, such as the wage gap by gender or race. Over the years, both the functionality of Stata and the literature on decomposition methods have evolved, so that an update of the -oaxaca- command is now long overdue. In this talk I will present a revised version of -oaxaca- that uses modern Stata features such as factor-variable notation and supports additional decomposition variants that have been proposed in the literature (e.g., reweighted decompositions or decompositions based on recentered influence functions). Creation-Date: 20240916 Handle: RePEc:boc:lsug24:14 Template-Type: ReDIF-Paper 1.0 Title: Visualisations to evaluate and communicate adverse event data in randomised controlled trials File-URL: http://repec.org/lsug2024/UK24_Phillips.pptx Author-Name: Rachel Phillips Author-Workplace-Name: Imperial College London Author-Email: r.phillips@imperial.ac.uk Abstract: Introduction: Well-designed visualisations are a powerful way to communicate information to a range of audiences. In randomised controlled trials (RCTs), where there is an abundance of complex data on harms (known as adverse events), visualisations can be a highly effective means to summarise harm profiles and identify potential adverse reactions. Trial reporting guidelines such as the CONSORT extension for harms encourage the use of visualisations for exploring harm outcomes, but research has demonstrated that their uptake is extremely low.
Methods: To improve the communication of adverse event data collected in RCTs, we developed recommendations to help trialists decide which visualisations to use to present these data. We developed Stata commands (aedot and aevolcano) to produce two of the visualisations, the volcano and dot plot, to present adverse event data with the aim of easing implementation and promoting increased uptake. Results: In this talk, using clinical examples, we will introduce and demonstrate the application of these commands. We will contrast the visual summaries produced by the volcano and dot plot with traditional non-graphical presentations of adverse event data with examples in the published literature, with the aim of demonstrating the benefits of graphical displays. Discussion: Visualisations offer an efficient means to summarise large amounts of adverse event data from RCTs, and statistical software eases the implementation of such displays. We hope that development of bespoke Stata commands to create visual summaries of adverse events will increase uptake of visualisations in this area by the applied clinical trial statistician. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:15 Template-Type: ReDIF-Paper 1.0 Title: Optimising adverse event analysis in clinical trials when dichotomising continuous harm outcomes File-URL: http://repec.org/lsug2024/UK24_Cornelius.pptx Author-Name: Victoria Cornelius Author-Workplace-Name: Imperial College London Author-Name: Odile Sauzet Author-Workplace-Name: Imperial College London Abstract: Introduction: The assessment of harm in randomized controlled trials is vital to enable a risk-benefit assessment on the intervention under evaluation. Many trials undertake regular monitoring of continuous outcomes such as laboratory measurements, for example, blood tests. Typical practice in a trial analysis is to dichotomize this type of data into abnormal/normal categories based on reference values.
Frequently, the proportions of participants with abnormal results are then compared between treatment arms using a chi-squared or Fisher’s exact test, reporting a p-value. Because dichotomization results in a substantial loss of the information contained in the outcome distribution, it increases the chance of missing an opportunity to detect signals of harm. Methods: A solution to this problem is to use the outcome distribution in each arm to estimate the between-arm difference in proportions of participants with an abnormal result. This approach has been developed by Sauzet et al. (2016), and it protects against a loss of information and retains statistical power. Results: In this talk, I will introduce the distributional approach and the associated Stata community-contributed command distdicho. I will compare the original analysis of blood test results from a small population drug trial in pediatric eczema with the results using the distributional approach and discuss inference from the trial based on these. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:16 Template-Type: ReDIF-Paper 1.0 Title: Enhancing multi-arm multi-stage (MAMS) randomised controlled trials: Implementing interim treatment selection rules with Stata’s nstage commands.
File-URL: http://repec.org/lsug2024/UK24_Choodari-Oskooei.pptx Author-Name: Babak Choodari-Oskooei Author-Workplace-Name: University of Leeds Creation-Date: 20240916 Handle: RePEc:boc:lsug24:17 Template-Type: ReDIF-Paper 1.0 Title: nmf: implementation of non-negative matrix factorisation (NMF) in Stata File-URL: http://repec.org/lsug2024/UK24_Batty1.pptx Author-Name: Jonathan Batty Author-Workplace-Name: University of Leeds Creation-Date: 20240916 Handle: RePEc:boc:lsug24:18 Template-Type: ReDIF-Paper 1.0 Title: Difference in differences using constraints in Stata File-URL: http://repec.org/lsug2024/UK24_Birch.pdf Author-Name: Colin Birch Author-Workplace-Name: APHA: Animal and Plant Health Agency Creation-Date: 20240916 Handle: RePEc:boc:lsug24:19 Template-Type: ReDIF-Paper 1.0 Title: Advanced Bayesian survival analysis with merlin and morgana File-URL: http://repec.org/lsug2024/UK24_Crowther.pdf Author-Name: Michael Crowther Author-Workplace-Name: Red Door Analytics Author-Email: michael@reddooranalytics.se Abstract: In this talk I will describe our latest work to bring advanced Bayesian survival analysis tools to Stata. Previously, we have introduced the morgana prefix command (bayesmh in disguise), which provides a Bayesian wrapper for survival models fitted with stmerlin (which is merlin’s more user-friendly wrapper designed for working with st data). We have now begun the work to sync morgana with the much more general merlin command, to allow for Bayesian multiple outcome models. Within survival analysis, multiple outcomes arise when we consider competing risks or the more general setting of multi-state processes. Using an example in breast cancer, I will show how to estimate competing risks and illness-death multi-state models within a Bayesian framework, incorporating prior information for covariate effects, and baseline hazard parameters.
Importantly, we have also developed the predict functionality to obtain a wide range of easily interpretable predictions, such as cumulative incidence functions and (restricted) life-expectancy, along with their credible intervals. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:20 Template-Type: ReDIF-Paper 1.0 Title: codefinder: optimising Stata for the analysis of large, routinely collected healthcare data File-URL: http://repec.org/lsug2024/UK24_Batty2.pptx Author-Name: Jonathan Batty Author-Workplace-Name: University of Leeds Author-Name: Marlous Hall Author-Workplace-Name: University of Leeds Abstract: Routinely collected healthcare data (including electronic healthcare records and administrative data) are increasingly available at the whole-population scale, and may span decades of data collection. These data may be analysed as part of clinical, pharmacoepidemiologic and health services research, producing insights that improve future clinical care. However, the analysis of healthcare data on this scale presents a number of unique challenges. These include the storage of diagnosis, medication and procedure codes using a number of discordant systems (including ICD-9 and 10, SNOMED-CT, Read codes, etc.) and the inherently relational nature of the data (each patient has multiple clinical contacts, during which multiple codes may be recorded). Pre-processing and analysing these data using optimised methods has a number of benefits, including minimisation of computational requirements, analytic time, carbon footprint and cost. We will focus on one of the main issues faced by the healthcare data analyst: how to most efficiently collapse multiple, disparate diagnosis codes (stored as strings across a number of variables) into a discrete disease entity, using a pre-defined code list. 
A number of approaches (including the use of Boolean logic, the inlist function, string functions, and regular expressions) will be sequentially benchmarked in a large, real-world healthcare dataset (n = 192 million hospitalisation episodes during a 12-year period; approximately 1 terabyte of data). The time and space complexity of each approach (in addition to its carbon footprint) will be reported. The most efficient strategy has been implemented in our newly developed Stata command, codefinder, which will be discussed. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:21 Template-Type: ReDIF-Paper 1.0 Title: Data-driven decision making using Stata File-URL: http://repec.org/lsug2024/UK24_Cerulli.pdf Author-Name: Giovanni Cerulli Author-Workplace-Name: CNR–IRCRES, National Research Council of Italy Author-Email: giovanni.cerulli@ircres.cnr.it Author-Person: pce40 Abstract: This presentation focuses on implementing a model in Stata for making optimal decisions in settings with multiple actions or options, commonly known as multi-action (or multi-arm) settings. In these scenarios, a finite set of decision options is available. In the initial part of the presentation, I provide a concise overview of the primary approaches for estimating the reward or value function, as well as the optimal policy within the multi-arm framework. I outline the identification assumptions and statistical properties associated with optimal policy learning estimators. Moving on to the second part, I explore the analysis of decision risk. This examination reveals that the optimal choice can be influenced by the decision maker's risk attitude, specifically regarding the trade-off between the reward conditional mean and conditional variance. The third part of the paper presents a Stata implementation of the model, accompanied by an application to real data.
Creation-Date: 20240916 Handle: RePEc:boc:lsug24:22 Template-Type: ReDIF-Paper 1.0 Title: Pattern matching in Stata: chasing the devil in the details File-URL: http://repec.org/lsug2024/UK24_Mael.pdf Author-Name: Mael Astruc-Le Souder Author-Workplace-Name: University of Bordeaux Author-Email: mael.astruc-le-souder@u-bordeaux.fr Abstract: The vast majority of quantitative statistics now have to be estimated through computer calculations. A computation script strengthens the reproducibility of these studies but requires care from researchers when writing their code to avoid various mistakes. This presentation introduces a command implementing some checks foreign to a dynamically typed language such as Stata in the context of data analysis. This command uses a new syntax, similar to switch or match expressions, to create a variable based on other variables in place of chains of 'replace' statements with 'if' conditions. More than the syntax, the real interest of this command lies in the two properties it checks for. The first one is exhaustiveness: do the stated conditions cover all the possible cases? The second one is usefulness: are all the conditions useful, or is there redundancy between branches? I borrow the idea of pattern matching from the Rust programming language and the earlier implementation in the OCaml programming language of the algorithm detailed in Maranget (2007) [1]. The command and source code are available on GitHub [2]. [1] MARANGET L. Warnings for pattern matching. Journal of Functional Programming. 2007;17(3):387–421.
doi:10.1017/S0956796807006223 [2] https://github.com/MaelAstruc/stata_match Creation-Date: 20240916 Handle: RePEc:boc:lsug24:23 Template-Type: ReDIF-Paper 1.0 Title: Relationships among recent difference-in-differences estimators and how to compute them in Stata File-URL: http://repec.org/lsug2024/UK24_Wooldridge.pdf Author-Name: Jeffrey Wooldridge Author-Workplace-Name: Michigan State University Author-Person: pwo39 Abstract: I will provide an overview of the similarities and differences among popular estimators in the context of staggered interventions with panel data, illustrating how to compute the estimates, as well as interpret them, using built-in and community-contributed Stata commands. Creation-Date: 20240916 Handle: RePEc:boc:lsug24:24