Template-Type: ReDIF-Paper 1.0
Title: Quantile regressions with multiple fixed effects
File-URL: http://repec.org/usug2023/US23_Rios-Avila.html
Author-Name: Fernando Rios-Avila 
Author-Workplace-Name: Levy Economics Institute 
Author-Person: pri214
Abstract: Quantile regression (QR) is an estimation strategy that provides richer characterizations of the relationships between dependent and independent variables. Some developments in the literature have focused on extending quantile regression analysis to include individual fixed effects in the framework of panel data, avoiding the incidental parameter problem, under different assumptions. One recent article by Machado-Santos-Silva (2019) proposed a location-scale estimator that allows for the inclusion of individual fixed effects in the framework of panel data, which permits individual effects to vary across quantiles. In this presentation, I propose an extension to this estimator that permits using any number of fixed effects, providing alternative estimators for SE beyond those suggested in Machado-Santos-Silva (2019). I also present the command mmqreg, which implements these extensions.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:01

Template-Type: ReDIF-Paper 1.0
Title: iedorep: Quickly locate reproducibility failures in Stata code
File-URL: http://repec.org/usug2023/US23_Daniels.pptx
Author-Name: Benjamin Daniels
Author-Workplace-Name: The World Bank, Development Impact Evaluation
Author-Person: pda505
Abstract: iedorep is a new Stata command in DIME Analytics ietoolkit package to check reproducibility of each line of Stata do-files. First, iedorep takes a single do-file as an argument, runs it, and stores the Stata state after each line executes. This includes the current data signature, the state of the RNG, and the state of the sort RNG. Then, it runs the do-file again, checking the state at all the same points. Finally, it reports exactly which lines (if any) have produced unstable states — quickly and accurately identifying hard-to-find reproducibility failures. This presentation will cover potential ways of using iedorep. We will discuss how it detects reproducibility errors, how it provides an efficient way to debug and check reproducibility of Stata code, and how it encourages users to write more accessible code. We will also explore how iedorep can be used in workshops and teaching activities and how it can serve as an important tool in research teams to review code and ensure project reproducibility. Finally, we will highlight areas for improvement and development challenges, such as within-loop implementation and recursive use in projects that use run or do to manage subtasks.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:02

Template-Type: ReDIF-Paper 1.0
Title: Introducing the Stata linter: A tool to produce clear and transparent Stata code
File-URL: http://repec.org/usug2023/US23_San_Martin.pdf
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: The World Bank, Development Impact Evaluation
Author-Name: Rony Rodriguez-Ramirez
Author-Workplace-Name: World Bank-DECRG
Abstract: Statistical programming code developed collaboratively is common in modern data work. However, it is also usual for people to have different coding conventions, making it challenging for one reader to quickly understand another's code and impeding transparency. This is especially true for researchers using Stata, because it does not have a widely accepted style guide and few economics graduate students are taught best practices for writing code. To tackle the problem of poor and inconsistent coding conventions in Stata, DIME Analytics recently launched a new tool: the Stata linter. The Stata linter uses the new lint Stata command to help users write good Stata code by identifying problematic code practices. It reads a Stata do-file and automatically detects coding style that makes code hard to follow or that can lead to unintended errors, following DIME Analytics' Stata style guide. This presentation will cover the main functionalities of lint, showcasing how it can be used to detect and correct bad coding practices and improve the readability and transparency of Stata do-files.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:03

Template-Type: ReDIF-Paper 1.0
Title: Heterogeneous difference-in-difference estimation
File-URL: http://repec.org/usug2023/US23_Pinzon.pdf
Author-Name: Enrique Pinzón
Author-Workplace-Name:  StataCorp
Abstract: Treatment effects might differ over time and for groups that are treated at different points in time, treatment cohorts. In Stata 18, we introduced two commands that estimate treatment effects that vary over time and cohort. For repeated cross-sectional data, we have hdidregress. For panel data, we have xthdidregress. Both commands let you graph the evolution of treatment over time. They also allow you to aggregate treatment within cohort and time and visualize these effects. I will show you how both commands work and briefly discuss the theory underlying them.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:04

Template-Type: ReDIF-Paper 1.0
Title: Generalized 2SLS procedure for Stata
File-URL: http://repec.org/usug2023/US23_Suarez_Chavarria.pdf
Author-Name: Nicolas Suarez Chavarria
Author-Workplace-Name:  Stanford University
Abstract: In this presentation, I implement code to run the generalized 2SLS procedure to estimate peer effects described in Bramoullé, Djebbari, and Fortin (2009). Identification of peer effects through social networks. Journal of econometrics, 150(1), 41-55. With this, we can estimate peer-effects models in Stata very easily; we just need to define an adjacency matrix in Mata, define our dependent variable and our exogenous variables, and then just run the regression. The program returns the standard display of a regression command with coefficients, standard errors, p-values, and so on for our endogenous and exogenous effects, and all these coefficients are also stored in e() to be used after with other postestimation commands. The program also allows us to row-normalize our adjacency matrix and to add group-level fixed effects.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:05

Template-Type: ReDIF-Paper 1.0
Title: Program monitoring of educational tablet-based interventions using topic modeling in Stata
File-URL: http://repec.org/usug2023/US23_Bahlibi.pptx
Author-Name: Abraham Bahlibi
Author-Workplace-Name: Imagine Worldwide
Abstract: Rigorous research conducted in Africa since 2015 established that onebillion's software, an award-winning tablet-based curriculum, produces meaningful impacts in literacy and numeracy (Levesque, Bardack, Chigeda 2020; Levesque, Bardack, Chigeda, Bahlibi, Winiko 2022; Pitchford, Hubber, Chigeda 2017). As these programs are scaled up, program monitoring will become critical for maintaining the quality of implementation and outcomes. International organizations have called for using text analysis as a tool for monitoring and evaluation (Wencker 2019). The present study piloted the use of text analysis to identify themes from field observations of a tablet-based program using onebillion's software for early grade learners. We collected 426 open-ended observations by field officers. We used the Stata package ldagibbs to run topic modeling/latent Dirichlet allocation (LDA). LDA clusters text documents into a user-chosen number of topics (Schwarz 2018). We anticipated that LDA would generate topics that help us more efficiently summarize field observations. LDA successfully generated topics such as faulty audio cables and how they contributed to noisier classrooms. We will receive more survey data as we scale to new sites. Pilot results suggest that LDA may be an efficient means of identifying topics otherwise difficult to identify with staff review of voluminous survey responses.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:06

Template-Type: ReDIF-Paper 1.0
Title: The longitudinal effects of disability types on incomes and employment
File-URL: http://repec.org/usug2023/US23_Millard.pdf
Author-Name: Robert Millard
Author-Workplace-Name: Stony Brook University
Abstract: This presentation studies the heterogeneous effects of disability onset on the level and composition of personal income. I use linked Canadian survey and administrative tax data to estimate the change in disaggregated income measures in the 10 years following onset. Estimates are obtained using a recent inverse weighting methodology that corrects for biases in two-way fixed effect and event study estimators. I differentiate disability based on limitations to daily activities, constructing three aggregate types: physical, cognitive, and concurrent. I then analyze the variation in effects across activity limitations within these aggregate types. I find that people with cognitive disabilities experience declines of greater magnitude and permanence in employment rates and employment income than people with physical disabilities. However, people with only cognitive disabilities experience less of an increase in government transfer payments from programs targeting individuals with disabilities. Within cognitive disabilities, intellectual and mental limitations experience greater declines in employment and employment income and less of an increase in government transfers compared with activity limitations within physical. Within physical disabilities, dexterity, mobility, and flexibility limitations experience remarkably similar treatment paths. In contrast, I find insignificant effects for limitations caused by pain alone, which confounds the estimated effects of physical disabilities.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:07

Template-Type: ReDIF-Paper 1.0
Title: Bayesian meta-analysis of time to benefit
File-URL: http://repec.org/usug2023/US23_Boscardin.pdf
Abstract: The clinical decisions to start a treatment for any condition require balancing short-term risks with long-term benefits. A clinically interpretable survival analysis metric in such decisions is time-to-benefit (TTB), the time at which a specific absolute risk reduction (ARR) is first obtained between two treatment arms. We describe a method for estimating TTB using Bayesian methods for meta-analysis. We first extract published survival curves using DigitizeIt and use these to reconstruct person-level time-to-event data with the Stata module ipdfc. Next, using the bayesmh command, we fit a hierarchical Bayesian model allowing for parameters of Weibull survival curves that are specific to each study and arm. We use the resulting joint posterior distribution to estimate study-specific and overall TTB for given ARR (for example, estimates and credible intervals for time until an ARR of 0.01, which is the time until an additional 1 out of 100 patients would benefit from the treatment). As a case study, the presentation shows results from a study of time-to-benefit of blood pressure medications on prevention of cardiovascular events.
Author-Name: John Boscardin
Author-Workplace-Name:  University of California San Francisco
Author-Name: Irena Cenzer
Author-Workplace-Name: University of California San Francisco
Author-Name: Sei J. Lee
Author-Workplace-Name: University of California San Francisco
Author-Name: Matthew Growdon
Author-Workplace-Name: University of California San Francisco
Author-Name: W. James Deardorff
Author-Workplace-Name: University of California San Francisco
Creation-Date: 20230729
Handle: RePEc:boc:usug23:08

Template-Type: ReDIF-Paper 1.0
Title: spgen: Creating spatially lagged variables in Stata
File-URL: http://repec.org/usug2023/US23_Kondo.zip
Author-Name: Keisuke Kondo
Author-Workplace-Name: Research Institute of Economy, Trade and Industry
author-Person: pko652
Abstract: This presentation introduces the new community-contributed command spgen, which computes spatially lagged variables in Stata. Spatial econometric analysis has gained attention from researchers and policymakers, and demand for its use is continuously growing among Stata users. The Sp commands are provided on Stata version 15 or later and facilitate handling of spatial data and estimation of spatial econometric models. The newly developed command spgen provides the extended function of the spgenerate command in the Sp commands to deal with a large-sized spatial dataset, such as mesh data and grid square statistics. The computation of spatially lagged variables requires a spatial weight matrix, which mathematically describes the spatially dependent structures in the matrix. However, when the spatial weight matrix is too large for the computer specs, the matrix operations may be unable to calculate spatially lagged variables in the Sp commands. The spgen command deals with this problem and provides some interesting examples of spatial data analysis. 
Creation-Date: 20230729
Handle: RePEc:boc:usug23:09

Template-Type: ReDIF-Paper 1.0
Title: Using Stata for Q-methodology studies
File-URL: http://repec.org/usug2023/US23_Akhtar-Danesh.pptx
Author-Name: Noori Akhtar-Danesh
Author-Workplace-Name: McMaster University
Abstract: Q‐methodology is an innovative research method where qualitative data are analyzed using quantitative techniques. It has the strengths of both qualitative and quantitative methods and is regarded as a bridge between these two approaches. It is used for the assessment of subjectivity, including attitudes, perceptions, feelings and values, preferences, life experiences such as stress and quality of life, and intraindividual concerns such as self-esteem, body image, and satisfaction. Q-methodology can be used in any type of research where the outcome variable involves assessment of subjectivity. It is used to identify unique salient viewpoints, as well as shared views on subjective issues, thereby providing unique insights into the richness of human subjectivity. Currently, there are only a handful of programs with limited capability for Q-methodology analysis. In this presentation, I provide a brief review of Q-methodology and three user written commands, qconvert, qfactor, and qpair, in Stata, which offer an attractive set of options for Q-methodology analysis, including different factor extraction and factor rotation techniques. Applications of these commands will be illustrated using two real datasets.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:10

Template-Type: ReDIF-Paper 1.0
Title: locproj: A new Stata command to estimate local projections
File-URL: http://repec.org/usug2023/US23_Ugarte-Ruiz.pdf
Author-Name: Alfonso Ugarte-Ruiz
Author-Workplace-Name:BBVA
Author-Person: pug24
Abstract: locproj estimates linear and nonlinear impulse response functions (IRFs) based on the local projections methodology first proposed by Jorda (2005). The procedure allows one to easily implement several options used in the growing literature of local projections. The options allow defining the desired specification in a fully automatic or in a customized way. For instance, it allows defining any nonlinear combination of variables as the impulse (shock) or defining methodological options that depend on the response horizon. It allows choosing different estimation methods for both time series and panel data, including the instrumental variables options currently available in Stata. It performs the necessary transformations to the dependent variable in order to estimate the local projections in the desired transformation, such as levels, logs, differences, log-differences, cumulative changes, and cumulative log-differences. For every option, the procedure generates the corresponding transformation of the dependent variable needed in case the user wants to include lags of the dependent variable. It reports the IRF, together with its standard error and confidence interval as an output matrix and through an IRF graph. The user can easily choose different options for the desired IRF graph and other options to save and use the results.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:11

Template-Type: ReDIF-Paper 1.0
Title: Optimal policy learning using Stata
File-URL: http://repec.org/usug2023/US23_Cerulli.pdf
Author-Name: Giovanni Cerulli
Author-Workplace-Name: IRCRES-CNR
Author-Person: pce40
Abstract: In the footsteps of the recent literature on empirical welfare maximization (EWM), I present a new Stata command called opl to carry out "optimal policy learning", a statistical procedure to design treatment assignments using a machine learning approach. The opl command focuses on three policy classes: threshold based, linear combination, and fixed-depth tree. I show a practical example, based on a real policy case—that is, the popular LaLonde training program—where, by stressing the policymaker perspective, I show how to carry out optimal treatment assignment and the potential operative problems that can come up in applying this procedure to real-world case studies. I will discuss, in particular, problems of “angle solutions”. The presentation offers a general protocol to carry out optimal policy assignment using Stata and contributes to stress the policymaker empirical perspective and related issues arising when carrying out optimal policy assignment in practice.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:12

Template-Type: ReDIF-Paper 1.0
Title: Consistent estimation of finite mixtures: An application to latent group panel structures
File-URL: http://repec.org/usug2023/US23_Langevin.pdf
Author-Name: Raphaël Langevin
Author-Workplace-Name:  McGill University
Abstract: In this presentation, I show that maximizing the likelihood of a mixture of a finite number of parametric densities leads to inconsistent estimates under weak regularity conditions. The size of the asymptotic bias is positively correlated with the overall degree of overlap between the densities within the mixture. In contrast, I show that slight modifications in the classification expectation-maximization (CEM) algorithm—the likelihood generalization of the K-means algorithm—produce consistent estimates of all parameters in the mixture, and I derive the asymptotic distribution of the proposed estimation procedure. I confirm the inconsistency of MLE procedures, such as the expectation-maximization (EM) algorithm, using numerical experiments with simple Gaussian mixture models. Simulation results show that the proposed estimation strategy generally outperforms the EM algorithm when estimating latent group panel structures with unrestricted group membership across units and over time. I also compare the finite-sample performance of each estimation strategy using a mixture of two-part models to predict individual healthcare expenditures from health administrative data. Estimation results show that the proposed consistent CEM approach leads to smaller prediction errors than models estimated with the EM algorithm, with a reduction of more than 40% in the out-of-sample prediction error compared with the standard, single-component, two-part model. The proposed estimation procedure thus represents a useful tool when both homogeneity of the parameters and constant group membership are assumed not to hold in panel-data analysis.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:13

Template-Type: ReDIF-Paper 1.0
Title: Reproducible research in Stata: Managing dependencies and project files
File-URL: http://repec.org/usug2023/US23_Correia.pdf
Author-Name: Sergio Correia
Author-Workplace-Name: Board of Governors of the Federal Reserve System
Author-Person: pco826
Author-Name:  Matthew Seay 
Author-Workplace-Name: Board of Governors of the Federal Reserve System
Abstract: Reproducibility of results is one of Stata's most valuable features, as well as an essential goal for researchers and journal editors. This ability, however, is limited by the lack of version control for user-submitted packages, which are often distributed through Github and other channels outside of the Statistical Software Components (SSC) archive. Thus, other researchers or even coauthors might fail to reproduce a given result given the same code and data because of different package versions. In this talk, we present REQUIRE, a Stata package that fills this gap by ensuring that package dependencies are consistent across users. For this, REQUIRE is able to extract a package version number based on the "starbang lines" included by users at the top of each ado-file. Because starbangs are not standardized and come in many different variants, our package takes particular care to cover corner cases and have a coverage as broad as possible across all packages available on SSC and Github. Then, REQUIRE can be used to assert that an exact or minimum package version is present and install it if asked for. Last, we showcase how to use this package together with the related SETROOT package that tracks projects' working directories.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:14

Template-Type: ReDIF-Paper 1.0
Title: Creating Likert-scale visualizations: An approach using Stata and Tableau
File-URL: http://repec.org/usug2023/US23_Cervantes.pptx
Author-Name: Sergio Cervantes
Author-Workplace-Name: WestEd
Abstract: Stata and Tableau are tools that can be used to gain insight into Likert-scale responses. However, very little research exists that discusses how one can create Likert-scale visualizations with the use of Stata and Tableau in tandem. The purpose of this work is to help researchers create Likert-scale visualizations efficiently. The step-by-step process will serve as a guide for researchers to create dashboard-worthy visualizations that effectively present data. The key is creating an Excel file exported from Stata that can be imported as a data source into Tableau. It is important that this file include respondents' ID, group variable, and Likert-scale responses. In addition, the raw data must be prepared using reshape, and an additional variable indicating the numeric values of the Likert-scale responses (or vice versa) must be generated using gen. Once the Excel file is imported into Tableau, we can set up the visual with the sheet interface. Using Tableau, we can create a Likert-scale visual with select mark modifications and even include item-response averages using level of detail (LOD) arithmetic. Using best data practices and formatting, we can create visuals that effectively communicate findings from raw survey data.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:15

Template-Type: ReDIF-Paper 1.0
Title: Metaprogramming: What it is, how to use it, and why you should care
File-URL: http://repec.org/usug2023/US23_Buchanan.html
Author-Name: Billy Buchanan
Author-Workplace-Name:  SAG Corporation
Abstract: Metaprogramming provides a highly flexible approach to solving complex programming problems. Although metaprogramming can be challenging to implement in some programming languages, metaprogramming is easy to implement in Stata largely because of the evaluation of local macros. However, metaprogramming is rarely discussed in the Stata community despite the benefits that metaprogramming can and does provide for many Stata users already. This talk will include a discussion of what metaprogramming is and how metaprogramming can be used effectively to increase efficiency and will illustrate the use of metaprogramming in Stata.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:16

Template-Type: ReDIF-Paper 1.0
Title: Bayesian model averaging
File-URL: http://repec.org/usug2023/US23_Marchenko.pdf
Author-Name: Yulia Marchenko
Author-Workplace-Name: StataCorp
Abstract: Model uncertainty accompanies many data analyses. Stata's new bma suite that performs Bayesian model averaging (BMA) helps address this uncertainty in the context of linear regression. Which predictors are important given the observed data? Which models are more plausible? How do predictors relate to each other across different models? BMA can answer these and more questions. BMA uses the Bayes theorem to aggregate the results across multiple candidate models to account for model uncertainty during inference and prediction in a principled and universal way. In my presentation, I will describe the basics of BMA and demonstrate it with the bma suite. I will also show how BMA can become a useful tool for your regression analysis, Bayesian or not!
Creation-Date: 20230729
Handle: RePEc:boc:usug23:17

Template-Type: ReDIF-Paper 1.0
Title: dqrep: Facilitating harmonized data-quality assessments with Stata
File-URL: http://repec.org/usug2023/US23_Schmidt.zip
Author-Name: Carsten Oliver Schmidt
Author-Workplace-Name:  University Medicine Greifswald
Author-Name: Stephan Struckmann
Author-Workplace-Name:  University Medicine Greifswald
Author-Name: Birgit Schauer 
Author-Workplace-Name:  University Medicine Greifswald
Abstract: Transparent data-quality reporting is a key element of reproducible research. Transparency ranges from explicit assumptions underlying any data-quality check up to harmonized reporting that facilitates comparisons of results within and across studies. However, this is far from being common. To the best of our knowledge, none of the existing routines was capable of triggering a series of structured reports on multiple datasets with potentially unknown errors based on a single command call to grade and compare data-quality issues. Therefore, the dqrep Stata package was developed. dqrep triggers a set of more than 60 newly developed Stata ado’s to compute a customizable range of quality checks. This comprises descriptive overviews, missing values, rule violations, outliers, time trends, observer and device effects. Underlying assumptions are read from easily modifiable spreadsheets. Based on this, all results are integrated in PDF and docx files, as well as in result summary files to facilitate postprocessing, for example, to create benchmarks. It is shown how a single command call is used to control the data-quality pipeline in a large scale cohort study and how this may contribute to FAIR research.
Creation-Date: 20230729
Handle: RePEc:boc:usug23:18

Template-Type: ReDIF-Paper 1.0
Title:  Measuring associations and evaluating forecasts of categorical variables
File-URL: http://repec.org/usug2023/US23_Sirchenko.pdf
Abstract: This presentation introduces a new Stata command, classify, that computes various measures of association and correlation between two categorical variables (binary, ordinal, or nominal), evaluates the performance of categorical deterministic forecasts, and provides diagnostic probability scores of the accuracy of probabilistic forecasts. We compiled a comprehensive catalogue of 9 diagnostic scores for probabilistic forecasts and over 210 measures of association and correlation employed in different fields, along with the terminological synonymy and bibliography associated with them. In addition to the overall measures, the command computes the category-specific metrics for each observed category and its macro and weighted averages. We also classify all measures according to the two types of symmetry as well as propose and compute the complement and transpose symmetric variants of those measures that are not symmetric.
Author-Name: Andrei Sirchenko
Author-Workplace-Name: Nyenrode Business University
Author-Person: psi424
Author-Name: Jochem Huismans 
Author-Workplace-Name: University of Amsterdam
Author-Name: Jan Willem Nijenhuis 
Author-Workplace-Name: Nedap NV
Creation-Date: 20230729
Handle: RePEc:boc:usug23:19