Template-Type: ReDIF-Paper 1.0
Title: A worked example of matching-adjusted indirect comparison using Stata
File-URL: http://repec.org/usug2024/US24_Barrette.pptx
Author-Name: Eric Barrette
Author-Workplace-Name: Medtronic
Abstract: Matching-adjusted indirect comparison (MAIC) is a comparative effectiveness research methodology that leverages individual-level data and aggregate results when head-to-head randomized trials are not available or feasible. MAIC is growing in popularity partly because of the high costs of randomized trials and because of regulators' interest in more safety and effectiveness evidence. Since the seminal papers describing the theory and application of MAIC were published just over a decade ago, the literature on how to apply this method, as well as demonstrations of its applications, has grown quickly. The National Institute for Health and Care Excellence (NICE) in the UK released a technical document in 2016 that described MAIC best practices and provided sample R code for an example analysis. As the method has become more popular, references to the use of Stata for statistical analysis are appearing in publications, yet very little documentation or code is available. We present the NICE technical documentation worked example using Stata in parallel to the original example in R and highlight the efficiencies and potential challenges of both programs.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:01

Template-Type: ReDIF-Paper 1.0
Title: xv != xi: Cross-validation in Stata
File-URL: http://repec.org/usug2024/US24_Buchanan.html
Author-Name: Steven Brownell
Author-Workplace-Name: SAG Corporation
Author-Name: Billy Buchanan
Author-Workplace-Name: SAG Corporation
Abstract: Evaluating the out-of-sample properties of statistical models is important, especially for predictive modeling/analytics. Although Stata currently implements cross-validation methods natively for some model-fitting commands—dslogit, dspoisson, dsregress, elasticnet, lasso, poivregress, pologit, popoisson, poregress, sqrtlasso, xpoivregress, xpologit, xpopoisson, and xporegress—broader use of cross-validation is not natively supported. At last year's conference, a user explained the challenges that students and new users face when trying to use cross-validation procedures in Stata. While it is possible to implement the four-step process of splitting the sample, fitting the model to the training sample, predicting outcomes on the validation/test sample, and computing metrics related to the fit, doing so is tedious and time-consuming. Developing a program that implements the four-step process above is not a trivial task, despite what one of the authors initially thought. In this talk, we present xv, an extensible prefix command implementing cross-validation for Stata estimation commands.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:02

Template-Type: ReDIF-Paper 1.0
Title: Fungible regression coefficients
File-URL: http://repec.org/usug2024/US24_Ender.pdf
Author-Name: Phil Ender
Author-Workplace-Name: UCLA Stat Consulting
Abstract: Ordinary least-squares (OLS) regression estimates coefficients such that the residual sum of squares (RSS) is a minimum. Further, the R-squared between the response variable and the predictors is a maximum. The solution for these OLS coefficients is unique; that is, there is only one set of coefficients that minimizes the residual sum of squares. But what if we estimated coefficients that come within one percent (0.01) or less of the maximum value of R-squared?
There can be multiple sets of coefficients that yield the same R-squared. These are the fungible regression coefficients (FRCs). How many different fungible regression coefficients are possible? What do these FRCs look like? Are these FRCs of any use whatsoever? This presentation will address these questions.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:03

Template-Type: ReDIF-Paper 1.0
Title: Heterogeneous difference-in-difference estimation
File-URL: http://repec.org/usug2024/US24_Jin.pptx
Author-Name: Zhenghao Jin
Author-Workplace-Name: Johns Hopkins University, School of Medicine, Department of Surgery
Author-Name: Abimereki Muzaale
Author-Workplace-Name: Johns Hopkins University, School of Medicine, Department of Surgery
Abstract: Traditional Stata programming requires the mastery of syntax for tailoring program behaviors. However, the traditional programming approach is challenging in scenarios where 1) users lack familiarity with Stata syntax, 2) instructions for third-party programs are nonexistent, and 3) the program calls for complex syntax. To address these issues, we have explored a prompt-based programming approach. We used Stata's request() option under the display command to offer the user stepwise value-intake prompts, establish checkpoints prior to execution, and allow modifications without termination. The request() option introduces significant advantages. It guides the user through parameter intake in a straightforward manner, reducing the effort needed to understand complex syntax. Moreover, by incorporating checkpoints and allowing for modifications during processing, it substantially diminishes the likelihood and impact of errors. However, incorporating such prompt-based elements into broader programming frameworks presents challenges because of the requirement for user input. Adopting a prompt-based programming approach significantly eases the learning curve and offers a practical solution for both preventing and correcting errors efficiently. Nonetheless, the potential difficulties of integrating these prompt-based elements into larger programming projects warrant careful consideration. Programmers need to evaluate the application context, the user's expertise, and the practicality of integration when choosing between programming methodologies.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:04

Template-Type: ReDIF-Paper 1.0
Title: Causal mediation analysis in Stata
File-URL: http://repec.org/usug2024/US24_Dallakyan.pptx
Author-Name: Aramayis Dallakyan
Author-Workplace-Name: StataCorp
Abstract: Causal mediation analysis examines the mechanism by which a treatment influences an outcome through a mediator. The objective of this presentation is to provide a practical guide, facilitating an understanding and implementation of causal mediation analysis using Stata 18's new mediate command. We will begin by introducing the fundamental steps of causal analysis and then apply these steps to causal mediation analysis. Additionally, we will highlight the differences between causal and traditional mediation analysis. The presentation will also delve into various types of direct and indirect effects, illustrating their practical applications. Examples demonstrating how to perform causal mediation estimation within Stata, using different types of outcomes and mediators (continuous, binary, and count), will be provided. No prior knowledge of Stata is required, although a basic understanding of causal inference will be beneficial.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:05

Template-Type: ReDIF-Paper 1.0
Title: Seeing is believing: Added-variable plots for complex estimators
File-URL: http://repec.org/usug2024/US24_Gallup.pdf
Author-Name: John Gallup
Author-Workplace-Name: Portland State University, Department of Economics
Abstract: Added-variable plots show the contribution of each data point of one explanatory variable to an outcome variable, while controlling for the influence of multiple other explanatory variables. This is a multivariate generalization of a scatterplot with a trend line. It provides an intuitive visual presentation of complex estimation results to specialists and nonspecialists alike. The plots show the marginal effect of an explanatory variable on the outcome as well as how closely the data adhere to the estimate. Observers can see outliers and the statistical significance of the estimated coefficient. The more complex the estimation method, the more helpful it is to have an accessible visual representation of the results. Currently, added-variable plots are available only for OLS regression in Stata. I recently extended the theory of added-variable plots to all commonly used linear and nonlinear estimators, including generalized least squares, instrumental variables, maximum likelihood, nonlinear least squares, and generalized method of moments estimators. I am in the process of programming added-variable plot commands for all Stata estimators. I have started with added-variable plots for panel data (xt) estimators (SJ 2020) and will shortly add them for instrumental-variables and time-series estimators.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:06

Template-Type: ReDIF-Paper 1.0
Title: Streamlining Stata project collaboration and package management, introducing repado and reproot from the repkit package
File-URL: http://repec.org/usug2024/US24_Bjarkefur1.pdf
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: World Bank
Author-Name: Benjamin B. Daniels
Author-Workplace-Name: World Bank
Author-Person: pda505
Abstract: In this presentation, we introduce two novel commands, repado and reproot, as part of the Stata package repkit, designed to streamline package version control and management across project teams. The repado command facilitates precise version control of Stata packages within projects by establishing project-specific ado-path folders. This ensures consistent usage of package dependencies among team members and enhances reproducibility by preserving access to specific command versions, vital for revisiting older projects. Moreover, repado proves instrumental in package development, enabling seamless testing of unpublished commands alongside stable versions in diverse project environments. Complementing repado, reproot offers efficient management of root paths across projects with minimal manual intervention. Unlike existing packages addressing the same inefficiency, such as setroot, reproot excels in handling multirooted projects, such as those involving Git collaboration and data sharing on diverse platforms like Dropbox or network drives. Its streamlined setup ensures rapid root-path identification, even when a project's roots are in different locations. The setup of reproot needs to be done only once per computer for all projects.
This helps optimize project navigation and facilitate seamless integration across team workflows, especially in teams and organizations collaborating on many projects.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:07

Template-Type: ReDIF-Paper 1.0
Title: reprun, automating complete reproducibility verifications
File-URL: http://repec.org/usug2024/US24_Daniels.pdf
Author-Name: Benjamin B. Daniels
Author-Workplace-Name: World Bank, DIME
Author-Person: pda505
Author-Name: Ankriti Singh
Author-Workplace-Name: World Bank, DIME
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: World Bank
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Abstract: The reprun command in Stata is designed to automate reproducibility verifications for sets of Stata do-files. This session presents detailed updates to the command in the context of DIME Analytics's repkit package, which spans a complete workflow for reproducibility verification. The repkit package aims to ensure that the outputs of reproducibility packages are stable and reproducible, addressing the common sources of reproducibility failures. By identifying and correcting issues, users can improve the reliability of their statistical analyses, making them suitable for sharing and publication. The reprun command performs two runs of a specified do-file, recording the state of Stata after each line's execution during the first run and then comparing it with the state after the same line's execution in the second run. Key states monitored include the random-number generator (RNG) state, data sort order, and data contents. If discrepancies occur between the two runs, reprun flags potential reproducibility errors, reporting mismatches in a table format, which helps in identifying and resolving issues. This tool emphasizes the importance of managing randomness and maintaining consistent data states to avoid reproducibility errors, especially when inconsistent outputs are far downstream in code from their sources.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:08

Template-Type: ReDIF-Paper 1.0
Title: adodown—a framework for Stata package development
File-URL: http://repec.org/usug2024//US24_Bjarkefur2.pdf
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Author-Name: Arthur Shaw
Author-Workplace-Name: World Bank, DIME/LSMS
Abstract: adodown aims to make Stata packages easier both for developers to create and for users to understand. For developers, adodown offers workflow commands that automate manual tasks at each stage of development. At a project's start, adodown creates the necessary scaffolding for the package (folders, pkg-file, etc.). For each package command, it uses templates to create the necessary files (i.e., ado, documentation, unit test) and adds appropriate entries in the pkg-file. For documentation, it allows developers to draft in plain Markdown while creating standard help files in SMCL. And for publication, adodown collects the required files, puts them in the proper format, and prepares a zip file for SSC submission. Also, adodown automatically deploys a package documentation website. For users, this provides an easy way to discover packages, to understand what they do, and to explore how commands work—all without installing the package.
For developers, this provides packages with a welcome web presence, offers a home for additional documentation (e.g., how-to guides, technical notes, FAQs), and keeps the HTML documentation up to date with the SMCL documentation through continuous deployment via GitHub Actions. This talk will demonstrate how adodown works, showcase a few live examples, and seek feedback from the Stata community.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:09

Template-Type: ReDIF-Paper 1.0
Title: Estimating the wage premia of refugee immigrants with coarsened exact matching and recentered influence function quantile regressions
File-URL: http://repec.org/usug2024/US24_Baum.pdf
Abstract: In this case study, we examine the wage earnings of fully employed refugee immigrants in Sweden. Using administrative employer–employee data from 1990 onward, about 100,000 refugee immigrants who arrived between 1980 and 1996 and were granted asylum are compared with a matched sample of native-born workers using coarsened exact matching. Applying recentered influence function (RIF) quantile regressions to wage earnings for the period 2011–2015, the occupational-task-based Oaxaca–Blinder decomposition approach shows that refugees perform better than natives at the median wage, controlling for individual and firm characteristics. The RIF-quantile approach provides better insights for the analysis of these wage differentials than the standard regression model employed in earlier versions of the study.
Author-Name: Kit Baum
Author-Workplace-Name: Boston College
Author-Person: pba1
Author-Name: Hans Lööf
Author-Workplace-Name: Royal Institute of Technology
Author-Person: plf1
Author-Name: Andreas Stephan
Author-Workplace-Name: Linnaeus University
Author-Person: pst185
Author-Name: Klaus Zimmermann
Author-Workplace-Name: UNU-MERIT
Author-Person: pzi13
Creation-Date: 20240804
Handle: RePEc:boc:usug24:10

Template-Type: ReDIF-Paper 1.0
Title: Optimal policy learning with observational data in multiaction scenarios: Stata implementation
File-URL: http://repec.org/usug2024/US24_Cerulli.pdf
Author-Name: Giovanni Cerulli
Author-Workplace-Name: CNR-IRCRES, Research Institute on Sustainable Economic Growth, National Research Council of Italy
Author-Person: pce40
Author-Name: Antonio Zinilli
Author-Workplace-Name: CNR-IRCRES, Research Institute on Sustainable Economic Growth, National Research Council of Italy
Author-Person: pzi131
Abstract: This presentation introduces a new Stata command for carrying out optimal policy learning (OPL) with observational data, i.e., data-driven optimal decision-making, in multiaction (or multiarm) settings, where a finite set of decision options is available. The presentation and related command focus on three components (estimation, risk preference, and regret estimation) via three estimation methods: regression adjustment, inverse probability weighting, and doubly robust estimators. After briefly presenting the statistical background of this OPL model and the syntax of the related Stata command, the presentation will focus on an application to climate-related agricultural policies.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:11

Template-Type: ReDIF-Paper 1.0
Title: Estimating a probit model with a continuous endogenous covariate and using complex survey data: An application to socioeconomic mobility analysis in Mexico
File-URL: http://repec.org/usug2024/US24_Peon.pdf
Author-Name: Sylvia Beatriz Guillermo Peon
Author-Workplace-Name: Benemérita Universidad Autónoma de Puebla
Author-Name: Alejandro Miguel Castañeda Valencia
Author-Workplace-Name: Benemérita Universidad Autónoma de Puebla
Author-Name: Juan Enrique Huerta Wong
Author-Workplace-Name: Vocería de Presidencia de la República
Abstract: We use Stata to estimate the probability of having a high socioeconomic destination as a function of education, parental economic level, and other explanatory variables. Given the potential endogeneity of the education variable, we estimate a probit model with an instrumental variable in the context of a complex survey dataset. Maximum-likelihood estimation of the structural parameters is carried out using two equivalent strategies that use different Stata estimation and reporting options. Following Long and Freese (2014), we first estimate the model using the ivprobit command with sampling weights and cluster-robust standard errors, which allows us to obtain the Wald test of exogeneity. As a second strategy, we use the ivprobit command with survey (svy) estimation. We perform the additional steps needed to compute the overall rate of correctly classified observations after estimation with survey data or with sampling weights. Testing the validity of the instruments remains challenging for ivprobit models with survey data. Our analysis of the estimation results is extended and enriched with the calculation of odds ratios (testing whether they are statistically different from one) and average probabilities by region in Mexico and by educational level.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:12

Template-Type: ReDIF-Paper 1.0
Title: Data visualization with Stata
File-URL: http://repec.org/usug2024/US24_Peng.html
Author-Name: Hua Peng
Author-Workplace-Name: StataCorp
Abstract: This talk will demonstrate how to produce informative, robust, and complex graphs using reproducible official and community-contributed routines in Stata. We will also discuss commonly used programming tools and tips for creating more engaging graphs.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:13

Template-Type: ReDIF-Paper 1.0
Title: Postestimation with latent class analysis accounting for class uncertainty
File-URL: http://repec.org/usug2024/US24_Kolenikov.pdf
Author-Name: Stas Kolenikov
Author-Workplace-Name: NORC
Author-Person: pko3
Author-Name: Kathy Rowan
Author-Workplace-Name: NORC
Abstract: Latent class analysis (LCA) is a statistical model with categorical latent variables in which the category proportions of the measured categorical outcomes differ between classes. In official Stata, the model is fit using the gsem command with the lclass() option. Applied researchers often need to follow up the LCA modeling with other statistical analyses that involve the classes from the model, from simple descriptive statistics of variables not in the model to multivariate models. A simplified shortcut procedure is to assign each observation to the class with the highest predicted probability, but doing so treats the classes as fixed and perfectly observed rather than latent and estimated, understating uncertainty and biasing standard errors downward.
We demonstrate how to use Stata's existing official multiple-imputation (MI) capabilities to impute classes based on the LCA postestimation results and to present the resulting dataset to Stata's mi procedures as valid MI data. The standard MI diagnostics that can be applied to the mi estimate results show that variances are noticeably underestimated when only the modal class is imputed. In the application that motivated this development, the variances were biased down by 25% to 40%.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:14

Template-Type: ReDIF-Paper 1.0
Title: Ensuring reproducibility in Stata: Insights from the World Bank's reproducible research repository
File-URL: http://repec.org/usug2024/US24_San_Martin.pdf
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: World Bank
Author-Name: Maria Ruth Jones
Author-Workplace-Name: World Bank
Author-Name: Maria Reyes Retana
Author-Workplace-Name: World Bank
Author-Name: Benjamin B. Daniels
Author-Workplace-Name: World Bank, DIME
Author-Person: pda505
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Abstract: The challenge of reproducing economics research has gained increased attention with the growing advocacy for open science in the field. Economics journals and research institutions are quickly adopting reproducibility guidelines, requiring authors to provide code and data for reproducing results and ensuring the trustworthiness of their findings. Presented by the Development Impact Analytics team of the World Bank, this session delves into the intricacies of achieving reproducibility in Stata. Since the launch of the World Bank's Reproducible Research Repository, the team has conducted reproducibility verifications and curated reproducibility packages for almost a hundred working papers from diverse research teams in the organization, building valuable and novel experience in addressing the common issues that break reproducibility in Stata analyses. The session will present an overview of the workflows and tools the team has developed in response to the reproducibility challenges identified in typical Stata projects, covering key topics such as controlling the versions of external dependencies and appropriately handling randomness in Stata code. The presentation will include practical strategies for enhancing the transparency and reliability of Stata-based research.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:15

Template-Type: ReDIF-Paper 1.0
Title: classify: Over 200 measures of association, correlation, and forecast accuracy for categorical outcomes
File-URL: http://repec.org/usug2024/US24_Sirchenko.pdf
Author-Name: Andriy Sirchenko
Author-Workplace-Name: Nyenrode Business University
Author-Person: psi424
Author-Name: J. Huismans
Author-Workplace-Name: University of Amsterdam
Author-Name: J. W. Nijenhuis
Author-Workplace-Name: Nedap NV
Abstract: We describe a new Stata command, classify, that computes various measures of association and correlation between two categorical variables (dichotomous and polytomous, nominal and ordinal), diagnostic scores for probabilistic forecasts of such variables, and various measures of the accuracy of deterministic forecasts of them. We compiled a comprehensive catalogue of over 210 measures of association, correlation, and forecast verification, and 9 diagnostic scores for probabilistic forecasts from different fields, along with the associated terminological synonymy and bibliography.
In addition to the overall measures, the command computes class-specific metrics as well as their macro averages and weighted averages.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:16

Template-Type: ReDIF-Paper 1.0
Title: Scalable high-dimensional nonparametric density estimation, with Bayesian applications
File-URL: http://repec.org/usug2024/US24_Grant.pdf
Author-Name: Robert Grant
Author-Workplace-Name: BayesCamp
Abstract: Few methods have been proposed for flexible, nonparametric density estimation, and they do not scale well to high-dimensional problems. We describe a new approach based on smoothed trees called the kudzu density (Grant 2022). This fits the little-known density estimation tree (Ram & Gray 2011) to a dataset and convolves the edges with inverse logistic functions, which are in the class of computationally minimal smooth ramps. New Stata commands provide tree fitting, kudzu tuning, estimates of joint, marginal, and cumulative densities, and pseudo-random numbers. Results will be shown for fidelity and computational cost. Preliminary results will also be shown for ensembles of kudzu under bagging and boosting. Kudzu densities are useful for Bayesian model updating where models have many unknowns and require rapid updating, datasets are large, and posteriors have no guarantee of convexity or unimodality. The input “dataset” is the posterior sample from a previous analysis. This is demonstrated with a real-life large dataset. A new command outputs code to use the kudzu prior in bayesmh evaluators, BUGS/JAGS, and Stan.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:17

Template-Type: ReDIF-Paper 1.0
Title: Using Stata to analyze public opinion data: An open educational resource for beginning undergraduate and graduate methods courses in political science
File-URL: http://repec.org/usug2024/US24_Benstead.pptx
Author-Name: Lindsay Benstead
Author-Workplace-Name: Portland State University
Abstract: This presentation provides an overview of a new Open Educational Resource (OER) textbook, created by the author, for analyzing public opinion data using Stata. Based on the author's experience teaching research methods for political science at Portland State University, the guide walks students through an article published in Democratization using publicly available data from the Arab Barometer. Students learn how to craft a research question, identify and measure variables, and perform descriptive and multivariate statistical tests. At the end of the course, students will be able to replicate the published findings and craft their own research design for analyzing public opinion data. The author presents the Stata commands used to produce the results reported in the published article to guide students through the replication and their own original research project.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:18