Template-Type: ReDIF-Paper 1.0
Title: A worked example of matching-adjusted indirect comparison using Stata
File-URL: http://repec.org/usug2024/US24_Barrette.pptx
Author-Name: Eric Barrette
Author-Workplace-Name: Medtronic
Abstract: Matching-adjusted indirect comparison (MAIC) is a comparative effectiveness research methodology that leverages individual-level data and aggregate results when head-to-head randomized trials are not available or feasible. MAIC is growing in popularity partly because of the high costs of randomized trials and because of regulators' interest in more safety and effectiveness evidence. Since the seminal papers describing the theory and application of MAIC were published just over a decade ago, the literature on how to apply this method, as well as demonstrations of its applications, has grown quickly. The National Institute for Health and Care Excellence (NICE) in the UK released a technical document in 2016 that described MAIC best practices and provided sample R code for an example analysis. As the method has become more popular, references to the use of Stata for statistical analysis are appearing in publications, yet very little documentation or code is available. We present the NICE technical documentation worked example using Stata in parallel to the original example in R and highlight the efficiencies and potential challenges of both programs.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:01

Template-Type: ReDIF-Paper 1.0
Title: xv != xi: Cross-validation in Stata
File-URL: http://repec.org/usug2024/US24_Buchanan.html
Author-Name: Steven Brownell
Author-Workplace-Name: SAG Corporation
Author-Name: Billy Buchanan
Author-Workplace-Name: SAG Corporation
Abstract: Evaluating the out-of-sample properties of statistical models is important, especially for predictive modeling/analytics. Although Stata currently implements cross-validation methods natively for some model-fitting commands—dslogit, dspoisson, dsregress, elasticnet, lasso, poivregress, pologit, popoisson, poregress, sqrtlasso, xpoivregress, xpologit, xpopoisson, and xporegress—broader use of cross-validation is not natively supported. At last year's conference, a user explained the challenges that students and new users face when trying to use cross-validation procedures in Stata. While it is possible to implement the four-step process of splitting the sample, fitting the model to the training sample, predicting outcomes on the validation/test sample, and computing metrics related to the fit, doing so is tedious and time-consuming. Developing a program that implements the four-step process above is not a trivial task, despite what one of the authors initially thought. In this talk, we present xv, an extensible prefix command implementing cross-validation for Stata estimation commands.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:02

Template-Type: ReDIF-Paper 1.0
Title: Fungible regression coefficients
File-URL: http://repec.org/usug2024/US24_Ender.pdf
Author-Name: Phil Ender
Author-Workplace-Name: UCLA Stat Consulting
Abstract: Ordinary least-squares (OLS) regression estimates coefficients such that the residual sum of squares (RSS) is a minimum. Further, the R-squared between the response variable and the predictors is a maximum. The solution for these OLS coefficients is unique; that is, there is only one set of coefficients that minimizes the residual sum of squares. But what if we estimated coefficients that come within one percent (0.01) or less of the maximum value of R-squared?
There can be multiple sets of coefficients that yield the same R-squared. These are the fungible regression coefficients (FRCs). How many different fungible regression coefficients are possible? What do these FRCs look like? Are these FRCs of any use whatsoever? This presentation will address these questions.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:03

Template-Type: ReDIF-Paper 1.0
Title: Heterogeneous difference-in-difference estimation
File-URL: http://repec.org/usug2024/US24_Jin.pptx
Author-Name: Zhenghao Jin
Author-Workplace-Name: Johns Hopkins University, School of Medicine, Department of Surgery
Author-Name: Abimereki Muzaale
Author-Workplace-Name: Johns Hopkins University, School of Medicine, Department of Surgery
Abstract: Traditional Stata programming requires the mastery of syntax for tailoring program behaviors. However, the traditional programming approach is challenging in scenarios where 1) users lack familiarity with Stata syntax, 2) instructions for third-party programs are nonexistent, and 3) the program calls for complex syntax. To address these issues, we have explored a prompt-based programming approach. We used Stata's request() option under the display command to offer the user stepwise value-intake prompts, establish checkpoints prior to execution, and allow modifications without termination. The request() option introduces significant advantages. It guides the user through parameter intake in a straightforward manner, reducing the effort needed to understand complex syntax. Moreover, by incorporating checkpoints and allowing for modifications during processing, it substantially diminishes the likelihood and impact of errors. However, incorporating such prompt-based elements into broader programming frameworks presents challenges because of the requirement for user input. Adopting a prompt-based programming approach significantly eases the learning curve and offers a practical solution for both preventing and correcting errors efficiently. Nonetheless, the potential difficulties of integrating these prompt-based elements into larger programming projects warrant careful consideration. Programmers need to evaluate the application context, the user's expertise, and the practicality of integration when choosing between programming methodologies.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:04

Template-Type: ReDIF-Paper 1.0
Title: Causal mediation analysis in Stata
File-URL: http://repec.org/usug2024/US24_Dallakyan.pptx
Author-Name: Aramayis Dallakyan
Author-Workplace-Name: StataCorp
Abstract: Causal mediation analysis examines the mechanism by which a treatment influences an outcome through a mediator. The objective of this presentation is to provide a practical guide, facilitating an understanding and implementation of causal mediation analysis using Stata 18's new mediate command. We will begin by introducing the fundamental steps of causal analysis and then apply these steps to causal mediation analysis. Additionally, we will highlight the differences between causal and traditional mediation analysis. The presentation will also delve into various types of direct and indirect effects, illustrating their practical applications. Examples demonstrating how to perform causal mediation estimation within Stata, using different types of outcomes and mediators (continuous, binary, and count), will be provided. No prior knowledge of Stata is required, although a basic understanding of causal inference will be beneficial.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:05

Template-Type: ReDIF-Paper 1.0
Title: Seeing is believing: Added-variable plots for complex estimators
File-URL: http://repec.org/usug2024/US24_Gallup.pdf
Author-Name: John Gallup
Author-Workplace-Name: Portland State University, Department of Economics
Abstract: Added-variable plots show the contribution of each data point of one explanatory variable to an outcome variable, while controlling for the influence of multiple other explanatory variables. This is a multivariate generalization of a scatterplot with a trend line. It provides an intuitive visual presentation of complex estimation results to specialists and nonspecialists alike. The plots show the marginal effect of an explanatory variable on the outcome as well as how closely the data adhere to the estimate. Observers can see outliers and the statistical significance of the estimated coefficient. The more complex the estimation method, the more helpful it is to have an accessible visual representation of the results. Currently, added-variable plots are available only for OLS regression in Stata. I recently extended the theory of added-variable plots to all commonly used linear and nonlinear estimators, including generalized least squares, instrumental variables, maximum likelihood, nonlinear least squares, and generalized method of moments estimators. I am in the process of programming added-variable plot commands for all Stata estimators. I have started with added-variable plots for panel data (xt) estimators (SJ 2020) and will shortly add them for instrumental-variables and time-series estimators.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:06

Template-Type: ReDIF-Paper 1.0
Title: Streamlining Stata project collaboration and package management, introducing repado and reproot from the repkit package
File-URL: http://repec.org/usug2024/US24_Bjarkefur1.pdf
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: World Bank
Author-Name: Benjamin B. Daniels
Author-Workplace-Name: World Bank
Author-Person: pda505
Abstract: In this presentation, we introduce two novel commands, repado and reproot, as part of the Stata package repkit, designed to streamline package version control and management across project teams. The repado command facilitates precise version control of Stata packages within projects by establishing project-specific ado-path folders. This ensures consistent usage of package dependencies among team members and enhances reproducibility by preserving access to specific command versions, vital for revisiting older projects. Moreover, repado proves instrumental in package development, enabling seamless testing of unpublished commands alongside stable versions in diverse project environments. Complementing repado, reproot offers efficient management of root paths across projects with minimal manual intervention. Unlike existing packages addressing the same inefficiency, such as setroot, reproot excels in handling multirooted projects, such as those involving Git collaboration and data sharing on diverse platforms like Dropbox or network drives. Its streamlined setup ensures rapid root-path identification, even when a project's roots are in different locations. The setup of reproot needs to be done only once per computer for all projects.
This helps optimize project navigation and facilitate seamless integration across team workflows, especially in teams and organizations collaborating on many projects.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:07

Template-Type: ReDIF-Paper 1.0
Title: reprun, automating complete reproducibility verifications
File-URL: http://repec.org/usug2024/US24_Daniels.pdf
Author-Name: Benjamin B. Daniels
Author-Workplace-Name: World Bank, DIME
Author-Person: pda505
Author-Name: Ankriti Singh
Author-Workplace-Name: World Bank, DIME
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: World Bank
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Abstract: The reprun command in Stata is designed to automate reproducibility verifications for sets of Stata do-files. This session presents detailed updates to the command in the context of DIME Analytics's repkit package, which spans a complete workflow for reproducibility verification. The repkit package aims to ensure that the outputs of reproducibility packages are stable and reproducible, addressing the common sources of reproducibility failures. By identifying and correcting issues, users can improve the reliability of their statistical analyses, making them suitable for sharing and publication. The reprun command performs two runs of a specified do-file, recording the state of Stata after each line's execution during the first run and then comparing it with the state after the same line's execution in the second run. Key states monitored include the random-number generator (RNG) state, data sort order, and data contents. If discrepancies occur between the two runs, reprun flags potential reproducibility errors, reporting mismatches in a table format, which helps in identifying and resolving issues. This tool emphasizes the importance of managing randomness and maintaining consistent data states to avoid reproducibility errors, especially when inconsistent outputs are far downstream in code from their sources.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:08

Template-Type: ReDIF-Paper 1.0
Title: adodown—a framework for Stata package development
File-URL: http://repec.org/usug2024//US24_Bjarkefur2.pdf
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Author-Name: Arthur Shaw
Author-Workplace-Name: World Bank, DIME/LSMS
Abstract: adodown aims to make Stata packages easier both for developers to create and for users to understand. For developers, adodown offers workflow commands that automate manual tasks at each stage of development. At a project's start, adodown creates the necessary scaffolding for the package (folders, pkg-file, etc.). For each package command, it uses templates to create the necessary files (i.e., ado, documentation, unit test) and adds appropriate entries in the pkg-file. For documentation, it allows developers to draft in plain Markdown while creating standard help files in SMCL. And for publication, adodown collects the required files, puts them in the proper format, and prepares a zip file for SSC submission. Also, adodown automatically deploys a package documentation website. For users, this provides an easy way to discover packages, to understand what they do, and to explore how commands work—all without installing the package.
For developers, this provides packages with a welcome web presence, offers a home for additional documentation (e.g., how-to guides, technical notes, FAQs), and keeps the HTML documentation up to date with the SMCL documentation through continuous deployment via GitHub Actions. This talk will demonstrate how adodown works, showcase a few live examples, and seek feedback from the Stata community.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:09

Template-Type: ReDIF-Paper 1.0
Title: Estimating the wage premia of refugee immigrants with coarsened exact matching and recentered influence function quantile regressions
File-URL: http://repec.org/usug2024/US24_Baum.pdf
Abstract: In this case study, we examine the wage earnings of fully employed refugee immigrants in Sweden. Using administrative employer–employee data from 1990 onward, about 100,000 refugee immigrants who arrived between 1980 and 1996 and were granted asylum are compared with a matched sample of native-born workers using coarsened exact matching. Applying recentered influence function (RIF) quantile regressions to wage earnings for the period 2011–2015, the occupational-task-based Oaxaca–Blinder decomposition approach shows that refugees perform better than natives at the median wage, controlling for individual and firm characteristics. The RIF-quantile approach provides better insights for the analysis of these wage differentials than the standard regression model employed in earlier versions of the study.
Author-Name: Kit Baum
Author-Workplace-Name: Boston College
Author-Person: pba1
Author-Name: Hans Lööf
Author-Workplace-Name: Royal Institute of Technology
Author-Person: plf1
Author-Name: Andreas Stephan
Author-Workplace-Name: Linnaeus University
Author-Person: pst185
Author-Name: Klaus Zimmermann
Author-Workplace-Name: UNU-MERIT
Author-Person: pzi13
Creation-Date: 20240804
Handle: RePEc:boc:usug24:10

Template-Type: ReDIF-Paper 1.0
Title: Optimal policy learning with observational data in multiaction scenarios: Stata implementation
File-URL: http://repec.org/usug2024/US24_Cerulli.pdf
Author-Name: Giovanni Cerulli
Author-Workplace-Name: CNR-IRCRES, Research Institute on Sustainable Economic Growth, National Research Council of Italy
Author-Person: pce40
Author-Name: Antonio Zinilli
Author-Workplace-Name: CNR-IRCRES, Research Institute on Sustainable Economic Growth, National Research Council of Italy
Author-Person: pzi131
Abstract: This presentation introduces a new Stata command for carrying out optimal policy learning (OPL) with observational data, i.e., data-driven optimal decision-making, in multiaction (or multiarm) settings, where a finite set of decision options is available. The presentation and related command focus on three components (estimation, risk preference, and regret estimation) via three estimation methods: regression adjustment, inverse probability weighting, and doubly robust estimators. After briefly presenting the statistical background of this OPL model and the syntax of the related Stata command, the presentation will focus on an application to climate-related agricultural policies.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:11

Template-Type: ReDIF-Paper 1.0
Title: Estimating a probit model with a continuous endogenous covariate and using complex survey data: An application to socioeconomic mobility analysis in Mexico
File-URL: http://repec.org/usug2024/US24_Peon.pdf
Author-Name: Sylvia Beatriz Guillermo Peon
Author-Workplace-Name: Benemérita Universidad Autónoma de Puebla
Author-Name: Alejandro Miguel Castañeda Valencia
Author-Workplace-Name: Benemérita Universidad Autónoma de Puebla
Author-Name: Juan Enrique Huerta Wong
Author-Workplace-Name: Vocería de Presidencia de la República
Abstract: We use Stata to estimate the probability of having a high socioeconomic destination as a function of education, parental economic level, and other explanatory variables. Given the potential endogeneity of the education variable, we estimate a probit model with an instrumental variable in the context of a complex survey dataset. Maximum-likelihood estimation of the structural parameters is carried out using two equivalent strategies that use different Stata estimation and reporting options. Following Long and Freese (2014), we first estimate the model using the ivprobit command with sampling weights and cluster-robust standard errors, which allows us to obtain the Wald test of exogeneity. As a second strategy, we use the ivprobit command with survey (svy) estimation. We perform the additional steps needed to compute the overall rate of correctly classified observations after estimation with survey data or with sampling weights. Testing the validity of the instruments remains challenging for ivprobit models with survey data. Our analysis of the estimation results is extended and enriched with the calculation of odds ratios (testing whether they are statistically different from one) and average probabilities by region in Mexico and by educational level.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:12

Template-Type: ReDIF-Paper 1.0
Title: Data visualization with Stata
File-URL: http://repec.org/usug2024/US24_Peng.html
Author-Name: Hua Peng
Author-Workplace-Name: StataCorp
Abstract: This talk will demonstrate how to produce informative, robust, and complex graphs using reproducible official and community-contributed routines in Stata. We will also discuss commonly used programming tools and tips for creating more engaging graphs.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:13

Template-Type: ReDIF-Paper 1.0
Title: Postestimation with latent class analysis accounting for class uncertainty
File-URL: http://repec.org/usug2024/US24_Kolenikov.pdf
Author-Name: Stas Kolenikov
Author-Workplace-Name: NORC
Author-Person: pko3
Author-Name: Kathy Rowan
Author-Workplace-Name: NORC
Abstract: Latent class analysis (LCA) is a statistical model with categorical latent variables in which the category proportions of the measured categorical outcomes differ between classes. In official Stata, the model is fit using the gsem command with the lclass() option. Applied researchers often need to follow up the LCA modeling with other statistical analyses that involve the classes from the model, from simple descriptive statistics of variables not in the model to multivariate models. A simplified shortcut procedure is to assign each observation to the class with the highest predicted probability, but doing so treats the classes as fixed and perfectly observed rather than latent and estimated, understating uncertainty and biasing standard errors downward.
We demonstrate how to use Stata's existing official multiple-imputation (MI) capabilities to impute classes based on the LCA postestimation results and to present the resulting dataset to Stata's mi procedures as valid MI data. The standard MI diagnostics that can be applied to the mi estimate results show that variances are noticeably underestimated when only the modal class is imputed. In the application that motivated this development, the variances were biased down by 25% to 40%.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:14

Template-Type: ReDIF-Paper 1.0
Title: Ensuring reproducibility in Stata: Insights from the World Bank's reproducible research repository
File-URL: http://repec.org/usug2024/US24_San_Martin.pdf
Author-Name: Luis Eduardo San Martin
Author-Workplace-Name: World Bank
Author-Name: Maria Ruth Jones
Author-Workplace-Name: World Bank
Author-Name: Maria Reyes Retana
Author-Workplace-Name: World Bank
Author-Name: Benjamin B. Daniels
Author-Workplace-Name: World Bank, DIME
Author-Person: pda505
Author-Name: Kristoffer Bjarkefur
Author-Workplace-Name: The World Bank, DIME/LSMS
Abstract: The challenge of reproducing economics research has gained increased attention with the growing advocacy for open science in the field. Economics journals and research institutions are quickly adopting reproducibility guidelines, requiring authors to provide code and data for reproducing results and ensuring the trustworthiness of their findings. Presented by the Development Impact Analytics team of the World Bank, this session delves into the intricacies of achieving reproducibility in Stata. Since the launch of the World Bank's Reproducible Research Repository, the team has conducted reproducibility verifications and curated reproducibility packages for almost a hundred working papers from diverse research teams in the organization, building valuable and novel experience in addressing the common issues that break reproducibility in Stata analyses. The session will present an overview of the workflows and tools the team has developed in response to the reproducibility challenges identified in typical Stata projects, covering key topics such as controlling the versions of external dependencies and appropriately handling randomness in Stata code. The presentation will include practical strategies for enhancing the transparency and reliability of Stata-based research.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:15

Template-Type: ReDIF-Paper 1.0
Title: classify: Over 200 measures of association, correlation, and forecast accuracy for categorical outcomes
File-URL: http://repec.org/usug2024/US24_Sirchenko.pdf
Author-Name: Andriy Sirchenko
Author-Workplace-Name: Nyenrode Business University
Author-Person: psi424
Author-Name: J. Huismans
Author-Workplace-Name: University of Amsterdam
Author-Name: J. W. Nijenhuis
Author-Workplace-Name: Nedap NV
Abstract: We describe a new Stata command, classify, that computes various measures of association and correlation between two categorical variables (dichotomous and polytomous, nominal and ordinal), diagnostic scores for probabilistic forecasts of such variables, and various measures of the accuracy of deterministic forecasts of them. We compiled a comprehensive catalogue of over 210 measures of association, correlation, and forecast verification, and 9 diagnostic scores for probabilistic forecasts from different fields, along with the associated terminological synonymy and bibliography.
In addition to the overall measures, the command computes class-specific metrics as well as their macro averages and weighted averages.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:16

Template-Type: ReDIF-Paper 1.0
Title: Scalable high-dimensional nonparametric density estimation, with Bayesian applications
File-URL: http://repec.org/usug2024/US24_Grant.pdf
Author-Name: Robert Grant
Author-Workplace-Name: BayesCamp
Abstract: Few methods have been proposed for flexible, nonparametric density estimation, and they do not scale well to high-dimensional problems. We describe a new approach based on smoothed trees called the kudzu density (Grant 2022). This fits the little-known density estimation tree (Ram & Gray 2011) to a dataset and convolves the edges with inverse logistic functions, which are in the class of computationally minimal smooth ramps. New Stata commands provide tree fitting, kudzu tuning, estimates of joint, marginal, and cumulative densities, and pseudo-random numbers. Results will be shown for fidelity and computational cost. Preliminary results will also be shown for ensembles of kudzu under bagging and boosting. Kudzu densities are useful for Bayesian model updating where models have many unknowns and require rapid updating, datasets are large, and posteriors have no guarantee of convexity or unimodality. The input “dataset” is the posterior sample from a previous analysis. This is demonstrated with a real-life large dataset. A new command outputs code to use the kudzu prior in bayesmh evaluators, BUGS/JAGS, and Stan.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:17

Template-Type: ReDIF-Paper 1.0
Title: Using Stata to analyze public opinion data: An open educational resource for beginning undergraduate and graduate methods courses in political science
File-URL: http://repec.org/usug2024/US24_Benstead.pptx
Author-Name: Lindsay Benstead
Author-Workplace-Name: Portland State University
Abstract: This presentation provides an overview of a new Open Educational Resource (OER) textbook, created by the author, for analyzing public opinion data using Stata. Based on the author's experience teaching research methods for political science at Portland State University, the guide walks students through an article published in Democratization using publicly available data from the Arab Barometer. Students learn how to craft a research question, identify and measure variables, and perform descriptive and multivariate statistical tests. At the end of the course, students will be able to replicate the published findings and craft their own research design for analyzing public opinion data. The author presents the Stata commands used to produce the results reported in the published article to guide students through the replication and their own original research project.
Creation-Date: 20240804
Handle: RePEc:boc:usug24:18