Template-Type: ReDIF-Paper 1.0
Title: pystacked: Stacking generalization and machine learning in Stata
File-URL: http://repec.org/csug2022/Ahrens-Bern2022-pystacked.pdf
Author-Name: Christian B. Hansen
Author-Workplace-Name: University of Chicago
Author-Person: pha982
Author-Name: Mark E. Schaffer
Author-Workplace-Name: Heriot-Watt University
Author-Person: psc51
Author-Name: Achim Ahrens
Author-Workplace-Name: ETH Zürich
Author-Person: pah173
Abstract: pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn. Stacking combines multiple supervised machine learners—the “base” or “level-0” learners—into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multilayer perceptron). pystacked can also be used as a ‘regular’ machine learning program to fit a single base learner and, thus, provides an easy-to-use API for scikit-learn’s machine learning algorithms.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:01

Template-Type: ReDIF-Paper 1.0
Title: ddml: Double/debiased machine learning in Stata
File-URL: http://repec.org/csug2022/Ahrens-Bern2022-ddml.pdf
Author-Name: Christian B. Hansen
Author-Workplace-Name: University of Chicago
Author-Person: pha982
Author-Name: Mark E. Schaffer
Author-Workplace-Name: Heriot-Watt University
Author-Person: psc51
Author-Name: Thomas Wiemann
Author-Workplace-Name: University of Chicago
Author-Name: Achim Ahrens
Author-Workplace-Name: ETH Zürich
Author-Person: pah173
Abstract: We introduce the Stata package ddml, which implements double/debiased machine learning (DDML) for causal inference aided by supervised machine learning. Five different models are supported, allowing for multiple treatment variables in the presence of high-dimensional controls and instrumental variables. ddml is compatible with many existing supervised machine learning programs in Stata.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:02

Template-Type: ReDIF-Paper 1.0
Title: Stata–Python API for bulk data download: Example with UN Comtrade
File-URL: http://repec.org/csug2022/Wong-Bern2022-comtrade.pdf
Author-Name: Ka Lok Wong
Author-Workplace-Name: Geneva Graduate Institute
Abstract: This presentation aims to guide the audience through the bulk download of Comtrade data via a Stata–Python integration setup that has been made available since Stata 16. Though this presentation is explicitly about the UN Comtrade dataset, the methodology employed is generalizable to other data platforms that allow API downloads. The UN Comtrade Database is one of the best sources when it comes to bilateral trade data by product code. As of early 2022, it covers more country-year observations than the World Trade Organization and the International Trade Centre. However, tailoring the raw data to each researcher’s needs is often time-consuming. Using the Comtrade API with my Stata–Python setup would allow researchers to tailor their downloaded data to their desired specification. In addition, employing this setup significantly reduces human error when compared with the manual downloading and cleaning of Comtrade data.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:03

Template-Type: ReDIF-Paper 1.0
Title: Flexible and fast estimation of quantile treatment effects: The rqr and rqrplot commands
File-URL: http://repec.org/csug2022/Haupt-Bern2022-rqr.pdf
Author-Name: Andreas Haupt
Author-Workplace-Name: Karlsruhe Institute of Technology
Author-Name: Øyvind Wiborg
Author-Workplace-Name: University of Oslo
Author-Name: Nicolai T. Borgen
Author-Workplace-Name: University of Oslo
Abstract: Using quantile regression models to estimate quantile treatment effects is becoming increasingly popular. This presentation introduces the rqr command, which can be used to estimate residualized quantile regression (RQR) coefficients, and the rqrplot postestimation command, which can be used to effortlessly plot the coefficients. The main advantages of the rqr command compared with other Stata commands that estimate (unconditional) quantile treatment effects are that it can include high-dimensional fixed effects and that it is considerably faster than the other commands.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:04

Template-Type: ReDIF-Paper 1.0
Title: Stata commands to estimate quantile regression with panel and grouped data
File-URL: http://repec.org/csug2022/Melly-Bern2022-mdqr.pdf
Author-Name: Martina Pons
Author-Workplace-Name: University of Bern
Author-Name: Blaise Melly
Author-Workplace-Name: University of Bern
Author-Person: pme143
Abstract: In this presentation, we introduce two Stata commands that allow estimating quantile regression with panel and grouped data. The commands implement two-step minimum-distance estimators. We first compute a quantile regression within each unit and then apply GMM to the fitted values from the first stage. The command xtmdqr applies to classical panel data, where we follow the same units over time, while the command mdqr applies to grouped data, where the observations are at the individual level but the treatment varies at the group level. Depending on the variables assumed to be exogenous, this approach provides quantile analogs of the classical least-squares panel-data estimators such as the fixed-effects, random-effects, between, and Hausman–Taylor estimators. For grouped (instrumental) quantile regression, we provide a more precise estimator than the existing estimators. In our companion paper (Melly and Pons, "Minimum distance estimation of quantile panel data models"), we study the theoretical properties of these estimators.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:05

Template-Type: ReDIF-Paper 1.0
Title: Improved tests for Granger noncausality in panel data
File-URL: http://repec.org/csug2022/Karavias-Bern2022-xtgranger.pdf
Author-Name: Arturas Juodis
Author-Workplace-Name: University of Amsterdam
Author-Person: pju116
Author-Name: Yiannis Karavias
Author-Workplace-Name: University of Birmingham
Author-Person: pka744
Author-Name: Vasilis Sarafidis
Author-Workplace-Name: BI Norwegian Business School
Author-Person: psa786
Author-Name: Jan Ditzen
Author-Workplace-Name: Free University of Bozen-Bolzano
Author-Person: pdi434
Author-Name: Jiaqi Xiao
Author-Workplace-Name: University of Birmingham
Abstract: Granger causality is an important aspect of applied panel (longitudinal) data analysis because it can be used to determine whether one variable is useful in forecasting another. This presentation describes xtgranger, a community-contributed Stata command, which implements the panel Granger noncausality test of Juodis, Karavias, and Sarafidis (2021). This test offers superior size and power performance to existing tests, which stems from the use of a pooled estimator that has a faster convergence rate. The test has several other useful properties: it can be used in multivariate systems, it has power against both homogeneous and heterogeneous alternatives, and it allows for cross-section dependence and cross-section heteroskedasticity. The command is used to examine the type of temporal relation between profitability, cost efficiency, and asset quality in the U.S. banking industry.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:06

Template-Type: ReDIF-Paper 1.0
Title: Drivers of COVID-19 deaths in the United States: A two-stage modeling approach
File-URL: http://repec.org/csug2022/Baum-Bern2022-covid.pdf
Author-Name: Andrés Garcia-Suaza
Author-Workplace-Name: Universidad del Rosario
Author-Person: pga253
Author-Name: Miguel Henry
Author-Workplace-Name: Greylock McKinnon Associates
Author-Person: phe668
Author-Name: Jesús Otero
Author-Workplace-Name: Universidad del Rosario
Author-Person: pot11
Author-Name: Kit Baum
Author-Workplace-Name: Boston College
Author-Person: pba1
Abstract: We offer a two-stage (time-series and cross-section) econometric modeling approach to examine the drivers behind the spread of COVID-19 deaths across counties in the United States. Our empirical strategy exploits the availability of two years (January 2020 through January 2022) of daily data on the number of confirmed deaths and cases of COVID-19 in the 3,000 U.S. counties of the 48 contiguous states and the District of Columbia. In the first stage of the analysis, we use daily time-series data on COVID-19 cases and deaths to fit mixed models of deaths against lagged confirmed cases for each county. Because the resulting coefficients are county specific, they relax the homogeneity assumption that is implicit when the analysis is performed using geographically aggregated cross-section units. In the second stage of the analysis, we assume that these county estimates are a function of economic and sociodemographic factors that are taken as fixed over the course of the pandemic. Here we employ the novel one-covariate-at-a-time variable-selection algorithm proposed by Chudik et al. (2018) to guide the choice of regressors.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:07

Template-Type: ReDIF-Paper 1.0
Title: Bayesian time series in Stata 17
File-URL: http://repec.org/csug2022/Schenck-Bern2022.pdf
Author-Name: David Schenck
Author-Workplace-Name: StataCorp
Abstract: Stata 17 introduced Bayesian support for several multivariate time-series commands. In this presentation, I will discuss Bayesian vector autoregressive models and Bayesian DSGE models. Bayesian estimation is well suited to these models because economic considerations often impose structure that is captured well by informative priors. I will describe the main features of these commands, as well as Bayesian diagnostics, posterior hypothesis tests, predictions, impulse–response functions, and forecasts.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:08

Template-Type: ReDIF-Paper 1.0
Title: Network regressions in Stata
File-URL: http://repec.org/csug2022/Ditzen-Bern2022-nwxtregress.pdf
Author-Name: William Grieser
Author-Workplace-Name: Texas Christian University
Author-Name: Morad Zekhnini
Author-Workplace-Name: Michigan State University
Author-Name: Jan Ditzen
Author-Workplace-Name: Free University of Bozen-Bolzano
Author-Person: pdi434
Abstract: Network analysis has become critical to the study of social sciences. While several Stata programs are available for analyzing network structures, programs that execute regression analysis with a network structure are currently lacking. We fill this gap by introducing the nwxtregress command. Building on spatial econometric methods (LeSage and Pace 2009), nwxtregress uses MCMC estimation to produce estimates of endogenous peer effects, as well as own-node (direct) and cross-node (indirect) partial effects, where nodes correspond to cross-sectional units of observation, such as firms, and edges correspond to the relations between nodes. Unlike existing spatial regression commands (for example, spxtregress), nwxtregress is designed to handle unbalanced panels of economic and social networks as in Grieser et al. (2021). Networks can be directed or undirected with weighted or unweighted edges, and they can be imported in a list format that does not require a shapefile or a Stata spatial weight matrix set by spmatrix. Finally, the command allows for the inclusion or exclusion of contextual effects. To improve speed, the command transforms the spatial weighting matrix into a sparse matrix. Future work will be targeted toward improving sparse matrix routines, as well as introducing a framework that allows for multiple networks.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:09

Template-Type: ReDIF-Paper 1.0
Title: Exchangeably weighted bootstrap schemes
File-URL: http://repec.org/csug2022/VanKerm-Bern2022-exbsample.pdf
Author-Name: Philippe Van Kerm
Author-Workplace-Name: Luxembourg Institute of Socio-Economic Research
Author-Person: pva19
Abstract: The exchangeably weighted bootstrap is one of the many variants of bootstrap resampling schemes. Rather than directly drawing observations with replacement from the data, weighted bootstrap schemes generate vectors of replication weights to form bootstrap replications. Various ways to generate the replication weights can be adopted, and some choices bring practical computational advantages. This presentation demonstrates how easily such schemes can be implemented and where they are particularly useful, and introduces the exbsample command, which facilitates their implementation.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:10

Template-Type: ReDIF-Paper 1.0
Title: Marginal odds ratios: What they are, how to compute them, and why applied researchers might want to use them
File-URL: http://repec.org/csug2022/Jann-Bern2022-lnmor.pdf
Author-Name: Kristian Bernt Karlson
Author-Workplace-Name: University of Copenhagen
Author-Person: pka471
Author-Name: Ben Jann
Author-Workplace-Name: University of Bern
Author-Person: pja61
Abstract: Logistic response models form the backbone of much applied quantitative research in epidemiology and the social sciences. However, recent methodological research highlights difficulties in interpreting odds ratios, particularly in a multivariate modeling setting. These difficulties arise from the fact that coefficients from nonlinear probability models such as the logistic response model (for example, log odds-ratios) depend on model specification in ways that differ from the linear model. Applied researchers have responded to this situation by reporting marginal effects on the probability scale implied by the nonlinear probability model or obtained by the linear probability model. Although marginal effects on the probability scale have many desirable properties, they do not align well with research in which relative inequality is a key concept. We argue that, in many cases, the odds ratio is preferable because it is a relative measure that does not depend on the marginal distribution of the dependent variable. In our presentation, we aim to remedy the declining popularity of the odds ratio by introducing what we term the "marginal odds ratio", that is, logit coefficients that have properties similar to those of marginal effects on the probability scale but that retain the odds-ratio interpretation. We define the marginal odds ratio theoretically in terms of potential outcomes, both for binary and continuous treatments, we develop estimation methods using three different approaches (G-computation, inverse probability weighting, RIF regression), and we present examples that illustrate the usefulness and interpretation of the marginal odds ratio.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:11

Template-Type: ReDIF-Paper 1.0
Title: It is all about the data
File-URL: http://repec.org/csug2022/Buis-Bern2022.pdf
File-URL: http://repec.org/csug2022/Buis-Bern2022.zip
Author-Name: Maarten Buis
Author-Workplace-Name: University of Konstanz
Author-Person: pbu92
Abstract: This presentation is a collection of tips for exploring a new dataset and preparing a dataset using both official and community-contributed commands. Community-contributed commands that will be covered are lany, lookfor2, htmlcb, and closedesc.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:12

Template-Type: ReDIF-Paper 1.0
Title: btable: Extensive summary tables in Stata
File-URL: http://repec.org/csug2022/Buetikofer-Bern2022-btable.pdf
File-URL: http://repec.org/csug2022/Buetikofer-Bern2022-btable.html
Author-Name: Lukas Bütikofer
Author-Workplace-Name: University of Bern
Abstract: The construction of summary tables is a very common, repetitive, and time-consuming step in data analysis. btable is a flexible, easy-to-use, and powerful algorithm for generating such tables in Stata. It is freely available from GitHub. btable can summarize continuous, categorical, count, and time-to-event variables within one table using various descriptive statistics that can be individually chosen and combined for each variable. If the summary is grouped, effect measures with confidence intervals and p-values are added. User-defined effect measures and tests can be integrated. The table is constructed in a two-step approach using two functions: btable produces an unformatted, raw table, which is then formatted by btable_format to produce a final, publication-ready table. By default, the raw table contains all descriptive statistics and, if grouped, effect measures with confidence intervals and p-values. The formatting step allows for variable-specific selection and formatting. The two-step approach separates data analysis and formatting. The analysis step does not change the current dataset, and the raw data table can be loaded, formatted by hand, or used for other purposes. The formatting step can be modified without rerunning the analysis.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:13

Template-Type: ReDIF-Paper 1.0
Title: Visualizing categorical data with hammock plots
File-URL: http://repec.org/csug2022/Schonlau-Bern2022-hammock.pdf
Author-Name: Matthias Schonlau
Author-Workplace-Name: University of Waterloo
Abstract: Visualizing data with more than two variables is not straightforward, especially when some variables are categorical rather than continuous. My hammock plots are one option to visualize categorical data and mixed categorical/continuous data. Hammock plots can be viewed as a generalization of parallel coordinate plots, where the lines are replaced by rectangles that are proportional to the number of observations they represent. I will introduce my Stata program for hammock plots and give several short examples where I have found them useful.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:14

Template-Type: ReDIF-Paper 1.0
Title: circlebar: A Stata package for plotting circular bar graphs
File-URL: http://repec.org/csug2022/Naqvi-Bern2022-circlebar-spider.pdf
File-URL: http://repec.org/csug2022/Naqvi-Bern2022-circlebar-spider.do
Author-Name: Asjad Naqvi
Author-Workplace-Name: Austrian Institute for Economic Research
Author-Workplace-Name: Vienna University of Economics and Business
Author-Person: pna493
Abstract: This presentation will introduce circlebar, a Stata package that allows users to visualize data as circular bar graphs organized in polar coordinates. The command offers flexibility in selecting and changing bar dimensions, including starting and ending circles, colors, and label placements, and in controlling the spacing between the bars.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:15