Template-Type: ReDIF-Paper 1.0
Title: pystacked: Stacking generalization and machine learning in Stata
File-URL: http://repec.org/csug2022/Ahrens-Bern2022-pystacked.pdf
Author-Name: Christian B. Hansen
Author-Workplace-Name: University of Chicago
Author-Person: pha982
Author-Name: Mark E. Schaffer
Author-Workplace-Name: Heriot-Watt University
Author-Person: psc51
Author-Name: Achim Ahrens
Author-Workplace-Name: ETH Zürich
Author-Person: pah173
Abstract: pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn.
 Stacking combines multiple supervised machine learners—the “base” or “level-0” learners—into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multilayer perceptron). pystacked can also be used as a ‘regular’ machine learning program to fit a single base learner and, thus, provides an easy-to-use API for scikit-learn’s machine learning algorithms.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:01

Template-Type: ReDIF-Paper 1.0
Title: ddml: Double/debiased machine learning in Stata
File-URL: http://repec.org/csug2022/Ahrens-Bern2022-ddml.pdf
Author-Name: Christian B. Hansen
Author-Workplace-Name: University of Chicago
Author-Person: pha982
Author-Name: Mark E. Schaffer
Author-Workplace-Name: Heriot-Watt University
Author-Person: psc51
Author-Name: Thomas Wiemann
Author-Workplace-Name: University of Chicago
Author-Name: Achim Ahrens
Author-Workplace-Name: ETH Zürich
Author-Person: pah173
Abstract: We introduce the Stata package ddml, which implements double/debiased machine learning (DDML) for causal inference aided by supervised machine learning.
Five different models are supported, allowing for multiple treatment variables in the presence of high-dimensional controls and instrumental variables. ddml is compatible with many existing supervised machine learning programs in Stata.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:02

Template-Type: ReDIF-Paper 1.0
Title: Stata–Python API for bulk data download: Example with UN Comtrade
File-URL: http://repec.org/csug2022/Wong-Bern2022-comtrade.pdf
Author-Name: Ka Lok Wong
Author-Workplace-Name: Geneva Graduate Institute
Abstract: This presentation guides the audience through the bulk download of Comtrade data via the Stata–Python integration, which has been available since Stata 16.
Though this presentation is explicitly about the UN Comtrade dataset, the methodology generalizes to other data platforms that allow API downloads. The UN Comtrade Database is one of the best sources of bilateral trade data by product code. As of early 2022, it covers more country-year observations than the World Trade Organization and the International Trade Centre. However, tailoring the raw data to each researcher’s needs is often time-consuming. Using the Comtrade API with my Stata–Python setup allows researchers to tailor their downloaded data to their desired specification. In addition, this setup significantly reduces human error compared with manually downloading and cleaning Comtrade data.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:03

Template-Type: ReDIF-Paper 1.0
Title: Flexible and fast estimation of quantile treatment effects: The rqr and rqrplot commands
File-URL: http://repec.org/csug2022/Haupt-Bern2022-rqr.pdf
Author-Name: Andreas Haupt
Author-Workplace-Name: Karlsruhe Institute of Technology
Author-Name: Øyvind Wiborg
Author-Workplace-Name: University of Oslo
Author-Name: Nicolai T. Borgen
Author-Workplace-Name: University of Oslo
Abstract: Using quantile regression models to estimate quantile treatment effects is becoming increasingly popular.
This presentation introduces the rqr command, which can be used to estimate residualized quantile regression (RQR) coefficients, and the rqrplot postestimation command, which can be used to plot the coefficients effortlessly. The main advantages of rqr over other Stata commands that estimate (unconditional) quantile treatment effects are that it can include high-dimensional fixed effects and that it is considerably faster.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:04

Template-Type: ReDIF-Paper 1.0
Title: Stata commands to estimate quantile regression with panel and grouped data
File-URL: http://repec.org/csug2022/Melly-Bern2022-mdqr.pdf
Author-Name: Martina Pons
Author-Workplace-Name: University of Bern
Author-Name: Blaise Melly
Author-Workplace-Name: University of Bern
Author-Person: pme143
Abstract: In this presentation, we introduce two Stata commands for estimating quantile regression with panel and grouped data.
The commands implement two-step minimum-distance estimators. We first compute a quantile regression within each unit and then apply GMM to the fitted values from the first stage. The command xtmdqr applies to classical panel data, where we follow the same units over time, while the command mdqr applies to grouped data, where the observations are at the individual level but the treatment varies at the group level. Depending on the variables assumed to be exogenous, this approach provides quantile analogs of the classical least-squares panel-data estimators such as the fixed-effects, random-effects, between, and Hausman–Taylor estimators. For grouped (instrumental) quantile regression, we provide a more precise estimator than the existing estimators. In our companion paper (Melly and Pons, "Minimum distance estimation of quantile panel data models"), we study the theoretical properties of these estimators.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:05

Template-Type: ReDIF-Paper 1.0
Title: Improved tests for Granger noncausality in panel data
File-URL: http://repec.org/csug2022/Karavias-Bern2022-xtgranger.pdf
Author-Name: Arturas Juodis
Author-Workplace-Name: University of Amsterdam
Author-Person: pju116
Author-Name: Yiannis Karavias
Author-Workplace-Name: University of Birmingham
Author-Person: pka744
Author-Name: Vasilis Sarafidis
Author-Workplace-Name: BI Norwegian Business School
Author-Person: psa786
Author-Name: Jan Ditzen
Author-Workplace-Name: Free University of Bozen-Bolzano
Author-Person: pdi434
Author-Name: Jiaqi Xiao
Author-Workplace-Name: University of Birmingham
Abstract: Granger causality is an important aspect of applied panel (longitudinal) data analysis because it can be used to determine whether one variable is useful in forecasting another.
This presentation describes xtgranger, a community-contributed Stata command that implements the panel Granger noncausality test of Juodis, Karavias, and Sarafidis (2021). This test offers better size and power than existing tests, which stems from the use of a pooled estimator with a faster convergence rate. The test has several other useful properties: it can be used in multivariate systems, it has power against both homogeneous and heterogeneous alternatives, and it allows for cross-section dependence and cross-section heteroskedasticity. The command is used to examine the temporal relation between profitability, cost efficiency, and asset quality in the U.S. banking industry.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:06

Template-Type: ReDIF-Paper 1.0
Title: Drivers of COVID-19 deaths in the United States: A two-stage modeling approach
File-URL: http://repec.org/csug2022/Baum-Bern2022-covid.pdf
Author-Name: Andrés Garcia-Suaza
Author-Workplace-Name: Universidad del Rosario
Author-Person: pga253
Author-Name: Miguel Henry
Author-Workplace-Name: Greylock McKinnon Associates
Author-Person: phe668
Author-Name: Jesús Otero
Author-Workplace-Name: Universidad del Rosario
Author-Person: pot11
Author-Name: Kit Baum
Author-Workplace-Name: Boston College
Author-Person: pba1
Abstract: We offer a two-stage (time-series and cross-section) econometric modeling approach to examine the drivers behind the spread of COVID-19 deaths across counties in the United States.
Our empirical strategy exploits the availability of two years (January 2020 through January 2022) of daily data on the number of confirmed deaths and cases of COVID-19 in the 3,000 U.S. counties of the 48 contiguous states and the District of Columbia. In the first stage of the analysis, we use daily time-series data on COVID-19 cases and deaths to fit mixed models of deaths against lagged confirmed cases for each county. Because the resulting coefficients are county specific, they relax the homogeneity assumption that is implicit when the analysis is performed using geographically aggregated cross-section units. In the second stage of the analysis, we assume that these county estimates are a function of economic and sociodemographic factors that are taken as fixed over the course of the pandemic. Here we employ the novel one-covariate-at-a-time variable-selection algorithm proposed by Chudik et al. (2018) to guide the choice of regressors.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:07

Template-Type: ReDIF-Paper 1.0
Title: Bayesian time series in Stata 17
File-URL: http://repec.org/csug2022/Schenck-Bern2022.pdf
Author-Name: David Schenck
Author-Workplace-Name: StataCorp
Abstract: Stata 17 introduced Bayesian support for several multivariate time-series commands.
In this presentation, I will discuss Bayesian vector autoregressive models and Bayesian DSGE models. Bayesian estimation is well suited to these models because economic considerations often impose structure that is captured well by informative priors. I will describe the main features of these commands, as well as Bayesian diagnostics, posterior hypothesis tests, predictions, impulse–response functions, and forecasts.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:08

Template-Type: ReDIF-Paper 1.0
Title: Network regressions in Stata
File-URL: http://repec.org/csug2022/Ditzen-Bern2022-nwxtregress.pdf
Author-Name: William Grieser
Author-Workplace-Name: Texas Christian University
Author-Name: Morad Zekhnini
Author-Workplace-Name: Michigan State University
Author-Name: Jan Ditzen
Author-Workplace-Name: Free University of Bozen-Bolzano
Author-Person: pdi434
Abstract: Network analysis has become critical to research in the social sciences.
While several Stata programs are available for analyzing network structures, programs that execute regression analysis with a network structure are currently lacking. We fill this gap by introducing the nwxtregress command. Building on spatial econometric methods (LeSage and Pace 2009), nwxtregress uses MCMC estimation to produce estimates of endogenous peer effects, as well as own-node (direct) and cross-node (indirect) partial effects, where nodes correspond to cross-sectional units of observation, such as firms, and edges correspond to the relations between nodes. Unlike existing spatial regression commands (for example, spxtregress), nwxtregress is designed to handle unbalanced panels of economic and social networks as in Grieser et al. (2021). Networks can be directed or undirected with weighted or unweighted edges, and they can be imported in a list format that does not require a shapefile or a Stata spatial weight matrix set by spmatrix. Finally, the command allows for the inclusion or exclusion of contextual effects. To improve speed, the command transforms the spatial weighting matrix into a sparse matrix. Future work will be targeted toward improving sparse matrix routines, as well as introducing a framework that allows for multiple networks.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:09

Template-Type: ReDIF-Paper 1.0
Title: Exchangeably weighted bootstrap schemes
File-URL: http://repec.org/csug2022/VanKerm-Bern2022-exbsample.pdf
Author-Name: Philippe Van Kerm
Author-Workplace-Name: Luxembourg Institute of Socio-Economic Research
Author-Person: pva19
Abstract: The exchangeably weighted bootstrap is one of the many variants of bootstrap resampling schemes.
Rather than directly drawing observations with replacement from the data, weighted bootstrap schemes generate vectors of replication weights to form bootstrap replications. Various ways to generate the replication weights can be adopted, and some choices bring practical computational advantages. This presentation demonstrates how easily such schemes can be implemented and where they are particularly useful, and introduces the exbsample command, which facilitates their implementation.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:10

Template-Type: ReDIF-Paper 1.0
Title: Marginal odds ratios: What they are, how to compute them, and why applied researchers might want to use them
File-URL: http://repec.org/csug2022/Jann-Bern2022-lnmor.pdf
Author-Name: Kristian Bernt Karlson
Author-Workplace-Name: University of Copenhagen
Author-Person: pka471
Author-Name: Ben Jann
Author-Workplace-Name: University of Bern
Author-Person: pja61
Abstract: Logistic response models form the backbone of much applied quantitative research in epidemiology and the social sciences.
However, recent methodological research highlights difficulties in interpreting odds ratios, particularly in a multivariate modeling setting. These difficulties arise because coefficients from nonlinear probability models such as the logistic response model (for example, log odds ratios) depend on model specification in ways that differ from the linear model. Applied researchers have responded to this situation by reporting marginal effects on the probability scale, either implied by the nonlinear probability model or obtained from the linear probability model.
Although marginal effects on the probability scale have many desirable properties, they do not align well with research in which relative inequality is a key concept. We argue that, in many cases, the odds ratio is preferable because it is a relative measure that does not depend on the marginal distribution of the dependent variable. In our presentation, we aim to remedy the declining popularity of the odds ratio by introducing what we term the "marginal odds ratio", that is, logit coefficients that have properties similar to those of marginal effects on the probability scale but that retain the odds-ratio interpretation. We define the marginal odds ratio theoretically in terms of potential outcomes for both binary and continuous treatments; develop estimation methods using three different approaches (G-computation, inverse probability weighting, and RIF regression); and present examples that illustrate the usefulness and interpretation of the marginal odds ratio.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:11

Template-Type: ReDIF-Paper 1.0
Title: It is all about the data
File-URL: http://repec.org/csug2022/Buis-Bern2022.pdf
File-URL: http://repec.org/csug2022/Buis-Bern2022.zip
Author-Name: Maarten Buis
Author-Workplace-Name: University of Konstanz
Author-Person: pbu92
Abstract: This presentation is a collection of tips for exploring a new dataset and preparing a dataset using both official and community-contributed commands.
Community-contributed commands that will be covered are lany, lookfor2, htmlcb, and closedesc.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:12

Template-Type: ReDIF-Paper 1.0
Title: btable: Extensive summary tables in Stata
File-URL: http://repec.org/csug2022/Buetikofer-Bern2022-btable.pdf
File-URL: http://repec.org/csug2022/Buetikofer-Bern2022-btable.html
Author-Name: Lukas Bütikofer
Author-Workplace-Name: University of Bern
Abstract: The construction of summary tables is a very common, repetitive, and time-consuming step in data analysis.
btable is a flexible, easy-to-use, and powerful algorithm for generating such tables in Stata. It is freely available from GitHub. btable can summarize continuous, categorical, count, and time-to-event variables within one table using various descriptive statistics that can be individually chosen and combined for each variable. If the summary is grouped, effect measures with confidence intervals and p-values are added. User-defined effect measures and tests can be integrated.
The table is constructed in a two-step approach using two functions: btable produces an unformatted, raw table, which is then formatted by btable_format to produce a final, publication-ready table. By default, the raw table contains all descriptive statistics, and, if grouped, effect measures with confidence intervals and p-values. The formatting step allows for variable-specific selection and formatting. The two-step approach separates data analysis and formatting. The analysis step does not change the current dataset, and the raw data table can be loaded, formatted by hand, or used for other purposes. The formatting step can be modified without rerunning the analysis.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:13

Template-Type: ReDIF-Paper 1.0
Title: Visualizing categorical data with hammock plots
File-URL: http://repec.org/csug2022/Schonlau-Bern2022-hammock.pdf
Author-Name: Matthias Schonlau
Author-Workplace-Name: University of Waterloo
Abstract: Visualizing data with more than two variables is not straightforward, especially when some variables are categorical rather than continuous.
My hammock plots are one option to visualize categorical data and mixed categorical/continuous data. Hammock plots can be viewed as a generalization of parallel coordinate plots, where the lines are replaced by rectangles that are proportional to the number of observations they represent. I will introduce my Stata program for hammock plots and give several short examples where I have found them useful.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:14

Template-Type: ReDIF-Paper 1.0
Title: circlebar: A Stata package for plotting circular bar graphs
File-URL: http://repec.org/csug2022/Naqvi-Bern2022-circlebar-spider.pdf
File-URL: http://repec.org/csug2022/Naqvi-Bern2022-circlebar-spider.do
Author-Name: Asjad Naqvi
Author-Workplace-Name: Austrian Institute for Economic Research
Author-Workplace-Name: Vienna University of Economics and Business
Author-Person: pna493
Abstract: This presentation will introduce circlebar, a Stata package that allows users to visualize data as circular bar graphs organized in polar coordinates.
The command offers flexibility in selecting and changing bar dimensions, including the starting and ending circles, colors, and label placement, as well as control over the spacing between bars.
Creation-Date: 20221130
Handle: RePEc:boc:csug22:15