Template-Type: ReDIF-Article 1.0
Author-Name: Matthias Schonlau
Author-Workplace-Name: University of Waterloo
Author-Email: schonlau@uwaterloo.ca
Author-Name: Rosie Yuyan Zou
Author-Workplace-Name: University of Waterloo
Author-Email: y53zou@uwaterloo.ca
Title: The random forest algorithm for statistical learning
Journal: Stata Journal
Pages: 3-29
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909688
Abstract: Random forests (Breiman, 2001, Machine Learning 45: 5–32) is a statistical- or machine-learning algorithm for prediction. In this article, we introduce a corresponding new command, rforest. We overview the random forest algorithm and illustrate its use with two examples: The first example is a classification problem that predicts whether a credit card holder will default on his or her debt. The second example is a regression problem that predicts the log-scaled number of shares of online news articles. We conclude with a discussion that summarizes key points demonstrated in the examples.
Keywords: rforest, random decision forest algorithm
File-URL: http://hdl.handle.net/10.1177/1536867X20909688
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0587/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:3-29

Template-Type: ReDIF-Article 1.0
Author-Name: John Luke Gallup
Author-Workplace-Name: Portland State University
Author-Email: jlgallup@pdx.edu
Title: Added-variable plots for panel-data estimation
Journal: Stata Journal
Pages: 30-50
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909689
Abstract: In this article, I extend the theory of added-variable plots to three panel-data estimation methods: fixed effects, between effects, and random effects. An added-variable plot is an effective way to show the correlation between an independent variable and a dependent variable conditional on other independent variables. In a multivariate context, a simple scatterplot showing x versus y is not adequate to show the relationship of x with y, because it ignores the impact of the other covariates. Added-variable plots are also useful for spotting influential outliers in the data that affect the estimated regression parameters. Stata can display added-variable plots with the command avplot, but it can be used only after regress. My new command, xtavplot, is a postestimation command that creates added-variable plots after xtreg estimates. Unlike avplot, xtavplot can display a confidence interval around the fitted regression line.
Keywords: xtavplot, xtavplots, added-variable plot, panel data, postestimation diagnostics, xtreg
File-URL: http://hdl.handle.net/10.1177/1536867X20909689
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/gr0082/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:30-50

Template-Type: ReDIF-Article 1.0
Author-Name: Fernando Rios-Avila
Author-Workplace-Name: Levy Economics Institute of Bard College
Author-Email: friosavi@levy.org
Author-Person: pri214
Title: Recentered influence functions (RIFs) in Stata: RIF regression and RIF decomposition
Journal: Stata Journal
Pages: 51-94
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909690
Abstract: Recentered influence functions (RIFs) are statistical tools popularized by Firpo, Fortin, and Lemieux (2009, Econometrica 77: 953–973) for analyzing unconditional partial effects on quantiles in a regression analysis framework (unconditional quantile regressions). The flexibility and simplicity of these tools have opened the possibility to extend the analysis to other distributional statistics using linear regressions or decomposition approaches. In this article, I introduce one function and two commands to facilitate the use of RIFs in the analysis of outcome distributions: rifvar() is an egen extension used to create RIFs for a large set of distributional statistics, rifhdreg facilitates the estimation of RIF regressions enabling the use of high-dimensional fixed effects, and oaxaca_rif implements Oaxaca–Blinder decomposition analysis (RIF decompositions).
Keywords: rifvar(), rifhdreg, rifsureg2, oaxaca_rif, uqreg, recentered influence functions, unconditional partial effects, unconditional quantile regression, RIF regressions, distributional statistics, Oaxaca–Blinder, RIF decomposition
File-URL: http://hdl.handle.net/10.1177/1536867X20909690
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0588/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:51-94

Template-Type: ReDIF-Article 1.0
Author-Name: Sergio Correia
Author-Workplace-Name: Federal Reserve Board of Governors
Author-Email: sergio.a.correia@frb.gov
Author-Person: pco826
Author-Name: Paulo Guimarães
Author-Workplace-Name: Banco de Portugal
Author-Email: pfguimaraes@bportugal.pt
Author-Person: pgu11
Author-Name: Tom Zylkin
Author-Workplace-Name: University of Richmond
Author-Email: tzylkin@richmond.edu
Author-Person: pzy12
Title: Fast Poisson estimation with high-dimensional fixed effects
Journal: Stata Journal
Pages: 95-115
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909691
Abstract: In this article, we present ppmlhdfe, a new command for estimation of (pseudo-)Poisson regression models with multiple high-dimensional fixed effects (HDFE). Estimation is implemented using a modified version of the iteratively reweighted least-squares algorithm that allows for fast estimation in the presence of HDFE. Because the code is built around the reghdfe package (Correia, 2014, Statistical Software Components S457874, Department of Economics, Boston College), it has similar syntax, supports many of the same functionalities, and benefits from reghdfe’s fast convergence properties for computing high-dimensional least-squares problems. Performance is further enhanced by some new techniques we introduce for accelerating HDFE iteratively reweighted least-squares estimation specifically. ppmlhdfe also implements a novel and more robust approach to check for the existence of (pseudo)maximum likelihood estimates.
Keywords: ppmlhdfe, reghdfe, Poisson regression, high-dimensional fixed effects
File-URL: http://hdl.handle.net/10.1177/1536867X20909691
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0589/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:95-115

Template-Type: ReDIF-Article 1.0
Author-Name: J. R. Lockwood
Author-Workplace-Name: Educational Testing Service
Author-Email: jrlockwood@ets.org
Author-Name: Daniel F. McCaffrey
Author-Workplace-Name: Educational Testing Service
Author-Email: dmccaffrey@ets.org
Title: Recommendations about estimating errors-in-variables regression in Stata
Journal: Stata Journal
Pages: 116-130
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909692
Abstract: Errors-in-variables (EIV) regression is a standard method for consistent estimation in linear models with error-prone covariates. The Stata commands eivreg and sem both can be used to compute the same EIV estimator of the regression coefficients. However, the commands do not use the same methods to estimate the standard errors of the estimated regression coefficients. In this article, we use analysis and simulation to demonstrate that standard errors reported by eivreg are negatively biased under assumptions typically made in latent-variable modeling, leading to confidence interval coverage that is below the nominal level. Thus, sem alone or eivreg augmented with bootstrapped standard errors should be preferred to eivreg alone in most practical applications of EIV regression.
Keywords: errors-in-variables regression, eivreg, sem, standard-error estimation
File-URL: http://hdl.handle.net/10.1177/1536867X20909692
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0590/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:116-130

Template-Type: ReDIF-Article 1.0
Author-Name: Jonathan Cook
Author-Workplace-Name: Public Company Accounting Oversight Board
Author-Email: jacook@uci.edu
Author-Name: Vikram Ramadas
Author-Workplace-Name: Public Company Accounting Oversight Board
Author-Email: vnramadas@ucdavis.edu
Title: When to consult precision-recall curves
Journal: Stata Journal
Pages: 131-148
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909693
Abstract: Receiver operating characteristic (ROC) curves are commonly used to evaluate predictions of binary outcomes. When there is a small percentage of items of interest (as would be the case with fraud detection, for example), ROC curves can provide an inflated view of performance. This can cause challenges in determining which set of predictions is better. In this article, we discuss the conditions under which precision-recall curves may be preferable to ROC curves. As an illustrative example, we compare two commonly used fraud predictors (Beneish’s [1999, Financial Analysts Journal 55: 24–36] M score and Dechow et al.’s [2011, Contemporary Accounting Research 28: 17–82] F score) using both ROC and precision-recall curves. To aid the reader with using precision-recall curves, we also introduce the command prcurve to plot them.
Keywords: prcurve, precision-recall curves, classifier evaluation, ROC curves
File-URL: http://hdl.handle.net/10.1177/1536867X20909693
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0591/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:131-148

Template-Type: ReDIF-Article 1.0
Author-Name: Koen Jochmans
Author-Workplace-Name: University of Cambridge
Author-Email: kj345@cam.ac.uk
Author-Person: pjo240
Author-Name: Vincenzo Verardi
Author-Workplace-Name: Université de Namur
Author-Email: vverardi@unamur.be
Author-Person: pve73
Title: A portmanteau test for serial correlation in a linear panel model
Journal: Stata Journal
Pages: 149-161
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909695
Abstract: We introduce the command xtserialpm to perform the portmanteau test developed in Jochmans (2019, Cambridge Working Papers in Economics No. 1993, University of Cambridge, Faculty of Economics). The procedure tests for serial correlation of arbitrary form in the errors of a linear panel model after estimation of the regression coefficients by the within-group estimator. The test is designed for short panels and can deal with general missing-data patterns. The test is different from the related portmanteau test of Inoue and Solon (2006, Econometric Theory 22: 835–851), which is performed by xtistest (Wursten, 2018, Stata Journal 18: 76–100), in that it allows for heteroskedasticity. In simulations documented below, xtserialpm is found to provide a more powerful test than xthrtest (Wursten 2018), which performs the test for first-order autocorrelation of Born and Breitung (2016, Econometric Reviews 35: 1290–1316). We also provide comparisons with xtistest and xtserial (Drukker, 2003, Stata Journal 3: 168–177). These tests perform well under stationarity but break down under even mild forms of heteroskedasticity.
Keywords: xtserialpm, heteroskedasticity, fixed-effects model, portmanteau test, serial correlation, short panel data, unbalanced panel
File-URL: http://hdl.handle.net/10.1177/1536867X20909695
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0592/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:149-161

Template-Type: ReDIF-Article 1.0
Author-Name: Ariel Linden
Author-Workplace-Name: Linden Consulting Group, LLC
Author-Email: alinden@lindenconsulting.org
Author-Person: pli1113
Author-Name: Maya B. Mathur
Author-Workplace-Name: Harvard University
Author-Email: mmathur@stanford.edu
Author-Name: Tyler J. VanderWeele
Author-Workplace-Name: Harvard T.H. Chan School of Public Health
Author-Email: tvanderw@hsph.harvard.edu
Title: Conducting sensitivity analysis for unmeasured confounding in observational studies using E-values: The evalue package
Journal: Stata Journal
Pages: 162-175
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909696
Abstract: In this article, we introduce the evalue package, which performs sensitivity analyses for unmeasured confounding in observational studies using the methodology proposed by VanderWeele and Ding (2017, Annals of Internal Medicine 167: 268–274). evalue reports E-values, defined as the minimum strength of association on the risk-ratio scale that an unmeasured confounder would need to have with both the treatment assignment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates. evalue computes E-values for point estimates (and optionally, confidence limits) for several common outcome types, including risk and rate ratios, odds ratios with common or rare outcomes, hazard ratios with common or rare outcomes, standardized mean differences in outcomes, and risk differences.
Keywords: evalue, E-value, sensitivity analysis, treatment effects, causality, confounding
File-URL: http://hdl.handle.net/10.1177/1536867X20909696
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0593/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:162-175

Template-Type: ReDIF-Article 1.0
Author-Name: Achim Ahrens
Author-Workplace-Name: ETH Zürich
Author-Email: achim.ahrens@gess.ethz.ch
Author-Person: pah173
Author-Name: Christian B. Hansen
Author-Workplace-Name: University of Chicago
Author-Email: christian.hansen@chicagobooth.edu
Author-Person: pha982
Author-Name: Mark E. Schaffer
Author-Workplace-Name: Heriot-Watt University
Author-Email: m.e.schaffer@hw.ac.uk
Author-Person: psc51
Title: lassopack: Model selection and prediction with regularized regression in Stata
Journal: Stata Journal
Pages: 176-235
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909697
Abstract: In this article, we introduce lassopack, a suite of programs for regularized regression in Stata. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso, and postestimation ordinary least squares. The methods are suitable for the high-dimensional setting, where the number of predictors p may be large and possibly greater than the number of observations, n. We offer three approaches for selecting the penalization (“tuning”) parameters: information criteria (implemented in lasso2), K-fold cross-validation and h-step-ahead rolling cross-validation for cross-section, panel, and time-series data (cvlasso), and theory-driven (“rigorous” or plugin) penalization for the lasso and square-root lasso for cross-section and panel data (rlasso). We discuss the theoretical framework and practical considerations for each approach. We also present Monte Carlo results to compare the performances of the penalization approaches.
Keywords: lasso2, cvlasso, rlasso, cvlassologit, lassologit, rlassologit, lasso2 postestimation, lassologit postestimation, rlasso postestimation, lasso, elastic net, square-root lasso, cross-validation
File-URL: http://hdl.handle.net/10.1177/1536867X20909697
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0594/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:176-235

Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas J. Cox
Author-Workplace-Name: Durham University
Author-Email: n.j.cox@durham.ac.uk
Author-Person: pco34
Title: Speaking Stata: Concatenating values over observations
Journal: Stata Journal
Pages: 236-243
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909698
Abstract: Concatenation, or joining together, of strings or other values, possibly with extra punctuation such as spaces, is supported in Stata by addition of strings and by the egen function concat(), which concatenates values of variables within observations. In this column, I discuss basic techniques for concatenating values of variables over observations, emphasizing simple loops that can be tuned to suit variants as desired. Commonly, such concatenated strings report a profile or history of each individual within panel or longitudinal data. Such histories can then be analyzed further.
Keywords: concatenation, strings, panel data, longitudinal data
File-URL: http://hdl.handle.net/10.1177/1536867X20909698
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/pr0071/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:236-243

Template-Type: ReDIF-Article 1.0
Author-Name: Maarten L. Buis
Author-Workplace-Name: University of Konstanz
Author-Email: maarten.buis@uni-konstanz.de
Author-Person: pbu92
Title: Stata tip 135: Leaps and bounds
Journal: Stata Journal
Pages: 244-249
Issue: 1
Volume: 20
Year: 2020
Month: March
X-DOI: 10.1177/1536867X20909707
File-URL: http://hdl.handle.net/10.1177/1536867X20909707
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/pr0071/
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:244-249

Template-Type: ReDIF-Article 1.0
Author-Name: Editors
Author-Email: editors@stata.com
Title: Software updates
Journal: Stata Journal
Pages: 250-251
Issue: 1
Volume: 20
Year: 2020
Month: March
Abstract: Updates for previously published packages are provided.
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0399_1/
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0526_1/
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj20-1/st0574_1/
Note: Windows users should not attempt to download these files with a web browser.
Handle:RePEc:tsj:stataj:v:20:y:2020:i:1:p:250-251