Template-Type: ReDIF-Article 1.0 Author-Name: Philippe Barbe Author-X-Name-First: Philippe Author-X-Name-Last: Barbe Author-Name: William C. Horrace Author-X-Name-First: William C. Author-X-Name-Last: Horrace Title: A Critical Reanalysis of Maryland State Police Searches Abstract: This article argues that previous analyses of the Maryland State Police search data may be unreliable, since nonstationarity of these data precludes the use of standard statistical inference techniques. In contrast, proper statistical graphics seem better suited to capture the complexities of the racial bias issue. Journal: The American Statistician Pages: 1-7 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.663662 File-URL: http://hdl.handle.net/10.1080/00031305.2012.663662 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:1-7 Template-Type: ReDIF-Article 1.0 Author-Name: Jesse Frey Author-X-Name-First: Jesse Author-X-Name-Last: Frey Author-Name: Andrés Pérez Author-X-Name-First: Andrés Author-X-Name-Last: Pérez Title: Exact Binomial Confidence Intervals for Randomized Response Abstract: We consider the problem of finding an exact confidence interval for a proportion that is estimated using randomized response. For many randomized response schemes, this is equivalent to finding an exact confidence interval for a bounded binomial proportion. Such intervals can be obtained by truncating standard exact binomial confidence intervals, but the truncated intervals may be empty or misleadingly short. We address this problem by using exact confidence intervals obtained by inverting a likelihood ratio test that takes into account that the proportion is bounded. A simple adjustment is made to keep the intervals from being excessively conservative. An R function for computing the intervals is available as online supplementary material. Journal: The American Statistician Pages: 8-15 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.663680 File-URL: http://hdl.handle.net/10.1080/00031305.2012.663680 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:8-15 Template-Type: ReDIF-Article 1.0 Author-Name: Robert S. Poulson Author-X-Name-First: Robert S. Author-X-Name-Last: Poulson Author-Name: Gary L. Gadbury Author-X-Name-First: Gary L. Author-X-Name-Last: Gadbury Author-Name: David B. Allison Author-X-Name-First: David B. Author-X-Name-Last: Allison Title: Treatment Heterogeneity and Individual Qualitative Interaction Abstract: Plausibility of high variability in treatment effects across individuals has been recognized as an important consideration in clinical studies. Surprisingly, little attention has been given to evaluating this variability in design of clinical trials or analyses of resulting data. High variation in a treatment's efficacy or safety across individuals (referred to herein as treatment heterogeneity) may have important consequences because the optimal treatment choice for an individual may be different from that suggested by a study of average effects. We call this an individual qualitative interaction (IQI), borrowing terminology from earlier work—referring to a qualitative interaction (QI) being present when the optimal treatment varies across “groups” of individuals. 
At least three techniques have been proposed to investigate treatment heterogeneity: techniques to detect a QI, use of measures such as the density overlap of two outcome variables under different treatments, and use of cross-over designs to observe “individual effects.” We elucidate underlying connections among them, their limitations, and some assumptions that may be required. We do so under a potential outcomes framework that can add insights to results from usual data analyses and to study design features that improve the capability to more directly assess treatment heterogeneity. Journal: The American Statistician Pages: 16-24 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.671724 File-URL: http://hdl.handle.net/10.1080/00031305.2012.671724 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:16-24 Template-Type: ReDIF-Article 1.0 Author-Name: A. S. Hedayat Author-X-Name-First: A. S. Author-X-Name-Last: Hedayat Author-Name: Guoqin Su Author-X-Name-First: Guoqin Author-X-Name-Last: Su Title: Robustness of the Simultaneous Estimators of Location and Scale From Approximating a Histogram by a Normal Density Curve Abstract: The robust properties of the simultaneous estimators of location and scale parameters (μ*, σ*) proposed by Brown and Hwang are studied. As a pair of simultaneous M estimators of location and scale, their asymptotic efficiencies (0.650 for μ* and 0.541 for σ*) are higher than those for median (0.637) and median absolute deviation (0.368) under the normal distribution. Simulation indicates that the distributions of μ* and σ* are much flatter than those based on the sample mean and the sample standard deviation under the normal distribution when the sample size is small. Journal: The American Statistician Pages: 25-33 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.663665 File-URL: http://hdl.handle.net/10.1080/00031305.2012.663665 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:25-33 Template-Type: ReDIF-Article 1.0 Author-Name: Bailey K. Fosdick Author-X-Name-First: Bailey K. Author-X-Name-Last: Fosdick Author-Name: Adrian E. Raftery Author-X-Name-First: Adrian E. Author-X-Name-Last: Raftery Title: Estimating the Correlation in Bivariate Normal Data With Known Variances and Small Sample Sizes Abstract: We consider the problem of estimating the correlation in bivariate normal data when the means and variances are assumed known, with emphasis on the small sample case. We consider eight different estimators, several of them considered here for the first time in the literature. In a simulation study, we found that Bayesian estimators using the uniform and arc-sine priors outperformed several empirical and exact or approximate maximum likelihood estimators in small samples. The arc-sine prior did better for large values of the correlation. For testing whether the correlation is zero, we found that Bayesian hypothesis tests outperformed significance tests based on the empirical and exact or approximate maximum likelihood estimators considered in small samples, but that all tests performed similarly for sample size 50. These results lead us to suggest using the posterior mean with the arc-sine prior to estimate the correlation in small samples when the variances are assumed known.
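As an illustration of the estimator recommended in the abstract above, the following minimal R sketch (ours, not the authors' code) computes the posterior mean of the correlation under the arc-sine prior by grid approximation, assuming known means 0 and known variances 1; the function name and grid resolution are arbitrary choices.

  # Posterior mean of the correlation rho under an arc-sine prior,
  # for bivariate normal data with known means (0) and variances (1).
  post_mean_rho <- function(x, y, grid = seq(-0.999, 0.999, length.out = 2001)) {
    loglik <- sapply(grid, function(r)
      -length(x) / 2 * log(1 - r^2) -
        sum(x^2 - 2 * r * x * y + y^2) / (2 * (1 - r^2)))
    logpost <- loglik - 0.5 * log(1 - grid^2)  # arc-sine prior, up to a constant
    w <- exp(logpost - max(logpost))           # rescale for numerical stability
    sum(grid * w) / sum(w)                     # grid-approximated posterior mean
  }
  set.seed(1)
  n <- 10; rho <- 0.6
  x <- rnorm(n); y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  post_mean_rho(x, y)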
Journal: The American Statistician Pages: 34-41 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.676329 File-URL: http://hdl.handle.net/10.1080/00031305.2012.676329 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:34-41 Template-Type: ReDIF-Article 1.0 Author-Name: Michael L. Lavine Author-X-Name-First: Michael L. Author-X-Name-Last: Lavine Author-Name: James S. Hodges Author-X-Name-First: James S. Author-X-Name-Last: Hodges Title: On Rigorous Specification of ICAR Models Abstract: Intrinsic (or improper) conditional autoregressions, or ICARs, are widely used in spatial statistics, splines, dynamic linear models, and elsewhere. Such models usually have several variance components, including one for errors and at least one for random effects. Likelihood and Bayesian inference depend on the likelihood function of those variances. But in the absence of constraints or further specifications that are not inherent to ICARs, the likelihood function is arbitrary and thus so are some inferences. We suggest several ways to add constraints or further specifications, but any choice is merely a convention. Journal: The American Statistician Pages: 42-49 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.654746 File-URL: http://hdl.handle.net/10.1080/00031305.2012.654746 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:42-49 Template-Type: ReDIF-Article 1.0 Author-Name: Alvaro Nosedal-Sanchez Author-X-Name-First: Alvaro Author-X-Name-Last: Nosedal-Sanchez Author-Name: Curtis B. Storlie Author-X-Name-First: Curtis B. Author-X-Name-Last: Storlie Author-Name: Thomas C.M. Lee Author-X-Name-First: Thomas C.M. Author-X-Name-Last: Lee Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Reproducing Kernel Hilbert Spaces for Penalized Regression: A Tutorial Abstract: Penalized regression procedures have become very popular ways to estimate complicated functions. The smoothing spline, for example, is the solution of a minimization problem in a functional space. If such a minimization problem is posed on a reproducing kernel Hilbert space (RKHS), the solution is guaranteed to exist, is unique, and has a very simple form. There are excellent books and articles about RKHS and their applications in statistics; however, this existing literature is very dense. This article provides a friendly reference for a reader approaching this subject for the first time. It begins with a simple problem, a system of linear equations, and then gives an intuitive motivation for reproducing kernels. Armed with the intuition gained from our first examples, we take the reader from vector spaces to Banach spaces and to RKHS. Finally, we present some statistical estimation problems that can be solved using the mathematical machinery discussed. After reading this tutorial, the reader will be ready to study more advanced texts and articles about the subject, such as those by Wahba or Gu. Online supplements are available for this article. Journal: The American Statistician Pages: 50-60 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.678196 File-URL: http://hdl.handle.net/10.1080/00031305.2012.678196 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
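The penalized-regression setting that the RKHS tutorial above builds toward can be previewed in a few lines of base R; this sketch is ours and simply fits a smoothing spline, the minimizer of a residual sum of squares plus a roughness penalty over an RKHS.

  # Smoothing spline: minimize sum((y - f(x))^2) + lambda * integral f''(t)^2 dt.
  set.seed(2)
  x <- sort(runif(100))
  y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
  fit <- smooth.spline(x, y)          # lambda chosen by generalized cross-validation
  plot(x, y, main = "Smoothing spline as penalized RKHS regression")
  lines(predict(fit, seq(0, 1, 0.01)), col = "blue", lwd = 2)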
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:50-60 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel F. Stone Author-X-Name-First: Daniel F. Author-X-Name-Last: Stone Title: Measurement Error and the Hot Hand Abstract: This article shows that the first autocorrelation of basketball shot results is a highly biased and inconsistent estimator of the first autocorrelation of the ex ante probabilities with which the shots are made. Shot result autocorrelation is close to zero even when shot probability autocorrelation is close to one. The bias is caused by what is equivalent to a severe measurement error problem. The results imply that the widespread belief among players and fans in the hot hand is not necessarily a cognitive fallacy. Journal: The American Statistician Pages: 61-66 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.676467 File-URL: http://hdl.handle.net/10.1080/00031305.2012.676467 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:61-66 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher Tong Author-X-Name-First: Christopher Author-X-Name-Last: Tong Title: Letter to the Editor Journal: The American Statistician Pages: 75-75 Issue: 1 Volume: 66 Year: 2012 Month: 2 X-DOI: 10.1080/00031305.2012.667900 File-URL: http://hdl.handle.net/10.1080/00031305.2012.667900 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:75-75 Template-Type: ReDIF-Article 1.0 Author-Name: John W. Seaman Author-X-Name-First: John W. Author-X-Name-Last: Seaman Author-Name: John W. Seaman Author-X-Name-First: John W. Author-X-Name-Last: Seaman Author-Name: James D. Stamey Author-X-Name-First: James D. Author-X-Name-Last: Stamey Title: Hidden Dangers of Specifying Noninformative Priors Abstract: “Noninformative” priors are widely used in Bayesian inference. Diffuse priors are often placed on parameters that are components of some function of interest. That function may, of course, have a prior distribution that is highly informative, in contrast to the joint prior placed on its arguments, resulting in unintended influence on the posterior for the function. This problem is not always recognized by users of “noninformative” priors. We consider several examples of this problem. We also suggest methods for handling such induced priors. Journal: The American Statistician Pages: 77-84 Issue: 2 Volume: 66 Year: 2012 Month: 5 X-DOI: 10.1080/00031305.2012.695938 File-URL: http://hdl.handle.net/10.1080/00031305.2012.695938 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:77-84 Template-Type: ReDIF-Article 1.0 Author-Name: Tanya P. Garcia Author-X-Name-First: Tanya P. Author-X-Name-Last: Garcia Author-Name: Priya Kohli Author-X-Name-First: Priya Author-X-Name-Last: Kohli Author-Name: Mohsen Pourahmadi Author-X-Name-First: Mohsen Author-X-Name-Last: Pourahmadi Title: Regressograms and Mean-Covariance Models for Incomplete Longitudinal Data Abstract: Longitudinal studies are prevalent in biological and social sciences where subjects are measured repeatedly over time. Modeling the correlations and handling missing data are among the most challenging problems in analyzing such data. There are various methods for handling missing data, but data-based and graphical methods for modeling the covariance matrix of longitudinal data are relatively new. 
We adopt an approach based on the modified Cholesky decomposition of the covariance matrix which handles both challenges. It amounts to formulating parametric models for the regression coefficients of the conditional mean and variance of each measurement given its predecessors. We demonstrate the roles of profile plots and regressograms in formulating joint mean-covariance models for incomplete longitudinal data. Applying these graphical tools to the Fruit Fly Mortality (FFM) data, which has 22% missing values, reveals a logistic curve for the mean function and two different models for the two factors of the modified Cholesky decomposition of the sample covariance matrix. An expectation-maximization algorithm is proposed for estimating the parameters of the mean-covariance models; it performs well for the FFM data and in a simulation study of incomplete longitudinal data. Journal: The American Statistician Pages: 85-91 Issue: 2 Volume: 66 Year: 2012 Month: 5 X-DOI: 10.1080/00031305.2012.695935 File-URL: http://hdl.handle.net/10.1080/00031305.2012.695935 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:85-91 Template-Type: ReDIF-Article 1.0 Author-Name: Shelley Hurwitz Author-X-Name-First: Shelley Author-X-Name-Last: Hurwitz Author-Name: John S. Gardenier Author-X-Name-First: John S. Author-X-Name-Last: Gardenier Title: Ethical Guidelines for Statistical Practice: The First 60 Years and Beyond Abstract: The Ethical Guidelines for Statistical Practice of the American Statistical Association (ASA) have evolved over a span of more than 60 years, going back to 1949. The Interim version of the Guidelines was published in 1980, the Trial version was published in 1983 and revised and formalized in 1989, the current version was approved by the Board of Directors and made available on the ASA's Web site in 1999, and ASA accreditation now requires statistical practitioners to agree to abide by them. The new century brings new ethical concerns for statisticians. As examples, bioethics is booming, climate science is newsworthy for both science and ethics, and issues of statistical integrity in research keep the U.S. Department of Health and Human Services Office of Research Integrity very busy. In this century, we see a rapid increase in the ability to collect massive amounts of data, with complex structure and a sometimes sensitive nature. With these unparalleled opportunities for statisticians comes an increased need for clear guidelines on professional ethics. The evolution of the Guidelines therefore needs to continue. In this article, we examine the long history of the ASA Ethical Guidelines for Statistical Practice, and discuss potential areas for revision to meet the needs of our expanding profession. Journal: The American Statistician Pages: 99-103 Issue: 2 Volume: 66 Year: 2012 Month: 5 X-DOI: 10.1080/00031305.2012.695959 File-URL: http://hdl.handle.net/10.1080/00031305.2012.695959 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:99-103 Template-Type: ReDIF-Article 1.0 Author-Name: Ruud Wetzels Author-X-Name-First: Ruud Author-X-Name-Last: Wetzels Author-Name: Raoul P. P. P. Grasman Author-X-Name-First: Raoul P. P. P.
Author-X-Name-Last: Grasman Author-Name: Eric-Jan Wagenmakers Author-X-Name-First: Eric-Jan Author-X-Name-Last: Wagenmakers Title: A Default Bayesian Hypothesis Test for ANOVA Designs Abstract: This article presents a Bayesian hypothesis test for analysis of variance (ANOVA) designs. The test is an application of standard Bayesian methods for variable selection in regression models. We illustrate the effect of various g-priors on the ANOVA hypothesis test. The Bayesian test for ANOVA designs is useful for empirical researchers and for students; both groups will get a more acute appreciation of Bayesian inference when they can apply it to practical statistical problems such as ANOVA. We illustrate the use of the test with two examples, and we provide R code that makes the test easy to use. Journal: The American Statistician Pages: 104-111 Issue: 2 Volume: 66 Year: 2012 Month: 5 X-DOI: 10.1080/00031305.2012.695956 File-URL: http://hdl.handle.net/10.1080/00031305.2012.695956 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:104-111 Template-Type: ReDIF-Article 1.0 Author-Name: Ananda Sen Author-X-Name-First: Ananda Author-X-Name-Last: Sen Title: On the Interrelation Between the Sample Mean and the Sample Variance Abstract: The linearity (or lack thereof) of association between sample mean and sample variance is explored in this note with the intent of providing new insights. Of particular interest is a well-known inequality involving the measures of skewness and kurtosis that is derived as a consequence of an identity involving the correlation between sample mean and sample variance. The nature of association between the two is explored further by means of the conditional expectation of the sample variance given the mean. We present several characterization results where the specific relationship of this conditional expectation and sample mean uniquely determines the parent population. The note is presented at a level accessible to graduate or upper-level undergraduate students with several illustrative examples included as teaching aid. Journal: The American Statistician Pages: 112-117 Issue: 2 Volume: 66 Year: 2012 Month: 5 X-DOI: 10.1080/00031305.2012.695960 File-URL: http://hdl.handle.net/10.1080/00031305.2012.695960 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:112-117 Template-Type: ReDIF-Article 1.0 Author-Name: Jay M. Ver Hoef Author-X-Name-First: Jay M. Author-X-Name-Last: Ver Hoef Title: Who Invented the Delta Method? Abstract: Many statisticians and other scientists use what is commonly called the “delta method.” However, few people know who proposed it. The earliest article was found in an obscure journal, and the author is rarely cited for his contribution. This article briefly reviews three modern versions of the delta method and how they are used. Then, some history on the author and the journal of the first known article on the delta method is given. The original author’s specific contribution is reproduced, along with a discussion on possible reasons that it has been overlooked. Journal: The American Statistician Pages: 124-127 Issue: 2 Volume: 66 Year: 2012 Month: 5 X-DOI: 10.1080/00031305.2012.687494 File-URL: http://hdl.handle.net/10.1080/00031305.2012.687494 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
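For readers who want the mechanics behind the delta method reviewed in the Ver Hoef article above, here is a small self-contained R check (ours); the example transformation g(t) = log(t) is an arbitrary choice.

  # First-order delta method: Var g(Xbar) ~ g'(mu)^2 * Var(Xbar).
  set.seed(3)
  mu <- 5; sigma <- 2; n <- 50
  delta_var <- (1 / mu)^2 * sigma^2 / n        # g(t) = log(t), so g'(mu) = 1/mu
  sim <- replicate(1e4, log(mean(rnorm(n, mu, sigma))))
  c(delta = delta_var, simulated = var(sim))   # the two should agree closely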
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:124-127 Template-Type: ReDIF-Article 1.0 Author-Name: Theodore G. Karrison Author-X-Name-First: Theodore G. Author-X-Name-Last: Karrison Author-Name: Mark J. Ratain Author-X-Name-First: Mark J. Author-X-Name-Last: Ratain Author-Name: Walter M. Stadler Author-X-Name-First: Walter M. Author-X-Name-Last: Stadler Author-Name: Gary L. Rosner Author-X-Name-First: Gary L. Author-X-Name-Last: Rosner Title: Estimation of Progression-Free Survival for All Treated Patients in the Randomized Discontinuation Trial Design Abstract: The randomized discontinuation trial (RDT) design is an enrichment-type design that has been used in a variety of diseases to evaluate the efficacy of new treatments. The RDT design seeks to select a more homogeneous group of patients, consisting of those who are more likely to show a treatment benefit if one exists. In oncology, the RDT design has been applied to evaluate the effects of cytostatic agents, that is, drugs that act primarily by slowing tumor growth rather than shrinking tumors. In the RDT design, all patients receive treatment during an initial, open-label run-in period of duration T. Patients with objective response (substantial tumor shrinkage) remain on therapy while those with early progressive disease are removed from the trial. Patients with stable disease (SD) are then randomized to either continue active treatment or switch to placebo. The main analysis compares outcomes, for example, progression-free survival (PFS), between the two randomized arms. As a secondary objective, investigators may seek to estimate PFS for all treated patients, measured from the time of entry into the study, by combining information from the run-in and post run-in periods. For t ≤ T, PFS is estimated by the observed proportion of patients who are progression-free among all patients enrolled. For t > T, the estimate can be expressed as S(t) = p_R S_R(t) + p_SD S_SD(t), where p_R is the estimated probability of response during the run-in period, p_SD is the estimated probability of SD, and S_R(t) and S_SD(t) are the Kaplan--Meier estimates of subsequent PFS in the responders and patients with SD randomized to continue treatment, respectively. In this article, we derive the variance of S(t), enabling the construction of confidence intervals for both S(t) and the median survival time. Simulation results indicate that the method provides accurate coverage rates. An interesting aspect of the design is that outcomes during the run-in phase have a negative multinomial distribution, something not frequently encountered in practice. Journal: The American Statistician Pages: 155-162 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.720900 File-URL: http://hdl.handle.net/10.1080/00031305.2012.720900 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:155-162 Template-Type: ReDIF-Article 1.0 Author-Name: Iliana Ignatova Author-X-Name-First: Iliana Author-X-Name-Last: Ignatova Author-Name: Roland C. Deutsch Author-X-Name-First: Roland C. Author-X-Name-Last: Deutsch Author-Name: Don Edwards Author-X-Name-First: Don Author-X-Name-Last: Edwards Title: Closed Sequential and Multistage Inference on Binary Responses With or Without Replacement Abstract: We consider closed sequential or multistage sampling, with or without replacement, from a lot of N items, where each item can be identified as defective (in error, tainted, etc.) or not.
The goal is to make inference on the proportion, π, of defectives in the lot, or equivalently on the number of defectives in the lot D = Nπ. It is shown that exact inference on π using closed (bounded) sequential or multistage procedures with general prespecified elimination boundaries is completely tractable and not at all inconvenient using modern statistical software. We give relevant theory and demonstrate functions for this purpose written in R (R Development Core Team 2005, available as online supplementary material). Applicability of the methodology is illustrated in three examples: (1) sharpening of Wald's (1947) sequential probability ratio test used in industrial acceptance sampling, (2) two-stage sampling for auditing Medicare or Medicaid health care providers, and (3) risk-limited sequential procedures for election audits. Journal: The American Statistician Pages: 163-172 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.722901 File-URL: http://hdl.handle.net/10.1080/00031305.2012.722901 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:163-172 Template-Type: ReDIF-Article 1.0 Author-Name: Mehmet Kocak Author-X-Name-First: Mehmet Author-X-Name-Last: Kocak Author-Name: Arzu Onar-Thomas Author-X-Name-First: Arzu Author-X-Name-Last: Onar-Thomas Title: A Simulation-Based Evaluation of the Asymptotic Power Formulas for Cox Models in Small Sample Cases Abstract: Cox proportional hazards (PH) models are commonly used in medical research to investigate the associations between covariates and time-to-event outcomes. It is frequently noted that with less than 10 events per covariate, these models produce spurious results and therefore should not be used. Statistical literature contains asymptotic power formulas for the Cox model which can be used to determine the number of events needed to detect an association. Here, we investigate via simulations the performance of these formulas in small sample settings for Cox models with one or two covariates. Our simulations indicate that when the number of events is small, the power estimate based on the asymptotic formula is often inflated. The discrepancy between the asymptotic and empirical power is larger for the dichotomous covariate especially in cases where allocation of sample size to its levels is unequal. When more than one covariate is included in the same model, the discrepancy between the asymptotic power and the empirical power is even larger, especially when a high positive correlation exists between the two covariates. Journal: The American Statistician Pages: 173-179 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.703873 File-URL: http://hdl.handle.net/10.1080/00031305.2012.703873 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:173-179 Template-Type: ReDIF-Article 1.0 Author-Name: Lingyun Zhang Author-X-Name-First: Lingyun Author-X-Name-Last: Zhang Author-Name: Xinzhong Xu Author-X-Name-First: Xinzhong Author-X-Name-Last: Xu Author-Name: Gemai Chen Author-X-Name-First: Gemai Author-X-Name-Last: Chen Title: The Exact Likelihood Ratio Test for Equality of Two Normal Populations Abstract: Testing the equality of two independent normal populations is a perfect case of the two-sample problem, yet it is not treated in the main text of any textbook or handbook. 
In this article, we derive the exact distribution of the likelihood ratio test and implement this test with an R function. This article has supplementary materials online. Journal: The American Statistician Pages: 180-184 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.707083 File-URL: http://hdl.handle.net/10.1080/00031305.2012.707083 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:180-184 Template-Type: ReDIF-Article 1.0 Author-Name: José R. Berrendero Author-X-Name-First: José R. Author-X-Name-Last: Berrendero Author-Name: Javier Cárcamo Author-X-Name-First: Javier Author-X-Name-Last: Cárcamo Title: The Tangent Classifier Abstract: Given a classifier, we describe a general method to construct a simple linear classification rule. This rule, called the tangent classifier, is obtained by computing the tangent hyperplane to the separation boundary of the groups (generated by the initial classifier) at a certain point. When applied to a quadratic region, the tangent classifier has a neat closed-form expression. We discuss various examples and the application of this new linear classifier in two situations under which standard rules may fail: when there is a fraction of outliers in the training sample and when the dimension of the data is large in comparison with the sample size. Journal: The American Statistician Pages: 185-194 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.710511 File-URL: http://hdl.handle.net/10.1080/00031305.2012.710511 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:185-194 Template-Type: ReDIF-Article 1.0 Author-Name: Michael Friendly Author-X-Name-First: Michael Author-X-Name-Last: Friendly Author-Name: Nicolas de Sainte Agathe Author-X-Name-First: Nicolas Author-X-Name-Last: de Sainte Agathe Title: André-Michel Guerry's Ordonnateur Statistique: The First Statistical Calculator? Abstract: A document retrieved from the archives of the Conservatoire National des Arts et Métiers (CNAM) in Paris sheds new light on the invention by André-Michel Guerry of a mechanical device for obtaining statistical summaries and for examining the relationship between different variables, well before general purpose statistical calculators and the idea of correlation had even been conceived. Guerry's ordonnateur statistique may arguably be considered as the first example of a mechanical device devoted to statistical calculations. This article describes what is now known about this machine and illustrates how Guerry probably used it in his program of statistique analytique to reason about the relationship of types of crimes to various potential causes or associations. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 195-200 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.714716 File-URL: http://hdl.handle.net/10.1080/00031305.2012.714716 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
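To make the geometric idea of the tangent classifier of Berrendero and Cárcamo above concrete, here is a hypothetical R sketch (ours, not the authors' implementation): given a quadratic boundary {x : x'Ax + b'x + c = 0} and a point x0 on it, the normal vector of the tangent hyperplane is the gradient of the quadratic at x0.

  tangent_rule <- function(A, b, x0) {
    w <- 2 * as.vector(A %*% x0) + b           # gradient of the quadratic form at x0
    k <- -sum(w * x0)                          # hyperplane passes through x0
    function(x) sign(sum(w * x) + k)           # linear rule: sign(w'x + k)
  }
  A <- diag(c(1, 2)); b <- c(0, 0)             # boundary: x1^2 + 2*x2^2 - 1 = 0
  x0 <- c(1, 0)                                # a point on that ellipse
  f <- tangent_rule(A, b, x0)
  f(c(2, 0)); f(c(0, 0))                       # +1 and -1: opposite sides of the tangent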
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:195-200 Template-Type: ReDIF-Article 1.0 Author-Name: Lawren Smithline Author-X-Name-First: Lawren Author-X-Name-Last: Smithline Title: Letter to the Editor Journal: The American Statistician Pages: 207-207 Issue: 3 Volume: 66 Year: 2012 Month: 8 X-DOI: 10.1080/00031305.2012.718996 File-URL: http://hdl.handle.net/10.1080/00031305.2012.718996 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:207-207 Template-Type: ReDIF-Article 1.0 Author-Name: Sarah Keogh Author-X-Name-First: Sarah Author-X-Name-Last: Keogh Author-Name: Donal O’Neill Author-X-Name-First: Donal Author-X-Name-Last: O’Neill Title: A Statistical Analysis of the Fairness of Alternative Handicapping Systems in Ten-Pin Bowling Abstract: Using data on approximately 1040 games of bowling, we examine the fairness of alternative handicapping systems in ten-pin bowling. The objective of a handicap system is to allow less-skilled bowlers to compete against more skilled opponents on a level playing field. We show that the current systems used in many leagues do not achieve this objective and we propose a new optimal system which equalizes the playing field across all potential match-ups. Journal: The American Statistician Pages: 209-213 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.726933 File-URL: http://hdl.handle.net/10.1080/00031305.2012.726933 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:209-213 Template-Type: ReDIF-Article 1.0 Author-Name: Steven G. From Author-X-Name-First: Steven G. Author-X-Name-Last: From Title: A Comparison of the Moment and Factorial Moment Bounds for Discrete Random Variables Abstract: In this note, we establish the superiority of the factorial moment bound over the moment bound for nonnegative integer-valued discrete random variables. This solves a problem given earlier. We use some results from approximation theory/numerical analysis to compare the bounds. Journal: The American Statistician Pages: 214-216 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.734769 File-URL: http://hdl.handle.net/10.1080/00031305.2012.734769 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:214-216 Template-Type: ReDIF-Article 1.0 Author-Name: Tommy Wright Author-X-Name-First: Tommy Author-X-Name-Last: Wright Title: The Equivalence of Neyman Optimum Allocation for Sampling and Equal Proportions for Apportioning the U.S. House of Representatives Abstract: We present a surprising though obvious result that seems to have been unnoticed until now. In particular, we demonstrate the equivalence of two well-known problems—the optimal allocation of the fixed overall sample size n among L strata under stratified random sampling and the optimal allocation of the H = 435 seats among the 50 states for apportionment of the U.S. House of Representatives following each decennial census. In spite of the strong similarity manifest in the statements of the two problems, they have not been linked and they have well-known but different solutions; one solution is not explicitly exact (Neyman allocation), and the other (equal proportions) is exact. We give explicit exact solutions for both and note that the solutions are equivalent.
In fact, we conclude by showing that both problems are special cases of a general problem. The result is significant for stratified random sampling in that it explicitly shows how to minimize sampling error when estimating a total T_Y while keeping the final overall sample size fixed at n; this is usually not the case in practice with Neyman allocation where the resulting final overall sample size might be near n + L after rounding. An example reveals that controlled rounding with Neyman allocation does not always lead to the optimum allocation, that is, an allocation that minimizes variance. Journal: The American Statistician Pages: 217-224 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.733679 File-URL: http://hdl.handle.net/10.1080/00031305.2012.733679 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:217-224 Template-Type: ReDIF-Article 1.0 Author-Name: Yves Tillé Author-X-Name-First: Yves Author-X-Name-Last: Tillé Author-Name: Matti Langel Author-X-Name-First: Matti Author-X-Name-Last: Langel Title: Histogram-Based Interpolation of the Lorenz Curve and Gini Index for Grouped Data Abstract: In grouped data, the estimation of the Lorenz curve without taking into account the within-class variability leads to an overestimation of the curve and an underestimation of the Gini index. We propose a new strictly convex estimator of the Lorenz curve derived from a linear interpolation-based approximation of the cumulative distribution function. Integrating the Lorenz curve, a correction can be derived for the Gini index that takes the intraclass variability into account. Journal: The American Statistician Pages: 225-231 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.734197 File-URL: http://hdl.handle.net/10.1080/00031305.2012.734197 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:225-231 Template-Type: ReDIF-Article 1.0 Author-Name: Liang Hong Author-X-Name-First: Liang Author-X-Name-Last: Hong Title: A Remark on the Alternative Expectation Formula Abstract: Students in their first course in probability will often see the expectation formula for nonnegative continuous random variables in terms of the survival function. The traditional approach for deriving this formula (using double integrals) is well-received by students. Some students tend to approach this using integration by parts, but often get stuck. Most standard textbooks do not elaborate on this alternative approach. We present a rigorous derivation here. We hope that students and instructors of the first course in probability will find this short note helpful. Journal: The American Statistician Pages: 232-233 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.726934 File-URL: http://hdl.handle.net/10.1080/00031305.2012.726934 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:232-233 Template-Type: ReDIF-Article 1.0 Author-Name: Stavros Kourouklis Author-X-Name-First: Stavros Author-X-Name-Last: Kourouklis Title: A New Estimator of the Variance Based on Minimizing Mean Squared Error Abstract: In 2005, Yatracos constructed the estimator S_2^2 = c_2 S^2, c_2 = (n + 2)(n − 1)[n(n + 1)]^(−1), of the variance, which has smaller mean squared error (MSE) than the unbiased estimator S^2.
In this work, the estimator S_1^2 = c_1 S^2, c_1 = n(n − 1)[n(n − 1) + 2]^(−1), is constructed and is shown to have the following properties: (a) it has smaller MSE than S_2^2, and (b) it cannot be improved in terms of MSE by an estimator of the form cS^2, c > 0. The method of construction is based on Stein’s classical idea brought forward in 1964, is very simple, and may be taught even in an undergraduate class. Also, all the estimators of the form cS^2, c > 0, with smaller MSE than S^2 as well as all those that have the property (b) are found. In contrast to S^2, the method of moments estimator is among the latter estimators. Journal: The American Statistician Pages: 234-236 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.735209 File-URL: http://hdl.handle.net/10.1080/00031305.2012.735209 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:234-236 Template-Type: ReDIF-Article 1.0 Author-Name: J. Kelly Cunningham Author-X-Name-First: J. Kelly Author-X-Name-Last: Cunningham Title: Should S Get More Press? Abstract: This note discusses a problem appropriate for a beginning mathematical statistics course. Four estimators of the standard deviation of a normal data source are compared using mean square error. Both the uniformly minimum variance unbiased estimator and the usual estimator are found to be inadmissible. Journal: The American Statistician Pages: 237-237 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.736915 File-URL: http://hdl.handle.net/10.1080/00031305.2012.736915 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:237-237 Template-Type: ReDIF-Article 1.0 Author-Name: Robert A. Oster Author-X-Name-First: Robert A. Author-X-Name-Last: Oster Title: Section Editor's Notes Journal: The American Statistician Pages: 238-238 Issue: 4 Volume: 66 Year: 2012 Month: 11 X-DOI: 10.1080/00031305.2012.743422 File-URL: http://hdl.handle.net/10.1080/00031305.2012.743422 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:238-238 Template-Type: ReDIF-Article 1.0 Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Author-Name: Christian P. Robert Author-X-Name-First: Christian P. Author-X-Name-Last: Robert Title: "Not Only Defended But Also Applied": The Perceived Absurdity of Bayesian Inference Abstract: The missionary zeal of many Bayesians of old has been matched, in the other direction, by an attitude among some theoreticians that Bayesian methods were absurd--not merely misguided but obviously wrong in principle. We consider several examples, beginning with Feller's classic text on probability theory and continuing with more recent cases such as the perceived Bayesian nature of the so-called doomsday argument. We analyze in this note the intellectual background behind various misconceptions about Bayesian statistics, without aiming at a complete historical coverage of the reasons for this dismissal. Journal: The American Statistician Pages: 1-5 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2013.760987 File-URL: http://hdl.handle.net/10.1080/00031305.2013.760987 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
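The MSE comparisons in the Kourouklis and Cunningham notes above are easy to verify by simulation; this R check (ours) uses normal data with n = 10, where the ordering of the three estimators shows up clearly.

  # Monte Carlo MSE of variance estimators of the form c * S^2 under N(0, 1) data.
  set.seed(4)
  n <- 10; sigma2 <- 1
  c1 <- n * (n - 1) / (n * (n - 1) + 2)        # Kourouklis's c_1
  c2 <- (n + 2) * (n - 1) / (n * (n + 1))      # Yatracos's c_2
  S2 <- replicate(1e5, var(rnorm(n)))          # the usual unbiased S^2
  mse <- function(est) mean((est - sigma2)^2)
  c(unbiased = mse(S2), yatracos = mse(c2 * S2), kourouklis = mse(c1 * S2))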
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:1-5 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen Stigler Author-X-Name-First: Stephen Author-X-Name-Last: Stigler Title: Comment: Bayesian Inference: The Rodney Dangerfield of Statistics? Journal: The American Statistician Pages: 6-7 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.747448 File-URL: http://hdl.handle.net/10.1080/00031305.2012.747448 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:6-7 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen E. Fienberg Author-X-Name-First: Stephen E. Author-X-Name-Last: Fienberg Title: Comment: Bayesian Ideas Reemerged in the 1950s Journal: The American Statistician Pages: 7-8 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.751881 File-URL: http://hdl.handle.net/10.1080/00031305.2012.751881 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:7-8 Template-Type: ReDIF-Article 1.0 Author-Name: Wesley O. Johnson Author-X-Name-First: Wesley O. Author-X-Name-Last: Johnson Title: Comment: Bayesian Statistics in the Twenty First Century Journal: The American Statistician Pages: 9-11 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.751880 File-URL: http://hdl.handle.net/10.1080/00031305.2012.751880 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:9-11 Template-Type: ReDIF-Article 1.0 Author-Name: Deborah G. Mayo Author-X-Name-First: Deborah G. Author-X-Name-Last: Mayo Title: Discussion: Bayesian Methods: Applied? Yes. Philosophical Defense? In Flux Journal: The American Statistician Pages: 11-15 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.752410 File-URL: http://hdl.handle.net/10.1080/00031305.2012.752410 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:11-15 Template-Type: ReDIF-Article 1.0 Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Author-Name: Christian P. Robert Author-X-Name-First: Christian P. Author-X-Name-Last: Robert Title: Rejoinder: The Anti-Bayesian Moment and Its Passing Journal: The American Statistician Pages: 16-17 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.752409 File-URL: http://hdl.handle.net/10.1080/00031305.2012.752409 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:16-17 Template-Type: ReDIF-Article 1.0 Author-Name: Sandy Zabell Author-X-Name-First: Sandy Author-X-Name-Last: Zabell Title: Paul Meier on Legal Consulting Abstract: In addition to his contributions to biostatistics and clinical trials, Paul Meier had a long-term interest in the legal applications of statistics. As part of this, he had extensive experience as a statistical consultant. Legal consulting can be a minefield, but as a result of his background, Paul had excellent advice to give to those starting out on how to function successfully in this environment. Journal: The American Statistician Pages: 18-21 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.742026 File-URL: http://hdl.handle.net/10.1080/00031305.2012.742026 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:18-21 Template-Type: ReDIF-Article 1.0 Author-Name: Rick Picard Author-X-Name-First: Rick Author-X-Name-Last: Picard Author-Name: Brian Williams Author-X-Name-First: Brian Author-X-Name-Last: Williams Title: Rare Event Estimation for Computer Models Abstract: Rare events for computer models are usually impossible to address via direct methods--the conceptually straightforward approach of making millions of "ordinary" code runs to generate an adequate number of rare events simply is not an option. In Bayesian applications, the common practice of sampling from posterior distributions is inefficient for rare event estimation when some parameters are important, and corresponding normalized estimates can be seriously biased for seemingly adequate sample sizes (e.g., N = 10^6). Rare event estimation based on adaptive importance sampling can improve computational efficiencies by orders of magnitude relative to ordinary simulation methods, greatly reducing the need for time-consuming code runs. Journal: The American Statistician Pages: 22-32 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.751879 File-URL: http://hdl.handle.net/10.1080/00031305.2012.751879 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:22-32 Template-Type: ReDIF-Article 1.0 Author-Name: Valeria Sambucini Author-X-Name-First: Valeria Author-X-Name-Last: Sambucini Title: On the Nature of the Stationary Point of a Quadratic Response Surface: A Bayesian Simulation-Based Approach Abstract: In response-surface methodology, when the data are fitted using a quadratic model, it is important to make inference about the eigenvalues of the matrix of pure and mixed second-order coefficients, since they contain information on the nature of the stationary point and the shape of the surface. In this article, we propose a Bayesian simulation-based approach to explore the behavior of the posterior distributions of these eigenvalues. Highest posterior density (HPD) intervals for the ordered eigenvalues are then computed and their empirical coverage probabilities are evaluated. A user-friendly software tool has been developed to get the kernel density plots of these simulated posterior distributions and to obtain the corresponding HPD intervals. It is provided online as supplementary materials to this article. Journal: The American Statistician Pages: 33-41 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.755366 File-URL: http://hdl.handle.net/10.1080/00031305.2012.755366 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:33-41 Template-Type: ReDIF-Article 1.0 Author-Name: Bruce Levin Author-X-Name-First: Bruce Author-X-Name-Last: Levin Author-Name: Cheng-Shiun Leu Author-X-Name-First: Cheng-Shiun Author-X-Name-Last: Leu Title: Note on an Identity Between Two Unbiased Variance Estimators for the Grand Mean in a Simple Random Effects Model Abstract: We demonstrate the algebraic equivalence of two unbiased variance estimators for the sample grand mean in a random sample of subjects from an infinite population where subjects provide repeated observations following a homoscedastic random effects model.
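Returning to the rare-event article of Picard and Williams above: the core importance-sampling idea can be demonstrated in a few lines of R. This toy example (ours, with an arbitrary Gaussian target and threshold) is not adaptive and is far simpler than their method, but it shows why reweighted draws from a shifted proposal succeed where direct simulation fails.

  # Estimating the rare probability P(X > q) for X ~ N(0, 1).
  set.seed(5)
  q <- 4.5; N <- 1e4
  naive <- mean(rnorm(N) > q)                  # almost always 0 at this sample size
  z <- rnorm(N, mean = q)                      # proposal shifted into the rare region
  w <- dnorm(z) / dnorm(z, mean = q)           # importance weights (density ratio)
  c(naive = naive, importance = mean((z > q) * w),
    exact = pnorm(q, lower.tail = FALSE))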
Journal: The American Statistician Pages: 42-43 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.752105 File-URL: http://hdl.handle.net/10.1080/00031305.2012.752105 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:42-43 Template-Type: ReDIF-Article 1.0 Author-Name: Roberto Behar Author-X-Name-First: Roberto Author-X-Name-Last: Behar Author-Name: Pere Grima Author-X-Name-First: Pere Author-X-Name-Last: Grima Author-Name: Lluís Marco-Almagro Author-X-Name-First: Lluís Author-X-Name-Last: Marco-Almagro Title: Twenty-Five Analogies for Explaining Statistical Concepts Abstract: The use of analogies is a resource that can be used for transmitting concepts and making classes more enjoyable. This article presents 25 analogies that we use in our introductory statistical courses for introducing concepts and clarifying possible doubts. We have found that these analogies draw students' attention and reinforce the ideas that we want to transmit. Journal: The American Statistician Pages: 44-48 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.752408 File-URL: http://hdl.handle.net/10.1080/00031305.2012.752408 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:44-48 Template-Type: ReDIF-Article 1.0 Author-Name: Subhash C. Bagui Author-X-Name-First: Subhash C. Author-X-Name-Last: Bagui Author-Name: Dulal K. Bhaumik Author-X-Name-First: Dulal K. Author-X-Name-Last: Bhaumik Author-Name: K. L. Mehra Author-X-Name-First: K. L. Author-X-Name-Last: Mehra Title: A Few Counter Examples Useful in Teaching Central Limit Theorems Abstract: In probability theory, central limit theorems (CLTs), broadly speaking, state that the distribution of the sum of a sequence of random variables (r.v.'s), suitably normalized, converges to a normal distribution as their number n increases indefinitely. However, the preceding convergence in distribution holds only under certain conditions, depending on the underlying probabilistic nature of this sequence of r.v.'s. If some of the assumed conditions are violated, the convergence may or may not hold, or if it does, this convergence may be to a nonnormal distribution. We shall illustrate this via a few counter examples. While teaching CLTs at an advanced level, counter examples can serve as useful tools for explaining the true nature of these CLTs and the consequences when some of the assumptions made are violated. Journal: The American Statistician Pages: 49-56 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.755361 File-URL: http://hdl.handle.net/10.1080/00031305.2012.755361 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:49-56 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen M. Stigler Author-X-Name-First: Stephen M. Author-X-Name-Last: Stigler Title: The Digital Approximation of the Binomial by the Poisson Abstract: An old source can lead to looking at the Poisson approximation to the binomial in a new light. Journal: The American Statistician Pages: 57-59 Issue: 1 Volume: 67 Year: 2013 Month: 2 X-DOI: 10.1080/00031305.2012.755473 File-URL: http://hdl.handle.net/10.1080/00031305.2012.755473 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
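One classic counterexample of the kind catalogued by Bagui, Bhaumik, and Mehra above can be demonstrated in R (our sketch): Cauchy data violate the finite-variance condition, and the sample mean of n standard Cauchy variables is again standard Cauchy for every n, so no normalization yields a normal limit.

  # Sample means of Cauchy data never approach normality.
  set.seed(6)
  means <- replicate(2000, mean(rcauchy(1000)))
  qqnorm(means, main = "Sample means of Cauchy data: heavy tails, no CLT")
  qqline(means)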
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:57-59 Template-Type: ReDIF-Article 1.0 Author-Name: Paul R. Rosenbaum Author-X-Name-First: Paul R. Author-X-Name-Last: Rosenbaum Author-Name: Jeffrey H. Silber Author-X-Name-First: Jeffrey H. Author-X-Name-Last: Silber Title: Using the Exterior Match to Compare Two Entwined Matched Control Groups Abstract: When comparing outcomes, such as survival, in two groups--say a focal group and a comparison group--a common question is whether an adjustment for certain baseline differences that separate these two groups actually matters for the difference in outcomes. Did the adjustment matter? If it did matter, to what quantitative extent did it matter? This question is quite distinct from whether the baseline variables predict the outcome: baseline variables may predict the outcome, yet explain no part of the difference in outcomes in two groups. The question is also distinct from whether a difference between the groups remains after adjustment: an adjustment may matter quite a bit, yet fail to explain a substantial part of the difference in outcomes, and, indeed, adjustment may increase the difference. Whether an adjustment for (x1, x2) matters over and above an adjustment for x1 alone can be addressed by comparing outcomes in two control groups formed from the comparison group, one matched to the focal group for x1 alone, the other matched to the focal group for (x1, x2). How do outcomes differ in these two matched control groups? If two control groups are each pair-matched to the same focal group, then the result is a set of matched triples, so controls in the two groups are implicitly matched to each other by virtue of being matched to the same person in the focal group. When the comparison group is vastly larger than the focal group and their distributions exhibit extensive overlap on (x1, x2), it may be possible to construct nonintersecting matched control groups, but quite often the comparison group is large enough to yield closely matched groups one at a time, but is not large enough to produce several nonintersecting matched control groups. How can one compare two matched control groups that are entwined, with some of the same controls in both groups? Two entwined control groups have a nonempty intersection: some of the same controls appear in both groups as duplicates. These duplicates may appear in the same matched triple, but more commonly they appear in different matched triples. This structure yields a new nonintersecting match that we call the exterior match. Properties of the exterior match are discussed. Our ongoing study of black-versus-white disparities in survival following breast cancer in Medicare motivated this work and is used to illustrate. Journal: The American Statistician Pages: 67-75 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.769914 File-URL: http://hdl.handle.net/10.1080/00031305.2013.769914 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:67-75 Template-Type: ReDIF-Article 1.0 Author-Name: Micha Mandel Author-X-Name-First: Micha Author-X-Name-Last: Mandel Title: Simulation-Based Confidence Intervals for Functions With Complicated Derivatives Abstract: In many scientific problems, the quantity of interest is a function of parameters that index the model, and confidence intervals are constructed by applying the delta method.
However, when the function of interest has complicated derivatives, this standard approach is unattractive and alternative algorithms are required. This article discusses a simple simulation-based algorithm for estimating the variance of a transformation, and demonstrates its simplicity and accuracy by applying it to several statistical problems. Journal: The American Statistician Pages: 76-81 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.783880 File-URL: http://hdl.handle.net/10.1080/00031305.2013.783880 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:76-81 Template-Type: ReDIF-Article 1.0 Author-Name: Anthony J. Webster Author-X-Name-First: Anthony J. Author-X-Name-Last: Webster Author-Name: Richard Kemp Author-X-Name-First: Richard Author-X-Name-Last: Kemp Title: Estimating Omissions From Searches Abstract: The mark-recapture method was devised by Petersen in 1896 to estimate the number of fish migrating into the Limfjord, and independently by Lincoln in 1930 to estimate waterfowl abundance. The technique can be applied to any search for a finite number of items by two or more people or agents, allowing the number of searched-for items to be estimated. This ubiquitous problem appears in fields from ecology and epidemiology, through to mathematics, social sciences, and computing. Here, we exactly calculate the moments of the hypergeometric distribution associated with this longstanding problem, confirming that widely used estimates conjectured in 1951 are often too small. Our Bayesian approach highlights how different search strategies will modify the estimates. The estimates are applied to several examples. For some published applications, substantial errors are found to result from using the Chapman or Lincoln--Petersen estimates. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 82-89 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.783881 File-URL: http://hdl.handle.net/10.1080/00031305.2013.783881 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:82-89 Template-Type: ReDIF-Article 1.0 Author-Name: Sergio Wechsler Author-X-Name-First: Sergio Author-X-Name-Last: Wechsler Author-Name: Rafael Izbicki Author-X-Name-First: Rafael Author-X-Name-Last: Izbicki Author-Name: Luís Gustavo Esteves Author-X-Name-First: Luís Gustavo Author-X-Name-Last: Esteves Title: A Bayesian Look at Nonidentifiability: A Simple Example Abstract: This article discusses the concept of identifiability in simple probability calculus. Emphasis is given to Bayesian solutions. In particular, we compare Bayes and maximum likelihood estimators. We advocate adoption of informative prior probabilities for the Bayesian operation in place of diffuse or reference priors. We also discuss the concept of identifying functions. Journal: The American Statistician Pages: 90-93 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.778787 File-URL: http://hdl.handle.net/10.1080/00031305.2013.778787 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:90-93 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen B. Vardeman Author-X-Name-First: Stephen B. Author-X-Name-Last: Vardeman Author-Name: Max D. Morris Author-X-Name-First: Max D. 
Author-X-Name-Last: Morris Title: Majority Voting by Independent Classifiers Can Increase Error Rates Abstract: The technique of "majority voting" of classifiers is used in machine learning with the aim of constructing a new combined classification rule that has better characteristics than any of a given set of rules. The "Condorcet Jury Theorem" is often cited, incorrectly, as support for a claim that this practice leads to an improved classifier (i.e., one with smaller error probabilities) when the given classifiers are sufficiently good and are uncorrelated. We specifically address the case of two-category classification, and argue that a correct claim can be made for independent (not just uncorrelated) classification errors (not the classifiers themselves), and offer an example demonstrating that the common claim is false. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 94-96 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.778788 File-URL: http://hdl.handle.net/10.1080/00031305.2013.778788 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:94-96 Template-Type: ReDIF-Article 1.0 Author-Name: Mithat Gönen Author-X-Name-First: Mithat Author-X-Name-Last: Gönen Title: Visualizing Longitudinal Data With Dropouts Abstract: This article proposes a triangle plot to display longitudinal data with dropouts. The triangle plot is a tool of data visualization that can also serve as a graphical check for informativeness of the dropout process. There are similarities between the lasagna plot and the triangle plot, but the explicit use of dropout time as an axis is an advantage of the triangle plot over the more commonly used graphical strategies for longitudinal data. It is possible to interpret the triangle plot as a trellis plot, which gives rise to several extensions such as the triangle histogram and the triangle boxplot. R code is available to streamline the use of the triangle plot in practice. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 97-103 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.785980 File-URL: http://hdl.handle.net/10.1080/00031305.2013.785980 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:97-103 Template-Type: ReDIF-Article 1.0 Author-Name: Robert A. Oster Author-X-Name-First: Robert A. Author-X-Name-Last: Oster Title: Section Editor's Notes Journal: The American Statistician Pages: 104-104 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.788307 File-URL: http://hdl.handle.net/10.1080/00031305.2013.788307 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:104-104 Template-Type: ReDIF-Article 1.0 Author-Name: Richard G. Lomax Author-X-Name-First: Richard G. Author-X-Name-Last: Lomax Title: Statistical Accuracy of iPad Applications: An Initial Examination Abstract: With the recent advent of the iPad, statistics-related applications (apps) have begun development. Given their newness, statistical accuracy is a concern. This study assessed the accuracy of the following iPad apps: Data Explorer, StatsMate, Statistics Visualizer, and TC-Stats. Early and recent versions of Excel were also included for comparative purposes. Accuracy was considered in two ways. 
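For the Vardeman and Morris note above, the favorable case of their corrected claim, genuinely independent classification errors, is easy to simulate and matches the Condorcet-style calculation. The sketch below shows only that favorable case; their counterexample to the common (incorrect) claim is not reproduced. The error rate 0.3 and three voters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p_err = 100_000, 3, 0.3  # cases, classifiers, per-classifier error rate

# Independent error indicators for each classifier on each case.
errors = rng.random((n, k)) < p_err

# The majority vote is wrong exactly when a majority of classifiers err.
vote_wrong = errors.sum(axis=1) > k // 2

# Exact error for k = 3 independent voters: P(2 err) + P(3 err).
exact = 3 * p_err**2 * (1 - p_err) + p_err**3

print(f"single classifier: {p_err:.3f}, majority vote (sim): "
      f"{vote_wrong.mean():.3f}, exact: {exact:.3f}")
```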
First, the National Institute of Standards and Technology Statistical Reference Datasets (StRD) were used to benchmark accuracy. Analyses included univariate summary statistics (means, standard deviations), analysis of variance (ANOVA; F statistics), and linear regression (regression coefficients, standard deviations). The log relative error was computed for each dataset (comparing the "certified" values from StRD against the apps' actual values). Second, Wilkinson's tests were conducted to assess app "pass" rates (rounding, scatterplot, univariate, regression, overall). The results suggest the following: (a) the most accurate app for summary statistics and for lower difficulty ANOVA datasets was StatsMate, (b) the most accurate app for average difficulty ANOVA datasets was Data Explorer, (c) no app was accurate for higher difficulty ANOVA datasets, (d) only Data Explorer could handle most regression models, and (e) Wilkinson pass rates for Data Explorer (79%) and StatsMate (58%) were highest. Overall, StatsMate compares favorably to early versions of Excel such as Excel 97, the two being similarly accurate. Much remains to be done to improve the statistical accuracy of these apps. Journal: The American Statistician Pages: 105-108 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2013.778789 File-URL: http://hdl.handle.net/10.1080/00031305.2013.778789 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:105-108 Template-Type: ReDIF-Article 1.0 Author-Name: Brian Caffo Author-X-Name-First: Brian Author-X-Name-Last: Caffo Author-Name: Carolyn Lauzon Author-X-Name-First: Carolyn Author-X-Name-Last: Lauzon Author-Name: Joachim Röhmel Author-X-Name-First: Joachim Author-X-Name-Last: Röhmel Title: Correction to "Easy Multiplicity Control in Equivalence Testing Using Two One-Sided Tests" Journal: The American Statistician Pages: 115-116 Issue: 2 Volume: 67 Year: 2013 Month: 5 X-DOI: 10.1080/00031305.2012.760487 File-URL: http://hdl.handle.net/10.1080/00031305.2012.760487 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:115-116 Template-Type: ReDIF-Article 1.0 Author-Name: Francesca Greselin Author-X-Name-First: Francesca Author-X-Name-Last: Greselin Author-Name: Antonio Punzo Author-X-Name-First: Antonio Author-X-Name-Last: Punzo Title: Closed Likelihood Ratio Testing Procedures to Assess Similarity of Covariance Matrices Abstract: In this article, we introduce a multiple testing procedure to assess a common covariance structure between k groups. The new test allows for a choice among eight different patterns arising from the three-term eigen decomposition of the group covariances. It is based on the closed testing principle and adopts local likelihood ratio (LR) tests. The approach reveals richer information about the underlying data structure than classical methods, the most common one being only based on homo/heteroscedasticity. At the same time, it provides a more parsimonious parameterization, whenever the constrained model is suitable to describe the real data. The new inferential methodology is then applied to some well-known datasets chosen from the multivariate literature.
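The log relative error used in Lomax's benchmarking above is a standard accuracy measure: roughly the number of significant digits on which a computed value agrees with the certified one. A minimal sketch; the two values below are made up:

```python
import numpy as np

def log_relative_error(computed, certified):
    """LRE: roughly the number of significant digits on which a computed
    value agrees with a certified reference value."""
    computed, certified = float(computed), float(certified)
    if computed == certified:
        return np.inf
    if certified != 0.0:
        return -np.log10(abs(computed - certified) / abs(certified))
    return -np.log10(abs(computed - certified))  # absolute version at zero

# Made-up example: an app returns 0.26941 against a certified 0.26942.
print(f"LRE = {log_relative_error(0.26941, 0.26942):.1f}")  # about 4.4 digits
```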
Finally, simulation results are presented to investigate its performance in different situations representing gradual departures from homoscedasticity and to evaluate the reliability of using the asymptotic χ² distribution to approximate the actual distribution of the local LR test statistics. Journal: The American Statistician Pages: 117-128 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.791643 File-URL: http://hdl.handle.net/10.1080/00031305.2013.791643 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:117-128 Template-Type: ReDIF-Article 1.0 Author-Name: Kevin Wright Author-X-Name-First: Kevin Author-X-Name-Last: Wright Title: Revisiting Immer's Barley Data Abstract: This article reexamines the famous barley data that are often used to demonstrate dot plots. Additional sources of supplemental data provide context for interpretation of the original data. Graphical and mixed-model analyses shed new light on the variability in the data and challenge previously held beliefs about the accuracy of the data. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 129-133 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.801783 File-URL: http://hdl.handle.net/10.1080/00031305.2013.801783 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:129-133 Template-Type: ReDIF-Article 1.0 Author-Name: Chong Zhang Author-X-Name-First: Chong Author-X-Name-Last: Zhang Author-Name: Yufeng Liu Author-X-Name-First: Yufeng Author-X-Name-Last: Liu Author-Name: Zhengxiao Wu Author-X-Name-First: Zhengxiao Author-X-Name-Last: Wu Title: On the Effect and Remedies of Shrinkage on Classification Probability Estimation Abstract: Shrinkage methods have been shown to be effective for classification problems. As a form of regularization, shrinkage through penalization helps to avoid overfitting and produces accurate classifiers for prediction, especially when the dimension is relatively high. Despite the benefit of shrinkage on classification accuracy of resulting classifiers, in this article, we demonstrate that shrinkage creates biases on classification probability estimation. In many cases, this bias can be large and consequently yield poor class probability estimation when the sample size is small or moderate. We offer some theoretical insights into the effect of shrinkage and provide remedies for better class probability estimation. Using penalized logistic regression and proximal support vector machines as examples, we demonstrate that our proposed refit method gives similar classification accuracy and remarkable improvements on probability estimation on several simulated and real data examples. Journal: The American Statistician Pages: 134-142 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.817356 File-URL: http://hdl.handle.net/10.1080/00031305.2013.817356 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:134-142 Template-Type: ReDIF-Article 1.0 Author-Name: Jingchen Hu Author-X-Name-First: Jingchen Author-X-Name-Last: Hu Author-Name: Robin Mitra Author-X-Name-First: Robin Author-X-Name-Last: Mitra Author-Name: Jerome Reiter Author-X-Name-First: Jerome Author-X-Name-Last: Reiter Title: Are Independent Parameter Draws Necessary for Multiple Imputation?
Abstract: In typical implementations of multiple imputation for missing data, analysts create m completed datasets based on approximately independent draws of imputation model parameters. We use theoretical arguments and simulations to show that, provided m is large, the use of independent draws is not necessary. In fact, appropriate use of dependent draws can improve precision relative to the use of independent draws. It also eliminates the sometimes difficult task of obtaining independent draws; for example, in fully Bayesian imputation models based on MCMC, analysts can avoid the search for a subsampling interval that ensures approximately independent draws for all parameters. We illustrate the use of dependent draws in multiple imputation with a study of the effect of breast feeding on children's later cognitive abilities. Journal: The American Statistician Pages: 143-149 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.821953 File-URL: http://hdl.handle.net/10.1080/00031305.2013.821953 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:143-149 Template-Type: ReDIF-Article 1.0 Author-Name: Richard J. Barker Author-X-Name-First: Richard J. Author-X-Name-Last: Barker Author-Name: William A. Link Author-X-Name-First: William A. Author-X-Name-Last: Link Title: Bayesian Multimodel Inference by RJMCMC: A Gibbs Sampling Approach Abstract: Bayesian multimodel inference treats a set of candidate models as the sample space of a latent categorical random variable, sampled once; the data at hand are modeled as having been generated according to the sampled model. Model selection and model averaging are based on the posterior probabilities for the model set. Reversible-jump Markov chain Monte Carlo (RJMCMC) extends ordinary MCMC methods to this meta-model. We describe a version of RJMCMC that intuitively represents the process as Gibbs sampling with alternating updates of a categorical variable M (for Model) and a "palette" of parameters, from which any of the model-specific parameters can be calculated. Our representation makes plain how model-specific Monte Carlo outputs (analytical or numerical) can be post-processed to compute model weights or Bayes factors. We illustrate the procedure with several examples. Journal: The American Statistician Pages: 150-156 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.791644 File-URL: http://hdl.handle.net/10.1080/00031305.2013.791644 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:150-156 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel A. Griffith Author-X-Name-First: Daniel A. Author-X-Name-Last: Griffith Title: Better Articulating Normal Curve Theory for Introductory Mathematical Statistics Students: Power Transformations and Their Back-Transformations Abstract: This article addresses a gap in many, if not all, introductory mathematical statistics textbooks, namely, transforming a random variable so that it better mimics a normal distribution. Virtually all such textbooks treat the subject of variable transformations, which furnishes a nice opportunity to introduce and study this transformation-to-normality topic, a topic students frequently encounter in subsequent applied statistics courses.
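Some context for the Hu, Mitra, and Reiter abstract above: once the m completed datasets have been analyzed, the per-dataset results are pooled with Rubin's combining rules, and the between-imputation variance B is where the dependence structure of the parameter draws enters. A minimal sketch with invented per-dataset estimates; the article's theoretical argument about dependent draws is not reproduced here:

```python
import numpy as np

# Invented point estimates q_l and variances u_l from m = 5 completed datasets.
q = np.array([2.10, 2.05, 2.20, 1.98, 2.12])
u = np.array([0.040, 0.038, 0.045, 0.041, 0.039])
m = len(q)

q_bar = q.mean()            # pooled point estimate
w = u.mean()                # within-imputation variance
b = q.var(ddof=1)           # between-imputation variance
t = w + (1 + 1 / m) * b     # Rubin's total variance

print(f"pooled estimate {q_bar:.3f}, total variance {t:.4f}")
```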
Accordingly, this article reviews variable power transformations of the Box--Cox type within the context of normal curve theory, and addresses their corresponding back-transformations. It presents four theorems and a conjecture that furnish the basics needed to derive equivalent results for all nonnegative values of the Box--Cox power transformation exponent. Results are illustrated with the exponential random variable. This article also includes selected pedagogic tools created with R code. Journal: The American Statistician Pages: 157-169 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.801782 File-URL: http://hdl.handle.net/10.1080/00031305.2013.801782 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:157-169 Template-Type: ReDIF-Article 1.0 Author-Name: Robert A. Oster Author-X-Name-First: Robert A. Author-X-Name-Last: Oster Title: Section Editor's Notes Journal: The American Statistician Pages: 170-170 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.822199 File-URL: http://hdl.handle.net/10.1080/00031305.2013.822199 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:170-170 Template-Type: ReDIF-Article 1.0 Author-Name: Yoonsang Kim Author-X-Name-First: Yoonsang Author-X-Name-Last: Kim Author-Name: Young-Ku Choi Author-X-Name-First: Young-Ku Author-X-Name-Last: Choi Author-Name: Sherry Emery Author-X-Name-First: Sherry Author-X-Name-Last: Emery Title: Logistic Regression With Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages Abstract: Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss--Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, those studies focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to antitobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. Journal: The American Statistician Pages: 171-182 Issue: 3 Volume: 67 Year: 2013 Month: 8 X-DOI: 10.1080/00031305.2013.817357 File-URL: http://hdl.handle.net/10.1080/00031305.2013.817357 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
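The Box--Cox transformation and back-transformation at the center of Griffith's article above are compact enough to state in code. A minimal sketch; the exponential data mirror the article's illustration, but the code and the choice of lambda are ours:

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform for y > 0: (y^lam - 1)/lam, or log y when lam = 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def box_cox_inverse(z, lam):
    """Back-transformation to the original scale."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

rng = np.random.default_rng(7)
y = rng.exponential(scale=2.0, size=5000)  # skewed data, as in the illustration

lam = 0.27                                 # an illustrative exponent
z = box_cox(y, lam)
y_back = box_cox_inverse(z, lam)
print(np.allclose(y, y_back))              # True: the back-transform recovers y
```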
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:171-182 Template-Type: ReDIF-Article 1.0 Author-Name: Kristian Lum Author-X-Name-First: Kristian Author-X-Name-Last: Lum Author-Name: Megan Emily Price Author-X-Name-First: Megan Emily Author-X-Name-Last: Price Author-Name: David Banks Author-X-Name-First: David Author-X-Name-Last: Banks Title: Applications of Multiple Systems Estimation in Human Rights Research Abstract: Multiple systems estimation (MSE) is becoming an increasingly common approach for exploratory study of underreported events in the field of quantitative human rights. In this context, it is used to estimate the number of people who died as a result of political unrest when it is believed that many of those who died or disappeared were never reported. MSE relies upon several assumptions, each of which may be slightly or significantly violated in particular applications. This article outlines the evolution of the application of MSE to human rights research through the use of three case studies: Guatemala, Peru, and Colombia. Each of these cases presents distinct challenges to the MSE method. Motivated by these applications, we describe new methodology for assessing the impact of violated assumptions in MSE. Our approach uses simulations to explore the cumulative magnitude of errors introduced by violation of the model assumptions at each stage in the analysis. Journal: The American Statistician Pages: 191-200 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.821093 File-URL: http://hdl.handle.net/10.1080/00031305.2013.821093 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:191-200 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen E. Fienberg Author-X-Name-First: Stephen E. Author-X-Name-Last: Fienberg Title: Comment: Innovations Associated with Multiple Systems Estimation in Human Rights Settings Journal: The American Statistician Pages: 201-202 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.855108 File-URL: http://hdl.handle.net/10.1080/00031305.2013.855108 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:201-202 Template-Type: ReDIF-Article 1.0 Author-Name: Joseph B. Kadane Author-X-Name-First: Joseph B. Author-X-Name-Last: Kadane Title: Comment Journal: The American Statistician Pages: 202-203 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.855106 File-URL: http://hdl.handle.net/10.1080/00031305.2013.855106 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:202-203 Template-Type: ReDIF-Article 1.0 Author-Name: Fritz Scheuren Author-X-Name-First: Fritz Author-X-Name-Last: Scheuren Title: Comment Journal: The American Statistician Pages: 203-205 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.852026 File-URL: http://hdl.handle.net/10.1080/00031305.2013.852026 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:203-205 Template-Type: ReDIF-Article 1.0 Author-Name: Kristian Lum Author-X-Name-First: Kristian Author-X-Name-Last: Lum Author-Name: Megan Emily Price Author-X-Name-First: Megan Emily Author-X-Name-Last: Price Author-Name: David Banks Author-X-Name-First: David Author-X-Name-Last: Banks Title: Rejoinder Journal: The American Statistician Pages: 205-206 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.855109 File-URL: http://hdl.handle.net/10.1080/00031305.2013.855109 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:205-206 Template-Type: ReDIF-Article 1.0 Author-Name: Paul Fogel Author-X-Name-First: Paul Author-X-Name-Last: Fogel Author-Name: Douglas M. Hawkins Author-X-Name-First: Douglas M. Author-X-Name-Last: Hawkins Author-Name: Chris Beecher Author-X-Name-First: Chris Author-X-Name-Last: Beecher Author-Name: George Luta Author-X-Name-First: George Author-X-Name-Last: Luta Author-Name: S. Stanley Young Author-X-Name-First: S. Stanley Author-X-Name-Last: Young Title: A Tale of Two Matrix Factorizations Abstract: In statistical practice, rectangular tables of numeric data are commonplace, and are often analyzed using dimension-reduction methods like the singular value decomposition and its close cousin, principal component analysis (PCA). This analysis produces score and loading matrices representing the rows and the columns of the original table and these matrices may be used for both prediction purposes and to gain structural understanding of the data. In some tables, the data entries are necessarily nonnegative (apart, perhaps, from some small random noise), and so the matrix factors meant to represent them should arguably also contain only nonnegative elements. This thinking, and the desire for parsimony, underlies such techniques as rotating factors in a search for "simple structure." These attempts to transform score or loading matrices of mixed sign into nonnegative, parsimonious forms are, however, indirect and at best imperfect. The recent development of nonnegative matrix factorization, or NMF, is an attractive alternative. Rather than attempt to transform a loading or score matrix of mixed signs into one with only nonnegative elements, it directly seeks matrix factors containing only nonnegative elements. The resulting factorization often leads to substantial improvements in interpretability of the factors. We illustrate this potential by synthetic examples and a real dataset. The question of exactly when NMF is effective is not fully resolved, but some indicators of its domain of success are given. It is pointed out that the NMF factors can be used in much the same way as those coming from PCA for such tasks as ordination, clustering, and prediction. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 207-218 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.845607 File-URL: http://hdl.handle.net/10.1080/00031305.2013.845607 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:207-218 Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas J. Horton Author-X-Name-First: Nicholas J. Author-X-Name-Last: Horton Title: I Hear, I Forget. 
I Do, I Understand: A Modified Moore-Method Mathematical Statistics Course Abstract: Moore introduced a method for graduate mathematics instruction that consisted primarily of individual student work on challenging proofs. Cohen described an adaptation with less explicit competition suitable for undergraduate students at a liberal arts college. This article details an adaptation of this modified Moore method to teach mathematical statistics, and describes ways that such an approach helps engage students and foster the teaching of statistics. Groups of students worked a set of three difficult problems (some theoretical, some applied) every two weeks. Class time was devoted to coaching sessions with the instructor, group meeting time, and class presentations. R was used to estimate solutions empirically, where analytic results were intractable, as well as to provide an environment to undertake simulation studies with the aim of deepening understanding and complementing analytic solutions. Each group presented comprehensive solutions to complement oral presentations. Development of parallel techniques for empirical and analytic problem solving was an explicit goal of the course, which also attempted to communicate ways that statistics can be used to tackle interesting problems. The group problem-solving component and use of technology allowed students to attempt much more challenging questions than they could otherwise solve. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 219-228 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.849207 File-URL: http://hdl.handle.net/10.1080/00031305.2013.849207 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:219-228 Template-Type: ReDIF-Article 1.0 Author-Name: Howard Gitlow Author-X-Name-First: Howard Author-X-Name-Last: Gitlow Author-Name: Hernan Awad Author-X-Name-First: Hernan Author-X-Name-Last: Awad Title: Intro Stats Students Need Both Confidence and Tolerance (Intervals) Abstract: Tolerance intervals are typically not taught in introductory statistics courses aimed at business, engineering, and science majors. This is regrettable, since students are likely to encounter practical problems that should be analyzed using tolerance intervals. Additionally, contrasting tolerance intervals against confidence intervals will improve students' understanding of confidence intervals, eliminating frequent confusions. In this article, we make the argument for teaching tolerance intervals in introductory statistics courses, and we offer suggestions about what to teach. Journal: The American Statistician Pages: 229-234 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.839482 File-URL: http://hdl.handle.net/10.1080/00031305.2013.839482 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:229-234 Template-Type: ReDIF-Article 1.0 Author-Name: Yeyi Zhu Author-X-Name-First: Yeyi Author-X-Name-Last: Zhu Author-Name: Ladia M. Hernandez Author-X-Name-First: Ladia M. Author-X-Name-Last: Hernandez Author-Name: Peter Mueller Author-X-Name-First: Peter Author-X-Name-Last: Mueller Author-Name: Yongquan Dong Author-X-Name-First: Yongquan Author-X-Name-Last: Dong Author-Name: Michele R. Forman Author-X-Name-First: Michele R. 
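A concrete version of Gitlow and Awad's contrast above: a confidence interval brackets a parameter (here, the mean), while a tolerance interval brackets a stated proportion of the population. The sketch below uses Howe's approximation for the two-sided normal tolerance factor, a common textbook choice; the article does not prescribe a particular formula, and the data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(50, 4, size=30)
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

# 95% confidence interval for the mean.
t = stats.t.ppf(0.975, n - 1)
ci = (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))

# 95%/95% two-sided tolerance interval via Howe's approximation:
# k = sqrt( (n-1) * (1 + 1/n) * z^2 / chi2_{alpha, n-1} ).
z = stats.norm.ppf((1 + 0.95) / 2)
chi2 = stats.chi2.ppf(1 - 0.95, n - 1)  # lower 5% point of chi-square
k = np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2)
ti = (xbar - k * s, xbar + k * s)

print(f"CI for mean: ({ci[0]:.1f}, {ci[1]:.1f}); "
      f"tolerance interval: ({ti[0]:.1f}, {ti[1]:.1f})")
```

At the same sample size and the same 95% level, the tolerance interval comes out several times wider than the confidence interval, which is exactly the confusion the authors want students to avoid.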
Author-X-Name-Last: Forman Title: Data Acquisition and Preprocessing in Studies on Humans: What is Not Taught in Statistics Classes? Abstract: The aim of this article is to address issues in research that may be missing from statistics classes and important for (bio-) statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge the gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this article and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study. Journal: The American Statistician Pages: 235-241 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.842498 File-URL: http://hdl.handle.net/10.1080/00031305.2013.842498 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:235-241 Template-Type: ReDIF-Article 1.0 Author-Name: Richard L. Warr Author-X-Name-First: Richard L. Author-X-Name-Last: Warr Author-Name: Roger A. Erich Author-X-Name-First: Roger A. Author-X-Name-Last: Erich Title: Should the Interquartile Range Divided by the Standard Deviation be Used to Assess Normality? Abstract: We discourage the use of a diagnostic for normality: the interquartile range divided by the standard deviation. This statistic has been suggested in several introductory statistics books as a method to assess normality. Through simulation, we explore the rate at which this statistic converges to its asymptotic normal distribution, and the actual size of tests based on the asymptotic distribution at several sample sizes. We show that there are nonnormal distributions from which this method cannot detect a difference. Additionally, we show the power of this test for normality is quite poor when compared with the Shapiro--Wilk test. Journal: The American Statistician Pages: 242-244 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.847385 File-URL: http://hdl.handle.net/10.1080/00031305.2013.847385 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:242-244 Template-Type: ReDIF-Article 1.0 Author-Name: Changyong Feng Author-X-Name-First: Changyong Author-X-Name-Last: Feng Author-Name: Hongyue Wang Author-X-Name-First: Hongyue Author-X-Name-Last: Wang Author-Name: Yu Han Author-X-Name-First: Yu Author-X-Name-Last: Han Author-Name: Yinglin Xia Author-X-Name-First: Yinglin Author-X-Name-Last: Xia Author-Name: Xin M. Tu Author-X-Name-First: Xin M. 
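The diagnostic that Warr and Erich discourage above compares the sample ratio IQR/s with its normal-theory value 2Φ^{-1}(0.75), about 1.349. A minimal sketch of the statistic itself on simulated normal data; the article's study of its null distribution and power is not reproduced:

```python
import numpy as np
from scipy import stats

# Under normality, IQR / sigma = 2 * Phi^{-1}(0.75), about 1.349.
target = 2 * stats.norm.ppf(0.75)

rng = np.random.default_rng(11)
x = rng.normal(size=200)
q75, q25 = np.percentile(x, [75, 25])
ratio = (q75 - q25) / x.std(ddof=1)

print(f"normal reference: {target:.3f}, sample IQR/s: {ratio:.3f}")
```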
Author-X-Name-Last: Tu Title: The Mean Value Theorem and Taylor's Expansion in Statistics Abstract: The mean value theorem and Taylor's expansion are powerful tools in statistics that are used to derive estimators from nonlinear estimating equations and to study the asymptotic properties of the resulting estimators. However, the mean value theorem for a vector-valued differentiable function does not exist. Our survey shows that this nonexistent theorem has been used for a long time in statistical literature to derive the asymptotic properties of estimators and is still being used. We review several frequently cited papers and monographs that have misused this "theorem" and discuss the flaws in these applications. We also offer methods to fix such errors. Journal: The American Statistician Pages: 245-248 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.844203 File-URL: http://hdl.handle.net/10.1080/00031305.2013.844203 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:245-248 Template-Type: ReDIF-Article 1.0 Author-Name: Sivan Aldor-Noiman Author-X-Name-First: Sivan Author-X-Name-Last: Aldor-Noiman Author-Name: Lawrence D. Brown Author-X-Name-First: Lawrence D. Author-X-Name-Last: Brown Author-Name: Andreas Buja Author-X-Name-First: Andreas Author-X-Name-Last: Buja Author-Name: Wolfgang Rolke Author-X-Name-First: Wolfgang Author-X-Name-Last: Rolke Author-Name: Robert A. Stine Author-X-Name-First: Robert A. Author-X-Name-Last: Stine Title: The Power to See: A New Graphical Test of Normality Abstract: Many statistical procedures assume that the underlying data-generating process involves Gaussian errors. Among the popular tests for normality, only the Kolmogorov--Smirnov test has a graphical representation. Alternative tests, such as the Shapiro--Wilk test, offer little insight as to how the observed data deviate from normality. In this article, we discuss a simple new graphical procedure which provides simultaneous confidence bands for a normal quantile--quantile plot. These bands define a test of normality and are narrower in the tails than those related to the Kolmogorov--Smirnov test. Correspondingly, the new procedure has greater power to detect deviations from normality in the tails. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 249-260 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.847865 File-URL: http://hdl.handle.net/10.1080/00031305.2013.847865 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:249-260 Template-Type: ReDIF-Article 1.0 Author-Name: Nancy L. Segal Author-X-Name-First: Nancy L. Author-X-Name-Last: Segal Author-Name: Jorge Torres Author-X-Name-First: Jorge Author-X-Name-Last: Torres Title: A Repeated Grammatical Error Does Not Make it Right Journal: The American Statistician Pages: 266-266 Issue: 4 Volume: 67 Year: 2013 Month: 11 X-DOI: 10.1080/00031305.2013.834269 File-URL: http://hdl.handle.net/10.1080/00031305.2013.834269 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:266-266 Template-Type: ReDIF-Article 1.0 Author-Name: Timothy W. Armistead Author-X-Name-First: Timothy W. 
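The usual repair for the nonexistent vector-valued mean value theorem that Feng and colleagues discuss above is either to apply the scalar theorem componentwise, with a possibly different intermediate point for each component, or to use the integral form of the expansion. The identity below is the standard textbook remedy, stated here for context; it is not necessarily the specific fix the article proposes:

```latex
% For continuously differentiable g, with J_g the Jacobian matrix,
% the fundamental theorem of calculus along the segment from a to b gives
g(b) - g(a) = \left( \int_0^1 J_g\bigl(a + t(b - a)\bigr)\, dt \right)(b - a).
```

This version holds exactly for vector-valued g and delivers the asymptotic expansions that the misused "theorem" was meant to provide.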
Author-X-Name-Last: Armistead Title: Resurrecting the Third Variable: A Critique of Pearl's Causal Analysis of Simpson's Paradox Abstract: Pearl argued that Simpson's Paradox would not be considered paradoxical but for statisticians' unwillingness to acknowledge the role of causality in resolving an instance of it. He proposed using a causal calculus to determine which set of contradictory findings in an instance of the paradox should be accepted: the aggregated data or the data disaggregated by conditioning on the third variable. Pearl used the example of a hypothetical quasi-experiment to argue that when third variables are not causal, one should not condition on them, and, assuming no other sources of confounding, the aggregated data should be accepted. Pearl was precipitate in his argument that it would be inappropriate to condition on the noncausal third variables in the example. Whether causal or not, third variables can convey critical information about a first-order relationship, study design, and previously unobserved variables. Any conditioning on a nontrivial third variable that produces Simpson's Paradox should be carefully examined before either the aggregated or the disaggregated findings are accepted, regardless of whether the third variable is thought to be causal. In some cases, neither set of data is trustworthy; in others, both convey information of value. Pearl's hypothetical example is used to illustrate this argument. Journal: The American Statistician Pages: 1-7 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.807750 File-URL: http://hdl.handle.net/10.1080/00031305.2013.807750 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:1-7 Template-Type: ReDIF-Article 1.0 Author-Name: Judea Pearl Author-X-Name-First: Judea Author-X-Name-Last: Pearl Title: Comment: Understanding Simpson's Paradox Journal: The American Statistician Pages: 8-13 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.876829 File-URL: http://hdl.handle.net/10.1080/00031305.2014.876829 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:8-13 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment Abstract: I discuss predicting outcomes and the roles of causation and sampling design. Journal: The American Statistician Pages: 13-17 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.876832 File-URL: http://hdl.handle.net/10.1080/00031305.2014.876832 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:13-17 Template-Type: ReDIF-Article 1.0 Author-Name: Keli Liu Author-X-Name-First: Keli Author-X-Name-Last: Liu Author-Name: Xiao-Li Meng Author-X-Name-First: Xiao-Li Author-X-Name-Last: Meng Title: Comment: A Fruitful Resolution to Simpson's Paradox via Multiresolution Inference Abstract: Simpson's Paradox is really a Simple Paradox, if one at all. Peeling away the paradox is as easy (or hard) as avoiding a comparison of apples and oranges, a concept requiring no mention of causality. We show how the commonly adopted notation has committed the gross-ery mistake of tagging unlike fruit with alike labels. Hence, the "fruitful" question to ask is not "Do we condition on the third variable?"
but rather "Are two fruits, which appear similar, actually similar at their core?" We introduce the concept of intrinsic similarity to escape this bind. The notion of "core" depends on how deep one looks; the multiresolution inference framework provides a natural way to define intrinsic similarity at the resolution appropriate for the treatment. To harvest the fruits of this insight, we will need to estimate intrinsic similarity, which often results in an indirect conditioning on the "third variable." A ripening estimation theory shows that the standard treatment comparisons, unconditional or conditional on the third variable, are low-hanging fruit but often rotten. We pose assumptions to pluck away higher-resolution (more conditional) comparisons; the multiresolution framework allows us to rigorously assess the price of these assumptions against the resulting yield. One such assessment gives us Simpson's Warning: less conditioning is most likely to lead to serious bias when Simpson's Paradox appears. Journal: The American Statistician Pages: 17-29 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.876842 File-URL: http://hdl.handle.net/10.1080/00031305.2014.876842 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:17-29 Template-Type: ReDIF-Article 1.0 Author-Name: Timothy Armistead Author-X-Name-First: Timothy Author-X-Name-Last: Armistead Title: Rejoinder Journal: The American Statistician Pages: 30-31 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.879772 File-URL: http://hdl.handle.net/10.1080/00031305.2014.879772 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:30-31 Template-Type: ReDIF-Article 1.0 Author-Name: Peng Ding Author-X-Name-First: Peng Author-X-Name-Last: Ding Title: Three Occurrences of the Hyperbolic-Secant Distribution Abstract: Although it is the generator distribution of the sixth natural exponential family with quadratic variance function, the Hyperbolic-Secant (HS) distribution is much less known than other distributions in the exponential families. Its lack of familiarity is due to its isolation from many widely used statistical models. We fill in the gap by showing three examples naturally generating the HS distribution, including Fisher's analysis of similarity between twins, the Jeffreys prior for contingency tables, and invalid instrumental variables. Journal: The American Statistician Pages: 32-35 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.867902 File-URL: http://hdl.handle.net/10.1080/00031305.2013.867902 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:32-35 Template-Type: ReDIF-Article 1.0 Author-Name: Edward J. Bedrick Author-X-Name-First: Edward J. Author-X-Name-Last: Bedrick Title: Two Useful Reformulations of the Hazard Ratio Abstract: The hazard ratio is a standard summary for comparing survival curves, yet hazard ratios are often difficult for scientists and clinicians to interpret. Insight into the interpretation of hazard ratios is obtained by relating hazard ratios to the maximum difference and an average difference between survival probabilities. These reformulations of the hazard ratio are useful in classroom discussions of survival analysis and when discussing analyses with scientists and clinicians.
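One route to the kind of reformulation Bedrick's abstract describes: under proportional hazards with ratio r, the survival functions satisfy S2(t) = S1(t)^r, so a hazard ratio maps directly into differences between survival probabilities. A minimal sketch; the maximum-difference summary computed here is a natural reading of the abstract, not necessarily the article's exact definition, and r = 2 is illustrative:

```python
import numpy as np

r = 2.0                               # hazard ratio (illustrative)
s1 = np.linspace(0.001, 0.999, 999)   # survival probabilities in group 1
s2 = s1**r                            # implied group-2 curve under PH

diff = s1 - s2
print(f"max survival difference: {diff.max():.3f} "
      f"at S1 = {s1[diff.argmax()]:.3f}")
```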
Large-sample distribution theory is provided for these reformulations of the hazard ratio. Two examples are used to illustrate the ideas. Journal: The American Statistician Pages: 36-41 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.868827 File-URL: http://hdl.handle.net/10.1080/00031305.2013.868827 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:36-41 Template-Type: ReDIF-Article 1.0 Author-Name: Geoffrey Jones Author-X-Name-First: Geoffrey Author-X-Name-Last: Jones Author-Name: Wesley O. Johnson Author-X-Name-First: Wesley O. Author-X-Name-Last: Johnson Title: Prior Elicitation: Interactive Spreadsheet Graphics With Sliders Can Be Fun, and Informative Abstract: There are several approaches to setting priors in Bayesian data analysis. Some attempt to minimize the impact of the prior on the posterior, allowing the data to "speak for themselves," or to provide Bayesian inferences that have good frequentist properties. In contrast, this note focuses on priors where scientific knowledge is used, possibly partially informative. There are many articles on the use of such subjective information. We focus on using standard software for eliciting priors from subject-matter specialists, in the form of models such as the binomial, Poisson, and normal. Our approach uses a common spreadsheet package with the facility to display dynamic pictures of prior distributions as the user toggles scroll bars or "sliders" that manipulate parameters of particular distributions. This allows interactive exploration of the shape of a probability distribution. We have found this a useful tool when eliciting priors for Bayesian data analysis. We present examples to illustrate the scope and flexibility of the method. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 42-51 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.868828 File-URL: http://hdl.handle.net/10.1080/00031305.2013.868828 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:42-51 Template-Type: ReDIF-Article 1.0 Author-Name: Lynn Roy LaMotte Author-X-Name-First: Lynn Roy Author-X-Name-Last: LaMotte Title: The Gram-Schmidt Construction as a Basis for Linear Models Abstract: The Gram-Schmidt construction, with a little extension, can be used to establish results in linear algebra, multiple regression analysis, and the theory of linear models. This article describes and illustrates how it serves to develop the basic results required for statistical inference in the Gauss--Markov model. For upper-level theory courses, the method's advantage is that it requires less background and fewer results in linear algebra than are usually required. For applications-oriented courses, it makes it possible to describe relations and computations simply and explicitly. Journal: The American Statistician Pages: 52-55 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.875485 File-URL: http://hdl.handle.net/10.1080/00031305.2013.875485 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:52-55 Template-Type: ReDIF-Article 1.0 Author-Name: A. J. Hayter Author-X-Name-First: A. J. 
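Jones and Johnson build their elicitation tool in a spreadsheet with scroll bars; the same interaction is easy to reproduce in other environments. A minimal sketch using matplotlib's Slider widget to redraw a Beta prior (a natural prior for a binomial proportion) as the parameters move; the widget layout and parameter ranges are arbitrary choices of ours, not the article's:

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.widgets import Slider
from scipy import stats

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)      # leave room for the sliders
p = np.linspace(0.001, 0.999, 400)
line, = ax.plot(p, stats.beta.pdf(p, 2, 2))
ax.set_xlabel("proportion")
ax.set_ylabel("prior density")

# One slider per Beta parameter.
s_a = Slider(fig.add_axes([0.15, 0.10, 0.7, 0.03]), "a", 0.5, 20, valinit=2)
s_b = Slider(fig.add_axes([0.15, 0.05, 0.7, 0.03]), "b", 0.5, 20, valinit=2)

def update(_):
    # Redraw the density whenever either slider moves.
    line.set_ydata(stats.beta.pdf(p, s_a.val, s_b.val))
    ax.relim()
    ax.autoscale_view()
    fig.canvas.draw_idle()

s_a.on_changed(update)
s_b.on_changed(update)
plt.show()
```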
Author-X-Name-Last: Hayter Title: Simultaneous Confidence Intervals for Several Quantiles of an Unknown Distribution Abstract: Given a sample of independent observations from an unknown continuous distribution, it is standard practice to construct a confidence interval for a specified quantile of the distribution using the binomial distribution. Furthermore, confidence bands for the unknown cumulative distribution function, such as Kolmogorov's, provide simultaneous confidence intervals for all quantiles of the distribution, which are necessarily wider than the individual confidence intervals at the same confidence level. The purpose of this article is to show how simultaneous confidence intervals for several specified quantiles of the unknown distribution can be calculated using probabilities from a multinomial distribution. An efficient recursive algorithm is described for these calculations. An experimenter may typically be interested in several quantiles of the distribution, such as the median, quartiles, and upper and lower tail quantiles, and this methodology provides a bridge between the confidence intervals with individual confidence levels and those that can be obtained from confidence bands. Some examples of the implementation of this nonparametric methodology are provided, and some comparisons are made with some parametric approaches to the problem. Journal: The American Statistician Pages: 56-62 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.869259 File-URL: http://hdl.handle.net/10.1080/00031305.2013.869259 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:56-62 Template-Type: ReDIF-Article 1.0 Author-Name: Nitis Mukhopadhyay Author-X-Name-First: Nitis Author-X-Name-Last: Mukhopadhyay Title: Letter to the Editor: Griffith, Daniel A. (2013), "Better Articulating Normal Curve for Introductory Mathematical Statistics Students: Power Transformations," The American Statistician, 67, 157-169 Journal: The American Statistician Pages: 67-67 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2013.867903 File-URL: http://hdl.handle.net/10.1080/00031305.2013.867903 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:67-67 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel A. Griffith Author-X-Name-First: Daniel A. Author-X-Name-Last: Griffith Title: Reply Journal: The American Statistician Pages: 67-69 Issue: 1 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.890005 File-URL: http://hdl.handle.net/10.1080/00031305.2014.890005 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:67-69 Template-Type: ReDIF-Article 1.0 Author-Name: David A. Harville Author-X-Name-First: David A. Author-X-Name-Last: Harville Title: The Need for More Emphasis on Prediction: A "Nondenominational" Model-Based Approach Abstract: Prediction problems are ubiquitous. In a model-based approach to predictive inference, the values of random variables that are presently observable are used to make inferences about the values of random variables that will become observable in the future, and the joint distribution of the random variables or various of its characteristics are assumed to be known up to the value of a vector of unknown parameters. 
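The single-quantile starting point of Hayter's article above is worth making concrete: for order statistics X(r) <= ... <= X(s) from a continuous distribution, the interval [X(r), X(s)] covers the p-th quantile with an exactly computable binomial probability. A minimal sketch with simulated data; the simultaneous multinomial calculation that is the article's contribution is not attempted here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = np.sort(rng.normal(size=100))
n, p = len(x), 0.5                # target: the median of an unknown distribution

# Interval [X_(r), X_(s)]; its coverage is P(r <= Bin(n, p) <= s - 1).
r, s = 40, 61                     # 1-based order-statistic indices
coverage = stats.binom.cdf(s - 1, n, p) - stats.binom.cdf(r - 1, n, p)

print(f"interval [{x[r - 1]:.3f}, {x[s - 1]:.3f}], "
      f"exact coverage {coverage:.4f}")
```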
Such an approach has proved to be highly effective in many important applications. This article argues that the performance of a prediction procedure in repeated application is important and should play a significant role in its evaluation. A "nondenominational" model-based approach to predictive inference is described and discussed; what in a Bayesian approach would be regarded as a prior distribution is simply regarded as part of a model that is hierarchical in nature. Some specifics are given for mixed-effects linear models, and an application to the prediction of the outcomes of basketball or football games (and to the ranking and rating of basketball or football teams) is included for purposes of illustration. Journal: The American Statistician Pages: 71-83 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2013.836987 File-URL: http://hdl.handle.net/10.1080/00031305.2013.836987 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:71-83 Template-Type: ReDIF-Article 1.0 Author-Name: Hal Stern Author-X-Name-First: Hal Author-X-Name-Last: Stern Title: Comment Journal: The American Statistician Pages: 83-84 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.897257 File-URL: http://hdl.handle.net/10.1080/00031305.2014.897257 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:83-84 Template-Type: ReDIF-Article 1.0 Author-Name: Dale L. Zimmerman Author-X-Name-First: Dale L. Author-X-Name-Last: Zimmerman Title: Comment Journal: The American Statistician Pages: 85-86 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.898973 File-URL: http://hdl.handle.net/10.1080/00031305.2014.898973 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:85-86 Template-Type: ReDIF-Article 1.0 Author-Name: Robert McCulloch Author-X-Name-First: Robert Author-X-Name-Last: McCulloch Title: Comment Journal: The American Statistician Pages: 87-88 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.904174 File-URL: http://hdl.handle.net/10.1080/00031305.2014.904174 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:87-88 Template-Type: ReDIF-Article 1.0 Author-Name: Donald A. Berry Author-X-Name-First: Donald A. Author-X-Name-Last: Berry Author-Name: Scott M. Berry Author-X-Name-First: Scott M. Author-X-Name-Last: Berry Title: Comment Journal: The American Statistician Pages: 88-89 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.911546 File-URL: http://hdl.handle.net/10.1080/00031305.2014.911546 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:88-89 Template-Type: ReDIF-Article 1.0 Author-Name: David A. Harville Author-X-Name-First: David A. Author-X-Name-Last: Harville Title: Rejoinder Journal: The American Statistician Pages: 89-92 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.904640 File-URL: http://hdl.handle.net/10.1080/00031305.2014.904640 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:89-92 Template-Type: ReDIF-Article 1.0 Author-Name: Woojoo Lee Author-X-Name-First: Woojoo Author-X-Name-Last: Lee Author-Name: Yudi Pawitan Author-X-Name-First: Yudi Author-X-Name-Last: Pawitan Title: Direct Calculation of the Variance of Maximum Penalized Likelihood Estimates via EM Algorithm Abstract: The variance of the maximum penalized likelihood estimate obtained through the EM algorithm has not been explored in detail. We provide a simple and intuitive new representation for the variance that can be computed from the EM algorithm directly. For pedagogical purposes, we illustrate the new formula with two examples where analytical solutions are possible. Journal: The American Statistician Pages: 93-97 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.899273 File-URL: http://hdl.handle.net/10.1080/00031305.2014.899273 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:93-97 Template-Type: ReDIF-Article 1.0 Author-Name: Reto Bürgin Author-X-Name-First: Reto Author-X-Name-Last: Bürgin Author-Name: Gilbert Ritschard Author-X-Name-First: Gilbert Author-X-Name-Last: Ritschard Title: A Decorated Parallel Coordinate Plot for Categorical Longitudinal Data Abstract: This article proposes a decorated parallel coordinate plot for longitudinal categorical data, featuring a jitter mechanism revealing the diversity of observed longitudinal patterns and allowing the tracking of each individual pattern, variable point and line widths reflecting weighted pattern frequencies, the rendering of simultaneous events, and different filter options for highlighting typical patterns. The proposed visual display has been developed for describing and exploring the order of event occurrences, but it can be equally applied to other types of longitudinal categorical data. Alongside the description of the principle of the plot, we demonstrate the scope of the plot with a real dataset. A second application and R code for the plot are available online as supplementary materials. Journal: The American Statistician Pages: 98-103 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.887591 File-URL: http://hdl.handle.net/10.1080/00031305.2014.887591 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:98-103 Template-Type: ReDIF-Article 1.0 Author-Name: Matthew W. Guerra Author-X-Name-First: Matthew W. Author-X-Name-Last: Guerra Author-Name: Justine Shults Author-X-Name-First: Justine Author-X-Name-Last: Shults Title: A Note on the Simulation of Overdispersed Random Variables With Specified Marginal Means and Product Correlations Abstract: We propose a straightforward approach for simulation of discrete random variables with overdispersion, specified marginal means, and product correlations. The method stems from results we prove for variables with first-order antedependence and linearity of the conditional expectations and is therefore appropriate to simulate variables with these properties. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 104-107 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.887592 File-URL: http://hdl.handle.net/10.1080/00031305.2014.887592 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:104-107 Template-Type: ReDIF-Article 1.0 Author-Name: Ulrike Grömping Author-X-Name-First: Ulrike Author-X-Name-Last: Grömping Title: Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs Abstract: Factorial experiments are widely used in industrial experimentation and other fields. Whenever a factorial experiment is not designed as a full factorial, but as a regular or nonregular fraction thereof, choice between competing designs and interpretation of experimental results should take into consideration how the experimental plan will confound experimental effects. This article proposes mosaic plots of low-order projections of factorial designs for visualizing confounding of low-order effects. Mosaic plots are particularly useful for design and analysis of orthogonal main effect plans. The R code for the creation of the plots in this article is available online in the supplementary material. Journal: The American Statistician Pages: 108-116 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.896829 File-URL: http://hdl.handle.net/10.1080/00031305.2014.896829 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:108-116 Template-Type: ReDIF-Article 1.0 Author-Name: Stuart Baker Author-X-Name-First: Stuart Author-X-Name-Last: Baker Author-Name: Jian-Lun Xu Author-X-Name-First: Jian-Lun Author-X-Name-Last: Xu Author-Name: Ping Hu Author-X-Name-First: Ping Author-X-Name-Last: Hu Author-Name: Peng Huang Author-X-Name-First: Peng Author-X-Name-Last: Huang Title: Vardeman, S. B. and Morris, M. D. (2013), "Majority Voting by Independent Classifiers can Increase Error Rates," The American Statistician, 67, 94-96: Comment by Baker, Xu, Hu, and Huang and Reply Journal: The American Statistician Pages: 125-126 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.882867 File-URL: http://hdl.handle.net/10.1080/00031305.2014.882867 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:125-126 Template-Type: ReDIF-Article 1.0 Author-Name: Bart Holland Author-X-Name-First: Bart Author-X-Name-Last: Holland Title: Segal, N. L., and Torres, J. (2013), "A Repeated Grammatical Error Does Not Make it Right," The American Statistician, 67, 266: Comment by Holland and Reply Journal: The American Statistician Pages: 127-127 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.887593 File-URL: http://hdl.handle.net/10.1080/00031305.2014.887593 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:127-127 Template-Type: ReDIF-Article 1.0 Author-Name: Nancy L. Segal Author-X-Name-First: Nancy L. Author-X-Name-Last: Segal Author-Name: Jorge Luis Torres Author-X-Name-First: Jorge Luis Author-X-Name-Last: Torres Title: Reply Journal: The American Statistician Pages: 127-128 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.890483 File-URL: http://hdl.handle.net/10.1080/00031305.2014.890483 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
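Mosaic plots of the kind Grömping recommends above are available beyond R. A minimal sketch with statsmodels, projecting a small made-up design onto two factors; the design, factor names, and code are ours, not the article's:

```python
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

# A made-up 8-run factorial fraction, projected onto factors A and B.
runs = pd.DataFrame({
    "A": ["-", "-", "-", "-", "+", "+", "+", "+"],
    "B": ["-", "-", "+", "+", "-", "-", "+", "+"],
})

# Tile areas are proportional to how often each (A, B) combination occurs,
# so unbalanced low-order projections show up immediately.
mosaic(runs, ["A", "B"])
plt.show()
```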
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:127-128 Template-Type: ReDIF-Article 1.0 Author-Name: Yefim Haim Michlin Author-X-Name-First: Yefim Haim Author-X-Name-Last: Michlin Author-Name: Ofer Shaham Author-X-Name-First: Ofer Author-X-Name-Last: Shaham Title: Ignatova, I., Deutsch, R. C., and Edwards, D. (2012), "Closed Sequential and Multistage Inference on Binary Responses With or Without Replacement," The American Statistician, 66, 163-172: Comment by Michlin and Shaham and Reply Journal: The American Statistician Pages: 128-128 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.897256 File-URL: http://hdl.handle.net/10.1080/00031305.2014.897256 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:128-128 Template-Type: ReDIF-Article 1.0 Author-Name: Lina Ignatova Author-X-Name-First: Lina Author-X-Name-Last: Ignatova Author-Name: Roland C. Deutsch Author-X-Name-First: Roland C. Author-X-Name-Last: Deutsch Author-Name: Don Edwards Author-X-Name-First: Don Author-X-Name-Last: Edwards Title: Reply Journal: The American Statistician Pages: 129-129 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.898969 File-URL: http://hdl.handle.net/10.1080/00031305.2014.898969 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:129-129 Template-Type: ReDIF-Article 1.0 Author-Name: Gul Inan Author-X-Name-First: Gul Author-X-Name-Last: Inan Author-Name: Ozlem Ilk-Dag Author-X-Name-First: Ozlem Author-X-Name-Last: Ilk-Dag Author-Name: Alexander de Leon Author-X-Name-First: Alexander Author-X-Name-Last: de Leon Title: Kim, Y., Choi, Y.-K., and Emery, S. (2013), "Logistic Regression With Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages," The American Statistician, 67, 171-182 Journal: The American Statistician Pages: 129-130 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.904251 File-URL: http://hdl.handle.net/10.1080/00031305.2014.904251 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:129-130 Template-Type: ReDIF-Article 1.0 Author-Name: Yoonsang Kim Author-X-Name-First: Yoonsang Author-X-Name-Last: Kim Author-Name: Sherry Emery Author-X-Name-First: Sherry Author-X-Name-Last: Emery Title: Reply Journal: The American Statistician Pages: 130-131 Issue: 2 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.904638 File-URL: http://hdl.handle.net/10.1080/00031305.2014.904638 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:130-131 Template-Type: ReDIF-Article 1.0 Author-Name: Mark F. Schilling Author-X-Name-First: Mark F. Author-X-Name-Last: Schilling Author-Name: Jimmy A. Doi Author-X-Name-First: Jimmy A. Author-X-Name-Last: Doi Title: A Coverage Probability Approach to Finding an Optimal Binomial Confidence Procedure Abstract: The problem of finding confidence intervals for the success parameter of a binomial experiment has a long history, and a myriad of procedures have been developed. Most exploit the duality between hypothesis testing and confidence regions and are typically based on large sample approximations. 
We instead employ a direct approach that attempts to determine the optimal coverage probability function a binomial confidence procedure can have from the exact underlying binomial distributions, which in turn defines the associated procedure. We show that a graphical perspective provides much insight into the problem. Both procedures whose coverage never falls below the declared confidence level and those that achieve that level only approximately are analyzed. We introduce the Length/Coverage Optimal method, a variant of Sterne's procedure that minimizes average length while maximizing coverage among all length-minimizing procedures, and show that it is superior in important ways to existing procedures. Journal: The American Statistician Pages: 133-145 Issue: 3 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.899274 File-URL: http://hdl.handle.net/10.1080/00031305.2014.899274 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:133-145 Template-Type: ReDIF-Article 1.0 Author-Name: Weiwen Miao Author-X-Name-First: Weiwen Author-X-Name-Last: Miao Author-Name: Joseph L. Gastwirth Author-X-Name-First: Joseph L. Author-X-Name-Last: Gastwirth Title: New Statistical Tests for Detecting Disparate Impact Arising From Two-Stage Selection Processes Abstract: Statistical evidence of a significant difference between the performance of a protected group and the majority on a preemployment exam is often critical when a court decides whether the exam has a disparate impact, that is, whether the exam has a disproportionate adverse impact on minority candidates. In many cases, the hiring or promotion process consists of two steps. Since disparate impact can occur at each step, parties submitting evidence may use statistical tests at each stage without accounting for a potential multiple comparisons problem. Because different courts have focused on data concerning either one or the other step or a composite of both, they have reached opposite conclusions when faced with similar data. After illustrating the issues, we recommend two two-step tests to alleviate the problem. The large sample properties of these tests are obtained. A simulation study shows that in most situations, the new tests have higher power than the ones in current use. Journal: The American Statistician Pages: 146-157 Issue: 3 Volume: 68 Year: 2014 Month: 4 X-DOI: 10.1080/00031305.2014.917054 File-URL: http://hdl.handle.net/10.1080/00031305.2014.917054 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:146-157 Template-Type: ReDIF-Article 1.0 Author-Name: Albert Vexler Author-X-Name-First: Albert Author-X-Name-Last: Vexler Author-Name: Wan-Min Tsai Author-X-Name-First: Wan-Min Author-X-Name-Last: Tsai Author-Name: Alan D. Hutson Author-X-Name-First: Alan D. Author-X-Name-Last: Hutson Title: A Simple Density-Based Empirical Likelihood Ratio Test for Independence Abstract: We develop a novel nonparametric likelihood ratio test for independence between two random variables using a technique that is free of the common constraints of defining a given set of specific dependence structures. Our methodology revolves around an exact density-based empirical likelihood ratio test statistic that approximates in a distribution-free fashion the corresponding most powerful parametric likelihood ratio test.
We demonstrate that the proposed test is very powerful in detecting general structures of dependence between two random variables, including nonlinear and/or random-effect dependence structures. An extensive Monte Carlo study confirms that the proposed test is superior to the classical nonparametric procedures across a variety of settings. The real-world applicability of the proposed test is illustrated using data from a study of biomarkers associated with myocardial infarction. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 158-169 Issue: 3 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.901922 File-URL: http://hdl.handle.net/10.1080/00031305.2014.901922 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:158-169 Template-Type: ReDIF-Article 1.0 Author-Name: Djilali Ait Aoudia Author-X-Name-First: Djilali Ait Author-X-Name-Last: Aoudia Author-Name: Éric Marchand Author-X-Name-First: Éric Author-X-Name-Last: Marchand Title: On a Simple Construction of a Bivariate Probability Function With a Common Marginal Abstract: We introduce a family of bivariate discrete distributions whose members are generated by a decreasing mass function p, and with margins given by p. Several properties and examples are obtained, including a family of seemingly novel bivariate Poisson distributions. Journal: The American Statistician Pages: 170-173 Issue: 3 Volume: 68 Year: 2014 Month: 2 X-DOI: 10.1080/00031305.2014.904250 File-URL: http://hdl.handle.net/10.1080/00031305.2014.904250 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:170-173 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher E. Marks Author-X-Name-First: Christopher E. Author-X-Name-Last: Marks Author-Name: Andrew G. Glen Author-X-Name-First: Andrew G. Author-X-Name-Last: Glen Author-Name: Matthew W. Robinson Author-X-Name-First: Matthew W. Author-X-Name-Last: Robinson Author-Name: Lawrence M. Leemis Author-X-Name-First: Lawrence M. Author-X-Name-Last: Leemis Title: Applying Bootstrap Methods to System Reliability Abstract: We present a fully enumerated bootstrap method to find the empirical system lifetime distribution for a coherent system modeled by a reliability block diagram. Given failure data for individual components of a coherent system, the bootstrap empirical system lifetime distribution derived here will be free of resampling error. We further derive distribution-free expressions for the bias associated with the bootstrap method for estimating the mean system lifetimes of parallel and series systems with statistically identical components. We show that bootstrapping underestimates the mean system lifetime for parallel systems and overestimates the mean system lifetime for series systems, although both bootstrap estimates are asymptotically unbiased. The expressions for the bias are evaluated for several popular parametric lifetime distributions. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 174-182 Issue: 3 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.928232 File-URL: http://hdl.handle.net/10.1080/00031305.2014.928232 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
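For the simplest coherent system, the fully enumerated bootstrap described by Marks, Glen, Robinson, and Leemis can be written out directly. A minimal R sketch (a two-component parallel system with iid components is assumed; this is not the authors' supplementary code): under the empirical distribution, the bootstrap mean system lifetime is the exact average of max(x_i, x_j) over all n^2 resampled pairs, so no resampling error enters.
    set.seed(1)
    n <- 10
    x <- rexp(n)                          # observed component lifetimes, Exp(1)
    boot_mean <- mean(outer(x, x, pmax))  # enumerate all n^2 pairs; parallel lifetime = max
    boot_mean                             # true E[max(T1, T2)] = 1.5; averaged over samples x,
                                          # the bootstrap value falls short of 1.5, the downward
                                          # bias for parallel systems noted in the abstract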
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:174-182 Template-Type: ReDIF-Article 1.0 Author-Name: Kai Zhang Author-X-Name-First: Kai Author-X-Name-Last: Zhang Author-Name: Lawrence D. Brown Author-X-Name-First: Lawrence D. Author-X-Name-Last: Brown Author-Name: Edward George Author-X-Name-First: Edward Author-X-Name-Last: George Author-Name: Linda Zhao Author-X-Name-First: Linda Author-X-Name-Last: Zhao Title: Uniform Correlation Mixture of Bivariate Normal Distributions and Hypercubically Contoured Densities That Are Marginally Normal Abstract: The bivariate normal density with unit variance and correlation ρ is well known. We show that by integrating out ρ, the result is a function of the maximum norm. The Bayesian interpretation of this result is that if we put a uniform prior over ρ, then the marginal bivariate density depends only on the maximal magnitude of the variables. The square-shaped isodensity contour of this resulting marginal bivariate density can also be regarded as the equally weighted mixture of bivariate normal distributions over all possible correlation coefficients. This density links to the Khintchine mixture method of generating random variables. We use this method to construct the higher dimensional generalizations of this distribution. We further show that for each dimension, there is a unique multivariate density that is a differentiable function of the maximum norm and is marginally normal, and the bivariate density from the integral over ρ is its special case in two dimensions. Journal: The American Statistician Pages: 183-187 Issue: 3 Volume: 68 Year: 2014 Month: 3 X-DOI: 10.1080/00031305.2014.909741 File-URL: http://hdl.handle.net/10.1080/00031305.2014.909741 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:183-187 Template-Type: ReDIF-Article 1.0 Author-Name: Liang Hong Author-X-Name-First: Liang Author-X-Name-Last: Hong Title: Two New Elementary Derivations of Geometric Expectation Abstract: This article presents two new elementary derivations of the expectation of the geometric distribution. I also review six existing approaches. I hope that this article will benefit instructors and students in an introductory probability course. Journal: The American Statistician Pages: 188-190 Issue: 3 Volume: 68 Year: 2014 Month: 3 X-DOI: 10.1080/00031305.2014.915234 File-URL: http://hdl.handle.net/10.1080/00031305.2014.915234 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:188-190 Template-Type: ReDIF-Article 1.0 Author-Name: Peter H. Westfall Author-X-Name-First: Peter H. Author-X-Name-Last: Westfall Title: Kurtosis as Peakedness, 1905-2014. R.I.P. Abstract: The incorrect notion that kurtosis somehow measures "peakedness" (flatness, pointiness, or modality) of a distribution is remarkably persistent, despite attempts by statisticians to set the record straight. This article puts the notion to rest once and for all. Kurtosis tells you virtually nothing about the shape of the peak; its only unambiguous interpretation is in terms of tail extremity, that is, either existing outliers (for the sample kurtosis) or propensity to produce outliers (for the kurtosis of a probability distribution). To clarify this point, relevant literature is reviewed, counterexample distributions are given, and it is shown that the proportion of the kurtosis that is determined by the central μ ± σ range is usually quite small.
Journal: The American Statistician Pages: 191-195 Issue: 3 Volume: 68 Year: 2014 Month: 4 X-DOI: 10.1080/00031305.2014.917055 File-URL: http://hdl.handle.net/10.1080/00031305.2014.917055 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:191-195 Template-Type: ReDIF-Article 1.0 Author-Name: Catherine Michalopoulou Author-X-Name-First: Catherine Author-X-Name-Last: Michalopoulou Title: A Unique Collaboration: Prominent Statisticians' Survey Work in Greece in 1946 Abstract: In 1946, Neyman, Jessen, Deming, Kempthorne, Daly, and Blythe conducted a series of sample surveys as sampling experts of the two Allied Missions that were set up to observe the preparation and conduct of the Greek parliamentary elections (March 31) and the revision of electoral rolls for the plebiscite (September 1). This article revisits these surveys, using both published and unpublished sources, and discusses the lessons learned from their history as they relate to current sampling practices. Journal: The American Statistician Pages: 196-203 Issue: 3 Volume: 68 Year: 2014 Month: 3 X-DOI: 10.1080/00031305.2014.920276 File-URL: http://hdl.handle.net/10.1080/00031305.2014.920276 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:196-203 Template-Type: ReDIF-Article 1.0 Author-Name: Robert A. Oster Author-X-Name-First: Robert A. Author-X-Name-Last: Oster Title: Section Editor's Notes Journal: The American Statistician Pages: 204-204 Issue: 3 Volume: 68 Year: 2014 Month: 7 X-DOI: 10.1080/00031305.2014.928560 File-URL: http://hdl.handle.net/10.1080/00031305.2014.928560 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:204-204 Template-Type: ReDIF-Article 1.0 Author-Name: Sara Fontdecaba Author-X-Name-First: Sara Author-X-Name-Last: Fontdecaba Author-Name: Pere Grima Author-X-Name-First: Pere Author-X-Name-Last: Grima Author-Name: Xavier Tort-Martorell Author-X-Name-First: Xavier Author-X-Name-Last: Tort-Martorell Title: Analyzing DOE With Statistical Software Packages: Controversies and Proposals Abstract: This article studies and evaluates how five well-known statistical packages (JMP, Minitab, SigmaXL, Statgraphics, and Statistica) address the problem of analyzing the significance of effects in unreplicated factorial designs. All five use different methods and criteria that deliver different results, even for simple textbook examples. The article shows that some of the methods used are fundamentally flawed and deliver incorrect results. Finally, it raises the question of how much this may hinder the use of design of experiments (DOE) by nonexpert practitioners, and it provides suggestions for making this analysis more effective and easier to understand. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 205-211 Issue: 3 Volume: 68 Year: 2014 Month: 5 X-DOI: 10.1080/00031305.2014.923784 File-URL: http://hdl.handle.net/10.1080/00031305.2014.923784 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
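The abstract does not reproduce the packages' algorithms, but one widely used benchmark for judging effects in an unreplicated two-level design, Lenth's pseudo standard error, is short enough to sketch in R (the effect estimates below are hypothetical, and this is not any particular package's implementation):
    lenth_pse <- function(effects) {
      s0 <- 1.5 * median(abs(effects))                     # initial robust scale estimate
      1.5 * median(abs(effects)[abs(effects) < 2.5 * s0])  # re-estimate after trimming large effects
    }
    eff <- c(21.6, 9.9, 1.3, 1.1, 0.8, -0.6, -0.3)         # hypothetical effects from a 2^3 design
    me <- qt(0.975, df = length(eff) / 3) * lenth_pse(eff) # Lenth's margin of error
    abs(eff) > me                                          # flags the two large effects as active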
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:205-211 Template-Type: ReDIF-Article 1.0 Author-Name: Liang Hong Author-X-Name-First: Liang Author-X-Name-Last: Hong Title: Letter to the Editor Journal: The American Statistician Pages: 220-220 Issue: 3 Volume: 68 Year: 2014 Month: 7 X-DOI: 10.1080/00031305.2014.908790 File-URL: http://hdl.handle.net/10.1080/00031305.2014.908790 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:220-220 Template-Type: ReDIF-Article 1.0 Author-Name: Changyong Feng Author-X-Name-First: Changyong Author-X-Name-Last: Feng Author-Name: Hongyue Wang Author-X-Name-First: Hongyue Author-X-Name-Last: Wang Author-Name: Yu Han Author-X-Name-First: Yu Author-X-Name-Last: Han Author-Name: Yinglin Xia Author-X-Name-First: Yinglin Author-X-Name-Last: Xia Author-Name: Xin M. Tu Author-X-Name-First: Xin M. Author-X-Name-Last: Tu Title: Reply Journal: The American Statistician Pages: 220a-220a Issue: 3 Volume: 68 Year: 2014 Month: 7 X-DOI: 10.1080/00031305.2014.916929 File-URL: http://hdl.handle.net/10.1080/00031305.2014.916929 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:220a-220a Template-Type: ReDIF-Article 1.0 Author-Name: Robert F. Bordley Author-X-Name-First: Robert F. Author-X-Name-Last: Bordley Title: Reference Class Forecasting: Resolving Its Challenge to Statistical Modeling Abstract: Statisticians generally consider statistical modeling superior (or at least a useful supplement) to experience-based intuition for estimating the outputs of a complex system. But recent psychological research has led to an enhancement of experience-based intuition known as reference class forecasting. The reference class forecasting approach has been championed as a superior alternative to statistical modeling and is already well-regarded in the planning community. This presents a challenge to statistical modeling. To address this challenge, this article uses a Bayesian approach for combining the reference class forecast and the model-based forecast. The Bayesian prior is informed by the reference class information. A likelihood function was constructed to reflect the model's information. This approach was used to estimate healthcare costs under a voluntary employee benefit association (VEBA). The resulting Bayesian posterior forecast had lower variance (and lower forecast error) than either the model-based forecast or the reference-class forecast. Journal: The American Statistician Pages: 221-229 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.937544 File-URL: http://hdl.handle.net/10.1080/00031305.2014.937544 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:221-229 Template-Type: ReDIF-Article 1.0 Author-Name: Jennifer L. Kirk Author-X-Name-First: Jennifer L. Author-X-Name-Last: Kirk Author-Name: Michael P. Fay Author-X-Name-First: Michael P. Author-X-Name-Last: Fay Title: An Introduction to Practical Sequential Inferences via Single-Arm Binary Response Studies Using the binseqtest R Package Abstract: We review sequential designs, including group sequential and two-stage designs, for testing or estimating a single binary parameter. We use this simple case to introduce ideas common to many sequential designs, which in this case can be explained without explicitly using stochastic processes. 
We focus on methods provided by our newly developed R package, binseqtest, which exactly bound the Type I error rate of tests and exactly maintain proper coverage of confidence intervals. Within this framework, we review some allowable practical adaptations of the sequential design. We explore issues such as the following: How should the design be modified if no assessment was made at one of the planned sequential stopping times? How should the parameter be estimated if the study needs to be stopped early? What reasons for stopping early are allowed? How should inferences be made when the study is stopped for crossing the boundary, but later information is collected about responses of subjects that had enrolled before the decision to stop but had not responded by that time? Answers to these questions are demonstrated using basic methods that are available in our binseqtest R package. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 230-242 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.951126 File-URL: http://hdl.handle.net/10.1080/00031305.2014.951126 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:230-242 Template-Type: ReDIF-Article 1.0 Author-Name: Jun Yan Author-X-Name-First: Jun Author-X-Name-Last: Yan Author-Name: Chao Guo Author-X-Name-First: Chao Author-X-Name-Last: Guo Author-Name: Laurie E. Paarlberg Author-X-Name-First: Laurie E. Author-X-Name-Last: Paarlberg Title: Are Nonprofit Antipoverty Organizations Located Where They Are Needed? A Spatial Analysis of the Greater Hartford Region Abstract: The geographic distribution of nonprofit antipoverty organizations has important implications for economic development, social services, public health, and policy efforts. With counts of antipoverty nonprofits at the census tract level in Greater Hartford, Connecticut, we use a spatial zero-inflated Poisson model to examine whether these organizations are located in areas with high levels of poverty. Covariates that measure need, resources, urban structure, and demographic characteristics are incorporated into both the zero-inflation component and the Poisson component of the model. Variation not explained by the covariates is captured by the combination of a spatial random effect and an unstructured random effect. Statistical inferences are done within the Bayesian framework. Model comparison with the conditional predictive ordinate suggests that the random effects and the zero-inflation are both important components in fitting the data. All three need measures (proportion of people below the poverty line, unemployment rate, and rental occupancy) are found to have a significantly positive effect on the mean of the count, providing evidence that antipoverty nonprofits tend to locate where they are needed. The dataset and R/OpenBUGS code are available in supplementary materials online. Journal: The American Statistician Pages: 243-252 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.955211 File-URL: http://hdl.handle.net/10.1080/00031305.2014.955211 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:243-252 Template-Type: ReDIF-Article 1.0 Author-Name: Fan Yang Author-X-Name-First: Fan Author-X-Name-Last: Yang Author-Name: José R. Zubizarreta Author-X-Name-First: José R. Author-X-Name-Last: Zubizarreta Author-Name: Dylan S.
Small Author-X-Name-First: Dylan S. Author-X-Name-Last: Small Author-Name: Scott Lorch Author-X-Name-First: Scott Author-X-Name-Last: Lorch Author-Name: Paul R. Rosenbaum Author-X-Name-First: Paul R. Author-X-Name-Last: Rosenbaum Title: Dissonant Conclusions When Testing the Validity of an Instrumental Variable Abstract: An instrument or instrumental variable is often used in an effort to avoid selection bias in inference about the effects of treatments when treatment choice is based on thoughtful deliberation. Instruments are increasingly used in health outcomes research. An instrument is a haphazard push to accept one treatment or another, where the push can affect outcomes only to the extent that it alters the treatment received. There are two key assumptions here: (R) the push is haphazard or essentially random once adjustments have been made for observed covariates, and (E) the push affects outcomes only by altering the treatment, the so-called "exclusion restriction." These assumptions are often said to be untestable; however, that is untrue if testable means checking the compatibility of assumptions with other things we think we know. A test of this sort may result in a collection of claims that are individually plausible but mutually inconsistent, without clear indication as to which claim is culpable for the inconsistency. We discuss this subject in the context of our ongoing study of the effects of delivery by cesarean section on the survival of extremely premature infants of 23-24 weeks gestational age. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 253-263 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.962764 File-URL: http://hdl.handle.net/10.1080/00031305.2014.962764 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:253-263 Template-Type: ReDIF-Article 1.0 Author-Name: Phillip E. Pfeifer Author-X-Name-First: Phillip E. Author-X-Name-Last: Pfeifer Author-Name: Yael Grushka-Cockayne Author-X-Name-First: Yael Author-X-Name-Last: Grushka-Cockayne Author-Name: Kenneth C. Lichtendahl Author-X-Name-First: Kenneth C. Author-X-Name-Last: Lichtendahl Title: The Promise of Prediction Contests Abstract: This article examines the prediction contest as a vehicle for aggregating the opinions of a crowd of experts. After proposing a general definition distinguishing prediction contests from other mechanisms for harnessing the wisdom of crowds, we focus on point-forecasting contests: contests in which forecasters submit point forecasts with a prize going to the entry closest to the quantity of interest. We first illustrate the incentive for forecasters to submit reports that exaggerate in the direction of their private information. Although this exaggeration raises a forecaster's mean squared error, it increases his or her chances of winning the contest. In contrast to conventional wisdom, this nontruthful reporting usually improves the accuracy of the resulting crowd forecast. The source of this improvement is that exaggeration shifts weight away from public information (information known to all forecasters) and by so doing helps alleviate public knowledge bias.
In the context of a simple theoretical model of overlapping information and forecaster behaviors, we present closed-form expressions for the mean squared error of the crowd forecasts, which will help identify the situations in which point-forecasting contests will be most useful. Journal: The American Statistician Pages: 264-270 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.937545 File-URL: http://hdl.handle.net/10.1080/00031305.2014.937545 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:264-270 Template-Type: ReDIF-Article 1.0 Author-Name: Thaddeus Tarpey Author-X-Name-First: Thaddeus Author-X-Name-Last: Tarpey Author-Name: R. Todd Ogden Author-X-Name-First: R. Todd Author-X-Name-Last: Ogden Author-Name: Eva Petkova Author-X-Name-First: Eva Author-X-Name-Last: Petkova Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: A Paradoxical Result in Estimating Regression Coefficients Abstract: This article presents a counterintuitive result regarding the estimation of a regression slope coefficient. Paradoxically, the precision of the slope estimator can deteriorate when additional information is used to estimate its value. In a randomized experiment, the distribution of baseline variables should be identical across treatments due to randomization. The motivation for this article came from noting that the precision of slope estimators deteriorated when baseline predictors were pooled across treatment groups. Journal: The American Statistician Pages: 271-276 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.940467 File-URL: http://hdl.handle.net/10.1080/00031305.2014.940467 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:271-276 Template-Type: ReDIF-Article 1.0 Author-Name: Shaoji Xu Author-X-Name-First: Shaoji Author-X-Name-Last: Xu Title: A Property of Geometric Mean Regression Abstract: This article gives an overview of four classical regressions: regression of Y on X, regression of X on Y, orthogonal regression, and geometric mean regression. It also compares two general parametric families that unify all four regressions: Deming's parametric family and Roos' parametric family. It is shown that Roos regression can be done by minimizing the sum of squared α-distances, and as a special case, geometric mean regression can be obtained by minimizing the sum of squared adjusted distances between the sample points and an imaginary line. Journal: The American Statistician Pages: 277-281 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.962763 File-URL: http://hdl.handle.net/10.1080/00031305.2014.962763 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:277-281 Template-Type: ReDIF-Article 1.0 Author-Name: B. O'Neill Author-X-Name-First: B. Author-X-Name-Last: O'Neill Title: Some Useful Moment Results in Sampling Problems Abstract: We consider the standard sampling problem involving a finite population of N objects and a sample of n objects taken from this population using simple random sampling without replacement. We consider the relationship between the moments of the sampled and unsampled parts and show how these are related to the population moments.
We derive expectation, variance, and covariance results for the various quantities under consideration and use these to obtain standard sampling results with an extension to variance estimation with a "finite population correction." This clarifies and extends standard results in sampling theory for the estimation of the mean and variance of a population. Journal: The American Statistician Pages: 282-296 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.966589 File-URL: http://hdl.handle.net/10.1080/00031305.2014.966589 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:282-296 Template-Type: ReDIF-Article 1.0 Author-Name: A. B. Owen Author-X-Name-First: A. B. Author-X-Name-Last: Owen Author-Name: P. A. Roediger Author-X-Name-First: P. A. Author-X-Name-Last: Roediger Title: The Sign of the Logistic Regression Coefficient Abstract: Let Y be a binary random variable and X a scalar. Let β̂ be the maximum likelihood estimate of the slope in a logistic regression of Y on X with intercept. Further let x̄0 and x̄1 be the average of sample x values for cases with y = 0 and y = 1, respectively. Then under a condition that rules out separable predictors, we show that sign(β̂) = sign(x̄1 − x̄0). More generally, if the xi are vector valued, then we show that β̂ = 0 if and only if x̄1 = x̄0. This holds for logistic regression and also for more general binary regressions with inverse link functions satisfying a log-concavity condition. Finally, when x̄1 ≠ x̄0, the angle between β̂ and x̄1 − x̄0 is less than 90° in binary regressions satisfying the log-concavity condition and the separation condition, when the design matrix has full rank. Journal: The American Statistician Pages: 297-301 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.951128 File-URL: http://hdl.handle.net/10.1080/00031305.2014.951128 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:297-301 Template-Type: ReDIF-Article 1.0 Author-Name: Vito M. R. Muggeo Author-X-Name-First: Vito M. R. Author-X-Name-Last: Muggeo Author-Name: Gianfranco Lovison Author-X-Name-First: Gianfranco Author-X-Name-Last: Lovison Title: The "Three Plus One" Likelihood-Based Test Statistics: Unified Geometrical and Graphical Interpretations Abstract: The presentations of the well-known likelihood ratio, Wald, and score test statistics in textbooks appear to lack a unified graphical and geometrical interpretation. We present two simple graphical representations on a common scale for these three test statistics, and also the recently proposed gradient test statistic. These unified graphical displays may foster better understanding of the geometrical meaning of the likelihood-based statistics and provide useful insights into their connections. Journal: The American Statistician Pages: 302-306 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.955212 File-URL: http://hdl.handle.net/10.1080/00031305.2014.955212 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:302-306 Template-Type: ReDIF-Article 1.0 Author-Name: Peng Ding Author-X-Name-First: Peng Author-X-Name-Last: Ding Title: Tarpey, T., Ogden, R. T., Petkova, E., and Christensen, R.
(2014), "A Paradoxical Result in Estimating Regression Coefficients," The American Statistician, 68, 271-276 (this issue): Comment by Peng Ding Journal: The American Statistician Pages: 316-316 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.954733 File-URL: http://hdl.handle.net/10.1080/00031305.2014.954733 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:316-316 Template-Type: ReDIF-Article 1.0 Author-Name: Nitis Mukhopadhyay Author-X-Name-First: Nitis Author-X-Name-Last: Mukhopadhyay Title: Warr, R. L. and Erich, R. A. (2013), "Should the Interquartile Range Divided by the Standard Deviation be Used to Assess Normality?," The American Statistician, 67, 242-244: Comment by Mukhopadhyay and Reply Journal: The American Statistician Pages: 316-317 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.962765 File-URL: http://hdl.handle.net/10.1080/00031305.2014.962765 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:316-317 Template-Type: ReDIF-Article 1.0 Author-Name: Richard L. Warr Author-X-Name-First: Richard L. Author-X-Name-Last: Warr Author-Name: Roger A. Erich Author-X-Name-First: Roger A. Author-X-Name-Last: Erich Title: Reply Journal: The American Statistician Pages: 317-317 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.970867 File-URL: http://hdl.handle.net/10.1080/00031305.2014.970867 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:317-317 Template-Type: ReDIF-Article 1.0 Author-Name: Sivan Aldor-Noiman Author-X-Name-First: Sivan Author-X-Name-Last: Aldor-Noiman Author-Name: Lawrence D. Brown Author-X-Name-First: Lawrence D. Author-X-Name-Last: Brown Author-Name: Andreas Buja Author-X-Name-First: Andreas Author-X-Name-Last: Buja Author-Name: Wolfgang Rolke Author-X-Name-First: Wolfgang Author-X-Name-Last: Rolke Author-Name: Robert A. Stine Author-X-Name-First: Robert A. Author-X-Name-Last: Stine Title: Aldor-Noiman, S., Brown, L.D., Buja, A., Rolke, W., and Stine, R.A. (2013), "The Power to See: A New Graphical Test of Normality," The American Statistician, 67, 249-260 Journal: The American Statistician Pages: 318-318 Issue: 4 Volume: 68 Year: 2014 Month: 11 X-DOI: 10.1080/00031305.2014.970871 File-URL: http://hdl.handle.net/10.1080/00031305.2014.970871 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:318-318 Template-Type: ReDIF-Article 1.0 Author-Name: Gregory P. Samsa Author-X-Name-First: Gregory P. Author-X-Name-Last: Samsa Title: Has It Really Been Demonstrated That Most Genomic Research Findings Are False? Abstract: In a widely cited article, Ioannidis argued that most published research findings are false; particularly discovery research involving massive testing, genomics being a typical example. However, his argument ignores adjustment for multiple testing and thus should be taken with a large grain of salt. This is a potential example for statistics courses that concentrate on problem formulation. Journal: The American Statistician Pages: 1-4 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.951127 File-URL: http://hdl.handle.net/10.1080/00031305.2014.951127 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:1-4 Template-Type: ReDIF-Article 1.0 Author-Name: Joel E. Cohen Author-X-Name-First: Joel E. Author-X-Name-Last: Cohen Title: Markov's Inequality and Chebyshev's Inequality for Tail Probabilities: A Sharper Image Abstract: Markov's inequality gives an upper bound on the probability that a nonnegative random variable takes large values. For example, if the random variable is the lifetime of a person or a machine, Markov's inequality says that the probability that an individual survives more than three times the average lifetime in the population of such individuals cannot exceed one-third. Here we give a simple, intuitive geometric interpretation and derivation of Markov's inequality. These results lead to inequalities sharper than Markov's when information about conditional expectations is available, as in reliability theory, demography, and actuarial mathematics. We use these results to sharpen Chebyshev's tail inequality also. Journal: The American Statistician Pages: 5-7 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.975842 File-URL: http://hdl.handle.net/10.1080/00031305.2014.975842 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:5-7 Template-Type: ReDIF-Article 1.0 Author-Name: Liang Hong Author-X-Name-First: Liang Author-X-Name-Last: Hong Title: The Absolute Difference Law for Expectations Abstract: We revisit the addition law for expectations and present a sibling law: the absolute difference law for expectations. We show that these two laws and their corresponding laws for probabilities can be reconciled under a single framework. As an application, we use the absolute difference law for expectations to calculate the mean absolute deviation. Finally, we remark on a hidden point in a related article previously published on these pages; this will help readers to avoid a potential pitfall. Journal: The American Statistician Pages: 8-10 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.994712 File-URL: http://hdl.handle.net/10.1080/00031305.2014.994712 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:8-10 Template-Type: ReDIF-Article 1.0 Author-Name: Lisa M. Lee Author-X-Name-First: Lisa M. Author-X-Name-Last: Lee Author-Name: Frances A. McCarty Author-X-Name-First: Frances A. Author-X-Name-Last: McCarty Author-Name: Tenny R. Zhang Author-X-Name-First: Tenny R. Author-X-Name-Last: Zhang Title: Ethical Numbers: Ethics Training in U.S. Graduate Statistics Programs, 2013-2014 Abstract: As important members of research teams, statisticians bear an ethical responsibility to analyze, interpret, and report data honestly and objectively. One way of reinforcing ethical responsibilities is through required courses covering a variety of ethics-related topics at the graduate level. We assessed ethics requirements for graduate-level statistics training programs in the United States for the 2013-2014 academic year using the websites of 88 universities, examining 103 biostatistics programs and 136 statistics degree programs. We categorized each program according to whether it required an ethics course. Thirty-one (35.1%) universities required an ethics course for at least some degree students. Sixty-two (25.5%) degree programs required an ethics course for at least some students. The majority (77.4%) of required courses were worth 0 or 1 credit.
Of the 177 programs without an ethics requirement, 19 (10.7%) listed an ethics elective. Although a single ethics course is insufficient for instilling an ethical approach to science, degree programs that model expectations through coursework point to the value of ethics in science. More training programs should prepare statisticians to consider the ethical dimensions of their work through required coursework. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 11-16 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.997891 File-URL: http://hdl.handle.net/10.1080/00031305.2014.997891 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:11-16 Template-Type: ReDIF-Article 1.0 Author-Name: Valeria Espinosa Author-X-Name-First: Valeria Author-X-Name-Last: Espinosa Author-Name: Donald B. Rubin Author-X-Name-First: Donald B. Author-X-Name-Last: Rubin Title: Did the Military Interventions in the Mexican Drug War Increase Violence? Abstract: We analyze publicly available data to estimate the causal effects of military interventions on the homicide rates in certain problematic regions in Mexico. We use the Rubin causal model to compare the post-intervention homicide rate in each intervened region to the hypothetical homicide rate for that same year had the military intervention not taken place. Because the effect of a military intervention is not confined to the municipality subject to the intervention, a nonstandard definition of units is necessary to estimate the causal effect of the intervention under the standard no-interference component of the stable unit treatment value assumption (SUTVA). Donor pools are created for each missing potential outcome under no intervention, thereby allowing for the estimation of unit-level causal effects. A multiple imputation approach accounts for uncertainty about the missing potential outcomes. Journal: The American Statistician Pages: 17-27 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.965796 File-URL: http://hdl.handle.net/10.1080/00031305.2014.965796 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:17-27 Template-Type: ReDIF-Article 1.0 Author-Name: Wei Wang Author-X-Name-First: Wei Author-X-Name-Last: Wang Author-Name: Dylan S. Small Author-X-Name-First: Dylan S. Author-X-Name-Last: Small Title: Monotone B-Spline Smoothing for a Generalized Linear Model Response Abstract: Various methods have been proposed for smoothing under the monotonicity constraint. We review the literature and implement an approach to monotone smoothing with B-splines for a generalized linear model response. The approach is expressed as a quadratic programming problem and is easily solved using the statistical software R. In a simulation study, we find that the approach performs better than other approaches, with much faster computation time. The approach can also be used for smoothing under other shape constraints or mixed constraints. Supplementary materials, including the appendices and R code implementing the developed approach, are available online. Journal: The American Statistician Pages: 28-33 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.969445 File-URL: http://hdl.handle.net/10.1080/00031305.2014.969445 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
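The quadratic-programming formulation that Wang and Small describe can be sketched in R for a Gaussian response (the article treats generalized linear model responses; this minimal sketch assumes the splines and quadprog packages and is not the authors' supplementary code). Nondecreasing B-spline coefficients are sufficient for a nondecreasing fitted curve, so monotonicity reduces to linear inequality constraints:
    library(splines); library(quadprog)
    set.seed(1)
    x <- sort(runif(200)); y <- sqrt(x) + rnorm(200, sd = 0.1)
    B <- bs(x, df = 8, intercept = TRUE)       # B-spline design matrix
    k <- ncol(B)
    A <- diff(diag(k))                         # row j encodes c[j+1] - c[j] >= 0
    fit <- solve.QP(Dmat = crossprod(B),       # minimize ||y - B c||^2 ...
                    dvec = drop(crossprod(B, y)),
                    Amat = t(A), bvec = rep(0, k - 1))  # ... subject to A c >= 0
    yhat <- drop(B %*% fit$solution)           # monotone smoothed values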
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:28-33 Template-Type: ReDIF-Article 1.0 Author-Name: Guangxiang Zhang Author-X-Name-First: Guangxiang Author-X-Name-Last: Zhang Author-Name: John J. Chen Author-X-Name-First: John J. Author-X-Name-Last: Chen Title: Biostatistics Faculty and NIH Awards at U.S. Medical Schools Abstract: Statistical principles and methods are critical to the success of biomedical and translational research. However, it is difficult to track and evaluate the monetary value of a biostatistician to a school of medicine (SoM). Limited published data on this topic are available, especially for comparisons across SoMs. Using National Institutes of Health (NIH) awards and Association of American Medical Colleges (AAMC) faculty counts data (2010-2013), together with online information on biostatistics faculty from 119 institutions across the country, we demonstrated that the number of biostatistics faculty was significantly positively associated with the amount of NIH awards, both as a school total and on a per faculty basis, across various sizes of U.S. SoMs. Biostatisticians, as a profession, should be proactive in communicating and advocating the value of their work and their unique contribution to the long-term success of a biomedical research enterprise. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 34-40 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.992959 File-URL: http://hdl.handle.net/10.1080/00031305.2014.992959 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:34-40 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen Portnoy Author-X-Name-First: Stephen Author-X-Name-Last: Portnoy Title: Maximizing Probability Bounds Under Moment-Matching Restrictions Abstract: The problem of characterizing a distribution by its moments dates to work by Chebyshev in the mid-nineteenth century. There are clear (and close) connections with characteristic functions, moment spaces, quadrature, and other very classical mathematical pursuits. Lindsay and Basak posed the specific question of how far from normality a distribution could be if it matches k normal moments. They provided a bound on the maximal difference in cdfs, and implied that the bound was attained. It will be shown here that in fact the bound is not attained if the number of even moments matched is odd. An explicit solution is developed as a symmetric distribution with a finite number of mass points when the number of even moments matched is even, and this bound for the even case is shown to hold as an explicit limit for the subsequent odd case. As Lindsay noted, the discrepancies can be sizable even for a moderate number of matched moments. Some comments on implications are proffered. Journal: The American Statistician Pages: 41-44 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.992960 File-URL: http://hdl.handle.net/10.1080/00031305.2014.992960 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:41-44 Template-Type: ReDIF-Article 1.0 Author-Name: Yaakov Malinovsky Author-X-Name-First: Yaakov Author-X-Name-Last: Malinovsky Author-Name: Paul S.
Author-X-Name-Last: Albert Title: A Note on the Minimax Solution for the Two-Stage Group Testing Problem Abstract: Group testing is an active area of current research and has important applications in medicine, biotechnology, genetics, and product testing. There have been recent advances in design and estimation, but the simple Dorfman procedure introduced by R. Dorfman in 1943 is widely used in practice. In many practical situations, the exact value of the probability p of being affected is unknown. We present both minimax and Bayesian solutions for the group size problem when p is unknown. For unbounded p, we show that the minimax solution for group size is 8, while using a Bayesian strategy with Jeffreys' prior results in a group size of 13. We also present solutions when p is bounded from above. For the practitioner, we propose strong justification for using a group size of between 8 and 13 when a constraint on p is not incorporated and provide useable code for computing the minimax group size under a constrained p. Journal: The American Statistician Pages: 45-52 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.983545 File-URL: http://hdl.handle.net/10.1080/00031305.2014.983545 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:45-52 Template-Type: ReDIF-Article 1.0 Author-Name: Spyros Missiakoulis Author-X-Name-First: Spyros Author-X-Name-Last: Missiakoulis Title: Letter to the Editor Journal: The American Statistician Pages: 62-62 Issue: 1 Volume: 69 Year: 2015 Month: 2 X-DOI: 10.1080/00031305.2014.984816 File-URL: http://hdl.handle.net/10.1080/00031305.2014.984816 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:62-62 Template-Type: ReDIF-Article 1.0 Author-Name: Christy Chuang-Stein Author-X-Name-First: Christy Author-X-Name-Last: Chuang-Stein Author-Name: Narayanaswamy Balakrishnan Author-X-Name-First: Narayanaswamy Author-X-Name-Last: Balakrishnan Author-Name: Marcus Berzofsky Author-X-Name-First: Marcus Author-X-Name-Last: Berzofsky Author-Name: Amy Herring Author-X-Name-First: Amy Author-X-Name-Last: Herring Author-Name: Fred Hulting Author-X-Name-First: Fred Author-X-Name-Last: Hulting Author-Name: John McKenzie Author-X-Name-First: John Author-X-Name-Last: McKenzie Author-Name: Dionne Price Author-X-Name-First: Dionne Author-X-Name-Last: Price Author-Name: Stephen Stigler Author-X-Name-First: Stephen Author-X-Name-Last: Stigler Author-Name: George Williams Author-X-Name-First: George Author-X-Name-Last: Williams Author-Name: Ronald Wasserstein Author-X-Name-First: Ronald Author-X-Name-Last: Wasserstein Title: Celebrating the 175th Anniversary of ASA Journal: The American Statistician Pages: 64-67 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1028765 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1028765 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:64-67 Template-Type: ReDIF-Article 1.0 Author-Name: Robert L. Mason Author-X-Name-First: Robert L. Author-X-Name-Last: Mason Author-Name: John D. McKenzie Author-X-Name-First: John D. 
Author-X-Name-Last: McKenzie Title: A Brief History of the American Statistical Association, 1990-2014 Abstract: The objective of this article is to present a brief chronological record of the American Statistical Association (ASA) from its modest beginnings in Boston in 1839 to its present status as a worldwide professional organization with approximately 19,000 members and a headquarters in Alexandria, Virginia. Emphasis is placed on accomplishments over the past 25 years of the ASA from the end of its Sesquicentennial Celebration in 1989 to the end of its 175th Anniversary Celebration in 2014. Its continued growth during this period has been achieved through the work of outstanding leaders, sections, chapters, and committees. This article briefly summarizes its achievements in organizational efficiency, membership services, innovative meetings, and publications. It also describes its work in structural change, education, public relations, and science policy. It ends with a positive look to the future. Journal: The American Statistician Pages: 68-78 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1033984 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033984 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:68-78 Template-Type: ReDIF-Article 1.0 Author-Name: James J. Cochran Author-X-Name-First: James J. Author-X-Name-Last: Cochran Title: ASA Presidents and Executive Directors Look Back on their Terms in Office Journal: The American Statistician Pages: 79-85 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1033988 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033988 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:79-85 Template-Type: ReDIF-Article 1.0 Author-Name: Jon R. Kettenring Author-X-Name-First: Jon R. Author-X-Name-Last: Kettenring Author-Name: Kenneth J. Koehler Author-X-Name-First: Kenneth J. Author-X-Name-Last: Koehler Author-Name: John D. McKenzie Jr. Author-X-Name-First: John D. Author-X-Name-Last: McKenzie Jr. Title: Challenges and Opportunities for Statistics in the Next 25 Years Abstract: Beginning with the 75th Anniversary of the American Statistical Association in 1914 and for subsequent 25-year celebrations, distinguished members of the association have addressed the future of statistics. A four-person panel engaged in the same exercise during the 2014 Joint Statistical Meetings for the ASA's dodransbicentennial. The panel identified a variety of strengths, weaknesses, opportunities, and threats for the profession in the next quarter of a century. This article highlights some of the discussion that took place. Journal: The American Statistician Pages: 86-90 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1033987 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033987 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:86-90 Template-Type: ReDIF-Article 1.0 Author-Name: Robert N. Rodriguez Author-X-Name-First: Robert N. Author-X-Name-Last: Rodriguez Title: Who Will Celebrate Our 200th Anniversary? 
Growing the Next Generation of ASA Members Abstract: During the next 25 years, the growth and vitality of the American Statistical Association will depend on how well we attract and serve members in emerging areas of practice such as data science, where statistics as a skill set is in high demand but statistics as a profession has low recognition. Successful adaptation to the era of Big Data requires that we broaden our understanding of statistical practice to include the work of all those who learn from data. In order to grow the next generation of members, we must also retain a much higher proportion of today's student members, many of whom leave the ASA upon graduation. By providing value that meets the needs of these groups and equips them to flourish in their organizations, we can become the Big Tent for Statistics. Journal: The American Statistician Pages: 91-95 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1028231 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1028231 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:91-95 Template-Type: ReDIF-Article 1.0 Author-Name: Ron Wasserstein Author-X-Name-First: Ron Author-X-Name-Last: Wasserstein Title: Communicating the Power and Impact of Our Profession: A Heads Up for the Next Executive Directors of the ASA Journal: The American Statistician Pages: 96-99 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1031283 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1031283 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:96-99 Template-Type: ReDIF-Article 1.0 Author-Name: Jessica Utts Author-X-Name-First: Jessica Author-X-Name-Last: Utts Title: The Many Facets of Statistics Education: 175 Years of Common Themes Abstract: The American Statistical Association's primary founder, Lemuel Shattuck, was driven by a passion for collecting and disseminating accurate information on vital statistics, public health, and other statistically related concerns. The 175th anniversary provides an opportunity to reflect on the education-related reasons ASA was founded and what it has done in education since its founding, especially in the past 25 years since the 150th anniversary. An examination of early and more recent issues of the ASA's journals reveals some common themes that have recurred over the past 175 years. We discuss what those themes are and what the ASA is doing to address them currently, and then conclude by discussing what ASA members can do to help. Journal: The American Statistician Pages: 100-107 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1033981 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033981 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:100-107 Template-Type: ReDIF-Article 1.0 Author-Name: David L. DeMets Author-X-Name-First: David L. Author-X-Name-Last: DeMets Author-Name: Janet Turk Wittes Author-X-Name-First: Janet Turk Author-X-Name-Last: Wittes Author-Name: Nancy L. Geller Author-X-Name-First: Nancy L. 
Author-X-Name-Last: Geller Title: The Influence of Biostatistics at the National Heart, Lung, and Blood Institute Abstract: Since the early 1950s, the National Heart, Lung, and Blood Institute (NHLBI) has conducted a long series of influential randomized clinical trials in heart, lung, and blood diseases. The biostatisticians at the Institute have been central to the design, conduct, monitoring, and final analyses of these trials. The uniquely favorable deck of cards the group of biostatisticians at the Institute has been dealt over the six and a half decades of the group's life has led to contributions that have had a major impact on the fields of biostatistics and clinical trials. The leaders of the NHLBI and its several Divisions have valued the independence, creativity, and collaborative interactions of statisticians within the Institute. The medical problems the Institute faced impelled the statisticians to develop methodology that would address questions of great public importance. Perhaps most importantly, the individual members of the group had a collective vision, passed from member to member over time, that new methodology must fit the questions being asked. The group has always had the technical ability to develop new methods and the conviction that they were responsible for ensuring that they could explain their methods to the clinicians with whom they worked. Journal: The American Statistician Pages: 108-120 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1035962 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1035962 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:108-120 Template-Type: ReDIF-Article 1.0 Author-Name: Allan J. Rossman Author-X-Name-First: Allan J. Author-X-Name-Last: Rossman Author-Name: Roy St. Laurent Author-X-Name-First: Roy St. Author-X-Name-Last: Laurent Author-Name: Josh Tabor Author-X-Name-First: Josh Author-X-Name-Last: Tabor Title: Advanced Placement Statistics: Expanding the Scope of Statistics Education Abstract: A list of consequential developments in the field of statistics for the past quarter-century must include the creation and implementation of the Advanced Placement (AP) program in Statistics. This program has introduced millions of high school students to our discipline over the past 18 years, contributing to the large increase in the number of undergraduate students pursuing statistics as their major in college. ASA members and leaders have played a substantial role in shaping this program and furthering its success. Journal: The American Statistician Pages: 121-126 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1033985 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033985 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:121-126 Template-Type: ReDIF-Article 1.0 Author-Name: Eric A. Vance Author-X-Name-First: Eric A. Author-X-Name-Last: Vance Title: Recent Developments and Their Implications for the Future of Academic Statistical Consulting Centers Abstract: I describe how developments over the past 25 years in computing, funding, personnel, purpose, and training have affected academic statistical consulting centers and discuss how these developments and trends point to a range of potential futures.
At one extreme, academic statistical consulting centers fail to adapt to competition from other disciplines in an increasingly fragmented market for statistical consulting and spiral downward toward irrelevancy and extinction. At the other extreme, purpose-driven academic statistical consulting centers constantly increase their impact in a virtuous cycle, leading the way toward the profession of statistics having greater positive impact on society. I conclude with actions to take to assure a robust future and increased impact for academic statistical consulting centers. Journal: The American Statistician Pages: 127-137 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1033990 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033990 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:127-137 Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas J. Horton Author-X-Name-First: Nicholas J. Author-X-Name-Last: Horton Title: Challenges and Opportunities for Statistics and Statistical Education: Looking Back, Looking Forward Abstract: The 175th anniversary of the ASA provides an opportunity to look back into the past and peer into the future. What led our forebears to found the association? What commonalities do we still see? What insights might we glean from their experiences and observations? I will use the anniversary as a chance to reflect on where we are now and where we are headed in terms of statistical education amidst the growth of data science. Statistics is the science of learning from data. By fostering more multivariable thinking, building data-related skills, and developing simulation-based problem solving, we can help to ensure that statisticians are fully engaged in data science and the analysis of the abundance of data now available to us. Journal: The American Statistician Pages: 138-145 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1032435 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1032435 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:138-145 Template-Type: ReDIF-Article 1.0 Author-Name: Saralees Nadarajah Author-X-Name-First: Saralees Author-X-Name-Last: Nadarajah Title: On the Computation of Gauss Hypergeometric Functions Abstract: The pioneering study undertaken by Liang et al. in 2008 (Journal of the American Statistical Association, 103, 410-423) and the hundreds of papers citing that work make use of certain hypergeometric functions. Liang et al. and many others claim that the computation of the hypergeometric functions is difficult. Here, we show that the hypergeometric functions can in fact be reduced to simpler functions that can often be computed using a pocket calculator. Journal: The American Statistician Pages: 146-148 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1028595 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1028595 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
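A note on the Nadarajah abstract above: the kind of reduction it describes can be seen in a classical special case, since for many parameter choices the Gauss hypergeometric function collapses to an elementary expression. The sketch below checks one textbook identity, 2F1(1, 1; 2; z) = -log(1 - z)/z, numerically; it illustrates the general phenomenon and is not a formula taken from the article.

    # Check that 2F1(1, 1; 2; z) equals the elementary form -log(1 - z)/z.
    import numpy as np
    from scipy.special import hyp2f1

    z = 0.5
    lhs = hyp2f1(1.0, 1.0, 2.0, z)   # Gauss hypergeometric function
    rhs = -np.log1p(-z) / z          # elementary closed form
    print(lhs, rhs)                  # both print 1.3862943611...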
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:146-148 Template-Type: ReDIF-Article 1.0 Author-Name: Robert Easterling Author-X-Name-First: Robert Author-X-Name-Last: Easterling Title: There's Nothing Wrong With Clopper-Pearson Binomial Confidence Limits Journal: The American Statistician Pages: 154-155 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1019646 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1019646 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:154-155 Template-Type: ReDIF-Article 1.0 Author-Name: Mark F. Schilling Author-X-Name-First: Mark F. Author-X-Name-Last: Schilling Author-Name: Jimmy A. Doi Author-X-Name-First: Jimmy A. Author-X-Name-Last: Doi Title: Reply Journal: The American Statistician Pages: 155-156 Issue: 2 Volume: 69 Year: 2015 Month: 5 X-DOI: 10.1080/00031305.2015.1026760 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1026760 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:155-156 Template-Type: ReDIF-Article 1.0 Author-Name: Liang Hong Author-X-Name-First: Liang Author-X-Name-Last: Hong Title: Another Remark on the Alternative Expectation Formula Abstract: Students in a calculus-based probability course will often see the expectation formula for nonnegative continuous random variables in terms of the survival function. This alternative expectation formula has a wide spectrum of applications. It is natural to ask whether there is a multivariate version of this formula. This note gives an affirmative answer by establishing such a formula using two different approaches. The two approaches employed in this note correspond to the two approaches for the univariate case. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 157-159 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1049710 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1049710 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:157-159 Template-Type: ReDIF-Article 1.0 Author-Name: Per Gösta Andersson Author-X-Name-First: Per Gösta Author-X-Name-Last: Andersson Title: A Classroom Approach to the Construction of an Approximate Confidence Interval of a Poisson Mean Using One Observation Abstract: Even elementary statistical problems may give rise to a deeper and broader discussion of issues in probability and statistics. The construction of an approximate confidence interval for a Poisson mean turns out to be such a case. The simple standard two-sided Wald confidence interval by normal approximation is discussed and compared with the score interval. The discussion is partly in the form of an imaginary dialog between a teacher and a student, where the latter is supposed to have studied mathematical statistics for at least one semester. Journal: The American Statistician Pages: 160-164 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056830 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056830 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:160-164 Template-Type: ReDIF-Article 1.0 Author-Name: Joyee Ghosh Author-X-Name-First: Joyee Author-X-Name-Last: Ghosh Author-Name: Andrew E. Ghattas Author-X-Name-First: Andrew E. 
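For readers of the Hong abstract above: the univariate formula that the note generalizes is the survival-function representation of the mean, which follows from interchanging the order of integration (Fubini). The display below states the univariate case only; the multivariate extension is the subject of the article itself.

    E[X] = \int_0^\infty \Bigl(\int_0^x dt\Bigr) f(x)\,dx
         = \int_0^\infty \int_t^\infty f(x)\,dx\,dt
         = \int_0^\infty S(t)\,dt,

where X is a nonnegative continuous random variable with density f and survival function S(t) = P(X > t).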
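The Wald and score intervals contrasted in the Andersson abstract are easy to place side by side. A minimal sketch follows; the function names are ours and the example x = 5 is illustrative, not taken from the article. The score interval, obtained by solving (x - lambda)^2 = z^2 * lambda for lambda, sits to the right of the Wald interval x +/- z*sqrt(x) and never extends below zero for x > 0.

    # Wald and score intervals for a Poisson mean from one observation x.
    from math import sqrt
    from scipy.stats import norm

    def wald_interval(x, conf=0.95):
        z = norm.ppf(1 - (1 - conf) / 2)
        half = z * sqrt(x)               # plug-in standard error sqrt(x)
        return x - half, x + half

    def score_interval(x, conf=0.95):
        z = norm.ppf(1 - (1 - conf) / 2)
        centre = x + z * z / 2           # roots of (x - lam)**2 = z**2 * lam
        half = z * sqrt(x + z * z / 4)
        return centre - half, centre + half

    print(wald_interval(5))    # approx (0.62, 9.38)
    print(score_interval(5))   # approx (2.14, 11.71)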
Author-X-Name-Last: Ghattas Title: Bayesian Variable Selection Under Collinearity Abstract: In this article, we highlight some interesting facts about Bayesian variable selection methods for linear regression models in settings where the design matrix exhibits strong collinearity. We first demonstrate via real data analysis and simulation studies that summaries of the posterior distribution based on marginal and joint distributions may give conflicting results for assessing the importance of strongly correlated covariates. The natural question is which one should be used in practice. The simulation studies suggest that posterior inclusion probabilities and Bayes factors that evaluate the importance of correlated covariates jointly are more appropriate, and some priors may be more adversely affected in such a setting. To better understand the phenomenon, we study some toy examples with Zellner's g-prior. The results show that strong collinearity may lead to a multimodal posterior distribution over models, in which joint summaries are more appropriate than marginal summaries. Thus, we recommend a routine examination of the correlation matrix and calculation of the joint inclusion probabilities for correlated covariates, in addition to marginal inclusion probabilities, for assessing the importance of covariates in Bayesian variable selection. Journal: The American Statistician Pages: 165-173 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1031827 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1031827 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:165-173 Template-Type: ReDIF-Article 1.0 Author-Name: Darrick Yee Author-X-Name-First: Darrick Author-X-Name-Last: Yee Author-Name: Andrew Ho Author-X-Name-First: Andrew Author-X-Name-Last: Ho Title: Discreteness Causes Bias in Percentage-Based Comparisons: A Case Study From Educational Testing Abstract: Discretizing continuous distributions can lead to bias in parameter estimates. We present a case study from educational testing that illustrates dramatic consequences of discreteness when discretizing partitions differ across distributions. The percentage of test takers who score above a certain cutoff score (percent above cutoff, or "PAC") often describes overall performance on a test. Year-over-year changes in PAC, or ΔPAC, have gained prominence under recent U.S. education policies, with public schools facing sanctions if they fail to meet PAC targets. In this article, we describe how test score distributions act as continuous distributions that are discretized inconsistently over time. We show that this can propagate considerable bias to PAC trends, where positive ΔPACs appear negative, and vice versa, for a substantial number of actual tests. A simple model shows that this bias applies to any comparison of PAC statistics in which values for one distribution are discretized differently from values for the other. Journal: The American Statistician Pages: 174-181 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1031828 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1031828 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:174-181 Template-Type: ReDIF-Article 1.0 Author-Name: Timothy A. C. Hughes Author-X-Name-First: Timothy A. C.
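The marginal-versus-joint issue in the Ghosh and Ghattas abstract above can be reproduced in a few lines. The sketch below is our own toy stand-in: it scores the four sub-models of two nearly collinear predictors with BIC-based approximate posterior probabilities rather than the article's fully Bayesian treatment, but it shows the same pattern of modest marginal inclusion probabilities alongside a joint inclusion probability near one.

    # Two nearly collinear predictors: marginal inclusion splits, joint does not.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)   # x2 is almost a copy of x1
    y = x1 + rng.normal(size=n)           # only the shared signal matters

    X = np.column_stack([x1, x2])
    models = [(), (0,), (1,), (0, 1)]
    weights = []
    for m in models:
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in m])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        bic = n * np.log(rss / n) + Z.shape[1] * np.log(n)
        weights.append(np.exp(-bic / 2))
    post = np.array(weights) / np.sum(weights)

    print(post[[1, 3]].sum())      # P(x1 in model): moderate
    print(post[[2, 3]].sum())      # P(x2 in model): moderate
    print(post[[1, 2, 3]].sum())   # P(x1 or x2 in model): near 1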
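The sign reversal that Yee and Ho describe can likewise be made concrete with exact normal calculations. In the sketch below, the score distributions, the cutoff, and the two discretization rules (rounding to the nearest 0.5 in year 1, flooring to multiples of 0.5 in year 2) are invented for illustration; the true PAC change is positive while the PAC change computed from the inconsistently discretized scores is negative.

    # True PAC rises year over year, but inconsistent discretization flips the sign.
    from scipy.stats import norm

    cut = 0.0
    pac_true_y1 = 1 - norm.cdf(cut, loc=0.0, scale=1.0)    # 0.500
    pac_true_y2 = 1 - norm.cdf(cut, loc=0.1, scale=1.0)    # 0.540

    # Year 1: scores rounded to nearest 0.5, so reported >= 0 means true >= -0.25.
    # Year 2: scores floored to multiples of 0.5, so reported >= 0 means true >= 0.
    pac_disc_y1 = 1 - norm.cdf(-0.25, loc=0.0, scale=1.0)  # 0.599
    pac_disc_y2 = 1 - norm.cdf(0.0, loc=0.1, scale=1.0)    # 0.540

    print(pac_true_y2 - pac_true_y1)   # +0.040, true change
    print(pac_disc_y2 - pac_disc_y1)   # -0.059, measured change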
Author-X-Name-Last: Hughes Author-Name: Jaechoul Lee Author-X-Name-First: Jaechoul Author-X-Name-Last: Lee Title: A New Test for Short Memory in Long Memory Time Series Abstract: This article considers short memory characteristics in a long memory process. We derive new asymptotic results for the sample autocorrelation difference ratios. We use these results to develop a new portmanteau test that determines whether short memory parameters are statistically significant. In simulations, the new test can detect short memory components more often than the Ljung-Box test when these short memory components are in fact within a long memory process. Interestingly, our test finds short memory autocorrelations in U.S. inflation rate data, whereas the Ljung-Box test fails to find these autocorrelations. Modeling these short memory autocorrelations of the inflation rate data leads to improved model accuracy and more precise prediction. Journal: The American Statistician Pages: 182-190 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056829 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056829 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:182-190 Template-Type: ReDIF-Article 1.0 Author-Name: Shiyao Liu Author-X-Name-First: Shiyao Author-X-Name-Last: Liu Author-Name: Huaiqing Wu Author-X-Name-First: Huaiqing Author-X-Name-Last: Wu Author-Name: William Q. Meeker Author-X-Name-First: William Q. Author-X-Name-Last: Meeker Title: Understanding and Addressing the Unbounded "Likelihood" Problem Abstract: The joint probability density function, evaluated at the observed data, is commonly used as the likelihood function to compute maximum likelihood estimates. For some models, however, there exist paths in the parameter space along which this density-approximation likelihood goes to infinity and maximum likelihood estimation breaks down. In all applications, though, observed data are really discrete due to the round-off or grouping error of measurements. The "correct likelihood" based on interval censoring can eliminate the problem of an unbounded likelihood. This article categorizes the models leading to unbounded likelihoods into three groups and illustrates the density-approximation breakdown with specific examples. Although it is usually possible to infer how given data were rounded, when this is not possible, one must choose the width for interval censoring; we therefore study the effect of the round-off on estimation. We also give sufficient conditions for the joint density to provide the same maximum likelihood estimate as the correct likelihood, as the round-off error goes to zero. Journal: The American Statistician Pages: 191-200 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2014.1003968 File-URL: http://hdl.handle.net/10.1080/00031305.2014.1003968 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:191-200 Template-Type: ReDIF-Article 1.0 Author-Name: Anne-Laure Boulesteix Author-X-Name-First: Anne-Laure Author-X-Name-Last: Boulesteix Author-Name: Robert Hable Author-X-Name-First: Robert Author-X-Name-Last: Hable Author-Name: Sabine Lauer Author-X-Name-First: Sabine Author-X-Name-Last: Lauer Author-Name: Manuel J. A. Eugster Author-X-Name-First: Manuel J. A.
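The new portmanteau test in the Hughes and Lee abstract above is developed in the article itself; as a point of reference, the Ljung-Box test it is compared against is available in statsmodels. A minimal usage sketch on a simulated AR(1) series (our example, not the article's inflation data):

    # Ljung-Box portmanteau test on a simulated AR(1) series.
    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(0)
    n, phi = 500, 0.3
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):                 # x_t = phi * x_{t-1} + e_t
        x[t] = phi * x[t - 1] + e[t]

    print(acorr_ljungbox(x, lags=[10]))   # small p-value: autocorrelation detected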
Author-X-Name-Last: Eugster Title: A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies Abstract: In computational sciences, including computational statistics, machine learning, and bioinformatics, it is often claimed in articles presenting new supervised learning methods that the new method performs better than existing methods on real data, for instance in terms of error rate. However, these claims are often not based on proper statistical tests and, even if such tests are performed, the tested hypothesis is not clearly defined and little attention is devoted to the Type I and Type II errors. In the present article, we aim to fill this gap by providing a proper statistical framework for hypothesis tests that compare the performances of supervised learning methods based on several real datasets with unknown underlying distributions. After giving a statistical interpretation of ad hoc tests commonly performed by computational researchers, we devote special attention to power issues and outline a simple method of determining the number of datasets to be included in a comparison study to reach adequate power. These methods are illustrated through three comparison studies from the literature and an exemplary benchmarking study using gene expression microarray data. All our results can be reproduced using R code and datasets available from the companion website http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013. Journal: The American Statistician Pages: 201-212 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1005128 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1005128 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:201-212 Template-Type: ReDIF-Article 1.0 Author-Name: Derek S. Young Author-X-Name-First: Derek S. Author-X-Name-Last: Young Author-Name: Glenn F. Johnson Author-X-Name-First: Glenn F. Author-X-Name-Last: Johnson Author-Name: Mosuk Chow Author-X-Name-First: Mosuk Author-X-Name-Last: Chow Author-Name: James L. Rosenberger Author-X-Name-First: James L. Author-X-Name-Last: Rosenberger Title: The Challenges in Developing an Online Applied Statistics Program: Lessons Learned at Penn State University Abstract: Numerous professional fields have an increasing need for individuals trained in statistics and other quantitative analysis techniques. Today there exists great potential to fulfill this need by providing opportunities through online learning. However, to provide a high-quality education for returning adult professionals seeking advanced degrees in applied statistics online, many challenges need to be overcome. Based on our experience developing Penn State University's online program in applied statistics, we discuss the evolution of the program's curriculum, recruitment and development of online faculty, and meeting the requirements of students as important areas that require consideration in the development of an online program. We also highlight program evaluation strategies employed to ensure innovation and improvement in online education as cornerstones of a program's success. Journal: The American Statistician Pages: 213-220 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1038583 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1038583 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
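Returning to the Liu, Wu, and Meeker abstract above: the unbounded-likelihood phenomenon is easy to exhibit with a two-component normal mixture. In the sketch below (toy data and a rounding half-width delta of our choosing), the density-based log-likelihood diverges as one component's scale shrinks onto an observation, while the interval-censored version stays bounded.

    # Density log-likelihood diverges; interval-censored log-likelihood does not.
    import numpy as np
    from scipy.stats import norm

    x = np.array([0.3, -1.2, 0.8, 1.5, -0.4])   # toy data recorded to 0.1
    delta = 0.05                                # half the rounding width

    def density_loglik(mu1, s1):
        f = 0.5 * norm.pdf(x, mu1, s1) + 0.5 * norm.pdf(x, 0.0, 1.0)
        return np.log(f).sum()

    def interval_loglik(mu1, s1):
        p = (0.5 * (norm.cdf(x + delta, mu1, s1) - norm.cdf(x - delta, mu1, s1))
             + 0.5 * (norm.cdf(x + delta, 0.0, 1.0) - norm.cdf(x - delta, 0.0, 1.0)))
        return np.log(p).sum()

    for s1 in (1.0, 0.1, 0.01, 0.001):
        # First component centered on the first observation, scale shrinking.
        print(s1, density_loglik(x[0], s1), interval_loglik(x[0], s1))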
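The power question raised by Boulesteix et al., namely how many datasets a benchmark study needs, can be approached with an off-the-shelf power computation once per-dataset performance differences are treated as paired observations. The planning values below (effect size 0.5, power 0.8) are our assumptions, not numbers from the article.

    # Number of datasets for a paired t test at a given effect size and power.
    from statsmodels.stats.power import TTestPower

    n_datasets = TTestPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.8, alternative='two-sided')
    print(n_datasets)   # about 34 datasets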
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:213-220 Template-Type: ReDIF-Article 1.0 Author-Name: Hyunju Lee Author-X-Name-First: Hyunju Author-X-Name-Last: Lee Author-Name: Ji Hwan Cha Author-X-Name-First: Ji Hwan Author-X-Name-Last: Cha Title: On Two General Classes of Discrete Bivariate Distributions Abstract: In this article, we develop two general classes of discrete bivariate distributions. We derive general formulas for the joint distributions belonging to the classes. The obtained formulas for the joint distributions are very general in the sense that new families of distributions can be generated just by specifying the "baseline seed distributions." The dependence structures of the bivariate distributions belonging to the proposed classes, along with basic statistical properties, are also discussed. New families of discrete bivariate distributions are generated from the classes. Furthermore, to assess the usefulness of the proposed classes, two discrete bivariate distributions generated from the classes are applied to analyze a real dataset and the results are compared with those obtained from conventional models. Journal: The American Statistician Pages: 221-230 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1044564 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1044564 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:221-230 Template-Type: ReDIF-Article 1.0 Author-Name: Brigitte Baldi Author-X-Name-First: Brigitte Author-X-Name-Last: Baldi Author-Name: Jessica Utts Author-X-Name-First: Jessica Author-X-Name-Last: Utts Title: What Your Future Doctor Should Know About Statistics: Must-Include Topics for Introductory Undergraduate Biostatistics Abstract: The increased emphasis on evidence-based medicine creates a greater need for educating future physicians in the general domain of quantitative reasoning, probability, and statistics. Reflecting this trend, more medical schools now require applicants to have taken an undergraduate course in introductory statistics. Given the breadth of statistical applications, we should cover in that course certain essential topics that may not be covered in the more general introductory statistics course. In selecting and presenting such topics, we should bear in mind that doctors also need to communicate probabilistic concepts of risks and benefits to patients who are increasingly expected to be active participants in their own health care choices despite having no training in medicine or statistics. It is also important that interesting and relevant examples accompany the presentation, because the examples (rather than the details) are what students tend to retain years later. Here, we present a list of topics we cover in the introductory biostatistics course that may not be covered in the general introductory course. We also provide some of our favorite examples for discussing these topics. Journal: The American Statistician Pages: 231-240 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1048903 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1048903 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:231-240 Template-Type: ReDIF-Article 1.0 Author-Name: P. Vellaisamy Author-X-Name-First: P. 
Author-X-Name-Last: Vellaisamy Title: On Probabilistic Proofs of Certain Binomial Identities Abstract: This short note gives a simple statistical proof of a binomial identity, by evaluating the Laplace transform of the maximum of n independent exponential random variables in two different ways. As a by-product, we obtain a rigorous proof of an interesting result concerning the exponential distribution. The connections between a probabilistic approach and our approach are discussed. In the process, several new binomial identities are also obtained. Journal: The American Statistician Pages: 241-243 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056381 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056381 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:241-243 Template-Type: ReDIF-Article 1.0 Author-Name: R. Dennis Cook Author-X-Name-First: R. Dennis Author-X-Name-Last: Cook Author-Name: Liliana Forzani Author-X-Name-First: Liliana Author-X-Name-Last: Forzani Author-Name: Adam Rothman Author-X-Name-First: Adam Author-X-Name-Last: Rothman Title: Letter to the Editor Journal: The American Statistician Pages: 253-254 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1053522 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1053522 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:253-254 Template-Type: ReDIF-Article 1.0 Author-Name: Thaddeus Tarpey Author-X-Name-First: Thaddeus Author-X-Name-Last: Tarpey Author-Name: R. Todd Ogden Author-X-Name-First: R. Todd Author-X-Name-Last: Ogden Author-Name: Eva Petkova Author-X-Name-First: Eva Author-X-Name-Last: Petkova Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Reply Journal: The American Statistician Pages: 254-255 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056613 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056613 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:254-255 Template-Type: ReDIF-Article 1.0 Author-Name: Peng Ding Author-X-Name-First: Peng Author-X-Name-Last: Ding Title: Reply Journal: The American Statistician Pages: 255-256 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056615 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056615 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:255-256 Template-Type: ReDIF-Article 1.0 Author-Name: Iliana Ignatova Author-X-Name-First: Iliana Author-X-Name-Last: Ignatova Author-Name: Roland Deutsch Author-X-Name-First: Roland Author-X-Name-Last: Deutsch Author-Name: Don Edwards Author-X-Name-First: Don Author-X-Name-Last: Edwards Title: Kirk, J.L., and Fay, M.P. "An Introduction to Practical Sequential Inferences Via Single-Arm Binary Response Studies Using the Binseqtest R Package," The American Statistician, 68, 230-242 Journal: The American Statistician Pages: 256-257 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1053523 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1053523 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
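One identity of the kind studied in the Vellaisamy note above comes from the maximum M of n independent Exp(1) variables: expanding the survival function 1 - (1 - e^{-x})^n and integrating term by term expresses E[M] as an alternating binomial sum, while spacings of exponential order statistics give the harmonic number. The sketch below verifies the resulting identity exactly with rational arithmetic; it is our illustration of this family of identities, not code from the note.

    # Exact check: sum_{k=1}^{n} (-1)**(k+1) * C(n, k) / k == 1 + 1/2 + ... + 1/n.
    from fractions import Fraction
    from math import comb

    for n in range(1, 11):
        lhs = sum(Fraction((-1) ** (k + 1) * comb(n, k), k) for k in range(1, n + 1))
        rhs = sum(Fraction(1, j) for j in range(1, n + 1))
        assert lhs == rhs
    print("identity holds exactly for n = 1, ..., 10")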
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:256-257a Template-Type: ReDIF-Article 1.0 Author-Name: Emil M. Friedman Author-X-Name-First: Emil M. Author-X-Name-Last: Friedman Title: Nontransitivity, Correlation, and Causation Journal: The American Statistician Pages: 257-257 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056382 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056382 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:257b-257b Template-Type: ReDIF-Article 1.0 Author-Name: Stavros D. Veresoglou Author-X-Name-First: Stavros D. Author-X-Name-Last: Veresoglou Author-Name: Matthias C. Rillig Author-X-Name-First: Matthias C. Author-X-Name-Last: Rillig Title: Evidence-Based Data Analysis: Protecting the World From Bad Code? Comment by Veresoglou and Rillig Journal: The American Statistician Pages: 257-257 Issue: 3 Volume: 69 Year: 2015 Month: 8 X-DOI: 10.1080/00031305.2015.1056831 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056831 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:257c-257c Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas J. Horton Author-X-Name-First: Nicholas J. Author-X-Name-Last: Horton Author-Name: Johanna S. Hardin Author-X-Name-First: Johanna S. Author-X-Name-Last: Hardin Title: Teaching the Next Generation of Statistics Students to “Think With Data”: Special Issue on Statistics and the Undergraduate Curriculum Journal: The American Statistician Pages: 259-265 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1094283 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1094283 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:259-265 Template-Type: ReDIF-Article 1.0 Author-Name: George Cobb Author-X-Name-First: George Author-X-Name-Last: Cobb Title: Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up Abstract: The last half-dozen years have seen The American Statistician publish well-argued and provocative calls to change our thinking about statistics and how we teach it, among them Brown and Kass, Nolan and Temple Lang, and Legler et al. Within this past year, the ASA has issued a new and comprehensive set of guidelines for undergraduate programs (ASA, Curriculum Guidelines for Undergraduate Programs in Statistical Science). Accepting (and applauding) all this as background, the current article argues the need to rethink our curriculum from the ground up, and offers five principles and two caveats intended to help us along the path toward a new synthesis. These principles and caveats rest on my sense of three parallel evolutions: the convergence of trends in the roles of mathematics, computation, and context within statistics education. These ongoing changes, together with the articles cited above and the seminal provocation by Leo Breiman, call for a deep rethinking of what we teach to undergraduates. In particular, following Brown and Kass, we should put priority on two goals, to make “fundamental concepts accessible” and to “minimize prerequisites to research.”[Received December 2014.
Revised July 2015] Journal: The American Statistician Pages: 266-282 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1093029 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093029 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:266-282 Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas Chamandy Author-X-Name-First: Nicholas Author-X-Name-Last: Chamandy Author-Name: Omkar Muralidharan Author-X-Name-First: Omkar Author-X-Name-Last: Muralidharan Author-Name: Stefan Wager Author-X-Name-First: Stefan Author-X-Name-Last: Wager Title: Teaching Statistics at Google-Scale Abstract: Modern data and applications pose very different challenges from those of the 1950s or even the 1980s. Students contemplating a career in statistics or data science need to have the tools to tackle problems involving massive, heavy-tailed data, often interacting with live, complex systems. However, despite the deepening connections between engineering and modern data science, we argue that training in classical statistical concepts plays a central role in preparing students to solve Google-scale problems. To this end, we present three industrial applications where significant modern data challenges were overcome by statistical thinking.[Received December 2014. Revised August 2015.] Journal: The American Statistician Pages: 283-291 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1089790 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1089790 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:283-291 Template-Type: ReDIF-Article 1.0 Author-Name: Deborah Nolan Author-X-Name-First: Deborah Author-X-Name-Last: Nolan Author-Name: Duncan Temple Lang Author-X-Name-First: Duncan Author-X-Name-Last: Temple Lang Title: Explorations in Statistics Research: An Approach to Expose Undergraduates to Authentic Data Analysis Abstract: The Explorations in Statistics Research workshop is a one-week NSF-funded summer program that introduces undergraduate students to current research problems in applied statistics. The goal of the workshop is to expose students to exciting, modern applied statistical research and practice, with the ultimate aim of interesting them in seeking more training in statistics at the undergraduate and graduate levels. The program is explicitly designed to engage students in the connections between authentic domain problems and the statistical ideas and approaches needed to address these problems, which is an important aspect of statistical thinking that is difficult to teach and sometimes lacking in our methodological courses and programs. Over the past 9 years, we ran the workshop six times and a similar program in the sciences two times. We describe the program, summarize feedback from participants, and identify the key features to its success. We abstract these features and provide a set of recommendations for how faculty can incorporate important elements into their regular courses.[Received December 2014. Revised June 2015.] Journal: The American Statistician Pages: 292-299 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1073624 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1073624 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:292-299 Template-Type: ReDIF-Article 1.0 Author-Name: Byran J. Smucker Author-X-Name-First: Byran J. Author-X-Name-Last: Smucker Author-Name: A. John Bailer Author-X-Name-First: A. John Author-X-Name-Last: Bailer Title: Beyond Normal: Preparing Undergraduates for the Work Force in a Statistical Consulting Capstone Abstract: In this article we chronicle the development of the undergraduate statistical consulting course at Miami University, from canned to client-based projects, and argue that if the course is well designed with suitable mentoring, students can perform remarkably sophisticated analyses of real-world data problems that require solutions beyond the methods encountered in previous classes. We review the historical context in which the consulting class evolved, describe the logistics of implementing it, and review assessment and student reaction to the course. We also illustrate the types of challenging projects the students are confronted with via two case studies and relate the skills learned and reinforced in this consulting class model to the skills demanded in the modern statistical work force. This course also provides an opportunity to strengthen and nurture key points from the new American Statistical Association guidelines for undergraduate programs: namely, communicating analyses of real and complex data that require the application of diverse statistical models and approaches. Supplementary materials for this article are available online.[Received December 2014. Revised July 2015.] Journal: The American Statistician Pages: 300-306 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1077731 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077731 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:300-306 Template-Type: ReDIF-Article 1.0 Author-Name: Scott D. Grimshaw Author-X-Name-First: Scott D. Author-X-Name-Last: Grimshaw Title: A Framework for Infusing Authentic Data Experiences Within Statistics Courses Abstract: Working with complex data is one of the important updates to the 2014 ASA Curriculum Guidelines for Undergraduate Programs in Statistical Science. Infusing “authentic data experiences” within courses allows students opportunities to learn and practice data skills as they prepare a dataset for analysis. While more modest in scope than a senior-level culminating experience, authentic data experiences provide an opportunity to demonstrate connections between data skills and statistical skills. The result is more practice of data skills for undergraduate statisticians.[Received November 2014. Revised July 2015.] Journal: The American Statistician Pages: 307-314 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1081106 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1081106 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:307-314 Template-Type: ReDIF-Article 1.0 Author-Name: Jennifer L. Green Author-X-Name-First: Jennifer L. Author-X-Name-Last: Green Author-Name: Erin E. Blankenship Author-X-Name-First: Erin E. Author-X-Name-Last: Blankenship Title: Fostering Conceptual Understanding in Mathematical Statistics Abstract: In many undergraduate statistics programs, the two-semester calculus-based mathematical statistics sequence is the cornerstone of the curriculum.
However, 10 years after the release of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report in 2005 and the subsequent movement to stress conceptual understanding and foster active learning in statistics classrooms, the sequence still remains a traditional, lecture-intensive course. In this article, we discuss various instructional approaches, activities, and assessments that can be used to foster active learning and emphasize conceptual understanding while still covering the necessary theoretical content students need to be successful in subsequent statistics or actuarial science courses. In addition, we share student reflections on these course enhancements. The course revision we suggest doesn’t require substantial changes in content, so other mathematical statistics instructors can implement these strategies without sacrificing concepts in probability and inference that are fundamental to the needs of their students. Supplementary materials, including code used to generate class plots and activity handouts, are available online.[Received December 2014. Revised June 2015.] Journal: The American Statistician Pages: 315-325 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1069759 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1069759 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:315-325 Template-Type: ReDIF-Article 1.0 Author-Name: Natalie J. Blades Author-X-Name-First: Natalie J. Author-X-Name-Last: Blades Author-Name: G. Bruce Schaalje Author-X-Name-First: G. Bruce Author-X-Name-Last: Schaalje Author-Name: William F. Christensen Author-X-Name-First: William F. Author-X-Name-Last: Christensen Title: The Second Course in Statistics: Design and Analysis of Experiments? Abstract: Statistics departments are facing rapid growth in enrollments and increases in demand for courses. This article discusses the use of design and analysis of experiments (DAE) as a nonterminal second course in statistics for undergraduate statistics majors, minors, and other students seeking exposure to the practice of statistics beyond the introductory course. DAE is a gateway to approaching statistical thinking as data-based problem solving by exposing students to statistical, computational, data, and communication skills in the second course. Given the somewhat antiquated view of design and the deemphasis of classical design of experiments topics in the new ASA curriculum guidelines, DAE may seem an odd choice for the second course; however, it exposes students to the breadth of the statistical problem-solving process, explores foundational issues of the discipline, and is accessible to students who have not yet finished their advanced mathematical training. These skills remain essential in the data science era as students must be equipped to understand the potential and peril of found data using the principles of design. While DAE may not be the appropriate second course for all statistics programs, it provides a strong foundation for causal inference and experimental design for students pursuing a B.S. in Statistics in a program housed in a department of statistics.[Received December 2014. Revised July 2015.]
Journal: The American Statistician Pages: 326-333 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1086437 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086437 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:326-333 Template-Type: ReDIF-Article 1.0 Author-Name: Ben Baumer Author-X-Name-First: Ben Author-X-Name-Last: Baumer Title: A Data Science Course for Undergraduates: Thinking With Data Abstract: Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be nontraditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean datasets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Supplementary materials for this article are available online.[Received June 2014. Revised July 2015.] Journal: The American Statistician Pages: 334-342 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1081105 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1081105 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:334-342 Template-Type: ReDIF-Article 1.0 Author-Name: J. Hardin Author-X-Name-First: J. Author-X-Name-Last: Hardin Author-Name: R. Hoerl Author-X-Name-First: R. Author-X-Name-Last: Hoerl Author-Name: Nicholas J. Horton Author-X-Name-First: Nicholas J. Author-X-Name-Last: Horton Author-Name: D. Nolan Author-X-Name-First: D. Author-X-Name-Last: Nolan Author-Name: B. Baumer Author-X-Name-First: B. Author-X-Name-Last: Baumer Author-Name: O. Hall-Holt Author-X-Name-First: O. Author-X-Name-Last: Hall-Holt Author-Name: P. Murrell Author-X-Name-First: P. Author-X-Name-Last: Murrell Author-Name: R. Peng Author-X-Name-First: R. Author-X-Name-Last: Peng Author-Name: P. Roback Author-X-Name-First: P. Author-X-Name-Last: Roback Author-Name: D. Temple Lang Author-X-Name-First: D. Author-X-Name-Last: Temple Lang Author-Name: M. D. Ward Author-X-Name-First: M. D. Author-X-Name-Last: Ward Title: Data Science in Statistics Curricula: Preparing Students to “Think with Data” Abstract: A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts.
In these positions, they are expected to understand how to use databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this article is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of undergraduates with data and data science.[Received November 2014. Revised July 2015.] Journal: The American Statistician Pages: 343-353 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1077729 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077729 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:343-353 Template-Type: ReDIF-Article 1.0 Author-Name: Shonda Kuiper Author-X-Name-First: Shonda Author-X-Name-Last: Kuiper Author-Name: Rodney X. Sturdivant Author-X-Name-First: Rodney X. Author-X-Name-Last: Sturdivant Title: Using Online Game-Based Simulations to Strengthen Students’ Understanding of Practical Statistical Issues in Real-World Data Analysis Abstract: Datasets provided to students are typically carefully chosen and vetted to illustrate a key statistical topic or method. Rarely are real studies and data so straightforward. In addition, carefully curated datasets that are brought into the statistics classroom may not feel realistic to students. We provide several examples of online activities where students can quickly collect their own local data, have input on the goals of the study and draw their own conclusions. These activities focus on core statistical issues that are often challenging to teach with traditional textbooks, such as working with messy data, bias, data relevance, and reliability. This approach to teaching integrates the challenges of data in a way that encourages students to see how easy it can be to inadvertently draw misleading conclusions. These activities are designed to be highly adaptable and have proven effective in a wide variety of introductory and advanced undergraduate courses.[Received December 2014. Revised July 2015.] Journal: The American Statistician Pages: 354-361 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1075421 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1075421 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:354-361 Template-Type: ReDIF-Article 1.0 Author-Name: Nathan Tintle Author-X-Name-First: Nathan Author-X-Name-Last: Tintle Author-Name: Beth Chance Author-X-Name-First: Beth Author-X-Name-Last: Chance Author-Name: George Cobb Author-X-Name-First: George Author-X-Name-Last: Cobb Author-Name: Soma Roy Author-X-Name-First: Soma Author-X-Name-Last: Roy Author-Name: Todd Swanson Author-X-Name-First: Todd Author-X-Name-Last: Swanson Author-Name: Jill VanderStoep Author-X-Name-First: Jill Author-X-Name-Last: VanderStoep Title: Combating Anti-Statistical Thinking Using Simulation-Based Methods Throughout the Undergraduate Curriculum Abstract: The use of simulation-based methods for introducing inference is growing in popularity for the Stat 101 course, due in part to increasing evidence of the methods’ ability to improve students’ statistical thinking. This impact comes from simulation-based methods (a) clearly presenting the overarching logic of inference, (b) strengthening ties between statistics and probability/mathematical concepts, (c) encouraging a focus on the entire research process, (d) facilitating student thinking about advanced statistical concepts, (e) allowing more time to explore, do, and talk about real research and messy data, and (f) acting as a firmer foundation on which to build statistical intuition. Thus, we argue that simulation-based inference should be an entry point to an undergraduate statistics program for all students, and that simulation-based inference should be used throughout all undergraduate statistics courses. To achieve this goal and fully recognize the benefits of simulation-based inference on the undergraduate statistics program, we will need to break free of historical forces tying undergraduate statistics curricula to mathematics, consider radical and innovative new pedagogical approaches in our courses, fully implement assessment-driven content innovations, and embrace computation throughout the curriculum.[Received December 2014. Revised July 2015] Journal: The American Statistician Pages: 362-370 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1081619 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1081619 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:362-370 Template-Type: ReDIF-Article 1.0 Author-Name: Tim C. Hesterberg Author-X-Name-First: Tim C. Author-X-Name-Last: Hesterberg Title: What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum Abstract: Bootstrapping has enormous potential in statistics education and practice, but there are subtle issues and ways to go wrong. For example, the common combination of nonparametric bootstrapping and bootstrap percentile confidence intervals is less accurate than using t-intervals for small samples, though more accurate for larger samples. My goals in this article are to provide a deeper understanding of bootstrap methods—how they work, when they work or not, and which methods work better—and to highlight pedagogical issues. Supplementary materials for this article are available online.[Received December 2014. Revised August 2015] Journal: The American Statistician Pages: 371-386 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1089789 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1089789 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
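The small-sample contrast in the Hesterberg abstract above can be seen directly. The sketch below (our toy sample, not one from the article) places a nonparametric bootstrap percentile interval next to the classical t interval for a mean; for samples this small the percentile interval typically comes out narrower, which is one ingredient of its weaker small-sample coverage.

    # Bootstrap percentile interval versus t interval for a small, skewed sample.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    x = rng.exponential(scale=2.0, size=15)

    # Nonparametric bootstrap percentile interval for the mean.
    boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                           for _ in range(10_000)])
    perc_lo, perc_hi = np.percentile(boot_means, [2.5, 97.5])

    # Classical t interval.
    m, se = x.mean(), x.std(ddof=1) / np.sqrt(x.size)
    tcrit = stats.t.ppf(0.975, df=x.size - 1)
    t_lo, t_hi = m - tcrit * se, m + tcrit * se

    print((perc_lo, perc_hi), (t_lo, t_hi))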
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:371-386 Template-Type: ReDIF-Article 1.0 Author-Name: Davit Khachatryan Author-X-Name-First: Davit Author-X-Name-Last: Khachatryan Title: Incorporating Statistical Consulting Case Studies in Introductory Time Series Courses Abstract: Established as a rigorous pedagogical device at Harvard University, the case method has grown into an indispensable mode of instruction at many business schools. Its effectiveness has been praised for increasing student participation during in-class discussions, providing hands-on engagement in real-world business problems, and increasing long-term retention rates. This article illustrates how novel case studies that mimic real-life statistical consulting engagements can be incorporated in the curriculum of an undergraduate, introductory time series course. The assessment of learning objectives, as well as the pedagogical implications of teaching with statistical consulting case studies, is elucidated. The article also lays out guidelines for adopting statistical consulting case studies should the readers choose to incorporate the case method into the curricula of courses that they teach. A sample case study that the author has successfully used in his classroom instruction is provided in this article.[Received July 2014. Revised January 2015.] Journal: The American Statistician Pages: 387-396 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1026611 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1026611 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:387-396 Template-Type: ReDIF-Article 1.0 Author-Name: Scotland Leman Author-X-Name-First: Scotland Author-X-Name-Last: Leman Author-Name: Leanna House Author-X-Name-First: Leanna Author-X-Name-Last: House Author-Name: Andrew Hoegh Author-X-Name-First: Andrew Author-X-Name-Last: Hoegh Title: Developing a New Interdisciplinary Computational Analytics Undergraduate Program: A Qualitative-Quantitative-Qualitative Approach Abstract: Statistics departments play a vital role in educating students on the analysis of data for obtaining information and discovering knowledge. In the last several years, we have witnessed an explosion of data, which was not imaginable in years past. As a result, the methods and techniques used for data analysis have evolved. Beyond this, the technology used for storing, porting, and computing big data has also evolved, and so now must traditionally oriented statistics departments. In this article, we discuss the development of a new computational modeling program that meets these demands, and we detail how to balance the qualitative and quantitative components of modern day data analyses for statistical education.[Received December 2014. Revised August 2015.] Journal: The American Statistician Pages: 397-408 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1090337 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1090337 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:397-408 Template-Type: ReDIF-Article 1.0 Author-Name: Beth Chance Author-X-Name-First: Beth Author-X-Name-Last: Chance Author-Name: Roxy Peck Author-X-Name-First: Roxy Author-X-Name-Last: Peck Title: From Curriculum Guidelines to Learning Outcomes: Assessment at the Program Level Abstract: The 2000 ASA Guidelines for Undergraduate Statistics majors aimed to provide guidance to programs with undergraduate degrees in statistics as to the content and skills that statistics majors should be learning. The 2014 Guidelines revise the earlier guidelines to reflect changes in the discipline. As programs strive to adjust their curricula to align with the 2014 Guidelines, it is appropriate to also think about developing an assessment cycle of evaluation. This will enable programs to determine whether students are learning what we want them to learn and to work on continuously improving the program over time. The first step is to translate the broader Guidelines into institution-specific measurable learning outcomes. This article focuses on providing examples of learning outcomes developed by different institutions based on the 2000 Guidelines. The companion article by Moore and Kaplan (this issue) focuses on choosing appropriate assessment methods and rubrics and creating an assessment plan. We hope the examples provided are illustrative and that they will assist programs as they implement the 2014 Guidelines.[Received November 2014. Revised July 2015.] Journal: The American Statistician Pages: 409-416 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1077730 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077730 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:409-416 Template-Type: ReDIF-Article 1.0 Author-Name: Allison Amanda Moore Author-X-Name-First: Allison Amanda Author-X-Name-Last: Moore Author-Name: Jennifer J. Kaplan Author-X-Name-First: Jennifer J. Author-X-Name-Last: Kaplan Title: Program Assessment for an Undergraduate Statistics Major Abstract: Program assessment is used by institutions and/or departments to prompt conversations about the status of student learning and make informed decisions about educational programs. It is also typically required by accreditation agencies, such as the Southern Association of Colleges and Schools (SACS) or the Western Association of Schools & Colleges (WASC). The cyclic assessment process includes four steps: establishing student learning outcomes, deciding on assessment methods, collecting and analyzing data, and reflecting on the results. The theory behind the choice of assessment methods and the use of rubrics in assessment is discussed. A description of the experiences of a Department of Statistics at a large research university during their process of developing an assessment plan for the undergraduate statistics major is provided. The article concludes with the lessons learned by the department as they completed the assessment development process. Supplementary materials for this article are available online.[Received December 2014. Revised July 2015] Journal: The American Statistician Pages: 417-424 Issue: 4 Volume: 69 Year: 2015 Month: 11 X-DOI: 10.1080/00031305.2015.1087331 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1087331 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:417-424 Template-Type: ReDIF-Article 1.0 Author-Name: Xu Xu Author-X-Name-First: Xu Author-X-Name-Last: Xu Author-Name: Peter Z. G. Qian Author-X-Name-First: Peter Z. G. Author-X-Name-Last: Qian Author-Name: Qing Liu Author-X-Name-First: Qing Author-X-Name-Last: Liu Title: Samurai Sudoku-Based Space-Filling Designs for Data Pooling Abstract: Pooling data from multiple sources plays an increasingly vital role in today’s world. By using a popular Sudoku game, we propose a new type of design, called a Samurai Sudoku-based space-filling design to address this issue. Such a design is an orthogonal array-based Latin hypercube design with the following attractive properties: (i) the complete design achieves uniformity in both univariate and bivariate margins; (ii) it can be divided into groups of subdesigns with overlaps such that each subdesign achieves uniformity in both univariate and bivariate margins; and (iii) each of the overlaps achieves uniformity in both univariate and bivariate margins. Examples are given to illustrate the properties of the proposed design, and to demonstrate the advantages of using the proposed design for pooling data from multiple sources.[Received August 2013. Revised July 2015.] Journal: The American Statistician Pages: 1-8 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1114970 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1114970 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:1-8 Template-Type: ReDIF-Article 1.0 Author-Name: Marcio A. Diniz Author-X-Name-First: Marcio A. Author-X-Name-Last: Diniz Author-Name: Jasper De Bock Author-X-Name-First: Jasper Author-X-Name-Last: De Bock Author-Name: Arthur Van Camp Author-X-Name-First: Arthur Author-X-Name-Last: Van Camp Title: Characterizing Dirichlet Priors Abstract: The selection of prior distributions is a problem that has been heavily discussed since Bayes and Price published their article in 1763. Conjugate priors became popular, largely because of their mathematical convenience. In this study, we justify the use of the conjugate combination of a Dirichlet prior and a multinomial likelihood by imposing a fundamental principle that we call partition invariance, alongside other requirements that are well known in the literature.[Received January 2014. Revised July 2015.] Journal: The American Statistician Pages: 9-17 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1100137 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1100137 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:9-17 Template-Type: ReDIF-Article 1.0 Author-Name: Omar A. Kittaneh Author-X-Name-First: Omar A. Author-X-Name-Last: Kittaneh Author-Name: Mohammad A. U. Khan Author-X-Name-First: Mohammad A. U. Author-X-Name-Last: Khan Author-Name: Muhammed Akbar Author-X-Name-First: Muhammed Author-X-Name-Last: Akbar Author-Name: Husam A. Bayoud Author-X-Name-First: Husam A. Author-X-Name-Last: Bayoud Title: Average Entropy: A New Uncertainty Measure with Application to Image Segmentation Abstract: Various modifications have been suggested in the past to extend Shannon entropy to continuous random variables. This article investigates these modifications, and suggests a new entropy measure with the name of average entropy (AE). 
AE is more general than Shannon entropy in the sense that its definition encompasses both continuous and discrete domains. It is additive, positive, and attains zero only when the distribution is uniform. The main characteristic of the suggested measure lies in its consistency behavior. Many properties of AE, including its relationship with the Kullback--Leibler information measure, are studied. Precise theorems about the vanishing of the conditional AE for both continuous and discrete distributions are provided. Toward the end, the measure is tested for its effectiveness in image segmentation.[Received March 2014. Revised June 2015.] Journal: The American Statistician Pages: 18-24 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1089788 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1089788 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:18-24 Template-Type: ReDIF-Article 1.0 Author-Name: Merritt Lyon Author-X-Name-First: Merritt Author-X-Name-Last: Lyon Author-Name: Li C. Cheung Author-X-Name-First: Li C. Author-X-Name-Last: Cheung Author-Name: Joseph L. Gastwirth Author-X-Name-First: Joseph L. Author-X-Name-Last: Gastwirth Title: The Advantages of Using Group Means in Estimating the Lorenz Curve and Gini Index From Grouped Data Abstract: A recent article proposed a histogram-based method for estimating the Lorenz curve and Gini index from grouped data that did not use the group means reported by government agencies. When comparing their method to one based on group means, the authors assume a uniform density in each grouping interval, which leads to an overestimate of the overall average income. After reviewing the additional information in the group means, it will be shown that as the number of groups increases, the bounds on the Gini index obtained from the group means become narrower. This is not necessarily true for the histogram method. Two simple interpolation methods using the group means are described, and the accuracy of the Gini index estimates they yield, along with that of the histogram-based method, is compared to the published Gini index for the 1967--2013 period. The average absolute errors of the estimated Gini index obtained from the two methods using group means are noticeably less than that of the histogram-based method. Supplementary materials for this article are available online.[Received August 2014. Revised September 2015.] Journal: The American Statistician Pages: 25-32 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1105152 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105152 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:25-32 Template-Type: ReDIF-Article 1.0 Author-Name: Eugene Demidenko Author-X-Name-First: Eugene Author-X-Name-Last: Demidenko Title: The p-Value You Can’t Buy Abstract: There is growing frustration with the concept of the p-value. Besides having an ambiguous interpretation, the p-value can be made as small as desired by increasing the sample size, n. The p-value is outdated and does not make sense with big data: Everything becomes statistically significant. The root of the problem with the p-value is in the mean comparison. We argue that statistical uncertainty should be measured on the individual, not the group, level.
Consequently, error bars based on the standard deviation (SD), not the standard error (SE), should be used to graphically present data on two groups. We introduce a new measure based on the discrimination of individuals/objects from two groups, and call it the D-value. The D-value can be viewed as the n-of-1 p-value because it is computed in the same way as p while letting n equal 1. We show how the D-value is related to discrimination probability and the area above the receiver operating characteristic (ROC) curve. The D-value has a clear interpretation as the proportion of patients who get worse after the treatment, and as such facilitates weighing the likelihood of events under different scenarios.[Received January 2015. Revised June 2015.] Journal: The American Statistician Pages: 33-38 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1069760 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1069760 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:33-38 Template-Type: ReDIF-Article 1.0 Author-Name: Joseph J. Lee Author-X-Name-First: Joseph J. Author-X-Name-Last: Lee Author-Name: Donald B. Rubin Author-X-Name-First: Donald B. Author-X-Name-Last: Rubin Title: Evaluating the Validity of Post-Hoc Subgroup Inferences: A Case Study Abstract: In randomized experiments, the random assignment of units to treatment groups justifies many of the widely used traditional analysis methods for evaluating causal effects. Specifying subgroups of units for further examination after observing outcomes, however, may partially nullify any advantages of randomized assignment when data are analyzed naively. Some previous statistical literature has treated all post-hoc subgroup analyses homogeneously as entirely invalid and thus uninterpretable. The extent of the validity of such analyses and the factors that affect the degree of validity remain largely unstudied. Here, we describe a recent pharmaceutical case with First Amendment legal implications, in which post-hoc subgroup analyses played a pivotal and controversial role. Through Monte Carlo simulation, we show that post-hoc results that seem highly significant make dramatic movements toward insignificance after accounting for the subgrouping procedure presumably used. Finally, we propose a novel, randomization-based method that generates valid post-hoc subgroup p-values, provided we know exactly how the subgroups were constructed. If we do not know the exact subgrouping procedure, our method may still place helpful bounds on the significance level of estimated effects. This randomization-based approach allows us to evaluate causal effects in situations where valid evaluations were previously considered impossible.[Received February 2014. Revised April 2015.] Journal: The American Statistician Pages: 39-46 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1093961 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093961 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
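As an illustration of the D-value idea in the Demidenko abstract above, the following R sketch contrasts the usual large-sample p-value with its n-of-1 counterpart on simulated data. The construction (replace standard errors by standard deviations, i.e., set n equal to 1) follows the abstract's description; the normal-theory formula and all variable names are our own assumptions, not code from the article.

set.seed(1)
x <- rnorm(200, mean = 0.0, sd = 1)   # simulated control outcomes
y <- rnorm(200, mean = 0.3, sd = 1)   # simulated treatment outcomes
m1 <- mean(x); s1 <- sd(x); n1 <- length(x)
m2 <- mean(y); s2 <- sd(y); n2 <- length(y)
# Large-sample two-sided p-value for the difference in means:
z <- (m2 - m1) / sqrt(s1^2 / n1 + s2^2 / n2)
p_value <- 2 * pnorm(-abs(z))
# D-value: the same statistic with n1 = n2 = 1; under normality this equals one
# minus the discrimination probability P(Y > X):
d <- (m2 - m1) / sqrt(s1^2 + s2^2)
D_value <- pnorm(-d)
c(p = p_value, D = D_value)

Increasing the simulated sample size drives the p-value toward zero, while the D-value stays near pnorm(-0.3/sqrt(2)), about 0.42, which is the individual-level message of the article.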
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:39-46 Template-Type: ReDIF-Article 1.0 Author-Name: Corwin Matthew Zigler Author-X-Name-First: Corwin Matthew Author-X-Name-Last: Zigler Title: The Central Role of Bayes’ Theorem for Joint Estimation of Causal Effects and Propensity Scores Abstract: Although propensity scores have been central to the estimation of causal effects for over 30 years, only recently has the statistical literature begun to consider in detail methods for Bayesian estimation of propensity scores and causal effects. Underlying this recent body of literature on Bayesian propensity score estimation is an implicit discordance between the goal of the propensity score and the use of Bayes’ theorem. The propensity score condenses multivariate covariate information into a scalar to allow estimation of causal effects without specifying a model for how each covariate relates to the outcome. Avoiding specification of a detailed model for the outcome response surface is valuable for robust estimation of causal effects, but this strategy is at odds with the use of Bayes’ theorem, which presupposes a full probability model for the observed data that adheres to the likelihood principle. The goal of this article is to explicate this fundamental feature of Bayesian estimation of causal effects with propensity scores to provide context for the existing literature and for future work on this important topic.[Received June 2014. Revised September 2015.] Journal: The American Statistician Pages: 47-54 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1111260 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1111260 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:47-54 Template-Type: ReDIF-Article 1.0 Author-Name: Robert Lund Author-X-Name-First: Robert Author-X-Name-Last: Lund Author-Name: Gang Liu Author-X-Name-First: Gang Author-X-Name-Last: Liu Author-Name: Qin Shao Author-X-Name-First: Qin Author-X-Name-Last: Shao Title: A New Approach to ANOVA Methods for Autocorrelated Data Abstract: This article reexamines ANOVA problems for autocorrelated data. Using linear prediction techniques for stationary time series, a new test statistic that assesses a null hypothesis of equal means is proposed and investigated. Our test statistic mimics the classical F-type ratio form used with independent data, but substitutes estimated prediction residuals in for the errors. This simple tactic departs from past studies that adjust the quadratic forms in the numerator and denominator in the F ratio for autocorrelation. One of the advantages is that our statistic retains the classical null hypothesis F distribution (now as a limit) with the customary degrees of freedom. The statistic is shown to perform well in simulations. Asymptotic proofs are given in the case of autoregressive random errors; a sports application is supplied.[Received December 2014. Revised August 2015.] Journal: The American Statistician Pages: 55-62 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1093026 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093026 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:55-62 Template-Type: ReDIF-Article 1.0 Author-Name: Timothy W. Armistead Author-X-Name-First: Timothy W. Author-X-Name-Last: Armistead Title: Misunderstood and Unattributed: Revisiting M. H. 
Doolittle's Measures of Association, With a Note on Bayes’ Theorem Abstract: In the 1880s, American scholars developed measures of association and chance for cross-classification tables that anticipated the more widely known work of Galton, Pearson, Yule, and Fisher. Three of the measures form the historical backdrop for the earliest known use of a joint probability measure that mirrored Bayes’ theorem long before the latter gained general interest among statisticians. The joint probability measure, which served as a foundational step in M. H. Doolittle's development of the first of the two “association ratios,” has not previously been reviewed in the statistical literature. It was reintroduced as if newly developed in a subfield of experimental psychology more than a century after Doolittle's work was published. It has flourished there, but it has not seen use in other academic venues. The article describes its properties and limitations and proposes that it be disseminated and debated beyond its current narrow application. The article notes that Doolittle's first association ratio can be expressed as another joint probability and that prior treatments in the literature are inconsistent with Doolittle's understanding of its purpose. The article also demonstrates that the equivalent of Cohen's kappa (κ) was developed by Doolittle in 1887, as his second association measure.[Received December 2014. Revised August 2015.] Journal: The American Statistician Pages: 63-73 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1086686 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086686 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:63-73 Template-Type: ReDIF-Article 1.0 Author-Name: R. Wayne Oldford Author-X-Name-First: R. Wayne Author-X-Name-Last: Oldford Title: Self-Calibrating Quantile--Quantile Plots Abstract: Quantile--quantile plots, or qqplots, are an important visual tool for many applications but their interpretation requires some care and often more experience. This apparent subjectivity is unnecessary. By drawing on the computational and display facilities now widely available, qqplots are easily enriched to help with their interpretation. An overview of quantile functions and quantile--quantile plots is presented against the backdrop of their early historical development. Strengths and shortcomings of the traditional display are described. A new enhanced qqplot, the self-calibrating qqplot, is introduced and demonstrated on a variety of examples—both synthetic and real. Real examples include normal qqplots, log-normal plots, half-normal plots for factorial experiments, qqplots for x̄ and s in process improvement applications, detection of multivariate outliers, and the comparison of empirical distributions. Self-calibration is achieved by visually incorporating sampling variation in the qqplot display in a variety of ways. The new qqplot is available in the R package qqtest.[Received December 2014. Revised August 2015.] Journal: The American Statistician Pages: 74-90 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1090338 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1090338 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:74-90 Template-Type: ReDIF-Article 1.0 Author-Name: Jocelyn T. Chi Author-X-Name-First: Jocelyn T.
Author-X-Name-Last: Chi Author-Name: Eric C. Chi Author-X-Name-First: Eric C. Author-X-Name-Last: Chi Author-Name: Richard G. Baraniuk Author-X-Name-First: Richard G. Author-X-Name-Last: Baraniuk Title: k-POD: A Method for k-Means Clustering of Missing Data Abstract: The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.[Received November 2014. Revised August 2015.] Journal: The American Statistician Pages: 91-99 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1086685 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086685 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:91-99 Template-Type: ReDIF-Article 1.0 Author-Name: John E. Angus Author-X-Name-First: John E. Author-X-Name-Last: Angus Title: Bootstrapping a Universal Pivot When Nuisance Parameters are Estimated Abstract: In complete samples from a continuous cumulative distribution with unknown parameters, it is known that various pivotal functions can be constructed by appealing to the probability integral transform. A pivotal function (or simply pivot) is a function of the data and parameters that has the property that its distribution is free of any unknown parameters. Pivotal functions play a key role in constructing confidence intervals and hypothesis tests. If there are nuisance parameters in addition to a parameter of interest, and consistent estimators of the nuisance parameters are available, then substituting them into the pivot can preserve the pivot property while altering the pivot distribution, or may instead create a function that is approximately a pivot in the sense that its asymptotic distribution is free of unknown parameters. In this latter case, bootstrapping has been shown to be an effective way of estimating its distribution accurately and constructing confidence intervals that have more accurate coverage probability in finite samples than those based on the asymptotic pivot distribution. In this article, one particular pivotal function based on the probability integral transform is considered when nuisance parameters are estimated, and the estimation of its distribution using parametric bootstrapping is examined. Applications to finding confidence intervals are emphasized. This material should be of interest to instructors of upper division and beginning graduate courses in mathematical statistics who wish to integrate bootstrapping into their lessons on interval estimation and the use of pivotal functions.[Received November 2014. Revised August 2015.] Journal: The American Statistician Pages: 100-107 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1086436 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086436 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
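The k-POD abstract above lends itself to a compact illustration. The R sketch below implements one plausible reading of the idea (alternate between running k-means on a completed matrix and refilling the missing cells with the assigned centroids); it is not the authors' code, and the function name, starting imputation, and iteration count are our own choices.

# A k-POD-style iteration for k-means with missing entries (our sketch):
kpod_sketch <- function(X, k, iters = 25) {
  miss <- is.na(X)
  Xc <- X
  for (j in seq_len(ncol(X)))                      # start from column-mean imputation
    Xc[miss[, j], j] <- mean(X[, j], na.rm = TRUE)
  for (it in seq_len(iters)) {
    fit <- kmeans(Xc, centers = k, nstart = 5)     # cluster the completed matrix
    Xc[miss] <- fit$centers[fit$cluster, ][miss]   # refill only the missing cells
  }
  fit
}
set.seed(2)
X <- rbind(matrix(rnorm(100, 0), 50), matrix(rnorm(100, 3), 50))
X[sample(length(X), 40)] <- NA                     # 20% missing, for illustration
table(kpod_sketch(X, 2)$cluster, rep(1:2, each = 50))

On this two-cluster example the recovered labels typically match the generating groups despite the missing entries.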
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:100-107 Template-Type: ReDIF-Article 1.0 Author-Name: Tal Galili Author-X-Name-First: Tal Author-X-Name-Last: Galili Author-Name: Isaac Meilijson Author-X-Name-First: Isaac Author-X-Name-Last: Meilijson Title: An Example of an Improvable Rao--Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator Abstract: The Rao--Blackwell theorem offers a procedure for converting a crude unbiased estimator of a parameter θ into a “better” one, in fact unique and optimal if the improvement is based on a minimal sufficient statistic that is complete. In contrast, behind every minimal sufficient statistic that is not complete, there is an improvable Rao--Blackwell improvement. This is illustrated via a simple example based on the uniform distribution, in which a rather natural Rao--Blackwell improvement is uniformly improvable. Furthermore, in this example the maximum likelihood estimator is inefficient, and an unbiased generalized Bayes estimator performs exceptionally well. Counterexamples of this sort can be useful didactic tools for explaining the true nature of a methodology and possible consequences when some of the assumptions are violated.[Received December 2014. Revised September 2015.] Journal: The American Statistician Pages: 108-113 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1100683 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1100683 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:108-113 Template-Type: ReDIF-Article 1.0 Author-Name: Iain L. MacDonald Author-X-Name-First: Iain L. Author-X-Name-Last: MacDonald Author-Name: Brendon M. Lapham Author-X-Name-First: Brendon M. Author-X-Name-Last: Lapham Title: Even More Direct Calculation of the Variance of a Maximum Penalized-Likelihood Estimator Abstract: We discuss here two examples of estimation by numerical maximization of penalized likelihood. We show that, in these examples, it is simpler not to use the EM algorithm for computation of the estimates or their standard errors. We discuss also confidence and credibility intervals based on penalized likelihood and a chi-squared approximate distribution, and compare such intervals with intervals of Wald type.[Received July 2014. Revised September 2015.] Journal: The American Statistician Pages: 114-118 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1105151 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105151 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:114-118 Template-Type: ReDIF-Article 1.0 Author-Name: Philip B. Stark Author-X-Name-First: Philip B. Author-X-Name-Last: Stark Title: Privacy, Big Data, and the Public Good: Frameworks for Engagement Journal: The American Statistician Pages: 119-119 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1068625 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1068625 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:119-119 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen M. Stigler Author-X-Name-First: Stephen M. 
Author-X-Name-Last: Stigler Title: Letter to the Editor Journal: The American Statistician Pages: 127-127 Issue: 1 Volume: 70 Year: 2016 Month: 2 X-DOI: 10.1080/00031305.2015.1105758 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105758 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:127-127 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald L. Wasserstein Author-X-Name-First: Ronald L. Author-X-Name-Last: Wasserstein Author-Name: Nicole A. Lazar Author-X-Name-First: Nicole A. Author-X-Name-Last: Lazar Title: The ASA's Statement on p-Values: Context, Process, and Purpose Journal: The American Statistician Pages: 129-133 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2016.1154108 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1154108 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:129-133 Template-Type: ReDIF-Article 1.0 Author-Name: Hossein Hoshyarmanesh Author-X-Name-First: Hossein Author-X-Name-Last: Hoshyarmanesh Author-Name: Amirhossein Karami Author-X-Name-First: Amirhossein Author-X-Name-Last: Karami Author-Name: Adel Mohammadpour Author-X-Name-First: Adel Author-X-Name-Last: Mohammadpour Title: Confidence Intervals for the Scale Parameter of Exponential Family of Distributions Abstract: This article presents a unified approach for computing nonequal tail optimal confidence intervals (CIs) for the scale parameter of the exponential family of distributions. We prove that there exists a pivotal quantity, as a function of a complete sufficient statistic, with a chi-square distribution. Using the similarity between equations of shortest, unbiased, and highest density CIs, all equations are reduced into a system of two equations that can be solved via a straightforward algorithm. Journal: The American Statistician Pages: 134-137 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1123184 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123184 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:134-137 Template-Type: ReDIF-Article 1.0 Author-Name: Pere Grima Author-X-Name-First: Pere Author-X-Name-Last: Grima Author-Name: Lourdes Rodero Author-X-Name-First: Lourdes Author-X-Name-Last: Rodero Author-Name: Xavier Tort-Martorell Author-X-Name-First: Xavier Author-X-Name-Last: Tort-Martorell Title: Explaining the Importance of Variability to Engineering Students Abstract: One of the main challenges of teaching statistics to engineering students is to convey the importance of being conscious of the presence of variability and of taking it into account when making technical and managerial decisions. Often, technical subjects are explained in an ideal and deterministic environment. This article shows the possibilities of simple electrical circuits—the Wheatstone Bridge among them—to explain to students how to characterize variability, how it is transmitted, and how it affects decisions. Additionally, they can be used to introduce the importance of robustness by showing that taking into account the variability of components allows the design of cheaper products with greater benefits than if one were to simply apply formulas that consider variables as exact values. The results are quite unexpected, and they arouse the interest and motivation of students. 
Supplementary materials for this article are available online. Journal: The American Statistician Pages: 138-142 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1064478 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1064478 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:138-142 Template-Type: ReDIF-Article 1.0 Author-Name: Hakan Demirtas Author-X-Name-First: Hakan Author-X-Name-Last: Demirtas Title: A Note on the Relationship Between the Phi Coefficient and the Tetrachoric Correlation Under Nonnormal Underlying Distributions Abstract: The connection between the phi coefficient and the tetrachoric correlation is well-understood when the underlying distribution is bivariate normal. For many other bivariate distributions, the identity that links these two quantities together is not straightforward to formulate. Furthermore, even when this can be done, solving the equation in either direction may be far from trivial. We propose a simple technique that enables students and researchers to compute one of these correlations when the other is specified. Generalizing the normal-based results to a broad range of bivariate distributional setups is potentially useful in graduate-level teaching as well as in simulation studies that involve dichotomization and random number generation where the relationships between these correlation types need to be modeled. Journal: The American Statistician Pages: 143-148 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1077161 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077161 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:143-148 Template-Type: ReDIF-Article 1.0 Author-Name: Henry S. Lynn Author-X-Name-First: Henry S. Author-X-Name-Last: Lynn Title: Training the Next Generation of Statisticians: From Head to Heart Abstract: A holistic view of training is advocated where educators focus not only on the competence but also on the character of future statisticians. The issues related to developing passion, formulating philosophy, and building moral personhood are discussed. The vision is to foster a generation of statisticians who are both well-equipped problem-solvers in specific scientific areas and compassionate reformers to the general society. Journal: The American Statistician Pages: 149-151 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1123186 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123186 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:149-151 Template-Type: ReDIF-Article 1.0 Author-Name: Michael D. Porter Author-X-Name-First: Michael D. Author-X-Name-Last: Porter Title: A Statistical Approach to Crime Linkage Abstract: The object of this article is to develop a statistical approach to criminal linkage analysis that discovers and groups crime events that share a common offender and prioritizes suspects for further investigation. Bayes factors are used to describe the strength of evidence that two crimes are linked. Using concepts from agglomerative hierarchical clustering, the Bayes factors for crime pairs are combined to provide similarity measures for comparing two crime series. This facilitates crime series clustering, crime series identification, and suspect prioritization. 
The ability of our models to make correct linkages and predictions is demonstrated under a variety of real-world scenarios with a large number of solved and unsolved breaking and entering crimes. For example, a naive Bayes model for pairwise case linkage can identify 82% of actual linkages with a 5% false positive rate. For crime series identification, 74%--89% of the additional crimes in a crime series can be identified from a ranked list of 50 incidents. Journal: The American Statistician Pages: 152-165 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1123185 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123185 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:152-165 Template-Type: ReDIF-Article 1.0 Author-Name: Martin L. Lesser Author-X-Name-First: Martin L. Author-X-Name-Last: Lesser Author-Name: Meredith B. Akerman Author-X-Name-First: Meredith B. Author-X-Name-Last: Akerman Author-Name: Nina Kohn Author-X-Name-First: Nina Author-X-Name-Last: Kohn Title: Analogies for Helping Clinicians and Investigators Better Understand the Principles and Practice of Biostatistics Abstract: For the interaction between the biostatistician and the clinician or research investigator to be successful, it is important not only for the investigator to be able to explain biological and medical principles in a way that can be understood by the biostatistician, but also for the biostatistician to have tools that help the investigator understand both the practice of statistics and specific statistical methods. In our practice, we have found it useful to draw analogies between statistical concepts and familiar medical or everyday ideas. These analogies help to stress a point or provide an understanding on the part of the investigator. For example, explaining the reason for using a nonparametric procedure (a general procedure used when the underlying distribution of the data is not known or cannot be assumed) by comparing it to using broad spectrum antibiotics (a general antibiotic used when the specific bacteria causing infection is unknown or cannot be assumed) can be an effective teaching tool. We present a variety of useful (and hopefully amusing) analogies that can be adopted by statisticians to help investigators at all levels of experience better understand the principles and practice of statistics. Journal: The American Statistician Pages: 166-170 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1073625 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1073625 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:166-170 Template-Type: ReDIF-Article 1.0 Author-Name: Eloísa Díaz-Francés Author-X-Name-First: Eloísa Author-X-Name-Last: Díaz-Francés Title: Simple Estimation Intervals for Poisson, Exponential, and Inverse Gaussian Means Obtained by Symmetrizing the Likelihood Function Abstract: Likelihood intervals for the Poisson, exponential, and inverse Gaussian means that have simple analytically closed expressions and good coverage frequencies for any sample size are given here explicitly. Their simplicity is striking and they should be more broadly used in applications everywhere.
Their soundness is due to three statistical properties that these three distributions share as well as the fact that for all of them there exists a simple power reparameterization that symmetrizes the corresponding likelihood function. As a consequence, asymptotic maximum likelihood results are applicable even for samples of size one. Likelihood intervals of the new parameter may be easily transformed back to the original parameter of interest, the mean, by the invariance property of the likelihood function. Practical examples are given to illustrate the proposed inferential procedures. Journal: The American Statistician Pages: 171-180 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1123187 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123187 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:171-180 Template-Type: ReDIF-Article 1.0 Author-Name: Thaddeus Tarpey Author-X-Name-First: Thaddeus Author-X-Name-Last: Tarpey Author-Name: R. Todd Ogden Author-X-Name-First: R. Todd Author-X-Name-Last: Ogden Title: Statistical Modeling to Inform Optimal Game Strategy: Markov Plays H-O-R-S-E Abstract: We illustrate practical uses of logistic regression and Markov chains by applying these concepts to the problem of developing optimal strategy in the popular basketball game of H-O-R-S-E. Based on data collected by the authors, we estimate model parameters for each author, describe strategies of optimizing each author’s probability of winning, and calculate the stationary distribution of a Markov chain that arises from the game. Journal: The American Statistician Pages: 181-186 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2016.1148629 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148629 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:181-186 Template-Type: ReDIF-Article 1.0 Author-Name: Susan M. Perkins Author-X-Name-First: Susan M. Author-X-Name-Last: Perkins Author-Name: Peter Bacchetti Author-X-Name-First: Peter Author-X-Name-Last: Bacchetti Author-Name: Cynthia S. Davey Author-X-Name-First: Cynthia S. Author-X-Name-Last: Davey Author-Name: Christopher J. Lindsell Author-X-Name-First: Christopher J. Author-X-Name-Last: Lindsell Author-Name: Madhu Mazumdar Author-X-Name-First: Madhu Author-X-Name-Last: Mazumdar Author-Name: Robert A. Oster Author-X-Name-First: Robert A. Author-X-Name-Last: Oster Author-Name: Peter N. Peduzzi Author-X-Name-First: Peter N. Author-X-Name-Last: Peduzzi Author-Name: David M. Rocke Author-X-Name-First: David M. Author-X-Name-Last: Rocke Author-Name: Kyle D. Rudser Author-X-Name-First: Kyle D. Author-X-Name-Last: Rudser Author-Name: Mimi Kim Author-X-Name-First: Mimi Author-X-Name-Last: Kim Title: Best Practices for Biostatistical Consultation and Collaboration in Academic Health Centers Abstract: Given the increasing level and scope of biostatistics expertise needed at academic health centers today, we developed best practices guidelines for biostatistics units to be more effective in providing biostatistical support to their institutions, and in fostering an environment in which unit members can thrive professionally. Our recommendations focus on the key areas of: (1) funding sources and mechanisms; (2) providing and prioritizing access to biostatistical resources; and (3) interacting with investigators. 
We recommend that the leadership of biostatistics units negotiate for sufficient long-term infrastructure support to ensure stability and continuity of funding for personnel, align project budgets closely with the actual level of biostatistical effort, devise and consistently apply strategies for prioritizing and tracking effort on studies, and clearly stipulate with investigators, prior to project initiation, policies regarding funding, lead time, and authorship. Journal: The American Statistician Pages: 187-194 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1077727 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077727 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:187-194 Template-Type: ReDIF-Article 1.0 Author-Name: Min Wang Author-X-Name-First: Min Author-X-Name-Last: Wang Author-Name: Guangying Liu Author-X-Name-First: Guangying Author-X-Name-Last: Liu Title: A Simple Two-Sample Bayesian t-Test for Hypothesis Testing Abstract: In this article, we propose an explicit closed-form Bayes factor for the problem of two-sample hypothesis testing. The proposed approach can be regarded as a Bayesian version of the pooled-variance t-statistic and has various appealing properties in practical applications. It relies on data only through the t-statistic and can thus be calculated by using an Excel spreadsheet or a pocket calculator. It avoids several undesirable paradoxes, which may be encountered by previous Bayesian approaches in the literature. Moreover, the proposed approach can easily be taught in an introductory statistics course with an emphasis on Bayesian thinking. Simulated and real data examples are provided for illustrative purposes. Journal: The American Statistician Pages: 195-201 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1093027 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093027 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:195-201 Template-Type: ReDIF-Article 1.0 Author-Name: Adam Loy Author-X-Name-First: Adam Author-X-Name-Last: Loy Author-Name: Lendie Follett Author-X-Name-First: Lendie Author-X-Name-Last: Follett Author-Name: Heike Hofmann Author-X-Name-First: Heike Author-X-Name-Last: Hofmann Title: Variations of Q--Q Plots: The Power of Our Eyes! Abstract: In statistical modeling, we strive to specify models that resemble data collected in studies or observed from processes. Consequently, distributional specification and parameter estimation are central to parametric models. Graphical procedures, such as the quantile--quantile (Q--Q) plot, are arguably the most widely used method of distributional assessment, though critics find their interpretation to be overly subjective. Formal goodness of fit tests are available and are quite powerful, but only indicate whether there is a lack of fit, not why there is lack of fit. In this article, we explore the use of the lineup protocol to inject rigor into graphical distributional assessment and compare its power to that of formal distributional tests. We find that lineup tests are considerably more powerful than traditional tests of normality. A further investigation into the design of Q--Q plots shows that de-trended Q--Q plots are more powerful than the standard approach as long as the plot preserves the same distance scale in x and y.
While we focus on diagnosing nonnormality, our approach is general and can be directly extended to the assessment of other distributions. Journal: The American Statistician Pages: 202-214 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1077728 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077728 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:202-214 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher S. Pentoney Author-X-Name-First: Christopher S. Author-X-Name-Last: Pentoney Author-Name: Dale E. Berger Author-X-Name-First: Dale E. Author-X-Name-Last: Berger Title: Confidence Intervals and the Within-the-Bar Bias Abstract: Bar graphs displaying means have been shown to bias interpretations of the underlying distributions: viewers typically report higher likelihoods for values within a bar than outside of a bar. One explanation is that viewer attention is driven by the whole bar, rather than only the edge that provides information about an average. This study explored several approaches to correcting this bias. Bar graphs with 95% confidence intervals were used with different levels of contrast to manipulate attention directed to the bar. Viewers showed less bias when the salience of the bar itself was reduced. Response latencies were lowest and bias was eliminated when participants were presented with only a confidence interval and no bar. Journal: The American Statistician Pages: 215-220 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2016.1141706 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141706 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:215-220 Template-Type: ReDIF-Article 1.0 Author-Name: Saralees Nadarajah Author-X-Name-First: Saralees Author-X-Name-Last: Nadarajah Title: Letter to the Editor Journal: The American Statistician Pages: 224-224 Issue: 2 Volume: 70 Year: 2016 Month: 5 X-DOI: 10.1080/00031305.2015.1086438 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086438 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:224-224 Template-Type: ReDIF-Article 1.0 Author-Name: Bartolomeo Stellato Author-X-Name-First: Bartolomeo Author-X-Name-Last: Stellato Author-Name: Bart P. G. Van Parys Author-X-Name-First: Bart P. G. Author-X-Name-Last: Van Parys Author-Name: Paul J. Goulart Author-X-Name-First: Paul J. Author-X-Name-Last: Goulart Title: Multivariate Chebyshev Inequality With Estimated Mean and Variance Abstract: A variant of the well-known Chebyshev inequality for scalar random variables can be formulated in the case where the mean and variance are estimated from samples. In this article, we present a generalization of this result to multiple dimensions where the only requirement is that the samples are independent and identically distributed. Furthermore, we show that as the number of samples tends to infinity our inequality converges to the theoretical multi-dimensional Chebyshev bound. Journal: The American Statistician Pages: 123-127 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1186559 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1186559 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
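For the Stellato, Van Parys, and Goulart record above, a small Monte Carlo check of the limiting bound they mention may help fix ideas. We assume the standard known-moments multivariate Chebyshev inequality, P((X − μ)′Σ⁻¹(X − μ) ≥ t²) ≤ d/t² for a d-dimensional random vector; the article's empirical inequality tightens toward this bound as the sample grows.

# Monte Carlo check of the classical multivariate Chebyshev bound (our sketch):
set.seed(3)
d <- 3; thr <- 3; n <- 1e5
X <- matrix(rexp(n * d) - 1, n, d)  # skewed coordinates: mean 0, covariance I
md2 <- rowSums(X^2)                 # squared Mahalanobis distance (Sigma = I)
c(empirical = mean(md2 >= thr^2), chebyshev_bound = d / thr^2)

The empirical tail probability sits well below the bound d/t² = 1/3, as Chebyshev-type inequalities are typically conservative.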
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:123-127 Template-Type: ReDIF-Article 1.0 Author-Name: Robert A. Stine Author-X-Name-First: Robert A. Author-X-Name-Last: Stine Title: Explaining Normal Quantile-Quantile Plots Through Animation: The Water-Filling Analogy Abstract: A normal quantile-quantile (QQ) plot is an important diagnostic for checking the assumption of normality. Though useful, these plots confuse students in my introductory statistics classes. A water-filling analogy, however, intuitively conveys the underlying concept. This analogy characterizes a QQ plot as a parametric plot of the water levels in two gradually filling vases. Each vase takes its shape from a probability distribution or sample. If the vases share a common shape, then the water levels match throughout the filling, and the QQ plot traces a diagonal line. An R package qqvases provides an interactive animation of this process and is suitable for classroom use. Journal: The American Statistician Pages: 145-147 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1200488 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200488 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:145-147 Template-Type: ReDIF-Article 1.0 Author-Name: Hillel Bar-Gera Author-X-Name-First: Hillel Author-X-Name-Last: Bar-Gera Title: The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments Abstract: R-squared (R²) and adjusted R-squared (R²Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators for the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition includes also the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²*. The proposed ρ²* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²*, while the traditional R²Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²Adj to ρ²* is demonstrated. The effects of model parameters on the bias of R² and R²Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated. Journal: The American Statistician Pages: 112-119 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1200489 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200489 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
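Stine's water-filling analogy above suggests a direct way to draw a QQ plot as a parametric curve, and the short base-R sketch below does exactly that; it is our own toy example, separate from the qqvases package the abstract mentions.

# A QQ plot as matched "water levels": as the filling proportion p runs from
# 0 to 1, plot the sample level quantile(x, p) against the normal level qnorm(p).
set.seed(4)
x <- rexp(200)          # a skewed sample: its "vase" differs from the normal one
p <- ppoints(200)       # common filling proportions
plot(qnorm(p), quantile(x, p),
     xlab = "normal water level, qnorm(p)",
     ylab = "sample water level, quantile(x, p)")
abline(lm(quantile(x, c(0.25, 0.75)) ~ qnorm(c(0.25, 0.75))))  # line through the quartiles

Matching vases would trace the straight reference line; here the skewness bends the curve away from it.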
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:112-119 Template-Type: ReDIF-Article 1.0 Author-Name: Yudi Pawitan Author-X-Name-First: Yudi Author-X-Name-Last: Pawitan Author-Name: Youngjo Lee Author-X-Name-First: Youngjo Author-X-Name-Last: Lee Title: Wallet Game: Probability, Likelihood, and Extended Likelihood Abstract: We propose a likelihood explanation to the two-person wallet game, a probability-related paradox, where an obviously fair game may appear favorable to both players. Yet a small variation of the game, without changing its fairness, makes it seem unfavorable. The extended likelihood concept seems logically necessary if we want to allow the sense of uncertainty associated with a realized but still unobserved random outcome, while at the same time avoiding potential probability-related paradoxes. Journal: The American Statistician Pages: 120-122 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1202140 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1202140 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:120-122 Template-Type: ReDIF-Article 1.0 Author-Name: Ryan Martin Author-X-Name-First: Ryan Author-X-Name-Last: Martin Title: A Statistical Inference Course Based on p-Values Abstract: Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 128-136 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1208629 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1208629 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:128-136 Template-Type: ReDIF-Article 1.0 Author-Name: Shaobo Jin Author-X-Name-First: Shaobo Author-X-Name-Last: Jin Author-Name: Måns Thulin Author-X-Name-First: Måns Author-X-Name-Last: Thulin Author-Name: Rolf Larsson Author-X-Name-First: Rolf Author-X-Name-Last: Larsson Title: Approximate Bayesianity of Frequentist Confidence Intervals for a Binomial Proportion Abstract: The well-known Wilson and Agresti–Coull confidence intervals for a binomial proportion p are centered around a Bayesian estimator. Using this as a starting point, similarities between frequentist confidence intervals for proportions and Bayesian credible intervals based on low-informative priors are studied using asymptotic expansions. A Bayesian motivation for a large class of frequentist confidence intervals is provided. It is shown that the likelihood ratio interval for p approximates a Bayesian credible interval based on Kerman’s neutral noninformative conjugate prior up to O(n⁻¹) in the confidence bounds. For the significance level α ≲ 0.317, the Bayesian interval based on the Jeffreys prior is then shown to be a compromise between the likelihood ratio and Wilson intervals. Supplementary materials for this article are available online.
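The frequentist–Bayesian agreement described in the Jin, Thulin, and Larsson abstract above is easy to see numerically. The sketch below computes the Wilson score interval and the Jeffreys Beta(1/2, 1/2) credible interval using the standard textbook formulas; the helper function is our own, not the article's supplementary code.

# Wilson score interval vs. Jeffreys-prior credible interval for a proportion:
binom_intervals <- function(x, n, level = 0.95) {
  a <- 1 - level; z <- qnorm(1 - a / 2)
  ph <- x / n
  mid <- (ph + z^2 / (2 * n)) / (1 + z^2 / n)                      # Wilson center
  hw  <- z * sqrt(ph * (1 - ph) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  rbind(wilson   = c(mid - hw, mid + hw),
        jeffreys = qbeta(c(a / 2, 1 - a / 2), x + 0.5, n - x + 0.5))
}
binom_intervals(x = 17, n = 50)

For x = 17 successes in n = 50 trials the two 95% intervals nearly coincide, which is the kind of approximate Bayesianity the article quantifies with asymptotic expansions.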
Journal: The American Statistician Pages: 106-111 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1208630 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1208630 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:106-111 Template-Type: ReDIF-Article 1.0 Author-Name: Zhi-Sheng Ye Author-X-Name-First: Zhi-Sheng Author-X-Name-Last: Ye Author-Name: Nan Chen Author-X-Name-First: Nan Author-X-Name-Last: Chen Title: Closed-Form Estimators for the Gamma Distribution Derived From Likelihood Equations Abstract: It is well-known that maximum likelihood (ML) estimators of the two parameters in a gamma distribution do not have closed forms. This poses difficulties in some applications such as real-time signal processing using low-grade processors. The gamma distribution is a special case of a generalized gamma distribution. Surprisingly, two out of the three likelihood equations of the generalized gamma distribution can be used as estimating equations for the gamma distribution, based on which simple closed-form estimators for the two gamma parameters are available. Intuitively, performance of the new estimators based on likelihood equations should be close to that of the ML estimators. The study confirms this conjecture by establishing the asymptotic behavior of the new estimators. In addition, the closed forms enable bias corrections to these estimators. The bias correction significantly improves small-sample performance. Journal: The American Statistician Pages: 177-181 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1209129 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1209129 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:177-181 Template-Type: ReDIF-Article 1.0 Author-Name: Brian Knaeble Author-X-Name-First: Brian Author-X-Name-Last: Knaeble Author-Name: Seth Dutter Author-X-Name-First: Seth Author-X-Name-Last: Dutter Title: Reversals of Least-Square Estimates and Model-Invariant Estimation for Directions of Unique Effects Abstract: When a linear model is adjusted to control for additional explanatory variables, the sign of a fitted coefficient may reverse. Here, these reversals are studied using coefficients of determination. The resulting theory can be used to determine directions of unique effects in the presence of model uncertainty. This process is called model-invariant estimation when the estimates are invariant across changes to the model structure. When a single covariate is added, the reversal region can be understood geometrically as an elliptical cone of two nappes with an axis of symmetry relating to a best-possible condition for a reversal using a single coefficient of determination. When a set of covariates is added to a model with a single explanatory variable, model-invariant estimation can be implemented using subject matter knowledge. More general theory with partial coefficients is applicable to the analysis of large datasets. Applications are demonstrated with dietary health data from the United Nations. Journal: The American Statistician Pages: 97-105 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1226951 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1226951 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
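The Ye and Chen abstract above does not reproduce the formulas, but closed-form estimators derived from the generalized-gamma likelihood equations can be written compactly. The sketch below is our reconstruction (the population identity Cov(X, log X) = scale underlies the scale estimator), so readers should verify it against the article before relying on it.

# Closed-form estimating-equation estimators for the gamma parameters (our sketch):
gamma_closed_form <- function(x) {
  n <- length(x)
  s <- n * sum(x * log(x)) - sum(log(x)) * sum(x)  # n^2 * sample Cov(X, log X)
  theta <- s / n^2                                 # scale estimate
  k     <- n * sum(x) / s                          # shape estimate (mean = k * theta)
  c(shape = k, scale = theta)
}
set.seed(5)
x <- rgamma(500, shape = 2, scale = 3)
gamma_closed_form(x)

On simulated gamma(shape = 2, scale = 3) data the estimates land close to the truth, consistent with the abstract's claim that the estimating-equation estimators behave much like the MLEs.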
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:97-105 Template-Type: ReDIF-Article 1.0 Author-Name: Jo A. Wick Author-X-Name-First: Jo A. Author-X-Name-Last: Wick Author-Name: Hung-Wen Yeh Author-X-Name-First: Hung-Wen Author-X-Name-Last: Yeh Author-Name: Byron J. Gajewski Author-X-Name-First: Byron J. Author-X-Name-Last: Gajewski Title: A Bayesian Analysis of Synchronous Distance Learning versus Matched Traditional Control in Graduate Biostatistics Courses Abstract: Distance learning can be useful for bridging geographical barriers to education in rural settings. However, empirical evidence on the equivalence of distance education and traditional face-to-face (F2F) instruction in statistics and biostatistics is mixed. Despite the difficulty in randomization, we minimized intra-instructor variation between F2F and online sections in seven graduate-level biostatistics service courses in a synchronous (live, real-time) fashion; that is, for each course taught in a traditional F2F setting, a separate set of students was taught simultaneously via online learning technology, allowing for two-way interaction between instructor and students. Our primary objective was to compare student performance under these two teaching modes. We used a Bayesian hierarchical model to test equivalence of modes. The frequentist mixed model approach was also conducted for reference. The results of Bayesian and frequentist methods agree and suggest a difference of less than 1% in average final grades. Finally, we discuss barriers to instruction and learning using the applied online teaching technology. Journal: The American Statistician Pages: 137-144 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1247014 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1247014 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:137-144 Template-Type: ReDIF-Article 1.0 Author-Name: Philippa Swartz Author-X-Name-First: Philippa Author-X-Name-Last: Swartz Author-Name: Mike Grosskopf Author-X-Name-First: Mike Author-X-Name-Last: Grosskopf Author-Name: Derek Bingham Author-X-Name-First: Derek Author-X-Name-Last: Bingham Author-Name: Tim B. Swartz Author-X-Name-First: Tim B. Author-X-Name-Last: Swartz Title: The Quality of Pitches in Major League Baseball Abstract: This article considers the quality of pitches in Major League Baseball (MLB). Based on approximately 2.2 million pitches taken from the 2013, 2014, and 2015 MLB seasons, the quality of a particular pitch is evaluated as the expected number of bases conceded. Quality is expressed as a function of various covariates including pitch count, pitch location, pitch type, and pitch speed. The estimation of the pitch quality is obtained through the use of random forest methodology to accommodate the inherent complexity of the relationship between pitch quality and the associated covariates. With the fitted model, various applications are considered which provide new insights on pitching and batting. Journal: The American Statistician Pages: 148-154 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1264313 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264313 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
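To make the Swartz et al. modeling strategy above concrete, here is a minimal random forest sketch on simulated pitches; the covariate names, the toy response, and the use of the randomForest package are all our own assumptions, standing in for the authors' 2.2 million-pitch dataset.

# Predict expected bases conceded from pitch covariates with a random forest:
library(randomForest)
set.seed(6)
n <- 2000
pitches <- data.frame(
  balls   = sample(0:3, n, TRUE),
  strikes = sample(0:2, n, TRUE),
  speed   = rnorm(n, 90, 5),
  locx    = rnorm(n),
  locz    = rnorm(n, 2.5, 0.5),
  type    = factor(sample(c("FF", "SL", "CH"), n, TRUE)))
# toy response: more bases conceded for middle-of-zone, slower pitches
pitches$bases <- pmax(0, 0.5 - 0.2 * abs(pitches$locx) -
                         0.01 * (pitches$speed - 90) + rnorm(n, 0, 0.3))
fit <- randomForest(bases ~ balls + strikes + speed + locx + locz + type,
                    data = pitches, ntree = 200)
varImpPlot(fit)   # which covariates drive predicted pitch quality?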
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:148-154 Template-Type: ReDIF-Article 1.0 Author-Name: Olanrewaju Akande Author-X-Name-First: Olanrewaju Author-X-Name-Last: Akande Author-Name: Fan Li Author-X-Name-First: Fan Author-X-Name-Last: Li Author-Name: Jerome Reiter Author-X-Name-First: Jerome Author-X-Name-Last: Reiter Title: An Empirical Comparison of Multiple Imputation Methods for Categorical Data Abstract: Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online. Journal: The American Statistician Pages: 162-170 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1277158 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277158 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:162-170 Template-Type: ReDIF-Article 1.0 Author-Name: Amy L. Phelps Author-X-Name-First: Amy L. Author-X-Name-Last: Phelps Author-Name: Kathryn A. Szabat Author-X-Name-First: Kathryn A. Author-X-Name-Last: Szabat Title: The Current Landscape of Teaching Analytics to Business Students at Institutions of Higher Education: Who is Teaching What? Abstract: Business analytics continues to become increasingly important in business and therefore in business education. We surveyed faculty who teach statistics or whose institutions offer statistics to business students, and conducted web searches of business analytics and data science programs offered by faculties associated with schools of business. The intent of the survey and web searches was to gain insight into the current landscape of business analytics and how it may work synergistically with data science at institutions of higher education, as well as to inform the role that statistics education plays in the era of big data. The study presents an analysis of subject areas (Statistics, Operations Research, Management Information Systems, Data Analytics, and Soft Skills) covered in courses offered by institutions with undergraduate degrees in business analytics or data science, and of how these programs influence the statistics taught to business students.
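Returning briefly to the Akande, Li, and Reiter record above: one of the default engines they compare imputes each incomplete categorical variable with a classification tree inside a chained-equations loop. Below is a deliberately minimal single-variable, single-cycle sketch of that idea on synthetic data; it is not the sequential-imputation software the authors benchmark, and all names and data are mine.

    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 1_000
    df = pd.DataFrame({
        "a": rng.integers(0, 3, n),
        "b": rng.integers(0, 2, n),
    })
    df["c"] = ((df["a"] + df["b"] + rng.integers(0, 2, n)) % 3).astype(float)
    df.loc[rng.random(n) < 0.2, "c"] = np.nan  # impose missingness on one column

    # one chained-equations step: model the incomplete column from the complete
    # ones, then impute by drawing from the tree's leaf-level class probabilities
    obs = df["c"].notna()
    tree = DecisionTreeClassifier(min_samples_leaf=25, random_state=0)
    tree.fit(df.loc[obs, ["a", "b"]], df.loc[obs, "c"].astype(int))
    probs = tree.predict_proba(df.loc[~obs, ["a", "b"]])
    df.loc[~obs, "c"] = [rng.choice(tree.classes_, p=p) for p in probs]
    # this yields one completed dataset; repeat M times for multiple imputation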
Given the notable contribution of statistics to the study of business analytics and data science and the importance of knowledge and skills acquired in statistics-based courses not only for students pursuing a major or minor in the discipline, but also for all business majors entering the current data-centric business environment, we present findings about who is teaching what in business statistics education. Journal: The American Statistician Pages: 155-161 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2016.1277160 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277160 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:155-161 Template-Type: ReDIF-Article 1.0 Author-Name: Michael P. Cohen Author-X-Name-First: Michael P. Author-X-Name-Last: Cohen Title: Non-Asymptotic Mean and Variance Also Approximately Satisfy Taylor's Law Journal: The American Statistician Pages: 187-187 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2017.1286261 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1286261 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:187-187 Template-Type: ReDIF-Article 1.0 Author-Name: Iain L. MacDonald Author-X-Name-First: Iain L. Author-X-Name-Last: MacDonald Title: Models for count data Journal: The American Statistician Pages: 187-190 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2017.1291449 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1291449 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:187-190 Template-Type: ReDIF-Article 1.0 Author-Name: Stuart R. Lipsitz Author-X-Name-First: Stuart R. Author-X-Name-Last: Lipsitz Author-Name: Garrett M. Fitzmaurice Author-X-Name-First: Garrett M. Author-X-Name-Last: Fitzmaurice Author-Name: Debajyoti Sinha Author-X-Name-First: Debajyoti Author-X-Name-Last: Sinha Author-Name: Nathanael Hevelone Author-X-Name-First: Nathanael Author-X-Name-Last: Hevelone Author-Name: Edward Giovannucci Author-X-Name-First: Edward Author-X-Name-Last: Giovannucci Author-Name: Quoc-Dien Trinh Author-X-Name-First: Quoc-Dien Author-X-Name-Last: Trinh Author-Name: Jim C. Hu Author-X-Name-First: Jim C. Author-X-Name-Last: Hu Title: Efficient Computation of Reduced Regression Models Abstract: We consider settings where it is of interest to fit and assess regression submodels that arise as various explanatory variables are excluded from a larger regression model. The larger model is referred to as the full model; the submodels are the reduced models. We show that a computationally efficient approximation to the regression estimates under any reduced model can be obtained from a simple weighted least squares (WLS) approach based on the estimated regression parameters and covariance matrix from the full model. This WLS approach can be considered an extension to unbiased estimating equations of a first-order Taylor series approach proposed by Lawless and Singhal. 
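Before the NIS illustration of the Lipsitz et al. record continues below, the WLS device just described can be written down compactly: if beta_hat and V_hat are the estimates and covariance matrix from the full model, and the reduced model retains a subset of coefficients (constraining beta = A gamma for a selection matrix A), then generalized least squares applied to the estimates themselves gives gamma_hat = (A' V^-1 A)^-1 A' V^-1 beta_hat. The sketch below is my reading of that description, not the authors' code.

    import numpy as np

    def reduced_fit(beta_full, V_full, keep):
        """Approximate reduced-model estimates from a fitted full model.

        beta_full : (p,) coefficient estimates from the full model
        V_full    : (p, p) their estimated covariance matrix
        keep      : boolean mask of coefficients retained in the reduced model

        Solves min_gamma (beta - A gamma)' V^{-1} (beta - A gamma), with A
        selecting the retained columns, as a sketch of the WLS idea above.
        """
        A = np.eye(len(beta_full))[:, keep]
        Vinv = np.linalg.inv(V_full)
        M = np.linalg.inv(A.T @ Vinv @ A)
        gamma = M @ (A.T @ Vinv @ beta_full)
        return gamma, M  # estimates and their approximate covariance

Because only matrix algebra on the already-fitted full model is involved, each reduced model costs essentially nothing, which is the source of the hours-to-seconds speedup reported below.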
Using data from the 2010 Nationwide Inpatient Sample (NIS), a 20% weighted, stratified, cluster sample of approximately 8 million hospital stays from approximately 1000 hospitals, we illustrate the WLS approach when fitting interval censored regression models to estimate the effect of type of surgery (robotic versus nonrobotic surgery) on hospital length-of-stay while adjusting for three sets of covariates: patient-level characteristics, hospital characteristics, and zip-code level characteristics. Ordinarily, standard fitting of the reduced models to the NIS data takes approximately 10 hours; using the proposed WLS approach, the reduced models take seconds to fit. Journal: The American Statistician Pages: 171-176 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2017.1296375 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1296375 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:171-176 Template-Type: ReDIF-Article 1.0 Author-Name: Kimberly F. Sellers Author-X-Name-First: Kimberly F. Author-X-Name-Last: Sellers Author-Name: Darcy S. Morris Author-X-Name-First: Darcy S. Author-X-Name-Last: Morris Author-Name: Galit Shmueli Author-X-Name-First: Galit Author-X-Name-Last: Shmueli Author-Name: Li Zhu Author-X-Name-First: Li Author-X-Name-Last: Zhu Title: Reply Journal: The American Statistician Pages: 190-190 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2017.1296738 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1296738 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:190-190 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 182-186 Issue: 2 Volume: 71 Year: 2017 Month: 4 X-DOI: 10.1080/00031305.2017.1325631 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1325631 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:182-186 Template-Type: ReDIF-Article 1.0 Author-Name: Victor Fossaluza Author-X-Name-First: Victor Author-X-Name-Last: Fossaluza Author-Name: Rafael Izbicki Author-X-Name-First: Rafael Author-X-Name-Last: Izbicki Author-Name: Gustavo Miranda da Silva Author-X-Name-First: Gustavo Miranda Author-X-Name-Last: da Silva Author-Name: Luís Gustavo Esteves Author-X-Name-First: Luís Gustavo Author-X-Name-Last: Esteves Title: Coherent Hypothesis Testing Abstract: Multiple hypothesis testing, an important quantitative tool to report the results of scientific inquiries, frequently leads to contradictory conclusions. For instance, in an analysis of variance (ANOVA) setting, the same dataset can lead one to reject the equality of two means, say μ1 = μ2, but at the same time to not reject the hypothesis that μ1 = μ2 = 0. These two conclusions violate the coherence principle introduced by Gabriel in 1969, and lead to results that are difficult to communicate, and, many times, embarrassing for practitioners of statistical methods. Although this situation is common in the daily life of statisticians, it is usually not discussed in courses of statistics. In this work, we enrich the teaching and discussion of this important topic by investigating through a few examples whether several standard test procedures are coherent or not. We also discuss the relationship between coherent tests and measures of support. 
Finally, we show how a Bayesian decision-theoretic framework can be used to build coherent tests. These approaches clarify when coherence is an appealing property in multiple testing and provide means of obtaining it. Journal: The American Statistician Pages: 242-248 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1237893 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1237893 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:242-248 Template-Type: ReDIF-Article 1.0 Author-Name: Panagiotis (Panos) Toulis Author-X-Name-First: Panagiotis (Panos) Author-X-Name-Last: Toulis Title: A Useful Pivotal Quantity Abstract: Consider n continuous random variables with joint density f that possibly depends on unknown parameters θ. If the negative of the logarithm of f is a positively homogeneous function of degree p taking only positive values, then twice that function is distributed as a Gamma random variable with shape n/p and scale 2 (equivalently, the function itself is Gamma with shape n/p and scale 1), and thus it is a pivotal quantity for θ. This provides a general method to construct pivotal quantities, which are widely applicable in statistical practice, for example, in hypothesis testing and in constructing confidence intervals. Here, we prove the aforementioned result and illustrate it through examples. Journal: The American Statistician Pages: 272-274 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1237894 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1237894 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:272-274 Template-Type: ReDIF-Article 1.0 Author-Name: Chunpeng Fan Author-X-Name-First: Chunpeng Author-X-Name-Last: Fan Author-Name: Lin Wang Author-X-Name-First: Lin Author-X-Name-Last: Wang Author-Name: Lynn Wei Author-X-Name-First: Lynn Author-X-Name-Last: Wei Title: Comparing Two Tests for Two Rates Abstract: This article rigorously proves the superiority of the proportion χ2 test to the logistic regression Wald test in terms of power when comparing two rates, despite their asymptotic equivalence under the null hypothesis that the two rates are equal. Journal: The American Statistician Pages: 275-281 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1246263 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1246263 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:275-281 Template-Type: ReDIF-Article 1.0 Author-Name: Roger W. Hoerl Author-X-Name-First: Roger W. Author-X-Name-Last: Hoerl Author-Name: Ronald D. Snee Author-X-Name-First: Ronald D. Author-X-Name-Last: Snee Title: Statistical Engineering: An Idea Whose Time Has Come? Abstract: Several authors, including the American Statistical Association (ASA) guidelines for undergraduate statistics education (American Statistical Association Undergraduate Guidelines Workgroup), have noted the challenges facing statisticians when attacking large, complex, and unstructured problems, as opposed to well-defined textbook problems. Clearly, the standard paradigm of selecting the one “correct” statistical method for such problems is not sufficient; a new paradigm is needed. Statistical engineering has been proposed as a discipline that can provide a viable paradigm to attack such problems, used in conjunction with sound statistical science.
Of course, to develop as a true discipline, statistical engineering must be clearly defined and articulated. Further, a well-developed underlying theory is needed, one that would prove helpful in addressing such large, complex, and unstructured problems. The purpose of this expository article is to more clearly articulate the current state of statistical engineering, and make a case for why it merits further study by the profession as a means of addressing such problems. We conclude with a “call to action.” Journal: The American Statistician Pages: 209-219 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1247015 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1247015 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:209-219 Template-Type: ReDIF-Article 1.0 Author-Name: Wen-Han Hwang Author-X-Name-First: Wen-Han Author-X-Name-Last: Hwang Author-Name: Richard Huggins Author-X-Name-First: Richard Author-X-Name-Last: Huggins Author-Name: Lu-Fang Chen Author-X-Name-First: Lu-Fang Author-X-Name-Last: Chen Title: A Note on the Inverse Birthday Problem With Applications Abstract: The classical birthday problem considers the probability that at least two people in a group of size N share the same birthday. The inverse birthday problem considers the estimation of the size N of a group given the number of different birthdays in the group. In practice, this problem is analogous to estimating the size of a population from occurrence data only. The inverse problem can be solved via two simple approaches, the method of moments for a multinomial model and the maximum likelihood estimate of a Poisson model, which we present in this study. We investigate properties of both methods and show that they can yield asymptotically equivalent Wald-type interval estimators. Moreover, we show that these methods estimate a lower bound for the population size when birth rates are nonhomogeneous or individuals in the population are aggregated. A simulation study was conducted to evaluate the performance of the point estimates arising from the two approaches and to compare the performance of seven interval estimators, including likelihood ratio and log-transformation methods. We illustrate the utility of these methods by estimating: (1) the abundance of tree species over a 50-hectare forest plot, (2) the number of Chlamydia infections when only the number of different birthdays of the patients is known, and (3) the number of rainy days when the number of rainy weeks is known. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 191-201 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1255657 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255657 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:191-201 Template-Type: ReDIF-Article 1.0 Author-Name: Jarod Y. L. Lee Author-X-Name-First: Jarod Y. L. Author-X-Name-Last: Lee Author-Name: James J. Brown Author-X-Name-First: James J. Author-X-Name-Last: Brown Author-Name: Louise M. Ryan Author-X-Name-First: Louise M. Author-X-Name-Last: Ryan Title: Sufficiency Revisited: Rethinking Statistical Algorithms in the Big Data Era Abstract: The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer.
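On the Hwang, Huggins, and Chen record above, the method-of-moments route is easy to make concrete: with K equally likely birthdays, the expected number of distinct birthdays among N people is K(1 - (1 - 1/K)^N), and the moment estimate of N inverts this at the observed count D. A sketch under homogeneous birth rates (the function name is mine):

    import numpy as np

    def n_hat_moments(distinct, k=365):
        """Method-of-moments group-size estimate from the number of distinct
        birthdays, inverting E[D] = k * (1 - (1 - 1/k)**N) at D = distinct
        (requires distinct < k)."""
        return np.log(1.0 - distinct / k) / np.log(1.0 - 1.0 / k)

    # simulate: a group of 200 people, count distinct birthdays, estimate N
    rng = np.random.default_rng(7)
    d = len(np.unique(rng.integers(0, 365, size=200)))
    print(d, n_hat_moments(d))  # the estimate should land near 200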
Divide and Recombine (D&R) is becoming a popular approach for big data analysis, where results are combined over subanalyses performed in separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R, as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modeling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of the Australian unemployment data. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 202-208 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1255659 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255659 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:202-208 Template-Type: ReDIF-Article 1.0 Author-Name: William B. Fairley Author-X-Name-First: William B. Author-X-Name-Last: Fairley Author-Name: Peter J. Kempthorne Author-X-Name-First: Peter J. Author-X-Name-Last: Kempthorne Author-Name: Julie Novak Author-X-Name-First: Julie Author-X-Name-Last: Novak Author-Name: Scott McGarvie Author-X-Name-First: Scott Author-X-Name-Last: McGarvie Author-Name: Steve Crunk Author-X-Name-First: Steve Author-X-Name-Last: Crunk Author-Name: Bee Leng Lee Author-X-Name-First: Bee Leng Author-X-Name-Last: Lee Author-Name: Alan J. Salzberg Author-X-Name-First: Alan J. Author-X-Name-Last: Salzberg Title: Resolving a Multi-Million Dollar Contract Dispute With a Latin Square Abstract: The City of New York negotiated a dispute over the performance of new garbage trucks purchased from a vehicle manufacturer. The dispute concerned the fulfillment of a specification in the purchase contract that the trucks load a minimum full load of 12.5 tons of household refuse. On behalf of the City, but in cooperation with the manufacturer, the City's Department of Sanitation and consulting statisticians tested fulfillment of the contract specification, employing a Latin Square design for routing trucks. We present the classical analysis using a linear model and analysis of variance. We also show how fixed, mixed, and random effect models are useful in analyzing the results of the test. Finally, we take a Bayesian perspective to demonstrate how the information from the data overcomes the difference between the prior densities of the city and the manufacturer for the load capacities of the trucks to result in much closer posterior densities. This procedure might prove useful in similar negotiations. Supplementary material, including the data and R code for the computations in the article, is available online.
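A side note on the Fairley et al. record above: the Latin square that routes the trucks is one line of modular arithmetic to construct, since the cyclic square with entry (i + j) mod t puts every treatment exactly once in each row and each column. A sketch, with the dimension 5 chosen arbitrarily:

    import numpy as np

    # cyclic Latin square: entry (i, j) gets treatment (i + j) mod t, so each
    # treatment appears once per row (e.g., day) and once per column (e.g., route)
    t = 5
    square = np.add.outer(np.arange(t), np.arange(t)) % t
    print(square)
    # every row and every column is a permutation of 0..t-1, as required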
Journal: The American Statistician Pages: 249-258 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1256231 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1256231 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:249-258 Template-Type: ReDIF-Article 1.0 Author-Name: Heidi Spratt Author-X-Name-First: Heidi Author-X-Name-Last: Spratt Author-Name: Erin E. Fox Author-X-Name-First: Erin E. Author-X-Name-Last: Fox Author-Name: Nawar Shara Author-X-Name-First: Nawar Author-X-Name-Last: Shara Author-Name: Madhu Mazumdar Author-X-Name-First: Madhu Author-X-Name-Last: Mazumdar Title: Strategies for Success: Early-Stage Collaborating Biostatistics Faculty in an Academic Health Center Abstract: Collaborative biostatistics faculty (CBF) are increasingly valued by academic health centers (AHCs) for their role in increasing success rates of grants and publications, and educating medical students and clinical researchers. Some AHCs have a biostatistics department that consists of only biostatisticians focused on methodological research, collaborative research, and education. Others may have a biostatistics unit within an interdisciplinary department, or statisticians recruited into clinical departments. Within each model, there is also variability in environment, influenced by the chair's background, research focus of colleagues, type of students taught, funding sources, and whether the department is in a medical school or school of public health. CBF appointments may be tenure track or nontenure, and expectations for promotion may vary greatly depending on the type of department, track, and the AHC. In this article, the authors identify strategies for developing early-stage CBFs in four domains: (1) Influence of department/environment, (2) Skills to develop, (3) Ways to increase productivity, and (4) Ways to document accomplishments. Graduating students and postdoctoral fellows should consider the first domain when choosing a faculty position. Early-stage CBFs will benefit by understanding the requirements of their environment early in their appointment and by modifying the provided progression grid with their chair and mentoring team as needed. Following this personalized grid will increase the chances of a satisfying career with appropriate recognition for academic accomplishments. Journal: The American Statistician Pages: 220-230 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2016.1277157 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277157 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:220-230 Template-Type: ReDIF-Article 1.0 Author-Name: Tahir Ekin Author-X-Name-First: Tahir Author-X-Name-Last: Ekin Author-Name: Francesca Ieva Author-X-Name-First: Francesca Author-X-Name-Last: Ieva Author-Name: Fabrizio Ruggeri Author-X-Name-First: Fabrizio Author-X-Name-Last: Ruggeri Author-Name: Refik Soyer Author-X-Name-First: Refik Author-X-Name-Last: Soyer Title: On the Use of the Concentration Function in Medical Fraud Assessment Abstract: We propose a simple, but effective, tool to detect possible anomalies in the services prescribed by a health care provider (HP) compared to his/her colleagues in the same field and environment. Our method is based on the concentration function that is an extension of the Lorenz curve widely used in describing uneven distribution of wealth in a population. 
The proposed tool provides a graphical illustration of possible anomalous behavior of the HPs, and it can be used as a prescreening device for further investigation of potential medical fraud. Journal: The American Statistician Pages: 236-241 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2017.1292955 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1292955 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:236-241 Template-Type: ReDIF-Article 1.0 Author-Name: Jeff Witmer Author-X-Name-First: Jeff Author-X-Name-Last: Witmer Title: Bayes and MCMC for Undergraduates Abstract: Students of statistics should be taught the ideas and methods that are widely used in practice and that will help them understand the world of statistics. Today, this means teaching them about Bayesian methods. In this article, I present ideas on teaching an undergraduate Bayesian course that uses Markov chain Monte Carlo and that can be a second course or, for strong students, a first course in statistics. Journal: The American Statistician Pages: 259-264 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2017.1305289 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305289 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:259-264 Template-Type: ReDIF-Article 1.0 Author-Name: Sitsofe Tsagbey Author-X-Name-First: Sitsofe Author-X-Name-Last: Tsagbey Author-Name: Miguel de Carvalho Author-X-Name-First: Miguel Author-X-Name-Last: de Carvalho Author-Name: Garritt L. Page Author-X-Name-First: Garritt L. Author-X-Name-Last: Page Title: All Data are Wrong, but Some are Useful? Advocating the Need for Data Auditing Abstract: In a recent article from the Annals of Applied Statistics, Cox discussed the main phases of applied statistical research ranging from clarifying study objectives to final data analysis and interpreting results. As an incidental remark to these main phases, we advocate that beyond cleaning and preprocessing the data, it is a good practice to audit the data to determine if they can be trusted at all. A case study based on Ghanaian Official Fishery Statistics is used to illustrate this need, with Benford's law being the tool used to carry out the data audit. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 231-235 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2017.1311282 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1311282 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:231-235 Template-Type: ReDIF-Article 1.0 Author-Name: Subhash Bagui Author-X-Name-First: Subhash Author-X-Name-Last: Bagui Author-Name: K. L. Mehra Author-X-Name-First: K. L. Author-X-Name-Last: Mehra Title: Convergence of Known Distributions to Limiting Normal or Non-normal Distributions: An Elementary Ratio Technique Abstract: This article presents an elementary informal technique for deriving the convergence of known distributions to limiting normal or non-normal distributions. The presentation should be of interest to teachers and students of first year graduate level courses in probability and statistics.
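The audit tool in the Tsagbey, de Carvalho, and Page record above is Benford's law, under which the leading digit d of genuine multi-magnitude data occurs with probability log10(1 + 1/d). The sketch below compares an observed first-digit profile with those proportions; the synthetic log-uniform data and the function name are mine.

    import numpy as np

    def first_digit_profile(values):
        """Observed first-digit frequencies versus the Benford proportions
        log10(1 + 1/d), the comparison used in the data audit above."""
        values = np.asarray([v for v in values if v > 0], dtype=float)
        first = (values / 10.0 ** np.floor(np.log10(values))).astype(int)
        observed = np.bincount(first, minlength=10)[1:10] / first.size
        benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
        return observed, benford

    # example: Benford-conforming synthetic data (log-uniform over 4 decades)
    rng = np.random.default_rng(3)
    obs, ben = first_digit_profile(10 ** rng.uniform(0, 4, 10_000))
    print(np.round(obs, 3))
    print(np.round(ben, 3))  # large gaps between the two rows flag an anomaly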
Journal: The American Statistician Pages: 265-271 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2017.1322001 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322001 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:265-271 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 282-289 Issue: 3 Volume: 71 Year: 2017 Month: 7 X-DOI: 10.1080/00031305.2017.1367180 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1367180 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:282-289 Template-Type: ReDIF-Article 1.0 Author-Name: Eric W. Gibson Author-X-Name-First: Eric W. Author-X-Name-Last: Gibson Title: Leadership in Statistics: Increasing Our Value and Visibility Abstract: Scientists in every discipline are generating data more rapidly than ever before, resulting in an increasing need for statistical skills at a time when there is decreasing visibility for the field of statistics. Resolving this paradox requires stronger statistical leadership to guide multidisciplinary teams in the design and planning of scientific research and making decisions based on data. It requires more effective communication to nonstatisticians of the value of statistics in using data to answer questions, predict outcomes, and support decision-making in the face of uncertainty. It also requires a greater appreciation of the unique capabilities of alternative quantitative disciplines such as machine learning, data science, pharmacometrics, and bioinformatics which represent an opportunity for statisticians to achieve greater impact through collaborative partnership. Examples taken from pharmaceutical drug development are used to illustrate the concept of statistical leadership in a collaborative multidisciplinary team environment. Journal: The American Statistician Pages: 109-116 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1336484 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1336484 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:109-116 Template-Type: ReDIF-Article 1.0 Author-Name: Jeffrey N. Rouder Author-X-Name-First: Jeffrey N. Author-X-Name-Last: Rouder Author-Name: Richard D. Morey Author-X-Name-First: Richard D. Author-X-Name-Last: Morey Title: Teaching Bayes’ Theorem: Strength of Evidence as Predictive Accuracy Abstract: Although teaching Bayes’ theorem is popular, the standard approach—targeting posterior distributions of parameters—may be improved. We advocate teaching Bayes’ theorem in a ratio form where the posterior beliefs relative to the prior beliefs equals the conditional probability of data relative to the marginal probability of data. This form leads to an interpretation that the strength of evidence is relative predictive accuracy. With this approach, students are encouraged to view Bayes’ theorem as an updating mechanism, to obtain a deeper appreciation of the role of the prior and of marginal data, and to view estimation and model comparison from a unified perspective. 
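The ratio form advocated in the Rouder and Morey record above reads P(H | data)/P(H) = P(data | H)/P(data): a hypothesis gains credibility by exactly the factor by which it out-predicts the marginal. A toy numeric illustration (all numbers are mine):

    # Bayes' theorem in ratio form: posterior/prior = P(data | H) / P(data)
    prior_h = 0.01                # prior probability of hypothesis H
    p_data_given_h = 0.95         # how well H predicts the observed data
    p_data_given_not_h = 0.10     # how well the alternative predicts it
    p_data = prior_h * p_data_given_h + (1 - prior_h) * p_data_given_not_h

    update_factor = p_data_given_h / p_data   # relative predictive accuracy
    posterior_h = prior_h * update_factor
    print(update_factor, posterior_h)  # H gains by the factor it out-predicts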
Journal: The American Statistician Pages: 186-190 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1341334 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1341334 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:186-190 Template-Type: ReDIF-Article 1.0 Author-Name: Subhabrata Chakraborti Author-X-Name-First: Subhabrata Author-X-Name-Last: Chakraborti Author-Name: Felipe Jardim Author-X-Name-First: Felipe Author-X-Name-Last: Jardim Author-Name: Eugenio Epprecht Author-X-Name-First: Eugenio Author-X-Name-Last: Epprecht Title: Higher-Order Moments Using the Survival Function: The Alternative Expectation Formula Abstract: Undergraduate and graduate students in a first-year probability (or a mathematical statistics) course learn the important concept of the moment of a random variable. The moments are related to various aspects of a probability distribution. In this context, the formula for the mean or the first moment of a nonnegative continuous random variable is often shown in terms of its c.d.f. (or the survival function). This has been called the alternative expectation formula. However, higher-order moments are also important, for example, to study the variance or the skewness of a distribution. In this note, we consider the rth moment of a nonnegative random variable and derive formulas in terms of the c.d.f. (or the survival function) paralleling the existing results for the first moment (the mean) using Fubini's theorem. Both nonnegative continuous and discrete integer-valued random variables are considered. These formulas may be advantageous, for example, when dealing with the moments of a transformed random variable, where it may be easier to derive its c.d.f. using the so-called c.d.f. method. Journal: The American Statistician Pages: 191-194 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1356374 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1356374 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:191-194 Template-Type: ReDIF-Article 1.0 Author-Name: Yaakov Malinovsky Author-X-Name-First: Yaakov Author-X-Name-Last: Malinovsky Author-Name: Paul S. Albert Author-X-Name-First: Paul S. Author-X-Name-Last: Albert Title: Revisiting Nested Group Testing Procedures: New Results, Comparisons, and Robustness Abstract: Group testing has its origin in the identification of syphilis in the U.S. army during World War II. Much of the theoretical framework of group testing was developed starting in the late 1950s, with continued work into the 1990s. Recently, with the advent of new laboratory and genetic technologies, there has been an increasing interest in group testing designs for cost saving purposes. In this article, we compare different nested designs, including Dorfman, Sterrett and an optimal nested procedure obtained through dynamic programming. To elucidate these comparisons, we develop closed-form expressions for the optimal Sterrett procedure and provide a concise review of the prior literature for other commonly used procedures. We consider designs where the prevalence of disease is known as well as investigate the robustness of these procedures, when it is incorrectly assumed. This article provides a technical presentation that will be of interest to researchers as well as from a pedagogical perspective. 
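The higher-order version of the alternative expectation formula in the Chakraborti, Jardim, and Epprecht record above, E[X^r] = integral over t from 0 to infinity of r t^(r-1) S(t) dt for a nonnegative random variable with survival function S, can be checked numerically in a few lines. The Exp(1) example below, where E[X^2] = 2, is my own choice of illustration:

    import numpy as np
    from scipy.integrate import quad

    # E[X^r] = \int_0^inf r t^(r-1) S(t) dt, with S the survival function
    r = 2
    survival = lambda t: np.exp(-t)  # Exp(1): S(t) = e^{-t}
    moment, _ = quad(lambda t: r * t ** (r - 1) * survival(t), 0, np.inf)
    print(moment)                    # approximately 2.0 = E[X^2] under Exp(1)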
Supplementary material for this article is available online. Journal: The American Statistician Pages: 117-125 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1366367 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1366367 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:117-125 Template-Type: ReDIF-Article 1.0 Author-Name: Richard Le Blanc Author-X-Name-First: Richard Author-X-Name-Last: Le Blanc Title: Bayesian Analysis on a Noncentral Fisher–Student’s Hypersphere Abstract: Fisher succeeded early on in redefining Student’s t-distribution in geometrical terms on a central hypersphere. Intriguingly, a noncentral analytical extension for this fundamental Fisher–Student’s central hypersphere h-distribution does not exist. We therefore set out to derive the noncentral h-distribution and use it to graphically illustrate the limitations of the Neyman–Pearson null hypothesis significance testing framework and the strengths of the Bayesian statistical hypothesis analysis framework on the hypersphere polar axis, a compact nontrivial one-dimensional parameter space. Using a geometrically meaningful maximal entropy prior, we requalify the apparent failure of an important psychological science reproducibility project. We proceed to show that the Bayes factor appropriately models the two-sample t-test p-value density of a gene expression profile produced by the high-throughput genomic-scale microarray technology, and provides a simple expression for a local false discovery rate addressing the multiple hypothesis testing problem brought about by such a technology. Journal: The American Statistician Pages: 126-140 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1377111 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1377111 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:126-140 Template-Type: ReDIF-Article 1.0 Author-Name: Xiaoyue Niu Author-X-Name-First: Xiaoyue Author-X-Name-Last: Niu Author-Name: James L. Rosenberger Author-X-Name-First: James L. Author-X-Name-Last: Rosenberger Title: Near-Balanced Incomplete Block Designs, With an Application to Poster Competitions Abstract: Judging scholarly posters creates a challenge to assign the judges efficiently. If there are many posters and few reviews per judge, the commonly used balanced incomplete block design is not a feasible option. An additional challenge is that the number of judges is unknown before the event. We propose two connected near-balanced incomplete block designs that both satisfy the requirements of our setting: one that generates a connected assignment and balances the treatments, and another that further balances pairs of treatments. We describe both fixed and random effects models to estimate the population marginal means of the poster scores and rationalize the use of the random effects model. We evaluate the estimation accuracy and efficiency, especially the winning chance of the truly best posters, of the two designs in comparison with a random assignment via simulation studies. The two proposed designs both demonstrate accuracy and efficiency gain over the random assignment.
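For the Dorfman procedure reviewed in the Malinovsky and Albert record above, the operating characteristic has a well-known closed form: with pool size k and prevalence p, the expected number of tests per person is 1/k + 1 - (1 - p)^k (one pooled test shared by k people, plus k retests when the pool is positive), and the cost-optimal k minimizes it. A quick sketch:

    import numpy as np

    def dorfman_tests_per_person(k, p):
        """Expected tests per person under Dorfman two-stage group testing:
        the shared pooled test, plus k retests with probability 1-(1-p)^k."""
        return 1.0 / k + 1.0 - (1.0 - p) ** k

    p = 0.01
    ks = np.arange(2, 41)
    best = ks[np.argmin(dorfman_tests_per_person(ks, p))]
    print(best, dorfman_tests_per_person(best, p))
    # at 1% prevalence the optimum is near k = 11, about 0.2 tests per person,
    # i.e., roughly an 80% saving over testing everyone individually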
Journal: The American Statistician Pages: 159-164 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1385534 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1385534 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:159-164 Template-Type: ReDIF-Article 1.0 Author-Name: Luke Keele Author-X-Name-First: Luke Author-X-Name-Last: Keele Author-Name: Luke Miratrix Author-X-Name-First: Luke Author-X-Name-Last: Miratrix Title: Randomization Inference for Outcomes with Clumping at Zero Abstract: While randomization inference is well developed for continuous and binary outcomes, there has been comparatively little work for outcomes with nonnegative support and clumping at zero. Typically, outcomes of this type have been modeled using parametric models that impose strong distributional assumptions. This article proposes new randomization inference procedures for nonnegative outcomes with clumping at zero. Instead of making distributional assumptions, we propose various assumptions about the nature of the response to treatment and use permutation inference for both testing and estimation. This approach allows for some natural goodness-of-fit tests for model assessment, as well as flexibility in selecting test statistics sensitive to different potential alternatives. We illustrate our approach using two randomized trials, where job training interventions were designed to increase earnings of participants. Journal: The American Statistician Pages: 141-150 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1385535 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1385535 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:141-150 Template-Type: ReDIF-Article 1.0 Author-Name: Tommy Wright Author-X-Name-First: Tommy Author-X-Name-Last: Wright Author-Name: Martin Klein Author-X-Name-First: Martin Author-X-Name-Last: Klein Author-Name: Jerzy Wieczorek Author-X-Name-First: Jerzy Author-X-Name-Last: Wieczorek Title: A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals Abstract: In comparing a collection of K populations, it is common practice to display in one visualization confidence intervals for the corresponding population parameters θ1, θ2, …, θK. For a pair of confidence intervals that do (or do not) overlap, viewers of the visualization are cognitively compelled to declare that there is not (or there is) a statistically significant difference between the two corresponding population parameters. It is generally well known that the method of examining overlap of pairs of confidence intervals should not be used for formal hypothesis testing. However, use of a single visualization with overlapping and nonoverlapping confidence intervals leads many to draw such conclusions, despite the best efforts of statisticians toward preventing users from reaching such conclusions. In this article, we summarize some alternative visualizations from the literature that can be used to properly test equality between a pair of population parameters. We recommend that these visualizations be used with caution to avoid incorrect statistical inference. The methods presented require only that we have K sample estimates and their associated standard errors. We also assume that the sample estimators are independent, unbiased, and normally distributed. 
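The caution in the Wright, Klein, and Wieczorek record above can be shown in a few lines: two 95% confidence intervals can overlap even though the proper two-sample comparison, which uses the standard error of the difference, rejects equality. The numbers below are mine:

    import numpy as np
    from scipy.stats import norm

    # two independent, normally distributed estimates with standard errors
    est1, se1 = 10.0, 1.0
    est2, se2 = 13.0, 1.0

    ci1 = (est1 - 1.96 * se1, est1 + 1.96 * se1)  # (8.04, 11.96)
    ci2 = (est2 - 1.96 * se2, est2 + 1.96 * se2)  # (11.04, 14.96): overlaps ci1
    z = (est2 - est1) / np.sqrt(se1**2 + se2**2)  # correct test statistic, 2.12
    print(ci1, ci2, 2 * norm.sf(abs(z)))          # p = 0.034 despite the overlap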
Journal: The American Statistician Pages: 165-178 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1392359 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392359 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:165-178 Template-Type: ReDIF-Article 1.0 Author-Name: Charles South Author-X-Name-First: Charles Author-X-Name-Last: South Author-Name: Ryan Elmore Author-X-Name-First: Ryan Author-X-Name-Last: Elmore Author-Name: Andrew Clarage Author-X-Name-First: Andrew Author-X-Name-Last: Clarage Author-Name: Rob Sickorez Author-X-Name-First: Rob Author-X-Name-Last: Sickorez Author-Name: Jing Cao Author-X-Name-First: Jing Author-X-Name-Last: Cao Title: A Starting Point for Navigating the World of Daily Fantasy Basketball Abstract: Fantasy sports, particularly the daily variety in which new lineups are selected each day, are a rapidly growing industry. The two largest companies in the daily fantasy business, DraftKings and Fanduel, have been valued as high as $2 billion. This research focuses on the development of a complete system for daily fantasy basketball, including both the prediction of player performance and the construction of a team. First, a Bayesian random effects model is used to predict an aggregate measure of daily NBA player performance. The predictions are then used to construct teams under the constraints of the game, typically related to a fictional salary cap and player positions. Permutation based and K-nearest neighbors approaches are compared in terms of the identification of “successful” teams—those who would be competitive more often than not based on historical data. We demonstrate the efficacy of our system by comparing our predictions to those from a well-known analytics website, and by simulating daily competitions over the course of the 2015–2016 season. Our results show an expected profit of approximately $9,000 on an initial $500 investment using the K-nearest neighbors approach, a 36% increase relative to using the permutation-based approach alone. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 179-185 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2017.1401559 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1401559 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:179-185 Template-Type: ReDIF-Article 1.0 Author-Name: Flavio Santi Author-X-Name-First: Flavio Author-X-Name-Last: Santi Author-Name: Maria Michela Dickson Author-X-Name-First: Maria Michela Author-X-Name-Last: Dickson Author-Name: Giuseppe Espa Author-X-Name-First: Giuseppe Author-X-Name-Last: Espa Title: A Graphical Tool for Interpreting Regression Coefficients of Trinomial Logit Models Abstract: Multinomial logit (also termed multi-logit) models permit the analysis of the statistical relation between a categorical response variable and a set of explicative variables (called covariates or regressors). Although multinomial logit is widely used in both the social and economic sciences, the interpretation of regression coefficients may be tricky, as the effect of covariates on the probability distribution of the response variable is nonconstant and difficult to quantify. 
The ternary plots illustrated in this article aim to facilitate the interpretation of regression coefficients and permit the effect of covariates (considered either singly or jointly) on the probability distribution of the dependent variable to be quantified. Ternary plots can be drawn both for ordered and for unordered categorical dependent variables, when the number of possible outcomes equals three (trinomial response variable); these plots make it possible not only to represent the covariate effects over the whole parameter space of the dependent variable but also to compare the covariate effects for any given individual profile. The method is illustrated and discussed through analysis of a dataset concerning the transition of master’s graduates of the University of Trento (Italy) from university to employment. Journal: The American Statistician Pages: 200-207 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2018.1442368 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1442368 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:200-207 Template-Type: ReDIF-Article 1.0 Author-Name: Frank Tuyl Author-X-Name-First: Frank Author-X-Name-Last: Tuyl Title: A Method to Handle Zero Counts in the Multinomial Model Abstract: In the context of an objective Bayesian approach to the multinomial model, Dirichlet(a, …, a) priors with a < 1 have previously been shown to be inadequate in the presence of zero counts, suggesting that the uniform prior (a = 1) is the preferred candidate. In the presence of many zero counts, however, this prior may not be satisfactory either. A model selection approach is proposed, allowing for the possibility of zero parameters corresponding to zero count categories. This approach results in a posterior mixture of Dirichlet distributions and marginal mixtures of beta distributions, which seem to avoid the problems that potentially result from the various proposed Dirichlet priors, in particular in the context of extreme data with zero counts. Journal: The American Statistician Pages: 151-158 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2018.1444673 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1444673 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:151-158 Template-Type: ReDIF-Article 1.0 Author-Name: Francisco Louzada Author-X-Name-First: Francisco Author-X-Name-Last: Louzada Author-Name: Pedro L. Ramos Author-X-Name-First: Pedro L. Author-X-Name-Last: Ramos Author-Name: Eduardo Ramos Author-X-Name-First: Eduardo Author-X-Name-Last: Ramos Title: A Note on Bias of Closed-Form Estimators for the Gamma Distribution Derived From Likelihood Equations Abstract: We discuss here an alternative approach for decreasing the bias of the closed-form estimators for the gamma distribution recently proposed by Ye and Chen in 2017. We show that the new estimator also has a closed-form expression, is positive, and can be computed for n > 2. Moreover, the corrective approach returns better estimates than the former ones. Journal: The American Statistician Pages: 195-199 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2018.1513376 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1513376 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
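Connecting the Louzada, Ramos, and Ramos record above to the Ye and Chen record earlier: the small-sample bias that motivates their correction is visible in a few lines of simulation with the closed-form estimators sketched after that earlier record. The correction itself is omitted here, since neither record reproduces its formula; the simulation settings are mine.

    import numpy as np

    # small-sample bias of the closed-form gamma shape estimator:
    # repeat estimation over many samples of size n = 10 and average the error
    rng = np.random.default_rng(0)
    alpha, theta, n, reps = 2.0, 3.0, 10, 20_000

    alpha_hats = np.empty(reps)
    for r in range(reps):
        x = rng.gamma(alpha, theta, size=n)
        theta_hat = (n * np.sum(x * np.log(x))
                     - np.sum(np.log(x)) * np.sum(x)) / n**2
        alpha_hats[r] = x.mean() / theta_hat

    # the average error is clearly nonzero at n = 10 (the shape tends to be
    # overestimated), which is the bias the corrective approaches target
    print(alpha_hats.mean() - alpha)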
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:195-199 Template-Type: ReDIF-Article 1.0 Author-Name: Xin Wang Author-X-Name-First: Xin Author-X-Name-Last: Wang Title: Business Survival Analysis Using SAS: An Introduction to Lifetime Probabilities Journal: The American Statistician Pages: 208-209 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2018.1538851 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1538851 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:208-209 Template-Type: ReDIF-Article 1.0 Author-Name: Anna Schenfisch Author-X-Name-First: Anna Author-X-Name-Last: Schenfisch Author-Name: Brittany Fasy Author-X-Name-First: Brittany Author-X-Name-Last: Fasy Title: Statistical Analysis of Contingency Tables. Journal: The American Statistician Pages: 208-208 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2019.1571848 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1571848 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:208-208 Template-Type: ReDIF-Article 1.0 Author-Name: Nicole Bohme Carnegie Author-X-Name-First: Nicole Bohme Author-X-Name-Last: Carnegie Title: Quantitative Methods for HIV/AIDS Research Journal: The American Statistician Pages: 209-210 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2019.1603473 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1603473 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:209-210 Template-Type: ReDIF-Article 1.0 Author-Name: Minggen Lu Author-X-Name-First: Minggen Author-X-Name-Last: Lu Title: Survival Analysis with Interval-Censored Data: A Practical Approach with Examples in R, SAS, and BUGS. Journal: The American Statistician Pages: 211-212 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2019.1603477 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1603477 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:211-212 Template-Type: ReDIF-Article 1.0 Author-Name: Emily Dressler Author-X-Name-First: Emily Author-X-Name-Last: Dressler Title: Clinical Trial Optimization Using R. Journal: The American Statistician Pages: 210-211 Issue: 2 Volume: 73 Year: 2019 Month: 4 X-DOI: 10.1080/00031305.2019.1603479 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1603479 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:210-211 Template-Type: ReDIF-Article 1.0 Author-Name: Prakash Gorroochurn Author-X-Name-First: Prakash Author-X-Name-Last: Gorroochurn Title: On Galton's Change From “Reversion” to “Regression” Abstract: Galton's first work on regression probably led him to think of it as a unidirectional, genetic process, which he called “reversion.” A subsequent experiment on family heights made him realize that the phenomenon was symmetric and nongenetic. Galton then abandoned “reversion” in favor of “regression.” Final confirmation was provided through Dickson's mathematical analysis and Galton's examination of height data on brothers. 
Journal: The American Statistician Pages: 227-231 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2015.1087876 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1087876 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:227-231 Template-Type: ReDIF-Article 1.0 Author-Name: J. G. Liao Author-X-Name-First: J. G. Author-X-Name-Last: Liao Author-Name: Duanping Liao Author-X-Name-First: Duanping Author-X-Name-Last: Liao Author-Name: Arthur Berg Author-X-Name-First: Arthur Author-X-Name-Last: Berg Title: Calibrated Bayes Factors in Assessing Genetic Association Models Abstract: Three competing genetic models—additive, dominant, and recessive—are often considered in genetic association analysis. We propose and develop a calibrated Bayes approach for comparing these competing models that has the desired property of giving equal support to the three models when no genetic association is present. The naïve approach with noncalibrated priors is shown to produce misleading Bayes factors. The method is fully developed with simulation studies, real data analyses, and an efficient algorithm based on an asymptotic approximation. An illuminating connection to the Kullback–Leibler divergence is also established. The proposed calibrated prior can serve as a reference prior for a genetic association study or as a common baseline prior for comparing Bayes analyses of different datasets. Journal: The American Statistician Pages: 250-256 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2015.1109548 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1109548 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:250-256 Template-Type: ReDIF-Article 1.0 Author-Name: Deborah Nolan Author-X-Name-First: Deborah Author-X-Name-Last: Nolan Author-Name: Jamis Perrett Author-X-Name-First: Jamis Author-X-Name-Last: Perrett Title: Teaching and Learning Data Visualization: Ideas and Assignments Abstract: This article discusses how to make statistical graphics a more prominent element of the undergraduate statistics curricula. The focus is on several different types of assignments that exemplify how to incorporate graphics into a course in a pedagogically meaningful way. These assignments include having students deconstruct and reconstruct plots, copy masterful graphs, create one-minute visual revelations, convert tables into “pictures,” and develop interactive visualizations, for example, with the virtual earth as a plotting canvas. In addition to describing the goals and details of each assignment, we also discuss the broader topic of graphics and key concepts that we think warrant inclusion in the statistics curricula. We advocate that more attention needs to be paid to this fundamental field of statistics at all levels, from introductory undergraduate through graduate level courses. With the rapid rise of tools to visualize data, for example, Google trends, GapMinder, ManyEyes, and Tableau, and the increased use of graphics in the media, understanding the principles of good statistical graphics, and having the ability to create informative visualizations is an ever more important aspect of statistics education. Supplementary materials containing code and data for the assignments are available online. 
Journal: The American Statistician Pages: 260-269 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2015.1123651 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123651 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:260-269 Template-Type: ReDIF-Article 1.0 Author-Name: Albert Vexler Author-X-Name-First: Albert Author-X-Name-Last: Vexler Author-Name: Li Zou Author-X-Name-First: Li Author-X-Name-Last: Zou Author-Name: Alan D. Hutson Author-X-Name-First: Alan D. Author-X-Name-Last: Hutson Title: Data-Driven Confidence Interval Estimation Incorporating Prior Information with an Adjustment for Skewed Data Abstract: Bayesian credible interval (CI) estimation is a statistical procedure that has been well addressed in both the theoretical and applied literature. Parametric assumptions regarding baseline data distributions are critical for the implementation of this method. We provide a nonparametric technique for incorporating prior information into the equal-tailed (ET) and highest posterior density (HPD) CI estimators in the Bayesian manner. We propose to use a data-driven likelihood function, replacing the parametric likelihood function to create a distribution-free posterior. Higher order asymptotic propositions are derived to show the efficiency and consistency of the proposed method. We demonstrate that the proposed approach may correct confidence regions with respect to skewness of the data distribution. An extensive Monte Carlo (MC) study confirms the proposed method significantly outperforms the classical CI estimation in a frequentist context. A real data example related to a study of myocardial infarction illustrates the excellent applicability of the proposed technique. Supplementary material, including the R code used to implement the developed method, is available online. Journal: The American Statistician Pages: 243-249 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1141707 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141707 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:243-249 Template-Type: ReDIF-Article 1.0 Author-Name: Xavier Puig Author-X-Name-First: Xavier Author-X-Name-Last: Puig Author-Name: Martí Font Author-X-Name-First: Martí Author-X-Name-Last: Font Author-Name: Josep Ginebra Author-X-Name-First: Josep Author-X-Name-Last: Ginebra Title: A Unified Approach to Authorship Attribution and Verification Abstract: In authorship attribution, one assigns texts from an unknown author to either one of two or more candidate authors by comparing the disputed texts with texts known to have been written by the candidate authors. In authorship verification, one decides whether a text or a set of texts could have been written by a given author. These two problems are usually treated separately. By assuming an open-set classification framework for the attribution problem, contemplating the possibility that none of the candidate authors is the unknown author, the verification problem becomes a special case of attribution problem. Here both problems are posed as a formal Bayesian multinomial model selection problem and are given a closed-form solution, tailored for categorical data, naturally incorporating text length and dependence in the analysis, and coping well with settings with a small number of training texts. 
The approach to authorship verification is illustrated by exploring whether a court ruling sentence could have been written by the judge who signs it, and the approach to authorship attribution is illustrated by revisiting the authorship attribution of the Federalist papers and through a small simulation study. Journal: The American Statistician Pages: 232-242 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1148630 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148630 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:232-242 Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas G. Reich Author-X-Name-First: Nicholas G. Author-X-Name-Last: Reich Author-Name: Justin Lessler Author-X-Name-First: Justin Author-X-Name-Last: Lessler Author-Name: Krzysztof Sakrejda Author-X-Name-First: Krzysztof Author-X-Name-Last: Sakrejda Author-Name: Stephen A. Lauer Author-X-Name-First: Stephen A. Author-X-Name-Last: Lauer Author-Name: Sopon Iamsirithaworn Author-X-Name-First: Sopon Author-X-Name-Last: Iamsirithaworn Author-Name: Derek A. T. Cummings Author-X-Name-First: Derek A. T. Author-X-Name-Last: Cummings Title: Case Study in Evaluating Time Series Prediction Models Using the Relative Mean Absolute Error Abstract: Statistical prediction models inform decision-making processes in many real-world settings. Prior to using predictions in practice, one must rigorously test and validate candidate models to ensure that the proposed predictions have sufficient accuracy to be used in practice. In this article, we present a framework for evaluating time series predictions, which emphasizes computational simplicity and an intuitive interpretation using the relative mean absolute error metric. For a single time series, this metric enables comparisons of candidate model predictions against naïve reference models, a method that can provide useful and standardized performance benchmarks. Additionally, in applications with multiple time series, this framework facilitates comparisons of one or more models’ predictive performance across different sets of data. We illustrate the use of this metric with a case study comparing predictions of dengue hemorrhagic fever incidence in two provinces of Thailand. This example demonstrates the utility and interpretability of the relative mean absolute error metric in practice, and underscores the practical advantages of using relative performance metrics when evaluating predictions. Journal: The American Statistician Pages: 285-292 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1148631 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148631 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:285-292 Template-Type: ReDIF-Article 1.0 Author-Name: Miguel de Carvalho Author-X-Name-First: Miguel Author-X-Name-Last: de Carvalho Title: Mean, What do You Mean? Abstract: When teaching statistics, we often resort to several notions of mean, such as arithmetic mean, geometric mean, and harmonic mean, and hence the student is often left with the question: The word mean appears in all such concepts, so what is actually a mean? I revisit Kolmogorov's axiomatic view of the mean, which unifies all these concepts of mean, among others.
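The Kolmogorov view of the mean that this abstract revisits is usually written as the quasi-arithmetic mean; in standard notation (my rendering, not the article's),

    \[
      M_\varphi(x_1,\dots,x_n) \;=\; \varphi^{-1}\!\left(\frac{1}{n}\sum_{i=1}^{n}\varphi(x_i)\right),
    \]

where \varphi is continuous and strictly monotone; taking \varphi(x) = x, \log x, and 1/x recovers the arithmetic, geometric, and harmonic means, respectively.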
A population counterpart of the notion of regular mean, along with notions of regular variance and standard deviation, will also be discussed here as unifying concepts. Some examples are used to illustrate the main ideas. Journal: The American Statistician Pages: 270-274 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1148632 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148632 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:270-274 Template-Type: ReDIF-Article 1.0 Author-Name: John C. Wierman Author-X-Name-First: John C. Author-X-Name-Last: Wierman Title: The Class Joke Contest: Encouraging Creativity and Improving Attendance Abstract: Jokes are a resource that can be used to transmit concepts, motivate students, encourage creativity, and make learning more enjoyable. In each of my classes on probability and stochastic processes, I hold a monthly joke contest. Students are encouraged to submit original jokes relating to the course and its topics. The teaching assistants and I select a few finalists, and the class votes to determine winners, who receive extra credit. This article discusses the origin and evolution of the contest, describes its benefits in increased engagement and improved attendance, provides information and tips for faculty who might want to conduct a joke contest, and includes some example jokes. Journal: The American Statistician Pages: 257-259 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1148633 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148633 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:257-259 Template-Type: ReDIF-Article 1.0 Author-Name: Jia Liu Author-X-Name-First: Jia Author-X-Name-Last: Liu Author-Name: Daniel J. Nordman Author-X-Name-First: Daniel J. Author-X-Name-Last: Nordman Author-Name: William Q. Meeker Author-X-Name-First: William Q. Author-X-Name-Last: Meeker Title: The Number of MCMC Draws Needed to Compute Bayesian Credible Bounds Abstract: In the past 20 years, there has been a staggering increase in the use of Bayesian statistical inference, based on Markov chain Monte Carlo (MCMC) methods, to estimate model parameters and other quantities of interest. This trend exists in virtually all areas of engineering and science. In a typical application, researchers will report estimates of parametric functions (e.g., quantiles, probabilities, or predictions of future outcomes) and corresponding intervals from MCMC methods. One difficulty with the use of inferential methods based on Monte Carlo (MC) is that reported results may be inaccurate due to MC error. MC error, however, can be made arbitrarily small by increasing the number of MC draws. Most users of MCMC methods seem to use indirect diagnostics, trial-and-error, or guess-work to decide how long to run an MCMC algorithm, and the accuracy of MCMC output results is rarely reported. Unless careful analysis is done, reported numerical results may contain digits that are completely meaningless. In this article, we describe an algorithm to provide direct guidance on the number of MCMC draws needed to achieve a desired amount of precision (i.e., a specified number of accurate significant digits) for Bayesian credible interval endpoints.
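One crude way to see the problem addressed in the preceding abstract is to estimate the Monte Carlo error of a credible-interval endpoint directly. The R sketch below applies a naive batching heuristic to idealized independent draws; it illustrates the problem, not the authors' algorithm.

    # Gauge the MC error of a 97.5% credible bound by recomputing it on batches.
    set.seed(1)
    draws <- rnorm(1e5, mean = 2, sd = 0.5)              # stand-in for MCMC output
    bound <- quantile(draws, probs = 0.975)              # reported upper endpoint
    batches <- matrix(draws, ncol = 100)                 # 100 batches of 1000 draws
    ends <- apply(batches, 2, quantile, probs = 0.975)   # endpoint per batch
    mc_se <- sd(ends) / sqrt(ncol(batches))              # rough MC standard error
    # If mc_se is large relative to the digits to be reported, more draws are
    # needed; MC error shrinks roughly like 1/sqrt(number of draws).
    c(bound, mc_se)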
Journal: The American Statistician Pages: 275-284 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1158738 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1158738 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:275-284 Template-Type: ReDIF-Article 1.0 Author-Name: Peng Ding Author-X-Name-First: Peng Author-X-Name-Last: Ding Title: On the Conditional Distribution of the Multivariate t Distribution Abstract: As alternatives to the normal distributions, t distributions are widely applied in robust analysis for data with outliers or heavy tails. The properties of the multivariate t distribution are well documented in Kotz and Nadarajah's book, which, however, states a wrong conclusion about the conditional distribution of the multivariate t distribution. Previous literature has recognized that the conditional distribution of the multivariate t distribution also follows the multivariate t distribution. We provide an intuitive proof without directly manipulating the complicated density function of the multivariate t distribution. Journal: The American Statistician Pages: 293-295 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1164756 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1164756 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:293-295 Template-Type: ReDIF-Article 1.0 Author-Name: Jyotirmoy Sarkar Author-X-Name-First: Jyotirmoy Author-X-Name-Last: Sarkar Author-Name: Mamunur Rashid Author-X-Name-First: Mamunur Author-X-Name-Last: Rashid Title: Visualizing Mean, Median, Mean Deviation, and Standard Deviation of a Set of Numbers Abstract: We review the existing visualizations of the mean and the median of a given set of numbers. Then we give an alternative visualization of the mean using the empirical cumulative distribution function of the given numbers. Next, we visualize the mean deviation (MD) and the mean square deviation (MSD) of the given numbers from any arbitrary value, including the variance. In light of these new visualizations, we revisit the well-known optimal properties of the MD from the median and the MSD from the mean. We also give a more elementary explanation of why the denominator of the sample variance of a set of numbers is one less than the sample size. Journal: The American Statistician Pages: 304-312 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1165734 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1165734 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:304-312 Template-Type: ReDIF-Article 1.0 Author-Name: Christian Kleiber Author-X-Name-First: Christian Author-X-Name-Last: Kleiber Author-Name: Achim Zeileis Author-X-Name-First: Achim Author-X-Name-Last: Zeileis Title: Visualizing Count Data Regressions Using Rootograms Abstract: The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models.
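Readers can try the regression rootogram with the countreg package cited at the end of this abstract; the sketch below assumes countreg (from R-Forge) and its CrabSatellites example data.

    # Hanging rootogram for a Poisson fit; bars dipping below the zero line
    # suggest excess zeros, and a wave-like pattern suggests overdispersion.
    # install.packages("countreg", repos = "http://R-Forge.R-project.org")
    library(countreg)
    data("CrabSatellites", package = "countreg")
    m_pois <- glm(satellites ~ width, data = CrabSatellites, family = poisson)
    rootogram(m_pois)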
We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code. Journal: The American Statistician Pages: 296-303 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1173590 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1173590 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:296-303 Template-Type: ReDIF-Article 1.0 Author-Name: Ben O'Neill Author-X-Name-First: Ben Author-X-Name-Last: O'Neill Title: Corrigendum Journal: The American Statistician Pages: 323-323 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1188584 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1188584 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:323-323 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 313-322 Issue: 3 Volume: 70 Year: 2016 Month: 7 X-DOI: 10.1080/00031305.2016.1203696 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1203696 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:313-322 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald L. Wasserstein Author-X-Name-First: Ronald L. Author-X-Name-Last: Wasserstein Author-Name: Allen L. Schirm Author-X-Name-First: Allen L. Author-X-Name-Last: Schirm Author-Name: Nicole A. Lazar Author-X-Name-First: Nicole A. Author-X-Name-Last: Lazar Title: Moving to a World Beyond “p < 0.05” Journal: The American Statistician Pages: 1-19 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2019.1583913 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1583913 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:1-19 Template-Type: ReDIF-Article 1.0 Author-Name: John P. A. Ioannidis Author-X-Name-First: John P. A. Author-X-Name-Last: Ioannidis Title: What Have We (Not) Learnt from Millions of Scientific Papers with P Values? Abstract: P values linked to null hypothesis significance testing (NHST) are the most widely (mis)used method of statistical inference. Empirical data suggest that across the biomedical literature (1990–2015), when abstracts use P values, 96% of them have P values of 0.05 or less. The same percentage (96%) applies to full-text articles. Among 100 articles in PubMed, 55 report P values, while only 4 present confidence intervals for all the reported effect sizes, none use Bayesian methods, and none use false-discovery rate. Over 25 years (1990–2015), use of P values in abstracts has doubled for all of PubMed and tripled for meta-analyses, while for some types of designs, such as randomized trials, the majority of abstracts report P values.
There is major selective reporting for P values. Abstracts tend to highlight the most favorable P values, and inferences add further spin to reach exaggerated, unreliable conclusions. The availability of large-scale data on P values from many papers has allowed the development and application of methods that try to detect and model selection biases, for example, p-hacking, that cause patterns of excess significance. Inferences need to be cautious as they depend on the assumptions made by these models and can be affected by the presence of other biases (e.g., confounding in observational studies). While much of the unreliability of past and present research is driven by small, underpowered studies, NHST with P values may also be particularly problematic in the era of overpowered big data. NHST and P values are optimal only in a minority of current research. Using a more stringent threshold, as in the recently proposed shift from P < 0.05 to P < 0.005, is a temporizing measure to contain the flood and death-by-significance. NHST and P values may be replaced in many fields by other, more fit-for-purpose, inferential methods. However, curtailing selection biases requires additional measures, beyond changes in inferential methods, and in particular reproducible research practices. Journal: The American Statistician Pages: 20-25 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1447512 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1447512 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:20-25 Template-Type: ReDIF-Article 1.0 Author-Name: Steven N. Goodman Author-X-Name-First: Steven N. Author-X-Name-Last: Goodman Title: Why is Getting Rid of P-Values So Hard? Musings on Science and Statistics Abstract: The current concerns about reproducibility have focused attention on proper use of statistics across the sciences. This gives statisticians an extraordinary opportunity to change what are widely regarded as statistical practices detrimental to the cause of good science. However, how that should be done is enormously complex, made more difficult by the balkanization of research methods and statistical traditions across scientific subdisciplines. Working within those sciences while also allying with science reform movements—operating simultaneously on the micro and macro levels—is the key to making lasting change in applied science. Journal: The American Statistician Pages: 26-30 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1558111 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1558111 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:26-30 Template-Type: ReDIF-Article 1.0 Author-Name: Raymond Hubbard Author-X-Name-First: Raymond Author-X-Name-Last: Hubbard Title: Will the ASA's Efforts to Improve Statistical Practice be Successful? Some Evidence to the Contrary Abstract: Recent efforts by the American Statistical Association to improve statistical practice, especially in countering the misuse and abuse of null hypothesis significance testing (NHST) and p-values, are to be welcomed. But will they be successful? The present study offers compelling evidence that this will be an extraordinarily difficult task.
Dramatic citation-count data on 25 articles and books severely critical of NHST's negative impact on good science, underlining that this issue was and is well known, did nothing to stem NHST's usage over the period 1960–2007. On the contrary, employment of NHST increased during this time. To be successful in this endeavor, as well as in restoring the relevance of the statistics profession to the scientific community in the 21st century, the ASA must be prepared to dispense detailed advice. This includes specifying those situations, if they can be identified, in which the p-value plays a clearly valuable role in data analysis and interpretation. The ASA might also consider a statement that recommends abandoning the use of p-values. Journal: The American Statistician Pages: 31-35 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1497540 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497540 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:31-35 Template-Type: ReDIF-Article 1.0 Author-Name: John L. Kmetz Author-X-Name-First: John L. Author-X-Name-Last: Kmetz Title: Correcting Corrupt Research: Recommendations for the Profession to Stop Misuse of p-Values Abstract: p-Values and Null Hypothesis Significance Testing (NHST), combined with a large number of institutional factors, jointly define the Generally Accepted Soft Social Science Publishing Process (GASSSPP) that is now dominant in the social sciences and is increasingly used elsewhere. The case against NHST and the GASSSPP has been abundantly articulated over past decades, and yet it continues to spread, supported by a large number of self-reinforcing institutional processes. In this article, the author presents a number of steps that may be taken, by individuals and through collaborative efforts, to counter the spread of this corruption by directly addressing the institutional forces. While individual efforts are indispensable to this undertaking, the author argues that these alone cannot succeed unless the institutional forces are also addressed. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 36-45 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518271 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518271 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:36-45 Template-Type: ReDIF-Article 1.0 Author-Name: Douglas W. Hubbard Author-X-Name-First: Douglas W. Author-X-Name-Last: Hubbard Author-Name: Alicia L. Carriquiry Author-X-Name-First: Alicia L. Author-X-Name-Last: Carriquiry Title: Quality Control for Scientific Research: Addressing Reproducibility, Responsiveness, and Relevance Abstract: Efforts to address a reproducibility crisis have generated several valid proposals for improving the quality of scientific research. We argue there is also need to address the separate but related issues of relevance and responsiveness. To address relevance, researchers must produce what decision makers actually need to inform investments and public policy—that is, the probability that a claim is true or the probability distribution of an effect size given the data. The term responsiveness refers to the irregularity and delay with which issues about the quality of research are brought to light.
Instead of relying on the good fortune that some motivated researchers will periodically conduct efforts to reveal potential shortcomings of published research, we could establish a continuous quality-control process for scientific research itself. Quality metrics could be designed through the application of this statistical process control to the research enterprise. We argue that one quality control metric—the probability that a research hypothesis is true—is required to address at least relevance and may also be part of the solution for improving responsiveness and reproducibility. This article proposes a “straw man” solution that could be the basis for implementing these improvements. As part of this solution, we propose one way to “bootstrap” priors. The processes required for improving reproducibility and relevance can also be part of a comprehensive statistical quality control for science itself by producing continuously monitored metrics of the scientific performance of a field of research. Journal: The American Statistician Pages: 46-55 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1543138 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543138 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:46-55 Template-Type: ReDIF-Article 1.0 Author-Name: Naomi C. Brownstein Author-X-Name-First: Naomi C. Author-X-Name-Last: Brownstein Author-Name: Thomas A. Louis Author-X-Name-First: Thomas A. Author-X-Name-Last: Louis Author-Name: Anthony O’Hagan Author-X-Name-First: Anthony Author-X-Name-Last: O’Hagan Author-Name: Jane Pendergast Author-X-Name-First: Jane Author-X-Name-Last: Pendergast Title: The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making Abstract: This article resulted from our participation in the session on the “role of expert opinion and judgment in statistical inference” at the October 2017 ASA Symposium on Statistical Inference. We present a strong, unified statement on the roles of expert judgment in statistics, with processes for obtaining input, whether from a Bayesian or frequentist perspective. Topics include the role of subjectivity in the cycle of scientific inference and decisions, followed by a clinical trial and a greenhouse gas emissions case study that illustrate the role of judgments and the importance of basing them on objective information and a comprehensive uncertainty assessment. We close with a call for increased proactivity and involvement of statisticians in study conceptualization, design, conduct, analysis, and communication. Journal: The American Statistician Pages: 56-68 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1529623 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529623 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:56-68 Template-Type: ReDIF-Article 1.0 Author-Name: Anthony O’Hagan Author-X-Name-First: Anthony Author-X-Name-Last: O’Hagan Title: Expert Knowledge Elicitation: Subjective but Scientific Abstract: Expert opinion and judgment enter into the practice of statistical inference and decision-making in numerous ways. Indeed, there is essentially no aspect of scientific investigation in which judgment is not required.
Judgment is necessarily subjective, but should be made as carefully, as objectively, and as scientifically as possible. Elicitation of expert knowledge concerning an uncertain quantity expresses that knowledge in the form of a (subjective) probability distribution for the quantity. Such distributions play an important role in statistical inference (for example as prior distributions in a Bayesian analysis) and in evidence-based decision-making (for example as expressions of uncertainty regarding inputs to a decision model). This article sets out a number of practices through which elicitation can be made as rigorous and scientific as possible. One such practice is to follow a recognized protocol that is designed to address and minimize the cognitive biases that experts are prone to when making probabilistic judgments. We review the leading protocols in the field, and contrast their different approaches to dealing with these biases through the medium of a detailed case study employing the SHELF protocol. The article ends with discussion of how to elicit a joint probability distribution for multiple uncertain quantities, which is a challenge for all the leading protocols. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 69-81 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518265 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518265 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:69-81 Template-Type: ReDIF-Article 1.0 Author-Name: Lee Kennedy-Shaffer Author-X-Name-First: Lee Author-X-Name-Last: Kennedy-Shaffer Title: Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing Abstract: As statisticians and scientists consider a world beyond p < 0.05, it is important to not lose sight of how we got to this point. Although significance testing and p-values are often presented as prescriptive procedures, they came about through a process of refinement and extension to other disciplines. Ronald A. Fisher and his contemporaries formalized these methods in the early twentieth century and Fisher’s 1925 Statistical Methods for Research Workers brought the techniques to experimentalists in a variety of disciplines. Understanding how these methods arose, spread, and were argued over since then illuminates how p < 0.05 came to be a standard for scientific inference, the advantage it offered at the time, and how it was interpreted. This historical perspective can inform the work of statisticians today by encouraging thoughtful consideration of how their work, including proposed alternatives to the p-value, will be perceived and used by scientists. And it can engage students more fully and encourage critical thinking rather than rote applications of formulae. Incorporating history enables students, practitioners, and statisticians to treat the discipline as an ongoing endeavor, crafted by fallible humans, and provides a deeper understanding of the subject and its consequences for science and society. Journal: The American Statistician Pages: 82-90 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1537891 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537891 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:82-90 Template-Type: ReDIF-Article 1.0 Author-Name: Raymond Hubbard Author-X-Name-First: Raymond Author-X-Name-Last: Hubbard Author-Name: Brian D. Haig Author-X-Name-First: Brian D. Author-X-Name-Last: Haig Author-Name: Rahul A. Parsa Author-X-Name-First: Rahul A. Author-X-Name-Last: Parsa Title: The Limited Role of Formal Statistical Inference in Scientific Inference Abstract: Such is the grip of formal methods of statistical inference—that is, frequentist methods for generalizing from sample to population in enumerative studies—in the drawing of scientific inferences that the two are routinely deemed equivalent in the social, management, and biomedical sciences. This, despite the fact that legitimate employment of said methods is difficult to implement on practical grounds alone. But supposing the adoption of these procedures were simple does not get us far; crucially, methods of formal statistical inference are ill-suited to the analysis of much scientific data. Even findings from randomized controlled trials, the claimed gold standard for such analysis, can be problematic. Scientific inference is a far broader concept than statistical inference. Its authority derives from the accumulation, over an extensive period of time, of both theoretical and empirical knowledge that has won the (provisional) acceptance of the scholarly community. A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession of users of statistical inference with reporting significant differences in data sets actively thwarts cumulative knowledge development. The manifold problems surrounding the implementation and usefulness of formal methods of statistical inference in advancing science do not speak well of much teaching in methods/statistics classes. Serious reflection on statistics' role in producing viable knowledge is needed. Commendably, the American Statistical Association is committed to addressing this challenge, as further witnessed in this special online, open access issue of The American Statistician. Journal: The American Statistician Pages: 91-98 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1464947 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1464947 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:91-98 Template-Type: ReDIF-Article 1.0 Author-Name: Blakeley B. McShane Author-X-Name-First: Blakeley B. Author-X-Name-Last: McShane Author-Name: Jennifer L. Tackett Author-X-Name-First: Jennifer L. Author-X-Name-Last: Tackett Author-Name: Ulf Böckenholt Author-X-Name-First: Ulf Author-X-Name-Last: Böckenholt Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Title: Large-Scale Replication Projects in Contemporary Psychological Research Abstract: Replication is complicated in psychological research because studies of a given psychological phenomenon can never be direct or exact replications of one another, and thus effect sizes vary from one study of the phenomenon to the next—an issue of clear importance for replication.
Current large-scale replication projects represent an important step forward for assessing replicability, but provide only limited information because they have thus far been designed in a manner such that heterogeneity either cannot be assessed or is intended to be eliminated. Consequently, the nontrivial degree of heterogeneity found in these projects represents a lower bound on the true degree of heterogeneity. We recommend enriching large-scale replication projects going forward by embracing heterogeneity. We argue this is the key to assessing replicability: if effect sizes are sufficiently heterogeneous—even if the sign of the effect is consistent—the phenomenon in question does not seem particularly replicable and the theory underlying it seems poorly constructed and in need of enrichment. Uncovering why effect sizes vary and revising theory in light of this will lead to improved theory that explains heterogeneity and increases replicability. Given this, large-scale replication projects can play an important role not only in assessing replicability but also in advancing theory. Journal: The American Statistician Pages: 99-105 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1505655 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505655 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:99-105 Template-Type: ReDIF-Article 1.0 Author-Name: Sander Greenland Author-X-Name-First: Sander Author-X-Name-Last: Greenland Title: Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values Abstract: The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: That P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not however properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p = 0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) −log2(p). Journal: The American Statistician Pages: 106-114 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1529625 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529625 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:106-114 Template-Type: ReDIF-Article 1.0 Author-Name: Rebecca A.
Author-X-Name-Last: Betensky Title: The p-Value Requires Context, Not a Threshold Abstract: It is widely recognized by statisticians, though not as widely by other researchers, that the p-value cannot be interpreted in isolation, but rather must be considered in the context of certain features of the design and substantive application, such as sample size and meaningful effect size. I consider the setting of the normal mean and highlight the information contained in the p-value in conjunction with the sample size and meaningful effect size. The p-value and sample size jointly yield 95% confidence bounds for the effect of interest, which can be compared to the predetermined meaningful effect size to make inferences about the true effect. I provide simple examples to demonstrate that although the p-value is calculated under the null hypothesis, and thus seemingly may be divorced from the features of the study from which it arises, its interpretation as a measure of evidence requires its contextualization within the study. This implies that any proposal for improved use of the p-value as a measure of the strength of evidence cannot simply be a change to the threshold for significance. Journal: The American Statistician Pages: 115-117 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1529624 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529624 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:115-117 Template-Type: ReDIF-Article 1.0 Author-Name: Andrew A. Anderson Author-X-Name-First: Andrew A. Author-X-Name-Last: Anderson Title: Assessing Statistical Results: Magnitude, Precision, and Model Uncertainty Abstract: Evaluating the importance and the strength of empirical evidence requires asking three questions: First, what are the practical implications of the findings? Second, how precise are the estimates? Confidence intervals provide an intuitive way to communicate precision. Although nontechnical audiences often misinterpret confidence intervals (CIs), I argue that the result is less dangerous than the misunderstandings that arise from hypothesis tests. Third, is the model correctly specified? The validity of point estimates and CIs depends on the soundness of the underlying model. Journal: The American Statistician Pages: 118-121 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1537889 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537889 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:118-121 Template-Type: ReDIF-Article 1.0 Author-Name: Joachim I. Krueger Author-X-Name-First: Joachim I. Author-X-Name-Last: Krueger Author-Name: Patrick R. Heck Author-X-Name-First: Patrick R. Author-X-Name-Last: Heck Title: Putting the P-Value in its Place Abstract: As the debate over best statistical practices continues in academic journals, conferences, and the blogosphere, working researchers (e.g., psychologists) need to figure out how much time and effort to invest in attending to experts' arguments, how to design their next project, and how to craft a sustainable long-term strategy for data analysis and inference. The present special issue of The American Statistician promises help. In this article, we offer a modest proposal for a continued and informed use of the conventional p-value without the pitfalls of statistical rituals. 
Other statistical indices should complement reporting, and extra-statistical (e.g., theoretical) judgments ought to be made with care and clarity. Journal: The American Statistician Pages: 122-128 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1470033 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1470033 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:122-128 Template-Type: ReDIF-Article 1.0 Author-Name: Valen E. Johnson Author-X-Name-First: Valen E. Author-X-Name-Last: Johnson Title: Evidence From Marginally Significant t Statistics Abstract: This article examines the evidence contained in t statistics that are marginally significant in 5% tests. The bases for evaluating evidence are likelihood ratios and integrated likelihood ratios, computed under a variety of assumptions regarding the alternative hypotheses in null hypothesis significance tests. Likelihood ratios and integrated likelihood ratios provide a useful measure of the evidence in favor of competing hypotheses because they can be interpreted as representing the ratio of the probabilities that each hypothesis assigns to observed data. When they are either very large or very small, they suggest that one hypothesis is much better than the other in predicting observed data. If they are close to 1.0, then both hypotheses provide approximately equally valid explanations for observed data. I find that p-values that are close to 0.05 (i.e., that are “marginally significant”) correspond to integrated likelihood ratios that are bounded by approximately 7 in two-sided tests, and by approximately 4 in one-sided tests. The modest magnitude of integrated likelihood ratios corresponding to p-values close to 0.05 clearly suggests that higher standards of evidence are needed to support claims of novel discoveries and new effects. Journal: The American Statistician Pages: 129-134 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518788 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518788 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:129-134 Template-Type: ReDIF-Article 1.0 Author-Name: D. A. S. Fraser Author-X-Name-First: D. A. S. Author-X-Name-Last: Fraser Title: The p-value Function and Statistical Inference Abstract: This article has two objectives. The first and narrower is to formalize the p-value function, which records all possible p-values, each corresponding to a value for whatever the scalar parameter of interest is for the problem at hand, and to show how this p-value function directly provides full inference information for any corresponding user or scientist. The p-value function provides familiar inference objects: significance levels, confidence intervals, critical values for fixed-level tests, and the power function at all values of the parameter of interest. It thus gives an immediate accurate and visual summary of inference information for the parameter of interest. We show that the p-value function of the key scalar interest parameter records the statistical position of the observed data relative to that parameter, and we then describe an accurate approximation to that p-value function which is readily constructed.
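In the simplest normal-mean setting, the p-value function Fraser describes can be drawn in a few lines of R; this is my illustration with invented summary numbers, not the article's accurate approximation.

    # Two-sided p-value function for a normal mean with known sigma.
    pvalue_fun <- function(theta, xbar, sigma, n)
      2 * pnorm(-abs((xbar - theta) / (sigma / sqrt(n))))
    theta <- seq(0.5, 2, length.out = 401)
    p <- pvalue_fun(theta, xbar = 1.2, sigma = 1, n = 30)  # invented data summary
    plot(theta, p, type = "l", xlab = "theta", ylab = "p-value")
    abline(h = 0.05, lty = 2)  # theta values with p > 0.05 form the 95% CI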
Journal: The American Statistician Pages: 135-147 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1556735 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1556735 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:135-147 Template-Type: ReDIF-Article 1.0 Author-Name: Jonathan Rougier Author-X-Name-First: Jonathan Author-X-Name-Last: Rougier Title: p-Values, Bayes Factors, and Sufficiency Abstract: Various approaches can be used to construct a model from a null distribution and a test statistic. I prove that one such approach, originating with D. R. Cox, has the property that the p-value is never greater than the Generalized Likelihood Ratio (GLR). When combined with the general result that the GLR is never greater than any Bayes factor, we conclude that, under Cox’s model, the p-value is never greater than any Bayes factor. I also provide a generalization, illustrations for the canonical Normal model, and an alternative approach based on sufficiency. This result is relevant for the ongoing discussion about the evidential value of small p-values, and the movement among statisticians to “redefine statistical significance.” Journal: The American Statistician Pages: 148-151 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1502684 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1502684 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:148-151 Template-Type: ReDIF-Article 1.0 Author-Name: Sherri Rose Author-X-Name-First: Sherri Author-X-Name-Last: Rose Author-Name: Thomas G. McGuire Author-X-Name-First: Thomas G. Author-X-Name-Last: McGuire Title: Limitations of P-Values and R-squared for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment Abstract: Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and R-squared, when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this “risk adjustment” system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and R-squared for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric. Journal: The American Statistician Pages: 152-156 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518269 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518269 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:152-156 Template-Type: ReDIF-Article 1.0 Author-Name: Jeffrey D. Blume Author-X-Name-First: Jeffrey D.
Author-X-Name-Last: Blume Author-Name: Robert A. Greevy Author-X-Name-First: Robert A. Author-X-Name-Last: Greevy Author-Name: Valerie F. Welty Author-X-Name-First: Valerie F. Author-X-Name-Last: Welty Author-Name: Jeffrey R. Smith Author-X-Name-First: Jeffrey R. Author-X-Name-Last: Smith Author-Name: William D. Dupont Author-X-Name-First: William D. Author-X-Name-Last: Dupont Title: An Introduction to Second-Generation p-Values Abstract: Second-generation p-values preserve the simplicity that has made p-values popular while resolving critical flaws that promote misinterpretation of data, distraction by trivial effects, and unreproducible assessments of data. The second-generation p-value (SGPV) is an extension that formally accounts for scientific relevance by using a composite null hypothesis that captures null and scientifically trivial effects. Because the majority of spurious findings are small effects that are technically nonnull but practically indistinguishable from the null, the second-generation approach greatly reduces the likelihood of a false discovery. SGPVs promote transparency, rigor and reproducibility of scientific results by a priori identifying which candidate hypotheses are practically meaningful and by providing a more reliable statistical summary of when the data are compatible with the candidate hypotheses or null hypotheses, or when the data are inconclusive. We illustrate the importance of these advances using a dataset of 247,000 single-nucleotide polymorphisms, i.e., genetic markers that are potentially associated with prostate cancer. Journal: The American Statistician Pages: 157-167 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1537893 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537893 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:157-167 Template-Type: ReDIF-Article 1.0 Author-Name: William M. Goodman Author-X-Name-First: William M. Author-X-Name-Last: Goodman Author-Name: Susan E. Spruill Author-X-Name-First: Susan E. Author-X-Name-Last: Spruill Author-Name: Eugene Komaroff Author-X-Name-First: Eugene Author-X-Name-Last: Komaroff Title: A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting its Use Abstract: When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a fire-storm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values nonetheless can provide evidential information toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent with the parameter’s true location for the sampled-from population in almost 70% of the cases. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used.
Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher’s judgment and experience. The simulation compares the performance of several methods, from p-value- and/or effect-size-based to confidence-interval-based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of a single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information. Journal: The American Statistician Pages: 168-185 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1564697 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564697 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:168-185 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel J. Benjamin Author-X-Name-First: Daniel J. Author-X-Name-Last: Benjamin Author-Name: James O. Berger Author-X-Name-First: James O. Author-X-Name-Last: Berger Title: Three Recommendations for Improving the Use of p-Values Abstract: Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values. Journal: The American Statistician Pages: 186-191 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1543135 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543135 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
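The “Bayes factor bound” invoked in the Benjamin and Berger abstract is the Sellke–Bayarri–Berger bound; in standard form (my rendering of the known result),

    \[
      \bar{B}(p) \;=\; \frac{1}{-\,e\,p\,\ln p}, \qquad p < 1/e,
    \]

so an observed p = 0.05 corresponds to odds of at most 1/(e × 0.05 × (−ln 0.05)) ≈ 2.5 to 1 in favor of the alternative, far weaker evidence than “1 in 20” might suggest.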
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:186-191 Template-Type: ReDIF-Article 1.0 Author-Name: David Colquhoun Author-X-Name-First: David Author-X-Name-Last: Colquhoun Title: The False Positive Risk: A Proposal Concerning What to Do About p-Values Abstract: It is widely acknowledged that the biomedical literature suffers from a surfeit of false positive results. Part of the reason for this is the persistence of the myth that observation of p < 0.05 is sufficient justification to claim that you have made a discovery. It is hopeless to expect users to change their reliance on p-values unless they are offered an alternative way of judging the reliability of their conclusions. If the alternative method is to have a chance of being adopted widely, it will have to be easy to understand and to calculate. One such proposal is based on calculation of false positive risk (FPR). It is suggested that p-values and confidence intervals should continue to be given, but that they should be supplemented by a single additional number that conveys the strength of the evidence better than the p-value. This number could be the minimum FPR (that calculated on the assumption of a prior probability of 0.5, the largest value that can be assumed in the absence of hard prior data). Alternatively one could specify the prior probability that it would be necessary to believe in order to achieve an FPR of, say, 0.05. Journal: The American Statistician Pages: 192-201 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1529622 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529622 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:192-201 Template-Type: ReDIF-Article 1.0 Author-Name: Robert A. J. Matthews Author-X-Name-First: Robert A. J. Author-X-Name-Last: Matthews Title: Moving Towards the Post p < 0.05 Era via the Analysis of Credibility Abstract: It is now widely accepted that the techniques of null hypothesis significance testing (NHST) are routinely misused and misinterpreted by researchers seeking insight from data. There is, however, no consensus on acceptable alternatives, leaving researchers with little choice but to continue using NHST, regardless of its failings. I examine the potential for the Analysis of Credibility (AnCred) to resolve this impasse. Using real-life examples, I assess the ability of AnCred to provide researchers with a simple but robust framework for assessing study findings that goes beyond the standard dichotomy of statistical significance/nonsignificance. By extracting more insight from standard summary statistics while offering more protection against inferential fallacies, AnCred may encourage researchers to move toward the post p < 0.05 era. Journal: The American Statistician Pages: 202-212 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1543136 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543136 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
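Colquhoun's false positive risk can be conveyed with the simple “p-less-than” screening calculation sketched below in R; his preferred “p-equals” calculation conditions on the observed p-value and gives larger values, so this only illustrates the general idea.

    # Screening-style false positive risk for a significant result.
    fpr <- function(alpha, power, prior)
      alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)
    fpr(alpha = 0.05, power = 0.80, prior = 0.5)   # about 0.06 with a 50:50 prior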
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:202-212 Template-Type: ReDIF-Article 1.0 Author-Name: Mark Andrew Gannon Author-X-Name-First: Mark Andrew Author-X-Name-Last: Gannon Author-Name: Carlos Alberto de Bragança Pereira Author-X-Name-First: Carlos Alberto Author-X-Name-Last: de Bragança Pereira Author-Name: Adriano Polpo Author-X-Name-First: Adriano Author-X-Name-Last: Polpo Title: Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels Abstract: This article argues that researchers do not need to completely abandon the p-value, the best-known significance index, but should instead stop using significance levels that do not depend on sample sizes. A testing procedure is developed using a mixture of frequentist and Bayesian tools, with a significance level that is a function of sample size, obtained from a generalized form of the Neyman–Pearson Lemma that minimizes a linear combination of α, the probability of rejecting a true null hypothesis, and β, the probability of failing to reject a false null, instead of fixing α and minimizing β. The resulting hypothesis tests do not violate the Likelihood Principle and do not require any constraints on the dimensionalities of the sample space and parameter space. The procedure includes an ordering of the entire sample space and uses predictive probability (density) functions, allowing for testing of both simple and compound hypotheses. Accessible examples are presented to highlight specific characteristics of the new tests. Journal: The American Statistician Pages: 213-222 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518268 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518268 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:213-222 Template-Type: ReDIF-Article 1.0 Author-Name: Stanley Pogrow Author-X-Name-First: Stanley Author-X-Name-Last: Pogrow Title: How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings Abstract: Relying on effect size as a measure of practical significance is turning out to be just as misleading as using p-values to determine the effectiveness of interventions for improving clinical practice in complex organizations such as schools. This article explains how effect sizes have misdirected practice in education and other disciplines. Even when effect size is incorporated into RCT research, the recommendations as to whether interventions are effective are misleading and generally useless to practitioners. As a result, a new criterion of practical benefit is recommended for evaluating research findings about the effectiveness of interventions in complex organizations where benchmarks of existing performance exist. Practical benefit exists when the unadjusted performance of an experimental group provides a noticeable advantage over an existing benchmark. Some basic principles for determining practical benefit are provided. Practical benefit is more intuitive and is expected to enable leaders to make more accurate assessments as to whether published research findings are likely to produce noticeable improvements in their organizations.
In addition, practical benefit is routinely used as the research criterion in improvement science, an alternative scientific methodology with an established track record of developing new interventions that dramatically improve practice more efficiently than RCT research does. Finally, the problems with practical significance suggest that the research community should seek different inferential methods for research designed to improve clinical performance in complex organizations, as compared to methods for testing theories and medicines. Journal: The American Statistician Pages: 223-234 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1549101 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1549101 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:223-234 Template-Type: ReDIF-Article 1.0 Author-Name: Blakeley B. McShane Author-X-Name-First: Blakeley B. Author-X-Name-Last: McShane Author-Name: David Gal Author-X-Name-First: David Author-X-Name-Last: Gal Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Author-Name: Christian Robert Author-X-Name-First: Christian Author-X-Name-Last: Robert Author-Name: Jennifer L. Tackett Author-X-Name-First: Jennifer L. Author-X-Name-Last: Tackett Title: Abandon Statistical Significance Abstract: We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly. Journal: The American Statistician Pages: 235-245 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1527253 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1527253 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:235-245 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher Tong Author-X-Name-First: Christopher Author-X-Name-Last: Tong Title: Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science Abstract: Scientific research of all kinds should be guided by statistical thinking: in the design and conduct of the study, in the disciplined exploration and enlightened display of the data, and to avoid statistical pitfalls in the interpretation of the results. However, formal, probability-based statistical inference should play no role in most scientific research, which is inherently exploratory, requiring flexible methods of analysis that inherently risk overfitting. The nature of exploratory work is that data are used to help guide model choice, and under these circumstances, uncertainty cannot be precisely quantified, because of the inevitable model selection bias that results. To be valid, statistical inference should be restricted to situations where the study design and analysis plan are specified prior to data collection. Exploratory data analysis provides the flexibility needed for most other situations, including statistical methods that are regularized, robust, or nonparametric. Of course, no individual statistical analysis should be considered sufficient to establish scientific validity: research requires many sets of data along many lines of evidence, with a watchfulness for systematic error. Replicating and predicting findings in new data and new settings is a stronger way of validating claims than blessing results from an isolated study with statistical inferences. Journal: The American Statistician Pages: 246-261 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518264 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518264 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:246-261 Template-Type: ReDIF-Article 1.0 Author-Name: Valentin Amrhein Author-X-Name-First: Valentin Author-X-Name-Last: Amrhein Author-Name: David Trafimow Author-X-Name-First: David Author-X-Name-Last: Trafimow Author-Name: Sander Greenland Author-X-Name-First: Sander Author-X-Name-Last: Greenland Title: Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication Abstract: Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a “replication crisis” may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. 
Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as providing generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, what data were obtained, what analysis methods were used and why, and what output those methods produced. Journal: The American Statistician Pages: 262-270 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1543137 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543137 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:262-270 Template-Type: ReDIF-Article 1.0 Author-Name: Robert J. Calin-Jageman Author-X-Name-First: Robert J. Author-X-Name-Last: Calin-Jageman Author-Name: Geoff Cumming Author-X-Name-First: Geoff Author-X-Name-Last: Cumming Title: The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else Is Known Abstract: The “New Statistics” emphasizes effect sizes, confidence intervals, meta-analysis, and the use of Open Science practices. We present three specific ways in which a New Statistics approach can help improve scientific practice: by reducing overconfidence in small samples, by reducing confirmation bias, and by fostering more cautious judgments of consistency. We illustrate these points through consideration of the literature on oxytocin and human trust, a research area that typifies some of the endemic problems that arise with poor statistical practice. Journal: The American Statistician Pages: 271-280 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518266 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518266 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:271-280 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen T. Ziliak Author-X-Name-First: Stephen T. Author-X-Name-Last: Ziliak Title: How Large Are Your G-Values? Try Gosset’s Guinnessometrics When a Little “p” Is Not Enough Abstract: A crisis of validity has emerged from three related crises of science, that is, the crises of statistical significance and complete randomization, of replication, and of reproducibility. Guinnessometrics takes commonplace assumptions and methods of statistical science and stands them on their head, from little p-values to unstructured Big Data. Guinnessometrics focuses instead on the substantive significance which emerges from a small series of independent and economical yet balanced and repeated experiments. Originally developed and market-tested by William S. Gosset aka “Student” in his job as Head Experimental Brewer at the Guinness Brewery in Dublin, Gosset’s economic and common sense approach to statistical inference and scientific method has been unwisely neglected. 
In many areas of science and life, the 10 principles of Guinnessometrics or G-values outlined here can help. Other things equal, the larger the G-values, the better the science and judgment. By now a colleague, neighbor, or YouTube junkie has probably shown you one of those wacky psychology experiments in a video involving a gorilla, and testing the limits of human cognition. In one video, a person wearing a gorilla suit suddenly appears on the scene among humans, who are themselves engaged in some ordinary, mundane activity such as passing a basketball. The funny thing is, prankster researchers have discovered, when observers are asked to think about the mundane activity (such as by counting the number of observed passes of a basketball), the unexpected gorilla is frequently unseen (for discussion see Kahneman 2011). The gorilla is invisible. People don’t see it. Journal: The American Statistician Pages: 281-290 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1514325 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1514325 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:281-290 Template-Type: ReDIF-Article 1.0 Author-Name: Dean Billheimer Author-X-Name-First: Dean Author-X-Name-Last: Billheimer Title: Predictive Inference and Scientific Reproducibility Abstract: Most statistical analyses use hypothesis tests or estimation about parameters to form inferential conclusions. I think this is noble, but misguided. The point of view expressed here is that observables are fundamental, and that the goal of statistical modeling should be to predict future observations, given the current data and other relevant information. Further, the prediction of future observables provides multiple advantages to practicing scientists, and to science in general. These include an interpretable numerical summary of a quantity of direct interest to current and future researchers, a calibrated prediction of what’s likely to happen in future experiments, a prediction that can be either “corroborated” or “refuted” through experimentation, and avoidance of inference about parameters, quantities that exist only as convenient indices of hypothetical distributions. Finally, the predictive probability of a future observable can be used as a standard for communicating the reliability of the current work, regardless of whether confirmatory experiments are conducted. Adoption of this paradigm would improve our rigor for scientific accuracy and reproducibility by shifting our focus from “finding differences” among hypothetical parameters to predicting observable events based on our current scientific understanding. Journal: The American Statistician Pages: 291-295 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1518270 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518270 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:291-295 Template-Type: ReDIF-Article 1.0 Author-Name: Charles F. Manski Author-X-Name-First: Charles F. Author-X-Name-Last: Manski Title: Treatment Choice With Trial Data: Statistical Decision Theory Should Supplant Hypothesis Testing Abstract: A central objective of empirical research on treatment response is to inform treatment choice. Unfortunately, researchers commonly use concepts of statistical inference whose foundations are distant from the problem of treatment choice.
It has been particularly common to use hypothesis tests to compare treatments. Wald’s development of statistical decision theory provides a coherent frequentist framework for use of sample data on treatment response to make treatment decisions. A body of recent research applies statistical decision theory to characterize uniformly satisfactory treatment choices, in the sense of small maximum loss relative to optimal decisions (also known as maximum regret). This article describes the basic ideas and findings, which provide an appealing practical alternative to use of hypothesis tests. For simplicity, the article focuses on medical treatment with evidence from classical randomized clinical trials. The ideas apply generally, encompassing use of observational data and treatment choice in nonmedical contexts. Journal: The American Statistician Pages: 296-304 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1513377 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1513377 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:296-304 Template-Type: ReDIF-Article 1.0 Author-Name: Charles F. Manski Author-X-Name-First: Charles F. Author-X-Name-Last: Manski Author-Name: Aleksey Tetenov Author-X-Name-First: Aleksey Author-X-Name-Last: Tetenov Title: Trial Size for Near-Optimal Choice Between Surveillance and Aggressive Treatment: Reconsidering MSLT-II Abstract: A convention in designing randomized clinical trials has been to choose sample sizes that yield specified statistical power when testing hypotheses about treatment response. Manski and Tetenov recently critiqued this convention and proposed enrollment of sufficiently many subjects to enable near-optimal treatment choices. This article develops a refined version of that analysis applicable to trials comparing aggressive treatment of patients with surveillance. The need for a refined analysis arises because the earlier work assumed that there is only a primary health outcome of interest, without secondary outcomes. An important aspect of choice between surveillance and aggressive treatment is that the latter may have side effects. One should then consider how the primary outcome and side effects jointly determine patient welfare. This requires new analysis of sample design. As a case study, we reconsider a trial comparing nodal observation and lymph node dissection when treating patients with cutaneous melanoma. Using a statistical power calculation, the investigators assigned 971 patients to dissection and 968 to observation. We conclude that assigning 244 patients to each option would yield findings that enable suitably near-optimal treatment choice. Thus, a much smaller sample size would have sufficed to inform clinical practice. Journal: The American Statistician Pages: 305-311 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1543617 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543617 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:305-311 Template-Type: ReDIF-Article 1.0 Author-Name: Michael Lavine Author-X-Name-First: Michael Author-X-Name-Last: Lavine Title: Frequentist, Bayes, or Other? Abstract: Both philosophically and in practice, statistics is dominated by frequentist and Bayesian thinking.
Under those paradigms, our courses and textbooks talk about the accuracy with which true model parameters are estimated or the posterior probability that they lie in a given set. In nonparametric problems, they talk about convergence to the true function (density, regression, etc.) or the probability that the true function lies in a given set. But the usual paradigms’ focus on learning the true model and parameters can distract the analyst from another important task: discovering whether there are many sets of models and parameters that describe the data reasonably well. When we discover many good models, we can see in what ways they agree. Points of agreement give us more confidence in our inferences, but points of disagreement give us less. Further, the usual paradigms’ focus seduces us into judging and adopting procedures according to how well they learn the true values. An alternative is to judge models and parameter values, not procedures, and judge them by how well they describe data, not how close they come to the truth. The latter is especially appealing in problems without a true model. Journal: The American Statistician Pages: 312-318 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1459317 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459317 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:312-318 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen J. Ruberg Author-X-Name-First: Stephen J. Author-X-Name-Last: Ruberg Author-Name: Frank E. Harrell Author-X-Name-First: Frank E. Author-X-Name-Last: Harrell Author-Name: Margaret Gamalo-Siebers Author-X-Name-First: Margaret Author-X-Name-Last: Gamalo-Siebers Author-Name: Lisa LaVange Author-X-Name-First: Lisa Author-X-Name-Last: LaVange Author-Name: J. Jack Lee Author-X-Name-First: J. Jack Author-X-Name-Last: Lee Author-Name: Karen Price Author-X-Name-First: Karen Author-X-Name-Last: Price Author-Name: Carl Peck Author-X-Name-First: Carl Author-X-Name-Last: Peck Title: Inference and Decision Making for 21st-Century Drug Development and Approval Abstract: The cost and time of pharmaceutical drug development continue to grow at rates that many say are unsustainable. These trends have enormous impact on what treatments get to patients, when they get them and how they are used. The statistical framework for supporting decisions in regulated clinical development of new medicines has followed a traditional path of frequentist methodology. Trials using hypothesis tests of “no treatment effect” are done routinely, and the p-value < 0.05 is often the determinant of what constitutes a “successful” trial. Many drugs fail in clinical development, adding to the cost of new medicines, and some evidence points blame at the deficiencies of the frequentist paradigm. An unknown number of effective medicines may have been abandoned because trials were declared “unsuccessful” due to a p-value exceeding 0.05. Recently, the Bayesian paradigm has shown utility in the clinical drug development process for its probability-based inference. We argue for a Bayesian approach that employs data from other trials as a “prior” for Phase 3 trials so that synthesized evidence across trials can be utilized to compute probability statements that are valuable for understanding the magnitude of treatment effect. Such a Bayesian paradigm provides a promising framework for improving statistical inference and regulatory decision making.
Journal: The American Statistician Pages: 319-327 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2019.1566091 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1566091 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:319-327 Template-Type: ReDIF-Article 1.0 Author-Name: Noah N. N. van Dongen Author-X-Name-First: Noah N. N. Author-X-Name-Last: van Dongen Author-Name: Johnny B. van Doorn Author-X-Name-First: Johnny B. Author-X-Name-Last: van Doorn Author-Name: Quentin F. Gronau Author-X-Name-First: Quentin F. Author-X-Name-Last: Gronau Author-Name: Don van Ravenzwaaij Author-X-Name-First: Don Author-X-Name-Last: van Ravenzwaaij Author-Name: Rink Hoekstra Author-X-Name-First: Rink Author-X-Name-Last: Hoekstra Author-Name: Matthias N. Haucke Author-X-Name-First: Matthias N. Author-X-Name-Last: Haucke Author-Name: Daniel Lakens Author-X-Name-First: Daniel Author-X-Name-Last: Lakens Author-Name: Christian Hennig Author-X-Name-First: Christian Author-X-Name-Last: Hennig Author-Name: Richard D. Morey Author-X-Name-First: Richard D. Author-X-Name-Last: Morey Author-Name: Saskia Homer Author-X-Name-First: Saskia Author-X-Name-Last: Homer Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Author-Name: Jan Sprenger Author-X-Name-First: Jan Author-X-Name-Last: Sprenger Author-Name: Eric-Jan Wagenmakers Author-X-Name-First: Eric-Jan Author-X-Name-Last: Wagenmakers Title: Multiple Perspectives on Inference for Two Simple Statistical Scenarios Abstract: When data analysts operate within different statistical frameworks (e.g., frequentist versus Bayesian, emphasis on estimation versus emphasis on testing), how does this impact the qualitative conclusions that are drawn for real data? To study this question empirically we selected from the literature two simple scenarios—involving a comparison of two proportions and a Pearson correlation—and asked four teams of statisticians to provide a concise analysis and a qualitative interpretation of the outcome. The results showed considerable overall agreement; nevertheless, this agreement did not appear to diminish the intensity of the subsequent debate over which statistical framework is more appropriate to address the questions at hand. Journal: The American Statistician Pages: 328-339 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2019.1565553 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1565553 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:328-339 Template-Type: ReDIF-Article 1.0 Author-Name: David Trafimow Author-X-Name-First: David Author-X-Name-Last: Trafimow Title: Five Nonobvious Changes in Editorial Practice for Editors and Reviewers to Consider When Evaluating Submissions in a Post p < 0.05 Universe Abstract: The American Statistical Association’s Symposium on Statistical Inference (SSI) included a session on how editorial practices should change in a universe no longer dominated by null hypothesis significance testing (NHST). The underlying assumptions were first, that NHST is problematic; and second, that editorial practices really should change. The present article is based on my talk in this session, and on these assumptions. 
Consistent with the spirit of the SSI, my focus is not on what reviewers and editors should not do (e.g., NHST) but rather on what they should do, with an emphasis on changes that are not obvious. The recommended changes include a wider consideration of the nature of the contribution than submitted manuscripts usually receive; a greater tolerance of ambiguity; more of an emphasis on the thinking and execution of the study, with a decreased emphasis on the findings; replacing NHST with the a priori procedure; and a call for reviewers and editors to recognize that there are many cases where the basic assumptions of inferential statistical procedures simply are not met, and that inferential statistics (even the a priori procedure) may consequently be inappropriate. Journal: The American Statistician Pages: 340-345 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1537888 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537888 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:340-345 Template-Type: ReDIF-Article 1.0 Author-Name: Joseph J. Locascio Author-X-Name-First: Joseph J. Author-X-Name-Last: Locascio Title: The Impact of Results Blind Science Publishing on Statistical Consultation and Collaboration Abstract: The author has previously proposed results blind manuscript evaluation (RBME) as a method of ameliorating often cited problems of statistical inference and scientific publication, notably publication bias, overuse/misuse of null hypothesis significance testing (NHST), and irreproducibility of reported scientific results. In RBME, manuscripts submitted to scientific journals are assessed for suitability for publication without regard to their reported results. Criteria for publication are based exclusively on the substantive importance of the research question addressed in the study, conveyed in the Introduction section of the manuscript, and the quality of the methodology, as reported in the Methods section. Practically, this policy is implemented by a two-stage process whereby the editor initially distributes only the Introduction and Methods sections of a submitted manuscript to reviewers and a provisional decision regarding acceptance is made, followed by a second stage in which the complete manuscript is distributed for review but only if the decision of the first stage is for acceptance. The present paper expands upon this recommendation by addressing implications of this proposed policy with respect to statistical consultation and collaboration in research. It is suggested that under RBME, statisticians will become more integrated into research endeavors and called upon sooner for their input. Journal: The American Statistician Pages: 346-351 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1505658 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505658 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:346-351 Template-Type: ReDIF-Article 1.0 Author-Name: Stuart H. Hurlbert Author-X-Name-First: Stuart H. Author-X-Name-Last: Hurlbert Author-Name: Richard A. Levine Author-X-Name-First: Richard A.
Author-X-Name-Last: Levine Author-Name: Jessica Utts Author-X-Name-First: Jessica Author-X-Name-Last: Utts Title: Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires Abstract: Many controversies in statistics are due primarily or solely to poor quality control in journals, bad statistical textbooks, bad teaching, unclear writing, and lack of knowledge of the historical literature. One way to improve the practice of statistics and resolve these issues is to do what initiators of the 2016 ASA statement did: take one issue at a time, have extensive discussions about the issue among statisticians of diverse backgrounds and perspectives and eventually develop and publish a broadly supported consensus on that issue. Upon completion of this task, we then move on to deal with another core issue in the same way. We propose as the next project a process that might lead quickly to a strong consensus that the term “statistically significant” and all its cognates and symbolic adjuncts be disallowed in the scientific literature except where focus is on the history of statistics and its philosophies and methodologies. Calculation and presentation of accurate p-values will often remain highly desirable though not obligatory. Supplementary materials for this article are available online in the form of an appendix listing the names and institutions of 48 other statisticians and scientists who endorse the principal propositions put forward here. Journal: The American Statistician Pages: 352-357 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1543616 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543616 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:352-357 Template-Type: ReDIF-Article 1.0 Author-Name: Harlan Campbell Author-X-Name-First: Harlan Author-X-Name-Last: Campbell Author-Name: Paul Gustafson Author-X-Name-First: Paul Author-X-Name-Last: Gustafson Title: The World of Research Has Gone Berserk: Modeling the Consequences of Requiring “Greater Statistical Stringency” for Scientific Publication Abstract: In response to growing concern about the reliability and reproducibility of published science, researchers have proposed adopting measures of “greater statistical stringency,” including suggestions to require larger sample sizes and to lower the highly criticized “p < 0.05” significance threshold. While pros and cons are vigorously debated, there has been little to no modeling of how adopting these measures might affect what type of science is published. In this article, we develop a novel optimality model that, given current incentives to publish, predicts a researcher’s most rational use of resources in terms of the number of studies to undertake, the statistical power to devote to each study, and the desirable prestudy odds to pursue. We then develop a methodology that allows one to estimate the reliability of published research by considering a distribution of preferred research strategies. Using this approach, we investigate the merits of adopting measures of “greater statistical stringency” with the goal of informing the ongoing debate. Journal: The American Statistician Pages: 358-373 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1555101 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1555101 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:358-373 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald D. Fricker Author-X-Name-First: Ronald D. Author-X-Name-Last: Fricker Author-Name: Katherine Burke Author-X-Name-First: Katherine Author-X-Name-Last: Burke Author-Name: Xiaoyan Han Author-X-Name-First: Xiaoyan Author-X-Name-Last: Han Author-Name: William H. Woodall Author-X-Name-First: William H. Author-X-Name-Last: Woodall Title: Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban Abstract: In this article, we assess the 31 articles published in Basic and Applied Social Psychology (BASP) in 2016, which is one full year after the BASP editors banned the use of inferential statistics. We discuss how the authors collected their data, how they reported and summarized their data, and how they used their data to reach conclusions. We found multiple instances of authors overstating conclusions beyond what the data would support if statistical significance had been considered. Readers would be largely unable to recognize this because the necessary information to do so was not readily available. Journal: The American Statistician Pages: 374-384 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1537892 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537892 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:374-384 Template-Type: ReDIF-Article 1.0 Author-Name: Karsten Maurer Author-X-Name-First: Karsten Author-X-Name-Last: Maurer Author-Name: Lynette Hudiburgh Author-X-Name-First: Lynette Author-X-Name-Last: Hudiburgh Author-Name: Lisa Werwinski Author-X-Name-First: Lisa Author-X-Name-Last: Werwinski Author-Name: John Bailer Author-X-Name-First: John Author-X-Name-Last: Bailer Title: Content Audit for p-value Principles in Introductory Statistics Abstract: Longstanding concerns with the role and interpretation of p-values in statistical practice prompted the American Statistical Association (ASA) to make a statement on p-values. The ASA statement spurred a flurry of responses and discussions by statisticians, with many wondering about the steps necessary to expand the adoption of these principles. Introductory statistics classrooms are key locations to introduce and emphasize the nuance related to p-values; in part because they engrain appropriate analysis choices at the earliest stages of statistics education, and also because they reach the broadest group of students. We propose a framework for statistics departments to conduct a content audit for p-value principles in their introductory curriculum. We then discuss the process and results from applying this course audit framework within our own statistics department. We also recommend meeting with client departments as a complement to the course audit. Discussions about analyses and practices common to particular fields can help to evaluate if our service courses are meeting the needs of client departments and to identify what is needed in our introductory courses to combat the misunderstanding and future misuse of p-values. Journal: The American Statistician Pages: 385-391 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1537890 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537890 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:385-391 Template-Type: ReDIF-Article 1.0 Author-Name: E. Ashley Steel Author-X-Name-First: E. Ashley Author-X-Name-Last: Steel Author-Name: Martin Liermann Author-X-Name-First: Martin Author-X-Name-Last: Liermann Author-Name: Peter Guttorp Author-X-Name-First: Peter Author-X-Name-Last: Guttorp Title: Beyond Calculations: A Course in Statistical Thinking Abstract: Statisticians are in general agreement that there are flaws in how science is currently practiced; there is less agreement in how to make repairs. Our prescription for a Post-p < 0.05 Era is to develop and teach courses that expand our view of what constitutes the domain of statistics and thereby bridge undergraduate statistics coursework and the graduate student experience of applying statistics in research. Such courses can speed up the process of gaining statistical wisdom by giving students insight into the human propensity to make statistical errors, the meaning of a single test within a research project, ways in which p-values work and don't work as expected, the role of statistics in the lifecycle of science, and best practices for statistical communication. The course we have developed follows the story of how we use data to understand the world, leveraging simulation-based approaches to perform customized analyses and evaluate the behavior of statistical procedures. We provide ideas for expanding beyond the traditional classroom, two example activities, and a course syllabus as well as the set of statistical best practices for creating and consuming scientific information that we develop during the course. Journal: The American Statistician Pages: 392-401 Issue: S1 Volume: 73 Year: 2019 Month: 3 X-DOI: 10.1080/00031305.2018.1505657 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505657 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:392-401 Template-Type: ReDIF-Article 1.0 Author-Name: Johnny van Doorn Author-X-Name-First: Johnny Author-X-Name-Last: van Doorn Author-Name: Alexander Ly Author-X-Name-First: Alexander Author-X-Name-Last: Ly Author-Name: Maarten Marsman Author-X-Name-First: Maarten Author-X-Name-Last: Marsman Author-Name: Eric-Jan Wagenmakers Author-X-Name-First: Eric-Jan Author-X-Name-Last: Wagenmakers Title: Bayesian Inference for Kendall’s Rank Correlation Coefficient Abstract: This article outlines a Bayesian methodology to estimate and test the Kendall rank correlation coefficient τ. The nonparametric nature of rank data implies the absence of a generative model and the lack of an explicit likelihood function. These challenges can be overcome by modeling test statistics rather than data. We also introduce a method for obtaining a default prior distribution. The combined result is an inferential methodology that yields a posterior distribution for Kendall’s τ. Journal: The American Statistician Pages: 303-308 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2016.1264998 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264998 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:303-308 Template-Type: ReDIF-Article 1.0 Author-Name: Agnan Kessy Author-X-Name-First: Agnan Author-X-Name-Last: Kessy Author-Name: Alex Lewin Author-X-Name-First: Alex Author-X-Name-Last: Lewin Author-Name: Korbinian Strimmer Author-X-Name-First: Korbinian Author-X-Name-Last: Strimmer Title: Optimal Whitening and Decorrelation Abstract: Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example, based on principal component analysis (PCA), Cholesky matrix decomposition, and zero-phase component analysis (ZCA), among others. Here, we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows one to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables. Journal: The American Statistician Pages: 309-314 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2016.1277159 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277159 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:309-314 Template-Type: ReDIF-Article 1.0 Author-Name: Weizhen Wang Author-X-Name-First: Weizhen Author-X-Name-Last: Wang Title: A “Paradox” in Confidence Interval Construction Using Sufficient Statistics Abstract: Statistical inference about parameters should depend on raw data only through sufficient statistics—the well-known sufficiency principle. In particular, inference should depend on minimal sufficient statistics if these are simpler than the raw data. In this article, we construct one-sided confidence intervals for a proportion which: (i) depend on the raw binary data, and (ii) are uniformly shorter than the smallest intervals based on the binomial random variable—a minimal sufficient statistic. In practice, randomized confidence intervals are seldom used. The proposed intervals violate the aforementioned principle if the search of optimal intervals is restricted within the class of nonrandomized confidence intervals. Similar results occur for other discrete distributions. Journal: The American Statistician Pages: 315-320 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1305292 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305292 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:315-320 Template-Type: ReDIF-Article 1.0 Author-Name: Michael Harwell Author-X-Name-First: Michael Author-X-Name-Last: Harwell Author-Name: Nidhi Kohli Author-X-Name-First: Nidhi Author-X-Name-Last: Kohli Author-Name: Yadira Peralta-Torres Author-X-Name-First: Yadira Author-X-Name-Last: Peralta-Torres Title: A Survey of Reporting Practices of Computer Simulation Studies in Statistical Research Abstract: Computer simulation studies represent an important tool for investigating processes difficult or impossible to study using mathematical theory or real data. Hoaglin and Andrews recommended these studies be treated as statistical sampling experiments subject to established principles of design and data analysis, but the survey of Hauck and Anderson suggested these recommendations had, at that point in time, generally been ignored. We update the survey results of Hauck and Anderson using a sample of studies applying simulation methods in statistical research to assess the extent to which the recommendations of Hoaglin and Andrews and others for conducting simulation studies have been adopted. The important role of statistical applications of computer simulation studies in enhancing the reproducibility of scientific findings is also discussed. The results speak to the state of the art and the extent to which these studies are realizing their potential to inform statistical practice and a program of statistical research. Journal: The American Statistician Pages: 321-327 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1342692 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1342692 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:321-327 Template-Type: ReDIF-Article 1.0 Author-Name: Luke A. Prendergast Author-X-Name-First: Luke A. Author-X-Name-Last: Prendergast Author-Name: Robert G. Staudte Author-X-Name-First: Robert G. Author-X-Name-Last: Staudte Title: A Simple and Effective Inequality Measure Abstract: Ratios of quantiles are often computed for income distributions as rough measures of inequality, and inference for such ratios has recently become available. The case in which the quantiles are symmetrically chosen, that is, when the p/2 quantile is divided by the (1 − p/2) quantile, is of special interest because the graph of such ratios, plotted as a function of p over the unit interval, yields an informative inequality curve. The area above the curve and less than the horizontal line at one is an easily interpretable measure of inequality. The advantages of these concepts over the traditional Lorenz curve and Gini coefficient are numerous: they are defined for all positive income distributions, they can be robustly estimated, and large-sample confidence intervals for the inequality coefficient are easily found. Moreover, the inequality curves satisfy a median-based transference principle and are convex for many commonly assumed income distributions. Journal: The American Statistician Pages: 328-343 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1366366 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1366366 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:328-343 Template-Type: ReDIF-Article 1.0 Author-Name: Jonathan D. Rosenblatt Author-X-Name-First: Jonathan D.
Author-X-Name-Last: Rosenblatt Author-Name: Yoav Benjamini Author-X-Name-First: Yoav Author-X-Name-Last: Benjamini Title: On Mixture Alternatives and Wilcoxon’s Signed-Rank Test Abstract: The shift alternative model has been the canonical alternative hypothesis since the early days of statistics. This holds true both in parametric and nonparametric statistical testing. In this contribution, we argue that in several applications of interest, the shift alternative is dubious while a mixture alternative is more plausible, because the treatment is expected to affect only a subpopulation. When considering mixture hypotheses, classical tests may no longer enjoy their desirable properties. In particular, we show that the t-test may be underpowered compared to Wilcoxon’s signed-rank test, even under a Gaussian null. We consider implications to personalized medicine and medical imaging. Journal: The American Statistician Pages: 344-347 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1360795 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1360795 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:344-347 Template-Type: ReDIF-Article 1.0 Author-Name: M. L. Walker Author-X-Name-First: M. L. Author-X-Name-Last: Walker Author-Name: Y. H. Dovoedo Author-X-Name-First: Y. H. Author-X-Name-Last: Dovoedo Author-Name: S. Chakraborti Author-X-Name-First: S. Author-X-Name-Last: Chakraborti Author-Name: C. W. Hilton Author-X-Name-First: C. W. Author-X-Name-Last: Hilton Title: An Improved Boxplot for Univariate Data Abstract: The boxplot is an effective data-visualization tool useful in diverse applications and disciplines. Although more sophisticated graphical methods exist, the boxplot remains relevant due to its simplicity, interpretability, and usefulness, even in the age of big data. This article highlights the origins and developments of the boxplot that is now widely viewed as an industry standard as well as its inherent limitations when dealing with data from skewed distributions, particularly when detecting outliers. The proposed Ratio-Skewed boxplot is shown to be practical and suitable for outlier labeling across several parametric distributions. Journal: The American Statistician Pages: 348-353 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2018.1448891 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448891 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:348-353 Template-Type: ReDIF-Article 1.0 Author-Name: Sherri Cheng Author-X-Name-First: Sherri Author-X-Name-Last: Cheng Author-Name: Mark Ferris Author-X-Name-First: Mark Author-X-Name-Last: Ferris Author-Name: Jessica Perolio Author-X-Name-First: Jessica Author-X-Name-Last: Perolio Title: An Innovative Classroom Approach for Developing Critical Thinkers in the Introductory Statistics Course Abstract: Misrepresented data and data taken out of context can be misleading at best. Statisticians present data to compel arguments, and they have a responsibility to be balanced and transparent in their use of evidence. In the classroom, learning how to analyze, interpret, and report data also needs to include explicit training in critical thinking skills, in which students explore the importance of context, assumptions, and bias. 
With this in mind, we integrate an innovative, multi-faceted pedagogical approach into an introductory statistics course, which incorporates writing assignments, small group discussion, and Socratic dialog. Our approach provides real-life applications for traditional statistical topics while also helping students learn to use data with integrity, ask important questions, and view problems holistically. Journal: The American Statistician Pages: 354-358 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1305293 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305293 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:354-358 Template-Type: ReDIF-Article 1.0 Author-Name: Alan C. Elliott Author-X-Name-First: Alan C. Author-X-Name-Last: Elliott Author-Name: S. Lynne Stokes Author-X-Name-First: S. Lynne Author-X-Name-Last: Stokes Author-Name: Jing Cao Author-X-Name-First: Jing Author-X-Name-Last: Cao Title: Teaching Ethics in a Statistics Curriculum with a Cross-Cultural Emphasis Abstract: Like most professional disciplines, the ASA has adopted ethical guidelines for its practitioners. To promote these guidelines, as well as to meet governmental and institutional mandates, U.S. universities are demanding more training on ethics within existing statistics graduate student curricula. Most of this training is based on the teachings of Western philosophers. However, many statistics graduate students are from Eastern cultures (particularly Chinese), and cultural and linguistic evidence indicates that Western ethics may be difficult to translate into the philosophical concepts common to students from different cultural backgrounds. This article describes how to teach cross-cultural ethics, with emphasis on the ASA Ethical Guidelines, within a graduate-level statistical consulting course. In particular, we present content that can help students overcome cultural and language barriers to gain an understanding of ethical decision-making that is compatible with both Western and Eastern philosophical models. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 359-367 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1307140 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1307140 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:359-367 Template-Type: ReDIF-Article 1.0 Author-Name: Julian Stander Author-X-Name-First: Julian Author-X-Name-Last: Stander Author-Name: Luciana Dalla Valle Author-X-Name-First: Luciana Author-X-Name-Last: Dalla Valle Author-Name: Mario Cortina-Borja Author-X-Name-First: Mario Author-X-Name-Last: Cortina-Borja Title: A Bayesian Survival Analysis of a Historical Dataset: How Long Do Popes Live? Abstract: University courses in statistical modeling often place great emphasis on methodological theory, illustrating it only briefly by means of limited and repeatedly used standard examples. Unfortunately, this approach often fails to actively engage and motivate students in their learning process. The teaching of statistical topics such as Bayesian survival analysis can be enhanced by focusing on innovative applications. Here, we discuss the visualization and modeling of a dataset of historical events comprising the post-election survival times of popes. 
Inference, prediction, and model checking are performed in the Bayesian framework, with comparisons being made with the frequentist approach. Further opportunities for similar statistical investigations are outlined. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 368-375 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1328374 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1328374 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:368-375 Template-Type: ReDIF-Article 1.0 Author-Name: Simon Demers Author-X-Name-First: Simon Author-X-Name-Last: Demers Title: Taylor's Law Holds for Finite OEIS Integer Sequences and Binomial Coefficients Abstract: Taylor's law (TL) predicts that the variance and the mean will be related empirically through a power-law function. TL previously has been shown to arise even in the absence of biological, ecological or physical processes. We report here that the mean and variance of 110 finite integer sequences in the On-Line Encyclopedia of Integer Sequences (OEIS) obey TL approximately. We also show that the binomial coefficients on each row of Pascal's triangle obey TL asymptotically. These applications of TL to seemingly unrelated mathematical structures tend to confirm there might be purely statistical, context-independent mechanisms at play. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 376-378 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1422439 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1422439 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:376-378 Template-Type: ReDIF-Article 1.0 Author-Name: DeWayne Derryberry Author-X-Name-First: DeWayne Author-X-Name-Last: Derryberry Author-Name: Ken Aho Author-X-Name-First: Ken Author-X-Name-Last: Aho Author-Name: John Edwards Author-X-Name-First: John Author-X-Name-Last: Edwards Author-Name: Teri Peterson Author-X-Name-First: Teri Author-X-Name-Last: Peterson Title: Model Selection and Regression t-Statistics Abstract: It is shown that dropping quantitative variables from a linear regression, based on t-statistics, is mathematically equivalent to dropping variables based on commonly used information criteria. Journal: The American Statistician Pages: 379-381 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2018.1459316 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459316 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:379-381 Template-Type: ReDIF-Article 1.0 Author-Name: Stephanie C. Hicks Author-X-Name-First: Stephanie C. Author-X-Name-Last: Hicks Author-Name: Rafael A. Irizarry Author-X-Name-First: Rafael A. Author-X-Name-Last: Irizarry Title: A Guide to Teaching Data Science Abstract: Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role.
We strongly agree with this recommendation, but advocate that the main priority is to bring applications to the forefront, as proposed by Nolan and Speed in 1999. We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch in 1999, and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used by statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 382-391 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2017.1356747 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1356747 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:382-391 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on Knaeble and Dutter (2017) Journal: The American Statistician Pages: 392-393 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2016.1278036 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1278036 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:392-393 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Corrigenda Journal: The American Statistician Pages: 394-394 Issue: 4 Volume: 72 Year: 2018 Month: 10 X-DOI: 10.1080/00031305.2018.1523641 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1523641 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:394-394 Template-Type: ReDIF-Article 1.0 Author-Name: Feifei Wang Author-X-Name-First: Feifei Author-X-Name-Last: Wang Author-Name: Jian Wang Author-X-Name-First: Jian Author-X-Name-Last: Wang Author-Name: Alan E. Gelfand Author-X-Name-First: Alan E. Author-X-Name-Last: Gelfand Author-Name: Fan Li Author-X-Name-First: Fan Author-X-Name-Last: Li Title: Disease Mapping With Generative Models Abstract: Disease mapping focuses on learning about areal units presenting high relative risk. Disease mapping models assume that the disease counts are distributed as Poisson random variables with the respective means typically specified as the product of the relative risk and the expected count. These models usually incorporate spatial random effects to accomplish spatial smoothing of the relative risks. Fitting of these models often computes expected disease counts via internal standardization. This places the data on both sides of the model, that is, the counts are on the left side but they are also used to obtain the expected counts on the right side. As a result, these internally standardized models are incoherent and not generative; probabilistically, they could not produce the data we observe.
Here, we argue for adopting the direct generative model for disease counts, modeling disease incidence rates instead of relative risks, using a generalized logistic regression. The relative risks are then extracted after model fitting. Using simulation, we first demonstrate the benefit of the generative model without incorporating spatial smoothing. Then, spatial smoothing is introduced using the customary conditionally autoregressive model. We also extend the generative model to dynamic settings. The generative models are compared with internally standardized models, again through simulated datasets but also through a well-examined lung cancer morbidity dataset in Ohio. Both models are spatial and both smooth the data similarly with regard to relative risks. However, the generative coherent models tend to provide tighter credible intervals. Since the generative specification is coherent, is at least as good inferentially, and is no more difficult to fit, we suggest that it should be the model of choice for spatial disease mapping. Journal: The American Statistician Pages: 213-223 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1392358 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392358 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:213-223 Template-Type: ReDIF-Article 1.0 Author-Name: Peter K. Dunn Author-X-Name-First: Peter K. Author-X-Name-Last: Dunn Author-Name: Margaret Marshman Author-X-Name-First: Margaret Author-X-Name-Last: Marshman Author-Name: Robert McDougall Author-X-Name-First: Robert Author-X-Name-Last: McDougall Title: Evaluating Wikipedia as a Self-Learning Resource for Statistics: You Know They'll Use It Abstract: The role of Wikipedia for learning has been debated because it does not conform to the usual standards. Despite this, people use it, due to the ubiquity of Wikipedia entries in the results from popular search engines. It is important for academic disciplines, including statistics, to ensure they are correctly represented in a medium where anyone can assume the role of discipline expert. In this context, we first develop a tool for evaluating Wikipedia articles for topics with a procedural component. Then, using this tool, five Wikipedia articles on basic statistical concepts are critiqued from the point of view of a self-learner: “arithmetic mean,” “standard deviation,” “standard error,” “confidence interval,” and “histogram.” We find that the articles, in general, are poor, and some articles contain inaccuracies. We propose that Wikipedia be actively discouraged for self-learning (using, for example, a classroom activity) except to give a brief overview; that in more formal learning environments, teachers be explicit about not using Wikipedia as a learning resource for course content; and, because Wikipedia is used regardless of considered advice or the organizational protocols in place, teachers move away from minimal contact with Wikipedia towards more constructive engagement. Journal: The American Statistician Pages: 224-231 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1392360 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392360 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:224-231 Template-Type: ReDIF-Article 1.0 Author-Name: Thomas J. Fisher Author-X-Name-First: Thomas J.
Author-X-Name-Last: Fisher Author-Name: Michael W. Robbins Author-X-Name-First: Michael W. Author-X-Name-Last: Robbins Title: A Cheap Trick to Improve the Power of a Conservative Hypothesis Test Abstract: Critical values and p-values of statistical hypothesis tests are often derived using asymptotic approximations of sampling distributions. However, this sometimes results in tests that are conservative (i.e., understate the frequency of an incorrectly rejected null hypothesis by employing too stringent a threshold for rejection). Although computationally rigorous options (e.g., the bootstrap) are available for such situations, we illustrate that simple transformations can be used to improve both the size and power of such tests. Using a logarithmic transformation, we show that the transformed statistic is asymptotically equivalent to its untransformed analogue under the null hypothesis and is divergent from the untransformed version under the alternative (yielding a potentially substantial increase in power). The transformation is applied to several easily accessible statistical hypothesis tests, a few of which are taught in introductory statistics courses. With theoretical arguments and simulations, we illustrate that the log transformation is preferable to other forms of correction (such as statistics that use a multiplier). Finally, we illustrate the application of the method to a well-known dataset. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 232-242 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1395364 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395364 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:232-242 Template-Type: ReDIF-Article 1.0 Author-Name: Geoffrey K. Robinson Author-X-Name-First: Geoffrey K. Author-X-Name-Last: Robinson Title: What Properties Might Statistical Inferences Reasonably be Expected to Have?—Crisis and Resolution in Statistical Inference Abstract: There is a crisis in the foundations of statistical inference. I believe that this crisis will eventually be resolved by regarding the subjective Bayesian paradigm as ideal in principle but often using standard procedures that are not subjective Bayesian for well-defined standard circumstances. As a step toward this resolution, this article looks at the question of what properties statistical inferences might reasonably be expected to have and argues that the use of p-values should be restricted to pure significance testing. The value judgments presented are supported by a range of examples. Journal: The American Statistician Pages: 243-252 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1415971 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1415971 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:243-252 Template-Type: ReDIF-Article 1.0 Author-Name: Jeff Allen Author-X-Name-First: Jeff Author-X-Name-Last: Allen Title: Who Wants to be a Statistician? An Analysis of ACT-Tested Public School Students Abstract: This study examines predictors of choosing statistics as an occupation while in high school. The overall rate of choosing statistics was 1 per 1,681 students.
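Returning to the Fisher and Robbins transformation above: the general mechanism can be seen from a first-order expansion. The display below is an illustrative form of such a log transformation, not necessarily the exact statistic used in the article. For a statistic \(T_n\) that is bounded in probability under the null,

\[
\tilde{T}_n = -n \log\!\Big(1 - \frac{T_n}{n}\Big) = T_n + \frac{T_n^2}{2n} + \cdots ,
\]

so \(\tilde{T}_n - T_n \to 0\) in probability under \(H_0\) (the same asymptotic null distribution and critical values apply), while under an alternative with \(T_n/n \to c \in (0,1)\) we get \(\tilde{T}_n/n \to -\log(1-c) > c\), so the transformed statistic diverges faster and the test gains power.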
Females, Asian students, students from the southern United States, and students from rural schools were less likely to choose statistics, and there was an increase in statistics choice rates between 2014 and 2017. Differences across other socio-demographic groups were small after accounting for other predictors. The strongest predictors of statistics choice were ACT Mathematics score and a measure of vocational interests corresponding to Holland's Conventional personality type. The results of the study can be used to identify high school students with interest and achievement profiles that are common among prospective statisticians, and to gain a better understanding of factors that affect statistics occupation choice. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 253-263 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1419143 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1419143 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:253-263 Template-Type: ReDIF-Article 1.0 Author-Name: Jacob Goldin Author-X-Name-First: Jacob Author-X-Name-Last: Goldin Author-Name: Daniel Reck Author-X-Name-First: Daniel Author-X-Name-Last: Reck Title: The Analysis of Survey Data with Framing Effects Abstract: A well-known difficulty in survey research is that respondents’ answers to questions can depend on arbitrary features of a survey’s design, such as the wording of questions or the ordering of answer choices. In this paper, we describe a novel set of tools for analyzing survey data characterized by such framing effects. We show that the conventional approach to analyzing data with framing effects—randomizing survey-takers across frames and pooling the responses—generally does not identify a useful parameter. In its place, we propose an alternative approach and provide conditions under which it identifies the responses that are unaffected by framing. We also present several results that shed light on the population distribution of the individual characteristic the survey is designed to measure. Journal: The American Statistician Pages: 264-272 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1407358 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407358 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:264-272 Template-Type: ReDIF-Article 1.0 Author-Name: Hakan Demirtas Author-X-Name-First: Hakan Author-X-Name-Last: Demirtas Title: Inducing Any Feasible Level of Correlation to Bivariate Data With Any Marginals Abstract: A simple sorting approach for inducing any desired Pearson or Spearman correlation to independent bivariate data, whose marginals can be of any distributional type and nature, is described and illustrated through examples that span a broad range of situations. The proposed method has substantial potential in simulation settings that involve random number generation. Journal: The American Statistician Pages: 273-277 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1379438 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1379438 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:273-277 Template-Type: ReDIF-Article 1.0 Author-Name: J. G. Liao Author-X-Name-First: J. G.
Author-X-Name-Last: Liao Author-Name: Arthur Berg Author-X-Name-First: Arthur Author-X-Name-Last: Berg Title: Sharpening Jensen's Inequality Abstract: This article proposes a new sharpened version of Jensen's inequality. The proposed new bound is simple and insightful, is broadly applicable because it imposes only minimal assumptions, and provides fairly accurate results in spite of its simple form. Applications to the moment generating function, power mean inequalities, and Rao-Blackwell estimation are presented. This presentation can be incorporated into any calculus-based statistics course. Journal: The American Statistician Pages: 278-281 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2017.1419145 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1419145 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:278-281 Template-Type: ReDIF-Article 1.0 Author-Name: Ryoungsun Park Author-X-Name-First: Ryoungsun Author-X-Name-Last: Park Title: Practical Teaching Strategies for Hypothesis Testing Abstract: Teaching the concept of inferential statistics is one of the most challenging tasks for statistics educators. Often, students cannot make logical connections between inferential statistics and other topics such as descriptive statistics and probability. The source of difficulty may be that inferential statistics is based on complex ideas such as hypothetical reasoning, data analytic methods, and probabilistic thinking. This article presents classroom practices that teachers can easily adapt for their statistics classes to teach fundamental ideas of inferential statistics. The expected educational outcome is the conceptual understanding of the elements of statistical testing rather than learning about a specific testing methodology. Using the proposed practices, students are guided to propose their own hypotheses, collect actual data, and make their own inferences, rather than following a predetermined sequence of procedures. The practice material is divided into three subtasks, so that teachers can plan their curriculum effectively and perform formative assessments regarding students' progress. Journal: The American Statistician Pages: 282-287 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2018.1424034 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1424034 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:282-287 Template-Type: ReDIF-Article 1.0 Author-Name: Peter S. Fader Author-X-Name-First: Peter S. Author-X-Name-Last: Fader Author-Name: Bruce G. S. Hardie Author-X-Name-First: Bruce G. S. Author-X-Name-Last: Hardie Author-Name: Daniel McCarthy Author-X-Name-First: Daniel Author-X-Name-Last: McCarthy Author-Name: Ramnath Vaidyanathan Author-X-Name-First: Ramnath Author-X-Name-Last: Vaidyanathan Title: Exploring the Equivalence of Two Common Mixture Models for Duration Data Abstract: The beta-geometric (BG) distribution and the Pareto distribution of the second kind (P(II)) are two basic models for duration-time data that share some underlying characteristics (i.e., continuous mixtures of memoryless distributions), but differ in two important respects: first, the BG is the natural model to use when the event of interest occurs in discrete time, while the P(II) is the right choice for a continuous-time setting.
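For reference, the inequality being sharpened in the Liao and Berg article above is the classical Jensen bound: for convex \(f\) and integrable \(X\),

\[
f\big(\mathbb{E}[X]\big) \;\le\; \mathbb{E}\big[f(X)\big].
\]

A formal second-order Taylor expansion, \(\mathbb{E}[f(X)] \approx f(\mu) + \tfrac{1}{2} f''(\mu)\,\mathrm{Var}(X)\) with \(\mu = \mathbb{E}[X]\), suggests why the gap is governed by the curvature of \(f\) and the spread of \(X\); this heuristic is our gloss on why a variance-based sharpening is natural, not the authors' actual bound.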
Second, the underlying mixing distributions (the beta and gamma for the BG and P(II), respectively) are very different—and often believed to be noncomparable with each other. Despite these and other key differences, the two models are strikingly similar in terms of their fit and predictive performance as well as their parameter estimates. We explore this equivalence, both empirically and analytically, and discuss the implications from both a substantive and methodological standpoint. Journal: The American Statistician Pages: 288-295 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2018.1543134 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543134 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:288-295 Template-Type: ReDIF-Article 1.0 Author-Name: Hongmei Zhang Author-X-Name-First: Hongmei Author-X-Name-Last: Zhang Author-Name: Yubo Zou Author-X-Name-First: Yubo Author-X-Name-Last: Zou Author-Name: Will Terry Author-X-Name-First: Will Author-X-Name-Last: Terry Author-Name: Wilfried Karmaus Author-X-Name-First: Wilfried Author-X-Name-Last: Karmaus Author-Name: Hasan Arshad Author-X-Name-First: Hasan Author-X-Name-Last: Arshad Title: Joint Clustering With Correlated Variables Abstract: Traditional clustering methods focus on grouping subjects or (dependent) variables, assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared to existing clustering techniques, the major novelty of the method lies in its ability to improve the homogeneity of clusters, along with the ability to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has the potential to reduce the occurrence of asthma. Journal: The American Statistician Pages: 296-306 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2018.1424033 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1424033 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:296-306 Template-Type: ReDIF-Article 1.0 Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Author-Name: Ben Goodrich Author-X-Name-First: Ben Author-X-Name-Last: Goodrich Author-Name: Jonah Gabry Author-X-Name-First: Jonah Author-X-Name-Last: Gabry Author-Name: Aki Vehtari Author-X-Name-First: Aki Author-X-Name-Last: Vehtari Title: R-squared for Bayesian Regression Models Abstract: The usual definition of R2 (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator.
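The alternative definition stated in the sentence that follows is straightforward to compute draw by draw. Here is a minimal numpy sketch; the array names are hypothetical, and the empirical per-draw residual variance stands in for the expected error variance, so treat this as an illustration rather than the authors' code.

    import numpy as np

    def bayes_R2(pred_draws, resid_draws):
        """Per-draw R^2 = var(predicted) / (var(predicted) + var(errors)).

        pred_draws, resid_draws: arrays of shape (n_draws, n_obs) holding
        posterior draws of the fitted values and of the residuals.
        """
        var_pred = pred_draws.var(axis=1)   # variance of predictions, per draw
        var_err = resid_draws.var(axis=1)   # stand-in for expected error variance
        return var_pred / (var_pred + var_err)  # always in [0, 1)

    # Illustrative use with fake posterior draws.
    rng = np.random.default_rng(1)
    pred = rng.normal(0.0, 1.0, size=(4000, 100))
    resid = rng.normal(0.0, 0.5, size=(4000, 100))
    print(bayes_R2(pred, resid).mean())  # posterior mean R^2

By construction the ratio cannot exceed one, which is exactly the defect of the classical definition that this version repairs.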
We propose an alternative definition similar to one that has appeared in the survival analysis literature: the variance of the predicted values divided by the variance of predicted values plus the expected variance of the errors. Journal: The American Statistician Pages: 307-309 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2018.1549100 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1549100 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:307-309 Template-Type: ReDIF-Article 1.0 Author-Name: Silas Bergen Author-X-Name-First: Silas Author-X-Name-Last: Bergen Title: Displaying Time Series, Spatial, and Space-Time Data with R, 2nd ed Journal: The American Statistician Pages: 310-311 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2019.1641357 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1641357 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:310-311 Template-Type: ReDIF-Article 1.0 Author-Name: Thaddeus Tarpey Author-X-Name-First: Thaddeus Author-X-Name-Last: Tarpey Author-Name: Eva Petkova Author-X-Name-First: Eva Author-X-Name-Last: Petkova Title: Letter to the Editor Journal: The American Statistician Pages: 312-312 Issue: 3 Volume: 73 Year: 2019 Month: 7 X-DOI: 10.1080/00031305.2018.1537894 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537894 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:312-312 Template-Type: ReDIF-Article 1.0 Author-Name: Jiangtao Gou Author-X-Name-First: Jiangtao Author-X-Name-Last: Gou Author-Name: Fengqing (Zoe) Zhang Author-X-Name-First: Fengqing (Zoe) Author-X-Name-Last: Zhang Title: Experience Simpson's Paradox in the Classroom Abstract: Simpson's paradox is a challenging topic to teach in an introductory statistics course. To motivate students to understand this paradox both intuitively and statistically, this article introduces several new ways to teach Simpson's paradox. We design a paper toss activity between instructors and students in class to engage students in the learning process. We show that Simpson's paradox is widespread in basketball statistics, and thus instructors may consider looking to their own school basketball teams for examples to motivate students’ interest. A new probabilistic explanation of Simpson's paradox is provided, which helps foster students’ statistical understanding. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 61-66 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1200485 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200485 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:61-66 Template-Type: ReDIF-Article 1.0 Author-Name: Christine M. Anderson-Cook Author-X-Name-First: Christine M. Author-X-Name-Last: Anderson-Cook Author-Name: Michael S. Hamada Author-X-Name-First: Michael S. Author-X-Name-Last: Hamada Author-Name: Leslie M. Moore Author-X-Name-First: Leslie M. Author-X-Name-Last: Moore Author-Name: Joanne R. Wendelberger Author-X-Name-First: Joanne R.
Author-X-Name-Last: Wendelberger Title: Statistical Mentoring at Early Training and Career Stages Abstract: At Los Alamos National Laboratory (LANL), statistical scientists develop solutions for a variety of national security challenges through scientific excellence, typically as members of interdisciplinary teams. At LANL, mentoring is actively encouraged and practiced to develop statistical skills and positive career-building behaviors. Mentoring activities targeted at different career phases from student to junior staff are an important catalyst for both short- and long-term career development. This article discusses mentoring strategies for undergraduate and graduate students through internships as well as for postdoctoral research associates and junior staff. Topics addressed include project selection, progress, and outcome; intellectual and social activities that complement the student internship experience; key skills/knowledge not typically obtained in academic training; and the impact of such internships on students’ careers. Experiences and strategies from a number of successful mentorships are presented. Feedback from former mentees obtained via a questionnaire is incorporated. These responses address some of the benefits the respondents received from mentoring, helpful contributions and advice from their mentors, key skills learned, and how mentoring impacted their later careers. Journal: The American Statistician Pages: 6-14 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1200491 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200491 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:6-14 Template-Type: ReDIF-Article 1.0 Author-Name: Leandro da Silva Pereira Author-X-Name-First: Leandro da Silva Author-X-Name-Last: Pereira Author-Name: Lucas Monteiro Chaves Author-X-Name-First: Lucas Monteiro Author-X-Name-Last: Chaves Author-Name: Devanil Jaques de Souza Author-X-Name-First: Devanil Jaques Author-X-Name-Last: de Souza Title: An Intuitive Geometric Approach to the Gauss Markov Theorem Abstract: Algebraic proofs of the Gauss–Markov theorem are very disappointing from an intuitive point of view. An alternative is to use geometry that emphasizes the essential statistical ideas behind the result. This article presents a truly geometrical, intuitive approach to the theorem, based only on simple geometrical concepts, such as linear subspaces and orthogonal projections. Journal: The American Statistician Pages: 67-70 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1209127 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1209127 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:67-70 Template-Type: ReDIF-Article 1.0 Author-Name: Li Zhu Author-X-Name-First: Li Author-X-Name-Last: Zhu Author-Name: Kimberly F. Sellers Author-X-Name-First: Kimberly F. Author-X-Name-Last: Sellers Author-Name: Darcy Steeg Morris Author-X-Name-First: Darcy Steeg Author-X-Name-Last: Morris Author-Name: Galit Shmueli Author-X-Name-First: Galit Author-X-Name-Last: Shmueli Title: Bridging the Gap: A Generalized Stochastic Process for Count Data Abstract: The Bernoulli and Poisson processes are two popular discrete count processes; however, both rely on strict assumptions.
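For background on the distribution underlying the process proposed in the next sentence: the Conway–Maxwell–Poisson distribution, in its standard parameterization, has probability mass function

\[
P(X = x) = \frac{\lambda^{x}}{(x!)^{\nu}\, Z(\lambda,\nu)}, \qquad
Z(\lambda,\nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}}, \qquad x = 0, 1, 2, \ldots,
\]

with \(\nu = 1\) recovering the Poisson distribution, \(\nu \to \infty\) approaching a Bernoulli, and \(\nu < 1\) (\(\nu > 1\)) yielding over- (under-) dispersion relative to the Poisson.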
We instead propose a generalized homogeneous count process (which we name the Conway–Maxwell–Poisson or COM-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexible mechanism to describe count processes that approximate data with over- or under-dispersion. We introduce the process and an associated generalized waiting time distribution with several real-data applications to illustrate its flexibility for a variety of data structures. We consider model estimation under different scenarios of data availability, and assess performance through simulated and real datasets. This new generalized process will enable analysts to model count processes that exhibit data dispersion in a more accommodating and flexible manner. Journal: The American Statistician Pages: 71-80 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1234976 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1234976 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:71-80 Template-Type: ReDIF-Article 1.0 Author-Name: Edward L. Ionides Author-X-Name-First: Edward L. Author-X-Name-Last: Ionides Author-Name: Alexander Giessing Author-X-Name-First: Alexander Author-X-Name-Last: Giessing Author-Name: Yaacov Ritov Author-X-Name-First: Yaacov Author-X-Name-Last: Ritov Author-Name: Scott E. Page Author-X-Name-First: Scott E. Author-X-Name-Last: Page Title: Response to the ASA’s Statement on p-Values: Context, Process, and Purpose Journal: The American Statistician Pages: 88-89 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1234977 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1234977 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:88-89 Template-Type: ReDIF-Article 1.0 Author-Name: Richard G. Spencer Author-X-Name-First: Richard G. Author-X-Name-Last: Spencer Author-Name: Benjamin D. Cortese Author-X-Name-First: Benjamin D. Author-X-Name-Last: Cortese Author-Name: Vanessa A. Lukas Author-X-Name-First: Vanessa A. Author-X-Name-Last: Lukas Author-Name: Nancy Pleshko Author-X-Name-First: Nancy Author-X-Name-Last: Pleshko Title: Point Estimates of Test Sensitivity and Specificity from Sample Means and Variances Abstract: In a wide variety of biomedical and clinical research studies, sample statistics from diagnostic marker measurements are presented as a means of distinguishing between two populations, such as with and without disease. Intuitively, a larger difference between the mean values of a marker for the two populations, and a smaller spread of values within each population, should lead to more reliable classification rules based on this marker. We formalize this intuitive notion by deriving practical, new, closed-form expressions for the sensitivity and specificity of three different discriminant tests defined in terms of the sample means and standard deviations of diagnostic marker measurements. The three discriminant tests evaluated are based, respectively, on the Euclidean distance and the Mahalanobis distance between means, and a likelihood ratio analysis. Expressions for the effects of measurement error are also presented. Our final expressions assume that the diagnostic markers follow independent normal distributions for the two populations, although it will be clear that other known distributions may be similarly analyzed.
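A minimal worked version of the normal-theory calculation just described, restricted to the simplest case of a single marker and a midpoint threshold; the function name and the midpoint rule are our illustrative choices, not the paper's general closed-form expressions.

    from statistics import NormalDist

    def midpoint_sens_spec(m0, s0, m1, s1):
        """Sensitivity and specificity of the rule 'call diseased if x > c',
        where c is the midpoint of the two group means; assumes m1 > m0 and
        normally distributed marker values within each population."""
        c = (m0 + m1) / 2
        sensitivity = 1 - NormalDist(m1, s1).cdf(c)  # P(x > c | diseased)
        specificity = NormalDist(m0, s0).cdf(c)      # P(x <= c | healthy)
        return sensitivity, specificity

    print(midpoint_sens_spec(0.0, 1.0, 2.0, 1.0))  # approx (0.841, 0.841)

As the abstract's intuition suggests, widening the gap between the means or shrinking the within-group spreads pushes both quantities toward one.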
We then discuss applications drawn from the medical literature, although the formalism is clearly not restricted to that setting. Journal: The American Statistician Pages: 81-87 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1239589 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1239589 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:81-87 Template-Type: ReDIF-Article 1.0 Author-Name: Eric A. Vance Author-X-Name-First: Eric A. Author-X-Name-Last: Vance Author-Name: Donna E. LaLonde Author-X-Name-First: Donna E. Author-X-Name-Last: LaLonde Author-Name: Lin Zhang Author-X-Name-First: Lin Author-X-Name-Last: Zhang Title: The Big Tent for Statistics: Mentoring Required Abstract: Research supports the positive impact of mentoring on both job and career satisfaction. Recognizing this, the American Statistical Association (ASA) has started a new mission-centered focus on mentoring. This article describes the development and implementation of meeting-based mentoring programs at four ASA conferences in 2014 and 2015. We present results of the feedback evaluations from program participants and use them to motivate recommendations for creating and running conference mentoring programs and overcoming common challenges. These recommendations apply to conference mentoring programs in any field. We conclude with a discussion of the opportunities for the ASA to augment its mentoring programs in support of the professional development of its members. Journal: The American Statistician Pages: 15-22 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1247016 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1247016 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:15-22 Template-Type: ReDIF-Article 1.0 Author-Name: Eric A. Vance Author-X-Name-First: Eric A. Author-X-Name-Last: Vance Author-Name: Erin Tanenbaum Author-X-Name-First: Erin Author-X-Name-Last: Tanenbaum Author-Name: Amarjot Kaur Author-X-Name-First: Amarjot Author-X-Name-Last: Kaur Author-Name: Mark C. Otto Author-X-Name-First: Mark C. Author-X-Name-Last: Otto Author-Name: Richard Morris Author-X-Name-First: Richard Author-X-Name-Last: Morris Title: An Eight-Step Guide to Creating and Sustaining a Mentoring Program Abstract: Mentoring is an extremely valuable activity for both individuals and organizations. Mentoring within organizations can develop and integrate employees into their corporate culture. Mentoring outside the mentees’ work groups or through professional development organizations can give broader perspective and support, especially in times of transition. But mentoring programs require tremendous effort to start, organize, and maintain. Few last more than two years. This article provides a structured approach to starting and sustaining a successful program. The steps include understanding an organization’s particular needs, learning from small pilot programs, following up with mentoring pairs during a committed formal mentoring period, and evaluating results from each program’s cycle to learn and grow the program. Supplementary materials for this article are available online.
Journal: The American Statistician Pages: 23-29 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1251493 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1251493 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:23-29 Template-Type: ReDIF-Article 1.0 Author-Name: Mark Daniel Ward Author-X-Name-First: Mark Daniel Author-X-Name-Last: Ward Title: Building Bridges: The Role of an Undergraduate Mentor Abstract: I share some advice and lessons that I have learned from working with many wonderful students and colleagues, in my role as Undergraduate Chair of Statistics at Purdue University since 2008. I also reflect on developing, implementing, and sustaining a new living, learning community environment for statistics students. Journal: The American Statistician Pages: 30-33 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1251494 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1251494 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:30-33 Template-Type: ReDIF-Article 1.0 Author-Name: Lauren Vollmer Author-X-Name-First: Lauren Author-X-Name-Last: Vollmer Author-Name: Aparna Keshaviah Author-X-Name-First: Aparna Author-X-Name-Last: Keshaviah Author-Name: Dmitriy Poznyak Author-X-Name-First: Dmitriy Author-X-Name-Last: Poznyak Author-Name: Sharon Zhao Author-X-Name-First: Sharon Author-X-Name-Last: Zhao Author-Name: Fei Xing Author-X-Name-First: Fei Author-X-Name-Last: Xing Author-Name: Nicholas Beyler Author-X-Name-First: Nicholas Author-X-Name-Last: Beyler Title: Re-Defining the , and of Mentoring for Professional Statisticians Abstract: Organizations tailor their mentoring strategies to accommodate internal resources and preferences, producing different approaches in academic, government, and corporate environments. Across these settings, three common barriers impede effective mentoring of statisticians: overspecialization, time constraints, and geographic dispersion. The authors share mentoring strategies that have emerged at their organization, Mathematica Policy Research, to overcome these obstacles. Practices include creating a methodology working group to unite researchers with diverse backgrounds, integrating mentoring into existing workflows, and harnessing modern technological infrastructure to facilitate virtual mentoring. Although these strategies emerged within a specific professional context, they suggest opportunities for statisticians to expand the channels through which mentorship can occur. Journal: The American Statistician Pages: 34-37 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1255256 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255256 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:34-37 Template-Type: ReDIF-Article 1.0 Author-Name: Kim Love Author-X-Name-First: Kim Author-X-Name-Last: Love Author-Name: Eric A. Vance Author-X-Name-First: Eric A. Author-X-Name-Last: Vance Author-Name: Frank E. Harrell, Author-X-Name-First: Frank E. Author-X-Name-Last: Harrell, Author-Name: Dallas E. Johnson Author-X-Name-First: Dallas E. Author-X-Name-Last: Johnson Author-Name: Michael H. Kutner Author-X-Name-First: Michael H. Author-X-Name-Last: Kutner Author-Name: Ronald D. Snee Author-X-Name-First: Ronald D. 
Author-X-Name-Last: Snee Author-Name: Doug Zahn Author-X-Name-First: Doug Author-X-Name-Last: Zahn Title: Developing a Career in the Practice of Statistics: The Mentor's Perspective Abstract: The W.J. Dixon Award for Excellence in Statistical Consulting is given by the American Statistical Association to “a distinguished individual who has demonstrated excellence in statistical consulting or developed and contributed new methods, software, or ways of thinking that improve statistical practice in general.” In this article, five of the seven past recipients of this career-capping award share their experiences and perspectives through 10 stepping stones that move a practicing statistician from consultant to collaborator to leader. We highlight the need for mentorship throughout the discussion, and provide direction for statisticians who would like to incorporate this advice into their careers. Journal: The American Statistician Pages: 38-46 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1255257 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255257 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:38-46 Template-Type: ReDIF-Article 1.0 Author-Name: Amanda L. Golbeck Author-X-Name-First: Amanda L. Author-X-Name-Last: Golbeck Title: Mentoring Faculty Women in Statistics: Exploring Challenges and Opportunities for Leadership Development Abstract: The problems for faculty women in statistics (FWIS) in the United States are complex and call for programs that aim to develop inclusive leadership competencies among both FWIS and faculty men in statistics (FMIS) regardless of whether they currently hold, or aspire to, administrative positions. Data indicate that, among faculty in doctorate-granting departments of statistics and biostatistics, there is a disparity between genders in numbers of role models or exemplars. Yet we note that there have been some innovative national initiatives over the years in mentoring, networking, or leadership that have been instrumental in advancing FWIS. Given current understandings of the role of implicit bias in sustaining a differential status for FWIS, this discussion emphasizes a new approach as a way to further advance FWIS: one that involves the development of inclusive leadership among both men and women toward promoting inclusive faculty cultures in statistics. Journal: The American Statistician Pages: 47-54 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1255658 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255658 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:47-54 Template-Type: ReDIF-Article 1.0 Author-Name: Jacqueline M. Hughes-Oliver Author-X-Name-First: Jacqueline M. Author-X-Name-Last: Hughes-Oliver Title: Mentoring to Achieve Diversity in Graduate Programs Abstract: The discipline of statistics has a celebrated, diverse, and colorful past.  With a definite international flavor, we continue to make great strides in keeping our discipline relevant and accessible for addressing significant societal concerns. Unfortunately, we lag behind many other disciplines when it comes to fully tapping into the potential of all demographic groups within the United States.  Mentoring provides one of many opportunities to change this narrative. 
This article looks at hard numbers related to diversity, points to some existing successful mentoring programs, and reflects lessons learned through personal experience. Journal: The American Statistician Pages: 55-60 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1255661 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255661 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:55-60 Template-Type: ReDIF-Article 1.0 Author-Name: Susan E. Hodge Author-X-Name-First: Susan E. Author-X-Name-Last: Hodge Title: Letter to the Editor: Average Entropy Does Not Measure Uncertainty Journal: The American Statistician Pages: 89-90 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1265586 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1265586 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:89-90 Template-Type: ReDIF-Article 1.0 Author-Name: Mary Kwasny Author-X-Name-First: Mary Author-X-Name-Last: Kwasny Title: Mentoring in the ASA: A Rejoinder Journal: The American Statistician Pages: 5-5 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1268502 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1268502 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:5-5 Template-Type: ReDIF-Article 1.0 Author-Name: David Morganstein Author-X-Name-First: David Author-X-Name-Last: Morganstein Title: Mentoring in the ASA: A Commentary Journal: The American Statistician Pages: 3-4 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1268504 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1268504 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:3-4 Template-Type: ReDIF-Article 1.0 Author-Name: Omar A. Kittaneh Author-X-Name-First: Omar A. Author-X-Name-Last: Kittaneh Title: Response to "Average Entropy Does Not Measure Uncertainty" Journal: The American Statistician Pages: 91-91 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1269484 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1269484 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:91-91 Template-Type: ReDIF-Article 1.0 Author-Name: Aarti Shah Author-X-Name-First: Aarti Author-X-Name-Last: Shah Title: What is Mentoring? Abstract: What is mentoring? Is it just a buzz word or is this really valuable? How can mentoring help one to grow and advance personally and professionally? How and where does one even begin? Many of us have these questions. In this article, I will share my perspective and provide some reflections on these questions based on my own personal and professional journey. Journal: The American Statistician Pages: 1-2 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2016.1269686 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1269686 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:1-2 Template-Type: ReDIF-Article 1.0 Author-Name: Reza Ramezan Author-X-Name-First: Reza Author-X-Name-Last: Ramezan Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 92-96 Issue: 1 Volume: 71 Year: 2017 Month: 1 X-DOI: 10.1080/00031305.2017.1271242 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1271242 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:92-96 Template-Type: ReDIF-Article 1.0 Author-Name: Aniko Szabo Author-X-Name-First: Aniko Author-X-Name-Last: Szabo Title: Test for Trend With a Multinomial Outcome Abstract: There is no established procedure for testing for trend with nominal outcomes that would provide both a global hypothesis test and outcome-specific inference. We derive a simple formula for such a test using a weighted sum of Cochran–Armitage test statistics evaluating the trend in each outcome separately. The test is shown to be equivalent to the score test for multinomial logistic regression; however, the new formulation enables the derivation of a sample size formula and multiplicity-adjusted inference for individual outcomes. The proposed methods are implemented in the R package multiCA. Journal: The American Statistician Pages: 313-320 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2017.1407823 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407823 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:313-320 Template-Type: ReDIF-Article 1.0 Author-Name: Aaron McDaid Author-X-Name-First: Aaron Author-X-Name-Last: McDaid Author-Name: Zoltán Kutalik Author-X-Name-First: Zoltán Author-X-Name-Last: Kutalik Author-Name: Valentin Rousson Author-X-Name-First: Valentin Author-X-Name-Last: Rousson Title: A Five-Decision Testing Procedure to Infer the Value of a Unidimensional Parameter Abstract: A statistical test can be seen as a procedure to produce a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability of making a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject or not a null hypothesis), Kaiser’s directional two-sided test, as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain of statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein’s theories. In this article, we introduce a five-decision rule testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the testing procedures of Kaiser (no assumption on a point hypothesis being impossible) and Jones and Tukey (higher power), allowing for a nonnegligible (typically 20%) reduction of the sample size needed to reach a given statistical power to get a significant result, compared to the traditional approach.
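One plausible way to operationalize the five decisions for a z-statistic, following the abstract's description of a two-sided test combined with two one-sided tests; the cutoffs and labels below are our schematic reading, not the paper's calibrated procedure.

    from statistics import NormalDist

    def five_decision(z, alpha=0.05):
        """Map a z-statistic to one of five decisions about theta versus theta0."""
        q = NormalDist().inv_cdf
        z_two = q(1 - alpha / 2)  # two-sided critical value
        z_one = q(1 - alpha)      # one-sided critical value
        if z >= z_two:
            return "conclude theta > theta0"
        if z >= z_one:
            return "conclude theta >= theta0"
        if z <= -z_two:
            return "conclude theta < theta0"
        if z <= -z_one:
            return "conclude theta <= theta0"
        return "no conclusion"

    for z in (2.5, 1.8, 0.3, -1.8, -2.5):
        print(z, "->", five_decision(z))

The intermediate decisions (weak directional conclusions from the one-sided tests alone) are what distinguish the scheme from an ordinary two-sided test.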
Journal: The American Statistician Pages: 321-326 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1437075 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437075 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:321-326 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher Harms Author-X-Name-First: Christopher Author-X-Name-Last: Harms Title: A Bayes Factor for Replications of ANOVA Results Abstract: With an increasing number of replication studies performed in psychological science, the question of how to evaluate the outcome of a replication attempt deserves careful consideration. By design, Bayesian approaches allow uncertainty and prior information to be incorporated into the analysis of a replication attempt. The Replication Bayes factor, introduced by Verhagen and Wagenmakers (2014), provides quantitative, relative evidence in favor of or against a successful replication; as originally formulated, it was limited to the case of t-tests. In this article, the Replication Bayes factor is extended to F-tests in multigroup, fixed-effect ANOVA designs. Simulations and examples are presented to facilitate understanding and to demonstrate the usefulness of this approach. Finally, the Replication Bayes factor is compared to other Bayesian and frequentist approaches and discussed in the context of replication attempts. R code to calculate Replication Bayes factors and to reproduce the examples in the article is available at https://osf.io/jv39h/. Journal: The American Statistician Pages: 327-339 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1518787 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518787 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:327-339 Template-Type: ReDIF-Article 1.0 Author-Name: Arnab Kumar Maity Author-X-Name-First: Arnab Kumar Author-X-Name-Last: Maity Author-Name: Vivek Pradhan Author-X-Name-First: Vivek Author-X-Name-Last: Pradhan Author-Name: Ujjwal Das Author-X-Name-First: Ujjwal Author-X-Name-Last: Das Title: Bias Reduction in Logistic Regression with Missing Responses When the Missing Data Mechanism is Nonignorable Abstract: In logistic regression with nonignorable missing responses, Ibrahim and Lipsitz proposed a method for estimating regression parameters. It is known that the regression estimates obtained by using this method are biased when the sample size is small. Also, another complexity arises when the iterative estimation process encounters separation in estimating regression coefficients. In this article, we propose a method to improve the estimation of regression coefficients. In our likelihood-based method, we penalize the likelihood by multiplying it by a noninformative Jeffreys prior. The proposed method reduces bias and is able to handle the issue of separation. Simulation results show substantial bias reduction for the proposed method as compared to the existing method. Analyses using real-world data also support the simulation findings. An R package called brlrmr is developed implementing the proposed method and the Ibrahim and Lipsitz method.
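In symbols, the penalization described in the Maity, Pradhan, and Das abstract maximizes (schematically, setting aside the missing-data machinery) the Jeffreys-penalized log-likelihood

\[
\ell^{*}(\beta) \;=\; \ell(\beta) + \tfrac{1}{2}\log\big|I(\beta)\big| ,
\]

where \(\ell\) is the log-likelihood and \(I(\beta)\) the Fisher information. The penalty keeps the estimate away from the boundary of the parameter space, which is why the approach both reduces small-sample bias and remains finite under separation; for ordinary logistic regression this is the same device as Firth's bias-reduction penalty.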
Journal: The American Statistician Pages: 340-349 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2017.1407359 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407359 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:340-349 Template-Type: ReDIF-Article 1.0 Author-Name: Yueh-Yun Chi Author-X-Name-First: Yueh-Yun Author-X-Name-Last: Chi Author-Name: Deborah H. Glueck Author-X-Name-First: Deborah H. Author-X-Name-Last: Glueck Author-Name: Keith E. Muller Author-X-Name-First: Keith E. Author-X-Name-Last: Muller Title: Power and Sample Size for Fixed-Effects Inference in Reversible Linear Mixed Models Abstract: Despite the popularity of the general linear mixed model for data analysis, power and sample size methods and software are not generally available for commonly used test statistics and reference distributions. Statisticians resort to simulations with homegrown and uncertified programs, or to rough approximations that are misaligned with the data analysis. For a wide range of designs with longitudinal and clustering features, we provide accurate power and sample size approximations for inference about fixed effects in the linear models we call reversible. We show that under widely applicable conditions, the general linear mixed-model Wald test has noncentral distributions equivalent to well-studied multivariate tests. In turn, exact and approximate power and sample size results for the multivariate Hotelling–Lawley test provide exact and approximate power and sample size results for the mixed-model Wald test. The calculations are easily computed with a free, open-source product that requires only a web browser to use. Commercial software can be used for a smaller range of reversible models. Simple approximations allow accounting for modest amounts of missing data. A real-world example illustrates the methods. Sample size results are presented for a multicenter study on pregnancy. The proposed study, an extension of a funded project, has clustering within clinic. Exchangeability among the participants allows averaging across them to remove the clustering structure. The resulting simplified design is a single-level longitudinal study. Multivariate methods for power provide an approximate sample size. All proofs and inputs for the example are in the supplementary materials (available online). Journal: The American Statistician Pages: 350-359 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2017.1415972 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1415972 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:350-359 Template-Type: ReDIF-Article 1.0 Author-Name: Alice Richardson Author-X-Name-First: Alice Author-X-Name-Last: Richardson Title: A Comparative Review of Nonparametric Statistics Textbooks Abstract: In this article I will review six textbooks commonly set in university undergraduate nonparametric statistics courses. The books will be evaluated in terms of how key statistical concepts are presented; use of software; exercises; and location on a theory-applications axis and an algorithms-principles axis. The placement of books on these axes provides a novel guide for instructors looking for the book that best fits their approach to teaching nonparametric statistics.
Journal: The American Statistician Pages: 360-366 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1437076 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437076 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:360-366 Template-Type: ReDIF-Article 1.0 Author-Name: Ambrose Lo Author-X-Name-First: Ambrose Author-X-Name-Last: Lo Title: Demystifying the Integrated Tail Probability Expectation Formula Abstract: Calculating the expected values of different types of random variables is a central topic in mathematical statistics. Targeted toward students and instructors in both introductory probability and statistics courses and graduate-level measure-theoretic probability courses, this pedagogical note casts light on a general expectation formula stated in terms of distribution and survival functions of random variables and discusses its educational merits. Often consigned to an end-of-chapter exercise in mathematical statistics textbooks with minimal discussion and presented under superfluous technical assumptions, this unconventional expectation formula provides an invaluable opportunity for students to appreciate the geometric meaning of expectations, which is overlooked in most undergraduate and graduate curricula, and serves as an efficient tool for calculating expected values that would be much more laborious to obtain by traditional means. For students’ benefit, this formula deserves a thorough in-class treatment in conjunction with the teaching of expectations. Besides clarifying some commonly held misconceptions and showing the pedagogical value of the expectation formula, this note offers guidance for instructors on teaching the formula, taking the background of the target student group into account. Journal: The American Statistician Pages: 367-374 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1497541 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497541 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:367-374 Template-Type: ReDIF-Article 1.0 Author-Name: Amelia McNamara Author-X-Name-First: Amelia Author-X-Name-Last: McNamara Title: Key Attributes of a Modern Statistical Computing Tool Abstract: In the 1990s, statisticians began thinking in a principled way about how computation could better support the learning and doing of statistics. Since then, the pace of software development has accelerated, advancements in computing and data science have moved the goalposts, and it is time to reassess. Software continues to be developed to help do and learn statistics, but there is little critical evaluation of the resulting tools, and no accepted framework with which to critique them. This article presents a set of attributes necessary for a modern statistical computing tool. The framework was designed to be broadly applicable to both novice and expert users, with a particular focus on making more supportive statistical computing environments. A modern statistical computing tool should be accessible, provide easy entry, privilege data as a first-order object, support exploratory and confirmatory analysis, allow for flexible plot creation, support randomization, be interactive, include inherent documentation, support narrative, publishing, and reproducibility, and be flexible to extensions.
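The expectation formula discussed in the Lo article above is, for a random variable \(X\) with distribution function \(F\) and survival function \(S = 1 - F\),

\[
\mathbb{E}[X] \;=\; \int_{0}^{\infty} S(x)\,dx \;-\; \int_{-\infty}^{0} F(x)\,dx ,
\]

valid whenever the expectation exists; for nonnegative \(X\) it reduces to \(\mathbb{E}[X] = \int_{0}^{\infty} P(X > x)\,dx\). A quick check of the geometric reading: for \(X \sim \mathrm{Exponential}(\lambda)\), \(S(x) = e^{-\lambda x}\) and the area under the survival curve is \(1/\lambda\), the mean.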
Ideally, all these attributes could be incorporated into one tool, supporting users at all levels, but a more reasonable goal is for tools designed for novices and professionals to “reach across the gap,” taking inspiration from each other’s strengths. Journal: The American Statistician Pages: 375-384 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1482784 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1482784 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:375-384 Template-Type: ReDIF-Article 1.0 Author-Name: Philip A. White Author-X-Name-First: Philip A. Author-X-Name-Last: White Author-Name: Candace Berrett Author-X-Name-First: Candace Author-X-Name-Last: Berrett Author-Name: E. Shannon Neeley-Tass Author-X-Name-First: E. Shannon Author-X-Name-Last: Neeley-Tass Author-Name: Michael G. Findley Author-X-Name-First: Michael G. Author-X-Name-Last: Findley Title: Modeling Efficiency of Foreign Aid Allocation in Malawi Abstract: The Open Aid Malawi initiative has collected an unprecedented database that identifies as much location-specific information as possible for each of over 2500 individual foreign aid donations to Malawi since 2003. The efficient use and distribution of such aid is important to donors and to Malawi citizens. However, because of individual donor goals and the difficulty of tracking donor coordination, it is hard to determine whether aid allocation is efficient. We compare several Bayesian spatial generalized linear mixed models to relate aid allocation to various economic indicators within seven donation sectors. We find that the spatial gamma regression model best predicts current aid allocation. While we are cautious about making strong claims based on this exploratory study, we provide a methodology by which one could (i) evaluate the efficiency of aid allocation via a study of the locations of current aid allocation as compared to the need at those locations and (ii) come up with a strategy for efficient allocation of resources in conditions where there exists an ideal relationship between aid allocation and economic sectors. Journal: The American Statistician Pages: 385-399 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1470032 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1470032 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:385-399 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald D. Snee Author-X-Name-First: Ronald D. Author-X-Name-Last: Snee Title: We Stand on the Shoulders of Giants—Pioneers of Statistics in Industry Abstract: Industrial statistics has a rich and proud heritage. The field was initiated in the 1920s and picked up steam in the 1950s with the establishment of industrial statistics groups in several companies including American Cyanamid, DuPont, General Electric, Kodak, Western Electric, Procter and Gamble, General Foods, and 3M. It can be argued that we are in the third generation of the development of the profession. Indeed we are standing on the shoulders of giants. Several pioneering industrial statistics organizations are profiled in this article. The focus is on the roots of the organizations, the people involved and their contributions to their employers, advancements in the field and the development of the profession.
Synthesis of this information provides some unique insights into who we are, what we have accomplished, and the needs and opportunities of the future. Journal: The American Statistician Pages: 400-407 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1543140 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543140 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:400-407 Template-Type: ReDIF-Article 1.0 Author-Name: Alexandre Galvão Patriota Author-X-Name-First: Alexandre Galvão Author-X-Name-Last: Patriota Title: On the Mean Value Theorem for Estimating Functions Abstract: Feng et al. revealed that the usual mean value theorem (MVT) should not be applied directly to a vector-valued function (e.g., the score function or a general estimating function under a multiparametric model). This note shows that the application of the Cramér–Wold device to a corrected version of the MVT is sufficient to obtain standard asymptotics for the estimators attained from vector-valued estimating functions. Journal: The American Statistician Pages: 408-410 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2018.1558110 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1558110 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:408-410 Template-Type: ReDIF-Article 1.0 Author-Name: Jesse Frey Author-X-Name-First: Jesse Author-X-Name-Last: Frey Title: Comment on VanDerwerken (2019) Journal: The American Statistician Pages: 411-412 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1604433 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604433 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:411-412 Template-Type: ReDIF-Article 1.0 Author-Name: Peter Bacchetti Author-X-Name-First: Peter Author-X-Name-Last: Bacchetti Title: The Other Arbitrary Cutoff Journal: The American Statistician Pages: 413-414 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1654920 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1654920 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:413-414 Template-Type: ReDIF-Article 1.0 Author-Name: Anelise G. Sabbag Author-X-Name-First: Anelise G. Author-X-Name-Last: Sabbag Title: Handbook of Educational Measurement and Psychometrics Using R. Journal: The American Statistician Pages: 415-416 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1676110 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1676110 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:415-416 Template-Type: ReDIF-Article 1.0 Author-Name: Megan D. Higgs Author-X-Name-First: Megan D. Author-X-Name-Last: Higgs Title: Randomistas: How Radical Researchers Are Changing Our World. Journal: The American Statistician Pages: 416-417 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1676111 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1676111 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:416-417 Template-Type: ReDIF-Article 1.0 Author-Name: Christian Litterer Author-X-Name-First: Christian Author-X-Name-Last: Litterer Title: Stochastic Processes: From Applications to Theory. Journal: The American Statistician Pages: 418-419 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1676116 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1676116 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:418-419 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Correction Journal: The American Statistician Pages: 420-420 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1660112 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1660112 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:420-420 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Collaborators Journal: The American Statistician Pages: 420-421 Issue: 4 Volume: 73 Year: 2019 Month: 10 X-DOI: 10.1080/00031305.2019.1680048 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1680048 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:420-421 Template-Type: ReDIF-Article 1.0 Author-Name: Luca Bagnato Author-X-Name-First: Luca Author-X-Name-Last: Bagnato Author-Name: Lucio De Capitani Author-X-Name-First: Lucio Author-X-Name-Last: De Capitani Author-Name: Antonio Punzo Author-X-Name-First: Antonio Author-X-Name-Last: Punzo Title: Testing for Serial Independence: Beyond the Portmanteau Approach Abstract: Portmanteau tests are typically used to test serial independence even if, by construction, they are generally powerful only in the presence of pairwise dependence between lagged variables. In this article, we present a simple statistic defining a new serial independence test, which is able to detect more general forms of dependence. In particular, unlike the Portmanteau tests, the resulting test is also powerful under a dependent process characterized by pairwise independence. A diagram, based on p-values from the proposed test, is introduced to investigate serial dependence. Finally, the effectiveness of the proposal is evaluated in a simulation study and with an application to financial data. Both show that the new test, used in synergy with the existing ones, helps in the identification of the true data-generating process. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 219-238 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2016.1264314 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264314 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:219-238 Template-Type: ReDIF-Article 1.0 Author-Name: Jin Zhang Author-X-Name-First: Jin Author-X-Name-Last: Zhang Title: Minimum Volume Confidence Sets for Two-Parameter Exponential Distributions Abstract: Under a reasonable restriction, we create the minimum volume confidence set for location and scale parameters of the exponential distribution.
Compared to existing methods, none of which has a minimum-area property, the new confidence set is the best (most accurate), attaining the smallest volume for any confidence level, sample size, and sample data. Journal: The American Statistician Pages: 213-218 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2016.1264315 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264315 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:213-218 Template-Type: ReDIF-Article 1.0 Author-Name: Hauke Thaden Author-X-Name-First: Hauke Author-X-Name-Last: Thaden Author-Name: Thomas Kneib Author-X-Name-First: Thomas Author-X-Name-Last: Kneib Title: Structural Equation Models for Dealing With Spatial Confounding Abstract: In regression analyses of spatially structured data, it is common practice to introduce spatially correlated random effects into the regression model to reduce or even avoid unobserved variable bias in the estimation of other covariate effects. If, besides the response, the covariates are also spatially correlated, the spatial effects may confound the effect of the covariates or vice versa. In this case, the model fails to identify the true covariate effect due to multicollinearity. For highly collinear continuous covariates, path analysis and structural equation modeling techniques prove to be helpful to disentangle direct covariate effects from indirect covariate effects arising from correlation with other variables. This work discusses the applicability of these techniques in regression setups, where spatial and covariate effects coincide at least partly and classical geoadditive models fail to separate these effects. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 239-252 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2017.1305290 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305290 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:239-252 Template-Type: ReDIF-Article 1.0 Author-Name: George W. Divine Author-X-Name-First: George W. Author-X-Name-Last: Divine Author-Name: H. James Norton Author-X-Name-First: H. James Author-X-Name-Last: Norton Author-Name: Anna E. Barón Author-X-Name-First: Anna E. Author-X-Name-Last: Barón Author-Name: Elizabeth Juarez-Colunga Author-X-Name-First: Elizabeth Author-X-Name-Last: Juarez-Colunga Title: The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians Abstract: To illustrate and document the tenuous connection between the Wilcoxon–Mann–Whitney (WMW) procedure and medians, its relationship to mean ranks is first contrasted with the relationship of a t-test to means. The quantity actually tested, $\widehat{\Pr}(X_1 < X_2) + \widehat{\Pr}(X_1 = X_2)/2$, is then described and recommended as the basis for an alternative summary statistic that can be employed instead of medians. In order to graphically represent an estimate of the quantity $\Pr(X_1 < X_2) + \Pr(X_1 = X_2)/2$, use of a bubble plot, an ROC curve, and a dominance diagram are illustrated. Several counter-examples (real and constructed) are presented, all demonstrating that the WMW procedure fails to be a test of medians.
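A minimal R sketch of the plug-in estimate $\widehat{\Pr}(X_1 < X_2) + \widehat{\Pr}(X_1 = X_2)/2$ just defined in the Divine et al. abstract (the helper name ph is ours; up to orientation, this is the Mann–Whitney U statistic divided by the number of pairs):

    # Plug-in estimate of Pr(X1 < X2) + Pr(X1 = X2)/2 from two samples
    ph <- function(x1, x2) {
      mean(outer(x1, x2, "<")) + 0.5 * mean(outer(x1, x2, "=="))
    }
    ph(c(1, 2, 2, 5), c(2, 3, 4, 4))   # 0.6875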
The discussion also addresses another, less common and perhaps less clear cut, but potentially even more important misconception: that the WMW procedure requires continuous data in order to be valid. Discussion of other issues surrounding the question of the WMW procedure and medians is presented, along with the authors' teaching experience with the topic. SAS code used for the examples is included as supplementary material. Journal: The American Statistician Pages: 278-286 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2017.1305291 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305291 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:278-286 Template-Type: ReDIF-Article 1.0 Author-Name: Ashok Kumar Pathak Author-X-Name-First: Ashok Kumar Author-X-Name-Last: Pathak Title: A Simple Probabilistic Proof for the Alternating Convolution of the Central Binomial Coefficients Abstract: This note presents a simple probabilistic proof of the identity for the alternating convolution of the central binomial coefficients. The proof of the identity involves the computation of moments of order n for the product of standard normal random variables. Journal: The American Statistician Pages: 287-288 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2017.1358216 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1358216 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:287-288 Template-Type: ReDIF-Article 1.0 Author-Name: Weihua An Author-X-Name-First: Weihua Author-X-Name-Last: An Author-Name: Ying Ding Author-X-Name-First: Ying Author-X-Name-Last: Ding Title: The Landscape of Causal Inference: Perspective From Citation Network Analysis Abstract: Causal inference is a fast-growing multidisciplinary field that has drawn extensive interests from statistical sciences and health and social sciences. In this article, we gather comprehensive information on publications and citations in causal inference and provide a review of the field from the perspective of citation network analysis. We provide descriptive analyses by showing the most cited publications, the most prolific and the most cited authors, and structural properties of the citation network. Then, we examine the citation network through exponential random graph models (ERGMs). We show that both technical aspects of the publications (e.g., publication length, time and quality) and social processes such as homophily (the tendency to cite publications in the same field or with shared authors), cumulative advantage, and transitivity (the tendency to cite references’ references), matter for citations. We also provide specific analysis of citations among the top authors in the field and present a ranking and clustering of the authors. Overall, our article reveals new insights into the landscape of the field of causal inference and may serve as a case study for analyzing citation networks in a multidisciplinary field and for fitting ERGMs on big networks. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 265-277 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2017.1360794 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1360794 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
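On the Pathak note above: the identity in question is presumably the classical alternating convolution

    \sum_{k=0}^{n} (-1)^k \binom{2k}{k} \binom{2n-2k}{n-k} =
      \begin{cases} 2^n \binom{n}{n/2}, & n \text{ even}, \\ 0, & n \text{ odd}, \end{cases}

which can also be seen analytically by comparing coefficients in (1 - 4x)^{-1/2} (1 + 4x)^{-1/2} = (1 - 16x^2)^{-1/2}, the product of the generating functions of the central binomial coefficients with and without alternating signs.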
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:265-277 Template-Type: ReDIF-Article 1.0 Author-Name: Gilbert W. Fellingham Author-X-Name-First: Gilbert W. Author-X-Name-Last: Fellingham Author-Name: Jared D. Fisher Author-X-Name-First: Jared D. Author-X-Name-Last: Fisher Title: Predicting Home Run Production in Major League Baseball Using a Bayesian Semiparametric Model Abstract: This article attempts to predict home run hitting performance of Major League Baseball players using a Bayesian semiparametric model. Following Berry, Reese, and Larkey, we include in the model effects for era of birth, season of play, and home ball park. We estimate performance curves for each player using orthonormal quartic polynomials. We use a Dirichlet process prior on the unknown distribution for the coefficients of the polynomials, and parametric priors for the other effects. Dirichlet process priors are useful in prediction for two reasons: (1) an increased probability of obtaining more precise prediction comes with the increased flexibility of the prior specification, and (2) the clustering inherent in the Dirichlet process provides the means to share information across players. Data from 1871 to 2008 were used to fit the model. Data from 2009 to 2016 were used to test the predictive ability of the model. A parametric model was also fit to compare the predictive performance of the models. We used what we called “pure performance” curves to predict future performance for 22 players. The nonparametric method provided superior predictive performance. Journal: The American Statistician Pages: 253-264 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2017.1401959 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1401959 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:253-264 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel Cerqueira Author-X-Name-First: Daniel Author-X-Name-Last: Cerqueira Author-Name: Danilo Coelho Author-X-Name-First: Danilo Author-X-Name-Last: Coelho Author-Name: Marcelo Fernandes Author-X-Name-First: Marcelo Author-X-Name-Last: Fernandes Author-Name: Jony Pinto Junior Author-X-Name-First: Jony Pinto Author-X-Name-Last: Junior Title: Guns and Suicides Abstract: There is a consensus in the literature that the ratio of suicides committed with guns to total suicides is the best indirect measure of gun ownership. However, such a proxy is not accurate for localities with low population density, given that suicides are rare events. To circumvent this issue, we exploit the socioeconomic characteristics of the suicide victims in order to come up with a novel proxy for gun ownership. We assess our indicator using suicide micro-data from the Brazilian Ministry of Health between 2000 and 2010. Journal: The American Statistician Pages: 289-294 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2017.1419144 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1419144 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:289-294 Template-Type: ReDIF-Article 1.0 Author-Name: Olivier J. M. Guilbaud Author-X-Name-First: Olivier J. M.
Author-X-Name-Last: Guilbaud Title: Some Complementary History and Results Journal: The American Statistician Pages: 300-301 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2018.1448892 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448892 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:300-301 Template-Type: ReDIF-Article 1.0 Author-Name: Alan Hutson Author-X-Name-First: Alan Author-X-Name-Last: Hutson Title: Comment on “What Do Interpolated Nonparametric Confidence Intervals for Population Quantiles Guarantee?”, Frey and Zhang (2017) Journal: The American Statistician Pages: 302-302 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2018.1448893 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448893 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:302-302 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 295-299 Issue: 3 Volume: 72 Year: 2018 Month: 7 X-DOI: 10.1080/00031305.2018.1496649 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1496649 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:295-299 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher Abdul-Chani Author-X-Name-First: Christopher Author-X-Name-Last: Abdul-Chani Author-Name: Jesse Frey Author-X-Name-First: Jesse Author-X-Name-Last: Frey Title: Improving the Big East Conference Basketball Tournament Abstract: The Big East Conference basketball tournament is a four-day, 10-team knockout tournament that is used to decide which team receives the conference’s automatic bid to the NCAA basketball tournament. Through data-based modeling, we show that the current tournament format is not very effective in determining the true best team. Specifically, by considering a variety of alternate formats, we find that certain formats that exclude all but a handful of teams substantially outperform the current format in determining the true best team. We also find that among formats that involve all ten teams, a format in which the top two seeds each receive two byes is relatively effective. We show that our conclusions are robust to several key modeling choices. We also investigate the effectiveness of the tie-breaking scheme used by the Big East Conference, finding that it is little better than random and may even favor weaker teams. Journal: The American Statistician Pages: 342-349 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2015.1105153 File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105153 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:342-349 Template-Type: ReDIF-Article 1.0 Author-Name: David Quarfoot Author-X-Name-First: David Author-X-Name-Last: Quarfoot Author-Name: Richard A. Levine Author-X-Name-First: Richard A. Author-X-Name-Last: Levine Title: How Robust Are Multirater Interrater Reliability Indices to Changes in Frequency Distribution? Abstract: Interrater reliability studies are used in a diverse set of fields. Often, these investigations involve three or more raters and thus require the use of indices such as Fleiss’s kappa, Conger’s kappa, or Krippendorff’s alpha.
Through two motivating examples—one theoretical and one from practice—this article exposes limitations of these indices when the units to be rated are not well-distributed across the rating categories. Then, using a Monte Carlo simulation and information visualizations, we argue for the use of two alternative indices, the Brennan–Prediger coefficient and Gwet’s AC2, because the agreement levels reported by these indices are more robust to variation in the distribution of units that raters encounter. The article concludes by exploring the complex, interwoven relationship between the number of levels in a rating instrument, the agreement level present among raters, and the distribution of units that are to be scored. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 373-384 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1141708 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141708 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:373-384 Template-Type: ReDIF-Article 1.0 Author-Name: Matthias Katzfuss Author-X-Name-First: Matthias Author-X-Name-Last: Katzfuss Author-Name: Jonathan R. Stroud Author-X-Name-First: Jonathan R. Author-X-Name-Last: Stroud Author-Name: Christopher K. Wikle Author-X-Name-First: Christopher K. Author-X-Name-Last: Wikle Title: Understanding the Ensemble Kalman Filter Abstract: The ensemble Kalman filter (EnKF) is a computational technique for approximate inference in state-space models. In typical applications, the state vectors are large spatial fields that are observed sequentially over time. The EnKF approximates the Kalman filter by representing the distribution of the state with an ensemble of draws from that distribution. The ensemble members are updated based on newly available data by shifting instead of reweighting, which allows the EnKF to avoid the degeneracy problems of reweighting-based algorithms. Taken together, the ensemble representation and shifting-based updates make the EnKF computationally feasible even for extremely high-dimensional state spaces. The EnKF is successfully used in data-assimilation applications with tens of millions of dimensions. While it implicitly assumes a linear Gaussian state-space model, it has also turned out to be remarkably robust to deviations from these assumptions in many applications. Despite its successes, the EnKF is largely unknown in the statistics community. We aim to change that with the present article, and to entice more statisticians to work on this topic. Journal: The American Statistician Pages: 350-357 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1141709 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141709 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:350-357 Template-Type: ReDIF-Article 1.0 Author-Name: Brendan Rocks Author-X-Name-First: Brendan Author-X-Name-Last: Rocks Title: Interval Estimation for the “Net Promoter Score” Abstract: The net promoter score (NPS) is a novel summary statistic used by thousands of companies as a key performance indicator of customer loyalty. While adoption of the statistic has grown rapidly over the last decade, there has been little published on its statistical properties. 
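To make the shifting-based update described in the Katzfuss, Stroud, and Wikle abstract concrete, here is a minimal R sketch of one analysis step of the stochastic (perturbed-observation) EnKF; the function is our illustration of the textbook update, not code from the article:

    # One stochastic EnKF analysis step.
    # X: n x N matrix whose columns are ensemble members; y: m observations;
    # H: m x n observation matrix; R: m x m observation error covariance.
    enkf_update <- function(X, y, H, R) {
      N <- ncol(X)
      A <- X - rowMeans(X)                               # ensemble anomalies
      P <- A %*% t(A) / (N - 1)                          # sample forecast covariance
      K <- P %*% t(H) %*% solve(H %*% P %*% t(H) + R)    # Kalman gain
      Yp <- y + t(chol(R)) %*% matrix(rnorm(nrow(R) * N), nrow(R), N)
      X + K %*% (Yp - H %*% X)                           # shift each member
    }

Each member is shifted toward its own perturbed observation, so no member is ever reweighted or discarded, which is what lets the EnKF sidestep the degeneracy of reweighting-based filters.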
Common interval estimation techniques are adapted for use with the NPS, and their performance is assessed on the largest available database of companies’ net promoter scores. Variations on the adjusted Wald interval and an iterative score test are found to have superior performance. Journal: The American Statistician Pages: 365-372 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1158124 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1158124 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:365-372 Template-Type: ReDIF-Article 1.0 Author-Name: Jack Bowden Author-X-Name-First: Jack Author-X-Name-Last: Bowden Author-Name: Chris Jackson Author-X-Name-First: Chris Author-X-Name-Last: Jackson Title: Weighing Evidence “Steampunk” Style via the Meta-Analyser Abstract: The funnel plot is a graphical visualization of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modeling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the center of mass of a physical system. We used this analogy to explain the concepts of weighing evidence and of biased evidence to a young audience at the Cambridge Science Festival, without recourse to precise definitions or statistical formulas and with a little help from Sherlock Holmes! Following on from the science fair, we have developed an interactive web-application (named the Meta-Analyser) to bring these ideas to a wider audience. We envisage that our application will be a useful tool for researchers when interpreting their data. First, to facilitate a simple understanding of fixed and random effects modeling approaches; second, to assess the importance of outliers; and third, to show the impact of adjusting for small study bias. This final aim is realized by introducing a novel graphical interpretation of the well-known method of Egger regression. Journal: The American Statistician Pages: 385-394 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1165735 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1165735 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:385-394 Template-Type: ReDIF-Article 1.0 Author-Name: Joel E. Cohen Author-X-Name-First: Joel E. Author-X-Name-Last: Cohen Title: Statistics of Primes (and Probably Twin Primes) Satisfy Taylor's Law from Ecology Abstract: Taylor's law, which originated in ecology, states that, in sets of measurements of population density, the sample variance is approximately proportional to a power of the sample mean. Taylor's law has been verified for many species ranging from bacterial to human. Here, we show that the variance V(x) and the mean M(x) of the primes not exceeding a real number x obey Taylor's law asymptotically for large x. Specifically, V(x) ∼ (1/3)(M(x))² as x → ∞. This apparently new fact about primes shows that Taylor's law may arise in the absence of biological processes, and that patterns discovered in biological data can suggest novel questions in number theory. If the Hardy-Littlewood twin primes conjecture is true, then the identical Taylor's law holds also for twin primes.
Taylor's law holds in both instances because the primes (and the twin primes, given the conjecture) not exceeding x are asymptotically uniformly distributed on the integers in [2, x]. Hence, asymptotically M(x) ∼ x/2, V(x) ∼ x²/12. Higher-order moments of the primes (twin primes) not exceeding x satisfy a generalized Taylor's law. The 11,078,937 primes and 813,371 twin primes not exceeding 2 × 10⁸ illustrate these results. Journal: The American Statistician Pages: 399-404 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1173591 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1173591 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:399-404 Template-Type: ReDIF-Article 1.0 Author-Name: Adam Jaeger Author-X-Name-First: Adam Author-X-Name-Last: Jaeger Title: Computation of Two- and Three-Dimensional Confidence Regions With the Likelihood Ratio Abstract: The asymptotic results pertaining to the distribution of the log-likelihood ratio allow for the creation of a confidence region, which is a general extension of the confidence interval. Two- and three-dimensional regions can be displayed visually to describe the plausible region of the parameters of interest simultaneously. While most advanced statistical textbooks on inference discuss these asymptotic confidence regions, there is no exploration of how to numerically compute these regions for graphical purposes. This article demonstrates the application of a simple trigonometric transformation to compute two- and three-dimensional confidence regions; we transform the Cartesian coordinates of the parameters to create what we call the radial profile log-likelihood. The method is applicable to any distribution with a defined likelihood function, so it is not limited to specific data distributions or model paradigms. We describe the method along with the algorithm, follow with an example of our method, and end with an examination of computation time. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 395-398 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1182946 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1182946 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:395-398 Template-Type: ReDIF-Article 1.0 Author-Name: Piaomu Liu Author-X-Name-First: Piaomu Author-X-Name-Last: Liu Author-Name: Edsel A. Peña Author-X-Name-First: Edsel A. Author-X-Name-Last: Peña Title: Sojourning With the Homogeneous Poisson Process Abstract: In this pedagogical article, distributional properties, some surprising, pertaining to the homogeneous Poisson process (HPP), when observed over a possibly random window, are presented. Properties of the gap-time that covered the termination time and the correlations among gap-times of the observed events are obtained. Inference procedures, such as estimation and model validation, based on event occurrence data over the observation window, are also presented. We envision that through the results in this article, a better appreciation of the subtleties involved in the modeling and analysis of recurrent events data will ensue, since the HPP is arguably one of the simplest among recurrent event models.
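A quick numerical check of the Cohen result above (our sketch: the sieve helper is ours, and the sample variance stands in for the population variance, which is immaterial for large x):

    primes_upto <- function(x) {            # simple sieve of Eratosthenes
      s <- rep(TRUE, x); s[1] <- FALSE
      for (i in 2:floor(sqrt(x))) if (s[i]) s[seq(i * i, x, by = i)] <- FALSE
      which(s)
    }
    p <- primes_upto(2e6)
    var(p) / (mean(p)^2 / 3)                # tends to 1 as x grows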
In addition, the use of the theorem of total probability, Bayes’ theorem, the iterated rules of expectation, variance and covariance, and the renewal equation could be illustrative when teaching distribution theory, mathematical statistics, and stochastic processes at both the undergraduate and graduate levels. This article is targeted toward both instructors and students. Journal: The American Statistician Pages: 413-423 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1200484 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200484 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:413-423 Template-Type: ReDIF-Article 1.0 Author-Name: Yan Xia Author-X-Name-First: Yan Author-X-Name-Last: Xia Author-Name: Yanyun Yang Author-X-Name-First: Yanyun Author-X-Name-Last: Yang Title: Bias Introduced by Rounding in Multiple Imputation for Ordered Categorical Variables Abstract: Multivariate normality is frequently assumed when multiple imputation is applied for missing data. When data are ordered categorical, imputing missing data using the fully normal imputation results in implausible values falling outside of the categorical values. Naïve rounding, which replaces each imputed value by its nearest categorical neighbor, has been suggested for further analysis. Previous studies showed that, for binary data, the rounded values can result in biased mean estimation when the population distribution is asymmetric. However, it has been conjectured that as the number of categories increases, the bias will decrease. To investigate this conjecture, the present study derives the formulas for the biases of the mean and standard deviation for ordered categorical variables with naïve rounding. Results show that both the biases of the mean and standard deviation decrease as the number of categories increases from 3 to 9. This study also finds that although symmetric population distributions lead to unbiased means of the rounded values, the standard deviations may still be largely biased. A simulation study further shows that the biases due to naïve rounding can result in substantially reduced coverage rates for the population mean parameter. Journal: The American Statistician Pages: 358-364 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1200486 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200486 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:358-364 Template-Type: ReDIF-Article 1.0 Author-Name: Kimihiro Noguchi Author-X-Name-First: Kimihiro Author-X-Name-Last: Noguchi Author-Name: Fernando Marmolejo-Ramos Author-X-Name-First: Fernando Author-X-Name-Last: Marmolejo-Ramos Title: Assessing Equality of Means Using the Overlap of Range-Preserving Confidence Intervals Abstract: Hypothesis testing procedures where equality of means is assessed at a prespecified level based on the (non-)overlap of confidence intervals are discussed. Assessing statistical significance via the (non-)overlap of two confidence intervals with an appropriate confidence level provides a simple and effective way of visually understanding statistical results. This article extends previous approaches by considering range-preserving confidence intervals where the values in such intervals are in the allowable range of the parameter of interest.
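A minimal R illustration of the naive-rounding bias that the Xia and Yang abstract above builds on, in the binary case established by the earlier studies it cites (the population and sample sizes are our own toy choices):

    set.seed(1)
    p <- 0.2                                # asymmetric population: Bernoulli(0.2)
    x <- rbinom(1e5, 1, p)
    imp <- rnorm(1e5, mean(x), sd(x))       # "fully normal" imputed values
    mean(pmax(0, pmin(1, round(imp))))      # about 0.23: biased upward from 0.2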
To obtain reliable procedures, appropriate effective degrees of freedom are suggested by considering the Welch-Satterthwaite equation for both independent two-sample and paired-sample cases. The proposed procedures also allow users to express results in terms of commonly used scale-free effect sizes, which are highly useful for interpreting parameters of interest. Simulation results suggest that the proposed procedures may be robust to unequal or small sample sizes, nonnormal distributions, heterogeneous variances, and various degrees of correlation. A real-life application from a study in cognitive psychology illustrates the effectiveness of the proposed procedures. Journal: The American Statistician Pages: 325-334 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1200487 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200487 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:325-334 Template-Type: ReDIF-Article 1.0 Author-Name: Amy Wagaman Author-X-Name-First: Amy Author-X-Name-Last: Wagaman Title: Meeting Student Needs for Multivariate Data Analysis: A Case Study in Teaching an Undergraduate Multivariate Data Analysis Course Abstract: Modern students encounter large, messy datasets long before setting foot in our classrooms. Many of these students need to develop skills in exploratory data analysis and multivariate analysis techniques for their jobs after college, but such topics are not covered in traditional introductory statistics courses. This case study describes my experience in designing and teaching an undergraduate course on multivariate data analysis with minimal prerequisites, using real data, active learning, and other interactive activities to help students tackle the material. Multivariate topics covered include clustering and classification (among others) for exploratory data analysis and an introduction to algorithmic modeling. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 405-412 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1201005 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1201005 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:405-412 Template-Type: ReDIF-Article 1.0 Author-Name: Leonhard Held Author-X-Name-First: Leonhard Author-X-Name-Last: Held Author-Name: Manuela Ott Author-X-Name-First: Manuela Author-X-Name-Last: Ott Title: How the Maximal Evidence of p-Values Against Point Null Hypotheses Depends on Sample Size Abstract: Minimum Bayes factors are commonly used to transform two-sided p-values to lower bounds on the posterior probability of the null hypothesis. Several proposals exist in the literature, but none of them depends on the sample size. However, the evidence of a p-value against a point null hypothesis is known to depend on the sample size. In this article, we consider p-values in the linear model and propose new minimum Bayes factors that depend on sample size and converge to existing bounds as the sample size goes to infinity. It turns out that the maximal evidence of an exact two-sided p-value increases with decreasing sample size. The effect of adjusting minimum Bayes factors for sample size is shown in two applications.
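For orientation on the Held and Ott article above: the canonical sample-size-independent lower bound, presumably among the existing bounds their proposals converge to, is the -e p log(p) calibration of Sellke, Bayarri, and Berger, valid for p < 1/e. In R (our illustration, not the authors' new bounds):

    min_bf <- function(p) ifelse(p < exp(-1), -exp(1) * p * log(p), 1)
    min_bf(c(0.05, 0.01, 0.001))   # 0.407, 0.125, 0.0188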
Journal: The American Statistician Pages: 335-341 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1209128 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1209128 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:335-341 Template-Type: ReDIF-Article 1.0 Author-Name: Lawrence M. Lesser Author-X-Name-First: Lawrence M. Author-X-Name-Last: Lesser Title: Letter to the Editor Journal: The American Statistician Pages: 434-434 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1222310 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1222310 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:434-434 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 424-433 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1234902 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1234902 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:424-433 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Collaborators Journal: The American Statistician Pages: 435-437 Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1248726 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1248726 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:435-437 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Board EOV Journal: The American Statistician Pages: ebi-ebi Issue: 4 Volume: 70 Year: 2016 Month: 10 X-DOI: 10.1080/00031305.2016.1250537 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1250537 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:ebi-ebi Template-Type: ReDIF-Article 1.0 Author-Name: Alan D. Hutson Author-X-Name-First: Alan D. Author-X-Name-Last: Hutson Author-Name: Albert Vexler Author-X-Name-First: Albert Author-X-Name-Last: Vexler Title: A Cautionary Note on Beta Families of Distributions and the Aliases Within Abstract: In this note, we examine the four-parameter beta family of distributions in the context of the beta-normal and beta-logistic distributions. In the process, we highlight the concept of numerical and limiting alias distributions, which in turn relate to numerical instabilities in the numerical maximum likelihood fitting routines for these families of distributions. We conjecture that the numerical issues pertaining to fitting these multiparameter distributions may be more widespread across several families of distributions than originally reported. Journal: The American Statistician Pages: 121-129 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2016.1213661 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1213661 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:121-129 Template-Type: ReDIF-Article 1.0 Author-Name: Sashi Kanth Tadinada Author-X-Name-First: Sashi Kanth Author-X-Name-Last: Tadinada Author-Name: Abhinav Gupta Author-X-Name-First: Abhinav Author-X-Name-Last: Gupta Title: Simulation of Constrained Variables in Engineering Risk Analyses Abstract: The problem of sampling random variables with overlapping pdfs subject to inequality constraints is addressed. Often, the values of physical variables in an engineering model are interrelated. This mutual dependence imposes inequality constraints on the random variables representing these parameters. Ignoring the interdependencies and sampling the variables independently can lead to inconsistency/bias. We propose an algorithm to generate samples of constrained random variables that are characterized by typical continuous probability distributions and are subject to different kinds of inequality constraints. The sampling procedure is illustrated for various representative cases and one realistic application to simulation of structural natural frequencies. Journal: The American Statistician Pages: 130-139 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2016.1255660 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255660 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:130-139 Template-Type: ReDIF-Article 1.0 Author-Name: Santiago Velilla Author-X-Name-First: Santiago Author-X-Name-Last: Velilla Title: A Note on Collinearity Diagnostics and Centering Abstract: The usual approach for diagnosing collinearity proceeds by centering and standardizing the regressors. The sample correlation matrix of the predictors is then the basic tool for describing approximate linear combinations that may distort the conclusions of a standard least-squares analysis. However, as indicated by several authors, centering may fail to detect the sources of ill-conditioning. In spite of this earlier claim, the literature does not seem to offer a fully clear explanation of why the traditional strategy for analyzing collinearity can behave badly in this way. This note studies this issue in some detail. Results derived are motivated by the analysis of a well-known real dataset. Practical conclusions are illustrated with several examples. Journal: The American Statistician Pages: 140-146 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2016.1264312 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264312 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:140-146 Template-Type: ReDIF-Article 1.0 Author-Name: Joel B. Greenhouse Author-X-Name-First: Joel B. Author-X-Name-Last: Greenhouse Author-Name: Howard J. Seltman Author-X-Name-First: Howard J. Author-X-Name-Last: Seltman Title: On Teaching Statistical Practice: From Novice to Expert Abstract: This article introduces principles of learning based on research in cognitive science that help explain how learning works. We adapt these principles to the teaching of statistical practice and illustrate the application of these principles to the curricular design of a new master's degree program in applied statistics. We emphasize how these principles can be used not only to improve instruction at the course level but also at the program level.
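The failure mode discussed in the Velilla note above is easy to reproduce in R (our toy example, taking near-collinearity with the intercept as the source of ill-conditioning):

    set.seed(1)
    x <- 1000 + rnorm(50, sd = 0.01)   # regressor nearly collinear with the intercept
    kappa(cbind(1, x))                 # huge condition number: severely ill-conditioned
    kappa(cbind(1, x - mean(x)))       # orders of magnitude smaller after centering

Centering makes the near-dependence involving the constant column invisible, which is precisely why diagnostics computed from centered and standardized regressors can miss this source of ill-conditioning.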
Journal: The American Statistician Pages: 147-154 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2016.1270230 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1270230 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:147-154 Template-Type: ReDIF-Article 1.0 Author-Name: Rolf Sundberg Author-X-Name-First: Rolf Author-X-Name-Last: Sundberg Title: A Note on “Shaved Dice” Inference Abstract: Two dice are rolled repeatedly; only their sum is registered. Have the two dice been “shaved,” so two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler; it provides additional insight into the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content in ancillary statistics. Journal: The American Statistician Pages: 155-157 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2016.1277162 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277162 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:155-157 Template-Type: ReDIF-Article 1.0 Author-Name: José A. Sánchez-Espigares Author-X-Name-First: José A. Author-X-Name-Last: Sánchez-Espigares Author-Name: Pere Grima Author-X-Name-First: Pere Author-X-Name-Last: Grima Author-Name: Lluís Marco-Almagro Author-X-Name-First: Lluís Author-X-Name-Last: Marco-Almagro Title: Visualizing Type II Error in Normality Tests Abstract: A skewed exponential power distribution, with parameters defining kurtosis and skewness, is introduced as a way to visualize Type II error in normality tests. By varying these parameters, a mosaic of distributions is built, ranging from double exponential to uniform or from positive to negative exponential; the normal distribution is a particular case located in the center of the mosaic. Using a sequential color scheme, a different color is assigned to each distribution in the mosaic depending on the probability of committing a Type II error. This graph gives a visual representation of the power of the performed test. This way of representing results facilitates the comparison of the power of various tests and the influence of sample size. A script to perform this graphical representation, programmed in the R statistical software, is available online as supplementary material. Journal: The American Statistician Pages: 158-162 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2016.1278035 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1278035 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:158-162 Template-Type: ReDIF-Article 1.0 Author-Name: Saralees Nadarajah Author-X-Name-First: Saralees Author-X-Name-Last: Nadarajah Author-Name: Rui Li Author-X-Name-First: Rui Author-X-Name-Last: Li Title: An Expression for Fast Computation of Sample Central Moments Abstract: An expression is provided for the expectation of sample central moments. It is practical and offers computational advantages over the original form due to Kong (The American Statistician, 65, 2011, 198–199).
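As a small companion to the Nadarajah and Li note above: for the second sample central moment m_2, the expectation has the familiar closed form E(m_2) = (n - 1)σ²/n, which a short simulation confirms (our sketch):

    set.seed(1)
    n <- 10
    m2 <- replicate(2e4, { x <- rnorm(n, sd = 2); mean((x - mean(x))^2) })
    c(monte_carlo = mean(m2), exact = (n - 1) / n * 4)   # both about 3.6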
Journal: The American Statistician Pages: 169-171 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1286259 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1286259 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:169-171 Template-Type: ReDIF-Article 1.0 Author-Name: P. M. Kroonenberg Author-X-Name-First: P. M. Author-X-Name-Last: Kroonenberg Author-Name: Albert Verbeek Author-X-Name-First: Albert Author-X-Name-Last: Verbeek Title: The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do? Abstract: In an informal way, some dilemmas in connection with hypothesis testing in contingency tables are discussed. The body of the article concerns the numerical evaluation of Cochran's Rule about the minimum expected value in r × c contingency tables with fixed margins when testing independence with Pearson's X² statistic using the χ² distribution. Journal: The American Statistician Pages: 175-183 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1286260 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1286260 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:175-183 Template-Type: ReDIF-Article 1.0 Author-Name: Timothy G. Gregoire Author-X-Name-First: Timothy G. Author-X-Name-Last: Gregoire Author-Name: David L. R. Affleck Author-X-Name-First: David L. R. Author-X-Name-Last: Affleck Title: Estimating Desired Sample Size for Simple Random Sampling of a Skewed Population Abstract: A simulation study was conducted to assess how well the necessary sample size to achieve a stipulated margin of error can be estimated prior to sampling. Our concern was particularly focused on performance when sampling from a very skewed distribution, which is a common feature of many biological, economic, and other populations. We examined two approaches for estimating sample size—one being the commonly used strategy aimed at regulating the average magnitude of the stipulated margin of error and the second being a previously proposed strategy to control the tolerance probability with which the stipulated margin of error is exceeded. Results of the simulation revealed that (1) skewness does not much affect the average estimated sample size but can greatly extend the range of estimated sample sizes; and (2) skewness does reduce the effectiveness of Kupper and Hafner's sample size estimator, yet its effectiveness is impacted less by skewness directly than by the common practice of estimating the population variance via a pilot sample from the skewed population. Nonetheless, the simulations suggest that estimating sample size to control the probability with which the desired margin of error is achieved is a worthwhile alternative to the usual sample size formula that controls the average width of the confidence interval only. Journal: The American Statistician Pages: 184-190 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1290548 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1290548 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:184-190 Template-Type: ReDIF-Article 1.0 Author-Name: Peng Ding Author-X-Name-First: Peng Author-X-Name-Last: Ding Author-Name: Joseph K. Blitzstein Author-X-Name-First: Joseph K.
Author-X-Name-Last: Blitzstein Title: On the Gaussian Mixture Representation of the Laplace Distribution Abstract: Under certain conditions, a symmetric unimodal continuous random variable ξ can be represented as a scale mixture of a standard Normal distribution Z, that is, ξ=WZ$\xi = \sqrt{W} Z$, where the mixing distribution W is independent of Z. It is well known that if the mixing distribution is inverse Gamma, then ξ has Student’s t distribution. However, it is less well known that if the mixing distribution is Gamma, then ξ has a Laplace distribution. Several existing proofs of the latter result rely on complex calculus or nontrivial change of variables in integrals. We offer two simple and intuitive proofs based on representation and moment generating functions. As a byproduct, our proof by representation makes connections to many existing results in statistics. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 172-174 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1291448 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1291448 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:172-174 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel R. Jeske Author-X-Name-First: Daniel R. Author-X-Name-Last: Jeske Author-Name: Janet M. Myhre Author-X-Name-First: Janet M. Author-X-Name-Last: Myhre Title: Regression Using Pairs vs. Regression on Differences: A Real-life Case Study for a Master's Level Methods Class Abstract: When teaching regression classes real-life examples help emphasize the importance of understanding theoretical concepts related to methodologies. This can be appreciated after a little reflection on the difficulty of constructing novel questions in regression that test on concepts rather than mere calculations. Interdisciplinary collaborations can be fertile contexts for questions of this type. In this article, we offer a case study that students will find: (1) practical with respect to the question being addressed, (2) compelling in the way it shows how a solid understanding of theory helps answer the question, and (3) enlightening in the way it shows how statisticians contribute to problem solving in interdisciplinary environments. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 163-168 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1292956 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1292956 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:163-168 Template-Type: ReDIF-Article 1.0 Author-Name: Kathryn Schaefer Ziemer Author-X-Name-First: Kathryn Schaefer Author-X-Name-Last: Ziemer Author-Name: Bianica Pires Author-X-Name-First: Bianica Author-X-Name-Last: Pires Author-Name: Vicki Lancaster Author-X-Name-First: Vicki Author-X-Name-Last: Lancaster Author-Name: Sallie Keller Author-X-Name-First: Sallie Author-X-Name-Last: Keller Author-Name: Mark Orr Author-X-Name-First: Mark Author-X-Name-Last: Orr Author-Name: Stephanie Shipp Author-X-Name-First: Stephanie Author-X-Name-Last: Shipp Title: A New Lens on High School Dropout: Use of Correspondence Analysis and the Statewide Longitudinal Data System Abstract: The combination of log-linear models and correspondence analysis has long been used to decompose contingency tables and aid in their interpretation. Until now, this approach has not been applied to the education Statewide Longitudinal Data System (SLDS), which contains administrative school data at the student level. While some research has been conducted using the SLDS, its primary use is for state education administrative reporting. This article uses the combination of log-linear models and correspondence analysis to gain insight into high school dropouts in two discrete regions in Kentucky, Appalachia and non-Appalachia, defined by the American Community Survey. The individual student records from the SLDS were categorized into one of the two regions and a log-linear model was used to identify the interactions between the demographic characteristics and the dropout categories, push-out and pull-out. Correspondence analysis was then used to visualize the interactions with the expanded push-out categories (boredom, course selection, expulsion, failing grade, and teacher conflict) and pull-out categories (employment, family problems, illness, marriage, and pregnancy) to provide insights into the regional differences. In this article, we demonstrate that correspondence analysis can extend the insights gained from SLDS data and provide new perspectives on dropouts. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 191-198 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1322002 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322002 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:191-198 Template-Type: ReDIF-Article 1.0 Author-Name: Alexander B. Sibley Author-X-Name-First: Alexander Author-X-Name-Last: B. Sibley Author-Name: Zhiguo Li Author-X-Name-First: Zhiguo Author-X-Name-Last: Li Author-Name: Yu Jiang Author-X-Name-First: Yu Author-X-Name-Last: Jiang Author-Name: Yi-Ju Li Author-X-Name-First: Yi-Ju Author-X-Name-Last: Li Author-Name: Cliburn Chan Author-X-Name-First: Cliburn Author-X-Name-Last: Chan Author-Name: Andrew Allen Author-X-Name-First: Andrew Author-X-Name-Last: Allen Author-Name: Kouros Owzar Author-X-Name-First: Kouros Author-X-Name-Last: Owzar Title: Facilitating the Calculation of the Efficient Score Using Symbolic Computing Abstract: The score statistic continues to be a fundamental tool for statistical inference.
In the analysis of data from high-throughput genomic assays, inference on the basis of the score usually enjoys greater stability and considerably higher computational efficiency than the asymptotically equivalent Wald or likelihood ratio tests, and it lends itself more readily to the use of resampling methods. The score function often depends on a set of unknown nuisance parameters that must be replaced by estimators; the resulting inference can be improved by calculating the efficient score, which accounts for the variability induced by estimating these parameters. Manual derivation of the efficient score is tedious and error-prone, so we illustrate the use of computer algebra to facilitate this derivation. We demonstrate this process within the context of a standard example from genetic association analyses, though the techniques shown here could be applied to any derivation, and they have a place in the toolbox of any modern statistician. We further show how the resulting symbolic expressions can be readily ported to compiled languages to develop fast numerical algorithms for high-throughput genomic analysis. We conclude by considering extensions of this approach. The code featured in this report is available online as part of the supplementary material. Journal: The American Statistician Pages: 199-205 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2017.1392361 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392361 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:199-205 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 206-212 Issue: 2 Volume: 72 Year: 2018 Month: 4 X-DOI: 10.1080/00031305.2018.1469927 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1469927 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:206-212 Template-Type: ReDIF-Article 1.0 Author-Name: Amelia McNamara Author-X-Name-First: Amelia Author-X-Name-Last: McNamara Author-Name: Nicholas J. Horton Author-X-Name-First: Nicholas J. Author-X-Name-Last: Horton Title: Wrangling Categorical Data in R Abstract: Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators and periodically updated dynamic data. This article discusses common problems arising from categorical variable transformations in R, demonstrates the use of factors, and suggests approaches to address data wrangling challenges. For each problem, we present at least two strategies for management, one in base R and the other from the “tidyverse.” We consider several motivating examples, suggest defensive coding strategies, and outline principles for data wrangling to help ensure data quality and sound analysis. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 97-104 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1356375 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1356375 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:97-104 Template-Type: ReDIF-Article 1.0 Author-Name: Benjamin S. Baumer Author-X-Name-First: Benjamin S.
Author-X-Name-Last: Baumer Title: Lessons From Between the White Lines for Isolated Data Scientists Abstract: Many current and future data scientists will be “isolated”—working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports industry and teaching in academia, I discuss troubled waters likely to be encountered by newly minted data scientists and offer advice about how to navigate them. Neither the issues raised nor the advice given is particular to sports; both should be applicable to a wide range of knowledge domains. Journal: The American Statistician Pages: 66-71 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1375985 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375985 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:66-71 Template-Type: ReDIF-Article 1.0 Author-Name: Ben Marwick Author-X-Name-First: Ben Author-X-Name-Last: Marwick Author-Name: Carl Boettiger Author-X-Name-First: Carl Author-X-Name-Last: Boettiger Author-Name: Lincoln Mullen Author-X-Name-First: Lincoln Author-X-Name-Last: Mullen Title: Packaging Data Analytical Work Reproducibly Using R (and Friends) Abstract: Computers are a central tool in the research process, enabling complex and large-scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address this challenge, we review the concept of the research compendium as a solution for providing a standard and easily recognizable way for organizing the digital materials of a research project to enable other researchers to inspect, reproduce, and extend the research. We investigate how the structure and tooling of software packages of the R programming language are being used to produce research compendia in a variety of disciplines. We also describe how software engineering tools and services are being used by researchers to streamline working with research compendia. Using real-world examples, we show how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools. Journal: The American Statistician Pages: 80-88 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1375986 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375986 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:80-88 Template-Type: ReDIF-Article 1.0 Author-Name: Shannon E. Ellis Author-X-Name-First: Shannon E. Author-X-Name-Last: Ellis Author-Name: Jeffrey T. Leek Author-X-Name-First: Jeffrey T. Author-X-Name-Last: Leek Title: How to Share Data for Collaboration Abstract: Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to collaborators generating the data. To bridge this divide, we have established a set of guidelines for sharing data. In these, we highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of including all essential experimental information and any pre-processing steps already carried out. With these guidelines, we hope to avoid errors and delays in data analysis.
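To make the workflow in the Sibley et al. entry above ("Facilitating the Calculation of the Efficient Score Using Symbolic Computing") concrete, base R can already perform simple symbolic differentiation of a log-likelihood to produce a score function. The sketch below is ours, using a Poisson log-likelihood rather than the article's genetic-association derivation, and is meant only to illustrate the kind of derivation a computer algebra system automates.

# Minimal sketch (ours, not the article's): symbolic derivation of a
# score function with base R's D(), for one Poisson observation with
# mean lambda.
loglik <- expression(x * log(lambda) - lambda - lgamma(x + 1))
score  <- D(loglik, "lambda")        # symbolic d/d(lambda) of the log-likelihood
score                                # prints: x * (1/lambda) - 1
eval(score, list(x = 3, lambda = 2)) # evaluate the derived score: 0.5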
Journal: The American Statistician Pages: 53-57 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1375987 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375987 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:53-57 Template-Type: ReDIF-Article 1.0 Author-Name: Lance A. Waller Author-X-Name-First: Lance A. Author-X-Name-Last: Waller Title: Documenting and Evaluating Data Science Contributions in Academic Promotion in Departments of Statistics and Biostatistics Abstract: The dynamic intersection of the field of Data Science with the established academic communities of Statistics and Biostatistics continues to generate lively debate, often with the two fields playing the role of an upstart (but brilliant), tech-savvy prodigy and an established (but brilliant), curmudgeonly expert, respectively. Like any emerging discipline, Data Science brings new perspectives and new tools to address new questions requiring new perspectives on traditionally established concepts. We explore a specific component of this discussion, namely the documentation and evaluation of Data Science-related research, teaching, and service contributions for faculty members seeking promotion and tenure within traditional departments of Statistics and Biostatistics. We focus on three perspectives: the department chair nominating a candidate for promotion, the junior faculty member going up for promotion, and the senior faculty members evaluating the promotion package. We contrast conservative, strategic, and iconoclastic approaches to promotion based on accomplishments in data science. Journal: The American Statistician Pages: 11-19 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1375988 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375988 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:11-19 Template-Type: ReDIF-Article 1.0 Author-Name: Karl W. Broman Author-X-Name-First: Karl W. Author-X-Name-Last: Broman Author-Name: Kara H. Woo Author-X-Name-First: Kara H. Author-X-Name-Last: Woo Title: Data Organization in Spreadsheets Abstract: Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files. Journal: The American Statistician Pages: 2-10 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1375989 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375989 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
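For the McNamara and Horton entry above ("Wrangling Categorical Data in R"), here is a minimal sketch of the paired base R versus tidyverse strategies the abstract describes. The toy factor and its misspelled level are invented for illustration; both idioms below are standard, documented usage.

# Illustrative sketch (not from the article): repairing a typo level
# in a factor, once in base R and once with forcats.
x <- factor(c("strongly agree", "agree", "agre"))   # "agre" is a typo level
# Base R: assigning a duplicate level name merges the two levels
x_base <- x
levels(x_base)[levels(x_base) == "agre"] <- "agree"
# Tidyverse: forcats::fct_recode takes new_name = "old_name" pairs
library(forcats)
x_tidy <- fct_recode(x, agree = "agre")
table(x_base); table(x_tidy)                        # identical counts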
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:2-10 Template-Type: ReDIF-Article 1.0 Author-Name: Dirk Eddelbuettel Author-X-Name-First: Dirk Author-X-Name-Last: Eddelbuettel Author-Name: James Joseph Balamuta Author-X-Name-First: James Joseph Author-X-Name-Last: Balamuta Title: Extending R with C++: A Brief Introduction to Rcpp Abstract: R has always provided an application programming interface (API) for extensions. Based on the C language, it uses a number of macros and other low-level constructs to exchange data structures between the R process and any dynamically loaded component modules authors added to it. With the introduction of the Rcpp package, and its later refinements, this process has become considerably easier yet also more robust. By now, Rcpp has become the most popular extension mechanism for R. This article introduces Rcpp, and illustrates with several examples how the Rcpp Attributes mechanism in particular eases the transition of objects between R and C++ code. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 28-36 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1375990 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375990 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:28-36 Template-Type: ReDIF-Article 1.0 Author-Name: Sean J. Taylor Author-X-Name-First: Sean J. Author-X-Name-Last: Taylor Author-Name: Benjamin Letham Author-X-Name-First: Benjamin Author-X-Name-Last: Letham Title: Forecasting at Scale Abstract: Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high-quality forecasts—especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis. We propose a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series. We describe performance analyses to compare and evaluate forecasting procedures, and automatically flag forecasts for manual review and adjustment. Tools that help analysts to use their expertise most effectively enable reliable, practical forecasting of business time series. Journal: The American Statistician Pages: 37-45 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1380080 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1380080 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:37-45 Template-Type: ReDIF-Article 1.0 Author-Name: Ricardo Bion Author-X-Name-First: Ricardo Author-X-Name-Last: Bion Author-Name: Robert Chang Author-X-Name-First: Robert Author-X-Name-Last: Chang Author-Name: Jason Goodman Author-X-Name-First: Jason Author-X-Name-Last: Goodman Title: How R Helps Airbnb Make the Most of its Data Abstract: At Airbnb, R has been among the most popular tools for doing data science work in many different contexts, including generating product insights, interpreting experiments, and building predictive models. 
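To make the Rcpp Attributes mechanism mentioned in the Eddelbuettel and Balamuta entry above concrete, here is a minimal hedged sketch of our own (not one of the article's examples): the // [[Rcpp::export]] attribute lets sourceCpp() compile the C++ function and expose it to R automatically.

# Minimal Rcpp Attributes sketch (illustrative only).
library(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
// [[Rcpp::export]]
double sumsq(Rcpp::NumericVector x) {
  double s = 0.0;
  for (double xi : x) s += xi * xi;  // loop over the R vector in C++
  return s;
}
')
sumsq(c(1, 2, 3))  # returns 14, called like any R function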
Airbnb supports R usage by creating internal R tools and by fostering a community of R users. We provide some specific advice for practitioners who wish to incorporate R into their day-to-day workflow. Journal: The American Statistician Pages: 46-52 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1392362 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392362 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:46-52 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on “A Note on Collinearity Diagnostics and Centering” by Velilla (2018) Journal: The American Statistician Pages: 114-117 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1392896 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392896 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:114-117 Template-Type: ReDIF-Article 1.0 Author-Name: Steven Wu Author-X-Name-First: Steven Author-X-Name-Last: Wu Author-Name: Luke Bornn Author-X-Name-First: Luke Author-X-Name-Last: Bornn Title: Modeling Offensive Player Movement in Professional Basketball Abstract: The 2013 arrival of SportVU player tracking data in all NBA arenas introduced an overwhelming amount of on-court information—information which the league is still learning how to maximize for insights into player performance and basketball strategy. The data contain the spatial coordinates for the ball and every player on the court for 25 frames per second, which opens up avenues of player and team performance analysis that were not possible before this technology existed. This article serves as a step-by-step guide for turning a SportVU data feed from one NBA game into visualizable components that can model any player's movement on offense. We detail some utility functions that are helpful for manipulating SportVU data before applying them to the task of visualizing player offensive movement. We conclude with visualizations of the resulting output for one NBA game, as well as what the results look like aggregated across an entire season for three NBA stars with very different offensive tendencies. Journal: The American Statistician Pages: 72-79 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1395365 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395365 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:72-79 Template-Type: ReDIF-Article 1.0 Author-Name: Mine Çetinkaya-Rundel Author-X-Name-First: Mine Author-X-Name-Last: Çetinkaya-Rundel Author-Name: Colin Rundel Author-X-Name-First: Colin Author-X-Name-Last: Rundel Title: Infrastructure and Tools for Teaching Computing Throughout the Statistical Curriculum Abstract: Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of big data and data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline.
Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. Much has been written in the statistics education literature about pedagogical tools and approaches to provide a practical computational foundation for students. This article discusses the computational infrastructure and toolkit choices to allow for these pedagogical innovations while minimizing frustration and improving adoption for both our students and instructors. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 58-65 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1397549 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1397549 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:58-65 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel Kaplan Author-X-Name-First: Daniel Author-X-Name-Last: Kaplan Title: Teaching Stats for Data Science Abstract: “Data science” is a useful catchword for methods and concepts original to the field of statistics, but typically applied to large, multivariate, observational records. Such datasets call for techniques not often part of an introduction to statistics: modeling, consideration of covariates, sophisticated visualization, and causal reasoning. This article re-imagines introductory statistics as an introduction to data science and proposes a sequence of 10 blocks that together compose a suitable course for extracting information from contemporary data. Recent extensions to the mosaic packages for R together with tools from the “tidyverse” provide a concise and readable notation for wrangling, visualization, model-building, and model interpretation: the fundamental computational tasks of data science. Journal: The American Statistician Pages: 89-96 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1398107 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1398107 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:89-96 Template-Type: ReDIF-Article 1.0 Author-Name: Santiago Velilla Author-X-Name-First: Santiago Author-X-Name-Last: Velilla Title: Reply Journal: The American Statistician Pages: 117-119 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1398985 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1398985 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:117-119 Template-Type: ReDIF-Article 1.0 Author-Name: Jennifer Bryan Author-X-Name-First: Jennifer Author-X-Name-Last: Bryan Title: Excuse Me, Do You Have a Moment to Talk About Version Control? Abstract: Data analysis, statistical research, and teaching statistics have at least one thing in common: these activities all produce many files! There are data files, source code, figures, tables, prepared reports, and much more. Most of these files evolve over the course of a project and often need to be shared with others, for reading or edits, as a project unfolds. Without explicit and structured management, project organization can easily descend into chaos, taking time away from the primary work and reducing the quality of the final product.
This unhappy result can be avoided by repurposing tools and workflows from the software development world, namely, distributed version control. This article describes the use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows. Special attention is given to projects that use the statistical language R and, optionally, R Markdown documents. Supplementary materials include an annotated set of links to step-by-step tutorials, real world examples, and other useful learning resources. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 20-27 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2017.1399928 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1399928 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:20-27 Template-Type: ReDIF-Article 1.0 Author-Name: Hadley Wickham Author-X-Name-First: Hadley Author-X-Name-Last: Wickham Author-Name: Jennifer Bryan Author-X-Name-First: Jennifer Author-X-Name-Last: Bryan Author-Name: Nicole Lazar Author-X-Name-First: Nicole Author-X-Name-Last: Lazar Title: Introduction: Special Issue on Data Science Journal: The American Statistician Pages: 1-1 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2018.1438699 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1438699 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:1-1 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 105-113 Issue: 1 Volume: 72 Year: 2018 Month: 1 X-DOI: 10.1080/00031305.2018.1444855 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1444855 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:105-113 Template-Type: ReDIF-Article 1.0 Author-Name: Alejandro Quintela-del-Río Author-X-Name-First: Alejandro Author-X-Name-Last: Quintela-del-Río Author-Name: Mario Francisco-Fernández Author-X-Name-First: Mario Author-X-Name-Last: Francisco-Fernández Title: Excel Templates: A Helpful Tool for Teaching Statistics Abstract: This article describes a free, open-source collection of templates for the popular Excel (2013 and later versions) spreadsheet program. These templates are spreadsheet files that allow easy and intuitive learning and the implementation of practical examples concerning descriptive statistics, random variables, confidence intervals, and hypothesis testing. Although they are designed to be used with Excel, they can also be employed with other free spreadsheet programs (after changing some particular formulas). Moreover, we exploit some possibilities of the ActiveX controls of the Excel Developer Menu to create interactive Gaussian density charts. Finally, it is important to note that they can often be embedded in a web page, so it is not necessary to employ Excel software for their use. These templates have been designed as a useful tool to teach basic statistics and to carry out data analysis even when the students are not familiar with Excel. Additionally, they can be used as a complement to other analytical software packages. They aim to assist students in learning statistics within an intuitive working environment. Supplementary materials with the Excel templates are available online.
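For the Kaplan entry above ("Teaching Stats for Data Science"), which points to mosaic and tidyverse notation for wrangling, visualization, and modeling, the following generic sketch is our own (using the built-in mtcars data, not an example from the article); it shows the kind of concise pipeline such a course builds on.

# Generic wrangle -> visualize -> model pipeline (illustrative only).
library(dplyr)
library(ggplot2)
mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))            # wrangle: grouped summary
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")                 # visualize with a fitted line
coef(lm(mpg ~ wt + cyl, data = mtcars))      # model: linear regression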
Journal: The American Statistician Pages: 317-325 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1186115 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1186115 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:317-325 Template-Type: ReDIF-Article 1.0 Author-Name: Philip M. Westgate Author-X-Name-First: Philip M. Author-X-Name-Last: Westgate Author-Name: Woodrow W. Burchett Author-X-Name-First: Woodrow W. Author-X-Name-Last: Burchett Title: A Comparison of Correlation Structure Selection Penalties for Generalized Estimating Equations Abstract: Correlated data are commonly analyzed using models constructed with population-averaged generalized estimating equations (GEEs). The specification of a population-averaged GEE model includes selection of a structure describing the correlation of repeated measures. Accurate specification of this structure can improve efficiency, whereas the finite-sample estimation of nuisance correlation parameters can inflate the variances of regression parameter estimates. Therefore, correlation structure selection criteria should penalize, or account for, correlation parameter estimation. In this article, we compare recently proposed penalties in terms of their impacts on correlation structure selection and regression parameter estimation, and give practical considerations for data analysts. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 344-353 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1200490 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200490 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:344-353 Template-Type: ReDIF-Article 1.0 Author-Name: Frank Tuyl Author-X-Name-First: Frank Author-X-Name-Last: Tuyl Title: A Note on Priors for the Multinomial Model Abstract: An “overall objective” prior proposed for the multinomial model is shown to be inadequate in the presence of zero counts. An earlier proposed reference prior for when interest is in a particular category suffers from similar problems. It is argued that there is no need to deviate from the uniform prior proposed by Jeffreys, for which links with a non-Bayesian approach are shown when prediction is of interest. Journal: The American Statistician Pages: 298-301 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1222309 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1222309 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:298-301 Template-Type: ReDIF-Article 1.0 Author-Name: Jesse Frey Author-X-Name-First: Jesse Author-X-Name-Last: Frey Author-Name: Yimin Zhang Author-X-Name-First: Yimin Author-X-Name-Last: Zhang Title: What Do Interpolated Nonparametric Confidence Intervals for Population Quantiles Guarantee? Abstract: The interval between two prespecified order statistics of a sample provides a distribution-free confidence interval for a population quantile. However, due to discreteness, only a small set of exact coverage probabilities is available. Interpolated confidence intervals are designed to expand the set of available coverage probabilities.
However, we show here that the infimum of the coverage probability for an interpolated confidence interval is either the coverage probability for the inner interval or the coverage probability obtained by removing the more likely of the two extreme subintervals from the outer interval. Thus, without additional assumptions, interpolated intervals do not expand the set of available guaranteed coverage probabilities. Journal: The American Statistician Pages: 305-309 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1226952 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1226952 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:305-309 Template-Type: ReDIF-Article 1.0 Author-Name: Peter K. Dunn Author-X-Name-First: Peter K. Author-X-Name-Last: Dunn Author-Name: Michael D. Carey Author-X-Name-First: Michael D. Author-X-Name-Last: Carey Author-Name: Michael B. Farrar Author-X-Name-First: Michael B. Author-X-Name-Last: Farrar Author-Name: Alice M. Richardson Author-X-Name-First: Alice M. Author-X-Name-Last: Richardson Author-Name: Christine McDonald Author-X-Name-First: Christine Author-X-Name-Last: McDonald Title: Introductory Statistics Textbooks and the GAISE Recommendations Abstract: The six recommendations made by the Guidelines for Assessment and Instruction in Statistics Education (GAISE) committee were first communicated in 2005 and more formally in 2010. In this article, 25 introductory statistics textbooks are examined to assess how well these textbooks have incorporated the three GAISE recommendations most relevant to implementation in textbooks (statistical literacy and thinking; use of real data; stress concepts over procedures). The implementation of another recommendation (using technology) is described but not assessed. In general, most textbooks appear to be adopting the GAISE recommendations reasonably well in both exposition and exercises. The textbooks are particularly adept at using real data, using real data well, and promoting statistical literacy. Textbooks are less adept—but still rated reasonably well, in general—at explaining concepts over procedures and promoting statistical thinking. In contrast, few textbooks have easy-to-use glossaries of statistical terms to assist with understanding of statistical language and literacy development. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 326-335 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1251972 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1251972 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:326-335 Template-Type: ReDIF-Article 1.0 Author-Name: Thomas J. DiCiccio Author-X-Name-First: Thomas J. Author-X-Name-Last: DiCiccio Author-Name: Todd A. Kuffner Author-X-Name-First: Todd A. Author-X-Name-Last: Kuffner Author-Name: G. Alastair Young Author-X-Name-First: G. Alastair Author-X-Name-Last: Young Title: A Simple Analysis of the Exact Probability Matching Prior in the Location-Scale Model Abstract: It has long been asserted that in univariate location-scale models, when concerned with inference for either the location or scale parameter, the use of the inverse of the scale parameter as a Bayesian prior yields posterior credible sets that have exactly the correct frequentist confidence set interpretation.
This claim dates to at least Peers, and has subsequently been noted by various authors, with varying degrees of justification. We present a simple, direct demonstration of the exact matching property of the posterior credible sets derived under use of this prior in the univariate location-scale model. This is done by establishing an equivalence between the conditional frequentist and posterior densities of the pivotal quantities on which conditional frequentist inferences are based. Journal: The American Statistician Pages: 302-304 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1255662 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255662 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:302-304 Template-Type: ReDIF-Article 1.0 Author-Name: Joseph B. Lang Author-X-Name-First: Joseph B. Author-X-Name-Last: Lang Title: Mean-Minimum Exact Confidence Intervals Abstract: This article introduces mean-minimum (MM) exact confidence intervals for a binomial probability. These intervals guarantee that both the mean and the minimum frequentist coverage never drop below specified values. For example, an MM 95[93]% interval has mean coverage at least 95% and minimum coverage at least 93%. In the conventional sense, such an interval can be viewed as an exact 93% interval that has mean coverage at least 95%, or it can be viewed as an approximate 95% interval that has minimum coverage at least 93%. Graphical and numerical summaries of coverage and expected length suggest that the Blaker-based MM exact interval is an attractive alternative to, even an improvement over, commonly recommended approximate and exact intervals, including the Agresti–Coull approximate interval, the Clopper–Pearson (CP) exact interval, and the more recently recommended CP-, Blaker-, and Sterne-based mean-coverage-adjusted approximate intervals. Journal: The American Statistician Pages: 354-368 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1256838 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1256838 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:354-368 Template-Type: ReDIF-Article 1.0 Author-Name: Dabao Zhang Author-X-Name-First: Dabao Author-X-Name-Last: Zhang Title: A Coefficient of Determination for Generalized Linear Models Abstract: The coefficient of determination, a.k.a. R2, is well-defined in linear regression models, and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it to generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the remaining variation of the dependent variable after modeling the predictive effects of the independent variables. Unlike other definitions that demand complete specification of the likelihood function, our definition of R2 needs only the mean and variance functions, so it is applicable to more general quasi-models. It is consistent with the classical measure of uncertainty using variance, and reduces to the classical definition of the coefficient of determination when linear regression models are considered.
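For context on the Lang entry above ("Mean-Minimum Exact Confidence Intervals"), base R already provides the Clopper–Pearson exact interval against which the MM construction is benchmarked. This quick sketch is ours, not from the article, and only shows the standard exact interval; the MM intervals additionally control mean coverage.

# Clopper-Pearson exact interval from base R (illustration only).
x <- 7; n <- 20
binom.test(x, n, conf.level = 0.95)$conf.int
# By construction this interval has minimum coverage >= 0.95; an MM
# 95[93]% interval instead guarantees mean coverage >= 0.95 together
# with minimum coverage >= 0.93.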
Journal: The American Statistician Pages: 310-316 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1256839 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1256839 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:310-316 Template-Type: ReDIF-Article 1.0 Author-Name: Ning Hao Author-X-Name-First: Ning Author-X-Name-Last: Hao Author-Name: Hao Helen Zhang Author-X-Name-First: Hao Helen Author-X-Name-Last: Zhang Title: A Note on High-Dimensional Linear Regression With Interactions Abstract: The problem of interaction selection in high-dimensional data analysis has recently received much attention. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss how to give a formal definition of “importance” for main and interaction effects. Then we focus on two-stage methods, which are computationally attractive for high-dimensional data analysis but thus far have been regarded as heuristic. We revisit the counterexample of Turlach and provide new insight to justify two-stage methods from the theoretical perspective. In the end, we suggest new strategies for interaction selection under the marginality principle and provide some simulation results. Journal: The American Statistician Pages: 291-297 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1264311 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264311 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:291-297 Template-Type: ReDIF-Article 1.0 Author-Name: Luís Gustavo Esteves Author-X-Name-First: Luís Gustavo Author-X-Name-Last: Esteves Author-Name: Rafael Izbicki Author-X-Name-First: Rafael Author-X-Name-Last: Izbicki Author-Name: Rafael Bassi Stern Author-X-Name-First: Rafael Bassi Author-X-Name-Last: Stern Title: Teaching Decision Theory Proof Strategies Using a Crowdsourcing Problem Abstract: Teaching how to derive minimax decision rules can be challenging because of the lack of examples that are simple enough to be used in the classroom. Motivated by this challenge, we provide a new example that illustrates the use of standard techniques in the derivation of optimal decision rules under the Bayes and minimax approaches. We discuss how to predict the value of an unknown quantity, θ ∈ {0, 1}, given the opinions of n experts. An important example of such a crowdsourcing problem occurs in modern cosmology, where θ indicates whether a given galaxy is merging or not, and Y_1, …, Y_n are the opinions from n astronomers regarding θ. We use the obtained prediction rules to discuss advantages and disadvantages of the Bayes and minimax approaches to decision theory. The material presented here is intended to be taught to first-year graduate students. Journal: The American Statistician Pages: 336-343 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2016.1264316 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264316 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
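To make the crowdsourcing decision problem in the Esteves, Izbicki, and Stern entry above concrete, here is a small worked sketch under added assumptions of ours (a uniform prior on θ and conditionally iid experts, each correct with known probability p > 1/2), which the article does not necessarily impose; under 0-1 loss the Bayes rule then reduces to majority vote.

# Toy Bayes rule for predicting theta in {0,1} from n expert opinions
# (illustrative; assumes a uniform prior and iid experts with known
# accuracy p).
posterior_theta1 <- function(y, p = 0.7) {
  s <- sum(y); n <- length(y)           # s experts vote "1"
  like1 <- p^s * (1 - p)^(n - s)        # P(data | theta = 1)
  like0 <- p^(n - s) * (1 - p)^s        # P(data | theta = 0)
  like1 / (like1 + like0)               # posterior P(theta = 1 | data)
}
y <- c(1, 1, 0, 1, 0)
posterior_theta1(y)   # 0.7 > 0.5, so the Bayes action is 1
# With p > 1/2 this exceeds 1/2 exactly when sum(y) > n/2: majority vote.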
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:336-343 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on “The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments” by Bar-Gera (2017) Journal: The American Statistician Pages: 373-375 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2017.1358215 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1358215 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:373-375 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Collaborators Journal: The American Statistician Pages: 376-377 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2017.1395629 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395629 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:376-377 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Reviews of Books and Teaching Materials Journal: The American Statistician Pages: 369-372 Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2017.1395630 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395630 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:369-372 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Board EOV Journal: The American Statistician Pages: ebi-ebi Issue: 4 Volume: 71 Year: 2017 Month: 10 X-DOI: 10.1080/00031305.2017.1400355 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1400355 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:ebi-ebi Template-Type: ReDIF-Article 1.0 Author-Name: Sean Kross Author-X-Name-First: Sean Author-X-Name-Last: Kross Author-Name: Roger D. Peng Author-X-Name-First: Roger D. Author-X-Name-Last: Peng Author-Name: Brian S. Caffo Author-X-Name-First: Brian S. Author-X-Name-Last: Caffo Author-Name: Ira Gooding Author-X-Name-First: Ira Author-X-Name-Last: Gooding Author-Name: Jeffrey T. Leek Author-X-Name-First: Jeffrey T. Author-X-Name-Last: Leek Title: The Democratization of Data Science Education Abstract: Over the last three decades, data have become ubiquitous and cheap. This transition has accelerated over the last five years and training in statistics, machine learning, and data analysis has struggled to keep up. In April 2014, we launched a program of nine courses, the Johns Hopkins Data Science Specialization, which has now had more than 4 million enrollments over the past five years. Here, the program is described and compared to standard data science curricula as they were organized in 2014 and 2015. We show that novel pedagogical and administrative decisions introduced in our program are now standard in online data science programs. The impact of the Data Science Specialization on data science education in the U.S. is also discussed. Finally, we conclude with some thoughts about the future of data science education in a data democratized world. Journal: The American Statistician Pages: 1-7 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2019.1668849 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1668849 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:1-7 Template-Type: ReDIF-Article 1.0 Author-Name: Fulya Gokalp Yavuz Author-X-Name-First: Fulya Gokalp Author-X-Name-Last: Yavuz Author-Name: Mark Daniel Ward Author-X-Name-First: Mark Daniel Author-X-Name-Last: Ward Title: Fostering Undergraduate Data Science Abstract: Data Science is one of the newest interdisciplinary areas. It is transforming our lives unexpectedly fast. This transformation is also happening in our learning styles and practicing habits. We advocate an approach to data science training that uses several types of computational tools, including R, bash, awk, regular expressions, SQL, and XPath, often used in tandem. We discuss ways for undergraduate mentees to learn about data science topics at an early point in their training. We give some intuition for researchers, professors, and practitioners about how to effectively embed real-life examples into data science learning environments. As a result, we have a unified program built on a foundation of team-oriented, data-driven projects. Journal: The American Statistician Pages: 8-16 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2017.1407360 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407360 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:8-16 Template-Type: ReDIF-Article 1.0 Author-Name: Debashis Chatterjee Author-X-Name-First: Debashis Author-X-Name-Last: Chatterjee Author-Name: Trisha Maitra Author-X-Name-First: Trisha Author-X-Name-Last: Maitra Author-Name: Sourabh Bhattacharya Author-X-Name-First: Sourabh Author-X-Name-Last: Bhattacharya Title: A Short Note on Almost Sure Convergence of Bayes Factors in the General Set-Up Abstract: Although there is a significant literature on the asymptotic theory of the Bayes factor, the set-ups considered are usually specialized and often involve independent and identically distributed data. Even in such specialized cases, mostly weak consistency results are available. In this article, for the first time ever, we derive the almost sure convergence theory of the Bayes factor in the general set-up that includes even dependent data and misspecified models. Somewhat surprisingly, the key to the proof of such a general theory is a simple application of a result of Shalizi to a well-known identity satisfied by the Bayes factor. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 17-20 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2017.1397548 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1397548 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:17-20 Template-Type: ReDIF-Article 1.0 Author-Name: Niels G. Waller Author-X-Name-First: Niels G. Author-X-Name-Last: Waller Title: Generating Correlation Matrices With Specified Eigenvalues Using the Method of Alternating Projections Abstract: This article describes a new algorithm for generating correlation matrices with specified eigenvalues. The algorithm uses the method of alternating projections (MAP) that was first described by von Neumann. The MAP algorithm for generating correlation matrices is both easy to understand and to program in higher-level computer languages, making this method accessible to applied researchers with no formal training in advanced mathematics.
Simulations indicate that the new algorithm has excellent convergence properties. Correlation matrices with specified eigenvalues can be profitably used in Monte Carlo research in statistics, psychometrics, computer science, and related disciplines. To encourage such use, R code (R Core Team) for implementing the algorithm is provided in the supplementary material. Journal: The American Statistician Pages: 21-28 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2017.1401960 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1401960 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:21-28 Template-Type: ReDIF-Article 1.0 Author-Name: Xinjie Hu Author-X-Name-First: Xinjie Author-X-Name-Last: Hu Author-Name: Aekyung Jung Author-X-Name-First: Aekyung Author-X-Name-Last: Jung Author-Name: Gengsheng Qin Author-X-Name-First: Gengsheng Author-X-Name-Last: Qin Title: Interval Estimation for the Correlation Coefficient Abstract: The correlation coefficient (CC) is a standard measure of a possible linear association between two continuous random variables. The CC plays a significant role in many scientific disciplines. For a bivariate normal distribution, there are many types of confidence intervals for the CC, such as z-transformation and maximum likelihood-based intervals. However, when the underlying bivariate distribution is unknown, the construction of confidence intervals for the CC is not well-developed. In this paper, we discuss various interval estimation methods for the CC. We propose a generalized confidence interval for the CC when the underlying bivariate distribution is a normal distribution, and two empirical likelihood-based intervals for the CC when the underlying bivariate distribution is unknown. We also conduct extensive simulation studies to compare the new intervals with existing intervals in terms of coverage probability and interval length. Finally, two real examples are used to demonstrate the application of the proposed methods. Journal: The American Statistician Pages: 29-36 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2018.1437077 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437077 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:29-36 Template-Type: ReDIF-Article 1.0 Author-Name: Johan René van Dorp Author-X-Name-First: Johan René Author-X-Name-Last: van Dorp Author-Name: M. C. Jones Author-X-Name-First: M. C. Author-X-Name-Last: Jones Title: The Johnson System of Frequency Curves—Historical, Graphical, and Limiting Perspectives Abstract: The idea of transforming one random variate to another with a more convenient density was developed in the first half of the 20th century. In his thesis, Norman L. Johnson (1917–2004) developed a pioneering system of transformations of the standard normal distribution which gained substantial popularity in the second half of the 20th century and beyond. In Johnson’s 1949 Biometrika paper entitled Systems of frequency curves generated by methods of translation, summarizing that thesis, one of his primary interests was the behavior of the shape of the probability density functions as their parameter values change. Herein, we attempt to further elucidate this behavior through a series of geometric expositions of that transformation process.
In these expositions, insight is obtained into the behavior of Johnson’s density functions, and their skewness and kurtosis, as they converge to their limiting distributions, a topic that has received little attention. Journal: The American Statistician Pages: 37-52 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2019.1637778 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1637778 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:37-52 Template-Type: ReDIF-Article 1.0 Author-Name: Chien-Lang Su Author-X-Name-First: Chien-Lang Author-X-Name-Last: Su Author-Name: Sun-Hao Chang Author-X-Name-First: Sun-Hao Author-X-Name-Last: Chang Author-Name: Ruby Chiu-Hsing Weng Author-X-Name-First: Ruby Chiu-Hsing Author-X-Name-Last: Weng Title: A Note on Item Response Theory Modeling for Online Customer Ratings Abstract: Online consumer product ratings data are increasing rapidly. While most of the current graphical displays mainly represent the average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly in the modeling of scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a great impact on the parameter estimation. This impact on estimation is explained through rater behavior. We also discuss correlation among raters and assess the prediction accuracy for both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negative correlations in ratings. Journal: The American Statistician Pages: 53-63 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2017.1422804 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1422804 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:53-63 Template-Type: ReDIF-Article 1.0 Author-Name: Tamal Ghosh Author-X-Name-First: Tamal Author-X-Name-Last: Ghosh Author-Name: Malay Ghosh Author-X-Name-First: Malay Author-X-Name-Last: Ghosh Author-Name: Tatsuya Kubokawa Author-X-Name-First: Tatsuya Author-X-Name-Last: Kubokawa Title: On the Loss Robustness of Least-Square Estimators Abstract: The article revisits univariate and multivariate linear regression models. It is shown that least-square estimators (LSEs) are minimum risk estimators in a general class of linear unbiased estimators under a general divergence loss. This amounts to the loss robustness of LSEs. Journal: The American Statistician Pages: 64-67 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2018.1529626 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529626 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
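The Waller entry above describes generating correlation matrices with specified eigenvalues by the method of alternating projections. The following is a minimal sketch in that spirit, our own simplified implementation rather than the article's supplementary code; it alternates between the unit-diagonal constraint and the target spectrum.

# Simplified alternating-projections sketch (illustrative only).
# Produces a correlation matrix whose eigenvalues are very close to
# `lambda`; for a p x p correlation matrix, lambda must sum to p.
map_corr <- function(lambda, tol = 1e-9, maxit = 5000) {
  p <- length(lambda)
  stopifnot(abs(sum(lambda) - p) < 1e-8, all(lambda >= 0))
  lambda <- sort(lambda, decreasing = TRUE)
  Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))      # random orthogonal start
  R <- Q %*% diag(lambda) %*% t(Q)
  for (i in seq_len(maxit)) {
    diag(R) <- 1                                  # project: unit diagonal
    e <- eigen(R, symmetric = TRUE)
    R <- e$vectors %*% diag(lambda) %*% t(e$vectors)  # project: spectrum
    if (max(abs(diag(R) - 1)) < tol) break
  }
  diag(R) <- 1  # final rounding; perturbs the eigenvalues only negligibly
  R
}
set.seed(1)
R <- map_corr(c(2.5, 1.0, 0.3, 0.2))
round(eigen(R, symmetric = TRUE)$values, 6)       # matches the target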
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:64-67 Template-Type: ReDIF-Article 1.0 Author-Name: Román Salmerón Gómez Author-X-Name-First: Román Author-X-Name-Last: Salmerón Gómez Author-Name: Catalina García García Author-X-Name-First: Catalina Author-X-Name-Last: García García Author-Name: Jose García Pérez Author-X-Name-First: Jose Author-X-Name-Last: García Pérez Title: Comment on “A Note on Collinearity Diagnostics and Centering” by Velilla (2018) Journal: The American Statistician Pages: 68-71 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2019.1635527 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1635527 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:68-71 Template-Type: ReDIF-Article 1.0 Author-Name: Victor De Oliveira Author-X-Name-First: Victor Author-X-Name-Last: De Oliveira Title: Models for Geostatistical Binary Data: Properties and Connections Abstract: This article explores models for geostatistical data for situations in which the region where the phenomenon of interest varies is partitioned into two disjoint subregions. This is called a binary map. The goals of the article are threefold. First, a review is provided of the classes of models that have been proposed so far in the literature for geostatistical binary data as well as a description of their main features. A problem with the use of moment-based models is pointed out. Second, a generalization is provided of the clipped Gaussian random field that eases regression function modeling, interpretation of the regression parameters, and the establishment of connections with other models. The second-order properties of this model are studied in some detail. Finally, connections between the aforementioned classes of models are established, showing that some of these are reformulations (reparameterizations) of the other classes of models. Journal: The American Statistician Pages: 72-79 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2018.1444674 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1444674 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:72-79 Template-Type: ReDIF-Article 1.0 Author-Name: Peter H. Peskun Author-X-Name-First: Peter H. Author-X-Name-Last: Peskun Title: Two-Tailed p-Values and Coherent Measures of Evidence Abstract: In a test of significance, it is common practice to report the p-value as one way of summarizing the incompatibility between a set of data and a proposed model for the data constructed under a set of assumptions together with a null hypothesis. However, the p-value does have some flaws: one is its general definition for two-sided tests, and a related, serious logical one is its incoherence when interpreted as a statistical measure of evidence for its respective null hypothesis. We shall address these two issues in this article. Journal: The American Statistician Pages: 80-86 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2018.1475304 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1475304 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:80-86 Template-Type: ReDIF-Article 1.0 Author-Name: Robert B. Gramacy Author-X-Name-First: Robert B.
Author-X-Name-Last: Gramacy Title: A Shiny Update to an Old Experiment Game Abstract: Games can be a powerful tool for learning about statistical methodology. Effective game design involves a fine balance between caricature and realism, to simultaneously illustrate salient concepts in a controlled setting and serve as a testament to real-world applicability. Striking that balance is particularly challenging in response surface and design domains, where real-world scenarios often play out over long time scales, during which theories are revised, model and inferential techniques are improved, and knowledge is updated. Here, I present a game, borrowing liberally from one first played over 40 years ago, which attempts to achieve that balance while reinforcing a cascade of topics in modern nonparametric response surfaces, sequential design, and optimization. The game embeds a blackbox simulation within a shiny app whose interface is designed to simulate a realistic information-availability setting, while offering a stimulating, competitive environment wherein students can try out new methodology, and ultimately appreciate its power and limitations. Interface, rules, timing with course material, and evaluation are described, along with a “case study” involving a cohort of students at Virginia Tech. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 87-92 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2018.1505659 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505659 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:87-92 Template-Type: ReDIF-Article 1.0 Author-Name: Barry C. Arnold Author-X-Name-First: Barry C. Author-X-Name-Last: Arnold Title: Further Examples Related to the Identical Distribution of X/(X+Y) and Y/(X+Y) Abstract: The study of conditions under which a two-dimensional random variable (X, Y) will have the property that X/(X + Y) and Y/(X + Y) are identically distributed was initiated by Bhattacharjee and Dhar. Some additional, perhaps unexpected, examples related to this phenomenon are provided. Discrete and absolutely continuous cases are discussed in detail. Singular continuous cases are briefly mentioned. Journal: The American Statistician Pages: 93-97 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2019.1575772 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1575772 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:93-97 Template-Type: ReDIF-Article 1.0 Author-Name: Micha Mandel Author-X-Name-First: Micha Author-X-Name-Last: Mandel Title: The Scaled Uniform Model Revisited Abstract: Sufficiency, conditionality, and invariance are basic principles of statistical inference. Current mathematical statistics courses do not devote much teaching time to these classical principles, and even ignore the latter two, in order to teach modern methods. However, being the philosophical cornerstones of statistical inference, a minimal understanding of these principles should be part of any curriculum in statistics. The scaled uniform model is used here to demonstrate the importance and usefulness of the conditionality principle, which is probably the most basic and least familiar among the three.
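The Arnold entry above concerns when X/(X + Y) and Y/(X + Y) share a distribution. A quick simulation for the simplest sufficient condition, exchangeability, makes the phenomenon visible; the choice of iid Gamma components below is our illustration, in which case both ratios are Beta(a, a).

# Simulation check (illustrative): for iid X, Y ~ Gamma(shape = a),
# X/(X+Y) ~ Beta(a, a) is symmetric about 1/2, so Y/(X+Y) = 1 - X/(X+Y)
# has the same distribution.
set.seed(2)
a <- 3; n <- 1e5
x <- rgamma(n, shape = a); y <- rgamma(n, shape = a)
r1 <- x / (x + y)
qs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rbind(empirical = quantile(r1, qs), theory = qbeta(qs, a, a))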
Journal: The American Statistician Pages: 98-100 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2019.1604431 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604431 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:98-100 Template-Type: ReDIF-Article 1.0 Author-Name: Jean-Louis Foulley Author-X-Name-First: Jean-Louis Author-X-Name-Last: Foulley Title: Benjamin, D. J., and Berger, J. O. (2019), “Three Recommendations for Improving the Use of p-Values”, The American Statistician, 73, 186–191: Comment by Foulley Journal: The American Statistician Pages: 101-102 Issue: 1 Volume: 74 Year: 2020 Month: 1 X-DOI: 10.1080/00031305.2019.1668850 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1668850 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:101-102 Template-Type: ReDIF-Article 1.0 Author-Name: Todd A. Kuffner Author-X-Name-First: Todd A. Author-X-Name-Last: Kuffner Author-Name: Stephen G. Walker Author-X-Name-First: Stephen G. Author-X-Name-Last: Walker Title: Why are p-Values Controversial? Abstract: While it is often argued that a p-value is a probability (see Wasserstein and Lazar), we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test, which maps to the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis. Journal: The American Statistician Pages: 1-3 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2016.1277161 File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277161 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:1-3 Template-Type: ReDIF-Article 1.0 Author-Name: Gyuhyeong Goh Author-X-Name-First: Gyuhyeong Author-X-Name-Last: Goh Author-Name: Dipak K. Dey Author-X-Name-First: Dipak K. Author-X-Name-Last: Dey Title: Asymptotic Properties of Marginal Least-Square Estimator for Ultrahigh-Dimensional Linear Regression Models with Correlated Errors Abstract: In this article, we discuss asymptotic properties of the marginal least-square estimator for ultrahigh-dimensional linear regression models. We are specifically interested in probabilistic consistency of the marginal least-square estimator in the presence of correlated errors. We show that under a partial orthogonality condition, the marginal least-square estimator can achieve variable selection consistency. In addition, we demonstrate that if mutual orthogonality holds, the marginal least-square estimator satisfies estimation consistency. These theoretical results are illustrated through extensive simulation studies.
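To make the marginal least-squares idea above concrete, a minimal R sketch of marginal screening; the data-generating step is invented for illustration and is not from the article:
    # Marginal least squares: regress y on each predictor separately and
    # rank predictors by the size of their marginal coefficients
    set.seed(2)
    n <- 100; p <- 1000
    X <- scale(matrix(rnorm(n * p), n, p))        # standardized predictors
    beta <- c(2, -1.5, 1, rep(0, p - 3))
    y <- drop(X %*% beta) + rnorm(n)
    bhat <- drop(crossprod(X, y)) / (n - 1)       # marginal LS slopes
    head(order(abs(bhat), decreasing = TRUE))     # top-ranked predictors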
Journal: The American Statistician Pages: 4-9 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1302359 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1302359 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:4-9 Template-Type: ReDIF-Article 1.0 Author-Name: Stephen Portnoy Author-X-Name-First: Stephen Author-X-Name-Last: Portnoy Title: Invariance, Optimality, and a 1-Observation Confidence Interval for a Normal Mean Abstract: In a 1965 Decision Theory course at Stanford University, Charles Stein began a digression with “an amusing problem”: is there a proper confidence interval for the mean based on a single observation from a normal distribution with both mean and variance unknown? Stein introduced the interval with endpoints ±c|X| and showed indeed that for c large enough, the minimum coverage probability (over all values for the mean and variance) could be made arbitrarily near one. While the problem and coverage calculation were in the author’s hand-written notes from the course, there was no development of any optimality result for the interval. Here, the Hunt–Stein construction plus analysis based on special features of the problem provides a “minimax” rule in the sense that it minimizes the maximum expected length among all procedures with fixed coverage (or, equivalently, maximizes the minimal coverage among all procedures with a fixed expected length). The minimax rule is a mixture of two confidence procedures that are equivariant under scale and sign changes, and are uniformly better than the classroom example or the natural interval X ± c|X|. Journal: The American Statistician Pages: 10-15 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1360796 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1360796 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:10-15 Template-Type: ReDIF-Article 1.0 Author-Name: M. C. Jones Author-X-Name-First: M. C. Author-X-Name-Last: Jones Author-Name: Éric Marchand Author-X-Name-First: Éric Author-X-Name-Last: Marchand Author-Name: William E. Strawderman Author-X-Name-First: William E. Author-X-Name-Last: Strawderman Title: On An Intriguing Distributional Identity Abstract: For a continuous random variable X with support equal to (a, b), with c.d.f. F, and g: Ω1 → Ω2 a continuous, strictly increasing function such that Ω1 ∩ Ω2 ⊇ (a, b), but otherwise arbitrary, we establish that the random variables F(X) − F(g(X)) and F(g^{-1}(X)) − F(X) have the same distribution. Further developments, accompanied by illustrations and observations, address as well the equidistribution identity U − ψ(U) =d ψ^{-1}(U) − U for U ∼ U(0, 1), where ψ is a continuous, strictly increasing and onto function, but otherwise arbitrary. Finally, we expand on applications with connections to variance reduction techniques, the discrepancy between distributions, and a risk identity in predictive density estimation. Journal: The American Statistician Pages: 16-21 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1375984 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375984 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:16-21 Template-Type: ReDIF-Article 1.0 Author-Name: Mithat Gönen Author-X-Name-First: Mithat Author-X-Name-Last: Gönen Author-Name: Wesley O.
Johnson Author-X-Name-First: Wesley O. Author-X-Name-Last: Johnson Author-Name: Yonggang Lu Author-X-Name-First: Yonggang Author-X-Name-Last: Lu Author-Name: Peter H. Westfall Author-X-Name-First: Peter H. Author-X-Name-Last: Westfall Title: Comparing Objective and Subjective Bayes Factors for the Two-Sample Comparison: The Classification Theorem in Action Abstract: Many Bayes factors have been proposed for comparing population means in two-sample (independent samples) studies. Recently, Wang and Liu presented an “objective” Bayes factor (BF) as an alternative to a “subjective” one presented by Gönen et al. Their report was evidently intended to show the superiority of their BF based on “undesirable behavior” of the latter. A wonderful aspect of Bayesian models is that they provide an opportunity to “lay all cards on the table.” What distinguishes the various BFs in the two-sample problem is the choice of priors (cards) for the model parameters. This article discusses desiderata of BFs that have been proposed, and proposes a new criterion to compare BFs, no matter whether subjectively or objectively determined. A BF may be preferred if it correctly classifies the data as coming from the correct model most often. The criterion is based on a famous result in classification theory to minimize the total probability of misclassification. This criterion is objective, easily verified by simulation, shows clearly the effects (positive or negative) of assuming particular priors, provides new insights into the appropriateness of BFs in general, and provides a new answer to the question, “Which BF is best?” Journal: The American Statistician Pages: 22-31 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1322142 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322142 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:22-31 Template-Type: ReDIF-Article 1.0 Author-Name: Diana C. Mutz Author-X-Name-First: Diana C. Author-X-Name-Last: Mutz Author-Name: Robin Pemantle Author-X-Name-First: Robin Author-X-Name-Last: Pemantle Author-Name: Philip Pham Author-X-Name-First: Philip Author-X-Name-Last: Pham Title: The Perils of Balance Testing in Experimental Design: Messy Analyses of Clean Data Abstract: Widespread concern over the credibility of published results has led to scrutiny of statistical practices. We address one aspect of this problem that stems from the use of balance tests in conjunction with experimental data. When random assignment is botched, due either to mistakes in implementation or to differential attrition, balance tests can be an important tool in determining whether to treat the data as observational versus experimental. Unfortunately, the use of balance tests has become commonplace in analyses of “clean” data, that is, data for which random assignment can be stipulated. Here, we show that balance tests can destroy the basis on which scientific conclusions are formed, and can lead to erroneous and even fraudulent conclusions. We conclude by advocating that scientists and journal editors resist the use of balance tests in all analyses of clean data. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 32-42 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1322143 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322143 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
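A small simulation in the spirit of the balance-testing critique above, showing how conditioning inference on a passed balance test distorts a test's operating characteristics even under clean random assignment; all settings are illustrative:
    # Null model with a prognostic covariate x and zero treatment effect
    set.seed(3)
    sim <- function(n = 50) {
      x <- rnorm(2 * n)                       # baseline covariate
      g <- factor(rep(0:1, each = n))         # clean random assignment
      y <- 0.5 * x + rnorm(2 * n)             # outcome; no treatment effect
      c(bal = t.test(x ~ g)$p.value, eff = t.test(y ~ g)$p.value)
    }
    res <- replicate(1e4, sim())
    mean(res["eff", ] < 0.05)                      # unconditional size: near 0.05
    mean(res["eff", res["bal", ] > 0.05] < 0.05)   # size given "balance passed": distorted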
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:32-42 Template-Type: ReDIF-Article 1.0 Author-Name: Youyi Fong Author-X-Name-First: Youyi Author-X-Name-Last: Fong Author-Name: Ying Huang Author-X-Name-First: Ying Author-X-Name-Last: Huang Title: Modified Wilcoxon–Mann–Whitney Test and Power Against Strong Null Abstract: The Wilcoxon–Mann–Whitney (WMW) test is a popular rank-based two-sample testing procedure for the strong null hypothesis that the two samples come from the same distribution. A modified WMW test, the Fligner–Policello (FP) test, has been proposed for comparing the medians of two populations. A fact that may be under-appreciated among some practitioners is that the FP test can also be used to test the strong null, like the WMW. In this article, we compare the power of the WMW and FP tests for testing the strong null. Our results show that neither test is uniformly better than the other and that there can be substantial differences in power between the two choices. We propose a new, modified WMW test that combines the WMW and FP tests. Monte Carlo studies show that the combined test has good power compared to both the WMW and FP tests. We provide a fast implementation of the proposed test in open-source software. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 43-49 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1328375 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1328375 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:43-49 Template-Type: ReDIF-Article 1.0 Author-Name: Xiaofei Wang Author-X-Name-First: Xiaofei Author-X-Name-Last: Wang Author-Name: Nicholas G. Reich Author-X-Name-First: Nicholas G. Author-X-Name-Last: Reich Author-Name: Nicholas J. Horton Author-X-Name-First: Nicholas J. Author-X-Name-Last: Horton Title: Enriching Students’ Conceptual Understanding of Confidence Intervals: An Interactive Trivia-Based Classroom Activity Abstract: Confidence intervals provide a way to determine plausible values for a population parameter. They are omnipresent in research articles involving statistical analyses. Appropriately, a key statistical literacy learning objective is the ability to interpret and understand confidence intervals in a wide range of settings. As instructors, we devote a considerable amount of time and effort to ensure that students master this topic in introductory courses and beyond. Yet, studies continue to find that confidence intervals are commonly misinterpreted and that even experts have trouble calibrating their individual confidence levels. In this article, we present a 10-min trivia game-based activity that addresses these misconceptions by exposing students to confidence intervals from a personal perspective. We describe how the activity can be integrated into a statistics course as a one-time activity or with repetition at intervals throughout a course, discuss results of using the activity in class, and present possible extensions. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 50-55 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1305294 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305294 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:50-55 Template-Type: ReDIF-Article 1.0 Author-Name: Joel E.
Cohen Author-X-Name-First: Joel E. Author-X-Name-Last: Cohen Title: Sum of a Random Number of Correlated Random Variables that Depend on the Number of Summands Abstract: The mean and variance of a sum of a random number of random variables are well known when the number of summands is independent of each summand and when the summands are independent and identically distributed (iid), or when all summands are identical. In scientific and financial applications, the preceding conditions are often too restrictive. Here, we calculate the mean and variance of a sum of a random number of random summands when the mean and variance of each summand depend on the number of summands and when every pair of summands has the same correlation. This article shows that the variance increases with the correlation between summands and equals the variance in the iid or identical cases when the correlation is zero or one. Journal: The American Statistician Pages: 56-60 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1311283 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1311283 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:56-60 Template-Type: ReDIF-Article 1.0 Author-Name: Mario A. Davidson Author-X-Name-First: Mario A. Author-X-Name-Last: Davidson Author-Name: Charlene M. Dewey Author-X-Name-First: Charlene M. Author-X-Name-Last: Dewey Author-Name: Amy E. Fleming Author-X-Name-First: Amy E. Author-X-Name-Last: Fleming Title: Teaching Communication in a Statistical Collaboration Course: A Feasible, Project-Based, Multimodal Curriculum Abstract: Many schools offer a statistical collaboration curriculum using standard instructional methods such as lectures whereby students are taught to successfully apply their training. The process of building statisticians' collaborative skills and characteristics can be challenging due to logistical issues, time constraints, unstructured research problems, and limited resources. Instructors vary in their pedagogy and topics taught, and students' experiences vary. There is a dearth of literature describing how to implement a course integrating communication skills, critical thinking, collaboration, and the integration of team members in a learner-centered format. Few courses integrate behavior-based learning using role-playing, video demonstration and feedback, case-based teaching activities, and presentation of basic statistical concepts. We have developed and implemented a two-semester biostatistics collaboration course, whose purpose is to develop the students' knowledge, skills, attitudes, and behaviors necessary to interact effectively with investigators. Our innovative curriculum uses a multimodal, project-based, experiential process to address real-world problems provided by real and/or simulated collaborators while minimizing the usual challenges. Rubrics and peer evaluation forms are offered as online supplementary materials. This article describes how a collaboration curriculum focusing on communication and team practice is feasible, how it enhances skill and professionalism, and how it can be implemented at other institutions. Journal: The American Statistician Pages: 61-69 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2018.1448890 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448890 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
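A Monte Carlo companion to Cohen's random-sum result above, checking the variance in the simpler case where the summand mean and variance do not depend on the number of summands N; the exchangeable-normal construction via a common factor is illustrative, not from the article:
    # T = sum of N exchangeable summands, each N(mu, sig^2), pairwise correlation rho
    set.seed(4)
    mu <- 1; sig <- 2; rho <- 0.5; lam <- 4
    Tsum <- replicate(1e5, {
      N <- rpois(1, lam)
      if (N == 0) 0 else
        sum(mu + sig * (sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(N)))
    })
    var(Tsum)                                  # simulated variance
    # theory: sig^2 * (E[N] + rho * E[N(N-1)]) + mu^2 * Var(N), N ~ Poisson(lam)
    sig^2 * (lam + rho * lam^2) + mu^2 * lam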
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:61-69 Template-Type: ReDIF-Article 1.0 Author-Name: Dongmeng Liu Author-X-Name-First: Dongmeng Author-X-Name-Last: Liu Author-Name: Jinko Graham Author-X-Name-First: Jinko Author-X-Name-Last: Graham Title: Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering Abstract: We propose two probability-like measures of individual cluster-membership certainty that can be applied to a hard partition of the sample such as that obtained from the partitioning around medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual’s tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition using these measures. We evaluate the performance of both measures in individuals with ambiguous cluster membership, using simulated binary datasets that have been partitioned by the PAM algorithm or continuous datasets that have been partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft-clustering algorithms such as fuzzy analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher’s classic dataset on irises. Journal: The American Statistician Pages: 70-79 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2018.1459315 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459315 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:70-79 Template-Type: ReDIF-Article 1.0 Author-Name: Jianjun Wang Author-X-Name-First: Jianjun Author-X-Name-Last: Wang Author-Name: Dallas E. Johnson Author-X-Name-First: Dallas E. Author-X-Name-Last: Johnson Title: An Examination of Discrepancies in Multiple Imputation Procedures Between SAS® and SPSS® Abstract: Multiple imputation (MI) has become a feasible method to replace missing data due to the rapid development of computer technology over the past three decades. Nonetheless, a unique issue with MI hinges on the fact that different software packages can give different results. Even when one begins with the same random number seed, conflicting findings can be obtained from the same data under an identical imputation model between SAS® and SPSS®. Consequently, as illustrated in this article, a predictor variable can be claimed both significant and not significant depending on the software being used. Based on considerations of the multiple imputation steps, including result pooling, default selection, and different numbers of imputations, practical suggestions are provided to minimize the discrepancies in the results obtained when using MI. Features of Stata® are briefly reviewed in the Discussion section to broaden the comparison of MI computing across widely used software packages.
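For reference alongside the multiple-imputation comparison above, the result-pooling step all of these packages implement in some form is Rubin's combining rules; a minimal R version of the standard formulas (not code from the article):
    # Pool m completed-data estimates and standard errors (Rubin's rules)
    pool_rubin <- function(est, se) {
      m <- length(est)
      qbar <- mean(est)                             # pooled point estimate
      W <- mean(se^2)                               # within-imputation variance
      B <- var(est)                                 # between-imputation variance
      Tvar <- W + (1 + 1/m) * B                     # total variance
      df <- (m - 1) * (1 + W / ((1 + 1/m) * B))^2   # Rubin's degrees of freedom
      c(estimate = qbar, se = sqrt(Tvar), df = df)
    }
    pool_rubin(est = c(1.02, 0.97, 1.10), se = c(0.21, 0.20, 0.22))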
Journal: The American Statistician Pages: 80-88 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2018.1437078 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437078 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:80-88 Template-Type: ReDIF-Article 1.0 Author-Name: Spyros Missiakoulis Author-X-Name-First: Spyros Author-X-Name-Last: Missiakoulis Title: Phlegon's Stem-and-Leaf Display Abstract: The Greek writer Phlegon (80–140 AD) from Tralles in Asia Minor wrote a book entitled On Long-lived Persons that contains a long list of people over a hundred years old. He collected data from the Roman censuses. With respect to the history of statistics, Phlegon's book is the earliest surviving text to use the Stem-and-Leaf display of collected data. Journal: The American Statistician Pages: 89-93 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2017.1328376 File-URL: http://hdl.handle.net/10.1080/00031305.2017.1328376 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:89-93 Template-Type: ReDIF-Article 1.0 Author-Name: Megan D. Higgs Author-X-Name-First: Megan D. Author-X-Name-Last: Higgs Author-Name: Xiaoke Zhang Author-X-Name-First: Xiaoke Author-X-Name-Last: Zhang Author-Name: Angelo Elmi Author-X-Name-First: Angelo Author-X-Name-Last: Elmi Author-Name: James M. Flegal Author-X-Name-First: James M. Author-X-Name-Last: Flegal Author-Name: Jessica Utts Author-X-Name-First: Jessica Author-X-Name-Last: Utts Author-Name: Sandra E. Safo Author-X-Name-First: Sandra E. Author-X-Name-Last: Safo Author-Name: Craig A. Rolling Author-X-Name-First: Craig A. Author-X-Name-Last: Rolling Author-Name: Michael J. Higgins Author-X-Name-First: Michael J. Author-X-Name-Last: Higgins Author-Name: Jingyi Jessica Li Author-X-Name-First: Jingyi Jessica Author-X-Name-Last: Li Title: blogdown: Creating Websites With R Markdown. Journal: The American Statistician Pages: 94-104 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2018.1538846 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1538846 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:94-104 Template-Type: ReDIF-Article 1.0 Author-Name: M.C. Jones Author-X-Name-First: M.C. Author-X-Name-Last: Jones Title: Letter to the Editor Journal: The American Statistician Pages: 105-105 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2018.1556736 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1556736 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:105-105 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Collaborators Journal: The American Statistician Pages: 106-108 Issue: 1 Volume: 73 Year: 2019 Month: 1 X-DOI: 10.1080/00031305.2018.1538832 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1538832 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:106-108 Template-Type: ReDIF-Article 1.0 Author-Name: Tim B.
Swartz Author-X-Name-First: Tim B. Author-X-Name-Last: Swartz Title: Where Should I Publish My Sports Paper? Abstract: With the increasing fascination with sport in society and the increasing availability of sport-related data, there are great opportunities to carry out sports analytics research. In this article, we discuss some of the issues that are relevant to publishing in the field of sports analytics. Potential publication outlets are identified, some summary statistics are given, and some experiences and opinions are provided. Journal: The American Statistician Pages: 103-108 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1459842 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459842 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:103-108 Template-Type: ReDIF-Article 1.0 Author-Name: Anne Lott Author-X-Name-First: Anne Author-X-Name-Last: Lott Author-Name: Jerome P. Reiter Author-X-Name-First: Jerome P. Author-X-Name-Last: Reiter Title: Wilson Confidence Intervals for Binomial Proportions With Multiple Imputation for Missing Data Abstract: We present a Wilson interval for binomial proportions for use with multiple imputation for missing data. Using simulation studies, we show that it can have better repeated sampling properties than the usual confidence interval for binomial proportions based on Rubin’s combining rules. Further, in contrast to the usual multiple imputation confidence interval for proportions, the multiple imputation Wilson interval is always bounded by zero and one. Supplementary material is available online. Journal: The American Statistician Pages: 109-115 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1473796 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1473796 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:109-115 Template-Type: ReDIF-Article 1.0 Author-Name: Ernest C. Davenport Author-X-Name-First: Ernest C. Author-X-Name-Last: Davenport Author-Name: Kyle Nickodem Author-X-Name-First: Kyle Author-X-Name-Last: Nickodem Author-Name: Mark L. Davison Author-X-Name-First: Mark L. Author-X-Name-Last: Davison Author-Name: Gareth Phillips Author-X-Name-First: Gareth Author-X-Name-Last: Phillips Author-Name: Edmund Graham Author-X-Name-First: Edmund Author-X-Name-Last: Graham Title: The Relative Performance Index: Neutralizing Simpson's Paradox Abstract: Comparing populations on one or more variables is often of interest. These comparisons are typically made using the mean; however, it is well known that mean comparisons can lead to misinterpretation because of Simpson's paradox. Simpson's paradox occurs when there is a differential distribution of subpopulations across the populations being compared and the means of those subpopulations are different. This article develops the relative performance index (RPI) to ameliorate effects of Simpson's paradox. Data from the National Assessment of Educational Progress (NAEP) are used to illustrate use of the new index. The utility of the RPI is compared to the population mean and a prior index, the balanced index. This article shows how the RPI can be generalized to a variety of contexts with implications for decision making.
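For reference alongside the Lott and Reiter article above, the complete-data Wilson score interval in R; this is the standard formula, with the multiple-imputation combination itself left to the article:
    # Wilson score interval for a binomial proportion, complete data
    wilson <- function(x, n, conf = 0.95) {
      z <- qnorm(1 - (1 - conf) / 2)
      p <- x / n
      center <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
      half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
      c(lower = center - half, upper = center + half)  # always within [0, 1]
    }
    wilson(x = 7, n = 20)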
Journal: The American Statistician Pages: 116-124 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1451777 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1451777 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:116-124 Template-Type: ReDIF-Article 1.0 Author-Name: Vahid Nassiri Author-X-Name-First: Vahid Author-X-Name-Last: Nassiri Author-Name: Geert Molenberghs Author-X-Name-First: Geert Author-X-Name-Last: Molenberghs Author-Name: Geert Verbeke Author-X-Name-First: Geert Author-X-Name-Last: Verbeke Author-Name: João Barbosa-Breda Author-X-Name-First: João Author-X-Name-Last: Barbosa-Breda Title: Iterative Multiple Imputation: A Framework to Determine the Number of Imputed Datasets Abstract: We consider multiple imputation as a procedure iterating over a set of imputed datasets. Based on an appropriate stopping rule, the number of imputed datasets is determined. Simulations and real-data analyses indicate that the sufficient number of imputed datasets may in some cases be substantially larger than the very small numbers that are usually recommended. For easier use in various applications, the proposed method is implemented in the R package imi. Journal: The American Statistician Pages: 125-136 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1543615 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543615 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:125-136 Template-Type: ReDIF-Article 1.0 Author-Name: Quentin F. Gronau Author-X-Name-First: Quentin F. Author-X-Name-Last: Gronau Author-Name: Alexander Ly Author-X-Name-First: Alexander Author-X-Name-Last: Ly Author-Name: Eric-Jan Wagenmakers Author-X-Name-First: Eric-Jan Author-X-Name-Last: Wagenmakers Title: Informed Bayesian t-Tests Abstract: Across the empirical sciences, few statistical procedures rival the popularity of the frequentist t-test. In contrast, the Bayesian versions of the t-test have languished in obscurity. In recent years, however, the theoretical and practical advantages of the Bayesian t-test have become increasingly apparent and various Bayesian t-tests have been proposed, both objective ones (based on general desiderata) and subjective ones (based on expert knowledge). Here, we propose a flexible t-prior for standardized effect size that allows computation of the Bayes factor by evaluating a single numerical integral. This specification contains previous objective and subjective t-test Bayes factors as special cases. Furthermore, we propose two measures for informed prior distributions that quantify the departure from the objective Bayes factor desiderata of predictive matching and information consistency. We illustrate the use of informed prior distributions based on an expert prior elicitation effort. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 137-143 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1562983 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1562983 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
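A minimal R sketch of the one-integral Bayes factor computation described above, with an illustrative location-scale t prior on the standardized effect size; the function name and prior settings here are assumptions for illustration, not the article's defaults:
    # BF10 for a two-sample t statistic: marginal likelihood under a t prior
    # on delta, divided by the likelihood under the point null delta = 0
    bf10 <- function(tobs, n1, n2, mu = 0, s = 0.707, k = 3) {
      neff <- sqrt(n1 * n2 / (n1 + n2)); df <- n1 + n2 - 2
      integrand <- function(d)
        dt(tobs, df, ncp = d * neff) * dt((d - mu) / s, k) / s
      m1 <- integrate(integrand, -Inf, Inf)$value
      m1 / dt(tobs, df, ncp = 0)
    }
    bf10(tobs = 2.2, n1 = 25, n2 = 25)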
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:137-143 Template-Type: ReDIF-Article 1.0 Author-Name: Michael Friendly Author-X-Name-First: Michael Author-X-Name-Last: Friendly Author-Name: Matthew Sigal Author-X-Name-First: Matthew Author-X-Name-Last: Sigal Title: Visualizing Tests for Equality of Covariance Matrices Abstract: This article explores a variety of topics related to the question of testing the equality of covariance matrices in multivariate linear models, particularly in the MANOVA setting. Further, a plot of the components of Box’s M test is proposed that shows how groups differ in covariance and also suggests other visualizations and alternative test statistics. These methods are implemented and freely available in the heplots and candisc packages for R. Examples from the article and some further extensions are available in the online supplementary materials. Journal: The American Statistician Pages: 144-155 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1497537 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497537 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:144-155 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher Weld Author-X-Name-First: Christopher Author-X-Name-Last: Weld Author-Name: Andrew Loh Author-X-Name-First: Andrew Author-X-Name-Last: Loh Author-Name: Lawrence Leemis Author-X-Name-First: Lawrence Author-X-Name-Last: Leemis Title: Plotting Likelihood-Ratio-Based Confidence Regions for Two-Parameter Univariate Probability Models Abstract: Plotting two-parameter confidence regions is nontrivial. Numerical methods often rely on a computationally expensive grid-like exploration of the parameter space. A recent advance reduces the two-dimensional problem to many one-dimensional problems employing a trigonometric transformation that assigns an angle ϕ from the maximum likelihood estimator, and an unknown radial distance to its confidence region boundary. This paradigm shift can improve computational runtime by orders of magnitude, but it is not robust. Specifically, parameters differing greatly in magnitude and/or challenging nonconvex confidence region shapes make the plot susceptible to inefficiencies and/or inaccuracies. This article improves the technique by (i) keeping confidence region boundary searches in the parameter space, (ii) selectively targeting confidence region boundary points in lieu of uniformly spaced ϕ angles from the maximum likelihood estimator and (iii) enabling access to regions otherwise unreachable due to multiple roots for select ϕ angles. Two heuristics are given for ϕ selection: an elliptic-inspired angle selection heuristic and an intelligent smoothing search heuristic. Finally, a jump-center heuristic permits plotting otherwise inaccessible multiroot regions. This article develops these heuristics for two-parameter likelihood-ratio-based confidence regions associated with univariate probability distributions, and introduces the R conf package, which automates the process and is publicly available via CRAN. Journal: The American Statistician Pages: 156-168 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1564696 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564696 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
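The radial-search idea described above, in a self-contained R sketch for a normal (mu, sigma) model; this is an illustrative rendering of the basic ϕ-angle search, not the conf package implementation:
    # Walk outward from the MLE at each angle phi until the LR statistic
    # reaches the chi-squared cutoff; collect the boundary points
    set.seed(8)
    x <- rnorm(30, mean = 5, sd = 2)
    ll <- function(mu, sg) sum(dnorm(x, mu, sg, log = TRUE))
    mle <- c(mean(x), sqrt(mean((x - mean(x))^2)))
    cut <- qchisq(0.95, df = 2) / 2
    boundary <- t(sapply(seq(0, 2 * pi, length.out = 100), function(phi) {
      f <- function(r) ll(mle[1], mle[2]) -
        ll(mle[1] + r * cos(phi), mle[2] + r * sin(phi)) - cut
      r <- uniroot(f, c(1e-6, 0.9 * mle[2]))$root  # upper limit keeps sigma > 0
      mle + r * c(cos(phi), sin(phi))
    }))
    plot(boundary, type = "l", xlab = "mu", ylab = "sigma")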
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:156-168 Template-Type: ReDIF-Article 1.0 Author-Name: Irene Epifanio Author-X-Name-First: Irene Author-X-Name-Last: Epifanio Author-Name: M. Victoria Ibáñez Author-X-Name-First: M. Victoria Author-X-Name-Last: Ibáñez Author-Name: Amelia Simó Author-X-Name-First: Amelia Author-X-Name-Last: Simó Title: Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles Abstract: In this article, we propose several methodologies for handling missing or incomplete data in archetypal analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, that is, they are actual data points. With the proposed procedures, missing data are not discarded or previously filled by imputation, and the theoretical properties regarding the location of archetypes are guaranteed, unlike in previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified to fulfill the theory, and a new procedure is proposed in which the missing values are updated by the fitted values. In the second case, the procedure is based on the estimation of dissimilarities between samples and the projection of these dissimilarities in a new space, where AA or ADA is applied, and those results are used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real datasets: a well-known climate dataset and a global development dataset. We illustrate how these unsupervised methodologies allow complex data to be understood, even by nonexperts. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 169-183 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2018.1545700 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1545700 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:169-183 Template-Type: ReDIF-Article 1.0 Author-Name: Kelsey L. Grantham Author-X-Name-First: Kelsey L. Author-X-Name-Last: Grantham Author-Name: Andrew B. Forbes Author-X-Name-First: Andrew B. Author-X-Name-Last: Forbes Author-Name: Stephane Heritier Author-X-Name-First: Stephane Author-X-Name-Last: Heritier Author-Name: Jessica Kasza Author-X-Name-First: Jessica Author-X-Name-Last: Kasza Title: Time Parameterizations in Cluster Randomized Trial Planning Abstract: Models for cluster randomized trials conducted over multiple time periods should account for underlying temporal trends. However, in practice there is often limited knowledge or data available to inform the choice of time parameterization of these trends, or to anticipate the implications of this choice on trial planning. In this article, we establish a sufficient condition under which the choice of time parameterization does not affect the form of the variance of the treatment effect estimator, thereby simplifying the planning of these trials.
Journal: The American Statistician Pages: 184-189 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2019.1623072 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1623072 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:184-189 Template-Type: ReDIF-Article 1.0 Author-Name: Tim Johnson Author-X-Name-First: Tim Author-X-Name-Last: Johnson Author-Name: Christopher T. Dawes Author-X-Name-First: Christopher T. Author-X-Name-Last: Dawes Author-Name: Dalton Conley Author-X-Name-First: Dalton Author-X-Name-Last: Conley Title: How Does a Statistician Raise an Army? The Time When John W. Tukey, a Team of Luminaries, and a Statistics Graduate Student Repaired the Vietnam Selective Service Lotteries Abstract: Scholars have documented the failed randomization in 1969’s inaugural Vietnam Selective Service Lottery, but the story of how statisticians fixed that problem remains untold. Here, as the 50th anniversary of these events approaches, we recount how John W. Tukey, a team of statistical luminaries, and a graduate student from the University of Chicago repaired the draft lottery. Journal: The American Statistician Pages: 190-196 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2019.1677267 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1677267 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:190-196 Template-Type: ReDIF-Article 1.0 Author-Name: James A. Hanley Author-X-Name-First: James A. Author-X-Name-Last: Hanley Title: Lest We Forget: U.S. Selective Service Lotteries, 1917–2019 Abstract: The United States held 13 draft lotteries between 1917 and 1975, and a contingency procedure is in place for a selective service lottery were there ever to be a return to the draft. In 11 of these instances, the selection procedures spread the risk/harm evenhandedly. In two, whose anniversaries approach, the lotteries were problematic. Fortunately, one (1940) employed a “doubly robust” selection scheme that preserved the overall randomness; the other (1969) did not, and was not even-handed. These 13 lotteries provide examples of sound and unsound statistical planning, statistical acuity, and lessons ignored/learned. Existing and newly assembled raw data are used to describe the randomizations and to statistically measure deviations from randomness. The key statistical principle used in the selection procedures in WW I and WW II, in 1970–1975, and in the current (2019) contingency plan, is that of “double”—or even “quadruple”—robustness. This principle was used in medieval lotteries, such as the (four-month) two-drum lottery of 1569. Its use in the speeded up 2019 version provides a valuable and transparent statistical backstop where “an image of absolute fairness” is the over-riding concern. Journal: The American Statistician Pages: 197-206 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2019.1699444 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1699444 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:197-206 Template-Type: ReDIF-Article 1.0 Author-Name: Jong Hee Park Author-X-Name-First: Jong Hee Author-X-Name-Last: Park Title: The Art of Statistics: How to Learn From Data Journal: The American Statistician Pages: 207-207 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2020.1745572 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745572 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:207-207 Template-Type: ReDIF-Article 1.0 Author-Name: Daniel Manrique-Vallier Author-X-Name-First: Daniel Author-X-Name-Last: Manrique-Vallier Title: Capture-Recapture Methods for the Social and Medical Sciences Journal: The American Statistician Pages: 207-208 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2020.1745574 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745574 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:207-208 Template-Type: ReDIF-Article 1.0 Author-Name: Seung Jun Shin Author-X-Name-First: Seung Jun Author-X-Name-Last: Shin Title: Model-Based Clustering and Classification for Data Science: With Applications in R Journal: The American Statistician Pages: 208-209 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2020.1745576 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745576 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:208-209 Template-Type: ReDIF-Article 1.0 Author-Name: Paul Johnson Author-X-Name-First: Paul Author-X-Name-Last: Johnson Title: R Markdown: The Definitive Guide Journal: The American Statistician Pages: 209-210 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2020.1745577 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745577 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:209-210 Template-Type: ReDIF-Article 1.0 Author-Name: David C. Hoaglin Author-X-Name-First: David C. Author-X-Name-Last: Hoaglin Title: Did Phlegon Actually Use a Stem-and-Leaf Display? Journal: The American Statistician Pages: 211-211 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2020.1721329 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1721329 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:211-211 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Correction Journal: The American Statistician Pages: 212-212 Issue: 2 Volume: 74 Year: 2020 Month: 4 X-DOI: 10.1080/00031305.2019.1708461 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1708461 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:212-212 Template-Type: ReDIF-Article 1.0 Author-Name: Melinda H. McCann Author-X-Name-First: Melinda H. Author-X-Name-Last: McCann Author-Name: Joshua D. Habiger Author-X-Name-First: Joshua D. 
Author-X-Name-Last: Habiger Title: The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance Abstract: When comparing two treatment groups, the objectives are often to (1) determine if the difference between groups (the effect) is of scientific interest, or nonnegligible, and (2) determine if the effect is positive or negative. In practice, a p-value corresponding to the null hypothesis that no effect exists is used to accomplish the first objective and a point estimate for the effect is used to accomplish the second objective. This article demonstrates that this approach is fundamentally flawed and proposes a new approach. The proposed method allows for claims regarding the size of an effect (nonnegligible vs. negligible) and its nature (positive vs. negative) to be made, and provides measures of statistical significance associated with each claim. Journal: The American Statistician Pages: 213-217 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2018.1497538 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497538 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:213-217 Template-Type: ReDIF-Article 1.0 Author-Name: Haruhiko Ogasawara Author-X-Name-First: Haruhiko Author-X-Name-Last: Ogasawara Title: Some Improvements on Markov's Theorem with Extensions Abstract: Markov's theorem for an upper bound of the probability related to a nonnegative random variable has been improved using additional information over almost the entire nontrivial range of the variable. In the improvement, Cantelli's inequality is applied to the square root of the original variable, whose expectation is finite when that of the original variable is finite. The improvement has been extended to lower bounds and monotonic transformations of the original variable. The improvements are used in Chebyshev's inequality and its multivariate version. Journal: The American Statistician Pages: 218-225 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2018.1497539 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497539 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:218-225 Template-Type: ReDIF-Article 1.0 Author-Name: Ling Leng Author-X-Name-First: Ling Author-X-Name-Last: Leng Author-Name: Wei Zhu Author-X-Name-First: Wei Author-X-Name-Last: Zhu Title: Compound Regression and Constrained Regression: Nonparametric Regression Frameworks for EIV Models Abstract: Errors-in-variable (EIV) regression is often used to gauge the linear relationship between two variables both suffering from measurement and other errors, such as the comparison of two measurement platforms (e.g., RNA sequencing vs. microarray). Scientists are often at a loss as to which EIV regression model to use, for there are infinitely many choices. We provide sound guidelines toward viable solutions to this dilemma by introducing two general nonparametric EIV regression frameworks: the compound regression and the constrained regression. It is shown that these approaches are equivalent to each other and to the general parametric structural modeling approach. The advantages of these methods lie in their intuitive geometric representations, their distribution-free nature, and their ability to offer candidate solutions with various optimal properties when the ratio of the error variances is unknown.
Each includes the classic nonparametric regression methods of ordinary least squares, geometric mean regression (GMR), and orthogonal regression as special cases. Under these general frameworks, one can readily uncover some surprising optimal properties of the GMR, and truly comprehend the benefit of data normalization. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 226-232 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2018.1556734 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1556734 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:226-232 Template-Type: ReDIF-Article 1.0 Author-Name: Srinjoy Das Author-X-Name-First: Srinjoy Author-X-Name-Last: Das Author-Name: Dimitris N. Politis Author-X-Name-First: Dimitris N. Author-X-Name-Last: Politis Title: Nonparametric Estimation of the Conditional Distribution at Regression Boundary Points Abstract: Nonparametric regression is a standard statistical tool with increased importance in the Big Data era. Boundary points pose additional difficulties but local polynomial regression can be used to alleviate them. Local linear regression, for example, is easy to implement and performs quite well both at interior and boundary points. Estimating the conditional distribution function and/or the quantile function at a given regressor point is immediate via standard kernel methods but problems ensue if local linear methods are to be used. In particular, the distribution function estimator is not guaranteed to be monotone increasing, and the quantile curves can “cross.” In the article at hand, a simple method of correcting the local linear distribution estimator for monotonicity is proposed, and its good performance is demonstrated via simulations and real data examples. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 233-242 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2018.1558109 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1558109 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:233-242 Template-Type: ReDIF-Article 1.0 Author-Name: Sander Greenland Author-X-Name-First: Sander Author-X-Name-Last: Greenland Author-Name: Michael P. Fay Author-X-Name-First: Michael P. Author-X-Name-Last: Fay Author-Name: Erica H. Brittain Author-X-Name-First: Erica H. Author-X-Name-Last: Brittain Author-Name: Joanna H. Shih Author-X-Name-First: Joanna H. Author-X-Name-Last: Shih Author-Name: Dean A. Follmann Author-X-Name-First: Dean A. Author-X-Name-Last: Follmann Author-Name: Erin E. Gabriel Author-X-Name-First: Erin E. Author-X-Name-Last: Gabriel Author-Name: James M. Robins Author-X-Name-First: James M. Author-X-Name-Last: Robins Title: On Causal Inferences for Personalized Medicine: How Hidden Causal Assumptions Led to Erroneous Causal Claims About the D-Value Abstract: Personalized medicine asks if a new treatment will help a particular patient, rather than if it improves the average response in a population. Without a causal model to distinguish these questions, interpretational mistakes arise. 
These mistakes are seen in an article by Demidenko that recommends the “D-value,” which is the probability that a randomly chosen person from the new-treatment group has a higher value for the outcome than a randomly chosen person from the control-treatment group. The abstract states “The D-value has a clear interpretation as the proportion of patients who get worse after the treatment” with similar assertions appearing later. We show these statements are incorrect because they require assumptions about the potential outcomes which are neither testable in randomized experiments nor plausible in general. The D-value will not equal the proportion of patients who get worse after treatment if (as expected) those outcomes are correlated. Independence of potential outcomes is unrealistic and eliminates any personalized treatment effects; with dependence, the D-value can imply that treatment is better than control even though most patients are harmed by the treatment. Thus, D-values are misleading for personalized medicine. To prevent misunderstandings, we advise incorporating causal models into basic statistics education. Journal: The American Statistician Pages: 243-248 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1575771 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1575771 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:243-248 Template-Type: ReDIF-Article 1.0 Author-Name: Pierre Baldi Author-X-Name-First: Pierre Author-X-Name-Last: Baldi Author-Name: Babak Shahbaba Author-X-Name-First: Babak Author-X-Name-Last: Shahbaba Title: Bayesian Causality Abstract: Although no universally accepted definition of causality exists, in practice one is often faced with the question of statistically assessing causal relationships in different settings. We present a uniform general approach to causality problems derived from the axiomatic foundations of the Bayesian statistical framework. In this approach, causality statements are viewed as hypotheses, or models, about the world and the fundamental object to be computed is the posterior distribution of the causal hypotheses, given the data and the background knowledge. Computation of the posterior, illustrated here in simple examples, may involve complex probabilistic modeling but this is no different than in any other Bayesian modeling situation. The main advantage of the approach is its connection to the axiomatic foundations of the Bayesian framework, and the general uniformity with which it can be applied to a variety of causality settings, ranging from specific to general cases, or from causes of effects to effects of causes. Journal: The American Statistician Pages: 249-257 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1647876 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1647876 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:249-257 Template-Type: ReDIF-Article 1.0 Author-Name: Mahayaudin M. Mansor Author-X-Name-First: Mahayaudin M. Author-X-Name-Last: Mansor Author-Name: David A. Green Author-X-Name-First: David A. Author-X-Name-Last: Green Author-Name: Andrew V. Metcalfe Author-X-Name-First: Andrew V.
Author-X-Name-Last: Metcalfe Title: Detecting Directionality in Time Series Abstract: Directionality can be seen in many stationary time series from various disciplines, but it is overlooked when fitting linear models with Gaussian errors. Moreover, we cannot rely on distinguishing directionality by comparing a plot of a time series in time order with a plot in reverse time order. In general, a statistical measure is required to detect and quantify directionality. There are several quite different qualitative forms of directionality, and we distinguish: rapid rises followed by slow recessions; rapid increases and rapid decreases from the mean followed by slow recovery toward the mean; directionality above or below some threshold; and intermittent directionality. The first objective is to develop a suite of statistical measures that will detect directionality and help classify its nature. The second objective is to demonstrate the potential benefits of detecting directionality. We consider applications from business, environmental science, finance, and medicine. Time series data are collected from many processes, both natural and anthropogenic, by a wide range of organizations, and directionality can easily be monitored as part of routine analysis. We suggest that doing so may provide new insights to the processes. Journal: The American Statistician Pages: 258-266 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2018.1545699 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1545699 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:258-266 Template-Type: ReDIF-Article 1.0 Author-Name: McKinley L. Blackburn Author-X-Name-First: McKinley L. Author-X-Name-Last: Blackburn Title: Bias in Small-Sample Inference With Count-Data Models Abstract: Both Poisson and negative binomial regression can provide quasi-likelihood estimates for coefficients in exponential-mean models that are consistent in the presence of distributional misspecification. It has generally been recommended, however, that inference be carried out using asymptotically robust estimators for the parameter covariance matrix. As with linear models, such robust inference tends to lead to over-rejection of null hypotheses in small samples. Alternative methods for estimating coefficient estimator variances are considered. No one approach seems to remove all test bias, but the results do suggest that the use of the jackknife with Poisson regression tends to be least biased for inference. Journal: The American Statistician Pages: 267-273 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2018.1564699 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564699 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:267-273 Template-Type: ReDIF-Article 1.0 Author-Name: Andrew Kane Author-X-Name-First: Andrew Author-X-Name-Last: Kane Author-Name: Abhyuday Mandal Author-X-Name-First: Abhyuday Author-X-Name-Last: Mandal Title: A New Analysis Strategy for Designs With Complex Aliasing Abstract: Nonregular designs are popular in planning industrial experiments for their run-size economy. These designs often produce partially aliased effects, where the effects of different factors cannot be completely separated from each other. In this article, we propose applying an adaptive lasso regression as an analytical tool for designs with complex aliasing. 
Its utility compared to traditional methods is demonstrated by analyzing real-life experimental data and simulation studies. Journal: The American Statistician Pages: 274-281 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1585287 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1585287 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:274-281 Template-Type: ReDIF-Article 1.0 Author-Name: Mintaek Lee Author-X-Name-First: Mintaek Author-X-Name-Last: Lee Author-Name: Jaechoul Lee Author-X-Name-First: Jaechoul Author-X-Name-Last: Lee Title: Trend and Return Level of Extreme Snow Events in New York City Abstract: A major winter storm brought up to 42 inches of snow in parts of the Mid-Atlantic and Northeast states during January 22–24, 2016. The blizzard of January 2016 impacted about 102.8 million people, claiming at least 55 lives and causing $500 million to $3 billion in economic losses. This article studies two important aspects of extreme snowfall events: (1) trends in annual maxima and threshold exceedances and (2) return levels for extreme snowfall. Applying extreme value methods to the extreme snow data in the New York City area, we quantify linear trends in extreme snowfall and assess how severe the 2016 blizzard is in terms of return levels. To find a more realistic standard error for the extreme value methods, we extend Smith’s method to adapt to both spatial and temporal correlations in the snow data. Our results show increasing but insignificant trends in the annual maximum snowfall series. However, we find that the 87.5th percentile snowfall has significantly increased by 0.564 inches per decade, suggesting that, while the maximum snowfall is not significantly increasing, there have been increases in the snowfall among the larger storms. We also find that the 2016 blizzard is indeed an extreme snow event equivalent to about a 40-year return level in the New York City area. The extreme value methods used in this study are thoroughly illustrated for general readers. Data and modularized programming codes are available online to aid practitioners in using extreme value methods in applications. Journal: The American Statistician Pages: 282-293 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1592780 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1592780 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:282-293 Template-Type: ReDIF-Article 1.0 Author-Name: Pankaj Bhagwat Author-X-Name-First: Pankaj Author-X-Name-Last: Bhagwat Author-Name: Éric Marchand Author-X-Name-First: Éric Author-X-Name-Last: Marchand Title: On a Proper Bayes, but Inadmissible Estimator Abstract: We present an example of a proper Bayes point estimator which is inadmissible. It occurs for a negative binomial model with shape parameter a, probability parameter p, prior densities of the form π(a,p) = β g(a) (1−p)^(β−1), and for estimating the population mean μ = a(1−p)/p under squared error loss. Other intriguing features are exhibited, such as the constancy of the Bayes estimator with respect to the choice of g, including degenerate or known a cases. 
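A minimal R sketch of the return-level calculation described in the Lee and Lee abstract above: it fits a generalized extreme value (GEV) distribution to simulated annual maxima by maximum likelihood and evaluates an approximate 40-year return level. The simulated data, the starting values, and the plain independent-data likelihood are illustrative assumptions; the sketch does not reproduce the authors' spatial and temporal extension of Smith's method.

  # Minimal GEV sketch (illustrative only; not the authors' code or data).
  set.seed(1)
  x <- 30 + 8 * (-log(-log(runif(60))))   # hypothetical annual maxima (Gumbel-like)

  # Negative log-likelihood of GEV(mu, sigma, xi), for xi != 0
  gev_nll <- function(par, x) {
    mu <- par[1]; sigma <- par[2]; xi <- par[3]
    if (sigma <= 0) return(Inf)
    t <- 1 + xi * (x - mu) / sigma
    if (any(t <= 0)) return(Inf)
    sum(log(sigma) + (1 + 1/xi) * log(t) + t^(-1/xi))
  }

  fit <- optim(c(mean(x), sd(x), 0.1), gev_nll, x = x)

  # T-year return level: z_T = mu + (sigma/xi) * ((-log(1 - 1/T))^(-xi) - 1)
  return_level <- function(par, T) {
    par[1] + par[2] / par[3] * ((-log(1 - 1/T))^(-par[3]) - 1)
  }
  return_level(fit$par, 40)   # approximate 40-year return level

In practice one would use a dedicated extreme value package and account for dependence in the standard errors, as the article does.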
Journal: The American Statistician Pages: 294-296 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1604432 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604432 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:294-296 Template-Type: ReDIF-Article 1.0 Author-Name: Ibrahim Salama Author-X-Name-First: Ibrahim Author-X-Name-Last: Salama Author-Name: Gary Koch Author-X-Name-First: Gary Author-X-Name-Last: Koch Title: On the Maximum–Minimums Identity: Extension and Applications Abstract: For real numbers x1,…,xn, the maximum–minimums identity allows us to express the maximum of x1,…,xn in terms of the minimums of subsets of {x1,…,xn}. In this note, we provide an extension allowing us to express the kth-ranked element in terms of the minimums of subsets of sizes (n−k+1),…,n. We also discuss the dual identity, allowing us to express the kth-ranked element in terms of the maximums of subsets of sizes k,…,n. We present three examples: The first relates to the expected value of order statistics from independent nonidentical geometric distributions, the second to the partial coupon collector’s problem, and the third to relations among moments of order statistics. Journal: The American Statistician Pages: 297-300 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1638832 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1638832 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:297-300 Template-Type: ReDIF-Article 1.0 Author-Name: Eugene D. Gallagher Author-X-Name-First: Eugene D. Author-X-Name-Last: Gallagher Title: Was Quetelet’s Average Man Normal? Abstract: Quetelet’s data on Scottish chest girths are analyzed with eight normality tests. In contrast to Quetelet’s conclusion that the data are fit well by what is now known as the normal distribution, six of eight normality tests provide strong evidence that the chest circumferences are not normally distributed. Using corrected chest circumferences from Stigler, the χ² test no longer provides strong evidence against normality, but five commonly used normality tests do. The D’Agostino–Pearson K² and Jarque–Bera tests, based only on skewness and kurtosis, find that both Quetelet’s original data and the Stigler-corrected data are consistent with the hypothesis of normality. The major reason that most normality tests produce low p-values, indicating that Quetelet’s data are not normally distributed, is that the chest circumferences were reported in whole inches; rounding large numbers of observations can produce many tied values, which strongly affect most normality tests. Users should be cautious when using many standard normality tests if data have ties, are rounded, and the ratio of the standard deviation to the rounding interval is small. Journal: The American Statistician Pages: 301-306 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2019.1706635 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1706635 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:301-306 Template-Type: ReDIF-Article 1.0 Author-Name: Yongdai Kim Author-X-Name-First: Yongdai Author-X-Name-Last: Kim Title: The 9 Pitfalls of Data Science Journal: The American Statistician Pages: 307-307 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1790216 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790216 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:307-307 Template-Type: ReDIF-Article 1.0 Author-Name: Brandon Butcher Author-X-Name-First: Brandon Author-X-Name-Last: Butcher Author-Name: Brian J. Smith Author-X-Name-First: Brian J. Author-X-Name-Last: Smith Title: Feature Engineering and Selection: A Practical Approach for Predictive Models Journal: The American Statistician Pages: 308-309 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1790217 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790217 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:308-309 Template-Type: ReDIF-Article 1.0 Author-Name: Bailey K. Fosdick Author-X-Name-First: Bailey K. Author-X-Name-Last: Fosdick Author-Name: G. Brooke Anderson Author-X-Name-First: G. Author-X-Name-Last: Brooke Anderson Title: Modern Statistics for Modern Biology Journal: The American Statistician Pages: 309-311 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1790218 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790218 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:309-311 Template-Type: ReDIF-Article 1.0 Author-Name: Jonathan M. Wells Author-X-Name-First: Jonathan M. Author-X-Name-Last: Wells Title: Surprises in Probability: Seventeen Short Stories Journal: The American Statistician Pages: 311-311 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1790219 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790219 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:311-311 Template-Type: ReDIF-Article 1.0 Author-Name: Robert B. Lund Author-X-Name-First: Robert B. Author-X-Name-Last: Lund Title: Time Series: A Data Analysis Approach Using R Journal: The American Statistician Pages: 312-312 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1790221 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790221 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:312-312 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on “Test for Trend With a Multinomial Outcome” by Szabo (2019) Journal: The American Statistician Pages: 313-314 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1763835 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1763835 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:313-314 Template-Type: ReDIF-Article 1.0 Author-Name: Gunnar Taraldsen Author-X-Name-First: Gunnar Author-X-Name-Last: Taraldsen Title: Micha Mandel (2020), “The Scaled Uniform Model Revisited,” The American Statistician, 74:1, 98–100: Comment Journal: The American Statistician Pages: 315-315 Issue: 3 Volume: 74 Year: 2020 Month: 7 X-DOI: 10.1080/00031305.2020.1769727 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1769727 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:315-315 Template-Type: ReDIF-Article 1.0 Author-Name: Iain L. MacDonald Author-X-Name-First: Iain L. Author-X-Name-Last: MacDonald Author-Name: Feroz Bhamani Author-X-Name-First: Feroz Author-X-Name-Last: Bhamani Title: A Time-Series Model for Underdispersed or Overdispersed Counts Abstract: It is common for time series of unbounded counts (that is, nonnegative integers) to display overdispersion relative to the Poisson. Such an overdispersed series can be modeled by a hidden Markov model with Poisson state-dependent distributions (a “Poisson–HMM”), since a Poisson–HMM allows for both overdispersion and serial dependence. Time series of underdispersed counts seem less common, but they are more awkward to model; a Poisson–HMM cannot cope with underdispersion. But if in a Poisson–HMM one replaces the Poisson distributions by Conway–Maxwell–Poisson distributions, one gets a class of models which can allow for under- or overdispersion (and serial dependence). In addition, this class can cope with the combination of slight overdispersion and substantial serial dependence, a combination that is apparently difficult for a Poisson–HMM to represent. We discuss the properties of this class of models, and use direct numerical maximization of likelihood to fit a range of models to three published series of counts which display underdispersion, and to a series which displays slight overdispersion plus substantial serial dependence. In addition, we illustrate how such models can be fitted without imputation when some observations are missing from the series, and how approximate standard errors of the parameter estimates can be found. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 317-328 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2018.1505656 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505656 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:317-328 Template-Type: ReDIF-Article 1.0 Author-Name: Peihua Qiu Author-X-Name-First: Peihua Author-X-Name-Last: Qiu Title: Big Data? Statistical Process Control Can Help! Abstract: “Big data” is a buzzword these days due to the enormous number of data-rich applications in different industries and research projects. In practice, big data often take the form of data streams in the sense that new batches of data keep being collected over time. One fundamental research problem when analyzing big data in a given application is to monitor the underlying sequential process of the observed data to see whether it is longitudinally stable, or how its distribution changes over time. 
To monitor a sequential process, one major statistical tool is the statistical process control (SPC) chart, which has been developed and used mainly for monitoring production lines in the manufacturing industries during the past several decades. With many new and versatile SPC methods developed in recent research, we believe that SPC can become a powerful tool for handling many big data applications beyond production-line monitoring. In this article, we introduce some recent SPC methods and discuss their potential to solve some big data problems. Certain challenges at the interface between current SPC research and some big data applications are also discussed. Journal: The American Statistician Pages: 329-344 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2019.1700163 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1700163 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:329-344 Template-Type: ReDIF-Article 1.0 Author-Name: Li Xu Author-X-Name-First: Li Author-X-Name-Last: Xu Author-Name: Chris Gotwalt Author-X-Name-First: Chris Author-X-Name-Last: Gotwalt Author-Name: Yili Hong Author-X-Name-First: Yili Author-X-Name-Last: Hong Author-Name: Caleb B. King Author-X-Name-First: Caleb B. Author-X-Name-Last: King Author-Name: William Q. Meeker Author-X-Name-First: William Q. Author-X-Name-Last: Meeker Title: Applications of the Fractional-Random-Weight Bootstrap Abstract: For several decades, the resampling-based bootstrap has been widely used for computing confidence intervals (CIs) for applications where no exact method is available. However, there are many applications where the resampling bootstrap method cannot be used. These include situations where the data are heavily censored due to the success response being a rare event, situations where there is insufficient mixing of successes and failures across the explanatory variable(s), and designed experiments where the number of parameters is close to the number of observations. These three situations all have in common that there may be a substantial proportion of the resamples where it is not possible to estimate all of the parameters in the model. This article reviews the fractional-random-weight bootstrap method and demonstrates how it can be used to avoid these problems and construct CIs in a way that is accessible to statistical practitioners. The fractional-random-weight bootstrap method is easy to use and has advantages over the resampling method in many challenging applications. Journal: The American Statistician Pages: 345-358 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1731599 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1731599 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:345-358 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher T. Franck Author-X-Name-First: Christopher T. Author-X-Name-Last: Franck Author-Name: Robert B. Gramacy Author-X-Name-First: Robert B. Author-X-Name-Last: Gramacy Title: Assessing Bayes Factor Surfaces Using Interactive Visualization and Computer Surrogate Modeling Abstract: Bayesian model selection provides a natural alternative to classical hypothesis testing based on p-values. 
While many articles mention that Bayesian model selection can be sensitive to prior specification on parameters, there are few practical strategies to assess and report this sensitivity. This article has two goals. First, we aim to educate the broader statistical community about the extent of potential sensitivity through visualization of the Bayes factor surface. The Bayes factor surface shows the value a Bayes factor takes as a function of user-specified hyperparameters. Second, we suggest surrogate modeling via Gaussian processes to visualize the Bayes factor surface in situations where computation is expensive. We provide three examples: an interactive R shiny application that explores a simple regression problem, a hierarchical linear model selection exercise, and an application of surrogate modeling via Gaussian processes to a study of the influence of outliers in empirical finance. We suggest Bayes factor surfaces are valuable for scientific reporting since they (i) increase transparency by making instability in Bayes factors easy to visualize, (ii) generalize to simple and complicated examples, and (iii) provide a path for researchers to assess the impact of prior choice on modeling decisions in a wide variety of research areas. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 359-369 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2019.1671219 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1671219 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:359-369 Template-Type: ReDIF-Article 1.0 Author-Name: Jae H. Kim Author-X-Name-First: Jae H. Author-X-Name-Last: Kim Title: Decision-Theoretic Hypothesis Testing: A Primer With R Package OptSig Abstract: This article is a primer for a decision-theoretic approach to hypothesis testing for students and teachers of basic statistics. Using three examples at an introductory level, this article demonstrates how decision-theoretic hypothesis testing can be taught to students of basic statistics. It also demonstrates that students and researchers can make more sensible and unambiguous decisions under uncertainty by employing this particular approach. The examples are illustrated using R and its package “OptSig.” Journal: The American Statistician Pages: 370-379 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1750484 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1750484 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:370-379 Template-Type: ReDIF-Article 1.0 Author-Name: Scott D. Grimshaw Author-X-Name-First: Scott D. Author-X-Name-Last: Grimshaw Author-Name: Natalie J. Blades Author-X-Name-First: Natalie J. Author-X-Name-Last: Blades Author-Name: Candace Berrett Author-X-Name-First: Candace Author-X-Name-Last: Berrett Title: Going Viral, Binge-Watching, and Attention Cannibalism Abstract: Binge-watching behavior is modeled for a single season of an original program from a streaming service to understand and make predictions about how individuals watch newly released content. Viewers make two choices in binge watching. First, the onset when individuals begin viewing the program is modeled using a change point between epidemic viewing with a nonconstant hazard rate and endemic viewing with a constant hazard rate. 
Second, the time it takes for individuals to complete the full season is modeled using an expanded negative binomial hurdle model to account for both binge racers (who watch all episodes in a single day) and other viewers. With the rapid increase in original content for streaming services, network executives are interested in whether to release multiple original programs simultaneously or to stagger premiere dates. The two model results are used to investigate competing risks to determine how the amount of time between premieres impacts attention cannibalism, in which a viewer takes a long time watching their first-choice program and consequently never watches the second program. Journal: The American Statistician Pages: 380-391 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1774415 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1774415 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:380-391 Template-Type: ReDIF-Article 1.0 Author-Name: Haozhe Zhang Author-X-Name-First: Haozhe Author-X-Name-Last: Zhang Author-Name: Joshua Zimmerman Author-X-Name-First: Joshua Author-X-Name-Last: Zimmerman Author-Name: Dan Nettleton Author-X-Name-First: Dan Author-X-Name-Last: Nettleton Author-Name: Daniel J. Nordman Author-X-Name-First: Daniel J. Author-X-Name-Last: Nordman Title: Random Forest Prediction Intervals Abstract: Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels. Journal: The American Statistician Pages: 392-406 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2019.1585288 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1585288 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:392-406 Template-Type: ReDIF-Article 1.0 Author-Name: Tomasz J. Kozubowski Author-X-Name-First: Tomasz J. Author-X-Name-Last: Kozubowski Author-Name: Krzysztof Podgórski Author-X-Name-First: Krzysztof Author-X-Name-Last: Podgórski Title: Gaussian Mixture Representation of the Laplace Distribution Revisited: Bibliographical Connections and Extensions Abstract: We provide bibliographical connections and extensions of several representations of the classical Laplace distribution, discussed recently in the study of Ding and Blitzstein. Beyond presenting relations to some previous results, we also include their skew as well as multivariate versions. 
In particular, the distribution of det Z, where Z is an n × n matrix of iid standard normal components, is obtained for an arbitrary integer n. While the latter is a scale mixture of Gaussian distributions, the Laplace distribution is obtained only in the case n = 2. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 407-412 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2019.1630000 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1630000 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:407-412 Template-Type: ReDIF-Article 1.0 Author-Name: Malay Ghosh Author-X-Name-First: Malay Author-X-Name-Last: Ghosh Title: Revisiting Jeffreys’ Example: Bayes Test of the Normal Mean Abstract: We revisit the classical problem of testing whether a normal mean is zero against all possible alternatives within a Bayesian framework. Jeffreys showed that the Bayes factor for this problem has a drawback with normal priors for the alternatives. He showed also that this deficiency is rectified when one uses a Cauchy prior instead. Noting that a Cauchy prior is an example of a scale-mixed normal prior, we want to examine whether or not scale-mixed normal priors can always overcome the deficiency of the Bayes factor. It turns out though that while mixing priors with polynomial tails can overcome this deficiency, those with exponential tails fail to do so. Examples are provided to illustrate this point. Journal: The American Statistician Pages: 413-415 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2019.1687013 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1687013 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:413-415 Template-Type: ReDIF-Article 1.0 Author-Name: Wen Li Author-X-Name-First: Wen Author-X-Name-Last: Li Author-Name: Thomas O. Jemielita Author-X-Name-First: Thomas O. Author-X-Name-Last: Jemielita Title: Mathematical and Statistical Skills in the Biopharmaceutical Industry: A Pragmatic Approach. Journal: The American Statistician Pages: 416-417 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1831806 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1831806 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:416-417 Template-Type: ReDIF-Article 1.0 Author-Name: Qixuan Chen Author-X-Name-First: Qixuan Author-X-Name-Last: Chen Title: Multiple Imputation in Practice: With Examples Using IVEware. Journal: The American Statistician Pages: 417-417 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1831809 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1831809 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:417-417 Template-Type: ReDIF-Article 1.0 Author-Name: Kalimuthu Krishnamoorthy Author-X-Name-First: Kalimuthu Author-X-Name-Last: Krishnamoorthy Author-Name: Yanping Xia Author-X-Name-First: Yanping Author-X-Name-Last: Xia Title: Xinjie Hu, Aekyung Jung, and Gengsheng Qin (2020), “Interval Estimation for the Correlation Coefficient,” The American Statistician, 74:1, 29–36: Comment by Krishnamoorthy and Xia Journal: The American Statistician Pages: 418-418 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1829048 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1829048 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:418-418 Template-Type: ReDIF-Article 1.0 Author-Name: Xinjie Hu Author-X-Name-First: Xinjie Author-X-Name-Last: Hu Author-Name: Aekyung Jung Author-X-Name-First: Aekyung Author-X-Name-Last: Jung Author-Name: Gengsheng Qin Author-X-Name-First: Gengsheng Author-X-Name-Last: Qin Title: A Response to the Letter to the Editor on “Interval Estimation for the Correlation Coefficient,” The American Statistician, 74:1, 29–36: Comment by Krishnamoorthy and Xia Journal: The American Statistician Pages: 419-419 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1827032 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1827032 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:419-419 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: Editorial Collaborators Journal: The American Statistician Pages: 420-421 Issue: 4 Volume: 74 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1842019 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1842019 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:420-421 Template-Type: ReDIF-Article 1.0 Author-Name: Michael Lavine Author-X-Name-First: Michael Author-X-Name-Last: Lavine Author-Name: Jim Hodges Author-X-Name-First: Jim Author-X-Name-Last: Hodges Title: Intuition for an Old Curiosity and an Implication for MCMC Abstract: Morris and Ebey reported the following curiosity. “The unweighted sample mean is examined as an estimator of the population mean in a first-order autoregressive model. It is demonstrated that the precision of this estimator deteriorates as the number of equally spaced observations taken within a fixed time interval increases.” Morris and Ebey proved their result but gave no intuition for it. We provide some intuition, then examine an implication: that the usual practice of estimating posterior expectations by taking the unweighted average of consecutive Markov chain Monte Carlo (MCMC) samples may not be optimal. Journal: The American Statistician Pages: 1-6 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2018.1518267 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518267 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:1-6 Template-Type: ReDIF-Article 1.0 Author-Name: Rudy Ligtvoet Author-X-Name-First: Rudy Author-X-Name-Last: Ligtvoet Title: Exact Bayes Factors for the Comparison of Multinomial Distributions Abstract: This article deals with the problem of comparing multinomial distributions with multiple ordered categories. 
A graphical procedure is proposed for obtaining the posterior probabilities for the hypotheses of a stochastic dominance relationship, positive cumulative odds ratios, and a likelihood ratio ordering. From these posterior probabilities we subsequently obtain exact expressions for the Bayes factors related to these hypotheses. Supplemental materials for running the analysis for the examples presented in the article are available online. Journal: The American Statistician Pages: 7-14 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1575773 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1575773 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:7-14 Template-Type: ReDIF-Article 1.0 Author-Name: Paul Kvam Author-X-Name-First: Paul Author-X-Name-Last: Kvam Title: The Price is Right: Analyzing Bidding Behavior on Contestants’ Row Abstract: The TV game show “The Price is Right” features a bidding auction called Contestant’s Row that rewards the player (out of four) who bids closest to an item’s value without overbidding. By exploring 903 game outcomes from the 2000–2001 season, we show that player strategies are significantly inefficient and compare the empirical results to probability outcomes for optimal bid strategies found in a recent study. Findings show that the last bidder would do better using the naïve strategy of bidding a dollar more than the highest of the three bids. We apply the EM algorithm in a novel way to extract a maximum amount of information from observed player bids. The gained knowledge about a player’s evaluation of merchandise allows us to uncover new insights into player behavior, including the potential effects of anchoring. Journal: The American Statistician Pages: 15-22 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1592782 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1592782 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:15-22 Template-Type: ReDIF-Article 1.0 Author-Name: Chuan-Fa Tang Author-X-Name-First: Chuan-Fa Author-X-Name-Last: Tang Author-Name: Dewei Wang Author-X-Name-First: Dewei Author-X-Name-Last: Wang Author-Name: Hammou El Barmi Author-X-Name-First: Hammou Author-X-Name-Last: El Barmi Author-Name: Joshua M. Tebbs Author-X-Name-First: Joshua M. Author-X-Name-Last: Tebbs Title: Testing for Positive Quadrant Dependence Abstract: We develop an empirical likelihood (EL) approach to test independence of two univariate random variables X and Y versus the alternative that X and Y are strictly positive quadrant dependent (PQD). Establishing this type of ordering between X and Y is of interest in many applications, including finance, insurance, engineering, and other areas. Adopting the framework in Einmahl and McKeague, we create a distribution-free test statistic that integrates a localized EL ratio test statistic with respect to the empirical joint distribution of X and Y. Simulation results show that, compared to well-known existing tests and to distance-based tests we develop using copula functions, the EL testing procedure performs well in a variety of scenarios when X and Y are strictly PQD. We use three datasets for illustration and provide an online R resource practitioners can use to implement the methods in this article. Supplementary materials for this article are available online. 
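The PQD ordering tested by Tang et al. above requires H(x, y) ≥ F(x)G(y) for all (x, y), where H is the joint distribution function and F and G are the marginals. The short R sketch below computes a naive empirical diagnostic for this inequality at the sample points; it is only an informal illustration of the PQD concept on simulated data, not the empirical likelihood statistic of the article, and pqd_gap is a hypothetical helper name.

  # Naive empirical PQD diagnostic (illustration only; not the article's EL test).
  pqd_gap <- function(x, y) {
    n <- length(x)
    gaps <- sapply(seq_len(n), function(i) {
      H  <- mean(x <= x[i] & y <= y[i])         # empirical joint CDF at (x_i, y_i)
      FG <- mean(x <= x[i]) * mean(y <= y[i])   # product of empirical marginals
      H - FG
    })
    c(min_gap = min(gaps), max_gap = max(gaps))
  }

  # Positively dependent example: the gaps should be predominantly nonnegative.
  set.seed(2)
  x <- rnorm(200); y <- x + rnorm(200)
  pqd_gap(x, y)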
Journal: The American Statistician Pages: 23-30 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1607554 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1607554 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:23-30 Template-Type: ReDIF-Article 1.0 Author-Name: J. González-Ortega Author-X-Name-First: J. Author-X-Name-Last: González-Ortega Author-Name: D. Ríos Insua Author-X-Name-First: D. Author-X-Name-Last: Ríos Insua Author-Name: F. Ruggeri Author-X-Name-First: F. Author-X-Name-Last: Ruggeri Author-Name: R. Soyer Author-X-Name-First: R. Author-X-Name-Last: Soyer Title: Hypothesis Testing in Presence of Adversaries Abstract: We present an extension to the classical problem of hypothesis testing by incorporating actions of an adversary who intends to mislead the decision-maker and attain a certain benefit. After presenting the general problem within an adversarial statistical decision theory framework, we consider the cases of adversaries who can either perturb the data received or modify the underlying data-generating process parametrically. Supplemental materials for this article are available online. Journal: The American Statistician Pages: 31-40 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1630001 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1630001 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:31-40 Template-Type: ReDIF-Article 1.0 Author-Name: Bo Peng Author-X-Name-First: Bo Author-X-Name-Last: Peng Author-Name: Min Wang Author-X-Name-First: Min Author-X-Name-Last: Wang Title: Objective Bayesian testing for the correlation coefficient under divergence-based priors Abstract: The correlation coefficient is a commonly used criterion to measure the strength of a linear relationship between two quantitative variables. For a bivariate normal distribution, numerous procedures have been proposed for testing a precise null hypothesis of the correlation coefficient, whereas the construction of flexible procedures for testing a set of (multiple) precise and/or interval hypotheses has received less attention. This paper fills the gap by proposing an objective Bayesian testing procedure using divergence-based priors. The proposed Bayes factors can be used for testing any combination of precise and interval hypotheses and also allow a researcher to quantify evidence in the data in favor of the null or any other hypothesis under consideration. An extensive simulation study is conducted to compare the performance of the proposed Bayesian methods with that of some existing ones in the literature. Finally, a real-data example is provided for illustrative purposes. Journal: The American Statistician Pages: 41-51 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1677266 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1677266 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:41-51 Template-Type: ReDIF-Article 1.0 Author-Name: D. Andrew Brown Author-X-Name-First: D. Andrew Author-X-Name-Last: Brown Author-Name: Christopher S. McMahan Author-X-Name-First: Christopher S. 
Author-X-Name-Last: McMahan Author-Name: Stella Watson Self Author-X-Name-First: Stella Author-X-Name-Last: Watson Self Title: Sampling Strategies for Fast Updating of Gaussian Markov Random Fields Abstract: Gaussian Markov random fields (GMRFs) are popular for modeling dependence in large areal datasets due to their ease of interpretation and computational convenience afforded by the sparse precision matrices needed for random variable generation. Typically in Bayesian computation, GMRFs are updated jointly in a block Gibbs sampler or componentwise in a single-site sampler via the full conditional distributions. The former approach can speed convergence by updating correlated variables all at once, while the latter avoids solving large matrices. We consider a sampling approach in which the underlying graph can be cut so that conditionally independent sites are updated simultaneously. This algorithm allows a practitioner to parallelize updates of subsets of locations or to take advantage of “vectorized” calculations in a high-level language such as R. Through both simulated and real data, we demonstrate computational savings that can be achieved versus both single-site and block updating, regardless of whether the data are on a regular or an irregular lattice. The approach provides a good compromise between statistical and computational efficiency and is accessible to statisticians without expertise in numerical analysis or advanced computing. Journal: The American Statistician Pages: 52-65 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1595144 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1595144 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:52-65 Template-Type: ReDIF-Article 1.0 Author-Name: Alex Karanevich Author-X-Name-First: Alex Author-X-Name-Last: Karanevich Author-Name: Richard Meier Author-X-Name-First: Richard Author-X-Name-Last: Meier Author-Name: Stefan Graw Author-X-Name-First: Stefan Author-X-Name-Last: Graw Author-Name: Anna McGlothlin Author-X-Name-First: Anna Author-X-Name-Last: McGlothlin Author-Name: Byron Gajewski Author-X-Name-First: Byron Author-X-Name-Last: Gajewski Title: Optimizing Sample Size Allocation and Power in a Bayesian Two-Stage Drop-the-Losers Design Abstract: When a researcher desires to test several treatment arms against a control arm, a two-stage adaptive design can be more efficient than a single-stage design where patients are equally allocated to all treatment arms and the control. We see this type of approach in clinical trials as a seamless Phase II–Phase III design. These designs require more statistical support and are less straightforward to plan and analyze than a standard single-stage design. To diminish the barriers associated with a Bayesian two-stage drop-the-losers design, we built a user-friendly point-and-click graphical user interface with R Shiny to aid researchers in planning such designs by allowing them to easily obtain trial operating characteristics, estimate statistical power and sample size, and optimize patient allocation in each stage to maximize power. We assume that endpoints are distributed normally with unknown but common variance between treatments. We recommend this software as an easy way to engage statisticians and researchers in two-stage designs as well as to actively investigate the power of two-stage designs relative to more traditional approaches. 
The software is freely available at https://github.com/stefangraw/Allocation-Power-Optimizer. Journal: The American Statistician Pages: 66-75 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1610065 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1610065 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:66-75 Template-Type: ReDIF-Article 1.0 Author-Name: Richard Berk Author-X-Name-First: Richard Author-X-Name-Last: Berk Author-Name: Andreas Buja Author-X-Name-First: Andreas Author-X-Name-Last: Buja Author-Name: Lawrence Brown Author-X-Name-First: Lawrence Author-X-Name-Last: Brown Author-Name: Edward George Author-X-Name-First: Edward Author-X-Name-Last: George Author-Name: Arun Kumar Kuchibhotla Author-X-Name-First: Arun Kumar Author-X-Name-Last: Kuchibhotla Author-Name: Weijie Su Author-X-Name-First: Weijie Author-X-Name-Last: Su Author-Name: Linda Zhao Author-X-Name-First: Linda Author-X-Name-Last: Zhao Title: Assumption Lean Regression Abstract: It is well known that with observational data, models used in conventional regression analyses are commonly misspecified. Yet in practice, one tends to proceed with interpretations and inferences that rely on correct specification. Even those who invoke Box’s maxim that all models are wrong proceed as if results were generally useful. Misspecification, however, has implications that affect practice. Regression models are approximations to a true response surface and should be treated as such. Accordingly, regression parameters should be interpreted as statistical functionals. Importantly, the regressor distribution affects targets of estimation, and regressor randomness affects the sampling variability of estimates. As a consequence, inference should be based on sandwich estimators or the pairs (x, y) bootstrap. Traditional prediction intervals lose their pointwise coverage guarantees, but empirically calibrated intervals can be justified for future populations. We illustrate the key concepts with an empirical application. Journal: The American Statistician Pages: 76-84 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1592781 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1592781 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:76-84 Template-Type: ReDIF-Article 1.0 Author-Name: Julian Fecker Author-X-Name-First: Julian Author-X-Name-Last: Fecker Author-Name: Martin Schumacher Author-X-Name-First: Martin Author-X-Name-Last: Schumacher Author-Name: Kristin Ohneberg Author-X-Name-First: Kristin Author-X-Name-Last: Ohneberg Author-Name: Martin Wolkewitz Author-X-Name-First: Martin Author-X-Name-Last: Wolkewitz Title: Correction of Survival Bias in a Study About Increased Mortality of Heads of Government Abstract: A recent study reported increased mortality of heads of government. To avoid the time-dependent bias (also known as immortal-time bias), survival from the last election was compared between election winners and runners-up. We claim that this data manipulation results in bias due to conditioning on future events; survival should instead be compared from the first election, and winning should be treated as a time-dependent covariate. We collected the missing lifetime periods and redesigned the study to display this bias using Lexis diagrams and multistate methodology. 
We found that the bias that we termed the healthy candidate bias was even more severe than the time-dependent bias. Journal: The American Statistician Pages: 85-91 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1638831 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1638831 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:85-91 Template-Type: ReDIF-Article 1.0 Author-Name: Madison Arnsbarger Author-X-Name-First: Madison Author-X-Name-Last: Arnsbarger Author-Name: Joshua Goldstein Author-X-Name-First: Joshua Author-X-Name-Last: Goldstein Author-Name: Claire Kelling Author-X-Name-First: Claire Author-X-Name-Last: Kelling Author-Name: Gizem Korkmaz Author-X-Name-First: Gizem Author-X-Name-Last: Korkmaz Author-Name: Sallie Keller Author-X-Name-First: Sallie Author-X-Name-Last: Keller Title: Modeling Response Time to Structure Fires Abstract: It is important to reduce fire department response times to incidents to improve communities’ general safety, to make the allocation of emergency resources more efficient, and to improve situational awareness. In this article, we identify which factors affect turnout times and travel times for the Arlington County Fire Department in Virginia by applying both linear and spatial models to the U.S. National Fire Incident Reporting System (NFIRS) data. The uniformity of NFIRS data makes this article’s methodological innovations applicable to other participating fire departments in the United States and advances the effort to incorporate scientific evidence into government-level policy-making. Journal: The American Statistician Pages: 92-100 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1695664 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1695664 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:92-100 Template-Type: ReDIF-Article 1.0 Author-Name: Duy Nguyen Author-X-Name-First: Duy Author-X-Name-Last: Nguyen Title: A Probabilistic Approach to The Moments of Binomial Random Variables and Application Abstract: In this paper, we provide a closed-form formula for the moments of binomial random variables using a probabilistic approach. As an interesting application, we give a closed-form formula for the sum 1^k + 2^k + 3^k + … + n^k. Journal: The American Statistician Pages: 101-103 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2019.1679257 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1679257 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:101-103 Template-Type: ReDIF-Article 1.0 Author-Name: David R. Bickel Author-X-Name-First: David R. Author-X-Name-Last: Bickel Title: Null Hypothesis Significance Testing Interpreted and Calibrated by Estimating Probabilities of Sign Errors: A Bayes-Frequentist Continuum Abstract: Hypothesis tests are conducted not only to determine whether a null hypothesis (H0) is true but also to determine the direction or sign of an effect. A simple estimate of the posterior probability of a sign error is PSE = (1 – PH0)p/2 + PH0, depending only on a two-sided p-value and PH0, an estimate of the posterior probability of H0. A convenient option for PH0 is the posterior probability derived from estimating the Bayes factor to be its e·p·ln(1/p) lower bound. 
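The PSE formula quoted above transcribes directly into R. The second function below is a hedged reading of the e·p·ln(1/p) option: it treats the Sellke–Bayarri–Berger-type bound as a Bayes factor for H0 and combines it with a prior probability pi0 through posterior odds; this combination is an assumption about the intended construction, not a transcription of the article.

  # Direct transcription of the PSE formula from the abstract above.
  PSE <- function(p, PH0) (1 - PH0) * p / 2 + PH0

  # Hedged reading of the e*p*ln(1/p) option: use the bound as a Bayes factor
  # for H0 and combine with an assumed prior probability pi0 of H0.
  PH0_from_bound <- function(p, pi0 = 0.5) {
    B <- exp(1) * p * log(1 / p)    # Bayes factor lower bound, valid for p < 1/e
    pi0 * B / (pi0 * B + 1 - pi0)   # posterior probability of H0
  }

  PSE(0.01, PH0_from_bound(0.01))   # example: two-sided p = 0.01, pi0 = 0.5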
In that case, PSE depends only on p and an estimate of the prior probability of H0. PSE provides a continuum between significance testing and traditional Bayesian testing. The former effectively assumes the prior probability of H0 is 0, as some statisticians argue. In that case, PSE is equal to a one-sided p-value. (In that sense, PSE is a calibrated p-value.) In traditional Bayesian testing, on the other hand, the prior probability of H0 is at least 50%, which usually brings PSE close to PH0. Journal: The American Statistician Pages: 104-112 Issue: 1 Volume: 75 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1816214 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1816214 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2020:i:1:p:104-112 Template-Type: ReDIF-Article 1.0 Author-Name: James M. Flegal Author-X-Name-First: James M. Author-X-Name-Last: Flegal Title: Data Visualization: Charts, Maps, and Interactive Graphics. Robert Grant. Journal: The American Statistician Pages: 113-113 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2020.1865062 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865062 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:113-113 Template-Type: ReDIF-Article 1.0 Author-Name: James M. Flegal Author-X-Name-First: James M. Author-X-Name-Last: Flegal Title: Fundamentals of Probability with Stochastic Processes, 4th ed. Saeed Ghahramani. Journal: The American Statistician Pages: 113-114 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2020.1865063 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865063 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:113-114 Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas W. Bussberg Author-X-Name-First: Nicholas W. Author-X-Name-Last: Bussberg Title: Spatio-Temporal Statistics With R. Journal: The American Statistician Pages: 114-114 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2020.1865066 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865066 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:114-114 Template-Type: ReDIF-Article 1.0 Author-Name: Roger L. Berger Author-X-Name-First: Roger L. Author-X-Name-Last: Berger Title: McCann and Habiger (2020), “The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance,” Journal: The American Statistician Pages: 115-115 Issue: 1 Volume: 75 Year: 2020 Month: 12 X-DOI: 10.1080/00031305.2020.1850523 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1850523 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2020:i:1:p:115-115 Template-Type: ReDIF-Article 1.0 Author-Name: Melinda H. McCann Author-X-Name-First: Melinda H. Author-X-Name-Last: McCann Author-Name: Joshua D. Habiger Author-X-Name-First: Joshua D. 
Author-X-Name-Last: Habiger Title: Response to the Letter to the Editor on “The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance,” The American Statistician, 74:3, 213–217: Comment by Roger Berger Journal: The American Statistician Pages: 116-116 Issue: 1 Volume: 75 Year: 2021 Month: 1 X-DOI: 10.1080/00031305.2020.1851766 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1851766 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:116-116 Template-Type: ReDIF-Article 1.0 Author-Name: Dale L. Zimmerman Author-X-Name-First: Dale L. Author-X-Name-Last: Zimmerman Author-Name: Nathan D. Zimmerman Author-X-Name-First: Nathan D. Author-X-Name-Last: Zimmerman Author-Name: Joshua T. Zimmerman Author-X-Name-First: Joshua T. Author-X-Name-Last: Zimmerman Title: March Madness “Anomalies”: Are They Real, and If So, Can They Be Explained? Abstract: Previously published statistical analyses of NCAA Division I Men’s Basketball Tournament (“March Madness”) game outcomes since the 64-team format for its main draw began in 1985 have uncovered some apparent anomalies, such as 12-seeds upsetting 5-seeds more often than might be expected, and seeds 10 through 12 advancing to the Sweet Sixteen much more often than 8-seeds and 9-seeds—the so-called middle-seed anomaly. In this article, we address the questions of whether these perceived anomalies truly are anomalous and if so, what is responsible for them. We find that, in contrast to conclusions drawn from previous analyses, the statistical evidence for a 12-5 upset anomaly actually is very weak, while that for the middle-seed anomaly is quite strong. We dispel some (but not all) theories for the former and offer an explanation for the latter that is based primarily on the combined effects of a nonlinear relationship between team strength and seed, the lack of reseeding between rounds, and a strong quasi-home advantage accorded to 1-seeds. We also investigate the effects that hypothetical modifications to the tournament would have on the anomalies and explore whether similar anomalies exist in the NCAA Women’s Basketball Tournament. Journal: The American Statistician Pages: 207-216 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2020.1720814 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1720814 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:207-216 Template-Type: ReDIF-Article 1.0 Author-Name: Mevin B. Hooten Author-X-Name-First: Mevin B. Author-X-Name-Last: Hooten Author-Name: Devin S. Johnson Author-X-Name-First: Devin S. Author-X-Name-Last: Johnson Author-Name: Brian M. Brost Author-X-Name-First: Brian M. Author-X-Name-Last: Brost Title: Making Recursive Bayesian Inference Accessible Abstract: Bayesian models provide recursive inference naturally because they can formally reconcile new data and existing scientific information. However, popular use of Bayesian methods often avoids priors that are based on exact posterior distributions resulting from former studies. Two existing recursive Bayesian methods are Prior-Recursive Bayes and Proposal-Recursive Bayes. Prior-Recursive Bayes uses Bayesian updating, fitting models to partitions of the data sequentially; it provides a way to accommodate new data as they become available, using the posterior from the previous stage as the prior for the new stage. 
Proposal-Recursive Bayes is intended for use with hierarchical Bayesian models and uses a set of transient priors in first-stage independent analyses of the data partitions. The second stage of Proposal-Recursive Bayes uses the posteriors from the first stage as proposals in a Markov chain Monte Carlo algorithm to fit the full model. We combine Prior- and Proposal-Recursive concepts to fit any Bayesian model, often with computational improvements. We demonstrate our method with two case studies. Our approach has implications for big data, streaming data, and optimal adaptive design situations. Journal: The American Statistician Pages: 185-194 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2019.1665584 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1665584 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:185-194 Template-Type: ReDIF-Article 1.0 Author-Name: Aaron Fisher Author-X-Name-First: Aaron Author-X-Name-Last: Fisher Author-Name: Edward H. Kennedy Author-X-Name-First: Edward H. Author-X-Name-Last: Kennedy Title: Visually Communicating and Teaching Intuition for Influence Functions Abstract: Estimators based on influence functions (IFs) have been shown to be effective in many settings, especially when combined with machine learning techniques. By focusing on estimating a specific target of interest (e.g., the average effect of a treatment), rather than on estimating the full underlying data generating distribution, IF-based estimators are often able to achieve asymptotically optimal mean-squared error. Still, many researchers find IF-based estimators to be opaque or overly technical, which makes their use less prevalent and their benefits less available. To help foster understanding and trust in IF-based estimators, we present tangible, visual illustrations of when and how IF-based estimators can outperform standard “plug-in” estimators. The figures we show are based on connections between IFs, gradients, linear approximations, and Newton–Raphson. Journal: The American Statistician Pages: 162-172 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2020.1717620 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1717620 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:162-172 Template-Type: ReDIF-Article 1.0 Author-Name: Brian D. Segal Author-X-Name-First: Brian D. Author-X-Name-Last: Segal Title: Toward Replicability With Confidence Intervals for the Exceedance Probability Abstract: Several scientific fields, including psychology, are undergoing a replication crisis. There are many reasons for this problem, one of which is a misuse of p-values. There are several alternatives to p-values, and in this article we describe a complement that is geared toward replication. In particular, we focus on confidence intervals for the probability that a parameter estimate will exceed a specified value in an exact replication study. These intervals convey uncertainty in a way that p-values and standard confidence intervals do not, and can help researchers to draw sounder scientific conclusions. After briefly reviewing background on p-values and a few alternatives, we describe our approach and provide examples with simulated and real data. For linear models, we also describe how confidence intervals for the exceedance probability are related to p-values and confidence intervals for parameters. 
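In the spirit of the Segal abstract above, one simple construction (an assumption for illustration, not necessarily the article's method) uses monotonicity: if the exceedance probability is a monotone function of the parameter, the endpoints of a standard confidence interval for the parameter map directly into a confidence interval for the probability that a replication estimate exceeds a threshold c. A minimal R sketch under a normal model:

  # Hedged sketch: CI for an exceedance probability via a monotone transform.
  # Assumed model: theta_hat ~ N(theta, se^2) and replication estimate
  # theta_rep ~ N(theta, se^2), so P(theta_rep > c) = pnorm((theta - c) / se),
  # which is increasing in theta; CI endpoints for theta therefore map to
  # CI endpoints for the exceedance probability.
  exceedance_ci <- function(theta_hat, se, c, level = 0.95) {
    z <- qnorm(1 - (1 - level) / 2)
    theta_lo <- theta_hat - z * se
    theta_hi <- theta_hat + z * se
    c(lower = pnorm((theta_lo - c) / se),
      point = pnorm((theta_hat - c) / se),
      upper = pnorm((theta_hi - c) / se))
  }

  exceedance_ci(theta_hat = 0.4, se = 0.15, c = 0)  # will a replication stay positive?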
Journal: The American Statistician Pages: 128-138 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2019.1678521 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1678521 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:128-138 Template-Type: ReDIF-Article 1.0 Author-Name: Adam Kapelner Author-X-Name-First: Adam Author-X-Name-Last: Kapelner Author-Name: Abba M. Krieger Author-X-Name-First: Abba M. Author-X-Name-Last: Krieger Author-Name: Michael Sklar Author-X-Name-First: Michael Author-X-Name-Last: Sklar Author-Name: Uri Shalit Author-X-Name-First: Uri Author-X-Name-Last: Shalit Author-Name: David Azriel Author-X-Name-First: David Author-X-Name-Last: Azriel Title: Harmonizing Optimized Designs With Classic Randomization in Experiments Abstract: There is a long debate in experimental design between advocates of the classic randomization designs of Fisher, Yates, Kempthorne, and Cochran and those who advocate deterministic assignments based on notions of optimality. In nonsequential trials comparing treatment and control, covariate measurements for each subject are known in advance, and subjects can be divided into two groups based on a criterion of imbalance. With the advent of modern computing, this partition can be made nearly perfectly balanced via numerical optimization, but these allocations are far from random. These perfect allocations may endanger estimation relative to classic randomization because unseen subject-specific characteristics can be highly imbalanced. To demonstrate this, we consider different performance criteria such as Efron’s worst-case analysis and our original tail criterion of mean squared error. Under our tail criterion for the differences-in-mean estimator, we prove asymptotically that the optimal design must be more random than perfect balance but less random than completely random. Our result vindicates restricted designs that are used regularly, such as blocking and rerandomization. For a covariate-adjusted estimator, balancing offers fewer rewards, and good performance seems achievable with complete randomization. Further work will provide a procedure to find the explicit optimal design in different scenarios in practice. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 195-206 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1717619 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1717619 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:195-206 Template-Type: ReDIF-Article 1.0 Author-Name: Jin Zhang Author-X-Name-First: Jin Author-X-Name-Last: Zhang Title: The Mean Relative Entropy: An Invariant Measure of Estimation Error Abstract: A fundamental issue in statistics is parameter estimation, where the first step is to select estimators under some measure of estimation error. The commonly used measure is the mean squared error, which is simple, intuitive, and highly interpretable, but it has some drawbacks, often creating confusion in the evaluation of estimators. To solve these problems, we propose two invariance properties and the sufficiency principle as the prerequisite for any reasonable measure. Then, the mean relative entropy is established as an invariant measure of estimation error.
Journal: The American Statistician Pages: 117-123 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2018.1543139 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543139 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:117-123 Template-Type: ReDIF-Article 1.0 Author-Name: Philip T. Reiss Author-X-Name-First: Philip T. Author-X-Name-Last: Reiss Title: A Problem of Distributive Justice, Solved by the Lasso Abstract: The problem of dividing an estate among creditors, when their claims total more than the value of the estate, was posed in the Talmud and has been analyzed in the game theory literature. Here, we reveal a close connection between schemes for estate division and linear regression solution paths obtained by least angle regression or by the lasso. We focus primarily on the division scheme known as constrained equal awards, but also consider a more complex approach described by Aumann and Maschler. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 139-144 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2019.1688682 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1688682 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:139-144 Template-Type: ReDIF-Article 1.0 Author-Name: Christine R. Wells Author-X-Name-First: Christine R. Author-X-Name-Last: Wells Title: SAS for Mixed Models: Introduction and Basic Applications Journal: The American Statistician Pages: 231-231 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2021.1907997 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1907997 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:231-231 Template-Type: ReDIF-Article 1.0 Author-Name: Roberta La Haye Author-X-Name-First: Roberta Author-X-Name-Last: La Haye Author-Name: Petr Zizler Author-X-Name-First: Petr Author-X-Name-Last: Zizler Title: The Lorenz Curve in the Classroom Abstract: The Lorenz curve and Gini index have great social relevance due to concerns regarding income inequality. However, their discussion is limited in the undergraduate statistics and mathematics curriculum. This article outlines how to increase the educational potential of Lorenz curves as an application in both the calculus and introductory probability classrooms. We show how calculus and probability techniques can be used to obtain not only the Gini index, but also a variety of other statistical measures from the Lorenz curve, provided the mean is known. The measures discussed include the median and various measures of dispersion. Journal: The American Statistician Pages: 217-225 Issue: 2 Volume: 75 Year: 2020 Month: 10 X-DOI: 10.1080/00031305.2020.1822916 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1822916 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2020:i:2:p:217-225 Template-Type: ReDIF-Article 1.0 Author-Name: Christina P. Knudson Author-X-Name-First: Christina P.
Author-X-Name-Last: Knudson Title: x + y: A Mathematician's Manifesto for Rethinking Gender Journal: The American Statistician Pages: 232-233 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2021.1907998 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1907998 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:232-233 Template-Type: ReDIF-Article 1.0 Author-Name: Florian Böing-Messing Author-X-Name-First: Florian Author-X-Name-Last: Böing-Messing Author-Name: Joris Mulder Author-X-Name-First: Joris Author-X-Name-Last: Mulder Title: Bayes Factors for Testing Order Constraints on Variances of Dependent Outcomes Abstract: In statistical practice, researchers commonly focus on patterns in the means of multiple dependent outcomes while treating variances as nuisance parameters. In practice, however, there are often substantive reasons to expect certain patterns in the variances of dependent outcomes as well. For example, in a repeated measures study, one may expect the variance of the outcome to increase over time if the differences between subjects become more pronounced over time because the subjects respond differently to a given treatment. Such expectations can be formulated as order constrained hypotheses on the variances of the dependent outcomes. Currently, however, no methods exist for testing such hypotheses in a direct manner. To fill this gap, we develop a Bayes factor for this challenging testing problem. Our Bayes factor is based on the multivariate normal distribution with an unstructured covariance matrix, which is often used to model dependent outcomes. Order constrained hypotheses can then be formulated on the variances along the diagonal of the covariance matrix. To compute Bayes factors between multiple order constrained hypotheses, a prior distribution needs to be specified under every hypothesis to be tested. Here, we use the encompassing prior approach in which priors under order constrained hypotheses are truncations of the prior under the unconstrained hypothesis. The resulting Bayes factor is fully automatic in the sense that no subjective priors need to be specified by the user. Journal: The American Statistician Pages: 152-161 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2020.1715257 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1715257 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:152-161 Template-Type: ReDIF-Article 1.0 Author-Name: J. G. Liao Author-X-Name-First: J. G. Author-X-Name-Last: Liao Author-Name: Arthur Berg Author-X-Name-First: Arthur Author-X-Name-Last: Berg Author-Name: Timothy L. McMurry Author-X-Name-First: Timothy L. Author-X-Name-Last: McMurry Title: A Robustified Posterior for Bayesian Inference on a Large Number of Parallel Effects Abstract: Many modern experiments, such as microarray gene expression and genome-wide association studies, present the problem of estimating a large number of parallel effects. Bayesian inference is a popular approach for analyzing such data by modeling the large number of unknown parameters as random effects from a common prior distribution. However, misspecification of the prior distribution can lead to erroneous estimates of the random effects, especially for the largest and most interesting effects. This article has two aims.
First, we propose a robustified posterior distribution for a parametric Bayesian hierarchical model that can substantially reduce the impact of a misspecified prior. Second, we conduct a systematic comparison of the standard parametric posterior, the proposed robustified parametric posterior, and a nonparametric Bayesian posterior that uses a Dirichlet process mixture prior. The proposed robustified posterior, when combined with a flexible parametric prior, can be a superior alternative to nonparametric Bayesian methods. Journal: The American Statistician Pages: 145-151 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2019.1701549 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1701549 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:145-151 Template-Type: ReDIF-Article 1.0 Author-Name: Douglas VanDerwerken Author-X-Name-First: Douglas Author-X-Name-Last: VanDerwerken Title: Slugging Percentage Is Not a Percentage—And Why That Matters Abstract: In this short note, the asymptotic distribution of slugging percentage (SLG) in baseball is derived under multinomial sampling. It is shown that treating SLG like a binomial random variable divided by the number of trials (as is occasionally done in the literature) gives only a lower bound on the variance, which may be a considerable underestimate in practice. Journal: The American Statistician Pages: 124-127 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2018.1564698 File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564698 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:124-127 Template-Type: ReDIF-Article 1.0 Author-Name: Edward L. Boone Author-X-Name-First: Edward L. Author-X-Name-Last: Boone Title: The Model Thinker: What You Need to Know to Make Data Work for You Journal: The American Statistician Pages: 230-231 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2021.1907993 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1907993 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:230-231 Template-Type: ReDIF-Article 1.0 Author-Name: Chunming Zhang Author-X-Name-First: Chunming Author-X-Name-Last: Zhang Title: Further Examples Related to Correlations Between Variables and Ranks Abstract: Rank statistics {R1,…,Rn} of actual variates {X1,…,Xn} play an important role in university undergraduate nonparametric statistics courses. This article derives explicit expressions for the correlation coefficients between Xi and Rj, for not only i = j but also i ≠ j, for iid continuous variables X1,…,Xn with distribution function FX(·) and n ≥ 2: (a) ρ(Xi, Ri) = √((n−1)/(n+1))·ρ(X, FX(X)) ∈ (0, √((n−1)/(n+1))] for any i, revealing that the correlation can be as close to one as expected, but may also, unexpectedly, approach zero for other distributions of X; (b) ρ(Xi, Rj) = −ρ(Xi, Ri)/(n−1) ∈ [−1/√(n²−1), 0) for any i ≠ j, indicating a negligible negative association with ranks computed from other data; (c) the partial correlation coefficient between Xi and Ri given Xj, for any i ≠ j, equals ρ(Xi, Ri)/√(1 − ρ(Xj, Ri)²) ∈ (ρ(Xi, Ri), (n−1)/√(n²−2)], invariably exceeding ρ(Xi, Ri). These results call for a more careful interpretation of the information that ranks carry about the underlying data.
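Result (a) is easy to check by simulation. For uniform X, FX(X) is linear in X, so ρ(X, FX(X)) = 1 and the correlation between an observation and its rank should equal √((n−1)/(n+1)); a minimal R check (illustrative values assumed):

    # Monte Carlo check of result (a) for uniform X with n = 5:
    # corr(X_i, R_i) should equal sqrt((n - 1)/(n + 1)) = 0.816.
    set.seed(1)
    n <- 5
    sims <- replicate(1e5, { x <- runif(n); c(x[1], rank(x)[1]) })
    c(empirical = cor(sims[1, ], sims[2, ]),
      theory    = sqrt((n - 1) / (n + 1)))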
Journal: The American Statistician Pages: 226-229 Issue: 2 Volume: 75 Year: 2020 Month: 11 X-DOI: 10.1080/00031305.2020.1831956 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1831956 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2020:i:2:p:226-229 Template-Type: ReDIF-Article 1.0 Author-Name: Alecos Papadopoulos Author-X-Name-First: Alecos Author-X-Name-Last: Papadopoulos Author-Name: Roland B. Stark Author-X-Name-First: Roland B. Author-X-Name-Last: Stark Title: Does Home Health Care Increase the Probability of 30-Day Hospital Readmissions? Interpreting Coefficient Sign Reversals, or Their Absence, in Binary Logistic Regression Analysis Abstract: Data on 30-day readmission rates in American hospitals often show that patients who receive Home Health Care (HHC) have a higher probability of being readmitted to the hospital than those who do not receive such services, but it is expected that when control variables are included in a regression we will obtain a “sign reversal” of the treatment effect. We map the real-world situation to the binary logistic regression model, and we construct a counterfactual probability metric that leads to necessary and sufficient conditions for the sign reversal to occur, conditions that show that logistic regression is an appropriate tool for this research purpose. This metric also permits us to obtain evidence related to the criteria used to assign HHC treatment. We examine seven data samples from different U.S. hospitals for the period 2011–2017. We find that in all cases the provision of HHC increased the probability of readmission of the treated patients. This casts doubt on the appropriateness of the 30-day readmission rate as an indicator of hospital performance and a criterion for hospital reimbursement, as it is currently used for Medicare patients. Journal: The American Statistician Pages: 173-184 Issue: 2 Volume: 75 Year: 2021 Month: 5 X-DOI: 10.1080/00031305.2019.1704873 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1704873 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:173-184 Template-Type: ReDIF-Article 1.0 Author-Name: Jonathan Rougier Author-X-Name-First: Jonathan Author-X-Name-Last: Rougier Author-Name: Carey E. Priebe Author-X-Name-First: Carey E. Author-X-Name-Last: Priebe Title: The Exact Form of the “Ockham Factor” in Model Selection Abstract: We explore the arguments for maximizing the “evidence” as an algorithm for model selection. We show, using a new definition of model complexity which we term “flexibility,” that maximizing the evidence should appeal to both Bayesian and frequentist statisticians. This is due to flexibility’s unique position in the exact decomposition of log-evidence into log-fit minus flexibility. In the Gaussian linear model, flexibility is asymptotically equal to the Bayesian information criterion (BIC) penalty, but we caution against using BIC in place of flexibility for model selection. Journal: The American Statistician Pages: 288-293 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1764865 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1764865 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:288-293 Template-Type: ReDIF-Article 1.0 Author-Name: David L. Banks Author-X-Name-First: David L.
Author-X-Name-Last: Banks Author-Name: Mevin B. Hooten Author-X-Name-First: Mevin B. Author-X-Name-Last: Hooten Title: Statistical Challenges in Agent-Based Modeling Abstract: Agent-based models (ABMs) are popular in many research communities, but few statisticians have contributed to their theoretical development. They are models like any other models we study, but in general, we are still learning how to fit ABMs to data and how to make quantified statements of uncertainty about the outputs of an ABM. ABM validation is also an underdeveloped area that is ripe for new statistical developments. In what follows, we lay out the research space and encourage statisticians to address the many research issues in the ABM ambit. Journal: The American Statistician Pages: 235-242 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2021.1900914 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1900914 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:235-242 Template-Type: ReDIF-Article 1.0 Author-Name: David R. Bickel Author-X-Name-First: David R. Author-X-Name-Last: Bickel Title: Null Hypothesis Significance Testing Defended and Calibrated by Bayesian Model Checking Abstract: Significance testing is often criticized because p-values can be low even though posterior probabilities of the null hypothesis are not low according to some Bayesian models. Those models, however, would assign low prior probabilities to the observation that the p-value is sufficiently low. That conflict between the models and the data may indicate that the models need revision. Indeed, if the p-value is sufficiently small while the posterior probability according to a model is insufficiently small, then the model will fail a model check. That result leads to a way to calibrate a p-value by transforming it into an upper bound on the posterior probability of the null hypothesis (conditional on rejection) for any model that would pass the check. The calibration may be calculated from a prior probability of the null hypothesis and the stringency of the check without more detailed modeling. An upper bound, as opposed to a lower bound, can justify concluding that the null hypothesis has a low posterior probability. Journal: The American Statistician Pages: 249-255 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2019.1699443 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1699443 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:249-255 Template-Type: ReDIF-Article 1.0 Author-Name: Travis Loux Author-X-Name-First: Travis Author-X-Name-Last: Loux Author-Name: Orlando Davy Author-X-Name-First: Orlando Author-X-Name-Last: Davy Title: Adjusting Published Estimates for Exploratory Biases Using the Truncated Normal Distribution Abstract: Publication bias can occur for many reasons, including the perceived need to present statistically significant results. We propose and compare methods for adjusting a single published estimate for possible publication bias using a truncated normal distribution. We attempt to estimate the mean of the underlying normal sampling distribution using only summary data readily available in most published work, making the results practical for use by a consumer of research.
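To illustrate the flavor of such an adjustment, here is a minimal sketch (not necessarily the authors' exact estimator) that maximizes a truncated-normal likelihood, assuming the estimate was published only because it cleared a two-sided 5% significance threshold:

    # Conditional MLE of mu when the published estimate is N(mu, se^2),
    # observed only if |estimate / se| > 1.96 (the significance filter).
    adjust_estimate <- function(est, se, crit = 1.96) {
      loglik <- function(mu) {
        keep <- pnorm(-crit - mu / se) + 1 - pnorm(crit - mu / se)  # P(published)
        dnorm(est, mu, se, log = TRUE) - log(keep)
      }
      optimize(loglik, interval = est + c(-10, 10) * se, maximum = TRUE)$maximum
    }
    adjust_estimate(est = 0.5, se = 0.2)   # typically pulled toward zero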
The adjustment methods are investigated via simulation and their results compared in terms of bias, mean squared error, and confidence interval coverage. The methods are also applied to eleven previously published studies. We find the proposed methods improve but do not eliminate biases from the statistical significance filter. Journal: The American Statistician Pages: 294-299 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1775700 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1775700 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:294-299 Template-Type: ReDIF-Article 1.0 Author-Name: Amanda S. Hering Author-X-Name-First: Amanda S. Author-X-Name-Last: Hering Author-Name: Luke Durell Author-X-Name-First: Luke Author-X-Name-Last: Durell Author-Name: Grant Morgan Author-X-Name-First: Grant Author-X-Name-Last: Morgan Title: Illustrating Randomness in Statistics Courses With Spatial Experiments Abstract: Understanding the concept of randomness is fundamental for students in introductory statistics courses, but the notion of randomness is deceptively complex, so it is often emphasized less than the mechanics of probability and inference. The most commonly used classroom tools to assess students’ production or perception of randomness are binary choices, such as coin tosses, and number sequences, such as dice rolls. The field of psychology has a long history of research on random choice, and we have replicated some experiments that support results seen there regarding the collective distribution of individual choices in spatial geometries. The data from these experiments can easily be incorporated into the undergraduate classroom to visually illustrate the concepts of random choice, complete spatial randomness (CSR), and Poisson processes. Furthermore, spatial statistics classes can use these point pattern data in exploring hypothesis tests for CSR along with simulation. To foster student engagement, it is simple to collect additional data from students to assess agreement with existing data or to develop related, unique experiments. All R code and data to duplicate results are provided. Journal: The American Statistician Pages: 343-353 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1871070 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1871070 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:343-353 Template-Type: ReDIF-Article 1.0 Author-Name: Keith Kranker Author-X-Name-First: Keith Author-X-Name-Last: Kranker Author-Name: Laura Blue Author-X-Name-First: Laura Author-X-Name-Last: Blue Author-Name: Lauren Vollmer Forrow Author-X-Name-First: Lauren Vollmer Author-X-Name-Last: Forrow Title: Improving Effect Estimates by Limiting the Variability in Inverse Propensity Score Weights Abstract: This study describes a novel method to reweight a comparison group used for causal inference, so the group is similar to a treatment group on observable characteristics yet avoids highly variable weights that would limit statistical power. The proposed method generalizes the covariate-balancing propensity score (CBPS) methodology developed by Imai and Ratkovic (2014) to enable researchers to effectively prespecify the variance (or higher-order moments) of the matching weight distribution.
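The balance-versus-power tradeoff at issue here can be seen even with a much cruder device than the authors' penalized CBPS, namely capping extreme inverse propensity score weights (a toy R sketch with assumed simulated data):

    # Capping extreme IPW weights lowers their variance (raising the effective
    # sample size) at some cost in covariate balance. Illustration only; this
    # is weight truncation, not the penalized CBPS estimator.
    set.seed(1)
    x  <- rnorm(2000)
    tr <- rbinom(2000, 1, plogis(1.5 * x))
    ps <- fitted(glm(tr ~ x, family = binomial))
    w  <- ifelse(tr == 1, 1 / ps, 1 / (1 - ps))           # ATE weights
    for (cap in c(Inf, 20, 5)) {
      wc  <- pmin(w, cap)
      ess <- sum(wc)^2 / sum(wc^2)                        # effective sample size
      bal <- weighted.mean(x[tr == 1], wc[tr == 1]) -
             weighted.mean(x[tr == 0], wc[tr == 0])       # residual imbalance
      cat(sprintf("cap = %3s  ESS = %6.0f  imbalance = %+.3f\n", cap, ess, bal))
    }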
This lets researchers choose among alternative sets of matching weights, some of which produce better balance and others of which yield higher statistical power. We demonstrate using simulations that our penalized CBPS approach can improve effect estimates over those from other established propensity score estimation approaches, producing lower mean squared error. We discuss applications where the method or extensions of it are especially likely to improve effect estimates and we provide an empirical example from the evaluation of Comprehensive Primary Care Plus, a U.S. health care model that aims to strengthen primary care across roughly 3000 practices. Programming code is available to implement the method in Stata. Journal: The American Statistician Pages: 276-287 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1737229 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1737229 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:276-287 Template-Type: ReDIF-Article 1.0 Author-Name: Haolun Shi Author-X-Name-First: Haolun Author-X-Name-Last: Shi Author-Name: Guosheng Yin Author-X-Name-First: Guosheng Author-X-Name-Last: Yin Title: Reconnecting p-Value and Posterior Probability Under One- and Two-Sided Tests Abstract: By convention, the p-value is often computed in frequentist hypothesis testing and compared with the nominal significance level of 0.05 to determine whether or not to reject the null hypothesis. The smaller the p-value, the more significant the statistical test. Under noninformative prior distributions, we establish the equivalence relationship between the p-value and the Bayesian posterior probability of the null hypothesis for one-sided tests and, more importantly, the equivalence between the p-value and a transformation of posterior probabilities of the hypotheses for two-sided tests. For two-sided hypothesis tests with a point null, we recast the problem as a combination of two one-sided hypotheses in opposite directions and establish the notion of a “two-sided posterior probability,” which reconnects with the (two-sided) p-value. Contrary to common belief, this equivalence gives the p-value an explicit interpretation as a measure of how strongly the data support the null. Extensive simulation studies are conducted to demonstrate the equivalence relationship between the p-value and the Bayesian posterior probability. Contrary to broad criticism of the use of p-values in evidence-based studies, we justify their utility and reclaim their importance from the Bayesian perspective. Journal: The American Statistician Pages: 265-275 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1717621 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1717621 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:265-275 Template-Type: ReDIF-Article 1.0 Author-Name: William B. Fairley Author-X-Name-First: William B. Author-X-Name-Last: Fairley Author-Name: William A. Huber Author-X-Name-First: William A. Author-X-Name-Last: Huber Title: On Being an Ethical Statistical Expert in a Legal Case Abstract: In the Anglo-American legal system, courts rely heavily on experts who perform an essential social function in supplying information to resolve disputes. Experts are the vehicles through which facts of any technical complexity are brought out.
The adversarial nature of this legal system places expert witnesses in a quandary. Enjoined to serve the court and their profession with unbiased, independent opinion, expert witnesses nevertheless do not work directly for the court: they are employed by advocates (lawyers) who aim to win a high-stakes debate for their clients. The system is imperfect. Pressures (whether real or perceived) on experts to please their clients may cause truth to be the victim. We use examples from our experience, and reports of statisticians commenting on theirs, to show how statistical evidence can be honestly and effectively used in courts. We maintain it is vital for would-be experts to study the rules of the legal process and their role within it. (The present article is a step toward that end.) We explain what the legal process looks for in an expert and present some ways in which an expert can maintain their independence and avoid being co-opted by the lawyer who sponsors them. Statisticians contribute in sometimes unique ways to the resolution of disputes, including in forums like negotiations, mediation, arbitration, and regulatory hearings, where the misuse and abuse of statistical procedures occur too often. It is a challenge for statisticians to improve that situation, but they can find professional opportunities and satisfaction in doing so. Because this discussion pertains generally to the application and communication of statistical thinking, statisticians in any sphere of application should find it useful. Journal: The American Statistician Pages: 323-333 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1763834 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1763834 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:323-333 Template-Type: ReDIF-Article 1.0 Author-Name: James J. Higgins Author-X-Name-First: James J. Author-X-Name-Last: Higgins Author-Name: Michael J. Higgins Author-X-Name-First: Michael J. Author-X-Name-Last: Higgins Author-Name: Jinguang Lin Author-X-Name-First: Jinguang Author-X-Name-Last: Lin Title: From One Environment to Many: The Problem of Replicability of Statistical Inferences Abstract: Among plausible causes for replicability failure, one that has not received sufficient attention is the environment in which the research is conducted. Consisting of the population, equipment, personnel, and various conditions such as location, time, and weather, the research environment can affect treatments and outcomes, and changes in the research environment that occur when an experiment is redone can affect replicability. We examine the extent to which such changes contribute to replicability failure. Our framework is that of an initial experiment that generates the data and a follow-up experiment that is done the same way except for a change in the research environment. We assume that the initial experiment satisfies the assumptions of the two-sample t-statistic and that the follow-up experiment is described by a mixed model which includes environmental parameters. We derive expressions for the effect that the research environment has on power, sample size selection, p-values, and confidence levels. We measure the size of the environmental effect with the environmental effect ratio (EER), which is the ratio of the standard deviations of the environment-by-treatment interaction and the error.
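A toy simulation in the spirit of this framework (illustrative effect size, sample size, and normal interaction distribution are assumed here, not taken from the article) shows how an environment-by-treatment interaction with standard deviation EER times the error SD erodes the chance that a follow-up experiment replicates a positive significant result:

    # Follow-up experiment in a new environment: the treatment effect is
    # shifted by a random environment-by-treatment interaction with sd = EER.
    set.seed(1)
    rep_rate <- function(EER, delta = 0.8, n = 25, nsim = 1e4) {
      mean(replicate(nsim, {
        shift <- rnorm(1, 0, EER)          # environment-by-treatment effect
        y1 <- rnorm(n, 0, 1); y2 <- rnorm(n, delta + shift, 1)
        t.test(y2, y1)$p.value < 0.05 & mean(y2) > mean(y1)
      }))
    }
    sapply(c(0, 0.5, 1), rep_rate)  # replication probability falls as EER grows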
By varying EER, it is possible to determine conditions that favor replicability and those that do not. Journal: The American Statistician Pages: 334-342 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1829047 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1829047 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:334-342 Template-Type: ReDIF-Article 1.0 Author-Name: Wei Jiang Author-X-Name-First: Wei Author-X-Name-Last: Jiang Author-Name: Shuang Song Author-X-Name-First: Shuang Author-X-Name-Last: Song Author-Name: Lin Hou Author-X-Name-First: Lin Author-X-Name-Last: Hou Author-Name: Hongyu Zhao Author-X-Name-First: Hongyu Author-X-Name-Last: Zhao Title: A Set of Efficient Methods to Generate High-Dimensional Binary Data With Specified Correlation Structures Abstract: High-dimensional correlated binary data arise in many areas, such as observed genetic variations in biomedical research. Data simulation can help researchers evaluate efficiency and explore properties of different computational and statistical methods. Also, some statistical methods, such as Monte Carlo methods, rely on data simulation. Lunn and Davies proposed linear time complexity methods to generate correlated binary variables with three common correlation structures. However, it is infeasible to specify unequal probabilities in their methods. In this article, we introduce several computationally efficient algorithms that generate high-dimensional binary data with specified correlation structures and unequal probabilities. Our algorithms have linear time complexity with respect to the dimension for three commonly studied correlation structures, namely exchangeable, decaying-product and K-dependent correlation structures. In addition, we extend our algorithms to generate binary data with specified nonnegative correlation matrices satisfying the validity condition, with quadratic time complexity. We provide an R package, CorBin, to implement our simulation methods. Compared to the existing packages for binary data generation, the time cost to generate a 100-dimensional binary vector with the common correlation structures and general correlation matrices can be reduced by up to 10^5-fold and 10^3-fold, respectively, and the efficiency improves further as the dimension increases. The R package CorBin is available on CRAN at https://cran.r-project.org/. Journal: The American Statistician Pages: 310-322 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1816213 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1816213 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:310-322 Template-Type: ReDIF-Article 1.0 Author-Name: David J. Aldous Author-X-Name-First: David J. Author-X-Name-Last: Aldous Title: A Prediction Tournament Paradox Abstract: In a prediction tournament, contestants “forecast” by asserting a numerical probability for each of (say) 100 future real-world events. The scoring system is designed so that (regardless of the unknown true probabilities) more accurate forecasters will likely score better. This is true for one-on-one comparisons between contestants. But consider a realistic-size tournament with many contestants, with a range of accuracies. It may seem self-evident that the winner will likely be one of the most accurate forecasters.
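A reader can probe that intuition with a small simulation before the result is stated (a sketch with assumed parameters and Brier scoring, not the paper's exact model): forecasters of graded accuracy state noisy probabilities for 100 events, and we record the accuracy rank of each tournament winner.

    # 300 forecasters, 100 events; forecaster i adds N(0, noise[i]) error to the
    # true probabilities, so index = accuracy rank. Lowest Brier score wins.
    set.seed(1)
    n_f <- 300; n_e <- 100; nsim <- 1000
    noise <- seq(0.02, 0.20, length.out = n_f)
    rank_of_winner <- replicate(nsim, {
      p <- runif(n_e, 0.1, 0.9)                      # true event probabilities
      y <- rbinom(n_e, 1, p)                         # realized outcomes
      stated <- matrix(rep(p, each = n_f) + rnorm(n_f * n_e, 0, noise),
                       nrow = n_f)                   # noise sd recycles by row
      stated <- pmin(pmax(stated, 0.01), 0.99)
      brier <- rowMeans((stated - matrix(y, n_f, n_e, byrow = TRUE))^2)
      which.min(brier)                               # accuracy rank of winner
    })
    mean(rank_of_winner <= 10)   # how often a top-10-accuracy forecaster wins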
But, in the setting where the range extends to very accurate forecasters, simulations show this is mathematically false, within a somewhat plausible model. Even outside that setting the winner is less likely than intuition suggests to be one of the handful of best forecasters. Though implicit in recent technical papers, this paradox has apparently not been explicitly pointed out before, though it is easily explained. It perhaps has implications for the ongoing IARPA-sponsored research programs involving forecasting. Journal: The American Statistician Pages: 243-248 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2019.1604430 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604430 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:243-248 Template-Type: ReDIF-Article 1.0 Author-Name: Yanlong Sun Author-X-Name-First: Yanlong Author-X-Name-Last: Sun Author-Name: Hongbin Wang Author-X-Name-First: Hongbin Author-X-Name-Last: Wang Title: Learning Temporal Structures of Random Patterns by Generating Functions Abstract: We present a method of generating functions to compute the distributions of the first-arrival and inter-arrival times of random patterns in independent Bernoulli trials and first-order Markov trials. We use segmentation of pattern events and diagrams of Markov chains to illustrate the recursive structures represented by generating functions. We then relate the results of pattern time to the probability of first occurrence and the probability of occurrence at least once within a finite sample size. Through symbolic manipulation of formal power series and multiple levels of compression, generating functions provide a powerful way to discover the rich statistical structures embedded in random sequences. Journal: The American Statistician Pages: 300-309 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2020.1778527 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1778527 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:300-309 Template-Type: ReDIF-Article 1.0 Author-Name: Yen-Chi Chen Author-X-Name-First: Yen-Chi Author-X-Name-Last: Chen Title: Review of Books and Teaching Materials Journal: The American Statistician Pages: 354-354 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2021.1949931 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1949931 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:354-354 Template-Type: ReDIF-Article 1.0 Author-Name: J. G. Liao Author-X-Name-First: J. G. Author-X-Name-Last: Liao Author-Name: Vishal Midya Author-X-Name-First: Vishal Author-X-Name-Last: Midya Author-Name: Arthur Berg Author-X-Name-First: Arthur Author-X-Name-Last: Berg Title: Connecting and Contrasting the Bayes Factor and a Modified ROPE Procedure for Testing Interval Null Hypotheses Abstract: There has been strong recent interest in testing interval null hypotheses for improved scientific inference. For example, Lakens et al. and Lakens and Harms use this approach to study whether there is a prespecified meaningful treatment effect in gerontology and clinical trials, instead of testing a point null hypothesis of no effect. Two popular Bayesian approaches are available for interval null hypothesis testing.
One is the standard Bayes factor and the other is the region of practical equivalence (ROPE) procedure championed by Kruschke and others over many years. This article connects key quantities in the two approaches, which in turn allows us to contrast two major differences between the approaches, both with substantial practical implications. The first is that the Bayes factor depends heavily on the prior specification while a modified ROPE procedure is very robust. The second difference concerns the statistical properties of each approach when data are generated under a neutral parameter value on the common boundary of the competing hypotheses. In this case, the Bayes factors can be severely biased whereas the modified ROPE approach gives a reasonable result. Finally, the connection leads to a simple and effective algorithm for computing Bayes factors using draws from posterior distributions generated by standard Bayesian programs such as BUGS, JAGS, and Stan. Journal: The American Statistician Pages: 256-264 Issue: 3 Volume: 75 Year: 2021 Month: 7 X-DOI: 10.1080/00031305.2019.1701550 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1701550 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:256-264 Template-Type: ReDIF-Article 1.0 Author-Name: Youjin Lee Author-X-Name-First: Youjin Author-X-Name-Last: Lee Title: Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R Journal: The American Statistician Pages: 450-451 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2021.1985862 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1985862 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:450-451 Template-Type: ReDIF-Article 1.0 Author-Name: Ben O’Neill Author-X-Name-First: Ben Author-X-Name-Last: O’Neill Title: The Classical Occupancy Distribution: Computation and Approximation Abstract: We examine the discrete distributional form that arises from the “classical occupancy problem,” which looks at the behavior of the number of occupied bins when we allocate a given number of balls uniformly at random to a given number of bins. We review the mass function and moments of the classical occupancy distribution and derive exact and asymptotic results for the mean, variance, skewness and kurtosis. We develop an algorithm to compute a cubic array of log-probabilities from the classical occupancy distribution. This algorithm allows the computation of large blocks of values while avoiding underflow problems in computation. Using this algorithm, we compute the classical occupancy distribution for a large block of values of balls and bins, and we measure the accuracy of its asymptotic approximation using the normal distribution. We analyze the accuracy of the normal approximation with respect to the variance, skewness and kurtosis of the distribution. Based on this analysis, we give some practical guidance on the feasibility of computing large blocks of values from the occupancy distribution, and when approximation is required. Journal: The American Statistician Pages: 364-375 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2019.1699445 File-URL: http://hdl.handle.net/10.1080/00031305.2019.1699445 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
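The underlying recursion is easy to state: adding one ball either lands in one of the k occupied bins (probability k/m) or opens a new bin. A minimal log-space sketch of this computation (a single pmf rather than the paper's cubic array of values):

    # Occupancy pmf P(K = k occupied bins | n balls, m bins), computed in log
    # space to avoid underflow; lp[k + 1] holds log P(K = k).
    log_occupancy <- function(n, m) {
      lp <- c(0, rep(-Inf, m))                 # zero balls: K = 0 surely
      lse <- function(a, b) { mx <- pmax(a, b)
        ifelse(is.finite(mx), mx + log(exp(a - mx) + exp(b - mx)), -Inf) }
      for (ball in 1:n) {
        k <- 1:m
        lp <- c(-Inf, lse(lp[k + 1] + log(k / m),          # stays at k bins
                          lp[k] + log((m - k + 1) / m)))   # opens a new bin
      }
      lp
    }
    lp <- log_occupancy(n = 1000, m = 500)
    sum(exp(lp) * (0:500))          # mean; compare with m * (1 - (1 - 1/m)^n)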
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:364-375 Template-Type: ReDIF-Article 1.0 Author-Name: Peter E. Freeman Author-X-Name-First: Peter E. Author-X-Name-Last: Freeman Title: Facilitating Authentic Practice for Early Undergraduate Statistics Students Abstract: In current curricula, authentic statistical practice generally only occurs in capstone projects undertaken by advanced undergraduate and Master’s students. We argue that deferring practice is a mistake: undergraduate students should gain experience via repeated practice from their first years onward, achieving heightened levels of confidence and competence prior to graduation. However, statistical practice is not a “one size fits all” enterprise: for instance, elements of a capstone experience, such as extensive data preprocessing, may be out of place in earlier practice settings due to less-experienced students’ relative lack of coding skill. We describe a course we have implemented at Carnegie Mellon University, currently open to second-year students, that provides a circumscribed opportunity for statistical practice that limits coding breadth, uses fully curated data, treats statistical learning models as “gray boxes” to be understood qualitatively, and provides open-ended semester-long projects that students pursue outside of class. We show how pre- and post-course assessment tests and retrospective surveys indicate clear gains in the students’ knowledge of, and attitudes toward, statistical practice. Given its clear benefits, we feel that statistics and data science programs should offer a course like the one we describe to all undergraduate students pursuing statistics and data science degrees. Journal: The American Statistician Pages: 433-444 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1844293 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1844293 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:433-444 Template-Type: ReDIF-Article 1.0 Author-Name: Paul Vos Author-X-Name-First: Paul Author-X-Name-Last: Vos Author-Name: Qiang Wu Author-X-Name-First: Qiang Author-X-Name-Last: Wu Title: Letter to the Editor: Zhang, J. (2021), “The Mean Relative Entropy: An Invariant Measure of Estimation Error,” The American Statistician, 75, 117–123: comment by Vos and Wu Journal: The American Statistician Pages: 455-457 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2021.1978544 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1978544 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:455-457 Template-Type: ReDIF-Article 1.0 Author-Name: Kevin Kunzmann Author-X-Name-First: Kevin Author-X-Name-Last: Kunzmann Author-Name: Michael J. Grayling Author-X-Name-First: Michael J. Author-X-Name-Last: Grayling Author-Name: Kim May Lee Author-X-Name-First: Kim May Author-X-Name-Last: Lee Author-Name: David S. Robertson Author-X-Name-First: David S. Author-X-Name-Last: Robertson Author-Name: Kaspar Rufibach Author-X-Name-First: Kaspar Author-X-Name-Last: Rufibach Author-Name: James M. S. Wason Author-X-Name-First: James M. S. Author-X-Name-Last: Wason Title: A Review of Bayesian Perspectives on Sample Size Derivation for Confirmatory Trials Abstract: Sample size derivation is a crucial element of planning any confirmatory trial.
The required sample size is typically derived based on constraints on the maximal acceptable Type I error rate and minimal desired power. Power depends on the unknown true effect and tends to be calculated either for the smallest relevant effect or a likely point alternative. The former might be problematic if the minimal relevant effect is close to the null, thus requiring an excessively large sample size, while the latter is dubious since it does not account for the a priori uncertainty about the likely alternative effect. A Bayesian perspective on sample size derivation for a frequentist trial can reconcile arguments about the relative a priori plausibility of alternative effects with ideas based on the relevance of effect sizes. Many suggestions as to how such “hybrid” approaches could be implemented in practice have been put forward. However, key quantities are often defined in subtly different ways in the literature. Starting from the traditional entirely frequentist approach to sample size derivation, we derive consistent definitions for the most commonly used hybrid quantities and highlight connections, before discussing and demonstrating their use in sample size derivation for clinical trials. Journal: The American Statistician Pages: 424-432 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2021.1901782 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1901782 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:424-432 Template-Type: ReDIF-Article 1.0 Author-Name: Dennis D. Boos Author-X-Name-First: Dennis D. Author-X-Name-Last: Boos Author-Name: Siyu Duan Author-X-Name-First: Siyu Author-X-Name-Last: Duan Title: Pairwise Comparisons Using Ranks in the One-Way Model Abstract: The Wilcoxon rank sum test for two independent samples and the Kruskal–Wallis rank test for the one-way model with k independent samples are very competitive robust alternatives to the two-sample t-test and k-sample F-test when the underlying data have tails longer than the normal distribution. However, these positives for rank methods do not extend as readily to methods for making all pairwise comparisons used to reveal where the differences in location may exist. Here, we show that the closed method of Marcus et al. applied to ranks is quite powerful for both small and large samples and better than any methods suggested in the list of applied nonparametric texts found in the recent study by Richardson. In addition, we show that the closed method applied to means is even more powerful than the classical Tukey–Kramer method applied to means, which itself is very competitive for nonnormal data with moderately long tails and small samples. Journal: The American Statistician Pages: 414-423 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1860819 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1860819 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:414-423 Template-Type: ReDIF-Article 1.0 Author-Name: Marius Hofert Author-X-Name-First: Marius Author-X-Name-Last: Hofert Title: Random number generators produce collisions: Why, how many and more Abstract: It seems surprising that when applying widely used random number generators to generate one million random numbers on modern architectures, one obtains, on average, about 116 collisions. 
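That count is a birthday-problem calculation, and the reader can check it directly. Assuming a generator whose outputs are uniform over d = 2^32 equally likely values (an illustrative resolution; the article treats the general floating-point case), the expected number of collisions among n draws is n minus the expected number of distinct values, d·(1 − (1 − 1/d)^n); written with log1p/expm1 the computation stays accurate even though 1/d is tiny:

    # Expected collisions among n draws from d equally likely values.
    expected_collisions <- function(n, d) n + d * expm1(n * log1p(-1 / d))
    expected_collisions(n = 1e6, d = 2^32)   # about 116.4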
This article explains why, how to mathematically compute such a number, why they often cannot be obtained in a straightforward way, how to numerically compute them in a robust way and, among other things, what would need to be changed to bring this number below 1. The probability of at least one collision is also briefly addressed, which, as it turns out, again needs a careful numerical treatment. Overall, the article provides an introduction to the representation of floating-point numbers on a computer and corresponding implications in statistics and simulation. All computations are carried out in R and are reproducible with the code included in this article. Journal: The American Statistician Pages: 394-402 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1782261 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1782261 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:394-402 Template-Type: ReDIF-Article 1.0 Author-Name: Luke Keele Author-X-Name-First: Luke Author-X-Name-Last: Keele Author-Name: Dylan S. Small Author-X-Name-First: Dylan S. Author-X-Name-Last: Small Title: Comparing Covariate Prioritization via Matching to Machine Learning Methods for Causal Inference Using Five Empirical Applications Abstract: When investigators seek to estimate causal effects, they often assume that selection into treatment is based only on observed covariates. Under this identification strategy, analysts must adjust for observed confounders. While basic regression models have long been the dominant method of statistical adjustment, methods based on matching or weighting have become more common. Of late, methods based on machine learning (ML) have been developed for statistical adjustment. These ML methods are often designed to be black box methods with little input from the researcher. In contrast, matching methods that use covariate prioritization are designed to allow for direct input from substantive investigators. In this article, we use a novel research design to compare matching with covariate prioritization to black box methods. We use black box methods to replicate results from five studies where matching with covariate prioritization was used to customize the statistical adjustment in direct response to substantive expertise. We compare the methods in terms of both point and interval estimation. We conclude with advice for investigators. Journal: The American Statistician Pages: 355-363 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1867638 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1867638 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:355-363 Template-Type: ReDIF-Article 1.0 Author-Name: Narges Motalebi Author-X-Name-First: Narges Author-X-Name-Last: Motalebi Author-Name: Nathaniel T. Stevens Author-X-Name-First: Nathaniel T. Author-X-Name-Last: Stevens Author-Name: Stefan H. Steiner Author-X-Name-First: Stefan H. Author-X-Name-Last: Steiner Title: Hurdle Blockmodels for Sparse Network Modeling Abstract: A variety of random graph models have been proposed in the literature to model the associations within an interconnected system and to realistically account for various structures and attributes of such systems. In particular, much research has been devoted to modeling the interaction of humans within social networks.
However, such networks in real life tend to be extremely sparse, and existing methods do not adequately address this issue. In this article, we propose an extension to ordinary and degree corrected stochastic blockmodels that accounts for a high degree of sparsity. Specifically, we propose hurdle versions of these blockmodels to account for community structure and degree heterogeneity in sparse networks. We use simulation to verify that parameter estimation is consistent and precise, and we propose the use of likelihood ratio-type tests for model selection. We illustrate the necessity for hurdle blockmodels with a small research collaboration network as well as the infamous Enron E-mail exchange network. Methods for determining goodness of fit and performing model selection are also proposed. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 383-393 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1865199 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865199 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:383-393 Template-Type: ReDIF-Article 1.0 Author-Name: Philippe Flandre Author-X-Name-First: Philippe Author-X-Name-Last: Flandre Author-Name: John O’Quigley Author-X-Name-First: John Author-X-Name-Last: O’Quigley Title: The Short-Term and Long-Term Hazard Ratio Model: Parameterization Inconsistency Abstract: The test of Yang and Prentice, based on the short-term and long-term hazard ratio model, appears to be an attractive test for the presence of a regression effect, being able to detect departures from a null hypothesis of no effect against quite broad alternatives. We recall the model on which this test is based and the test itself. In simulations, the test has shown good performance and is judged to be of potential value when alternatives to the null may be of a nonproportional hazards nature. However, the model, even when valid, suffers from a parameterization inconsistency in the sense that parameter estimates can violate the model’s assumed parametric structure even when true. This leads to awkward behavior in some situations. For example, this inconsistency implies that inference will not be invariant to the coding of treatment allocation. While this is a theoretical observation, we provide real examples that highlight the difficulty in making clear-cut inferences from the model. Potential solutions are available and we provide some discussion on this. Journal: The American Statistician Pages: 376-382 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1740786 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1740786 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:376-382 Template-Type: ReDIF-Article 1.0 Author-Name: Jianning Yang Author-X-Name-First: Jianning Author-X-Name-Last: Yang Author-Name: John E. Kolassa Author-X-Name-First: John E. Author-X-Name-Last: Kolassa Title: The Impact of Application of the Jackknife to the Sample Median Abstract: The jackknife is a reliable tool for reducing the bias of a wide range of estimators. This note demonstrates that even such versatile tools have regularity conditions that can be violated even in relatively simple cases, and that caution needs to be exercised in their use.
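The standard jackknife bias estimate is (n − 1) times the difference between the average leave-one-out estimate and the full-sample estimate, and a few lines of R suffice to probe its behavior for the median (a sketch with assumed exponential data; compare the average jackknife estimate with the median's actual bias across even and odd n):

    # Jackknife bias estimate for the sample median. Note that for odd n every
    # leave-one-out sample has even size, and vice versa.
    set.seed(1)
    jack_bias <- function(x) {
      n <- length(x)
      loo <- vapply(1:n, function(i) median(x[-i]), numeric(1))
      (n - 1) * (mean(loo) - median(x))
    }
    colMeans(t(replicate(5000, {
      x <- rexp(41)                   # true median is log(2), about 0.693
      c(actual_bias = median(x) - log(2), jack_estimate = jack_bias(x))
    })))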
In particular, we show that the jackknife does not provide the expected reliability for bias reduction for the sample median, because of subtle changes in the behavior of the sample median as one moves between even and odd sample sizes. These considerations arose out of class discussions in an MS-level nonparametrics course. Journal: The American Statistician Pages: 445-449 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1869090 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1869090 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:445-449 Template-Type: ReDIF-Article 1.0 Author-Name: Jin Zhang Author-X-Name-First: Jin Author-X-Name-Last: Zhang Title: Response to Letter to the Editor: Zhang, J. (2021) Journal: The American Statistician Pages: 458-458 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2021.1982557 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1982557 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:458-458 Template-Type: ReDIF-Article 1.0 Author-Name: Gabriel J. Young Author-X-Name-First: Gabriel J. Author-X-Name-Last: Young Title: Probability and Statistical Inference: From Basic Principles to Advanced Models Journal: The American Statistician Pages: 451-453 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2021.1985863 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1985863 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:451-453 Template-Type: ReDIF-Article 1.0 Author-Name: Samuel Thomas Author-X-Name-First: Samuel Author-X-Name-Last: Thomas Author-Name: Wanzhu Tu Author-X-Name-First: Wanzhu Author-X-Name-Last: Tu Title: Learning Hamiltonian Monte Carlo in R Abstract: Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian computation. In comparison with the traditional Metropolis–Hastings algorithm, HMC offers greater computational efficiency, especially in higher dimensional or more complex modeling situations. To most statisticians, however, the idea of HMC comes from a less familiar origin, one that is based on the theory of classical mechanics. Its implementation, either through Stan or one of its derivative programs, can appear opaque to beginners. A lack of understanding of the inner workings of HMC, in our opinion, has hindered its application to a broader range of statistical problems. In this article, we review the basic concepts of HMC in a language that is more familiar to statisticians, and we describe an HMC implementation in R, one of the most frequently used statistical software environments. We also present hmclearn, an R package for learning HMC. This package contains a general-purpose HMC function for data analysis. We illustrate the use of this package in common statistical models. In doing so, we hope to promote this powerful computational tool for wider use. Example code for common statistical models is presented as supplementary material for online publication. Journal: The American Statistician Pages: 403-413 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2020.1865198 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865198 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
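The core of HMC is indeed small enough to fit in a few lines of R. A minimal sketch for a one-dimensional standard normal target (fixed step size and path length chosen for illustration; this is the generic leapfrog algorithm, not the hmclearn interface):

    # Minimal HMC: leapfrog steps simulate fictitious Hamiltonian dynamics,
    # and a Metropolis accept/reject step corrects the discretization error.
    set.seed(1)
    logp <- function(q) -q^2 / 2          # log-density of N(0, 1), up to a constant
    grad <- function(q) -q                # its gradient
    hmc_step <- function(q, eps = 0.2, L = 20) {
      p <- rnorm(1)                       # resample auxiliary momentum
      q_new <- q; p_new <- p
      for (l in 1:L) {
        p_new <- p_new + eps / 2 * grad(q_new)
        q_new <- q_new + eps * p_new
        p_new <- p_new + eps / 2 * grad(q_new)
      }
      log_accept <- logp(q_new) - p_new^2 / 2 - (logp(q) - p^2 / 2)
      if (log(runif(1)) < log_accept) q_new else q
    }
    q <- 0; draws <- numeric(5000)
    for (i in 1:5000) draws[i] <- q <- hmc_step(q)
    c(mean = mean(draws), sd = sd(draws))   # close to 0 and 1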
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:403-413 Template-Type: ReDIF-Article 1.0 Author-Name: Kenneth R. Benoit Author-X-Name-First: Kenneth R. Author-X-Name-Last: Benoit Title: Textual Data Science with R Journal: The American Statistician Pages: 453-454 Issue: 4 Volume: 75 Year: 2021 Month: 10 X-DOI: 10.1080/00031305.2021.1985864 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1985864 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:453-454 Template-Type: ReDIF-Article 1.0 Author-Name: William F. Christensen Author-X-Name-First: William F. Author-X-Name-Last: Christensen Author-Name: Brinley N. Zabriskie Author-X-Name-First: Brinley N. Author-X-Name-Last: Zabriskie Title: When Your Permutation Test is Doomed to Fail Abstract: A two-tailed test comparing the means of two independent populations is perhaps the most commonly used hypothesis test in quantitative research, featured centrally in medical research, A/B testing, and throughout the sciences. When data are skewed, the standard two-tailed t test is not appropriate, and the permutation test comparing the two means (or medians) has been a widely recommended alternative, with statistical authors and statistical software packages touting the permutation test’s utility, particularly for small samples. In this article, we illustrate that when the two samples are skewed and the sample sizes are unequal, the two-tailed permutation test (as traditionally implemented) can in some cases have power equal to zero, even when the k highest values in the combined data are all found in the group with k observations. Further, in many cases the standard permutation test exhibits decreasing power as the total sample size increases! We illustrate the causes of these perverse properties via both simulation and real-world examples, and we recommend approaches for ameliorating or avoiding these potential problems. Journal: The American Statistician Pages: 53-63 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1902856 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1902856 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:53-63 Template-Type: ReDIF-Article 1.0 Author-Name: David A. Harville Author-X-Name-First: David A. Author-X-Name-Last: Harville Title: Bayesian Inference Is Unaffected by Selection: Fact or Fiction? Abstract: The problem considered is that of making inferences about the value of a parameter vector θ based on the value of an observable random vector y that is subject to selection of the form y∈S (for a known subset S). According to conventional wisdom, a Bayesian approach (unlike a frequentist approach) requires no adjustment for selection, which is generally regarded as counterintuitive and even paradoxical. An alternative considered herein consists (when taking a Bayesian approach in the face of selection) of basing the inferences for the value of θ on the posterior distribution derived from the conditional (on y∈S) joint distribution of y and θ. That leads to an adjustment in the likelihood function that is reinterpretable as an adjustment to the prior distribution and ultimately leads to a different posterior distribution. And it serves to make the inferences specific to settings that are subject to selection of the same kind as the setting that gave rise to the data.
Moreover, even in the absence of any real selection, this approach can be used to make the inferences specific to a meaningful subset of y-values. Journal: The American Statistician Pages: 22-28 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2020.1858963 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1858963 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:22-28 Template-Type: ReDIF-Article 1.0 Author-Name: James M. Flegal Author-X-Name-First: James M. Author-X-Name-Last: Flegal Title: Do Dice Play God? The Mathematics of Uncertainty, by Ian Stewart Journal: The American Statistician Pages: 85-85 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.2019999 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2019999 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:85-85 Template-Type: ReDIF-Article 1.0 Author-Name: William E. Griffiths Author-X-Name-First: William E. Author-X-Name-Last: Griffiths Author-Name: R. Carter Hill Author-X-Name-First: R. Carter Author-X-Name-Last: Hill Title: On the Power of the F-test for Hypotheses in a Linear Model Abstract: We improve students’ understanding of the F-test for linear hypotheses in a linear model by explaining elements that affect the power of the test. Including true restrictions in a joint null hypothesis affects test power in a way that is not generally known. When a student is asked whether including the true restrictions in the null hypothesis will increase or decrease power, the student is likely to say: “I don’t know.” This answer is not a bad one, because the power depends on both the noncentrality parameter and the degrees of freedom. We show that adding true restrictions to a linear hypothesis cannot decrease the noncentrality parameter of the F-statistic, a result many will find counterintuitive. Adding true restrictions can increase or decrease F-test power depending on the offsetting negative effect of reducing the numerator degrees of freedom. We provide illustrative examples of these results and prove them for the general case. Journal: The American Statistician Pages: 78-84 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1979652 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1979652 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:78-84 Template-Type: ReDIF-Article 1.0 Author-Name: Matthew J. McIntosh Author-X-Name-First: Matthew J. Author-X-Name-Last: McIntosh Title: Calculating Sample Size for Follmann’s Simple Multivariate Test for One-Sided Alternatives Abstract: Follmann developed a multivariate test, when X ∼ MVN(μ, Σ), of H0 versus H1 − H0, where H0: μ = 0 and H1: μ ≥ 0. Follmann provided strict lower bounds on the power function when an orthogonal mapping requirement was satisfied, the use of which requires knowledge about the unknown population covariance matrix. In this article, we show that the orthogonal mapping requirement for his theorem is equivalent to and can be replaced with 1′μ ≥ 0, which does not require knowledge about the population covariance matrix. Using the lower bound on power, we are able to develop conservative sample sizes for this test. The conservative sample sizes are upper bounds on the actual sample size needed to achieve at least the desired power.
Results from a simulation study are provided, illustrating that the sample sizes are indeed upper bounds. Also, a simple R program to calculate sample size is provided. Journal: The American Statistician Pages: 16-21 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2020.1787224 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1787224 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:16-21 Template-Type: ReDIF-Article 1.0 Author-Name: Giulia Carella Author-X-Name-First: Giulia Author-X-Name-Last: Carella Author-Name: Javier Pérez Trufero Author-X-Name-First: Javier Author-X-Name-Last: Pérez Trufero Author-Name: Miguel Álvarez Author-X-Name-First: Miguel Author-X-Name-Last: Álvarez Author-Name: Jorge Mateu Author-X-Name-First: Jorge Author-X-Name-Last: Mateu Title: A Bayesian Spatial Analysis of the Heterogeneity in Human Mobility Changes During the First Wave of the COVID-19 Epidemic in the United States Abstract: The spread of COVID-19 in the U.S. prompted nonpharmaceutical interventions that caused a reduction in mobility everywhere, although with large disparities between different counties. Using a Bayesian spatial modeling framework, we investigated the association of county-level demographic and socioeconomic factors with changes in workplace mobility at two points in time: during the early stages of the epidemic (lockdown phase) and in the following phase (recovery phase) up to July 2020. While controlling for the perceived risk of infection, socioeconomic and demographic covariates explain about 40% of the variance in changes in workplace mobility during the lockdown phase, which reduces to about 10% during the recovery phase. During the lockdown phase, the results show larger drops in mobility in counties that have richer families, are less densely populated, have an older population living in dense neighborhoods, and have a lower proportion of Hispanic population. When also accounting for the residual spatial variability, the variance explained by the model increases to more than 70%, suggesting strong proximity effects potentially related to state- and county-wise regulations. These results provide community-level insights on the evolution of U.S. mobility during the first wave of the epidemic that could directly benefit policy evaluation and interventions. Journal: The American Statistician Pages: 64-72 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1965657 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1965657 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:64-72 Template-Type: ReDIF-Article 1.0 Author-Name: Jiaqi Gu Author-X-Name-First: Jiaqi Author-X-Name-Last: Gu Author-Name: Yiwei Fan Author-X-Name-First: Yiwei Author-X-Name-Last: Fan Author-Name: Guosheng Yin Author-X-Name-First: Guosheng Author-X-Name-Last: Yin Title: Reconstructing the Kaplan–Meier Estimator as an M-estimator Abstract: The Kaplan–Meier (KM) estimator, which provides a nonparametric estimate of a survival function for time-to-event data, has broad applications in clinical studies, engineering, economics, and many other fields. The theoretical properties of the KM estimator, including its consistency and asymptotic distribution, have been well established.
From a new perspective, we reconstruct the KM estimator as an M-estimator by maximizing a quadratic M-function based on concordance, which can be computed using the expectation–maximization (EM) algorithm. It is shown that the convergence point of the EM algorithm coincides with the traditional KM estimator, which offers a new interpretation of the KM estimator as an M-estimator. As a result, the limiting distribution of the KM estimator can be established using M-estimation theory. Application to two real datasets demonstrates that the proposed M-estimator is equivalent to the KM estimator, and the confidence intervals and confidence bands can be derived as well. Journal: The American Statistician Pages: 37-43 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1947376 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1947376 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:37-43 Template-Type: ReDIF-Article 1.0 Author-Name: Byron J. Gajewski Author-X-Name-First: Byron J. Author-X-Name-Last: Gajewski Author-Name: Jo A. Wick Author-X-Name-First: Jo A. Author-X-Name-Last: Wick Author-Name: Truman J. Milling Author-X-Name-First: Truman J. Author-X-Name-Last: Milling Title: A Connection Between Baseball and Clinical Trials Found in “Slugging Percentage is Not a Percentage—And Why That Matters” Journal: The American Statistician Pages: 89-89 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1990128 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1990128 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:89-89 Template-Type: ReDIF-Article 1.0 Author-Name: Brett Presnell Author-X-Name-First: Brett Author-X-Name-Last: Presnell Title: A Geometric Derivation of the Cantor Distribution Abstract: For students of probability and statistics, the Cantor distribution provides a useful example of a continuous probability distribution on the real line which cannot be obtained by integrating its derivative or indeed any density function. While usually treated as an advanced topic, we show that the basic facts about the Cantor distribution can be rigorously derived from a sequence of uniform distributions using simple geometry and recursion, together with one basic result from advanced calculus. Journal: The American Statistician Pages: 73-77 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1905062 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1905062 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:73-77 Template-Type: ReDIF-Article 1.0 Author-Name: Angelika M. Stefan Author-X-Name-First: Angelika M. Author-X-Name-Last: Stefan Title: Statistics for Making Decisions Journal: The American Statistician Pages: 87-88 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.2020003 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2020003 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
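The Cantor distribution entry above lends itself to a short simulation: a random variable whose base-3 digits are drawn uniformly from {0, 2} follows the Cantor distribution. The base-R sketch below uses this standard construction (not the article's geometric derivation), with an arbitrary truncation depth for the expansion.

    set.seed(1)
    rcantor <- function(n, depth = 30) {
      # each row holds the first 'depth' base-3 digits, each digit 0 or 2
      digits <- matrix(sample(c(0, 2), n * depth, replace = TRUE), nrow = n)
      as.vector(digits %*% 3^-(1:depth))   # sum of digit_i / 3^i lies in the Cantor set
    }
    x <- rcantor(1e4)
    range(x)                               # contained in [0, 1]
    mean(x)                                # near 1/2, the mean of the Cantor distribution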
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:87-88 Template-Type: ReDIF-Article 1.0 Author-Name: Emilija Perković Author-X-Name-First: Emilija Author-X-Name-Last: Perković Title: The Phantom Pattern Problem: The Mirage of Big Data Journal: The American Statistician Pages: 86-87 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.2020002 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2020002 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:86-87 Template-Type: ReDIF-Article 1.0 Author-Name: Yang Ni Author-X-Name-First: Yang Author-X-Name-Last: Ni Title: Exploratory Data Analysis with MATLAB, 3rd ed., by Wendy L. Martinez, Angel R. Martinez, and Jeffrey L. Solka Journal: The American Statistician Pages: 85-86 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.2020000 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2020000 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:85-86 Template-Type: ReDIF-Article 1.0 Author-Name: Erik van Zwet Author-X-Name-First: Erik van Author-X-Name-Last: Zwet Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Title: A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates Abstract: If we have an unbiased estimate of some parameter of interest, then its absolute value is positively biased for the absolute value of the parameter. This bias is large when the signal-to-noise ratio (SNR) is small, and it becomes even larger when we condition on statistical significance: the winner’s curse. This is a frequentist motivation for regularization or “shrinkage.” To determine a suitable amount of shrinkage, we propose to estimate the distribution of the SNR from a large collection or “corpus” of similar studies and use this as a prior distribution. The wider the scope of the corpus, the less informative the prior, but a wider scope does not necessarily result in a more diffuse prior. We show that the estimation of the prior simplifies if we require that posterior inference is equivariant under linear transformations of the data. We demonstrate our approach with corpora of 86 replication studies from psychology and 178 phase 3 clinical trials. Our suggestion is not intended to be a replacement for a prior based on full information about a particular problem; rather, it represents a familywise choice that should yield better long-term properties than the current default uniform prior, which has led to systematic overestimates of effect sizes and a replication crisis when these inflated estimates have not shown up in later studies. Journal: The American Statistician Pages: 1-9 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1938225 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1938225 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:1-9 Template-Type: ReDIF-Article 1.0 Author-Name: Edwin van den Heuvel Author-X-Name-First: Edwin Author-X-Name-Last: van den Heuvel Author-Name: Zhuozhao Zhan Author-X-Name-First: Zhuozhao Author-X-Name-Last: Zhan Title: Myths About Linear and Monotonic Associations: Pearson’s r, Spearman’s ρ, and Kendall’s τ Abstract: Pearson’s correlation coefficient is considered a measure of linear association between bivariate random variables X and Y.
It is recommended not to use it for other forms of association. Indeed, for nonlinear monotonic associations, alternative measures such as Spearman’s rank and Kendall’s tau correlation coefficients are considered more appropriate. These views or opinions on the estimation of association are strongly rooted in the statistical and other empirical sciences. After defining linear and monotonic associations, we will demonstrate that these opinions are incorrect. Pearson’s correlation coefficient should not be ruled out a priori for measuring nonlinear monotonic associations. We will provide examples of practically relevant families of bivariate distribution functions with nonlinear monotonic associations for which Pearson’s correlation is preferred over Spearman’s rank and Kendall’s tau correlation in testing the dependency between X and Y. Alternatively, we will provide a family of bivariate distributions with a linear association between X and Y for which Spearman’s rank and Kendall’s tau are preferred over Pearson’s correlation. Our examples show that existing views on linear and monotonic associations are myths. Journal: The American Statistician Pages: 44-52 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.2004922 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2004922 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:44-52 Template-Type: ReDIF-Article 1.0 Author-Name: Yulia Sidi Author-X-Name-First: Yulia Author-X-Name-Last: Sidi Author-Name: Ofer Harel Author-X-Name-First: Ofer Author-X-Name-Last: Harel Title: Difference Between Binomial Proportions Using Newcombe’s Method With Multiple Imputation for Incomplete Data Abstract: The difference between two binomial proportions is commonly used in applied research. Since many studies encounter incomplete data, proper methods to analyze such data are needed. Here, we present a proper multiple imputation (MI) procedure for constructing a confidence interval for the difference between binomial proportions using Newcombe’s method, which is known to have a better coverage probability when compared with Wald’s method. We use both a conventional MI procedure for ignorable missingness and a two-stage MI for non-ignorable missingness. Using simulation studies, we compare our method to three other methods and provide recommendations for the use of such methods in practice. In addition, we show the application of our new method on a COVID-19 dataset. Journal: The American Statistician Pages: 29-36 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2021.1898468 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1898468 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:29-36 Template-Type: ReDIF-Article 1.0 Author-Name: Christian H. Weiß Author-X-Name-First: Christian H. Author-X-Name-Last: Weiß Author-Name: Boris Aleksandrov Author-X-Name-First: Boris Author-X-Name-Last: Aleksandrov Title: Computing (Bivariate) Poisson Moments Using Stein–Chen Identities Abstract: The (bivariate) Poisson distribution is the most common distribution for (bivariate) count random variables. The univariate Poisson distribution is characterized by the famous Stein–Chen identity.
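In its standard form, the identity states that a nonnegative integer-valued random variable X follows a Poisson(λ) distribution if and only if E[λf(X + 1)] = E[Xf(X)] for every bounded function f; taking f(x) = x − 1, for example, immediately gives the second factorial moment E[X(X − 1)] = λE[X] = λ².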
We demonstrate that this identity allows one to derive even sophisticated moment expressions in such a simple way that the corresponding computations can be presented in an introductory statistics class. We then derive new types of Stein–Chen identities for the bivariate Poisson distribution. These are shown to be very useful for computing joint moments, again in a surprisingly simple way. We also explain how to extend our results to the general multivariate case. Journal: The American Statistician Pages: 10-15 Issue: 1 Volume: 76 Year: 2022 Month: 1 X-DOI: 10.1080/00031305.2020.1763836 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1763836 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:10-15 Template-Type: ReDIF-Article 1.0 Author-Name: Joshua Habiger Author-X-Name-First: Joshua Author-X-Name-Last: Habiger Author-Name: Ye Liang Author-X-Name-First: Ye Author-X-Name-Last: Liang Title: Publication Policies for Replicable Research and the Community-Wide False Discovery Rate Abstract: Recent literature has shown that statistically significant results are often not replicated because the “p-value < 0.05” publication rule results in a high false positive rate (FPR) or false discovery rate (FDR) in some scientific communities. While recommendations to address the phenomenon vary, many amount to incorporating additional study summary information, such as prior null hypothesis odds and/or effect sizes, in some way. This article demonstrates that a statistic called the local false discovery rate (lfdr), which incorporates this information, is a sufficient summary for addressing false positive rates. Specifically, it is shown that lfdr-values among published results are sufficient for estimating the community-wide FDR for any well-defined publication policy, and that lfdr-values are sufficient for defining policies for community-wide FDR control. It is also demonstrated that, though p-values can be useful for computing an lfdr, they alone are not sufficient for addressing the community-wide FDR. Data from a recent replication study are used to compare publication policies and illustrate the FDR estimator. Journal: The American Statistician Pages: 131-141 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1999857 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1999857 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:131-141 Template-Type: ReDIF-Article 1.0 Author-Name: Jelle J. Goeman Author-X-Name-First: Jelle J. Author-X-Name-Last: Goeman Author-Name: Aldo Solari Author-X-Name-First: Aldo Author-X-Name-Last: Solari Title: Comparing Three Groups Abstract: For multiple comparisons in analysis of variance, the practitioners’ handbooks generally advocate standard methods such as Bonferroni, or an F-test followed by Tukey’s honest significant difference method. These methods are known to be suboptimal compared to closed testing procedures, but improved methods can be complex in the general multigroup set-up. In this note, we argue that the case of three groups is special: with three groups, closed testing procedures are powerful and easy to use. We describe four different closed testing procedures specifically for the three-group set-up.
The choice of method should be determined by assessing which of the comparisons are considered primary and which are secondary, as dictated by subject-matter considerations. We describe how all four methods can be used with any standard software. Journal: The American Statistician Pages: 168-176 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.2002188 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2002188 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:168-176 Template-Type: ReDIF-Article 1.0 Author-Name: Nitis Mukhopadhyay Author-X-Name-First: Nitis Author-X-Name-Last: Mukhopadhyay Title: Pairwise Independence May Not Imply Independence: New Illustrations and a Generalization Abstract: A number of standard textbooks that are followed in a junior/senior-level course or in a first-year graduate-level course in mathematical statistics and probability routinely include a single basic illustration, albeit in variant forms, to highlight an important point: pairwise independence may not imply (mutual) independence. We earnestly believe that beginning students appreciate more examples to clarify these key issues. Hence, we hope that our new sets of nontrivial illustrations from Section 2 will help our audience. Next, in Section 3, we extend the notion to q-wise independence with a large set of illustrations using both discrete and continuous random variables, showing that q-wise independence may not imply (mutual) independence. We are confident that this discourse is immediately accessible to juniors/seniors and first-year graduate students. Journal: The American Statistician Pages: 184-187 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2022.2039763 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2039763 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:184-187 Template-Type: ReDIF-Article 1.0 Author-Name: Rachel C. Nethery Author-X-Name-First: Rachel C. Author-X-Name-Last: Nethery Author-Name: Jarvis T. Chen Author-X-Name-First: Jarvis T. Author-X-Name-Last: Chen Author-Name: Nancy Krieger Author-X-Name-First: Nancy Author-X-Name-Last: Krieger Author-Name: Pamela D. Waterman Author-X-Name-First: Pamela D. Author-X-Name-Last: Waterman Author-Name: Emily Peterson Author-X-Name-First: Emily Author-X-Name-Last: Peterson Author-Name: Lance A. Waller Author-X-Name-First: Lance A. Author-X-Name-Last: Waller Author-Name: Brent A. Coull Author-X-Name-First: Brent A. Author-X-Name-Last: Coull Title: Statistical Implications of Endogeneity Induced by Residential Segregation in Small-Area Modeling of Health Inequities Abstract: Health inequities are assessed by health departments to identify social groups disproportionately burdened by disease and by academic researchers to understand how social, economic, and environmental inequities manifest as health inequities. To characterize inequities, group-specific small-area health data are often modeled using log-linear generalized linear models (GLM) or generalized linear mixed models (GLMM) with a random intercept. These approaches estimate the same marginal rate ratio comparing disease rates across groups under standard assumptions. Here we explore how residential segregation combined with social group differences in disease risk can lead to contradictory findings from the GLM and GLMM.
We show that this occurs because small-area disease rate data collected under these conditions induce endogeneity in the GLMM due to correlation between the model’s offset and random effect. This results in GLMM estimates that represent conditional rather than marginal associations. We refer to endogeneity arising from the offset, which to our knowledge has not been noted previously, as “offset endogeneity.” We illustrate this phenomenon in simulated data and real premature mortality data, and we propose alternative modeling approaches to address it. We also introduce to a statistical audience the social epidemiologic terminology for framing health inequities, which enables responsible interpretation of results. Journal: The American Statistician Pages: 142-151 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.2003245 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2003245 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:142-151 Template-Type: ReDIF-Article 1.0 Author-Name: Yi Zuo Author-X-Name-First: Yi Author-X-Name-Last: Zuo Author-Name: Thomas G. Stewart Author-X-Name-First: Thomas G. Author-X-Name-Last: Stewart Author-Name: Jeffrey D. Blume Author-X-Name-First: Jeffrey D. Author-X-Name-Last: Blume Title: Variable Selection With Second-Generation P-Values Abstract: Many statistical methods have been proposed for variable selection in the past century, but few balance inference and prediction tasks well. Here, we report on a novel variable selection approach called penalized regression with second-generation p-values (ProSGPV). It captures the true model at the best rate achieved by current standards, is easy to implement in practice, and often yields the smallest parameter estimation error. The idea is to use an l0 penalization scheme with second-generation p-values (SGPV), instead of traditional ones, to determine which variables remain in a model. The approach yields tangible advantages for balancing support recovery, parameter estimation, and prediction tasks. The ProSGPV algorithm can maintain its good performance even when there is strong collinearity among features or when a high-dimensional feature space with p > n is considered. We present extensive simulations and a real-world application comparing the ProSGPV approach with smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and minimax concave penalty with penalized linear unbiased selection (MC+). While the last three algorithms are among the current standards for variable selection, ProSGPV has superior inference performance and comparable prediction performance in certain scenarios. Journal: The American Statistician Pages: 91-101 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1946150 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1946150 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:91-101 Template-Type: ReDIF-Article 1.0 Author-Name: Qiwei Li Author-X-Name-First: Qiwei Author-X-Name-Last: Li Title: Bayesian Analysis of Infectious Diseases: COVID-19 and Beyond Journal: The American Statistician Pages: 199-199 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2022.2054625 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2054625 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:199-199 Template-Type: ReDIF-Article 1.0 Author-Name: Georg Zimmermann Author-X-Name-First: Georg Author-X-Name-Last: Zimmermann Author-Name: Edgar Brunner Author-X-Name-First: Edgar Author-X-Name-Last: Brunner Author-Name: Werner Brannath Author-X-Name-First: Werner Author-X-Name-Last: Brannath Author-Name: Martin Happ Author-X-Name-First: Martin Author-X-Name-Last: Happ Author-Name: Arne C. Bathke Author-X-Name-First: Arne C. Author-X-Name-Last: Bathke Title: Pseudo-Ranks: The Better Way of Ranking? Abstract: Rank-based methods are frequently used in the life sciences, and in the empirical sciences in general. Among the best-known examples of nonparametric rank-based tests are the Wilcoxon-Mann-Whitney test and the Kruskal–Wallis test. However, recently, potential pitfalls and paradoxical results pertaining to the use of traditional rank-based procedures for more than two samples have been highlighted, and the so-called pseudo-ranks have been proposed as a remedy for this type of problem. The aim of the present article is twofold: First, we show that pseudo-ranks might also behave counterintuitively when splitting up groups. Second, since the use of pseudo-ranks leads to a slightly different interpretation of the results, we provide some guidance regarding the decision for one or the other approach, in particular with respect to interpretability and generalizability of the findings. It turns out that the choice of the reference distribution, to which the individual groups are compared, is crucial. The practically relevant implications of these aspects are illustrated by a discussion of a dataset from epilepsy research. Summing up, one should decide based on thorough case-by-case considerations whether ranks or pseudo-ranks are appropriate. Journal: The American Statistician Pages: 124-130 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1972836 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1972836 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:124-130 Template-Type: ReDIF-Article 1.0 Author-Name: Dale L. Zimmerman Author-X-Name-First: Dale L. Author-X-Name-Last: Zimmerman Author-Name: Jay M. Ver Hoef Author-X-Name-First: Jay M. Author-X-Name-Last: Ver Hoef Title: On Deconfounding Spatial Confounding in Linear Models Abstract: Spatial confounding, that is, collinearity between fixed effects and random effects in a spatial generalized linear mixed model, can adversely affect estimates of the fixed effects. Restricted spatial regression methods have been proposed as a remedy for spatial confounding. Such methods replace inference for the fixed effects of the original model with inference for those effects under a model in which the random effects are restricted to a subspace orthogonal to the column space of the fixed effects model matrix; thus, they “deconfound” the two types of effects. We prove, however, that frequentist inference for the fixed effects of a deconfounded linear model is generally inferior to that for the fixed effects of the original spatial linear model; in fact, it is even inferior to inference for the corresponding nonspatial model. We show further that deconfounding also leads to inferior predictive inferences, though its impact on prediction appears to be relatively small in practice. Based on these results, we argue that deconfounding a spatial linear model is bad statistical practice and should be avoided.
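To fix ideas, the following base-R sketch shows the restricted ("deconfounded") construction that the entry above argues against: a spatial basis is projected onto the orthogonal complement of the fixed-effects column space so that the random effects can no longer be collinear with the covariates. The design matrix X and the exponential-decay basis W are hypothetical stand-ins chosen only for illustration.

    set.seed(1)
    n <- 100
    X <- cbind(1, runif(n))                                    # hypothetical fixed-effects design
    W <- outer(1:n, 1:n, function(i, j) exp(-abs(i - j) / 10)) # hypothetical spatial basis
    P <- X %*% solve(crossprod(X), t(X))                       # projection onto the column space of X
    W_restricted <- (diag(n) - P) %*% W                        # basis restricted to the orthogonal complement
    max(abs(crossprod(X, W_restricted)))                       # essentially 0: restricted basis is orthogonal to X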
Journal: The American Statistician Pages: 159-167 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1946149 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1946149 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:159-167 Template-Type: ReDIF-Article 1.0 Author-Name: Weixiao Dai Author-X-Name-First: Weixiao Author-X-Name-Last: Dai Author-Name: Toshimitsu Hamasaki Author-X-Name-First: Toshimitsu Author-X-Name-Last: Hamasaki Title: Statistics in Medicine Journal: The American Statistician Pages: 199-200 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2022.2054626 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2054626 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:199-200 Template-Type: ReDIF-Article 1.0 Author-Name: Philippe Besse Author-X-Name-First: Philippe Author-X-Name-Last: Besse Author-Name: Eustasio del Barrio Author-X-Name-First: Eustasio Author-X-Name-Last: del Barrio Author-Name: Paula Gordaliza Author-X-Name-First: Paula Author-X-Name-Last: Gordaliza Author-Name: Jean-Michel Loubes Author-X-Name-First: Jean-Michel Author-X-Name-Last: Loubes Author-Name: Laurent Risser Author-X-Name-First: Laurent Author-X-Name-Last: Risser Title: A Survey of Bias in Machine Learning Through the Prism of Statistical Parity Abstract: Applications based on machine learning models have now become an indispensable part of everyday life and the professional world. As a consequence, a critical question has recently arisen among the public: Do algorithmic decisions convey any type of discrimination against specific population groups or minorities? In this article, we show the importance of understanding how bias can be introduced into automatic decisions. We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting. We then propose to quantify the presence of bias by using the standard disparate impact index on the real and well-known adult income dataset. Finally, we check the performance of different approaches aiming to reduce the bias in binary classification outcomes. Importantly, we show that some intuitive methods are ineffective with respect to the statistical parity criterion. This sheds light on the fact that trying to make fair machine learning models may be a particularly challenging task, especially when the training observations contain some bias. Journal: The American Statistician Pages: 188-198 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1952897 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1952897 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:188-198 Template-Type: ReDIF-Article 1.0 Author-Name: Ryan Elmore Author-X-Name-First: Ryan Author-X-Name-Last: Elmore Author-Name: Gregory J. Matthews Author-X-Name-First: Gregory J. Author-X-Name-Last: Matthews Title: Bang the Can Slowly: An Investigation into the 2017 Houston Astros Abstract: This article is a statistical investigation into the 2017 Major League Baseball scandal involving the Houston Astros, who won the World Series championship that same year. The Astros were alleged to have stolen their opponents’ pitching signs in order to provide their batters with a potentially unfair advantage.
This work finds compelling evidence that the Astros’ on-field performance was significantly affected by their sign-stealing ploy and quantifies the effects. The three main findings in the article are (i) the Astros’ odds of swinging at a pitch were reduced by approximately 27% (OR: 0.725, 95% CI: (0.618, 0.850)) when the sign was stolen, (ii) when an Astros player swung, the odds of making contact with the ball increased by roughly 80% (OR: 1.805, 95% CI: (1.342, 2.675)) on non-fastball pitches, and (iii) when the Astros made contact with a ball on a pitch in which the sign was known, the ball’s exit velocity (launch speed) increased on average by 2.386 (95% CI: (0.334, 4.451)) miles per hour. Journal: The American Statistician Pages: 110-116 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1902391 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1902391 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:110-116 Template-Type: ReDIF-Article 1.0 Author-Name: Wei Wang Author-X-Name-First: Wei Author-X-Name-Last: Wang Author-Name: Dylan S. Small Author-X-Name-First: Dylan S. Author-X-Name-Last: Small Author-Name: Guy Cafri Author-X-Name-First: Guy Author-X-Name-Last: Cafri Author-Name: Elizabeth W. Paxton Author-X-Name-First: Elizabeth W. Author-X-Name-Last: Paxton Title: The Case-Control Approach Can be More Powerful for Matched Pair Observational Studies When the Outcome is Rare Abstract: In an observational study, to investigate the treatment effect, one common strategy is to match the control subjects to the treated subjects. The outcomes between the two groups are then compared after the TC (treatment-control) match. However, when the outcome is rare, detection of an outcome difference can be challenging. An alternative approach is to compare the treatment or exposure discrepancy after matching subjects with the outcome (cases) to subjects without the outcome (referents). Throughout the article, we follow the tradition of calling this the matched “case-control” approach instead of the matched “case-referent” approach. We reserve “control” to mean not taking the treatment, and we use the abbreviations TC and CC (case-control) when possible confusion may arise. We derive conditions under which the matched CC approach has more power for testing the treatment effect and examine its empirical performance in simulations and in our data example. We also show that the CC approach gives better match quality in our study of the effect of long vs. short stay in the hospital after joint surgery. Journal: The American Statistician Pages: 117-123 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1972835 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1972835 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:117-123 Template-Type: ReDIF-Article 1.0 Author-Name: Joris Mulder Author-X-Name-First: Joris Author-X-Name-Last: Mulder Author-Name: Eric-Jan Wagenmakers Author-X-Name-First: Eric-Jan Author-X-Name-Last: Wagenmakers Author-Name: Maarten Marsman Author-X-Name-First: Maarten Author-X-Name-Last: Marsman Title: A Generalization of the Savage–Dickey Density Ratio for Testing Equality and Order Constrained Hypotheses Abstract: The Savage–Dickey density ratio is a specific expression of the Bayes factor when testing a precise (equality constrained) hypothesis against an unrestricted alternative.
The expression greatly simplifies the computation of the Bayes factor at the cost of assuming a specific form of the prior under the precise hypothesis as a function of the unrestricted prior. A generalization was proposed by Verdinelli and Wasserman such that the priors can be freely specified under both hypotheses while keeping the computational advantage. This article presents an extension of this generalization to the case where the hypothesis has equality as well as order constraints on the parameters of interest. The methodology is used for a constrained multivariate t-test using the JZS Bayes factor and a constrained hypothesis test under the multinomial model. Journal: The American Statistician Pages: 102-109 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2020.1799861 File-URL: http://hdl.handle.net/10.1080/00031305.2020.1799861 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:102-109 Template-Type: ReDIF-Article 1.0 Author-Name: Sushil Kumar Singh Author-X-Name-First: Sushil Kumar Author-X-Name-Last: Singh Author-Name: Neelkanth Rawat Author-X-Name-First: Neelkanth Author-X-Name-Last: Rawat Author-Name: Sargun Singh Author-X-Name-First: Sargun Author-X-Name-Last: Singh Author-Name: Savinder Kaur Author-X-Name-First: Savinder Author-X-Name-Last: Kaur Title: Re-exploring the Penney-Ante Game Abstract: We propose a single loop diagram and use it to devise a single loop matrix method to computationally solve the Penney-Ante game. This method avoids the nuances of repeated use of conditional probability and Markov chain representations. We remove the limitations of Conway’s trick as applied to a fair coin and generalize the method to the case where the coin is allowed to be biased. A uniform random number generator is used to simulate the game and formulate implicit mathematical relations to explore nontransitivity. Journal: The American Statistician Pages: 177-183 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.1961860 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1961860 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:177-183 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: The Impact of Application of the Jackknife to the Sample Median Journal: The American Statistician Pages: 201-201 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2022.2032827 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2032827 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
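The Penney-Ante entry above is easy to check by direct simulation. The sketch below (a plain Monte Carlo illustration, not the single loop matrix method of the article) plays repeated games between two length-3 patterns with a coin of head probability p; for a fair coin, THH is classically known to beat HHT about three quarters of the time.

    penney <- function(a, b, p = 0.5) {    # returns "A" or "B", whichever pattern appears first
      s <- character(0)
      repeat {
        s <- c(s, if (runif(1) < p) "H" else "T")
        if (length(s) >= 3) {
          last3 <- paste(tail(s, 3), collapse = "")
          if (last3 == a) return("A")
          if (last3 == b) return("B")
        }
      }
    }
    set.seed(1)
    mean(replicate(1e4, penney("THH", "HHT")) == "A")   # approximately 0.75 for a fair coin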
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:201-201 Template-Type: ReDIF-Article 1.0 Author-Name: Shing Lee Author-X-Name-First: Shing Author-X-Name-Last: Lee Author-Name: Emilia Bagiella Author-X-Name-First: Emilia Author-X-Name-Last: Bagiella Author-Name: Roger Vaughan Author-X-Name-First: Roger Author-X-Name-Last: Vaughan Author-Name: Usha Govindarajulu Author-X-Name-First: Usha Author-X-Name-Last: Govindarajulu Author-Name: Paul Christos Author-X-Name-First: Paul Author-X-Name-Last: Christos Author-Name: Denise Esserman Author-X-Name-First: Denise Author-X-Name-Last: Esserman Author-Name: Hua Zhong Author-X-Name-First: Hua Author-X-Name-Last: Zhong Author-Name: Mimi Kim Author-X-Name-First: Mimi Author-X-Name-Last: Kim Title: COVID-19 Pandemic as a Change Agent in the Structure and Practice of Statistical Consulting Centers Abstract: When New York City (NYC) became an epicenter of the COVID-19 pandemic in the spring of 2020, statistical consulting centers at academic medical institutions in the area were immediately inundated with requests from hospital leadership and researchers for methodological support to address different aspects of the outbreak. Statisticians suddenly had to pivot from their usual responsibilities to focus entirely on COVID-19 work, and consulting centers had to devise innovative strategies to restructure their workflow and develop new infrastructure to address the acute demand for support. As statisticians from seven NYC-area institutions, we share our experiences and lessons learned during the pandemic, with the hope that this will lead not only to better preparedness for future public health crises when the skills and expertise of statisticians are critically needed, but also to lasting improvements to the structure and practice of statistical consulting centers. Journal: The American Statistician Pages: 152-158 Issue: 2 Volume: 76 Year: 2022 Month: 4 X-DOI: 10.1080/00031305.2021.2023045 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2023045 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:152-158 Template-Type: ReDIF-Article 1.0 Author-Name: Christopher R. Bilder Author-X-Name-First: Christopher R. Author-X-Name-Last: Bilder Title: Alpha Seminar: A Course for New Graduate Students in Statistics Abstract: The accumulation of technical knowledge is the central focus of graduate programs in statistics. However, student success does not depend solely on acquiring such knowledge. Rather, students must also understand the rigors of graduate study to complete their degree. And they need to understand the statistics profession to prepare for a career after graduation. The purpose of the one-credit-hour Alpha Seminar course at the University of Nebraska-Lincoln is to educate graduate students in these nontechnical areas. Students are required to enroll in Alpha Seminar during their first semester of study. In addition to advisement on courses and graduation requirements, Alpha Seminar features topics on career paths, ethics, professional accreditation, internships, and professional societies. Alumni also meet with the class to discuss how to be successful in the program and in a future career. This article discusses course topics, examines assignments, and provides evaluations from student cohorts. The corresponding course website is available at www.chrisbilder.com/stat810.
Journal: The American Statistician Pages: 286-291 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2049366 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2049366 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:286-291 Template-Type: ReDIF-Article 1.0 Author-Name: Chi-Kuang Yeh Author-X-Name-First: Chi-Kuang Author-X-Name-Last: Yeh Author-Name: Gregory Rice Author-X-Name-First: Gregory Author-X-Name-Last: Rice Author-Name: Joel A. Dubin Author-X-Name-First: Joel A. Author-X-Name-Last: Dubin Title: Evaluating Real-Time Probabilistic Forecasts With Application to National Basketball Association Outcome Prediction Abstract: Motivated by the goal of evaluating real-time forecasts of home team win probabilities in the National Basketball Association, we develop new tools for measuring the quality of continuously updated probabilistic forecasts. This includes introducing calibration surface plots, and simple graphical summaries of them, to evaluate at a glance whether a given continuously updated probability forecasting method is well-calibrated, as well as developing statistical tests and graphical tools to evaluate the skill, or relative performance, of two competing continuously updated forecasting methods. These tools are demonstrated in an application to evaluate the continuously updated forecasts published by the United States-based multinational sports network ESPN on its principal webpage, espn.com. This application provides statistical evidence that the forecasts published there are well-calibrated and exhibit improved skill over several naïve models, but do not demonstrate significantly improved skill over simple logistic regression models based solely on a measurement of each team’s relative strength and the evolving score difference throughout the game. Journal: The American Statistician Pages: 214-223 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.1967781 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1967781 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:214-223 Template-Type: ReDIF-Article 1.0 Author-Name: Chris Barker Author-X-Name-First: Chris Author-X-Name-Last: Barker Title: Data Monitoring Committees in Clinical Trials: A Practical Perspective Journal: The American Statistician Pages: 305-306 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2088199 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2088199 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:305-306 Template-Type: ReDIF-Article 1.0 Author-Name: Brendan Kline Author-X-Name-First: Brendan Author-X-Name-Last: Kline Title: Bayes Factors Based on p-Values and Sets of Priors With Restricted Strength Abstract: This article focuses on the minimum Bayes factor compatible with a p-value, considering a set of priors with restricted strength. The resulting minimum Bayes factor depends on both the strength of the set of priors and the sample size. The results can be used to interpret the evidence for/against the hypothesis provided by a p-value in a way that accounts for the strength of the priors and the sample size.
In particular, the results suggest further lowering the p-value cutoff for “statistical significance.” Journal: The American Statistician Pages: 203-213 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.1877815 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1877815 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:203-213 Template-Type: ReDIF-Article 1.0 Author-Name: Kimihiro Noguchi Author-X-Name-First: Kimihiro Author-X-Name-Last: Noguchi Author-Name: Koby F. Robles Author-X-Name-First: Koby F. Author-X-Name-Last: Robles Title: On Generating Distributions with the Memoryless Property Abstract: The exponential and geometric distributions are well-known continuous and discrete families of distributions with the memoryless property, respectively. The memoryless property is emphasized in introductory probability and statistics textbooks even though no distribution beyond these two families has been explored in detail. By examining the relationship between these two families of distributions, we propose a general algorithm for generating distributions with the memoryless property. Then, we show that the general algorithm uniquely determines the distribution with the memoryless property given the parameter value and a nonnegative support that contains zero and is closed under addition. Furthermore, we present a few nontrivial examples and their applications to demonstrate the richness of such distributions. Journal: The American Statistician Pages: 280-285 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.2006782 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2006782 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:280-285 Template-Type: ReDIF-Article 1.0 Author-Name: Nathaniel T. Stevens Author-X-Name-First: Nathaniel T. Author-X-Name-Last: Stevens Author-Name: Luke Hagar Author-X-Name-First: Luke Author-X-Name-Last: Hagar Title: Comparative Probability Metrics: Using Posterior Probabilities to Account for Practical Equivalence in A/B tests Abstract: Recently, online-controlled experiments (i.e., A/B tests) have become an extremely valuable tool used by internet and technology companies for purposes of advertising, product development, product improvement, customer acquisition, and customer retention, to name a few. The data-driven decisions that result from these experiments have traditionally been informed by null hypothesis significance tests and analyses based on p-values. Recently, however, attention has been drawn to the shortcomings of hypothesis testing, and an emphasis has been placed on the development of new methodologies that overcome these shortcomings. We propose the use of posterior probabilities to facilitate comparisons that account for practical equivalence and that quantify the likelihood that a result is practically meaningful, as opposed to statistically significant. We call these posterior probabilities comparative probability metrics (CPMs). This Bayesian methodology provides a flexible and intuitive means of making meaningful comparisons by directly calculating, for example, the probability that two groups are practically equivalent, or the probability that one group is practically superior to another.
In this article, we describe a unified framework for constructing and estimating such probabilities, and we develop a sample size determination methodology that may be used to determine how much data are required to calculate trustworthy CPMs. Journal: The American Statistician Pages: 224-237 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.2000495 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2000495 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:224-237 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on “On the Power of the F-test for Hypotheses in a Linear Model” by Griffiths and Hill (2022) Journal: The American Statistician Pages: 310-311 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2074541 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2074541 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:310-311 Template-Type: ReDIF-Article 1.0 Author-Name: Kenneth Rice Author-X-Name-First: Kenneth Author-X-Name-Last: Rice Author-Name: Lingbo Ye Author-X-Name-First: Lingbo Author-X-Name-Last: Ye Title: Expressing Regret: A Unified View of Credible Intervals Abstract: Posterior uncertainty is typically summarized as a credible interval, an interval in the parameter space that contains a fixed proportion—usually 95%—of the posterior’s support. For multivariate parameters, credible sets perform the same role. There are of course many potential 95% intervals from which to choose, yet even standard choices are rarely justified in any formal way. In this article we give a general method, focusing on the loss function that motivates an estimate—the Bayes rule—around which we construct a credible set. The set contains all points which, as estimates, would have minimally worse expected loss than the Bayes rule: we call this excess expected loss “regret.” The approach can be used for any model and prior, and we show how it justifies all widely used choices of credible interval/set. Further examples show how it provides insights into more complex estimation problems. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 248-256 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2039764 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2039764 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:248-256 Template-Type: ReDIF-Article 1.0 Author-Name: Oliver Hines Author-X-Name-First: Oliver Author-X-Name-Last: Hines Author-Name: Oliver Dukes Author-X-Name-First: Oliver Author-X-Name-Last: Dukes Author-Name: Karla Diaz-Ordaz Author-X-Name-First: Karla Author-X-Name-Last: Diaz-Ordaz Author-Name: Stijn Vansteelandt Author-X-Name-First: Stijn Author-X-Name-Last: Vansteelandt Title: Demystifying Statistical Learning Based on Efficient Influence Functions Abstract: Evaluation of treatment effects and more general estimands is typically achieved via parametric modeling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g., statistical/machine learning) is commonly employed to reduce the risk of misspecification.
Naïve use of such methods, however, delivers estimators whose bias may shrink too slowly with sample size for inferential methods to perform well, including those based on the bootstrap. Bias arises because standard data-adaptive methods are tuned toward minimal prediction error as opposed to, for example, minimal MSE in the estimator. This may cause excess variability that is difficult to acknowledge, due to the complexity of such strategies. Building on results from nonparametric statistics, targeted learning and debiased machine learning overcome these problems by constructing estimators using the estimand’s efficient influence function under the nonparametric model. These increasingly popular methodologies typically assume that the efficient influence function is given, or that the reader is familiar with its derivation. In this article, we focus on derivation of the efficient influence function and explain how it may be used to construct statistical/machine-learning-based estimators. We discuss the requisite conditions for these estimators to perform well and use diverse examples to convey the broad applicability of the theory. Journal: The American Statistician Pages: 292-304 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.2021984 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2021984 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:292-304 Template-Type: ReDIF-Article 1.0 Author-Name: Chanseok Park Author-X-Name-First: Chanseok Author-X-Name-Last: Park Author-Name: Kun Gou Author-X-Name-First: Kun Author-X-Name-Last: Gou Author-Name: Min Wang Author-X-Name-First: Min Author-X-Name-Last: Wang Title: A Study on Estimating the Parameter of the Truncated Geometric Distribution Abstract: We consider the truncated geometric distribution and analyze the condition under which a nontrivial maximum likelihood (ML) estimator of the parameter p exists. Additionally, the uniqueness criterion of such an ML estimator is also investigated. Our results indicate that in order to ensure the existence of a nontrivial ML estimator, the sample mean should be smaller than the midpoint of the two boundary positions. Without such a condition, the ML estimator will only exist trivially at p = 0. Finally, we demonstrate that the same condition is also required for the existence of the method of moments estimator. Our results lead to a rigorous understanding of the two estimators and aid in the interpretation of experimental designs that incorporate the truncated geometric distribution. Journal: The American Statistician Pages: 257-261 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2034666 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2034666 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:257-261 Template-Type: ReDIF-Article 1.0 Author-Name: Emilija Perković Author-X-Name-First: Emilija Author-X-Name-Last: Perković Title: Leadership in Statistics and Data Science: Planning for Inclusive Excellence Journal: The American Statistician Pages: 306-307 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2088201 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2088201 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:306-307 Template-Type: ReDIF-Article 1.0 Author-Name: Francis K. C.
Hui Author-X-Name-First: Francis K. C. Author-X-Name-Last: Hui Author-Name: Howard D. Bondell Author-X-Name-First: Howard D. Author-X-Name-Last: Bondell Title: Spatial Confounding in Generalized Estimating Equations Abstract: Spatial confounding, where the inclusion of a spatial random effect introduces multicollinearity with spatially structured covariates, is a contentious and active area of research in spatial statistics. However, the majority of research into this topic has focused on the case of spatial mixed models. In this article, we demonstrate that spatial confounding can also arise in the setting of generalized estimating equations (GEEs). The phenomenon occurs when a spatially structured working correlation matrix is used, as it effectively induces a spatial effect which may exhibit collinearity with the covariates in the marginal mean. As a result, the GEE ends up estimating a so-called unpartitioned effect of the covariates. To overcome spatial confounding, we propose a restricted spatial working correlation matrix that leads the GEE to instead estimate a partitioned covariate effect, which additionally captures the portion of spatial variability in the response spanned by the column space of the covariates. We also examine the construction of sandwich-based standard errors, showing that the issue of efficiency is tied to whether the working correlation matrix aligns with the target effect of interest. We conclude by highlighting the need for practitioners to make clear the assumptions and target of interest when applying GEEs in a spatial setting, and not simply rely on the robustness property of GEEs to misspecification of the working correlation matrix. Journal: The American Statistician Pages: 238-247 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.2009372 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2009372 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:238-247 Template-Type: ReDIF-Article 1.0 Author-Name: Maren Hackenberg Author-X-Name-First: Maren Author-X-Name-Last: Hackenberg Author-Name: Marlon Grodd Author-X-Name-First: Marlon Author-X-Name-Last: Grodd Author-Name: Clemens Kreutz Author-X-Name-First: Clemens Author-X-Name-Last: Kreutz Author-Name: Martina Fischer Author-X-Name-First: Martina Author-X-Name-Last: Fischer Author-Name: Janina Esins Author-X-Name-First: Janina Author-X-Name-Last: Esins Author-Name: Linus Grabenhenrich Author-X-Name-First: Linus Author-X-Name-Last: Grabenhenrich Author-Name: Christian Karagiannidis Author-X-Name-First: Christian Author-X-Name-Last: Karagiannidis Author-Name: Harald Binder Author-X-Name-First: Harald Author-X-Name-Last: Binder Title: Using Differentiable Programming for Flexible Statistical Modeling Abstract: Differentiable programming has recently received much interest as a paradigm that facilitates taking gradients of computer programs. While the corresponding flexible gradient-based optimization approaches so far have been used predominantly for deep learning or enriching the latter with modeling components, we want to demonstrate that they can also be useful for statistical modeling per se, for example, for quick prototyping when classical maximum likelihood approaches are challenging or not feasible. In an application from a COVID-19 setting, we use differentiable programming to quickly build and optimize a flexible prediction model adapted to the data quality challenges at hand. 
Specifically, we develop a regression model, inspired by delay differential equations, that can bridge temporal gaps of observations in the central German registry of COVID-19 intensive care cases for predicting future demand. With this exemplary modeling challenge, we illustrate how differentiable programming can enable simple gradient-based optimization of the model by automatic differentiation. This allowed us, under time pressure, to quickly prototype a model that outperforms simpler benchmark models. We thus exemplify the potential of differentiable programming beyond deep learning applications, providing more options for flexible applied statistical modeling. Journal: The American Statistician Pages: 270-279 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.2002189 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2002189 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:270-279 Template-Type: ReDIF-Article 1.0 Author-Name: David A. Harville Author-X-Name-First: David A. Author-X-Name-Last: Harville Title: Comment on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022) Journal: The American Statistician Pages: 308-309 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2074540 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2074540 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:308-309 Template-Type: ReDIF-Article 1.0 Author-Name: Katherine Allen-Moyer Author-X-Name-First: Katherine Author-X-Name-Last: Allen-Moyer Author-Name: Jonathan Stallrich Author-X-Name-First: Jonathan Author-X-Name-Last: Stallrich Title: Incorporating Minimum Variances into Weighted Optimality Criteria Abstract: Weighted optimality criteria allow an experimenter to express hierarchical interest across estimable functions through a concise weighting system. We show how such criteria can be implicitly influenced by the estimable functions’ minimum variances, leading to nonintuitive variance properties of the optimal designs. To address this, we propose a new optimality and evaluation approach that incorporates these minimum variances. A modified c-optimality criterion is introduced to calculate an estimable function’s minimum variance while requiring estimability of all other functions of interest. These minimum variances are then incorporated into a standardized weighted A-criterion that has an intuitive weighting system. We argue that optimal designs under this criterion tend to satisfy the conditions of a new design property we call weight adherence that sets appropriate expectations for how a given weighting system will influence variance properties. A practical, exploratory approach is then described for weighted optimal design generation and evaluation. Examples of the exploratory approach and weight adherence are provided for two types of factorial experiments. Journal: The American Statistician Pages: 262-269 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2021.1947375 File-URL: http://hdl.handle.net/10.1080/00031305.2021.1947375 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:262-269 Template-Type: ReDIF-Article 1.0 Author-Name: William E. Griffiths Author-X-Name-First: William E. Author-X-Name-Last: Griffiths Author-Name: R.
Carter Hill Author-X-Name-First: R. Author-X-Name-Last: Carter Hill Title: Rejoinder to Harville (2022) and Christensen (2022) Comments on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022) Journal: The American Statistician Pages: 312-312 Issue: 3 Volume: 76 Year: 2022 Month: 7 X-DOI: 10.1080/00031305.2022.2074542 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2074542 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:312-312 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2066725_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Juxin Liu Author-X-Name-First: Juxin Author-X-Name-Last: Liu Author-Name: Annshirley Afful Author-X-Name-First: Annshirley Author-X-Name-Last: Afful Author-Name: Holly Mansell Author-X-Name-First: Holly Author-X-Name-Last: Mansell Author-Name: Yanyuan Ma Author-X-Name-First: Yanyuan Author-X-Name-Last: Ma Title: Bias Analysis for Misclassification Errors in both the Response Variable and Covariate Abstract: Much literature has focused on statistical inference for misclassified response variables or misclassified covariates. However, misclassification in both the response variable and the covariate has received very limited attention within applied fields and the statistics community. In situations where the response variable and the covariate are simultaneously subject to misclassification errors, an assumption of independent misclassification errors is often used for convenience without justification. This article aims to show the harmful consequences of inappropriate adjustment for joint misclassification errors. In particular, we focus on the wrong adjustment that ignores the dependence between the misclassification processes of the response variable and the covariate. In this article, the dependence of misclassification in both variables is characterized by covariance-type parameters. We extend the original definition of dependence parameters to a more general setting. We discover a single quantity that governs the dependence of the two misclassification processes. Moreover, we propose likelihood ratio tests to check the nondifferential/independent misclassification assumption in main study/internal validation study designs. Our simulation studies indicate that ignoring the dependent error structure can be even worse than ignoring all the misclassification errors when the validation data size is relatively small. The methodology is illustrated by a real data example. Journal: The American Statistician Pages: 353-362 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2066725 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2066725 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:353-362 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2063944_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Paul R. Rosenbaum Author-X-Name-First: Paul R. Author-X-Name-Last: Rosenbaum Title: A New Transformation of Treated-Control Matched-Pair Differences for Graphical Display Abstract: A new transformation is proposed for treated-minus-control matched pair differences that leaves the center of their distribution untouched, but symmetrically and smoothly transforms and shortens the tails.
In this way, the center of the distribution is interpretable, undistorted and uncompressed, yet outliers are clear and distinct along the periphery. The transformation of pair differences, y↦ϱ(y), is strictly increasing, continuous, differentiable and odd, ϱ(−y)=−ϱ(y), so its action in the extreme upper tail mirrors its action in the extreme lower tail. Moreover, the center of the distribution—typically 90% or 95% of the distribution—is not transformed, with ϱ(y)=y for −β≤y≤β, yet the nonlinear transformation of the tails is barely perceptible as it begins at ±β, in the sense that 1=ϱ′(β)=ϱ′(−β), where ϱ′(·) is the derivative of ϱ(·). The transformation is applied to an observational study of the effect of light daily alcohol consumption on the level of HDL cholesterol. The study has three control groups intended to address specific unmeasured biases; so, several types of pair differences require coordinated depiction focused on unmeasured bias, not outliers. An R package tailTransform implements the method, contains the data, and reproduces aspects of the graphs and data analysis. Journal: The American Statistician Pages: 346-352 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2063944 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2063944 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:346-352 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2126685_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Jie Cui Author-X-Name-First: Jie Author-X-Name-Last: Cui Author-Name: Haoda Fu Author-X-Name-First: Haoda Author-X-Name-Last: Fu Title: Statistical Issues in Drug Development, 3rd ed. Journal: The American Statistician Pages: 431-431 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2126685 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2126685 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:431-431 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2089232_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Mine Dogucu Author-X-Name-First: Mine Author-X-Name-Last: Dogucu Author-Name: Jingchen Hu Author-X-Name-First: Jingchen Author-X-Name-Last: Hu Title: The Current State of Undergraduate Bayesian Education and Recommendations for the Future Abstract: As a result of the increased emphasis on mis- and over-use of p-values in scientific research and the rise in popularity of Bayesian statistics, Bayesian education is becoming more important at the undergraduate level. With the advances in computing tools, Bayesian statistics is also becoming more accessible for undergraduates. This study focuses on analyzing Bayesian courses for undergraduates. We explored whether an undergraduate Bayesian course is offered in our sample of 152 high-ranking research universities and liberal arts colleges. For each identified Bayesian course, we examined how it fits into the institution’s undergraduate curricula, such as majors and prerequisites. Through a series of course syllabi analyses, we explored the topics covered and their popularity in these courses, and the adopted teaching and learning tools, such as software. This article presents our findings on the current practices of teaching full Bayesian courses at the undergraduate level.
Based on our findings, we provide recommendations for programs that may consider offering Bayesian courses to their students. Journal: The American Statistician Pages: 405-413 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2089232 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2089232 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:405-413 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2126684_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Scott A. Roths Author-X-Name-First: Scott A. Author-X-Name-Last: Roths Title: Probability, Statistics, and Data: A Fresh Approach Using R Journal: The American Statistician Pages: 430-430 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2126684 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2126684 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:430-430 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2107568_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Andrew J. Sage Author-X-Name-First: Andrew J. Author-X-Name-Last: Sage Author-Name: Yang Liu Author-X-Name-First: Yang Author-X-Name-Last: Liu Author-Name: Joe Sato Author-X-Name-First: Joe Author-X-Name-Last: Sato Title: From Black Box to Shining Spotlight: Using Random Forest Prediction Intervals to Illuminate the Impact of Assumptions in Linear Regression Abstract: We introduce a pair of Shiny web applications that allow users to visualize random forest prediction intervals alongside those produced by linear regression models. The apps are designed to help undergraduate students deepen their understanding of the role that assumptions play in statistical modeling by comparing and contrasting intervals produced by regression models with those produced by more flexible algorithmic techniques. We describe the mechanics of each approach, illustrate the features of the apps, provide examples highlighting the insights students can gain through their use, and discuss our experience implementing them in an undergraduate class. We argue that, contrary to their reputation as a black box, random forests can be used as a spotlight, for educational purposes, illuminating the role of assumptions in regression models and their impact on the shape, width, and coverage rates of prediction intervals. Journal: The American Statistician Pages: 414-429 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2107568 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2107568 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:414-429 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2055644_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. 
(2022) Journal: The American Statistician Pages: 322-322 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2055644 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2055644 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:322-322 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2046159_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: James A. Hanley Author-X-Name-First: James A. Author-X-Name-Last: Hanley Author-Name: Sahir Bhatnagar Author-X-Name-First: Sahir Author-X-Name-Last: Bhatnagar Title: The “Poisson” Distribution: History, Reenactments, Adaptations Abstract: Although it is a widely used—and misused—discrete distribution, textbooks tend to give the history of the Poisson distribution short shrift, typically deriving it in the abstract as a limiting case of a binomial. The biological and physical scientists who independently derived it using space and time considerations and used it in their work are seldom mentioned. Nor are the difficulties of applying it to counts involving human activities/behavior. We (a) sketch the early history of the Poisson distribution, (b) illustrate principles of the Poisson distribution involving space and time using the original biological and physical applications, as well as modern multimedia reenactments of them, and (c) motivate count distributions accounting for extra-Poisson variation. The replayed historical applications can help today’s students, teachers and practitioners to see or hear what randomness looks or sounds like, to get practice in the practicalities of “counting statistics,” to distinguish situations where the pure Poisson distribution does and doesn’t hold—and to think about what one might do when it doesn’t. Journal: The American Statistician Pages: 363-371 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2046159 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2046159 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:363-371 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2076743_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Stephen Portnoy Author-X-Name-First: Stephen Author-X-Name-Last: Portnoy Title: Linearity of Unbiased Linear Model Estimators Abstract: Best linear unbiased estimators (BLUEs) are known to be optimal in many respects under normal assumptions. Since variance minimization doesn’t depend on normality and unbiasedness is often considered reasonable, many statisticians have felt that BLUEs ought to perform relatively well in some generality. The result here considers the general linear model and shows that any measurable estimator that is unbiased over a moderately large family of distributions must be linear. Thus, imposing unbiasedness cannot offer any improvement over imposing linearity. The problem was suggested by Hansen, who showed that any estimator unbiased for nearly all error distributions (with finite covariance) must have a variance no smaller than that of the best linear estimator in some parametric subfamily. Specifically, the hypothesis of linearity can be dropped from the classical Gauss–Markov Theorem.
This might suggest that the best unbiased estimator should provide superior performance, but the result here shows that the best unbiased regression estimator can be no better than the best linear estimator. Journal: The American Statistician Pages: 372-375 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2076743 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2076743 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:372-375 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2096695_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Bradley Lubich Author-X-Name-First: Bradley Author-X-Name-Last: Lubich Author-Name: Daniel Jeske Author-X-Name-First: Daniel Author-X-Name-Last: Jeske Author-Name: Weixin Yao Author-X-Name-First: Weixin Author-X-Name-Last: Yao Title: Statistical Inference for Method of Moments Estimators of a Semi-Supervised Two-Component Mixture Model Abstract: A mixture of a distribution of responses from untreated patients and a shift of that distribution is a useful model for the responses from a group of treated patients. The mixture model accounts for the fact that not all the patients in the treated group will respond to the treatment and consequently their responses follow the same distribution as the responses from untreated patients. The treatment effect in this context consists of both the fraction of the treated patients that are responders and the magnitude of the shift in the distribution for the responders. In this article, we investigate asymptotic properties of method of moments estimators for the treatment effect based on a semi-supervised two-component mixture model. From these properties, we develop asymptotic confidence intervals and demonstrate their superior statistical inference performance compared to the computationally intensive bootstrap intervals and their bias-corrected versions. Journal: The American Statistician Pages: 376-383 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2096695 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2096695 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:376-383 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2041482_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Andee Kaplan Author-X-Name-First: Andee Author-X-Name-Last: Kaplan Author-Name: Brenda Betancourt Author-X-Name-First: Brenda Author-X-Name-Last: Betancourt Author-Name: Rebecca C. Steorts Author-X-Name-First: Rebecca C. Author-X-Name-Last: Steorts Title: A Practical Approach to Proper Inference with Linked Data Abstract: Entity resolution (ER), comprising record linkage and deduplication, is the process of merging noisy databases in the absence of unique identifiers to remove duplicate entities. One major challenge of analysis with linked data is identifying a representative record among determined matches to pass to an inferential or predictive task, referred to as the downstream task. Additionally, incorporating uncertainty from ER in the downstream task is critical to ensure proper inference.
To bridge the gap between ER and the downstream task in an analysis pipeline, we propose five methods to choose a representative (or canonical) record from linked data, referred to as canonicalization. Our methods are scalable in the number of records, appropriate in general data scenarios, and provide natural error propagation via a Bayesian canonicalization stage. The proposed methodology is evaluated on three simulated datasets and one application – determining the relationship between demographic information and party affiliation in voter registration data from the North Carolina State Board of Elections. We first perform Bayesian ER and evaluate our proposed methods for canonicalization before considering the downstream tasks of linear and logistic regression. Bayesian canonicalization methods are empirically shown to improve downstream inference in both settings through prediction and coverage. Journal: The American Statistician Pages: 384-393 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2041482 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2041482 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:384-393 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2006781_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Yao Li Author-X-Name-First: Yao Author-X-Name-Last: Li Author-Name: Minhao Cheng Author-X-Name-First: Minhao Author-X-Name-Last: Cheng Author-Name: Cho-Jui Hsieh Author-X-Name-First: Cho-Jui Author-X-Name-Last: Hsieh Author-Name: Thomas C. M. Lee Author-X-Name-First: Thomas C. M. Author-X-Name-Last: Lee Title: A Review of Adversarial Attack and Defense for Classification Methods Abstract: Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially Deep Neural Networks (DNNs), are vulnerable to adversarial examples; that is, examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. and Szegedy et al., much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against such examples. This article aims to introduce this topic and its latest developments to the statistical community, primarily focusing on the generation and guarding of adversarial examples. Computing codes (in Python and R) used in the numerical experiments are publicly available for readers to explore the surveyed methods. It is the hope of the authors that this article will encourage more statisticians to work in this important and exciting field of generating and defending against adversarial examples. Journal: The American Statistician Pages: 329-345 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2021.2006781 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2006781 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:329-345 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2051604_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Matteo Bottai Author-X-Name-First: Matteo Author-X-Name-Last: Bottai Author-Name: Taeho Kim Author-X-Name-First: Taeho Author-X-Name-Last: Kim Author-Name: Benjamin Lieberman Author-X-Name-First: Benjamin Author-X-Name-Last: Lieberman Author-Name: George Luta Author-X-Name-First: George Author-X-Name-Last: Luta Author-Name: Edsel Peña Author-X-Name-First: Edsel Author-X-Name-Last: Peña Title: On Optimal Correlation-Based Prediction Abstract: This note examines, at the population level, the approach of obtaining predictors h̃(X) of a random variable Y, given the joint distribution of (Y,X), by maximizing the mapping h↦κ(Y,h(X)) for a given correlation function κ(·,·). Commencing with Pearson’s correlation function, the class of such predictors is uncountably infinite. The least-squares predictor h* is an element of this class, obtained by requiring the expectations of Y and h(X) to be equal and the variances of h(X) and E(Y|X) to be equal as well. On the other hand, replacing the second condition by the equality of the variances of Y and h(X), a natural requirement for some calibration problems, the unique predictor h** that is obtained has the maximum value of Lin’s (1989) concordance correlation coefficient (CCC) with Y among all predictors. Since the CCC measures the degree of agreement, the new predictor h** is called the maximal agreement predictor. These predictors are illustrated for three special distributions: the multivariate normal distribution; the exponential distribution, conditional on covariates; and the Dirichlet distribution. The exponential distribution is relevant in survival analysis or in reliability settings, while the Dirichlet distribution is relevant for compositional data. Journal: The American Statistician Pages: 313-321 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2051604 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2051604 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:313-321 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2006780_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Bo Liu Author-X-Name-First: Bo Author-X-Name-Last: Liu Author-Name: Jerome P. Reiter Author-X-Name-First: Jerome P. Author-X-Name-Last: Reiter Title: Multiple Imputation Inference with Integer-Valued Point Estimates Abstract: We consider settings where an analyst of multiply imputed data desires an integer-valued point estimate and an associated interval estimate, for example, a count of the number of individuals with certain characteristics in a population. Even when the point estimate in each completed dataset is an integer, the multiple imputation point estimator, that is, the average of these completed-data estimators, is not guaranteed to be an integer. One natural approach is to round the standard multiple imputation point estimator to an integer. Another seemingly natural approach is to use the median of the completed-data point estimates (when they are integers). However, these two approaches have not been compared; indeed, methods for obtaining multiple imputation inferences associated with the median of the completed-data point estimates do not even exist.
In this article, we evaluate and compare these two approaches. In doing so, we derive an estimator of the variance of the median-based multiple imputation point estimator, as well as a method for obtaining associated multiple imputation confidence intervals. Using simulation studies, we show that both methods can offer well-calibrated coverage rates and have similar repeated sampling properties, and hence are both useful for this analysis task. Journal: The American Statistician Pages: 323-328 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2021.2006780 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2006780 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:323-328 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2054859_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Per Johansson Author-X-Name-First: Per Author-X-Name-Last: Johansson Author-Name: Mattias Nordin Author-X-Name-First: Mattias Author-X-Name-Last: Nordin Title: Inference in Experiments Conditional on Observed Imbalances in Covariates Abstract: Double-blind randomized controlled trials are traditionally seen as the gold standard for causal inferences as the difference-in-means estimator is an unbiased estimator of the average treatment effect in the experiment. The fact that this estimator is unbiased over all possible randomizations does not, however, mean that any given estimate is close to the true treatment effect. Similarly, while predetermined covariates will be balanced between treatment and control groups on average, large imbalances may be observed in a given experiment and the researcher may therefore want to condition on such covariates using linear regression. This article studies the theoretical properties of both the difference-in-means and OLS estimators conditional on observed differences in covariates. By deriving the statistical properties of the conditional estimators, we can establish guidance for how to deal with covariate imbalances. Journal: The American Statistician Pages: 394-404 Issue: 4 Volume: 76 Year: 2022 Month: 10 X-DOI: 10.1080/00031305.2022.2054859 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2054859 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:394-404 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2070279_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Yuxin Qin Author-X-Name-First: Yuxin Author-X-Name-Last: Qin Author-Name: Heather Sasinowska Author-X-Name-First: Heather Author-X-Name-Last: Sasinowska Author-Name: Lawrence Leemis Author-X-Name-First: Lawrence Author-X-Name-Last: Leemis Title: The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator Abstract: Kaplan and Meier’s 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size and calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times. 
Journal: The American Statistician Pages: 102-110 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2070279 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2070279 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:102-110 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2028675_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Joris Mulder Author-X-Name-First: Joris Author-X-Name-Last: Mulder Title: Bayesian Testing of Linear Versus Nonlinear Effects Using Gaussian Process Priors Abstract: A Bayes factor is proposed for testing whether the effect of a key predictor variable on a dependent variable is linear or nonlinear, possibly while controlling for certain covariates. The test can be used (i) in substantive research for assessing the nature of the relationship between certain variables based on scientific expectations, and (ii) for statistical model building to infer whether a (transformed) variable should be added as a linear or nonlinear predictor in a regression model. Under the nonlinear model, a Gaussian process prior is employed using a parameterization similar to Zellner’s g prior, resulting in a scale-invariant test. Unlike existing p-values, the proposed Bayes factor can be used for quantifying the relative evidence in the data in favor of linearity. Furthermore, the Bayes factor does not overestimate the evidence against the linear null model, resulting in more parsimonious models. An extension is proposed for Bayesian one-sided testing of whether a nonlinear effect is consistently positive, consistently negative, or neither. Applications are provided from various fields including social network research and education. Journal: The American Statistician Pages: 1-11 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2028675 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2028675 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:1-11 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2110938_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Alan D. Hutson Author-X-Name-First: Alan D. Author-X-Name-Last: Hutson Author-Name: Han Yu Author-X-Name-First: Han Author-X-Name-Last: Yu Title: The Sign Test, Paired Data, and Asymmetric Dependence: A Cautionary Tale Abstract: In the paired data setting, the sign test is often described in statistical textbooks as a test for comparing differences between the medians of two marginal distributions. There is an implicit assumption that the median of the differences is equivalent to the difference of the medians when employing the sign test in this fashion. We demonstrate, however, that given asymmetry in the bivariate distribution of the paired data, there are often scenarios where the median of the differences is not equal to the difference of the medians. Further, we show that these scenarios will lead to a false interpretation of the sign test for its intended use in the paired data setting. We illustrate the false-interpretation concept via theory, a simulation study, and through a real-world example based on breast cancer RNA sequencing data obtained from the Cancer Genome Atlas (TCGA).
Journal: The American Statistician Pages: 35-40 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2110938 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2110938 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:35-40 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2058611_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Vojtech Kejzlar Author-X-Name-First: Vojtech Author-X-Name-Last: Kejzlar Author-Name: Shrijita Bhattacharya Author-X-Name-First: Shrijita Author-X-Name-Last: Bhattacharya Author-Name: Mookyong Son Author-X-Name-First: Mookyong Author-X-Name-Last: Son Author-Name: Tapabrata Maiti Author-X-Name-First: Tapabrata Author-X-Name-Last: Maiti Title: Black Box Variational Bayesian Model Averaging Abstract: For many decades now, Bayesian Model Averaging (BMA) has been a popular framework to systematically account for model uncertainty that arises in situations where multiple competing models are available to describe the same or similar physical process. The implementation of this framework, however, comes with a multitude of practical challenges, including posterior approximation via Markov chain Monte Carlo and numerical integration. We present a Variational Bayesian Inference approach to BMA as a viable alternative to the standard solutions, one that avoids many of the aforementioned pitfalls. The proposed method is “black box” in the sense that it can be readily applied to many models with little to no model-specific derivation. We illustrate the utility of our variational approach on a suite of examples and discuss all the necessary implementation details. Fully documented Python code with all the examples is provided as well. Journal: The American Statistician Pages: 85-96 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2058611 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2058611 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:85-96 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2046160_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Zifei Han Author-X-Name-First: Zifei Author-X-Name-Last: Han Author-Name: Keying Ye Author-X-Name-First: Keying Author-X-Name-Last: Ye Author-Name: Min Wang Author-X-Name-First: Min Author-X-Name-Last: Wang Title: A Study on the Power Parameter in Power Prior Bayesian Analysis Abstract: The power prior and its variations have been proven to be a useful class of informative priors in Bayesian inference due to their flexibility in incorporating the historical information by raising the likelihood of the historical data to a fractional power δ. The derivation of the marginal likelihood based on the original power prior, and its variation, the normalized power prior, introduces a scaling factor C(δ) in the form of a prior predictive distribution with powered likelihood. In this article, we show that the scaling factor might be infinite for some positive δ with conventionally used initial priors, which would change the admissible set of the power parameter. This result seems to have been almost completely ignored in the literature.
We then illustrate that such a phenomenon may jeopardize the posterior inference under the power priors when the initial prior of the model parameters is improper. The main findings of this article suggest that special attention should be paid when the suggested level of borrowing is close to 0, while the actual optimum might be below the suggested value. We use a normal linear model as an example for illustrative purposes. Journal: The American Statistician Pages: 12-19 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2046160 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2046160 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:12-19 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2141879_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Stan Lipovetsky Author-X-Name-First: Stan Author-X-Name-Last: Lipovetsky Title: Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. (2022) Journal: The American Statistician Pages: 113-113 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2141879 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141879 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:113-113 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2160590_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: James O. Ramsay Author-X-Name-First: James O. Author-X-Name-Last: Ramsay Title: Object Oriented Data Analysis Journal: The American Statistician Pages: 111-111 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2160590 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2160590 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:111-111 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2106305_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Alan Huang Author-X-Name-First: Alan Author-X-Name-Last: Huang Title: On Arbitrarily Underdispersed Discrete Distributions Abstract: We survey a range of popular generalized count distributions, investigating which (if any) can be arbitrarily underdispersed, that is, whose variance can be arbitrarily small compared to its mean. A philosophical implication is that some models failing this simple criterion should not be considered as “statistical models” according to McCullagh’s extendibility criterion. Four practical implications are also discussed: (i) functional independence of parameters, (ii) double generalized linear models, (iii) simulation of underdispersed counts, and (iv) severely underdispersed count regression. We suggest that all future generalizations of the Poisson distribution be tested against this key property. Journal: The American Statistician Pages: 29-34 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2106305 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2106305 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:29-34 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2026478_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Dan J. Spitzner Author-X-Name-First: Dan J. Author-X-Name-Last: Spitzner Title: A Statistical Basis for Reporting Strength of Evidence as Pool Reduction Abstract: This article establishes a statistical basis for an evidence-reporting strategy that interprets strength of evidence in terms of a reduction in the size of a pool of relevant conceptual objects. The strategy is motivated by debates in forensic science, wherein the pool would consist of sources of forensic material. An advantage of using the pool-reduction strategy is that it highlights uncertainty that cannot be resolved by empirical considerations. It is shown mathematically to reflect a nonstandard formulation of a Bayes factor, and to extend for use in problems of general quantitative inference. A number of conventions are proposed for full effectiveness of the strategy’s implementation in practice. Journal: The American Statistician Pages: 62-71 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2026478 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2026478 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:62-71 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2058612_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Andrew Gelman Author-X-Name-First: Andrew Author-X-Name-Last: Gelman Title: “Two Truths and a Lie” as a Class-Participation Activity Abstract: We adapt the social game “Two truths and a lie” to a classroom setting to give an activity that introduces principles of statistical measurement, uncertainty, prediction, and calibration, while giving students an opportunity to meet each other. We discuss how this activity can be used in a range of different statistics courses. Journal: The American Statistician Pages: 97-101 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2058612 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2058612 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:97-101 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2110939_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Hadrien Charvat Author-X-Name-First: Hadrien Author-X-Name-Last: Charvat Title: Using the Lambert Function to Estimate Shared Frailty Models with a Normally Distributed Random Intercept Abstract: Shared frailty models, that is, hazard regression models for censored data including random effects acting multiplicatively on the hazard, are commonly used to analyze time-to-event data possessing a hierarchical structure. When the random effects are assumed to be normally distributed, the cluster-specific marginal likelihood has no closed-form expression. A powerful method for approximating such integrals is the adaptive Gauss-Hermite quadrature (AGHQ). However, this method requires the estimation of the mode of the integrand in the expression defining the cluster-specific marginal likelihood: it is generally obtained through a nested optimization at the cluster level for each evaluation of the likelihood function. 
In this work, we show that in the case of a parametric shared frailty model including a normal random intercept, the cluster-specific modes can be determined analytically by using the principal branch of the Lambert function, W0. Besides removing the need for the nested optimization procedure, it provides closed-form formulas for the gradient and Hessian of the approximated likelihood, making its maximization by Newton-type algorithms convenient and efficient. The Lambert-based AGHQ (LAGHQ) might be applied to other problems involving similar integrals, such as the normally distributed random intercept Poisson model and the computation of probabilities from a Poisson lognormal distribution. Journal: The American Statistician Pages: 41-50 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2110939 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2110939 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:41-50 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2160592_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Huan Wang Author-X-Name-First: Huan Author-X-Name-Last: Wang Title: Quantitative Drug Safety and Benefit-Risk Evaluation: Practical and Cross-Disciplinary Approaches Journal: The American Statistician Pages: 111-112 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2160592 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2160592 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:111-112 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2050299_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Spencer Hansen Author-X-Name-First: Spencer Author-X-Name-Last: Hansen Author-Name: Ken Rice Author-X-Name-First: Ken Author-X-Name-Last: Rice Title: Coherent Tests for Interval Null Hypotheses Abstract: In a celebrated 1996 article, Schervish showed that, for testing interval null hypotheses, tests typically viewed as optimal can be logically incoherent. Specifically, one may fail to reject a specific interval null, but nevertheless—testing at the same level with the same data—reject a larger null, in which the original one is nested. This result has been used to argue against the widespread practice of viewing p-values as measures of evidence. In the current work we approach tests of interval nulls using simple Bayesian decision theory, and establish straightforward conditions that ensure coherence in Schervish’s sense. From these, we go on to establish novel frequentist criteria—different to Type I error rate—that, when controlled at fixed levels, give tests that are coherent in Schervish’s sense. The results suggest that exploring frequentist properties beyond the familiar Neyman–Pearson framework may ameliorate some of statistical testing’s well-known problems.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:20-28 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2051605_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Rachael C. Aikens Author-X-Name-First: Rachael C. Author-X-Name-Last: Aikens Author-Name: Michael Baiocchi Author-X-Name-First: Michael Author-X-Name-Last: Baiocchi Title: Assignment-Control Plots: A Visual Companion for Causal Inference Study Design Abstract: An important step for any causal inference study design is understanding the distribution of the subjects in terms of measured baseline covariates. However, not all baseline variation is equally important. We propose a set of visualizations that reduce the space of measured covariates into two components of baseline variation important to the design of an observational causal inference study: a propensity score summarizing baseline variation associated with treatment assignment and a prognostic score summarizing baseline variation associated with the untreated potential outcome. These assignment-control plots and variations thereof visualize study design tradeoffs and illustrate core methodological concepts in causal inference. As a practical demonstration, we apply assignment-control plots to a hypothetical study of cardiothoracic surgery. To demonstrate how these plots can be used to illustrate nuanced concepts, we use them to visualize unmeasured confounding and to consider the relationship between propensity scores and instrumental variables. While the family of visualization tools for studies of causality is relatively sparse, simple visual tools can be an asset to education, application, and methods development. Journal: The American Statistician Pages: 72-84 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2022.2051605 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2051605 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:72-84 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2023633_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949 Author-Name: Jeroen de Mast Author-X-Name-First: Jeroen Author-X-Name-Last: de Mast Author-Name: Stefan H. Steiner Author-X-Name-First: Stefan H. Author-X-Name-Last: Steiner Author-Name: Wim P. M. Nuijten Author-X-Name-First: Wim P. M. Author-X-Name-Last: Nuijten Author-Name: Daniel Kapitan Author-X-Name-First: Daniel Author-X-Name-Last: Kapitan Title: Analytical Problem Solving Based on Causal, Correlational and Deductive Models Abstract: Many approaches for solving problems in business and industry are based on analytics and statistical modeling. Analytical problem solving is driven by the modeling of relationships between dependent (Y) and independent (X) variables, and we discuss three frameworks for modeling such relationships: cause-and-effect modeling, popular in applied statistics and beyond, correlational predictive modeling, popular in machine learning, and deductive (first-principles) modeling, popular in business analytics and operations research. We aim to explain the differences between these types of models, and flesh out the implications of these differences for study design, for discovering potential X/Y relationships, and for the types of solution patterns that each type of modeling could support. 
We use our account to clarify the popular descriptive-diagnostic-predictive-prescriptive analytics framework, but extend it to offer a more complete model of the process of analytical problem solving, reflecting the essential differences between causal, correlational, and deductive models. Journal: The American Statistician Pages: 51-61 Issue: 1 Volume: 77 Year: 2023 Month: 1 X-DOI: 10.1080/00031305.2021.2023633 File-URL: http://hdl.handle.net/10.1080/00031305.2021.2023633 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:51-61 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2115552_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Michael Grabchak Author-X-Name-First: Michael Author-X-Name-Last: Grabchak Title: How Do We Perform a Paired t-Test When We Don’t Know How to Pair? Abstract: We address the question of how to perform a paired t-test in situations where we do not know how to pair the data. Specifically, we discuss approaches for bounding the test statistic of the paired t-test in a way that allows us to recover the results of this test in some cases. We also discuss the relationship between the paired t-test and the independent samples t-test and what happens if we use the latter to approximate the former. Our findings are informed by both theoretical results and a simulation study. Journal: The American Statistician Pages: 127-133 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2115552 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2115552 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:127-133 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2129787_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Mauricio Tec Author-X-Name-First: Mauricio Author-X-Name-Last: Tec Author-Name: Yunshan Duan Author-X-Name-First: Yunshan Author-X-Name-Last: Duan Author-Name: Peter Müller Author-X-Name-First: Peter Author-X-Name-Last: Müller Title: A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning Abstract: Reinforcement learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has been an increasing interest in RL techniques for healthcare applications. We introduce two related applications as motivating examples. In both applications, the sequential nature of the decisions is restricted to sequential stopping. Rather than offering a comprehensive survey, we focus the discussion on solutions using standard tools for these two relatively simple sequential stopping problems. Both problems are inspired by adaptive clinical trial design. We use examples to explain the terminology and mathematical background that underlie each framework and map one to the other. The implementations and results illustrate the many similarities between RL and BSD. The results motivate the discussion of the potential strengths and limitations of each approach.
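Because both motivating applications in the abstract above reduce to sequential stopping, a small worked example may help fix ideas. The R sketch below implements a generic Bayesian sequential stopping rule for a single-arm binary-outcome trial; the Beta(1,1) prior, the 0.3 reference response rate, the 0.95/0.05 thresholds, and the cohort size are illustrative assumptions of ours, not the authors' setup.

# Enroll in cohorts; after each cohort, stop for efficacy if the posterior
# probability that the response rate exceeds 0.3 is above 0.95, and stop for
# futility if it falls below 0.05; otherwise continue up to n_max patients.
run_trial <- function(theta_true, n_max = 50, cohort = 5) {
  y <- 0; n <- 0
  while (n < n_max) {
    y <- y + rbinom(1, cohort, theta_true)
    n <- n + cohort
    p_eff <- pbeta(0.3, 1 + y, 1 + n - y, lower.tail = FALSE)  # Pr(theta > 0.3 | data)
    if (p_eff > 0.95) return(list(n = n, decision = "efficacy"))
    if (p_eff < 0.05) return(list(n = n, decision = "futility"))
  }
  list(n = n, decision = "inconclusive")
}
set.seed(1)
run_trial(theta_true = 0.45)  # often stops early for efficacy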
Journal: The American Statistician Pages: 223-233 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2129787 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2129787 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:223-233 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2128421_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Jay Bartroff Author-X-Name-First: Jay Author-X-Name-Last: Bartroff Author-Name: Gary Lorden Author-X-Name-First: Gary Author-X-Name-Last: Lorden Author-Name: Lijia Wang Author-X-Name-First: Lijia Author-X-Name-Last: Wang Title: Optimal and Fast Confidence Intervals for Hypergeometric Successes Abstract: We present an efficient method of calculating exact confidence intervals for the hypergeometric parameter representing the number of “successes,” or “special items,” in the population. The method inverts minimum-width acceptance intervals after shifting them to make their endpoints nondecreasing while preserving their level. The resulting set of confidence intervals achieves minimum possible average size, and even in comparison with confidence sets not required to be intervals, it attains the minimum possible cardinality most of the time, and always within 1. The method compares favorably with existing methods not only in the size of the intervals but also in the time required to compute them. The available R package hyperMCI implements the proposed method. Journal: The American Statistician Pages: 151-159 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2128421 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2128421 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:151-159 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2127896_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Monnie McGee Author-X-Name-First: Monnie Author-X-Name-Last: McGee Author-Name: Benjamin Williams Author-X-Name-First: Benjamin Author-X-Name-Last: Williams Author-Name: Jacy Sparks Author-X-Name-First: Jacy Author-X-Name-Last: Sparks Title: Athlete Recruitment and the Myth of the Sophomore Peak Abstract: Conventional wisdom dispensed by fans and coaches in the stands at almost any high school track meet suggests that female athletes, particularly distance runners, typically peak around 10th grade or earlier (15 years of age), while male athletes continuously improve. Given that universities in the United States typically recruit track and field athletes from high school teams, it is important to understand the age of peak performance at the high school level. Athletes are often recruited starting in their sophomore year of high school and individuals develop at different rates during adolescence; however, the individual development factor is usually not taken into account during recruitment. In this study, we curate data on event times for high school track and field athletes from the years 2011 to 2019 to determine the trajectory of fastest times for male and female athletes in the 200m, 400m, 800m, and 1600m races. We show, through visualizations and models, that, for most athletes, the sophomore peak is a myth. Performance is mostly dependent on the individual athlete.
That said, the trajectories cluster into four or five types, depending on the race distance. We explain the significance of the types for future recruitment. Journal: The American Statistician Pages: 182-191 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2127896 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2127896 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:182-191 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2131625_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Xin Xiong Author-X-Name-First: Xin Author-X-Name-Last: Xiong Author-Name: Ivor Cribben Author-X-Name-First: Ivor Author-X-Name-Last: Cribben Title: The State of Play of Reproducibility in Statistics: An Empirical Analysis Abstract: Reproducibility, the ability to reproduce the results of published papers or studies using their computer code and data, is a cornerstone of reliable scientific methodology. Studies where results cannot be reproduced by the scientific community should be treated with caution. Over the past decade, the importance of reproducible research has been frequently stressed in a wide range of scientific journals such as Nature and Science and international magazines such as The Economist. However, multiple studies have demonstrated that scientific results are often not reproducible across research areas such as psychology and medicine. Statistics, the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data, prides itself on its openness when it comes to sharing both computer code and data. In this article, we examine reproducibility in the field of statistics by attempting to reproduce the results in 93 published papers in prominent journals using functional magnetic resonance imaging (fMRI) data during the 2010–2021 period. Overall, from both the computer code and the data perspective, among all 93 examined papers we could reproduce the results in only 14 (15.1%); that is, those papers provided executable computer code (or software) together with the real fMRI data, and our results matched the results reported in the paper. Finally, we conclude with some author-specific and journal-specific recommendations to improve research reproducibility in statistics. Journal: The American Statistician Pages: 115-126 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2131625 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2131625 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:115-126 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2128874_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Chris Rohlfs Author-X-Name-First: Chris Author-X-Name-Last: Rohlfs Title: Forbidden Knowledge and Specialized Training: A Versatile Solution for the Two Main Sources of Overfitting in Linear Regression Abstract: Overfitting in linear regression is broken down into two main causes. First, the formula for the estimator includes “forbidden knowledge” about training observations’ residuals, and it loses this advantage when deployed out-of-sample.
Second, the estimator has “specialized training” that makes it particularly capable of explaining movements in the predictors that are idiosyncratic to the training sample. An out-of-sample counterpart is introduced to the popular “leverage” measure of training observations’ importance. A new method is proposed to forecast out-of-sample fit at the time of deployment, when the values for the predictors are known but the true outcome variable is not. In Monte Carlo simulations and in an empirical application using MRI brain scans, the proposed estimator performs comparably to Predicted Residual Error Sum of Squares (PRESS) for the average out-of-sample case and, unlike PRESS, also performs consistently across different test samples, even those that differ substantially from the training set. Journal: The American Statistician Pages: 160-168 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2128874 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2128874 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:160-168 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2198354_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Jae-Kwang Kim Author-X-Name-First: Jae-Kwang Author-X-Name-Last: Kim Title: Graph Sampling Journal: The American Statistician Pages: 234-234 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2023.2198354 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2198354 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:234-234 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2087734_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Andee Kaplan Author-X-Name-First: Andee Author-X-Name-Last: Kaplan Author-Name: Jacob Bien Author-X-Name-First: Jacob Author-X-Name-Last: Bien Title: Interactive Exploration of Large Dendrograms with Prototypes Abstract: Hierarchical clustering is one of the standard methods taught for identifying and exploring the underlying structures that may be present within a dataset. Students are shown examples in which the dendrogram, a visual representation of the hierarchical clustering, reveals a clear clustering structure. However, in practice, data analysts today frequently encounter datasets whose large scale undermines the usefulness of the dendrogram as a visualization tool. Densely packed branches obscure structure, and overlapping labels are impossible to read. In this article we present a new workflow for performing hierarchical clustering via the R package protoshiny, which aims to restore hierarchical clustering to its former role as an effective and versatile visualization tool. Our proposal leverages interactivity combined with the ability to label internal nodes in a dendrogram with a representative data point (called a prototype). After presenting the workflow, we provide three case studies to demonstrate its utility. Journal: The American Statistician Pages: 201-211 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2087734 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2087734 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:201-211 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2184423_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: The Editors Title: Correction: Linearity of Unbiased Linear Model Estimators Journal: The American Statistician Pages: 237-237 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2023.2184423 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2184423 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:237-237 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2141858_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Roy Bower Author-X-Name-First: Roy Author-X-Name-Last: Bower Author-Name: Justin Hager Author-X-Name-First: Justin Author-X-Name-Last: Hager Author-Name: Chris Cherniakov Author-X-Name-First: Chris Author-X-Name-Last: Cherniakov Author-Name: Samay Gupta Author-X-Name-First: Samay Author-X-Name-Last: Gupta Author-Name: William Cipolli Author-X-Name-First: William Author-X-Name-Last: Cipolli Title: A Case for Nonparametrics Abstract: We provide a case study for motivating and teaching nonparametric statistical inference alongside traditional parametric approaches. The case consists of analyses by Bracht et al. who use analysis of variance (ANOVA) to assess the applicability of the human microfibrillar-associated protein 4 (MFAP4) as a biomarker for hepatic fibrosis in hepatitis C patients. We revisit their analyses and consider two nonparametric approaches: Mood’s median test and the Kruskal-Wallis test. We demonstrate how this case study enables instructors to discuss critical assumptions of parametric procedures while comparing and contrasting the results of multiple approaches. Interestingly, only one of the three approaches creates groupings that match the treatment recommendations of the European Association for the Study of the Liver (EASL). We provide guidance and resources to aid instructors in directing their students through this case study at various levels, including R code and novel R shiny applications for conducting the analyses in the classroom. Journal: The American Statistician Pages: 212-219 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2141858 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141858 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:212-219 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2116109_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Mingya Long Author-X-Name-First: Mingya Author-X-Name-Last: Long Author-Name: Zhengbang Li Author-X-Name-First: Zhengbang Author-X-Name-Last: Li Author-Name: Wei Zhang Author-X-Name-First: Wei Author-X-Name-Last: Zhang Author-Name: Qizhai Li Author-X-Name-First: Qizhai Author-X-Name-Last: Li Title: The Cauchy Combination Test under Arbitrary Dependence Structures Abstract: Combining individual p-values to perform an overall test is often encountered in statistical applications. The Cauchy combination test (CCT) (Journal of the American Statistical Association, 2020, 115, 393–402) is a powerful and computationally efficient approach to integrate individual p-values under arbitrary dependence structures for sparse signals. 
We revisit this test to additionally show that (i) the tail probability of the CCT can be approximated just as well when more relaxed assumptions are imposed on the individual p-values than on the original test statistics; (ii) such assumptions are satisfied by six popular copula distributions; and (iii) the power of the CCT is no less than that of the minimum p-value test when the number of p-values goes to infinity under some regularity conditions. These findings are confirmed by both simulations and applications in two real datasets, thus further broadening the theory and applications of the CCT. Journal: The American Statistician Pages: 134-142 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2116109 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2116109 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:134-142 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2198355_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Junyong Park Author-X-Name-First: Junyong Author-X-Name-Last: Park Title: Handbook of Multiple Comparisons Journal: The American Statistician Pages: 234-236 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2023.2198355 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2198355 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:234-236 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2105950_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Marcos Matabuena Author-X-Name-First: Marcos Author-X-Name-Last: Matabuena Author-Name: Marta Karas Author-X-Name-First: Marta Author-X-Name-Last: Karas Author-Name: Sherveen Riazati Author-X-Name-First: Sherveen Author-X-Name-Last: Riazati Author-Name: Nick Caplan Author-X-Name-First: Nick Author-X-Name-Last: Caplan Author-Name: Philip R. Hayes Author-X-Name-First: Philip R. Author-X-Name-Last: Hayes Title: Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models Abstract: Modern wearable monitors and laboratory equipment allow the recording of high-frequency data that can be used to quantify human movement. However, currently, data analysis approaches in these domains remain limited. This article proposes a new framework to analyze biomechanical patterns in sport training data recorded across multiple training sessions using multilevel functional models. We apply the methods to subsecond-level data of knee location trajectories collected in 19 recreational runners during a medium-intensity continuous run (MICR) and a high-intensity interval training (HIIT) session, with multiple steps recorded in each participant-session. We estimate the functional intra-class correlation coefficient to evaluate the reliability of recorded measurements across multiple sessions of the same training type. Furthermore, we obtain a vectorial representation of the three hierarchical levels of the data and visualize them in a low-dimensional space. Finally, we quantify the differences between genders and between the two training types using functional multilevel regression models that incorporate covariate information.
We provide an overview of the relevant methods and make both the data and the R code for all analyses freely available online on GitHub. Thus, this work can serve as a helpful reference for practitioners and a guide for a broader audience of researchers interested in modeling repeated functional measures at different resolution levels in the context of biomechanics and sports science applications. Journal: The American Statistician Pages: 169-181 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2105950 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2105950 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:169-181 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2182362_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Roy Bower Author-X-Name-First: Roy Author-X-Name-Last: Bower Author-Name: William Cipolli Author-X-Name-First: William Author-X-Name-Last: Cipolli Title: A Response to Rice and Lumley Abstract: We recognize the careful reading of and thought-provoking commentary on our work by Rice and Lumley. Further, we appreciate the opportunity to respond and clarify our position regarding the three presented concerns. We address these points in three sections below and conclude with final remarks in Section 4. Journal: The American Statistician Pages: 221-222 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2023.2182362 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2182362 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:221-222 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2077440_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Shijie Guo Author-X-Name-First: Shijie Author-X-Name-Last: Guo Author-Name: Jingchen Hu Author-X-Name-First: Jingchen Author-X-Name-Last: Hu Title: Data Privacy Protection and Utility Preservation through Bayesian Data Synthesis: A Case Study on Airbnb Listings Abstract: When releasing record-level data containing sensitive information to the public, the data disseminator is responsible for protecting the privacy of every record in the dataset while simultaneously preserving important features of the data for users’ analyses. These goals can be achieved by data synthesis, where confidential data are replaced with synthetic data that are simulated based on statistical models estimated on the confidential data. In this article, we present a data synthesis case study, where synthetic values of price and the number of available days in a sample of the New York Airbnb Open Data are created for privacy protection. One sensitive variable, the number of available days of an Airbnb listing, has a large number of zero-valued records and is also truncated at both ends. We propose a zero-inflated truncated Poisson regression model for its synthesis. We use a sequential synthesis approach to further synthesize the sensitive price variable. The resulting synthetic data are evaluated for utility preservation and privacy protection, the latter in the form of disclosure risks. Furthermore, we propose methods to investigate how uncertainties in the intruder’s knowledge would influence the identification disclosure risks of the synthetic data.
In particular, we explore several realistic scenarios of uncertainties in the intruder’s knowledge of available information and evaluate their impacts on the resulting identification disclosure risks. Journal: The American Statistician Pages: 192-200 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2077440 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2077440 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:192-200 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2127897_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Gang Han Author-X-Name-First: Gang Author-X-Name-Last: Han Author-Name: Thomas J. Santner Author-X-Name-First: Thomas J. Author-X-Name-Last: Santner Author-Name: Haiqun Lin Author-X-Name-First: Haiqun Author-X-Name-Last: Lin Author-Name: Ao Yuan Author-X-Name-First: Ao Author-X-Name-Last: Yuan Title: Bayesian-Frequentist Hybrid Inference in Applications with Small Sample Sizes Abstract: The Bayesian-frequentist hybrid model and associated inference can combine the advantages of both Bayesian and frequentist methods and avoid their limitations. However, except for a few special cases in the existing literature, the computation under the hybrid model is generally nontrivial or even unsolvable. This article develops a computation algorithm for hybrid inference under general loss functions. Three simulation examples demonstrate that hybrid inference can improve upon frequentist inference by incorporating valuable prior information, and also improve Bayesian inference based on non-informative priors, where the latter leads to biased estimates for the small sample sizes used in inference. The proposed method is illustrated in applications including a biomechanical engineering design and a surgical treatment of acral lentiginous melanoma. Journal: The American Statistician Pages: 143-150 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2022.2127897 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2127897 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:143-150 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2172078_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Kenneth Rice Author-X-Name-First: Kenneth Author-X-Name-Last: Rice Author-Name: Thomas Lumley Author-X-Name-First: Thomas Author-X-Name-Last: Lumley Title: Comment on “A Case for Nonparametrics” by Bower et al. Journal: The American Statistician Pages: 220-220 Issue: 2 Volume: 77 Year: 2023 Month: 4 X-DOI: 10.1080/00031305.2023.2172078 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2172078 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:220-220 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2161637_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Jiaqi Gu Author-X-Name-First: Jiaqi Author-X-Name-Last: Gu Author-Name: Yan Zhang Author-X-Name-First: Yan Author-X-Name-Last: Zhang Author-Name: Guosheng Yin Author-X-Name-First: Guosheng Author-X-Name-Last: Yin Title: Bayesian Log-Rank Test Abstract: Comparison of two survival curves is a fundamental problem in survival analysis.
Although abundant frequentist methods have been developed for comparing survival functions, inference procedures from the Bayesian perspective are rather limited. In this article, we extract the quantity of interest from the classic log-rank test and propose its Bayesian counterpart. Monte Carlo methods, including a Gibbs sampler and a sequential importance sampling procedure, are developed to draw posterior samples of the survival functions, and a decision rule for hypothesis testing is constructed for inference. Via simulations and real data analysis, the proposed Bayesian log-rank test is shown to be asymptotically equivalent to the classic one when noninformative prior distributions are used, which provides a Bayesian interpretation of the log-rank test. When correct prior information from historical data is used, the Bayesian log-rank test is shown to outperform the classic one in terms of power. R code to implement the Bayesian log-rank test is also provided with step-by-step instructions. Journal: The American Statistician Pages: 292-300 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2161637 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2161637 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:292-300 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2197021_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Mark F. Schilling Author-X-Name-First: Mark F. Author-X-Name-Last: Schilling Title: Bartroff, J., Lorden, G. and Wang, L. (2022), “Optimal and Fast Confidence Intervals for Hypergeometric Successes,” The American Statistician: Comment by Schilling Journal: The American Statistician Pages: 342-342 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2023.2197021 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2197021 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:342-342 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2156612_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Rameela Raman Author-X-Name-First: Rameela Author-X-Name-Last: Raman Author-Name: Jessica Utts Author-X-Name-First: Jessica Author-X-Name-Last: Utts Author-Name: Andrew I. Cohen Author-X-Name-First: Andrew I. Author-X-Name-Last: Cohen Author-Name: Matthew J. Hayat Author-X-Name-First: Matthew J. Author-X-Name-Last: Hayat Title: Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Abstract: Statistics education at all levels includes data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects related to the collection of those data. The changing statistics education landscape has seen instruction moving from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging.
However, with technological advancement and the increase in availability of real-world datasets, it is necessary that instruction also integrate the ethical aspects around data sources, such as privacy, how the data were obtained, and whether participants consented to the use of their data. In this article, we propose incorporating ethics into established curricula, integrating it into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to constructively think about their ethical responsibilities when working with data. Journal: The American Statistician Pages: 323-330 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2156612 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2156612 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:323-330 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2139293_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Harlan Campbell Author-X-Name-First: Harlan Author-X-Name-Last: Campbell Author-Name: Paul Gustafson Author-X-Name-First: Paul Author-X-Name-Last: Gustafson Title: Bayes Factors and Posterior Estimation: Two Sides of the Very Same Coin Abstract: Recently, several researchers have claimed that conclusions obtained from a Bayes factor (or the posterior odds) may contradict those obtained from Bayesian posterior estimation. In this article, we wish to point out that no such “contradiction” exists if one is willing to consistently define one’s priors and posteriors. The key for congruence is that the (implied) prior model odds used for testing are the same as those used for estimation. Our recommendation is simple: If one reports a Bayes factor comparing two models, then one should also report posterior estimates that appropriately acknowledge the uncertainty with regard to which of the two models is correct. Journal: The American Statistician Pages: 248-258 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2139293 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2139293 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:248-258 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2163689_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Serveh Sharifi Far Author-X-Name-First: Serveh Author-X-Name-Last: Sharifi Far Author-Name: Vanda Inácio Author-X-Name-First: Vanda Author-X-Name-Last: Inácio Author-Name: Daniel Paulin Author-X-Name-First: Daniel Author-X-Name-Last: Paulin Author-Name: Miguel de Carvalho Author-X-Name-First: Miguel Author-X-Name-Last: de Carvalho Author-Name: Nicole H. Augustin Author-X-Name-First: Nicole H. Author-X-Name-Last: Augustin Author-Name: Mike Allerhand Author-X-Name-First: Mike Author-X-Name-Last: Allerhand Author-Name: Gail Robertson Author-X-Name-First: Gail Author-X-Name-Last: Robertson Title: Consultancy Style Dissertations in Statistics and Data Science: Why and How Abstract: In this article, we chronicle the development of the consultancy style dissertations of the MSc program in Statistics with Data Science at the University of Edinburgh.
These dissertations are based on real-world data problems, in joint supervision with industrial and academic partners, and aim to get all students in the cohort together to develop consultancy skills and best practices, and also to promote their statistical leadership. Aligning with recently published research on statistical education suggesting the need for a greater focus on statistical consultancy skills, we summarize our experience in organizing and supervising such consultancy style dissertations, describe the logistics of implementing them, and review the students’ and supervisors’ feedback about these dissertations. Journal: The American Statistician Pages: 331-339 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2163689 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2163689 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:331-339 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2143897_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Tom E. Hardwicke Author-X-Name-First: Tom E. Author-X-Name-Last: Hardwicke Author-Name: Maia Salholz-Hillel Author-X-Name-First: Maia Author-X-Name-Last: Salholz-Hillel Author-Name: Mario Malički Author-X-Name-First: Mario Author-X-Name-Last: Malički Author-Name: Dénes Szűcs Author-X-Name-First: Dénes Author-X-Name-Last: Szűcs Author-Name: Theiss Bendixen Author-X-Name-First: Theiss Author-X-Name-Last: Bendixen Author-Name: John P. A. Ioannidis Author-X-Name-First: John P. A. Author-X-Name-Last: Ioannidis Title: Statistical Guidance to Authors at Top-Ranked Journals across Scientific Disciplines Abstract: Scientific journals may counter the misuse, misreporting, and misinterpretation of statistics by providing guidance to authors. We described the nature and prevalence of statistical guidance at 15 journals (top-ranked by Impact Factor) in each of 22 scientific disciplines across five high-level domains (N = 330 journals). The frequency of statistical guidance varied across domains (Health & Life Sciences: 122/165 journals, 74%; Multidisciplinary: 9/15 journals, 60%; Social Sciences: 8/30 journals, 27%; Physical Sciences: 21/90 journals, 23%; Formal Sciences: 0/30 journals, 0%). In one discipline (Clinical Medicine), statistical guidance was provided by all examined journals and in two disciplines (Mathematics and Computer Science) no examined journals provided statistical guidance. Of the 160 journals providing statistical guidance, 93 had a dedicated statistics section in their author instructions. The most frequently mentioned topics were confidence intervals (90 journals) and p-values (88 journals). For six “hotly debated” topics (statistical significance, p-values, Bayesian statistics, effect sizes, confidence intervals, and sample size planning/justification) journals typically offered implicit or explicit endorsement and rarely provided opposition. The heterogeneity of statistical guidance provided by top-ranked journals within and between disciplines highlights a need for further research and debate about the role journals can play in improving statistical practice. Journal: The American Statistician Pages: 239-247 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2143897 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2143897 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:239-247 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2087735_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Xun Li Author-X-Name-First: Xun Author-X-Name-Last: Li Author-Name: Joyee Ghosh Author-X-Name-First: Joyee Author-X-Name-Last: Ghosh Author-Name: Gabriele Villarini Author-X-Name-First: Gabriele Author-X-Name-Last: Villarini Title: A Comparison of Bayesian Multivariate Versus Univariate Normal Regression Models for Prediction Abstract: In many moderate-dimensional applications we have multiple response variables that are associated with a common set of predictors. When the main objective is prediction of the response variables, a natural question is: do multivariate regression models that accommodate dependency among the response variables improve prediction compared to their univariate counterparts? Note that in this article, by univariate versus multivariate regression models we refer to regression models with a single versus multiple response variables, respectively. We assume that under both scenarios, there are multiple covariates. Our question is motivated by an application in climate science, which involves the prediction of multiple metrics that measure the activity, intensity, severity etc. of a hurricane season. Average sea surface temperatures (SSTs) during the hurricane season have been used as predictors for each of these metrics, in separate univariate regression models, in the literature. Since the true SSTs are yet to be observed during prediction, typically their forecasts from multiple climate models are used as predictors. Some climate models have a few missing values, so we develop Bayesian univariate/multivariate normal regression models that can handle missing covariates and variable selection uncertainty. Whether Bayesian multivariate normal regression models improve prediction compared to their univariate counterparts is not clear from the existing literature, and in this work we try to fill this gap. Journal: The American Statistician Pages: 304-312 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2087735 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2087735 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:304-312 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2157874_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Marcelo dos Santos Author-X-Name-First: Marcelo Author-X-Name-Last: dos Santos Author-Name: Fernanda De Bastiani Author-X-Name-First: Fernanda Author-X-Name-Last: De Bastiani Author-Name: Miguel A. Uribe-Opazo Author-X-Name-First: Miguel A. Author-X-Name-Last: Uribe-Opazo Author-Name: Manuel Galea Author-X-Name-First: Manuel Author-X-Name-Last: Galea Title: Selection Criterion of Working Correlation Structure for Spatially Correlated Data Abstract: To obtain regression parameter estimates in generalized estimating equation modeling, whether in longitudinal or spatially correlated data, it is necessary to specify the structure of the working correlation matrix. The regression parameter estimates can be affected by the choice of this matrix. Within spatial statistics, the correlation matrix also influences how spatial variability is modeled.
Therefore, this study proposes a new method for selecting a working matrix, based on the conditioning of the naive variance-covariance matrix. The method's performance is evaluated by an extensive simulation study, using the marginal distributions of normal, Poisson, and gamma for spatially correlated data. The correlation structure specification is based on semivariogram models, using the Wendland, Matérn, and spherical model families. The results reveal that regarding the hit rates of the true spatial correlation structure of simulated data, the proposed criterion resulted in better performance than the competing criteria: the quasi-likelihood under the independence model criterion (QIC), the correlation information criterion (CIC), and the Rotnitzky–Jewell criterion (RJC). The selection of an appropriate spatial correlation structure is illustrated using the first-semester average rainfall data of 2021 in the state of Pernambuco, Brazil. Journal: The American Statistician Pages: 283-291 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2157874 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2157874 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:283-291 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2151510_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Diana Rauwolf Author-X-Name-First: Diana Author-X-Name-Last: Rauwolf Author-Name: Udo Kamps Author-X-Name-First: Udo Author-X-Name-Last: Kamps Title: Quantifying the Inspection Paradox with Random Time Abstract: The well-known inspection paradox of renewal theory states that, in expectation, the inspection interval is larger than a common renewal interval, in general. For a random inspection time, which includes the deterministic case, and a delayed renewal process, representations of the expected length of an inspection interval and related inequalities in terms of covariances are shown. Datasets of eruption times of Beehive Geyser and Riverside Geyser in Yellowstone National Park, as well as several distributional examples, illustrate the findings. Journal: The American Statistician Pages: 274-282 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2151510 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2151510 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:274-282 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2179664_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Noga Alon Author-X-Name-First: Noga Author-X-Name-Last: Alon Author-Name: Yaakov Malinovsky Author-X-Name-First: Yaakov Author-X-Name-Last: Malinovsky Title: Hitting a Prime in 2.43 Dice Rolls (On Average) Abstract: What is the number of rolls of a fair six-sided die until the first time the total sum of all rolls is a prime? We compute the expectation and the variance of this random variable up to an additive error of less than 10^-4. This is a solution to a puzzle suggested by DasGupta in the Bulletin of the Institute of Mathematical Statistics, where the published solution is incomplete. The proof is simple, combining a basic dynamic programming algorithm with a quick Matlab computation and basic facts about the distribution of primes.
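The expectation in the dice-rolling abstract above is easy to cross-check by simulation. The R sketch below is our own Monte Carlo approximation (the article itself computes the answer exactly via dynamic programming) and should land close to the reported 2.43.

# Trial division is ample here: the running totals stay small in practice.
is_prime <- function(n) {
  if (n < 2L) return(FALSE)
  if (n < 4L) return(TRUE)       # 2 and 3 are prime
  all(n %% 2:floor(sqrt(n)) != 0L)
}
# Roll a fair six-sided die until the running total is prime.
rolls_until_prime <- function() {
  total <- 0L; rolls <- 0L
  repeat {
    total <- total + sample.int(6L, 1L)
    rolls <- rolls + 1L
    if (is_prime(total)) return(rolls)
  }
}
set.seed(1)
mean(replicate(1e5, rolls_until_prime()))  # approximately 2.43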
Journal: The American Statistician Pages: 301-303 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2023.2179664 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2179664 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:301-303 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2143898_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Daniel Vedensky Author-X-Name-First: Daniel Author-X-Name-Last: Vedensky Author-Name: Paul A. Parker Author-X-Name-First: Paul A. Author-X-Name-Last: Parker Author-Name: Scott H. Holan Author-X-Name-First: Scott H. Author-X-Name-Last: Holan Title: A Look into the Problem of Preferential Sampling through the Lens of Survey Statistics Abstract: An evolving problem in the field of spatial and ecological statistics is that of preferential sampling, where biases may be present due to a relationship between sample data locations and a response of interest. This field of research bears a striking resemblance to the longstanding problem of informative sampling within survey methodology, although with some important distinctions. With the goal of promoting collaborative effort within and between these two problem domains, we make comparisons and contrasts between the two problem statements. Specifically, we review many of the solutions available to address each of these problems, noting the important differences in modeling techniques. Additionally, we construct a series of simulation studies to examine some of the methods available for preferential sampling, as well as a comparison analyzing heavy metal biomonitoring data. Journal: The American Statistician Pages: 313-322 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2143898 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2143898 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:313-322 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2205455_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Jay Bartroff Author-X-Name-First: Jay Author-X-Name-Last: Bartroff Author-Name: Gary Lorden Author-X-Name-First: Gary Author-X-Name-Last: Lorden Author-Name: Lijia Wang Author-X-Name-First: Lijia Author-X-Name-Last: Wang Title: Response to Comment by Schilling Journal: The American Statistician Pages: 343-344 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2023.2205455 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2205455 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:343-344 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2230758_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Ding-Geng Chen Author-X-Name-First: Ding-Geng Author-X-Name-Last: Chen Title: Event History Analysis with R, 2nd ed. Journal: The American Statistician Pages: 340-341 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2023.2230758 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2230758 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:340-341 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2141856_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Jangsun Baek Author-X-Name-First: Jangsun Author-X-Name-Last: Baek Author-Name: Jeong-Soo Park Author-X-Name-First: Jeong-Soo Author-X-Name-Last: Park Title: Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach Abstract: One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables in clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. A mixture of log-linear models is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data for clustering are considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and satisfy asymptotic normality. The performance of the proposed approach is shown to be better than that of conventional methods for both synthetic and real datasets. Journal: The American Statistician Pages: 259-273 Issue: 3 Volume: 77 Year: 2023 Month: 7 X-DOI: 10.1080/00031305.2022.2141856 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141856 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:259-273 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2139294_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: David I. Warton Author-X-Name-First: David I. Author-X-Name-Last: Warton Title: Global Simulation Envelopes for Diagnostic Plots in Regression Models Abstract: Residual plots are often used to interrogate regression model assumptions, but interpreting them requires an understanding of how much sampling variation to expect when assumptions are satisfied. In this article, we propose constructing global envelopes around data (or around trends fitted to data) on residual plots, exploiting recent advances that enable construction of global envelopes around functions by simulation. While the proposed tools are primarily intended as a graphical aid, they can be interpreted as formal tests of model assumptions, which enables the study of their properties via simulation experiments. We considered three model scenarios—fitting a linear model, generalized linear model, or generalized linear mixed model—and explored the power of global simulation envelope tests constructed around data on quantile-quantile plots, or around trend lines on residual versus fits plots or scale-location plots. Global envelope tests compared favorably to commonly used tests of assumptions at detecting violations of distributional and linearity assumptions.
Freely available R software (ecostats::plotenvelope) enables application of these tools to any fitted model that has methods for the simulate, residuals and predict functions. Journal: The American Statistician Pages: 425-431 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2022.2139294 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2139294 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:425-431 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2200512_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Marcos Matabuena Author-X-Name-First: Marcos Author-X-Name-Last: Matabuena Author-Name: Paulo Félix Author-X-Name-First: Paulo Author-X-Name-Last: Félix Author-Name: Marc Ditzhaus Author-X-Name-First: Marc Author-X-Name-Last: Ditzhaus Author-Name: Juan Vidal Author-X-Name-First: Juan Author-X-Name-Last: Vidal Author-Name: Francisco Gude Author-X-Name-First: Francisco Author-X-Name-Last: Gude Title: Hypothesis Testing for Matched Pairs with Missing Data by Maximum Mean Discrepancy: An Application to Continuous Glucose Monitoring Abstract: A frequent problem in statistical science is how to properly handle missing data in matched paired observations. There is a large body of literature coping with the univariate case. Yet, the ongoing technological progress in measuring biological systems raises the need for addressing more complex data, for example, graphs, strings, and probability distributions. To fill this gap, this article proposes new estimators of the maximum mean discrepancy (MMD) to handle complex matched pairs with missing data. These estimators can detect differences in data distributions under different missingness assumptions. The validity of this approach is proven and further studied in an extensive simulation study, and statistical consistency results are provided. Data obtained from continuous glucose monitoring in a longitudinal population-based diabetes study are used to illustrate the application of this approach. By employing new distributional representations along with cluster analysis, new clinical criteria on how glucose changes vary at the distributional level over 5 years can be explored. Journal: The American Statistician Pages: 357-369 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2200512 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2200512 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:357-369 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2261817_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Brady T. West Author-X-Name-First: Brady T. Author-X-Name-Last: West Title: ANOVA and Mixed Models: A Short Introduction Using R Journal: The American Statistician Pages: 449-450 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2261817 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2261817 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:449-450 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2191670_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Mehdi Moradi Author-X-Name-First: Mehdi Author-X-Name-Last: Moradi Author-Name: Ottmar Cronie Author-X-Name-First: Ottmar Author-X-Name-Last: Cronie Author-Name: Unai Pérez-Goya Author-X-Name-First: Unai Author-X-Name-Last: Pérez-Goya Author-Name: Jorge Mateu Author-X-Name-First: Jorge Author-X-Name-Last: Mateu Title: Hierarchical Spatio-Temporal Change-Point Detection Abstract: Detecting change-points in multivariate settings is usually carried out by analyzing all marginals either independently, via univariate methods, or jointly, through multivariate approaches. The former discards any inherent dependencies between different marginals, and the latter may suffer from domination/masking among different change-points of distinct marginals. As a remedy, we propose an approach that groups marginals with similar temporal behaviors and then performs group-wise multivariate change-point detection. Our approach groups marginals based on hierarchical clustering using distances that adjust for inherent dependencies. Through a simulation study we show that our approach, by preventing domination/masking, significantly enhances the general performance of the employed multivariate change-point detection method. Finally, we apply our approach to two datasets: (i) Land Surface Temperature in Spain, during the years 2000–2021, and (ii) The WikiLeaks Afghan War Diary data. Journal: The American Statistician Pages: 390-400 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2191670 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2191670 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:390-400 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2261819_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Carlos Cinelli Author-X-Name-First: Carlos Author-X-Name-Last: Cinelli Title: A First Course in Linear Model Theory, 2nd ed. Journal: The American Statistician Pages: 451-451 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2261819 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2261819 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:451-451 Template-Type: ReDIF-Article 1.0 # input file: UTAS_A_2183257_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20 Author-Name: Per Gösta Andersson Author-X-Name-First: Per Gösta Author-X-Name-Last: Andersson Title: The Wald Confidence Interval for a Binomial p as an Illuminating “Bad” Example Abstract: When teaching, we usually not only demonstrate or discuss how a certain method works but also, no less important, why it works. In contrast, the Wald confidence interval for a binomial p constitutes an excellent example of a case where we might be interested in why a method does not work. It has been in use for many years and, sadly enough, it is still to be found in many textbooks in mathematical statistics/statistics. The reasons for not using this interval are plentiful, and this fact gives us a good opportunity to discuss all of its deficiencies and draw conclusions that are of more general interest.
We mostly use already known results, bringing them together in a manner appropriate to the teaching situation. The main purpose of this article is to show how to stimulate students to take a more critical view of simplifications and approximations. We primarily aim for master’s students who have previously encountered the Wilson (score) interval, but parts of the presentation may also be suitable for bachelor’s students. Journal: The American Statistician Pages: 443-448 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2183257 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2183257 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:443-448 Template-Type: ReDIF-Article 1.0 Author-Name: Olivier Binette Author-X-Name-First: Olivier Author-X-Name-Last: Binette Author-Name: Sokhna A York Author-X-Name-First: Sokhna A Author-X-Name-Last: York Author-Name: Emma Hickerson Author-X-Name-First: Emma Author-X-Name-Last: Hickerson Author-Name: Youngsoo Baek Author-X-Name-First: Youngsoo Author-X-Name-Last: Baek Author-Name: Sarvo Madhavan Author-X-Name-First: Sarvo Author-X-Name-Last: Madhavan Author-Name: Christina Jones Author-X-Name-First: Christina Author-X-Name-Last: Jones Title: Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org Abstract: This article introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a public-use patent data exploration platform that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled—key characteristics that allow us to paint the first representative picture of PatentsView’s disambiguation performance. The results are used to inform PatentsView’s users of the reliability of the data and to allow the comparison of competing disambiguation algorithms. Journal: The American Statistician Pages: 370-380 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2191664 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2191664 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:370-380 Template-Type: ReDIF-Article 1.0 Author-Name: Peng Wang Author-X-Name-First: Peng Author-X-Name-Last: Wang Author-Name: Yilei Ma Author-X-Name-First: Yilei Author-X-Name-Last: Ma Author-Name: Siqi Xu Author-X-Name-First: Siqi Author-X-Name-Last: Xu Author-Name: Yi-Xin Wang Author-X-Name-First: Yi-Xin Author-X-Name-Last: Wang Author-Name: Yu Zhang Author-X-Name-First: Yu Author-X-Name-Last: Zhang Author-Name: Xiangyang Lou Author-X-Name-First: Xiangyang Author-X-Name-Last: Lou Author-Name: Ming Li Author-X-Name-First: Ming Author-X-Name-Last: Li Author-Name: Baolin Wu Author-X-Name-First: Baolin Author-X-Name-Last: Wu Author-Name: Guimin Gao Author-X-Name-First: Guimin Author-X-Name-Last: Gao Author-Name: Ping Yin Author-X-Name-First: Ping Author-X-Name-Last: Yin Author-Name: Nianjun Liu Author-X-Name-First: Nianjun Author-X-Name-Last: Liu Title: MOVER-R and Penalized MOVER-R Confidence Intervals for the Ratio of Two Quantities Abstract: Developing a confidence interval for the ratio of two quantities is an important task in statistics because of its omnipresence in real-world applications. For such a problem, the MOVER-R (method of variance recovery for the ratio) technique, which is based on the recovery of variance estimates from confidence limits of the numerator and the denominator separately, was proposed as a useful and efficient approach. However, this method implicitly assumes that the confidence interval for the denominator never includes zero, which might be violated in practice. In this article, we first use a new framework to derive the MOVER-R confidence interval, which does not require the above assumption and covers the whole parameter space. We find that MOVER-R can produce an unbounded confidence interval, just like the well-known Fieller method. To overcome this issue, we further propose the penalized MOVER-R. We prove that the new method differs from MOVER-R only at the second order. It, however, always gives a bounded and analytic confidence interval. Through simulation studies and a real data application, we show that the penalized MOVER-R generally provides a better confidence interval than MOVER-R in terms of controlling the coverage probability and the median width. Journal: The American Statistician Pages: 381-389 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2173294 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2173294 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:381-389 Template-Type: ReDIF-Article 1.0 Author-Name: Xavier Puig Author-X-Name-First: Xavier Author-X-Name-Last: Puig Author-Name: Josep Ginebra Author-X-Name-First: Josep Author-X-Name-Last: Ginebra Title: Mapping Life Expectancy Loss in Barcelona in 2020 Abstract: We use a Bayesian spatio-temporal model, first to smooth small-area initial life expectancy estimates in Barcelona for 2020, and second to predict what small-area life expectancy would have been in 2020 in the absence of COVID-19, using mortality data from 2007 to 2019.
This allows us to estimate and map the small-area life expectancy loss, which can be used to assess how the impact of COVID-19 varies spatially, and to explore whether that loss relates to underlying factors, such as population density, educational level, or the proportion of older individuals living alone. We find that the small-area life expectancy losses for men and for women have similar distributions and are spatially uncorrelated, but they are positively correlated with population density and with each other. On average, we estimate that the life expectancy loss in Barcelona in 2020 was 2.01 years for men, falling back to 2011 levels, and 2.11 years for women, falling back to 2006 levels. Journal: The American Statistician Pages: 417-424 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2197022 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2197022 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:417-424 Template-Type: ReDIF-Article 1.0 Author-Name: Jan Graffelman Author-X-Name-First: Jan Author-X-Name-Last: Graffelman Author-Name: Jan de Leeuw Author-X-Name-First: Jan Author-X-Name-Last: de Leeuw Title: Improved Approximation and Visualization of the Correlation Matrix Abstract: The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, a comparison of the different procedures is presented with the use of an example dataset, and an improved representation with better fit is proposed. Principal component analysis is widely used for making pictures of correlation structure, though, as shown, a weighted alternating least squares approach that avoids fitting the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor for principal component analysis, in particular if the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix, if the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, and this is seen to lead to a further improved approximation of the correlation matrix. Journal: The American Statistician Pages: 432-442 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2186952 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2186952 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:432-442 Template-Type: ReDIF-Article 1.0 Author-Name: Davy Paindaveine Author-X-Name-First: Davy Author-X-Name-Last: Paindaveine Author-Name: Philippe Spindel Author-X-Name-First: Philippe Author-X-Name-Last: Spindel Title: Revisiting the Name Variant of the Two-Children Problem Abstract: Initially proposed by Martin Gardner in the 1950s, the famous two-children problem is often presented as a paradox in probability theory.
A relatively recent variant of this paradox states that, while in a two-children family for which at least one child is a girl the probability that the other child is a boy is 2/3, this probability becomes 1/2 if the first name of the girl is disclosed (provided that two sisters may not be given the same first name). We revisit this variant of the problem and show that, if one adopts a natural model for the way first names are given to girls, then the probability that the other child is a boy may take any value in (0, 2/3). By exploiting the concept of Schur-concavity, we study how this probability depends on model parameters. Journal: The American Statistician Pages: 401-405 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2173293 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2173293 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:401-405 Template-Type: ReDIF-Article 1.0 Author-Name: P. Richard Hahn Author-X-Name-First: P. Richard Author-X-Name-Last: Hahn Title: Bayesian Modeling and Computation in Python Journal: The American Statistician Pages: 450-451 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2261818 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2261818 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:450-451 Template-Type: ReDIF-Article 1.0 Author-Name: Marius Hofert Author-X-Name-First: Marius Author-X-Name-Last: Hofert Author-Name: Avinash Prasad Author-X-Name-First: Avinash Author-X-Name-Last: Prasad Author-Name: Mu Zhu Author-X-Name-First: Mu Author-X-Name-Last: Zhu Title: RafterNet: Probabilistic Predictions in Multi-Response Regression Abstract: A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as a novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, corresponding empirical marginal residual distributions, and a generative neural network is referred to as RafterNet. Multiple datasets serve as examples to demonstrate the flexibility of the approach and its usefulness for making probabilistic forecasts. Journal: The American Statistician Pages: 406-416 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2022.2141857 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141857 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:406-416 Template-Type: ReDIF-Article 1.0 Author-Name: Sandra Siegfried Author-X-Name-First: Sandra Author-X-Name-Last: Siegfried Author-Name: Lucas Kook Author-X-Name-First: Lucas Author-X-Name-Last: Kook Author-Name: Torsten Hothorn Author-X-Name-First: Torsten Author-X-Name-Last: Hothorn Title: Distribution-Free Location-Scale Regression Abstract: We introduce a next of kin of the generalized additive model for location, scale, and shape (GAMLSS), aiming at distribution-free and parsimonious regression modeling for arbitrary outcomes. We replace the strict parametric distribution underlying such a model with a transformation function, which in turn is estimated from data. Doing so not only makes the model distribution-free but also allows us to limit the number of linear or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for continuous, discrete, and randomly censored observations, along with corresponding score functions. A plethora of existing algorithms is leveraged for model estimation, including constrained maximum likelihood, the original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models is closely connected to model selection. We propose the application of a novel best subset selection procedure to achieve especially simple ways of interpretation. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, nonlinear ordinal regression, and growth curves. All analyses are reproducible with the help of the tram add-on package to the R system for statistical computing and graphics. Journal: The American Statistician Pages: 345-356 Issue: 4 Volume: 77 Year: 2023 Month: 10 X-DOI: 10.1080/00031305.2023.2203177 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2203177 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:345-356 Template-Type: ReDIF-Article 1.0 Author-Name: Joel E. Cohen Author-X-Name-First: Joel E. Author-X-Name-Last: Cohen Title: First-Passage Times for Random Partial Sums: Yadrenko’s Model for e and Beyond Abstract: M. I. Yadrenko discovered that the expectation of the minimum number N1 of independent and identically distributed uniform random variables on (0, 1) that have to be added to exceed 1 is e. For any threshold a > 0, K. G. Russell found the distribution, mean, and variance of the minimum number Na of independent and identically distributed uniform random summands required to exceed a. Here we calculate the distribution and moments of Na when the summands obey the negative exponential and Lévy distributions. The Lévy distribution has infinite mean. We compare these results with the results of Yadrenko and Russell for uniform random summands to see how the expected first-passage time E(Na), a > 0, and other moments of Na depend on the distribution of the summand. Journal: The American Statistician Pages: 111-114 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2244542 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2244542 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:111-114
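A quick Monte Carlo check of Yadrenko's result quoted above (our sketch, not the authors' code): the expected number of iid Uniform(0, 1) summands needed for the running sum to exceed a = 1 is e; swapping rdist for rexp, say, explores other summand distributions.

    # Simulate the first-passage count N_a: the number of iid summands
    # required for the running sum to exceed the threshold a.
    set.seed(1)
    first_passage <- function(a, rdist = runif) {
      n <- 0
      s <- 0
      while (s <= a) {
        s <- s + rdist(1)
        n <- n + 1
      }
      n
    }
    mean(replicate(1e5, first_passage(1)))  # approximately exp(1) = 2.718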
Template-Type: ReDIF-Article 1.0 Author-Name: Stijn Hawinkel Author-X-Name-First: Stijn Author-X-Name-Last: Hawinkel Author-Name: Willem Waegeman Author-X-Name-First: Willem Author-X-Name-Last: Waegeman Author-Name: Steven Maere Author-X-Name-First: Steven Author-X-Name-Last: Maere Title: Out-of-Sample R2: Estimation and Inference Abstract: Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data-splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination, or in-sample R2, which is easy to interpret and to compare across different outcome variables. As opposed to in-sample R2, out-of-sample R2 has not been well defined, and the variability of out-of-sample R̂2 has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define out-of-sample R2 as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data-splitting estimates to provide a standard error for R̂2. The performance of the estimators of R2 and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data. Our method is available in the R-package oosse. Journal: The American Statistician Pages: 15-25 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2216252 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216252 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:15-25
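The quantity at issue can be illustrated in a few lines of base R (our illustration; it neither uses nor reproduces the oosse package API): out-of-sample R2 compares the cross-validated prediction error of a model with that of a mean-only benchmark.

    # Five-fold cross-validated out-of-sample R2 for a simple linear model.
    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- 1 + 0.5 * x + rnorm(n)
    folds <- sample(rep(1:5, length.out = n))
    sse_model <- sse_mean <- 0
    for (k in 1:5) {
      train <- folds != k
      fit <- lm(y ~ x, subset = train)
      pred <- predict(fit, newdata = data.frame(x = x[!train]))
      sse_model <- sse_model + sum((y[!train] - pred)^2)
      sse_mean <- sse_mean + sum((y[!train] - mean(y[train]))^2)
    }
    1 - sse_model / sse_mean  # out-of-sample R2 estimate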
Template-Type: ReDIF-Article 1.0 Author-Name: Preston Biro Author-X-Name-First: Preston Author-X-Name-Last: Biro Author-Name: Stephen G. Walker Author-X-Name-First: Stephen G. Author-X-Name-Last: Walker Title: Play Call Strategies and Modeling for Target Outcomes in Football Abstract: This article considers one-off actions for a football coach who is asking for a specific outcome from a play. This will be in the form of a minimum gain in yards, usually in order to gain a first down. Using a random utility model approach, we propose that the play to be called is the one which maximizes the probability of the desired outcome. We specifically focus on pass plays, which require the modeling of outcomes in terms of yards gained, for which we use the family of generalized gamma distributions. The data and results relate to the Fall 2021 Presbyterian College football team, for which we leverage specific information pertaining to the offensive playbook. Journal: The American Statistician Pages: 66-75 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2223582 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2223582 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:66-75 Template-Type: ReDIF-Article 1.0 Author-Name: Larry Han Author-X-Name-First: Larry Author-X-Name-Last: Han Author-Name: Andrea Arfè Author-X-Name-First: Andrea Author-X-Name-Last: Arfè Author-Name: Lorenzo Trippa Author-X-Name-First: Lorenzo Author-X-Name-Last: Trippa Title: Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics Abstract: The use of simulation-based sensitivity analyses is fundamental for evaluating and comparing candidate designs of future clinical trials. In this context, sensitivity analyses are especially useful to assess the dependence of important design operating characteristics on various unknown parameters. Typical examples of operating characteristics include the likelihood of detecting treatment effects and the average study duration, which depend on parameters that are unknown until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios and (ii) the list of operating characteristics of interest. We propose a new approach for choosing the set of scenarios to be included in a sensitivity analysis. We maximize a utility criterion that formalizes whether a specific set of sensitivity scenarios is adequate to summarize how the operating characteristics of the trial design vary across plausible values of the unknown parameters. Then, we use optimization techniques to select the best set of simulation scenarios (according to the criteria specified by the investigator) to exemplify the operating characteristics of the trial design. We illustrate our proposal in three trial designs. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 76-87 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2216253 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216253 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:76-87 Template-Type: ReDIF-Article 1.0 Author-Name: Samuel Pawel Author-X-Name-First: Samuel Author-X-Name-Last: Pawel Author-Name: Alexander Ly Author-X-Name-First: Alexander Author-X-Name-Last: Ly Author-Name: Eric-Jan Wagenmakers Author-X-Name-First: Eric-Jan Author-X-Name-Last: Wagenmakers Title: Evidential Calibration of Confidence Intervals Abstract: We present a novel and easy-to-use method for calibrating error-rate-based confidence intervals to evidence-based support intervals. Support intervals are obtained from inverting Bayes factors based on a parameter estimate and its standard error.
A k support interval can be interpreted as “the observed data are at least k times more likely under the included parameter values than under a specified alternative.” Support intervals depend on the specification of prior distributions for the parameter under the alternative, and we present several types that allow different forms of external knowledge to be encoded. We also show how prior specification can, to some extent, be avoided by considering a class of prior distributions and then computing so-called minimum support intervals which, for a given class of priors, have a one-to-one mapping with confidence intervals. We also illustrate how the sample size of a future study can be determined based on the concept of support. Finally, we show how the bound for the Type I error rate of Bayes factors leads to a bound for the coverage of support intervals. An application to data from a clinical trial illustrates how support intervals can lead to inferences that are both intuitive and informative. Journal: The American Statistician Pages: 47-57 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2216239 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216239 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:47-57 Template-Type: ReDIF-Article 1.0 Author-Name: Skevi Michael Author-X-Name-First: Skevi Author-X-Name-Last: Michael Title: Introduction to Stochastic Finance with Market Examples, 2nd ed. Journal: The American Statistician Pages: 129-130 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2024.2303414 File-URL: http://hdl.handle.net/10.1080/00031305.2024.2303414 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:129-130 Template-Type: ReDIF-Article 1.0 Author-Name: David Rügamer Author-X-Name-First: David Author-X-Name-Last: Rügamer Author-Name: Chris Kolb Author-X-Name-First: Chris Author-X-Name-Last: Kolb Author-Name: Nadja Klein Author-X-Name-First: Nadja Author-X-Name-Last: Klein Title: Semi-Structured Distributional Regression Abstract: Combining additive models and neural networks makes it possible to broaden the scope of statistical regression and, at the same time, to extend deep learning-based approaches with interpretable structured additive predictors. Existing attempts to unite the two modeling approaches are, however, limited to very specific combinations and, more importantly, involve an identifiability issue. As a consequence, interpretability and stable estimation are typically lost. We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture. To overcome the inherent identifiability issues between different model parts, we construct an orthogonalization cell that projects the deep neural network into the orthogonal complement of the statistical model predictor. This enables proper estimation of structured model parts and thereby interpretability. We demonstrate the framework’s efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications. Journal: The American Statistician Pages: 88-99 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2022.2164054 File-URL: http://hdl.handle.net/10.1080/00031305.2022.2164054 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:88-99
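A toy numerical sketch of the orthogonalization idea just described (our illustration, not the authors' network code): the output of the deep model part is projected onto the orthogonal complement of the structured design matrix, which keeps the structured coefficients identifiable.

    # Project a stand-in "deep" predictor onto the orthogonal complement of
    # the structured design matrix X, removing overlap between model parts.
    set.seed(1)
    n <- 100
    X <- cbind(1, rnorm(n))               # structured, interpretable part
    u <- rnorm(n)                         # stand-in for a deep network output
    P <- X %*% solve(crossprod(X), t(X))  # projection (hat) matrix of X
    u_orth <- as.vector((diag(n) - P) %*% u)
    crossprod(X, u_orth)                  # numerically zero: parts no longer overlap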
Template-Type: ReDIF-Article 1.0 Author-Name: Jessica Allen Author-X-Name-First: Jessica Author-X-Name-Last: Allen Author-Name: Ting Wang Author-X-Name-First: Ting Author-X-Name-Last: Wang Title: Hidden Markov Models for Low-Frequency Earthquake Recurrence Abstract: Low-frequency earthquakes (LFEs) are small-magnitude earthquakes with frequencies of 1–10 Hertz which often occur in overlapping sequence, forming persistent seismic tremors. They provide insights into large earthquake processes along plate boundaries. LFEs occur stochastically in time, often forming temporally recurring clusters. The occurrence times are typically modeled using point processes and their intensity functions. We demonstrate how to use hidden Markov models coupled with visualization techniques to model inter-arrival times directly, classify LFE occurrence patterns along the San Andreas Fault, and perform model selection. We highlight two subsystems of LFE activity corresponding to periods of alternating episodic and quiescent behavior. Journal: The American Statistician Pages: 100-110 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2282631 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2282631 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:100-110 Template-Type: ReDIF-Article 1.0 Author-Name: Rolf Larsson Author-X-Name-First: Rolf Author-X-Name-Last: Larsson Title: Confidence Distributions for the Autoregressive Parameter Abstract: The notion of confidence distributions is applied to inference about the parameter in a simple autoregressive model, allowing the parameter to take the value one. This makes it possible to compare to asymptotic approximations in both the stationary and the nonstationary cases at the same time. The main point, however, is to compare to a Bayesian analysis of the same problem. A noninformative prior for a parameter, in the sense of Jeffreys, is given as the ratio of the confidence density and the likelihood. In this way, the similarity between the confidence and noninformative Bayesian frameworks is exploited. It is shown that, in the stationary case, the prior so induced is asymptotically flat. However, if a unit parameter value is allowed, the induced prior must have a spike of some size at one. Simulation studies and two empirical examples illustrate the ideas. Journal: The American Statistician Pages: 58-65 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2226184 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2226184 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:58-65 Template-Type: ReDIF-Article 1.0 Author-Name: Ronald Christensen Author-X-Name-First: Ronald Author-X-Name-Last: Christensen Title: Comment on “Forbidden Knowledge and Specialized Training: A Versatile Solution for the Two Main Sources of Overfitting in Linear Regression,” by Rohlfs (2023) Journal: The American Statistician Pages: 131-133 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2277156 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2277156 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:131-133 Template-Type: ReDIF-Article 1.0 Author-Name: The Editors Title: The American Statistician 2023 Associate Editors Journal: The American Statistician Pages: i-i Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2024.2304534 File-URL: http://hdl.handle.net/10.1080/00031305.2024.2304534 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:i-i Template-Type: ReDIF-Article 1.0 Author-Name: Albert Vexler Author-X-Name-First: Albert Author-X-Name-Last: Vexler Author-Name: Alan D. Hutson Author-X-Name-First: Alan D. Author-X-Name-Last: Hutson Title: A Characterization of Most (More) Powerful Test Statistics with Simple Nonparametric Applications Abstract: Data-driven most powerful tests are statistical hypothesis decision-making tools that deliver the greatest power against a fixed null hypothesis among all corresponding data-based tests of a given size. When the underlying data distributions are known, the likelihood ratio principle can be applied to conduct most powerful tests. Reversing this notion, we consider the following questions. (a) Assuming a test statistic, say T, is given, how can we transform T to improve the power of the test? (b) Can T be used to generate the most powerful test? (c) How does one compare test statistics with respect to an attribute of the desired most powerful decision-making procedure? To examine these questions, we propose a one-to-one mapping of the term “most powerful” to the distribution properties of a given test statistic via matching characterization. This form of characterization has practical applicability and aligns well with the general principle of sufficiency. Findings indicate that, to improve a given test, we can employ relevant ancillary statistics whose distributions do not change under the tested hypotheses. As an example, the present method is illustrated by modifying the usual t-test under nonparametric settings. Numerical studies based on generated data and a real-data set confirm that the proposed approach can be useful in practice. Journal: The American Statistician Pages: 36-46 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2192746 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2192746 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:36-46 Template-Type: ReDIF-Article 1.0 Author-Name: Johannes Bracher Author-X-Name-First: Johannes Author-X-Name-Last: Bracher Author-Name: Nils Koster Author-X-Name-First: Nils Author-X-Name-Last: Koster Author-Name: Fabian Krüger Author-X-Name-First: Fabian Author-X-Name-Last: Krüger Author-Name: Sebastian Lerch Author-X-Name-First: Sebastian Author-X-Name-Last: Lerch Title: Learning to Forecast: The Probabilistic Time Series Forecasting Challenge Abstract: We report on a course project in which students submit weekly probabilistic forecasts of two weather variables and one financial variable. This real-time format allows students to engage in practical forecasting, which requires a diverse set of skills in data science and applied statistics. We describe the context and aims of the course, and discuss design parameters like the selection of target variables, the forecast submission process, the evaluation of forecast performance, and the feedback provided to students. Furthermore, we describe empirical properties of students’ probabilistic forecasts, as well as some lessons learned on our part. Journal: The American Statistician Pages: 115-127 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2199800 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2199800 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:115-127 Template-Type: ReDIF-Article 1.0 Author-Name: Maria Francesca Marino Author-X-Name-First: Maria Francesca Author-X-Name-Last: Marino Title: Applied Linear Regression for Longitudinal Data: With an Emphasis on Missing Observations Journal: The American Statistician Pages: 128-129 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2024.2302792 File-URL: http://hdl.handle.net/10.1080/00031305.2024.2302792 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:128-129 Template-Type: ReDIF-Article 1.0 Author-Name: Biao Zhang Author-X-Name-First: Biao Author-X-Name-Last: Zhang Title: Inverse Probability Weighting Estimation in Completely Randomized Experiments Abstract: In addition to treatment assignments and observed outcomes, covariate information is often available prior to randomization in completely randomized experiments that compare an active treatment versus control. The analysis of covariance (ANCOVA) method is commonly applied to adjust for baseline covariates in order to improve precision. We focus on making propensity score-based adjustment to covariates under the completely randomized design in a finite population of experimental units with two treatment groups. We study inverse probability weighting (IPW) estimation of the finite-population average treatment effect for a general class of working propensity score models, which includes generalized linear models for binary data.
We provide a randomization-based asymptotic analysis of the propensity score approach and explore the finite-population asymptotic behaviors of two IPW estimators of the average treatment effect. We identify a condition under which propensity score-based covariate adjustment is asymptotically equivalent to an ANCOVA-based covariate adjustment and improves precision compared with a simple unadjusted comparison between treatment and control arms. In particular, when the working propensity score is fitted by a generalized linear model for binary data with an intercept term, the asymptotic variance of the IPW estimators is the same for any link function, including the identity, logit, probit, and complementary log-log links. We demonstrate these methods using an HIV clinical trial and a post-traumatic stress disorder study. Finally, we present a simulation study comparing the finite-sample performance of IPW and other methods for both continuous and binary outcomes. Supplementary materials for this article are available online. Journal: The American Statistician Pages: 26-35 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2216247 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216247 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:26-35 Template-Type: ReDIF-Article 1.0 Author-Name: Matthew Sainsbury-Dale Author-X-Name-First: Matthew Author-X-Name-Last: Sainsbury-Dale Author-Name: Andrew Zammit-Mangion Author-X-Name-First: Andrew Author-X-Name-Last: Zammit-Mangion Author-Name: Raphaël Huser Author-X-Name-First: Raphaël Author-X-Name-Last: Huser Title: Likelihood-Free Parameter Estimation with Neural Bayes Estimators Abstract: Neural Bayes estimators are neural networks that approximate Bayes estimators. They are fast, likelihood-free, and amenable to rapid bootstrap-based uncertainty quantification. In this article, we aim to raise statisticians’ awareness of this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of estimating parameters from replicated data, which we address using permutation-invariant neural networks. Through extensive simulation studies we demonstrate that neural Bayes estimators can be used to quickly estimate parameters in weakly identified and highly parameterized models with relative ease. We illustrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second. Journal: The American Statistician Pages: 1-14 Issue: 1 Volume: 78 Year: 2024 Month: 1 X-DOI: 10.1080/00031305.2023.2249522 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249522 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:1-14 Template-Type: ReDIF-Article 1.0 Author-Name: Chixiang Chen Author-X-Name-First: Chixiang Author-X-Name-Last: Chen Author-Name: Shuo Chen Author-X-Name-First: Shuo Author-X-Name-Last: Chen Author-Name: Qi Long Author-X-Name-First: Qi Author-X-Name-Last: Long Author-Name: Sudeshna Das Author-X-Name-First: Sudeshna Author-X-Name-Last: Das Author-Name: Ming Wang Author-X-Name-First: Ming Author-X-Name-Last: Wang Title: Multiple-Model-based Robust Estimation of Causal Treatment Effect on a Binary Outcome with Integrated Information from Secondary Outcomes Abstract: An assessment of the causal treatment effect in the development and progression of certain diseases is important in clinical trials and biomedical studies. However, it is not possible to infer a causal relationship when the treatment assignment is imbalanced and confounded by other mechanisms. Specifically, when the treatment assignment is not randomized and the primary outcome is binary, a conventional logistic regression may not be valid to elucidate any causal inference. Moreover, exclusively capturing all confounders is extremely difficult and even impossible in large-scale observational studies. We propose a multiple-model-based robust (MultiMR) estimator for estimating the causal effect with a binary outcome, where multiple propensity score models and conditional mean imputation models are used to ensure estimation robustness. Furthermore, we propose an enhanced MultiMR (eMultiMR) estimator that reduces the estimation variability of MultiMR estimates by incorporating secondary outcomes that are highly correlated with the primary binary outcome. The resulting estimates are less sensitive to model mis-specification compared to those based on state-of-the-art doubly-robust methods. These estimates are verified through both theoretical and numerical assessments. The utility of (e)MultiMR estimation is illustrated using the Uniform Data Set (UDS) from the National Alzheimer’s Coordinating Center with the objective of detecting the causal effect of the short-term use of antihypertensive medications on the development of dementia or mild cognitive impairment. The proposed method has been implemented in an R package and is available at https://github.com/chencxxy28/eMultiMR. Journal: The American Statistician Pages: 150-160 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2250399 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2250399 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:150-160 Template-Type: ReDIF-Article 1.0 Author-Name: Nicholas Larsen Author-X-Name-First: Nicholas Author-X-Name-Last: Larsen Author-Name: Jonathan Stallrich Author-X-Name-First: Jonathan Author-X-Name-Last: Stallrich Author-Name: Srijan Sengupta Author-X-Name-First: Srijan Author-X-Name-Last: Sengupta Author-Name: Alex Deng Author-X-Name-First: Alex Author-X-Name-Last: Deng Author-Name: Ron Kohavi Author-X-Name-First: Ron Author-X-Name-Last: Kohavi Author-Name: Nathaniel T. Stevens Author-X-Name-First: Nathaniel T.
Author-X-Name-Last: Stevens Title: Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology Abstract: The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large-scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking.com, Alphabet’s Google, LinkedIn, Lyft, Meta’s Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this article, we review challenges that call for new statistical methodologies. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians’ awareness of these new research opportunities to increase collaboration between academia and the online industry. Journal: The American Statistician Pages: 135-149 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2257237 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2257237 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:135-149 Template-Type: ReDIF-Article 1.0 Author-Name: Lin Ge Author-X-Name-First: Lin Author-X-Name-Last: Ge Author-Name: Yuzi Zhang Author-X-Name-First: Yuzi Author-X-Name-Last: Zhang Author-Name: Lance A. Waller Author-X-Name-First: Lance A. Author-X-Name-Last: Waller Author-Name: Robert H. Lyles Author-X-Name-First: Robert H. Author-X-Name-Last: Lyles Title: Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors Abstract: Epidemiologic screening programs often make use of tests with small but nonzero probabilities of misdiagnosis. In this article, we assume the target population is finite with a fixed number of true cases, and that we apply an imperfect test with known sensitivity and specificity to a sample of individuals from the population. In this setting, we propose an enhanced inferential approach for use in conjunction with sampling-based bias-corrected prevalence estimation. While ignoring the finite nature of the population can yield markedly conservative estimates, direct application of a standard finite population correction (FPC) conversely leads to underestimation of variance. We uncover a way to leverage the typical FPC indirectly toward valid statistical inference. In particular, we derive a readily estimable extra variance component induced by misclassification in this specific but arguably common diagnostic testing scenario. Our approach yields a standard error estimate that properly captures the sampling variability of the usual bias-corrected maximum likelihood estimator of disease prevalence. Finally, we develop an adapted Bayesian credible interval for the true prevalence that offers improved frequentist properties (i.e., coverage and width) relative to a Wald-type confidence interval.
We report simulation results demonstrating the enhanced performance of the proposed inferential methods. Journal: The American Statistician Pages: 192-198 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2250401 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2250401 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:192-198
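For readers who want the simplest version of the bias-corrected prevalence estimator mentioned above, the classical Rogan-Gladen correction can be sketched in a few lines of R (our illustration, not the authors' enhanced procedure): the apparent prevalence is adjusted using the known sensitivity and specificity, and the result is truncated to the parameter space.

    # Rogan-Gladen bias correction for prevalence under an imperfect test.
    rogan_gladen <- function(p_apparent, se, sp) {
      p <- (p_apparent + sp - 1) / (se + sp - 1)
      min(max(p, 0), 1)  # truncate to [0, 1]
    }
    rogan_gladen(p_apparent = 0.12, se = 0.90, sp = 0.95)  # about 0.082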
Template-Type: ReDIF-Article 1.0 Author-Name: Sachin S. Pandya Author-X-Name-First: Sachin S. Author-X-Name-Last: Pandya Author-Name: Xiaomeng Li Author-X-Name-First: Xiaomeng Author-X-Name-Last: Li Author-Name: Eric Barón Author-X-Name-First: Eric Author-X-Name-Last: Barón Author-Name: Timothy E. Moore Author-X-Name-First: Timothy E. Author-X-Name-Last: Moore Title: Bayesian Detection of Bias in Peremptory Challenges Using Historical Strike Data Abstract: United States law bars using peremptory strikes during jury selection because of prospective juror race, ethnicity, sex, or membership in certain other cognizable classes. Here, we extend a Bayesian approach for detecting such illegal strike bias by showing how to incorporate historical data on an attorney’s use of peremptory strikes in past cases. In so doing, we use the power prior to adjust the weight of such historical information in the analysis. Using simulations, we show how the choice of the power prior’s discounting parameter influences bias detection (how likely it is that the credible interval for the bias parameter excludes zero), depending on the degree of incompatibility between current and historical trial data. Finally, we extend this approach with a prototype software application that lawyers could use to detect strike bias in real time during jury selection. We illustrate this application’s use with real historical strike data from a convenience sample of cases from one court. Journal: The American Statistician Pages: 209-219 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2249967 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249967 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:209-219 Template-Type: ReDIF-Article 1.0 Author-Name: Alberto Brini Author-X-Name-First: Alberto Author-X-Name-Last: Brini Author-Name: Edwin R. van den Heuvel Author-X-Name-First: Edwin R. Author-X-Name-Last: van den Heuvel Title: Missing Data Imputation with High-Dimensional Data Abstract: Imputation of missing data in high-dimensional datasets with more variables P than samples N, P≫N, is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill conditioned and cannot be properly estimated. For fully conditional imputation, the regression models for imputation cannot include all the variables. Thus, the high dimension requires special imputation approaches. In this article, we provide an overview and realistic comparisons of imputation approaches for high-dimensional data when applied to a linear mixed modeling (LMM) framework. We examine approaches from three different classes using simulation studies: multiple imputation with penalized regression; multiple imputation with recursive partitioning and predictive mean matching; and multiple imputation with principal component analysis (PCA). We illustrate the methods on a real case study where a multivariate outcome (i.e., an extracted set of correlated biomarkers from human urine samples) was collected and monitored over time, and we compare the proposed methods with more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error, and coverage of the LMM parameter estimates when compared to those obtained from a data analysis without missingness, although this comes at the expense of high computational costs, which makes it worthwhile to also consider much faster methodologies such as the one relying on PCA. Journal: The American Statistician Pages: 240-252 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2259962 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2259962 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:240-252 Template-Type: ReDIF-Article 1.0 Author-Name: Yang Ni Author-X-Name-First: Yang Author-X-Name-Last: Ni Title: Deep Learning and Scientific Computing with R torch Journal: The American Statistician Pages: 264-264 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2024.2320219 File-URL: http://hdl.handle.net/10.1080/00031305.2024.2320219 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:264-264 Template-Type: ReDIF-Article 1.0 Author-Name: Adriana Verónica Blanc Author-X-Name-First: Adriana Verónica Author-X-Name-Last: Blanc Title: The Phistogram Abstract: This article introduces a new kind of histogram-based representation for univariate random variables, named the phistogram because of its perceptual qualities. The technique relies on shifted groupings of data, creating a color-gradient zone that evidences the uncertainty from smoothing and highlights sampling issues. In this way, the phistogram offers a deep and visually appealing perspective on finite-sample peculiarities while also being capable of depicting the underlying distribution, thus becoming a useful complement to histograms and other statistical summaries. Although not limited to it, the present construction is derived from the equal-area histogram, a variant that differs conceptually from the traditional one. As such a distinction is not greatly emphasized in the literature, the graphical fundamentals are described in detail, and an alternative terminology is proposed to separate some concepts. Additionally, a compact notation is adopted to integrate the representation’s metadata into the graphic itself. Journal: The American Statistician Pages: 229-239 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2267639 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2267639 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:229-239
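Because the equal-area histogram mentioned above is less familiar than the usual equal-width variant, a small base-R sketch may help (our illustration, not the article's phistogram construction): placing bin breaks at sample quantiles gives every bin the same probability mass, so all bars have equal area and the information is carried by bar width rather than height.

    # Equal-area histogram: k bins, each holding roughly n/k observations.
    set.seed(1)
    x <- rnorm(1000)
    k <- 10
    breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
    hist(x, breaks = unique(breaks), freq = FALSE,
         main = "Equal-area histogram", xlab = "x", col = "grey80")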
Template-Type: ReDIF-Article 1.0 Author-Name: Chengxin Yang Author-X-Name-First: Chengxin Author-X-Name-Last: Yang Author-Name: Jerome P. Reiter Author-X-Name-First: Jerome P. Author-X-Name-Last: Reiter Title: Differentially Private Methods for Releasing Results of Stability Analyses Abstract: Data stewards and analysts can promote transparent and trustworthy science and policy-making by facilitating assessments of the sensitivity of published results to alternate analysis choices. For example, researchers may want to assess whether the results change substantially when different subsets of data points (e.g., sets formed by demographic characteristics) are used in the analysis, or when different models (e.g., with or without log transformations) are estimated on the data. Releasing the results of such stability analyses leaks information about the data subjects. When the underlying data are confidential, the data stewards and analysts may seek to bound this information leakage. We present methods for stability analyses that can satisfy differential privacy, a definition of data confidentiality providing such bounds. We use regression modeling as the motivating example. The basic idea is to split the data into disjoint subsets, compute a measure summarizing the difference between the published and alternative analysis on each subset, aggregate these subset estimates, and add noise to the aggregated value to satisfy differential privacy. We illustrate the methods using regressions in which an analyst compares coefficient estimates for different groups in the data, and in which analysts fit two different models on the data. Journal: The American Statistician Pages: 180-191 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2252870 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2252870 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:180-191 Template-Type: ReDIF-Article 1.0 Author-Name: Weiwen Miao Author-X-Name-First: Weiwen Author-X-Name-Last: Miao Author-Name: Joseph L. Gastwirth Author-X-Name-First: Joseph L. Author-X-Name-Last: Gastwirth Title: The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases Abstract: In practice, the ultimate outcome of many important discrimination cases, for example, the Wal-Mart, Nike, and Goldman Sachs equal pay cases, is determined at the stage when the plaintiffs request that the case be certified as a class action. The primary statistical issue at this time is whether the employment practice in question leads to a common pattern of outcomes disadvantaging most plaintiffs. However, there are no formal procedures or government guidelines for checking whether an employment practice results in a common pattern of disparity.
This article proposes using the slightly modified likelihood ratio test and the one-sided Cochran-Mantel-Haenszel (CMH) test to examine data relevant to deciding whether this commonality requirement is satisfied. Data considered at the class certification stage from several actual cases are analyzed by the proposed procedures. The results often show that the employment practice at issue created a common pattern of disparity; however, based on the evidence presented to the courts, the class action requests were denied. Journal: The American Statistician Pages: 253-263 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2259969 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2259969 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:253-263 Template-Type: ReDIF-Article 1.0 Author-Name: Wen-Han Hwang Author-X-Name-First: Wen-Han Author-X-Name-Last: Hwang Author-Name: Lu-Fang Chen Author-X-Name-First: Lu-Fang Author-X-Name-Last: Chen Author-Name: Jakub Stoklosa Author-X-Name-First: Jakub Author-X-Name-Last: Stoklosa Title: Counting the Unseen: Estimation of Susceptibility Proportions in Zero-Inflated Models Using a Conditional Likelihood Approach Abstract: Zero-inflated count data models are widely used in various fields such as ecology, epidemiology, and transportation, where count data with a large proportion of zeros are prevalent. Despite their widespread use, their theoretical properties have not been extensively studied. This study aims to investigate the impact of ignoring heterogeneity in event count intensity and susceptibility probability on zero-inflated count data analysis within the zero-inflated Poisson framework. To address this issue, we propose a novel conditional likelihood approach that uses positive count data only to estimate event count intensity parameters, and we develop a consistent estimator for the average susceptibility probability. Our approach is compared with the maximum likelihood approach, and we demonstrate our findings through a comprehensive simulation study and real data analysis. The results can also be extended to zero-inflated binomial and geometric models with similar conclusions. These findings contribute to the understanding of the theoretical properties of zero-inflated count data models and provide a practical approach to handling heterogeneity in such models. Journal: The American Statistician Pages: 161-170 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2249529 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249529 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:161-170 Template-Type: ReDIF-Article 1.0 Author-Name: Gabriel Wallin Author-X-Name-First: Gabriel Author-X-Name-Last: Wallin Title: An Introduction to R and Python for Data Analysis: A Side-by-Side Approach Journal: The American Statistician Pages: 265-265 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2024.2320949 File-URL: http://hdl.handle.net/10.1080/00031305.2024.2320949 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Template-Type: ReDIF-Article 1.0 Author-Name: Gabriel Wallin Author-X-Name-First: Gabriel Author-X-Name-Last: Wallin Title: An Introduction to R and Python for Data Analysis: A Side-by-Side Approach. Journal: The American Statistician Pages: 265-265 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2024.2320949 File-URL: http://hdl.handle.net/10.1080/00031305.2024.2320949 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:265-265 Template-Type: ReDIF-Article 1.0 Author-Name: Anne Helby Petersen Author-X-Name-First: Anne Helby Author-X-Name-Last: Petersen Author-Name: Claus Ekstrøm Author-X-Name-First: Claus Author-X-Name-Last: Ekstrøm Title: Technical Validation of Plot Designs by Use of Deep Learning Abstract: When does inspecting a certain graphical plot allow an investigator to reach the right statistical conclusion? Visualizations are commonly used for various tasks in statistics, including model diagnostics and exploratory data analysis, and though they are attractive due to their intuitive nature, the lack of available methods for validating plots is a major drawback. We propose a new technical validation method for visual reasoning. Our method trains deep neural networks to distinguish between plots simulated under two different data generating mechanisms (null or alternative), and we use the classification accuracy as a technical validation score (TVS). The TVS measures the information content in the plots, and TVS values can be used to compare different plots or different choices of data generating mechanisms, thereby providing a meaningful scale against which new visual reasoning procedures can be validated. We apply the method to three popular diagnostic plots for linear regression, namely scatterplots, quantile-quantile plots, and residual plots. We consider various types and degrees of misspecification, as well as different within-plot sample sizes. Our method produces TVSs that increase with sample size and decrease with difficulty, and hence the TVS is a meaningful measure of validity. Journal: The American Statistician Pages: 220-228 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2270649 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2270649 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:220-228
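To make the training loop in the Petersen and Ekstrøm abstract above concrete: the authors classify rendered plot images with deep neural networks, whereas the toy below substitutes the sorted, standardized sample behind a normal quantile-quantile plot and a small scikit-learn network. The null/alternative mechanisms (normal versus heavy-tailed t residuals) and all sizes here are illustrative choices; held-out accuracy then plays the role of the TVS.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def qq_heights(rng, alternative, n=50):
    """Stand-in for a rendered normal Q-Q plot: the sorted, standardized
    sample, i.e., the y-coordinates of the plot's points."""
    x = rng.standard_t(df=3, size=n) if alternative else rng.standard_normal(n)
    return np.sort((x - x.mean()) / x.std())

rng = np.random.default_rng(1)
labels = np.array([0] * 2000 + [1] * 2000)            # 0 = null, 1 = alternative
plots = np.array([qq_heights(rng, bool(l)) for l in labels])
Xtr, Xte, ytr, yte = train_test_split(plots, labels, test_size=0.25,
                                      random_state=1)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=1).fit(Xtr, ytr)
print("TVS-style score:", clf.score(Xte, yte))        # held-out accuracy
```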
Template-Type: ReDIF-Article 1.0 Author-Name: Quang Nguyen Author-X-Name-First: Quang Author-X-Name-Last: Nguyen Author-Name: Ronald Yurko Author-X-Name-First: Ronald Author-X-Name-Last: Yurko Author-Name: Gregory J. Matthews Author-X-Name-First: Gregory J. Author-X-Name-Last: Matthews Title: Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data Abstract: In American football, a pass rush is an attempt by the defensive team to disrupt the offense and prevent the quarterback (QB) from completing a pass. Existing metrics for assessing pass rush performance are either discrete-time quantities or based on subjective judgment. Using player tracking data, we propose STRAIN, a novel metric for evaluating pass rushers in the National Football League (NFL) at the continuous-time, within-play level. Inspired by the concept of strain rate in materials science, STRAIN is a simple and interpretable means of measuring defensive pressure in football. It is a directly observed statistic computed from two features: the distance between the pass rusher and the QB, and the rate at which this distance is being reduced. Our metric is highly predictive of pressure and stable over time. We also fit a multilevel model for STRAIN to understand the defensive pressure contribution of every pass rusher at the play level. We apply our approach to NFL data and present results for the first eight weeks of the 2021 regular season. In particular, we provide comparisons of STRAIN for different defensive positions and play outcomes, as well as rankings of the NFL’s best pass rushers according to our metric. Journal: The American Statistician Pages: 199-208 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2242442 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2242442 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:199-208 Template-Type: ReDIF-Article 1.0 Author-Name: Hsin-wen Chang Author-X-Name-First: Hsin-wen Author-X-Name-Last: Chang Author-Name: Shu-Hsiang Wang Author-X-Name-First: Shu-Hsiang Author-X-Name-Last: Wang Title: Bivariate Analysis of Distribution Functions Under Biased Sampling Abstract: This article compares distribution functions among pairs of locations in their domains, in contrast to the typical approach of univariate comparison across individual locations. This bivariate approach is studied in the presence of sampling bias, which has been gaining attention in COVID-19 studies that over-represent more symptomatic people. In cases with either known or unknown sampling bias, we introduce Anderson–Darling-type tests based on both the univariate and the bivariate formulations. A simulation study shows the superior performance of the bivariate approach over the univariate one. We illustrate the proposed methods using real data on the distribution of the number of symptoms suggestive of COVID-19. Journal: The American Statistician Pages: 171-179 Issue: 2 Volume: 78 Year: 2024 Month: 4 X-DOI: 10.1080/00031305.2023.2249965 File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249965 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:171-179
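Two illustrations follow. First, one natural formalization of the STRAIN metric sketched in the Nguyen, Yurko, and Matthews abstract above: the rate of closure per unit of current separation, echoing strain rate in materials science. The exact functional form and the 10 Hz frame rate are assumptions consistent with the abstract, not necessarily the paper's definition.

```python
import numpy as np

def strain_series(rusher_xy, qb_xy, dt=0.1):
    """STRAIN-style series from tracking frames (dt = 0.1 sec at 10 Hz):
    the rate at which the rusher-QB distance shrinks, scaled by the
    current distance, so closing fast at short range scores highest."""
    d = np.linalg.norm(np.asarray(rusher_xy) - np.asarray(qb_xy), axis=1)
    closure = -np.diff(d) / dt       # positive while the rusher is closing in
    return closure / d[1:]           # per unit of current separation

# Example with three frames of hypothetical (x, y) positions:
s = strain_series([(10, 5), (9, 5), (8, 5)], [(5, 5), (5, 5), (5, 5)])
```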
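Second, the unweighted univariate baseline that the Chang and Wang article above generalizes is the k-sample Anderson-Darling test, available in SciPy. The bias-corrected and bivariate statistics the article introduces are not shown here, and the symptom-count samples are simulated placeholders.

```python
import numpy as np
from scipy.stats import anderson_ksamp

# Simulated stand-ins for symptom-count samples from two groups; the real
# analysis would also correct for the over-representation of symptomatic
# respondents (the sampling bias the article addresses).
rng = np.random.default_rng(0)
group1 = rng.poisson(2.0, size=200)
group2 = rng.poisson(2.6, size=200)
res = anderson_ksamp([group1, group2])
print(res.statistic, res.significance_level)
```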