Template-Type: ReDIF-Article 1.0
Author-Name: Philippe Barbe
Author-X-Name-First: Philippe
Author-X-Name-Last: Barbe
Author-Name: William C. Horrace
Author-X-Name-First: William C.
Author-X-Name-Last: Horrace
Title: A Critical Reanalysis of Maryland State Police Searches
Abstract:
This article argues that previous analyses of the Maryland State Police
search data may be unreliable, since nonstationarity of these data
precludes the use of standard statistical inference techniques. In
contrast, proper statistical graphics seem better suited to capture the
complexities of the racial bias issue.
Journal: The American Statistician
Pages: 1-7
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.663662
File-URL: http://hdl.handle.net/10.1080/00031305.2012.663662
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:1-7
Template-Type: ReDIF-Article 1.0
Author-Name: Jesse Frey
Author-X-Name-First: Jesse
Author-X-Name-Last: Frey
Author-Name: Andrés Pérez
Author-X-Name-First: Andrés
Author-X-Name-Last: Pérez
Title: Exact Binomial Confidence Intervals for Randomized Response
Abstract:
We consider the problem of finding an exact confidence interval for a
proportion that is estimated using randomized response. For many
randomized response schemes, this is equivalent to finding an exact
confidence interval for a bounded binomial proportion. Such intervals can
be obtained by truncating standard exact binomial confidence intervals,
but the truncated intervals may be empty or misleadingly short. We address
this problem by using exact confidence intervals obtained by inverting a
likelihood ratio test that takes into account that the proportion is
bounded. A simple adjustment is made to keep the intervals from being
excessively conservative. An R function for computing the intervals is
available as online supplementary material.
Journal: The American Statistician
Pages: 8-15
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.663680
File-URL: http://hdl.handle.net/10.1080/00031305.2012.663680
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:8-15
Template-Type: ReDIF-Article 1.0
Author-Name: Robert S. Poulson
Author-X-Name-First: Robert S.
Author-X-Name-Last: Poulson
Author-Name: Gary L. Gadbury
Author-X-Name-First: Gary L.
Author-X-Name-Last: Gadbury
Author-Name: David B. Allison
Author-X-Name-First: David B.
Author-X-Name-Last: Allison
Title: Treatment Heterogeneity and Individual Qualitative Interaction
Abstract:
Plausibility of high variability in treatment effects across individuals
has been recognized as an important consideration in clinical studies.
Surprisingly, little attention has been given to evaluating this
variability in design of clinical trials or analyses of resulting data.
High variation in a treatment's efficacy or safety across individuals
(referred to herein as treatment heterogeneity) may have important
consequences because the optimal treatment choice for an individual may be
different from that suggested by a study of average effects. We call this
an individual qualitative interaction (IQI), borrowing terminology from
earlier work—referring to a qualitative interaction (QI) being
present when the optimal treatment varies across “groups” of
individuals. At least three techniques have been proposed to investigate
treatment heterogeneity: techniques to detect a QI, use of measures such
as the density overlap of two outcome variables under different
treatments, and use of cross-over designs to observe “individual
effects.” We elucidate underlying connections among them, their
limitations, and some assumptions that may be required. We do so under a
potential outcomes framework that can add insights to results from usual
data analyses and to study design features that improve the capability to
more directly assess treatment heterogeneity.
Journal: The American Statistician
Pages: 16-24
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.671724
File-URL: http://hdl.handle.net/10.1080/00031305.2012.671724
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:16-24
Template-Type: ReDIF-Article 1.0
Author-Name: A. S. Hedayat
Author-X-Name-First: A. S.
Author-X-Name-Last: Hedayat
Author-Name: Guoqin Su
Author-X-Name-First: Guoqin
Author-X-Name-Last: Su
Title: Robustness of the Simultaneous Estimators of Location and Scale From Approximating a Histogram by a Normal Density Curve
Abstract:
The robust properties of the simultaneous estimators of location and
scale parameters (μ*, σ*) proposed by Brown and Hwang are
studied. As a pair of simultaneous M estimators of
location and scale, their asymptotic efficiencies (0.650 for μ* and
0.541 for σ*) are higher than those of the median (0.637) and the median
absolute deviation (0.368) under the normal distribution. Simulation
indicates that the distributions of μ* and σ* are much flatter than
those based on the sample mean and the sample standard deviation under
the normal distribution when the sample size is small.
Journal: The American Statistician
Pages: 25-33
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.663665
File-URL: http://hdl.handle.net/10.1080/00031305.2012.663665
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:25-33
Template-Type: ReDIF-Article 1.0
Author-Name: Bailey K. Fosdick
Author-X-Name-First: Bailey K.
Author-X-Name-Last: Fosdick
Author-Name: Adrian E. Raftery
Author-X-Name-First: Adrian E.
Author-X-Name-Last: Raftery
Title: Estimating the Correlation in Bivariate Normal Data With Known Variances and Small Sample Sizes
Abstract:
We consider the problem of estimating the correlation in bivariate normal
data when the means and variances are assumed known, with emphasis on the
small sample case. We consider eight different estimators, several of them
considered here for the first time in the literature. In a simulation
study, we found that Bayesian estimators using the uniform and arc-sine
priors outperformed several empirical and exact or approximate maximum
likelihood estimators in small samples. The arc-sine prior did better for
large values of the correlation. For testing whether the correlation is
zero, we found that Bayesian hypothesis tests outperformed significance
tests based on the empirical and exact or approximate maximum likelihood
estimators considered in small samples, but that all tests performed
similarly for sample size 50. These results lead us to suggest using the
posterior mean with the arc-sine prior to estimate the correlation in
small samples when the variances are assumed known.
Journal: The American Statistician
Pages: 34-41
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.676329
File-URL: http://hdl.handle.net/10.1080/00031305.2012.676329
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:34-41
Template-Type: ReDIF-Article 1.0
Author-Name: Michael L. Lavine
Author-X-Name-First: Michael L.
Author-X-Name-Last: Lavine
Author-Name: James S. Hodges
Author-X-Name-First: James S.
Author-X-Name-Last: Hodges
Title: On Rigorous Specification of ICAR Models
Abstract:
Intrinsic (or improper) conditional autoregressions, or ICARs, are widely
used in spatial statistics, splines, dynamic linear models, and elsewhere.
Such models usually have several variance components, including one for
errors and at least one for random effects. Likelihood and Bayesian
inference depend on the likelihood function of those variances. But in the
absence of constraints or further specifications that are not inherent to
ICARs, the likelihood function is arbitrary and thus so are some
inferences. We suggest several ways to add constraints or further
specifications, but any choice is merely a convention.
Journal: The American Statistician
Pages: 42-49
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.654746
File-URL: http://hdl.handle.net/10.1080/00031305.2012.654746
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:42-49
Template-Type: ReDIF-Article 1.0
Author-Name: Alvaro Nosedal-Sanchez
Author-X-Name-First: Alvaro
Author-X-Name-Last: Nosedal-Sanchez
Author-Name: Curtis B. Storlie
Author-X-Name-First: Curtis B.
Author-X-Name-Last: Storlie
Author-Name: Thomas C.M. Lee
Author-X-Name-First: Thomas C.M.
Author-X-Name-Last: Lee
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Reproducing Kernel Hilbert Spaces for Penalized Regression: A Tutorial
Abstract:
Penalized regression procedures have become very popular ways to estimate
complicated functions. The smoothing spline, for example, is the solution
of a minimization problem in a functional space. If such a minimization
problem is posed on a reproducing kernel Hilbert space (RKHS), the
solution is guaranteed to exist, is unique, and has a very simple form.
There are excellent books and articles about RKHS and their applications
in statistics; however, this existing literature is very dense. This
article provides a friendly reference for a reader approaching this
subject for the first time. It begins with a simple problem, a system of
linear equations, and then gives an intuitive motivation for reproducing
kernels. Armed with the intuition gained from our first examples, we take
the reader from vector spaces to Banach spaces and to RKHS. Finally, we
present some statistical estimation problems that can be solved using the
mathematical machinery discussed. After reading this tutorial, the reader
will be ready to study more advanced texts and articles about the subject,
such as those by Wahba or Gu. Online supplements are available for this
article.
Journal: The American Statistician
Pages: 50-60
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.678196
File-URL: http://hdl.handle.net/10.1080/00031305.2012.678196
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:50-60
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel F. Stone
Author-X-Name-First: Daniel F.
Author-X-Name-Last: Stone
Title: Measurement Error and the Hot Hand
Abstract:
This article shows that the first autocorrelation of basketball shot
results is a highly biased and inconsistent estimator of the first
autocorrelation of the ex ante probabilities with which
the shots are made. Shot result autocorrelation is close to zero even when
shot probability autocorrelation is close to one. The bias is caused by
what is equivalent to a severe measurement error problem. The results
imply that the widespread belief among players and fans in the hot hand is
not necessarily a cognitive fallacy.
Journal: The American Statistician
Pages: 61-66
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.676467
File-URL: http://hdl.handle.net/10.1080/00031305.2012.676467
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:61-66
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher Tong
Author-X-Name-First: Christopher
Author-X-Name-Last: Tong
Title: Letter to the Editor
Journal: The American Statistician
Pages: 75-75
Issue: 1
Volume: 66
Year: 2012
Month: 2
X-DOI: 10.1080/00031305.2012.667900
File-URL: http://hdl.handle.net/10.1080/00031305.2012.667900
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:1:p:75-75
Template-Type: ReDIF-Article 1.0
Author-Name: John W. Seaman
Author-X-Name-First: John W.
Author-X-Name-Last: Seaman
Author-Name: John W. Seaman
Author-X-Name-First: John W.
Author-X-Name-Last: Seaman
Author-Name: James D. Stamey
Author-X-Name-First: James D.
Author-X-Name-Last: Stamey
Title: Hidden Dangers of Specifying Noninformative Priors
Abstract:
“Noninformative” priors are widely used in Bayesian
inference. Diffuse priors are often placed on parameters that are
components of some function of interest. That function may, of course,
have a prior distribution that is highly informative, in contrast to the
joint prior placed on its arguments, resulting in unintended influence on
the posterior for the function. This problem is not always recognized by
users of “noninformative” priors. We consider several
examples of this problem. We also suggest methods for handling such
induced priors.
Journal: The American Statistician
Pages: 77-84
Issue: 2
Volume: 66
Year: 2012
Month: 5
X-DOI: 10.1080/00031305.2012.695938
File-URL: http://hdl.handle.net/10.1080/00031305.2012.695938
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:77-84
Template-Type: ReDIF-Article 1.0
Author-Name: Tanya P. Garcia
Author-X-Name-First: Tanya P.
Author-X-Name-Last: Garcia
Author-Name: Priya Kohli
Author-X-Name-First: Priya
Author-X-Name-Last: Kohli
Author-Name: Mohsen Pourahmadi
Author-X-Name-First: Mohsen
Author-X-Name-Last: Pourahmadi
Title: Regressograms and Mean-Covariance Models for Incomplete Longitudinal Data
Abstract:
Longitudinal studies are prevalent in biological and social sciences
where subjects are measured repeatedly over time. Modeling the
correlations and handling missing data are among the most challenging
problems in analyzing such data. There are various methods for handling
missing data, but data-based and graphical methods for modeling the
covariance matrix of longitudinal data are relatively new. We adopt an
approach based on the modified Cholesky decomposition of the covariance
matrix which handles both challenges. It amounts to formulating parametric
models for the regression coefficients of the conditional mean and
variance of each measurement given its predecessors. We demonstrate the
roles of profile plots and regressograms in formulating
joint mean-covariance models for incomplete longitudinal data. Applying
these graphical tools to the Fruit Fly Mortality (FFM) data, which has 22%
missing values, reveals a logistic curve for the mean function and two
different models for the two factors of the modified Cholesky
decomposition of the sample covariance matrix. An expectation-maximization
algorithm is proposed for estimating the parameters of the mean-covariance
models; it performs well for the FFM data and in a simulation study of
incomplete longitudinal data.
Journal: The American Statistician
Pages: 85-91
Issue: 2
Volume: 66
Year: 2012
Month: 5
X-DOI: 10.1080/00031305.2012.695935
File-URL: http://hdl.handle.net/10.1080/00031305.2012.695935
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:85-91
Template-Type: ReDIF-Article 1.0
Author-Name: Shelley Hurwitz
Author-X-Name-First: Shelley
Author-X-Name-Last: Hurwitz
Author-Name: John S. Gardenier
Author-X-Name-First: John S.
Author-X-Name-Last: Gardenier
Title: Ethical Guidelines for Statistical Practice: The First 60 Years and Beyond
Abstract:
The Ethical Guidelines for Statistical Practice of the American
Statistical Association (ASA) have evolved over a span of more than
60 years, going back to 1949. The Interim version of the Guidelines
was published in 1980, the Trial version was published in 1983 and revised
and formalized in 1989, the current version was approved by the Board of
Directors and made available on the ASA's Web site in 1999, and ASA
accreditation now requires statistical practitioners to agree to abide by
them. The new century brings new ethical concerns for statisticians. As
examples, bioethics is booming, climate science is newsworthy for both
science and ethics, and issues of statistical integrity in research keep
the U.S. Department of Health and Human Services Office of Research
Integrity very busy. In this century, we see a rapid increase in the
ability to collect massive amounts of data, with complex structure and a
sometimes sensitive nature. With these unparalleled opportunities for
statisticians comes an increased need for clear guidelines on professional
ethics. The evolution of the Guidelines therefore needs to continue. In
this article, we examine the long history of the ASA Ethical Guidelines
for Statistical Practice, and discuss potential areas for revision to meet
the needs of our expanding profession.
Journal: The American Statistician
Pages: 99-103
Issue: 2
Volume: 66
Year: 2012
Month: 5
X-DOI: 10.1080/00031305.2012.695959
File-URL: http://hdl.handle.net/10.1080/00031305.2012.695959
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:99-103
Template-Type: ReDIF-Article 1.0
Author-Name: Ruud Wetzels
Author-X-Name-First: Ruud
Author-X-Name-Last: Wetzels
Author-Name: Raoul P. P. P. Grasman
Author-X-Name-First: Raoul P. P. P.
Author-X-Name-Last: Grasman
Author-Name: Eric-Jan Wagenmakers
Author-X-Name-First: Eric-Jan
Author-X-Name-Last: Wagenmakers
Title: A Default Bayesian Hypothesis Test for ANOVA Designs
Abstract:
This article presents a Bayesian hypothesis test for analysis of variance
(ANOVA) designs. The test is an application of standard Bayesian methods
for variable selection in regression models. We illustrate the effect of
various g-priors on the ANOVA hypothesis test. The
Bayesian test for ANOVA designs is useful for empirical researchers and
for students; both groups will get a more acute appreciation of Bayesian
inference when they can apply it to practical statistical problems such as
ANOVA. We illustrate the use of the test with two examples, and we provide
R code that makes the test easy to use.
Journal: The American Statistician
Pages: 104-111
Issue: 2
Volume: 66
Year: 2012
Month: 5
X-DOI: 10.1080/00031305.2012.695956
File-URL: http://hdl.handle.net/10.1080/00031305.2012.695956
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:104-111
Template-Type: ReDIF-Article 1.0
Author-Name: Ananda Sen
Author-X-Name-First: Ananda
Author-X-Name-Last: Sen
Title: On the Interrelation Between the Sample Mean and the Sample Variance
Abstract:
The linearity (or lack thereof) of association between sample mean and
sample variance is explored in this note with the intent of providing new
insights. Of particular interest is a well-known inequality involving the
measures of skewness and kurtosis that is derived as a consequence of an
identity involving the correlation between sample mean and sample
variance. The nature of association between the two is explored further by
means of the conditional expectation of the sample variance given the
mean. We present several characterization results where the specific
relationship of this conditional expectation and sample mean uniquely
determines the parent population. The note is presented at a level
accessible to graduate or upper-level undergraduate students with several
illustrative examples included as teaching aid.
Journal: The American Statistician
Pages: 112-117
Issue: 2
Volume: 66
Year: 2012
Month: 5
X-DOI: 10.1080/00031305.2012.695960
File-URL: http://hdl.handle.net/10.1080/00031305.2012.695960
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:112-117
Template-Type: ReDIF-Article 1.0
Author-Name: Jay M. Ver Hoef
Author-X-Name-First: Jay M.
Author-X-Name-Last: Ver Hoef
Title: Who Invented the Delta Method?
Abstract:
Many statisticians and other scientists use what is commonly called the
“delta method.” However, few people know who proposed it.
The earliest article was found in an obscure journal, and the author is
rarely cited for his contribution. This article briefly reviews three
modern versions of the delta method and how they are used. Then, some
history on the author and the journal of the first known article on the
delta method is given. The original author’s specific contribution
is reproduced, along with a discussion on possible reasons that it has
been overlooked.
Journal: The American Statistician
Pages: 124-127
Issue: 2
Volume: 66
Year: 2012
Month: 5
X-DOI: 10.1080/00031305.2012.687494
File-URL: http://hdl.handle.net/10.1080/00031305.2012.687494
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:2:p:124-127
Template-Type: ReDIF-Article 1.0
Author-Name: Theodore G. Karrison
Author-X-Name-First: Theodore G.
Author-X-Name-Last: Karrison
Author-Name: Mark J. Ratain
Author-X-Name-First: Mark J.
Author-X-Name-Last: Ratain
Author-Name: Walter M. Stadler
Author-X-Name-First: Walter M.
Author-X-Name-Last: Stadler
Author-Name: Gary L. Rosner
Author-X-Name-First: Gary L.
Author-X-Name-Last: Rosner
Title: Estimation of Progression-Free Survival for All Treated Patients in the Randomized Discontinuation Trial Design
Abstract:
The randomized discontinuation trial (RDT) design is an enrichment-type
design that has been used in a variety of diseases to evaluate the
efficacy of new treatments. The RDT design seeks to select a more
homogeneous group of patients, consisting of those who are more likely to
show a treatment benefit if one exists. In oncology, the RDT design has
been applied to evaluate the effects of cytostatic agents, that is, drugs
that act primarily by slowing tumor growth rather than shrinking tumors.
In the RDT design, all patients receive treatment during an initial,
open-label run-in period of duration T. Patients with
objective response (substantial tumor shrinkage) remain on therapy while
those with early progressive disease are removed from the trial. Patients
with stable disease (SD) are then randomized to either continue active
treatment or switch to placebo. The main analysis compares outcomes, for
example, progression-free survival (PFS), between the two randomized arms.
As a secondary objective, investigators may seek to estimate PFS for all
treated patients, measured from the time of entry into the study, by
combining information from the run-in and post run-in periods. For
t ⩽ T, PFS is estimated by the
observed proportion of patients who are progression-free among all
patients enrolled. For t > T, the
estimate can be expressed as Ŝ(t) = p̂_R Ŝ_R(t) + p̂_SD Ŝ_SD(t), where
p̂_R is the estimated probability of response during the run-in period,
p̂_SD is the estimated probability of SD, and Ŝ_R(t) and
Ŝ_SD(t) are the Kaplan–Meier estimates
of subsequent PFS in the responders and patients with SD randomized to
continue treatment, respectively. In this article, we derive the variance
of Ŝ(t), enabling the construction of
confidence intervals for both S(t) and
the median survival time. Simulation results indicate that the method
provides accurate coverage rates. An interesting aspect of the design is
that outcomes during the run-in phase have a negative multinomial
distribution, something not frequently encountered in practice.
Journal: The American Statistician
Pages: 155-162
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.720900
File-URL: http://hdl.handle.net/10.1080/00031305.2012.720900
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:155-162
Template-Type: ReDIF-Article 1.0
Author-Name: Iliana Ignatova
Author-X-Name-First: Iliana
Author-X-Name-Last: Ignatova
Author-Name: Roland C. Deutsch
Author-X-Name-First: Roland C.
Author-X-Name-Last: Deutsch
Author-Name: Don Edwards
Author-X-Name-First: Don
Author-X-Name-Last: Edwards
Title: Closed Sequential and Multistage Inference on Binary Responses With or Without Replacement
Abstract:
We consider closed sequential or multistage sampling, with or without
replacement, from a lot of N items, where each item can
be identified as defective (in error, tainted, etc.) or not. The goal is
to make inference on the proportion, π, of defectives in the lot, or
equivalently on the number of defectives in the lot D =
Nπ. It is shown that exact inference on π
using closed (bounded) sequential or multistage procedures with general
prespecified elimination boundaries is completely tractable and not at all
inconvenient using modern statistical software. We give relevant theory
and demonstrate functions for this purpose written in R (R Development
Core Team 2005, available as online supplementary material). Applicability
of the methodology is illustrated in three examples: (1) sharpening of
Wald's (1947) sequential probability ratio test used in industrial
acceptance sampling, (2) two-stage sampling for auditing Medicare or
Medicaid health care providers, and (3) risk-limited sequential procedures
for election audits.
Journal: The American Statistician
Pages: 163-172
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.722901
File-URL: http://hdl.handle.net/10.1080/00031305.2012.722901
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:163-172
Template-Type: ReDIF-Article 1.0
Author-Name: Mehmet Kocak
Author-X-Name-First: Mehmet
Author-X-Name-Last: Kocak
Author-Name: Arzu Onar-Thomas
Author-X-Name-First: Arzu
Author-X-Name-Last: Onar-Thomas
Title: A Simulation-Based Evaluation of the Asymptotic Power Formulas for Cox Models in Small Sample Cases
Abstract:
Cox proportional hazards (PH) models are commonly used in medical
research to investigate the associations between covariates and
time-to-event outcomes. It is frequently noted that with fewer than 10
events per covariate, these models produce spurious results and therefore
should not be used. Statistical literature contains asymptotic power
formulas for the Cox model which can be used to determine the number of
events needed to detect an association. Here, we investigate via
simulations the performance of these formulas in small sample settings for
Cox models with one or two covariates. Our simulations indicate that when
the number of events is small, the power estimate based on the asymptotic
formula is often inflated. The discrepancy between the asymptotic and
empirical power is larger for the dichotomous covariate especially in
cases where allocation of sample size to its levels is unequal. When more
than one covariate is included in the same model, the discrepancy between
the asymptotic power and the empirical power is even larger, especially
when a high positive correlation exists between the two covariates.
Journal: The American Statistician
Pages: 173-179
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.703873
File-URL: http://hdl.handle.net/10.1080/00031305.2012.703873
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:173-179
Template-Type: ReDIF-Article 1.0
Author-Name: Lingyun Zhang
Author-X-Name-First: Lingyun
Author-X-Name-Last: Zhang
Author-Name: Xinzhong Xu
Author-X-Name-First: Xinzhong
Author-X-Name-Last: Xu
Author-Name: Gemai Chen
Author-X-Name-First: Gemai
Author-X-Name-Last: Chen
Title: The Exact Likelihood Ratio Test for Equality of Two Normal Populations
Abstract:
Testing the equality of two independent normal populations is a perfect
case of the two-sample problem, yet it is not treated in the main text of
any textbook or handbook. In this article, we derive the exact
distribution of the likelihood ratio test and implement this test with an
R function. This article has supplementary materials online.
Journal: The American Statistician
Pages: 180-184
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.707083
File-URL: http://hdl.handle.net/10.1080/00031305.2012.707083
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:180-184
Template-Type: ReDIF-Article 1.0
Author-Name: José R. Berrendero
Author-X-Name-First: José R.
Author-X-Name-Last: Berrendero
Author-Name: Javier Cárcamo
Author-X-Name-First: Javier
Author-X-Name-Last: Cárcamo
Title: The Tangent Classifier
Abstract:
Given a classifier, we describe a general method to construct a simple
linear classification rule. This rule, called the tangent
classifier, is obtained by computing the tangent hyperplane to
the separation boundary of the groups (generated by the initial
classifier) at a certain point. When applied to a quadratic region, the
tangent classifier has a neat closed-form expression. We discuss various
examples and the application of this new linear classifier in two
situations under which standard rules may fail: when there is a fraction
of outliers in the training sample and when the dimension of the data is
large in comparison with the sample size.
Journal: The American Statistician
Pages: 185-194
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.710511
File-URL: http://hdl.handle.net/10.1080/00031305.2012.710511
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:185-194
Template-Type: ReDIF-Article 1.0
Author-Name: Michael Friendly
Author-X-Name-First: Michael
Author-X-Name-Last: Friendly
Author-Name: Nicolas de Sainte Agathe
Author-X-Name-First: Nicolas
Author-X-Name-Last: de Sainte Agathe
Title: André-Michel Guerry's Ordonnateur Statistique: The First Statistical Calculator?
Abstract:
A document retrieved from the archives of the Conservatoire National des
Arts et Métiers (CNAM) in Paris sheds new light on the invention by
André-Michel Guerry of a mechanical device for obtaining statistical
summaries and for examining the relationship between different variables,
well before general purpose statistical calculators and the idea of
correlation had even been conceived. Guerry's ordonnateur
statistique may arguably be considered as the first example of a
mechanical device devoted to statistical calculations. This article
describes what is now known about this machine and illustrates how Guerry
probably used it in his program of statistique analytique
to reason about the relationship of types of crimes to various potential
causes or associations. Supplementary materials for this article are
available online.
Journal: The American Statistician
Pages: 195-200
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.714716
File-URL: http://hdl.handle.net/10.1080/00031305.2012.714716
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:195-200
Template-Type: ReDIF-Article 1.0
Author-Name: Lawren Smithline
Author-X-Name-First: Lawren
Author-X-Name-Last: Smithline
Title: Letter to the Editor
Journal: The American Statistician
Pages: 207-207
Issue: 3
Volume: 66
Year: 2012
Month: 8
X-DOI: 10.1080/00031305.2012.718996
File-URL: http://hdl.handle.net/10.1080/00031305.2012.718996
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:3:p:207-207
Template-Type: ReDIF-Article 1.0
Author-Name: Sarah Keogh
Author-X-Name-First: Sarah
Author-X-Name-Last: Keogh
Author-Name: Donal O’Neill
Author-X-Name-First: Donal
Author-X-Name-Last: O’Neill
Title: A Statistical Analysis of the Fairness of Alternative Handicapping Systems in Ten-Pin Bowling
Abstract:
Using data on approximately 1040 games of bowling, we examine the
fairness of alternative handicapping systems in ten-pin bowling. The
objective of a handicap system is to allow less-skilled bowlers to compete
against more skilled opponents on a level playing field. We show that the
current systems used in many leagues do not achieve this objective and we
propose a new optimal system which equalizes the playing field across all
potential match-ups.
Journal: The American Statistician
Pages: 209-213
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.726933
File-URL: http://hdl.handle.net/10.1080/00031305.2012.726933
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:209-213
Template-Type: ReDIF-Article 1.0
Author-Name: Steven G. From
Author-X-Name-First: Steven G.
Author-X-Name-Last: From
Title: A Comparison of the Moment and Factorial Moment Bounds for Discrete Random Variables
Abstract:
In this note, we establish the superiority of the factorial moment bound
over the moment bound for nonnegative integer-valued discrete random
variables. This solves a problem posed earlier. We use some results from
approximation theory/numerical analysis to compare the bounds.
Journal: The American Statistician
Pages: 214-216
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.734769
File-URL: http://hdl.handle.net/10.1080/00031305.2012.734769
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:214-216
Template-Type: ReDIF-Article 1.0
Author-Name: Tommy Wright
Author-X-Name-First: Tommy
Author-X-Name-Last: Wright
Title: The Equivalence of Neyman Optimum Allocation for Sampling and Equal Proportions for Apportioning the U.S. House of Representatives
Abstract:
We present a surprising though obvious result that seems to have been
unnoticed until now. In particular, we demonstrate the equivalence of two
well-known problems—the optimal allocation of the fixed overall
sample size n among L strata under
stratified random sampling and the optimal allocation of the
H = 435 seats among the 50 states for apportionment of
the U.S. House of Representatives following each decennial census. In
spite of the strong similarity manifest in the statements of the two
problems, they have not been linked and they have well-known but different
solutions; one solution is not explicitly exact (Neyman allocation), and
the other (equal proportions) is exact. We give explicit exact solutions
for both and note that the solutions are equivalent. In fact, we conclude
by showing that both problems are special cases of a general problem. The
result is significant for stratified random sampling in that it explicitly
shows how to minimize sampling error when estimating a total
T_Y while keeping the final overall sample
size fixed at n; this is usually not the case in practice
with Neyman allocation where the resulting final overall sample size might
be near n + L after rounding. An example
reveals that controlled rounding with Neyman allocation does not always
lead to the optimum allocation, that is, an allocation that minimizes
variance.
Journal: The American Statistician
Pages: 217-224
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.733679
File-URL: http://hdl.handle.net/10.1080/00031305.2012.733679
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:217-224
Template-Type: ReDIF-Article 1.0
Author-Name: Yves Tillé
Author-X-Name-First: Yves
Author-X-Name-Last: Tillé
Author-Name: Matti Langel
Author-X-Name-First: Matti
Author-X-Name-Last: Langel
Title: Histogram-Based Interpolation of the Lorenz Curve and Gini Index for Grouped Data
Abstract:
In grouped data, the estimation of the Lorenz curve without taking into
account the within-class variability leads to an overestimation of the
curve and an underestimation of the Gini index. We propose a new strictly
convex estimator of the Lorenz curve derived from a linear
interpolation-based approximation of the cumulative distribution function.
Integrating the Lorenz curve, a correction can be derived for the Gini
index that takes the intraclass variability into account.
Journal: The American Statistician
Pages: 225-231
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.734197
File-URL: http://hdl.handle.net/10.1080/00031305.2012.734197
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:225-231
Template-Type: ReDIF-Article 1.0
Author-Name: Liang Hong
Author-X-Name-First: Liang
Author-X-Name-Last: Hong
Title: A Remark on the Alternative Expectation Formula
Abstract:
Students in their first course in probability will often see the
expectation formula for nonnegative continuous random variables in terms
of the survival function. The traditional approach for deriving this
formula (using double integrals) is well-received by students. Some
students tend to approach this using integration by parts, but often get
stuck. Most standard textbooks do not elaborate on this alternative
approach. We present a rigorous derivation here. We hope that students and
instructors of the first course in probability will find this short note
helpful.
Journal: The American Statistician
Pages: 232-233
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.726934
File-URL: http://hdl.handle.net/10.1080/00031305.2012.726934
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:232-233
Template-Type: ReDIF-Article 1.0
Author-Name: Stavros Kourouklis
Author-X-Name-First: Stavros
Author-X-Name-Last: Kourouklis
Title: A New Estimator of the Variance Based on Minimizing Mean Squared Error
Abstract:
In 2005, Yatracos constructed the estimator S_2^2 = c_2 S^2,
c_2 = (n + 2)(n − 1)[n(n + 1)]^{−1}, of the variance, which has smaller
mean squared error (MSE) than the unbiased estimator S^2. In this work,
the estimator S_1^2 = c_1 S^2, c_1 = n(n − 1)[n(n − 1) + 2]^{−1}, is
constructed and is shown to have the following properties: (a) it has
smaller MSE than S_2^2, and (b) it cannot be improved in terms of MSE by
an estimator of the form cS^2, c > 0. The method of construction is
based on Stein’s classical idea brought forward in 1964, is very simple,
and may be taught even in an undergraduate class. Also, all the
estimators of the form cS^2, c > 0, with smaller MSE than S^2, as well
as all those that have property (b), are found. In contrast to S^2, the
method of moments estimator is among the latter estimators.
Journal: The American Statistician
Pages: 234-236
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.735209
File-URL: http://hdl.handle.net/10.1080/00031305.2012.735209
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:234-236
Template-Type: ReDIF-Article 1.0
Author-Name: J. Kelly Cunningham
Author-X-Name-First: J. Kelly
Author-X-Name-Last: Cunningham
Title: Should S Get More Press?
Abstract:
This note discusses a problem appropriate for a beginning mathematical
statistics course. Four estimators of the standard deviation of a normal
data source are compared using mean square error. Both the uniformly
minimum variance unbiased estimator and the usual estimator are found to
be inadmissible.
Journal: The American Statistician
Pages: 237-237
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.736915
File-URL: http://hdl.handle.net/10.1080/00031305.2012.736915
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:237-237
Template-Type: ReDIF-Article 1.0
Author-Name: Robert A. Oster
Author-X-Name-First: Robert A.
Author-X-Name-Last: Oster
Title: Section Editor's Notes
Journal: The American Statistician
Pages: 238-238
Issue: 4
Volume: 66
Year: 2012
Month: 11
X-DOI: 10.1080/00031305.2012.743422
File-URL: http://hdl.handle.net/10.1080/00031305.2012.743422
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:66:y:2012:i:4:p:238-238
Template-Type: ReDIF-Article 1.0
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Author-Name: Christian P. Robert
Author-X-Name-First: Christian P.
Author-X-Name-Last: Robert
Title: "Not Only Defended But Also Applied": The Perceived Absurdity of Bayesian Inference
Abstract:
The missionary zeal of many Bayesians of old has been matched, in the
other direction, by an attitude among some theoreticians that Bayesian
methods were absurd: not merely misguided but obviously wrong in principle.
We consider several examples, beginning with Feller's classic text on
probability theory and continuing with more recent cases such as the
perceived Bayesian nature of the so-called doomsday argument. We analyze
in this note the intellectual background behind various misconceptions
about Bayesian statistics, without aiming at a complete historical
coverage of the reasons for this dismissal.
Journal: The American Statistician
Pages: 1-5
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2013.760987
File-URL: http://hdl.handle.net/10.1080/00031305.2013.760987
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:1-5
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen Stigler
Author-X-Name-First: Stephen
Author-X-Name-Last: Stigler
Title: Comment: Bayesian Inference: The Rodney Dangerfield of Statistics?
Journal: The American Statistician
Pages: 6-7
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.747448
File-URL: http://hdl.handle.net/10.1080/00031305.2012.747448
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:6-7
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen E. Fienberg
Author-X-Name-First: Stephen E.
Author-X-Name-Last: Fienberg
Title: Comment: Bayesian Ideas Reemerged in the 1950s
Journal: The American Statistician
Pages: 7-8
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.751881
File-URL: http://hdl.handle.net/10.1080/00031305.2012.751881
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:7-8
Template-Type: ReDIF-Article 1.0
Author-Name: Wesley O. Johnson
Author-X-Name-First: Wesley O.
Author-X-Name-Last: Johnson
Title: Comment: Bayesian Statistics in the Twenty First Century
Journal: The American Statistician
Pages: 9-11
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.751880
File-URL: http://hdl.handle.net/10.1080/00031305.2012.751880
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:9-11
Template-Type: ReDIF-Article 1.0
Author-Name: Deborah G. Mayo
Author-X-Name-First: Deborah G.
Author-X-Name-Last: Mayo
Title: Discussion: Bayesian Methods: Applied? Yes. Philosophical Defense? In Flux
Journal: The American Statistician
Pages: 11-15
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.752410
File-URL: http://hdl.handle.net/10.1080/00031305.2012.752410
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:11-15
Template-Type: ReDIF-Article 1.0
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Author-Name: Christian P. Robert
Author-X-Name-First: Christian P.
Author-X-Name-Last: Robert
Title: Rejoinder: The Anti-Bayesian Moment and Its Passing
Journal: The American Statistician
Pages: 16-17
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.752409
File-URL: http://hdl.handle.net/10.1080/00031305.2012.752409
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:16-17
Template-Type: ReDIF-Article 1.0
Author-Name: Sandy Zabell
Author-X-Name-First: Sandy
Author-X-Name-Last: Zabell
Title: Paul Meier on Legal Consulting
Abstract:
In addition to his contributions to biostatistics and clinical trials,
Paul Meier had a long-term interest in the legal applications of
statistics. As part of this, he had extensive experience as a statistical
consultant. Legal consulting can be a minefield, but as a result of his
background, Paul had excellent advice to give to those starting out on how
to function successfully in this environment.
Journal: The American Statistician
Pages: 18-21
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.742026
File-URL: http://hdl.handle.net/10.1080/00031305.2012.742026
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:18-21
Template-Type: ReDIF-Article 1.0
Author-Name: Rick Picard
Author-X-Name-First: Rick
Author-X-Name-Last: Picard
Author-Name: Brian Williams
Author-X-Name-First: Brian
Author-X-Name-Last: Williams
Title: Rare Event Estimation for Computer Models
Abstract:
Rare events for computer models are usually impossible to address via
direct methods: the conceptually straightforward approach of making
millions of "ordinary" code runs to generate an adequate number of rare
events simply is not an option. In Bayesian applications, the common
practice of sampling from posterior distributions is inefficient for rare
event estimation when some parameters are important, and corresponding
normalized estimates can be seriously biased for seemingly adequate sample
sizes (e.g., N = 10^6). Rare event estimation based
on adaptive importance sampling can improve computational efficiencies by
orders of magnitude relative to ordinary simulation methods, greatly
reducing the need for time-consuming code runs.
Journal: The American Statistician
Pages: 22-32
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.751879
File-URL: http://hdl.handle.net/10.1080/00031305.2012.751879
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:22-32
Template-Type: ReDIF-Article 1.0
Author-Name: Valeria Sambucini
Author-X-Name-First: Valeria
Author-X-Name-Last: Sambucini
Title: On the Nature of the Stationary Point of a Quadratic Response Surface: A Bayesian Simulation-Based Approach
Abstract:
In response-surface methodology, when the data are fitted using a
quadratic model, it is important to make inference about the eigenvalues
of the matrix of pure and mixed second-order coefficients, since they
contain information on the nature of the stationary point and the shape of
the surface. In this article, we propose a Bayesian simulation-based
approach to explore the behavior of the posterior distributions of these
eigenvalues. Highest posterior density (HPD) intervals for the ordered
eigenvalues are then computed and their empirical coverage probabilities
are evaluated. A user-friendly software tool has been developed to get the
kernel density plots of these simulated posterior distributions and to
obtain the corresponding HPD intervals. It is provided online as
supplementary materials to this article.
Journal: The American Statistician
Pages: 33-41
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.755366
File-URL: http://hdl.handle.net/10.1080/00031305.2012.755366
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:33-41
Template-Type: ReDIF-Article 1.0
Author-Name: Bruce Levin
Author-X-Name-First: Bruce
Author-X-Name-Last: Levin
Author-Name: Cheng-Shiun Leu
Author-X-Name-First: Cheng-Shiun
Author-X-Name-Last: Leu
Title: Note on an Identity Between Two Unbiased Variance Estimators for the Grand Mean in a Simple Random Effects Model
Abstract:
We demonstrate the algebraic equivalence of two unbiased variance
estimators for the sample grand mean in a random sample of subjects from
an infinite population where subjects provide repeated observations
following a homoscedastic random effects model.
Journal: The American Statistician
Pages: 42-43
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.752105
File-URL: http://hdl.handle.net/10.1080/00031305.2012.752105
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:42-43
Template-Type: ReDIF-Article 1.0
Author-Name: Roberto Behar
Author-X-Name-First: Roberto
Author-X-Name-Last: Behar
Author-Name: Pere Grima
Author-X-Name-First: Pere
Author-X-Name-Last: Grima
Author-Name: Lluís Marco-Almagro
Author-X-Name-First: Lluís
Author-X-Name-Last: Marco-Almagro
Title: Twenty-Five Analogies for Explaining Statistical Concepts
Abstract:
The use of analogies is a resource that can be used for transmitting
concepts and making classes more enjoyable. This article presents 25
analogies that we use in our introductory statistical courses for
introducing concepts and clarifying possible doubts. We have found that
these analogies draw students' attention and reinforce the ideas that we
want to transmit.
Journal: The American Statistician
Pages: 44-48
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.752408
File-URL: http://hdl.handle.net/10.1080/00031305.2012.752408
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:44-48
Template-Type: ReDIF-Article 1.0
Author-Name: Subhash C. Bagui
Author-X-Name-First: Subhash C.
Author-X-Name-Last: Bagui
Author-Name: Dulal K. Bhaumik
Author-X-Name-First: Dulal K.
Author-X-Name-Last: Bhaumik
Author-Name: K. L. Mehra
Author-X-Name-First: K. L.
Author-X-Name-Last: Mehra
Title: A Few Counter Examples Useful in Teaching Central Limit Theorems
Abstract:
In probability theory, central limit theorems (CLTs), broadly speaking,
state that the distribution of the sum of a sequence of random variables
(r.v.'s), suitably normalized, converges to a normal distribution as their
number n increases indefinitely. However, the preceding
convergence in distribution holds only under certain conditions, depending
on the underlying probabilistic nature of this sequence of r.v.'s. If some
of the assumed conditions are violated, the convergence may or may not
hold, or if it does, this convergence may be to a nonnormal distribution.
We shall illustrate this via a few counter examples. While teaching CLTs
at an advanced level, counter examples can serve as useful tools for
explaining the true nature of these CLTs and the consequences when some of
the assumptions made are violated.
Journal: The American Statistician
Pages: 49-56
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.755361
File-URL: http://hdl.handle.net/10.1080/00031305.2012.755361
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:49-56
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen M. Stigler
Author-X-Name-First: Stephen M.
Author-X-Name-Last: Stigler
Title: The Digital Approximation of the Binomial by the Poisson
Abstract:
An old source can lead to looking at the Poisson approximation to the
binomial in a new light.
Journal: The American Statistician
Pages: 57-59
Issue: 1
Volume: 67
Year: 2013
Month: 2
X-DOI: 10.1080/00031305.2012.755473
File-URL: http://hdl.handle.net/10.1080/00031305.2012.755473
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:1:p:57-59
Template-Type: ReDIF-Article 1.0
Author-Name: Paul R. Rosenbaum
Author-X-Name-First: Paul R.
Author-X-Name-Last: Rosenbaum
Author-Name: Jeffrey H. Silber
Author-X-Name-First: Jeffrey H.
Author-X-Name-Last: Silber
Title: Using the Exterior Match to Compare Two Entwined Matched Control Groups
Abstract:
When comparing outcomes, such as survival, in two groups, say
a focal group and a comparison group, a common question is whether an
adjustment for certain baseline differences that separate these two groups
actually matters for the difference in outcomes. Did the adjustment
matter? If it did matter, to what quantitative extent did it matter? This
question is quite distinct from whether the baseline variables predict the
outcome: baseline variables may predict the outcome, yet explain no part
of the difference in outcomes in two groups. The question is also distinct
from whether a difference between the groups remains after adjustment: an
adjustment may matter quite a bit, yet fail to explain a substantial part
of the difference in outcomes, and, indeed, adjustment may increase the
difference. Whether an adjustment for (x_1, x_2) matters over and above
an adjustment for x_1 alone can be addressed by comparing outcomes in
two control groups formed from the comparison group, one matched to the
focal group for x_1 alone, the other matched to the focal group for
(x_1, x_2). How do outcomes differ in these two matched control groups?
If two control groups are each pair-matched to the same focal group, then
the result is a set of matched triples, so controls in the two groups are
implicitly matched to each other by virtue of being matched to the same
person in the focal group. When the comparison group is vastly larger than
the focal group and their distributions exhibit extensive overlap on
(x_1, x_2), it may be possible to construct nonintersecting matched
control groups, but quite often the comparison group is large enough to
yield closely matched groups one at a time, but is not large enough to
produce several nonintersecting matched control groups. How can one
compare two matched control groups that are entwined, with some of the
same controls in both groups? Two entwined control groups have a nonempty
intersection: some of the same controls appear in both groups as
duplicates. These duplicates may appear in the same matched triple, but
more commonly they appear in different matched triples. This structure
yields a new nonintersecting match that we call the exterior match.
Properties of the exterior match are discussed. Our ongoing study of
black-versus-white disparities in survival following breast cancer in
Medicare motivated this work and is used as an illustration.
Journal: The American Statistician
Pages: 67-75
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.769914
File-URL: http://hdl.handle.net/10.1080/00031305.2013.769914
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:67-75
Template-Type: ReDIF-Article 1.0
Author-Name: Micha Mandel
Author-X-Name-First: Micha
Author-X-Name-Last: Mandel
Title: Simulation-Based Confidence Intervals for Functions With Complicated Derivatives
Abstract:
In many scientific problems, the quantity of interest is a
function of parameters that index the model, and confidence intervals are
constructed by applying the delta method. However, when the function of
interest has complicated derivatives, this standard approach is
unattractive and alternative algorithms are required. This article
discusses a simple simulation-based algorithm for estimating the variance
of a transformation, and demonstrates its simplicity and accuracy by
applying it to several statistical problems.
Journal: The American Statistician
Pages: 76-81
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.783880
File-URL: http://hdl.handle.net/10.1080/00031305.2013.783880
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:76-81
Template-Type: ReDIF-Article 1.0
Author-Name: Anthony J. Webster
Author-X-Name-First: Anthony J.
Author-X-Name-Last: Webster
Author-Name: Richard Kemp
Author-X-Name-First: Richard
Author-X-Name-Last: Kemp
Title: Estimating Omissions From Searches
Abstract:
The mark-recapture method was devised by Petersen in 1896 to
estimate the number of fish migrating into the Limfjord, and independently
by Lincoln in 1930 to estimate waterfowl abundance. The technique can be
applied to any search for a finite number of items by two or more people
or agents, allowing the number of searched-for items to be estimated. This
ubiquitous problem appears in fields from ecology and epidemiology,
through to mathematics, social sciences, and computing. Here, we exactly
calculate the moments of the hypergeometric distribution associated with
this longstanding problem, confirming that widely used estimates
conjectured in 1951 are often too small. Our Bayesian approach highlights
how different search strategies will modify the estimates. The estimates
are applied to several examples. For some published applications,
substantial errors are found to result from using the Chapman or
Lincoln--Petersen estimates. Supplementary materials for this article are
available online.
Journal: The American Statistician
Pages: 82-89
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.783881
File-URL: http://hdl.handle.net/10.1080/00031305.2013.783881
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:82-89
Template-Type: ReDIF-Article 1.0
Author-Name: Sergio Wechsler
Author-X-Name-First: Sergio
Author-X-Name-Last: Wechsler
Author-Name: Rafael Izbicki
Author-X-Name-First: Rafael
Author-X-Name-Last: Izbicki
Author-Name: Luís Gustavo Esteves
Author-X-Name-First: Luís Gustavo
Author-X-Name-Last: Esteves
Title: A Bayesian Look at Nonidentifiability: A Simple Example
Abstract:
This article discusses the concept of identifiability in
simple probability calculus. Emphasis is given to Bayesian solutions. In
particular, we compare Bayes and maximum likelihood estimators. We
advocate adoption of informative prior probabilities for the Bayesian
operation in place of diffuse or reference priors. We also discuss the
concept of identifying functions.
Journal: The American Statistician
Pages: 90-93
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.778787
File-URL: http://hdl.handle.net/10.1080/00031305.2013.778787
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:90-93
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen B. Vardeman
Author-X-Name-First: Stephen B.
Author-X-Name-Last: Vardeman
Author-Name: Max D. Morris
Author-X-Name-First: Max D.
Author-X-Name-Last: Morris
Title: Majority Voting by Independent Classifiers Can Increase Error Rates
Abstract:
The technique of "majority voting" of classifiers is used in
machine learning with the aim of constructing a new combined
classification rule that has better characteristics than any of a given
set of rules. The "Condorcet Jury Theorem" is often cited, incorrectly, as
support for a claim that this practice leads to an improved classifier
(i.e., one with smaller error probabilities) when the given classifiers
are sufficiently good and are uncorrelated. We specifically address the
case of two-category classification, and argue that a correct claim can be
made for independent (not just uncorrelated) classification errors (not
the classifiers themselves), and offer an example demonstrating that the
common claim is false. Supplementary materials for this article are
available online.
Journal: The American Statistician
Pages: 94-96
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.778788
File-URL: http://hdl.handle.net/10.1080/00031305.2013.778788
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:94-96
Template-Type: ReDIF-Article 1.0
Author-Name: Mithat Gönen
Author-X-Name-First: Mithat
Author-X-Name-Last: Gönen
Title: Visualizing Longitudinal Data With Dropouts
Abstract:
This article proposes a triangle plot to display longitudinal
data with dropouts. The triangle plot is a tool of data visualization that
can also serve as a graphical check for informativeness of the dropout
process. There are similarities between the lasagna plot and the triangle
plot, but the explicit use of dropout time as an axis is an advantage of
the triangle plot over the more commonly used graphical strategies for
longitudinal data. It is possible to interpret the triangle plot as a
trellis plot, which gives rise to several extensions such as the triangle
histogram and the triangle boxplot. R code is available to streamline the
use of the triangle plot in practice. Supplementary materials for this
article are available online.
Journal: The American Statistician
Pages: 97-103
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.785980
File-URL: http://hdl.handle.net/10.1080/00031305.2013.785980
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:97-103
Template-Type: ReDIF-Article 1.0
Author-Name: Robert A. Oster
Author-X-Name-First: Robert A.
Author-X-Name-Last: Oster
Title: Section Editor's Notes
Journal: The American Statistician
Pages: 104-104
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.788307
File-URL: http://hdl.handle.net/10.1080/00031305.2013.788307
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:104-104
Template-Type: ReDIF-Article 1.0
Author-Name: Richard G. Lomax
Author-X-Name-First: Richard G.
Author-X-Name-Last: Lomax
Title: Statistical Accuracy of iPad Applications: An Initial Examination
Abstract:
With the recent advent of the iPad, statistics-related
applications (apps) have begun development. Given their newness,
statistical accuracy is a concern. This study assessed the accuracy of the
following iPad apps: Data Explorer, StatsMate, Statistics Visualizer, and
TC-Stats. Early and recent versions of Excel were also included for
comparative purposes. Accuracy was considered in two ways. First, the
National Institute of Standards and Technology Statistical Reference
Datasets (StRD) were used to benchmark accuracy. Analyses included
univariate summary statistics (means, standard deviations), analysis of
variance (ANOVA; F statistics), and linear regression
(regression coefficients, standard deviations). The log relative error was
computed for each dataset (comparing the "certified" values from StRD
against the app actual values). Second, Wilkinson's tests were conducted
to assess app "pass" rates (rounding, scatterplot, univariate, regression,
overall). The results suggest the following: (a) the most accurate app for
summary statistics and for lower difficulty ANOVA datasets was StatsMate,
(b) the most accurate app for average difficulty ANOVA datasets was Data
Explorer, (c) no app was accurate for higher difficulty ANOVA datasets,
(d) only Data Explorer could handle most regression models, and (e)
Wilkinson pass rates for Data Explorer (79%) and StatsMate (58%) were
highest. Overall, StatsMate compares favorably to Excel 97, the two
being similarly accurate. Much remains to be done to improve the
statistical accuracy of these apps.
Journal: The American Statistician
Pages: 105-108
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2013.778789
File-URL: http://hdl.handle.net/10.1080/00031305.2013.778789
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:105-108
Template-Type: ReDIF-Article 1.0
Author-Name: Brian Caffo
Author-X-Name-First: Brian
Author-X-Name-Last: Caffo
Author-Name: Carolyn Lauzon
Author-X-Name-First: Carolyn
Author-X-Name-Last: Lauzon
Author-Name: Joachim Röhmel
Author-X-Name-First: Joachim
Author-X-Name-Last: Röhmel
Title: Correction to "Easy Multiplicity Control in Equivalence Testing Using Two One-Sided Tests"
Journal: The American Statistician
Pages: 115-116
Issue: 2
Volume: 67
Year: 2013
Month: 5
X-DOI: 10.1080/00031305.2012.760487
File-URL: http://hdl.handle.net/10.1080/00031305.2012.760487
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:2:p:115-116
Template-Type: ReDIF-Article 1.0
Author-Name: Francesca Greselin
Author-X-Name-First: Francesca
Author-X-Name-Last: Greselin
Author-Name: Antonio Punzo
Author-X-Name-First: Antonio
Author-X-Name-Last: Punzo
Title: Closed Likelihood Ratio Testing Procedures to Assess Similarity of Covariance Matrices
Abstract:
In this article, we introduce a multiple testing procedure to
assess a common covariance structure between k groups.
The new test allows for a choice among eight different patterns arising
from the three-term eigen decomposition of the group covariances. It is
based on the closed testing principle and adopts local likelihood ratio
(LR) tests. The approach reveals richer information about the underlying
data structure than classical methods, the most common of which
distinguishes only between homoscedasticity and heteroscedasticity. At the
same time, it provides a more parsimonious parameterization whenever the constrained model is suitable
to describe the real data. The new inferential methodology is then applied
to some well-known datasets chosen from the multivariate literature.
Finally, simulation results are presented to investigate its performance
in different situations representing gradual departures from
homoscedasticity and to evaluate the reliability of using the asymptotic
χ² distribution to approximate the actual distribution of the local LR
test statistics.
Journal: The American Statistician
Pages: 117-128
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.791643
File-URL: http://hdl.handle.net/10.1080/00031305.2013.791643
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:117-128
Template-Type: ReDIF-Article 1.0
Author-Name: Kevin Wright
Author-X-Name-First: Kevin
Author-X-Name-Last: Wright
Title: Revisiting Immer's Barley Data
Abstract:
This article reexamines the famous barley data that are often
used to demonstrate dot plots. Additional sources of supplemental data
provide context for interpretation of the original data. Graphical and
mixed-model analyses shed new light on the variability in the data and
challenge previously held beliefs about the accuracy of the data.
Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 129-133
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.801783
File-URL: http://hdl.handle.net/10.1080/00031305.2013.801783
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:129-133
Template-Type: ReDIF-Article 1.0
Author-Name: Chong Zhang
Author-X-Name-First: Chong
Author-X-Name-Last: Zhang
Author-Name: Yufeng Liu
Author-X-Name-First: Yufeng
Author-X-Name-Last: Liu
Author-Name: Zhengxiao Wu
Author-X-Name-First: Zhengxiao
Author-X-Name-Last: Wu
Title: On the Effect and Remedies of Shrinkage on Classification Probability Estimation
Abstract:
Shrinkage methods have been shown to be effective for
classification problems. As a form of regularization, shrinkage through
penalization helps to avoid overfitting and produces accurate classifiers
for prediction, especially when the dimension is relatively high. Despite
the benefit of shrinkage on classification accuracy of resulting
classifiers, in this article, we demonstrate that shrinkage creates biases
on classification probability estimation. In many cases, this bias can be
large and consequently yield poor class probability estimation when the
sample size is small or moderate. We offer some theoretical insights into
the effect of shrinkage and provide remedies for better class probability
estimation. Using penalized logistic regression and proximal support
vector machines as examples, we demonstrate that our proposed refit method
gives similar classification accuracy and remarkable improvements on
probability estimation on several simulated and real data examples.
Journal: The American Statistician
Pages: 134-142
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.817356
File-URL: http://hdl.handle.net/10.1080/00031305.2013.817356
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:134-142
Template-Type: ReDIF-Article 1.0
Author-Name: Jingchen Hu
Author-X-Name-First: Jingchen
Author-X-Name-Last: Hu
Author-Name: Robin Mitra
Author-X-Name-First: Robin
Author-X-Name-Last: Mitra
Author-Name: Jerome Reiter
Author-X-Name-First: Jerome
Author-X-Name-Last: Reiter
Title: Are Independent Parameter Draws Necessary for Multiple Imputation?
Abstract:
In typical implementations of multiple imputation for missing
data, analysts create m completed datasets based on
approximately independent draws of imputation model parameters. We use
theoretical arguments and simulations to show that, provided
m is large, the use of independent draws is not
necessary. In fact, appropriate use of dependent draws can improve
precision relative to the use of independent draws. It also eliminates the
sometimes difficult task of obtaining independent draws; for example, in
fully Bayesian imputation models based on MCMC, analysts can avoid the
search for a subsampling interval that ensures approximately independent
draws for all parameters. We illustrate the use of dependent draws in
multiple imputation with a study of the effect of breast feeding on
children's later cognitive abilities.
Journal: The American Statistician
Pages: 143-149
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.821953
File-URL: http://hdl.handle.net/10.1080/00031305.2013.821953
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:143-149
Template-Type: ReDIF-Article 1.0
Author-Name: Richard J. Barker
Author-X-Name-First: Richard J.
Author-X-Name-Last: Barker
Author-Name: William A. Link
Author-X-Name-First: William A.
Author-X-Name-Last: Link
Title: Bayesian Multimodel Inference by RJMCMC: A Gibbs Sampling Approach
Abstract:
Bayesian multimodel inference treats a set of candidate
models as the sample space of a latent categorical random variable,
sampled once; the data at hand are modeled as having been generated
according to the sampled model. Model selection and model averaging are
based on the posterior probabilities for the model set. Reversible-jump
Markov chain Monte Carlo (RJMCMC) extends ordinary MCMC methods to this
meta-model. We describe a version of RJMCMC that intuitively represents
the process as Gibbs sampling with alternating updates of a categorical
variable M (for Model) and a "palette"
of parameters,
from which any of the model-specific parameters can be calculated. Our
representation makes plain how model-specific Monte Carlo outputs
(analytical or numerical) can be post-processed to compute model weights
or Bayes factors. We illustrate the procedure with several examples.
Journal: The American Statistician
Pages: 150-156
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.791644
File-URL: http://hdl.handle.net/10.1080/00031305.2013.791644
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:150-156
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel A. Griffith
Author-X-Name-First: Daniel A.
Author-X-Name-Last: Griffith
Title: Better Articulating Normal Curve Theory for Introductory Mathematical Statistics Students: Power Transformations and Their Back-Transformations
Abstract:
This article addresses a gap in many, if not all,
introductory mathematical statistics textbooks, namely, transforming a
random variable so that it better mimics a normal distribution. Virtually
all such textbooks treat the subject of variable transformations, which
furnishes a nice opportunity to introduce and study this
transformation-to-normality topic, a topic students frequently encounter
in subsequent applied statistics courses. Accordingly, this article
reviews variable power transformations of the Box--Cox type within the
context of normal curve theory, as well as addresses their corresponding
back-transformations. It presents four theorems and a conjecture that
furnish the basics needed to derive equivalent results for all nonnegative
values of the Box--Cox power transformation exponent. Results are
illustrated with the exponential random variable. This article also
includes selected pedagogic tools created with R code.
Journal: The American Statistician
Pages: 157-169
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.801782
File-URL: http://hdl.handle.net/10.1080/00031305.2013.801782
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:157-169
Template-Type: ReDIF-Article 1.0
Author-Name: Robert A. Oster
Author-X-Name-First: Robert A.
Author-X-Name-Last: Oster
Title: Section Editor's Notes
Journal: The American Statistician
Pages: 170-170
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.822199
File-URL: http://hdl.handle.net/10.1080/00031305.2013.822199
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:170-170
Template-Type: ReDIF-Article 1.0
Author-Name: Yoonsang Kim
Author-X-Name-First: Yoonsang
Author-X-Name-Last: Kim
Author-Name: Young-Ku Choi
Author-X-Name-First: Young-Ku
Author-X-Name-Last: Choi
Author-Name: Sherry Emery
Author-X-Name-First: Sherry
Author-X-Name-Last: Emery
Title: Logistic Regression With Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages
Abstract:
Several statistical packages are capable of estimating
generalized linear mixed models and these packages provide one or more of
three estimation methods: penalized quasi-likelihood, Laplace, and
Gauss--Hermite. Many studies have investigated these methods' performance
for the mixed-effects logistic regression model. However, the authors
focused on models with one or two random effects and assumed a simple
covariance structure between them, which may not be realistic. When there
are multiple correlated random effects in a model, the computation becomes
intensive, and often an algorithm fails to converge. Moreover, in our
analysis of smoking status and exposure to antitobacco advertisements, we
have observed that when a model included multiple random effects,
parameter estimates varied considerably from one statistical package to
another even when using the same estimation method. This article presents
a comprehensive review of the advantages and disadvantages of each
estimation method. In addition, we compare the performances of the three
methods across statistical packages via simulation, which involves two-
and three-level logistic regression models with at least three correlated
random effects. We apply our findings to a real dataset. Our results
suggest that two packages -- SAS GLIMMIX Laplace and SuperMix Gaussian
quadrature -- perform well in terms of accuracy, precision, convergence
rates, and computing speed. We also discuss the strengths and weaknesses
of the two packages in regard to sample sizes.
Journal: The American Statistician
Pages: 171-182
Issue: 3
Volume: 67
Year: 2013
Month: 8
X-DOI: 10.1080/00031305.2013.817357
File-URL: http://hdl.handle.net/10.1080/00031305.2013.817357
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:3:p:171-182
Template-Type: ReDIF-Article 1.0
Author-Name: Kristian Lum
Author-X-Name-First: Kristian
Author-X-Name-Last: Lum
Author-Name: Megan Emily Price
Author-X-Name-First: Megan Emily
Author-X-Name-Last: Price
Author-Name: David Banks
Author-X-Name-First: David
Author-X-Name-Last: Banks
Title: Applications of Multiple Systems Estimation in Human Rights Research
Abstract:
Multiple systems estimation (MSE) is becoming an increasingly
common approach for exploratory study of underreported events in the field
of quantitative human rights. In this context, it is used to estimate the
number of people who died as a result of political unrest when it is
believed that many of those who died or disappeared were never reported.
MSE relies upon several assumptions, each of which may be slightly or
significantly violated in particular applications. This article outlines
the evolution of the application of MSE to human rights research through
the use of three case studies: Guatemala, Peru, and Colombia. Each of
these cases presents distinct challenges to the MSE method. Motivated by
these applications, we describe new methodology for assessing the impact
of violated assumptions in MSE. Our approach uses simulations to explore
the cumulative magnitude of errors introduced by violation of the model
assumptions at each stage in the analysis.
Journal: The American Statistician
Pages: 191-200
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.821093
File-URL: http://hdl.handle.net/10.1080/00031305.2013.821093
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:191-200
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen E. Fienberg
Author-X-Name-First: Stephen E.
Author-X-Name-Last: Fienberg
Title: Comment: Innovations Associated with Multiple Systems Estimation in Human Rights Settings
Journal: The American Statistician
Pages: 201-202
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.855108
File-URL: http://hdl.handle.net/10.1080/00031305.2013.855108
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:201-202
Template-Type: ReDIF-Article 1.0
Author-Name: Joseph B. Kadane
Author-X-Name-First: Joseph B.
Author-X-Name-Last: Kadane
Title: Comment
Journal: The American Statistician
Pages: 202-203
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.855106
File-URL: http://hdl.handle.net/10.1080/00031305.2013.855106
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:202-203
Template-Type: ReDIF-Article 1.0
Author-Name: Fritz Scheuren
Author-X-Name-First: Fritz
Author-X-Name-Last: Scheuren
Title: Comment
Journal: The American Statistician
Pages: 203-205
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.852026
File-URL: http://hdl.handle.net/10.1080/00031305.2013.852026
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:203-205
Template-Type: ReDIF-Article 1.0
Author-Name: Kristian Lum
Author-X-Name-First: Kristian
Author-X-Name-Last: Lum
Author-Name: Megan Emily Price
Author-X-Name-First: Megan Emily
Author-X-Name-Last: Price
Author-Name: David Banks
Author-X-Name-First: David
Author-X-Name-Last: Banks
Title: Rejoinder
Journal: The American Statistician
Pages: 205-206
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.855109
File-URL: http://hdl.handle.net/10.1080/00031305.2013.855109
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:205-206
Template-Type: ReDIF-Article 1.0
Author-Name: Paul Fogel
Author-X-Name-First: Paul
Author-X-Name-Last: Fogel
Author-Name: Douglas M. Hawkins
Author-X-Name-First: Douglas M.
Author-X-Name-Last: Hawkins
Author-Name: Chris Beecher
Author-X-Name-First: Chris
Author-X-Name-Last: Beecher
Author-Name: George Luta
Author-X-Name-First: George
Author-X-Name-Last: Luta
Author-Name: S. Stanley Young
Author-X-Name-First: S. Stanley
Author-X-Name-Last: Young
Title: A Tale of Two Matrix Factorizations
Abstract:
In statistical practice, rectangular tables of numeric data
are commonplace, and are often analyzed using dimension-reduction methods
like the singular value decomposition and its close cousin, principal
component analysis (PCA). This analysis produces score and loading
matrices representing the rows and the columns of the original table and
these matrices may be used for both prediction purposes and to gain
structural understanding of the data. In some tables, the data entries are
necessarily nonnegative (apart, perhaps, from some small random noise),
and so the matrix factors meant to represent them should arguably also
contain only nonnegative elements. This thinking, and the desire for
parsimony, underlies such techniques as rotating factors in a search for
"simple structure." These attempts to transform score or
loading matrices of mixed sign into nonnegative, parsimonious forms are,
however, indirect and at best imperfect. The recent development of
nonnegative matrix factorization, or NMF, is an attractive alternative.
Rather than attempt to transform a loading or score matrix of mixed signs
into one with only nonnegative elements, it directly seeks matrix factors
containing only nonnegative elements. The resulting factorization often
leads to substantial improvements in interpretability of the factors. We
illustrate this potential by synthetic examples and a real dataset. The
question of exactly when NMF is effective is not fully resolved, but some
indicators of its domain of success are given. It is pointed out that the
NMF factors can be used in much the same way as those coming from PCA for
such tasks as ordination, clustering, and prediction. Supplementary
materials for this article are available online.
Journal: The American Statistician
Pages: 207-218
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.845607
File-URL: http://hdl.handle.net/10.1080/00031305.2013.845607
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:207-218
Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas J. Horton
Author-X-Name-First: Nicholas J.
Author-X-Name-Last: Horton
Title: I Hear, I Forget. I Do, I Understand: A Modified Moore-Method Mathematical Statistics Course
Abstract:
Moore introduced a method for graduate mathematics
instruction that consisted primarily of individual student work on
challenging proofs. Cohen described an adaptation with less explicit
competition suitable for undergraduate students at a liberal arts college.
This article details an adaptation of this modified Moore method to teach
mathematical statistics, and describes ways that such an approach helps
engage students and foster the teaching of statistics. Groups of students
worked a set of three difficult problems (some theoretical, some applied)
every two weeks. Class time was devoted to coaching sessions with the
instructor, group meeting time, and class presentations. R was used to
estimate solutions empirically, where analytic results were intractable,
as well as to provide an environment to undertake simulation studies with
the aim of deepening understanding and complementing analytic solutions.
Each group presented comprehensive solutions to complement oral
presentations. Development of parallel techniques for empirical and
analytic problem solving was an explicit goal of the course, which also
attempted to communicate ways that statistics can be used to tackle
interesting problems. The group problem-solving component and use of
technology allowed students to attempt much more challenging questions
than they could otherwise solve. Supplementary materials for this article
are available online.
Journal: The American Statistician
Pages: 219-228
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.849207
File-URL: http://hdl.handle.net/10.1080/00031305.2013.849207
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:219-228
Template-Type: ReDIF-Article 1.0
Author-Name: Howard Gitlow
Author-X-Name-First: Howard
Author-X-Name-Last: Gitlow
Author-Name: Hernan Awad
Author-X-Name-First: Hernan
Author-X-Name-Last: Awad
Title: Intro Stats Students Need Both Confidence and Tolerance (Intervals)
Abstract:
Tolerance intervals are typically not taught in introductory
statistics courses aimed at business, engineering, and science majors.
This is regrettable, since students are likely to encounter practical
problems that should be analyzed using tolerance intervals. Additionally,
contrasting tolerance intervals against confidence intervals will improve
students' understanding of confidence intervals, eliminating
frequent confusions. In this article, we make the argument for teaching
tolerance intervals in introductory statistics courses, and we offer
suggestions about what to teach.
Journal: The American Statistician
Pages: 229-234
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.839482
File-URL: http://hdl.handle.net/10.1080/00031305.2013.839482
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:229-234
Template-Type: ReDIF-Article 1.0
Author-Name: Yeyi Zhu
Author-X-Name-First: Yeyi
Author-X-Name-Last: Zhu
Author-Name: Ladia M. Hernandez
Author-X-Name-First: Ladia M.
Author-X-Name-Last: Hernandez
Author-Name: Peter Mueller
Author-X-Name-First: Peter
Author-X-Name-Last: Mueller
Author-Name: Yongquan Dong
Author-X-Name-First: Yongquan
Author-X-Name-Last: Dong
Author-Name: Michele R. Forman
Author-X-Name-First: Michele R.
Author-X-Name-Last: Forman
Title: Data Acquisition and Preprocessing in Studies on Humans: What is Not Taught in Statistics Classes?
Abstract:
The aim of this article is to address issues in research that
may be missing from statistics classes and important for (bio-) statistics
students. In the context of a case study, we discuss data acquisition and
preprocessing steps that fill the gap between research questions posed by
subject matter scientists and statistical methodology for formal
inference. Issues include participant recruitment, data collection
training and standardization, variable coding, data review and
verification, data cleaning and editing, and documentation. Despite the
critical importance of these details in research, most of these issues are
rarely discussed in an applied statistics program. One reason for the lack
of more formal training is the difficulty in addressing the many
challenges that can possibly arise in the course of a study in a
systematic way. This article can help to bridge the gap between research
questions and formal statistical inference by using an illustrative case
study for a discussion. We hope that reading and discussing this article
and practicing data preprocessing exercises will sensitize statistics
students to these important issues and help them achieve optimal conduct,
quality control, analysis, and interpretation of a study.
Journal: The American Statistician
Pages: 235-241
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.842498
File-URL: http://hdl.handle.net/10.1080/00031305.2013.842498
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:235-241
Template-Type: ReDIF-Article 1.0
Author-Name: Richard L. Warr
Author-X-Name-First: Richard L.
Author-X-Name-Last: Warr
Author-Name: Roger A. Erich
Author-X-Name-First: Roger A.
Author-X-Name-Last: Erich
Title: Should the Interquartile Range Divided by the Standard Deviation be Used to Assess Normality?
Abstract:
We discourage the use of a diagnostic for normality: the
interquartile range divided by the standard deviation. This statistic has
been suggested in several introductory statistics books as a method to
assess normality. Through simulation, we explore the rate at which this
statistic converges to its asymptotic normal distribution, and the actual
size of tests based on the asymptotic distribution at several sample
sizes. We show that there are nonnormal distributions from which this
method cannot detect a difference. Additionally, we show the power of this
test for normality is quite poor when compared with the Shapiro--Wilk
test.
Journal: The American Statistician
Pages: 242-244
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.847385
File-URL: http://hdl.handle.net/10.1080/00031305.2013.847385
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:242-244
Template-Type: ReDIF-Article 1.0
Author-Name: Changyong Feng
Author-X-Name-First: Changyong
Author-X-Name-Last: Feng
Author-Name: Hongyue Wang
Author-X-Name-First: Hongyue
Author-X-Name-Last: Wang
Author-Name: Yu Han
Author-X-Name-First: Yu
Author-X-Name-Last: Han
Author-Name: Yinglin Xia
Author-X-Name-First: Yinglin
Author-X-Name-Last: Xia
Author-Name: Xin M. Tu
Author-X-Name-First: Xin M.
Author-X-Name-Last: Tu
Title: The Mean Value Theorem and Taylor's Expansion in Statistics
Abstract:
The mean value theorem and Taylor's expansion are
powerful tools in statistics that are used to derive estimators from
nonlinear estimating equations and to study the asymptotic properties of
the resulting estimators. However, the mean value theorem for a
vector-valued differentiable function does not exist. Our survey shows
that this nonexistent theorem has been used for a long time in statistical
literature to derive the asymptotic properties of estimators and is still
being used. We review several frequently cited papers and monographs that
have misused this "theorem" and discuss the flaws in these
applications. We also offer methods to fix such errors.
Journal: The American Statistician
Pages: 245-248
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.844203
File-URL: http://hdl.handle.net/10.1080/00031305.2013.844203
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:245-248
Template-Type: ReDIF-Article 1.0
Author-Name: Sivan Aldor-Noiman
Author-X-Name-First: Sivan
Author-X-Name-Last: Aldor-Noiman
Author-Name: Lawrence D. Brown
Author-X-Name-First: Lawrence D.
Author-X-Name-Last: Brown
Author-Name: Andreas Buja
Author-X-Name-First: Andreas
Author-X-Name-Last: Buja
Author-Name: Wolfgang Rolke
Author-X-Name-First: Wolfgang
Author-X-Name-Last: Rolke
Author-Name: Robert A. Stine
Author-X-Name-First: Robert A.
Author-X-Name-Last: Stine
Title: The Power to See: A New Graphical Test of Normality
Abstract:
Many statistical procedures assume that the underlying
data-generating process involves Gaussian errors. Among the popular tests
for normality, only the Kolmogorov--Smirnov test has a graphical
representation. Alternative tests, such as the Shapiro--Wilk test, offer
little insight as to how the observed data deviate from normality. In this
article, we discuss a simple new graphical procedure which provides
simultaneous confidence bands for a normal quantile--quantile plot. These
bands define a test of normality and are narrower in the tails than those
related to the Kolmogorov--Smirnov test. Correspondingly, the new
procedure has greater power to detect deviations from normality in the
tails. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 249-260
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.847865
File-URL: http://hdl.handle.net/10.1080/00031305.2013.847865
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:249-260
Template-Type: ReDIF-Article 1.0
Author-Name: Nancy L. Segal
Author-X-Name-First: Nancy L.
Author-X-Name-Last: Segal
Author-Name: Jorge Torres
Author-X-Name-First: Jorge
Author-X-Name-Last: Torres
Title: A Repeated Grammatical Error Does Not Make it Right
Journal: The American Statistician
Pages: 266-266
Issue: 4
Volume: 67
Year: 2013
Month: 11
X-DOI: 10.1080/00031305.2013.834269
File-URL: http://hdl.handle.net/10.1080/00031305.2013.834269
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:67:y:2013:i:4:p:266-266
Template-Type: ReDIF-Article 1.0
Author-Name: Timothy W. Armistead
Author-X-Name-First: Timothy W.
Author-X-Name-Last: Armistead
Title: Resurrecting the Third Variable: A Critique of Pearl's Causal Analysis of Simpson's Paradox
Abstract:
Pearl argued that Simpson's Paradox would
not be considered paradoxical but for statisticians' unwillingness to
acknowledge the role of causality in resolving an instance of it. He
proposed using a causal calculus to determine which set of contradictory
findings in an instance of the paradox should be accepted -- the aggregated
data or the data disaggregated by conditioning on the third variable.
Pearl used the example of a hypothetical quasi-experiment to argue that
when third variables are not causal, one should not condition on them,
and -- assuming no other sources of confounding -- the aggregated data should be
accepted. Pearl was precipitate in his argument that it would be
inappropriate to condition on the noncausal third variables in the
example. Whether causal or not, third variables can convey critical
information about a first-order relationship, study design, and previously
unobserved variables. Any conditioning on a nontrivial third variable that
produces Simpson's Paradox should be carefully examined before either the
aggregated or the disaggregated findings are accepted, regardless of
whether the third variable is thought to be causal. In some cases, neither
set of data is trustworthy; in others, both convey information of value.
Pearl's hypothetical example is used to illustrate this argument.
Journal: The American Statistician
Pages: 1-7
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.807750
File-URL: http://hdl.handle.net/10.1080/00031305.2013.807750
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:1-7
Template-Type: ReDIF-Article 1.0
Author-Name: Judea Pearl
Author-X-Name-First: Judea
Author-X-Name-Last: Pearl
Title: Comment: Understanding Simpson's Paradox
Journal: The American Statistician
Pages: 8-13
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.876829
File-URL: http://hdl.handle.net/10.1080/00031305.2014.876829
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:8-13
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment
Abstract:
I discuss predicting outcomes and the
roles of causation and sampling design.
Journal: The American Statistician
Pages: 13-17
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.876832
File-URL: http://hdl.handle.net/10.1080/00031305.2014.876832
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:13-17
Template-Type: ReDIF-Article 1.0
Author-Name: Keli Liu
Author-X-Name-First: Keli
Author-X-Name-Last: Liu
Author-Name: Xiao-Li Meng
Author-X-Name-First: Xiao-Li
Author-X-Name-Last: Meng
Title: Comment: A Fruitful Resolution to Simpson's Paradox via Multiresolution Inference
Abstract:
Simpson's Paradox is really a Simple
Paradox, if a paradox at all. Peeling away the paradox is as easy (or hard) as
avoiding a comparison of apples and oranges, a concept requiring no
mention of causality. We show how the commonly adopted notation has
committed the gross-ery mistake of tagging unlike fruit with alike labels.
Hence, the "fruitful" question to ask is not "Do we condition on the third
variable?" but rather "Are two fruits, which appear similar, actually
similar at their core?" We introduce the concept of
intrinsic similarity to escape this bind. The notion of
"core" depends on how deep one looks-the multi resolution inference
framework provides a natural way to define intrinsic similarity at the
resolution appropriate for the treatment. To harvest the fruits of this
insight, we will need to estimate intrinsic similarity, which often
results in an indirect conditioning on the "third variable." A ripening
estimation theory shows that the standard treatment comparisons,
unconditional or conditional on the third variable, are low-hanging fruit
but often rotten. We pose assumptions to pluck away higher-resolution
(more conditional) comparisons; the multiresolution framework allows us to
rigorously assess the price of these assumptions against the resulting
yield. One such assessment gives us Simpson's Warning: less
conditioning is most likely to lead to serious bias when Simpson's Paradox
appears.
Journal: The American Statistician
Pages: 17-29
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.876842
File-URL: http://hdl.handle.net/10.1080/00031305.2014.876842
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:17-29
Template-Type: ReDIF-Article 1.0
Author-Name: Timothy Armistead
Author-X-Name-First: Timothy
Author-X-Name-Last: Armistead
Title: Rejoinder
Journal: The American Statistician
Pages: 30-31
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.879772
File-URL: http://hdl.handle.net/10.1080/00031305.2014.879772
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:30-31
Template-Type: ReDIF-Article 1.0
Author-Name: Peng Ding
Author-X-Name-First: Peng
Author-X-Name-Last: Ding
Title: Three Occurrences of the Hyperbolic-Secant Distribution
Abstract:
Although it is the generator distribution
of the sixth natural exponential family with quadratic variance function,
the Hyperbolic-Secant (HS) distribution is much less known than other
distributions in the exponential families. Its lack of familiarity is due
to its isolation from many widely used statistical models. We fill in the
gap by showing three examples naturally generating the HS distribution,
including Fisher's analysis of similarity between twins, the Jeffreys
prior for contingency tables, and invalid instrumental variables.
Journal: The American Statistician
Pages: 32-35
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.867902
File-URL: http://hdl.handle.net/10.1080/00031305.2013.867902
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:32-35
Template-Type: ReDIF-Article 1.0
Author-Name: Edward J. Bedrick
Author-X-Name-First: Edward J.
Author-X-Name-Last: Bedrick
Title: Two Useful Reformulations of the Hazard Ratio
Abstract:
The hazard ratio is a standard summary for
comparing survival curves yet hazard ratios are often difficult for
scientists and clinicians to interpret. Insight into the interpretation of
hazard ratios is obtained by relating hazard ratios to the maximum
difference and an average difference between survival probabilities. These
reformulations of the hazard ratio are useful in classroom discussions of
survival analysis and when discussing analyses with scientists and
clinicians. Large-sample distribution theory is provided for these
reformulations of the hazard ratio. Two examples are used to illustrate
the ideas.
Journal: The American Statistician
Pages: 36-41
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.868827
File-URL: http://hdl.handle.net/10.1080/00031305.2013.868827
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:36-41
Template-Type: ReDIF-Article 1.0
Author-Name: Geoffrey Jones
Author-X-Name-First: Geoffrey
Author-X-Name-Last: Jones
Author-Name: Wesley O. Johnson
Author-X-Name-First: Wesley O.
Author-X-Name-Last: Johnson
Title: Prior Elicitation: Interactive Spreadsheet Graphics With Sliders Can Be Fun, and Informative
Abstract:
There are several approaches to setting
priors in Bayesian data analysis. Some attempt to minimize the impact of
the prior on the posterior, allowing the data to "speak for themselves,"
or to provide Bayesian inferences that have good frequentist properties.
In contrast, this note focuses on priors where scientific knowledge is
used, possibly partially informative. There are many articles on the use
of such subjective information. We focus on using standard software for
eliciting priors from subject-matter specialists, in the form of models
such as the binomial, Poisson, and normal. Our approach uses a common
spreadsheet package with the facility to display dynamic pictures of prior
distributions as the user toggles scroll bars or "sliders" that manipulate
parameters of particular distributions. This allows interactive
exploration of the shape of a probability distribution. We have found this
a useful tool when eliciting priors for Bayesian data analysis. We present
examples to illustrate the scope and flexibility of the method.
Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 42-51
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.868828
File-URL: http://hdl.handle.net/10.1080/00031305.2013.868828
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:42-51
Template-Type: ReDIF-Article 1.0
Author-Name: Lynn Roy LaMotte
Author-X-Name-First: Lynn Roy
Author-X-Name-Last: LaMotte
Title: The Gram-Schmidt Construction as a Basis for Linear Models
Abstract:
The Gram-Schmidt construction, with a
little extension, can be used to establish results in linear algebra,
multiple regression analysis, and the theory of linear models. This
article describes and illustrates how it serves to develop the basic
results required for statistical inference in the Gauss-Markov model. For
upper-level theory courses, the method's advantage is that it requires
less background and fewer results in linear algebra than are usually
required. For applications-oriented courses, it makes it possible to
describe relations and computations simply and explicitly.
Journal: The American Statistician
Pages: 52-55
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.875485
File-URL: http://hdl.handle.net/10.1080/00031305.2013.875485
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:52-55
Template-Type: ReDIF-Article 1.0
Author-Name: A. J. Hayter
Author-X-Name-First: A. J.
Author-X-Name-Last: Hayter
Title: Simultaneous Confidence Intervals for Several Quantiles of an Unknown Distribution
Abstract:
Given a sample of independent observations
from an unknown continuous distribution, it is standard practice to
construct a confidence interval for a specified quantile of the
distribution using the binomial distribution. Furthermore, confidence
bands for the unknown cumulative distribution function, such as
Kolmogorov's, provide simultaneous confidence intervals for all quantiles
of the distribution, which are necessarily wider than the individual
confidence intervals at the same confidence level. The purpose of this
article is to show how simultaneous confidence intervals for several
specified quantiles of the unknown distribution can be calculated using
probabilities from a multinomial distribution. An efficient recursive
algorithm is described for these calculations. An experimenter may
typically be interested in several quantiles of the distribution, such as
the median, quartiles, and upper and lower tail quantiles, and this
methodology provides a bridge between the confidence intervals with
individual confidence levels and those that can be obtained from
confidence bands. Some examples of the implementation of this
nonparametric methodology are provided, and some comparisons are made with
some parametric approaches to the problem.
Journal: The American Statistician
Pages: 56-62
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.869259
File-URL: http://hdl.handle.net/10.1080/00031305.2013.869259
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:56-62
Template-Type: ReDIF-Article 1.0
Author-Name: Nitis Mukhopadhyay
Author-X-Name-First: Nitis
Author-X-Name-Last: Mukhopadhyay
Title: Letter to the Editor: Griffith, Daniel A. (2013), "Better Articulating Normal Curve for Introductory Mathematical Statistics Students: Power Transformations," The American Statistician, 67, 157-169
Journal: The American Statistician
Pages: 67-67
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2013.867903
File-URL: http://hdl.handle.net/10.1080/00031305.2013.867903
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:67-67
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel A. Griffith
Author-X-Name-First: Daniel A.
Author-X-Name-Last: Griffith
Title: Reply
Journal: The American Statistician
Pages: 67-69
Issue: 1
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.890005
File-URL: http://hdl.handle.net/10.1080/00031305.2014.890005
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:1:p:67-69
Template-Type: ReDIF-Article 1.0
Author-Name: David A. Harville
Author-X-Name-First: David A.
Author-X-Name-Last: Harville
Title: The Need for More Emphasis on Prediction: A "Nondenominational" Model-Based Approach
Abstract:
Prediction problems are ubiquitous. In a
model-based approach to predictive inference, the values of random
variables that are presently observable are used to make inferences about
the values of random variables that will become observable in the future,
and the joint distribution of the random variables or various of its
characteristics are assumed to be known up to the value of a vector of
unknown parameters. Such an approach has proved to be highly effective in
many important applications. This article argues that the performance of a
prediction procedure in repeated application is important and should play
a significant role in its evaluation. A "nondenominational" model-based
approach to predictive inference is described and discussed; what in a
Bayesian approach would be regarded as a prior distribution is simply
regarded as part of a model that is hierarchical in nature. Some specifics
are given for mixed-effects linear models, and an application to the
prediction of the outcomes of basketball or football games (and to the
ranking and rating of basketball or football teams) is included for
purposes of illustration.
Journal: The American Statistician
Pages: 71-83
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2013.836987
File-URL: http://hdl.handle.net/10.1080/00031305.2013.836987
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:71-83
Template-Type: ReDIF-Article 1.0
Author-Name: Hal Stern
Author-X-Name-First: Hal
Author-X-Name-Last: Stern
Title: Comment
Journal: The American Statistician
Pages: 83-84
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.897257
File-URL: http://hdl.handle.net/10.1080/00031305.2014.897257
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:83-84
Template-Type: ReDIF-Article 1.0
Author-Name: Dale L. Zimmerman
Author-X-Name-First: Dale L.
Author-X-Name-Last: Zimmerman
Title: Comment
Journal: The American Statistician
Pages: 85-86
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.898973
File-URL: http://hdl.handle.net/10.1080/00031305.2014.898973
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:85-86
Template-Type: ReDIF-Article 1.0
Author-Name: Robert McCulloch
Author-X-Name-First: Robert
Author-X-Name-Last: McCulloch
Title: Comment
Journal: The American Statistician
Pages: 87-88
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.904174
File-URL: http://hdl.handle.net/10.1080/00031305.2014.904174
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:87-88
Template-Type: ReDIF-Article 1.0
Author-Name: Donald A. Berry
Author-X-Name-First: Donald A.
Author-X-Name-Last: Berry
Author-Name: Scott M. Berry
Author-X-Name-First: Scott M.
Author-X-Name-Last: Berry
Title: Comment
Journal: The American Statistician
Pages: 88-89
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.911546
File-URL: http://hdl.handle.net/10.1080/00031305.2014.911546
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:88-89
Template-Type: ReDIF-Article 1.0
Author-Name: David A. Harville
Author-X-Name-First: David A.
Author-X-Name-Last: Harville
Title: Rejoinder
Journal: The American Statistician
Pages: 89-92
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.904640
File-URL: http://hdl.handle.net/10.1080/00031305.2014.904640
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:89-92
Template-Type: ReDIF-Article 1.0
Author-Name: Woojoo Lee
Author-X-Name-First: Woojoo
Author-X-Name-Last: Lee
Author-Name: Yudi Pawitan
Author-X-Name-First: Yudi
Author-X-Name-Last: Pawitan
Title: Direct Calculation of the Variance of Maximum Penalized Likelihood Estimates via EM Algorithm
Abstract:
The variance of the maximum penalized
likelihood estimate obtained through the EM algorithm has not been
explored in detail. We provide a simple and intuitive new representation
for the variance that can be computed from the EM algorithm directly. For
pedagogical purposes, we illustrate the new formula with two examples
where analytical solutions are possible.
Journal: The American Statistician
Pages: 93-97
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.899273
File-URL: http://hdl.handle.net/10.1080/00031305.2014.899273
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:93-97
Template-Type: ReDIF-Article 1.0
Author-Name: Reto Bürgin
Author-X-Name-First: Reto
Author-X-Name-Last: Bürgin
Author-Name: Gilbert Ritschard
Author-X-Name-First: Gilbert
Author-X-Name-Last: Ritschard
Title: A Decorated Parallel Coordinate Plot for Categorical Longitudinal Data
Abstract:
This article proposes a decorated parallel
coordinate plot for longitudinal categorical data, featuring a jitter
mechanism revealing the diversity of observed longitudinal patterns and
allowing the tracking of each individual pattern, variable point and line
widths reflecting weighted pattern frequencies, the rendering of
simultaneous events, and different filter options for highlighting typical
patterns. The proposed visual display has been developed for describing
and exploring the order of event occurrences, but it can be equally
applied to other types of longitudinal categorical data. Alongside the
description of the principle of the plot, we demonstrate the scope of the
plot with a real dataset. A second application and R code for the plot are
available online as supplementary materials.
Journal: The American Statistician
Pages: 98-103
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.887591
File-URL: http://hdl.handle.net/10.1080/00031305.2014.887591
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:98-103
Template-Type: ReDIF-Article 1.0
Author-Name: Matthew W. Guerra
Author-X-Name-First: Matthew W.
Author-X-Name-Last: Guerra
Author-Name: Justine Shults
Author-X-Name-First: Justine
Author-X-Name-Last: Shults
Title: A Note on the Simulation of Overdispersed Random Variables With Specified Marginal Means and Product Correlations
Abstract:
We propose a straightforward approach for
simulation of discrete random variables with overdispersion, specified
marginal means, and product correlations. The method stems from results we
prove for variables with first-order antedependence and linearity of the
conditional expectations and is therefore appropriate to simulate
variables with these properties. Supplementary materials for this article
are available online.
Journal: The American Statistician
Pages: 104-107
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.887592
File-URL: http://hdl.handle.net/10.1080/00031305.2014.887592
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:104-107
Template-Type: ReDIF-Article 1.0
Author-Name: Ulrike Grömping
Author-X-Name-First: Ulrike
Author-X-Name-Last: Grömping
Title: Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs
Abstract:
Factorial experiments are widely used in
industrial experimentation and other fields. Whenever a factorial
experiment is not designed as a full factorial, but as a regular or
nonregular fraction thereof, choice between competing designs and
interpretation of experimental results should take into consideration how
the experimental plan will confound experimental effects. This article
proposes mosaic plots of low-order projections of factorial designs for
visualizing confounding of low-order effects. Mosaic plots are
particularly useful for design and analysis of orthogonal main effect
plans. The R code for the creation of the plots in this article is
available online in the supplementary material.
Journal: The American Statistician
Pages: 108-116
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.896829
File-URL: http://hdl.handle.net/10.1080/00031305.2014.896829
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:108-116
Template-Type: ReDIF-Article 1.0
Author-Name: Stuart Baker
Author-X-Name-First: Stuart
Author-X-Name-Last: Baker
Author-Name: Jian-Lun Xu
Author-X-Name-First: Jian-Lun
Author-X-Name-Last: Xu
Author-Name: Ping Hu
Author-X-Name-First: Ping
Author-X-Name-Last: Hu
Author-Name: Peng Huang
Author-X-Name-First: Peng
Author-X-Name-Last: Huang
Title: Vardeman, S. B. and Morris, M. D. (2013), "Majority Voting by Independent Classifiers Can Increase Error Rates," The American Statistician, 67, 94-96: Comment by Baker, Xu, Hu, and Huang and Reply
Journal: The American Statistician
Pages: 125-126
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.882867
File-URL: http://hdl.handle.net/10.1080/00031305.2014.882867
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:125-126
Template-Type: ReDIF-Article 1.0
Author-Name: Bart Holland
Author-X-Name-First: Bart
Author-X-Name-Last: Holland
Title: Segal, N. L., and Torres, J. (2013), "A Repeated Grammatical Error Does Not Make it Right," The American Statistician, 67, 266: Comment by Holland and Reply
Journal: The American Statistician
Pages: 127-127
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.887593
File-URL: http://hdl.handle.net/10.1080/00031305.2014.887593
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:127-127
Template-Type: ReDIF-Article 1.0
Author-Name: Nancy L. Segal
Author-X-Name-First: Nancy L.
Author-X-Name-Last: Segal
Author-Name: Jorge Luis Torres
Author-X-Name-First: Jorge Luis
Author-X-Name-Last: Torres
Title: Reply
Journal: The American Statistician
Pages: 127-128
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.890483
File-URL: http://hdl.handle.net/10.1080/00031305.2014.890483
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:127-128
Template-Type: ReDIF-Article 1.0
Author-Name: Yefim Haim Michlin
Author-X-Name-First: Yefim Haim
Author-X-Name-Last: Michlin
Author-Name: Ofer Shaham
Author-X-Name-First: Ofer
Author-X-Name-Last: Shaham
Title: Ignatova, I., Deutsch, R. C., and Edwards, D. (2012), "Closed Sequential and Multistage Inference on Binary Responses With or Without Replacement," The American Statistician, 66, 163-172: Comment by Michlin and Shaham and Reply
Journal: The American Statistician
Pages: 128-128
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.897256
File-URL: http://hdl.handle.net/10.1080/00031305.2014.897256
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:128-128
Template-Type: ReDIF-Article 1.0
Author-Name: Lina Ignatova
Author-X-Name-First: Lina
Author-X-Name-Last: Ignatova
Author-Name: Roland C. Deutsch
Author-X-Name-First: Roland C.
Author-X-Name-Last: Deutsch
Author-Name: Don Edwards
Author-X-Name-First: Don
Author-X-Name-Last: Edwards
Title: Reply
Journal: The American Statistician
Pages: 129-129
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.898969
File-URL: http://hdl.handle.net/10.1080/00031305.2014.898969
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:129-129
Template-Type: ReDIF-Article 1.0
Author-Name: Gul Inan
Author-X-Name-First: Gul
Author-X-Name-Last: Inan
Author-Name: Ozlem Ilk-Dag
Author-X-Name-First: Ozlem
Author-X-Name-Last: Ilk-Dag
Author-Name: Alexander de Leon
Author-X-Name-First: Alexander
Author-X-Name-Last: de Leon
Title: Kim, Y., Choi, Y.-K., and Emery, S. (2013), "Logistic Regression With Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages," The American Statistician, 67, 171-182
Journal: The American Statistician
Pages: 129-130
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.904251
File-URL: http://hdl.handle.net/10.1080/00031305.2014.904251
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:129-130
Template-Type: ReDIF-Article 1.0
Author-Name: Yoonsang Kim
Author-X-Name-First: Yoonsang
Author-X-Name-Last: Kim
Author-Name: Sherry Emery
Author-X-Name-First: Sherry
Author-X-Name-Last: Emery
Title: Reply
Journal: The American Statistician
Pages: 130-131
Issue: 2
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.904638
File-URL: http://hdl.handle.net/10.1080/00031305.2014.904638
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:2:p:130-131
Template-Type: ReDIF-Article 1.0
Author-Name: Mark F. Schilling
Author-X-Name-First: Mark F.
Author-X-Name-Last: Schilling
Author-Name: Jimmy A. Doi
Author-X-Name-First: Jimmy A.
Author-X-Name-Last: Doi
Title: A Coverage Probability Approach to Finding an Optimal Binomial Confidence Procedure
Abstract:
The problem of finding confidence intervals for the success parameter of a
binomial experiment has a long history, and a myriad of procedures have
been developed. Most exploit the duality between hypothesis testing and
confidence regions and are typically based on large sample approximations.
We instead employ a direct approach that attempts to determine the optimal
coverage probability function a binomial confidence procedure can have
from the exact underlying binomial distributions, which in turn
defines the associated procedure. We show that a
graphical perspective provides much insight into the problem. Both
procedures whose coverage never falls below the declared confidence level
and those that achieve that level only approximately are analyzed. We
introduce the Length/Coverage Optimal method, a variant of Sterne's
procedure that minimizes average length while maximizing coverage among
all length minimizing procedures, and show that it is superior in
important ways to existing procedures.
Journal: The American Statistician
Pages: 133-145
Issue: 3
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.899274
File-URL: http://hdl.handle.net/10.1080/00031305.2014.899274
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:133-145
Template-Type: ReDIF-Article 1.0
Author-Name: Weiwen Miao
Author-X-Name-First: Weiwen
Author-X-Name-Last: Miao
Author-Name: Joseph L. Gastwirth
Author-X-Name-First: Joseph L.
Author-X-Name-Last: Gastwirth
Title: New Statistical Tests for Detecting Disparate Impact Arising From Two-Stage Selection Processes
Abstract:
Statistical evidence of a significant difference between the performance
of a protected group and the majority on a preemployment exam is often
critical when a court decides whether the exam has a disparate impact,
that is, whether the exam has a disproportionate adverse impact on
minority candidates. In many cases, the hiring or promotion process
consists of two steps. Since disparate impact can occur at each step,
parties submitting evidence may use statistical tests at each stage
without accounting for a potential multiple comparisons problem. Because
different courts have focused on data concerning either one or the other
step or a composite of both, they have reached opposite conclusions when
faced with similar data. After illustrating the issues, two two-step tests
are recommended to alleviate the problem. The large sample properties of
these tests are obtained. A simulation study shows that in most
situations, the new tests have higher power than the ones in current use.
Journal: The American Statistician
Pages: 146-157
Issue: 3
Volume: 68
Year: 2014
Month: 4
X-DOI: 10.1080/00031305.2014.917054
File-URL: http://hdl.handle.net/10.1080/00031305.2014.917054
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:146-157
Template-Type: ReDIF-Article 1.0
Author-Name: Albert Vexler
Author-X-Name-First: Albert
Author-X-Name-Last: Vexler
Author-Name: Wan-Min Tsai
Author-X-Name-First: Wan-Min
Author-X-Name-Last: Tsai
Author-Name: Alan D. Hutson
Author-X-Name-First: Alan D.
Author-X-Name-Last: Hutson
Title: A Simple Density-Based Empirical Likelihood Ratio Test for Independence
Abstract:
We develop a novel nonparametric likelihood ratio test for independence
between two random variables using a technique that is free of the common
constraints of defining a given set of specific dependence structures. Our
methodology revolves around an exact density-based empirical likelihood
ratio test statistic that approximates in a distribution-free fashion the
corresponding most powerful parametric likelihood ratio test. We
demonstrate that the proposed test is very powerful in detecting general
structures of dependence between two random variables, including nonlinear
and/or random-effect dependence structures. An extensive Monte Carlo study
confirms that the proposed test is superior to the classical nonparametric
procedures across a variety of settings. The real-world applicability of
the proposed test is illustrated using data from a study of biomarkers
associated with myocardial infarction. Supplementary materials for this
article are available online.
Journal: The American Statistician
Pages: 158-169
Issue: 3
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.901922
File-URL: http://hdl.handle.net/10.1080/00031305.2014.901922
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:158-169
Template-Type: ReDIF-Article 1.0
Author-Name: Djilali Ait Aoudia
Author-X-Name-First: Djilali Ait
Author-X-Name-Last: Aoudia
Author-Name: Éric Marchand
Author-X-Name-First: Éric
Author-X-Name-Last: Marchand
Title: On a Simple Construction of a Bivariate Probability Function With a Common Marginal
Abstract:
We introduce a family of bivariate discrete distributions whose members
are generated by a decreasing mass function p, and with
margins given by p. Several properties and examples are
obtained, including a family of seemingly novel bivariate Poisson
distributions.
Journal: The American Statistician
Pages: 170-173
Issue: 3
Volume: 68
Year: 2014
Month: 2
X-DOI: 10.1080/00031305.2014.904250
File-URL: http://hdl.handle.net/10.1080/00031305.2014.904250
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:170-173
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher E. Marks
Author-X-Name-First: Christopher E.
Author-X-Name-Last: Marks
Author-Name: Andrew G. Glen
Author-X-Name-First: Andrew G.
Author-X-Name-Last: Glen
Author-Name: Matthew W. Robinson
Author-X-Name-First: Matthew W.
Author-X-Name-Last: Robinson
Author-Name: Lawrence M. Leemis
Author-X-Name-First: Lawrence M.
Author-X-Name-Last: Leemis
Title: Applying Bootstrap Methods to System Reliability
Abstract:
We present a fully enumerated bootstrap method to find the empirical
system lifetime distribution for a coherent system modeled by a
reliability block diagram. Given failure data for individual components of
a coherent system, the bootstrap empirical system lifetime distribution
derived here will be free of resampling error. We further derive
distribution-free expressions for the bias associated with the bootstrap
method for estimating the mean system lifetimes of parallel and series
systems with statistically identical components. We show that
bootstrapping underestimates the mean system lifetime for parallel systems
and overestimates the mean system lifetime for series systems, although
both bootstrap estimates are asymptotically unbiased. The expressions for
the bias are evaluated for several popular parametric lifetime
distributions. Supplementary materials for this article are available
online.
Journal: The American Statistician
Pages: 174-182
Issue: 3
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.928232
File-URL: http://hdl.handle.net/10.1080/00031305.2014.928232
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:174-182
Template-Type: ReDIF-Article 1.0
Author-Name: Kai Zhang
Author-X-Name-First: Kai
Author-X-Name-Last: Zhang
Author-Name: Lawrence D. Brown
Author-X-Name-First: Lawrence D.
Author-X-Name-Last: Brown
Author-Name: Edward George
Author-X-Name-First: Edward
Author-X-Name-Last: George
Author-Name: Linda Zhao
Author-X-Name-First: Linda
Author-X-Name-Last: Zhao
Title: Uniform Correlation Mixture of Bivariate Normal Distributions and Hypercubically Contoured Densities That Are Marginally Normal
Abstract:
The bivariate normal density with unit variance and correlation ρ is
well known. We show that by integrating out ρ, the result is a
function of the maximum norm. The Bayesian interpretation of this result
is that if we put a uniform prior over ρ, then the marginal bivariate
density depends only on the maximal magnitude of the variables. The
square-shaped isodensity contour of this resulting marginal bivariate
density can also be regarded as the equally weighted mixture of bivariate
normal distributions over all possible correlation coefficients. This
density links to the Khintchine mixture method of generating random
variables. We use this method to construct the higher dimensional
generalizations of this distribution. We further show that for each
dimension, there is a unique multivariate density that is a differentiable
function of the maximum norm and is marginally normal, and the bivariate
density from the integral over ρ is its special case in two
dimensions.
Journal: The American Statistician
Pages: 183-187
Issue: 3
Volume: 68
Year: 2014
Month: 3
X-DOI: 10.1080/00031305.2014.909741
File-URL: http://hdl.handle.net/10.1080/00031305.2014.909741
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:183-187
Template-Type: ReDIF-Article 1.0
Author-Name: Liang Hong
Author-X-Name-First: Liang
Author-X-Name-Last: Hong
Title: Two New Elementary Derivations of Geometric Expectation
Abstract:
This article presents two new elementary derivations of the expectation of
the geometric distribution. I also review six existing approaches. I hope
that this article will benefit instructors and students in an introductory
probability course.
Journal: The American Statistician
Pages: 188-190
Issue: 3
Volume: 68
Year: 2014
Month: 3
X-DOI: 10.1080/00031305.2014.915234
File-URL: http://hdl.handle.net/10.1080/00031305.2014.915234
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:188-190
Template-Type: ReDIF-Article 1.0
Author-Name: Peter H. Westfall
Author-X-Name-First: Peter H.
Author-X-Name-Last: Westfall
Title: Kurtosis as Peakedness, 1905-2014. R.I.P.
Abstract:
The incorrect notion that kurtosis somehow measures "peakedness"
(flatness, pointiness, or modality) of a distribution is remarkably
persistent, despite attempts by statisticians to set the record straight.
This article puts the notion to rest once and for all. Kurtosis tells you
virtually nothing about the shape of the peak; its only unambiguous
interpretation is in terms of tail extremity, that is, either existing
outliers (for the sample kurtosis) or propensity to produce outliers (for
the kurtosis of a probability distribution). To clarify this point,
relevant literature is reviewed, counterexample distributions are given,
and it is shown that the proportion of the kurtosis that is determined by
the central μ ± σ range
is usually quite small.
Journal: The American Statistician
Pages: 191-195
Issue: 3
Volume: 68
Year: 2014
Month: 4
X-DOI: 10.1080/00031305.2014.917055
File-URL: http://hdl.handle.net/10.1080/00031305.2014.917055
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:191-195
Template-Type: ReDIF-Article 1.0
Author-Name: Catherine Michalopoulou
Author-X-Name-First: Catherine
Author-X-Name-Last: Michalopoulou
Title: A Unique Collaboration: Prominent Statisticians' Survey Work in Greece in 1946
Abstract:
In 1946, Neyman, Jessen, Deming, Kempthorne, Daly, and Blythe conducted a
series of sample surveys as sampling experts of the two Allied Missions
that were set up to observe the preparation and conduct of the Greek
parliamentary elections (March 31) and the revision of electoral rolls for
the plebiscite (September 1). This article revisits these surveys, using
both published and unpublished sources, and discusses the lessons learned
from their history as they relate to current sampling practices.
Journal: The American Statistician
Pages: 196-203
Issue: 3
Volume: 68
Year: 2014
Month: 3
X-DOI: 10.1080/00031305.2014.920276
File-URL: http://hdl.handle.net/10.1080/00031305.2014.920276
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:196-203
Template-Type: ReDIF-Article 1.0
Author-Name: Robert A. Oster
Author-X-Name-First: Robert A.
Author-X-Name-Last: Oster
Title: Section Editor's Notes
Journal: The American Statistician
Pages: 204-204
Issue: 3
Volume: 68
Year: 2014
Month: 7
X-DOI: 10.1080/00031305.2014.928560
File-URL: http://hdl.handle.net/10.1080/00031305.2014.928560
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:204-204
Template-Type: ReDIF-Article 1.0
Author-Name: Sara Fontdecaba
Author-X-Name-First: Sara
Author-X-Name-Last: Fontdecaba
Author-Name: Pere Grima
Author-X-Name-First: Pere
Author-X-Name-Last: Grima
Author-Name: Xavier Tort-Martorell
Author-X-Name-First: Xavier
Author-X-Name-Last: Tort-Martorell
Title: Analyzing DOE With Statistical Software Packages: Controversies and Proposals
Abstract:
This article studies and evaluates how five well-known statistical
packages (JMP, Minitab, SigmaXL, Statgraphics, and Statistica) address the
problem of analyzing the significance of effects in unreplicated factorial
designs. All five use different methods and criteria that deliver
different results, even for simple textbook examples. The article shows
that some of the methods used are clearly incorrect and deliver incorrect
results. Finally, it raises the question of the impact that this may have
in hindering the use of design of experiments (DOE) by nonexpert
practitioners, and it provides suggestions for making this analysis more
effective and easier to understand. Supplementary materials for this
article are available online.
Journal: The American Statistician
Pages: 205-211
Issue: 3
Volume: 68
Year: 2014
Month: 5
X-DOI: 10.1080/00031305.2014.923784
File-URL: http://hdl.handle.net/10.1080/00031305.2014.923784
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:205-211
Template-Type: ReDIF-Article 1.0
Author-Name: Liang Hong
Author-X-Name-First: Liang
Author-X-Name-Last: Hong
Title: Letter to the Editor
Journal: The American Statistician
Pages: 220-220
Issue: 3
Volume: 68
Year: 2014
Month: 7
X-DOI: 10.1080/00031305.2014.908790
File-URL: http://hdl.handle.net/10.1080/00031305.2014.908790
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:220-220
Template-Type: ReDIF-Article 1.0
Author-Name: Changyong Feng
Author-X-Name-First: Changyong
Author-X-Name-Last: Feng
Author-Name: Hongyue Wang
Author-X-Name-First: Hongyue
Author-X-Name-Last: Wang
Author-Name: Yu Han
Author-X-Name-First: Yu
Author-X-Name-Last: Han
Author-Name: Yinglin Xia
Author-X-Name-First: Yinglin
Author-X-Name-Last: Xia
Author-Name: Xin M. Tu
Author-X-Name-First: Xin M.
Author-X-Name-Last: Tu
Title: Reply
Journal: The American Statistician
Pages: 220a-220a
Issue: 3
Volume: 68
Year: 2014
Month: 7
X-DOI: 10.1080/00031305.2014.916929
File-URL: http://hdl.handle.net/10.1080/00031305.2014.916929
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:3:p:220a-220a
Template-Type: ReDIF-Article 1.0
Author-Name: Robert F. Bordley
Author-X-Name-First: Robert F.
Author-X-Name-Last: Bordley
Title: Reference Class Forecasting: Resolving Its Challenge to Statistical Modeling
Abstract:
Statisticians generally consider statistical modeling superior (or at
least a useful supplement) to experience-based intuition for estimating
the outputs of a complex system. But recent psychological research has led
to an enhancement of experience-based intuition known as reference class
forecasting. The reference class forecasting approach has been championed
as a superior alternative to statistical modeling and is already
well-regarded in the planning community. This presents a challenge to
statistical modeling. To address this challenge, this article uses a
Bayesian approach for combining the reference class forecast and the
model-based forecast. The Bayesian prior is informed by the reference
class information. A likelihood function is constructed to reflect the
model's information. This approach is used to estimate healthcare costs
under a voluntary employee benefit association (VEBA). The resulting
Bayesian posterior forecast had lower variance (and lower forecast error)
than either the model-based forecast or the reference-class forecast.
Journal: The American Statistician
Pages: 221-229
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.937544
File-URL: http://hdl.handle.net/10.1080/00031305.2014.937544
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:221-229
Template-Type: ReDIF-Article 1.0
Author-Name: Jennifer L. Kirk
Author-X-Name-First: Jennifer L.
Author-X-Name-Last: Kirk
Author-Name: Michael P. Fay
Author-X-Name-First: Michael P.
Author-X-Name-Last: Fay
Title: An Introduction to Practical Sequential Inferences via Single-Arm Binary Response Studies Using the binseqtest R Package
Abstract:
We review sequential designs, including group sequential and two-stage
designs, for testing or estimating a single binary parameter. We use this
simple case to introduce ideas common to many sequential designs, which in
this case can be explained without explicitly using stochastic processes.
We focus on methods provided by our newly developed R package,
binseqtest, which exactly bound the Type I error
rate of tests and exactly maintain proper coverage of confidence
intervals. Within this framework, we review some allowable practical
adaptations of the sequential design. We explore issues such as the
following: How should the design be modified if no assessment was made at
one of the planned sequential stopping times? How should the parameter be
estimated if the study needs to be stopped early? What reasons for
stopping early are allowed? How should inferences be made when the study
is stopped for crossing the boundary, but later information is collected
about responses of subjects that had enrolled before the decision to stop
but had not responded by that time? Answers to these questions are
demonstrated using basic methods that are available in our
binseqtest R package. Supplementary materials for
this article are available online.
Journal: The American Statistician
Pages: 230-242
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.951126
File-URL: http://hdl.handle.net/10.1080/00031305.2014.951126
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:230-242
Template-Type: ReDIF-Article 1.0
Author-Name: Jun Yan
Author-X-Name-First: Jun
Author-X-Name-Last: Yan
Author-Name: Chao Guo
Author-X-Name-First: Chao
Author-X-Name-Last: Guo
Author-Name: Laurie E. Paarlberg
Author-X-Name-First: Laurie E.
Author-X-Name-Last: Paarlberg
Title: Are Nonprofit Antipoverty Organizations Located Where They Are Needed? A Spatial Analysis of the Greater Hartford Region
Abstract:
The geographic distribution of nonprofit antipoverty organizations has
important implications for economic development, social services, public
health, and policy efforts. With counts of antipoverty nonprofits at the
census tract level in Greater Hartford, Connecticut, we examine whether
these organizations are located in areas with high levels of poverty with
a spatial zero-inflated-Poisson model. Covariates that measure need,
resources, urban structure, and demographic characteristics are
incorporated into both the zero-inflation component and the Poisson
component of the model. Variation not explained by the covariates is
captured by the combination of a spatial random effect and an unstructured
random effect. Statistical inferences are done within the Bayesian
framework. Model comparison with the conditional predictive ordinate
suggests that the random effects and the zero-inflation are both important
components in fitting the data. All three need measures (proportion of
people below the poverty line, unemployment rate, and rental occupancy) are
found to have a significantly positive effect on the mean of the count,
providing evidence that antipoverty nonprofits tend to locate where they
are needed. The dataset and R/OpenBUGS code are available in supplementary
materials online.
Journal: The American Statistician
Pages: 243-252
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.955211
File-URL: http://hdl.handle.net/10.1080/00031305.2014.955211
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:243-252
Template-Type: ReDIF-Article 1.0
Author-Name: Fan Yang
Author-X-Name-First: Fan
Author-X-Name-Last: Yang
Author-Name: José R. Zubizarreta
Author-X-Name-First: José R.
Author-X-Name-Last: Zubizarreta
Author-Name: Dylan S. Small
Author-X-Name-First: Dylan S.
Author-X-Name-Last: Small
Author-Name: Scott Lorch
Author-X-Name-First: Scott
Author-X-Name-Last: Lorch
Author-Name: Paul R. Rosenbaum
Author-X-Name-First: Paul R.
Author-X-Name-Last: Rosenbaum
Title: Dissonant Conclusions When Testing the Validity of an Instrumental Variable
Abstract:
An instrument or instrumental variable is often used in an effort to avoid
selection bias in inference about the effects of treatments when treatment
choice is based on thoughtful deliberation. Instruments are increasingly
used in health outcomes research. An instrument is a haphazard push to
accept one treatment or another, where the push can affect outcomes only
to the extent that it alters the treatment received. There are two key
assumptions here: (R) the push is haphazard or essentially random once
adjustments have been made for observed covariates, (E) the push affects
outcomes only by altering the treatment, the so-called "exclusion
restriction." These assumptions are often said to be untestable; however,
that is untrue if testable means checking the compatibility of assumptions
with other things we think we know. A test of this sort may result in a
collection of claims that are individually plausible but mutually
inconsistent, without clear indication as to which claim is culpable for
the inconsistency. We discuss this subject in the context of our on-going
study of the effects of delivery by cesarean section on the survival of
extremely premature infants of 23-24 weeks gestational age. Supplementary
materials for this article are available online.
Journal: The American Statistician
Pages: 253-263
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.962764
File-URL: http://hdl.handle.net/10.1080/00031305.2014.962764
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:253-263
Template-Type: ReDIF-Article 1.0
Author-Name: Phillip E. Pfeifer
Author-X-Name-First: Phillip E.
Author-X-Name-Last: Pfeifer
Author-Name: Yael Grushka-Cockayne
Author-X-Name-First: Yael
Author-X-Name-Last: Grushka-Cockayne
Author-Name: Kenneth C. Lichtendahl
Author-X-Name-First: Kenneth C.
Author-X-Name-Last: Lichtendahl
Title: The Promise of Prediction Contests
Abstract:
This article examines the prediction contest as a vehicle for aggregating
the opinions of a crowd of experts. After proposing a general definition
distinguishing prediction contests from other mechanisms for harnessing
the wisdom of crowds, we focus on point-forecasting contests: contests in
which forecasters submit point forecasts with a prize going to the entry
closest to the quantity of interest. We first illustrate the incentive for
forecasters to submit reports that exaggerate in the direction of their
private information. Whereas this exaggeration raises a forecaster's mean
squared error, it increases his or her chances of winning the contest. And
in contrast to conventional wisdom, this nontruthful reporting usually
improves the accuracy of the resulting crowd forecast. The source of this
improvement is that exaggeration shifts weight away from public
information (information known to all forecasters) and by so doing helps
alleviate public knowledge bias. In the context of a simple theoretical
model of overlapping information and forecaster behaviors, we present
closed-form expressions for the mean squared error of the crowd forecasts,
which will help identify the situations in which point-forecasting
contests will be most useful.
Journal: The American Statistician
Pages: 264-270
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.937545
File-URL: http://hdl.handle.net/10.1080/00031305.2014.937545
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:264-270
Template-Type: ReDIF-Article 1.0
Author-Name: Thaddeus Tarpey
Author-X-Name-First: Thaddeus
Author-X-Name-Last: Tarpey
Author-Name: R. Todd Ogden
Author-X-Name-First: R. Todd
Author-X-Name-Last: Ogden
Author-Name: Eva Petkova
Author-X-Name-First: Eva
Author-X-Name-Last: Petkova
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: A Paradoxical Result in Estimating Regression Coefficients
Abstract:
This article presents a counterintuitive result regarding the estimation
of a regression slope coefficient. Paradoxically, the precision of the
slope estimator can deteriorate when additional information is used to
estimate its value. In a randomized experiment, the distribution of
baseline variables should be identical across treatments due to
randomization. The motivation for this article came from noting that the
precision of slope estimators deteriorated when pooling baseline
predictors across treatment groups.
Journal: The American Statistician
Pages: 271-276
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.940467
File-URL: http://hdl.handle.net/10.1080/00031305.2014.940467
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:271-276
Template-Type: ReDIF-Article 1.0
Author-Name: Shaoji Xu
Author-X-Name-First: Shaoji
Author-X-Name-Last: Xu
Title: A Property of Geometric Mean Regression
Abstract:
This article gives an overview of four classical regressions: regression
of Y on X, regression of
X on Y, orthogonal regression, and
geometric mean regression. It also compares two general parametric
families that unify all four regressions: Deming's parametric family and
Roos' parametric family. It is shown that Roos regression can be done by
minimizing the sum of squared α-distances, and as a special case,
geometric mean regression can be obtained by minimizing the sum of squared
adjusted distances between the sample points and an imaginary line.
Journal: The American Statistician
Pages: 277-281
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.962763
File-URL: http://hdl.handle.net/10.1080/00031305.2014.962763
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:277-281
Template-Type: ReDIF-Article 1.0
Author-Name: B. O'Neill
Author-X-Name-First: B.
Author-X-Name-Last: O'Neill
Title: Some Useful Moment Results in Sampling Problems
Abstract:
We consider the standard sampling problem involving a finite population of
N objects and a sample of n objects
taken from this population using simple random sampling without
replacement. We consider the relationship between the moments of the
sampled and unsampled parts and show how these are related to the
population moments. We derive expectation, variance, and covariance
results for the various quantities under consideration and use these to
obtain standard sampling results with an extension to variance estimation
with a "finite population correction." This clarifies and extends standard
results in sampling theory for the estimation of the mean and variance of
a population.
Journal: The American Statistician
Pages: 282-296
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.966589
File-URL: http://hdl.handle.net/10.1080/00031305.2014.966589
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:282-296
Template-Type: ReDIF-Article 1.0
Author-Name: A. B. Owen
Author-X-Name-First: A. B.
Author-X-Name-Last: Owen
Author-Name: P. A. Roediger
Author-X-Name-First: P. A.
Author-X-Name-Last: Roediger
Title: The Sign of the Logistic Regression Coefficient
Abstract:
Let Y be a binary random variable and X
a scalar. Let β̂ be the maximum likelihood estimate of the slope in a
logistic regression of Y on X with intercept. Further let x̄0 and x̄1 be
the averages of the sample x values for cases with y = 0 and
y = 1, respectively. Then under a condition that
rules out separable predictors, we show that
sign(β̂) = sign(x̄1 - x̄0). More generally,
if the x_i are vector valued, then we show that
β̂ = 0 if and only if
x̄1 = x̄0. This holds for
logistic regression and also for more general binary regressions with
inverse link functions satisfying a log-concavity condition. Finally, when
x̄1 ≠ x̄0, the angle
between β̂ and x̄1 - x̄0 is less than
90° in binary regressions satisfying the log-concavity condition and
the separation condition, when the design matrix has full rank.
Journal: The American Statistician
Pages: 297-301
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.951128
File-URL: http://hdl.handle.net/10.1080/00031305.2014.951128
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:297-301
Template-Type: ReDIF-Article 1.0
Author-Name: Vito M. R. Muggeo
Author-X-Name-First: Vito M. R.
Author-X-Name-Last: Muggeo
Author-Name: Gianfranco Lovison
Author-X-Name-First: Gianfranco
Author-X-Name-Last: Lovison
Title: The "Three Plus One" Likelihood-Based Test Statistics: Unified Geometrical and Graphical Interpretations
Abstract:
The presentations of the well-known likelihood ratio, Wald and score test
statistics in textbooks appear to lack a unified graphical and geometrical
interpretation. We present two simple graphical representations on a
common scale for these three test statistics, and also the recently
proposed gradient test statistic. These unified graphical displays may
favor better understanding of the geometrical meaning of the
likelihood-based statistics and provide useful insights into their
connections.
Journal: The American Statistician
Pages: 302-306
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.955212
File-URL: http://hdl.handle.net/10.1080/00031305.2014.955212
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:302-306
Template-Type: ReDIF-Article 1.0
Author-Name: Peng Ding
Author-X-Name-First: Peng
Author-X-Name-Last: Ding
Title: Tarpey, T., Ogden, R. T., Petkova, E., and Christensen R. (2014), "A Paradoxical Result in Estimating Regression Coefficients," The American Statistician, 68, 271-276 (this issue): Comment by Peng Ding
Journal: The American Statistician
Pages: 316-316
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.954733
File-URL: http://hdl.handle.net/10.1080/00031305.2014.954733
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:316-316
Template-Type: ReDIF-Article 1.0
Author-Name: Nitis Mukhopadhyay
Author-X-Name-First: Nitis
Author-X-Name-Last: Mukhopadhyay
Title: Warr, R. L. and Erich, R. A. (2013), "Should the Interquartile Range Divided by the Standard Deviation be Used to Assess Normality?," The American Statistician, 67, 242-244: Comment by Mukhopadhyay and Reply
Journal: The American Statistician
Pages: 316-317
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.962765
File-URL: http://hdl.handle.net/10.1080/00031305.2014.962765
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:316-317
Template-Type: ReDIF-Article 1.0
Author-Name: Richard L. Warr
Author-X-Name-First: Richard L.
Author-X-Name-Last: Warr
Author-Name: Roger A. Erich
Author-X-Name-First: Roger A.
Author-X-Name-Last: Erich
Title: Reply
Journal: The American Statistician
Pages: 317-317
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.970867
File-URL: http://hdl.handle.net/10.1080/00031305.2014.970867
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:317-317
Template-Type: ReDIF-Article 1.0
Author-Name: Sivan Aldor-Noiman
Author-X-Name-First: Sivan
Author-X-Name-Last: Aldor-Noiman
Author-Name: Lawrence D. Brown
Author-X-Name-First: Lawrence D.
Author-X-Name-Last: Brown
Author-Name: Andreas Buja
Author-X-Name-First: Andreas
Author-X-Name-Last: Buja
Author-Name: Wolfgang Rolke
Author-X-Name-First: Wolfgang
Author-X-Name-Last: Rolke
Author-Name: Robert A. Stine
Author-X-Name-First: Robert A.
Author-X-Name-Last: Stine
Title: Aldor-Noiman, S., Brown, L.D., Buja, A., Rolke, W., and Stine, R.A. (2013), "The Power to See: A New Graphical Test of Normality," The American Statistician, 67, 249-260
Journal: The American Statistician
Pages: 318-318
Issue: 4
Volume: 68
Year: 2014
Month: 11
X-DOI: 10.1080/00031305.2014.970871
File-URL: http://hdl.handle.net/10.1080/00031305.2014.970871
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:68:y:2014:i:4:p:318-318
Template-Type: ReDIF-Article 1.0
Author-Name: Gregory P. Samsa
Author-X-Name-First: Gregory P.
Author-X-Name-Last: Samsa
Title: Has It Really Been Demonstrated That Most Genomic Research Findings Are False?
Abstract:
In a widely cited article, Ioannidis argued that most published research
findings are false, particularly in discovery research involving massive
testing, with genomics as a typical example. However, his argument ignores
adjustment for multiple testing and thus should be taken with a large
grain of salt. This is a potential example for statistics courses that
concentrate on problem formulation.
Journal: The American Statistician
Pages: 1-4
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.951127
File-URL: http://hdl.handle.net/10.1080/00031305.2014.951127
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:1-4
Template-Type: ReDIF-Article 1.0
Author-Name: Joel E. Cohen
Author-X-Name-First: Joel E.
Author-X-Name-Last: Cohen
Title: Markov's Inequality and Chebyshev's Inequality for Tail Probabilities: A Sharper Image
Abstract:
Markov's inequality gives an upper bound on the probability that a
nonnegative random variable takes large values. For example, if the random
variable is the lifetime of a person or a machine, Markov's inequality
says that the probability that an individual survives more than three
times the average lifetime in the population of such individuals cannot
exceed one-third. Here we give a simple, intuitive geometric
interpretation and derivation of Markov's inequality. These results lead
to inequalities sharper than Markov's when information about conditional
expectations is available, as in reliability theory, demography, and
actuarial mathematics. We use these results to sharpen Chebyshev's tail
inequality also.
Journal: The American Statistician
Pages: 5-7
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.975842
File-URL: http://hdl.handle.net/10.1080/00031305.2014.975842
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:5-7
Template-Type: ReDIF-Article 1.0
Author-Name: Liang Hong
Author-X-Name-First: Liang
Author-X-Name-Last: Hong
Title: The Absolute Difference Law For Expectations
Abstract:
We revisit the addition law for expectations and present a sibling law:
the absolute difference law for expectations. We show that these two laws
and their corresponding laws for probabilities can be reconciled under a
single framework. As an application, we use the absolute difference law to
calculate the mean absolute deviation. Finally, we remark on a hidden
point in a related article previously published on these pages; this will
help readers to avoid a potential pitfall.
Journal: The American Statistician
Pages: 8-10
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.994712
File-URL: http://hdl.handle.net/10.1080/00031305.2014.994712
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:8-10
Template-Type: ReDIF-Article 1.0
Author-Name: Lisa M. Lee
Author-X-Name-First: Lisa M.
Author-X-Name-Last: Lee
Author-Name: Frances A. McCarty
Author-X-Name-First: Frances A.
Author-X-Name-Last: McCarty
Author-Name: Tenny R. Zhang
Author-X-Name-First: Tenny R.
Author-X-Name-Last: Zhang
Title: Ethical Numbers: Ethics Training in U.S. Graduate Statistics Programs, 2013-2014
Abstract:
As important members of research teams, statisticians bear an ethical
responsibility to analyze, interpret, and report data honestly and
objectively. One way of reinforcing ethical responsibilities is through
required courses covering a variety of ethics-related topics at the
graduate level. We assessed ethics requirements for graduate-level
statistics training programs in the United States for the 2013-2014
academic year using the websites of 88 universities, examining 103
biostatistics programs and 136 statistics degree programs. We categorized
programs' ethics training requirements as required or not required.
Thirty-one (35.1%) universities required an ethics course for at least
some degree students. Sixty-two (25.5%) degree programs required an ethics
course for at least some students. The majority (77.4%) of required
courses were worth 0 or 1 credit. Of the 177 programs without an ethics
requirement, 19 (10.7%) listed an ethics elective. Although a single
ethics course is insufficient for instilling an ethical approach to
science, degree programs that model expectations through coursework point
to the value of ethics in science. More training programs should prepare
statisticians to consider the ethical dimensions of their work through
required coursework. Supplementary materials for this article are
available online.
Journal: The American Statistician
Pages: 11-16
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.997891
File-URL: http://hdl.handle.net/10.1080/00031305.2014.997891
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:11-16
Template-Type: ReDIF-Article 1.0
Author-Name: Valeria Espinosa
Author-X-Name-First: Valeria
Author-X-Name-Last: Espinosa
Author-Name: Donald B. Rubin
Author-X-Name-First: Donald B.
Author-X-Name-Last: Rubin
Title: Did the Military Interventions in the Mexican Drug War Increase Violence?
Abstract:
We analyze publicly available data to estimate the causal effects of
military interventions on the homicide rates in certain problematic
regions in Mexico. We use the Rubin causal model to compare the
post-intervention homicide rate in each intervened region to the
hypothetical homicide rate for that same year had the military
intervention not taken place. Because the effect of a military
intervention is not confined to the municipality subject to the
intervention, a nonstandard definition of units is necessary to estimate
the causal effect of the intervention under the standard no-interference
assumption, the stable unit treatment value assumption (SUTVA). Donor pools
are created for each missing potential outcome under no intervention,
thereby allowing for the estimation of unit-level causal effects. A
multiple imputation approach accounts for uncertainty about the missing
potential outcomes.
Journal: The American Statistician
Pages: 17-27
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.965796
File-URL: http://hdl.handle.net/10.1080/00031305.2014.965796
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:17-27
Template-Type: ReDIF-Article 1.0
Author-Name: Wei Wang
Author-X-Name-First: Wei
Author-X-Name-Last: Wang
Author-Name: Dylan S. Small
Author-X-Name-First: Dylan S.
Author-X-Name-Last: Small
Title: Monotone B-Spline Smoothing for a Generalized Linear Model Response
Abstract:
Various methods have been proposed for smoothing under the monotonicity
constraint. We review the literature and implement an approach of monotone
smoothing with B-splines for a generalized linear model response. The
approach is expressed as a quadratic programming problem and is easily
solved using the statistical software R. In a simulation study, we find
that the approach performs better than other approaches with much faster
computation time. The approach can also be used for smoothing under other
shape constraints or mixed constraints. Supplementary materials, including
the appendices and R code to implement the developed approach, are available
online.
Journal: The American Statistician
Pages: 28-33
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.969445
File-URL: http://hdl.handle.net/10.1080/00031305.2014.969445
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:28-33
Template-Type: ReDIF-Article 1.0
Author-Name: Guangxiang Zhang
Author-X-Name-First: Guangxiang
Author-X-Name-Last: Zhang
Author-Name: John J. Chen
Author-X-Name-First: John J.
Author-X-Name-Last: Chen
Title: Biostatistics Faculty and NIH Awards at U.S. Medical Schools
Abstract:
Statistical principles and methods are critical to the success of
biomedical and translational research. However, it is difficult to track
and evaluate the monetary value of a biostatistician to a school of
medicine (SoM). Limited published data on this topic are available, especially
comparing across SoMs. Using National Institutes of Health (NIH) awards
and American Association of Medical Colleges (AAMC) faculty counts data
(2010-2013), together with online information on biostatistics faculty
from 119 institutions across the country, we demonstrated that the number
of biostatistics faculty was significantly positively associated with the
amount of NIH awards, both as a school total and on a per faculty basis,
across various sizes of U.S. SoMs. Biostatisticians, as a profession,
should be proactive in communicating and advocating the value of their
work and their unique contribution to the long-term success of a
biomedical research enterprise. Supplementary materials for this article
are available online.
Journal: The American Statistician
Pages: 34-40
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.992959
File-URL: http://hdl.handle.net/10.1080/00031305.2014.992959
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:34-40
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen Portnoy
Author-X-Name-First: Stephen
Author-X-Name-Last: Portnoy
Title: Maximizing Probability Bounds Under Moment-Matching Restrictions
Abstract:
The problem of characterizing a distribution by its moments dates to work
by Chebyshev in the mid-nineteenth century. There are clear (and close)
connections with characteristic functions, moment spaces, quadrature, and
other very classical mathematical pursuits. Lindsay and Basak posed the
specific question of how far from normality a distribution could be if it
matches k normal moments. They provided a bound on the
maximal difference in cdfs, and implied that these bounds were attained.
It will be shown here that in fact the bound is not attained if the number
of even moments matched is odd. An explicit solution is developed as a
symmetric distribution with a finite number of mass points when the number
of even moments matched is even, and this bound for the even case is shown
to hold as an explicit limit for the subsequent odd case. As Lindsay
noted, the discrepancies can be sizable even for a moderate number of
matched moments. Some comments on implications are proffered.
Journal: The American Statistician
Pages: 41-44
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.992960
File-URL: http://hdl.handle.net/10.1080/00031305.2014.992960
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:41-44
Template-Type: ReDIF-Article 1.0
Author-Name: Yaakov Malinovsky
Author-X-Name-First: Yaakov
Author-X-Name-Last: Malinovsky
Author-Name: Paul S. Albert
Author-X-Name-First: Paul S.
Author-X-Name-Last: Albert
Title: A Note on the Minimax Solution for the Two-Stage Group Testing Problem
Abstract:
Group testing is an active area of current research and has important
applications in medicine, biotechnology, genetics, and product testing.
There have been recent advances in design and estimation, but the simple
Dorfman procedure introduced by R. Dorfman in 1943 is widely used in
practice. In many practical situations, the exact value of the probability
p of being affected is unknown. We present both minimax
and Bayesian solutions for the group size problem when p
is unknown. For unbounded p, we show that the minimax
solution for group size is 8, while using a Bayesian strategy with
Jeffreys' prior results in a group size of 13. We also present solutions
when p is bounded from above. For the practitioner, we
propose strong justification for using a group size of between 8 and 13
when a constraint on p is not incorporated and provide
usable code for computing the minimax group size under a constrained
p.
Journal: The American Statistician
Pages: 45-52
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.983545
File-URL: http://hdl.handle.net/10.1080/00031305.2014.983545
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:45-52
Template-Type: ReDIF-Article 1.0
Author-Name: Spyros Missiakoulis
Author-X-Name-First: Spyros
Author-X-Name-Last: Missiakoulis
Title: Letter to the Editor
Journal: The American Statistician
Pages: 62-62
Issue: 1
Volume: 69
Year: 2015
Month: 2
X-DOI: 10.1080/00031305.2014.984816
File-URL: http://hdl.handle.net/10.1080/00031305.2014.984816
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:1:p:62-62
Template-Type: ReDIF-Article 1.0
Author-Name: Christy Chuang-Stein
Author-X-Name-First: Christy
Author-X-Name-Last: Chuang-Stein
Author-Name: Narayanaswamy Balakrishnan
Author-X-Name-First: Narayanaswamy
Author-X-Name-Last: Balakrishnan
Author-Name: Marcus Berzofsky
Author-X-Name-First: Marcus
Author-X-Name-Last: Berzofsky
Author-Name: Amy Herring
Author-X-Name-First: Amy
Author-X-Name-Last: Herring
Author-Name: Fred Hulting
Author-X-Name-First: Fred
Author-X-Name-Last: Hulting
Author-Name: John McKenzie
Author-X-Name-First: John
Author-X-Name-Last: McKenzie
Author-Name: Dionne Price
Author-X-Name-First: Dionne
Author-X-Name-Last: Price
Author-Name: Stephen Stigler
Author-X-Name-First: Stephen
Author-X-Name-Last: Stigler
Author-Name: George Williams
Author-X-Name-First: George
Author-X-Name-Last: Williams
Author-Name: Ronald Wasserstein
Author-X-Name-First: Ronald
Author-X-Name-Last: Wasserstein
Title: Celebrating the 175th Anniversary of ASA
Journal: The American Statistician
Pages: 64-67
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1028765
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1028765
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:64-67
Template-Type: ReDIF-Article 1.0
Author-Name: Robert L. Mason
Author-X-Name-First: Robert L.
Author-X-Name-Last: Mason
Author-Name: John D. McKenzie
Author-X-Name-First: John D.
Author-X-Name-Last: McKenzie
Title: A Brief History of the American Statistical Association, 1990-2014
Abstract:
The objective of this article is to present a brief chronological record
of the American Statistical Association (ASA) from its modest beginnings
in Boston in 1839 to its present status as a worldwide professional
organization with approximately 19,000 members and a headquarters in
Alexandria, Virginia. Emphasis is placed on accomplishments over the past
25 years of the ASA from the end of its Sesquicentennial Celebration
in 1989 to the end of its 175th Anniversary Celebration in 2014. Its
continued growth during this period has been achieved through the work of
outstanding leaders, sections, chapters, and committees. This article
briefly summarizes its achievements in organizational efficiency,
membership services, innovative meetings, and publications. It also
describes its work in structural change, education, public relations, and
science policy. It ends with a positive look to the future.
Journal: The American Statistician
Pages: 68-78
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1033984
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033984
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:68-78
Template-Type: ReDIF-Article 1.0
Author-Name: James J. Cochran
Author-X-Name-First: James J.
Author-X-Name-Last: Cochran
Title: ASA Presidents and Executive Directors Look Back on their Terms in Office
Journal: The American Statistician
Pages: 79-85
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1033988
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033988
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:79-85
Template-Type: ReDIF-Article 1.0
Author-Name: Jon R. Kettenring
Author-X-Name-First: Jon R.
Author-X-Name-Last: Kettenring
Author-Name: Kenneth J. Koehler
Author-X-Name-First: Kenneth J.
Author-X-Name-Last: Koehler
Author-Name: John D. McKenzie Jr.
Author-X-Name-First: John D.
Author-X-Name-Last: McKenzie Jr.
Title: Challenges and Opportunities for Statistics in the Next 25 Years
Abstract:
Beginning with the 75th Anniversary of the American Statistical
Association in 1914 and for subsequent 25-year celebrations, distinguished
members of the association have addressed the future of statistics. A
four-person panel engaged in the same exercise during the 2014 Joint
Statistical Meetings for the ASA's dodransbicentennial. The panel
identified a variety of strengths, weaknesses, opportunities, and threats
for the profession in the next quarter of a century. This article
highlights some of the discussion that took place.
Journal: The American Statistician
Pages: 86-90
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1033987
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033987
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:86-90
Template-Type: ReDIF-Article 1.0
Author-Name: Robert N. Rodriguez
Author-X-Name-First: Robert N.
Author-X-Name-Last: Rodriguez
Title: Who Will Celebrate Our 200th Anniversary? Growing the Next Generation of ASA Members
Abstract:
During the next 25 years, the growth and vitality of the American
Statistical Association will depend on how well we attract and serve
members in emerging areas of practice such as data science, where
statistics as a skill set is in high demand but statistics as a profession
has low recognition. Successful adaptation to the era of Big Data requires
that we broaden our understanding of statistical practice to include the
work of all those who learn from data. In order to grow the next
generation of members, we must also retain a much higher proportion of
today's student members, many of whom leave the ASA upon graduation. By
providing value that meets the needs of these groups and equips them to
flourish in their organizations, we can become the Big Tent for
Statistics.
Journal: The American Statistician
Pages: 91-95
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1028231
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1028231
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:91-95
Template-Type: ReDIF-Article 1.0
Author-Name: Ron Wasserstein
Author-X-Name-First: Ron
Author-X-Name-Last: Wasserstein
Title: Communicating the Power and Impact of Our Profession: A Heads Up for the Next Executive Directors of the ASA
Journal: The American Statistician
Pages: 96-99
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1031283
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1031283
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:96-99
Template-Type: ReDIF-Article 1.0
Author-Name: Jessica Utts
Author-X-Name-First: Jessica
Author-X-Name-Last: Utts
Title: The Many Facets of Statistics Education: 175 Years of Common Themes
Abstract:
The American Statistical Association's primary founder, Lemuel Shattuck,
was driven by a passion for collecting and disseminating accurate
information on vital statistics, public health, and other statistically
related concerns. The 175th anniversary provides an opportunity to reflect
on the education-related reasons ASA was founded and what it has done in
education since its founding, especially in the past 25 years since
the 150th anniversary. An examination of early and more recent issues of
the ASA's journals reveals some common themes that have recurred over the
past 175 years. We discuss what those themes are and what the ASA is
doing to address them currently, and then conclude by discussing what ASA
members can do to help.
Journal: The American Statistician
Pages: 100-107
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1033981
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033981
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:100-107
Template-Type: ReDIF-Article 1.0
Author-Name: David L. DeMets
Author-X-Name-First: David L.
Author-X-Name-Last: DeMets
Author-Name: Janet Turk Wittes
Author-X-Name-First: Janet Turk
Author-X-Name-Last: Wittes
Author-Name: Nancy L. Geller
Author-X-Name-First: Nancy L.
Author-X-Name-Last: Geller
Title: The Influence of Biostatistics at the National Heart, Lung, and Blood Institute
Abstract:
Since the early 1950s, the National Heart, Lung, and Blood Institute
(NHLBI) has conducted a long series of influential randomized clinical
trials in heart, lung, and blood diseases. The biostatisticians at the
Institute have been central to the design, conduct, monitoring, and final
analyses of these trials. The uniquely favorable deck of cards the group
of biostatisticians at the Institute has been dealt over the six and a half
decades of the group's life has led to contributions that have had a major
impact on the fields of biostatistics and clinical trials. The leaders of
the NHLBI and its several Divisions have valued the independence,
creativity, and collaborative interactions of statisticians within the
Institute. The medical problems the Institute faced impelled the
statisticians to develop methodology that would address questions of great
public importance. Perhaps most importantly, the individual members of the
group had a collective vision passed from member to member over time that
new methodology must fit the questions being asked. The group has always
had the technical ability to develop new methods and the conviction that
they were responsible for ensuring that they could explain their methods
to the clinicians with whom they worked.
Journal: The American Statistician
Pages: 108-120
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1035962
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1035962
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:108-120
Template-Type: ReDIF-Article 1.0
Author-Name: Allan J. Rossman
Author-X-Name-First: Allan J.
Author-X-Name-Last: Rossman
Author-Name: Roy St. Laurent
Author-X-Name-First: Roy St.
Author-X-Name-Last: Laurent
Author-Name: Josh Tabor
Author-X-Name-First: Josh
Author-X-Name-Last: Tabor
Title: Advanced Placement Statistics: Expanding the Scope of Statistics Education
Abstract:
A list of consequential developments in the field of statistics for the
past quarter-century must include the creation and implementation of the
Advanced Placement (AP) program in Statistics. This program has introduced
millions of high school students to our discipline over the past
18 years, contributing to the large increase in the number of
undergraduate students pursuing statistics as their major in college. ASA
members and leaders have played a substantial role in shaping this program
and furthering its success.
Journal: The American Statistician
Pages: 121-126
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1033985
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033985
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:121-126
Template-Type: ReDIF-Article 1.0
Author-Name: Eric A. Vance
Author-X-Name-First: Eric A.
Author-X-Name-Last: Vance
Title: Recent Developments and Their Implications for the Future of Academic Statistical Consulting Centers
Abstract:
I describe how developments over the past 25 years in computing,
funding, personnel, purpose, and training have affected academic
statistical consulting centers and discuss how these developments and
trends point to a range of potential futures. At one extreme, academic
statistical consulting centers fail to adapt to competition from other
disciplines in an increasingly fragmented market for statistical
consulting and spiral downward toward irrelevancy and extinction. At the
other extreme, purpose-driven academic statistical consulting centers
constantly increase their impact in a virtuous cycle, leading the way
toward the profession of statistics having greater positive impact on
society. I conclude with actions to take to assure a robust future and
increased impact for academic statistical consulting centers.
Journal: The American Statistician
Pages: 127-137
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1033990
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1033990
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:127-137
Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas J. Horton
Author-X-Name-First: Nicholas J.
Author-X-Name-Last: Horton
Title: Challenges and Opportunities for Statistics and Statistical Education: Looking Back, Looking Forward
Abstract:
The 175th anniversary of the ASA provides an opportunity to look back into
the past and peer into the future. What led our forebears to found the
association? What commonalities do we still see? What insights might we
glean from their experiences and observations? I will use the anniversary
as a chance to reflect on where we are now and where we are headed in
terms of statistical education amidst the growth of data science.
Statistics is the science of learning from data. By fostering more
multivariable thinking, building data-related skills, and developing
simulation-based problem solving, we can help to ensure that statisticians
are fully engaged in data science and the analysis of the abundance of
data now available to us.
Journal: The American Statistician
Pages: 138-145
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1032435
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1032435
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:138-145
Template-Type: ReDIF-Article 1.0
Author-Name: Saralees Nadarajah
Author-X-Name-First: Saralees
Author-X-Name-Last: Nadarajah
Title: On the Computation of Gauss Hypergeometric Functions
Abstract:
The pioneering study undertaken by Liang et al. in 2008
(Journal of the American Statistical Association, 103,
410-423) and the hundreds of papers citing that work make use of certain
hypergeometric functions. Liang et al. and many others claim that the
computation of the hypergeometric functions is difficult. Here, we show
that the hypergeometric functions can in fact be reduced to simpler
functions that can often be computed using a pocket calculator.
Journal: The American Statistician
Pages: 146-148
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1028595
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1028595
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:146-148
Template-Type: ReDIF-Article 1.0
Author-Name: Robert Easterling
Author-X-Name-First: Robert
Author-X-Name-Last: Easterling
Title: There's Nothing Wrong With Clopper-Pearson Binomial Confidence Limits
Journal: The American Statistician
Pages: 154-155
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1019646
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1019646
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:154-155
Template-Type: ReDIF-Article 1.0
Author-Name: Mark F. Schilling
Author-X-Name-First: Mark F.
Author-X-Name-Last: Schilling
Author-Name: Jimmy A. Doi
Author-X-Name-First: Jimmy A.
Author-X-Name-Last: Doi
Title: Reply
Journal: The American Statistician
Pages: 155-156
Issue: 2
Volume: 69
Year: 2015
Month: 5
X-DOI: 10.1080/00031305.2015.1026760
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1026760
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:2:p:155-156
Template-Type: ReDIF-Article 1.0
Author-Name: Liang Hong
Author-X-Name-First: Liang
Author-X-Name-Last: Hong
Title: Another Remark on the Alternative Expectation Formula
Abstract:
Students in a calculus-based probability course will often see the
expectation formula for nonnegative continuous random variables in terms
of the survival function. This alternative expectation formula has a wide
spectrum of applications. It is natural to ask whether there is a
multivariate version of this formula. This note gives an affirmative
answer by establishing such a formula using two different approaches. The
two approaches employed in this note correspond to the two approaches for
the univariate case. Supplementary materials for this article are
available online.
Journal: The American Statistician
Pages: 157-159
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1049710
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1049710
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:157-159
Template-Type: ReDIF-Article 1.0
Author-Name: Per Gösta Andersson
Author-X-Name-First: Per Gösta
Author-X-Name-Last: Andersson
Title: A Classroom Approach to the Construction of an Approximate Confidence Interval of a Poisson Mean Using One Observation
Abstract:
Even elementary statistical problems may give rise to a deeper and broader
discussion of issues in probability and statistics. The construction of an
approximate confidence interval for a Poisson mean turns out to be such a
case. The simple standard two-sided Wald confidence interval by normal
approximation is discussed and compared with the score interval. The
discussion is partly in the form of an imaginary dialog between a teacher
and a student, where the latter is supposed to have studied mathematical
statistics for at least one semester.
Journal: The American Statistician
Pages: 160-164
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056830
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056830
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:160-164
Template-Type: ReDIF-Article 1.0
Author-Name: Joyee Ghosh
Author-X-Name-First: Joyee
Author-X-Name-Last: Ghosh
Author-Name: Andrew E. Ghattas
Author-X-Name-First: Andrew E.
Author-X-Name-Last: Ghattas
Title: Bayesian Variable Selection Under Collinearity
Abstract:
In this article, we highlight some interesting facts about Bayesian
variable selection methods for linear regression models in settings where
the design matrix exhibits strong collinearity. We first demonstrate via
real data analysis and simulation studies that summaries of the posterior
distribution based on marginal and joint distributions may give
conflicting results for assessing the importance of strongly correlated
covariates. The natural question is which one should be used in practice.
The simulation studies suggest that posterior inclusion probabilities and
Bayes factors that evaluate the importance of correlated covariates
jointly are more appropriate, and some priors may be more adversely
affected in such a setting. To obtain a better understanding of the
phenomenon, we study some toy examples with Zellner's
g-prior. The results show that strong collinearity may
lead to a multimodal posterior distribution over models, in which joint
summaries are more appropriate than marginal summaries. Thus, we recommend
a routine examination of the correlation matrix and calculation of the
joint inclusion probabilities for correlated covariates, in addition to
marginal inclusion probabilities, for assessing the importance of
covariates in Bayesian variable selection.
Journal: The American Statistician
Pages: 165-173
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1031827
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1031827
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:165-173
Template-Type: ReDIF-Article 1.0
Author-Name: Darrick Yee
Author-X-Name-First: Darrick
Author-X-Name-Last: Yee
Author-Name: Andrew Ho
Author-X-Name-First: Andrew
Author-X-Name-Last: Ho
Title: Discreteness Causes Bias in Percentage-Based Comparisons: A Case Study From Educational Testing
Abstract:
Discretizing continuous distributions can lead to bias in parameter
estimates. We present a case study from educational testing that
illustrates dramatic consequences of discreteness when discretizing
partitions differ across distributions. The percentage of test takers who
score above a certain cutoff score (percent above cutoff, or "PAC") often
describes overall performance on a test. Year-over-year changes in PAC, or
ΔPAC, have gained prominence under recent U.S. education policies,
with public schools facing sanctions if they fail to meet PAC targets. In
this article, we describe how test score distributions act as continuous
distributions that are discretized inconsistently over time. We show that
this can propagate considerable bias to PAC trends, where positive
ΔPACs appear negative, and vice versa, for a substantial number of
actual tests. A simple model shows that this bias applies to any
comparison of PAC statistics in which values for one distribution are
discretized differently from values for the other.
Journal: The American Statistician
Pages: 174-181
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1031828
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1031828
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:174-181
Template-Type: ReDIF-Article 1.0
Author-Name: Timothy A. C. Hughes
Author-X-Name-First: Timothy A. C.
Author-X-Name-Last: Hughes
Author-Name: Jaechoul Lee
Author-X-Name-First: Jaechoul
Author-X-Name-Last: Lee
Title: A New Test for Short Memory in Long Memory Time Series
Abstract:
This article considers short memory characteristics in a long memory
process. We derive new asymptotic results for the sample autocorrelation
difference ratios. We use these results to develop a new portmanteau test
that determines if short memory parameters are statistically significant.
In simulations, the new test can detect short memory components more often
than the Ljung-Box test when these short memory components are in fact
within a long memory process. Interestingly, our test finds short memory
autocorrelations in U.S. inflation rate data, whereas the Ljung-Box test
fails to find these autocorrelations. Modeling these short memory
autocorrelations of the inflation rate data leads to improved model
accuracy and more precise prediction.
Journal: The American Statistician
Pages: 182-190
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056829
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056829
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:182-190
Template-Type: ReDIF-Article 1.0
Author-Name: Shiyao Liu
Author-X-Name-First: Shiyao
Author-X-Name-Last: Liu
Author-Name: Huaiqing Wu
Author-X-Name-First: Huaiqing
Author-X-Name-Last: Wu
Author-Name: William Q. Meeker
Author-X-Name-First: William Q.
Author-X-Name-Last: Meeker
Title: Understanding and Addressing the Unbounded "Likelihood" Problem
Abstract:
The joint probability density function, evaluated at the observed data, is
commonly used as the likelihood function to compute maximum likelihood
estimates. For some models, however, there exist paths in the parameter
space along which this density-approximation likelihood goes to infinity
and maximum likelihood estimation breaks down. In all applications,
however, observed data are really discrete due to the round-off or
grouping error of measurements. The "correct likelihood" based on interval
censoring can eliminate the problem of an unbounded likelihood. This
article categorizes the models leading to unbounded likelihoods into three
groups and illustrates the density-approximation breakdown with specific
examples. Although it is usually possible to infer how given data were
rounded, when this is not possible, one must choose the width for interval
censoring, so we study the effect of the round-off on estimation. We also
give sufficient conditions for the joint density to provide the same
maximum likelihood estimate as the correct likelihood, as the round-off
error goes to zero.
Journal: The American Statistician
Pages: 191-200
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2014.1003968
File-URL: http://hdl.handle.net/10.1080/00031305.2014.1003968
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:191-200
Template-Type: ReDIF-Article 1.0
Author-Name: Anne-Laure Boulesteix
Author-X-Name-First: Anne-Laure
Author-X-Name-Last: Boulesteix
Author-Name: Robert Hable
Author-X-Name-First: Robert
Author-X-Name-Last: Hable
Author-Name: Sabine Lauer
Author-X-Name-First: Sabine
Author-X-Name-Last: Lauer
Author-Name: Manuel J. A. Eugster
Author-X-Name-First: Manuel J. A.
Author-X-Name-Last: Eugster
Title: A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies
Abstract:
In computational sciences, including computational statistics, machine
learning, and bioinformatics, it is often claimed in articles presenting
new supervised learning methods that the new method performs better than
existing methods on real data, for instance in terms of error rate.
However, these claims are often not based on proper statistical tests and,
even if such tests are performed, the tested hypothesis is not clearly
defined and little attention is devoted to Type I and Type II errors. In
the present article, we aim to fill this gap by providing a proper
statistical framework for hypothesis tests that compare the performances
of supervised learning methods based on several real datasets with unknown
underlying distributions. After giving a statistical interpretation of ad
hoc tests commonly performed by computational researchers, we devote
special attention to power issues and outline a simple method of
determining the number of datasets to be included in a comparison study to
reach an adequate power. These methods are illustrated through three
comparison studies from the literature and an exemplary benchmarking study
using gene expression microarray data. All our results can be reproduced
using R code and datasets available from the companion website
http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013.
Journal: The American Statistician
Pages: 201-212
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1005128
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1005128
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:201-212
Template-Type: ReDIF-Article 1.0
Author-Name: Derek S. Young
Author-X-Name-First: Derek S.
Author-X-Name-Last: Young
Author-Name: Glenn F. Johnson
Author-X-Name-First: Glenn F.
Author-X-Name-Last: Johnson
Author-Name: Mosuk Chow
Author-X-Name-First: Mosuk
Author-X-Name-Last: Chow
Author-Name: James L. Rosenberger
Author-X-Name-First: James L.
Author-X-Name-Last: Rosenberger
Title: The Challenges in Developing an Online Applied Statistics Program: Lessons Learned at Penn State University
Abstract:
Numerous professional fields have an increasing need for individuals
trained in statistics and other quantitative analysis techniques. Today
there exists great potential to fulfill this need by providing
opportunities through online learning. However, to provide a high-quality
education for returning adult professionals seeking advanced degrees in
applied statistics online, many challenges need to be overcome. Based on
our experience developing Penn State University's online program in
applied statistics, we discuss the evolution of the program's curriculum,
recruitment and development of online faculty, and meeting the
requirements of students as important areas that require consideration in
the development of an online program. We also highlight program evaluation
strategies employed to ensure innovation and improvement in online
education as cornerstones to a program's success.
Journal: The American Statistician
Pages: 213-220
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1038583
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1038583
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:213-220
Template-Type: ReDIF-Article 1.0
Author-Name: Hyunju Lee
Author-X-Name-First: Hyunju
Author-X-Name-Last: Lee
Author-Name: Ji Hwan Cha
Author-X-Name-First: Ji Hwan
Author-X-Name-Last: Cha
Title: On Two General Classes of Discrete Bivariate Distributions
Abstract:
In this article, we develop two general classes of discrete bivariate
distributions. We derive general formulas for the joint distributions
belonging to the classes. The obtained formulas for the joint
distributions are very general in the sense that new families of
distributions can be generated just by specifying the "baseline seed
distributions." The dependence structures of the bivariate distributions
belonging to the proposed classes, along with basic statistical
properties, are also discussed. New families of discrete bivariate
distributions are generated from the classes. Furthermore, to assess the
usefulness of the proposed classes, two discrete bivariate distributions
generated from the classes are applied to analyze a real dataset and the
results are compared with those obtained from conventional models.
Journal: The American Statistician
Pages: 221-230
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1044564
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1044564
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:221-230
Template-Type: ReDIF-Article 1.0
Author-Name: Brigitte Baldi
Author-X-Name-First: Brigitte
Author-X-Name-Last: Baldi
Author-Name: Jessica Utts
Author-X-Name-First: Jessica
Author-X-Name-Last: Utts
Title: What Your Future Doctor Should Know About Statistics: Must-Include Topics for Introductory Undergraduate Biostatistics
Abstract:
The increased emphasis on evidence-based medicine creates a greater need
for educating future physicians in the general domain of quantitative
reasoning, probability, and statistics. Reflecting this trend, more
medical schools now require applicants to have taken an undergraduate
course in introductory statistics. Given the breadth of statistical
applications, we should cover in that course certain essential topics that
may not be covered in the more general introductory statistics course. In
selecting and presenting such topics, we should bear in mind that doctors
also need to communicate probabilistic concepts of risks and benefits to
patients who are increasingly expected to be active participants in their
own health care choices despite having no training in medicine or
statistics. It is also important that interesting and relevant examples
accompany the presentation, because the examples (rather than the details)
are what students tend to retain years later. Here, we present a list of
topics we cover in the introductory biostatistics course that may not be
covered in the general introductory course. We also provide some of our
favorite examples for discussing these topics.
Journal: The American Statistician
Pages: 231-240
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1048903
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1048903
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:231-240
Template-Type: ReDIF-Article 1.0
Author-Name: P. Vellaisamy
Author-X-Name-First: P.
Author-X-Name-Last: Vellaisamy
Title: On Probabilistic Proofs of Certain Binomial Identities
Abstract:
This short note gives a simple statistical proof of a binomial identity,
by evaluating the Laplace transform of the maximum of n
independent exponential random variables in two different ways. As a
by-product, we obtain a rigorous proof of an interesting result concerning
the exponential distribution. The connections between a probabilistic
approach and our approach are discussed. In the process, several new
binomial identities are also obtained.
Journal: The American Statistician
Pages: 241-243
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056381
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056381
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:241-243
Template-Type: ReDIF-Article 1.0
Author-Name: R. Dennis Cook
Author-X-Name-First: R. Dennis
Author-X-Name-Last: Cook
Author-Name: Liliana Forzani
Author-X-Name-First: Liliana
Author-X-Name-Last: Forzani
Author-Name: Adam Rothman
Author-X-Name-First: Adam
Author-X-Name-Last: Rothman
Title: Letter to the Editor
Journal: The American Statistician
Pages: 253-254
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1053522
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1053522
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:253-254
Template-Type: ReDIF-Article 1.0
Author-Name: Thaddeus Tarpey
Author-X-Name-First: Thaddeus
Author-X-Name-Last: Tarpey
Author-Name: R. Todd Ogden
Author-X-Name-First: R. Todd
Author-X-Name-Last: Ogden
Author-Name: Eva Petkova
Author-X-Name-First: Eva
Author-X-Name-Last: Petkova
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Reply
Journal: The American Statistician
Pages: 254-255
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056613
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056613
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:254-255
Template-Type: ReDIF-Article 1.0
Author-Name: Peng Ding
Author-X-Name-First: Peng
Author-X-Name-Last: Ding
Title: Reply
Journal: The American Statistician
Pages: 255-256
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056615
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056615
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:255-256
Template-Type: ReDIF-Article 1.0
Author-Name: Iliana Ignatova
Author-X-Name-First: Iliana
Author-X-Name-Last: Ignatova
Author-Name: Roland Deutsch
Author-X-Name-First: Roland
Author-X-Name-Last: Deutsch
Author-Name: Don Edwards
Author-X-Name-First: Don
Author-X-Name-Last: Edwards
Title: Kirk, J.L., and Fay, M.P. "An Introduction to Practical Sequential Inferences Via Single-Arm Binary Response Studies Using the Binseqtest R Package," The American Statistician, 68, 230-242
Journal: The American Statistician
Pages: 256-257
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1053523
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1053523
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:256-257a
Template-Type: ReDIF-Article 1.0
Author-Name: Emil M. Friedman
Author-X-Name-First: Emil M.
Author-X-Name-Last: Friedman
Title: Nontransitivity, Correlation, and Causation
Journal: The American Statistician
Pages: 257-257
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056382
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056382
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:257b-257b
Template-Type: ReDIF-Article 1.0
Author-Name: Stavros D. Veresoglou
Author-X-Name-First: Stavros D.
Author-X-Name-Last: Veresoglou
Author-Name: Matthias C. Rillig
Author-X-Name-First: Matthias C.
Author-X-Name-Last: Rillig
Title: Evidence-Based Data Analysis: Protecting the World From Bad Code? Comment by Veresoglou and Rillig
Journal: The American Statistician
Pages: 257-257
Issue: 3
Volume: 69
Year: 2015
Month: 8
X-DOI: 10.1080/00031305.2015.1056831
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1056831
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:257c-257c
Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas J. Horton
Author-X-Name-First: Nicholas J.
Author-X-Name-Last: Horton
Author-Name: Johanna S. Hardin
Author-X-Name-First: Johanna S.
Author-X-Name-Last: Hardin
Title: Teaching the Next Generation of Statistics Students to “Think With Data”: Special Issue on Statistics and the Undergraduate Curriculum
Journal: The American Statistician
Pages: 259-265
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1094283
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1094283
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:259-265
Template-Type: ReDIF-Article 1.0
Author-Name: George Cobb
Author-X-Name-First: George
Author-X-Name-Last: Cobb
Title: Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up
Abstract:
The last half-dozen years have seen The American
Statistician publish well-argued and provocative calls to change
our thinking about statistics and how we teach it, among them Brown and
Kass, Nolan and Temple-Lang, and Legler et al. Within this past year,
the ASA has issued a new and comprehensive set of guidelines for
undergraduate programs (ASA, Curriculum Guidelines for
Undergraduate Programs in Statistical Science). Accepting (and
applauding) all this as background, the current article argues the need to
rethink our curriculum from the ground up, and offers five principles and
two caveats intended to help us along the path toward a new synthesis.
These principles and caveats rest on my sense of three parallel
evolutions: the convergence of trends in the roles of mathematics,
computation, and context within statistics education. These ongoing
changes, together with the articles cited above and the seminal
provocation by Leo Breiman call for a deep rethinking of what we teach to
undergraduates. In particular, following Brown and Kass, we should put
priority on two goals, to make “fundamental concepts
accessible” and to “minimize prerequisites to
research.”[Received December 2014. Revised July 2015.]
Journal: The American Statistician
Pages: 266-282
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1093029
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093029
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:266-282
Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas Chamandy
Author-X-Name-First: Nicholas
Author-X-Name-Last: Chamandy
Author-Name: Omkar Muralidharan
Author-X-Name-First: Omkar
Author-X-Name-Last: Muralidharan
Author-Name: Stefan Wager
Author-X-Name-First: Stefan
Author-X-Name-Last: Wager
Title: Teaching Statistics at Google-Scale
Abstract:
Modern data and applications pose very different challenges from those of
the 1950s or even the 1980s. Students contemplating a career in statistics
or data science need to have the tools to tackle problems involving
massive, heavy-tailed data, often interacting with live, complex systems.
However, despite the deepening connections between engineering and modern
data science, we argue that training in classical statistical concepts
plays a central role in preparing students to solve Google-scale problems.
To this end, we present three industrial applications where significant
modern data challenges were overcome by statistical thinking.[Received
December 2014. Revised August 2015.]
Journal: The American Statistician
Pages: 283-291
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1089790
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1089790
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:283-291
Template-Type: ReDIF-Article 1.0
Author-Name: Deborah Nolan
Author-X-Name-First: Deborah
Author-X-Name-Last: Nolan
Author-Name: Duncan Temple Lang
Author-X-Name-First: Duncan
Author-X-Name-Last: Temple Lang
Title: Explorations in Statistics Research: An Approach to Expose Undergraduates to Authentic Data Analysis
Abstract:
The Explorations in Statistics Research workshop is a one-week NSF-funded
summer program that introduces undergraduate students to current research
problems in applied statistics. The goal of the workshop is to expose
students to exciting, modern applied statistical research and practice,
with the ultimate aim of interesting them in seeking more training in
statistics at the undergraduate and graduate levels. The program is
explicitly designed to engage students in the connections between
authentic domain problems and the statistical ideas and approaches needed
to address these problems, which is an important aspect of statistical
thinking that is difficult to teach and sometimes lacking in our
methodological courses and programs. Over the past nine years, we have run
the workshop six times and a similar program in the sciences twice. We
describe the program, summarize feedback from participants, and identify
the key features of its success. We abstract these features and provide a
set of recommendations for how faculty can incorporate important elements
into their regular courses.[Received December 2014. Revised June 2015.]
Journal: The American Statistician
Pages: 292-299
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1073624
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1073624
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:292-299
Template-Type: ReDIF-Article 1.0
Author-Name: Byran J. Smucker
Author-X-Name-First: Byran J.
Author-X-Name-Last: Smucker
Author-Name: A. John Bailer
Author-X-Name-First: A. John
Author-X-Name-Last: Bailer
Title: Beyond Normal: Preparing Undergraduates for the Work Force in a Statistical Consulting Capstone
Abstract:
In this article we chronicle the development of the undergraduate
statistical consulting course at Miami University, from canned to
client-based projects, and argue that if the course is well designed with
suitable mentoring, students can perform remarkably sophisticated analyses
of real-world data problems that require solutions beyond the methods
encountered in previous classes. We review the historical context in which
the consulting class evolved, describe the logistics of implementing it,
and review assessment and student reaction to the course. We also
illustrate the types of challenging projects the students are confronted
with via two case studies and relate the skills learned and reinforced in
this consulting class model to the skills demanded in the modern
statistical work force. This course also provides an opportunity to
strengthen and nurture key points from the new American Statistical
Association guidelines for undergraduate programs: namely, communicating
analyses of real and complex data that require the application of diverse
statistical models and approaches. Supplementary materials for this
article are available online.[Received December 2014. Revised July 2015.]
Journal: The American Statistician
Pages: 300-306
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1077731
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077731
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:300-306
Template-Type: ReDIF-Article 1.0
Author-Name: Scott D. Grimshaw
Author-X-Name-First: Scott D.
Author-X-Name-Last: Grimshaw
Title: A Framework for Infusing Authentic Data Experiences Within Statistics Courses
Abstract:
Working with complex data is one of the important updates to the 2014 ASA
Curriculum Guidelines for Undergraduate Programs in Statistical Science.
Infusing “authentic data experiences” within courses allows
students opportunities to learn and practice data skills as they prepare a
dataset for analysis. While more modest in scope than a senior-level
culminating experience, authentic data experiences provide an opportunity
to demonstrate connections between data skills and statistical skills. The
result is more practice of data skills for undergraduate
statisticians.[Received November 2014. Revised July 2015.]
Journal: The American Statistician
Pages: 307-314
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1081106
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1081106
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:307-314
Template-Type: ReDIF-Article 1.0
Author-Name: Jennifer L. Green
Author-X-Name-First: Jennifer L.
Author-X-Name-Last: Green
Author-Name: Erin E. Blankenship
Author-X-Name-First: Erin E.
Author-X-Name-Last: Blankenship
Title: Fostering Conceptual Understanding in Mathematical Statistics
Abstract:
In many undergraduate statistics programs, the two-semester calculus-based
mathematical statistics sequence is the cornerstone of the curriculum.
However, 10 years after the release of the Guidelines for
Assessment and Instruction in Statistics Education (GAISE) College Report,
2005, and the subsequent movement to stress conceptual understanding and
foster active learning in statistics classrooms, the sequence still
remains a traditional, lecture-intensive course. In this article, we
discuss various instructional approaches, activities, and assessments that
can be used to foster active learning and emphasize conceptual
understanding while still covering the necessary theoretical content
students need to be successful in subsequent statistics or actuarial
science courses. In addition, we share student reflections on these course
enhancements. The course revision we suggest does not require
substantial changes in content, so other mathematical statistics
instructors can implement these strategies without sacrificing concepts in
probability and inference that are fundamental to the needs of their
students. Supplementary materials, including code used to generate class
plots and activity handouts, are available online.[Received December 2014.
Revised June 2015.]
Journal: The American Statistician
Pages: 315-325
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1069759
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1069759
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:315-325
Template-Type: ReDIF-Article 1.0
Author-Name: Natalie J. Blades
Author-X-Name-First: Natalie J.
Author-X-Name-Last: Blades
Author-Name: G. Bruce Schaalje
Author-X-Name-First: G. Bruce
Author-X-Name-Last: Schaalje
Author-Name: William F. Christensen
Author-X-Name-First: William F.
Author-X-Name-Last: Christensen
Title: The Second Course in Statistics: Design and Analysis of Experiments?
Abstract:
Statistics departments are facing rapid growth in enrollments and
increases in demand for courses. This article discusses the use of design
and analysis of experiments (DAE) as a nonterminal second course in
statistics for undergraduate statistics majors, minors, and other students
seeking exposure to the practice of statistics beyond the introductory
course. DAE is a gateway to approaching statistical thinking as data-based
problem solving by exposing students to statistical, computational, data,
and communication skills in the second course. Given the somewhat
antiquated view of design and deemphasis of classical design of
experiments topics in the new ASA curriculum guidelines, DAE may seem an
odd choice for the second course; however, it exposes students to the
breadth of the statistical problem-solving process, explores foundational
issues of the discipline, and is accessible to students who have not yet
finished their advanced mathematical training. These skills remain
essential in the data science era as students must be equipped to
understand the potential and peril of found data using the principles of
design. While DAE may not be the appropriate second course for all
statistics programs, it provides a strong foundation for causal inference
and experimental design for students pursuing a B.S. in Statistics in a
program housed in a department of statistics.[Received December 2014.
Revised July 2015.]
Journal: The American Statistician
Pages: 326-333
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1086437
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086437
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:326-333
Template-Type: ReDIF-Article 1.0
Author-Name: Ben Baumer
Author-X-Name-First: Ben
Author-X-Name-Last: Baumer
Title: A Data Science Course for Undergraduates: Thinking With Data
Abstract:
Data science is an emerging interdisciplinary field that combines elements
of mathematics, statistics, computer science, and knowledge in a
particular application domain for the purpose of extracting meaningful
information from the increasingly sophisticated array of data available in
many settings. These data tend to be nontraditional, in the sense that
they are often live, large, complex, and/or messy. A first course in
statistics at the undergraduate level typically introduces students to a
variety of techniques to analyze small, neat, and clean datasets. However,
whether they pursue more formal training in statistics or not, many of
these students will end up working with data that are considerably more
complex, and will need facility with statistical computing techniques.
More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal
arts environment that provides students with the tools necessary to apply
data science. The course emphasizes modern, practical, and useful skills
that cover the full data analysis spectrum, from asking an interesting
question to acquiring, managing, manipulating, processing, querying,
analyzing, and visualizing data, as well as communicating findings in
written, graphical, and oral forms. Supplementary materials for this
article are available online.[Received June 2014. Revised July 2015.]
Journal: The American Statistician
Pages: 334-342
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1081105
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1081105
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:334-342
Template-Type: ReDIF-Article 1.0
Author-Name: J. Hardin
Author-X-Name-First: J.
Author-X-Name-Last: Hardin
Author-Name: R. Hoerl
Author-X-Name-First: R.
Author-X-Name-Last: Hoerl
Author-Name: Nicholas J. Horton
Author-X-Name-First: Nicholas J.
Author-X-Name-Last: Horton
Author-Name: D. Nolan
Author-X-Name-First: D.
Author-X-Name-Last: Nolan
Author-Name: B. Baumer
Author-X-Name-First: B.
Author-X-Name-Last: Baumer
Author-Name: O. Hall-Holt
Author-X-Name-First: O.
Author-X-Name-Last: Hall-Holt
Author-Name: P. Murrell
Author-X-Name-First: P.
Author-X-Name-Last: Murrell
Author-Name: R. Peng
Author-X-Name-First: R.
Author-X-Name-Last: Peng
Author-Name: P. Roback
Author-X-Name-First: P.
Author-X-Name-Last: Roback
Author-Name: D. Temple Lang
Author-X-Name-First: D.
Author-X-Name-Last: Temple Lang
Author-Name: M. D. Ward
Author-X-Name-First: M. D.
Author-X-Name-Last: Ward
Title: Data Science in Statistics Curricula: Preparing Students to “Think with Data”
Abstract:
A growing number of students are completing undergraduate degrees in
statistics and entering the workforce as data analysts. In these
positions, they are expected to understand how to use databases and other
data warehouses, scrape data from Internet sources, program solutions to
complex problems in multiple languages, and think algorithmically as well
as statistically. These data science topics have not traditionally been a
major component of undergraduate programs in statistics. Consequently, a
curricular shift is needed to address additional learning outcomes. The
goal of this article is to motivate the importance of data science
proficiency and to provide examples and resources for instructors to
implement data science in their own statistics curricula. We provide case
studies from seven institutions. These varied approaches to teaching data
science demonstrate curricular innovations to address new needs. Also
included here are examples of assignments designed for courses that foster
engagement of undergraduates with data and data science.[Received November
2014. Revised July 2015.]
Journal: The American Statistician
Pages: 343-353
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1077729
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077729
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:343-353
Template-Type: ReDIF-Article 1.0
Author-Name: Shonda Kuiper
Author-X-Name-First: Shonda
Author-X-Name-Last: Kuiper
Author-Name: Rodney X. Sturdivant
Author-X-Name-First: Rodney X.
Author-X-Name-Last: Sturdivant
Title: Using Online Game-Based Simulations to Strengthen Students’ Understanding of Practical Statistical Issues in Real-World Data Analysis
Abstract:
Datasets provided to students are typically carefully chosen and vetted to
illustrate a key statistical topic or method. Rarely are real studies and
data so straightforward. In addition, carefully curated datasets that are
brought into the statistics classroom may not feel realistic to students.
We provide several examples of online activities where students can
quickly collect their own local data, have input on the goals of the study
and draw their own conclusions. These activities focus on core statistical
issues that are often challenging to teach with traditional textbooks,
such as working with messy data, bias, data relevance, and reliability.
This approach to teaching integrates the challenges of data in a way that
encourages students to see how easy it can be to inadvertently draw
misleading conclusions. These activities are designed to be highly
adaptable and have proven effective in a wide variety of introductory and
advanced undergraduate courses.[Received December 2014. Revised July
2015.]
Journal: The American Statistician
Pages: 354-361
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1075421
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1075421
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:354-361
Template-Type: ReDIF-Article 1.0
Author-Name: Nathan Tintle
Author-X-Name-First: Nathan
Author-X-Name-Last: Tintle
Author-Name: Beth Chance
Author-X-Name-First: Beth
Author-X-Name-Last: Chance
Author-Name: George Cobb
Author-X-Name-First: George
Author-X-Name-Last: Cobb
Author-Name: Soma Roy
Author-X-Name-First: Soma
Author-X-Name-Last: Roy
Author-Name: Todd Swanson
Author-X-Name-First: Todd
Author-X-Name-Last: Swanson
Author-Name: Jill VanderStoep
Author-X-Name-First: Jill
Author-X-Name-Last: VanderStoep
Title: Combating Anti-Statistical Thinking Using Simulation-Based Methods Throughout the Undergraduate Curriculum
Abstract:
The use of simulation-based methods for introducing inference is growing
in popularity for the Stat 101 course, due in part to increasing evidence
of the methods’ ability to improve students’ statistical thinking.
This impact comes from simulation-based methods (a) clearly presenting the
overarching logic of inference, (b) strengthening ties between statistics
and probability/mathematical concepts, (c) encouraging a focus on the
entire research process, (d) facilitating student thinking about advanced
statistical concepts, (e) allowing more time to explore, do, and talk
about real research and messy data, and (f) acting as a firmer foundation
on which to build statistical intuition. Thus, we argue that
simulation-based inference should be an entry point to an undergraduate
statistics program for all students, and that simulation-based inference
should be used throughout all undergraduate statistics courses. To
achieve this goal and fully recognize the benefits of simulation-based
inference on the undergraduate statistics program, we will need to break
free of historical forces tying undergraduate statistics curricula to
mathematics, consider radical and innovative new pedagogical approaches in
our courses, fully implement assessment-driven content innovations, and
embrace computation throughout the curriculum.[Received December 2014.
Revised July 2015.]
Journal: The American Statistician
Pages: 362-370
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1081619
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1081619
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:362-370
Template-Type: ReDIF-Article 1.0
Author-Name: Tim C. Hesterberg
Author-X-Name-First: Tim C.
Author-X-Name-Last: Hesterberg
Title: What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum
Abstract:
Bootstrapping has enormous potential in statistics education and practice,
but there are subtle issues and ways to go wrong. For example, the common
combination of nonparametric bootstrapping and bootstrap percentile
confidence intervals is less accurate than using
t-intervals for small samples, though more accurate for
larger samples. My goals in this article are to provide a deeper
understanding of bootstrap methods—how they work, when they work or
not, and which methods work better—and to highlight pedagogical
issues. Supplementary materials for this article are available
online.[Received December 2014. Revised August 2015.]
Journal: The American Statistician
Pages: 371-386
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1089789
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1089789
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:371-386
Template-Type: ReDIF-Article 1.0
Author-Name: Davit Khachatryan
Author-X-Name-First: Davit
Author-X-Name-Last: Khachatryan
Title: Incorporating Statistical Consulting Case Studies in Introductory Time Series Courses
Abstract:
Established as a rigorous pedagogical device at Harvard University, the
case method has grown into an indispensable mode of instruction at many
business schools. Its effectiveness has been praised for increasing
student participation during in-class discussions, providing hands-on
engagement in real-world business problems, and increasing long-term
retention rates. This article illustrates how novel case studies that
mimic real-life statistical consulting engagements can be incorporated in
the curriculum of an undergraduate, introductory time series course. The
assessment of learning objectives, as well as the pedagogical implications
of teaching with statistical consulting case studies, is elucidated. The
article also lays out guidelines for adopting statistical consulting case
studies should the readers choose to incorporate the case method into the
curricula of courses that they teach. A sample case study which the author
has successfully used in his classroom instruction is provided in this
article.[Received July 2014. Revised January 2015.]
Journal: The American Statistician
Pages: 387-396
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1026611
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1026611
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:387-396
Template-Type: ReDIF-Article 1.0
Author-Name: Scotland Leman
Author-X-Name-First: Scotland
Author-X-Name-Last: Leman
Author-Name: Leanna House
Author-X-Name-First: Leanna
Author-X-Name-Last: House
Author-Name: Andrew Hoegh
Author-X-Name-First: Andrew
Author-X-Name-Last: Hoegh
Title: Developing a New Interdisciplinary Computational Analytics Undergraduate Program: A Qualitative-Quantitative-Qualitative Approach
Abstract:
Statistics departments play a vital role in educating students on the
analysis of data for obtaining information and discovering knowledge. In
the last several years, we have witnessed an explosion of data, which was
not imaginable in years past. As a result, the methods and techniques used
for data analysis have evolved. Beyond this, the technology used for
storing, porting, and computing big data has also
evolved, and so now must traditionally oriented statistics departments. In
this article, we discuss the development of a new computational modeling
program that meets these demands, and we detail how to balance the
qualitative and quantitative components of modern day data analyses for
statistical education.[Received December 2014. Revised August 2015.]
Journal: The American Statistician
Pages: 397-408
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1090337
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1090337
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:397-408
Template-Type: ReDIF-Article 1.0
Author-Name: Beth Chance
Author-X-Name-First: Beth
Author-X-Name-Last: Chance
Author-Name: Roxy Peck
Author-X-Name-First: Roxy
Author-X-Name-Last: Peck
Title: From Curriculum Guidelines to Learning Outcomes: Assessment at the Program Level
Abstract:
The 2000 ASA Guidelines for Undergraduate Statistics majors aimed to
provide guidance to programs with undergraduate degrees in statistics as
to the content and skills that statistics majors should be learning. The
2014 Guidelines revise the earlier guidelines to reflect changes in the
discipline. As programs strive to adjust their curricula to align with the
2014 Guidelines, it is appropriate to also think about developing an
assessment cycle of evaluation. This will enable programs to determine
whether students are learning what we want them to learn and to work on
continuously improving the program over time. The first step is to
translate the broader Guidelines into institution-specific measurable
learning outcomes. This article focuses on providing examples of learning
outcomes developed by different institutions based on the 2000 Guidelines.
The companion article by Moore and Kaplan (this issue) focuses on choosing
appropriate assessment methods and rubrics and creating an assessment
plan. We hope the examples provided are illustrative and that they will
assist programs as they implement the 2014 Guidelines.[Received November
2014. Revised July 2015.]
Journal: The American Statistician
Pages: 409-416
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1077730
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077730
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:409-416
Template-Type: ReDIF-Article 1.0
Author-Name: Allison Amanda Moore
Author-X-Name-First: Allison Amanda
Author-X-Name-Last: Moore
Author-Name: Jennifer J. Kaplan
Author-X-Name-First: Jennifer J.
Author-X-Name-Last: Kaplan
Title: Program Assessment for an Undergraduate Statistics Major
Abstract:
Program assessment is used by institutions and/or departments to prompt
conversations about the status of student learning and make informed
decisions about educational programs. It is also typically required by
accreditation agencies, such as the Southern Association of Colleges and
Schools (SACS) or the Western Association of Schools & Colleges (WASC).
The cyclic assessment process includes four steps: establishing student
learning outcomes, deciding on assessment methods, collecting and
analyzing data, and reflecting on the results. The theory behind the
choice of assessment methods and the use of rubrics in assessment is
discussed. A description of the experiences of a Department of Statistics
at a large research university during their process of developing an
assessment plan for the undergraduate statistics major is provided. The
article concludes with the lessons learned by the department as they
completed the assessment development process. Supplementary materials for
this article are available online.[Received December 2014. Revised July
2015.]
Journal: The American Statistician
Pages: 417-424
Issue: 4
Volume: 69
Year: 2015
Month: 11
X-DOI: 10.1080/00031305.2015.1087331
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1087331
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:69:y:2015:i:4:p:417-424
Template-Type: ReDIF-Article 1.0
Author-Name: Xu Xu
Author-X-Name-First: Xu
Author-X-Name-Last: Xu
Author-Name: Peter Z. G. Qian
Author-X-Name-First: Peter Z. G.
Author-X-Name-Last: Qian
Author-Name: Qing Liu
Author-X-Name-First: Qing
Author-X-Name-Last: Liu
Title: Samurai Sudoku-Based Space-Filling Designs for Data Pooling
Abstract:
Pooling data from multiple sources plays an increasingly vital role in
today’s world. By using a popular Sudoku game, we propose a new
type of design, called a Samurai Sudoku-based space-filling design to
address this issue. Such a design is an orthogonal array-based Latin
hypercube design with the following attractive properties: (i) the
complete design achieves uniformity in both univariate and bivariate
margins; (ii) it can be divided into groups of subdesigns with overlaps
such that each subdesign achieves uniformity in both univariate and
bivariate margins; and (iii) each of the overlaps achieves uniformity in
both univariate and bivariate margins. Examples are given to illustrate
the properties of the proposed design, and to demonstrate the advantages
of using the proposed design for pooling data from multiple
sources.[Received August 2013. Revised July 2015.]
Journal: The American Statistician
Pages: 1-8
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1114970
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1114970
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:1-8
Template-Type: ReDIF-Article 1.0
Author-Name: Marcio A. Diniz
Author-X-Name-First: Marcio A.
Author-X-Name-Last: Diniz
Author-Name: Jasper De Bock
Author-X-Name-First: Jasper
Author-X-Name-Last: De Bock
Author-Name: Arthur Van Camp
Author-X-Name-First: Arthur
Author-X-Name-Last: Van Camp
Title: Characterizing Dirichlet Priors
Abstract:
The selection of prior distributions is a problem that has been heavily
discussed since Bayes and Price published their article in 1763. Conjugate
priors became popular, largely because of their mathematical convenience.
In this study, we justify the use of the conjugate combination of a
Dirichlet prior and a multinomial likelihood by imposing a fundamental
principle that we call partition invariance, alongside other requirements
that are well known in the literature.[Received January 2014. Revised July
2015.]
Journal: The American Statistician
Pages: 9-17
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1100137
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1100137
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:9-17
Template-Type: ReDIF-Article 1.0
Author-Name: Omar A. Kittaneh
Author-X-Name-First: Omar A.
Author-X-Name-Last: Kittaneh
Author-Name: Mohammad A. U. Khan
Author-X-Name-First: Mohammad A. U.
Author-X-Name-Last: Khan
Author-Name: Muhammed Akbar
Author-X-Name-First: Muhammed
Author-X-Name-Last: Akbar
Author-Name: Husam A. Bayoud
Author-X-Name-First: Husam A.
Author-X-Name-Last: Bayoud
Title: Average Entropy: A New Uncertainty Measure with Application to Image Segmentation
Abstract:
Various modifications have been suggested in the past to extend Shannon
entropy to continuous random variables. This article investigates these
modifications, and suggests a new entropy measure with the name of average
entropy (AE). AE is more general than Shannon entropy in the sense that
its definition encompasses both continuous as well as discrete domains. It
is additive, positive and attains zero only when the distribution is
uniform. The main characteristic of the suggested measure lies in its
consistency behavior. Many properties of AE, including its relationship
with Kullback--Leibler information measure, are studied. Precise theorems
about the vanishing of the conditional AE for both continuous and discrete
distributions are provided. Toward the end, the measure is tested for its
effectiveness in image segmentation.[Received March 2014. Revised June
2015.]
Journal: The American Statistician
Pages: 18-24
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1089788
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1089788
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:18-24
Template-Type: ReDIF-Article 1.0
Author-Name: Merritt Lyon
Author-X-Name-First: Merritt
Author-X-Name-Last: Lyon
Author-Name: Li C. Cheung
Author-X-Name-First: Li C.
Author-X-Name-Last: Cheung
Author-Name: Joseph L. Gastwirth
Author-X-Name-First: Joseph L.
Author-X-Name-Last: Gastwirth
Title: The Advantages of Using Group Means in Estimating the Lorenz Curve and Gini Index From Grouped Data
Abstract:
A recent article proposed a histogram-based method for estimating the
Lorenz curve and Gini index from grouped data that did not use the group
means reported by government agencies. When comparing their method to one
based on group means, the authors assume a uniform density in each
grouping interval, which leads to an overestimate of the overall average
income. After reviewing the additional information in the group means, it
will be shown that as the number of groups increases, the bounds on the
Gini index obtained from the group means become narrower. This is not
necessarily true for the histogram method. Two simple interpolation
methods using the group means are described and the accuracy of the
estimated Gini index they yield and the histogram-based one are compared
to the published Gini index for the 1967--2013 period. The average
absolute errors of the estimated Gini index obtained from the two methods
using group means are noticeably less than that of the histogram-based
method. Supplementary materials for this article are available
online.[Received August 2014. Revised September 2015.]
Journal: The American Statistician
Pages: 25-32
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1105152
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105152
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:25-32
Template-Type: ReDIF-Article 1.0
Author-Name: Eugene Demidenko
Author-X-Name-First: Eugene
Author-X-Name-Last: Demidenko
Title: The p-Value You Can’t Buy
Abstract:
There is growing frustration with the concept of the
p-value. Besides having an ambiguous interpretation, the
p-value can be made as small as desired by increasing the
sample size, n. The p-value is outdated
and does not make sense with big data: Everything becomes statistically
significant. The root of the problem with the p-value is
in the mean comparison. We argue that statistical uncertainty should be
measured on the individual, not the group, level. Consequently, standard
deviation (SD), not standard error (SE), error bars should be used to
graphically present the data on two groups. We introduce a new measure
based on the discrimination of individuals/objects from two groups, and
call it the D-value. The D-value can be
viewed as the n-of-1 p-value because it
is computed in the same way as p while letting
n equal 1. We show how the D-value is
related to discrimination probability and the area above the receiver
operating characteristic (ROC) curve. The D-value has a
clear interpretation as the proportion of patients who get worse after the
treatment, and as such facilitates weighing the likelihood of events
under different scenarios.[Received January 2015. Revised June 2015.]
Journal: The American Statistician
Pages: 33-38
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1069760
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1069760
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:33-38
Template-Type: ReDIF-Article 1.0
Author-Name: Joseph J. Lee
Author-X-Name-First: Joseph J.
Author-X-Name-Last: Lee
Author-Name: Donald B. Rubin
Author-X-Name-First: Donald B.
Author-X-Name-Last: Rubin
Title: Evaluating the Validity of Post-Hoc Subgroup Inferences: A Case Study
Abstract:
In randomized experiments, the random assignment of units to treatment
groups justifies many of the widely used traditional analysis methods for
evaluating causal effects. Specifying subgroups of units for further
examination after observing outcomes, however, may partially nullify any
advantages of randomized assignment when data are analyzed naively. Some
previous statistical literature has treated all post-hoc subgroup analyses
homogeneously as entirely invalid and thus uninterpretable. The extent of
the validity of such analyses and the factors that affect the degree of
validity remain largely unstudied. Here, we describe a recent
pharmaceutical case with First Amendment legal implications, in which
post-hoc subgroup analyses played a pivotal and controversial role.
Through Monte Carlo simulation, we show that post-hoc results that seem
highly significant make dramatic movements toward insignificance after
accounting for the subgrouping procedure presumably used. Finally, we
propose a novel, randomization-based method that generates valid post-hoc
subgroup p-values, provided we know exactly how the
subgroups were constructed. If we do not know the exact subgrouping
procedure, our method may still place helpful bounds on the significance
level of estimated effects. This randomization-based approach allows us to
evaluate causal effects in situations where valid evaluations were
previously considered impossible.[Received February 2014. Revised April
2015.]
Journal: The American Statistician
Pages: 39-46
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1093961
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093961
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:39-46
Template-Type: ReDIF-Article 1.0
Author-Name: Corwin Matthew Zigler
Author-X-Name-First: Corwin Matthew
Author-X-Name-Last: Zigler
Title: The Central Role of Bayes’ Theorem for Joint Estimation of Causal Effects and Propensity Scores
Abstract:
Although propensity scores have been central to the estimation of causal
effects for over 30 years, only recently has the statistical literature
begun to consider in detail methods for Bayesian estimation of propensity
scores and causal effects. Underlying this recent body of literature on
Bayesian propensity score estimation is an implicit discordance between
the goal of the propensity score and the use of Bayes’ theorem. The
propensity score condenses multivariate covariate information into a
scalar to allow estimation of causal effects without specifying a model
for how each covariate relates to the outcome. Avoiding specification of a
detailed model for the outcome response surface is valuable for robust
estimation of causal effects, but this strategy is at odds with the use of
Bayes’ theorem, which presupposes a full probability model for the
observed data that adheres to the likelihood principle. The goal of this
article is to explicate this fundamental feature of Bayesian estimation of
causal effects with propensity scores to provide context for the existing
literature and for future work on this important topic.[Received June
2014. Revised September 2015.]
Journal: The American Statistician
Pages: 47-54
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1111260
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1111260
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:47-54
Template-Type: ReDIF-Article 1.0
Author-Name: Robert Lund
Author-X-Name-First: Robert
Author-X-Name-Last: Lund
Author-Name: Gang Liu
Author-X-Name-First: Gang
Author-X-Name-Last: Liu
Author-Name: Qin Shao
Author-X-Name-First: Qin
Author-X-Name-Last: Shao
Title: A New Approach to ANOVA Methods for Autocorrelated Data
Abstract:
This article reexamines ANOVA problems for autocorrelated data. Using
linear prediction techniques for stationary time series, a new test
statistic that assesses a null hypothesis of equal means is proposed and
investigated. Our test statistic mimics the classical
F-type ratio form used with independent data, but
substitutes estimated prediction residuals in for the errors. This simple
tactic departs from past studies that adjust the quadratic forms in the
numerator and denominator in the F ratio for
autocorrelation. One of the advantages is that our statistic retains the
classical null hypothesis F distribution (now as a limit)
with the customary degrees of freedom. The statistic is shown to perform
well in simulations. Asymptotic proofs are given in the case of
autoregressive random errors; a sports application is supplied.[Received
December 2014. Revised August 2015.]
Journal: The American Statistician
Pages: 55-62
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1093026
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093026
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:55-62
Template-Type: ReDIF-Article 1.0
Author-Name: Timothy W. Armistead
Author-X-Name-First: Timothy W.
Author-X-Name-Last: Armistead
Title: Misunderstood and Unattributed: Revisiting M. H. Doolittle's Measures of Association, With a Note on Bayes’ Theorem
Abstract:
In the 1880s, American scholars developed measures of association and
chance for cross-classification tables that anticipated the more widely
known work of Galton, Pearson, Yule, and Fisher. Three of the measures
form the historical backdrop for the earliest known use of a joint
probability measure that mirrored Bayes’ theorem long before the
latter gained general interest among statisticians. The joint probability
measure, which served as a foundational step in M. H. Doolittle's
development of the first of the two “association ratios,”
has not previously been reviewed in the statistical literature. It was
reintroduced as if newly developed in a subfield of experimental
psychology more than a century after Doolittle's work was published. It
has flourished there, but it has not seen use in other academic venues.
The article describes its properties and limitations and proposes that it
be disseminated and debated beyond its current narrow application. The
article notes that Doolittle's first association ratio can be expressed as
another joint probability and that prior treatments in the literature are
inconsistent with Doolittle's understanding of its purpose. The article
also demonstrates that the equivalent of Cohen's kappa
(κ) was developed by Doolittle in
1887, as his second association measure.[Received December 2014. Revised
August 2015.]
Journal: The American Statistician
Pages: 63-73
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1086686
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086686
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:63-73
Template-Type: ReDIF-Article 1.0
Author-Name: R. Wayne Oldford
Author-X-Name-First: R. Wayne
Author-X-Name-Last: Oldford
Title: Self-Calibrating Quantile--Quantile Plots
Abstract:
Quantile--quantile plots, or qqplots, are an important visual tool for
many applications but their interpretation requires some care and often
more experience. This apparent subjectivity is unnecessary. By drawing on
the computational and display facilities now widely available, qqplots are
easily enriched to help with their interpretation. An overview of quantile
functions and quantile--quantile plots is presented against the backdrop
of their early historical development. Strengths and shortcomings of the
traditional display are described. A new enhanced qqplot, the
self-calibrating qqplot, is introduced and demonstrated on a variety of
examples—both synthetic and real. Real examples include normal
qqplots, log-normal plots, half-normal plots for factorial experiments,
qqplots for x̄ and s in process improvement applications, detection of
multivariate outliers, and the comparison of empirical distributions.
Self-calibration is had by visually incorporating sampling variation in
the qqplot display in a variety of ways. The new qqplot is available
through the function qqtest in the R package of the same name.[Received December 2014. Revised
August 2015.]
Journal: The American Statistician
Pages: 74-90
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1090338
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1090338
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:74-90
Template-Type: ReDIF-Article 1.0
Author-Name: Jocelyn T. Chi
Author-X-Name-First: Jocelyn T.
Author-X-Name-Last: Chi
Author-Name: Eric C. Chi
Author-X-Name-First: Eric C.
Author-X-Name-Last: Chi
Author-Name: Richard G. Baraniuk
Author-X-Name-First: Richard G.
Author-X-Name-Last: Baraniuk
Title: k-POD: A Method for k-Means Clustering of Missing Data
Abstract:
The k-means algorithm is often used in clustering
applications but its usage requires a complete data matrix. Missing data,
however, are common in many applications. Mainstream approaches to
clustering missing data reduce the missing data problem to a complete data
formulation through either deletion or imputation but these solutions may
incur significant costs. Our k-POD method presents a
simple extension of k-means clustering for missing data
that works even when the missingness mechanism is unknown, when external
information is unavailable, and when there is significant missingness in
the data.[Received November 2014. Revised August 2015.]
Journal: The American Statistician
Pages: 91-99
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1086685
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086685
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:91-99
Template-Type: ReDIF-Article 1.0
Author-Name: John E. Angus
Author-X-Name-First: John E.
Author-X-Name-Last: Angus
Title: Bootstrapping a Universal Pivot When Nuisance Parameters are Estimated
Abstract:
In complete samples from a continuous cumulative distribution with unknown
parameters, it is known that various pivotal functions can be constructed
by appealing to the probability integral transform. A pivotal function (or
simply pivot) is a function of the data and parameters that has the
property that its distribution is free of any unknown parameters. Pivotal
functions play a key role in constructing confidence intervals and
hypothesis tests. If there are nuisance parameters in addition to a
parameter of interest, and consistent estimators of the nuisance
parameters are available, then substituting them into the pivot can
preserve the pivot property while altering the pivot distribution, or may
instead create a function that is approximately a pivot in the sense that
its asymptotic distribution is free of unknown parameters. In this latter
case, bootstrapping has been shown to be an effective way of estimating
its distribution accurately and constructing confidence intervals that
have more accurate coverage probability in finite samples than those based
on the asymptotic pivot distribution. In this article, one particular
pivotal function based on the probability integral transform is considered
when nuisance parameters are estimated, and the estimation of its
distribution using parametric bootstrapping is examined. Applications to
finding confidence intervals are emphasized. This material should be of
interest to instructors of upper division and beginning graduate courses
in mathematical statistics who wish to integrate bootstrapping into their
lessons on interval estimation and the use of pivotal functions.[Received
November 2014. Revised August 2015.]
Journal: The American Statistician
Pages: 100-107
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1086436
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086436
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:100-107
Template-Type: ReDIF-Article 1.0
Author-Name: Tal Galili
Author-X-Name-First: Tal
Author-X-Name-Last: Galili
Author-Name: Isaac Meilijson
Author-X-Name-First: Isaac
Author-X-Name-Last: Meilijson
Title: An Example of an Improvable Rao--Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator
Abstract:
The Rao--Blackwell theorem offers a procedure for converting a crude
unbiased estimator of a parameter θ into a “better”
one, in fact unique and optimal if the improvement is based on a minimal
sufficient statistic that is complete. In contrast, behind every minimal
sufficient statistic that is not complete, there is an improvable
Rao--Blackwell improvement. This is illustrated via a simple example based
on the uniform distribution, in which a rather natural Rao--Blackwell
improvement is uniformly improvable. Furthermore, in this example the
maximum likelihood estimator is inefficient, and an unbiased generalized
Bayes estimator performs exceptionally well. Counterexamples of this sort
can be useful didactic tools for explaining the true nature of a
methodology and possible consequences when some of the assumptions are
violated.[Received December 2014. Revised September 2015.]
Journal: The American Statistician
Pages: 108-113
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1100683
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1100683
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:108-113
Template-Type: ReDIF-Article 1.0
Author-Name: Iain L. MacDonald
Author-X-Name-First: Iain L.
Author-X-Name-Last: MacDonald
Author-Name: Brendon M. Lapham
Author-X-Name-First: Brendon M.
Author-X-Name-Last: Lapham
Title: Even More Direct Calculation of the Variance of a Maximum Penalized-Likelihood Estimator
Abstract:
We discuss here two examples of estimation by numerical maximization of
penalized likelihood. We show that, in these examples, it is simpler not
to use the EM algorithm for computation of the estimates or their standard
errors. We discuss also confidence and credibility intervals based on
penalized likelihood and a chi-squared approximate distribution, and
compare such intervals with intervals of Wald type.[Received July 2014.
Revised September 2015.]
Journal: The American Statistician
Pages: 114-118
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1105151
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105151
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:114-118
Template-Type: ReDIF-Article 1.0
Author-Name: Philip B. Stark
Author-X-Name-First: Philip B.
Author-X-Name-Last: Stark
Title: Privacy, Big Data, and the Public Good: Frameworks for Engagement
Journal: The American Statistician
Pages: 119-119
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1068625
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1068625
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:119-119
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen M. Stigler
Author-X-Name-First: Stephen M.
Author-X-Name-Last: Stigler
Title: Letter to the Editor
Journal: The American Statistician
Pages: 127-127
Issue: 1
Volume: 70
Year: 2016
Month: 2
X-DOI: 10.1080/00031305.2015.1105758
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105758
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:1:p:127-127
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald L. Wasserstein
Author-X-Name-First: Ronald L.
Author-X-Name-Last: Wasserstein
Author-Name: Nicole A. Lazar
Author-X-Name-First: Nicole A.
Author-X-Name-Last: Lazar
Title: The ASA's Statement on p-Values: Context, Process, and Purpose
Journal: The American Statistician
Pages: 129-133
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2016.1154108
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1154108
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:129-133
Template-Type: ReDIF-Article 1.0
Author-Name: Hossein Hoshyarmanesh
Author-X-Name-First: Hossein
Author-X-Name-Last: Hoshyarmanesh
Author-Name: Amirhossein Karami
Author-X-Name-First: Amirhossein
Author-X-Name-Last: Karami
Author-Name: Adel Mohammadpour
Author-X-Name-First: Adel
Author-X-Name-Last: Mohammadpour
Title: Confidence Intervals for the Scale Parameter of Exponential Family of Distributions
Abstract:
This article presents a unified approach for computing nonequal tail
optimal confidence intervals (CIs) for the scale parameter of the
exponential family of distributions. We prove that there exists a pivotal
quantity, as a function of a complete sufficient statistic, with a
chi-square distribution. Using the similarity between equations of
shortest, unbiased, and highest density CIs, all equations are reduced
into a system of two equations that can be solved via a straightforward
algorithm.
Journal: The American Statistician
Pages: 134-137
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1123184
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123184
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:134-137
Template-Type: ReDIF-Article 1.0
Author-Name: Pere Grima
Author-X-Name-First: Pere
Author-X-Name-Last: Grima
Author-Name: Lourdes Rodero
Author-X-Name-First: Lourdes
Author-X-Name-Last: Rodero
Author-Name: Xavier Tort-Martorell
Author-X-Name-First: Xavier
Author-X-Name-Last: Tort-Martorell
Title: Explaining the Importance of Variability to Engineering Students
Abstract:
One of the main challenges of teaching statistics to engineering students
is to convey the importance of being conscious of the presence of
variability and of taking it into account when making technical and
managerial decisions. Often, technical subjects are explained in an ideal
and deterministic environment. This article shows the possibilities of
simple electrical circuits—the Wheatstone Bridge among
them—to explain to students how to characterize variability, how it
is transmitted, and how it affects decisions. Additionally, they can be
used to introduce the importance of robustness by showing that taking into
account the variability of components allows the design of cheaper
products with greater benefits than if one were to simply apply formulas
that consider variables as exact values. The results are quite unexpected,
and they arouse the interest and motivation of students. Supplementary
materials for this article are available online.
Journal: The American Statistician
Pages: 138-142
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1064478
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1064478
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:138-142
Template-Type: ReDIF-Article 1.0
Author-Name: Hakan Demirtas
Author-X-Name-First: Hakan
Author-X-Name-Last: Demirtas
Title: A Note on the Relationship Between the Phi Coefficient and the Tetrachoric Correlation Under Nonnormal Underlying Distributions
Abstract:
The connection between the phi coefficient and the tetrachoric correlation
is well-understood when the underlying distribution is bivariate normal.
For many other bivariate distributions, the identity that links these two
quantities together is not straightforward to formulate. Furthermore, even
when this can be done, solving the equation in either direction may be far
from trivial. We propose a simple technique that enables students and
researchers to compute one of these correlations when the other is
specified. Generalizing the normal-based results to a broad range of
bivariate distributional setups is potentially useful in graduate-level
teaching as well as in simulation studies that involve dichotomization and
random number generation where the relationships between these correlation
types need to be modeled.
Journal: The American Statistician
Pages: 143-148
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1077161
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077161
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:143-148
Template-Type: ReDIF-Article 1.0
Author-Name: Henry S. Lynn
Author-X-Name-First: Henry S.
Author-X-Name-Last: Lynn
Title: Training the Next Generation of Statisticians: From Head to Heart
Abstract:
A holistic view of training is advocated where educators focus not only on
the competence but also on the character of future statisticians. The
issues related to developing passion, formulating philosophy, and building
moral personhood are discussed. The vision is to foster a generation of
statisticians who are both well-equipped problem-solvers in specific
scientific areas and compassionate reformers to the general society.
Journal: The American Statistician
Pages: 149-151
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1123186
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123186
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:149-151
Template-Type: ReDIF-Article 1.0
Author-Name: Michael D. Porter
Author-X-Name-First: Michael D.
Author-X-Name-Last: Porter
Title: A Statistical Approach to Crime Linkage
Abstract:
The object of this article is to develop a statistical approach to
criminal linkage analysis that discovers and groups crime events that
share a common offender and prioritizes suspects for further
investigation. Bayes factors are used to describe the strength of evidence
that two crimes are linked. Using concepts from agglomerative hierarchical
clustering, the Bayes factors for crime pairs are combined to provide
similarity measures for comparing two crime series. This facilitates crime
series clustering, crime series identification, and suspect
prioritization. The ability of our models to make correct linkages and
predictions is demonstrated under a variety of real-world scenarios with a
large number of solved and unsolved breaking and entering crimes. For
example, a naive Bayes model for pairwise case linkage can identify 82% of
actual linkages with a 5% false positive rate. For crime series
identification, 74%–89% of the additional crimes in a crime series can be
identified from a ranked list of 50 incidents.
Journal: The American Statistician
Pages: 152-165
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1123185
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123185
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:152-165
Template-Type: ReDIF-Article 1.0
Author-Name: Martin L. Lesser
Author-X-Name-First: Martin L.
Author-X-Name-Last: Lesser
Author-Name: Meredith B. Akerman
Author-X-Name-First: Meredith B.
Author-X-Name-Last: Akerman
Author-Name: Nina Kohn
Author-X-Name-First: Nina
Author-X-Name-Last: Kohn
Title: Analogies for Helping Clinicians and Investigators Better Understand the Principles and Practice of Biostatistics
Abstract:
For the interaction between the biostatistician and the clinician or
research investigator to be successful, it is important not only for the
investigator to be able to explain biological and medical principles in a
way that can be understood by the biostatistician, but also for the
biostatistician to have tools to help the investigator understand both the
practice of statistics and specific statistical methods. In our practice,
we have found it useful to draw analogies between statistical concepts and
familiar medical or everyday ideas. These analogies help to stress a point
or provide an understanding on the part of the investigator. For example,
explaining the reason for using a nonparametric procedure (a general
procedure used when the underlying distribution of the data is not known
or cannot be assumed) by comparing it to using broad spectrum antibiotics
(a general antibiotic used when the specific bacteria causing infection is
unknown or cannot be assumed) can be an effective teaching tool. We
present a variety of useful (and hopefully amusing) analogies that can be
adopted by statisticians to help investigators at all levels of experience
better understand principles and practice of statistics.
Journal: The American Statistician
Pages: 166-170
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1073625
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1073625
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:166-170
Template-Type: ReDIF-Article 1.0
Author-Name: Eloísa Díaz-Francés
Author-X-Name-First: Eloísa
Author-X-Name-Last: Díaz-Francés
Title: Simple Estimation Intervals for Poisson, Exponential, and Inverse Gaussian Means Obtained by Symmetrizing the Likelihood Function
Abstract:
Likelihood intervals for the Poisson, exponential, and inverse Gaussian
means that have simple analytically closed expressions and good coverage
frequencies for any sample size are given here explicitly. Their
simplicity is striking and they should be more broadly used in
applications everywhere. Their soundness is due to three statistical
properties that these three distributions share as well as the fact that
for all of them there exists a simple power reparameterization that
symmetrizes the corresponding likelihood function. As a consequence,
asymptotic maximum likelihood results are applicable even for samples of
size one. Likelihood intervals of the new parameter may be easily
transformed back to the original parameter of interest, the mean, by the
invariance property of the likelihood function. Practical examples are
given to illustrate the proposed inferential procedures.
Journal: The American Statistician
Pages: 171-180
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1123187
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123187
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:171-180
Template-Type: ReDIF-Article 1.0
Author-Name: Thaddeus Tarpey
Author-X-Name-First: Thaddeus
Author-X-Name-Last: Tarpey
Author-Name: R. Todd Ogden
Author-X-Name-First: R. Todd
Author-X-Name-Last: Ogden
Title: Statistical Modeling to Inform Optimal Game Strategy: Markov Plays H-O-R-S-E
Abstract:
We illustrate practical uses of logistic regression and Markov chains by
applying these concepts to the problem of developing optimal strategy in
the popular basketball game of H-O-R-S-E. Based on data collected by the
authors, we estimate model parameters for each author, describe strategies
of optimizing each author’s probability of winning, and calculate
the stationary distribution of a Markov chain that arises from the game.
Journal: The American Statistician
Pages: 181-186
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2016.1148629
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148629
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:181-186
Template-Type: ReDIF-Article 1.0
Author-Name: Susan M. Perkins
Author-X-Name-First: Susan M.
Author-X-Name-Last: Perkins
Author-Name: Peter Bacchetti
Author-X-Name-First: Peter
Author-X-Name-Last: Bacchetti
Author-Name: Cynthia S. Davey
Author-X-Name-First: Cynthia S.
Author-X-Name-Last: Davey
Author-Name: Christopher J. Lindsell
Author-X-Name-First: Christopher J.
Author-X-Name-Last: Lindsell
Author-Name: Madhu Mazumdar
Author-X-Name-First: Madhu
Author-X-Name-Last: Mazumdar
Author-Name: Robert A. Oster
Author-X-Name-First: Robert A.
Author-X-Name-Last: Oster
Author-Name: Peter N. Peduzzi
Author-X-Name-First: Peter N.
Author-X-Name-Last: Peduzzi
Author-Name: David M. Rocke
Author-X-Name-First: David M.
Author-X-Name-Last: Rocke
Author-Name: Kyle D. Rudser
Author-X-Name-First: Kyle D.
Author-X-Name-Last: Rudser
Author-Name: Mimi Kim
Author-X-Name-First: Mimi
Author-X-Name-Last: Kim
Title: Best Practices for Biostatistical Consultation and Collaboration in Academic Health Centers
Abstract:
Given the increasing level and scope of biostatistics expertise needed at
academic health centers today, we developed best practices guidelines for
biostatistics units to be more effective in providing biostatistical
support to their institutions, and in fostering an environment in which
unit members can thrive professionally. Our recommendations focus on the
key areas of: (1) funding sources and mechanisms; (2) providing and
prioritizing access to biostatistical resources; and (3) interacting with
investigators. We recommend that the leadership of biostatistics units
negotiate for sufficient long-term infrastructure support to ensure
stability and continuity of funding for personnel, align project budgets
closely with actual level of biostatistical effort, devise and
consistently apply strategies for prioritizing and tracking effort on
studies, and clearly stipulate with investigators prior to project
initiation policies regarding funding, lead time, and authorship.
Journal: The American Statistician
Pages: 187-194
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1077727
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077727
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:187-194
Template-Type: ReDIF-Article 1.0
Author-Name: Min Wang
Author-X-Name-First: Min
Author-X-Name-Last: Wang
Author-Name: Guangying Liu
Author-X-Name-First: Guangying
Author-X-Name-Last: Liu
Title: A Simple Two-Sample Bayesian t-Test for Hypothesis Testing
Abstract:
In this article, we propose an explicit closed-form Bayes factor for the
problem of two-sample hypothesis testing. The proposed approach can be
regarded as a Bayesian version of the pooled-variance
t-statistic and has various appealing properties in
practical applications. It relies on data only through the
t-statistic and can thus be calculated by using an Excel
spreadsheet or a pocket calculator. It avoids several undesirable
paradoxes, which may be encountered by the previous Bayesian approach in
the literature. Specifically, the proposed approach can be easily taught
in an introductory statistics course with an emphasis on Bayesian
thinking. Simulated and real data examples are provided for illustrative
purposes.
Journal: The American Statistician
Pages: 195-201
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1093027
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1093027
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:195-201
Template-Type: ReDIF-Article 1.0
Author-Name: Adam Loy
Author-X-Name-First: Adam
Author-X-Name-Last: Loy
Author-Name: Lendie Follett
Author-X-Name-First: Lendie
Author-X-Name-Last: Follett
Author-Name: Heike Hofmann
Author-X-Name-First: Heike
Author-X-Name-Last: Hofmann
Title: Variations of Q–Q Plots: The Power of Our Eyes!
Abstract:
In statistical modeling, we strive to specify models that resemble data
collected in studies or observed from processes. Consequently,
distributional specification and parameter estimation are central to
parametric models. Graphical procedures, such as the quantile–quantile
(Q–Q) plot, are arguably the most widely used method of distributional
assessment, though critics find their interpretation to be overly
subjective. Formal goodness-of-fit tests are available and are quite
powerful, but only indicate whether there is a lack of fit, not why there
is a lack of fit. In this article, we explore the use of the lineup
protocol to inject rigor into graphical distributional assessment and
compare its power to that of formal distributional tests. We find that
lineup tests are considerably more powerful than traditional tests of
normality. A further investigation into the design of Q–Q plots shows
that de-trended Q–Q plots are more powerful than the standard approach
as long as the plot preserves distances in x and y on the same scale.
While we focus on diagnosing nonnormality, our approach is general and
can be directly extended to the assessment of other distributions.
Journal: The American Statistician
Pages: 202-214
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1077728
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1077728
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:202-214
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher S. Pentoney
Author-X-Name-First: Christopher S.
Author-X-Name-Last: Pentoney
Author-Name: Dale E. Berger
Author-X-Name-First: Dale E.
Author-X-Name-Last: Berger
Title: Confidence Intervals and the Within-the-Bar Bias
Abstract:
Bar graphs displaying means have been shown to bias interpretations of the
underlying distributions: viewers typically report higher likelihoods for
values within a bar than outside of a bar. One explanation is that viewer
attention is driven by the whole bar, rather than only the edge that
provides information about an average. This study explored several
approaches to correcting this bias. Bar graphs with 95% confidence
intervals were used with different levels of contrast to manipulate
attention directed to the bar. Viewers showed less bias when the salience
of the bar itself was reduced. Response latencies were lowest and bias was
eliminated when participants were presented with only a confidence
interval and no bar.
Journal: The American Statistician
Pages: 215-220
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2016.1141706
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141706
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:215-220
Template-Type: ReDIF-Article 1.0
Author-Name: Saralees Nadarajah
Author-X-Name-First: Saralees
Author-X-Name-Last: Nadarajah
Title: Letter to the Editor
Journal: The American Statistician
Pages: 224-224
Issue: 2
Volume: 70
Year: 2016
Month: 5
X-DOI: 10.1080/00031305.2015.1086438
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1086438
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:2:p:224-224
Template-Type: ReDIF-Article 1.0
Author-Name: Bartolomeo Stellato
Author-X-Name-First: Bartolomeo
Author-X-Name-Last: Stellato
Author-Name: Bart P. G. Van Parys
Author-X-Name-First: Bart P. G.
Author-X-Name-Last: Van Parys
Author-Name: Paul J. Goulart
Author-X-Name-First: Paul J.
Author-X-Name-Last: Goulart
Title: Multivariate Chebyshev Inequality With Estimated Mean and Variance
Abstract:
A variant of the well-known Chebyshev inequality for scalar random variables can be formulated in the case where the mean and variance are estimated from samples. In this article, we present a generalization of this result to multiple dimensions where the only requirement is that the samples are independent and identically distributed. Furthermore, we show that as the number of samples tends to infinity our inequality converges to the theoretical multi-dimensional Chebyshev bound.
Journal: The American Statistician
Pages: 123-127
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1186559
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1186559
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:123-127
Template-Type: ReDIF-Article 1.0
Author-Name: Robert A. Stine
Author-X-Name-First: Robert A.
Author-X-Name-Last: Stine
Title: Explaining Normal Quantile-Quantile Plots Through Animation: The Water-Filling Analogy
Abstract:
A normal quantile-quantile (QQ) plot is an important diagnostic for checking the assumption of normality. Though useful, these plots confuse students in my introductory statistics classes. A water-filling analogy, however, intuitively conveys the underlying concept. This analogy characterizes a QQ plot as a parametric plot of the water levels in two gradually filling vases. Each vase takes its shape from a probability distribution or sample. If the vases share a common shape, then the water levels match throughout the filling, and the QQ plot traces a diagonal line. An R package qqvases provides an interactive animation of this process and is suitable for classroom use.
Journal: The American Statistician
Pages: 145-147
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1200488
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200488
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:145-147
Template-Type: ReDIF-Article 1.0
Author-Name: Hillel Bar-Gera
Author-X-Name-First: Hillel
Author-X-Name-Last: Bar-Gera
Title: The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments
Abstract:
R-squared (R^2) and adjusted R-squared (R^2_Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators for the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined, in a similar fashion to the construction of R^2, but relying on the true parameters rather than their estimates. (The parameter definition also includes the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ^2_*. The proposed ρ^2_* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R^2, which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R^2_Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R^2 overestimates ρ^2_*, while the traditional R^2_Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R^2_Adj can be as high as the bias of the unadjusted R^2 (while their signs are opposite). Asymptotic convergence in probability of R^2_Adj to ρ^2_* is demonstrated. The effects of model parameters on the bias of R^2 and R^2_Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
Journal: The American Statistician
Pages: 112-119
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1200489
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200489
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:112-119
Template-Type: ReDIF-Article 1.0
Author-Name: Yudi Pawitan
Author-X-Name-First: Yudi
Author-X-Name-Last: Pawitan
Author-Name: Youngjo Lee
Author-X-Name-First: Youngjo
Author-X-Name-Last: Lee
Title: Wallet Game: Probability, Likelihood, and Extended Likelihood
Abstract:
We propose a likelihood explanation of the two-person wallet game, a probability-related paradox in which an obviously fair game may appear favorable to both players. Yet a small variation of the game, without changing its fairness, makes it seem unfavorable. The extended likelihood concept seems logically necessary if we want to allow the sense of uncertainty associated with a realized but still unobserved random outcome, while at the same time avoiding potential probability-related paradoxes.
Journal: The American Statistician
Pages: 120-122
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1202140
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1202140
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:120-122
Template-Type: ReDIF-Article 1.0
Author-Name: Ryan Martin
Author-X-Name-First: Ryan
Author-X-Name-Last: Martin
Title: A Statistical Inference Course Based on p-Values
Abstract:
Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 128-136
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1208629
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1208629
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:128-136
Template-Type: ReDIF-Article 1.0
Author-Name: Shaobo Jin
Author-X-Name-First: Shaobo
Author-X-Name-Last: Jin
Author-Name: Måns Thulin
Author-X-Name-First: Måns
Author-X-Name-Last: Thulin
Author-Name: Rolf Larsson
Author-X-Name-First: Rolf
Author-X-Name-Last: Larsson
Title: Approximate Bayesianity of Frequentist Confidence Intervals for a Binomial Proportion
Abstract:
The well-known Wilson and Agresti–Coull confidence intervals for a binomial proportion p are centered around a Bayesian estimator. Using this as a starting point, similarities between frequentist confidence intervals for proportions and Bayesian credible intervals based on low-informative priors are studied using asymptotic expansions. A Bayesian motivation for a large class of frequentist confidence intervals is provided. It is shown that the likelihood ratio interval for p approximates a Bayesian credible interval based on Kerman’s neutral noninformative conjugate prior up to O(n^−1) in the confidence bounds. For significance levels α ≲ 0.317, the Bayesian interval based on the Jeffreys prior is then shown to be a compromise between the likelihood ratio and Wilson intervals. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 106-111
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1208630
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1208630
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:106-111
Template-Type: ReDIF-Article 1.0
Author-Name: Zhi-Sheng Ye
Author-X-Name-First: Zhi-Sheng
Author-X-Name-Last: Ye
Author-Name: Nan Chen
Author-X-Name-First: Nan
Author-X-Name-Last: Chen
Title: Closed-Form Estimators for the Gamma Distribution Derived From Likelihood Equations
Abstract:
It is well-known that maximum likelihood (ML) estimators of the two parameters in a gamma distribution do not have closed forms. This poses difficulties in some applications such as real-time signal processing using low-grade processors. The gamma distribution is a special case of a generalized gamma distribution. Surprisingly, two out of the three likelihood equations of the generalized gamma distribution can be used as estimating equations for the gamma distribution, based on which simple closed-form estimators for the two gamma parameters are available. Intuitively, performance of the new estimators based on likelihood equations should be close to the ML estimators. The study consolidates this conjecture by establishing the asymptotic behaviors of the new estimators. In addition, the closed-forms enable bias-corrections to these estimators. The bias-correction significantly improves the small-sample performance.
Journal: The American Statistician
Pages: 177-181
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1209129
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1209129
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:177-181
Template-Type: ReDIF-Article 1.0
Author-Name: Brian Knaeble
Author-X-Name-First: Brian
Author-X-Name-Last: Knaeble
Author-Name: Seth Dutter
Author-X-Name-First: Seth
Author-X-Name-Last: Dutter
Title: Reversals of Least-Square Estimates and Model-Invariant Estimation for Directions of Unique Effects
Abstract:
When a linear model is adjusted to control for additional explanatory variables, the sign of a fitted coefficient may reverse. Here, these reversals are studied using coefficients of determination. The resulting theory can be used to determine directions of unique effects in the presence of model uncertainty. This process is called model-invariant estimation when the estimates are invariant across changes to the model structure. When a single covariate is added, the reversal region can be understood geometrically as an elliptical cone of two nappes with an axis of symmetry relating to a best-possible condition for a reversal using a single coefficient of determination. When a set of covariates are added to a model with a single explanatory variable, model-invariant estimation can be implemented using subject matter knowledge. More general theory with partial coefficients is applicable to analysis of large datasets. Applications are demonstrated with dietary health data from the United Nations.
Journal: The American Statistician
Pages: 97-105
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1226951
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1226951
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:97-105
Template-Type: ReDIF-Article 1.0
Author-Name: Jo A. Wick
Author-X-Name-First: Jo A.
Author-X-Name-Last: Wick
Author-Name: Hung-Wen Yeh
Author-X-Name-First: Hung-Wen
Author-X-Name-Last: Yeh
Author-Name: Byron J. Gajewski
Author-X-Name-First: Byron J.
Author-X-Name-Last: Gajewski
Title: A Bayesian Analysis of Synchronous Distance Learning versus Matched Traditional Control in Graduate Biostatistics Courses
Abstract:
Distance learning can be useful for bridging geographical barriers to education in rural settings. However, empirical evidence on the equivalence of distance education and traditional face-to-face (F2F) instruction in statistics and biostatistics is mixed. Despite the difficulty in randomization, we minimized intra-instructor variation between F2F and online sections in seven graduate-level biostatistics service courses in a synchronous (live, real time) fashion; that is, for each course taught in a traditional F2F setting, a separate set of students were taught simultaneously via online learning technology, allowing for two-way interaction between instructor and students. Our primary objective was to compare student performance in the two courses that use these two teaching modes. We used a Bayesian hierarchical model to test equivalence of modes. The frequentist mixed model approach was also conducted for reference. The results of Bayesian and frequentist methods agree and suggest a difference of less than 1% in average final grades. Finally, we discuss barriers to instruction and learning using the applied online teaching technology.
Journal: The American Statistician
Pages: 137-144
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1247014
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1247014
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:137-144
Template-Type: ReDIF-Article 1.0
Author-Name: Philippa Swartz
Author-X-Name-First: Philippa
Author-X-Name-Last: Swartz
Author-Name: Mike Grosskopf
Author-X-Name-First: Mike
Author-X-Name-Last: Grosskopf
Author-Name: Derek Bingham
Author-X-Name-First: Derek
Author-X-Name-Last: Bingham
Author-Name: Tim B. Swartz
Author-X-Name-First: Tim B.
Author-X-Name-Last: Swartz
Title: The Quality of Pitches in Major League Baseball
Abstract:
This article considers the quality of pitches in Major League Baseball (MLB). Based on approximately 2.2 million pitches taken from the 2013, 2014, and 2015 MLB seasons, the quality of a particular pitch is evaluated as the expected number of bases conceded. Quality is expressed as a function of various covariates including pitch count, pitch location, pitch type, and pitch speed. The estimation of the pitch quality is obtained through the use of random forest methodology to accommodate the inherent complexity of the relationship between pitch quality and the associated covariates. With the fitted model, various applications are considered which provide new insights on pitching and batting.
Journal: The American Statistician
Pages: 148-154
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1264313
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264313
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:148-154
Template-Type: ReDIF-Article 1.0
Author-Name: Olanrewaju Akande
Author-X-Name-First: Olanrewaju
Author-X-Name-Last: Akande
Author-Name: Fan Li
Author-X-Name-First: Fan
Author-X-Name-Last: Li
Author-Name: Jerome Reiter
Author-X-Name-First: Jerome
Author-X-Name-Last: Reiter
Title: An Empirical Comparison of Multiple Imputation Methods for Categorical Data
Abstract:
Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.
Journal: The American Statistician
Pages: 162-170
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1277158
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277158
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:162-170
Template-Type: ReDIF-Article 1.0
Author-Name: Amy L. Phelps
Author-X-Name-First: Amy L.
Author-X-Name-Last: Phelps
Author-Name: Kathryn A. Szabat
Author-X-Name-First: Kathryn A.
Author-X-Name-Last: Szabat
Title: The Current Landscape of Teaching Analytics to Business Students at Institutions of Higher Education: Who is Teaching What?
Abstract:
Business analytics continues to become increasingly important in business and therefore in business education. We surveyed faculty who teach statistics or whose institutions offer statistics to business students and conducted web searches of business analytics and data science programs that are offered by these faculties associated with schools of business. The intent of the survey and web searches was to gain insight on the current landscape of business analytics and how it may work synergistically with data science at institutions of higher education, as well as inform the role that statistics education plays in the era of big data. The study presents an analysis of subject areas (Statistics, Operations Research, Management Information Systems, Data Analytics, and Soft Skills) covered in courses offered by institutions with undergraduate degrees in business analytics or data science influencing statistics taught to business students. Given the notable contribution of statistics to the study of business analytics and data science and the importance of knowledge and skills acquired in statistics-based courses not only for students pursuing a major or minor in the discipline, but also for all business majors entering the current data-centric business environment, we present findings about who is teaching what in business statistics education.
Journal: The American Statistician
Pages: 155-161
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2016.1277160
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277160
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:155-161
Template-Type: ReDIF-Article 1.0
Author-Name: Michael P. Cohen
Author-X-Name-First: Michael P.
Author-X-Name-Last: Cohen
Title: Non-Asymptotic Mean and Variance Also Approximately Satisfy Taylor's Law
Journal: The American Statistician
Pages: 187-187
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2017.1286261
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1286261
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:187-187
Template-Type: ReDIF-Article 1.0
Author-Name: Iain L. MacDonald
Author-X-Name-First: Iain L.
Author-X-Name-Last: MacDonald
Title: Models for count data
Journal: The American Statistician
Pages: 187-190
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2017.1291449
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1291449
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:187-190
Template-Type: ReDIF-Article 1.0
Author-Name: Stuart R. Lipsitz
Author-X-Name-First: Stuart R.
Author-X-Name-Last: Lipsitz
Author-Name: Garrett M. Fitzmaurice
Author-X-Name-First: Garrett M.
Author-X-Name-Last: Fitzmaurice
Author-Name: Debajyoti Sinha
Author-X-Name-First: Debajyoti
Author-X-Name-Last: Sinha
Author-Name: Nathanael Hevelone
Author-X-Name-First: Nathanael
Author-X-Name-Last: Hevelone
Author-Name: Edward Giovannucci
Author-X-Name-First: Edward
Author-X-Name-Last: Giovannucci
Author-Name: Quoc-Dien Trinh
Author-X-Name-First: Quoc-Dien
Author-X-Name-Last: Trinh
Author-Name: Jim C. Hu
Author-X-Name-First: Jim C.
Author-X-Name-Last: Hu
Title: Efficient Computation of Reduced Regression Models
Abstract:
We consider settings where it is of interest to fit and assess regression submodels that arise as various explanatory variables are excluded from a larger regression model. The larger model is referred to as the full model; the submodels are the reduced models. We show that a computationally efficient approximation to the regression estimates under any reduced model can be obtained from a simple weighted least squares (WLS) approach based on the estimated regression parameters and covariance matrix from the full model. This WLS approach can be considered an extension to unbiased estimating equations of a first-order Taylor series approach proposed by Lawless and Singhal. Using data from the 2010 Nationwide Inpatient Sample (NIS), a 20% weighted, stratified, cluster sample of approximately 8 million hospital stays from approximately 1000 hospitals, we illustrate the WLS approach when fitting interval censored regression models to estimate the effect of type of surgery (robotic versus nonrobotic surgery) on hospital length-of-stay while adjusting for three sets of covariates: patient-level characteristics, hospital characteristics, and zip-code level characteristics. Ordinarily, standard fitting of the reduced models to the NIS data takes approximately 10 hours; using the proposed WLS approach, the reduced models take seconds to fit.
Journal: The American Statistician
Pages: 171-176
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2017.1296375
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1296375
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:171-176
Template-Type: ReDIF-Article 1.0
Author-Name: Kimberly F. Sellers
Author-X-Name-First: Kimberly F.
Author-X-Name-Last: Sellers
Author-Name: Darcy S. Morris
Author-X-Name-First: Darcy S.
Author-X-Name-Last: Morris
Author-Name: Galit Shmueli
Author-X-Name-First: Galit
Author-X-Name-Last: Shmueli
Author-Name: Li Zhu
Author-X-Name-First: Li
Author-X-Name-Last: Zhu
Title: Reply
Journal: The American Statistician
Pages: 190-190
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2017.1296738
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1296738
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:190-190
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 182-186
Issue: 2
Volume: 71
Year: 2017
Month: 4
X-DOI: 10.1080/00031305.2017.1325631
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1325631
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:2:p:182-186
Template-Type: ReDIF-Article 1.0
Author-Name: Victor Fossaluza
Author-X-Name-First: Victor
Author-X-Name-Last: Fossaluza
Author-Name: Rafael Izbicki
Author-X-Name-First: Rafael
Author-X-Name-Last: Izbicki
Author-Name: Gustavo Miranda da Silva
Author-X-Name-First: Gustavo Miranda
Author-X-Name-Last: da Silva
Author-Name: Luís Gustavo Esteves
Author-X-Name-First: Luís Gustavo
Author-X-Name-Last: Esteves
Title: Coherent Hypothesis Testing
Abstract:
Multiple hypothesis testing, an important quantitative tool to report the results of scientific inquiries, frequently leads to contradictory conclusions. For instance, in an analysis of variance (ANOVA) setting, the same dataset can lead one to reject the equality of two means, say μ1 = μ2, but at the same time to not reject the hypothesis that μ1 = μ2 = 0. These two conclusions violate the coherence principle introduced by Gabriel in 1969, and lead to results that are difficult to communicate, and, many times, embarrassing for practitioners of statistical methods. Although this situation is common in the daily life of statisticians, it is usually not discussed in courses of statistics. In this work, we enrich the teaching and discussion of this important topic by investigating through a few examples whether several standard test procedures are coherent or not. We also discuss the relationship between coherent tests and measures of support. Finally, we show how a Bayesian decision-theoretical framework can be used to build coherent tests. These approaches to coherence enlighten when such property is appealing in multiple testing and provide means of obtaining it.
Journal: The American Statistician
Pages: 242-248
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1237893
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1237893
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:242-248
Template-Type: ReDIF-Article 1.0
Author-Name: Panagiotis (Panos) Toulis
Author-X-Name-First: Panagiotis (Panos)
Author-X-Name-Last: Toulis
Title: A Useful Pivotal Quantity
Abstract:
Consider n continuous random variables with joint density f that possibly depends on unknown parameters θ. If the negative of the logarithm of f is a positively homogeneous function of degree p taking only positive values, then that function is distributed as a Gamma random variable with shape n/p and scale 2, and thus it is a pivotal quantity for θ. This provides a general method to construct pivotal quantities, which are widely applicable in statistical practice, such as hypothesis testing and confidence intervals. Here, we prove the aforementioned result and illustrate through examples.
Journal: The American Statistician
Pages: 272-274
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1237894
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1237894
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:272-274
Template-Type: ReDIF-Article 1.0
Author-Name: Chunpeng Fan
Author-X-Name-First: Chunpeng
Author-X-Name-Last: Fan
Author-Name: Lin Wang
Author-X-Name-First: Lin
Author-X-Name-Last: Wang
Author-Name: Lynn Wei
Author-X-Name-First: Lynn
Author-X-Name-Last: Wei
Title: Comparing Two Tests for Two Rates
Abstract:
This article rigorously proves superiority of the proportion χ2 test to the logistic regression Wald test in terms of power when comparing two rates, despite their asymptotic equivalence under the null hypothesis that the two rates are equal.
Journal: The American Statistician
Pages: 275-281
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1246263
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1246263
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:275-281
Template-Type: ReDIF-Article 1.0
Author-Name: Roger W. Hoerl
Author-X-Name-First: Roger W.
Author-X-Name-Last: Hoerl
Author-Name: Ronald D. Snee
Author-X-Name-First: Ronald D.
Author-X-Name-Last: Snee
Title: Statistical Engineering: An Idea Whose Time Has Come?
Abstract:
Several authors, including the American Statistical Association (ASA) guidelines for undergraduate statistics education (American Statistical Association Undergraduate Guidelines Workgroup), have noted the challenges facing statisticians when attacking large, complex, and unstructured problems, as opposed to well-defined textbook problems. Clearly, the standard paradigm of selecting the one “correct” statistical method for such problems is not sufficient; a new paradigm is needed. Statistical engineering has been proposed as a discipline that can provide a viable paradigm to attack such problems, used in conjunction with sound statistical science. Of course, to develop as a true discipline, statistical engineering must be clearly defined and articulated. Further, a well-developed underlying theory is needed, one that would prove helpful in addressing such large, complex, and unstructured problems. The purpose of this expository article is to more clearly articulate the current state of statistical engineering, and make a case for why it merits further study by the profession as a means of addressing such problems. We conclude with a “call to action.”
Journal: The American Statistician
Pages: 209-219
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1247015
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1247015
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:209-219
Template-Type: ReDIF-Article 1.0
Author-Name: Wen-Han Hwang
Author-X-Name-First: Wen-Han
Author-X-Name-Last: Hwang
Author-Name: Richard Huggins
Author-X-Name-First: Richard
Author-X-Name-Last: Huggins
Author-Name: Lu-Fang Chen
Author-X-Name-First: Lu-Fang
Author-X-Name-Last: Chen
Title: A Note on the Inverse Birthday Problem With Applications
Abstract:
The classical birthday problem considers the probability that at least two people in a group of size N share the same birthday. The inverse birthday problem considers the estimation of the size N of a group given the number of different birthdays in the group. In practice, this problem is analogous to estimating the size of a population from occurrence data only. The inverse problem can be solved via two simple approaches including the method of moments for a multinomial model and the maximum likelihood estimate of a Poisson model, which we present in this study. We investigate properties of both methods and show that they can yield asymptotically equivalent Wald-type interval estimators. Moreover, we show that these methods estimate a lower bound for the population size when birth rates are nonhomogeneous or individuals in the population are aggregated. A simulation study was conducted to evaluate the performance of the point estimates arising from the two approaches and to compare the performance of seven interval estimators, including likelihood ratio and log-transformation methods. We illustrate the utility of these methods by estimating: (1) the abundance of tree species over a 50-hectare forest plot, (2) the number of Chlamydia infections when only the number of different birthdays of the patients is known, and (3) the number of rainy days when the number of rainy weeks is known. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 191-201
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1255657
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255657
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:191-201
Template-Type: ReDIF-Article 1.0
Author-Name: Jarod Y. L. Lee
Author-X-Name-First: Jarod Y. L.
Author-X-Name-Last: Lee
Author-Name: James J. Brown
Author-X-Name-First: James J.
Author-X-Name-Last: Brown
Author-Name: Louise M. Ryan
Author-X-Name-First: Louise M.
Author-X-Name-Last: Ryan
Title: Sufficiency Revisited: Rethinking Statistical Algorithms in the Big Data Era
Abstract:
The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, where results are combined over subanalyses performed in separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R; as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of the Australian unemployment data. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 202-208
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1255659
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255659
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:202-208
Template-Type: ReDIF-Article 1.0
Author-Name: William B. Fairley
Author-X-Name-First: William B.
Author-X-Name-Last: Fairley
Author-Name: Peter J. Kempthorne
Author-X-Name-First: Peter J.
Author-X-Name-Last: Kempthorne
Author-Name: Julie Novak
Author-X-Name-First: Julie
Author-X-Name-Last: Novak
Author-Name: Scott McGarvie
Author-X-Name-First: Scott
Author-X-Name-Last: McGarvie
Author-Name: Steve Crunk
Author-X-Name-First: Steve
Author-X-Name-Last: Crunk
Author-Name: Bee Leng Lee
Author-X-Name-First: Bee Leng
Author-X-Name-Last: Lee
Author-Name: Alan J. Salzberg
Author-X-Name-First: Alan J.
Author-X-Name-Last: Salzberg
Title: Resolving a Multi-Million Dollar Contract Dispute With a Latin Square
Abstract:
The City of New York negotiated a dispute over the performance of new garbage trucks purchased from a vehicle manufacturer. The dispute concerned the fulfillment of a specification in the purchase contract that the trucks load a minimum full-load of 12.5 tons of household refuse. On behalf of the City, but in cooperation with the manufacturer, the City's Department of Sanitation and consulting statisticians tested fulfillment of the contract specification, employing a Latin Square design for routing trucks. We present the classical analysis using a linear model and analysis of variance. We also show how fixed, mixed, and random effect models are useful in analyzing the results of the test. Finally, we take a Bayesian perspective to demonstrate how the information from the data overcomes the difference between the prior densities of the city and the manufacturer for the load capacities of the trucks to result in much closer posterior densities. This procedure might prove useful in similar negotiations. Supplementary material including the data and R code for computations in the article are available online.
Journal: The American Statistician
Pages: 249-258
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1256231
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1256231
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:249-258
Template-Type: ReDIF-Article 1.0
Author-Name: Heidi Spratt
Author-X-Name-First: Heidi
Author-X-Name-Last: Spratt
Author-Name: Erin E. Fox
Author-X-Name-First: Erin E.
Author-X-Name-Last: Fox
Author-Name: Nawar Shara
Author-X-Name-First: Nawar
Author-X-Name-Last: Shara
Author-Name: Madhu Mazumdar
Author-X-Name-First: Madhu
Author-X-Name-Last: Mazumdar
Title: Strategies for Success: Early-Stage Collaborating Biostatistics Faculty in an Academic Health Center
Abstract:
Collaborative biostatistics faculty (CBF) are increasingly valued by academic health centers (AHCs) for their role in increasing success rates of grants and publications, and educating medical students and clinical researchers. Some AHCs have a biostatistics department that consists of only biostatisticians focused on methodological research, collaborative research, and education. Others may have a biostatistics unit within an interdisciplinary department, or statisticians recruited into clinical departments. Within each model, there is also variability in environment, influenced by the chair's background, research focus of colleagues, type of students taught, funding sources, and whether the department is in a medical school or school of public health. CBF appointments may be tenure track or nontenure, and expectations for promotion may vary greatly depending on the type of department, track, and the AHC. In this article, the authors identify strategies for developing early-stage CBFs in four domains: (1) Influence of department/environment, (2) Skills to develop, (3) Ways to increase productivity, and (4) Ways to document accomplishments. Graduating students and postdoctoral fellows should consider the first domain when choosing a faculty position. Early-stage CBFs will benefit by understanding the requirements of their environment early in their appointment and by modifying the provided progression grid with their chair and mentoring team as needed. Following this personalized grid will increase the chances of a satisfying career with appropriate recognition for academic accomplishments.
Journal: The American Statistician
Pages: 220-230
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2016.1277157
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277157
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:220-230
Template-Type: ReDIF-Article 1.0
Author-Name: Tahir Ekin
Author-X-Name-First: Tahir
Author-X-Name-Last: Ekin
Author-Name: Francesca Ieva
Author-X-Name-First: Francesca
Author-X-Name-Last: Ieva
Author-Name: Fabrizio Ruggeri
Author-X-Name-First: Fabrizio
Author-X-Name-Last: Ruggeri
Author-Name: Refik Soyer
Author-X-Name-First: Refik
Author-X-Name-Last: Soyer
Title: On the Use of the Concentration Function in Medical Fraud Assessment
Abstract:
We propose a simple, but effective, tool to detect possible anomalies in the services prescribed by a health care provider (HP) compared to his/her colleagues in the same field and environment. Our method is based on the concentration function that is an extension of the Lorenz curve widely used in describing uneven distribution of wealth in a population. The proposed tool provides a graphical illustration of a possible anomalous behavior of the HPs and it can be used as a prescreening device for further investigations of potential medical fraud.
Journal: The American Statistician
Pages: 236-241
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2017.1292955
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1292955
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:236-241
Template-Type: ReDIF-Article 1.0
Author-Name: Jeff Witmer
Author-X-Name-First: Jeff
Author-X-Name-Last: Witmer
Title: Bayes and MCMC for Undergraduates
Abstract:
Students of statistics should be taught the ideas and methods that are widely used in practice and that will help them understand the world of statistics. Today, this means teaching them about Bayesian methods. In this article, I present ideas on teaching an undergraduate Bayesian course that uses Markov chain Monte Carlo and that can be a second course or, for strong students, a first course in statistics.
Journal: The American Statistician
Pages: 259-264
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2017.1305289
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305289
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:259-264
Template-Type: ReDIF-Article 1.0
Author-Name: Sitsofe Tsagbey
Author-X-Name-First: Sitsofe
Author-X-Name-Last: Tsagbey
Author-Name: Miguel de Carvalho
Author-X-Name-First: Miguel
Author-X-Name-Last: de Carvalho
Author-Name: Garritt L. Page
Author-X-Name-First: Garritt L.
Author-X-Name-Last: Page
Title: All Data are Wrong, but Some are Useful? Advocating the Need for Data Auditing
Abstract:
In a recent article from the Annals of Applied Statistics, Cox discussed the main phases of applied statistical research ranging from clarifying study objectives to final data analysis and interpreting results. As an incidental remark to these main phases, we advocate that beyond cleaning and preprocessing the data, it is a good practice to audit the data to determine if they can be trusted at all. A case study based on Ghanaian Official Fishery Statistics is used to illustrate this need, with Benford's law being the tool used to carry out the data audit. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 231-235
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2017.1311282
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1311282
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:231-235
Template-Type: ReDIF-Article 1.0
Author-Name: Subhash Bagui
Author-X-Name-First: Subhash
Author-X-Name-Last: Bagui
Author-Name: K. L. Mehra
Author-X-Name-First: K. L.
Author-X-Name-Last: Mehra
Title: Convergence of Known Distributions to Limiting Normal or Non-normal Distributions: An Elementary Ratio Technique
Abstract:
This article presents an elementary informal technique for deriving the convergence of known distributions to limiting normal or non-normal distributions. The presentation should be of interest to teachers and students of first year graduate level courses in probability and statistics.
Journal: The American Statistician
Pages: 265-271
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2017.1322001
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322001
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:265-271
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 282-289
Issue: 3
Volume: 71
Year: 2017
Month: 7
X-DOI: 10.1080/00031305.2017.1367180
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1367180
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:3:p:282-289
Template-Type: ReDIF-Article 1.0
Author-Name: Eric W. Gibson
Author-X-Name-First: Eric W.
Author-X-Name-Last: Gibson
Title: Leadership in Statistics: Increasing Our Value and Visibility
Abstract:
Scientists in every discipline are generating data more rapidly than ever before, resulting in an increasing need for statistical skills at a time when there is decreasing visibility for the field of statistics. Resolving this paradox requires stronger statistical leadership to guide multidisciplinary teams in the design and planning of scientific research and making decisions based on data. It requires more effective communication to nonstatisticians of the value of statistics in using data to answer questions, predict outcomes, and support decision-making in the face of uncertainty. It also requires a greater appreciation of the unique capabilities of alternative quantitative disciplines such as machine learning, data science, pharmacometrics, and bioinformatics which represent an opportunity for statisticians to achieve greater impact through collaborative partnership. Examples taken from pharmaceutical drug development are used to illustrate the concept of statistical leadership in a collaborative multidisciplinary team environment.
Journal: The American Statistician
Pages: 109-116
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1336484
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1336484
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:109-116
Template-Type: ReDIF-Article 1.0
Author-Name: Jeffrey N. Rouder
Author-X-Name-First: Jeffrey N.
Author-X-Name-Last: Rouder
Author-Name: Richard D. Morey
Author-X-Name-First: Richard D.
Author-X-Name-Last: Morey
Title: Teaching Bayes’ Theorem: Strength of Evidence as Predictive Accuracy
Abstract:
Although teaching Bayes’ theorem is popular, the standard approach—targeting posterior distributions of parameters—may be improved. We advocate teaching Bayes’ theorem in a ratio form where the posterior beliefs relative to the prior beliefs equals the conditional probability of data relative to the marginal probability of data. This form leads to an interpretation that the strength of evidence is relative predictive accuracy. With this approach, students are encouraged to view Bayes’ theorem as an updating mechanism, to obtain a deeper appreciation of the role of the prior and of marginal data, and to view estimation and model comparison from a unified perspective.
Journal: The American Statistician
Pages: 186-190
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1341334
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1341334
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:186-190
Template-Type: ReDIF-Article 1.0
Author-Name: Subhabrata Chakraborti
Author-X-Name-First: Subhabrata
Author-X-Name-Last: Chakraborti
Author-Name: Felipe Jardim
Author-X-Name-First: Felipe
Author-X-Name-Last: Jardim
Author-Name: Eugenio Epprecht
Author-X-Name-First: Eugenio
Author-X-Name-Last: Epprecht
Title: Higher-Order Moments Using the Survival Function: The Alternative Expectation Formula
Abstract:
Undergraduate and graduate students in a first-year probability (or a mathematical statistics) course learn the important concept of the moment of a random variable. The moments are related to various aspects of a probability distribution. In this context, the formula for the mean or the first moment of a nonnegative continuous random variable is often shown in terms of its c.d.f. (or the survival function). This has been called the alternative expectation formula. However, higher-order moments are also important, for example, to study the variance or the skewness of a distribution. In this note, we consider the rth moment of a nonnegative random variable and derive formulas in terms of the c.d.f. (or the survival function) paralleling the existing results for the first moment (the mean) using Fubini's theorem. Both nonnegative continuous and discrete integer-valued random variables are considered. These formulas may be advantageous, for example, when dealing with the moments of a transformed random variable, where it may be easier to derive its c.d.f. using the so-called c.d.f. method.
Journal: The American Statistician
Pages: 191-194
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1356374
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1356374
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:191-194
Template-Type: ReDIF-Article 1.0
Author-Name: Yaakov Malinovsky
Author-X-Name-First: Yaakov
Author-X-Name-Last: Malinovsky
Author-Name: Paul S. Albert
Author-X-Name-First: Paul S.
Author-X-Name-Last: Albert
Title: Revisiting Nested Group Testing Procedures: New Results, Comparisons, and Robustness
Abstract:
Group testing has its origin in the identification of syphilis in the U.S. Army during World War II. Much of the theoretical framework of group testing was developed starting in the late 1950s, with continued work into the 1990s. Recently, with the advent of new laboratory and genetic technologies, there has been increasing interest in group testing designs for cost-saving purposes. In this article, we compare different nested designs, including Dorfman, Sterrett, and an optimal nested procedure obtained through dynamic programming. To elucidate these comparisons, we develop closed-form expressions for the optimal Sterrett procedure and provide a concise review of the prior literature for other commonly used procedures. We consider designs where the prevalence of disease is known, and we investigate the robustness of these procedures when the prevalence is incorrectly specified. This article provides a technical presentation that will be of interest to researchers as well as useful from a pedagogical perspective. Supplementary material for this article is available online.
Journal: The American Statistician
Pages: 117-125
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1366367
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1366367
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:117-125
Template-Type: ReDIF-Article 1.0
Author-Name: Richard Le Blanc
Author-X-Name-First: Richard
Author-X-Name-Last: Le Blanc
Title: Bayesian Analysis on a Noncentral Fisher–Student’s Hypersphere
Abstract:
Fisher succeeded early on in redefining Student’s t-distribution in geometrical terms on a central hypersphere. Intriguingly, a noncentral analytical extension for this fundamental Fisher–Student’s central hypersphere h-distribution does not exist. We therefore set to derive the noncentral h-distribution and use it to graphically illustrate the limitations of the Neyman–Pearson null hypothesis significance testing framework and the strengths of the Bayesian statistical hypothesis analysis framework on the hypersphere polar axis, a compact nontrivial one-dimensional parameter space. Using a geometrically meaningful maximal entropy prior, we requalify the apparent failure of an important psychological science reproducibility project. We proceed to show that the Bayes factor appropriately models the two-sample t-test p-value density of a gene expression profile produced by the high-throughput genomic-scale microarray technology, and provides a simple expression for a local false discovery rate addressing the multiple hypothesis testing problem brought about by such a technology.
Journal: The American Statistician
Pages: 126-140
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1377111
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1377111
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:126-140
Template-Type: ReDIF-Article 1.0
Author-Name: Xiaoyue Niu
Author-X-Name-First: Xiaoyue
Author-X-Name-Last: Niu
Author-Name: James L. Rosenberger
Author-X-Name-First: James L.
Author-X-Name-Last: Rosenberger
Title: Near-Balanced Incomplete Block Designs, With an Application to Poster Competitions
Abstract:
Judging scholarly posters poses the challenge of assigning judges efficiently. If there are many posters and few reviews per judge, the commonly used balanced incomplete block design is not a feasible option. An additional challenge is that the number of judges is unknown before the event. We propose two connected near-balanced incomplete block designs that both satisfy the requirements of our setting: one that generates a connected assignment and balances the treatments, and another that further balances pairs of treatments. We describe both fixed and random effects models to estimate the population marginal means of the poster scores and rationalize the use of the random effects model. Via simulation studies, we evaluate the estimation accuracy and efficiency of the two designs in comparison with a random assignment, especially the winning chance of the truly best posters. Both proposed designs demonstrate accuracy and efficiency gains over the random assignment.
Journal: The American Statistician
Pages: 159-164
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1385534
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1385534
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:159-164
Template-Type: ReDIF-Article 1.0
Author-Name: Luke Keele
Author-X-Name-First: Luke
Author-X-Name-Last: Keele
Author-Name: Luke Miratrix
Author-X-Name-First: Luke
Author-X-Name-Last: Miratrix
Title: Randomization Inference for Outcomes with Clumping at Zero
Abstract:
While randomization inference is well developed for continuous and binary outcomes, there has been comparatively little work for outcomes with nonnegative support and clumping at zero. Typically, outcomes of this type have been modeled using parametric models that impose strong distributional assumptions. This article proposes new randomization inference procedures for nonnegative outcomes with clumping at zero. Instead of making distributional assumptions, we propose various assumptions about the nature of the response to treatment and use permutation inference for both testing and estimation. This approach allows for some natural goodness-of-fit tests for model assessment, as well as flexibility in selecting test statistics sensitive to different potential alternatives. We illustrate our approach using two randomized trials, where job training interventions were designed to increase earnings of participants.
Journal: The American Statistician
Pages: 141-150
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1385535
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1385535
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:141-150
Template-Type: ReDIF-Article 1.0
Author-Name: Tommy Wright
Author-X-Name-First: Tommy
Author-X-Name-Last: Wright
Author-Name: Martin Klein
Author-X-Name-First: Martin
Author-X-Name-Last: Klein
Author-Name: Jerzy Wieczorek
Author-X-Name-First: Jerzy
Author-X-Name-Last: Wieczorek
Title: A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals
Abstract:
In comparing a collection of K populations, it is common practice to display in one visualization confidence intervals for the corresponding population parameters θ1, θ2, …, θK. For a pair of confidence intervals that do (or do not) overlap, viewers of the visualization are cognitively compelled to declare that there is not (or there is) a statistically significant difference between the two corresponding population parameters. It is generally well known that the method of examining overlap of pairs of confidence intervals should not be used for formal hypothesis testing. However, use of a single visualization with overlapping and nonoverlapping confidence intervals leads many to draw such conclusions, despite the best efforts of statisticians toward preventing users from reaching such conclusions. In this article, we summarize some alternative visualizations from the literature that can be used to properly test equality between a pair of population parameters. We recommend that these visualizations be used with caution to avoid incorrect statistical inference. The methods presented require only that we have K sample estimates and their associated standard errors. We also assume that the sample estimators are independent, unbiased, and normally distributed.
Journal: The American Statistician
Pages: 165-178
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1392359
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392359
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:165-178
Template-Type: ReDIF-Article 1.0
Author-Name: Charles South
Author-X-Name-First: Charles
Author-X-Name-Last: South
Author-Name: Ryan Elmore
Author-X-Name-First: Ryan
Author-X-Name-Last: Elmore
Author-Name: Andrew Clarage
Author-X-Name-First: Andrew
Author-X-Name-Last: Clarage
Author-Name: Rob Sickorez
Author-X-Name-First: Rob
Author-X-Name-Last: Sickorez
Author-Name: Jing Cao
Author-X-Name-First: Jing
Author-X-Name-Last: Cao
Title: A Starting Point for Navigating the World of Daily Fantasy Basketball
Abstract:
Fantasy sports, particularly the daily variety in which new lineups are selected each day, are a rapidly growing industry. The two largest companies in the daily fantasy business, DraftKings and FanDuel, have been valued as high as $2 billion. This research focuses on the development of a complete system for daily fantasy basketball, including both the prediction of player performance and the construction of a team. First, a Bayesian random effects model is used to predict an aggregate measure of daily NBA player performance. The predictions are then used to construct teams under the constraints of the game, typically related to a fictional salary cap and player positions. Permutation-based and K-nearest neighbors approaches are compared in terms of the identification of “successful” teams—those that would be competitive more often than not based on historical data. We demonstrate the efficacy of our system by comparing our predictions to those from a well-known analytics website, and by simulating daily competitions over the course of the 2015–2016 season. Our results show an expected profit of approximately $9,000 on an initial $500 investment using the K-nearest neighbors approach, a 36% increase relative to using the permutation-based approach alone. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 179-185
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2017.1401559
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1401559
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:179-185
Template-Type: ReDIF-Article 1.0
Author-Name: Flavio Santi
Author-X-Name-First: Flavio
Author-X-Name-Last: Santi
Author-Name: Maria Michela Dickson
Author-X-Name-First: Maria Michela
Author-X-Name-Last: Dickson
Author-Name: Giuseppe Espa
Author-X-Name-First: Giuseppe
Author-X-Name-Last: Espa
Title: A Graphical Tool for Interpreting Regression Coefficients of Trinomial Logit Models
Abstract:
Multinomial logit (also termed multi-logit) models permit the analysis of the statistical relation between a categorical response variable and a set of explanatory variables (called covariates or regressors). Although multinomial logit is widely used in both the social and economic sciences, the interpretation of regression coefficients may be tricky, as the effect of covariates on the probability distribution of the response variable is nonconstant and difficult to quantify. The ternary plots illustrated in this article aim at facilitating the interpretation of regression coefficients and permit the effect of covariates (either singularly or jointly considered) on the probability distribution of the dependent variable to be quantified. Ternary plots can be drawn both for ordered and for unordered categorical dependent variables, when the number of possible outcomes equals three (trinomial response variable); these plots make it possible not only to represent the covariate effects over the whole parameter space of the dependent variable but also to compare the covariate effects of any given individual profile. The method is illustrated and discussed through analysis of a dataset concerning the transition of master’s graduates of the University of Trento (Italy) from university to employment.
Journal: The American Statistician
Pages: 200-207
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2018.1442368
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1442368
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:200-207
Template-Type: ReDIF-Article 1.0
Author-Name: Frank Tuyl
Author-X-Name-First: Frank
Author-X-Name-Last: Tuyl
Title: A Method to Handle Zero Counts in the Multinomial Model
Abstract:
In the context of an objective Bayesian approach to the multinomial model, Dirichlet(a, …, a) priors with a < 1 have previously been shown to be inadequate in the presence of zero counts, suggesting that the uniform prior (a = 1) is the preferred candidate. In the presence of many zero counts, however, this prior may not be satisfactory either. A model selection approach is proposed, allowing for the possibility of zero parameters corresponding to zero count categories. This approach results in a posterior mixture of Dirichlet distributions and marginal mixtures of beta distributions, which seem to avoid the problems that potentially result from the various proposed Dirichlet priors, in particular in the context of extreme data with zero counts.
Journal: The American Statistician
Pages: 151-158
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2018.1444673
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1444673
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:151-158
Template-Type: ReDIF-Article 1.0
Author-Name: Francisco Louzada
Author-X-Name-First: Francisco
Author-X-Name-Last: Louzada
Author-Name: Pedro L. Ramos
Author-X-Name-First: Pedro L.
Author-X-Name-Last: Ramos
Author-Name: Eduardo Ramos
Author-X-Name-First: Eduardo
Author-X-Name-Last: Ramos
Title: A Note on Bias of Closed-Form Estimators for the Gamma Distribution Derived From Likelihood Equations
Abstract:
We discuss here an alternative approach for decreasing the bias of the closed-form estimators for the gamma distribution recently proposed by Ye and Chen in 2017. We show that the new estimator also has a closed-form expression, is positive, and can be computed for n > 2. Moreover, the corrective approach returns better estimates when compared with the former ones.
Journal: The American Statistician
Pages: 195-199
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2018.1513376
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1513376
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:195-199
Template-Type: ReDIF-Article 1.0
Author-Name: Xin Wang
Author-X-Name-First: Xin
Author-X-Name-Last: Wang
Title: Business Survival Analysis Using SAS: An Introduction to Lifetime Probabilities
Journal: The American Statistician
Pages: 208-209
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2018.1538851
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1538851
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:208-209
Template-Type: ReDIF-Article 1.0
Author-Name: Anna Schenfisch
Author-X-Name-First: Anna
Author-X-Name-Last: Schenfisch
Author-Name: Brittany Fasy
Author-X-Name-First: Brittany
Author-X-Name-Last: Fasy
Title: Statistical Analysis of Contingency Tables
Journal: The American Statistician
Pages: 208-208
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2019.1571848
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1571848
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:208-208
Template-Type: ReDIF-Article 1.0
Author-Name: Nicole Bohme Carnegie
Author-X-Name-First: Nicole Bohme
Author-X-Name-Last: Carnegie
Title: Quantitative Methods for HIV/AIDS Research
Journal: The American Statistician
Pages: 209-210
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2019.1603473
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1603473
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:209-210
Template-Type: ReDIF-Article 1.0
Author-Name: Minggen Lu
Author-X-Name-First: Minggen
Author-X-Name-Last: Lu
Title: Survival Analysis with Interval-Censored Data: A Practical Approach with Examples in R, SAS, and BUGS
Journal: The American Statistician
Pages: 211-212
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2019.1603477
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1603477
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:211-212
Template-Type: ReDIF-Article 1.0
Author-Name: Emily Dressler
Author-X-Name-First: Emily
Author-X-Name-Last: Dressler
Title: Clinical Trial Optimization Using R
Journal: The American Statistician
Pages: 210-211
Issue: 2
Volume: 73
Year: 2019
Month: 4
X-DOI: 10.1080/00031305.2019.1603479
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1603479
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:2:p:210-211
Template-Type: ReDIF-Article 1.0
Author-Name: Prakash Gorroochurn
Author-X-Name-First: Prakash
Author-X-Name-Last: Gorroochurn
Title: On Galton's Change From “Reversion” to “Regression”
Abstract:
Galton's first work on regression probably led him to think of it as a unidirectional, genetic process, which he called “reversion.” A subsequent experiment on family heights made him realize that the phenomenon was symmetric and nongenetic. Galton then abandoned “reversion” in favor of “regression.” Final confirmation was provided through Dickson's mathematical analysis and Galton's examination of height data on brothers.
Journal: The American Statistician
Pages: 227-231
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2015.1087876
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1087876
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:227-231
Template-Type: ReDIF-Article 1.0
Author-Name: J. G. Liao
Author-X-Name-First: J. G.
Author-X-Name-Last: Liao
Author-Name: Duanping Liao
Author-X-Name-First: Duanping
Author-X-Name-Last: Liao
Author-Name: Arthur Berg
Author-X-Name-First: Arthur
Author-X-Name-Last: Berg
Title: Calibrated Bayes Factors in Assessing Genetic Association Models
Abstract:
Three competing genetic models—additive, dominant, and recessive—are often considered in genetic association analysis. We propose and develop a calibrated Bayes approach for comparing these competing models that has the desired property of giving equal support to the three models when no genetic association is present. The naïve approach with noncalibrated priors is shown to produce misleading Bayes factors. The method is fully developed with simulation studies, real data analyses, and an efficient algorithm based on an asymptotic approximation. An illuminating connection to the Kullback–Leibler divergence is also established. The proposed calibrated prior can serve as a reference prior for a genetic association study or as a common baseline prior for comparing Bayes analyses of different datasets.
Journal: The American Statistician
Pages: 250-256
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2015.1109548
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1109548
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:250-256
Template-Type: ReDIF-Article 1.0
Author-Name: Deborah Nolan
Author-X-Name-First: Deborah
Author-X-Name-Last: Nolan
Author-Name: Jamis Perrett
Author-X-Name-First: Jamis
Author-X-Name-Last: Perrett
Title: Teaching and Learning Data Visualization: Ideas and Assignments
Abstract:
This article discusses how to make statistical graphics a more prominent element of the undergraduate statistics curricula. The focus is on several different types of assignments that exemplify how to incorporate graphics into a course in a pedagogically meaningful way. These assignments include having students deconstruct and reconstruct plots, copy masterful graphs, create one-minute visual revelations, convert tables into “pictures,” and develop interactive visualizations, for example, with the virtual earth as a plotting canvas. In addition to describing the goals and details of each assignment, we also discuss the broader topic of graphics and key concepts that we think warrant inclusion in the statistics curricula. We advocate that more attention needs to be paid to this fundamental field of statistics at all levels, from introductory undergraduate through graduate level courses. With the rapid rise of tools to visualize data, for example, Google trends, GapMinder, ManyEyes, and Tableau, and the increased use of graphics in the media, understanding the principles of good statistical graphics, and having the ability to create informative visualizations is an ever more important aspect of statistics education. Supplementary materials containing code and data for the assignments are available online.
Journal: The American Statistician
Pages: 260-269
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2015.1123651
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1123651
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:260-269
Template-Type: ReDIF-Article 1.0
Author-Name: Albert Vexler
Author-X-Name-First: Albert
Author-X-Name-Last: Vexler
Author-Name: Li Zou
Author-X-Name-First: Li
Author-X-Name-Last: Zou
Author-Name: Alan D. Hutson
Author-X-Name-First: Alan D.
Author-X-Name-Last: Hutson
Title: Data-Driven Confidence Interval Estimation Incorporating Prior Information with an Adjustment for Skewed Data
Abstract:
Bayesian credible interval (CI) estimation is a statistical procedure that has been well addressed in both the theoretical and applied literature. Parametric assumptions regarding baseline data distributions are critical for the implementation of this method. We provide a nonparametric technique for incorporating prior information into the equal-tailed (ET) and highest posterior density (HPD) CI estimators in the Bayesian manner. We propose to use a data-driven likelihood function, replacing the parametric likelihood function to create a distribution-free posterior. Higher order asymptotic propositions are derived to show the efficiency and consistency of the proposed method. We demonstrate that the proposed approach may correct confidence regions with respect to skewness of the data distribution. An extensive Monte Carlo (MC) study confirms the proposed method significantly outperforms the classical CI estimation in a frequentist context. A real data example related to a study of myocardial infarction illustrates the excellent applicability of the proposed technique. Supplementary material, including the R code used to implement the developed method, is available online.
Journal: The American Statistician
Pages: 243-249
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1141707
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141707
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:243-249
Template-Type: ReDIF-Article 1.0
Author-Name: Xavier Puig
Author-X-Name-First: Xavier
Author-X-Name-Last: Puig
Author-Name: Martí Font
Author-X-Name-First: Martí
Author-X-Name-Last: Font
Author-Name: Josep Ginebra
Author-X-Name-First: Josep
Author-X-Name-Last: Ginebra
Title: A Unified Approach to Authorship Attribution and Verification
Abstract:
In authorship attribution, one assigns texts from an unknown author to one of two or more candidate authors by comparing the disputed texts with texts known to have been written by the candidate authors. In authorship verification, one decides whether a text or a set of texts could have been written by a given author. These two problems are usually treated separately. By assuming an open-set classification framework for the attribution problem, contemplating the possibility that none of the candidate authors is the unknown author, the verification problem becomes a special case of the attribution problem. Here both problems are posed as a formal Bayesian multinomial model selection problem and are given a closed-form solution, tailored for categorical data, naturally incorporating text length and dependence in the analysis, and coping well with settings with a small number of training texts. The approach to authorship verification is illustrated by exploring whether a court ruling sentence could have been written by the judge that signs it, and the approach to authorship attribution is illustrated by revisiting the authorship attribution of the Federalist papers and through a small simulation study.
Journal: The American Statistician
Pages: 232-242
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1148630
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148630
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:232-242
Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas G. Reich
Author-X-Name-First: Nicholas G.
Author-X-Name-Last: Reich
Author-Name: Justin Lessler
Author-X-Name-First: Justin
Author-X-Name-Last: Lessler
Author-Name: Krzysztof Sakrejda
Author-X-Name-First: Krzysztof
Author-X-Name-Last: Sakrejda
Author-Name: Stephen A. Lauer
Author-X-Name-First: Stephen A.
Author-X-Name-Last: Lauer
Author-Name: Sopon Iamsirithaworn
Author-X-Name-First: Sopon
Author-X-Name-Last: Iamsirithaworn
Author-Name: Derek A. T. Cummings
Author-X-Name-First: Derek A. T.
Author-X-Name-Last: Cummings
Title: Case Study in Evaluating Time Series Prediction Models Using the Relative Mean Absolute Error
Abstract:
Statistical prediction models inform decision-making processes in many real-world settings. Prior to using predictions in practice, one must rigorously test and validate candidate models to ensure that the proposed predictions have sufficient accuracy to be used in practice. In this article, we present a framework for evaluating time series predictions, which emphasizes computational simplicity and an intuitive interpretation using the relative mean absolute error metric. For a single time series, this metric enables comparisons of candidate model predictions against naïve reference models, a method that can provide useful and standardized performance benchmarks. Additionally, in applications with multiple time series, this framework facilitates comparisons of one or more models’ predictive performance across different sets of data. We illustrate the use of this metric with a case study comparing predictions of dengue hemorrhagic fever incidence in two provinces of Thailand. This example demonstrates the utility and interpretability of the relative mean absolute error metric in practice, and underscores the practical advantages of using relative performance metrics when evaluating predictions.
Journal: The American Statistician
Pages: 285-292
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1148631
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148631
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:285-292
Template-Type: ReDIF-Article 1.0
Author-Name: Miguel de Carvalho
Author-X-Name-First: Miguel
Author-X-Name-Last: de Carvalho
Title: Mean, What do You Mean?
Abstract:
When teaching statistics we often resort to several notions of mean, such as the arithmetic mean, geometric mean, and harmonic mean, and hence the student is often left with the question: The word mean appears in all such concepts, so what actually is a mean? I revisit Kolmogorov's axiomatic view of the mean, which unifies all these concepts of mean, among others. A population counterpart of the notion of regular mean, along with notions of regular variance and standard deviation, will also be discussed here as unifying concepts. Some examples are used to illustrate the main ideas.
Journal: The American Statistician
Pages: 270-274
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1148632
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148632
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:270-274
Template-Type: ReDIF-Article 1.0
Author-Name: John C. Wierman
Author-X-Name-First: John C.
Author-X-Name-Last: Wierman
Title: The Class Joke Contest: Encouraging Creativity and Improving Attendance
Abstract:
Jokes are a resource that can be used to transmit concepts, motivate students, encourage creativity, and make learning more enjoyable. In each of my classes on probability and stochastic processes, I hold a monthly joke contest. Students are encouraged to submit original jokes relating to the course and its topics. The teaching assistants and I select a few finalists, and the class votes to determine winners, who receive extra credit. This article discusses the origin and evolution of the contest, describes its benefits in increased engagement and improved attendance, provides information and tips for faculty who might want to conduct a joke contest, and includes some example jokes.
Journal: The American Statistician
Pages: 257-259
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1148633
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1148633
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:257-259
Template-Type: ReDIF-Article 1.0
Author-Name: Jia Liu
Author-X-Name-First: Jia
Author-X-Name-Last: Liu
Author-Name: Daniel J. Nordman
Author-X-Name-First: Daniel J.
Author-X-Name-Last: Nordman
Author-Name: William Q. Meeker
Author-X-Name-First: William Q.
Author-X-Name-Last: Meeker
Title: The Number of MCMC Draws Needed to Compute Bayesian Credible Bounds
Abstract:
In the past 20 years, there has been a staggering increase in the use of Bayesian statistical inference, based on Markov chain Monte Carlo (MCMC) methods, to estimate model parameters and other quantities of interest. This trend exists in virtually all areas of engineering and science. In a typical application, researchers will report estimates of parametric functions (e.g., quantiles, probabilities, or predictions of future outcomes) and corresponding intervals from MCMC methods. One difficulty with the use of inferential methods based on Monte Carlo (MC) is that reported results may be inaccurate due to MC error. MC error, however, can be made arbitrarily small by increasing the number of MC draws. Most users of MCMC methods seem to use indirect diagnostics, trial-and-error, or guess-work to decide how long to run an MCMC algorithm, and the accuracy of MCMC output results is rarely reported. Unless careful analysis is done, reported numerical results may contain digits that are completely meaningless. In this article, we describe an algorithm to provide direct guidance on the number of MCMC draws needed to achieve a desired amount of precision (i.e., a specified number of accurate significant digits) for Bayesian credible interval endpoints.
Journal: The American Statistician
Pages: 275-284
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1158738
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1158738
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:275-284
Template-Type: ReDIF-Article 1.0
Author-Name: Peng Ding
Author-X-Name-First: Peng
Author-X-Name-Last: Ding
Title: On the Conditional Distribution of the Multivariate t Distribution
Abstract:
As alternatives to the normal distributions, t distributions are widely applied in robust analysis for data with outliers or heavy tails. The properties of the multivariate t distribution are well documented in Kotz and Nadarajah's book, which, however, states a wrong conclusion about the conditional distribution of the multivariate t distribution. Previous literature has recognized that the conditional distribution of the multivariate t distribution also follows the multivariate t distribution. We provide an intuitive proof without directly manipulating the complicated density function of the multivariate t distribution.
Journal: The American Statistician
Pages: 293-295
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1164756
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1164756
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:293-295
Template-Type: ReDIF-Article 1.0
Author-Name: Jyotirmoy Sarkar
Author-X-Name-First: Jyotirmoy
Author-X-Name-Last: Sarkar
Author-Name: Mamunur Rashid
Author-X-Name-First: Mamunur
Author-X-Name-Last: Rashid
Title: Visualizing Mean, Median, Mean Deviation, and Standard Deviation of a Set of Numbers
Abstract:
We review the existing visualizations of the mean and the median of a given set of numbers. Then we give an alternative visualization of the mean using the empirical cumulative distribution function of the given numbers. Next, we visualize the mean deviation (MD) and the mean square deviation (MSD) of the given numbers from any arbitrary value, including the variance. In light of these new visualizations, we revisit the well-known optimal properties of the MD from the median and the MSD from the mean. We also give a more elementary explanation of why the denominator of the sample variance of a set of numbers is one less than the sample size.
Journal: The American Statistician
Pages: 304-312
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1165734
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1165734
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:304-312
Template-Type: ReDIF-Article 1.0
Author-Name: Christian Kleiber
Author-X-Name-First: Christian
Author-X-Name-Last: Kleiber
Author-Name: Achim Zeileis
Author-X-Name-First: Achim
Author-X-Name-Last: Zeileis
Title: Visualizing Count Data Regressions Using Rootograms
Abstract:
The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.
Journal: The American Statistician
Pages: 296-303
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1173590
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1173590
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:296-303
Template-Type: ReDIF-Article 1.0
Author-Name: Ben O'Neill
Author-X-Name-First: Ben
Author-X-Name-Last: O'Neill
Title: Corrigendum
Journal: The American Statistician
Pages: 323-323
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1188584
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1188584
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:323-323
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 313-322
Issue: 3
Volume: 70
Year: 2016
Month: 7
X-DOI: 10.1080/00031305.2016.1203696
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1203696
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:313-322
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald L. Wasserstein
Author-X-Name-First: Ronald L.
Author-X-Name-Last: Wasserstein
Author-Name: Allen L. Schirm
Author-X-Name-First: Allen L.
Author-X-Name-Last: Schirm
Author-Name: Nicole A. Lazar
Author-X-Name-First: Nicole A.
Author-X-Name-Last: Lazar
Title: Moving to a World Beyond “p < 0.05”
Journal: The American Statistician
Pages: 1-19
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2019.1583913
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1583913
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:1-19
Template-Type: ReDIF-Article 1.0
Author-Name: John P. A. Ioannidis
Author-X-Name-First: John P. A.
Author-X-Name-Last: Ioannidis
Title: What Have We (Not) Learnt from Millions of Scientific Papers with P Values?
Abstract:
P values linked to null hypothesis significance testing (NHST) is the most widely (mis)used method of statistical inference. Empirical data suggest that across the biomedical literature (1990–2015), when abstracts use P values 96% of them have P values of 0.05 or less. The same percentage (96%) applies for full-text articles. Among 100 articles in PubMed, 55 report P values, while only 4 present confidence intervals for all the reported effect sizes, none use Bayesian methods and none use false-discovery rate. Over 25 years (1990–2015), use of P values in abstracts has doubled for all PubMed, and tripled for meta-analyses, while for some types of designs such as randomized trials the majority of abstracts report P values. There is major selective reporting for P values. Abstracts tend to highlight most favorable P values and inferences use even further spin to reach exaggerated, unreliable conclusions. The availability of large-scale data on P values from many papers has allowed the development and applications of methods that try to detect and model selection biases, for example, p-hacking, that cause patterns of excess significance. Inferences need to be cautious as they depend on the assumptions made by these models and can be affected by the presence of other biases (e.g., confounding in observational studies). While much of the unreliability of past and present research is driven by small, underpowered studies, NHST with P values may be also particularly problematic in the era of overpowered big data. NHST and P values are optimal only in a minority of current research. Using a more stringent threshold, as in the recently proposed shift from P < 0.05 to P < 0.005, is a temporizing measure to contain the flood and death-by-significance. NHST and P values may be replaced in many fields by other, more fit-for-purpose, inferential methods. 
However, curtailing selection biases requires additional measures, beyond changes in inferential methods, and in particular reproducible research practices.
Journal: The American Statistician
Pages: 20-25
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1447512
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1447512
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:20-25
Template-Type: ReDIF-Article 1.0
Author-Name: Steven N. Goodman
Author-X-Name-First: Steven N.
Author-X-Name-Last: Goodman
Title: Why is Getting Rid of P-Values So Hard? Musings on Science and Statistics
Abstract:
The current concerns about reproducibility have focused attention on proper use of statistics across the sciences. This gives statisticians an extraordinary opportunity to change what are widely regarded as statistical practices detrimental to the cause of good science. However, how that should be done is enormously complex, made more difficult by the balkanization of research methods and statistical traditions across scientific subdisciplines. Working within those sciences while also allying with science reform movements—operating simultaneously on the micro and macro levels—are the key to making lasting change in applied science.
Journal: The American Statistician
Pages: 26-30
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1558111
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1558111
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:26-30
Template-Type: ReDIF-Article 1.0
Author-Name: Raymond Hubbard
Author-X-Name-First: Raymond
Author-X-Name-Last: Hubbard
Title: Will the ASA's Efforts to Improve Statistical Practice be Successful? Some Evidence to the Contrary
Abstract:
Recent efforts by the American Statistical Association to improve statistical practice, especially in countering the misuse and abuse of null hypothesis significance testing (NHST) and p-values, are to be welcomed. But will they be successful? The present study offers compelling evidence that this will be an extraordinarily difficult task. Dramatic citation-count data on 25 articles and books severely critical of NHST's negative impact on good science, underlining that this issue was/is well known, did nothing to stem its usage over the period 1960–2007. On the contrary, employment of NHST increased during this time. To be successful in this endeavor, as well as restoring the relevance of the statistics profession to the scientific community in the 21st century, the ASA must be prepared to dispense detailed advice. This includes specifying those situations, if they can be identified, in which the p-value plays a clearly valuable role in data analysis and interpretation. The ASA might also consider a statement that recommends abandoning the use of p-values.
Journal: The American Statistician
Pages: 31-35
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1497540
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497540
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:31-35
Template-Type: ReDIF-Article 1.0
Author-Name: John L. Kmetz
Author-X-Name-First: John L.
Author-X-Name-Last: Kmetz
Title: Correcting Corrupt Research: Recommendations for the Profession to Stop Misuse of p-Values
Abstract:
p-Values and Null Hypothesis Significance Testing (NHST), combined with a large number of institutional factors, jointly define the Generally Accepted Soft Social Science Publishing Process (GASSSPP) that is now dominant in the social sciences and is increasingly used elsewhere. The case against NHST and the GASSSPP has been abundantly articulated over past decades, and yet it continues to spread, supported by a large number of self-reinforcing institutional processes. In this article, the author presents a number of steps that may be taken to counter the spread of this corruption that directly address the institutional forces, both as individuals and through collaborative efforts. While individual efforts are indispensable to this undertaking, the author argues that these alone cannot succeed unless the institutional forces are also addressed. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 36-45
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518271
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518271
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:36-45
Template-Type: ReDIF-Article 1.0
Author-Name: Douglas W. Hubbard
Author-X-Name-First: Douglas W.
Author-X-Name-Last: Hubbard
Author-Name: Alicia L. Carriquiry
Author-X-Name-First: Alicia L.
Author-X-Name-Last: Carriquiry
Title: Quality Control for Scientific Research: Addressing Reproducibility, Responsiveness, and Relevance
Abstract:
Efforts to address a reproducibility crisis have generated several valid proposals for improving the quality of scientific research. We argue there is also need to address the separate but related issues of relevance and responsiveness. To address relevance, researchers must produce what decision makers actually need to inform investments and public policy—that is, the probability that a claim is true or the probability distribution of an effect size given the data. The term responsiveness refers to the irregularity and delay in which issues about the quality of research are brought to light. Instead of relying on the good fortune that some motivated researchers will periodically conduct efforts to reveal potential shortcomings of published research, we could establish a continuous quality-control process for scientific research itself. Quality metrics could be designed through the application of this statistical process control for the research enterprise. We argue that one quality control metric—the probability that a research hypothesis is true—is required to address at least relevance and may also be part of the solution for improving responsiveness and reproducibility. This article proposes a “straw man” solution which could be the basis of implementing these improvements. As part of this solution, we propose one way to “bootstrap” priors. The processes required for improving reproducibility and relevance can also be part of a comprehensive statistical quality control for science itself by making continuously monitored metrics about the scientific performance of a field of research.
Journal: The American Statistician
Pages: 46-55
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1543138
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543138
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:46-55
Template-Type: ReDIF-Article 1.0
Author-Name: Naomi C. Brownstein
Author-X-Name-First: Naomi C.
Author-X-Name-Last: Brownstein
Author-Name: Thomas A. Louis
Author-X-Name-First: Thomas A.
Author-X-Name-Last: Louis
Author-Name: Anthony O’Hagan
Author-X-Name-First: Anthony
Author-X-Name-Last: O’Hagan
Author-Name: Jane Pendergast
Author-X-Name-First: Jane
Author-X-Name-Last: Pendergast
Title: The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making
Abstract:
This article resulted from our participation in the session on the “role of expert opinion and judgment in statistical inference” at the October 2017 ASA Symposium on Statistical Inference. We present a strong, unified statement on roles of expert judgment in statistics with processes for obtaining input, whether from a Bayesian or frequentist perspective. Topics include the role of subjectivity in the cycle of scientific inference and decisions, followed by a clinical trial and a greenhouse gas emissions case study that illustrate the role of judgments and the importance of basing them on objective information and a comprehensive uncertainty assessment. We close with a call for increased proactivity and involvement of statisticians in study conceptualization, design, conduct, analysis, and communication.
Journal: The American Statistician
Pages: 56-68
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1529623
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529623
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:56-68
Template-Type: ReDIF-Article 1.0
Author-Name: Anthony O’Hagan
Author-X-Name-First: Anthony
Author-X-Name-Last: O’Hagan
Title: Expert Knowledge Elicitation: Subjective but Scientific
Abstract:
Expert opinion and judgment enter into the practice of statistical inference and decision-making in numerous ways. Indeed, there is essentially no aspect of scientific investigation in which judgment is not required. Judgment is necessarily subjective, but should be made as carefully, as objectively, and as scientifically as possible. Elicitation of expert knowledge concerning an uncertain quantity expresses that knowledge in the form of a (subjective) probability distribution for the quantity. Such distributions play an important role in statistical inference (for example as prior distributions in a Bayesian analysis) and in evidence-based decision-making (for example as expressions of uncertainty regarding inputs to a decision model). This article sets out a number of practices through which elicitation can be made as rigorous and scientific as possible. One such practice is to follow a recognized protocol that is designed to address and minimize the cognitive biases that experts are prone to when making probabilistic judgments. We review the leading protocols in the field, and contrast their different approaches to dealing with these biases through the medium of a detailed case study employing the SHELF protocol. The article ends with discussion of how to elicit a joint probability distribution for multiple uncertain quantities, which is a challenge for all the leading protocols. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 69-81
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518265
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518265
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:69-81
Template-Type: ReDIF-Article 1.0
Author-Name: Lee Kennedy-Shaffer
Author-X-Name-First: Lee
Author-X-Name-Last: Kennedy-Shaffer
Title: Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing
Abstract:
As statisticians and scientists consider a world beyond p < 0.05, it is important to not lose sight of how we got to this point. Although significance testing and p-values are often presented as prescriptive procedures, they came about through a process of refinement and extension to other disciplines. Ronald A. Fisher and his contemporaries formalized these methods in the early twentieth century and Fisher’s 1925 Statistical Methods for Research Workers brought the techniques to experimentalists in a variety of disciplines. Understanding how these methods arose, spread, and were argued over since then illuminates how p < 0.05 came to be a standard for scientific inference, the advantage it offered at the time, and how it was interpreted. This historical perspective can inform the work of statisticians today by encouraging thoughtful consideration of how their work, including proposed alternatives to the p-value, will be perceived and used by scientists. And it can engage students more fully and encourage critical thinking rather than rote applications of formulae. Incorporating history enables students, practitioners, and statisticians to treat the discipline as an ongoing endeavor, crafted by fallible humans, and provides a deeper understanding of the subject and its consequences for science and society.
Journal: The American Statistician
Pages: 82-90
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1537891
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537891
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:82-90
Template-Type: ReDIF-Article 1.0
Author-Name: Raymond Hubbard
Author-X-Name-First: Raymond
Author-X-Name-Last: Hubbard
Author-Name: Brian D. Haig
Author-X-Name-First: Brian D.
Author-X-Name-Last: Haig
Author-Name: Rahul A. Parsa
Author-X-Name-First: Rahul A.
Author-X-Name-Last: Parsa
Title: The Limited Role of Formal Statistical Inference in Scientific Inference
Abstract:
Such is the grip of formal methods of statistical inference—that is, frequentist methods for generalizing from sample to population in enumerative studies—in the drawing of scientific inferences that the two are routinely deemed equivalent in the social, management, and biomedical sciences. This, despite the fact that legitimate employment of said methods is difficult to implement on practical grounds alone. But supposing the adoption of these procedures were simple does not get us far; crucially, methods of formal statistical inference are ill-suited to the analysis of much scientific data. Even findings from the claimed gold standard for examination by the latter, randomized controlled trials, can be problematic. Scientific inference is a far broader concept than statistical inference. Its authority derives from the accumulation, over an extensive period of time, of both theoretical and empirical knowledge that has won the (provisional) acceptance of the scholarly community. A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession with users of statistical inference to report significant differences in data sets actively thwarts cumulative knowledge development. The manifold problems surrounding the implementation and usefulness of formal methods of statistical inference in advancing science do not speak well of much teaching in methods/statistics classes. Serious reflection on statistics' role in producing viable knowledge is needed. Commendably, the American Statistical Association is committed to addressing this challenge, as further witnessed in this special online, open access issue of The American Statistician.
Journal: The American Statistician
Pages: 91-98
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1464947
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1464947
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:91-98
Template-Type: ReDIF-Article 1.0
Author-Name: Blakeley B. McShane
Author-X-Name-First: Blakeley B.
Author-X-Name-Last: McShane
Author-Name: Jennifer L. Tackett
Author-X-Name-First: Jennifer L.
Author-X-Name-Last: Tackett
Author-Name: Ulf Böckenholt
Author-X-Name-First: Ulf
Author-X-Name-Last: Böckenholt
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Title: Large-Scale Replication Projects in Contemporary Psychological Research
Abstract:
Replication is complicated in psychological research because studies of a given psychological phenomenon can never be direct or exact replications of one another, and thus effect sizes vary from one study of the phenomenon to the next—an issue of clear importance for replication. Current large-scale replication projects represent an important step forward for assessing replicability, but provide only limited information because they have thus far been designed in a manner such that heterogeneity either cannot be assessed or is intended to be eliminated. Consequently, the nontrivial degree of heterogeneity found in these projects represents a lower bound on the true degree of heterogeneity. We recommend enriching large-scale replication projects going forward by embracing heterogeneity. We argue this is the key for assessing replicability: if effect sizes are sufficiently heterogeneous—even if the sign of the effect is consistent—the phenomenon in question does not seem particularly replicable and the theory underlying it seems poorly constructed and in need of enrichment. Uncovering why and revising theory in light of it will lead to improved theory that explains heterogeneity and increases replicability. Given this, large-scale replication projects can play an important role not only in assessing replicability but also in advancing theory.
Journal: The American Statistician
Pages: 99-105
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1505655
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505655
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:99-105
Template-Type: ReDIF-Article 1.0
Author-Name: Sander Greenland
Author-X-Name-First: Sander
Author-X-Name-Last: Greenland
Title: Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values
Abstract:
The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: That P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not however properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p = 0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) -log2(p).
Journal: The American Statistician
Pages: 106-114
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1529625
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529625
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:106-114
Template-Type: ReDIF-Article 1.0
Author-Name: Rebecca A. Betensky
Author-X-Name-First: Rebecca A.
Author-X-Name-Last: Betensky
Title: The p-Value Requires Context, Not a Threshold
Abstract:
It is widely recognized by statisticians, though not as widely by other researchers, that the p-value cannot be interpreted in isolation, but rather must be considered in the context of certain features of the design and substantive application, such as sample size and meaningful effect size. I consider the setting of the normal mean and highlight the information contained in the p-value in conjunction with the sample size and meaningful effect size. The p-value and sample size jointly yield 95% confidence bounds for the effect of interest, which can be compared to the predetermined meaningful effect size to make inferences about the true effect. I provide simple examples to demonstrate that although the p-value is calculated under the null hypothesis, and thus seemingly may be divorced from the features of the study from which it arises, its interpretation as a measure of evidence requires its contextualization within the study. This implies that any proposal for improved use of the p-value as a measure of the strength of evidence cannot simply be a change to the threshold for significance.
Journal: The American Statistician
Pages: 115-117
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1529624
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529624
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:115-117
Template-Type: ReDIF-Article 1.0
Author-Name: Andrew A. Anderson
Author-X-Name-First: Andrew A.
Author-X-Name-Last: Anderson
Title: Assessing Statistical Results: Magnitude, Precision, and Model Uncertainty
Abstract:
Evaluating the importance and the strength of empirical evidence requires asking three questions: First, what are the practical implications of the findings? Second, how precise are the estimates? Confidence intervals provide an intuitive way to communicate precision. Although nontechnical audiences often misinterpret confidence intervals (CIs), I argue that the result is less dangerous than the misunderstandings that arise from hypothesis tests. Third, is the model correctly specified? The validity of point estimates and CIs depends on the soundness of the underlying model.
Journal: The American Statistician
Pages: 118-121
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1537889
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537889
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:118-121
Template-Type: ReDIF-Article 1.0
Author-Name: Joachim I. Krueger
Author-X-Name-First: Joachim I.
Author-X-Name-Last: Krueger
Author-Name: Patrick R. Heck
Author-X-Name-First: Patrick R.
Author-X-Name-Last: Heck
Title: Putting the P-Value in its Place
Abstract:
As the debate over best statistical practices continues in academic journals, conferences, and the blogosphere, working researchers (e.g., psychologists) need to figure out how much time and effort to invest in attending to experts' arguments, how to design their next project, and how to craft a sustainable long-term strategy for data analysis and inference. The present special issue of The American Statistician promises help. In this article, we offer a modest proposal for a continued and informed use of the conventional p-value without the pitfalls of statistical rituals. Other statistical indices should complement reporting, and extra-statistical (e.g., theoretical) judgments ought to be made with care and clarity.
Journal: The American Statistician
Pages: 122-128
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1470033
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1470033
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:122-128
Template-Type: ReDIF-Article 1.0
Author-Name: Valen E. Johnson
Author-X-Name-First: Valen E.
Author-X-Name-Last: Johnson
Title: Evidence From Marginally Significant t Statistics
Abstract:
This article examines the evidence contained in t statistics that are marginally significant in 5% tests. The bases for evaluating evidence are likelihood ratios and integrated likelihood ratios, computed under a variety of assumptions regarding the alternative hypotheses in null hypothesis significance tests. Likelihood ratios and integrated likelihood ratios provide a useful measure of the evidence in favor of competing hypotheses because they can be interpreted as representing the ratio of the probabilities that each hypothesis assigns to observed data. When they are either very large or very small, they suggest that one hypothesis is much better than the other in predicting observed data. If they are close to 1.0, then both hypotheses provide approximately equally valid explanations for observed data. I find that p-values that are close to 0.05 (i.e., that are “marginally significant”) correspond to integrated likelihood ratios that are bounded by approximately 7 in two-sided tests, and by approximately 4 in one-sided tests. The modest magnitude of integrated likelihood ratios corresponding to p-values close to 0.05 clearly suggests that higher standards of evidence are needed to support claims of novel discoveries and new effects.
Journal: The American Statistician
Pages: 129-134
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518788
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518788
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:129-134
Template-Type: ReDIF-Article 1.0
Author-Name: D. A. S. Fraser
Author-X-Name-First: D. A. S.
Author-X-Name-Last: Fraser
Title: The p-value Function and Statistical Inference
Abstract:
This article has two objectives. The first and narrower is to formalize the p-value function, which records all possible p-values, each corresponding to a value for whatever the scalar parameter of interest is for the problem at hand, and to show how this p-value function directly provides full inference information for any corresponding user or scientist. The p-value function provides familiar inference objects: significance levels, confidence intervals, critical values for fixed-level tests, and the power function at all values of the parameter of interest. It thus gives an immediate accurate and visual summary of inference information for the parameter of interest. We show that the p-value function of the key scalar interest parameter records the statistical position of the observed data relative to that parameter, and we then describe an accurate approximation to that p-value function which is readily constructed.
Journal: The American Statistician
Pages: 135-147
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1556735
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1556735
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:135-147
Template-Type: ReDIF-Article 1.0
Author-Name: Jonathan Rougier
Author-X-Name-First: Jonathan
Author-X-Name-Last: Rougier
Title: p-Values, Bayes Factors, and Sufficiency
Abstract:
Various approaches can be used to construct a model from a null distribution and a test statistic. I prove that one such approach, originating with D. R. Cox, has the property that the p-value is never greater than the Generalized Likelihood Ratio (GLR). When combined with the general result that the GLR is never greater than any Bayes factor, we conclude that, under Cox’s model, the p-value is never greater than any Bayes factor. I also provide a generalization, illustrations for the canonical Normal model, and an alternative approach based on sufficiency. This result is relevant for the ongoing discussion about the evidential value of small p-values, and the movement among statisticians to “redefine statistical significance.”
Journal: The American Statistician
Pages: 148-151
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1502684
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1502684
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:148-151
Template-Type: ReDIF-Article 1.0
Author-Name: Sherri Rose
Author-X-Name-First: Sherri
Author-X-Name-Last: Rose
Author-Name: Thomas G. McGuire
Author-X-Name-First: Thomas G.
Author-X-Name-Last: McGuire
Title: Limitations of P-Values and R-squared for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment
Abstract:
Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and R2, when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this “risk adjustment” system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and R2 for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric.
Journal: The American Statistician
Pages: 152-156
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518269
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518269
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:152-156
Template-Type: ReDIF-Article 1.0
Author-Name: Jeffrey D. Blume
Author-X-Name-First: Jeffrey D.
Author-X-Name-Last: Blume
Author-Name: Robert A. Greevy
Author-X-Name-First: Robert A.
Author-X-Name-Last: Greevy
Author-Name: Valerie F. Welty
Author-X-Name-First: Valerie F.
Author-X-Name-Last: Welty
Author-Name: Jeffrey R. Smith
Author-X-Name-First: Jeffrey R.
Author-X-Name-Last: Smith
Author-Name: William D. Dupont
Author-X-Name-First: William D.
Author-X-Name-Last: Dupont
Title: An Introduction to Second-Generation p-Values
Abstract:
Second-generation p-values preserve the simplicity that has made p-values popular while resolving critical flaws that promote misinterpretation of data, distraction by trivial effects, and unreproducible assessments of data. The second-generation p-value (SGPV) is an extension that formally accounts for scientific relevance by using a composite null hypothesis that captures null and scientifically trivial effects. Because the majority of spurious findings are small effects that are technically nonnull but practically indistinguishable from the null, the second-generation approach greatly reduces the likelihood of a false discovery. SGPVs promote transparency, rigor, and reproducibility of scientific results by a priori identifying which candidate hypotheses are practically meaningful and by providing a more reliable statistical summary of when the data are compatible with the candidate hypotheses or null hypotheses, or when the data are inconclusive. We illustrate the importance of these advances using a dataset of 247,000 single-nucleotide polymorphisms, i.e., genetic markers that are potentially associated with prostate cancer.
Journal: The American Statistician
Pages: 157-167
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1537893
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537893
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:157-167
Template-Type: ReDIF-Article 1.0
Author-Name: William M. Goodman
Author-X-Name-First: William M.
Author-X-Name-Last: Goodman
Author-Name: Susan E. Spruill
Author-X-Name-First: Susan E.
Author-X-Name-Last: Spruill
Author-Name: Eugene Komaroff
Author-X-Name-First: Eugene
Author-X-Name-Last: Komaroff
Title: A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting its Use
Abstract:
When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a firestorm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values nonetheless can provide evidential information, toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent for almost 70% of the cases with the parameter’s true location for the sampled-from population. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher’s judgment and experience. The simulation compares performances of several methods: from p-value and/or effect size-based, to confidence-interval based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of one single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
Journal: The American Statistician
Pages: 168-185
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1564697
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564697
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:168-185
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel J. Benjamin
Author-X-Name-First: Daniel J.
Author-X-Name-Last: Benjamin
Author-Name: James O. Berger
Author-X-Name-First: James O.
Author-X-Name-Last: Berger
Title: Three Recommendations for Improving the Use of p-Values
Abstract:
Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values.
Journal: The American Statistician
Pages: 186-191
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1543135
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543135
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:186-191
Template-Type: ReDIF-Article 1.0
Author-Name: David Colquhoun
Author-X-Name-First: David
Author-X-Name-Last: Colquhoun
Title: The False Positive Risk: A Proposal Concerning What to Do About p-Values
Abstract:
It is widely acknowledged that the biomedical literature suffers from a surfeit of false positive results. Part of the reason for this is the persistence of the myth that observation of p < 0.05 is sufficient justification to claim that you have made a discovery. It is hopeless to expect users to change their reliance on p-values unless they are offered an alternative way of judging the reliability of their conclusions. If the alternative method is to have a chance of being adopted widely, it will have to be easy to understand and to calculate. One such proposal is based on calculation of the false positive risk (FPR). It is suggested that p-values and confidence intervals should continue to be given, but that they should be supplemented by a single additional number that conveys the strength of the evidence better than the p-value. This number could be the minimum FPR (that calculated on the assumption of a prior probability of 0.5, the largest value that can be assumed in the absence of hard prior data). Alternatively, one could specify the prior probability that it would be necessary to believe in order to achieve an FPR of, say, 0.05.
Journal: The American Statistician
Pages: 192-201
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1529622
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529622
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:192-201
Template-Type: ReDIF-Article 1.0
Author-Name: Robert A. J. Matthews
Author-X-Name-First: Robert A. J.
Author-X-Name-Last: Matthews
Title: Moving Towards the Post p < 0.05 Era via the Analysis of Credibility
Abstract:
It is now widely accepted that the techniques of null hypothesis significance testing (NHST) are routinely misused and misinterpreted by researchers seeking insight from data. There is, however, no consensus on acceptable alternatives, leaving researchers with little choice but to continue using NHST, regardless of its failings. I examine the potential for the Analysis of Credibility (AnCred) to resolve this impasse. Using real-life examples, I assess the ability of AnCred to provide researchers with a simple but robust framework for assessing study findings that goes beyond the standard dichotomy of statistical significance/nonsignificance. By extracting more insight from standard summary statistics while offering more protection against inferential fallacies, AnCred may encourage researchers to move toward the post p < 0.05 era.
Journal: The American Statistician
Pages: 202-212
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1543136
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543136
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:202-212
Template-Type: ReDIF-Article 1.0
Author-Name: Mark Andrew Gannon
Author-X-Name-First: Mark Andrew
Author-X-Name-Last: Gannon
Author-Name: Carlos Alberto de Bragança Pereira
Author-X-Name-First: Carlos Alberto
Author-X-Name-Last: de Bragança Pereira
Author-Name: Adriano Polpo
Author-X-Name-First: Adriano
Author-X-Name-Last: Polpo
Title: Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels
Abstract:
This article argues that researchers do not need to completely abandon the p-value, the best-known significance index, but should instead stop using significance levels that do not depend on sample sizes. A testing procedure is developed using a mixture of frequentist and Bayesian tools, with a significance level that is a function of sample size, obtained from a generalized form of the Neyman–Pearson Lemma that minimizes a linear combination of α, the probability of rejecting a true null hypothesis, and β, the probability of failing to reject a false null, instead of fixing α and minimizing β. The resulting hypothesis tests do not violate the Likelihood Principle and do not require any constraints on the dimensionalities of the sample space and parameter space. The procedure includes an ordering of the entire sample space and uses predictive probability (density) functions, allowing for testing of both simple and compound hypotheses. Accessible examples are presented to highlight specific characteristics of the new tests.
Journal: The American Statistician
Pages: 213-222
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518268
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518268
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:213-222
Template-Type: ReDIF-Article 1.0
Author-Name: Stanley Pogrow
Author-X-Name-First: Stanley
Author-X-Name-Last: Pogrow
Title: How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings
Abstract:
Relying on effect size as a measure of practical significance is turning out to be just as misleading as using p-values to determine the effectiveness of interventions for improving clinical practice in complex organizations such as schools. This article explains how effect sizes have misdirected practice in education and other disciplines. Even when effect size is incorporated into RCT research, the recommendations of whether interventions are effective are misleading and generally useless to practitioners. As a result, a new criterion of practical benefit is recommended for evaluating research findings about the effectiveness of interventions in complex organizations where benchmarks of existing performance exist. Practical benefit exists when the unadjusted performance of an experimental group provides a noticeable advantage over an existing benchmark. Some basic principles for determining practical benefit are provided. Practical benefit is more intuitive and is expected to enable leaders to make more accurate assessments as to whether published research findings are likely to produce noticeable improvements in their organizations. In addition, practical benefit is used routinely as the research criterion for the alternative scientific methodology of improvement science, which has an established track record of being a more efficient way than RCT research to develop new interventions that improve practice dramatically. Finally, the problems with practical significance suggest that the research community should seek different inferential methods for research designed to improve clinical performance in complex organizations, as compared to methods for testing theories and medicines.
Journal: The American Statistician
Pages: 223-234
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1549101
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1549101
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:223-234
Template-Type: ReDIF-Article 1.0
Author-Name: Blakeley B. McShane
Author-X-Name-First: Blakeley B.
Author-X-Name-Last: McShane
Author-Name: David Gal
Author-X-Name-First: David
Author-X-Name-Last: Gal
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Author-Name: Christian Robert
Author-X-Name-First: Christian
Author-X-Name-Last: Robert
Author-Name: Jennifer L. Tackett
Author-X-Name-First: Jennifer L.
Author-X-Name-Last: Tackett
Title: Abandon Statistical Significance
Abstract:
We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.
Journal: The American Statistician
Pages: 235-245
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1527253
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1527253
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:235-245
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher Tong
Author-X-Name-First: Christopher
Author-X-Name-Last: Tong
Title: Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science
Abstract:
Scientific research of all kinds should be guided by statistical thinking: in the design and conduct of the study, in the disciplined exploration and enlightened display of the data, and to avoid statistical pitfalls in the interpretation of the results. However, formal, probability-based statistical inference should play no role in most scientific research, which is inherently exploratory, requiring flexible methods of analysis that inherently risk overfitting. The nature of exploratory work is that data are used to help guide model choice, and under these circumstances, uncertainty cannot be precisely quantified, because of the inevitable model selection bias that results. To be valid, statistical inference should be restricted to situations where the study design and analysis plan are specified prior to data collection. Exploratory data analysis provides the flexibility needed for most other situations, including statistical methods that are regularized, robust, or nonparametric. Of course, no individual statistical analysis should be considered sufficient to establish scientific validity: research requires many sets of data along many lines of evidence, with a watchfulness for systematic error. Replicating and predicting findings in new data and new settings is a stronger way of validating claims than blessing results from an isolated study with statistical inferences.
Journal: The American Statistician
Pages: 246-261
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518264
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518264
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:246-261
Template-Type: ReDIF-Article 1.0
Author-Name: Valentin Amrhein
Author-X-Name-First: Valentin
Author-X-Name-Last: Amrhein
Author-Name: David Trafimow
Author-X-Name-First: David
Author-X-Name-Last: Trafimow
Author-Name: Sander Greenland
Author-X-Name-First: Sander
Author-X-Name-Last: Greenland
Title: Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication
Abstract:
Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a “replication crisis” may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as providing generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, what data were obtained, what analysis methods were used and why, and what output those methods produced.
Journal: The American Statistician
Pages: 262-270
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1543137
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543137
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:262-270
Template-Type: ReDIF-Article 1.0
Author-Name: Robert J. Calin-Jageman
Author-X-Name-First: Robert J.
Author-X-Name-Last: Calin-Jageman
Author-Name: Geoff Cumming
Author-X-Name-First: Geoff
Author-X-Name-Last: Cumming
Title: The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else Is Known
Abstract:
The “New Statistics” emphasizes effect sizes, confidence intervals, meta-analysis, and the use of Open Science practices. We present three specific ways in which a New Statistics approach can help improve scientific practice: by reducing overconfidence in small samples, by reducing confirmation bias, and by fostering more cautious judgments of consistency. We illustrate these points through consideration of the literature on oxytocin and human trust, a research area that typifies some of the endemic problems that arise with poor statistical practice.
Journal: The American Statistician
Pages: 271-280
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518266
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518266
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:271-280
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen T. Ziliak
Author-X-Name-First: Stephen T.
Author-X-Name-Last: Ziliak
Title: How Large Are Your G-Values? Try Gosset’s Guinnessometrics When a Little “p” Is Not Enough
Abstract:
A crisis of validity has emerged from three related crises of science, that is, the crises of statistical significance and complete randomization, of replication, and of reproducibility. Guinnessometrics takes commonplace assumptions and methods of statistical science and stands them on their head, from little p-values to unstructured Big Data. Guinnessometrics focuses instead on the substantive significance which emerges from a small series of independent and economical yet balanced and repeated experiments. Originally developed and market-tested by William S. Gosset aka “Student” in his job as Head Experimental Brewer at the Guinness Brewery in Dublin, Gosset’s economic and common sense approach to statistical inference and scientific method has been unwisely neglected. In many areas of science and life, the 10 principles of Guinnessometrics or G-values outlined here can help. Other things equal, the larger the G-values, the better the science and judgment. By now a colleague, neighbor, or YouTube junkie has probably shown you one of those wacky psychology experiments in a video involving a gorilla, and testing the limits of human cognition. In one video, a person wearing a gorilla suit suddenly appears on the scene among humans, who are themselves engaged in some ordinary, mundane activity such as passing a basketball. The funny thing is, prankster researchers have discovered, when observers are asked to think about the mundane activity (such as by counting the number of observed passes of a basketball), the unexpected gorilla is frequently unseen (for discussion see Kahneman 2011). The gorilla is invisible. People don’t see it.
Journal: The American Statistician
Pages: 281-290
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1514325
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1514325
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:281-290
Template-Type: ReDIF-Article 1.0
Author-Name: Dean Billheimer
Author-X-Name-First: Dean
Author-X-Name-Last: Billheimer
Title: Predictive Inference and Scientific Reproducibility
Abstract:
Most statistical analyses use hypothesis tests or estimation about parameters to form inferential conclusions. I think this is noble, but misguided. The point of view expressed here is that observables are fundamental, and that the goal of statistical modeling should be to predict future observations, given the current data and other relevant information. Further, the prediction of future observables provides multiple advantages to practicing scientists, and to science in general. These include an interpretable numerical summary of a quantity of direct interest to current and future researchers, a calibrated prediction of what’s likely to happen in future experiments, a prediction that can be either “corroborated” or “refuted” through experimentation, and avoidance of inference about parameters, quantities that exist only as convenient indices of hypothetical distributions. Finally, the predictive probability of a future observable can be used as a standard for communicating the reliability of the current work, regardless of whether confirmatory experiments are conducted. Adoption of this paradigm would improve our rigor for scientific accuracy and reproducibility by shifting our focus from “finding differences” among hypothetical parameters to predicting observable events based on our current scientific understanding.
Journal: The American Statistician
Pages: 291-295
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1518270
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518270
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:291-295
Template-Type: ReDIF-Article 1.0
Author-Name: Charles F. Manski
Author-X-Name-First: Charles F.
Author-X-Name-Last: Manski
Title: Treatment Choice With Trial Data: Statistical Decision Theory Should Supplant Hypothesis Testing
Abstract:
A central objective of empirical research on treatment response is to inform treatment choice. Unfortunately, researchers commonly use concepts of statistical inference whose foundations are distant from the problem of treatment choice. It has been particularly common to use hypothesis tests to compare treatments. Wald’s development of statistical decision theory provides a coherent frequentist framework for use of sample data on treatment response to make treatment decisions. A body of recent research applies statistical decision theory to characterize uniformly satisfactory treatment choices, in the sense of maximum loss relative to optimal decisions (also known as maximum regret). This article describes the basic ideas and findings, which provide an appealing practical alternative to use of hypothesis tests. For simplicity, the article focuses on medical treatment with evidence from classical randomized clinical trials. The ideas apply generally, encompassing use of observational data and treatment choice in nonmedical contexts.
Journal: The American Statistician
Pages: 296-304
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1513377
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1513377
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:296-304
Template-Type: ReDIF-Article 1.0
Author-Name: Charles F. Manski
Author-X-Name-First: Charles F.
Author-X-Name-Last: Manski
Author-Name: Aleksey Tetenov
Author-X-Name-First: Aleksey
Author-X-Name-Last: Tetenov
Title: Trial Size for Near-Optimal Choice Between Surveillance and Aggressive Treatment: Reconsidering MSLT-II
Abstract:
A convention in designing randomized clinical trials has been to choose sample sizes that yield specified statistical power when testing hypotheses about treatment response. Manski and Tetenov recently critiqued this convention and proposed enrollment of sufficiently many subjects to enable near-optimal treatment choices. This article develops a refined version of that analysis applicable to trials comparing aggressive treatment of patients with surveillance. The need for a refined analysis arises because the earlier work assumed that there is only a primary health outcome of interest, without secondary outcomes. An important aspect of choice between surveillance and aggressive treatment is that the latter may have side effects. One should then consider how the primary outcome and side effects jointly determine patient welfare. This requires new analysis of sample design. As a case study, we reconsider a trial comparing nodal observation and lymph node dissection when treating patients with cutaneous melanoma. Using a statistical power calculation, the investigators assigned 971 patients to dissection and 968 to observation. We conclude that assigning 244 patients to each option would yield findings that enable suitably near-optimal treatment choice. Thus, a much smaller sample size would have sufficed to inform clinical practice.
Journal: The American Statistician
Pages: 305-311
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1543617
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543617
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:305-311
Template-Type: ReDIF-Article 1.0
Author-Name: Michael Lavine
Author-X-Name-First: Michael
Author-X-Name-Last: Lavine
Title: Frequentist, Bayes, or Other?
Abstract:
Both philosophically and in practice, statistics is dominated by frequentist and Bayesian thinking. Under those paradigms, our courses and textbooks talk about the accuracy with which true model parameters are estimated or the posterior probability that they lie in a given set. In nonparametric problems, they talk about convergence to the true function (density, regression, etc.) or the probability that the true function lies in a given set. But the usual paradigms’ focus on learning the true model and parameters can distract the analyst from another important task: discovering whether there are many sets of models and parameters that describe the data reasonably well. When we discover many good models we can see in what ways they agree. Points of agreement give us more confidence in our inferences, but points of disagreement give us less. Further, the usual paradigms’ focus seduces us into judging and adopting procedures according to how well they learn the true values. An alternative is to judge models and parameter values, not procedures, and judge them by how well they describe data, not how close they come to the truth. The latter is especially appealing in problems without a true model.
Journal: The American Statistician
Pages: 312-318
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1459317
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459317
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:312-318
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen J. Ruberg
Author-X-Name-First: Stephen J.
Author-X-Name-Last: Ruberg
Author-Name: Frank E. Harrell
Author-X-Name-First: Frank E.
Author-X-Name-Last: Harrell
Author-Name: Margaret Gamalo-Siebers
Author-X-Name-First: Margaret
Author-X-Name-Last: Gamalo-Siebers
Author-Name: Lisa LaVange
Author-X-Name-First: Lisa
Author-X-Name-Last: LaVange
Author-Name: J. Jack Lee
Author-X-Name-First: J.
Author-X-Name-Last: Jack Lee
Author-Name: Karen Price
Author-X-Name-First: Karen
Author-X-Name-Last: Price
Author-Name: Carl Peck
Author-X-Name-First: Carl
Author-X-Name-Last: Peck
Title: Inference and Decision Making for 21st-Century Drug Development and Approval
Abstract:
The cost and time of pharmaceutical drug development continue to grow at rates that many say are unsustainable. These trends have enormous impact on what treatments get to patients, when they get them and how they are used. The statistical framework for supporting decisions in regulated clinical development of new medicines has followed a traditional path of frequentist methodology. Trials using hypothesis tests of “no treatment effect” are done routinely, and the p-value < 0.05 is often the determinant of what constitutes a “successful” trial. Many drugs fail in clinical development, adding to the cost of new medicines, and some evidence points blame at the deficiencies of the frequentist paradigm. An unknown number of effective medicines may have been abandoned because trials were declared “unsuccessful” due to a p-value exceeding 0.05. Recently, the Bayesian paradigm has shown utility in the clinical drug development process for its probability-based inference. We argue for a Bayesian approach that employs data from other trials as a “prior” for Phase 3 trials so that synthesized evidence across trials can be utilized to compute probability statements that are valuable for understanding the magnitude of treatment effect. Such a Bayesian paradigm provides a promising framework for improving statistical inference and regulatory decision making.
Journal: The American Statistician
Pages: 319-327
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2019.1566091
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1566091
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:319-327
Template-Type: ReDIF-Article 1.0
Author-Name: Noah N. N. van Dongen
Author-X-Name-First: Noah N. N.
Author-X-Name-Last: van Dongen
Author-Name: Johnny B. van Doorn
Author-X-Name-First: Johnny B.
Author-X-Name-Last: van Doorn
Author-Name: Quentin F. Gronau
Author-X-Name-First: Quentin F.
Author-X-Name-Last: Gronau
Author-Name: Don van Ravenzwaaij
Author-X-Name-First: Don
Author-X-Name-Last: van Ravenzwaaij
Author-Name: Rink Hoekstra
Author-X-Name-First: Rink
Author-X-Name-Last: Hoekstra
Author-Name: Matthias N. Haucke
Author-X-Name-First: Matthias N.
Author-X-Name-Last: Haucke
Author-Name: Daniel Lakens
Author-X-Name-First: Daniel
Author-X-Name-Last: Lakens
Author-Name: Christian Hennig
Author-X-Name-First: Christian
Author-X-Name-Last: Hennig
Author-Name: Richard D. Morey
Author-X-Name-First: Richard D.
Author-X-Name-Last: Morey
Author-Name: Saskia Homer
Author-X-Name-First: Saskia
Author-X-Name-Last: Homer
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Author-Name: Jan Sprenger
Author-X-Name-First: Jan
Author-X-Name-Last: Sprenger
Author-Name: Eric-Jan Wagenmakers
Author-X-Name-First: Eric-Jan
Author-X-Name-Last: Wagenmakers
Title: Multiple Perspectives on Inference for Two Simple Statistical Scenarios
Abstract:
When data analysts operate within different statistical frameworks (e.g., frequentist versus Bayesian, emphasis on estimation versus emphasis on testing), how does this impact the qualitative conclusions that are drawn for real data? To study this question empirically we selected from the literature two simple scenarios—involving a comparison of two proportions and a Pearson correlation—and asked four teams of statisticians to provide a concise analysis and a qualitative interpretation of the outcome. The results showed considerable overall agreement; nevertheless, this agreement did not appear to diminish the intensity of the subsequent debate over which statistical framework is more appropriate to address the questions at hand.
Journal: The American Statistician
Pages: 328-339
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2019.1565553
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1565553
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:328-339
Template-Type: ReDIF-Article 1.0
Author-Name: David Trafimow
Author-X-Name-First: David
Author-X-Name-Last: Trafimow
Title: Five Nonobvious Changes in Editorial Practice for Editors and Reviewers to Consider When Evaluating Submissions in a Post p < 0.05 Universe
Abstract:
The American Statistical Association’s Symposium on Statistical Inference (SSI) included a session on how editorial practices should change in a universe no longer dominated by null hypothesis significance testing (NHST). The underlying assumptions were first, that NHST is problematic; and second, that editorial practices really should change. The present article is based on my talk in this session, and on these assumptions. Consistent with the spirit of the SSI, my focus is not on what reviewers and editors should not do (e.g., NHST) but rather on what they should do, with an emphasis on changes that are not obvious. The recommended changes include a wider consideration of the nature of the contribution than submitted manuscripts usually receive; a greater tolerance of ambiguity; more of an emphasis on the thinking and execution of the study, with a decreased emphasis on the findings; replacing NHST with the a priori procedure; and a call for reviewers and editors to recognize that there are many cases where the basic assumptions of inferential statistical procedures simply are not met, and that inferential statistics (even the a priori procedure) may consequently be inappropriate.
Journal: The American Statistician
Pages: 340-345
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1537888
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537888
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:340-345
Template-Type: ReDIF-Article 1.0
Author-Name: Joseph J. Locascio
Author-X-Name-First: Joseph J.
Author-X-Name-Last: Locascio
Title: The Impact of Results Blind Science Publishing on Statistical Consultation and Collaboration
Abstract:
The author has previously proposed results blind manuscript evaluation (RBME) as a method of ameliorating often cited problems of statistical inference and scientific publication, notably publication bias, overuse/misuse of null hypothesis significance testing (NHST), and irreproducibility of reported scientific results. In RBME, manuscripts submitted to scientific journals are assessed for suitability for publication without regard to their reported results. Criteria for publication are based exclusively on the substantive importance of the research question addressed in the study, conveyed in the Introduction section of the manuscript, and the quality of the methodology, as reported in the Methods section. Practically, this policy is implemented by a two-stage process whereby the editor initially distributes only the Introduction and Methods sections of a submitted manuscript to reviewers and a provisional decision regarding acceptance is made, followed by a second stage in which the complete manuscript is distributed for review but only if the decision of the first stage is for acceptance. The present paper expands upon this recommendation by addressing implications of this proposed policy with respect to statistical consultation and collaboration in research. It is suggested that under RBME, statisticians will become more integrated into research endeavors and called upon sooner for their input.
Journal: The American Statistician
Pages: 346-351
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1505658
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505658
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:346-351
Template-Type: ReDIF-Article 1.0
Author-Name: Stuart H. Hurlbert
Author-X-Name-First: Stuart H.
Author-X-Name-Last: Hurlbert
Author-Name: Richard A. Levine
Author-X-Name-First: Richard A.
Author-X-Name-Last: Levine
Author-Name: Jessica Utts
Author-X-Name-First: Jessica
Author-X-Name-Last: Utts
Title: Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires
Abstract:
Many controversies in statistics are due primarily or solely to poor quality control in journals, bad statistical textbooks, bad teaching, unclear writing, and lack of knowledge of the historical literature. One way to improve the practice of statistics and resolve these issues is to do what initiators of the 2016 ASA statement did: take one issue at a time, have extensive discussions about the issue among statisticians of diverse backgrounds and perspectives and eventually develop and publish a broadly supported consensus on that issue. Upon completion of this task, we then move on to deal with another core issue in the same way. We propose as the next project a process that might lead quickly to a strong consensus that the term “statistically significant” and all its cognates and symbolic adjuncts be disallowed in the scientific literature except where focus is on the history of statistics and its philosophies and methodologies. Calculation and presentation of accurate p-values will often remain highly desirable though not obligatory. Supplementary materials for this article are available online in the form of an appendix listing the names and institutions of 48 other statisticians and scientists who endorse the principal propositions put forward here.
Journal: The American Statistician
Pages: 352-357
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1543616
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543616
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:352-357
Template-Type: ReDIF-Article 1.0
Author-Name: Harlan Campbell
Author-X-Name-First: Harlan
Author-X-Name-Last: Campbell
Author-Name: Paul Gustafson
Author-X-Name-First: Paul
Author-X-Name-Last: Gustafson
Title: The World of Research Has Gone Berserk: Modeling the Consequences of Requiring “Greater Statistical Stringency” for Scientific Publication
Abstract:
In response to growing concern about the reliability and reproducibility of published science, researchers have proposed adopting measures of “greater statistical stringency,” including suggestions to require larger sample sizes and to lower the highly criticized “p < 0.05” significance threshold. While pros and cons are vigorously debated, there has been little to no modeling of how adopting these measures might affect what type of science is published. In this article, we develop a novel optimality model that, given current incentives to publish, predicts a researcher’s most rational use of resources in terms of the number of studies to undertake, the statistical power to devote to each study, and the desirable prestudy odds to pursue. We then develop a methodology that allows one to estimate the reliability of published research by considering a distribution of preferred research strategies. Using this approach, we investigate the merits of adopting measures of “greater statistical stringency” with the goal of informing the ongoing debate.
Journal: The American Statistician
Pages: 358-373
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1555101
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1555101
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:358-373
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald D. Fricker
Author-X-Name-First: Ronald D.
Author-X-Name-Last: Fricker
Author-Name: Katherine Burke
Author-X-Name-First: Katherine
Author-X-Name-Last: Burke
Author-Name: Xiaoyan Han
Author-X-Name-First: Xiaoyan
Author-X-Name-Last: Han
Author-Name: William H. Woodall
Author-X-Name-First: William H.
Author-X-Name-Last: Woodall
Title: Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban
Abstract:
In this article, we assess the 31 articles published in Basic and Applied Social Psychology (BASP) in 2016, which is one full year after the BASP editors banned the use of inferential statistics. We discuss how the authors collected their data, how they reported and summarized their data, and how they used their data to reach conclusions. We found multiple instances of authors overstating conclusions beyond what the data would support if statistical significance had been considered. Readers would be largely unable to recognize this because the necessary information to do so was not readily available.
Journal: The American Statistician
Pages: 374-384
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1537892
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537892
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:374-384
Template-Type: ReDIF-Article 1.0
Author-Name: Karsten Maurer
Author-X-Name-First: Karsten
Author-X-Name-Last: Maurer
Author-Name: Lynette Hudiburgh
Author-X-Name-First: Lynette
Author-X-Name-Last: Hudiburgh
Author-Name: Lisa Werwinski
Author-X-Name-First: Lisa
Author-X-Name-Last: Werwinski
Author-Name: John Bailer
Author-X-Name-First: John
Author-X-Name-Last: Bailer
Title: Content Audit for p-value Principles in Introductory Statistics
Abstract:
Longstanding concerns with the role and interpretation of p-values in statistical practice prompted the American Statistical Association (ASA) to make a statement on p-values. The ASA statement spurred a flurry of responses and discussions by statisticians, with many wondering about the steps necessary to expand the adoption of these principles. Introductory statistics classrooms are key locations to introduce and emphasize the nuance related to p-values; in part because they engrain appropriate analysis choices at the earliest stages of statistics education, and also because they reach the broadest group of students. We propose a framework for statistics departments to conduct a content audit for p-value principles in their introductory curriculum. We then discuss the process and results from applying this course audit framework within our own statistics department. We also recommend meeting with client departments as a complement to the course audit. Discussions about analyses and practices common to particular fields can help to evaluate if our service courses are meeting the needs of client departments and to identify what is needed in our introductory courses to combat the misunderstanding and future misuse of p-values.
Journal: The American Statistician
Pages: 385-391
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1537890
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537890
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:385-391
Template-Type: ReDIF-Article 1.0
Author-Name: E. Ashley Steel
Author-X-Name-First: E. Ashley
Author-X-Name-Last: Steel
Author-Name: Martin Liermann
Author-X-Name-First: Martin
Author-X-Name-Last: Liermann
Author-Name: Peter Guttorp
Author-X-Name-First: Peter
Author-X-Name-Last: Guttorp
Title: Beyond Calculations: A Course in Statistical Thinking
Abstract:
Statisticians are in general agreement that there are flaws in how science is currently practiced; there is less agreement in how to make repairs. Our prescription for a Post-p < 0.05 Era is to develop and teach courses that expand our view of what constitutes the domain of statistics and thereby bridge undergraduate statistics coursework and the graduate student experience of applying statistics in research. Such courses can speed up the process of gaining statistical wisdom by giving students insight into the human propensity to make statistical errors, the meaning of a single test within a research project, ways in which p-values work and don't work as expected, the role of statistics in the lifecycle of science, and best practices for statistical communication. The course we have developed follows the story of how we use data to understand the world, leveraging simulation-based approaches to perform customized analyses and evaluate the behavior of statistical procedures. We provide ideas for expanding beyond the traditional classroom, two example activities, and a course syllabus as well as the set of statistical best practices for creating and consuming scientific information that we develop during the course.
Journal: The American Statistician
Pages: 392-401
Issue: S1
Volume: 73
Year: 2019
Month: 3
X-DOI: 10.1080/00031305.2018.1505657
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505657
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:S1:p:392-401
Template-Type: ReDIF-Article 1.0
Author-Name: Johnny van Doorn
Author-X-Name-First: Johnny
Author-X-Name-Last: van Doorn
Author-Name: Alexander Ly
Author-X-Name-First: Alexander
Author-X-Name-Last: Ly
Author-Name: Maarten Marsman
Author-X-Name-First: Maarten
Author-X-Name-Last: Marsman
Author-Name: Eric-Jan Wagenmakers
Author-X-Name-First: Eric-Jan
Author-X-Name-Last: Wagenmakers
Title: Bayesian Inference for Kendall’s Rank Correlation Coefficient
Abstract:
This article outlines a Bayesian methodology to estimate and test the Kendall rank correlation coefficient τ. The nonparametric nature of rank data implies the absence of a generative model and the lack of an explicit likelihood function. These challenges can be overcome by modeling test statistics rather than data. We also introduce a method for obtaining a default prior distribution. The combined result is an inferential methodology that yields a posterior distribution for Kendall’s τ.
Journal: The American Statistician
Pages: 303-308
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2016.1264998
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264998
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:303-308
Template-Type: ReDIF-Article 1.0
Author-Name: Agnan Kessy
Author-X-Name-First: Agnan
Author-X-Name-Last: Kessy
Author-Name: Alex Lewin
Author-X-Name-First: Alex
Author-X-Name-Last: Lewin
Author-Name: Korbinian Strimmer
Author-X-Name-First: Korbinian
Author-X-Name-Last: Strimmer
Title: Optimal Whitening and Decorrelation
Abstract:
Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example, based on principal component analysis (PCA), Cholesky matrix decomposition, and zero-phase component analysis (ZCA), among others. Here, we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows one to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.
Journal: The American Statistician
Pages: 309-314
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2016.1277159
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277159
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:309-314
Template-Type: ReDIF-Article 1.0
Author-Name: Weizhen Wang
Author-X-Name-First: Weizhen
Author-X-Name-Last: Wang
Title: A “Paradox” in Confidence Interval Construction Using Sufficient Statistics
Abstract:
Statistical inference about parameters should depend on raw data only through sufficient statistics—the well-known sufficiency principle. In particular, inference should depend on minimal sufficient statistics if these are simpler than the raw data. In this article, we construct one-sided confidence intervals for a proportion which: (i) depend on the raw binary data, and (ii) are uniformly shorter than the smallest intervals based on the binomial random variable—a minimal sufficient statistic. In practice, randomized confidence intervals are seldom used. The proposed intervals violate the aforementioned principle if the search of optimal intervals is restricted within the class of nonrandomized confidence intervals. Similar results occur for other discrete distributions.
Journal: The American Statistician
Pages: 315-320
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1305292
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305292
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:315-320
Template-Type: ReDIF-Article 1.0
Author-Name: Michael Harwell
Author-X-Name-First: Michael
Author-X-Name-Last: Harwell
Author-Name: Nidhi Kohli
Author-X-Name-First: Nidhi
Author-X-Name-Last: Kohli
Author-Name: Yadira Peralta-Torres
Author-X-Name-First: Yadira
Author-X-Name-Last: Peralta-Torres
Title: A Survey of Reporting Practices of Computer Simulation Studies in Statistical Research
Abstract:
Computer simulation studies represent an important tool for investigating processes difficult or impossible to study using mathematical theory or real data. Hoaglin and Andrews recommended these studies be treated as statistical sampling experiments subject to established principles of design and data analysis, but the survey of Hauck and Anderson suggested these recommendations had, at that point in time, generally been ignored. We update the survey results of Hauck and Anderson using a sample of studies applying simulation methods in statistical research to assess the extent to which the recommendations of Hoaglin and Andrews and others for conducting simulation studies have been adopted. The important role of statistical applications of computer simulation studies in enhancing the reproducibility of scientific findings is also discussed. The results speak to the state of the art and the extent to which these studies are realizing their potential to inform statistical practice and a program of statistical research.
Journal: The American Statistician
Pages: 321-327
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1342692
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1342692
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:321-327
Template-Type: ReDIF-Article 1.0
Author-Name: Luke A. Prendergast
Author-X-Name-First: Luke A.
Author-X-Name-Last: Prendergast
Author-Name: Robert G. Staudte
Author-X-Name-First: Robert G.
Author-X-Name-Last: Staudte
Title: A Simple and Effective Inequality Measure
Abstract:
Ratios of quantiles are often computed for income distributions as rough measures of inequality, and inference for such ratios has recently become available. The special case when the quantiles are symmetrically chosen; that is, when the p/2 quantile is divided by the (1 − p/2) quantile, is of special interest because the graph of such ratios, plotted as a function of p over the unit interval, yields an informative inequality curve. The area above the curve and less than the horizontal line at one is an easily interpretable measure of inequality. The advantages of these concepts over the traditional Lorenz curve and Gini coefficient are numerous: they are defined for all positive income distributions, they can be robustly estimated and large sample confidence intervals for the inequality coefficient are easily found. Moreover, the inequality curves satisfy a median-based transference principle and are convex for many commonly assumed income distributions.
Journal: The American Statistician
Pages: 328-343
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1366366
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1366366
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:328-343
Template-Type: ReDIF-Article 1.0
Author-Name: Jonathan D. Rosenblatt
Author-X-Name-First: Jonathan D.
Author-X-Name-Last: Rosenblatt
Author-Name: Yoav Benjamini
Author-X-Name-First: Yoav
Author-X-Name-Last: Benjamini
Title: On Mixture Alternatives and Wilcoxon’s Signed-Rank Test
Abstract:
The shift alternative model has been the canonical alternative hypothesis since the early days of statistics. This holds true both in parametric and nonparametric statistical testing. In this contribution, we argue that in several applications of interest, the shift alternative is dubious while a mixture alternative is more plausible, because the treatment is expected to affect only a subpopulation. When considering mixture hypotheses, classical tests may no longer enjoy their desirable properties. In particular, we show that the t-test may be underpowered compared to Wilcoxon’s signed-rank test, even under a Gaussian null. We consider implications to personalized medicine and medical imaging.
Journal: The American Statistician
Pages: 344-347
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1360795
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1360795
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:344-347
Template-Type: ReDIF-Article 1.0
Author-Name: M. L. Walker
Author-X-Name-First: M. L.
Author-X-Name-Last: Walker
Author-Name: Y. H. Dovoedo
Author-X-Name-First: Y. H.
Author-X-Name-Last: Dovoedo
Author-Name: S. Chakraborti
Author-X-Name-First: S.
Author-X-Name-Last: Chakraborti
Author-Name: C. W. Hilton
Author-X-Name-First: C. W.
Author-X-Name-Last: Hilton
Title: An Improved Boxplot for Univariate Data
Abstract:
The boxplot is an effective data-visualization tool useful in diverse applications and disciplines. Although more sophisticated graphical methods exist, the boxplot remains relevant due to its simplicity, interpretability, and usefulness, even in the age of big data. This article highlights the origins and developments of the boxplot that is now widely viewed as an industry standard as well as its inherent limitations when dealing with data from skewed distributions, particularly when detecting outliers. The proposed Ratio-Skewed boxplot is shown to be practical and suitable for outlier labeling across several parametric distributions.
Journal: The American Statistician
Pages: 348-353
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2018.1448891
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448891
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:348-353
Template-Type: ReDIF-Article 1.0
Author-Name: Sherri Cheng
Author-X-Name-First: Sherri
Author-X-Name-Last: Cheng
Author-Name: Mark Ferris
Author-X-Name-First: Mark
Author-X-Name-Last: Ferris
Author-Name: Jessica Perolio
Author-X-Name-First: Jessica
Author-X-Name-Last: Perolio
Title: An Innovative Classroom Approach for Developing Critical Thinkers in the Introductory Statistics Course
Abstract:
Misrepresented data and data taken out of context can be misleading at best. Statisticians present data to compel arguments, and they have a responsibility to be balanced and transparent in their use of evidence. In the classroom, learning how to analyze, interpret, and report data also needs to include explicit training in critical thinking skills, in which students explore the importance of context, assumptions, and bias. With this in mind, we integrate an innovative, multi-faceted pedagogical approach into an introductory statistics course, which incorporates writing assignments, small group discussion, and Socratic dialog. Our approach provides real-life applications for traditional statistical topics while also helping students learn to use data with integrity, ask important questions, and view problems holistically.
Journal: The American Statistician
Pages: 354-358
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1305293
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305293
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:354-358
Template-Type: ReDIF-Article 1.0
Author-Name: Alan C. Elliott
Author-X-Name-First: Alan C.
Author-X-Name-Last: Elliott
Author-Name: S. Lynne Stokes
Author-X-Name-First: S. Lynne
Author-X-Name-Last: Stokes
Author-Name: Jing Cao
Author-X-Name-First: Jing
Author-X-Name-Last: Cao
Title: Teaching Ethics in a Statistics Curriculum with a Cross-Cultural Emphasis
Abstract:
Like most professional disciplines, the ASA has adopted ethical guidelines for its practitioners. To promote these guidelines, as well as to meet governmental and institutional mandates, U.S. universities are demanding more training on ethics within existing statistics graduate student curricula. Most of this training is based on the teachings of Western philosophers. However, many statistics graduate students are from Eastern cultures (particularly Chinese), and cultural and linguistic evidence indicates that Western ethics may be difficult to translate into the philosophical concepts common to students from different cultural backgrounds. This article describes how to teach cross-cultural ethics, with emphasis on the ASA Ethical Guidelines, within a graduate-level statistical consulting course. In particular, we present content that can help students overcome cultural and language barriers to gain an understanding of ethical decision-making that is compatible with both Western and Eastern philosophical models. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 359-367
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1307140
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1307140
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:359-367
Template-Type: ReDIF-Article 1.0
Author-Name: Julian Stander
Author-X-Name-First: Julian
Author-X-Name-Last: Stander
Author-Name: Luciana Dalla Valle
Author-X-Name-First: Luciana
Author-X-Name-Last: Dalla Valle
Author-Name: Mario Cortina-Borja
Author-X-Name-First: Mario
Author-X-Name-Last: Cortina-Borja
Title: A Bayesian Survival Analysis of a Historical Dataset: How Long Do Popes Live?
Abstract:
University courses in statistical modeling often place great emphasis on methodological theory, illustrating it only briefly by means of limited and repeatedly used standard examples. Unfortunately, this approach often fails to actively engage and motivate students in their learning process. The teaching of statistical topics such as Bayesian survival analysis can be enhanced by focusing on innovative applications. Here, we discuss the visualization and modeling of a dataset of historical events comprising the post-election survival times of popes. Inference, prediction, and model checking are performed in the Bayesian framework, with comparisons being made with the frequentist approach. Further opportunities for similar statistical investigations are outlined. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 368-375
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1328374
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1328374
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:368-375
Template-Type: ReDIF-Article 1.0
Author-Name: Simon Demers
Author-X-Name-First: Simon
Author-X-Name-Last: Demers
Title: Taylor's Law Holds for Finite OEIS Integer Sequences and Binomial Coefficients
Abstract:
Taylor's law (TL) predicts that the variance and the mean will be related empirically through a power-law function. TL previously has been shown to arise even in the absence of biological, ecological or physical processes. We report here that the mean and variance of 110 finite integer sequences in the On-Line Encyclopedia of Integer Sequences (OEIS) obey TL approximately. We also show that the binomial coefficients on each row of Pascal's triangle obey TL asymptotically. These applications of TL to seemingly unrelated mathematical structures tend to confirm there might be purely statistical, context-independent mechanisms at play. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 376-378
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1422439
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1422439
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:376-378
Template-Type: ReDIF-Article 1.0
Author-Name: DeWayne Derryberry
Author-X-Name-First: DeWayne
Author-X-Name-Last: Derryberry
Author-Name: Ken Aho
Author-X-Name-First: Ken
Author-X-Name-Last: Aho
Author-Name: John Edwards
Author-X-Name-First: John
Author-X-Name-Last: Edwards
Author-Name: Teri Peterson
Author-X-Name-First: Teri
Author-X-Name-Last: Peterson
Title: Model Selection and Regression t-Statistics
Abstract:
It is shown that dropping quantitative variables from a linear regression, based on t-statistics, is mathematically equivalent to dropping variables based on commonly used information criteria.
Journal: The American Statistician
Pages: 379-381
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2018.1459316
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459316
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:379-381
Template-Type: ReDIF-Article 1.0
Author-Name: Stephanie C. Hicks
Author-X-Name-First: Stephanie C.
Author-X-Name-Last: Hicks
Author-Name: Rafael A. Irizarry
Author-X-Name-First: Rafael A.
Author-X-Name-Last: Irizarry
Title: A Guide to Teaching Data Science
Abstract:
Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed in 1999. We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch in 1999 and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 382-391
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2017.1356747
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1356747
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:382-391
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on Knaeble and Dutter (2017)
Journal: The American Statistician
Pages: 392-393
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2016.1278036
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1278036
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:392-393
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Corrigenda
Journal: The American Statistician
Pages: 394-394
Issue: 4
Volume: 72
Year: 2018
Month: 10
X-DOI: 10.1080/00031305.2018.1523641
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1523641
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:394-394
Template-Type: ReDIF-Article 1.0
Author-Name: Feifei Wang
Author-X-Name-First: Feifei
Author-X-Name-Last: Wang
Author-Name: Jian Wang
Author-X-Name-First: Jian
Author-X-Name-Last: Wang
Author-Name: Alan E. Gelfand
Author-X-Name-First: Alan E.
Author-X-Name-Last: Gelfand
Author-Name: Fan Li
Author-X-Name-First: Fan
Author-X-Name-Last: Li
Title: Disease Mapping With Generative Models
Abstract:
Disease mapping focuses on learning about areal units presenting high relative risk. Disease mapping models assume that the disease counts are distributed as Poisson random variables with the respective means typically specified as the product of the relative risk and the expected count. These models usually incorporate spatial random effects to accomplish spatial smoothing of the relative risks. Fitting of these models often computes expected disease counts via internal standardization. This places the data on both sides of the model, that is, the counts are on the left side but they are also used to obtain the expected counts on the right side. As a result, these internally standardized models are incoherent and not generative; probabilistically, they could not produce the data we observe. Here, we argue for adopting the direct generative model for disease counts, modeling disease incidence rates instead of relative risks, using a generalized logistic regression. The relative risks are then extracted post model fitting. We first demonstrate the benefit of the generative model without incorporating spatial smoothing using simulation. Then, spatial smoothing is introduced using the customary conditionally autoregressive model. We also extend the generative model to dynamic settings. The generative models are compared with internally standardized models, again through simulated datasets but also through a well-examined lung cancer morbidity dataset in Ohio. Both models are spatial and both smooth the data similarly with regard to relative risks. However, the generative coherent models tend to provide tighter credible intervals. Since the generative specification is coherent, is at least as good inferentially, and is no more difficult to fit, we suggest that it should be the model of choice for spatial disease mapping.
Journal: The American Statistician
Pages: 213-223
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1392358
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392358
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:213-223
Template-Type: ReDIF-Article 1.0
Author-Name: Peter K. Dunn
Author-X-Name-First: Peter K.
Author-X-Name-Last: Dunn
Author-Name: Margaret Marshman
Author-X-Name-First: Margaret
Author-X-Name-Last: Marshman
Author-Name: Robert McDougall
Author-X-Name-First: Robert
Author-X-Name-Last: McDougall
Title: Evaluating Wikipedia as a Self-Learning Resource for Statistics: You Know They'll Use It
Abstract:
The role of Wikipedia for learning has been debated because it does not conform to the usual standards. Despite this, people use it, due to the ubiquity of Wikipedia entries in the outcomes from popular search engines. It is important for academic disciplines, including statistics, to ensure they are correctly represented in a medium where anyone can assume the role of discipline expert. In this context, we first develop a tool for evaluating Wikipedia articles for topics with a procedural component. Then, using this tool, five Wikipedia articles on basic statistical concepts are critiqued from the point of view of a self-learner: “arithmetic mean,” “standard deviation,” “standard error,” “confidence interval,” and “histogram.” We find that the articles, in general, are poor, and some articles contain inaccuracies. We propose that Wikipedia be actively discouraged for self-learning (using, for example, a classroom activity) except to give a brief overview; that in more formal learning environments, teachers be explicit about not using Wikipedia as a learning resource for course content; and, because Wikipedia is used regardless of considered advice or the organizational protocols in place, teachers move away from minimal contact with Wikipedia towards more constructive engagement.
Journal: The American Statistician
Pages: 224-231
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1392360
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392360
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:224-231
Template-Type: ReDIF-Article 1.0
Author-Name: Thomas J. Fisher
Author-X-Name-First: Thomas J.
Author-X-Name-Last: Fisher
Author-Name: Michael W. Robbins
Author-X-Name-First: Michael W.
Author-X-Name-Last: Robbins
Title: A Cheap Trick to Improve the Power of a Conservative Hypothesis Test
Abstract:
Critical values and p-values of statistical hypothesis tests are often derived using asymptotic approximations of sampling distributions. However, this sometimes results in tests that are conservative (i.e., understate the frequency of an incorrectly rejected null hypothesis by employing too stringent a threshold for rejection). Although computationally rigorous options (e.g., the bootstrap) are available for such situations, we illustrate that simple transformations can be used to improve both the size and power of such tests. Using a logarithmic transformation, we show that the transformed statistic is asymptotically equivalent to its untransformed analogue under the null hypothesis and is divergent from the untransformed version under the alternative (yielding a potentially substantial increase in power). The transformation is applied to several easily accessible statistical hypothesis tests, a few of which are taught in introductory statistics courses. With theoretical arguments and simulations, we illustrate that the log transformation is preferable to other forms of correction (such as statistics that use a multiplier). Finally, we illustrate application of the method to a well-known dataset. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 232-242
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1395364
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395364
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:232-242
Template-Type: ReDIF-Article 1.0
Author-Name: Geoffrey K. Robinson
Author-X-Name-First: Geoffrey K.
Author-X-Name-Last: Robinson
Title: What Properties Might Statistical Inferences Reasonably be Expected to Have?—Crisis and Resolution in Statistical Inference
Abstract:
There is a crisis in the foundations of statistical inference. I believe that this crisis will eventually be resolved by regarding the subjective Bayesian paradigm as ideal in principle but often using standard procedures which are not subjective Bayesian for well-defined standard circumstances. As a step toward this resolution, this article looks at the question of what properties statistical inferences might reasonably be expected to have and argues that the use of p-values should be restricted to pure significance testing. The value judgments presented are supported by a range of examples.
Journal: The American Statistician
Pages: 243-252
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1415971
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1415971
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:243-252
Template-Type: ReDIF-Article 1.0
Author-Name: Jeff Allen
Author-X-Name-First: Jeff
Author-X-Name-Last: Allen
Title: Who Wants to be a Statistician? An Analysis of ACT-Tested Public School Students
Abstract:
This study examines predictors of statistics as occupation choice while in high school. The overall rate of choosing statistics was 1 per 1,681 students. Females, Asian students, students from the southern United States, and students from rural schools were less likely to choose statistics, and there was an increase in statistics choice rates between 2014 and 2017. Differences across other socio-demographic groups were small after accounting for other predictors. The strongest predictors of statistics choice were ACT Mathematics score and a measure of vocational interests corresponding to Holland's Conventional personality type. The results of the study can be used to identify high school students with interest and achievement profiles that are common among prospective statisticians, and to gain a better understanding of factors that affect statistics occupation choice. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 253-263
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1419143
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1419143
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:253-263
Template-Type: ReDIF-Article 1.0
Author-Name: Jacob Goldin
Author-X-Name-First: Jacob
Author-X-Name-Last: Goldin
Author-Name: Daniel Reck
Author-X-Name-First: Daniel
Author-X-Name-Last: Reck
Title: The Analysis of Survey Data with Framing Effects
Abstract:
A well-known difficulty in survey research is that respondents’ answers to questions can depend on arbitrary features of a survey’s design, such as the wording of questions or the ordering of answer choices. In this paper, we describe a novel set of tools for analyzing survey data characterized by such framing effects. We show that the conventional approach to analyzing data with framing effects—randomizing survey-takers across frames and pooling the responses—generally does not identify a useful parameter. In its place, we propose an alternative approach and provide conditions under which it identifies the responses that are unaffected by framing. We also present several results for shedding light on the population distribution of the individual characteristic the survey is designed to measure.
Journal: The American Statistician
Pages: 264-272
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1407358
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407358
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:264-272
Template-Type: ReDIF-Article 1.0
Author-Name: Hakan Demirtas
Author-X-Name-First: Hakan
Author-X-Name-Last: Demirtas
Title: Inducing Any Feasible Level of Correlation to Bivariate Data With Any Marginals
Abstract:
A simple sorting approach for inducing any desired Pearson or Spearman correlation to independent bivariate data, whose marginals can be of any distributional type and nature is described and illustrated through examples that span a broad range of situations. The proposed method has substantial potential in simulated settings that involve random number generation.
Journal: The American Statistician
Pages: 273-277
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1379438
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1379438
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:273-277
Template-Type: ReDIF-Article 1.0
Author-Name: J. G. Liao
Author-X-Name-First: J. G.
Author-X-Name-Last: Liao
Author-Name: Arthur Berg
Author-X-Name-First: Arthur
Author-X-Name-Last: Berg
Title: Sharpening Jensen's Inequality
Abstract:
This article proposes a new sharpened version of Jensen's inequality. The proposed new bound is simple and insightful, is broadly applicable by imposing minimum assumptions, and provides fairly accurate results in spite of its simple form. Applications to the moment generating function, power mean inequalities, and Rao-Blackwell estimation are presented. This presentation can be incorporated in any calculus-based statistical course.
Journal: The American Statistician
Pages: 278-281
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2017.1419145
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1419145
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:278-281
Template-Type: ReDIF-Article 1.0
Author-Name: Ryoungsun Park
Author-X-Name-First: Ryoungsun
Author-X-Name-Last: Park
Title: Practical Teaching Strategies for Hypothesis Testing
Abstract:
Teaching the concept of inferential statistics is one of the most challenging tasks for statistics educators. Often, students cannot make logical connections between inferential statistics and other topics such as descriptive statistics and probability. The source of difficulty may be that inferential statistics is based on complex ideas such as hypothetical reasoning, data analytic methods, and probabilistic thinking. This article presents classroom practices that teachers can easily adapt for their statistics classes to teach fundamental ideas of inferential statistics. The expected educational outcome is the conceptual understanding of the elements of statistical testing rather than learning about a specific testing methodology. Using the proposed practices, students are guided to propose their own hypotheses, collect actual data, and make their own inferences, rather than following a predetermined sequence of procedures. The practice material is divided into three subtasks, so that teachers can plan their curriculum effectively and perform formative assessments regarding students' progress.
Journal: The American Statistician
Pages: 282-287
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2018.1424034
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1424034
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:282-287
Template-Type: ReDIF-Article 1.0
Author-Name: Peter S. Fader
Author-X-Name-First: Peter S.
Author-X-Name-Last: Fader
Author-Name: Bruce G. S. Hardie
Author-X-Name-First: Bruce G. S.
Author-X-Name-Last: Hardie
Author-Name: Daniel McCarthy
Author-X-Name-First: Daniel
Author-X-Name-Last: McCarthy
Author-Name: Ramnath Vaidyanathan
Author-X-Name-First: Ramnath
Author-X-Name-Last: Vaidyanathan
Title: Exploring the Equivalence of Two Common Mixture Models for Duration Data
Abstract:
The beta-geometric (BG) distribution and the Pareto distribution of the second kind (P(II)) are two basic models for duration-time data that share some underlying characteristics (i.e., continuous mixtures of memoryless distributions), but differ in two important respects: first, the BG is the natural model to use when the event of interest occurs in discrete time, while the P(II) is the right choice for a continuous-time setting. Second, the underlying mixing distributions (the beta and gamma for the BG and P(II), respectively), are very different—and often believed to be noncomparable with each other. Despite these and other key differences, the two models are strikingly similar in terms of their fit and predictive performance as well as their parameter estimates. We explore this equivalence, both empirically and analytically, and discuss the implications from both a substantive and methodological standpoint.
Journal: The American Statistician
Pages: 288-295
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2018.1543134
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543134
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:288-295
Template-Type: ReDIF-Article 1.0
Author-Name: Hongmei Zhang
Author-X-Name-First: Hongmei
Author-X-Name-Last: Zhang
Author-Name: Yubo Zou
Author-X-Name-First: Yubo
Author-X-Name-Last: Zou
Author-Name: Will Terry
Author-X-Name-First: Will
Author-X-Name-Last: Terry
Author-Name: Wilfried Karmaus
Author-X-Name-First: Wilfried
Author-X-Name-Last: Karmaus
Author-Name: Hasan Arshad
Author-X-Name-First: Hasan
Author-X-Name-Last: Arshad
Title: Joint Clustering With Correlated Variables
Abstract:
Traditional clustering methods focus on grouping subjects or (dependent) variables assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared to existing clustering techniques, the major novelty of the method exists in its ability to improve the homogeneity of clusters, along with the ability to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has a potential to reduce the occurrence of asthma.
Journal: The American Statistician
Pages: 296-306
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2018.1424033
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1424033
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:296-306
Template-Type: ReDIF-Article 1.0
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Author-Name: Ben Goodrich
Author-X-Name-First: Ben
Author-X-Name-Last: Goodrich
Author-Name: Jonah Gabry
Author-X-Name-First: Jonah
Author-X-Name-Last: Gabry
Author-Name: Aki Vehtari
Author-X-Name-First: Aki
Author-X-Name-Last: Vehtari
Title: R-squared for Bayesian Regression Models
Abstract:
The usual definition of R2 (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the variance of the predicted values divided by the variance of predicted values plus the expected variance of the errors.
Journal: The American Statistician
Pages: 307-309
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2018.1549100
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1549100
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:307-309
Template-Type: ReDIF-Article 1.0
Author-Name: Silas Bergen
Author-X-Name-First: Silas
Author-X-Name-Last: Bergen
Title: Displaying Time Series, Spatial, and Space-Time Data with R, 2nd ed
Journal: The American Statistician
Pages: 310-311
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2019.1641357
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1641357
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:310-311
Template-Type: ReDIF-Article 1.0
Author-Name: Thaddeus Tarpey
Author-X-Name-First: Thaddeus
Author-X-Name-Last: Tarpey
Author-Name: Eva Petkova
Author-X-Name-First: Eva
Author-X-Name-Last: Petkova
Title: Letter to the Editor
Journal: The American Statistician
Pages: 312-312
Issue: 3
Volume: 73
Year: 2019
Month: 7
X-DOI: 10.1080/00031305.2018.1537894
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1537894
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:3:p:312-312
Template-Type: ReDIF-Article 1.0
Author-Name: Jiangtao Gou
Author-X-Name-First: Jiangtao
Author-X-Name-Last: Gou
Author-Name: Fengqing (Zoe) Zhang
Author-X-Name-First: Fengqing (Zoe)
Author-X-Name-Last: Zhang
Title: Experience Simpson's Paradox in the Classroom
Abstract:
Simpson's paradox is a challenging topic to teach in an introductory statistics course. To motivate students to understand this paradox both intuitively and statistically, this article introduces several new ways to teach Simpson's paradox. We design a paper toss activity between instructors and students in class to engage students in the learning process. We show that Simpson's paradox widely exists in basketball statistics, and thus instructors may consider looking for Simpson's paradox in their own school basketball teams as examples to motivate students’ interest. A new probabilistic explanation of Simpson's paradox is provided, which helps foster students’ statistical understanding. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 61-66
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1200485
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200485
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:61-66
Template-Type: ReDIF-Article 1.0
Author-Name: Christine M. Anderson-Cook
Author-X-Name-First: Christine M.
Author-X-Name-Last: Anderson-Cook
Author-Name: Michael S. Hamada
Author-X-Name-First: Michael S.
Author-X-Name-Last: Hamada
Author-Name: Leslie M. Moore
Author-X-Name-First: Leslie M.
Author-X-Name-Last: Moore
Author-Name: Joanne R. Wendelberger
Author-X-Name-First: Joanne R.
Author-X-Name-Last: Wendelberger
Title: Statistical Mentoring at Early Training and Career Stages
Abstract:
At Los Alamos National Laboratory (LANL), statistical scientists develop solutions for a variety of national security challenges through scientific excellence, typically as members of interdisciplinary teams. At LANL, mentoring is actively encouraged and practiced to develop statistical skills and positive career-building behaviors. Mentoring activities targeted at different career phases from student to junior staff are an important catalyst for both short and long term career development. This article discusses mentoring strategies for undergraduate and graduate students through internships as well as for postdoctoral research associates and junior staff. Topics addressed include project selection, progress, and outcome; intellectual and social activities that complement the student internship experience; key skills/knowledge not typically obtained in academic training; and the impact of such internships on students’ careers. Experiences and strategies from a number of successful mentorships are presented. Feedback from former mentees obtained via a questionnaire is incorporated. These responses address some of the benefits the respondents received from mentoring, helpful contributions and advice from their mentors, key skills learned, and how mentoring impacted their later careers.
Journal: The American Statistician
Pages: 6-14
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1200491
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200491
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:6-14
Template-Type: ReDIF-Article 1.0
Author-Name: Leandro da Silva Pereira
Author-X-Name-First: Leandro da Silva
Author-X-Name-Last: Pereira
Author-Name: Lucas Monteiro Chaves
Author-X-Name-First: Lucas Monteiro
Author-X-Name-Last: Chaves
Author-Name: Devanil Jaques de Souza
Author-X-Name-First: Devanil Jaques
Author-X-Name-Last: de Souza
Title: An Intuitive Geometric Approach to the Gauss Markov Theorem
Abstract:
Algebraic proofs of the Gauss–Markov theorem are very disappointing from an intuitive point of view. An alternative is to use geometry that emphasizes the essential statistical ideas behind the result. This article presents a truly geometric, intuitive approach to the theorem, based only on simple geometric concepts, such as linear subspaces and orthogonal projections.
Journal: The American Statistician
Pages: 67-70
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1209127
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1209127
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:67-70
Template-Type: ReDIF-Article 1.0
Author-Name: Li Zhu
Author-X-Name-First: Li
Author-X-Name-Last: Zhu
Author-Name: Kimberly F. Sellers
Author-X-Name-First: Kimberly F.
Author-X-Name-Last: Sellers
Author-Name: Darcy Steeg Morris
Author-X-Name-First: Darcy Steeg
Author-X-Name-Last: Morris
Author-Name: Galit Shmueli
Author-X-Name-First: Galit
Author-X-Name-Last: Shmueli
Title: Bridging the Gap: A Generalized Stochastic Process for Count Data
Abstract:
The Bernoulli and Poisson processes are two popular discrete count processes; however, both rely on strict assumptions. We instead propose a generalized homogeneous count process (which we name the Conway–Maxwell–Poisson or COM-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexible mechanism to describe count processes that approximate data with over- or under-dispersion. We introduce the process and an associated generalized waiting time distribution with several real-data applications to illustrate its flexibility for a variety of data structures. We consider model estimation under different scenarios of data availability, and assess performance through simulated and real datasets. This new generalized process will enable analysts to better model count processes where data dispersion exists in a more accommodating and flexible manner.
Journal: The American Statistician
Pages: 71-80
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1234976
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1234976
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:71-80
Template-Type: ReDIF-Article 1.0
Author-Name: Edward L. Ionides
Author-X-Name-First: Edward L.
Author-X-Name-Last: Ionides
Author-Name: Alexander Giessing
Author-X-Name-First: Alexander
Author-X-Name-Last: Giessing
Author-Name: Yaacov Ritov
Author-X-Name-First: Yaacov
Author-X-Name-Last: Ritov
Author-Name: Scott E. Page
Author-X-Name-First: Scott E.
Author-X-Name-Last: Page
Title: Response to the ASA’s Statement on p-Values: Context, Process, and Purpose
Journal: The American Statistician
Pages: 88-89
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1234977
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1234977
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:88-89
Template-Type: ReDIF-Article 1.0
Author-Name: Richard G. Spencer
Author-X-Name-First: Richard G.
Author-X-Name-Last: Spencer
Author-Name: Benjamin D. Cortese
Author-X-Name-First: Benjamin D.
Author-X-Name-Last: Cortese
Author-Name: Vanessa A. Lukas
Author-X-Name-First: Vanessa A.
Author-X-Name-Last: Lukas
Author-Name: Nancy Pleshko
Author-X-Name-First: Nancy
Author-X-Name-Last: Pleshko
Title: Point Estimates of Test Sensitivity and Specificity from Sample Means and Variances
Abstract:
In a wide variety of biomedical and clinical research studies, sample statistics from diagnostic marker measurements are presented as a means of distinguishing between two populations, such as with and without disease. Intuitively, a larger difference between the mean values of a marker for the two populations, and a smaller spread of values within each population, should lead to more reliable classification rules based on this marker. We formalize this intuitive notion by deriving practical, new, closed-form expressions for the sensitivity and specificity of three different discriminant tests defined in terms of the sample means and standard deviations of diagnostic marker measurements. The three discriminant tests evaluated are based, respectively, on the Euclidean distance and the Mahalanobis distance between means, and a likelihood ratio analysis. Expressions for the effects of measurement error are also presented. Our final expressions assume that the diagnostic markers follow independent normal distributions for the two populations, although it will be clear that other known distributions may be similarly analyzed. We then discuss applications drawn from the medical literature, although the formalism is clearly not restricted to that application.
Journal: The American Statistician
Pages: 81-87
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1239589
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1239589
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:81-87
Template-Type: ReDIF-Article 1.0
Author-Name: Eric A. Vance
Author-X-Name-First: Eric A.
Author-X-Name-Last: Vance
Author-Name: Donna E. LaLonde
Author-X-Name-First: Donna E.
Author-X-Name-Last: LaLonde
Author-Name: Lin Zhang
Author-X-Name-First: Lin
Author-X-Name-Last: Zhang
Title: The Big Tent for Statistics: Mentoring Required
Abstract:
Research supports the positive impact of mentoring on both job and career satisfaction. Recognizing this, the American Statistical Association (ASA) has started a new mission-centered focus on mentoring. This article describes the development and implementation of meeting-based mentoring programs at four ASA conferences in 2014 and 2015. We present results of the feedback evaluations from program participants and use them to motivate recommendations for creating and running conference mentoring programs and overcoming common challenges. These recommendations are applicable to creating and running conference mentoring programs in any field. We conclude with a discussion of the opportunities for the ASA to augment its mentoring programs in support of the professional development of its members.
Journal: The American Statistician
Pages: 15-22
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1247016
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1247016
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:15-22
Template-Type: ReDIF-Article 1.0
Author-Name: Eric A. Vance
Author-X-Name-First: Eric A.
Author-X-Name-Last: Vance
Author-Name: Erin Tanenbaum
Author-X-Name-First: Erin
Author-X-Name-Last: Tanenbaum
Author-Name: Amarjot Kaur
Author-X-Name-First: Amarjot
Author-X-Name-Last: Kaur
Author-Name: Mark C. Otto
Author-X-Name-First: Mark C.
Author-X-Name-Last: Otto
Author-Name: Richard Morris
Author-X-Name-First: Richard
Author-X-Name-Last: Morris
Title: An Eight-Step Guide to Creating and Sustaining a Mentoring Program
Abstract:
Mentoring is an extremely valuable activity for both individuals and organizations. Mentoring within organizations can develop and integrate employees into their corporate culture. Mentoring outside the mentees’ work groups or through professional development organizations can give broader perspective and support, especially in times of transition. But mentoring programs require tremendous effort to start, organize, and maintain. Few last more than two years. This article provides a structured approach to starting and sustaining a successful program. The steps include understanding an organization’s particular needs, learning from small pilot programs, following up with mentoring pairs during a committed formal mentoring period, and evaluating results from each program’s cycle to learn and grow the program. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 23-29
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1251493
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1251493
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:23-29
Template-Type: ReDIF-Article 1.0
Author-Name: Mark Daniel Ward
Author-X-Name-First: Mark Daniel
Author-X-Name-Last: Ward
Title: Building Bridges: The Role of an Undergraduate Mentor
Abstract:
I share some advice and lessons that I have learned from working with many wonderful students and colleagues, in my role as Undergraduate Chair of Statistics at Purdue University since 2008. I also reflect on developing, implementing, and sustaining a new living, learning community environment for statistics students.
Journal: The American Statistician
Pages: 30-33
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1251494
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1251494
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:30-33
Template-Type: ReDIF-Article 1.0
Author-Name: Lauren Vollmer
Author-X-Name-First: Lauren
Author-X-Name-Last: Vollmer
Author-Name: Aparna Keshaviah
Author-X-Name-First: Aparna
Author-X-Name-Last: Keshaviah
Author-Name: Dmitriy Poznyak
Author-X-Name-First: Dmitriy
Author-X-Name-Last: Poznyak
Author-Name: Sharon Zhao
Author-X-Name-First: Sharon
Author-X-Name-Last: Zhao
Author-Name: Fei Xing
Author-X-Name-First: Fei
Author-X-Name-Last: Xing
Author-Name: Nicholas Beyler
Author-X-Name-First: Nicholas
Author-X-Name-Last: Beyler
Title: Re-Defining the Who, What, and Where of Mentoring for Professional Statisticians
Abstract:
Organizations tailor their mentoring strategies to accommodate internal resources and preferences, producing different approaches in academic, government, and corporate environments. Across these settings, three common barriers impede effective mentoring of statisticians: overspecialization, time constraints, and geographic dispersion. The authors share mentoring strategies that have emerged at their organization, Mathematica Policy Research, to overcome these obstacles. Practices include creating a methodology working group to unite researchers with diverse backgrounds, integrating mentoring into existing workflows, and harnessing modern technological infrastructure to facilitate virtual mentoring. Although these strategies emerged within a specific professional context, they suggest opportunities for statisticians to expand the channels through which mentorship can occur.
Journal: The American Statistician
Pages: 34-37
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1255256
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255256
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:34-37
Template-Type: ReDIF-Article 1.0
Author-Name: Kim Love
Author-X-Name-First: Kim
Author-X-Name-Last: Love
Author-Name: Eric A. Vance
Author-X-Name-First: Eric A.
Author-X-Name-Last: Vance
Author-Name: Frank E. Harrell
Author-X-Name-First: Frank E.
Author-X-Name-Last: Harrell
Author-Name: Dallas E. Johnson
Author-X-Name-First: Dallas E.
Author-X-Name-Last: Johnson
Author-Name: Michael H. Kutner
Author-X-Name-First: Michael H.
Author-X-Name-Last: Kutner
Author-Name: Ronald D. Snee
Author-X-Name-First: Ronald D.
Author-X-Name-Last: Snee
Author-Name: Doug Zahn
Author-X-Name-First: Doug
Author-X-Name-Last: Zahn
Title: Developing a Career in the Practice of Statistics: The Mentor's Perspective
Abstract:
The W.J. Dixon Award for Excellence in Statistical Consulting is given by the American Statistical Association to “a distinguished individual who has demonstrated excellence in statistical consulting or developed and contributed new methods, software, or ways of thinking that improve statistical practice in general.” In this article, five of the seven past recipients of this career-capping award share their experiences and perspectives through 10 stepping stones that move a practicing statistician from consultant to collaborator to leader. We highlight the need for mentorship throughout the discussion, and provide direction for statisticians who would like to incorporate this advice into their careers.
Journal: The American Statistician
Pages: 38-46
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1255257
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255257
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:38-46
Template-Type: ReDIF-Article 1.0
Author-Name: Amanda L. Golbeck
Author-X-Name-First: Amanda L.
Author-X-Name-Last: Golbeck
Title: Mentoring Faculty Women in Statistics: Exploring Challenges and Opportunities for Leadership Development
Abstract:
The problems for faculty women in statistics (FWIS) in the United States are complex and call for programs that aim to develop inclusive leadership competencies among both FWIS and faculty men in statistics (FMIS) regardless of whether they currently hold, or aspire to, administrative positions. Data indicate that, among faculty in doctorate-granting departments of statistics and biostatistics, there is a disparity between genders in numbers of role models or exemplars. Yet we note that there have been some innovative national initiatives over the years in mentoring, networking, or leadership that have been instrumental in advancing FWIS. Given current understandings of the role of implicit bias in sustaining a differential status for FWIS, this discussion emphasizes a new approach as a way to further advance FWIS: one that involves the development of inclusive leadership among both men and women toward promoting inclusive faculty cultures in statistics.
Journal: The American Statistician
Pages: 47-54
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1255658
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255658
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:47-54
Template-Type: ReDIF-Article 1.0
Author-Name: Jacqueline M. Hughes-Oliver
Author-X-Name-First: Jacqueline M.
Author-X-Name-Last: Hughes-Oliver
Title: Mentoring to Achieve Diversity in Graduate Programs
Abstract:
The discipline of statistics has a celebrated, diverse, and colorful past. With a definite international flavor, we continue to make great strides in keeping our discipline relevant and accessible for addressing significant societal concerns. Unfortunately, we lag behind many other disciplines when it comes to fully tapping into the potential of all demographic groups within the United States. Mentoring provides one of many opportunities to change this narrative. This article looks at hard numbers related to diversity, points to some existing successful mentoring programs, and is a reflection of lessons learned through personal experiences.
Journal: The American Statistician
Pages: 55-60
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1255661
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255661
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:55-60
Template-Type: ReDIF-Article 1.0
Author-Name: Susan E. Hodge
Author-X-Name-First: Susan E.
Author-X-Name-Last: Hodge
Title: Letter to the Editor: Average Entropy Does Not Measure Uncertainty
Journal: The American Statistician
Pages: 89-90
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1265586
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1265586
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:89-90
Template-Type: ReDIF-Article 1.0
Author-Name: Mary Kwasny
Author-X-Name-First: Mary
Author-X-Name-Last: Kwasny
Title: Mentoring in the ASA: A Rejoinder
Journal: The American Statistician
Pages: 5-5
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1268502
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1268502
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:5-5
Template-Type: ReDIF-Article 1.0
Author-Name: David Morganstein
Author-X-Name-First: David
Author-X-Name-Last: Morganstein
Title: Mentoring in the ASA: A Commentary
Journal: The American Statistician
Pages: 3-4
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1268504
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1268504
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:3-4
Template-Type: ReDIF-Article 1.0
Author-Name: Omar A. Kittaneh
Author-X-Name-First: Omar A.
Author-X-Name-Last: Kittaneh
Title: Response to "Average Entropy Does Not Measure Uncertainty"
Journal: The American Statistician
Pages: 91-91
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1269484
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1269484
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:91-91
Template-Type: ReDIF-Article 1.0
Author-Name: Aarti Shah
Author-X-Name-First: Aarti
Author-X-Name-Last: Shah
Title: What is Mentoring?
Abstract:
What is mentoring? Is it just a buzz word or is this really valuable? How can mentoring help one to grow and advance personally and professionally? How and where does one even begin? Many of us have these questions. In this article, I will share my perspective and provide some reflections on these questions based on my own personal and professional journey.
Journal: The American Statistician
Pages: 1-2
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2016.1269686
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1269686
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:1-2
Template-Type: ReDIF-Article 1.0
Author-Name: Reza Ramezan
Author-X-Name-First: Reza
Author-X-Name-Last: Ramezan
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 92-96
Issue: 1
Volume: 71
Year: 2017
Month: 1
X-DOI: 10.1080/00031305.2017.1271242
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1271242
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:1:p:92-96
Template-Type: ReDIF-Article 1.0
Author-Name: Aniko Szabo
Author-X-Name-First: Aniko
Author-X-Name-Last: Szabo
Title: Test for Trend With a Multinomial Outcome
Abstract:
There is no established procedure for testing for trend with nominal outcomes that would provide both a global hypothesis test and outcome-specific inference. We derive a simple formula for such a test using a weighted sum of Cochran–Armitage test statistics evaluating the trend in each outcome separately. The test is shown to be equivalent to the score test for multinomial logistic regression; however, the new formulation enables the derivation of a sample size formula and multiplicity-adjusted inference for individual outcomes. The proposed methods are implemented in the R package multiCA.
Journal: The American Statistician
Pages: 313-320
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2017.1407823
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407823
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:313-320
Template-Type: ReDIF-Article 1.0
Author-Name: Aaron McDaid
Author-X-Name-First: Aaron
Author-X-Name-Last: McDaid
Author-Name: Zoltán Kutalik
Author-X-Name-First: Zoltán
Author-X-Name-Last: Kutalik
Author-Name: Valentin Rousson
Author-X-Name-First: Valentin
Author-X-Name-Last: Rousson
Title: A Five-Decision Testing Procedure to Infer the Value of a Unidimensional Parameter
Abstract:
A statistical test can be seen as a procedure to produce a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability to make a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject or not reject a null hypothesis), Kaiser’s directional two-sided test as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain of statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein’s theories. In this article, we introduce a five-decision rule testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the testing procedures of Kaiser (no assumption on a point hypothesis being impossible) and Jones and Tukey (higher power), allowing for a nonnegligible (typically 20%) reduction of the sample size needed to reach a given statistical power to get a significant result, compared to the traditional approach.
Journal: The American Statistician
Pages: 321-326
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1437075
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437075
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:321-326
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher Harms
Author-X-Name-First: Christopher
Author-X-Name-Last: Harms
Title: A Bayes Factor for Replications of ANOVA Results
Abstract:
With an increasing number of replication studies performed in psychological science, the question of how to evaluate the outcome of a replication attempt deserves careful consideration. Bayesian approaches, by design, allow one to incorporate uncertainty and prior information into the analysis of the replication attempt. The Replication Bayes factor, introduced by Verhagen and Wagenmakers (2014), provides quantitative, relative evidence in favor of or against a successful replication. In previous work by Verhagen and Wagenmakers (2014), it was limited to the case of t-tests. In this article, the Replication Bayes factor is extended to F-tests in multigroup, fixed-effect ANOVA designs. Simulations and examples are presented to facilitate the understanding and to demonstrate the usefulness of this approach. Finally, the Replication Bayes factor is compared to other Bayesian and frequentist approaches and discussed in the context of replication attempts. R code to calculate Replication Bayes factors and to reproduce the examples in the article is available at https://osf.io/jv39h/.
Journal: The American Statistician
Pages: 327-339
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1518787
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518787
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:327-339
Template-Type: ReDIF-Article 1.0
Author-Name: Arnab Kumar Maity
Author-X-Name-First: Arnab Kumar
Author-X-Name-Last: Maity
Author-Name: Vivek Pradhan
Author-X-Name-First: Vivek
Author-X-Name-Last: Pradhan
Author-Name: Ujjwal Das
Author-X-Name-First: Ujjwal
Author-X-Name-Last: Das
Title: Bias Reduction in Logistic Regression with Missing Responses When the Missing Data Mechanism is Nonignorable
Abstract:
In logistic regression with nonignorable missing responses, Ibrahim and Lipsitz proposed a method for estimating regression parameters. It is known that the regression estimates obtained by using this method are biased when the sample size is small. Another complexity arises when the iterative estimation process encounters separation in estimating the regression coefficients. In this article, we propose a method to improve the estimation of the regression coefficients. In our likelihood-based method, we penalize the likelihood by multiplying it by a noninformative Jeffreys prior as a penalty term. The proposed method reduces bias and is able to handle the issue of separation. Simulation results show substantial bias reduction for the proposed method as compared to the existing method. Analyses using real-world data also support the simulation findings. An R package called brlrmr is developed implementing the proposed method and the Ibrahim and Lipsitz method.
Journal: The American Statistician
Pages: 340-349
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2017.1407359
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407359
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:340-349
Template-Type: ReDIF-Article 1.0
Author-Name: Yueh-Yun Chi
Author-X-Name-First: Yueh-Yun
Author-X-Name-Last: Chi
Author-Name: Deborah H. Glueck
Author-X-Name-First: Deborah H.
Author-X-Name-Last: Glueck
Author-Name: Keith E. Muller
Author-X-Name-First: Keith E.
Author-X-Name-Last: Muller
Title: Power and Sample Size for Fixed-Effects Inference in Reversible Linear Mixed Models
Abstract:
Despite the popularity of the general linear mixed model for data analysis, power and sample size methods and software are not generally available for commonly used test statistics and reference distributions. Statisticians resort to simulations with homegrown and uncertified programs or rough approximations which are misaligned with the data analysis. For a wide range of designs with longitudinal and clustering features, we provide accurate power and sample size approximations for inference about fixed effects in the linear models we call reversible. We show that under widely applicable conditions, the general linear mixed-model Wald test has noncentral distributions equivalent to well-studied multivariate tests. In turn, exact and approximate power and sample size results for the multivariate Hotelling–Lawley test provide exact and approximate power and sample size results for the mixed-model Wald test. The calculations are easily computed with a free, open-source product that requires only a web browser to use. Commercial software can be used for a smaller range of reversible models. Simple approximations allow accounting for modest amounts of missing data. A real-world example illustrates the methods. Sample size results are presented for a multicenter study on pregnancy. The proposed study, an extension of a funded project, has clustering within clinic. Exchangeability among the participants allows averaging across them to remove the clustering structure. The resulting simplified design is a single-level longitudinal study. Multivariate methods for power provide an approximate sample size. All proofs and inputs for the example are in the supplementary materials (available online).
Journal: The American Statistician
Pages: 350-359
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2017.1415972
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1415972
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:350-359
Template-Type: ReDIF-Article 1.0
Author-Name: Alice Richardson
Author-X-Name-First: Alice
Author-X-Name-Last: Richardson
Title: A Comparative Review of Nonparametric Statistics Textbooks
Abstract:
In this article, I review six textbooks commonly set in university undergraduate nonparametric statistics courses. The books are evaluated in terms of how key statistical concepts are presented; use of software; exercises; and location on a theory–applications axis and an algorithms–principles axis. The placement of books on these axes provides a novel guide for instructors looking for the book that best fits their approach to teaching nonparametric statistics.
Journal: The American Statistician
Pages: 360-366
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1437076
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437076
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:360-366
Template-Type: ReDIF-Article 1.0
Author-Name: Ambrose Lo
Author-X-Name-First: Ambrose
Author-X-Name-Last: Lo
Title: Demystifying the Integrated Tail Probability Expectation Formula
Abstract:
Calculating the expected values of different types of random variables is a central topic in mathematical statistics. Targeted toward students and instructors in both introductory probability and statistics courses and graduate-level measure-theoretic probability courses, this pedagogical note casts light on a general expectation formula stated in terms of distribution and survival functions of random variables and discusses its educational merits. Often consigned to an end-of-chapter exercise in mathematical statistics textbooks with minimal discussion and presented under superfluous technical assumptions, this unconventional expectation formula provides an invaluable opportunity for students to appreciate the geometric meaning of expectations, which is overlooked in most undergraduate and graduate curricula, and serves as an efficient tool for the calculation of expected values that could be much more laborious by traditional means. For students’ benefit, this formula deserves a thorough in-class treatment in conjunction with the teaching of expectations. Besides clarifying some commonly held misconceptions and showing the pedagogical value of the expectation formula, this note offers guidance for instructors on teaching the formula taking the background of the target student group into account.
Journal: The American Statistician
Pages: 367-374
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1497541
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497541
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:367-374
Template-Type: ReDIF-Article 1.0
Author-Name: Amelia McNamara
Author-X-Name-First: Amelia
Author-X-Name-Last: McNamara
Title: Key Attributes of a Modern Statistical Computing Tool
Abstract:
In the 1990s, statisticians began thinking in a principled way about how computation could better support the learning and doing of statistics. Since then, the pace of software development has accelerated, advancements in computing and data science have moved the goalposts, and it is time to reassess. Software continues to be developed to help do and learn statistics, but there is little critical evaluation of the resulting tools, and no accepted framework with which to critique them. This article presents a set of attributes necessary for a modern statistical computing tool. The framework was designed to be broadly applicable to both novice and expert users, with a particular focus on making more supportive statistical computing environments. A modern statistical computing tool should be accessible, provide easy entry, privilege data as a first-order object, support exploratory and confirmatory analysis, allow for flexible plot creation, support randomization, be interactive, include inherent documentation, support narrative, publishing, and reproducibility, and be flexible to extensions. Ideally, all these attributes could be incorporated into one tool, supporting users at all levels, but a more reasonable goal is for tools designed for novices and professionals to “reach across the gap,” taking inspiration from each others’ strengths.
Journal: The American Statistician
Pages: 375-384
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1482784
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1482784
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:375-384
Template-Type: ReDIF-Article 1.0
Author-Name: Philip A. White
Author-X-Name-First: Philip A.
Author-X-Name-Last: White
Author-Name: Candace Berrett
Author-X-Name-First: Candace
Author-X-Name-Last: Berrett
Author-Name: E. Shannon Neeley-Tass
Author-X-Name-First: E. Shannon
Author-X-Name-Last: Neeley-Tass
Author-Name: Michael G. Findley
Author-X-Name-First: Michael G.
Author-X-Name-Last: Findley
Title: Modeling Efficiency of Foreign Aid Allocation in Malawi
Abstract:
The Open Aid Malawi initiative has collected an unprecedented database that identifies as much location-specific information as possible for each of over 2500 individual foreign aid donations to Malawi since 2003. The efficient use and distribution of such aid is important to donors and to Malawi citizens. However, because of individual donor goals and the difficulty of tracking donor coordination, it is hard to determine whether aid allocation is efficient. We compare several Bayesian spatial generalized linear mixed models to relate aid allocation to various economic indicators within seven donation sectors. We find that the spatial gamma regression model best predicts current aid allocation. While we are cautious about making strong claims based on this exploratory study, we provide a methodology by which one could (i) evaluate the efficiency of aid allocation via a study of the locations of current aid allocation as compared to the need at those locations and (ii) come up with a strategy for efficient allocation of resources in conditions where there exists an ideal relationship between aid allocation and economic sectors.
Journal: The American Statistician
Pages: 385-399
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1470032
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1470032
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:385-399
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald D. Snee
Author-X-Name-First: Ronald D.
Author-X-Name-Last: Snee
Title: We Stand on the Shoulders of Giants—Pioneers of Statistics in Industry
Abstract:
Industrial statistics has a rich and proud heritage. The field was initiated in the 1920s and picked up steam in the 1950s with the establishment of industrial statistics groups in several companies including American Cyanamid, DuPont, General Electric, Kodak, Western Electric, Procter and Gamble, General Foods, and 3M. It can be argued that we are in the third generation of the development of the profession. Indeed we are standing on the shoulders of giants. Several pioneering industrial statistics organizations are profiled in this article. The focus is on the roots of the organizations, the people involved and their contributions to their employers, advancements in the field and the development of the profession. Synthesis of this information provides some unique insights into who we are, what we have accomplished, and the needs and opportunities of the future.
Journal: The American Statistician
Pages: 400-407
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1543140
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543140
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:400-407
Template-Type: ReDIF-Article 1.0
Author-Name: Alexandre Galvão Patriota
Author-X-Name-First: Alexandre Galvão
Author-X-Name-Last: Patriota
Title: On the Mean Value Theorem for Estimating Functions
Abstract:
Feng et al. revealed that the usual mean value theorem (MVT) should not be applied directly to a vector-valued function (e.g., the score function or a general estimating function under a multiparametric model). This note shows that the application of the Cramér–Wold device to a corrected version of the MVT is sufficient to obtain standard asymptotics for the estimators attained from vector-valued estimating functions.
Journal: The American Statistician
Pages: 408-410
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2018.1558110
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1558110
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:408-410
Template-Type: ReDIF-Article 1.0
Author-Name: Jesse Frey
Author-X-Name-First: Jesse
Author-X-Name-Last: Frey
Title: Comment on VanDerwerken (2019)
Journal: The American Statistician
Pages: 411-412
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1604433
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604433
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:411-412
Template-Type: ReDIF-Article 1.0
Author-Name: Peter Bacchetti
Author-X-Name-First: Peter
Author-X-Name-Last: Bacchetti
Title: The Other Arbitrary Cutoff
Journal: The American Statistician
Pages: 413-414
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1654920
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1654920
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:413-414
Template-Type: ReDIF-Article 1.0
Author-Name: Anelise G. Sabbag
Author-X-Name-First: Anelise G.
Author-X-Name-Last: Sabbag
Title: Handbook of Educational Measurement and Psychometrics Using R.
Journal: The American Statistician
Pages: 415-416
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1676110
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1676110
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:415-416
Template-Type: ReDIF-Article 1.0
Author-Name: Megan D. Higgs
Author-X-Name-First: Megan D.
Author-X-Name-Last: Higgs
Title: Randomistas: How Radical Researchers Are Changing Our World.
Journal: The American Statistician
Pages: 416-417
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1676111
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1676111
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:416-417
Template-Type: ReDIF-Article 1.0
Author-Name: Christian Litterer
Author-X-Name-First: Christian
Author-X-Name-Last: Litterer
Title: Stochastic Processes: From Applications to Theory.
Journal: The American Statistician
Pages: 418-419
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1676116
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1676116
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:418-419
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Correction
Journal: The American Statistician
Pages: 420-420
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1660112
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1660112
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:420-420
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Collaborators
Journal: The American Statistician
Pages: 420-421
Issue: 4
Volume: 73
Year: 2019
Month: 10
X-DOI: 10.1080/00031305.2019.1680048
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1680048
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:4:p:420-421
Template-Type: ReDIF-Article 1.0
Author-Name: Luca Bagnato
Author-X-Name-First: Luca
Author-X-Name-Last: Bagnato
Author-Name: Lucio De Capitani
Author-X-Name-First: Lucio
Author-X-Name-Last: De Capitani
Author-Name: Antonio Punzo
Author-X-Name-First: Antonio
Author-X-Name-Last: Punzo
Title: Testing for Serial Independence: Beyond the Portmanteau Approach
Abstract:
Portmanteau tests are typically used to test serial independence even though, by construction, they are generally powerful only in the presence of pairwise dependence between lagged variables. In this article, we present a simple statistic defining a new serial independence test, which is able to detect more general forms of dependence. In particular, unlike portmanteau tests, the resulting test is also powerful under a dependent process characterized by pairwise independence. A diagram, based on p-values from the proposed test, is introduced to investigate serial dependence. Finally, the effectiveness of the proposal is evaluated in a simulation study and with an application to financial data. Both show that the new test, used in synergy with the existing ones, helps in the identification of the true data-generating process. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 219-238
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2016.1264314
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264314
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:219-238
Template-Type: ReDIF-Article 1.0
Author-Name: Jin Zhang
Author-X-Name-First: Jin
Author-X-Name-Last: Zhang
Title: Minimum Volume Confidence Sets for Two-Parameter Exponential Distributions
Abstract:
Under a reasonable restriction, we construct the minimum-volume confidence set for the location and scale parameters of the exponential distribution. Compared to existing methods, none of which has a minimum-volume property, the new confidence set is the most accurate, having the smallest volume for any confidence level, sample size, and sample data.
Journal: The American Statistician
Pages: 213-218
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2016.1264315
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264315
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:213-218
Template-Type: ReDIF-Article 1.0
Author-Name: Hauke Thaden
Author-X-Name-First: Hauke
Author-X-Name-Last: Thaden
Author-Name: Thomas Kneib
Author-X-Name-First: Thomas
Author-X-Name-Last: Kneib
Title: Structural Equation Models for Dealing With Spatial Confounding
Abstract:
In regression analyses of spatially structured data, it is common practice to introduce spatially correlated random effects into the regression model to reduce or even avoid unobserved variable bias in the estimation of other covariate effects. If, besides the response, the covariates are also spatially correlated, the spatial effects may confound the effect of the covariates or vice versa. In this case, the model fails to identify the true covariate effect due to multicollinearity. For highly collinear continuous covariates, path analysis and structural equation modeling techniques prove to be helpful to disentangle direct covariate effects from indirect covariate effects arising from correlation with other variables. This work discusses the applicability of these techniques in regression setups, where spatial and covariate effects coincide at least partly and classical geoadditive models fail to separate these effects. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 239-252
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2017.1305290
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305290
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:239-252
Template-Type: ReDIF-Article 1.0
Author-Name: George W. Divine
Author-X-Name-First: George W.
Author-X-Name-Last: Divine
Author-Name: H. James Norton
Author-X-Name-First: H. James
Author-X-Name-Last: Norton
Author-Name: Anna E. Barón
Author-X-Name-First: Anna E.
Author-X-Name-Last: Barón
Author-Name: Elizabeth Juarez-Colunga
Author-X-Name-First: Elizabeth
Author-X-Name-Last: Juarez-Colunga
Title: The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians
Abstract:
To illustrate and document the tenuous connection between the Wilcoxon–Mann–Whitney (WMW) procedure and medians, its relationship to mean ranks is first contrasted with the relationship of a t-test to means. The quantity actually tested, $\widehat{\Pr}(X_1 < X_2) + \widehat{\Pr}(X_1 = X_2)/2$, is then described and recommended as the basis for an alternative summary statistic that can be employed instead of medians. In order to graphically represent an estimate of the quantity $\Pr(X_1 < X_2) + \Pr(X_1 = X_2)/2$, use of a bubble plot, an ROC curve, and a dominance diagram are illustrated. Several counter-examples (real and constructed) are presented, all demonstrating that the WMW procedure fails to be a test of medians. The discussion also addresses another, less common and perhaps less clear-cut, but potentially even more important misconception: that the WMW procedure requires continuous data in order to be valid. Discussion of other issues surrounding the question of the WMW procedure and medians is presented, along with the authors' teaching experience with the topic. SAS code used for the examples is included as supplementary material.
Journal: The American Statistician
Pages: 278-286
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2017.1305291
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305291
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:278-286
Template-Type: ReDIF-Article 1.0
Author-Name: Ashok Kumar Pathak
Author-X-Name-First: Ashok Kumar
Author-X-Name-Last: Pathak
Title: A Simple Probabilistic Proof for the Alternating Convolution of the Central Binomial Coefficients
Abstract:
This note presents a simple probabilistic proof of the identity for the alternating convolution of the central binomial coefficients. The proof of the identity involves the computation of moments of order n for the product of standard normal random variables.
Journal: The American Statistician
Pages: 287-288
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2017.1358216
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1358216
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:287-288
Template-Type: ReDIF-Article 1.0
Author-Name: Weihua An
Author-X-Name-First: Weihua
Author-X-Name-Last: An
Author-Name: Ying Ding
Author-X-Name-First: Ying
Author-X-Name-Last: Ding
Title: The Landscape of Causal Inference: Perspective From Citation Network Analysis
Abstract:
Causal inference is a fast-growing multidisciplinary field that has drawn extensive interests from statistical sciences and health and social sciences. In this article, we gather comprehensive information on publications and citations in causal inference and provide a review of the field from the perspective of citation network analysis. We provide descriptive analyses by showing the most cited publications, the most prolific and the most cited authors, and structural properties of the citation network. Then, we examine the citation network through exponential random graph models (ERGMs). We show that both technical aspects of the publications (e.g., publication length, time and quality) and social processes such as homophily (the tendency to cite publications in the same field or with shared authors), cumulative advantage, and transitivity (the tendency to cite references’ references), matter for citations. We also provide specific analysis of citations among the top authors in the field and present a ranking and clustering of the authors. Overall, our article reveals new insights into the landscape of the field of causal inference and may serve as a case study for analyzing citation networks in a multidisciplinary field and for fitting ERGMs on big networks. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 265-277
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2017.1360794
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1360794
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:265-277
Template-Type: ReDIF-Article 1.0
Author-Name: Gilbert W. Fellingham
Author-X-Name-First: Gilbert W.
Author-X-Name-Last: Fellingham
Author-Name: Jared D. Fisher
Author-X-Name-First: Jared D.
Author-X-Name-Last: Fisher
Title: Predicting Home Run Production in Major League Baseball Using a Bayesian Semiparametric Model
Abstract:
This article attempts to predict home run hitting performance of Major League Baseball players using a Bayesian semiparametric model. Following Berry, Reese and Larkey we include in the model effects for era of birth, season of play, and home ball park. We estimate performance curves for each player using orthonormal quartic polynomials. We use a Dirichlet process prior on the unknown distribution for the coefficients of the polynomials, and parametric priors for the other effects. Dirichlet process priors are useful in prediction for two reasons: (1) an increased probability of obtaining more precise prediction comes with the increased flexibility of the prior specification, and (2) the clustering inherent in the Dirichlet process provides the means to share information across players. Data from 1871 to 2008 were used to fit the model. Data from 2009 to 2016 were used to test the predictive ability of the model. A parametric model was also fit to compare the predictive performance of the models. We used what we called “pure performance” curves to predict future performance for 22 players. The nonparametric method provided superior predictive performance.
Journal: The American Statistician
Pages: 253-264
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2017.1401959
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1401959
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:253-264
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel Cerqueira
Author-X-Name-First: Daniel
Author-X-Name-Last: Cerqueira
Author-Name: Danilo Coelho
Author-X-Name-First: Danilo
Author-X-Name-Last: Coelho
Author-Name: Marcelo Fernandes
Author-X-Name-First: Marcelo
Author-X-Name-Last: Fernandes
Author-Name: Jony Pinto Junior
Author-X-Name-First: Jony Pinto
Author-X-Name-Last: Junior
Title: Guns and Suicides
Abstract:
There is a consensus in the literature that the ratio of suicides committed with guns to total suicides is the best indirect measure of gun ownership. However, such a proxy is not accurate for localities with low population density, given that suicides are rare events. To circumvent this issue, we exploit the socioeconomic characteristics of the suicide victims in order to come up with a novel proxy for gun ownership. We assess our indicator using suicide micro-data from the Brazilian Ministry of Health between 2000 and 2010.
Journal: The American Statistician
Pages: 289-294
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2017.1419144
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1419144
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:289-294
Template-Type: ReDIF-Article 1.0
Author-Name: Olivier J. M. Guilbaud
Author-X-Name-First: Olivier J. M.
Author-X-Name-Last: Guilbaud
Title: Some Complementary History and Results
Journal: The American Statistician
Pages: 300-301
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2018.1448892
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448892
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:300-301
Template-Type: ReDIF-Article 1.0
Author-Name: Alan Hutson
Author-X-Name-First: Alan
Author-X-Name-Last: Hutson
Title: Comment on “What Do Interpolated Nonparametric Confidence Intervals for Population Quantiles Guarantee?”, Frey and Zhang (2017)
Journal: The American Statistician
Pages: 302-302
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2018.1448893
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448893
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:302-302
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 295-299
Issue: 3
Volume: 72
Year: 2018
Month: 7
X-DOI: 10.1080/00031305.2018.1496649
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1496649
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:3:p:295-299
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher Abdul-Chani
Author-X-Name-First: Christopher
Author-X-Name-Last: Abdul-Chani
Author-Name: Jesse Frey
Author-X-Name-First: Jesse
Author-X-Name-Last: Frey
Title: Improving the Big East Conference Basketball Tournament
Abstract:
The Big East Conference basketball tournament is a four-day, 10-team, knockout tournament that is used to decide which team receives the conference’s automatic bid to the NCAA basketball tournament. Through data-based modeling, we show that the current tournament format is not very effective in determining the true best team. Specifically, by considering a variety of alternate formats, we find that certain formats that exclude all but a handful of teams substantially outperform the current format in determining the true best team. We also find that among formats that involve all ten teams, a format in which the top two seeds each receive two byes is relatively effective. We show that our conclusions are robust to several key modeling choices. We also investigate the effectiveness of the tie-breaking scheme used by the Big East Conference, finding that it is little better than random and may even favor weaker teams.
Journal: The American Statistician
Pages: 342-349
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2015.1105153
File-URL: http://hdl.handle.net/10.1080/00031305.2015.1105153
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:342-349
Template-Type: ReDIF-Article 1.0
Author-Name: David Quarfoot
Author-X-Name-First: David
Author-X-Name-Last: Quarfoot
Author-Name: Richard A. Levine
Author-X-Name-First: Richard A.
Author-X-Name-Last: Levine
Title: How Robust Are Multirater Interrater Reliability Indices to Changes in Frequency Distribution?
Abstract:
Interrater reliability studies are used in a diverse set of fields. Often, these investigations involve three or more raters, and thus, require the use of indices such as Fleiss’s kappa, Conger’s kappa, or Krippendorff’s alpha. Through two motivating examples—one theoretical and one from practice—this article exposes limitations of these indices when the units to be rated are not well-distributed across the rating categories. Then, using a Monte Carlo simulation and information visualizations, we argue for the use of two alternative indices, the Brennan–Prediger coefficient and Gwet’s AC2, because the agreement levels reported by these indices are more robust to variation in the distribution of units that raters encounter. The article concludes by exploring the complex, interwoven relationship between the number of levels in a rating instrument, the agreement level present among raters, and the distribution of units that are to be scored. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 373-384
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1141708
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141708
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:373-384
Template-Type: ReDIF-Article 1.0
Author-Name: Matthias Katzfuss
Author-X-Name-First: Matthias
Author-X-Name-Last: Katzfuss
Author-Name: Jonathan R. Stroud
Author-X-Name-First: Jonathan R.
Author-X-Name-Last: Stroud
Author-Name: Christopher K. Wikle
Author-X-Name-First: Christopher K.
Author-X-Name-Last: Wikle
Title: Understanding the Ensemble Kalman Filter
Abstract:
The ensemble Kalman filter (EnKF) is a computational technique for approximate inference in state-space models. In typical applications, the state vectors are large spatial fields that are observed sequentially over time. The EnKF approximates the Kalman filter by representing the distribution of the state with an ensemble of draws from that distribution. The ensemble members are updated based on newly available data by shifting instead of reweighting, which allows the EnKF to avoid the degeneracy problems of reweighting-based algorithms. Taken together, the ensemble representation and shifting-based updates make the EnKF computationally feasible even for extremely high-dimensional state spaces. The EnKF is successfully used in data-assimilation applications with tens of millions of dimensions. While it implicitly assumes a linear Gaussian state-space model, it has also turned out to be remarkably robust to deviations from these assumptions in many applications. Despite its successes, the EnKF is largely unknown in the statistics community. We aim to change that with the present article, and to entice more statisticians to work on this topic.
Journal: The American Statistician
Pages: 350-357
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1141709
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1141709
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:350-357
Template-Type: ReDIF-Article 1.0
Author-Name: Brendan Rocks
Author-X-Name-First: Brendan
Author-X-Name-Last: Rocks
Title: Interval Estimation for the “Net Promoter Score”
Abstract:
The net promoter score (NPS) is a novel summary statistic used by thousands of companies as a key performance indicator of customer loyalty. While adoption of the statistic has grown rapidly over the last decade, little has been published on its statistical properties. Common interval estimation techniques are adapted for use with the NPS, and their performance is assessed on the largest available database of companies' net promoter scores. Variations on the adjusted Wald interval and an iterative score test are found to have superior performance.
Journal: The American Statistician
Pages: 365-372
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1158124
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1158124
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:365-372
Template-Type: ReDIF-Article 1.0
Author-Name: Jack Bowden
Author-X-Name-First: Jack
Author-X-Name-Last: Bowden
Author-Name: Chris Jackson
Author-X-Name-First: Chris
Author-X-Name-Last: Jackson
Title: Weighing Evidence “Steampunk” Style via the Meta-Analyser
Abstract:
The funnel plot is a graphical visualization of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modeling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the center of mass of a physical system. We used this analogy to explain the concepts of weighing evidence and of biased evidence to a young audience at the Cambridge Science Festival, without recourse to precise definitions or statistical formulas and with a little help from Sherlock Holmes! Following on from the science fair, we have developed an interactive web-application (named the Meta-Analyser) to bring these ideas to a wider audience. We envisage that our application will be a useful tool for researchers when interpreting their data. First, to facilitate a simple understanding of fixed and random effects modeling approaches; second, to assess the importance of outliers; and third, to show the impact of adjusting for small study bias. This final aim is realized by introducing a novel graphical interpretation of the well-known method of Egger regression.
Journal: The American Statistician
Pages: 385-394
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1165735
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1165735
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:385-394
Template-Type: ReDIF-Article 1.0
Author-Name: Joel E. Cohen
Author-X-Name-First: Joel E.
Author-X-Name-Last: Cohen
Title: Statistics of Primes (and Probably Twin Primes) Satisfy Taylor's Law from Ecology
Abstract:
Taylor's law, which originated in ecology, states that, in sets of measurements of population density, the sample variance is approximately proportional to a power of the sample mean. Taylor's law has been verified for many species ranging from bacterial to human. Here, we show that the variance V(x) and the mean M(x) of the primes not exceeding a real number x obey Taylor's law asymptotically for large x. Specifically, V(x) ∼ (1/3)(M(x))^2 as x → ∞. This apparently new fact about primes shows that Taylor's law may arise in the absence of biological processes, and that patterns discovered in biological data can suggest novel questions in number theory. If the Hardy-Littlewood twin primes conjecture is true, then the identical Taylor's law holds also for twin primes. Taylor's law holds in both instances because the primes (and the twin primes, given the conjecture) not exceeding x are asymptotically uniformly distributed on the integers in [2, x]. Hence, asymptotically M(x) ∼ x/2, V(x) ∼ x^2/12. Higher-order moments of the primes (twin primes) not exceeding x satisfy a generalized Taylor's law. The 11,078,937 primes and 813,371 twin primes not exceeding 2 × 10^8 illustrate these results.
Journal: The American Statistician
Pages: 399-404
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1173591
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1173591
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:399-404
Template-Type: ReDIF-Article 1.0
Author-Name: Adam Jaeger
Author-X-Name-First: Adam
Author-X-Name-Last: Jaeger
Title: Computation of Two- and Three-Dimensional Confidence Regions With the Likelihood Ratio
Abstract:
The asymptotic results pertaining to the distribution of the log-likelihood ratio allow for the creation of a confidence region, which is a general extension of the confidence interval. Two- and three-dimensional regions can be displayed visually to describe the plausible region of the parameters of interest simultaneously. While most advanced statistical textbooks on inference discuss these asymptotic confidence regions, there is no exploration of how to numerically compute these regions for graphical purposes. This article demonstrates the application of a simple trigonometric transformation to compute two- and three-dimensional confidence regions; we transform the Cartesian coordinates of the parameters to create what we call the radial profile log-likelihood. The method is applicable to any distribution with a defined likelihood function, so it is not limited to specific data distributions or model paradigms. We describe the method along with the algorithm, follow with an example of our method, and end with an examination of computation time. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 395-398
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1182946
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1182946
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:395-398
Template-Type: ReDIF-Article 1.0
Author-Name: Piaomu Liu
Author-X-Name-First: Piaomu
Author-X-Name-Last: Liu
Author-Name: Edsel A. Peña
Author-X-Name-First: Edsel A.
Author-X-Name-Last: Peña
Title: Sojourning With the Homogeneous Poisson Process
Abstract:
In this pedagogical article, distributional properties, some surprising, pertaining to the homogeneous Poisson process (HPP), when observed over a possibly random window, are presented. Properties of the gap-time that covered the termination time and the correlations among gap-times of the observed events are obtained. Inference procedures, such as estimation and model validation, based on event occurrence data over the observation window, are also presented. We envision that through the results in this article, a better appreciation of the subtleties involved in the modeling and analysis of recurrent events data will ensue, since the HPP is arguably one of the simplest among recurrent event models. In addition, the use of the theorem of total probability, Bayes’ theorem, the iterated rules of expectation, variance and covariance, and the renewal equation could be illustrative when teaching distribution theory, mathematical statistics, and stochastic processes at both the undergraduate and graduate levels. This article is targeted toward both instructors and students.
Journal: The American Statistician
Pages: 413-423
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1200484
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200484
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:413-423
Template-Type: ReDIF-Article 1.0
Author-Name: Yan Xia
Author-X-Name-First: Yan
Author-X-Name-Last: Xia
Author-Name: Yanyun Yang
Author-X-Name-First: Yanyun
Author-X-Name-Last: Yang
Title: Bias Introduced by Rounding in Multiple Imputation for Ordered Categorical Variables
Abstract:
Multivariate normality is frequently assumed when multiple imputation is applied for missing data. When data are ordered categorical, imputing missing data using the fully normal imputation results in implausible values falling outside of the categorical values. Naïve rounding has been suggested to round the imputed values to their categorical neighbors for further analysis. Previous studies showed that, for binary data, the rounded values can result in biased mean estimation when the population distribution is asymmetric. However, it has been conjectured that as the number of categories increases, the bias will decrease. To investigate this conjecture, the present study derives the formulas for the biases of the mean and standard deviation for ordered categorical variables with naïve rounding. Results show that both the biases of the mean and standard deviation decrease as the number of categories increases from 3 to 9. This study also finds that although symmetric population distributions lead to unbiased means of the rounded values, the standard deviations may still be largely biased. A simulation study further shows that the biases due to naïve rounding can result in substantially low coverage rates for the population mean parameter.
Journal: The American Statistician
Pages: 358-364
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1200486
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200486
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:358-364
Template-Type: ReDIF-Article 1.0
Author-Name: Kimihiro Noguchi
Author-X-Name-First: Kimihiro
Author-X-Name-Last: Noguchi
Author-Name: Fernando Marmolejo-Ramos
Author-X-Name-First: Fernando
Author-X-Name-Last: Marmolejo-Ramos
Title: Assessing Equality of Means Using the Overlap of Range-Preserving Confidence Intervals
Abstract:
Hypothesis testing procedures where equality of means is assessed at a prespecified level based on the (non-)overlap of confidence intervals are discussed. Assessing statistical significance via the (non-)overlap of two confidence intervals with an appropriate confidence level provides a simple and effective way of visually understanding statistical results. This article extends previous approaches by considering range-preserving confidence intervals where the values in such intervals are in the allowable range of the parameter of interest. To obtain reliable procedures, appropriate effective degrees of freedom are suggested by considering the Welch-Satterthwaite equation for both independent two-sample and paired-sample cases. The proposed procedures also allow users to express results in terms of commonly used scale-free effect sizes, which are highly useful for interpreting parameters of interest. Simulation results suggest that the proposed procedures may be robust to unequal or small sample sizes, nonnormal distributions, heterogeneous variances, and various degrees of correlation. A real-life application from a study in cognitive psychology illustrates the effectiveness of the proposed procedures.
Journal: The American Statistician
Pages: 325-334
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1200487
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200487
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:325-334
Template-Type: ReDIF-Article 1.0
Author-Name: Amy Wagaman
Author-X-Name-First: Amy
Author-X-Name-Last: Wagaman
Title: Meeting Student Needs for Multivariate Data Analysis: A Case Study in Teaching an Undergraduate Multivariate Data Analysis Course
Abstract:
Modern students encounter large, messy datasets long before setting foot in our classrooms. Many of these students need to develop skills in exploratory data analysis and multivariate analysis techniques for their jobs after college, but such topics are not covered in traditional introductory statistics courses. This case study describes my experience in designing and teaching an undergraduate course on multivariate data analysis with minimal prerequisites, using real data, active learning, and other interactive activities to help students tackle the material. Multivariate topics covered include clustering and classification (among others) for exploratory data analysis and an introduction to algorithmic modeling. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 405-412
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1201005
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1201005
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:405-412
Template-Type: ReDIF-Article 1.0
Author-Name: Leonhard Held
Author-X-Name-First: Leonhard
Author-X-Name-Last: Held
Author-Name: Manuela Ott
Author-X-Name-First: Manuela
Author-X-Name-Last: Ott
Title: How the Maximal Evidence of p-Values Against Point Null Hypotheses Depends on Sample Size
Abstract:
Minimum Bayes factors are commonly used to transform two-sided p-values to lower bounds on the posterior probability of the null hypothesis. Several proposals exist in the literature, but none of them depends on the sample size. However, the evidence of a p-value against a point null hypothesis is known to depend on the sample size. In this article, we consider p-values in the linear model and propose new minimum Bayes factors that depend on sample size and converge to existing bounds as the sample size goes to infinity. It turns out that the maximal evidence of an exact two-sided p-value increases with decreasing sample size. The effect of adjusting minimum Bayes factors for sample size is shown in two applications.
Journal: The American Statistician
Pages: 335-341
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1209128
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1209128
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:335-341
Template-Type: ReDIF-Article 1.0
Author-Name: Lawrence M. Lesser
Author-X-Name-First: Lawrence M.
Author-X-Name-Last: Lesser
Title: Letter to the Editor
Journal: The American Statistician
Pages: 434-434
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1222310
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1222310
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:434-434
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 424-433
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1234902
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1234902
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:424-433
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Collaborators
Journal: The American Statistician
Pages: 435-437
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1248726
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1248726
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:435-437
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Board EOV
Journal: The American Statistician
Pages: ebi-ebi
Issue: 4
Volume: 70
Year: 2016
Month: 10
X-DOI: 10.1080/00031305.2016.1250537
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1250537
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:ebi-ebi
Template-Type: ReDIF-Article 1.0
Author-Name: Alan D. Hutson
Author-X-Name-First: Alan D.
Author-X-Name-Last: Hutson
Author-Name: Albert Vexler
Author-X-Name-First: Albert
Author-X-Name-Last: Vexler
Title: A Cautionary Note on Beta Families of Distributions and the Aliases Within
Abstract:
In this note, we examine the four parameter beta family of distributions in the context of the beta-normal and beta-logistic distributions. In the process, we highlight the concept of numerical and limiting alias distributions, which in turn relate to numerical instabilities in the numerical maximum likelihood fitting routines for these families of distributions. We conjecture that the numerical issues pertaining to fitting these multiparameter distributions may be more widespread than has originally been reported across several families of distributions.
Journal: The American Statistician
Pages: 121-129
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2016.1213661
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1213661
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:121-129
Template-Type: ReDIF-Article 1.0
Author-Name: Sashi Kanth Tadinada
Author-X-Name-First: Sashi Kanth
Author-X-Name-Last: Tadinada
Author-Name: Abhinav Gupta
Author-X-Name-First: Abhinav
Author-X-Name-Last: Gupta
Title: Simulation of Constrained Variables in Engineering Risk Analyses
Abstract:
The problem of sampling random variables with overlapping pdfs subject to inequality constraints is addressed. Often, the values of physical variables in an engineering model are interrelated. This mutual dependence imposes inequality constraints on the random variables representing these parameters. Ignoring the interdependencies and sampling the variables independently can lead to inconsistency/bias. We propose an algorithm to generate samples of constrained random variables that are characterized by typical continuous probability distributions and are subject to different kinds of inequality constraints. The sampling procedure is illustrated for various representative cases and one realistic application to simulation of structural natural frequencies.
Journal: The American Statistician
Pages: 130-139
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2016.1255660
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255660
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:130-139
Template-Type: ReDIF-Article 1.0
Author-Name: Santiago Velilla
Author-X-Name-First: Santiago
Author-X-Name-Last: Velilla
Title: A Note on Collinearity Diagnostics and Centering
Abstract:
The usual approach for diagnosing collinearity proceeds by centering and standardizing the regressors. The sample correlation matrix of the predictors is then the basic tool for describing approximate linear combinations that may distort the conclusions of a standard least-square analysis. However, as indicated by several authors, centering may eventually fail to detect the sources of ill-conditioning. In spite of this earlier claim, there does not seem to be in the literature a fully clear explanation of the reasons for this bad potential behavior of the traditional strategy for analyzing collinearity. This note studies this issue in some detail. Results derived are motivated by the analysis of a well-known real dataset. Practical conclusions are illustrated with several examples.
Journal: The American Statistician
Pages: 140-146
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2016.1264312
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264312
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:140-146
Template-Type: ReDIF-Article 1.0
Author-Name: Joel B. Greenhouse
Author-X-Name-First: Joel B.
Author-X-Name-Last: Greenhouse
Author-Name: Howard J. Seltman
Author-X-Name-First: Howard J.
Author-X-Name-Last: Seltman
Title: On Teaching Statistical Practice: From Novice to Expert
Abstract:
This article introduces principles of learning based on research in cognitive science that help explain how learning works. We adapt these principles to the teaching of statistical practice and illustrate the application of these principles to the curricular design of a new master's degree program in applied statistics. We emphasize how these principles can be used not only to improve instruction at the course level but also at the program level.
Journal: The American Statistician
Pages: 147-154
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2016.1270230
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1270230
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:147-154
Template-Type: ReDIF-Article 1.0
Author-Name: Rolf Sundberg
Author-X-Name-First: Rolf
Author-X-Name-Last: Sundberg
Title: A Note on “Shaved Dice” Inference
Abstract:
Two dice are rolled repeatedly, only their sum is registered. Have the two dice been “shaved,” so two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler, it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content in ancillary statistics.
Journal: The American Statistician
Pages: 155-157
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2016.1277162
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277162
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:155-157
Template-Type: ReDIF-Article 1.0
Author-Name: José A. Sánchez-Espigares
Author-X-Name-First: José A.
Author-X-Name-Last: Sánchez-Espigares
Author-Name: Pere Grima
Author-X-Name-First: Pere
Author-X-Name-Last: Grima
Author-Name: Lluís Marco-Almagro
Author-X-Name-First: Lluís
Author-X-Name-Last: Marco-Almagro
Title: Visualizing Type II Error in Normality Tests
Abstract:
A skewed exponential power distribution, with parameters defining kurtosis and skewness, is introduced as a way to visualize Type II error in normality tests. By varying these parameters a mosaic of distributions is built, ranging from double exponential to uniform or from positive to negative exponential; the normal distribution is a particular case located in the center of the mosaic. Using a sequential color scheme, a different color is assigned to each distribution in the mosaic depending on the probability of committing a Type II error. This graph gives a visual representation of the power of the performed test. This way of representing results facilitates the comparison of the power of various tests and the influence of sample size. A script to perform this graphical representation, programmed in the R statistical software, is available online as supplementary material.
Journal: The American Statistician
Pages: 158-162
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2016.1278035
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1278035
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:158-162
Template-Type: ReDIF-Article 1.0
Author-Name: Saralees Nadarajah
Author-X-Name-First: Saralees
Author-X-Name-Last: Nadarajah
Author-Name: Rui Li
Author-X-Name-First: Rui
Author-X-Name-Last: Li
Title: An Expression for Fast Computation of Sample Central Moments
Abstract:
An expression is provided for the expectation of sample central moments. It is practical and offers computational advantages over the original form due to Kong (The American Statistician, 65, 2011, 198–199).
Journal: The American Statistician
Pages: 169-171
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1286259
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1286259
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:169-171
Template-Type: ReDIF-Article 1.0
Author-Name: P. M. Kroonenberg
Author-X-Name-First: P. M.
Author-X-Name-Last: Kroonenberg
Author-Name: Albert Verbeek
Author-X-Name-First: Albert
Author-X-Name-Last: Verbeek
Title: The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do?
Abstract:
In an informal way, some dilemmas in connection with hypothesis testing in contingency tables are discussed. The body of the article concerns the numerical evaluation of Cochran's Rule about the minimum expected value in r × c contingency tables with fixed margins when testing independence with Pearson's X^2 statistic using the χ^2 distribution.
Journal: The American Statistician
Pages: 175-183
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1286260
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1286260
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:175-183
Template-Type: ReDIF-Article 1.0
Author-Name: Timothy G. Gregoire
Author-X-Name-First: Timothy G.
Author-X-Name-Last: Gregoire
Author-Name: David L. R. Affleck
Author-X-Name-First: David L. R.
Author-X-Name-Last: Affleck
Title: Estimating Desired Sample Size for Simple Random Sampling of a Skewed Population
Abstract:
A simulation study was conducted to assess how well the necessary sample size to achieve a stipulated margin of error can be estimated prior to sampling. Our concern was particularly focused on performance when sampling from a very skewed distribution, which is a common feature of many biological, economic, and other populations. We examined two approaches for estimating sample size—one being the commonly used strategy aimed at regulating the average magnitude of the stipulated margin of error and the second being a previously proposed strategy to control the tolerance probability with which the stipulated margin of error is exceeded. Results of the simulation revealed that (1) skewness does not much affect the average estimated sample size but can greatly extend the range of estimated sample sizes; and (2) skewness does reduce the effectiveness of Kupper and Hafner's sample size estimator, yet its effectiveness is negatively impacted less by skewness directly, and to a much greater degree by the common practice of estimating the population variance via a pilot sampling from the skewed population. Nonetheless, the simulations suggest that estimating sample size to control the probability with which the desired margin of error is achieved is a worthwhile alternative to the usual sample size formula that controls the average width of the confidence interval only.
Journal: The American Statistician
Pages: 184-190
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1290548
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1290548
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:184-190
Template-Type: ReDIF-Article 1.0
Author-Name: Peng Ding
Author-X-Name-First: Peng
Author-X-Name-Last: Ding
Author-Name: Joseph K. Blitzstein
Author-X-Name-First: Joseph K.
Author-X-Name-Last: Blitzstein
Title: On the Gaussian Mixture Representation of the Laplace Distribution
Abstract:
Under certain conditions, a symmetric unimodal continuous random variable ξ can be represented as a scale mixture of a standard Normal distribution Z, that is, ξ = √W Z, where the mixing distribution W is independent of Z. It is well known that if the mixing distribution is inverse Gamma, then ξ has Student’s t distribution. However, it is less well known that if the mixing distribution is Gamma, then ξ has a Laplace distribution. Several existing proofs of the latter result rely on complex calculus or nontrivial change of variables in integrals. We offer two simple and intuitive proofs based on representation and moment generating functions. As a byproduct, our proof by representation makes connections to many existing results in statistics. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 172-174
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1291448
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1291448
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:172-174
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel R. Jeske
Author-X-Name-First: Daniel R.
Author-X-Name-Last: Jeske
Author-Name: Janet M. Myhre
Author-X-Name-First: Janet M.
Author-X-Name-Last: Myhre
Title: Regression Using Pairs vs. Regression on Differences: A Real-life Case Study for a Master's Level Methods Class
Abstract:
When teaching regression classes, real-life examples help emphasize the importance of understanding theoretical concepts related to methodologies. This can be appreciated after a little reflection on the difficulty of constructing novel questions in regression that test on concepts rather than mere calculations. Interdisciplinary collaborations can be fertile contexts for questions of this type. In this article, we offer a case study that students will find: (1) practical with respect to the question being addressed, (2) compelling in the way it shows how a solid understanding of theory helps answer the question, and (3) enlightening in the way it shows how statisticians contribute to problem solving in interdisciplinary environments. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 163-168
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1292956
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1292956
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:163-168
Template-Type: ReDIF-Article 1.0
Author-Name: Kathryn Schaefer Ziemer
Author-X-Name-First: Kathryn Schaefer
Author-X-Name-Last: Ziemer
Author-Name: Bianica Pires
Author-X-Name-First: Bianica
Author-X-Name-Last: Pires
Author-Name: Vicki Lancaster
Author-X-Name-First: Vicki
Author-X-Name-Last: Lancaster
Author-Name: Sallie Keller
Author-X-Name-First: Sallie
Author-X-Name-Last: Keller
Author-Name: Mark Orr
Author-X-Name-First: Mark
Author-X-Name-Last: Orr
Author-Name: Stephanie Shipp
Author-X-Name-First: Stephanie
Author-X-Name-Last: Shipp
Title: A New Lens on High School Dropout: Use of Correspondence Analysis and the Statewide Longitudinal Data System
Abstract:
The combination of log-linear models and correspondence analysis has long been used to decompose contingency tables and aid in their interpretation. Until now, this approach has not been applied to the education Statewide Longitudinal Data System (SLDS), which contains administrative school data at the student level. While some research has been conducted using the SLDS, its primary use is for state education administrative reporting. This article uses the combination of log-linear models and correspondence analysis to gain insight into high school dropouts in two discrete regions in Kentucky, Appalachia and non-Appalachia, defined by the American Community Survey. The individual student records from the SLDS were categorized into one of the two regions and a log-linear model was used to identify the interactions between the demographic characteristics and the dropout categories, push-out and pull-out. Correspondence analysis was then used to visualize the interactions with the expanded push-out categories, boredom, course selection, expulsion, failing grade, teacher conflict, and pull-out categories, employment, family problems, illness, marriage, and pregnancy, to provide insights into the regional differences. In this article, we demonstrate that correspondence analysis can extend the insights gained from SLDS data and provide new perspectives on dropouts. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 191-198
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1322002
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322002
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:191-198
Template-Type: ReDIF-Article 1.0
Author-Name: Alexander B. Sibley
Author-X-Name-First: Alexander B.
Author-X-Name-Last: Sibley
Author-Name: Zhiguo Li
Author-X-Name-First: Zhiguo
Author-X-Name-Last: Li
Author-Name: Yu Jiang
Author-X-Name-First: Yu
Author-X-Name-Last: Jiang
Author-Name: Yi-Ju Li
Author-X-Name-First: Yi-Ju
Author-X-Name-Last: Li
Author-Name: Cliburn Chan
Author-X-Name-First: Cliburn
Author-X-Name-Last: Chan
Author-Name: Andrew Allen
Author-X-Name-First: Andrew
Author-X-Name-Last: Allen
Author-Name: Kouros Owzar
Author-X-Name-First: Kouros
Author-X-Name-Last: Owzar
Title: Facilitating the Calculation of the Efficient Score Using Symbolic Computing
Abstract:
The score statistic continues to be a fundamental tool for statistical inference. In the analysis of data from high-throughput genomic assays, inference on the basis of the score usually enjoys greater stability, considerably higher computational efficiency, and lends itself more readily to the use of resampling methods than the asymptotically equivalent Wald or likelihood ratio tests. The score function often depends on a set of unknown nuisance parameters which have to be replaced by estimators, but can be improved by calculating the efficient score, which accounts for the variability induced by estimating these parameters. Manual derivation of the efficient score is tedious and error-prone, so we illustrate using computer algebra to facilitate this derivation. We demonstrate this process within the context of a standard example from genetic association analyses, though the techniques shown here could be applied to any derivation, and have a place in the toolbox of any modern statistician. We further show how the resulting symbolic expressions can be readily ported to compiled languages, to develop fast numerical algorithms for high-throughput genomic analysis. We conclude by considering extensions of this approach. The code featured in this report is available online as part of the supplementary material.
Journal: The American Statistician
Pages: 199-205
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2017.1392361
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392361
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:199-205
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 206-212
Issue: 2
Volume: 72
Year: 2018
Month: 4
X-DOI: 10.1080/00031305.2018.1469927
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1469927
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:2:p:206-212
Template-Type: ReDIF-Article 1.0
Author-Name: Amelia McNamara
Author-X-Name-First: Amelia
Author-X-Name-Last: McNamara
Author-Name: Nicholas J. Horton
Author-X-Name-First: Nicholas J.
Author-X-Name-Last: Horton
Title: Wrangling Categorical Data in R
Abstract:
Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators and periodically updated dynamic data. This article discusses common problems arising from categorical variable transformations in R, demonstrates the use of factors, and suggests approaches to address data wrangling challenges. For each problem, we present at least two strategies for management, one in base R and the other from the “tidyverse.” We consider several motivating examples, suggest defensive coding strategies, and outline principles for data wrangling to help ensure data quality and sound analysis. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 97-104
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1356375
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1356375
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:97-104
Template-Type: ReDIF-Article 1.0
Author-Name: Benjamin S. Baumer
Author-X-Name-First: Benjamin S.
Author-X-Name-Last: Baumer
Title: Lessons From Between the White Lines for Isolated Data Scientists
Abstract:
Many current and future data scientists will be “isolated”—working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports industry and teaching in academia, I discuss troubled waters likely to be encountered by newly minted data scientists and offer advice about how to navigate them. Neither the issues raised nor the advice given are particular to sports and should be applicable to a wide range of knowledge domains.
Journal: The American Statistician
Pages: 66-71
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1375985
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375985
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:66-71
Template-Type: ReDIF-Article 1.0
Author-Name: Ben Marwick
Author-X-Name-First: Ben
Author-X-Name-Last: Marwick
Author-Name: Carl Boettiger
Author-X-Name-First: Carl
Author-X-Name-Last: Boettiger
Author-Name: Lincoln Mullen
Author-X-Name-First: Lincoln
Author-X-Name-Last: Mullen
Title: Packaging Data Analytical Work Reproducibly Using R (and Friends)
Abstract:
Computers are a central tool in the research process, enabling complex and large-scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address this challenge, we review the concept of the research compendium as a solution for providing a standard and easily recognizable way for organizing the digital materials of a research project to enable other researchers to inspect, reproduce, and extend the research. We investigate how the structure and tooling of software packages of the R programming language are being used to produce research compendia in a variety of disciplines. We also describe how software engineering tools and services are being used by researchers to streamline working with research compendia. Using real-world examples, we show how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools.
Journal: The American Statistician
Pages: 80-88
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1375986
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375986
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:80-88
Template-Type: ReDIF-Article 1.0
Author-Name: Shannon E. Ellis
Author-X-Name-First: Shannon E.
Author-X-Name-Last: Ellis
Author-Name: Jeffrey T. Leek
Author-X-Name-First: Jeffrey T.
Author-X-Name-Last: Leek
Title: How to Share Data for Collaboration
Abstract:
Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to collaborators generating the data. To bridge this divide, we have established a set of guidelines for sharing data. In these, we highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of providing the statistician with all essential experimental information and pre-processing steps carried out. With these guidelines we hope to avoid errors and delays in data analysis.
Journal: The American Statistician
Pages: 53-57
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1375987
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375987
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:53-57
Template-Type: ReDIF-Article 1.0
Author-Name: Lance A. Waller
Author-X-Name-First: Lance A.
Author-X-Name-Last: Waller
Title: Documenting and Evaluating Data Science Contributions in Academic Promotion in Departments of Statistics and Biostatistics
Abstract:
The dynamic intersection of the field of Data Science with the established academic communities of Statistics and Biostatistics continues to generate lively debate, often with the two fields playing the role of an upstart (but brilliant), tech-savvy prodigy and an established (but brilliant), curmudgeonly expert, respectively. Like any emerging discipline, Data Science brings new perspectives and new tools to address new questions requiring new perspectives on traditionally established concepts. We explore a specific component of this discussion, namely the documentation and evaluation of Data Science-related research, teaching, and service contributions for faculty members seeking promotion and tenure within traditional departments of Statistics and Biostatistics. We focus on three perspectives: the department chair nominating a candidate for promotion, the junior faculty member going up for promotion, and the senior faculty members evaluating the promotion package. We contrast conservative, strategic, and iconoclastic approaches to promotion based on accomplishments in data science.
Journal: The American Statistician
Pages: 11-19
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1375988
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375988
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:11-19
Template-Type: ReDIF-Article 1.0
Author-Name: Karl W. Broman
Author-X-Name-First: Karl W.
Author-X-Name-Last: Broman
Author-Name: Kara H. Woo
Author-X-Name-First: Kara H.
Author-X-Name-Last: Woo
Title: Data Organization in Spreadsheets
Abstract:
Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
Journal: The American Statistician
Pages: 2-10
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1375989
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375989
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:2-10
Template-Type: ReDIF-Article 1.0
Author-Name: Dirk Eddelbuettel
Author-X-Name-First: Dirk
Author-X-Name-Last: Eddelbuettel
Author-Name: James Joseph Balamuta
Author-X-Name-First: James Joseph
Author-X-Name-Last: Balamuta
Title: Extending R with C++: A Brief Introduction to Rcpp
Abstract:
R has always provided an application programming interface (API) for extensions. Based on the C language, it uses a number of macros and other low-level constructs to exchange data structures between the R process and any dynamically loaded component modules authors added to it. With the introduction of the Rcpp package, and its later refinements, this process has become considerably easier yet also more robust. By now, Rcpp has become the most popular extension mechanism for R. This article introduces Rcpp, and illustrates with several examples how the Rcpp Attributes mechanism in particular eases the transition of objects between R and C++ code. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 28-36
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1375990
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375990
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:28-36
Template-Type: ReDIF-Article 1.0
Author-Name: Sean J. Taylor
Author-X-Name-First: Sean J.
Author-X-Name-Last: Taylor
Author-Name: Benjamin Letham
Author-X-Name-First: Benjamin
Author-X-Name-Last: Letham
Title: Forecasting at Scale
Abstract:
Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high-quality forecasts—especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis. We propose a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series. We describe performance analyses to compare and evaluate forecasting procedures, and automatically flag forecasts for manual review and adjustment. Tools that help analysts to use their expertise most effectively enable reliable, practical forecasting of business time series.
Journal: The American Statistician
Pages: 37-45
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1380080
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1380080
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:37-45
Template-Type: ReDIF-Article 1.0
Author-Name: Ricardo Bion
Author-X-Name-First: Ricardo
Author-X-Name-Last: Bion
Author-Name: Robert Chang
Author-X-Name-First: Robert
Author-X-Name-Last: Chang
Author-Name: Jason Goodman
Author-X-Name-First: Jason
Author-X-Name-Last: Goodman
Title: How R Helps Airbnb Make the Most of its Data
Abstract:
At Airbnb, R has been among the most popular tools for doing data science work in many different contexts, including generating product insights, interpreting experiments, and building predictive models. Airbnb supports R usage by creating internal R tools and by creating a community of R users. We provide some specific advice for practitioners who wish to incorporate R into their day-to-day workflow.
Journal: The American Statistician
Pages: 46-52
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1392362
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392362
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:46-52
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on “A Note on Collinearity Diagnostics and Centering” by Velilla (2018)
Journal: The American Statistician
Pages: 114-117
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1392896
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1392896
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:114-117
Template-Type: ReDIF-Article 1.0
Author-Name: Steven Wu
Author-X-Name-First: Steven
Author-X-Name-Last: Wu
Author-Name: Luke Bornn
Author-X-Name-First: Luke
Author-X-Name-Last: Bornn
Title: Modeling Offensive Player Movement in Professional Basketball
Abstract:
The 2013 arrival of SportVU player tracking data in all NBA arenas introduced an overwhelming amount of on-court information—information which the league is still learning how to maximize for insights into player performance and basketball strategy. The data contain the spatial coordinates for the ball and every player on the court at 25 frames per second, which opens up avenues of player and team performance analysis that were not possible before this technology existed. This article serves as a step-by-step guide for how to leverage a data feed from SportVU for one NBA game into visualizable components that can model any player's movement on offense. We detail some utility functions that are helpful for manipulating SportVU data before applying it to the task of visualizing player offensive movement. We conclude with visualizations of the resulting output for one NBA game, as well as what the results look like aggregated across an entire season for three NBA stars with very different offensive tendencies.
Journal: The American Statistician
Pages: 72-79
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1395365
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395365
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:72-79
Template-Type: ReDIF-Article 1.0
Author-Name: Mine Çetinkaya-Rundel
Author-X-Name-First: Mine
Author-X-Name-Last: Çetinkaya-Rundel
Author-Name: Colin Rundel
Author-X-Name-First: Colin
Author-X-Name-Last: Rundel
Title: Infrastructure and Tools for Teaching Computing Throughout the Statistical Curriculum
Abstract:
Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of big data and data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. Much has been written in the statistics education literature about pedagogical tools and approaches to provide a practical computational foundation for students. This article discusses the computational infrastructure and toolkit choices to allow for these pedagogical innovations while minimizing frustration and improving adoption for both our students and instructors. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 58-65
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1397549
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1397549
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:58-65
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel Kaplan
Author-X-Name-First: Daniel
Author-X-Name-Last: Kaplan
Title: Teaching Stats for Data Science
Abstract:
“Data science” is a useful catchword for methods and concepts original to the field of statistics, but typically being applied to large, multivariate, observational records. Such datasets call for techniques not often part of an introduction to statistics: modeling, consideration of covariates, sophisticated visualization, and causal reasoning. This article re-imagines introductory statistics as an introduction to data science and proposes a sequence of 10 blocks that together compose a suitable course for extracting information from contemporary data. Recent extensions to the mosaic packages for R together with tools from the “tidyverse” provide a concise and readable notation for wrangling, visualization, model-building, and model interpretation: the fundamental computational tasks of data science.
Journal: The American Statistician
Pages: 89-96
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1398107
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1398107
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:89-96
Template-Type: ReDIF-Article 1.0
Author-Name: Santiago Velilla
Author-X-Name-First: Santiago
Author-X-Name-Last: Velilla
Title: Reply
Journal: The American Statistician
Pages: 117-119
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1398985
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1398985
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:117-119
Template-Type: ReDIF-Article 1.0
Author-Name: Jennifer Bryan
Author-X-Name-First: Jennifer
Author-X-Name-Last: Bryan
Title: Excuse Me, Do You Have a Moment to Talk About Version Control?
Abstract:
Data analysis, statistical research, and teaching statistics have at least one thing in common: these activities all produce many files! There are data files, source code, figures, tables, prepared reports, and much more. Most of these files evolve over the course of a project and often need to be shared with others, for reading or edits, as a project unfolds. Without explicit and structured management, project organization can easily descend into chaos, taking time away from the primary work and reducing the quality of the final product. This unhappy result can be avoided by repurposing tools and workflows from the software development world, namely, distributed version control. This article describes the use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows. Special attention is given to projects that use the statistical language R and, optionally, R Markdown documents. Supplementary materials include an annotated set of links to step-by-step tutorials, real world examples, and other useful learning resources. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 20-27
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2017.1399928
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1399928
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:20-27
Template-Type: ReDIF-Article 1.0
Author-Name: Hadley Wickham
Author-X-Name-First: Hadley
Author-X-Name-Last: Wickham
Author-Name: Jennifer Bryan
Author-X-Name-First: Jennifer
Author-X-Name-Last: Bryan
Author-Name: Nicole Lazar
Author-X-Name-First: Nicole
Author-X-Name-Last: Lazar
Title: Introduction: Special Issue on Data Science
Journal: The American Statistician
Pages: 1-1
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2018.1438699
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1438699
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:1-1
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 105-113
Issue: 1
Volume: 72
Year: 2018
Month: 1
X-DOI: 10.1080/00031305.2018.1444855
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1444855
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:72:y:2018:i:1:p:105-113
Template-Type: ReDIF-Article 1.0
Author-Name: Alejandro Quintela-del-Río
Author-X-Name-First: Alejandro
Author-X-Name-Last: Quintela-del-Río
Author-Name: Mario Francisco-Fernández
Author-X-Name-First: Mario
Author-X-Name-Last: Francisco-Fernández
Title: Excel Templates: A Helpful Tool for Teaching Statistics
Abstract:
This article describes a free, open-source collection of templates for the popular Excel (2013, and later versions) spreadsheet program. These templates are spreadsheet files that allow easy and intuitive learning and the implementation of practical examples concerning descriptive statistics, random variables, confidence intervals, and hypothesis testing. Although they are designed to be used with Excel, they can also be employed with other free spreadsheet programs (changing some particular formulas). Moreover, we exploit some possibilities of the ActiveX controls of the Excel Developer Menu to perform interactive Gaussian density charts. Finally, it is important to note that they can often be embedded in a web page, so it is not necessary to employ Excel software for their use. These templates have been designed as a useful tool to teach basic statistics and to carry out data analysis even when the students are not familiar with Excel. Additionally, they can be used as a complement to other analytical software packages. They aim to assist students in learning statistics, within an intuitive working environment. Supplementary materials with the Excel templates are available online.
Journal: The American Statistician
Pages: 317-325
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1186115
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1186115
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:317-325
Template-Type: ReDIF-Article 1.0
Author-Name: Philip M. Westgate
Author-X-Name-First: Philip M.
Author-X-Name-Last: Westgate
Author-Name: Woodrow W. Burchett
Author-X-Name-First: Woodrow W.
Author-X-Name-Last: Burchett
Title: A Comparison of Correlation Structure Selection Penalties for Generalized Estimating Equations
Abstract:
Correlated data are commonly analyzed using models constructed using population-averaged generalized estimating equations (GEEs). The specification of a population-averaged GEE model includes selection of a structure describing the correlation of repeated measures. Accurate specification of this structure can improve efficiency, whereas the finite-sample estimation of nuisance correlation parameters can inflate the variances of regression parameter estimates. Therefore, correlation structure selection criteria should penalize, or account for, correlation parameter estimation. In this article, we compare recently proposed penalties in terms of their impacts on correlation structure selection and regression parameter estimation, and give practical considerations for data analysts. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 344-353
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1200490
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1200490
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:344-353
Template-Type: ReDIF-Article 1.0
Author-Name: Frank Tuyl
Author-X-Name-First: Frank
Author-X-Name-Last: Tuyl
Title: A Note on Priors for the Multinomial Model
Abstract:
An “overall objective” prior proposed for the multinomial model is shown to be inadequate in the presence of zero counts. An earlier proposed reference prior for when interest is in a particular category suffers from similar problems. It is argued that there is no need to deviate from the uniform prior proposed by Jeffreys, for which links with a non-Bayesian approach, when prediction is of interest, are shown.
Journal: The American Statistician
Pages: 298-301
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1222309
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1222309
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:298-301
Template-Type: ReDIF-Article 1.0
Author-Name: Jesse Frey
Author-X-Name-First: Jesse
Author-X-Name-Last: Frey
Author-Name: Yimin Zhang
Author-X-Name-First: Yimin
Author-X-Name-Last: Zhang
Title: What Do Interpolated Nonparametric Confidence Intervals for Population Quantiles Guarantee?
Abstract:
The interval between two prespecified order statistics of a sample provides a distribution-free confidence interval for a population quantile. However, due to discreteness, only a small set of exact coverage probabilities is available. Interpolated confidence intervals are designed to expand the set of available coverage probabilities. However, we show here that the infimum of the coverage probability for an interpolated confidence interval is either the coverage probability for the inner interval or the coverage probability obtained by removing the more likely of the two extreme subintervals from the outer interval. Thus, without additional assumptions, interpolated intervals do not expand the set of available guaranteed coverage probabilities.
Journal: The American Statistician
Pages: 305-309
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1226952
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1226952
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:305-309
Template-Type: ReDIF-Article 1.0
Author-Name: Peter K. Dunn
Author-X-Name-First: Peter K.
Author-X-Name-Last: Dunn
Author-Name: Michael D. Carey
Author-X-Name-First: Michael D.
Author-X-Name-Last: Carey
Author-Name: Michael B. Farrar
Author-X-Name-First: Michael B.
Author-X-Name-Last: Farrar
Author-Name: Alice M. Richardson
Author-X-Name-First: Alice M.
Author-X-Name-Last: Richardson
Author-Name: Christine McDonald
Author-X-Name-First: Christine
Author-X-Name-Last: McDonald
Title: Introductory Statistics Textbooks and the GAISE Recommendations
Abstract:
The six recommendations made by the Guidelines for Assessment and Instruction in Statistics Education (GAISE) committee were first communicated in 2005 and more formally in 2010. In this article, 25 introductory statistics textbooks are examined to assess how well these textbooks have incorporated the three GAISE recommendations most relevant to implementation in textbooks (statistical literacy and thinking; use of real data; stress concepts over procedures). The implementation of another recommendation (using technology) is described but not assessed. In general, most textbooks appear to be adopting the GAISE recommendations reasonably well in both exposition and exercises. The textbooks are particularly adept at using real data, using real data well, and promoting statistical literacy. Textbooks are less adept—but still rated reasonably well, in general—at explaining concepts over procedures and promoting statistical thinking. In contrast, few textbooks have easy-to-use glossaries of statistical terms to assist with understanding of statistical language and literacy development. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 326-335
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1251972
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1251972
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:326-335
Template-Type: ReDIF-Article 1.0
Author-Name: Thomas J. DiCiccio
Author-X-Name-First: Thomas J.
Author-X-Name-Last: DiCiccio
Author-Name: Todd A. Kuffner
Author-X-Name-First: Todd A.
Author-X-Name-Last: Kuffner
Author-Name: G. Alastair Young
Author-X-Name-First: G. Alastair
Author-X-Name-Last: Young
Title: A Simple Analysis of the Exact Probability Matching Prior in the Location-Scale Model
Abstract:
It has long been asserted that in univariate location-scale models, when concerned with inference for either the location or scale parameter, the use of the inverse of the scale parameter as a Bayesian prior yields posterior credible sets that have exactly the correct frequentist confidence set interpretation. This claim dates to at least Peers, and has subsequently been noted by various authors, with varying degrees of justification. We present a simple, direct demonstration of the exact matching property of the posterior credible sets derived under use of this prior in the univariate location-scale model. This is done by establishing an equivalence between the conditional frequentist and posterior densities of the pivotal quantities on which conditional frequentist inferences are based.
Journal: The American Statistician
Pages: 302-304
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1255662
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1255662
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:302-304
Template-Type: ReDIF-Article 1.0
Author-Name: Joseph B. Lang
Author-X-Name-First: Joseph B.
Author-X-Name-Last: Lang
Title: Mean-Minimum Exact Confidence Intervals
Abstract:
This article introduces mean-minimum (MM) exact confidence intervals for a binomial probability. These intervals guarantee that both the mean and the minimum frequentist coverage never drop below specified values. For example, an MM 95[93]% interval has mean coverage at least 95% and minimum coverage at least 93%. In the conventional sense, such an interval can be viewed as an exact 93% interval that has mean coverage at least 95% or it can be viewed as an approximate 95% interval that has minimum coverage at least 93%. Graphical and numerical summaries of coverage and expected length suggest that the Blaker-based MM exact interval is an attractive alternative to, even an improvement over, commonly recommended approximate and exact intervals, including the Agresti–Coull approximate interval, the Clopper–Pearson (CP) exact interval, and the more recently recommended CP-, Blaker-, and Sterne-based mean-coverage-adjusted approximate intervals.
Journal: The American Statistician
Pages: 354-368
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1256838
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1256838
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:354-368
Template-Type: ReDIF-Article 1.0
Author-Name: Dabao Zhang
Author-X-Name-First: Dabao
Author-X-Name-Last: Zhang
Title: A Coefficient of Determination for Generalized Linear Models
Abstract:
The coefficient of determination, a.k.a. R2, is well-defined in linear regression models, and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it for generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the remaining variation of the dependent variable after modeling the predictive effects of the independent variables. Unlike other definitions that demand complete specification of the likelihood function, our definition of R2 only needs the mean and variance functions, and is thus applicable to more general quasi-models. It is consistent with the classical measure of uncertainty using variance, and reduces to the classical definition of the coefficient of determination when linear regression models are considered.
Journal: The American Statistician
Pages: 310-316
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1256839
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1256839
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:310-316
Template-Type: ReDIF-Article 1.0
Author-Name: Ning Hao
Author-X-Name-First: Ning
Author-X-Name-Last: Hao
Author-Name: Hao Helen Zhang
Author-X-Name-First: Hao Helen
Author-X-Name-Last: Zhang
Title: A Note on High-Dimensional Linear Regression With Interactions
Abstract:
The problem of interaction selection in high-dimensional data analysis has recently received much attention. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss how to give a formal definition of “importance” for main and interaction effects. Then we focus on two-stage methods, which are computationally attractive for high-dimensional data analysis but thus far have been regarded as heuristic. We revisit the counterexample of Turlach and provide new insight to justify two-stage methods from the theoretical perspective. In the end, we suggest new strategies for interaction selection under the marginality principle and provide some simulation results.
Journal: The American Statistician
Pages: 291-297
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1264311
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264311
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:291-297
Template-Type: ReDIF-Article 1.0
Author-Name: Luís Gustavo Esteves
Author-X-Name-First: Luís Gustavo
Author-X-Name-Last: Esteves
Author-Name: Rafael Izbicki
Author-X-Name-First: Rafael
Author-X-Name-Last: Izbicki
Author-Name: Rafael Bassi Stern
Author-X-Name-First: Rafael Bassi
Author-X-Name-Last: Stern
Title: Teaching Decision Theory Proof Strategies Using a Crowdsourcing Problem
Abstract:
Teaching how to derive minimax decision rules can be challenging because of the lack of examples that are simple enough to be used in the classroom. Motivated by this challenge, we provide a new example that illustrates the use of standard techniques in the derivation of optimal decision rules under the Bayes and minimax approaches. We discuss how to predict the value of an unknown quantity, θ ∈ {0, 1}, given the opinions of n experts. An important example of such a crowdsourcing problem occurs in modern cosmology, where θ indicates whether a given galaxy is merging or not, and Y1, …, Yn are the opinions from n astronomers regarding θ. We use the obtained prediction rules to discuss advantages and disadvantages of the Bayes and minimax approaches to decision theory. The material presented here is intended to be taught to first-year graduate students.
Journal: The American Statistician
Pages: 336-343
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2016.1264316
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1264316
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:336-343
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on “The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments” by Bar-Gera (2017)
Journal: The American Statistician
Pages: 373-375
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2017.1358215
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1358215
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:373-375
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Collaborators
Journal: The American Statistician
Pages: 376-377
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2017.1395629
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395629
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:376-377
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Reviews of Books and Teaching Materials
Journal: The American Statistician
Pages: 369-372
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2017.1395630
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1395630
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:369-372
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Board EOV
Journal: The American Statistician
Pages: ebi-ebi
Issue: 4
Volume: 71
Year: 2017
Month: 10
X-DOI: 10.1080/00031305.2017.1400355
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1400355
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:71:y:2017:i:4:p:ebi-ebi
Template-Type: ReDIF-Article 1.0
Author-Name: Sean Kross
Author-X-Name-First: Sean
Author-X-Name-Last: Kross
Author-Name: Roger D. Peng
Author-X-Name-First: Roger D.
Author-X-Name-Last: Peng
Author-Name: Brian S. Caffo
Author-X-Name-First: Brian S.
Author-X-Name-Last: Caffo
Author-Name: Ira Gooding
Author-X-Name-First: Ira
Author-X-Name-Last: Gooding
Author-Name: Jeffrey T. Leek
Author-X-Name-First: Jeffrey T.
Author-X-Name-Last: Leek
Title: The Democratization of Data Science Education
Abstract:
Over the last three decades, data have become ubiquitous and cheap. This transition has accelerated over the last five years and training in statistics, machine learning, and data analysis has struggled to keep up. In April 2014, we launched a program of nine courses, the Johns Hopkins Data Science Specialization, which has now had more than 4 million enrollments over the past five years. Here, the program is described and compared to standard data science curricula as they were organized in 2014 and 2015. We show that novel pedagogical and administrative decisions introduced in our program are now standard in online data science programs. The impact of the Data Science Specialization on data science education in the U.S. is also discussed. Finally, we conclude with some thoughts about the future of data science education in a data democratized world.
Journal: The American Statistician
Pages: 1-7
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2019.1668849
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1668849
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:1-7
Template-Type: ReDIF-Article 1.0
Author-Name: Fulya Gokalp Yavuz
Author-X-Name-First: Fulya Gokalp
Author-X-Name-Last: Yavuz
Author-Name: Mark Daniel Ward
Author-X-Name-First: Mark Daniel
Author-X-Name-Last: Ward
Title: Fostering Undergraduate Data Science
Abstract:
Data Science is one of the newest interdisciplinary areas. It is transforming our lives unexpectedly fast. This transformation is also happening in our learning styles and practicing habits. We advocate an approach to data science training that uses several types of computational tools, including R, bash, awk, regular expressions, SQL, and XPath, often used in tandem. We discuss ways for undergraduate mentees to learn about data science topics, at an early point in their training. We give some intuition for researchers, professors, and practitioners about how to effectively embed real-life examples into data science learning environments. As a result, we have a unified program built on a foundation of team-oriented, data-driven projects.
Journal: The American Statistician
Pages: 8-16
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2017.1407360
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1407360
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:8-16
Template-Type: ReDIF-Article 1.0
Author-Name: Debashis Chatterjee
Author-X-Name-First: Debashis
Author-X-Name-Last: Chatterjee
Author-Name: Trisha Maitra
Author-X-Name-First: Trisha
Author-X-Name-Last: Maitra
Author-Name: Sourabh Bhattacharya
Author-X-Name-First: Sourabh
Author-X-Name-Last: Bhattacharya
Title: A Short Note on Almost Sure Convergence of Bayes Factors in the General Set-Up
Abstract:
Although there is a significant literature on the asymptotic theory of Bayes factors, the set-ups considered are usually specialized and often involve independent and identically distributed data. Even in such specialized cases, mostly weak consistency results are available. In this article, for the first time ever, we derive the almost sure convergence theory of the Bayes factor in the general set-up that includes even dependent data and misspecified models. Somewhat surprisingly, the key to the proof of such a general theory is a simple application of a result of Shalizi to a well-known identity satisfied by the Bayes factor. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 17-20
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2017.1397548
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1397548
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:17-20
Template-Type: ReDIF-Article 1.0
Author-Name: Niels G. Waller
Author-X-Name-First: Niels G.
Author-X-Name-Last: Waller
Title: Generating Correlation Matrices With Specified Eigenvalues Using the Method of Alternating Projections
Abstract:
This article describes a new algorithm for generating correlation matrices with specified eigenvalues. The algorithm uses the method of alternating projections (MAP) that was first described by Neumann. The MAP algorithm for generating correlation matrices is both easy to understand and to program in higher-level computer languages, making this method accessible to applied researchers with no formal training in advanced mathematics. Simulations indicate that the new algorithm has excellent convergence properties. Correlation matrices with specified eigenvalues can be profitably used in Monte Carlo research in statistics, psychometrics, computer science, and related disciplines. To encourage such use, R code (R Core Team) for implementing the algorithm is provided in the supplementary material.
Journal: The American Statistician
Pages: 21-28
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2017.1401960
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1401960
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:21-28
Template-Type: ReDIF-Article 1.0
Author-Name: Xinjie Hu
Author-X-Name-First: Xinjie
Author-X-Name-Last: Hu
Author-Name: Aekyung Jung
Author-X-Name-First: Aekyung
Author-X-Name-Last: Jung
Author-Name: Gengsheng Qin
Author-X-Name-First: Gengsheng
Author-X-Name-Last: Qin
Title: Interval Estimation for the Correlation Coefficient
Abstract:
The correlation coefficient (CC) is a standard measure of a possible linear association between two continuous random variables. The CC plays a significant role in many scientific disciplines. For a bivariate normal distribution, there are many types of confidence intervals for the CC, such as z-transformation and maximum likelihood-based intervals. However, when the underlying bivariate distribution is unknown, the construction of confidence intervals for the CC is not well-developed. In this paper, we discuss various interval estimation methods for the CC. We propose a generalized confidence interval for the CC when the underlying bivariate distribution is a normal distribution, and two empirical likelihood-based intervals for the CC when the underlying bivariate distribution is unknown. We also conduct extensive simulation studies to compare the new intervals with existing intervals in terms of coverage probability and interval length. Finally, two real examples are used to demonstrate the application of the proposed methods.
Journal: The American Statistician
Pages: 29-36
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2018.1437077
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437077
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:29-36
Template-Type: ReDIF-Article 1.0
Author-Name: Johan René van Dorp
Author-X-Name-First: Johan René
Author-X-Name-Last: van Dorp
Author-Name: M. C. Jones
Author-X-Name-First: M. C.
Author-X-Name-Last: Jones
Title: The Johnson System of Frequency Curves—Historical, Graphical, and Limiting Perspectives
Abstract:
The idea of transforming one random variate to another with a more convenient density was developed in the first half of the 20th century. In his thesis, Norman L. Johnson (1917–2004) developed a pioneering system of transformations of the standard normal distribution which gained substantial popularity in the second half of the 20th century and beyond. In Johnson’s 1949 Biometrika paper entitled Systems of frequency curves generated by methods of translation, summarizing that thesis, one of his primary interests was the behavior of the shape of the probability density functions as their parameter values change. Herein, we attempt to further elucidate this behavior through a series of geometric expositions of that transformation process. In these expositions insight is obtained into the behavior of Johnson’s density functions, and their skewness and kurtosis, as they converge to their limiting distributions, a topic that has received little attention.
Journal: The American Statistician
Pages: 37-52
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2019.1637778
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1637778
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:37-52
Template-Type: ReDIF-Article 1.0
Author-Name: Chien-Lang Su
Author-X-Name-First: Chien-Lang
Author-X-Name-Last: Su
Author-Name: Sun-Hao Chang
Author-X-Name-First: Sun-Hao
Author-X-Name-Last: Chang
Author-Name: Ruby Chiu-Hsing Weng
Author-X-Name-First: Ruby Chiu-Hsing
Author-X-Name-Last: Weng
Title: A Note on Item Response Theory Modeling for Online Customer Ratings
Abstract:
Online consumer product ratings data are increasing rapidly. While most of the current graphical displays mainly represent the average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly in the modeling of scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a great impact on the parameter estimation. This impact on estimation was explained through rater behavior. We also discuss correlation among raters and assess the prediction accuracy for both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negative correlations in ratings.
Journal: The American Statistician
Pages: 53-63
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2017.1422804
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1422804
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:53-63
Template-Type: ReDIF-Article 1.0
Author-Name: Tamal Ghosh
Author-X-Name-First: Tamal
Author-X-Name-Last: Ghosh
Author-Name: Malay Ghosh
Author-X-Name-First: Malay
Author-X-Name-Last: Ghosh
Author-Name: Tatsuya Kubokawa
Author-X-Name-First: Tatsuya
Author-X-Name-Last: Kubokawa
Title: On the Loss Robustness of Least-Square Estimators
Abstract:
The article revisits univariate and multivariate linear regression models. It is shown that least-square estimators (LSEs) are minimum risk estimators in a general class of linear unbiased estimators under a general divergence loss. This amounts to the loss robustness of LSEs.
Journal: The American Statistician
Pages: 64-67
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2018.1529626
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1529626
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:64-67
Template-Type: ReDIF-Article 1.0
Author-Name: Román Salmerón Gómez
Author-X-Name-First: Román
Author-X-Name-Last: Salmerón Gómez
Author-Name: Catalina García García
Author-X-Name-First: Catalina
Author-X-Name-Last: García García
Author-Name: Jose García Pérez
Author-X-Name-First: Jose
Author-X-Name-Last: García Pérez
Title: Comment on “A Note on Collinearity Diagnostics and Centering” by Velilla (2018)
Journal: The American Statistician
Pages: 68-71
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2019.1635527
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1635527
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:68-71
Template-Type: ReDIF-Article 1.0
Author-Name: Victor De Oliveira
Author-X-Name-First: Victor
Author-X-Name-Last: De Oliveira
Title: Models for Geostatistical Binary Data: Properties and Connections
Abstract:
This article explores models for geostatistical data for situations in which the region where the phenomenon of interest varies is partitioned into two disjoint subregions. This is called a binary map. The goals of the article are 3-fold. First, a review is provided of the classes of models that have been proposed so far in the literature for geostatistical binary data as well as a description of their main features. A problem with the use of moment-based models is pointed out. Second, a generalization is provided of the clipped Gaussian random field that eases regression function modeling, interpretation of the regression parameters, and establishing connections with other models. The second-order properties of this model are studied in some detail. Finally, connections between the aforementioned classes of models are established, showing that some of these are reformulations (reparameterizations) of the other classes of models.
Journal: The American Statistician
Pages: 72-79
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2018.1444674
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1444674
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:72-79
Template-Type: ReDIF-Article 1.0
Author-Name: Peter H. Peskun
Author-X-Name-First: Peter H.
Author-X-Name-Last: Peskun
Title: Two-Tailed p-Values and Coherent Measures of Evidence
Abstract:
In a test of significance, it is common practice to report the p-value as one way of summarizing the incompatibility between a set of data and a proposed model for the data constructed under a set of assumptions together with a null hypothesis. However, the p-value does have some flaws: one is its general definition for two-sided tests, and a related, serious logical one is its incoherence when interpreted as a statistical measure of evidence for its respective null hypothesis. We shall address these two issues in this article.
Journal: The American Statistician
Pages: 80-86
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2018.1475304
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1475304
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:80-86
Template-Type: ReDIF-Article 1.0
Author-Name: Robert B. Gramacy
Author-X-Name-First: Robert B.
Author-X-Name-Last: Gramacy
Title: A Shiny Update to an Old Experiment Game
Abstract:
Games can be a powerful tool for learning about statistical methodology. Effective game design involves a fine balance between caricature and realism, to simultaneously illustrate salient concepts in a controlled setting and serve as a testament to real-world applicability. Striking that balance is particularly challenging in response surface and design domains, where real-world scenarios often play out over long time scales, during which theories are revised, model and inferential techniques are improved, and knowledge is updated. Here, I present a game, borrowing liberally from one first played over 40 years ago, which attempts to achieve that balance while reinforcing a cascade of topics in modern nonparametric response surfaces, sequential design, and optimization. The game embeds a blackbox simulation within a shiny app whose interface is designed to simulate a realistic information–availability setting, while offering a stimulating, competitive environment wherein students can try out new methodology, and ultimately appreciate its power and limitations. Interface, rules, timing with course material, and evaluation are described, along with a “case study” involving a cohort of students at Virginia Tech. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 87-92
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2018.1505659
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505659
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:87-92
Template-Type: ReDIF-Article 1.0
Author-Name: Barry C. Arnold
Author-X-Name-First: Barry C.
Author-X-Name-Last: Arnold
Title: Further Examples Related to the Identical Distribution of X/(X+Y) and Y/(X+Y)
Abstract:
The study of conditions under which a two-dimensional random variable (X, Y) will have the property that X/(X+Y) =d Y/(X+Y) was initiated by Bhattacharjee and Dhar. Some additional, perhaps unexpected, examples related to this phenomenon are provided. Discrete and absolutely continuous cases are discussed in detail. Singular continuous cases are briefly mentioned.
Journal: The American Statistician
Pages: 93-97
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2019.1575772
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1575772
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:93-97
Template-Type: ReDIF-Article 1.0
Author-Name: Micha Mandel
Author-X-Name-First: Micha
Author-X-Name-Last: Mandel
Title: The Scaled Uniform Model Revisited
Abstract:
Sufficiency, conditionality, and invariance are basic principles of statistical inference. Current mathematical statistics courses do not devote much teaching time to these classical principles, and even ignore the latter two, in order to teach modern methods. However, being the philosophical cornerstones of statistical inference, a minimal understanding of these principles should be part of any curriculum in statistics. The scaled uniform model is used here to demonstrate the importance and usefulness of the conditionality principle, which is probably the most basic and least familiar among the three.
Journal: The American Statistician
Pages: 98-100
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2019.1604431
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604431
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:98-100
Template-Type: ReDIF-Article 1.0
Author-Name: Jean-Louis Foulley
Author-X-Name-First: Jean-Louis
Author-X-Name-Last: Foulley
Title: Benjamin, D. J., and Berger, J. O. (2019), “Three Recommendations for Improving the Use of p-Values”, The American Statistician, 73, 186–191: Comment by Foulley
Journal: The American Statistician
Pages: 101-102
Issue: 1
Volume: 74
Year: 2020
Month: 1
X-DOI: 10.1080/00031305.2019.1668850
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1668850
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:1:p:101-102
Template-Type: ReDIF-Article 1.0
Author-Name: Todd A. Kuffner
Author-X-Name-First: Todd A.
Author-X-Name-Last: Kuffner
Author-Name: Stephen G. Walker
Author-X-Name-First: Stephen G.
Author-X-Name-Last: Walker
Title: Why are p-Values Controversial?
Abstract:
While it is often argued that a p-value is a probability (see Wasserstein and Lazar), we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test which maps to the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis.
Journal: The American Statistician
Pages: 1-3
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2016.1277161
File-URL: http://hdl.handle.net/10.1080/00031305.2016.1277161
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:1-3
Template-Type: ReDIF-Article 1.0
Author-Name: Gyuhyeong Goh
Author-X-Name-First: Gyuhyeong
Author-X-Name-Last: Goh
Author-Name: Dipak K. Dey
Author-X-Name-First: Dipak K.
Author-X-Name-Last: Dey
Title: Asymptotic Properties of Marginal Least-Square Estimator for Ultrahigh-Dimensional Linear Regression Models with Correlated Errors
Abstract:
In this article, we discuss asymptotic properties of marginal least-square estimator for ultrahigh-dimensional linear regression models. We are specifically interested in probabilistic consistency of the marginal least-square estimator in the presence of correlated errors. We show that under a partial orthogonality condition, the marginal least-square estimator can achieve variable selection consistency. In addition, we demonstrate that if a mutual orthogonality holds, the marginal least-square estimator satisfies estimation consistency. The discussed theories are exemplified through extensive simulation studies.
Journal: The American Statistician
Pages: 4-9
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1302359
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1302359
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:4-9
Template-Type: ReDIF-Article 1.0
Author-Name: Stephen Portnoy
Author-X-Name-First: Stephen
Author-X-Name-Last: Portnoy
Title: Invariance, Optimality, and a 1-Observation Confidence Interval for a Normal Mean
Abstract:
In a 1965 Decision Theory course at Stanford University, Charles Stein began a digression with “an amusing problem”: is there a proper confidence interval for the mean based on a single observation from a normal distribution with both mean and variance unknown? Stein introduced the interval with endpoints ± c|X| and showed indeed that for c large enough, the minimum coverage probability (over all values for the mean and variance) could be made arbitrarily near one. While the problem and coverage calculation were in the author’s hand-written notes from the course, there was no development of any optimality result for the interval. Here, the Hunt–Stein construction plus analysis based on special features of the problem provides a “minimax” rule in the sense that it minimizes the maximum expected length among all procedures with fixed coverage (or, equivalently, maximizes the minimal coverage among all procedures with a fixed expected length). The minimax rule is a mixture of two confidence procedures that are equivariant under scale and sign changes, and are uniformly better than the classroom example or the natural interval X ± c|X| .
Journal: The American Statistician
Pages: 10-15
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1360796
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1360796
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:10-15
Template-Type: ReDIF-Article 1.0
Author-Name: M. C. Jones
Author-X-Name-First: M. C.
Author-X-Name-Last: Jones
Author-Name: Éric Marchand
Author-X-Name-First: Éric
Author-X-Name-Last: Marchand
Author-Name: William E. Strawderman
Author-X-Name-First: William E.
Author-X-Name-Last: Strawderman
Title: On An Intriguing Distributional Identity
Abstract:
For a continuous random variable X with support equal to (a, b), with c.d.f. F, and g: Ω1 → Ω2 a continuous, strictly increasing function, such that Ω1 ∩ Ω2 ⊇ (a, b), but otherwise arbitrary, we establish that the random variables F(X) − F(g(X)) and F(g−1(X)) − F(X) have the same distribution. Further developments, accompanied by illustrations and observations, address as well the equidistribution identity U − ψ(U) =d ψ−1(U) − U for U ∼ U(0, 1), where =d denotes equality in distribution and ψ is a continuous, strictly increasing and onto function, but otherwise arbitrary. Finally, we expand on applications with connections to variance reduction techniques, the discrepancy between distributions, and a risk identity in predictive density estimation.
Journal: The American Statistician
Pages: 16-21
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1375984
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1375984
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:16-21
Template-Type: ReDIF-Article 1.0
Author-Name: Mithat Gönen
Author-X-Name-First: Mithat
Author-X-Name-Last: Gönen
Author-Name: Wesley O. Johnson
Author-X-Name-First: Wesley O.
Author-X-Name-Last: Johnson
Author-Name: Yonggang Lu
Author-X-Name-First: Yonggang
Author-X-Name-Last: Lu
Author-Name: Peter H. Westfall
Author-X-Name-First: Peter H.
Author-X-Name-Last: Westfall
Title: Comparing Objective and Subjective Bayes Factors for the Two-Sample Comparison: The Classification Theorem in Action
Abstract:
Many Bayes factors have been proposed for comparing population means in two-sample (independent samples) studies. Recently, Wang and Liu presented an “objective” Bayes factor (BF) as an alternative to a “subjective” one presented by Gönen et al. Their report was evidently intended to show the superiority of their BF based on “undesirable behavior” of the latter. A wonderful aspect of Bayesian models is that they provide an opportunity to “lay all cards on the table.” What distinguishes the various BFs in the two-sample problem is the choice of priors (cards) for the model parameters. This article discusses desiderata of BFs that have been proposed, and proposes a new criterion to compare BFs, no matter whether subjectively or objectively determined. A BF may be preferred if it correctly classifies the data as coming from the correct model most often. The criterion is based on a famous result in classification theory to minimize the total probability of misclassification. This criterion is objective, easily verified by simulation, shows clearly the effects (positive or negative) of assuming particular priors, provides new insights into the appropriateness of BFs in general, and provides a new answer to the question, “Which BF is best?”
Journal: The American Statistician
Pages: 22-31
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1322142
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322142
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:22-31
Template-Type: ReDIF-Article 1.0
Author-Name: Diana C. Mutz
Author-X-Name-First: Diana C.
Author-X-Name-Last: Mutz
Author-Name: Robin Pemantle
Author-X-Name-First: Robin
Author-X-Name-Last: Pemantle
Author-Name: Philip Pham
Author-X-Name-First: Philip
Author-X-Name-Last: Pham
Title: The Perils of Balance Testing in Experimental Design: Messy Analyses of Clean Data
Abstract:
Widespread concern over the credibility of published results has led to scrutiny of statistical practices. We address one aspect of this problem that stems from the use of balance tests in conjunction with experimental data. When random assignment is botched, due either to mistakes in implementation or differential attrition, balance tests can be an important tool in determining whether to treat the data as observational versus experimental. Unfortunately, the use of balance tests has become commonplace in analyses of “clean” data, that is, data for which random assignment can be stipulated. Here, we show that balance tests can destroy the basis on which scientific conclusions are formed, and can lead to erroneous and even fraudulent conclusions. We conclude by advocating that scientists and journal editors resist the use of balance tests in all analyses of clean data. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 32-42
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1322143
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1322143
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:32-42
Template-Type: ReDIF-Article 1.0
Author-Name: Youyi Fong
Author-X-Name-First: Youyi
Author-X-Name-Last: Fong
Author-Name: Ying Huang
Author-X-Name-First: Ying
Author-X-Name-Last: Huang
Title: Modified Wilcoxon–Mann–Whitney Test and Power Against Strong Null
Abstract:
The Wilcoxon–Mann–Whitney (WMW) test is a popular rank-based two-sample testing procedure for the strong null hypothesis that the two samples come from the same distribution. A modified WMW test, the Fligner–Policello (FP) test, has been proposed for comparing the medians of two populations. A fact that may be under-appreciated among some practitioners is that the FP test can also be used to test the strong null like the WMW. In this article, we compare the power of the WMW and FP tests for testing the strong null. Our results show that neither test is uniformly better than the other and that there can be substantial differences in power between the two choices. We propose a new, modified WMW test that combines the WMW and FP tests. Monte Carlo studies show that the combined test has good power compared to either the WMW or FP test. We provide a fast implementation of the proposed test in open-source software. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 43-49
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1328375
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1328375
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:43-49
Template-Type: ReDIF-Article 1.0
Author-Name: Xiaofei Wang
Author-X-Name-First: Xiaofei
Author-X-Name-Last: Wang
Author-Name: Nicholas G. Reich
Author-X-Name-First: Nicholas G.
Author-X-Name-Last: Reich
Author-Name: Nicholas J. Horton
Author-X-Name-First: Nicholas J.
Author-X-Name-Last: Horton
Title: Enriching Students’ Conceptual Understanding of Confidence Intervals: An Interactive Trivia-Based Classroom Activity
Abstract:
Confidence intervals provide a way to determine plausible values for a population parameter. They are omnipresent in research articles involving statistical analyses. Appropriately, a key statistical literacy learning objective is the ability to interpret and understand confidence intervals in a wide range of settings. As instructors, we devote a considerable amount of time and effort to ensure that students master this topic in introductory courses and beyond. Yet, studies continue to find that confidence intervals are commonly misinterpreted and that even experts have trouble calibrating their individual confidence levels. In this article, we present a 10-min trivia game-based activity that addresses these misconceptions by exposing students to confidence intervals from a personal perspective. We describe how the activity can be integrated into a statistics course as a one-time activity or with repetition at intervals throughout a course, discuss results of using the activity in class, and present possible extensions. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 50-55
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1305294
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1305294
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:50-55
Template-Type: ReDIF-Article 1.0
Author-Name: Joel E. Cohen
Author-X-Name-First: Joel E.
Author-X-Name-Last: Cohen
Title: Sum of a Random Number of Correlated Random Variables that Depend on the Number of Summands
Abstract:
The mean and variance of a sum of a random number of random variables are well known when the number of summands is independent of each summand and when the summands are independent and identically distributed (iid), or when all summands are identical. In scientific and financial applications, the preceding conditions are often too restrictive. Here, we calculate the mean and variance of a sum of a random number of random summands when the mean and variance of each summand depend on the number of summands and when every pair of summands has the same correlation. This article shows that the variance increases with the correlation between summands and equals the variance in the iid or identical cases when the correlation is zero or one.
Journal: The American Statistician
Pages: 56-60
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1311283
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1311283
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:56-60
Template-Type: ReDIF-Article 1.0
Author-Name: Mario A. Davidson
Author-X-Name-First: Mario A.
Author-X-Name-Last: Davidson
Author-Name: Charlene M. Dewey
Author-X-Name-First: Charlene M.
Author-X-Name-Last: Dewey
Author-Name: Amy E. Fleming
Author-X-Name-First: Amy E.
Author-X-Name-Last: Fleming
Title: Teaching Communication in a Statistical Collaboration Course: A Feasible, Project-Based, Multimodal Curriculum
Abstract:
Many schools offer a statistical collaboration curriculum using standard instructional methods such as lectures whereby students are taught to successfully apply their training. The process of building statisticians' collaborative skills and characteristics can be challenging due to logistical issues, time constraints, unstructured research problems, and resources. Instructors vary in their pedagogy and topics taught, and students' experiences vary. There is a dearth of literature describing how to implement a course integrating communication skills, critical thinking, collaboration, and the integration of team members in a learner-centered format. Few courses integrate behavior-based learning using role-playing, video demonstration and feedback, case-based teaching activities, and presentation of basic statistical concepts. We have developed and implemented a two-semester biostatistics collaboration course, whose purpose is to develop the students' knowledge, skills, attitudes, and behaviors necessary to interact effectively with investigators. Our innovative curriculum uses a multimodal, project-based, experiential process to address real-world problems provided by real and/or simulated collaborators while minimizing usual challenges. Rubrics and peer evaluation forms are offered as online supplementary materials. This article describes how a collaboration curriculum focusing on communication and team practice is feasible, how it enhances skill and professionalism, and how it can be implemented at other institutions.
Journal: The American Statistician
Pages: 61-69
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2018.1448890
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1448890
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:61-69
Template-Type: ReDIF-Article 1.0
Author-Name: Dongmeng Liu
Author-X-Name-First: Dongmeng
Author-X-Name-Last: Liu
Author-Name: Jinko Graham
Author-X-Name-First: Jinko
Author-X-Name-Last: Graham
Title: Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering
Abstract:
We propose two probability-like measures of individual cluster-membership certainty that can be applied to a hard partition of the sample such as that obtained from the partitioning around medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual’s tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition using these measures. We evaluate the performance of both measures in individuals with ambiguous cluster membership, using simulated binary datasets that have been partitioned by the PAM algorithm or continuous datasets that have been partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft-clustering algorithms such as fuzzy analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher’s classic dataset on irises.
Journal: The American Statistician
Pages: 70-79
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2018.1459315
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459315
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:70-79
Template-Type: ReDIF-Article 1.0
Author-Name: Jianjun Wang
Author-X-Name-First: Jianjun
Author-X-Name-Last: Wang
Author-Name: Dallas E. Johnson
Author-X-Name-First: Dallas E.
Author-X-Name-Last: Johnson
Title: An Examination of Discrepancies in Multiple Imputation Procedures Between SAS® and SPSS®
Abstract:
Multiple imputation (MI) has become a feasible method to replace missing data due to the rapid development of computer technology over the past three decades. Nonetheless, a unique issue with MI hinges on the fact that different software packages can give different results. Even when one begins with the same random number seed, conflicting findings can be obtained from the same data under an identical imputation model between SAS® and SPSS®. Consequently, as illustrated in this article, a predictor variable can be claimed both significant and not significant depending on the software being used. Based on the considerations of multiple imputation steps, including result pooling, default selection, and different numbers of imputations, practical suggestions are provided to minimize the discrepancies in the results obtained when using MI. Features of Stata® are briefly reviewed in the Discussion section to broaden the comparison of MI computing across widely used software packages.
Journal: The American Statistician
Pages: 80-88
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2018.1437078
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1437078
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:80-88
Template-Type: ReDIF-Article 1.0
Author-Name: Spyros Missiakoulis
Author-X-Name-First: Spyros
Author-X-Name-Last: Missiakoulis
Title: Phlegon's Stem-and-Leaf Display
Abstract:
The Greek writer Phlegon (80–140 AD) from Tralles in Asia Minor wrote a book entitled On Long-lived Persons that contains a long list of people over a hundred years old. He collected data from the Roman censuses. With respect to the history of statistics, Phlegon's book is the earliest surviving text to use the Stem-and-Leaf display of collected data.
Journal: The American Statistician
Pages: 89-93
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2017.1328376
File-URL: http://hdl.handle.net/10.1080/00031305.2017.1328376
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:89-93
Template-Type: ReDIF-Article 1.0
Author-Name: Megan D. Higgs
Author-X-Name-First: Megan D.
Author-X-Name-Last: Higgs
Author-Name: Xiaoke Zhang
Author-X-Name-First: Xiaoke
Author-X-Name-Last: Zhang
Author-Name: Angelo Elmi
Author-X-Name-First: Angelo
Author-X-Name-Last: Elmi
Author-Name: James M. Flegal
Author-X-Name-First: James M.
Author-X-Name-Last: Flegal
Author-Name: Jessica Utts
Author-X-Name-First: Jessica
Author-X-Name-Last: Utts
Author-Name: Sandra E. Safo
Author-X-Name-First: Sandra E.
Author-X-Name-Last: Safo
Author-Name: Craig A. Rolling
Author-X-Name-First: Craig A.
Author-X-Name-Last: Rolling
Author-Name: Michael J. Higgins
Author-X-Name-First: Michael J.
Author-X-Name-Last: Higgins
Author-Name: Jingyi Jessica Li
Author-X-Name-First: Jingyi
Author-X-Name-Last: Jessica Li
Title: blogdown: Creating Websites With R Markdown
Journal: The American Statistician
Pages: 94-104
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2018.1538846
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1538846
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:94-104
Template-Type: ReDIF-Article 1.0
Author-Name: M.C. Jones
Author-X-Name-First: M.C.
Author-X-Name-Last: Jones
Title: Letter to the Editor
Journal: The American Statistician
Pages: 105-105
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2018.1556736
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1556736
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:105-105
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Collaborators
Journal: The American Statistician
Pages: 106-108
Issue: 1
Volume: 73
Year: 2019
Month: 1
X-DOI: 10.1080/00031305.2018.1538832
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1538832
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:73:y:2019:i:1:p:106-108
Template-Type: ReDIF-Article 1.0
Author-Name: Tim B. Swartz
Author-X-Name-First: Tim B.
Author-X-Name-Last: Swartz
Title: Where Should I Publish My Sports Paper?
Abstract:
With the increasing fascination of sport in society and the increasing availability of sport-related data, there are great opportunities to carry out sports analytics research. In this article, we discuss some of the issues that are relevant to publishing in the field of sports analytics. Potential publication outlets are identified, some summary statistics are given, and some experiences and opinions are provided.
Journal: The American Statistician
Pages: 103-108
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1459842
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1459842
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:103-108
Template-Type: ReDIF-Article 1.0
Author-Name: Anne Lott
Author-X-Name-First: Anne
Author-X-Name-Last: Lott
Author-Name: Jerome P. Reiter
Author-X-Name-First: Jerome P.
Author-X-Name-Last: Reiter
Title: Wilson Confidence Intervals for Binomial Proportions With Multiple Imputation for Missing Data
Abstract:
We present a Wilson interval for binomial proportions for use with multiple imputation for missing data. Using simulation studies, we show that it can have better repeated sampling properties than the usual confidence interval for binomial proportions based on Rubin’s combining rules. Further, in contrast to the usual multiple imputation confidence interval for proportions, the multiple imputation Wilson interval is always bounded by zero and one. Supplementary material is available online.
Journal: The American Statistician
Pages: 109-115
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1473796
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1473796
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:109-115
Template-Type: ReDIF-Article 1.0
Author-Name: Ernest C. Davenport
Author-X-Name-First: Ernest C.
Author-X-Name-Last: Davenport
Author-Name: Kyle Nickodem
Author-X-Name-First: Kyle
Author-X-Name-Last: Nickodem
Author-Name: Mark L. Davison
Author-X-Name-First: Mark L.
Author-X-Name-Last: Davison
Author-Name: Gareth Phillips
Author-X-Name-First: Gareth
Author-X-Name-Last: Phillips
Author-Name: Edmund Graham
Author-X-Name-First: Edmund
Author-X-Name-Last: Graham
Title: The Relative Performance Index: Neutralizing Simpson's Paradox
Abstract:
Comparing populations on one or more variables is often of interest. These comparisons are typically made using the mean; however, it is well known that mean comparisons can lead to misinterpretation because of Simpson's paradox. Simpson's paradox occurs when there is a differential distribution of subpopulations across the populations being compared and the means of those subpopulations are different. This article develops the relative performance index (RPI) to ameliorate effects of Simpson's paradox. Data from the National Assessment of Educational Progress (NAEP) are used to illustrate use of the new index. The utility of RPI is compared to the population mean and a prior index, the balanced index. This article shows how RPI can be generalized to a variety of contexts with implications for decision making.
Journal: The American Statistician
Pages: 116-124
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1451777
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1451777
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:116-124
Template-Type: ReDIF-Article 1.0
Author-Name: Vahid Nassiri
Author-X-Name-First: Vahid
Author-X-Name-Last: Nassiri
Author-Name: Geert Molenberghs
Author-X-Name-First: Geert
Author-X-Name-Last: Molenberghs
Author-Name: Geert Verbeke
Author-X-Name-First: Geert
Author-X-Name-Last: Verbeke
Author-Name: João Barbosa-Breda
Author-X-Name-First: João
Author-X-Name-Last: Barbosa-Breda
Title: Iterative Multiple Imputation: A Framework to Determine the Number of Imputed Datasets
Abstract:
We consider multiple imputation as a procedure iterating over a set of imputed datasets. Based on an appropriate stopping rule, the number of imputed datasets is determined. Simulations and real-data analyses indicate that the sufficient number of imputed datasets may in some cases be substantially larger than the very small numbers that are usually recommended. For an easier use in various applications, the proposed method is implemented in the R package imi.
Journal: The American Statistician
Pages: 125-136
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1543615
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543615
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:125-136
Template-Type: ReDIF-Article 1.0
Author-Name: Quentin F. Gronau
Author-X-Name-First: Quentin F.
Author-X-Name-Last: Gronau
Author-Name: Alexander Ly
Author-X-Name-First: Alexander
Author-X-Name-Last: Ly
Author-Name: Eric-Jan Wagenmakers
Author-X-Name-First: Eric-Jan
Author-X-Name-Last: Wagenmakers
Title: Informed Bayesian t-Tests
Abstract:
Across the empirical sciences, few statistical procedures rival the popularity of the frequentist t-test. In contrast, the Bayesian versions of the t-test have languished in obscurity. In recent years, however, the theoretical and practical advantages of the Bayesian t-test have become increasingly apparent and various Bayesian t-tests have been proposed, both objective ones (based on general desiderata) and subjective ones (based on expert knowledge). Here, we propose a flexible t-prior for standardized effect size that allows computation of the Bayes factor by evaluating a single numerical integral. This specification contains previous objective and subjective t-test Bayes factors as special cases. Furthermore, we propose two measures for informed prior distributions that quantify the departure from the objective Bayes factor desiderata of predictive matching and information consistency. We illustrate the use of informed prior distributions based on an expert prior elicitation effort. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 137-143
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1562983
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1562983
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:137-143
Template-Type: ReDIF-Article 1.0
Author-Name: Michael Friendly
Author-X-Name-First: Michael
Author-X-Name-Last: Friendly
Author-Name: Matthew Sigal
Author-X-Name-First: Matthew
Author-X-Name-Last: Sigal
Title: Visualizing Tests for Equality of Covariance Matrices
Abstract:
This article explores a variety of topics related to the question of testing the equality of covariance matrices in multivariate linear models, particularly in the MANOVA setting. Further, a plot of the components of Box’s M test is proposed that shows how groups differ in covariance and also suggests other visualizations and alternative test statistics. These methods are implemented and freely available in the heplots and candisc packages for R. Examples from the article and some further extensions are available in the online supplementary materials.
Journal: The American Statistician
Pages: 144-155
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1497537
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497537
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:144-155
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher Weld
Author-X-Name-First: Christopher
Author-X-Name-Last: Weld
Author-Name: Andrew Loh
Author-X-Name-First: Andrew
Author-X-Name-Last: Loh
Author-Name: Lawrence Leemis
Author-X-Name-First: Lawrence
Author-X-Name-Last: Leemis
Title: Plotting Likelihood-Ratio-Based Confidence Regions for Two-Parameter Univariate Probability Models
Abstract:
Plotting two-parameter confidence regions is nontrivial. Numerical methods often rely on a computationally expensive grid-like exploration of the parameter space. A recent advance reduces the two-dimensional problem to many one-dimensional problems by employing a trigonometric transformation that assigns an angle ϕ from the maximum likelihood estimator, and an unknown radial distance to its confidence region boundary. This paradigm shift can improve computational runtime by orders of magnitude, but it is not robust. Specifically, parameters differing greatly in magnitude and/or challenging nonconvex confidence region shapes make the plot susceptible to inefficiencies and/or inaccuracies. This article improves the technique by (i) keeping confidence region boundary searches in the parameter space, (ii) selectively targeting confidence region boundary points in lieu of uniformly spaced ϕ angles from the maximum likelihood estimator, and (iii) enabling access to regions otherwise unreachable due to multiple roots for select ϕ angles. Two heuristics are given for ϕ selection: an elliptic-inspired angle selection heuristic and an intelligent smoothing search heuristic. Finally, a jump-center heuristic permits plotting otherwise inaccessible multiroot regions. This article develops these heuristics for two-parameter likelihood-ratio-based confidence regions associated with univariate probability distributions, and introduces the R conf package, which automates the process and is publicly available via CRAN.
Journal: The American Statistician
Pages: 156-168
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1564696
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564696
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:156-168
Template-Type: ReDIF-Article 1.0
Author-Name: Irene Epifanio
Author-X-Name-First: Irene
Author-X-Name-Last: Epifanio
Author-Name: M. Victoria Ibáñez
Author-X-Name-First: M. Victoria
Author-X-Name-Last: Ibáñez
Author-Name: Amelia Simó
Author-X-Name-First: Amelia
Author-X-Name-Last: Simó
Title: Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles
Abstract:
In this article, we propose several methodologies for handling missing or incomplete data in archetypal analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, that is, they are actual data points. With the proposed procedures, missing data are not discarded or previously filled by imputation and the theoretical properties regarding location of archetypes are guaranteed, unlike the previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified to fulfill the theory and a new procedure is proposed, where the missing values are updated by the fitted values. In the second case, the procedure is based on the estimation of dissimilarities between samples and the projection of these dissimilarities in a new space, where AA or ADA is applied, and those results are used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real datasets: a well-known climate dataset and a global development dataset. We illustrate how these unsupervised methodologies allow complex data to be understood, even by nonexperts. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 169-183
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2018.1545700
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1545700
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:169-183
Template-Type: ReDIF-Article 1.0
Author-Name: Kelsey L. Grantham
Author-X-Name-First: Kelsey L.
Author-X-Name-Last: Grantham
Author-Name: Andrew B. Forbes
Author-X-Name-First: Andrew B.
Author-X-Name-Last: Forbes
Author-Name: Stephane Heritier
Author-X-Name-First: Stephane
Author-X-Name-Last: Heritier
Author-Name: Jessica Kasza
Author-X-Name-First: Jessica
Author-X-Name-Last: Kasza
Title: Time Parameterizations in Cluster Randomized Trial Planning
Abstract:
Models for cluster randomized trials conducted over multiple time periods should account for underlying temporal trends. However, in practice there is often limited knowledge or data available to inform the choice of time parameterization of these trends, or to anticipate the implications of this choice on trial planning. In this article, we establish a sufficient condition for when the choice of time parameterization does not affect the form of the variance of the treatment effect estimator, thereby simplifying the planning of these trials.
Journal: The American Statistician
Pages: 184-189
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2019.1623072
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1623072
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:184-189
Template-Type: ReDIF-Article 1.0
Author-Name: Tim Johnson
Author-X-Name-First: Tim
Author-X-Name-Last: Johnson
Author-Name: Christopher T. Dawes
Author-X-Name-First: Christopher T.
Author-X-Name-Last: Dawes
Author-Name: Dalton Conley
Author-X-Name-First: Dalton
Author-X-Name-Last: Conley
Title: How Does a Statistician Raise an Army? The Time When John W. Tukey, a Team of Luminaries, and a Statistics Graduate Student Repaired the Vietnam Selective Service Lotteries
Abstract:
Scholars have documented the failed randomization in 1969’s inaugural Vietnam Selective Service Lottery, but the story of how statisticians fixed that problem remains untold. Here, as the 50th anniversary of these events approaches, we recount how John W. Tukey, a team of statistical luminaries, and a graduate student from the University of Chicago repaired the draft lottery.
Journal: The American Statistician
Pages: 190-196
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2019.1677267
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1677267
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:190-196
Template-Type: ReDIF-Article 1.0
Author-Name: James A. Hanley
Author-X-Name-First: James A.
Author-X-Name-Last: Hanley
Title: Lest We Forget: U.S. Selective Service Lotteries, 1917–2019
Abstract:
The United States held 13 draft lotteries between 1917 and 1975, and a contingency procedure is in place for a selective service lottery were there ever to be a return to the draft. In 11 of these instances, the selection procedures spread the risk/harm evenhandedly. In two, whose anniversaries approach, the lotteries were problematic. Fortunately, one (1940) employed a “doubly robust” selection scheme that preserved the overall randomness; the other (1969) did not, and was not evenhanded. These 13 lotteries provide examples of sound and unsound statistical planning, statistical acuity, and lessons ignored/learned. Existing and newly assembled raw data are used to describe the randomizations and to statistically measure deviations from randomness. The key statistical principle used in the selection procedures in WW I and WW II, in 1970–1975, and in the current (2019) contingency plan is that of “double”—or even “quadruple”—robustness. This principle was used in medieval lotteries, such as the (four-month) two-drum lottery of 1569. Its use in the sped-up 2019 version provides a valuable and transparent statistical backstop where “an image of absolute fairness” is the overriding concern.
Journal: The American Statistician
Pages: 197-206
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2019.1699444
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1699444
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:197-206
Template-Type: ReDIF-Article 1.0
Author-Name: Jong Hee Park
Author-X-Name-First: Jong Hee
Author-X-Name-Last: Park
Title: The Art of Statistics: How to Learn From Data
Journal: The American Statistician
Pages: 207-207
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2020.1745572
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745572
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:207-207
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel Manrique-Vallier
Author-X-Name-First: Daniel
Author-X-Name-Last: Manrique-Vallier
Title: Capture-Recapture Methods for the Social and Medical Sciences
Journal: The American Statistician
Pages: 207-208
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2020.1745574
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745574
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:207-208
Template-Type: ReDIF-Article 1.0
Author-Name: Seung Jun Shin
Author-X-Name-First: Seung Jun
Author-X-Name-Last: Shin
Title: Model-Based Clustering and Classification for Data Science: With Applications in R
Journal: The American Statistician
Pages: 208-209
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2020.1745576
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745576
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:208-209
Template-Type: ReDIF-Article 1.0
Author-Name: Paul Johnson
Author-X-Name-First: Paul
Author-X-Name-Last: Johnson
Title: R Markdown: The Definitive Guide
Journal: The American Statistician
Pages: 209-210
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2020.1745577
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1745577
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:209-210
Template-Type: ReDIF-Article 1.0
Author-Name: David C. Hoaglin
Author-X-Name-First: David C.
Author-X-Name-Last: Hoaglin
Title: Did Phlegon Actually Use a Stem-and-Leaf Display?
Journal: The American Statistician
Pages: 211-211
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2020.1721329
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1721329
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:211-211
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Correction
Journal: The American Statistician
Pages: 212-212
Issue: 2
Volume: 74
Year: 2020
Month: 4
X-DOI: 10.1080/00031305.2019.1708461
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1708461
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:212-212
Template-Type: ReDIF-Article 1.0
Author-Name: Melinda H. McCann
Author-X-Name-First: Melinda H.
Author-X-Name-Last: McCann
Author-Name: Joshua D. Habiger
Author-X-Name-First: Joshua D.
Author-X-Name-Last: Habiger
Title: The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance
Abstract:
When comparing two treatment groups, the objectives are often to (1) determine if the difference between groups (the effect) is of scientific interest, or nonnegligible, and (2) determine if the effect is positive or negative. In practice, a p-value corresponding to the null hypothesis that no effect exists is used to accomplish the first objective and a point estimate for the effect is used to accomplish the second objective. This article demonstrates that this approach is fundamentally flawed and proposes a new approach. The proposed method allows for claims regarding the size of an effect (nonnegligible vs. negligible) and its nature (positive vs. negative) to be made, and provides measures of statistical significance associated with each claim.
Journal: The American Statistician
Pages: 213-217
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2018.1497538
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497538
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:213-217
Template-Type: ReDIF-Article 1.0
Author-Name: Haruhiko Ogasawara
Author-X-Name-First: Haruhiko
Author-X-Name-Last: Ogasawara
Title: Some Improvements on Markov's Theorem with Extensions
Abstract:
Markov's theorem, which gives an upper bound on a tail probability of a nonnegative random variable, is improved using additional information over almost the entire nontrivial range of the variable. In the improvement, Cantelli's inequality is applied to the square root of the original variable, whose expectation is finite whenever that of the original variable is finite. The improvement is extended to lower bounds and to monotonic transformations of the original variable. The improvements are then used in Chebyshev's inequality and its multivariate version.
Journal: The American Statistician
Pages: 218-225
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2018.1497539
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1497539
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:218-225
Template-Type: ReDIF-Article 1.0
Author-Name: Ling Leng
Author-X-Name-First: Ling
Author-X-Name-Last: Leng
Author-Name: Wei Zhu
Author-X-Name-First: Wei
Author-X-Name-Last: Zhu
Title: Compound Regression and Constrained Regression: Nonparametric Regression Frameworks for EIV Models
Abstract:
Errors-in-variables (EIV) regression is often used to gauge the linear relationship between two variables that both suffer from measurement and other errors, as in the comparison of two measurement platforms (e.g., RNA sequencing vs. microarray). Scientists are often at a loss as to which EIV regression model to use, since there are infinitely many choices. We provide sound guidelines toward viable solutions to this dilemma by introducing two general nonparametric EIV regression frameworks: compound regression and constrained regression. It is shown that these approaches are equivalent to each other and to the general parametric structural modeling approach. The advantages of these methods lie in their intuitive geometric representations, their distribution-free nature, and their ability to offer candidate solutions with various optimal properties when the ratio of the error variances is unknown. Each includes the classic nonparametric regression methods of ordinary least squares, geometric mean regression (GMR), and orthogonal regression as special cases. Under these general frameworks, one can readily uncover some surprising optimal properties of the GMR and truly comprehend the benefit of data normalization. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 226-232
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2018.1556734
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1556734
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:226-232
Template-Type: ReDIF-Article 1.0
Author-Name: Srinjoy Das
Author-X-Name-First: Srinjoy
Author-X-Name-Last: Das
Author-Name: Dimitris N. Politis
Author-X-Name-First: Dimitris N.
Author-X-Name-Last: Politis
Title: Nonparametric Estimation of the Conditional Distribution at Regression Boundary Points
Abstract:
Nonparametric regression is a standard statistical tool with increased importance in the Big Data era. Boundary points pose additional difficulties but local polynomial regression can be used to alleviate them. Local linear regression, for example, is easy to implement and performs quite well both at interior and boundary points. Estimating the conditional distribution function and/or the quantile function at a given regressor point is immediate via standard kernel methods but problems ensue if local linear methods are to be used. In particular, the distribution function estimator is not guaranteed to be monotone increasing, and the quantile curves can “cross.” In the article at hand, a simple method of correcting the local linear distribution estimator for monotonicity is proposed, and its good performance is demonstrated via simulations and real data examples. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 233-242
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2018.1558109
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1558109
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:233-242
Template-Type: ReDIF-Article 1.0
Author-Name: Sander Greenland
Author-X-Name-First: Sander
Author-X-Name-Last: Greenland
Author-Name: Michael P. Fay
Author-X-Name-First: Michael P.
Author-X-Name-Last: Fay
Author-Name: Erica H. Brittain
Author-X-Name-First: Erica H.
Author-X-Name-Last: Brittain
Author-Name: Joanna H. Shih
Author-X-Name-First: Joanna H.
Author-X-Name-Last: Shih
Author-Name: Dean A. Follmann
Author-X-Name-First: Dean A.
Author-X-Name-Last: Follmann
Author-Name: Erin E. Gabriel
Author-X-Name-First: Erin E.
Author-X-Name-Last: Gabriel
Author-Name: James M. Robins
Author-X-Name-First: James M.
Author-X-Name-Last: Robins
Title: On Causal Inferences for Personalized Medicine: How Hidden Causal Assumptions Led to Erroneous Causal Claims About the D-Value
Abstract:
Personalized medicine asks if a new treatment will help a particular patient, rather than if it improves the average response in a population. Without a causal model to distinguish these questions, interpretational mistakes arise. These mistakes are seen in an article by Demidenko that recommends the “D-value,” which is the probability that a randomly chosen person from the new-treatment group has a higher value for the outcome than a randomly chosen person from the control-treatment group. The abstract states “The D-value has a clear interpretation as the proportion of patients who get worse after the treatment,” with similar assertions appearing later. We show these statements are incorrect because they require assumptions about the potential outcomes that are neither testable in randomized experiments nor plausible in general. The D-value will not equal the proportion of patients who get worse after treatment if (as expected) those outcomes are correlated. Independence of potential outcomes is unrealistic and eliminates any personalized treatment effects; with dependence, the D-value can imply that the treatment is better than the control even though most patients are harmed by the treatment. Thus, D-values are misleading for personalized medicine. To prevent misunderstandings, we advise incorporating causal models into basic statistics education.
Journal: The American Statistician
Pages: 243-248
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1575771
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1575771
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:243-248
Template-Type: ReDIF-Article 1.0
Author-Name: Pierre Baldi
Author-X-Name-First: Pierre
Author-X-Name-Last: Baldi
Author-Name: Babak Shahbaba
Author-X-Name-First: Babak
Author-X-Name-Last: Shahbaba
Title: Bayesian Causality
Abstract:
Although no universally accepted definition of causality exists, in practice one is often faced with the question of statistically assessing causal relationships in different settings. We present a uniform general approach to causality problems derived from the axiomatic foundations of the Bayesian statistical framework. In this approach, causality statements are viewed as hypotheses, or models, about the world and the fundamental object to be computed is the posterior distribution of the causal hypotheses, given the data and the background knowledge. Computation of the posterior, illustrated here in simple examples, may involve complex probabilistic modeling but this is no different than in any other Bayesian modeling situation. The main advantage of the approach is its connection to the axiomatic foundations of the Bayesian framework, and the general uniformity with which it can be applied to a variety of causality settings, ranging from specific to general cases, or from causes of effects to effects of causes.
Journal: The American Statistician
Pages: 249-257
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1647876
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1647876
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:249-257
Template-Type: ReDIF-Article 1.0
Author-Name: Mahayaudin M. Mansor
Author-X-Name-First: Mahayaudin M.
Author-X-Name-Last: Mansor
Author-Name: David A. Green
Author-X-Name-First: David A.
Author-X-Name-Last: Green
Author-Name: Andrew V. Metcalfe
Author-X-Name-First: Andrew V.
Author-X-Name-Last: Metcalfe
Title: Detecting Directionality in Time Series
Abstract:
Directionality can be seen in many stationary time series from various disciplines, but it is overlooked when fitting linear models with Gaussian errors. Moreover, we cannot rely on distinguishing directionality by comparing a plot of a time series in time order with a plot in reverse time order. In general, a statistical measure is required to detect and quantify directionality. There are several quite different qualitative forms of directionality, and we distinguish: rapid rises followed by slow recessions; rapid increases and rapid decreases from the mean followed by slow recovery toward the mean; directionality above or below some threshold; and intermittent directionality. The first objective is to develop a suite of statistical measures that will detect directionality and help classify its nature. The second objective is to demonstrate the potential benefits of detecting directionality. We consider applications from business, environmental science, finance, and medicine. Time series data are collected from many processes, both natural and anthropogenic, by a wide range of organizations, and directionality can easily be monitored as part of routine analysis. We suggest that doing so may provide new insights to the processes.
Journal: The American Statistician
Pages: 258-266
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2018.1545699
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1545699
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:258-266
Template-Type: ReDIF-Article 1.0
Author-Name: McKinley L. Blackburn
Author-X-Name-First: McKinley L.
Author-X-Name-Last: Blackburn
Title: Bias in Small-Sample Inference With Count-Data Models
Abstract:
Both Poisson and negative binomial regression can provide quasi-likelihood estimates of the coefficients in exponential-mean models, estimates that are consistent in the presence of distributional misspecification. It has generally been recommended, however, that inference be carried out using asymptotically robust estimators for the parameter covariance matrix. As with linear models, such robust inference tends to lead to over-rejection of null hypotheses in small samples. Alternative methods for estimating coefficient estimator variances are considered. No one approach seems to remove all test bias, but the results do suggest that the jackknife with Poisson regression tends to be least biased for inference.
Journal: The American Statistician
Pages: 267-273
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2018.1564699
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564699
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:267-273
Template-Type: ReDIF-Article 1.0
Author-Name: Andrew Kane
Author-X-Name-First: Andrew
Author-X-Name-Last: Kane
Author-Name: Abhyuday Mandal
Author-X-Name-First: Abhyuday
Author-X-Name-Last: Mandal
Title: A New Analysis Strategy for Designs With Complex Aliasing
Abstract:
Nonregular designs are popular in planning industrial experiments for their run-size economy. These designs often produce partially aliased effects, where the effects of different factors cannot be completely separated from each other. In this article, we propose applying an adaptive lasso regression as an analytical tool for designs with complex aliasing. Its utility compared to traditional methods is demonstrated by analyzing real-life experimental data and simulation studies.
Journal: The American Statistician
Pages: 274-281
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1585287
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1585287
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:274-281
Template-Type: ReDIF-Article 1.0
Author-Name: Mintaek Lee
Author-X-Name-First: Mintaek
Author-X-Name-Last: Lee
Author-Name: Jaechoul Lee
Author-X-Name-First: Jaechoul
Author-X-Name-Last: Lee
Title: Trend and Return Level of Extreme Snow Events in New York City
Abstract:
A major winter storm brought up to 42 inches of snow to parts of the Mid-Atlantic and Northeast states during January 22–24, 2016. The blizzard of January 2016 affected about 102.8 million people, claiming at least 55 lives and causing $500 million to $3 billion in economic losses. This article studies two important aspects of extreme snowfall events: (1) trends in annual maxima and threshold exceedances and (2) return levels for extreme snowfall. Applying extreme value methods to the extreme snow data in the New York City area, we quantify linear trends in extreme snowfall and assess how severe the 2016 blizzard was in terms of return levels. To find a more realistic standard error for the extreme value methods, we extend Smith’s method to adapt to both spatial and temporal correlations in the snow data. Our results show increasing, but insignificant, trends in the annual maximum snowfall series. However, we find that the 87.5th percentile snowfall has significantly increased by 0.564 inches per decade, suggesting that, while the maximum snowfall is not significantly increasing, there have been increases in the snowfall among the larger storms. We also find that the 2016 blizzard is indeed an extreme snow event, equivalent to about a 40-year return level in the New York City area. The extreme value methods used in this study are thoroughly illustrated for general readers. Data and modularized programming code are available online to aid practitioners in using extreme value methods in applications.
Journal: The American Statistician
Pages: 282-293
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1592780
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1592780
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:282-293
Template-Type: ReDIF-Article 1.0
Author-Name: Pankaj Bhagwat
Author-X-Name-First: Pankaj
Author-X-Name-Last: Bhagwat
Author-Name: Éric Marchand
Author-X-Name-First: Éric
Author-X-Name-Last: Marchand
Title: On a Proper Bayes, but Inadmissible Estimator
Abstract:
We present an example of a proper Bayes point estimator that is inadmissible. It occurs for a negative binomial model with shape parameter a, probability parameter p, prior densities of the form π(a, p) = β g(a) (1 − p)^(β−1), and for estimating the population mean μ = a(1 − p)/p under squared error loss. Other intriguing features are exhibited, such as the constancy of the Bayes estimator with respect to the choice of g, including degenerate or known-a cases.
Journal: The American Statistician
Pages: 294-296
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1604432
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604432
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:294-296
Template-Type: ReDIF-Article 1.0
Author-Name: Ibrahim Salama
Author-X-Name-First: Ibrahim
Author-X-Name-Last: Salama
Author-Name: Gary Koch
Author-X-Name-First: Gary
Author-X-Name-Last: Koch
Title: On the Maximum–Minimums Identity: Extension and Applications
Abstract:
For real numbers x1, …, xn, the maximum–minimums identity allows us to express the maximum of x1, …, xn in terms of the minimums of subsets of {x1, …, xn}. In this note, we provide an extension allowing us to express the kth-ranked element in terms of the minimums of subsets of sizes (n − k + 1), …, n. We also discuss the dual identity, allowing us to express the kth-ranked element in terms of the maximums of subsets of sizes k, …, n. We present three examples: the first relates to the expected value of order statistics from independent nonidentical geometric distributions, the second to the partial coupon collector’s problem, and the third to relations among moments of order statistics.
Journal: The American Statistician
Pages: 297-300
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1638832
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1638832
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:297-300
Template-Type: ReDIF-Article 1.0
Author-Name: Eugene D. Gallagher
Author-X-Name-First: Eugene D.
Author-X-Name-Last: Gallagher
Title: Was Quetelet’s Average Man Normal?
Abstract:
Quetelet’s data on Scottish chest girths are analyzed with eight normality tests. In contrast to Quetelet’s conclusion that the data are fit well by what is now known as the normal distribution, six of the eight normality tests provide strong evidence that the chest circumferences are not normally distributed. Using corrected chest circumferences from Stigler, the χ2 test no longer provides strong evidence against normality, but five commonly used normality tests do. The D’Agostino–Pearson K2 and Jarque–Bera tests, based only on skewness and kurtosis, find that both Quetelet’s original data and the Stigler-corrected data are consistent with the hypothesis of normality. The major reason most normality tests produce low p-values, indicating that Quetelet’s data are not normally distributed, is that the chest circumferences were reported in whole inches: rounding large numbers of observations produces many tied values, which strongly affect most normality tests. Users should be cautious with standard normality tests when data have ties, are rounded, and the ratio of the standard deviation to the rounding interval is small.
Journal: The American Statistician
Pages: 301-306
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2019.1706635
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1706635
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:301-306
Template-Type: ReDIF-Article 1.0
Author-Name: Yongdai Kim
Author-X-Name-First: Yongdai
Author-X-Name-Last: Kim
Title: The 9 Pitfalls of Data Science
Journal: The American Statistician
Pages: 307-307
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1790216
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790216
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:307-307
Template-Type: ReDIF-Article 1.0
Author-Name: Brandon Butcher
Author-X-Name-First: Brandon
Author-X-Name-Last: Butcher
Author-Name: Brian J. Smith
Author-X-Name-First: Brian J.
Author-X-Name-Last: Smith
Title: Feature Engineering and Selection: A Practical Approach for Predictive Models
Journal: The American Statistician
Pages: 308-309
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1790217
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790217
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:308-309
Template-Type: ReDIF-Article 1.0
Author-Name: Bailey K. Fosdick
Author-X-Name-First: Bailey K.
Author-X-Name-Last: Fosdick
Author-Name: G. Brooke Anderson
Author-X-Name-First: G. Brooke
Author-X-Name-Last: Anderson
Title: Modern Statistics for Modern Biology
Journal: The American Statistician
Pages: 309-311
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1790218
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790218
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:309-311
Template-Type: ReDIF-Article 1.0
Author-Name: Jonathan M. Wells
Author-X-Name-First: Jonathan M.
Author-X-Name-Last: Wells
Title: Surprises in Probability: Seventeen Short Stories
Journal: The American Statistician
Pages: 311-311
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1790219
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790219
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:311-311
Template-Type: ReDIF-Article 1.0
Author-Name: Robert B. Lund
Author-X-Name-First: Robert B.
Author-X-Name-Last: Lund
Title: Time Series: A Data Analysis Approach Using R
Journal: The American Statistician
Pages: 312-312
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1790221
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1790221
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:312-312
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on “Test for Trend With a Multinomial Outcome” by Szabo (2019)
Journal: The American Statistician
Pages: 313-314
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1763835
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1763835
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:313-314
Template-Type: ReDIF-Article 1.0
Author-Name: Gunnar Taraldsen
Author-X-Name-First: Gunnar
Author-X-Name-Last: Taraldsen
Title: Micha Mandel (2020), “The Scaled Uniform Model Revisited,” The American Statistician, 74:1, 98–100: Comment
Journal: The American Statistician
Pages: 315-315
Issue: 3
Volume: 74
Year: 2020
Month: 7
X-DOI: 10.1080/00031305.2020.1769727
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1769727
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:3:p:315-315
Template-Type: ReDIF-Article 1.0
Author-Name: Iain L. MacDonald
Author-X-Name-First: Iain L.
Author-X-Name-Last: MacDonald
Author-Name: Feroz Bhamani
Author-X-Name-First: Feroz
Author-X-Name-Last: Bhamani
Title: A Time-Series Model for Underdispersed or Overdispersed Counts
Abstract:
It is common for time series of unbounded counts (that is, nonnegative integers) to display overdispersion relative to the Poisson. Such an overdispersed series can be modeled by a hidden Markov model with Poisson state-dependent distributions (a “Poisson–HMM”), since a Poisson–HMM allows for both overdispersion and serial dependence. Time series of underdispersed counts seem less common, but are more awkward to model; a Poisson–HMM cannot cope with underdispersion. But if in a Poisson–HMM one replaces the Poisson distributions by Conway–Maxwell–Poisson distributions, one obtains a class of models that can allow for under- or overdispersion (and serial dependence). In addition, this class can cope with the combination of slight overdispersion and substantial serial dependence, a combination that is apparently difficult for a Poisson–HMM to represent. We discuss the properties of this class of models, and use direct numerical maximization of likelihood to fit a range of models to three published series of counts which display underdispersion, and to a series which displays slight overdispersion plus substantial serial dependence. In addition, we illustrate how such models can be fitted without imputation when some observations are missing from the series, and how approximate standard errors of the parameter estimates can be found. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 317-328
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2018.1505656
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1505656
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:317-328
Template-Type: ReDIF-Article 1.0
Author-Name: Peihua Qiu
Author-X-Name-First: Peihua
Author-X-Name-Last: Qiu
Title: Big Data? Statistical Process Control Can Help!
Abstract:
“Big data” is a buzzword these days due to an enormous number of data-rich applications in different industries and research projects. In practice, big data often take the form of data streams in the sense that new batches of data keep being collected over time. One fundamental research problem when analyzing big data in a given application is to monitor the underlying sequential process of the observed data to see whether it is longitudinally stable, or how its distribution changes over time. To monitor a sequential process, one major statistical tool is the statistical process control (SPC) chart, which has been developed and used mainly for monitoring production lines in the manufacturing industries during the past several decades. With many new and versatile SPC methods developed in recent research, it is our belief that SPC can become a powerful tool for handling many big data applications that go beyond production line monitoring. In this article, we introduce some recent SPC methods, and discuss their potential to solve some big data problems. Certain challenges in the interface between current SPC research and some big data applications are also discussed.
Journal: The American Statistician
Pages: 329-344
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2019.1700163
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1700163
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:329-344
Template-Type: ReDIF-Article 1.0
Author-Name: Li Xu
Author-X-Name-First: Li
Author-X-Name-Last: Xu
Author-Name: Chris Gotwalt
Author-X-Name-First: Chris
Author-X-Name-Last: Gotwalt
Author-Name: Yili Hong
Author-X-Name-First: Yili
Author-X-Name-Last: Hong
Author-Name: Caleb B. King
Author-X-Name-First: Caleb B.
Author-X-Name-Last: King
Author-Name: William Q. Meeker
Author-X-Name-First: William Q.
Author-X-Name-Last: Meeker
Title: Applications of the Fractional-Random-Weight Bootstrap
Abstract:
For several decades, the resampling based bootstrap has been widely used for computing confidence intervals (CIs) for applications where no exact method is available. However, there are many applications where the resampling bootstrap method cannot be used. These include situations where the data are heavily censored due to the success response being a rare event, situations where there is insufficient mixing of successes and failures across the explanatory variable(s), and designed experiments where the number of parameters is close to the number of observations. These three situations all have in common that there may be a substantial proportion of the resamples where it is not possible to estimate all of the parameters in the model. This article reviews the fractional-random-weight bootstrap method and demonstrates how it can be used to avoid these problems and construct CIs in a way that is accessible to statistical practitioners. The fractional-random-weight bootstrap method is easy to use and has advantages over the resampling method in many challenging applications.
Journal: The American Statistician
Pages: 345-358
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1731599
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1731599
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:345-358
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher T. Franck
Author-X-Name-First: Christopher T.
Author-X-Name-Last: Franck
Author-Name: Robert B. Gramacy
Author-X-Name-First: Robert B.
Author-X-Name-Last: Gramacy
Title: Assessing Bayes Factor Surfaces Using Interactive Visualization and Computer Surrogate Modeling
Abstract:
Bayesian model selection provides a natural alternative to classical hypothesis testing based on p-values. While many articles mention that Bayesian model selection can be sensitive to prior specification on parameters, there are few practical strategies to assess and report this sensitivity. This article has two goals. First, we aim to educate the broader statistical community about the extent of potential sensitivity through visualization of the Bayes factor surface. The Bayes factor surface shows the value a Bayes factor takes as a function of user-specified hyperparameters. Second, we suggest surrogate modeling via Gaussian processes to visualize the Bayes factor surface in situations where computation is expensive. We provide three examples including an interactive R shiny application that explores a simple regression problem, a hierarchical linear model selection exercise, and finally surrogate modeling via Gaussian processes to a study of the influence of outliers in empirical finance. We suggest Bayes factor surfaces are valuable for scientific reporting since they (i) increase transparency by making instability in Bayes factors easy to visualize, (ii) generalize to simple and complicated examples, and (iii) provide a path for researchers to assess the impact of prior choice on modeling decisions in a wide variety of research areas. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 359-369
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2019.1671219
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1671219
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:359-369
Template-Type: ReDIF-Article 1.0
Author-Name: Jae H. Kim
Author-X-Name-First: Jae H.
Author-X-Name-Last: Kim
Title: Decision-Theoretic Hypothesis Testing: A Primer With R Package OptSig
Abstract:
This article is a primer for a decision-theoretic approach to hypothesis testing for students and teachers of basic statistics. Using three examples at an introductory level, this article demonstrates how decision-theoretic hypothesis testing can be taught to the students of basic statistics. It also demonstrates that students and researchers can make more sensible and unambiguous decisions under uncertainty by employing this particular approach. The examples are illustrated using R and its package “OptSig.”
Journal: The American Statistician
Pages: 370-379
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1750484
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1750484
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:370-379
Template-Type: ReDIF-Article 1.0
Author-Name: Scott D. Grimshaw
Author-X-Name-First: Scott D.
Author-X-Name-Last: Grimshaw
Author-Name: Natalie J. Blades
Author-X-Name-First: Natalie J.
Author-X-Name-Last: Blades
Author-Name: Candace Berrett
Author-X-Name-First: Candace
Author-X-Name-Last: Berrett
Title: Going Viral, Binge-Watching, and Attention Cannibalism
Abstract:
Binge-watching behavior is modeled for a single season of an original program from a streaming service to understand and make predictions about how individuals watch newly released content. Viewers make two choices in binge watching. First, the onset when individuals begin viewing the program is modeled using a change point between epidemic viewing with a nonconstant hazard rate and endemic viewing with a constant hazard rate. Second, the time it takes for individuals to complete the full season is modeled using an expanded negative binomial hurdle model to account for both binge racers (who watch all episodes in a single day) and other viewers. With the rapid increase in original content for streaming services, network executives are interested in the decision of simultaneously releasing multiple original programs or staggering premiere dates. The two model results are used to investigate competing risks to determine how the amount of time between premieres impacts attention cannibalism, which occurs when a viewer takes a long time watching their first-choice program and consequently never watches the second program.
Journal: The American Statistician
Pages: 380-391
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1774415
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1774415
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:380-391
Template-Type: ReDIF-Article 1.0
Author-Name: Haozhe Zhang
Author-X-Name-First: Haozhe
Author-X-Name-Last: Zhang
Author-Name: Joshua Zimmerman
Author-X-Name-First: Joshua
Author-X-Name-Last: Zimmerman
Author-Name: Dan Nettleton
Author-X-Name-First: Dan
Author-X-Name-Last: Nettleton
Author-Name: Daniel J. Nordman
Author-X-Name-First: Daniel J.
Author-X-Name-Last: Nordman
Title: Random Forest Prediction Intervals
Abstract:
Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels.
Journal: The American Statistician
Pages: 392-406
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2019.1585288
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1585288
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:392-406
Template-Type: ReDIF-Article 1.0
Author-Name: Tomasz J. Kozubowski
Author-X-Name-First: Tomasz J.
Author-X-Name-Last: Kozubowski
Author-Name: Krzysztof Podgórski
Author-X-Name-First: Krzysztof
Author-X-Name-Last: Podgórski
Title: Gaussian Mixture Representation of the Laplace Distribution Revisited: Bibliographical Connections and Extensions
Abstract:
We provide bibliographical connections and extensions of several representations of the classical Laplace distribution, discussed recently by Ding and Blitzstein. Beyond presenting relations to some previous results, we also include their skew as well as multivariate versions. In particular, the distribution of det Z, where Z is an n × n matrix of iid standard normal components, is obtained for an arbitrary integer n. While the latter is a scale mixture of Gaussian distributions, the Laplace distribution is obtained only in the case n = 2. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 407-412
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2019.1630000
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1630000
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:407-412
Template-Type: ReDIF-Article 1.0
Author-Name: Malay Ghosh
Author-X-Name-First: Malay
Author-X-Name-Last: Ghosh
Title: Revisiting Jeffreys’ Example: Bayes Test of the Normal Mean
Abstract:
We revisit the classical problem of testing whether a normal mean is zero against all possible alternatives within a Bayesian framework. Jeffreys showed that the Bayes factor for this problem has a drawback with normal priors for the alternatives. He showed also that this deficiency is rectified when one uses a Cauchy prior instead. Noting that a Cauchy prior is an example of a scale-mixed normal prior, we want to examine whether or not scale-mixed normal priors can always overcome the deficiency of the Bayes factor. It turns out though that while mixing priors with polynomial tails can overcome this deficiency, those with exponential tails fail to do so. Examples are provided to illustrate this point.
Journal: The American Statistician
Pages: 413-415
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2019.1687013
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1687013
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:413-415
Template-Type: ReDIF-Article 1.0
Author-Name: Wen Li
Author-X-Name-First: Wen
Author-X-Name-Last: Li
Author-Name: Thomas O. Jemielita
Author-X-Name-First: Thomas O.
Author-X-Name-Last: Jemielita
Title: Mathematical and Statistical Skills in the Biopharmaceutical Industry: A Pragmatic Approach.
Journal: The American Statistician
Pages: 416-417
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1831806
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1831806
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:416-417
Template-Type: ReDIF-Article 1.0
Author-Name: Qixuan Chen
Author-X-Name-First: Qixuan
Author-X-Name-Last: Chen
Title: Multiple Imputation in Practice: With Examples Using IVEware.
Journal: The American Statistician
Pages: 417-417
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1831809
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1831809
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:417-417
Template-Type: ReDIF-Article 1.0
Author-Name: Kalimuthu Krishnamoorthy
Author-X-Name-First: Kalimuthu
Author-X-Name-Last: Krishnamoorthy
Author-Name: Yanping Xia
Author-X-Name-First: Yanping
Author-X-Name-Last: Xia
Title: Xinjie Hu, Aekyung Jung, and Gengsheng Qin (2020), “Interval Estimation for the Correlation Coefficient,” The American Statistician, 74:1, 29–36: Comment by Krishnamoorthy and Xia
Journal: The American Statistician
Pages: 418-418
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1829048
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1829048
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:418-418
Template-Type: ReDIF-Article 1.0
Author-Name: Xinjie Hu
Author-X-Name-First: Xinjie
Author-X-Name-Last: Hu
Author-Name: Aekyung Jung
Author-X-Name-First: Aekyung
Author-X-Name-Last: Jung
Author-Name: Gengsheng Qin
Author-X-Name-First: Gengsheng
Author-X-Name-Last: Qin
Title: A Response to the Letter to the Editor on “Interval Estimation for the Correlation Coefficient,” The American Statistician, 74:1, 29–36: Comment by Krishnamoorthy and Xia
Journal: The American Statistician
Pages: 419-419
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1827032
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1827032
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:419-419
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: Editorial Collaborators
Journal: The American Statistician
Pages: 420-421
Issue: 4
Volume: 74
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1842019
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1842019
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:74:y:2020:i:4:p:420-421
Template-Type: ReDIF-Article 1.0
Author-Name: Michael Lavine
Author-X-Name-First: Michael
Author-X-Name-Last: Lavine
Author-Name: Jim Hodges
Author-X-Name-First: Jim
Author-X-Name-Last: Hodges
Title: Intuition for an Old Curiosity and an Implication for MCMC
Abstract:
Morris and Ebey reported the following curiosity. “The unweighted sample mean is examined as an estimator of the population mean in a first-order autoregressive model. It is demonstrated that the precision of this estimator deteriorates as the number of equally spaced observations taken within a fixed time interval increases.” Morris and Ebey proved their result but gave no intuition for it. We provide some intuition, then examine an implication: that the usual practice of estimating posterior expectations by taking the unweighted average of consecutive Markov chain Monte Carlo (MCMC) samples may not be optimal.
Journal: The American Statistician
Pages: 1-6
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2018.1518267
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1518267
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:1-6
Template-Type: ReDIF-Article 1.0
Author-Name: Rudy Ligtvoet
Author-X-Name-First: Rudy
Author-X-Name-Last: Ligtvoet
Title: Exact Bayes Factors for the Comparison of Multinomial Distributions
Abstract:
This article deals with the problem of comparing multinomial distributions with multiple ordered categories. A graphical procedure is proposed for obtaining the posterior probabilities for the hypotheses of a stochastic dominance relationship, positive cumulative odds ratios, and a likelihood ratio ordering. From these expressions we subsequently obtain exact expressions for the Bayes factors related to these hypotheses. Supplemental materials for running the analysis for the examples presented in the article are available online.
Journal: The American Statistician
Pages: 7-14
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1575773
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1575773
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:7-14
Template-Type: ReDIF-Article 1.0
Author-Name: Paul Kvam
Author-X-Name-First: Paul
Author-X-Name-Last: Kvam
Title: The Price is Right: Analyzing Bidding Behavior on Contestants’ Row
Abstract:
The TV game show “The Price is Right” features a bidding auction called Contestants’ Row that rewards the player (out of four) who bids closest to an item’s value without overbidding. By exploring 903 game outcomes from the 2000–2001 season, we show that player strategies are significantly inefficient, and compare the empirical results to probability outcomes for optimal bid strategies found in a recent study. Findings show that the last bidder would do better using the naïve strategy of bidding a dollar more than the highest of the three bids. We apply the EM algorithm in a novel way to extract a maximum amount of information from observed player bids. The gained knowledge about a player’s evaluation of merchandise allows us to uncover new insights into player behavior, including the potential effects of anchoring.
Journal: The American Statistician
Pages: 15-22
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1592782
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1592782
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:15-22
Template-Type: ReDIF-Article 1.0
Author-Name: Chuan-Fa Tang
Author-X-Name-First: Chuan-Fa
Author-X-Name-Last: Tang
Author-Name: Dewei Wang
Author-X-Name-First: Dewei
Author-X-Name-Last: Wang
Author-Name: Hammou El Barmi
Author-X-Name-First: Hammou
Author-X-Name-Last: El Barmi
Author-Name: Joshua M. Tebbs
Author-X-Name-First: Joshua M.
Author-X-Name-Last: Tebbs
Title: Testing for Positive Quadrant Dependence
Abstract:
We develop an empirical likelihood (EL) approach to test independence of two univariate random variables X and Y versus the alternative that X and Y are strictly positive quadrant dependent (PQD). Establishing this type of ordering between X and Y is of interest in many applications, including finance, insurance, engineering, and other areas. Adopting the framework in Einmahl and McKeague, we create a distribution-free test statistic that integrates a localized EL ratio test statistic with respect to the empirical joint distribution of X and Y. When compared to well-known existing tests and distance-based tests we develop by using copula functions, simulation results show the EL testing procedure performs well in a variety of scenarios when X and Y are strictly PQD. We use three datasets for illustration and provide an online R resource practitioners can use to implement the methods in this article. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 23-30
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1607554
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1607554
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:23-30
Template-Type: ReDIF-Article 1.0
Author-Name: J. González-Ortega
Author-X-Name-First: J.
Author-X-Name-Last: González-Ortega
Author-Name: D. Ríos Insua
Author-X-Name-First: D.
Author-X-Name-Last: Ríos Insua
Author-Name: F. Ruggeri
Author-X-Name-First: F.
Author-X-Name-Last: Ruggeri
Author-Name: R. Soyer
Author-X-Name-First: R.
Author-X-Name-Last: Soyer
Title: Hypothesis Testing in Presence of Adversaries
Abstract:
We present an extension to the classical problem of hypothesis testing by incorporating actions of an adversary who intends to mislead the decision-maker and attain a certain benefit. After presenting the general problem within an adversarial statistical decision theory framework, we consider the cases of adversaries who can either perturb the data received or modify the underlying data-generating process parametrically. Supplemental materials for this article are available online.
Journal: The American Statistician
Pages: 31-40
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1630001
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1630001
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:31-40
Template-Type: ReDIF-Article 1.0
Author-Name: Bo Peng
Author-X-Name-First: Bo
Author-X-Name-Last: Peng
Author-Name: Min Wang
Author-X-Name-First: Min
Author-X-Name-Last: Wang
Title: Objective Bayesian testing for the correlation coefficient under divergence-based priors
Abstract:
The correlation coefficient is a commonly used criterion to measure the strength of a linear relationship between two quantitative variables. For a bivariate normal distribution, numerous procedures have been proposed for testing a precise null hypothesis about the correlation coefficient, whereas the construction of flexible procedures for testing a set of (multiple) precise and/or interval hypotheses has received less attention. This paper fills the gap by proposing an objective Bayesian testing procedure using divergence-based priors. The proposed Bayes factors can be used for testing any combination of precise and interval hypotheses and also allow a researcher to quantify evidence in the data in favor of the null or any other hypothesis under consideration. An extensive simulation study is conducted to compare the performance of the proposed Bayesian methods with some existing ones in the literature. Finally, a real-data example is provided for illustrative purposes.
Journal: The American Statistician
Pages: 41-51
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1677266
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1677266
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:41-51
Template-Type: ReDIF-Article 1.0
Author-Name: D. Andrew Brown
Author-X-Name-First: D. Andrew
Author-X-Name-Last: Brown
Author-Name: Christopher S. McMahan
Author-X-Name-First: Christopher S.
Author-X-Name-Last: McMahan
Author-Name: Stella Watson Self
Author-X-Name-First: Stella
Author-X-Name-Last: Watson Self
Title: Sampling Strategies for Fast Updating of Gaussian Markov Random Fields
Abstract:
Gaussian Markov random fields (GMRFs) are popular for modeling dependence in large areal datasets due to their ease of interpretation and computational convenience afforded by the sparse precision matrices needed for random variable generation. Typically in Bayesian computation, GMRFs are updated jointly in a block Gibbs sampler or componentwise in a single-site sampler via the full conditional distributions. The former approach can speed convergence by updating correlated variables all at once, while the latter avoids solving large matrices. We consider a sampling approach in which the underlying graph can be cut so that conditionally independent sites are updated simultaneously. This algorithm allows a practitioner to parallelize updates of subsets of locations or to take advantage of “vectorized” calculations in a high-level language such as R. Through both simulated and real data, we demonstrate computational savings that can be achieved versus both single-site and block updating, regardless of whether the data are on a regular or an irregular lattice. The approach provides a good compromise between statistical and computational efficiency and is accessible to statisticians without expertise in numerical analysis or advanced computing.
Journal: The American Statistician
Pages: 52-65
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1595144
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1595144
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:52-65
Template-Type: ReDIF-Article 1.0
Author-Name: Alex Karanevich
Author-X-Name-First: Alex
Author-X-Name-Last: Karanevich
Author-Name: Richard Meier
Author-X-Name-First: Richard
Author-X-Name-Last: Meier
Author-Name: Stefan Graw
Author-X-Name-First: Stefan
Author-X-Name-Last: Graw
Author-Name: Anna McGlothlin
Author-X-Name-First: Anna
Author-X-Name-Last: McGlothlin
Author-Name: Byron Gajewski
Author-X-Name-First: Byron
Author-X-Name-Last: Gajewski
Title: Optimizing Sample Size Allocation and Power in a Bayesian Two-Stage Drop-the-Losers Design
Abstract:
When a researcher desires to test several treatment arms against a control arm, a two-stage adaptive design can be more efficient than a single-stage design where patients are equally allocated to all treatment arms and the control. We see this type of approach in clinical trials as a seamless Phase II–Phase III design. These designs require more statistical support and are less straightforward to plan and analyze than a standard single-stage design. To diminish the barriers associated with a Bayesian two-stage drop-the-losers design, we built a user-friendly point-and-click graphical user interface with R Shiny to aid researchers in planning such designs by allowing them to easily obtain trial operating characteristics, estimate statistical power and sample size, and optimize patient allocation in each stage to maximize power. We assume that endpoints are distributed normally with unknown but common variance between treatments. We recommend this software as an easy way to engage statisticians and researchers in two-stage designs as well as to actively investigate the power of two-stage designs relative to more traditional approaches. The software is freely available at https://github.com/stefangraw/Allocation-Power-Optimizer.
Journal: The American Statistician
Pages: 66-75
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1610065
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1610065
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:66-75
Template-Type: ReDIF-Article 1.0
Author-Name: Richard Berk
Author-X-Name-First: Richard
Author-X-Name-Last: Berk
Author-Name: Andreas Buja
Author-X-Name-First: Andreas
Author-X-Name-Last: Buja
Author-Name: Lawrence Brown
Author-X-Name-First: Lawrence
Author-X-Name-Last: Brown
Author-Name: Edward George
Author-X-Name-First: Edward
Author-X-Name-Last: George
Author-Name: Arun Kumar Kuchibhotla
Author-X-Name-First: Arun Kumar
Author-X-Name-Last: Kuchibhotla
Author-Name: Weijie Su
Author-X-Name-First: Weijie
Author-X-Name-Last: Su
Author-Name: Linda Zhao
Author-X-Name-First: Linda
Author-X-Name-Last: Zhao
Title: Assumption Lean Regression
Abstract:
It is well known that with observational data, models used in conventional regression analyses are commonly misspecified. Yet in practice, one tends to proceed with interpretations and inferences that rely on correct specification. Even those who invoke Box’s maxim that all models are wrong proceed as if results were generally useful. Misspecification, however, has implications that affect practice. Regression models are approximations to a true response surface and should be treated as such. Accordingly, regression parameters should be interpreted as statistical functionals. Importantly, the regressor distribution affects targets of estimation and regressor randomness affects the sampling variability of estimates. As a consequence, inference should be based on sandwich estimators or the pairs (x–y) bootstrap. Traditional prediction intervals lose their pointwise coverage guarantees, but empirically calibrated intervals can be justified for future populations. We illustrate the key concepts with an empirical application.
Journal: The American Statistician
Pages: 76-84
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1592781
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1592781
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:76-84
Template-Type: ReDIF-Article 1.0
Author-Name: Julian Fecker
Author-X-Name-First: Julian
Author-X-Name-Last: Fecker
Author-Name: Martin Schumacher
Author-X-Name-First: Martin
Author-X-Name-Last: Schumacher
Author-Name: Kristin Ohneberg
Author-X-Name-First: Kristin
Author-X-Name-Last: Ohneberg
Author-Name: Martin Wolkewitz
Author-X-Name-First: Martin
Author-X-Name-Last: Wolkewitz
Title: Correction of Survival Bias in a Study About Increased Mortality of Heads of Government
Abstract:
A recent study reported increased mortality of heads of government. To avoid the time-dependent bias (also known as immortal-time bias), survival from the last election was compared between election winners and runners-up. We claim that this data manipulation results in bias due to conditioning on future events; survival should be compared from the first election, and winning should be considered as a time-dependent covariate. We collected the missing life-time periods and redesigned the study to display this bias using Lexis diagrams and multistate methodology. We found that the bias that we termed the healthy candidate bias was even more severe than the time-dependent bias.
Journal: The American Statistician
Pages: 85-91
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1638831
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1638831
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:85-91
Template-Type: ReDIF-Article 1.0
Author-Name: Madison Arnsbarger
Author-X-Name-First: Madison
Author-X-Name-Last: Arnsbarger
Author-Name: Joshua Goldstein
Author-X-Name-First: Joshua
Author-X-Name-Last: Goldstein
Author-Name: Claire Kelling
Author-X-Name-First: Claire
Author-X-Name-Last: Kelling
Author-Name: Gizem Korkmaz
Author-X-Name-First: Gizem
Author-X-Name-Last: Korkmaz
Author-Name: Sallie Keller
Author-X-Name-First: Sallie
Author-X-Name-Last: Keller
Title: Modeling Response Time to Structure Fires
Abstract:
It is important to reduce fire department response times to incidents to improve communities’ general safety, to make the allocation of emergency resources more efficient, and to improve situational awareness. In this article, we identify which factors affect turnout times and travel times for the Arlington County Fire Department in Virginia by applying both linear and spatial models to the U.S. National Fire Incident Reporting System (NFIRS) data. The uniformity of NFIRS data makes this article’s methodological innovations applicable to other participating fire departments in the United States and advances the effort to incorporate scientific evidence into government-level policy-making.
Journal: The American Statistician
Pages: 92-100
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1695664
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1695664
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:92-100
Template-Type: ReDIF-Article 1.0
Author-Name: Duy Nguyen
Author-X-Name-First: Duy
Author-X-Name-Last: Nguyen
Title: A Probabilistic Approach to The Moments of Binomial Random Variables and Application
Abstract:
In this paper, we provide a closed-form formula for the moments of binomial random variables using a probabilistic approach. As an interesting application, we give a closed-form formula for the sum 1^k + 2^k + 3^k + … + n^k.
Journal: The American Statistician
Pages: 101-103
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2019.1679257
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1679257
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:101-103
Template-Type: ReDIF-Article 1.0
Author-Name: David R. Bickel
Author-X-Name-First: David R.
Author-X-Name-Last: Bickel
Title: Null Hypothesis Significance Testing Interpreted and Calibrated by Estimating Probabilities of Sign Errors: A Bayes-Frequentist Continuum
Abstract:
Hypothesis tests are conducted not only to determine whether a null hypothesis (H0) is true but also to determine the direction or sign of an effect. A simple estimate of the posterior probability of a sign error is PSE = (1 − PH0)p/2 + PH0, depending only on a two-sided p-value and PH0, an estimate of the posterior probability of H0. A convenient option for PH0 is the posterior probability derived from estimating the Bayes factor to be its e p ln(1/p) lower bound. In that case, PSE depends only on p and an estimate of the prior probability of H0. PSE provides a continuum between significance testing and traditional Bayesian testing. The former effectively assumes the prior probability of H0 is 0, as some statisticians argue. In that case, PSE is equal to a one-sided p-value. (In that sense, PSE is a calibrated p-value.) In traditional Bayesian testing, on the other hand, the prior probability of H0 is at least 50%, which usually brings PSE close to PH0.
Journal: The American Statistician
Pages: 104-112
Issue: 1
Volume: 75
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1816214
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1816214
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2020:i:1:p:104-112
Template-Type: ReDIF-Article 1.0
Author-Name: James M. Flegal
Author-X-Name-First: James M.
Author-X-Name-Last: Flegal
Title: Data Visualization: Charts, Maps, and Interactive Graphics. Robert Grant.
Journal: The American Statistician
Pages: 113-113
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2020.1865062
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865062
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:113-113
Template-Type: ReDIF-Article 1.0
Author-Name: James M. Flegal
Author-X-Name-First: James M.
Author-X-Name-Last: Flegal
Title: Fundamentals of Probability with Stochastic Processes, 4th ed. Saeed Ghahramani.
Journal: The American Statistician
Pages: 113-114
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2020.1865063
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865063
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:113-114
Template-Type: ReDIF-Article 1.0
Author-Name: Nicholas W. Bussberg
Author-X-Name-First: Nicholas W.
Author-X-Name-Last: Bussberg
Title: Spatio-Temporal Statistics With R.
Journal: The American Statistician
Pages: 114-114
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2020.1865066
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865066
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:114-114
Template-Type: ReDIF-Article 1.0
Author-Name: Roger L. Berger
Author-X-Name-First: Roger L.
Author-X-Name-Last: Berger
Title: McCann and Habiger (2020), “The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance,”
Journal: The American Statistician
Pages: 115-115
Issue: 1
Volume: 75
Year: 2020
Month: 12
X-DOI: 10.1080/00031305.2020.1850523
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1850523
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2020:i:1:p:115-115
Template-Type: ReDIF-Article 1.0
Author-Name: Melinda H. McCann
Author-X-Name-First: Melinda H.
Author-X-Name-Last: McCann
Author-Name: Joshua D. Habiger
Author-X-Name-First: Joshua D.
Author-X-Name-Last: Habiger
Title: Response to the Letter to the Editor on “The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance,” The American Statistician, 74:3, 213–217: Comment by Roger Berger
Journal: The American Statistician
Pages: 116-116
Issue: 1
Volume: 75
Year: 2021
Month: 1
X-DOI: 10.1080/00031305.2020.1851766
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1851766
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:1:p:116-116
Template-Type: ReDIF-Article 1.0
Author-Name: Dale L. Zimmerman
Author-X-Name-First: Dale L.
Author-X-Name-Last: Zimmerman
Author-Name: Nathan D. Zimmerman
Author-X-Name-First: Nathan D.
Author-X-Name-Last: Zimmerman
Author-Name: Joshua T. Zimmerman
Author-X-Name-First: Joshua T.
Author-X-Name-Last: Zimmerman
Title: March Madness “Anomalies”: Are They Real, and If So, Can They Be Explained?
Abstract:
Previously published statistical analyses of NCAA Division I Men’s Basketball Tournament (“March Madness”) game outcomes since the 64-team format for its main draw began in 1985 have uncovered some apparent anomalies, such as 12-seeds upsetting 5-seeds more often than might be expected, and seeds 10 through 12 advancing to the Sweet Sixteen much more often than 8-seeds and 9-seeds—the so-called middle-seed anomaly. In this article, we address the questions of whether these perceived anomalies truly are anomalous and if so, what is responsible for them. We find that, in contrast to conclusions drawn from previous analyses, the statistical evidence for a 12-5 upset anomaly actually is very weak, while that for the middle-seed anomaly is quite strong. We dispel some (but not all) theories for the former and offer an explanation for the latter that is based primarily on the combined effects of a nonlinear relationship between team strength and seed, the lack of reseeding between rounds, and a strong quasi-home advantage accorded to 1-seeds. We also investigate the effects that hypothetical modifications to the tournament would have on the anomalies and explore whether similar anomalies exist in the NCAA Women’s Basketball Tournament.
Journal: The American Statistician
Pages: 207-216
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2020.1720814
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1720814
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:207-216
Template-Type: ReDIF-Article 1.0
Author-Name: Mevin B. Hooten
Author-X-Name-First: Mevin B.
Author-X-Name-Last: Hooten
Author-Name: Devin S. Johnson
Author-X-Name-First: Devin S.
Author-X-Name-Last: Johnson
Author-Name: Brian M. Brost
Author-X-Name-First: Brian M.
Author-X-Name-Last: Brost
Title: Making Recursive Bayesian Inference Accessible
Abstract:
Bayesian models provide recursive inference naturally because they can formally reconcile new data with existing scientific information. However, popular use of Bayesian methods often avoids priors based on the exact posterior distributions resulting from former studies. Two existing recursive Bayesian methods are Prior-Recursive Bayes and Proposal-Recursive Bayes. Prior-Recursive Bayes uses Bayesian updating, fitting models to partitions of the data sequentially; it accommodates new data as they become available by using the posterior from the previous stage as the prior in the stage based on the latest data. Proposal-Recursive Bayes is intended for use with hierarchical Bayesian models and uses a set of transient priors in first-stage independent analyses of the data partitions. The second stage of Proposal-Recursive Bayes uses the posteriors from the first stage as proposals in a Markov chain Monte Carlo algorithm to fit the full model. We combine the Prior- and Proposal-Recursive concepts to fit any Bayesian model, often with computational improvements. We demonstrate our method with two case studies. Our approach has implications for big data, streaming data, and optimal adaptive design situations.
Journal: The American Statistician
Pages: 185-194
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2019.1665584
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1665584
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:185-194
Template-Type: ReDIF-Article 1.0
Author-Name: Aaron Fisher
Author-X-Name-First: Aaron
Author-X-Name-Last: Fisher
Author-Name: Edward H. Kennedy
Author-X-Name-First: Edward H.
Author-X-Name-Last: Kennedy
Title: Visually Communicating and Teaching Intuition for Influence Functions
Abstract:
Estimators based on influence functions (IFs) have been shown to be effective in many settings, especially when combined with machine learning techniques. By focusing on estimating a specific target of interest (e.g., the average effect of a treatment), rather than on estimating the full underlying data generating distribution, IF-based estimators are often able to achieve asymptotically optimal mean-squared error. Still, many researchers find IF-based estimators to be opaque or overly technical, which makes their use less prevalent and their benefits less available. To help foster understanding and trust in IF-based estimators, we present tangible, visual illustrations of when and how IF-based estimators can outperform standard “plug-in” estimators. The figures we show are based on connections between IFs, gradients, linear approximations, and Newton–Raphson.
Journal: The American Statistician
Pages: 162-172
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2020.1717620
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1717620
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:162-172
Template-Type: ReDIF-Article 1.0
Author-Name: Brian D. Segal
Author-X-Name-First: Brian D.
Author-X-Name-Last: Segal
Title: Toward Replicability With Confidence Intervals for the Exceedance Probability
Abstract:
Several scientific fields including psychology are undergoing a replication crisis. There are many reasons for this problem, one of which is a misuse of p-values. There are several alternatives to p-values, and in this article we describe a complement that is geared toward replication. In particular, we focus on confidence intervals for the probability that a parameter estimate will exceed a specified value in an exact replication study. These intervals convey uncertainty in a way that p-values and standard confidence intervals do not, and can help researchers to draw sounder scientific conclusions. After briefly reviewing background on p-values and a few alternatives, we describe our approach and provide examples with simulated and real data. For linear models, we also describe how confidence intervals for the exceedance probability are related to p-values and confidence intervals for parameters.
Journal: The American Statistician
Pages: 128-138
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2019.1678521
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1678521
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:128-138
Template-Type: ReDIF-Article 1.0
Author-Name: Adam Kapelner
Author-X-Name-First: Adam
Author-X-Name-Last: Kapelner
Author-Name: Abba M. Krieger
Author-X-Name-First: Abba M.
Author-X-Name-Last: Krieger
Author-Name: Michael Sklar
Author-X-Name-First: Michael
Author-X-Name-Last: Sklar
Author-Name: Uri Shalit
Author-X-Name-First: Uri
Author-X-Name-Last: Shalit
Author-Name: David Azriel
Author-X-Name-First: David
Author-X-Name-Last: Azriel
Title: Harmonizing Optimized Designs With Classic Randomization in Experiments
Abstract:
There is a long debate in experimental design between the classic randomization design of Fisher, Yates, Kempthorne, and Cochran, and those who advocate deterministic assignments based on notions of optimality. In nonsequential trials comparing treatment and control, covariate measurements for each subject are known in advance, and subjects can be divided into two groups based on a criterion of imbalance. With the advent of modern computing, this partition can be made nearly perfectly balanced via numerical optimization, but these allocations are far from random. Such perfect allocations may endanger estimation relative to classic randomization because unseen subject-specific characteristics can be highly imbalanced. To demonstrate this, we consider different performance criteria such as Efron’s worst-case analysis and our original tail criterion of mean squared error. Under our tail criterion for the differences-in-mean estimator, we prove asymptotically that the optimal design must be more random than perfect balance but less random than completely random. Our result vindicates restricted designs that are used regularly, such as blocking and rerandomization. For a covariate-adjusted estimator, balancing offers smaller rewards, and it seems good performance is achievable with complete randomization. Further work will provide a procedure to find the explicit optimal design in different scenarios in practice. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 195-206
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2020.1717619
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1717619
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:195-206
Template-Type: ReDIF-Article 1.0
Author-Name: Jin Zhang
Author-X-Name-First: Jin
Author-X-Name-Last: Zhang
Title: The Mean Relative Entropy: An Invariant Measure of Estimation Error
Abstract:
A fundamental issue in statistics is parameter estimation, where the first step is to select estimators under some measure of estimation error. The commonly used measure is the mean squared error, which is simple, intuitive and highly interpretable, but it has some drawbacks, often creating confusions in evaluating estimators. To solve these problems, we propose two invariance properties and the sufficiency principle as the prerequisite for any reasonable measure. Then, the mean relative entropy is established as an invariant measure of estimation error.
Journal: The American Statistician
Pages: 117-123
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2018.1543139
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1543139
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:117-123
Template-Type: ReDIF-Article 1.0
Author-Name: Philip T. Reiss
Author-X-Name-First: Philip T.
Author-X-Name-Last: Reiss
Title: A Problem of Distributive Justice, Solved by the Lasso
Abstract:
The problem of dividing an estate among creditors, when their claims total more than the value of the estate, was posed in the Talmud and has been analyzed in the game theory literature. Here, we reveal a close connection between schemes for estate division and linear regression solution paths obtained by least angle regression or by the lasso. We focus primarily on the division scheme known as constrained equal awards, but also consider a more complex approach described by Aumann and Maschler. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 139-144
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2019.1688682
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1688682
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:139-144
Template-Type: ReDIF-Article 1.0
Author-Name: Christine R. Wells
Author-X-Name-First: Christine R.
Author-X-Name-Last: Wells
Title: SAS for Mixed Models: Introduction and Basic Applications
Journal: The American Statistician
Pages: 231-231
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2021.1907997
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1907997
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:231-231
Template-Type: ReDIF-Article 1.0
Author-Name: Roberta La Haye
Author-X-Name-First: Roberta
Author-X-Name-Last: La Haye
Author-Name: Petr Zizler
Author-X-Name-First: Petr
Author-X-Name-Last: Zizler
Title: The Lorenz Curve in the Classroom
Abstract:
The Lorenz curve and Gini index have great social relevance due to concerns regarding income inequality. However, they receive limited discussion in the undergraduate statistics and mathematics curriculum. This article outlines how to increase the educational potential of Lorenz curves as an application in both calculus and introductory probability classrooms. We show how calculus and probability techniques can be used to obtain not only the Gini index but also a variety of other statistical measures from the Lorenz curve, provided the mean is known. The measures discussed include the median and various measures of dispersion.
Journal: The American Statistician
Pages: 217-225
Issue: 2
Volume: 75
Year: 2020
Month: 10
X-DOI: 10.1080/00031305.2020.1822916
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1822916
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2020:i:2:p:217-225
Template-Type: ReDIF-Article 1.0
Author-Name: Christina P. Knudson
Author-X-Name-First: Christina P.
Author-X-Name-Last: Knudson
Title: x + y: A Mathematician's Manifesto for Rethinking Gender
Journal: The American Statistician
Pages: 232-233
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2021.1907998
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1907998
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:232-233
Template-Type: ReDIF-Article 1.0
Author-Name: Florian Böing-Messing
Author-X-Name-First: Florian
Author-X-Name-Last: Böing-Messing
Author-Name: Joris Mulder
Author-X-Name-First: Joris
Author-X-Name-Last: Mulder
Title: Bayes Factors for Testing Order Constraints on Variances of Dependent Outcomes
Abstract:
In statistical practice, researchers commonly focus on patterns in the means of multiple dependent outcomes while treating variances as nuisance parameters. In fact, however, there are often substantive reasons to expect certain patterns in the variances of dependent outcomes as well. For example, in a repeated measures study, one may expect the variance of the outcome to increase over time if differences between subjects become more pronounced because the subjects respond differently to a given treatment. Such expectations can be formulated as order constrained hypotheses on the variances of the dependent outcomes. Currently, however, no methods exist for testing such hypotheses in a direct manner. To fill this gap, we develop a Bayes factor for this challenging testing problem. Our Bayes factor is based on the multivariate normal distribution with an unstructured covariance matrix, which is often used to model dependent outcomes. Order constrained hypotheses can then be formulated on the variances on the diagonal of the covariance matrix. To compute Bayes factors between multiple order constrained hypotheses, a prior distribution needs to be specified under every hypothesis to be tested. Here, we use the encompassing prior approach, in which priors under order constrained hypotheses are truncations of the prior under the unconstrained hypothesis. The resulting Bayes factor is fully automatic in the sense that no subjective priors need to be specified by the user.
Journal: The American Statistician
Pages: 152-161
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2020.1715257
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1715257
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:152-161
Template-Type: ReDIF-Article 1.0
Author-Name: J. G. Liao
Author-X-Name-First: J. G.
Author-X-Name-Last: Liao
Author-Name: Arthur Berg
Author-X-Name-First: Arthur
Author-X-Name-Last: Berg
Author-Name: Timothy L. McMurry
Author-X-Name-First: Timothy L.
Author-X-Name-Last: McMurry
Title: A Robustified Posterior for Bayesian Inference on a Large Number of Parallel Effects
Abstract:
Many modern experiments, such as microarray gene expression and genome-wide association studies, present the problem of estimating a large number of parallel effects. Bayesian inference is a popular approach for analyzing such data by modeling the large number of unknown parameters as random effects from a common prior distribution. However, misspecification of the prior distribution can lead to erroneous estimates of the random effects, especially for the largest and most interesting effects. This article has two aims. First, we propose a robustified posterior distribution for a parametric Bayesian hierarchical model that can substantially reduce the impact of a misspecified prior. Second, we conduct a systematic comparison of the standard parametric posterior, the proposed robustified parametric posterior, and a nonparametric Bayesian posterior that uses a Dirichlet process mixture prior. The proposed robustified posterior, when combined with a flexible parametric prior, can be a superior alternative to nonparametric Bayesian methods.
Journal: The American Statistician
Pages: 145-151
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2019.1701549
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1701549
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:145-151
Template-Type: ReDIF-Article 1.0
Author-Name: Douglas VanDerwerken
Author-X-Name-First: Douglas
Author-X-Name-Last: VanDerwerken
Title: Slugging Percentage Is Not a Percentage—And Why That Matters
Abstract:
In this short note, the asymptotic distribution of slugging percentage (SLG) in baseball is derived under multinomial sampling. It is shown that treating SLG like a binomial random variable divided by the number of trials (as is occasionally done in the literature) gives only a lower bound on the variance, which may be a considerable underestimate in practice.
Journal: The American Statistician
Pages: 124-127
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2018.1564698
File-URL: http://hdl.handle.net/10.1080/00031305.2018.1564698
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:124-127
Template-Type: ReDIF-Article 1.0
Author-Name: Edward L. Boone
Author-X-Name-First: Edward L.
Author-X-Name-Last: Boone
Title: The Model Thinker: What You Need to Know to Make Data Work for You
Journal: The American Statistician
Pages: 230-231
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2021.1907993
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1907993
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:230-231
Template-Type: ReDIF-Article 1.0
Author-Name: Chunming Zhang
Author-X-Name-First: Chunming
Author-X-Name-Last: Zhang
Title: Further Examples Related to Correlations Between Variables and Ranks
Abstract:
Rank statistics {R_1, …, R_n} of actual variates {X_1, …, X_n} play an important role in university undergraduate nonparametric statistics courses. This article derives explicit expressions for the correlation coefficients between X_i and R_j, not only for i = j but also for i ≠ j, for iid continuous variables X_1, …, X_n with distribution function F_X(·) of X and n ≥ 2: (a) ρ_{X_i,R_i} = sqrt((n−1)/(n+1)) ρ_{X,F_X(X)} ∈ (0, sqrt((n−1)/(n+1))] for any i, revealing that the correlation can be as close to one as expected, while it may also unexpectedly decrease toward zero for other distributions of X; (b) ρ_{X_i,R_j} = −ρ_{X_i,R_i}/(n−1) ∈ [−1/sqrt(n^2−1), 0) for any i ≠ j, indicating a negligible negative association with ranks from other data; (c) the partial correlation coefficient between X_i and R_i given X_j for any i ≠ j equals ρ_{(X_i,R_i)·X_j} = ρ_{X_i,R_i}/sqrt(1 − ρ_{X_j,R_i}^2) ∈ (ρ_{X_i,R_i}, (n−1)/sqrt(n^2−2)], invariably exceeding ρ_{X_i,R_i}. The results call for a more careful interpretation of the information that ranks share with the data.
Journal: The American Statistician
Pages: 226-229
Issue: 2
Volume: 75
Year: 2020
Month: 11
X-DOI: 10.1080/00031305.2020.1831956
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1831956
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2020:i:2:p:226-229
Template-Type: ReDIF-Article 1.0
Author-Name: Alecos Papadopoulos
Author-X-Name-First: Alecos
Author-X-Name-Last: Papadopoulos
Author-Name: Roland B. Stark
Author-X-Name-First: Roland B.
Author-X-Name-Last: Stark
Title: Does Home Health Care Increase the Probability of 30-Day Hospital Readmissions? Interpreting Coefficient Sign Reversals, or Their Absence, in Binary Logistic Regression Analysis
Abstract:
Data on 30-day readmission rates in American hospitals often show that patients who receive Home Health Care (HHC) have a higher probability of being readmitted to the hospital than those who do not receive such services, but it is expected that including control variables in a regression will produce a “sign reversal” of the treatment effect. We map the real-world situation to the binary logistic regression model, and we construct a counterfactual probability metric that leads to necessary and sufficient conditions for the sign reversal to occur, conditions showing that logistic regression is an appropriate tool for this research purpose. This metric also permits us to obtain evidence related to the criteria used to assign HHC treatment. We examine seven data samples from different USA hospitals for the period 2011–2017. We find that in all cases the provision of HHC increased the probability of readmission of the treated patients. This casts doubt on the appropriateness of the 30-day readmission rate as an indicator of hospital performance and a criterion for hospital reimbursement, as it is currently used for Medicare patients.
Journal: The American Statistician
Pages: 173-184
Issue: 2
Volume: 75
Year: 2021
Month: 5
X-DOI: 10.1080/00031305.2019.1704873
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1704873
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:2:p:173-184
Template-Type: ReDIF-Article 1.0
Author-Name: Jonathan Rougier
Author-X-Name-First: Jonathan
Author-X-Name-Last: Rougier
Author-Name: Carey E. Priebe
Author-X-Name-First: Carey E.
Author-X-Name-Last: Priebe
Title: The Exact Form of the “Ockham Factor” in Model Selection
Abstract:
We explore the arguments for maximizing the “evidence” as an algorithm for model selection. We show, using a new definition of model complexity which we term “flexibility,” that maximizing the evidence should appeal to both Bayesian and frequentist statisticians. This is due to flexibility’s unique position in the exact decomposition of log-evidence into log-fit minus flexibility. In the Gaussian linear model, flexibility is asymptotically equal to the Bayesian information criterion (BIC) penalty, but we caution against using BIC in place of flexibility for model selection.
Journal: The American Statistician
Pages: 288-293
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1764865
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1764865
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:288-293
Template-Type: ReDIF-Article 1.0
Author-Name: David L. Banks
Author-X-Name-First: David L.
Author-X-Name-Last: Banks
Author-Name: Mevin B. Hooten
Author-X-Name-First: Mevin B.
Author-X-Name-Last: Hooten
Title: Statistical Challenges in Agent-Based Modeling
Abstract:
Agent-based models (ABMs) are popular in many research communities, but few statisticians have contributed to their theoretical development. They are models like any other models we study, but in general, we are still learning how to fit ABMs to data and how to make quantified statements of uncertainty about the outputs of an ABM. ABM validation is also an underdeveloped area that is ripe for new statistical developments. In what follows, we lay out the research space and encourage statisticians to address the many research issues in the ABM ambit.
Journal: The American Statistician
Pages: 235-242
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2021.1900914
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1900914
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:235-242
Template-Type: ReDIF-Article 1.0
Author-Name: David R. Bickel
Author-X-Name-First: David R.
Author-X-Name-Last: Bickel
Title: Null Hypothesis Significance Testing Defended and Calibrated by Bayesian Model Checking
Abstract:
Significance testing is often criticized because p-values can be low even though posterior probabilities of the null hypothesis are not low according to some Bayesian models. Those models, however, would assign low prior probabilities to the observation that the p-value is sufficiently low. That conflict between the models and the data may indicate that the models need revision. Indeed, if the p-value is sufficiently small while the posterior probability according to a model is insufficiently small, then the model will fail a model check. That result leads to a way to calibrate a p-value by transforming it into an upper bound on the posterior probability of the null hypothesis (conditional on rejection) for any model that would pass the check. The calibration may be calculated from a prior probability of the null hypothesis and the stringency of the check without more detailed modeling. An upper bound, as opposed to a lower bound, can justify concluding that the null hypothesis has a low posterior probability.
Journal: The American Statistician
Pages: 249-255
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2019.1699443
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1699443
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:249-255
Template-Type: ReDIF-Article 1.0
Author-Name: Travis Loux
Author-X-Name-First: Travis
Author-X-Name-Last: Loux
Author-Name: Orlando Davy
Author-X-Name-First: Orlando
Author-X-Name-Last: Davy
Title: Adjusting Published Estimates for Exploratory Biases Using the Truncated Normal Distribution
Abstract:
Publication bias can occur for many reasons, including the perceived need to present statistically significant results. We propose and compare methods for adjusting a single published estimate for possible publication bias using a truncated normal distribution. We attempt to estimate the mean of the underlying normal sampling distribution using only summary data readily available in most published work, making the results practical for use by a consumer of research. The adjustment methods are investigated via simulation and their results compared in terms of bias, mean squared error, and confidence interval coverage. The methods are also applied to eleven previously published studies. We find the proposed methods improve but do not eliminate biases from the statistical significance filter.
Journal: The American Statistician
Pages: 294-299
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1775700
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1775700
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:294-299
Template-Type: ReDIF-Article 1.0
Author-Name: Amanda S. Hering
Author-X-Name-First: Amanda S.
Author-X-Name-Last: Hering
Author-Name: Luke Durell
Author-X-Name-First: Luke
Author-X-Name-Last: Durell
Author-Name: Grant Morgan
Author-X-Name-First: Grant
Author-X-Name-Last: Morgan
Title: Illustrating Randomness in Statistics Courses With Spatial Experiments
Abstract:
Understanding the concept of randomness is fundamental for students in introductory statistics courses, but the notion of randomness is deceptively complex, so it is often emphasized less than the mechanics of probability and inference. The most commonly used classroom tools to assess students’ production or perception of randomness are binary choices, such as coin tosses, and number sequences, such as dice rolls. The field of psychology has a long history of research on random choice, and we have replicated some experiments that support results seen there regarding the collective distribution of individual choices in spatial geometries. The data from these experiments can easily be incorporated into the undergraduate classroom to visually illustrate the concepts of random choice, complete spatial randomness (CSR), and Poisson processes. Furthermore, spatial statistics classes can use these point pattern data in exploring hypothesis tests for CSR along with simulation. To foster student engagement, it is simple to collect additional data from students to assess agreement with existing data or to develop related, unique experiments. All R code and data to duplicate results are provided.
Journal: The American Statistician
Pages: 343-353
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1871070
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1871070
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:343-353
Template-Type: ReDIF-Article 1.0
Author-Name: Keith Kranker
Author-X-Name-First: Keith
Author-X-Name-Last: Kranker
Author-Name: Laura Blue
Author-X-Name-First: Laura
Author-X-Name-Last: Blue
Author-Name: Lauren Vollmer Forrow
Author-X-Name-First: Lauren Vollmer
Author-X-Name-Last: Forrow
Title: Improving Effect Estimates by Limiting the Variability in Inverse Propensity Score Weights
Abstract:
This study describes a novel method to reweight a comparison group used for causal inference, so the group is similar to a treatment group on observable characteristics yet avoids highly variable weights that would limit statistical power. The proposed method generalizes the covariate-balancing propensity score (CBPS) methodology developed by Imai and Ratkovic (2014) to enable researchers to effectively prespecify the variance (or higher-order moments) of the matching weight distribution. This lets researchers choose among alternative sets of matching weights, some of which produce better balance and others of which yield higher statistical power. We demonstrate using simulations that our penalized CBPS approach can improve effect estimates over those from other established propensity score estimation approaches, producing lower mean squared error. We discuss applications where the method or extensions of it are especially likely to improve effect estimates and we provide an empirical example from the evaluation of Comprehensive Primary Care Plus, a U.S. health care model that aims to strengthen primary care across roughly 3000 practices. Programming code is available to implement the method in Stata.
Journal: The American Statistician
Pages: 276-287
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1737229
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1737229
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:276-287
Template-Type: ReDIF-Article 1.0
Author-Name: Haolun Shi
Author-X-Name-First: Haolun
Author-X-Name-Last: Shi
Author-Name: Guosheng Yin
Author-X-Name-First: Guosheng
Author-X-Name-Last: Yin
Title: Reconnecting p-Value and Posterior Probability Under One- and Two-Sided Tests
Abstract:
As a convention, p-value is often computed in frequentist hypothesis testing and compared with the nominal significance level of 0.05 to determine whether or not to reject the null hypothesis. The smaller the p-value, the more significant the statistical test. Under noninformative prior distributions, we establish the equivalence relationship between the p-value and Bayesian posterior probability of the null hypothesis for one-sided tests and, more importantly, the equivalence between the p-value and a transformation of posterior probabilities of the hypotheses for two-sided tests. For two-sided hypothesis tests with a point null, we recast the problem as a combination of two one-sided hypotheses along the opposite directions and establish the notion of a “two-sided posterior probability,” which reconnects with the (two-sided) p-value. In contrast to the common belief, such an equivalence relationship renders p-value an explicit interpretation of how strong the data support the null. Extensive simulation studies are conducted to demonstrate the equivalence relationship between the p-value and Bayesian posterior probability. Contrary to broad criticisms on the use of p-value in evidence-based studies, we justify its utility and reclaim its importance from the Bayesian perspective.
Journal: The American Statistician
Pages: 265-275
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1717621
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1717621
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:265-275
Template-Type: ReDIF-Article 1.0
Author-Name: William B. Fairley
Author-X-Name-First: William B.
Author-X-Name-Last: Fairley
Author-Name: William A. Huber
Author-X-Name-First: William A.
Author-X-Name-Last: Huber
Title: On Being an Ethical Statistical Expert in a Legal Case
Abstract:
In the Anglo-American legal system, courts rely heavily on experts who perform an essential social function in supplying information to resolve disputes. Experts are the vehicles through which facts of any technical complexity are brought out. The adversarial nature of this legal system places expert witnesses in a quandary. Enjoined to serve the court and their profession with unbiased, independent opinion, expert witnesses nevertheless do not work directly for the court: they are employed by advocates (lawyers) who aim to win a high stakes debate for their clients. The system is imperfect. Pressures (whether real or perceived) on experts to please their clients may cause truth to be the victim. We use examples from our experience, and reports of statisticians commenting on theirs, to show how statistical evidence can be honestly and effectively used in courts. We maintain it is vital for would-be experts to study the rules of the legal process and their role within it. (The present article is a step toward that end.) We explain what the legal process looks for in an expert and present some ways in which an expert can maintain their independence and avoid being co-opted by the lawyer who sponsors them. Statisticians contribute in sometimes unique ways to the resolution of disputes, including in forums like negotiations, mediation, arbitration, and regulatory hearing, where the misuse and abuse of statistical procedures occur too often. It is a challenge for statisticians to improve that situation, but they can find professional opportunities and satisfaction in doing so. Because this discussion pertains generally to the application and communication of statistical thinking, statisticians in any sphere of application should find it useful.
Journal: The American Statistician
Pages: 323-333
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1763834
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1763834
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:323-333
Template-Type: ReDIF-Article 1.0
Author-Name: James J. Higgins
Author-X-Name-First: James J.
Author-X-Name-Last: Higgins
Author-Name: Michael J. Higgins
Author-X-Name-First: Michael J.
Author-X-Name-Last: Higgins
Author-Name: Jinguang Lin
Author-X-Name-First: Jinguang
Author-X-Name-Last: Lin
Title: From One Environment to Many: The Problem of Replicability of Statistical Inferences
Abstract:
Among plausible causes for replicability failure, one that has not received sufficient attention is the environment in which the research is conducted. Consisting of the population, equipment, personnel, and various conditions such as location, time, and weather, the research environment can affect treatments and outcomes, and changes in the research environment that occur when an experiment is redone can affect replicability. We examine the extent to which such changes contribute to replicability failure. Our framework is that of an initial experiment that generates the data and a follow-up experiment that is done the same way except for a change in the research environment. We assume that the initial experiment satisfies the assumptions of the two-sample t-statistic and that the follow-up experiment is described by a mixed model which includes environmental parameters. We derive expressions for the effect that the research environment has on power, sample size selection, p-values, and confidence levels. We measure the size of the environmental effect with the environmental effect ratio (EER) which is the ratio of the standard deviations of environment by treatment interaction and error. By varying EER, it is possible to determine conditions that favor replicability and those that do not.
Journal: The American Statistician
Pages: 334-342
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1829047
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1829047
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:334-342
Template-Type: ReDIF-Article 1.0
Author-Name: Wei Jiang
Author-X-Name-First: Wei
Author-X-Name-Last: Jiang
Author-Name: Shuang Song
Author-X-Name-First: Shuang
Author-X-Name-Last: Song
Author-Name: Lin Hou
Author-X-Name-First: Lin
Author-X-Name-Last: Hou
Author-Name: Hongyu Zhao
Author-X-Name-First: Hongyu
Author-X-Name-Last: Zhao
Title: A Set of Efficient Methods to Generate High-Dimensional Binary Data With Specified Correlation Structures
Abstract:
High-dimensional correlated binary data arise in many areas, such as observed genetic variations in biomedical research. Data simulation can help researchers evaluate efficiency and explore properties of different computational and statistical methods. Also, some statistical methods, such as Monte Carlo methods, rely on data simulation. Lunn and Davies proposed linear time complexity methods to generate correlated binary variables with three common correlation structures. However, it is infeasible to specify unequal probabilities in their methods. In this article, we introduce several computationally efficient algorithms that generate high-dimensional binary data with specified correlation structures and unequal probabilities. Our algorithms have linear time complexity with respect to the dimension for three commonly studied correlation structures, namely exchangeable, decaying-product and K-dependent correlation structures. In addition, we extend our algorithms to generate binary data of specified nonnegative correlation matrices satisfying the validity condition with quadratic time complexity. We provide an R package, CorBin, to implement our simulation methods. Compared to the existing packages for binary data generation, the time cost to generate a 100-dimensional binary vector with the common correlation structures and general correlation matrices can be reduced by up to 10^5-fold and 10^3-fold, respectively, and the efficiency can be further improved with the increase of dimensions. The R package CorBin is available on CRAN at https://cran.r-project.org/.
Journal: The American Statistician
Pages: 310-322
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1816213
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1816213
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:310-322
Template-Type: ReDIF-Article 1.0
Author-Name: David J. Aldous
Author-X-Name-First: David J.
Author-X-Name-Last: Aldous
Title: A Prediction Tournament Paradox
Abstract:
In a prediction tournament, contestants “forecast” by asserting a numerical probability for each of (say) 100 future real-world events. The scoring system is designed so that (regardless of the unknown true probabilities) more accurate forecasters will likely score better. This is true for one-on-one comparisons between contestants. But consider a realistic-size tournament with many contestants, with a range of accuracies. It may seem self-evident that the winner will likely be one of the most accurate forecasters. But, in the setting where the range extends to very accurate forecasters, simulations show this is mathematically false, within a somewhat plausible model. Even outside that setting the winner is less likely than intuition suggests to be one of the handful of best forecasters. Though implicit in recent technical papers, this paradox has apparently not been explicitly pointed out before, though it is easily explained. It perhaps has implications for the ongoing IARPA-sponsored research programs involving forecasting.
Journal: The American Statistician
Pages: 243-248
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2019.1604430
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1604430
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:243-248
Template-Type: ReDIF-Article 1.0
Author-Name: Yanlong Sun
Author-X-Name-First: Yanlong
Author-X-Name-Last: Sun
Author-Name: Hongbin Wang
Author-X-Name-First: Hongbin
Author-X-Name-Last: Wang
Title: Learning Temporal Structures of Random Patterns by Generating Functions
Abstract:
We present a method of generating functions to compute the distributions of the first-arrival and inter-arrival times of random patterns in independent Bernoulli trials and first-order Markov trials. We use segmentation of pattern events and diagrams of Markov chains to illustrate the recursive structures represented by generating functions. We then relate the results of pattern time to the probability of first occurrence and the probability of occurrence at least once within a finite sample size. Through symbolic manipulation of formal power series and multiple levels of compression, generating functions provide a powerful way to discover the rich statistical structures embedded in random sequences.
Journal: The American Statistician
Pages: 300-309
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2020.1778527
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1778527
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:300-309
Template-Type: ReDIF-Article 1.0
Author-Name: Yen-Chi Chen
Author-X-Name-First: Yen-Chi
Author-X-Name-Last: Chen
Title: Review of Books and Teaching Materials
Journal: The American Statistician
Pages: 354-354
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2021.1949931
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1949931
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:354-354
Template-Type: ReDIF-Article 1.0
Author-Name: J. G. Liao
Author-X-Name-First: J. G.
Author-X-Name-Last: Liao
Author-Name: Vishal Midya
Author-X-Name-First: Vishal
Author-X-Name-Last: Midya
Author-Name: Arthur Berg
Author-X-Name-First: Arthur
Author-X-Name-Last: Berg
Title: Connecting and Contrasting the Bayes Factor and a Modified ROPE Procedure for Testing Interval Null Hypotheses
Abstract:
There has been strong recent interest in testing interval null hypotheses for improved scientific inference. For example, Lakens et al. and Lakens and Harms use this approach to study if there is a prespecified meaningful treatment effect in gerontology and clinical trials, instead of a point null hypothesis of any effect. Two popular Bayesian approaches are available for interval null hypothesis testing. One is the standard Bayes factor and the other is the region of practical equivalence (ROPE) procedure championed by Kruschke and others over many years. This article connects key quantities in the two approaches, which in turn allow us to contrast two major differences between the approaches with substantial practical implications. The first is that the Bayes factor depends heavily on the prior specification while a modified ROPE procedure is very robust. The second difference is concerned with the statistical property when data are generated under a neutral parameter value on the common boundary of competing hypotheses. In this case, the Bayes factors can be severely biased whereas the modified ROPE approach gives a reasonable result. Finally, the connection leads to a simple and effective algorithm for computing Bayes factors using draws from posterior distributions generated by standard Bayesian programs such as BUGS, JAGS, and Stan.
Journal: The American Statistician
Pages: 256-264
Issue: 3
Volume: 75
Year: 2021
Month: 7
X-DOI: 10.1080/00031305.2019.1701550
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1701550
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:256-264
Template-Type: ReDIF-Article 1.0
Author-Name: Youjin Lee
Author-X-Name-First: Youjin
Author-X-Name-Last: Lee
Title: Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R
Journal: The American Statistician
Pages: 450-451
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2021.1985862
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1985862
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:450-451
Template-Type: ReDIF-Article 1.0
Author-Name: Ben O’Neill
Author-X-Name-First: Ben
Author-X-Name-Last: O’Neill
Title: The Classical Occupancy Distribution: Computation and Approximation
Abstract:
We examine the discrete distributional form that arises from the “classical occupancy problem,” which looks at the behavior of the number of occupied bins when we allocate a given number of balls uniformly at random to a given number of bins. We review the mass function and moments of the classical occupancy distribution and derive exact and asymptotic results for the mean, variance, skewness and kurtosis. We develop an algorithm to compute a cubic array of log-probabilities from the classical occupancy distribution. This algorithm allows the computation of large blocks of values while avoiding underflow problems in computation. Using this algorithm, we compute the classical occupancy distribution for a large block of values of balls and bins, and we measure the accuracy of its asymptotic approximation using the normal distribution. We analyze the accuracy of the normal approximation with respect to the variance, skewness and kurtosis of the distribution. Based on this analysis, we give some practical guidance on the feasibility of computing large blocks of values from the occupancy distribution, and when approximation is required.
Journal: The American Statistician
Pages: 364-375
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2019.1699445
File-URL: http://hdl.handle.net/10.1080/00031305.2019.1699445
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:364-375
Template-Type: ReDIF-Article 1.0
Author-Name: Peter E. Freeman
Author-X-Name-First: Peter E.
Author-X-Name-Last: Freeman
Title: Facilitating Authentic Practice for Early Undergraduate Statistics Students
Abstract:
In current curricula, authentic statistical practice generally only occurs in capstone projects undertaken by advanced undergraduate and Master’s students. We argue that deferring practice is a mistake: undergraduate students should gain experience via repeated practice from their first years onward, to achieve heightened levels of confidence and competence prior to graduation. However, statistical practice is not a “one size fits all” enterprise: for instance, elements of a capstone experience, such as extensive data preprocessing, may be out of place in earlier practice settings due to less-experienced students’ relative lack of coding skill. We describe a course we have implemented at Carnegie Mellon University, currently open to second-year students, that provides a circumscribed opportunity for statistical practice that limits coding breadth, uses fully curated data, treats statistical learning models as “gray boxes” to be understood qualitatively, and provides open-ended semester-long projects that students pursue outside of class. We show how pre- and post-course assessment tests and retrospective surveys indicate clear gains in the students’ knowledge of, and attitudes toward, statistical practice. Given its clear benefits, we feel that statistics and data science programs should offer a course like the one we describe to all undergraduate students pursuing statistics and data science degrees.
Journal: The American Statistician
Pages: 433-444
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1844293
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1844293
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:433-444
Template-Type: ReDIF-Article 1.0
Author-Name: Paul Vos
Author-X-Name-First: Paul
Author-X-Name-Last: Vos
Author-Name: Qiang Wu
Author-X-Name-First: Qiang
Author-X-Name-Last: Wu
Title: Letter to the Editor: Zhang, J. (2021), “The Mean Relative Entropy: An Invariant Measure of Estimation Error,” The American Statistician, 75, 117–123: comment by Vos and Wu
Journal: The American Statistician
Pages: 455-457
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2021.1978544
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1978544
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:455-457
Template-Type: ReDIF-Article 1.0
Author-Name: Kevin Kunzmann
Author-X-Name-First: Kevin
Author-X-Name-Last: Kunzmann
Author-Name: Michael J. Grayling
Author-X-Name-First: Michael J.
Author-X-Name-Last: Grayling
Author-Name: Kim May Lee
Author-X-Name-First: Kim May
Author-X-Name-Last: Lee
Author-Name: David S. Robertson
Author-X-Name-First: David S.
Author-X-Name-Last: Robertson
Author-Name: Kaspar Rufibach
Author-X-Name-First: Kaspar
Author-X-Name-Last: Rufibach
Author-Name: James M. S. Wason
Author-X-Name-First: James M. S.
Author-X-Name-Last: Wason
Title: A Review of Bayesian Perspectives on Sample Size Derivation for Confirmatory Trials
Abstract:
Sample size derivation is a crucial element of planning any confirmatory trial. The required sample size is typically derived based on constraints on the maximal acceptable Type I error rate and minimal desired power. Power depends on the unknown true effect and tends to be calculated either for the smallest relevant effect or a likely point alternative. The former might be problematic if the minimal relevant effect is close to the null, thus requiring an excessively large sample size, while the latter is dubious since it does not account for the a priori uncertainty about the likely alternative effect. A Bayesian perspective on sample size derivation for a frequentist trial can reconcile arguments about the relative a priori plausibility of alternative effects with ideas based on the relevance of effect sizes. Many suggestions as to how such “hybrid” approaches could be implemented in practice have been put forward. However, key quantities are often defined in subtly different ways in the literature. Starting from the traditional entirely frequentist approach to sample size derivation, we derive consistent definitions for the most commonly used hybrid quantities and highlight connections, before discussing and demonstrating their use in sample size derivation for clinical trials.
Journal: The American Statistician
Pages: 424-432
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2021.1901782
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1901782
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:424-432
Template-Type: ReDIF-Article 1.0
Author-Name: Dennis D. Boos
Author-X-Name-First: Dennis D.
Author-X-Name-Last: Boos
Author-Name: Siyu Duan
Author-X-Name-First: Siyu
Author-X-Name-Last: Duan
Title: Pairwise Comparisons Using Ranks in the One-Way Model
Abstract:
The Wilcoxon rank sum test for two independent samples and the Kruskal–Wallis rank test for the one-way model with k independent samples are very competitive robust alternatives to the two-sample t-test and k-sample F-test when the underlying data have tails longer than the normal distribution. However, these positives for rank methods do not extend as readily to methods for making all pairwise comparisons used to reveal where the differences in location may exist. Here, we show that the closed method of Marcus et al. applied to ranks is quite powerful for both small and large samples and better than any methods suggested in the list of applied nonparametric texts found in the recent study by Richardson. In addition, we show that the closed method applied to means is even more powerful than the classical Tukey–Kramer method applied to means, which itself is very competitive for nonnormal data with moderately long tails and small samples.
Journal: The American Statistician
Pages: 414-423
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1860819
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1860819
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:414-423
Template-Type: ReDIF-Article 1.0
Author-Name: Marius Hofert
Author-X-Name-First: Marius
Author-X-Name-Last: Hofert
Title: Random number generators produce collisions: Why, how many and more
Abstract:
It seems surprising that when applying widely used random number generators to generate one million random numbers on modern architectures, one obtains, on average, about 116 collisions. This article explains why, how to mathematically compute such a number, why they often cannot be obtained in a straightforward way, how to numerically compute them in a robust way and, among other things, what would need to be changed to bring this number below 1. The probability of at least one collision is also briefly addressed, which, as it turns out, again needs a careful numerical treatment. Overall, the article provides an introduction to the representation of floating-point numbers on a computer and corresponding implications in statistics and simulation. All computations are carried out in R and are reproducible with the code included in this article.
Journal: The American Statistician
Pages: 394-402
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1782261
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1782261
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:394-402
Template-Type: ReDIF-Article 1.0
Author-Name: Luke Keele
Author-X-Name-First: Luke
Author-X-Name-Last: Keele
Author-Name: Dylan S. Small
Author-X-Name-First: Dylan S.
Author-X-Name-Last: Small
Title: Comparing Covariate Prioritization via Matching to Machine Learning Methods for Causal Inference Using Five Empirical Applications
Abstract:
When investigators seek to estimate causal effects, they often assume that selection into treatment is based only on observed covariates. Under this identification strategy, analysts must adjust for observed confounders. While basic regression models have long been the dominant method of statistical adjustment, methods based on matching or weighting have become more common. Of late, methods based on machine learning (ML) have been developed for statistical adjustment. These ML methods are often designed to be black box methods with little input from the researcher. In contrast, matching methods that use covariate prioritization are designed to allow for direct input from substantive investigators. In this article, we use a novel research design to compare matching with covariate prioritization to black box methods. We use black box methods to replicate results from five studies where matching with covariate prioritization was used to customize the statistical adjustment in direct response to substantive expertise. We compare the methods in terms of both point and interval estimation. We conclude with advice for investigators.
Journal: The American Statistician
Pages: 355-363
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1867638
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1867638
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:355-363
Template-Type: ReDIF-Article 1.0
Author-Name: Narges Motalebi
Author-X-Name-First: Narges
Author-X-Name-Last: Motalebi
Author-Name: Nathaniel T. Stevens
Author-X-Name-First: Nathaniel T.
Author-X-Name-Last: Stevens
Author-Name: Stefan H. Steiner
Author-X-Name-First: Stefan H.
Author-X-Name-Last: Steiner
Title: Hurdle Blockmodels for Sparse Network Modeling
Abstract:
A variety of random graph models have been proposed in the literature to model the associations within an interconnected system and to realistically account for various structures and attributes of such systems. In particular, much research has been devoted to modeling the interaction of humans within social networks. However, such networks in real-life tend to be extremely sparse and existing methods do not adequately address this issue. In this article, we propose an extension to ordinary and degree corrected stochastic blockmodels that accounts for a high degree of sparsity. Specifically, we propose hurdle versions of these blockmodels to account for community structure and degree heterogeneity in sparse networks. We use simulation to ensure parameter estimation is consistent and precise, and we propose the use of likelihood ratio-type tests for model selection. We illustrate the necessity for hurdle blockmodels with a small research collaboration network as well as the infamous Enron E-mail exchange network. Methods for determining goodness of fit and performing model selection are also proposed. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 383-393
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1865199
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865199
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:383-393
Template-Type: ReDIF-Article 1.0
Author-Name: Philippe Flandre
Author-X-Name-First: Philippe
Author-X-Name-Last: Flandre
Author-Name: John O’Quigley
Author-X-Name-First: John
Author-X-Name-Last: O’Quigley
Title: The Short-Term and Long-Term Hazard Ratio Model: Parameterization Inconsistency
Abstract:
The test of Yang and Prentice, based on the short-term and long-term hazard ratio model, appears to be an attractive test for the presence of a regression effect, being able to detect departures from a null hypothesis of no effect against quite broad alternatives. We recall the model on which this test is based and the test itself. In simulations, the test has shown good performance and is judged to be of potential value when alternatives to the null may be of a nonproportional hazards nature. However, the model, even when valid, suffers from a parameterization inconsistency in the sense that parameter estimates can violate the model’s assumed parametric structure even when the model holds true. This leads to awkward behavior in some situations. For example, this inconsistency implies that inference will not be invariant to the coding of treatment allocation. While this is a theoretical observation, we provide real examples that highlight the difficulty in making clear-cut inferences from the model. Potential solutions are available and we provide some discussion on this.
Journal: The American Statistician
Pages: 376-382
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1740786
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1740786
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:376-382
Template-Type: ReDIF-Article 1.0
Author-Name: Jianning Yang
Author-X-Name-First: Jianning
Author-X-Name-Last: Yang
Author-Name: John E. Kolassa
Author-X-Name-First: John E.
Author-X-Name-Last: Kolassa
Title: The Impact of Application of the Jackknife to the Sample Median
Abstract:
The jackknife is a reliable tool for reducing the bias of a wide range of estimators. This note demonstrates that even such versatile tools have regularity conditions that can be violated even in relatively simple cases, and that caution needs to be exercised in their use. In particular, we show that the jackknife does not provide the expected reliability for bias-reduction for the sample median, because of subtle changes in behavior of the sample median as one moves between even and odd sample sizes. These considerations arose out of class discussions in an MS-level nonparametrics course.
Journal: The American Statistician
Pages: 445-449
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1869090
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1869090
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:445-449
Template-Type: ReDIF-Article 1.0
Author-Name: Jin Zhang
Author-X-Name-First: Jin
Author-X-Name-Last: Zhang
Title: Response to Letter to the Editor: Zhang, J. (2021)
Journal: The American Statistician
Pages: 458-458
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2021.1982557
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1982557
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:458-458
Template-Type: ReDIF-Article 1.0
Author-Name: Gabriel J. Young
Author-X-Name-First: Gabriel J.
Author-X-Name-Last: Young
Title: Probability and Statistical Inference: From Basic Principles to Advanced Models
Journal: The American Statistician
Pages: 451-453
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2021.1985863
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1985863
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:451-453
Template-Type: ReDIF-Article 1.0
Author-Name: Samuel Thomas
Author-X-Name-First: Samuel
Author-X-Name-Last: Thomas
Author-Name: Wanzhu Tu
Author-X-Name-First: Wanzhu
Author-X-Name-Last: Tu
Title: Learning Hamiltonian Monte Carlo in R
Abstract:
Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian computation. In comparison with the traditional Metropolis–Hastings algorithm, HMC offers greater computational efficiency, especially in higher dimensional or more complex modeling situations. To most statisticians, however, the idea of HMC comes from a less familiar origin, one that is based on the theory of classical mechanics. Its implementation, either through Stan or one of its derivative programs, can appear opaque to beginners. A lack of understanding of the inner workings of HMC, in our opinion, has hindered its application to a broader range of statistical problems. In this article, we review the basic concepts of HMC in a language that is more familiar to statisticians, and we describe an HMC implementation in R, one of the most frequently used statistical software environments. We also present hmclearn, an R package for learning HMC. This package contains a general-purpose HMC function for data analysis. We illustrate the use of this package in common statistical models. In doing so, we hope to promote this powerful computational tool for wider use. Example code for common statistical models is presented as supplementary material for online publication.
Journal: The American Statistician
Pages: 403-413
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2020.1865198
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1865198
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:403-413
Template-Type: ReDIF-Article 1.0
Author-Name: Kenneth R. Benoit
Author-X-Name-First: Kenneth R.
Author-X-Name-Last: Benoit
Title: Textual Data Science with R
Journal: The American Statistician
Pages: 453-454
Issue: 4
Volume: 75
Year: 2021
Month: 10
X-DOI: 10.1080/00031305.2021.1985864
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1985864
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:75:y:2021:i:4:p:453-454
Template-Type: ReDIF-Article 1.0
Author-Name: William F. Christensen
Author-X-Name-First: William F.
Author-X-Name-Last: Christensen
Author-Name: Brinley N. Zabriskie
Author-X-Name-First: Brinley N.
Author-X-Name-Last: Zabriskie
Title: When Your Permutation Test is Doomed to Fail
Abstract:
A two-tailed test comparing the means of two independent populations is perhaps the most commonly used hypothesis test in quantitative research, featured centrally in medical research, A/B testing, and throughout the sciences. When data are skewed, the standard two-tailed t test is not appropriate and the permutation test comparing the two means (or medians) has been a widely recommended alternative, with statistical authors and statistical software packages touting the permutation test’s utility, particularly for small samples. In this article, we illustrate that when the two samples are skewed and the sample sizes are unequal, the two-tailed permutation test (as traditionally implemented) can in some cases have power equal to zero, even when the k highest values in the combined data are all found in the group with k observations. Further, in many cases the standard permutation test exhibits decreasing power as the total sample size increases! We illustrate the causes of these perverse properties via both simulation and real-world examples, and we recommend approaches for ameliorating or avoiding these potential problems.
Journal: The American Statistician
Pages: 53-63
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1902856
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1902856
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:53-63
Template-Type: ReDIF-Article 1.0
Author-Name: David A. Harville
Author-X-Name-First: David A.
Author-X-Name-Last: Harville
Title: Bayesian Inference Is Unaffected by Selection: Fact or Fiction?
Abstract:
The problem considered is that of making inferences about the value of a parameter vector θ based on the value of an observable random vector y that is subject to selection of the form y ∈ S (for a known subset S). According to conventional wisdom, a Bayesian approach (unlike a frequentist approach) requires no adjustment for selection, which is generally regarded as counterintuitive and even paradoxical. An alternative considered herein consists (when taking a Bayesian approach in the face of selection) of basing the inferences for the value of θ on the posterior distribution derived from the conditional (on y ∈ S) joint distribution of y and θ. That leads to an adjustment in the likelihood function that is reinterpretable as an adjustment to the prior distribution and ultimately leads to a different posterior distribution. And it serves to make the inferences specific to settings that are subject to selection of the same kind as the setting that gave rise to the data. Moreover, even in the absence of any real selection, this approach can be used to make the inferences specific to a meaningful subset of y-values.
Journal: The American Statistician
Pages: 22-28
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2020.1858963
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1858963
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:22-28
Template-Type: ReDIF-Article 1.0
Author-Name: James M. Flegal
Author-X-Name-First: James M.
Author-X-Name-Last: Flegal
Title: Do Dice Play God? The Mathematics of Uncertainty, by Ian Stewart
Journal: The American Statistician
Pages: 85-85
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.2019999
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2019999
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:85-85
Template-Type: ReDIF-Article 1.0
Author-Name: William E. Griffiths
Author-X-Name-First: William E.
Author-X-Name-Last: Griffiths
Author-Name: R. Carter Hill
Author-X-Name-First: R. Carter
Author-X-Name-Last: Hill
Title: On the Power of the F-test for Hypotheses in a Linear Model
Abstract:
We improve students’ understanding of the F-test for linear hypotheses in a linear model by explaining elements that affect the power of the test. Including true restrictions in a joint null hypothesis affects test power in a way that is not generally known. Asking a student whether including the true restrictions in the null hypothesis will increase or decrease power, the student is likely to say: “I don’t know.” The student’s answer is not bad because the power depends on the noncentrality parameter and the degrees of freedom. We show that adding true restrictions to a linear hypothesis cannot decrease the noncentrality parameter of the F-statistic, a result many will find counterintuitive. Adding true restrictions can increase or decrease F-test power depending on the offsetting negative effect of reducing the numerator degrees of freedom. We provide illustrative examples of these results and prove them for the general case.
Journal: The American Statistician
Pages: 78-84
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1979652
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1979652
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:78-84
Template-Type: ReDIF-Article 1.0
Author-Name: Matthew J. McIntosh
Author-X-Name-First: Matthew J.
Author-X-Name-Last: McIntosh
Title: Calculating Sample Size for Follmann’s Simple Multivariate Test for One-Sided Alternatives
Abstract:
Follmann developed a multivariate test, when X ∼ MVN(μ, Σ), to test H0 versus H1 − H0, where H0: μ = 0 and H1: μ ≥ 0. Follmann provided strict lower bounds on the power function when an orthogonal mapping requirement was satisfied, the use of which requires knowledge about the unknown population covariance matrix. In this article, we show that the orthogonal mapping requirement for his theorem is equivalent to and can be replaced with 1′μ ≥ 0, which does not require knowledge about the population covariance matrix. Using the lower bound on power, we are able to develop conservative sample sizes for this test. The conservative sample sizes are upper bounds on the actual sample size needed to achieve at least the desired power. Results from a simulation study are provided illustrating that the sample sizes are indeed upper bounds. Also, a simple R program to calculate sample size is provided.
Journal: The American Statistician
Pages: 16-21
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2020.1787224
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1787224
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:16-21
Template-Type: ReDIF-Article 1.0
Author-Name: Giulia Carella
Author-X-Name-First: Giulia
Author-X-Name-Last: Carella
Author-Name: Javier Pérez Trufero
Author-X-Name-First: Javier
Author-X-Name-Last: Pérez Trufero
Author-Name: Miguel Álvarez
Author-X-Name-First: Miguel
Author-X-Name-Last: Álvarez
Author-Name: Jorge Mateu
Author-X-Name-First: Jorge
Author-X-Name-Last: Mateu
Title: A Bayesian Spatial Analysis of the Heterogeneity in Human Mobility Changes During the First Wave of the COVID-19 Epidemic in the United States
Abstract:
The spread of COVID-19 in the U.S. prompted nonpharmaceutical interventions which caused a reduction in mobility everywhere, although with large disparities between different counties. Using a Bayesian spatial modeling framework, we investigated the association of county-level demographic and socioeconomic factors with changes in workplace mobility at two points in time: during the early stages of the epidemic (lockdown phase) and in the following phase (recovery phase) up to July 2020. While controlling for the perceived risk of infection, socioeconomic and demographic covariates explain about 40% of the variance in changes in workplace mobility during the lockdown phase, which reduces to about 10% during the recovery phase. During the lockdown phase, the results show larger drops in mobility in counties with richer families, that are less densely populated, with an older population living in dense neighborhoods, and with a lower proportion of Hispanic population. When also accounting for the residual spatial variability, the variance explained by the model increases to more than 70%, suggesting strong proximity effects potentially related to state- and county-wise regulations. These results provide community-level insights on the evolution of the U.S. mobility during the first wave of the epidemic that could directly benefit policy evaluation and interventions.
Journal: The American Statistician
Pages: 64-72
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1965657
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1965657
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:64-72
Template-Type: ReDIF-Article 1.0
Author-Name: Jiaqi Gu
Author-X-Name-First: Jiaqi
Author-X-Name-Last: Gu
Author-Name: Yiwei Fan
Author-X-Name-First: Yiwei
Author-X-Name-Last: Fan
Author-Name: Guosheng Yin
Author-X-Name-First: Guosheng
Author-X-Name-Last: Yin
Title: Reconstructing the Kaplan–Meier Estimator as an M-estimator
Abstract:
The Kaplan–Meier (KM) estimator, which provides a nonparametric estimate of a survival function for time-to-event data, has broad applications in clinical studies, engineering, economics and many other fields. The theoretical properties of the KM estimator including its consistency and asymptotic distribution have been well established. From a new perspective, we reconstruct the KM estimator as an M-estimator by maximizing a quadratic M-function based on concordance, which can be computed using the expectation–maximization (EM) algorithm. It is shown that the convergent point of the EM algorithm coincides with the traditional KM estimator, which offers a new interpretation of the KM estimator as an M-estimator. As a result, the limiting distribution of the KM estimator can be established using M-estimation theory. Application on two real datasets demonstrates that the proposed M-estimator is equivalent to the KM estimator, and the confidence intervals and confidence bands can be derived as well.
Journal: The American Statistician
Pages: 37-43
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1947376
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1947376
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:37-43
Template-Type: ReDIF-Article 1.0
Author-Name: Byron J. Gajewski
Author-X-Name-First: Byron J.
Author-X-Name-Last: Gajewski
Author-Name: Jo A. Wick
Author-X-Name-First: Jo A.
Author-X-Name-Last: Wick
Author-Name: Truman J. Milling
Author-X-Name-First: Truman J.
Author-X-Name-Last: Milling
Title: A Connection Between Baseball and Clinical Trials Found in “Slugging Percentage is Not a Percentage—And Why That Matters”
Journal: The American Statistician
Pages: 89-89
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1990128
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1990128
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:89-89
Template-Type: ReDIF-Article 1.0
Author-Name: Brett Presnell
Author-X-Name-First: Brett
Author-X-Name-Last: Presnell
Title: A Geometric Derivation of the Cantor Distribution
Abstract:
For students of probability and statistics, the Cantor distribution provides a useful example of a continuous probability distribution on the real line which cannot be obtained by integrating its derivative or indeed any density function. While usually treated as an advanced topic, we show that the basic facts about the Cantor distribution can be rigorously derived from a sequence of uniform distributions using simple geometry and recursion, together with one basic result from advanced calculus.
Journal: The American Statistician
Pages: 73-77
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1905062
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1905062
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:73-77
Template-Type: ReDIF-Article 1.0
Author-Name: Angelika M. Stefan
Author-X-Name-First: Angelika M.
Author-X-Name-Last: Stefan
Title: Statistics for Making Decisions
Journal: The American Statistician
Pages: 87-88
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.2020003
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2020003
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:87-88
Template-Type: ReDIF-Article 1.0
Author-Name: Emilija Perković
Author-X-Name-First: Emilija
Author-X-Name-Last: Perković
Title: The Phantom Pattern Problem: The Mirage of Big Data
Journal: The American Statistician
Pages: 86-87
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.2020002
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2020002
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:86-87
Template-Type: ReDIF-Article 1.0
Author-Name: Yang Ni
Author-X-Name-First: Yang
Author-X-Name-Last: Ni
Title: Exploratory Data Analysis with MATLAB, 3rd ed., by Wendy L. Martinez, Angel R. Martinez, and Jeffrey L. Solka
Journal: The American Statistician
Pages: 85-86
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.2020000
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2020000
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:85-86
Template-Type: ReDIF-Article 1.0
Author-Name: Erik van Zwet
Author-X-Name-First: Erik
Author-X-Name-Last: van Zwet
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Title: A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates
Abstract:
If we have an unbiased estimate of some parameter of interest, then its absolute value is positively biased for the absolute value of the parameter. This bias is large when the signal-to-noise ratio (SNR) is small, and it becomes even larger when we condition on statistical significance; the winner’s curse. This is a frequentist motivation for regularization or “shrinkage.” To determine a suitable amount of shrinkage, we propose to estimate the distribution of the SNR from a large collection or “corpus” of similar studies and use this as a prior distribution. The wider the scope of the corpus, the less informative the prior, but a wider scope does not necessarily result in a more diffuse prior. We show that the estimation of the prior simplifies if we require that posterior inference is equivariant under linear transformations of the data. We demonstrate our approach with corpora of 86 replication studies from psychology and 178 phase 3 clinical trials. Our suggestion is not intended to be a replacement for a prior based on full information about a particular problem; rather, it represents a familywise choice that should yield better long-term properties than the current default uniform prior, which has led to systematic overestimates of effect sizes and a replication crisis when these inflated estimates have not shown up in later studies.
Journal: The American Statistician
Pages: 1-9
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1938225
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1938225
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:1-9
Template-Type: ReDIF-Article 1.0
Author-Name: Edwin van den Heuvel
Author-X-Name-First: Edwin
Author-X-Name-Last: van den Heuvel
Author-Name: Zhuozhao Zhan
Author-X-Name-First: Zhuozhao
Author-X-Name-Last: Zhan
Title: Myths About Linear and Monotonic Associations: Pearson’s r, Spearman’s ρ, and Kendall’s τ
Abstract:
Pearson’s correlation coefficient is considered a measure of linear association between bivariate random variables X and Y. It is recommended not to use it for other forms of associations. Indeed, for nonlinear monotonic associations alternative measures like Spearman’s rank and Kendall’s tau correlation coefficients are considered more appropriate. These views or opinions on the estimation of association are strongly rooted in the statistical and other empirical sciences. After defining linear and monotonic associations, we will demonstrate that these opinions are incorrect. Pearson’s correlation coefficient should not be ruled out a priori for measuring nonlinear monotonic associations. We will provide examples of practically relevant families of bivariate distribution functions with nonlinear monotonic associations for which Pearson’s correlation is preferred over Spearman’s rank and Kendall’s tau correlation in testing the dependency between X and Y. Alternatively, we will provide a family of bivariate distributions with a linear association between X and Y for which Spearman’s rank and Kendall’s tau are preferred over Pearson’s correlation. Our examples show that existing views on linear and monotonic associations are myths.
Journal: The American Statistician
Pages: 44-52
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.2004922
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2004922
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:44-52
Template-Type: ReDIF-Article 1.0
Author-Name: Yulia Sidi
Author-X-Name-First: Yulia
Author-X-Name-Last: Sidi
Author-Name: Ofer Harel
Author-X-Name-First: Ofer
Author-X-Name-Last: Harel
Title: Difference Between Binomial Proportions Using Newcombe’s Method With Multiple Imputation for Incomplete Data
Abstract:
The difference between two binomial proportions is commonly used in applied research. Since many studies encounter incomplete data, proper methods to analyze such data are needed. Here, we present a proper multiple imputation (MI) procedure for constructing confidence interval for difference between binomial proportions using Newcombe’s method, which is known to have a better coverage probability when compared with Wald’s method. We use both a conventional MI procedure for ignorable missingness and a two-stage MI for non-ignorable missingness. Using simulation studies, we compare our method to three other methods and provide recommendation for the use of such methods in practice. In addition, we show the application of our new method on a COVID-19 dataset.
Journal: The American Statistician
Pages: 29-36
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2021.1898468
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1898468
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:29-36
Template-Type: ReDIF-Article 1.0
Author-Name: Christian H. Weiß
Author-X-Name-First: Christian H.
Author-X-Name-Last: Weiß
Author-Name: Boris Aleksandrov
Author-X-Name-First: Boris
Author-X-Name-Last: Aleksandrov
Title: Computing (Bivariate) Poisson Moments Using Stein–Chen Identities
Abstract:
The (bivariate) Poisson distribution is the most common distribution for (bivariate) count random variables. The univariate Poisson distribution is characterized by the famous Stein–Chen identity. We demonstrate that this identity allows one to derive even sophisticated moment expressions in such a simple way that the corresponding computations can be presented in an introductory statistics class. Then, we newly derive different types of Stein–Chen identity for the bivariate Poisson distribution. These are shown to be very useful for computing joint moments, again in a surprisingly simple way. We also explain how to extend our results to the general multivariate case.
Journal: The American Statistician
Pages: 10-15
Issue: 1
Volume: 76
Year: 2022
Month: 1
X-DOI: 10.1080/00031305.2020.1763836
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1763836
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:1:p:10-15
Template-Type: ReDIF-Article 1.0
Author-Name: Joshua Habiger
Author-X-Name-First: Joshua
Author-X-Name-Last: Habiger
Author-Name: Ye Liang
Author-X-Name-First: Ye
Author-X-Name-Last: Liang
Title: Publication Policies for Replicable Research and the Community-Wide False Discovery Rate
Abstract:
Recent literature has shown that statistically significant results are often not replicated because the “p-value < 0.05” publication rule results in a high false positive rate (FPR) or false discovery rate (FDR) in some scientific communities. While recommendations to address the phenomenon vary, many amount to incorporating additional study summary information, such as prior null hypothesis odds and/or effect sizes, in some way. This article demonstrates that a statistic called the local false discovery rate (lfdr), which incorporates this information, is a sufficient summary for addressing false positive rates. Specifically, it is shown that lfdr-values among published results are sufficient for estimating the community-wide FDR for any well-defined publication policy, and that lfdr-values are sufficient for defining policies for community-wide FDR control. It is also demonstrated that, though p-values can be useful for computing an lfdr, they alone are not sufficient for addressing the community-wide FDR. Data from the recent replication study are used to compare publication policies and illustrate the FDR estimator.
Journal: The American Statistician
Pages: 131-141
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1999857
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1999857
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:131-141
Template-Type: ReDIF-Article 1.0
Author-Name: Jelle J. Goeman
Author-X-Name-First: Jelle J.
Author-X-Name-Last: Goeman
Author-Name: Aldo Solari
Author-X-Name-First: Aldo
Author-X-Name-Last: Solari
Title: Comparing Three Groups
Abstract:
For multiple comparisons in analysis of variance, the practitioners’ handbooks generally advocate standard methods such as Bonferroni, or an F-test followed by Tukey’s honest significant difference method. These methods are known to be suboptimal compared to closed testing procedures, but improved methods can be complex in the general multigroup set-up. In this note, we argue that the case of three-groups is special: with three groups, closed testing procedures are powerful and easy to use. We describe four different closed testing procedures specifically for the three-group set-up. The choice of method should be determined by assessing which of the comparisons are considered primary and which are secondary, as dictated by subject-matter considerations. We describe how all four methods can be used with any standard software.
Journal: The American Statistician
Pages: 168-176
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.2002188
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2002188
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:168-176
Template-Type: ReDIF-Article 1.0
Author-Name: Nitis Mukhopadhyay
Author-X-Name-First: Nitis
Author-X-Name-Last: Mukhopadhyay
Title: Pairwise Independence May Not Imply Independence: New Illustrations and a Generalization
Abstract:
A number of standard textbooks followed in a junior/senior level course or in a first-year graduate level course in mathematical statistics and probability routinely include a single basic illustration, in its variant forms, to highlight an important point: pairwise independence may not imply (mutual) independence. We earnestly believe that beginning students appreciate more examples to clarify these key issues. Hence, we hope that our new sets of nontrivial illustrations from Section 2 will help our audience. Next, in Section 3, we extend the notion to q-wise independence with a large set of illustrations using both discrete and continuous random variables, showing that q-wise independence may not imply (mutual) independence. We reasonably assure the reader that this discourse is immediately accessible to juniors/seniors and first-year graduate students.
Journal: The American Statistician
Pages: 184-187
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2022.2039763
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2039763
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:184-187
Template-Type: ReDIF-Article 1.0
Author-Name: Rachel C. Nethery
Author-X-Name-First: Rachel C.
Author-X-Name-Last: Nethery
Author-Name: Jarvis T. Chen
Author-X-Name-First: Jarvis T.
Author-X-Name-Last: Chen
Author-Name: Nancy Krieger
Author-X-Name-First: Nancy
Author-X-Name-Last: Krieger
Author-Name: Pamela D. Waterman
Author-X-Name-First: Pamela D.
Author-X-Name-Last: Waterman
Author-Name: Emily Peterson
Author-X-Name-First: Emily
Author-X-Name-Last: Peterson
Author-Name: Lance A. Waller
Author-X-Name-First: Lance A.
Author-X-Name-Last: Waller
Author-Name: Brent A. Coull
Author-X-Name-First: Brent A.
Author-X-Name-Last: Coull
Title: Statistical Implications of Endogeneity Induced by Residential Segregation in Small-Area Modeling of Health Inequities
Abstract:
Health inequities are assessed by health departments to identify social groups disproportionately burdened by disease and by academic researchers to understand how social, economic, and environmental inequities manifest as health inequities. To characterize inequities, group-specific small-area health data are often modeled using log-linear generalized linear models (GLM) or generalized linear mixed models (GLMM) with a random intercept. These approaches estimate the same marginal rate ratio comparing disease rates across groups under standard assumptions. Here we explore how residential segregation combined with social group differences in disease risk can lead to contradictory findings from the GLM and GLMM. We show that this occurs because small-area disease rate data collected under these conditions induce endogeneity in the GLMM due to correlation between the model’s offset and random effect. This results in GLMM estimates that represent conditional rather than marginal associations. We refer to endogeneity arising from the offset, which to our knowledge has not been noted previously, as “offset endogeneity.” We illustrate this phenomenon in simulated data and real premature mortality data, and we propose alternative modeling approaches to address it. We also introduce to a statistical audience the social epidemiologic terminology for framing health inequities, which enables responsible interpretation of results.
Journal: The American Statistician
Pages: 142-151
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.2003245
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2003245
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:142-151
Template-Type: ReDIF-Article 1.0
Author-Name: Yi Zuo
Author-X-Name-First: Yi
Author-X-Name-Last: Zuo
Author-Name: Thomas G. Stewart
Author-X-Name-First: Thomas G.
Author-X-Name-Last: Stewart
Author-Name: Jeffrey D. Blume
Author-X-Name-First: Jeffrey D.
Author-X-Name-Last: Blume
Title: Variable Selection With Second-Generation P-Values
Abstract:
Many statistical methods have been proposed for variable selection in the past century, but few balance inference and prediction tasks well. Here, we report on a novel variable selection approach called penalized regression with second-generation p-values (ProSGPV). It captures the true model at the best rate achieved by current standards, is easy to implement in practice, and often yields the smallest parameter estimation error. The idea is to use an l0
penalization scheme with second-generation p-values (SGPV), instead of traditional ones, to determine which variables remain in a model. The approach yields tangible advantages for balancing support recovery, parameter estimation, and prediction tasks. The ProSGPV algorithm can maintain its good performance even when there is strong collinearity among features or when a high-dimensional feature space with p > n is considered. We present extensive simulations and a real-world application comparing the ProSGPV approach with smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and minimax concave penalty with penalized linear unbiased selection (MC+). While the last three algorithms are among the current standards for variable selection, ProSGPV has superior inference performance and comparable prediction performance in certain scenarios.
Journal: The American Statistician
Pages: 91-101
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1946150
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1946150
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:91-101
Template-Type: ReDIF-Article 1.0
Author-Name: Qiwei Li
Author-X-Name-First: Qiwei
Author-X-Name-Last: Li
Title: Bayesian Analysis of Infectious Diseases: COVID-19 and Beyond
Journal: The American Statistician
Pages: 199-199
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2022.2054625
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2054625
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:199-199
Template-Type: ReDIF-Article 1.0
Author-Name: Georg Zimmermann
Author-X-Name-First: Georg
Author-X-Name-Last: Zimmermann
Author-Name: Edgar Brunner
Author-X-Name-First: Edgar
Author-X-Name-Last: Brunner
Author-Name: Werner Brannath
Author-X-Name-First: Werner
Author-X-Name-Last: Brannath
Author-Name: Martin Happ
Author-X-Name-First: Martin
Author-X-Name-Last: Happ
Author-Name: Arne C. Bathke
Author-X-Name-First: Arne C.
Author-X-Name-Last: Bathke
Title: Pseudo-Ranks: The Better Way of Ranking?
Abstract:
Rank-based methods are frequently used in the life sciences, and in the empirical sciences in general. Among the best-known examples of nonparametric rank-based tests are the Wilcoxon–Mann–Whitney test and the Kruskal–Wallis test. However, recently, potential pitfalls and paradoxical results pertaining to the use of traditional rank-based procedures for more than two samples have been highlighted, and the so-called pseudo-ranks have been proposed as a remedy for this type of problem. The aim of the present article is twofold: First, we show that pseudo-ranks might also behave counterintuitively when splitting up groups. Second, since the use of pseudo-ranks leads to a slightly different interpretation of the results, we provide some guidance regarding the decision for one or the other approach, in particular with respect to interpretability and generalizability of the findings. It turns out that the choice of the reference distribution, to which the individual groups are compared, is crucial. The practically relevant implications of these aspects are illustrated by a discussion of a dataset from epilepsy research. Summing up, one should decide based on thorough case-by-case considerations whether ranks or pseudo-ranks are appropriate.
Journal: The American Statistician
Pages: 124-130
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1972836
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1972836
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:124-130
Template-Type: ReDIF-Article 1.0
Author-Name: Dale L. Zimmerman
Author-X-Name-First: Dale L.
Author-X-Name-Last: Zimmerman
Author-Name: Jay M. Ver Hoef
Author-X-Name-First: Jay M.
Author-X-Name-Last: Ver Hoef
Title: On Deconfounding Spatial Confounding in Linear Models
Abstract:
Spatial confounding, that is, collinearity between fixed effects and random effects in a spatial generalized linear mixed model, can adversely affect estimates of the fixed effects. Restricted spatial regression methods have been proposed as a remedy for spatial confounding. Such methods replace inference for the fixed effects of the original model with inference for those effects under a model in which the random effects are restricted to a subspace orthogonal to the column space of the fixed effects model matrix; thus, they “deconfound” the two types of effects. We prove, however, that frequentist inference for the fixed effects of a deconfounded linear model is generally inferior to that for the fixed effects of the original spatial linear model; in fact, it is even inferior to inference for the corresponding nonspatial model. We show further that deconfounding also leads to inferior predictive inferences, though its impact on prediction appears to be relatively small in practice. Based on these results, we argue that deconfounding a spatial linear model is bad statistical practice and should be avoided.
Journal: The American Statistician
Pages: 159-167
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1946149
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1946149
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:159-167
Template-Type: ReDIF-Article 1.0
Author-Name: Weixiao Dai
Author-X-Name-First: Weixiao
Author-X-Name-Last: Dai
Author-Name: Toshimitsu Hamasaki
Author-X-Name-First: Toshimitsu
Author-X-Name-Last: Hamasaki
Title: Statistics in Medicine
Journal: The American Statistician
Pages: 199-200
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2022.2054626
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2054626
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:199-200
Template-Type: ReDIF-Article 1.0
Author-Name: Philippe Besse
Author-X-Name-First: Philippe
Author-X-Name-Last: Besse
Author-Name: Eustasio del Barrio
Author-X-Name-First: Eustasio
Author-X-Name-Last: del Barrio
Author-Name: Paula Gordaliza
Author-X-Name-First: Paula
Author-X-Name-Last: Gordaliza
Author-Name: Jean-Michel Loubes
Author-X-Name-First: Jean-Michel
Author-X-Name-Last: Loubes
Author-Name: Laurent Risser
Author-X-Name-First: Laurent
Author-X-Name-Last: Risser
Title: A Survey of Bias in Machine Learning Through the Prism of Statistical Parity
Abstract:
Applications based on machine learning models have now become an indispensable part of everyday life and the professional world. As a consequence, a critical question has recently arisen among the population: Do algorithmic decisions convey any type of discrimination against specific groups of the population or minorities? In this article, we show the importance of understanding how bias can be introduced into automatic decisions. We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting. We then propose to quantify the presence of bias by using the standard disparate impact index on the real and well-known adult income dataset. Finally, we check the performance of different approaches aiming to reduce the bias in binary classification outcomes. Importantly, we show that some intuitive methods are ineffective with respect to the statistical parity criterion. This sheds light on the fact that trying to make fair machine learning models may be a particularly challenging task, especially when the training observations contain some bias.
Journal: The American Statistician
Pages: 188-198
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1952897
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1952897
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:188-198
Template-Type: ReDIF-Article 1.0
Author-Name: Ryan Elmore
Author-X-Name-First: Ryan
Author-X-Name-Last: Elmore
Author-Name: Gregory J. Matthews
Author-X-Name-First: Gregory J.
Author-X-Name-Last: Matthews
Title: Bang the Can Slowly: An Investigation into the 2017 Houston Astros
Abstract:
This article is a statistical investigation into the 2017 Major League Baseball scandal involving the Houston Astros, who won the World Series championship that same year. The Astros were alleged to have stolen their opponents’ pitching signs in order to provide their batters with a potentially unfair advantage. This work finds compelling evidence that the Astros’ on-field performance was significantly affected by their sign-stealing ploy and quantifies the effects. The three main findings in the article are (i) the Astros’ odds of swinging at a pitch were reduced by approximately 27% (OR: 0.725, 95% CI: (0.618, 0.850)) when the sign was stolen, (ii) when an Astros player swung, the odds of making contact with the ball increased roughly 80% (OR: 1.805, 95% CI: (1.342, 2.675)) on non-fastball pitches, and (iii) when the Astros made contact with a ball on a pitch in which the sign was known, the ball’s exit velocity (launch speed) increased on average by 2.386 (95% CI: (0.334, 4.451)) miles per hour.
Journal: The American Statistician
Pages: 110-116
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1902391
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1902391
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:110-116
Template-Type: ReDIF-Article 1.0
Author-Name: Wei Wang
Author-X-Name-First: Wei
Author-X-Name-Last: Wang
Author-Name: Dylan S. Small
Author-X-Name-First: Dylan S.
Author-X-Name-Last: Small
Author-Name: Guy Cafri
Author-X-Name-First: Guy
Author-X-Name-Last: Cafri
Author-Name: Elizabeth W. Paxton
Author-X-Name-First: Elizabeth W.
Author-X-Name-Last: Paxton
Title: The Case-Control Approach Can be More Powerful for Matched Pair Observational Studies When the Outcome is Rare
Abstract:
In an observational study, to investigate the treatment effect, one common strategy is to match the control subjects to the treated subjects. The outcomes between the two groups are then compared after the TC (treatment-control) match. However, when the outcome is rare, detection of an outcome difference can be challenging. An alternative approach is to compare the treatment or exposure discrepancy after matching subjects with the outcome (cases) to subjects without the outcome (referents). Throughout the article, we follow the tradition of calling this the matched “case-control” approach instead of the matched “case-referent” approach. We reserve “control” to mean not taking the treatment, and use the abbreviations TC and CC (case-control) when possible confusion may arise. We derive conditions under which the matched CC approach has more power for testing the treatment effect and examine its empirical performance in simulations and in our data example. We also show that the CC approach gives better match quality in our study of the effect of long vs. short stay in the hospital after joint surgery.
Journal: The American Statistician
Pages: 117-123
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1972835
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1972835
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:117-123
Template-Type: ReDIF-Article 1.0
Author-Name: Joris Mulder
Author-X-Name-First: Joris
Author-X-Name-Last: Mulder
Author-Name: Eric-Jan Wagenmakers
Author-X-Name-First: Eric-Jan
Author-X-Name-Last: Wagenmakers
Author-Name: Maarten Marsman
Author-X-Name-First: Maarten
Author-X-Name-Last: Marsman
Title: A Generalization of the Savage–Dickey Density Ratio for Testing Equality and Order Constrained Hypotheses
Abstract:
The Savage–Dickey density ratio is a specific expression of the Bayes factor when testing a precise (equality constrained) hypothesis against an unrestricted alternative. The expression greatly simplifies the computation of the Bayes factor at the cost of assuming a specific form of the prior under the precise hypothesis as a function of the unrestricted prior. A generalization was proposed by Verdinelli and Wasserman such that the priors can be freely specified under both hypotheses while keeping the computational advantage. This article presents an extension of this generalization when the hypothesis has equality as well as order constraints on the parameters of interest. The methodology is used for a constrained multivariate t-test using the JZS Bayes factor and a constrained hypothesis test under the multinomial model.
Journal: The American Statistician
Pages: 102-109
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2020.1799861
File-URL: http://hdl.handle.net/10.1080/00031305.2020.1799861
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:102-109
Template-Type: ReDIF-Article 1.0
Author-Name: Sushil Kumar Singh
Author-X-Name-First: Sushil Kumar
Author-X-Name-Last: Singh
Author-Name: Neelkanth Rawat
Author-X-Name-First: Neelkanth
Author-X-Name-Last: Rawat
Author-Name: Sargun Singh
Author-X-Name-First: Sargun
Author-X-Name-Last: Singh
Author-Name: Savinder Kaur
Author-X-Name-First: Savinder
Author-X-Name-Last: Kaur
Title: Re-exploring the Penney-Ante Game
Abstract:
We propose a single loop diagram and use it to devise a single loop matrix method to computationally solve the Penney-Ante game. This method avoids the nuances of repeated use of conditional probability and Markov chain representations. We remove the limitations of Conway’s trick as applied to a fair coin and generalize the method where the coin is allowed to be biased. A uniform random number generator is used to simulate the game and formulate implicit mathematical relations to explore nontransitivity.
Journal: The American Statistician
Pages: 177-183
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.1961860
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1961860
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:177-183
Template-Type: ReDIF-Article 1.0
Author-Name: The Editors
Title: The Impact of Application of the Jackknife to the Sample Median
Journal: The American Statistician
Pages: 201-201
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2022.2032827
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2032827
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:201-201
Template-Type: ReDIF-Article 1.0
Author-Name: Shing Lee
Author-X-Name-First: Shing
Author-X-Name-Last: Lee
Author-Name: Emilia Bagiella
Author-X-Name-First: Emilia
Author-X-Name-Last: Bagiella
Author-Name: Roger Vaughan
Author-X-Name-First: Roger
Author-X-Name-Last: Vaughan
Author-Name: Usha Govindarajulu
Author-X-Name-First: Usha
Author-X-Name-Last: Govindarajulu
Author-Name: Paul Christos
Author-X-Name-First: Paul
Author-X-Name-Last: Christos
Author-Name: Denise Esserman
Author-X-Name-First: Denise
Author-X-Name-Last: Esserman
Author-Name: Hua Zhong
Author-X-Name-First: Hua
Author-X-Name-Last: Zhong
Author-Name: Mimi Kim
Author-X-Name-First: Mimi
Author-X-Name-Last: Kim
Title: COVID-19 Pandemic as a Change Agent in the Structure and Practice of Statistical Consulting Centers
Abstract:
When New York City (NYC) became an epicenter of the COVID-19 pandemic in the spring of 2020, statistical consulting centers at academic medical institutions in the area were immediately inundated with requests from hospital leadership and researchers for methodological support to address different aspects of the outbreak. Statisticians suddenly had to pivot from their usual responsibilities to focus entirely on COVID-19 work, and consulting centers had to devise innovative strategies to restructure their workflow and develop new infrastructure to address the acute demand for support. As statisticians from seven NYC-area institutions, we share our experiences and lessons learned during the pandemic, with the hope that this will lead not only to better preparedness for future public health crises when the skills and expertise of statisticians are critically needed, but also to lasting improvements to the structure and practice of statistical consulting centers.
Journal: The American Statistician
Pages: 152-158
Issue: 2
Volume: 76
Year: 2022
Month: 4
X-DOI: 10.1080/00031305.2021.2023045
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2023045
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:2:p:152-158
Template-Type: ReDIF-Article 1.0
Author-Name: Christopher R. Bilder
Author-X-Name-First: Christopher R.
Author-X-Name-Last: Bilder
Title: Alpha Seminar: A Course for New Graduate Students in Statistics
Abstract:
The accumulation of technical knowledge is the central focus of graduate programs in statistics. However, student success does not depend solely on acquiring such knowledge. Rather, students must also understand the rigors of graduate study to complete their degree. And, they need to understand the statistics profession to prepare for a career after graduation. The purpose of the one-credit hour Alpha Seminar course at the University of Nebraska-Lincoln is to educate graduate students in these nontechnical areas. Students are required to enroll in Alpha Seminar during their first semester of study. In addition to advisement on courses and graduation requirements, Alpha Seminar features topics on career paths, ethics, professional accreditation, internships, and professional societies. Alumni also meet with the class to discuss how to be successful in the program and in a future career. This article discusses course topics, examines assignments, and provides evaluations from student cohorts. The corresponding course website is available at www.chrisbilder.com/stat810.
Journal: The American Statistician
Pages: 286-291
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2049366
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2049366
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:286-291
Template-Type: ReDIF-Article 1.0
Author-Name: Chi-Kuang Yeh
Author-X-Name-First: Chi-Kuang
Author-X-Name-Last: Yeh
Author-Name: Gregory Rice
Author-X-Name-First: Gregory
Author-X-Name-Last: Rice
Author-Name: Joel A. Dubin
Author-X-Name-First: Joel A.
Author-X-Name-Last: Dubin
Title: Evaluating Real-Time Probabilistic Forecasts With Application to National Basketball Association Outcome Prediction
Abstract:
Motivated by the goal of evaluating real-time forecasts of home team win probabilities in the National Basketball Association, we develop new tools for measuring the quality of continuously updated probabilistic forecasts. This includes introducing calibration surface plots, and simple graphical summaries of them, to evaluate at a glance whether a given continuously updated probability forecasting method is well-calibrated, as well as developing statistical tests and graphical tools to evaluate the skill, or relative performance, of two competing continuously updated forecasting methods. These tools are demonstrated in an application to evaluate the continuously updated forecasts published by the United States-based multinational sports network ESPN on its principal webpage espn.com. This application lends statistical evidence that the forecasts published there are well-calibrated, and exhibit improved skill over several naïve models, but do not demonstrate significantly improved skill over simple logistic regression models based solely on a measurement of each team’s relative strength, and the evolving score difference throughout the game.
Journal: The American Statistician
Pages: 214-223
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.1967781
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1967781
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:214-223
Template-Type: ReDIF-Article 1.0
Author-Name: Chris Barker
Author-X-Name-First: Chris
Author-X-Name-Last: Barker
Title: Data Monitoring Committees in Clinical Trials: A Practical Perspective
Journal: The American Statistician
Pages: 305-306
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2088199
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2088199
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:305-306
Template-Type: ReDIF-Article 1.0
Author-Name: Brendan Kline
Author-X-Name-First: Brendan
Author-X-Name-Last: Kline
Title: Bayes Factors Based on p-Values and Sets of Priors With Restricted Strength
Abstract:
This article focuses on the minimum Bayes factor compatible with a p-value, considering a set of priors with restricted strength. The resulting minimum Bayes factor depends on both the strength of the set of priors and the sample size. The results can be used to interpret the evidence for/against the hypothesis provided by a p-value in a way that accounts for the strength of the priors and the sample size. In particular, the results suggest further lowering the p-value cutoff for “statistical significance.”
Journal: The American Statistician
Pages: 203-213
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.1877815
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1877815
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:203-213
Template-Type: ReDIF-Article 1.0
Author-Name: Kimihiro Noguchi
Author-X-Name-First: Kimihiro
Author-X-Name-Last: Noguchi
Author-Name: Koby F. Robles
Author-X-Name-First: Koby F.
Author-X-Name-Last: Robles
Title: On Generating Distributions with the Memoryless Property
Abstract:
The exponential and geometric distributions are the well-known continuous and discrete families of distributions with the memoryless property, respectively. The memoryless property is emphasized in introductory probability and statistics textbooks even though no distribution beyond these two families has been explored in detail. By examining the relationship between these two families of distributions, we propose a general algorithm for generating distributions with the memoryless property. Then, we show that the general algorithm uniquely determines the distribution with the memoryless property given the parameter value and a nonnegative support that contains zero and is closed under addition. Furthermore, we present a few nontrivial examples and their applications to demonstrate the richness of such distributions.
Journal: The American Statistician
Pages: 280-285
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.2006782
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2006782
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:280-285
Template-Type: ReDIF-Article 1.0
Author-Name: Nathaniel T. Stevens
Author-X-Name-First: Nathaniel T.
Author-X-Name-Last: Stevens
Author-Name: Luke Hagar
Author-X-Name-First: Luke
Author-X-Name-Last: Hagar
Title: Comparative Probability Metrics: Using Posterior Probabilities to Account for Practical Equivalence in A/B tests
Abstract:
Recently, online controlled experiments (i.e., A/B tests) have become an extremely valuable tool used by internet and technology companies for purposes of advertising, product development, product improvement, customer acquisition, and customer retention, to name a few. The data-driven decisions that result from these experiments have traditionally been informed by null hypothesis significance tests and analyses based on p-values. However, attention has recently been drawn to the shortcomings of hypothesis testing, and an emphasis has been placed on the development of new methodologies that overcome these shortcomings. We propose the use of posterior probabilities to facilitate comparisons that account for practical equivalence and that quantify the likelihood that a result is practically meaningful, as opposed to statistically significant. We call these posterior probabilities comparative probability metrics (CPMs). This Bayesian methodology provides a flexible and intuitive means of making meaningful comparisons by directly calculating, for example, the probability that two groups are practically equivalent, or the probability that one group is practically superior to another. In this article, we describe a unified framework for constructing and estimating such probabilities, and we develop a sample size determination methodology that may be used to determine how much data are required to calculate trustworthy CPMs.
Journal: The American Statistician
Pages: 224-237
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.2000495
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2000495
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:224-237
Template-Type: ReDIF-Article 1.0
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on “On the Power of the F-test for Hypotheses in a Linear Model” by Griffiths and Hill (2022)
Journal: The American Statistician
Pages: 310-311
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2074541
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2074541
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:310-311
Template-Type: ReDIF-Article 1.0
Author-Name: Kenneth Rice
Author-X-Name-First: Kenneth
Author-X-Name-Last: Rice
Author-Name: Lingbo Ye
Author-X-Name-First: Lingbo
Author-X-Name-Last: Ye
Title: Expressing Regret: A Unified View of Credible Intervals
Abstract:
Posterior uncertainty is typically summarized as a credible interval, an interval in the parameter space that contains a fixed proportion—usually 95%—of the posterior’s support. For multivariate parameters, credible sets perform the same role. There are of course many potential 95% intervals from which to choose, yet even standard choices are rarely justified in any formal way. In this article we give a general method, focusing on the loss function that motivates an estimate—the Bayes rule—around which we construct a credible set. The set contains all points which, as estimates, would have minimally-worse expected loss than the Bayes rule: we call this excess expected loss “regret.” The approach can be used for any model and prior, and we show how it justifies all widely used choices of credible interval/set. Further examples show how it provides insights into more complex estimation problems. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 248-256
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2039764
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2039764
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:248-256
Template-Type: ReDIF-Article 1.0
Author-Name: Oliver Hines
Author-X-Name-First: Oliver
Author-X-Name-Last: Hines
Author-Name: Oliver Dukes
Author-X-Name-First: Oliver
Author-X-Name-Last: Dukes
Author-Name: Karla Diaz-Ordaz
Author-X-Name-First: Karla
Author-X-Name-Last: Diaz-Ordaz
Author-Name: Stijn Vansteelandt
Author-X-Name-First: Stijn
Author-X-Name-Last: Vansteelandt
Title: Demystifying Statistical Learning Based on Efficient Influence Functions
Abstract:
Evaluation of treatment effects and more general estimands is typically achieved via parametric modeling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g., statistical/machine learning) is commonly employed to reduce the risk of misspecification. Naïve use of such methods, however, delivers estimators whose bias may shrink too slowly with sample size for inferential methods to perform well, including those based on the bootstrap. Bias arises because standard data-adaptive methods are tuned toward minimal prediction error as opposed to, for example, minimal MSE in the estimator. This may cause excess variability that is difficult to acknowledge, due to the complexity of such strategies. Building on results from nonparametric statistics, targeted learning and debiased machine learning overcome these problems by constructing estimators using the estimand’s efficient influence function under the nonparametric model. These increasingly popular methodologies typically assume that the efficient influence function is given, or that the reader is familiar with its derivation. In this article, we focus on derivation of the efficient influence function and explain how it may be used to construct statistical/machine-learning-based estimators. We discuss the requisite conditions for these estimators to perform well and use diverse examples to convey the broad applicability of the theory.
Journal: The American Statistician
Pages: 292-304
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.2021984
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2021984
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:292-304
Template-Type: ReDIF-Article 1.0
Author-Name: Chanseok Park
Author-X-Name-First: Chanseok
Author-X-Name-Last: Park
Author-Name: Kun Gou
Author-X-Name-First: Kun
Author-X-Name-Last: Gou
Author-Name: Min Wang
Author-X-Name-First: Min
Author-X-Name-Last: Wang
Title: A Study on Estimating the Parameter of the Truncated Geometric Distribution
Abstract:
We consider the truncated geometric distribution and analyze the condition under which a nontrivial maximum likelihood (ML) estimator of the parameter p exists. Additionally, the uniqueness criterion of such an ML estimator is also investigated. Our results indicate that in order to ensure the existence of a nontrivial ML estimator, the sample mean should be smaller than the midpoint of the two boundary positions. Without such a condition, the ML estimator will only exist trivially at p = 0. Finally, we demonstrate that the same condition is also required for the existence of the method of moments estimator. Our results lead to a rigorous understanding of the two estimators and aid in the interpretation of experimental designs that incorporate the truncated geometric distribution.
Journal: The American Statistician
Pages: 257-261
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2034666
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2034666
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:257-261
Template-Type: ReDIF-Article 1.0
Author-Name: Emilija Perković
Author-X-Name-First: Emilija
Author-X-Name-Last: Perković
Title: Leadership in Statistics and Data Science: Planning for Inclusive Excellence
Journal: The American Statistician
Pages: 306-307
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2088201
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2088201
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:306-307
Template-Type: ReDIF-Article 1.0
Author-Name: Francis K. C. Hui
Author-X-Name-First: Francis K. C.
Author-X-Name-Last: Hui
Author-Name: Howard D. Bondell
Author-X-Name-First: Howard D.
Author-X-Name-Last: Bondell
Title: Spatial Confounding in Generalized Estimating Equations
Abstract:
Spatial confounding, where the inclusion of a spatial random effect introduces multicollinearity with spatially structured covariates, is a contentious and active area of research in spatial statistics. However, the majority of research into this topic has focused on the case of spatial mixed models. In this article, we demonstrate that spatial confounding can also arise in the setting of generalized estimating equations (GEEs). The phenomenon occurs when a spatially structured working correlation matrix is used, as it effectively induces a spatial effect which may exhibit collinearity with the covariates in the marginal mean. As a result, the GEE ends up estimating a so-called unpartitioned effect of the covariates. To overcome spatial confounding, we propose a restricted spatial working correlation matrix that leads the GEE to instead estimate a partitioned covariate effect, which additionally captures the portion of spatial variability in the response spanned by the column space of the covariates. We also examine the construction of sandwich-based standard errors, showing that the issue of efficiency is tied to whether the working correlation matrix aligns with the target effect of interest. We conclude by highlighting the need for practitioners to make clear the assumptions and target of interest when applying GEEs in a spatial setting, and not simply rely on the robustness property of GEEs to misspecification of the working correlation matrix.
Journal: The American Statistician
Pages: 238-247
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.2009372
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2009372
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:238-247
Template-Type: ReDIF-Article 1.0
Author-Name: Maren Hackenberg
Author-X-Name-First: Maren
Author-X-Name-Last: Hackenberg
Author-Name: Marlon Grodd
Author-X-Name-First: Marlon
Author-X-Name-Last: Grodd
Author-Name: Clemens Kreutz
Author-X-Name-First: Clemens
Author-X-Name-Last: Kreutz
Author-Name: Martina Fischer
Author-X-Name-First: Martina
Author-X-Name-Last: Fischer
Author-Name: Janina Esins
Author-X-Name-First: Janina
Author-X-Name-Last: Esins
Author-Name: Linus Grabenhenrich
Author-X-Name-First: Linus
Author-X-Name-Last: Grabenhenrich
Author-Name: Christian Karagiannidis
Author-X-Name-First: Christian
Author-X-Name-Last: Karagiannidis
Author-Name: Harald Binder
Author-X-Name-First: Harald
Author-X-Name-Last: Binder
Title: Using Differentiable Programming for Flexible Statistical Modeling
Abstract:
Differentiable programming has recently received much interest as a paradigm that facilitates taking gradients of computer programs. While the corresponding flexible gradient-based optimization approaches so far have been used predominantly for deep learning or enriching the latter with modeling components, we want to demonstrate that they can also be useful for statistical modeling per se, for example, for quick prototyping when classical maximum likelihood approaches are challenging or not feasible. In an application from a COVID-19 setting, we use differentiable programming to quickly build and optimize a flexible prediction model adapted to the data quality challenges at hand. Specifically, we develop a regression model, inspired by delay differential equations, that can bridge temporal gaps of observations in the central German registry of COVID-19 intensive care cases for predicting future demand. With this exemplary modeling challenge, we illustrate how differentiable programming can enable simple gradient-based optimization of the model by automatic differentiation. This allowed us to quickly prototype a model under time pressure that outperforms simpler benchmark models. We thus exemplify the potential of differentiable programming also outside deep learning applications to provide more options for flexible applied statistical modeling.
Journal: The American Statistician
Pages: 270-279
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.2002189
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2002189
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:270-279
Template-Type: ReDIF-Article 1.0
Author-Name: David A. Harville
Author-X-Name-First: David A.
Author-X-Name-Last: Harville
Title: Comment on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022)
Journal: The American Statistician
Pages: 308-309
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2074540
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2074540
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:308-309
Template-Type: ReDIF-Article 1.0
Author-Name: Katherine Allen-Moyer
Author-X-Name-First: Katherine
Author-X-Name-Last: Allen-Moyer
Author-Name: Jonathan Stallrich
Author-X-Name-First: Jonathan
Author-X-Name-Last: Stallrich
Title: Incorporating Minimum Variances into Weighted Optimality Criteria
Abstract:
Weighted optimality criteria allow an experimenter to express hierarchical interest across estimable functions through a concise weighting system. We show how such criteria can be implicitly influenced by the estimable functions’ minimum variances, leading to nonintuitive variance properties of the optimal designs. To address this, we propose a new optimality and evaluation approach that incorporates these minimum variances. A modified c-optimality criterion is introduced to calculate an estimable function’s minimum variance while requiring estimability of all other functions of interest. These minimum variances are then incorporated into a standardized weighted A-criterion that has an intuitive weighting system. We argue that optimal designs under this criterion tend to satisfy the conditions of a new design property we call weight adherence that sets appropriate expectations for how a given weighting system will influence variance properties. A practical, exploratory approach is then described for weighted optimal design generation and evaluation. Examples of the exploratory approach and weight adherence are provided for two types of factorial experiments.
Journal: The American Statistician
Pages: 262-269
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2021.1947375
File-URL: http://hdl.handle.net/10.1080/00031305.2021.1947375
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:262-269
Template-Type: ReDIF-Article 1.0
Author-Name: William E. Griffiths
Author-X-Name-First: William E.
Author-X-Name-Last: Griffiths
Author-Name: R. Carter Hill
Author-X-Name-First: R. Carter
Author-X-Name-Last: Hill
Title: Rejoinder to Harville (2022) and Christensen (2022) Comments on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022)
Journal: The American Statistician
Pages: 312-312
Issue: 3
Volume: 76
Year: 2022
Month: 7
X-DOI: 10.1080/00031305.2022.2074542
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2074542
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:3:p:312-312
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2066725_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Juxin Liu
Author-X-Name-First: Juxin
Author-X-Name-Last: Liu
Author-Name: Annshirley Afful
Author-X-Name-First: Annshirley
Author-X-Name-Last: Afful
Author-Name: Holly Mansell
Author-X-Name-First: Holly
Author-X-Name-Last: Mansell
Author-Name: Yanyuan Ma
Author-X-Name-First: Yanyuan
Author-X-Name-Last: Ma
Title: Bias Analysis for Misclassification Errors in both the Response Variable and Covariate
Abstract:
Much literature has focused on statistical inference for misclassified response variables or misclassified covariates. However, misclassification in both the response variable and the covariate has received very limited attention within applied fields and the statistics community. In situations where the response variable and the covariate are simultaneously subject to misclassification errors, an assumption of independent misclassification errors is often used for convenience without justification. This article aims to show the harmful consequences of inappropriate adjustment for joint misclassification errors. In particular, we focus on the wrong adjustment by ignoring the dependence between the misclassification process of the response variable and the covariate. In this article, the dependence of misclassification in both variables is characterized by covariance-type parameters. We extend the original definition of dependence parameters to a more general setting. We discover a single quantity that governs the dependence of the two misclassification processes. Moreover, we propose likelihood ratio tests to check the nondifferential/independent misclassification assumption in main study/internal validation study designs. Our simulation studies indicate that ignoring the dependent error structure can be even worse than ignoring all the misclassification errors when the validation data size is relatively small. The methodology is illustrated by a real data example.
Journal: The American Statistician
Pages: 353-362
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2066725
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2066725
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:353-362
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2063944_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Paul R. Rosenbaum
Author-X-Name-First: Paul R.
Author-X-Name-Last: Rosenbaum
Title: A New Transformation of Treated-Control Matched-Pair Differences for Graphical Display
Abstract:
A new transformation is proposed for treated-minus-control matched pair differences that leaves the center of their distribution untouched, but symmetrically and smoothly transforms and shortens the tails. In this way, the center of the distribution is interpretable, undistorted and uncompressed, yet outliers are clear and distinct along the periphery. The transformation of pair differences, y↦ϱ(y), is strictly increasing, continuous, differentiable and odd, ϱ(−y)=−ϱ(y), so its action in the extreme upper tail mirrors its action in the extreme lower tail. Moreover, the center of the distribution—typically 90% or 95% of the distribution—is not transformed, with ϱ(y)=y for −β≤y≤β, yet the nonlinear transformation of the tails is barely perceptible as it begins at ±β, in the sense that 1=ϱ′(β)=ϱ′(−β), where ϱ′(·) is the derivative of ϱ(·). The transformation is applied to an observational study of the effect of light daily alcohol consumption on the level of HDL cholesterol. The study has three control groups intended to address specific unmeasured biases; so, several types of pair differences require coordinated depiction focused on unmeasured bias, not outliers. An R package tailTransform implements the method, contains the data, and reproduces aspects of the graphs and data analysis.
Journal: The American Statistician
Pages: 346-352
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2063944
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2063944
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:346-352
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2126685_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Jie Cui
Author-X-Name-First: Jie
Author-X-Name-Last: Cui
Author-Name: Haoda Fu
Author-X-Name-First: Haoda
Author-X-Name-Last: Fu
Title: Statistical Issues in Drug Development, 3rd ed.
Journal: The American Statistician
Pages: 431-431
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2126685
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2126685
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:431-431
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2089232_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Mine Dogucu
Author-X-Name-First: Mine
Author-X-Name-Last: Dogucu
Author-Name: Jingchen Hu
Author-X-Name-First: Jingchen
Author-X-Name-Last: Hu
Title: The Current State of Undergraduate Bayesian Education and Recommendations for the Future
Abstract:
As a result of the increased emphasis on mis- and over-use of p-values in scientific research and the rise in popularity of Bayesian statistics, Bayesian education is becoming more important at the undergraduate level. With the advances in computing tools, Bayesian statistics is also becoming more accessible for undergraduates. This study focuses on analyzing Bayesian courses for undergraduates. We explored whether an undergraduate Bayesian course is offered in our sample of 152 high-ranking research universities and liberal arts colleges. For each identified Bayesian course, we examined how it fits into the institution’s undergraduate curricula, such as majors and prerequisites. Through a series of course syllabi analyses, we explored the topics covered and their popularity in these courses, and the adopted teaching and learning tools, such as software. This article presents our findings on the current practices of teaching full Bayesian courses at the undergraduate level. Based on our findings, we provide recommendations for programs that may consider offering Bayesian courses to their students.
Journal: The American Statistician
Pages: 405-413
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2089232
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2089232
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:405-413
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2126684_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Scott A. Roths
Author-X-Name-First: Scott A.
Author-X-Name-Last: Roths
Title: Probability, Statistics, and Data: A Fresh Approach Using R
Journal: The American Statistician
Pages: 430-430
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2126684
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2126684
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:430-430
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2107568_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Andrew J. Sage
Author-X-Name-First: Andrew J.
Author-X-Name-Last: Sage
Author-Name: Yang Liu
Author-X-Name-First: Yang
Author-X-Name-Last: Liu
Author-Name: Joe Sato
Author-X-Name-First: Joe
Author-X-Name-Last: Sato
Title: From Black Box to Shining Spotlight: Using Random Forest Prediction Intervals to Illuminate the Impact of Assumptions in Linear Regression
Abstract:
We introduce a pair of Shiny web applications that allow users to visualize random forest prediction intervals alongside those produced by linear regression models. The apps are designed to help undergraduate students deepen their understanding of the role that assumptions play in statistical modeling by comparing and contrasting intervals produced by regression models with those produced by more flexible algorithmic techniques. We describe the mechanics of each approach, illustrate the features of the apps, provide examples highlighting the insights students can gain through their use, and discuss our experience implementing them in an undergraduate class. We argue that, contrary to their reputation as a black box, random forests can be used as a spotlight, for educational purposes, illuminating the role of assumptions in regression models and their impact on the shape, width, and coverage rates of prediction intervals.
Journal: The American Statistician
Pages: 414-429
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2107568
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2107568
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:414-429
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2055644_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. (2022)
Journal: The American Statistician
Pages: 322-322
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2055644
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2055644
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:322-322
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2046159_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: James A. Hanley
Author-X-Name-First: James A.
Author-X-Name-Last: Hanley
Author-Name: Sahir Bhatnagar
Author-X-Name-First: Sahir
Author-X-Name-Last: Bhatnagar
Title: The “Poisson” Distribution: History, Reenactments, Adaptations
Abstract:
Although it is a widely used—and misused—discrete distribution, textbooks tend to give the history of the Poisson distribution short shrift, typically deriving it in the abstract as a limiting case of a binomial. The biological and physical scientists who independently derived it using space and time considerations and used it in their work are seldom mentioned. Nor are the difficulties of applying it to counts involving human activities/behavior. We (a) sketch the early history of the Poisson distribution, (b) illustrate principles of the Poisson distribution involving space and time using the original biological and physical applications, as well as modern multimedia reenactments of them, and (c) motivate count distributions accounting for extra-Poisson variation. The replayed historical applications can help today’s students, teachers and practitioners to see or hear what randomness looks or sounds like, to get practice in the practicalities of “counting statistics,” to distinguish situations where the pure Poisson distribution does and doesn’t hold—and to think about what one might do when it doesn’t.
Journal: The American Statistician
Pages: 363-371
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2046159
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2046159
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:363-371
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2076743_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Stephen Portnoy
Author-X-Name-First: Stephen
Author-X-Name-Last: Portnoy
Title: Linearity of Unbiased Linear Model Estimators
Abstract:
Best linear unbiased estimators (BLUE’s) are known to be optimal in many respects under normal assumptions. Since variance minimization doesn’t depend on normality and unbiasedness is often considered reasonable, many statisticians have felt that BLUE’s ought to perform relatively well in some generality. The result here considers the general linear model and shows that any measurable estimator that is unbiased over a moderately large family of distributions must be linear. Thus, imposing unbiasedness cannot offer any improvement over imposing linearity. The problem was suggested by Hansen, who showed that any estimator unbiased for nearly all error distributions (with finite covariance) must have a variance no smaller than that of the best linear estimator in some parametric subfamily. Specifically, the hypothesis of linearity can be dropped from the classical Gauss–Markov Theorem. This might suggest that the best unbiased estimator should provide superior performance, but the result here shows that the best unbiased regression estimator can be no better than the best linear estimator.
Journal: The American Statistician
Pages: 372-375
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2076743
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2076743
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:372-375
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2096695_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Bradley Lubich
Author-X-Name-First: Bradley
Author-X-Name-Last: Lubich
Author-Name: Daniel Jeske
Author-X-Name-First: Daniel
Author-X-Name-Last: Jeske
Author-Name: Weixin Yao
Author-X-Name-First: Weixin
Author-X-Name-Last: Yao
Title: Statistical Inference for Method of Moments Estimators of a Semi-Supervised Two-Component Mixture Model
Abstract:
A mixture of a distribution of responses from untreated patients and a shift of that distribution is a useful model for the responses from a group of treated patients. The mixture model accounts for the fact that not all the patients in the treated group will respond to the treatment and consequently their responses follow the same distribution as the responses from untreated patients. The treatment effect in this context consists of both the fraction of the treated patients that are responders and the magnitude of the shift in the distribution for the responders. In this article, we investigate asymptotic properties of method of moment estimators for the treatment effect based on a semi-supervised two-component mixture model. From these properties, we develop asymptotic confidence intervals and demonstrate their superior statistical inference performance compared to the computationally intensive bootstrap intervals and their Bias-Corrected versions.
Journal: The American Statistician
Pages: 376-383
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2096695
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2096695
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:376-383
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2041482_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Andee Kaplan
Author-X-Name-First: Andee
Author-X-Name-Last: Kaplan
Author-Name: Brenda Betancourt
Author-X-Name-First: Brenda
Author-X-Name-Last: Betancourt
Author-Name: Rebecca C. Steorts
Author-X-Name-First: Rebecca C.
Author-X-Name-Last: Steorts
Title: A Practical Approach to Proper Inference with Linked Data
Abstract:
Entity resolution (ER), comprising record linkage and deduplication, is the process of merging noisy databases in the absence of unique identifiers to remove duplicate entities. One major challenge of analysis with linked data is identifying a representative record among determined matches to pass to an inferential or predictive task, referred to as the downstream task. Additionally, incorporating uncertainty from ER in the downstream task is critical to ensure proper inference. To bridge the gap between ER and the downstream task in an analysis pipeline, we propose five methods to choose a representative (or canonical) record from linked data, referred to as canonicalization. Our methods are scalable in the number of records, appropriate in general data scenarios, and provide natural error propagation via a Bayesian canonicalization stage. The proposed methodology is evaluated on three simulated datasets and one application – determining the relationship between demographic information and party affiliation in voter registration data from the North Carolina State Board of Elections. We first perform Bayesian ER and evaluate our proposed methods for canonicalization before considering the downstream tasks of linear and logistic regression. Bayesian canonicalization methods are empirically shown to improve downstream inference in both settings through prediction and coverage.
Journal: The American Statistician
Pages: 384-393
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2041482
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2041482
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:384-393
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2006781_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Yao Li
Author-X-Name-First: Yao
Author-X-Name-Last: Li
Author-Name: Minhao Cheng
Author-X-Name-First: Minhao
Author-X-Name-Last: Cheng
Author-Name: Cho-Jui Hsieh
Author-X-Name-First: Cho-Jui
Author-X-Name-Last: Hsieh
Author-Name: Thomas C. M. Lee
Author-X-Name-First: Thomas C. M.
Author-X-Name-Last: Lee
Title: A Review of Adversarial Attack and Defense for Classification Methods
Abstract:
Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially Deep Neural Networks (DNNs), are vulnerable to adversarial examples; that is, examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. and Szegedy et al., much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against such examples. This article aims to introduce this topic and its latest developments to the statistical community, primarily focusing on the generation and guarding of adversarial examples. Computing codes (in Python and R) used in the numerical experiments are publicly available for readers to explore the surveyed methods. It is the hope of the authors that this article will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
Journal: The American Statistician
Pages: 329-345
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2021.2006781
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2006781
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:329-345
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2051604_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Matteo Bottai
Author-X-Name-First: Matteo
Author-X-Name-Last: Bottai
Author-Name: Taeho Kim
Author-X-Name-First: Taeho
Author-X-Name-Last: Kim
Author-Name: Benjamin Lieberman
Author-X-Name-First: Benjamin
Author-X-Name-Last: Lieberman
Author-Name: George Luta
Author-X-Name-First: George
Author-X-Name-Last: Luta
Author-Name: Edsel Peña
Author-X-Name-First: Edsel
Author-X-Name-Last: Peña
Title: On Optimal Correlation-Based Prediction
Abstract:
This note examines, at the population level, the approach of obtaining predictors h˜(X) of a random variable Y, given the joint distribution of (Y,X), by maximizing the mapping h↦κ(Y,h(X)) for a given correlation function κ(·,·). Commencing with Pearson’s correlation function, the class of such predictors is uncountably infinite. The least-squares predictor h* is the element of this class obtained by setting the expectation of h(X) equal to that of Y and the variance of h(X) equal to that of E(Y|X). Replacing the second condition with equality of the variances of Y and h(X), a natural requirement for some calibration problems, instead yields a unique predictor h** that attains the maximum value of Lin’s (1989) concordance correlation coefficient (CCC) with Y among all predictors. Since the CCC measures the degree of agreement, the new predictor h** is called the maximal agreement predictor. These predictors are illustrated for three special distributions: the multivariate normal distribution; the exponential distribution, conditional on covariates; and the Dirichlet distribution. The exponential distribution is relevant in survival analysis or in reliability settings, while the Dirichlet distribution is relevant for compositional data.
Journal: The American Statistician
Pages: 313-321
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2051604
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2051604
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:313-321
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2006780_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Bo Liu
Author-X-Name-First: Bo
Author-X-Name-Last: Liu
Author-Name: Jerome P. Reiter
Author-X-Name-First: Jerome P.
Author-X-Name-Last: Reiter
Title: Multiple Imputation Inference with Integer-Valued Point Estimates
Abstract:
We consider settings where an analyst of multiply imputed data desires an integer-valued point estimate and an associated interval estimate, for example, a count of the number of individuals with certain characteristics in a population. Even when the point estimate in each completed dataset is an integer, the multiple imputation point estimator, that is, the average of these completed-data estimators, is not guaranteed to be an integer. One natural approach is to round the standard multiple imputation point estimator to an integer. Another seemingly natural approach is to use the median of the completed-data point estimates (when they are integers). However, these two approaches have not been compared; indeed, methods for obtaining multiple imputation inferences associated with the median of the completed-data point estimates do not even exist. In this article, we evaluate and compare these two approaches. In doing so, we derive an estimator of the variance of the median-based multiple imputation point estimator, as well as a method for obtaining associated multiple imputation confidence intervals. Using simulation studies, we show that both methods can offer well-calibrated coverage rates and have similar repeated sampling properties, and hence are both useful for this analysis task.
Journal: The American Statistician
Pages: 323-328
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2021.2006780
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2006780
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:323-328
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2054859_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Per Johansson
Author-X-Name-First: Per
Author-X-Name-Last: Johansson
Author-Name: Mattias Nordin
Author-X-Name-First: Mattias
Author-X-Name-Last: Nordin
Title: Inference in Experiments Conditional on Observed Imbalances in Covariates
Abstract:
Double-blind randomized controlled trials are traditionally seen as the gold standard for causal inferences as the difference-in-means estimator is an unbiased estimator of the average treatment effect in the experiment. The fact that this estimator is unbiased over all possible randomizations does not, however, mean that any given estimate is close to the true treatment effect. Similarly, while predetermined covariates will be balanced between treatment and control groups on average, large imbalances may be observed in a given experiment and the researcher may therefore want to condition on such covariates using linear regression. This article studies the theoretical properties of both the difference-in-means and OLS estimators conditional on observed differences in covariates. By deriving the statistical properties of the conditional estimators, we can establish guidance for how to deal with covariate imbalances.
Journal: The American Statistician
Pages: 394-404
Issue: 4
Volume: 76
Year: 2022
Month: 10
X-DOI: 10.1080/00031305.2022.2054859
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2054859
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:76:y:2022:i:4:p:394-404
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2070279_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Yuxin Qin
Author-X-Name-First: Yuxin
Author-X-Name-Last: Qin
Author-Name: Heather Sasinowska
Author-X-Name-First: Heather
Author-X-Name-Last: Sasinowska
Author-Name: Lawrence Leemis
Author-X-Name-First: Lawrence
Author-X-Name-Last: Leemis
Title: The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator
Abstract:
Kaplan and Meier’s 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size and calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times.
Journal: The American Statistician
Pages: 102-110
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2070279
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2070279
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:102-110
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2028675_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Joris Mulder
Author-X-Name-First: Joris
Author-X-Name-Last: Mulder
Title: Bayesian Testing of Linear Versus Nonlinear Effects Using Gaussian Process Priors
Abstract:
A Bayes factor is proposed for testing whether the effect of a key predictor variable on a dependent variable is linear or nonlinear, possibly while controlling for certain covariates. The test can be used (i) in substantive research for assessing the nature of the relationship between certain variables based on scientific expectations, and (ii) for statistical model building to infer whether a (transformed) variable should be added as a linear or nonlinear predictor in a regression model. Under the nonlinear model, a Gaussian process prior is employed using a parameterization similar to Zellner’s g prior, resulting in a scale-invariant test. Unlike existing p-values, the proposed Bayes factor can be used for quantifying the relative evidence in the data in favor of linearity. Furthermore, the Bayes factor does not overestimate the evidence against the linear null model, resulting in more parsimonious models. An extension is proposed for Bayesian one-sided testing of whether a nonlinear effect is consistently positive, consistently negative, or neither. Applications are provided from various fields including social network research and education.
Journal: The American Statistician
Pages: 1-11
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2028675
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2028675
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:1-11
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2110938_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Alan D. Hutson
Author-X-Name-First: Alan D.
Author-X-Name-Last: Hutson
Author-Name: Han Yu
Author-X-Name-First: Han
Author-X-Name-Last: Yu
Title: The Sign Test, Paired Data, and Asymmetric Dependence: A Cautionary Tale
Abstract:
In the paired data setting, the sign test is often described in statistical textbooks as a test for comparing differences between the medians of two marginal distributions. There is an implicit assumption that the median of the differences is equivalent to the difference of the medians when employing the sign test in this fashion. We demonstrate, however, that given asymmetry in the bivariate distribution of the paired data, there are often scenarios where the median of the differences is not equal to the difference of the medians. Further, we show that these scenarios will lead to a false interpretation of the sign test for its intended use in the paired data setting. We illustrate the false-interpretation concept via theory, a simulation study, and a real-world example based on breast cancer RNA sequencing data obtained from the Cancer Genome Atlas (TCGA).
Journal: The American Statistician
Pages: 35-40
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2110938
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2110938
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:35-40
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2058611_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Vojtech Kejzlar
Author-X-Name-First: Vojtech
Author-X-Name-Last: Kejzlar
Author-Name: Shrijita Bhattacharya
Author-X-Name-First: Shrijita
Author-X-Name-Last: Bhattacharya
Author-Name: Mookyong Son
Author-X-Name-First: Mookyong
Author-X-Name-Last: Son
Author-Name: Tapabrata Maiti
Author-X-Name-First: Tapabrata
Author-X-Name-Last: Maiti
Title: Black Box Variational Bayesian Model Averaging
Abstract:
For many decades now, Bayesian Model Averaging (BMA) has been a popular framework to systematically account for model uncertainty that arises in situations when multiple competing models are available to describe the same or similar physical process. The implementation of this framework, however, comes with a multitude of practical challenges including posterior approximation via Markov chain Monte Carlo and numerical integration. We present a Variational Bayesian Inference approach to BMA as a viable alternative to the standard solutions which avoids many of the aforementioned pitfalls. The proposed method is “black box” in the sense that it can be readily applied to many models with little to no model-specific derivation. We illustrate the utility of our variational approach on a suite of examples and discuss all the necessary implementation details. Fully documented Python code with all the examples is provided as well.
Journal: The American Statistician
Pages: 85-96
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2058611
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2058611
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:85-96
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2046160_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Zifei Han
Author-X-Name-First: Zifei
Author-X-Name-Last: Han
Author-Name: Keying Ye
Author-X-Name-First: Keying
Author-X-Name-Last: Ye
Author-Name: Min Wang
Author-X-Name-First: Min
Author-X-Name-Last: Wang
Title: A Study on the Power Parameter in Power Prior Bayesian Analysis
Abstract:
The power prior and its variations have been proven to be a useful class of informative priors in Bayesian inference due to their flexibility in incorporating the historical information by raising the likelihood of the historical data to a fractional power δ. The derivation of the marginal likelihood based on the original power prior, and its variation, the normalized power prior, introduces a scaling factor C(δ) in the form of a prior predictive distribution with powered likelihood. In this article, we show that the scaling factor might be infinite for some positive δ with conventionally used initial priors, which would change the admissible set of the power parameter. This result seems to have been almost completely ignored in the literature. We then illustrate that such a phenomenon may jeopardize the posterior inference under the power priors when the initial prior of the model parameters is improper. The main findings of this article suggest that special attention should be paid when the suggested level of borrowing is close to 0, while the actual optimum might be below the suggested value. We use a normal linear model as an example for illustrative purposes.
Journal: The American Statistician
Pages: 12-19
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2046160
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2046160
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:12-19
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2141879_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Stan Lipovetsky
Author-X-Name-First: Stan
Author-X-Name-Last: Lipovetsky
Title: Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. (2022)
Journal: The American Statistician
Pages: 113-113
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2141879
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141879
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:113-113
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2160590_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: James O. Ramsay
Author-X-Name-First: James O.
Author-X-Name-Last: Ramsay
Title: Object Oriented Data Analysis
Journal: The American Statistician
Pages: 111-111
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2160590
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2160590
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:111-111
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2106305_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Alan Huang
Author-X-Name-First: Alan
Author-X-Name-Last: Huang
Title: On Arbitrarily Underdispersed Discrete Distributions
Abstract:
We survey a range of popular generalized count distributions, investigating which (if any) can be arbitrarily underdispersed, that is, its variance can be arbitrarily small compared to its mean. A philosophical implication is that some models failing this simple criterion should not be considered as “statistical models” according to McCullagh’s extendibility criterion. Four practical implications are also discussed: (i) functional independence of parameters, (ii) double generalized linear models, (iii) simulation of underdispersed counts, and (iv) severely underdispersed count regression. We suggest that all future generalizations of the Poisson distribution be tested against this key property.
Journal: The American Statistician
Pages: 29-34
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2106305
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2106305
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:29-34
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2026478_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Dan J. Spitzner
Author-X-Name-First: Dan J.
Author-X-Name-Last: Spitzner
Title: A Statistical Basis for Reporting Strength of Evidence as Pool Reduction
Abstract:
This article establishes a statistical basis for an evidence-reporting strategy that interprets strength of evidence in terms of a reduction in the size of a pool of relevant conceptual objects. The strategy is motivated by debates in forensic science, wherein the pool would consist of sources of forensic material. An advantage of using the pool-reduction strategy is that it highlights uncertainty that cannot be resolved by empirical considerations. It is shown mathematically to reflect a nonstandard formulation of a Bayes factor, and to extend for use in problems of general quantitative inference. A number of conventions are proposed for full effectiveness of the strategy’s implementation in practice.
Journal: The American Statistician
Pages: 62-71
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2026478
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2026478
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:62-71
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2058612_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Andrew Gelman
Author-X-Name-First: Andrew
Author-X-Name-Last: Gelman
Title: “Two Truths and a Lie” as a Class-Participation Activity
Abstract:
We adapt the social game “Two truths and a lie” to a classroom setting to give an activity that introduces principles of statistical measurement, uncertainty, prediction, and calibration, while giving students an opportunity to meet each other. We discuss how this activity can be used in a range of different statistics courses.
Journal: The American Statistician
Pages: 97-101
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2058612
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2058612
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:97-101
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2110939_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Hadrien Charvat
Author-X-Name-First: Hadrien
Author-X-Name-Last: Charvat
Title: Using the Lambert Function to Estimate Shared Frailty Models with a Normally Distributed Random Intercept
Abstract:
Shared frailty models, that is, hazard regression models for censored data including random effects acting multiplicatively on the hazard, are commonly used to analyze time-to-event data possessing a hierarchical structure. When the random effects are assumed to be normally distributed, the cluster-specific marginal likelihood has no closed-form expression. A powerful method for approximating such integrals is the adaptive Gauss-Hermite quadrature (AGHQ). However, this method requires the estimation of the mode of the integrand in the expression defining the cluster-specific marginal likelihood: it is generally obtained through a nested optimization at the cluster level for each evaluation of the likelihood function. In this work, we show that in the case of a parametric shared frailty model including a normal random intercept, the cluster-specific modes can be determined analytically by using the principal branch of the Lambert function, W0. Besides removing the need for the nested optimization procedure, it provides closed-form formulas for the gradient and Hessian of the approximated likelihood, making its maximization by Newton-type algorithms convenient and efficient. The Lambert-based AGHQ (LAGHQ) might be applied to other problems involving similar integrals, such as the normally distributed random intercept Poisson model and the computation of probabilities from a Poisson lognormal distribution.
Journal: The American Statistician
Pages: 41-50
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2110939
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2110939
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:41-50
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2160592_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Huan Wang
Author-X-Name-First: Huan
Author-X-Name-Last: Wang
Title: Quantitative Drug Safety and Benefit-Risk Evaluation: Practical and Cross-Disciplinary Approaches
Journal: The American Statistician
Pages: 111-112
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2160592
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2160592
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:111-112
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2050299_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Spencer Hansen
Author-X-Name-First: Spencer
Author-X-Name-Last: Hansen
Author-Name: Ken Rice
Author-X-Name-First: Ken
Author-X-Name-Last: Rice
Title: Coherent Tests for Interval Null Hypotheses
Abstract:
In a celebrated 1996 article, Schervish showed that, for testing interval null hypotheses, tests typically viewed as optimal can be logically incoherent. Specifically, one may fail to reject a specific interval null, but nevertheless—testing at the same level with the same data—reject a larger null, in which the original one is nested. This result has been used to argue against the widespread practice of viewing p-values as measures of evidence. In the current work we approach tests of interval nulls using simple Bayesian decision theory, and establish straightforward conditions that ensure coherence in Schervish’s sense. From these, we go on to establish novel frequentist criteria—different to Type I error rate—that, when controlled at fixed levels, give tests that are coherent in Schervish’s sense. The results suggest that exploring frequentist properties beyond the familiar Neyman–Pearson framework may ameliorate some of statistical testing’s well-known problems.
Journal: The American Statistician
Pages: 20-28
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2050299
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2050299
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:20-28
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2051605_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Rachael C. Aikens
Author-X-Name-First: Rachael C.
Author-X-Name-Last: Aikens
Author-Name: Michael Baiocchi
Author-X-Name-First: Michael
Author-X-Name-Last: Baiocchi
Title: Assignment-Control Plots: A Visual Companion for Causal Inference Study Design
Abstract:
An important step for any causal inference study design is understanding the distribution of the subjects in terms of measured baseline covariates. However, not all baseline variation is equally important. We propose a set of visualizations that reduce the space of measured covariates into two components of baseline variation important to the design of an observational causal inference study: a propensity score summarizing baseline variation associated with treatment assignment and a prognostic score summarizing baseline variation associated with the untreated potential outcome. These assignment-control plots and variations thereof visualize study design tradeoffs and illustrate core methodological concepts in causal inference. As a practical demonstration, we apply assignment-control plots to a hypothetical study of cardiothoracic surgery. To demonstrate how these plots can be used to illustrate nuanced concepts, we use them to visualize unmeasured confounding and to consider the relationship between propensity scores and instrumental variables. While the family of visualization tools for studies of causality is relatively sparse, simple visual tools can be an asset to education, application, and methods development.
Journal: The American Statistician
Pages: 72-84
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2022.2051605
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2051605
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:72-84
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2023633_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20220907T060133 git hash: 85d61bd949
Author-Name: Jeroen de Mast
Author-X-Name-First: Jeroen
Author-X-Name-Last: de Mast
Author-Name: Stefan H. Steiner
Author-X-Name-First: Stefan H.
Author-X-Name-Last: Steiner
Author-Name: Wim P. M. Nuijten
Author-X-Name-First: Wim P. M.
Author-X-Name-Last: Nuijten
Author-Name: Daniel Kapitan
Author-X-Name-First: Daniel
Author-X-Name-Last: Kapitan
Title: Analytical Problem Solving Based on Causal, Correlational and Deductive Models
Abstract:
Many approaches for solving problems in business and industry are based on analytics and statistical modeling. Analytical problem solving is driven by the modeling of relationships between dependent (Y) and independent (X) variables, and we discuss three frameworks for modeling such relationships: cause-and-effect modeling, popular in applied statistics and beyond, correlational predictive modeling, popular in machine learning, and deductive (first-principles) modeling, popular in business analytics and operations research. We aim to explain the differences between these types of models, and flesh out the implications of these differences for study design, for discovering potential X/Y relationships, and for the types of solution patterns that each type of modeling could support. We use our account to clarify the popular descriptive-diagnostic-predictive-prescriptive analytics framework, but extend it to offer a more complete model of the process of analytical problem solving, reflecting the essential differences between causal, correlational, and deductive models.
Journal: The American Statistician
Pages: 51-61
Issue: 1
Volume: 77
Year: 2023
Month: 1
X-DOI: 10.1080/00031305.2021.2023633
File-URL: http://hdl.handle.net/10.1080/00031305.2021.2023633
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:1:p:51-61
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2115552_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Michael Grabchak
Author-X-Name-First: Michael
Author-X-Name-Last: Grabchak
Title: How Do We Perform a Paired t-Test When We Don’t Know How to Pair?
Abstract:
We address the question of how to perform a paired t-test in situations where we do not know how to pair the data. Specifically, we discuss approaches for bounding the test statistic of the paired t-test in a way that allows us to recover the results of this test in some cases. We also discuss the relationship between the paired t-test and the independent samples t-test and what happens if we use the latter to approximate the former. Our results are informed by both theoretical results and a simulation study.
Journal: The American Statistician
Pages: 127-133
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2115552
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2115552
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:127-133
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2129787_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Mauricio Tec
Author-X-Name-First: Mauricio
Author-X-Name-Last: Tec
Author-Name: Yunshan Duan
Author-X-Name-First: Yunshan
Author-X-Name-Last: Duan
Author-Name: Peter Müller
Author-X-Name-First: Peter
Author-X-Name-Last: Müller
Title: A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning
Abstract:
Reinforcement learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has been an increasing interest in RL techniques for healthcare applications. We introduce two related applications as motivating examples. In both applications, the sequential nature of the decisions is restricted to sequential stopping. Rather than a comprehensive survey, the focus of the discussion is on solutions using standard tools for these two relatively simple sequential stopping problems. Both problems are inspired by adaptive clinical trial design. We use examples to explain the terminology and mathematical background that underlie each framework and map one to the other. The implementations and results illustrate the many similarities between RL and BSD. The results motivate the discussion of the potential strengths and limitations of each approach.
Journal: The American Statistician
Pages: 223-233
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2129787
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2129787
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:223-233
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2128421_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Jay Bartroff
Author-X-Name-First: Jay
Author-X-Name-Last: Bartroff
Author-Name: Gary Lorden
Author-X-Name-First: Gary
Author-X-Name-Last: Lorden
Author-Name: Lijia Wang
Author-X-Name-First: Lijia
Author-X-Name-Last: Wang
Title: Optimal and Fast Confidence Intervals for Hypergeometric Successes
Abstract:
We present an efficient method of calculating exact confidence intervals for the hypergeometric parameter representing the number of “successes,” or “special items,” in the population. The method inverts minimum-width acceptance intervals after shifting them to make their endpoints nondecreasing while preserving their level. The resulting set of confidence intervals achieves minimum possible average size, and even in comparison with confidence sets not required to be intervals it attains the minimum possible cardinality most of the time, and always within 1. The method compares favorably with existing methods not only in the size of the intervals but also in the time required to compute them. The available R package hyperMCI implements the proposed method.
Journal: The American Statistician
Pages: 151-159
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2128421
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2128421
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:151-159
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2127896_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Monnie McGee
Author-X-Name-First: Monnie
Author-X-Name-Last: McGee
Author-Name: Benjamin Williams
Author-X-Name-First: Benjamin
Author-X-Name-Last: Williams
Author-Name: Jacy Sparks
Author-X-Name-First: Jacy
Author-X-Name-Last: Sparks
Title: Athlete Recruitment and the Myth of the Sophomore Peak
Abstract:
Conventional wisdom dispensed by fans and coaches in the stands at almost any high school track meet suggests that female athletes, particularly distance runners, typically peak around 10th grade or earlier (15 years of age), while male athletes continuously improve. Given that universities in the United States typically recruit track and field athletes from high school teams, it is important to understand the age of peak performance at the high school level. Athletes are often recruited starting in their sophomore year of high school, and individuals develop at different rates during adolescence; however, individual development is usually not taken into account during recruitment. In this study, we curate data on event times for high school track and field athletes from 2011 to 2019 to determine the trajectory of fastest times for male and female athletes in the 200m, 400m, 800m, and 1600m races. We show, through visualizations and models, that for most athletes the sophomore peak is a myth: performance depends mostly on the individual athlete. That said, the trajectories cluster into four or five types, depending on the race distance. We explain the significance of these types for future recruitment.
Journal: The American Statistician
Pages: 182-191
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2127896
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2127896
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:182-191
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2131625_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Xin Xiong
Author-X-Name-First: Xin
Author-X-Name-Last: Xiong
Author-Name: Ivor Cribben
Author-X-Name-First: Ivor
Author-X-Name-Last: Cribben
Title: The State of Play of Reproducibility in Statistics: An Empirical Analysis
Abstract:
Reproducibility, the ability to reproduce the results of published papers or studies using their computer code and data, is a cornerstone of reliable scientific methodology. Studies whose results cannot be reproduced by the scientific community should be treated with caution. Over the past decade, the importance of reproducible research has been frequently stressed in a wide range of scientific journals such as Nature and Science and in international magazines such as The Economist. However, multiple studies have demonstrated that scientific results are often not reproducible across research areas such as psychology and medicine. Statistics, the science concerned with developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data, prides itself on its openness when it comes to sharing both computer code and data. In this article, we examine reproducibility in the field of statistics by attempting to reproduce the results in 93 papers published in prominent journals that use functional magnetic resonance imaging (fMRI) data during the 2010–2021 period. Overall, from both the computer code and the data perspective, we could reproduce the results in only 14 (15.1%) of the 93 examined papers; that is, those papers provided both executable computer code (or software) and the real fMRI data, and our results matched the results in the paper. Finally, we conclude with author-specific and journal-specific recommendations to improve research reproducibility in statistics.
Journal: The American Statistician
Pages: 115-126
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2131625
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2131625
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:115-126
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2128874_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Chris Rohlfs
Author-X-Name-First: Chris
Author-X-Name-Last: Rohlfs
Title: Forbidden Knowledge and Specialized Training: A Versatile Solution for the Two Main Sources of Overfitting in Linear Regression
Abstract:
Overfitting in linear regression is broken down into two main causes. First, the formula for the estimator includes “forbidden knowledge” about training observations’ residuals, and it loses this advantage when deployed out-of-sample. Second, the estimator has “specialized training” that makes it particularly capable of explaining movements in the predictors that are idiosyncratic to the training sample. An out-of-sample counterpart to the popular “leverage” measure of training observations’ importance is introduced. A new method is proposed to forecast out-of-sample fit at the time of deployment, when the values of the predictors are known but the true outcome variable is not. In Monte Carlo simulations and in an empirical application using MRI brain scans, the proposed estimator performs comparably to the Predicted Residual Error Sum of Squares (PRESS) for the average out-of-sample case and, unlike PRESS, also performs consistently across different test samples, even those that differ substantially from the training set.
Journal: The American Statistician
Pages: 160-168
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2128874
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2128874
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:160-168
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2198354_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Jae-Kwang Kim
Author-X-Name-First: Jae-Kwang
Author-X-Name-Last: Kim
Title: Graph Sampling
Journal: The American Statistician
Pages: 234-234
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2023.2198354
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2198354
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:234-234
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2087734_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Andee Kaplan
Author-X-Name-First: Andee
Author-X-Name-Last: Kaplan
Author-Name: Jacob Bien
Author-X-Name-First: Jacob
Author-X-Name-Last: Bien
Title: Interactive Exploration of Large Dendrograms with Prototypes
Abstract:
Hierarchical clustering is one of the standard methods taught for identifying and exploring the underlying structures that may be present within a dataset. Students are shown examples in which the dendrogram, a visual representation of the hierarchical clustering, reveals a clear clustering structure. However, in practice, data analysts today frequently encounter datasets whose large scale undermines the usefulness of the dendrogram as a visualization tool: densely packed branches obscure structure, and overlapping labels are impossible to read. In this article we present a new workflow for performing hierarchical clustering via the R package protoshiny, which aims to restore hierarchical clustering to its former role as an effective and versatile visualization tool. Our proposal leverages interactivity combined with the ability to label internal nodes in a dendrogram with a representative data point (called a prototype). After presenting the workflow, we provide three case studies to demonstrate its utility.
Journal: The American Statistician
Pages: 201-211
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2087734
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2087734
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:201-211
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2184423_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: The Editors
Title: Correction: Linearity of Unbiased Linear Model Estimators
Journal: The American Statistician
Pages: 237-237
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2023.2184423
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2184423
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:237-237
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2141858_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Roy Bower
Author-X-Name-First: Roy
Author-X-Name-Last: Bower
Author-Name: Justin Hager
Author-X-Name-First: Justin
Author-X-Name-Last: Hager
Author-Name: Chris Cherniakov
Author-X-Name-First: Chris
Author-X-Name-Last: Cherniakov
Author-Name: Samay Gupta
Author-X-Name-First: Samay
Author-X-Name-Last: Gupta
Author-Name: William Cipolli
Author-X-Name-First: William
Author-X-Name-Last: Cipolli
Title: A Case for Nonparametrics
Abstract:
We provide a case study for motivating and teaching nonparametric statistical inference alongside traditional parametric approaches. The case consists of analyses by Bracht et al., who use analysis of variance (ANOVA) to assess the applicability of human microfibrillar-associated protein 4 (MFAP4) as a biomarker for hepatic fibrosis in hepatitis C patients. We revisit their analyses and consider two nonparametric approaches: Mood’s median test and the Kruskal-Wallis test. We demonstrate how this case study enables instructors to discuss critical assumptions of parametric procedures while comparing and contrasting the results of multiple approaches. Interestingly, only one of the three approaches creates groupings that match the treatment recommendations of the European Association for the Study of the Liver (EASL). We provide guidance and resources to aid instructors in directing their students through this case study at various levels, including R code and novel R Shiny applications for conducting the analyses in the classroom.
Journal: The American Statistician
Pages: 212-219
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2141858
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141858
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:212-219
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2116109_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Mingya Long
Author-X-Name-First: Mingya
Author-X-Name-Last: Long
Author-Name: Zhengbang Li
Author-X-Name-First: Zhengbang
Author-X-Name-Last: Li
Author-Name: Wei Zhang
Author-X-Name-First: Wei
Author-X-Name-Last: Zhang
Author-Name: Qizhai Li
Author-X-Name-First: Qizhai
Author-X-Name-Last: Li
Title: The Cauchy Combination Test under Arbitrary Dependence Structures
Abstract:
Combining individual p-values to perform an overall test is a problem often encountered in statistical applications. The Cauchy combination test (CCT) (Journal of the American Statistical Association, 2020, 115, 393–402) is a powerful and computationally efficient approach to integrating individual p-values under arbitrary dependence structures for sparse signals. We revisit this test to additionally show that (i) the tail probability of the CCT can be approximated just as well when more relaxed assumptions are imposed on individual p-values compared to those on the original test statistics; (ii) such assumptions are satisfied by six popular copula distributions; and (iii) the power of the CCT is no less than that of the minimum p-value test when the number of p-values goes to infinity, under some regularity conditions. These findings are confirmed by both simulations and applications to two real datasets, thus further broadening the theory and applications of the CCT.
Journal: The American Statistician
Pages: 134-142
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2116109
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2116109
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:134-142
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2198355_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Junyong Park
Author-X-Name-First: Junyong
Author-X-Name-Last: Park
Title: Handbook of Multiple Comparisons
Journal: The American Statistician
Pages: 234-236
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2023.2198355
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2198355
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:234-236
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2105950_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Marcos Matabuena
Author-X-Name-First: Marcos
Author-X-Name-Last: Matabuena
Author-Name: Marta Karas
Author-X-Name-First: Marta
Author-X-Name-Last: Karas
Author-Name: Sherveen Riazati
Author-X-Name-First: Sherveen
Author-X-Name-Last: Riazati
Author-Name: Nick Caplan
Author-X-Name-First: Nick
Author-X-Name-Last: Caplan
Author-Name: Philip R. Hayes
Author-X-Name-First: Philip R.
Author-X-Name-Last: Hayes
Title: Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models
Abstract:
Modern wearable monitors and laboratory equipment allow the recording of high-frequency data that can be used to quantify human movement. However, data analysis approaches in these domains currently remain limited. This article proposes a new framework for analyzing biomechanical patterns in sport training data recorded across multiple training sessions using multilevel functional models. We apply the methods to subsecond-level data on knee location trajectories collected from 19 recreational runners during a medium-intensity continuous run (MICR) and a high-intensity interval training (HIIT) session, with multiple steps recorded in each participant-session. We estimate functional intra-class correlation coefficients to evaluate the reliability of recorded measurements across multiple sessions of the same training type. Furthermore, we obtain a vectorial representation of the three hierarchical levels of the data and visualize it in a low-dimensional space. Finally, we quantify the differences between genders and between the two training types using functional multilevel regression models that incorporate covariate information. We provide an overview of the relevant methods and make both the data and the R code for all analyses freely available on GitHub. This work can thus serve as a helpful reference for practitioners and a guide for a broader audience of researchers interested in modeling repeated functional measures at different resolution levels in the context of biomechanics and sports science applications.
Journal: The American Statistician
Pages: 169-181
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2105950
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2105950
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:169-181
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2182362_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Roy Bower
Author-X-Name-First: Roy
Author-X-Name-Last: Bower
Author-Name: William Cipolli
Author-X-Name-First: William
Author-X-Name-Last: Cipolli
Title: A Response to Rice and Lumley
Abstract:
We recognize the careful reading of and thought-provoking commentary on our work by Rice and Lumley. Further, we appreciate the opportunity to respond and clarify our position regarding the three presented concerns. We address these points in three sections below and conclude with final remarks in Section 4.
Journal: The American Statistician
Pages: 221-222
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2023.2182362
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2182362
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:221-222
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2077440_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Shijie Guo
Author-X-Name-First: Shijie
Author-X-Name-Last: Guo
Author-Name: Jingchen Hu
Author-X-Name-First: Jingchen
Author-X-Name-Last: Hu
Title: Data Privacy Protection and Utility Preservation through Bayesian Data Synthesis: A Case Study on Airbnb Listings
Abstract:
When releasing record-level data containing sensitive information to the public, the data disseminator is responsible for protecting the privacy of every record in the dataset while simultaneously preserving important features of the data for users’ analyses. These goals can be achieved by data synthesis, where confidential data are replaced with synthetic data simulated from statistical models estimated on the confidential data. In this article, we present a data synthesis case study in which synthetic values of price and the number of available days in a sample of the New York Airbnb Open Data are created for privacy protection. One sensitive variable, the number of available days of an Airbnb listing, has a large number of zero-valued records and is also truncated at both ends. We propose a zero-inflated truncated Poisson regression model for its synthesis. We use a sequential synthesis approach to further synthesize the sensitive price variable. The resulting synthetic data are evaluated for utility preservation and privacy protection, the latter in the form of disclosure risks. Furthermore, we propose methods to investigate how uncertainties in an intruder’s knowledge would influence the identification disclosure risks of the synthetic data. In particular, we explore several realistic scenarios of uncertainty in an intruder’s knowledge of available information and evaluate their impacts on the resulting identification disclosure risks.
Journal: The American Statistician
Pages: 192-200
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2077440
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2077440
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:192-200
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2127897_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Gang Han
Author-X-Name-First: Gang
Author-X-Name-Last: Han
Author-Name: Thomas J. Santner
Author-X-Name-First: Thomas J.
Author-X-Name-Last: Santner
Author-Name: Haiqun Lin
Author-X-Name-First: Haiqun
Author-X-Name-Last: Lin
Author-Name: Ao Yuan
Author-X-Name-First: Ao
Author-X-Name-Last: Yuan
Title: Bayesian-Frequentist Hybrid Inference in Applications with Small Sample Sizes
Abstract:
The Bayesian-frequentist hybrid model and its associated inference can combine the advantages of both Bayesian and frequentist methods and avoid their limitations. However, except for a few special cases in the existing literature, computation under the hybrid model is generally nontrivial or even unsolvable. This article develops a computation algorithm for hybrid inference under general loss functions. Three simulation examples demonstrate that hybrid inference can improve upon frequentist inference by incorporating valuable prior information, and can also improve upon Bayesian inference based on noninformative priors, where the latter leads to biased estimates for the small sample sizes used in inference. The proposed method is illustrated in applications including a biomechanical engineering design and a surgical treatment of acral lentiginous melanoma.
Journal: The American Statistician
Pages: 143-150
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2022.2127897
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2127897
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:143-150
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2172078_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Kenneth Rice
Author-X-Name-First: Kenneth
Author-X-Name-Last: Rice
Author-Name: Thomas Lumley
Author-X-Name-First: Thomas
Author-X-Name-Last: Lumley
Title: Comment on “A Case for Nonparametrics” by Bower et al.
Journal: The American Statistician
Pages: 220-220
Issue: 2
Volume: 77
Year: 2023
Month: 4
X-DOI: 10.1080/00031305.2023.2172078
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2172078
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:220-220
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2161637_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Jiaqi Gu
Author-X-Name-First: Jiaqi
Author-X-Name-Last: Gu
Author-Name: Yan Zhang
Author-X-Name-First: Yan
Author-X-Name-Last: Zhang
Author-Name: Guosheng Yin
Author-X-Name-First: Guosheng
Author-X-Name-Last: Yin
Title: Bayesian Log-Rank Test
Abstract:
Comparison of two survival curves is a fundamental problem in survival analysis. Although abundant frequentist methods have been developed for comparing survival functions, inference procedures from the Bayesian perspective are rather limited. In this article, we extract the quantity of interest from the classic log-rank test and propose its Bayesian counterpart. Monte Carlo methods, including a Gibbs sampler and a sequential importance sampling procedure, are developed to draw posterior samples of survival functions, and a decision rule for hypothesis testing is constructed for making inference. Via simulations and real data analysis, the proposed Bayesian log-rank test is shown to be asymptotically equivalent to the classic one when noninformative prior distributions are used, which provides a Bayesian interpretation of the log-rank test. When the correct prior information from historical data is used, the Bayesian log-rank test is shown to outperform the classic one in terms of power. R code to implement the Bayesian log-rank test is also provided, with step-by-step instructions.
Journal: The American Statistician
Pages: 292-300
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2161637
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2161637
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:292-300
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2197021_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Mark F. Schilling
Author-X-Name-First: Mark F.
Author-X-Name-Last: Schilling
Title: Bartroff, J., Lorden, G. and Wang, L. (2022), “Optimal and Fast Confidence Intervals for Hypergeometric Successes,” The American Statistician: Comment by Schilling
Journal: The American Statistician
Pages: 342-342
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2023.2197021
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2197021
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:342-342
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2156612_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Rameela Raman
Author-X-Name-First: Rameela
Author-X-Name-Last: Raman
Author-Name: Jessica Utts
Author-X-Name-First: Jessica
Author-X-Name-Last: Utts
Author-Name: Andrew I. Cohen
Author-X-Name-First: Andrew I.
Author-X-Name-Last: Cohen
Author-Name: Matthew J. Hayat
Author-X-Name-First: Matthew J.
Author-X-Name-Last: Hayat
Title: Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE)
Abstract:
Statistics education at all levels includes data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects of collecting those data. The changing statistics education landscape has seen instruction moving from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging. However, with technological advancement and the increasing availability of real-world datasets, it is necessary that instruction also integrate the ethical aspects of data sources, such as privacy, how the data were obtained, and whether participants consented to the use of their data. In this article, we propose incorporating ethics into established curricula by integrating it into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to constructively think about their ethical responsibilities when working with data.
Journal: The American Statistician
Pages: 323-330
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2156612
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2156612
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:323-330
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2139293_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Harlan Campbell
Author-X-Name-First: Harlan
Author-X-Name-Last: Campbell
Author-Name: Paul Gustafson
Author-X-Name-First: Paul
Author-X-Name-Last: Gustafson
Title: Bayes Factors and Posterior Estimation: Two Sides of the Very Same Coin
Abstract:
Recently, several researchers have claimed that conclusions obtained from a Bayes factor (or the posterior odds) may contradict those obtained from Bayesian posterior estimation. In this article, we wish to point out that no such “contradiction” exists if one is willing to consistently define one’s priors and posteriors. The key for congruence is that the (implied) prior model odds used for testing are the same as those used for estimation. Our recommendation is simple: If one reports a Bayes factor comparing two models, then one should also report posterior estimates which appropriately acknowledge the uncertainty with regards to which of the two models is correct.
Journal: The American Statistician
Pages: 248-258
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2139293
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2139293
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:248-258
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2163689_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Serveh Sharifi Far
Author-X-Name-First: Serveh
Author-X-Name-Last: Sharifi Far
Author-Name: Vanda Inácio
Author-X-Name-First: Vanda
Author-X-Name-Last: Inácio
Author-Name: Daniel Paulin
Author-X-Name-First: Daniel
Author-X-Name-Last: Paulin
Author-Name: Miguel de Carvalho
Author-X-Name-First: Miguel
Author-X-Name-Last: de Carvalho
Author-Name: Nicole H. Augustin
Author-X-Name-First: Nicole H.
Author-X-Name-Last: Augustin
Author-Name: Mike Allerhand
Author-X-Name-First: Mike
Author-X-Name-Last: Allerhand
Author-Name: Gail Robertson
Author-X-Name-First: Gail
Author-X-Name-Last: Robertson
Title: Consultancy Style Dissertations in Statistics and Data Science: Why and How
Abstract:
In this article, we chronicle the development of the consultancy style dissertations of the MSc program in Statistics with Data Science at the University of Edinburgh. These dissertations are based on real-world data problems, in joint supervision with industrial and academic partners, and aim to get all students in the cohort together to develop consultancy skills and best practices, and also to promote their statistical leadership. Aligning with recently published research on statistical education suggesting the need for a greater focus on statistical consultancy skills, we summarize our experience in organizing and supervising such consultancy style dissertations, describe the logistics of implementing them, and review the students’ and supervisors’ feedback about these dissertations.
Journal: The American Statistician
Pages: 331-339
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2163689
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2163689
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:331-339
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2143897_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Tom E. Hardwicke
Author-X-Name-First: Tom E.
Author-X-Name-Last: Hardwicke
Author-Name: Maia Salholz-Hillel
Author-X-Name-First: Maia
Author-X-Name-Last: Salholz-Hillel
Author-Name: Mario Malički
Author-X-Name-First: Mario
Author-X-Name-Last: Malički
Author-Name: Dénes Szűcs
Author-X-Name-First: Dénes
Author-X-Name-Last: Szűcs
Author-Name: Theiss Bendixen
Author-X-Name-First: Theiss
Author-X-Name-Last: Bendixen
Author-Name: John P. A. Ioannidis
Author-X-Name-First: John P. A.
Author-X-Name-Last: Ioannidis
Title: Statistical Guidance to Authors at Top-Ranked Journals across Scientific Disciplines
Abstract:
Scientific journals may counter the misuse, misreporting, and misinterpretation of statistics by providing guidance to authors. We described the nature and prevalence of statistical guidance at 15 journals (top-ranked by Impact Factor) in each of 22 scientific disciplines across five high-level domains (N = 330 journals). The frequency of statistical guidance varied across domains (Health & Life Sciences: 122/165 journals, 74%; Multidisciplinary: 9/15 journals, 60%; Social Sciences: 8/30 journals, 27%; Physical Sciences: 21/90 journals, 23%; Formal Sciences: 0/30 journals, 0%). In one discipline (Clinical Medicine), statistical guidance was provided by all examined journals and in two disciplines (Mathematics and Computer Science) no examined journals provided statistical guidance. Of the 160 journals providing statistical guidance, 93 had a dedicated statistics section in their author instructions. The most frequently mentioned topics were confidence intervals (90 journals) and p-values (88 journals). For six “hotly debated” topics (statistical significance, p-values, Bayesian statistics, effect sizes, confidence intervals, and sample size planning/justification) journals typically offered implicit or explicit endorsement and rarely provided opposition. The heterogeneity of statistical guidance provided by top-ranked journals within and between disciplines highlights a need for further research and debate about the role journals can play in improving statistical practice.
Journal: The American Statistician
Pages: 239-247
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2143897
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2143897
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:239-247
Template-Type: ReDIF-Article 1.0
Author-Name: Xun Li
Author-X-Name-First: Xun
Author-X-Name-Last: Li
Author-Name: Joyee Ghosh
Author-X-Name-First: Joyee
Author-X-Name-Last: Ghosh
Author-Name: Gabriele Villarini
Author-X-Name-First: Gabriele
Author-X-Name-Last: Villarini
Title: A Comparison of Bayesian Multivariate Versus Univariate Normal Regression Models for Prediction
Abstract:
In many moderate dimensional applications we have multiple response variables that are associated with a common set of predictors. When the main objective is prediction of the response variables, a natural question is: do multivariate regression models that accommodate dependency among the response variables improve prediction compared to their univariate counterparts? Note that in this article, by univariate versus multivariate regression models we refer to regression models with a single versus multiple response variables, respectively. We assume that under both scenarios, there are multiple covariates. Our question is motivated by an application in climate science, which involves the prediction of multiple metrics that measure the activity, intensity, severity, etc., of a hurricane season. Average sea surface temperatures (SSTs) during the hurricane season have been used as predictors for each of these metrics, in separate univariate regression models, in the literature. Since the true SSTs are yet to be observed during prediction, typically their forecasts from multiple climate models are used as predictors. Some climate models have a few missing values, so we develop Bayesian univariate/multivariate normal regression models that can handle missing covariates and variable selection uncertainty. Whether Bayesian multivariate normal regression models improve prediction compared to their univariate counterparts is not clear from the existing literature, and in this work we try to fill this gap.
Journal: The American Statistician
Pages: 304-312
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2087735
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2087735
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:304-312
Template-Type: ReDIF-Article 1.0
Author-Name: Marcelo dos Santos
Author-X-Name-First: Marcelo
Author-X-Name-Last: dos Santos
Author-Name: Fernanda De Bastiani
Author-X-Name-First: Fernanda
Author-X-Name-Last: De Bastiani
Author-Name: Miguel A. Uribe-Opazo
Author-X-Name-First: Miguel A.
Author-X-Name-Last: Uribe-Opazo
Author-Name: Manuel Galea
Author-X-Name-First: Manuel
Author-X-Name-Last: Galea
Title: Selection Criterion of Working Correlation Structure for Spatially Correlated Data
Abstract:
To obtain regression parameter estimates in generalized estimating equation modeling, whether for longitudinal or spatially correlated data, it is necessary to specify the structure of the working correlation matrix. The regression parameter estimates can be affected by the choice of this matrix. Within spatial statistics, the correlation matrix also influences how spatial variability is modeled. Therefore, this study proposes a new method for selecting a working matrix, based on conditioning the naive variance-covariance matrix. The method's performance is evaluated by an extensive simulation study, using the marginal distributions of normal, Poisson, and gamma for spatially correlated data. The correlation structure specification is based on semivariogram models, using the Wendland, Matérn, and spherical model families. The results reveal that, regarding the hit rates of the true spatial correlation structure of simulated data, the proposed criterion performed better than competing criteria: the quasi-likelihood under the independence model criterion (QIC), the correlation information criterion (CIC), and the Rotnitzky–Jewell criterion (RJC). The application of an appropriate spatial correlation structure selection was shown using the first-semester average rainfall data of 2021 in the state of Pernambuco, Brazil.
Journal: The American Statistician
Pages: 283-291
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2157874
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2157874
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:283-291
Template-Type: ReDIF-Article 1.0
Author-Name: Diana Rauwolf
Author-X-Name-First: Diana
Author-X-Name-Last: Rauwolf
Author-Name: Udo Kamps
Author-X-Name-First: Udo
Author-X-Name-Last: Kamps
Title: Quantifying the Inspection Paradox with Random Time
Abstract:
The well-known inspection paradox of renewal theory states that, in expectation, the inspection interval is larger than a common renewal interval, in general. For a random inspection time, which includes the deterministic case, and a delayed renewal process, representations of the expected length of an inspection interval and related inequalities in terms of covariances are shown. Datasets of eruption times of Beehive Geyser and Riverside Geyser in Yellowstone National Park, as well as several distributional examples, illustrate the findings.
Journal: The American Statistician
Pages: 274-282
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2151510
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2151510
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:274-282
Template-Type: ReDIF-Article 1.0
Author-Name: Noga Alon
Author-X-Name-First: Noga
Author-X-Name-Last: Alon
Author-Name: Yaakov Malinovsky
Author-X-Name-First: Yaakov
Author-X-Name-Last: Malinovsky
Title: Hitting a Prime in 2.43 Dice Rolls (On Average)
Abstract:
What is the number of rolls of fair six-sided dice until the first time the total sum of all rolls is a prime? We compute the expectation and the variance of this random variable up to an additive error of less than 10^(-4). This is a solution to a puzzle suggested by DasGupta in the Bulletin of the Institute of Mathematical Statistics, where the published solution is incomplete. The proof is simple, combining a basic dynamic programming algorithm with a quick Matlab computation and basic facts about the distribution of primes.
Journal: The American Statistician
Pages: 301-303
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2023.2179664
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2179664
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:301-303
Template-Type: ReDIF-Article 1.0
Author-Name: Daniel Vedensky
Author-X-Name-First: Daniel
Author-X-Name-Last: Vedensky
Author-Name: Paul A. Parker
Author-X-Name-First: Paul A.
Author-X-Name-Last: Parker
Author-Name: Scott H. Holan
Author-X-Name-First: Scott H.
Author-X-Name-Last: Holan
Title: A Look into the Problem of Preferential Sampling through the Lens of Survey Statistics
Abstract:
An evolving problem in the field of spatial and ecological statistics is that of preferential sampling, where biases may be present due to a relationship between sample data locations and a response of interest. This field of research bears a striking resemblance to the longstanding problem of informative sampling within survey methodology, although with some important distinctions. With the goal of promoting collaborative effort within and between these two problem domains, we make comparisons and contrasts between the two problem statements. Specifically, we review many of the solutions available to address each of these problems, noting the important differences in modeling techniques. Additionally, we construct a series of simulation studies to examine some of the methods available for preferential sampling, as well as a comparison analyzing heavy metal biomonitoring data.
Journal: The American Statistician
Pages: 313-322
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2143898
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2143898
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:313-322
Template-Type: ReDIF-Article 1.0
Author-Name: Jay Bartroff
Author-X-Name-First: Jay
Author-X-Name-Last: Bartroff
Author-Name: Gary Lorden
Author-X-Name-First: Gary
Author-X-Name-Last: Lorden
Author-Name: Lijia Wang
Author-X-Name-First: Lijia
Author-X-Name-Last: Wang
Title: Response to Comment by Schilling
Journal: The American Statistician
Pages: 343-344
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2023.2205455
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2205455
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:343-344
Template-Type: ReDIF-Article 1.0
Author-Name: Ding-Geng Chen
Author-X-Name-First: Ding-Geng
Author-X-Name-Last: Chen
Title: Event History Analysis with R, 2nd ed.
Journal: The American Statistician
Pages: 340-341
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2023.2230758
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2230758
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:340-341
Template-Type: ReDIF-Article 1.0
Author-Name: Jangsun Baek
Author-X-Name-First: Jangsun
Author-X-Name-Last: Baek
Author-Name: Jeong-Soo Park
Author-X-Name-First: Jeong-Soo
Author-X-Name-Last: Park
Title: Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach
Abstract:
One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables in clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. The mixture of log-linear models is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data for clustering are considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and satisfy asymptotic normality. The performance of the proposed approach is shown to be better in comparison with the conventional methods for both synthetic and real datasets.
Journal: The American Statistician
Pages: 259-273
Issue: 3
Volume: 77
Year: 2023
Month: 7
X-DOI: 10.1080/00031305.2022.2141856
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141856
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:3:p:259-273
Template-Type: ReDIF-Article 1.0
Author-Name: David I. Warton
Author-X-Name-First: David I.
Author-X-Name-Last: Warton
Title: Global Simulation Envelopes for Diagnostic Plots in Regression Models
Abstract:
Residual plots are often used to interrogate regression model assumptions, but interpreting them requires an understanding of how much sampling variation to expect when assumptions are satisfied. In this article, we propose constructing global envelopes around data (or around trends fitted to data) on residual plots, exploiting recent advances that enable construction of global envelopes around functions by simulation. While the proposed tools are primarily intended as a graphical aid, they can be interpreted as formal tests of model assumptions, which enables the study of their properties via simulation experiments. We considered three model scenarios—fitting a linear model, generalized linear model or generalized linear mixed model—and explored the power of global simulation envelope tests constructed around data on quantile-quantile plots, or around trend lines on residual versus fits plots or scale-location plots. Global envelope tests compared favorably to commonly used tests of assumptions at detecting violations of distributional and linearity assumptions. Freely available R software (ecostats::plotenvelope) enables application of these tools to any fitted model that has methods for the simulate, residuals and predict functions.
Journal: The American Statistician
Pages: 425-431
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2022.2139294
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2139294
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:425-431
Template-Type: ReDIF-Article 1.0
Author-Name: Marcos Matabuena
Author-X-Name-First: Marcos
Author-X-Name-Last: Matabuena
Author-Name: Paulo Félix
Author-X-Name-First: Paulo
Author-X-Name-Last: Félix
Author-Name: Marc Ditzhaus
Author-X-Name-First: Marc
Author-X-Name-Last: Ditzhaus
Author-Name: Juan Vidal
Author-X-Name-First: Juan
Author-X-Name-Last: Vidal
Author-Name: Francisco Gude
Author-X-Name-First: Francisco
Author-X-Name-Last: Gude
Title: Hypothesis Testing for Matched Pairs with Missing Data by Maximum Mean Discrepancy: An Application to Continuous Glucose Monitoring
Abstract:
A frequent problem in statistical science is how to properly handle missing data in matched paired observations. There is a large body of literature coping with the univariate case. Yet, the ongoing technological progress in measuring biological systems raises the need for addressing more complex data, for example, graphs, strings, and probability distributions. To fill this gap, this article proposes new estimators of the maximum mean discrepancy (MMD) to handle complex matched pairs with missing data. These estimators can detect differences in data distributions under different missingness assumptions. The validity of this approach is proven and further studied in an extensive simulation study, and statistical consistency results are provided. Data obtained from continuous glucose monitoring in a longitudinal population-based diabetes study are used to illustrate the application of this approach. By employing new distributional representations along with cluster analysis, new clinical criteria on how glucose changes vary at the distributional level over 5 years can be explored.
Journal: The American Statistician
Pages: 357-369
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2200512
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2200512
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:357-369
Template-Type: ReDIF-Article 1.0
Author-Name: Brady T. West
Author-X-Name-First: Brady T.
Author-X-Name-Last: West
Title: ANOVA and Mixed Models: A Short Introduction Using R
Journal: The American Statistician
Pages: 449-450
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2261817
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2261817
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:449-450
Template-Type: ReDIF-Article 1.0
Author-Name: Mehdi Moradi
Author-X-Name-First: Mehdi
Author-X-Name-Last: Moradi
Author-Name: Ottmar Cronie
Author-X-Name-First: Ottmar
Author-X-Name-Last: Cronie
Author-Name: Unai Pérez-Goya
Author-X-Name-First: Unai
Author-X-Name-Last: Pérez-Goya
Author-Name: Jorge Mateu
Author-X-Name-First: Jorge
Author-X-Name-Last: Mateu
Title: Hierarchical Spatio-Temporal Change-Point Detection
Abstract:
Detecting change-points in multivariate settings is usually carried out by analyzing all marginals either independently, via univariate methods, or jointly, through multivariate approaches. The former discards any inherent dependencies between different marginals and the latter may suffer from domination/masking among different change-points of distinct marginals. As a remedy, we propose an approach which groups marginals with similar temporal behaviors, and then performs group-wise multivariate change-point detection. Our approach groups marginals based on hierarchical clustering using distances which adjust for inherent dependencies. Through a simulation study we show that our approach, by preventing domination/masking, significantly enhances the general performance of the employed multivariate change-point detection method. Finally, we apply our approach to two datasets: (i) Land Surface Temperature in Spain, during the years 2000–2021, and (ii) The WikiLeaks Afghan War Diary data.
Journal: The American Statistician
Pages: 390-400
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2191670
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2191670
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:390-400
Template-Type: ReDIF-Article 1.0
Author-Name: Carlos Cinelli
Author-X-Name-First: Carlos
Author-X-Name-Last: Cinelli
Title: A First Course in Linear Model Theory, 2nd ed.
Journal: The American Statistician
Pages: 451-451
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2261819
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2261819
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:451-451
Template-Type: ReDIF-Article 1.0
Author-Name: Per Gösta Andersson
Author-X-Name-First: Per Gösta
Author-X-Name-Last: Andersson
Title: The Wald Confidence Interval for a Binomial p as an Illuminating “Bad” Example
Abstract:
When teaching, we usually not only demonstrate/discuss how a certain method works but, no less importantly, why it works. In contrast, the Wald confidence interval for a binomial p constitutes an excellent example of a case where we might be interested in why a method does not work. It has been in use for many years and, sadly enough, it is still to be found in many textbooks in mathematical statistics/statistics. The reasons for not using this interval are plentiful, and this fact gives us a good opportunity to discuss all of its deficiencies and draw conclusions which are of more general interest. We will mostly use already known results and bring them together in a manner appropriate to the teaching situation. The main purpose of this article is to show how to stimulate students to take a more critical view of simplifications and approximations. We primarily aim for master’s students who previously have been confronted with the Wilson (score) interval, but parts of the presentation may as well be suitable for bachelor’s students.
Journal: The American Statistician
Pages: 443-448
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2183257
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2183257
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:443-448
Template-Type: ReDIF-Article 1.0
Author-Name: Olivier Binette
Author-X-Name-First: Olivier
Author-X-Name-Last: Binette
Author-Name: Sokhna A. York
Author-X-Name-First: Sokhna A.
Author-X-Name-Last: York
Author-Name: Emma Hickerson
Author-X-Name-First: Emma
Author-X-Name-Last: Hickerson
Author-Name: Youngsoo Baek
Author-X-Name-First: Youngsoo
Author-X-Name-Last: Baek
Author-Name: Sarvo Madhavan
Author-X-Name-First: Sarvo
Author-X-Name-Last: Madhavan
Author-Name: Christina Jones
Author-X-Name-First: Christina
Author-X-Name-Last: Jones
Title: Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org
Abstract:
This article introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a public-use patent data exploration platform that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled—key characteristics that allow us to paint the first representative picture of PatentsView’s disambiguation performance. The results are used to inform PatentsView’s users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.
Journal: The American Statistician
Pages: 370-380
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2191664
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2191664
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:370-380
Template-Type: ReDIF-Article 1.0
Author-Name: Peng Wang
Author-X-Name-First: Peng
Author-X-Name-Last: Wang
Author-Name: Yilei Ma
Author-X-Name-First: Yilei
Author-X-Name-Last: Ma
Author-Name: Siqi Xu
Author-X-Name-First: Siqi
Author-X-Name-Last: Xu
Author-Name: Yi-Xin Wang
Author-X-Name-First: Yi-Xin
Author-X-Name-Last: Wang
Author-Name: Yu Zhang
Author-X-Name-First: Yu
Author-X-Name-Last: Zhang
Author-Name: Xiangyang Lou
Author-X-Name-First: Xiangyang
Author-X-Name-Last: Lou
Author-Name: Ming Li
Author-X-Name-First: Ming
Author-X-Name-Last: Li
Author-Name: Baolin Wu
Author-X-Name-First: Baolin
Author-X-Name-Last: Wu
Author-Name: Guimin Gao
Author-X-Name-First: Guimin
Author-X-Name-Last: Gao
Author-Name: Ping Yin
Author-X-Name-First: Ping
Author-X-Name-Last: Yin
Author-Name: Nianjun Liu
Author-X-Name-First: Nianjun
Author-X-Name-Last: Liu
Title: MOVER-R and Penalized MOVER-R Confidence Intervals for the Ratio of Two Quantities
Abstract:
Developing a confidence interval for the ratio of two quantities is an important task in statistics because of its omnipresence in real world applications. For such a problem, the MOVER-R (method of variance recovery for the ratio) technique, which is based on the recovery of variance estimates from confidence limits of the numerator and the denominator separately, was proposed as a useful and efficient approach. However, this method implicitly assumes that the confidence interval for the denominator never includes zero, which might be violated in practice. In this article, we first use a new framework to derive the MOVER-R confidence interval, which does not require the above assumption and covers the whole parameter space. We find that MOVER-R can produce an unbounded confidence interval, just like the well-known Fieller method. To overcome this issue, we further propose the penalized MOVER-R. We prove that the new method differs from MOVER-R only at the second order. It, however, always gives a bounded and analytic confidence interval. Through simulation studies and a real data application, we show that the penalized MOVER-R generally provides a better confidence interval than MOVER-R in terms of controlling the coverage probability and the median width.
Journal: The American Statistician
Pages: 381-389
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2173294
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2173294
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:381-389
Template-Type: ReDIF-Article 1.0
Author-Name: Xavier Puig
Author-X-Name-First: Xavier
Author-X-Name-Last: Puig
Author-Name: Josep Ginebra
Author-X-Name-First: Josep
Author-X-Name-Last: Ginebra
Title: Mapping Life Expectancy Loss in Barcelona in 2020
Abstract:
We use a Bayesian spatio-temporal model, first to smooth small-area initial life expectancy estimates in Barcelona for 2020, and second to predict what small-area life expectancy would have been in 2020 in the absence of covid-19 using mortality data from 2007 to 2019. This allows us to estimate and map the small-area life expectancy loss, which can be used to assess how the impact of covid-19 varies spatially, and to explore whether that loss relates to underlying factors, such as population density, educational level, or proportion of older individuals living alone. We find that the small-area life expectancy losses for men and for women have similar distributions, and are spatially uncorrelated but positively correlated with population density and among themselves. On average, we estimate that the life expectancy loss in Barcelona in 2020 was 2.01 years for men, falling back to 2011 levels, and 2.11 years for women, falling back to 2006 levels.
Journal: The American Statistician
Pages: 417-424
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2197022
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2197022
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:417-424
Template-Type: ReDIF-Article 1.0
Author-Name: Jan Graffelman
Author-X-Name-First: Jan
Author-X-Name-Last: Graffelman
Author-Name: Jan de Leeuw
Author-X-Name-First: Jan
Author-X-Name-Last: de Leeuw
Title: Improved Approximation and Visualization of the Correlation Matrix
Abstract:
The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, a comparison of the different procedures is presented with the use of an example dataset, and an improved representation with better fit is proposed. Principal component analysis is widely used for making pictures of correlation structure, though as shown a weighted alternating least squares approach that avoids the fitting of the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor for principal component analysis, in particular if the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix, if the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, and this is seen to lead to further improved approximation of the correlation matrix.
Journal: The American Statistician
Pages: 432-442
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2186952
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2186952
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:432-442
Template-Type: ReDIF-Article 1.0
Author-Name: Davy Paindaveine
Author-X-Name-First: Davy
Author-X-Name-Last: Paindaveine
Author-Name: Philippe Spindel
Author-X-Name-First: Philippe
Author-X-Name-Last: Spindel
Title: Revisiting the Name Variant of the Two-Children Problem
Abstract:
Initially proposed by Martin Gardner in the 1950s, the famous two-children problem is often presented as a paradox in probability theory. A relatively recent variant of this paradox states that, while in a two-children family for which at least one child is a girl, the probability that the other child is a boy is 2/3, this probability becomes 1/2 if the first name of the girl is disclosed (provided that two sisters may not be given the same first name). We revisit this variant of the problem and show that, if one adopts a natural model for the way first names are given to girls, then the probability that the other child is a boy may take any value in (0, 2/3). By exploiting the concept of Schur-concavity, we study how this probability depends on model parameters.
Journal: The American Statistician
Pages: 401-405
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2173293
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2173293
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:401-405
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2261818_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: P. Richard Hahn
Author-X-Name-First: P. Richard
Author-X-Name-Last: Hahn
Title: Bayesian Modeling and Computation in Python
Journal: The American Statistician
Pages: 450-451
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2261818
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2261818
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:450-451
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2141857_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Marius Hofert
Author-X-Name-First: Marius
Author-X-Name-Last: Hofert
Author-Name: Avinash Prasad
Author-X-Name-First: Avinash
Author-X-Name-Last: Prasad
Author-Name: Mu Zhu
Author-X-Name-First: Mu
Author-X-Name-Last: Zhu
Title: RafterNet: Probabilistic Predictions in Multi-Response Regression
Abstract:
A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as a novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, corresponding empirical marginal residual distributions, and a generative neural network is referred to as RafterNet. Multiple datasets serve as examples to demonstrate the flexibility of the approach and its impact for making probabilistic forecasts.
Journal: The American Statistician
Pages: 406-416
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2022.2141857
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2141857
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:406-416
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2203177_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20230119T200553 git hash: 724830af20
Author-Name: Sandra Siegfried
Author-X-Name-First: Sandra
Author-X-Name-Last: Siegfried
Author-Name: Lucas Kook
Author-X-Name-First: Lucas
Author-X-Name-Last: Kook
Author-Name: Torsten Hothorn
Author-X-Name-First: Torsten
Author-X-Name-Last: Hothorn
Title: Distribution-Free Location-Scale Regression
Abstract:
We introduce a generalized additive model for location, scale, and shape (GAMLSS) next of kin aiming at distribution-free and parsimonious regression modeling for arbitrary outcomes. We replace the strict parametric distribution formulating such a model by a transformation function, which in turn is estimated from data. Doing so not only makes the model distribution-free but also allows us to limit the number of linear or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for continuous, discrete, and randomly censored observations, along with corresponding score functions. A plethora of existing algorithms is leveraged for model estimation, including constrained maximum likelihood, the original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models is closely connected to model selection. We propose the application of a novel best subset selection procedure to achieve especially simple ways of interpretation. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, nonlinear ordinal regression, and growth curves. All analyses are reproducible with the help of the tram add-on package to the R system for statistical computing and graphics.
Journal: The American Statistician
Pages: 345-356
Issue: 4
Volume: 77
Year: 2023
Month: 10
X-DOI: 10.1080/00031305.2023.2203177
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2203177
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:77:y:2023:i:4:p:345-356
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2244542_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Joel E. Cohen
Author-X-Name-First: Joel E.
Author-X-Name-Last: Cohen
Title: First-Passage Times for Random Partial Sums: Yadrenko’s Model for e and Beyond
Abstract:
M. I. Yadrenko discovered that the expectation of the minimum number N1 of independent and identically distributed uniform random variables on (0, 1) that have to be added to exceed 1 is e. For any threshold a > 0, K. G. Russell found the distribution, mean, and variance of the minimum number Na of independent and identically distributed uniform random summands required to exceed a. Here we calculate the distribution and moments of Na when the summands obey the negative exponential and Lévy distributions. The Lévy distribution has infinite mean. We compare these results with the results of Yadrenko and Russell for uniform random summands to see how the expected first-passage time E(Na), a > 0, and other moments of Na depend on the distribution of the summand.
Journal: The American Statistician
Pages: 111-114
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2244542
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2244542
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:111-114
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2216252_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Stijn Hawinkel
Author-X-Name-First: Stijn
Author-X-Name-Last: Hawinkel
Author-Name: Willem Waegeman
Author-X-Name-First: Willem
Author-X-Name-Last: Waegeman
Author-Name: Steven Maere
Author-X-Name-First: Steven
Author-X-Name-Last: Maere
Title: Out-of-Sample R2: Estimation and Inference
Abstract:
Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample R2, which is easy to interpret and to compare across different outcome variables. As opposed to in-sample R2, out-of-sample R2 has not been well defined and the variability of out-of-sample R̂2 has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of predictability of different outcome variables. Here we explicitly define out-of-sample R2 as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on uncertainty of data splitting estimates to provide a standard error for R̂2. The performance of the estimators for R2 and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data. Our method is available in the R-package oosse.
Journal: The American Statistician
Pages: 15-25
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2216252
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216252
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:15-25
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2223582_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Preston Biro
Author-X-Name-First: Preston
Author-X-Name-Last: Biro
Author-Name: Stephen G. Walker
Author-X-Name-First: Stephen G.
Author-X-Name-Last: Walker
Title: Play Call Strategies and Modeling for Target Outcomes in Football
Abstract:
This article considers one-off actions for a football coach who is asking for a specific outcome from a play. This will be in the form of a minimum gain in yards, usually in order to gain a first down. Using a random utility model approach we propose the play to be called is the one which maximizes the probability of the desired outcome. We specifically focus on pass plays, which requires the modeling of outcomes in terms of yards gained, for which we use the family of generalized gamma distributions. The data and results relate to the Fall 2021 Presbyterian College football team, in which we leverage specific information pertaining to the offensive playbook.
Journal: The American Statistician
Pages: 66-75
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2223582
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2223582
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:66-75
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2216253_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Larry Han
Author-X-Name-First: Larry
Author-X-Name-Last: Han
Author-Name: Andrea Arfè
Author-X-Name-First: Andrea
Author-X-Name-Last: Arfè
Author-Name: Lorenzo Trippa
Author-X-Name-First: Lorenzo
Author-X-Name-Last: Trippa
Title: Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics
Abstract:
The use of simulation-based sensitivity analyses is fundamental for evaluating and comparing candidate designs of future clinical trials. In this context, sensitivity analyses are especially useful to assess the dependence of important design operating characteristics with respect to various unknown parameters. Typical examples of operating characteristics include the likelihood of detecting treatment effects and the average study duration, which depend on parameters that are unknown until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios and (ii) the list of operating characteristics of interest. We propose a new approach for choosing the set of scenarios to be included in a sensitivity analysis. We maximize a utility criterion that formalizes whether a specific set of sensitivity scenarios is adequate to summarize how the operating characteristics of the trial design vary across plausible values of the unknown parameters. Then, we use optimization techniques to select the best set of simulation scenarios (according to the criteria specified by the investigator) to exemplify the operating characteristics of the trial design. We illustrate our proposal in three trial designs. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 76-87
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2216253
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216253
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:76-87
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2216239_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Samuel Pawel
Author-X-Name-First: Samuel
Author-X-Name-Last: Pawel
Author-Name: Alexander Ly
Author-X-Name-First: Alexander
Author-X-Name-Last: Ly
Author-Name: Eric-Jan Wagenmakers
Author-X-Name-First: Eric-Jan
Author-X-Name-Last: Wagenmakers
Title: Evidential Calibration of Confidence Intervals
Abstract:
We present a novel and easy-to-use method for calibrating error-rate based confidence intervals to evidence-based support intervals. Support intervals are obtained from inverting Bayes factors based on a parameter estimate and its standard error. A k support interval can be interpreted as “the observed data are at least k times more likely under the included parameter values than under a specified alternative.” Support intervals depend on the specification of prior distributions for the parameter under the alternative, and we present several types that allow different forms of external knowledge to be encoded. We also show how prior specification can to some extent be avoided by considering a class of prior distributions and then computing so-called minimum support intervals which, for a given class of priors, have a one-to-one mapping with confidence intervals. We also illustrate how the sample size of a future study can be determined based on the concept of support. Finally, we show how the bound for the Type I error rate of Bayes factors leads to a bound for the coverage of support intervals. An application to data from a clinical trial illustrates how support intervals can lead to inferences that are both intuitive and informative.
Journal: The American Statistician
Pages: 47-57
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2216239
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216239
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:47-57
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2303414_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Skevi Michael
Author-X-Name-First: Skevi
Author-X-Name-Last: Michael
Title: Introduction to Stochastic Finance with Market Examples, 2nd ed.
Journal: The American Statistician
Pages: 129-130
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2024.2303414
File-URL: http://hdl.handle.net/10.1080/00031305.2024.2303414
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:129-130
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2164054_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: David Rügamer
Author-X-Name-First: David
Author-X-Name-Last: Rügamer
Author-Name: Chris Kolb
Author-X-Name-First: Chris
Author-X-Name-Last: Kolb
Author-Name: Nadja Klein
Author-X-Name-First: Nadja
Author-X-Name-Last: Klein
Title: Semi-Structured Distributional Regression
Abstract:
Combining additive models and neural networks makes it possible to broaden the scope of statistical regression while extending deep learning-based approaches with interpretable structured additive predictors. Existing attempts to unite the two modeling approaches are, however, limited to very specific combinations and, more importantly, involve an identifiability issue. As a consequence, interpretability and stable estimation are typically lost. We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture. To overcome the inherent identifiability issues between different model parts, we construct an orthogonalization cell that projects the deep neural network into the orthogonal complement of the statistical model predictor. This enables proper estimation of structured model parts and thereby interpretability. We demonstrate the framework’s efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications.
Journal: The American Statistician
Pages: 88-99
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2022.2164054
File-URL: http://hdl.handle.net/10.1080/00031305.2022.2164054
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:88-99
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2282631_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Jessica Allen
Author-X-Name-First: Jessica
Author-X-Name-Last: Allen
Author-Name: Ting Wang
Author-X-Name-First: Ting
Author-X-Name-Last: Wang
Title: Hidden Markov Models for Low-Frequency Earthquake Recurrence
Abstract:
Low-frequency earthquakes (LFEs) are small magnitude earthquakes with frequencies of 1–10 Hertz which often occur in overlapping sequence forming persistent seismic tremors. They provide insights into large earthquake processes along plate boundaries. LFEs occur stochastically in time, often forming temporally recurring clusters. The occurrence times are typically modeled using point processes and their intensity functions. We demonstrate how to use hidden Markov models coupled with visualization techniques to model inter-arrival times directly, classify LFE occurrence patterns along the San Andreas Fault, and perform model selection. We highlight two subsystems of LFE activity corresponding to periods of alternating episodic and quiescent behavior.
Journal: The American Statistician
Pages: 100-110
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2282631
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2282631
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:100-110
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2226184_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Rolf Larsson
Author-X-Name-First: Rolf
Author-X-Name-Last: Larsson
Title: Confidence Distributions for the Autoregressive Parameter
Abstract:
The notion of confidence distributions is applied to inference about the parameter in a simple autoregressive model, allowing the parameter to take the value one. This makes it possible to compare to asymptotic approximations in both the stationary and the nonstationary cases at the same time. The main point, however, is to compare to a Bayesian analysis of the same problem. A noninformative prior for a parameter, in the sense of Jeffreys, is given as the ratio of the confidence density and the likelihood. In this way, the similarity between the confidence and noninformative Bayesian frameworks is exploited. It is shown that, in the stationary case, the induced prior is asymptotically flat. However, if a unit parameter is allowed, the induced prior must have a spike of some size at one. Simulation studies and two empirical examples illustrate the ideas.
Journal: The American Statistician
Pages: 58-65
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2226184
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2226184
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:58-65
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2277156_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Ronald Christensen
Author-X-Name-First: Ronald
Author-X-Name-Last: Christensen
Title: Comment on “Forbidden Knowledge and Specialized Training: A Versatile Solution for the Two Main Sources of Overfitting in Linear Regression,” by Rohlfs (2023)
Journal: The American Statistician
Pages: 131-133
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2277156
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2277156
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:131-133
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2304534_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: The Editors
Title: The American Statistician 2023 Associate Editors
Journal: The American Statistician
Pages: i-i
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2024.2304534
File-URL: http://hdl.handle.net/10.1080/00031305.2024.2304534
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:i-i
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2192746_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Albert Vexler
Author-X-Name-First: Albert
Author-X-Name-Last: Vexler
Author-Name: Alan D. Hutson
Author-X-Name-First: Alan D.
Author-X-Name-Last: Hutson
Title: A Characterization of Most (More) Powerful Test Statistics with Simple Nonparametric Applications
Abstract:
Data-driven most powerful tests are statistical hypothesis decision-making tools that deliver the greatest power against a fixed null hypothesis among all corresponding data-based tests of a given size. When the underlying data distributions are known, the likelihood ratio principle can be applied to conduct most powerful tests. Reversing this notion, we consider the following questions. (a) Assuming a test statistic, say T, is given, how can we transform T to improve the power of the test? (b) Can T be used to generate the most powerful test? (c) How does one compare test statistics with respect to an attribute of the desired most powerful decision-making procedure? To examine these questions, we propose one-to-one mapping of the term “most powerful” to the distribution properties of a given test statistic via matching characterization. This form of characterization has practical applicability and aligns well with the general principle of sufficiency. Findings indicate that to improve a given test, we can employ relevant ancillary statistics that do not have changes in their distributions with respect to tested hypotheses. As an example, the present method is illustrated by modifying the usual t-test under nonparametric settings. Numerical studies based on generated data and a real-data set confirm that the proposed approach can be useful in practice.
Journal: The American Statistician
Pages: 36-46
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2192746
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2192746
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:36-46
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2199800_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Johannes Bracher
Author-X-Name-First: Johannes
Author-X-Name-Last: Bracher
Author-Name: Nils Koster
Author-X-Name-First: Nils
Author-X-Name-Last: Koster
Author-Name: Fabian Krüger
Author-X-Name-First: Fabian
Author-X-Name-Last: Krüger
Author-Name: Sebastian Lerch
Author-X-Name-First: Sebastian
Author-X-Name-Last: Lerch
Title: Learning to Forecast: The Probabilistic Time Series Forecasting Challenge
Abstract:
We report on a course project in which students submit weekly probabilistic forecasts of two weather variables and one financial variable. This real-time format allows students to engage in practical forecasting, which requires a diverse set of skills in data science and applied statistics. We describe the context and aims of the course, and discuss design parameters like the selection of target variables, the forecast submission process, the evaluation of forecast performance, and the feedback provided to students. Furthermore, we describe empirical properties of students’ probabilistic forecasts, as well as some lessons learned on our part.
Journal: The American Statistician
Pages: 115-127
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2199800
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2199800
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:115-127
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2302792_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Maria Francesca Marino
Author-X-Name-First: Maria Francesca
Author-X-Name-Last: Marino
Title: Applied Linear Regression for Longitudinal Data: With an Emphasis on Missing Observations
Journal: The American Statistician
Pages: 128-129
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2024.2302792
File-URL: http://hdl.handle.net/10.1080/00031305.2024.2302792
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:128-129
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2216247_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Biao Zhang
Author-X-Name-First: Biao
Author-X-Name-Last: Zhang
Title: Inverse Probability Weighting Estimation in Completely Randomized Experiments
Abstract:
In addition to treatment assignments and observed outcomes, covariate information is often available prior to randomization in completely randomized experiments that compare an active treatment versus control. The analysis of covariance (ANCOVA) method is commonly applied to adjust for baseline covariates in order to improve precision. We focus on making propensity score-based adjustment to covariates under the completely randomized design in a finite population of experimental units with two treatment groups. We study inverse probability weighting (IPW) estimation of the finite-population average treatment effect for a general class of working propensity score models, which includes generalized linear models for binary data. We provide randomization-based asymptotic analysis of the propensity score approach and explore the finite-population asymptotic behaviors of two IPW estimators of the average treatment effect. We identify a condition under which propensity score-based covariate adjustment is asymptotically equivalent to an ANCOVA-based covariate adjustment and improves precision compared with a simple unadjusted comparison between treatment and control arms. In particular, when the working propensity score is fitted by a generalized linear model for binary data with an intercept term, the asymptotic variance of the IPW estimators is the same for any link function, including identity link, logit link, probit link, and complementary log-log link. We demonstrate these methods using an HIV clinical trial and a post-traumatic stress disorder study. Finally, we present a simulation study comparing the finite-sample performance of IPW and other methods for both continuous and binary outcomes. Supplementary materials for this article are available online.
Journal: The American Statistician
Pages: 26-35
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2216247
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2216247
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:26-35
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2249522_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20231214T103247 git hash: d7a2cb0857
Author-Name: Matthew Sainsbury-Dale
Author-X-Name-First: Matthew
Author-X-Name-Last: Sainsbury-Dale
Author-Name: Andrew Zammit-Mangion
Author-X-Name-First: Andrew
Author-X-Name-Last: Zammit-Mangion
Author-Name: Raphaël Huser
Author-X-Name-First: Raphaël
Author-X-Name-Last: Huser
Title: Likelihood-Free Parameter Estimation with Neural Bayes Estimators
Abstract:
Neural Bayes estimators are neural networks that approximate Bayes estimators. They are fast, likelihood-free, and amenable to rapid bootstrap-based uncertainty quantification. In this article, we aim to raise statisticians’ awareness of this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of estimating parameters from replicated data, which we address using permutation-invariant neural networks. Through extensive simulation studies we demonstrate that neural Bayes estimators can be used to quickly estimate parameters in weakly identified and highly parameterized models with relative ease. We illustrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.
Journal: The American Statistician
Pages: 1-14
Issue: 1
Volume: 78
Year: 2024
Month: 1
X-DOI: 10.1080/00031305.2023.2249522
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249522
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:1:p:1-14
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2250399_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Chixiang Chen
Author-X-Name-First: Chixiang
Author-X-Name-Last: Chen
Author-Name: Shuo Chen
Author-X-Name-First: Shuo
Author-X-Name-Last: Chen
Author-Name: Qi Long
Author-X-Name-First: Qi
Author-X-Name-Last: Long
Author-Name: Sudeshna Das
Author-X-Name-First: Sudeshna
Author-X-Name-Last: Das
Author-Name: Ming Wang
Author-X-Name-First: Ming
Author-X-Name-Last: Wang
Title: Multiple-Model-based Robust Estimation of Causal Treatment Effect on a Binary Outcome with Integrated Information from Secondary Outcomes
Abstract:
An assessment of the causal treatment effect in the development and progression of certain diseases is important in clinical trials and biomedical studies. However, it is not possible to infer a causal relationship when the treatment assignment is imbalanced and confounded by other mechanisms. Specifically, when the treatment assignment is not randomized and the primary outcome is binary, a conventional logistic regression may not be valid to elucidate any causal inference. Moreover, exclusively capturing all confounders is extremely difficult and even impossible in large-scale observational studies. We propose a multiple-model-based robust (MultiMR) estimator for estimating the causal effect with a binary outcome, where multiple propensity score models and conditional mean imputation models are used to ensure estimation robustness. Furthermore, we propose an enhanced MultiMR (eMultiMR) estimator that reduces the estimation variability of MultiMR estimates by incorporating secondary outcomes that are highly correlated with the primary binary outcome. The resulting estimates are less sensitive to model mis-specification compared to those based on state-of-the-art doubly-robust methods. These estimates are verified through both theoretical and numerical assessments. The utility of (e)MultiMR estimation is illustrated using the Uniform Data Set (UDS) from the National Alzheimer’s Coordinating Center with the objective of detecting the causal effect of the short-term use of antihypertensive medications on the development of dementia or mild cognitive impairment. The proposed method has been implemented in an R package and is available at https://github.com/chencxxy28/eMultiMR.
Journal: The American Statistician
Pages: 150-160
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2250399
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2250399
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:150-160
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2257237_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Nicholas Larsen
Author-X-Name-First: Nicholas
Author-X-Name-Last: Larsen
Author-Name: Jonathan Stallrich
Author-X-Name-First: Jonathan
Author-X-Name-Last: Stallrich
Author-Name: Srijan Sengupta
Author-X-Name-First: Srijan
Author-X-Name-Last: Sengupta
Author-Name: Alex Deng
Author-X-Name-First: Alex
Author-X-Name-Last: Deng
Author-Name: Ron Kohavi
Author-X-Name-First: Ron
Author-X-Name-Last: Kohavi
Author-Name: Nathaniel T. Stevens
Author-X-Name-First: Nathaniel T.
Author-X-Name-Last: Stevens
Title: Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology
Abstract:
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large-scale, data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking.com, Alphabet’s Google, LinkedIn, Lyft, Meta’s Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this article, we review challenges that require new statistical methodologies. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians’ awareness of these new research opportunities and to increase collaboration between academia and the online industry.
Journal: The American Statistician
Pages: 135-149
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2257237
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2257237
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:135-149
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2250401_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Lin Ge
Author-X-Name-First: Lin
Author-X-Name-Last: Ge
Author-Name: Yuzi Zhang
Author-X-Name-First: Yuzi
Author-X-Name-Last: Zhang
Author-Name: Lance A. Waller
Author-X-Name-First: Lance A.
Author-X-Name-Last: Waller
Author-Name: Robert H. Lyles
Author-X-Name-First: Robert H.
Author-X-Name-Last: Lyles
Title: Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors
Abstract:
Epidemiologic screening programs often make use of tests with small, but nonzero probabilities of misdiagnosis. In this article, we assume the target population is finite with a fixed number of true cases, and that we apply an imperfect test with known sensitivity and specificity to a sample of individuals from the population. In this setting, we propose an enhanced inferential approach for use in conjunction with sampling-based bias-corrected prevalence estimation. While ignoring the finite nature of the population can yield markedly conservative estimates, direct application of a standard finite population correction (FPC) conversely leads to underestimation of variance. We uncover a way to leverage the typical FPC indirectly toward valid statistical inference. In particular, we derive a readily estimable extra variance component induced by misclassification in this specific but arguably common diagnostic testing scenario. Our approach yields a standard error estimate that properly captures the sampling variability of the usual bias-corrected maximum likelihood estimator of disease prevalence. Finally, we develop an adapted Bayesian credible interval for the true prevalence that offers improved frequentist properties (i.e., coverage and width) relative to a Wald-type confidence interval. We report the simulation results to demonstrate the enhanced performance of the proposed inferential methods.
Journal: The American Statistician
Pages: 192-198
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2250401
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2250401
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:192-198
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2249967_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Sachin S. Pandya
Author-X-Name-First: Sachin S.
Author-X-Name-Last: Pandya
Author-Name: Xiaomeng Li
Author-X-Name-First: Xiaomeng
Author-X-Name-Last: Li
Author-Name: Eric Barón
Author-X-Name-First: Eric
Author-X-Name-Last: Barón
Author-Name: Timothy E. Moore
Author-X-Name-First: Timothy E.
Author-X-Name-Last: Moore
Title: Bayesian Detection of Bias in Peremptory Challenges Using Historical Strike Data
Abstract:
United States law bars using peremptory strikes during jury selection because of a prospective juror’s race, ethnicity, sex, or membership in certain other cognizable classes. Here, we extend a Bayesian approach for detecting such illegal strike bias by showing how to incorporate historical data on an attorney’s use of peremptory strikes in past cases. In so doing, we use the power prior to adjust the weight of such historical information in the analysis. Using simulations, we show how the choice of the power prior’s discounting parameter influences bias detection (how likely it is that the credible interval for the bias parameter excludes zero), depending on the degree of incompatibility between current and historical trial data. Finally, we extend this approach with a prototype software application that lawyers could use to detect strike bias in real time during jury selection. We illustrate this application’s use with real historical strike data from a convenience sample of cases from one court.
Journal: The American Statistician
Pages: 209-219
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2249967
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249967
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:209-219
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2259962_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Alberto Brini
Author-X-Name-First: Alberto
Author-X-Name-Last: Brini
Author-Name: Edwin R. van den Heuvel
Author-X-Name-First: Edwin R.
Author-X-Name-Last: van den Heuvel
Title: Missing Data Imputation with High-Dimensional Data
Abstract:
Imputation of missing data in high-dimensional datasets with more variables P than samples N, P≫N, is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill conditioned and cannot be properly estimated. For fully conditional imputation, the regression models for imputation cannot include all the variables. Thus, the high dimension requires special imputation approaches. In this article, we provide an overview and realistic comparisons of imputation approaches for high-dimensional data when applied to a linear mixed modeling (LMM) framework. We examine approaches from three different classes using simulation studies: multiple imputation with penalized regression; multiple imputation with recursive partitioning and predictive mean matching; and multiple imputation with principal component analysis (PCA). We illustrate the methods on a real case study in which a multivariate outcome (i.e., an extracted set of correlated biomarkers from human urine samples) was collected and monitored over time, and we compare the proposed methods with more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error, and coverage of the LMM parameter estimates relative to those obtained from a data analysis without missingness, although this comes at the expense of high computational costs. Much faster methodologies, such as the one relying on PCA, therefore remain worth considering.
Journal: The American Statistician
Pages: 240-252
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2259962
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2259962
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:240-252
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2320219_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Yang Ni
Author-X-Name-First: Yang
Author-X-Name-Last: Ni
Title: Deep Learning and Scientific Computing with R torch
Journal: The American Statistician
Pages: 264-264
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2024.2320219
File-URL: http://hdl.handle.net/10.1080/00031305.2024.2320219
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:264-264
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2267639_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Adriana Verónica Blanc
Author-X-Name-First: Adriana Verónica
Author-X-Name-Last: Blanc
Title: The Phistogram
Abstract:
This article introduces a new kind of histogram-based representation for univariate random variables, named the phistogram because of its perceptual qualities. The technique relies on shifted groupings of data, creating a color-gradient zone that conveys the uncertainty from smoothing and highlights sampling issues. In this way, the phistogram offers a deep and visually appealing perspective on finite sample peculiarities while also depicting the underlying distribution, thus becoming a useful complement to histograms and other statistical summaries. Although not limited to it, the present construction is derived from the equal-area histogram, a variant that differs conceptually from the traditional one. As this distinction is not greatly emphasized in the literature, the graphical fundamentals are described in detail, and an alternative terminology is proposed to separate some concepts. Additionally, a compact notation is adopted to integrate the representation’s metadata into the graphic itself.
Journal: The American Statistician
Pages: 229-239
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2267639
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2267639
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:229-239
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2252870_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Chengxin Yang
Author-X-Name-First: Chengxin
Author-X-Name-Last: Yang
Author-Name: Jerome P. Reiter
Author-X-Name-First: Jerome P.
Author-X-Name-Last: Reiter
Title: Differentially Private Methods for Releasing Results of Stability Analyses
Abstract:
Data stewards and analysts can promote transparent and trustworthy science and policy-making by facilitating assessments of the sensitivity of published results to alternate analysis choices. For example, researchers may want to assess whether the results change substantially when different subsets of data points (e.g., sets formed by demographic characteristics) are used in the analysis, or when different models (e.g., with or without log transformations) are estimated on the data. Releasing the results of such stability analyses leaks information about the data subjects. When the underlying data are confidential, the data stewards and analysts may seek to bound this information leakage. We present methods for stability analyses that can satisfy differential privacy, a definition of data confidentiality providing such bounds. We use regression modeling as the motivating example. The basic idea is to split the data into disjoint subsets, compute a measure summarizing the difference between the published and alternative analysis on each subset, aggregate these subset estimates, and add noise to the aggregated value to satisfy differential privacy. We illustrate the methods using regressions in which an analyst compares coefficient estimates for different groups in the data, and in which analysts fit two different models on the data.
Journal: The American Statistician
Pages: 180-191
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2252870
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2252870
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:180-191
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2259969_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Weiwen Miao
Author-X-Name-First: Weiwen
Author-X-Name-Last: Miao
Author-Name: Joseph L. Gastwirth
Author-X-Name-First: Joseph L.
Author-X-Name-Last: Gastwirth
Title: The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases
Abstract:
In practice, the ultimate outcome of many important discrimination cases, for example, the Wal-Mart, Nike, and Goldman-Sachs equal pay cases, is determined at the stage when the plaintiffs request that the case be certified as a class action. The primary statistical issue at this stage is whether the employment practice in question leads to a common pattern of outcomes disadvantaging most plaintiffs. However, there are no formal procedures or government guidelines for checking whether an employment practice results in a common pattern of disparity. This article proposes using a slightly modified likelihood ratio test and the one-sided Cochran-Mantel-Haenszel (CMH) test to examine data relevant to deciding whether this commonality requirement is satisfied. Data considered at the class certification stage from several actual cases are analyzed by the proposed procedures. The results often show that the employment practice at issue created a common pattern of disparity; however, based on the evidence presented to the courts, the class action requests were denied.
Journal: The American Statistician
Pages: 253-263
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2259969
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2259969
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:253-263
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2249529_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Wen-Han Hwang
Author-X-Name-First: Wen-Han
Author-X-Name-Last: Hwang
Author-Name: Lu-Fang Chen
Author-X-Name-First: Lu-Fang
Author-X-Name-Last: Chen
Author-Name: Jakub Stoklosa
Author-X-Name-First: Jakub
Author-X-Name-Last: Stoklosa
Title: Counting the Unseen: Estimation of Susceptibility Proportions in Zero-Inflated Models Using a Conditional Likelihood Approach
Abstract:
Zero-inflated count data models are widely used in various fields such as ecology, epidemiology, and transportation, where count data with a large proportion of zeros are prevalent. Despite their widespread use, their theoretical properties have not been extensively studied. This study aims to investigate the impact of ignoring heterogeneity in event count intensity and susceptibility probability on zero-inflated count data analysis within the zero-inflated Poisson framework. To address this issue, we propose a novel conditional likelihood approach that uses only positive count data to estimate event count intensity parameters, and we develop a consistent estimator of the average susceptibility probability. Our approach is compared with the maximum likelihood approach, and we demonstrate our findings through a comprehensive simulation study and real data analysis. The results can also be extended to zero-inflated binomial and geometric models with similar conclusions. These findings contribute to the understanding of the theoretical properties of zero-inflated count data models and provide a practical approach to handling heterogeneity in such models.
Journal: The American Statistician
Pages: 161-170
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2249529
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249529
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:161-170
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2320949_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Gabriel Wallin
Author-X-Name-First: Gabriel
Author-X-Name-Last: Wallin
Title: An Introduction to R and Python for Data Analysis: A Side-by-Side Approach.
Journal: The American Statistician
Pages: 265-265
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2024.2320949
File-URL: http://hdl.handle.net/10.1080/00031305.2024.2320949
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:265-265
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2270649_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Anne Helby Petersen
Author-X-Name-First: Anne Helby
Author-X-Name-Last: Petersen
Author-Name: Claus Ekstrøm
Author-X-Name-First: Claus
Author-X-Name-Last: Ekstrøm
Title: Technical Validation of Plot Designs by Use of Deep Learning
Abstract:
When does inspecting a graphical plot allow an investigator to reach the right statistical conclusion? Visualizations are commonly used for various tasks in statistics—including model diagnostics and exploratory data analysis—and though they are attractive due to their intuitive nature, the lack of available methods for validating plots is a major drawback. We propose a new technical validation method for visual reasoning. Our method trains deep neural networks to distinguish between plots simulated under two different data generating mechanisms (null or alternative), and we use the classification accuracy as a technical validation score (TVS). The TVS measures the information content in the plots, and TVS values can be used to compare different plots or different choices of data generating mechanisms, thereby providing a meaningful scale against which new visual reasoning procedures can be validated. We apply the method to three popular diagnostic plots for linear regression, namely scatterplots, quantile-quantile plots, and residual plots. We consider various types and degrees of misspecification, as well as different within-plot sample sizes. Our method produces TVSs that increase with sample size and decrease with increasing difficulty; hence, the TVS is a meaningful measure of validity.
Journal: The American Statistician
Pages: 220-228
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2270649
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2270649
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:220-228
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2242442_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Quang Nguyen
Author-X-Name-First: Quang
Author-X-Name-Last: Nguyen
Author-Name: Ronald Yurko
Author-X-Name-First: Ronald
Author-X-Name-Last: Yurko
Author-Name: Gregory J. Matthews
Author-X-Name-First: Gregory J.
Author-X-Name-Last: Matthews
Title: Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data
Abstract:
In American football, a pass rush is an attempt by the defensive team to disrupt the offense and prevent the quarterback (QB) from completing a pass. Existing metrics for assessing pass rush performance are either discrete-time quantities or based on subjective judgment. Using player tracking data, we propose STRAIN, a novel metric for evaluating pass rushers in the National Football League (NFL) at the continuous-time, within-play level. Inspired by the concept of strain rate in materials science, STRAIN is a simple and interpretable means of measuring defensive pressure in football. It is a directly observed statistic based on two features: the distance between the pass rusher and the QB, and the rate at which this distance is being reduced. Our metric is highly predictive of pressure and stable over time. We also fit a multilevel model for STRAIN to understand the defensive pressure contribution of every pass rusher at the play level. We apply our approach to NFL data and present results for the first eight weeks of the 2021 regular season. In particular, we provide comparisons of STRAIN across defensive positions and play outcomes, and rankings of the NFL’s best pass rushers according to our metric.
Journal: The American Statistician
Pages: 199-208
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2242442
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2242442
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:199-208
Template-Type: ReDIF-Article 1.0
# input file: UTAS_A_2249965_J.xml processed with: repec_from_jats12.xsl darts-xml-transformations-20240209T083504 git hash: db97ba8e3a
Author-Name: Hsin-wen Chang
Author-X-Name-First: Hsin-wen
Author-X-Name-Last: Chang
Author-Name: Shu-Hsiang Wang
Author-X-Name-First: Shu-Hsiang
Author-X-Name-Last: Wang
Title: Bivariate Analysis of Distribution Functions Under Biased Sampling
Abstract:
This article compares distribution functions among pairs of locations in their domains, in contrast to the typical approach of univariate comparison across individual locations. This bivariate approach is studied in the presence of sampling bias, which has been gaining attention in COVID-19 studies that over-represent more symptomatic people. In cases with either known or unknown sampling bias, we introduce Anderson–Darling-type tests based on both the univariate and bivariate formulation. A simulation study shows the superior performance of the bivariate approach over the univariate one. We illustrate the proposed methods using real data on the distribution of the number of symptoms suggestive of COVID-19.
Journal: The American Statistician
Pages: 171-179
Issue: 2
Volume: 78
Year: 2024
Month: 4
X-DOI: 10.1080/00031305.2023.2249965
File-URL: http://hdl.handle.net/10.1080/00031305.2023.2249965
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:taf:amstat:v:78:y:2024:i:2:p:171-179