Annual Review of Statistics and Its Application - Volume 1, 2014
What Is Statistics?
Vol. 1 (2014), pp. 1–9
One might think that there is a simple answer to the question posed in the title, of the form “Statistics is….” Sadly, there is not, although many contemporary statistical authors have attempted to answer the question. This article captures the essence of some of these efforts, setting them in their historical contexts. In the process, we focus on the cross-disciplinary nature of much modern statistical research. This discussion serves as a backdrop to the aims of the Annual Review of Statistics and Its Application, which begins publication with the present volume.
A Systematic Statistical Approach to Evaluating Evidence from Observational Studies
Vol. 1 (2014), pp. 11–39
Threats to the validity of observational studies on the effects of interventions raise questions about the appropriate role of such studies in decision making. Nonetheless, scholarly journals in fields such as medicine, education, and the social sciences feature many such studies, often with limited exploration of these threats, and the lay press is rife with news stories based on them. Consumers of these studies rely on the expertise of the study authors to conduct appropriate analyses, and on the thoroughness of the scientific peer-review process to check their validity, but the introspective and ad hoc nature of the design of these analyses appears to elude any meaningful objective assessment of their performance. Here, we review some of the challenges encountered in observational studies and describe an alternative, data-driven approach to observational study design, execution, and analysis. Although much work remains, we believe this research direction shows promise.
The Role of Statistics in the Discovery of a Higgs Boson
Vol. 1 (2014), pp. 41–59
The 2012–2013 discovery of a Higgs boson appears to have filled the final missing gap in the Standard Model of particle physics and was greeted with fanfare by the scientific community and by the public at large. Particle physicists have developed and rigorously tested a specialized statistical tool kit that is designed for the search for new physics. This tool kit was put to the test in a 40-year search that culminated in the discovery of a Higgs boson. This article reviews these statistical methods, the controversies that surround them, and how they led to this historic discovery.
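The "five sigma" discovery convention at the heart of this tool kit can be made concrete with a minimal sketch (mine, not taken from the article): converting the Z = 5 threshold into the corresponding one-sided p-value.

```python
from scipy.stats import norm

# Particle physics convention: claiming a discovery requires a one-sided
# p-value at or below the 5-sigma level under the background-only null.
z = 5.0
p_value = norm.sf(z)  # survival function: P(Z >= z) for a standard normal
print(f"one-sided p-value at 5 sigma: {p_value:.2e}")  # about 2.87e-07
```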
Brain Imaging Analysis
Vol. 1 (2014), pp. 61–85
The increasing availability of brain imaging technologies has led to intense neuroscientific inquiry into the human brain. Studies often investigate brain function related to emotion, cognition, language, memory, and responses to numerous other external stimuli, as well as resting-state brain function. Brain imaging studies also attempt to determine the functional or structural basis for psychiatric or neurological disorders and to examine the responses of these disorders to treatment. Neuroimaging is a highly interdisciplinary field, and statistics plays a critical role in establishing rigorous methods to extract information and to quantify evidence for formal inferences. Neuroimaging data present numerous challenges for statistical analysis, including the vast amounts of data collected from each individual and the complex temporal and spatial dependencies present in the data. I briefly provide background on various types of neuroimaging data and analysis objectives that are commonly targeted in the field. I also present a survey of existing methods aimed at these objectives and identify particular areas offering opportunities for future statistical contribution.
Statistics and Climate
Vol. 1 (2014), pp. 87–101
For a statistician, climate is the distribution of weather and other variables that are part of the climate system. This distribution changes over time. This review considers some aspects of climate data, climate model assessment, and uncertainty estimation pertinent to climate issues, focusing mainly on temperatures. Some interesting methodological needs that arise from these issues are also considered.
Climate Simulators and Climate Projections
Vol. 1 (2014), pp. 103–123
We provide a statistical interpretation of current practice in climate modeling. In this review, we define weather and climate, clarify the relationship between simulator output and simulator climate, distinguish between a climate simulator and a statistical climate model, provide a statistical interpretation of the ubiquitous practice of anomaly correction along with a substantial generalization (the best-parameter approach), and interpret simulator/data comparisons as posterior predictive checking, including a simple adjustment to allow for double counting. We also discuss statistical approaches to simulator tuning, assessing parametric uncertainty, and responding to unrealistic outputs. We finish with a more general discussion of larger themes.
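As a toy illustration of the additive anomaly correction mentioned above, here is a minimal NumPy sketch; the data and variable names are hypothetical stand-ins, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical monthly temperatures (degrees C); stand-ins for real data.
obs_baseline = 10 + rng.normal(0, 1, 120)   # observed, reference period
sim_baseline = 12 + rng.normal(0, 1, 120)   # simulator, reference period
sim_future   = 14 + rng.normal(0, 1, 120)   # simulator, future period

# Additive anomaly correction: carry the simulator's own change signal
# over to the observed baseline, discarding its absolute bias.
anomaly = sim_future.mean() - sim_baseline.mean()
projected = obs_baseline.mean() + anomaly
print(f"projected mean temperature: {projected:.2f} C")
```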
Probabilistic Forecasting
Vol. 1 (2014), pp. 125–151
A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize and study notions of calibration in a prediction space setting. In practice, probabilistic calibration can be checked by examining probability integral transform (PIT) histograms. Proper scoring rules such as the logarithmic score and the continuous ranked probability score serve to assess calibration and sharpness simultaneously. As a special case, consistent scoring functions provide decision-theoretically coherent tools for evaluating point forecasts. We emphasize methodological links to parametric and nonparametric distributional regression techniques, which attempt to model and to estimate conditional distribution functions; we use the context of statistically postprocessed ensemble forecasts in numerical weather prediction as an example. Throughout, we illustrate concepts and methodologies in data examples.
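A minimal sketch of the PIT check described above, using synthetic Gaussian forecasts of my own construction: under calibration, the PIT values are uniform, so the histogram bins come out roughly flat.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 10_000
mu = rng.normal(0, 1, n)            # forecast means vary by case
y = rng.normal(mu, 1.0)             # observations drawn from the truth

# PIT: evaluate each predictive CDF at the realized observation.
pit = norm.cdf(y, loc=mu, scale=1.0)

# A calibrated forecast yields (approximately) uniform PIT values,
# so the counts should be roughly equal across bins.
counts, _ = np.histogram(pit, bins=10, range=(0.0, 1.0))
print(counts)
```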
Bayesian Computational Tools
Vol. 1 (2014), pp. 153–177
This article surveys advances in the field of Bayesian computation over the past 20 years from a purely personal viewpoint, and hence contains some omissions given the breadth of the field. Monte Carlo, MCMC, and ABC themes are covered here, whereas the rapidly expanding area of particle methods is only briefly mentioned and alternative approximation techniques such as variational Bayes and linear Bayes methods do not appear at all. This article also contains some novel computational entries on the double-exponential model that may be of interest.
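As a hedged illustration of one theme covered (ABC), this sketch runs plain rejection ABC for a normal mean; the prior, tolerance, and summary statistic are my choices, not the article's.

```python
import numpy as np

rng = np.random.default_rng(2)
observed = rng.normal(3.0, 1.0, 50)   # pretend data with unknown mean
obs_mean = observed.mean()

# Rejection ABC: draw theta from the prior, simulate data, keep theta
# when the simulated summary statistic is close to the observed one.
accepted = []
for _ in range(100_000):
    theta = rng.normal(0.0, 10.0)             # vague prior on the mean
    sim = rng.normal(theta, 1.0, 50)
    if abs(sim.mean() - obs_mean) < 0.1:      # tolerance epsilon
        accepted.append(theta)

print(f"accepted {len(accepted)} draws; "
      f"approximate posterior mean {np.mean(accepted):.2f}")
```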
Bayesian Computation Via Markov Chain Monte Carlo
Vol. 1 (2014), pp. 179–201
Markov chain Monte Carlo (MCMC) algorithms are an indispensable tool for performing Bayesian inference. This review discusses widely used sampling algorithms and illustrates their implementation on a probit regression model for lupus data. The examples considered highlight the importance of tuning the simulation parameters and underscore the important contributions of modern developments such as adaptive MCMC. We then use the theory underlying MCMC to explain the validity of the algorithms considered and to assess the variance of the resulting Monte Carlo estimators.
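To make the probit-MCMC pairing concrete, here is a minimal random-walk Metropolis sampler for a probit regression, run on simulated data rather than the lupus data the article analyzes; the prior and step size are my choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, beta_true = 200, np.array([0.5, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def log_post(beta):
    eta = X @ beta
    # Probit log-likelihood plus a diffuse N(0, 10^2) prior on each beta.
    ll = y @ norm.logcdf(eta) + (1 - y) @ norm.logcdf(-eta)
    return ll - beta @ beta / 200.0

beta = np.zeros(2)
draws, lp = [], log_post(beta)
for _ in range(5000):
    prop = beta + 0.2 * rng.normal(size=2)     # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept step
        beta, lp = prop, lp_prop
    draws.append(beta)

print("posterior means after burn-in:", np.mean(draws[1000:], axis=0))
```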
Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models
Vol. 1 (2014), pp. 203–232
We survey latent variable models for solving data-analysis problems. A latent variable model is a probabilistic model that encodes hidden patterns in the data. We uncover these patterns from their conditional distribution and use them to summarize data and form predictions. Latent variable models are important in many fields, including computational biology, natural language processing, and social network analysis. Our perspective is that models are developed iteratively: We build a model, use it to analyze data, assess how it succeeds and fails, revise it, and repeat. We describe how new research has transformed these essential activities. First, we describe probabilistic graphical models, a language for formulating latent variable models. Second, we describe mean field variational inference, a generic algorithm for approximating conditional distributions. Third, we describe how to use our analyses to solve problems: exploring the data, forming predictions, and pointing us in the direction of improved models.
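Mean field variational inference can be shown in miniature on the textbook normal-gamma example (Gaussian data with unknown mean and precision); this coordinate-ascent sketch is a standard toy, not the paper's own derivation, and the priors are my choices.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, 100)          # data with unknown mean/variance
n, xbar, xsq = len(x), x.mean(), (x**2).sum()

# Priors: mu ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0).
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0                             # initialize E_q[tau]
for _ in range(50):                     # coordinate ascent (CAVI)
    # Update q(mu) = N(mu_n, 1/lam_n) given the current E[tau].
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    # Update q(tau) = Gamma(a_n, b_n) given the current q(mu) moments.
    E_mu, E_mu2 = mu_n, mu_n**2 + 1.0 / lam_n
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (xsq - 2 * E_mu * n * xbar + n * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_n / b_n

print(f"E[mu] = {mu_n:.2f}, E[tau] = {E_tau:.2f} "
      f"(sample 1/var = {1 / x.var():.2f})")
```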
Structured Regularizers for High-Dimensional Problems: Statistical and Computational Issues
Vol. 1 (2014), pp. 233–253
Regularization is a widely used technique throughout statistics, machine learning, and applied mathematics. Modern applications in science and engineering lead to massive and complex data sets, which motivate the use of more structured types of regularizers. This survey provides an overview of the use of structured regularization in high-dimensional statistics, including regularizers for group-structured and hierarchical sparsity, low-rank matrices, additive and multiplicative matrix decomposition, and high-dimensional nonparametric models. It includes various examples with motivating applications; it also covers key aspects of statistical theory and provides some discussion of efficient algorithms.
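For group-structured sparsity specifically, the workhorse computation in proximal algorithms is block soft-thresholding, the proximal operator of the group norm; a small sketch (generic, not tied to the survey's notation):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2: shrinks the whole group
    toward zero, setting it exactly to zero when its norm is <= lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

# Toy usage: one group survives shrinkage, the other is zeroed out.
print(group_soft_threshold(np.array([3.0, 4.0]), lam=1.0))  # scaled by 0.8
print(group_soft_threshold(np.array([0.3, 0.4]), lam=1.0))  # all zeros
```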
High-Dimensional Statistics with a View Toward Applications in Biology
Vol. 1 (2014), pp. 255–278
We review statistical methods for high-dimensional data analysis and pay particular attention to recent developments for assessing uncertainties in terms of controlling false positive statements (type I error) and p-values. The main focus is on regression models, but we also discuss graphical modeling and causal inference based on observational data. We illustrate the concepts and methods with various packages from the statistical software R, using a high-throughput genomic data set about riboflavin production with Bacillus subtilis, which we make publicly available for the first time.
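One concrete, standard device for controlling false positive statements across many tests is the Benjamini-Hochberg step-up procedure; the sketch below is a generic illustration of error control, not necessarily the specific methodology this article develops.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean rejection mask controlling the FDR at alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= i*alpha/m
        reject[order[: k + 1]] = True
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.9]))
```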
Next-Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data
Vol. 1 (2014), pp. 279–300
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing, and this transition is only accelerating with the advent of inexpensive DNA sequencing technology. This brief review highlights some modern techniques with recent successes in statistical genetics. These include (a) Lasso penalized regression for association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence imputation, (d) the fused Lasso for discovery of copy number variation, (e) haplotyping, (f) relatedness estimation, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.
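Item (a), Lasso penalized regression, is commonly fit by cyclic coordinate descent with soft-thresholding; a self-contained sketch on simulated data (all names and settings below are illustrative, not from the review):

```python
import numpy as np

def soft(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual for j
            b[j] = soft(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)                # standardize columns
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                   # three true signals
y = X @ beta + rng.normal(size=n)
print(np.round(lasso_cd(X, y, lam=0.1), 2))   # mostly zeros past index 2
```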
Breaking Bad: Two Decades of Life-Course Data Analysis in Criminology, Developmental Psychology, and Beyond
Vol. 1 (2014), pp. 301–332
Studies of human development require longitudinal data analysis methods that describe within- and between-individual variation in developmental and behavioral trajectories. This article reviews life-course data analysis methods for modeling these trajectories, as well as their application in studies of antisocial behavior and of crime in childhood, in adolescence, and throughout life. We set the stage by introducing growth curve (hierarchical linear) models. We focus our review on finite mixture models for life-course data, known as group-based trajectory and growth mixture models. We then discuss how these models are applied within criminology and developmental psychology, recent controversies over their substantive use and interpretation, and important issues of statistical practice and the challenges they raise. Building on the critical literature, we offer several recommendations for the applied users of the models. Finally, we present the most recent method of examining behavioral trajectories in criminology, the unimodal curve registration (UCR) approach. We briefly contrast the UCR model with growth curve and finite mixture models for life-course data analysis.
Event History Analysis
Vol. 1 (2014), pp. 333–360
Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finite-state Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability.
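Censoring, one of the defining features noted above, is handled in the simplest survival setting by the Kaplan-Meier estimator; a compact sketch with invented data:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate; events[i] = 1 for an observed
    event, 0 for a right-censored observation."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                 # still under observation
        died = np.sum((times == t) & (events == 1))  # events exactly at t
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve

# Six subjects; zeros mark censored follow-up times.
print(kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1]))
```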
Statistical Evaluation of Forensic DNA Profile Evidence
Vol. 1 (2014), pp. 361–384
The evaluation of weight of evidence for forensic DNA profiles has been a subject of controversy since their introduction over 20 years ago. Substantial progress has been made for standard DNA profiles, but new issues have arisen in recent years with the advent of more sensitive profiling techniques, allowing profiles to be recovered from minuscule amounts of possibly degraded DNA. These low-template DNA profiles suffer from enhanced stochastic effects, including drop-in, drop-out, and stutter, which pose problems for DNA profile evaluation. These problems are now beginning to be overcome with the emergence of several statistical models and software. We first review the general principles of statistical evaluation of DNA profile evidence, and we then focus on low-template DNA profiles, briefly reviewing the main statistical models and software. We cover methods that use allele presence/absence and those that use electropherogram peak heights, focusing on the likelihood ratio as a measure of evidential weight.
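For a standard single-source profile, the likelihood-ratio logic reduces at one locus to the reciprocal of the matching genotype's population frequency; a toy computation with illustrative allele frequencies (not from any real database):

```python
# Likelihood ratio for a single-source profile matching the suspect at
# one locus, under Hardy-Weinberg: Hp says the suspect is the source,
# Hd says an unknown, unrelated person is.
p_a, p_b = 0.10, 0.05          # frequencies of the two alleles observed

match_prob = 2 * p_a * p_b     # heterozygote genotype probability
lr = 1.0 / match_prob          # LR = P(E|Hp) / P(E|Hd) = 1 / match_prob
print(f"single-locus LR: {lr:.0f}")  # = 100
```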
Using League Table Rankings in Public Policy Formation: Statistical Issues
Vol. 1 (2014), pp. 385–399
This article reviews the statistical models that underpin institutional comparisons on the basis of outcome measures for their students. These multilevel models are developed to levels of complexity that match the problems posed. The strengths and limitations of inferences from these models are explored with examples taken from education.
Statistical Ecology
Vol. 1 (2014), pp. 401–426
Statistical ecology deals with the development of new methodologies for analyzing ecological data. Advanced statistical models and techniques are often needed to provide robust analyses of the available data. The statistical models that are developed can often be separated into two distinct processes: a system process that describes the underlying biological system and an observation process that describes the data collection process. The system process is often a function of the demographic parameters of interest, such as survival probabilities, transition rates between states, and/or abundance, whereas the model parameters associated with the observation process are conditional on the underlying state of the system. This review focuses on a number of common forms of ecological data and discusses their associated models and model-fitting approaches, including the incorporation of heterogeneity within the given biological system and the integration of different data sources.
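A minimal instance of the system/observation split described above is two-sample capture-recapture, where the classical Lincoln-Petersen estimator recovers abundance; the counts below are invented for illustration.

```python
# Two-sample capture-recapture (Lincoln-Petersen). The system quantity
# is abundance N; the observation process is who gets caught and marked.
n1 = 200   # animals captured and marked on the first occasion
n2 = 150   # animals captured on the second occasion
m2 = 30    # of those, the number already carrying a mark
N_hat = n1 * n2 / m2
print(f"estimated abundance: {N_hat:.0f}")  # 1000
```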
Estimating the Number of Species in Microbial Diversity Studies
John Bunge, Amy Willis, and Fiona Walsh
Vol. 1 (2014), pp. 427–445
For decades, statisticians have studied the species problem: how to estimate the total number of species, observed plus unobserved, in a population. This problem dates at least as far back as 1943, to a paper by R.A. Fisher. These methods have found many applications in general ecology, but their importance has grown considerably in recent years, driven by the introduction of high-throughput DNA sequencing into microbial ecology. We examine the state of the art in terms of estimating the total number of taxa in a microbial population from a sample of sequences. We focus mainly on estimating the number of species within a single population (α-diversity), but we also briefly consider statistical inference for comparing the numbers of species across populations (β-diversity). We discuss the full range of statistical techniques, parametric and nonparametric as well as frequentist and Bayesian, and specific implications of their use in microbial diversity studies. We conclude with some recommendations for theoretical investigation and computational tool development.
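Among the nonparametric techniques in this literature, the best-known lower bound on total richness is Chao1, built from singleton and doubleton counts; a small sketch with an invented abundance vector:

```python
import numpy as np

def chao1(counts):
    """Chao1 lower-bound estimate of total species richness from a
    vector of per-species abundance counts in one sample."""
    counts = np.asarray(counts)
    s_obs = np.sum(counts > 0)
    f1 = np.sum(counts == 1)   # singletons
    f2 = np.sum(counts == 2)   # doubletons
    if f2 == 0:                # bias-corrected form avoids division by 0
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1**2 / (2.0 * f2)

# Ten observed taxa, heavy on rare ones, as is typical for microbes.
print(chao1([1, 1, 1, 1, 2, 2, 3, 5, 10, 20]))  # 10 + 16/4 = 14
```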
Dynamic Treatment Regimes
Vol. 1 (2014), pp. 447–464
A dynamic treatment regime consists of a sequence of decision rules, one per stage of intervention, that dictate how to individualize treatments to patients, based on evolving treatment and covariate history. These regimes are particularly useful for managing chronic disorders and fit well into the larger paradigm of personalized medicine. They provide one way to operationalize a clinical decision support system. Statistics plays a key role in the construction of evidence-based dynamic treatment regimes, informing the best study design as well as efficient estimation and valid inference. Owing to the many novel methodological challenges this area offers, it has been growing in popularity among statisticians in recent years. In this article, we review the key developments in this exciting field of research. In particular, we discuss sequential multiple assignment randomized trial designs, estimation techniques like Q-learning and marginal structural models, and several inference techniques designed to address the associated nonstandard asymptotics. We reference software whenever available. We also outline some important future directions.
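A minimal sketch of two-stage Q-learning with linear working models, run on simulated trial data; the generative model and feature choices are mine, not from the article.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
# Hypothetical two-stage trial: state s_t, randomized treatment a_t in
# {0, 1}, and a final outcome y (larger is better).
s1 = rng.normal(size=n)
a1 = rng.integers(0, 2, n)
s2 = 0.5 * s1 + 0.4 * a1 + rng.normal(size=n)
a2 = rng.integers(0, 2, n)
y = s2 + a2 * (1.0 - s2) + rng.normal(size=n)   # stage-2 effect flips with s2

def fit_q(features, target):
    """Least-squares fit; returns the coefficients of a working model."""
    return np.linalg.lstsq(features, target, rcond=None)[0]

# Stage 2: regress y on (1, s2, a2, a2*s2), then maximize over a2.
X2 = np.column_stack([np.ones(n), s2, a2, a2 * s2])
c2 = fit_q(X2, y)
q2 = lambda s, a: c2[0] + c2[1] * s + c2[2] * a + c2[3] * a * s
v2 = np.maximum(q2(s2, 0), q2(s2, 1))   # value under the best stage-2 rule

# Stage 1: regress the pseudo-outcome v2 on (1, s1, a1, a1*s1).
X1 = np.column_stack([np.ones(n), s1, a1, a1 * s1])
c1 = fit_q(X1, v2)
print("stage-2 rule: treat when", f"{c2[2]:.2f} + {c2[3]:.2f}*s2 > 0")
print("stage-1 rule: treat when", f"{c1[2]:.2f} + {c1[3]:.2f}*s1 > 0")
```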