Annual Review of Statistics and Its Application - Volume 1, 2014
What Is Statistics?
Vol. 1 (2014), pp. 1–9
One might think that there is a simple answer to the question posed in the title, of the form “Statistics is….” Sadly, there is not, although many contemporary statistical authors have attempted to answer the question. This article captures the essence of some of these efforts, setting them in their historical contexts. In the process, we focus on the cross-disciplinary nature of much modern statistical research. This discussion serves as a backdrop to the aims of the Annual Review of Statistics and Its Application, which begins publication with the present volume.
A Systematic Statistical Approach to Evaluating Evidence from Observational Studies
Vol. 1 (2014), pp. 11–39
Threats to the validity of observational studies on the effects of interventions raise questions about the appropriate role of such studies in decision making. Nonetheless, scholarly journals in fields such as medicine, education, and the social sciences feature many such studies, often with limited exploration of these threats, and the lay press is rife with news stories based on them. Consumers of these studies rely on the expertise of the study authors to conduct appropriate analyses, and on the thoroughness of the scientific peer-review process to check their validity, but the introspective and ad hoc nature of the design of these analyses appears to elude any meaningful objective assessment of their performance. Here, we review some of the challenges encountered in observational studies and describe an alternative, data-driven approach to observational study design, execution, and analysis. Although much work remains, we believe this research direction shows promise.
The Role of Statistics in the Discovery of a Higgs Boson
Vol. 1 (2014), pp. 41–59
The 2012–2013 discovery of a Higgs boson appears to have filled the final missing gap in the Standard Model of particle physics and was greeted with fanfare by the scientific community and by the public at large. Particle physicists have developed and rigorously tested a specialized statistical tool kit that is designed for the search for new physics. This tool kit was put to the test in a 40-year search that culminated in the discovery of a Higgs boson. This article reviews these statistical methods, the controversies that surround them, and how they led to this historic discovery.
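The "five sigma" discovery convention at the heart of this tool kit can be made concrete with a minimal sketch (mine, not taken from the article): converting the Z = 5 threshold into the corresponding one-sided p-value.

```python
from scipy.stats import norm

# Particle physics convention: claiming a discovery requires a one-sided
# p-value at or below the 5-sigma level under the background-only null.
z = 5.0
p_value = norm.sf(z)  # survival function: P(Z >= z) for a standard normal
print(f"one-sided p-value at 5 sigma: {p_value:.2e}")  # about 2.87e-07
```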
Brain Imaging Analysis
Vol. 1 (2014), pp. 61–85
The increasing availability of brain imaging technologies has led to intense neuroscientific inquiry into the human brain. Studies often investigate brain function related to emotion, cognition, language, memory, and responses to numerous other external stimuli, as well as resting-state brain function. Brain imaging studies also attempt to determine the functional or structural basis for psychiatric or neurological disorders and to examine the responses of these disorders to treatment. Neuroimaging is a highly interdisciplinary field, and statistics plays a critical role in establishing rigorous methods to extract information and to quantify evidence for formal inferences. Neuroimaging data present numerous challenges for statistical analysis, including the vast amounts of data collected from each individual and the complex temporal and spatial dependencies present in the data. I briefly provide background on various types of neuroimaging data and analysis objectives that are commonly targeted in the field. I also present a survey of existing methods aimed at these objectives and identify particular areas offering opportunities for future statistical contribution.
Statistics and Climate
Vol. 1 (2014), pp. 87–101
For a statistician, climate is the distribution of weather and other variables that are part of the climate system. This distribution changes over time. This review considers some aspects of climate data, climate model assessment, and uncertainty estimation pertinent to climate issues, focusing mainly on temperatures. Some interesting methodological needs that arise from these issues are also considered.
Climate Simulators and Climate Projections
Vol. 1 (2014), pp. 103–123
We provide a statistical interpretation of current practice in climate modeling. In this review, we define weather and climate, clarify the relationship between simulator output and simulator climate, distinguish between a climate simulator and a statistical climate model, provide a statistical interpretation of the ubiquitous practice of anomaly correction along with a substantial generalization (the best-parameter approach), and interpret simulator/data comparisons as posterior predictive checking, including a simple adjustment to allow for double counting. We also discuss statistical approaches to simulator tuning, assessing parametric uncertainty, and responding to unrealistic outputs. We finish with a more general discussion of larger themes.
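As a toy illustration of the additive anomaly correction mentioned above, here is a minimal NumPy sketch; the data and variable names are hypothetical stand-ins, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical monthly temperatures (degrees C); stand-ins for real data.
obs_baseline = 10 + rng.normal(0, 1, 120)   # observed, reference period
sim_baseline = 12 + rng.normal(0, 1, 120)   # simulator, reference period
sim_future   = 14 + rng.normal(0, 1, 120)   # simulator, future period

# Additive anomaly correction: carry the simulator's own change signal
# over to the observed baseline, discarding its absolute bias.
anomaly = sim_future.mean() - sim_baseline.mean()
projected = obs_baseline.mean() + anomaly
print(f"projected mean temperature: {projected:.2f} C")
```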
Probabilistic Forecasting
Vol. 1 (2014), pp. 125–151
A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize and study notions of calibration in a prediction space setting. In practice, probabilistic calibration can be checked by examining probability integral transform (PIT) histograms. Proper scoring rules such as the logarithmic score and the continuous ranked probability score serve to assess calibration and sharpness simultaneously. As a special case, consistent scoring functions provide decision-theoretically coherent tools for evaluating point forecasts. We emphasize methodological links to parametric and nonparametric distributional regression techniques, which attempt to model and to estimate conditional distribution functions; we use the context of statistically postprocessed ensemble forecasts in numerical weather prediction as an example. Throughout, we illustrate concepts and methodologies in data examples.
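A minimal sketch of the PIT check described above, using synthetic Gaussian forecasts of my own construction: under calibration, the PIT values are uniform, so the histogram bins come out roughly flat.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 10_000
mu = rng.normal(0, 1, n)            # forecast means vary by case
y = rng.normal(mu, 1.0)             # observations drawn from the truth

# PIT: evaluate each predictive CDF at the realized observation.
pit = norm.cdf(y, loc=mu, scale=1.0)

# A calibrated forecast yields (approximately) uniform PIT values,
# so the counts should be roughly equal across bins.
counts, _ = np.histogram(pit, bins=10, range=(0.0, 1.0))
print(counts)
```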
Bayesian Computational Tools
Vol. 1 (2014), pp. 153–177
This article surveys advances in the field of Bayesian computation over the past 20 years from a purely personal viewpoint, and hence contains some omissions given the breadth of the field. Monte Carlo, MCMC, and ABC themes are covered here, whereas the rapidly expanding area of particle methods is only briefly mentioned and alternative approximation techniques such as variational Bayes and linear Bayes methods do not appear at all. This article also contains some novel computational entries on the double-exponential model that may be of interest.
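As a hedged illustration of one theme covered (ABC), this sketch runs plain rejection ABC for a normal mean; the prior, tolerance, and summary statistic are my choices, not the article's.

```python
import numpy as np

rng = np.random.default_rng(2)
observed = rng.normal(3.0, 1.0, 50)   # pretend data with unknown mean
obs_mean = observed.mean()

# Rejection ABC: draw theta from the prior, simulate data, keep theta
# when the simulated summary statistic is close to the observed one.
accepted = []
for _ in range(100_000):
    theta = rng.normal(0.0, 10.0)             # vague prior on the mean
    sim = rng.normal(theta, 1.0, 50)
    if abs(sim.mean() - obs_mean) < 0.1:      # tolerance epsilon
        accepted.append(theta)

print(f"accepted {len(accepted)} draws; "
      f"approximate posterior mean {np.mean(accepted):.2f}")
```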
Bayesian Computation Via Markov Chain Monte Carlo
Vol. 1 (2014), pp. 179–201
Markov chain Monte Carlo (MCMC) algorithms are an indispensable tool for performing Bayesian inference. This review discusses widely used sampling algorithms and illustrates their implementation on a probit regression model for lupus data. The examples considered highlight the importance of tuning the simulation parameters and underscore the important contributions of modern developments such as adaptive MCMC. We then use the theory underlying MCMC to explain the validity of the algorithms considered and to assess the variance of the resulting Monte Carlo estimators.
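To make the probit-MCMC pairing concrete, here is a minimal random-walk Metropolis sampler for a probit regression, run on simulated data rather than the lupus data the article analyzes; the prior and step size are my choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, beta_true = 200, np.array([0.5, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def log_post(beta):
    eta = X @ beta
    # Probit log-likelihood plus a diffuse N(0, 10^2) prior on each beta.
    ll = y @ norm.logcdf(eta) + (1 - y) @ norm.logcdf(-eta)
    return ll - beta @ beta / 200.0

beta = np.zeros(2)
draws, lp = [], log_post(beta)
for _ in range(5000):
    prop = beta + 0.2 * rng.normal(size=2)     # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept step
        beta, lp = prop, lp_prop
    draws.append(beta)

print("posterior means after burn-in:", np.mean(draws[1000:], axis=0))
```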
Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models
Vol. 1 (2014), pp. 203–232
We survey latent variable models for solving data-analysis problems. A latent variable model is a probabilistic model that encodes hidden patterns in the data. We uncover these patterns from their conditional distribution and use them to summarize data and form predictions. Latent variable models are important in many fields, including computational biology, natural language processing, and social network analysis. Our perspective is that models are developed iteratively: We build a model, use it to analyze data, assess how it succeeds and fails, revise it, and repeat. We describe how new research has transformed these essential activities. First, we describe probabilistic graphical models, a language for formulating latent variable models. Second, we describe mean field variational inference, a generic algorithm for approximating conditional distributions. Third, we describe how to use our analyses to solve problems: exploring the data, forming predictions, and pointing us in the direction of improved models.
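Mean field variational inference can be shown in miniature on the textbook normal-gamma example (Gaussian data with unknown mean and precision); this coordinate-ascent sketch is a standard toy, not the paper's own derivation, and the priors are my choices.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, 100)          # data with unknown mean/variance
n, xbar, xsq = len(x), x.mean(), (x**2).sum()

# Priors: mu ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0).
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0                             # initialize E_q[tau]
for _ in range(50):                     # coordinate ascent (CAVI)
    # Update q(mu) = N(mu_n, 1/lam_n) given the current E[tau].
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    # Update q(tau) = Gamma(a_n, b_n) given the current q(mu) moments.
    E_mu, E_mu2 = mu_n, mu_n**2 + 1.0 / lam_n
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (xsq - 2 * E_mu * n * xbar + n * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_n / b_n

print(f"E[mu] = {mu_n:.2f}, E[tau] = {E_tau:.2f} "
      f"(sample 1/var = {1 / x.var():.2f})")
```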
Structured Regularizers for High-Dimensional Problems: Statistical and Computational Issues
Vol. 1 (2014), pp. 233–253
Regularization is a widely used technique throughout statistics, machine learning, and applied mathematics. Modern applications in science and engineering lead to massive and complex data sets, which motivate the use of more structured types of regularizers. This survey provides an overview of the use of structured regularization in high-dimensional statistics, including regularizers for group-structured and hierarchical sparsity, low-rank matrices, additive and multiplicative matrix decomposition, and high-dimensional nonparametric models. It includes various examples with motivating applications; it also covers key aspects of statistical theory and provides some discussion of efficient algorithms.
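For group-structured sparsity specifically, the workhorse computation in proximal algorithms is block soft-thresholding, the proximal operator of the group norm; a small sketch (generic, not tied to the survey's notation):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2: shrinks the whole group
    toward zero, setting it exactly to zero when its norm is <= lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

# Toy usage: one group survives shrinkage, the other is zeroed out.
print(group_soft_threshold(np.array([3.0, 4.0]), lam=1.0))  # scaled by 0.8
print(group_soft_threshold(np.array([0.3, 0.4]), lam=1.0))  # all zeros
```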
High-Dimensional Statistics with a View Toward Applications in Biology
Vol. 1 (2014), pp. 255–278
We review statistical methods for high-dimensional data analysis and pay particular attention to recent developments for assessing uncertainties in terms of controlling false positive statements (type I error) and p-values. The main focus is on regression models, but we also discuss graphical modeling and causal inference based on observational data. We illustrate the concepts and methods with various packages from the statistical software R, using a high-throughput genomic data set about riboflavin production with Bacillus subtilis, which we make publicly available for the first time.
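One concrete, standard device for controlling false positive statements across many tests is the Benjamini-Hochberg step-up procedure; the sketch below is a generic illustration of error control, not necessarily the specific methodology this article develops.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean rejection mask controlling the FDR at alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= i*alpha/m
        reject[order[: k + 1]] = True
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.9]))
```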
Next-Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data
Vol. 1 (2014), pp. 279–300
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing, and this transition is only accelerating with the advent of inexpensive DNA sequencing technology. This brief review highlights some modern techniques with recent successes in statistical genetics. These include (a) Lasso penalized regression for association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence imputation, (d) the fused Lasso for discovery of copy number variation, (e) haplotyping, (f) relatedness estimation, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.
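Item (a), Lasso penalized regression, is commonly fit by cyclic coordinate descent with soft-thresholding; a self-contained sketch on simulated data (all names and settings below are illustrative, not from the review):

```python
import numpy as np

def soft(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual for j
            b[j] = soft(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)                # standardize columns
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                   # three true signals
y = X @ beta + rng.normal(size=n)
print(np.round(lasso_cd(X, y, lam=0.1), 2))   # mostly zeros past index 2
```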
Breaking Bad: Two Decades of Life-Course Data Analysis in Criminology, Developmental Psychology, and Beyond
Vol. 1 (2014), pp. 301–332
Studies of human development require longitudinal data analysis methods that describe within- and between-individual variation in developmental and behavioral trajectories. This article reviews life-course data analysis methods for modeling these trajectories, as well as their application in studies of antisocial behavior and of crime in childhood, in adolescence, and throughout life. We set the stage by introducing growth curve (hierarchical linear) models. We focus our review on finite mixture models for life-course data, known as group-based trajectory and growth mixture models. We then discuss how these models are applied within criminology and developmental psychology, recent controversies over their substantive use and interpretation, and important issues of statistical practice and the challenges they raise. Building on the critical literature, we offer several recommendations for the applied users of the models. Finally, we present the most recent method of examining behavioral trajectories in criminology, the unimodal curve registration (UCR) approach. We briefly contrast the UCR model with growth curve and finite mixture models for life-course data analysis.
Event History Analysis
Vol. 1 (2014), pp. 333–360
Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finite-state Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability.
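Censoring, one of the defining features noted above, is handled in the simplest survival setting by the Kaplan-Meier estimator; a compact sketch with invented data:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate; events[i] = 1 for an observed
    event, 0 for a right-censored observation."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                 # still under observation
        died = np.sum((times == t) & (events == 1))  # events exactly at t
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve

# Six subjects; zeros mark censored follow-up times.
print(kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1]))
```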
Statistical Evaluation of Forensic DNA Profile Evidence
Vol. 1 (2014), pp. 361–384
The evaluation of weight of evidence for forensic DNA profiles has been a subject of controversy since their introduction over 20 years ago. Substantial progress has been made for standard DNA profiles, but new issues have arisen in recent years with the advent of more sensitive profiling techniques, allowing profiles to be recovered from minuscule amounts of possibly degraded DNA. These low-template DNA profiles suffer from enhanced stochastic effects, including drop-in, drop-out, and stutter, which pose problems for DNA profile evaluation. These problems are now beginning to be overcome with the emergence of several statistical models and software. We first review the general principles of statistical evaluation of DNA profile evidence, and we then focus on low-template DNA profiles, briefly reviewing the main statistical models and software. We cover methods that use allele presence/absence and those that use electropherogram peak heights, focusing on the likelihood ratio as a measure of evidential weight.
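For a standard single-source profile, the likelihood-ratio logic reduces at one locus to the reciprocal of the matching genotype's population frequency; a toy computation with illustrative allele frequencies (not from any real database):

```python
# Likelihood ratio for a single-source profile matching the suspect at
# one locus, under Hardy-Weinberg: Hp says the suspect is the source,
# Hd says an unknown, unrelated person is.
p_a, p_b = 0.10, 0.05          # frequencies of the two alleles observed

match_prob = 2 * p_a * p_b     # heterozygote genotype probability
lr = 1.0 / match_prob          # LR = P(E|Hp) / P(E|Hd) = 1 / match_prob
print(f"single-locus LR: {lr:.0f}")  # = 100
```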
Using League Table Rankings in Public Policy Formation: Statistical Issues
Vol. 1 (2014), pp. 385–399
This article reviews the statistical models that underpin institutional comparisons on the basis of outcome measures for their students. These multilevel models are developed to levels of complexity that match the problems posed. The strengths and limitations of inferences from these models are explored with examples taken from education.
Statistical Ecology
Vol. 1 (2014), pp. 401–426
Statistical ecology deals with the development of new methodologies for analyzing ecological data. Advanced statistical models and techniques are often needed to provide robust analyses of the available data. The statistical models that are developed can often be separated into two distinct processes: a system process that describes the underlying biological system and an observation process that describes the data collection process. The system process is often a function of the demographic parameters of interest, such as survival probabilities, transition rates between states, and/or abundance, whereas the model parameters associated with the observation process are conditional on the underlying state of the system. This review focuses on a number of common forms of ecological data and discusses their associated models and model-fitting approaches, including the incorporation of heterogeneity within the given biological system and the integration of different data sources.
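A minimal instance of the system/observation split described above is two-sample capture-recapture, where the classical Lincoln-Petersen estimator recovers abundance; the counts below are invented for illustration.

```python
# Two-sample capture-recapture (Lincoln-Petersen). The system quantity
# is abundance N; the observation process is who gets caught and marked.
n1 = 200   # animals captured and marked on the first occasion
n2 = 150   # animals captured on the second occasion
m2 = 30    # of those, the number already carrying a mark
N_hat = n1 * n2 / m2
print(f"estimated abundance: {N_hat:.0f}")  # 1000
```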
Estimating the Number of Species in Microbial Diversity Studies
John Bunge, Amy Willis, and Fiona Walsh
Vol. 1 (2014), pp. 427–445
For decades, statisticians have studied the species problem: how to estimate the total number of species, observed plus unobserved, in a population. This problem dates at least as far back as 1943, to a paper by R.A. Fisher. These methods have found many applications in general ecology, but their importance has grown considerably in recent years, driven by the introduction of high-throughput DNA sequencing into microbial ecology. We examine the state of the art in terms of estimating the total number of taxa in a microbial population from a sample of sequences. We focus mainly on estimating the number of species within a single population (α-diversity), but we also briefly consider statistical inference for comparing the numbers of species across populations (β-diversity). We discuss the full range of statistical techniques, parametric and nonparametric as well as frequentist and Bayesian, and specific implications of their use in microbial diversity studies. We conclude with some recommendations for theoretical investigation and computational tool development.
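Among the nonparametric techniques in this literature, the best-known lower bound on total richness is Chao1, built from singleton and doubleton counts; a small sketch with an invented abundance vector:

```python
import numpy as np

def chao1(counts):
    """Chao1 lower-bound estimate of total species richness from a
    vector of per-species abundance counts in one sample."""
    counts = np.asarray(counts)
    s_obs = np.sum(counts > 0)
    f1 = np.sum(counts == 1)   # singletons
    f2 = np.sum(counts == 2)   # doubletons
    if f2 == 0:                # bias-corrected form avoids division by 0
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1**2 / (2.0 * f2)

# Ten observed taxa, heavy on rare ones, as is typical for microbes.
print(chao1([1, 1, 1, 1, 2, 2, 3, 5, 10, 20]))  # 10 + 16/4 = 14
```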
Dynamic Treatment Regimes
Vol. 1 (2014), pp. 447–464
A dynamic treatment regime consists of a sequence of decision rules, one per stage of intervention, that dictate how to individualize treatments to patients, based on evolving treatment and covariate history. These regimes are particularly useful for managing chronic disorders and fit well into the larger paradigm of personalized medicine. They provide one way to operationalize a clinical decision support system. Statistics plays a key role in the construction of evidence-based dynamic treatment regimes, informing the best study design as well as efficient estimation and valid inference. Owing to the many novel methodological challenges this area offers, it has been growing in popularity among statisticians in recent years. In this article, we review the key developments in this exciting field of research. In particular, we discuss sequential multiple assignment randomized trial designs, estimation techniques like Q-learning and marginal structural models, and several inference techniques designed to address the associated nonstandard asymptotics. We reference software whenever available. We also outline some important future directions.
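A minimal sketch of two-stage Q-learning with linear working models, run on simulated trial data; the generative model and feature choices are mine, not from the article.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
# Hypothetical two-stage trial: state s_t, randomized treatment a_t in
# {0, 1}, and a final outcome y (larger is better).
s1 = rng.normal(size=n)
a1 = rng.integers(0, 2, n)
s2 = 0.5 * s1 + 0.4 * a1 + rng.normal(size=n)
a2 = rng.integers(0, 2, n)
y = s2 + a2 * (1.0 - s2) + rng.normal(size=n)   # stage-2 effect flips with s2

def fit_q(features, target):
    """Least-squares fit; returns the coefficients of a working model."""
    return np.linalg.lstsq(features, target, rcond=None)[0]

# Stage 2: regress y on (1, s2, a2, a2*s2), then maximize over a2.
X2 = np.column_stack([np.ones(n), s2, a2, a2 * s2])
c2 = fit_q(X2, y)
q2 = lambda s, a: c2[0] + c2[1] * s + c2[2] * a + c2[3] * a * s
v2 = np.maximum(q2(s2, 0), q2(s2, 1))   # value under the best stage-2 rule

# Stage 1: regress the pseudo-outcome v2 on (1, s1, a1, a1*s1).
X1 = np.column_stack([np.ones(n), s1, a1, a1 * s1])
c1 = fit_q(X1, v2)
print("stage-2 rule: treat when", f"{c2[2]:.2f} + {c2[3]:.2f}*s2 > 0")
print("stage-1 rule: treat when", f"{c1[2]:.2f} + {c1[3]:.2f}*s1 > 0")
```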