Annual Review of Statistics and Its Application - Volume 5, 2018
Election Polls—A Survey, A Critique, and Proposals
Vol. 5 (2018), pp. 1–24
Election polls, also called election surveys, have come under severe criticism because of apparent gaps between their outcomes and election results. In this article, we survey election poll performance in the United States, United Kingdom, Canada, and Israel and discuss the current state of the art. We list the main data collection methods used in election surveys, describe a wide range of analysis techniques that can be applied to such data, and expand on the relatively new application of predictive models in this context. A special section considers sources of error in election surveys, followed by an introduction to and general discussion of an information quality framework for studying them. We conclude with a section on outlooks and proposals that require more research.
Web-Based Enrollment and Other Types of Self-Selection in Surveys and Studies: Consequences for Generalizability
Vol. 5 (2018), pp. 25–47
Web-based enrollment in surveys and studies is increasingly attractive as Internet coverage approaches universality and respondents' willingness to participate through classical modes of study deteriorates. Follow-up is also facilitated by the web-based approach. However, the consequent self-selection raises the question of how important representativity is when attempting to generalize the results of a study beyond the context in which they were obtained, particularly under effect heterogeneity. Our review is divided into three main components: first, sample surveys or prevalence studies, which assess the frequency or prevalence of some attitude or disease condition in a population from its frequency in a sample from that population; second, generalization of the results of randomized trials to the population in which they were performed and to other populations; and third, generalization of results from observational studies.
Issues and Challenges in Census Taking
Vol. 5 (2018), pp. 49–63
In recent decades, census taking around the world has faced major challenges, including cost pressures; concerns about intrusiveness, privacy, and response burden; reduced cooperation; difficulties in accessing secure apartments and enumerating unsafe areas; more complex living arrangements; and timeliness concerns. National statistical offices have responded to these concerns with various methodological developments, including the use of new technologies and of sampling in traditional census taking. There has also been a shift from traditional census methods to greater use of registers, as pioneered in the Nordic countries, and to the combined use of registers and sample surveys. In addition to reviewing such developments, this article reviews developments in associated statistical methodology, including record linkage and coverage adjustment methods. The article concludes by discussing possible future developments in census taking around the world.
Methods for Inference from Respondent-Driven Sampling Data
Vol. 5 (2018), pp. 65–93
Respondent-driven sampling is a commonly used method for sampling from hard-to-reach human populations connected by an underlying social network of relations. Beginning with a convenience sample, participants pass coupons to invite their contacts to join the sample. Although the method is often effective at attaining large and varied samples, its reliance on convenience samples, social network contacts, and participant decisions makes it subject to a large number of statistical concerns. This article reviews inferential methods available for data collected by respondent-driven sampling.
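To give a flavor of the estimators reviewed, the widely used Volz–Heckathorn estimator of a population mean weights each respondent inversely by his or her reported network degree (a sketch of the standard form; the notation here is ours, not the article's):

\[
\hat{\mu}_{VH} = \frac{\sum_{i \in S} y_i / d_i}{\sum_{i \in S} 1 / d_i},
\]

where \(S\) is the set of sampled individuals, \(y_i\) the outcome of interest, and \(d_i\) the reported degree. The inverse-degree weights compensate for the higher inclusion probability of well-connected individuals under a random-walk approximation to the coupon-passing recruitment process.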
Multiple Systems Estimation (or Capture-Recapture Estimation) to Inform Public Policy
Sheila M. Bird and Ruth King
Vol. 5 (2018), pp. 95–118
Applications of estimating population sizes range from estimating human or ecological population size within regions or countries to estimating the hidden number of civilian casualties in war. Total enumeration via a census is typically infeasible. However, a series of partial enumerations of a population is often possible, leading to capture-recapture methods, which have been extensively used in ecology to estimate the size of wildlife populations with an associated measure of uncertainty and are most effectively applied when there are multiple capture occasions. Capture-recapture methodology can be more widely applied to multiple data sources by linking individuals across multiple lists, an approach often referred to as multiple systems estimation (MSE). The MSE approach is preferred when estimating capture-shy or hard-to-reach populations, including those who are caught up in the criminal justice system, trafficked, or civilian casualties of war. Motivated by the public policy applications of MSE, each briefly introduced, we discuss practical problems with methodological implications. These include period definition; case definition; scenarios in which an observed count is not a true count of the population of interest but an upper bound due to mismatched definitions; exact or probabilistic matching of cases across lists; demographic or other information about a case that influences capture propensities; permissions to access lists; list creation by research teams or interested parties; referrals (when presence on list A results, almost surely, in presence on list B); different mathematical models leading to widely different estimated population sizes; uncertainty in estimation; computational efficiency; external validation; hypothesis generation; and additional independent external information. Returning to our motivating applications, we focus finally on whether the uncertainty that qualified their estimates was sufficiently narrow to orient public policy.
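As a minimal illustration of the two-list case, the classical Lincoln–Petersen estimator (in its Chapman bias-corrected form) can be sketched as below; the numbers are invented, and the independence and homogeneity assumptions the estimator requires are exactly the kind of practical concern the article discusses.

```python
def chapman_estimate(n1, n2, m):
    """Two-list capture-recapture (multiple systems) estimate of total
    population size, using the Chapman bias-corrected form of the
    Lincoln-Petersen estimator. Assumes the two lists are independent
    and capture probabilities are homogeneous across individuals.

    n1: cases on list A; n2: cases on list B; m: cases matched to both.
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Invented example: 400 cases on list A, 300 on list B, 60 matched.
# 640 distinct cases were observed, so ~1978 - 640 = ~1338 were hidden.
print(round(chapman_estimate(400, 300, 60)))  # ~1978
```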
Words, Words, Words: How the Digital Humanities Are Integrating Diverse Research Fields to Study People
Vol. 5 (2018), pp. 119–139
The rapidly developing field of digital humanities (DH) is showing how unprecedented volumes of data such as written expression can be studied to reveal new insights into humans and, therefore, into individual and collective experiences within and across societies. Scholars from disciplines such as literature and history are collaborating with scientists from disciplines such as statistics and computer science. Moreover, these interdisciplinary teams often reach beyond campuses to companies as well as local, national, and international public and nonprofit institutions. Surprisingly, the computational research that began in the humanities in the 1950s did not develop an important presence within mainstream scholarship until half a century later. The DH experiences thus far reflect the complexity of both human expression and research collaborations across diverse fields and sectors. Learning from past successes and failures will help meet today's data analytic challenges and prepare us for opportunities in statistical applications ranging from literary studies and cybersecurity to business intelligence and health indicators.
Toward Integrative Bayesian Analysis in Molecular Biology
Vol. 5 (2018), pp. 141–167
In the postgenome era, multiple types of molecular data for the same set of samples are often available and should be analyzed jointly in an integrative analysis in order to maximize the information gain. Bayesian methods are particularly well suited for integrating different biological data sources. In this article, we cover crucial tasks and corresponding methods with a focus on integrative analyses. We emphasize gene prioritization, model-based cluster approaches for subgroup identification, regression modeling, and prediction, as well as structure learning using network models. Our review introduces prior concepts for sparsity and variable selection and concludes with some aspects of validation and computation.
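As a point of reference for the sparsity priors mentioned here, a canonical spike-and-slab formulation for a regression coefficient \(\beta_j\) (a standard construction in our notation, not necessarily the one used in the article) is

\[
\beta_j \mid \gamma_j \sim \gamma_j\, \mathcal{N}(0, \tau^2) + (1 - \gamma_j)\, \delta_0, \qquad \gamma_j \sim \mathrm{Bernoulli}(\pi),
\]

where the binary indicator \(\gamma_j\) selects variable \(j\) and the point mass \(\delta_0\) shrinks excluded coefficients exactly to zero, so posterior inclusion probabilities can drive gene prioritization.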
Personalized Cancer Genomics
Vol. 5 (2018), pp. 169–182
Developments in biotechnology have enabled the design of whole genome assays for nucleic acid sequencing, gene expression monitoring, gene copy number evaluation, and epigenetic silencing that have had major effects on biology, cancer drug development, and clinical trial design. Because cancer is a disease of DNA alteration, these developments have had a particularly important effect on the development of personalized oncology. Facilitating this transition has been the development of statistical methods for transforming these high-dimensional data into useful biological information, p > n classification methods, and new designs for clinical trials. In this article, we review some of the key statistical developments in this area.
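To make the p > n setting concrete, here is a minimal sketch of sparse classification via L1-penalized logistic regression on simulated data; the data, parameter values, and penalty choice are our illustration, not the specific methods of the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 100, 5000                       # p >> n, as with gene-expression arrays
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.5                        # only 10 genes truly informative
y = (X @ beta + rng.standard_normal(n) > 0).astype(int)

# An L1 penalty performs variable selection, which is what makes
# classification feasible when p > n.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("genes selected:", int(np.sum(clf.coef_ != 0)))
```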
Computational Neuroscience: Mathematical and Statistical Perspectives
Robert E. Kass, Shun-Ichi Amari, Kensuke Arai, Emery N. Brown, Casey O. Diekman, Markus Diesmann, Brent Doiron, Uri T. Eden, Adrienne L. Fairhall, Grant M. Fiddyment, Tomoki Fukai, Sonja Grün, Matthew T. Harrison, Moritz Helias, Hiroyuki Nakahara, Jun-nosuke Teramae, Peter J. Thomas, Mark Reimers, Jordan Rodu, Horacio G. Rotstein, Eric Shea-Brown, Hideaki Shimazaki, Shigeru Shinomoto, Byron M. Yu, and Mark A. Kramer
Vol. 5 (2018), pp. 183–214
Mathematical and statistical models have played important roles in neuroscience, especially by describing the electrical activity of neurons recorded individually or collectively across large networks. As the field moves forward rapidly, new challenges are emerging. For maximal effectiveness, those working to advance computational neuroscience will need to appreciate and exploit the complementary strengths of mechanistic theory and the statistical paradigm.
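As a canonical example of the mechanistic models contrasted here with the statistical paradigm (our illustration, not drawn from the article), the leaky integrate-and-fire neuron describes the membrane potential \(V(t)\) by

\[
\tau_m \frac{dV}{dt} = -\big(V - V_{\mathrm{rest}}\big) + R\, I(t),
\]

with a spike recorded, and \(V\) reset, whenever \(V\) crosses a threshold \(V_{\mathrm{th}}\); statistical models such as point-process regressions instead describe the recorded spike trains directly.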
Review of State-Space Models for Fisheries Science
Vol. 5 (2018), pp. 215–235
Fisheries science is concerned with understanding and managing the raising and harvesting of fish. Fish stocks are assessed using biological and fisheries data with the goal of estimating either their total population size or their biomass. Stock assessment models also make it possible to predict how stocks will respond to varying levels of fishing pressure in the future. Such tools are essential now that overfishing is reducing stocks and employment worldwide, with many serious social, economic, and environmental implications. Increasingly, a state-space framework is being used in place of deterministic and standard parametric stock assessment models. These efforts have not only had considerable impact on fisheries management but have also advanced the supporting statistical theory and inference tools as well as the required software. An application of such techniques to the North Sea cod stock highlights what should be considered best practices for science-based fisheries management.
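For orientation, a simple state-space surplus production model of the general kind reviewed (a sketch in our notation, not the article's specific model) couples a process equation for biomass \(B_t\) with an observation equation for a survey index \(I_t\):

\[
\log B_{t+1} = \log\!\big(B_t + g(B_t) - C_t\big) + \eta_t, \qquad \eta_t \sim \mathcal{N}\big(0, \sigma^2_{\mathrm{proc}}\big),
\]
\[
\log I_t = \log(q B_t) + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}\big(0, \sigma^2_{\mathrm{obs}}\big),
\]

where \(C_t\) is the observed catch, \(g\) a production function, and \(q\) the survey catchability. The state-space form separates process variability from observation error, which deterministic assessment models conflate.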
Statistical Challenges in Assessing the Engineering Properties of Forest Products
James V. Zidek and Conroy Lum
Vol. 5 (2018), pp. 237–264
While the traditional approach to engineering materials aims to reduce their variability through a (generally energy-intensive) refining process, a more environmentally appropriate approach is to keep the product as natural as possible, quantify its variability, and reduce that variability only as necessary. This is the approach taken for sawn lumber and other solid wood products, made possible by the application of advanced statistical theory in the manufacturing, grading, and evaluation processes and in engineering design models. This article reviews a number of statistical advances related to these objectives, and it ends with a view of a future characterized by engineered wood products and the use of high technology.
Overview and History of Statistics for Equity Markets
Vol. 5 (2018), pp. 265–288
This article surveys the evolution of stock market trading over a 60-year period. It begins before 1960, when there was no widely available database for conducting statistical analyses of stock price movements. This changed in the 1960s with the introduction of the Center for Research in Security Prices database. A major finding was the heavy-tailed nature of stock returns. The 1960s also brought major theoretical developments, including the martingale theory of stock price processes and the efficient market hypothesis. This hypothesis prevailed until the 1990s, when the discovery of market anomalies led to statistical arbitrage strategies. We describe the use of modern machine learning methods, such as AdaBoost and random forests, which can combine some of these strategies into an improved trading strategy. The twenty-first century has been marked by the rapid evolution of electronic markets and the rise of computer-driven high-frequency trading based on computing technology, low-latency access, and limit order book modeling.
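A toy sketch of the ensemble idea mentioned above, with simulated signals and returns; the signal names and coefficients are invented for illustration and are not the article's strategies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
T = 1000
# Hypothetical anomaly signals (e.g., momentum, value, reversal scores)
signals = rng.standard_normal((T, 3))
# Synthetic next-period direction, loosely driven by the signals
up = (signals @ np.array([0.5, 0.3, -0.2]) + rng.standard_normal(T) > 0)

# A random forest combines the individual signals into one trading rule,
# in the spirit of the strategy ensembles described above.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(signals[:800], up[:800])
print("out-of-sample hit rate:", model.score(signals[800:], up[800:]))
```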
Statistical Modeling for Health Economic Evaluations
Vol. 5 (2018), pp. 289–309
Health economic evaluation has become increasingly important in medical research and has recently been placed on solid statistical and decision-theoretic foundations, particularly under the Bayesian approach. In this article, we review the basic concepts and issues associated with the statistical and decision-theoretic components of health economic evaluations. We present examples of typical models used in different contexts (depending on the availability of data). We also describe the process of uncertainty analysis, a crucial component of economic evaluations for health care interventions, aimed at assessing the impact of uncertainty in the model parameters on the final decision-making process. Finally, we discuss some of the most recent methodological developments, related to the application of advanced statistical models (e.g., Gaussian process regression) to facilitate the application of computationally expensive tools such as value of information analysis.
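For reference, the central quantity in value of information analysis, the expected value of perfect information (EVPI), can be written in the standard form (our notation) using the net benefit \(\mathrm{NB}(d, \theta)\) of decision \(d\) under parameters \(\theta\):

\[
\mathrm{EVPI} = \mathbb{E}_{\theta}\Big[\max_{d}\, \mathrm{NB}(d, \theta)\Big] - \max_{d}\, \mathbb{E}_{\theta}\big[\mathrm{NB}(d, \theta)\big],
\]

that is, the expected gain from resolving parameter uncertainty before choosing an intervention. Evaluating the inner expectations by brute-force Monte Carlo is expensive, which motivates approximations such as the Gaussian process regression mentioned above.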
Cure Models in Survival Analysis
Vol. 5 (2018), pp. 311–342
When analyzing time-to-event data, it often happens that a certain fraction of the data corresponds to subjects who will never experience the event of interest. These event times are considered infinite, and the subjects are said to be cured. Survival models that take this feature into account are commonly referred to as cure models. This article reviews the literature on cure regression models in which the event time (response) is subject to random right censoring and has a positive probability of being infinite.
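The most common formulation, the mixture cure model (a standard form, included here for orientation), writes the marginal survival function as

\[
S(t) = \pi + (1 - \pi)\, S_u(t),
\]

where \(\pi\) is the probability of being cured and \(S_u\) is the survival function of the uncured subjects. Because \(S_u(t) \to 0\) as \(t \to \infty\), the overall survival curve plateaus at the cure fraction \(\pi\), which is typically linked to covariates through a logistic regression.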
Social Network Modeling
Vol. 5 (2018), pp. 343–369
The development of stochastic models for the analysis of social networks is an important growth area in contemporary statistics. The last few decades have witnessed the rapid development of a variety of statistical models capable of representing the global structure of an observed network in terms of underlying generating mechanisms. The distinctive feature of statistical models for social networks is their ability to represent directly the dependence relations that these mechanisms entail. In this review, we focus on models for single network observations, particularly on the family of exponential random graph models. After defining the models, we discuss issues of model specification, estimation, and assessment. We then review model extensions for the analysis of other types of network data, provide an empirical example, and give a selective overview of empirical studies that have adopted the basic model and its many variants. We conclude with an outline of the current analytical challenges.
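For readers new to the family, an exponential random graph model assigns probability to an observed network \(y\) in the standard form

\[
P_{\theta}(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\]

where \(g(y)\) is a vector of network statistics (e.g., counts of edges, triangles, or degree patterns), \(\theta\) the corresponding parameters, and \(\kappa(\theta)\) a normalizing constant summing over all possible graphs. The intractability of \(\kappa(\theta)\) drives much of the estimation methodology such reviews discuss.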
Causal Structure Learning
Vol. 5 (2018), pp. 371–391
Graphical models can represent a multivariate distribution in a convenient and accessible form as a graph. Causal models can be viewed as a special class of graphical models that represent not only the distribution of the observed system but also the distributions under external interventions. They hence enable predictions under hypothetical interventions, which is important for decision making. The challenging task of learning causal models from data always relies on some underlying assumptions. We discuss several recently proposed structure learning algorithms and their assumptions, and we compare their empirical performance under various scenarios.
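To make the link between graph and interventions concrete: for a causal DAG with factorization \(p(x) = \prod_k p(x_k \mid \mathrm{pa}_k)\), the distribution under an intervention that sets \(X_j\) to \(\tilde{x}_j\) follows the standard truncated factorization (our notation):

\[
p\big(x \mid \mathrm{do}(X_j = \tilde{x}_j)\big) = \prod_{k \neq j} p\big(x_k \mid \mathrm{pa}_k\big)\Big|_{x_j = \tilde{x}_j},
\]

so that once the graph is learned, interventional prediction reduces to dropping the factor for the intervened variable.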
On p-Values and Bayes Factors
Leonhard Held and Manuela Ott
Vol. 5 (2018), pp. 393–419
The p-value quantifies the discrepancy between the data and a null hypothesis of interest, usually the assumption of no difference or no effect. A Bayesian approach allows the calibration of p-values by transforming them to direct measures of the evidence against the null hypothesis, so-called Bayes factors. We review the available literature in this area and consider two-sided significance tests for a point null hypothesis in more detail. We distinguish simple from local alternative hypotheses and contrast traditional Bayes factors based on the data with Bayes factors based on p-values or test statistics. A well-known finding is that the minimum Bayes factor, the smallest possible Bayes factor within a certain class of alternative hypotheses, provides less evidence against the null hypothesis than the corresponding p-value might suggest. It is less well known that the relationship between p-values and minimum Bayes factors also depends on the sample size and on the dimension of the parameter of interest. We illustrate the transformation of p-values to minimum Bayes factors with two examples from clinical research.
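Two classical bounds of the kind studied here can be computed directly from a two-sided p-value; the sketch below uses standard results from this literature, though the article's treatment is considerably richer.

```python
import numpy as np
from scipy.stats import norm

def min_bf_simple(p):
    """Minimum Bayes factor exp(-z^2/2) over simple normal alternatives
    for a two-sided p-value (Edwards, Lindman & Savage 1963)."""
    z = norm.ppf(1 - p / 2)
    return np.exp(-z ** 2 / 2)

def min_bf_local(p):
    """The -e p log(p) lower bound over a class of local alternatives
    (Sellke, Bayarri & Berger 2001); valid for p < 1/e."""
    return -np.e * p * np.log(p)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: simple {min_bf_simple(p):.3f}, local {min_bf_local(p):.3f}")
```

For example, p = 0.05 gives a minimum Bayes factor of about 0.15 under simple alternatives and about 0.41 under local alternatives, both far less striking than "1 in 20" intuition suggests.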
Particle Filters and Data Assimilation
Vol. 5 (2018), pp. 421–449
State-space models can be used to incorporate subject-matter knowledge about the underlying dynamics of a time series through the introduction of a latent Markov state process. A user can specify the dynamics of this process together with how the state relates to the partial and noisy observations that have been made. Inference and prediction then involve solving a challenging inverse problem: calculating the conditional distribution of quantities of interest given the observations. This article reviews Monte Carlo algorithms for solving this inverse problem, covering methods based on the particle filter and the ensemble Kalman filter. We discuss the challenges posed by models with high-dimensional states, joint estimation of parameters and the state, and inference for the history of the state process. We also point out some potential new developments that will be important for tackling cutting-edge filtering applications.
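A minimal sketch of the bootstrap particle filter on a toy linear-Gaussian model; the model, parameter values, and function names are ours, chosen only to illustrate the propagate-weight-resample cycle.

```python
import numpy as np

def bootstrap_pf(y, n_particles=1000, phi=0.95, sig_x=1.0, sig_y=1.0, seed=0):
    """Bootstrap particle filter for the toy state-space model
        x_t = phi * x_{t-1} + N(0, sig_x^2)   (latent state)
        y_t = x_t + N(0, sig_y^2)             (observation)
    Returns the filtered means E[x_t | y_1:t]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sig_x, n_particles)                # initial particle cloud
    means = []
    for yt in y:
        x = phi * x + rng.normal(0.0, sig_x, n_particles)  # propagate
        logw = -0.5 * ((yt - x) / sig_y) ** 2              # likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                        # filtered mean
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
    return np.array(means)

# Simulate from the model, then filter
rng = np.random.default_rng(1)
T, phi = 100, 0.95
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=T)
print(bootstrap_pf(y)[:5])
```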
Geometry and Dynamics for Markov Chain Monte Carlo
Vol. 5 (2018), pp. 451–471
Markov chain Monte Carlo methods have revolutionized mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains that explore probability densities. The method emerges from physics and geometry, and these links have been extensively studied over the past thirty years. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners, and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field, which we believe will become increasingly relevant to scientists.
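To fix notation, Hamiltonian Monte Carlo augments the target density \(\pi(q)\) with a momentum \(p\) and simulates the dynamics of the Hamiltonian (the standard construction, written in our notation)

\[
H(q, p) = -\log \pi(q) + \tfrac{1}{2}\, p^{\top} M^{-1} p,
\]

typically discretized with the leapfrog integrator,

\[
p_{t+\epsilon/2} = p_t + \tfrac{\epsilon}{2}\, \nabla \log \pi(q_t), \qquad
q_{t+\epsilon} = q_t + \epsilon\, M^{-1} p_{t+\epsilon/2}, \qquad
p_{t+\epsilon} = p_{t+\epsilon/2} + \tfrac{\epsilon}{2}\, \nabla \log \pi(q_{t+\epsilon}),
\]

whose volume preservation and reversibility are exactly the geometric properties such a review develops; a Metropolis accept-reject step then corrects for the discretization error.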