Annual Review of Statistics and Its Application - Volume 5, 2018
Election Polls—A Survey, A Critique, and Proposals
Vol. 5 (2018), pp. 1–24
Election polls, also called election surveys, have come under severe criticism because of apparent gaps between their outcomes and election results. In this article, we survey election poll performance in the United States, United Kingdom, Canada, and Israel and discuss the current state of the art. We list the main data collection methods used in election surveys, describe a wide range of analysis techniques that can be applied to such data, and expand on the relatively new application of predictive models in this context. A special section considers sources of error in election surveys, followed by an introduction to and general discussion of an information quality framework for studying them. We conclude with a section on outlooks and proposals that require more research.
Web-Based Enrollment and Other Types of Self-Selection in Surveys and Studies: Consequences for Generalizability
Vol. 5 (2018), pp. 25–47
Web-based enrollment in surveys and studies is increasingly attractive as Internet coverage approaches universality and respondents' willingness to participate through classical modes of study deteriorates. Follow-up is also facilitated by the web-based approach. However, the consequent self-selection raises the question of how important representativity is when attempting to generalize the results of a study beyond the context in which they were obtained, particularly under effect heterogeneity. Our review is divided into three main components: first, sample surveys or prevalence studies, which assess the frequency or prevalence of some attitude or disease condition in a population from its frequency in a sample from that population; second, generalization of the results of randomized trials to the population in which they were performed and to other populations; and third, generalization of results from observational studies.
Issues and Challenges in Census Taking
Vol. 5 (2018), pp. 49–63
In recent decades, census taking around the world has faced major challenges, including cost pressures; concerns about intrusiveness, privacy, and response burden; reduced cooperation; difficulties in accessing secure apartments and enumerating unsafe areas; more complex living arrangements; and timeliness concerns. National statistical offices have responded to these concerns with various methodological developments, including the use of new technologies and of sampling in traditional census taking. There has also been a shift from traditional census methods to greater use of registers, as pioneered in the Nordic countries, and to the combined use of registers and sample surveys. In addition to reviewing such developments, this article reviews developments in associated statistical methodology, including record linkage and coverage adjustment methods. The article concludes by discussing possible future developments in census taking around the world.
Methods for Inference from Respondent-Driven Sampling Data
Vol. 5 (2018), pp. 65–93
Respondent-driven sampling is a commonly used method for sampling from hard-to-reach human populations connected by an underlying social network of relations. Beginning with a convenience sample, participants pass coupons to invite their contacts to join the sample. Although the method is often effective at attaining large and varied samples, its reliance on convenience samples, social network contacts, and participant decisions makes it subject to a large number of statistical concerns. This article reviews inferential methods available for data collected by respondent-driven sampling.
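To give a flavor of the estimators reviewed, the widely used Volz–Heckathorn estimator of a population mean weights each respondent inversely by his or her reported network degree (a sketch of the standard form; the notation here is ours, not the article's):

\[
\hat{\mu}_{VH} = \frac{\sum_{i \in S} y_i / d_i}{\sum_{i \in S} 1 / d_i},
\]

where \(S\) is the set of sampled individuals, \(y_i\) the outcome of interest, and \(d_i\) the reported degree. The inverse-degree weights compensate for the higher inclusion probability of well-connected individuals under a random-walk approximation to the coupon-passing recruitment process.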
Multiple Systems Estimation (or Capture-Recapture Estimation) to Inform Public Policy
Sheila M. Bird and Ruth King
Vol. 5 (2018), pp. 95–118
Applications of estimating population sizes range from estimating human or ecological population size within regions or countries to estimating the hidden number of civilian casualties in war. Total enumeration via a census is typically infeasible. However, a series of partial enumerations of a population is often possible, leading to capture-recapture methods, which have been extensively used in ecology to estimate the size of wildlife populations with an associated measure of uncertainty and are most effectively applied when there are multiple capture occasions. Capture-recapture methodology can be more widely applied to multiple data sources by linking individuals across multiple lists, an approach often referred to as multiple systems estimation (MSE). The MSE approach is preferred when estimating capture-shy or hard-to-reach populations, including those who are caught up in the criminal justice system, trafficked, or civilian casualties of war. Motivated by the public policy applications of MSE, each briefly introduced, we discuss practical problems with methodological implications. These include period definition; case definition; scenarios in which an observed count is not a true count of the population of interest but an upper bound due to mismatched definitions; exact or probabilistic matching of cases across lists; demographic or other information about a case that influences capture propensities; permissions to access lists; list creation by research teams or interested parties; referrals (when presence on list A results, almost surely, in presence on list B); different mathematical models leading to widely different estimated population sizes; uncertainty in estimation; computational efficiency; external validation; hypothesis generation; and additional independent external information. Returning to our motivating applications, we focus finally on whether the uncertainty that qualified their estimates was sufficiently narrow to orient public policy.
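As a minimal illustration of the two-list case, the classical Lincoln–Petersen estimator (in its Chapman bias-corrected form) can be sketched as below; the numbers are invented, and the independence and homogeneity assumptions the estimator requires are exactly the kind of practical concern the article discusses.

```python
def chapman_estimate(n1, n2, m):
    """Two-list capture-recapture (multiple systems) estimate of total
    population size, using the Chapman bias-corrected form of the
    Lincoln-Petersen estimator. Assumes the two lists are independent
    and capture probabilities are homogeneous across individuals.

    n1: cases on list A; n2: cases on list B; m: cases matched to both.
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Invented example: 400 cases on list A, 300 on list B, 60 matched.
# 640 distinct cases were observed, so ~1978 - 640 = ~1338 were hidden.
print(round(chapman_estimate(400, 300, 60)))  # ~1978
```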
Words, Words, Words: How the Digital Humanities Are Integrating Diverse Research Fields to Study People
Vol. 5 (2018), pp. 119–139
The rapidly developing field of digital humanities (DH) is showing how unprecedented volumes of data such as written expression can be studied to reveal new insights into humans and, therefore, into individual and collective experiences within and across societies. Scholars from disciplines such as literature and history are collaborating with scientists from disciplines such as statistics and computer science. Moreover, these interdisciplinary teams often reach beyond campuses to companies as well as local, national, and international public and nonprofit institutions. Surprisingly, the computational research that began in the humanities in the 1950s did not develop an important presence within mainstream scholarship until half a century later. The DH experiences thus far reflect the complexity of both human expression and research collaborations across diverse fields and sectors. Learning from past successes and failures will help meet today's data analytic challenges and prepare us for opportunities in statistical applications ranging from literary studies and cybersecurity to business intelligence and health indicators.
Toward Integrative Bayesian Analysis in Molecular Biology
Vol. 5 (2018), pp. 141–167
In the postgenome era, multiple types of molecular data for the same set of samples are often available and should be analyzed jointly in an integrative analysis in order to maximize the information gain. Bayesian methods are particularly well suited for integrating different biological data sources. In this article, we cover crucial tasks and corresponding methods with a focus on integrative analyses. We emphasize gene prioritization, model-based cluster approaches for subgroup identification, regression modeling, and prediction, as well as structure learning using network models. Our review introduces prior concepts for sparsity and variable selection and concludes with some aspects of validation and computation.
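As a point of reference for the sparsity priors mentioned here, a canonical spike-and-slab formulation for a regression coefficient \(\beta_j\) (a standard construction in our notation, not necessarily the one used in the article) is

\[
\beta_j \mid \gamma_j \sim \gamma_j\, \mathcal{N}(0, \tau^2) + (1 - \gamma_j)\, \delta_0, \qquad \gamma_j \sim \mathrm{Bernoulli}(\pi),
\]

where the binary indicator \(\gamma_j\) selects variable \(j\) and the point mass \(\delta_0\) shrinks excluded coefficients exactly to zero, so posterior inclusion probabilities can drive gene prioritization.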
Personalized Cancer Genomics
Vol. 5 (2018), pp. 169–182
Developments in biotechnology have enabled the design of whole genome assays for nucleic acid sequencing, gene expression monitoring, gene copy number evaluation, and epigenetic silencing that have had major effects on biology, cancer drug development, and clinical trial design. Because cancer is a disease of DNA alteration, these developments have had a particularly important effect on the development of personalized oncology. Facilitating this transition has been the development of statistical methods for transforming these high-dimensional data into useful biological information, p > n classification methods, and new designs for clinical trials. In this article, we review some of the key statistical developments in this area.
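To make the p > n setting concrete, here is a minimal sketch of sparse classification via L1-penalized logistic regression on simulated data; the data, parameter values, and penalty choice are our illustration, not the specific methods of the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 100, 5000                       # p >> n, as with gene-expression arrays
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.5                        # only 10 genes truly informative
y = (X @ beta + rng.standard_normal(n) > 0).astype(int)

# An L1 penalty performs variable selection, which is what makes
# classification feasible when p > n.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("genes selected:", int(np.sum(clf.coef_ != 0)))
```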
Computational Neuroscience: Mathematical and Statistical Perspectives
Robert E. Kass, Shun-Ichi Amari, Kensuke Arai, Emery N. Brown, Casey O. Diekman, Markus Diesmann, Brent Doiron, Uri T. Eden, Adrienne L. Fairhall, Grant M. Fiddyment, Tomoki Fukai, Sonja Grün, Matthew T. Harrison, Moritz Helias, Hiroyuki Nakahara, Jun-nosuke Teramae, Peter J. Thomas, Mark Reimers, Jordan Rodu, Horacio G. Rotstein, Eric Shea-Brown, Hideaki Shimazaki, Shigeru Shinomoto, Byron M. Yu, and Mark A. Kramer
Vol. 5 (2018), pp. 183–214
Mathematical and statistical models have played important roles in neuroscience, especially by describing the electrical activity of neurons recorded individually or collectively across large networks. As the field moves forward rapidly, new challenges are emerging. For maximal effectiveness, those working to advance computational neuroscience will need to appreciate and exploit the complementary strengths of mechanistic theory and the statistical paradigm.
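As a canonical example of the mechanistic models contrasted here with the statistical paradigm (our illustration, not drawn from the article), the leaky integrate-and-fire neuron describes the membrane potential \(V(t)\) by

\[
\tau_m \frac{dV}{dt} = -\big(V - V_{\mathrm{rest}}\big) + R\, I(t),
\]

with a spike recorded, and \(V\) reset, whenever \(V\) crosses a threshold \(V_{\mathrm{th}}\); statistical models such as point-process regressions instead describe the recorded spike trains directly.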
Review of State-Space Models for Fisheries Science
Vol. 5 (2018), pp. 215–235
Fisheries science is concerned with understanding and managing the raising and harvesting of fish. Fish stocks are assessed using biological and fisheries data with the goal of estimating either their total population size or their biomass. Stock assessment models also make it possible to predict how stocks will respond to varying levels of fishing pressure in the future. Such tools are essential now that overfishing is reducing stocks and employment worldwide, with many serious social, economic, and environmental implications. Increasingly, a state-space framework is being used in place of deterministic and standard parametric stock assessment models. These efforts have not only had considerable impact on fisheries management but have also advanced the supporting statistical theory and inference tools as well as the required software. An application of such techniques to the North Sea cod stock highlights what should be considered best practices for science-based fisheries management.
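For orientation, a simple state-space surplus production model of the general kind reviewed (a sketch in our notation, not the article's specific model) couples a process equation for biomass \(B_t\) with an observation equation for a survey index \(I_t\):

\[
\log B_{t+1} = \log\!\big(B_t + g(B_t) - C_t\big) + \eta_t, \qquad \eta_t \sim \mathcal{N}\big(0, \sigma^2_{\mathrm{proc}}\big),
\]
\[
\log I_t = \log(q B_t) + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}\big(0, \sigma^2_{\mathrm{obs}}\big),
\]

where \(C_t\) is the observed catch, \(g\) a production function, and \(q\) the survey catchability. The state-space form separates process variability from observation error, which deterministic assessment models conflate.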
Statistical Challenges in Assessing the Engineering Properties of Forest Products
James V. Zidek and Conroy Lum
Vol. 5 (2018), pp. 237–264
While the traditional approach to engineering materials aims to reduce their variability through a (generally energy-intensive) refining process, a more environmentally appropriate approach is to keep the product as natural as possible, quantify its variability, and reduce that variability only as necessary. This is the approach taken for sawn lumber and other solid wood products, made possible by the application of advanced statistical theory in the manufacturing, grading, and evaluation processes and in engineering design models. This article reviews a number of statistical advances related to these objectives, and it ends with a view of a future characterized by engineered wood products and the use of high technology.
Overview and History of Statistics for Equity Markets
Vol. 5 (2018), pp. 265–288
This article surveys the evolution of stock market trading over a 60-year period. It begins before 1960, when there was no widely available database for conducting statistical analyses of stock price movements. This changed in the 1960s with the introduction of the Center for Research in Security Prices database. A major finding was the heavy-tailed nature of stock returns. The 1960s also brought major theoretical developments, including the martingale theory of stock price processes and the efficient market hypothesis. This hypothesis prevailed until the 1990s, when the discovery of market anomalies led to statistical arbitrage strategies. We describe the use of modern machine learning methods, such as AdaBoost and random forests, which can combine some of these strategies into an improved trading strategy. The twenty-first century has been marked by the rapid evolution of electronic markets and the rise of computer-driven high-frequency trading based on computing technology, low-latency access, and limit order book modeling.
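A toy sketch of the ensemble idea mentioned above, with simulated signals and returns; the signal names and coefficients are invented for illustration and are not the article's strategies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
T = 1000
# Hypothetical anomaly signals (e.g., momentum, value, reversal scores)
signals = rng.standard_normal((T, 3))
# Synthetic next-period direction, loosely driven by the signals
up = (signals @ np.array([0.5, 0.3, -0.2]) + rng.standard_normal(T) > 0)

# A random forest combines the individual signals into one trading rule,
# in the spirit of the strategy ensembles described above.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(signals[:800], up[:800])
print("out-of-sample hit rate:", model.score(signals[800:], up[800:]))
```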
Statistical Modeling for Health Economic Evaluations
Vol. 5 (2018), pp. 289–309
Health economic evaluation has become increasingly important in medical research and has recently been placed on solid statistical and decision-theoretic foundations, particularly under the Bayesian approach. In this article, we review the basic concepts and issues associated with the statistical and decision-theoretic components of health economic evaluations. We present examples of typical models used in different contexts (depending on the availability of data). We also describe the process of uncertainty analysis, a crucial component of economic evaluations for health care interventions, aimed at assessing the impact of uncertainty in the model parameters on the final decision-making process. Finally, we discuss some of the most recent methodological developments, related to the application of advanced statistical models (e.g., Gaussian process regression) to facilitate the application of computationally expensive tools such as value of information analysis.
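For reference, the central quantity in value of information analysis, the expected value of perfect information (EVPI), can be written in the standard form (our notation) using the net benefit \(\mathrm{NB}(d, \theta)\) of decision \(d\) under parameters \(\theta\):

\[
\mathrm{EVPI} = \mathbb{E}_{\theta}\Big[\max_{d}\, \mathrm{NB}(d, \theta)\Big] - \max_{d}\, \mathbb{E}_{\theta}\big[\mathrm{NB}(d, \theta)\big],
\]

that is, the expected gain from resolving parameter uncertainty before choosing an intervention. Evaluating the inner expectations by brute-force Monte Carlo is expensive, which motivates approximations such as the Gaussian process regression mentioned above.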
Cure Models in Survival Analysis
Vol. 5 (2018), pp. 311–342
When analyzing time-to-event data, it often happens that a certain fraction of the data corresponds to subjects who will never experience the event of interest. These event times are considered infinite, and the subjects are said to be cured. Survival models that take this feature into account are commonly referred to as cure models. This article reviews the literature on cure regression models in which the event time (response) is subject to random right censoring and has a positive probability of being infinite.
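The most common formulation, the mixture cure model (a standard form, included here for orientation), writes the marginal survival function as

\[
S(t) = \pi + (1 - \pi)\, S_u(t),
\]

where \(\pi\) is the probability of being cured and \(S_u\) is the survival function of the uncured subjects. Because \(S_u(t) \to 0\) as \(t \to \infty\), the overall survival curve plateaus at the cure fraction \(\pi\), which is typically linked to covariates through a logistic regression.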
Social Network Modeling
Vol. 5 (2018), pp. 343–369
The development of stochastic models for the analysis of social networks is an important growth area in contemporary statistics. The last few decades have witnessed the rapid development of a variety of statistical models capable of representing the global structure of an observed network in terms of underlying generating mechanisms. The distinctive feature of statistical models for social networks is their ability to represent directly the dependence relations that these mechanisms entail. In this review, we focus on models for single network observations, particularly on the family of exponential random graph models. After defining the models, we discuss issues of model specification, estimation, and assessment. We then review model extensions for the analysis of other types of network data, provide an empirical example, and give a selective overview of empirical studies that have adopted the basic model and its many variants. We conclude with an outline of the current analytical challenges.
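For readers new to the family, an exponential random graph model assigns probability to an observed network \(y\) in the standard form

\[
P_{\theta}(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\]

where \(g(y)\) is a vector of network statistics (e.g., counts of edges, triangles, or degree patterns), \(\theta\) the corresponding parameters, and \(\kappa(\theta)\) a normalizing constant summing over all possible graphs. The intractability of \(\kappa(\theta)\) drives much of the estimation methodology such reviews discuss.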
Causal Structure Learning
Vol. 5 (2018), pp. 371–391
Graphical models can represent a multivariate distribution in a convenient and accessible form as a graph. Causal models can be viewed as a special class of graphical models that represent not only the distribution of the observed system but also the distributions under external interventions. They hence enable predictions under hypothetical interventions, which is important for decision making. The challenging task of learning causal models from data always relies on some underlying assumptions. We discuss several recently proposed structure learning algorithms and their assumptions, and we compare their empirical performance under various scenarios.
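To make the link between graph and interventions concrete: for a causal DAG with factorization \(p(x) = \prod_k p(x_k \mid \mathrm{pa}_k)\), the distribution under an intervention that sets \(X_j\) to \(\tilde{x}_j\) follows the standard truncated factorization (our notation):

\[
p\big(x \mid \mathrm{do}(X_j = \tilde{x}_j)\big) = \prod_{k \neq j} p\big(x_k \mid \mathrm{pa}_k\big)\Big|_{x_j = \tilde{x}_j},
\]

so that once the graph is learned, interventional prediction reduces to dropping the factor for the intervened variable.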
On p-Values and Bayes Factors
Leonhard Held and Manuela Ott
Vol. 5 (2018), pp. 393–419
The p-value quantifies the discrepancy between the data and a null hypothesis of interest, usually the assumption of no difference or no effect. A Bayesian approach allows the calibration of p-values by transforming them to direct measures of the evidence against the null hypothesis, so-called Bayes factors. We review the available literature in this area and consider two-sided significance tests for a point null hypothesis in more detail. We distinguish simple from local alternative hypotheses and contrast traditional Bayes factors based on the data with Bayes factors based on p-values or test statistics. A well-known finding is that the minimum Bayes factor, the smallest possible Bayes factor within a certain class of alternative hypotheses, provides less evidence against the null hypothesis than the corresponding p-value might suggest. It is less well known that the relationship between p-values and minimum Bayes factors also depends on the sample size and on the dimension of the parameter of interest. We illustrate the transformation of p-values to minimum Bayes factors with two examples from clinical research.
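Two classical bounds of the kind studied here can be computed directly from a two-sided p-value; the sketch below uses standard results from this literature, though the article's treatment is considerably richer.

```python
import numpy as np
from scipy.stats import norm

def min_bf_simple(p):
    """Minimum Bayes factor exp(-z^2/2) over simple normal alternatives
    for a two-sided p-value (Edwards, Lindman & Savage 1963)."""
    z = norm.ppf(1 - p / 2)
    return np.exp(-z ** 2 / 2)

def min_bf_local(p):
    """The -e p log(p) lower bound over a class of local alternatives
    (Sellke, Bayarri & Berger 2001); valid for p < 1/e."""
    return -np.e * p * np.log(p)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: simple {min_bf_simple(p):.3f}, local {min_bf_local(p):.3f}")
```

For example, p = 0.05 gives a minimum Bayes factor of about 0.15 under simple alternatives and about 0.41 under local alternatives, both far less striking than "1 in 20" intuition suggests.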
Particle Filters and Data Assimilation
Vol. 5 (2018), pp. 421–449
State-space models can be used to incorporate subject-matter knowledge about the underlying dynamics of a time series through the introduction of a latent Markov state process. A user can specify the dynamics of this process together with how the state relates to the partial and noisy observations that have been made. Inference and prediction then involve solving a challenging inverse problem: calculating the conditional distribution of quantities of interest given the observations. This article reviews Monte Carlo algorithms for solving this inverse problem, covering methods based on the particle filter and the ensemble Kalman filter. We discuss the challenges posed by models with high-dimensional states, joint estimation of parameters and the state, and inference for the history of the state process. We also point out some potential new developments that will be important for tackling cutting-edge filtering applications.
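A minimal sketch of the bootstrap particle filter on a toy linear-Gaussian model; the model, parameter values, and function names are ours, chosen only to illustrate the propagate-weight-resample cycle.

```python
import numpy as np

def bootstrap_pf(y, n_particles=1000, phi=0.95, sig_x=1.0, sig_y=1.0, seed=0):
    """Bootstrap particle filter for the toy state-space model
        x_t = phi * x_{t-1} + N(0, sig_x^2)   (latent state)
        y_t = x_t + N(0, sig_y^2)             (observation)
    Returns the filtered means E[x_t | y_1:t]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sig_x, n_particles)                # initial particle cloud
    means = []
    for yt in y:
        x = phi * x + rng.normal(0.0, sig_x, n_particles)  # propagate
        logw = -0.5 * ((yt - x) / sig_y) ** 2              # likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                        # filtered mean
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
    return np.array(means)

# Simulate from the model, then filter
rng = np.random.default_rng(1)
T, phi = 100, 0.95
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=T)
print(bootstrap_pf(y)[:5])
```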
Geometry and Dynamics for Markov Chain Monte Carlo
Vol. 5 (2018), pp. 451–471
Markov chain Monte Carlo methods have revolutionized mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains that explore probability densities. The method emerges from physics and geometry, and these links have been extensively studied over the past thirty years. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners, and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field, which we believe will become increasingly relevant to scientists.
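To fix notation, Hamiltonian Monte Carlo augments the target density \(\pi(q)\) with a momentum \(p\) and simulates the dynamics of the Hamiltonian (the standard construction, written in our notation)

\[
H(q, p) = -\log \pi(q) + \tfrac{1}{2}\, p^{\top} M^{-1} p,
\]

typically discretized with the leapfrog integrator,

\[
p_{t+\epsilon/2} = p_t + \tfrac{\epsilon}{2}\, \nabla \log \pi(q_t), \qquad
q_{t+\epsilon} = q_t + \epsilon\, M^{-1} p_{t+\epsilon/2}, \qquad
p_{t+\epsilon} = p_{t+\epsilon/2} + \tfrac{\epsilon}{2}\, \nabla \log \pi(q_{t+\epsilon}),
\]

whose volume preservation and reversibility are exactly the geometric properties such a review develops; a Metropolis accept-reject step then corrects for the discretization error.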