Xiao-Li Meng
Whipple V. N. Jones Professor of Statistics, Harvard University
Founding Editor-in-Chief of Harvard Data Science Review
Abstract
The phrase “Big Data” has greatly raised expectations of what we can learn about ourselves and the world
in which we live or will live. It also appears to have boosted general trust in empirical findings, because
it seems to be common sense that the more data, the more reliable our results. Unfortunately, this
commonsense conception can be falsified mathematically even for methods such as the time-honored
ordinary least squares regressions (Meng and Xie, 2014, Econometric Reviews 33: 218-250).
Furthermore, whereas the size of data is a common indicator of the amount of information, what matters
far more is the quality of data. A 5-element, Euler-formula-like identity reveals that trading data quality
for data quantity in population statistical inference is a mathematically demonstrably doomed game (Meng,
2018, Annals of Applied Statistics 12: 685-726). Without considering data quality, Big Data can do more harm
than good because of the drastically inflated precision assessment, and hence the gross overconfidence,
setting us up to be caught by surprise when the reality unfolds, as we all experienced during the 2016 US
presidential election. Data from the Cooperative Congressional Election Study (CCES; conducted by Stephen
Ansolabehere, Douglas Rivers, and others, and analyzed by Shiro Kuriwaki) are used to assess the data
quality in the 2016 US election polls, with the aim of gaining a clearer vision for the 2020 election and beyond.
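For readers who want the identity itself before opening the paper, here is a minimal sketch in the spirit of
Meng (2018); the factor labels follow that paper, while the particular symbols (G for the quantity of interest,
R for the recording indicator, n, N, and f = n/N) are chosen here purely for illustration:

    \bar{G}_n - \bar{G}_N \;=\; \underbrace{\rho_{R,G}}_{\text{data quality}} \times \underbrace{\sqrt{\frac{1-f}{f}}}_{\text{data quantity}} \times \underbrace{\sigma_G}_{\text{problem difficulty}}, \qquad f = \frac{n}{N},

where \bar{G}_n is the average of G over the n recorded units, \bar{G}_N is the average over all N population
units, \rho_{R,G} is the correlation between R and G (the data defect correlation), and \sigma_G is the
population standard deviation of G. As a rough consequence, equating the resulting mean-squared error with
that of simple random sampling gives an effective sample size of about f/[(1-f)\rho^2]; with f = 1% (roughly
2.3 million answers out of some 230 million potential voters) and \rho \approx 0.005, that is only about 400,
which is the sense in which the nominal precision of Big Data can be drastically inflated.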
Both articles are available at https://statistics.fas.harvard.edu/people/xiao-li-meng; the first one is inside Xiao-Li’s
CV.