A review of Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier
@@@@ (4 out of 5)
While Edward Snowden bounces from one temporary refuge to another in search of safe harbor from the long arms of the U.S. government, the American public is starting to wake up to the reality of Big Data. The National Security Agency, long one of the pioneers in this burgeoning but little-appreciated field, has been teaching us (or, rather, Snowden, The Guardian, and the Washington Post have been teaching us) about the power that resides in gargantuan masses of data. Now here come Viktor Mayer-Schönberger and Kenneth Cukier with a new book that goes far beyond the headlines about espionage and invasion of privacy to give us an eminently readable, well-organized overview of Big Data’s origins, its characteristics, and its potential for both good and evil.
When most of us think of Big Data, we think of computers. However, the authors persuade us that the fundamentals of Big Data were laid down more than a century before the invention of the microprocessor. They point to a legendary American seaman named Matthew Maury. In the middle of the 19th century, after 16 years of effort, Maury published a book, based on 1.2 million data points gleaned from old ships’ logs stored by the Navy, that dramatically reduced the distances (and, hence, the time elapsed) in ocean voyages by both military and commercial ships. Maury used facts derived from decades of mariners’ observations to dispel the myths, legends, superstitions, and rumors that had long caused ocean-voyaging ships to pursue roundabout courses. Not so incidentally, Maury’s work also facilitated the laying of the first transatlantic telegraph cable.
If not the first, this was certainly an early application of Big Data, which the authors describe as follows: “big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.” For example, if Maury had had available only a fraction of the old ships’ logs he found in the naval archives, his task would have been impractical, since each individual log doubtless included small errors (and an occasional big one). Only by amassing a huge store of data could he count on those errors to cancel one another out.
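For readers who like to see the arithmetic, the point about errors canceling out can be sketched with a small simulation. This is only an illustration, not anything taken from the book: the function name `mean_abs_error`, the "true" value of 10.0, and the noise level are all hypothetical, and the sketch assumes each log's error is independent and unbiased (drawn from a normal distribution), which real ships' logs would satisfy only roughly.

```python
import random
import statistics


def mean_abs_error(n_logs, true_value=10.0, noise_sd=2.0, trials=200, seed=0):
    """Average absolute error of the mean of n_logs noisy readings.

    Every reading is the (hypothetical) true value plus independent
    Gaussian noise. The experiment is repeated `trials` times so that
    one lucky or unlucky draw does not dominate the result.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    errors = []
    for _ in range(trials):
        readings = [rng.gauss(true_value, noise_sd) for _ in range(n_logs)]
        errors.append(abs(statistics.fmean(readings) - true_value))
    return statistics.fmean(errors)


if __name__ == "__main__":
    # The typical error of the average shrinks as more logs are pooled.
    for n in (10, 100, 1000):
        print(n, round(mean_abs_error(n), 3))
```

Under these assumptions the typical error of the pooled average falls roughly in proportion to the square root of the number of logs, which is why a fraction of the archive would have served Maury so much less well than the whole.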
Now, in the Digital Age, the volumes of data that can be harnessed are, at times, literally astronomical. “Google processes more than 24 petabytes of data per day, a volume that is thousands of times the quantity of all printed material in the U.S. Library of Congress.” AT&T transfers about 30 petabytes of data through its networks each day. Twenty-four or 30 of something doesn’t sound like much, unless you understand that a megabyte is a million bytes, a gigabyte is a billion bytes, a terabyte is 1,000 times the size of a gigabyte, and a petabyte is 1,000 times the size of a terabyte. That’s 1,000,000,000,000,000 bytes. That’s a lot of data! But even that’s only a tiny slice of all the data now stored in the world, “estimated to be around 1,200 exabytes.” And an exabyte (I’m sure you’re dying to know) is the equivalent of 1,000 petabytes. So, 1,200 exabytes could also be stated as 1.2 zettabytes, with a zettabyte equal to 1,000 exabytes, and I’ll bet that not one person in a million has ever heard of a zettabyte before. Had you?
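If the ladder of prefixes is hard to keep straight, a few lines of code can do the bookkeeping. The sketch below assumes the decimal convention used in this review, where each unit is 1,000 times the one before it; the names `UNITS`, `to_bytes`, and `convert` are made up for the illustration.

```python
# Decimal storage units, each 1,000 times larger than the previous one.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte",
    "terabyte", "petabyte", "exabyte", "zettabyte",
]


def to_bytes(amount, unit):
    """Express `amount` of `unit` as a number of bytes."""
    return amount * 1000 ** UNITS.index(unit)


def convert(amount, from_unit, to_unit):
    """Convert `amount` from one unit in UNITS to another."""
    return to_bytes(amount, from_unit) / 1000 ** UNITS.index(to_unit)


if __name__ == "__main__":
    print(to_bytes(1, "petabyte"))                 # bytes in one petabyte
    print(convert(1200, "exabyte", "zettabyte"))   # world's stored data, in zettabytes
    print(convert(24, "petabyte", "exabyte"))      # Google's daily volume, in exabytes
```

Seen this way, Google's 24 petabytes a day is a small fraction of a single exabyte, and the world's estimated 1,200 exabytes is more than a thousand times larger still.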
All of which should make clear that when we talk about Big Data today, we’re talking about really, really big numbers — so big, in fact, that almost no matter how messy or inaccurate the data might be, it’s usually possible to draw useful, on-target insights from analyzing it. That’s what’s different about Big Data — and that’s why the phenomenon is bound to change the way we think about the world.
We live in a society obsessed with causality. We often care more about why something happened than about what it was that happened. And in a world where Big Data looms larger and larger all the time, we’ll have to get used to not knowing — or even caring much — why things happen.
“At its core,” write Mayer-Schönberger and Cukier, “big data is about predictions. Though it is described as part of the branch of computer science called artificial intelligence, and more specifically, an area called machine learning, this characterization is misleading. Big data is not about trying to ‘teach’ a computer to ‘think’ like humans. Instead, it’s about applying math to huge quantities of data in order to infer probabilities: the likelihood that an email message is spam; that the typed letters ‘teh’ are supposed to be ‘the’; that the trajectory and velocity of a person jaywalking mean he’ll make it across the street in time [so that] the self-driving car need only slow slightly.”
The authors refer to data as “the oil of the information economy,” predicting that, as it flows into all the nooks and crannies of our society, it will bring about “three major shifts of mindset that are interlinked and hence reinforce one another.” First among these is our ever-growing ability to analyze inconceivably large amounts of data and not have to settle for sampling. Second, we’ll come to accept the inevitable messiness in huge stores of data and learn not to insist on precision in reporting. Third, and last, we’ll get used to accepting correlations rather than causality. “The ideal of identifying causal mechanisms is a self-congratulatory illusion; big data overturns this,” the authors assert.
If you want to understand this increasingly important aspect of contemporary life, I suggest you read Big Data.
Viktor Mayer-Schönberger and Kenneth Cukier come to the task of writing this book with unbeatable credentials. Mayer-Schönberger is Professor of Internet Governance at Oxford University, and Cukier is Data Editor at The Economist.