Tag Archives: Big Data

Shocked by the NSA revelations? You don’t know the whole story


A review of Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier

@@@@ (4 out of 5)

While Edward Snowden bounces from one temporary refuge to another in search of safe harbor from the long arms of the U.S. government, the American public is starting to wake up to the reality of Big Data. The National Security Agency, long one of the pioneers in this burgeoning but little-appreciated field, has been teaching us (or, rather, Snowden, The Guardian, and the Washington Post have been teaching us) about the power that resides in gargantuan masses of data. Now here come Viktor Mayer-Schönberger and Kenneth Cukier with a new book that goes far beyond the headlines about espionage and invasion of privacy to give us an eminently readable, well-organized overview of Big Data’s origins, its characteristics, and its potential for both good and evil.

When we think of Big Data, we, or at least most of us, think of computers. However, the authors persuade us that the fundamentals of Big Data were laid down more than a century before the invention of the microprocessor. They point to a legendary American seaman named Matthew Maury. In the middle of the 19th century, after 16 years of effort, Maury published a book based on 1.2 million data points gleaned from old ships’ logs stored by the Navy. His work dramatically reduced the distances (and, hence, the time elapsed) in ocean voyages by both military and commercial ships. Maury used facts derived from decades of mariners’ observations to dispel the myths, legends, superstitions, and rumors that had long caused ocean-voyaging ships to pursue roundabout courses. Not so incidentally, Maury’s work also facilitated the laying of the first transatlantic telegraph cable.

If not the first, this was certainly an early application of Big Data, which the authors describe as follows: “big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.” For example, if Maury had had available only a fraction of the old ships’ logs he found in the naval archives, his task would have been impractical, since each individual log doubtless included small errors (and an occasional big one). Only by amassing a huge store of data could he count on those errors to cancel one another out.
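The claim that errors cancel out in a large enough store of data can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" value, the Gaussian noise, and the function name `mean_abs_error` are all hypothetical and have nothing to do with Maury's actual logs. It averages noisy readings of a known quantity and reports how far the average lands from the truth for different numbers of readings.

```python
import random
import statistics


def mean_abs_error(n_readings, true_value=10.0, noise_sd=5.0, trials=200, seed=0):
    """Typical distance between the average of n noisy readings and the true value.

    Every reading is the (hypothetical) true value plus random Gaussian noise.
    The experiment is repeated `trials` times and the absolute errors averaged.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    errors = []
    for _ in range(trials):
        readings = [rng.gauss(true_value, noise_sd) for _ in range(n_readings)]
        errors.append(abs(statistics.fmean(readings) - true_value))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} readings per estimate: typical error {mean_abs_error(n):.3f}")
```

The typical error shrinks roughly with the square root of the number of readings. Note the limit of the argument: averaging cancels random errors, not a systematic bias shared by every log.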

Now, in the Digital Age, the volumes of data that can be harnessed are, at times, literally astronomical. “Google processes more than 24 petabytes of data per day, a volume that is thousands of times the quantity of all printed material in the U.S. Library of Congress.” AT&T transfers about 30 petabytes of data through its networks each day. Twenty-four or 30 of something doesn’t sound like much, unless you understand that a megabyte is a million bytes, a gigabyte is a billion bytes, a terabyte is 1,000 times the size of a gigabyte, and a petabyte is 1,000 times the size of a terabyte. That’s 1,000,000,000,000,000 bytes. That’s a lot of data! But even that’s only a tiny slice of all the data now stored in the world, “estimated to be around 1,200 exabytes.” And an exabyte (I’m sure you’re dying to know) is the equivalent of 1,000 petabytes. So, 1,200 exabytes could also be stated as 1.2 zettabytes, with a zettabyte equal to 1,000 exabytes, and I’ll bet that not one person in a million has ever heard of a zettabyte before. Had you?
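The unit arithmetic in the paragraph above is easy to check with a few lines of code. This is a minimal sketch that assumes the decimal definitions used here, where each unit is 1,000 times the previous one; the names `UNITS`, `BYTES_PER`, and `convert` are made up for illustration.

```python
# Decimal byte units: each step up the list is a factor of 1,000.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte",
    "terabyte", "petabyte", "exabyte", "zettabyte",
]
BYTES_PER = {name: 1000 ** power for power, name in enumerate(UNITS)}


def convert(amount, from_unit, to_unit):
    """Convert an amount from one decimal byte unit to another."""
    return amount * BYTES_PER[from_unit] / BYTES_PER[to_unit]


if __name__ == "__main__":
    print(f"1 petabyte = {BYTES_PER['petabyte']:,} bytes")
    print(f"1,200 exabytes = {convert(1200, 'exabyte', 'zettabyte')} zettabytes")
    # One day of Google processing as a share of the world's stored data,
    # using the two figures quoted from the book.
    share = convert(24, "petabyte", "exabyte") / 1200
    print(f"24 petabytes is {share:.4%} of 1,200 exabytes")
```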

All of which should make clear that when we talk about Big Data today, we’re talking about really, really big numbers — so big, in fact, that almost no matter how messy or inaccurate the data might be, it’s usually possible to draw useful, on-target insights from analyzing it. That’s what’s different about Big Data — and that’s why the phenomenon is bound to change the way we think about the world.

We live in a society obsessed with causality. We often care more about why something happened than about what it was that happened. And in a world where Big Data looms larger and larger all the time, we’ll have to get used to not knowing — or even caring much — why things happen.

“At its core,” write Mayer-Schönberger and Cukier, “big data is about predictions. Though it is described as part of the branch of computer science called artificial intelligence, and more specifically, an area called machine learning, this characterization is misleading. Big data is not about trying to ‘teach’ a computer to ‘think’ like humans. Instead, it’s about applying math to huge quantities of data in order to infer probabilities: the likelihood that an email message is spam; that the typed letters ‘teh’ are supposed to be ‘the’; that the trajectory and velocity of a person jaywalking mean he’ll make it across the street in time [so that] the self-driving car need only slow slightly.”

The authors refer to data as “the oil of the information economy,” predicting that, as it flows into all the nooks and crannies of our society, it will bring about “three major shifts of mindset that are interlinked and hence reinforce one another.” First among these is our ever-growing ability to analyze inconceivably large amounts of data and not have to settle for sampling. Second, we’ll come to accept the inevitable messiness in huge stores of data and learn not to insist on precision in reporting. Third, and last, we’ll get used to accepting correlations rather than causality. “The ideal of identifying causal mechanisms is a self-congratulatory illusion; big data overturns this,” the authors assert.

If you want to understand this increasingly important aspect of contemporary life, I suggest you read Big Data.

Viktor Mayer-Schönberger and Kenneth Cukier come to the task of writing this book with unbeatable credentials. Mayer-Schönberger is Professor of Internet Governance at Oxford University, and Kenneth Cukier is Data Editor at The Economist.

Filed under Nonfiction, Science

Top 10 trends shaping the future of publishing

By Johanna Vondeling   

1. Everyone’s a publisher

Now that digital content is popular and relatively easy and inexpensive to produce, millions of individuals and thousands of non-book-publishing media companies have leapt into the business of creating and distributing digital content (often coupled with print-on-demand options).[i] [ii] The near-elimination of barriers to entry into the publishing marketplace has produced an ever-increasing flood of information and entertainment options for consumers.[iii]

Moreover, publishers’ primary competition today isn’t other books, but rather other forms of media, such as social media platforms, games, and streaming media. As the presence and relevance of physical retail for books continues to decline, so too will the necessity for other entities — including authors and other content producers — to work with established legacy publishers to bring books to market.[iv]

2. Content comes first

All content producers now need to approach format as a secondary consideration. The innovators are designing work-flows that prioritize the development and (pre-publication) tagging of content irrespective of format, knowing that the eventual outputs could be infinite: Print book? E-book? Online course? Webinar? App? Blog? Tweet? Tagging must be “semantic” (tagged for meaning, not just coincidence of terms), to facilitate discoverability. Content producers must make it as easy as possible for content to be re-purposed by its curators and leveraged and shared by its marketers and distribution partners.

3. Content marketing is king

Content is still king. And content marketing (defined as “marketing without marketing, or building soft power and social gravity for a brand through shared values and interests”) is edging out traditional push-marketing practices. By disseminating high-quality, immersive content through social platforms, content producers can market themselves without interrupting consumers with more explicit advertising.[v]

Content marketing facilitates reader engagement. Engagement, in turn, produces strong brand ties, leading to increased purchasing, product loyalty, and customer advocacy. But there is no standard definition or metric for engagement, nor do most organizations fully understand the migration from engagement to revenue. The challenges are 1) understanding what’s happening within the dynamic ecosystem of content and social media and 2) being able to make tactical changes to increase conversion and revenue.

4. Big data rules

The amount of data in our world has been exploding. Analyzing large data sets—so-called big data—has become a key basis of competition, driving growth and innovation. The increasing volume and detail of information captured by enterprises, and the rise of multimedia and social media, have all been fueling exponential growth in data.[vi] As a result, businesses now have broad and deep visibility into their stakeholders’ behaviors and values. But which information matters most? Big data offers promise in making sense of this complexity.

The few businesses that have successfully migrated from print-first to digital-first models have invested significantly in building in-house data and analytics teams.[vii] While the growing importance of data analysts should not be underestimated, the need for creative thinking in the changing world of marketing has never been greater. Note the rise in recruitment of ‘data scientists,’ who are savvy in computer science but – crucially – also able to apply creative thinking to data-driven challenges.

5. Mobile matters

The number of mobile-connected devices will exceed the world’s population in 2013.[viii] In 2012, mobile subscriptions in China surpassed 1 billion, and more people there accessed the web by mobile device than by PC.[ix] Millions of people in developing countries may never own a book or a computer, but they do own a mobile phone.

Moving forward in “mobile optimization” means content must be conceived and designed explicitly for mobile devices. Every experience offered through digital channels – every web page, shopping cart and piece of rich content – must work well on any device in any location. Customers generally understand that concessions need to be made for the smaller screen, touchscreen input, and slower speed, but they won’t accept unnecessary hassle or delay. Apps are a part of today’s approach to mobile, but they are not a cure-all for this challenge, as use of the mobile web increases daily.[x]

6. The Internet is the classroom

The education industry is experiencing dramatic disruption. Profits and enrollment at for-profit colleges and universities in the United States are growing at a staggering rate.[xi] We’re witnessing the proliferation of “massive open online courses” (MOOCs).[xii] Education start-ups are creating and offering online study groups, flashcards, lecture notes, and a wealth of other tools for free. Investment in education technology companies increased from less than $100 million in 2007 to nearly $400 million last year.[xiii] And while digital textbooks have been slow to gain adoption, many education providers are turning away from print textbooks in favor of digital devices in classrooms and lecture halls. In response, some publishers are diving head-first into the growing business of online education.[xiv]

The disruptive power of information technology may be our best hope for containing the soaring costs that are driving a growing number of students into ruinous debt or out of higher education altogether. It is also a potential boon to those displaced workers under pressure to become “life-long learners.” But this disruptive power also poses a potential existential threat to many physical universities and traditional textbook publishers.[xv]

7. Get used to strange bedfellows

Legacy industries, like book publishing, are realizing that they can’t go it alone if they hope to survive and thrive. Many are forming unlikely alliances or funding start-ups to help them adapt amid the present flux and strategize for the future. In 2012, Pearson bought Author Solutions, one of the leading providers of self-publishing services. In 2013, Pearson and Kaplan both launched incubator programs to help vet and mentor education-tech start-ups. Macmillan has been aggressively investing a fund of over $100 million in ed-tech start-ups.[xvi] Other publishers are leveraging ties with other branded media platforms and content providers. Hyperion is selling its backlist and will focus exclusively on content tied to its sister companies Disney and ABC.[xvii] Wiley is distributing material from (former competitor) OpenStax College, an open-source platform that makes introductory college textbooks available as free downloads.[xviii]

8. Set up high-value networks

Platforms like Craigslist and eBay engage in “commons-creation” by establishing virtual spaces in which strangers can pool their ideas, sell products or services, and make social connections. The platforms that can provide real value gain users (and often revenue) quickly. We’re also witnessing a dramatic rise in the use of digital personal-assistant networks like TaskRabbit.

And Amazon successfully launched Audiobook Creation Exchange, a platform that connects freelance narrators of audio books with the owners of content who are looking to publish audio books. As workers experience less job security and turn increasingly to independent and task-based employment options, such platforms provide value by leveraging the sponsor’s “right of way” to create credible networks that connect people seeking products and services with those eager to provide them.

9. Crowdfunding has come of age

Digital crowdfunding platforms like Indiegogo, Kickstarter, Unbound, and Pubslush are proliferating, gaining both users and donors at a remarkable pace. Now, content curators can use these platforms to locate content that readers are attracted to and willing to pay for – before it is produced and distributed. Combined with the boom in self-publishing, this trend means more opportunities for cultural producers to identify content with proven market demand, and more ways to identify the hardcore fan base for a particular set of content, before making the decision to invest.[xix]

10. The means of production is going hyper-local

Paradoxically, globalization is both making it easier to purchase a product on the other side of the planet and moving the production of goods closer to the site of purchase. The emergence of “additive manufacturing” and 3-D printing holds the promise that individual creators and users can “make” anything in their own homes. Book and magazine publishers are printing closer to their customers through globally dispersed printing operations and print-on-demand programs. Espresso Book Machines facilitate the printing of out-of-stock and self-published books in physical bookstores.[xx]

All these developments offer the opportunity to bring production closer to the customer, facilitating just-in-time sales and providing more sustainable alternatives to current distribution practices.

Johanna Vondeling is Vice President for Business Development at Berrett-Koehler Publishers. 


i.         “Shatzkin: Soon, Most People Working in Publishing Won’t Be Working at Publishing Companies.” Digital Book World. March 19, 2013.

ii.         “Ecco, MLB Team Up for E-book Series.” Publishers Weekly. March 20, 2013.

iii.         “The Ten Awful Truths About Book Publishing.” Steve Piersanti. March 6, 2012.

iv.         “Book Publishers Scramble to Rewrite Their Future.” Wired. March 19, 2013.

v.         Adobe/Econsultancy Quarterly 2013 Digital Intelligence Briefing. January, 2013.

vi.         “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute. March, 2011.

vii.         “The FT has ‘crossed over’ to become a digital business—but can anyone else replicate that feat?” paidContent. March 18, 2013.

viii.         Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012-2017. February 6, 2013.

ix.         “2013: The year nothing but mobile matters for any business selling in China.” MobiThinking. December 20, 2012.

x.         Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012-2017. February 6, 2013.

xi.         “The Rise of For-Profit Universities and Colleges.” University World News. July 15, 2012.

xii.         “Massive open online courses: Time and a little money are a worthy investment.” Financial Times. March 11, 2013.

xiii.         “The Siege of Academe.” Washington Monthly. September/October 2012.

xiv.         “Wiley Launches Digital Classroom, Video and Ebook E-Learning Site.” Digital Book World. March 19, 2013.

xv.         “The Siege of Academe.” Washington Monthly. September/October 2012.

xvi.         Publishers Lunch. March 7, 2013.

xvii.        Publishers Lunch. March 7, 2013.

xviii.        “Wiley, OpenStax Team on College Biology Textbook.” InformationWeek.com. March 11, 2013.

xix.         “Veronica Mars Lives again: Lessons from a record-breaking Kickstarter campaign.” paidContent. March 17, 2013.

xx.         “Just Press Print.” The Economist. February 12, 2010.


Filed under Commentaries, FAQs & Commentaries

Understand how Netflix, Wall Street, and economists use (and misuse) statistics



A review of Naked Statistics: Stripping the Dread from the Data, by Charles Wheelan

@@@@ (4 out of 5)

In the unfolding Age of Big Data, no one who hopes to understand the way the world works can afford to be ignorant of statistical methods. Not a day goes by that statistical analysis isn’t behind some front-page story, whether in politics, sports, business, or even entertainment. The statistical concepts of probability, sampling, and statistical validity, once considered obscure and of interest only to geeks wearing pocket protectors, are now indispensable tools for the active citizen to grasp. Writing in a breezy and intimate style, with humor and lots of asides to the reader, Charles Wheelan attempts to unpack these concepts and explain them in English with a minimal use of advanced math, and he succeeds . . . up to a point.

Naked Statistics was published just four months after Nate Silver’s best-selling book, The Signal and the Noise, which covers much the same ground in a very different way. (I reviewed that book here.) Wheelan focuses on the nitty-gritty of statistical methodology, delving into such topics as how samples are chosen, what’s meant by terms such as correlation, standard deviation, and regression analysis, and how to determine whether the results of a test are statistically valid. However, he doesn’t lose sight of practical questions, unpacking such seemingly puzzling statements as “the average income in America is not equal to the income of the average American” and spotlighting the difference between precision and accuracy. Silver instead explains how statistical methods are applied in a wide range of activities, from baseball and basketball to Wall Street. Wheelan includes lots of formulas laden with Greek letters, though, conveniently, they’re confined for the most part to Appendixes that follow many of the book’s chapters and can be skipped by a non-technical reader. (I ignored them.) Silver’s book is refreshingly devoid of Greek letters.
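The puzzling statement about average income quoted above presumably comes down to the difference between the mean and the median. Here is a minimal sketch with invented numbers (the ten incomes and the function name `summarize` are purely hypothetical, not taken from the book): nine modest incomes and one very large one.

```python
import statistics

# Ten made-up annual incomes: nine modest earners and one multimillionaire.
INCOMES = [
    30_000, 35_000, 40_000, 45_000, 50_000,
    55_000, 60_000, 65_000, 70_000, 10_000_000,
]


def summarize(incomes):
    """Return (mean, median): the 'average income' and the 'average person's income'."""
    return statistics.mean(incomes), statistics.median(incomes)


if __name__ == "__main__":
    mean_income, median_income = summarize(INCOMES)
    print(f"Mean income:   {mean_income:,.0f}")    # pulled up by the one outlier
    print(f"Median income: {median_income:,.0f}")  # the person in the middle
```

With these numbers the mean is 1,045,000 while the median is 52,500, so the "average income" describes nobody in the group.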

As Wheelan makes clear, perhaps unintentionally, statistics is a forbiddingly technical field. Truth to tell, if you really want to understand statistical methodology and how it can be applied, you need a fair grounding in mathematics and a tolerance for terminology that doesn’t appear in everyday English. In fact, you probably need to take the same sort of graduate school courses Wheelan took years ago. This is heady stuff!

All in all, for a run-of-the-mill mathematical illiterate such as me, Nate Silver did a much better job getting across the significance of statistics and how its methods are applied to strip away the complexities of today’s often baffling, data-driven world.

Filed under Nonfiction, Science