April 28, 2012
I] Data in the World

Defining data

Defining data can be a rather esoteric business, but for the purposes of this paper we will stick to the essentials. Data is typically viewed as a constituent of information - that is, the reduced, discrete entities that make up a larger system, set or understanding.

One of the cornerstones of non-fiction writer James Gleick’s enthralling book The Information: A History, a Theory, a Flood is the thinking of Claude Shannon, the mathematician and founder of information theory, which treats information as strings of bits that have quantity but nothing to do with meaning or with values like truth or falsehood. [Gleick, J. (2011) The Information: A History, a Theory, a Flood. Kindle. New York: Pantheon Books].

Many of the major scientific breakthroughs of the past century have been informational in nature. For instance, the revelation that human biological substance is made of coded information patterns and that “our nerves carry messages” occurred only after the invention of the telegraph, which carries human-made messages through wires. [Kelly, K. & Gleick, J. (2011) Why the Basis of the Universe Isn’t Matter or Energy—It’s Data. Wired.com/magazine [online]. Available from: http://www.wired.com/magazine/2011/02/mf_gleick_qa/all/1 (Accessed 2 April 2012).]

More fundamentally, information is, by many contemporary scientific accounts, the constitutive force behind life and the universe. [Landsburg, S. E. (2009) The Big Questions. London: Simon & Schuster.]

The Oxford English Dictionary defines data as a noun: “facts and statistics collected together for reference or analysis.” The singular of data is datum: “a piece of information.” These will be instructive moving forward.

Data in the World

Advances in information technologies are transforming how we communicate with friends and family, changing the way many enterprises do business and eroding the reliability and function of disparate organisational structures on which society and commerce depended for much of human history. [Shirky, C. (2008) Here Comes Everybody. New York: Penguin Group.] Rapid contractions in the price and size of consumer hardware necessary to interface with many of those information technologies have made the consumption and production of digital media both mobile and ubiquitous throughout much of the developed world.

Among the chief consequences of these transformations, as well as advances in industrial computing and interface design technologies, has been an explosion in the global volume of data.

So then, now that every person with a digital device - not to mention the places where they use it and nearly any imaginable action performed on it - can be turned into a data point, how much data is there in the world?

In information disciplines data is measured in bytes. One byte is roughly the amount of information it takes to represent a single alphanumeric character.

Chief executive of enterprise software company Open Text, Mark Barrenechea, estimates that two and a half exabytes (one exabyte is a quintillion bytes) of computer data are produced every day, which means “every two days humans are putting more data online than what has been put into print since the dawn of recorded history.” [Simone, R. (2012) Power of big data will transform society, Canada 3.0 forum told. The Record [online]. Available from: http://www.therecord.com/news/business/article/711815—power-of-big-data-will-transform-society-canada-3-0-forum-tol (Accessed 28 April 2012).]

According to a 2012 report by the International Data Corporation (IDC), the amount of data available to the global Big Data industry is growing in volume by over 50 percent each year. [International Data Corporation (2012) Worldwide Big Data Technology and Services Forecast. [online]. Available from: http://www.idc.com/getdoc.jsp?containerId=prUS23355112 (Accessed April 16, 2012)].
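To give a sense of what growth of that order implies, the following is a minimal sketch of the compound-growth arithmetic. It assumes a constant annual rate of exactly 50 percent (the report says "over 50 percent", so this is a lower bound) and a starting volume normalised to 1; the function names are our own and purely illustrative.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_volume(initial: float, annual_growth_rate: float, years: int) -> float:
    """Volume after a whole number of years of compound growth."""
    return initial * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.50  # assumption: exactly 50 percent per year, held constant
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
    for year in range(6):
        # Volume is relative to the starting year (initial = 1.0).
        print(year, round(projected_volume(1.0, rate, year), 2))
```

Under these assumptions the volume of data doubles in well under two years, and grows more than sevenfold in five.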

All of that data, it seems prudent to remark, is being generated in one way or another by the growing population of Internet-connected human beings, which surpassed the 2 billion mark near the start of 2011, having doubled in the five years before that. [Lynn, J. (2010) Internet users to exceed 2 billion this year. http://www.reuters.com/. 19 October. [online]. Available from: http://www.reuters.com/article/2010/10/19/us-telecoms-internet-idUSTRE69I24720101019 (Accessed April 20, 2012)].

How manageable is all of that data? Not very, if Eric Schmidt, chief executive of Google (the Internet search giant, one of whose corporate missions is to “organise the world’s information”), is to be believed. [Corporate, G. (n.d.) Google’s mission is to organize the world’s information and make it universally accessible and useful. Google Company [online]. Available from: http://www.google.com/about/company/ (Accessed 28 April 2012).]

At the 2005 Association of National Advertisers conference, Schmidt estimated the company had indexed roughly 170 terabytes of the “5 million terabytes of information out in the world,” which is less than 0.004 percent. A terabyte is 10^12 bytes; ten terabytes is roughly the amount of information in the US Library of Congress’s entire catalogue. Leaving aside the Internet’s exponential growth in the seven years since his talk (or sigmoidal growth, given that unbounded exponential growth is likely impossible [Arthur, C. (2007) Want to impress your friends? Tell them internet growth is sigmoidal, not exponential. Guardian Technology Blog [online]. Available from: http://www.guardian.co.uk/technology/blog/2007/nov/26/wanttoimpressyourfriendst (Accessed 15 April 2012).]), Schmidt predicted at the time that it could take Google 300 years to index the world’s data exhaustively and make it searchable. [Mills, E. (2005) Google reveals its 300-year plan. ZDNet IT Strategy [online]. Available from: http://www.zdnet.co.uk/news/it-strategy/2005/10/10/google-reveals-its-300-year-plan-39228011/ (Accessed 16 March 2012).]
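The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (170 terabytes indexed, 5 million terabytes in total, ten terabytes per Library of Congress catalogue); the variable and function names are our own.

```python
TERABYTE = 10**12  # bytes, as defined in the text

indexed_tb = 170              # Schmidt's estimate of what Google had indexed
world_tb = 5_000_000          # Schmidt's estimate of all information in the world
library_of_congress_tb = 10   # rough size of the catalogue, per the text


def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return part / whole * 100


indexed_share = share_percent(indexed_tb, world_tb)
indexed_bytes = indexed_tb * TERABYTE
world_in_libraries = world_tb // library_of_congress_tb

if __name__ == "__main__":
    print(f"Indexed share: {indexed_share:.4f} percent")
    print(f"Indexed volume: {indexed_bytes:.1e} bytes")
    print(f"World total: {world_in_libraries:,} Library of Congress catalogues")
```

On these figures the indexed share comes to about 0.0034 percent, consistent with the "less than 0.004 percent" stated above, and the world total is equivalent to some 500,000 Library of Congress catalogues.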

In any event, the Internet is growing at a blistering pace as measured by any number of factors, including number of hosts, speed and connected population. That growth, along with peripheral advances in technology and design, is among the primary causes of the global swelling of collectible and usable data sets.

Adoption and Value

"Most great revolutions in science are preceded by revolutions in measurement. We have had a revolution in measurement in the last few years," says economist and professor of data measurement Erik Brynjolfsson. "This revolution in measurement, which started with the switch from analogue to digital data, is as profound as the development of the microscope and what it did for biology and medicine." [Brynjolfsson, E. (n.d.) Competing through Data. [online]. Available from: http://rss.mckinseyquarterly.com/fp/video (Accessed 10 April 2012).]

The range of industries for which Big Data is providing applications and value is larger, however, than just biology and medicine.

"The march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched," the director of Harvard’s Institute for Quantitative Social Science Gary King told the New York Times. [ Lohr, S. (2012) The Age of Big Data. News analysis on the New York Times Sunday Review [online]. Available from: http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all (Accessed 3 April 2012). [online]. ]

Similarly, research on technology and innovation by the McKinsey Global Institute (MGI) asserts that “data have swept into every industry and business function and are now an important factor for production, alongside labor and capital.” Accordingly, companies are starting to “leverage data-driven strategies to innovate, compete and capture value from deep and up-to-real-time information.” [Manyika, J. et al. (2011) Big data: The next frontier for innovation, competition, and productivity. McKinsey Insights [online]. Available from: http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation (Accessed 28 April 2012).]

A panel on Big Data at the World Economic Forum 2012 in Davos, Switzerland went so far as to declare data a new class of economic asset, similar to currency or gold. [World Economic Forum (2012) Big Data, Big Impact: New Possibilities for International Development. [online]. Available from: http://www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf (Accessed 1 April 2012).]

What is it that makes data so valuable, and why are the insights they provide so highly prized?

For one, this revolution in measurement is intertwined with innovations in data collection and analysis and expedited by wide adoption of consumer-facing data-input and management platforms like Google, Facebook, Twitter and blogging services, all of which rely on digital data to function in the first place and “make it possible to measure behaviour and sentiment in fine detail and as it happens,” says Brynjolfsson.

Another, more directly commercial instance of data’s value is personalisation. Facebook is able to charge advertisers a higher premium than, say, digital news publishers in part because of how much specific demographic and behavioural data the company has about its users. The DVD rental and video streaming service Netflix has developed an algorithm that analyses data about demographics, location and film preference and feeds it into a personalised recommendation engine from which, by some estimates, 70 percent of titles are chosen. [Simone, R. (2012) Power of big data will transform society, Canada 3.0 forum told. The Record [online]. Available from: http://www.therecord.com/news/business/article/711815—power-of-big-data-will-transform-society-canada-3-0-forum-tol (Accessed 28 April 2012).]

In both cases, knowing more fine-grained detail about customers increases companies’ ability to make evidence-based decisions about changes and refinements to products and services.

The lion’s share of Google’s revenue comes from its ad model, which harnesses the value of data by analysing global search trends and auctioning off text-based ad space against keywords and phrases. Here, data plays several roles.

First, search trends are collected, analysed and made publicly available through tools such as Google Trends. (These tools have more functions than just advertising, which we will come back to.) This allows advertisers to research relative search volume for specific keywords (how many people are searching for what) and therefore provides business insight into which terms to associate with a product. This is a paradigmatic shift in the nature of advertising-led business models because, as opposed to picking a broad audience grouped around a publication, advertisers now buy visual real estate around topics in which the consumer in question is known to have some level of interest. (That interest is expressed through the act of searching.)

Secondly, click-through data about the ads themselves is collected and used to rank ads purchased against specific search terms. This prevents wealthy advertisers from bidding their way to the top of every high-traffic search term and instills “an economy of relevance and profit” into Google’s ad model. [Battelle, J. (2005) The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. New York: Penguin Group]. Data usage has brought forth a profitable advertising business model that doesn’t require a single salesperson.
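The effect of ranking on click-through data rather than on bid alone can be illustrated with a toy example. The sketch below is a deliberate simplification and not Google's actual formula: it assumes, purely for illustration, that an ad's rank score is its bid multiplied by its observed click-through rate. The `Ad` class, its fields and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ad:
    advertiser: str            # hypothetical advertiser label
    bid: float                 # hypothetical maximum price offered per click
    click_through_rate: float  # clicks divided by impressions, between 0 and 1


def rank_score(ad: Ad) -> float:
    """Assumed scoring rule: bid weighted by how often users actually click."""
    return ad.bid * ad.click_through_rate


def rank_ads(ads: List[Ad]) -> List[Ad]:
    """Order ads from highest to lowest rank score."""
    return sorted(ads, key=rank_score, reverse=True)


# Invented figures: a high bidder with a poorly targeted ad
# against a low bidder whose ad users find relevant.
sample_ads = [
    Ad("big_budget", bid=5.00, click_through_rate=0.01),
    Ad("relevant", bid=1.50, click_through_rate=0.05),
]

if __name__ == "__main__":
    for ad in rank_ads(sample_ads):
        print(ad.advertiser, round(rank_score(ad), 3))
```

With these invented figures the lower bidder ranks first, because users click its ad five times as often. That is the sense in which relevance, and not budget alone, determines placement.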

Industries

"This is part of a broader revolution as we move from just financial and numerical data towards all sorts of non-financial metrics," says Brynjolfsson. So the advantages of making data-based decisions as opposed to relying on "experience and intuition" are applicable to more industries than finance and web giants. [ Lohr, S. (2012) The Age of Big Data. News analysis on the New York Times Sunday Review [online]. Available from: http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all (Accessed 3 April 2012). ]

For example, the McKinsey study found that with “intelligent, creative use of Big Data,” European governments could realise €100 billion in operational efficiencies and United States healthcare could “create more than $300 billion in value every year.”

Retailers are processing information about sales, pricing, markets, demographics and weather to make informed decisions about which products to stock at which store locations and when to offer price reductions. Shipping companies are using local traffic and weather data to perfect their delivery routes. Police departments, starting with the NYPD, are learning to use historical arrest patterns, paydays, sporting events, rainfall and vacations to predict where and when crimes will happen. Online dating services rely on algorithms to sift through the data in member profiles to look for personal traits and other factors that might allow them to better match couples.

Big Data is also advancing the level of sophistication in interdisciplinary academia and other varieties of institutional research. Although there are numerous incompatible formats in which data can exist, once data is digitised into machine-readable format - and much of the data being produced and worked with today is natively digital - the barriers to compatibility and collaboration break down quickly.

In 2011, for instance, Kalev Leetaru, assistant director of the Text and Digital Media Analytics department at the University of Illinois, used a supercomputer to analyse an archive of 100 million global news articles spanning 30 years. The result was a network of 10 billion people, places and things, and 100 trillion relationships, which he used “to forecast the Arab Spring, pinpoint Bin Laden’s location,” and visualise the evolution of human society. [Leetaru, K. (n.d.) Kalev H. Leetaru online homepage. Kalevleetaru.com [online]. Available from: http://www.kalevleetaru.com/ (Accessed 28 April 2012).]

Future Prospects - Moving into News Production

Given some of the material we’ve already covered on the growth of the Internet and digital data, a somewhat strange picture of the future of commerce emerges. For instance, the top ten in-demand jobs globally in 2010 didn’t even exist in 2004. [Fisch, K. et al. (2008) Did you know? [online]. Available from: http://www.youtube.com/watch?v=Mmz5qYbKsvM&feature=player_embedded (Accessed 28 April 2012).]

Industries that previously had no interest in or use for data are now spending large portions of their budgets on leveraging the analysis of Big Data into meaningful business intelligence and insight. Other areas are transforming from their 20th century analogues into full-fledged data-centric businesses.

To be sure, industry and societal transformations on this scale, taking place at this speed, create problems in education and training. The United States alone will face a shortage of 140,000-190,000 workers with “analytical expertise” and of one and a half million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data. [McKinsey Global Institute (2009) Big Data: The next frontier for competition. McKinsey & Company Features [online]. Available from: http://www.mckinsey.com/Features/Big_Data (Accessed 24 April 2012).]

In the process of moving from a print-only world into a mixed media, digital-first or digital-only one, the journalism and publishing communities are as exposed to the costs of adapting to data-driven environments as the next industry.

The next section will…