Defining data can be a rather esoteric business, but for the purposes of this paper we will stick to the essentials. Data is typically viewed as a constituent of information - that is, the reduced, discrete entities that make up a larger system, set or understanding.
One of the cornerstones of non-fiction writer James Gleick’s enthralling book The Information: A History, a Theory, a Flood is the thinking of Claude Shannon, the mathematician who founded information theory. Shannon’s theory treats information as strings of bits that can be quantified but that have nothing to do with meaning or with values like truth or falsehood. [Gleick, J. (2011) The Information: A History, a Theory, a Flood. Kindle. New York: Pantheon Books].
Many of the major scientific breakthroughs of the past century have been informational in nature. For instance, the revelation that human biological substance is made of coded information patterns and that “our nerves carry messages” occurred only after the invention of the telegraph, which carries human-made messages through wires. [Kelly, K. & Gleick, J. (2011) Why the Basis of the Universe Isn’t Matter or Energy—It’s Data. Wired.com/magazine [online]. Available from: http://www.wired.com/magazine/2011/02/mf_gleick_qa/all/1 (Accessed 2 April 2012).]
More fundamentally information is, by many contemporary scientific accounts, the constitutional force behind life and the universe. [Landsburg, S. E. (2009) The Big Questions. London: Simon & Schuster.]
The Oxford English Dictionary defines data as a noun: “facts and statistics collected together for reference or analysis.” The singular of data is datum: “a piece of information.” These will be instructive moving forward.
Data in the World
Advances in information technologies are transforming how we communicate with friends and family, changing the way many enterprises do business and eroding the reliability and function of disparate organisational structures on which society and commerce depended for much of human history. [Shirky, C. (2008) Here Comes Everybody. New York: Penguin Group.] Rapid contractions in the price and size of consumer hardware necessary to interface with many of those information technologies have made the consumption and production of digital media both mobile and ubiquitous throughout much of the developed world.
Among the chief consequences of these transformations, as well as advances in industrial computing and interface design technologies, has been an explosion in the global volume of data.
So then, now that every person with a digital device - not to mention the places where they use those devices and nearly any imaginable action performed on them - can be turned into a data point, how much data is there in the world?
In information disciplines data is measured in bytes. One byte is roughly the amount of information it takes to represent a single alphanumeric character.
Chief executive of enterprise software company Open Text, Mark Barrenechea, estimates that two and a half exabytes (one exabyte is a quintillion bytes) of computer data are produced every day, which means “every two days humans are putting more data online than what has been put into print since the dawn of recorded history.” [Simone, R. (2012) Power of big data will transform society, Canada 3.0 forum told. The Record [online]. Available from: http://www.therecord.com/news/business/article/711815—power-of-big-data-will-transform-society-canada-3-0-forum-tol (Accessed 28 April 2012).]
According to a 2012 report by the International Data Corporation (IDC), the amount of data available to the global Big Data industry is growing in volume by over 50 percent each year. [International Data Corporation (2012) Worldwide Big Data Technology and Services Forecast. [online]. Available from: http://www.idc.com/getdoc.jsp?containerId=prUS23355112 (Accessed April 16, 2012)].
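To give a sense of what growth of 50 percent a year implies, the following Python sketch compounds that rate over several years and estimates the doubling time. It assumes a constant rate of exactly 50 percent, which the IDC figure (“over 50 percent”) suggests is a lower bound; the function names are illustrative and are not drawn from the report.

```python
import math

def growth_factor(annual_rate: float, years: int) -> float:
    """Multiple of the starting volume after compounding for `years` years."""
    return (1 + annual_rate) ** years

def doubling_time_years(annual_rate: float) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)

rate = 0.50  # assumption: exactly 50 percent growth per year
for years in (1, 2, 5):
    print(f"after {years} year(s): {growth_factor(rate, years):.2f}x")
print(f"doubling time: {doubling_time_years(rate):.2f} years")
```

Under that assumption the volume of data would double in under two years.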
All of that data, it seems prudent to remark, is being generated in one way or another by the growing population of Internet-connected human beings, which surpassed the 2 billion mark near the start of 2011, having doubled in the five years before that. [Lynn, J. (2010) Internet users to exceed 2 billion this year. http://www.reuters.com/. 19 October. [online]. Available from: http://www.reuters.com/article/2010/10/19/us-telecoms-internet-idUSTRE69I24720101019 (Accessed April 20, 2012)].
How manageable is all of that data? Not very, if Eric Schmidt, chief executive of Google - the Internet search giant, one of whose corporate missions is to “organise the world’s information” - is to be believed. [Corporate, G. (n.d.) Google’s mission is to organize the world’s information and make it universally accessible and useful. Google Company [online]. Available from: http://www.google.com/about/company/ (Accessed 28 April 2012).]
At the 2005 Association of National Advertisers conference, Schmidt estimated the company had indexed roughly 170 terabytes of the “5 million terabytes of information out in the world,” which is less than 0.004 percent. A terabyte is 10^12 bytes; ten terabytes is roughly the amount of information in the US Library of Congress’s entire catalogue. Leaving aside the Internet’s exponential growth in the seven years since his talk (or sigmoidal growth, given that unbounded exponential growth is likely impossible [Arthur, C. (2007) Want to impress your friends? Tell them internet growth is sigmoidal, not exponential. Guardian Technology Blog [online]. Available from: http://www.guardian.co.uk/technology/blog/2007/nov/26/wanttoimpressyourfriendst (Accessed 15 April 2012).]), Schmidt predicted it could take Google 300 years to index the world’s data exhaustively and make it searchable. [Mills, E. (2005) Google reveals its 300-year plan. ZDNet IT Strategy [online]. Available from: http://www.zdnet.co.uk/news/it-strategy/2005/10/10/google-reveals-its-300-year-plan-39228011/ (Accessed 16 March 2012).]
In any event, the Internet is expanding at a blistering pace as measured by any number of factors, including number of hosts, speed and connected population. That growth, along with peripheral technological and design innovations, is among the primary causes of the global swelling of collectible and usable data sets.
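The share quoted from Schmidt’s estimate can be checked with a few lines of arithmetic. The Python sketch below assumes that the figure of 170 refers to terabytes and uses the decimal definition of a terabyte (10^12 bytes); the function and variable names are illustrative only.

```python
TERABYTE = 10**12  # bytes, decimal (SI) definition

def indexed_share_percent(indexed_tb: float, total_tb: float) -> float:
    """Return the indexed portion of the total as a percentage."""
    return indexed_tb / total_tb * 100

indexed_tb = 170        # assumption: Schmidt's figure is in terabytes
total_tb = 5_000_000    # "5 million terabytes of information out in the world"

share = indexed_share_percent(indexed_tb, total_tb)
print(f"indexed: {indexed_tb * TERABYTE:.2e} bytes")
print(f"share of total: {share:.4f} percent")
```

The result, about 0.0034 percent, is consistent with the “less than 0.004 percent” stated in the text.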
Adoption and Value
“Most great revolutions in science are preceded by revolutions in measurement. We have had a revolution in measurement in the last few years,” says economist and professor of data measurement Erik Brynjolfsson. “This revolution in measurement, which started with the switch from analogue to digital data, is as profound as the development of the microscope and what it did for biology and medicine.” [Brynjolfsson, E. (n.d.) Competing through Data. [online]. Available from: http://rss.mckinseyquarterly.com/fp/video (Accessed 10 April 2012).]
The range of industries for which Big Data is providing applications and value is larger, however, than just biology and medicine.
“The march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched,” the director of Harvard’s Institute for Quantitative Social Science Gary King told the New York Times. [Lohr, S. (2012) The Age of Big Data. News analysis on the New York Times Sunday Review [online]. Available from: http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all (Accessed 3 April 2012).]
Similarly, research on technology and innovation by the McKinsey Global Institute (MGI) asserts that “data have swept into every industry and business function and are now an important factor for production, alongside labor and capital.” Accordingly, companies are starting to “leverage data-driven strategies to innovate, compete and capture value from deep and up-to-real-time information.” [Manyika, J. et al. (2011) Big data: The next frontier for innovation, competition, and productivity. McKinsey Insights [online]. Available from: http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation (Accessed 28 April 2012).]
A panel on Big Data at the World Economic Forum 2012 in Davos, Switzerland went so far as to declare data a new class of economic asset, similar to currency or gold. [World Economic Forum (2012) Big Data, Big Impact: New Possibilities for International Development. [online]. Available from: http://www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf (Accessed 1 April 2012).]
What is it that makes data so valuable, and why are the insights it provides so highly prized?
For one, this revolution in measurement is intertwined with innovations in data collection and analysis and expedited by wide adoption of consumer-facing data-input and management platforms like Google, Facebook, Twitter and blogging services, all of which rely on digital data to function in the first place and “make it possible to measure behaviour and sentiment in fine detail and as it happens,” says Brynjolfsson.
Another, more directly commercial instance of data’s value is personalisation. Facebook is able to charge advertisers a higher premium than, say, digital news publishers in part because of how much specific demographic and behavioural data the company has about its users. The DVD rental and video streaming service Netflix has developed an algorithm that analyses data about demographics, location and film preference and feeds it into a personalised recommendation engine from which, by some estimates, 70 percent of titles are chosen. [Simone, R. (2012) Power of big data will transform society, Canada 3.0 forum told. The Record [online]. Available from: http://www.therecord.com/news/business/article/711815—power-of-big-data-will-transform-society-canada-3-0-forum-tol (Accessed 28 April 2012).]
In both cases, knowing more fine-grained detail about customers increases companies’ ability to make evidence-based decisions about changes and refinements to products and services.
The lion’s share of Google’s revenues comes from its ad model, which harnesses the value of data by analysing global search trends and auctioning off text-based ad space against keywords and phrases. Here, data plays several roles.
First, search trends are collected, analysed and made publicly available in the form of Google Analytics. (This platform has more functions than just advertising, to which we will return.) This allows advertisers to research relative search volume for specific keywords (how many people are searching for what) and therefore provides business insights into what terms to associate with a product. This is a paradigmatic shift in the nature of advertising-led business models because, as opposed to picking a broad audience grouped around a publication, advertisers now buy visual real estate around topics in which the consumer in question is known to have some level of interest. (That interest is expressed through the act of searching.)
Secondly, click-through data about the ads themselves is collected and used to rank ads purchased against specific search terms. This prevents wealthy advertisers from bidding their way to the top of every high-traffic search term and instills “an economy of relevance and profit” into Google’s ad model. [Battelle, J. (2005) The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. New York: Penguin Group]. Data usage has brought forth a profitable advertising business model that doesn’t require a single salesperson.
“This is part of a broader revolution as we move from just financial and numerical data towards all sorts of non-financial metrics,” says Brynjolfsson. So the advantages of making data-based decisions as opposed to relying on “experience and intuition” are applicable to more industries than finance and web giants. [ Lohr, S. (2012) The Age of Big Data. News analysis on the New York Times Sunday Review [online]. Available from: http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all (Accessed 3 April 2012). ]
For example, the McKinsey study found that with “intelligent, creative use of Big Data,” European governments could realise €100 billion in operational efficiencies and United States healthcare could “create more than $300 billion in value every year.”
Retailers are processing information about sales, pricing, markets, demographics and weather to make informed decisions about which products to stock at what store locations and when to offer price reductions. Shipping companies are using local traffic and weather data to perfect their delivery routes. Police departments, starting with the NYPD, are learning to use historical arrest patterns, paydays, sporting events, rainfall and vacations to predict where and when crimes will happen. Online dating services rely on algorithms to sift through the data in member profiles to look for personal traits and other factors that might allow them to better match couples.
Big Data is also advancing the level of sophistication in interdisciplinary academia and other varieties of institutional research. Although there are numerous incompatible formats in which data can exist, once data is digitised into machine-readable format - and much of the data being produced and worked with today is natively digital - the barriers to compatibility and collaboration break down quickly.
In 2011, for instance, assistant director of the Text and Digital Media Analytics department at the University of Illinois, Kalev Leetaru, used a supercomputer to analyse an archive of 100 million global news articles spanning 30 years. The result was a network of 10 billion people, places and things, and 100 trillion relationships which he used “to forecast the Arab Spring, pinpoint Bin Laden’s location,” and visualise the evolution of human society. [Leetaru, K. (n.d.) Kalev H. Leetaru online homepage. Kalevleetaru.com [online]. Available from: http://www.kalevleetaru.com/ (Accessed 28 April 2012).]
Future Prospects - Moving into News Production
Given some of the material we’ve already covered on the growth of the Internet and digital data, a somewhat strange picture of the future of commerce emerges. For instance, the top ten in-demand jobs globally in 2010 didn’t even exist in 2004. [Fisch, K. et al. (2008) Did you know? [online]. Available from: http://www.youtube.com/watch?v=Mmz5qYbKsvM&feature=player_embedded (Accessed 28 April 2012).]
Industries that previously had no interest in or use for data are now spending large portions of their budgets on leveraging the analysis of Big Data into meaningful business intelligence and insight. Other areas are transforming from their 20th century analogues into full-fledged data-centric businesses.
To be sure, industry and societal transformations on this scale, taking place at this speed, create problems in education and training. The United States alone will face a shortage of 140,000-190,000 workers with “analytical expertise” and one and a half million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data. [McKinsey Global Institute (2009) Big Data: The next frontier for competition. McKinsey & Company Features [online]. Available from: http://www.mckinsey.com/Features/Big_Data (Accessed 24 April 2012).]
In the process of moving from a print-only world into a mixed media, digital-first or digital-only one, the journalism and publishing communities are as exposed to the costs of adapting to data-driven environments as the next industry.
The next section will…
This week, I resigned myself to lots and lots and lots of reading.
Exhaustive lists and noteage to follow but for now, I’m starting to see how the whole thing is going to focus.
One problem I’ve been toying with is the question of how to angle the paper so the title question reflects the most valuable insights in the paper.
Among the things I’m the most excited about is an insight about data and gamification - the argument loosely goes as follows:
- The importance and presence of data has become huge for enterprises and individuals and this applies to journalism and journalists too. (Furthermore, that role is growing - no trouble finding sources to mine and evidence to point to here).
- Because of a number of factors formalised in the fields of social, motivational, media and evolutionary psychology as well as in behavioural economics, any information about a system given to someone associated with that system (consumer, producer, designer, observer, etc) changes that person’s view of and relationship to the system in question.
- Consequently, gamification is just the most elaborately structured form of data provision - a buzzword that represents a kind of pinnacle in our formalised thinking about driving behaviour through engineering incentives for precise, discrete actions within data-reliant and data-driven systems.
- So in thinking about how newsrooms are using and will continue to use data to motivate behaviours and encourage the production of different kinds of content, it’s important to realise that the difference between gamifying a news production environment and giving reporters access to site analytics, even in formats as analogue as twice-daily printed reports, is one of degrees.
That is, there is a spectrum of which gamification is one current extreme and onto which falls everything from geo-tagged, auto-bylined and auto-dated story meta-data and story comment counters to all of the better-known metrics like hits, time on site, time on page, new visitors, etc.
- Past this realisation - supported, obviously, by primary and secondary literature - the role of this paper will be to look at specific news production environments, how content contributors are given access to various data and metrics and how that data is structured and delivered.
The task then is to assess, on a case-by-case basis, what the effects of that data exposure have been, are at present and might evolve into in the future.
So obviously, the last bullet point gets into the case studies and analysis area of the paper, while the argument outlined in the four bullet points above will need to come towards the end of the literature review - I think.
The lead up to this argument will be material about gamification, big data and some of the social sciences, mentioned above, that inform motivational and behavioural theory. This last though will be used more, I think, in the analysis where I can look at the particulars of what different newsrooms are doing and then analyse based on the breadth of literature I’ve been consuming in those fields.
The purpose of this study is to look at the nascent data and gamification strategies being employed in news production environments to incent the production of news journalism. To the extent it is possible this paper also aims to reveal what the practical consequences of these strategies might be for the people producing news as well as for the nature of news output produced under such systems.
The study will proceed as follows.
Firstly, reviewing what gamification means, and how it can mean different things to different people in different situations and contexts, will remove some of the ambiguity of this buzzword, the online presence of which Google Trends only acknowledges starting in 2010, and hopefully clear the way for more reasoned discussions moving forward. This will be accomplished through several real-world examples of gamification implementations in a range of industries for a range of desired results.
Secondly, we will look at successful and unsuccessful examples of reader-facing gamification systems implemented by news-producing publications with digital presences. Throughout these minor case studies we will look to the fields of motivational psychology and social psychology, behavioural economics and theories of game mechanics - sometimes called game thinking - to try to understand what makes gamification so successful when it is in fact successful and what the limits to that effectiveness might be. This section will bring our discussion of why gamification has been deemed, and in many cases empirically shown to be, such an effective tool for motivating specific behaviours closer to the realm of journalism and news production in particular, which is the focus of this study.
Next, a brief discussion of so-called “big data” and its presence in the lives of consumers, enterprises, society and culture will help to view gamification systems in light of the underlying collection and availability of data that makes such systems possible. This will lead to a discussion of data in the news production environment, how digital publications view the wealth of website and reader data available to them and why this matters both at an abstract, theoretical level and in day-to-day news production practice.
Another brief section will examine the less-visible, though arguably faster-growing, area of enterprise gamification - that is, the gamification of corporate workplace environments, often with the aim of increasing employee efficiency and satisfaction and of incentivising specific behaviours and tasks.
This will lead to a longer section on enterprise gamification at publications and in particular in daily news production environments. Following the earlier discussion about data in newsrooms, however, this section will also consider how publications treat site analytics and reader data, including how, in what forms and at what frequencies that data is made available to content producers both inside and outside the formal constructs of gamification.
Finally, the study will begin to examine the potential consequences firstly of exposing news content producers to specific site and reader metrics, and secondly of structuring those exposures into strategies with explicit aims in terms of influencing the production of news content.