Computational Journalism

Introduction

According to media scholar Nicholas Diakopoulos, “computational journalism” refers to the application of computing and computational thinking to the activities of journalism (e.g., newsgathering), all while upholding core values of journalism (e.g., accuracy). As such, computational journalism isn’t just about the technology; it is also a way of approaching the practice of journalism.

As a way of thinking, computational journalism is rooted in the idea of translating the messy world into organized (structured) information schemas. For example, the many attributes (aspects) of a murder incident can be indexed using taxonomies and categories of people, entities, concepts, events, and locations (e.g., who the perpetrator was, what kind of weapon they used, and what sort of location the murder took place in). Journalists have always done this informally in order to produce things like the summary lead (the 5 Ws and H). Computational journalism, however, requires doing it formally, such as by storing each piece of that information as a distinct item in a database, as in the sketch below.
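
To make that shift from informal to formal structuring concrete, the following minimal sketch (illustrative only, not drawn from any real newsroom system) stores the attributes of a hypothetical incident as distinct, queryable fields in a small database. Every field name and value here is an assumption made for the example.

    import sqlite3

    # Create an in-memory database with a simple schema for incident records.
    # Each attribute of the incident becomes a distinct, queryable field.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE incidents (
            id          INTEGER PRIMARY KEY,
            who         TEXT,   -- person or entity involved
            what        TEXT,   -- category of event (e.g., type of crime)
            weapon      TEXT,   -- attribute drawn from a controlled vocabulary
            location    TEXT,   -- kind of place, ideally from a location taxonomy
            occurred_on TEXT    -- date in ISO format
        )
    """)

    # A single structured record; in informal practice, this information
    # would live only in the prose of a summary lead.
    conn.execute(
        "INSERT INTO incidents (who, what, weapon, location, occurred_on) "
        "VALUES (?, ?, ?, ?, ?)",
        ("unknown suspect", "homicide", "firearm", "parking garage", "2024-03-15"),
    )

    # Structured storage makes aggregate questions easy to answer,
    # e.g., how many recorded incidents involved a firearm?
    count = conn.execute(
        "SELECT COUNT(*) FROM incidents WHERE weapon = ?", ("firearm",)
    ).fetchone()[0]
    print(count)

Once the attributes exist as fields rather than prose, questions that span many incidents (how many involved a particular weapon, which locations recur) can be answered with a query instead of by rereading stories.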

A Brief History

Although computational journalism may seem novel, we can trace some of its informal origins to the 1800s. For example, the very first edition of The Manchester Guardian (May 5, 1821) offered a table listing the number of patients at a local hospital who were inoculated against cowpox, the number who were released after surviving the disease, and the number who died from it. It similarly offered other figures about the patients who were being treated after accidents as well as those being held in its ‘lunatic asylum.’ While no computers were used to compile that table (computers had not yet been invented), the Guardian journalists were already engaging in the form of thinking that powers computational journalism today.

The machine-aided form of journalism that is more typically associated with today’s computational journalism arguably began in 1952, when CBS News used a digital computer to predict the outcome of the presidential election from partial results. By the 1960s, journalists like Phil Meyer of the Detroit Free Press and Clarence Jones of the Miami Herald were using computers to analyze everything from survey data (e.g., to determine the underlying causes of the 1967 Detroit riot) to court records (e.g., to uncover bias in the criminal justice system in Dade County). By the 1980s, an array of different computational practices for gathering and analyzing news began to emerge, many of which were categorized under what was termed “computer-assisted reporting.” Put another way, the logic used in computational journalism was being increasingly paired with the technology that is now associated with it.

As the Internet proliferated in the 1990s, journalistic practices became even more computationally oriented. In particular, journalistic outlets started investing more money in “digital” positions, resulting in new jobs and departments. This included the hiring of multi-person software development teams who could work with journalists lacking technical backgrounds to produce computational journalism stories and develop computational journalism workflows. While such teams, processes, and products remained relatively small and had limited influence on the broader practice of journalism, they were important for seeding the changes to journalistic norms and logics that would accelerate in the coming years.

Computational Journalism in the 21st Century

By the late 2000s, new areas of specialization were emerging. These included automated journalism (having machines produce news content from data with limited human supervision), conversational journalism (communicating news via automated, dialogic interfaces like chatbots), data journalism (using data to report, analyze, write, and visualize stories), sensor journalism (using electronic sensors to collect and analyze new data for journalistic purposes), and structured journalism (publishing news as data).
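
As a simple illustration of how automated journalism can work, the sketch below fills a fixed sentence template from a row of hypothetical structured data. The figures, field names, and wording are invented for the example, but template filling of this kind is a common starting point for machine-written news.

    # Hypothetical quarterly revenue figures for a single company.
    report = {
        "company": "Example Corp",
        "quarter": "Q2",
        "revenue_m": 120.4,        # revenue this quarter, in millions
        "prior_revenue_m": 101.7,  # revenue last quarter, in millions
    }

    # Derive the figure the template needs and pick wording based on the data.
    change_pct = (report["revenue_m"] - report["prior_revenue_m"]) / report["prior_revenue_m"] * 100
    direction = "rose" if change_pct >= 0 else "fell"

    # Fill a fixed sentence template with the structured values.
    story = (
        f"{report['company']} reported {report['quarter']} revenue of "
        f"${report['revenue_m']:.1f} million, which {direction} "
        f"{abs(change_pct):.1f} percent from the prior quarter."
    )
    print(story)

Production systems typically layer many such templates and wording rules on top of richer data, but the core step of turning structured data into publishable prose is the same.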

While some of those specializations emerged relatively independently from one another, they are still centered on interpreting the world through data, and generally rely on computational processes to translate knowledge into data and data into knowledge. As such, they are fundamentally computational forms of journalism, regardless of the amount of technological wherewithal that is actually required.

Computational journalism also aims to blend logics and processes spanning multiple disciplines, such as journalism, computer science, information retrieval, and visual design. With regard to journalism, it involves a significant shift away from the traditional focus on nuance (in reporting), individualism (in subject or focus), and creativity (in writing). Instead, it orients itself toward standardization (in reporting), scale (in subject or focus), and efficiency (in writing). These differences in logics and approaches often make it difficult for editorial and technical actors to work together on computational journalism projects. In fact, researchers have found that when computational journalism projects fizzle or fail, it is often due to the philosophical and procedural differences among members of the team.

Nevertheless, computational forms of journalism have been used to produce highly impactful work in recent years, both in terms of journalistic content and new tools for producing journalism. Several computational journalists (who don’t always self-identify as such) have won prestigious awards for this kind of work. For example, Jay Hancock and Elizabeth Lucas of Kaiser Health News won a Pulitzer Prize in 2020 for exposing predatory bill collection by the University of Virginia Health System, which had forced many low-income patients into bankruptcy. Hancock and Lucas worked together with an open data advocate to collect and analyze information about millions of civil court records in Virginia — far more than a human journalist could inspect manually. Their reporting resulted in the non-profit, state-run hospital changing its behavior.

On the software side, journalists have worked alongside software development teams to create technologies like DocumentCloud, an all-in-one platform designed to help journalists (and teams of journalists working across multiple journalistic outlets) upload, organize, analyze, annotate, search, and embed documents. The project brings together existing tools from disciplines like computational linguistics into an interface that is accessible to many journalists. Similarly, MuckRock has made it easier for journalists to file multiple Freedom of Information Act requests at once, write news stories from the resulting records, and share the data with other journalists.

Computational journalism demands the same high ethical standards as traditional journalism to ensure that the process of gathering, analyzing, and disseminating information to the public is truthful, independent, and inclusive. However, computational forms of journalism do not always have a distinct code of ethics. This can be challenging because computational journalists tend to place a greater premium on transparency and openness than traditional journalists, which can introduce ethical tensions. For example, some computational journalists have been criticized as naive for posting unredacted datasets (that placed unwitting individuals at risk) or for not reviewing automated stories (that included misinformation).

It is expected that computational journalism will only continue to grow in the coming years. For example, The New York Times launched a short program to teach its journalists data skills and published the course materials openly online. And journalistic outlets like BuzzFeed News, FiveThirtyEight, The Marshall Project, and The Washington Post sometimes post the code powering their computational journalism on the code-sharing platform GitHub in order to promote their craft. Moreover, as computers become more powerful and intelligent, automation is likely to become more commonplace, as will the tasks related to translating the natural world into structured data.


Key Takeaways

  • Computational journalism covers both the application of computing and computational thinking to various journalistic activities, including information gathering, sensemaking, and information dissemination.

  • Computational journalism is not an entirely new phenomenon, but it has developed rapidly in recent years as new forms of journalism have emerged.

  • Computational journalism has been used to produce both award-winning journalistic work and impactful journalism-oriented technologies.