Jeremy Singer-Vine has the Buzz on using Data in Journalism
May 10, 2016
We recently spoke with Jeremy Singer-Vine, data editor on the BuzzFeed News investigative unit. He explains how data has become more important to journalism in recent years and offers advice to aspiring journalists on how to become data savvy.
What came first for you as a student — your love of data or writing? Did you study statistics?
For me, journalism came first. I worked on my high school and college newspapers — first as a photographer and later as a reporter. During college I took a couple of classes on statistics and scientific research methods, which I enjoyed. But I didn’t start doing analyses of my own until early in my post-college career. Once I did, I quickly realized how much I enjoyed analyzing data, and weaving those analyses into traditional reporting.
Data journalism may be a new phrase to some, but the practice of using data, especially in the investigative journalism you do at BuzzFeed, is not. What’s changing in the way journalists approach data?
You’re absolutely right; there’s a long history of using data analysis in journalism. At the same time, data-driven journalism does seem to have become both more prevalent and more popular in the last five or ten years. I think a few factors might explain this:
- Computers have become cheaper and faster. That’s made it possible for more journalists to pursue this type of research, to learn the necessary skills earlier in life, and to analyze larger datasets than was previously possible.
- The Internet and World Wide Web have enabled new, exciting ways of presenting data-intensive reporting.
- As the global economy has become increasingly reliant on technology and data, I think readers have become increasingly interested in — and comfortable with — these types of stories.
What’s changing now about the way we approach data? It seems a few approaches are becoming more common:
- Integrating advanced statistical analysis and machine learning into the reporting process, and making the results of those computations understandable to readers. See, for example, the Los Angeles Times’ use of machine learning to identify misclassified crimes.
- Publishing more of the code and data used in stories. We’ve made this a big priority at BuzzFeed News.
- Trying to predict the future. While that may sound glib, it’s not! FiveThirtyEight’s election forecasts are a great example of journalists using statistical techniques to “model” future outcomes. ProPublica recently published interactive simulations of what might happen when the next mega-storm hits Houston. And USA Today used Census data to predict the future of diversity in America.
You also write a newsletter — Data is Plural — of useful or interesting data sets. Do you get story ideas from data sets or seek out data sets based on a story idea? Is there a particular story where the data took you in a different direction or revealed something deeper?
The data <-> story relationship goes both ways, even within a single story. For me, the most gratifying type of story involves a constant bouncing back-and-forth between data analysis and other types of reporting. An interview may give you an idea for an analysis, which may point in another direction, which may lead to another analysis. Our series on the United States’ H-2 immigration program is an example of reporting that proceeded in that sort of highly collaborative mode.
What advice would you give to aspiring data journalists or journalists of any beat when it comes to working with data?
Two bits of advice:
- Data, despite its appearances, is deeply human. When you think about data, don’t just think of the “product” (say, the spreadsheet or database) but also of the process: Who chose to collect this data, and why? How was the data collected? Was it cleaned, massaged, or otherwise altered before it was published? What isn’t in the dataset? And how might all of that affect how you interpret the data?
- There’s a huge community of people eager to help you. Take advantage of it! One great resource is NICAR-L, a listserv run by the National Institute for Computer-Assisted Reporting. The subscribers are mainly journalists, but you don’t have to be one to join. Ask a question and you’ll often find people willing to help within minutes or hours.
We tell students that studying statistics can help solve any problem. Have you finally figured out why Corbin Bleu is so popular on Wikipedia?
The puzzle of Corbin Bleu’s international Wiki-fame continues to baffle us. We could use your help!