Jeremy Singer-Vine Has the Buzz on Using Data in Journalism
May 10, 2016
We recently spoke with Jeremy Singer-Vine, a data editor at BuzzFeed News’ investigative unit. He explains how data has become more important to journalism in recent years and offers advice to aspiring journalists on how to become data-savvy.
What came first for you as a student — your love of data or writing? Did you study statistics?
For me, journalism came first. I worked on my high school and college newspapers — first as a photographer and later as a reporter. During college I took a couple of classes on statistics and scientific research methods, which I enjoyed. But I didn’t start doing analyses of my own until early in my post-college career. Once I did, I quickly realized how much I enjoyed analyzing data, and weaving those analyses into traditional reporting.
Data journalism may be a new phrase to some, but the practice of using data, especially in the investigative journalism you do at BuzzFeed, is not. What’s changing in the way journalists approach data?
You’re absolutely right; there’s a long history of using data analysis in journalism. At the same time, data-driven journalism does seem to have become both more prevalent and more popular in the last five or ten years. I think a few factors might explain this:
- Computers have become cheaper and faster. That’s made it possible for more journalists to pursue this type of research, to learn the necessary skills earlier in life, and to analyze larger datasets than was previously possible.
- The Internet and the World Wide Web have enabled new, exciting ways of presenting data-intensive reporting.
- As the global economy has become increasingly reliant on technology and data, I think readers have become increasingly interested in — and comfortable with — these types of stories.
What’s changing now about the way we approach data? It seems a few approaches are becoming more common:
- Integrating advanced statistical analysis and machine learning into the reporting process, and making the results of those computations understandable to readers. See, for example, the Los Angeles Times’ use of machine learning to identify misclassified crimes.
- Publishing more of the code and data used in stories. We’ve made this a big priority at BuzzFeed News.
- Trying to predict the future. While that may sound glib, it’s not! FiveThirtyEight’s election forecasts are a great example of journalists using statistical techniques to “model” future outcomes. ProPublica recently published interactive simulations of what might happen when the next mega-storm hits Houston. And USA Today used Census data to predict the future of diversity in America.
You also write a newsletter — Data is Plural — of useful or interesting data sets. Do you get story ideas from data sets or seek out data sets based on a story idea? Is there a particular story where the data took you in a different direction or revealed something deeper?
The data <-> story relationship goes both ways, even within a single story. For me, the most gratifying type of story involves a constant bouncing back-and-forth between data analysis and other types of reporting. An interview may give you an idea for an analysis, which may point in another direction, which may lead to another analysis. Our series on the United States’ H-2 immigration program is an example of reporting that proceeded in that sort of highly collaborative mode.
What advice would you give to aspiring data journalists or journalists of any beat when it comes to working with data?
Two bits of advice:
- Data, despite its appearances, is deeply human. When you think about data, don’t just think of the “product” (say, the spreadsheet or database) but also of the process: Who chose to collect this data, and why? How was the data collected? Was it cleaned, massaged, or otherwise altered before it was published? What isn’t in the dataset? And how might all of that affect how you interpret the data?
- There’s a huge community of people eager to help you. Take advantage of it! One great resource is NICAR-L, a listserv run by the National Institute for Computer-Assisted Reporting. Its subscribers are mainly journalists, but you don’t have to be one to join. Ask a question and you’ll often find people willing to help within minutes or hours.
We tell students that studying statistics can help solve any problem. Have you finally figured out why Corbin Bleu is so popular on Wikipedia?
The puzzle of Corbin Bleu’s international Wiki-fame continues to baffle us. We could use your help!