Stats in Action: How One Data Journalist Writes an Article 

Nick Thieme is an AAAS Mass Media Fellow with the sponsorship of the American Statistical Association, working at Slate Magazine over the summer months. Through this series, he gives us a glimpse behind the scenes of his journey as a statistician and data journalist. 

Over the last 10 weeks, I’ve written about robot soccer tournaments, fidget spinners, and whether AI is developing sentience. It’s been as tough as it has been fun.  

Despite this broad variety of topics, my articles have all gone through extremely similar stages of development, and, in this month’s update, I’m sharing a look at that development. To do this, I’ll take you through the process of one of my recent stories—one about p-values. 


Behind every story is a sticky idea. In this story, my statistics background drew me to an article recommending that the “threshold for significance” in hypothesis testing be lowered from .05 to .005. Immediately, this seemed misguided. 

The article, published jointly by 72 researchers on PsyArXiv, was released on July 25. That same day, I started tossing around ideas for a response in Slate.

Getting to write about this topic was a delight. My undergrad degree is in statistics, and I’ve spent a lot of time doing mathematical statistics during my graduate studies. Even so, every time I test hypotheses, I have to resist the impulse to treat p-values as measures of substantive significance. If I have to be careful to avoid that mistake, I assume laypeople do too, and addressing that impulse in lay audiences is the kind of statistical evangelism that’s important to me. But, partly because I know more about the theory behind this subject than about the others I write on, it was tough to make my writing digestible and broadly interesting.

Story angle 

Unlike with most stories, once I saw the article that led to my piece, I more or less knew the content and tone of what I was going to write. At least, that’s what I thought.

As I started writing about assumptions under the null hypothesis, though, it became clear that the article I had in mind was better suited to the Joint Statistical Meetings than to Slate Magazine.

So, even though I understand statistics better than most of the subjects I write on, I found myself taking the same first step as with my other articles: trying to find the thread of an interesting narrative.

Here, to the rescue, came my editor. Over the last several months, she’s displayed a monastic patience with my slow learning, and one of the points I’ve been slowest in picking up is the difference between topic and story. While the topic of this piece was clear, “This research doesn’t address the reproducibility crisis properly,” the story was not. “What interesting and potentially surprising narrative makes this topic interesting to readers?” she asked me.

With the question framed this way, we realized the research addressed a “consensus on p-values” that had already been dismissed and discouraged. Here was where domain knowledge proved useful. I’d read the ASA’s wonderful statement on p-values, and I’d read the responses. I knew some journals had banned p-values, others discouraged them, and still others gave nuanced suggestions. It turned out I’d been writing this story in my head for the last year and a half.


Even with a clear angle on the story, the article was only half done. Outside academia, I have fairly nuanced ideas when it comes to science, but, compared with expert thinkers, I’m a kid in a sandbox. So I took the usual step and interviewed someone smarter than me, a paragon of statistical practice and someone whose online debates I’ve read since undergrad, Dr. Deborah Mayo of Virginia Tech.  

Interviewing experts is the best way to make a story better. Dr. Mayo gave me new language to frame the terms of the debate, exposed misconceptions I had about the history of the Neyman-Pearson/Fisher debates (misconceptions I’m excited to read about in her upcoming book), and made a point that perfectly illustrated the issues with the research. 

Writing colloquially about a subject you understand well is tricky. This article convinced me it’s ultimately not so different from writing an article about the use of fetal bovine serum in lab-grown meat—something I knew nothing about prior to writing. You identify a topic, conduct research to separate the story from the topic, and talk to people smarter than you (in my case, my editor and Dr. Mayo).  


On August 2, I argued in Slate that the p-value proposal does little to address the reproducibility crisis, and, stranger still, that it’s a strawman policy, aimed at a “consensus” that has dramatically changed in recent years.  

Read the full article here.  

From concept to publication 

No matter the topic, there is a process that helps me develop a story into something worth reading. For heavily reported stories, like this one, that process is especially laborious. The facts drive the content, making research and fact-checking essential, and facts are not always easy to come by. But, even when the topic has nothing to do with statistics, it’s exactly that labor that gives the article its value.

