Creating a Solution with Statistics, One E. coli Colony at a Time
December 14, 2017
Ana Humphrey, a junior at T. C. Williams High School in Alexandria, VA, is not your average student. As a national science fair winner, founder of two organizations, and creator of an innovative, low-cost approach to water testing, Ana is a girl on a mission. Along the way, her accomplishments have been powered by statistical analysis, computer programming, and a passion for the environment.
Background on Bacteria
It started when Ana realized, while monitoring local waterways for bacteria, that the existing methods for accurately detecting bacteria in water are both expensive and difficult to perform. She explained, “To collect good, comprehensive data, you need to have significant resources, and that’s not sustainable if you’re trying to cover a large area or if you’re in a location that doesn’t necessarily have access to those resources.”
While exploring other options, Ana realized that although some tests were inexpensive and simple, “the way the data was quantified made it inaccurate.”
For Ana, this was a problem, and she was determined to come up with a solution. She aimed to create a method of monitoring waterborne bacteria that is simple, inexpensive, and accurate, so that anyone can produce comprehensive databases of bacteria levels in local waterways without running out of resources.
Ana had previous experience in bacteria testing, colony counting, and other data-focused research, but said her idea came together when she saw the work of another student who had been using image analysis to identify lung tumors in CT scans.
“I thought if this student could create something to identify lung tumors, then why can’t I use a similar idea to identify E. coli colonies in these tests that are hard to count, really hard to get the data for? And it developed from there,” she said.
New Method for the Madness
Ana’s idea was to create an app that would accurately count and classify the bacteria colonies through a photo of a petri dish. This app would be used in conjunction with Coliscan Easygel tests and came to be known as ColiFind—a digital image analysis application to identify E. coli colonies in Coliscan Easygel water quality tests. Her innovative solution employs five digital E. coli signatures in the image’s color intensity, saturation and pattern correlation, and makes the process of measuring bacteria in water easier, more objective, and more consistent.
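To make the idea of a color-based signature concrete, here is a minimal sketch of how a single pixel might be checked against hue, saturation, and intensity limits. It is not ColiFind's method: the function names and every threshold below are hypothetical values picked for illustration, and it leaves out pattern correlation entirely.

```python
import colorsys

# Hypothetical thresholds for illustration only; these are NOT ColiFind's values.
HUE_RANGE = (0.58, 0.80)   # roughly blue through purple on a 0-1 hue wheel
MIN_SATURATION = 0.35      # ignore washed-out, near-grey pixels
MAX_VALUE = 0.75           # assume colonies are darker than the background gel


def looks_like_colony(rgb):
    """Return True if an (R, G, B) pixel with 0-255 channels passes all three checks."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return (
        HUE_RANGE[0] <= hue <= HUE_RANGE[1]
        and saturation >= MIN_SATURATION
        and value <= MAX_VALUE
    )


def colony_pixel_fraction(pixels):
    """Fraction of pixels in an iterable that pass the colony check."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_colony(p) for p in pixels) / len(pixels)
```

A real application would also have to group matching pixels into individual colonies and cope with lighting differences between photos, which this sketch does not attempt.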
As Ana developed ColiFind, she realized she needed to better understand statistics and was determined to learn it. Ana was then connected through her science research class with Mark Otto, a statistician with the U.S. Fish and Wildlife Service and the American Statistical Association. Mr. Otto has served as Ana’s mentor, emphasizing that statistics is a huge aspect of science and experimentation.
Ana added to this, saying, “An important thing in testing a new method is you have to do high-quality, detailed and meticulous validation, and statistics is a large part of that validation process. When validating any sort of assay, it is essential that the assay’s accuracy be rigorously tested.
“For ColiFind, this means finding the most valid statistics to communicate false positives and the overall bacteria colony detection rate. It also means identifying and limiting possible sources of error in the data collection process. Because ColiFind is trained to detect E. coli colonies using images provided by volunteers, it is essential that these images are representative of the images ColiFind will have to process when it is deployed in the field. Having an unbiased sample of training images will make sure that ColiFind’s accuracy rates in beta testing translate over to real-world use.”
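As a rough illustration of the two quantities mentioned in the quote, the sketch below computes a detection rate and the share of detections that are false positives, each with a Wilson 95% confidence interval. It assumes colonies found by the app have already been matched against a careful manual count; the function names are hypothetical, and this is not necessarily the statistic Ana and Mr. Otto settled on.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


def validation_summary(true_pos, false_pos, false_neg):
    """Summarize app detections against a manual reference count.

    true_pos:  colonies found by both the app and the manual count
    false_pos: app detections that were not real colonies
    false_neg: real colonies the app missed
    """
    detected = true_pos + false_pos   # everything the app flagged
    actual = true_pos + false_neg     # everything that was really there
    return {
        "detection_rate": true_pos / actual,
        "detection_rate_ci": wilson_interval(true_pos, actual),
        "false_positive_share": false_pos / detected,
        "false_positive_share_ci": wilson_interval(false_pos, detected),
    }
```

With made-up counts, `validation_summary(true_pos=90, false_pos=5, false_neg=10)` reports a detection rate of 0.9 along with an interval around it. The false-positive figure is expressed as a share of detections because "true negatives" are not well defined when counting colonies on a dish.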
Working with Mr. Otto and learning statistics is giving Ana the foundation she needs to make sure that this validation is done correctly.
Making the World a Little Better
Ana is also known for her other environmental initiatives and her research related to water quality.
“I have a number of projects, things I do because I really care about them,” she said.
She previously founded Kidslovemountains.org, a website for kids that explains mountaintop removal mining, a form of mining that blows off the tops of mountains to get to the coal beneath and has environmental ramifications and negative impacts on the surrounding communities.
More recently, Ana created the Watershed Warriors Initiative, an organization that gives students in her school system hands-on science lessons about environmental issues prevalent in her own community. The Watershed Warriors Initiative is currently working to expand its efforts to other school districts.
Why does Ana put so much of her time and effort into these extracurriculars? “I think I can help make the world a little bit better,” she said.