Fall Data Challenge: Exploring a Dataset for Beginner Statisticians
October 14, 2022
When you first start exploring datasets, they can be intimidating! You’re faced with an overwhelming set of numbers and labels, and are trying to figure out what to do first.
You’re not alone: the Fall Data Challenge is one of the first real datasets many students explore. Don’t let it intimidate you! Tucked within the expanses of the spreadsheet grid, patterns and insights are waiting to be discovered. You just have to know how to hunt for them.
These four steps will help a beginner statistician find their way around analyzing a dataset.
Download and Open the Dataset
The first step to understanding a dataset is to face it head-on. It may not make sense right away, but don’t let that hold you back. Trust yourself and give it a moment – it will all start to come together as you take the next few steps.
For the Fall Data Challenge, the data is available in an Excel file. There are many different tools available to open a dataset, depending on the programming language you’re working in. The book R for Data Science by Hadley Wickham and Garrett Grolemund has some excellent tips, and a three-part series offers an introduction to the popular programming language R. Or, try another popular option with this beginner’s guide to Python.
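If you go the Python route, a first look at a dataset might resemble the sketch below. It assumes you have exported the worksheet to CSV so that Python’s built-in csv module can read it; the sample text, column names, and the load_rows helper are all invented for illustration and are not taken from the Fall Data Challenge file.

```python
import csv
import io

# Hypothetical sample standing in for a worksheet exported to CSV.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """student_id,grade,hw_hours
1,9,4
2,10,6
3,9,
4,11,5
"""


def load_rows(text_file):
    """Read CSV rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(text_file))


# With a real file you would pass open("your_file.csv", newline="") instead.
rows = load_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())

print("number of rows:", len(rows))
print("columns:", columns)
print("first row:", rows[0])
```

Notice that every value comes back as text, and the third row has an empty hw_hours entry. Spotting details like missing values early is part of getting to know the data.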
Understand the Definitions
Datasets tend to use a lot of abbreviations to fit key information into the confines of a spreadsheet.
That means that to fully understand what you’re looking at, you’ll need not just the dataset, but also the dataset dictionary or key. This will help you understand what the abbreviations in the columns and rows signify, and show you exactly how each aspect of the dataset is defined and measured. Attention to detail is important, as these definitions may not be quite what you’d assume!
As you review the dataset overview and codebook, and the definitions become clearer, you’ll gain a better grasp of what the numbers of the dataset signify and how they relate to each other – you’re already well on your way to some strong analysis!
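One way to keep the codebook close at hand is to write its definitions down as a lookup table in code. The sketch below is only an illustration: the column name attend_event, its codes, and the decode helper are hypothetical, so substitute the real entries from your dataset’s codebook.

```python
# Hypothetical codebook: maps each column's numeric codes to their meanings.
# The column name and codes are invented; copy the real ones from the codebook.
CODEBOOK = {
    "attend_event": {1: "Yes", 2: "No", -1: "Valid skip"},
}


def decode(column, code, codebook=CODEBOOK):
    """Translate a raw code into its label, flagging codes we have not defined."""
    mapping = codebook.get(column, {})
    return mapping.get(code, f"unknown code {code}")


raw_values = [1, 2, 2, -1, 1]
labels = [decode("attend_event", value) for value in raw_values]
print(labels)
```

Flagging unknown codes instead of silently ignoring them makes it easier to catch a definition you missed or misread.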
Observe and Ask Questions
Now that you have some context for what you’re looking at, review the dataset again. What do you notice? You may start to observe patterns, trends, or other points that stand out.
These observations offer a great starting point for your analysis of the dataset. Ask questions, look for the answers in the data, and follow your curiosity. This is called Exploratory Data Analysis (EDA), and it’s the heart of data analysis.
As you do this, you can take advantage of the tools in the program you’re using to dig deeper. The more time you spend practicing and building skills within your chosen programming language, the more freedom you’ll have to explore! Data visualizations can be especially helpful for understanding trends within the data.
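A question such as “do older students report more homework?” can be turned into a few lines of code. The sketch below uses only Python’s standard library and a handful of made-up records; the field names grade and hw_hours are assumptions for illustration, not columns from the challenge dataset.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Made-up records for illustration; real data would come from your dataset.
records = [
    {"grade": 9, "hw_hours": 4},
    {"grade": 9, "hw_hours": 6},
    {"grade": 10, "hw_hours": 5},
    {"grade": 10, "hw_hours": 9},
    {"grade": 11, "hw_hours": 7},
]

# How many records fall into each grade?
grade_counts = Counter(r["grade"] for r in records)

# Overall center of the homework-hours values.
hours = [r["hw_hours"] for r in records]
overall = {"mean": mean(hours), "median": median(hours)}

# Average homework hours within each grade.
by_grade = defaultdict(list)
for r in records:
    by_grade[r["grade"]].append(r["hw_hours"])
mean_by_grade = {grade: mean(values) for grade, values in sorted(by_grade.items())}

# A quick text "bar chart" of the counts, one # per record.
for grade, count in sorted(grade_counts.items()):
    print(f"grade {grade}: {'#' * count}")

print("overall:", overall)
print("mean hours by grade:", mean_by_grade)
```

Counts, averages, and a rough picture are often enough to suggest the next question to ask. With only five invented records, though, no real conclusion should be drawn from this output.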
Explain Your Findings
As you explore the data, answers to your questions should start to become clear. The final step of analyzing a dataset is to explain what the data has revealed to you.
When explaining your findings, remember that good communication is an important part of being a statistician or data scientist. Share your conclusions in a way that is easy for others to understand (even non-statisticians). It should also be easy for others to understand how the data led you to these conclusions.
If you struggle to explain how the data demonstrated one of your points, that’s a clue to revisit the findings!
Exploring Datasets Is a Skill Built with Practice and Curiosity
Datasets may seem daunting at first, but they’re also a rich resource for those who can learn to be comfortable with the unknown and set out for an adventure of questions and deepened understanding. Learning how to navigate through datasets and mine them for insights is a skillset that will serve you well no matter where your interests or career path lead you.
By practicing and staying curious, you’re well on your way to exciting data discoveries, no matter what kind of dataset you explore.