Fall Data Challenge: Exploring a Dataset for Beginner Statisticians
October 14, 2022
When you first start exploring datasets, they can be intimidating! You’re faced with an overwhelming set of numbers and labels, and are trying to figure out what to do first.
You’re not alone: the Fall Data Challenge is one of the first true datasets many students explore. Don’t let it intimidate you! Tucked within the expanses of the spreadsheet grid, patterns and insights are waiting to be discovered. You just have to know how to hunt for them.
These four steps will help a beginner statistician find their way around a dataset.
Download and Open the Dataset
The first step to understanding a dataset is to face it head-on. It may not make sense right away, but don’t let that hold you back. Trust yourself and give it a moment – it will all start to come together as you take the next few steps.
For the Fall Data Challenge, the data is available in an Excel file. There are many tools for opening a dataset, depending on the programming language you’re working in. The book R for Data Science by Hadley Wickham and Garrett Grolemund has some excellent tips. This three-part series offers an introduction to the popular programming language R. Or, try another popular option with this beginner’s guide to Python.
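If you go the Python route, a first look at a dataset might resemble the minimal sketch below. It assumes the Excel sheet has been exported as a CSV file, and it uses a tiny made-up sample in place of the real data: the column names (student_id, grade, hrs_after) and the load_rows helper are hypothetical and chosen only for illustration.

```python
import csv
import io

# Made-up sample standing in for a sheet exported from Excel as CSV.
# The column names and values are hypothetical, not from the real dataset.
SAMPLE_CSV = """student_id,grade,hrs_after
1,9,2.5
2,10,0
3,9,4
"""


def load_rows(text_file):
    """Read every row of an open CSV file into a list of dictionaries."""
    return list(csv.DictReader(text_file))


rows = load_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())

# A first look: how big is the data, and what does one record contain?
print("Number of rows:", len(rows))
print("Columns:", columns)
print("First row:", rows[0])
```

With a real file, you would pass an opened file such as open("your_file.csv", newline="") instead of the in-memory sample. Note that the csv module reads every value as text, so numbers need converting before you do any math with them. If you have the pandas library installed, its read_excel function can read an Excel workbook directly.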
Understand the Definitions
Datasets tend to use a lot of abbreviations to fit the key information into the parameters of a spreadsheet.
This means that to fully understand what you’re looking at, you’ll need not just the dataset but also its data dictionary, or key. This will help you understand what the abbreviations in the columns and rows signify, and show you exactly how each aspect of the dataset is defined and measured. Attention to detail is important, as these definitions may not be quite what you’d assume!
As you review the dataset overview and codebook, and the definitions become clearer, you’ll gain a better grasp of what the numbers of the dataset signify and how they relate to each other – you’re already well on your way to some strong analysis!
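One way to keep definitions close at hand while you work is to store them next to the column names. The sketch below is only an illustration: the codebook entries, the column names, and the describe_columns helper are all invented here, so swap in the definitions from the actual codebook.

```python
# Hypothetical codebook: abbreviations and definitions invented for illustration.
CODEBOOK = {
    "student_id": "Anonymous identifier for each student",
    "grade": "Grade level at the time of the survey",
    "hrs_after": "Hours per week spent in after-school activities",
}


def describe_columns(columns, codebook):
    """Pair each column name with its definition, flagging any that are missing."""
    return {col: codebook.get(col, "NOT IN CODEBOOK: look this up") for col in columns}


# "wt" is deliberately left out of the codebook to show the flag.
columns = ["student_id", "grade", "hrs_after", "wt"]
descriptions = describe_columns(columns, CODEBOOK)

for name, meaning in descriptions.items():
    print(f"{name:12} {meaning}")
```

Flagging columns that have no definition is a small design choice that pays off: it points you to exactly the abbreviations you still need to look up before drawing any conclusions from them.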
Observe and Ask Questions
Now that you have some context for what you’re looking at, review the dataset again. What do you notice? You may start to observe patterns, trends, or other points that stand out.
These observations offer a great starting point for your analysis of the dataset. Ask questions, look for the answers in the data, and follow your curiosity. This is called Exploratory Data Analysis (EDA), and it’s the heart of data analysis.
As you do this, you can take advantage of the tools in the program you’re using to dig deeper. The more time you spend practicing and building skills within your chosen programming language, the more freedom you’ll have to explore! Data visualizations can be especially helpful for understanding trends within the data.
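As a small example of turning a question into code, suppose you wondered whether students in different grades spend different amounts of time in after-school activities. The sketch below averages a value within each group and prints a rough text bar chart. The rows are made up, and the column names and the mean_by_group helper are hypothetical; a plotting library would give a nicer picture, but plain text is enough to spot a pattern.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; the column names are hypothetical.
rows = [
    {"grade": 9, "hrs_after": 2.5},
    {"grade": 9, "hrs_after": 4.0},
    {"grade": 10, "hrs_after": 0.0},
    {"grade": 10, "hrs_after": 1.0},
    {"grade": 11, "hrs_after": 6.0},
]


def mean_by_group(rows, group_key, value_key):
    """Average value_key within each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in sorted(groups.items())}


averages = mean_by_group(rows, "grade", "hrs_after")

# A rough text bar chart: each "#" stands for about half an hour.
for grade, avg in averages.items():
    bar = "#" * round(avg * 2)
    print(f"Grade {grade}: {bar} {avg:.2f}")
```

A difference between groups in a summary like this is a prompt for more questions, not a conclusion. Check how many rows sit behind each average before reading too much into it.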
Explain Your Findings
As you explore the data, answers to your questions should start to become clear. The final step of analyzing a dataset is to explain what the data has revealed to you.
When explaining your findings, remember that good communication is an important part of being a statistician or data scientist. Share your conclusions in a way that is easy for others to understand (even non-statisticians). It should also be easy for others to understand how the data led you to these conclusions.
If you struggle to explain how the data demonstrated one of your points, that’s a clue to revisit the findings!
Exploring Datasets Is a Skill Built with Practice and Curiosity
Datasets may seem daunting at first, but they’re also a rich resource for those who can learn to be comfortable with the unknown and set out for an adventure of questions and deepened understanding. Learning how to navigate through datasets and mine them for insights is a skillset that will serve you well no matter where your interests or career path lead you.
By practicing and staying curious, you’re well on your way to exciting data discoveries, no matter what kind of dataset you explore.