Case Study: School Budgeting with Machine Learning in Python
Learn how to build a model to automatically classify items in a school budget.
4 hours · 15 videos · 51 exercises · 59,259 learners · Statement of Accomplishment
Course Description
Data science isn't just for predicting ad clicks: it's also useful for social impact! This course is a case study from a machine learning competition on DrivenData. You'll explore a problem related to school district budgeting. A model that automatically classifies items in a school's budget makes it easier and faster for schools to compare their spending with other schools. In this course, you'll begin by building a baseline model, a simple first-pass approach. In particular, you'll do some natural language processing to prepare the budgets for modeling. Next, you'll have the opportunity to try your own techniques and see how they compare to those of participants in the competition. Finally, you'll see how the winner combined a number of expert techniques to build the most accurate model.
Exploring the raw data
In this chapter, you'll be introduced to the problem you'll be solving in this course: how do you accurately classify line items in a school budget based on what that money is being used for? You will explore the raw text and numeric values in the dataset, both quantitatively and visually. And you'll learn how to measure success when trying to predict class labels for each row of the dataset.
- Introducing the challenge (50 xp)
- What category of problem is this? (50 xp)
- What is the goal of the algorithm? (50 xp)
- Exploring the data (50 xp)
- Loading the data (50 xp)
- Summarizing the data (100 xp)
- Looking at the datatypes (50 xp)
- Exploring datatypes in pandas (50 xp)
- Encode the labels as categorical variables (100 xp)
- Counting unique labels (100 xp)
- How do we measure success? (50 xp)
- Penalizing highly confident wrong answers (50 xp)
- Computing log loss with NumPy (100 xp)
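The chapter closes with log loss, the competition's error metric. A minimal sketch of the idea (the course's own exercise function may differ in details): clip predictions away from 0 and 1 so the logarithm stays finite, then average the negative log-likelihood.

```python
import numpy as np

def compute_log_loss(predicted, actual, eps=1e-14):
    """Binary log loss, averaged over observations.

    Predictions are clipped to [eps, 1 - eps] so that
    np.log never receives exactly 0 or 1.
    """
    predicted = np.clip(predicted, eps, 1 - eps)
    return -np.mean(actual * np.log(predicted)
                    + (1 - actual) * np.log(1 - predicted))

# A confidently wrong prediction is penalized far more than an unsure one:
confident_wrong = compute_log_loss(np.array([0.9]), np.array([0]))  # ~2.30
unsure_wrong = compute_log_loss(np.array([0.5]), np.array([0]))     # ~0.69
```

This asymmetry is the point of the "penalizing highly confident wrong answers" exercise: hedging toward 0.5 is safer than a bold wrong guess.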
Creating a simple first model
In this chapter, you'll build a first-pass model. You'll use numeric data only to train the model. Spoiler alert: throwing out all of the text data is bad for performance! But you'll learn how to format your predictions. Then, you'll be introduced to natural language processing (NLP) in order to start working with the large amounts of text in the data.
- It's time to build a model (50 xp)
- Setting up a train-test split in scikit-learn (100 xp)
- Training a model (100 xp)
- Making predictions (50 xp)
- Use your model to predict values on holdout data (100 xp)
- Writing out your results to a csv for submission (100 xp)
- A very brief introduction to NLP (50 xp)
- Tokenizing text (50 xp)
- Testing your NLP credentials with n-grams (50 xp)
- Representing text numerically (50 xp)
- Creating a bag-of-words in scikit-learn (100 xp)
- Combining text columns for tokenization (100 xp)
- What's in a token? (100 xp)
Improving your model
Here, you'll improve on your benchmark model using pipelines. Because the budget consists of both text and numeric data, you'll learn how to build pipelines that process multiple types of data. You'll also explore how the flexibility of the pipeline workflow makes testing different approaches efficient, even in complicated problems like this one!
- Pipelines, feature & text preprocessing (50 xp)
- Instantiate pipeline (100 xp)
- Preprocessing numeric features (100 xp)
- Text features and feature unions (50 xp)
- Preprocessing text features (100 xp)
- Multiple types of processing: FunctionTransformer (100 xp)
- Multiple types of processing: FeatureUnion (100 xp)
- Choosing a classification model (50 xp)
- Using FunctionTransformer on the main dataset (100 xp)
- Add a model to the pipeline (100 xp)
- Try a different class of model (100 xp)
- Can you adjust the model or parameters to improve accuracy? (100 xp)
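The `FunctionTransformer` + `FeatureUnion` pattern the chapter covers can be sketched as follows. The DataFrame, its column names, and the labels are hypothetical toy data, not the course dataset; the structure (select columns, process each branch, union, then classify) is the point.

```python
import pandas as pd
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy frame: one text column, one numeric column (with a gap)
df = pd.DataFrame({
    "Description": ["Teacher salary", "Paper supplies",
                    "Bus fuel", "Teacher aide"],
    "Total": [50000.0, None, 1200.0, 21000.0],
})
y = ["Staff", "Supplies", "Transport", "Staff"]

# FunctionTransformer turns a column selection into a pipeline step
get_text = FunctionTransformer(lambda d: d["Description"], validate=False)
get_numeric = FunctionTransformer(lambda d: d[["Total"]], validate=False)

pl = Pipeline([
    ("union", FeatureUnion([
        # Numeric branch: select, impute missing values, scale
        ("numeric", Pipeline([("select", get_numeric),
                              ("impute", SimpleImputer()),
                              ("scale", StandardScaler())])),
        # Text branch: select, then bag-of-words
        ("text", Pipeline([("select", get_text),
                           ("vectorize", CountVectorizer())])),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

pl.fit(df, y)
predictions = pl.predict(df)
```

Swapping the classifier, the vectorizer settings, or a preprocessing step means changing one line of the pipeline, which is what makes trying different approaches cheap.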
Learning from the experts
In this chapter, you will learn the tricks used by the competition winner, and implement them yourself using scikit-learn. Enjoy!
- Learning from the expert: processing (50 xp)
- How many tokens? (50 xp)
- Deciding what's a word (100 xp)
- N-gram range in scikit-learn (100 xp)
- Learning from the expert: a stats trick (50 xp)
- Which models of the data include interaction terms? (50 xp)
- Implement interaction modeling in scikit-learn (100 xp)
- Learning from the expert: the winning model (50 xp)
- Why is hashing a useful trick? (50 xp)
- Implementing the hashing trick in scikit-learn (100 xp)
- Build the winning model (100 xp)
- What tactics got the winner the best score? (50 xp)
- Next steps and the social impact of your work (50 xp)
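The hashing trick from the list above can be sketched with scikit-learn's `HashingVectorizer` (toy documents again; the `n_features` value is an illustrative choice, not the winner's setting). Instead of building a vocabulary, each token is hashed directly to a column index, so memory use stays fixed no matter how many distinct tokens the data contains.

```python
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["Teacher salary grade 5",
        "Classroom supplies paper pencils",
        "Teacher professional development"]

# Hash each token into one of 2**10 columns; no vocabulary is stored,
# at the cost of occasional (and usually harmless) hash collisions.
vec = HashingVectorizer(n_features=2**10, norm=None, alternate_sign=False)
X = vec.transform(docs)

print(X.shape)  # (3, 1024)
```

Because there is no vocabulary to fit, `transform` alone suffices, which also means the same vectorizer can be applied to holdout data without leaking information from it.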
Collaborators
Peter Bull
Co-founder of DrivenData