Case Study: School Budgeting with Machine Learning in Python

Learn how to build a model to automatically classify items in a school budget.

4 hours, 15 videos, 51 exercises, 59,276 learners, Statement of Accomplishment

Course Description

Data science isn't just for predicting ad clicks; it's also useful for social impact! This course is a case study from a machine learning competition on DrivenData. You'll explore a problem related to school district budgeting. A model that automatically classifies items in a school's budget makes it easier and faster for schools to compare their spending with that of other schools. In this course, you'll begin by building a baseline model, a simple first-pass approach. In particular, you'll do some natural language processing to prepare the budgets for modeling. Next, you'll have the opportunity to try your own techniques and see how they compare with those of participants from the competition. Finally, you'll see how the winner was able to combine a number of expert techniques to build the most accurate model.

Chapter 1: Exploring the raw data (free)
In this chapter, you'll be introduced to the problem you'll be solving in this course: how do you accurately classify line items in a school budget based on what the money is used for? You'll explore the raw text and numeric values in the dataset, both quantitatively and visually, and learn how to measure success when predicting class labels for each row of the dataset.
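A first look at the data usually means summarizing the numeric columns, checking column dtypes, and encoding the target labels as categorical variables before counting how many classes each one has. The sketch below assumes a tiny hypothetical DataFrame in place of the real file (which would normally be loaded with `pd.read_csv`); the column names `Function`, `Object_Type`, `Object_Description`, `FTE`, and `Total` are illustrative assumptions, not a guaranteed copy of the competition schema.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the budget data (assumed column names).
# Label columns hold category names; FTE and Total are numeric.
df = pd.DataFrame({
    "Function": ["Teacher Compensation", "Student Transportation", "Teacher Compensation"],
    "Object_Type": ["Base Salary/Compensation", "Other Non-Compensation", "Benefits"],
    "Object_Description": ["TEACHER SALARY", "BUS FUEL", np.nan],
    "FTE": [1.0, np.nan, 0.5],
    "Total": [50000.0, 1200.0, 8000.0],
})

LABELS = ["Function", "Object_Type"]  # assumed subset of the target columns

# Quantitative summary of the numeric columns and a tally of column dtypes
print(df.describe())
print(df.dtypes.value_counts())

# Encode each label column as a pandas categorical variable
for col in LABELS:
    df[col] = df[col].astype("category")

# Number of distinct classes present in each label column
num_unique_labels = df[LABELS].nunique()
print(num_unique_labels)
```

Categorical dtypes store each distinct label once and make the set of possible classes explicit, which is convenient when the labels are later dummy-encoded for a classifier.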
Introducing the challenge
50 xp
What category of problem is this?
50 xp
What is the goal of the algorithm?
50 xp
Exploring the data
50 xp
Loading the data
50 xp
Summarizing the data
100 xp
Looking at the datatypes
50 xp
Exploring datatypes in pandas
50 xp
Encode the labels as categorical variables
100 xp
Counting unique labels
100 xp
How do we measure success?
50 xp
Penalizing highly confident wrong answers
50 xp
Computing log loss with NumPy
100 xp
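Log loss is what makes "penalizing highly confident wrong answers" concrete: for a true label y in {0, 1} and predicted probability p, each row contributes -(y log p + (1 - y) log(1 - p)), so a prediction near 0 for a true 1 costs far more than an unsure 0.5. Below is a minimal binary sketch in NumPy; the function name `compute_log_loss` and the clipping constant `eps=1e-14` are assumptions for illustration, and the competition's own scoring may apply the metric across several label columns.

```python
import numpy as np

def compute_log_loss(predicted, actual, eps=1e-14):
    """Binary log loss: -mean(y * log(p) + (1 - y) * log(1 - p)).

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    predicted = np.clip(np.asarray(predicted, dtype=float), eps, 1 - eps)
    actual = np.asarray(actual, dtype=float)
    return -np.mean(actual * np.log(predicted) + (1 - actual) * np.log(1 - predicted))

actual_labels = np.array([1, 1, 0, 0])
confident_right = np.array([0.95, 0.95, 0.05, 0.05])
unsure = np.array([0.5, 0.5, 0.5, 0.5])
confident_wrong = np.array([0.05, 0.05, 0.95, 0.95])

for name, preds in [("confident, right", confident_right),
                    ("unsure", unsure),
                    ("confident, wrong", confident_wrong)]:
    print(f"{name}: {compute_log_loss(preds, actual_labels):.4f}")
# confident, right: 0.0513
# unsure: 0.6931
# confident, wrong: 2.9957
```

Being confidently wrong costs roughly four times as much as hedging at 0.5, which is why well-calibrated probabilities matter more than hard class predictions under this metric.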
Chapter 2: Creating a simple first model
In this chapter, you'll build a first-pass model, training it on the numeric data only. Spoiler alert: throwing out all of the text data hurts performance! Along the way, you'll learn how to format your predictions. Then you'll be introduced to natural language processing (NLP) so you can start working with the large amount of text in the data.
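A numeric-only baseline can be sketched as: dummy-encode the label, split the rows, fit one binary classifier per dummy column, and write the predicted probabilities out in a tabular submission format. The data here is randomly generated and hypothetical, and the sketch uses scikit-learn's plain `train_test_split`; the course may use its own splitting helper for multi-label targets. `OneVsRestClassifier(LogisticRegression())` is one reasonable first-pass choice, not necessarily the course's exact model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Hypothetical numeric-only data: two numeric features and one label column
n = 200
X = pd.DataFrame({
    "FTE": rng.uniform(0, 1, n),
    "Total": rng.normal(20.0, 5.0, n),  # assumed to be in thousands
})
labels = pd.Series(rng.choice(["Instruction", "Transportation", "Food Services"], n),
                   name="Function")

# Dummy-encode the label so each class becomes its own binary target column
y = pd.get_dummies(labels, prefix="Function").astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One binary logistic regression per dummy label column
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Predicted probabilities, one column per dummy label, indexed like the input rows
probs = pd.DataFrame(clf.predict_proba(X_test), columns=y.columns, index=X_test.index)

# With no path argument, to_csv returns the CSV text; pass a filename to write to disk
submission_csv = probs.to_csv()
print(probs.head())
```

Keeping the dummy-column names as the CSV header and the row IDs as the index is what lets predictions be matched back to each budget line item.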
It's time to build a model
50 xp
Setting up a train-test split in scikit-learn
100 xp
Training a model
100 xp
Making predictions
50 xp
Use your model to predict values on holdout data
100 xp
Writing out your results to a csv for submission
100 xp
A very brief introduction to NLP
50 xp
Tokenizing text
50 xp
Testing your NLP credentials with n-grams
50 xp
Representing text numerically
50 xp
Creating a bag-of-words in scikit-learn
100 xp
Combining text columns for tokenization
100 xp
What's in a token?
100 xp
Chapter 3: Improving your model
Here, you'll improve on your benchmark model using pipelines. Because the budget consists of both text and numeric data, you'll learn how to build pipelines that process multiple types of data. You'll also see how the flexibility of the pipeline workflow makes testing different approaches efficient, even in complicated problems like this one!
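A pipeline that handles both data types can route each column group through its own branch: `FunctionTransformer` selects the columns, each branch preprocesses them, and `FeatureUnion` stacks the results side by side before the classifier. The toy data, the column names `Total` and `Text`, and the helper functions `select_numeric` and `select_text` are hypothetical; `SimpleImputer` is used here for missing numeric values.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data: one numeric column with gaps and one text column
df = pd.DataFrame({
    "Total": [100.0, np.nan, 250.0, 80.0, np.nan, 300.0, 120.0, 90.0],
    "Text": ["teacher salary", "bus fuel", "teacher benefits", "bus repair",
             "teacher salary", "bus fuel", "teacher benefits", "bus driver"],
})
y = pd.get_dummies(pd.Series(["Instruction", "Transport"] * 4)).astype(int)

NUMERIC_COLUMNS = ["Total"]

def select_numeric(X):
    # Keep only the numeric columns, as a DataFrame
    return X[NUMERIC_COLUMNS]

def select_text(X):
    # Return a 1-D Series of strings, which CountVectorizer expects
    return X["Text"]

pl = Pipeline([
    ("union", FeatureUnion([
        ("numeric", Pipeline([
            ("selector", FunctionTransformer(select_numeric)),
            ("imputer", SimpleImputer()),
        ])),
        ("text", Pipeline([
            ("selector", FunctionTransformer(select_text)),
            ("vectorizer", CountVectorizer()),
        ])),
    ])),
    ("clf", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])

pl.fit(df, y)
probs = pl.predict_proba(df)
print(probs.shape)  # (8, 2)
```

Named module-level functions are used instead of lambdas so the fitted pipeline can be pickled. scikit-learn's `ColumnTransformer` is a newer alternative for the same column routing.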
Pipelines, feature & text preprocessing
50 xp
Instantiate pipeline
100 xp
Preprocessing numeric features
100 xp
Text features and feature unions
50 xp
Preprocessing text features
100 xp
Multiple types of processing: FunctionTransformer
100 xp
Multiple types of processing: FeatureUnion
100 xp
Choosing a classification model
50 xp
Using FunctionTransformer on the main dataset
100 xp
Add a model to the pipeline
100 xp
Try a different class of model
100 xp
Can you adjust the model or parameters to improve accuracy?
100 xp
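Trying a different class of model, or different parameters, only requires replacing the final named step of a pipeline while the preprocessing stays fixed. This sketch uses hypothetical random data with a binary target; the step name `clf` and the candidate models and settings are assumptions chosen for illustration, not the course's recommended configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)  # hypothetical nonlinear target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pl = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

candidates = {
    "logistic, C=1.0": LogisticRegression(C=1.0),
    "logistic, C=0.01": LogisticRegression(C=0.01),
    "random forest, 50 trees": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in candidates.items():
    pl.set_params(clf=model)  # swap only the final step; scaling is unchanged
    pl.fit(X_train, y_train)
    scores[name] = pl.score(X_test, y_test)  # accuracy on held-out rows
    print(f"{name}: accuracy {scores[name]:.3f}")
```

Because every candidate sees identical preprocessing and the same split, differences in score reflect the model choice rather than changes elsewhere in the workflow.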
Chapter 4: Learning from the experts
In this chapter, you will learn the tricks used by the competition winner, and implement them yourself using scikit-learn. Enjoy!
Learning from the expert: processing
50 xp
How many tokens?
50 xp
Deciding what's a word
100 xp
N-gram range in scikit-learn
100 xp
Learning from the expert: a stats trick
50 xp
Which models of the data include interaction terms?
50 xp
Implement interaction modeling in scikit-learn
100 xp
Learning from the expert: the winning model
50 xp
Why is hashing a useful trick?
50 xp
Implementing the hashing trick in scikit-learn
100 xp
Build the winning model
100 xp
What tactics got the winner the best score?
50 xp
Next steps and the social impact of your work
50 xp

Contributors

Hugo Bowne-Anderson

Yashas Roy

Casey Fitzpatrick

Prerequisites

Supervised Learning with scikit-learn

Instructor

Peter Bull

Co-founder of DrivenData
