Case Study: School Budgeting with Machine Learning in Python

Learn how to build a model to automatically classify items in a school budget.

4 hours · 15 videos · 51 exercises · 59,274 learners · Statement of Accomplishment

Course Description

Data science isn't just for predicting ad clicks; it's also useful for social impact! This course is a case study from a machine learning competition on DrivenData. You'll explore a problem related to school district budgeting. Building a model that automatically classifies items in a school's budget makes it easier and faster for schools to compare their spending with other schools. In this course, you'll begin by building a baseline model that is a simple, first-pass approach. In particular, you'll do some natural language processing to prepare the budgets for modeling. Next, you'll have the opportunity to try your own techniques and see how they compare to those of participants in the competition. Finally, you'll see how the winner was able to combine a number of expert techniques to build the most accurate model.

1
Exploring the raw data
Free
In this chapter, you'll be introduced to the problem you'll be solving in this course. How do you accurately classify line-items in a school budget based on what that money is being used for? You will explore the raw text and numeric values in the dataset, both quantitatively and visually. And you'll learn how to measure success when trying to predict class labels for each row of the dataset.
Introducing the challenge
50 xp
What category of problem is this?
50 xp
What is the goal of the algorithm?
50 xp
Exploring the data
50 xp
Loading the data
50 xp
Summarizing the data
100 xp
Looking at the datatypes
50 xp
Exploring datatypes in pandas
50 xp
Encode the labels as categorical variables
100 xp
Counting unique labels
100 xp
How do we measure success?
50 xp
Penalizing highly confident wrong answers
50 xp
Computing log loss with NumPy
100 xp
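The log loss metric covered in this chapter can be sketched in a few lines of NumPy. This is a minimal illustration rather than the course's own exercise code: the function name `compute_log_loss`, the clipping threshold `eps`, and the sample arrays are all assumptions made for the example.

```python
import numpy as np


def compute_log_loss(predicted, actual, eps=1e-14):
    """Binary log loss between predicted probabilities and 0/1 labels."""
    # Clip probabilities so log(0) never occurs for fully confident predictions.
    predicted = np.clip(predicted, eps, 1 - eps)
    return -np.mean(
        actual * np.log(predicted) + (1 - actual) * np.log(1 - predicted)
    )


# Made-up labels and three made-up sets of predictions.
actual = np.array([1, 0, 1, 0])
confident_right = np.array([0.95, 0.05, 0.95, 0.05])
unsure = np.array([0.50, 0.50, 0.50, 0.50])
confident_wrong = np.array([0.05, 0.95, 0.05, 0.95])

loss_right = compute_log_loss(confident_right, actual)
loss_unsure = compute_log_loss(unsure, actual)
loss_wrong = compute_log_loss(confident_wrong, actual)

print(loss_right, loss_unsure, loss_wrong)
```

Lower is better. The confidently wrong predictions score far worse than the unsure ones, which is the penalty on highly confident wrong answers that the lesson title refers to.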
2
Creating a simple first model
In this chapter, you'll build a first-pass model. You'll use numeric data only to train the model. Spoiler alert: throwing out all of the text data is bad for performance! But you'll learn how to format your predictions. Then, you'll be introduced to natural language processing (NLP) in order to start working with the large amounts of text in the data.
It's time to build a model
50 xp
Setting up a train-test split in scikit-learn
100 xp
Training a model
100 xp
Making predictions
50 xp
Use your model to predict values on holdout data
100 xp
Writing out your results to a csv for submission
100 xp
A very brief introduction to NLP
50 xp
Tokenizing text
50 xp
Testing your NLP credentials with n-grams
50 xp
Representing text numerically
50 xp
Creating a bag-of-words in scikit-learn
100 xp
Combining text columns for tokenization
100 xp
What's in a token?
100 xp
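The bag-of-words lessons above can be illustrated with scikit-learn's `CountVectorizer`. This is a rough sketch under stated assumptions: the column names and budget rows below are invented, and the vectorizer's default tokenization is used, which may differ from the token pattern chosen in the course.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical budget rows; column names and values are made up.
df = pd.DataFrame({
    "Position_Type": ["Teacher", "Custodian", None],
    "Job_Description": [
        "Grade 3 teacher",
        "Night shift custodian",
        "Substitute teacher",
    ],
})

# Combine the text columns into one string per row, treating missing
# values as empty strings so they do not break the join.
text = df.fillna("").apply(lambda row: " ".join(row), axis=1)

# Default settings: lowercase the text and keep tokens of 2+ word characters.
vec = CountVectorizer()
counts = vec.fit_transform(text)

# vocabulary_ maps each token to its column index in the count matrix.
print(sorted(vec.vocabulary_))
print(counts.toarray())
```

Each row of `counts` records how many times each token appears in that budget line, which gives the model a numeric representation of the text.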
3
Improving your model
Here, you'll improve on your benchmark model using pipelines. Because the budget consists of both text and numeric data, you'll learn how to build pipelines that process multiple types of data. You'll also explore how the flexibility of the pipeline workflow makes testing different approaches efficient, even in complicated problems like this one!
Pipelines, feature & text preprocessing
50 xp
Instantiate pipeline
100 xp
Preprocessing numeric features
100 xp
Text features and feature unions
50 xp
Preprocessing text features
100 xp
Multiple types of processing: FunctionTransformer
100 xp
Multiple types of processing: FeatureUnion
100 xp
Choosing a classification model
50 xp
Using FunctionTransformer on the main dataset
100 xp
Add a model to the pipeline
100 xp
Try a different class of model
100 xp
Can you adjust the model or parameters to improve accuracy?
100 xp
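The idea of routing text and numeric columns through separate preprocessing steps can be sketched with `FunctionTransformer`, `FeatureUnion`, and `Pipeline`. The toy data, column names, and choice of `LogisticRegression` below are assumptions for illustration and are not taken from the course exercises.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Made-up budget lines: one text column, one numeric column, one label.
df = pd.DataFrame({
    "text": ["teacher salary", "bus fuel", "teacher aide",
             "bus repair", "teacher bonus", "bus driver"],
    "amount": [50.0, 1.2, np.nan, 3.4, 2.0, 30.0],  # thousands, invented
    "label": ["instruction", "transport", "instruction",
              "transport", "instruction", "transport"],
})
X = df[["text", "amount"]]
y = df["label"]

# Each selector pulls out the columns that one branch should process.
get_text = FunctionTransformer(lambda d: d["text"], validate=False)
get_numeric = FunctionTransformer(lambda d: d[["amount"]], validate=False)

pipeline = Pipeline([
    ("union", FeatureUnion([
        # Numeric branch: select the column, then fill missing values.
        ("numeric", Pipeline([
            ("select", get_numeric),
            ("impute", SimpleImputer()),
        ])),
        # Text branch: select the column, then build a bag-of-words.
        ("text", Pipeline([
            ("select", get_text),
            ("vectorize", CountVectorizer()),
        ])),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X, y)
preds = pipeline.predict(X)
print(preds)
```

Because the classifier is just the last named step, trying a different class of model means replacing the `"clf"` entry while the preprocessing stays untouched.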
4
Learning from the experts
In this chapter, you will learn the tricks used by the competition winner, and implement them yourself using scikit-learn. Enjoy!
Learning from the expert: processing
50 xp
How many tokens?
50 xp
Deciding what's a word
100 xp
N-gram range in scikit-learn
100 xp
Learning from the expert: a stats trick
50 xp
Which models of the data include interaction terms?
50 xp
Implement interaction modeling in scikit-learn
100 xp
Learning from the expert: the winning model
50 xp
Why is hashing a useful trick?
50 xp
Implementing the hashing trick in scikit-learn
100 xp
Build the winning model
100 xp
What tactics got the winner the best score?
50 xp
Next steps and the social impact of your work
50 xp
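Three of the ideas named in this chapter (n-gram ranges, the hashing trick, and interaction terms) can be combined in a short scikit-learn sketch. This is an illustration under assumptions, not the winner's actual code: the sample strings and the tiny `n_features=16` are invented, and `PolynomialFeatures` stands in for whatever interaction transformer the course uses.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.preprocessing import PolynomialFeatures

# Made-up budget line descriptions.
texts = [
    "petty cash office supplies",
    "substitute teacher salary",
    "school bus fuel",
]

# Hash unigrams and bigrams into a fixed number of columns. No vocabulary
# is stored, so memory use stays bounded however many distinct tokens appear.
hasher = HashingVectorizer(
    n_features=16,         # unrealistically small, to keep the demo readable
    ngram_range=(1, 2),    # single tokens plus adjacent token pairs
    norm=None,             # keep raw counts
    alternate_sign=False,  # colliding tokens add up instead of cancelling
)
hashed = hasher.transform(texts)

# Pairwise interaction terms: one new column per product of two features.
# The matrix is densified only because this toy example is tiny.
interact = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
interactions = interact.fit_transform(hashed.toarray())

print(hashed.shape, interactions.shape)
```

With 16 hashed columns, the interaction step yields 16 original columns plus 120 pairwise products. That growth is one reason a fixed-size hashed representation is useful before adding interactions.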

Collaborators

Hugo Bowne-Anderson

Yashas Roy

Casey Fitzpatrick

Prerequisites

Supervised Learning with scikit-learn

Peter Bull

Co-founder of DrivenData
