Feature Engineering for NLP in Python

4.1
11 reviews
Advanced

Learn techniques to extract useful information from text and process it into a format suitable for machine learning.

4 Hours, 15 Videos, 52 Exercises
23,135 Learners, Statement of Accomplishment

Course Description

In this course, you will learn techniques for extracting useful information from text and processing it into a format suitable for machine learning (ML) models. More specifically, you will learn about part-of-speech (POS) tagging, named entity recognition, readability scores, the n-gram and tf-idf models, and how to implement them using scikit-learn and spaCy. You will also learn to compute how similar two documents are to each other. In the process, you will predict the sentiment of movie reviews and build movie and TED Talk recommenders. After the course, you will be able to engineer critical features out of any text and solve some of the most challenging problems in data science!
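
As a rough illustration of the tf-idf model and document similarity mentioned above, the sketch below weights terms and compares documents using only the Python standard library. The course itself uses scikit-learn and spaCy; the helper names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the unsmoothed `log(N / df)` weighting, the choice of cosine similarity as the similarity measure, and the sample sentences are all assumptions made for this example, not course material.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Uses raw term counts for tf and log(N / df) for idf. Libraries such as
    scikit-learn apply smoothing and normalization, so their numbers differ.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, made up for this sketch.
docs = [
    "the lion is the king of the jungle",
    "lions rule the jungle",
    "stock markets fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
```

Here `sim_related` is greater than zero because the first two sentences share the terms "the" and "jungle", while `sim_unrelated` is exactly zero because the first and third sentences share no terms. Note that "lion" and "lions" are treated as different terms; reducing words to a common base form (lemmatization) is one way to address that.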

In the following Tracks

Machine Learning Scientist with Python

Natural Language Processing in Python

  1. Basic features and readability scores

    Free

    Learn to compute basic features such as number of words, number of characters, average word length and number of special characters (such as Twitter hashtags and mentions). You will also learn to compute readability scores and determine the amount of education required to comprehend a piece of text.

    Introduction to NLP feature engineering (50 xp)
    Data format for ML algorithms (50 xp)
    One-hot encoding (100 xp)
    Basic feature extraction (50 xp)
    Character count of Russian tweets (100 xp)
    Word count of TED talks (100 xp)
    Hashtags and mentions in Russian tweets (100 xp)
    Readability tests (50 xp)
    Readability of 'The Myth of Sisyphus' (100 xp)
    Readability of various publications (100 xp)
  2. Text preprocessing, POS tagging and NER

    In this chapter, you will learn about tokenization and lemmatization. You will then learn how to perform text cleaning, part-of-speech tagging, and named entity recognition using the spaCy library. After mastering these concepts, you will make the Gettysburg Address machine-friendly, analyze noun usage in fake news, and identify people mentioned in a TechCrunch article.
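
The basic features listed for the first chapter (character count, word count, average word length, and counts of hashtags and mentions) can be sketched in a few lines of plain Python. This is a minimal sketch, not the course's own code: the function name `basic_features`, the whitespace-based word splitting, the rule that a hashtag or mention is any token starting with `#` or `@`, and the sample tweet are all assumptions for illustration.

```python
def basic_features(text):
    """Return simple surface-level features of a text as a dict."""
    words = text.split()  # naive whitespace tokenization (assumption)
    num_words = len(words)
    return {
        "num_chars": len(text),  # counts spaces and punctuation too
        "num_words": num_words,
        "avg_word_length": (
            sum(len(w) for w in words) / num_words if num_words else 0.0
        ),
        # A token counts as a hashtag or mention only if something follows
        # the leading symbol.
        "num_hashtags": sum(
            1 for w in words if w.startswith("#") and len(w) > 1
        ),
        "num_mentions": sum(
            1 for w in words if w.startswith("@") and len(w) > 1
        ),
    }


# Hypothetical tweet, made up for this sketch.
tweet = "Loving the #NLP course by @DataCamp so far #python"
features = basic_features(tweet)
```

Each value in `features` could become one numeric column in a table of training data. Splitting on whitespace keeps the sketch short, but it leaves punctuation attached to words, so a proper tokenizer would give somewhat different counts.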

Datasets

Russian Troll Tweets
Movie Overviews and Taglines
Preprocessed Movie Reviews
TED Talk Transcripts
Real and Fake News Headlines

Collaborators

Hillary Green-Lerman
Adrián Soto
Rounak Banik

Data Scientist at Fractal Analytics

Don’t just take our word for it

4.1 from 11 reviews
5 stars: 55%
4 stars: 27%
3 stars: 9%
2 stars: 0%
1 star: 9%
  • Pierre-Etienne T.
    11 months

    Synthetic, exciting and relevant

  • Anastasia M.
    11 months

    This is a great course that takes you from zero to hero. All concepts are clearly explained and repeatedly trained during the exercises. It was fun to accomplish this course! I'm looking forward to use this course in my DataCamp classroom!

  • Juan L.
    over 1 year

    I think it's one of the courses with the best instructors I've seen in Datacamp. Everything was super clear and explained with details for someone that lacked some background.

  • Son T.
    over 1 year

    Not so much deep knowledge in feature Engineering for NLP but ok

  • Dierk P.
    over 1 year

    Well done