
Building Your Data Science Portfolio with DataCamp Workspace (Part 3): Add a Machine Learning Workspace

Learn how to leverage DataCamp Workspace to produce a machine-learning project to add to your data science portfolio. We cover how to get started, how to structure your work, and common mistakes to avoid.
May 2023  · 8 min read

This is the third and final article in our series on how to use DataCamp Workspace to create a data science portfolio. In Part 1, we covered the basics of a portfolio with Workspace. In Part 2, we dove into creating an analytics project. Finally, in this article, we will develop a machine learning project.

Why? A machine learning project can be a great way to show off your ability to process data, select and fit appropriate models, and solve practical problems.

What is a Machine Learning Project and How Do I Get Started?

A portfolio machine learning project demonstrates that you can build an end-to-end machine learning solution, from conceptualizing the problem to evaluating a model’s performance and interpreting the results.

The first decision you must make is what type of project you will create. This should be based on several factors, including:

What types of skills will I be expected to perform?

If you are applying for a marketing position, perhaps demonstrating knowledge of segmentation techniques is a good choice. On the other hand, if you aim for a position in the financial sector, predicting a company's or portfolio's future earnings might be more appropriate. Try to anticipate the types of tasks your desired role would perform regularly.

What kind of data will I be working with?

You should also tailor the data to the types of roles you are applying for. For example, don’t use iris flower data to demonstrate your classification skills when applying for a sales analyst role. Try to get your hands on relevant data!

If you want inspiration, be sure to check out this article on machine learning projects for all levels. You can also check out our curated datasets on Workspace which include prompts that can get you started!

Key Sections of a Machine Learning Project

As with an analytics project, sticking to a structure makes your work easier to follow while maintaining focus.

While the structure of a machine learning project for a portfolio might not be identical to the one you would use in production, we’ve outlined a good format for a portfolio project in Workspace below. See our Developing Machine Learning Models for Production course for more on machine learning models in production.

Start with a section that motivates the purpose of your project. What problem are you trying to solve? How might stakeholders be able to use your results? Readers should come away from this section with a clear understanding of how your project provides value.

The data you select for a machine learning project is rarely ready for modeling. Therefore, you must be transparent about how you acquired your data and the steps you took to clean it.

It is important to perform a thorough exploratory analysis of your data prior to modeling. This includes (but is not limited to):

  • Performing data validation
  • Reviewing missing data
  • Visualizing distributions and frequencies
  • Identifying outliers
  • Analyzing relationships in the data

A strong exploratory analysis will inform decisions such as pre-processing steps and the models that you select. Be sure to check out Exploratory Data Analysis in Python or Exploratory Data Analysis in R for a thorough review of exploratory analysis techniques! Alternatively, you can also check out this Python tutorial on the topic.
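As a rough illustration of the checks listed above, here is a minimal sketch using pandas. The DataFrame and its column names (`age`, `plan`, `monthly_spend`) are hypothetical stand-ins for your own data, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38, 29, 120],
    "plan": ["basic", "pro", "pro", "basic", "basic", None, "pro", "basic"],
    "monthly_spend": [20.0, 55.0, 60.0, 18.0, 22.0, 58.0, 61.0, 19.0],
})

# Data validation: inspect column types and summary statistics.
print(df.dtypes)
print(df.describe())

# Missing data: count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Frequencies: count each category, including missing values.
plan_counts = df["plan"].value_counts(dropna=False)
print(plan_counts)

# Outliers: flag ages more than 1.5 * IQR outside the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(outliers)

# Relationships: correlation between the numeric columns.
print(df[["age", "monthly_spend"]].corr())
```

In a real project, each of these outputs would typically be followed by a sentence or two on what you found and how it shapes your next steps.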

Note: Workspace Chart cells can be a handy way to create clean and interactive visualizations of your data and save you precious coding time.

Horizontal bar chart created in seconds from a DataFrame

4. Feature Engineering and Pre-Processing

Data rarely comes in a format that is immediately ready for a machine learning model. Feature engineering and pre-processing can help your model run faster, reduce overfitting, and improve overall performance.

Things you might include here (but are not limited to) are:

  • Data transformations, such as scaling your numeric data
  • Encoding categorical variables (see our Workspace template here)
  • Imputing missing data
  • Dimensionality reduction
  • Creating new features
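
Several of these steps can be combined into a single pre-processing object. The sketch below uses scikit-learn's `ColumnTransformer` and `Pipeline`. The data, the column names, and the `spend_to_age_ratio` feature are hypothetical, and dimensionality reduction is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with one missing numeric value.
X = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 51.0],
    "monthly_spend": [20.0, 55.0, 60.0, 18.0],
    "plan": ["basic", "pro", "pro", "basic"],
})

# Creating a new feature (an illustrative ratio, not a recommendation).
X["spend_to_age_ratio"] = X["monthly_spend"] / X["age"]

numeric_cols = ["age", "monthly_spend", "spend_to_age_ratio"]
categorical_cols = ["plan"]

# Numeric columns: impute missing values with the median, then scale.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: impute with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense array, which is easier to inspect.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,
)

X_prepared = preprocessor.fit_transform(X)
print(X_prepared.shape)
```

Wrapping the steps in one object means the same transformations learned on the training data can later be applied unchanged to new data.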

Get caught up on essential techniques by taking our Feature Engineering for Machine Learning in Python or Feature Engineering in R!

The modeling section should house all your modeling steps, including fitting a model, making predictions, and evaluating the performance. It also includes additional stages, such as tuning your model and comparing different models.
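
To make these stages concrete, here is a minimal sketch of a classification workflow in scikit-learn. The data is synthetic (generated with `make_classification`), and the two candidate models and the small `max_depth` grid are arbitrary choices for illustration rather than a recommended setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic data standing in for your prepared features and target.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Compare candidate models with cross-validation on the training set.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
print(cv_scores)

# Tune one model with a small grid search.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 5, None]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the test set untouched until the final evaluation gives readers a more honest picture of how the model is likely to perform on new data.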

As with any data science topic, DataCamp has a wealth of resources to get you caught up on all the techniques you will need:

Of course, this is only a small selection of the machine learning content available on DataCamp. For a more comprehensive overview, be sure to check out our full catalog of machine learning and AI courses. Alternatively, you can also check out our Workspace templates that have code ready for regression and classification workflows.

Conclude by translating your results into action. What did you accomplish with your project? Are you able to accurately predict who will churn from your subscription service? What is your margin of error when predicting house prices?
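
For the house price question, one possible way to report a margin of error is to turn the mean absolute error into a plain-language sentence. The prices below are made up for illustration, and mean absolute error is only one of several reasonable metrics.

```python
import numpy as np

# Hypothetical actual and predicted house prices, in dollars.
actual = np.array([250_000, 310_000, 180_000, 420_000])
predicted = np.array([240_000, 325_000, 190_000, 400_000])

# Mean absolute error: the average size of a prediction miss.
mae = np.mean(np.abs(actual - predicted))

# Express the error relative to the average price for context.
relative_error = mae / actual.mean()

summary = (
    f"On average, predictions are off by about ${mae:,.0f}, "
    f"or roughly {relative_error:.1%} of the average sale price."
)
print(summary)
```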

You should also think about future steps for the project. What could you do to build upon the work? This is your chance to show that you are not only an excellent coder but can also help others extract value from your work.

Tip: Take full advantage of Workspace text cells to ensure your work is nicely formatted and contains appropriate headers. Not only does this make your work look more professional, but it will also auto-generate a table of contents that users can use to navigate the project.

To quickly get started with this structure, you can use this Python or R workspace template!

An extract of a table of contents generated from workspace headers!

Common Mistakes to Avoid

Getting too technical

Ultimately, your work should solve a problem. You may be incredibly gifted on a technical level, but it is wasted talent if others can’t extract value from your work.

Make sure that you can translate the technical outcomes for a non-technical audience. For instance, what does the error of your predictive model mean when it goes into practice? How do the groups in your customer segmentation differ, and how can the marketing team leverage this analysis?
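
For the segmentation question, a small summary table per group is often easier for a marketing team to read than raw cluster output. The sketch below assumes you already have a `segment` label for each customer. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical customers with segment labels from a clustering step.
customers = pd.DataFrame({
    "segment": [0, 0, 1, 1, 1, 2],
    "monthly_spend": [20.0, 30.0, 80.0, 100.0, 90.0, 50.0],
    "visits_per_month": [1, 3, 8, 10, 9, 4],
})

# Profile each segment: its size and average behavior.
profile = customers.groupby("segment").agg(
    customers=("monthly_spend", "size"),
    avg_spend=("monthly_spend", "mean"),
    avg_visits=("visits_per_month", "mean"),
)
print(profile)
```

A table like this lets you describe each group in a sentence, for example as frequent high spenders versus occasional low spenders.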

A helpful trick to reduce your written content's complexity is using the AI Generate feature in Workspace text cells to provide simple explanations of technical terms.

Getting inspiration for explanations

Spreading yourself too thin

It can be tempting to use the project to show off every modeling technique you know. However, you often risk overwhelming the reader while also demonstrating a lack of editing skills. It is great if you tried multiple techniques and compared the results, but oftentimes these are best saved for an appendix.

Instead, try to focus on one or two modeling techniques, analyze and evaluate them thoroughly, and shift additional work to the appendix with a reference. This is far more digestible for readers and also shows a commitment to quality over quantity.

Next Steps

If you need to brush up on some machine learning techniques, our extensive course catalog on the topic should get you caught up quickly. Otherwise, we recommend you jump straight into Workspace and get coding! As mentioned earlier in the article, you can create a workspace directly from a dataset.

Alternatively, you can also create an empty workspace from scratch. Get started now in either Python or R!
