How to Become a Data Scientist

Find out if data science is for you, which courses to start with, how to build a portfolio, and how to land your first job.
Feb 2022  · 10 min read

Introduction

Data science is everywhere right now. One after another, companies around the world are turning to data science to solve a wide range of problems. As a result, data scientists are in a strong position when it comes to employment and pay.

Naturally, a lot of people are becoming interested in learning how to become a data scientist. If you are reading this article, you may be one of those people. Here, we will show you an effective and attractive way to learn what you need to get started in the data science world.

Data Scientist Learning Path

The path to becoming a data scientist is long. You need to master a wide set of skills such as coding, math, and communication. In this article, we propose a checklist for those who are considering data science as a good choice for a professional career. The checklist is for beginners, yet if you are not quite a beginner you may still gain some useful insights.

The main requirement to become a good data scientist is to be passionate about what you do. Learning everything you need and completing projects will take a lot of time. Hence, the first step is to make sure you enjoy working with data.

1: Find out whether data science is for you

We strongly recommend finding out where data scientists work and what their average workday looks like. Reading a few articles about the tasks a data scientist does is a good start. Logistics aside, knowing whether data science is for you requires getting your hands on data and analyzing it.

You might think that you need advanced math or programming to analyze data, yet a simple, intuitive analysis of the available data that leads to a conclusion can be enough.

You could consider reading some articles like this one on gun deaths in the United States or this one on online communities supporting Donald Trump. The data used in these articles is available to download and you can open it with a tool like Excel and analyze it. For example, you can question whether the article seems to be right and why. It is also a great exercise to think about other questions that the article doesn’t answer. If you think answering those questions is an interesting thing to do, you will likely enjoy data science.

Data science is used in a wide variety of industries. If those topics don't motivate you, think about a field you are passionate about, such as sports, the stock market, or health care. Do a little research on how data science is being applied in that field, read some articles, and analyze the data behind them. This will give you an idea of the work data scientists carry out and whether it is a good fit for you. If you think it is, then it's time to take the second step: learning to code.

2: Learn to code

Humans cannot perform by hand all the calculations required to process large amounts of data. We have computers to do that for us, but we need to know how to make computers do what we want them to do. For this reason, every data scientist needs to know at least one programming language.

The two main languages for data science are Python and R. Python is a general-purpose language that serves a multitude of purposes, whilst R was designed primarily for statistical computing and data analysis.

There are a lot of resources out there to help you learn any programming language. For example, you can take DataCamp's free introductory Python or R courses. This way, you'll set up your working environment and get used to the language syntax and the specific tools used to do data science in that language.

Don't aim to be completely proficient in the language before advancing to the next step. Coding skills come with practice. Just try to learn the basics and understand the different elements of the language. Becoming an expert takes time, and that process starts with the next step.

3: Begin making projects

By now, you will likely have read some interesting articles and have a bunch of questions that you think are interesting too. You may also have basic knowledge about a language for data science and have written some code. It’s time to conduct your first data science projects.

If you feel you are not ready to start a project from scratch, I strongly recommend DataCamp’s guided projects. This way you won’t be alone and you can start to gain some insights into data science projects. After that, you can start other projects that you are passionate about. It doesn’t matter if you are not doing anything new, or if you’re not convinced your conclusions or outcomes are particularly useful. Thinking about how good your solutions are is a great exercise. If you can identify the weaknesses of your methodologies and algorithms, you are on the right path to becoming an awesome data scientist.

No matter what projects you decide to do, you will always need data. Collecting the data is not always an easy task. You can look at Google Dataset Search, Kaggle, and the UCI Machine Learning Repository, among other places.
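Once you have a dataset, a first pass over it can be very small. The sketch below is only an illustration of that idea in Python, using the standard library: the rows in `SAMPLE_CSV` are made up and stand in for a file you would download, and the column names (`year`, `category`, `count`) and the helper functions `totals_by` and `shares` are hypothetical names chosen for this example.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for a downloaded CSV file (hypothetical data).
SAMPLE_CSV = """year,category,count
2019,A,10
2019,B,30
2020,A,20
2020,B,40
"""


def totals_by(rows, key, value):
    """Sum the numeric `value` column for each distinct entry in the `key` column."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += int(row[value])
    return dict(totals)


def shares(totals):
    """Convert per-group totals into fractions of the overall sum."""
    overall = sum(totals.values())
    return {group: total / overall for group, total in totals.items()}


# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

by_category = totals_by(rows, "category", "count")
by_year = totals_by(rows, "year", "count")

print(by_category)
print(shares(by_category))
print(by_year)
```

To try this on a real dataset, you could replace `io.StringIO(SAMPLE_CSV)` with a file opened via `open(path, newline="")` and adjust the column names to match your data. Asking which groups dominate, and whether the totals match what an article claims, is the kind of question the earlier steps encourage.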

Always approach your projects as something that will be published and needs to be understood by many people. These projects will be your main calling card when you begin to look for a job or an internship. Take your time, add some documentation, and keep the code as clean as possible.

After doing a project, it’s time to let others know about it, get some feedback, and get more visibility.

4: Publish your projects and interact with others

Publishing your progress is not just a great way to become more visible to potential employers; it is also one of the best ways to learn and build expertise. Both the feedback you get from other people and the effort you put into making others understand your work will help you grow. Bear in mind that an important part of a data scientist's job is communicating results to others.

Uploading your projects to GitHub and building a portfolio with them is a great way to practice communicating your findings. Sharing your progress on social media platforms like LinkedIn or Twitter is also recommended. If you want to go even further, you could start your own blog to document your progress and share what you have learned. There are other places to share with and learn from others, such as Reddit, Quora, and Kaggle's discussions.

This step makes the difference between slow and fast progress in data science. There is a lot to learn, but learning it alone is a very difficult task.

At this point, you should have some basic knowledge of how the data science pipeline works and why. You will likely have picked up some basic statistics and math along the way, but that knowledge is probably intuitive rather than formal. Having a solid theoretical background is very important and will let you face the most diverse and difficult scenarios. With the next step, you'll complete the checklist to become a data scientist.

5: Gradually gain a theoretical understanding

This is the last step of the data scientist learning path. Trying to understand the theory at the beginning would be a mistake. Thanks to the tools developed in recent years, doing data science is easier than ever, and you don't need to know the theory behind the algorithms to implement a decent solution. Moreover, math, statistics, probability, and machine learning theory can be very hard and take time to understand; you may become discouraged if you don't have a solid, pre-existing math background.

That’s why you should begin building things, focusing on learning to code and solving problems without paying too much attention to why and how those algorithms work. You might be amazed to find just how many projects you can complete this way. Only after building some projects will you be in the perfect situation to get a deeper understanding of the theory.

However, theoretical knowledge is what will set you apart from other data science practitioners. It will allow you to solve very complex and specific problems, validate your methodologies, and make better decisions.

You will need to learn about algebra, calculus, statistics, probability, optimization, and machine learning theory, among other fields. It is advisable to stay focused on your area of interest so that you invest most of your time learning what you will really need in that area. DataCamp offers a lot of introductory materials in this regard. More advanced courses are also available, such as Andrew Ng's machine learning course on Coursera, which is free if you decide not to get the certificate. This free data science course on Udacity is also a good option for this stage.

Although these five steps will have set you up well for becoming a successful data scientist, this is not the end of your path. A data scientist should always be learning because this field is progressing at a mind-blowing pace. You should keep learning to write better code, exploring new tools and languages, building new projects, and learning yet other theoretical concepts.

Conclusion

In this article, we proposed a five-step learning path to becoming a data scientist. We strongly recommend you focus on building a portfolio of projects. In this regard, I highly recommend the DataCamp Workspace, a place where you can practice a wide variety of skills and build your portfolio through your DataCamp profile. This can even stand you in good stead when looking for jobs. Having a good theoretical understanding is very important, yet learning the theory should happen organically as you work on projects you are interested in. Keep in mind that the data scientist learning path is much longer than five steps, so keep learning and building.
