Deep Reinforcement Learning in Python
Learn and use powerful Deep Reinforcement Learning algorithms, including refinement and optimization techniques.
4 hours, 15 videos, 49 exercises
Course Description
Discover the cutting-edge techniques that empower machines to learn and interact with their environments. You will dive into the world of Deep Reinforcement Learning (DRL) and gain hands-on experience with the most powerful algorithms driving the field forward. You will use PyTorch and Gymnasium environments to build your own agents.
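To give a flavor of the tooling, here is a minimal sketch of the Gymnasium interaction loop the course builds on. The CartPole-v1 environment and the random action choice are illustrative assumptions, not the course's exact setup.

```python
import gymnasium as gym

# Create a classic control environment; CartPole-v1 is an illustrative
# choice, not necessarily the course's exact environment.
env = gym.make("CartPole-v1")
state, info = env.reset(seed=42)

for _ in range(200):
    # A trained agent would pick the action here; we sample randomly.
    action = env.action_space.sample()
    state, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        state, info = env.reset()

env.close()
```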
Master the Fundamentals of Deep Reinforcement Learning
Our journey begins with the foundations of DRL and its relationship to traditional Reinforcement Learning. From there, we swiftly move on to implementing Deep Q-Networks (DQN) in PyTorch, including advanced refinements such as Double DQN and Prioritized Experience Replay to supercharge your models. Take your skills to the next level as you explore policy-based methods. You will learn and implement essential policy-gradient techniques such as REINFORCE and Actor-Critic methods.
Use Cutting-edge Algorithms
You will encounter powerful DRL algorithms commonly used in the industry today, including Proximal Policy Optimization (PPO). You will gain practical experience with the techniques driving breakthroughs in robotics, game AI, and beyond. Finally, you will learn to optimize your models using Optuna for hyperparameter tuning. By the end of this course, you will have acquired the skills to apply these cutting-edge techniques to real-world problems and harness DRL's full potential!
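As a preview of that tuning workflow, here is a minimal Optuna sketch. The search space and the stand-in objective are illustrative assumptions rather than the course's exact setup; in practice the objective would run a DRL training loop and return its mean episode reward.

```python
import optuna

def objective(trial):
    # Hypothetical search space; in practice these would parameterize
    # a DRL training run whose mean episode reward is returned.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.999)
    # Stand-in score so the sketch runs without a full training loop.
    return -abs(lr - 1e-3) - abs(gamma - 0.99)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```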
1. Introduction to Deep Reinforcement Learning
Discover how deep reinforcement learning improves upon traditional Reinforcement Learning while studying and implementing your first deep Q-learning algorithm; a minimal PyTorch sketch follows the exercise list below.
- Introduction to deep reinforcement learning (50 xp)
- Environment and neural network setup (100 xp)
- DRL training loop (100 xp)
- Introduction to deep Q learning (50 xp)
- Deep learning and DQN (50 xp)
- The Q-Network architecture (100 xp)
- Instantiating the Q-Network (100 xp)
- The barebone DQN algorithm (50 xp)
- Barebone DQN action selection (100 xp)
- Barebone DQN loss function (100 xp)
- Training the barebone DQN (100 xp)
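To make the chapter's centerpiece concrete, here is a minimal sketch of a Q-network and the barebone DQN loss in PyTorch. The layer sizes, CartPole-sized dimensions, and single-transition update are illustrative assumptions, not the course's exact code.

```python
import torch
import torch.nn as nn

# A small Q-network: maps a state vector to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_network = QNetwork(state_dim=4, n_actions=2)  # CartPole-sized, for illustration

# Barebone DQN loss for one transition (state, action, reward, next_state):
# the TD target bootstraps from the best Q-value in the next state.
state, next_state = torch.rand(4), torch.rand(4)
action, reward, gamma = 0, 1.0, 0.99

q_value = q_network(state)[action]
with torch.no_grad():
    td_target = reward + gamma * q_network(next_state).max()
loss = nn.functional.mse_loss(q_value, td_target)
loss.backward()  # an optimizer step would follow in the training loop
```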
2. Deep Q-learning
Dive into Deep Q-learning by implementing the original DQN algorithm, featuring experience replay, epsilon-greediness, and fixed Q-targets. Beyond DQN, you will then explore two fascinating extensions that improve the performance and stability of Deep Q-learning: Double DQN and Prioritized Experience Replay. A sketch of a replay buffer with epsilon-greedy action selection follows the exercise list below.
- DQN with experience replay (50 xp)
- The double-ended queue (100 xp)
- Experience replay buffer (100 xp)
- DQN with experience replay (100 xp)
- The complete DQN algorithm (50 xp)
- Epsilon-greediness (100 xp)
- Fixed Q-targets (100 xp)
- Implementing the complete DQN algorithm (100 xp)
- Double DQN (50 xp)
- Online network and target network in DDQN (100 xp)
- Training the double DQN (100 xp)
- Prioritized experience replay (50 xp)
- Prioritized experience replay buffer (100 xp)
- Sampling from the PER buffer (100 xp)
- DQN with prioritized experience replay (100 xp)
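Here is a minimal sketch of an experience replay buffer built on a double-ended queue, plus epsilon-greedy action selection. Names and capacity are illustrative assumptions; the Double DQN target appears only as a comment.

```python
import random
from collections import deque
import torch

# Replay buffer on a deque: once capacity is reached, the oldest
# transitions are discarded automatically.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling; prioritized replay would weight by TD error.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

def select_action(q_network, state, epsilon, n_actions):
    # Epsilon-greedy: explore with probability epsilon, otherwise
    # exploit the current Q-value estimates.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_network(state).argmax().item()

# Double DQN target, sketched as pseudocode: the online network chooses
# the action, the fixed target network evaluates it:
#   best = online_net(next_state).argmax()
#   target = reward + gamma * target_net(next_state)[best]
```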
3. Introduction to Policy Gradient Methods
Learn the foundational concepts behind policy gradient methods in DRL. You will begin with the policy gradient theorem, which forms the basis for these methods. Then, you will implement the REINFORCE algorithm, a powerful approach to learning policies. The chapter then guides you through Actor-Critic methods, focusing on the Advantage Actor-Critic (A2C) algorithm, which combines the strengths of both policy-gradient and value-based methods to enhance learning efficiency and stability. A sketch of REINFORCE-style action selection follows the exercise list below.
- Introduction to policy gradient (50 xp)
- The policy network architecture (100 xp)
- Working with discrete distributions (100 xp)
- Policy gradient and REINFORCE (50 xp)
- Action selection in REINFORCE (100 xp)
- Training the REINFORCE algorithm (100 xp)
- Advantage Actor Critic (50 xp)
- Critic network (100 xp)
- Actor Critic loss calculations (100 xp)
- Training the A2C algorithm (100 xp)
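Here is a minimal sketch of REINFORCE-style action selection using PyTorch's Categorical distribution. The network sizes and the placeholder return value are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# A small policy network that outputs action probabilities.
policy = nn.Sequential(
    nn.Linear(4, 64),   # CartPole-sized input, for illustration
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=-1),
)

state = torch.rand(4)
dist = Categorical(policy(state))  # discrete distribution over actions
action = dist.sample()
log_prob = dist.log_prob(action)   # stored for the policy gradient update

# REINFORCE update for one step: push up the log-probability of the
# chosen action in proportion to the return G (placeholder value here).
G = 1.0
loss = -log_prob * G
loss.backward()  # an optimizer step would follow
```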
4. Proximal Policy Optimization and DRL Tips
Explore Proximal Policy Optimization (PPO) for robust DRL performance. Next, you will examine the entropy bonus in PPO, which encourages exploration by preventing premature convergence to deterministic policies. You'll also learn about batch updates in policy gradient methods. Finally, you will learn about hyperparameter optimization with Optuna, a powerful tool for optimizing performance in your DRL models. A sketch of PPO's clipped objective follows the exercise list below.
- Proximal policy optimization (50 xp)
- The clipped probability ratio (100 xp)
- The clipped surrogate objective function (100 xp)
- Entropy bonus and PPO (50 xp)
- Entropy playground (100 xp)
- Training the PPO algorithm (100 xp)
- Batch updates in policy gradient (50 xp)
- Minibatch and DRL (50 xp)
- A2C with batch updates (100 xp)
- Hyperparameter optimization with Optuna (50 xp)
- Hyperparameter or not? (100 xp)
- Hands-on with Optuna (100 xp)
- Congratulations! (50 xp)
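To close, here is a minimal sketch of PPO's clipped surrogate objective with an entropy bonus. The tensors are dummy data and the coefficients are illustrative assumptions; in training, the inputs would come from policy rollouts.

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages, entropy,
             epsilon=0.2, entropy_coeff=0.01):
    # Probability ratio pi_new / pi_old, computed in log space.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipping keeps the policy update close to the old policy.
    clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon)
    surrogate = torch.min(ratio * advantages, clipped * advantages)
    # The entropy bonus discourages premature collapse to a
    # deterministic policy.
    return -(surrogate.mean() + entropy_coeff * entropy.mean())

# Dummy rollout data, just to show the call.
new_lp = torch.randn(8, requires_grad=True)
old_lp = new_lp.detach() + 0.1 * torch.randn(8)
advantages = torch.randn(8)
entropy = torch.rand(8)

loss = ppo_loss(new_lp, old_lp, advantages, entropy)
loss.backward()
```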
Timothée Carayol
Principal Machine Learning Engineer
Timothée Carayol has been a passionate practitioner of data science and machine learning since 2010. Formerly a Research Data Scientist at Meta working on AI Infrastructure Analytics, Timothée recently took up a new challenge as Principal Machine Learning Engineer at Komment, where he helps build the future of software documentation.