Today we're going to use double Q-learning to deal with the problem of maximization bias in reinforcement learning. We'll use the OpenAI Gym CartPole environment.
We get maximization bias when we use the same set of samples both to select the maximizing action and to estimate that action's value. Double Q-learning addresses this by keeping two independent estimates of the action-value function and alternating between them: one estimate selects the greedy action, while the other evaluates it.
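As a minimal sketch of that idea, here is the tabular double Q-learning update (the video's CartPole agent presumably uses a neural network, but the decoupling of action selection from action evaluation is the same). The function and table names here are illustrative, not taken from the video's code:

```python
import random

def double_q_update(Q1, Q2, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular double Q-learning step.

    Q1, Q2: action-value tables, indexed as Q[state][action].
    A coin flip decides which table gets updated; the *other* table
    evaluates the greedy action, which breaks the coupling between
    selection and evaluation that causes maximization bias.
    """
    if random.random() < 0.5:
        Qa, Qb = Q1, Q2  # update Q1, evaluate with Q2
    else:
        Qa, Qb = Q2, Q1  # update Q2, evaluate with Q1
    # Greedy action according to the table being updated...
    best = max(range(len(Qa[s_next])), key=lambda i: Qa[s_next][i])
    # ...but its value comes from the other table.
    target = r if done else r + gamma * Qb[s_next][best]
    Qa[s][a] += alpha * (target - Qa[s][a])
```

At act time you would typically behave epsilon-greedily with respect to the sum (or average) of the two tables, so both estimates contribute to action selection.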
Code for this video is here:
[ Link ]
Learn how to turn deep reinforcement learning papers into code:
Get instant access to all my courses, including the new Prioritized Experience Replay course, with my subscription service. $29 a month gives you instant access to 42 hours of instructional content plus access to future updates, added monthly.
Discounts available for Udemy students (enrolled longer than 30 days). Just send an email to sales@neuralnet.ai
[ Link ]
Or, pick up my Udemy courses here:
Deep Q Learning:
[ Link ]
Actor Critic Methods:
[ Link ]
Curiosity Driven Deep Reinforcement Learning:
[ Link ]
Natural Language Processing from First Principles:
[ Link ]
Reinforcement Learning Fundamentals:
[ Link ]
Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion: [ Link ]
Grokking Deep Learning: [ Link ]
Grokking Deep Reinforcement Learning: [ Link ]
Come hang out on Discord here:
[ Link ]
Need personalized tutoring? Help on a programming project? Shoot me an email! phil@neuralnet.ai
Website: [ Link ]
Github: [ Link ]
Twitter: [ Link ]
#OpenAIGym #ReinforcementLearning #DoubleQLearning