Review: PR-005-Playing Atari with Deep Reinforcement Learning
Disclaimer
- I am a novice in deep reinforcement learning.
- This article covers the paper only superficially, aiming to grasp its main contribution.
Abstract
- One of the first successful applications of deep learning to reinforcement learning
- Successfully learns control policies directly from high-dimensional sensory input (raw pixels)
- Applied to 7 Atari games; outperforms all previous approaches on 6 of them and surpasses a human expert on 3
Challenges in RL
- Most DL requires large amounts of hand-labeled training data (i.e. supervised learning)
- RL must learn from a scalar reward signal
- Reward signal is often sparse, noisy, and delayed
- Delay between actions and resulting rewards can be thousands of timesteps
- Solution: a CNN trained with a variant of Q-learning!
- Most DL assumes data samples are independent, while RL encounters sequences of highly correlated states
- Solution: Experience replay
RL Background
Agent and Environment
State
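A single screen frame does not fully determine the game situation, so the paper treats the state as the whole sequence of observations and actions seen so far (in practice, a fixed-length stack of preprocessed recent frames is used instead):

```latex
s_t = x_1, a_1, x_2, \ldots, a_{t-1}, x_t
```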
Major Components of an RL Agent
- Policy: Agent’s behaviour function — Deterministic Policy: a = π(s), Stochastic Policy: π(a|s) = P[a|s]
- Value Function: How good is each state and/or action (see the definitions after this list)
- Model: Agent’s representation of the environment
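As a quick reminder of the notation used in the paper: the return at time t is the discounted sum of future rewards (γ is the discount factor), and the action-value function under a policy π is its expectation:

```latex
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t} r_{t'},
\qquad
Q^{\pi}(s, a) = \mathbb{E}\left[ R_t \mid s_t = s,\; a_t = a,\; \pi \right]
```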
Approaches To Reinforcement Learning
- Value-based RL: Estimate the optimal value function Q*(s, a), which is the maximum value achievable under any policy (it obeys the Bellman equation shown after this list)
- Policy-based RL: Search directly for the optimal policy π*, i.e. the policy achieving maximum future reward
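Value-based methods such as Q-learning build on the Bellman optimality equation satisfied by Q* (E is the emulator/environment, following the paper's notation):

```latex
Q^{*}(s, a) = \mathbb{E}_{s' \sim \mathcal{E}}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]
```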
Q-Learning
- Note that the Q-value is estimated by a simple neural network instead of a lookup table
- However, such a network diverges easily due to the correlation between consecutive samples and the non-stationary targets (see the loss below)
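Concretely, the network with weights θ is trained by minimizing a sequence of squared-error losses whose target y_i is itself computed from the previous network, so the target keeps moving as θ is updated (equations as in the paper):

```latex
L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right],
\qquad
y_i = \mathbb{E}_{s' \sim \mathcal{E}}\left[\, r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \;\middle|\; s, a \,\right]
```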
Experience Replay
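A minimal sketch of the idea (class and method names are my own, not from the paper): store each transition in a fixed-size buffer and train on uniformly sampled minibatches instead of on consecutive frames.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of past transitions (illustrative implementation)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def push(self, state, action, reward, next_state, done):
        # Store one transition observed while playing
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive frames and lets each experience be reused many times
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from this buffer decorrelates the training data and smooths the data distribution over many past behaviours, which is exactly what the divergence issue above calls for.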
DQN in Atari
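Below is a rough PyTorch sketch of the Q-network as described in the paper (4 stacked, preprocessed 84x84 frames in, one Q-value per action out); the preprocessing and training loop are omitted, and the exact details are an assumption on my part.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional Q-network roughly following the architecture reported in the paper."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            # Input: 4 stacked, preprocessed 84x84 grayscale frames
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # -> 16 x 20 x 20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 32 x 9 x 9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one Q-value per legal action
        )

    def forward(self, x):
        return self.net(x)

# Greedy action selection for a single (dummy) state
q = QNetwork(n_actions=6)
state = torch.zeros(1, 4, 84, 84)
action = q(state).argmax(dim=1)
```

Predicting all action values in one forward pass (rather than feeding the action in as an input) lets the agent pick the greedy action with a single evaluation of the network.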