Review: PR-005-Playing Atari with Deep Reinforcement Learning
Disclaimer
- I am a novice in deep reinforcement learning.
- This article covers the paper only superficially, aiming to grasp its main contribution.
Abstract
- One of the first successful applications of deep learning to reinforcement learning
- Successfully learns control policies directly from high-dimensional sensory input (raw pixels)
- Applied to 7 Atari games; outperforms all previous approaches on 6 of them and surpasses a human expert on 3
Challenges in RL
- Most DL requires large amounts of hand-labeled training data (i.e. supervised learning)
- RL must learn from a scalar reward signal
- Reward signal is often sparse, noisy, and delayed
- Delay between actions and resulting rewards can be thousands of timesteps
- Solution: a CNN trained with a variant of Q-learning!
- Most DL assumes data samples are independent, while RL encounters sequences of highly correlated states
- Solution: Experience replay
RL Background
Agent and Environment
State
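A single screen frame does not fully determine the game situation, so the paper treats the state as the whole sequence of observations and actions seen so far (in practice, a fixed-length stack of preprocessed recent frames is used instead):

```latex
s_t = x_1, a_1, x_2, \ldots, a_{t-1}, x_t
```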
Major Components of an RL Agent
- Policy: Agent’s behaviour function — Deterministic Policy: a = π(s), Stochastic Policy: π(a|s) = P[a|s]
- Value Function: How good is each state and/or action (see the definitions after this list)
- Model: Agent’s representation of the environment
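As a quick reminder of the notation used in the paper: the return at time t is the discounted sum of future rewards (γ is the discount factor), and the action-value function under a policy π is its expectation:

```latex
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t} r_{t'},
\qquad
Q^{\pi}(s, a) = \mathbb{E}\left[ R_t \mid s_t = s,\; a_t = a,\; \pi \right]
```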
Approaches To Reinforcement Learning
- Value-based RL: Estimate the optimal value function Q*(s, a), which is the maximum value achievable under any policy (it obeys the Bellman equation shown after this list)
- Policy-based RL: Search directly for the optimal policy π*, i.e. the policy achieving maximum future reward
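Value-based methods such as Q-learning build on the Bellman optimality equation satisfied by Q* (E is the emulator/environment, following the paper's notation):

```latex
Q^{*}(s, a) = \mathbb{E}_{s' \sim \mathcal{E}}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]
```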
Q-Learning
- Note that the Q-value is estimated by a simple neural network instead of a lookup table
- However, such a network diverges easily due to the correlation between consecutive samples and the non-stationary targets (see the loss below)
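Concretely, the network with weights θ is trained by minimizing a sequence of squared-error losses whose target y_i is itself computed from the previous network, so the target keeps moving as θ is updated (equations as in the paper):

```latex
L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right],
\qquad
y_i = \mathbb{E}_{s' \sim \mathcal{E}}\left[\, r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \;\middle|\; s, a \,\right]
```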
Experience Replay
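A minimal sketch of the idea (class and method names are my own, not from the paper): store each transition in a fixed-size buffer and train on uniformly sampled minibatches instead of on consecutive frames.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of past transitions (illustrative implementation)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def push(self, state, action, reward, next_state, done):
        # Store one transition observed while playing
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive frames and lets each experience be reused many times
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from this buffer decorrelates the training data and smooths the data distribution over many past behaviours, which is exactly what the divergence issue above calls for.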
DQN in Atari
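Below is a rough PyTorch sketch of the Q-network as described in the paper (4 stacked, preprocessed 84x84 frames in, one Q-value per action out); the preprocessing and training loop are omitted, and the exact details are an assumption on my part.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional Q-network roughly following the architecture reported in the paper."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            # Input: 4 stacked, preprocessed 84x84 grayscale frames
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # -> 16 x 20 x 20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 32 x 9 x 9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one Q-value per legal action
        )

    def forward(self, x):
        return self.net(x)

# Greedy action selection for a single (dummy) state
q = QNetwork(n_actions=6)
state = torch.zeros(1, 4, 84, 84)
action = q(state).argmax(dim=1)
```

Predicting all action values in one forward pass (rather than feeding the action in as an input) lets the agent pick the greedy action with a single evaluation of the network.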