Review: PR-005-Playing Atari with Deep Reinforcement Learning

Joonsu Oh
May 23, 2021


Disclaimer

  • I am a novice in deep reinforcement learning.
  • This article covers the paper only superficially, just enough to grasp its main contribution.

Abstract

  • One of the first works to successfully apply deep learning to reinforcement learning.
  • Learns control policies directly from high-dimensional sensory input (raw pixels).
  • Applied to seven Atari games: it outperforms all previous approaches on six of them and surpasses a human expert on three.

Challenges in RL

  • Most deep learning requires hand-labeled training data (supervised learning)
  • RL must learn from a scalar reward signal
  • The reward signal is often sparse, noisy, and delayed
  • The delay between actions and the resulting rewards can be thousands of timesteps
  • Solution: a CNN trained with a variant of Q-learning!
  • Most deep learning assumes data samples are independent, while RL encounters sequences of highly correlated states
  • Solution: experience replay

RL Background

Agent and Environment

Figure 1: the agent-environment interaction loop

State

Figure 2: the agent's state

Major Components of an RL Agent

  • Policy: the agent’s behaviour function. Deterministic policy: a = π(s); stochastic policy: π(a|s) = P[a|s] (see the ε-greedy sketch after this list)
  • Value function: how good each state and/or action is
Figure 3: definition of the value function
  • Model: the agent’s representation of the environment
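
As a small, hypothetical illustration (my own example, not from the article or the paper): a stochastic policy can be obtained from Q-values with ε-greedy action selection over a discrete action space.

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float = 0.1) -> int:
    """Pick an action from an epsilon-greedy policy over the given Q-values.

    With probability epsilon take a uniformly random action (exploration);
    otherwise take the greedy action argmax_a Q(s, a) (exploitation).
    """
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(q_values)))  # explore
    return int(np.argmax(q_values))                   # exploit
```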

Approaches To Reinforcement Learning

  • Value-based RL: estimate the optimal value function Q*(s, a), the maximum value achievable under any policy
  • Policy-based RL: search directly for the optimal policy π*, the policy that achieves the maximum future reward

Q-Learning

Figure 4: the Q-learning update
  • Note that the Q-values are approximated by a simple neural network
  • However, such a network can easily diverge because of correlations between samples and non-stationary targets
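
For reference, the paper trains the Q-network by minimizing a sequence of squared-error losses built from the Bellman optimality backup. Roughly, with θ_i the network weights at iteration i and γ the discount factor:

```latex
L_i(\theta_i) = \mathbb{E}_{s,a,r,s'}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i)\big)^2\Big]
```

The target r + γ max_{a'} Q(s', a'; θ_{i-1}) uses the parameters from the previous iteration, which are held fixed while optimizing L_i but still shift as training proceeds; this is what "non-stationary targets" refers to above.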

Experience Replay

Figure 5: experience replay
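
To make the idea concrete, here is a minimal replay-buffer sketch (my own illustration, not the authors' code; the default capacity and batch size below are assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity: int = 100_000):
        # Oldest transitions are discarded once capacity is reached
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        # Uniform random sampling breaks the correlation between consecutive
        # states and lets each transition be reused in many updates
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)
```

During training, the agent pushes every transition it experiences into the buffer and, instead of learning from the latest transition, samples a random minibatch from it for each gradient update.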

DQN in Atari
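
The network takes a stack of the last 4 preprocessed 84x84 frames as input and outputs one Q-value per action, so a single forward pass scores every action. Below is a rough PyTorch reconstruction of the architecture reported in the paper (a sketch under my assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Q-network: input is a stack of 4 preprocessed 84x84 frames,
    output is one Q-value per legal action."""

    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 16 filters of 8x8, stride 4
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 32 filters of 4x4, stride 2
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # 84x84 input -> 9x9 feature maps
            nn.ReLU(),
            nn.Linear(256, num_actions),                 # one Q-value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# For a game with 4 actions, this returns a (1, 4) tensor of Q-values:
# DQN(num_actions=4)(torch.zeros(1, 4, 84, 84))
```

The same architecture and hyperparameters are used across all seven games; only the number of output units changes with each game's action set.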
