Contents

  • TOC {:toc}

Value-based RL

DQN (Playing Atari with Deep Reinforcement Learning)

  • Start from the Bellman equation for the optimal Q-function
  • Regress the Q-network onto that Bellman target, applying a target network and experience replay to stabilize training
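
The two bullets above can be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: `ReplayBuffer`, `td_targets`, and `target_q` are hypothetical names, the discount factor 0.99 is an assumed value, and `target_q` is any callable that returns a list of Q-values for a state, standing in for the frozen target network.

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor (assumed value, not taken from the post)


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive environment steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=GAMMA):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a').

    At terminal transitions the bootstrap term is dropped, so y = r.
    `target_q` is a hypothetical stand-in for the target network.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a full training loop, the online network's Q(s, a) would be regressed onto these targets (for example with a squared error), and the target network's weights would be copied from the online network only every fixed number of steps.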

Policy-based RL

vanilla policy gradient

natural policy gradient

TRPO

PPO

GRPO

Actor-critic based RL
