Reinforcement Learning


Reference

  • https://github.com/boyu-ai/Hands-on-RL
  • https://campusai.github.io/theory/
  • https://rail.eecs.berkeley.edu/deeprlcourse/
  • https://web.stanford.edu/class/cs234/modules.html
  • https://deeprlcourse.github.io/course_notes/intro-to-rl/
  • https://sites.google.com/view/deep-rl-bootcamp/lectures
  • https://people.cs.umass.edu/~bsilva/courses/CMPSCI_687/Fall2022/Lecture_Notes_v1.0_687_F22.pdf
  • https://webee.technion.ac.il/shimkin/LCS11/LCS11index.html

Content

  • MDPs
  • Bellman Equation
  • Policy Iteration and Value Iteration
  • Monte Carlo Methods
  • Temporal Difference (TD) Learning: Sarsa, Q-learning
  • TD(\(\lambda\))
  • DQN
  • Policy Gradient: REINFORCE
  • Actor Critic
  • TRPO
  • PPO
  • DDPG
  • SAC
  • Backup Diagram
  • Algorithm Summary
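The core dynamic-programming ideas at the top of this list (Bellman equation, value iteration) can be sketched in a few lines. The tiny 2-state MDP below is illustrative only, not taken from the course material: it just shows the Bellman optimality backup \(V(s) \leftarrow \max_a \sum_{s'} p(s'|s,a)\,[r + \gamma V(s')]\) iterated to convergence, followed by greedy policy extraction.

```python
import numpy as np

# Hedged sketch of value iteration on a made-up 2-state, 2-action MDP.
# P[s][a] is a list of (prob, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma, theta = 0.9, 1e-8  # discount factor and convergence threshold

V = np.zeros(len(P))
while True:
    delta = 0.0
    for s in P:
        # Bellman optimality backup: max over actions of expected return.
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]]
        v_new = max(q)
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < theta:
        break

# Extract the greedy policy from the converged value function.
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
print(V, policy)
```

Policy iteration differs only in alternating a full policy-evaluation sweep with a greedy improvement step; both converge to the same fixed point of the Bellman optimality operator.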

Exercise

  • Policy Iteration and Value Iteration for Cliff Walking

  • Sarsa and Q-learning for Cliff Walking

  • DQN for CartPole-v0

  • REINFORCE for CartPole-v0

  • Actor-Critic for CartPole-v0

  • TRPO for CartPole-v0 and Pendulum-v0

  • PPO for CartPole-v0 and Pendulum-v0

  • DDPG for Pendulum-v0

  • SAC for Pendulum-v0
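As a taste of the tabular exercises, here is a minimal Q-learning sketch for Cliff Walking. The 4×12 grid, cliff penalty of −100, and per-step reward of −1 follow the standard formulation (as in Gym's CliffWalking environment), but the environment is hand-rolled here to stay dependency-free; hyperparameters are illustrative, not tuned.

```python
import random

# Hedged sketch: tabular Q-learning on a minimal 4x12 Cliff Walking grid.
# Start at (3,0), goal at (3,11); cells (3,1)..(3,10) are the cliff,
# which gives -100 and resets to the start. Every move costs -1.
ROWS, COLS = 4, 12
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, a):
    r, c = state
    dr, dc = ACTIONS[a]
    r = min(max(r + dr, 0), ROWS - 1)  # clamp to the grid
    c = min(max(c + dc, 0), COLS - 1)
    if r == 3 and 1 <= c <= 10:        # fell off the cliff
        return (3, 0), -100.0, False
    if (r, c) == (3, 11):              # reached the goal
        return (r, c), -1.0, True
    return (r, c), -1.0, False

random.seed(0)
Q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}
alpha, gamma, eps = 0.5, 0.99, 0.1  # illustrative hyperparameters

for episode in range(500):
    s, done = (3, 0), False
    while not done:
        # Epsilon-greedy behavior policy.
        if random.random() < eps:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: Q[s][i])
        s2, rwd, done = step(s, a)
        # Off-policy TD target uses the greedy value of the next state.
        target = rwd + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# Greedy action at the start state after learning (0 = up, away from the cliff).
print(max(range(4), key=lambda i: Q[(3, 0)][i]))
```

Swapping the TD target for `Q[s2][a2]` with `a2` drawn from the same epsilon-greedy policy turns this into Sarsa, whose learned path famously detours away from the cliff edge; that on-policy/off-policy contrast is the point of pairing the two in one exercise.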