Reinforcement Learning
Reference
- https://github.com/boyu-ai/Hands-on-RL
- https://campusai.github.io/theory/
- https://rail.eecs.berkeley.edu/deeprlcourse/
- https://web.stanford.edu/class/cs234/modules.html
- https://deeprlcourse.github.io/course_notes/intro-to-rl/
- https://sites.google.com/view/deep-rl-bootcamp/lectures
- https://people.cs.umass.edu/~bsilva/courses/CMPSCI_687/Fall2022/Lecture_Notes_v1.0_687_F22.pdf
- https://webee.technion.ac.il/shimkin/LCS11/LCS11index.html
Content
- MDPs
- Bellman Equation
- Policy Iteration and Value Iteration
- Monte Carlo Methods
- Temporal Difference (TD) Learning: Sarsa, Q-learning
- TD(\(\lambda\))
- DQN
- Policy Gradient: REINFORCE
- Actor Critic
- TRPO
- PPO
- DDPG
- SAC
- Backup Diagram
- Algorithm Summary
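To make the TD topics above concrete, here is a minimal sketch of tabular Q-learning on a hypothetical deterministic chain MDP (the environment and all hyperparameters are illustrative assumptions, not from the course materials):

```python
import random

# Hypothetical toy environment: a 1-D chain of states 0..4.
# Actions: 0 = left, 1 = right. Reward 1 only on reaching the goal state 4.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

def step(state, action):
    """Deterministic chain dynamics; the episode ends at the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behavior policy
            if random.random() < EPSILON:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # off-policy TD target: bootstrap with max over next-state actions
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = train()
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)  # greedy policy should move right toward the goal in states 0..3
```

Replacing the `max(Q[s2])` target with the Q-value of the action actually taken next turns this into Sarsa, the on-policy variant listed above.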
Exercises
- Policy iteration and value iteration for Cliff Walking
- Sarsa and Q-learning for Cliff Walking
- DQN for CartPole-v0
- REINFORCE for CartPole-v0
- Actor-Critic for CartPole-v0
- TRPO for CartPole-v0 and Pendulum-v0
- PPO for CartPole-v0 and Pendulum-v0
- DDPG for Pendulum-v0
- SAC for Pendulum-v0
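As a starting point for the first exercise, here is a minimal value-iteration sketch on a hand-rolled 4x12 Cliff Walking grid (assuming the standard layout: start at (3, 0), goal at (3, 11), cliff cells along the bottom row; the discount factor and reward values are illustrative assumptions, not taken from the course):

```python
# Assumed 4x12 Cliff Walking layout (not the gym environment):
# per-step reward -1; stepping into the cliff gives -100 and resets to start.
ROWS, COLS, GAMMA = 4, 12, 0.9
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def transition(state, action):
    """Deterministic move clipped to the grid; cliff sends the agent home."""
    r = max(0, min(ROWS - 1, state[0] + action[0]))
    c = max(0, min(COLS - 1, state[1] + action[1]))
    if (r, c) in CLIFF:
        return START, -100.0
    return (r, c), -1.0

def value_iteration(theta=1e-6):
    """In-place (Gauss-Seidel) sweeps until the max update falls below theta."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in V:
            if s == GOAL:
                continue  # terminal state keeps V = 0
            best = max(rew + GAMMA * V[nxt]
                       for nxt, rew in (transition(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    return V

V = value_iteration()
print(V[START])  # optimal value of the start state under this reward scheme
```

The greedy policy read off from `V` hugs the cliff edge: up from the start, right along the row above the cliff, then down into the goal. Swapping the full-width sweep for an evaluate-then-improve loop over an explicit policy table gives policy iteration, the other half of the exercise.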