Introduction
This repo is for learning basic RL algorithms. We try to keep the comments as detailed and the structure as clear as possible.
The code structure mainly consists of the following scripts:

- `model.py`: basic network models used in RL, such as MLP and CNN
- `memory.py`: replay buffer (a minimal sketch follows below)
- `plot.py`: uses seaborn to plot reward curves, which are saved in the `result` folder
- `env.py`: customized or normalized environments
- `agent.py`: core algorithm, implemented as a Python class with functions such as choose_action and update
- `main.py`: the main function
Note that `model.py`, `memory.py`, and `plot.py` are shared across different algorithms, so they are placed in the `common` folder.
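Below is a minimal sketch of the kind of replay buffer `memory.py` provides. The class name and method signatures (`ReplayBuffer`, `push`, `sample`) are assumptions for illustration, not necessarily the repo's exact API:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer that stores transitions for off-policy algorithms such as DQN."""

    def __init__(self, capacity):
        # deque drops the oldest transition automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # store one transition
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniformly sample a mini-batch and split it into per-field tuples
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```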
Running Environment
Python 3.7.9, PyTorch 1.6.0, Gym 0.18.0
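For reference, the pinned library versions can be written as a requirements file; the exact PyTorch build (CPU vs. CUDA) may need to be chosen from the official install instructions, so treat this as a starting point:

```
torch==1.6.0
gym==0.18.0
seaborn
```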
Usage
For environment details, see 环境说明 (environment description).
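Below is a minimal sketch of the training loop that a per-algorithm `main.py` typically implements, assuming the agent exposes the `choose_action`/`update` interface described above; the function name and hyperparameters here are illustrative only, not the repo's exact entry point:

```python
import gym

def train(agent, env_name="CartPole-v0", episodes=200):
    """Run a simple training loop with the choose_action/update agent interface."""
    env = gym.make(env_name)
    rewards = []
    for _ in range(episodes):
        state = env.reset()
        done, ep_reward = False, 0.0
        while not done:
            action = agent.choose_action(state)           # select an action from the current policy
            next_state, reward, done, _ = env.step(action)
            agent.update(state, action, reward, next_state, done)  # one learning step
            state = next_state
            ep_reward += reward
        rewards.append(ep_reward)
    return rewards
```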
Schedule
| Algorithm | Related materials | Environments | Notes |
|---|---|---|---|
| On-Policy First-Visit MC | | Racetrack | |
| Q-Learning | | CliffWalking-v0 | |
| Sarsa | | Racetrack | |
| DQN | DQN paper | CartPole-v0 | |
| DQN-cnn | DQN paper | CartPole-v0 | |
| DoubleDQN | | CartPole-v0 | not working well yet |
| Hierarchical DQN | Hierarchical DQN paper | | |
| PolicyGradient | | CartPole-v0 | |
| A2C | | CartPole-v0 | |
| A3C | | | |
| SAC | | | |
| PPO | PPO paper | CartPole-v0 | |
| DDPG | DDPG paper | Pendulum-v0 | |
| TD3 | Twin Delayed DDPG (TD3) paper | | |
| GAIL | | | |
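For the tabular entries above, a minimal, self-contained Q-Learning example on CliffWalking-v0 looks roughly like this (hyperparameters are illustrative and not taken from this repo):

```python
import gym
import numpy as np

env = gym.make("CliffWalking-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))  # tabular action-value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy exploration
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Q-Learning update: bootstrap from the greedy value of the next state
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```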