Introduction
This repo is for learning basic RL algorithms. We try to keep the code thoroughly commented and clearly structured.
The code mainly consists of the following scripts:
- model.py: basic network models for RL, such as MLP and CNN
- memory.py: replay buffer
- plot.py: uses seaborn to plot reward curves, saved in the result folder
- env.py: customizes or normalizes environments
- agent.py: core algorithm, a Python class with functions such as choose action and update
- main.py: main function
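As a rough illustration of the agent.py layout described above, the sketch below shows a tabular Q-learning agent with the two functions mentioned (choose action, update). The class name `QLearningAgent`, its parameters, and the epsilon-greedy policy are assumptions made for this example, not the repo's actual code.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Hypothetical tabular agent exposing choose_action and update."""

    def __init__(self, n_actions, lr=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.lr = lr              # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, initialised to zero
        self.q_table = defaultdict(lambda: [0.0] * n_actions)

    def choose_action(self, state):
        """Epsilon-greedy selection: explore with probability epsilon."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q_table[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        """One-step Q-learning update towards the bootstrapped target."""
        target = reward
        if not done:
            target += self.gamma * max(self.q_table[next_state])
        td_error = target - self.q_table[state][action]
        self.q_table[state][action] += self.lr * td_error


# Usage: one terminal transition with reward 1.0 raises Q(0, 1).
agent = QLearningAgent(n_actions=2, epsilon=0.0, seed=0)
agent.update(state=0, action=1, reward=1.0, next_state=1, done=True)
greedy_action = agent.choose_action(0)  # epsilon=0.0, so this is greedy
```

With this split, a main function only needs to loop over environment steps, calling `choose_action` to act and `update` to learn.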
Note that model.py, memory.py, and plot.py are shared across different algorithms, so they are placed in the common folder.
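To make the role of a shared memory.py more concrete, here is a minimal replay buffer sketch using only the standard library. The class name `ReplayBuffer` and the methods `push` and `sample` are assumptions for illustration; the repo's own implementation may differ.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch and regroup it field by field."""
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


# Usage: with capacity 3, only the three most recent transitions remain.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, False)
states, actions, rewards, next_states, dones = buffer.sample(2)
```

Because the buffer does not depend on any particular algorithm, off-policy methods such as DQN or DDPG can reuse it unchanged.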
Running Environment
Python 3.7.9, PyTorch 1.6.0, Gym 0.18.0
Usage
Run main.py or main.ipynb, or run a task file (e.g. task1.py).
Schedule
| Name | Related materials | Used Envs | Notes |
|---|---|---|---|
| On-Policy First-Visit MC | | Racetrack | |
| Q-Learning | | CliffWalking-v0 | |
| Sarsa | | Racetrack | |
| DQN | DQN-paper | CartPole-v0 | |
| DQN-cnn | DQN-paper | CartPole-v0 | |
| DoubleDQN | | CartPole-v0 | |
| Hierarchical DQN | Hierarchical DQN | CartPole-v0 | |
| PolicyGradient | | CartPole-v0 | |
| A2C | A3C Paper | CartPole-v0 | |
| A3C | A3C Paper | | |
| SAC | SAC Paper | | |
| PPO | PPO paper | CartPole-v0 | |
| DDPG | DDPG Paper | Pendulum-v0 | |
| TD3 | TD3 Paper | HalfCheetah-v2 | |
| GAIL | | | |