Introduction
This repo is for learning basic RL algorithms. We aim to keep the code as clearly structured and thoroughly commented as possible.
The code consists mainly of the following scripts:
- model.py: basic RL network models, such as MLP and CNN
- memory.py: replay buffer
- plot.py: plots reward curves with seaborn and saves them in the result folder
- env.py: customizes or normalizes environments
- agent.py: the core algorithm, implemented as a Python class with methods for choosing actions and updating
- main.py: main function
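As an illustration of what memory.py holds, here is a minimal replay buffer sketch. The class and method names (`ReplayBuffer`, `push`, `sample`) are assumptions for this example and are not necessarily the repo's actual API.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # a deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # store one transition as a tuple
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement, then regroup each field with zip(*)
        batch = random.sample(self.buffer, batch_size)
        state, action, reward, next_state, done = zip(*batch)
        return state, action, reward, next_state, done

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
print(len(buf))  # 3: the transitions with state 0 and 1 were evicted
```

Sampling uniformly from stored transitions breaks the correlation between consecutive steps; an algorithm such as DQN would typically convert the returned tuples to tensors before computing its update.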
Note that model.py, memory.py and plot.py are shared across different algorithms, so they are placed in the common folder.
Running Environment
Python 3.7, PyTorch 1.6.0-1.7.1, gym 0.17.0-0.18.0
Usage
Run a Python script or Jupyter notebook whose filename contains train to train the agent. If the filename has a task prefix, as in task0_train.py, it trains on task 0.
Likewise, files whose names contain eval evaluate the agent.
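The train/eval split described above can be sketched as follows: training uses exploratory actions and updates the agent after every step, while evaluation acts greedily and never updates it. Everything here is hypothetical and chosen only to keep the example self-contained: `CorridorEnv` is a toy stand-in that mimics the gym 0.17 `reset()`/`step()` return conventions, and `QLearningAgent`, `predict`, `train` and `evaluate` are illustrative names, not the repo's code.

```python
import random
from collections import defaultdict


class CorridorEnv:
    """Hypothetical toy environment with a gym-0.17-style API:
    reset() -> obs and step(action) -> (obs, reward, done, info).
    States are 0..n-1; action 0 moves left, action 1 moves right;
    reaching state n-1 ends the episode with reward 1."""

    def __init__(self, n=5):
        self.n = n

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        if action == 0:
            self.pos = max(0, self.pos - 1)
        else:
            self.pos = min(self.n - 1, self.pos + 1)
        done = self.pos == self.n - 1
        return self.pos, (1.0 if done else 0.0), done, {}


class QLearningAgent:
    """Hypothetical tabular agent with a choose-action method and an update method."""

    def __init__(self, n_actions, lr=0.1, gamma=0.9, epsilon=0.1):
        self.n_actions, self.lr, self.gamma, self.epsilon = n_actions, lr, gamma, epsilon
        # unseen states start with all-zero action values
        self.Q = defaultdict(lambda: [0.0] * n_actions)

    def choose_action(self, state):
        # epsilon-greedy: explore with probability epsilon (used in training)
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return self.predict(state)

    def predict(self, state):
        # greedy action with random tie-breaking (used in evaluation)
        q = self.Q[state]
        best = max(q)
        return random.choice([a for a, v in enumerate(q) if v == best])

    def update(self, state, action, reward, next_state, done):
        # Q-learning target: bootstrap from the best next action unless terminal
        target = reward if done else reward + self.gamma * max(self.Q[next_state])
        self.Q[state][action] += self.lr * (target - self.Q[state][action])


def train(env, agent, episodes, max_steps=100):
    """Exploratory actions plus a learning update after every step."""
    rewards = []
    for _ in range(episodes):
        state, ep_reward = env.reset(), 0.0
        for _ in range(max_steps):
            action = agent.choose_action(state)
            next_state, reward, done, _ = env.step(action)
            agent.update(state, action, reward, next_state, done)
            state, ep_reward = next_state, ep_reward + reward
            if done:
                break
        rewards.append(ep_reward)
    return rewards


def evaluate(env, agent, episodes, max_steps=100):
    """Greedy actions only; the agent is not updated."""
    rewards = []
    for _ in range(episodes):
        state, ep_reward = env.reset(), 0.0
        for _ in range(max_steps):
            state, reward, done, _ = env.step(agent.predict(state))
            ep_reward += reward
            if done:
                break
        rewards.append(ep_reward)
    return rewards


if __name__ == "__main__":
    random.seed(0)
    env, agent = CorridorEnv(n=5), QLearningAgent(n_actions=2, lr=0.5)
    print("last training rewards:", train(env, agent, episodes=300)[-5:])
    print("evaluation rewards:", evaluate(env, agent, episodes=3))
```

Passing the environment's own `done` flag to `update` (rather than treating the `max_steps` cutoff as terminal) keeps the agent from bootstrapping a truncated episode as if it had ended.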
Schedule
| Name | Related materials | Used Envs | Notes |
|---|---|---|---|
| On-Policy First-Visit MC | | Racetrack | |
| Q-Learning | | CliffWalking-v0 | |
| Sarsa | | Racetrack | |
| DQN | DQN Paper, Nature DQN Paper | CartPole-v0 | |
| DQN-cnn | DQN Paper | CartPole-v0 | |
| DoubleDQN | | CartPole-v0 | |
| Hierarchical DQN | Hierarchical DQN | CartPole-v0 | |
| PolicyGradient | | CartPole-v0 | |
| A2C | A3C Paper | CartPole-v0 | |
| SAC | SAC Paper | | |
| PPO | PPO Paper | CartPole-v0 | |
| DDPG | DDPG Paper | Pendulum-v0 | |
| TD3 | TD3 Paper | HalfCheetah-v2 | |
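The two tabular TD methods in the table, Q-Learning and Sarsa, differ only in the target they bootstrap from. In standard textbook notation (step size $\alpha$, discount $\gamma$; these symbols are not taken from the repo's code), the updates are:

```math
\begin{aligned}
\text{Q-learning:}\quad & Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\big[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\big]\\
\text{Sarsa:}\quad & Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\big[r_{t+1} + \gamma\, Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\big]
\end{aligned}
```

Q-learning is off-policy because it bootstraps from the greedy next action, whereas Sarsa is on-policy because it uses the action $a_{t+1}$ actually chosen by the behavior policy. When $s_{t+1}$ is terminal, the bootstrap term is dropped in both.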