easy-rl/codes/README_en.md at 4b96f5a6b029632bd7e9b6689e12dc4245cd90ed

bacow/easy-rl

johnjim0816 8028f7145e update

2021-05-03 23:00:01 +08:00

Eng|中文

Introduction

This repo is for learning basic RL algorithms. We try to keep the code as well commented and clearly structured as possible.

The code is organized into the following scripts:

model.py: basic network models for RL, such as MLP and CNN
memory.py: replay buffer
plot.py: plots reward curves with seaborn and saves them in the result folder
env.py: custom or normalized environments
agent.py: core algorithm, implemented as a Python class with methods such as choose action and update
main.py: main function
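
As an illustration of the agent.py role described above, here is a minimal sketch of a class exposing the two methods the list mentions. It is not the repo's code: the class name `QLearningAgent`, the method signatures, and the hyperparameter names (`lr`, `gamma`, `epsilon`) are assumptions, and tabular Q-learning is used only because it needs no external libraries.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Hypothetical tabular agent with choose_action and update methods."""

    def __init__(self, n_actions, lr=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.lr = lr            # learning rate (assumed name)
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, initialised to zero
        self.q_table = defaultdict(lambda: [0.0] * n_actions)

    def choose_action(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q_table[state]
        # Ties are broken in favour of the lowest action index
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state, action, reward, next_state, done):
        """One-step Q-learning update towards the bootstrapped target."""
        target = reward
        if not done:
            target += self.gamma * max(self.q_table[next_state])
        td_error = target - self.q_table[state][action]
        self.q_table[state][action] += self.lr * td_error
```

In a layout like the one listed, main.py would presumably create the environment and the agent, then alternate `choose_action` and `update` inside the episode loop.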

Note that model.py, memory.py, and plot.py are shared across algorithms, so they are placed in the common folder.

Running Environment
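
To make the memory.py entry concrete, the following is a minimal sketch of a replay buffer that several algorithms could share. The class name `ReplayBuffer`, the method names `push` and `sample`, and the transition layout are assumptions for illustration, not the repo's actual interface.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch and regroup it field by field."""
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

An off-policy agent would typically call `push` after every environment step and start calling `sample` once `len(buffer)` exceeds the batch size.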

Python 3.7, PyTorch 1.6.0-1.7.1, gym 0.17.0-0.18.0
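
A small sketch of how one might check an installed version against the ranges listed above. The helper names (`parse_version`, `in_range`, `check`) and the `SUPPORTED` table are hypothetical, and the parser assumes plain numeric versions with an optional local suffix such as `+cu110`.

```python
# Version ranges copied from the list above; the dict itself is an assumption
SUPPORTED = {
    "torch": ("1.6.0", "1.7.1"),
    "gym": ("0.17.0", "0.18.0"),
}


def parse_version(text):
    """Turn '1.7.1' or '1.7.1+cu110' into a tuple of ints such as (1, 7, 1)."""
    core = text.split("+")[0]  # drop a local build suffix if present
    return tuple(int(part) for part in core.split("."))


def in_range(version, low, high):
    """Return True if low <= version <= high, compared component-wise."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


def check(name, version):
    """Check a package version string against the SUPPORTED table."""
    low, high = SUPPORTED[name]
    return in_range(version, low, high)
```

With the libraries installed, something like `check("torch", torch.__version__)` or `check("gym", gym.__version__)` would report whether the environment matches.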

Usage

Run the Python scripts or Jupyter notebook files whose names contain train to train the agent. A task prefix, as in task0_train.py, means the script trains on task 0.

Likewise, files whose names contain eval evaluate the agent.
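
The naming convention above can be sketched as a small parser. This is only an illustration: the function `describe_script` and the exact pattern (an optional `taskN_` prefix, then `train` or `eval`, then `.py` or `.ipynb`) are assumptions, and real file names in the repo may differ.

```python
import re

# Assumed pattern: optional "taskN_" prefix, then "train" or "eval"
PATTERN = re.compile(r"^(?:(task\d+)_)?(train|eval)\.(?:py|ipynb)$")


def describe_script(filename):
    """Return (task, mode) for a matching file name, or None otherwise."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    task, mode = match.groups()  # task is None when there is no prefix
    return task, mode
```

For example, `describe_script("task0_train.py")` gives `("task0", "train")`, while a file such as agent.py does not match and gives `None`.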

Schedule

Name	Related materials	Used Envs	Notes
On-Policy First-Visit MC	medium blog	Racetrack
Q-Learning	towardsdatascience blog, Q-learning paper	CliffWalking-v0
Sarsa	geeksforgeeks blog	Racetrack
DQN	DQN Paper, Nature DQN Paper	CartPole-v0
DQN-cnn	DQN Paper	CartPole-v0	Uses a CNN instead of the fully connected network used in DQN
DoubleDQN	DoubleDQN Paper	CartPole-v0
Hierarchical DQN	H-DQN Paper	CartPole-v0
PolicyGradient	Lil'log	CartPole-v0
A2C	A3C Paper	CartPole-v0
SAC	SAC Paper	Pendulum-v0
PPO	PPO paper	CartPole-v0
DDPG	DDPG Paper	Pendulum-v0
TD3	TD3 Paper	HalfCheetah-v2

Refs

RL-Adventure-2

RL-Adventure
