easy-rl

bacow/easy-rl

johnjim0816 895094a893 update

2021-04-29 14:44:25 +08:00

Name	Last commit message	Last updated
A2C	update	2021-04-16 14:59:23 +08:00
assets	update	2021-03-23 16:10:11 +08:00
common	update	2021-04-29 14:44:25 +08:00
DDPG	update	2021-04-29 14:44:25 +08:00
DoubleDQN	update	2021-03-31 15:37:09 +08:00
DQN	update	2021-04-29 14:44:25 +08:00
DQN_cnn	update	2021-04-16 14:59:23 +08:00
envs	update	2021-04-16 14:59:23 +08:00
HierarchicalDQN	update	2021-03-31 15:37:09 +08:00
MonteCarlo	update	2021-03-28 11:18:52 +08:00
PolicyGradient	update	2021-04-28 22:11:22 +08:00
PPO	update	2021-04-28 22:11:22 +08:00
QLearning	update	2021-04-04 16:59:03 +08:00
RandomPolicy	update	2021-04-28 22:11:22 +08:00
SAC	update	2021-04-29 14:44:25 +08:00
Sarsa	update	2021-03-28 11:18:52 +08:00
TD3	update	2021-04-28 22:11:22 +08:00
LICENSE	update	2021-03-23 16:10:11 +08:00
README_en.md	update	2021-04-29 14:44:25 +08:00
README.md	update	2021-04-29 14:44:25 +08:00
test.py	update	2021-04-29 14:44:25 +08:00

README_en.md

Eng|中文

Introduction

This repo is for learning basic RL algorithms. We try to keep the code thoroughly commented and clearly structured.

The code is organized mainly into the following scripts:

model.py: basic network models for RL, such as MLP and CNN
memory.py: replay buffer
plot.py: plots reward curves with seaborn and saves them in the result folder
env.py: customizes or normalizes environments
agent.py: core algorithm, a Python class with methods such as choose action and update
main.py: main function
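As a rough illustration of the agent.py layout described above, here is a minimal sketch of a class with a choose-action method and an update method. It uses tabular Q-learning only because that fits in a few lines. The class name `QLearningAgent`, its parameters and the method signatures are assumptions made for this example and are not taken from the repo's code.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Hypothetical tabular agent exposing choose_action and update."""

    def __init__(self, n_actions, lr=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.lr = lr            # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, zero-initialised on first access.
        self.q_table = defaultdict(lambda: [0.0] * n_actions)

    def choose_action(self, state):
        """Epsilon-greedy: random action with probability epsilon, else greedy."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q_table[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """One-step Q-learning update towards the bootstrapped target."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q_table[next_state])
        td_error = target - self.q_table[state][action]
        self.q_table[state][action] += self.lr * td_error
```

In a main.py-style loop, one would typically call `choose_action` on each observation, step the environment, and pass the resulting transition to `update`.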

Note that model.py, memory.py and plot.py are shared by different algorithms, so they are placed in the common folder.
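To make the role of the shared memory.py more concrete, the following is a minimal sketch of a replay buffer written with the standard library only. The class name `ReplayBuffer`, the method names `push` and `sample`, and the transition layout are assumptions for this example. The version in the common folder may differ.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch and return it column-wise."""
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Off-policy agents such as DQN or DDPG usually wait until `len(buffer)` exceeds the batch size before they start sampling.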

Running Environment

Python 3.7, PyTorch 1.6.0-1.7.1, gym 0.17.0-0.18.0
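One way to check an installed version against the ranges listed above is sketched below. The helper names `parse_version` and `in_supported_range` are hypothetical, the ranges are read as inclusive bounds (an assumption), and local build suffixes such as `+cu110` are ignored.

```python
import re

# Inclusive version ranges copied from the list above (bounds assumed inclusive).
SUPPORTED = {
    "torch": ((1, 6, 0), (1, 7, 1)),
    "gym": ((0, 17, 0), (0, 18, 0)),
}


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1), ignoring any local build suffix."""
    core = text.split("+")[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def in_supported_range(package, version_text):
    """Return True if version_text lies inside the range listed for package."""
    low, high = SUPPORTED[package]
    return low <= parse_version(version_text) <= high
```

With the packages installed, this could be called as `in_supported_range("torch", torch.__version__)`.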

Usage

Run the Python scripts or Jupyter notebooks whose names contain train to train an agent. A task prefix, as in task0_train.py, means the script trains on task 0.

Files whose names contain eval follow the same convention and evaluate the agent.
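The naming convention above can be sketched as a small helper that reads the task number and mode from a file name. The function `describe_script` is hypothetical and not part of the repo. It assumes the task prefix has the form `task<N>_` and that only `.py` and `.ipynb` files count.

```python
import re


def describe_script(filename):
    """Return {'task': int or None, 'mode': 'train' or 'eval'}, or None if no match."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext not in ("py", "ipynb"):
        return None
    if "train" in stem:
        mode = "train"
    elif "eval" in stem:
        mode = "eval"
    else:
        return None
    # Optional task prefix such as 'task0_' (assumed format).
    task_match = re.match(r"task(\d+)_", stem)
    task = int(task_match.group(1)) if task_match else None
    return {"task": task, "mode": mode}
```

For example, `describe_script("task0_train.py")` reports task 0 in train mode, while a file such as `main.py` is not recognised as either kind.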

Schedule

Name	Related materials	Used Envs
On-Policy First-Visit MC		Racetrack
Q-Learning		CliffWalking-v0
Sarsa		Racetrack
DQN	DQN-paper, Nature DQN Paper	CartPole-v0
DQN-cnn	DQN-paper	CartPole-v0
DoubleDQN		CartPole-v0
Hierarchical DQN	Hierarchical DQN	CartPole-v0
PolicyGradient		CartPole-v0
A2C	A3C Paper	CartPole-v0
SAC	SAC Paper
PPO	PPO paper	CartPole-v0
DDPG	DDPG Paper	Pendulum-v0
TD3	TD3 Paper	HalfCheetah-v2

Refs

RL-Adventure-2

RL-Adventure
