
Policy Gradient

This implements REINFORCE, the most basic Policy Gradient method.
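To illustrate the core REINFORCE update, here is a minimal stdlib-only sketch. It is not the repo's PyTorch code. The toy Bernoulli bandit environment, the helper names `discounted_returns`, `softmax`, and `reinforce_bandit`, and all hyperparameters are hypothetical. The sketch samples an action from a softmax policy, computes the return, and takes a gradient-ascent step of `lr * G * grad(log pi(a))`.

```python
import math
import random


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1}, walking backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_probs, episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical Bernoulli multi-armed bandit.

    Each episode is a single step, so the return G_0 is just the reward.
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_probs)  # one logit per arm
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        g = discounted_returns([r])[0]
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[a] = 1[k == a] - pi_k
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * g * grad_log  # gradient ascent on expected return
    return softmax(theta)


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
print(reinforce_bandit([0.1, 0.9]))  # most probability mass ends up on arm 1
```

In multi-step episodes, `discounted_returns` provides a separate G_t for every time step, and each step's log-probability gradient is weighted by its own G_t.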

Theory

See my blog post "Policy Gradient算法实战" (Policy Gradient Algorithms in Practice).
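For quick reference, the Monte Carlo gradient estimate that REINFORCE ascends is commonly written as shown below. Here N is the number of sampled episodes, G_t is the discounted return from step t onward, and the parameters are updated as θ ← θ + α ∇_θ J(θ). This notation is an assumption, and the blog post may use a different but equivalent form.

```latex
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T_i - 1}
  \nabla_\theta \log \pi_\theta\!\left(a_t^{i} \mid s_t^{i}\right) G_t^{i},
\qquad
G_t^{i} = \sum_{k=t}^{T_i - 1} \gamma^{\,k-t}\, r_k^{i}
```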

Environment

python 3.7.9

pytorch 1.6.0

tensorboard 2.3.0

torchvision 0.7.0

How to Run

train:

python main.py 

eval:

python main.py --train 0 
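The commands above suggest that `main.py` trains by default and switches to evaluation when given `--train 0`. Below is a minimal sketch of how such a flag could be parsed with the standard-library `argparse`. The `parse_args` helper, the integer type, and the default of 1 are assumptions, and the actual `main.py` may define its options differently.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI: --train 1 (default) trains, --train 0 evaluates."""
    parser = argparse.ArgumentParser(description="REINFORCE (hypothetical CLI sketch)")
    parser.add_argument("--train", type=int, default=1, help="1 = train, 0 = eval")
    return parser.parse_args(argv)


args = parse_args(["--train", "0"])
mode = "train" if args.train else "eval"
print(mode)  # eval
```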

tensorboard

tensorboard --logdir logs 

References

REINFORCE and the Reparameterization Trick

Policy Gradient paper

REINFORCE