update PolicyGradient

This commit is contained in:
JohnJim0816
2020-11-27 18:34:04 +08:00
parent 9590e80a2b
commit abfe6ea62b
38 changed files with 210 additions and 22 deletions

View File

@@ -0,0 +1,42 @@
# Policy Gradient
实现的是Policy Gradient最基本的REINFORCE方法
## 原理讲解
参考我的博客[Policy Gradient算法实战](https://blog.csdn.net/JohnJim0/article/details/110236851)
## 环境
python 3.7.9
pytorch 1.6.0
tensorboard 2.3.0
torchvision 0.7.0
## 程序运行方法
train:
```python
python main.py
```
eval:
```python
python main.py --train 0
```
tensorboard
```python
tensorboard --logdir logs
```
## 参考
[REINFORCE和Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703)
[Policy Gradient paper](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
[REINFORCE](https://towardsdatascience.com/policy-gradient-methods-104c783251e0)