42 lines
778 B
Markdown
42 lines
778 B
Markdown
# Policy Gradient
|
||
实现的是Policy Gradient最基本的REINFORCE方法
|
||
## 原理讲解
|
||
|
||
参考我的博客[Policy Gradient算法实战](https://blog.csdn.net/JohnJim0/article/details/110236851)
|
||
|
||
## 环境
|
||
|
||
python 3.7.9
|
||
|
||
pytorch 1.6.0
|
||
|
||
tensorboard 2.3.0
|
||
|
||
torchvision 0.7.0
|
||
|
||
## 程序运行方法
|
||
|
||
train:
|
||
|
||
```python
|
||
python main.py
|
||
```
|
||
|
||
eval:
|
||
|
||
```python
|
||
python main.py --train 0
|
||
```
|
||
tensorboard:
|
||
```python
|
||
tensorboard --logdir logs
|
||
```
|
||
|
||
|
||
## 参考
|
||
|
||
[REINFORCE和Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703)
|
||
|
||
[Policy Gradient paper](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
|
||
|
||
[REINFORCE](https://towardsdatascience.com/policy-gradient-methods-104c783251e0) |