JohnJim0816
2021-03-23 16:10:11 +08:00
parent d4690c2058
commit bf0f2990cf
198 changed files with 1668 additions and 1545 deletions

@@ -1,38 +1,15 @@
# Policy Gradient
Implements REINFORCE, the most basic Policy Gradient method.
## Usage
Just run ```main.py```.
## How It Works
See my blog post [Policy Gradient Algorithm in Practice](https://blog.csdn.net/JohnJim0/article/details/110236851)
## Environment
python 3.7.9
pytorch 1.6.0
tensorboard 2.3.0
torchvision 0.7.0
python 3.7.9, pytorch 1.6.0
## How to Run
train:
```bash
python main.py
```
eval:
```bash
python main.py --train 0
```
tensorboard:
```bash
tensorboard --logdir logs
```
## References
[REINFORCE and the Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703)