# Policy Gradient
This repository implements REINFORCE, the most basic Policy Gradient method.
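
The core REINFORCE update can be illustrated without PyTorch. The following is a minimal sketch and not the repository's `main.py`. It assumes a tabular softmax policy, and every name in it (`discounted_returns`, `softmax`, `reinforce_update`, `sample_action`, `train_bandit`) is hypothetical. For each episode it computes the discounted returns G_t and moves the policy parameters along G_t times the gradient of log π(a_t | s_t).

```python
import math
import random


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of action preferences."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta, episode, gamma=0.99, lr=0.1):
    """One REINFORCE step for a tabular softmax policy (hypothetical helper).

    theta[s][a] holds action preferences; episode is a list of
    (state, action, reward) tuples collected with the current policy.
    For a softmax policy, d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s).
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    # Accumulate the gradient under the pre-update policy, then apply it once.
    grad = [[0.0] * len(row) for row in theta]
    for (s, a, _), g in zip(episode, returns):
        probs = softmax(theta[s])
        for b, p in enumerate(probs):
            grad[s][b] += g * ((1.0 if b == a else 0.0) - p)
    for s, row in enumerate(grad):
        for b, value in enumerate(row):
            theta[s][b] += lr * value  # gradient ascent on expected return
    return theta


def sample_action(probs, rng):
    """Draw an action index from a categorical distribution."""
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1


def train_bandit(episodes=300, lr=0.5, seed=0):
    """Toy check (hypothetical): one state, action 1 pays 1.0, action 0 pays 0.0."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]
    for _ in range(episodes):
        action = sample_action(softmax(theta[0]), rng)
        reward = 1.0 if action == 1 else 0.0
        reinforce_update(theta, [(0, action, reward)], lr=lr)
    return softmax(theta[0])


if __name__ == "__main__":
    print(train_bandit())  # probability mass should shift toward action 1
```

Running the file trains on a toy two-armed bandit, so the printed probability of action 1 should approach 1. A PyTorch version would typically express the same step as minimizing `-(log_prob * returns).sum()` and letting autograd compute the gradient of log π.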
## Usage
Simply run `main.py`.
## Theory
For an explanation of the algorithm, see my blog post [Policy Gradient Algorithm in Practice](https://blog.csdn.net/JohnJim0/article/details/110236851).
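
For reference, the gradient estimator behind REINFORCE can be written in the standard textbook form below. The notation is assumed here and is not taken from the linked post: N sampled episodes of length T from the policy π_θ, rewards r_k, discount factor γ, and step size α.

$$
\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{t=0}^{T-1} G_t^{(i)}\,\nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big),
\qquad
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_k,
\qquad
\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)
$$

The exact gradient of the discounted objective carries an extra γ^t weight on each term; many implementations drop it and use the form above.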
## Environment
Python 3.7.9

PyTorch 1.6.0

TensorBoard 2.3.0

torchvision 0.7.0
## Running the Program
Train:

```shell
python main.py
```

Evaluate:

```shell
python main.py --train 0
```

TensorBoard:

```shell
tensorboard --logdir logs
```
## References
[REINFORCE and the Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703)