update PolicyGradient
This commit is contained in:
42
codes/PolicyGradient/README.md
Normal file
42
codes/PolicyGradient/README.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# Policy Gradient
|
||||
实现的是Policy Gradient最基本的REINFORCE方法
|
||||
## 原理讲解
|
||||
|
||||
参考我的博客[Policy Gradient算法实战](https://blog.csdn.net/JohnJim0/article/details/110236851)
|
||||
|
||||
## 环境
|
||||
|
||||
python 3.7.9
|
||||
|
||||
pytorch 1.6.0
|
||||
|
||||
tensorboard 2.3.0
|
||||
|
||||
torchvision 0.7.0
|
||||
|
||||
## 程序运行方法
|
||||
|
||||
train:
|
||||
|
||||
```python
|
||||
python main.py
|
||||
```
|
||||
|
||||
eval:
|
||||
|
||||
```python
|
||||
python main.py --train 0
|
||||
```
|
||||
tensorboard:
|
||||
```python
|
||||
tensorboard --logdir logs
|
||||
```
|
||||
|
||||
|
||||
## 参考
|
||||
|
||||
[REINFORCE和Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703)
|
||||
|
||||
[Policy Gradient paper](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
|
||||
|
||||
[REINFORCE](https://towardsdatascience.com/policy-gradient-methods-104c783251e0)
|
||||
Reference in New Issue
Block a user