update codes

johnjim0816
2021-11-17 14:36:51 +08:00
parent 8e5090a653
commit 442e307b01
81 changed files with 976 additions and 401 deletions

@@ -8,12 +8,16 @@ Policy-based methods are, in reinforcement learning, the counterpart to Value-based (e.g. Q-learning)
Following the REINFORCE principle, the pseudocode is as follows:
<img src="assets/image-20211016004808604.png" alt="image-20211016004808604" style="zoom:50%;" />
https://pytorch.org/docs/stable/distributions.html
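The negative sign is there because the formula describes gradient ascent on the expected return, whereas a loss is normally minimized with stochastic gradient descent. Negating the objective makes the two consistent: descending on the negated objective is the same as ascending on the original one.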
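
To make the sign convention concrete, here is a minimal sketch in plain Python (no PyTorch) of REINFORCE on a hypothetical two-armed bandit. The function names, reward values, learning rate, and step count are all assumptions chosen for illustration and are not taken from the repository. The loss is defined as `-G * log pi(a)`, and the parameters are updated by ordinary gradient descent on that loss. In a PyTorch version the log-probability would typically come from `log_prob` on a `torch.distributions` object instead of the hand-written gradient used here.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(steps=2000, lr=0.1, seed=0):
    """Run REINFORCE on a two-armed bandit and return the final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]    # policy logits, one per action
    rewards = [0.0, 1.0]  # hypothetical rewards: action 1 is the better arm
    for _ in range(steps):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        G = rewards[action]  # return of this one-step episode
        # loss = -G * log pi(action)
        # For a softmax policy: d log pi(a) / d theta_k = 1[k == a] - probs[k]
        grad_loss = [
            -G * ((1.0 if k == action else 0.0) - probs[k])
            for k in range(len(theta))
        ]
        # Gradient DESCENT on the loss, which is gradient ASCENT on G * log pi(a)
        theta = [t - lr * g for t, g in zip(theta, grad_loss)]
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit())
```

Dropping the minus sign in `grad_loss` while keeping the descent update would push probability toward the worse arm, which is exactly the inconsistency the negation avoids.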
![img](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210428001336032.png)
## Implementation
## References
[REINFORCE and the Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703)