Merge branch 'master' of github.com:datawhalechina/easy-rl

qiwang067
2021-05-25 11:00:15 +08:00
2 changed files with 2 additions and 2 deletions


@@ -162,7 +162,7 @@ $$
 > The law of total expectation is also known as the law of iterated expectations (LIE). If $A_i$ is a finite or countable partition of the sample space, the law of total expectation can be written as:
 > $$
-> \mathrm{E}(X)=\sum_{i} \mathrm{E}\left(X \mid A_{i}\right) \mathrm{P}\left(A_{i}\right) \nonumber
+> \mathrm{E}(X)=\sum_{i} \mathrm{E}\left(X \mid A_{i}\right) \mathrm{P}\left(A_{i}\right)
 > $$
 **Proof:**
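
Not part of the commit; a quick Monte Carlo sanity check of the law of total expectation in this hunk. The two-event partition, the 0.3/0.7 weights, and the normal conditionals are made-up illustration values:

```python
# Check E[X] = sum_i E[X | A_i] P(A_i) on a simple mixture.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Partition the sample space into A_1 (P = 0.3) and A_2 (P = 0.7).
in_a1 = rng.random(n) < 0.3

# Conditional on A_1, X ~ Normal(2, 1); conditional on A_2, X ~ Normal(5, 1).
x = np.where(in_a1, rng.normal(2.0, 1.0, n), rng.normal(5.0, 1.0, n))

lhs = x.mean()                                        # E[X], estimated directly
rhs = 0.3 * x[in_a1].mean() + 0.7 * x[~in_a1].mean()  # sum_i E[X|A_i] P(A_i)
print(lhs, rhs)  # both converge to 0.3*2 + 0.7*5 = 4.1
```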


@@ -157,7 +157,7 @@ PPO has a predecessor called `trust region policy optimization (TRPO)`
 $$
 \begin{aligned}
 J_{T R P O}^{\theta^{\prime}}(\theta)=E_{\left(s_{t}, a_{t}\right) \sim \pi_{\theta^{\prime}}}\left[\frac{p_{\theta}\left(a_{t} | s_{t}\right)}{p_{\theta^{\prime}}\left(a_{t} | s_{t}\right)} A^{\theta^{\prime}}\left(s_{t}, a_{t}\right)\right] \\ \\
-\mathrm{KL}\left(\theta, \theta^{\prime}\right)<\delta
+<div align = right> \mathrm{KL}\left(\theta, \theta^{\prime}\right)<\delta </div>
 \end{aligned}
 $$
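
Not part of the commit; a minimal PyTorch sketch of the TRPO surrogate objective and KL trust-region test shown in this hunk. The function names, the sample-based KL estimator, and the toy numbers are assumptions for illustration, not the repo's code:

```python
import torch

def trpo_objective(logp_theta, logp_theta_old, advantage):
    # Importance-sampling ratio p_theta(a_t|s_t) / p_theta'(a_t|s_t),
    # computed in log space for numerical stability.
    ratio = torch.exp(logp_theta - logp_theta_old)
    # Surrogate J_TRPO^{theta'}(theta): mean of ratio * advantage over
    # (s_t, a_t) pairs sampled from the old policy pi_theta'.
    return (ratio * advantage).mean()

def kl_estimate(logp_theta_old, logp_theta):
    # Monte Carlo estimate of KL(theta', theta) from actions sampled
    # under the old policy: E[log p_theta' - log p_theta].
    return (logp_theta_old - logp_theta).mean()

# Toy usage with fabricated log-probabilities and advantages.
logp_old = torch.log(torch.tensor([0.5, 0.2, 0.8]))
logp_new = torch.log(torch.tensor([0.6, 0.25, 0.7]))
adv = torch.tensor([1.0, -0.5, 2.0])

surrogate = trpo_objective(logp_new, logp_old, adv)
kl = kl_estimate(logp_old, logp_new)
delta = 0.01  # trust-region radius: the constraint is KL(theta, theta') < delta
print(surrogate.item(), kl.item(), kl.item() < delta)
```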