From d25cb05358bc63124770ab9f88e9b26bfa84dde0 Mon Sep 17 00:00:00 2001
From: Yiyuan Yang
Date: Tue, 25 May 2021 10:44:59 +0800
Subject: [PATCH 1/2] Update chapter2.md

---
 docs/chapter2/chapter2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/chapter2/chapter2.md b/docs/chapter2/chapter2.md
index 33f640a..3ef1156 100644
--- a/docs/chapter2/chapter2.md
+++ b/docs/chapter2/chapter2.md
@@ -162,7 +162,7 @@
 $$
 > The law of total expectation is also known as the law of iterated expectations (LIE). If $A_i$ is a finite or countable partition of the sample space, the law of total expectation can be written as follows:
 > $$
-> \mathrm{E}(X)=\sum_{i} \mathrm{E}\left(X \mid A_{i}\right) \mathrm{P}\left(A_{i}\right) \nonumber
+> \mathrm{E}(X)=\sum_{i} \mathrm{E}\left(X \mid A_{i}\right) \mathrm{P}\left(A_{i}\right)
 > $$
 
 **Proof:**

From cb4313d702bbc542059f25394abb696f1ff3ccd6 Mon Sep 17 00:00:00 2001
From: Yiyuan Yang
Date: Tue, 25 May 2021 10:57:43 +0800
Subject: [PATCH 2/2] Update chapter5.md

---
 docs/chapter5/chapter5.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/chapter5/chapter5.md b/docs/chapter5/chapter5.md
index b323bf6..71a171b 100644
--- a/docs/chapter5/chapter5.md
+++ b/docs/chapter5/chapter5.md
@@ -157,7 +157,7 @@ PPO has a predecessor called `Trust Region Policy Optimization
 $$
 \begin{aligned}
 J_{T R P O}^{\theta^{\prime}}(\theta)=E_{\left(s_{t}, a_{t}\right) \sim \pi_{\theta^{\prime}}}\left[\frac{p_{\theta}\left(a_{t} | s_{t}\right)}{p_{\theta^{\prime}}\left(a_{t} | s_{t}\right)} A^{\theta^{\prime}}\left(s_{t}, a_{t}\right)\right] \\
 \\
-\mathrm{KL}\left(\theta, \theta^{\prime}\right)<\delta
+\mathrm{KL}\left(\theta, \theta^{\prime}\right)<\delta
 \end{aligned}
 $$
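A quick numerical check makes the formula touched by patch 1/2 concrete. The sketch below is not part of either patched chapter; the die-roll setup and all variable names are assumptions chosen for illustration. It verifies the law of total expectation, $\mathrm{E}(X)=\sum_i \mathrm{E}(X \mid A_i)\,\mathrm{P}(A_i)$, using the even/odd partition of a fair die's outcomes.

```python
# Illustrative check of the law of total expectation (not from the patch):
# E(X) = sum_i E(X | A_i) P(A_i) for a partition {A_i} of the sample space.
# The die example and all names below are assumptions made for this sketch.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # X = fair six-sided die roll

# Partition the sample space: A_1 = {X even}, A_2 = {X odd}.
even = rolls % 2 == 0
p_even, p_odd = even.mean(), (~even).mean()

lhs = rolls.mean()                                              # E(X)
rhs = rolls[even].mean() * p_even + rolls[~even].mean() * p_odd # sum_i E(X|A_i) P(A_i)
print(lhs, rhs)  # both approximately 3.5
```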
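Similarly, for the TRPO objective that patch 2/2 touches, here is a minimal numerical sketch of the surrogate objective $J_{TRPO}^{\theta^{\prime}}(\theta)$ and the constraint $\mathrm{KL}(\theta, \theta^{\prime})<\delta$. The single-state, discrete-action setup and every name and number below are assumptions for this example, not code from the chapter.

```python
# Illustrative sketch of the TRPO surrogate objective and KL constraint
# (not part of the patched chapter). Toy single-state, discrete-action
# setup; all names and numbers are assumptions made for this example.
import numpy as np

p_theta_old = np.array([0.2, 0.5, 0.3])   # pi_{theta'}(a|s): behavior policy
p_theta     = np.array([0.25, 0.45, 0.3]) # pi_theta(a|s): policy being optimized
advantages  = np.array([1.0, -0.5, 0.3])  # A^{theta'}(s, a) for each action
delta = 0.01                              # trust-region radius

# J^{theta'}_{TRPO}(theta): expectation under theta' of the importance
# ratio p_theta / p_theta' times the advantage.
ratio = p_theta / p_theta_old
surrogate = np.sum(p_theta_old * ratio * advantages)

# KL(theta, theta'): KL divergence between old and new action distributions.
kl = np.sum(p_theta_old * np.log(p_theta_old / p_theta))

print(f"surrogate objective: {surrogate:.4f}")       # 0.1150
print(f"KL = {kl:.5f}, within trust region: {kl < delta}")
```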