From 14a00afcc0c3fe6f5387b7950da74368bf9684bc Mon Sep 17 00:00:00 2001
From: Yiyuan Yang <yyy1997sjz@gmail.com>
Date: Mon, 24 May 2021 09:41:07 +0800
Subject: [PATCH] Update chapter4_questions&keywords.md

---
 docs/chapter4/chapter4_questions&keywords.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
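
To check the last line of the derivation in the hunk below, the sample-based estimate can be written out in NumPy. This is a minimal sketch, assuming a tabular softmax policy whose logits are stored in `theta[s, a]` and trajectories given as lists of `(state, action, reward)` tuples. The helper names `grad_log_pi` and `policy_gradient_estimate` are hypothetical and do not come from the patched file.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(logits - logits.max())
    return e / e.sum()


def grad_log_pi(theta, s, a):
    """Gradient of log pi_theta(a|s) w.r.t. theta (assumed tabular softmax policy)."""
    grad = np.zeros_like(theta)
    grad[s] = -softmax(theta[s])  # -pi(.|s) for every action in state s
    grad[s, a] += 1.0             # +1 for the action actually taken
    return grad


def policy_gradient_estimate(theta, trajectories):
    """(1/N) * sum_i [(sum_t grad log pi(a_it|s_it)) * (sum_t r_it)]."""
    total = np.zeros_like(theta)
    for traj in trajectories:
        score = sum(grad_log_pi(theta, s, a) for s, a, _ in traj)
        ret = sum(r for _, _, r in traj)  # total return r(tau) of this trajectory
        total += score * ret
    return total / len(trajectories)
```

Both inner sums run over the time steps of a single trajectory, and only the outer average runs over the `N` sampled trajectories. That is why the upper limit of the reward sum has to be `T`, not `N`.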

diff --git a/docs/chapter4/chapter4_questions&keywords.md b/docs/chapter4/chapter4_questions&keywords.md
index cba8d29..29c4a19 100644
--- a/docs/chapter4/chapter4_questions&keywords.md
+++ b/docs/chapter4/chapter4_questions&keywords.md
@@ -101,9 +101,9 @@
   $$
   Substituting this into the third expression, it simplifies to:
   $$\begin{aligned}
-  \nabla_{\theta}J(\theta) = E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)] \\
-  &= E_{\tau \sim p_{\theta}}[(\nabla_{\theta}log\pi_{\theta}(a_t|s_t))(\sum_{t=1}^Tr(s_t,a_t))] \\ 
-  &= \frac{1}{N}\sum_{i=1}^N[(\sum_{t=1}^T\nabla_{\theta}log \pi_{\theta}(a_{i,t}|s_{i,t}))(\sum_{t=1}^Nr(s_{i,t},a_{i,t}))]
+  \nabla_{\theta}J(\theta) &= E_{\tau \sim p_{\theta}(\tau)}[\nabla_{\theta}\log p_{\theta}(\tau)r(\tau)] \\
+  &= E_{\tau \sim p_{\theta}(\tau)}[(\sum_{t=1}^T\nabla_{\theta}\log\pi_{\theta}(a_t|s_t))(\sum_{t=1}^Tr(s_t,a_t))] \\
+  &\approx \frac{1}{N}\sum_{i=1}^N[(\sum_{t=1}^T\nabla_{\theta}\log \pi_{\theta}(a_{i,t}|s_{i,t}))(\sum_{t=1}^Tr(s_{i,t},a_{i,t}))]
   \end{aligned}$$
   
 - Aloof interviewer: Could you describe some of the tricks you know for optimizing policy-gradient methods?