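diff --git a/docs/chapter4/chapter4.md b/docs/chapter4/chapter4.md
index 4550df0..c459be8 100644
--- a/docs/chapter4/chapter4.md
+++ b/docs/chapter4/chapter4.md
@@ -302,7 +302,7 @@ $$
 $$
 (s_1,a_1,G_1),(s_2,a_2,G_2),\cdots,(s_T,a_T,G_T)
 $$
-Then we compute the gradient $\nabla \ln \pi(a_t|s_t,\theta)$ for each action. In code, we first take the output of the neural network, which gives a probability for each action (for example 0.2, 0.5, 0.3). We also have the action $a_t$ that was actually taken; converting it to a one-hot vector (for example [0,1,0]) and multiplying it element-wise with $\log [0.2,0.5,0.3]$, then summing, gives $\ln \pi(a_t|s_t,\theta)$.
+Then we compute the gradient $\nabla \log \pi(a_t|s_t,\theta)$ for each action. In code, we first take the output of the neural network, which gives a probability for each action (for example 0.2, 0.5, 0.3). We also have the action $a_t$ that was actually taken; converting it to a one-hot vector (for example [0,1,0]) and multiplying it element-wise with $\log [0.2,0.5,0.3]$, then summing, gives $\log \pi(a_t|s_t,\theta)$.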
@@ -353,7 +353,7 @@ $$
 Figure 4.18 Policy gradient loss
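-As shown in Figure 4.19, when we compute the policy gradient loss in practice, we first take the one-hot vector of the action that was actually executed, then take the action probabilities predicted by the neural network, and multiply the one-hot vector with the logarithm of those probabilities to obtain $\ln \pi(a_t|s_t,\theta)$; this is the loss we want to construct. Because we have the full trajectory of the episode, we can compute a loss for every action in that trajectory. We then sum all the losses and hand the total to the Adam optimizer, which updates the parameters automatically.
+As shown in Figure 4.19, when we compute the policy gradient loss in practice, we first take the one-hot vector of the action that was actually executed, then take the action probabilities predicted by the neural network, and multiply the one-hot vector with the logarithm of those probabilities to obtain $\log \pi(a_t|s_t,\theta)$; this is the loss we want to construct. Because we have the full trajectory of the episode, we can compute a loss for every action in that trajectory. We then sum all the losses and hand the total to the Adam optimizer, which updates the parameters automatically.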
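
Reviewer note: the computation described in the two chapter4.md hunks can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the chapter's own code. The helper names `one_hot`, `log_prob_of_action`, and `policy_gradient_loss` are hypothetical, and weighting each term by the return $G_t$ with a negative sign is an assumption based on the usual REINFORCE form and the $(s_t,a_t,G_t)$ tuples in the text. The probabilities are assumed to be strictly positive, as they would be after a softmax.

```python
import math

def one_hot(action, num_actions):
    """Return a one-hot list with 1.0 at the index of the chosen action."""
    return [1.0 if i == action else 0.0 for i in range(num_actions)]

def log_prob_of_action(probs, action):
    """Pick out log pi(a_t | s_t) by masking log-probabilities with a one-hot vector.

    Assumes every entry of `probs` is strictly positive (e.g. a softmax output).
    """
    mask = one_hot(action, len(probs))
    return sum(m * math.log(p) for m, p in zip(mask, probs))

def policy_gradient_loss(probs_per_step, actions, returns):
    """Sum the per-step losses over one episode.

    Assumption: each step contributes -G_t * log pi(a_t | s_t), so that
    minimizing the total increases the probability of high-return actions.
    """
    total = 0.0
    for probs, action, g in zip(probs_per_step, actions, returns):
        total += -g * log_prob_of_action(probs, action)
    return total

# The example from the text: probabilities (0.2, 0.5, 0.3), action index 1.
single_log_prob = log_prob_of_action([0.2, 0.5, 0.3], 1)

# A made-up two-step episode, purely for illustration.
episode_loss = policy_gradient_loss(
    probs_per_step=[[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]],
    actions=[1, 2],
    returns=[1.0, 2.0],
)
```

In a real training loop the same quantity would be built from tensors in an automatic-differentiation framework, so that an optimizer such as Adam can update the policy parameters from its gradient. The plain-Python version above only shows the arithmetic.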
diff --git a/docs/img/ch2/图片 [Auto-saved].pptx b/docs/img/ch2/图片 [Auto-saved].pptx
new file mode 100644
index 0000000..5e8717f
Binary files /dev/null and b/docs/img/ch2/图片 [Auto-saved].pptx differ
diff --git a/docs/img/ch3/3.19a.png b/docs/img/ch3/3.19a.png
deleted file mode 100644
index 24d0585..0000000
Binary files a/docs/img/ch3/3.19a.png and /dev/null differ
diff --git a/docs/img/ch3/3.19b.png b/docs/img/ch3/3.19b.png
deleted file mode 100644
index 96d50fb..0000000
Binary files a/docs/img/ch3/3.19b.png and /dev/null differ
diff --git a/docs/img/ch3/3.8a.png b/docs/img/ch3/3.8a.png
deleted file mode 100644
index 9eea2ba..0000000
Binary files a/docs/img/ch3/3.8a.png and /dev/null differ
diff --git a/docs/img/ch3/3.8b.png b/docs/img/ch3/3.8b.png
deleted file mode 100644
index 62cd27a..0000000
Binary files a/docs/img/ch3/3.8b.png and /dev/null differ
diff --git a/docs/img/ch3/3.8c.png b/docs/img/ch3/3.8c.png
deleted file mode 100644
index 0991ab9..0000000
Binary files a/docs/img/ch3/3.8c.png and /dev/null differ
diff --git a/docs/img/ch3/model_free_control_5.png b/docs/img/ch3/model_free_control_5.png
deleted file mode 100644
index 7cacf6c..0000000
Binary files a/docs/img/ch3/model_free_control_5.png and /dev/null differ
diff --git a/docs/img/ch3/model_free_control_6.png b/docs/img/ch3/model_free_control_6.png
deleted file mode 100644
index 97ff496..0000000
Binary files a/docs/img/ch3/model_free_control_6.png and /dev/null differ
diff --git a/docs/img/ch3/model_free_control_9.png b/docs/img/ch3/model_free_control_9.png
deleted file mode 100644
index a6d415d..0000000
Binary files a/docs/img/ch3/model_free_control_9.png and /dev/null differ
diff --git a/docs/img/ch4/4.22.png b/docs/img/ch4/4.22.png
index 450b44a..5039171 100644
Binary files a/docs/img/ch4/4.22.png and b/docs/img/ch4/4.22.png differ