qiwang067
2021-08-08 14:52:27 +08:00
parent e0918f2c58
commit 719042c9f5

@@ -167,13 +167,18 @@ $$
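**Proof:**
-For brevity and readability of notation, we drop the time subscripts and let $s=s_t$, $g'=G_{t+1}$, $s'=s_{t+1}$. By convention, we can rewrite the expectation of this return as:
+For brevity and readability of notation, we drop the time subscripts and let $s=s_t$, $g'=G_{t+1}$, $s'=s_{t+1}$. Using the definition of conditional expectation, we can rewrite the expectation of this return as: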
$$
\begin{aligned}
\mathbb{E}\left[G_{t+1} \mid s_{t+1}\right] &=\mathbb{E}\left[g^{\prime} \mid s^{\prime}\right] \\
&=\sum_{g^{\prime}} g^{\prime}~p\left(g^{\prime} \mid s^{\prime}\right)
\end{aligned}
$$
+> If $X$ and $Y$ are both discrete random variables, the conditional expectation $\mathbb{E}(X \mid Y=y)$ is defined as:
+> $$
+> \mathbb{E}(X \mid Y=y)=\sum_{x} x\, P(X=x \mid Y=y)
+> $$
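As a sanity check on this definition, here is a minimal Python sketch that evaluates $\mathbb{E}(X \mid Y=y)=\sum_{x} x\, P(X=x \mid Y=y)$ from a discrete joint pmf. The function name `conditional_expectation` and the toy distribution over $(g', s')$ are hypothetical, chosen only for illustration; `Fraction` is used so the arithmetic is exact.

```python
from fractions import Fraction


def conditional_expectation(joint, y):
    """Return E[X | Y = y] for discrete X and Y.

    `joint` maps each pair (x, y) to P(X = x, Y = y).
    """
    # Marginal P(Y = y), obtained by summing the joint pmf over x.
    p_y = sum(p for (_, y_k), p in joint.items() if y_k == y)
    if p_y == 0:
        raise ValueError(f"P(Y={y!r}) = 0, so E[X | Y={y!r}] is undefined")
    # E[X | Y = y] = sum_x x * P(X = x | Y = y),
    # where P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).
    return sum(x * (p / p_y) for (x, y_k), p in joint.items() if y_k == y)


# Hypothetical toy joint pmf over (g', s'), i.e. (G_{t+1}, s_{t+1}).
joint = {
    (1, "A"): Fraction(1, 4),
    (3, "A"): Fraction(1, 4),
    (0, "B"): Fraction(1, 4),
    (2, "B"): Fraction(1, 4),
}

for s_next in ("A", "B"):
    print(s_next, conditional_expectation(joint, s_next))
```

With this toy table the loop prints `A 2` and `B 1`, i.e. $\sum_{g'} g'\,p(g' \mid s')$ evaluated separately for each next state $s'$.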
Let $s_t=s$. Taking the expectation of the above expression, we obtain:
$$ $$
\begin{aligned} \begin{aligned}