Update chapter4_questions&keywords.md
This commit is contained in:
@@ -83,10 +83,11 @@
|
|||||||
\theta^* = \theta + \alpha\nabla J({\theta})
|
\theta^* = \theta + \alpha\nabla J({\theta})
|
||||||
$$
|
$$
|
||||||
所以我们仅仅需要计算(更新)$\nabla J({\theta})$ ,也就是计算回报函数 $J({\theta})$ 关于 $\theta$ 的梯度,也就是策略梯度,计算方法如下:
|
所以我们仅仅需要计算(更新)$\nabla J({\theta})$ ,也就是计算回报函数 $J({\theta})$ 关于 $\theta$ 的梯度,也就是策略梯度,计算方法如下:
|
||||||
$$
|
$$\begin{aligned}
|
||||||
\nabla_{\theta}J(\theta) = \int {\nabla}_{\theta}p_{\theta}(\tau)r(\tau)d_{\tau}
|
\nabla_{\theta}J(\theta) &= \int {\nabla}_{\theta}p_{\theta}(\tau)r(\tau)d_{\tau} \\
|
||||||
=\int p_{\theta}{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)d_{\tau}=E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)]
|
&= \int p_{\theta}{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)d_{\tau} \\
|
||||||
$$
|
&= E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)]
|
||||||
|
\end{aligned}$$
|
||||||
接着我们继续讲上式展开,对于 $p_{\theta}(\tau)$ ,即 $p_{\theta}(\tau|{\theta})$ :
|
接着我们继续讲上式展开,对于 $p_{\theta}(\tau)$ ,即 $p_{\theta}(\tau|{\theta})$ :
|
||||||
$$
|
$$
|
||||||
p_{\theta}(\tau|{\theta}) = p(s_1)\prod_{t=1}^T \pi_{\theta}(a_t|s_t)p(s_{t+1}|s_t,a_t)
|
p_{\theta}(\tau|{\theta}) = p(s_1)\prod_{t=1}^T \pi_{\theta}(a_t|s_t)p(s_{t+1}|s_t,a_t)
|
||||||
|
|||||||
Reference in New Issue
Block a user