Update chapter4_questions&keywords.md
This commit is contained in:
@@ -83,10 +83,11 @@
|
||||
\theta^* = \theta + \alpha\nabla J({\theta})
|
||||
$$
|
||||
所以我们仅仅需要计算(更新)$\nabla J({\theta})$ ,也就是计算回报函数 $J({\theta})$ 关于 $\theta$ 的梯度,也就是策略梯度,计算方法如下:
|
||||
$$
|
||||
\nabla_{\theta}J(\theta) = \int {\nabla}_{\theta}p_{\theta}(\tau)r(\tau)d_{\tau}
|
||||
=\int p_{\theta}{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)d_{\tau}=E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)]
|
||||
$$
|
||||
$$\begin{aligned}
|
||||
\nabla_{\theta}J(\theta) &= \int {\nabla}_{\theta}p_{\theta}(\tau)r(\tau)d_{\tau} \\
|
||||
&= \int p_{\theta}{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)d_{\tau} \\
|
||||
&= E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)]
|
||||
\end{aligned}$$
|
||||
接着我们继续讲上式展开,对于 $p_{\theta}(\tau)$ ,即 $p_{\theta}(\tau|{\theta})$ :
|
||||
$$
|
||||
p_{\theta}(\tau|{\theta}) = p(s_1)\prod_{t=1}^T \pi_{\theta}(a_t|s_t)p(s_{t+1}|s_t,a_t)
|
||||
|
||||
Reference in New Issue
Block a user