update
@@ -193,14 +193,15 @@ $$

When choosing its current action, the agent relies on the history it has accumulated so far, so the state of the whole game can be viewed as a function of this history:

$$
S_{t}=f\left(H_{t}\right)
$$

<div align=center>
<img width="550" src="../img/ch1/1.21.png"/>
</div>
<div align=center>Figure 1.13 Playing the Pong game</div>

$$
S_{t}=f\left(H_{t}\right)
$$
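
To make the relation $S_{t}=f\left(H_{t}\right)$ concrete, here is a minimal Python sketch. The text does not say what $f$ is, so everything here is an assumption for illustration: the history is modeled as a list of `(observation, action, reward)` tuples, and the hypothetical `state_from_history` simply keeps the observations from the last `k` steps. Other choices of $f$ are equally valid.

```python
from typing import List, Tuple

# One step of history: (observation, action, reward).
# This layout is an assumption made for illustration only.
Step = Tuple[float, int, float]


def state_from_history(history: List[Step], k: int = 4) -> Tuple[float, ...]:
    """One hypothetical choice of f: the observations of the last k steps.

    If the history is shorter than k, the state is left-padded with zeros
    so that it always has exactly k entries.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    recent = [obs for obs, _action, _reward in history[-k:]]
    padding = [0.0] * (k - len(recent))
    return tuple(padding + recent)


# H_t after three steps of interaction (values are made up).
history = [(0.1, 0, 0.0), (0.2, 1, 0.0), (0.3, 1, 1.0)]

# S_t = f(H_t): here the state only looks at the two most recent observations.
state = state_from_history(history, k=2)
```

In this sketch the state discards actions and rewards and depends only on recent observations. That is a design choice, not a requirement: any function of the full history would fit the definition.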

Q: What is the relationship between state and observation?
