update ch1

2022-10-23 20:12:01 +08:00
parent 0647bdf6c7
commit f02a0b9da7
1 changed file with 1 addition and 1 deletion
@@ -188,7 +188,7 @@ $$
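 During its interaction with the environment, the agent receives many observations. For each observation, the agent takes an action and receives a reward. The history is therefore a sequence of observations, actions, and rewards: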
 $$
-H_{t}=o_{1}, r_{1}, a_{1}, \ldots, o_{t}, a_{t}, r_{t}
+H_{t}=o_{1},  a_{1}, r_{1}, \ldots, o_{t}, a_{t}, r_{t}
 $$
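 When choosing its current action, the agent relies on the history it has accumulated so far, so we can regard the state of the whole game as a function of this history: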