From f02a0b9da76de59ae9a2336b6565b4da17822fd0 Mon Sep 17 00:00:00 2001
From: qiwang067
Date: Sun, 23 Oct 2022 20:12:01 +0800
Subject: [PATCH] update ch1

---
 docs/chapter1/chapter1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/chapter1/chapter1.md b/docs/chapter1/chapter1.md
index b21b264..c6184a5 100644
--- a/docs/chapter1/chapter1.md
+++ b/docs/chapter1/chapter1.md
@@ -188,7 +188,7 @@
 In the course of interacting with the environment, the agent obtains many observations. For each observation, the agent takes an action and receives a reward. The history is therefore the sequence of observations, actions, and rewards:
 
 $$
-H_{t}=o_{1}, r_{1}, a_{1}, \ldots, o_{t}, a_{t}, r_{t}
+H_{t}=o_{1}, a_{1}, r_{1}, \ldots, o_{t}, a_{t}, r_{t}
 $$
 
 When taking the current action, the agent depends on the history it has obtained so far, so we can regard the state of the whole game as a function of this history:
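
As a reference for the corrected equation, here is a minimal Python sketch (not part of the patch) of storing the history $H_{t}$ as a sequence of $(o_t, a_t, r_t)$ triples, in the order the patch fixes, and of defining a state as a function of that history, $s_t = f(H_t)$. The names `Step` and `state_from_history` are illustrative assumptions, not code from the book.

```python
from typing import Any, List, Tuple

# One interaction step: the agent sees observation o_t, takes action a_t,
# and receives reward r_t -- the ordering the patch corrects.
Step = Tuple[Any, Any, float]  # (o_t, a_t, r_t)

def state_from_history(history: List[Step]) -> Any:
    """Illustrative choice of f(H_t): return the latest observation.

    This corresponds to a fully observable setting; any other function
    of the history could be used instead.
    """
    if not history:
        return None
    last_obs, _, _ = history[-1]
    return last_obs

# Building H_t = o_1, a_1, r_1, ..., o_t, a_t, r_t one step at a time.
history: List[Step] = []
history.append(("o1", "a1", 1.0))
history.append(("o2", "a2", 0.0))
print(state_from_history(history))  # -> "o2"
```

Using the latest observation is only one possible $f(H_t)$; a summary such as a recurrent encoding of the full history would serve equally well.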