From 838088be41837a8e3642b0d1cb9fce39fe125ef9 Mon Sep 17 00:00:00 2001
From: qiwang067 <qiwang067@163.com>
Date: Tue, 13 Oct 2020 22:38:28 +0800
Subject: [PATCH] fix ch4 typos

---
 docs/chapter4/chapter4.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/chapter4/chapter4.md b/docs/chapter4/chapter4.md
index d3c7fe7..000cc66 100644
--- a/docs/chapter4/chapter4.md
+++ b/docs/chapter4/chapter4.md
@@ -235,7 +235,7 @@ $$
 
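 The advantage function measures how much better it is to take a particular action $a_t$ in a particular state $s_t$ than to take any of the other possible actions. What matters is not how good the action is in absolute terms but how good it is in relative terms, i.e. the `relative advantage`. Because a baseline $b$ is subtracted, the quantity is a relative measure rather than an absolute one. $A^{\theta}\left(s_{t}, a_{t}\right)$ is usually estimated by a network, and this network is called the critic. 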
 
-## REINFORCE
+## REINFORCE: Monte Carlo Policy Gradient
 
 ![](img/4.20.png)