update ch11

4 changed files with 42 additions and 38 deletions
@@ -1,4 +1,4 @@
-[![GitHub issues](https://img.shields.io/github/issues/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/issues) [![GitHub stars](https://img.shields.io/github/stars/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/stargazers) [![GitHub forks](https://img.shields.io/github/forks/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/network) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fdatawhalechina%2Feasy-rl%2F&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false)](https://hits.seeyoufarm.com) ![Downloads](https://img.shields.io/github/downloads/datawhalechina/easy-rl/total)
+[![GitHub issues](https://img.shields.io/github/issues/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/issues) [![GitHub stars](https://img.shields.io/github/stars/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/stargazers) [![GitHub forks](https://img.shields.io/github/forks/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/network) ![Downloads](https://img.shields.io/github/downloads/datawhalechina/easy-rl/total)
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://img.shields.io/badge/license-CC%20BY--NC--SA%204.0-lightgrey" /></a>
@@ -8,6 +8,31 @@
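 This tutorial is also known as the "Mushroom Book". The name expresses the hope that the book will energize readers: after "eating" this mushroom, they can explore reinforcement learning with keen interest, grow stronger like Mario, and go on to find unexpected rewards in the field of artificial intelligence.
 ## Contributors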
 <table border="0">
  <tbody>
    <tr align="center" >
      <td>
         <a href="https://github.com/qiwang067"><img width="70" height="70" src="https://github.com/qiwang067.png?s=40" alt="pic"></a><br>
         <a href="https://github.com/qiwang067">Qi Wang</a> 
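        <p>Tutorial design (Chapters 1-12)<br> PhD student, Shanghai Jiao Tong University<br> Master's degree, University of Chinese Academy of Sciences</p>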
      </td>
      <td>
         <a href="https://github.com/yyysjz1997"><img width="70" height="70" src="https://github.com/yyysjz1997.png?s=40" alt="pic"></a><br>
         <a href="https://github.com/yyysjz1997">Yiyuan Yang</a> 
        <p>Exercise design & Chapter 13 <br> PhD student, University of Oxford<br> Master's degree, Tsinghua University</p>
      </td>
      <td>
         <a href="https://github.com/JohnJim0816"><img width="70" height="70" src="https://github.com/JohnJim0816.png?s=40" alt="pic"></a><br>
         <a href="https://github.com/JohnJim0816">John Jim</a>
         <p>Algorithm practice<br> Master's degree, Peking University</p>
      </td>
    </tr>
  </tbody>
 </table>
 ## Usage
 * Chapters 4 through 11 cover [Hung-yi Lee's "Deep Reinforcement Learning"](http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLDS18.html) course;
@@ -39,7 +64,8 @@
 Douban rating: https://book.douban.com/subject/35781275/
-ℹ️ **Errata**: https://datawhalechina.github.io/easy-rl/#/errata
+> [!IMPORTANT]
 **Errata**: https://datawhalechina.github.io/easy-rl/#/errata
 ## Read online (content updated in real time)
@@ -49,7 +75,7 @@
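 Link: https://github.com/datawhalechina/easy-rl/releases
-Mirror for mainland China (recommended for readers in China): link: https://pan.baidu.com/s/1isqQnpVRWbb3yh83Vs0kbw extraction code: us6a
+Mirror for mainland China: link: https://pan.baidu.com/s/1isqQnpVRWbb3yh83Vs0kbw extraction code: us6a
 Compressed version (recommended for readers with slow connections; smaller file, lower image resolution): link: https://pan.baidu.com/s/1mUECyMKDZp-z4-CGjFhdAw extraction code: tzds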
@@ -102,29 +128,6 @@ PDF版本是全书初稿，人民邮电出版社的编辑老师们对初稿进
 * [Mushroom Book open-source team-learning event](https://www.bilibili.com/video/BV1Ha41197Pg/?spm_id_from=333.999.0.0&vd_source=642fa389e9e78cff4881c038963ac312)
 * [Mushroom Book open-source learning and growth](https://www.bilibili.com/video/BV1xW4y1B72o/?spm_id_from=333.999.0.0&vd_source=642fa389e9e78cff4881c038963ac312)
 ## Contributors
 <table border="0">
  <tbody>
    <tr align="center" >
      <td>
         <a href="https://github.com/qiwang067"><img width="70" height="70" src="https://github.com/qiwang067.png?s=40" alt="pic"></a><br>
         <a href="https://github.com/qiwang067">Qi Wang</a> 
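        <p>Tutorial design (Chapters 1-12)<br> PhD student, Shanghai Jiao Tong University<br> Master's degree, University of Chinese Academy of Sciences</p>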
      </td>
      <td>
         <a href="https://github.com/yyysjz1997"><img width="70" height="70" src="https://github.com/yyysjz1997.png?s=40" alt="pic"></a><br>
         <a href="https://github.com/yyysjz1997">Yiyuan Yang</a> 
        <p>Exercise design & Chapter 13 <br> PhD student, University of Oxford<br> Master's degree, Tsinghua University</p>
      </td>
      <td>
         <a href="https://github.com/JohnJim0816"><img width="70" height="70" src="https://github.com/JohnJim0816.png?s=40" alt="pic"></a><br>
         <a href="https://github.com/JohnJim0816">John Jim</a>
         <p>Algorithm practice<br> Master's degree, Peking University</p>
      </td>
    </tr>
  </tbody>
 </table>
 ## Citation
@@ -168,7 +171,7 @@ url = {https://github.com/datawhalechina/easy-rl}
 [![Forkers repo roster for @datawhalechina/easy-rl](https://reporoster.com/forks/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/network/members)
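 ## Follow us
-Scan the QR code below to follow the official account Datawhale, then reply with the keyword "Easy-RL读者交流群" to join the Easy-RL reader discussion group
+Scan the QR code below to follow the official account Datawhale, then reply with the keyword "Easy-RL" to join the Easy-RL reader discussion group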
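 <div align=center><img src="https://raw.githubusercontent.com/datawhalechina/easy-rl/master/docs/res/qrcode.jpeg" width = "250" height = "270" alt="Datawhale is an open-source organization focused on AI. With the vision of “for the learner, growing together with learners”, it builds the most valuable open-source learning community for learners. Follow us to learn and grow together."></div>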
 ## LICENSE
@@ -33,7 +33,7 @@
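 Behavior cloning has another problem: the agent imitates the expert's behavior completely, whether or not that behavior makes sense. Even if an action is pointless, or is merely the expert's personal habit, the agent will still memorize it. If the agent really could memorize all of the expert's behavior, this might be acceptable: some of what the expert does is redundant, but an agent that reproduces the expert's behavior exactly would simply be as good as the expert while doing a few unnecessary things. The problem is that the agent is a network, and a network's capacity is limited. Even on its training data, a network's accuracy is often below 100\%; there are things it cannot learn. At that point, what should be learned and what should not becomes very important.
-For example, as shown in Figure 11.4, when learning Chinese, the teacher provides speech, behavior, and knowledge, but in fact only the speech is important and the knowledge is unimportant. Perhaps the agent can learn only one thing. If it learns only the speech, there is no problem. If it learns only the gestures, there is a problem. So it is very important for the agent to learn what needs to be imitated and what does not. Pure behavior cloning does not learn this, because the agent simply copies all of the expert's behavior; it does not know which behaviors are important and affect what comes next, and which are unimportant and have no effect on what comes next.
+For example, as shown in Figure 11.4, when learning Chinese, the teacher provides speech and gestures, but only the speech is important and the gestures are unimportant. Perhaps the agent can learn only one thing. If it learns only the speech, there is no problem. If it learns only the gestures, there is a problem. So it is very important for the agent to learn what needs to be imitated and what does not. Pure behavior cloning does not learn this, because the agent simply copies all of the expert's behavior; it does not know which behaviors are important and affect what comes next, and which are unimportant and have no effect on what comes next.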
 <div align=center>
 <img width="550" src="../img/ch11/11.5.png"/>
@@ -765,7 +765,7 @@ $$
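 The value iteration algorithm proceeds as follows.
-(1) Initialization: let $k=1$; for all states $s$, $V_0(s)=0$.
+(1) Initialization: let $k=0$; for all states $s$, $V_0(s)=0$.
 (2) For $k=1:H$ ($H$ is the number of iterations needed for $V(s)$ to converge)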
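The initialization and loop shown in this hunk can be sketched in Python as follows. This is a minimal sketch, not the book's implementation. The excerpt shows only step (1) and the header of step (2), so the update inside the loop is assumed here to be the standard Bellman optimality backup $V_k(s)=\max_a \left[R(s,a)+\gamma\sum_{s'}P(s'|s,a)V_{k-1}(s')\right]$. The names `value_iteration`, `P`, `R`, `gamma` and the toy two-state MDP are hypothetical and chosen only for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, H=100):
    """Tabular value iteration (illustrative sketch).

    P: array of shape (S, A, S), P[s, a, s2] = probability of moving s -> s2 under a.
    R: array of shape (S, A), expected immediate reward for taking a in s.
    H: number of sweeps, assumed large enough for V to converge.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)            # step (1): V_0(s) = 0 for every state s
    for k in range(1, H + 1):         # step (2): k = 1, ..., H
        Q = R + gamma * (P @ V)       # assumed backup: Q_k(s, a), shape (S, A)
        V = Q.max(axis=1)             # V_k(s) = max_a Q_k(s, a)
    # Greedy policy with respect to the final value estimate.
    policy = (R + gamma * (P @ V)).argmax(axis=1)
    return V, policy


# Hypothetical toy MDP: state 1 is absorbing with zero reward;
# in state 0, action 0 stays put (reward 0) and action 1 moves to state 1 (reward 1).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 1] = 1.0
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])

V, policy = value_iteration(P, R, gamma=0.9, H=50)
print(V, policy)
```

Starting the counter at $k=0$ with $V_0$ already defined matches the corrected line above: the first sweep of the loop then produces $V_1$ from $V_0$.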
@@ -3,6 +3,7 @@
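 **How to use the errata: first find the printing of your copy, then look that printing up in the list below. All errata listed after that printing apply to your copy; all errata before it were already corrected in that printing and in later ones. For readers' convenience, all revisions are listed here. Some revisions are intended to make the text easier to understand and do not indicate an error in the original.**
 ## 1st edition, 15th printing (2025.03)
 * Page 62, first paragraph, line 2: let $k=1$ → let $k=0$
 ## 1st edition, 14th printing (2025.02)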
Author	SHA1	Message	Date
qiwang	9c99315ed4	update ch11	2025-08-06 17:33:25 +08:00
qiwang	0cbe50a813	Merge branch 'master' of github.com:datawhalechina/easy-rl	2025-08-06 17:32:16 +08:00
qiwang	a3da4be5b3	update ch11	2025-08-06 17:32:00 +08:00
Qi Wang and GitHub	2debe15dc8	Update README.md	2025-06-19 10:45:46 +08:00
Qi Wang and GitHub	f958882ec1	Update README.md	2025-06-13 23:27:08 +08:00
Qi Wang and GitHub	dec972fc3a	Update README.md	2025-06-07 22:54:35 +08:00
qiwang	80de123f00	Merge branch 'master' of github.com:datawhalechina/easy-rl	2025-06-04 23:28:08 +08:00
qiwang	c685e5b276	update ch2 typo	2025-06-04 23:27:57 +08:00
Qi Wang and GitHub	9782af53f5	Update README.md	2025-05-13 13:57:27 +08:00
Qi Wang and GitHub	fc65fc9b45	Update README.md	2025-05-01 13:44:50 +08:00