[![GitHub issues](https://img.shields.io/github/issues/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/issues) [![GitHub stars](https://img.shields.io/github/stars/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/stargazers) [![GitHub forks](https://img.shields.io/github/forks/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/network) ![Downloads](https://img.shields.io/github/downloads/datawhalechina/easy-rl/total)
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://img.shields.io/badge/license-CC%20BY--NC--SA%204.0-lightgrey" /></a>
# EasyRL (the "Mushroom Book")
Hung-yi Lee's Deep Reinforcement Learning (《深度强化学习》) is one of the classic Chinese-language video courses on reinforcement learning. His humorous teaching style makes the otherwise dense theory approachable, and he illustrates it with many entertaining examples, frequently using Atari games to explain reinforcement learning algorithms. To make this tutorial more complete, we have also drawn on Bolei Zhou's Intro to Reinforcement Learning (《强化学习纲要》), Kejiao Li's A World Champion Takes You from Zero to Hands-On Reinforcement Learning (《世界冠军带你从零实践强化学习》), and several other classic reinforcement learning resources as supplementary material. For anyone who wants to get started with reinforcement learning through Chinese-language explanations, these courses come highly recommended.
This tutorial is also known as the "Mushroom Book" (蘑菇书). The name reflects our hope that the book will energize readers: after "eating" this mushroom, they can explore reinforcement learning with enthusiasm, grow ever stronger like Mario, and go on to find unexpected rewards in the field of artificial intelligence.
## Contributors
<table border="0">
<tbody>
<tr align="center" >
<td>
<a href="https://github.com/qiwang067"><img width="70" height="70" src="https://github.com/qiwang067.png?s=40" alt="pic"></a><br>
<a href="https://github.com/qiwang067">Qi Wang</a>
<p>Tutorial design (Chapters 1–12)<br> PhD student, Shanghai Jiao Tong University<br> Master's degree, University of Chinese Academy of Sciences</p>
</td>
<td>
<a href="https://github.com/yyysjz1997"><img width="70" height="70" src="https://github.com/yyysjz1997.png?s=40" alt="pic"></a><br>
<a href="https://github.com/yyysjz1997">Yiyuan Yang</a>
<p>Exercise design & Chapter 13 <br> PhD student, University of Oxford<br> Master's degree, Tsinghua University</p>
</td>
<td>
<a href="https://github.com/JohnJim0816"><img width="70" height="70" src="https://github.com/JohnJim0816.png?s=40" alt="pic"></a><br>
<a href="https://github.com/JohnJim0816">John Jim</a>
<p>Hands-on code<br> Master's degree, Peking University</p>
</td>
</tr>
</tbody>
</table>
## How to Use
* Chapters 4 to 11 are based on [Hung-yi Lee's Deep Reinforcement Learning](http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLDS18.html);
* Chapters 1 and 2 are compiled from [Intro to Reinforcement Learning (《强化学习纲要》)](https://github.com/zhoubolei/introRL);
* Chapters 3 and 12 are compiled from [A World Champion Takes You from Zero to Hands-On Reinforcement Learning (《世界冠军带你从零实践强化学习》)](https://aistudio.baidu.com/aistudio/education/group/info/1335).
## Print Edition
<img src="https://raw.githubusercontent.com/datawhalechina/easy-rl/master/docs/res/mogu.png" width="300">
Recommended purchase links: [JD.com](https://item.jd.com/13075567.html) | [Dangdang](http://product.dangdang.com/29374163.html)
<table border="0">
<tbody>
<tr align="center" >
<td>
<img width="120" height="120" src="https://raw.githubusercontent.com/datawhalechina/easy-rl/master/docs/res/qrcode_jingdong.jpg" alt="pic">
<br>
<p>Scan to buy on JD.com (recommended)</p>
</td>
<td>
<img width="120" height="120" src="https://raw.githubusercontent.com/datawhalechina/easy-rl/master/docs/res/qrcode_dangdang.png" alt="pic"><br>
<p>Scan to buy on Dangdang</p>
</td>
</tr>
</tbody>
</table>
Douban rating: https://book.douban.com/subject/35781275/
> [!IMPORTANT]
> **Errata**: https://datawhalechina.github.io/easy-rl/#/errata
## Read Online (continuously updated)
URL: https://datawhalechina.github.io/easy-rl/
## Download the Latest PDF
URL: https://github.com/datawhalechina/easy-rl/releases
Mirror for readers in mainland China: https://pan.baidu.com/s/1isqQnpVRWbb3yh83Vs0kbw (extraction code: us6a)
Compressed version (smaller file with lower-resolution images; recommended for readers with slow connections): https://pan.baidu.com/s/1mUECyMKDZp-z4-CGjFhdAw (extraction code: tzds)
## Differences Between the Print Edition and the PDF
The PDF is the book's first draft. The editors at Posts & Telecom Press revised that draft repeatedly, and the result is the print edition; we sincerely thank the editors for their diligence and rigor. A page from the proofread galleys is shown below.
<center class="half"><img src="https://raw.githubusercontent.com/datawhalechina/easy-rl/master/docs/res/yanggao.png" width="680"></center>
## Contents
| Chapter | Exercises | Related Projects | Companion Code |
| --- | --- | --- | --- |
| [Chapter 1: Fundamentals of Reinforcement Learning](https://datawhalechina.github.io/easy-rl/#/chapter1/chapter1) | [Chapter 1 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter1/chapter1_questions&keywords) | | |
| [Chapter 2: Markov Decision Processes (MDPs)](https://datawhalechina.github.io/easy-rl/#/chapter2/chapter2) | [Chapter 2 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter2/chapter2_questions&keywords) | | [Value Iteration](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/Value%20Iteration/value_iteration.ipynb) |
| [Chapter 3: Tabular Methods](https://datawhalechina.github.io/easy-rl/#/chapter3/chapter3) | [Chapter 3 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter3/chapter3_questions&keywords) | [Q-learning in Practice](https://datawhalechina.github.io/easy-rl/#/chapter3/project1) | [Q-learning](https://github.com/datawhalechina/easy-rl/tree/master/notebooks/Q-learning), [Sarsa](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/Sarsa.ipynb), [Monte Carlo](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/MonteCarlo.ipynb) |
| [Chapter 4: Policy Gradient](https://datawhalechina.github.io/easy-rl/#/chapter4/chapter4) | [Chapter 4 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter4/chapter4_questions&keywords) | | [Policy Gradient](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/PolicyGradient.ipynb) |
| [Chapter 5: Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/easy-rl/#/chapter5/chapter5) | [Chapter 5 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter5/chapter5_questions&keywords) | | [PPO](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/PPO.ipynb) |
| [Chapter 6: DQN (Basic Concepts)](https://datawhalechina.github.io/easy-rl/#/chapter6/chapter6) | [Chapter 6 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter6/chapter6_questions&keywords) | | [DQN](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/DQN.ipynb) |
| [Chapter 7: DQN (Advanced Tricks)](https://datawhalechina.github.io/easy-rl/#/chapter7/chapter7) | [Chapter 7 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter7/chapter7_questions&keywords) | [DQN in Practice](https://datawhalechina.github.io/easy-rl/#/chapter7/project2) | [Double DQN](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/DoubleDQN.ipynb), [Dueling DQN](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/DuelingDQN.ipynb), [PER DQN](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/PER_DQN.ipynb), [Noisy DQN](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/NoisyDQN.ipynb) |
| [Chapter 8: DQN for Continuous Actions](https://datawhalechina.github.io/easy-rl/#/chapter8/chapter8) | [Chapter 8 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter8/chapter8_questions&keywords) | | |
| [Chapter 9: Actor-Critic Methods](https://datawhalechina.github.io/easy-rl/#/chapter9/chapter9) | [Chapter 9 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter9/chapter9_questions&keywords) | | [A2C](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/A2C.ipynb) |
| [Chapter 10: Sparse Rewards](https://datawhalechina.github.io/easy-rl/#/chapter10/chapter10) | [Chapter 10 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter10/chapter10_questions&keywords) | | |
| [Chapter 11: Imitation Learning](https://datawhalechina.github.io/easy-rl/#/chapter11/chapter11) | [Chapter 11 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter11/chapter11_questions&keywords) | | |
| [Chapter 12: Deep Deterministic Policy Gradient (DDPG)](https://datawhalechina.github.io/easy-rl/#/chapter12/chapter12) | [Chapter 12 Exercises](https://datawhalechina.github.io/easy-rl/#/chapter12/chapter12_questions&keywords) | [DDPG in Practice](https://datawhalechina.github.io/easy-rl/#/chapter12/project3) | [DDPG](https://github.com/datawhalechina/easy-rl/blob/master/notebooks/DDPG.ipynb) |
| [Chapter 13: An AlphaStar Paper Walkthrough](https://datawhalechina.github.io/easy-rl/#/chapter13/chapter13) | | | |
| [ICLR'25 Oral: Reinforcement Learning for Playing Minecraft](https://datawhalechina.github.io/easy-rl/#/chapter14/ls-imagine) | | | |
| [Awesome Visual RL: a Curated List of Visual RL Papers](https://github.com/qiwang067/awesome-visual-rl) | | | |
## Hands-On Code
The hands-on part includes the book's companion code and the JoyRL code; a minimal illustrative sketch follows the list below.
* [Companion notebooks for the book](https://github.com/datawhalechina/easy-rl/tree/master/notebooks)
* [JoyRL (offline version)](https://github.com/johnjim0816/rl-tutorials/tree/master/joyrl)
* [JoyRL (online version)](https://github.com/datawhalechina/joyrl)
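To give a flavor of what these repositories implement, here is a minimal tabular Q-learning sketch (Q-learning is covered in Chapter 3 and has a companion notebook). The Gymnasium `FrozenLake-v1` environment and the hyperparameters below are illustrative assumptions, not the notebooks' exact setup.

```python
# Minimal tabular Q-learning sketch; environment and hyperparameters are
# illustrative assumptions, not the exact setup of the companion notebooks.
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: move Q(s, a) toward the bootstrapped TD target
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print("Greedy policy:", np.argmax(Q, axis=1).reshape(4, 4))
```

For full implementations of DQN, PPO, DDPG, and the other algorithms, see the companion notebooks and JoyRL repositories linked above.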
## Classic RL Paper Walkthroughs
[Click here](https://github.com/datawhalechina/easy-rl/tree/master/papers) or open the `papers` folder in the repository to browse the classic reinforcement learning paper walkthroughs.
## Additional Resources
- Readers interested in **reinforcement learning for playing Minecraft** can read [LS-Imagine](https://github.com/qiwang067/LS-Imagine)
- Readers interested in **visual reinforcement learning** can read [Awesome Visual RL](https://github.com/qiwang067/awesome-visual-rl)
- Readers interested in **deep learning** can read [Hung-yi Lee's Deep Learning Tutorial (LeeDL-Tutorial)](https://github.com/datawhalechina/leedl-tutorial)
## Related Videos
* [Publication of the Datawhale Reinforcement Learning Tutorial](https://www.bilibili.com/video/BV1rb4y1x7Zd/?spm_id_from=333.999.0.0&vd_source=642fa389e9e78cff4881c038963ac312)
* [Origins of the Mushroom Book and a Beginner's Guide to RL](https://www.bilibili.com/video/BV1HZ4y1v7eX/?spm_id_from=333.999.0.0&vd_source=642fa389e9e78cff4881c038963ac312)
* [Mushroom Book Open-Source Group Learning Program](https://www.bilibili.com/video/BV1Ha41197Pg/?spm_id_from=333.999.0.0&vd_source=642fa389e9e78cff4881c038963ac312)
* [Open-Source Learning and Growth with the Mushroom Book](https://www.bilibili.com/video/BV1xW4y1B72o/?spm_id_from=333.999.0.0&vd_source=642fa389e9e78cff4881c038963ac312)
## Citation
```
王琦, 杨毅远, 江季. Easy RL:强化学习教程. 人民邮电出版社, https://github.com/datawhalechina/easy-rl, 2022.
Qi Wang, Yiyuan Yang, Ji Jiang. Easy RL: Reinforcement Learning Tutorial. Posts & Telecom Press, https://github.com/datawhalechina/easy-rl, 2022.
```
```bibtex
@book{wang2022easyrl,
title = {Easy RL强化学习教程},
publisher = {人民邮电出版社},
year = {2022},
author = {王琦 and 杨毅远 and 江季},
address = {北京},
isbn = {9787115584700},
url = {https://github.com/datawhalechina/easy-rl}
}
```
```bibtex
@book{wang2022easyrl_en,
title = {Easy RL: Reinforcement Learning Tutorial},
publisher = {Posts & Telecom Press},
year = {2022},
author = {Qi Wang and Yiyuan Yang and Ji Jiang},
address = {Beijing},
isbn = {9787115584700},
url = {https://github.com/datawhalechina/easy-rl}
}
```
If you wish to republish content from this tutorial, please credit the source: [https://github.com/datawhalechina/easy-rl](https://github.com/datawhalechina/easy-rl).
## Acknowledgements
Special thanks to [@Sm1les](https://github.com/Sm1les) and [@LSGOMYP](https://github.com/LSGOMYP) for their help and support of this project.
We also deeply appreciate everyone's interest in Easy-RL.
[![Stargazers repo roster for @datawhalechina/easy-rl](https://reporoster.com/stars/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/stargazers)
[![Forkers repo roster for @datawhalechina/easy-rl](https://reporoster.com/forks/datawhalechina/easy-rl)](https://github.com/datawhalechina/easy-rl/network/members)
## Follow Us
Scan the QR code below to follow the Datawhale WeChat official account, then reply with the keyword "Easy-RL" to join the Easy-RL reader group.
<div align=center><img src="https://raw.githubusercontent.com/datawhalechina/easy-rl/master/docs/res/qrcode.jpeg" width = "250" height = "270" alt="Datawhale is an open-source organization focused on AI. With the vision of 'for the learner, growing together with learners', it builds an open-source learning community that delivers the greatest value to learners. Follow us to learn and grow together."></div>
## LICENSE
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://img.shields.io/badge/license-CC%20BY--NC--SA%204.0-lightgrey" /></a><br />本作品采用<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议</a>进行许可。
## Star History
[![Star History](https://api.star-history.com/svg?repos=datawhalechina/easy-rl)](https://star-history.com/#datawhalechina/easy-rl&Date)