- Contents
- [P1 Policy Gradient](chapter1/chapter1.md)
- [P2 Proximal Policy Optimization (PPO)](chapter2/chapter2.md)
- [P3 Q-Learning (Basic Concepts)](chapter3/chapter3.md)
- [P4 Q-Learning (Advanced Techniques)](chapter4/chapter4.md)
- [P5 Q-Learning (Continuous Actions)](chapter5/chapter5.md)
- [P6 Actor-Critic](chapter6/chapter6.md)
- [P7 Sparse Reward](chapter7/chapter7.md)
- [P8 Imitation Learning](chapter8/chapter8.md)