- Table of Contents
- [P1 Policy Gradient](chapter1/chapter1.md)
- [P2 Proximal Policy Optimization (PPO)](chapter2/chapter2.md)
- [P3 Q-learning (Basic Idea)](chapter3/chapter3.md)
- [P4 Q-learning (Advanced Tips)](chapter4/chapter4.md)
- [P5 Q-learning (Continuous Action)](chapter5/chapter5.md)
- [P6 Actor-Critic](chapter6/chapter6.md)
- [P7 Sparse Reward](chapter7/chapter7.md)
- [P8 Imitation Learning](chapter8/chapter8.md)