change readme
This commit is contained in:
@@ -11,7 +11,7 @@
 - [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
 - [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
 - [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续行动)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
 - [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)