diff --git a/README.md b/README.md
index 6348c95..e0950bd 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@
 - [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
 - [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
 - [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续行动)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
 - [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)
diff --git a/docs/README.md b/docs/README.md
index 434b3b1..7d242c2 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -11,7 +11,7 @@
 - [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
 - [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
 - [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续行动)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
 - [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)