change contents

qiwang067
2020-07-04 15:52:20 +08:00
parent 359828f6e1
commit 03dedd6cc1
3 changed files with 18 additions and 18 deletions


@@ -11,11 +11,11 @@
 ## 目录
 - [P1 策略梯度](https://datawhalechina.github.io/leedeeprl-notes/#/chapter1/chapter1)
-- [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
-- [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
-- [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
-- [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
+- [P2 近端策略优化 (PPO) 算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
+- [P3 Q 学习 (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
+- [P4 Q 学习 (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
+- [P5 Q 学习 (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P6 演员-评论员算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)


@@ -8,11 +8,11 @@
 ## 目录
 - [P1 策略梯度](https://datawhalechina.github.io/leedeeprl-notes/#/chapter1/chapter1)
-- [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
-- [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
-- [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
-- [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
+- [P2 近端策略优化 (PPO) 算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
+- [P3 Q 学习 (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
+- [P4 Q 学习 (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
+- [P5 Q 学习 (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P6 演员-评论员算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)


@@ -1,12 +1,12 @@
 - 目录
-- [P1 Policy Gradient](chapter1/chapter1.md)
-- [P2 Proximal Policy Optimization (PPO)](chapter2/chapter2.md)
-- [P3 Q-learning (Basic Idea)](chapter3/chapter3.md)
-- [P4 Q-learning (Advanced Tips)](chapter4/chapter4.md)
-- [P5 Q-learning (Continuous Action)](chapter5/chapter5.md)
-- [P6 Actor-Critic](chapter6/chapter6.md)
-- [P7 Sparse Reward](chapter7/chapter7.md)
-- [P8 Imitation Learning](chapter8/chapter8.md)
+- [P1 策略梯度](chapter1/chapter1.md)
+- [P2 近端策略优化 (PPO) 算法](chapter2/chapter2.md)
+- [P3 Q 学习 (基本概念)](chapter3/chapter3.md)
+- [P4 Q 学习 (进阶技巧)](chapter4/chapter4.md)
+- [P5 Q 学习 (连续动作)](chapter5/chapter5.md)
+- [P6 演员-评论员算法](chapter6/chapter6.md)
+- [P7 稀疏奖励](chapter7/chapter7.md)
+- [P8 模仿学习](chapter8/chapter8.md)