From 03dedd6cc11b5d93f623aeb1523d54e2f7bb253b Mon Sep 17 00:00:00 2001
From: qiwang067
Date: Sat, 4 Jul 2020 15:52:20 +0800
Subject: [PATCH] change contents

---
 README.md        | 10 +++++-----
 docs/README.md   | 10 +++++-----
 docs/_sidebar.md | 16 ++++++++--------
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index e0950bd..df983e1 100644
--- a/README.md
+++ b/README.md
@@ -11,11 +11,11 @@
 ## 目录

 - [P1 策略梯度](https://datawhalechina.github.io/leedeeprl-notes/#/chapter1/chapter1)
-- [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
-- [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
-- [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
-- [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
+- [P2 近端策略优化 (PPO) 算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
+- [P3 Q 学习 (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
+- [P4 Q 学习 (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
+- [P5 Q 学习 (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P6 演员-评论员算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)

diff --git a/docs/README.md b/docs/README.md
index 7d242c2..79bc674 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -8,11 +8,11 @@
 ## 目录

 - [P1 策略梯度](https://datawhalechina.github.io/leedeeprl-notes/#/chapter1/chapter1)
-- [P2 Proximal Policy Optimization (PPO)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
-- [P3 Q-learning (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
-- [P4 Q-learning (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
-- [P5 Q-learning (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
-- [P6 Actor-Critic](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
+- [P2 近端策略优化 (PPO) 算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter2/chapter2)
+- [P3 Q 学习 (基本概念)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter3/chapter3)
+- [P4 Q 学习 (进阶技巧)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4)
+- [P5 Q 学习 (连续动作)](https://datawhalechina.github.io/leedeeprl-notes/#/chapter5/chapter5)
+- [P6 演员-评论员算法](https://datawhalechina.github.io/leedeeprl-notes/#/chapter6/chapter6)
 - [P7 稀疏奖励](https://datawhalechina.github.io/leedeeprl-notes/#/chapter7/chapter7)
 - [P8 模仿学习](https://datawhalechina.github.io/leedeeprl-notes/#/chapter8/chapter8)

diff --git a/docs/_sidebar.md b/docs/_sidebar.md
index 6efb656..d595064 100755
--- a/docs/_sidebar.md
+++ b/docs/_sidebar.md
@@ -1,12 +1,12 @@
 - 目录
-  - [P1 Policy Gradient](chapter1/chapter1.md)
-  - [P2 Proximal Policy Optimization (PPO)](chapter2/chapter2.md)
-  - [P3 Q-learning (Basic Idea)](chapter3/chapter3.md)
-  - [P4 Q-learning (Advanced Tips)](chapter4/chapter4.md)
-  - [P5 Q-learning (Continuous Action)](chapter5/chapter5.md)
-  - [P6 Actor-Critic](chapter6/chapter6.md)
-  - [P7 Sparse Reward](chapter7/chapter7.md)
-  - [P8 Imitation Learning](chapter8/chapter8.md)
+  - [P1 策略梯度](chapter1/chapter1.md)
+  - [P2 近端策略优化 (PPO) 算法](chapter2/chapter2.md)
+  - [P3 Q 学习 (基本概念)](chapter3/chapter3.md)
+  - [P4 Q 学习 (进阶技巧)](chapter4/chapter4.md)
+  - [P5 Q 学习 (连续动作)](chapter5/chapter5.md)
+  - [P6 演员-评论员算法](chapter6/chapter6.md)
+  - [P7 稀疏奖励](chapter7/chapter7.md)
+  - [P8 模仿学习](chapter8/chapter8.md)