From 6bb5b04f58f9f036d3e8be970f4d671a1767969e Mon Sep 17 00:00:00 2001 From: Yiyuan Yang Date: Tue, 22 Nov 2022 18:50:16 +0800 Subject: [PATCH] Update readme.md --- papers/readme.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/papers/readme.md b/papers/readme.md index 0432429..3c7e58b 100644 --- a/papers/readme.md +++ b/papers/readme.md @@ -19,7 +19,7 @@ | Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Asynchronous%20Methods%20for%20Deep%20Reinforcement%20Learning.md) | https://arxiv.org/abs/1602.01783 | | | | Trust Region Policy Optimization (**TRPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Trust%20Region%20Policy%20Optimization.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Trust%20Region%20Policy%20Optimization.pdf)| https://arxiv.org/abs/1502.05477 | | | | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/High-Dimensional%20Continuous%20Control%20Using%20Generalized%20Advantage%20Estimation.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/High-Dimensional%20Continuous%20Control%20Using%20Generalised%20Advantage%20Estimation.pdf) | https://arxiv.org/abs/1506.02438 | | -| | Proximal Policy Optimization Algorithms (**PPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Proximal%20Policy%20Optimization%20Algorithms.md) | https://arxiv.org/abs/1707.06347 | | +| | Proximal Policy Optimization Algorithms (**PPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Proximal%20Policy%20Optimization%20Algorithms.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Proximal%20Policy%20Optimization%20Algorithms.pdf) | https://arxiv.org/abs/1707.06347 | | | | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | | | | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTP**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Scalable%20trust-region%20method%20for%20deep%20reinforcement%20learning%20using%20Kronecker-factored.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Scalable%20trust-region%20method%20for%20deep%20reinforcement%20learning%20using%20Kronecker-factored.pdf)| https://arxiv.org/abs/1708.05144 | | | | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |