| | Paper | Link | |
| --- | --- | --- | --- |
| | Trust Region Policy Optimization (**TRPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Trust%20Region%20Policy%20Optimization.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Trust%20Region%20Policy%20Optimization.pdf) | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/High-Dimensional%20Continuous%20Control%20Using%20Generalized%20Advantage%20Estimation.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/High-Dimensional%20Continuous%20Control%20Using%20Generalised%20Advantage%20Estimation.pdf) | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (**PPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Proximal%20Policy%20Optimization%20Algorithms.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Proximal%20Policy%20Optimization%20Algorithms.pdf) | https://arxiv.org/abs/1707.06347 | |
| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Emergence%20of%20Locomotion%20Behaviours%20in%20Rich%20Environments.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Emergence%20of%20Locomotion%20Behaviours%20in%20Rich%20Environments.pdf) | https://arxiv.org/abs/1707.02286 | |
| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTR**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Scalable%20trust-region%20method%20for%20deep%20reinforcement%20learning%20using%20Kronecker-factored.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Scalable%20trust-region%20method%20for%20deep%20reinforcement%20learning%20using%20Kronecker-factored.pdf) | https://arxiv.org/abs/1708.05144 | |
| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (**SAC**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Soft%20Actor-Critic_Off-Policy%20Maximum%20Entropy%20Deep%20Reinforcement%20Learning%20with%20a%20Stochastic%20Actor.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Soft%20Actor-Critic_Off-Policy%20Maximum%20Entropy%20Deep%20Reinforcement%20Learning%20with%20a%20Stochastic%20Actor.pdf) | https://arxiv.org/abs/1801.01290 | |