Update readme.md
@@ -20,7 +20,7 @@
| | Prioritized Experience Replay (**PER**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Prioritized%20Experience%20Replay.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/PDF/Prioritized%20Experience%20Replay.pdf) | https://arxiv.org/abs/1511.05952 | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Rainbow_Combining%20Improvements%20in%20Deep%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/PDF/Rainbow_Combining%20Improvements%20in%20Deep%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1710.02298 | |
| | A Distributional Perspective on Reinforcement Learning (**C51**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1707.06887 | |
-| Policy -based | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Asynchronous%20Methods%20for%20Deep%20Reinforcement%20Learning.md) | https://arxiv.org/abs/1602.01783 | |
+| Policy -based | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Asynchronous%20Methods%20for%20Deep%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Asynchronous%20Methods%20for%20Deep%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1602.01783 | |
| | Trust Region Policy Optimization (**TRPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Trust%20Region%20Policy%20Optimization.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Trust%20Region%20Policy%20Optimization.pdf) | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/High-Dimensional%20Continuous%20Control%20Using%20Generalized%20Advantage%20Estimation.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/High-Dimensional%20Continuous%20Control%20Using%20Generalised%20Advantage%20Estimation.pdf) | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (**PPO**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Proximal%20Policy%20Optimization%20Algorithms.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Proximal%20Policy%20Optimization%20Algorithms.pdf) | https://arxiv.org/abs/1707.06347 | |