Update readme.md
@@ -28,6 +28,11 @@
| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.pdf) | https://arxiv.org/abs/1802.09477 | |
| | A Distributional Perspective on Reinforcement Learning (**C51**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1707.06887 | |
| | | | |
| | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (**Q-Prop**) | https://arxiv.org/abs/1611.02247 | |
| | Action-dependent Control Variates for Policy Optimization via Stein’s Identity (**Stein Control Variates**) | https://arxiv.org/abs/1710.11198 | |
| | The Mirage of Action-Dependent Baselines in Reinforcement Learning | https://arxiv.org/abs/1802.10031 | |
| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (**PCL**) | https://arxiv.org/abs/1702.08892 | |