Update readme.md

Yiyuan Yang
2022-12-06 19:11:29 +08:00
committed by GitHub
parent e18e7358c1
commit 1321d676bf

@@ -29,9 +29,9 @@
| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.pdf) | https://arxiv.org/abs/1802.09477 | |
| | A Distributional Perspective on Reinforcement Learning (**C51**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1707.06887 | |
| | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (**Q-Prop**) | https://arxiv.org/abs/1611.02247 | |
-| | Action-depedent Control Variates for Policy Optimization via Steins Identity (**Stein Control Variates**) | https://arxiv.org/abs/1710.11198 | |
-| | The Mirage of Action-Dependent Baselines in Reinforcement Learning | https://arxiv.org/abs/1802.10031 | |
-| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (**PCL**) | https://arxiv.org/abs/1702.08892 | |
+| | Action-depedent Control Variates for Policy Optimization via Steins Identity (**Stein Control Variates**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Action-depedent%20Control%20Variates%20for%20Policy%20Optimization%20via%20Stein%E2%80%99s%20Identity.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Action-depedent%20Control%20Variates%20for%20Policy%20Optimization%20via%20Stein%E2%80%99s%20Identity.pdf)| https://arxiv.org/abs/1710.11198 | |
+| | The Mirage of Action-Dependent Baselines in Reinforcement Learning [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/The%20Mirage%20of%20Action-Dependent%20Baselines%20in%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/The%20Mirage%20of%20Action-Dependent%20Baselines%20in%20Reinforcement%20Learning.pdf)| https://arxiv.org/abs/1802.10031 | |
+| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (**PCL**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Bridging%20the%20Gap%20Between%20Value%20and%20Policy%20Based%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Bridging%20the%20Gap%20Between%20Value%20and%20Policy%20Based%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1702.08892 | |