From 1321d676bf4aa59bc1509ed4f40c1d1ccae068ba Mon Sep 17 00:00:00 2001
From: Yiyuan Yang
Date: Tue, 6 Dec 2022 19:11:29 +0800
Subject: [PATCH] Update readme.md

---
 papers/readme.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/papers/readme.md b/papers/readme.md
index 54c86a6..a92d2df 100644
--- a/papers/readme.md
+++ b/papers/readme.md
@@ -29,9 +29,9 @@
 | | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.pdf) | https://arxiv.org/abs/1802.09477 | |
 | | A Distributional Perspective on Reinforcement Learning (**C51**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1707.06887 | |
 | | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (**Q-Prop**) | https://arxiv.org/abs/1611.02247 | |
-| | Action-depedent Control Variates for Policy Optimization via Stein’s Identity (**Stein Control Variates**) | https://arxiv.org/abs/1710.11198 | |
-| | The Mirage of Action-Dependent Baselines in Reinforcement Learning | https://arxiv.org/abs/1802.10031 | |
-| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (**PCL**) | https://arxiv.org/abs/1702.08892 | |
+| | Action-depedent Control Variates for Policy Optimization via Stein’s Identity (**Stein Control Variates**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Action-depedent%20Control%20Variates%20for%20Policy%20Optimization%20via%20Stein%E2%80%99s%20Identity.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Action-depedent%20Control%20Variates%20for%20Policy%20Optimization%20via%20Stein%E2%80%99s%20Identity.pdf)| https://arxiv.org/abs/1710.11198 | |
+| | The Mirage of Action-Dependent Baselines in Reinforcement Learning [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/The%20Mirage%20of%20Action-Dependent%20Baselines%20in%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/The%20Mirage%20of%20Action-Dependent%20Baselines%20in%20Reinforcement%20Learning.pdf)| https://arxiv.org/abs/1802.10031 | |
+| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (**PCL**) [[Markdown]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Bridging%20the%20Gap%20Between%20Value%20and%20Policy%20Based%20Reinforcement%20Learning.md) [[PDF]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Bridging%20the%20Gap%20Between%20Value%20and%20Policy%20Based%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1702.08892 | |