From 16b493c41fb0741e59db9ea6f3b652c25ee31b8f Mon Sep 17 00:00:00 2001
From: Yiyuan Yang
Date: Sun, 20 Nov 2022 20:46:52 +0800
Subject: [PATCH] Update readme.md

---
 papers/readme.md | 50 ++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/papers/readme.md b/papers/readme.md
index 2b983f8..70a8b50 100644
--- a/papers/readme.md
+++ b/papers/readme.md
@@ -1,22 +1,30 @@
+## Overview
+
+This section extends the Mushroom Book (蘑菇书) by collecting and summarizing classic papers in reinforcement learning. It covers DQN-family methods, policy gradient methods, imitation learning, distributed reinforcement learning, multi-task reinforcement learning, exploration strategies, hierarchical reinforcement learning, and other techniques. Video walkthroughs, produced in collaboration with WhalePaper, will follow and be released gradually on the Datawhale Bilibili official account.
+
+About five papers are added each week; follow along for updates.
+
+**When sharing, please include the link and credit the source (the Easy-RL project).**
+
+| Category | Paper title | Original link | Other links (video walkthrough) |
+| --------------- | ------------------------------------------------------------ | --------------------------------------------- | -------------------- |
+| DQN | Playing Atari with Deep Reinforcement Learning | https://arxiv.org/abs/1312.5602 | |
+| | Deep Recurrent Q-Learning for Partially Observable MDPs | https://arxiv.org/abs/1507.06527 | |
+| | Dueling Network Architectures for Deep Reinforcement Learning (**Dueling DQN**) | https://arxiv.org/abs/1511.06581 | |
+| | Deep Reinforcement Learning with Double Q-learning (**Double DQN**) | https://arxiv.org/abs/1509.06461 | |
+| | Prioritized Experience Replay (**PER**) | https://arxiv.org/abs/1511.05952 | |
+| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) | https://arxiv.org/abs/1710.02298 | |
+| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) | https://arxiv.org/abs/1602.01783 | |
+| | Trust Region Policy Optimization (**TRPO**) | https://arxiv.org/abs/1502.05477 | |
+| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) | https://arxiv.org/abs/1506.02438 | |
+| | Proximal Policy Optimization Algorithms (**PPO**) | https://arxiv.org/abs/1707.06347 | |
+| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | |
+| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTR**) | https://arxiv.org/abs/1708.05144 | |
+| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
+| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (**SAC**) | https://arxiv.org/abs/1801.01290 | |
+| | Deterministic Policy Gradient Algorithms (**DPG**) | http://proceedings.mlr.press/v32/silver14.pdf | |
+| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
+| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) | https://arxiv.org/abs/1802.09477 | |
+| | A Distributional Perspective on Reinforcement Learning (**C51**) | https://arxiv.org/abs/1707.06887 | |
+| | | | |
-| Category | Paper title | Original link | Other (video walkthrough) |
-| -------- | ------------------------------------------------------------ | --------------------------------------------- | -------------------- |
-| DQN | Playing Atari with Deep Reinforcement Learning | https://arxiv.org/abs/1312.5602 | |
-| | Deep Recurrent Q-Learning for Partially Observable MDPs | https://arxiv.org/abs/1507.06527 | |
-| | Dueling Network Architectures for Deep Reinforcement Learning (**Dueling DQN**) | https://arxiv.org/abs/1511.06581 | |
-| | Deep Reinforcement Learning with Double Q-learning (**Double DQN**) | https://arxiv.org/abs/1509.06461 | |
-| | Prioritized Experience Replay (**PER**) | https://arxiv.org/abs/1511.05952 | |
-| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) | https://arxiv.org/abs/1710.02298 | |
-| 策略梯度 (policy gradient) | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) | https://arxiv.org/abs/1602.01783 | |
-| | Trust Region Policy Optimization (**TRPO**) | https://arxiv.org/abs/1502.05477 | |
-| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) | https://arxiv.org/abs/1506.02438 | |
-| | Proximal Policy Optimization Algorithms (**PPO**) | https://arxiv.org/abs/1707.06347 | |
-| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | |
-| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTR**) | https://arxiv.org/abs/1708.05144 | |
-| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
-| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (**SAC**) | https://arxiv.org/abs/1801.01290 | |
-| | Deterministic Policy Gradient Algorithms (**DPG**) | http://proceedings.mlr.press/v32/silver14.pdf | |
-| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
-| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) | https://arxiv.org/abs/1802.09477 | |
-| | A Distributional Perspective on Reinforcement Learning (**C51**) | https://arxiv.org/abs/1707.06887 | |
-| | | | |