Notes
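This section extends the Mushroom Book (蘑菇书) by collecting, summarizing, and explaining classic papers in reinforcement learning. It mainly covers papers on DQN-family methods, policy-gradient methods, imitation learning, distributed reinforcement learning, multi-task reinforcement learning, exploration strategies, hierarchical reinforcement learning, and other techniques. Video walkthroughs, produced in collaboration with WhalePaper, will follow and will be released gradually on Datawhale's Bilibili official account.
About five papers are added each week; follow the project for updates.
When sharing, please include the link and credit the source (the Easy-RL project).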
| Category | Paper title | Paper link | Other / video walkthrough |
|---|---|---|---|
| DQN | Playing Atari with Deep Reinforcement Learning | https://arxiv.org/abs/1312.5602 | |
| | Deep Recurrent Q-Learning for Partially Observable MDPs | https://arxiv.org/abs/1507.06527 | |
| | Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN) | https://arxiv.org/abs/1511.06581 | |
| | Deep Reinforcement Learning with Double Q-learning (Double DQN) | https://arxiv.org/abs/1509.06461 | |
| | Prioritized Experience Replay (PER) | https://arxiv.org/abs/1511.05952 | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning (Rainbow) | https://arxiv.org/abs/1710.02298 | |
| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (A3C) | https://arxiv.org/abs/1602.01783 | |
| | Trust Region Policy Optimization (TRPO) | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (GAE) | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (PPO) | https://arxiv.org/abs/1707.06347 | |
| | Emergence of Locomotion Behaviours in Rich Environments (PPO-Penalty) | https://arxiv.org/abs/1707.02286 | |
| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) | https://arxiv.org/abs/1708.05144 | |
| | Sample Efficient Actor-Critic with Experience Replay (ACER) | https://arxiv.org/abs/1611.01224 | |
| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (SAC) | https://arxiv.org/abs/1801.01290 | |
| | Deterministic Policy Gradient Algorithms (DPG) | http://proceedings.mlr.press/v32/silver14.pdf | |
| | Continuous Control With Deep Reinforcement Learning (DDPG) | https://arxiv.org/abs/1509.02971 | |
| | Addressing Function Approximation Error in Actor-Critic Methods (TD3) | https://arxiv.org/abs/1802.09477 | |
| | A Distributional Perspective on Reinforcement Learning (C51) | https://arxiv.org/abs/1707.06887 | |