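# Classic Reinforcement Learning Paper Walkthroughs

This section extends the Mushroom Book (蘑菇书). It **collects, summarizes, and explains classic papers in reinforcement learning**. The papers cover DQN variants, policy gradient methods, imitation learning, distributed reinforcement learning, multi-task reinforcement learning, exploration strategies, hierarchical reinforcement learning, and other techniques. Video walkthroughs (in collaboration with WhalePaper) will follow and will be released gradually on the [Datawhale Bilibili channel](https://space.bilibili.com/431850986?spm_id_from=333.337.0.0).

About five papers are added each week; follow the project for updates.

**When reposting, please include the link and credit the source: [Easy RL project](https://github.com/datawhalechina/easy-rl)**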

| Category | Paper title | Paper link | Video walkthrough |
| --------------- | ------------------------------------------------------------ | --------------------------------------------- | -------------------- |
| DQN | Playing Atari with Deep Reinforcement Learning (**DQN**) | https://arxiv.org/abs/1312.5602 | |
| | Deep Recurrent Q-Learning for Partially Observable MDPs | https://arxiv.org/abs/1507.06527 | |
| | Dueling Network Architectures for Deep Reinforcement Learning (**Dueling DQN**) | https://arxiv.org/abs/1511.06581 | |
| | Deep Reinforcement Learning with Double Q-learning (**Double DQN**) | https://arxiv.org/abs/1509.06461 | |
| | Prioritized Experience Replay (**PER**) | https://arxiv.org/abs/1511.05952 | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) | https://arxiv.org/abs/1710.02298 | |
| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) | https://arxiv.org/abs/1602.01783 | |
| | Trust Region Policy Optimization (**TRPO**) | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (**PPO**) | https://arxiv.org/abs/1707.06347 | |
| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | |
| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTR**) | https://arxiv.org/abs/1708.05144 | |
| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (**SAC**) | https://arxiv.org/abs/1801.01290 | |
| | Deterministic Policy Gradient Algorithms (**DPG**) | http://proceedings.mlr.press/v32/silver14.pdf | |
| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) | https://arxiv.org/abs/1802.09477 | |
| | A Distributional Perspective on Reinforcement Learning (**C51**) | https://arxiv.org/abs/1707.06887 | |