# Notes on Classic Reinforcement Learning Papers

This section is extended material for the Mushroom Book (Easy RL). It **collects, summarizes, and interprets classic papers in reinforcement learning**, mainly covering DQN-style methods, policy gradient methods, imitation learning, distributional reinforcement learning, multi-task reinforcement learning, exploration strategies, hierarchical reinforcement learning, and other techniques. Video explanations, produced in collaboration with WhalePaper, will be released gradually on the [Datawhale Bilibili channel](https://space.bilibili.com/431850986?spm_id_from=333.337.0.0).

About 5 papers are added each week; feel free to follow the project.

**When reposting, please include the link & source: [Easy RL project](https://github.com/datawhalechina/easy-rl)**

| Category        | Paper Title                                                  | Link                                          | Video Explanation    |
| --------------- | ------------------------------------------------------------ | --------------------------------------------- | -------------------- |
| DQN | Playing Atari with Deep Reinforcement Learning (**DQN**) | https://arxiv.org/abs/1312.5602 | |
| | Deep Recurrent Q-Learning for Partially Observable MDPs | https://arxiv.org/abs/1507.06527 | |
| | Dueling Network Architectures for Deep Reinforcement Learning (**Dueling DQN**) | https://arxiv.org/abs/1511.06581 | |
| | Deep Reinforcement Learning with Double Q-learning (**Double DQN**) | https://arxiv.org/abs/1509.06461 | |
| | Prioritized Experience Replay (**PER**) | https://arxiv.org/abs/1511.05952 | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) | https://arxiv.org/abs/1710.02298 | |
| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) | https://arxiv.org/abs/1602.01783 | |
| | Trust Region Policy Optimization (**TRPO**) | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (**PPO**) | https://arxiv.org/abs/1707.06347 | |
| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | |
|                 | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTR**) | https://arxiv.org/abs/1708.05144 | |
| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
|                 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (**SAC**) | https://arxiv.org/abs/1801.01290 | |
| | Deterministic Policy Gradient Algorithms (**DPG**) | http://proceedings.mlr.press/v32/silver14.pdf | |
| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) | https://arxiv.org/abs/1802.09477 | |
| Distributional RL | A Distributional Perspective on Reinforcement Learning (**C51**) | https://arxiv.org/abs/1707.06887 | |
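
The table above lists both DQN and Double DQN. As a quick orientation for readers new to this line of work, the NumPy sketch below illustrates the difference between their bootstrap targets: DQN takes the max over the target network's Q-values, while Double DQN lets the online network choose the action and the target network evaluate it. The function names and the toy Q-value arrays are hypothetical and not taken from the Easy RL code base.

```python
import numpy as np

def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    # DQN bootstraps from the maximum of the target network's Q-values.
    return reward + (1.0 - done) * gamma * np.max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    # Double DQN lets the online network pick the action and the target
    # network evaluate it, which reduces DQN's overestimation bias.
    best_action = int(np.argmax(next_q_online))
    return reward + (1.0 - done) * gamma * next_q_target[best_action]

# Toy transition with hypothetical Q-values for three actions in the next state.
q_online = np.array([1.2, 3.4, 0.7])   # online network Q(s', a)
q_target = np.array([1.0, 2.9, 0.8])   # target network Q(s', a)
print(dqn_target(1.0, q_target))                    # 1.0 + 0.99 * max(q_target)
print(double_dqn_target(1.0, q_online, q_target))   # evaluates the online argmax under q_target
```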