
Interpretations of Classic Reinforcement Learning Papers

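This section extends the Mushroom Book (蘑菇书). It collects, summarizes, and interprets classic papers in reinforcement learning, covering DQN-family methods, policy gradient methods, imitation learning, distributed reinforcement learning, multi-task reinforcement learning, exploration strategies, hierarchical reinforcement learning, and other techniques. Video walkthroughs, produced in collaboration with WhalePaper, will follow and be released gradually on Datawhale's Bilibili channel and official account.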

About five papers are added each week. Follow the project for updates.

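If you have trouble reading the Markdown files online (for example, formulas fail to render or images load slowly), download them to read locally, or read the file of the same name in the PDF folder.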

When reposting, please include the link and credit the source: the Easy RL project.

| Category | Paper title | Original link | Video walkthrough |
| --- | --- | --- | --- |
| DQN | Playing Atari with Deep Reinforcement Learning (DQN) [Markdown] [PDF] | https://arxiv.org/abs/1312.5602 | |
| | Deep Recurrent Q-Learning for Partially Observable MDPs [Markdown] [PDF] | https://arxiv.org/abs/1507.06527 | |
| | Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN) [Markdown] [PDF] | https://arxiv.org/abs/1511.06581 | |
| | Deep Reinforcement Learning with Double Q-learning (Double DQN) [Markdown] [PDF] | https://arxiv.org/abs/1509.06461 | |
| | Prioritized Experience Replay (PER) [Markdown] [PDF] | https://arxiv.org/abs/1511.05952 | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning (Rainbow) [Markdown] [PDF] | https://arxiv.org/abs/1710.02298 | |
| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (A3C) [Markdown] | https://arxiv.org/abs/1602.01783 | |
| | Trust Region Policy Optimization (TRPO) [Markdown] [PDF] | https://arxiv.org/abs/1502.05477 | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (GAE) [Markdown] [PDF] | https://arxiv.org/abs/1506.02438 | |
| | Proximal Policy Optimization Algorithms (PPO) [Markdown] [PDF] | https://arxiv.org/abs/1707.06347 | |
| | Emergence of Locomotion Behaviours in Rich Environments (PPO-Penalty) | https://arxiv.org/abs/1707.02286 | |
| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) [Markdown] [PDF] | https://arxiv.org/abs/1708.05144 | |
| | Sample Efficient Actor-Critic with Experience Replay (ACER) | https://arxiv.org/abs/1611.01224 | |
| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (SAC) [Markdown] [PDF] | https://arxiv.org/abs/1801.01290 | |
| | Deterministic Policy Gradient Algorithms (DPG) [Markdown] [PDF] | http://proceedings.mlr.press/v32/silver14.pdf | |
| | Continuous Control With Deep Reinforcement Learning (DDPG) | https://arxiv.org/abs/1509.02971 | |
| | Addressing Function Approximation Error in Actor-Critic Methods (TD3) [Markdown] [PDF] | https://arxiv.org/abs/1802.09477 | |
| | A Distributional Perspective on Reinforcement Learning (C51) [Markdown] [PDF] | https://arxiv.org/abs/1707.06887 | |
| | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (Q-Prop) | https://arxiv.org/abs/1611.02247 | |
| | Action-dependent Control Variates for Policy Optimization via Stein's Identity (Stein Control Variates) [Markdown] [PDF] | https://arxiv.org/abs/1710.11198 | |
| | The Mirage of Action-Dependent Baselines in Reinforcement Learning [Markdown] [PDF] | https://arxiv.org/abs/1802.10031 | |
| | Bridging the Gap Between Value and Policy Based Reinforcement Learning (PCL) [Markdown] [PDF] | https://arxiv.org/abs/1702.08892 | |