| DQN | |
| --- | --- |
| Playing Atari with Deep Reinforcement Learning (DQN) | https://arxiv.org/abs/1312.5602 |
| Deep Recurrent Q-Learning for Partially Observable MDPs (DRQN) | https://arxiv.org/abs/1507.06527 |
| Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN) | https://arxiv.org/abs/1511.06581 |
| Deep Reinforcement Learning with Double Q-learning (Double DQN) | https://arxiv.org/abs/1509.06461 |
| Prioritized Experience Replay (PER) | https://arxiv.org/abs/1511.05952 |
| Rainbow: Combining Improvements in Deep Reinforcement Learning (Rainbow) | https://arxiv.org/abs/1710.02298 |
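As a quick pointer to how these variants relate, here is a minimal NumPy sketch (illustrative only, not code from any paper above) contrasting the one-step bootstrap target of DQN with the decoupled action selection/evaluation of Double DQN; all names and toy values are hypothetical.

```python
# Illustrative only: one-step targets for DQN vs. Double DQN.
# q_online / q_target stand in for the online and target networks'
# Q-values at the next state s'; names and values are hypothetical.
import numpy as np

gamma = 0.99                                 # discount factor
reward, done = 1.0, False                    # transition (s, a, r, s')
q_online = np.array([0.2, 1.5, 0.7, 0.1])    # online net  Q(s', .)
q_target = np.array([0.3, 1.1, 0.9, 0.0])    # target net  Q(s', .)

# DQN: both action selection and evaluation use the target network.
y_dqn = reward + (1.0 - done) * gamma * q_target.max()

# Double DQN: the online network selects the argmax action, the target
# network evaluates it, which reduces the max-operator's upward bias.
a_star = q_online.argmax()
y_double = reward + (1.0 - done) * gamma * q_target[a_star]

print(f"DQN target: {y_dqn:.3f}, Double DQN target: {y_double:.3f}")
```

This target swap is the only change Double DQN makes to DQN's training loop, which is why Rainbow can stack it with the other entries in this table.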
| Policy Gradient | |
| --- | --- |
| Asynchronous Methods for Deep Reinforcement Learning (A3C) | https://arxiv.org/abs/1602.01783 |
| Trust Region Policy Optimization (TRPO) | https://arxiv.org/abs/1502.05477 |
| High-Dimensional Continuous Control Using Generalized Advantage Estimation (GAE) | https://arxiv.org/abs/1506.02438 |
| Proximal Policy Optimization Algorithms (PPO) | https://arxiv.org/abs/1707.06347 |
| Emergence of Locomotion Behaviours in Rich Environments (PPO-Penalty) | https://arxiv.org/abs/1707.02286 |
| Scalable Trust-Region Method for Deep Reinforcement Learning Using Kronecker-Factored Approximation (ACKTR) | https://arxiv.org/abs/1708.05144 |
| Sample Efficient Actor-Critic with Experience Replay (ACER) | https://arxiv.org/abs/1611.01224 |
| Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (SAC) | https://arxiv.org/abs/1801.01290 |
| Deterministic Policy Gradient Algorithms (DPG) | http://proceedings.mlr.press/v32/silver14.pdf |
| Continuous Control with Deep Reinforcement Learning (DDPG) | https://arxiv.org/abs/1509.02971 |
| Addressing Function Approximation Error in Actor-Critic Methods (TD3) | https://arxiv.org/abs/1802.09477 |
| A Distributional Perspective on Reinforcement Learning (C51) | https://arxiv.org/abs/1707.06887 |
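And a matching sketch of the clipped surrogate objective from the PPO paper listed above (arXiv:1707.06347); the probability ratios and advantages here are toy values, and the variable names are hypothetical.

```python
# Illustrative only: PPO's clipped surrogate objective on toy values.
# ratio = pi_theta(a|s) / pi_theta_old(a|s); adv = advantage estimates
# (e.g. computed with GAE, also listed above). Names are hypothetical.
import numpy as np

eps = 0.2                               # clip range epsilon
ratio = np.array([0.70, 1.00, 1.45])    # per-sample probability ratios
adv = np.array([1.0, -0.5, 2.0])        # per-sample advantages

# L_CLIP = E[ min(ratio * adv, clip(ratio, 1 - eps, 1 + eps) * adv) ]
clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
objective = np.minimum(ratio * adv, clipped * adv).mean()
loss = -objective                       # minimized by gradient descent
print(f"L_CLIP = {objective:.3f}, loss = {loss:.3f}")
```

The clip replaces TRPO's explicit KL constraint with a pessimistic first-order surrogate, keeping each policy update close to the data-collecting policy.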