diff --git a/papers/readme.md b/papers/readme.md
index 665e244..d57642a 100644
--- a/papers/readme.md
+++ b/papers/readme.md
@@ -7,23 +7,25 @@
 **转发请加上链接&来源[Easy RL项目](https://github.com/datawhalechina/easy-rl)**
 
 | 类别 | 论文题目 | 原文链接 | 视频解读 |
-| --------------- | ------------------------------------------------------------ | --------------------------------------------- | -------------------- |
-| DQN | Playing Atari with Deep Reinforcement Learning (**DQN**) | https://arxiv.org/abs/1312.5602 | |
-| | Deep Recurrent Q-Learning for Partially Observable MDPs | https://arxiv.org/abs/1507.06527 | |
-| | Dueling Network Architectures for Deep Reinforcement Learning (**Dueling DQN**) | https://arxiv.org/abs/1511.06581 | |
-| | Deep Reinforcement Learning with Double Q-learning (**Double DQN**) | https://arxiv.org/abs/1509.06461 | |
-| | Prioritized Experience Replay (**PER**) | https://arxiv.org/abs/1511.05952 | |
-| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) | https://arxiv.org/abs/1710.02298 | |
-| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) | https://arxiv.org/abs/1602.01783 | |
-| | Trust Region Policy Optimization (**TRPO**) | https://arxiv.org/abs/1502.05477 | |
-| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) | https://arxiv.org/abs/1506.02438 | |
-| | Proximal Policy Optimization Algorithms (**PPO**) | https://arxiv.org/abs/1707.06347 | |
-| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | |
-| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTP**) | https://arxiv.org/abs/1708.05144 | |
-| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
-| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor(**SAC**) | https://arxiv.org/abs/1801.01290 | |
-| | Deterministic Policy Gradient Algorithms (**DPG**) | http://proceedings.mlr.press/v32/silver14.pdf | |
-| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
-| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) | https://arxiv.org/abs/1802.09477 | |
-| | A Distributional Perspective on Reinforcement Learning (**C51**) | https://arxiv.org/abs/1707.06887 | |
+| --------------- | ------------------------------------------------------------ | --------------------------------------------- | -------- |
+| DQN | Playing Atari with Deep Reinforcement Learning (**DQN**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Playing%20Atari%20with%20Deep%20Reinforcement%20Learning.md) [[PDF格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/PDF/Playing%20Atari%20with%20Deep%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1312.5602 | |
+| | Deep Recurrent Q-Learning for Partially Observable MDPs [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Deep%20Recurrent%20Q-Learning%20for%20Partially%20Observable%20MDPs.md) | https://arxiv.org/abs/1507.06527 | |
+| | Dueling Network Architectures for Deep Reinforcement Learning (**Dueling DQN**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Dueling%20Network%20Architectures%20for%20Deep%20Reinforceme.md) | https://arxiv.org/abs/1511.06581 | |
+| | Deep Reinforcement Learning with Double Q-learning (**Double DQN**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Deep%20Reinforcement%20Learning%20with%20Double%20Q-learning.md) | https://arxiv.org/abs/1509.06461 | |
+| | Prioritized Experience Replay (**PER**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Prioritized%20Experience%20Replay.md) | https://arxiv.org/abs/1511.05952 | |
+| | Rainbow: Combining Improvements in Deep Reinforcement Learning (**Rainbow**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/DQN/Rainbow_Combining%20Improvements%20in%20Deep%20Reinforcement%20Learning.md) | https://arxiv.org/abs/1710.02298 | |
+| Policy gradient | Asynchronous Methods for Deep Reinforcement Learning (**A3C**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Asynchronous%20Methods%20for%20Deep%20Reinforcement%20Learning.md) | https://arxiv.org/abs/1602.01783 | |
+| | Trust Region Policy Optimization (**TRPO**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Trust%20Region%20Policy%20Optimization.md) | https://arxiv.org/abs/1502.05477 | |
+| | High-Dimensional Continuous Control Using Generalized Advantage Estimation (**GAE**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/High-Dimensional%20Continuous%20Control%20Using%20Generalized%20Advantage%20Estimation.md) [[PDF格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/High-Dimensional%20Continuous%20Control%20Using%20Generalised%20Advantage%20Estimation.pdf) | https://arxiv.org/abs/1506.02438 | |
+| | Proximal Policy Optimization Algorithms (**PPO**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Proximal%20Policy%20Optimization%20Algorithms.md) | https://arxiv.org/abs/1707.06347 | |
+| | Emergence of Locomotion Behaviours in Rich Environments (**PPO-Penalty**) | https://arxiv.org/abs/1707.02286 | |
+| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (**ACKTR**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Scalable%20trust-region%20method%20for%20deep%20reinforcement%20learning%20using%20Kronecker-factored.md) | https://arxiv.org/abs/1708.05144 | |
+| | Sample Efficient Actor-Critic with Experience Replay (**ACER**) | https://arxiv.org/abs/1611.01224 | |
+| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (**SAC**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Soft%20Actor-Critic_Off-Policy%20Maximum%20Entropy%20Deep%20Reinforcement%20Learning%20with%20a%20Stochastic%20Actor.md) [[PDF格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/Soft%20Actor-Critic_Off-Policy%20Maximum%20Entropy%20Deep%20Reinforcement%20Learning%20with%20a%20Stochastic%20Actor.pdf) | https://arxiv.org/abs/1801.01290 | |
+| | Deterministic Policy Gradient Algorithms (**DPG**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Deterministic%20Policy%20Gradient%20Algorithms.md) | http://proceedings.mlr.press/v32/silver14.pdf | |
+| | Continuous Control With Deep Reinforcement Learning (**DDPG**) | https://arxiv.org/abs/1509.02971 | |
+| | Addressing Function Approximation Error in Actor-Critic Methods (**TD3**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/Addressing%20Function%20Approximation%20Error%20in%20Actor-Critic%20Methods.md) | https://arxiv.org/abs/1802.09477 | |
+| | A Distributional Perspective on Reinforcement Learning (**C51**) [[Markdown格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.md) [[PDF格式]](https://github.com/datawhalechina/easy-rl/blob/master/papers/Policy_gradient/PDF/A%20Distributional%20Perspective%20on%20Reinforcement%20Learning.pdf) | https://arxiv.org/abs/1707.06887 | |
+| | | | |
+