diff --git a/papers/Policy_gradient/Action-depedent Control Variates for Policy Optimization via Stein’s Identity.md b/papers/Policy_gradient/Action-depedent Control Variates for Policy Optimization via Stein’s Identity.md
index 4946486..95f6c19 100644
--- a/papers/Policy_gradient/Action-depedent Control Variates for Policy Optimization via Stein’s Identity.md
+++ b/papers/Policy_gradient/Action-depedent Control Variates for Policy Optimization via Stein’s Identity.md
@@ -82,7 +82,7 @@ $$
 
 #### 算法
 
-截屏2022-12-05 19.58.49
+#### 截屏2022-12-05 19.58.49
 
 运用Stein控制变量的PPO算法。
 