update
@@ -54,7 +54,23 @@ for i_episode in range(1, cfg.max_episodes+1): # cfg.max_episodes is the max training
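Train the model, plot the reward and the moving-average reward against the episode number, record the hyperparameters, and write everything up as a report. The figures look as follows: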
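
For the moving-average curve mentioned above, a minimal sketch of the smoothing step might look like this. The helper name `moving_average`, the smoothing factor 0.9, and the sample rewards are assumptions made for illustration; the code in this commit may smooth the rewards differently.

```python
def moving_average(rewards, beta=0.9):
    """Exponentially smooth per-episode rewards (hypothetical helper).

    The first smoothed value equals the first reward; each later value is
    beta * previous_smoothed + (1 - beta) * current_reward.
    """
    smoothed = []
    for r in rewards:
        if smoothed:
            smoothed.append(beta * smoothed[-1] + (1 - beta) * r)
        else:
            smoothed.append(float(r))
    return smoothed


# Made-up rewards for three episodes, only to show the call.
rewards = [10.0, 20.0, 30.0]
ma_rewards = moving_average(rewards)
```

Both lists can then be passed to a plotting library with the episode index on the x-axis, giving one raw curve and one smoothed curve.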

You can also plot the curves obtained when testing (eval) the model:

The results can also be viewed with [TensorBoard](https://pytorch.org/docs/stable/tensorboard.html), as shown below:

### Code listing
@@ -66,6 +82,6 @@ for i_episode in range(1, cfg.max_episodes+1): # cfg.max_episodes is the max training
**memory.py**: holds the Replay Buffer
-**plot.py**: holds the plotting functions
+**plot.py**: holds the plotting functions (optional)
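
To make the role of memory.py in the list above more concrete, here is a minimal sketch of a replay buffer. The class name `ReplayBuffer`, its method names, and the dummy transitions are assumptions for illustration and are not taken from the referenced repository.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions (a sketch, not the repo's class)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition as a tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, then regroup by field.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for t in range(5):  # push 5 dummy transitions into a buffer of size 3
    buffer.push(t, 0, 1.0, t + 1, False)
```

Because the capacity is 3, only the three most recent transitions remain after the loop, which is the behaviour usually wanted when old experience should be overwritten.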
[Reference code](https://github.com/datawhalechina/leedeeprl-notes/tree/master/codes/dqn)