update q-learning

JohnJim0816
2020-11-24 20:29:23 +08:00
parent 4cc12bf97f
commit cfe5a89fa7
12 changed files with 129 additions and 48 deletions


@@ -16,4 +16,23 @@
![](assets/cliffwalking_2.png)
Reaching the goal from the start takes at least 13 steps, and each step yields a reward of -1, so under an optimally trained policy the total reward per episode should be -13.
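
The -13 figure can be checked with a short breadth-first search. This is a minimal sketch, not code from this repository: it assumes the standard CliffWalking layout (a 4x12 grid, start in the bottom-left corner, goal in the bottom-right corner, cliff cells along the bottom row between them). The names `ROWS`, `COLS`, `START`, `GOAL`, `CLIFF` and `shortest_path_length` are hypothetical.

```python
from collections import deque

ROWS, COLS = 4, 12                      # assumed grid size (standard CliffWalking layout)
START, GOAL = (3, 0), (3, 11)           # assumed start and goal cells
CLIFF = {(3, c) for c in range(1, 11)}  # assumed cliff cells between start and goal
STEP_REWARD = -1                        # per-step reward stated in the text


def shortest_path_length(start, goal):
    """Breadth-first search for the fewest moves from start to goal, avoiding the cliff."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        (row, col), steps = queue.popleft()
        if (row, col) == goal:
            return steps
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (row + d_row, col + d_col)
            if (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS
                    and nxt not in CLIFF and nxt not in visited):
                visited.add(nxt)
                queue.append((nxt, steps + 1))
    return None  # goal unreachable


min_steps = shortest_path_length(START, GOAL)
best_return = min_steps * STEP_REWARD
print(min_steps, best_return)  # prints "13 -13" under the assumed layout
```

The shortest route moves up one cell, right eleven cells, and down one cell, which is why no policy can score better than -13 in a single episode.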
## Usage
Train:
```bash
python main.py
```
Evaluate:
```bash
python main.py --train 0
```
View the logs in TensorBoard:
```bash
tensorboard --logdir logs
```