diff --git a/projects/.gitignore b/projects/.gitignore
deleted file mode 100644
index d1fc2b1..0000000
--- a/projects/.gitignore
+++ /dev/null
@@ -1,10 +0,0 @@
-.DS_Store
-.ipynb_checkpoints
-__pycache__
-.vscode
-test.py
-pseudocodes.aux
-pseudocodes.log
-pseudocodes.synctex.gz
-pseudocodes.out
-pseudocodes.toc
\ No newline at end of file
diff --git a/projects/LICENSE b/projects/LICENSE
deleted file mode 100644
index 673d927..0000000
--- a/projects/LICENSE
+++ /dev/null
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2020 John Jim
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
diff --git a/projects/README.md b/projects/README.md
deleted file mode 100644
index bb6196a..0000000
--- a/projects/README.md
+++ /dev/null
@@ -1,115 +0,0 @@
-## 0. Preface
-
-This project is for learning the basic RL algorithms. It is aimed mainly at RL beginners and at non-specialists who need to apply RL, and it strives for **detailed comments** and **a clear structure**.
-
-Note that this project is hands-on. We recommend first learning some of the theory behind each algorithm before working through it; for the theory, see the [Mushroom Book (蘑菇书)](https://github.com/datawhalechina/easy-rl), which I helped write.
-
-Future plans include, but are not limited to, multi-agent algorithms, a Python package for reinforcement learning, and a graphical programming platform for reinforcement learning.
-
-## 1. Project Overview
-
-The project consists mainly of the following parts:
-* [Jupyter Notebook](./notebooks/): algorithms written as notebooks, with fairly detailed hands-on guidance; recommended for beginners
-* [codes](./codes/): algorithms written as Python scripts, in a style closer to real projects; recommended for readers with some coding experience. Their structure is described below
-* [Assets](./assets/): currently contains Chinese pseudocode for each RL algorithm
-
-
-The [codes](./codes/) are organized mainly into the following scripts:
-* ```[algorithm_name].py```: holds the algorithm, e.g. ```dqn.py```. Each algorithm comes with some basic modules, such as a ```Replay Buffer``` and an ```MLP``` (multilayer perceptron);
-* ```task.py```: holds the task, usually including ```argparse```-based parameters and the training and testing functions. The training function ```train``` is designed to follow the pseudocode, so it is the best place to start reading the code;
-* ```utils.py```: helper functions for saving results, plotting, and so on. In real projects or research, we recommend saving results with ```Tensorboard``` and then plotting them with tools such as ```matplotlib``` and ```seaborn```.
-## 2. Algorithm List
-
-Note: clicking a name jumps to the corresponding algorithm under [codes](./codes/); please browse the other versions yourself.
-
-| Algorithm | Reference | Author | Notes |
-| :---: | :---: | :---: | :--: |
-| [Policy Gradient](codes/PolicyGradient) | [Policy Gradient paper](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) | [johnjim0816](https://github.com/johnjim0816) | |
-| [Monte Carlo](codes/MonteCarlo) | | [johnjim0816](https://github.com/johnjim0816) | |
-| [DQN](codes/DQN) | | [johnjim0816](https://github.com/johnjim0816) | |
-| DQN-CNN | | | Coming soon |
-| [PER_DQN](codes/PER_DQN) | [PER DQN Paper](https://arxiv.org/abs/1511.05952) | [wangzhongren](https://github.com/wangzhongren-code) | |
-| [DoubleDQN](codes/DoubleDQN) | [Double DQN Paper](https://arxiv.org/abs/1509.06461) | [johnjim0816](https://github.com/johnjim0816) | |
-| [SoftQ](codes/SoftQ) | [Soft Q-learning paper](https://arxiv.org/abs/1702.08165) | [johnjim0816](https://github.com/johnjim0816) | |
-| [SAC](codes/SAC) | [SAC paper](https://arxiv.org/pdf/1812.05905.pdf) | | |
-| [SAC-Discrete](codes/SAC) | [SAC-Discrete paper](https://arxiv.org/pdf/1910.07207.pdf) | | |
-| SAC-S | [SAC-S paper](https://arxiv.org/abs/1801.01290) | | |
-| DSAC | [DSAC paper](https://paperswithcode.com/paper/addressing-value-estimation-errors-in) | | Coming soon |
-
-## 3. Algorithm Environments
-
-For a description of the environments, see [env](./codes/envs/README.md)
-
-## 4. Runtime Environment
-
-Main dependencies: Python 3.7, PyTorch 1.10.0, Gym 0.25.2.
-
-### 4.1. Create a Conda Environment
-```bash
-conda create -n easyrl python=3.7
-conda activate easyrl # activate the environment
-```
-### 4.2. Install Torch
-
-CPU version:
-```bash
-conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cpuonly -c pytorch
-```
-CUDA version:
-```bash
-conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
-```
-If you need a mirror to speed up the Torch installation, open the [Tsinghua mirror link](https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/), choose the directory for your operating system (e.g. ```win-64```), copy its link, and run:
-```bash
-conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/win-64/
-```
-You can also install with pip (CUDA version only):
-```bash
-pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0 --extra-index-url https://download.pytorch.org/whl/cu113
-```
-### 4.3. Verify the CUDA Build of Torch
-
-Skip this step if you installed the CPU version. Run the following Python script; if it prints True, the CUDA version was installed successfully:
-```python
-import torch
-print(torch.cuda.is_available())
-```
-### 4.4. Install Gym
-
-```bash
-pip install gym==0.25.2
-```
-To use the Atari environments, also install:
-
-```bash
-pip install gym[atari,accept-rom-license]==0.25.2
-```
-
-### 4.5. Install Other Dependencies
-
-From the project root, run:
-```bash
-pip install -r requirements.txt
-```
-
-## 5. Usage
-
-For [codes](./codes/), `cd` into the directory of the algorithm you want, e.g. `DQN`, and run:
-
-```bash
-python task0.py
-```
-
-Or load a configuration file:
-
-```bash
-python task0.py --yaml configs/CartPole-v1_DQN_Train.yaml
-```
-
-For [Jupyter Notebook](./notebooks/):
-
-* Simply run the corresponding .ipynb file
-
-## 6. Tips
-
-We recommend using VS Code for this project; to get started, see the [VS Code getting-started guide](https://blog.csdn.net/JohnJim0/article/details/126366454)
\ No newline at end of file
diff --git a/projects/assets/pseudocodes/pseudocodes.pdf b/projects/assets/pseudocodes/pseudocodes.pdf
deleted file mode 100644
index 5232181..0000000
Binary files a/projects/assets/pseudocodes/pseudocodes.pdf and /dev/null differ
diff --git a/projects/assets/pseudocodes/pseudocodes.tex b/projects/assets/pseudocodes/pseudocodes.tex
deleted file mode 100644
index 0033ae8..0000000
--- a/projects/assets/pseudocodes/pseudocodes.tex
+++ /dev/null
@@ -1,359 +0,0 @@
-\documentclass[11pt]{ctexart}
-\usepackage{ctex}
-\usepackage{algorithm}
-\usepackage{algorithmic}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{hyperref}
-% \usepackage[hidelinks]{hyperref} removes the red boxes around hyperlinks
-\usepackage{setspace}
-\usepackage{titlesec}
-\usepackage{float} % this package enables the [H] placement specifier
-% \pagestyle{plain} % removes the header but keeps the footer page number; replace plain with empty to remove both
-
-% Use circled numbers for footnotes
-\usepackage{pifont}
-\makeatletter
-\newcommand*{\circnum}[1]{%
-    \expandafter\@circnum\csname c@#1\endcsname
-}
-\newcommand*{\@circnum}[1]{%
-    \ifnum#1<1 %
-        \@ctrerr
-    \else
-        \ifnum#1>20 %
-            \@ctrerr
-        \else
-            \ding{\the\numexpr 171+(#1)\relax}%
-        \fi
-    \fi
-}
-\makeatother
-
-\renewcommand*{\thefootnote}{\circnum{footnote}}
-
-\begin{document}
-\tableofcontents % table of contents; compile twice (or save twice in VS Code) for it to appear
-% \singlespacing
-\clearpage
-\section{Spare Template}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Test
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Footnote}
-\clearpage
-\section{Q-learning Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{Q-learning Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the Q-table $Q(s,a)$ arbitrarily, except that $Q(s_{terminal},\cdot)=0$, i.e. the Q-values of the terminal state are 0
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Sample action $a_t$ according to the $\varepsilon$-greedy policy
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE {\bfseries Update the policy:}
-                \STATE $Q(s_t,a_t) \leftarrow Q(s_t,a_t)+\alpha[r_t+\gamma\max _{a}Q(s_{t+1},a)-Q(s_t,a_t)]$
-                \STATE Update the state $s_t \leftarrow s_{t+1}$
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Reinforcement Learning: An Introduction}
-\clearpage
-\section{Sarsa Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{Sarsa Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the Q-table $Q(s,a)$ arbitrarily, except that $Q(s_{terminal},\cdot)=0$, i.e. the Q-values of the terminal state are 0
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \STATE Sample the initial action $a_1$ according to the $\varepsilon$-greedy policy
-            \FOR {time step = $1,T$}
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE Sample action $a_{t+1}$ in $s_{t+1}$ according to the $\varepsilon$-greedy policy
-                \STATE {\bfseries Update the policy:}
-                \STATE $Q(s_t,a_t) \leftarrow Q(s_t,a_t)+\alpha[r_t+\gamma Q(s_{t+1},a_{t+1})-Q(s_t,a_t)]$
-                \STATE Update the state $s_t \leftarrow s_{t+1}$
-                \STATE Update the action $a_t \leftarrow a_{t+1}$
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Reinforcement Learning: An Introduction}
-\clearpage
-
-\section{DQN Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{DQN Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \renewcommand{\algorithmicrequire}{\textbf{Input:}}
-    \renewcommand{\algorithmicensure}{\textbf{Output:}}
-    \begin{algorithmic}[1]
-        % \REQUIRE $n \geq 0 \vee x \neq 0$ % input
-        % \ENSURE $y = x^n$ % output
-        \STATE Initialize the policy network parameters $\theta$ % initialization
-        \STATE Copy the parameters to the target network $\hat{Q} \leftarrow Q$
-        \STATE Initialize the replay buffer $D$
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Sample action $a_t$ according to the $\varepsilon$-greedy policy
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE Store the transition $(s_t,a_t,r_t,s_{t+1})$ in the replay buffer $D$
-                \STATE Update the environment state $s_t \leftarrow s_{t+1}$
-                \STATE {\bfseries Update the policy:}
-                \STATE Sample a batch of transitions from $D$
-                \STATE Compute the target $Q$-value $y_{j}$\footnotemark[2]
-                \STATE Perform a stochastic gradient descent step on the loss $L(\theta)=\left(y_{j}-Q\left(s_{j}, a_{j} ; \theta\right)\right)^{2}$ with respect to $\theta$\footnotemark[3]
-            \ENDFOR
-            \STATE Every $C$ episodes, copy the parameters $\hat{Q}\leftarrow Q$\footnotemark[4]
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Playing Atari with Deep Reinforcement Learning}
-\footnotetext[2]{$y_{j}= \begin{cases}r_{j} & \text{for terminal } s_{j+1} \\ r_{j}+\gamma \max _{a^{\prime}} \hat{Q}\left(s_{j+1}, a^{\prime}\right) & \text{for non-terminal } s_{j+1}\end{cases}$}
-\footnotetext[3]{$\theta_i \leftarrow \theta_i - \lambda \nabla_{\theta_{i}} L_{i}\left(\theta_{i}\right)$}
-\footnotetext[4]{As in the original paper, this copy can instead be moved into the inner loop and done every $C$ steps, but that is less stable than copying every $C$ episodes}
-\clearpage
-
-
-\section{PER\_DQN Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{PER\_DQN Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \renewcommand{\algorithmicrequire}{\textbf{Input:}}
-    \renewcommand{\algorithmicensure}{\textbf{Output:}}
-    \begin{algorithmic}[1]
-        % \REQUIRE $n \geq 0 \vee x \neq 0$ % input
-        % \ENSURE $y = x^n$ % output
-        \STATE Initialize the policy network parameters $\theta$ % initialization
-        \STATE Copy the parameters to the target network $\hat{Q} \leftarrow Q$
-        \STATE Initialize the replay buffer $D$
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Sample action $a_t$ according to the $\varepsilon$-greedy policy
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE Store the transition $(s_t,a_t,r_t,s_{t+1})$ in the replay buffer $D$, with its priority $p_t$ set from its TD-error loss
-                \STATE Update the environment state $s_t \leftarrow s_{t+1}$
-                \STATE {\bfseries Update the policy:}
-                \STATE Sample a batch of transitions from $D$ according to their priorities, drawing each sample with probability $P(j)=p_j^\alpha / \sum_i p_i^\alpha$
-                \STATE Compute the importance-sampling weight of each sample, $w_j=(N \cdot P(j))^{-\beta} / \max _i w_i$
-                \STATE Compute the TD error $\delta_j$ and update the priority $p_j$ from it
-                \STATE Compute the target $Q$-value $y_{j}$\footnotemark[2]
-                \STATE Weight the loss by the importance-sampling weight, $L(\theta)=w_j\left(y_{j}-Q\left(s_{j}, a_{j} ; \theta\right)\right)^{2}$, and perform a stochastic gradient descent step on it with respect to $\theta$\footnotemark[3]
-            \ENDFOR
-            \STATE Every $C$ episodes, copy the parameters $\hat{Q}\leftarrow Q$\footnotemark[4]
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Prioritized Experience Replay}
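The Q-learning and Sarsa pseudocode above share the same TD update and differ only in the bootstrap term. Q-learning uses $\max_a Q(s_{t+1},a)$, while Sarsa uses $Q(s_{t+1},a_{t+1})$ for the action it actually sampled next. The following is a minimal Python sketch of one update step for each. It assumes a tabular Q stored in a `collections.defaultdict` keyed by `(state, action)`. The function names and signatures are illustrative and are not taken from the deleted repository's code.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma, n_actions):
    """One Q-learning step (off-policy): bootstrap from the greedy action in s_next."""
    # Q(s_terminal, .) = 0, so a terminal transition bootstraps from 0.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """One Sarsa step (on-policy): bootstrap from the action a_next actually sampled in s_next."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    td_target = r + gamma * next_value
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]
```

Consider an example with $Q(1,0)=2$, $Q(1,1)=5$, $\alpha=0.5$, $\gamma=0.9$, and a reward of 1 for moving from state 0 to state 1. The Q-learning target is $1+0.9\cdot 5=5.5$, giving a new $Q(0,0)=2.75$. Sarsa with $a_{t+1}=0$ uses the target $1+0.9\cdot 2=2.8$ instead and gives $1.4$.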
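The PER sampling step above can be sketched as follows. This is an illustration only and is not the deleted repository's implementation. It assumes the priorities are kept in a plain Python list rather than the sum-tree a practical buffer would use, and `per_probabilities` and `per_sample` are hypothetical names. The normalizing maximum is taken over the whole buffer, i.e. the weight of the least likely transition, so every weight is at most 1. Normalizing over the sampled batch is another reasonable choice.

```python
import random


def per_probabilities(priorities, alpha):
    """P(j) = p_j^alpha / sum_i p_i^alpha; alpha = 0 gives uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [x / total for x in scaled]


def per_sample(priorities, batch_size, alpha, beta, rng=random):
    """Draw batch_size indices by priority; return (indices, normalized IS weights)."""
    probs = per_probabilities(priorities, alpha)
    n = len(probs)
    # Sample with replacement, proportionally to P(j).
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # w_j = (N * P(j))^(-beta); the largest weight belongs to the least likely transition.
    max_w = (n * min(probs)) ** (-beta)
    weights = [(n * probs[j]) ** (-beta) / max_w for j in indices]
    return indices, weights
```

With `beta=1` the weights fully correct the sampling bias. Smaller `beta` values correct it only partially, which is why the PER paper anneals $\beta$ toward 1 during training.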
-\footnotetext[2]{$y_{j}= \begin{cases}r_{j} & \text{for terminal } s_{j+1} \\ r_{j}+\gamma \max _{a^{\prime}} \hat{Q}\left(s_{j+1}, a^{\prime}\right) & \text{for non-terminal } s_{j+1}\end{cases}$}
-\footnotetext[3]{$\theta_i \leftarrow \theta_i - \lambda \nabla_{\theta_{i}} L_{i}\left(\theta_{i}\right)$}
-\footnotetext[4]{As in the original paper, this copy can instead be moved into the inner loop and done every $C$ steps, but that is less stable than copying every $C$ episodes}
-\clearpage
-
-
-\section{Policy Gradient Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{REINFORCE: Monte-Carlo Policy Gradient}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the policy parameters $\boldsymbol{\theta} \in \mathbb{R}^{d^{\prime}}$ (e.g., to $\mathbf{0}$)
-        \FOR {episode = $1,M$}
-            \STATE Sample one (or several) episodes of transitions with the policy $\pi(\cdot \mid \cdot, \boldsymbol{\theta})$
-            \FOR {time step $t = 0,1,2,\ldots,T-1$}
-                \STATE Compute the return $G \leftarrow \sum_{k=t+1}^{T} \gamma^{k-t-1} R_{k}$
-                \STATE Update the policy $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta}+\alpha \gamma^{t} G \nabla \ln \pi\left(A_{t} \mid S_{t}, \boldsymbol{\theta}\right)$
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Reinforcement Learning: An Introduction}
-\clearpage
-\section{Advantage Actor Critic Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{Q Actor Critic Algorithm}}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the Actor parameters $\theta$ and the Critic parameters $w$
-        \FOR {episode = $1,M$}
-            \STATE Sample one (or several) episodes of transitions with the policy $\pi_{\theta}(a|s)$
-            \STATE {\bfseries Update the Critic parameters\footnotemark[1]}
-            \FOR {time step = $t+1,1$}
-                \STATE Compute the Advantage, i.e. $ \delta_t = r_t + \gamma Q_w(s_{t+1},a_{t+1})-Q_w(s_t,a_t)$
-                \STATE $w \leftarrow w+\alpha_{w} \delta_{t} \nabla_{w} Q_w(s_t,a_t)$
-                \STATE $a_t \leftarrow a_{t+1}$, $s_t \leftarrow s_{t+1}$
-            \ENDFOR
-            \STATE Update the Actor parameters $\theta \leftarrow \theta+\alpha_{\theta} Q_{w}(s, a) \nabla_{\theta} \log \pi_{\theta}(a \mid s)$
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Given the form of the TD error, it is more convenient here to compute the Advantage backward, from $t+1$ down to $1$}
-
-\clearpage
-
-\section{PPO-Clip Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{PPO-Clip Algorithm}\footnotemark[1]\footnotemark[2]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the policy network (Actor) parameters $\theta$ and the value network (Critic) parameters $\phi$
-        \STATE Initialize the clip parameter $\epsilon$
-        \STATE Initialize the number of epochs $K$
-        \STATE Initialize the replay buffer $D$
-        \STATE Initialize the total step count $c=0$
-        \FOR {episode = $1,2,\cdots,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step $t = 1,2,\cdots,T$}
-                \STATE Increment the total step count $c \leftarrow c+1$
-                \STATE Select $a_t$ according to the policy $\pi_{\theta}$
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE Store $(s_t,a_t,r_t,s_{t+1})$ in the replay buffer $D$
-                \IF{$c$ is divisible by $C$\footnotemark[3]}
-                    \FOR {$k= 1,2,\cdots,K$}
-                        \STATE Test
-                    \ENDFOR
-                    \STATE Clear the replay buffer $D$
-                \ENDIF
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Proximal Policy Optimization Algorithms}
-\footnotetext[2]{https://spinningup.openai.com/en/latest/algorithms/ppo.html}
-\footnotetext[3]{\bfseries i.e., the policy is updated every $C$ time steps}
-\clearpage
-\section{DDPG Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{DDPG Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the parameters $\theta^Q$ and $\theta^{\mu}$ of the critic network $Q\left(s, a \mid \theta^Q\right)$ and the actor network $\mu(s|\theta^{\mu})$
-        \STATE Initialize the corresponding target network parameters, i.e. $\theta^{Q^{\prime}} \leftarrow \theta^Q, \theta^{\mu^{\prime}} \leftarrow \theta^\mu$
-        \STATE Initialize the replay buffer $R$
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Select action $a_t=\mu\left(s_t \mid \theta^\mu\right)+\mathcal{N}_t$, where $\mathcal{N}_t$ is exploration noise
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE Store the transition $(s_t,a_t,r_t,s_{t+1})$ in the replay buffer $R$
-                \STATE Update the environment state $s_t \leftarrow s_{t+1}$
-                \STATE {\bfseries Update the policy:}
-                \STATE Sample a random minibatch of $N$ transitions $(s_i,a_i,r_i,s_{i+1})$ from $R$
-                \STATE Compute $y_i=r_i+\gamma Q^{\prime}\left(s_{i+1}, \mu^{\prime}\left(s_{i+1} \mid \theta^{\mu^{\prime}}\right) \mid \theta^{Q^{\prime}}\right)$
-                \STATE Update the critic parameters by minimizing the loss $L=\frac{1}{N} \sum_i\left(y_i-Q\left(s_i, a_i \mid \theta^Q\right)\right)^2$
-                \STATE Update the actor parameters with the sampled policy gradient $\left.\left.\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \nabla_a Q\left(s, a \mid \theta^Q\right)\right|_{s=s_i, a=\mu\left(s_i\right)} \nabla_{\theta^\mu} \mu\left(s \mid \theta^\mu\right)\right|_{s_i}$
-                \STATE Soft-update the target networks: $\theta^{Q^{\prime}} \leftarrow \tau \theta^Q+(1-\tau) \theta^{Q^{\prime}}$,
-                $\theta^{\mu^{\prime}} \leftarrow \tau \theta^\mu+(1-\tau) \theta^{\mu^{\prime}}$
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Continuous control with deep reinforcement learning}
-\clearpage
-\section{SoftQ Algorithm}
-\begin{algorithm}[H]
-    \floatname{algorithm}{{SoftQ Algorithm}}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1]
-        \STATE Initialize the parameters $\theta$ and $\phi$ % initialization
-        \STATE Copy the parameters $\bar{\theta} \leftarrow \theta, \bar{\phi} \leftarrow \phi$
-        \STATE Initialize the replay buffer $D$
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Sample an action via $\mathbf{a}_{t} \leftarrow f^{\phi}\left(\xi ; \mathbf{s}_{t}\right)$, where $\xi \sim \mathcal{N}(\mathbf{0}, \boldsymbol{I})$
-                \STATE The environment returns reward $r_t$ and next state $s_{t+1}$ in response to $a_t$
-                \STATE Store the transition $(s_t,a_t,r_t,s_{t+1})$ in the replay buffer $D$
-                \STATE Update the environment state $s_t \leftarrow s_{t+1}$
-                \STATE {\bfseries Update the soft Q-function parameters:}
-                \STATE For each $s^{(i)}_{t+1}$, sample $\left\{\mathbf{a}^{(i, j)}\right\}_{j=0}^{M} \sim q_{\mathbf{a}^{\prime}}$
-                \STATE Compute the empirical soft values $V_{\mathrm{soft}}^{\theta}\left(\mathbf{s}_{t}\right)$\footnotemark[1]
-                \STATE Compute the empirical gradient of $J_{Q}(\theta)$\footnotemark[2]
-                \STATE Update $\theta$ with ADAM according to $J_{Q}(\theta)$
-                \STATE {\bfseries Update the policy:}
-                \STATE For each $s^{(i)}_{t}$, sample $\left\{\xi^{(i, j)}\right\}_{j=0}^{M} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{I})$
-                \STATE Compute $\mathbf{a}_{t}^{(i, j)}=f^{\phi}\left(\xi^{(i, j)}, \mathbf{s}_{t}^{(i)}\right)$
-                \STATE Compute $\Delta f^{\phi}\left(\cdot ; \mathbf{s}_{t}\right)$ from the empirical estimate\footnotemark[3]
-                \STATE Compute the empirical estimate $\frac{\partial J_{\pi}\left(\phi ; \mathbf{s}_{t}\right)}{\partial \phi} \propto \mathbb{E}_{\xi}\left[\Delta f^{\phi}\left(\xi ; \mathbf{s}_{t}\right) \frac{\partial f^{\phi}\left(\xi ; \mathbf{s}_{t}\right)}{\partial \phi}\right]$, i.e. $\hat{\nabla}_{\phi} J_{\pi}$
-                \STATE Update $\phi$ with ADAM according to $\hat{\nabla}_{\phi} J_{\pi}$
-            \ENDFOR
-            \STATE Every $C$ episodes, copy the parameters $\bar{\theta} \leftarrow \theta, \bar{\phi} \leftarrow \phi$
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{$V_{\mathrm{soft}}^{\theta}\left(\mathbf{s}_{t}\right)=\alpha \log \mathbb{E}_{q_{\mathbf{a}^{\prime}}}\left[\frac{\exp \left(\frac{1}{\alpha} Q_{\mathrm{soft}}^{\theta}\left(\mathbf{s}_{t}, \mathbf{a}^{\prime}\right)\right)}{q_{\mathbf{a}^{\prime}}\left(\mathbf{a}^{\prime}\right)}\right]$}
-\footnotetext[2]{$J_{Q}(\theta)=\mathbb{E}_{\mathbf{s}_{t} \sim q_{\mathbf{s}_{t}}, \mathbf{a}_{t} \sim q_{\mathbf{a}_{t}}}\left[\frac{1}{2}\left(\hat{Q}_{\mathrm{soft}}^{\bar{\theta}}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)-Q_{\mathrm{soft}}^{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\right)^{2}\right]$}
-\footnotetext[3]{$\begin{aligned} \Delta f^{\phi}\left(\cdot ; \mathbf{s}_{t}\right)=& \mathbb{E}_{\mathbf{a}_{t} \sim \pi^{\phi}}\left[\left.\kappa\left(\mathbf{a}_{t}, f^{\phi}\left(\cdot ; \mathbf{s}_{t}\right)\right) \nabla_{\mathbf{a}^{\prime}} Q_{\mathrm{soft}}^{\theta}\left(\mathbf{s}_{t}, \mathbf{a}^{\prime}\right)\right|_{\mathbf{a}^{\prime}=\mathbf{a}_{t}}\right.\\ &\left.+\left.\alpha \nabla_{\mathbf{a}^{\prime}} \kappa\left(\mathbf{a}^{\prime}, f^{\phi}\left(\cdot ; \mathbf{s}_{t}\right)\right)\right|_{\mathbf{a}^{\prime}=\mathbf{a}_{t}}\right] \end{aligned}$}
-\clearpage
-\section{SAC-S Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{SAC-S Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1] % [1] shows line numbers
-        \STATE Initialize the parameters $\psi, \bar{\psi}, \theta, \phi$
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Sample action $a_t$ from $\boldsymbol{a}_{t} \sim \pi_{\phi}\left(\boldsymbol{a}_{t} \mid \mathbf{s}_{t}\right)$
-                \STATE The environment returns the reward and the next state $\mathbf{s}_{t+1} \sim p\left(\mathbf{s}_{t+1} \mid \mathbf{s}_{t}, \mathbf{a}_{t}\right)$
-                \STATE Store the transition in the replay buffer: $\mathcal{D} \leftarrow \mathcal{D} \cup\left\{\left(\mathbf{s}_{t}, \mathbf{a}_{t}, r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right), \mathbf{s}_{t+1}\right)\right\}$
-                \STATE Update the environment state $s_t \leftarrow s_{t+1}$
-                \STATE {\bfseries Update the policy:}
-                \STATE $\psi \leftarrow \psi-\lambda_{V} \hat{\nabla}_{\psi} J_{V}(\psi)$
-                \STATE $\theta_{i} \leftarrow \theta_{i}-\lambda_{Q} \hat{\nabla}_{\theta_{i}} J_{Q}\left(\theta_{i}\right)$ for $i \in\{1,2\}$
-                \STATE $\phi \leftarrow \phi-\lambda_{\pi} \hat{\nabla}_{\phi} J_{\pi}(\phi)$
-                \STATE $\bar{\psi} \leftarrow \tau \psi+(1-\tau) \bar{\psi}$
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor}
-\clearpage
-\section{SAC Algorithm}
-\begin{algorithm}[H] % [H] fixes the position
-    \floatname{algorithm}{{SAC Algorithm}\footnotemark[1]}
-    \renewcommand{\thealgorithm}{} % remove the algorithm number
-    \caption{}
-    \begin{algorithmic}[1]
-        \STATE Initialize the network parameters $\theta_1,\theta_2$ and $\phi$ % initialization
-        \STATE Copy the parameters to the target networks $\bar{\theta}_1 \leftarrow \theta_1,\bar{\theta}_2 \leftarrow \theta_2$
-        \STATE Initialize the replay buffer $D$
-        \FOR {episode = $1,M$}
-            \STATE Reset the environment and obtain the initial state $s_1$
-            \FOR {time step = $1,T$}
-                \STATE Sample action $a_t$ from $\boldsymbol{a}_{t} \sim \pi_{\phi}\left(\boldsymbol{a}_{t} \mid \mathbf{s}_{t}\right)$
-                \STATE The environment returns the reward and the next state $\mathbf{s}_{t+1} \sim p\left(\mathbf{s}_{t+1} \mid \mathbf{s}_{t}, \mathbf{a}_{t}\right)$
-                \STATE Store the transition in the replay buffer: $\mathcal{D} \leftarrow \mathcal{D} \cup\left\{\left(\mathbf{s}_{t}, \mathbf{a}_{t}, r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right), \mathbf{s}_{t+1}\right)\right\}$
-                \STATE Update the environment state $s_t \leftarrow s_{t+1}$
-                \STATE {\bfseries Update the policy:}
-                \STATE Update the $Q$-functions: $\theta_{i} \leftarrow \theta_{i}-\lambda_{Q} \hat{\nabla}_{\theta_{i}} J_{Q}\left(\theta_{i}\right)$ for $i \in\{1,2\}$\footnotemark[2]\footnotemark[3]
-                \STATE Update the policy weights: $\phi \leftarrow \phi-\lambda_{\pi} \hat{\nabla}_{\phi} J_{\pi}(\phi)$\footnotemark[4]
-                \STATE Adjust the temperature: $\alpha \leftarrow \alpha-\lambda \hat{\nabla}_{\alpha} J(\alpha)$\footnotemark[5]
-                \STATE Update the target network weights: $\bar{\theta}_{i} \leftarrow \tau \theta_{i}+(1-\tau) \bar{\theta}_{i}$ for $i \in\{1,2\}$
-            \ENDFOR
-        \ENDFOR
-    \end{algorithmic}
-\end{algorithm}
-\footnotetext[1]{Soft Actor-Critic Algorithms and Applications}
-\footnotetext[2]{$J_{Q}(\theta)=\mathbb{E}_{\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) \sim \mathcal{D}}\left[\frac{1}{2}\left(Q_{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)-\left(r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)+\gamma \mathbb{E}_{\mathbf{s}_{t+1} \sim p}\left[V_{\bar{\theta}}\left(\mathbf{s}_{t+1}\right)\right]\right)\right)^{2}\right]$}
-\footnotetext[3]{$\hat{\nabla}_{\theta} J_{Q}(\theta)=\nabla_{\theta} Q_{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\left(Q_{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)-\left(r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)+\gamma\left(Q_{\bar{\theta}}\left(\mathbf{s}_{t+1}, \mathbf{a}_{t+1}\right)-\alpha \log \left(\pi_{\phi}\left(\mathbf{a}_{t+1} \mid \mathbf{s}_{t+1}\right)\right)\right)\right)\right)$}
-\footnotetext[4]{$\hat{\nabla}_{\phi} J_{\pi}(\phi)=\nabla_{\phi} \alpha \log \left(\pi_{\phi}\left(\mathbf{a}_{t} \mid \mathbf{s}_{t}\right)\right)+\left(\nabla_{\mathbf{a}_{t}} \alpha \log \left(\pi_{\phi}\left(\mathbf{a}_{t} \mid \mathbf{s}_{t}\right)\right)-\nabla_{\mathbf{a}_{t}} Q\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\right) \nabla_{\phi} f_{\phi}\left(\epsilon_{t} ; \mathbf{s}_{t}\right)$, where $\mathbf{a}_{t}=f_{\phi}\left(\epsilon_{t} ; \mathbf{s}_{t}\right)$}
-\footnotetext[5]{$J(\alpha)=\mathbb{E}_{\mathbf{a}_{t} \sim \pi_{t}}\left[-\alpha \log \pi_{t}\left(\mathbf{a}_{t} \mid \mathbf{s}_{t}\right)-\alpha \overline{\mathcal{H}}\right]$}
-\clearpage
-\end{document}
\ No newline at end of file
diff --git a/projects/codes/A2C/README.md b/projects/codes/A2C/README.md
deleted file mode 100644
index 5252838..0000000
--- a/projects/codes/A2C/README.md
+++ /dev/null
@@ -1,7 +0,0 @@
-## Script Descriptions
-
-* `task0.py`: discrete-action task
-
-* `task1.py`: discrete-action task; the only difference from `task0.py` is that the Actor uses tanh instead of relu as its activation function, which performs better on `CartPole-v1`
-
-* `task2.py`: continuous-action task, #TODO still being debugged
\ No newline at end of file
diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/config.yaml b/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/config.yaml
deleted file mode 100644
index 865f5bb..0000000
--- a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/config.yaml
+++ /dev/null
@@ -1,24 +0,0 @@
-general_cfg:
-  algo_name: A2C
-  device: cuda
-  env_name: CartPole-v1
-  eval_eps: 10
-  load_checkpoint: true
-  load_path: Train_CartPole-v1_A2C_20221030-211435
-  max_steps: 200
-  mode: test
-  save_fig: true
-  seed: 1
-  show_fig: false
-  test_eps: 20
-  train_eps: 1000
-algo_cfg:
-  actor_hidden_dim: 256
-  actor_lr: 0.0003
-  batch_size: 64
-  buffer_size: 100000
-  critic_hidden_dim: 256
-  critic_lr: 0.001
-  gamma: 0.99
-  hidden_dim: 256
-  target_update: 4
diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/logs/log.txt b/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/logs/log.txt
deleted file mode 100644
index 0ecfa0a..0000000
--- a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/logs/log.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-2022-10-30 21:25:53 - r - INFO: - n_states: 4, n_actions: 2
-2022-10-30 21:25:55 - r - INFO: - Start testing!
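The REINFORCE pseudocode above computes $G=\sum_{k=t+1}^{T}\gamma^{k-t-1}R_k$ at every step. The same returns can be obtained in one backward pass using $G_t = R_{t+1} + \gamma G_{t+1}$. This is a minimal sketch, not the repository's code. It assumes `rewards[t]` holds $R_{t+1}$, the reward received after the action at step $t$. `reinforce_scales` (a hypothetical helper) keeps the $\gamma^t$ factor that appears in the pseudocode's update.

```python
def discounted_returns(rewards, gamma):
    """G_t = sum_{k=t+1}^{T} gamma^(k-t-1) R_k, where rewards[t] holds R_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backward so each step reuses the next one: G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def reinforce_scales(rewards, gamma):
    """Factor gamma^t * G_t that multiplies grad ln pi(A_t | S_t, theta) in the update."""
    return [gamma ** t * g for t, g in enumerate(discounted_returns(rewards, gamma))]
```

For example, three rewards of 1 with $\gamma=0.5$ give returns `[1.75, 1.5, 1.0]` and scales `[1.75, 0.75, 0.25]`.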
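The PPO-Clip pseudocode above leaves the body of its $K$-epoch loop as a placeholder. For reference, the clipped surrogate objective from the PPO paper is $L^{CLIP}=\mathbb{E}_t[\min(r_t A_t,\operatorname{clip}(r_t,1-\epsilon,1+\epsilon)A_t)]$, where $r_t=\pi_\theta(a_t|s_t)/\pi_{\theta_{old}}(a_t|s_t)$. It can be sketched on plain Python lists as below. This is not the repository's implementation. The function name and the default `eps=0.2` are assumptions, and a real update would maximize this value (or minimize its negative) with autograd.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate: min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t = pi_theta / pi_theta_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the min gives a pessimistic bound, so large policy moves earn no extra credit.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the ratio's contribution is capped at $1+\epsilon$. With a negative advantage, the min keeps the more negative of the two terms, so a ratio below $1-\epsilon$ is still penalized at the clipped value.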
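The DDPG and SAC pseudocode above both soft-update their target networks with $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$. Here is a minimal sketch on flat lists of floats, used as a hypothetical stand-in for network parameters. It is not taken from the repository. Setting `tau=1` reduces it to the hard copy $\hat{Q} \leftarrow Q$ used in the DQN pseudocode.

```python
def soft_update(target, source, tau):
    """In-place Polyak averaging: target <- tau * source + (1 - tau) * target."""
    if len(target) != len(source):
        raise ValueError("target and source must have the same number of parameters")
    for i, (t, s) in enumerate(zip(target, source)):
        target[i] = tau * s + (1.0 - tau) * t
    return target
```

With PyTorch modules, the same loop would typically iterate over `zip(target_net.parameters(), net.parameters())` inside `torch.no_grad()` and write each result with `Tensor.copy_`. A small `tau` makes the target track the online network slowly, which stabilizes the bootstrapped targets.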
-2022-10-30 21:25:55 - r - INFO: - Env: CartPole-v1, Algorithm: A2C, Device: cuda -2022-10-30 21:25:56 - r - INFO: - Episode: 1/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 2/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 3/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 4/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 5/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 6/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 7/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 8/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 9/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:56 - r - INFO: - Episode: 10/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 11/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 12/20, Reward: 190.0, Step: 190 -2022-10-30 21:25:57 - r - INFO: - Episode: 13/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 14/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 15/20, Reward: 96.0, Step: 96 -2022-10-30 21:25:57 - r - INFO: - Episode: 16/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 17/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 18/20, Reward: 200.0, Step: 200 -2022-10-30 21:25:57 - r - INFO: - Episode: 19/20, Reward: 112.0, Step: 112 -2022-10-30 21:25:57 - r - INFO: - Episode: 20/20, Reward: 200.0, Step: 200 diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/models/actor_checkpoint.pt b/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/models/actor_checkpoint.pt deleted file mode 100644 index 89d0854..0000000 Binary files a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/models/actor_checkpoint.pt and /dev/null differ diff --git 
a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/models/critic_checkpoint.pt b/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/models/critic_checkpoint.pt deleted file mode 100644 index 720f388..0000000 Binary files a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/models/critic_checkpoint.pt and /dev/null differ diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/results/learning_curve.png b/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/results/learning_curve.png deleted file mode 100644 index bfee34b..0000000 Binary files a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/results/res.csv b/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/results/res.csv deleted file mode 100644 index ce0e7d1..0000000 --- a/projects/codes/A2C/Test_CartPole-v1_A2C_20221030-212553/results/res.csv +++ /dev/null @@ -1,21 +0,0 @@ -episodes,rewards,steps -0,200.0,200 -1,200.0,200 -2,200.0,200 -3,200.0,200 -4,200.0,200 -5,200.0,200 -6,200.0,200 -7,200.0,200 -8,200.0,200 -9,200.0,200 -10,200.0,200 -11,190.0,190 -12,200.0,200 -13,200.0,200 -14,96.0,96 -15,200.0,200 -16,200.0,200 -17,200.0,200 -18,112.0,112 -19,200.0,200 diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/config.yaml b/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/config.yaml deleted file mode 100644 index 709a1e3..0000000 --- a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/config.yaml +++ /dev/null @@ -1,25 +0,0 @@ -general_cfg: - algo_name: A2C - device: cuda - env_name: CartPole-v1 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: true - load_path: Train_CartPole-v1_A2C_20221031-232138 - max_steps: 200 - mode: test - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 1000 -algo_cfg: - actor_hidden_dim: 256 - actor_lr: 0.0003 - batch_size: 64 - buffer_size: 100000 - 
critic_hidden_dim: 256 - critic_lr: 0.001 - gamma: 0.99 - hidden_dim: 256 - target_update: 4 diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/logs/log.txt b/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/logs/log.txt deleted file mode 100644 index d84edb2..0000000 --- a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/logs/log.txt +++ /dev/null @@ -1,28 +0,0 @@ -2022-10-31 23:33:16 - r - INFO: - n_states: 4, n_actions: 2 -2022-10-31 23:33:16 - r - INFO: - Actor model name: ActorSoftmaxTanh -2022-10-31 23:33:16 - r - INFO: - Critic model name: Critic -2022-10-31 23:33:16 - r - INFO: - ACMemory memory name: PGReplay -2022-10-31 23:33:16 - r - INFO: - agent name: A2C -2022-10-31 23:33:17 - r - INFO: - Start testing! -2022-10-31 23:33:17 - r - INFO: - Env: CartPole-v1, Algorithm: A2C, Device: cuda -2022-10-31 23:33:18 - r - INFO: - Episode: 1/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:18 - r - INFO: - Episode: 2/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:18 - r - INFO: - Episode: 3/20, Reward: 186.0, Step: 186 -2022-10-31 23:33:18 - r - INFO: - Episode: 4/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:18 - r - INFO: - Episode: 5/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 6/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 7/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 8/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 9/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 10/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 11/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 12/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 13/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 14/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 15/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - 
INFO: - Episode: 16/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 17/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 18/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:19 - r - INFO: - Episode: 19/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:20 - r - INFO: - Episode: 20/20, Reward: 200.0, Step: 200 -2022-10-31 23:33:20 - r - INFO: - Finish testing! diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/models/actor_checkpoint.pt b/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/models/actor_checkpoint.pt deleted file mode 100644 index 05bd7b6..0000000 Binary files a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/models/actor_checkpoint.pt and /dev/null differ diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/models/critic_checkpoint.pt b/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/models/critic_checkpoint.pt deleted file mode 100644 index 720f388..0000000 Binary files a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/models/critic_checkpoint.pt and /dev/null differ diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/results/learning_curve.png b/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/results/learning_curve.png deleted file mode 100644 index 33274af..0000000 Binary files a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/results/res.csv b/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/results/res.csv deleted file mode 100644 index 571b1e6..0000000 --- a/projects/codes/A2C/Test_CartPole-v1_A2C_20221031-233316/results/res.csv +++ /dev/null @@ -1,21 +0,0 @@ -episodes,rewards,steps -0,200.0,200 -1,200.0,200 -2,186.0,186 -3,200.0,200 -4,200.0,200 -5,200.0,200 -6,200.0,200 -7,200.0,200 -8,200.0,200 -9,200.0,200 -10,200.0,200 -11,200.0,200 -12,200.0,200 -13,200.0,200 -14,200.0,200 
-15,200.0,200
-16,200.0,200
-17,200.0,200
-18,200.0,200
-19,200.0,200
diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/config.yaml b/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/config.yaml
deleted file mode 100644
index 7dde5b7..0000000
--- a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/config.yaml
+++ /dev/null
@@ -1,23 +0,0 @@
-general_cfg:
-  algo_name: A2C
-  device: cuda
-  env_name: CartPole-v1
-  eval_eps: 10
-  load_checkpoint: false
-  load_path: tasks
-  max_steps: 200
-  mode: train
-  save_fig: true
-  seed: 1
-  show_fig: false
-  test_eps: 20
-  train_eps: 1000
-algo_cfg:
-  actor_hidden_dim: 256
-  actor_lr: 0.0003
-  batch_size: 64
-  buffer_size: 100000
-  critic_hidden_dim: 256
-  critic_lr: 0.001
-  gamma: 0.99
-  hidden_dim: 256
diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/logs/log.txt b/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/logs/log.txt
deleted file mode 100644
index b13b335..0000000
--- a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/logs/log.txt
+++ /dev/null
@@ -1,1066 +0,0 @@
-2022-10-30 21:14:35 - r - INFO: - n_states: 4, n_actions: 2
-2022-10-30 21:14:35 - r - INFO: - Start training!
-2022-10-30 21:14:35 - r - INFO: - Env: CartPole-v1, Algorithm: A2C, Device: cuda
-2022-10-30 21:14:37 - r - INFO: - Episode: 1/1000, Reward: 25.0, Step: 25
-2022-10-30 21:14:38 - r - INFO: - Current episode 1 has the best eval reward: 29.2
-2022-10-30 21:14:38 - r - INFO: - Episode: 2/1000, Reward: 13.0, Step: 13
-2022-10-30 21:14:38 - r - INFO: - Episode: 3/1000, Reward: 58.0, Step: 58
-2022-10-30 21:14:38 - r - INFO: - Episode: 4/1000, Reward: 10.0, Step: 10
-2022-10-30 21:14:38 - r - INFO: - Episode: 5/1000, Reward: 39.0, Step: 39
-2022-10-30 21:14:38 - r - INFO: - Episode: 6/1000, Reward: 39.0, Step: 39
-2022-10-30 21:14:38 - r - INFO: - Episode: 7/1000, Reward: 25.0, Step: 25
-2022-10-30 21:14:39 - r - INFO: - Episode: 8/1000, Reward: 22.0, Step: 22
-2022-10-30 21:14:39 - r - INFO: - Episode: 9/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:39 - r - INFO: - Episode: 10/1000, Reward: 27.0, Step: 27
-2022-10-30 21:14:39 - r - INFO: - Episode: 11/1000, Reward: 35.0, Step: 35
-2022-10-30 21:14:40 - r - INFO: - Episode: 12/1000, Reward: 26.0, Step: 26
-2022-10-30 21:14:40 - r - INFO: - Episode: 13/1000, Reward: 38.0, Step: 38
-2022-10-30 21:14:40 - r - INFO: - Episode: 14/1000, Reward: 29.0, Step: 29
-2022-10-30 21:14:40 - r - INFO: - Episode: 15/1000, Reward: 50.0, Step: 50
-2022-10-30 21:14:40 - r - INFO: - Episode: 16/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:40 - r - INFO: - Episode: 17/1000, Reward: 52.0, Step: 52
-2022-10-30 21:14:41 - r - INFO: - Current episode 17 has the best eval reward: 32.9
-2022-10-30 21:14:41 - r - INFO: - Episode: 18/1000, Reward: 12.0, Step: 12
-2022-10-30 21:14:41 - r - INFO: - Episode: 19/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:41 - r - INFO: - Episode: 20/1000, Reward: 38.0, Step: 38
-2022-10-30 21:14:41 - r - INFO: - Current episode 20 has the best eval reward: 38.9
-2022-10-30 21:14:41 - r - INFO: - Episode: 21/1000, Reward: 22.0, Step: 22
-2022-10-30 21:14:41 - r - INFO: - Episode: 22/1000, Reward: 36.0, Step: 36
-2022-10-30 21:14:42 - r - INFO: - Episode: 23/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:42 - r - INFO: - Episode: 24/1000, Reward: 35.0, Step: 35
-2022-10-30 21:14:42 - r - INFO: - Episode: 25/1000, Reward: 90.0, Step: 90
-2022-10-30 21:14:42 - r - INFO: - Episode: 26/1000, Reward: 29.0, Step: 29
-2022-10-30 21:14:42 - r - INFO: - Episode: 27/1000, Reward: 16.0, Step: 16
-2022-10-30 21:14:43 - r - INFO: - Episode: 28/1000, Reward: 25.0, Step: 25
-2022-10-30 21:14:43 - r - INFO: - Episode: 29/1000, Reward: 46.0, Step: 46
-2022-10-30 21:14:43 - r - INFO: - Episode: 30/1000, Reward: 33.0, Step: 33
-2022-10-30 21:14:43 - r - INFO: - Episode: 31/1000, Reward: 11.0, Step: 11
-2022-10-30 21:14:43 - r - INFO: - Episode: 32/1000, Reward: 27.0, Step: 27
-2022-10-30 21:14:44 - r - INFO: - Episode: 33/1000, Reward: 32.0, Step: 32
-2022-10-30 21:14:44 - r - INFO: - Current episode 33 has the best eval reward: 39.2
-2022-10-30 21:14:44 - r - INFO: - Episode: 34/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:44 - r - INFO: - Episode: 35/1000, Reward: 11.0, Step: 11
-2022-10-30 21:14:44 - r - INFO: - Episode: 36/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:44 - r - INFO: - Episode: 37/1000, Reward: 51.0, Step: 51
-2022-10-30 21:14:44 - r - INFO: - Episode: 38/1000, Reward: 29.0, Step: 29
-2022-10-30 21:14:45 - r - INFO: - Current episode 38 has the best eval reward: 41.7
-2022-10-30 21:14:45 - r - INFO: - Episode: 39/1000, Reward: 50.0, Step: 50
-2022-10-30 21:14:45 - r - INFO: - Current episode 39 has the best eval reward: 48.5
-2022-10-30 21:14:45 - r - INFO: - Episode: 40/1000, Reward: 19.0, Step: 19
-2022-10-30 21:14:45 - r - INFO: - Episode: 41/1000, Reward: 41.0, Step: 41
-2022-10-30 21:14:45 - r - INFO: - Episode: 42/1000, Reward: 28.0, Step: 28
-2022-10-30 21:14:46 - r - INFO: - Episode: 43/1000, Reward: 71.0, Step: 71
-2022-10-30 21:14:46 - r - INFO: - Episode: 44/1000, Reward: 45.0, Step: 45
-2022-10-30 21:14:46 - r - INFO: - Episode: 45/1000, Reward: 42.0, Step: 42
-2022-10-30 21:14:46 - r - INFO: - Current episode 45 has the best eval reward: 49.6
-2022-10-30 21:14:46 - r - INFO: - Episode: 46/1000, Reward: 39.0, Step: 39
-2022-10-30 21:14:47 - r - INFO: - Episode: 47/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:47 - r - INFO: - Episode: 48/1000, Reward: 14.0, Step: 14
-2022-10-30 21:14:47 - r - INFO: - Episode: 49/1000, Reward: 23.0, Step: 23
-2022-10-30 21:14:47 - r - INFO: - Episode: 50/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:47 - r - INFO: - Episode: 51/1000, Reward: 34.0, Step: 34
-2022-10-30 21:14:48 - r - INFO: - Episode: 52/1000, Reward: 14.0, Step: 14
-2022-10-30 21:14:48 - r - INFO: - Episode: 53/1000, Reward: 41.0, Step: 41
-2022-10-30 21:14:48 - r - INFO: - Episode: 54/1000, Reward: 99.0, Step: 99
-2022-10-30 21:14:48 - r - INFO: - Episode: 55/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:49 - r - INFO: - Episode: 56/1000, Reward: 52.0, Step: 52
-2022-10-30 21:14:49 - r - INFO: - Episode: 57/1000, Reward: 34.0, Step: 34
-2022-10-30 21:14:49 - r - INFO: - Episode: 58/1000, Reward: 73.0, Step: 73
-2022-10-30 21:14:49 - r - INFO: - Episode: 59/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:49 - r - INFO: - Episode: 60/1000, Reward: 27.0, Step: 27
-2022-10-30 21:14:50 - r - INFO: - Episode: 61/1000, Reward: 51.0, Step: 51
-2022-10-30 21:14:50 - r - INFO: - Episode: 62/1000, Reward: 46.0, Step: 46
-2022-10-30 21:14:50 - r - INFO: - Episode: 63/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:50 - r - INFO: - Episode: 64/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:51 - r - INFO: - Episode: 65/1000, Reward: 44.0, Step: 44
-2022-10-30 21:14:51 - r - INFO: - Episode: 66/1000, Reward: 16.0, Step: 16
-2022-10-30 21:14:51 - r - INFO: - Episode: 67/1000, Reward: 39.0, Step: 39
-2022-10-30 21:14:51 - r - INFO: - Episode: 68/1000, Reward: 30.0, Step: 30
-2022-10-30 21:14:51 - r - INFO: - Episode: 69/1000, Reward: 37.0, Step: 37
-2022-10-30 21:14:52 - r - INFO: - Episode: 70/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:52 - r - INFO: - Episode: 71/1000, Reward: 21.0, Step: 21
-2022-10-30 21:14:52 - r - INFO: - Episode: 72/1000, Reward: 13.0, Step: 13
-2022-10-30 21:14:52 - r - INFO: - Episode: 73/1000, Reward: 65.0, Step: 65
-2022-10-30 21:14:53 - r - INFO: - Episode: 74/1000, Reward: 45.0, Step: 45
-2022-10-30 21:14:53 - r - INFO: - Episode: 75/1000, Reward: 45.0, Step: 45
-2022-10-30 21:14:53 - r - INFO: - Episode: 76/1000, Reward: 46.0, Step: 46
-2022-10-30 21:14:53 - r - INFO: - Episode: 77/1000, Reward: 13.0, Step: 13
-2022-10-30 21:14:53 - r - INFO: - Episode: 78/1000, Reward: 33.0, Step: 33
-2022-10-30 21:14:54 - r - INFO: - Episode: 79/1000, Reward: 30.0, Step: 30
-2022-10-30 21:14:54 - r - INFO: - Episode: 80/1000, Reward: 52.0, Step: 52
-2022-10-30 21:14:54 - r - INFO: - Episode: 81/1000, Reward: 27.0, Step: 27
-2022-10-30 21:14:54 - r - INFO: - Episode: 82/1000, Reward: 30.0, Step: 30
-2022-10-30 21:14:55 - r - INFO: - Episode: 83/1000, Reward: 47.0, Step: 47
-2022-10-30 21:14:55 - r - INFO: - Episode: 84/1000, Reward: 56.0, Step: 56
-2022-10-30 21:14:55 - r - INFO: - Episode: 85/1000, Reward: 19.0, Step: 19
-2022-10-30 21:14:55 - r - INFO: - Episode: 86/1000, Reward: 33.0, Step: 33
-2022-10-30 21:14:56 - r - INFO: - Episode: 87/1000, Reward: 25.0, Step: 25
-2022-10-30 21:14:56 - r - INFO: - Episode: 88/1000, Reward: 41.0, Step: 41
-2022-10-30 21:14:56 - r - INFO: - Episode: 89/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:56 - r - INFO: - Episode: 90/1000, Reward: 58.0, Step: 58
-2022-10-30 21:14:56 - r - INFO: - Episode: 91/1000, Reward: 35.0, Step: 35
-2022-10-30 21:14:57 - r - INFO: - Episode: 92/1000, Reward: 23.0, Step: 23
-2022-10-30 21:14:57 - r - INFO: - Episode: 93/1000, Reward: 12.0, Step: 12
-2022-10-30 21:14:57 - r - INFO: - Episode: 94/1000, Reward: 20.0, Step: 20
-2022-10-30 21:14:57 - r - INFO: - Episode: 95/1000, Reward: 10.0, Step: 10
-2022-10-30 21:14:57 - r - INFO: - Episode: 96/1000, Reward: 49.0, Step: 49
-2022-10-30 21:14:58 - r - INFO: - Episode: 97/1000, Reward: 29.0, Step: 29
-2022-10-30 21:14:58 - r - INFO: - Episode: 98/1000, Reward: 35.0, Step: 35
-2022-10-30 21:14:58 - r - INFO: - Episode: 99/1000, Reward: 36.0, Step: 36
-2022-10-30 21:14:58 - r - INFO: - Current episode 99 has the best eval reward: 53.4
-2022-10-30 21:14:58 - r - INFO: - Episode: 100/1000, Reward: 36.0, Step: 36
-2022-10-30 21:14:59 - r - INFO: - Episode: 101/1000, Reward: 16.0, Step: 16
-2022-10-30 21:14:59 - r - INFO: - Episode: 102/1000, Reward: 36.0, Step: 36
-2022-10-30 21:14:59 - r - INFO: - Current episode 102 has the best eval reward: 70.3
-2022-10-30 21:14:59 - r - INFO: - Episode: 103/1000, Reward: 30.0, Step: 30
-2022-10-30 21:15:00 - r - INFO: - Episode: 104/1000, Reward: 76.0, Step: 76
-2022-10-30 21:15:00 - r - INFO: - Episode: 105/1000, Reward: 52.0, Step: 52
-2022-10-30 21:15:00 - r - INFO: - Episode: 106/1000, Reward: 39.0, Step: 39
-2022-10-30 21:15:00 - r - INFO: - Episode: 107/1000, Reward: 52.0, Step: 52
-2022-10-30 21:15:01 - r - INFO: - Episode: 108/1000, Reward: 69.0, Step: 69
-2022-10-30 21:15:01 - r - INFO: - Episode: 109/1000, Reward: 27.0, Step: 27
-2022-10-30 21:15:01 - r - INFO: - Episode: 110/1000, Reward: 14.0, Step: 14
-2022-10-30 21:15:01 - r - INFO: - Episode: 111/1000, Reward: 28.0, Step: 28
-2022-10-30 21:15:01 - r - INFO: - Episode: 112/1000, Reward: 12.0, Step: 12
-2022-10-30 21:15:02 - r - INFO: - Episode: 113/1000, Reward: 26.0, Step: 26
-2022-10-30 21:15:03 - r - INFO: - Episode: 114/1000, Reward: 50.0, Step: 50
-2022-10-30 21:15:03 - r - INFO: - Episode: 115/1000, Reward: 25.0, Step: 25
-2022-10-30 21:15:03 - r - INFO: - Episode: 116/1000, Reward: 53.0, Step: 53
-2022-10-30 21:15:03 - r - INFO: - Episode: 117/1000, Reward: 19.0, Step: 19
-2022-10-30 21:15:04 - r - INFO: - Episode: 118/1000, Reward: 33.0, Step: 33
-2022-10-30 21:15:04 - r - INFO: - Episode: 119/1000, Reward: 34.0, Step: 34
-2022-10-30 21:15:04 - r - INFO: - Episode: 120/1000, Reward: 41.0, Step: 41
-2022-10-30 21:15:04 - r - INFO: - Episode: 121/1000, Reward: 25.0, Step: 25
-2022-10-30 21:15:05 - r - INFO: - Episode: 122/1000, Reward: 18.0, Step: 18
-2022-10-30 21:15:05 - r - INFO: - Episode: 123/1000, Reward: 114.0, Step: 114
-2022-10-30 21:15:05 - r - INFO: - Episode: 124/1000, Reward: 25.0, Step: 25
-2022-10-30 21:15:05 - r - INFO: - Episode: 125/1000, Reward: 46.0, Step: 46
-2022-10-30 21:15:06 - r - INFO: - Episode: 126/1000, Reward: 22.0, Step: 22
-2022-10-30 21:15:06 - r - INFO: - Episode: 127/1000, Reward: 71.0, Step: 71
-2022-10-30 21:15:06 - r - INFO: - Episode: 128/1000, Reward: 30.0, Step: 30
-2022-10-30 21:15:07 - r - INFO: - Episode: 129/1000, Reward: 130.0, Step: 130
-2022-10-30 21:15:07 - r - INFO: - Episode: 130/1000, Reward: 65.0, Step: 65
-2022-10-30 21:15:07 - r - INFO: - Episode: 131/1000, Reward: 55.0, Step: 55
-2022-10-30 21:15:07 - r - INFO: - Episode: 132/1000, Reward: 37.0, Step: 37
-2022-10-30 21:15:08 - r - INFO: - Episode: 133/1000, Reward: 46.0, Step: 46
-2022-10-30 21:15:08 - r - INFO: - Episode: 134/1000, Reward: 65.0, Step: 65
-2022-10-30 21:15:08 - r - INFO: - Episode: 135/1000, Reward: 31.0, Step: 31
-2022-10-30 21:15:08 - r - INFO: - Episode: 136/1000, Reward: 33.0, Step: 33
-2022-10-30 21:15:09 - r - INFO: - Episode: 137/1000, Reward: 39.0, Step: 39
-2022-10-30 21:15:09 - r - INFO: - Episode: 138/1000, Reward: 73.0, Step: 73
-2022-10-30 21:15:09 - r - INFO: - Episode: 139/1000, Reward: 78.0, Step: 78
-2022-10-30 21:15:10 - r - INFO: - Episode: 140/1000, Reward: 36.0, Step: 36
-2022-10-30 21:15:10 - r - INFO: - Episode: 141/1000, Reward: 56.0, Step: 56
-2022-10-30 21:15:10 - r - INFO: - Episode: 142/1000, Reward: 12.0, Step: 12
-2022-10-30 21:15:10 - r - INFO: - Episode: 143/1000, Reward: 36.0, Step: 36
-2022-10-30 21:15:11 - r - INFO: - Episode: 144/1000, Reward: 13.0, Step: 13
-2022-10-30 21:15:11 - r - INFO: - Episode: 145/1000, Reward: 85.0, Step: 85
-2022-10-30 21:15:11 - r - INFO: - Episode: 146/1000, Reward: 34.0, Step: 34
-2022-10-30 21:15:11 - r - INFO: - Episode: 147/1000, Reward: 16.0, Step: 16
-2022-10-30 21:15:12 - r - INFO: - Episode: 148/1000, Reward: 68.0, Step: 68
-2022-10-30 21:15:12 - r - INFO: - Episode: 149/1000, Reward: 94.0, Step: 94
-2022-10-30 21:15:12 - r - INFO: - Episode: 150/1000, Reward: 17.0, Step: 17
-2022-10-30 21:15:13 - r - INFO: - Episode: 151/1000, Reward: 64.0, Step: 64
-2022-10-30 21:15:13 - r - INFO: - Episode: 152/1000, Reward: 33.0, Step: 33
-2022-10-30 21:15:13 - r - INFO: - Episode: 153/1000, Reward: 63.0, Step: 63
-2022-10-30 21:15:13 - r - INFO: - Episode: 154/1000, Reward: 39.0, Step: 39
-2022-10-30 21:15:14 - r - INFO: - Episode: 155/1000, Reward: 72.0, Step: 72
-2022-10-30 21:15:14 - r - INFO: - Episode: 156/1000, Reward: 39.0, Step: 39
-2022-10-30 21:15:14 - r - INFO: - Episode: 157/1000, Reward: 37.0, Step: 37
-2022-10-30 21:15:14 - r - INFO: - Episode: 158/1000, Reward: 18.0, Step: 18
-2022-10-30 21:15:15 - r - INFO: - Episode: 159/1000, Reward: 55.0, Step: 55
-2022-10-30 21:15:15 - r - INFO: - Episode: 160/1000, Reward: 21.0, Step: 21
-2022-10-30 21:15:15 - r - INFO: - Episode: 161/1000, Reward: 54.0, Step: 54
-2022-10-30 21:15:15 - r - INFO: - Episode: 162/1000, Reward: 46.0, Step: 46
-2022-10-30 21:15:16 - r - INFO: - Episode: 163/1000, Reward: 21.0, Step: 21
-2022-10-30 21:15:16 - r - INFO: - Episode: 164/1000, Reward: 26.0, Step: 26
-2022-10-30 21:15:16 - r - INFO: - Episode: 165/1000, Reward: 70.0, Step: 70
-2022-10-30 21:15:17 - r - INFO: - Episode: 166/1000, Reward: 20.0, Step: 20
-2022-10-30 21:15:17 - r - INFO: - Episode: 167/1000, Reward: 41.0, Step: 41
-2022-10-30 21:15:17 - r - INFO: - Episode: 168/1000, Reward: 77.0, Step: 77
-2022-10-30 21:15:17 - r - INFO: - Episode: 169/1000, Reward: 13.0, Step: 13
-2022-10-30 21:15:18 - r - INFO: - Episode: 170/1000, Reward: 66.0, Step: 66
-2022-10-30 21:15:18 - r - INFO: - Episode: 171/1000, Reward: 72.0, Step: 72
-2022-10-30 21:15:18 - r - INFO: - Episode: 172/1000, Reward: 28.0, Step: 28
-2022-10-30 21:15:19 - r - INFO: - Episode: 173/1000, Reward: 68.0, Step: 68
-2022-10-30 21:15:19 - r - INFO: - Episode: 174/1000, Reward: 124.0, Step: 124
-2022-10-30 21:15:19 - r - INFO: - Episode: 175/1000, Reward: 41.0, Step: 41
-2022-10-30 21:15:20 - r - INFO: - Episode: 176/1000, Reward: 54.0, Step: 54
-2022-10-30 21:15:20 - r - INFO: - Episode: 177/1000, Reward: 33.0, Step: 33
-2022-10-30 21:15:20 - r - INFO: - Episode: 178/1000, Reward: 92.0, Step: 92
-2022-10-30 21:15:20 - r - INFO: - Episode: 179/1000, Reward: 23.0, Step: 23
-2022-10-30 21:15:21 - r - INFO: - Episode: 180/1000, Reward: 76.0, Step: 76
-2022-10-30 21:15:21 - r - INFO: - Episode: 181/1000, Reward: 47.0, Step: 47
-2022-10-30 21:15:22 - r - INFO: - Episode: 182/1000, Reward: 89.0, Step: 89
-2022-10-30 21:15:22 - r - INFO: - Episode: 183/1000, Reward: 84.0, Step: 84
-2022-10-30 21:15:22 - r - INFO: - Episode: 184/1000, Reward: 75.0, Step: 75
-2022-10-30 21:15:23 - r - INFO: - Episode: 185/1000, Reward: 64.0, Step: 64
-2022-10-30 21:15:23 - r - INFO: - Episode: 186/1000, Reward: 35.0, Step: 35
-2022-10-30 21:15:23 - r - INFO: - Episode: 187/1000, Reward: 44.0, Step: 44
-2022-10-30 21:15:24 - r - INFO: - Episode: 188/1000, Reward: 46.0, Step: 46
-2022-10-30 21:15:24 - r - INFO: - Episode: 189/1000, Reward: 67.0, Step: 67
-2022-10-30 21:15:25 - r - INFO: - Episode: 190/1000, Reward: 82.0, Step: 82
-2022-10-30 21:15:25 - r - INFO: - Episode: 191/1000, Reward: 55.0, Step: 55
-2022-10-30 21:15:25 - r - INFO: - Episode: 192/1000, Reward: 26.0, Step: 26
-2022-10-30 21:15:26 - r - INFO: - Episode: 193/1000, Reward: 116.0, Step: 116
-2022-10-30 21:15:26 - r - INFO: - Episode: 194/1000, Reward: 116.0, Step: 116
-2022-10-30 21:15:26 - r - INFO: - Episode: 195/1000, Reward: 119.0, Step: 119
-2022-10-30 21:15:27 - r - INFO: - Episode: 196/1000, Reward: 50.0, Step: 50
-2022-10-30 21:15:27 - r - INFO: - Episode: 197/1000, Reward: 43.0, Step: 43
-2022-10-30 21:15:27 - r - INFO: - Episode: 198/1000, Reward: 47.0, Step: 47
-2022-10-30 21:15:28 - r - INFO: - Episode: 199/1000, Reward: 71.0, Step: 71
-2022-10-30 21:15:28 - r - INFO: - Episode: 200/1000, Reward: 53.0, Step: 53
-2022-10-30 21:15:28 - r - INFO: - Current episode 200 has the best eval reward: 86.0
-2022-10-30 21:15:29 - r - INFO: - Episode: 201/1000, Reward: 137.0, Step: 137
-2022-10-30 21:15:29 - r - INFO: - Episode: 202/1000, Reward: 82.0, Step: 82
-2022-10-30 21:15:30 - r - INFO: - Episode: 203/1000, Reward: 120.0, Step: 120
-2022-10-30 21:15:30 - r - INFO: - Current episode 203 has the best eval reward: 92.8
-2022-10-30 21:15:30 - r - INFO: - Episode: 204/1000, Reward: 69.0, Step: 69
-2022-10-30 21:15:31 - r - INFO: - Episode: 205/1000, Reward: 55.0, Step: 55
-2022-10-30 21:15:31 - r - INFO: - Episode: 206/1000, Reward: 62.0, Step: 62
-2022-10-30 21:15:31 - r - INFO: - Episode: 207/1000, Reward: 64.0, Step: 64
-2022-10-30 21:15:32 - r - INFO: - Episode: 208/1000, Reward: 49.0, Step: 49
-2022-10-30 21:15:32 - r - INFO: - Episode: 209/1000, Reward: 32.0, Step: 32
-2022-10-30 21:15:33 - r - INFO: - Episode: 210/1000, Reward: 42.0, Step: 42
-2022-10-30 21:15:33 - r - INFO: - Episode: 211/1000, Reward: 50.0, Step: 50
-2022-10-30 21:15:33 - r - INFO: - Episode: 212/1000, Reward: 93.0, Step: 93
-2022-10-30 21:15:34 - r - INFO: - Episode: 213/1000, Reward: 60.0, Step: 60
-2022-10-30 21:15:34 - r - INFO: - Episode: 214/1000, Reward: 54.0, Step: 54
-2022-10-30 21:15:35 - r - INFO: - Episode: 215/1000, Reward: 68.0, Step: 68
-2022-10-30 21:15:35 - r - INFO: - Episode: 216/1000, Reward: 84.0, Step: 84
-2022-10-30 21:15:35 - r - INFO: - Current episode 216 has the best eval reward: 94.6
-2022-10-30 21:15:36 - r - INFO: - Episode: 217/1000, Reward: 55.0, Step: 55
-2022-10-30 21:15:36 - r - INFO: - Episode: 218/1000, Reward: 70.0, Step: 70
-2022-10-30 21:15:37 - r - INFO: - Episode: 219/1000, Reward: 115.0, Step: 115
-2022-10-30 21:15:37 - r - INFO: - Episode: 220/1000, Reward: 149.0, Step: 149
-2022-10-30 21:15:38 - r - INFO: - Episode: 221/1000, Reward: 68.0, Step: 68
-2022-10-30 21:15:38 - r - INFO: - Episode: 222/1000, Reward: 50.0, Step: 50
-2022-10-30 21:15:38 - r - INFO: - Current episode 222 has the best eval reward: 95.5
-2022-10-30 21:15:39 - r - INFO: - Episode: 223/1000, Reward: 56.0, Step: 56
-2022-10-30 21:15:39 - r - INFO: - Episode: 224/1000, Reward: 61.0, Step: 61
-2022-10-30 21:15:39 - r - INFO: - Episode: 225/1000, Reward: 117.0, Step: 117
-2022-10-30 21:15:40 - r - INFO: - Episode: 226/1000, Reward: 66.0, Step: 66
-2022-10-30 21:15:41 - r - INFO: - Episode: 227/1000, Reward: 127.0, Step: 127
-2022-10-30 21:15:41 - r - INFO: - Episode: 228/1000, Reward: 66.0, Step: 66
-2022-10-30 21:15:42 - r - INFO: - Episode: 229/1000, Reward: 48.0, Step: 48
-2022-10-30 21:15:42 - r - INFO: - Episode: 230/1000, Reward: 36.0, Step: 36
-2022-10-30 21:15:42 - r - INFO: - Episode: 231/1000, Reward: 79.0, Step: 79
-2022-10-30 21:15:43 - r - INFO: - Episode: 232/1000, Reward: 49.0, Step: 49
-2022-10-30 21:15:43 - r - INFO: - Episode: 233/1000, Reward: 55.0, Step: 55
-2022-10-30 21:15:43 - r - INFO: - Episode: 234/1000, Reward: 41.0, Step: 41
-2022-10-30 21:15:43 - r - INFO: - Episode: 235/1000, Reward: 20.0, Step: 20
-2022-10-30 21:15:44 - r - INFO: - Episode: 236/1000, Reward: 40.0, Step: 40
-2022-10-30 21:15:44 - r - INFO: - Episode: 237/1000, Reward: 120.0, Step: 120
-2022-10-30 21:15:44 - r - INFO: - Episode: 238/1000, Reward: 27.0, Step: 27
-2022-10-30 21:15:45 - r - INFO: - Episode: 239/1000, Reward: 51.0, Step: 51
-2022-10-30 21:15:45 - r - INFO: - Episode: 240/1000, Reward: 35.0, Step: 35
-2022-10-30 21:15:45 - r - INFO: - Episode: 241/1000, Reward: 43.0, Step: 43
-2022-10-30 21:15:46 - r - INFO: - Episode: 242/1000, Reward: 54.0, Step: 54
-2022-10-30 21:15:46 - r - INFO: - Episode: 243/1000, Reward: 52.0, Step: 52
-2022-10-30 21:15:46 - r - INFO: - Episode: 244/1000, Reward: 47.0, Step: 47
-2022-10-30 21:15:46 - r - INFO: - Episode: 245/1000, Reward: 63.0, Step: 63
-2022-10-30 21:15:47 - r - INFO: - Episode: 246/1000, Reward: 29.0, Step: 29
-2022-10-30 21:15:47 - r - INFO: - Episode: 247/1000, Reward: 36.0, Step: 36
-2022-10-30 21:15:47 - r - INFO: - Episode: 248/1000, Reward: 58.0, Step: 58
-2022-10-30 21:15:48 - r - INFO: - Episode: 249/1000, Reward: 63.0, Step: 63
-2022-10-30 21:15:48 - r - INFO: - Episode: 250/1000, Reward: 49.0, Step: 49
-2022-10-30 21:15:48 - r - INFO: - Episode: 251/1000, Reward: 70.0, Step: 70
-2022-10-30 21:15:49 - r - INFO: - Episode: 252/1000, Reward: 114.0, Step: 114
-2022-10-30 21:15:49 - r - INFO: - Episode: 253/1000, Reward: 62.0, Step: 62
-2022-10-30 21:15:50 - r - INFO: - Episode: 254/1000, Reward: 73.0, Step: 73
-2022-10-30 21:15:50 - r - INFO: - Current episode 254 has the best eval reward: 96.7
-2022-10-30 21:15:50 - r - INFO: - Episode: 255/1000, Reward: 62.0, Step: 62
-2022-10-30 21:15:51 - r - INFO: - Episode: 256/1000, Reward: 61.0, Step: 61
-2022-10-30 21:15:51 - r - INFO: - Episode: 257/1000, Reward: 115.0, Step: 115
-2022-10-30 21:15:52 - r - INFO: - Episode: 258/1000, Reward: 50.0, Step: 50
-2022-10-30 21:15:52 - r - INFO: - Episode: 259/1000, Reward: 128.0, Step: 128
-2022-10-30 21:15:53 - r - INFO: - Current episode 259 has the best eval reward: 104.8
-2022-10-30 21:15:53 - r - INFO: - Episode: 260/1000, Reward: 200.0, Step: 200
-2022-10-30 21:15:53 - r - INFO: - Episode: 261/1000, Reward: 75.0, Step: 75
-2022-10-30 21:15:54 - r - INFO: - Episode: 262/1000, Reward: 64.0, Step: 64
-2022-10-30 21:15:54 - r - INFO: - Episode: 263/1000, Reward: 33.0, Step: 33
-2022-10-30 21:15:55 - r - INFO: - Episode: 264/1000, Reward: 90.0, Step: 90
-2022-10-30 21:15:55 - r - INFO: - Current episode 264 has the best eval reward: 107.6
-2022-10-30 21:15:56 - r - INFO: - Episode: 265/1000, Reward: 117.0, Step: 117
-2022-10-30 21:15:56 - r - INFO: - Current episode 265 has the best eval reward: 119.4
-2022-10-30 21:15:56 - r - INFO: - Episode: 266/1000, Reward: 60.0, Step: 60
-2022-10-30 21:15:57 - r - INFO: - Episode: 267/1000, Reward: 177.0, Step: 177
-2022-10-30 21:15:57 - r - INFO: - Episode: 268/1000, Reward: 39.0, Step: 39
-2022-10-30 21:15:58 - r - INFO: - Episode: 269/1000, Reward: 40.0, Step: 40
-2022-10-30 21:15:58 - r - INFO: - Episode: 270/1000, Reward: 109.0, Step: 109
-2022-10-30 21:15:59 - r - INFO: - Episode: 271/1000, Reward: 100.0, Step: 100
-2022-10-30 21:16:00 - r - INFO: - Episode: 272/1000, Reward: 99.0, Step: 99
-2022-10-30 21:16:00 - r - INFO: - Episode: 273/1000, Reward: 136.0, Step: 136
-2022-10-30 21:16:01 - r - INFO: - Episode: 274/1000, Reward: 62.0, Step: 62
-2022-10-30 21:16:01 - r - INFO: - Episode: 275/1000, Reward: 100.0, Step: 100
-2022-10-30 21:16:02 - r - INFO: - Current episode 275 has the best eval reward: 120.1
-2022-10-30 21:16:02 - r - INFO: - Episode: 276/1000, Reward: 73.0, Step: 73
-2022-10-30 21:16:03 - r - INFO: - Episode: 277/1000, Reward: 166.0, Step: 166
-2022-10-30 21:16:03 - r - INFO: - Episode: 278/1000, Reward: 74.0, Step: 74
-2022-10-30 21:16:04 - r - INFO: - Current episode 278 has the best eval reward: 121.8
-2022-10-30 21:16:04 - r - INFO: - Episode: 279/1000, Reward: 126.0, Step: 126
-2022-10-30 21:16:05 - r - INFO: - Episode: 280/1000, Reward: 111.0, Step: 111
-2022-10-30 21:16:06 - r - INFO: - Episode: 281/1000, Reward: 198.0, Step: 198
-2022-10-30 21:16:07 - r - INFO: - Episode: 282/1000, Reward: 106.0, Step: 106
-2022-10-30 21:16:07 - r - INFO: - Episode: 283/1000, Reward: 80.0, Step: 80
-2022-10-30 21:16:08 - r - INFO: - Episode: 284/1000, Reward: 74.0, Step: 74
-2022-10-30 21:16:08 - r - INFO: - Episode: 285/1000, Reward: 114.0, Step: 114
-2022-10-30 21:16:09 - r - INFO: - Episode: 286/1000, Reward: 69.0, Step: 69
-2022-10-30 21:16:09 - r - INFO: - Episode: 287/1000, Reward: 98.0, Step: 98
-2022-10-30 21:16:10 - r - INFO: - Episode: 288/1000, Reward: 63.0, Step: 63
-2022-10-30 21:16:10 - r - INFO: - Episode: 289/1000, Reward: 61.0, Step: 61
-2022-10-30 21:16:11 - r - INFO: - Episode: 290/1000, Reward: 49.0, Step: 49
-2022-10-30 21:16:11 - r - INFO: - Episode: 291/1000, Reward: 89.0, Step: 89
-2022-10-30 21:16:12 - r - INFO: - Episode: 292/1000, Reward: 114.0, Step: 114
-2022-10-30 21:16:13 - r - INFO: - Episode: 293/1000, Reward: 103.0, Step: 103
-2022-10-30 21:16:13 - r - INFO: - Episode: 294/1000, Reward: 103.0, Step: 103
-2022-10-30 21:16:14 - r - INFO: - Episode: 295/1000, Reward: 93.0, Step: 93
-2022-10-30 21:16:14 - r - INFO: - Episode: 296/1000, Reward: 137.0, Step: 137
-2022-10-30 21:16:15 - r - INFO: - Episode: 297/1000, Reward: 97.0, Step: 97
-2022-10-30 21:16:16 - r - INFO: - Episode: 298/1000, Reward: 124.0, Step: 124
-2022-10-30 21:16:16 - r - INFO: - Episode: 299/1000, Reward: 147.0, Step: 147
-2022-10-30 21:16:17 - r - INFO: - Episode: 300/1000, Reward: 125.0, Step: 125
-2022-10-30 21:16:18 - r - INFO: - Episode: 301/1000, Reward: 105.0, Step: 105
-2022-10-30 21:16:18 - r - INFO: - Current episode 301 has the best eval reward: 148.8
-2022-10-30 21:16:18 - r - INFO: - Episode: 302/1000, Reward: 113.0, Step: 113
-2022-10-30 21:16:19 - r - INFO: - Current episode 302 has the best eval reward: 150.8
-2022-10-30 21:16:19 - r - INFO: - Episode: 303/1000, Reward: 120.0, Step: 120
-2022-10-30 21:16:20 - r - INFO: - Episode: 304/1000, Reward: 159.0, Step: 159
-2022-10-30 21:16:21 - r - INFO: - Episode: 305/1000, Reward: 190.0, Step: 190
-2022-10-30 21:16:22 - r - INFO: - Current episode 305 has the best eval reward: 183.4
-2022-10-30 21:16:22 - r - INFO: - Episode: 306/1000, Reward: 119.0, Step: 119
-2022-10-30 21:16:23 - r - INFO: - Episode: 307/1000, Reward: 200.0, Step: 200
-2022-10-30 21:16:24 - r - INFO: - Episode: 308/1000, Reward: 148.0, Step: 148
-2022-10-30 21:16:25 - r - INFO: - Episode: 309/1000, Reward: 200.0, Step: 200
-2022-10-30 21:16:26 - r - INFO: - Episode: 310/1000, Reward: 79.0, Step: 79
-2022-10-30 21:16:27 - r - INFO: - Episode: 311/1000, Reward: 115.0, Step: 115
-2022-10-30 21:16:28 - r - INFO: - Episode: 312/1000, Reward: 147.0, Step: 147
-2022-10-30 21:16:29 - r - INFO: - Episode: 313/1000, Reward: 112.0, Step: 112
-2022-10-30 21:16:29 - r - INFO: - Episode: 314/1000, Reward: 125.0, Step: 125
-2022-10-30 21:16:30 - r - INFO: - Episode: 315/1000, Reward: 184.0, Step: 184
-2022-10-30 21:16:31 - r - INFO: - Episode: 316/1000, Reward: 193.0, Step: 193
-2022-10-30 21:16:32 - r - INFO: - Episode: 317/1000, Reward: 117.0, Step: 117
-2022-10-30 21:16:33 - r - INFO: - Episode: 318/1000, Reward: 153.0, Step: 153
-2022-10-30 21:16:34 - r - INFO: - Episode: 319/1000, Reward: 125.0, Step: 125
-2022-10-30 21:16:35 - r - INFO: - Episode: 320/1000, Reward: 184.0, Step: 184
-2022-10-30 21:16:36 - r - INFO: - Episode: 321/1000, Reward: 173.0, Step: 173
-2022-10-30 21:16:36 - r - INFO: - Episode: 322/1000, Reward: 117.0, Step: 117
-2022-10-30 21:16:37 - r - INFO: - Episode: 323/1000, Reward: 47.0, Step: 47
-2022-10-30 21:16:38 - r - INFO: - Episode: 324/1000, Reward: 107.0, Step: 107
-2022-10-30 21:16:38 - r - INFO: - Episode: 325/1000, Reward: 104.0, Step: 104
-2022-10-30 21:16:39 - r - INFO: - Episode: 326/1000, Reward: 114.0, Step: 114
-2022-10-30 21:16:39 - r - INFO: - Episode: 327/1000, Reward: 90.0, Step: 90
-2022-10-30 21:16:40 - r - INFO: - Episode: 328/1000, Reward: 112.0, Step: 112
-2022-10-30 21:16:41 - r - INFO: - Episode: 329/1000, Reward: 70.0, Step: 70
-2022-10-30 21:16:41 - r - INFO: - Episode: 330/1000, Reward: 74.0, Step: 74
-2022-10-30 21:16:42 - r - INFO: - Episode: 331/1000, Reward: 159.0, Step: 159
-2022-10-30 21:16:42 - r - INFO: - Episode: 332/1000, Reward: 39.0, Step: 39
-2022-10-30 21:16:43 - r - INFO: - Episode: 333/1000, Reward: 129.0, Step: 129
-2022-10-30 21:16:44 - r - INFO: - Episode: 334/1000, Reward: 50.0, Step: 50
-2022-10-30 21:16:44 - r - INFO: - Episode: 335/1000, Reward: 74.0, Step: 74
-2022-10-30 21:16:44 - r - INFO: - Episode: 336/1000, Reward: 31.0, Step: 31
-2022-10-30 21:16:45 - r - INFO: - Episode: 337/1000, Reward: 57.0, Step: 57
-2022-10-30 21:16:45 - r - INFO: - Episode: 338/1000, Reward: 71.0, Step: 71
-2022-10-30 21:16:46 - r - INFO: - Episode: 339/1000, Reward: 43.0, Step: 43
-2022-10-30 21:16:46 - r - INFO: - Episode: 340/1000, Reward: 41.0, Step: 41
-2022-10-30 21:16:46 - r - INFO: - Episode: 341/1000, Reward: 64.0, Step: 64
-2022-10-30 21:16:47 - r - INFO: - Episode: 342/1000, Reward: 38.0, Step: 38
-2022-10-30 21:16:47 - r - INFO: - Episode: 343/1000, Reward: 45.0, Step: 45
-2022-10-30 21:16:48 - r - INFO: - Episode: 344/1000, Reward: 120.0, Step: 120
-2022-10-30 21:16:48 - r - INFO: - Episode: 345/1000, Reward: 40.0, Step: 40
-2022-10-30 21:16:48 - r - INFO: - Episode: 346/1000, Reward: 46.0, Step: 46
-2022-10-30 21:16:48 - r - INFO: - Episode: 347/1000, Reward: 57.0, Step: 57
-2022-10-30 21:16:49 - r - INFO: - Episode: 348/1000, Reward: 29.0, Step: 29
-2022-10-30 21:16:49 - r - INFO: - Episode: 349/1000, Reward: 29.0, Step: 29
-2022-10-30 21:16:49 - r - INFO: - Episode: 350/1000, Reward: 50.0, Step: 50
-2022-10-30 21:16:50 - r - INFO: - Episode: 351/1000, Reward: 38.0, Step: 38
-2022-10-30 21:16:50 - r - INFO: - Episode: 352/1000, Reward: 51.0, Step: 51
-2022-10-30 21:16:50 - r - INFO: - Episode: 353/1000, Reward: 49.0, Step: 49
-2022-10-30 21:16:50 - r - INFO: - Episode: 354/1000, Reward: 30.0, Step: 30
-2022-10-30 21:16:51 - r - INFO: - Episode: 355/1000, Reward: 40.0, Step: 40
-2022-10-30 21:16:51 - r - INFO: - Episode: 356/1000, Reward: 45.0, Step: 45
-2022-10-30 21:16:51 - r - INFO: - Episode: 357/1000, Reward: 68.0, Step: 68
-2022-10-30 21:16:52 - r - INFO: - Episode: 358/1000, Reward: 27.0, Step: 27
-2022-10-30 21:16:52 - r - INFO: - Episode: 359/1000, Reward: 18.0, Step: 18
-2022-10-30 21:16:52 - r - INFO: - Episode: 360/1000, Reward: 26.0, Step: 26
-2022-10-30 21:16:52 - r - INFO: - Episode: 361/1000, Reward: 15.0, Step: 15
-2022-10-30 21:16:52 - r - INFO: - Episode: 362/1000, Reward: 65.0, Step: 65
-2022-10-30 21:16:53 - r - INFO: - Episode: 363/1000, Reward: 38.0, Step: 38
-2022-10-30 21:16:53 - r - INFO: - Episode: 364/1000, Reward: 41.0, Step: 41
-2022-10-30 21:16:53 - r - INFO: - Episode: 365/1000, Reward: 61.0, Step: 61
-2022-10-30 21:16:54 - r - INFO: - Episode: 366/1000, Reward: 113.0, Step: 113
-2022-10-30 21:16:54 - r - INFO: - Episode: 367/1000, Reward: 39.0, Step: 39
-2022-10-30 21:16:54 - r - INFO: - Episode: 368/1000, Reward: 60.0, Step: 60
-2022-10-30 21:16:55 - r - INFO: - Episode: 369/1000, Reward: 134.0, Step: 134
-2022-10-30 21:16:56 - r - INFO: - Episode: 370/1000, Reward: 122.0, Step: 122
-2022-10-30 21:16:56 - r - INFO: - Episode: 371/1000, Reward: 34.0, Step: 34
-2022-10-30 21:16:57 - r - INFO: - Episode: 372/1000, Reward: 129.0, Step: 129
-2022-10-30 21:16:57 - r - INFO: - Episode: 373/1000, Reward: 40.0, Step: 40
-2022-10-30 21:16:58 - r - INFO: - Episode: 374/1000, Reward: 128.0, Step: 128
-2022-10-30 21:16:59 - r - INFO: - Episode: 375/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:00 - r - INFO: - Episode: 376/1000, Reward: 108.0, Step: 108
-2022-10-30 21:17:01 - r - INFO: - Episode: 377/1000, Reward: 108.0, Step: 108
-2022-10-30 21:17:02 - r - INFO: - Episode: 378/1000, Reward: 151.0, Step: 151
-2022-10-30 21:17:03 - r - INFO: - Episode: 379/1000, Reward: 79.0, Step: 79
-2022-10-30 21:17:03 - r - INFO: - Episode: 380/1000, Reward: 105.0, Step: 105
-2022-10-30 21:17:04 - r - INFO: - Episode: 381/1000, Reward: 87.0, Step: 87
-2022-10-30 21:17:05 - r - INFO: - Episode: 382/1000, Reward: 94.0, Step: 94
-2022-10-30 21:17:06 - r - INFO: - Episode: 383/1000, Reward: 112.0, Step: 112
-2022-10-30 21:17:07 - r - INFO: - Episode: 384/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:08 - r - INFO: - Episode: 385/1000, Reward: 184.0, Step: 184
-2022-10-30 21:17:08 - r - INFO: - Episode: 386/1000, Reward: 124.0, Step: 124
-2022-10-30 21:17:09 - r - INFO: - Episode: 387/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:11 - r - INFO: - Episode: 388/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:12 - r - INFO: - Episode: 389/1000, Reward: 109.0, Step: 109
-2022-10-30 21:17:12 - r - INFO: - Episode: 390/1000, Reward: 88.0, Step: 88
-2022-10-30 21:17:13 - r - INFO: - Episode: 391/1000, Reward: 104.0, Step: 104
-2022-10-30 21:17:14 - r - INFO: - Episode: 392/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:15 - r - INFO: - Episode: 393/1000, Reward: 84.0, Step: 84
-2022-10-30 21:17:16 - r - INFO: - Episode: 394/1000, Reward: 187.0, Step: 187
-2022-10-30 21:17:17 - r - INFO: - Episode: 395/1000, Reward: 182.0, Step: 182
-2022-10-30 21:17:18 - r - INFO: - Episode: 396/1000, Reward: 148.0, Step: 148
-2022-10-30 21:17:19 - r - INFO: - Episode: 397/1000, Reward: 86.0, Step: 86
-2022-10-30 21:17:20 - r - INFO: - Episode: 398/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:21 - r - INFO: - Episode: 399/1000, Reward: 199.0, Step: 199
-2022-10-30 21:17:22 - r - INFO: - Episode: 400/1000, Reward: 200.0, Step: 200
-2022-10-30 21:17:23 - r - INFO: - Episode: 401/1000, Reward: 92.0, Step: 92
-2022-10-30 21:17:23 - r - INFO: - Episode: 402/1000, Reward: 112.0, Step: 112
-2022-10-30 21:17:24 - r - INFO: - Episode: 403/1000, Reward: 86.0, Step: 86
-2022-10-30 21:17:25 - r - INFO: - Episode: 404/1000, Reward: 114.0, Step: 114
-2022-10-30 21:17:26 - r - INFO: - Episode: 405/1000, Reward: 90.0, Step: 90
-2022-10-30 21:17:26 - r - INFO: - Episode: 406/1000, Reward: 101.0, Step: 101
-2022-10-30 21:17:27 - r - INFO: - Episode: 407/1000, Reward: 111.0, Step: 111
-2022-10-30 21:17:28 - r - INFO: - Episode: 408/1000, Reward: 107.0, Step: 107
-2022-10-30 21:17:28 - r - INFO: - Episode: 409/1000, Reward: 120.0, Step: 120
-2022-10-30 21:17:29 - r - INFO: - Episode: 410/1000, Reward: 114.0, Step: 114
-2022-10-30 21:17:30 - r - INFO: - Episode: 411/1000, Reward: 97.0, Step: 97
-2022-10-30 21:17:30 - r - INFO: - Episode: 412/1000, Reward: 95.0, Step: 95
-2022-10-30 21:17:31 - r - INFO: - Episode: 413/1000, Reward: 126.0, Step: 126
-2022-10-30 21:17:32 - r - INFO: - Episode: 414/1000, Reward: 111.0, Step: 111
-2022-10-30 21:17:33 - 
r - INFO: - Episode: 415/1000, Reward: 120.0, Step: 120 -2022-10-30 21:17:33 - r - INFO: - Episode: 416/1000, Reward: 178.0, Step: 178 -2022-10-30 21:17:34 - r - INFO: - Episode: 417/1000, Reward: 97.0, Step: 97 -2022-10-30 21:17:35 - r - INFO: - Episode: 418/1000, Reward: 144.0, Step: 144 -2022-10-30 21:17:36 - r - INFO: - Episode: 419/1000, Reward: 200.0, Step: 200 -2022-10-30 21:17:36 - r - INFO: - Episode: 420/1000, Reward: 190.0, Step: 190 -2022-10-30 21:17:37 - r - INFO: - Episode: 421/1000, Reward: 29.0, Step: 29 -2022-10-30 21:17:38 - r - INFO: - Episode: 422/1000, Reward: 200.0, Step: 200 -2022-10-30 21:17:38 - r - INFO: - Episode: 423/1000, Reward: 116.0, Step: 116 -2022-10-30 21:17:39 - r - INFO: - Episode: 424/1000, Reward: 200.0, Step: 200 -2022-10-30 21:17:40 - r - INFO: - Episode: 425/1000, Reward: 107.0, Step: 107 -2022-10-30 21:17:41 - r - INFO: - Episode: 426/1000, Reward: 128.0, Step: 128 -2022-10-30 21:17:41 - r - INFO: - Episode: 427/1000, Reward: 164.0, Step: 164 -2022-10-30 21:17:42 - r - INFO: - Episode: 428/1000, Reward: 30.0, Step: 30 -2022-10-30 21:17:42 - r - INFO: - Episode: 429/1000, Reward: 122.0, Step: 122 -2022-10-30 21:17:43 - r - INFO: - Episode: 430/1000, Reward: 110.0, Step: 110 -2022-10-30 21:17:44 - r - INFO: - Episode: 431/1000, Reward: 105.0, Step: 105 -2022-10-30 21:17:44 - r - INFO: - Episode: 432/1000, Reward: 137.0, Step: 137 -2022-10-30 21:17:45 - r - INFO: - Episode: 433/1000, Reward: 110.0, Step: 110 -2022-10-30 21:17:45 - r - INFO: - Episode: 434/1000, Reward: 111.0, Step: 111 -2022-10-30 21:17:46 - r - INFO: - Episode: 435/1000, Reward: 33.0, Step: 33 -2022-10-30 21:17:46 - r - INFO: - Episode: 436/1000, Reward: 100.0, Step: 100 -2022-10-30 21:17:47 - r - INFO: - Episode: 437/1000, Reward: 131.0, Step: 131 -2022-10-30 21:17:48 - r - INFO: - Episode: 438/1000, Reward: 99.0, Step: 99 -2022-10-30 21:17:48 - r - INFO: - Episode: 439/1000, Reward: 118.0, Step: 118 -2022-10-30 21:17:49 - r - INFO: - Episode: 440/1000, 
Reward: 98.0, Step: 98 -2022-10-30 21:17:49 - r - INFO: - Episode: 441/1000, Reward: 119.0, Step: 119 -2022-10-30 21:17:50 - r - INFO: - Episode: 442/1000, Reward: 41.0, Step: 41 -2022-10-30 21:17:50 - r - INFO: - Episode: 443/1000, Reward: 107.0, Step: 107 -2022-10-30 21:17:51 - r - INFO: - Episode: 444/1000, Reward: 41.0, Step: 41 -2022-10-30 21:17:52 - r - INFO: - Episode: 445/1000, Reward: 113.0, Step: 113 -2022-10-30 21:17:52 - r - INFO: - Episode: 446/1000, Reward: 113.0, Step: 113 -2022-10-30 21:17:53 - r - INFO: - Episode: 447/1000, Reward: 117.0, Step: 117 -2022-10-30 21:17:54 - r - INFO: - Episode: 448/1000, Reward: 140.0, Step: 140 -2022-10-30 21:17:54 - r - INFO: - Episode: 449/1000, Reward: 133.0, Step: 133 -2022-10-30 21:17:55 - r - INFO: - Episode: 450/1000, Reward: 108.0, Step: 108 -2022-10-30 21:17:56 - r - INFO: - Episode: 451/1000, Reward: 117.0, Step: 117 -2022-10-30 21:17:57 - r - INFO: - Episode: 452/1000, Reward: 40.0, Step: 40 -2022-10-30 21:17:57 - r - INFO: - Episode: 453/1000, Reward: 108.0, Step: 108 -2022-10-30 21:17:58 - r - INFO: - Episode: 454/1000, Reward: 140.0, Step: 140 -2022-10-30 21:17:59 - r - INFO: - Episode: 455/1000, Reward: 133.0, Step: 133 -2022-10-30 21:18:00 - r - INFO: - Episode: 456/1000, Reward: 115.0, Step: 115 -2022-10-30 21:18:00 - r - INFO: - Episode: 457/1000, Reward: 30.0, Step: 30 -2022-10-30 21:18:01 - r - INFO: - Episode: 458/1000, Reward: 119.0, Step: 119 -2022-10-30 21:18:02 - r - INFO: - Episode: 459/1000, Reward: 160.0, Step: 160 -2022-10-30 21:18:02 - r - INFO: - Episode: 460/1000, Reward: 125.0, Step: 125 -2022-10-30 21:18:03 - r - INFO: - Episode: 461/1000, Reward: 161.0, Step: 161 -2022-10-30 21:18:04 - r - INFO: - Episode: 462/1000, Reward: 139.0, Step: 139 -2022-10-30 21:18:05 - r - INFO: - Episode: 463/1000, Reward: 190.0, Step: 190 -2022-10-30 21:18:06 - r - INFO: - Episode: 464/1000, Reward: 149.0, Step: 149 -2022-10-30 21:18:07 - r - INFO: - Episode: 465/1000, Reward: 173.0, Step: 173 
-2022-10-30 21:18:08 - r - INFO: - Current episode 465 has the best eval reward: 187.6 -2022-10-30 21:18:08 - r - INFO: - Episode: 466/1000, Reward: 165.0, Step: 165 -2022-10-30 21:18:09 - r - INFO: - Episode: 467/1000, Reward: 82.0, Step: 82 -2022-10-30 21:18:10 - r - INFO: - Episode: 468/1000, Reward: 197.0, Step: 197 -2022-10-30 21:18:11 - r - INFO: - Current episode 468 has the best eval reward: 195.0 -2022-10-30 21:18:12 - r - INFO: - Episode: 469/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:13 - r - INFO: - Episode: 470/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:14 - r - INFO: - Current episode 470 has the best eval reward: 199.4 -2022-10-30 21:18:14 - r - INFO: - Episode: 471/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:16 - r - INFO: - Episode: 472/1000, Reward: 182.0, Step: 182 -2022-10-30 21:18:17 - r - INFO: - Episode: 473/1000, Reward: 118.0, Step: 118 -2022-10-30 21:18:18 - r - INFO: - Episode: 474/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:19 - r - INFO: - Episode: 475/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:20 - r - INFO: - Episode: 476/1000, Reward: 93.0, Step: 93 -2022-10-30 21:18:21 - r - INFO: - Episode: 477/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:23 - r - INFO: - Episode: 478/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:24 - r - INFO: - Episode: 479/1000, Reward: 167.0, Step: 167 -2022-10-30 21:18:25 - r - INFO: - Episode: 480/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:26 - r - INFO: - Episode: 481/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:27 - r - INFO: - Episode: 482/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:28 - r - INFO: - Episode: 483/1000, Reward: 190.0, Step: 190 -2022-10-30 21:18:29 - r - INFO: - Episode: 484/1000, Reward: 86.0, Step: 86 -2022-10-30 21:18:30 - r - INFO: - Episode: 485/1000, Reward: 166.0, Step: 166 -2022-10-30 21:18:31 - r - INFO: - Episode: 486/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:32 - r - INFO: - Episode: 487/1000, Reward: 200.0, Step: 200 
-2022-10-30 21:18:33 - r - INFO: - Episode: 488/1000, Reward: 172.0, Step: 172 -2022-10-30 21:18:34 - r - INFO: - Episode: 489/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:35 - r - INFO: - Episode: 490/1000, Reward: 102.0, Step: 102 -2022-10-30 21:18:36 - r - INFO: - Episode: 491/1000, Reward: 194.0, Step: 194 -2022-10-30 21:18:37 - r - INFO: - Episode: 492/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:38 - r - INFO: - Episode: 493/1000, Reward: 179.0, Step: 179 -2022-10-30 21:18:39 - r - INFO: - Episode: 494/1000, Reward: 187.0, Step: 187 -2022-10-30 21:18:40 - r - INFO: - Episode: 495/1000, Reward: 200.0, Step: 200 -2022-10-30 21:18:41 - r - INFO: - Episode: 496/1000, Reward: 89.0, Step: 89 -2022-10-30 21:18:41 - r - INFO: - Episode: 497/1000, Reward: 169.0, Step: 169 -2022-10-30 21:18:42 - r - INFO: - Episode: 498/1000, Reward: 28.0, Step: 28 -2022-10-30 21:18:43 - r - INFO: - Episode: 499/1000, Reward: 160.0, Step: 160 -2022-10-30 21:18:44 - r - INFO: - Episode: 500/1000, Reward: 140.0, Step: 140 -2022-10-30 21:18:44 - r - INFO: - Episode: 501/1000, Reward: 37.0, Step: 37 -2022-10-30 21:18:45 - r - INFO: - Episode: 502/1000, Reward: 32.0, Step: 32 -2022-10-30 21:18:45 - r - INFO: - Episode: 503/1000, Reward: 129.0, Step: 129 -2022-10-30 21:18:46 - r - INFO: - Episode: 504/1000, Reward: 22.0, Step: 22 -2022-10-30 21:18:46 - r - INFO: - Episode: 505/1000, Reward: 124.0, Step: 124 -2022-10-30 21:18:46 - r - INFO: - Episode: 506/1000, Reward: 24.0, Step: 24 -2022-10-30 21:18:47 - r - INFO: - Episode: 507/1000, Reward: 115.0, Step: 115 -2022-10-30 21:18:47 - r - INFO: - Episode: 508/1000, Reward: 24.0, Step: 24 -2022-10-30 21:18:48 - r - INFO: - Episode: 509/1000, Reward: 38.0, Step: 38 -2022-10-30 21:18:49 - r - INFO: - Episode: 510/1000, Reward: 24.0, Step: 24 -2022-10-30 21:18:49 - r - INFO: - Episode: 511/1000, Reward: 23.0, Step: 23 -2022-10-30 21:18:49 - r - INFO: - Episode: 512/1000, Reward: 125.0, Step: 125 -2022-10-30 21:18:49 - r - INFO: - Episode: 
513/1000, Reward: 22.0, Step: 22 -2022-10-30 21:18:50 - r - INFO: - Episode: 514/1000, Reward: 24.0, Step: 24 -2022-10-30 21:18:50 - r - INFO: - Episode: 515/1000, Reward: 20.0, Step: 20 -2022-10-30 21:18:50 - r - INFO: - Episode: 516/1000, Reward: 25.0, Step: 25 -2022-10-30 21:18:50 - r - INFO: - Episode: 517/1000, Reward: 31.0, Step: 31 -2022-10-30 21:18:50 - r - INFO: - Episode: 518/1000, Reward: 23.0, Step: 23 -2022-10-30 21:18:51 - r - INFO: - Episode: 519/1000, Reward: 30.0, Step: 30 -2022-10-30 21:18:51 - r - INFO: - Episode: 520/1000, Reward: 101.0, Step: 101 -2022-10-30 21:18:51 - r - INFO: - Episode: 521/1000, Reward: 25.0, Step: 25 -2022-10-30 21:18:52 - r - INFO: - Episode: 522/1000, Reward: 22.0, Step: 22 -2022-10-30 21:18:52 - r - INFO: - Episode: 523/1000, Reward: 20.0, Step: 20 -2022-10-30 21:18:52 - r - INFO: - Episode: 524/1000, Reward: 16.0, Step: 16 -2022-10-30 21:18:53 - r - INFO: - Episode: 525/1000, Reward: 104.0, Step: 104 -2022-10-30 21:18:53 - r - INFO: - Episode: 526/1000, Reward: 17.0, Step: 17 -2022-10-30 21:18:53 - r - INFO: - Episode: 527/1000, Reward: 108.0, Step: 108 -2022-10-30 21:18:53 - r - INFO: - Episode: 528/1000, Reward: 121.0, Step: 121 -2022-10-30 21:18:54 - r - INFO: - Episode: 529/1000, Reward: 29.0, Step: 29 -2022-10-30 21:18:54 - r - INFO: - Episode: 530/1000, Reward: 29.0, Step: 29 -2022-10-30 21:18:54 - r - INFO: - Episode: 531/1000, Reward: 43.0, Step: 43 -2022-10-30 21:18:55 - r - INFO: - Episode: 532/1000, Reward: 105.0, Step: 105 -2022-10-30 21:18:55 - r - INFO: - Episode: 533/1000, Reward: 130.0, Step: 130 -2022-10-30 21:18:55 - r - INFO: - Episode: 534/1000, Reward: 30.0, Step: 30 -2022-10-30 21:18:56 - r - INFO: - Episode: 535/1000, Reward: 31.0, Step: 31 -2022-10-30 21:18:56 - r - INFO: - Episode: 536/1000, Reward: 30.0, Step: 30 -2022-10-30 21:18:56 - r - INFO: - Episode: 537/1000, Reward: 37.0, Step: 37 -2022-10-30 21:18:57 - r - INFO: - Episode: 538/1000, Reward: 115.0, Step: 115 -2022-10-30 21:18:58 - r - 
INFO: - Episode: 539/1000, Reward: 110.0, Step: 110 -2022-10-30 21:18:58 - r - INFO: - Episode: 540/1000, Reward: 112.0, Step: 112 -2022-10-30 21:18:59 - r - INFO: - Episode: 541/1000, Reward: 33.0, Step: 33 -2022-10-30 21:18:59 - r - INFO: - Episode: 542/1000, Reward: 120.0, Step: 120 -2022-10-30 21:19:00 - r - INFO: - Episode: 543/1000, Reward: 109.0, Step: 109 -2022-10-30 21:19:01 - r - INFO: - Episode: 544/1000, Reward: 122.0, Step: 122 -2022-10-30 21:19:01 - r - INFO: - Episode: 545/1000, Reward: 115.0, Step: 115 -2022-10-30 21:19:02 - r - INFO: - Episode: 546/1000, Reward: 34.0, Step: 34 -2022-10-30 21:19:02 - r - INFO: - Episode: 547/1000, Reward: 28.0, Step: 28 -2022-10-30 21:19:03 - r - INFO: - Episode: 548/1000, Reward: 29.0, Step: 29 -2022-10-30 21:19:03 - r - INFO: - Episode: 549/1000, Reward: 113.0, Step: 113 -2022-10-30 21:19:04 - r - INFO: - Episode: 550/1000, Reward: 100.0, Step: 100 -2022-10-30 21:19:04 - r - INFO: - Episode: 551/1000, Reward: 26.0, Step: 26 -2022-10-30 21:19:04 - r - INFO: - Episode: 552/1000, Reward: 24.0, Step: 24 -2022-10-30 21:19:05 - r - INFO: - Episode: 553/1000, Reward: 26.0, Step: 26 -2022-10-30 21:19:05 - r - INFO: - Episode: 554/1000, Reward: 102.0, Step: 102 -2022-10-30 21:19:05 - r - INFO: - Episode: 555/1000, Reward: 18.0, Step: 18 -2022-10-30 21:19:06 - r - INFO: - Episode: 556/1000, Reward: 107.0, Step: 107 -2022-10-30 21:19:06 - r - INFO: - Episode: 557/1000, Reward: 27.0, Step: 27 -2022-10-30 21:19:06 - r - INFO: - Episode: 558/1000, Reward: 87.0, Step: 87 -2022-10-30 21:19:07 - r - INFO: - Episode: 559/1000, Reward: 29.0, Step: 29 -2022-10-30 21:19:07 - r - INFO: - Episode: 560/1000, Reward: 31.0, Step: 31 -2022-10-30 21:19:07 - r - INFO: - Episode: 561/1000, Reward: 112.0, Step: 112 -2022-10-30 21:19:08 - r - INFO: - Episode: 562/1000, Reward: 112.0, Step: 112 -2022-10-30 21:19:09 - r - INFO: - Episode: 563/1000, Reward: 108.0, Step: 108 -2022-10-30 21:19:09 - r - INFO: - Episode: 564/1000, Reward: 98.0, Step: 
98 -2022-10-30 21:19:10 - r - INFO: - Episode: 565/1000, Reward: 104.0, Step: 104 -2022-10-30 21:19:10 - r - INFO: - Episode: 566/1000, Reward: 116.0, Step: 116 -2022-10-30 21:19:11 - r - INFO: - Episode: 567/1000, Reward: 123.0, Step: 123 -2022-10-30 21:19:12 - r - INFO: - Episode: 568/1000, Reward: 105.0, Step: 105 -2022-10-30 21:19:12 - r - INFO: - Episode: 569/1000, Reward: 133.0, Step: 133 -2022-10-30 21:19:13 - r - INFO: - Episode: 570/1000, Reward: 116.0, Step: 116 -2022-10-30 21:19:14 - r - INFO: - Episode: 571/1000, Reward: 128.0, Step: 128 -2022-10-30 21:19:15 - r - INFO: - Episode: 572/1000, Reward: 130.0, Step: 130 -2022-10-30 21:19:15 - r - INFO: - Episode: 573/1000, Reward: 113.0, Step: 113 -2022-10-30 21:19:16 - r - INFO: - Episode: 574/1000, Reward: 143.0, Step: 143 -2022-10-30 21:19:17 - r - INFO: - Episode: 575/1000, Reward: 145.0, Step: 145 -2022-10-30 21:19:18 - r - INFO: - Episode: 576/1000, Reward: 159.0, Step: 159 -2022-10-30 21:19:19 - r - INFO: - Episode: 577/1000, Reward: 150.0, Step: 150 -2022-10-30 21:19:19 - r - INFO: - Episode: 578/1000, Reward: 130.0, Step: 130 -2022-10-30 21:19:20 - r - INFO: - Episode: 579/1000, Reward: 145.0, Step: 145 -2022-10-30 21:19:21 - r - INFO: - Episode: 580/1000, Reward: 173.0, Step: 173 -2022-10-30 21:19:22 - r - INFO: - Episode: 581/1000, Reward: 154.0, Step: 154 -2022-10-30 21:19:23 - r - INFO: - Episode: 582/1000, Reward: 131.0, Step: 131 -2022-10-30 21:19:24 - r - INFO: - Episode: 583/1000, Reward: 163.0, Step: 163 -2022-10-30 21:19:25 - r - INFO: - Episode: 584/1000, Reward: 160.0, Step: 160 -2022-10-30 21:19:26 - r - INFO: - Episode: 585/1000, Reward: 181.0, Step: 181 -2022-10-30 21:19:27 - r - INFO: - Episode: 586/1000, Reward: 161.0, Step: 161 -2022-10-30 21:19:28 - r - INFO: - Episode: 587/1000, Reward: 169.0, Step: 169 -2022-10-30 21:19:29 - r - INFO: - Episode: 588/1000, Reward: 150.0, Step: 150 -2022-10-30 21:19:30 - r - INFO: - Episode: 589/1000, Reward: 176.0, Step: 176 -2022-10-30 21:19:31 
- r - INFO: - Episode: 590/1000, Reward: 157.0, Step: 157 -2022-10-30 21:19:32 - r - INFO: - Episode: 591/1000, Reward: 167.0, Step: 167 -2022-10-30 21:19:33 - r - INFO: - Episode: 592/1000, Reward: 168.0, Step: 168 -2022-10-30 21:19:34 - r - INFO: - Episode: 593/1000, Reward: 135.0, Step: 135 -2022-10-30 21:19:35 - r - INFO: - Episode: 594/1000, Reward: 157.0, Step: 157 -2022-10-30 21:19:35 - r - INFO: - Episode: 595/1000, Reward: 138.0, Step: 138 -2022-10-30 21:19:36 - r - INFO: - Episode: 596/1000, Reward: 139.0, Step: 139 -2022-10-30 21:19:37 - r - INFO: - Episode: 597/1000, Reward: 146.0, Step: 146 -2022-10-30 21:19:38 - r - INFO: - Episode: 598/1000, Reward: 121.0, Step: 121 -2022-10-30 21:19:39 - r - INFO: - Episode: 599/1000, Reward: 140.0, Step: 140 -2022-10-30 21:19:40 - r - INFO: - Episode: 600/1000, Reward: 124.0, Step: 124 -2022-10-30 21:19:41 - r - INFO: - Episode: 601/1000, Reward: 124.0, Step: 124 -2022-10-30 21:19:42 - r - INFO: - Episode: 602/1000, Reward: 115.0, Step: 115 -2022-10-30 21:19:42 - r - INFO: - Episode: 603/1000, Reward: 129.0, Step: 129 -2022-10-30 21:19:43 - r - INFO: - Episode: 604/1000, Reward: 107.0, Step: 107 -2022-10-30 21:19:44 - r - INFO: - Episode: 605/1000, Reward: 118.0, Step: 118 -2022-10-30 21:19:44 - r - INFO: - Episode: 606/1000, Reward: 108.0, Step: 108 -2022-10-30 21:19:45 - r - INFO: - Episode: 607/1000, Reward: 102.0, Step: 102 -2022-10-30 21:19:46 - r - INFO: - Episode: 608/1000, Reward: 105.0, Step: 105 -2022-10-30 21:19:46 - r - INFO: - Episode: 609/1000, Reward: 103.0, Step: 103 -2022-10-30 21:19:47 - r - INFO: - Episode: 610/1000, Reward: 96.0, Step: 96 -2022-10-30 21:19:47 - r - INFO: - Episode: 611/1000, Reward: 116.0, Step: 116 -2022-10-30 21:19:48 - r - INFO: - Episode: 612/1000, Reward: 51.0, Step: 51 -2022-10-30 21:19:48 - r - INFO: - Episode: 613/1000, Reward: 100.0, Step: 100 -2022-10-30 21:19:49 - r - INFO: - Episode: 614/1000, Reward: 121.0, Step: 121 -2022-10-30 21:19:50 - r - INFO: - Episode: 
615/1000, Reward: 109.0, Step: 109 -2022-10-30 21:19:50 - r - INFO: - Episode: 616/1000, Reward: 85.0, Step: 85 -2022-10-30 21:19:51 - r - INFO: - Episode: 617/1000, Reward: 111.0, Step: 111 -2022-10-30 21:19:52 - r - INFO: - Episode: 618/1000, Reward: 91.0, Step: 91 -2022-10-30 21:19:52 - r - INFO: - Episode: 619/1000, Reward: 127.0, Step: 127 -2022-10-30 21:19:53 - r - INFO: - Episode: 620/1000, Reward: 117.0, Step: 117 -2022-10-30 21:19:53 - r - INFO: - Episode: 621/1000, Reward: 104.0, Step: 104 -2022-10-30 21:19:54 - r - INFO: - Episode: 622/1000, Reward: 119.0, Step: 119 -2022-10-30 21:19:55 - r - INFO: - Episode: 623/1000, Reward: 111.0, Step: 111 -2022-10-30 21:19:56 - r - INFO: - Episode: 624/1000, Reward: 132.0, Step: 132 -2022-10-30 21:19:56 - r - INFO: - Episode: 625/1000, Reward: 130.0, Step: 130 -2022-10-30 21:19:57 - r - INFO: - Episode: 626/1000, Reward: 140.0, Step: 140 -2022-10-30 21:19:58 - r - INFO: - Episode: 627/1000, Reward: 95.0, Step: 95 -2022-10-30 21:19:58 - r - INFO: - Episode: 628/1000, Reward: 106.0, Step: 106 -2022-10-30 21:19:59 - r - INFO: - Episode: 629/1000, Reward: 120.0, Step: 120 -2022-10-30 21:20:00 - r - INFO: - Episode: 630/1000, Reward: 111.0, Step: 111 -2022-10-30 21:20:00 - r - INFO: - Episode: 631/1000, Reward: 114.0, Step: 114 -2022-10-30 21:20:01 - r - INFO: - Episode: 632/1000, Reward: 126.0, Step: 126 -2022-10-30 21:20:02 - r - INFO: - Episode: 633/1000, Reward: 100.0, Step: 100 -2022-10-30 21:20:03 - r - INFO: - Episode: 634/1000, Reward: 111.0, Step: 111 -2022-10-30 21:20:03 - r - INFO: - Episode: 635/1000, Reward: 104.0, Step: 104 -2022-10-30 21:20:04 - r - INFO: - Episode: 636/1000, Reward: 103.0, Step: 103 -2022-10-30 21:20:04 - r - INFO: - Episode: 637/1000, Reward: 111.0, Step: 111 -2022-10-30 21:20:05 - r - INFO: - Episode: 638/1000, Reward: 110.0, Step: 110 -2022-10-30 21:20:06 - r - INFO: - Episode: 639/1000, Reward: 131.0, Step: 131 -2022-10-30 21:20:06 - r - INFO: - Episode: 640/1000, Reward: 90.0, Step: 
90 -2022-10-30 21:20:07 - r - INFO: - Episode: 641/1000, Reward: 97.0, Step: 97 -2022-10-30 21:20:08 - r - INFO: - Episode: 642/1000, Reward: 104.0, Step: 104 -2022-10-30 21:20:09 - r - INFO: - Episode: 643/1000, Reward: 91.0, Step: 91 -2022-10-30 21:20:09 - r - INFO: - Episode: 644/1000, Reward: 97.0, Step: 97 -2022-10-30 21:20:10 - r - INFO: - Episode: 645/1000, Reward: 109.0, Step: 109 -2022-10-30 21:20:10 - r - INFO: - Episode: 646/1000, Reward: 112.0, Step: 112 -2022-10-30 21:20:11 - r - INFO: - Episode: 647/1000, Reward: 97.0, Step: 97 -2022-10-30 21:20:11 - r - INFO: - Episode: 648/1000, Reward: 32.0, Step: 32 -2022-10-30 21:20:12 - r - INFO: - Episode: 649/1000, Reward: 94.0, Step: 94 -2022-10-30 21:20:13 - r - INFO: - Episode: 650/1000, Reward: 107.0, Step: 107 -2022-10-30 21:20:13 - r - INFO: - Episode: 651/1000, Reward: 61.0, Step: 61 -2022-10-30 21:20:14 - r - INFO: - Episode: 652/1000, Reward: 97.0, Step: 97 -2022-10-30 21:20:14 - r - INFO: - Episode: 653/1000, Reward: 99.0, Step: 99 -2022-10-30 21:20:15 - r - INFO: - Episode: 654/1000, Reward: 76.0, Step: 76 -2022-10-30 21:20:15 - r - INFO: - Episode: 655/1000, Reward: 38.0, Step: 38 -2022-10-30 21:20:15 - r - INFO: - Episode: 656/1000, Reward: 96.0, Step: 96 -2022-10-30 21:20:16 - r - INFO: - Episode: 657/1000, Reward: 96.0, Step: 96 -2022-10-30 21:20:16 - r - INFO: - Episode: 658/1000, Reward: 65.0, Step: 65 -2022-10-30 21:20:17 - r - INFO: - Episode: 659/1000, Reward: 45.0, Step: 45 -2022-10-30 21:20:17 - r - INFO: - Episode: 660/1000, Reward: 91.0, Step: 91 -2022-10-30 21:20:18 - r - INFO: - Episode: 661/1000, Reward: 78.0, Step: 78 -2022-10-30 21:20:18 - r - INFO: - Episode: 662/1000, Reward: 90.0, Step: 90 -2022-10-30 21:20:19 - r - INFO: - Episode: 663/1000, Reward: 92.0, Step: 92 -2022-10-30 21:20:19 - r - INFO: - Episode: 664/1000, Reward: 94.0, Step: 94 -2022-10-30 21:20:20 - r - INFO: - Episode: 665/1000, Reward: 101.0, Step: 101 -2022-10-30 21:20:20 - r - INFO: - Episode: 666/1000, Reward: 
111.0, Step: 111 -2022-10-30 21:20:21 - r - INFO: - Episode: 667/1000, Reward: 109.0, Step: 109 -2022-10-30 21:20:22 - r - INFO: - Episode: 668/1000, Reward: 99.0, Step: 99 -2022-10-30 21:20:22 - r - INFO: - Episode: 669/1000, Reward: 115.0, Step: 115 -2022-10-30 21:20:23 - r - INFO: - Episode: 670/1000, Reward: 112.0, Step: 112 -2022-10-30 21:20:23 - r - INFO: - Episode: 671/1000, Reward: 113.0, Step: 113 -2022-10-30 21:20:24 - r - INFO: - Episode: 672/1000, Reward: 110.0, Step: 110 -2022-10-30 21:20:25 - r - INFO: - Episode: 673/1000, Reward: 108.0, Step: 108 -2022-10-30 21:20:26 - r - INFO: - Episode: 674/1000, Reward: 112.0, Step: 112 -2022-10-30 21:20:26 - r - INFO: - Episode: 675/1000, Reward: 125.0, Step: 125 -2022-10-30 21:20:27 - r - INFO: - Episode: 676/1000, Reward: 122.0, Step: 122 -2022-10-30 21:20:28 - r - INFO: - Episode: 677/1000, Reward: 114.0, Step: 114 -2022-10-30 21:20:28 - r - INFO: - Episode: 678/1000, Reward: 127.0, Step: 127 -2022-10-30 21:20:29 - r - INFO: - Episode: 679/1000, Reward: 125.0, Step: 125 -2022-10-30 21:20:30 - r - INFO: - Episode: 680/1000, Reward: 112.0, Step: 112 -2022-10-30 21:20:30 - r - INFO: - Episode: 681/1000, Reward: 111.0, Step: 111 -2022-10-30 21:20:31 - r - INFO: - Episode: 682/1000, Reward: 124.0, Step: 124 -2022-10-30 21:20:32 - r - INFO: - Episode: 683/1000, Reward: 113.0, Step: 113 -2022-10-30 21:20:33 - r - INFO: - Episode: 684/1000, Reward: 103.0, Step: 103 -2022-10-30 21:20:33 - r - INFO: - Episode: 685/1000, Reward: 119.0, Step: 119 -2022-10-30 21:20:34 - r - INFO: - Episode: 686/1000, Reward: 120.0, Step: 120 -2022-10-30 21:20:35 - r - INFO: - Episode: 687/1000, Reward: 95.0, Step: 95 -2022-10-30 21:20:35 - r - INFO: - Episode: 688/1000, Reward: 100.0, Step: 100 -2022-10-30 21:20:36 - r - INFO: - Episode: 689/1000, Reward: 29.0, Step: 29 -2022-10-30 21:20:36 - r - INFO: - Episode: 690/1000, Reward: 119.0, Step: 119 -2022-10-30 21:20:37 - r - INFO: - Episode: 691/1000, Reward: 107.0, Step: 107 -2022-10-30 
21:20:38 - r - INFO: - Episode: 692/1000, Reward: 117.0, Step: 117 -2022-10-30 21:20:38 - r - INFO: - Episode: 693/1000, Reward: 78.0, Step: 78 -2022-10-30 21:20:38 - r - INFO: - Episode: 694/1000, Reward: 35.0, Step: 35 -2022-10-30 21:20:39 - r - INFO: - Episode: 695/1000, Reward: 101.0, Step: 101 -2022-10-30 21:20:40 - r - INFO: - Episode: 696/1000, Reward: 98.0, Step: 98 -2022-10-30 21:20:40 - r - INFO: - Episode: 697/1000, Reward: 94.0, Step: 94 -2022-10-30 21:20:41 - r - INFO: - Episode: 698/1000, Reward: 102.0, Step: 102 -2022-10-30 21:20:41 - r - INFO: - Episode: 699/1000, Reward: 90.0, Step: 90 -2022-10-30 21:20:42 - r - INFO: - Episode: 700/1000, Reward: 86.0, Step: 86 -2022-10-30 21:20:42 - r - INFO: - Episode: 701/1000, Reward: 81.0, Step: 81 -2022-10-30 21:20:43 - r - INFO: - Episode: 702/1000, Reward: 105.0, Step: 105 -2022-10-30 21:20:43 - r - INFO: - Episode: 703/1000, Reward: 72.0, Step: 72 -2022-10-30 21:20:44 - r - INFO: - Episode: 704/1000, Reward: 100.0, Step: 100 -2022-10-30 21:20:44 - r - INFO: - Episode: 705/1000, Reward: 96.0, Step: 96 -2022-10-30 21:20:45 - r - INFO: - Episode: 706/1000, Reward: 111.0, Step: 111 -2022-10-30 21:20:45 - r - INFO: - Episode: 707/1000, Reward: 27.0, Step: 27 -2022-10-30 21:20:46 - r - INFO: - Episode: 708/1000, Reward: 107.0, Step: 107 -2022-10-30 21:20:47 - r - INFO: - Episode: 709/1000, Reward: 87.0, Step: 87 -2022-10-30 21:20:47 - r - INFO: - Episode: 710/1000, Reward: 114.0, Step: 114 -2022-10-30 21:20:48 - r - INFO: - Episode: 711/1000, Reward: 111.0, Step: 111 -2022-10-30 21:20:48 - r - INFO: - Episode: 712/1000, Reward: 88.0, Step: 88 -2022-10-30 21:20:49 - r - INFO: - Episode: 713/1000, Reward: 112.0, Step: 112 -2022-10-30 21:20:50 - r - INFO: - Episode: 714/1000, Reward: 108.0, Step: 108 -2022-10-30 21:20:50 - r - INFO: - Episode: 715/1000, Reward: 108.0, Step: 108 -2022-10-30 21:20:51 - r - INFO: - Episode: 716/1000, Reward: 103.0, Step: 103 -2022-10-30 21:20:52 - r - INFO: - Episode: 717/1000, 
Reward: 120.0, Step: 120 -2022-10-30 21:20:52 - r - INFO: - Episode: 718/1000, Reward: 116.0, Step: 116 -2022-10-30 21:20:53 - r - INFO: - Episode: 719/1000, Reward: 112.0, Step: 112 -2022-10-30 21:20:54 - r - INFO: - Episode: 720/1000, Reward: 99.0, Step: 99 -2022-10-30 21:20:54 - r - INFO: - Episode: 721/1000, Reward: 118.0, Step: 118 -2022-10-30 21:20:55 - r - INFO: - Episode: 722/1000, Reward: 114.0, Step: 114 -2022-10-30 21:20:56 - r - INFO: - Episode: 723/1000, Reward: 104.0, Step: 104 -2022-10-30 21:20:56 - r - INFO: - Episode: 724/1000, Reward: 99.0, Step: 99 -2022-10-30 21:20:57 - r - INFO: - Episode: 725/1000, Reward: 102.0, Step: 102 -2022-10-30 21:20:57 - r - INFO: - Episode: 726/1000, Reward: 106.0, Step: 106 -2022-10-30 21:20:58 - r - INFO: - Episode: 727/1000, Reward: 31.0, Step: 31 -2022-10-30 21:20:58 - r - INFO: - Episode: 728/1000, Reward: 91.0, Step: 91 -2022-10-30 21:20:59 - r - INFO: - Episode: 729/1000, Reward: 32.0, Step: 32 -2022-10-30 21:20:59 - r - INFO: - Episode: 730/1000, Reward: 96.0, Step: 96 -2022-10-30 21:20:59 - r - INFO: - Episode: 731/1000, Reward: 20.0, Step: 20 -2022-10-30 21:21:00 - r - INFO: - Episode: 732/1000, Reward: 33.0, Step: 33 -2022-10-30 21:21:00 - r - INFO: - Episode: 733/1000, Reward: 23.0, Step: 23 -2022-10-30 21:21:00 - r - INFO: - Episode: 734/1000, Reward: 80.0, Step: 80 -2022-10-30 21:21:01 - r - INFO: - Episode: 735/1000, Reward: 35.0, Step: 35 -2022-10-30 21:21:01 - r - INFO: - Episode: 736/1000, Reward: 88.0, Step: 88 -2022-10-30 21:21:01 - r - INFO: - Episode: 737/1000, Reward: 28.0, Step: 28 -2022-10-30 21:21:01 - r - INFO: - Episode: 738/1000, Reward: 26.0, Step: 26 -2022-10-30 21:21:02 - r - INFO: - Episode: 739/1000, Reward: 70.0, Step: 70 -2022-10-30 21:21:02 - r - INFO: - Episode: 740/1000, Reward: 86.0, Step: 86 -2022-10-30 21:21:02 - r - INFO: - Episode: 741/1000, Reward: 28.0, Step: 28 -2022-10-30 21:21:02 - r - INFO: - Episode: 742/1000, Reward: 39.0, Step: 39 -2022-10-30 21:21:03 - r - INFO: - 
Episode: 743/1000, Reward: 65.0, Step: 65 -2022-10-30 21:21:03 - r - INFO: - Episode: 744/1000, Reward: 52.0, Step: 52 -2022-10-30 21:21:03 - r - INFO: - Episode: 745/1000, Reward: 43.0, Step: 43 -2022-10-30 21:21:04 - r - INFO: - Episode: 746/1000, Reward: 97.0, Step: 97 -2022-10-30 21:21:04 - r - INFO: - Episode: 747/1000, Reward: 27.0, Step: 27 -2022-10-30 21:21:05 - r - INFO: - Episode: 748/1000, Reward: 89.0, Step: 89 -2022-10-30 21:21:05 - r - INFO: - Episode: 749/1000, Reward: 34.0, Step: 34 -2022-10-30 21:21:05 - r - INFO: - Episode: 750/1000, Reward: 35.0, Step: 35 -2022-10-30 21:21:06 - r - INFO: - Episode: 751/1000, Reward: 28.0, Step: 28 -2022-10-30 21:21:06 - r - INFO: - Episode: 752/1000, Reward: 96.0, Step: 96 -2022-10-30 21:21:07 - r - INFO: - Episode: 753/1000, Reward: 97.0, Step: 97 -2022-10-30 21:21:07 - r - INFO: - Episode: 754/1000, Reward: 108.0, Step: 108 -2022-10-30 21:21:08 - r - INFO: - Episode: 755/1000, Reward: 45.0, Step: 45 -2022-10-30 21:21:09 - r - INFO: - Episode: 756/1000, Reward: 103.0, Step: 103 -2022-10-30 21:21:10 - r - INFO: - Episode: 757/1000, Reward: 97.0, Step: 97 -2022-10-30 21:21:10 - r - INFO: - Episode: 758/1000, Reward: 114.0, Step: 114 -2022-10-30 21:21:11 - r - INFO: - Episode: 759/1000, Reward: 103.0, Step: 103 -2022-10-30 21:21:12 - r - INFO: - Episode: 760/1000, Reward: 116.0, Step: 116 -2022-10-30 21:21:12 - r - INFO: - Episode: 761/1000, Reward: 127.0, Step: 127 -2022-10-30 21:21:13 - r - INFO: - Episode: 762/1000, Reward: 122.0, Step: 122 -2022-10-30 21:21:14 - r - INFO: - Episode: 763/1000, Reward: 112.0, Step: 112 -2022-10-30 21:21:14 - r - INFO: - Episode: 764/1000, Reward: 112.0, Step: 112 -2022-10-30 21:21:15 - r - INFO: - Episode: 765/1000, Reward: 120.0, Step: 120 -2022-10-30 21:21:16 - r - INFO: - Episode: 766/1000, Reward: 129.0, Step: 129 -2022-10-30 21:21:17 - r - INFO: - Episode: 767/1000, Reward: 127.0, Step: 127 -2022-10-30 21:21:18 - r - INFO: - Episode: 768/1000, Reward: 125.0, Step: 125 
-2022-10-30 21:21:19 - r - INFO: - Episode: 769/1000, Reward: 124.0, Step: 124 -2022-10-30 21:21:20 - r - INFO: - Episode: 770/1000, Reward: 126.0, Step: 126 -2022-10-30 21:21:20 - r - INFO: - Episode: 771/1000, Reward: 129.0, Step: 129 -2022-10-30 21:21:21 - r - INFO: - Episode: 772/1000, Reward: 129.0, Step: 129 -2022-10-30 21:21:22 - r - INFO: - Episode: 773/1000, Reward: 43.0, Step: 43 -2022-10-30 21:21:22 - r - INFO: - Episode: 774/1000, Reward: 121.0, Step: 121 -2022-10-30 21:21:23 - r - INFO: - Episode: 775/1000, Reward: 40.0, Step: 40 -2022-10-30 21:21:24 - r - INFO: - Episode: 776/1000, Reward: 116.0, Step: 116 -2022-10-30 21:21:24 - r - INFO: - Episode: 777/1000, Reward: 117.0, Step: 117 -2022-10-30 21:21:25 - r - INFO: - Episode: 778/1000, Reward: 113.0, Step: 113 -2022-10-30 21:21:26 - r - INFO: - Episode: 779/1000, Reward: 117.0, Step: 117 -2022-10-30 21:21:26 - r - INFO: - Episode: 780/1000, Reward: 108.0, Step: 108 -2022-10-30 21:21:27 - r - INFO: - Episode: 781/1000, Reward: 108.0, Step: 108 -2022-10-30 21:21:28 - r - INFO: - Episode: 782/1000, Reward: 119.0, Step: 119 -2022-10-30 21:21:28 - r - INFO: - Episode: 783/1000, Reward: 109.0, Step: 109 -2022-10-30 21:21:29 - r - INFO: - Episode: 784/1000, Reward: 116.0, Step: 116 -2022-10-30 21:21:29 - r - INFO: - Episode: 785/1000, Reward: 114.0, Step: 114 -2022-10-30 21:21:30 - r - INFO: - Episode: 786/1000, Reward: 45.0, Step: 45 -2022-10-30 21:21:31 - r - INFO: - Episode: 787/1000, Reward: 116.0, Step: 116 -2022-10-30 21:21:31 - r - INFO: - Episode: 788/1000, Reward: 116.0, Step: 116 -2022-10-30 21:21:32 - r - INFO: - Episode: 789/1000, Reward: 110.0, Step: 110 -2022-10-30 21:21:32 - r - INFO: - Episode: 790/1000, Reward: 105.0, Step: 105 -2022-10-30 21:21:33 - r - INFO: - Episode: 791/1000, Reward: 110.0, Step: 110 -2022-10-30 21:21:34 - r - INFO: - Episode: 792/1000, Reward: 112.0, Step: 112 -2022-10-30 21:21:34 - r - INFO: - Episode: 793/1000, Reward: 104.0, Step: 104 -2022-10-30 21:21:35 - r - 
INFO: - Episode: 794/1000, Reward: 120.0, Step: 120 -2022-10-30 21:21:36 - r - INFO: - Episode: 795/1000, Reward: 110.0, Step: 110 -2022-10-30 21:21:36 - r - INFO: - Episode: 796/1000, Reward: 113.0, Step: 113 -2022-10-30 21:21:37 - r - INFO: - Episode: 797/1000, Reward: 33.0, Step: 33 -2022-10-30 21:21:37 - r - INFO: - Episode: 798/1000, Reward: 111.0, Step: 111 -2022-10-30 21:21:38 - r - INFO: - Episode: 799/1000, Reward: 31.0, Step: 31 -2022-10-30 21:21:38 - r - INFO: - Episode: 800/1000, Reward: 139.0, Step: 139 -2022-10-30 21:21:39 - r - INFO: - Episode: 801/1000, Reward: 110.0, Step: 110 -2022-10-30 21:21:40 - r - INFO: - Episode: 802/1000, Reward: 124.0, Step: 124 -2022-10-30 21:21:41 - r - INFO: - Episode: 803/1000, Reward: 120.0, Step: 120 -2022-10-30 21:21:41 - r - INFO: - Episode: 804/1000, Reward: 112.0, Step: 112 -2022-10-30 21:21:42 - r - INFO: - Episode: 805/1000, Reward: 116.0, Step: 116 -2022-10-30 21:21:43 - r - INFO: - Episode: 806/1000, Reward: 105.0, Step: 105 -2022-10-30 21:21:43 - r - INFO: - Episode: 807/1000, Reward: 125.0, Step: 125 -2022-10-30 21:21:44 - r - INFO: - Episode: 808/1000, Reward: 103.0, Step: 103 -2022-10-30 21:21:45 - r - INFO: - Episode: 809/1000, Reward: 122.0, Step: 122 -2022-10-30 21:21:45 - r - INFO: - Episode: 810/1000, Reward: 109.0, Step: 109 -2022-10-30 21:21:46 - r - INFO: - Episode: 811/1000, Reward: 118.0, Step: 118 -2022-10-30 21:21:47 - r - INFO: - Episode: 812/1000, Reward: 124.0, Step: 124 -2022-10-30 21:21:48 - r - INFO: - Episode: 813/1000, Reward: 115.0, Step: 115 -2022-10-30 21:21:48 - r - INFO: - Episode: 814/1000, Reward: 26.0, Step: 26 -2022-10-30 21:21:49 - r - INFO: - Episode: 815/1000, Reward: 118.0, Step: 118 -2022-10-30 21:21:49 - r - INFO: - Episode: 816/1000, Reward: 118.0, Step: 118 -2022-10-30 21:21:50 - r - INFO: - Episode: 817/1000, Reward: 31.0, Step: 31 -2022-10-30 21:21:50 - r - INFO: - Episode: 818/1000, Reward: 99.0, Step: 99 -2022-10-30 21:21:51 - r - INFO: - Episode: 819/1000, Reward: 
122.0, Step: 122 -2022-10-30 21:21:52 - r - INFO: - Episode: 820/1000, Reward: 102.0, Step: 102 -2022-10-30 21:21:52 - r - INFO: - Episode: 821/1000, Reward: 111.0, Step: 111 -2022-10-30 21:21:53 - r - INFO: - Episode: 822/1000, Reward: 110.0, Step: 110 -2022-10-30 21:21:54 - r - INFO: - Episode: 823/1000, Reward: 113.0, Step: 113 -2022-10-30 21:21:54 - r - INFO: - Episode: 824/1000, Reward: 117.0, Step: 117 -2022-10-30 21:21:55 - r - INFO: - Episode: 825/1000, Reward: 113.0, Step: 113 -2022-10-30 21:21:56 - r - INFO: - Episode: 826/1000, Reward: 109.0, Step: 109 -2022-10-30 21:21:57 - r - INFO: - Episode: 827/1000, Reward: 122.0, Step: 122 -2022-10-30 21:21:57 - r - INFO: - Episode: 828/1000, Reward: 117.0, Step: 117 -2022-10-30 21:21:58 - r - INFO: - Episode: 829/1000, Reward: 127.0, Step: 127 -2022-10-30 21:21:59 - r - INFO: - Episode: 830/1000, Reward: 113.0, Step: 113 -2022-10-30 21:21:59 - r - INFO: - Episode: 831/1000, Reward: 118.0, Step: 118 -2022-10-30 21:22:00 - r - INFO: - Episode: 832/1000, Reward: 107.0, Step: 107 -2022-10-30 21:22:01 - r - INFO: - Episode: 833/1000, Reward: 108.0, Step: 108 -2022-10-30 21:22:01 - r - INFO: - Episode: 834/1000, Reward: 103.0, Step: 103 -2022-10-30 21:22:02 - r - INFO: - Episode: 835/1000, Reward: 126.0, Step: 126 -2022-10-30 21:22:03 - r - INFO: - Episode: 836/1000, Reward: 131.0, Step: 131 -2022-10-30 21:22:03 - r - INFO: - Episode: 837/1000, Reward: 106.0, Step: 106 -2022-10-30 21:22:04 - r - INFO: - Episode: 838/1000, Reward: 116.0, Step: 116 -2022-10-30 21:22:05 - r - INFO: - Episode: 839/1000, Reward: 24.0, Step: 24 -2022-10-30 21:22:05 - r - INFO: - Episode: 840/1000, Reward: 107.0, Step: 107 -2022-10-30 21:22:06 - r - INFO: - Episode: 841/1000, Reward: 124.0, Step: 124 -2022-10-30 21:22:07 - r - INFO: - Episode: 842/1000, Reward: 125.0, Step: 125 -2022-10-30 21:22:07 - r - INFO: - Episode: 843/1000, Reward: 110.0, Step: 110 -2022-10-30 21:22:08 - r - INFO: - Episode: 844/1000, Reward: 112.0, Step: 112 
-2022-10-30 21:22:09 - r - INFO: - Episode: 845/1000, Reward: 105.0, Step: 105 -2022-10-30 21:22:09 - r - INFO: - Episode: 846/1000, Reward: 104.0, Step: 104 -2022-10-30 21:22:10 - r - INFO: - Episode: 847/1000, Reward: 134.0, Step: 134 -2022-10-30 21:22:11 - r - INFO: - Episode: 848/1000, Reward: 107.0, Step: 107 -2022-10-30 21:22:11 - r - INFO: - Episode: 849/1000, Reward: 128.0, Step: 128 -2022-10-30 21:22:12 - r - INFO: - Episode: 850/1000, Reward: 113.0, Step: 113 -2022-10-30 21:22:13 - r - INFO: - Episode: 851/1000, Reward: 138.0, Step: 138 -2022-10-30 21:22:14 - r - INFO: - Episode: 852/1000, Reward: 118.0, Step: 118 -2022-10-30 21:22:14 - r - INFO: - Episode: 853/1000, Reward: 142.0, Step: 142 -2022-10-30 21:22:15 - r - INFO: - Episode: 854/1000, Reward: 118.0, Step: 118 -2022-10-30 21:22:16 - r - INFO: - Episode: 855/1000, Reward: 122.0, Step: 122 -2022-10-30 21:22:16 - r - INFO: - Episode: 856/1000, Reward: 130.0, Step: 130 -2022-10-30 21:22:17 - r - INFO: - Episode: 857/1000, Reward: 126.0, Step: 126 -2022-10-30 21:22:18 - r - INFO: - Episode: 858/1000, Reward: 111.0, Step: 111 -2022-10-30 21:22:19 - r - INFO: - Episode: 859/1000, Reward: 114.0, Step: 114 -2022-10-30 21:22:19 - r - INFO: - Episode: 860/1000, Reward: 128.0, Step: 128 -2022-10-30 21:22:20 - r - INFO: - Episode: 861/1000, Reward: 126.0, Step: 126 -2022-10-30 21:22:21 - r - INFO: - Episode: 862/1000, Reward: 143.0, Step: 143 -2022-10-30 21:22:22 - r - INFO: - Episode: 863/1000, Reward: 132.0, Step: 132 -2022-10-30 21:22:22 - r - INFO: - Episode: 864/1000, Reward: 123.0, Step: 123 -2022-10-30 21:22:23 - r - INFO: - Episode: 865/1000, Reward: 111.0, Step: 111 -2022-10-30 21:22:24 - r - INFO: - Episode: 866/1000, Reward: 129.0, Step: 129 -2022-10-30 21:22:25 - r - INFO: - Episode: 867/1000, Reward: 121.0, Step: 121 -2022-10-30 21:22:25 - r - INFO: - Episode: 868/1000, Reward: 114.0, Step: 114 -2022-10-30 21:22:26 - r - INFO: - Episode: 869/1000, Reward: 110.0, Step: 110 -2022-10-30 21:22:27 - r 
- INFO: - Episode: 870/1000, Reward: 118.0, Step: 118 -2022-10-30 21:22:27 - r - INFO: - Episode: 871/1000, Reward: 120.0, Step: 120 -2022-10-30 21:22:28 - r - INFO: - Episode: 872/1000, Reward: 109.0, Step: 109 -2022-10-30 21:22:29 - r - INFO: - Episode: 873/1000, Reward: 106.0, Step: 106 -2022-10-30 21:22:29 - r - INFO: - Episode: 874/1000, Reward: 118.0, Step: 118 -2022-10-30 21:22:30 - r - INFO: - Episode: 875/1000, Reward: 104.0, Step: 104 -2022-10-30 21:22:30 - r - INFO: - Episode: 876/1000, Reward: 98.0, Step: 98 -2022-10-30 21:22:31 - r - INFO: - Episode: 877/1000, Reward: 115.0, Step: 115 -2022-10-30 21:22:31 - r - INFO: - Episode: 878/1000, Reward: 34.0, Step: 34 -2022-10-30 21:22:32 - r - INFO: - Episode: 879/1000, Reward: 96.0, Step: 96 -2022-10-30 21:22:33 - r - INFO: - Episode: 880/1000, Reward: 108.0, Step: 108 -2022-10-30 21:22:33 - r - INFO: - Episode: 881/1000, Reward: 105.0, Step: 105 -2022-10-30 21:22:34 - r - INFO: - Episode: 882/1000, Reward: 33.0, Step: 33 -2022-10-30 21:22:34 - r - INFO: - Episode: 883/1000, Reward: 105.0, Step: 105 -2022-10-30 21:22:35 - r - INFO: - Episode: 884/1000, Reward: 111.0, Step: 111 -2022-10-30 21:22:35 - r - INFO: - Episode: 885/1000, Reward: 112.0, Step: 112 -2022-10-30 21:22:36 - r - INFO: - Episode: 886/1000, Reward: 101.0, Step: 101 -2022-10-30 21:22:36 - r - INFO: - Episode: 887/1000, Reward: 25.0, Step: 25 -2022-10-30 21:22:37 - r - INFO: - Episode: 888/1000, Reward: 35.0, Step: 35 -2022-10-30 21:22:37 - r - INFO: - Episode: 889/1000, Reward: 99.0, Step: 99 -2022-10-30 21:22:38 - r - INFO: - Episode: 890/1000, Reward: 105.0, Step: 105 -2022-10-30 21:22:38 - r - INFO: - Episode: 891/1000, Reward: 36.0, Step: 36 -2022-10-30 21:22:38 - r - INFO: - Episode: 892/1000, Reward: 92.0, Step: 92 -2022-10-30 21:22:39 - r - INFO: - Episode: 893/1000, Reward: 104.0, Step: 104 -2022-10-30 21:22:39 - r - INFO: - Episode: 894/1000, Reward: 111.0, Step: 111 -2022-10-30 21:22:40 - r - INFO: - Episode: 895/1000, Reward: 
106.0, Step: 106 -2022-10-30 21:22:41 - r - INFO: - Episode: 896/1000, Reward: 109.0, Step: 109 -2022-10-30 21:22:41 - r - INFO: - Episode: 897/1000, Reward: 108.0, Step: 108 -2022-10-30 21:22:42 - r - INFO: - Episode: 898/1000, Reward: 101.0, Step: 101 -2022-10-30 21:22:42 - r - INFO: - Episode: 899/1000, Reward: 100.0, Step: 100 -2022-10-30 21:22:43 - r - INFO: - Episode: 900/1000, Reward: 33.0, Step: 33 -2022-10-30 21:22:44 - r - INFO: - Episode: 901/1000, Reward: 119.0, Step: 119 -2022-10-30 21:22:44 - r - INFO: - Episode: 902/1000, Reward: 112.0, Step: 112 -2022-10-30 21:22:45 - r - INFO: - Episode: 903/1000, Reward: 112.0, Step: 112 -2022-10-30 21:22:45 - r - INFO: - Episode: 904/1000, Reward: 126.0, Step: 126 -2022-10-30 21:22:46 - r - INFO: - Episode: 905/1000, Reward: 123.0, Step: 123 -2022-10-30 21:22:47 - r - INFO: - Episode: 906/1000, Reward: 125.0, Step: 125 -2022-10-30 21:22:47 - r - INFO: - Episode: 907/1000, Reward: 107.0, Step: 107 -2022-10-30 21:22:48 - r - INFO: - Episode: 908/1000, Reward: 128.0, Step: 128 -2022-10-30 21:22:49 - r - INFO: - Episode: 909/1000, Reward: 119.0, Step: 119 -2022-10-30 21:22:50 - r - INFO: - Episode: 910/1000, Reward: 142.0, Step: 142 -2022-10-30 21:22:50 - r - INFO: - Episode: 911/1000, Reward: 117.0, Step: 117 -2022-10-30 21:22:51 - r - INFO: - Episode: 912/1000, Reward: 125.0, Step: 125 -2022-10-30 21:22:52 - r - INFO: - Episode: 913/1000, Reward: 141.0, Step: 141 -2022-10-30 21:22:53 - r - INFO: - Episode: 914/1000, Reward: 134.0, Step: 134 -2022-10-30 21:22:53 - r - INFO: - Episode: 915/1000, Reward: 131.0, Step: 131 -2022-10-30 21:22:54 - r - INFO: - Episode: 916/1000, Reward: 131.0, Step: 131 -2022-10-30 21:22:55 - r - INFO: - Episode: 917/1000, Reward: 140.0, Step: 140 -2022-10-30 21:22:56 - r - INFO: - Episode: 918/1000, Reward: 115.0, Step: 115 -2022-10-30 21:22:56 - r - INFO: - Episode: 919/1000, Reward: 142.0, Step: 142 -2022-10-30 21:22:57 - r - INFO: - Episode: 920/1000, Reward: 142.0, Step: 142 
-2022-10-30 21:22:58 - r - INFO: - Episode: 921/1000, Reward: 128.0, Step: 128 -2022-10-30 21:22:59 - r - INFO: - Episode: 922/1000, Reward: 139.0, Step: 139 -2022-10-30 21:23:00 - r - INFO: - Episode: 923/1000, Reward: 133.0, Step: 133 -2022-10-30 21:23:01 - r - INFO: - Episode: 924/1000, Reward: 129.0, Step: 129 -2022-10-30 21:23:01 - r - INFO: - Episode: 925/1000, Reward: 124.0, Step: 124 -2022-10-30 21:23:02 - r - INFO: - Episode: 926/1000, Reward: 131.0, Step: 131 -2022-10-30 21:23:03 - r - INFO: - Episode: 927/1000, Reward: 125.0, Step: 125 -2022-10-30 21:23:04 - r - INFO: - Episode: 928/1000, Reward: 146.0, Step: 146 -2022-10-30 21:23:04 - r - INFO: - Episode: 929/1000, Reward: 118.0, Step: 118 -2022-10-30 21:23:05 - r - INFO: - Episode: 930/1000, Reward: 126.0, Step: 126 -2022-10-30 21:23:06 - r - INFO: - Episode: 931/1000, Reward: 134.0, Step: 134 -2022-10-30 21:23:07 - r - INFO: - Episode: 932/1000, Reward: 155.0, Step: 155 -2022-10-30 21:23:07 - r - INFO: - Episode: 933/1000, Reward: 134.0, Step: 134 -2022-10-30 21:23:08 - r - INFO: - Episode: 934/1000, Reward: 136.0, Step: 136 -2022-10-30 21:23:09 - r - INFO: - Episode: 935/1000, Reward: 146.0, Step: 146 -2022-10-30 21:23:10 - r - INFO: - Episode: 936/1000, Reward: 150.0, Step: 150 -2022-10-30 21:23:11 - r - INFO: - Episode: 937/1000, Reward: 167.0, Step: 167 -2022-10-30 21:23:12 - r - INFO: - Episode: 938/1000, Reward: 135.0, Step: 135 -2022-10-30 21:23:13 - r - INFO: - Episode: 939/1000, Reward: 197.0, Step: 197 -2022-10-30 21:23:14 - r - INFO: - Episode: 940/1000, Reward: 190.0, Step: 190 -2022-10-30 21:23:15 - r - INFO: - Episode: 941/1000, Reward: 170.0, Step: 170 -2022-10-30 21:23:16 - r - INFO: - Episode: 942/1000, Reward: 179.0, Step: 179 -2022-10-30 21:23:17 - r - INFO: - Episode: 943/1000, Reward: 192.0, Step: 192 -2022-10-30 21:23:18 - r - INFO: - Episode: 944/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:19 - r - INFO: - Current episode 944 has the best eval reward: 199.5 -2022-10-30 
21:23:20 - r - INFO: - Episode: 945/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:21 - r - INFO: - Episode: 946/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:22 - r - INFO: - Episode: 947/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:23 - r - INFO: - Current episode 947 has the best eval reward: 200.0 -2022-10-30 21:23:23 - r - INFO: - Episode: 948/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:25 - r - INFO: - Episode: 949/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:25 - r - INFO: - Current episode 949 has the best eval reward: 200.0 -2022-10-30 21:23:26 - r - INFO: - Episode: 950/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:26 - r - INFO: - Current episode 950 has the best eval reward: 200.0 -2022-10-30 21:23:27 - r - INFO: - Episode: 951/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:28 - r - INFO: - Current episode 951 has the best eval reward: 200.0 -2022-10-30 21:23:28 - r - INFO: - Episode: 952/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:29 - r - INFO: - Current episode 952 has the best eval reward: 200.0 -2022-10-30 21:23:29 - r - INFO: - Episode: 953/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:30 - r - INFO: - Current episode 953 has the best eval reward: 200.0 -2022-10-30 21:23:31 - r - INFO: - Episode: 954/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:31 - r - INFO: - Current episode 954 has the best eval reward: 200.0 -2022-10-30 21:23:32 - r - INFO: - Episode: 955/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:33 - r - INFO: - Current episode 955 has the best eval reward: 200.0 -2022-10-30 21:23:33 - r - INFO: - Episode: 956/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:34 - r - INFO: - Current episode 956 has the best eval reward: 200.0 -2022-10-30 21:23:34 - r - INFO: - Episode: 957/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:35 - r - INFO: - Current episode 957 has the best eval reward: 200.0 -2022-10-30 21:23:36 - r - INFO: - Episode: 958/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:36 - r - INFO: - 
Current episode 958 has the best eval reward: 200.0 -2022-10-30 21:23:37 - r - INFO: - Episode: 959/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:37 - r - INFO: - Current episode 959 has the best eval reward: 200.0 -2022-10-30 21:23:38 - r - INFO: - Episode: 960/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:39 - r - INFO: - Episode: 961/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:40 - r - INFO: - Current episode 961 has the best eval reward: 200.0 -2022-10-30 21:23:40 - r - INFO: - Episode: 962/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:41 - r - INFO: - Current episode 962 has the best eval reward: 200.0 -2022-10-30 21:23:42 - r - INFO: - Episode: 963/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:42 - r - INFO: - Current episode 963 has the best eval reward: 200.0 -2022-10-30 21:23:43 - r - INFO: - Episode: 964/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:43 - r - INFO: - Current episode 964 has the best eval reward: 200.0 -2022-10-30 21:23:44 - r - INFO: - Episode: 965/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:45 - r - INFO: - Episode: 966/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:46 - r - INFO: - Current episode 966 has the best eval reward: 200.0 -2022-10-30 21:23:46 - r - INFO: - Episode: 967/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:47 - r - INFO: - Current episode 967 has the best eval reward: 200.0 -2022-10-30 21:23:48 - r - INFO: - Episode: 968/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:48 - r - INFO: - Current episode 968 has the best eval reward: 200.0 -2022-10-30 21:23:49 - r - INFO: - Episode: 969/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:50 - r - INFO: - Current episode 969 has the best eval reward: 200.0 -2022-10-30 21:23:50 - r - INFO: - Episode: 970/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:51 - r - INFO: - Episode: 971/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:52 - r - INFO: - Current episode 971 has the best eval reward: 200.0 -2022-10-30 21:23:52 - r - INFO: - Episode: 
972/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:53 - r - INFO: - Current episode 972 has the best eval reward: 200.0 -2022-10-30 21:23:54 - r - INFO: - Episode: 973/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:54 - r - INFO: - Current episode 973 has the best eval reward: 200.0 -2022-10-30 21:23:55 - r - INFO: - Episode: 974/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:55 - r - INFO: - Current episode 974 has the best eval reward: 200.0 -2022-10-30 21:23:56 - r - INFO: - Episode: 975/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:57 - r - INFO: - Current episode 975 has the best eval reward: 200.0 -2022-10-30 21:23:57 - r - INFO: - Episode: 976/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:58 - r - INFO: - Current episode 976 has the best eval reward: 200.0 -2022-10-30 21:23:58 - r - INFO: - Episode: 977/1000, Reward: 200.0, Step: 200 -2022-10-30 21:23:59 - r - INFO: - Current episode 977 has the best eval reward: 200.0 -2022-10-30 21:24:00 - r - INFO: - Episode: 978/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:01 - r - INFO: - Episode: 979/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:01 - r - INFO: - Current episode 979 has the best eval reward: 200.0 -2022-10-30 21:24:02 - r - INFO: - Episode: 980/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:03 - r - INFO: - Current episode 980 has the best eval reward: 200.0 -2022-10-30 21:24:03 - r - INFO: - Episode: 981/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:04 - r - INFO: - Episode: 982/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:05 - r - INFO: - Episode: 983/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:06 - r - INFO: - Current episode 983 has the best eval reward: 200.0 -2022-10-30 21:24:07 - r - INFO: - Episode: 984/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:08 - r - INFO: - Episode: 985/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:09 - r - INFO: - Episode: 986/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:10 - r - INFO: - Episode: 987/1000, Reward: 200.0, Step: 
200 -2022-10-30 21:24:11 - r - INFO: - Current episode 987 has the best eval reward: 200.0 -2022-10-30 21:24:12 - r - INFO: - Episode: 988/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:12 - r - INFO: - Current episode 988 has the best eval reward: 200.0 -2022-10-30 21:24:13 - r - INFO: - Episode: 989/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:14 - r - INFO: - Episode: 990/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:15 - r - INFO: - Episode: 991/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:16 - r - INFO: - Current episode 991 has the best eval reward: 200.0 -2022-10-30 21:24:16 - r - INFO: - Episode: 992/1000, Reward: 198.0, Step: 198 -2022-10-30 21:24:17 - r - INFO: - Current episode 992 has the best eval reward: 200.0 -2022-10-30 21:24:18 - r - INFO: - Episode: 993/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:18 - r - INFO: - Current episode 993 has the best eval reward: 200.0 -2022-10-30 21:24:19 - r - INFO: - Episode: 994/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:19 - r - INFO: - Current episode 994 has the best eval reward: 200.0 -2022-10-30 21:24:20 - r - INFO: - Episode: 995/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:21 - r - INFO: - Episode: 996/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:22 - r - INFO: - Episode: 997/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:23 - r - INFO: - Episode: 998/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:24 - r - INFO: - Current episode 998 has the best eval reward: 200.0 -2022-10-30 21:24:25 - r - INFO: - Episode: 999/1000, Reward: 200.0, Step: 200 -2022-10-30 21:24:26 - r - INFO: - Episode: 1000/1000, Reward: 200.0, Step: 200 diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/models/actor_checkpoint.pt b/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/models/actor_checkpoint.pt deleted file mode 100644 index 89d0854..0000000 Binary files a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/models/actor_checkpoint.pt and /dev/null differ diff 
--git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/models/critic_checkpoint.pt b/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/models/critic_checkpoint.pt deleted file mode 100644 index 720f388..0000000 Binary files a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/models/critic_checkpoint.pt and /dev/null differ diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/results/learning_curve.png b/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/results/learning_curve.png deleted file mode 100644 index 8bbfcde..0000000 Binary files a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/results/res.csv b/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/results/res.csv deleted file mode 100644 index 6f853b9..0000000 --- a/projects/codes/A2C/Train_CartPole-v1_A2C_20221030-211435/results/res.csv +++ /dev/null @@ -1,1001 +0,0 @@ -episodes,rewards,steps -0,25.0,25 -1,13.0,13 -2,58.0,58 -3,10.0,10 -4,39.0,39 -5,39.0,39 -6,25.0,25 -7,22.0,22 -8,21.0,21 -9,27.0,27 -10,35.0,35 -11,26.0,26 -12,38.0,38 -13,29.0,29 -14,50.0,50 -15,20.0,20 -16,52.0,52 -17,12.0,12 -18,20.0,20 -19,38.0,38 -20,22.0,22 -21,36.0,36 -22,20.0,20 -23,35.0,35 -24,90.0,90 -25,29.0,29 -26,16.0,16 -27,25.0,25 -28,46.0,46 -29,33.0,33 -30,11.0,11 -31,27.0,27 -32,32.0,32 -33,21.0,21 -34,11.0,11 -35,21.0,21 -36,51.0,51 -37,29.0,29 -38,50.0,50 -39,19.0,19 -40,41.0,41 -41,28.0,28 -42,71.0,71 -43,45.0,45 -44,42.0,42 -45,39.0,39 -46,21.0,21 -47,14.0,14 -48,23.0,23 -49,21.0,21 -50,34.0,34 -51,14.0,14 -52,41.0,41 -53,99.0,99 -54,21.0,21 -55,52.0,52 -56,34.0,34 -57,73.0,73 -58,21.0,21 -59,27.0,27 -60,51.0,51 -61,46.0,46 -62,21.0,21 -63,20.0,20 -64,44.0,44 -65,16.0,16 -66,39.0,39 -67,30.0,30 -68,37.0,37 -69,20.0,20 -70,21.0,21 -71,13.0,13 -72,65.0,65 -73,45.0,45 -74,45.0,45 -75,46.0,46 -76,13.0,13 -77,33.0,33 -78,30.0,30 -79,52.0,52 
-80,27.0,27 -81,30.0,30 -82,47.0,47 -83,56.0,56 -84,19.0,19 -85,33.0,33 -86,25.0,25 -87,41.0,41 -88,20.0,20 -89,58.0,58 -90,35.0,35 -91,23.0,23 -92,12.0,12 -93,20.0,20 -94,10.0,10 -95,49.0,49 -96,29.0,29 -97,35.0,35 -98,36.0,36 -99,36.0,36 -100,16.0,16 -101,36.0,36 -102,30.0,30 -103,76.0,76 -104,52.0,52 -105,39.0,39 -106,52.0,52 -107,69.0,69 -108,27.0,27 -109,14.0,14 -110,28.0,28 -111,12.0,12 -112,26.0,26 -113,50.0,50 -114,25.0,25 -115,53.0,53 -116,19.0,19 -117,33.0,33 -118,34.0,34 -119,41.0,41 -120,25.0,25 -121,18.0,18 -122,114.0,114 -123,25.0,25 -124,46.0,46 -125,22.0,22 -126,71.0,71 -127,30.0,30 -128,130.0,130 -129,65.0,65 -130,55.0,55 -131,37.0,37 -132,46.0,46 -133,65.0,65 -134,31.0,31 -135,33.0,33 -136,39.0,39 -137,73.0,73 -138,78.0,78 -139,36.0,36 -140,56.0,56 -141,12.0,12 -142,36.0,36 -143,13.0,13 -144,85.0,85 -145,34.0,34 -146,16.0,16 -147,68.0,68 -148,94.0,94 -149,17.0,17 -150,64.0,64 -151,33.0,33 -152,63.0,63 -153,39.0,39 -154,72.0,72 -155,39.0,39 -156,37.0,37 -157,18.0,18 -158,55.0,55 -159,21.0,21 -160,54.0,54 -161,46.0,46 -162,21.0,21 -163,26.0,26 -164,70.0,70 -165,20.0,20 -166,41.0,41 -167,77.0,77 -168,13.0,13 -169,66.0,66 -170,72.0,72 -171,28.0,28 -172,68.0,68 -173,124.0,124 -174,41.0,41 -175,54.0,54 -176,33.0,33 -177,92.0,92 -178,23.0,23 -179,76.0,76 -180,47.0,47 -181,89.0,89 -182,84.0,84 -183,75.0,75 -184,64.0,64 -185,35.0,35 -186,44.0,44 -187,46.0,46 -188,67.0,67 -189,82.0,82 -190,55.0,55 -191,26.0,26 -192,116.0,116 -193,116.0,116 -194,119.0,119 -195,50.0,50 -196,43.0,43 -197,47.0,47 -198,71.0,71 -199,53.0,53 -200,137.0,137 -201,82.0,82 -202,120.0,120 -203,69.0,69 -204,55.0,55 -205,62.0,62 -206,64.0,64 -207,49.0,49 -208,32.0,32 -209,42.0,42 -210,50.0,50 -211,93.0,93 -212,60.0,60 -213,54.0,54 -214,68.0,68 -215,84.0,84 -216,55.0,55 -217,70.0,70 -218,115.0,115 -219,149.0,149 -220,68.0,68 -221,50.0,50 -222,56.0,56 -223,61.0,61 -224,117.0,117 -225,66.0,66 -226,127.0,127 -227,66.0,66 -228,48.0,48 -229,36.0,36 -230,79.0,79 -231,49.0,49 -232,55.0,55 
-233,41.0,41 -234,20.0,20 -235,40.0,40 -236,120.0,120 -237,27.0,27 -238,51.0,51 -239,35.0,35 -240,43.0,43 -241,54.0,54 -242,52.0,52 -243,47.0,47 -244,63.0,63 -245,29.0,29 -246,36.0,36 -247,58.0,58 -248,63.0,63 -249,49.0,49 -250,70.0,70 -251,114.0,114 -252,62.0,62 -253,73.0,73 -254,62.0,62 -255,61.0,61 -256,115.0,115 -257,50.0,50 -258,128.0,128 -259,200.0,200 -260,75.0,75 -261,64.0,64 -262,33.0,33 -263,90.0,90 -264,117.0,117 -265,60.0,60 -266,177.0,177 -267,39.0,39 -268,40.0,40 -269,109.0,109 -270,100.0,100 -271,99.0,99 -272,136.0,136 -273,62.0,62 -274,100.0,100 -275,73.0,73 -276,166.0,166 -277,74.0,74 -278,126.0,126 -279,111.0,111 -280,198.0,198 -281,106.0,106 -282,80.0,80 -283,74.0,74 -284,114.0,114 -285,69.0,69 -286,98.0,98 -287,63.0,63 -288,61.0,61 -289,49.0,49 -290,89.0,89 -291,114.0,114 -292,103.0,103 -293,103.0,103 -294,93.0,93 -295,137.0,137 -296,97.0,97 -297,124.0,124 -298,147.0,147 -299,125.0,125 -300,105.0,105 -301,113.0,113 -302,120.0,120 -303,159.0,159 -304,190.0,190 -305,119.0,119 -306,200.0,200 -307,148.0,148 -308,200.0,200 -309,79.0,79 -310,115.0,115 -311,147.0,147 -312,112.0,112 -313,125.0,125 -314,184.0,184 -315,193.0,193 -316,117.0,117 -317,153.0,153 -318,125.0,125 -319,184.0,184 -320,173.0,173 -321,117.0,117 -322,47.0,47 -323,107.0,107 -324,104.0,104 -325,114.0,114 -326,90.0,90 -327,112.0,112 -328,70.0,70 -329,74.0,74 -330,159.0,159 -331,39.0,39 -332,129.0,129 -333,50.0,50 -334,74.0,74 -335,31.0,31 -336,57.0,57 -337,71.0,71 -338,43.0,43 -339,41.0,41 -340,64.0,64 -341,38.0,38 -342,45.0,45 -343,120.0,120 -344,40.0,40 -345,46.0,46 -346,57.0,57 -347,29.0,29 -348,29.0,29 -349,50.0,50 -350,38.0,38 -351,51.0,51 -352,49.0,49 -353,30.0,30 -354,40.0,40 -355,45.0,45 -356,68.0,68 -357,27.0,27 -358,18.0,18 -359,26.0,26 -360,15.0,15 -361,65.0,65 -362,38.0,38 -363,41.0,41 -364,61.0,61 -365,113.0,113 -366,39.0,39 -367,60.0,60 -368,134.0,134 -369,122.0,122 -370,34.0,34 -371,129.0,129 -372,40.0,40 -373,128.0,128 -374,200.0,200 -375,108.0,108 -376,108.0,108 
-377,151.0,151 -378,79.0,79 -379,105.0,105 -380,87.0,87 -381,94.0,94 -382,112.0,112 -383,200.0,200 -384,184.0,184 -385,124.0,124 -386,200.0,200 -387,200.0,200 -388,109.0,109 -389,88.0,88 -390,104.0,104 -391,200.0,200 -392,84.0,84 -393,187.0,187 -394,182.0,182 -395,148.0,148 -396,86.0,86 -397,200.0,200 -398,199.0,199 -399,200.0,200 -400,92.0,92 -401,112.0,112 -402,86.0,86 -403,114.0,114 -404,90.0,90 -405,101.0,101 -406,111.0,111 -407,107.0,107 -408,120.0,120 -409,114.0,114 -410,97.0,97 -411,95.0,95 -412,126.0,126 -413,111.0,111 -414,120.0,120 -415,178.0,178 -416,97.0,97 -417,144.0,144 -418,200.0,200 -419,190.0,190 -420,29.0,29 -421,200.0,200 -422,116.0,116 -423,200.0,200 -424,107.0,107 -425,128.0,128 -426,164.0,164 -427,30.0,30 -428,122.0,122 -429,110.0,110 -430,105.0,105 -431,137.0,137 -432,110.0,110 -433,111.0,111 -434,33.0,33 -435,100.0,100 -436,131.0,131 -437,99.0,99 -438,118.0,118 -439,98.0,98 -440,119.0,119 -441,41.0,41 -442,107.0,107 -443,41.0,41 -444,113.0,113 -445,113.0,113 -446,117.0,117 -447,140.0,140 -448,133.0,133 -449,108.0,108 -450,117.0,117 -451,40.0,40 -452,108.0,108 -453,140.0,140 -454,133.0,133 -455,115.0,115 -456,30.0,30 -457,119.0,119 -458,160.0,160 -459,125.0,125 -460,161.0,161 -461,139.0,139 -462,190.0,190 -463,149.0,149 -464,173.0,173 -465,165.0,165 -466,82.0,82 -467,197.0,197 -468,200.0,200 -469,200.0,200 -470,200.0,200 -471,182.0,182 -472,118.0,118 -473,200.0,200 -474,200.0,200 -475,93.0,93 -476,200.0,200 -477,200.0,200 -478,167.0,167 -479,200.0,200 -480,200.0,200 -481,200.0,200 -482,190.0,190 -483,86.0,86 -484,166.0,166 -485,200.0,200 -486,200.0,200 -487,172.0,172 -488,200.0,200 -489,102.0,102 -490,194.0,194 -491,200.0,200 -492,179.0,179 -493,187.0,187 -494,200.0,200 -495,89.0,89 -496,169.0,169 -497,28.0,28 -498,160.0,160 -499,140.0,140 -500,37.0,37 -501,32.0,32 -502,129.0,129 -503,22.0,22 -504,124.0,124 -505,24.0,24 -506,115.0,115 -507,24.0,24 -508,38.0,38 -509,24.0,24 -510,23.0,23 -511,125.0,125 -512,22.0,22 -513,24.0,24 -514,20.0,20 
-515,25.0,25 -516,31.0,31 -517,23.0,23 -518,30.0,30 -519,101.0,101 -520,25.0,25 -521,22.0,22 -522,20.0,20 -523,16.0,16 -524,104.0,104 -525,17.0,17 -526,108.0,108 -527,121.0,121 -528,29.0,29 -529,29.0,29 -530,43.0,43 -531,105.0,105 -532,130.0,130 -533,30.0,30 -534,31.0,31 -535,30.0,30 -536,37.0,37 -537,115.0,115 -538,110.0,110 -539,112.0,112 -540,33.0,33 -541,120.0,120 -542,109.0,109 -543,122.0,122 -544,115.0,115 -545,34.0,34 -546,28.0,28 -547,29.0,29 -548,113.0,113 -549,100.0,100 -550,26.0,26 -551,24.0,24 -552,26.0,26 -553,102.0,102 -554,18.0,18 -555,107.0,107 -556,27.0,27 -557,87.0,87 -558,29.0,29 -559,31.0,31 -560,112.0,112 -561,112.0,112 -562,108.0,108 -563,98.0,98 -564,104.0,104 -565,116.0,116 -566,123.0,123 -567,105.0,105 -568,133.0,133 -569,116.0,116 -570,128.0,128 -571,130.0,130 -572,113.0,113 -573,143.0,143 -574,145.0,145 -575,159.0,159 -576,150.0,150 -577,130.0,130 -578,145.0,145 -579,173.0,173 -580,154.0,154 -581,131.0,131 -582,163.0,163 -583,160.0,160 -584,181.0,181 -585,161.0,161 -586,169.0,169 -587,150.0,150 -588,176.0,176 -589,157.0,157 -590,167.0,167 -591,168.0,168 -592,135.0,135 -593,157.0,157 -594,138.0,138 -595,139.0,139 -596,146.0,146 -597,121.0,121 -598,140.0,140 -599,124.0,124 -600,124.0,124 -601,115.0,115 -602,129.0,129 -603,107.0,107 -604,118.0,118 -605,108.0,108 -606,102.0,102 -607,105.0,105 -608,103.0,103 -609,96.0,96 -610,116.0,116 -611,51.0,51 -612,100.0,100 -613,121.0,121 -614,109.0,109 -615,85.0,85 -616,111.0,111 -617,91.0,91 -618,127.0,127 -619,117.0,117 -620,104.0,104 -621,119.0,119 -622,111.0,111 -623,132.0,132 -624,130.0,130 -625,140.0,140 -626,95.0,95 -627,106.0,106 -628,120.0,120 -629,111.0,111 -630,114.0,114 -631,126.0,126 -632,100.0,100 -633,111.0,111 -634,104.0,104 -635,103.0,103 -636,111.0,111 -637,110.0,110 -638,131.0,131 -639,90.0,90 -640,97.0,97 -641,104.0,104 -642,91.0,91 -643,97.0,97 -644,109.0,109 -645,112.0,112 -646,97.0,97 -647,32.0,32 -648,94.0,94 -649,107.0,107 -650,61.0,61 -651,97.0,97 -652,99.0,99 -653,76.0,76 
-654,38.0,38 -655,96.0,96 -656,96.0,96 -657,65.0,65 -658,45.0,45 -659,91.0,91 -660,78.0,78 -661,90.0,90 -662,92.0,92 -663,94.0,94 -664,101.0,101 -665,111.0,111 -666,109.0,109 -667,99.0,99 -668,115.0,115 -669,112.0,112 -670,113.0,113 -671,110.0,110 -672,108.0,108 -673,112.0,112 -674,125.0,125 -675,122.0,122 -676,114.0,114 -677,127.0,127 -678,125.0,125 -679,112.0,112 -680,111.0,111 -681,124.0,124 -682,113.0,113 -683,103.0,103 -684,119.0,119 -685,120.0,120 -686,95.0,95 -687,100.0,100 -688,29.0,29 -689,119.0,119 -690,107.0,107 -691,117.0,117 -692,78.0,78 -693,35.0,35 -694,101.0,101 -695,98.0,98 -696,94.0,94 -697,102.0,102 -698,90.0,90 -699,86.0,86 -700,81.0,81 -701,105.0,105 -702,72.0,72 -703,100.0,100 -704,96.0,96 -705,111.0,111 -706,27.0,27 -707,107.0,107 -708,87.0,87 -709,114.0,114 -710,111.0,111 -711,88.0,88 -712,112.0,112 -713,108.0,108 -714,108.0,108 -715,103.0,103 -716,120.0,120 -717,116.0,116 -718,112.0,112 -719,99.0,99 -720,118.0,118 -721,114.0,114 -722,104.0,104 -723,99.0,99 -724,102.0,102 -725,106.0,106 -726,31.0,31 -727,91.0,91 -728,32.0,32 -729,96.0,96 -730,20.0,20 -731,33.0,33 -732,23.0,23 -733,80.0,80 -734,35.0,35 -735,88.0,88 -736,28.0,28 -737,26.0,26 -738,70.0,70 -739,86.0,86 -740,28.0,28 -741,39.0,39 -742,65.0,65 -743,52.0,52 -744,43.0,43 -745,97.0,97 -746,27.0,27 -747,89.0,89 -748,34.0,34 -749,35.0,35 -750,28.0,28 -751,96.0,96 -752,97.0,97 -753,108.0,108 -754,45.0,45 -755,103.0,103 -756,97.0,97 -757,114.0,114 -758,103.0,103 -759,116.0,116 -760,127.0,127 -761,122.0,122 -762,112.0,112 -763,112.0,112 -764,120.0,120 -765,129.0,129 -766,127.0,127 -767,125.0,125 -768,124.0,124 -769,126.0,126 -770,129.0,129 -771,129.0,129 -772,43.0,43 -773,121.0,121 -774,40.0,40 -775,116.0,116 -776,117.0,117 -777,113.0,113 -778,117.0,117 -779,108.0,108 -780,108.0,108 -781,119.0,119 -782,109.0,109 -783,116.0,116 -784,114.0,114 -785,45.0,45 -786,116.0,116 -787,116.0,116 -788,110.0,110 -789,105.0,105 -790,110.0,110 -791,112.0,112 -792,104.0,104 -793,120.0,120 -794,110.0,110 
-795,113.0,113 -796,33.0,33 -797,111.0,111 -798,31.0,31 -799,139.0,139 -800,110.0,110 -801,124.0,124 -802,120.0,120 -803,112.0,112 -804,116.0,116 -805,105.0,105 -806,125.0,125 -807,103.0,103 -808,122.0,122 -809,109.0,109 -810,118.0,118 -811,124.0,124 -812,115.0,115 -813,26.0,26 -814,118.0,118 -815,118.0,118 -816,31.0,31 -817,99.0,99 -818,122.0,122 -819,102.0,102 -820,111.0,111 -821,110.0,110 -822,113.0,113 -823,117.0,117 -824,113.0,113 -825,109.0,109 -826,122.0,122 -827,117.0,117 -828,127.0,127 -829,113.0,113 -830,118.0,118 -831,107.0,107 -832,108.0,108 -833,103.0,103 -834,126.0,126 -835,131.0,131 -836,106.0,106 -837,116.0,116 -838,24.0,24 -839,107.0,107 -840,124.0,124 -841,125.0,125 -842,110.0,110 -843,112.0,112 -844,105.0,105 -845,104.0,104 -846,134.0,134 -847,107.0,107 -848,128.0,128 -849,113.0,113 -850,138.0,138 -851,118.0,118 -852,142.0,142 -853,118.0,118 -854,122.0,122 -855,130.0,130 -856,126.0,126 -857,111.0,111 -858,114.0,114 -859,128.0,128 -860,126.0,126 -861,143.0,143 -862,132.0,132 -863,123.0,123 -864,111.0,111 -865,129.0,129 -866,121.0,121 -867,114.0,114 -868,110.0,110 -869,118.0,118 -870,120.0,120 -871,109.0,109 -872,106.0,106 -873,118.0,118 -874,104.0,104 -875,98.0,98 -876,115.0,115 -877,34.0,34 -878,96.0,96 -879,108.0,108 -880,105.0,105 -881,33.0,33 -882,105.0,105 -883,111.0,111 -884,112.0,112 -885,101.0,101 -886,25.0,25 -887,35.0,35 -888,99.0,99 -889,105.0,105 -890,36.0,36 -891,92.0,92 -892,104.0,104 -893,111.0,111 -894,106.0,106 -895,109.0,109 -896,108.0,108 -897,101.0,101 -898,100.0,100 -899,33.0,33 -900,119.0,119 -901,112.0,112 -902,112.0,112 -903,126.0,126 -904,123.0,123 -905,125.0,125 -906,107.0,107 -907,128.0,128 -908,119.0,119 -909,142.0,142 -910,117.0,117 -911,125.0,125 -912,141.0,141 -913,134.0,134 -914,131.0,131 -915,131.0,131 -916,140.0,140 -917,115.0,115 -918,142.0,142 -919,142.0,142 -920,128.0,128 -921,139.0,139 -922,133.0,133 -923,129.0,129 -924,124.0,124 -925,131.0,131 -926,125.0,125 -927,146.0,146 -928,118.0,118 -929,126.0,126 
-930,134.0,134 -931,155.0,155 -932,134.0,134 -933,136.0,136 -934,146.0,146 -935,150.0,150 -936,167.0,167 -937,135.0,135 -938,197.0,197 -939,190.0,190 -940,170.0,170 -941,179.0,179 -942,192.0,192 -943,200.0,200 -944,200.0,200 -945,200.0,200 -946,200.0,200 -947,200.0,200 -948,200.0,200 -949,200.0,200 -950,200.0,200 -951,200.0,200 -952,200.0,200 -953,200.0,200 -954,200.0,200 -955,200.0,200 -956,200.0,200 -957,200.0,200 -958,200.0,200 -959,200.0,200 -960,200.0,200 -961,200.0,200 -962,200.0,200 -963,200.0,200 -964,200.0,200 -965,200.0,200 -966,200.0,200 -967,200.0,200 -968,200.0,200 -969,200.0,200 -970,200.0,200 -971,200.0,200 -972,200.0,200 -973,200.0,200 -974,200.0,200 -975,200.0,200 -976,200.0,200 -977,200.0,200 -978,200.0,200 -979,200.0,200 -980,200.0,200 -981,200.0,200 -982,200.0,200 -983,200.0,200 -984,200.0,200 -985,200.0,200 -986,200.0,200 -987,200.0,200 -988,200.0,200 -989,200.0,200 -990,200.0,200 -991,198.0,198 -992,200.0,200 -993,200.0,200 -994,200.0,200 -995,200.0,200 -996,200.0,200 -997,200.0,200 -998,200.0,200 -999,200.0,200 diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/config.yaml b/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/config.yaml deleted file mode 100644 index 49d9701..0000000 --- a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/config.yaml +++ /dev/null @@ -1,24 +0,0 @@ -general_cfg: - algo_name: A2C - device: cuda - env_name: CartPole-v1 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: false - load_path: tasks - max_steps: 200 - mode: train - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 1000 -algo_cfg: - actor_hidden_dim: 256 - actor_lr: 0.0003 - batch_size: 64 - buffer_size: 100000 - critic_hidden_dim: 256 - critic_lr: 0.001 - gamma: 0.99 - hidden_dim: 256 diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/logs/log.txt b/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/logs/log.txt deleted file mode 100644 index 18436c8..0000000 --- 
a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/logs/log.txt +++ /dev/null @@ -1,1086 +0,0 @@ -2022-10-31 23:21:38 - r - INFO: - n_states: 4, n_actions: 2 -2022-10-31 23:21:38 - r - INFO: - Actor model name: ActorSoftmaxTanh -2022-10-31 23:21:38 - r - INFO: - Critic model name: Critic -2022-10-31 23:21:38 - r - INFO: - ACMemory memory name: PGReplay -2022-10-31 23:21:38 - r - INFO: - agent name: A2C -2022-10-31 23:21:38 - r - INFO: - Start training! -2022-10-31 23:21:38 - r - INFO: - Env: CartPole-v1, Algorithm: A2C, Device: cuda -2022-10-31 23:21:40 - r - INFO: - Episode: 1/1000, Reward: 25.0, Step: 25 -2022-10-31 23:21:40 - r - INFO: - Episode: 2/1000, Reward: 11.0, Step: 11 -2022-10-31 23:21:41 - r - INFO: - Episode: 3/1000, Reward: 32.0, Step: 32 -2022-10-31 23:21:41 - r - INFO: - Episode: 4/1000, Reward: 11.0, Step: 11 -2022-10-31 23:21:41 - r - INFO: - Episode: 5/1000, Reward: 14.0, Step: 14 -2022-10-31 23:21:41 - r - INFO: - Current episode 5 has the best eval reward: 19.90 -2022-10-31 23:21:41 - r - INFO: - Episode: 6/1000, Reward: 11.0, Step: 11 -2022-10-31 23:21:41 - r - INFO: - Episode: 7/1000, Reward: 23.0, Step: 23 -2022-10-31 23:21:41 - r - INFO: - Episode: 8/1000, Reward: 27.0, Step: 27 -2022-10-31 23:21:41 - r - INFO: - Episode: 9/1000, Reward: 10.0, Step: 10 -2022-10-31 23:21:41 - r - INFO: - Episode: 10/1000, Reward: 21.0, Step: 21 -2022-10-31 23:21:41 - r - INFO: - Episode: 11/1000, Reward: 15.0, Step: 15 -2022-10-31 23:21:41 - r - INFO: - Episode: 12/1000, Reward: 26.0, Step: 26 -2022-10-31 23:21:41 - r - INFO: - Episode: 13/1000, Reward: 22.0, Step: 22 -2022-10-31 23:21:41 - r - INFO: - Episode: 14/1000, Reward: 14.0, Step: 14 -2022-10-31 23:21:41 - r - INFO: - Episode: 15/1000, Reward: 14.0, Step: 14 -2022-10-31 23:21:41 - r - INFO: - Episode: 16/1000, Reward: 21.0, Step: 21 -2022-10-31 23:21:41 - r - INFO: - Episode: 17/1000, Reward: 10.0, Step: 10 -2022-10-31 23:21:42 - r - INFO: - Episode: 18/1000, Reward: 19.0, Step: 19 
-2022-10-31 23:21:42 - r - INFO: - Episode: 19/1000, Reward: 18.0, Step: 18 -2022-10-31 23:21:42 - r - INFO: - Episode: 20/1000, Reward: 26.0, Step: 26 -2022-10-31 23:21:42 - r - INFO: - Current episode 20 has the best eval reward: 21.50 -2022-10-31 23:21:42 - r - INFO: - Episode: 21/1000, Reward: 29.0, Step: 29 -2022-10-31 23:21:42 - r - INFO: - Episode: 22/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:42 - r - INFO: - Episode: 23/1000, Reward: 35.0, Step: 35 -2022-10-31 23:21:42 - r - INFO: - Episode: 24/1000, Reward: 33.0, Step: 33 -2022-10-31 23:21:42 - r - INFO: - Episode: 25/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:43 - r - INFO: - Current episode 25 has the best eval reward: 31.90 -2022-10-31 23:21:43 - r - INFO: - Episode: 26/1000, Reward: 50.0, Step: 50 -2022-10-31 23:21:43 - r - INFO: - Episode: 27/1000, Reward: 21.0, Step: 21 -2022-10-31 23:21:43 - r - INFO: - Episode: 28/1000, Reward: 30.0, Step: 30 -2022-10-31 23:21:43 - r - INFO: - Episode: 29/1000, Reward: 26.0, Step: 26 -2022-10-31 23:21:43 - r - INFO: - Episode: 30/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:43 - r - INFO: - Current episode 30 has the best eval reward: 56.70 -2022-10-31 23:21:43 - r - INFO: - Episode: 31/1000, Reward: 31.0, Step: 31 -2022-10-31 23:21:43 - r - INFO: - Episode: 32/1000, Reward: 54.0, Step: 54 -2022-10-31 23:21:43 - r - INFO: - Episode: 33/1000, Reward: 59.0, Step: 59 -2022-10-31 23:21:44 - r - INFO: - Episode: 34/1000, Reward: 50.0, Step: 50 -2022-10-31 23:21:44 - r - INFO: - Episode: 35/1000, Reward: 26.0, Step: 26 -2022-10-31 23:21:44 - r - INFO: - Episode: 36/1000, Reward: 34.0, Step: 34 -2022-10-31 23:21:44 - r - INFO: - Episode: 37/1000, Reward: 25.0, Step: 25 -2022-10-31 23:21:44 - r - INFO: - Episode: 38/1000, Reward: 166.0, Step: 166 -2022-10-31 23:21:44 - r - INFO: - Episode: 39/1000, Reward: 35.0, Step: 35 -2022-10-31 23:21:44 - r - INFO: - Episode: 40/1000, Reward: 25.0, Step: 25 -2022-10-31 23:21:45 - r - INFO: - Episode: 41/1000, Reward: 110.0, 
Step: 110 -2022-10-31 23:21:45 - r - INFO: - Episode: 42/1000, Reward: 22.0, Step: 22 -2022-10-31 23:21:45 - r - INFO: - Episode: 43/1000, Reward: 57.0, Step: 57 -2022-10-31 23:21:45 - r - INFO: - Episode: 44/1000, Reward: 45.0, Step: 45 -2022-10-31 23:21:45 - r - INFO: - Episode: 45/1000, Reward: 35.0, Step: 35 -2022-10-31 23:21:45 - r - INFO: - Episode: 46/1000, Reward: 45.0, Step: 45 -2022-10-31 23:21:45 - r - INFO: - Episode: 47/1000, Reward: 51.0, Step: 51 -2022-10-31 23:21:46 - r - INFO: - Episode: 48/1000, Reward: 32.0, Step: 32 -2022-10-31 23:21:46 - r - INFO: - Episode: 49/1000, Reward: 67.0, Step: 67 -2022-10-31 23:21:46 - r - INFO: - Episode: 50/1000, Reward: 46.0, Step: 46 -2022-10-31 23:21:46 - r - INFO: - Episode: 51/1000, Reward: 61.0, Step: 61 -2022-10-31 23:21:46 - r - INFO: - Episode: 52/1000, Reward: 49.0, Step: 49 -2022-10-31 23:21:46 - r - INFO: - Episode: 53/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:46 - r - INFO: - Episode: 54/1000, Reward: 37.0, Step: 37 -2022-10-31 23:21:46 - r - INFO: - Episode: 55/1000, Reward: 32.0, Step: 32 -2022-10-31 23:21:47 - r - INFO: - Current episode 55 has the best eval reward: 85.50 -2022-10-31 23:21:47 - r - INFO: - Episode: 56/1000, Reward: 31.0, Step: 31 -2022-10-31 23:21:47 - r - INFO: - Episode: 57/1000, Reward: 33.0, Step: 33 -2022-10-31 23:21:47 - r - INFO: - Episode: 58/1000, Reward: 93.0, Step: 93 -2022-10-31 23:21:47 - r - INFO: - Episode: 59/1000, Reward: 60.0, Step: 60 -2022-10-31 23:21:48 - r - INFO: - Episode: 60/1000, Reward: 128.0, Step: 128 -2022-10-31 23:21:48 - r - INFO: - Episode: 61/1000, Reward: 200.0, Step: 200 -2022-10-31 23:21:48 - r - INFO: - Episode: 62/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:48 - r - INFO: - Episode: 63/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:49 - r - INFO: - Episode: 64/1000, Reward: 63.0, Step: 63 -2022-10-31 23:21:49 - r - INFO: - Episode: 65/1000, Reward: 68.0, Step: 68 -2022-10-31 23:21:49 - r - INFO: - Episode: 66/1000, Reward: 45.0, Step: 45 
-2022-10-31 23:21:49 - r - INFO: - Episode: 67/1000, Reward: 101.0, Step: 101 -2022-10-31 23:21:49 - r - INFO: - Episode: 68/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:49 - r - INFO: - Episode: 69/1000, Reward: 49.0, Step: 49 -2022-10-31 23:21:50 - r - INFO: - Episode: 70/1000, Reward: 54.0, Step: 54 -2022-10-31 23:21:50 - r - INFO: - Episode: 71/1000, Reward: 42.0, Step: 42 -2022-10-31 23:21:50 - r - INFO: - Episode: 72/1000, Reward: 77.0, Step: 77 -2022-10-31 23:21:50 - r - INFO: - Episode: 73/1000, Reward: 67.0, Step: 67 -2022-10-31 23:21:50 - r - INFO: - Episode: 74/1000, Reward: 41.0, Step: 41 -2022-10-31 23:21:51 - r - INFO: - Episode: 75/1000, Reward: 89.0, Step: 89 -2022-10-31 23:21:51 - r - INFO: - Episode: 76/1000, Reward: 51.0, Step: 51 -2022-10-31 23:21:51 - r - INFO: - Episode: 77/1000, Reward: 54.0, Step: 54 -2022-10-31 23:21:51 - r - INFO: - Episode: 78/1000, Reward: 37.0, Step: 37 -2022-10-31 23:21:51 - r - INFO: - Episode: 79/1000, Reward: 49.0, Step: 49 -2022-10-31 23:21:51 - r - INFO: - Episode: 80/1000, Reward: 46.0, Step: 46 -2022-10-31 23:21:52 - r - INFO: - Episode: 81/1000, Reward: 31.0, Step: 31 -2022-10-31 23:21:52 - r - INFO: - Episode: 82/1000, Reward: 43.0, Step: 43 -2022-10-31 23:21:52 - r - INFO: - Episode: 83/1000, Reward: 60.0, Step: 60 -2022-10-31 23:21:52 - r - INFO: - Episode: 84/1000, Reward: 41.0, Step: 41 -2022-10-31 23:21:52 - r - INFO: - Episode: 85/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:52 - r - INFO: - Episode: 86/1000, Reward: 28.0, Step: 28 -2022-10-31 23:21:52 - r - INFO: - Episode: 87/1000, Reward: 50.0, Step: 50 -2022-10-31 23:21:53 - r - INFO: - Episode: 88/1000, Reward: 159.0, Step: 159 -2022-10-31 23:21:53 - r - INFO: - Episode: 89/1000, Reward: 30.0, Step: 30 -2022-10-31 23:21:53 - r - INFO: - Episode: 90/1000, Reward: 34.0, Step: 34 -2022-10-31 23:21:53 - r - INFO: - Episode: 91/1000, Reward: 70.0, Step: 70 -2022-10-31 23:21:53 - r - INFO: - Episode: 92/1000, Reward: 22.0, Step: 22 -2022-10-31 
23:21:53 - r - INFO: - Episode: 93/1000, Reward: 39.0, Step: 39 -2022-10-31 23:21:53 - r - INFO: - Episode: 94/1000, Reward: 50.0, Step: 50 -2022-10-31 23:21:53 - r - INFO: - Episode: 95/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:54 - r - INFO: - Episode: 96/1000, Reward: 37.0, Step: 37 -2022-10-31 23:21:54 - r - INFO: - Episode: 97/1000, Reward: 121.0, Step: 121 -2022-10-31 23:21:54 - r - INFO: - Episode: 98/1000, Reward: 26.0, Step: 26 -2022-10-31 23:21:54 - r - INFO: - Episode: 99/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:54 - r - INFO: - Episode: 100/1000, Reward: 30.0, Step: 30 -2022-10-31 23:21:55 - r - INFO: - Episode: 101/1000, Reward: 35.0, Step: 35 -2022-10-31 23:21:55 - r - INFO: - Episode: 102/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:55 - r - INFO: - Episode: 103/1000, Reward: 28.0, Step: 28 -2022-10-31 23:21:55 - r - INFO: - Episode: 104/1000, Reward: 29.0, Step: 29 -2022-10-31 23:21:55 - r - INFO: - Episode: 105/1000, Reward: 42.0, Step: 42 -2022-10-31 23:21:55 - r - INFO: - Episode: 106/1000, Reward: 54.0, Step: 54 -2022-10-31 23:21:55 - r - INFO: - Episode: 107/1000, Reward: 25.0, Step: 25 -2022-10-31 23:21:55 - r - INFO: - Episode: 108/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:55 - r - INFO: - Episode: 109/1000, Reward: 32.0, Step: 32 -2022-10-31 23:21:55 - r - INFO: - Episode: 110/1000, Reward: 50.0, Step: 50 -2022-10-31 23:21:56 - r - INFO: - Episode: 111/1000, Reward: 30.0, Step: 30 -2022-10-31 23:21:56 - r - INFO: - Episode: 112/1000, Reward: 58.0, Step: 58 -2022-10-31 23:21:56 - r - INFO: - Episode: 113/1000, Reward: 32.0, Step: 32 -2022-10-31 23:21:56 - r - INFO: - Episode: 114/1000, Reward: 43.0, Step: 43 -2022-10-31 23:21:56 - r - INFO: - Episode: 115/1000, Reward: 57.0, Step: 57 -2022-10-31 23:21:56 - r - INFO: - Episode: 116/1000, Reward: 20.0, Step: 20 -2022-10-31 23:21:57 - r - INFO: - Episode: 117/1000, Reward: 48.0, Step: 48 -2022-10-31 23:21:57 - r - INFO: - Episode: 118/1000, Reward: 45.0, Step: 45 -2022-10-31 
23:21:57 - r - INFO: - Episode: 119/1000, Reward: 47.0, Step: 47 -2022-10-31 23:21:57 - r - INFO: - Episode: 120/1000, Reward: 69.0, Step: 69 -2022-10-31 23:21:57 - r - INFO: - Episode: 121/1000, Reward: 34.0, Step: 34 -2022-10-31 23:21:57 - r - INFO: - Episode: 122/1000, Reward: 22.0, Step: 22 -2022-10-31 23:21:57 - r - INFO: - Episode: 123/1000, Reward: 22.0, Step: 22 -2022-10-31 23:21:57 - r - INFO: - Episode: 124/1000, Reward: 38.0, Step: 38 -2022-10-31 23:21:57 - r - INFO: - Episode: 125/1000, Reward: 36.0, Step: 36 -2022-10-31 23:21:58 - r - INFO: - Episode: 126/1000, Reward: 41.0, Step: 41 -2022-10-31 23:21:58 - r - INFO: - Episode: 127/1000, Reward: 28.0, Step: 28 -2022-10-31 23:21:58 - r - INFO: - Episode: 128/1000, Reward: 35.0, Step: 35 -2022-10-31 23:21:58 - r - INFO: - Episode: 129/1000, Reward: 48.0, Step: 48 -2022-10-31 23:21:58 - r - INFO: - Episode: 130/1000, Reward: 51.0, Step: 51 -2022-10-31 23:21:58 - r - INFO: - Episode: 131/1000, Reward: 51.0, Step: 51 -2022-10-31 23:21:58 - r - INFO: - Episode: 132/1000, Reward: 36.0, Step: 36 -2022-10-31 23:21:59 - r - INFO: - Episode: 133/1000, Reward: 45.0, Step: 45 -2022-10-31 23:21:59 - r - INFO: - Episode: 134/1000, Reward: 27.0, Step: 27 -2022-10-31 23:21:59 - r - INFO: - Episode: 135/1000, Reward: 40.0, Step: 40 -2022-10-31 23:21:59 - r - INFO: - Episode: 136/1000, Reward: 43.0, Step: 43 -2022-10-31 23:21:59 - r - INFO: - Episode: 137/1000, Reward: 64.0, Step: 64 -2022-10-31 23:21:59 - r - INFO: - Episode: 138/1000, Reward: 43.0, Step: 43 -2022-10-31 23:21:59 - r - INFO: - Episode: 139/1000, Reward: 37.0, Step: 37 -2022-10-31 23:21:59 - r - INFO: - Episode: 140/1000, Reward: 38.0, Step: 38 -2022-10-31 23:22:00 - r - INFO: - Episode: 141/1000, Reward: 69.0, Step: 69 -2022-10-31 23:22:00 - r - INFO: - Episode: 142/1000, Reward: 36.0, Step: 36 -2022-10-31 23:22:00 - r - INFO: - Episode: 143/1000, Reward: 28.0, Step: 28 -2022-10-31 23:22:00 - r - INFO: - Episode: 144/1000, Reward: 58.0, Step: 58 
-2022-10-31 23:22:00 - r - INFO: - Episode: 145/1000, Reward: 43.0, Step: 43 -2022-10-31 23:22:00 - r - INFO: - Episode: 146/1000, Reward: 50.0, Step: 50 -2022-10-31 23:22:01 - r - INFO: - Episode: 147/1000, Reward: 30.0, Step: 30 -2022-10-31 23:22:01 - r - INFO: - Episode: 148/1000, Reward: 42.0, Step: 42 -2022-10-31 23:22:01 - r - INFO: - Episode: 149/1000, Reward: 42.0, Step: 42 -2022-10-31 23:22:01 - r - INFO: - Episode: 150/1000, Reward: 35.0, Step: 35 -2022-10-31 23:22:01 - r - INFO: - Episode: 151/1000, Reward: 67.0, Step: 67 -2022-10-31 23:22:01 - r - INFO: - Episode: 152/1000, Reward: 45.0, Step: 45 -2022-10-31 23:22:01 - r - INFO: - Episode: 153/1000, Reward: 28.0, Step: 28 -2022-10-31 23:22:01 - r - INFO: - Episode: 154/1000, Reward: 59.0, Step: 59 -2022-10-31 23:22:02 - r - INFO: - Episode: 155/1000, Reward: 64.0, Step: 64 -2022-10-31 23:22:02 - r - INFO: - Episode: 156/1000, Reward: 67.0, Step: 67 -2022-10-31 23:22:02 - r - INFO: - Episode: 157/1000, Reward: 41.0, Step: 41 -2022-10-31 23:22:02 - r - INFO: - Episode: 158/1000, Reward: 81.0, Step: 81 -2022-10-31 23:22:02 - r - INFO: - Episode: 159/1000, Reward: 76.0, Step: 76 -2022-10-31 23:22:02 - r - INFO: - Episode: 160/1000, Reward: 91.0, Step: 91 -2022-10-31 23:22:03 - r - INFO: - Episode: 161/1000, Reward: 119.0, Step: 119 -2022-10-31 23:22:03 - r - INFO: - Episode: 162/1000, Reward: 47.0, Step: 47 -2022-10-31 23:22:03 - r - INFO: - Episode: 163/1000, Reward: 64.0, Step: 64 -2022-10-31 23:22:03 - r - INFO: - Episode: 164/1000, Reward: 178.0, Step: 178 -2022-10-31 23:22:04 - r - INFO: - Episode: 165/1000, Reward: 97.0, Step: 97 -2022-10-31 23:22:04 - r - INFO: - Current episode 165 has the best eval reward: 104.10 -2022-10-31 23:22:04 - r - INFO: - Episode: 166/1000, Reward: 181.0, Step: 181 -2022-10-31 23:22:05 - r - INFO: - Episode: 167/1000, Reward: 166.0, Step: 166 -2022-10-31 23:22:05 - r - INFO: - Episode: 168/1000, Reward: 79.0, Step: 79 -2022-10-31 23:22:05 - r - INFO: - Episode: 169/1000, 
Reward: 141.0, Step: 141 -2022-10-31 23:22:06 - r - INFO: - Episode: 170/1000, Reward: 119.0, Step: 119 -2022-10-31 23:22:06 - r - INFO: - Current episode 170 has the best eval reward: 119.50 -2022-10-31 23:22:06 - r - INFO: - Episode: 171/1000, Reward: 81.0, Step: 81 -2022-10-31 23:22:06 - r - INFO: - Episode: 172/1000, Reward: 124.0, Step: 124 -2022-10-31 23:22:07 - r - INFO: - Episode: 173/1000, Reward: 150.0, Step: 150 -2022-10-31 23:22:07 - r - INFO: - Episode: 174/1000, Reward: 98.0, Step: 98 -2022-10-31 23:22:07 - r - INFO: - Episode: 175/1000, Reward: 164.0, Step: 164 -2022-10-31 23:22:08 - r - INFO: - Current episode 175 has the best eval reward: 132.00 -2022-10-31 23:22:08 - r - INFO: - Episode: 176/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:09 - r - INFO: - Episode: 177/1000, Reward: 115.0, Step: 115 -2022-10-31 23:22:09 - r - INFO: - Episode: 178/1000, Reward: 116.0, Step: 116 -2022-10-31 23:22:09 - r - INFO: - Episode: 179/1000, Reward: 160.0, Step: 160 -2022-10-31 23:22:09 - r - INFO: - Episode: 180/1000, Reward: 103.0, Step: 103 -2022-10-31 23:22:10 - r - INFO: - Current episode 180 has the best eval reward: 134.00 -2022-10-31 23:22:10 - r - INFO: - Episode: 181/1000, Reward: 181.0, Step: 181 -2022-10-31 23:22:11 - r - INFO: - Episode: 182/1000, Reward: 185.0, Step: 185 -2022-10-31 23:22:11 - r - INFO: - Episode: 183/1000, Reward: 93.0, Step: 93 -2022-10-31 23:22:11 - r - INFO: - Episode: 184/1000, Reward: 110.0, Step: 110 -2022-10-31 23:22:12 - r - INFO: - Episode: 185/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:12 - r - INFO: - Current episode 185 has the best eval reward: 155.50 -2022-10-31 23:22:13 - r - INFO: - Episode: 186/1000, Reward: 141.0, Step: 141 -2022-10-31 23:22:13 - r - INFO: - Episode: 187/1000, Reward: 150.0, Step: 150 -2022-10-31 23:22:13 - r - INFO: - Episode: 188/1000, Reward: 121.0, Step: 121 -2022-10-31 23:22:13 - r - INFO: - Episode: 189/1000, Reward: 110.0, Step: 110 -2022-10-31 23:22:14 - r - INFO: - Episode: 
190/1000, Reward: 115.0, Step: 115 -2022-10-31 23:22:14 - r - INFO: - Episode: 191/1000, Reward: 114.0, Step: 114 -2022-10-31 23:22:14 - r - INFO: - Episode: 192/1000, Reward: 45.0, Step: 45 -2022-10-31 23:22:15 - r - INFO: - Episode: 193/1000, Reward: 125.0, Step: 125 -2022-10-31 23:22:15 - r - INFO: - Episode: 194/1000, Reward: 142.0, Step: 142 -2022-10-31 23:22:15 - r - INFO: - Episode: 195/1000, Reward: 54.0, Step: 54 -2022-10-31 23:22:16 - r - INFO: - Episode: 196/1000, Reward: 62.0, Step: 62 -2022-10-31 23:22:16 - r - INFO: - Episode: 197/1000, Reward: 122.0, Step: 122 -2022-10-31 23:22:16 - r - INFO: - Episode: 198/1000, Reward: 58.0, Step: 58 -2022-10-31 23:22:16 - r - INFO: - Episode: 199/1000, Reward: 88.0, Step: 88 -2022-10-31 23:22:16 - r - INFO: - Episode: 200/1000, Reward: 141.0, Step: 141 -2022-10-31 23:22:17 - r - INFO: - Episode: 201/1000, Reward: 113.0, Step: 113 -2022-10-31 23:22:18 - r - INFO: - Episode: 202/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:18 - r - INFO: - Episode: 203/1000, Reward: 136.0, Step: 136 -2022-10-31 23:22:18 - r - INFO: - Episode: 204/1000, Reward: 114.0, Step: 114 -2022-10-31 23:22:18 - r - INFO: - Episode: 205/1000, Reward: 102.0, Step: 102 -2022-10-31 23:22:19 - r - INFO: - Episode: 206/1000, Reward: 176.0, Step: 176 -2022-10-31 23:22:20 - r - INFO: - Episode: 207/1000, Reward: 150.0, Step: 150 -2022-10-31 23:22:20 - r - INFO: - Episode: 208/1000, Reward: 105.0, Step: 105 -2022-10-31 23:22:20 - r - INFO: - Episode: 209/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:21 - r - INFO: - Episode: 210/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:21 - r - INFO: - Episode: 211/1000, Reward: 167.0, Step: 167 -2022-10-31 23:22:22 - r - INFO: - Episode: 212/1000, Reward: 104.0, Step: 104 -2022-10-31 23:22:22 - r - INFO: - Episode: 213/1000, Reward: 124.0, Step: 124 -2022-10-31 23:22:22 - r - INFO: - Episode: 214/1000, Reward: 96.0, Step: 96 -2022-10-31 23:22:23 - r - INFO: - Episode: 215/1000, Reward: 200.0, Step: 200 
-2022-10-31 23:22:24 - r - INFO: - Episode: 216/1000, Reward: 199.0, Step: 199 -2022-10-31 23:22:24 - r - INFO: - Episode: 217/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:24 - r - INFO: - Episode: 218/1000, Reward: 132.0, Step: 132 -2022-10-31 23:22:25 - r - INFO: - Episode: 219/1000, Reward: 188.0, Step: 188 -2022-10-31 23:22:25 - r - INFO: - Episode: 220/1000, Reward: 132.0, Step: 132 -2022-10-31 23:22:26 - r - INFO: - Episode: 221/1000, Reward: 151.0, Step: 151 -2022-10-31 23:22:26 - r - INFO: - Episode: 222/1000, Reward: 125.0, Step: 125 -2022-10-31 23:22:26 - r - INFO: - Episode: 223/1000, Reward: 42.0, Step: 42 -2022-10-31 23:22:27 - r - INFO: - Episode: 224/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:27 - r - INFO: - Episode: 225/1000, Reward: 159.0, Step: 159 -2022-10-31 23:22:28 - r - INFO: - Episode: 226/1000, Reward: 171.0, Step: 171 -2022-10-31 23:22:28 - r - INFO: - Episode: 227/1000, Reward: 122.0, Step: 122 -2022-10-31 23:22:29 - r - INFO: - Episode: 228/1000, Reward: 189.0, Step: 189 -2022-10-31 23:22:29 - r - INFO: - Episode: 229/1000, Reward: 129.0, Step: 129 -2022-10-31 23:22:29 - r - INFO: - Episode: 230/1000, Reward: 106.0, Step: 106 -2022-10-31 23:22:30 - r - INFO: - Episode: 231/1000, Reward: 107.0, Step: 107 -2022-10-31 23:22:30 - r - INFO: - Episode: 232/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:31 - r - INFO: - Episode: 233/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:31 - r - INFO: - Episode: 234/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:32 - r - INFO: - Episode: 235/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:32 - r - INFO: - Current episode 235 has the best eval reward: 169.70 -2022-10-31 23:22:33 - r - INFO: - Episode: 236/1000, Reward: 158.0, Step: 158 -2022-10-31 23:22:33 - r - INFO: - Episode: 237/1000, Reward: 200.0, Step: 200 -2022-10-31 23:22:33 - r - INFO: - Episode: 238/1000, Reward: 192.0, Step: 192 -2022-10-31 23:22:34 - r - INFO: - Episode: 239/1000, Reward: 179.0, Step: 179 -2022-10-31 
23:22:34 - r - INFO: - Episode: 240/1000, Reward: 102.0, Step: 102 -2022-10-31 23:22:35 - r - INFO: - Episode: 241/1000, Reward: 125.0, Step: 125 -2022-10-31 23:22:35 - r - INFO: - Episode: 242/1000, Reward: 138.0, Step: 138 -2022-10-31 23:22:36 - r - INFO: - Episode: 243/1000, Reward: 189.0, Step: 189 -2022-10-31 23:22:36 - r - INFO: - Episode: 244/1000, Reward: 41.0, Step: 41 -2022-10-31 23:22:36 - r - INFO: - Episode: 245/1000, Reward: 97.0, Step: 97 -2022-10-31 23:22:36 - r - INFO: - Episode: 246/1000, Reward: 49.0, Step: 49 -2022-10-31 23:22:37 - r - INFO: - Episode: 247/1000, Reward: 86.0, Step: 86 -2022-10-31 23:22:37 - r - INFO: - Episode: 248/1000, Reward: 121.0, Step: 121 -2022-10-31 23:22:37 - r - INFO: - Episode: 249/1000, Reward: 117.0, Step: 117 -2022-10-31 23:22:37 - r - INFO: - Episode: 250/1000, Reward: 43.0, Step: 43 -2022-10-31 23:22:38 - r - INFO: - Episode: 251/1000, Reward: 72.0, Step: 72 -2022-10-31 23:22:38 - r - INFO: - Episode: 252/1000, Reward: 34.0, Step: 34 -2022-10-31 23:22:38 - r - INFO: - Episode: 253/1000, Reward: 83.0, Step: 83 -2022-10-31 23:22:38 - r - INFO: - Episode: 254/1000, Reward: 83.0, Step: 83 -2022-10-31 23:22:38 - r - INFO: - Episode: 255/1000, Reward: 38.0, Step: 38 -2022-10-31 23:22:38 - r - INFO: - Episode: 256/1000, Reward: 34.0, Step: 34 -2022-10-31 23:22:39 - r - INFO: - Episode: 257/1000, Reward: 99.0, Step: 99 -2022-10-31 23:22:39 - r - INFO: - Episode: 258/1000, Reward: 45.0, Step: 45 -2022-10-31 23:22:39 - r - INFO: - Episode: 259/1000, Reward: 47.0, Step: 47 -2022-10-31 23:22:39 - r - INFO: - Episode: 260/1000, Reward: 44.0, Step: 44 -2022-10-31 23:22:39 - r - INFO: - Episode: 261/1000, Reward: 26.0, Step: 26 -2022-10-31 23:22:39 - r - INFO: - Episode: 262/1000, Reward: 37.0, Step: 37 -2022-10-31 23:22:39 - r - INFO: - Episode: 263/1000, Reward: 26.0, Step: 26 -2022-10-31 23:22:39 - r - INFO: - Episode: 264/1000, Reward: 43.0, Step: 43 -2022-10-31 23:22:40 - r - INFO: - Episode: 265/1000, Reward: 27.0, Step: 
27 -2022-10-31 23:22:40 - r - INFO: - Episode: 266/1000, Reward: 24.0, Step: 24 -2022-10-31 23:22:40 - r - INFO: - Episode: 267/1000, Reward: 42.0, Step: 42 -2022-10-31 23:22:40 - r - INFO: - Episode: 268/1000, Reward: 86.0, Step: 86 -2022-10-31 23:22:40 - r - INFO: - Episode: 269/1000, Reward: 23.0, Step: 23 -2022-10-31 23:22:40 - r - INFO: - Episode: 270/1000, Reward: 32.0, Step: 32 -2022-10-31 23:22:40 - r - INFO: - Episode: 271/1000, Reward: 57.0, Step: 57 -2022-10-31 23:22:40 - r - INFO: - Episode: 272/1000, Reward: 25.0, Step: 25 -2022-10-31 23:22:41 - r - INFO: - Episode: 273/1000, Reward: 98.0, Step: 98 -2022-10-31 23:22:41 - r - INFO: - Episode: 274/1000, Reward: 29.0, Step: 29 -2022-10-31 23:22:41 - r - INFO: - Episode: 275/1000, Reward: 25.0, Step: 25 -2022-10-31 23:22:41 - r - INFO: - Episode: 276/1000, Reward: 29.0, Step: 29 -2022-10-31 23:22:41 - r - INFO: - Episode: 277/1000, Reward: 39.0, Step: 39 -2022-10-31 23:22:41 - r - INFO: - Episode: 278/1000, Reward: 20.0, Step: 20 -2022-10-31 23:22:41 - r - INFO: - Episode: 279/1000, Reward: 92.0, Step: 92 -2022-10-31 23:22:41 - r - INFO: - Episode: 280/1000, Reward: 28.0, Step: 28 -2022-10-31 23:22:42 - r - INFO: - Episode: 281/1000, Reward: 78.0, Step: 78 -2022-10-31 23:22:42 - r - INFO: - Episode: 282/1000, Reward: 25.0, Step: 25 -2022-10-31 23:22:42 - r - INFO: - Episode: 283/1000, Reward: 31.0, Step: 31 -2022-10-31 23:22:42 - r - INFO: - Episode: 284/1000, Reward: 88.0, Step: 88 -2022-10-31 23:22:42 - r - INFO: - Episode: 285/1000, Reward: 85.0, Step: 85 -2022-10-31 23:22:43 - r - INFO: - Episode: 286/1000, Reward: 37.0, Step: 37 -2022-10-31 23:22:43 - r - INFO: - Episode: 287/1000, Reward: 26.0, Step: 26 -2022-10-31 23:22:43 - r - INFO: - Episode: 288/1000, Reward: 19.0, Step: 19 -2022-10-31 23:22:43 - r - INFO: - Episode: 289/1000, Reward: 40.0, Step: 40 -2022-10-31 23:22:43 - r - INFO: - Episode: 290/1000, Reward: 27.0, Step: 27 -2022-10-31 23:22:43 - r - INFO: - Episode: 291/1000, Reward: 17.0, 
Step: 17 -2022-10-31 23:22:43 - r - INFO: - Episode: 292/1000, Reward: 27.0, Step: 27 -2022-10-31 23:22:43 - r - INFO: - Episode: 293/1000, Reward: 26.0, Step: 26 -2022-10-31 23:22:43 - r - INFO: - Episode: 294/1000, Reward: 82.0, Step: 82 -2022-10-31 23:22:43 - r - INFO: - Episode: 295/1000, Reward: 36.0, Step: 36 -2022-10-31 23:22:44 - r - INFO: - Episode: 296/1000, Reward: 24.0, Step: 24 -2022-10-31 23:22:44 - r - INFO: - Episode: 297/1000, Reward: 30.0, Step: 30 -2022-10-31 23:22:44 - r - INFO: - Episode: 298/1000, Reward: 20.0, Step: 20 -2022-10-31 23:22:44 - r - INFO: - Episode: 299/1000, Reward: 34.0, Step: 34 -2022-10-31 23:22:44 - r - INFO: - Episode: 300/1000, Reward: 30.0, Step: 30 -2022-10-31 23:22:44 - r - INFO: - Episode: 301/1000, Reward: 23.0, Step: 23 -2022-10-31 23:22:44 - r - INFO: - Episode: 302/1000, Reward: 36.0, Step: 36 -2022-10-31 23:22:44 - r - INFO: - Episode: 303/1000, Reward: 29.0, Step: 29 -2022-10-31 23:22:44 - r - INFO: - Episode: 304/1000, Reward: 34.0, Step: 34 -2022-10-31 23:22:44 - r - INFO: - Episode: 305/1000, Reward: 25.0, Step: 25 -2022-10-31 23:22:45 - r - INFO: - Episode: 306/1000, Reward: 42.0, Step: 42 -2022-10-31 23:22:45 - r - INFO: - Episode: 307/1000, Reward: 88.0, Step: 88 -2022-10-31 23:22:45 - r - INFO: - Episode: 308/1000, Reward: 26.0, Step: 26 -2022-10-31 23:22:45 - r - INFO: - Episode: 309/1000, Reward: 85.0, Step: 85 -2022-10-31 23:22:45 - r - INFO: - Episode: 310/1000, Reward: 89.0, Step: 89 -2022-10-31 23:22:46 - r - INFO: - Episode: 311/1000, Reward: 48.0, Step: 48 -2022-10-31 23:22:46 - r - INFO: - Episode: 312/1000, Reward: 83.0, Step: 83 -2022-10-31 23:22:46 - r - INFO: - Episode: 313/1000, Reward: 109.0, Step: 109 -2022-10-31 23:22:46 - r - INFO: - Episode: 314/1000, Reward: 42.0, Step: 42 -2022-10-31 23:22:46 - r - INFO: - Episode: 315/1000, Reward: 93.0, Step: 93 -2022-10-31 23:22:47 - r - INFO: - Episode: 316/1000, Reward: 85.0, Step: 85 -2022-10-31 23:22:47 - r - INFO: - Episode: 317/1000, Reward: 
100.0, Step: 100 -2022-10-31 23:22:47 - r - INFO: - Episode: 318/1000, Reward: 106.0, Step: 106 -2022-10-31 23:22:47 - r - INFO: - Episode: 319/1000, Reward: 28.0, Step: 28 -2022-10-31 23:22:48 - r - INFO: - Episode: 320/1000, Reward: 108.0, Step: 108 -2022-10-31 23:22:48 - r - INFO: - Episode: 321/1000, Reward: 112.0, Step: 112 -2022-10-31 23:22:48 - r - INFO: - Episode: 322/1000, Reward: 88.0, Step: 88 -2022-10-31 23:22:49 - r - INFO: - Episode: 323/1000, Reward: 108.0, Step: 108 -2022-10-31 23:22:49 - r - INFO: - Episode: 324/1000, Reward: 108.0, Step: 108 -2022-10-31 23:22:49 - r - INFO: - Episode: 325/1000, Reward: 90.0, Step: 90 -2022-10-31 23:22:50 - r - INFO: - Episode: 326/1000, Reward: 112.0, Step: 112 -2022-10-31 23:22:50 - r - INFO: - Episode: 327/1000, Reward: 113.0, Step: 113 -2022-10-31 23:22:50 - r - INFO: - Episode: 328/1000, Reward: 94.0, Step: 94 -2022-10-31 23:22:50 - r - INFO: - Episode: 329/1000, Reward: 99.0, Step: 99 -2022-10-31 23:22:51 - r - INFO: - Episode: 330/1000, Reward: 45.0, Step: 45 -2022-10-31 23:22:51 - r - INFO: - Episode: 331/1000, Reward: 121.0, Step: 121 -2022-10-31 23:22:51 - r - INFO: - Episode: 332/1000, Reward: 102.0, Step: 102 -2022-10-31 23:22:52 - r - INFO: - Episode: 333/1000, Reward: 111.0, Step: 111 -2022-10-31 23:22:52 - r - INFO: - Episode: 334/1000, Reward: 54.0, Step: 54 -2022-10-31 23:22:52 - r - INFO: - Episode: 335/1000, Reward: 198.0, Step: 198 -2022-10-31 23:22:53 - r - INFO: - Episode: 336/1000, Reward: 83.0, Step: 83 -2022-10-31 23:22:53 - r - INFO: - Episode: 337/1000, Reward: 107.0, Step: 107 -2022-10-31 23:22:53 - r - INFO: - Episode: 338/1000, Reward: 101.0, Step: 101 -2022-10-31 23:22:54 - r - INFO: - Episode: 339/1000, Reward: 129.0, Step: 129 -2022-10-31 23:22:54 - r - INFO: - Episode: 340/1000, Reward: 88.0, Step: 88 -2022-10-31 23:22:54 - r - INFO: - Episode: 341/1000, Reward: 86.0, Step: 86 -2022-10-31 23:22:55 - r - INFO: - Episode: 342/1000, Reward: 199.0, Step: 199 -2022-10-31 23:22:55 - r - 
INFO: - Episode: 343/1000, Reward: 95.0, Step: 95 -2022-10-31 23:22:55 - r - INFO: - Episode: 344/1000, Reward: 103.0, Step: 103 -2022-10-31 23:22:56 - r - INFO: - Episode: 345/1000, Reward: 100.0, Step: 100 -2022-10-31 23:22:56 - r - INFO: - Episode: 346/1000, Reward: 89.0, Step: 89 -2022-10-31 23:22:56 - r - INFO: - Episode: 347/1000, Reward: 87.0, Step: 87 -2022-10-31 23:22:57 - r - INFO: - Episode: 348/1000, Reward: 110.0, Step: 110 -2022-10-31 23:22:57 - r - INFO: - Episode: 349/1000, Reward: 127.0, Step: 127 -2022-10-31 23:22:57 - r - INFO: - Episode: 350/1000, Reward: 97.0, Step: 97 -2022-10-31 23:22:57 - r - INFO: - Episode: 351/1000, Reward: 34.0, Step: 34 -2022-10-31 23:22:58 - r - INFO: - Episode: 352/1000, Reward: 123.0, Step: 123 -2022-10-31 23:22:58 - r - INFO: - Episode: 353/1000, Reward: 49.0, Step: 49 -2022-10-31 23:22:58 - r - INFO: - Episode: 354/1000, Reward: 96.0, Step: 96 -2022-10-31 23:22:58 - r - INFO: - Episode: 355/1000, Reward: 90.0, Step: 90 -2022-10-31 23:22:59 - r - INFO: - Episode: 356/1000, Reward: 110.0, Step: 110 -2022-10-31 23:22:59 - r - INFO: - Episode: 357/1000, Reward: 93.0, Step: 93 -2022-10-31 23:22:59 - r - INFO: - Episode: 358/1000, Reward: 102.0, Step: 102 -2022-10-31 23:23:00 - r - INFO: - Episode: 359/1000, Reward: 128.0, Step: 128 -2022-10-31 23:23:00 - r - INFO: - Episode: 360/1000, Reward: 125.0, Step: 125 -2022-10-31 23:23:01 - r - INFO: - Episode: 361/1000, Reward: 92.0, Step: 92 -2022-10-31 23:23:01 - r - INFO: - Episode: 362/1000, Reward: 109.0, Step: 109 -2022-10-31 23:23:01 - r - INFO: - Episode: 363/1000, Reward: 114.0, Step: 114 -2022-10-31 23:23:01 - r - INFO: - Episode: 364/1000, Reward: 111.0, Step: 111 -2022-10-31 23:23:02 - r - INFO: - Episode: 365/1000, Reward: 38.0, Step: 38 -2022-10-31 23:23:02 - r - INFO: - Episode: 366/1000, Reward: 55.0, Step: 55 -2022-10-31 23:23:02 - r - INFO: - Episode: 367/1000, Reward: 106.0, Step: 106 -2022-10-31 23:23:02 - r - INFO: - Episode: 368/1000, Reward: 115.0, Step: 
115 -2022-10-31 23:23:03 - r - INFO: - Episode: 369/1000, Reward: 103.0, Step: 103 -2022-10-31 23:23:03 - r - INFO: - Episode: 370/1000, Reward: 50.0, Step: 50 -2022-10-31 23:23:03 - r - INFO: - Episode: 371/1000, Reward: 110.0, Step: 110 -2022-10-31 23:23:04 - r - INFO: - Episode: 372/1000, Reward: 102.0, Step: 102 -2022-10-31 23:23:04 - r - INFO: - Episode: 373/1000, Reward: 110.0, Step: 110 -2022-10-31 23:23:04 - r - INFO: - Episode: 374/1000, Reward: 29.0, Step: 29 -2022-10-31 23:23:04 - r - INFO: - Episode: 375/1000, Reward: 35.0, Step: 35 -2022-10-31 23:23:04 - r - INFO: - Episode: 376/1000, Reward: 42.0, Step: 42 -2022-10-31 23:23:05 - r - INFO: - Episode: 377/1000, Reward: 62.0, Step: 62 -2022-10-31 23:23:05 - r - INFO: - Episode: 378/1000, Reward: 119.0, Step: 119 -2022-10-31 23:23:05 - r - INFO: - Episode: 379/1000, Reward: 33.0, Step: 33 -2022-10-31 23:23:05 - r - INFO: - Episode: 380/1000, Reward: 31.0, Step: 31 -2022-10-31 23:23:05 - r - INFO: - Episode: 381/1000, Reward: 97.0, Step: 97 -2022-10-31 23:23:06 - r - INFO: - Episode: 382/1000, Reward: 192.0, Step: 192 -2022-10-31 23:23:06 - r - INFO: - Episode: 383/1000, Reward: 179.0, Step: 179 -2022-10-31 23:23:07 - r - INFO: - Episode: 384/1000, Reward: 89.0, Step: 89 -2022-10-31 23:23:07 - r - INFO: - Episode: 385/1000, Reward: 32.0, Step: 32 -2022-10-31 23:23:07 - r - INFO: - Episode: 386/1000, Reward: 33.0, Step: 33 -2022-10-31 23:23:07 - r - INFO: - Episode: 387/1000, Reward: 52.0, Step: 52 -2022-10-31 23:23:07 - r - INFO: - Episode: 388/1000, Reward: 31.0, Step: 31 -2022-10-31 23:23:07 - r - INFO: - Episode: 389/1000, Reward: 22.0, Step: 22 -2022-10-31 23:23:08 - r - INFO: - Episode: 390/1000, Reward: 118.0, Step: 118 -2022-10-31 23:23:08 - r - INFO: - Episode: 391/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:08 - r - INFO: - Episode: 392/1000, Reward: 115.0, Step: 115 -2022-10-31 23:23:08 - r - INFO: - Episode: 393/1000, Reward: 20.0, Step: 20 -2022-10-31 23:23:08 - r - INFO: - Episode: 
394/1000, Reward: 33.0, Step: 33 -2022-10-31 23:23:08 - r - INFO: - Episode: 395/1000, Reward: 40.0, Step: 40 -2022-10-31 23:23:08 - r - INFO: - Episode: 396/1000, Reward: 27.0, Step: 27 -2022-10-31 23:23:08 - r - INFO: - Episode: 397/1000, Reward: 26.0, Step: 26 -2022-10-31 23:23:09 - r - INFO: - Episode: 398/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:09 - r - INFO: - Episode: 399/1000, Reward: 19.0, Step: 19 -2022-10-31 23:23:09 - r - INFO: - Episode: 400/1000, Reward: 22.0, Step: 22 -2022-10-31 23:23:09 - r - INFO: - Episode: 401/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:09 - r - INFO: - Episode: 402/1000, Reward: 18.0, Step: 18 -2022-10-31 23:23:09 - r - INFO: - Episode: 403/1000, Reward: 23.0, Step: 23 -2022-10-31 23:23:09 - r - INFO: - Episode: 404/1000, Reward: 27.0, Step: 27 -2022-10-31 23:23:09 - r - INFO: - Episode: 405/1000, Reward: 20.0, Step: 20 -2022-10-31 23:23:09 - r - INFO: - Episode: 406/1000, Reward: 27.0, Step: 27 -2022-10-31 23:23:09 - r - INFO: - Episode: 407/1000, Reward: 17.0, Step: 17 -2022-10-31 23:23:09 - r - INFO: - Episode: 408/1000, Reward: 27.0, Step: 27 -2022-10-31 23:23:09 - r - INFO: - Episode: 409/1000, Reward: 25.0, Step: 25 -2022-10-31 23:23:09 - r - INFO: - Episode: 410/1000, Reward: 25.0, Step: 25 -2022-10-31 23:23:09 - r - INFO: - Episode: 411/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:10 - r - INFO: - Episode: 412/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:10 - r - INFO: - Episode: 413/1000, Reward: 18.0, Step: 18 -2022-10-31 23:23:10 - r - INFO: - Episode: 414/1000, Reward: 20.0, Step: 20 -2022-10-31 23:23:10 - r - INFO: - Episode: 415/1000, Reward: 27.0, Step: 27 -2022-10-31 23:23:10 - r - INFO: - Episode: 416/1000, Reward: 28.0, Step: 28 -2022-10-31 23:23:10 - r - INFO: - Episode: 417/1000, Reward: 30.0, Step: 30 -2022-10-31 23:23:10 - r - INFO: - Episode: 418/1000, Reward: 28.0, Step: 28 -2022-10-31 23:23:10 - r - INFO: - Episode: 419/1000, Reward: 33.0, Step: 33 -2022-10-31 23:23:10 - r - INFO: - 
Episode: 420/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:10 - r - INFO: - Episode: 421/1000, Reward: 96.0, Step: 96 -2022-10-31 23:23:11 - r - INFO: - Episode: 422/1000, Reward: 26.0, Step: 26 -2022-10-31 23:23:11 - r - INFO: - Episode: 423/1000, Reward: 29.0, Step: 29 -2022-10-31 23:23:11 - r - INFO: - Episode: 424/1000, Reward: 25.0, Step: 25 -2022-10-31 23:23:11 - r - INFO: - Episode: 425/1000, Reward: 38.0, Step: 38 -2022-10-31 23:23:11 - r - INFO: - Episode: 426/1000, Reward: 33.0, Step: 33 -2022-10-31 23:23:11 - r - INFO: - Episode: 427/1000, Reward: 23.0, Step: 23 -2022-10-31 23:23:11 - r - INFO: - Episode: 428/1000, Reward: 39.0, Step: 39 -2022-10-31 23:23:11 - r - INFO: - Episode: 429/1000, Reward: 28.0, Step: 28 -2022-10-31 23:23:11 - r - INFO: - Episode: 430/1000, Reward: 97.0, Step: 97 -2022-10-31 23:23:12 - r - INFO: - Episode: 431/1000, Reward: 30.0, Step: 30 -2022-10-31 23:23:12 - r - INFO: - Episode: 432/1000, Reward: 29.0, Step: 29 -2022-10-31 23:23:12 - r - INFO: - Episode: 433/1000, Reward: 103.0, Step: 103 -2022-10-31 23:23:12 - r - INFO: - Episode: 434/1000, Reward: 36.0, Step: 36 -2022-10-31 23:23:12 - r - INFO: - Episode: 435/1000, Reward: 32.0, Step: 32 -2022-10-31 23:23:12 - r - INFO: - Episode: 436/1000, Reward: 41.0, Step: 41 -2022-10-31 23:23:13 - r - INFO: - Episode: 437/1000, Reward: 111.0, Step: 111 -2022-10-31 23:23:13 - r - INFO: - Episode: 438/1000, Reward: 48.0, Step: 48 -2022-10-31 23:23:13 - r - INFO: - Episode: 439/1000, Reward: 24.0, Step: 24 -2022-10-31 23:23:13 - r - INFO: - Episode: 440/1000, Reward: 49.0, Step: 49 -2022-10-31 23:23:14 - r - INFO: - Episode: 441/1000, Reward: 116.0, Step: 116 -2022-10-31 23:23:14 - r - INFO: - Episode: 442/1000, Reward: 118.0, Step: 118 -2022-10-31 23:23:14 - r - INFO: - Episode: 443/1000, Reward: 94.0, Step: 94 -2022-10-31 23:23:14 - r - INFO: - Episode: 444/1000, Reward: 132.0, Step: 132 -2022-10-31 23:23:14 - r - INFO: - Episode: 445/1000, Reward: 41.0, Step: 41 -2022-10-31 23:23:15 - 
r - INFO: - Episode: 446/1000, Reward: 105.0, Step: 105 -2022-10-31 23:23:15 - r - INFO: - Episode: 447/1000, Reward: 116.0, Step: 116 -2022-10-31 23:23:16 - r - INFO: - Episode: 448/1000, Reward: 136.0, Step: 136 -2022-10-31 23:23:16 - r - INFO: - Episode: 449/1000, Reward: 137.0, Step: 137 -2022-10-31 23:23:16 - r - INFO: - Episode: 450/1000, Reward: 45.0, Step: 45 -2022-10-31 23:23:17 - r - INFO: - Episode: 451/1000, Reward: 157.0, Step: 157 -2022-10-31 23:23:17 - r - INFO: - Episode: 452/1000, Reward: 116.0, Step: 116 -2022-10-31 23:23:17 - r - INFO: - Episode: 453/1000, Reward: 125.0, Step: 125 -2022-10-31 23:23:18 - r - INFO: - Episode: 454/1000, Reward: 120.0, Step: 120 -2022-10-31 23:23:18 - r - INFO: - Episode: 455/1000, Reward: 150.0, Step: 150 -2022-10-31 23:23:19 - r - INFO: - Episode: 456/1000, Reward: 114.0, Step: 114 -2022-10-31 23:23:19 - r - INFO: - Episode: 457/1000, Reward: 44.0, Step: 44 -2022-10-31 23:23:19 - r - INFO: - Episode: 458/1000, Reward: 138.0, Step: 138 -2022-10-31 23:23:19 - r - INFO: - Episode: 459/1000, Reward: 133.0, Step: 133 -2022-10-31 23:23:20 - r - INFO: - Episode: 460/1000, Reward: 141.0, Step: 141 -2022-10-31 23:23:20 - r - INFO: - Episode: 461/1000, Reward: 124.0, Step: 124 -2022-10-31 23:23:21 - r - INFO: - Episode: 462/1000, Reward: 143.0, Step: 143 -2022-10-31 23:23:21 - r - INFO: - Episode: 463/1000, Reward: 123.0, Step: 123 -2022-10-31 23:23:21 - r - INFO: - Episode: 464/1000, Reward: 134.0, Step: 134 -2022-10-31 23:23:22 - r - INFO: - Episode: 465/1000, Reward: 152.0, Step: 152 -2022-10-31 23:23:23 - r - INFO: - Episode: 466/1000, Reward: 140.0, Step: 140 -2022-10-31 23:23:23 - r - INFO: - Episode: 467/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:23 - r - INFO: - Episode: 468/1000, Reward: 168.0, Step: 168 -2022-10-31 23:23:24 - r - INFO: - Episode: 469/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:24 - r - INFO: - Episode: 470/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:25 - r - INFO: - Current episode 
470 has the best eval reward: 199.80 -2022-10-31 23:23:25 - r - INFO: - Episode: 471/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:26 - r - INFO: - Episode: 472/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:26 - r - INFO: - Episode: 473/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:27 - r - INFO: - Episode: 474/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:27 - r - INFO: - Episode: 475/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:28 - r - INFO: - Current episode 475 has the best eval reward: 200.00 -2022-10-31 23:23:28 - r - INFO: - Episode: 476/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:29 - r - INFO: - Episode: 477/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:29 - r - INFO: - Episode: 478/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:30 - r - INFO: - Episode: 479/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:30 - r - INFO: - Episode: 480/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:31 - r - INFO: - Current episode 480 has the best eval reward: 200.00 -2022-10-31 23:23:31 - r - INFO: - Episode: 481/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:32 - r - INFO: - Episode: 482/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:32 - r - INFO: - Episode: 483/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:33 - r - INFO: - Episode: 484/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:33 - r - INFO: - Episode: 485/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:34 - r - INFO: - Current episode 485 has the best eval reward: 200.00 -2022-10-31 23:23:34 - r - INFO: - Episode: 486/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:35 - r - INFO: - Episode: 487/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:35 - r - INFO: - Episode: 488/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:36 - r - INFO: - Episode: 489/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:36 - r - INFO: - Episode: 490/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:37 - r - INFO: - Current episode 490 has the best eval reward: 200.00 -2022-10-31 23:23:37 - r - 
INFO: - Episode: 491/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:38 - r - INFO: - Episode: 492/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:38 - r - INFO: - Episode: 493/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:38 - r - INFO: - Episode: 494/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:39 - r - INFO: - Episode: 495/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:40 - r - INFO: - Current episode 495 has the best eval reward: 200.00 -2022-10-31 23:23:40 - r - INFO: - Episode: 496/1000, Reward: 169.0, Step: 169 -2022-10-31 23:23:40 - r - INFO: - Episode: 497/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:41 - r - INFO: - Episode: 498/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:41 - r - INFO: - Episode: 499/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:42 - r - INFO: - Episode: 500/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:42 - r - INFO: - Current episode 500 has the best eval reward: 200.00 -2022-10-31 23:23:43 - r - INFO: - Episode: 501/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:43 - r - INFO: - Episode: 502/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:44 - r - INFO: - Episode: 503/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:44 - r - INFO: - Episode: 504/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:45 - r - INFO: - Episode: 505/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:45 - r - INFO: - Current episode 505 has the best eval reward: 200.00 -2022-10-31 23:23:46 - r - INFO: - Episode: 506/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:46 - r - INFO: - Episode: 507/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:47 - r - INFO: - Episode: 508/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:47 - r - INFO: - Episode: 509/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:48 - r - INFO: - Episode: 510/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:49 - r - INFO: - Episode: 511/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:49 - r - INFO: - Episode: 512/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:50 - r 
- INFO: - Episode: 513/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:50 - r - INFO: - Episode: 514/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:51 - r - INFO: - Episode: 515/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:52 - r - INFO: - Episode: 516/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:52 - r - INFO: - Episode: 517/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:53 - r - INFO: - Episode: 518/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:53 - r - INFO: - Episode: 519/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:54 - r - INFO: - Episode: 520/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:55 - r - INFO: - Current episode 520 has the best eval reward: 200.00 -2022-10-31 23:23:55 - r - INFO: - Episode: 521/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:55 - r - INFO: - Episode: 522/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:56 - r - INFO: - Episode: 523/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:56 - r - INFO: - Episode: 524/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:57 - r - INFO: - Episode: 525/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:58 - r - INFO: - Current episode 525 has the best eval reward: 200.00 -2022-10-31 23:23:58 - r - INFO: - Episode: 526/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:59 - r - INFO: - Episode: 527/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:59 - r - INFO: - Episode: 528/1000, Reward: 200.0, Step: 200 -2022-10-31 23:23:59 - r - INFO: - Episode: 529/1000, Reward: 186.0, Step: 186 -2022-10-31 23:24:00 - r - INFO: - Episode: 530/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:00 - r - INFO: - Current episode 530 has the best eval reward: 200.00 -2022-10-31 23:24:01 - r - INFO: - Episode: 531/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:01 - r - INFO: - Episode: 532/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:02 - r - INFO: - Episode: 533/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:02 - r - INFO: - Episode: 534/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:03 - 
r - INFO: - Episode: 535/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:04 - r - INFO: - Current episode 535 has the best eval reward: 200.00 -2022-10-31 23:24:04 - r - INFO: - Episode: 536/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:04 - r - INFO: - Episode: 537/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:05 - r - INFO: - Episode: 538/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:05 - r - INFO: - Episode: 539/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:06 - r - INFO: - Episode: 540/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:07 - r - INFO: - Current episode 540 has the best eval reward: 200.00 -2022-10-31 23:24:07 - r - INFO: - Episode: 541/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:08 - r - INFO: - Episode: 542/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:08 - r - INFO: - Episode: 543/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:08 - r - INFO: - Episode: 544/1000, Reward: 84.0, Step: 84 -2022-10-31 23:24:09 - r - INFO: - Episode: 545/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:09 - r - INFO: - Current episode 545 has the best eval reward: 200.00 -2022-10-31 23:24:10 - r - INFO: - Episode: 546/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:10 - r - INFO: - Episode: 547/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:11 - r - INFO: - Episode: 548/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:11 - r - INFO: - Episode: 549/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:12 - r - INFO: - Episode: 550/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:13 - r - INFO: - Episode: 551/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:13 - r - INFO: - Episode: 552/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:14 - r - INFO: - Episode: 553/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:14 - r - INFO: - Episode: 554/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:15 - r - INFO: - Episode: 555/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:16 - r - INFO: - Episode: 556/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:16 - 
r - INFO: - Episode: 557/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:17 - r - INFO: - Episode: 558/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:17 - r - INFO: - Episode: 559/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:17 - r - INFO: - Episode: 560/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:18 - r - INFO: - Current episode 560 has the best eval reward: 200.00 -2022-10-31 23:24:19 - r - INFO: - Episode: 561/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:19 - r - INFO: - Episode: 562/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:20 - r - INFO: - Episode: 563/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:20 - r - INFO: - Episode: 564/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:20 - r - INFO: - Episode: 565/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:21 - r - INFO: - Current episode 565 has the best eval reward: 200.00 -2022-10-31 23:24:21 - r - INFO: - Episode: 566/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:22 - r - INFO: - Episode: 567/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:22 - r - INFO: - Episode: 568/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:23 - r - INFO: - Episode: 569/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:23 - r - INFO: - Episode: 570/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:24 - r - INFO: - Current episode 570 has the best eval reward: 200.00 -2022-10-31 23:24:24 - r - INFO: - Episode: 571/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:25 - r - INFO: - Episode: 572/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:25 - r - INFO: - Episode: 573/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:26 - r - INFO: - Episode: 574/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:26 - r - INFO: - Episode: 575/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:28 - r - INFO: - Episode: 576/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:28 - r - INFO: - Episode: 577/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:28 - r - INFO: - Episode: 578/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:29 
- r - INFO: - Episode: 579/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:29 - r - INFO: - Episode: 580/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:30 - r - INFO: - Current episode 580 has the best eval reward: 200.00 -2022-10-31 23:24:31 - r - INFO: - Episode: 581/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:31 - r - INFO: - Episode: 582/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:32 - r - INFO: - Episode: 583/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:32 - r - INFO: - Episode: 584/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:33 - r - INFO: - Episode: 585/1000, Reward: 199.0, Step: 199 -2022-10-31 23:24:34 - r - INFO: - Episode: 586/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:34 - r - INFO: - Episode: 587/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:35 - r - INFO: - Episode: 588/1000, Reward: 178.0, Step: 178 -2022-10-31 23:24:35 - r - INFO: - Episode: 589/1000, Reward: 200.0, Step: 200 -2022-10-31 23:24:36 - r - INFO: - Episode: 590/1000, Reward: 188.0, Step: 188 -2022-10-31 23:24:36 - r - INFO: - Episode: 591/1000, Reward: 156.0, Step: 156 -2022-10-31 23:24:37 - r - INFO: - Episode: 592/1000, Reward: 165.0, Step: 165 -2022-10-31 23:24:37 - r - INFO: - Episode: 593/1000, Reward: 131.0, Step: 131 -2022-10-31 23:24:37 - r - INFO: - Episode: 594/1000, Reward: 157.0, Step: 157 -2022-10-31 23:24:38 - r - INFO: - Episode: 595/1000, Reward: 170.0, Step: 170 -2022-10-31 23:24:39 - r - INFO: - Episode: 596/1000, Reward: 123.0, Step: 123 -2022-10-31 23:24:39 - r - INFO: - Episode: 597/1000, Reward: 109.0, Step: 109 -2022-10-31 23:24:39 - r - INFO: - Episode: 598/1000, Reward: 124.0, Step: 124 -2022-10-31 23:24:39 - r - INFO: - Episode: 599/1000, Reward: 113.0, Step: 113 -2022-10-31 23:24:39 - r - INFO: - Episode: 600/1000, Reward: 38.0, Step: 38 -2022-10-31 23:24:40 - r - INFO: - Episode: 601/1000, Reward: 107.0, Step: 107 -2022-10-31 23:24:40 - r - INFO: - Episode: 602/1000, Reward: 115.0, Step: 115 -2022-10-31 23:24:41 - r - INFO: - 
Episode: 603/1000, Reward: 101.0, Step: 101 -2022-10-31 23:24:41 - r - INFO: - Episode: 604/1000, Reward: 113.0, Step: 113 -2022-10-31 23:24:41 - r - INFO: - Episode: 605/1000, Reward: 100.0, Step: 100 -2022-10-31 23:24:42 - r - INFO: - Episode: 606/1000, Reward: 109.0, Step: 109 -2022-10-31 23:24:42 - r - INFO: - Episode: 607/1000, Reward: 119.0, Step: 119 -2022-10-31 23:24:42 - r - INFO: - Episode: 608/1000, Reward: 117.0, Step: 117 -2022-10-31 23:24:43 - r - INFO: - Episode: 609/1000, Reward: 108.0, Step: 108 -2022-10-31 23:24:43 - r - INFO: - Episode: 610/1000, Reward: 101.0, Step: 101 -2022-10-31 23:24:43 - r - INFO: - Episode: 611/1000, Reward: 110.0, Step: 110 -2022-10-31 23:24:44 - r - INFO: - Episode: 612/1000, Reward: 59.0, Step: 59 -2022-10-31 23:24:44 - r - INFO: - Episode: 613/1000, Reward: 112.0, Step: 112 -2022-10-31 23:24:44 - r - INFO: - Episode: 614/1000, Reward: 104.0, Step: 104 -2022-10-31 23:24:44 - r - INFO: - Episode: 615/1000, Reward: 45.0, Step: 45 -2022-10-31 23:24:44 - r - INFO: - Episode: 616/1000, Reward: 29.0, Step: 29 -2022-10-31 23:24:44 - r - INFO: - Episode: 617/1000, Reward: 42.0, Step: 42 -2022-10-31 23:24:45 - r - INFO: - Episode: 618/1000, Reward: 74.0, Step: 74 -2022-10-31 23:24:45 - r - INFO: - Episode: 619/1000, Reward: 79.0, Step: 79 -2022-10-31 23:24:45 - r - INFO: - Episode: 620/1000, Reward: 50.0, Step: 50 -2022-10-31 23:24:45 - r - INFO: - Episode: 621/1000, Reward: 30.0, Step: 30 -2022-10-31 23:24:45 - r - INFO: - Episode: 622/1000, Reward: 43.0, Step: 43 -2022-10-31 23:24:46 - r - INFO: - Episode: 623/1000, Reward: 77.0, Step: 77 -2022-10-31 23:24:46 - r - INFO: - Episode: 624/1000, Reward: 36.0, Step: 36 -2022-10-31 23:24:46 - r - INFO: - Episode: 625/1000, Reward: 61.0, Step: 61 -2022-10-31 23:24:46 - r - INFO: - Episode: 626/1000, Reward: 36.0, Step: 36 -2022-10-31 23:24:46 - r - INFO: - Episode: 627/1000, Reward: 30.0, Step: 30 -2022-10-31 23:24:46 - r - INFO: - Episode: 628/1000, Reward: 43.0, Step: 43 
-2022-10-31 23:24:46 - r - INFO: - Episode: 629/1000, Reward: 27.0, Step: 27 -2022-10-31 23:24:46 - r - INFO: - Episode: 630/1000, Reward: 88.0, Step: 88 -2022-10-31 23:24:47 - r - INFO: - Episode: 631/1000, Reward: 42.0, Step: 42 -2022-10-31 23:24:47 - r - INFO: - Episode: 632/1000, Reward: 40.0, Step: 40 -2022-10-31 23:24:47 - r - INFO: - Episode: 633/1000, Reward: 59.0, Step: 59 -2022-10-31 23:24:47 - r - INFO: - Episode: 634/1000, Reward: 81.0, Step: 81 -2022-10-31 23:24:47 - r - INFO: - Episode: 635/1000, Reward: 85.0, Step: 85 -2022-10-31 23:24:48 - r - INFO: - Episode: 636/1000, Reward: 55.0, Step: 55 -2022-10-31 23:24:48 - r - INFO: - Episode: 637/1000, Reward: 40.0, Step: 40 -2022-10-31 23:24:48 - r - INFO: - Episode: 638/1000, Reward: 99.0, Step: 99 -2022-10-31 23:24:48 - r - INFO: - Episode: 639/1000, Reward: 104.0, Step: 104 -2022-10-31 23:24:49 - r - INFO: - Episode: 640/1000, Reward: 117.0, Step: 117 -2022-10-31 23:24:49 - r - INFO: - Episode: 641/1000, Reward: 112.0, Step: 112 -2022-10-31 23:24:49 - r - INFO: - Episode: 642/1000, Reward: 43.0, Step: 43 -2022-10-31 23:24:50 - r - INFO: - Episode: 643/1000, Reward: 96.0, Step: 96 -2022-10-31 23:24:50 - r - INFO: - Episode: 644/1000, Reward: 105.0, Step: 105 -2022-10-31 23:24:50 - r - INFO: - Episode: 645/1000, Reward: 115.0, Step: 115 -2022-10-31 23:24:51 - r - INFO: - Episode: 646/1000, Reward: 99.0, Step: 99 -2022-10-31 23:24:51 - r - INFO: - Episode: 647/1000, Reward: 123.0, Step: 123 -2022-10-31 23:24:51 - r - INFO: - Episode: 648/1000, Reward: 123.0, Step: 123 -2022-10-31 23:24:51 - r - INFO: - Episode: 649/1000, Reward: 40.0, Step: 40 -2022-10-31 23:24:51 - r - INFO: - Episode: 650/1000, Reward: 100.0, Step: 100 -2022-10-31 23:24:52 - r - INFO: - Episode: 651/1000, Reward: 124.0, Step: 124 -2022-10-31 23:24:52 - r - INFO: - Episode: 652/1000, Reward: 106.0, Step: 106 -2022-10-31 23:24:53 - r - INFO: - Episode: 653/1000, Reward: 122.0, Step: 122 -2022-10-31 23:24:53 - r - INFO: - Episode: 
654/1000, Reward: 127.0, Step: 127 -2022-10-31 23:24:53 - r - INFO: - Episode: 655/1000, Reward: 121.0, Step: 121 -2022-10-31 23:24:54 - r - INFO: - Episode: 656/1000, Reward: 121.0, Step: 121 -2022-10-31 23:24:54 - r - INFO: - Episode: 657/1000, Reward: 125.0, Step: 125 -2022-10-31 23:24:54 - r - INFO: - Episode: 658/1000, Reward: 127.0, Step: 127 -2022-10-31 23:24:55 - r - INFO: - Episode: 659/1000, Reward: 132.0, Step: 132 -2022-10-31 23:24:55 - r - INFO: - Episode: 660/1000, Reward: 142.0, Step: 142 -2022-10-31 23:24:56 - r - INFO: - Episode: 661/1000, Reward: 134.0, Step: 134 -2022-10-31 23:24:56 - r - INFO: - Episode: 662/1000, Reward: 147.0, Step: 147 -2022-10-31 23:24:57 - r - INFO: - Episode: 663/1000, Reward: 175.0, Step: 175 -2022-10-31 23:24:57 - r - INFO: - Episode: 664/1000, Reward: 180.0, Step: 180 -2022-10-31 23:24:57 - r - INFO: - Episode: 665/1000, Reward: 183.0, Step: 183 -2022-10-31 23:24:58 - r - INFO: - Episode: 666/1000, Reward: 167.0, Step: 167 -2022-10-31 23:24:59 - r - INFO: - Episode: 667/1000, Reward: 179.0, Step: 179 -2022-10-31 23:24:59 - r - INFO: - Episode: 668/1000, Reward: 173.0, Step: 173 -2022-10-31 23:25:00 - r - INFO: - Episode: 669/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:00 - r - INFO: - Episode: 670/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:01 - r - INFO: - Episode: 671/1000, Reward: 184.0, Step: 184 -2022-10-31 23:25:02 - r - INFO: - Episode: 672/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:02 - r - INFO: - Episode: 673/1000, Reward: 193.0, Step: 193 -2022-10-31 23:25:03 - r - INFO: - Episode: 674/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:03 - r - INFO: - Episode: 675/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:04 - r - INFO: - Current episode 675 has the best eval reward: 200.00 -2022-10-31 23:25:04 - r - INFO: - Episode: 676/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:05 - r - INFO: - Episode: 677/1000, Reward: 199.0, Step: 199 -2022-10-31 23:25:05 - r - INFO: - Episode: 678/1000, 
Reward: 200.0, Step: 200 -2022-10-31 23:25:06 - r - INFO: - Episode: 679/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:06 - r - INFO: - Episode: 680/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:07 - r - INFO: - Current episode 680 has the best eval reward: 200.00 -2022-10-31 23:25:08 - r - INFO: - Episode: 681/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:08 - r - INFO: - Episode: 682/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:09 - r - INFO: - Episode: 683/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:09 - r - INFO: - Episode: 684/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:10 - r - INFO: - Episode: 685/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:10 - r - INFO: - Current episode 685 has the best eval reward: 200.00 -2022-10-31 23:25:11 - r - INFO: - Episode: 686/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:11 - r - INFO: - Episode: 687/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:12 - r - INFO: - Episode: 688/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:12 - r - INFO: - Episode: 689/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:13 - r - INFO: - Episode: 690/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:14 - r - INFO: - Episode: 691/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:14 - r - INFO: - Episode: 692/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:15 - r - INFO: - Episode: 693/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:15 - r - INFO: - Episode: 694/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:15 - r - INFO: - Episode: 695/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:17 - r - INFO: - Episode: 696/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:17 - r - INFO: - Episode: 697/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:17 - r - INFO: - Episode: 698/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:18 - r - INFO: - Episode: 699/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:18 - r - INFO: - Episode: 700/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:19 - r - INFO: - Episode: 701/1000, 
Reward: 200.0, Step: 200 -2022-10-31 23:25:20 - r - INFO: - Episode: 702/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:20 - r - INFO: - Episode: 703/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:21 - r - INFO: - Episode: 704/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:21 - r - INFO: - Episode: 705/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:22 - r - INFO: - Current episode 705 has the best eval reward: 200.00 -2022-10-31 23:25:22 - r - INFO: - Episode: 706/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:23 - r - INFO: - Episode: 707/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:23 - r - INFO: - Episode: 708/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:24 - r - INFO: - Episode: 709/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:24 - r - INFO: - Episode: 710/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:25 - r - INFO: - Current episode 710 has the best eval reward: 200.00 -2022-10-31 23:25:26 - r - INFO: - Episode: 711/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:26 - r - INFO: - Episode: 712/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:26 - r - INFO: - Episode: 713/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:27 - r - INFO: - Episode: 714/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:27 - r - INFO: - Episode: 715/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:28 - r - INFO: - Current episode 715 has the best eval reward: 200.00 -2022-10-31 23:25:28 - r - INFO: - Episode: 716/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:29 - r - INFO: - Episode: 717/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:29 - r - INFO: - Episode: 718/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:30 - r - INFO: - Episode: 719/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:30 - r - INFO: - Episode: 720/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:31 - r - INFO: - Current episode 720 has the best eval reward: 200.00 -2022-10-31 23:25:31 - r - INFO: - Episode: 721/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:32 - r - INFO: - 
Episode: 722/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:32 - r - INFO: - Episode: 723/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:33 - r - INFO: - Episode: 724/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:33 - r - INFO: - Episode: 725/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:34 - r - INFO: - Current episode 725 has the best eval reward: 200.00 -2022-10-31 23:25:34 - r - INFO: - Episode: 726/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:35 - r - INFO: - Episode: 727/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:35 - r - INFO: - Episode: 728/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:35 - r - INFO: - Episode: 729/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:36 - r - INFO: - Episode: 730/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:37 - r - INFO: - Current episode 730 has the best eval reward: 200.00 -2022-10-31 23:25:37 - r - INFO: - Episode: 731/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:37 - r - INFO: - Episode: 732/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:38 - r - INFO: - Episode: 733/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:38 - r - INFO: - Episode: 734/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:39 - r - INFO: - Episode: 735/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:39 - r - INFO: - Current episode 735 has the best eval reward: 200.00 -2022-10-31 23:25:40 - r - INFO: - Episode: 736/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:40 - r - INFO: - Episode: 737/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:41 - r - INFO: - Episode: 738/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:41 - r - INFO: - Episode: 739/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:42 - r - INFO: - Episode: 740/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:42 - r - INFO: - Current episode 740 has the best eval reward: 200.00 -2022-10-31 23:25:43 - r - INFO: - Episode: 741/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:43 - r - INFO: - Episode: 742/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:44 - 
r - INFO: - Episode: 743/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:44 - r - INFO: - Episode: 744/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:44 - r - INFO: - Episode: 745/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:45 - r - INFO: - Current episode 745 has the best eval reward: 200.00 -2022-10-31 23:25:46 - r - INFO: - Episode: 746/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:46 - r - INFO: - Episode: 747/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:46 - r - INFO: - Episode: 748/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:47 - r - INFO: - Episode: 749/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:47 - r - INFO: - Episode: 750/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:48 - r - INFO: - Current episode 750 has the best eval reward: 200.00 -2022-10-31 23:25:48 - r - INFO: - Episode: 751/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:49 - r - INFO: - Episode: 752/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:49 - r - INFO: - Episode: 753/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:50 - r - INFO: - Episode: 754/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:50 - r - INFO: - Episode: 755/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:51 - r - INFO: - Current episode 755 has the best eval reward: 200.00 -2022-10-31 23:25:51 - r - INFO: - Episode: 756/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:52 - r - INFO: - Episode: 757/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:52 - r - INFO: - Episode: 758/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:53 - r - INFO: - Episode: 759/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:53 - r - INFO: - Episode: 760/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:54 - r - INFO: - Current episode 760 has the best eval reward: 200.00 -2022-10-31 23:25:54 - r - INFO: - Episode: 761/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:55 - r - INFO: - Episode: 762/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:55 - r - INFO: - Episode: 763/1000, Reward: 200.0, Step: 200 -2022-10-31 
23:25:55 - r - INFO: - Episode: 764/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:56 - r - INFO: - Episode: 765/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:57 - r - INFO: - Current episode 765 has the best eval reward: 200.00 -2022-10-31 23:25:57 - r - INFO: - Episode: 766/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:57 - r - INFO: - Episode: 767/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:58 - r - INFO: - Episode: 768/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:58 - r - INFO: - Episode: 769/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:59 - r - INFO: - Episode: 770/1000, Reward: 200.0, Step: 200 -2022-10-31 23:25:59 - r - INFO: - Current episode 770 has the best eval reward: 200.00 -2022-10-31 23:26:00 - r - INFO: - Episode: 771/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:00 - r - INFO: - Episode: 772/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:01 - r - INFO: - Episode: 773/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:01 - r - INFO: - Episode: 774/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:02 - r - INFO: - Episode: 775/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:02 - r - INFO: - Current episode 775 has the best eval reward: 200.00 -2022-10-31 23:26:03 - r - INFO: - Episode: 776/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:03 - r - INFO: - Episode: 777/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:04 - r - INFO: - Episode: 778/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:04 - r - INFO: - Episode: 779/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:04 - r - INFO: - Episode: 780/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:06 - r - INFO: - Episode: 781/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:06 - r - INFO: - Episode: 782/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:06 - r - INFO: - Episode: 783/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:07 - r - INFO: - Episode: 784/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:07 - r - INFO: - Episode: 785/1000, Reward: 200.0, Step: 200 
-2022-10-31 23:26:08 - r - INFO: - Episode: 786/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:09 - r - INFO: - Episode: 787/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:09 - r - INFO: - Episode: 788/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:10 - r - INFO: - Episode: 789/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:10 - r - INFO: - Episode: 790/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:11 - r - INFO: - Current episode 790 has the best eval reward: 200.00 -2022-10-31 23:26:11 - r - INFO: - Episode: 791/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:12 - r - INFO: - Episode: 792/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:12 - r - INFO: - Episode: 793/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:12 - r - INFO: - Episode: 794/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:13 - r - INFO: - Episode: 795/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:14 - r - INFO: - Current episode 795 has the best eval reward: 200.00 -2022-10-31 23:26:14 - r - INFO: - Episode: 796/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:14 - r - INFO: - Episode: 797/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:15 - r - INFO: - Episode: 798/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:15 - r - INFO: - Episode: 799/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:16 - r - INFO: - Episode: 800/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:16 - r - INFO: - Current episode 800 has the best eval reward: 200.00 -2022-10-31 23:26:17 - r - INFO: - Episode: 801/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:17 - r - INFO: - Episode: 802/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:18 - r - INFO: - Episode: 803/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:18 - r - INFO: - Episode: 804/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:19 - r - INFO: - Episode: 805/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:19 - r - INFO: - Current episode 805 has the best eval reward: 200.00 -2022-10-31 23:26:20 - r - INFO: - Episode: 806/1000, Reward: 
200.0, Step: 200 -2022-10-31 23:26:20 - r - INFO: - Episode: 807/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:21 - r - INFO: - Episode: 808/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:21 - r - INFO: - Episode: 809/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:21 - r - INFO: - Episode: 810/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:22 - r - INFO: - Current episode 810 has the best eval reward: 200.00 -2022-10-31 23:26:23 - r - INFO: - Episode: 811/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:23 - r - INFO: - Episode: 812/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:23 - r - INFO: - Episode: 813/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:24 - r - INFO: - Episode: 814/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:24 - r - INFO: - Episode: 815/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:25 - r - INFO: - Current episode 815 has the best eval reward: 200.00 -2022-10-31 23:26:25 - r - INFO: - Episode: 816/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:26 - r - INFO: - Episode: 817/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:26 - r - INFO: - Episode: 818/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:27 - r - INFO: - Episode: 819/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:27 - r - INFO: - Episode: 820/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:28 - r - INFO: - Current episode 820 has the best eval reward: 200.00 -2022-10-31 23:26:28 - r - INFO: - Episode: 821/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:29 - r - INFO: - Episode: 822/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:29 - r - INFO: - Episode: 823/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:30 - r - INFO: - Episode: 824/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:30 - r - INFO: - Episode: 825/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:31 - r - INFO: - Current episode 825 has the best eval reward: 200.00 -2022-10-31 23:26:31 - r - INFO: - Episode: 826/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:32 - r - INFO: - Episode: 
827/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:32 - r - INFO: - Episode: 828/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:32 - r - INFO: - Episode: 829/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:33 - r - INFO: - Episode: 830/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:34 - r - INFO: - Current episode 830 has the best eval reward: 200.00 -2022-10-31 23:26:34 - r - INFO: - Episode: 831/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:34 - r - INFO: - Episode: 832/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:35 - r - INFO: - Episode: 833/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:35 - r - INFO: - Episode: 834/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:36 - r - INFO: - Episode: 835/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:36 - r - INFO: - Current episode 835 has the best eval reward: 200.00 -2022-10-31 23:26:37 - r - INFO: - Episode: 836/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:37 - r - INFO: - Episode: 837/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:38 - r - INFO: - Episode: 838/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:38 - r - INFO: - Episode: 839/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:38 - r - INFO: - Episode: 840/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:39 - r - INFO: - Current episode 840 has the best eval reward: 200.00 -2022-10-31 23:26:40 - r - INFO: - Episode: 841/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:40 - r - INFO: - Episode: 842/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:40 - r - INFO: - Episode: 843/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:41 - r - INFO: - Episode: 844/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:41 - r - INFO: - Episode: 845/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:42 - r - INFO: - Current episode 845 has the best eval reward: 200.00 -2022-10-31 23:26:42 - r - INFO: - Episode: 846/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:43 - r - INFO: - Episode: 847/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:43 - r - INFO: 
- Episode: 848/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:44 - r - INFO: - Episode: 849/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:44 - r - INFO: - Episode: 850/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:45 - r - INFO: - Current episode 850 has the best eval reward: 200.00 -2022-10-31 23:26:45 - r - INFO: - Episode: 851/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:46 - r - INFO: - Episode: 852/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:46 - r - INFO: - Episode: 853/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:47 - r - INFO: - Episode: 854/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:47 - r - INFO: - Episode: 855/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:48 - r - INFO: - Current episode 855 has the best eval reward: 200.00 -2022-10-31 23:26:48 - r - INFO: - Episode: 856/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:49 - r - INFO: - Episode: 857/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:49 - r - INFO: - Episode: 858/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:49 - r - INFO: - Episode: 859/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:50 - r - INFO: - Episode: 860/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:51 - r - INFO: - Current episode 860 has the best eval reward: 200.00 -2022-10-31 23:26:51 - r - INFO: - Episode: 861/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:51 - r - INFO: - Episode: 862/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:52 - r - INFO: - Episode: 863/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:52 - r - INFO: - Episode: 864/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:53 - r - INFO: - Episode: 865/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:53 - r - INFO: - Current episode 865 has the best eval reward: 200.00 -2022-10-31 23:26:54 - r - INFO: - Episode: 866/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:54 - r - INFO: - Episode: 867/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:55 - r - INFO: - Episode: 868/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:55 
- r - INFO: - Episode: 869/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:56 - r - INFO: - Episode: 870/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:56 - r - INFO: - Current episode 870 has the best eval reward: 200.00 -2022-10-31 23:26:57 - r - INFO: - Episode: 871/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:57 - r - INFO: - Episode: 872/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:58 - r - INFO: - Episode: 873/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:58 - r - INFO: - Episode: 874/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:59 - r - INFO: - Episode: 875/1000, Reward: 200.0, Step: 200 -2022-10-31 23:26:59 - r - INFO: - Current episode 875 has the best eval reward: 200.00 -2022-10-31 23:27:00 - r - INFO: - Episode: 876/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:00 - r - INFO: - Episode: 877/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:01 - r - INFO: - Episode: 878/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:01 - r - INFO: - Episode: 879/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:01 - r - INFO: - Episode: 880/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:02 - r - INFO: - Current episode 880 has the best eval reward: 200.00 -2022-10-31 23:27:03 - r - INFO: - Episode: 881/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:03 - r - INFO: - Episode: 882/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:03 - r - INFO: - Episode: 883/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:04 - r - INFO: - Episode: 884/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:04 - r - INFO: - Episode: 885/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:05 - r - INFO: - Current episode 885 has the best eval reward: 200.00 -2022-10-31 23:27:05 - r - INFO: - Episode: 886/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:06 - r - INFO: - Episode: 887/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:06 - r - INFO: - Episode: 888/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:07 - r - INFO: - Episode: 889/1000, Reward: 200.0, Step: 200 
-2022-10-31 23:27:07 - r - INFO: - Episode: 890/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:08 - r - INFO: - Current episode 890 has the best eval reward: 200.00 -2022-10-31 23:27:08 - r - INFO: - Episode: 891/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:09 - r - INFO: - Episode: 892/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:09 - r - INFO: - Episode: 893/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:10 - r - INFO: - Episode: 894/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:10 - r - INFO: - Episode: 895/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:11 - r - INFO: - Current episode 895 has the best eval reward: 200.00 -2022-10-31 23:27:11 - r - INFO: - Episode: 896/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:12 - r - INFO: - Episode: 897/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:12 - r - INFO: - Episode: 898/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:12 - r - INFO: - Episode: 899/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:13 - r - INFO: - Episode: 900/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:14 - r - INFO: - Current episode 900 has the best eval reward: 200.00 -2022-10-31 23:27:14 - r - INFO: - Episode: 901/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:15 - r - INFO: - Episode: 902/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:15 - r - INFO: - Episode: 903/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:16 - r - INFO: - Episode: 904/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:16 - r - INFO: - Episode: 905/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:17 - r - INFO: - Current episode 905 has the best eval reward: 200.00 -2022-10-31 23:27:17 - r - INFO: - Episode: 906/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:18 - r - INFO: - Episode: 907/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:18 - r - INFO: - Episode: 908/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:19 - r - INFO: - Episode: 909/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:19 - r - INFO: - Episode: 910/1000, Reward: 
200.0, Step: 200 -2022-10-31 23:27:20 - r - INFO: - Current episode 910 has the best eval reward: 200.00 -2022-10-31 23:27:20 - r - INFO: - Episode: 911/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:21 - r - INFO: - Episode: 912/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:21 - r - INFO: - Episode: 913/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:21 - r - INFO: - Episode: 914/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:22 - r - INFO: - Episode: 915/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:23 - r - INFO: - Current episode 915 has the best eval reward: 200.00 -2022-10-31 23:27:23 - r - INFO: - Episode: 916/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:23 - r - INFO: - Episode: 917/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:24 - r - INFO: - Episode: 918/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:24 - r - INFO: - Episode: 919/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:25 - r - INFO: - Episode: 920/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:25 - r - INFO: - Current episode 920 has the best eval reward: 200.00 -2022-10-31 23:27:26 - r - INFO: - Episode: 921/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:26 - r - INFO: - Episode: 922/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:27 - r - INFO: - Episode: 923/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:27 - r - INFO: - Episode: 924/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:28 - r - INFO: - Episode: 925/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:29 - r - INFO: - Episode: 926/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:29 - r - INFO: - Episode: 927/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:30 - r - INFO: - Episode: 928/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:30 - r - INFO: - Episode: 929/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:30 - r - INFO: - Episode: 930/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:31 - r - INFO: - Episode: 931/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:32 - r - INFO: - Episode: 932/1000, 
Reward: 200.0, Step: 200 -2022-10-31 23:27:32 - r - INFO: - Episode: 933/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:33 - r - INFO: - Episode: 934/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:33 - r - INFO: - Episode: 935/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:34 - r - INFO: - Episode: 936/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:35 - r - INFO: - Episode: 937/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:35 - r - INFO: - Episode: 938/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:36 - r - INFO: - Episode: 939/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:36 - r - INFO: - Episode: 940/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:37 - r - INFO: - Episode: 941/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:38 - r - INFO: - Episode: 942/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:38 - r - INFO: - Episode: 943/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:38 - r - INFO: - Episode: 944/1000, Reward: 153.0, Step: 153 -2022-10-31 23:27:39 - r - INFO: - Episode: 945/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:40 - r - INFO: - Episode: 946/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:40 - r - INFO: - Episode: 947/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:41 - r - INFO: - Episode: 948/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:41 - r - INFO: - Episode: 949/1000, Reward: 150.0, Step: 150 -2022-10-31 23:27:41 - r - INFO: - Episode: 950/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:43 - r - INFO: - Episode: 951/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:43 - r - INFO: - Episode: 952/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:43 - r - INFO: - Episode: 953/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:44 - r - INFO: - Episode: 954/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:44 - r - INFO: - Episode: 955/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:45 - r - INFO: - Episode: 956/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:46 - r - INFO: - Episode: 957/1000, Reward: 200.0, Step: 200 
-2022-10-31 23:27:46 - r - INFO: - Episode: 958/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:47 - r - INFO: - Episode: 959/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:47 - r - INFO: - Episode: 960/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:48 - r - INFO: - Current episode 960 has the best eval reward: 200.00 -2022-10-31 23:27:48 - r - INFO: - Episode: 961/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:49 - r - INFO: - Episode: 962/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:49 - r - INFO: - Episode: 963/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:49 - r - INFO: - Episode: 964/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:50 - r - INFO: - Episode: 965/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:51 - r - INFO: - Current episode 965 has the best eval reward: 200.00 -2022-10-31 23:27:51 - r - INFO: - Episode: 966/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:51 - r - INFO: - Episode: 967/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:52 - r - INFO: - Episode: 968/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:52 - r - INFO: - Episode: 969/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:53 - r - INFO: - Episode: 970/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:53 - r - INFO: - Current episode 970 has the best eval reward: 200.00 -2022-10-31 23:27:54 - r - INFO: - Episode: 971/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:54 - r - INFO: - Episode: 972/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:55 - r - INFO: - Episode: 973/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:55 - r - INFO: - Episode: 974/1000, Reward: 161.0, Step: 161 -2022-10-31 23:27:55 - r - INFO: - Episode: 975/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:57 - r - INFO: - Episode: 976/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:57 - r - INFO: - Episode: 977/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:57 - r - INFO: - Episode: 978/1000, Reward: 200.0, Step: 200 -2022-10-31 23:27:58 - r - INFO: - Episode: 979/1000, Reward: 200.0, Step: 
200 -2022-10-31 23:27:58 - r - INFO: - Episode: 980/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:00 - r - INFO: - Episode: 981/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:00 - r - INFO: - Episode: 982/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:01 - r - INFO: - Episode: 983/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:01 - r - INFO: - Episode: 984/1000, Reward: 111.0, Step: 111 -2022-10-31 23:28:01 - r - INFO: - Episode: 985/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:03 - r - INFO: - Episode: 986/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:03 - r - INFO: - Episode: 987/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:04 - r - INFO: - Episode: 988/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:04 - r - INFO: - Episode: 989/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:04 - r - INFO: - Episode: 990/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:05 - r - INFO: - Current episode 990 has the best eval reward: 200.00 -2022-10-31 23:28:06 - r - INFO: - Episode: 991/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:06 - r - INFO: - Episode: 992/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:07 - r - INFO: - Episode: 993/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:07 - r - INFO: - Episode: 994/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:07 - r - INFO: - Episode: 995/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:09 - r - INFO: - Episode: 996/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:09 - r - INFO: - Episode: 997/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:09 - r - INFO: - Episode: 998/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:10 - r - INFO: - Episode: 999/1000, Reward: 154.0, Step: 154 -2022-10-31 23:28:10 - r - INFO: - Episode: 1000/1000, Reward: 200.0, Step: 200 -2022-10-31 23:28:11 - r - INFO: - Finish training! 
diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/models/actor_checkpoint.pt b/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/models/actor_checkpoint.pt deleted file mode 100644 index 05bd7b6..0000000 Binary files a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/models/actor_checkpoint.pt and /dev/null differ diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/models/critic_checkpoint.pt b/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/models/critic_checkpoint.pt deleted file mode 100644 index 720f388..0000000 Binary files a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/models/critic_checkpoint.pt and /dev/null differ diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/results/learning_curve.png b/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/results/learning_curve.png deleted file mode 100644 index 841a786..0000000 Binary files a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/results/res.csv b/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/results/res.csv deleted file mode 100644 index ee82c68..0000000 --- a/projects/codes/A2C/Train_CartPole-v1_A2C_20221031-232138/results/res.csv +++ /dev/null @@ -1,1001 +0,0 @@ -episodes,rewards,steps -0,25.0,25 -1,11.0,11 -2,32.0,32 -3,11.0,11 -4,14.0,14 -5,11.0,11 -6,23.0,23 -7,27.0,27 -8,10.0,10 -9,21.0,21 -10,15.0,15 -11,26.0,26 -12,22.0,22 -13,14.0,14 -14,14.0,14 -15,21.0,21 -16,10.0,10 -17,19.0,19 -18,18.0,18 -19,26.0,26 -20,29.0,29 -21,40.0,40 -22,35.0,35 -23,33.0,33 -24,47.0,47 -25,50.0,50 -26,21.0,21 -27,30.0,30 -28,26.0,26 -29,40.0,40 -30,31.0,31 -31,54.0,54 -32,59.0,59 -33,50.0,50 -34,26.0,26 -35,34.0,34 -36,25.0,25 -37,166.0,166 -38,35.0,35 -39,25.0,25 -40,110.0,110 -41,22.0,22 -42,57.0,57 -43,45.0,45 -44,35.0,35 -45,45.0,45 -46,51.0,51 -47,32.0,32 -48,67.0,67 -49,46.0,46 
-50,61.0,61 -51,49.0,49 -52,47.0,47 -53,37.0,37 -54,32.0,32 -55,31.0,31 -56,33.0,33 -57,93.0,93 -58,60.0,60 -59,128.0,128 -60,200.0,200 -61,47.0,47 -62,47.0,47 -63,63.0,63 -64,68.0,68 -65,45.0,45 -66,101.0,101 -67,47.0,47 -68,49.0,49 -69,54.0,54 -70,42.0,42 -71,77.0,77 -72,67.0,67 -73,41.0,41 -74,89.0,89 -75,51.0,51 -76,54.0,54 -77,37.0,37 -78,49.0,49 -79,46.0,46 -80,31.0,31 -81,43.0,43 -82,60.0,60 -83,41.0,41 -84,40.0,40 -85,28.0,28 -86,50.0,50 -87,159.0,159 -88,30.0,30 -89,34.0,34 -90,70.0,70 -91,22.0,22 -92,39.0,39 -93,50.0,50 -94,40.0,40 -95,37.0,37 -96,121.0,121 -97,26.0,26 -98,40.0,40 -99,30.0,30 -100,35.0,35 -101,40.0,40 -102,28.0,28 -103,29.0,29 -104,42.0,42 -105,54.0,54 -106,25.0,25 -107,47.0,47 -108,32.0,32 -109,50.0,50 -110,30.0,30 -111,58.0,58 -112,32.0,32 -113,43.0,43 -114,57.0,57 -115,20.0,20 -116,48.0,48 -117,45.0,45 -118,47.0,47 -119,69.0,69 -120,34.0,34 -121,22.0,22 -122,22.0,22 -123,38.0,38 -124,36.0,36 -125,41.0,41 -126,28.0,28 -127,35.0,35 -128,48.0,48 -129,51.0,51 -130,51.0,51 -131,36.0,36 -132,45.0,45 -133,27.0,27 -134,40.0,40 -135,43.0,43 -136,64.0,64 -137,43.0,43 -138,37.0,37 -139,38.0,38 -140,69.0,69 -141,36.0,36 -142,28.0,28 -143,58.0,58 -144,43.0,43 -145,50.0,50 -146,30.0,30 -147,42.0,42 -148,42.0,42 -149,35.0,35 -150,67.0,67 -151,45.0,45 -152,28.0,28 -153,59.0,59 -154,64.0,64 -155,67.0,67 -156,41.0,41 -157,81.0,81 -158,76.0,76 -159,91.0,91 -160,119.0,119 -161,47.0,47 -162,64.0,64 -163,178.0,178 -164,97.0,97 -165,181.0,181 -166,166.0,166 -167,79.0,79 -168,141.0,141 -169,119.0,119 -170,81.0,81 -171,124.0,124 -172,150.0,150 -173,98.0,98 -174,164.0,164 -175,200.0,200 -176,115.0,115 -177,116.0,116 -178,160.0,160 -179,103.0,103 -180,181.0,181 -181,185.0,185 -182,93.0,93 -183,110.0,110 -184,200.0,200 -185,141.0,141 -186,150.0,150 -187,121.0,121 -188,110.0,110 -189,115.0,115 -190,114.0,114 -191,45.0,45 -192,125.0,125 -193,142.0,142 -194,54.0,54 -195,62.0,62 -196,122.0,122 -197,58.0,58 -198,88.0,88 -199,141.0,141 -200,113.0,113 -201,200.0,200 
-202,136.0,136 -203,114.0,114 -204,102.0,102 -205,176.0,176 -206,150.0,150 -207,105.0,105 -208,200.0,200 -209,200.0,200 -210,167.0,167 -211,104.0,104 -212,124.0,124 -213,96.0,96 -214,200.0,200 -215,199.0,199 -216,200.0,200 -217,132.0,132 -218,188.0,188 -219,132.0,132 -220,151.0,151 -221,125.0,125 -222,42.0,42 -223,200.0,200 -224,159.0,159 -225,171.0,171 -226,122.0,122 -227,189.0,189 -228,129.0,129 -229,106.0,106 -230,107.0,107 -231,200.0,200 -232,200.0,200 -233,200.0,200 -234,200.0,200 -235,158.0,158 -236,200.0,200 -237,192.0,192 -238,179.0,179 -239,102.0,102 -240,125.0,125 -241,138.0,138 -242,189.0,189 -243,41.0,41 -244,97.0,97 -245,49.0,49 -246,86.0,86 -247,121.0,121 -248,117.0,117 -249,43.0,43 -250,72.0,72 -251,34.0,34 -252,83.0,83 -253,83.0,83 -254,38.0,38 -255,34.0,34 -256,99.0,99 -257,45.0,45 -258,47.0,47 -259,44.0,44 -260,26.0,26 -261,37.0,37 -262,26.0,26 -263,43.0,43 -264,27.0,27 -265,24.0,24 -266,42.0,42 -267,86.0,86 -268,23.0,23 -269,32.0,32 -270,57.0,57 -271,25.0,25 -272,98.0,98 -273,29.0,29 -274,25.0,25 -275,29.0,29 -276,39.0,39 -277,20.0,20 -278,92.0,92 -279,28.0,28 -280,78.0,78 -281,25.0,25 -282,31.0,31 -283,88.0,88 -284,85.0,85 -285,37.0,37 -286,26.0,26 -287,19.0,19 -288,40.0,40 -289,27.0,27 -290,17.0,17 -291,27.0,27 -292,26.0,26 -293,82.0,82 -294,36.0,36 -295,24.0,24 -296,30.0,30 -297,20.0,20 -298,34.0,34 -299,30.0,30 -300,23.0,23 -301,36.0,36 -302,29.0,29 -303,34.0,34 -304,25.0,25 -305,42.0,42 -306,88.0,88 -307,26.0,26 -308,85.0,85 -309,89.0,89 -310,48.0,48 -311,83.0,83 -312,109.0,109 -313,42.0,42 -314,93.0,93 -315,85.0,85 -316,100.0,100 -317,106.0,106 -318,28.0,28 -319,108.0,108 -320,112.0,112 -321,88.0,88 -322,108.0,108 -323,108.0,108 -324,90.0,90 -325,112.0,112 -326,113.0,113 -327,94.0,94 -328,99.0,99 -329,45.0,45 -330,121.0,121 -331,102.0,102 -332,111.0,111 -333,54.0,54 -334,198.0,198 -335,83.0,83 -336,107.0,107 -337,101.0,101 -338,129.0,129 -339,88.0,88 -340,86.0,86 -341,199.0,199 -342,95.0,95 -343,103.0,103 -344,100.0,100 -345,89.0,89 
-346,87.0,87 -347,110.0,110 -348,127.0,127 -349,97.0,97 -350,34.0,34 -351,123.0,123 -352,49.0,49 -353,96.0,96 -354,90.0,90 -355,110.0,110 -356,93.0,93 -357,102.0,102 -358,128.0,128 -359,125.0,125 -360,92.0,92 -361,109.0,109 -362,114.0,114 -363,111.0,111 -364,38.0,38 -365,55.0,55 -366,106.0,106 -367,115.0,115 -368,103.0,103 -369,50.0,50 -370,110.0,110 -371,102.0,102 -372,110.0,110 -373,29.0,29 -374,35.0,35 -375,42.0,42 -376,62.0,62 -377,119.0,119 -378,33.0,33 -379,31.0,31 -380,97.0,97 -381,192.0,192 -382,179.0,179 -383,89.0,89 -384,32.0,32 -385,33.0,33 -386,52.0,52 -387,31.0,31 -388,22.0,22 -389,118.0,118 -390,24.0,24 -391,115.0,115 -392,20.0,20 -393,33.0,33 -394,40.0,40 -395,27.0,27 -396,26.0,26 -397,24.0,24 -398,19.0,19 -399,22.0,22 -400,24.0,24 -401,18.0,18 -402,23.0,23 -403,27.0,27 -404,20.0,20 -405,27.0,27 -406,17.0,17 -407,27.0,27 -408,25.0,25 -409,25.0,25 -410,24.0,24 -411,24.0,24 -412,18.0,18 -413,20.0,20 -414,27.0,27 -415,28.0,28 -416,30.0,30 -417,28.0,28 -418,33.0,33 -419,24.0,24 -420,96.0,96 -421,26.0,26 -422,29.0,29 -423,25.0,25 -424,38.0,38 -425,33.0,33 -426,23.0,23 -427,39.0,39 -428,28.0,28 -429,97.0,97 -430,30.0,30 -431,29.0,29 -432,103.0,103 -433,36.0,36 -434,32.0,32 -435,41.0,41 -436,111.0,111 -437,48.0,48 -438,24.0,24 -439,49.0,49 -440,116.0,116 -441,118.0,118 -442,94.0,94 -443,132.0,132 -444,41.0,41 -445,105.0,105 -446,116.0,116 -447,136.0,136 -448,137.0,137 -449,45.0,45 -450,157.0,157 -451,116.0,116 -452,125.0,125 -453,120.0,120 -454,150.0,150 -455,114.0,114 -456,44.0,44 -457,138.0,138 -458,133.0,133 -459,141.0,141 -460,124.0,124 -461,143.0,143 -462,123.0,123 -463,134.0,134 -464,152.0,152 -465,140.0,140 -466,200.0,200 -467,168.0,168 -468,200.0,200 -469,200.0,200 -470,200.0,200 -471,200.0,200 -472,200.0,200 -473,200.0,200 -474,200.0,200 -475,200.0,200 -476,200.0,200 -477,200.0,200 -478,200.0,200 -479,200.0,200 -480,200.0,200 -481,200.0,200 -482,200.0,200 -483,200.0,200 -484,200.0,200 -485,200.0,200 -486,200.0,200 -487,200.0,200 -488,200.0,200 
-489,200.0,200 -490,200.0,200 -491,200.0,200 -492,200.0,200 -493,200.0,200 -494,200.0,200 -495,169.0,169 -496,200.0,200 -497,200.0,200 -498,200.0,200 -499,200.0,200 -500,200.0,200 -501,200.0,200 -502,200.0,200 -503,200.0,200 -504,200.0,200 -505,200.0,200 -506,200.0,200 -507,200.0,200 -508,200.0,200 -509,200.0,200 -510,200.0,200 -511,200.0,200 -512,200.0,200 -513,200.0,200 -514,200.0,200 -515,200.0,200 -516,200.0,200 -517,200.0,200 -518,200.0,200 -519,200.0,200 -520,200.0,200 -521,200.0,200 -522,200.0,200 -523,200.0,200 -524,200.0,200 -525,200.0,200 -526,200.0,200 -527,200.0,200 -528,186.0,186 -529,200.0,200 -530,200.0,200 -531,200.0,200 -532,200.0,200 -533,200.0,200 -534,200.0,200 -535,200.0,200 -536,200.0,200 -537,200.0,200 -538,200.0,200 -539,200.0,200 -540,200.0,200 -541,200.0,200 -542,200.0,200 -543,84.0,84 -544,200.0,200 -545,200.0,200 -546,200.0,200 -547,200.0,200 -548,200.0,200 -549,200.0,200 -550,200.0,200 -551,200.0,200 -552,200.0,200 -553,200.0,200 -554,200.0,200 -555,200.0,200 -556,200.0,200 -557,200.0,200 -558,200.0,200 -559,200.0,200 -560,200.0,200 -561,200.0,200 -562,200.0,200 -563,200.0,200 -564,200.0,200 -565,200.0,200 -566,200.0,200 -567,200.0,200 -568,200.0,200 -569,200.0,200 -570,200.0,200 -571,200.0,200 -572,200.0,200 -573,200.0,200 -574,200.0,200 -575,200.0,200 -576,200.0,200 -577,200.0,200 -578,200.0,200 -579,200.0,200 -580,200.0,200 -581,200.0,200 -582,200.0,200 -583,200.0,200 -584,199.0,199 -585,200.0,200 -586,200.0,200 -587,178.0,178 -588,200.0,200 -589,188.0,188 -590,156.0,156 -591,165.0,165 -592,131.0,131 -593,157.0,157 -594,170.0,170 -595,123.0,123 -596,109.0,109 -597,124.0,124 -598,113.0,113 -599,38.0,38 -600,107.0,107 -601,115.0,115 -602,101.0,101 -603,113.0,113 -604,100.0,100 -605,109.0,109 -606,119.0,119 -607,117.0,117 -608,108.0,108 -609,101.0,101 -610,110.0,110 -611,59.0,59 -612,112.0,112 -613,104.0,104 -614,45.0,45 -615,29.0,29 -616,42.0,42 -617,74.0,74 -618,79.0,79 -619,50.0,50 -620,30.0,30 -621,43.0,43 -622,77.0,77 -623,36.0,36 
-624,61.0,61 -625,36.0,36 -626,30.0,30 -627,43.0,43 -628,27.0,27 -629,88.0,88 -630,42.0,42 -631,40.0,40 -632,59.0,59 -633,81.0,81 -634,85.0,85 -635,55.0,55 -636,40.0,40 -637,99.0,99 -638,104.0,104 -639,117.0,117 -640,112.0,112 -641,43.0,43 -642,96.0,96 -643,105.0,105 -644,115.0,115 -645,99.0,99 -646,123.0,123 -647,123.0,123 -648,40.0,40 -649,100.0,100 -650,124.0,124 -651,106.0,106 -652,122.0,122 -653,127.0,127 -654,121.0,121 -655,121.0,121 -656,125.0,125 -657,127.0,127 -658,132.0,132 -659,142.0,142 -660,134.0,134 -661,147.0,147 -662,175.0,175 -663,180.0,180 -664,183.0,183 -665,167.0,167 -666,179.0,179 -667,173.0,173 -668,200.0,200 -669,200.0,200 -670,184.0,184 -671,200.0,200 -672,193.0,193 -673,200.0,200 -674,200.0,200 -675,200.0,200 -676,199.0,199 -677,200.0,200 -678,200.0,200 -679,200.0,200 -680,200.0,200 -681,200.0,200 -682,200.0,200 -683,200.0,200 -684,200.0,200 -685,200.0,200 -686,200.0,200 -687,200.0,200 -688,200.0,200 -689,200.0,200 -690,200.0,200 -691,200.0,200 -692,200.0,200 -693,200.0,200 -694,200.0,200 -695,200.0,200 -696,200.0,200 -697,200.0,200 -698,200.0,200 -699,200.0,200 -700,200.0,200 -701,200.0,200 -702,200.0,200 -703,200.0,200 -704,200.0,200 -705,200.0,200 -706,200.0,200 -707,200.0,200 -708,200.0,200 -709,200.0,200 -710,200.0,200 -711,200.0,200 -712,200.0,200 -713,200.0,200 -714,200.0,200 -715,200.0,200 -716,200.0,200 -717,200.0,200 -718,200.0,200 -719,200.0,200 -720,200.0,200 -721,200.0,200 -722,200.0,200 -723,200.0,200 -724,200.0,200 -725,200.0,200 -726,200.0,200 -727,200.0,200 -728,200.0,200 -729,200.0,200 -730,200.0,200 -731,200.0,200 -732,200.0,200 -733,200.0,200 -734,200.0,200 -735,200.0,200 -736,200.0,200 -737,200.0,200 -738,200.0,200 -739,200.0,200 -740,200.0,200 -741,200.0,200 -742,200.0,200 -743,200.0,200 -744,200.0,200 -745,200.0,200 -746,200.0,200 -747,200.0,200 -748,200.0,200 -749,200.0,200 -750,200.0,200 -751,200.0,200 -752,200.0,200 -753,200.0,200 -754,200.0,200 -755,200.0,200 -756,200.0,200 -757,200.0,200 -758,200.0,200 
-759,200.0,200 -760,200.0,200 -761,200.0,200 -762,200.0,200 -763,200.0,200 -764,200.0,200 -765,200.0,200 -766,200.0,200 -767,200.0,200 -768,200.0,200 -769,200.0,200 -770,200.0,200 -771,200.0,200 -772,200.0,200 -773,200.0,200 -774,200.0,200 -775,200.0,200 -776,200.0,200 -777,200.0,200 -778,200.0,200 -779,200.0,200 -780,200.0,200 -781,200.0,200 -782,200.0,200 -783,200.0,200 -784,200.0,200 -785,200.0,200 -786,200.0,200 -787,200.0,200 -788,200.0,200 -789,200.0,200 -790,200.0,200 -791,200.0,200 -792,200.0,200 -793,200.0,200 -794,200.0,200 -795,200.0,200 -796,200.0,200 -797,200.0,200 -798,200.0,200 -799,200.0,200 -800,200.0,200 -801,200.0,200 -802,200.0,200 -803,200.0,200 -804,200.0,200 -805,200.0,200 -806,200.0,200 -807,200.0,200 -808,200.0,200 -809,200.0,200 -810,200.0,200 -811,200.0,200 -812,200.0,200 -813,200.0,200 -814,200.0,200 -815,200.0,200 -816,200.0,200 -817,200.0,200 -818,200.0,200 -819,200.0,200 -820,200.0,200 -821,200.0,200 -822,200.0,200 -823,200.0,200 -824,200.0,200 -825,200.0,200 -826,200.0,200 -827,200.0,200 -828,200.0,200 -829,200.0,200 -830,200.0,200 -831,200.0,200 -832,200.0,200 -833,200.0,200 -834,200.0,200 -835,200.0,200 -836,200.0,200 -837,200.0,200 -838,200.0,200 -839,200.0,200 -840,200.0,200 -841,200.0,200 -842,200.0,200 -843,200.0,200 -844,200.0,200 -845,200.0,200 -846,200.0,200 -847,200.0,200 -848,200.0,200 -849,200.0,200 -850,200.0,200 -851,200.0,200 -852,200.0,200 -853,200.0,200 -854,200.0,200 -855,200.0,200 -856,200.0,200 -857,200.0,200 -858,200.0,200 -859,200.0,200 -860,200.0,200 -861,200.0,200 -862,200.0,200 -863,200.0,200 -864,200.0,200 -865,200.0,200 -866,200.0,200 -867,200.0,200 -868,200.0,200 -869,200.0,200 -870,200.0,200 -871,200.0,200 -872,200.0,200 -873,200.0,200 -874,200.0,200 -875,200.0,200 -876,200.0,200 -877,200.0,200 -878,200.0,200 -879,200.0,200 -880,200.0,200 -881,200.0,200 -882,200.0,200 -883,200.0,200 -884,200.0,200 -885,200.0,200 -886,200.0,200 -887,200.0,200 -888,200.0,200 -889,200.0,200 -890,200.0,200 -891,200.0,200 
-892,200.0,200
-893,200.0,200
-894,200.0,200
-895,200.0,200
-896,200.0,200
-897,200.0,200
-898,200.0,200
-899,200.0,200
-900,200.0,200
-901,200.0,200
-902,200.0,200
-903,200.0,200
-904,200.0,200
-905,200.0,200
-906,200.0,200
-907,200.0,200
-908,200.0,200
-909,200.0,200
-910,200.0,200
-911,200.0,200
-912,200.0,200
-913,200.0,200
-914,200.0,200
-915,200.0,200
-916,200.0,200
-917,200.0,200
-918,200.0,200
-919,200.0,200
-920,200.0,200
-921,200.0,200
-922,200.0,200
-923,200.0,200
-924,200.0,200
-925,200.0,200
-926,200.0,200
-927,200.0,200
-928,200.0,200
-929,200.0,200
-930,200.0,200
-931,200.0,200
-932,200.0,200
-933,200.0,200
-934,200.0,200
-935,200.0,200
-936,200.0,200
-937,200.0,200
-938,200.0,200
-939,200.0,200
-940,200.0,200
-941,200.0,200
-942,200.0,200
-943,153.0,153
-944,200.0,200
-945,200.0,200
-946,200.0,200
-947,200.0,200
-948,150.0,150
-949,200.0,200
-950,200.0,200
-951,200.0,200
-952,200.0,200
-953,200.0,200
-954,200.0,200
-955,200.0,200
-956,200.0,200
-957,200.0,200
-958,200.0,200
-959,200.0,200
-960,200.0,200
-961,200.0,200
-962,200.0,200
-963,200.0,200
-964,200.0,200
-965,200.0,200
-966,200.0,200
-967,200.0,200
-968,200.0,200
-969,200.0,200
-970,200.0,200
-971,200.0,200
-972,200.0,200
-973,161.0,161
-974,200.0,200
-975,200.0,200
-976,200.0,200
-977,200.0,200
-978,200.0,200
-979,200.0,200
-980,200.0,200
-981,200.0,200
-982,200.0,200
-983,111.0,111
-984,200.0,200
-985,200.0,200
-986,200.0,200
-987,200.0,200
-988,200.0,200
-989,200.0,200
-990,200.0,200
-991,200.0,200
-992,200.0,200
-993,200.0,200
-994,200.0,200
-995,200.0,200
-996,200.0,200
-997,200.0,200
-998,154.0,154
-999,200.0,200
diff --git a/projects/codes/A2C/a2c.py b/projects/codes/A2C/a2c.py
deleted file mode 100644
index f822451..0000000
--- a/projects/codes/A2C/a2c.py
+++ /dev/null
@@ -1,103 +0,0 @@
-#!/usr/bin/env python
-# coding=utf-8
-'''
-Author: JiangJi
-Email: johnjim0816@gmail.com
-Date: 2022-08-16 23:05:25
-LastEditor: JiangJi
-LastEditTime: 2022-11-01 00:33:49
-Discription: 
-'''
-import torch
-import numpy as np
-from torch.distributions import Categorical,Normal
-
-
-class A2C:
-    def __init__(self,models,memories,cfg):
-        self.n_actions = cfg.n_actions
-        self.gamma = cfg.gamma
-        self.device = torch.device(cfg.device)
-        self.continuous = cfg.continuous
-        if hasattr(cfg,'action_bound'):
-            self.action_bound = cfg.action_bound
-        self.memory = memories['ACMemory']
-        self.actor = models['Actor'].to(self.device)
-        self.critic = models['Critic'].to(self.device)
-        self.actor_optim = torch.optim.Adam(self.actor.parameters(), lr=cfg.actor_lr)
-        self.critic_optim = torch.optim.Adam(self.critic.parameters(), lr=cfg.critic_lr)
-    def sample_action(self,state):
-        # state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)
-        # dist = self.actor(state)
-        # self.entropy = - np.sum(np.mean(dist.detach().cpu().numpy()) * np.log(dist.detach().cpu().numpy()))
-        # value = self.critic(state) # note that 'dist' need require_grad=True
-        # self.value = value.detach().cpu().numpy().squeeze(0)[0]
-        # action = np.random.choice(self.n_actions, p=dist.detach().cpu().numpy().squeeze(0)) # shape(p=(n_actions,1)
-        # self.log_prob = torch.log(dist.squeeze(0)[action])
-        if self.continuous:
-            state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)
-            mu, sigma = self.actor(state)
-            dist = Normal(self.action_bound * mu.view(1,), sigma.view(1,))
-            action = dist.sample()
-            value = self.critic(state)
-            # self.entropy = - np.sum(np.mean(dist.detach().cpu().numpy()) * np.log(dist.detach().cpu().numpy()))
-            self.value = value.detach().cpu().numpy().squeeze(0)[0] # detach() to avoid gradient
-            self.log_prob = dist.log_prob(action).squeeze(dim=0) # Tensor([0.])
-            self.entropy = dist.entropy().cpu().detach().numpy().squeeze(0) # detach() to avoid gradient
-            return action.cpu().detach().numpy()
-        else:
-            state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)
-            probs = self.actor(state)
-            dist = Categorical(probs)
-            action = dist.sample() # Tensor([0])
-            value = self.critic(state)
-            self.value = value.detach().cpu().numpy().squeeze(0)[0] # detach() to avoid gradient
-            self.log_prob = dist.log_prob(action).squeeze(dim=0) # Tensor([0.])
-            self.entropy = dist.entropy().cpu().detach().numpy().squeeze(0) # detach() to avoid gradient
-            return action.cpu().numpy().item()
-    @torch.no_grad()
-    def predict_action(self,state):
-        if self.continuous:
-            state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)
-            mu, sigma = self.actor(state)
-            dist = Normal(self.action_bound * mu.view(1,), sigma.view(1,))
-            action = dist.sample()
-            return action.cpu().detach().numpy()
-        else:
-            state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)
-            dist = self.actor(state)
-            # value = self.critic(state) # note that 'dist' need require_grad=True
-            # value = value.detach().cpu().numpy().squeeze(0)[0]
-            action = np.random.choice(self.n_actions, p=dist.detach().cpu().numpy().squeeze(0)) # shape(p=(n_actions,1)
-            return action
-    def update(self,next_state,entropy):
-        value_pool,log_prob_pool,reward_pool = self.memory.sample()
-        value_pool = torch.tensor(value_pool, device=self.device)
-        log_prob_pool = torch.stack(log_prob_pool)
-        next_state = torch.tensor(next_state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)
-        next_value = self.critic(next_state)
-        returns = np.zeros_like(reward_pool)
-        for t in reversed(range(len(reward_pool))):
-            next_value = reward_pool[t] + self.gamma * next_value # G(s_{t},a{t}) = r_{t+1} + gamma * V(s_{t+1})
-            returns[t] = next_value
-        returns = torch.tensor(returns, device=self.device)
-        advantages = returns - value_pool
-        actor_loss = (-log_prob_pool * advantages).mean()
-        critic_loss = 0.5 * advantages.pow(2).mean()
-        tot_loss = actor_loss + critic_loss + 0.001 * entropy
-        self.actor_optim.zero_grad()
-        self.critic_optim.zero_grad()
-        tot_loss.backward()
-        self.actor_optim.step()
-
self.critic_optim.step() - self.memory.clear() - def save_model(self, path): - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save(self.actor.state_dict(), f"{path}/actor_checkpoint.pt") - torch.save(self.critic.state_dict(), f"{path}/critic_checkpoint.pt") - - def load_model(self, path): - self.actor.load_state_dict(torch.load(f"{path}/actor_checkpoint.pt")) - self.critic.load_state_dict(torch.load(f"{path}/critic_checkpoint.pt")) \ No newline at end of file diff --git a/projects/codes/A2C/a2c_2.py b/projects/codes/A2C/a2c_2.py deleted file mode 100644 index e29acdc..0000000 --- a/projects/codes/A2C/a2c_2.py +++ /dev/null @@ -1,65 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-09-19 14:48:16 -LastEditor: JiangJi -LastEditTime: 2022-10-30 01:21:50 -Discription: #TODO,待更新模版 -''' -import torch -import numpy as np - -class A2C_2: - def __init__(self,models,memories,cfg): - self.n_actions = cfg.n_actions - self.gamma = cfg.gamma - self.device = torch.device(cfg.device) - self.memory = memories['ACMemory'] - self.ac_net = models['ActorCritic'].to(self.device) - self.ac_optimizer = torch.optim.Adam(self.ac_net.parameters(), lr = cfg.lr) - def sample_action(self,state): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - value, dist = self.ac_net(state) # note that 'dist' need require_grad=True - value = value.detach().numpy().squeeze(0)[0] - action = np.random.choice(self.n_actions, p=dist.detach().numpy().squeeze(0)) # shape(p=(n_actions,1) - return action,value,dist - def predict_action(self,state): - ''' predict can be all wrapped with no_grad(), then donot need detach(), or you can just copy contents of 'sample_action' - ''' - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - value, dist = self.ac_net(state) - value = value.numpy().squeeze(0)[0] # 
shape(value) = (1,) - action = np.random.choice(self.n_actions, p=dist.numpy().squeeze(0)) # shape(p=(n_actions,1) - return action,value,dist - def update(self,next_state,entropy): - value_pool,log_prob_pool,reward_pool = self.memory.sample() - next_state = torch.tensor(next_state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - next_value,_ = self.ac_net(next_state) - returns = np.zeros_like(reward_pool) - for t in reversed(range(len(reward_pool))): - next_value = reward_pool[t] + self.gamma * next_value # G(s_{t},a{t}) = r_{t+1} + gamma * V(s_{t+1}) - returns[t] = next_value - returns = torch.tensor(returns, device=self.device) - value_pool = torch.tensor(value_pool, device=self.device) - advantages = returns - value_pool - log_prob_pool = torch.stack(log_prob_pool) - actor_loss = (-log_prob_pool * advantages).mean() - critic_loss = 0.5 * advantages.pow(2).mean() - ac_loss = actor_loss + critic_loss + 0.001 * entropy - self.ac_optimizer.zero_grad() - ac_loss.backward() - self.ac_optimizer.step() - self.memory.clear() - def save_model(self, path): - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save(self.ac_net.state_dict(), f"{path}/a2c_checkpoint.pt") - - def load_model(self, path): - self.ac_net.load_state_dict(torch.load(f"{path}/a2c_checkpoint.pt")) - - \ No newline at end of file diff --git a/projects/codes/A2C/config/CartPole-v1_A2C_Test.yaml b/projects/codes/A2C/config/CartPole-v1_A2C_Test.yaml deleted file mode 100644 index d148bb0..0000000 --- a/projects/codes/A2C/config/CartPole-v1_A2C_Test.yaml +++ /dev/null @@ -1,21 +0,0 @@ -general_cfg: - algo_name: A2C - device: cuda - env_name: CartPole-v1 - mode: test - load_checkpoint: true - load_path: Train_CartPole-v1_A2C_20221031-232138 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 1000 -algo_cfg: - continuous: false - batch_size: 64 - buffer_size: 100000 - gamma: 0.99 - actor_lr: 0.0003 - critic_lr: 
0.001 - target_update: 4 diff --git a/projects/codes/A2C/config/CartPole-v1_A2C_Train.yaml b/projects/codes/A2C/config/CartPole-v1_A2C_Train.yaml deleted file mode 100644 index f79f148..0000000 --- a/projects/codes/A2C/config/CartPole-v1_A2C_Train.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: A2C - device: cuda - env_name: CartPole-v1 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 600 -algo_cfg: - continuous: false - batch_size: 64 - buffer_size: 100000 - gamma: 0.0003 - lr: 0.001 diff --git a/projects/codes/A2C/config/Pendulum-v1_A2C_Train.yaml b/projects/codes/A2C/config/Pendulum-v1_A2C_Train.yaml deleted file mode 100644 index a1680c9..0000000 --- a/projects/codes/A2C/config/Pendulum-v1_A2C_Train.yaml +++ /dev/null @@ -1,21 +0,0 @@ -general_cfg: - algo_name: A2C - device: cuda - env_name: Pendulum-v1 - mode: train - eval_per_episode: 200 - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 1000 -algo_cfg: - continuous: true - batch_size: 64 - buffer_size: 100000 - gamma: 0.0003 - actor_lr: 0.0003 - critic_lr: 0.001 diff --git a/projects/codes/A2C/config/config.py b/projects/codes/A2C/config/config.py deleted file mode 100644 index a552d38..0000000 --- a/projects/codes/A2C/config/config.py +++ /dev/null @@ -1,38 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-30 00:53:03 -LastEditor: JiangJi -LastEditTime: 2022-11-01 00:17:55 -Discription: default parameters of A2C -''' -from common.config import GeneralConfig,AlgoConfig - -class GeneralConfigA2C(GeneralConfig): - def __init__(self) -> None: - self.env_name = "CartPole-v1" # name of environment - self.algo_name = "A2C" # name of algorithm - self.mode = "train" # train or test - self.seed = 
1 # random seed - self.device = "cuda" # device to use - self.train_eps = 1000 # number of episodes for training - self.test_eps = 20 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = "tasks" # path to load model - self.show_fig = False # show figure or not - self.save_fig = True # save figure or not - -class AlgoConfigA2C(AlgoConfig): - def __init__(self) -> None: - self.continuous = False # continuous or discrete action space - self.hidden_dim = 256 # hidden_dim for MLP - self.gamma = 0.99 # discount factor - self.actor_lr = 3e-4 # learning rate of actor - self.critic_lr = 1e-3 # learning rate of critic - self.actor_hidden_dim = 256 # hidden_dim for actor MLP - self.critic_hidden_dim = 256 # hidden_dim for critic MLP - self.buffer_size = 100000 # size of replay buffer - self.batch_size = 64 # batch size \ No newline at end of file diff --git a/projects/codes/A2C/main2.py b/projects/codes/A2C/main2.py deleted file mode 100644 index 60bd7c2..0000000 --- a/projects/codes/A2C/main2.py +++ /dev/null @@ -1,130 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-09-19 14:48:16 -LastEditor: JiangJi -LastEditTime: 2022-10-30 01:21:15 -Discription: #TODO,待更新模版 -''' -import sys,os -os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized." 
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path
-parent_path = os.path.dirname(curr_path) # parent path
-sys.path.append(parent_path) # add path to system path
-
-import datetime
-import argparse
-import gym
-import torch
-import numpy as np
-from common.utils import all_seed
-from common.launcher import Launcher
-from common.memories import PGReplay
-from common.models import ActorCriticSoftmax
-from envs.register import register_env
-from a2c_2 import A2C_2
-
-class Main(Launcher):
-    def get_args(self):
-        curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time
-        parser = argparse.ArgumentParser(description="hyperparameters")
-        parser.add_argument('--algo_name',default='A2C',type=str,help="name of algorithm")
-        parser.add_argument('--env_name',default='CartPole-v0',type=str,help="name of environment")
-        parser.add_argument('--train_eps',default=2000,type=int,help="episodes of training")
-        parser.add_argument('--test_eps',default=20,type=int,help="episodes of testing")
-        parser.add_argument('--ep_max_steps',default = 100000,type=int,help="steps per episode, much larger value can simulate infinite steps")
-        parser.add_argument('--gamma',default=0.99,type=float,help="discounted factor")
-        parser.add_argument('--lr',default=3e-4,type=float,help="learning rate")
-        parser.add_argument('--actor_hidden_dim',default=256,type=int)
-        parser.add_argument('--critic_hidden_dim',default=256,type=int)
-        parser.add_argument('--device',default='cpu',type=str,help="cpu or cuda")
-        parser.add_argument('--seed',default=10,type=int,help="seed")
-        parser.add_argument('--show_fig',default=False,type=bool,help="if show figure or not")
-        parser.add_argument('--save_fig',default=True,type=bool,help="if save figure or not")
-        args = parser.parse_args()
-        default_args = {'result_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/results/",
-                        'model_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/models/",
-        }
-        args = {**vars(args),**default_args} # type(dict)
-        return args
-    def env_agent_config(self,cfg):
-        ''' create env and agent
-        '''
-        register_env(cfg['env_name'])
-        env = gym.make(cfg['env_name'])
-        if cfg['seed'] !=0: # set random seed
-            all_seed(env,seed=cfg["seed"])
-        try: # state dimension
-            n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n'))
-        except AttributeError:
-            n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape'))
-        n_actions = env.action_space.n # action dimension
-        print(f"n_states: {n_states}, n_actions: {n_actions}")
-        cfg.update({"n_states":n_states,"n_actions":n_actions}) # update to cfg paramters
-        models = {'ActorCritic':ActorCriticSoftmax(cfg['n_states'],cfg['n_actions'], actor_hidden_dim = cfg['actor_hidden_dim'],critic_hidden_dim=cfg['critic_hidden_dim'])}
-        memories = {'ACMemory':PGReplay()}
-        agent = A2C_2(models,memories,cfg)
-        return env,agent
-    def train(self,cfg,env,agent):
-        print("Start training!")
-        print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}")
-        rewards = [] # record rewards for all episodes
-        steps = [] # record steps for all episodes
-
-        for i_ep in range(cfg['train_eps']):
-            ep_reward = 0 # reward per episode
-            ep_step = 0 # step per episode
-            ep_entropy = 0
-            state = env.reset() # reset and obtain initial state
-
-            for _ in range(cfg['ep_max_steps']):
-                action, value, dist = agent.sample_action(state) # sample action
-                next_state, reward, done, _ = env.step(action) # update env and return transitions
-                log_prob = torch.log(dist.squeeze(0)[action])
-                entropy = -np.sum(np.mean(dist.detach().numpy()) * np.log(dist.detach().numpy()))
-                agent.memory.push((value,log_prob,reward)) # save transitions
-                state = next_state # update state
-                ep_reward += reward
-                ep_entropy += entropy
-                ep_step += 1
-                if done:
-                    break
-            agent.update(next_state,ep_entropy) # update agent
-            rewards.append(ep_reward)
-            steps.append(ep_step)
-            if (i_ep+1)%10==0:
-                print(f'Episode: {i_ep+1}/{cfg["train_eps"]}, Reward: {ep_reward:.2f}, Steps:{ep_step}')
-        print("Finish training!")
-        return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-    def test(self,cfg,env,agent):
-        print("Start testing!")
-        print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}")
-        rewards = [] # record rewards for all episodes
-        steps = [] # record steps for all episodes
-        for i_ep in range(cfg['test_eps']):
-            ep_reward = 0 # reward per episode
-            ep_step = 0
-            state = env.reset() # reset and obtain initial state
-            for _ in range(cfg['ep_max_steps']):
-                action,_,_ = agent.predict_action(state) # predict action
-                next_state, reward, done, _ = env.step(action)
-                state = next_state
-                ep_reward += reward
-                ep_step += 1
-                if done:
-                    break
-            rewards.append(ep_reward)
-            steps.append(ep_step)
-            print(f"Episode: {i_ep+1}/{cfg['test_eps']}, Steps:{ep_step}, Reward: {ep_reward:.2f}")
-        print("Finish testing!")
-        return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-
-if __name__ == "__main__":
-    main = Main()
-    main.run()
-
-
-
-
diff --git a/projects/codes/A2C/task0.py b/projects/codes/A2C/task0.py
deleted file mode 100644
index 4a3208a..0000000
--- a/projects/codes/A2C/task0.py
+++ /dev/null
@@ -1,142 +0,0 @@
-#!/usr/bin/env python
-# coding=utf-8
-'''
-Author: JiangJi
-Email: johnjim0816@gmail.com
-Date: 2022-10-30 01:19:43
-LastEditor: JiangJi
-LastEditTime: 2022-11-01 01:21:06
-Discription:
-'''
-import sys,os
-os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized."
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path
-parent_path = os.path.dirname(curr_path) # parent path
-sys.path.append(parent_path) # add path to system path
-
-import gym
-from common.utils import all_seed,merge_class_attrs
-from common.launcher import Launcher
-from common.memories import PGReplay
-from common.models import ActorSoftmax,Critic
-from envs.register import register_env
-from a2c import A2C
-from config.config import GeneralConfigA2C,AlgoConfigA2C
-
-class Main(Launcher):
-    def __init__(self) -> None:
-        super().__init__()
-        self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigA2C())
-        self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigA2C())
-    def env_agent_config(self,cfg,logger):
-        ''' create env and agent
-        '''
-        register_env(cfg.env_name)
-        env = gym.make(cfg.env_name,new_step_api=True) # create env
-        if cfg.seed !=0: # set random seed
-            all_seed(env,seed = cfg.seed)
-        try: # state dimension
-            n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n'))
-        except AttributeError:
-            n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape'))
-        n_actions = env.action_space.n # action dimension
-        logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info
-        # update to cfg paramters
-        setattr(cfg, 'n_states', n_states)
-        setattr(cfg, 'n_actions', n_actions)
-        models = {'Actor':ActorSoftmax(n_states,n_actions, hidden_dim = cfg.actor_hidden_dim),'Critic':Critic(n_states,1,hidden_dim=cfg.critic_hidden_dim)}
-        memories = {'ACMemory':PGReplay()}
-        agent = A2C(models,memories,cfg)
-        for k,v in models.items():
-            logger.info(f"{k} model name: {type(v).__name__}")
-        for k,v in memories.items():
-            logger.info(f"{k} memory name: {type(v).__name__}")
-        logger.info(f"agent name: {type(agent).__name__}")
-        return env,agent
-    def train_one_episode(self, env, agent, cfg):
-        ep_reward = 0 # reward per episode
-        ep_step = 0 # step per episode
-        ep_entropy = 0 # entropy per episode
-        state = env.reset() # reset and obtain initial state
-        for _ in range(cfg.max_steps):
-            action = agent.sample_action(state) # sample action
-            next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions
-            agent.memory.push((agent.value,agent.log_prob,reward)) # save transitions
-            state = next_state # update state
-            ep_reward += reward
-            ep_entropy += agent.entropy
-            ep_step += 1
-            if terminated:
-                break
-        agent.update(next_state,ep_entropy) # update agent
-        return agent,ep_reward,ep_step
-    def test_one_episode(self, env, agent, cfg):
-        ep_reward = 0 # reward per episode
-        ep_step = 0 # step per episode
-        state = env.reset() # reset and obtain initial state
-        for _ in range(cfg.max_steps):
-            action = agent.predict_action(state) # predict action
-            next_state, reward, terminated, truncated , info = env.step(action)
-            state = next_state
-            ep_reward += reward
-            ep_step += 1
-            if terminated:
-                break
-        return agent,ep_reward,ep_step
-    # def train(self,cfg,env,agent,logger):
-    #     logger.info("Start training!")
-    #     logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}")
-    #     rewards = [] # record rewards for all episodes
-    #     steps = [] # record steps for all episodes
-    #     for i_ep in range(cfg.train_eps):
-    #         ep_reward = 0 # reward per episode
-    #         ep_step = 0 # step per episode
-    #         ep_entropy = 0
-    #         state = env.reset() # reset and obtain initial state
-    #         for _ in range(cfg.max_steps):
-    #             action = agent.sample_action(state) # sample action
-    #             next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions
-    #             agent.memory.push((agent.value,agent.log_prob,reward)) # save transitions
-    #             state = next_state # update state
-    #             ep_reward += reward
-    #             ep_entropy += agent.entropy
-    #             ep_step += 1
-    #             if terminated:
-    #                 break
-    #         agent.update(next_state,ep_entropy) # update agent
-    #         rewards.append(ep_reward)
-    #         steps.append(ep_step)
-    #         logger.info(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step}")
-    #     logger.info("Finish training!")
-    #     return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-    # def test(self,cfg,env,agent,logger):
-    #     logger.info("Start testing!")
-    #     logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}")
-    #     rewards = [] # record rewards for all episodes
-    #     steps = [] # record steps for all episodes
-    #     for i_ep in range(cfg.test_eps):
-    #         ep_reward = 0 # reward per episode
-    #         ep_step = 0
-    #         state = env.reset() # reset and obtain initial state
-    #         for _ in range(cfg.max_steps):
-    #             action = agent.predict_action(state) # predict action
-    #             next_state, reward, terminated, truncated , info = env.step(action)
-    #             state = next_state
-    #             ep_reward += reward
-    #             ep_step += 1
-    #             if terminated:
-    #                 break
-    #         rewards.append(ep_reward)
-    #         steps.append(ep_step)
-    #         logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step}")
-    #     logger.info("Finish testing!")
-    #     env.close()
-    #     return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-
-if __name__ == "__main__":
-    main = Main()
-    main.run()
-
-
-
-
diff --git a/projects/codes/A2C/task1.py b/projects/codes/A2C/task1.py
deleted file mode 100644
index ff7c86f..0000000
--- a/projects/codes/A2C/task1.py
+++ /dev/null
@@ -1,142 +0,0 @@
-#!/usr/bin/env python
-# coding=utf-8
-'''
-Author: JiangJi
-Email: johnjim0816@gmail.com
-Date: 2022-10-30 01:19:43
-LastEditor: JiangJi
-LastEditTime: 2022-11-01 01:21:12
-Discription: continuous action space
-'''
-import sys,os
-os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized."
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path
-parent_path = os.path.dirname(curr_path) # parent path
-sys.path.append(parent_path) # add path to system path
-
-import gym
-from common.utils import all_seed,merge_class_attrs
-from common.launcher import Launcher
-from common.memories import PGReplay
-from common.models import ActorSoftmaxTanh,Critic
-from envs.register import register_env
-from a2c import A2C
-from config.config import GeneralConfigA2C,AlgoConfigA2C
-
-class Main(Launcher):
-    def __init__(self) -> None:
-        super().__init__()
-        self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigA2C())
-        self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigA2C())
-    def env_agent_config(self,cfg,logger):
-        ''' create env and agent
-        '''
-        register_env(cfg.env_name)
-        env = gym.make(cfg.env_name,new_step_api=True) # create env
-        if cfg.seed !=0: # set random seed
-            all_seed(env,seed = cfg.seed)
-        try: # state dimension
-            n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n'))
-        except AttributeError:
-            n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape'))
-        n_actions = env.action_space.n # action dimension
-        logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info
-        # update to cfg paramters
-        setattr(cfg, 'n_states', n_states)
-        setattr(cfg, 'n_actions', n_actions)
-        models = {'Actor':ActorSoftmaxTanh(n_states,n_actions, hidden_dim = cfg.actor_hidden_dim),'Critic':Critic(n_states,1,hidden_dim=cfg.critic_hidden_dim)}
-        memories = {'ACMemory':PGReplay()}
-        agent = A2C(models,memories,cfg)
-        for k,v in models.items():
-            logger.info(f"{k} model name: {type(v).__name__}")
-        for k,v in memories.items():
-            logger.info(f"{k} memory name: {type(v).__name__}")
-        logger.info(f"agent name: {type(agent).__name__}")
-        return env,agent
-    def train_one_episode(self, env, agent, cfg):
-        ep_reward = 0 # reward per episode
-        ep_step = 0 # step per episode
-        ep_entropy = 0 # entropy per episode
-        state = env.reset() # reset and obtain initial state
-        for _ in range(cfg.max_steps):
-            action = agent.sample_action(state) # sample action
-            next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions
-            agent.memory.push((agent.value,agent.log_prob,reward)) # save transitions
-            state = next_state # update state
-            ep_reward += reward
-            ep_entropy += agent.entropy
-            ep_step += 1
-            if terminated:
-                break
-        agent.update(next_state,ep_entropy) # update agent
-        return agent,ep_reward,ep_step
-    def test_one_episode(self, env, agent, cfg):
-        ep_reward = 0 # reward per episode
-        ep_step = 0 # step per episode
-        state = env.reset() # reset and obtain initial state
-        for _ in range(cfg.max_steps):
-            action = agent.predict_action(state) # predict action
-            next_state, reward, terminated, truncated , info = env.step(action)
-            state = next_state
-            ep_reward += reward
-            ep_step += 1
-            if terminated:
-                break
-        return agent,ep_reward,ep_step
-    # def train(self,cfg,env,agent,logger):
-    #     logger.info("Start training!")
-    #     logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}")
-    #     rewards = [] # record rewards for all episodes
-    #     steps = [] # record steps for all episodes
-    #     for i_ep in range(cfg.train_eps):
-    #         ep_reward = 0 # reward per episode
-    #         ep_step = 0 # step per episode
-    #         ep_entropy = 0
-    #         state = env.reset() # reset and obtain initial state
-    #         for _ in range(cfg.max_steps):
-    #             action = agent.sample_action(state) # sample action
-    #             next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions
-    #             agent.memory.push((agent.value,agent.log_prob,reward)) # save transitions
-    #             state = next_state # update state
-    #             ep_reward += reward
-    #             ep_entropy += agent.entropy
-    #             ep_step += 1
-    #             if terminated:
-    #                 break
-    #         agent.update(next_state,ep_entropy) # update agent
-    #         rewards.append(ep_reward)
-    #         steps.append(ep_step)
-    #         logger.info(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step}")
-    #     logger.info("Finish training!")
-    #     return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-    # def test(self,cfg,env,agent,logger):
-    #     logger.info("Start testing!")
-    #     logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}")
-    #     rewards = [] # record rewards for all episodes
-    #     steps = [] # record steps for all episodes
-    #     for i_ep in range(cfg.test_eps):
-    #         ep_reward = 0 # reward per episode
-    #         ep_step = 0
-    #         state = env.reset() # reset and obtain initial state
-    #         for _ in range(cfg.max_steps):
-    #             action = agent.predict_action(state) # predict action
-    #             next_state, reward, terminated, truncated , info = env.step(action)
-    #             state = next_state
-    #             ep_reward += reward
-    #             ep_step += 1
-    #             if terminated:
-    #                 break
-    #         rewards.append(ep_reward)
-    #         steps.append(ep_step)
-    #         logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step}")
-    #     logger.info("Finish testing!")
-    #     env.close()
-    #     return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-
-if __name__ == "__main__":
-    main = Main()
-    main.run()
-
-
-
-
diff --git a/projects/codes/A2C/task2.py b/projects/codes/A2C/task2.py
deleted file mode 100644
index 96c1cc2..0000000
--- a/projects/codes/A2C/task2.py
+++ /dev/null
@@ -1,149 +0,0 @@
-#!/usr/bin/env python
-# coding=utf-8
-'''
-Author: JiangJi
-Email: johnjim0816@gmail.com
-Date: 2022-10-30 01:19:43
-LastEditor: JiangJi
-LastEditTime: 2022-11-01 00:08:22
-Discription: the only difference from task0.py is that the actor here we use ActorSoftmaxTanh instead of ActorSoftmax with ReLU
-'''
-import sys,os
-os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized."
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path
-parent_path = os.path.dirname(curr_path) # parent path
-sys.path.append(parent_path) # add path to system path
-
-import gym
-import torch
-import numpy as np
-from common.utils import all_seed,merge_class_attrs
-from common.launcher import Launcher
-from common.memories import PGReplay
-from common.models import ActorNormal,Critic
-from envs.register import register_env
-from a2c import A2C
-from config.config import GeneralConfigA2C,AlgoConfigA2C
-
-class Main(Launcher):
-    def __init__(self) -> None:
-        super().__init__()
-        self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigA2C())
-        self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigA2C())
-    def env_agent_config(self,cfg,logger):
-        ''' create env and agent
-        '''
-        register_env(cfg.env_name)
-        env = gym.make(cfg.env_name,new_step_api=True) # create env
-        if cfg.seed !=0: # set random seed
-            all_seed(env,seed = cfg.seed)
-        try: # state dimension
-            n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n'))
-        except AttributeError:
-            n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape'))
-        try:
-            n_actions = env.action_space.n # action dimension
-        except AttributeError:
-            n_actions = env.action_space.shape[0]
-            logger.info(f"action bound: {abs(env.action_space.low.item())}")
-            setattr(cfg, 'action_bound', abs(env.action_space.low.item()))
-        logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info
-        # update to cfg paramters
-        setattr(cfg, 'n_states', n_states)
-        setattr(cfg, 'n_actions', n_actions)
-        models = {'Actor':ActorNormal(n_states,n_actions, hidden_dim = cfg.actor_hidden_dim),'Critic':Critic(n_states,1,hidden_dim=cfg.critic_hidden_dim)}
-        memories = {'ACMemory':PGReplay()}
-        agent = A2C(models,memories,cfg)
-        for k,v in models.items():
-            logger.info(f"{k} model name: {type(v).__name__}")
-        for k,v in memories.items():
-            logger.info(f"{k} memory name: {type(v).__name__}")
-        logger.info(f"agent name: {type(agent).__name__}")
-        return env,agent
-    def train_one_episode(self, env, agent, cfg):
-        ep_reward = 0 # reward per episode
-        ep_step = 0 # step per episode
-        ep_entropy = 0 # entropy per episode
-        state = env.reset() # reset and obtain initial state
-        for _ in range(cfg.max_steps):
-            action = agent.sample_action(state) # sample action
-            next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions
-            agent.memory.push((agent.value,agent.log_prob,reward)) # save transitions
-            state = next_state # update state
-            ep_reward += reward
-            ep_entropy += agent.entropy
-            ep_step += 1
-            if terminated:
-                break
-        agent.update(next_state,ep_entropy) # update agent
-        return agent,ep_reward,ep_step
-    def test_one_episode(self, env, agent, cfg):
-        ep_reward = 0 # reward per episode
-        ep_step = 0 # step per episode
-        state = env.reset() # reset and obtain initial state
-        for _ in range(cfg.max_steps):
-            action = agent.predict_action(state) # predict action
-            next_state, reward, terminated, truncated , info = env.step(action)
-            state = next_state
-            ep_reward += reward
-            ep_step += 1
-            if terminated:
-                break
-        return agent,ep_reward,ep_step
-    # def train(self,cfg,env,agent,logger):
-    #     logger.info("Start training!")
-    #     logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}")
-    #     rewards = [] # record rewards for all episodes
-    #     steps = [] # record steps for all episodes
-    #     for i_ep in range(cfg.train_eps):
-    #         ep_reward = 0 # reward per episode
-    #         ep_step = 0 # step per episode
-    #         ep_entropy = 0
-    #         state = env.reset() # reset and obtain initial state
-    #         for _ in range(cfg.max_steps):
-    #             action = agent.sample_action(state) # sample action
-    #             next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions
-    #             agent.memory.push((agent.value,agent.log_prob,reward)) # save transitions
-    #             state = next_state # update state
-    #             ep_reward += reward
-    #             ep_entropy += agent.entropy
-    #             ep_step += 1
-    #             if terminated:
-    #                 break
-    #         agent.update(next_state,ep_entropy) # update agent
-    #         rewards.append(ep_reward)
-    #         steps.append(ep_step)
-    #         logger.info(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step}")
-    #     logger.info("Finish training!")
-    #     return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-    # def test(self,cfg,env,agent,logger):
-    #     logger.info("Start testing!")
-    #     logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}")
-    #     rewards = [] # record rewards for all episodes
-    #     steps = [] # record steps for all episodes
-    #     for i_ep in range(cfg.test_eps):
-    #         ep_reward = 0 # reward per episode
-    #         ep_step = 0
-    #         state = env.reset() # reset and obtain initial state
-    #         for _ in range(cfg.max_steps):
-    #             action = agent.predict_action(state) # predict action
-    #             next_state, reward, terminated, truncated , info = env.step(action)
-    #             state = next_state
-    #             ep_reward += reward
-    #             ep_step += 1
-    #             if terminated:
-    #                 break
-    #         rewards.append(ep_reward)
-    #         steps.append(ep_step)
-    #         logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step}")
-    #     logger.info("Finish testing!")
-    #     env.close()
-    #     return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}
-
-if __name__ == "__main__":
-    main = Main()
-    main.run()
-
-
-
-
diff --git a/projects/codes/A3C/README.md b/projects/codes/A3C/README.md
deleted file mode 100644
index 5856b80..0000000
--- a/projects/codes/A3C/README.md
+++ /dev/null
@@ -1,5 +0,0 @@
-## A2C
-
-
-
-https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
\ No newline at end of file
diff --git a/projects/codes/A3C/a3c.py b/projects/codes/A3C/a3c.py
deleted file mode 100644
index ba0ed7c..0000000
--- a/projects/codes/A3C/a3c.py
+++ /dev/null
@@ -1,56 +0,0 @@
-#!/usr/bin/env python
-# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-05-03 22:16:08 -LastEditor: JiangJi -LastEditTime: 2022-07-20 23:54:40 -Description: -Environment: -''' -import torch -import torch.optim as optim -import torch.nn as nn -import torch.nn.functional as F -from torch.distributions import Categorical - -class ActorCritic(nn.Module): - ''' A2C network model consisting of an Actor and a Critic - ''' - def __init__(self, input_dim, output_dim, hidden_dim): - super(ActorCritic, self).__init__() - self.critic = nn.Sequential( - nn.Linear(input_dim, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, 1) - ) - - self.actor = nn.Sequential( - nn.Linear(input_dim, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, output_dim), - nn.Softmax(dim=1), - ) - - def forward(self, x): - value = self.critic(x) - probs = self.actor(x) - dist = Categorical(probs) - return dist, value -class A2C: - ''' A2C algorithm - ''' - def __init__(self,n_states,n_actions,cfg) -> None: - self.gamma = cfg.gamma - self.device = torch.device(cfg.device) - self.model = ActorCritic(n_states, n_actions, cfg.hidden_dim).to(self.device) - self.optimizer = optim.Adam(self.model.parameters()) - - def compute_returns(self,next_value, rewards, masks): - R = next_value - returns = [] - for step in reversed(range(len(rewards))): - R = rewards[step] + self.gamma * R * masks[step] - returns.insert(0, R) - return returns \ No newline at end of file diff --git a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/params.json b/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/params.json deleted file mode 100644 index 2773964..0000000 --- a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/params.json +++ /dev/null @@ -1,14 +0,0 @@ -{ - "algo_name": "A2C", - "env_name": "CartPole-v0", - "n_envs": 8, - "max_steps": 20000, - "n_steps": 5, - "gamma": 0.99, - "lr": 0.001, - "hidden_dim": 256, - "deivce": "cpu", - "result_path":
"C:\\Users\\24438\\Desktop\\rl-tutorials/outputs/CartPole-v0/20220713-221850/results/", - "model_path": "C:\\Users\\24438\\Desktop\\rl-tutorials/outputs/CartPole-v0/20220713-221850/models/", - "save_fig": true -} \ No newline at end of file diff --git a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_ma_rewards.npy b/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_ma_rewards.npy deleted file mode 100644 index 66091a2..0000000 Binary files a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_ma_rewards.npy and /dev/null differ diff --git a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_rewards.npy b/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_rewards.npy deleted file mode 100644 index 5e6ea3f..0000000 Binary files a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_rewards.npy and /dev/null differ diff --git a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_rewards_curve.png b/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_rewards_curve.png deleted file mode 100644 index b8c0921..0000000 Binary files a/projects/codes/A3C/outputs/CartPole-v0/20220713-221850/results/train_rewards_curve.png and /dev/null differ diff --git a/projects/codes/A3C/task0.py b/projects/codes/A3C/task0.py deleted file mode 100644 index 09dcceb..0000000 --- a/projects/codes/A3C/task0.py +++ /dev/null @@ -1,137 +0,0 @@ -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add to system path - -import gym -import numpy as np -import torch -import torch.optim as optim -import datetime -import argparse -from common.multiprocessing_env import SubprocVecEnv -from a3c import ActorCritic -from common.utils import save_results, make_dir -from common.utils import plot_rewards, save_args - - -def get_args(): - """ 
Hyperparameters - """ - curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # Obtain current time - parser = argparse.ArgumentParser(description="hyperparameters") - parser.add_argument('--algo_name',default='A2C',type=str,help="name of algorithm") - parser.add_argument('--env_name',default='CartPole-v0',type=str,help="name of environment") - parser.add_argument('--n_envs',default=8,type=int,help="number of parallel environments") - - parser.add_argument('--max_steps',default=20000,type=int,help="maximum number of training steps") - parser.add_argument('--n_steps',default=5,type=int,help="number of rollout steps per update") - parser.add_argument('--gamma',default=0.99,type=float,help="discount factor") - parser.add_argument('--lr',default=1e-3,type=float,help="learning rate") - parser.add_argument('--hidden_dim',default=256,type=int) - parser.add_argument('--device',default='cpu',type=str,help="cpu or cuda") - parser.add_argument('--result_path',default=curr_path + "/outputs/" + parser.parse_args().env_name + \ - '/' + curr_time + '/results/' ) - parser.add_argument('--model_path',default=curr_path + "/outputs/" + parser.parse_args().env_name + \ - '/' + curr_time + '/models/' ) # path to save models - parser.add_argument('--save_fig',default=True,type=bool,help="whether to save the figure") - args = parser.parse_args() - return args - -def make_envs(env_name): - def _thunk(): - env = gym.make(env_name) - env.seed(2) - return env - return _thunk -def test_env(env,model,vis=False): - state = env.reset() - if vis: env.render() - done = False - total_reward = 0 - while not done: - state = torch.FloatTensor(state).unsqueeze(0).to(cfg.device) - dist, _ = model(state) - next_state, reward, done, _ = env.step(dist.sample().cpu().numpy()[0]) - state = next_state - if vis: env.render() - total_reward += reward - return total_reward - -def compute_returns(next_value, rewards, masks, gamma=0.99): - R = next_value - returns = [] - for step in reversed(range(len(rewards))): - R = rewards[step] + gamma *
R * masks[step] - returns.insert(0, R) - return returns - - -def train(cfg,envs): - print('Start training!') - print(f'Env:{cfg.env_name}, Algorithm:{cfg.algo_name}, Device:{cfg.device}') - env = gym.make(cfg.env_name) # a single env - env.seed(10) - n_states = envs.observation_space.shape[0] - n_actions = envs.action_space.n - model = ActorCritic(n_states, n_actions, cfg.hidden_dim).to(cfg.device) - optimizer = optim.Adam(model.parameters()) - step_idx = 0 - test_rewards = [] - test_ma_rewards = [] - state = envs.reset() - while step_idx < cfg.max_steps: - log_probs = [] - values = [] - rewards = [] - masks = [] - entropy = 0 - # rollout trajectory - for _ in range(cfg.n_steps): - state = torch.FloatTensor(state).to(cfg.device) - dist, value = model(state) - action = dist.sample() - next_state, reward, done, _ = envs.step(action.cpu().numpy()) - log_prob = dist.log_prob(action) - entropy += dist.entropy().mean() - log_probs.append(log_prob) - values.append(value) - rewards.append(torch.FloatTensor(reward).unsqueeze(1).to(cfg.device)) - masks.append(torch.FloatTensor(1 - done).unsqueeze(1).to(cfg.device)) - state = next_state - step_idx += 1 - if step_idx % 100 == 0: - test_reward = np.mean([test_env(env,model) for _ in range(10)]) - print(f"step_idx:{step_idx}, test_reward:{test_reward}") - test_rewards.append(test_reward) - if test_ma_rewards: - test_ma_rewards.append(0.9*test_ma_rewards[-1]+0.1*test_reward) - else: - test_ma_rewards.append(test_reward) - # plot(step_idx, test_rewards) - next_state = torch.FloatTensor(next_state).to(cfg.device) - _, next_value = model(next_state) - returns = compute_returns(next_value, rewards, masks) - log_probs = torch.cat(log_probs) - returns = torch.cat(returns).detach() - values = torch.cat(values) - advantage = returns - values - actor_loss = -(log_probs * advantage.detach()).mean() - critic_loss = advantage.pow(2).mean() - loss = actor_loss + 0.5 * critic_loss - 0.001 * entropy - optimizer.zero_grad() - loss.backward() - 
optimizer.step() - print('Finish training!') - return {'rewards':test_rewards,'ma_rewards':test_ma_rewards} -if __name__ == "__main__": - cfg = get_args() - envs = [make_envs(cfg.env_name) for i in range(cfg.n_envs)] - envs = SubprocVecEnv(envs) - # training - res_dic = train(cfg,envs) - make_dir(cfg.result_path,cfg.model_path) - save_args(cfg) - save_results(res_dic, tag='train', - path=cfg.result_path) - plot_rewards(res_dic['rewards'], res_dic['ma_rewards'], cfg, tag="train") # plot results diff --git a/projects/codes/DDPG/ddpg.py b/projects/codes/DDPG/ddpg.py deleted file mode 100644 index 246966b..0000000 --- a/projects/codes/DDPG/ddpg.py +++ /dev/null @@ -1,96 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-09 20:25:52 -@LastEditor: John -LastEditTime: 2022-09-27 15:43:21 -@Description: -@Environment: python 3.7.7 -''' -import copy -import numpy as np -import torch -import torch.nn as nn -import torch.optim as optim - -class DDPG: - def __init__(self, models,memories,cfg): - self.device = torch.device(cfg['device']) - self.critic = models['critic'].to(self.device) - self.target_critic = copy.deepcopy(self.critic) # independent copy; aliasing the online critic would make the soft update a no-op - self.actor = models['actor'].to(self.device) - self.target_actor = copy.deepcopy(self.actor) # independent copy of the online actor - # copy weights from critic to target_critic - for target_param, param in zip(self.target_critic.parameters(), self.critic.parameters()): - target_param.data.copy_(param.data) - # copy weights from actor to target_actor - for target_param, param in zip(self.target_actor.parameters(), self.actor.parameters()): - target_param.data.copy_(param.data) - self.critic_optimizer = optim.Adam(self.critic.parameters(), lr=cfg['critic_lr']) - self.actor_optimizer = optim.Adam(self.actor.parameters(), lr=cfg['actor_lr']) - self.memory = memories['memory'] - self.batch_size = cfg['batch_size'] - self.gamma = cfg['gamma'] - self.tau = cfg['tau'] - - def sample_action(self, state): - state =
torch.FloatTensor(state).unsqueeze(0).to(self.device) - action = self.actor(state) - return action.detach().cpu().numpy()[0, 0] - @torch.no_grad() - def predict_action(self, state): - ''' predict action - ''' - state = torch.FloatTensor(state).unsqueeze(0).to(self.device) - action = self.actor(state) - return action.cpu().numpy()[0, 0] - - def update(self): - if len(self.memory) < self.batch_size: # when memory size is less than batch size, return - return - # sample a random minibatch of N transitions from R - state, action, reward, next_state, done = self.memory.sample(self.batch_size) - # convert to tensor - state = torch.FloatTensor(np.array(state)).to(self.device) - next_state = torch.FloatTensor(np.array(next_state)).to(self.device) - action = torch.FloatTensor(np.array(action)).to(self.device) - reward = torch.FloatTensor(reward).unsqueeze(1).to(self.device) - done = torch.FloatTensor(np.float32(done)).unsqueeze(1).to(self.device) - - policy_loss = self.critic(state, self.actor(state)) - policy_loss = -policy_loss.mean() - next_action = self.target_actor(next_state) - target_value = self.target_critic(next_state, next_action.detach()) - expected_value = reward + (1.0 - done) * self.gamma * target_value - expected_value = torch.clamp(expected_value, -np.inf, np.inf) - - value = self.critic(state, action) - value_loss = nn.MSELoss()(value, expected_value.detach()) - - self.actor_optimizer.zero_grad() - policy_loss.backward() - self.actor_optimizer.step() - self.critic_optimizer.zero_grad() - value_loss.backward() - self.critic_optimizer.step() - # soft update - for target_param, param in zip(self.target_critic.parameters(), self.critic.parameters()): - target_param.data.copy_( - target_param.data * (1.0 - self.tau) + - param.data * self.tau - ) - for target_param, param in zip(self.target_actor.parameters(), self.actor.parameters()): - target_param.data.copy_( - target_param.data * (1.0 - self.tau) + - param.data * self.tau - ) - def save_model(self,path): - 
from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save(self.actor.state_dict(), f"{path}/actor_checkpoint.pt") - - def load_model(self,path): - self.actor.load_state_dict(torch.load(f"{path}/actor_checkpoint.pt")) \ No newline at end of file diff --git a/projects/codes/DDPG/env.py b/projects/codes/DDPG/env.py deleted file mode 100644 index 89445cf..0000000 --- a/projects/codes/DDPG/env.py +++ /dev/null @@ -1,56 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-10 15:28:30 -@LastEditor: John -LastEditTime: 2021-09-16 00:52:30 -@Description: -@Environment: python 3.7.7 -''' -import gym -import numpy as np - -class NormalizedActions(gym.ActionWrapper): - ''' Rescale actions from [-1, 1] to the [low, high] range of the action space - ''' - def action(self, action): - low_bound = self.action_space.low - upper_bound = self.action_space.high - action = low_bound + (action + 1.0) * 0.5 * (upper_bound - low_bound) - action = np.clip(action, low_bound, upper_bound) - return action - - def reverse_action(self, action): - low_bound = self.action_space.low - upper_bound = self.action_space.high - action = 2 * (action - low_bound) / (upper_bound - low_bound) - 1 - action = np.clip(action, -1.0, 1.0) # the reversed action lives in [-1, 1] - return action - -class OUNoise(object): - '''Ornstein–Uhlenbeck noise - ''' - def __init__(self, action_space, mu=0.0, theta=0.15, max_sigma=0.3, min_sigma=0.3, decay_period=100000): - self.mu = mu # long-run mean of the OU process - self.theta = theta # mean-reversion rate - self.sigma = max_sigma # noise scale, recomputed in get_action - self.max_sigma = max_sigma - self.min_sigma = min_sigma - self.decay_period = decay_period - self.n_actions = action_space.shape[0] - self.low = action_space.low - self.high = action_space.high - self.reset() - def reset(self): - self.obs = np.ones(self.n_actions) * self.mu - def evolve_obs(self): - x = self.obs - dx = self.theta * (self.mu - x) + self.sigma * np.random.randn(self.n_actions) - self.obs = x + dx - return self.obs - def
get_action(self, action, t=0): - ou_obs = self.evolve_obs() - self.sigma = self.max_sigma - (self.max_sigma - self.min_sigma) * min(1.0, t / self.decay_period) # linearly decay sigma from max_sigma to min_sigma over decay_period steps - return np.clip(action + ou_obs, self.low, self.high) # add noise to the action, then clip to the valid range \ No newline at end of file diff --git a/projects/codes/DDPG/main.py b/projects/codes/DDPG/main.py deleted file mode 100644 index 8da5d29..0000000 --- a/projects/codes/DDPG/main.py +++ /dev/null @@ -1,152 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-11 20:58:21 -@LastEditor: John -LastEditTime: 2022-09-27 15:50:12 -@Description: -@Environment: python 3.7.7 -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add to system path - -import datetime -import gym -import torch -import argparse -import torch.nn as nn -import torch.nn.functional as F -from env import NormalizedActions,OUNoise -from ddpg import DDPG -from common.utils import all_seed -from common.memories import ReplayBufferQue -from common.launcher import Launcher -from envs.register import register_env - -class Actor(nn.Module): - def __init__(self, n_states, n_actions, hidden_dim, init_w=3e-3): - super(Actor, self).__init__() - self.linear1 = nn.Linear(n_states, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - self.linear3 = nn.Linear(hidden_dim, n_actions) - - self.linear3.weight.data.uniform_(-init_w, init_w) - self.linear3.bias.data.uniform_(-init_w, init_w) - - def forward(self, x): - x = F.relu(self.linear1(x)) - x = F.relu(self.linear2(x)) - x = torch.tanh(self.linear3(x)) - return x -class Critic(nn.Module): - def __init__(self, n_states, n_actions, hidden_dim, init_w=3e-3): - super(Critic, self).__init__() - - self.linear1 = nn.Linear(n_states + n_actions, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - self.linear3 =
nn.Linear(hidden_dim, 1) - # initialize the output layer with small random values - self.linear3.weight.data.uniform_(-init_w, init_w) - self.linear3.bias.data.uniform_(-init_w, init_w) - - def forward(self, state, action): - # concatenate state and action along dimension 1 (the feature dimension) - x = torch.cat([state, action], 1) - x = F.relu(self.linear1(x)) - x = F.relu(self.linear2(x)) - x = self.linear3(x) - return x -class Main(Launcher): - def get_args(self): - """ hyperparameters - """ - curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - parser = argparse.ArgumentParser(description="hyperparameters") - parser.add_argument('--algo_name',default='DDPG',type=str,help="name of algorithm") - parser.add_argument('--env_name',default='Pendulum-v1',type=str,help="name of environment") - parser.add_argument('--train_eps',default=300,type=int,help="number of training episodes") - parser.add_argument('--test_eps',default=20,type=int,help="number of testing episodes") - parser.add_argument('--max_steps',default=100000,type=int,help="maximum steps per episode; a much larger value approximates an unbounded episode") - parser.add_argument('--gamma',default=0.99,type=float,help="discount factor") - parser.add_argument('--critic_lr',default=1e-3,type=float,help="learning rate of critic") - parser.add_argument('--actor_lr',default=1e-4,type=float,help="learning rate of actor") - parser.add_argument('--memory_capacity',default=8000,type=int,help="memory capacity") - parser.add_argument('--batch_size',default=128,type=int) - parser.add_argument('--target_update',default=2,type=int) - parser.add_argument('--tau',default=1e-2,type=float) - parser.add_argument('--critic_hidden_dim',default=256,type=int) - parser.add_argument('--actor_hidden_dim',default=256,type=int) - parser.add_argument('--device',default='cpu',type=str,help="cpu or cuda") - parser.add_argument('--seed',default=1,type=int,help="random seed") - parser.add_argument('--show_fig',default=False,type=bool,help="whether to show the figure") - parser.add_argument('--save_fig',default=True,type=bool,help="if save figure
or not") - args = parser.parse_args() - default_args = {'result_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/results/", - 'model_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/models/", - } - args = {**vars(args),**default_args} # type(dict) - return args - - def env_agent_config(self,cfg): - register_env(cfg['env_name']) - env = gym.make(cfg['env_name']) - env = NormalizedActions(env) # rescale actions from [-1, 1] to the env's action bounds - if cfg['seed'] !=0: # set random seed - all_seed(env,seed=cfg["seed"]) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.shape[0] - print(f"n_states: {n_states}, n_actions: {n_actions}") - cfg.update({"n_states":n_states,"n_actions":n_actions}) # add state/action dims to cfg parameters - models = {"actor":Actor(n_states,n_actions,hidden_dim=cfg['actor_hidden_dim']),"critic":Critic(n_states,n_actions,hidden_dim=cfg['critic_hidden_dim'])} - memories = {"memory":ReplayBufferQue(cfg['memory_capacity'])} - agent = DDPG(models,memories,cfg) - return env,agent - def train(self,cfg, env, agent): - print('Start training!') - ou_noise = OUNoise(env.action_space) # exploration noise for actions - rewards = [] # record rewards for all episodes - for i_ep in range(cfg['train_eps']): - state = env.reset() - ou_noise.reset() - ep_reward = 0 - for i_step in range(cfg['max_steps']): - action = agent.sample_action(state) - action = ou_noise.get_action(action, i_step+1) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - agent.memory.push((state, action, reward, next_state, done)) - agent.update() - state = next_state - if done: - break - if (i_ep+1)%10 == 0: - print(f"Episode:{i_ep+1}/{cfg['train_eps']}, Reward:{ep_reward:.2f}") - rewards.append(ep_reward) - print('Finish training!') - return {'rewards':rewards} - - def test(self,cfg, env, agent): - print('Start testing!') - rewards = [] # record rewards for all episodes - for i_ep in range(cfg['test_eps']): - state = env.reset() - ep_reward = 0 - for i_step in range(cfg['max_steps']): -
action = agent.predict_action(state) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - state = next_state - if done: - break - rewards.append(ep_reward) - print(f"Episode:{i_ep+1}/{cfg['test_eps']}, Reward:{ep_reward:.1f}") - print('Finish testing!') - return {'rewards':rewards} -if __name__ == "__main__": - main = Main() - main.run() - diff --git a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/models/actor_checkpoint.pt b/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/models/actor_checkpoint.pt deleted file mode 100644 index e65e7ca..0000000 Binary files a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/models/actor_checkpoint.pt and /dev/null differ diff --git a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/params.json b/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/params.json deleted file mode 100644 index c3825cf..0000000 --- a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/params.json +++ /dev/null @@ -1,25 +0,0 @@ -{ - "algo_name": "DDPG", - "env_name": "Pendulum-v1", - "train_eps": 300, - "test_eps": 20, - "max_steps": 100000, - "gamma": 0.99, - "critic_lr": 0.001, - "actor_lr": 0.0001, - "memory_capacity": 8000, - "batch_size": 128, - "target_update": 2, - "tau": 0.01, - "critic_hidden_dim": 256, - "actor_hidden_dim": 256, - "device": "cpu", - "seed": 1, - "show_fig": false, - "save_fig": true, - "result_path": "/Users/jj/Desktop/rl-tutorials/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/", - "model_path": "/Users/jj/Desktop/rl-tutorials/codes/DDPG/outputs/Pendulum-v1/20220927-155053/models/", - "n_states": 3, - "n_actions": 1, - "training_time": 358.8142900466919 -} \ No newline at end of file diff --git a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/testing_curve.png b/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/testing_curve.png deleted file mode 100644 index 44e53e2..0000000 Binary files 
a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/testing_curve.png and /dev/null differ diff --git a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/testing_results.csv b/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/testing_results.csv deleted file mode 100644 index 590c141..0000000 --- a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/testing_results.csv +++ /dev/null @@ -1,21 +0,0 @@ -rewards --116.045416124376 --126.18022935469217 --231.46338228458293 --246.40481094689758 --304.69493818839186 --124.39609191913091 --1.060003582878406 --114.19659653048288 --348.9745708742037 --116.10811133324769 --117.20146333694844 --118.66206784602966 --235.17836229762355 --356.14054913290624 --118.38579118156366 --351.9415915140771 --114.50877866098972 --124.775484599685 --226.47062962476875 --121.48872909193936 diff --git a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/training_curve.png b/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/training_curve.png deleted file mode 100644 index b0b95fe..0000000 Binary files a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/training_curve.png and /dev/null differ diff --git a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/training_results.csv b/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/training_results.csv deleted file mode 100644 index 2fa54ec..0000000 --- a/projects/codes/DDPG/outputs/Pendulum-v1/20220927-155053/results/training_results.csv +++ /dev/null @@ -1,301 +0,0 @@ -rewards --1557.8518596631177 --1354.7599369723537 --1375.5732016629706 --1493.8609739040871 --1426.7116204537845 --1235.7920755027762 --1339.1647620443073 --1544.2379906560486 --1539.6232758780877 --1549.5690058648204 --1446.9193195793853 --1520.2666688767558 --1525.0116707122581 --1379.136573640111 --1532.702831768523 --1484.7552963941637 --1359.6699201737677 --1349.6805649166854 --1510.869999766432 
--1515.8398785434708 --1447.4648656578254 --1537.3822077872178 --1249.6517039877456 --1350.0302666965736 --1529.4363372505607 --1320.28204807604 --1502.9248141320654 --1545.4861772197075 --1579.928789692619 --1413.296070504152 --1242.4673258663781 --1403.8672028946078 --1452.7199002523635 --871.6071114009982 --1324.1789316121412 --1313.3348146041249 --1059.8722927418046 --1054.232673559123 --973.8956270782459 --972.9936641224186 --972.9477399905655 --947.0613443333731 --737.3866328989184 --958.6068164634295 --739.6973395350705 --886.8383108399455 --775.1430379821574 --937.3115016337417 --700.875502951337 --829.9396339144109 --271.1629773396998 --493.5460684734584 --485.9321719313203 --858.3735607086766 --1145.3440084994113 --1121.1338201339777 --1191.5640831332366 --1350.0425368846784 --249.25438665107953 --727.9051714734406 --368.5579316240395 --392.0611344939354 --955.3231703741553 --488.27956192035265 --362.2734695759137 --949.5440839122496 --496.8460016912189 --726.6871514929877 --424.48641462866266 --954.7075428204689 --608.9650086409792 --848.6059768900151 --866.7052398755033 --856.9846415044439 --751.0342976129083 --749.5118249469103 --509.882299129811 --506.56154097018043 --906.0964475820368 --1318.3941416286855 --1422.2017011876615 --1523.1661091894277 --1209.2850593747999 --1415.0972750475833 --1533.2263827605834 --1405.8345530072663 --1244.3384723384913 --1237.4704845061992 --949.3394417935086 --981.1855396112669 --1241.224568444032 --1033.118364799829 --1017.2403725619487 --981.9727804516916 --853.1877724775591 --869.0652369861646 --1069.8265343327998 --371.73173813891884 --735.5887912713665 --1262.050240428957 --1242.985056062197 --1191.6867713427482 --1328.5323118458034 --1015.5308653784714 --895.3066515461381 --994.1114862316568 --761.4710321387583 --717.6979056272868 --782.302146467708 --640.4913147345328 --725.6469893076355 --497.5346232085584 --1027.1192149202325 --950.0117149822681 --956.1343737377374 --708.9489626669097 --964.5003064113283 
--611.9111516886613 --612.3182791021098 --1100.0047939174613 --984.9262458612923 --858.7106075590494 --842.305917848386 --745.9043991922597 --741.2168858394704 --1143.0750387284456 --755.5257242325362 --745.8440029056219 --387.8717950334138 --764.6628701051523 --486.7967495537958 --485.13357559164814 --313.5415216767419 --611.3450529954782 --611.1570544377465 --507.6456747676814 --615.2032627013064 --242.37988821149764 --603.85498620892 --352.2672241055367 --155.99874664988383 --615.4003063516313 --384.9811293551548 --498.80727354456315 --407.6898591217813 --1213.6383844696395 --1122.2425748913884 --592.4819308883913 --478.2046833075051 --891.0254788311132 --482.40204115385 --339.34676196677407 --582.9985110154428 --213.38243627478826 --928.8434951613825 --1545.5433749195483 --1179.5016285049896 --1211.9549773601925 --1396.8082561792166 --1318.073128824395 --597.3837225413702 --564.7793352410449 --723.744223659601 --653.0145534050461 --847.6138123247009 --385.62784320332867 --245.25250602651928 --117.55094416757835 --864.0064774069044 --124.30221387458867 --244.4014050243669 --1148.861754008653 --914.4047868424254 --765.9394408203351 --124.05114610943177 --605.7641303826842 --616.3595829453579 --375.5024692962698 --253.51874076866997 --240.08405245866714 --503.96565579077225 --606.7646526173963 --502.6512112729435 --746.404013238678 --718.8658110051653 --125.65808359856703 --247.62256797883364 --363.69852213666803 --249.21801061415547 --491.7724416523124 --235.37050442527357 --609.6026403583944 --236.05731608228092 --381.19853850450454 --298.7683201867404 --127.64145601534942 --233.4300138495176 --129.11243486763516 --390.0092951263507 --1000.7729892969854 --249.60445310459787 --253.02347910759622 --129.04269174391223 --360.6321251486308 --377.26297602576534 --124.98466986009481 --245.47913567739212 --127.0885254550411 --118.11013006825459 --128.8682755001942 --497.3015586531096 --340.77352433313484 --514.4945799737978 --503.24077308842783 --627.9068157464455 
--511.39396524392146 --763.8866112068075 --741.7885082408757 --617.4945380476306 --950.3176437519387 --643.4791402436576 --511.9377874351982 --573.6219349516633 --564.1297823875693 --242.06399233336583 --496.4020380325518 --360.56387982880364 --495.4590728336022 --503.7263345016764 --122.47964616802327 --254.16543926263168 --614.5335268729743 --234.3718017676852 --301.27514663062874 --387.64758894986204 --368.74492411716415 --364.43559131093593 --160.6845848115533 --504.1948947975429 --246.51676032967683 --251.5732500220603 --600.1463819723879 --247.17476928471288 --381.924164337607 --377.4773226068174 --378.511830774651 --126.69199895843033 --365.0506645811703 --130.45052114802874 --374.37400288581813 --502.37678159638887 --374.43552658473055 --241.157211525502 --388.9597456642503 --249.4412385534861 --114.71395078439846 --864.6882327286056 --626.8144095971478 --732.9226896140248 --368.24767905020394 --369.7425524469132 --398.07832598184626 --906.7113918582257 --252.2343258180765 --370.4258473086036 --736.0203154396909 --609.4605173515027 --661.1255920773486 --489.9605291008584 --364.1671188501402 --644.4029089587781 --477.9510457677364 --128.78294672880136 --373.74382001694886 --380.69931133982936 --372.60275628381805 --743.0410655515724 --597.558847789258 --387.94245652694394 --725.3939448944484 --409.1301313430852 --491.8442467896486 --123.0638156839621 --377.9292326597324 --489.27209762667974 --255.63227821371257 --379.5885382060625 --370.2312967024669 --250.94061817008688 --131.2125308195906 --600.3312016651868 --130.84444772735733 --312.6287688438562 --382.4144610039701 --259.03558003697265 --224.92206667096863 --376.81390821359685 --382.39993489751646 --380.25599578593636 --610.1016672243638 diff --git a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/config.yaml b/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/config.yaml deleted file mode 100644 index 5e3ad4e..0000000 --- 
a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/config.yaml +++ /dev/null @@ -1,25 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: CartPole-v1 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: true - load_path: Train_CartPole-v1_DQN_20221031-001201 - max_steps: 200 - mode: test - save_fig: true - seed: 0 - show_fig: false - test_eps: 10 - train_eps: 100 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - hidden_dim: 256 - lr: 0.0001 - target_update: 4 diff --git a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/logs/log.txt b/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/logs/log.txt deleted file mode 100644 index 44f28cb..0000000 --- a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/logs/log.txt +++ /dev/null @@ -1,14 +0,0 @@ -2022-10-31 00:13:43 - r - INFO: - n_states: 4, n_actions: 2 -2022-10-31 00:13:44 - r - INFO: - Start testing! -2022-10-31 00:13:44 - r - INFO: - Env: CartPole-v1, Algorithm: DQN, Device: cuda -2022-10-31 00:13:45 - r - INFO: - Episode: 1/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 2/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 3/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 4/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 5/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 6/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 7/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 8/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 9/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Episode: 10/10, Reward: 200.0, Step: 200 -2022-10-31 00:13:45 - r - INFO: - Finish testing! 
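The deleted A2C code (`compute_returns` in both `a3c.py` and `task0.py`) builds bootstrapped n-step returns by walking the rollout backwards from the critic's value of the last state, with `masks[t] = 0` cutting the bootstrap at episode boundaries. The following is a minimal framework-free sketch of that recursion, assuming plain Python floats in place of the original PyTorch tensors:

```python
from typing import List

def compute_returns(next_value: float, rewards: List[float],
                    masks: List[float], gamma: float = 0.99) -> List[float]:
    """Bootstrapped n-step returns, computed backwards in time.

    masks[t] is 0.0 if the episode terminated at step t and 1.0 otherwise,
    so the value carried in from later steps is dropped at episode ends.
    """
    R = next_value  # start from the critic's estimate of the state after the rollout
    returns = []
    for step in reversed(range(len(rewards))):
        R = rewards[step] + gamma * R * masks[step]
        returns.insert(0, R)  # prepend so returns[t] lines up with rewards[t]
    return returns

# Three steps; the episode ends at step 1 (mask 0.0), so step 1 does not bootstrap.
print(compute_returns(10.0, [1.0, 1.0, 1.0], [1.0, 0.0, 1.0], gamma=0.5))  # → [1.5, 1.0, 6.0]
```

In the original training loop, these returns are concatenated, and the advantage `returns - values` drives both the actor loss and the critic's squared-error loss.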
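The deleted `DDPG.update` blends the online weights into the target networks with `tau` (Polyak averaging). The following framework-free sketch assumes each network's parameters are a plain dict of floats, standing in for a PyTorch `state_dict`; the `soft_update` name is illustrative. It also shows why the target must be an independent copy: if `target` were the same object as `online`, each update would write the value back unchanged, so the target would never lag behind the online network.

```python
import copy
from typing import Dict

def soft_update(target: Dict[str, float], online: Dict[str, float], tau: float) -> None:
    """Polyak averaging, in place: target <- (1 - tau) * target + tau * online."""
    for name, value in online.items():
        target[name] = (1.0 - tau) * target[name] + tau * value

online = {"w": 1.0, "b": 0.0}
target = copy.deepcopy(online)  # independent copy, so the target can lag behind

online["w"] = 2.0               # pretend an optimizer step moved the online weight
soft_update(target, online, tau=0.1)
print(target["w"])              # moved 10% of the way from 1.0 toward 2.0
```

A small `tau` (the original default is `1e-2`) makes the target change slowly, which stabilizes the bootstrapped critic target.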
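The deleted `OUNoise` class in the DDPG `env.py` adds temporally correlated exploration noise using the discretized Ornstein–Uhlenbeck update `dx = theta * (mu - x) + sigma * N(0, 1)` (no `dt` term, as in the original). The following simplified pure-Python sketch uses that update, assuming a scalar `[low, high]` action range and omitting the original's sigma decay schedule. The seeded private RNG and the `add_noise` helper are illustrative additions.

```python
import random
from typing import List

class OUNoise:
    """Discretized Ornstein-Uhlenbeck process: x <- x + theta * (mu - x) + sigma * N(0, 1)."""

    def __init__(self, n_actions: int, mu: float = 0.0, theta: float = 0.15,
                 sigma: float = 0.3, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.rng = random.Random(seed)  # private RNG so runs are reproducible
        self.reset()

    def reset(self) -> None:
        # start each episode at the long-run mean, as the original reset() does
        self.state: List[float] = [self.mu] * self.n_actions

    def sample(self) -> List[float]:
        # mean-reverting drift toward mu plus Gaussian noise scaled by sigma
        self.state = [x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
                      for x in self.state]
        return self.state

def add_noise(action: List[float], noise: List[float], low: float, high: float) -> List[float]:
    # perturb the deterministic action, then clip it into the valid range
    return [min(max(a + n, low), high) for a, n in zip(action, noise)]

ou = OUNoise(n_actions=1)
print(add_noise([0.5], ou.sample(), low=-1.0, high=1.0))
```

Because each sample depends on the previous state, consecutive perturbations are correlated. This is the property that motivated OU noise for DDPG on physical control tasks such as `Pendulum-v1`.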
diff --git a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/models/checkpoint.pt b/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/models/checkpoint.pt deleted file mode 100644 index 722eb69..0000000 Binary files a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/models/checkpoint.pt and /dev/null differ diff --git a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/results/learning_curve.png b/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/results/learning_curve.png deleted file mode 100644 index 046009a..0000000 Binary files a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/results/res.csv b/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/results/res.csv deleted file mode 100644 index cbbcf2e..0000000 --- a/projects/codes/DQN/Test_CartPole-v1_DQN_20221031-001343/results/res.csv +++ /dev/null @@ -1,11 +0,0 @@ -episodes,rewards,steps -0,200.0,200 -1,200.0,200 -2,200.0,200 -3,200.0,200 -4,200.0,200 -5,200.0,200 -6,200.0,200 -7,200.0,200 -8,200.0,200 -9,200.0,200 diff --git a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/config.yaml b/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/config.yaml deleted file mode 100644 index 7416aec..0000000 --- a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/config.yaml +++ /dev/null @@ -1,23 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: Acrobot-v1 - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 100000 - mode: train - save_fig: true - seed: 1 - show_fig: false - test_eps: 10 - train_eps: 100 -algo_cfg: - batch_size: 128 - buffer_size: 200000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - hidden_dim: 256 - lr: 0.002 - target_update: 4 diff --git a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/logs/log.txt 
b/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/logs/log.txt deleted file mode 100644 index e745c8c..0000000 --- a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/logs/log.txt +++ /dev/null @@ -1,104 +0,0 @@ -2022-10-26 09:46:45 - r - INFO: - n_states: 6, n_actions: 3 -2022-10-26 09:46:48 - r - INFO: - Start training! -2022-10-26 09:46:48 - r - INFO: - Env: Acrobot-v1, Algorithm: DQN, Device: cuda -2022-10-26 09:46:50 - r - INFO: - Episode: 1/100, Reward: -861.00: Epislon: 0.178 -2022-10-26 09:46:50 - r - INFO: - Episode: 2/100, Reward: -252.00: Epislon: 0.111 -2022-10-26 09:46:50 - r - INFO: - Episode: 3/100, Reward: -196.00: Epislon: 0.078 -2022-10-26 09:46:51 - r - INFO: - Episode: 4/100, Reward: -390.00: Epislon: 0.041 -2022-10-26 09:46:52 - r - INFO: - Episode: 5/100, Reward: -371.00: Epislon: 0.025 -2022-10-26 09:46:52 - r - INFO: - Episode: 6/100, Reward: -237.00: Epislon: 0.019 -2022-10-26 09:46:52 - r - INFO: - Episode: 7/100, Reward: -227.00: Epislon: 0.016 -2022-10-26 09:46:53 - r - INFO: - Episode: 8/100, Reward: -228.00: Epislon: 0.014 -2022-10-26 09:46:53 - r - INFO: - Episode: 9/100, Reward: -305.00: Epislon: 0.012 -2022-10-26 09:46:54 - r - INFO: - Episode: 10/100, Reward: -234.00: Epislon: 0.011 -2022-10-26 09:46:54 - r - INFO: - Episode: 11/100, Reward: -204.00: Epislon: 0.011 -2022-10-26 09:46:55 - r - INFO: - Episode: 12/100, Reward: -277.00: Epislon: 0.010 -2022-10-26 09:46:55 - r - INFO: - Episode: 13/100, Reward: -148.00: Epislon: 0.010 -2022-10-26 09:46:56 - r - INFO: - Episode: 14/100, Reward: -372.00: Epislon: 0.010 -2022-10-26 09:46:56 - r - INFO: - Episode: 15/100, Reward: -273.00: Epislon: 0.010 -2022-10-26 09:46:56 - r - INFO: - Episode: 16/100, Reward: -105.00: Epislon: 0.010 -2022-10-26 09:46:56 - r - INFO: - Episode: 17/100, Reward: -79.00: Epislon: 0.010 -2022-10-26 09:46:57 - r - INFO: - Episode: 18/100, Reward: -112.00: Epislon: 0.010 -2022-10-26 09:46:57 - r - INFO: - Episode: 19/100, Reward: -276.00: Epislon: 
0.010 -2022-10-26 09:46:57 - r - INFO: - Episode: 20/100, Reward: -148.00: Epislon: 0.010 -2022-10-26 09:46:58 - r - INFO: - Episode: 21/100, Reward: -201.00: Epislon: 0.010 -2022-10-26 09:46:58 - r - INFO: - Episode: 22/100, Reward: -173.00: Epislon: 0.010 -2022-10-26 09:46:58 - r - INFO: - Episode: 23/100, Reward: -226.00: Epislon: 0.010 -2022-10-26 09:46:59 - r - INFO: - Episode: 24/100, Reward: -154.00: Epislon: 0.010 -2022-10-26 09:46:59 - r - INFO: - Episode: 25/100, Reward: -269.00: Epislon: 0.010 -2022-10-26 09:46:59 - r - INFO: - Episode: 26/100, Reward: -191.00: Epislon: 0.010 -2022-10-26 09:47:00 - r - INFO: - Episode: 27/100, Reward: -177.00: Epislon: 0.010 -2022-10-26 09:47:00 - r - INFO: - Episode: 28/100, Reward: -209.00: Epislon: 0.010 -2022-10-26 09:47:00 - r - INFO: - Episode: 29/100, Reward: -116.00: Epislon: 0.010 -2022-10-26 09:47:00 - r - INFO: - Episode: 30/100, Reward: -117.00: Epislon: 0.010 -2022-10-26 09:47:01 - r - INFO: - Episode: 31/100, Reward: -121.00: Epislon: 0.010 -2022-10-26 09:47:01 - r - INFO: - Episode: 32/100, Reward: -208.00: Epislon: 0.010 -2022-10-26 09:47:01 - r - INFO: - Episode: 33/100, Reward: -147.00: Epislon: 0.010 -2022-10-26 09:47:02 - r - INFO: - Episode: 34/100, Reward: -104.00: Epislon: 0.010 -2022-10-26 09:47:02 - r - INFO: - Episode: 35/100, Reward: -161.00: Epislon: 0.010 -2022-10-26 09:47:02 - r - INFO: - Episode: 36/100, Reward: -144.00: Epislon: 0.010 -2022-10-26 09:47:02 - r - INFO: - Episode: 37/100, Reward: -131.00: Epislon: 0.010 -2022-10-26 09:47:03 - r - INFO: - Episode: 38/100, Reward: -226.00: Epislon: 0.010 -2022-10-26 09:47:03 - r - INFO: - Episode: 39/100, Reward: -117.00: Epislon: 0.010 -2022-10-26 09:47:03 - r - INFO: - Episode: 40/100, Reward: -344.00: Epislon: 0.010 -2022-10-26 09:47:04 - r - INFO: - Episode: 41/100, Reward: -123.00: Epislon: 0.010 -2022-10-26 09:47:04 - r - INFO: - Episode: 42/100, Reward: -232.00: Epislon: 0.010 -2022-10-26 09:47:04 - r - INFO: - Episode: 43/100, Reward: 
-190.00: Epislon: 0.010 -2022-10-26 09:47:05 - r - INFO: - Episode: 44/100, Reward: -176.00: Epislon: 0.010 -2022-10-26 09:47:05 - r - INFO: - Episode: 45/100, Reward: -139.00: Epislon: 0.010 -2022-10-26 09:47:06 - r - INFO: - Episode: 46/100, Reward: -410.00: Epislon: 0.010 -2022-10-26 09:47:06 - r - INFO: - Episode: 47/100, Reward: -115.00: Epislon: 0.010 -2022-10-26 09:47:06 - r - INFO: - Episode: 48/100, Reward: -118.00: Epislon: 0.010 -2022-10-26 09:47:06 - r - INFO: - Episode: 49/100, Reward: -113.00: Epislon: 0.010 -2022-10-26 09:47:07 - r - INFO: - Episode: 50/100, Reward: -355.00: Epislon: 0.010 -2022-10-26 09:47:07 - r - INFO: - Episode: 51/100, Reward: -110.00: Epislon: 0.010 -2022-10-26 09:47:07 - r - INFO: - Episode: 52/100, Reward: -148.00: Epislon: 0.010 -2022-10-26 09:47:08 - r - INFO: - Episode: 53/100, Reward: -135.00: Epislon: 0.010 -2022-10-26 09:47:08 - r - INFO: - Episode: 54/100, Reward: -220.00: Epislon: 0.010 -2022-10-26 09:47:08 - r - INFO: - Episode: 55/100, Reward: -157.00: Epislon: 0.010 -2022-10-26 09:47:09 - r - INFO: - Episode: 56/100, Reward: -130.00: Epislon: 0.010 -2022-10-26 09:47:09 - r - INFO: - Episode: 57/100, Reward: -150.00: Epislon: 0.010 -2022-10-26 09:47:09 - r - INFO: - Episode: 58/100, Reward: -254.00: Epislon: 0.010 -2022-10-26 09:47:10 - r - INFO: - Episode: 59/100, Reward: -148.00: Epislon: 0.010 -2022-10-26 09:47:10 - r - INFO: - Episode: 60/100, Reward: -108.00: Epislon: 0.010 -2022-10-26 09:47:10 - r - INFO: - Episode: 61/100, Reward: -152.00: Epislon: 0.010 -2022-10-26 09:47:10 - r - INFO: - Episode: 62/100, Reward: -107.00: Epislon: 0.010 -2022-10-26 09:47:10 - r - INFO: - Episode: 63/100, Reward: -110.00: Epislon: 0.010 -2022-10-26 09:47:11 - r - INFO: - Episode: 64/100, Reward: -266.00: Epislon: 0.010 -2022-10-26 09:47:11 - r - INFO: - Episode: 65/100, Reward: -344.00: Epislon: 0.010 -2022-10-26 09:47:12 - r - INFO: - Episode: 66/100, Reward: -93.00: Epislon: 0.010 -2022-10-26 09:47:12 - r - INFO: - Episode: 
67/100, Reward: -113.00: Epislon: 0.010 -2022-10-26 09:47:12 - r - INFO: - Episode: 68/100, Reward: -191.00: Epislon: 0.010 -2022-10-26 09:47:12 - r - INFO: - Episode: 69/100, Reward: -102.00: Epislon: 0.010 -2022-10-26 09:47:13 - r - INFO: - Episode: 70/100, Reward: -187.00: Epislon: 0.010 -2022-10-26 09:47:13 - r - INFO: - Episode: 71/100, Reward: -158.00: Epislon: 0.010 -2022-10-26 09:47:13 - r - INFO: - Episode: 72/100, Reward: -166.00: Epislon: 0.010 -2022-10-26 09:47:14 - r - INFO: - Episode: 73/100, Reward: -202.00: Epislon: 0.010 -2022-10-26 09:47:14 - r - INFO: - Episode: 74/100, Reward: -179.00: Epislon: 0.010 -2022-10-26 09:47:14 - r - INFO: - Episode: 75/100, Reward: -150.00: Epislon: 0.010 -2022-10-26 09:47:14 - r - INFO: - Episode: 76/100, Reward: -170.00: Epislon: 0.010 -2022-10-26 09:47:15 - r - INFO: - Episode: 77/100, Reward: -149.00: Epislon: 0.010 -2022-10-26 09:47:15 - r - INFO: - Episode: 78/100, Reward: -119.00: Epislon: 0.010 -2022-10-26 09:47:15 - r - INFO: - Episode: 79/100, Reward: -115.00: Epislon: 0.010 -2022-10-26 09:47:15 - r - INFO: - Episode: 80/100, Reward: -97.00: Epislon: 0.010 -2022-10-26 09:47:16 - r - INFO: - Episode: 81/100, Reward: -153.00: Epislon: 0.010 -2022-10-26 09:47:16 - r - INFO: - Episode: 82/100, Reward: -97.00: Epislon: 0.010 -2022-10-26 09:47:16 - r - INFO: - Episode: 83/100, Reward: -211.00: Epislon: 0.010 -2022-10-26 09:47:16 - r - INFO: - Episode: 84/100, Reward: -195.00: Epislon: 0.010 -2022-10-26 09:47:17 - r - INFO: - Episode: 85/100, Reward: -125.00: Epislon: 0.010 -2022-10-26 09:47:17 - r - INFO: - Episode: 86/100, Reward: -155.00: Epislon: 0.010 -2022-10-26 09:47:17 - r - INFO: - Episode: 87/100, Reward: -151.00: Epislon: 0.010 -2022-10-26 09:47:18 - r - INFO: - Episode: 88/100, Reward: -194.00: Epislon: 0.010 -2022-10-26 09:47:18 - r - INFO: - Episode: 89/100, Reward: -188.00: Epislon: 0.010 -2022-10-26 09:47:18 - r - INFO: - Episode: 90/100, Reward: -195.00: Epislon: 0.010 -2022-10-26 09:47:19 - r - 
INFO: - Episode: 91/100, Reward: -141.00: Epislon: 0.010 -2022-10-26 09:47:19 - r - INFO: - Episode: 92/100, Reward: -132.00: Epislon: 0.010 -2022-10-26 09:47:19 - r - INFO: - Episode: 93/100, Reward: -127.00: Epislon: 0.010 -2022-10-26 09:47:19 - r - INFO: - Episode: 94/100, Reward: -195.00: Epislon: 0.010 -2022-10-26 09:47:20 - r - INFO: - Episode: 95/100, Reward: -152.00: Epislon: 0.010 -2022-10-26 09:47:20 - r - INFO: - Episode: 96/100, Reward: -145.00: Epislon: 0.010 -2022-10-26 09:47:20 - r - INFO: - Episode: 97/100, Reward: -123.00: Epislon: 0.010 -2022-10-26 09:47:20 - r - INFO: - Episode: 98/100, Reward: -176.00: Epislon: 0.010 -2022-10-26 09:47:21 - r - INFO: - Episode: 99/100, Reward: -180.00: Epislon: 0.010 -2022-10-26 09:47:21 - r - INFO: - Episode: 100/100, Reward: -124.00: Epislon: 0.010 -2022-10-26 09:47:21 - r - INFO: - Finish training! diff --git a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/models/checkpoint.pt b/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/models/checkpoint.pt deleted file mode 100644 index 5448aca..0000000 Binary files a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/models/checkpoint.pt and /dev/null differ diff --git a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/results/learning_curve.png b/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/results/learning_curve.png deleted file mode 100644 index 7f1054d..0000000 Binary files a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/results/res.csv b/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/results/res.csv deleted file mode 100644 index 1758be2..0000000 --- a/projects/codes/DQN/Train_Acrobot-v1_DQN_20221026-094645/results/res.csv +++ /dev/null @@ -1,101 +0,0 @@ -episodes,rewards,steps -0,-861.0,862 -1,-252.0,253 -2,-196.0,197 -3,-390.0,391 -4,-371.0,372 -5,-237.0,238 -6,-227.0,228 -7,-228.0,229 
-8,-305.0,306 -9,-234.0,235 -10,-204.0,205 -11,-277.0,278 -12,-148.0,149 -13,-372.0,373 -14,-273.0,274 -15,-105.0,106 -16,-79.0,80 -17,-112.0,113 -18,-276.0,277 -19,-148.0,149 -20,-201.0,202 -21,-173.0,174 -22,-226.0,227 -23,-154.0,155 -24,-269.0,270 -25,-191.0,192 -26,-177.0,178 -27,-209.0,210 -28,-116.0,117 -29,-117.0,118 -30,-121.0,122 -31,-208.0,209 -32,-147.0,148 -33,-104.0,105 -34,-161.0,162 -35,-144.0,145 -36,-131.0,132 -37,-226.0,227 -38,-117.0,118 -39,-344.0,345 -40,-123.0,124 -41,-232.0,233 -42,-190.0,191 -43,-176.0,177 -44,-139.0,140 -45,-410.0,411 -46,-115.0,116 -47,-118.0,119 -48,-113.0,114 -49,-355.0,356 -50,-110.0,111 -51,-148.0,149 -52,-135.0,136 -53,-220.0,221 -54,-157.0,158 -55,-130.0,131 -56,-150.0,151 -57,-254.0,255 -58,-148.0,149 -59,-108.0,109 -60,-152.0,153 -61,-107.0,108 -62,-110.0,111 -63,-266.0,267 -64,-344.0,345 -65,-93.0,94 -66,-113.0,114 -67,-191.0,192 -68,-102.0,103 -69,-187.0,188 -70,-158.0,159 -71,-166.0,167 -72,-202.0,203 -73,-179.0,180 -74,-150.0,151 -75,-170.0,171 -76,-149.0,150 -77,-119.0,120 -78,-115.0,116 -79,-97.0,98 -80,-153.0,154 -81,-97.0,98 -82,-211.0,212 -83,-195.0,196 -84,-125.0,126 -85,-155.0,156 -86,-151.0,152 -87,-194.0,195 -88,-188.0,189 -89,-195.0,196 -90,-141.0,142 -91,-132.0,133 -92,-127.0,128 -93,-195.0,196 -94,-152.0,153 -95,-145.0,146 -96,-123.0,124 -97,-176.0,177 -98,-180.0,181 -99,-124.0,125 diff --git a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/config.yaml b/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/config.yaml deleted file mode 100644 index 33950ad..0000000 --- a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/config.yaml +++ /dev/null @@ -1,25 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: CartPole-v1 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: false - load_path: tasks - max_steps: 200 - mode: train - save_fig: true - seed: 1 - show_fig: false - test_eps: 10 - train_eps: 100 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - 
epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - hidden_dim: 256 - lr: 0.0001 - target_update: 800 diff --git a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/logs/log.txt b/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/logs/log.txt deleted file mode 100644 index 5b084be..0000000 --- a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/logs/log.txt +++ /dev/null @@ -1,116 +0,0 @@ -2022-10-31 00:12:01 - r - INFO: - n_states: 4, n_actions: 2 -2022-10-31 00:12:01 - r - INFO: - Start training! -2022-10-31 00:12:01 - r - INFO: - Env: CartPole-v1, Algorithm: DQN, Device: cuda -2022-10-31 00:12:04 - r - INFO: - Episode: 1/100, Reward: 18.0, Step: 18 -2022-10-31 00:12:04 - r - INFO: - Episode: 2/100, Reward: 35.0, Step: 35 -2022-10-31 00:12:04 - r - INFO: - Episode: 3/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:04 - r - INFO: - Episode: 4/100, Reward: 32.0, Step: 32 -2022-10-31 00:12:04 - r - INFO: - Episode: 5/100, Reward: 16.0, Step: 16 -2022-10-31 00:12:04 - r - INFO: - Current episode 5 has the best eval reward: 15.30 -2022-10-31 00:12:04 - r - INFO: - Episode: 6/100, Reward: 12.0, Step: 12 -2022-10-31 00:12:04 - r - INFO: - Episode: 7/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:04 - r - INFO: - Episode: 8/100, Reward: 15.0, Step: 15 -2022-10-31 00:12:04 - r - INFO: - Episode: 9/100, Reward: 11.0, Step: 11 -2022-10-31 00:12:04 - r - INFO: - Episode: 10/100, Reward: 15.0, Step: 15 -2022-10-31 00:12:04 - r - INFO: - Episode: 11/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:04 - r - INFO: - Episode: 12/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:04 - r - INFO: - Episode: 13/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:04 - r - INFO: - Episode: 14/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:04 - r - INFO: - Episode: 15/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:04 - r - INFO: - Episode: 16/100, Reward: 24.0, Step: 24 -2022-10-31 00:12:04 - r - INFO: - Episode: 17/100, Reward: 8.0, Step: 8 -2022-10-31 00:12:04 - 
r - INFO: - Episode: 18/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:04 - r - INFO: - Episode: 19/100, Reward: 11.0, Step: 11 -2022-10-31 00:12:04 - r - INFO: - Episode: 20/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:04 - r - INFO: - Episode: 21/100, Reward: 12.0, Step: 12 -2022-10-31 00:12:04 - r - INFO: - Episode: 22/100, Reward: 11.0, Step: 11 -2022-10-31 00:12:04 - r - INFO: - Episode: 23/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:04 - r - INFO: - Episode: 24/100, Reward: 21.0, Step: 21 -2022-10-31 00:12:05 - r - INFO: - Episode: 25/100, Reward: 14.0, Step: 14 -2022-10-31 00:12:05 - r - INFO: - Episode: 26/100, Reward: 12.0, Step: 12 -2022-10-31 00:12:05 - r - INFO: - Episode: 27/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:05 - r - INFO: - Episode: 28/100, Reward: 11.0, Step: 11 -2022-10-31 00:12:05 - r - INFO: - Episode: 29/100, Reward: 12.0, Step: 12 -2022-10-31 00:12:05 - r - INFO: - Episode: 30/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:05 - r - INFO: - Episode: 31/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:05 - r - INFO: - Episode: 32/100, Reward: 13.0, Step: 13 -2022-10-31 00:12:05 - r - INFO: - Episode: 33/100, Reward: 18.0, Step: 18 -2022-10-31 00:12:05 - r - INFO: - Episode: 34/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:05 - r - INFO: - Episode: 35/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:05 - r - INFO: - Episode: 36/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:05 - r - INFO: - Episode: 37/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:05 - r - INFO: - Episode: 38/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:05 - r - INFO: - Episode: 39/100, Reward: 10.0, Step: 10 -2022-10-31 00:12:05 - r - INFO: - Episode: 40/100, Reward: 8.0, Step: 8 -2022-10-31 00:12:06 - r - INFO: - Episode: 41/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:06 - r - INFO: - Episode: 42/100, Reward: 9.0, Step: 9 -2022-10-31 00:12:06 - r - INFO: - Episode: 43/100, Reward: 20.0, Step: 20 -2022-10-31 00:12:06 - r - INFO: - Episode: 44/100, Reward: 16.0, Step: 16 -2022-10-31 
00:12:06 - r - INFO: - Episode: 45/100, Reward: 17.0, Step: 17 -2022-10-31 00:12:06 - r - INFO: - Current episode 45 has the best eval reward: 17.50 -2022-10-31 00:12:06 - r - INFO: - Episode: 46/100, Reward: 17.0, Step: 17 -2022-10-31 00:12:06 - r - INFO: - Episode: 47/100, Reward: 17.0, Step: 17 -2022-10-31 00:12:06 - r - INFO: - Episode: 48/100, Reward: 18.0, Step: 18 -2022-10-31 00:12:06 - r - INFO: - Episode: 49/100, Reward: 25.0, Step: 25 -2022-10-31 00:12:06 - r - INFO: - Episode: 50/100, Reward: 31.0, Step: 31 -2022-10-31 00:12:06 - r - INFO: - Current episode 50 has the best eval reward: 24.80 -2022-10-31 00:12:06 - r - INFO: - Episode: 51/100, Reward: 22.0, Step: 22 -2022-10-31 00:12:06 - r - INFO: - Episode: 52/100, Reward: 39.0, Step: 39 -2022-10-31 00:12:06 - r - INFO: - Episode: 53/100, Reward: 36.0, Step: 36 -2022-10-31 00:12:06 - r - INFO: - Episode: 54/100, Reward: 26.0, Step: 26 -2022-10-31 00:12:07 - r - INFO: - Episode: 55/100, Reward: 33.0, Step: 33 -2022-10-31 00:12:07 - r - INFO: - Current episode 55 has the best eval reward: 38.70 -2022-10-31 00:12:07 - r - INFO: - Episode: 56/100, Reward: 56.0, Step: 56 -2022-10-31 00:12:07 - r - INFO: - Episode: 57/100, Reward: 112.0, Step: 112 -2022-10-31 00:12:07 - r - INFO: - Episode: 58/100, Reward: 101.0, Step: 101 -2022-10-31 00:12:08 - r - INFO: - Episode: 59/100, Reward: 69.0, Step: 69 -2022-10-31 00:12:08 - r - INFO: - Episode: 60/100, Reward: 75.0, Step: 75 -2022-10-31 00:12:08 - r - INFO: - Episode: 61/100, Reward: 182.0, Step: 182 -2022-10-31 00:12:09 - r - INFO: - Episode: 62/100, Reward: 52.0, Step: 52 -2022-10-31 00:12:09 - r - INFO: - Episode: 63/100, Reward: 67.0, Step: 67 -2022-10-31 00:12:09 - r - INFO: - Episode: 64/100, Reward: 53.0, Step: 53 -2022-10-31 00:12:09 - r - INFO: - Episode: 65/100, Reward: 119.0, Step: 119 -2022-10-31 00:12:10 - r - INFO: - Current episode 65 has the best eval reward: 171.90 -2022-10-31 00:12:10 - r - INFO: - Episode: 66/100, Reward: 200.0, Step: 200 
-2022-10-31 00:12:10 - r - INFO: - Episode: 67/100, Reward: 74.0, Step: 74 -2022-10-31 00:12:11 - r - INFO: - Episode: 68/100, Reward: 138.0, Step: 138 -2022-10-31 00:12:11 - r - INFO: - Episode: 69/100, Reward: 149.0, Step: 149 -2022-10-31 00:12:12 - r - INFO: - Episode: 70/100, Reward: 144.0, Step: 144 -2022-10-31 00:12:12 - r - INFO: - Current episode 70 has the best eval reward: 173.70 -2022-10-31 00:12:13 - r - INFO: - Episode: 71/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:13 - r - INFO: - Episode: 72/100, Reward: 198.0, Step: 198 -2022-10-31 00:12:14 - r - INFO: - Episode: 73/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:14 - r - INFO: - Episode: 74/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:15 - r - INFO: - Episode: 75/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:16 - r - INFO: - Current episode 75 has the best eval reward: 200.00 -2022-10-31 00:12:16 - r - INFO: - Episode: 76/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:17 - r - INFO: - Episode: 77/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:17 - r - INFO: - Episode: 78/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:18 - r - INFO: - Episode: 79/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:19 - r - INFO: - Episode: 80/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:19 - r - INFO: - Current episode 80 has the best eval reward: 200.00 -2022-10-31 00:12:20 - r - INFO: - Episode: 81/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:20 - r - INFO: - Episode: 82/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:21 - r - INFO: - Episode: 83/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:21 - r - INFO: - Episode: 84/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:22 - r - INFO: - Episode: 85/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:23 - r - INFO: - Current episode 85 has the best eval reward: 200.00 -2022-10-31 00:12:23 - r - INFO: - Episode: 86/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:24 - r - INFO: - Episode: 87/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:25 - r - INFO: - 
Episode: 88/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:25 - r - INFO: - Episode: 89/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:26 - r - INFO: - Episode: 90/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:27 - r - INFO: - Current episode 90 has the best eval reward: 200.00 -2022-10-31 00:12:27 - r - INFO: - Episode: 91/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:28 - r - INFO: - Episode: 92/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:28 - r - INFO: - Episode: 93/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:29 - r - INFO: - Episode: 94/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:29 - r - INFO: - Episode: 95/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:30 - r - INFO: - Current episode 95 has the best eval reward: 200.00 -2022-10-31 00:12:31 - r - INFO: - Episode: 96/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:31 - r - INFO: - Episode: 97/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:32 - r - INFO: - Episode: 98/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:32 - r - INFO: - Episode: 99/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:33 - r - INFO: - Episode: 100/100, Reward: 200.0, Step: 200 -2022-10-31 00:12:33 - r - INFO: - Current episode 100 has the best eval reward: 200.00 -2022-10-31 00:12:33 - r - INFO: - Finish training! 
diff --git a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/models/checkpoint.pt b/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/models/checkpoint.pt deleted file mode 100644 index 722eb69..0000000 Binary files a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/models/checkpoint.pt and /dev/null differ diff --git a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/results/learning_curve.png b/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/results/learning_curve.png deleted file mode 100644 index 331f645..0000000 Binary files a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/results/res.csv b/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/results/res.csv deleted file mode 100644 index 3bf53a3..0000000 --- a/projects/codes/DQN/Train_CartPole-v1_DQN_20221031-001201/results/res.csv +++ /dev/null @@ -1,101 +0,0 @@ -episodes,rewards,steps -0,18.0,18 -1,35.0,35 -2,13.0,13 -3,32.0,32 -4,16.0,16 -5,12.0,12 -6,13.0,13 -7,15.0,15 -8,11.0,11 -9,15.0,15 -10,9.0,9 -11,13.0,13 -12,13.0,13 -13,10.0,10 -14,9.0,9 -15,24.0,24 -16,8.0,8 -17,10.0,10 -18,11.0,11 -19,13.0,13 -20,12.0,12 -21,11.0,11 -22,9.0,9 -23,21.0,21 -24,14.0,14 -25,12.0,12 -26,9.0,9 -27,11.0,11 -28,12.0,12 -29,13.0,13 -30,10.0,10 -31,13.0,13 -32,18.0,18 -33,9.0,9 -34,10.0,10 -35,9.0,9 -36,10.0,10 -37,10.0,10 -38,10.0,10 -39,8.0,8 -40,9.0,9 -41,9.0,9 -42,20.0,20 -43,16.0,16 -44,17.0,17 -45,17.0,17 -46,17.0,17 -47,18.0,18 -48,25.0,25 -49,31.0,31 -50,22.0,22 -51,39.0,39 -52,36.0,36 -53,26.0,26 -54,33.0,33 -55,56.0,56 -56,112.0,112 -57,101.0,101 -58,69.0,69 -59,75.0,75 -60,182.0,182 -61,52.0,52 -62,67.0,67 -63,53.0,53 -64,119.0,119 -65,200.0,200 -66,74.0,74 -67,138.0,138 -68,149.0,149 -69,144.0,144 -70,200.0,200 -71,198.0,198 -72,200.0,200 -73,200.0,200 -74,200.0,200 -75,200.0,200 -76,200.0,200 -77,200.0,200 -78,200.0,200 -79,200.0,200 
-80,200.0,200 -81,200.0,200 -82,200.0,200 -83,200.0,200 -84,200.0,200 -85,200.0,200 -86,200.0,200 -87,200.0,200 -88,200.0,200 -89,200.0,200 -90,200.0,200 -91,200.0,200 -92,200.0,200 -93,200.0,200 -94,200.0,200 -95,200.0,200 -96,200.0,200 -97,200.0,200 -98,200.0,200 -99,200.0,200 diff --git a/projects/codes/DQN/config/Acrobot-v1_DQN_Test.yaml b/projects/codes/DQN/config/Acrobot-v1_DQN_Test.yaml deleted file mode 100644 index d6e8d84..0000000 --- a/projects/codes/DQN/config/Acrobot-v1_DQN_Test.yaml +++ /dev/null @@ -1,22 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: Acrobot-v1 - mode: test - load_checkpoint: true - load_path: Train_Acrobot-v1_DQN_20221026-094645 - max_steps: 100000 - save_fig: true - seed: 1 - show_fig: false - test_eps: 10 - train_eps: 100 -algo_cfg: - batch_size: 128 - buffer_size: 200000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.002 - target_update: 4 diff --git a/projects/codes/DQN/config/Acrobot-v1_DQN_Train.yaml b/projects/codes/DQN/config/Acrobot-v1_DQN_Train.yaml deleted file mode 100644 index 0b18e79..0000000 --- a/projects/codes/DQN/config/Acrobot-v1_DQN_Train.yaml +++ /dev/null @@ -1,22 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: Acrobot-v1 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 100000 - save_fig: true - seed: 1 - show_fig: false - test_eps: 10 - train_eps: 100 -algo_cfg: - batch_size: 128 - buffer_size: 200000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.002 - target_update: 4 diff --git a/projects/codes/DQN/config/CartPole-v1_DQN_Test.yaml b/projects/codes/DQN/config/CartPole-v1_DQN_Test.yaml deleted file mode 100644 index baa98f0..0000000 --- a/projects/codes/DQN/config/CartPole-v1_DQN_Test.yaml +++ /dev/null @@ -1,22 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: CartPole-v1 - mode: test - load_checkpoint: true - load_path: 
Train_CartPole-v1_DQN_20221031-001201 - max_steps: 200 - save_fig: true - seed: 0 - show_fig: false - test_eps: 10 - train_eps: 100 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.0001 - target_update: 4 diff --git a/projects/codes/DQN/config/CartPole-v1_DQN_Train.yaml b/projects/codes/DQN/config/CartPole-v1_DQN_Train.yaml deleted file mode 100644 index 14297b5..0000000 --- a/projects/codes/DQN/config/CartPole-v1_DQN_Train.yaml +++ /dev/null @@ -1,22 +0,0 @@ -general_cfg: - algo_name: DQN - device: cuda - env_name: CartPole-v1 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 0 - show_fig: false - test_eps: 10 - train_eps: 200 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.0001 - target_update: 4 diff --git a/projects/codes/DQN/config/config.py b/projects/codes/DQN/config/config.py deleted file mode 100644 index 2653c8d..0000000 --- a/projects/codes/DQN/config/config.py +++ /dev/null @@ -1,38 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-30 00:37:33 -LastEditor: JiangJi -LastEditTime: 2022-10-31 00:11:57 -Discription: default parameters of DQN -''' -from common.config import GeneralConfig,AlgoConfig -class GeneralConfigDQN(GeneralConfig): - def __init__(self) -> None: - self.env_name = "CartPole-v1" # name of environment - self.algo_name = "DQN" # name of algorithm - self.mode = "train" # train or test - self.seed = 1 # random seed - self.device = "cuda" # device to use - self.train_eps = 100 # number of episodes for training - self.test_eps = 10 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = "tasks" # path to load model - self.show_fig = False # show figure 
or not - self.save_fig = True # save figure or not - -class AlgoConfigDQN(AlgoConfig): - def __init__(self) -> None: - # set epsilon_start=epsilon_end can obtain fixed epsilon=epsilon_end - self.epsilon_start = 0.95 # epsilon start value - self.epsilon_end = 0.01 # epsilon end value - self.epsilon_decay = 500 # epsilon decay rate - self.hidden_dim = 256 # hidden_dim for MLP - self.gamma = 0.95 # discount factor - self.lr = 0.0001 # learning rate - self.buffer_size = 100000 # size of replay buffer - self.batch_size = 64 # batch size - self.target_update = 800 # target network update frequency per steps diff --git a/projects/codes/DQN/dqn.py b/projects/codes/DQN/dqn.py deleted file mode 100644 index 761d25f..0000000 --- a/projects/codes/DQN/dqn.py +++ /dev/null @@ -1,130 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-12 00:50:49 -@LastEditor: John -LastEditTime: 2022-10-31 00:07:19 -@Discription: -@Environment: python 3.7.7 -''' -'''off-policy -''' - -import torch -import torch.nn as nn -import torch.optim as optim -import random -import math -import numpy as np - -class DQN: - def __init__(self,model,memory,cfg): - - self.n_actions = cfg.n_actions - self.device = torch.device(cfg.device) - self.gamma = cfg.gamma - ## e-greedy parameters - self.sample_count = 0 # sample count for epsilon decay - self.epsilon = cfg.epsilon_start - self.sample_count = 0 - self.epsilon_start = cfg.epsilon_start - self.epsilon_end = cfg.epsilon_end - self.epsilon_decay = cfg.epsilon_decay - self.batch_size = cfg.batch_size - self.target_update = cfg.target_update - self.policy_net = model.to(self.device) - self.target_net = model.to(self.device) - ## copy parameters from policy net to target net - for target_param, param in zip(self.target_net.parameters(),self.policy_net.parameters()): - target_param.data.copy_(param.data) - # self.target_net.load_state_dict(self.policy_net.state_dict()) # or use this to copy parameters 
- self.optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg.lr) - self.memory = memory - self.update_flag = False - - def sample_action(self, state): - ''' sample action with e-greedy policy - ''' - self.sample_count += 1 - # epsilon must decay(linear,exponential and etc.) for balancing exploration and exploitation - self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \ - math.exp(-1. * self.sample_count / self.epsilon_decay) - if random.random() > self.epsilon: - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - q_values = self.policy_net(state) - action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value - else: - action = random.randrange(self.n_actions) - return action - # @torch.no_grad() - # def sample_action(self, state): - # ''' sample action with e-greedy policy - # ''' - # self.sample_count += 1 - # # epsilon must decay(linear,exponential and etc.) for balancing exploration and exploitation - # self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \ - # math.exp(-1. 
* self.sample_count / self.epsilon_decay) - # if random.random() > self.epsilon: - # state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - # q_values = self.policy_net(state) - # action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value - # else: - # action = random.randrange(self.n_actions) - # return action - def predict_action(self,state): - ''' predict action - ''' - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - q_values = self.policy_net(state) - action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value - return action - def update(self): - if len(self.memory) < self.batch_size: # when transitions in memory donot meet a batch, not update - return - else: - if not self.update_flag: - print("Begin to update!") - self.update_flag = True - # sample a batch of transitions from replay buffer - state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample( - self.batch_size) - state_batch = torch.tensor(np.array(state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - action_batch = torch.tensor(action_batch, device=self.device).unsqueeze(1) # shape(batchsize,1) - reward_batch = torch.tensor(reward_batch, device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize,1) - next_state_batch = torch.tensor(np.array(next_state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - done_batch = torch.tensor(np.float32(done_batch), device=self.device).unsqueeze(1) # shape(batchsize,1) - # print(state_batch.shape,action_batch.shape,reward_batch.shape,next_state_batch.shape,done_batch.shape) - # compute current Q(s_t,a), it is 'y_j' in pseucodes - q_value_batch = self.policy_net(state_batch).gather(dim=1, index=action_batch) # shape(batchsize,1),requires_grad=True - # print(q_values.requires_grad) - # compute max(Q(s_t+1,A_t+1)) 
respects to actions A, next_max_q_value comes from another net and is just regarded as constant for q update formula below, thus should detach to requires_grad=False - next_max_q_value_batch = self.target_net(next_state_batch).max(1)[0].detach().unsqueeze(1) - # print(q_values.shape,next_q_values.shape) - # compute expected q value, for terminal state, done_batch[0]=1, and expected_q_value=rewardcorrespondingly - expected_q_value_batch = reward_batch + self.gamma * next_max_q_value_batch* (1-done_batch) - # print(expected_q_value_batch.shape,expected_q_value_batch.requires_grad) - loss = nn.MSELoss()(q_value_batch, expected_q_value_batch) # shape same to - # backpropagation - self.optimizer.zero_grad() - loss.backward() - # clip to avoid gradient explosion - for param in self.policy_net.parameters(): - param.grad.data.clamp_(-1, 1) - self.optimizer.step() - if self.sample_count % self.target_update == 0: # target net update, target_update means "C" in pseucodes - self.target_net.load_state_dict(self.policy_net.state_dict()) - - def save_model(self, fpath): - from pathlib import Path - # create path - Path(fpath).mkdir(parents=True, exist_ok=True) - torch.save(self.target_net.state_dict(), f"{fpath}/checkpoint.pt") - - def load_model(self, fpath): - self.target_net.load_state_dict(torch.load(f"{fpath}/checkpoint.pt")) - for target_param, param in zip(self.target_net.parameters(), self.policy_net.parameters()): - param.data.copy_(target_param.data) diff --git a/projects/codes/DQN/task0.py b/projects/codes/DQN/task0.py deleted file mode 100644 index e69ed45..0000000 --- a/projects/codes/DQN/task0.py +++ /dev/null @@ -1,138 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-12 11:09:54 -LastEditor: JiangJi -LastEditTime: 2022-10-31 00:13:31 -Discription: CartPole-v1,Acrobot-v1 -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) 
# parent path -sys.path.append(parent_path) # add to system path -import gym -from common.utils import all_seed,merge_class_attrs -from common.models import MLP -from common.memories import ReplayBuffer -from common.launcher import Launcher -from envs.register import register_env -from dqn import DQN -from config.config import GeneralConfigDQN,AlgoConfigDQN -class Main(Launcher): - def __init__(self) -> None: - super().__init__() - self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigDQN()) - self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigDQN()) - def env_agent_config(self,cfg,logger): - ''' create env and agent - ''' - register_env(cfg.env_name) - env = gym.make(cfg.env_name,new_step_api=True) # create env - if cfg.seed !=0: # set random seed - all_seed(env,seed=cfg.seed) - try: # state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info - # update to cfg paramters - setattr(cfg, 'n_states', n_states) - setattr(cfg, 'n_actions', n_actions) - # cfg.update({"n_states":n_states,"n_actions":n_actions}) # update to cfg paramters - model = MLP(n_states,n_actions,hidden_dim=cfg.hidden_dim) - memory = ReplayBuffer(cfg.buffer_size) # replay buffer - agent = DQN(model,memory,cfg) # create agent - return env, agent - def train_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - ep_step += 1 - action = agent.sample_action(state) # sample action - next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions under new_step_api of OpenAI Gym - agent.memory.push(state, action, 
reward, - next_state, terminated) # save transitions - agent.update() # update agent - state = next_state # update next state for env - ep_reward += reward # - if terminated: - break - return agent,ep_reward,ep_step - def test_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - ep_step += 1 - action = agent.predict_action(state) # predict action - next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions under new_step_api of OpenAI Gym - state = next_state # update next state for env - ep_reward += reward # - if terminated: - break - return agent,ep_reward,ep_step - # def train(self,env, agent,cfg,logger): - # ''' train - # ''' - # logger.info("Start training!") - # logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - # rewards = [] # record rewards for all episodes - # steps = [] # record steps for all episodes - # for i_ep in range(cfg.train_eps): - # ep_reward = 0 # reward per episode - # ep_step = 0 - # state = env.reset() # reset and obtain initial state - # for _ in range(cfg.max_steps): - # ep_step += 1 - # action = agent.sample_action(state) # sample action - # next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions under new_step_api of OpenAI Gym - # agent.memory.push(state, action, reward, - # next_state, terminated) # save transitions - # state = next_state # update next state for env - # agent.update() # update agent - # ep_reward += reward # - # if terminated: - # break - # if (i_ep + 1) % cfg.target_update == 0: # target net update, target_update means "C" in pseudocode - # agent.target_net.load_state_dict(agent.policy_net.state_dict()) - # steps.append(ep_step) - # rewards.append(ep_reward) - # logger.info(f'Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}: Epsilon: {agent.epsilon:.3f}') -
logger.info("Finish training!") - # env.close() - # res_dic = {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - # return res_dic - - # def test(self,cfg, env, agent,logger): - # logger.info("Start testing!") - # logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - # rewards = [] # record rewards for all episodes - # steps = [] # record steps for all episodes - # for i_ep in range(cfg.test_eps): - # ep_reward = 0 # reward per episode - # ep_step = 0 - # state = env.reset() # reset and obtain initial state - # for _ in range(cfg.max_steps): - # ep_step+=1 - # action = agent.predict_action(state) # predict action - # next_state, reward, terminated, _, _ = env.step(action) - # state = next_state - # ep_reward += reward - # if terminated: - # break - # steps.append(ep_step) - # rewards.append(ep_reward) - # logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}") - # logger.info("Finish testing!") - # env.close() - # return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - - -if __name__ == "__main__": - main = Main() - main.run() - diff --git a/projects/codes/DQN/task1.py b/projects/codes/DQN/task1.py deleted file mode 100644 index 590d0c2..0000000 --- a/projects/codes/DQN/task1.py +++ /dev/null @@ -1,221 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-24 08:21:31 -LastEditor: JiangJi -LastEditTime: 2022-10-26 09:50:49 -Discription: Not finished -''' -import sys,os -os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized." 
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add path to system path - -import gym -import torch -import datetime -import numpy as np -import argparse -from common.utils import all_seed -from common.models import MLP -from common.memories import ReplayBuffer -from common.launcher import Launcher -from envs.register import register_env -from dqn import DQN -import torch.nn as nn -import torch.nn.functional as F -import torchvision.transforms as T -from PIL import Image -resize = T.Compose([T.ToPILImage(), - T.Resize(40, interpolation=Image.CUBIC), - T.ToTensor()]) - -# xvfb-run -s "-screen 0 640x480x24" python main1.py -def get_cart_location(env,screen_width): - world_width = env.x_threshold * 2 - scale = screen_width / world_width - return int(env.state[0] * scale + screen_width / 2.0) # MIDDLE OF CART - -def get_screen(env): - # Returned screen requested by gym is 400x600x3, but is sometimes larger - # such as 800x1200x3. Transpose it into torch order (CHW). 
- screen = env.render().transpose((2, 0, 1)) - # Cart is in the lower half, so strip off the top and bottom of the screen - _, screen_height, screen_width = screen.shape - screen = screen[:, int(screen_height*0.4):int(screen_height * 0.8)] - view_width = int(screen_width * 0.6) - cart_location = get_cart_location(env,screen_width) - if cart_location < view_width // 2: - slice_range = slice(view_width) - elif cart_location > (screen_width - view_width // 2): - slice_range = slice(-view_width, None) - else: - slice_range = slice(cart_location - view_width // 2, - cart_location + view_width // 2) - # Strip off the edges, so that we have a square image centered on a cart - screen = screen[:, :, slice_range] - # Convert to float, rescale, convert to torch tensor - # (this doesn't require a copy) - screen = np.ascontiguousarray(screen, dtype=np.float32) / 255 - screen = torch.from_numpy(screen) - # Resize, and add a batch dimension (BCHW) - return resize(screen) - - -class CNN(nn.Module): - - def __init__(self, h, w, outputs): - super(CNN, self).__init__() - self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2) - self.bn1 = nn.BatchNorm2d(16) - self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2) - self.bn2 = nn.BatchNorm2d(32) - self.conv3 = nn.Conv2d(32, 32, kernel_size=5, stride=2) - self.bn3 = nn.BatchNorm2d(32) - - # Number of Linear input connections depends on output of conv2d layers - # and therefore the input image size, so compute it. - def conv2d_size_out(size, kernel_size = 5, stride = 2): - return (size - (kernel_size - 1) - 1) // stride + 1 - convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w))) - convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h))) - linear_input_size = convw * convh * 32 - self.head = nn.Linear(linear_input_size, outputs) - - # Called with either one element to determine next action, or a batch - # during optimization. Returns tensor([[left0exp,right0exp]...]). 
- def forward(self, x): - x = F.relu(self.bn1(self.conv1(x))) - x = F.relu(self.bn2(self.conv2(x))) - x = F.relu(self.bn3(self.conv3(x))) - return self.head(x.view(x.size(0), -1)) -class Main(Launcher): - def get_args(self): - """ hyperparameters - """ - curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - parser = argparse.ArgumentParser(description="hyperparameters") - parser.add_argument('--algo_name',default='DQN',type=str,help="name of algorithm") - parser.add_argument('--env_name',default='CartPole-v1',type=str,help="name of environment") - parser.add_argument('--train_eps',default=800,type=int,help="episodes of training") - parser.add_argument('--test_eps',default=20,type=int,help="episodes of testing") - parser.add_argument('--ep_max_steps',default = 100000,type=int,help="steps per episode, much larger value can simulate infinite steps") - parser.add_argument('--gamma',default=0.999,type=float,help="discounted factor") - parser.add_argument('--epsilon_start',default=0.95,type=float,help="initial value of epsilon") - parser.add_argument('--epsilon_end',default=0.01,type=float,help="final value of epsilon") - parser.add_argument('--epsilon_decay',default=500,type=int,help="decay rate of epsilon, the higher value, the slower decay") - parser.add_argument('--lr',default=0.0001,type=float,help="learning rate") - parser.add_argument('--memory_capacity',default=100000,type=int,help="memory capacity") - parser.add_argument('--batch_size',default=128,type=int) - parser.add_argument('--target_update',default=4,type=int) - parser.add_argument('--hidden_dim',default=256,type=int) - parser.add_argument('--device',default='cuda',type=str,help="cpu or cuda") - parser.add_argument('--seed',default=10,type=int,help="seed") - parser.add_argument('--show_fig',default=False,type=bool,help="if show figure or not") - parser.add_argument('--save_fig',default=True,type=bool,help="if save figure or not") - # please manually change the following args 
in this script if you want - parser.add_argument('--result_path',default=curr_path + "/outputs/" + parser.parse_args().env_name + \ - '/' + curr_time + '/results' ) - parser.add_argument('--model_path',default=curr_path + "/outputs/" + parser.parse_args().env_name + \ - '/' + curr_time + '/models' ) - args = parser.parse_args() - args = {**vars(args)} # type(dict) - return args - - def env_agent_config(self,cfg): - ''' create env and agent - ''' - env = gym.make('CartPole-v1', new_step_api=True, render_mode='single_rgb_array').unwrapped - if cfg['seed'] !=0: # set random seed - all_seed(env,seed=cfg["seed"]) - try: # state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - print(f"n_states: {n_states}, n_actions: {n_actions}") - cfg.update({"n_states":n_states,"n_actions":n_actions}) # update to cfg parameters - env.reset() - init_screen = get_screen(env) - _, screen_height, screen_width = init_screen.shape - model = CNN(screen_height, screen_width, n_actions) - memory = ReplayBuffer(cfg["memory_capacity"]) # replay buffer - agent = DQN(model,memory,cfg) # create agent - return env, agent - - def train(self,cfg, env, agent): - ''' train - ''' - print("Start training!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] # record rewards for all episodes - steps = [] - for i_ep in range(cfg["train_eps"]): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - last_screen = get_screen(env) - current_screen = get_screen(env) - state = current_screen - last_screen - for _ in range(cfg['ep_max_steps']): - ep_step += 1 - action = agent.sample_action(state) # sample action - _, reward, done, _,_ = env.step(action) # update env and return transitions -
last_screen = current_screen - current_screen = get_screen(env) - next_state = current_screen - last_screen - agent.memory.push(state.cpu().numpy(), action, reward, - next_state.cpu().numpy(), done) # save transitions - state = next_state # update next state for env - agent.update() # update agent - ep_reward += reward # - if done: - break - if (i_ep + 1) % cfg["target_update"] == 0: # target net update, target_update means "C" in pseucodes - agent.target_net.load_state_dict(agent.policy_net.state_dict()) - steps.append(ep_step) - rewards.append(ep_reward) - if (i_ep + 1) % 10 == 0: - print(f'Episode: {i_ep+1}/{cfg["train_eps"]}, Reward: {ep_reward:.2f}, step: {ep_step:d}, Epislon: {agent.epsilon:.3f}') - print("Finish training!") - env.close() - res_dic = {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - return res_dic - - def test(self,cfg, env, agent): - print("Start testing!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] # record rewards for all episodes - steps = [] - for i_ep in range(cfg['test_eps']): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - last_screen = get_screen(env) - current_screen = get_screen(env) - state = current_screen - last_screen - for _ in range(cfg['ep_max_steps']): - ep_step+=1 - action = agent.predict_action(state) # predict action - _, reward, done, _,_ = env.step(action) - last_screen = current_screen - current_screen = get_screen(env) - next_state = current_screen - last_screen - state = next_state - ep_reward += reward - if done: - break - steps.append(ep_step) - rewards.append(ep_reward) - print(f"Episode: {i_ep+1}/{cfg['test_eps']},Reward: {ep_reward:.2f}") - print("Finish testing!") - env.close() - return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - - -if __name__ == "__main__": - main = Main() - main.run() diff --git a/projects/codes/DoubleDQN/README.md 
b/projects/codes/DoubleDQN/README.md deleted file mode 100644 index 714bd26..0000000 --- a/projects/codes/DoubleDQN/README.md +++ /dev/null @@ -1,39 +0,0 @@ -Before reading this, you should be familiar with DQN; see [DQN in practice](../DQN). - -## Principle - -Double DQN was proposed in 2016, inspired by Double Q-learning (2010); see the paper [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461). -Like Nature DQN, Double DQN uses two networks: a current network (denoted $Q$) and a target network (usually denoted $Q'$; to keep them apart, we write $Q_{tar}$ below). Recall that, for a non-terminal state, the target $Q_{tar}$ value is computed as follows: -![Nature DQN target value](assets/20201222145725907.png) - -In Double DQN, rather than taking the maximum $Q_{tar}$ value over all actions directly from the target network, we first use the current network $Q$ to select the action with the highest $Q$ value, and then evaluate that action with the target network: -![Double DQN target value](assets/20201222150225327.png) -The advantage of Double DQN is as follows. The max in Nature DQN quickly pulls Q-values toward a plausible target, but it easily overshoots and leads to overestimation, meaning the learned Q-values are systematically biased upward. To address this, DDQN decouples two steps: selecting the action used in the target, and computing the target Q-value for that action. This reduces overestimation; see the original paper for details. - -The pseudocode is as follows: -![Double DQN pseudocode](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png) -Alternatively, each of the two networks can act as both the current network and the target network, as follows: -![Double DQN pseudocode with two networks swapping roles](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837146.png) -Or, written this way, it may be easier to see how each network serves as both the current and the target network: -![Double DQN pseudocode, alternative presentation](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837157.png) - -## Code walkthrough -The full program is available on [github](https://github.com/JohnJim0816/reinforcement-learning-tutorials/tree/master/DoubleDQN). Given the principle above, Double DQN is a small change: only a few lines in ```update``` need to be modified, as follows: -```python -'''The following is how Nature DQN computes q_target -next_q_state_value = self.target_net( -next_state_batch).max(1)[0].detach() # max over actions of Q'(s_{t+1}) for every next state, where Q' is the target network's Q-function, e.g. tensor([ 0.0060, -0.0171,...,]) -# compute q_target -# for a terminal state done_batch=1, so the corresponding expected_q_value equals the reward -q_target = reward_batch + self.gamma * next_q_state_value * (1-done_batch) -''' -'''The following is how Double DQN computes q_target, slightly different from Nature DQN'''
-next_q_values = self.policy_net(next_state_batch).detach() # Q(s_{t+1}, a) from the current network, used only to select the action -next_target_values = self.target_net(next_state_batch).detach() # Q'(s_{t+1}, a) from the target network, used to evaluate the selected action -# take a* = argmax_a Q(s_{t+1}, a) and look it up in next_target_values to get Q'(s_{t+1}, a*) -next_target_q_value = next_target_values.gather(1, torch.max(next_q_values, 1)[1].unsqueeze(1)).squeeze(1) -q_target = reward_batch + self.gamma * next_target_q_value * (1-done_batch) -``` -The reward curves are shown below: -![Training and testing reward curves](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837128.png) -The lower blue and red curves show the training rewards of Double DQN and Nature DQN respectively, while the upper blue and green curves show their testing rewards. \ No newline at end of file diff --git a/projects/codes/DoubleDQN/assets/20201222145725907.png b/projects/codes/DoubleDQN/assets/20201222145725907.png deleted file mode 100644 index d2cbb2d..0000000 Binary files a/projects/codes/DoubleDQN/assets/20201222145725907.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/assets/20201222150225327.png b/projects/codes/DoubleDQN/assets/20201222150225327.png deleted file mode 100644 index 20b79be..0000000 Binary files a/projects/codes/DoubleDQN/assets/20201222150225327.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837128.png b/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837128.png deleted file mode 100644 index 427a903..0000000 Binary files a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837128.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837146.png
b/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837146.png deleted file mode 100644 index d95f900..0000000 Binary files a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837146.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837157.png b/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837157.png deleted file mode 100644 index ddeda96..0000000 Binary files a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210328110837157.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png b/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png deleted file mode 100644 index dec19e5..0000000 Binary files a/projects/codes/DoubleDQN/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/double_dqn.py b/projects/codes/DoubleDQN/double_dqn.py deleted file mode 100644 index b7f4e97..0000000 --- a/projects/codes/DoubleDQN/double_dqn.py +++ /dev/null @@ -1,106 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-12 00:50:49 -@LastEditor: John -LastEditTime: 2022-08-29 23:34:20 
-@Discription: -@Environment: python 3.7.7 -''' -'''off-policy -''' - - -import torch -import torch.nn as nn -import torch.optim as optim -import torch.nn.functional as F -import random -import math -import numpy as np -class DoubleDQN: - def __init__(self,models, memories, cfg): - self.n_actions = cfg['n_actions'] - self.device = torch.device(cfg['device']) - self.gamma = cfg['gamma'] - ## e-greedy parameters - self.sample_count = 0 # sample count for epsilon decay - self.epsilon_start = cfg['epsilon_start'] - self.epsilon_end = cfg['epsilon_end'] - self.epsilon_decay = cfg['epsilon_decay'] - self.batch_size = cfg['batch_size'] - self.policy_net = models['Qnet'].to(self.device) - self.target_net = models['Qnet'].to(self.device) - # target_net copy from policy_net - for target_param, param in zip(self.target_net.parameters(), self.policy_net.parameters()): - target_param.data.copy_(param.data) - # self.target_net.eval() # donnot use BatchNormalization or Dropout - # the difference between parameters() and state_dict() is that parameters() require_grad=True - self.optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg['lr']) - self.memory = memories['Memory'] - self.update_flag = False - - def sample_action(self, state): - ''' sample action - ''' - self.sample_count += 1 - self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * math.exp(-1. 
* self.sample_count / self.epsilon_decay) - if random.random() > self.epsilon: - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(0) - q_value = self.policy_net(state) - action = q_value.max(1)[1].item() - else: - action = random.randrange(self.n_actions) - return action - def predict_action(self, state): - ''' predict action - ''' - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(0) - q_value = self.policy_net(state) - action = q_value.max(1)[1].item() - return action - def update(self): - if len(self.memory) < self.batch_size: # when transitions in memory donot meet a batch, not update - return - else: - if not self.update_flag: - print("Begin to update!") - self.update_flag = True - # sample a batch of transitions from replay buffer - state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample(self.batch_size) - # convert to tensor - state_batch = torch.tensor(np.array(state_batch), device=self.device, dtype=torch.float) - action_batch = torch.tensor(action_batch, device=self.device).unsqueeze(1) # shape(batchsize,1) - reward_batch = torch.tensor(reward_batch, device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize,1) - next_state_batch = torch.tensor(np.array(next_state_batch), device=self.device, dtype=torch.float) - done_batch = torch.tensor(np.float32(done_batch), device=self.device).unsqueeze(1) # shape(batchsize,1) - # compute current Q(s_t|a=a_t) - q_value_batch = self.policy_net(state_batch).gather(dim=1, index=action_batch) # shape(batchsize,1),requires_grad=True - next_q_value_batch = self.policy_net(next_state_batch) - '''the following is the way of computing Double DQN expected_q_value,a bit different from Nature DQN''' - next_target_value_batch = self.target_net(next_state_batch) - # choose action a from Q(s_t‘, a), next_target_values obtain next_q_value,which is Q’(s_t|a=argmax Q(s_t‘, a)) - 
next_target_q_value_batch = next_target_value_batch.gather(1, torch.max(next_q_value_batch, 1)[1].unsqueeze(1)) # shape(batchsize,1) - expected_q_value_batch = reward_batch + self.gamma * next_target_q_value_batch * (1-done_batch) - loss = nn.MSELoss()(q_value_batch , expected_q_value_batch) - self.optimizer.zero_grad() - loss.backward() - # clip to avoid gradient explosion - for param in self.policy_net.parameters(): - param.grad.data.clamp_(-1, 1) - self.optimizer.step() - - def save_model(self,path): - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save(self.target_net.state_dict(), path+'checkpoint.pth') - - def load_model(self,path): - self.target_net.load_state_dict(torch.load(path+'checkpoint.pth')) - for target_param, param in zip(self.target_net.parameters(), self.policy_net.parameters()): - param.data.copy_(target_param.data) diff --git a/projects/codes/DoubleDQN/main.py b/projects/codes/DoubleDQN/main.py deleted file mode 100644 index a66025e..0000000 --- a/projects/codes/DoubleDQN/main.py +++ /dev/null @@ -1,129 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-11-07 18:10:37 -LastEditor: JiangJi -LastEditTime: 2022-08-29 23:33:31 -Discription: -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add to system path - -import gym -import datetime -import argparse - -from common.utils import all_seed -from common.models import MLP -from common.memories import ReplayBufferQue -from DoubleDQN.double_dqn import DoubleDQN -from common.launcher import Launcher -from envs.register import register_env -class Main(Launcher): - def get_args(self): - ''' hyperparameters - ''' - curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - parser = argparse.ArgumentParser(description="hyperparameters") - 
parser.add_argument('--algo_name',default='DoubleDQN',type=str,help="name of algorithm") - parser.add_argument('--env_name',default='CartPole-v0',type=str,help="name of environment") - parser.add_argument('--train_eps',default=200,type=int,help="episodes of training") - parser.add_argument('--test_eps',default=20,type=int,help="episodes of testing") - parser.add_argument('--ep_max_steps',default = 100000,type=int,help="steps per episode, much larger value can simulate infinite steps") - parser.add_argument('--gamma',default=0.95,type=float,help="discounted factor") - parser.add_argument('--epsilon_start',default=0.95,type=float,help="initial value of epsilon") - parser.add_argument('--epsilon_end',default=0.01,type=float,help="final value of epsilon") - parser.add_argument('--epsilon_decay',default=500,type=int,help="decay rate of epsilon") - parser.add_argument('--lr',default=0.0001,type=float,help="learning rate") - parser.add_argument('--memory_capacity',default=100000,type=int,help="memory capacity") - parser.add_argument('--batch_size',default=64,type=int) - parser.add_argument('--target_update',default=4,type=int) - parser.add_argument('--hidden_dim',default=256,type=int) - parser.add_argument('--device',default='cpu',type=str,help="cpu or cuda") - parser.add_argument('--seed',default=1,type=int,help="seed") - parser.add_argument('--show_fig',default=False,type=bool,help="if show figure or not") - parser.add_argument('--save_fig',default=True,type=bool,help="if save figure or not") - args = parser.parse_args() - default_args = {'result_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/results/", - 'model_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/models/", - } - args = {**vars(args),**default_args} # type(dict) - return args - def env_agent_config(self,cfg): - ''' create env and agent - ''' - register_env(cfg['env_name']) - env = gym.make(cfg['env_name']) - if cfg['seed'] !=0: # set random seed - all_seed(env,seed=cfg["seed"]) - try: # 
state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - print(f"n_states: {n_states}, n_actions: {n_actions}") - cfg.update({"n_states":n_states,"n_actions":n_actions}) # update to cfg paramters - models = {'Qnet':MLP(n_states,n_actions,hidden_dim=cfg['hidden_dim'])} - memories = {'Memory':ReplayBufferQue(cfg['memory_capacity'])} - agent = DoubleDQN(models,memories,cfg) - return env,agent - - def train(self,cfg,env,agent): - print("Start training!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] # record rewards for all episodes - steps = [] - for i_ep in range(cfg["train_eps"]): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg['ep_max_steps']): - action = agent.sample_action(state) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - agent.memory.push((state, action, reward, next_state, done)) - state = next_state - agent.update() - if done: - break - if i_ep % cfg['target_update'] == 0: - agent.target_net.load_state_dict(agent.policy_net.state_dict()) - steps.append(ep_step) - rewards.append(ep_reward) - if (i_ep+1)%10 == 0: - print(f'Episode: {i_ep+1}/{cfg["train_eps"]}, Reward: {ep_reward:.2f}: Epislon: {agent.epsilon:.3f}') - print("Finish training!") - env.close() - res_dic = {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - return res_dic - - def test(self,cfg,env,agent): - print("Start testing!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] # record rewards for all episodes - steps = [] - for i_ep in range(cfg['test_eps']): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain 
initial state - for ep_step in range(1, cfg['ep_max_steps'] + 1): # ep_step counts the steps taken in this episode - action = agent.predict_action(state) - next_state, reward, done, _ = env.step(action) - state = next_state - ep_reward += reward - if done: - break - steps.append(ep_step) - rewards.append(ep_reward) - print(f"Episode: {i_ep+1}/{cfg['test_eps']}, Reward: {ep_reward:.2f}") - print("Finish testing!") - env.close() - return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - -if __name__ == "__main__": - main = Main() - main.run() diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/models/checkpoint.pth b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/models/checkpoint.pth deleted file mode 100644 index d402ba1..0000000 Binary files a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/models/checkpoint.pth and /dev/null differ diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/params.json b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/params.json deleted file mode 100644 index 91df006..0000000 --- a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/params.json +++ /dev/null @@ -1 +0,0 @@ -{"algo_name": "DoubleDQN", "env_name": "CartPole-v0", "train_eps": 200, "test_eps": 20, "ep_max_steps": 100000, "gamma": 0.95, "epsilon_start": 0.95, "epsilon_end": 0.01, "epsilon_decay": 500, "lr": 0.0001, "memory_capacity": 100000, "batch_size": 64, "target_update": 4, "hidden_dim": 256, "device": "cpu", "seed": 1, "show_fig": false, "save_fig": true, "result_path": "c:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\DoubleDQN/outputs/CartPole-v0/20220829-233435/results/", "model_path": "c:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\DoubleDQN/outputs/CartPole-v0/20220829-233435/models/", "n_states": 4, "n_actions": 2} \ No newline at end of file diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/testing_curve.png
b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/testing_curve.png deleted file mode 100644 index fe21c95..0000000 Binary files a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/testing_curve.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/testing_results.csv b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/testing_results.csv deleted file mode 100644 index 2a504ee..0000000 --- a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/testing_results.csv +++ /dev/null @@ -1,21 +0,0 @@ -episodes,rewards,steps -0,145.0,0 -1,166.0,0 -2,171.0,0 -3,200.0,0 -4,139.0,0 -5,200.0,0 -6,200.0,0 -7,141.0,0 -8,200.0,0 -9,187.0,0 -10,166.0,0 -11,172.0,0 -12,121.0,0 -13,200.0,0 -14,200.0,0 -15,149.0,0 -16,128.0,0 -17,200.0,0 -18,178.0,0 -19,185.0,0 diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/training_curve.png b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/training_curve.png deleted file mode 100644 index a8475ea..0000000 Binary files a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/training_curve.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/training_results.csv b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/training_results.csv deleted file mode 100644 index 8f87049..0000000 --- a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233435/results/training_results.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards,steps -0,19.0,0 -1,16.0,0 -2,17.0,0 -3,11.0,0 -4,10.0,0 -5,27.0,0 -6,16.0,0 -7,9.0,0 -8,20.0,0 -9,21.0,0 -10,15.0,0 -11,10.0,0 -12,14.0,0 -13,37.0,0 -14,12.0,0 -15,10.0,0 -16,27.0,0 -17,33.0,0 -18,19.0,0 -19,13.0,0 -20,26.0,0 -21,15.0,0 -22,29.0,0 -23,11.0,0 -24,20.0,0 -25,23.0,0 -26,23.0,0 -27,26.0,0 -28,17.0,0 -29,33.0,0 -30,16.0,0 -31,48.0,0 -32,48.0,0 
-33,69.0,0 -34,58.0,0 -35,24.0,0 -36,18.0,0 -37,28.0,0 -38,12.0,0 -39,12.0,0 -40,18.0,0 -41,12.0,0 -42,13.0,0 -43,21.0,0 -44,30.0,0 -45,32.0,0 -46,22.0,0 -47,18.0,0 -48,12.0,0 -49,12.0,0 -50,20.0,0 -51,32.0,0 -52,15.0,0 -53,100.0,0 -54,26.0,0 -55,25.0,0 -56,18.0,0 -57,15.0,0 -58,35.0,0 -59,12.0,0 -60,65.0,0 -61,27.0,0 -62,29.0,0 -63,22.0,0 -64,83.0,0 -65,24.0,0 -66,28.0,0 -67,15.0,0 -68,43.0,0 -69,13.0,0 -70,22.0,0 -71,46.0,0 -72,14.0,0 -73,32.0,0 -74,44.0,0 -75,53.0,0 -76,31.0,0 -77,51.0,0 -78,61.0,0 -79,30.0,0 -80,36.0,0 -81,30.0,0 -82,48.0,0 -83,26.0,0 -84,27.0,0 -85,43.0,0 -86,20.0,0 -87,87.0,0 -88,71.0,0 -89,43.0,0 -90,57.0,0 -91,40.0,0 -92,37.0,0 -93,43.0,0 -94,31.0,0 -95,45.0,0 -96,47.0,0 -97,52.0,0 -98,48.0,0 -99,98.0,0 -100,49.0,0 -101,98.0,0 -102,68.0,0 -103,70.0,0 -104,74.0,0 -105,73.0,0 -106,127.0,0 -107,92.0,0 -108,70.0,0 -109,97.0,0 -110,66.0,0 -111,112.0,0 -112,138.0,0 -113,81.0,0 -114,74.0,0 -115,153.0,0 -116,113.0,0 -117,88.0,0 -118,138.0,0 -119,200.0,0 -120,84.0,0 -121,123.0,0 -122,158.0,0 -123,171.0,0 -124,137.0,0 -125,143.0,0 -126,170.0,0 -127,127.0,0 -128,118.0,0 -129,200.0,0 -130,189.0,0 -131,149.0,0 -132,137.0,0 -133,115.0,0 -134,153.0,0 -135,136.0,0 -136,140.0,0 -137,169.0,0 -138,187.0,0 -139,200.0,0 -140,196.0,0 -141,200.0,0 -142,200.0,0 -143,137.0,0 -144,200.0,0 -145,185.0,0 -146,200.0,0 -147,164.0,0 -148,200.0,0 -149,143.0,0 -150,143.0,0 -151,112.0,0 -152,192.0,0 -153,200.0,0 -154,144.0,0 -155,188.0,0 -156,200.0,0 -157,133.0,0 -158,200.0,0 -159,143.0,0 -160,158.0,0 -161,161.0,0 -162,169.0,0 -163,176.0,0 -164,200.0,0 -165,149.0,0 -166,156.0,0 -167,200.0,0 -168,200.0,0 -169,200.0,0 -170,134.0,0 -171,171.0,0 -172,200.0,0 -173,200.0,0 -174,200.0,0 -175,194.0,0 -176,200.0,0 -177,138.0,0 -178,159.0,0 -179,187.0,0 -180,200.0,0 -181,192.0,0 -182,200.0,0 -183,200.0,0 -184,200.0,0 -185,173.0,0 -186,200.0,0 -187,178.0,0 -188,176.0,0 -189,196.0,0 -190,200.0,0 -191,195.0,0 -192,158.0,0 -193,156.0,0 -194,200.0,0 -195,200.0,0 -196,200.0,0 -197,200.0,0 
-198,193.0,0 -199,200.0,0 diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/models/checkpoint.pth b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/models/checkpoint.pth deleted file mode 100644 index 01e8c46..0000000 Binary files a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/models/checkpoint.pth and /dev/null differ diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/params.json b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/params.json deleted file mode 100644 index 2d2c2ca..0000000 --- a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/params.json +++ /dev/null @@ -1 +0,0 @@ -{"algo_name": "DoubleDQN", "env_name": "CartPole-v0", "train_eps": 200, "test_eps": 20, "ep_max_steps": 100000, "gamma": 0.95, "epsilon_start": 0.95, "epsilon_end": 0.01, "epsilon_decay": 500, "lr": 0.0001, "memory_capacity": 100000, "batch_size": 64, "target_update": 4, "hidden_dim": 256, "device": "cuda", "seed": 1, "show_fig": false, "save_fig": true, "result_path": "C:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\DoubleDQN/outputs/CartPole-v0/20220829-233635/results/", "model_path": "C:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\DoubleDQN/outputs/CartPole-v0/20220829-233635/models/", "n_states": 4, "n_actions": 2} \ No newline at end of file diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/testing_curve.png b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/testing_curve.png deleted file mode 100644 index 288ee92..0000000 Binary files a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/testing_curve.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/testing_results.csv b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/testing_results.csv deleted file mode 100644 index 6e8adb7..0000000 --- 
a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/testing_results.csv +++ /dev/null @@ -1,21 +0,0 @@ -episodes,rewards,steps -0,200.0,0 -1,200.0,0 -2,200.0,0 -3,200.0,0 -4,191.0,0 -5,200.0,0 -6,200.0,0 -7,179.0,0 -8,200.0,0 -9,200.0,0 -10,200.0,0 -11,190.0,0 -12,147.0,0 -13,197.0,0 -14,200.0,0 -15,200.0,0 -16,167.0,0 -17,200.0,0 -18,200.0,0 -19,200.0,0 diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/training_curve.png b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/training_curve.png deleted file mode 100644 index 544de6e..0000000 Binary files a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/training_curve.png and /dev/null differ diff --git a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/training_results.csv b/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/training_results.csv deleted file mode 100644 index 67bdb9e..0000000 --- a/projects/codes/DoubleDQN/outputs/CartPole-v0/20220829-233635/results/training_results.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards,steps -0,19.0,0 -1,16.0,0 -2,17.0,0 -3,11.0,0 -4,10.0,0 -5,27.0,0 -6,55.0,0 -7,17.0,0 -8,23.0,0 -9,9.0,0 -10,17.0,0 -11,14.0,0 -12,17.0,0 -13,12.0,0 -14,14.0,0 -15,16.0,0 -16,27.0,0 -17,36.0,0 -18,17.0,0 -19,17.0,0 -20,21.0,0 -21,23.0,0 -22,13.0,0 -23,12.0,0 -24,17.0,0 -25,26.0,0 -26,25.0,0 -27,17.0,0 -28,10.0,0 -29,16.0,0 -30,14.0,0 -31,19.0,0 -32,23.0,0 -33,37.0,0 -34,29.0,0 -35,22.0,0 -36,29.0,0 -37,15.0,0 -38,16.0,0 -39,18.0,0 -40,23.0,0 -41,16.0,0 -42,26.0,0 -43,13.0,0 -44,24.0,0 -45,39.0,0 -46,23.0,0 -47,32.0,0 -48,123.0,0 -49,18.0,0 -50,39.0,0 -51,17.0,0 -52,28.0,0 -53,34.0,0 -54,26.0,0 -55,61.0,0 -56,28.0,0 -57,16.0,0 -58,45.0,0 -59,41.0,0 -60,49.0,0 -61,18.0,0 -62,40.0,0 -63,24.0,0 -64,37.0,0 -65,26.0,0 -66,51.0,0 -67,17.0,0 -68,152.0,0 -69,17.0,0 -70,29.0,0 -71,37.0,0 -72,15.0,0 -73,55.0,0 -74,152.0,0 -75,23.0,0 -76,45.0,0 -77,30.0,0 -78,39.0,0 
-79,20.0,0 -80,53.0,0 -81,49.0,0 -82,71.0,0 -83,115.0,0 -84,41.0,0 -85,52.0,0 -86,52.0,0 -87,36.0,0 -88,84.0,0 -89,122.0,0 -90,49.0,0 -91,200.0,0 -92,67.0,0 -93,87.0,0 -94,183.0,0 -95,132.0,0 -96,76.0,0 -97,200.0,0 -98,200.0,0 -99,200.0,0 -100,200.0,0 -101,200.0,0 -102,106.0,0 -103,192.0,0 -104,111.0,0 -105,95.0,0 -106,200.0,0 -107,200.0,0 -108,148.0,0 -109,200.0,0 -110,97.0,0 -111,200.0,0 -112,200.0,0 -113,105.0,0 -114,135.0,0 -115,200.0,0 -116,144.0,0 -117,156.0,0 -118,200.0,0 -119,200.0,0 -120,166.0,0 -121,200.0,0 -122,200.0,0 -123,200.0,0 -124,200.0,0 -125,200.0,0 -126,200.0,0 -127,158.0,0 -128,139.0,0 -129,200.0,0 -130,200.0,0 -131,200.0,0 -132,200.0,0 -133,122.0,0 -134,200.0,0 -135,188.0,0 -136,200.0,0 -137,183.0,0 -138,200.0,0 -139,200.0,0 -140,200.0,0 -141,200.0,0 -142,200.0,0 -143,158.0,0 -144,200.0,0 -145,200.0,0 -146,200.0,0 -147,191.0,0 -148,200.0,0 -149,194.0,0 -150,178.0,0 -151,200.0,0 -152,200.0,0 -153,200.0,0 -154,162.0,0 -155,200.0,0 -156,200.0,0 -157,128.0,0 -158,200.0,0 -159,184.0,0 -160,194.0,0 -161,200.0,0 -162,200.0,0 -163,200.0,0 -164,200.0,0 -165,160.0,0 -166,163.0,0 -167,200.0,0 -168,200.0,0 -169,200.0,0 -170,141.0,0 -171,200.0,0 -172,200.0,0 -173,200.0,0 -174,200.0,0 -175,200.0,0 -176,200.0,0 -177,157.0,0 -178,164.0,0 -179,200.0,0 -180,200.0,0 -181,200.0,0 -182,200.0,0 -183,200.0,0 -184,200.0,0 -185,193.0,0 -186,182.0,0 -187,200.0,0 -188,200.0,0 -189,200.0,0 -190,200.0,0 -191,200.0,0 -192,174.0,0 -193,178.0,0 -194,200.0,0 -195,200.0,0 -196,200.0,0 -197,200.0,0 -198,200.0,0 -199,200.0,0 diff --git a/projects/codes/DuelingDQN/assets/task0_train_20211112021954.png b/projects/codes/DuelingDQN/assets/task0_train_20211112021954.png deleted file mode 100644 index 2529311..0000000 Binary files a/projects/codes/DuelingDQN/assets/task0_train_20211112021954.png and /dev/null differ diff --git a/projects/codes/DuelingDQN/task0_train.ipynb b/projects/codes/DuelingDQN/task0_train.ipynb deleted file mode 100644 index efa485f..0000000 --- 
a/projects/codes/DuelingDQN/task0_train.ipynb +++ /dev/null @@ -1,418 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import math, random\n", - "import gym\n", - "import numpy as np\n", - "import torch\n", - "import torch.nn as nn\n", - "import torch.optim as optim\n", - "import torch.autograd as autograd \n", - "import torch.nn.functional as F\n", - "from IPython.display import clear_output # clear the cell output area\n", - "import matplotlib.pyplot as plt\n", - "# %matplotlib inline\n" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "USE_CUDA = torch.cuda.is_available()\n", - "Variable = lambda *args, **kwargs: autograd.Variable(*args, **kwargs).cuda() if USE_CUDA else autograd.Variable(*args, **kwargs)" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import deque\n", - "\n", - "class ReplayBuffer(object):\n", - " def __init__(self, capacity):\n", - " self.buffer = deque(maxlen=capacity)\n", - " \n", - " def push(self, state, action, reward, next_state, done):\n", - " state = np.expand_dims(state, 0)\n", - " next_state = np.expand_dims(next_state, 0)\n", - " \n", - " self.buffer.append((state, action, reward, next_state, done))\n", - " \n", - " def sample(self, batch_size):\n", - " state, action, reward, next_state, done = zip(*random.sample(self.buffer, batch_size))\n", - " return np.concatenate(state), action, reward, np.concatenate(next_state), done\n", - " \n", - " def __len__(self):\n", - " return len(self.buffer)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "env_name = \"CartPole-v0\"\n", - "env = gym.make(env_name)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "epsilon_start = 1.0\n", - "epsilon_final = 0.01\n", -
"epsilon_decay = 500\n", - "\n", - "epsilon_by_frame = lambda frame_idx: epsilon_final + (epsilon_start - epsilon_final) * math.exp(-1. * frame_idx / epsilon_decay)" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[]" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.plot([epsilon_by_frame(i) for i in range(10000)])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Dueling DQN Network\n", - "\n", - "Algorithms such as DQN use a simple three-layer neural network: an input layer, a hidden layer and an output layer, as shown in the left figure below:\n", - "\n", - "[Figure: image-20211112022028670. Left: DQN network; right: Dueling DQN network]\n", - "\n", - "In Dueling DQN, two sub-networks are appended after the hidden layer, corresponding to the value-function stream and the advantage-function stream discussed above, as shown in the right figure. The final output of the Q network is a linear combination of the outputs of the value stream and the advantage stream.\n", - "\n", - "We could obtain the action value directly from the combination formula of the previous section, $Q = V + A$, but that formula cannot identify the separate roles of $V(S, w, \\alpha)$ and $A(S, A, w, \\beta)$ in the final output, since adding a constant to $V$ and subtracting it from $A$ leaves $Q$ unchanged. To make the decomposition identifiable, the formula actually used subtracts the mean advantage:\n", - "\n", - "$$\n", - "Q(S, A, w, \\alpha, \\beta)=V(S, w, \\alpha)+\\left(A(S, A, w, \\beta)-\\frac{1}{|\\mathcal{A}|} \\sum_{a^{\\prime} \\in \\mathcal{A}} A\\left(S, a^{\\prime}, w, \\beta\\right)\\right)\n", - "$$" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "class DuelingNet(nn.Module):\n", - " def __init__(self, n_states, n_actions,hidden_size=128):\n", - " super(DuelingNet, self).__init__()\n", - " \n", - " # hidden layer\n", - " self.hidden = nn.Sequential(\n", - " nn.Linear(n_states, hidden_size),\n", - " nn.ReLU()\n", - " )\n", - " \n", - " # advantage stream\n", - " self.advantage = nn.Sequential(\n", - " nn.Linear(hidden_size, hidden_size),\n", - " nn.ReLU(),\n", - " nn.Linear(hidden_size, n_actions)\n", - " )\n", - " \n", - " # value stream\n", - " self.value = nn.Sequential(\n", - " nn.Linear(hidden_size, hidden_size),\n", - " nn.ReLU(),\n", - " nn.Linear(hidden_size, 1)\n", - " )\n", - " \n", - " def forward(self, x):\n", - " x = self.hidden(x)\n", - " advantage = self.advantage(x)\n", - " value = self.value(x)\n", - " return value + advantage - advantage.mean(dim=1, keepdim=True) # subtract the mean advantage over actions\n", - " \n", - " def act(self, state, epsilon):\n", - " if random.random() > epsilon:\n", - " with torch.no_grad():\n", - " state = Variable(torch.FloatTensor(state).unsqueeze(0))\n", - " q_value = self.forward(state)\n", - "
action = q_value.max(1)[1].item()\n", - " else:\n", - " action = random.randrange(env.action_space.n)\n", - " return action" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "ename": "SyntaxError", - "evalue": "unexpected EOF while parsing (, line 1)", - "output_type": "error", - "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m class DuelingDQN:\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m unexpected EOF while parsing\n" - ] - } - ], - "source": [ - "class DuelingDQN:\n", - " def __init__(self,n_states,n_actions,cfg) -> None:\n", - " self.batch_size = cfg.batch_size\n", - " self.device = cfg.device\n", - " self.loss_history = [] # record the loss history\n", - " self.frame_idx = 0 # step counter used for epsilon decay\n", - " self.epsilon = lambda frame_idx: cfg.epsilon_end + \\\n", - " (cfg.epsilon_start - cfg.epsilon_end) * \\\n", - " math.exp(-1. * frame_idx / cfg.epsilon_decay)\n", - " self.policy_net = DuelingNet(n_states, n_actions,hidden_size=cfg.hidden_dim).to(self.device)\n", - " self.target_net = DuelingNet(n_states, n_actions,hidden_size=cfg.hidden_dim).to(self.device)\n", - " for target_param, param in zip(self.target_net.parameters(),self.policy_net.parameters()): # copy parameters to the target network target_net\n", - " target_param.data.copy_(param.data)\n", - " self.optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg.lr) # optimizer\n", - " self.memory = ReplayBuffer(cfg.memory_capacity) \n", - " def choose_action(self,state):\n", - " self.frame_idx += 1\n", - " if random.random() > self.epsilon(self.frame_idx):\n", - " with torch.no_grad():\n", - " state = torch.tensor([state], device=self.device, dtype=torch.float32)\n", - " q_values = self.policy_net(state)\n", - " action = q_values.max(1)[1].item() # choose the action with the largest Q value\n", - " else:\n", - " action = random.randrange(self.n_actions)\n", - " return action\n", - " def update(self):\n", - " if len(self.memory) <
self.batch_size: # skip the update until memory holds at least one batch\n", - " return\n", - " state, action, reward, next_state, done = self.memory.sample(self.batch_size)\n", - " state = torch.tensor(state, device=self.device, dtype=torch.float)\n", - " action = torch.tensor(action, device=self.device).unsqueeze(1) \n", - " reward = torch.tensor(reward, device=self.device, dtype=torch.float) \n", - " next_state = torch.tensor(next_state, device=self.device, dtype=torch.float)\n", - " done = torch.tensor(np.float32(done), device=self.device)\n", - " q_values = self.policy_net(state)\n", - " next_q_values = self.target_net(next_state)\n", - "\n", - " q_value = q_values.gather(1, action).squeeze(1) # action is already shaped (batch, 1)\n", - " next_q_value = next_q_values.max(1)[0]\n", - " expected_q_value = reward + gamma * next_q_value * (1 - done)\n", - " \n", - " loss = (q_value - expected_q_value.detach()).pow(2).mean()\n", - " self.loss_history.append(loss.item())\n", - " self.optimizer.zero_grad()\n", - " loss.backward()\n", - " self.optimizer.step()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "current_model = DuelingNet(env.observation_space.shape[0], env.action_space.n)\n", - "target_model = DuelingNet(env.observation_space.shape[0], env.action_space.n)\n", - "\n", - "if USE_CUDA:\n", - " current_model = current_model.cuda()\n", - " target_model = target_model.cuda()\n", - " \n", - "optimizer = optim.Adam(current_model.parameters())\n", - "\n", - "replay_buffer = ReplayBuffer(1000)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "def update_target(current_model, target_model):\n", - " target_model.load_state_dict(current_model.state_dict())" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "update_target(current_model, target_model)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, -
"outputs": [], - "source": [ - "def compute_td_loss(batch_size):\n", - " state, action, reward, next_state, done = replay_buffer.sample(batch_size)\n", - "\n", - " state = Variable(torch.FloatTensor(np.float32(state)))\n", - " next_state = Variable(torch.FloatTensor(np.float32(next_state)))\n", - " action = Variable(torch.LongTensor(action))\n", - " reward = Variable(torch.FloatTensor(reward))\n", - " done = Variable(torch.FloatTensor(done))\n", - "\n", - " q_values = current_model(state)\n", - " next_q_values = target_model(next_state)\n", - "\n", - " q_value = q_values.gather(1, action.unsqueeze(1)).squeeze(1)\n", - " next_q_value = next_q_values.max(1)[0]\n", - " expected_q_value = reward + gamma * next_q_value * (1 - done)\n", - " \n", - " loss = (q_value - expected_q_value.detach()).pow(2).mean()\n", - " \n", - " optimizer.zero_grad()\n", - " loss.backward()\n", - " optimizer.step()\n", - " \n", - " return loss" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "def plot(frame_idx, rewards, losses):\n", - " clear_output(True) # clear the cell output area; the figure is redrawn repeatedly, so the previous plot must be cleared each time\n", - " plt.figure(figsize=(20,5))\n", - " plt.subplot(131)\n", - " plt.title('frame %s.
reward: %s' % (frame_idx, np.mean(rewards[-10:])))\n", - " plt.plot(rewards)\n", - " plt.subplot(132)\n", - " plt.title('loss')\n", - " plt.plot(losses)\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAvwAAAE/CAYAAAA6zBcIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAABvqUlEQVR4nO3dd3xkd33v/9dninrf1fZdb/e6FxbbYJtiuklCuQmB5IIhJA43cC+B3CQQcgNplOSSEFLIj9674eKAMcUFMDa2123ttXe93du0klZdmtG07++Pc85oRhpJI2mkkTTv5+Ohx0pn2nfKzvmcz/l8P19zziEiIiIiIstTqNwDEBERERGR+aOAX0RERERkGVPALyIiIiKyjCngFxERERFZxhTwi4iIiIgsYwr4RURERESWMQX8y4SZnW9mj5rZoJn9r3KPR+aPmb3ZzO4p9zhERJYbMztmZi8u9zhESk0B//LxZ8BdzrlG59zHyz2Y8czsk2Z2wMwyZvbmApe/y8w6zGzAzD5rZtU5l202s7vMbMTM9o//Mp7LbSvBVK+9f/CQNrOhnJ8X5Fx+l5l1+a/tY2b2qike54fj7idhZo/P2xMTERGRoijgXz7OA/ZNdqGZhRdwLIU8BvwR8PD4C8zsZcB7gBfhPY+twF/nXOVrwCPACuB9wLfNrH2ut50JM4vM9DalUKLHnfS1993nnGvI+bk757J3Amudc03AzcCXzWxtoTtxzr0i936Ae4FvlWD8IiIiMgcK+JcBM7sTeCHwb35mdaeZfd7MPmFmt5nZMPBCM3ulmT3iZ2tPmNkHcu5js5k5M3uLf1mvmb3NzJ5tZnvNrM/M/m3c4/6emT3lX/dHZnbeZGN0zv27c+4OIF7g4puAzzjn9jnneoG/Bd7sP8ZO4Erg/c65mHPuFuBx4L+V4LbTva7HzOzPzWwvMGxmETO7xszu9V+Px4JsuJm9MDebbWY/MbMHc/7+hZm92v/9PWZ22C+/etLMXpNzvTeb2S/N7J/N7BzwATNbYWa3+u/bA8C2YsYfmOa1n+62e51zqeBPIApsnO52ZrYZuB744kwfU0Sk3Mys2sw+Zman/Z+PBWePzWylmX3f3w/0+N/vIf+yPzezU/73+wEze1F5n4mIRwH/MuCcuwH4BfAOP7v6tH/R7wB/DzQC9wDDwJuAFuCVwP8IgtAcVwM7gN8GPoaXFX8xcBHwOjN7PoBf2vEXwGuBdv/xvzbLp3ARXhY68Biw2sxW+Jcdcc4Njrv8ohLcthhvwHutWoDVwA+AvwPagP8N3OKfMfgVsMPfEUSBS4F1ZtZoZrXAbrzXCOAwXjDcjHc2YnzW/GrgiP94fw/8O16wvhb4Pf8ny9/xvGcGz2m8K8ys28yeNrP/M/6sgn//ceB+4G5gTxH3+SbgF865Y3MYl4hIubwPuAa4HLgMuAr4S/+yPwFO4u37VuPtC52ZnQ+8A3i2c64ReBlwbEFHLTIJBfzL2/ecc790zmWcc3Hn3N3Oucf9v/fiBejPH3ebv/Wv+2O8A4SvOec6nXOn8ALWK/zrvQ34kHPuKT8D/EHg8qmy/FNoAPpz/g5+byxwWXB5YwluW4yPO+dOOOdiwH8HbnPO3ea/hj/BC35v9C9/EHge8Cy8A4tfAtfi7TQOOufOATjnvuWcO+3fxzeAg3g7k8Bp59y/+q9rAu+MxF8554adc
08AX8gdoHPu15xzH57Bc8r1c+BiYJX/OG8A/nT8/eO9ZjcCP3bOZYq43zcBn5/lmEREyu13gb/x939deMmZN/qXJfESMOc555LOuV845xyQBqqBC80s6pw75pw7XJbRi4yjgH95O5H7h5ldnTMJsx8vaF857jZnc36PFfi7wf/9POBf/FOafUAPYMD6WYxzCGjK+Tv4fbDAZcHlQdZ+LrctRu5reB7wW8Fz9p/3dXhf/AA/A16AF/T/DC8b/nz/52fBnZjZm8zrqBTcx8Xkvw+5j9kORMZtOz6D8U/JOXfEOXfUP/h4HPgb4DcLXC/pnPsh8FIz+42p7tPMrgPWAN8u1ThFRBbYOvK/a4/72wD+ETgE/NjMjgRnWJ1zh4A/Bj4AdJrZ181sHSKLgAL+5c2N+/urwK3ARudcM/CfeEH6bJwA/tA515LzU+ucu3cW97UP75Rp4DLgrJ8R3wdsNbPGcZfvK8Fti5H7Gp4AvjTuOdfnZNfHB/w/Y1zA758B+RTead8VzrkW4Any34fcx+wCUuTXzW+awfhnyjH1ZyLC9HMIbgK+45wbKtmoREQW1mm8JE9gk78N59ygc+5PnHNbgd8A3h3U6jvnvuqcu86/rQM+srDDFilMAX9laQR6nHNxM7sKr8Z/tv4TeK+ZXQRgZs1m9luTXdnMqsysBi+YjJpZTTDJCW9i51vN7EIza8Grk/w8gD8f4VHg/f5tXoNXH39LCW47U18Gft3MXmZmYf8+X2BmG/zL7wXOxyvPecA5tw/vS/9qvNIZgHq8nUCX/7q8BS/DX5BzLg18B2/ybp2ZXYgXUBdtqtfezF5hZqv933cB/wf4XvC3f3mtmUXN7L8zdjAz2WPVAq9D5TwisrR9DfhLM2s3s5XAX+HtAzCzXzOz7WZmeGWiaSBj3no4N/iTe+N4Z8WLKYEUmXcK+CvLHwF/Y2aDeF9e35ztHTnnvouXufi6mQ3gZalfMcVNfoz35fdc4JP+78/z7+t24B+Au4Bn8E6dvj/ntq/Hm/TaC3wY+E2/pnJOtzWz3zWzorP9zrkTQDBZuQsv4/+n+P+PnHPDeK0v9znnEv7N7gOOO+c6/es8CXzU334WuASv1n8q78ArperAC6Q/l3uhef3v/2KK20/62uO1M91rXien2/AOLj4Y3DX+qWn/+b4T+G3n3MP+415vZuOz+K8G+vDeDxGRperv8OZo7cXr7vawvw28xhY/xSsbvQ/4D+fcXXj1+x8GuvG+r1cB713YYYsUZt48ExERERERWY6U4RcRERERWcYU8IuIiIiILGMK+EVEREREljEF/CIiIiIiy5gCfhERERGRZSxS7gEArFy50m3evLncwxARWZQeeuihbudce7nHUU7aT4iIFFbMPmJRBPybN29mz5495R6GiMiiZGbHyz2GctN+QkSksGL2ESrpERERERFZxhTwi4iIiIgsYwr4RURERESWMQX8IiIiIiLLmAJ+EREREZFlTAG/iIiIiMgypoBfRERERGQZmzbgN7ONZnaXmT1pZvvM7J3+9jYz+4mZHfT/bfW3m5l93MwOmdleM7tyvp+EiIiIiIgUVkyGPwX8iXPuQuAa4O1mdiHwHuAO59wO4A7/b4BXADv8n5uBT5R81CIiIiIiUpRpV9p1zp0Bzvi/D5rZU8B64FXAC/yrfQG4G/hzf/sXnXMO+JWZtZjZWv9+ZImJJ9Pc9vgZRlOZaa8bDhkvu2gNzbXRBRiZyMJ7+uwg9dUR1rfUTrhs78k+NrbW0Vpflbd9JJHiu4+c4uotK9i+qmGhhioiIovMLw52ce22lYRCtuCPPW3An8vMNgNXAPcDq3OC+A5gtf/7euBEzs1O+tvyAn4zuxnvDACbNm2a6bhlgdx9oIt3f/Oxoq8/EEvy+9dvnccRiZTP//raI+xY3ci/vuGKCZe94ZO/4q3Xb+XdL9mZt71vJMn7vvsEH37tJQr4RUQq1E+fPMvvf3EPf3HjLm5+3rYFf/yiA
34zawBuAf7YOTdgNnZ04pxzZuZm8sDOuU8CnwTYvXv3jG4rC2cwngTglv/x3IJZzYDD8ZwP3clAPLVQQxNZcF2Do7Q3Vk/YnkhlGE6kGR6d+PlPpb2vt0hYPRJERCrVmYE4AMfPjZTl8YsK+M0sihfsf8U59x1/89mgVMfM1gKd/vZTwMacm2/wt8kSFPdLeTa11RUMdHLVREPEk+kJ24dHU7z7m4/yV79+0ZQHDSKLmXOOvliy4Gc85m9LZybmLpIZ7/9QNLzwp3BFRESguC49BnwGeMo59085F90K3OT/fhPwvZztb/K79VwD9Kt+f+mKJ7xApiY6fXayNhomlpgYDB3uGuJH+87yi6e7Sj4+kYUyNJoinXGMFPiMB5/7ZHriXJdshj+kDL+IiJRHMRn+a4E3Ao+b2aP+tr8APgx808zeChwHXudfdhtwI3AIGAHeUsoBy8IKMpc10fC0162NhrPXzxUESKf6YqUdnMgC6hvxytsKHdSOJLxSniC4zxUcBESU4RcRkTIppkvPPcBke6oXFbi+A94+x3HJIhFPpomEjGgR9cc1VeEpyx1O9Srgl6WrP+YF/IUy/MG2oHwnV8ov81FJj4iIlIvOMcuUYsk0tUVk9wFqIoUD/rgy/LIMBAF/obNYwee+UIY/FWT4VdIjIiJloj2QTCmezFBdZMBfW1W4pCeb4VfAL0vY1CU9fsBfIMOfzHbpUYZfRKTSlastpQJ+mVI8maa2qriPyWSTdoOAv6M/XrCLichS0BdLAJBIZ7JZ+0C2pKdAhj/4zCvDLyJSucqd8tEeSKYUn0lJTzRMLDkxwxnLZj8dZ/0+tCJLTZDhBxgZdyYrlgwm7RbI8Gc0aVdERMpLAb9MKZZMF9WhB7ySnoI1/DnbVNYjS1VQww9j81ICsYQX1KcKnMEK6vqjyvCLiEiZaA8kU4olZhDwR0NTlvSAOvXI0tU3ksj+Pr5Tz1RtOVNqyykiImWmgF+mFE9lZhDwh4mnCk9orPLbeirDL0tVboZ/fMAfm2rSrtpyiohImSnglynFE2lqi1hlF/wa/gIZ/ngyTXNdlLb6KgX8smTl1vAHNfuBoKa/0KRdteUUEZFy0x5IphRPFV/SUxMNM5rKkBlXxxxLeBN/17fUqqRHlqz+WJLWuigwswx/Sm05RUSkzBTwy5SCYL0YtVXe9caX9QSLd61vqVWGX5asvpEka5trgYm9+LMBf4EMf9Clp5jVqkVEZHlzZepOrj2QTCk+ky49/vUmBEPJDDVVYda3ehl+V65Pu8gc9MUSrG2uASautjtW0jNFhj+kDL+ISKWyMu8CFPDLlOLJmU3ahYnBUDyRps7P8MeSaXpzaqFFloJ4Mk08mWFtixfwTyzp8bv0FGrLqYW3RESkzLQHkkmlM45EOlP8wltBSc+4xbdiyTS1foYf1JpTlp4Bv0NPUNIzsS3n5CU9asspIiLlpoBfJhUsmFVTbJeeSCjvdoHcGn6AU30jJRylyPzrywb8Xoa/0GccJpm0m9GkXRERKS8F/DKpIIgJJuNOJ7je+JKeYPGusYA/XsJRisy/oCVne2M1IRtbaCsw5aRdP8OvlXZFRKRctAeSSWUz/JG5TtpNU1sVoqUuSl1VWCU9suQEq+y21lVRVxWZtKRnskm7IYOQJu2KiEiZKOCXSWUD/iIz/DWTTNoNWnuamd+aUyU9srQEJT3NtVFqqyYuMJet4S8waTeZyRBRS04RESkj7YVkUsHk2xn34c8J+J1z2Rp+wGvNqV78ssT0+yU9wVmqCZ2oklNN2nVEld0XEREAytOaXAG/TCo2w0m7QVCfG/CPpryDhuAswTqttitLUF8sQThkNFRHqI2G80p6nHPZmv5kwZV2leEXEal0RnkTP9oLyaSCwL3otpwFaviD3+uCDH9LLb0jyQmTHkUWs/5YkpbaKGY2oaRnNJUh46A6EsI5r51trmTGadEtEREpKwX8MqkgqJn5wltjWc7xnX42qBe/L
EF9I0maa6MA1FWF8w5Yg/8nTf7l4yfuptNOLTlFRKSsFPDLpMZKeooL+Kv9Pvy59c3j72OsNacCflk6+mNJmuu8gL42mt+lZ8T/jDfVRICJE3eTmcyyX2XXzDaa2V1m9qSZ7TOzd/rb28zsJ2Z20P+31d9uZvZxMztkZnvN7MryPgMRkeVtee+FZE5G/Ux9sTX8oZBREw3l1fAH2c/cSbuggF+Wlr4Rr6QHvAx/oc94kOFPj5u4m0o7oss/w58C/sQ5dyFwDfB2M7sQeA9wh3NuB3CH/zfAK4Ad/s/NwCcWfsgiIpVj2kjOzD5rZp1m9kTOtm+Y2aP+zzEze9TfvtnMYjmX/ec8jl3mWWyGNfzBdfNq+MeV9KxqrCESMpX0yJLSF0vQUlcFMGHSbvB5b6zxS3rGTdxNVUBbTufcGefcw/7vg8BTwHrgVcAX/Kt9AXi1//urgC86z6+AFjNbu7CjFhGpHJEirvN54N+ALwYbnHO/HfxuZh8F+nOuf9g5d3mJxidlFJ9hSQ94wdBUGf5wyFjTXLPkMvwPP9PLZRtaCGvyZUXKreEfP2k3qOdvDEp6xmX4k+nKmrRrZpuBK4D7gdXOuTP+RR3Aav/39cCJnJud9LedydmGmd2MdwaATZs2zd+gRUQWiCtPV87pM/zOuZ8DPYUuMzMDXgd8rcTjkkVgpjX8wXWnquEHWNdcS0d/vESjnH+HOod47X/cy0+fOlvuoUgZpDOOwXiKlrqcSbvJNM7/1h6r4S88aTeVzhBd5hn+gJk1ALcAf+ycG8i9zHkv2Ix2dc65Tzrndjvndre3t5dwpCIilWWue6HrgbPOuYM527aY2SNm9jMzu36O9y9lFE9mqIqEZpTVrhmX4Q9+r8tZrbehJsLwEmrLebR7GICuwdEyj0TKYcBfZTe3hj+dcST9TH48W8NfeNJuKlMZXXrMLIoX7H/FOfcdf/PZoFTH/7fT334K2Jhz8w3+NhGRZc3KtDuYa8D/BvKz+2eATc65K4B3A181s6ZCNzSzm81sj5nt6erqmuMwZD7Ek2lqIjP7iNSOW4U0W9KTE/DXVYUZGU1PuO1iddovPxqML52DFCmdPj/gz3bpqfIC++CzHdTzBxn+1LgMfzKdIbr8u/QY8BngKefcP+VcdCtwk//7TcD3cra/ye/Wcw3Qn1P6IyIiJTbrvZCZRYDXAt8ItjnnRp1z5/zfHwIOAzsL3V6nahe/eDKdF6gXY9JJuzklPQ3VEYZGl07wHMw3GIgnyzwSmS+3P3Fm0sXg+kYSALTUjk3aBRhJpvx/89tyJsfV8KczrhLmflwLvBG4Iadpw43Ah4GXmNlB4MX+3wC3AUeAQ8CngD8qw5hFRCpGMZN2J/NiYL9z7mSwwczagR7nXNrMtuK1XDsyxzFKmcSS6RnV74NX0nNuOJF3H8H2QF1Vfh/zxS7oKBSUdsjycqJnhLd9+WE+9NpLeMNVEyeGjs/wB+VpwWc45h8oBG05U5nxGX5HTXR5B/zOuXtg0nXjX1Tg+g54+7wOSkREsoppy/k14D7gfDM7aWZv9S96PRMn6z4P2Ou36fw28DbnXMEJv7L4xRLpGbXkBK90Z3RcSY/Z2KJcAPXVYYYTqeykx8XuZDbDv3TOSkjxuoa8uRlnJplI3j+SX8MfnPWKZQN+L8BvnGThrVSmcibtiojI4jRtht8594ZJtr+5wLZb8CZtyTIQT2WonmmGPxKaUMNfGw1jObNU6qsjOOdl/+uq5nKSaWEow7+89fpnpLoGCwf82ZIevw9/kOEPPucjyRRVkRDVEW/7+LacqQpryykiIouP0k4yqXgiTW2Rq+wGJkzaTU48S1DvB0zD8zRx997D3fzyUHdJ7iueTNPtZ4AHVcO/LPX4Af/ZgcJdmIKSnqBGP1vDn83wp6mrCmfr9AtO2lWGX0REWMR9+KVyxVOzKOkpMGl3/DyA+movcJpskuRcfewnB/nI7ftLcl9Bhx4zlfQsV71+Br9zk
gx/fyxJY00ku1ruWEmPP2nXP4sV9VtvJiu0LaeIiEyuXO04Awr4ZVKxxOwm7Y6mMmT8oCeeTOf14AeyZTzz1alnJJmic5Js7UwFHXo2r6hXSc8y1TPsva+TfWb6c1bZhbHPb26Gv7YqTMRvvTk+w++V9OirVkREykd7IZnUrDL8fnAfT+UHQ7nqq/NLIkptJJGma2iUdGbu582CDP+uNY1qy7lMBTX83ZN8ZvpiyewquzCxhj/mH9QGWfzxbTm9kh5l+EVEpHwU8MukYomZT9oNDhDiSS/LOVVJz/A8ZfhjiTTpjMvWZs/Fqd4YIYOdqxuJJzMkUpnpbzRHtz1+hu88fHL6K0pJ9PglPRkH54YnZvn7RhLZHvwwsUvPSCJFXTSSrdMf35ZTJT0iIlJuCvhlUvECE26nU+NP8h3LfmYKTNoNAv75yfAHjz1ZTfZMnOyLsbqphhUNXsC3EBN3P3PPUf7zZ4fn/XHE05tzYFiorKcvlsz24IfCk3a9kp5g0u74Lj0ZlfSIiEhZaS8kk4on09kAvlhBNj+WsyjR+IA/KIkYnqdJu0Eg1jk49zr+U70x1rfU0lTjBXwLMXG3czA+aU94Kb2ekQTrmmsA6CrwmekfSWZ78ANEwyGiYct+zsYm7Xr/V5Lja/gzasspIiLlpYBfCkqmM6QyblZdesA7WAC/Lee4Gv6GoEvPPJT0pDMuW3bTVYKJu6f6Yqxvrc0uqjTfE3edc3QNjjIYT81bFyPJ1zeS5Pw1jcDEs0LOOfrH1fCD9zmPT1LDP34eQCrtsh1+REREykF7ISkoCGbGB+vTqR0/oTGRmVDDX1cdZPhLX9KTuwbAXEt60hlHR3/cy/DXBhn+qQP+j9y+n//3yKlZP+bQaCo7/6FDWf55l844+kYS7AwC/nEHicOJNKmMy6vhB69TT3BANr5Lz/i2nMmMJu2KiIhHffhlUQkC59lO2g1KegrNA6gKh4iEbF4m7eZmxeda0nN2IE4q41jfOlbSMzhNSc83HzzB1x98ZtaPmVtS0jGggH++DcSSZBysaaqhpS7K2XEHicEqu83jM/xV4bySnrq8Gv6xkp50xuEcquEXEalw5U77aC8kBY36WeaZT9odK+lxzmXLHXKZGfXVkXkJ+OOJsWBrrr34gx78XoZ/+pKeoPzjydMDuFkewucepJxVwD/vgg49bfVVrGqsnvCZ6Rvx3u/cPvwwtsBcJuOyq0kHJT25k3aDen516RERkXJSwC8FBRn+WU/aTaZJph3pjCtYFlRfFZ6Xkp6R5NhBxPhs7UwFPfg3tNbSWDN9SU8s6ZV/DMRTnOyNzeox8zL8/aVZPEwmF3Toaa2rYnVTzYSzQudyLs9V52f4g/UmaqvG2nImc9pypvzyHpX0iIhIOSngl4KyNfyzXXgrmc45aJh4H3XVkXmZlBqUErXWReec4Q+C9nUttdRXhQkZDMQmH3N/TvZ/3+mBWT1mEPBHQqYM/wII1mpoq6+ivbF6Qpeew51DAGxtr8/bXlsVJpZMZz9vdZO05QzKe1TSIyIi5aS9kBQUBDKFgvWp5NbwT3XQUF8dYWge+vAH4960op6uwdFZl9aAV9LTWhelriqCmdFUG50yw597MPDkmVkG/EOjRMPGeSvqONM/u7MEUrxev6Sntb6KVY01Ez4zBzsHaa2LsqJ+YoY/lkhn6/hrq8KEC9TwB6vuqqRHRETKSQG/FDRVdn4q2YA/mckJhiZ+zOqrwvPSljN4zM0r6kikM3lZ95k61eu15Aw01USnnLSb+1hPnu6f1WN2DozS3lDN2uZaOkrQVlSm1jPsvWdtdV4NfyKdydbtAxw8O8SO1Y2Y5QfstdEwI8lU9v9JXVUYMyMatrwuPUGLTmX4RUSknLQXkoLis5y0Wx0ZW2k3yLYXuo+6qsi8tuU8b4VXgjGXTj2n+rxFtwJNtZEpJ+0Gl21dWc+TBUp6ijnb0DU0SntTDaubajirtpzzrnckQXUkRG1VmFVN1
RPwvYUeW7k4597rnNvgnNuM95m40zn3u8BdwG/6V6uY18Q51wGcMLPz/U0vAp6kgj8jeKeRrzGzOv//UPCaVORnZJlbFvuJWXzX3wq8ye/+cQ3Q75cS/Ah4qZm1+lnWl/rbFpVZfI/nvg6/6V/f+dtf73d52QLswJvsuKjM4nt6Sb+/zPw7eEm/vzlK8n76lw2Y2TX+6/cmitlfzXTm8WL6wZvZ/DTezOz3lXs8ZXj+1+GdEtoLPOr/3IhX23YHcBD4KdBW7rGW6fV5AWPdHbbifREcAr4FVJd7fAv4OlwO7PE/J/8Pb7Z/RX9GgL8G9gNPAF/C6/JQsZ+R5fyzHPYTM/2ux+vc8e/+c34c2J1zX7/nf8YPAW8p93Mr4rlP+z0O1Ph/H/Iv35pz+/f5r8MBiuhkUsbnWfT39HJ4f2fyHbwU31/ga3jzE5J4Z3DeWsr3E9jtv3aHgX/DX0h3qh+ttCsiIiIisowt5ZIeERERERGZhgJ+EREREZFlTAG/iIiIiMgypoBfRERERGQZU8AvIiIiIrKMKeAXEREREVnGFPCLiIiIiCxjCvhFRERERJax/x/69NfX8sNkFgAAAABJRU5ErkJggg==", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "num_frames = 10000\n", - "batch_size = 32\n", - "gamma = 0.99\n", - "\n", - "losses = []\n", - "all_rewards = []\n", - "ep_reward = 0\n", - "\n", - "state = env.reset()\n", - "for frame_idx in range(1, num_frames + 1):\n", - " epsilon = epsilon_by_frame(frame_idx)\n", - " action = current_model.act(state, epsilon)\n", - " next_state, reward, done, _ = env.step(action)\n", - " replay_buffer.push(state, action, reward, next_state, done)\n", - " \n", - " state = next_state\n", - " ep_reward += reward\n", - " \n", - " if done:\n", - " state = env.reset()\n", - " all_rewards.append(ep_reward)\n", - " ep_reward = 0\n", - " \n", - " if len(replay_buffer) > batch_size:\n", - " loss = compute_td_loss(batch_size)\n", - " losses.append(loss.item())\n", - " \n", - " if frame_idx % 200 == 0:\n", - " plot(frame_idx, all_rewards, losses)\n", - " \n", - " if frame_idx % 100 == 0:\n", - " update_target(current_model, target_model)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## References\n", - "\n", - "[Reinforcement Learning (12): Dueling DQN](https://www.cnblogs.com/pinard/p/9923859.html)" - ] - } - ], - "metadata": { - "interpreter": { - "hash": "fe38df673a99c62a9fea33a7aceda74c9b65b12ee9d076c5851d98b692a4989a" - }, - "kernelspec": { - "display_name": "Python 3.7.10 64-bit ('py37': conda)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.10" - }, - "orig_nbformat": 4 - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/codes/GAE/task0_train.py b/projects/codes/GAE/task0_train.py deleted file mode 100644 index 961816c..0000000 --- a/projects/codes/GAE/task0_train.py +++ /dev/null @@ -1,167 +0,0 @@ 
-import math -import random - -import gym -import numpy as np - -import torch -import torch.nn as nn -import torch.optim as optim -import torch.nn.functional as F -from torch.distributions import Normal -import matplotlib.pyplot as plt -import seaborn as sns -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the directory containing this file -parent_path = os.path.dirname(curr_path) # parent directory -sys.path.append(parent_path) # add the parent directory to sys.path - -use_cuda = torch.cuda.is_available() -device = torch.device("cuda" if use_cuda else "cpu") - -from common.multiprocessing_env import SubprocVecEnv - -num_envs = 16 -env_name = "Pendulum-v0" - -def make_env(): - def _thunk(): - env = gym.make(env_name) - return env - - return _thunk - -envs = [make_env() for i in range(num_envs)] -envs = SubprocVecEnv(envs) - -env = gym.make(env_name) - -def init_weights(m): - if isinstance(m, nn.Linear): - nn.init.normal_(m.weight, mean=0., std=0.1) - nn.init.constant_(m.bias, 0.1) - -class ActorCritic(nn.Module): - def __init__(self, num_inputs, num_outputs, hidden_size, std=0.0): - super(ActorCritic, self).__init__() - - self.critic = nn.Sequential( - nn.Linear(num_inputs, hidden_size), - nn.ReLU(), - nn.Linear(hidden_size, 1) - ) - - self.actor = nn.Sequential( - nn.Linear(num_inputs, hidden_size), - nn.ReLU(), - nn.Linear(hidden_size, num_outputs), - ) - self.log_std = nn.Parameter(torch.ones(1, num_outputs) * std) - - self.apply(init_weights) - - def forward(self, x): - value = self.critic(x) - mu = self.actor(x) - std = self.log_std.exp().expand_as(mu) - dist = Normal(mu, std) - return dist, value - - -def plot(frame_idx, rewards): - plt.figure(figsize=(20,5)) - plt.subplot(131) - plt.title('frame %s.
reward: %s' % (frame_idx, rewards[-1])) - plt.plot(rewards) - plt.show() - -def test_env(vis=False): - state = env.reset() - if vis: env.render() - done = False - total_reward = 0 - while not done: - state = torch.FloatTensor(state).unsqueeze(0).to(device) - dist, _ = model(state) - next_state, reward, done, _ = env.step(dist.sample().cpu().numpy()[0]) - state = next_state - if vis: env.render() - total_reward += reward - return total_reward - -def compute_gae(next_value, rewards, masks, values, gamma=0.99, tau=0.95): - values = values + [next_value] # append the bootstrap value V(s_T) - gae = 0 - returns = [] - for step in reversed(range(len(rewards))): - delta = rewards[step] + gamma * values[step + 1] * masks[step] - values[step] # TD error delta_t - gae = delta + gamma * tau * masks[step] * gae # A_t = delta_t + gamma * tau * A_{t+1} - returns.insert(0, gae + values[step]) # critic target: A_t + V(s_t) - return returns - -num_inputs = envs.observation_space.shape[0] -num_outputs = envs.action_space.shape[0] - -# Hyperparameters: -hidden_size = 256 -lr = 3e-2 -num_steps = 20 - -model = ActorCritic(num_inputs, num_outputs, hidden_size).to(device) -optimizer = optim.Adam(model.parameters()) # note: lr is not passed here, so Adam's default learning rate (1e-3) is used - -max_frames = 100000 -frame_idx = 0 -test_rewards = [] - -state = envs.reset() - -while frame_idx < max_frames: - - log_probs = [] - values = [] - rewards = [] - masks = [] - entropy = 0 - - for _ in range(num_steps): - state = torch.FloatTensor(state).to(device) - dist, value = model(state) - - action = dist.sample() - next_state, reward, done, _ = envs.step(action.cpu().numpy()) - - log_prob = dist.log_prob(action) - entropy += dist.entropy().mean() - - log_probs.append(log_prob) - values.append(value) - rewards.append(torch.FloatTensor(reward).unsqueeze(1).to(device)) - masks.append(torch.FloatTensor(1 - done).unsqueeze(1).to(device)) - - state = next_state - frame_idx += 1 - - if frame_idx % 1000 == 0: - test_rewards.append(np.mean([test_env() for _ in range(10)])) - print(test_rewards[-1]) - # plot(frame_idx, test_rewards) - - next_state = torch.FloatTensor(next_state).to(device) - _, next_value =
model(next_state) - returns = compute_gae(next_value, rewards, masks, values) - - log_probs = torch.cat(log_probs) - returns = torch.cat(returns).detach() - values = torch.cat(values) - - advantage = returns - values - - actor_loss = -(log_probs * advantage.detach()).mean() - critic_loss = advantage.pow(2).mean() - - loss = actor_loss + 0.5 * critic_loss - 0.001 * entropy - - optimizer.zero_grad() - loss.backward() - optimizer.step() diff --git a/projects/codes/HierarchicalDQN/README.md b/projects/codes/HierarchicalDQN/README.md deleted file mode 100644 index 383cdd0..0000000 --- a/projects/codes/HierarchicalDQN/README.md +++ /dev/null @@ -1,13 +0,0 @@ -# Hierarchical DQN - -## Overview - -Hierarchical DQN is a hierarchical reinforcement learning method that adds a meta controller on top of DQN. - -![image-20210331153115575](assets/image-20210331153115575.png) - -During learning, the meta controller generates a goal. The controller (the lower-level actor) then acts to reach that goal, continuing until the goal is reached or the episode ends (done). In effect, this gives the agent a team leader who is good at setting local sub-goals and guiding the agent forward. This can help on problems with long episodes or sparse rewards. - -## Pseudocode - -![image-20210331153542314](assets/image-20210331153542314.png) \ No newline at end of file diff --git a/projects/codes/HierarchicalDQN/agent.py b/projects/codes/HierarchicalDQN/agent.py deleted file mode 100644 index 91428cc..0000000 --- a/projects/codes/HierarchicalDQN/agent.py +++ /dev/null @@ -1,154 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-24 22:18:18 -LastEditor: John -LastEditTime: 2021-05-04 22:39:34 -Description: -Environment: -''' -import torch -import torch.nn as nn -import torch.optim as optim -import torch.nn.functional as F -import numpy as np -import random,math - -class ReplayBuffer: - def __init__(self, capacity): - self.capacity = capacity # capacity of the replay buffer - self.buffer = [] # stored transitions - 
self.position = 0 - - def push(self, state, action, reward, next_state, done): - ''' The buffer is a circular queue: once it is full, the oldest transitions are overwritten - ''' - if len(self.buffer) < self.capacity: - self.buffer.append(None) - self.buffer[self.position] = (state, action, reward,
next_state, done) - self.position = (self.position + 1) % self.capacity - - def sample(self, batch_size): - batch = random.sample(self.buffer, batch_size) # randomly sample a mini-batch of transitions - state, action, reward, next_state, done = zip(*batch) # unzip into states, actions, etc. - return state, action, reward, next_state, done - - def __len__(self): - ''' Return the number of transitions currently stored - ''' - return len(self.buffer) -class MLP(nn.Module): - def __init__(self, input_dim,output_dim,hidden_dim=128): - """ Initialize the Q-network as a fully connected network - input_dim: number of input features, i.e. the state dimension of the environment - output_dim: output dimension, i.e. the action dimension - """ - super(MLP, self).__init__() - self.fc1 = nn.Linear(input_dim, hidden_dim) # input layer - self.fc2 = nn.Linear(hidden_dim,hidden_dim) # hidden layer - self.fc3 = nn.Linear(hidden_dim, output_dim) # output layer - - def forward(self, x): - # ReLU activation after each hidden layer - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - return self.fc3(x) - -class HierarchicalDQN: - def __init__(self,n_states,n_actions,cfg): - self.n_states = n_states - self.n_actions = n_actions - self.gamma = cfg.gamma - self.device = cfg.device - self.batch_size = cfg.batch_size - self.frame_idx = 0 # step counter used for epsilon decay - self.epsilon = lambda frame_idx: cfg.epsilon_end + (cfg.epsilon_start - cfg.epsilon_end ) * math.exp(-1. * frame_idx / cfg.epsilon_decay) - self.policy_net = MLP(2*n_states, n_actions,cfg.hidden_dim).to(self.device) - self.meta_policy_net = MLP(n_states, n_states,cfg.hidden_dim).to(self.device) - self.optimizer = optim.Adam(self.policy_net.parameters(),lr=cfg.lr) - self.meta_optimizer = optim.Adam(self.meta_policy_net.parameters(),lr=cfg.lr) - self.memory = ReplayBuffer(cfg.memory_capacity) - self.meta_memory = ReplayBuffer(cfg.memory_capacity) - self.loss_numpy = 0 - self.meta_loss_numpy = 0 - self.losses = [] - self.meta_losses = [] - def to_onehot(self,x): - oh = np.zeros(self.n_states) - 
oh[x] = 1. # goals are indices in [0, n_states)
- return oh - def set_goal(self,state): - if random.random() > self.epsilon(self.frame_idx): - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(0) - goal = self.meta_policy_net(state).max(1)[1].item() - else: - goal = random.randrange(self.n_states) - return goal - def choose_action(self,state): - self.frame_idx += 1 - if random.random() > self.epsilon(self.frame_idx): - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(0) - q_value = self.policy_net(state) - action = q_value.max(1)[1].item() - else: - action = random.randrange(self.n_actions) - return action - def update(self): - self.update_policy() - self.update_meta() - def update_policy(self): - if self.batch_size > len(self.memory): - return - state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample(self.batch_size) - state_batch = torch.tensor(state_batch,device=self.device,dtype=torch.float) - action_batch = torch.tensor(action_batch,device=self.device,dtype=torch.int64).unsqueeze(1) - reward_batch = torch.tensor(reward_batch,device=self.device,dtype=torch.float) - next_state_batch = torch.tensor(next_state_batch,device=self.device, dtype=torch.float) - done_batch = torch.tensor(np.float32(done_batch),device=self.device) - q_values = self.policy_net(state_batch).gather(dim=1, index=action_batch).squeeze(1) - next_state_values = self.policy_net(next_state_batch).max(1)[0].detach() - expected_q_values = reward_batch + self.gamma * next_state_values * (1-done_batch) - loss = nn.MSELoss()(q_values, expected_q_values) - self.optimizer.zero_grad() - loss.backward() - for param in self.policy_net.parameters(): # clip gradients to prevent them from exploding - param.grad.data.clamp_(-1, 1) - self.optimizer.step() - self.loss_numpy = loss.detach().cpu().numpy() - self.losses.append(self.loss_numpy) - def update_meta(self): - if self.batch_size > len(self.meta_memory): - return - state_batch, action_batch,
reward_batch, next_state_batch, done_batch = self.meta_memory.sample(self.batch_size) - state_batch = torch.tensor(state_batch,device=self.device,dtype=torch.float) - action_batch = torch.tensor(action_batch,device=self.device,dtype=torch.int64).unsqueeze(1) - reward_batch = torch.tensor(reward_batch,device=self.device,dtype=torch.float) - next_state_batch = torch.tensor(next_state_batch,device=self.device, dtype=torch.float) - done_batch = torch.tensor(np.float32(done_batch),device=self.device) - q_values = self.meta_policy_net(state_batch).gather(dim=1, index=action_batch).squeeze(1) - next_state_values = self.meta_policy_net(next_state_batch).max(1)[0].detach() - expected_q_values = reward_batch + self.gamma * next_state_values * (1-done_batch) - meta_loss = nn.MSELoss()(q_values, expected_q_values) - self.meta_optimizer.zero_grad() - meta_loss.backward() - for param in self.meta_policy_net.parameters(): # clip gradients to prevent them from exploding - param.grad.data.clamp_(-1, 1) - self.meta_optimizer.step() - self.meta_loss_numpy = meta_loss.detach().cpu().numpy() - self.meta_losses.append(self.meta_loss_numpy) - - def save(self, path): - torch.save(self.policy_net.state_dict(), path+'policy_checkpoint.pth') - torch.save(self.meta_policy_net.state_dict(), path+'meta_checkpoint.pth') - - def load(self, path): - self.policy_net.load_state_dict(torch.load(path+'policy_checkpoint.pth')) - self.meta_policy_net.load_state_dict(torch.load(path+'meta_checkpoint.pth')) - - - - \ No newline at end of file diff --git a/projects/codes/HierarchicalDQN/assets/image-20210331153115575.png b/projects/codes/HierarchicalDQN/assets/image-20210331153115575.png deleted file mode 100644 index 5bb9251..0000000 Binary files a/projects/codes/HierarchicalDQN/assets/image-20210331153115575.png and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/assets/image-20210331153542314.png b/projects/codes/HierarchicalDQN/assets/image-20210331153542314.png deleted file mode 100644 index 6db2d82..0000000 Binary files
a/projects/codes/HierarchicalDQN/assets/image-20210331153542314.png and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/models/meta_checkpoint.pth b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/models/meta_checkpoint.pth deleted file mode 100644 index 02f3f7c..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/models/meta_checkpoint.pth and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/models/policy_checkpoint.pth b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/models/policy_checkpoint.pth deleted file mode 100644 index 9d906ea..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/models/policy_checkpoint.pth and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_ma_rewards.npy b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_ma_rewards.npy deleted file mode 100644 index 14dd955..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_ma_rewards.npy and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_rewards.npy b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_rewards.npy deleted file mode 100644 index e815222..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_rewards.npy and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_rewards_curve.png b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_rewards_curve.png deleted file mode 100644 index 645b21a..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/test_rewards_curve.png and 
/dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_ma_rewards.npy b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_ma_rewards.npy deleted file mode 100644 index bf58391..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_ma_rewards.npy and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_rewards.npy b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_rewards.npy deleted file mode 100644 index f4d20ff..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_rewards.npy and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_rewards_curve.png b/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_rewards_curve.png deleted file mode 100644 index 20ccbc5..0000000 Binary files a/projects/codes/HierarchicalDQN/outputs/CartPole-v0/20211221-200119/results/train_rewards_curve.png and /dev/null differ diff --git a/projects/codes/HierarchicalDQN/task0.py b/projects/codes/HierarchicalDQN/task0.py deleted file mode 100644 index b2cf312..0000000 --- a/projects/codes/HierarchicalDQN/task0.py +++ /dev/null @@ -1,88 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-29 10:37:32 -LastEditor: John -LastEditTime: 2021-05-04 22:35:56 -Description: -Environment: -''' -import sys -import os -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the directory containing this file -parent_path = os.path.dirname(curr_path) # parent directory -sys.path.append(parent_path) # add the parent directory to sys.path - -import datetime -import numpy as np -import torch -import gym - -from common.utils import save_results,make_dir -from common.utils import plot_rewards -from HierarchicalDQN.agent import 
HierarchicalDQN -from
HierarchicalDQN.train import train,test - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # current time -algo_name = "Hierarchical DQN" # algorithm name -env_name = 'CartPole-v0' # environment name -class HierarchicalDQNConfig: - def __init__(self): - self.algo_name = algo_name # algorithm name - self.env_name = env_name # environment name - self.device = torch.device( - "cuda" if torch.cuda.is_available() else "cpu") # use GPU if available - self.train_eps = 300 # number of training episodes - self.test_eps = 50 # number of test episodes - self.gamma = 0.99 - self.epsilon_start = 1 # start epsilon of e-greedy policy - self.epsilon_end = 0.01 - self.epsilon_decay = 200 - self.lr = 0.0001 # learning rate - self.memory_capacity = 10000 # Replay Memory capacity - self.batch_size = 32 - self.target_update = 2 # target network update frequency (not used: this agent has no target network) - self.hidden_dim = 256 # hidden layer size -class PlotConfig: - ''' Plotting-related settings - ''' - - def __init__(self) -> None: - self.algo_name = algo_name # algorithm name - self.env_name = env_name # environment name - self.device = torch.device( - "cuda" if torch.cuda.is_available() else "cpu") # use GPU if available - self.result_path = curr_path + "/outputs/" + self.env_name + \ - '/' + curr_time + '/results/' # path for saving results - self.model_path = curr_path + "/outputs/" + self.env_name + \ - '/' + curr_time + '/models/' # path for saving models - self.save = True # whether to save figures - -def env_agent_config(cfg,seed=1): - env = gym.make(cfg.env_name) - env.seed(seed) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.n - agent = HierarchicalDQN(n_states,n_actions,cfg) - return env,agent - -if __name__ == "__main__": - cfg = HierarchicalDQNConfig() - plot_cfg = PlotConfig() - # training - env, agent = env_agent_config(cfg, seed=1) - rewards, ma_rewards = train(cfg, env, agent) - make_dir(plot_cfg.result_path, plot_cfg.model_path) # create the folders for results 
and models - agent.save(path=plot_cfg.model_path) # save the model - save_results(rewards, ma_rewards, tag='train', - path=plot_cfg.result_path) # save the results - plot_rewards(rewards, ma_rewards, plot_cfg, tag="train") # plot the results - # testing - env, agent = env_agent_config(cfg, seed=10) -
agent.load(path=plot_cfg.model_path) # load the model - rewards, ma_rewards = test(cfg, env, agent) - save_results(rewards, ma_rewards, tag='test', path=plot_cfg.result_path) # save the results - plot_rewards(rewards, ma_rewards, plot_cfg, tag="test") # plot the results - diff --git a/projects/codes/HierarchicalDQN/train.py b/projects/codes/HierarchicalDQN/train.py deleted file mode 100644 index 3dc8aa3..0000000 --- a/projects/codes/HierarchicalDQN/train.py +++ /dev/null @@ -1,77 +0,0 @@ -import sys -import os -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the directory containing this file -parent_path = os.path.dirname(curr_path) # parent directory -sys.path.append(parent_path) # add the parent directory to sys.path - -import numpy as np - -def train(cfg, env, agent): - print('Start training!') - print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}') - rewards = [] # rewards of all episodes - ma_rewards = [] # moving-average rewards of all episodes - for i_ep in range(cfg.train_eps): - state = env.reset() - done = False - ep_reward = 0 - while not done: - goal = agent.set_goal(state) - onehot_goal = agent.to_onehot(goal) - meta_state = state - extrinsic_reward = 0 - while not done and goal != np.argmax(state): - goal_state = np.concatenate([state, onehot_goal]) - action = agent.choose_action(goal_state) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - extrinsic_reward += reward - intrinsic_reward = 1.0 if goal == np.argmax( - next_state) else 0.0 - agent.memory.push(goal_state, action, intrinsic_reward, np.concatenate( - [next_state, onehot_goal]), done) - state = next_state - agent.update() - if (i_ep+1)%10 == 0: - print(f'Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward}, Loss: {agent.loss_numpy:.2f}, Meta_Loss: {agent.meta_loss_numpy:.2f}') - agent.meta_memory.push(meta_state, goal, extrinsic_reward, state, done) - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append( - 0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - 
print('Finish training!') - return rewards, ma_rewards - -def test(cfg, env, agent): - print('Start testing!') -
print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}') - rewards = [] # rewards of all episodes - ma_rewards = [] # moving-average rewards of all episodes - for i_ep in range(cfg.test_eps): - state = env.reset() - done = False - ep_reward = 0 - while not done: - goal = agent.set_goal(state) - onehot_goal = agent.to_onehot(goal) - extrinsic_reward = 0 - while not done and goal != np.argmax(state): - goal_state = np.concatenate([state, onehot_goal]) - action = agent.choose_action(goal_state) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - extrinsic_reward += reward - state = next_state - # agent.update() # no parameter updates during testing - if (i_ep+1)%10 == 0: - print(f'Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward}, Loss: {agent.loss_numpy:.2f}, Meta_Loss: {agent.meta_loss_numpy:.2f}') - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append( - 0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - print('Finish testing!') - return rewards, ma_rewards \ No newline at end of file diff --git a/projects/codes/MonteCarlo/README.md b/projects/codes/MonteCarlo/README.md deleted file mode 100644 index 91ff767..0000000 --- a/projects/codes/MonteCarlo/README.md +++ /dev/null @@ -1,5 +0,0 @@ -# *On-Policy First-Visit MC Control* - -### Pseudocode - -![mc_control_algo](assets/mc_control_algo.png) \ No newline at end of file diff --git a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/config.yaml b/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/config.yaml deleted file mode 100644 index 326f84e..0000000 --- a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/config.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: FirstVisitMC - device: cpu - env_name: Racetrack-v0 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: false - load_path: tasks - max_steps: 200 - mode: train - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 200 -algo_cfg: - epsilon: 0.15 - 
gamma: 0.9 - lr: 0.1 diff --git
a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/logs/log.txt b/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/logs/log.txt deleted file mode 100644 index 993059b..0000000 --- a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/logs/log.txt +++ /dev/null @@ -1,210 +0,0 @@ -2022-11-06 01:05:04 - r - INFO: - n_states: 4, n_actions: 9 -2022-11-06 01:05:04 - r - INFO: - Start training! -2022-11-06 01:05:04 - r - INFO: - Env: Racetrack-v0, Algorithm: FirstVisitMC, Device: cpu -2022-11-06 01:05:40 - r - INFO: - Episode: 1/200, Reward: -760.000, Step: 200 -2022-11-06 01:05:58 - r - INFO: - Episode: 2/200, Reward: -560.000, Step: 200 -2022-11-06 01:05:59 - r - INFO: - Episode: 3/200, Reward: -156.000, Step: 66 -2022-11-06 01:06:17 - r - INFO: - Episode: 4/200, Reward: -500.000, Step: 200 -2022-11-06 01:06:38 - r - INFO: - Episode: 5/200, Reward: -600.000, Step: 200 -2022-11-06 01:06:38 - r - INFO: - Current episode 5 has the best eval reward: -208.000 -2022-11-06 01:06:52 - r - INFO: - Episode: 6/200, Reward: -350.000, Step: 200 -2022-11-06 01:07:07 - r - INFO: - Episode: 7/200, Reward: -430.000, Step: 200 -2022-11-06 01:07:10 - r - INFO: - Episode: 8/200, Reward: -206.000, Step: 96 -2022-11-06 01:07:31 - r - INFO: - Episode: 9/200, Reward: -460.000, Step: 200 -2022-11-06 01:07:45 - r - INFO: - Episode: 10/200, Reward: -410.000, Step: 200 -2022-11-06 01:07:45 - r - INFO: - Current episode 10 has the best eval reward: -204.000 -2022-11-06 01:07:58 - r - INFO: - Episode: 11/200, Reward: -400.000, Step: 200 -2022-11-06 01:08:08 - r - INFO: - Episode: 12/200, Reward: -380.000, Step: 200 -2022-11-06 01:08:09 - r - INFO: - Episode: 13/200, Reward: -155.000, Step: 75 -2022-11-06 01:08:24 - r - INFO: - Episode: 14/200, Reward: -400.000, Step: 200 -2022-11-06 01:08:37 - r - INFO: - Episode: 15/200, Reward: -350.000, Step: 200 -2022-11-06 01:08:37 - r - INFO: - Current episode 15 has the best eval reward: 
-203.000 -2022-11-06 01:08:51 - r - INFO: - Episode: 16/200, Reward: -400.000, Step: 200 -2022-11-06 01:09:05 - r - INFO: - Episode: 17/200, Reward: -360.000, Step: 200 -2022-11-06 01:09:23 - r - INFO: - Episode: 18/200, Reward: -420.000, Step: 200 -2022-11-06 01:09:37 - r - INFO: - Episode: 19/200, Reward: -430.000, Step: 200 -2022-11-06 01:09:48 - r - INFO: - Episode: 20/200, Reward: -360.000, Step: 200 -2022-11-06 01:09:48 - r - INFO: - Current episode 20 has the best eval reward: -187.300 -2022-11-06 01:10:08 - r - INFO: - Episode: 21/200, Reward: -420.000, Step: 200 -2022-11-06 01:10:19 - r - INFO: - Episode: 22/200, Reward: -390.000, Step: 200 -2022-11-06 01:10:19 - r - INFO: - Episode: 23/200, Reward: -59.000, Step: 49 -2022-11-06 01:10:33 - r - INFO: - Episode: 24/200, Reward: -390.000, Step: 200 -2022-11-06 01:10:33 - r - INFO: - Episode: 25/200, Reward: 2.000, Step: 8 -2022-11-06 01:10:36 - r - INFO: - Episode: 26/200, Reward: -217.000, Step: 117 -2022-11-06 01:10:43 - r - INFO: - Episode: 27/200, Reward: -287.000, Step: 167 -2022-11-06 01:10:47 - r - INFO: - Episode: 28/200, Reward: -248.000, Step: 118 -2022-11-06 01:11:04 - r - INFO: - Episode: 29/200, Reward: -370.000, Step: 200 -2022-11-06 01:11:19 - r - INFO: - Episode: 30/200, Reward: -390.000, Step: 200 -2022-11-06 01:11:32 - r - INFO: - Episode: 31/200, Reward: -370.000, Step: 200 -2022-11-06 01:11:39 - r - INFO: - Episode: 32/200, Reward: -360.000, Step: 200 -2022-11-06 01:11:57 - r - INFO: - Episode: 33/200, Reward: -420.000, Step: 200 -2022-11-06 01:12:16 - r - INFO: - Episode: 34/200, Reward: -430.000, Step: 200 -2022-11-06 01:12:34 - r - INFO: - Episode: 35/200, Reward: -430.000, Step: 200 -2022-11-06 01:12:55 - r - INFO: - Episode: 36/200, Reward: -430.000, Step: 200 -2022-11-06 01:13:09 - r - INFO: - Episode: 37/200, Reward: -380.000, Step: 200 -2022-11-06 01:13:27 - r - INFO: - Episode: 38/200, Reward: -420.000, Step: 200 -2022-11-06 01:13:40 - r - INFO: - Episode: 39/200, Reward: 
-350.000, Step: 200 -2022-11-06 01:13:55 - r - INFO: - Episode: 40/200, Reward: -370.000, Step: 200 -2022-11-06 01:14:09 - r - INFO: - Episode: 41/200, Reward: -400.000, Step: 200 -2022-11-06 01:14:26 - r - INFO: - Episode: 42/200, Reward: -410.000, Step: 200 -2022-11-06 01:14:40 - r - INFO: - Episode: 43/200, Reward: -360.000, Step: 200 -2022-11-06 01:14:40 - r - INFO: - Episode: 44/200, Reward: -16.000, Step: 16 -2022-11-06 01:14:40 - r - INFO: - Episode: 45/200, Reward: -23.000, Step: 13 -2022-11-06 01:14:52 - r - INFO: - Episode: 46/200, Reward: -390.000, Step: 200 -2022-11-06 01:15:08 - r - INFO: - Episode: 47/200, Reward: -390.000, Step: 200 -2022-11-06 01:15:09 - r - INFO: - Episode: 48/200, Reward: -109.000, Step: 79 -2022-11-06 01:15:22 - r - INFO: - Episode: 49/200, Reward: -300.000, Step: 200 -2022-11-06 01:15:39 - r - INFO: - Episode: 50/200, Reward: -370.000, Step: 200 -2022-11-06 01:15:55 - r - INFO: - Episode: 51/200, Reward: -460.000, Step: 200 -2022-11-06 01:16:11 - r - INFO: - Episode: 52/200, Reward: -350.000, Step: 200 -2022-11-06 01:16:23 - r - INFO: - Episode: 53/200, Reward: -320.000, Step: 200 -2022-11-06 01:16:32 - r - INFO: - Episode: 54/200, Reward: -310.000, Step: 200 -2022-11-06 01:16:47 - r - INFO: - Episode: 55/200, Reward: -390.000, Step: 200 -2022-11-06 01:17:01 - r - INFO: - Episode: 56/200, Reward: -370.000, Step: 200 -2022-11-06 01:17:19 - r - INFO: - Episode: 57/200, Reward: -390.000, Step: 200 -2022-11-06 01:17:34 - r - INFO: - Episode: 58/200, Reward: -350.000, Step: 200 -2022-11-06 01:17:35 - r - INFO: - Episode: 59/200, Reward: -123.000, Step: 73 -2022-11-06 01:17:39 - r - INFO: - Episode: 60/200, Reward: -204.000, Step: 124 -2022-11-06 01:17:40 - r - INFO: - Episode: 61/200, Reward: -39.000, Step: 29 -2022-11-06 01:17:41 - r - INFO: - Episode: 62/200, Reward: -155.000, Step: 85 -2022-11-06 01:17:42 - r - INFO: - Episode: 63/200, Reward: -108.000, Step: 58 -2022-11-06 01:17:49 - r - INFO: - Episode: 64/200, Reward: -249.000, 
Step: 169 -2022-11-06 01:17:51 - r - INFO: - Episode: 65/200, Reward: -170.000, Step: 100 -2022-11-06 01:17:51 - r - INFO: - Current episode 65 has the best eval reward: -181.800 -2022-11-06 01:17:51 - r - INFO: - Episode: 66/200, Reward: 1.000, Step: 9 -2022-11-06 01:17:51 - r - INFO: - Episode: 67/200, Reward: -23.000, Step: 23 -2022-11-06 01:17:52 - r - INFO: - Episode: 68/200, Reward: -104.000, Step: 74 -2022-11-06 01:17:56 - r - INFO: - Episode: 69/200, Reward: -223.000, Step: 123 -2022-11-06 01:18:11 - r - INFO: - Episode: 70/200, Reward: -350.000, Step: 200 -2022-11-06 01:18:13 - r - INFO: - Episode: 71/200, Reward: -124.000, Step: 104 -2022-11-06 01:18:13 - r - INFO: - Episode: 72/200, Reward: -20.000, Step: 20 -2022-11-06 01:18:26 - r - INFO: - Episode: 73/200, Reward: -360.000, Step: 200 -2022-11-06 01:18:26 - r - INFO: - Episode: 74/200, Reward: -67.000, Step: 37 -2022-11-06 01:18:40 - r - INFO: - Episode: 75/200, Reward: -360.000, Step: 200 -2022-11-06 01:18:41 - r - INFO: - Episode: 76/200, Reward: -71.000, Step: 41 -2022-11-06 01:18:41 - r - INFO: - Episode: 77/200, Reward: -23.000, Step: 23 -2022-11-06 01:18:41 - r - INFO: - Episode: 78/200, Reward: -41.000, Step: 21 -2022-11-06 01:18:41 - r - INFO: - Episode: 79/200, Reward: -1.000, Step: 11 -2022-11-06 01:18:50 - r - INFO: - Episode: 80/200, Reward: -270.000, Step: 200 -2022-11-06 01:18:50 - r - INFO: - Current episode 80 has the best eval reward: -163.100 -2022-11-06 01:19:02 - r - INFO: - Episode: 81/200, Reward: -330.000, Step: 200 -2022-11-06 01:19:10 - r - INFO: - Episode: 82/200, Reward: -290.000, Step: 200 -2022-11-06 01:19:11 - r - INFO: - Episode: 83/200, Reward: -2.000, Step: 12 -2022-11-06 01:19:25 - r - INFO: - Episode: 84/200, Reward: -300.000, Step: 200 -2022-11-06 01:19:37 - r - INFO: - Episode: 85/200, Reward: -380.000, Step: 200 -2022-11-06 01:19:37 - r - INFO: - Episode: 86/200, Reward: -47.000, Step: 37 -2022-11-06 01:19:53 - r - INFO: - Episode: 87/200, Reward: -350.000, Step: 
200 -2022-11-06 01:20:04 - r - INFO: - Episode: 88/200, Reward: -308.000, Step: 188 -2022-11-06 01:20:21 - r - INFO: - Episode: 89/200, Reward: -370.000, Step: 200 -2022-11-06 01:20:27 - r - INFO: - Episode: 90/200, Reward: -214.000, Step: 154 -2022-11-06 01:20:43 - r - INFO: - Episode: 91/200, Reward: -290.000, Step: 200 -2022-11-06 01:21:00 - r - INFO: - Episode: 92/200, Reward: -370.000, Step: 200 -2022-11-06 01:21:01 - r - INFO: - Episode: 93/200, Reward: -32.000, Step: 22 -2022-11-06 01:21:21 - r - INFO: - Episode: 94/200, Reward: -400.000, Step: 200 -2022-11-06 01:21:25 - r - INFO: - Episode: 95/200, Reward: -217.000, Step: 127 -2022-11-06 01:21:41 - r - INFO: - Episode: 96/200, Reward: -330.000, Step: 200 -2022-11-06 01:21:55 - r - INFO: - Episode: 97/200, Reward: -380.000, Step: 200 -2022-11-06 01:22:16 - r - INFO: - Episode: 98/200, Reward: -320.000, Step: 200 -2022-11-06 01:22:32 - r - INFO: - Episode: 99/200, Reward: -300.000, Step: 200 -2022-11-06 01:22:46 - r - INFO: - Episode: 100/200, Reward: -350.000, Step: 200 -2022-11-06 01:23:00 - r - INFO: - Episode: 101/200, Reward: -400.000, Step: 200 -2022-11-06 01:23:11 - r - INFO: - Episode: 102/200, Reward: -330.000, Step: 200 -2022-11-06 01:23:29 - r - INFO: - Episode: 103/200, Reward: -360.000, Step: 200 -2022-11-06 01:23:45 - r - INFO: - Episode: 104/200, Reward: -380.000, Step: 200 -2022-11-06 01:24:06 - r - INFO: - Episode: 105/200, Reward: -400.000, Step: 200 -2022-11-06 01:24:16 - r - INFO: - Episode: 106/200, Reward: -290.000, Step: 200 -2022-11-06 01:24:19 - r - INFO: - Episode: 107/200, Reward: -203.000, Step: 103 -2022-11-06 01:24:19 - r - INFO: - Episode: 108/200, Reward: -74.000, Step: 54 -2022-11-06 01:24:36 - r - INFO: - Episode: 109/200, Reward: -330.000, Step: 200 -2022-11-06 01:24:54 - r - INFO: - Episode: 110/200, Reward: -380.000, Step: 200 -2022-11-06 01:25:03 - r - INFO: - Episode: 111/200, Reward: -263.000, Step: 173 -2022-11-06 01:25:20 - r - INFO: - Episode: 112/200, Reward: 
-290.000, Step: 200 -2022-11-06 01:25:34 - r - INFO: - Episode: 113/200, Reward: -340.000, Step: 200 -2022-11-06 01:25:34 - r - INFO: - Episode: 114/200, Reward: -86.000, Step: 66 -2022-11-06 01:25:50 - r - INFO: - Episode: 115/200, Reward: -340.000, Step: 200 -2022-11-06 01:25:52 - r - INFO: - Episode: 116/200, Reward: -160.000, Step: 110 -2022-11-06 01:26:07 - r - INFO: - Episode: 117/200, Reward: -340.000, Step: 200 -2022-11-06 01:26:15 - r - INFO: - Episode: 118/200, Reward: -320.000, Step: 200 -2022-11-06 01:26:29 - r - INFO: - Episode: 119/200, Reward: -320.000, Step: 200 -2022-11-06 01:26:43 - r - INFO: - Episode: 120/200, Reward: -360.000, Step: 200 -2022-11-06 01:26:56 - r - INFO: - Episode: 121/200, Reward: -330.000, Step: 200 -2022-11-06 01:27:09 - r - INFO: - Episode: 122/200, Reward: -350.000, Step: 200 -2022-11-06 01:27:25 - r - INFO: - Episode: 123/200, Reward: -300.000, Step: 200 -2022-11-06 01:27:38 - r - INFO: - Episode: 124/200, Reward: -320.000, Step: 200 -2022-11-06 01:27:39 - r - INFO: - Episode: 125/200, Reward: -70.000, Step: 40 -2022-11-06 01:27:39 - r - INFO: - Episode: 126/200, Reward: -59.000, Step: 39 -2022-11-06 01:27:55 - r - INFO: - Episode: 127/200, Reward: -340.000, Step: 200 -2022-11-06 01:27:56 - r - INFO: - Episode: 128/200, Reward: -87.000, Step: 77 -2022-11-06 01:28:13 - r - INFO: - Episode: 129/200, Reward: -330.000, Step: 200 -2022-11-06 01:28:22 - r - INFO: - Episode: 130/200, Reward: -260.000, Step: 200 -2022-11-06 01:28:38 - r - INFO: - Episode: 131/200, Reward: -290.000, Step: 200 -2022-11-06 01:28:57 - r - INFO: - Episode: 132/200, Reward: -330.000, Step: 200 -2022-11-06 01:29:07 - r - INFO: - Episode: 133/200, Reward: -340.000, Step: 200 -2022-11-06 01:29:08 - r - INFO: - Episode: 134/200, Reward: -78.000, Step: 48 -2022-11-06 01:29:23 - r - INFO: - Episode: 135/200, Reward: -390.000, Step: 200 -2022-11-06 01:29:33 - r - INFO: - Episode: 136/200, Reward: -320.000, Step: 200 -2022-11-06 01:29:51 - r - INFO: - Episode: 
137/200, Reward: -360.000, Step: 200 -2022-11-06 01:30:06 - r - INFO: - Episode: 138/200, Reward: -340.000, Step: 200 -2022-11-06 01:30:10 - r - INFO: - Episode: 139/200, Reward: -185.000, Step: 115 -2022-11-06 01:30:26 - r - INFO: - Episode: 140/200, Reward: -340.000, Step: 200 -2022-11-06 01:30:43 - r - INFO: - Episode: 141/200, Reward: -250.000, Step: 200 -2022-11-06 01:30:57 - r - INFO: - Episode: 142/200, Reward: -347.000, Step: 197 -2022-11-06 01:31:11 - r - INFO: - Episode: 143/200, Reward: -320.000, Step: 200 -2022-11-06 01:31:25 - r - INFO: - Episode: 144/200, Reward: -330.000, Step: 200 -2022-11-06 01:31:37 - r - INFO: - Episode: 145/200, Reward: -270.000, Step: 200 -2022-11-06 01:31:55 - r - INFO: - Episode: 146/200, Reward: -380.000, Step: 200 -2022-11-06 01:32:10 - r - INFO: - Episode: 147/200, Reward: -320.000, Step: 200 -2022-11-06 01:32:27 - r - INFO: - Episode: 148/200, Reward: -340.000, Step: 200 -2022-11-06 01:32:38 - r - INFO: - Episode: 149/200, Reward: -310.000, Step: 200 -2022-11-06 01:32:57 - r - INFO: - Episode: 150/200, Reward: -290.000, Step: 200 -2022-11-06 01:33:10 - r - INFO: - Episode: 151/200, Reward: -380.000, Step: 200 -2022-11-06 01:33:21 - r - INFO: - Episode: 152/200, Reward: -281.000, Step: 181 -2022-11-06 01:33:21 - r - INFO: - Episode: 153/200, Reward: -30.000, Step: 30 -2022-11-06 01:33:33 - r - INFO: - Episode: 154/200, Reward: -280.000, Step: 200 -2022-11-06 01:33:45 - r - INFO: - Episode: 155/200, Reward: -300.000, Step: 200 -2022-11-06 01:33:59 - r - INFO: - Episode: 156/200, Reward: -300.000, Step: 200 -2022-11-06 01:34:10 - r - INFO: - Episode: 157/200, Reward: -300.000, Step: 200 -2022-11-06 01:34:28 - r - INFO: - Episode: 158/200, Reward: -370.000, Step: 200 -2022-11-06 01:34:45 - r - INFO: - Episode: 159/200, Reward: -320.000, Step: 200 -2022-11-06 01:34:52 - r - INFO: - Episode: 160/200, Reward: -250.000, Step: 200 -2022-11-06 01:35:04 - r - INFO: - Episode: 161/200, Reward: -370.000, Step: 200 -2022-11-06 01:35:16 
- r - INFO: - Episode: 162/200, Reward: -290.000, Step: 200 -2022-11-06 01:35:31 - r - INFO: - Episode: 163/200, Reward: -320.000, Step: 200 -2022-11-06 01:35:41 - r - INFO: - Episode: 164/200, Reward: -290.000, Step: 200 -2022-11-06 01:35:41 - r - INFO: - Episode: 165/200, Reward: -44.000, Step: 44 -2022-11-06 01:35:53 - r - INFO: - Episode: 166/200, Reward: -216.000, Step: 196 -2022-11-06 01:36:06 - r - INFO: - Episode: 167/200, Reward: -340.000, Step: 200 -2022-11-06 01:36:23 - r - INFO: - Episode: 168/200, Reward: -360.000, Step: 200 -2022-11-06 01:36:38 - r - INFO: - Episode: 169/200, Reward: -310.000, Step: 200 -2022-11-06 01:36:51 - r - INFO: - Episode: 170/200, Reward: -320.000, Step: 200 -2022-11-06 01:37:08 - r - INFO: - Episode: 171/200, Reward: -280.000, Step: 200 -2022-11-06 01:37:17 - r - INFO: - Episode: 172/200, Reward: -290.000, Step: 200 -2022-11-06 01:37:33 - r - INFO: - Episode: 173/200, Reward: -280.000, Step: 200 -2022-11-06 01:37:45 - r - INFO: - Episode: 174/200, Reward: -300.000, Step: 200 -2022-11-06 01:38:02 - r - INFO: - Episode: 175/200, Reward: -350.000, Step: 200 -2022-11-06 01:38:17 - r - INFO: - Episode: 176/200, Reward: -320.000, Step: 200 -2022-11-06 01:38:31 - r - INFO: - Episode: 177/200, Reward: -320.000, Step: 200 -2022-11-06 01:38:47 - r - INFO: - Episode: 178/200, Reward: -320.000, Step: 200 -2022-11-06 01:39:03 - r - INFO: - Episode: 179/200, Reward: -300.000, Step: 200 -2022-11-06 01:39:04 - r - INFO: - Episode: 180/200, Reward: -117.000, Step: 87 -2022-11-06 01:39:06 - r - INFO: - Episode: 181/200, Reward: -158.000, Step: 88 -2022-11-06 01:39:23 - r - INFO: - Episode: 182/200, Reward: -300.000, Step: 200 -2022-11-06 01:39:34 - r - INFO: - Episode: 183/200, Reward: -290.000, Step: 200 -2022-11-06 01:39:51 - r - INFO: - Episode: 184/200, Reward: -350.000, Step: 200 -2022-11-06 01:40:09 - r - INFO: - Episode: 185/200, Reward: -310.000, Step: 200 -2022-11-06 01:40:10 - r - INFO: - Episode: 186/200, Reward: -58.000, Step: 38 
-2022-11-06 01:40:26 - r - INFO: - Episode: 187/200, Reward: -290.000, Step: 200 -2022-11-06 01:40:42 - r - INFO: - Episode: 188/200, Reward: -310.000, Step: 200 -2022-11-06 01:40:57 - r - INFO: - Episode: 189/200, Reward: -350.000, Step: 200 -2022-11-06 01:41:12 - r - INFO: - Episode: 190/200, Reward: -300.000, Step: 200 -2022-11-06 01:41:32 - r - INFO: - Episode: 191/200, Reward: -380.000, Step: 200 -2022-11-06 01:41:37 - r - INFO: - Episode: 192/200, Reward: -230.000, Step: 200 -2022-11-06 01:41:37 - r - INFO: - Episode: 193/200, Reward: -26.000, Step: 26 -2022-11-06 01:41:56 - r - INFO: - Episode: 194/200, Reward: -340.000, Step: 200 -2022-11-06 01:42:09 - r - INFO: - Episode: 195/200, Reward: -280.000, Step: 200 -2022-11-06 01:42:10 - r - INFO: - Episode: 196/200, Reward: -106.000, Step: 66 -2022-11-06 01:42:10 - r - INFO: - Episode: 197/200, Reward: -7.000, Step: 17 -2022-11-06 01:42:20 - r - INFO: - Episode: 198/200, Reward: -248.000, Step: 178 -2022-11-06 01:42:22 - r - INFO: - Episode: 199/200, Reward: -161.000, Step: 101 -2022-11-06 01:42:22 - r - INFO: - Episode: 200/200, Reward: -3.000, Step: 13 -2022-11-06 01:42:22 - r - INFO: - Finish training! 
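The deleted `projects/codes/MonteCarlo/agent.py` estimates Q(s, a) by averaging the discounted returns that follow the first visit of each (state, action) pair in an episode.

The sketch below illustrates that update under a few stated assumptions:

* Each episode is a list of `(state, action, reward)` tuples.
* States are hashable.
* The class name `FirstVisitMCSketch` is hypothetical and is not the repository's class.

It computes every return in a single backward pass rather than re-summing from each first occurrence. The resulting values are the same.

```python
from collections import defaultdict

import numpy as np


class FirstVisitMCSketch:
    """Hypothetical on-policy first-visit MC control agent (illustration only)."""

    def __init__(self, n_actions, gamma=0.9, epsilon=0.15, seed=0):
        self.n_actions = n_actions
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.Q = defaultdict(lambda: np.zeros(n_actions))
        self.returns_sum = defaultdict(float)
        self.returns_count = defaultdict(int)

    def sample_action(self, state):
        # epsilon-greedy: with probability epsilon pick uniformly (greedy action included),
        # otherwise pick the greedy action
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(self.Q[state]))

    def update(self, episode):
        """episode: list of (state, action, reward) tuples for ONE finished episode."""
        # Backward pass: G_t = reward_t + gamma * G_{t+1}
        returns = [0.0] * len(episode)
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][2] + self.gamma * G
            returns[t] = G
        # Forward pass: only the first visit of each (state, action) contributes
        seen = set()
        for t, (state, action, _) in enumerate(episode):
            if (state, action) in seen:
                continue
            seen.add((state, action))
            self.returns_sum[(state, action)] += returns[t]
            self.returns_count[(state, action)] += 1
            self.Q[state][action] = (
                self.returns_sum[(state, action)] / self.returns_count[(state, action)]
            )
```

Call `update` once per finished episode. Feeding it a growing partial trajectory after every step would add the early returns to `returns_sum` several times and bias the averages.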
diff --git a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/models/Q_table b/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/models/Q_table deleted file mode 100644 index 3231a0b..0000000 Binary files a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/models/Q_table and /dev/null differ diff --git a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/results/learning_curve.png b/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/results/learning_curve.png deleted file mode 100644 index 3799635..0000000 Binary files a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/results/res.csv b/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/results/res.csv deleted file mode 100644 index 214239b..0000000 --- a/projects/codes/MonteCarlo/Train_Racetrack-v0_FirstVisitMC_20221106-010504/results/res.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards,steps -0,-760,200 -1,-560,200 -2,-156,66 -3,-500,200 -4,-600,200 -5,-350,200 -6,-430,200 -7,-206,96 -8,-460,200 -9,-410,200 -10,-400,200 -11,-380,200 -12,-155,75 -13,-400,200 -14,-350,200 -15,-400,200 -16,-360,200 -17,-420,200 -18,-430,200 -19,-360,200 -20,-420,200 -21,-390,200 -22,-59,49 -23,-390,200 -24,2,8 -25,-217,117 -26,-287,167 -27,-248,118 -28,-370,200 -29,-390,200 -30,-370,200 -31,-360,200 -32,-420,200 -33,-430,200 -34,-430,200 -35,-430,200 -36,-380,200 -37,-420,200 -38,-350,200 -39,-370,200 -40,-400,200 -41,-410,200 -42,-360,200 -43,-16,16 -44,-23,13 -45,-390,200 -46,-390,200 -47,-109,79 -48,-300,200 -49,-370,200 -50,-460,200 -51,-350,200 -52,-320,200 -53,-310,200 -54,-390,200 -55,-370,200 -56,-390,200 -57,-350,200 -58,-123,73 -59,-204,124 -60,-39,29 -61,-155,85 -62,-108,58 -63,-249,169 -64,-170,100 -65,1,9 -66,-23,23 
-67,-104,74 -68,-223,123 -69,-350,200 -70,-124,104 -71,-20,20 -72,-360,200 -73,-67,37 -74,-360,200 -75,-71,41 -76,-23,23 -77,-41,21 -78,-1,11 -79,-270,200 -80,-330,200 -81,-290,200 -82,-2,12 -83,-300,200 -84,-380,200 -85,-47,37 -86,-350,200 -87,-308,188 -88,-370,200 -89,-214,154 -90,-290,200 -91,-370,200 -92,-32,22 -93,-400,200 -94,-217,127 -95,-330,200 -96,-380,200 -97,-320,200 -98,-300,200 -99,-350,200 -100,-400,200 -101,-330,200 -102,-360,200 -103,-380,200 -104,-400,200 -105,-290,200 -106,-203,103 -107,-74,54 -108,-330,200 -109,-380,200 -110,-263,173 -111,-290,200 -112,-340,200 -113,-86,66 -114,-340,200 -115,-160,110 -116,-340,200 -117,-320,200 -118,-320,200 -119,-360,200 -120,-330,200 -121,-350,200 -122,-300,200 -123,-320,200 -124,-70,40 -125,-59,39 -126,-340,200 -127,-87,77 -128,-330,200 -129,-260,200 -130,-290,200 -131,-330,200 -132,-340,200 -133,-78,48 -134,-390,200 -135,-320,200 -136,-360,200 -137,-340,200 -138,-185,115 -139,-340,200 -140,-250,200 -141,-347,197 -142,-320,200 -143,-330,200 -144,-270,200 -145,-380,200 -146,-320,200 -147,-340,200 -148,-310,200 -149,-290,200 -150,-380,200 -151,-281,181 -152,-30,30 -153,-280,200 -154,-300,200 -155,-300,200 -156,-300,200 -157,-370,200 -158,-320,200 -159,-250,200 -160,-370,200 -161,-290,200 -162,-320,200 -163,-290,200 -164,-44,44 -165,-216,196 -166,-340,200 -167,-360,200 -168,-310,200 -169,-320,200 -170,-280,200 -171,-290,200 -172,-280,200 -173,-300,200 -174,-350,200 -175,-320,200 -176,-320,200 -177,-320,200 -178,-300,200 -179,-117,87 -180,-158,88 -181,-300,200 -182,-290,200 -183,-350,200 -184,-310,200 -185,-58,38 -186,-290,200 -187,-310,200 -188,-350,200 -189,-300,200 -190,-380,200 -191,-230,200 -192,-26,26 -193,-340,200 -194,-280,200 -195,-106,66 -196,-7,17 -197,-248,178 -198,-161,101 -199,-3,13 diff --git a/projects/codes/MonteCarlo/agent.py b/projects/codes/MonteCarlo/agent.py deleted file mode 100644 index c426527..0000000 --- a/projects/codes/MonteCarlo/agent.py +++ /dev/null @@ -1,78 +0,0 @@ -#!/usr/bin/env 
python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-12 16:14:34 -LastEditor: John -LastEditTime: 2022-11-06 01:04:57 -Description: -Environment: -''' -import numpy as np -from collections import defaultdict -import torch -import dill - -class FisrtVisitMC: - ''' On-Policy First-Visit MC Control - ''' - def __init__(self,cfg): - self.n_actions = cfg.n_actions - self.epsilon = cfg.epsilon - self.gamma = cfg.gamma - self.Q_table = defaultdict(lambda: np.zeros(cfg.n_actions)) - self.returns_sum = defaultdict(float) # sum of first-visit returns per (state, action) - self.returns_count = defaultdict(float) # number of first visits per (state, action) - - def sample_action(self,state): - state = str(state) - if state in self.Q_table.keys(): - best_action = np.argmax(self.Q_table[state]) - action_probs = np.ones(self.n_actions, dtype=float) * self.epsilon / self.n_actions - action_probs[best_action] += (1.0 - self.epsilon) - action = np.random.choice(np.arange(len(action_probs)), p=action_probs) - else: - action = np.random.randint(0,self.n_actions) - return action - def predict_action(self,state): - state = str(state) - if state in self.Q_table.keys(): - best_action = np.argmax(self.Q_table[state]) - action_probs = np.ones(self.n_actions, dtype=float) * self.epsilon / self.n_actions - action_probs[best_action] += (1.0 - self.epsilon) - action = np.argmax(self.Q_table[state]) # greedy action (action_probs is not used here) - else: - action = np.random.randint(0,self.n_actions) - return action - def update(self,one_ep_transition): - # Find all (state, action) pairs we've visited in this one_ep_transition - # We convert each state to a string so that we can use it as a dict key - sa_in_episode = set([(str(x[0]), x[1]) for x in one_ep_transition]) - for state, action in sa_in_episode: - sa_pair = (state, action) - # Find the first occurrence of the (state, action) pair in the one_ep_transition - - first_occurence_idx = next(i for i,x in 
enumerate(one_ep_transition) - if str(x[0]) == state and x[1] == action) - # Sum up all discounted rewards since the first occurrence - G = 
sum([x[2]*(self.gamma**i) for i,x in enumerate(one_ep_transition[first_occurence_idx:])]) - # Average the return for this (state, action) pair over all sampled episodes - self.returns_sum[sa_pair] += G - self.returns_count[sa_pair] += 1.0 - self.Q_table[state][action] = self.returns_sum[sa_pair] / self.returns_count[sa_pair] - def save_model(self,path=None): - '''Save the Q-table to a file - ''' - from pathlib import Path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save( - obj=self.Q_table, - f=path+"Q_table", - pickle_module=dill - ) - - def load_model(self, path=None): - '''Load the Q-table from a file - ''' - self.Q_table =torch.load(f=path+"Q_table",pickle_module=dill) \ No newline at end of file diff --git a/projects/codes/MonteCarlo/assets/mc_control_algo.png b/projects/codes/MonteCarlo/assets/mc_control_algo.png deleted file mode 100644 index 0b436fa..0000000 Binary files a/projects/codes/MonteCarlo/assets/mc_control_algo.png and /dev/null differ diff --git a/projects/codes/MonteCarlo/config/config.py b/projects/codes/MonteCarlo/config/config.py deleted file mode 100644 index d255547..0000000 --- a/projects/codes/MonteCarlo/config/config.py +++ /dev/null @@ -1,32 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-11-06 00:31:35 -LastEditor: JiangJi -LastEditTime: 2022-11-06 00:45:44 -Description: parameters of MonteCarlo -''' -from common.config import GeneralConfig,AlgoConfig - -class GeneralConfigMC(GeneralConfig): - def __init__(self) -> None: - self.env_name = "Racetrack-v0" # name of environment - self.algo_name = "FirstVisitMC" # name of algorithm - self.mode = "train" # train or test - self.seed = 1 # random seed - self.device = "cpu" # device to use - self.train_eps = 200 # number of episodes for training - self.test_eps = 20 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = "tasks" # path to 
load model - self.show_fig = False # 
show figure or not - self.save_fig = True # save figure or not - -class AlgoConfigMC(AlgoConfig): - def __init__(self) -> None: - self.gamma = 0.90 # discount factor - self.epsilon = 0.15 # epsilon greedy - self.lr = 0.1 # learning rate (unused by FisrtVisitMC, which averages returns instead) \ No newline at end of file diff --git a/projects/codes/MonteCarlo/task0.py b/projects/codes/MonteCarlo/task0.py deleted file mode 100644 index 75e52e1..0000000 --- a/projects/codes/MonteCarlo/task0.py +++ /dev/null @@ -1,125 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-11 14:26:44 -LastEditor: John -LastEditTime: 2022-11-08 23:35:18 -Description: -Environment: -''' -import sys,os -os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized." -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add path to system path - -import datetime -import gym -from envs.wrappers import CliffWalkingWapper -from envs.register import register_env -from common.utils import merge_class_attrs,all_seed -from common.launcher import Launcher -from MonteCarlo.agent import FisrtVisitMC -from MonteCarlo.config.config import GeneralConfigMC,AlgoConfigMC - -class Main(Launcher): - def __init__(self) -> None: - super().__init__() - self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigMC()) - self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigMC()) - def env_agent_config(self,cfg,logger): - ''' create env and agent - ''' - register_env(cfg.env_name) - env = gym.make(cfg.env_name,new_step_api=False) # create env - if cfg.env_name == 'CliffWalking-v0': - env = CliffWalkingWapper(env) - if cfg.seed !=0: # set random seed - all_seed(env,seed=cfg.seed) - try: # state 
dimension - n_states = env.observation_space.n # discrete spaces expose the number of states as .n - 
except AttributeError: - n_states = env.observation_space.shape[0] # otherwise use the first dimension of the observation shape - n_actions = env.action_space.n # action dimension - logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info - # update cfg parameters - setattr(cfg, 'n_states', n_states) - setattr(cfg, 'n_actions', n_actions) - agent = FisrtVisitMC(cfg) - return env,agent - def train_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - one_ep_transition = [] - for _ in range(cfg.max_steps): - ep_step += 1 - action = agent.sample_action(state) # sample action - next_state, reward, terminated, info = env.step(action) # old Gym step API (new_step_api=False): obs, reward, done, info - one_ep_transition.append((state, action, reward)) # save transitions - state = next_state # update next state for env - ep_reward += reward # accumulate episode reward - if terminated: - break - agent.update(one_ep_transition) # first-visit MC: update once, after the episode is complete - return agent,ep_reward,ep_step - def test_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - ep_step += 1 - action = agent.predict_action(state) # greedy action - next_state, reward, terminated, info = env.step(action) # old Gym step API (new_step_api=False): obs, reward, done, info - state = next_state # update next state for env - ep_reward += reward # accumulate episode reward - if terminated: - break - return agent,ep_reward,ep_step - -def train(cfg, env, agent): - print("Start training!") - print(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] - for i_ep in range(cfg.train_eps): - state = env.reset() - ep_reward = 0 - one_ep_transition = [] - while True: - action = agent.sample_action(state) - 
next_state, reward, done, _ = env.step(action) - ep_reward += reward - one_ep_transition.append((state, action, reward)) - state = next_state - if done: - break 
- rewards.append(ep_reward) - agent.update(one_ep_transition) - if (i_ep+1) % 10 == 0: - print(f"Episode:{i_ep+1}/{cfg.train_eps}: Reward:{ep_reward}") - print("Finish training!") - return {'rewards':rewards} - -def test(cfg, env, agent): - print("Start testing!") - print(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] - for i_ep in range(cfg.test_eps): - state = env.reset() - ep_reward = 0 - while True: - action = agent.predict_action(state) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - state = next_state - if done: - break - rewards.append(ep_reward) - print(f'Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}') - return {'rewards':rewards} - -if __name__ == "__main__": - main = Main() - main.run() \ No newline at end of file diff --git a/projects/codes/NoisyDQN/noisy_dqn.py b/projects/codes/NoisyDQN/noisy_dqn.py deleted file mode 100644 index 45cc5d2..0000000 --- a/projects/codes/NoisyDQN/noisy_dqn.py +++ /dev/null @@ -1,54 +0,0 @@ -import math -import torch -import torch.nn as nn -import torch.nn.functional as F - -class NoisyLinear(nn.Module): - def __init__(self, input_dim, output_dim, std_init=0.4): - super(NoisyLinear, self).__init__() - - self.input_dim = input_dim - self.output_dim = output_dim - self.std_init = std_init - - self.weight_mu = nn.Parameter(torch.FloatTensor(output_dim, input_dim)) - self.weight_sigma = nn.Parameter(torch.FloatTensor(output_dim, input_dim)) - self.register_buffer('weight_epsilon', torch.FloatTensor(output_dim, input_dim)) - - self.bias_mu = nn.Parameter(torch.FloatTensor(output_dim)) - self.bias_sigma = nn.Parameter(torch.FloatTensor(output_dim)) - self.register_buffer('bias_epsilon', torch.FloatTensor(output_dim)) - - self.reset_parameters() - self.reset_noise() - - def forward(self, x): - if self.training: # perturb weights and biases with noise only in training mode - weight = self.weight_mu + self.weight_sigma.mul(self.weight_epsilon) - bias 
= self.bias_mu + self.bias_sigma.mul(self.bias_epsilon) - else: - weight = self.weight_mu - bias = self.bias_mu - - return F.linear(x, weight, bias) - - def 
reset_parameters(self): - mu_range = 1 / math.sqrt(self.weight_mu.size(1)) - - self.weight_mu.data.uniform_(-mu_range, mu_range) - self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.weight_sigma.size(1))) - - self.bias_mu.data.uniform_(-mu_range, mu_range) - self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.bias_sigma.size(0))) - - def reset_noise(self): - epsilon_in = self._scale_noise(self.input_dim) - epsilon_out = self._scale_noise(self.output_dim) - - self.weight_epsilon.copy_(epsilon_out.ger(epsilon_in)) - self.bias_epsilon.copy_(self._scale_noise(self.output_dim)) - - def _scale_noise(self, size): - x = torch.randn(size) - x = x.sign().mul(x.abs().sqrt()) - return x \ No newline at end of file diff --git a/projects/codes/NoisyDQN/task0_train.ipynb b/projects/codes/NoisyDQN/task0_train.ipynb deleted file mode 100644 index ecd0092..0000000 --- a/projects/codes/NoisyDQN/task0_train.ipynb +++ /dev/null @@ -1,25 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import sys\n", - "from pathlib import Path\n", - "curr_path = str(Path().absolute()) # 当前路径\n", - "parent_path = str(Path().absolute().parent) # 父路径\n", - "sys.path.append(parent_path) # 添加路径到系统路径" - ] - } - ], - "metadata": { - "language_info": { - "name": "python" - }, - "orig_nbformat": 4 - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/config.yaml b/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/config.yaml deleted file mode 100644 index 39f8743..0000000 --- a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/config.yaml +++ /dev/null @@ -1,25 +0,0 @@ -general_cfg: - algo_name: PER_DQN - device: cpu - env_name: CartPole-v1 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: true - load_path: Train_CartPole-v1_PER_DQN_20221113-162804 - max_steps: 200 - mode: test - save_fig: true - seed: 0 - 
show_fig: false - test_eps: 10 - train_eps: 200 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - hidden_dim: 256 - lr: 0.0001 - target_update: 4 diff --git a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/logs/log.txt b/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/logs/log.txt deleted file mode 100644 index 9fe5454..0000000 --- a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/logs/log.txt +++ /dev/null @@ -1,14 +0,0 @@ -2022-11-14 10:46:49 - r - INFO: - n_states: 4, n_actions: 2 -2022-11-14 10:46:49 - r - INFO: - Start testing! -2022-11-14 10:46:49 - r - INFO: - Env: CartPole-v1, Algorithm: PER_DQN, Device: cpu -2022-11-14 10:46:49 - r - INFO: - Episode: 1/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 2/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 3/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 4/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 5/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 6/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 7/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 8/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 9/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Episode: 10/10, Reward: 200.000, Step: 200 -2022-11-14 10:46:49 - r - INFO: - Finish testing! 
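The `NoisyLinear` layer in the deleted `projects/codes/NoisyDQN/noisy_dqn.py` replaces fixed weights with `mu + sigma * epsilon`. It draws the weight noise as the outer product of two small vectors passed through f(x) = sign(x) * sqrt(|x|). This is factorized Gaussian noise, as in NoisyNet.

The sketch below is a NumPy-only illustration of the forward pass and the noise sampling. It has no autograd, so `mu` and `sigma` are not trained here. The names `NoisyLinearSketch` and `scale_noise` are hypothetical. The initialization constants mirror the deleted code's `reset_parameters`.

```python
import numpy as np


def scale_noise(rng, size):
    # f(x) = sign(x) * sqrt(|x|) applied to standard-normal samples
    x = rng.standard_normal(size)
    return np.sign(x) * np.sqrt(np.abs(x))


class NoisyLinearSketch:
    """Hypothetical NumPy port of a factorized-noise linear layer (illustration only)."""

    def __init__(self, input_dim, output_dim, std_init=0.4, seed=0):
        self.input_dim, self.output_dim = input_dim, output_dim
        self.rng = np.random.default_rng(seed)
        mu_range = 1.0 / np.sqrt(input_dim)
        # Means initialized uniformly in [-1/sqrt(input_dim), 1/sqrt(input_dim)]
        self.weight_mu = self.rng.uniform(-mu_range, mu_range, (output_dim, input_dim))
        self.bias_mu = self.rng.uniform(-mu_range, mu_range, output_dim)
        # Noise scales initialized to constants, as in the deleted reset_parameters
        self.weight_sigma = np.full((output_dim, input_dim), std_init / np.sqrt(input_dim))
        self.bias_sigma = np.full(output_dim, std_init / np.sqrt(output_dim))
        self.reset_noise()

    def reset_noise(self):
        eps_in = scale_noise(self.rng, self.input_dim)
        eps_out = scale_noise(self.rng, self.output_dim)
        # Outer product: input_dim + output_dim samples cover the whole weight matrix
        self.weight_epsilon = np.outer(eps_out, eps_in)
        # Mirrors the deleted code, which draws a separate sample for the bias noise
        self.bias_epsilon = scale_noise(self.rng, self.output_dim)

    def forward(self, x, training=True):
        if training:
            weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
            bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
        else:
            weight, bias = self.weight_mu, self.bias_mu
        # Same convention as torch.nn.functional.linear: y = x W^T + b
        return x @ weight.T + bias
```

Because the weight noise is an outer product, each resample needs only `input_dim + output_dim` Gaussian draws instead of `input_dim * output_dim`.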
diff --git a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/models/checkpoint.pt b/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/models/checkpoint.pt deleted file mode 100644 index 06d607b..0000000 Binary files a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/models/checkpoint.pt and /dev/null differ diff --git a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/results/learning_curve.png b/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/results/learning_curve.png deleted file mode 100644 index f1e8056..0000000 Binary files a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/results/res.csv b/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/results/res.csv deleted file mode 100644 index cbbcf2e..0000000 --- a/projects/codes/PER_DQN/Test_CartPole-v1_PER_DQN_20221114-104649/results/res.csv +++ /dev/null @@ -1,11 +0,0 @@ -episodes,rewards,steps -0,200.0,200 -1,200.0,200 -2,200.0,200 -3,200.0,200 -4,200.0,200 -5,200.0,200 -6,200.0,200 -7,200.0,200 -8,200.0,200 -9,200.0,200 diff --git a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/config.yaml b/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/config.yaml deleted file mode 100644 index bd4f2bd..0000000 --- a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/config.yaml +++ /dev/null @@ -1,25 +0,0 @@ -general_cfg: - algo_name: PER_DQN - device: cuda - env_name: CartPole-v1 - eval_eps: 10 - eval_per_episode: 5 - load_checkpoint: false - load_path: tasks - max_steps: 200 - mode: train - save_fig: true - seed: 1 - show_fig: false - test_eps: 10 - train_eps: 200 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - hidden_dim: 256 - lr: 0.0001 - target_update: 4 diff 
--git a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/logs/log.txt b/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/logs/log.txt deleted file mode 100644 index 1cea48c..0000000 --- a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/logs/log.txt +++ /dev/null @@ -1,224 +0,0 @@ -2022-11-13 16:28:04 - r - INFO: - n_states: 4, n_actions: 2 -2022-11-13 16:28:19 - r - INFO: - Start training! -2022-11-13 16:28:19 - r - INFO: - Env: CartPole-v1, Algorithm: PER_DQN, Device: cuda -2022-11-13 16:28:23 - r - INFO: - Episode: 1/200, Reward: 18.000, Step: 18 -2022-11-13 16:28:24 - r - INFO: - Episode: 2/200, Reward: 35.000, Step: 35 -2022-11-13 16:28:24 - r - INFO: - Episode: 3/200, Reward: 13.000, Step: 13 -2022-11-13 16:28:24 - r - INFO: - Episode: 4/200, Reward: 20.000, Step: 20 -2022-11-13 16:28:24 - r - INFO: - Episode: 5/200, Reward: 24.000, Step: 24 -2022-11-13 16:28:24 - r - INFO: - Current episode 5 has the best eval reward: 9.100 -2022-11-13 16:28:24 - r - INFO: - Episode: 6/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:24 - r - INFO: - Episode: 7/200, Reward: 20.000, Step: 20 -2022-11-13 16:28:24 - r - INFO: - Episode: 8/200, Reward: 19.000, Step: 19 -2022-11-13 16:28:25 - r - INFO: - Episode: 9/200, Reward: 30.000, Step: 30 -2022-11-13 16:28:25 - r - INFO: - Episode: 10/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:25 - r - INFO: - Current episode 10 has the best eval reward: 9.200 -2022-11-13 16:28:25 - r - INFO: - Episode: 11/200, Reward: 16.000, Step: 16 -2022-11-13 16:28:25 - r - INFO: - Episode: 12/200, Reward: 16.000, Step: 16 -2022-11-13 16:28:25 - r - INFO: - Episode: 13/200, Reward: 12.000, Step: 12 -2022-11-13 16:28:25 - r - INFO: - Episode: 14/200, Reward: 28.000, Step: 28 -2022-11-13 16:28:25 - r - INFO: - Episode: 15/200, Reward: 22.000, Step: 22 -2022-11-13 16:28:25 - r - INFO: - Current episode 15 has the best eval reward: 9.300 -2022-11-13 16:28:25 - r - INFO: - Episode: 16/200, Reward: 14.000, 
Step: 14 -2022-11-13 16:28:25 - r - INFO: - Episode: 17/200, Reward: 9.000, Step: 9 -2022-11-13 16:28:26 - r - INFO: - Episode: 18/200, Reward: 13.000, Step: 13 -2022-11-13 16:28:26 - r - INFO: - Episode: 19/200, Reward: 19.000, Step: 19 -2022-11-13 16:28:26 - r - INFO: - Episode: 20/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:26 - r - INFO: - Episode: 21/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:26 - r - INFO: - Episode: 22/200, Reward: 12.000, Step: 12 -2022-11-13 16:28:26 - r - INFO: - Episode: 23/200, Reward: 9.000, Step: 9 -2022-11-13 16:28:26 - r - INFO: - Episode: 24/200, Reward: 12.000, Step: 12 -2022-11-13 16:28:26 - r - INFO: - Episode: 25/200, Reward: 11.000, Step: 11 -2022-11-13 16:28:26 - r - INFO: - Current episode 25 has the best eval reward: 9.800 -2022-11-13 16:28:26 - r - INFO: - Episode: 26/200, Reward: 11.000, Step: 11 -2022-11-13 16:28:26 - r - INFO: - Episode: 27/200, Reward: 13.000, Step: 13 -2022-11-13 16:28:26 - r - INFO: - Episode: 28/200, Reward: 11.000, Step: 11 -2022-11-13 16:28:27 - r - INFO: - Episode: 29/200, Reward: 13.000, Step: 13 -2022-11-13 16:28:27 - r - INFO: - Episode: 30/200, Reward: 20.000, Step: 20 -2022-11-13 16:28:27 - r - INFO: - Current episode 30 has the best eval reward: 12.200 -2022-11-13 16:28:27 - r - INFO: - Episode: 31/200, Reward: 16.000, Step: 16 -2022-11-13 16:28:27 - r - INFO: - Episode: 32/200, Reward: 9.000, Step: 9 -2022-11-13 16:28:27 - r - INFO: - Episode: 33/200, Reward: 16.000, Step: 16 -2022-11-13 16:28:27 - r - INFO: - Episode: 34/200, Reward: 15.000, Step: 15 -2022-11-13 16:28:27 - r - INFO: - Episode: 35/200, Reward: 12.000, Step: 12 -2022-11-13 16:28:27 - r - INFO: - Current episode 35 has the best eval reward: 12.500 -2022-11-13 16:28:27 - r - INFO: - Episode: 36/200, Reward: 12.000, Step: 12 -2022-11-13 16:28:27 - r - INFO: - Episode: 37/200, Reward: 16.000, Step: 16 -2022-11-13 16:28:28 - r - INFO: - Episode: 38/200, Reward: 13.000, Step: 13 -2022-11-13 16:28:28 - r - INFO: - 
Episode: 39/200, Reward: 18.000, Step: 18 -2022-11-13 16:28:28 - r - INFO: - Episode: 40/200, Reward: 18.000, Step: 18 -2022-11-13 16:28:28 - r - INFO: - Current episode 40 has the best eval reward: 20.400 -2022-11-13 16:28:28 - r - INFO: - Episode: 41/200, Reward: 48.000, Step: 48 -2022-11-13 16:28:29 - r - INFO: - Episode: 42/200, Reward: 52.000, Step: 52 -2022-11-13 16:28:29 - r - INFO: - Episode: 43/200, Reward: 33.000, Step: 33 -2022-11-13 16:28:29 - r - INFO: - Episode: 44/200, Reward: 15.000, Step: 15 -2022-11-13 16:28:29 - r - INFO: - Episode: 45/200, Reward: 18.000, Step: 18 -2022-11-13 16:28:29 - r - INFO: - Episode: 46/200, Reward: 22.000, Step: 22 -2022-11-13 16:28:29 - r - INFO: - Episode: 47/200, Reward: 19.000, Step: 19 -2022-11-13 16:28:30 - r - INFO: - Episode: 48/200, Reward: 19.000, Step: 19 -2022-11-13 16:28:30 - r - INFO: - Episode: 49/200, Reward: 11.000, Step: 11 -2022-11-13 16:28:30 - r - INFO: - Episode: 50/200, Reward: 9.000, Step: 9 -2022-11-13 16:28:30 - r - INFO: - Episode: 51/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:30 - r - INFO: - Episode: 52/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:30 - r - INFO: - Episode: 53/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:30 - r - INFO: - Episode: 54/200, Reward: 10.000, Step: 10 -2022-11-13 16:28:30 - r - INFO: - Episode: 55/200, Reward: 9.000, Step: 9 -2022-11-13 16:28:30 - r - INFO: - Episode: 56/200, Reward: 17.000, Step: 17 -2022-11-13 16:28:31 - r - INFO: - Episode: 57/200, Reward: 75.000, Step: 75 -2022-11-13 16:28:31 - r - INFO: - Episode: 58/200, Reward: 28.000, Step: 28 -2022-11-13 16:28:31 - r - INFO: - Episode: 59/200, Reward: 30.000, Step: 30 -2022-11-13 16:28:32 - r - INFO: - Episode: 60/200, Reward: 54.000, Step: 54 -2022-11-13 16:28:32 - r - INFO: - Current episode 60 has the best eval reward: 34.600 -2022-11-13 16:28:32 - r - INFO: - Episode: 61/200, Reward: 22.000, Step: 22 -2022-11-13 16:28:32 - r - INFO: - Episode: 62/200, Reward: 28.000, Step: 28 -2022-11-13 
16:28:32 - r - INFO: - Episode: 63/200, Reward: 26.000, Step: 26 -2022-11-13 16:28:33 - r - INFO: - Episode: 64/200, Reward: 32.000, Step: 32 -2022-11-13 16:28:33 - r - INFO: - Episode: 65/200, Reward: 30.000, Step: 30 -2022-11-13 16:28:33 - r - INFO: - Episode: 66/200, Reward: 29.000, Step: 29 -2022-11-13 16:28:34 - r - INFO: - Episode: 67/200, Reward: 28.000, Step: 28 -2022-11-13 16:28:34 - r - INFO: - Episode: 68/200, Reward: 38.000, Step: 38 -2022-11-13 16:28:34 - r - INFO: - Episode: 69/200, Reward: 28.000, Step: 28 -2022-11-13 16:28:34 - r - INFO: - Episode: 70/200, Reward: 22.000, Step: 22 -2022-11-13 16:28:34 - r - INFO: - Current episode 70 has the best eval reward: 36.700 -2022-11-13 16:28:35 - r - INFO: - Episode: 71/200, Reward: 40.000, Step: 40 -2022-11-13 16:28:35 - r - INFO: - Episode: 72/200, Reward: 27.000, Step: 27 -2022-11-13 16:28:35 - r - INFO: - Episode: 73/200, Reward: 24.000, Step: 24 -2022-11-13 16:28:35 - r - INFO: - Episode: 74/200, Reward: 47.000, Step: 47 -2022-11-13 16:28:36 - r - INFO: - Episode: 75/200, Reward: 127.000, Step: 127 -2022-11-13 16:28:37 - r - INFO: - Episode: 76/200, Reward: 48.000, Step: 48 -2022-11-13 16:28:37 - r - INFO: - Episode: 77/200, Reward: 27.000, Step: 27 -2022-11-13 16:28:37 - r - INFO: - Episode: 78/200, Reward: 65.000, Step: 65 -2022-11-13 16:28:38 - r - INFO: - Episode: 79/200, Reward: 75.000, Step: 75 -2022-11-13 16:28:38 - r - INFO: - Episode: 80/200, Reward: 47.000, Step: 47 -2022-11-13 16:28:38 - r - INFO: - Current episode 80 has the best eval reward: 37.200 -2022-11-13 16:28:39 - r - INFO: - Episode: 81/200, Reward: 34.000, Step: 34 -2022-11-13 16:28:39 - r - INFO: - Episode: 82/200, Reward: 38.000, Step: 38 -2022-11-13 16:28:39 - r - INFO: - Episode: 83/200, Reward: 24.000, Step: 24 -2022-11-13 16:28:39 - r - INFO: - Episode: 84/200, Reward: 47.000, Step: 47 -2022-11-13 16:28:40 - r - INFO: - Episode: 85/200, Reward: 35.000, Step: 35 -2022-11-13 16:28:40 - r - INFO: - Current episode 85 has the 
best eval reward: 66.900 -2022-11-13 16:28:41 - r - INFO: - Episode: 86/200, Reward: 103.000, Step: 103 -2022-11-13 16:28:41 - r - INFO: - Episode: 87/200, Reward: 64.000, Step: 64 -2022-11-13 16:28:42 - r - INFO: - Episode: 88/200, Reward: 59.000, Step: 59 -2022-11-13 16:28:43 - r - INFO: - Episode: 89/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:44 - r - INFO: - Episode: 90/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:46 - r - INFO: - Current episode 90 has the best eval reward: 200.000 -2022-11-13 16:28:47 - r - INFO: - Episode: 91/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:48 - r - INFO: - Episode: 92/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:50 - r - INFO: - Episode: 93/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:51 - r - INFO: - Episode: 94/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:52 - r - INFO: - Episode: 95/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:54 - r - INFO: - Current episode 95 has the best eval reward: 200.000 -2022-11-13 16:28:55 - r - INFO: - Episode: 96/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:56 - r - INFO: - Episode: 97/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:58 - r - INFO: - Episode: 98/200, Reward: 200.000, Step: 200 -2022-11-13 16:28:59 - r - INFO: - Episode: 99/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:00 - r - INFO: - Episode: 100/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:02 - r - INFO: - Current episode 100 has the best eval reward: 200.000 -2022-11-13 16:29:04 - r - INFO: - Episode: 101/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:05 - r - INFO: - Episode: 102/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:06 - r - INFO: - Episode: 103/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:07 - r - INFO: - Episode: 104/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:09 - r - INFO: - Episode: 105/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:10 - r - INFO: - Current episode 105 has the best eval reward: 200.000 -2022-11-13 16:29:11 - r - INFO: - 
Episode: 106/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:13 - r - INFO: - Episode: 107/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:14 - r - INFO: - Episode: 108/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:16 - r - INFO: - Episode: 109/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:17 - r - INFO: - Episode: 110/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:20 - r - INFO: - Episode: 111/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:21 - r - INFO: - Episode: 112/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:22 - r - INFO: - Episode: 113/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:23 - r - INFO: - Episode: 114/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:25 - r - INFO: - Episode: 115/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:26 - r - INFO: - Current episode 115 has the best eval reward: 200.000 -2022-11-13 16:29:27 - r - INFO: - Episode: 116/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:29 - r - INFO: - Episode: 117/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:30 - r - INFO: - Episode: 118/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:31 - r - INFO: - Episode: 119/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:33 - r - INFO: - Episode: 120/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:34 - r - INFO: - Current episode 120 has the best eval reward: 200.000 -2022-11-13 16:29:35 - r - INFO: - Episode: 121/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:37 - r - INFO: - Episode: 122/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:38 - r - INFO: - Episode: 123/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:39 - r - INFO: - Episode: 124/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:41 - r - INFO: - Episode: 125/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:43 - r - INFO: - Episode: 126/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:45 - r - INFO: - Episode: 127/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:46 - r - INFO: - Episode: 128/200, Reward: 200.000, Step: 200 -2022-11-13 
16:29:47 - r - INFO: - Episode: 129/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:49 - r - INFO: - Episode: 130/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:51 - r - INFO: - Episode: 131/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:53 - r - INFO: - Episode: 132/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:54 - r - INFO: - Episode: 133/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:55 - r - INFO: - Episode: 134/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:57 - r - INFO: - Episode: 135/200, Reward: 200.000, Step: 200 -2022-11-13 16:29:59 - r - INFO: - Episode: 136/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:01 - r - INFO: - Episode: 137/200, Reward: 185.000, Step: 185 -2022-11-13 16:30:02 - r - INFO: - Episode: 138/200, Reward: 193.000, Step: 193 -2022-11-13 16:30:03 - r - INFO: - Episode: 139/200, Reward: 192.000, Step: 192 -2022-11-13 16:30:04 - r - INFO: - Episode: 140/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:07 - r - INFO: - Episode: 141/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:08 - r - INFO: - Episode: 142/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:10 - r - INFO: - Episode: 143/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:11 - r - INFO: - Episode: 144/200, Reward: 191.000, Step: 191 -2022-11-13 16:30:12 - r - INFO: - Episode: 145/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:15 - r - INFO: - Episode: 146/200, Reward: 184.000, Step: 184 -2022-11-13 16:30:17 - r - INFO: - Episode: 147/200, Reward: 198.000, Step: 198 -2022-11-13 16:30:18 - r - INFO: - Episode: 148/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:19 - r - INFO: - Episode: 149/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:21 - r - INFO: - Episode: 150/200, Reward: 192.000, Step: 192 -2022-11-13 16:30:23 - r - INFO: - Episode: 151/200, Reward: 186.000, Step: 186 -2022-11-13 16:30:25 - r - INFO: - Episode: 152/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:26 - r - INFO: - Episode: 153/200, Reward: 194.000, Step: 194 -2022-11-13 
16:30:27 - r - INFO: - Episode: 154/200, Reward: 199.000, Step: 199 -2022-11-13 16:30:29 - r - INFO: - Episode: 155/200, Reward: 183.000, Step: 183 -2022-11-13 16:30:32 - r - INFO: - Episode: 156/200, Reward: 173.000, Step: 173 -2022-11-13 16:30:33 - r - INFO: - Episode: 157/200, Reward: 197.000, Step: 197 -2022-11-13 16:30:34 - r - INFO: - Episode: 158/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:36 - r - INFO: - Episode: 159/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:37 - r - INFO: - Episode: 160/200, Reward: 196.000, Step: 196 -2022-11-13 16:30:40 - r - INFO: - Episode: 161/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:42 - r - INFO: - Episode: 162/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:43 - r - INFO: - Episode: 163/200, Reward: 194.000, Step: 194 -2022-11-13 16:30:44 - r - INFO: - Episode: 164/200, Reward: 185.000, Step: 185 -2022-11-13 16:30:45 - r - INFO: - Episode: 165/200, Reward: 173.000, Step: 173 -2022-11-13 16:30:48 - r - INFO: - Episode: 166/200, Reward: 192.000, Step: 192 -2022-11-13 16:30:49 - r - INFO: - Episode: 167/200, Reward: 164.000, Step: 164 -2022-11-13 16:30:50 - r - INFO: - Episode: 168/200, Reward: 188.000, Step: 188 -2022-11-13 16:30:52 - r - INFO: - Episode: 169/200, Reward: 189.000, Step: 189 -2022-11-13 16:30:53 - r - INFO: - Episode: 170/200, Reward: 197.000, Step: 197 -2022-11-13 16:30:55 - r - INFO: - Episode: 171/200, Reward: 187.000, Step: 187 -2022-11-13 16:30:57 - r - INFO: - Episode: 172/200, Reward: 200.000, Step: 200 -2022-11-13 16:30:58 - r - INFO: - Episode: 173/200, Reward: 195.000, Step: 195 -2022-11-13 16:30:59 - r - INFO: - Episode: 174/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:01 - r - INFO: - Episode: 175/200, Reward: 195.000, Step: 195 -2022-11-13 16:31:03 - r - INFO: - Episode: 176/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:05 - r - INFO: - Episode: 177/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:06 - r - INFO: - Episode: 178/200, Reward: 200.000, Step: 200 -2022-11-13 
16:31:07 - r - INFO: - Episode: 179/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:09 - r - INFO: - Episode: 180/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:11 - r - INFO: - Episode: 181/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:13 - r - INFO: - Episode: 182/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:14 - r - INFO: - Episode: 183/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:15 - r - INFO: - Episode: 184/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:17 - r - INFO: - Episode: 185/200, Reward: 173.000, Step: 173 -2022-11-13 16:31:19 - r - INFO: - Episode: 186/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:21 - r - INFO: - Episode: 187/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:22 - r - INFO: - Episode: 188/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:23 - r - INFO: - Episode: 189/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:24 - r - INFO: - Episode: 190/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:26 - r - INFO: - Current episode 190 has the best eval reward: 200.000 -2022-11-13 16:31:27 - r - INFO: - Episode: 191/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:29 - r - INFO: - Episode: 192/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:30 - r - INFO: - Episode: 193/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:31 - r - INFO: - Episode: 194/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:33 - r - INFO: - Episode: 195/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:34 - r - INFO: - Current episode 195 has the best eval reward: 200.000 -2022-11-13 16:31:35 - r - INFO: - Episode: 196/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:37 - r - INFO: - Episode: 197/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:38 - r - INFO: - Episode: 198/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:39 - r - INFO: - Episode: 199/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:40 - r - INFO: - Episode: 200/200, Reward: 200.000, Step: 200 -2022-11-13 16:31:42 - r - INFO: - Current episode 200 has the best eval 
reward: 200.000 -2022-11-13 16:31:42 - r - INFO: - Finish training! diff --git a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/models/checkpoint.pt b/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/models/checkpoint.pt deleted file mode 100644 index acaef5b..0000000 Binary files a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/models/checkpoint.pt and /dev/null differ diff --git a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/results/learning_curve.png b/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/results/learning_curve.png deleted file mode 100644 index 6f666e3..0000000 Binary files a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/results/res.csv b/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/results/res.csv deleted file mode 100644 index 1c3339f..0000000 --- a/projects/codes/PER_DQN/Train_CartPole-v1_PER_DQN_20221113-162804/results/res.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards,steps -0,18.0,18 -1,35.0,35 -2,13.0,13 -3,20.0,20 -4,24.0,24 -5,10.0,10 -6,20.0,20 -7,19.0,19 -8,30.0,30 -9,10.0,10 -10,16.0,16 -11,16.0,16 -12,12.0,12 -13,28.0,28 -14,22.0,22 -15,14.0,14 -16,9.0,9 -17,13.0,13 -18,19.0,19 -19,10.0,10 -20,10.0,10 -21,12.0,12 -22,9.0,9 -23,12.0,12 -24,11.0,11 -25,11.0,11 -26,13.0,13 -27,11.0,11 -28,13.0,13 -29,20.0,20 -30,16.0,16 -31,9.0,9 -32,16.0,16 -33,15.0,15 -34,12.0,12 -35,12.0,12 -36,16.0,16 -37,13.0,13 -38,18.0,18 -39,18.0,18 -40,48.0,48 -41,52.0,52 -42,33.0,33 -43,15.0,15 -44,18.0,18 -45,22.0,22 -46,19.0,19 -47,19.0,19 -48,11.0,11 -49,9.0,9 -50,10.0,10 -51,10.0,10 -52,10.0,10 -53,10.0,10 -54,9.0,9 -55,17.0,17 -56,75.0,75 -57,28.0,28 -58,30.0,30 -59,54.0,54 -60,22.0,22 -61,28.0,28 -62,26.0,26 -63,32.0,32 -64,30.0,30 -65,29.0,29 -66,28.0,28 -67,38.0,38 -68,28.0,28 -69,22.0,22 -70,40.0,40 
-71,27.0,27 -72,24.0,24 -73,47.0,47 -74,127.0,127 -75,48.0,48 -76,27.0,27 -77,65.0,65 -78,75.0,75 -79,47.0,47 -80,34.0,34 -81,38.0,38 -82,24.0,24 -83,47.0,47 -84,35.0,35 -85,103.0,103 -86,64.0,64 -87,59.0,59 -88,200.0,200 -89,200.0,200 -90,200.0,200 -91,200.0,200 -92,200.0,200 -93,200.0,200 -94,200.0,200 -95,200.0,200 -96,200.0,200 -97,200.0,200 -98,200.0,200 -99,200.0,200 -100,200.0,200 -101,200.0,200 -102,200.0,200 -103,200.0,200 -104,200.0,200 -105,200.0,200 -106,200.0,200 -107,200.0,200 -108,200.0,200 -109,200.0,200 -110,200.0,200 -111,200.0,200 -112,200.0,200 -113,200.0,200 -114,200.0,200 -115,200.0,200 -116,200.0,200 -117,200.0,200 -118,200.0,200 -119,200.0,200 -120,200.0,200 -121,200.0,200 -122,200.0,200 -123,200.0,200 -124,200.0,200 -125,200.0,200 -126,200.0,200 -127,200.0,200 -128,200.0,200 -129,200.0,200 -130,200.0,200 -131,200.0,200 -132,200.0,200 -133,200.0,200 -134,200.0,200 -135,200.0,200 -136,185.0,185 -137,193.0,193 -138,192.0,192 -139,200.0,200 -140,200.0,200 -141,200.0,200 -142,200.0,200 -143,191.0,191 -144,200.0,200 -145,184.0,184 -146,198.0,198 -147,200.0,200 -148,200.0,200 -149,192.0,192 -150,186.0,186 -151,200.0,200 -152,194.0,194 -153,199.0,199 -154,183.0,183 -155,173.0,173 -156,197.0,197 -157,200.0,200 -158,200.0,200 -159,196.0,196 -160,200.0,200 -161,200.0,200 -162,194.0,194 -163,185.0,185 -164,173.0,173 -165,192.0,192 -166,164.0,164 -167,188.0,188 -168,189.0,189 -169,197.0,197 -170,187.0,187 -171,200.0,200 -172,195.0,195 -173,200.0,200 -174,195.0,195 -175,200.0,200 -176,200.0,200 -177,200.0,200 -178,200.0,200 -179,200.0,200 -180,200.0,200 -181,200.0,200 -182,200.0,200 -183,200.0,200 -184,173.0,173 -185,200.0,200 -186,200.0,200 -187,200.0,200 -188,200.0,200 -189,200.0,200 -190,200.0,200 -191,200.0,200 -192,200.0,200 -193,200.0,200 -194,200.0,200 -195,200.0,200 -196,200.0,200 -197,200.0,200 -198,200.0,200 -199,200.0,200 diff --git a/projects/codes/PER_DQN/config/CartPole-v1_PER_DQN_Test.yaml 
b/projects/codes/PER_DQN/config/CartPole-v1_PER_DQN_Test.yaml deleted file mode 100644 index a1db2ab..0000000 --- a/projects/codes/PER_DQN/config/CartPole-v1_PER_DQN_Test.yaml +++ /dev/null @@ -1,22 +0,0 @@ -general_cfg: - algo_name: PER_DQN - device: cpu - env_name: CartPole-v1 - mode: test - load_checkpoint: true - load_path: Train_CartPole-v1_PER_DQN_20221113-162804 - max_steps: 200 - save_fig: true - seed: 0 - show_fig: false - test_eps: 10 - train_eps: 200 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.0001 - target_update: 4 diff --git a/projects/codes/PER_DQN/config/CartPole-v1_PER_DQN_Train.yaml b/projects/codes/PER_DQN/config/CartPole-v1_PER_DQN_Train.yaml deleted file mode 100644 index 553622f..0000000 --- a/projects/codes/PER_DQN/config/CartPole-v1_PER_DQN_Train.yaml +++ /dev/null @@ -1,22 +0,0 @@ -general_cfg: - algo_name: PER_DQN - device: cuda - env_name: CartPole-v1 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_PER_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 0 - show_fig: false - test_eps: 10 - train_eps: 200 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 500 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.0001 - target_update: 4 diff --git a/projects/codes/PER_DQN/config/config.py b/projects/codes/PER_DQN/config/config.py deleted file mode 100644 index a92c7e0..0000000 --- a/projects/codes/PER_DQN/config/config.py +++ /dev/null @@ -1,38 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-30 00:37:33 -LastEditor: JiangJi -LastEditTime: 2022-10-30 01:19:08 -Discription: default parameters of DQN -''' -from common.config import GeneralConfig,AlgoConfig -class GeneralConfigDQN(GeneralConfig): - def __init__(self) -> None: - self.env_name = "CartPole-v1" # name of environment - self.algo_name = "PER_DQN" # name of algorithm - 
self.mode = "train" # train or test - self.seed = 1 # random seed - self.device = "cuda" # device to use - self.train_eps = 200 # number of episodes for training - self.test_eps = 10 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = "tasks" # path to load model - self.show_fig = False # show figure or not - self.save_fig = True # save figure or not - -class AlgoConfigDQN(AlgoConfig): - def __init__(self) -> None: - # set epsilon_start=epsilon_end can obtain fixed epsilon=epsilon_end - self.epsilon_start = 0.95 # epsilon start value - self.epsilon_end = 0.01 # epsilon end value - self.epsilon_decay = 500 # epsilon decay rate - self.hidden_dim = 256 # hidden_dim for MLP - self.gamma = 0.95 # discount factor - self.lr = 0.0001 # learning rate - self.buffer_size = 100000 # size of replay buffer - self.batch_size = 64 # batch size - self.target_update = 4 # target network update frequency diff --git a/projects/codes/PER_DQN/per_dqn.py b/projects/codes/PER_DQN/per_dqn.py deleted file mode 100644 index 6fbf651..0000000 --- a/projects/codes/PER_DQN/per_dqn.py +++ /dev/null @@ -1,139 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: DingLi -Email: wangzhongren@sjtu.edu.cn -Date: 2022-10-31 22:54:00 -LastEditor: DingLi -LastEditTime: 2022-11-14 10:43:18 -Discription: CartPole-v1 -''' - -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-12 00:50:49 -@LastEditor: John -LastEditTime: 2022-10-26 07:50:24 -@Discription: -@Environment: python 3.7.7 -''' -'''off-policy -''' - -import torch -import torch.nn as nn -import torch.optim as optim -import random -import math -import numpy as np - -class PER_DQN: - def __init__(self,model,memory,cfg): - - self.n_actions = cfg.n_actions - self.device = torch.device(cfg.device) - self.gamma = cfg.gamma - ## e-greedy parameters - self.sample_count = 0 # sample count for epsilon decay - self.epsilon = cfg.epsilon_start - 
self.sample_count = 0 - self.epsilon_start = cfg.epsilon_start - self.epsilon_end = cfg.epsilon_end - self.epsilon_decay = cfg.epsilon_decay - self.batch_size = cfg.batch_size - self.policy_net = model.to(self.device) - self.target_net = model.to(self.device) - ## copy parameters from policy net to target net - for target_param, param in zip(self.target_net.parameters(),self.policy_net.parameters()): - target_param.data.copy_(param.data) - # self.target_net.load_state_dict(self.policy_net.state_dict()) # or use this to copy parameters - self.optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg.lr) - self.memory = memory - self.update_flag = False - - def sample_action(self, state): - ''' sample action with e-greedy policy - ''' - self.sample_count += 1 - # epsilon must decay(linear,exponential and etc.) for balancing exploration and exploitation - self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \ - math.exp(-1. * self.sample_count / self.epsilon_decay) - if random.random() > self.epsilon: - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - q_values = self.policy_net(state) - action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value - else: - action = random.randrange(self.n_actions) - return action - # @torch.no_grad() - # def sample_action(self, state): - # ''' sample action with e-greedy policy - # ''' - # self.sample_count += 1 - # # epsilon must decay(linear,exponential and etc.) for balancing exploration and exploitation - # self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \ - # math.exp(-1. 
* self.sample_count / self.epsilon_decay) - # if random.random() > self.epsilon: - # state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - # q_values = self.policy_net(state) - # action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value - # else: - # action = random.randrange(self.n_actions) - # return action - def predict_action(self,state): - ''' predict action - ''' - with torch.no_grad(): - state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0) - q_values = self.policy_net(state) - action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value - return action - def update(self): - if len(self.memory) < self.batch_size: # when transitions in memory donot meet a batch, not update - # print ("self.batch_size = ", self.batch_size) - return - else: - if not self.update_flag: - print("Begin to update!") - self.update_flag = True - # sample a batch of transitions from replay buffer - (state_batch, action_batch, reward_batch, next_state_batch, done_batch), idxs_batch, is_weights_batch = self.memory.sample( - self.batch_size) - state_batch = torch.tensor(np.array(state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - action_batch = torch.tensor(action_batch, device=self.device).unsqueeze(1) # shape(batchsize,1) - reward_batch = torch.tensor(reward_batch, device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize,1) - next_state_batch = torch.tensor(np.array(next_state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - done_batch = torch.tensor(np.float32(done_batch), device=self.device).unsqueeze(1) # shape(batchsize,1) - q_value_batch = self.policy_net(state_batch).gather(dim=1, index=action_batch) # shape(batchsize,1),requires_grad=True - next_max_q_value_batch = self.target_net(next_state_batch).max(1)[0].detach().unsqueeze(1) - expected_q_value_batch = reward_batch + self.gamma * 
next_max_q_value_batch* (1-done_batch) - - loss = torch.mean(torch.pow((q_value_batch - expected_q_value_batch) * torch.from_numpy(is_weights_batch).cuda(), 2)) - # loss = nn.MSELoss()(q_value_batch, expected_q_value_batch) # shape same to - - abs_errors = np.sum(np.abs(q_value_batch.cpu().detach().numpy() - expected_q_value_batch.cpu().detach().numpy()), axis=1) - self.memory.batch_update(idxs_batch, abs_errors) - - # backpropagation - self.optimizer.zero_grad() - loss.backward() - # clip to avoid gradient explosion - for param in self.policy_net.parameters(): - param.grad.data.clamp_(-1, 1) - self.optimizer.step() - if self.sample_count % self.target_update == 0: # target net update, target_update means "C" in pseucodes - self.target_net.load_state_dict(self.policy_net.state_dict()) - - def save_model(self, fpath): - from pathlib import Path - # create path - Path(fpath).mkdir(parents=True, exist_ok=True) - torch.save(self.target_net.state_dict(), f"{fpath}/checkpoint.pt") - - def load_model(self, fpath): - checkpoint = torch.load(f"{fpath}/checkpoint.pt",map_location=self.device) - self.target_net.load_state_dict(checkpoint) - for target_param, param in zip(self.target_net.parameters(), self.policy_net.parameters()): - param.data.copy_(target_param.data) diff --git a/projects/codes/PER_DQN/task0.py b/projects/codes/PER_DQN/task0.py deleted file mode 100644 index 8b6247b..0000000 --- a/projects/codes/PER_DQN/task0.py +++ /dev/null @@ -1,104 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: DingLi -Email: wangzhongren@sjtu.edu.cn -Date: 2022-10-31 22:54:00 -LastEditor: DingLi -LastEditTime: 2022-11-14 10:45:11 -Discription: CartPole-v1 -''' - -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-12 11:09:54 -LastEditor: JiangJi -LastEditTime: 2022-10-30 01:29:25 -Discription: CartPole-v1,Acrobot-v1 -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent 
path -sys.path.append(parent_path) # add to system path -import gym -import torch - -from common.utils import all_seed,merge_class_attrs -from common.models import MLP -from common.memories import ReplayBuffer, ReplayTree -from common.launcher import Launcher -from envs.register import register_env -from per_dqn import PER_DQN -from config.config import GeneralConfigDQN,AlgoConfigDQN -class Main(Launcher): - def __init__(self) -> None: - super().__init__() - self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigDQN()) - self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigDQN()) - def env_agent_config(self,cfg,logger): - ''' create env and agent - ''' - register_env(cfg.env_name) - env = gym.make(cfg.env_name,new_step_api=True) # create env - all_seed(env,seed=cfg.seed) # set random seed - try: # state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info - # update to cfg paramters - setattr(cfg, 'n_states', n_states) - setattr(cfg, 'n_actions', n_actions) - # cfg.update({"n_states":n_states,"n_actions":n_actions}) # update to cfg paramters - model = MLP(n_states,n_actions,hidden_dim=cfg.hidden_dim) - memory = ReplayTree(cfg.buffer_size) # replay SumTree - agent = PER_DQN(model,memory,cfg) # create agent - return env, agent - - def train_one_episode(self,env, agent, cfg): - ''' train one episode - ''' - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - ep_step += 1 - action = agent.sample_action(state) # sample action - next_state, reward, terminated, truncated , info = env.step(action) # update env and return transitions under new_step_api of OpenAI Gym - - policy_val = 
agent.policy_net(torch.tensor(state, device = cfg.device))[action] - target_val = agent.target_net(torch.tensor(next_state, device = cfg.device)) - - if terminated: - error = abs(policy_val - reward) - else: - error = abs(policy_val - reward - cfg.gamma * torch.max(target_val)) - agent.memory.push(error.cpu().detach().numpy(), (state, action, reward, - next_state, terminated)) # save transitions - state = next_state # update next state for env - agent.update() # update agent - ep_reward += reward # - if terminated: - break - return agent, ep_reward, ep_step - - def test_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - ep_step+=1 - action = agent.predict_action(state) # predict action - next_state, reward, terminated, _, _ = env.step(action) - state = next_state - ep_reward += reward - if terminated: - break - return agent, ep_reward, ep_step - - - if __name__ == "__main__": - main = Main() - main.run() - diff --git a/projects/codes/PPO/README.md b/projects/codes/PPO/README.md deleted file mode 100644 index 125ef51..0000000 --- a/projects/codes/PPO/README.md +++ /dev/null @@ -1,142 +0,0 @@ -## Overview - -PPO is an on-policy algorithm with good performance. Its predecessor is TRPO, and like TRPO it belongs to the policy gradient family; it has been adopted as OpenAI's default reinforcement learning algorithm. For the underlying theory, see [PPO algorithm explained](https://datawhalechina.github.io/easy-rl/#/chapter5/chapter5). PPO has two main variants: one adds a KL penalty, the other uses clipping. This article implements the latter, ```PPO-clip```. -## Pseudocode -Before implementing the algorithm, it helps to understand its pseudocode, shown below: -![PPO-clip pseudocode](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png) -This figure was found via Google and fits well enough, so I have not modified it. In it, ```k``` denotes the ```k```-th episode. Step 6 optimizes the policy objective with stochastic gradient methods (the code below minimizes its negative). That objective (the part after ```argmax```) may be a little hard to follow; see the [PPO paper](https://arxiv.org/abs/1707.06347), excerpted below: -![PPO-clip objective from the PPO paper](assets/20210323154236878.png) -Step 7 is a squared-error loss: the squared difference between the actual return and the expected return (the critic's value estimate). -## Code walkthrough -[Click here for the full code](https://github.com/JohnJim0816/rl-tutorials/tree/master/PPO) -### PPOMemory 
-Step 3 requires collecting a trajectory, so we first define a ```PPOMemory``` class to store it: -```python -class PPOMemory: - def __init__(self, batch_size): - self.states = [] - self.probs = [] - self.vals = [] - self.actions = [] - self.rewards = [] - self.dones = [] - self.batch_size = batch_size - def sample(self): - batch_step = np.arange(0, len(self.states), self.batch_size) - indices = np.arange(len(self.states), dtype=np.int64) - np.random.shuffle(indices) - batches = [indices[i:i+self.batch_size] for i in batch_step] - return np.array(self.states),\ - np.array(self.actions),\ - np.array(self.probs),\ - np.array(self.vals),\ - np.array(self.rewards),\ - np.array(self.dones),\ - batches - - def push(self, state, action, probs, vals, reward, done): - self.states.append(state) - self.actions.append(action) - self.probs.append(probs) - self.vals.append(vals) - self.rewards.append(reward) - self.dones.append(done) - - def clear(self): - self.states = [] - self.probs = [] - self.actions = [] - self.rewards = [] - self.dones = [] - self.vals = [] -``` -Here ```push``` appends the collected quantities to memory, and ```sample``` returns everything stored together with the shuffled indices split into random mini-batches, which the stochastic gradient updates in step 6 use. -### PPO model -The model consists of two networks, the actor and the critic: -```python -import torch.nn as nn -from torch.distributions.categorical import Categorical -class Actor(nn.Module): - def __init__(self,n_states, n_actions, - hidden_dim=256): - super(Actor, self).__init__() - - self.actor = nn.Sequential( - nn.Linear(n_states, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, n_actions), - nn.Softmax(dim=-1) - ) - def forward(self, state): - dist = self.actor(state) - dist = Categorical(dist) - return dist - -class Critic(nn.Module): - def __init__(self, n_states,hidden_dim=256): - super(Critic, self).__init__() - self.critic = nn.Sequential( - nn.Linear(n_states, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, 1) - ) - def forward(self, state): - value = self.critic(state) - return value -``` 
-Here the actor outputs a probability distribution (```Categorical``` in this case; other distributions are possible, see ```torch.distributions```), and the critic maps the current state to a value. The critic's input dimension could also be ```n_states+n_actions```, i.e. feeding the action into the critic as well, which may work somewhat better; interested readers can try it. - -### PPO update -The ```update``` function mainly implements steps 6 and 7 of the pseudocode: -```python -def update(self): - for _ in range(self.n_epochs): - state_arr, action_arr, old_prob_arr, vals_arr,\ - reward_arr, dones_arr, batches = \ - self.memory.sample() - values = vals_arr - ### compute advantage ### - advantage = np.zeros(len(reward_arr), dtype=np.float32) - for t in range(len(reward_arr)-1): - discount = 1 - a_t = 0 - for k in range(t, len(reward_arr)-1): - a_t += discount*(reward_arr[k] + self.gamma*values[k+1]*\ - (1-int(dones_arr[k])) - values[k]) - discount *= self.gamma*self.gae_lambda - advantage[t] = a_t - advantage = torch.tensor(advantage).to(self.device) - ### SGD ### - values = torch.tensor(values).to(self.device) - for batch in batches: - states = torch.tensor(state_arr[batch], dtype=torch.float).to(self.device) - old_probs = torch.tensor(old_prob_arr[batch]).to(self.device) - actions = torch.tensor(action_arr[batch]).to(self.device) - dist = self.actor(states) - critic_value = self.critic(states) - critic_value = torch.squeeze(critic_value) - new_probs = dist.log_prob(actions) - prob_ratio = new_probs.exp() / old_probs.exp() - weighted_probs = advantage[batch] * prob_ratio - weighted_clipped_probs = torch.clamp(prob_ratio, 1-self.policy_clip, - 1+self.policy_clip)*advantage[batch] - actor_loss = -torch.min(weighted_probs, weighted_clipped_probs).mean() - returns = advantage[batch] + values[batch] - critic_loss = (returns-critic_value)**2 - critic_loss = critic_loss.mean() - total_loss = actor_loss + 0.5*critic_loss - self.actor_optimizer.zero_grad() - self.critic_optimizer.zero_grad() - total_loss.backward() - self.actor_optimizer.step() - self.critic_optimizer.step() - self.memory.clear() -``` -This function first retrieves the collected trajectory from memory, then computes the advantage estimates with GAE, then updates both networks with stochastic gradient descent over the mini-batches, and finally clears the memory so the next trajectory can be collected. - -The final results are shown below: 
-![PPO training results](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210405110725113.png) \ No newline at end of file diff --git a/projects/codes/PPO/assets/20210323154236878.png b/projects/codes/PPO/assets/20210323154236878.png deleted file mode 100644 index 0e8d796..0000000 Binary files a/projects/codes/PPO/assets/20210323154236878.png and /dev/null differ diff --git a/projects/codes/PPO/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210405110725113.png b/projects/codes/PPO/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210405110725113.png deleted file mode 100644 index e1b61f4..0000000 Binary files a/projects/codes/PPO/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210405110725113.png and /dev/null differ diff --git a/projects/codes/PPO/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png b/projects/codes/PPO/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png deleted file mode 100644 index 944c7a6..0000000 Binary files a/projects/codes/PPO/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70.png and /dev/null differ diff --git a/projects/codes/PPO/config/config.py b/projects/codes/PPO/config/config.py deleted file mode 100644 index b8f9870..0000000 --- a/projects/codes/PPO/config/config.py +++ /dev/null @@ -1,37 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-30 11:30:56 -LastEditor: JiangJi -LastEditTime: 2022-10-31 00:33:15 -Discription: default parameters of PPO -''' -from 
common.config import GeneralConfig,AlgoConfig - -class GeneralConfigPPO(GeneralConfig): - def __init__(self) -> None: - self.env_name = "CartPole-v0" - self.algo_name = "PPO" - self.seed = 1 - self.device = "cuda" - self.train_eps = 100 # number of episodes for training - self.test_eps = 10 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - -class AlgoConfigPPO(AlgoConfig): - def __init__(self) -> None: - self.gamma = 0.99 # discount factor - self.continuous = False # continuous action space or not - self.policy_clip = 0.2 # clip range of policy - self.n_epochs = 10 # number of epochs - self.gae_lambda = 0.95 # gae lambda - self.actor_lr = 0.0003 # learning rate of actor - self.critic_lr = 0.0003 # learning rate of critic - self.actor_hidden_dim = 256 # - self.critic_hidden_dim = 256 - self.n_epochs = 4 # epochs - self.batch_size = 5 # - self.policy_clip = 0.2 - self.update_fre = 20 # frequency of updating agent diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/models/ppo_actor.pt b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/models/ppo_actor.pt deleted file mode 100644 index e7660b4..0000000 Binary files a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/models/ppo_actor.pt and /dev/null differ diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/models/ppo_critic.pt b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/models/ppo_critic.pt deleted file mode 100644 index f0ec0d4..0000000 Binary files a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/models/ppo_critic.pt and /dev/null differ diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/params.json b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/params.json deleted file mode 100644 index 15097c6..0000000 --- a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/params.json +++ /dev/null @@ -1,25 +0,0 @@ -{ - "algo_name": "PPO", - "env_name": "CartPole-v0", - 
"continuous": false, - "train_eps": 200, - "test_eps": 20, - "gamma": 0.99, - "batch_size": 5, - "n_epochs": 4, - "actor_lr": 0.0003, - "critic_lr": 0.0003, - "gae_lambda": 0.95, - "policy_clip": 0.2, - "update_fre": 20, - "actor_hidden_dim": 256, - "critic_hidden_dim": 256, - "device": "cpu", - "seed": 10, - "show_fig": false, - "save_fig": true, - "result_path": "c:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\PPO/outputs/CartPole-v0/20220920-213310/results/", - "model_path": "c:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\PPO/outputs/CartPole-v0/20220920-213310/models/", - "n_states": 4, - "n_actions": 2 -} \ No newline at end of file diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/testing_curve.png b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/testing_curve.png deleted file mode 100644 index badf029..0000000 Binary files a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/testing_curve.png and /dev/null differ diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/testing_results.csv b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/testing_results.csv deleted file mode 100644 index fb73fd6..0000000 --- a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/testing_results.csv +++ /dev/null @@ -1,21 +0,0 @@ -episodes,rewards -0,200.0 -1,200.0 -2,200.0 -3,200.0 -4,200.0 -5,200.0 -6,200.0 -7,200.0 -8,200.0 -9,200.0 -10,200.0 -11,200.0 -12,200.0 -13,200.0 -14,200.0 -15,200.0 -16,200.0 -17,200.0 -18,200.0 -19,200.0 diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/training_curve.png b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/training_curve.png deleted file mode 100644 index 1bc6604..0000000 Binary files a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/training_curve.png and /dev/null differ diff --git a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/training_results.csv 
b/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/training_results.csv deleted file mode 100644 index 7836df5..0000000 --- a/projects/codes/PPO/outputs/CartPole-v0/20220920-213310/results/training_results.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards -0,34.0 -1,12.0 -2,47.0 -3,29.0 -4,20.0 -5,23.0 -6,33.0 -7,25.0 -8,11.0 -9,30.0 -10,18.0 -11,16.0 -12,15.0 -13,25.0 -14,33.0 -15,19.0 -16,50.0 -17,23.0 -18,21.0 -19,42.0 -20,60.0 -21,64.0 -22,30.0 -23,31.0 -24,90.0 -25,43.0 -26,54.0 -27,74.0 -28,30.0 -29,82.0 -30,50.0 -31,53.0 -32,25.0 -33,27.0 -34,145.0 -35,118.0 -36,141.0 -37,148.0 -38,200.0 -39,191.0 -40,71.0 -41,105.0 -42,100.0 -43,120.0 -44,80.0 -45,40.0 -46,104.0 -47,39.0 -48,89.0 -49,60.0 -50,30.0 -51,24.0 -52,20.0 -53,23.0 -54,30.0 -55,32.0 -56,20.0 -57,12.0 -58,25.0 -59,25.0 -60,24.0 -61,29.0 -62,200.0 -63,62.0 -64,200.0 -65,58.0 -66,81.0 -67,200.0 -68,52.0 -69,140.0 -70,200.0 -71,74.0 -72,200.0 -73,29.0 -74,124.0 -75,129.0 -76,200.0 -77,194.0 -78,175.0 -79,117.0 -80,200.0 -81,186.0 -82,114.0 -83,200.0 -84,166.0 -85,150.0 -86,135.0 -87,200.0 -88,200.0 -89,133.0 -90,111.0 -91,200.0 -92,90.0 -93,200.0 -94,147.0 -95,30.0 -96,137.0 -97,200.0 -98,200.0 -99,179.0 -100,167.0 -101,186.0 -102,169.0 -103,200.0 -104,200.0 -105,171.0 -106,200.0 -107,181.0 -108,125.0 -109,200.0 -110,200.0 -111,122.0 -112,200.0 -113,124.0 -114,95.0 -115,102.0 -116,118.0 -117,91.0 -118,64.0 -119,124.0 -120,122.0 -121,76.0 -122,68.0 -123,40.0 -124,52.0 -125,51.0 -126,50.0 -127,49.0 -128,37.0 -129,76.0 -130,83.0 -131,76.0 -132,92.0 -133,113.0 -134,94.0 -135,157.0 -136,92.0 -137,200.0 -138,123.0 -139,200.0 -140,200.0 -141,200.0 -142,140.0 -143,200.0 -144,200.0 -145,200.0 -146,200.0 -147,200.0 -148,200.0 -149,200.0 -150,200.0 -151,78.0 -152,200.0 -153,200.0 -154,200.0 -155,200.0 -156,200.0 -157,200.0 -158,200.0 -159,200.0 -160,200.0 -161,200.0 -162,107.0 -163,187.0 -164,200.0 -165,200.0 -166,200.0 -167,200.0 -168,200.0 -169,200.0 -170,200.0 -171,200.0 -172,200.0 -173,200.0 
-174,200.0 -175,200.0 -176,200.0 -177,200.0 -178,200.0 -179,200.0 -180,200.0 -181,200.0 -182,200.0 -183,200.0 -184,200.0 -185,200.0 -186,200.0 -187,200.0 -188,200.0 -189,200.0 -190,200.0 -191,200.0 -192,200.0 -193,200.0 -194,200.0 -195,200.0 -196,200.0 -197,200.0 -198,200.0 -199,200.0 diff --git a/projects/codes/PPO/ppo2.py b/projects/codes/PPO/ppo2.py deleted file mode 100644 index 5d399b8..0000000 --- a/projects/codes/PPO/ppo2.py +++ /dev/null @@ -1,119 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-09-26 16:11:36 -LastEditor: JiangJi -LastEditTime: 2022-10-31 00:36:37 -Description: PPO-clip -''' - -import os -import numpy as np -import torch -import torch.optim as optim -from torch.distributions.categorical import Categorical - - -class PPO: - def __init__(self, models,memory,cfg): - self.gamma = cfg.gamma - self.continuous = cfg.continuous - self.policy_clip = cfg.policy_clip - self.n_epochs = cfg.n_epochs - self.batch_size = cfg.batch_size - self.gae_lambda = cfg.gae_lambda - self.device = torch.device(cfg.device) - self.actor = models['Actor'].to(self.device) - self.critic = models['Critic'].to(self.device) - self.actor_optimizer = optim.Adam(self.actor.parameters(), lr=cfg.actor_lr) - self.critic_optimizer = optim.Adam(self.critic.parameters(), lr=cfg.critic_lr) - self.memory = memory - self.loss = 0 - - def sample_action(self, state): - state = np.array([state]) # converting to an array before converting to a tensor is more efficient - state = torch.tensor(state, dtype=torch.float).to(self.device) - probs = self.actor(state) - dist = Categorical(probs) - value = self.critic(state) - action = dist.sample() - probs = torch.squeeze(dist.log_prob(action)).item() - if self.continuous: - action = torch.tanh(action) - else: - action = torch.squeeze(action).item() - value = torch.squeeze(value).item() - return action, probs, value - @torch.no_grad() - def predict_action(self, state): - state = np.array([state]) # converting to an array before converting to a tensor is more efficient - state = 
torch.tensor(state, dtype=torch.float).to(self.device) - dist = self.actor(state) - value = self.critic(state) - action = dist.sample() - probs = torch.squeeze(dist.log_prob(action)).item() - if self.continuous: - action = torch.tanh(action) - else: - action = torch.squeeze(action).item() - value = torch.squeeze(value).item() - return action, probs, value - - def update(self): - for _ in range(self.n_epochs): - state_arr, action_arr, old_prob_arr, vals_arr,reward_arr, dones_arr, batches = self.memory.sample() - values = vals_arr[:] - ### compute advantage ### - advantage = np.zeros(len(reward_arr), dtype=np.float32) - for t in range(len(reward_arr)-1): - discount = 1 - a_t = 0 - for k in range(t, len(reward_arr)-1): - a_t += discount*(reward_arr[k] + self.gamma*values[k+1]*\ - (1-int(dones_arr[k])) - values[k]) - discount *= self.gamma*self.gae_lambda - advantage[t] = a_t - advantage = torch.tensor(advantage).to(self.device) - ### SGD ### - values = torch.tensor(values).to(self.device) - for batch in batches: - states = torch.tensor(state_arr[batch], dtype=torch.float).to(self.device) - old_probs = torch.tensor(old_prob_arr[batch]).to(self.device) - actions = torch.tensor(action_arr[batch]).to(self.device) - dist = self.actor(states) - critic_value = self.critic(states) - critic_value = torch.squeeze(critic_value) - new_probs = dist.log_prob(actions) - prob_ratio = new_probs.exp() / old_probs.exp() - weighted_probs = advantage[batch] * prob_ratio - weighted_clipped_probs = torch.clamp(prob_ratio, 1-self.policy_clip, - 1+self.policy_clip)*advantage[batch] - actor_loss = -torch.min(weighted_probs, weighted_clipped_probs).mean() - returns = advantage[batch] + values[batch] - critic_loss = (returns-critic_value)**2 - critic_loss = critic_loss.mean() - total_loss = actor_loss + 0.5*critic_loss - self.loss = total_loss - self.actor_optimizer.zero_grad() - self.critic_optimizer.zero_grad() - total_loss.backward() - self.actor_optimizer.step() - 
self.critic_optimizer.step() - self.memory.clear() - def save_model(self,path): - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - actor_checkpoint = os.path.join(path, 'ppo_actor.pt') - critic_checkpoint= os.path.join(path, 'ppo_critic.pt') - torch.save(self.actor.state_dict(), actor_checkpoint) - torch.save(self.critic.state_dict(), critic_checkpoint) - def load_model(self,path): - actor_checkpoint = os.path.join(path, 'ppo_actor.pt') - critic_checkpoint= os.path.join(path, 'ppo_critic.pt') - self.actor.load_state_dict(torch.load(actor_checkpoint)) - self.critic.load_state_dict(torch.load(critic_checkpoint)) - - diff --git a/projects/codes/PPO/task0.py b/projects/codes/PPO/task0.py deleted file mode 100644 index dbf0e7a..0000000 --- a/projects/codes/PPO/task0.py +++ /dev/null @@ -1,159 +0,0 @@ -import sys,os -os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized." 
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add path to system path - -import gym -import torch -import datetime -import numpy as np -import argparse -import torch.nn as nn - - -from common.utils import all_seed,merge_class_attrs -from common.models import ActorSoftmax, Critic -from common.memories import PGReplay -from common.launcher import Launcher -from envs.register import register_env -from ppo2 import PPO -from config.config import GeneralConfigPPO,AlgoConfigPPO -class PPOMemory: - def __init__(self, batch_size): - self.states = [] - self.probs = [] - self.vals = [] - self.actions = [] - self.rewards = [] - self.terminateds = [] - self.batch_size = batch_size - def sample(self): - batch_step = np.arange(0, len(self.states), self.batch_size) - indices = np.arange(len(self.states), dtype=np.int64) - np.random.shuffle(indices) - batches = [indices[i:i+self.batch_size] for i in batch_step] - return np.array(self.states),np.array(self.actions),np.array(self.probs),\ - np.array(self.vals),np.array(self.rewards),np.array(self.terminateds),batches - - def push(self, state, action, probs, vals, reward, terminated): - self.states.append(state) - self.actions.append(action) - self.probs.append(probs) - self.vals.append(vals) - self.rewards.append(reward) - self.terminateds.append(terminated) - - def clear(self): - self.states = [] - self.probs = [] - self.actions = [] - self.rewards = [] - self.terminateds = [] - self.vals = [] - - -class Main(Launcher): - def __init__(self) -> None: - super().__init__() - self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigPPO()) - self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigPPO()) - def env_agent_config(self,cfg,logger): - ''' create env and agent - ''' - register_env(cfg.env_name) - env = gym.make(cfg.env_name,new_step_api=False) # create env - if cfg.seed
!=0: # set random seed - all_seed(env,seed=cfg.seed) - try: # state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info - # add to cfg parameters - setattr(cfg, 'n_states', n_states) - setattr(cfg, 'n_actions', n_actions) - models = {'Actor':ActorSoftmax(n_states,n_actions, hidden_dim = cfg.actor_hidden_dim),'Critic':Critic(n_states,1,hidden_dim=cfg.critic_hidden_dim)} - memory = PGReplay() # replay buffer (instantiated, as in PolicyGradient/main.py) - agent = PPO(models,memory,cfg) # create agent - return env, agent - def train_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 # step per episode - state = env.reset() - for _ in range(cfg.max_steps): - action, prob, val = agent.sample_action(state) - next_state, reward, terminated, _ = env.step(action) - ep_reward += reward - ep_step += 1 - agent.memory.push((state, action, prob, val, reward, terminated)) - if ep_step % cfg['update_fre'] == 0: - agent.update() - state = next_state - if terminated: - break - return agent, ep_reward, ep_step - def test_one_episode(self, env, agent, cfg): - ep_reward = 0 # reward per episode - ep_step = 0 # step per episode - state = env.reset() - for _ in range(cfg.max_steps): - action, prob, val = agent.sample_action(state) - next_state, reward, terminated, _ = env.step(action) - ep_reward += reward - ep_step += 1 - state = next_state - if terminated: - break - return agent, ep_reward, ep_step - def train(self,cfg,env,agent): - ''' train agent - ''' - print("Start training!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] # record rewards for all episodes - steps = 0 - for i_ep in range(cfg['train_eps']): - state = env.reset() - ep_reward = 0 - while
True: - action, prob, val = agent.sample_action(state) - next_state, reward, terminated, _ = env.step(action) - steps += 1 - ep_reward += reward - agent.memory.push(state, action, prob, val, reward, terminated) - if steps % cfg['update_fre'] == 0: - agent.update() - state = next_state - if terminated: - break - rewards.append(ep_reward) - if (i_ep+1)%10==0: - print(f"Episode: {i_ep+1}/{cfg['train_eps']}, Reward: {ep_reward:.2f}") - print("Finish training!") - return {'episodes':range(len(rewards)),'rewards':rewards} - def test(self,cfg,env,agent): - ''' test agent - ''' - print("Start testing!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] # record rewards for all episodes - for i_ep in range(cfg['test_eps']): - state = env.reset() - ep_reward = 0 - while True: - action, prob, val = agent.predict_action(state) - next_state, reward, terminated, _ = env.step(action) - ep_reward += reward - state = next_state - if terminated: - break - rewards.append(ep_reward) - print(f"Episode: {i_ep+1}/{cfg['test_eps']}, Reward: {ep_reward:.2f}") - print("Finish testing!") - return {'episodes':range(len(rewards)),'rewards':rewards} - -if __name__ == "__main__": - main = Main() - main.run() \ No newline at end of file diff --git a/projects/codes/PPO/task1.py b/projects/codes/PPO/task1.py deleted file mode 100644 index d664770..0000000 --- a/projects/codes/PPO/task1.py +++ /dev/null @@ -1,77 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-09-19 14:48:16 -LastEditor: JiangJi -LastEditTime: 2022-10-30 00:45:14 -Description: -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the current file's directory -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add the path to the system path - -import gym -import torch -import datetime -from common.utils import plot_rewards -from common.utils import save_results,make_dir -from ppo2 import PPO - -curr_time
= datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # get the current time - -class PPOConfig: - def __init__(self) -> None: - self.algo = "PPO" # algorithm name - self.env_name = 'Pendulum-v1' # environment name - self.continuous = True # whether the environment has a continuous action space - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # detect GPU - self.train_eps = 200 # number of training episodes - self.test_eps = 20 # number of testing episodes - self.batch_size = 5 - self.gamma=0.99 - self.n_epochs = 4 - self.actor_lr = 0.0003 - self.critic_lr = 0.0003 - self.gae_lambda=0.95 - self.policy_clip=0.2 - self.hidden_dim = 256 - self.update_fre = 20 # frequency of agent update - -class PlotConfig: - def __init__(self) -> None: - self.algo = "PPO" # algorithm name - self.env_name = 'Pendulum-v1' # environment name - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # detect GPU - self.result_path = curr_path+"/outputs/" + self.env_name + \ - '/'+curr_time+'/results/' # path for saving results - self.model_path = curr_path+"/outputs/" + self.env_name + \ - '/'+curr_time+'/models/' # path for saving models - self.save = True # whether to save figures - -def env_agent_config(cfg,seed=1): - env = gym.make(cfg.env_name) - env.seed(seed) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.shape[0] - agent = PPO(n_states,n_actions,cfg) - return env,agent - - -cfg = PPOConfig() -plot_cfg = PlotConfig() -# training -env,agent = env_agent_config(cfg,seed=1) -rewards, ma_rewards = train(cfg, env, agent) -make_dir(plot_cfg.result_path, plot_cfg.model_path) # create folders for saving results and models -agent.save(path=plot_cfg.model_path) -save_results(rewards, ma_rewards, tag='train', path=plot_cfg.result_path) -plot_rewards(rewards, ma_rewards, plot_cfg, tag="train") -# testing -env,agent = env_agent_config(cfg,seed=10) -agent.load(path=plot_cfg.model_path) -rewards,ma_rewards = eval(cfg,env,agent) -save_results(rewards,ma_rewards,tag='eval',path=plot_cfg.result_path) -plot_rewards(rewards,ma_rewards,plot_cfg,tag="eval") \ No newline at end of file diff --git a/projects/codes/PolicyGradient/README.md
b/projects/codes/PolicyGradient/README.md deleted file mode 100644 index 956cdbf..0000000 --- a/projects/codes/PolicyGradient/README.md +++ /dev/null @@ -1,27 +0,0 @@ -# Policy Gradient - - -Policy-based methods are the counterpart of value-based methods (e.g., Q-learning) in reinforcement learning: they optimize the policy itself directly with gradient updates. For background, see [Datawhale-Policy Gradient](https://datawhalechina.github.io/leedeeprl-notes/#/chapter4/chapter4). -REINFORCE is the most basic Policy Gradient method; it addresses the problem that the policy gradient cannot be computed directly by estimating it from sampled episodes. For the underlying principle, see [CSDN-REINFORCE and the Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703). - -## Pseudocode - -Following the REINFORCE principle, the pseudocode is: - -image-20211016004808604 - -https://pytorch.org/docs/stable/distributions.html - -The negative sign is added because the formula describes gradient ascent, while the loss is normally minimized with stochastic gradient descent; negating it keeps the two consistent. - -![img](assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210428001336032.png) - -## Implementation - -## References - -[REINFORCE and the Reparameterization Trick](https://blog.csdn.net/JohnJim0/article/details/110230703) - -[Policy Gradient paper](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf) - -[REINFORCE](https://towardsdatascience.com/policy-gradient-methods-104c783251e0) \ No newline at end of file diff --git a/projects/codes/PolicyGradient/assets/image-20211016004808604.png b/projects/codes/PolicyGradient/assets/image-20211016004808604.png deleted file mode 100644 index b0a56b5..0000000 Binary files a/projects/codes/PolicyGradient/assets/image-20211016004808604.png and /dev/null differ diff --git a/projects/codes/PolicyGradient/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210428001336032.png b/projects/codes/PolicyGradient/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210428001336032.png deleted file mode 100644 index 44c1874..0000000 Binary files
a/projects/codes/PolicyGradient/assets/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pvaG5KaW0w,size_16,color_FFFFFF,t_70-20210428001336032.png and /dev/null differ diff --git a/projects/codes/PolicyGradient/main.py b/projects/codes/PolicyGradient/main.py deleted file mode 100644 index 3473c38..0000000 --- a/projects/codes/PolicyGradient/main.py +++ /dev/null @@ -1,131 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2020-11-22 23:21:53 -LastEditor: John -LastEditTime: 2022-08-27 00:04:08 -Description: -Environment: -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add to system path - -import gym -import torch -import datetime -import argparse -from itertools import count -import torch.nn.functional as F -from pg import PolicyGradient -from common.utils import save_results, make_dir,all_seed,save_args,plot_rewards -from common.models import MLP -from common.memories import PGReplay -from common.launcher import Launcher -from envs.register import register_env - - -class PGNet(MLP): - ''' instead of outputting an action, PGNet outputs action probabilities (here a single sigmoid probability for the Bernoulli policy), so it can simply inherit from MLP - ''' - def forward(self, x): - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - x = torch.sigmoid(self.fc3(x)) - return x - -class Main(Launcher): - def get_args(self): - """ Hyperparameters - """ - curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # Obtain current time - parser = argparse.ArgumentParser(description="hyperparameters") - parser.add_argument('--algo_name',default='PolicyGradient',type=str,help="name of algorithm") - parser.add_argument('--env_name',default='CartPole-v0',type=str,help="name of environment") - parser.add_argument('--train_eps',default=200,type=int,help="episodes of training") -
parser.add_argument('--test_eps',default=20,type=int,help="episodes of testing") - parser.add_argument('--ep_max_steps',default = 100000,type=int,help="max steps per episode; a very large value approximates an unbounded episode") - parser.add_argument('--gamma',default=0.99,type=float,help="discount factor") - parser.add_argument('--lr',default=0.01,type=float,help="learning rate") - parser.add_argument('--update_fre',default=8,type=int) - parser.add_argument('--hidden_dim',default=36,type=int) - parser.add_argument('--device',default='cpu',type=str,help="cpu or cuda") - parser.add_argument('--seed',default=1,type=int,help="seed") - parser.add_argument('--save_fig',default=True,type=bool,help="whether to save figures") - parser.add_argument('--show_fig',default=False,type=bool,help="whether to show figures") - args = parser.parse_args() - default_args = {'result_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/results/", - 'model_path':f"{curr_path}/outputs/{args.env_name}/{curr_time}/models/", - } - args = {**vars(args),**default_args} # type(dict) - return args - def env_agent_config(self,cfg): - register_env(cfg['env_name']) - env = gym.make(cfg['env_name']) - if cfg['seed'] !=0: # set random seed - all_seed(env,seed=cfg['seed']) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.n # action dimension - print(f"state dim: {n_states}, action dim: {n_actions}") - cfg.update({"n_states":n_states,"n_actions":n_actions}) # add to cfg parameters - model = PGNet(n_states,1,hidden_dim=cfg['hidden_dim']) - memory = PGReplay() - agent = PolicyGradient(model,memory,cfg) - return env,agent - def train(self,cfg,env,agent): - print("Start training!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] - for i_ep in range(cfg['train_eps']): - state = env.reset() - ep_reward = 0 - for _ in range(cfg['ep_max_steps']): - action = agent.sample_action(state) # sample action - next_state, reward, done, _ =
env.step(action) - ep_reward += reward - if done: - reward = 0 - agent.memory.push((state,float(action),reward)) - state = next_state - if done: - break - if (i_ep+1) % 10 == 0: - print(f"Episode:{i_ep+1}/{cfg['train_eps']}, Reward:{ep_reward:.2f}") - if (i_ep+1) % cfg['update_fre'] == 0: - agent.update() - rewards.append(ep_reward) - print('Finish training!') - env.close() # close environment - res_dic = {'episodes':range(len(rewards)),'rewards':rewards} - return res_dic - - def test(self,cfg,env,agent): - print("Start testing!") - print(f"Env: {cfg['env_name']}, Algorithm: {cfg['algo_name']}, Device: {cfg['device']}") - rewards = [] - for i_ep in range(cfg['test_eps']): - state = env.reset() - ep_reward = 0 - for _ in range(cfg['ep_max_steps']): - action = agent.predict_action(state) - next_state, reward, done, _ = env.step(action) - ep_reward += reward - if done: - reward = 0 - state = next_state - if done: - break - print(f"Episode: {i_ep+1}/{cfg['test_eps']},Reward: {ep_reward:.2f}") - rewards.append(ep_reward) - print("Finish testing!") - env.close() - return {'episodes':range(len(rewards)),'rewards':rewards} - -if __name__ == "__main__": - main = Main() - main.run() - - diff --git a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/models/checkpoint.pt b/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/models/checkpoint.pt deleted file mode 100644 index 7b98cda..0000000 Binary files a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/models/checkpoint.pt and /dev/null differ diff --git a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/params.json b/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/params.json deleted file mode 100644 index 4dfae79..0000000 --- a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/params.json +++ /dev/null @@ -1 +0,0 @@ -{"algo_name": "PolicyGradient", "env_name": "CartPole-v0", "train_eps": 200, 
"test_eps": 20, "ep_max_steps": 100000, "gamma": 0.99, "lr": 0.01, "update_fre": 8, "hidden_dim": 36, "device": "cpu", "seed": 1, "save_fig": true, "show_fig": false, "result_path": "c:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\PolicyGradient/outputs/CartPole-v0/20220827-000433/results/", "model_path": "c:\\Users\\24438\\Desktop\\rl-tutorials\\codes\\PolicyGradient/outputs/CartPole-v0/20220827-000433/models/", "n_states": 4, "n_actions": 2} \ No newline at end of file diff --git a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/testing_curve.png b/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/testing_curve.png deleted file mode 100644 index e3c3489..0000000 Binary files a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/testing_curve.png and /dev/null differ diff --git a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/testing_results.csv b/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/testing_results.csv deleted file mode 100644 index fb73fd6..0000000 --- a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/testing_results.csv +++ /dev/null @@ -1,21 +0,0 @@ -episodes,rewards -0,200.0 -1,200.0 -2,200.0 -3,200.0 -4,200.0 -5,200.0 -6,200.0 -7,200.0 -8,200.0 -9,200.0 -10,200.0 -11,200.0 -12,200.0 -13,200.0 -14,200.0 -15,200.0 -16,200.0 -17,200.0 -18,200.0 -19,200.0 diff --git a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/training_curve.png b/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/training_curve.png deleted file mode 100644 index 1f954a1..0000000 Binary files a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/training_curve.png and /dev/null differ diff --git a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/training_results.csv 
b/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/training_results.csv deleted file mode 100644 index 715be6d..0000000 --- a/projects/codes/PolicyGradient/outputs/CartPole-v0/20220827-000433/results/training_results.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards -0,26.0 -1,53.0 -2,10.0 -3,37.0 -4,22.0 -5,21.0 -6,12.0 -7,34.0 -8,93.0 -9,36.0 -10,29.0 -11,18.0 -12,14.0 -13,62.0 -14,20.0 -15,40.0 -16,10.0 -17,10.0 -18,10.0 -19,11.0 -20,10.0 -21,14.0 -22,12.0 -23,8.0 -24,19.0 -25,33.0 -26,22.0 -27,32.0 -28,16.0 -29,24.0 -30,24.0 -31,24.0 -32,75.0 -33,33.0 -34,33.0 -35,72.0 -36,110.0 -37,48.0 -38,60.0 -39,43.0 -40,61.0 -41,34.0 -42,50.0 -43,61.0 -44,53.0 -45,58.0 -46,36.0 -47,44.0 -48,42.0 -49,64.0 -50,67.0 -51,52.0 -52,39.0 -53,42.0 -54,40.0 -55,33.0 -56,200.0 -57,199.0 -58,149.0 -59,185.0 -60,134.0 -61,174.0 -62,162.0 -63,200.0 -64,93.0 -65,72.0 -66,69.0 -67,51.0 -68,62.0 -69,98.0 -70,73.0 -71,73.0 -72,200.0 -73,200.0 -74,200.0 -75,200.0 -76,200.0 -77,200.0 -78,200.0 -79,133.0 -80,200.0 -81,200.0 -82,200.0 -83,200.0 -84,200.0 -85,200.0 -86,200.0 -87,200.0 -88,114.0 -89,151.0 -90,129.0 -91,156.0 -92,112.0 -93,172.0 -94,171.0 -95,141.0 -96,200.0 -97,200.0 -98,200.0 -99,200.0 -100,200.0 -101,200.0 -102,200.0 -103,200.0 -104,188.0 -105,199.0 -106,138.0 -107,200.0 -108,200.0 -109,181.0 -110,145.0 -111,200.0 -112,135.0 -113,119.0 -114,112.0 -115,122.0 -116,118.0 -117,119.0 -118,131.0 -119,119.0 -120,109.0 -121,96.0 -122,105.0 -123,29.0 -124,110.0 -125,113.0 -126,18.0 -127,90.0 -128,145.0 -129,152.0 -130,151.0 -131,109.0 -132,141.0 -133,109.0 -134,136.0 -135,143.0 -136,200.0 -137,200.0 -138,200.0 -139,200.0 -140,200.0 -141,200.0 -142,200.0 -143,200.0 -144,192.0 -145,173.0 -146,180.0 -147,182.0 -148,186.0 -149,175.0 -150,176.0 -151,191.0 -152,200.0 -153,200.0 -154,200.0 -155,200.0 -156,200.0 -157,200.0 -158,200.0 -159,200.0 -160,200.0 -161,200.0 -162,200.0 -163,200.0 -164,200.0 -165,200.0 -166,200.0 -167,200.0 -168,200.0 -169,200.0 -170,200.0 
-171,200.0 -172,200.0 -173,200.0 -174,200.0 -175,200.0 -176,200.0 -177,200.0 -178,200.0 -179,200.0 -180,200.0 -181,200.0 -182,200.0 -183,200.0 -184,200.0 -185,200.0 -186,200.0 -187,200.0 -188,200.0 -189,200.0 -190,200.0 -191,200.0 -192,200.0 -193,200.0 -194,200.0 -195,200.0 -196,200.0 -197,200.0 -198,200.0 -199,200.0 diff --git a/projects/codes/PolicyGradient/pg.py b/projects/codes/PolicyGradient/pg.py deleted file mode 100644 index 7d84c6e..0000000 --- a/projects/codes/PolicyGradient/pg.py +++ /dev/null @@ -1,88 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2020-11-22 23:27:44 -LastEditor: John -LastEditTime: 2022-10-09 21:28:18 -Description: -Environment: -''' -import torch -import torch.nn as nn -import torch.nn.functional as F -from torch.distributions import Bernoulli -from torch.autograd import Variable -import numpy as np - - -class PolicyGradient: - - def __init__(self, model,memory,cfg): - self.gamma = cfg['gamma'] - self.device = torch.device(cfg['device']) - self.memory = memory - self.policy_net = model.to(self.device) - self.optimizer = torch.optim.RMSprop(self.policy_net.parameters(), lr=cfg['lr']) - - def sample_action(self,state): - - state = torch.from_numpy(state).float() - state = Variable(state) - probs = self.policy_net(state) - m = Bernoulli(probs) # Bernoulli distribution - action = m.sample() - - action = action.data.numpy().astype(int)[0] # convert to a scalar - return action - def predict_action(self,state): - - state = torch.from_numpy(state).float() - state = Variable(state) - probs = self.policy_net(state) - m = Bernoulli(probs) # Bernoulli distribution - action = m.sample() - action = action.data.numpy().astype(int)[0] # convert to a scalar - return action - - def update(self): - state_pool,action_pool,reward_pool= self.memory.sample() - state_pool,action_pool,reward_pool = list(state_pool),list(action_pool),list(reward_pool) - # Discount reward - running_add = 0 - for i in reversed(range(len(reward_pool))): - if reward_pool[i] == 0: -
running_add = 0 - else: - running_add = running_add * self.gamma + reward_pool[i] - reward_pool[i] = running_add - - # Normalize reward - reward_mean = np.mean(reward_pool) - reward_std = np.std(reward_pool) - for i in range(len(reward_pool)): - reward_pool[i] = (reward_pool[i] - reward_mean) / reward_std - - # Gradient descent - self.optimizer.zero_grad() - - for i in range(len(reward_pool)): - state = state_pool[i] - action = Variable(torch.FloatTensor([action_pool[i]])) - reward = reward_pool[i] - state = Variable(torch.from_numpy(state).float()) - probs = self.policy_net(state) - m = Bernoulli(probs) - loss = -m.log_prob(action) * reward # Negative score function x reward - # print(loss) - loss.backward() - self.optimizer.step() - self.memory.clear() - def save_model(self,path): - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save(self.policy_net.state_dict(), path+'checkpoint.pt') - def load_model(self,path): - self.policy_net.load_state_dict(torch.load(path+'checkpoint.pt')) \ No newline at end of file diff --git a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/config.yaml b/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/config.yaml deleted file mode 100644 index d9b4258..0000000 --- a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/config.yaml +++ /dev/null @@ -1,21 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: CliffWalking-v0 - load_checkpoint: true - load_path: Train_CliffWalking-v0_QLearning_20221030-013856 - max_steps: 200 - mode: test - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - batch_size: 64 - buffer_size: 100000 - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.1 diff --git a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/logs/log.txt
b/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/logs/log.txt deleted file mode 100644 index d89037e..0000000 --- a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/logs/log.txt +++ /dev/null @@ -1,24 +0,0 @@ -2022-10-30 01:41:51 - r - INFO: - n_states: 48, n_actions: 4 -2022-10-30 01:41:51 - r - INFO: - Start testing! -2022-10-30 01:41:51 - r - INFO: - Env: CliffWalking-v0, Algorithm: QLearning, Device: cpu -2022-10-30 01:41:51 - r - INFO: - Episode: 1/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 2/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 3/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 4/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 5/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 6/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 7/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 8/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 9/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 10/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 11/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 12/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 13/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 14/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 15/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 16/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 17/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 18/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 19/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Episode: 20/20, Steps:13 Reward: -13.00 -2022-10-30 01:41:51 - r - INFO: - Finish 
testing!
diff --git a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/models/Qleaning_model.pkl b/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/models/Qleaning_model.pkl
deleted file mode 100644
index 2022d46..0000000
Binary files a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/models/Qleaning_model.pkl and /dev/null differ
diff --git a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/results/learning_curve.png b/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/results/learning_curve.png
deleted file mode 100644
index 49a7daa..0000000
Binary files a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/results/learning_curve.png and /dev/null differ
diff --git a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/results/res.csv b/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/results/res.csv
deleted file mode 100644
index c48c7ef..0000000
--- a/projects/codes/QLearning/Test_CliffWalking-v0_QLearning_20221030-014151/results/res.csv
+++ /dev/null
@@ -1,21 +0,0 @@
-episodes,rewards,steps
-0,-13,13
-1,-13,13
-2,-13,13
-3,-13,13
-4,-13,13
-5,-13,13
-6,-13,13
-7,-13,13
-8,-13,13
-9,-13,13
-10,-13,13
-11,-13,13
-12,-13,13
-13,-13,13
-14,-13,13
-15,-13,13
-16,-13,13
-17,-13,13
-18,-13,13
-19,-13,13
diff --git a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/config.yaml b/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/config.yaml
deleted file mode 100644
index 537c003..0000000
--- a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/config.yaml
+++ /dev/null
@@ -1,19 +0,0 @@
-general_cfg:
-  algo_name: QLearning
-  device: cpu
-  env_name: FrozenLakeNoSlippery-v1
-  load_checkpoint: true
-  load_path: Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504
-  max_steps: 200
-  mode: test
-  save_fig: true
-  seed: 10
-  show_fig: false
-  test_eps: 20
-  train_eps: 800
-algo_cfg:
-  epsilon_decay: 2000
-  epsilon_end: 0.1
-  epsilon_start: 0.7
-  gamma: 0.95
-  lr: 0.9
diff --git a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/logs/log.txt b/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/logs/log.txt
deleted file mode 100644
index d972a0c..0000000
--- a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/logs/log.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-2022-10-30 01:45:52 - r - INFO: - n_states: 16, n_actions: 4
-2022-10-30 01:45:52 - r - INFO: - Start testing!
-2022-10-30 01:45:52 - r - INFO: - Env: FrozenLakeNoSlippery-v1, Algorithm: QLearning, Device: cpu
-2022-10-30 01:45:52 - r - INFO: - Episode: 1/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 2/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 3/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 4/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 5/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 6/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 7/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 8/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 9/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 10/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 11/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 12/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 13/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 14/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 15/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 16/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 17/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: -
Episode: 18/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 19/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Episode: 20/20, Steps:6 Reward: 1.00
-2022-10-30 01:45:52 - r - INFO: - Finish testing!
diff --git a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/models/Qleaning_model.pkl b/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/models/Qleaning_model.pkl
deleted file mode 100644
index 41a5a05..0000000
Binary files a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/models/Qleaning_model.pkl and /dev/null differ
diff --git a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/results/learning_curve.png b/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/results/learning_curve.png
deleted file mode 100644
index 60eeac6..0000000
Binary files a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/results/learning_curve.png and /dev/null differ
diff --git a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/results/res.csv b/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/results/res.csv
deleted file mode 100644
index b871b84..0000000
--- a/projects/codes/QLearning/Test_FrozenLakeNoSlippery-v1_QLearning_20221030-014552/results/res.csv
+++ /dev/null
@@ -1,21 +0,0 @@
-episodes,rewards,steps
-0,1.0,6
-1,1.0,6
-2,1.0,6
-3,1.0,6
-4,1.0,6
-5,1.0,6
-6,1.0,6
-7,1.0,6
-8,1.0,6
-9,1.0,6
-10,1.0,6
-11,1.0,6
-12,1.0,6
-13,1.0,6
-14,1.0,6
-15,1.0,6
-16,1.0,6
-17,1.0,6
-18,1.0,6
-19,1.0,6
diff --git a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/config.yaml b/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/config.yaml
deleted file mode 100644
index 42d7573..0000000
--- a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/config.yaml
+++ /dev/null
@@
-1,19 +0,0 @@
-general_cfg:
-  algo_name: QLearning
-  device: cpu
-  env_name: Racetrack-v0
-  load_checkpoint: true
-  load_path: Train_Racetrack-v0_QLearning_20221030-014833
-  max_steps: 200
-  mode: test
-  save_fig: true
-  seed: 10
-  show_fig: false
-  test_eps: 20
-  train_eps: 400
-algo_cfg:
-  epsilon_decay: 300
-  epsilon_end: 0.01
-  epsilon_start: 0.95
-  gamma: 0.9
-  lr: 0.1
diff --git a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/logs/log.txt b/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/logs/log.txt
deleted file mode 100644
index f36fac9..0000000
--- a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/logs/log.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-2022-10-30 01:49:58 - r - INFO: - n_states: 4, n_actions: 9
-2022-10-30 01:49:58 - r - INFO: - Start testing!
-2022-10-30 01:49:58 - r - INFO: - Env: Racetrack-v0, Algorithm: QLearning, Device: cpu
-2022-10-30 01:49:58 - r - INFO: - Episode: 1/20, Steps:14 Reward: -4.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 2/20, Steps:8 Reward: 2.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 3/20, Steps:6 Reward: 4.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 4/20, Steps:22 Reward: -12.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 5/20, Steps:15 Reward: -15.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 6/20, Steps:6 Reward: 4.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 7/20, Steps:5 Reward: 5.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 8/20, Steps:8 Reward: 2.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 9/20, Steps:15 Reward: -5.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 10/20, Steps:8 Reward: 2.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 11/20, Steps:5 Reward: 5.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 12/20, Steps:15 Reward: -5.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 13/20, Steps:6 Reward: 4.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 14/20, Steps:31 Reward: -51.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 15/20, Steps:13
Reward: -13.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 16/20, Steps:7 Reward: 3.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 17/20, Steps:6 Reward: 4.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 18/20, Steps:5 Reward: 5.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 19/20, Steps:17 Reward: -17.00
-2022-10-30 01:49:58 - r - INFO: - Episode: 20/20, Steps:15 Reward: -5.00
-2022-10-30 01:49:58 - r - INFO: - Finish testing!
diff --git a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/models/Qleaning_model.pkl b/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/models/Qleaning_model.pkl
deleted file mode 100644
index 1f458e1..0000000
Binary files a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/models/Qleaning_model.pkl and /dev/null differ
diff --git a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/results/learning_curve.png b/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/results/learning_curve.png
deleted file mode 100644
index 869b2c9..0000000
Binary files a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/results/learning_curve.png and /dev/null differ
diff --git a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/results/res.csv b/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/results/res.csv
deleted file mode 100644
index cf33c86..0000000
--- a/projects/codes/QLearning/Test_Racetrack-v0_QLearning_20221030-014958/results/res.csv
+++ /dev/null
@@ -1,21 +0,0 @@
-episodes,rewards,steps
-0,-4,14
-1,2,8
-2,4,6
-3,-12,22
-4,-15,15
-5,4,6
-6,5,5
-7,2,8
-8,-5,15
-9,2,8
-10,5,5
-11,-5,15
-12,4,6
-13,-51,31
-14,-13,13
-15,3,7
-16,4,6
-17,5,5
-18,-17,17
-19,-5,15
diff --git a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/config.yaml b/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/config.yaml
deleted file mode 100644
index 7610f6c..0000000
---
a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/config.yaml
+++ /dev/null
@@ -1,19 +0,0 @@
-general_cfg:
-  algo_name: QLearning
-  device: cpu
-  env_name: CliffWalking-v0
-  load_checkpoint: false
-  load_path: Train_CartPole-v1_DQN_20221026-054757
-  max_steps: 200
-  mode: train
-  save_fig: true
-  seed: 1
-  show_fig: false
-  test_eps: 20
-  train_eps: 800
-algo_cfg:
-  epsilon_decay: 300
-  epsilon_end: 0.01
-  epsilon_start: 0.95
-  gamma: 0.95
-  lr: 0.1
diff --git a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/logs/log.txt b/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/logs/log.txt
deleted file mode 100644
index d42935f..0000000
--- a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/logs/log.txt
+++ /dev/null
@@ -1,804 +0,0 @@
-2022-10-30 01:49:16 - r - INFO: - n_states: 48, n_actions: 4
-2022-10-30 01:49:16 - r - INFO: - Start training!
-2022-10-30 01:49:16 - r - INFO: - Env: CliffWalking-v0, Algorithm: QLearning, Device: cpu
-2022-10-30 01:49:16 - r - INFO: - Episode: 1/800, Reward: -1586.00: Epislon: 0.493
-2022-10-30 01:49:16 - r - INFO: - Episode: 2/800, Reward: -1091.00: Epislon: 0.258
-2022-10-30 01:49:16 - r - INFO: - Episode: 3/800, Reward: -596.00: Epislon: 0.137
-2022-10-30 01:49:16 - r - INFO: - Episode: 4/800, Reward: -497.00: Epislon: 0.075
-2022-10-30 01:49:16 - r - INFO: - Episode: 5/800, Reward: -398.00: Epislon: 0.044
-2022-10-30 01:49:16 - r - INFO: - Episode: 6/800, Reward: -362.00: Epislon: 0.029
-2022-10-30 01:49:16 - r - INFO: - Episode: 7/800, Reward: -179.00: Epislon: 0.021
-2022-10-30 01:49:16 - r - INFO: - Episode: 8/800, Reward: -398.00: Epislon: 0.015
-2022-10-30 01:49:16 - r - INFO: - Episode: 9/800, Reward: -79.00: Epislon: 0.014
-2022-10-30 01:49:16 - r - INFO: - Episode: 10/800, Reward: -141.00: Epislon: 0.013
-2022-10-30 01:49:16 - r - INFO: - Episode: 11/800, Reward: -143.00: Epislon: 0.012
-2022-10-30 01:49:16 - r - INFO: -
Episode: 12/800, Reward: -134.00: Epislon: 0.011
-2022-10-30 01:49:16 - r - INFO: - Episode: 13/800, Reward: -299.00: Epislon: 0.011
-2022-10-30 01:49:16 - r - INFO: - Episode: 14/800, Reward: -102.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 15/800, Reward: -61.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 16/800, Reward: -136.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 17/800, Reward: -176.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 18/800, Reward: -98.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 19/800, Reward: -92.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 20/800, Reward: -110.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 21/800, Reward: -67.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 22/800, Reward: -136.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 23/800, Reward: -98.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 24/800, Reward: -164.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 25/800, Reward: -65.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 26/800, Reward: -98.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 27/800, Reward: -33.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 28/800, Reward: -161.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 29/800, Reward: -72.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 30/800, Reward: -73.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 31/800, Reward: -116.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 32/800, Reward: -50.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 33/800, Reward: -66.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 34/800, Reward: -123.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 35/800, Reward: -40.00: Epislon: 0.010
-2022-10-30 01:49:16 - r -
INFO: - Episode: 36/800, Reward: -100.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 37/800, Reward: -56.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 38/800, Reward: -101.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 39/800, Reward: -55.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 40/800, Reward: -84.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 41/800, Reward: -68.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 42/800, Reward: -33.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 43/800, Reward: -113.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 44/800, Reward: -72.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 45/800, Reward: -36.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 46/800, Reward: -84.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 47/800, Reward: -45.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 48/800, Reward: -86.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 49/800, Reward: -57.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 50/800, Reward: -92.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 51/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 52/800, Reward: -76.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 53/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 54/800, Reward: -47.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 55/800, Reward: -88.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 56/800, Reward: -40.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 57/800, Reward: -55.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 58/800, Reward: -69.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 59/800, Reward: -51.00: Epislon: 0.010
-2022-10-30 01:49:16 - r -
INFO: - Episode: 60/800, Reward: -69.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 61/800, Reward: -36.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 62/800, Reward: -84.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 63/800, Reward: -38.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 64/800, Reward: -56.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 65/800, Reward: -58.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 66/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 67/800, Reward: -64.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 68/800, Reward: -38.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 69/800, Reward: -53.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 70/800, Reward: -70.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 71/800, Reward: -40.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 72/800, Reward: -47.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 73/800, Reward: -71.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 74/800, Reward: -47.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 75/800, Reward: -32.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 76/800, Reward: -70.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 77/800, Reward: -36.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 78/800, Reward: -36.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 79/800, Reward: -73.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 80/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 81/800, Reward: -67.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 82/800, Reward: -29.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 83/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: -
Episode: 84/800, Reward: -64.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 85/800, Reward: -28.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 86/800, Reward: -61.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 87/800, Reward: -33.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 88/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 89/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 90/800, Reward: -86.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 91/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 92/800, Reward: -53.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 93/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 94/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 95/800, Reward: -36.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 96/800, Reward: -45.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 97/800, Reward: -43.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 98/800, Reward: -31.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 99/800, Reward: -37.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 100/800, Reward: -43.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 101/800, Reward: -46.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 102/800, Reward: -28.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 103/800, Reward: -32.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 104/800, Reward: -47.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 105/800, Reward: -37.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 106/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 107/800, Reward: -38.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: -
Episode: 108/800, Reward: -29.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 109/800, Reward: -44.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 110/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 111/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 112/800, Reward: -28.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 113/800, Reward: -63.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 114/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 115/800, Reward: -52.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 116/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 117/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 118/800, Reward: -39.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 119/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 120/800, Reward: -44.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 121/800, Reward: -48.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 122/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 123/800, Reward: -40.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 124/800, Reward: -31.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 125/800, Reward: -23.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 126/800, Reward: -35.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 127/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 128/800, Reward: -31.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 129/800, Reward: -33.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 130/800, Reward: -32.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 131/800, Reward: -36.00: Epislon: 0.010
-2022-10-30
01:49:16 - r - INFO: - Episode: 132/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 133/800, Reward: -43.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 134/800, Reward: -51.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 135/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 136/800, Reward: -28.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 137/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 138/800, Reward: -42.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 139/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 140/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 141/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 142/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 143/800, Reward: -36.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 144/800, Reward: -14.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 145/800, Reward: -42.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 146/800, Reward: -44.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 147/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 148/800, Reward: -42.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 149/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 150/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 151/800, Reward: -32.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 152/800, Reward: -20.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 153/800, Reward: -44.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 154/800, Reward: -20.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 155/800, Reward: -29.00: Epislon:
0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 156/800, Reward: -20.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 157/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 158/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 159/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 160/800, Reward: -38.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 161/800, Reward: -34.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 162/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 163/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 164/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 165/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 166/800, Reward: -34.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 167/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 168/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 169/800, Reward: -22.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 170/800, Reward: -28.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 171/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 172/800, Reward: -20.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 173/800, Reward: -58.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 174/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 175/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 176/800, Reward: -29.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 177/800, Reward: -125.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 178/800, Reward: -156.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 179/800,
Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 180/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 181/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 182/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 183/800, Reward: -22.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 184/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 185/800, Reward: -22.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 186/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 187/800, Reward: -25.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 188/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 189/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 190/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 191/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 192/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 193/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 194/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 195/800, Reward: -35.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 196/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 197/800, Reward: -134.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 198/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 199/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 200/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 201/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 202/800, Reward: -44.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: -
Episode: 203/800, Reward: -23.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 204/800, Reward: -23.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 205/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 206/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 207/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 208/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 209/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 210/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 211/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 212/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 213/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 214/800, Reward: -133.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 215/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 216/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 217/800, Reward: -29.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 218/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 219/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 220/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 221/800, Reward: -24.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 222/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 223/800, Reward: -143.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 224/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 225/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 226/800, Reward: -23.00: Epislon: 0.010
-2022-10-30
01:49:16 - r - INFO: - Episode: 227/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 228/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 229/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 230/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 231/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 232/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 233/800, Reward: -29.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 234/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 235/800, Reward: -30.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 236/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 237/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 238/800, Reward: -21.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 239/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 240/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 241/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 242/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 243/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 244/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 245/800, Reward: -27.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 246/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 247/800, Reward: -14.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 248/800, Reward: -28.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 249/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 250/800, Reward: -15.00: Epislon:
0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 251/800, Reward: -17.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 252/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 253/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 254/800, Reward: -23.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 255/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 256/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 257/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 258/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 259/800, Reward: -22.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 260/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 261/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 262/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 263/800, Reward: -34.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 264/800, Reward: -16.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 265/800, Reward: -26.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 266/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 267/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 268/800, Reward: -15.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 269/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 270/800, Reward: -18.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 271/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 272/800, Reward: -13.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 273/800, Reward: -19.00: Epislon: 0.010
-2022-10-30 01:49:16 - r - INFO: - Episode: 274/800, Reward:
-13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 275/800, Reward: -27.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 276/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 277/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 278/800, Reward: -22.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 279/800, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 280/800, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 281/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 282/800, Reward: -26.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 283/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 284/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 285/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 286/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 287/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 288/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 289/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 290/800, Reward: -24.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 291/800, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 292/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 293/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 294/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 295/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 296/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 297/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 
298/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 299/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 300/800, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 301/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 302/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 303/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 304/800, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 305/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 306/800, Reward: -22.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 307/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 308/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 309/800, Reward: -16.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 310/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 311/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 312/800, Reward: -31.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 313/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 314/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 315/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 316/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 317/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 318/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 319/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 320/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 321/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - 
INFO: - Episode: 322/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 323/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 324/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 325/800, Reward: -16.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 326/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 327/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 328/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 329/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 330/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 331/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 332/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 333/800, Reward: -16.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 334/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 335/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 336/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 337/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 338/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 339/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 340/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 341/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 342/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 343/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 344/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 345/800, Reward: -13.00: Epislon: 0.010 
-2022-10-30 01:49:16 - r - INFO: - Episode: 346/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 347/800, Reward: -22.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 348/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 349/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 350/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 351/800, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 352/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 353/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 354/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 355/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 356/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 357/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 358/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 359/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 360/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 361/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 362/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 363/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 364/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 365/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 366/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 367/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 368/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 369/800, Reward: 
-13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 370/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 371/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 372/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 373/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 374/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 375/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 376/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 377/800, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 378/800, Reward: -123.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 379/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 380/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 381/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 382/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 383/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 384/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 385/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 386/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 387/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 388/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 389/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 390/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 391/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 392/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 
393/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 394/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 395/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 396/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 397/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 398/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 399/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 400/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 401/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 402/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 403/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 404/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 405/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 406/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 407/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 408/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 409/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 410/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 411/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 412/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 413/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 414/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 415/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 416/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - 
INFO: - Episode: 417/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 418/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 419/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 420/800, Reward: -113.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 421/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 422/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 423/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 424/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 425/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 426/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 427/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 428/800, Reward: -115.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 429/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 430/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 431/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 432/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 433/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 434/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 435/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 436/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 437/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 438/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 439/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 440/800, Reward: -13.00: Epislon: 0.010 
-2022-10-30 01:49:16 - r - INFO: - Episode: 441/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 442/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 443/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 444/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 445/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 446/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 447/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 448/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 449/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 450/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 451/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 452/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 453/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 454/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 455/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 456/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 457/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 458/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 459/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 460/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 461/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 462/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 463/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 464/800, Reward: 
-13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 465/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 466/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 467/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 468/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 469/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 470/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 471/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 472/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 473/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 474/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 475/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 476/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 477/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 478/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 479/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 480/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 481/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 482/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 483/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 484/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 485/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 486/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 487/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 
488/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 489/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 490/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 491/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 492/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 493/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 494/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 495/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 496/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 497/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 498/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 499/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 500/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 501/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 502/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 503/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 504/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 505/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 506/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 507/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 508/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 509/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 510/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 511/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - 
INFO: - Episode: 512/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 513/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 514/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 515/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 516/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 517/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 518/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 519/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 520/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 521/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 522/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 523/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 524/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 525/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 526/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 527/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 528/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 529/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 530/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 531/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 532/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 533/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 534/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 535/800, Reward: -122.00: Epislon: 0.010 
-2022-10-30 01:49:16 - r - INFO: - Episode: 536/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 537/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 538/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 539/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 540/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 541/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 542/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 543/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 544/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 545/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 546/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 547/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 548/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 549/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 550/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 551/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 552/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 553/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 554/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 555/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 556/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 557/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 558/800, Reward: -115.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 559/800, Reward: 
-13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 560/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 561/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 562/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 563/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 564/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 565/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 566/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 567/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 568/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 569/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 570/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 571/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 572/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 573/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 574/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 575/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 576/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 577/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 578/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 579/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 580/800, Reward: -122.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 581/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 582/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 
583/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 584/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 585/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 586/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 587/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 588/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 589/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 590/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 591/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 592/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 593/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 594/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 595/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 596/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 597/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 598/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 599/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 600/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 601/800, Reward: -122.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 602/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 603/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 604/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 605/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 606/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - 
INFO: - Episode: 607/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 608/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 609/800, Reward: -116.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 610/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 611/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 612/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 613/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 614/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 615/800, Reward: -115.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 616/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 617/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 618/800, Reward: -122.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 619/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 620/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 621/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 622/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 623/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 624/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 625/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 626/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 627/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 628/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 629/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 630/800, Reward: -13.00: Epislon: 0.010 
-2022-10-30 01:49:16 - r - INFO: - Episode: 631/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 632/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 633/800, Reward: -116.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 634/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 635/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 636/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 637/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 638/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 639/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 640/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 641/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 642/800, Reward: -117.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 643/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 644/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 645/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 646/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 647/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 648/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 649/800, Reward: -223.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 650/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 651/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 652/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 653/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 654/800, Reward: 
-13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 655/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 656/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 657/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 658/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 659/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 660/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 661/800, Reward: -221.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 662/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 663/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 664/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 665/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 666/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 667/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 668/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 669/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 670/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 671/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 672/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 673/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 674/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 675/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 676/800, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 677/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 
678/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 679/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 680/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 681/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 682/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 683/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 684/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 685/800, Reward: -113.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 686/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 687/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 688/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 689/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 690/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 691/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 692/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 693/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 694/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 695/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 696/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 697/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 698/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 699/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 700/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 701/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - 
INFO: - Episode: 702/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 703/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 704/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 705/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 706/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 707/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 708/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 709/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 710/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 711/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 712/800, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 713/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 714/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 715/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 716/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 717/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 718/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 719/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 720/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 721/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 722/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 723/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 724/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 725/800, Reward: -13.00: Epislon: 0.010 
-2022-10-30 01:49:16 - r - INFO: - Episode: 726/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 727/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 728/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 729/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 730/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 731/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 732/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 733/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 734/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 735/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 736/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 737/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 738/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 739/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 740/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 741/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 742/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 743/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 744/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 745/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 746/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 747/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 748/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 749/800, Reward: 
-15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 750/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 751/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 752/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 753/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 754/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 755/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 756/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 757/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 758/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 759/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 760/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 761/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 762/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 763/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 764/800, Reward: -122.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 765/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 766/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 767/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 768/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 769/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 770/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 771/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 772/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 
773/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 774/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 775/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 776/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 777/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 778/800, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 779/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 780/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 781/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 782/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 783/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 784/800, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 785/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 786/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 787/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 788/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 789/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 790/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 791/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 792/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 793/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 794/800, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 795/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 796/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - 
INFO: - Episode: 797/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 798/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 799/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Episode: 800/800, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:49:16 - r - INFO: - Finish training! diff --git a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/models/Qleaning_model.pkl b/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/models/Qleaning_model.pkl deleted file mode 100644 index 3be0dc4..0000000 Binary files a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/models/Qleaning_model.pkl and /dev/null differ diff --git a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/results/learning_curve.png b/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/results/learning_curve.png deleted file mode 100644 index ee7abc9..0000000 Binary files a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/results/res.csv b/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/results/res.csv deleted file mode 100644 index 3799662..0000000 --- a/projects/codes/QLearning/Train_CliffWalking-v0_QLearning_20221030-014916/results/res.csv +++ /dev/null @@ -1,801 +0,0 @@ -episodes,rewards,steps -0,-1586,200 -1,-1091,200 -2,-596,200 -3,-497,200 -4,-398,200 -5,-362,164 -6,-179,179 -7,-398,200 -8,-79,79 -9,-141,141 -10,-143,143 -11,-134,134 -12,-299,200 -13,-102,102 -14,-61,61 -15,-136,136 -16,-176,176 -17,-98,98 -18,-92,92 -19,-110,110 -20,-67,67 -21,-136,136 -22,-98,98 -23,-164,164 -24,-65,65 -25,-98,98 -26,-33,33 -27,-161,161 -28,-72,72 -29,-73,73 -30,-116,116 -31,-50,50 -32,-66,66 -33,-123,123 -34,-40,40 -35,-100,100 -36,-56,56 
-37,-101,101 -38,-55,55 -39,-84,84 -40,-68,68 -41,-33,33 -42,-113,113 -43,-72,72 -44,-36,36 -45,-84,84 -46,-45,45 -47,-86,86 -48,-57,57 -49,-92,92 -50,-39,39 -51,-76,76 -52,-39,39 -53,-47,47 -54,-88,88 -55,-40,40 -56,-55,55 -57,-69,69 -58,-51,51 -59,-69,69 -60,-36,36 -61,-84,84 -62,-38,38 -63,-56,56 -64,-58,58 -65,-27,27 -66,-64,64 -67,-38,38 -68,-53,53 -69,-70,70 -70,-40,40 -71,-47,47 -72,-71,71 -73,-47,47 -74,-32,32 -75,-70,70 -76,-36,36 -77,-36,36 -78,-73,73 -79,-18,18 -80,-67,67 -81,-29,29 -82,-25,25 -83,-64,64 -84,-28,28 -85,-61,61 -86,-33,33 -87,-30,30 -88,-25,25 -89,-86,86 -90,-19,19 -91,-53,53 -92,-39,39 -93,-39,39 -94,-36,36 -95,-45,45 -96,-43,43 -97,-31,31 -98,-37,37 -99,-43,43 -100,-46,46 -101,-28,28 -102,-32,32 -103,-47,47 -104,-37,37 -105,-39,39 -106,-38,38 -107,-29,29 -108,-44,44 -109,-39,39 -110,-26,26 -111,-28,28 -112,-63,63 -113,-18,18 -114,-52,52 -115,-24,24 -116,-26,26 -117,-39,39 -118,-21,21 -119,-44,44 -120,-48,48 -121,-24,24 -122,-40,40 -123,-31,31 -124,-23,23 -125,-35,35 -126,-26,26 -127,-31,31 -128,-33,33 -129,-32,32 -130,-36,36 -131,-21,21 -132,-43,43 -133,-51,51 -134,-17,17 -135,-28,28 -136,-24,24 -137,-42,42 -138,-21,21 -139,-24,24 -140,-30,30 -141,-24,24 -142,-36,36 -143,-14,14 -144,-42,42 -145,-44,44 -146,-25,25 -147,-42,42 -148,-16,16 -149,-30,30 -150,-32,32 -151,-20,20 -152,-44,44 -153,-20,20 -154,-29,29 -155,-20,20 -156,-30,30 -157,-18,18 -158,-25,25 -159,-38,38 -160,-34,34 -161,-25,25 -162,-30,30 -163,-24,24 -164,-19,19 -165,-34,34 -166,-15,15 -167,-27,27 -168,-22,22 -169,-28,28 -170,-13,13 -171,-20,20 -172,-58,58 -173,-18,18 -174,-19,19 -175,-29,29 -176,-125,26 -177,-156,57 -178,-18,18 -179,-25,25 -180,-27,27 -181,-21,21 -182,-22,22 -183,-21,21 -184,-22,22 -185,-25,25 -186,-25,25 -187,-27,27 -188,-19,19 -189,-26,26 -190,-17,17 -191,-21,21 -192,-30,30 -193,-16,16 -194,-35,35 -195,-21,21 -196,-134,35 -197,-21,21 -198,-18,18 -199,-27,27 -200,-16,16 -201,-44,44 -202,-23,23 -203,-23,23 -204,-16,16 -205,-18,18 -206,-17,17 -207,-18,18 
-208,-16,16 -209,-13,13 -210,-19,19 -211,-26,26 -212,-27,27 -213,-133,34 -214,-21,21 -215,-24,24 -216,-29,29 -217,-24,24 -218,-16,16 -219,-16,16 -220,-24,24 -221,-17,17 -222,-143,44 -223,-15,15 -224,-15,15 -225,-23,23 -226,-21,21 -227,-26,26 -228,-16,16 -229,-19,19 -230,-13,13 -231,-19,19 -232,-29,29 -233,-26,26 -234,-30,30 -235,-13,13 -236,-13,13 -237,-21,21 -238,-15,15 -239,-18,18 -240,-13,13 -241,-16,16 -242,-27,27 -243,-17,17 -244,-27,27 -245,-15,15 -246,-14,14 -247,-28,28 -248,-17,17 -249,-15,15 -250,-17,17 -251,-19,19 -252,-13,13 -253,-23,23 -254,-15,15 -255,-15,15 -256,-19,19 -257,-15,15 -258,-22,22 -259,-13,13 -260,-16,16 -261,-15,15 -262,-34,34 -263,-16,16 -264,-26,26 -265,-13,13 -266,-15,15 -267,-15,15 -268,-18,18 -269,-18,18 -270,-13,13 -271,-13,13 -272,-19,19 -273,-13,13 -274,-27,27 -275,-13,13 -276,-13,13 -277,-22,22 -278,-17,17 -279,-17,17 -280,-13,13 -281,-26,26 -282,-13,13 -283,-13,13 -284,-14,14 -285,-15,15 -286,-13,13 -287,-13,13 -288,-14,14 -289,-24,24 -290,-21,21 -291,-13,13 -292,-13,13 -293,-14,14 -294,-15,15 -295,-13,13 -296,-13,13 -297,-13,13 -298,-14,14 -299,-21,21 -300,-15,15 -301,-13,13 -302,-13,13 -303,-21,21 -304,-13,13 -305,-22,22 -306,-13,13 -307,-13,13 -308,-16,16 -309,-15,15 -310,-13,13 -311,-31,31 -312,-13,13 -313,-13,13 -314,-15,15 -315,-13,13 -316,-13,13 -317,-13,13 -318,-14,14 -319,-13,13 -320,-15,15 -321,-13,13 -322,-13,13 -323,-13,13 -324,-16,16 -325,-13,13 -326,-13,13 -327,-13,13 -328,-13,13 -329,-13,13 -330,-13,13 -331,-15,15 -332,-16,16 -333,-13,13 -334,-13,13 -335,-13,13 -336,-13,13 -337,-13,13 -338,-15,15 -339,-13,13 -340,-13,13 -341,-13,13 -342,-13,13 -343,-13,13 -344,-13,13 -345,-13,13 -346,-22,22 -347,-13,13 -348,-13,13 -349,-13,13 -350,-17,17 -351,-13,13 -352,-13,13 -353,-13,13 -354,-13,13 -355,-13,13 -356,-13,13 -357,-13,13 -358,-13,13 -359,-13,13 -360,-13,13 -361,-13,13 -362,-13,13 -363,-13,13 -364,-13,13 -365,-13,13 -366,-13,13 -367,-13,13 -368,-13,13 -369,-13,13 -370,-13,13 -371,-13,13 -372,-13,13 -373,-13,13 
-374,-13,13 -375,-13,13 -376,-21,21 -377,-123,24 -378,-13,13 -379,-14,14 -380,-13,13 -381,-13,13 -382,-13,13 -383,-13,13 -384,-13,13 -385,-13,13 -386,-13,13 -387,-13,13 -388,-13,13 -389,-13,13 -390,-13,13 -391,-13,13 -392,-13,13 -393,-13,13 -394,-13,13 -395,-13,13 -396,-15,15 -397,-13,13 -398,-13,13 -399,-13,13 -400,-14,14 -401,-13,13 -402,-13,13 -403,-13,13 -404,-13,13 -405,-13,13 -406,-13,13 -407,-13,13 -408,-13,13 -409,-13,13 -410,-13,13 -411,-13,13 -412,-13,13 -413,-13,13 -414,-13,13 -415,-13,13 -416,-13,13 -417,-13,13 -418,-13,13 -419,-113,14 -420,-13,13 -421,-13,13 -422,-13,13 -423,-13,13 -424,-13,13 -425,-13,13 -426,-13,13 -427,-115,16 -428,-13,13 -429,-13,13 -430,-13,13 -431,-13,13 -432,-13,13 -433,-13,13 -434,-15,15 -435,-13,13 -436,-13,13 -437,-13,13 -438,-13,13 -439,-13,13 -440,-13,13 -441,-13,13 -442,-13,13 -443,-13,13 -444,-13,13 -445,-15,15 -446,-13,13 -447,-13,13 -448,-13,13 -449,-13,13 -450,-13,13 -451,-13,13 -452,-13,13 -453,-13,13 -454,-13,13 -455,-13,13 -456,-13,13 -457,-13,13 -458,-13,13 -459,-13,13 -460,-13,13 -461,-13,13 -462,-13,13 -463,-13,13 -464,-13,13 -465,-13,13 -466,-13,13 -467,-13,13 -468,-13,13 -469,-15,15 -470,-13,13 -471,-13,13 -472,-13,13 -473,-13,13 -474,-13,13 -475,-13,13 -476,-13,13 -477,-13,13 -478,-13,13 -479,-13,13 -480,-13,13 -481,-13,13 -482,-13,13 -483,-13,13 -484,-13,13 -485,-13,13 -486,-13,13 -487,-13,13 -488,-13,13 -489,-13,13 -490,-13,13 -491,-13,13 -492,-13,13 -493,-13,13 -494,-13,13 -495,-13,13 -496,-13,13 -497,-13,13 -498,-13,13 -499,-13,13 -500,-13,13 -501,-13,13 -502,-15,15 -503,-13,13 -504,-13,13 -505,-15,15 -506,-13,13 -507,-13,13 -508,-13,13 -509,-13,13 -510,-13,13 -511,-13,13 -512,-13,13 -513,-13,13 -514,-13,13 -515,-13,13 -516,-13,13 -517,-13,13 -518,-13,13 -519,-13,13 -520,-13,13 -521,-13,13 -522,-13,13 -523,-13,13 -524,-13,13 -525,-13,13 -526,-15,15 -527,-13,13 -528,-13,13 -529,-13,13 -530,-13,13 -531,-13,13 -532,-13,13 -533,-13,13 -534,-122,23 -535,-13,13 -536,-13,13 -537,-13,13 -538,-13,13 -539,-13,13 
-540,-13,13 -541,-13,13 -542,-13,13 -543,-15,15 -544,-13,13 -545,-13,13 -546,-13,13 -547,-13,13 -548,-13,13 -549,-13,13 -550,-13,13 -551,-15,15 -552,-13,13 -553,-13,13 -554,-13,13 -555,-13,13 -556,-13,13 -557,-115,16 -558,-13,13 -559,-15,15 -560,-13,13 -561,-13,13 -562,-13,13 -563,-13,13 -564,-13,13 -565,-13,13 -566,-13,13 -567,-13,13 -568,-13,13 -569,-13,13 -570,-13,13 -571,-13,13 -572,-13,13 -573,-13,13 -574,-15,15 -575,-13,13 -576,-13,13 -577,-13,13 -578,-13,13 -579,-122,23 -580,-13,13 -581,-13,13 -582,-13,13 -583,-13,13 -584,-14,14 -585,-14,14 -586,-13,13 -587,-13,13 -588,-13,13 -589,-13,13 -590,-13,13 -591,-13,13 -592,-13,13 -593,-13,13 -594,-15,15 -595,-13,13 -596,-13,13 -597,-13,13 -598,-13,13 -599,-13,13 -600,-122,23 -601,-13,13 -602,-13,13 -603,-13,13 -604,-13,13 -605,-13,13 -606,-13,13 -607,-13,13 -608,-116,17 -609,-13,13 -610,-13,13 -611,-13,13 -612,-13,13 -613,-13,13 -614,-115,16 -615,-13,13 -616,-13,13 -617,-122,23 -618,-13,13 -619,-13,13 -620,-13,13 -621,-13,13 -622,-13,13 -623,-13,13 -624,-13,13 -625,-13,13 -626,-13,13 -627,-13,13 -628,-13,13 -629,-13,13 -630,-15,15 -631,-13,13 -632,-116,17 -633,-13,13 -634,-13,13 -635,-13,13 -636,-13,13 -637,-13,13 -638,-13,13 -639,-13,13 -640,-13,13 -641,-117,18 -642,-13,13 -643,-13,13 -644,-13,13 -645,-13,13 -646,-13,13 -647,-13,13 -648,-223,25 -649,-13,13 -650,-13,13 -651,-13,13 -652,-13,13 -653,-13,13 -654,-15,15 -655,-13,13 -656,-13,13 -657,-13,13 -658,-13,13 -659,-13,13 -660,-221,23 -661,-13,13 -662,-15,15 -663,-13,13 -664,-13,13 -665,-13,13 -666,-13,13 -667,-13,13 -668,-13,13 -669,-13,13 -670,-13,13 -671,-13,13 -672,-13,13 -673,-13,13 -674,-13,13 -675,-21,21 -676,-13,13 -677,-15,15 -678,-13,13 -679,-13,13 -680,-13,13 -681,-13,13 -682,-13,13 -683,-13,13 -684,-113,14 -685,-13,13 -686,-13,13 -687,-13,13 -688,-13,13 -689,-13,13 -690,-13,13 -691,-13,13 -692,-13,13 -693,-13,13 -694,-13,13 -695,-13,13 -696,-13,13 -697,-13,13 -698,-13,13 -699,-13,13 -700,-13,13 -701,-15,15 -702,-13,13 -703,-15,15 -704,-13,13 
-705,-13,13 -706,-15,15 -707,-13,13 -708,-13,13 -709,-13,13 -710,-13,13 -711,-17,17 -712,-13,13 -713,-13,13 -714,-13,13 -715,-13,13 -716,-13,13 -717,-13,13 -718,-13,13 -719,-13,13 -720,-14,14 -721,-13,13 -722,-13,13 -723,-13,13 -724,-13,13 -725,-13,13 -726,-13,13 -727,-13,13 -728,-13,13 -729,-13,13 -730,-13,13 -731,-13,13 -732,-14,14 -733,-13,13 -734,-13,13 -735,-13,13 -736,-13,13 -737,-15,15 -738,-13,13 -739,-15,15 -740,-13,13 -741,-13,13 -742,-13,13 -743,-13,13 -744,-15,15 -745,-13,13 -746,-13,13 -747,-13,13 -748,-15,15 -749,-13,13 -750,-13,13 -751,-13,13 -752,-13,13 -753,-13,13 -754,-13,13 -755,-13,13 -756,-13,13 -757,-13,13 -758,-13,13 -759,-13,13 -760,-13,13 -761,-13,13 -762,-13,13 -763,-122,23 -764,-15,15 -765,-13,13 -766,-13,13 -767,-13,13 -768,-13,13 -769,-13,13 -770,-13,13 -771,-13,13 -772,-13,13 -773,-13,13 -774,-15,15 -775,-13,13 -776,-13,13 -777,-14,14 -778,-13,13 -779,-13,13 -780,-13,13 -781,-13,13 -782,-13,13 -783,-17,17 -784,-13,13 -785,-13,13 -786,-13,13 -787,-15,15 -788,-13,13 -789,-13,13 -790,-13,13 -791,-13,13 -792,-13,13 -793,-15,15 -794,-13,13 -795,-13,13 -796,-13,13 -797,-13,13 -798,-13,13 -799,-13,13 diff --git a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/config.yaml b/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/config.yaml deleted file mode 100644 index a0bf456..0000000 --- a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/config.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: FrozenLakeNoSlippery-v1 - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - mode: train - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 800 -algo_cfg: - epsilon_decay: 2000 - epsilon_end: 0.1 - epsilon_start: 0.7 - gamma: 0.95 - lr: 0.9 diff --git a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/logs/log.txt 
b/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/logs/log.txt deleted file mode 100644 index f52cf7f..0000000 --- a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/logs/log.txt +++ /dev/null @@ -1,804 +0,0 @@ -2022-10-30 01:45:04 - r - INFO: - n_states: 16, n_actions: 4 -2022-10-30 01:45:04 - r - INFO: - Start training! -2022-10-30 01:45:04 - r - INFO: - Env: FrozenLakeNoSlippery-v1, Algorithm: QLearning, Device: cpu -2022-10-30 01:45:04 - r - INFO: - Episode: 1/800, Reward: 0.00: Epislon: 0.694 -2022-10-30 01:45:04 - r - INFO: - Episode: 2/800, Reward: 0.00: Epislon: 0.690 -2022-10-30 01:45:04 - r - INFO: - Episode: 3/800, Reward: 0.00: Epislon: 0.686 -2022-10-30 01:45:04 - r - INFO: - Episode: 4/800, Reward: 0.00: Epislon: 0.683 -2022-10-30 01:45:04 - r - INFO: - Episode: 5/800, Reward: 0.00: Epislon: 0.681 -2022-10-30 01:45:04 - r - INFO: - Episode: 6/800, Reward: 0.00: Epislon: 0.679 -2022-10-30 01:45:04 - r - INFO: - Episode: 7/800, Reward: 0.00: Epislon: 0.676 -2022-10-30 01:45:04 - r - INFO: - Episode: 8/800, Reward: 0.00: Epislon: 0.674 -2022-10-30 01:45:04 - r - INFO: - Episode: 9/800, Reward: 0.00: Epislon: 0.673 -2022-10-30 01:45:04 - r - INFO: - Episode: 10/800, Reward: 0.00: Epislon: 0.670 -2022-10-30 01:45:04 - r - INFO: - Episode: 11/800, Reward: 0.00: Epislon: 0.667 -2022-10-30 01:45:04 - r - INFO: - Episode: 12/800, Reward: 0.00: Epislon: 0.661 -2022-10-30 01:45:04 - r - INFO: - Episode: 13/800, Reward: 0.00: Epislon: 0.660 -2022-10-30 01:45:04 - r - INFO: - Episode: 14/800, Reward: 0.00: Epislon: 0.655 -2022-10-30 01:45:04 - r - INFO: - Episode: 15/800, Reward: 0.00: Epislon: 0.654 -2022-10-30 01:45:04 - r - INFO: - Episode: 16/800, Reward: 0.00: Epislon: 0.652 -2022-10-30 01:45:04 - r - INFO: - Episode: 17/800, Reward: 0.00: Epislon: 0.647 -2022-10-30 01:45:04 - r - INFO: - Episode: 18/800, Reward: 0.00: Epislon: 0.646 -2022-10-30 01:45:04 - r - INFO: - Episode: 19/800, Reward: 
0.00: Epislon: 0.645 -2022-10-30 01:45:04 - r - INFO: - Episode: 20/800, Reward: 0.00: Epislon: 0.643 -2022-10-30 01:45:04 - r - INFO: - Episode: 21/800, Reward: 0.00: Epislon: 0.641 -2022-10-30 01:45:04 - r - INFO: - Episode: 22/800, Reward: 0.00: Epislon: 0.640 -2022-10-30 01:45:04 - r - INFO: - Episode: 23/800, Reward: 0.00: Epislon: 0.634 -2022-10-30 01:45:04 - r - INFO: - Episode: 24/800, Reward: 0.00: Epislon: 0.630 -2022-10-30 01:45:04 - r - INFO: - Episode: 25/800, Reward: 0.00: Epislon: 0.629 -2022-10-30 01:45:04 - r - INFO: - Episode: 26/800, Reward: 0.00: Epislon: 0.624 -2022-10-30 01:45:04 - r - INFO: - Episode: 27/800, Reward: 0.00: Epislon: 0.623 -2022-10-30 01:45:04 - r - INFO: - Episode: 28/800, Reward: 0.00: Epislon: 0.618 -2022-10-30 01:45:04 - r - INFO: - Episode: 29/800, Reward: 0.00: Epislon: 0.612 -2022-10-30 01:45:04 - r - INFO: - Episode: 30/800, Reward: 0.00: Epislon: 0.608 -2022-10-30 01:45:04 - r - INFO: - Episode: 31/800, Reward: 0.00: Epislon: 0.605 -2022-10-30 01:45:05 - r - INFO: - Episode: 32/800, Reward: 0.00: Epislon: 0.600 -2022-10-30 01:45:05 - r - INFO: - Episode: 33/800, Reward: 0.00: Epislon: 0.593 -2022-10-30 01:45:05 - r - INFO: - Episode: 34/800, Reward: 0.00: Epislon: 0.587 -2022-10-30 01:45:05 - r - INFO: - Episode: 35/800, Reward: 0.00: Epislon: 0.586 -2022-10-30 01:45:05 - r - INFO: - Episode: 36/800, Reward: 0.00: Epislon: 0.583 -2022-10-30 01:45:05 - r - INFO: - Episode: 37/800, Reward: 0.00: Epislon: 0.582 -2022-10-30 01:45:05 - r - INFO: - Episode: 38/800, Reward: 0.00: Epislon: 0.578 -2022-10-30 01:45:05 - r - INFO: - Episode: 39/800, Reward: 0.00: Epislon: 0.577 -2022-10-30 01:45:05 - r - INFO: - Episode: 40/800, Reward: 0.00: Epislon: 0.575 -2022-10-30 01:45:05 - r - INFO: - Episode: 41/800, Reward: 0.00: Epislon: 0.573 -2022-10-30 01:45:05 - r - INFO: - Episode: 42/800, Reward: 0.00: Epislon: 0.572 -2022-10-30 01:45:05 - r - INFO: - Episode: 43/800, Reward: 0.00: Epislon: 0.571 -2022-10-30 01:45:05 - r - INFO: - 
Episode: 44/800, Reward: 0.00: Epislon: 0.570 -2022-10-30 01:45:05 - r - INFO: - Episode: 45/800, Reward: 0.00: Epislon: 0.560 -2022-10-30 01:45:05 - r - INFO: - Episode: 46/800, Reward: 0.00: Epislon: 0.558 -2022-10-30 01:45:05 - r - INFO: - Episode: 47/800, Reward: 0.00: Epislon: 0.553 -2022-10-30 01:45:05 - r - INFO: - Episode: 48/800, Reward: 0.00: Epislon: 0.552 -2022-10-30 01:45:05 - r - INFO: - Episode: 49/800, Reward: 1.00: Epislon: 0.544 -2022-10-30 01:45:05 - r - INFO: - Episode: 50/800, Reward: 0.00: Epislon: 0.537 -2022-10-30 01:45:05 - r - INFO: - Episode: 51/800, Reward: 0.00: Epislon: 0.534 -2022-10-30 01:45:05 - r - INFO: - Episode: 52/800, Reward: 0.00: Epislon: 0.533 -2022-10-30 01:45:05 - r - INFO: - Episode: 53/800, Reward: 0.00: Epislon: 0.532 -2022-10-30 01:45:05 - r - INFO: - Episode: 54/800, Reward: 0.00: Epislon: 0.527 -2022-10-30 01:45:05 - r - INFO: - Episode: 55/800, Reward: 0.00: Epislon: 0.526 -2022-10-30 01:45:05 - r - INFO: - Episode: 56/800, Reward: 0.00: Epislon: 0.525 -2022-10-30 01:45:05 - r - INFO: - Episode: 57/800, Reward: 0.00: Epislon: 0.519 -2022-10-30 01:45:05 - r - INFO: - Episode: 58/800, Reward: 0.00: Epislon: 0.518 -2022-10-30 01:45:05 - r - INFO: - Episode: 59/800, Reward: 0.00: Epislon: 0.516 -2022-10-30 01:45:05 - r - INFO: - Episode: 60/800, Reward: 0.00: Epislon: 0.514 -2022-10-30 01:45:05 - r - INFO: - Episode: 61/800, Reward: 0.00: Epislon: 0.512 -2022-10-30 01:45:05 - r - INFO: - Episode: 62/800, Reward: 0.00: Epislon: 0.511 -2022-10-30 01:45:05 - r - INFO: - Episode: 63/800, Reward: 0.00: Epislon: 0.506 -2022-10-30 01:45:05 - r - INFO: - Episode: 64/800, Reward: 0.00: Epislon: 0.504 -2022-10-30 01:45:05 - r - INFO: - Episode: 65/800, Reward: 0.00: Epislon: 0.503 -2022-10-30 01:45:05 - r - INFO: - Episode: 66/800, Reward: 0.00: Epislon: 0.502 -2022-10-30 01:45:05 - r - INFO: - Episode: 67/800, Reward: 0.00: Epislon: 0.501 -2022-10-30 01:45:05 - r - INFO: - Episode: 68/800, Reward: 0.00: Epislon: 0.497 
-2022-10-30 01:45:05 - r - INFO: - Episode: 69/800, Reward: 0.00: Epislon: 0.496 -2022-10-30 01:45:05 - r - INFO: - Episode: 70/800, Reward: 0.00: Epislon: 0.491 -2022-10-30 01:45:05 - r - INFO: - Episode: 71/800, Reward: 0.00: Epislon: 0.489 -2022-10-30 01:45:05 - r - INFO: - Episode: 72/800, Reward: 0.00: Epislon: 0.487 -2022-10-30 01:45:05 - r - INFO: - Episode: 73/800, Reward: 0.00: Epislon: 0.486 -2022-10-30 01:45:05 - r - INFO: - Episode: 74/800, Reward: 0.00: Epislon: 0.481 -2022-10-30 01:45:05 - r - INFO: - Episode: 75/800, Reward: 0.00: Epislon: 0.477 -2022-10-30 01:45:05 - r - INFO: - Episode: 76/800, Reward: 0.00: Epislon: 0.475 -2022-10-30 01:45:05 - r - INFO: - Episode: 77/800, Reward: 0.00: Epislon: 0.474 -2022-10-30 01:45:05 - r - INFO: - Episode: 78/800, Reward: 0.00: Epislon: 0.468 -2022-10-30 01:45:05 - r - INFO: - Episode: 79/800, Reward: 0.00: Epislon: 0.465 -2022-10-30 01:45:05 - r - INFO: - Episode: 80/800, Reward: 0.00: Epislon: 0.464 -2022-10-30 01:45:05 - r - INFO: - Episode: 81/800, Reward: 0.00: Epislon: 0.462 -2022-10-30 01:45:05 - r - INFO: - Episode: 82/800, Reward: 0.00: Epislon: 0.460 -2022-10-30 01:45:05 - r - INFO: - Episode: 83/800, Reward: 0.00: Epislon: 0.457 -2022-10-30 01:45:05 - r - INFO: - Episode: 84/800, Reward: 0.00: Epislon: 0.455 -2022-10-30 01:45:05 - r - INFO: - Episode: 85/800, Reward: 0.00: Epislon: 0.454 -2022-10-30 01:45:05 - r - INFO: - Episode: 86/800, Reward: 0.00: Epislon: 0.452 -2022-10-30 01:45:05 - r - INFO: - Episode: 87/800, Reward: 0.00: Epislon: 0.444 -2022-10-30 01:45:05 - r - INFO: - Episode: 88/800, Reward: 0.00: Epislon: 0.440 -2022-10-30 01:45:05 - r - INFO: - Episode: 89/800, Reward: 0.00: Epislon: 0.414 -2022-10-30 01:45:05 - r - INFO: - Episode: 90/800, Reward: 0.00: Epislon: 0.413 -2022-10-30 01:45:05 - r - INFO: - Episode: 91/800, Reward: 0.00: Epislon: 0.411 -2022-10-30 01:45:05 - r - INFO: - Episode: 92/800, Reward: 0.00: Epislon: 0.407 -2022-10-30 01:45:05 - r - INFO: - Episode: 93/800, 
Reward: 0.00: Epislon: 0.407 -2022-10-30 01:45:05 - r - INFO: - Episode: 94/800, Reward: 0.00: Epislon: 0.406 -2022-10-30 01:45:05 - r - INFO: - Episode: 95/800, Reward: 0.00: Epislon: 0.403 -2022-10-30 01:45:05 - r - INFO: - Episode: 96/800, Reward: 0.00: Epislon: 0.390 -2022-10-30 01:45:05 - r - INFO: - Episode: 97/800, Reward: 0.00: Epislon: 0.386 -2022-10-30 01:45:05 - r - INFO: - Episode: 98/800, Reward: 0.00: Epislon: 0.385 -2022-10-30 01:45:05 - r - INFO: - Episode: 99/800, Reward: 0.00: Epislon: 0.385 -2022-10-30 01:45:05 - r - INFO: - Episode: 100/800, Reward: 0.00: Epislon: 0.383 -2022-10-30 01:45:05 - r - INFO: - Episode: 101/800, Reward: 0.00: Epislon: 0.381 -2022-10-30 01:45:05 - r - INFO: - Episode: 102/800, Reward: 0.00: Epislon: 0.380 -2022-10-30 01:45:05 - r - INFO: - Episode: 103/800, Reward: 0.00: Epislon: 0.378 -2022-10-30 01:45:05 - r - INFO: - Episode: 104/800, Reward: 0.00: Epislon: 0.366 -2022-10-30 01:45:05 - r - INFO: - Episode: 105/800, Reward: 0.00: Epislon: 0.365 -2022-10-30 01:45:05 - r - INFO: - Episode: 106/800, Reward: 0.00: Epislon: 0.359 -2022-10-30 01:45:05 - r - INFO: - Episode: 107/800, Reward: 0.00: Epislon: 0.357 -2022-10-30 01:45:05 - r - INFO: - Episode: 108/800, Reward: 0.00: Epislon: 0.356 -2022-10-30 01:45:05 - r - INFO: - Episode: 109/800, Reward: 0.00: Epislon: 0.350 -2022-10-30 01:45:05 - r - INFO: - Episode: 110/800, Reward: 0.00: Epislon: 0.347 -2022-10-30 01:45:05 - r - INFO: - Episode: 111/800, Reward: 0.00: Epislon: 0.345 -2022-10-30 01:45:05 - r - INFO: - Episode: 112/800, Reward: 0.00: Epislon: 0.343 -2022-10-30 01:45:05 - r - INFO: - Episode: 113/800, Reward: 0.00: Epislon: 0.322 -2022-10-30 01:45:05 - r - INFO: - Episode: 114/800, Reward: 0.00: Epislon: 0.317 -2022-10-30 01:45:05 - r - INFO: - Episode: 115/800, Reward: 0.00: Epislon: 0.308 -2022-10-30 01:45:05 - r - INFO: - Episode: 116/800, Reward: 0.00: Epislon: 0.306 -2022-10-30 01:45:05 - r - INFO: - Episode: 117/800, Reward: 0.00: Epislon: 0.303 
-2022-10-30 01:45:05 - r - INFO: - Episode: 118/800, Reward: 0.00: Epislon: 0.300 -2022-10-30 01:45:05 - r - INFO: - Episode: 119/800, Reward: 0.00: Epislon: 0.300 -2022-10-30 01:45:05 - r - INFO: - Episode: 120/800, Reward: 0.00: Epislon: 0.291 -2022-10-30 01:45:05 - r - INFO: - Episode: 121/800, Reward: 0.00: Epislon: 0.290 -2022-10-30 01:45:05 - r - INFO: - Episode: 122/800, Reward: 0.00: Epislon: 0.284 -2022-10-30 01:45:05 - r - INFO: - Episode: 123/800, Reward: 0.00: Epislon: 0.282 -2022-10-30 01:45:05 - r - INFO: - Episode: 124/800, Reward: 0.00: Epislon: 0.276 -2022-10-30 01:45:05 - r - INFO: - Episode: 125/800, Reward: 0.00: Epislon: 0.269 -2022-10-30 01:45:05 - r - INFO: - Episode: 126/800, Reward: 0.00: Epislon: 0.262 -2022-10-30 01:45:05 - r - INFO: - Episode: 127/800, Reward: 0.00: Epislon: 0.246 -2022-10-30 01:45:05 - r - INFO: - Episode: 128/800, Reward: 0.00: Epislon: 0.244 -2022-10-30 01:45:05 - r - INFO: - Episode: 129/800, Reward: 0.00: Epislon: 0.241 -2022-10-30 01:45:05 - r - INFO: - Episode: 130/800, Reward: 0.00: Epislon: 0.236 -2022-10-30 01:45:05 - r - INFO: - Episode: 131/800, Reward: 0.00: Epislon: 0.235 -2022-10-30 01:45:05 - r - INFO: - Episode: 132/800, Reward: 0.00: Epislon: 0.234 -2022-10-30 01:45:05 - r - INFO: - Episode: 133/800, Reward: 0.00: Epislon: 0.233 -2022-10-30 01:45:05 - r - INFO: - Episode: 134/800, Reward: 0.00: Epislon: 0.231 -2022-10-30 01:45:05 - r - INFO: - Episode: 135/800, Reward: 0.00: Epislon: 0.229 -2022-10-30 01:45:05 - r - INFO: - Episode: 136/800, Reward: 0.00: Epislon: 0.227 -2022-10-30 01:45:05 - r - INFO: - Episode: 137/800, Reward: 0.00: Epislon: 0.226 -2022-10-30 01:45:05 - r - INFO: - Episode: 138/800, Reward: 0.00: Epislon: 0.223 -2022-10-30 01:45:05 - r - INFO: - Episode: 139/800, Reward: 0.00: Epislon: 0.216 -2022-10-30 01:45:05 - r - INFO: - Episode: 140/800, Reward: 0.00: Epislon: 0.214 -2022-10-30 01:45:05 - r - INFO: - Episode: 141/800, Reward: 0.00: Epislon: 0.213 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 142/800, Reward: 0.00: Epislon: 0.211 -2022-10-30 01:45:05 - r - INFO: - Episode: 143/800, Reward: 0.00: Epislon: 0.210 -2022-10-30 01:45:05 - r - INFO: - Episode: 144/800, Reward: 0.00: Epislon: 0.207 -2022-10-30 01:45:05 - r - INFO: - Episode: 145/800, Reward: 0.00: Epislon: 0.202 -2022-10-30 01:45:05 - r - INFO: - Episode: 146/800, Reward: 0.00: Epislon: 0.201 -2022-10-30 01:45:05 - r - INFO: - Episode: 147/800, Reward: 0.00: Epislon: 0.198 -2022-10-30 01:45:05 - r - INFO: - Episode: 148/800, Reward: 0.00: Epislon: 0.196 -2022-10-30 01:45:05 - r - INFO: - Episode: 149/800, Reward: 0.00: Epislon: 0.195 -2022-10-30 01:45:05 - r - INFO: - Episode: 150/800, Reward: 0.00: Epislon: 0.192 -2022-10-30 01:45:05 - r - INFO: - Episode: 151/800, Reward: 0.00: Epislon: 0.190 -2022-10-30 01:45:05 - r - INFO: - Episode: 152/800, Reward: 0.00: Epislon: 0.188 -2022-10-30 01:45:05 - r - INFO: - Episode: 153/800, Reward: 0.00: Epislon: 0.186 -2022-10-30 01:45:05 - r - INFO: - Episode: 154/800, Reward: 0.00: Epislon: 0.185 -2022-10-30 01:45:05 - r - INFO: - Episode: 155/800, Reward: 0.00: Epislon: 0.185 -2022-10-30 01:45:05 - r - INFO: - Episode: 156/800, Reward: 0.00: Epislon: 0.183 -2022-10-30 01:45:05 - r - INFO: - Episode: 157/800, Reward: 0.00: Epislon: 0.182 -2022-10-30 01:45:05 - r - INFO: - Episode: 158/800, Reward: 0.00: Epislon: 0.181 -2022-10-30 01:45:05 - r - INFO: - Episode: 159/800, Reward: 0.00: Epislon: 0.179 -2022-10-30 01:45:05 - r - INFO: - Episode: 160/800, Reward: 0.00: Epislon: 0.173 -2022-10-30 01:45:05 - r - INFO: - Episode: 161/800, Reward: 0.00: Epislon: 0.169 -2022-10-30 01:45:05 - r - INFO: - Episode: 162/800, Reward: 0.00: Epislon: 0.167 -2022-10-30 01:45:05 - r - INFO: - Episode: 163/800, Reward: 0.00: Epislon: 0.165 -2022-10-30 01:45:05 - r - INFO: - Episode: 164/800, Reward: 0.00: Epislon: 0.165 -2022-10-30 01:45:05 - r - INFO: - Episode: 165/800, Reward: 0.00: Epislon: 0.163 -2022-10-30 01:45:05 - r - INFO: - Episode: 166/800, 
Reward: 0.00: Epislon: 0.163 -2022-10-30 01:45:05 - r - INFO: - Episode: 167/800, Reward: 0.00: Epislon: 0.162 -2022-10-30 01:45:05 - r - INFO: - Episode: 168/800, Reward: 0.00: Epislon: 0.161 -2022-10-30 01:45:05 - r - INFO: - Episode: 169/800, Reward: 0.00: Epislon: 0.160 -2022-10-30 01:45:05 - r - INFO: - Episode: 170/800, Reward: 0.00: Epislon: 0.159 -2022-10-30 01:45:05 - r - INFO: - Episode: 171/800, Reward: 0.00: Epislon: 0.158 -2022-10-30 01:45:05 - r - INFO: - Episode: 172/800, Reward: 0.00: Epislon: 0.155 -2022-10-30 01:45:05 - r - INFO: - Episode: 173/800, Reward: 0.00: Epislon: 0.151 -2022-10-30 01:45:05 - r - INFO: - Episode: 174/800, Reward: 0.00: Epislon: 0.149 -2022-10-30 01:45:05 - r - INFO: - Episode: 175/800, Reward: 0.00: Epislon: 0.148 -2022-10-30 01:45:05 - r - INFO: - Episode: 176/800, Reward: 0.00: Epislon: 0.148 -2022-10-30 01:45:05 - r - INFO: - Episode: 177/800, Reward: 0.00: Epislon: 0.148 -2022-10-30 01:45:05 - r - INFO: - Episode: 178/800, Reward: 0.00: Epislon: 0.147 -2022-10-30 01:45:05 - r - INFO: - Episode: 179/800, Reward: 0.00: Epislon: 0.146 -2022-10-30 01:45:05 - r - INFO: - Episode: 180/800, Reward: 0.00: Epislon: 0.146 -2022-10-30 01:45:05 - r - INFO: - Episode: 181/800, Reward: 0.00: Epislon: 0.145 -2022-10-30 01:45:05 - r - INFO: - Episode: 182/800, Reward: 0.00: Epislon: 0.144 -2022-10-30 01:45:05 - r - INFO: - Episode: 183/800, Reward: 0.00: Epislon: 0.140 -2022-10-30 01:45:05 - r - INFO: - Episode: 184/800, Reward: 0.00: Epislon: 0.139 -2022-10-30 01:45:05 - r - INFO: - Episode: 185/800, Reward: 0.00: Epislon: 0.138 -2022-10-30 01:45:05 - r - INFO: - Episode: 186/800, Reward: 0.00: Epislon: 0.137 -2022-10-30 01:45:05 - r - INFO: - Episode: 187/800, Reward: 0.00: Epislon: 0.137 -2022-10-30 01:45:05 - r - INFO: - Episode: 188/800, Reward: 0.00: Epislon: 0.134 -2022-10-30 01:45:05 - r - INFO: - Episode: 189/800, Reward: 0.00: Epislon: 0.134 -2022-10-30 01:45:05 - r - INFO: - Episode: 190/800, Reward: 0.00: Epislon: 0.133 
-2022-10-30 01:45:05 - r - INFO: - Episode: 191/800, Reward: 0.00: Epislon: 0.133 -2022-10-30 01:45:05 - r - INFO: - Episode: 192/800, Reward: 0.00: Epislon: 0.132 -2022-10-30 01:45:05 - r - INFO: - Episode: 193/800, Reward: 0.00: Epislon: 0.131 -2022-10-30 01:45:05 - r - INFO: - Episode: 194/800, Reward: 0.00: Epislon: 0.131 -2022-10-30 01:45:05 - r - INFO: - Episode: 195/800, Reward: 0.00: Epislon: 0.130 -2022-10-30 01:45:05 - r - INFO: - Episode: 196/800, Reward: 0.00: Epislon: 0.129 -2022-10-30 01:45:05 - r - INFO: - Episode: 197/800, Reward: 0.00: Epislon: 0.129 -2022-10-30 01:45:05 - r - INFO: - Episode: 198/800, Reward: 0.00: Epislon: 0.126 -2022-10-30 01:45:05 - r - INFO: - Episode: 199/800, Reward: 0.00: Epislon: 0.123 -2022-10-30 01:45:05 - r - INFO: - Episode: 200/800, Reward: 0.00: Epislon: 0.123 -2022-10-30 01:45:05 - r - INFO: - Episode: 201/800, Reward: 0.00: Epislon: 0.122 -2022-10-30 01:45:05 - r - INFO: - Episode: 202/800, Reward: 0.00: Epislon: 0.122 -2022-10-30 01:45:05 - r - INFO: - Episode: 203/800, Reward: 0.00: Epislon: 0.122 -2022-10-30 01:45:05 - r - INFO: - Episode: 204/800, Reward: 0.00: Epislon: 0.121 -2022-10-30 01:45:05 - r - INFO: - Episode: 205/800, Reward: 0.00: Epislon: 0.119 -2022-10-30 01:45:05 - r - INFO: - Episode: 206/800, Reward: 0.00: Epislon: 0.119 -2022-10-30 01:45:05 - r - INFO: - Episode: 207/800, Reward: 0.00: Epislon: 0.119 -2022-10-30 01:45:05 - r - INFO: - Episode: 208/800, Reward: 0.00: Epislon: 0.118 -2022-10-30 01:45:05 - r - INFO: - Episode: 209/800, Reward: 0.00: Epislon: 0.118 -2022-10-30 01:45:05 - r - INFO: - Episode: 210/800, Reward: 0.00: Epislon: 0.118 -2022-10-30 01:45:05 - r - INFO: - Episode: 211/800, Reward: 0.00: Epislon: 0.116 -2022-10-30 01:45:05 - r - INFO: - Episode: 212/800, Reward: 0.00: Epislon: 0.115 -2022-10-30 01:45:05 - r - INFO: - Episode: 213/800, Reward: 0.00: Epislon: 0.115 -2022-10-30 01:45:05 - r - INFO: - Episode: 214/800, Reward: 0.00: Epislon: 0.114 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 215/800, Reward: 0.00: Epislon: 0.113 -2022-10-30 01:45:05 - r - INFO: - Episode: 216/800, Reward: 0.00: Epislon: 0.113 -2022-10-30 01:45:05 - r - INFO: - Episode: 217/800, Reward: 0.00: Epislon: 0.112 -2022-10-30 01:45:05 - r - INFO: - Episode: 218/800, Reward: 0.00: Epislon: 0.111 -2022-10-30 01:45:05 - r - INFO: - Episode: 219/800, Reward: 0.00: Epislon: 0.111 -2022-10-30 01:45:05 - r - INFO: - Episode: 220/800, Reward: 0.00: Epislon: 0.111 -2022-10-30 01:45:05 - r - INFO: - Episode: 221/800, Reward: 0.00: Epislon: 0.110 -2022-10-30 01:45:05 - r - INFO: - Episode: 222/800, Reward: 0.00: Epislon: 0.110 -2022-10-30 01:45:05 - r - INFO: - Episode: 223/800, Reward: 0.00: Epislon: 0.109 -2022-10-30 01:45:05 - r - INFO: - Episode: 224/800, Reward: 0.00: Epislon: 0.108 -2022-10-30 01:45:05 - r - INFO: - Episode: 225/800, Reward: 0.00: Epislon: 0.108 -2022-10-30 01:45:05 - r - INFO: - Episode: 226/800, Reward: 0.00: Epislon: 0.108 -2022-10-30 01:45:05 - r - INFO: - Episode: 227/800, Reward: 0.00: Epislon: 0.108 -2022-10-30 01:45:05 - r - INFO: - Episode: 228/800, Reward: 0.00: Epislon: 0.107 -2022-10-30 01:45:05 - r - INFO: - Episode: 229/800, Reward: 0.00: Epislon: 0.107 -2022-10-30 01:45:05 - r - INFO: - Episode: 230/800, Reward: 0.00: Epislon: 0.107 -2022-10-30 01:45:05 - r - INFO: - Episode: 231/800, Reward: 0.00: Epislon: 0.107 -2022-10-30 01:45:05 - r - INFO: - Episode: 232/800, Reward: 0.00: Epislon: 0.106 -2022-10-30 01:45:05 - r - INFO: - Episode: 233/800, Reward: 0.00: Epislon: 0.106 -2022-10-30 01:45:05 - r - INFO: - Episode: 234/800, Reward: 0.00: Epislon: 0.106 -2022-10-30 01:45:05 - r - INFO: - Episode: 235/800, Reward: 0.00: Epislon: 0.105 -2022-10-30 01:45:05 - r - INFO: - Episode: 236/800, Reward: 0.00: Epislon: 0.105 -2022-10-30 01:45:05 - r - INFO: - Episode: 237/800, Reward: 0.00: Epislon: 0.105 -2022-10-30 01:45:05 - r - INFO: - Episode: 238/800, Reward: 0.00: Epislon: 0.105 -2022-10-30 01:45:05 - r - INFO: - Episode: 239/800, 
Reward: 0.00: Epislon: 0.104 -2022-10-30 01:45:05 - r - INFO: - Episode: 240/800, Reward: 0.00: Epislon: 0.104 -2022-10-30 01:45:05 - r - INFO: - Episode: 241/800, Reward: 0.00: Epislon: 0.104 -2022-10-30 01:45:05 - r - INFO: - Episode: 242/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 243/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 244/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 245/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 246/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 247/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 248/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 249/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 250/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 251/800, Reward: 0.00: Epislon: 0.103 -2022-10-30 01:45:05 - r - INFO: - Episode: 252/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 253/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 254/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 255/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 256/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 257/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 258/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 259/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 260/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 261/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 262/800, Reward: 0.00: Epislon: 0.102 -2022-10-30 01:45:05 - r - INFO: - Episode: 263/800, Reward: 0.00: Epislon: 0.101 
-2022-10-30 01:45:05 - r - INFO: - Episode: 264/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 265/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 266/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 267/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 268/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 269/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 270/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 271/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 272/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 273/800, Reward: 1.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 274/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 275/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 276/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 277/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 278/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 279/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 280/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 281/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 282/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 283/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 284/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 285/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 286/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 287/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 288/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 289/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 290/800, Reward: 0.00: Epislon: 0.101 -2022-10-30 01:45:05 - r - INFO: - Episode: 291/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 292/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 293/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 294/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 295/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 296/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 297/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 298/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 299/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 300/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 301/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 302/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 303/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 304/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 305/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 306/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 307/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 308/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 309/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 310/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 311/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 312/800, 
Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 313/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 314/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 315/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 316/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 317/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 318/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 319/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 320/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 321/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 322/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 323/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 324/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 325/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 326/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 327/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 328/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 329/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 330/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 331/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 332/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 333/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 334/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 335/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 336/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 337/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 338/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 339/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 340/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 341/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 342/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 343/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 344/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 345/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 346/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 347/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 348/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 349/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 350/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 351/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 352/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 353/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 354/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 355/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 356/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 357/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 358/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 359/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 360/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 361/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 362/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 363/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 364/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 365/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 366/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 367/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 368/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 369/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 370/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 371/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 372/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 373/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 374/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 375/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 376/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 377/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 378/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 379/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 380/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 381/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 382/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 383/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 384/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 385/800, 
Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 386/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 387/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 388/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 389/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 390/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 391/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 392/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 393/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 394/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 395/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 396/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 397/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 398/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 399/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 400/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 401/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 402/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 403/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 404/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 405/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 406/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 407/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 408/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 409/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 410/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 411/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 412/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 413/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 414/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 415/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 416/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 417/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 418/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 419/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 420/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 421/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 422/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 423/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 424/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 425/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 426/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 427/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 428/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 429/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 430/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 431/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 432/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 433/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 434/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 435/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 436/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 437/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 438/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 439/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 440/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 441/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 442/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 443/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 444/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 445/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 446/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 447/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 448/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 449/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 450/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 451/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 452/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 453/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 454/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 455/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 456/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 457/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 458/800, 
Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 459/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 460/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 461/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 462/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 463/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 464/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 465/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 466/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 467/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 468/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 469/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 470/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 471/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 472/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 473/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 474/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 475/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 476/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 477/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 478/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 479/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 480/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 481/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 482/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 483/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 484/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 485/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 486/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 487/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 488/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 489/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 490/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 491/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 492/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 493/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 494/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 495/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 496/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 497/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 498/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 499/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 500/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 501/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 502/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 503/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 504/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 505/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 506/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 507/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 508/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 509/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 510/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 511/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 512/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 513/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 514/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 515/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 516/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 517/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 518/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 519/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 520/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 521/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 522/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 523/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 524/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 525/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 526/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 527/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 528/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 529/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 530/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 531/800, 
Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 532/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 533/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 534/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 535/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 536/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 537/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 538/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 539/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 540/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 541/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 542/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 543/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 544/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 545/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 546/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 547/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 548/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 549/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 550/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 551/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 552/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 553/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 554/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 555/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 556/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 557/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 558/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 559/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 560/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 561/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 562/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 563/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 564/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 565/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 566/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 567/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 568/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 569/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 570/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 571/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 572/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 573/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 574/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 575/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 576/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 577/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 578/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 579/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 580/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 581/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 582/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 583/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 584/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 585/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 586/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 587/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 588/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 589/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 590/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 591/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 592/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 593/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 594/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 595/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 596/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 597/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 598/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 599/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 600/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 601/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 602/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 603/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 604/800, 
Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 605/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 606/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 607/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 608/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 609/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 610/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 611/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 612/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 613/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 614/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 615/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 616/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 617/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 618/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 619/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 620/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 621/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 622/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 623/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 624/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 625/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 626/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 627/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 628/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 629/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 630/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 631/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 632/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 633/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 634/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 635/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 636/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 637/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 638/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 639/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 640/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 641/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 642/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 643/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 644/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 645/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 646/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 647/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 648/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 649/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 650/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 651/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 652/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 653/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 654/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 655/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 656/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 657/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 658/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 659/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 660/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 661/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 662/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 663/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 664/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 665/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 666/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 667/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 668/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 669/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 670/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 671/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 672/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 673/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 674/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 675/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 676/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 677/800, 
Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 678/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 679/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 680/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 681/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 682/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 683/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 684/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 685/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 686/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 687/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 688/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 689/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 690/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 691/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 692/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 693/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 694/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 695/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 696/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 697/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 698/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 699/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 700/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 701/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 702/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 703/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 704/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 705/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 706/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 707/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 708/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 709/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 710/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 711/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 712/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 713/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 714/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 715/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 716/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 717/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 718/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 719/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 720/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 721/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 722/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 723/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 724/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 725/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 726/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 727/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 728/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 729/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 730/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 731/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 732/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 733/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 734/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 735/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 736/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 737/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 738/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 739/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 740/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 741/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 742/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 743/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 744/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 745/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 746/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 747/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 748/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 749/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 750/800, 
Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 751/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 752/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 753/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 754/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 755/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 756/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 757/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 758/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 759/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 760/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 761/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 762/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 763/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 764/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 765/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 766/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 767/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 768/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 769/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 770/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 771/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 772/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 773/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 774/800, Reward: 1.00: Epislon: 0.100 
-2022-10-30 01:45:05 - r - INFO: - Episode: 775/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 776/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 777/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 778/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 779/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 780/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 781/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 782/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 783/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 784/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 785/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 786/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 787/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 788/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 789/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 790/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 791/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 792/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 793/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 794/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 795/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 796/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 797/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 798/800, Reward: 0.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - 
INFO: - Episode: 799/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Episode: 800/800, Reward: 1.00: Epislon: 0.100 -2022-10-30 01:45:05 - r - INFO: - Finish training! diff --git a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/models/Qleaning_model.pkl b/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/models/Qleaning_model.pkl deleted file mode 100644 index 41a5a05..0000000 Binary files a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/models/Qleaning_model.pkl and /dev/null differ diff --git a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/results/learning_curve.png b/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/results/learning_curve.png deleted file mode 100644 index ad789b7..0000000 Binary files a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/results/res.csv b/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/results/res.csv deleted file mode 100644 index 335c1d8..0000000 --- a/projects/codes/QLearning/Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504/results/res.csv +++ /dev/null @@ -1,801 +0,0 @@ -episodes,rewards,steps -0,0.0,20 -1,0.0,14 -2,0.0,13 -3,0.0,9 -4,0.0,10 -5,0.0,6 -6,0.0,11 -7,0.0,6 -8,0.0,3 -9,0.0,9 -10,0.0,11 -11,0.0,22 -12,0.0,5 -13,0.0,16 -14,0.0,4 -15,0.0,9 -16,0.0,18 -17,0.0,2 -18,0.0,4 -19,0.0,8 -20,0.0,7 -21,0.0,4 -22,0.0,22 -23,0.0,15 -24,0.0,5 -25,0.0,16 -26,0.0,7 -27,0.0,19 -28,0.0,22 -29,0.0,16 -30,0.0,11 -31,0.0,22 -32,0.0,28 -33,0.0,23 -34,0.0,4 -35,0.0,11 -36,0.0,8 -37,0.0,15 -38,0.0,5 -39,0.0,7 -40,0.0,9 -41,0.0,4 -42,0.0,3 -43,0.0,6 -44,0.0,41 -45,0.0,9 -46,0.0,23 -47,0.0,3 -48,1.0,38 -49,0.0,29 -50,0.0,17 -51,0.0,4 -52,0.0,2 
-53,0.0,25 -54,0.0,6 -55,0.0,2 -56,0.0,30 -57,0.0,6 -58,0.0,7 -59,0.0,11 -60,0.0,9 -61,0.0,8 -62,0.0,23 -63,0.0,10 -64,0.0,3 -65,0.0,5 -66,0.0,7 -67,0.0,18 -68,0.0,8 -69,0.0,26 -70,0.0,6 -71,0.0,14 -72,0.0,4 -73,0.0,25 -74,0.0,21 -75,0.0,13 -76,0.0,4 -77,0.0,29 -78,0.0,21 -79,0.0,6 -80,0.0,6 -81,0.0,11 -82,0.0,21 -83,0.0,9 -84,0.0,9 -85,0.0,7 -86,0.0,48 -87,0.0,23 -88,0.0,160 -89,0.0,7 -90,0.0,10 -91,0.0,24 -92,0.0,4 -93,0.0,7 -94,0.0,17 -95,0.0,87 -96,0.0,28 -97,0.0,7 -98,0.0,5 -99,0.0,12 -100,0.0,14 -101,0.0,6 -102,0.0,13 -103,0.0,93 -104,0.0,4 -105,0.0,50 -106,0.0,8 -107,0.0,12 -108,0.0,43 -109,0.0,30 -110,0.0,15 -111,0.0,19 -112,0.0,182 -113,0.0,40 -114,0.0,88 -115,0.0,19 -116,0.0,30 -117,0.0,27 -118,0.0,5 -119,0.0,87 -120,0.0,9 -121,0.0,64 -122,0.0,27 -123,0.0,68 -124,0.0,81 -125,0.0,86 -126,0.0,200 -127,0.0,27 -128,0.0,41 -129,0.0,70 -130,0.0,27 -131,0.0,6 -132,0.0,18 -133,0.0,38 -134,0.0,26 -135,0.0,36 -136,0.0,3 -137,0.0,61 -138,0.0,105 -139,0.0,38 -140,0.0,18 -141,0.0,33 -142,0.0,29 -143,0.0,49 -144,0.0,88 -145,0.0,22 -146,0.0,65 -147,0.0,36 -148,0.0,30 -149,0.0,58 -150,0.0,43 -151,0.0,53 -152,0.0,43 -153,0.0,13 -154,0.0,8 -155,0.0,39 -156,0.0,29 -157,0.0,26 -158,0.0,60 -159,0.0,153 -160,0.0,116 -161,0.0,53 -162,0.0,54 -163,0.0,8 -164,0.0,58 -165,0.0,3 -166,0.0,47 -167,0.0,16 -168,0.0,21 -169,0.0,44 -170,0.0,29 -171,0.0,104 -172,0.0,158 -173,0.0,83 -174,0.0,26 -175,0.0,24 -176,0.0,10 -177,0.0,12 -178,0.0,40 -179,0.0,25 -180,0.0,18 -181,0.0,60 -182,0.0,200 -183,0.0,24 -184,0.0,56 -185,0.0,71 -186,0.0,19 -187,0.0,118 -188,0.0,26 -189,0.0,41 -190,0.0,41 -191,0.0,60 -192,0.0,31 -193,0.0,34 -194,0.0,35 -195,0.0,59 -196,0.0,51 -197,0.0,200 -198,0.0,200 -199,0.0,37 -200,0.0,68 -201,0.0,40 -202,0.0,17 -203,0.0,79 -204,0.0,126 -205,0.0,61 -206,0.0,25 -207,0.0,18 -208,0.0,27 -209,0.0,13 -210,0.0,187 -211,0.0,160 -212,0.0,32 -213,0.0,108 -214,0.0,164 -215,0.0,17 -216,0.0,82 -217,0.0,194 -218,0.0,7 -219,0.0,36 -220,0.0,156 -221,0.0,17 -222,0.0,183 -223,0.0,200 
-224,0.0,43 -225,0.0,87 -226,0.0,42 -227,0.0,80 -228,0.0,54 -229,0.0,82 -230,0.0,97 -231,0.0,65 -232,0.0,83 -233,0.0,159 -234,0.0,178 -235,0.0,104 -236,0.0,21 -237,0.0,118 -238,0.0,80 -239,0.0,170 -240,0.0,94 -241,0.0,200 -242,0.0,37 -243,0.0,11 -244,0.0,31 -245,0.0,134 -246,0.0,32 -247,0.0,58 -248,0.0,38 -249,0.0,28 -250,0.0,159 -251,0.0,182 -252,0.0,51 -253,0.0,25 -254,0.0,73 -255,0.0,56 -256,0.0,55 -257,0.0,38 -258,0.0,200 -259,0.0,92 -260,0.0,200 -261,0.0,119 -262,0.0,100 -263,0.0,84 -264,0.0,24 -265,0.0,17 -266,0.0,159 -267,0.0,25 -268,0.0,73 -269,0.0,130 -270,0.0,111 -271,0.0,65 -272,1.0,58 -273,0.0,47 -274,0.0,48 -275,0.0,13 -276,0.0,100 -277,0.0,38 -278,0.0,111 -279,0.0,200 -280,0.0,26 -281,0.0,38 -282,0.0,83 -283,0.0,42 -284,0.0,199 -285,0.0,83 -286,0.0,28 -287,0.0,46 -288,0.0,200 -289,0.0,62 -290,0.0,123 -291,0.0,91 -292,0.0,53 -293,0.0,19 -294,0.0,26 -295,0.0,93 -296,0.0,38 -297,0.0,22 -298,0.0,43 -299,0.0,163 -300,0.0,25 -301,0.0,59 -302,0.0,71 -303,0.0,20 -304,0.0,115 -305,0.0,200 -306,0.0,48 -307,0.0,66 -308,0.0,58 -309,0.0,129 -310,0.0,122 -311,0.0,47 -312,0.0,60 -313,0.0,79 -314,1.0,137 -315,0.0,27 -316,1.0,93 -317,0.0,46 -318,1.0,83 -319,1.0,8 -320,1.0,6 -321,1.0,6 -322,0.0,4 -323,1.0,6 -324,0.0,2 -325,1.0,6 -326,1.0,6 -327,1.0,6 -328,1.0,6 -329,1.0,8 -330,0.0,5 -331,1.0,6 -332,1.0,7 -333,0.0,5 -334,1.0,6 -335,1.0,6 -336,1.0,8 -337,1.0,6 -338,1.0,6 -339,1.0,6 -340,1.0,7 -341,1.0,6 -342,1.0,6 -343,0.0,3 -344,1.0,7 -345,0.0,4 -346,1.0,6 -347,1.0,6 -348,1.0,7 -349,1.0,6 -350,1.0,6 -351,1.0,7 -352,1.0,7 -353,1.0,7 -354,1.0,6 -355,1.0,6 -356,1.0,6 -357,1.0,6 -358,1.0,6 -359,1.0,6 -360,1.0,6 -361,1.0,7 -362,0.0,4 -363,1.0,8 -364,1.0,8 -365,1.0,7 -366,1.0,6 -367,1.0,8 -368,1.0,6 -369,1.0,6 -370,1.0,7 -371,1.0,6 -372,1.0,6 -373,1.0,8 -374,1.0,7 -375,1.0,6 -376,1.0,6 -377,0.0,3 -378,1.0,11 -379,1.0,6 -380,1.0,8 -381,0.0,2 -382,1.0,6 -383,1.0,6 -384,1.0,6 -385,1.0,6 -386,1.0,8 -387,1.0,6 -388,1.0,7 -389,1.0,6 -390,1.0,7 -391,1.0,6 -392,1.0,8 -393,0.0,2 
-394,1.0,6 -395,1.0,7 -396,1.0,6 -397,1.0,6 -398,1.0,10 -399,1.0,7 -400,1.0,6 -401,1.0,6 -402,1.0,6 -403,1.0,6 -404,1.0,6 -405,1.0,7 -406,0.0,4 -407,1.0,7 -408,1.0,6 -409,1.0,8 -410,0.0,3 -411,1.0,6 -412,1.0,6 -413,1.0,6 -414,1.0,6 -415,0.0,2 -416,1.0,6 -417,1.0,6 -418,1.0,6 -419,1.0,6 -420,1.0,6 -421,1.0,7 -422,1.0,6 -423,1.0,6 -424,1.0,7 -425,1.0,6 -426,1.0,6 -427,1.0,6 -428,1.0,6 -429,1.0,6 -430,1.0,6 -431,1.0,6 -432,1.0,8 -433,1.0,6 -434,1.0,8 -435,1.0,7 -436,1.0,6 -437,0.0,3 -438,1.0,6 -439,1.0,7 -440,1.0,6 -441,1.0,6 -442,1.0,6 -443,1.0,10 -444,1.0,6 -445,1.0,6 -446,1.0,6 -447,1.0,6 -448,1.0,10 -449,1.0,6 -450,1.0,8 -451,1.0,8 -452,1.0,7 -453,1.0,6 -454,0.0,5 -455,0.0,2 -456,1.0,8 -457,1.0,6 -458,1.0,10 -459,1.0,6 -460,1.0,8 -461,1.0,10 -462,1.0,6 -463,1.0,6 -464,1.0,6 -465,1.0,10 -466,1.0,6 -467,0.0,4 -468,1.0,6 -469,1.0,6 -470,1.0,6 -471,1.0,15 -472,1.0,6 -473,1.0,6 -474,1.0,6 -475,1.0,6 -476,1.0,6 -477,1.0,6 -478,1.0,8 -479,1.0,6 -480,1.0,7 -481,1.0,6 -482,1.0,6 -483,1.0,8 -484,1.0,6 -485,1.0,6 -486,1.0,8 -487,1.0,8 -488,1.0,6 -489,1.0,6 -490,1.0,6 -491,1.0,10 -492,1.0,6 -493,1.0,6 -494,1.0,6 -495,1.0,6 -496,1.0,6 -497,1.0,6 -498,1.0,6 -499,1.0,8 -500,1.0,8 -501,1.0,6 -502,1.0,6 -503,0.0,2 -504,1.0,6 -505,1.0,6 -506,1.0,6 -507,1.0,8 -508,1.0,6 -509,1.0,6 -510,1.0,6 -511,1.0,6 -512,1.0,6 -513,1.0,6 -514,1.0,6 -515,1.0,6 -516,1.0,6 -517,1.0,7 -518,0.0,3 -519,1.0,7 -520,1.0,6 -521,1.0,6 -522,1.0,6 -523,0.0,2 -524,1.0,6 -525,1.0,8 -526,1.0,6 -527,1.0,6 -528,1.0,6 -529,1.0,6 -530,1.0,9 -531,1.0,6 -532,1.0,6 -533,1.0,6 -534,1.0,6 -535,1.0,6 -536,1.0,6 -537,1.0,9 -538,1.0,7 -539,0.0,4 -540,1.0,6 -541,1.0,8 -542,1.0,11 -543,1.0,6 -544,1.0,6 -545,1.0,6 -546,1.0,6 -547,1.0,6 -548,1.0,8 -549,1.0,6 -550,1.0,6 -551,1.0,8 -552,1.0,7 -553,1.0,6 -554,1.0,8 -555,1.0,6 -556,0.0,5 -557,1.0,9 -558,1.0,8 -559,1.0,8 -560,1.0,6 -561,1.0,8 -562,1.0,8 -563,1.0,6 -564,0.0,5 -565,0.0,3 -566,0.0,2 -567,1.0,8 -568,1.0,6 -569,1.0,6 -570,1.0,6 -571,1.0,6 -572,1.0,6 -573,1.0,6 -574,1.0,6 
-575,1.0,6 -576,1.0,6 -577,1.0,6 -578,1.0,6 -579,1.0,6 -580,1.0,6 -581,1.0,6 -582,0.0,2 -583,1.0,6 -584,0.0,4 -585,1.0,6 -586,1.0,6 -587,1.0,6 -588,1.0,6 -589,1.0,6 -590,1.0,8 -591,0.0,5 -592,1.0,6 -593,1.0,6 -594,1.0,6 -595,1.0,6 -596,1.0,6 -597,1.0,6 -598,0.0,3 -599,1.0,6 -600,1.0,6 -601,1.0,6 -602,0.0,2 -603,1.0,6 -604,0.0,4 -605,1.0,6 -606,1.0,6 -607,1.0,6 -608,1.0,6 -609,1.0,8 -610,1.0,6 -611,1.0,7 -612,1.0,6 -613,1.0,7 -614,1.0,6 -615,0.0,2 -616,1.0,6 -617,1.0,6 -618,0.0,5 -619,0.0,3 -620,0.0,3 -621,1.0,6 -622,0.0,5 -623,1.0,8 -624,1.0,8 -625,1.0,6 -626,1.0,6 -627,1.0,7 -628,1.0,6 -629,1.0,6 -630,1.0,6 -631,1.0,6 -632,1.0,6 -633,1.0,8 -634,0.0,2 -635,1.0,6 -636,1.0,6 -637,1.0,6 -638,1.0,6 -639,1.0,6 -640,1.0,6 -641,1.0,6 -642,1.0,8 -643,1.0,6 -644,1.0,8 -645,1.0,6 -646,1.0,6 -647,1.0,8 -648,1.0,8 -649,0.0,5 -650,0.0,4 -651,0.0,4 -652,1.0,6 -653,1.0,6 -654,1.0,6 -655,1.0,6 -656,1.0,8 -657,1.0,6 -658,0.0,4 -659,1.0,6 -660,1.0,8 -661,1.0,6 -662,1.0,6 -663,1.0,6 -664,1.0,6 -665,1.0,6 -666,1.0,6 -667,1.0,6 -668,1.0,8 -669,1.0,8 -670,1.0,6 -671,1.0,8 -672,1.0,9 -673,1.0,6 -674,1.0,6 -675,1.0,6 -676,1.0,6 -677,1.0,10 -678,1.0,6 -679,1.0,6 -680,1.0,6 -681,1.0,11 -682,1.0,10 -683,1.0,8 -684,1.0,6 -685,1.0,6 -686,1.0,6 -687,0.0,5 -688,1.0,6 -689,0.0,2 -690,1.0,9 -691,1.0,6 -692,1.0,8 -693,1.0,7 -694,1.0,6 -695,1.0,6 -696,1.0,7 -697,0.0,3 -698,1.0,7 -699,0.0,2 -700,1.0,6 -701,1.0,6 -702,1.0,8 -703,1.0,8 -704,1.0,6 -705,1.0,6 -706,0.0,2 -707,1.0,8 -708,1.0,6 -709,1.0,8 -710,1.0,6 -711,1.0,6 -712,1.0,9 -713,1.0,6 -714,1.0,8 -715,1.0,11 -716,1.0,6 -717,1.0,6 -718,1.0,6 -719,1.0,6 -720,1.0,8 -721,1.0,6 -722,1.0,6 -723,1.0,6 -724,0.0,5 -725,1.0,6 -726,1.0,6 -727,1.0,6 -728,1.0,6 -729,1.0,6 -730,1.0,7 -731,1.0,6 -732,1.0,6 -733,1.0,6 -734,1.0,6 -735,1.0,10 -736,1.0,6 -737,1.0,6 -738,1.0,6 -739,1.0,6 -740,1.0,6 -741,1.0,7 -742,1.0,6 -743,1.0,8 -744,1.0,7 -745,1.0,6 -746,1.0,6 -747,1.0,14 -748,1.0,6 -749,1.0,6 -750,1.0,12 -751,1.0,6 -752,1.0,6 -753,1.0,6 -754,1.0,6 -755,1.0,6 
-756,1.0,6 -757,0.0,3 -758,1.0,6 -759,1.0,6 -760,1.0,6 -761,1.0,7 -762,1.0,6 -763,1.0,6 -764,1.0,6 -765,1.0,8 -766,0.0,2 -767,1.0,6 -768,1.0,6 -769,1.0,6 -770,1.0,6 -771,1.0,6 -772,1.0,6 -773,1.0,6 -774,1.0,6 -775,1.0,6 -776,0.0,4 -777,1.0,8 -778,1.0,6 -779,0.0,2 -780,1.0,10 -781,1.0,8 -782,1.0,6 -783,1.0,6 -784,1.0,6 -785,0.0,3 -786,1.0,6 -787,1.0,6 -788,0.0,6 -789,1.0,8 -790,1.0,6 -791,1.0,9 -792,1.0,6 -793,1.0,6 -794,1.0,8 -795,1.0,8 -796,1.0,6 -797,0.0,5 -798,1.0,6 -799,1.0,6 diff --git a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/config.yaml b/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/config.yaml deleted file mode 100644 index d5b9c4c..0000000 --- a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/config.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: Racetrack-v0 - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - mode: train - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.9 - lr: 0.1 diff --git a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/logs/log.txt b/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/logs/log.txt deleted file mode 100644 index e737550..0000000 --- a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/logs/log.txt +++ /dev/null @@ -1,404 +0,0 @@ -2022-10-30 01:48:33 - r - INFO: - n_states: 4, n_actions: 9 -2022-10-30 01:48:33 - r - INFO: - Start training! 
-2022-10-30 01:48:33 - r - INFO: - Env: Racetrack-v0, Algorithm: QLearning, Device: cpu -2022-10-30 01:48:33 - r - INFO: - Episode: 1/400, Reward: -850.00: Epislon: 0.493 -2022-10-30 01:48:33 - r - INFO: - Episode: 2/400, Reward: -780.00: Epislon: 0.258 -2022-10-30 01:48:33 - r - INFO: - Episode: 3/400, Reward: -730.00: Epislon: 0.137 -2022-10-30 01:48:33 - r - INFO: - Episode: 4/400, Reward: -650.00: Epislon: 0.075 -2022-10-30 01:48:33 - r - INFO: - Episode: 5/400, Reward: -540.00: Epislon: 0.044 -2022-10-30 01:48:33 - r - INFO: - Episode: 6/400, Reward: -640.00: Epislon: 0.027 -2022-10-30 01:48:34 - r - INFO: - Episode: 7/400, Reward: -570.00: Epislon: 0.019 -2022-10-30 01:48:34 - r - INFO: - Episode: 8/400, Reward: -570.00: Epislon: 0.015 -2022-10-30 01:48:34 - r - INFO: - Episode: 9/400, Reward: -550.00: Epislon: 0.012 -2022-10-30 01:48:34 - r - INFO: - Episode: 10/400, Reward: -550.00: Epislon: 0.011 -2022-10-30 01:48:34 - r - INFO: - Episode: 11/400, Reward: -580.00: Epislon: 0.011 -2022-10-30 01:48:34 - r - INFO: - Episode: 12/400, Reward: -530.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 13/400, Reward: -580.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 14/400, Reward: -570.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 15/400, Reward: -550.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 16/400, Reward: -560.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 17/400, Reward: -550.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 18/400, Reward: -580.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 19/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 20/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 21/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 22/400, Reward: -540.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 23/400, Reward: -550.00: 
Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 24/400, Reward: -560.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 25/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 26/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 27/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 28/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 29/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 30/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 31/400, Reward: -540.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 32/400, Reward: -540.00: Epislon: 0.010 -2022-10-30 01:48:34 - r - INFO: - Episode: 33/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 34/400, Reward: -540.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 35/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 36/400, Reward: -530.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 37/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 38/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 39/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 40/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 41/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 42/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 43/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 44/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 45/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 46/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 47/400, 
Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 48/400, Reward: -530.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 49/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 50/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 51/400, Reward: -500.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 52/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 53/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 54/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 55/400, Reward: -500.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 56/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 57/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 58/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 59/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 60/400, Reward: -530.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 61/400, Reward: -440.00: Epislon: 0.010 -2022-10-30 01:48:35 - r - INFO: - Episode: 62/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 63/400, Reward: -520.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 64/400, Reward: -510.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 65/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 66/400, Reward: -344.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 67/400, Reward: -500.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 68/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 69/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 70/400, Reward: -440.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - 
Episode: 71/400, Reward: -77.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 72/400, Reward: -198.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 73/400, Reward: -440.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 74/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 75/400, Reward: -354.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 76/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 77/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 78/400, Reward: -38.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 79/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 80/400, Reward: -480.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 81/400, Reward: -490.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 82/400, Reward: -140.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 83/400, Reward: -102.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 84/400, Reward: -265.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 85/400, Reward: -145.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 86/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 87/400, Reward: -500.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 88/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 89/400, Reward: -325.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 90/400, Reward: -470.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 91/400, Reward: -376.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 92/400, Reward: -98.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 93/400, Reward: -130.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 94/400, Reward: -450.00: Epislon: 0.010 -2022-10-30 01:48:36 
- r - INFO: - Episode: 95/400, Reward: -146.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 96/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 97/400, Reward: -18.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 98/400, Reward: -102.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 99/400, Reward: -163.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 100/400, Reward: -209.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 101/400, Reward: -460.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 102/400, Reward: -286.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 103/400, Reward: -189.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 104/400, Reward: -50.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 105/400, Reward: -398.00: Epislon: 0.010 -2022-10-30 01:48:36 - r - INFO: - Episode: 106/400, Reward: -72.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 107/400, Reward: -450.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 108/400, Reward: -125.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 109/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 110/400, Reward: -161.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 111/400, Reward: -408.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 112/400, Reward: -440.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 113/400, Reward: -188.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 114/400, Reward: -114.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 115/400, Reward: -415.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 116/400, Reward: -159.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 117/400, Reward: -234.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 118/400, Reward: -31.00: Epislon: 
0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 119/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 120/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 121/400, Reward: -63.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 122/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 123/400, Reward: -47.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 124/400, Reward: -16.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 125/400, Reward: -49.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 126/400, Reward: -87.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 127/400, Reward: -2.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 128/400, Reward: -26.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 129/400, Reward: -238.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 130/400, Reward: -18.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 131/400, Reward: -235.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 132/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 133/400, Reward: -135.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 134/400, Reward: -20.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 135/400, Reward: -46.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 136/400, Reward: -66.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 137/400, Reward: -45.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 138/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 139/400, Reward: 1.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 140/400, Reward: -106.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 141/400, Reward: -112.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 142/400, Reward: 
-47.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 143/400, Reward: 1.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 144/400, Reward: -30.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 145/400, Reward: -147.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 146/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 147/400, Reward: -30.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 148/400, Reward: -167.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 149/400, Reward: 1.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 150/400, Reward: -72.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 151/400, Reward: -44.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 152/400, Reward: -76.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 153/400, Reward: -63.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 154/400, Reward: -34.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 155/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 156/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 157/400, Reward: -26.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 158/400, Reward: -80.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 159/400, Reward: -168.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 160/400, Reward: -164.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 161/400, Reward: 1.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 162/400, Reward: -19.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 163/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 164/400, Reward: -44.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 165/400, Reward: -80.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 166/400, 
Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 167/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 168/400, Reward: -29.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 169/400, Reward: -56.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 170/400, Reward: -47.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 171/400, Reward: -76.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 172/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 173/400, Reward: -145.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 174/400, Reward: -28.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 175/400, Reward: -63.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 176/400, Reward: -106.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 177/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 178/400, Reward: -28.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 179/400, Reward: -60.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 180/400, Reward: -49.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 181/400, Reward: -52.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 182/400, Reward: -84.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 183/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 184/400, Reward: -55.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 185/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 186/400, Reward: 1.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 187/400, Reward: -39.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 188/400, Reward: -47.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 189/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 
190/400, Reward: -53.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 191/400, Reward: -50.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 192/400, Reward: -104.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 193/400, Reward: -253.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 194/400, Reward: -48.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 195/400, Reward: -190.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 196/400, Reward: -43.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 197/400, Reward: -35.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 198/400, Reward: 0.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 199/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 200/400, Reward: -11.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 201/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 202/400, Reward: -16.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 203/400, Reward: -99.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 204/400, Reward: -22.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 205/400, Reward: -170.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 206/400, Reward: -109.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 207/400, Reward: -48.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 208/400, Reward: -275.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 209/400, Reward: -49.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 210/400, Reward: -147.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 211/400, Reward: -51.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 212/400, Reward: -67.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 213/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - 
INFO: - Episode: 214/400, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 215/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 216/400, Reward: -69.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 217/400, Reward: -218.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 218/400, Reward: -63.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 219/400, Reward: -11.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 220/400, Reward: -34.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 221/400, Reward: -32.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 222/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 223/400, Reward: -26.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 224/400, Reward: -19.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 225/400, Reward: -148.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 226/400, Reward: -19.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 227/400, Reward: 1.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 228/400, Reward: -49.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 229/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 230/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 231/400, Reward: -223.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 232/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 233/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 234/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 235/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 236/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 237/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r 
- INFO: - Episode: 238/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 239/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 240/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 241/400, Reward: -44.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 242/400, Reward: -10.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 243/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 244/400, Reward: -108.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 245/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 246/400, Reward: -27.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 247/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 248/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 249/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 250/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 251/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 252/400, Reward: -28.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 253/400, Reward: -112.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 254/400, Reward: -39.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 255/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 256/400, Reward: -48.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 257/400, Reward: -149.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 258/400, Reward: -27.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 259/400, Reward: -33.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 260/400, Reward: -30.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 261/400, Reward: -29.00: Epislon: 0.010 -2022-10-30 01:48:37 - r 
- INFO: - Episode: 262/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 263/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 264/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 265/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 266/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 267/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 268/400, Reward: -52.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 269/400, Reward: -53.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 270/400, Reward: -62.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 271/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 272/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 273/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 274/400, Reward: -10.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 275/400, Reward: -8.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 276/400, Reward: -30.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 277/400, Reward: -25.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 278/400, Reward: -45.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 279/400, Reward: -48.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 280/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 281/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 282/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 283/400, Reward: -26.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 284/400, Reward: -116.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 285/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - 
INFO: - Episode: 286/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 287/400, Reward: -42.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 288/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 289/400, Reward: -31.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 290/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 291/400, Reward: -25.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 292/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 293/400, Reward: -43.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 294/400, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 295/400, Reward: -33.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 296/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 297/400, Reward: -28.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 298/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 299/400, Reward: 0.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 300/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 301/400, Reward: -6.00: Epislon: 0.010 -2022-10-30 01:48:37 - r - INFO: - Episode: 302/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 303/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 304/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 305/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 306/400, Reward: -77.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 307/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 308/400, Reward: -32.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 309/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: 
- Episode: 310/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 311/400, Reward: -36.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 312/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 313/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 314/400, Reward: -34.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 315/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 316/400, Reward: -21.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 317/400, Reward: -48.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 318/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 319/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 320/400, Reward: -25.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 321/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 322/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 323/400, Reward: -135.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 324/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 325/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 326/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 327/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 328/400, Reward: -11.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 329/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 330/400, Reward: -11.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 331/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 332/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 333/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 
334/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 335/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 336/400, Reward: -22.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 337/400, Reward: -16.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 338/400, Reward: -17.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 339/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 340/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 341/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 342/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 343/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 344/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 345/400, Reward: -90.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 346/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 347/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 348/400, Reward: -53.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 349/400, Reward: -87.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 350/400, Reward: -22.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 351/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 352/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 353/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 354/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 355/400, Reward: -113.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 356/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 357/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 
358/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 359/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 360/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 361/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 362/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 363/400, Reward: -63.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 364/400, Reward: -14.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 365/400, Reward: -15.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 366/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 367/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 368/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 369/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 370/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 371/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 372/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 373/400, Reward: -12.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 374/400, Reward: -30.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 375/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 376/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 377/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 378/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 379/400, Reward: -31.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 380/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 381/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 
382/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 383/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 384/400, Reward: -84.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 385/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 386/400, Reward: -27.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 387/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 388/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 389/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 390/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 391/400, Reward: 2.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 392/400, Reward: 3.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 393/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 394/400, Reward: 4.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 395/400, Reward: -18.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 396/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 397/400, Reward: -41.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 398/400, Reward: 5.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 399/400, Reward: -41.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Episode: 400/400, Reward: -13.00: Epislon: 0.010 -2022-10-30 01:48:38 - r - INFO: - Finish training! 
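The exploration rate printed in the deleted training log (labelled "Epislon" there) can be reproduced from the deleted `config.yaml` (`epsilon_start: 0.95`, `epsilon_end: 0.01`, `epsilon_decay: 300`, `max_steps: 200`). This assumes an exponential schedule that decays once per action selection. The repository's agent code is not part of this diff, so treat the sketch below as a reconstruction that fits the logged numbers, not as the project's implementation. The function name `epsilon_at` is hypothetical. The first eight episodes each ran the full 200 steps (see `res.csv`), so after episode *k* the agent has made `200 * k` action selections.

```python
import math


def epsilon_at(sample_count: int,
               start: float = 0.95,   # epsilon_start in config.yaml
               end: float = 0.01,     # epsilon_end in config.yaml
               decay: float = 300.0,  # epsilon_decay in config.yaml
               ) -> float:
    """Hypothetical reconstruction: exponentially decayed epsilon after
    `sample_count` action selections (assumed schedule, not the repo's code)."""
    return end + (start - end) * math.exp(-sample_count / decay)


if __name__ == "__main__":
    max_steps = 200  # max_steps in config.yaml; episodes 1-8 all hit this cap
    for episode in range(1, 9):
        eps = epsilon_at(episode * max_steps)
        print(f"Episode {episode}: epsilon = {eps:.3f}")
```

Under this assumption the printed values are 0.493, 0.258, 0.137, 0.075, 0.044, 0.027, 0.019 and 0.015, which match the first eight log entries. The schedule also explains why epsilon sits at about 0.010 from episode 12 onward. With a decay constant of only 300 steps, exploration is essentially finished after about 2,000 steps, long before the reward curve starts improving around episode 70.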
diff --git a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/models/Qleaning_model.pkl b/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/models/Qleaning_model.pkl deleted file mode 100644 index 1f458e1..0000000 Binary files a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/models/Qleaning_model.pkl and /dev/null differ diff --git a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/results/learning_curve.png b/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/results/learning_curve.png deleted file mode 100644 index 8c1c331..0000000 Binary files a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/results/res.csv b/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/results/res.csv deleted file mode 100644 index 79373d8..0000000 --- a/projects/codes/QLearning/Train_Racetrack-v0_QLearning_20221030-014833/results/res.csv +++ /dev/null @@ -1,401 +0,0 @@ -episodes,rewards,steps -0,-850,200 -1,-780,200 -2,-730,200 -3,-650,200 -4,-540,200 -5,-640,200 -6,-570,200 -7,-570,200 -8,-550,200 -9,-550,200 -10,-580,200 -11,-530,200 -12,-580,200 -13,-570,200 -14,-550,200 -15,-560,200 -16,-550,200 -17,-580,200 -18,-520,200 -19,-490,200 -20,-480,200 -21,-540,200 -22,-550,200 -23,-560,200 -24,-510,200 -25,-520,200 -26,-480,200 -27,-520,200 -28,-480,200 -29,-470,200 -30,-540,200 -31,-540,200 -32,-470,200 -33,-540,200 -34,-490,200 -35,-530,200 -36,-520,200 -37,-510,200 -38,-520,200 -39,-510,200 -40,-480,200 -41,-510,200 -42,-470,200 -43,-490,200 -44,-490,200 -45,-490,200 -46,-520,200 -47,-530,200 -48,-510,200 -49,-460,200 -50,-500,200 -51,-470,200 -52,-520,200 -53,-490,200 -54,-500,200 -55,-460,200 -56,-490,200 -57,-510,200 -58,-460,200 -59,-530,200 -60,-440,200 -61,-510,200 -62,-520,200 -63,-510,200 -64,-460,200 
-65,-344,154 -66,-500,200 -67,-490,200 -68,-490,200 -69,-440,200 -70,-77,47 -71,-198,88 -72,-440,200 -73,-480,200 -74,-354,154 -75,-470,200 -76,-480,200 -77,-38,28 -78,-460,200 -79,-480,200 -80,-490,200 -81,-140,70 -82,-102,52 -83,-265,125 -84,-145,75 -85,-460,200 -86,-500,200 -87,-470,200 -88,-325,155 -89,-470,200 -90,-376,156 -91,-98,58 -92,-130,70 -93,-450,200 -94,-146,66 -95,2,8 -96,-18,18 -97,-102,52 -98,-163,73 -99,-209,89 -100,-460,200 -101,-286,126 -102,-189,89 -103,-50,30 -104,-398,168 -105,-72,32 -106,-450,200 -107,-125,65 -108,4,6 -109,-161,71 -110,-408,178 -111,-440,200 -112,-188,78 -113,-114,64 -114,-415,185 -115,-159,69 -116,-234,104 -117,-31,21 -118,3,7 -119,4,6 -120,-63,33 -121,5,5 -122,-47,27 -123,-16,16 -124,-49,29 -125,-87,47 -126,-2,12 -127,-26,16 -128,-238,108 -129,-18,18 -130,-235,105 -131,-13,13 -132,-135,65 -133,-20,20 -134,-46,26 -135,-66,36 -136,-45,25 -137,-14,14 -138,1,9 -139,-106,56 -140,-112,62 -141,-47,27 -142,1,9 -143,-30,20 -144,-147,77 -145,5,5 -146,-30,20 -147,-167,77 -148,1,9 -149,-72,32 -150,-44,24 -151,-76,46 -152,-63,33 -153,-34,24 -154,5,5 -155,5,5 -156,-26,16 -157,-80,40 -158,-168,78 -159,-164,74 -160,1,9 -161,-19,19 -162,-12,12 -163,-44,24 -164,-80,40 -165,5,5 -166,4,6 -167,-29,19 -168,-56,26 -169,-47,27 -170,-76,46 -171,-13,13 -172,-145,65 -173,-28,18 -174,-63,33 -175,-106,56 -176,3,7 -177,-28,28 -178,-60,30 -179,-49,29 -180,-52,32 -181,-84,44 -182,5,5 -183,-55,35 -184,-14,14 -185,1,9 -186,-39,19 -187,-47,27 -188,-13,13 -189,-53,33 -190,-50,30 -191,-104,54 -192,-253,113 -193,-48,28 -194,-190,90 -195,-43,23 -196,-35,25 -197,0,10 -198,5,5 -199,-11,11 -200,5,5 -201,-16,16 -202,-99,49 -203,-22,22 -204,-170,80 -205,-109,59 -206,-48,28 -207,-275,115 -208,-49,29 -209,-147,77 -210,-51,31 -211,-67,37 -212,4,6 -213,-17,17 -214,3,7 -215,-69,39 -216,-218,88 -217,-63,33 -218,-11,11 -219,-34,24 -220,-32,22 -221,-15,15 -222,-26,16 -223,-19,19 -224,-148,78 -225,-19,19 -226,1,9 -227,-49,29 -228,5,5 -229,3,7 -230,-223,103 -231,-14,14 
-232,4,6 -233,5,5 -234,2,8 -235,5,5 -236,4,6 -237,3,7 -238,3,7 -239,4,6 -240,-44,24 -241,-10,10 -242,2,8 -243,-108,58 -244,4,6 -245,-27,17 -246,3,7 -247,5,5 -248,5,5 -249,3,7 -250,-15,15 -251,-28,28 -252,-112,52 -253,-39,29 -254,4,6 -255,-48,28 -256,-149,69 -257,-27,17 -258,-33,23 -259,-30,20 -260,-29,19 -261,4,6 -262,4,6 -263,3,7 -264,3,7 -265,4,6 -266,5,5 -267,-52,42 -268,-53,33 -269,-62,42 -270,5,5 -271,4,6 -272,4,6 -273,-10,10 -274,-8,8 -275,-30,20 -276,-25,15 -277,-45,35 -278,-48,28 -279,-15,15 -280,4,6 -281,-14,14 -282,-26,16 -283,-116,56 -284,5,5 -285,-14,14 -286,-42,22 -287,3,7 -288,-31,21 -289,4,6 -290,-25,25 -291,5,5 -292,-43,23 -293,-21,21 -294,-33,23 -295,-12,12 -296,-28,18 -297,3,7 -298,0,10 -299,4,6 -300,-6,16 -301,4,6 -302,2,8 -303,-12,12 -304,4,6 -305,-77,47 -306,5,5 -307,-32,22 -308,5,5 -309,-12,12 -310,-36,26 -311,4,6 -312,4,6 -313,-34,24 -314,4,6 -315,-21,21 -316,-48,28 -317,4,6 -318,5,5 -319,-25,15 -320,4,6 -321,-14,14 -322,-135,65 -323,3,7 -324,5,5 -325,4,6 -326,3,7 -327,-11,11 -328,3,7 -329,-11,11 -330,3,7 -331,4,6 -332,3,7 -333,5,5 -334,-12,12 -335,-22,22 -336,-16,16 -337,-17,17 -338,-14,14 -339,3,7 -340,5,5 -341,-15,15 -342,-13,13 -343,4,6 -344,-90,40 -345,3,7 -346,3,7 -347,-53,33 -348,-87,47 -349,-22,22 -350,5,5 -351,-12,12 -352,3,7 -353,4,6 -354,-113,53 -355,3,7 -356,3,7 -357,-13,13 -358,-15,15 -359,-14,14 -360,2,8 -361,-15,15 -362,-63,33 -363,-14,14 -364,-15,15 -365,3,7 -366,3,7 -367,4,6 -368,4,6 -369,4,6 -370,3,7 -371,-13,13 -372,-12,12 -373,-30,20 -374,3,7 -375,2,8 -376,-13,13 -377,5,5 -378,-31,21 -379,3,7 -380,2,8 -381,4,6 -382,4,6 -383,-84,44 -384,3,7 -385,-27,17 -386,4,6 -387,4,6 -388,4,6 -389,2,8 -390,2,8 -391,3,7 -392,4,6 -393,4,6 -394,-18,18 -395,-13,13 -396,-41,31 -397,5,5 -398,-41,31 -399,-13,13 diff --git a/projects/codes/QLearning/config/CliffWalking-v0_QLearning_Test.yaml b/projects/codes/QLearning/config/CliffWalking-v0_QLearning_Test.yaml deleted file mode 100644 index d1a9903..0000000 --- 
a/projects/codes/QLearning/config/CliffWalking-v0_QLearning_Test.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: CliffWalking-v0 - mode: test - load_checkpoint: true - load_path: Train_CliffWalking-v0_QLearning_20221030-013856 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.1 diff --git a/projects/codes/QLearning/config/CliffWalking-v0_QLearning_Train.yaml b/projects/codes/QLearning/config/CliffWalking-v0_QLearning_Train.yaml deleted file mode 100644 index 332f3ab..0000000 --- a/projects/codes/QLearning/config/CliffWalking-v0_QLearning_Train.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: CliffWalking-v0 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 800 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.1 diff --git a/projects/codes/QLearning/config/FrozenLakeNoSlippery-v1_QLearning_Test.yaml b/projects/codes/QLearning/config/FrozenLakeNoSlippery-v1_QLearning_Test.yaml deleted file mode 100644 index 089e391..0000000 --- a/projects/codes/QLearning/config/FrozenLakeNoSlippery-v1_QLearning_Test.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: FrozenLakeNoSlippery-v1 - mode: test - load_checkpoint: true - load_path: Train_FrozenLakeNoSlippery-v1_QLearning_20221030-014504 - max_steps: 200 - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 800 -algo_cfg: - epsilon_decay: 2000 - epsilon_end: 0.1 - epsilon_start: 0.7 - gamma: 0.95 - lr: 0.9 diff --git a/projects/codes/QLearning/config/FrozenLakeNoSlippery-v1_QLearning_Train.yaml 
b/projects/codes/QLearning/config/FrozenLakeNoSlippery-v1_QLearning_Train.yaml deleted file mode 100644 index 760750a..0000000 --- a/projects/codes/QLearning/config/FrozenLakeNoSlippery-v1_QLearning_Train.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: FrozenLakeNoSlippery-v1 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 800 -algo_cfg: - epsilon_decay: 2000 - epsilon_end: 0.1 - epsilon_start: 0.7 - gamma: 0.95 - lr: 0.9 diff --git a/projects/codes/QLearning/config/Racetrack-v0_QLearning_Test.yaml b/projects/codes/QLearning/config/Racetrack-v0_QLearning_Test.yaml deleted file mode 100644 index 3aa9985..0000000 --- a/projects/codes/QLearning/config/Racetrack-v0_QLearning_Test.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: Racetrack-v0 - mode: test - load_checkpoint: true - load_path: Train_Racetrack-v0_QLearning_20221030-014833 - max_steps: 200 - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.9 - lr: 0.1 diff --git a/projects/codes/QLearning/config/Racetrack-v0_QLearning_Train.yaml b/projects/codes/QLearning/config/Racetrack-v0_QLearning_Train.yaml deleted file mode 100644 index 63e51c3..0000000 --- a/projects/codes/QLearning/config/Racetrack-v0_QLearning_Train.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: QLearning - device: cpu - env_name: Racetrack-v0 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.9 - lr: 0.1 diff --git a/projects/codes/QLearning/config/config.py 
b/projects/codes/QLearning/config/config.py deleted file mode 100644 index e0ed62a..0000000 --- a/projects/codes/QLearning/config/config.py +++ /dev/null @@ -1,35 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-30 01:23:07 -LastEditor: JiangJi -LastEditTime: 2022-10-30 01:39:54 -Discription: default parameters of QLearning -''' -from common.config import GeneralConfig,AlgoConfig - -class GeneralConfigQLearning(GeneralConfig): - def __init__(self) -> None: - self.env_name = "CliffWalking-v0" # name of environment - self.algo_name = "QLearning" # name of algorithm - self.mode = "train" # train or test - self.seed = 1 # random seed - self.device = "cpu" # device to use - self.train_eps = 400 # number of episodes for training - self.test_eps = 20 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = "tasks" # path to load model - self.show_fig = False # show figure or not - self.save_fig = True # save figure or not - -class AlgoConfigQLearning(AlgoConfig): - def __init__(self) -> None: - # set epsilon_start=epsilon_end can obtain fixed epsilon=epsilon_end - self.epsilon_start = 0.95 # epsilon start value - self.epsilon_end = 0.01 # epsilon end value - self.epsilon_decay = 300 # epsilon decay rate - self.gamma = 0.90 # discount factor - self.lr = 0.1 # learning rate \ No newline at end of file diff --git a/projects/codes/QLearning/qlearning.py b/projects/codes/QLearning/qlearning.py deleted file mode 100644 index 48dfa37..0000000 --- a/projects/codes/QLearning/qlearning.py +++ /dev/null @@ -1,66 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2020-09-11 23:03:00 -LastEditor: John -LastEditTime: 2022-10-30 01:38:26 -Discription: use defaultdict to define Q table -Environment: -''' -import numpy as np -import math -import torch -from collections import defaultdict - -class 
QLearning(object): - def __init__(self,cfg): - self.n_actions = cfg.n_actions - self.lr = cfg.lr - self.gamma = cfg.gamma - self.epsilon = cfg.epsilon_start - self.sample_count = 0 - self.epsilon_start = cfg.epsilon_start - self.epsilon_end = cfg.epsilon_end - self.epsilon_decay = cfg.epsilon_decay - self.Q_table = defaultdict(lambda: np.zeros(self.n_actions)) # use nested dictionary to represent Q(s,a), here set all Q(s,a)=0 initially, not like pseudo code - def sample_action(self, state): - ''' sample action with e-greedy policy while training - ''' - self.sample_count += 1 - # epsilon must decay(linear,exponential and etc.) for balancing exploration and exploitation - self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \ - math.exp(-1. * self.sample_count / self.epsilon_decay) - if np.random.uniform(0, 1) > self.epsilon: - action = np.argmax(self.Q_table[str(state)]) # choose action corresponding to the maximum q value - else: - action = np.random.choice(self.n_actions) # choose action randomly - return action - def predict_action(self,state): - ''' predict action while testing - ''' - action = np.argmax(self.Q_table[str(state)]) - return action - def update(self, state, action, reward, next_state, done): - Q_predict = self.Q_table[str(state)][action] - if done: # terminal state - Q_target = reward - else: - Q_target = reward + self.gamma * np.max(self.Q_table[str(next_state)]) - self.Q_table[str(state)][action] += self.lr * (Q_target - Q_predict) - def save_model(self,path): - import dill - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save( - obj=self.Q_table, - f=path+"Qleaning_model.pkl", - pickle_module=dill - ) - print("Model saved!") - def load_model(self, path): - import dill - self.Q_table =torch.load(f=path+'Qleaning_model.pkl',pickle_module=dill) - print("Mode loaded!") \ No newline at end of file diff --git a/projects/codes/QLearning/task0.py 
b/projects/codes/QLearning/task0.py deleted file mode 100644 index da52113..0000000 --- a/projects/codes/QLearning/task0.py +++ /dev/null @@ -1,106 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2020-09-11 23:03:00 -LastEditor: John -LastEditTime: 2022-10-30 02:04:55 -Discription: -Environment: -''' -import sys,os -os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized." -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add path to system path - -import gym -import datetime -import argparse -from envs.gridworld_env import FrozenLakeWapper -from envs.wrappers import CliffWalkingWapper -from envs.register import register_env -from qlearning import QLearning -from common.utils import all_seed,merge_class_attrs -from common.launcher import Launcher -from config.config import GeneralConfigQLearning,AlgoConfigQLearning - -class Main(Launcher): - def __init__(self) -> None: - super().__init__() - self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigQLearning()) - self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigQLearning()) - def env_agent_config(self,cfg,logger): - ''' create env and agent - ''' - register_env(cfg.env_name) - env = gym.make(cfg.env_name,new_step_api=False) # create env - if cfg.env_name == 'CliffWalking-v0': - env = CliffWalkingWapper(env) - if cfg.seed !=0: # set random seed - all_seed(env,seed=cfg.seed) - try: # state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info 
- # update to cfg paramters - setattr(cfg, 'n_states', n_states) - setattr(cfg, 'n_actions', n_actions) - agent = QLearning(cfg) - return env,agent - def train(self,cfg,env,agent,logger): - logger.info("Start training!") - logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - steps = [] # record steps for all episodes - for i_ep in range(cfg.train_eps): - ep_reward = 0 # reward per episode - ep_step = 0 # step per episode - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - action = agent.sample_action(state) # sample action - next_state, reward, terminated, _ = env.step(action) # update env and return transitions - agent.update(state, action, reward, next_state, terminated) # update agent - state = next_state # update state - ep_reward += reward - ep_step += 1 - if terminated: - break - rewards.append(ep_reward) - steps.append(ep_step) - logger.info(f'Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step:d}, Epislon: {agent.epsilon:.3f}') - logger.info("Finish training!") - return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - def test(self,cfg,env,agent,logger): - logger.info("Start testing!") - logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - steps = [] # record steps for all episodes - for i_ep in range(cfg.test_eps): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - action = agent.predict_action(state) # predict action - next_state, reward, terminated, _ = env.step(action) - state = next_state - ep_reward += reward - ep_step += 1 - if terminated: - break - rewards.append(ep_reward) - steps.append(ep_step) - logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step:d}") - logger.info("Finish 
testing!") - return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - -if __name__ == "__main__": - main = Main() - main.run() - - - - diff --git a/projects/codes/RainbowDQN/rainbow_dqn.py b/projects/codes/RainbowDQN/rainbow_dqn.py deleted file mode 100644 index 0d7f783..0000000 --- a/projects/codes/RainbowDQN/rainbow_dqn.py +++ /dev/null @@ -1,215 +0,0 @@ -import math -import torch -import torch.nn as nn -import torch.nn.functional as F -import torch.optim as optim -from torch.autograd import Variable -import random -class ReplayBuffer: - def __init__(self, capacity): - self.capacity = capacity # capacity of the replay buffer - self.buffer = [] # buffer - self.position = 0 - - def push(self, state, action, reward, next_state, done): - ''' The buffer works as a queue: once capacity is exceeded, the oldest transitions are overwritten - ''' - if len(self.buffer) < self.capacity: - self.buffer.append(None) - self.buffer[self.position] = (state, action, reward, next_state, done) - self.position = (self.position + 1) % self.capacity - - def sample(self, batch_size): - batch = random.sample(self.buffer, batch_size) # randomly sample a mini-batch of transitions - state, action, reward, next_state, done = zip(*batch) # unzip into states, actions, etc. - return state, action, reward, next_state, done - - def __len__(self): - ''' return the number of transitions currently stored - ''' - return len(self.buffer) -class NoisyLinear(nn.Module): - def __init__(self, input_dim, output_dim, device, std_init=0.4): - super(NoisyLinear, self).__init__() - - self.device = device - self.input_dim = input_dim - self.output_dim = output_dim - self.std_init = std_init - - self.weight_mu = nn.Parameter(torch.FloatTensor(output_dim, input_dim)) - self.weight_sigma = nn.Parameter(torch.FloatTensor(output_dim, input_dim)) - self.register_buffer('weight_epsilon', torch.FloatTensor(output_dim, input_dim)) - - self.bias_mu = nn.Parameter(torch.FloatTensor(output_dim)) - self.bias_sigma = nn.Parameter(torch.FloatTensor(output_dim)) - self.register_buffer('bias_epsilon', torch.FloatTensor(output_dim)) - - self.reset_parameters() - self.reset_noise() - - def
forward(self, x): - if self.device: - weight_epsilon = self.weight_epsilon.cuda() - bias_epsilon = self.bias_epsilon.cuda() - else: - weight_epsilon = self.weight_epsilon - bias_epsilon = self.bias_epsilon - - if self.training: - weight = self.weight_mu + self.weight_sigma.mul(Variable(weight_epsilon)) - bias = self.bias_mu + self.bias_sigma.mul(Variable(bias_epsilon)) - else: - weight = self.weight_mu - bias = self.bias_mu - - return F.linear(x, weight, bias) - - def reset_parameters(self): - mu_range = 1 / math.sqrt(self.weight_mu.size(1)) - - self.weight_mu.data.uniform_(-mu_range, mu_range) - self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.weight_sigma.size(1))) - - self.bias_mu.data.uniform_(-mu_range, mu_range) - self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.bias_sigma.size(0))) - - def reset_noise(self): - epsilon_in = self._scale_noise(self.input_dim) - epsilon_out = self._scale_noise(self.output_dim) - - self.weight_epsilon.copy_(epsilon_out.ger(epsilon_in)) - self.bias_epsilon.copy_(self._scale_noise(self.output_dim)) - - def _scale_noise(self, size): - x = torch.randn(size) - x = x.sign().mul(x.abs().sqrt()) - return x - -class RainbowModel(nn.Module): - def __init__(self, n_states, n_actions, n_atoms, Vmin, Vmax): - super(RainbowModel, self).__init__() - - self.n_states = n_states - self.n_actions = n_actions - self.n_atoms = n_atoms - self.Vmin = Vmin - self.Vmax = Vmax - - self.linear1 = nn.Linear(n_states, 32) - self.linear2 = nn.Linear(32, 64) - - self.noisy_value1 = NoisyLinear(64, 64, device=device) - self.noisy_value2 = NoisyLinear(64, self.n_atoms, device=device) - - self.noisy_advantage1 = NoisyLinear(64, 64, device=device) - self.noisy_advantage2 = NoisyLinear(64, self.n_atoms * self.n_actions, device=device) - - def forward(self, x): - batch_size = x.size(0) - - x = F.relu(self.linear1(x)) - x = F.relu(self.linear2(x)) - - value = F.relu(self.noisy_value1(x)) - value = self.noisy_value2(value) - - advantage = 
F.relu(self.noisy_advantage1(x)) - advantage = self.noisy_advantage2(advantage) - - value = value.view(batch_size, 1, self.n_atoms) - advantage = advantage.view(batch_size, self.n_actions, self.n_atoms) - - x = value + advantage - advantage.mean(1, keepdim=True) - x = F.softmax(x.view(-1, self.n_atoms)).view(-1, self.n_actions, self.n_atoms) - - return x - - def reset_noise(self): - self.noisy_value1.reset_noise() - self.noisy_value2.reset_noise() - self.noisy_advantage1.reset_noise() - self.noisy_advantage2.reset_noise() - - def act(self, state): - state = Variable(torch.FloatTensor(state).unsqueeze(0), volatile=True) - dist = self.forward(state).data.cpu() - dist = dist * torch.linspace(self.Vmin, self.Vmax, self.n_atoms) - action = dist.sum(2).max(1)[1].numpy()[0] - return action - -class RainbowDQN(nn.Module): - def __init__(self, n_states, n_actions, n_atoms, Vmin, Vmax,cfg): - super(RainbowDQN, self).__init__() - self.n_states = n_states - self.n_actions = n_actions - self.n_atoms = cfg.n_atoms - self.Vmin = cfg.Vmin - self.Vmax = cfg.Vmax - self.policy_model = RainbowModel(n_states, n_actions, n_atoms, Vmin, Vmax) - self.target_model = RainbowModel(n_states, n_actions, n_atoms, Vmin, Vmax) - self.batch_size = cfg.batch_size - self.memory = ReplayBuffer(cfg.memory_capacity) # experience replay - self.optimizer = optim.Adam(self.policy_model.parameters(), 0.001) - def choose_action(self,state): - state = Variable(torch.FloatTensor(state).unsqueeze(0), volatile=True) - dist = self.policy_model(state).data.cpu() - dist = dist * torch.linspace(self.Vmin, self.Vmax, self.n_atoms) - action = dist.sum(2).max(1)[1].numpy()[0] - return action - def projection_distribution(self,next_state, rewards, dones): - - - delta_z = float(self.Vmax - self.Vmin) / (self.n_atoms - 1) - support = torch.linspace(self.Vmin, self.Vmax, self.n_atoms) - - next_dist = self.target_model(next_state).data.cpu() * support - next_action = next_dist.sum(2).max(1)[1] - next_action =
next_action.unsqueeze(1).unsqueeze(1).expand(next_dist.size(0), 1, next_dist.size(2)) - next_dist = next_dist.gather(1, next_action).squeeze(1) - - rewards = rewards.unsqueeze(1).expand_as(next_dist) - dones = dones.unsqueeze(1).expand_as(next_dist) - support = support.unsqueeze(0).expand_as(next_dist) - - Tz = rewards + (1 - dones) * 0.99 * support - Tz = Tz.clamp(min=self.Vmin, max=self.Vmax) - b = (Tz - self.Vmin) / delta_z - l = b.floor().long() - u = b.ceil().long() - - offset = torch.linspace(0, (self.batch_size - 1) * self.n_atoms, self.batch_size).long()\ - .unsqueeze(1).expand(self.batch_size, self.n_atoms) - - proj_dist = torch.zeros(next_dist.size()) - proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1)) - proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float())).view(-1)) - - return proj_dist - def update(self): - if len(self.memory) < self.batch_size: # skip the update until memory holds at least one full batch - return - state, action, reward, next_state, done = self.memory.sample(self.batch_size) - - state = Variable(torch.FloatTensor(np.float32(state))) - next_state = Variable(torch.FloatTensor(np.float32(next_state)), volatile=True) - action = Variable(torch.LongTensor(action)) - reward = torch.FloatTensor(reward) - done = torch.FloatTensor(np.float32(done)) - - proj_dist = self.projection_distribution(next_state, reward, done) - - dist = self.policy_model(state) - action = action.unsqueeze(1).unsqueeze(1).expand(self.batch_size, 1, self.n_atoms) - dist = dist.gather(1, action).squeeze(1) - dist.data.clamp_(0.01, 0.99) - loss = -(Variable(proj_dist) * dist.log()).sum(1) - loss = loss.mean() - - self.optimizer.zero_grad() - loss.backward() - self.optimizer.step() - - self.policy_model.reset_noise() - self.target_model.reset_noise() - \ No newline at end of file diff --git a/projects/codes/RainbowDQN/task0.py b/projects/codes/RainbowDQN/task0.py deleted file mode 100644 index 49a97a4..0000000 ---
a/projects/codes/RainbowDQN/task0.py +++ /dev/null @@ -1,177 +0,0 @@ -import sys -import os -import torch.nn as nn -import torch.nn.functional as F -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the current file -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add the parent path to sys.path - -import gym -import torch -import datetime -import numpy as np -from common.utils import save_results_1, make_dir -from common.utils import plot_rewards -from dqn import DQN - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # current timestamp - -class MLP(nn.Module): - def __init__(self, n_states,n_actions,hidden_dim=128): - """ Initialize the Q-network as a fully connected network - n_states: number of input features, i.e. the state dimension of the environment - n_actions: output dimension, i.e. the number of actions - """ - super(MLP, self).__init__() - self.fc1 = nn.Linear(n_states, hidden_dim) # input layer - self.fc2 = nn.Linear(hidden_dim,hidden_dim) # hidden layer - self.fc3 = nn.Linear(hidden_dim, n_actions) # output layer - - def forward(self, x): - # activation function for each layer - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - return self.fc3(x) - -class Config: - '''Hyperparameters - ''' - - def __init__(self): - ############################### hyperparameters ################################ - self.algo_name = 'DQN' # algorithm name - self.env_name = 'CartPole-v0' # environment name - self.device = torch.device( - "cuda" if torch.cuda.is_available() else "cpu") # check GPU - self.seed = 10 # random seed; 0 means no seed is set - self.train_eps = 200 # number of training episodes - self.test_eps = 20 # number of test episodes - ################################################################################ - - ################################## algorithm hyperparameters ################################### - self.gamma = 0.95 # discount factor - self.epsilon_start = 0.90 # initial epsilon of the e-greedy policy - self.epsilon_end = 0.01 # final epsilon of the e-greedy policy - self.epsilon_decay = 500 # epsilon decay rate of the e-greedy policy - self.lr = 0.0001 # learning rate - self.memory_capacity = 100000 # capacity of the replay buffer - self.batch_size = 64 # batch size for mini-batch SGD - self.target_update = 4 # update frequency of the target network (in episodes) - self.hidden_dim = 256 # hidden layer size -
################################################################################ - - ################################# result-saving parameters ################################ - self.result_path = curr_path + "/outputs/" + self.env_name + \ - '/' + curr_time + '/results/' # path for saving results - self.model_path = curr_path + "/outputs/" + self.env_name + \ - '/' + curr_time + '/models/' # path for saving models - self.save = True # whether to save figures - ################################################################################ - - -def env_agent_config(cfg): - ''' Create the environment and the agent - ''' - env = gym.make(cfg.env_name) # create environment - n_states = env.observation_space.shape[0] # state dimension - n_actions = env.action_space.n # action dimension - print(f"n states: {n_states}, n actions: {n_actions}") - model = MLP(n_states,n_actions) - agent = DQN(n_actions, model, cfg) # create agent - if cfg.seed !=0: # set random seeds - torch.manual_seed(cfg.seed) - env.seed(cfg.seed) - np.random.seed(cfg.seed) - return env, agent - - -def train(cfg, env, agent): - ''' Training - ''' - print('Start training!') - print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}') - rewards = [] # rewards of all episodes - ma_rewards = [] # moving-average rewards of all episodes - steps = [] - for i_ep in range(cfg.train_eps): - ep_reward = 0 # reward within one episode - ep_step = 0 - state = env.reset() # reset the environment and get the initial state - while True: - ep_step += 1 - action = agent.choose_action(state) # choose an action - next_state, reward, done, _ = env.step(action) # step the environment and get the transition - agent.memory.push(state, action, reward, - next_state, done) # store the transition - state = next_state # move to the next state - agent.update() # update the agent - ep_reward += reward # accumulate reward - if done: - break - if (i_ep + 1) % cfg.target_update == 0: # update the agent's target network - agent.target_net.load_state_dict(agent.policy_net.state_dict()) - steps.append(ep_step) - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append(0.9 * ma_rewards[-1] + 0.1 * ep_reward) - else: - ma_rewards.append(ep_reward) - if (i_ep + 1) % 1 == 0: - print(f'Episode:{i_ep+1}/{cfg.test_eps}, Reward:{ep_reward:.2f}, Step:{ep_step:.2f}
Epislon:{agent.epsilon(agent.frame_idx):.3f}') - print('Finish training!') - env.close() - res_dic = {'rewards':rewards,'ma_rewards':ma_rewards,'steps':steps} - return res_dic - - -def test(cfg, env, agent): - print('Start testing!') - print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}') - ############# testing does not use the epsilon-greedy policy, so the epsilon values are set to 0 ############### - cfg.epsilon_start = 0.0 # initial epsilon of the e-greedy policy - cfg.epsilon_end = 0.0 # final epsilon of the e-greedy policy - ################################################################################ - rewards = [] # rewards of all episodes - ma_rewards = [] # moving-average rewards of all episodes - steps = [] - for i_ep in range(cfg.test_eps): - ep_reward = 0 # reward within one episode - ep_step = 0 - state = env.reset() # reset the environment and get the initial state - while True: - ep_step+=1 - action = agent.choose_action(state) # choose an action - next_state, reward, done, _ = env.step(action) # step the environment and get the transition - state = next_state # move to the next state - ep_reward += reward # accumulate reward - if done: - break - steps.append(ep_step) - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append(ma_rewards[-1] * 0.9 + ep_reward * 0.1) - else: - ma_rewards.append(ep_reward) - print(f'Episode:{i_ep+1}/{cfg.train_eps}, Reward:{ep_reward:.2f}, Step:{ep_step:.2f}') - print('Finish testing!') - env.close() - return {'rewards':rewards,'ma_rewards':ma_rewards,'steps':steps} - - -if __name__ == "__main__": - cfg = Config() - # training - env, agent = env_agent_config(cfg) - res_dic = train(cfg, env, agent) - make_dir(cfg.result_path, cfg.model_path) # create folders for results and models - agent.save(path=cfg.model_path) # save model - save_results_1(res_dic, tag='train', - path=cfg.result_path) # save results - plot_rewards(res_dic['rewards'], res_dic['ma_rewards'], cfg, tag="train") # plot results - # testing - env, agent = env_agent_config(cfg) - agent.load(path=cfg.model_path) # load model - res_dic = test(cfg, env, agent) - save_results_1(res_dic, tag='test', - path=cfg.result_path) # save results - plot_rewards(res_dic['rewards'], res_dic['ma_rewards'],cfg, tag="test") # plot results diff --git a/projects/codes/SAC-S/sac.py
b/projects/codes/SAC-S/sac.py deleted file mode 100644 index 6351c3d..0000000 --- a/projects/codes/SAC-S/sac.py +++ /dev/null @@ -1,27 +0,0 @@ -import torch -import torch.optim as optim -import torch.nn as nn -import numpy as np -class SAC: - def __init__(self,n_actions,models,memory,cfg): - self.device = cfg.device - self.value_net = models['ValueNet'].to(self.device) # $\psi$ - self.target_value_net = models['ValueNet'].to(self.device) # $\bar{\psi}$ - self.soft_q_net = models['SoftQNet'].to(self.device) # $\theta$ - self.policy_net = models['PolicyNet'].to(self.device) # $\phi$ - self.value_optimizer = optim.Adam(self.value_net.parameters(), lr=cfg.value_lr) - self.soft_q_optimizer = optim.Adam(self.soft_q_net.parameters(), lr=cfg.soft_q_lr) - self.policy_optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg.policy_lr) - for target_param, param in zip(self.target_value_net.parameters(), self.value_net.parameters()): - target_param.data.copy_(param.data) - self.value_criterion = nn.MSELoss() - self.soft_q_criterion = nn.MSELoss() - def update(self): - # sample a batch of transitions from replay buffer - state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample( - self.batch_size) - state_batch = torch.tensor(np.array(state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - action_batch = torch.tensor(action_batch, device=self.device).unsqueeze(1) # shape(batchsize,1) - reward_batch = torch.tensor(reward_batch, device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize) - next_state_batch = torch.tensor(np.array(next_state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - done_batch = torch.tensor(np.float32(done_batch), device=self.device).unsqueeze(1) # shape(batchsize,1) diff --git a/projects/codes/SAC/sacd_cnn.py b/projects/codes/SAC/sacd_cnn.py deleted file mode 100644 index e69de29..0000000 diff --git a/projects/codes/Sarsa/README.md 
b/projects/codes/Sarsa/README.md deleted file mode 100644 index 5258664..0000000 --- a/projects/codes/Sarsa/README.md +++ /dev/null @@ -1,19 +0,0 @@ -# Sarsa - -## Usage - -Run ```main.py```. - -## Environment - -See The Racetrack in the [environment notes](https://github.com/JohnJim0816/reinforcement-learning-tutorials/blob/master/env_info.md) - -## Algorithm pseudocode - -![sarsa_algo](assets/sarsa_algo.png) - -## Other notes - -### Difference from Q-learning - -The two algorithms differ only slightly, in the update formula, but Q-learning is off-policy while Sarsa is on-policy; see [Zhihu: Does Sarsa converge more slowly than Q-learning in reinforcement learning?](https://www.zhihu.com/question/268461866) \ No newline at end of file diff --git a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/config.yaml b/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/config.yaml deleted file mode 100644 index f1c252d..0000000 --- a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/config.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: Sarsa - device: cpu - env_name: CliffWalking-v0 - load_checkpoint: true - load_path: Train_CliffWalking-v0_Sarsa_20221030-021146 - max_steps: 200 - mode: test - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.1 diff --git a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/logs/log.txt b/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/logs/log.txt deleted file mode 100644 index 29ed4a8..0000000 --- a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/logs/log.txt +++ /dev/null @@ -1,24 +0,0 @@ -2022-10-30 02:12:06 - r - INFO: - n_states: 48, n_actions: 4 -2022-10-30 02:12:06 - r - INFO: - Start testing!
-2022-10-30 02:12:06 - r - INFO: - Env: CliffWalking-v0, Algorithm: Sarsa, Device: cpu
-2022-10-30 02:12:06 - r - INFO: - Episode: 1/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 2/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 3/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 4/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 5/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 6/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 7/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 8/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 9/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 10/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 11/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 12/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 13/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 14/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 15/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 16/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 17/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 18/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 19/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Episode: 20/20, Reward: -15.00, Steps:15
-2022-10-30 02:12:06 - r - INFO: - Finish testing!
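The Sarsa README removed above says that Sarsa and Q-learning differ only in the update rule, with Sarsa on-policy and Q-learning off-policy. The configs set `epsilon_start`, `epsilon_end` and `epsilon_decay` for exploration. Below is a minimal tabular sketch of both points. The class and method names are hypothetical, and the exponential schedule ε = ε_end + (ε_start − ε_end)·exp(−n/ε_decay) is an assumption: a common choice, not code taken from this repository. Under that assumption, the CliffWalking values (0.95, 0.01, 300) give about 0.491 when n = 201. That is how many action selections a 200-step Sarsa episode makes if the initial action is also counted, and it agrees with the first epsilon in the training log.

```python
import math
import random
from collections import defaultdict


class TabularAgent:
    """Hypothetical tabular agent contrasting the Sarsa and Q-learning updates."""

    def __init__(self, n_actions, lr=0.1, gamma=0.95,
                 epsilon_start=0.95, epsilon_end=0.01, epsilon_decay=300, seed=1):
        self.n_actions = n_actions
        self.lr = lr
        self.gamma = gamma
        self.epsilon_start = epsilon_start
        self.epsilon_end = epsilon_end
        self.epsilon_decay = epsilon_decay
        self.sample_count = 0  # number of epsilon-greedy action selections so far
        self.Q = defaultdict(lambda: [0.0] * n_actions)  # Q[state][action], zero-initialised
        self.rng = random.Random(seed)

    def epsilon(self):
        # Assumed exponential decay from epsilon_start towards epsilon_end.
        return self.epsilon_end + (self.epsilon_start - self.epsilon_end) * math.exp(
            -self.sample_count / self.epsilon_decay)

    def sample_action(self, state):
        """Epsilon-greedy behaviour policy; each call advances the decay schedule."""
        self.sample_count += 1
        if self.rng.random() < self.epsilon():
            return self.rng.randrange(self.n_actions)
        q = self.Q[state]
        return max(range(self.n_actions), key=lambda a: q[a])

    def sarsa_update(self, s, a, r, s_next, a_next, done):
        # On-policy target: bootstrap from the next action the behaviour policy actually chose.
        target = r if done else r + self.gamma * self.Q[s_next][a_next]
        self.Q[s][a] += self.lr * (target - self.Q[s][a])

    def q_learning_update(self, s, a, r, s_next, done):
        # Off-policy target: bootstrap from the greedy action, regardless of what is executed next.
        target = r if done else r + self.gamma * max(self.Q[s_next])
        self.Q[s][a] += self.lr * (target - self.Q[s][a])


agent = TabularAgent(n_actions=4)
agent.Q[1] = [0.0, -1.0, 2.0, 0.0]
agent.sarsa_update(0, 0, -1.0, 1, 1, done=False)   # bootstraps from Q[1][1] = -1.0
agent.q_learning_update(0, 1, -1.0, 1, done=False)  # bootstraps from max(Q[1]) = 2.0
# agent.Q[0] is now approximately [-0.195, 0.09, 0.0, 0.0]
```

In a Sarsa loop, the next action `a_next` is sampled from the same ε-greedy policy before the update and is then executed on the following step. That is what makes the method on-policy. Q-learning's `max` term does not depend on which action is executed afterwards.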
diff --git a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/models/checkpoint.pkl b/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/models/checkpoint.pkl
deleted file mode 100644
index d226d4c..0000000
Binary files a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/models/checkpoint.pkl and /dev/null differ
diff --git a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/results/learning_curve.png b/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/results/learning_curve.png
deleted file mode 100644
index cf20c71..0000000
Binary files a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/results/learning_curve.png and /dev/null differ
diff --git a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/results/res.csv b/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/results/res.csv
deleted file mode 100644
index 7f09e4b..0000000
--- a/projects/codes/Sarsa/Test_CliffWalking-v0_Sarsa_20221030-021206/results/res.csv
+++ /dev/null
@@ -1,21 +0,0 @@
-episodes,rewards,steps
-0,-15,15
-1,-15,15
-2,-15,15
-3,-15,15
-4,-15,15
-5,-15,15
-6,-15,15
-7,-15,15
-8,-15,15
-9,-15,15
-10,-15,15
-11,-15,15
-12,-15,15
-13,-15,15
-14,-15,15
-15,-15,15
-16,-15,15
-17,-15,15
-18,-15,15
-19,-15,15
diff --git a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/config.yaml b/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/config.yaml
deleted file mode 100644
index 7c1b16f..0000000
--- a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/config.yaml
+++ /dev/null
@@ -1,19 +0,0 @@
-general_cfg:
-  algo_name: Sarsa
-  device: cpu
-  env_name: Racetrack-v0
-  load_checkpoint: true
-  load_path: Train_Racetrack-v0_Sarsa_20221030-021315
-  max_steps: 200
-  mode: test
-  save_fig: true
-  seed: 10
-  show_fig: false
-  test_eps: 20
-  train_eps: 400
-algo_cfg:
-  epsilon_decay: 200
-  epsilon_end: 0.01
-  epsilon_start: 0.9
-  gamma: 0.99
-  lr: 0.1
diff --git a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/logs/log.txt b/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/logs/log.txt
deleted file mode 100644
index 7fd4614..0000000
--- a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/logs/log.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-2022-10-30 02:13:47 - r - INFO: - n_states: 4, n_actions: 9
-2022-10-30 02:13:47 - r - INFO: - Start testing!
-2022-10-30 02:13:47 - r - INFO: - Env: Racetrack-v0, Algorithm: Sarsa, Device: cpu
-2022-10-30 02:13:47 - r - INFO: - Episode: 1/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 2/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 3/20, Reward: 2.00, Steps:8
-2022-10-30 02:13:47 - r - INFO: - Episode: 4/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 5/20, Reward: -12.00, Steps:12
-2022-10-30 02:13:47 - r - INFO: - Episode: 6/20, Reward: -49.00, Steps:29
-2022-10-30 02:13:47 - r - INFO: - Episode: 7/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 8/20, Reward: -17.00, Steps:17
-2022-10-30 02:13:47 - r - INFO: - Episode: 9/20, Reward: 4.00, Steps:6
-2022-10-30 02:13:47 - r - INFO: - Episode: 10/20, Reward: -17.00, Steps:17
-2022-10-30 02:13:47 - r - INFO: - Episode: 11/20, Reward: 2.00, Steps:8
-2022-10-30 02:13:47 - r - INFO: - Episode: 12/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 13/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 14/20, Reward: 2.00, Steps:8
-2022-10-30 02:13:47 - r - INFO: - Episode: 15/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 16/20, Reward: -34.00, Steps:24
-2022-10-30 02:13:47 - r - INFO: - Episode: 17/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Episode: 18/20, Reward: 5.00, Steps:5
-2022-10-30 02:13:47 - r - INFO: - Episode: 19/20, Reward: 5.00, Steps:5
-2022-10-30 02:13:47 - r - INFO: - Episode: 20/20, Reward: 3.00, Steps:7
-2022-10-30 02:13:47 - r - INFO: - Finish testing!
diff --git a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/models/checkpoint.pkl b/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/models/checkpoint.pkl
deleted file mode 100644
index d950b3f..0000000
Binary files a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/models/checkpoint.pkl and /dev/null differ
diff --git a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/results/learning_curve.png b/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/results/learning_curve.png
deleted file mode 100644
index fde014e..0000000
Binary files a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/results/learning_curve.png and /dev/null differ
diff --git a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/results/res.csv b/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/results/res.csv
deleted file mode 100644
index 5d08ed0..0000000
--- a/projects/codes/Sarsa/Test_Racetrack-v0_Sarsa_20221030-021347/results/res.csv
+++ /dev/null
@@ -1,21 +0,0 @@
-episodes,rewards,steps
-0,3,7
-1,3,7
-2,2,8
-3,3,7
-4,-12,12
-5,-49,29
-6,3,7
-7,-17,17
-8,4,6
-9,-17,17
-10,2,8
-11,3,7
-12,3,7
-13,2,8
-14,3,7
-15,-34,24
-16,3,7
-17,5,5
-18,5,5
-19,3,7
diff --git a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/config.yaml b/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/config.yaml
deleted file mode 100644
index 4d61198..0000000
--- a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/config.yaml
+++ /dev/null
@@ -1,19 +0,0 @@
-general_cfg:
-  algo_name: Sarsa
-  device: cpu
-  env_name: CliffWalking-v0
-  load_checkpoint: false
-  load_path: Train_CartPole-v1_DQN_20221026-054757
-  max_steps: 200
-  mode: train
-  save_fig: true
-  seed: 1
-  show_fig: false
-  test_eps: 20
-  train_eps: 800
-algo_cfg:
-  epsilon_decay: 300
-  epsilon_end: 0.01
-  epsilon_start: 0.95
-  gamma: 0.95
-  lr: 0.1
diff --git a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/logs/log.txt b/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/logs/log.txt
deleted file mode 100644
index 76df5b3..0000000
--- a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/logs/log.txt
+++ /dev/null
@@ -1,804 +0,0 @@
-2022-10-30 02:11:46 - r - INFO: - n_states: 48, n_actions: 4
-2022-10-30 02:11:46 - r - INFO: - Start training!
-2022-10-30 02:11:46 - r - INFO: - Env: CliffWalking-v0, Algorithm: Sarsa, Device: cpu
-2022-10-30 02:11:46 - r - INFO: - Episode: 1/800, Reward: -1091.00, Steps:200, Epislon: 0.491
-2022-10-30 02:11:46 - r - INFO: - Episode: 2/800, Reward: -320.00, Steps:122, Epislon: 0.329
-2022-10-30 02:11:46 - r - INFO: - Episode: 3/800, Reward: -794.00, Steps:200, Epislon: 0.173
-2022-10-30 02:11:46 - r - INFO: - Episode: 4/800, Reward: -596.00, Steps:200, Epislon: 0.094
-2022-10-30 02:11:46 - r - INFO: - Episode: 5/800, Reward: -398.00, Steps:200, Epislon: 0.053
-2022-10-30 02:11:46 - r - INFO: - Episode: 6/800, Reward: -59.00, Steps:59, Epislon: 0.045
-2022-10-30 02:11:46 - r - INFO: - Episode: 7/800, Reward: -299.00, Steps:200, Epislon: 0.028
-2022-10-30 02:11:46 - r - INFO: - Episode: 8/800, Reward: -82.00, Steps:82, Epislon: 0.024
-2022-10-30 02:11:46 - r - INFO: - Episode: 9/800, Reward: -125.00, Steps:125, Epislon: 0.019
-2022-10-30 02:11:46 - r - INFO: - Episode: 10/800, Reward: -75.00, Steps:75, Epislon: 0.017
-2022-10-30 02:11:46 - r - INFO: - Episode: 11/800, Reward: -285.00, Steps:186, Epislon: 0.014
-2022-10-30 02:11:46 - r - INFO: - Episode: 12/800, Reward: -103.00, Steps:103, Epislon: 0.013
-2022-10-30 02:11:46 - r - INFO: - Episode: 13/800, Reward: -103.00, Steps:103, Epislon: 0.012
-2022-10-30 02:11:46 - r - INFO: - Episode: 14/800, Reward: -131.00, Steps:131, Epislon: 0.011
-2022-10-30 02:11:46 - r - INFO: - Episode: 15/800, Reward: -53.00, Steps:53, Epislon: 0.011
-2022-10-30 02:11:46 - r - INFO: - Episode: 16/800, Reward: -113.00, Steps:113, Epislon: 0.011
-2022-10-30 02:11:46 - r - INFO: - Episode: 17/800, Reward: -125.00, Steps:125, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 18/800, Reward: -95.00, Steps:95, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 19/800, Reward: -97.00, Steps:97, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 20/800, Reward: -145.00, Steps:145, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 21/800, Reward: -89.00, Steps:89, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 22/800, Reward: -97.00, Steps:97, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 23/800, Reward: -115.00, Steps:115, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 24/800, Reward: -121.00, Steps:121, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 25/800, Reward: -53.00, Steps:53, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 26/800, Reward: -111.00, Steps:111, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 27/800, Reward: -97.00, Steps:97, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 28/800, Reward: -206.00, Steps:107, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 29/800, Reward: -147.00, Steps:147, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 30/800, Reward: -36.00, Steps:36, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 31/800, Reward: -216.00, Steps:117, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 32/800, Reward: -103.00, Steps:103, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 33/800, Reward: -87.00, Steps:87, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 34/800, Reward: -80.00, Steps:80, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 35/800, Reward: -73.00, Steps:73, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 36/800, Reward: -83.00, Steps:83, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 37/800, Reward: -143.00, Steps:44, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 38/800, Reward: -241.00, Steps:142, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 39/800, Reward: -77.00, Steps:77, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 40/800, Reward: -49.00, Steps:49, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 41/800, Reward: -87.00, Steps:87, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 42/800, Reward: -47.00, Steps:47, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 43/800, Reward: -89.00, Steps:89, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 44/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 45/800, Reward: -192.00, Steps:93, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 46/800, Reward: -85.00, Steps:85, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 47/800, Reward: -55.00, Steps:55, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 48/800, Reward: -59.00, Steps:59, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 49/800, Reward: -60.00, Steps:60, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 50/800, Reward: -50.00, Steps:50, Epislon: 0.010
-2022-10-30 02:11:46 - r - INFO: - Episode: 51/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 52/800, Reward: -101.00, Steps:101, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 53/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 54/800, Reward: -70.00, Steps:70, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 55/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 56/800, Reward: -47.00, Steps:47, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 57/800, Reward: -80.00, Steps:80, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 58/800, Reward: -61.00, Steps:61, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 59/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 60/800, Reward: -73.00, Steps:73, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 61/800, Reward: -54.00, Steps:54, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 62/800, Reward: -37.00, Steps:37, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 63/800, Reward: -65.00, Steps:65, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 64/800, Reward: -41.00, Steps:41, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 65/800, Reward: -81.00, Steps:81, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 66/800, Reward: -39.00, Steps:39, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 67/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 68/800, Reward: -61.00, Steps:61, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 69/800, Reward: -57.00, Steps:57, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 70/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 71/800, Reward: -59.00, Steps:59, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 72/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 73/800, Reward: -51.00, Steps:51, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 74/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 75/800, Reward: -69.00, Steps:69, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 76/800, Reward: -41.00, Steps:41, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 77/800, Reward: -194.00, Steps:95, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 78/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 79/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 80/800, Reward: -81.00, Steps:81, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 81/800, Reward: -65.00, Steps:65, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 82/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 83/800, Reward: -47.00, Steps:47, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 84/800, Reward: -53.00, Steps:53, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 85/800, Reward: -165.00, Steps:66, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 86/800, Reward: -69.00, Steps:69, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 87/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 88/800, Reward: -56.00, Steps:56, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 89/800, Reward: -164.00, Steps:65, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 90/800, Reward: -45.00, Steps:45, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 91/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 92/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 93/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 94/800, Reward: -69.00, Steps:69, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 95/800, Reward: -33.00, Steps:33, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 96/800, Reward: -57.00, Steps:57, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 97/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 98/800, Reward: -55.00, Steps:55, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 99/800, Reward: -61.00, Steps:61, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 100/800, Reward: -162.00, Steps:63, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 101/800, Reward: -55.00, Steps:55, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 102/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 103/800, Reward: -53.00, Steps:53, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 104/800, Reward: -39.00, Steps:39, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 105/800, Reward: -55.00, Steps:55, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 106/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 107/800, Reward: -33.00, Steps:33, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 108/800, Reward: -49.00, Steps:49, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 109/800, Reward: -65.00, Steps:65, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 110/800, Reward: -45.00, Steps:45, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 111/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 112/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 113/800, Reward: -51.00, Steps:51, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 114/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 115/800, Reward: -47.00, Steps:47, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 116/800, Reward: -41.00, Steps:41, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 117/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 118/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 119/800, Reward: -180.00, Steps:81, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 120/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 121/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 122/800, Reward: -47.00, Steps:47, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 123/800, Reward: -65.00, Steps:65, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 124/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 125/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 126/800, Reward: -49.00, Steps:49, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 127/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 128/800, Reward: -45.00, Steps:45, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 129/800, Reward: -49.00, Steps:49, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 130/800, Reward: -37.00, Steps:37, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 131/800, Reward: -49.00, Steps:49, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 132/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 133/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 134/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 135/800, Reward: -37.00, Steps:37, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 136/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 137/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 138/800, Reward: -51.00, Steps:51, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 139/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 140/800, Reward: -51.00, Steps:51, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 141/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 142/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 143/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 144/800, Reward: -41.00, Steps:41, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 145/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 146/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 147/800, Reward: -47.00, Steps:47, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 148/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 149/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 150/800, Reward: -45.00, Steps:45, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 151/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 152/800, Reward: -33.00, Steps:33, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 153/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 154/800, Reward: -148.00, Steps:49, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 155/800, Reward: -41.00, Steps:41, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 156/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 157/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 158/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 159/800, Reward: -33.00, Steps:33, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 160/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 161/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 162/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 163/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 164/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 165/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 166/800, Reward: -35.00, Steps:35, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 167/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 168/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 169/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 170/800, Reward: -41.00, Steps:41, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 171/800, Reward: -39.00, Steps:39, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 172/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 173/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 174/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 175/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 176/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 177/800, Reward: -155.00, Steps:56, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 178/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 179/800, Reward: -37.00, Steps:37, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 180/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 181/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 182/800, Reward: -39.00, Steps:39, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 183/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 184/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 185/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 186/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 187/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 188/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 189/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 190/800, Reward: -42.00, Steps:42, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 191/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 192/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 193/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 194/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 195/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 196/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 197/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 198/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 199/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 200/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 201/800, Reward: -33.00, Steps:33, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 202/800, Reward: -37.00, Steps:37, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 203/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 204/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 205/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 206/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 207/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 208/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 209/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 210/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 211/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 212/800, Reward: -37.00, Steps:37, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 213/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 214/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 215/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 216/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 217/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 218/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 219/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 220/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 221/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 222/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 223/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 224/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 225/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 226/800, Reward: -43.00, Steps:43, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 227/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 228/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 229/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 230/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 231/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 232/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 233/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 234/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 235/800, Reward: -31.00, Steps:31, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 236/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 237/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 238/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 239/800, Reward: -30.00, Steps:30, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 240/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 241/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 242/800, Reward: -24.00, Steps:24, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 243/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 244/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 245/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 246/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 247/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 248/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 249/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 250/800, Reward: -29.00, Steps:29, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 251/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 252/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 253/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 254/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 255/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 256/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 257/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 258/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 259/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 260/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 261/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 262/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 263/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 264/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 265/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 266/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 267/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 268/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 269/800, Reward: -25.00, Steps:25, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 270/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 271/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 272/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 273/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 274/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 275/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 276/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 277/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 278/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 279/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 280/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 281/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 282/800, Reward: -27.00, Steps:27, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 283/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 284/800, Reward: -23.00, Steps:23, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 285/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 286/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 287/800, Reward: -17.00, Steps:17, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 288/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 289/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 290/800, Reward: -15.00, Steps:15, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 291/800, Reward: -21.00, Steps:21, Epislon: 0.010
-2022-10-30 02:11:47 - r - INFO: - Episode: 292/800, Reward: -19.00, Steps:19, Epislon: 0.010
-2022-10-30 02:11:47 - r
- INFO: - Episode: 293/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 294/800, Reward: -120.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 295/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 296/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 297/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 298/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 299/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 300/800, Reward: -35.00, Steps:35, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 301/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 302/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 303/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 304/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 305/800, Reward: -23.00, Steps:23, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 306/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 307/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 308/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 309/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 310/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 311/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 312/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 313/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 
314/800, Reward: -25.00, Steps:25, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 315/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 316/800, Reward: -32.00, Steps:32, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 317/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 318/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 319/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 320/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 321/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 322/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 323/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 324/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 325/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 326/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 327/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 328/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 329/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 330/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 331/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 332/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 333/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 334/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 335/800, Reward: -15.00, 
Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 336/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 337/800, Reward: -26.00, Steps:26, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 338/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 339/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 340/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 341/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 342/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 343/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 344/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 345/800, Reward: -27.00, Steps:27, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 346/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 347/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 348/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 349/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 350/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 351/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 352/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 353/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 354/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 355/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 356/800, Reward: -17.00, Steps:17, Epislon: 0.010 
-2022-10-30 02:11:47 - r - INFO: - Episode: 357/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 358/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 359/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 360/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 361/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 362/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 363/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 364/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 365/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 366/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 367/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 368/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 369/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 370/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 371/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 372/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 373/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 374/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 375/800, Reward: -25.00, Steps:25, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 376/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 377/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r 
- INFO: - Episode: 378/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 379/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 380/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 381/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 382/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 383/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 384/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 385/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 386/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 387/800, Reward: -25.00, Steps:25, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 388/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 389/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 390/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 391/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 392/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 393/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 394/800, Reward: -122.00, Steps:23, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 395/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 396/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 397/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 398/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 
399/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 400/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 401/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 402/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 403/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 404/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 405/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 406/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 407/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 408/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 409/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 410/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 411/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 412/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 413/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 414/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 415/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 416/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 417/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 418/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 419/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 420/800, Reward: -17.00, 
Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 421/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 422/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 423/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 424/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 425/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 426/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 427/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 428/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 429/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 430/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 431/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 432/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 433/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 434/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 435/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 436/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 437/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 438/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 439/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 440/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 441/800, Reward: -15.00, Steps:15, Epislon: 0.010 
-2022-10-30 02:11:47 - r - INFO: - Episode: 442/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 443/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 444/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 445/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 446/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 447/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 448/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 449/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 450/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 451/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 452/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 453/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 454/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 455/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 456/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 457/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 458/800, Reward: -22.00, Steps:22, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 459/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 460/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 461/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 462/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r 
- INFO: - Episode: 463/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 464/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 465/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 466/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 467/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 468/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 469/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 470/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 471/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 472/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 473/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 474/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 475/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 476/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 477/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 478/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 479/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 480/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 481/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 482/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 483/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 
484/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 485/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 486/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 487/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 488/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 489/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 490/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 491/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 492/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 493/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 494/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 495/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 496/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 497/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 498/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 499/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 500/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 501/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 502/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 503/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 504/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 505/800, Reward: -15.00, 
Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 506/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 507/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 508/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 509/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 510/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 511/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 512/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 513/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 514/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 515/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 516/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 517/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 518/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 519/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 520/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 521/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 522/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 523/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 524/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 525/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 526/800, Reward: -15.00, Steps:15, Epislon: 0.010 
-2022-10-30 02:11:47 - r - INFO: - Episode: 527/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 528/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 529/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 530/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 531/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 532/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 533/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 534/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 535/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 536/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 537/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 538/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 539/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 540/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 541/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 542/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 543/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 544/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 545/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 546/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 547/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:47 - r 
- INFO: - Episode: 548/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 549/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 550/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 551/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 552/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 553/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 554/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 555/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 556/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 557/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 558/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:47 - r - INFO: - Episode: 559/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 560/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 561/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 562/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 563/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 564/800, Reward: -20.00, Steps:20, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 565/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 566/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 567/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 568/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 
569/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 570/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 571/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 572/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 573/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 574/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 575/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 576/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 577/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 578/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 579/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 580/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 581/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 582/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 583/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 584/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 585/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 586/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 587/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 588/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 589/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 590/800, Reward: -15.00, 
Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 591/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 592/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 593/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 594/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 595/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 596/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 597/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 598/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 599/800, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 600/800, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 601/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 602/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 603/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 604/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 605/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 606/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 607/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 608/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 609/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 610/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 611/800, Reward: -15.00, Steps:15, Epislon: 0.010 
-2022-10-30 02:11:48 - r - INFO: - Episode: 612/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 613/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 614/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 615/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 616/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 617/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 618/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 619/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 620/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 621/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 622/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 623/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 624/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 625/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 626/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 627/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 628/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 629/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 630/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 631/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 632/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r 
- INFO: - Episode: 633/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 634/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 635/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 636/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 637/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 638/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 639/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 640/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 641/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 642/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 643/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 644/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 645/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 646/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 647/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 648/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 649/800, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 650/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 651/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 652/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 653/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 
654/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 655/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 656/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 657/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 658/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 659/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 660/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 661/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 662/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 663/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 664/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 665/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 666/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 667/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 668/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 669/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 670/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 671/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 672/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 673/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 674/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 675/800, Reward: -15.00, 
Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 676/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 677/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 678/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 679/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 680/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 681/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 682/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 683/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 684/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 685/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 686/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 687/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 688/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 689/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 690/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 691/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 692/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 693/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 694/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 695/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 696/800, Reward: -15.00, Steps:15, Epislon: 0.010 
-2022-10-30 02:11:48 - r - INFO: - Episode: 697/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 698/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 699/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 700/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 701/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 702/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 703/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 704/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 705/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 706/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 707/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 708/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 709/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 710/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 711/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 712/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 713/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 714/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 715/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 716/800, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 717/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r 
- INFO: - Episode: 718/800, Reward: -117.00, Steps:18, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 719/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 720/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 721/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 722/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 723/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 724/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 725/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 726/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 727/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 728/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 729/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 730/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 731/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 732/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 733/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 734/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 735/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 736/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 737/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 738/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 
739/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 740/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 741/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 742/800, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 743/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 744/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 745/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 746/800, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 747/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 748/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 749/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 750/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 751/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 752/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 753/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 754/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 755/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 756/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 757/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 758/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 759/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 760/800, Reward: -15.00, 
Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 761/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 762/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 763/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 764/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 765/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 766/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 767/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 768/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 769/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 770/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 771/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 772/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 773/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 774/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 775/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 776/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 777/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 778/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 779/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 780/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 781/800, Reward: -15.00, Steps:15, Epislon: 0.010 
-2022-10-30 02:11:48 - r - INFO: - Episode: 782/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 783/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 784/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 785/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 786/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 787/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 788/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 789/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 790/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 791/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 792/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 793/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 794/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 795/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 796/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 797/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 798/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 799/800, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Episode: 800/800, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:11:48 - r - INFO: - Finish training! 
diff --git a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/models/checkpoint.pkl b/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/models/checkpoint.pkl deleted file mode 100644 index d226d4c..0000000 Binary files a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/models/checkpoint.pkl and /dev/null differ diff --git a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/results/learning_curve.png b/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/results/learning_curve.png deleted file mode 100644 index 3c4dd0f..0000000 Binary files a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/results/res.csv b/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/results/res.csv deleted file mode 100644 index 16858a4..0000000 --- a/projects/codes/Sarsa/Train_CliffWalking-v0_Sarsa_20221030-021146/results/res.csv +++ /dev/null @@ -1,801 +0,0 @@ -episodes,rewards,steps -0,-1091,200 -1,-320,122 -2,-794,200 -3,-596,200 -4,-398,200 -5,-59,59 -6,-299,200 -7,-82,82 -8,-125,125 -9,-75,75 -10,-285,186 -11,-103,103 -12,-103,103 -13,-131,131 -14,-53,53 -15,-113,113 -16,-125,125 -17,-95,95 -18,-97,97 -19,-145,145 -20,-89,89 -21,-97,97 -22,-115,115 -23,-121,121 -24,-53,53 -25,-111,111 -26,-97,97 -27,-206,107 -28,-147,147 -29,-36,36 -30,-216,117 -31,-103,103 -32,-87,87 -33,-80,80 -34,-73,73 -35,-83,83 -36,-143,44 -37,-241,142 -38,-77,77 -39,-49,49 -40,-87,87 -41,-47,47 -42,-89,89 -43,-31,31 -44,-192,93 -45,-85,85 -46,-55,55 -47,-59,59 -48,-60,60 -49,-50,50 -50,-23,23 -51,-101,101 -52,-43,43 -53,-70,70 -54,-35,35 -55,-47,47 -56,-80,80 -57,-61,61 -58,-35,35 -59,-73,73 -60,-54,54 -61,-37,37 -62,-65,65 -63,-41,41 -64,-81,81 -65,-39,39 -66,-35,35 -67,-61,61 -68,-57,57 -69,-43,43 -70,-59,59 -71,-43,43 -72,-51,51 -73,-43,43 -74,-69,69 -75,-41,41 -76,-194,95 
-77,-35,35 -78,-35,35 -79,-81,81 -80,-65,65 -81,-35,35 -82,-47,47 -83,-53,53 -84,-165,66 -85,-69,69 -86,-35,35 -87,-56,56 -88,-164,65 -89,-45,45 -90,-43,43 -91,-43,43 -92,-29,29 -93,-69,69 -94,-33,33 -95,-57,57 -96,-29,29 -97,-55,55 -98,-61,61 -99,-162,63 -100,-55,55 -101,-31,31 -102,-53,53 -103,-39,39 -104,-55,55 -105,-25,25 -106,-33,33 -107,-49,49 -108,-65,65 -109,-45,45 -110,-29,29 -111,-25,25 -112,-51,51 -113,-43,43 -114,-47,47 -115,-41,41 -116,-27,27 -117,-31,31 -118,-180,81 -119,-43,43 -120,-25,25 -121,-47,47 -122,-65,65 -123,-29,29 -124,-31,31 -125,-49,49 -126,-25,25 -127,-45,45 -128,-49,49 -129,-37,37 -130,-49,49 -131,-25,25 -132,-29,29 -133,-35,35 -134,-37,37 -135,-43,43 -136,-31,31 -137,-51,51 -138,-25,25 -139,-51,51 -140,-21,21 -141,-35,35 -142,-31,31 -143,-41,41 -144,-29,29 -145,-27,27 -146,-47,47 -147,-27,27 -148,-23,23 -149,-45,45 -150,-31,31 -151,-33,33 -152,-31,31 -153,-148,49 -154,-41,41 -155,-25,25 -156,-29,29 -157,-31,31 -158,-33,33 -159,-27,27 -160,-27,27 -161,-29,29 -162,-27,27 -163,-27,27 -164,-23,23 -165,-35,35 -166,-21,21 -167,-23,23 -168,-23,23 -169,-41,41 -170,-39,39 -171,-21,21 -172,-25,25 -173,-23,23 -174,-31,31 -175,-21,21 -176,-155,56 -177,-21,21 -178,-37,37 -179,-17,17 -180,-19,19 -181,-39,39 -182,-25,25 -183,-25,25 -184,-19,19 -185,-29,29 -186,-29,29 -187,-25,25 -188,-25,25 -189,-42,42 -190,-21,21 -191,-21,21 -192,-25,25 -193,-29,29 -194,-15,15 -195,-21,21 -196,-17,17 -197,-29,29 -198,-25,25 -199,-15,15 -200,-33,33 -201,-37,37 -202,-17,17 -203,-29,29 -204,-17,17 -205,-25,25 -206,-23,23 -207,-25,25 -208,-25,25 -209,-25,25 -210,-21,21 -211,-37,37 -212,-17,17 -213,-17,17 -214,-23,23 -215,-31,31 -216,-21,21 -217,-21,21 -218,-21,21 -219,-19,19 -220,-17,17 -221,-27,27 -222,-23,23 -223,-15,15 -224,-19,19 -225,-43,43 -226,-15,15 -227,-25,25 -228,-19,19 -229,-19,19 -230,-19,19 -231,-19,19 -232,-21,21 -233,-23,23 -234,-31,31 -235,-23,23 -236,-19,19 -237,-29,29 -238,-30,30 -239,-19,19 -240,-17,17 -241,-24,24 -242,-15,15 -243,-23,23 -244,-21,21 
-245,-15,15 -246,-29,29 -247,-19,19 -248,-17,17 -249,-29,29 -250,-21,21 -251,-25,25 -252,-25,25 -253,-19,19 -254,-25,25 -255,-21,21 -256,-19,19 -257,-19,19 -258,-15,15 -259,-21,21 -260,-19,19 -261,-23,23 -262,-27,27 -263,-19,19 -264,-21,21 -265,-21,21 -266,-23,23 -267,-17,17 -268,-25,25 -269,-27,27 -270,-19,19 -271,-19,19 -272,-21,21 -273,-15,15 -274,-21,21 -275,-17,17 -276,-15,15 -277,-23,23 -278,-17,17 -279,-19,19 -280,-23,23 -281,-27,27 -282,-17,17 -283,-23,23 -284,-19,19 -285,-17,17 -286,-17,17 -287,-15,15 -288,-21,21 -289,-15,15 -290,-21,21 -291,-19,19 -292,-17,17 -293,-120,21 -294,-21,21 -295,-15,15 -296,-19,19 -297,-15,15 -298,-15,15 -299,-35,35 -300,-21,21 -301,-21,21 -302,-15,15 -303,-15,15 -304,-23,23 -305,-17,17 -306,-21,21 -307,-17,17 -308,-15,15 -309,-15,15 -310,-15,15 -311,-17,17 -312,-19,19 -313,-25,25 -314,-19,19 -315,-32,32 -316,-15,15 -317,-17,17 -318,-21,21 -319,-15,15 -320,-17,17 -321,-19,19 -322,-17,17 -323,-17,17 -324,-17,17 -325,-17,17 -326,-19,19 -327,-15,15 -328,-15,15 -329,-21,21 -330,-19,19 -331,-15,15 -332,-17,17 -333,-17,17 -334,-15,15 -335,-19,19 -336,-26,26 -337,-17,17 -338,-15,15 -339,-21,21 -340,-17,17 -341,-15,15 -342,-19,19 -343,-17,17 -344,-27,27 -345,-15,15 -346,-17,17 -347,-15,15 -348,-17,17 -349,-17,17 -350,-19,19 -351,-15,15 -352,-15,15 -353,-15,15 -354,-15,15 -355,-17,17 -356,-17,17 -357,-15,15 -358,-19,19 -359,-15,15 -360,-17,17 -361,-17,17 -362,-19,19 -363,-17,17 -364,-17,17 -365,-21,21 -366,-17,17 -367,-15,15 -368,-17,17 -369,-21,21 -370,-19,19 -371,-17,17 -372,-15,15 -373,-15,15 -374,-25,25 -375,-15,15 -376,-15,15 -377,-17,17 -378,-17,17 -379,-17,17 -380,-15,15 -381,-15,15 -382,-15,15 -383,-15,15 -384,-17,17 -385,-17,17 -386,-25,25 -387,-17,17 -388,-15,15 -389,-15,15 -390,-17,17 -391,-15,15 -392,-15,15 -393,-122,23 -394,-15,15 -395,-15,15 -396,-15,15 -397,-15,15 -398,-21,21 -399,-15,15 -400,-17,17 -401,-17,17 -402,-17,17 -403,-15,15 -404,-15,15 -405,-15,15 -406,-17,17 -407,-15,15 -408,-15,15 -409,-17,17 -410,-15,15 
-411,-15,15 -412,-17,17 -413,-19,19 -414,-17,17 -415,-17,17 -416,-17,17 -417,-15,15 -418,-15,15 -419,-17,17 -420,-15,15 -421,-15,15 -422,-15,15 -423,-21,21 -424,-15,15 -425,-15,15 -426,-17,17 -427,-15,15 -428,-17,17 -429,-15,15 -430,-15,15 -431,-15,15 -432,-15,15 -433,-15,15 -434,-21,21 -435,-15,15 -436,-15,15 -437,-17,17 -438,-15,15 -439,-15,15 -440,-15,15 -441,-17,17 -442,-15,15 -443,-15,15 -444,-17,17 -445,-15,15 -446,-17,17 -447,-17,17 -448,-15,15 -449,-17,17 -450,-15,15 -451,-17,17 -452,-15,15 -453,-15,15 -454,-15,15 -455,-15,15 -456,-15,15 -457,-22,22 -458,-15,15 -459,-15,15 -460,-15,15 -461,-15,15 -462,-15,15 -463,-15,15 -464,-15,15 -465,-15,15 -466,-15,15 -467,-15,15 -468,-15,15 -469,-15,15 -470,-15,15 -471,-17,17 -472,-21,21 -473,-15,15 -474,-15,15 -475,-15,15 -476,-15,15 -477,-17,17 -478,-15,15 -479,-17,17 -480,-17,17 -481,-15,15 -482,-15,15 -483,-15,15 -484,-17,17 -485,-21,21 -486,-15,15 -487,-15,15 -488,-15,15 -489,-15,15 -490,-15,15 -491,-17,17 -492,-15,15 -493,-17,17 -494,-19,19 -495,-15,15 -496,-15,15 -497,-15,15 -498,-15,15 -499,-15,15 -500,-15,15 -501,-15,15 -502,-15,15 -503,-15,15 -504,-15,15 -505,-15,15 -506,-15,15 -507,-15,15 -508,-17,17 -509,-15,15 -510,-15,15 -511,-15,15 -512,-15,15 -513,-15,15 -514,-15,15 -515,-15,15 -516,-17,17 -517,-15,15 -518,-15,15 -519,-15,15 -520,-15,15 -521,-15,15 -522,-15,15 -523,-15,15 -524,-15,15 -525,-15,15 -526,-15,15 -527,-15,15 -528,-15,15 -529,-15,15 -530,-15,15 -531,-17,17 -532,-17,17 -533,-15,15 -534,-17,17 -535,-15,15 -536,-15,15 -537,-15,15 -538,-15,15 -539,-15,15 -540,-15,15 -541,-15,15 -542,-19,19 -543,-15,15 -544,-15,15 -545,-15,15 -546,-17,17 -547,-15,15 -548,-15,15 -549,-15,15 -550,-15,15 -551,-15,15 -552,-15,15 -553,-15,15 -554,-15,15 -555,-15,15 -556,-15,15 -557,-15,15 -558,-15,15 -559,-15,15 -560,-15,15 -561,-15,15 -562,-15,15 -563,-20,20 -564,-17,17 -565,-15,15 -566,-15,15 -567,-15,15 -568,-15,15 -569,-15,15 -570,-17,17 -571,-17,17 -572,-15,15 -573,-15,15 -574,-15,15 -575,-15,15 -576,-17,17 
-577,-15,15 -578,-15,15 -579,-15,15 -580,-15,15 -581,-15,15 -582,-15,15 -583,-17,17 -584,-15,15 -585,-15,15 -586,-15,15 -587,-15,15 -588,-15,15 -589,-15,15 -590,-15,15 -591,-15,15 -592,-15,15 -593,-15,15 -594,-17,17 -595,-15,15 -596,-15,15 -597,-15,15 -598,-16,16 -599,-16,16 -600,-15,15 -601,-15,15 -602,-15,15 -603,-15,15 -604,-15,15 -605,-17,17 -606,-15,15 -607,-15,15 -608,-15,15 -609,-15,15 -610,-15,15 -611,-15,15 -612,-15,15 -613,-15,15 -614,-15,15 -615,-15,15 -616,-15,15 -617,-15,15 -618,-15,15 -619,-15,15 -620,-15,15 -621,-15,15 -622,-15,15 -623,-15,15 -624,-15,15 -625,-15,15 -626,-15,15 -627,-15,15 -628,-15,15 -629,-15,15 -630,-15,15 -631,-15,15 -632,-15,15 -633,-21,21 -634,-15,15 -635,-15,15 -636,-15,15 -637,-15,15 -638,-15,15 -639,-15,15 -640,-15,15 -641,-15,15 -642,-15,15 -643,-15,15 -644,-15,15 -645,-15,15 -646,-17,17 -647,-15,15 -648,-21,21 -649,-15,15 -650,-15,15 -651,-15,15 -652,-15,15 -653,-17,17 -654,-15,15 -655,-15,15 -656,-15,15 -657,-15,15 -658,-15,15 -659,-15,15 -660,-15,15 -661,-15,15 -662,-15,15 -663,-15,15 -664,-17,17 -665,-15,15 -666,-15,15 -667,-15,15 -668,-15,15 -669,-15,15 -670,-17,17 -671,-15,15 -672,-15,15 -673,-15,15 -674,-15,15 -675,-15,15 -676,-15,15 -677,-17,17 -678,-15,15 -679,-15,15 -680,-15,15 -681,-15,15 -682,-15,15 -683,-15,15 -684,-15,15 -685,-15,15 -686,-15,15 -687,-15,15 -688,-15,15 -689,-15,15 -690,-15,15 -691,-15,15 -692,-15,15 -693,-15,15 -694,-15,15 -695,-15,15 -696,-15,15 -697,-15,15 -698,-15,15 -699,-15,15 -700,-15,15 -701,-15,15 -702,-15,15 -703,-17,17 -704,-15,15 -705,-15,15 -706,-15,15 -707,-15,15 -708,-15,15 -709,-15,15 -710,-17,17 -711,-15,15 -712,-15,15 -713,-15,15 -714,-15,15 -715,-16,16 -716,-15,15 -717,-117,18 -718,-15,15 -719,-17,17 -720,-15,15 -721,-15,15 -722,-15,15 -723,-15,15 -724,-15,15 -725,-15,15 -726,-15,15 -727,-15,15 -728,-15,15 -729,-15,15 -730,-15,15 -731,-15,15 -732,-15,15 -733,-15,15 -734,-15,15 -735,-15,15 -736,-15,15 -737,-15,15 -738,-15,15 -739,-15,15 -740,-15,15 -741,-16,16 -742,-15,15 
-743,-17,17 -744,-15,15 -745,-19,19 -746,-15,15 -747,-15,15 -748,-15,15 -749,-17,17 -750,-15,15 -751,-15,15 -752,-17,17 -753,-15,15 -754,-15,15 -755,-15,15 -756,-15,15 -757,-17,17 -758,-15,15 -759,-15,15 -760,-15,15 -761,-17,17 -762,-15,15 -763,-15,15 -764,-15,15 -765,-15,15 -766,-15,15 -767,-17,17 -768,-15,15 -769,-15,15 -770,-15,15 -771,-15,15 -772,-15,15 -773,-15,15 -774,-17,17 -775,-15,15 -776,-15,15 -777,-15,15 -778,-15,15 -779,-15,15 -780,-15,15 -781,-15,15 -782,-15,15 -783,-15,15 -784,-15,15 -785,-15,15 -786,-15,15 -787,-15,15 -788,-15,15 -789,-15,15 -790,-15,15 -791,-15,15 -792,-17,17 -793,-15,15 -794,-15,15 -795,-15,15 -796,-15,15 -797,-15,15 -798,-17,17 -799,-15,15 diff --git a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/config.yaml b/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/config.yaml deleted file mode 100644 index 79e3694..0000000 --- a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/config.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: Sarsa - device: cpu - env_name: Racetrack-v0 - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - mode: train - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 200 - epsilon_end: 0.01 - epsilon_start: 0.9 - gamma: 0.99 - lr: 0.1 diff --git a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/logs/log.txt b/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/logs/log.txt deleted file mode 100644 index ffa79ca..0000000 --- a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/logs/log.txt +++ /dev/null @@ -1,404 +0,0 @@ -2022-10-30 02:13:15 - r - INFO: - n_states: 4, n_actions: 9 -2022-10-30 02:13:15 - r - INFO: - Start training! 
-2022-10-30 02:13:15 - r - INFO: - Env: Racetrack-v0, Algorithm: Sarsa, Device: cpu -2022-10-30 02:13:15 - r - INFO: - Episode: 1/400, Reward: -870.00, Steps:200, Epislon: 0.336 -2022-10-30 02:13:15 - r - INFO: - Episode: 2/400, Reward: -740.00, Steps:200, Epislon: 0.129 -2022-10-30 02:13:15 - r - INFO: - Episode: 3/400, Reward: -710.00, Steps:200, Epislon: 0.054 -2022-10-30 02:13:15 - r - INFO: - Episode: 4/400, Reward: -600.00, Steps:200, Epislon: 0.026 -2022-10-30 02:13:15 - r - INFO: - Episode: 5/400, Reward: -580.00, Steps:200, Epislon: 0.016 -2022-10-30 02:13:15 - r - INFO: - Episode: 6/400, Reward: -620.00, Steps:200, Epislon: 0.012 -2022-10-30 02:13:16 - r - INFO: - Episode: 7/400, Reward: -590.00, Steps:200, Epislon: 0.011 -2022-10-30 02:13:16 - r - INFO: - Episode: 8/400, Reward: -590.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 9/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 10/400, Reward: -570.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 11/400, Reward: -580.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 12/400, Reward: -580.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 13/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 14/400, Reward: -540.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 15/400, Reward: -510.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 16/400, Reward: -570.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 17/400, Reward: -560.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 18/400, Reward: -540.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 19/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 20/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r 
- INFO: - Episode: 21/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 22/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 23/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 24/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 25/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 26/400, Reward: -510.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 27/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:16 - r - INFO: - Episode: 28/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 29/400, Reward: -560.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 30/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 31/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 32/400, Reward: -359.00, Steps:149, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 33/400, Reward: -470.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 34/400, Reward: -510.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 35/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 36/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 37/400, Reward: -540.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 38/400, Reward: -560.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 39/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 40/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 41/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - 
INFO: - Episode: 42/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 43/400, Reward: -540.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 44/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 45/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 46/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 47/400, Reward: -550.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 48/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 49/400, Reward: -540.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 50/400, Reward: -420.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 51/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:17 - r - INFO: - Episode: 52/400, Reward: -510.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 53/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 54/400, Reward: -460.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 55/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 56/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 57/400, Reward: -470.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 58/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 59/400, Reward: -470.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 60/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 61/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 62/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - 
INFO: - Episode: 63/400, Reward: -450.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 64/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 65/400, Reward: -420.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 66/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 67/400, Reward: -440.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 68/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 69/400, Reward: -188.00, Steps:88, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 70/400, Reward: -327.00, Steps:167, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 71/400, Reward: -530.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 72/400, Reward: -48.00, Steps:28, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 73/400, Reward: -460.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 74/400, Reward: -460.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 75/400, Reward: -25.00, Steps:25, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 76/400, Reward: -428.00, Steps:178, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 77/400, Reward: -460.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 78/400, Reward: -341.00, Steps:151, Epislon: 0.010 -2022-10-30 02:13:18 - r - INFO: - Episode: 79/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 80/400, Reward: -346.00, Steps:156, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 81/400, Reward: -34.00, Steps:24, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 82/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 83/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - 
Episode: 84/400, Reward: -222.00, Steps:112, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 85/400, Reward: -470.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 86/400, Reward: -409.00, Steps:169, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 87/400, Reward: -139.00, Steps:59, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 88/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 89/400, Reward: -108.00, Steps:58, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 90/400, Reward: -3.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 91/400, Reward: -131.00, Steps:71, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 92/400, Reward: -355.00, Steps:145, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 93/400, Reward: -470.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 94/400, Reward: -450.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 95/400, Reward: -490.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 96/400, Reward: -425.00, Steps:185, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 97/400, Reward: -130.00, Steps:70, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 98/400, Reward: -246.00, Steps:116, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 99/400, Reward: -480.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 100/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 101/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 102/400, Reward: -63.00, Steps:33, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 103/400, Reward: -311.00, Steps:131, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 104/400, Reward: -450.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 
105/400, Reward: -520.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 106/400, Reward: -430.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 107/400, Reward: -79.00, Steps:39, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 108/400, Reward: -94.00, Steps:44, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 109/400, Reward: -37.00, Steps:27, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 110/400, Reward: -235.00, Steps:115, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 111/400, Reward: -440.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 112/400, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 113/400, Reward: -424.00, Steps:194, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 114/400, Reward: -470.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:19 - r - INFO: - Episode: 115/400, Reward: -344.00, Steps:164, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 116/400, Reward: -307.00, Steps:147, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 117/400, Reward: -82.00, Steps:52, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 118/400, Reward: -387.00, Steps:177, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 119/400, Reward: -500.00, Steps:200, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 120/400, Reward: -315.00, Steps:145, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 121/400, Reward: -289.00, Steps:119, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 122/400, Reward: -139.00, Steps:79, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 123/400, Reward: -392.00, Steps:192, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 124/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 125/400, Reward: -35.00, Steps:25, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - 
Episode: 126/400, Reward: -82.00, Steps:42, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 127/400, Reward: -134.00, Steps:64, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 128/400, Reward: -93.00, Steps:53, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 129/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 130/400, Reward: -212.00, Steps:102, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 131/400, Reward: -87.00, Steps:47, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 132/400, Reward: -70.00, Steps:40, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 133/400, Reward: -109.00, Steps:49, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 134/400, Reward: -77.00, Steps:47, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 135/400, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 136/400, Reward: -118.00, Steps:58, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 137/400, Reward: -132.00, Steps:62, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 138/400, Reward: -76.00, Steps:36, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 139/400, Reward: -93.00, Steps:63, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 140/400, Reward: -357.00, Steps:157, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 141/400, Reward: -129.00, Steps:69, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 142/400, Reward: -46.00, Steps:26, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 143/400, Reward: -60.00, Steps:30, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 144/400, Reward: -339.00, Steps:159, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 145/400, Reward: -10.00, Steps:10, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 146/400, Reward: -164.00, Steps:84, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 
147/400, Reward: -145.00, Steps:75, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 148/400, Reward: -53.00, Steps:33, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 149/400, Reward: -3.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 150/400, Reward: -55.00, Steps:35, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 151/400, Reward: -398.00, Steps:178, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 152/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 153/400, Reward: -20.00, Steps:20, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 154/400, Reward: -354.00, Steps:154, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 155/400, Reward: -439.00, Steps:189, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 156/400, Reward: -122.00, Steps:62, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 157/400, Reward: -80.00, Steps:40, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 158/400, Reward: -29.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 159/400, Reward: -185.00, Steps:85, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 160/400, Reward: -354.00, Steps:154, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 161/400, Reward: -35.00, Steps:25, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 162/400, Reward: -132.00, Steps:62, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 163/400, Reward: -155.00, Steps:75, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 164/400, Reward: -261.00, Steps:111, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 165/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 166/400, Reward: -135.00, Steps:65, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 167/400, Reward: -57.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 168/400, Reward: 
-432.00, Steps:182, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 169/400, Reward: -63.00, Steps:33, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 170/400, Reward: -119.00, Steps:59, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 171/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 172/400, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 173/400, Reward: -112.00, Steps:62, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 174/400, Reward: 1.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 175/400, Reward: -354.00, Steps:164, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 176/400, Reward: -101.00, Steps:61, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 177/400, Reward: -86.00, Steps:46, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 178/400, Reward: -33.00, Steps:23, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 179/400, Reward: -339.00, Steps:139, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 180/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 181/400, Reward: -9.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 182/400, Reward: -224.00, Steps:104, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 183/400, Reward: -11.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 184/400, Reward: -52.00, Steps:32, Epislon: 0.010 -2022-10-30 02:13:20 - r - INFO: - Episode: 185/400, Reward: -98.00, Steps:48, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 186/400, Reward: -26.00, Steps:16, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 187/400, Reward: -89.00, Steps:39, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 188/400, Reward: 1.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 189/400, Reward: -66.00, Steps:36, Epislon: 
0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 190/400, Reward: -77.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 191/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 192/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 193/400, Reward: -64.00, Steps:34, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 194/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 195/400, Reward: -10.00, Steps:10, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 196/400, Reward: -79.00, Steps:39, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 197/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 198/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 199/400, Reward: 0.00, Steps:10, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 200/400, Reward: -33.00, Steps:23, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 201/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 202/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 203/400, Reward: -110.00, Steps:50, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 204/400, Reward: -43.00, Steps:23, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 205/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 206/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 207/400, Reward: 1.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 208/400, Reward: -32.00, Steps:22, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 209/400, Reward: -77.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 210/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 
211/400, Reward: -23.00, Steps:23, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 212/400, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 213/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 214/400, Reward: 1.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 215/400, Reward: -42.00, Steps:22, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 216/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 217/400, Reward: -64.00, Steps:34, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 218/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 219/400, Reward: -2.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 220/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 221/400, Reward: -129.00, Steps:69, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 222/400, Reward: -133.00, Steps:63, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 223/400, Reward: -47.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 224/400, Reward: -11.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 225/400, Reward: -25.00, Steps:25, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 226/400, Reward: -1.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 227/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 228/400, Reward: -103.00, Steps:53, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 229/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 230/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 231/400, Reward: -67.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 232/400, Reward: -65.00, Steps:35, 
Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 233/400, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 234/400, Reward: -30.00, Steps:20, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 235/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 236/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 237/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 238/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 239/400, Reward: 1.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 240/400, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 241/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 242/400, Reward: -39.00, Steps:29, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 243/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 244/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 245/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 246/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 247/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 248/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 249/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 250/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 251/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 252/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 253/400, Reward: -57.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 
254/400, Reward: -29.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 255/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 256/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 257/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 258/400, Reward: -40.00, Steps:30, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 259/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 260/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 261/400, Reward: -30.00, Steps:20, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 262/400, Reward: -34.00, Steps:24, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 263/400, Reward: -1.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 264/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 265/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 266/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 267/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 268/400, Reward: -42.00, Steps:32, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 269/400, Reward: -17.00, Steps:17, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 270/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 271/400, Reward: -28.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 272/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 273/400, Reward: -2.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 274/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 275/400, Reward: 3.00, Steps:7, Epislon: 0.010 
-2022-10-30 02:13:21 - r - INFO: - Episode: 276/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 277/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 278/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 279/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 280/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 281/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 282/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 283/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 284/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 285/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 286/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 287/400, Reward: -1.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 288/400, Reward: -39.00, Steps:29, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 289/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 290/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 291/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 292/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 293/400, Reward: -11.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 294/400, Reward: -30.00, Steps:20, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 295/400, Reward: -18.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 296/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 297/400, Reward: 
2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 298/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 299/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 300/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 301/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 302/400, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 303/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 304/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 305/400, Reward: -55.00, Steps:35, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 306/400, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 307/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 308/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 309/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 310/400, Reward: -67.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 311/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 312/400, Reward: -20.00, Steps:20, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 313/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 314/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 315/400, Reward: -20.00, Steps:20, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 316/400, Reward: -36.00, Steps:26, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 317/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 318/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - 
INFO: - Episode: 319/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 320/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 321/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 322/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 323/400, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 324/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 325/400, Reward: -18.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 326/400, Reward: -36.00, Steps:26, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 327/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 328/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 329/400, Reward: -28.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 330/400, Reward: -31.00, Steps:21, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 331/400, Reward: -1.00, Steps:11, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 332/400, Reward: -109.00, Steps:59, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 333/400, Reward: -29.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 334/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 335/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 336/400, Reward: 0.00, Steps:10, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 337/400, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 338/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 339/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 340/400, Reward: 3.00, Steps:7, 
Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 341/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 342/400, Reward: -35.00, Steps:25, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 343/400, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 344/400, Reward: -21.00, Steps:21, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 345/400, Reward: -28.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 346/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 347/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 348/400, Reward: -28.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 349/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 350/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 351/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 352/400, Reward: -10.00, Steps:10, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 353/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 354/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 355/400, Reward: -62.00, Steps:32, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 356/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 357/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 358/400, Reward: -28.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 359/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 360/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 361/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 
362/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 363/400, Reward: -16.00, Steps:16, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 364/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 365/400, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 366/400, Reward: -18.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 367/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 368/400, Reward: -18.00, Steps:18, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 369/400, Reward: -15.00, Steps:15, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 370/400, Reward: 5.00, Steps:5, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 371/400, Reward: -29.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 372/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 373/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 374/400, Reward: 1.00, Steps:9, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 375/400, Reward: -19.00, Steps:19, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 376/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 377/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 378/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 379/400, Reward: -31.00, Steps:21, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 380/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 381/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 382/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 383/400, Reward: -14.00, Steps:14, Epislon: 0.010 -2022-10-30 
02:13:21 - r - INFO: - Episode: 384/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 385/400, Reward: 2.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 386/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 387/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 388/400, Reward: -8.00, Steps:8, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 389/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 390/400, Reward: 4.00, Steps:6, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 391/400, Reward: -13.00, Steps:13, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 392/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 393/400, Reward: -12.00, Steps:12, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 394/400, Reward: -32.00, Steps:22, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 395/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 396/400, Reward: -27.00, Steps:17, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 397/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 398/400, Reward: 3.00, Steps:7, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 399/400, Reward: -37.00, Steps:27, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Episode: 400/400, Reward: -57.00, Steps:37, Epislon: 0.010 -2022-10-30 02:13:21 - r - INFO: - Finish training! 
diff --git a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/models/checkpoint.pkl b/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/models/checkpoint.pkl deleted file mode 100644 index d950b3f..0000000 Binary files a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/models/checkpoint.pkl and /dev/null differ diff --git a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/results/learning_curve.png b/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/results/learning_curve.png deleted file mode 100644 index 0626795..0000000 Binary files a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/results/learning_curve.png and /dev/null differ diff --git a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/results/res.csv b/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/results/res.csv deleted file mode 100644 index d251623..0000000 --- a/projects/codes/Sarsa/Train_Racetrack-v0_Sarsa_20221030-021315/results/res.csv +++ /dev/null @@ -1,401 +0,0 @@ -episodes,rewards,steps -0,-870,200 -1,-740,200 -2,-710,200 -3,-600,200 -4,-580,200 -5,-620,200 -6,-590,200 -7,-590,200 -8,-520,200 -9,-570,200 -10,-580,200 -11,-580,200 -12,-500,200 -13,-540,200 -14,-510,200 -15,-570,200 -16,-560,200 -17,-540,200 -18,-490,200 -19,-490,200 -20,-530,200 -21,-520,200 -22,-530,200 -23,-520,200 -24,-500,200 -25,-510,200 -26,-520,200 -27,-530,200 -28,-560,200 -29,-490,200 -30,-530,200 -31,-359,149 -32,-470,200 -33,-510,200 -34,-520,200 -35,-500,200 -36,-540,200 -37,-560,200 -38,-500,200 -39,-480,200 -40,-490,200 -41,-480,200 -42,-540,200 -43,-500,200 -44,-500,200 -45,-480,200 -46,-550,200 -47,-490,200 -48,-540,200 -49,-420,200 -50,-530,200 -51,-510,200 -52,-530,200 -53,-460,200 -54,-480,200 -55,-480,200 -56,-470,200 -57,-490,200 -58,-470,200 -59,-500,200 -60,-500,200 -61,-480,200 -62,-450,200 -63,-490,200 -64,-420,200 -65,-480,200 -66,-440,200 -67,-490,200 -68,-188,88 -69,-327,167 -70,-530,200 -71,-48,28 
-72,-460,200 -73,-460,200 -74,-25,25 -75,-428,178 -76,-460,200 -77,-341,151 -78,-480,200 -79,-346,156 -80,-34,24 -81,-480,200 -82,-480,200 -83,-222,112 -84,-470,200 -85,-409,169 -86,-139,59 -87,-520,200 -88,-108,58 -89,-3,13 -90,-131,71 -91,-355,145 -92,-470,200 -93,-450,200 -94,-490,200 -95,-425,185 -96,-130,70 -97,-246,116 -98,-480,200 -99,-500,200 -100,-13,13 -101,-63,33 -102,-311,131 -103,-450,200 -104,-520,200 -105,-430,200 -106,-79,39 -107,-94,44 -108,-37,27 -109,-235,115 -110,-440,200 -111,-19,19 -112,-424,194 -113,-470,200 -114,-344,164 -115,-307,147 -116,-82,52 -117,-387,177 -118,-500,200 -119,-315,145 -120,-289,119 -121,-139,79 -122,-392,192 -123,-13,13 -124,-35,25 -125,-82,42 -126,-134,64 -127,-93,53 -128,2,8 -129,-212,102 -130,-87,47 -131,-70,40 -132,-109,49 -133,-77,47 -134,-17,17 -135,-118,58 -136,-132,62 -137,-76,36 -138,-93,63 -139,-357,157 -140,-129,69 -141,-46,26 -142,-60,30 -143,-339,159 -144,-10,10 -145,-164,84 -146,-145,75 -147,-53,33 -148,-3,13 -149,-55,35 -150,-398,178 -151,3,7 -152,-20,20 -153,-354,154 -154,-439,189 -155,-122,62 -156,-80,40 -157,-29,19 -158,-185,85 -159,-354,154 -160,-35,25 -161,-132,62 -162,-155,75 -163,-261,111 -164,3,7 -165,-135,65 -166,-57,37 -167,-432,182 -168,-63,33 -169,-119,59 -170,3,7 -171,-16,16 -172,-112,62 -173,1,9 -174,-354,164 -175,-101,61 -176,-86,46 -177,-33,23 -178,-339,139 -179,3,7 -180,-9,9 -181,-224,104 -182,-11,11 -183,-52,32 -184,-98,48 -185,-26,16 -186,-89,39 -187,1,9 -188,-66,36 -189,-77,37 -190,5,5 -191,2,8 -192,-64,34 -193,5,5 -194,-10,10 -195,-79,39 -196,3,7 -197,3,7 -198,0,10 -199,-33,23 -200,2,8 -201,5,5 -202,-110,50 -203,-43,23 -204,3,7 -205,-13,13 -206,1,9 -207,-32,22 -208,-77,37 -209,5,5 -210,-23,23 -211,-15,15 -212,4,6 -213,1,9 -214,-42,22 -215,-13,13 -216,-64,34 -217,-13,13 -218,-2,12 -219,5,5 -220,-129,69 -221,-133,63 -222,-47,37 -223,-11,11 -224,-25,25 -225,-1,11 -226,5,5 -227,-103,53 -228,3,7 -229,2,8 -230,-67,37 -231,-65,35 -232,-15,15 -233,-30,20 -234,3,7 -235,4,6 -236,3,7 -237,-13,13 
-238,1,9 -239,-16,16 -240,3,7 -241,-39,29 -242,3,7 -243,3,7 -244,3,7 -245,-13,13 -246,5,5 -247,3,7 -248,2,8 -249,-12,12 -250,-14,14 -251,2,8 -252,-57,37 -253,-29,19 -254,4,6 -255,2,8 -256,-13,13 -257,-40,30 -258,3,7 -259,3,7 -260,-30,20 -261,-34,24 -262,-1,11 -263,-13,13 -264,2,8 -265,5,5 -266,3,7 -267,-42,32 -268,-17,17 -269,-12,12 -270,-28,18 -271,-13,13 -272,-2,12 -273,3,7 -274,3,7 -275,3,7 -276,-14,14 -277,-14,14 -278,3,7 -279,4,6 -280,3,7 -281,5,5 -282,-13,13 -283,3,7 -284,2,8 -285,5,5 -286,-1,11 -287,-39,29 -288,5,5 -289,3,7 -290,3,7 -291,3,7 -292,-11,11 -293,-30,20 -294,-18,18 -295,-13,13 -296,2,8 -297,5,5 -298,3,7 -299,4,6 -300,2,8 -301,-15,15 -302,-14,14 -303,-13,13 -304,-55,35 -305,-19,19 -306,3,7 -307,-12,12 -308,3,7 -309,-67,37 -310,3,7 -311,-20,20 -312,4,6 -313,5,5 -314,-20,20 -315,-36,26 -316,3,7 -317,4,6 -318,2,8 -319,-12,12 -320,5,5 -321,4,6 -322,-16,16 -323,4,6 -324,-18,18 -325,-36,26 -326,3,7 -327,3,7 -328,-28,18 -329,-31,21 -330,-1,11 -331,-109,59 -332,-29,19 -333,3,7 -334,3,7 -335,0,10 -336,-15,15 -337,-12,12 -338,3,7 -339,3,7 -340,-14,14 -341,-35,25 -342,-16,16 -343,-21,21 -344,-28,18 -345,2,8 -346,-12,12 -347,-28,18 -348,3,7 -349,3,7 -350,3,7 -351,-10,10 -352,3,7 -353,3,7 -354,-62,32 -355,5,5 -356,4,6 -357,-28,18 -358,3,7 -359,3,7 -360,2,8 -361,2,8 -362,-16,16 -363,2,8 -364,-15,15 -365,-18,18 -366,3,7 -367,-18,18 -368,-15,15 -369,5,5 -370,-29,19 -371,3,7 -372,-14,14 -373,1,9 -374,-19,19 -375,3,7 -376,3,7 -377,3,7 -378,-31,21 -379,2,8 -380,3,7 -381,3,7 -382,-14,14 -383,3,7 -384,2,8 -385,3,7 -386,4,6 -387,-8,8 -388,3,7 -389,4,6 -390,-13,13 -391,3,7 -392,-12,12 -393,-32,22 -394,3,7 -395,-27,17 -396,3,7 -397,3,7 -398,-37,27 -399,-57,37 diff --git a/projects/codes/Sarsa/assets/sarsa_algo.png b/projects/codes/Sarsa/assets/sarsa_algo.png deleted file mode 100644 index 0abef7a..0000000 Binary files a/projects/codes/Sarsa/assets/sarsa_algo.png and /dev/null differ diff --git a/projects/codes/Sarsa/config/CliffWalking-v0_Sarsa_Test.yaml 
b/projects/codes/Sarsa/config/CliffWalking-v0_Sarsa_Test.yaml deleted file mode 100644 index f39b31b..0000000 --- a/projects/codes/Sarsa/config/CliffWalking-v0_Sarsa_Test.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: Sarsa - device: cpu - env_name: CliffWalking-v0 - mode: test - load_checkpoint: true - load_path: Train_CliffWalking-v0_Sarsa_20221030-021146 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.1 diff --git a/projects/codes/Sarsa/config/CliffWalking-v0_Sarsa_Train.yaml b/projects/codes/Sarsa/config/CliffWalking-v0_Sarsa_Train.yaml deleted file mode 100644 index 630ead8..0000000 --- a/projects/codes/Sarsa/config/CliffWalking-v0_Sarsa_Train.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: Sarsa - device: cpu - env_name: CliffWalking-v0 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 1 - show_fig: false - test_eps: 20 - train_eps: 800 -algo_cfg: - epsilon_decay: 300 - epsilon_end: 0.01 - epsilon_start: 0.95 - gamma: 0.95 - lr: 0.1 diff --git a/projects/codes/Sarsa/config/Racetrack-v0_Sarsa_Test.yaml b/projects/codes/Sarsa/config/Racetrack-v0_Sarsa_Test.yaml deleted file mode 100644 index e07a7e2..0000000 --- a/projects/codes/Sarsa/config/Racetrack-v0_Sarsa_Test.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: Sarsa - device: cpu - env_name: Racetrack-v0 - mode: test - load_checkpoint: true - load_path: Train_Racetrack-v0_Sarsa_20221030-021315 - max_steps: 200 - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 200 - epsilon_end: 0.01 - epsilon_start: 0.9 - gamma: 0.99 - lr: 0.1 diff --git a/projects/codes/Sarsa/config/Racetrack-v0_Sarsa_Train.yaml b/projects/codes/Sarsa/config/Racetrack-v0_Sarsa_Train.yaml deleted file mode 100644 index 
da6299f..0000000 --- a/projects/codes/Sarsa/config/Racetrack-v0_Sarsa_Train.yaml +++ /dev/null @@ -1,19 +0,0 @@ -general_cfg: - algo_name: Sarsa - device: cpu - env_name: Racetrack-v0 - mode: train - load_checkpoint: false - load_path: Train_CartPole-v1_DQN_20221026-054757 - max_steps: 200 - save_fig: true - seed: 10 - show_fig: false - test_eps: 20 - train_eps: 400 -algo_cfg: - epsilon_decay: 200 - epsilon_end: 0.01 - epsilon_start: 0.9 - gamma: 0.99 - lr: 0.1 diff --git a/projects/codes/Sarsa/config/config.py b/projects/codes/Sarsa/config/config.py deleted file mode 100644 index 9980c04..0000000 --- a/projects/codes/Sarsa/config/config.py +++ /dev/null @@ -1,35 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-10-30 01:23:07 -LastEditor: JiangJi -LastEditTime: 2022-10-30 02:01:54 -Discription: default parameters of Sarsa -''' -from common.config import GeneralConfig,AlgoConfig - -class GeneralConfigSarsa(GeneralConfig): - def __init__(self) -> None: - self.env_name = "CliffWalking-v0" # name of environment - self.algo_name = "Sarsa" # name of algorithm - self.mode = "train" # train or test - self.seed = 1 # random seed - self.device = "cpu" # device to use - self.train_eps = 400 # number of episodes for training - self.test_eps = 20 # number of episodes for testing - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = "tasks" # path to load model - self.show_fig = False # show figure or not - self.save_fig = True # save figure or not - -class AlgoConfigSarsa(AlgoConfig): - def __init__(self) -> None: - # setting epsilon_start = epsilon_end gives a fixed epsilon equal to epsilon_end - self.epsilon_start = 0.95 # epsilon start value - self.epsilon_end = 0.01 # epsilon end value - self.epsilon_decay = 300 # epsilon decay rate - self.gamma = 0.90 # discount factor - self.lr = 0.1 # learning rate \ No newline at end of file diff --git a/projects/codes/Sarsa/sarsa.py
b/projects/codes/Sarsa/sarsa.py deleted file mode 100644 index 753ee95..0000000 --- a/projects/codes/Sarsa/sarsa.py +++ /dev/null @@ -1,64 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-12 16:58:16 -LastEditor: John -LastEditTime: 2022-10-30 02:00:51 -Discription: -Environment: -''' -import numpy as np -from collections import defaultdict -import torch -import math -class Sarsa(object): - def __init__(self,cfg): - self.n_actions = cfg.n_actions - self.lr = cfg.lr - self.gamma = cfg.gamma - self.epsilon = cfg.epsilon_start - self.sample_count = 0 - self.epsilon_start = cfg.epsilon_start - self.epsilon_end = cfg.epsilon_end - self.epsilon_decay = cfg.epsilon_decay - self.Q_table = defaultdict(lambda: np.zeros(self.n_actions)) # Q table - def sample_action(self, state): - ''' another way to represent e-greedy policy - ''' - self.sample_count += 1 - self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \ - math.exp(-1. 
* self.sample_count / self.epsilon_decay) # probability of selecting a random action; decays exponentially - best_action = np.argmax(self.Q_table[str(state)]) # arrays are not hashable, so convert the state to str as the dict key - action_probs = np.ones(self.n_actions, dtype=float) * self.epsilon / self.n_actions - action_probs[best_action] += (1.0 - self.epsilon) - action = np.random.choice(np.arange(len(action_probs)), p=action_probs) - return action - def predict_action(self,state): - ''' predict action while testing - ''' - action = np.argmax(self.Q_table[str(state)]) - return action - def update(self, state, action, reward, next_state, next_action,done): - Q_predict = self.Q_table[str(state)][action] - if done: - Q_target = reward # terminal state - else: - Q_target = reward + self.gamma * self.Q_table[str(next_state)][next_action] # the only difference from Q-learning, which would use the max over next actions - self.Q_table[str(state)][action] += self.lr * (Q_target - Q_predict) - def save_model(self,path): - import dill - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save( - obj=self.Q_table, - f=path+"checkpoint.pkl", - pickle_module=dill - ) - print("Model saved!") - def load_model(self, path): - import dill - self.Q_table=torch.load(f=path+'checkpoint.pkl',pickle_module=dill) - print("Model loaded!") \ No newline at end of file diff --git a/projects/codes/Sarsa/task0.py b/projects/codes/Sarsa/task0.py deleted file mode 100644 index bdd5fda..0000000 --- a/projects/codes/Sarsa/task0.py +++ /dev/null @@ -1,106 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2022-09-19 14:48:16 -LastEditor: JiangJi -LastEditTime: 2022-10-30 02:11:31 -Discription: -''' -import sys,os -os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # avoid "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized."
-curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add path to system path -import gym -import datetime -import argparse -from envs.register import register_env -from envs.wrappers import CliffWalkingWapper -from Sarsa.sarsa import Sarsa -from common.utils import all_seed,merge_class_attrs -from common.launcher import Launcher -from config.config import GeneralConfigSarsa,AlgoConfigSarsa - -class Main(Launcher): - def __init__(self) -> None: - super().__init__() - self.cfgs['general_cfg'] = merge_class_attrs(self.cfgs['general_cfg'],GeneralConfigSarsa()) - self.cfgs['algo_cfg'] = merge_class_attrs(self.cfgs['algo_cfg'],AlgoConfigSarsa()) - - def env_agent_config(self,cfg,logger): - register_env(cfg.env_name) - env = gym.make(cfg.env_name,new_step_api=False) # create env - if cfg.env_name == 'CliffWalking-v0': - env = CliffWalkingWapper(env) - if cfg.seed !=0: # set random seed - all_seed(env,seed=cfg.seed) - try: # state dimension - n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n')) - except AttributeError: - n_states = env.observation_space.shape[0] # print(hasattr(env.observation_space, 'shape')) - n_actions = env.action_space.n # action dimension - logger.info(f"n_states: {n_states}, n_actions: {n_actions}") # print info - # add them to the cfg parameters - setattr(cfg, 'n_states', n_states) - setattr(cfg, 'n_actions', n_actions) - agent = Sarsa(cfg) - return env,agent - - def train(self,cfg,env,agent,logger): - logger.info("Start training!") - logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - steps = [] # record steps for all episodes - for i_ep in range(cfg.train_eps): - ep_reward = 0 # reward per episode - ep_step = 0 # step per episode - state = env.reset() # reset and obtain initial state - action = agent.sample_action(state) - # while True: -
for _ in range(cfg.max_steps): - next_state, reward, done, _ = env.step(action) # update env and return transitions - next_action = agent.sample_action(next_state) - agent.update(state, action, reward, next_state, next_action,done) # update agent - state = next_state # update state - action = next_action - ep_reward += reward - ep_step += 1 - if done: - break - rewards.append(ep_reward) - steps.append(ep_step) - logger.info(f'Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step:d}, Epsilon: {agent.epsilon:.3f}') - logger.info("Finish training!") - return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - - def test(self,cfg,env,agent,logger): - logger.info("Start testing!") - logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - steps = [] # record steps for all episodes - for i_ep in range(cfg.test_eps): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - for _ in range(cfg.max_steps): - action = agent.predict_action(state) - next_state, reward, done, _ = env.step(action) - state = next_state - ep_reward+=reward - ep_step+=1 - if done: - break - rewards.append(ep_reward) - steps.append(ep_step) - logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.2f}, Steps:{ep_step:d}") - logger.info("Finish testing!") - return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - -if __name__ == "__main__": - main = Main() - main.run() - - - diff --git a/projects/codes/SoftActorCritic/env_wrapper.py b/projects/codes/SoftActorCritic/env_wrapper.py deleted file mode 100644 index dfe1c4d..0000000 --- a/projects/codes/SoftActorCritic/env_wrapper.py +++ /dev/null @@ -1,30 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-04-29 12:52:11 -LastEditor: JiangJi -LastEditTime: 2021-12-22 15:36:36 -Discription: -Environment:
-''' -import gym -import numpy as np - -class NormalizedActions(gym.ActionWrapper): - def action(self, action): - low = self.action_space.low - high = self.action_space.high - - action = low + (action + 1.0) * 0.5 * (high - low) - action = np.clip(action, low, high) - - return action - - def reverse_action(self, action): - low = self.action_space.low - high = self.action_space.high - action = 2 * (action - low) / (high - low) - 1 - action = np.clip(action, -1.0, 1.0) # normalized range is [-1, 1] - return action \ No newline at end of file diff --git a/projects/codes/SoftActorCritic/model.py b/projects/codes/SoftActorCritic/model.py deleted file mode 100644 index ba04737..0000000 --- a/projects/codes/SoftActorCritic/model.py +++ /dev/null @@ -1,108 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-04-29 12:53:58 -LastEditor: JiangJi -LastEditTime: 2021-11-19 18:04:19 -Discription: -Environment: -''' -import torch -import torch.nn as nn -import torch.nn.functional as F -from torch.distributions import Normal - -device=torch.device("cuda" if torch.cuda.is_available() else "cpu") - -class ValueNet(nn.Module): - def __init__(self, n_states, hidden_dim, init_w=3e-3): - super(ValueNet, self).__init__() - - self.linear1 = nn.Linear(n_states, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - self.linear3 = nn.Linear(hidden_dim, 1) - - self.linear3.weight.data.uniform_(-init_w, init_w) - self.linear3.bias.data.uniform_(-init_w, init_w) - - def forward(self, state): - x = F.relu(self.linear1(state)) - x = F.relu(self.linear2(x)) - x = self.linear3(x) - return x - - -class SoftQNet(nn.Module): - def __init__(self, n_states, n_actions, hidden_dim, init_w=3e-3): - super(SoftQNet, self).__init__() - - self.linear1 = nn.Linear(n_states + n_actions, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - self.linear3 = nn.Linear(hidden_dim, 1) - - self.linear3.weight.data.uniform_(-init_w, init_w) -
self.linear3.bias.data.uniform_(-init_w, init_w) - - def forward(self, state, action): - x = torch.cat([state, action], 1) - x = F.relu(self.linear1(x)) - x = F.relu(self.linear2(x)) - x = self.linear3(x) - return x - - -class PolicyNet(nn.Module): - def __init__(self, n_states, n_actions, hidden_dim, init_w=3e-3, log_std_min=-20, log_std_max=2): - super(PolicyNet, self).__init__() - - self.log_std_min = log_std_min - self.log_std_max = log_std_max - - self.linear1 = nn.Linear(n_states, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - - self.mean_linear = nn.Linear(hidden_dim, n_actions) - self.mean_linear.weight.data.uniform_(-init_w, init_w) - self.mean_linear.bias.data.uniform_(-init_w, init_w) - - self.log_std_linear = nn.Linear(hidden_dim, n_actions) - self.log_std_linear.weight.data.uniform_(-init_w, init_w) - self.log_std_linear.bias.data.uniform_(-init_w, init_w) - - def forward(self, state): - x = F.relu(self.linear1(state)) - x = F.relu(self.linear2(x)) - - mean = self.mean_linear(x) - log_std = self.log_std_linear(x) - log_std = torch.clamp(log_std, self.log_std_min, self.log_std_max) - - return mean, log_std - - def evaluate(self, state, epsilon=1e-6): - mean, log_std = self.forward(state) - std = log_std.exp() - - normal = Normal(mean, std) - z = normal.sample() - action = torch.tanh(z) - - log_prob = normal.log_prob(z) - torch.log(1 - action.pow(2) + epsilon) - log_prob = log_prob.sum(-1, keepdim=True) - - return action, log_prob, z, mean, log_std - - - def get_action(self, state): - state = torch.FloatTensor(state).unsqueeze(0).to(device) - mean, log_std = self.forward(state) - std = log_std.exp() - - normal = Normal(mean, std) - z = normal.sample() - action = torch.tanh(z) - - action = action.detach().cpu().numpy() - return action[0] \ No newline at end of file diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_policy 
b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_policy deleted file mode 100644 index 9ae4e7b..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_policy and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_policy_optimizer b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_policy_optimizer deleted file mode 100644 index 49c0d2a..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_policy_optimizer and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_soft_q b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_soft_q deleted file mode 100644 index 3ff692f..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_soft_q and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_soft_q_optimizer b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_soft_q_optimizer deleted file mode 100644 index 73be931..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_soft_q_optimizer and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_value b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_value deleted file mode 100644 index 853ac6f..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_value and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_value_optimizer b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_value_optimizer deleted file mode 100644 index 79410e4..0000000 Binary files 
a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/models/sac_value_optimizer and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_ma_rewards.npy b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_ma_rewards.npy deleted file mode 100644 index eca3369..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_ma_rewards.npy and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_rewards.npy b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_rewards.npy deleted file mode 100644 index 09edb0e..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_rewards.npy and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_rewards_curve.png b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_rewards_curve.png deleted file mode 100644 index 5cc6e1d..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/test_rewards_curve.png and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_ma_rewards.npy b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_ma_rewards.npy deleted file mode 100644 index 3e1feac..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_ma_rewards.npy and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_rewards.npy b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_rewards.npy deleted file mode 100644 index 1c77a83..0000000 Binary files 
a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_rewards.npy and /dev/null differ diff --git a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_rewards_curve.png b/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_rewards_curve.png deleted file mode 100644 index 3e4c8aa..0000000 Binary files a/projects/codes/SoftActorCritic/outputs/Pendulum-v1/20211222-162722/results/train_rewards_curve.png and /dev/null differ diff --git a/projects/codes/SoftActorCritic/sac.py b/projects/codes/SoftActorCritic/sac.py deleted file mode 100644 index c67257f..0000000 --- a/projects/codes/SoftActorCritic/sac.py +++ /dev/null @@ -1,222 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-04-29 12:53:54 -LastEditor: JiangJi -LastEditTime: 2021-12-22 15:41:19 -Discription: -Environment: -''' -import copy -import torch -import torch.nn as nn -import torch.optim as optim -import torch.nn.functional as F -from torch.distributions import Normal -import numpy as np -import random -device=torch.device("cuda" if torch.cuda.is_available() else "cpu") -class ReplayBuffer: - def __init__(self, capacity): - self.capacity = capacity # capacity of the replay buffer - self.buffer = [] # storage for transitions - self.position = 0 - - def push(self, state, action, reward, next_state, done): - ''' The buffer acts as a queue: once capacity is exceeded, the oldest stored transitions are overwritten - ''' - if len(self.buffer) < self.capacity: - self.buffer.append(None) - self.buffer[self.position] = (state, action, reward, next_state, done) - self.position = (self.position + 1) % self.capacity - - def sample(self, batch_size): - batch = random.sample(self.buffer, batch_size) # randomly sample a mini-batch of transitions - state, action, reward, next_state, done = zip(*batch) # unzip into states, actions, etc. - return state, action, reward, next_state, done - - def __len__(self): - ''' Return the number of transitions currently stored - ''' - return len(self.buffer) - -class ValueNet(nn.Module): - def __init__(self, n_states, hidden_dim, init_w=3e-3): -
super(ValueNet, self).__init__() - - self.linear1 = nn.Linear(n_states, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - self.linear3 = nn.Linear(hidden_dim, 1) - - self.linear3.weight.data.uniform_(-init_w, init_w) - self.linear3.bias.data.uniform_(-init_w, init_w) - - def forward(self, state): - x = F.relu(self.linear1(state)) - x = F.relu(self.linear2(x)) - x = self.linear3(x) - return x - - -class SoftQNet(nn.Module): - def __init__(self, n_states, n_actions, hidden_dim, init_w=3e-3): - super(SoftQNet, self).__init__() - - self.linear1 = nn.Linear(n_states + n_actions, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - self.linear3 = nn.Linear(hidden_dim, 1) - - self.linear3.weight.data.uniform_(-init_w, init_w) - self.linear3.bias.data.uniform_(-init_w, init_w) - - def forward(self, state, action): - x = torch.cat([state, action], 1) - x = F.relu(self.linear1(x)) - x = F.relu(self.linear2(x)) - x = self.linear3(x) - return x - - -class PolicyNet(nn.Module): - def __init__(self, n_states, n_actions, hidden_dim, init_w=3e-3, log_std_min=-20, log_std_max=2): - super(PolicyNet, self).__init__() - - self.log_std_min = log_std_min - self.log_std_max = log_std_max - - self.linear1 = nn.Linear(n_states, hidden_dim) - self.linear2 = nn.Linear(hidden_dim, hidden_dim) - - self.mean_linear = nn.Linear(hidden_dim, n_actions) - self.mean_linear.weight.data.uniform_(-init_w, init_w) - self.mean_linear.bias.data.uniform_(-init_w, init_w) - - self.log_std_linear = nn.Linear(hidden_dim, n_actions) - self.log_std_linear.weight.data.uniform_(-init_w, init_w) - self.log_std_linear.bias.data.uniform_(-init_w, init_w) - - def forward(self, state): - x = F.relu(self.linear1(state)) - x = F.relu(self.linear2(x)) - - mean = self.mean_linear(x) - log_std = self.log_std_linear(x) - log_std = torch.clamp(log_std, self.log_std_min, self.log_std_max) - - return mean, log_std - - def evaluate(self, state, epsilon=1e-6): - mean, log_std = self.forward(state) - 
std = log_std.exp() - - normal = Normal(mean, std) - z = normal.sample() - action = torch.tanh(z) - - log_prob = normal.log_prob(z) - torch.log(1 - action.pow(2) + epsilon) - log_prob = log_prob.sum(-1, keepdim=True) - - return action, log_prob, z, mean, log_std - - - def get_action(self, state): - state = torch.FloatTensor(state).unsqueeze(0).to(device) - mean, log_std = self.forward(state) - std = log_std.exp() - - normal = Normal(mean, std) - z = normal.sample() - action = torch.tanh(z) - - action = action.detach().cpu().numpy() - return action[0] - -class SAC: - def __init__(self,n_states,n_actions,cfg) -> None: - self.batch_size = cfg.batch_size - self.memory = ReplayBuffer(cfg.capacity) - self.device = cfg.device - self.value_net = ValueNet(n_states, cfg.hidden_dim).to(self.device) - self.target_value_net = ValueNet(n_states, cfg.hidden_dim).to(self.device) - self.soft_q_net = SoftQNet(n_states, n_actions, cfg.hidden_dim).to(self.device) - self.policy_net = PolicyNet(n_states, n_actions, cfg.hidden_dim).to(self.device) - self.value_optimizer = optim.Adam(self.value_net.parameters(), lr=cfg.value_lr) - self.soft_q_optimizer = optim.Adam(self.soft_q_net.parameters(), lr=cfg.soft_q_lr) - self.policy_optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg.policy_lr) - for target_param, param in zip(self.target_value_net.parameters(), self.value_net.parameters()): - target_param.data.copy_(param.data) - self.value_criterion = nn.MSELoss() - self.soft_q_criterion = nn.MSELoss() - def update(self, gamma=0.99,mean_lambda=1e-3, - std_lambda=1e-3, - z_lambda=0.0, - soft_tau=1e-2, - ): - if len(self.memory) < self.batch_size: - return - state, action, reward, next_state, done = self.memory.sample(self.batch_size) - state = torch.FloatTensor(state).to(self.device) - next_state = torch.FloatTensor(next_state).to(self.device) - action = torch.FloatTensor(action).to(self.device) - reward = torch.FloatTensor(reward).unsqueeze(1).to(self.device) - done = 
torch.FloatTensor(np.float32(done)).unsqueeze(1).to(self.device) - expected_q_value = self.soft_q_net(state, action) - expected_value = self.value_net(state) - new_action, log_prob, z, mean, log_std = self.policy_net.evaluate(state) - - - target_value = self.target_value_net(next_state) - next_q_value = reward + (1 - done) * gamma * target_value - q_value_loss = self.soft_q_criterion(expected_q_value, next_q_value.detach()) - - expected_new_q_value = self.soft_q_net(state, new_action) - next_value = expected_new_q_value - log_prob - value_loss = self.value_criterion(expected_value, next_value.detach()) - - log_prob_target = expected_new_q_value - expected_value - policy_loss = (log_prob * (log_prob - log_prob_target).detach()).mean() - - - mean_loss = mean_lambda * mean.pow(2).mean() - std_loss = std_lambda * log_std.pow(2).mean() - z_loss = z_lambda * z.pow(2).sum(1).mean() - - policy_loss += mean_loss + std_loss + z_loss - - self.soft_q_optimizer.zero_grad() - q_value_loss.backward() - self.soft_q_optimizer.step() - - self.value_optimizer.zero_grad() - value_loss.backward() - self.value_optimizer.step() - - self.policy_optimizer.zero_grad() - policy_loss.backward() - self.policy_optimizer.step() - - for target_param, param in zip(self.target_value_net.parameters(), self.value_net.parameters()): - target_param.data.copy_( - target_param.data * (1.0 - soft_tau) + param.data * soft_tau - ) - def save(self, path): - torch.save(self.value_net.state_dict(), path + "sac_value") - torch.save(self.value_optimizer.state_dict(), path + "sac_value_optimizer") - torch.save(self.soft_q_net.state_dict(), path + "sac_soft_q") - torch.save(self.soft_q_optimizer.state_dict(), path + "sac_soft_q_optimizer") - - torch.save(self.policy_net.state_dict(), path + "sac_policy") - torch.save(self.policy_optimizer.state_dict(), path + "sac_policy_optimizer") - - def load(self, path): - self.value_net.load_state_dict(torch.load(path + "sac_value")) - 
self.value_optimizer.load_state_dict(torch.load(path + "sac_value_optimizer")) - self.target_value_net = copy.deepcopy(self.value_net) - - self.soft_q_net.load_state_dict(torch.load(path + "sac_soft_q")) - self.soft_q_optimizer.load_state_dict(torch.load(path + "sac_soft_q_optimizer")) - - self.policy_net.load_state_dict(torch.load(path + "sac_policy")) - self.policy_optimizer.load_state_dict(torch.load(path + "sac_policy_optimizer")) \ No newline at end of file diff --git a/projects/codes/SoftActorCritic/task0.py b/projects/codes/SoftActorCritic/task0.py deleted file mode 100644 index 668d289..0000000 --- a/projects/codes/SoftActorCritic/task0.py +++ /dev/null @@ -1,142 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-04-29 12:59:22 -LastEditor: JiangJi -LastEditTime: 2021-12-22 16:27:13 -Discription: -Environment: -''' -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the current file -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add parent path to sys.path - -import gym -import torch -import datetime - -from SoftActorCritic.env_wrapper import NormalizedActions -from SoftActorCritic.sac import SAC -from common.utils import save_results, make_dir -from common.utils import plot_rewards - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # get current time -algo_name = 'SAC' # algorithm name -env_name = 'Pendulum-v1' # environment name -device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use GPU if available - -class SACConfig: - def __init__(self) -> None: - self.algo_name = algo_name - self.env_name = env_name # environment name - self.device= device - self.train_eps = 300 - self.test_eps = 20 - self.max_steps = 500 # max steps per episode - self.gamma = 0.99 - self.mean_lambda=1e-3 - self.std_lambda=1e-3 - self.z_lambda=0.0 - self.soft_tau=1e-2 - self.value_lr = 3e-4 - self.soft_q_lr = 3e-4 - self.policy_lr = 3e-4 - self.capacity = 1000000 - self.hidden_dim = 256 - self.batch_size = 128 - - -class
PlotConfig: - def __init__(self) -> None: - self.algo_name = algo_name # 算法名称 - self.env_name = env_name # 环境名称 - self.device= device - self.result_path = curr_path + "/outputs/" + self.env_name + \ - '/' + curr_time + '/results/' # 保存结果的路径 - self.model_path = curr_path + "/outputs/" + self.env_name + \ - '/' + curr_time + '/models/' # 保存模型的路径 - self.save = True # 是否保存图片 - -def env_agent_config(cfg,seed=1): - env = NormalizedActions(gym.make(cfg.env_name)) - env.seed(seed) - n_actions = env.action_space.shape[0] - n_states = env.observation_space.shape[0] - agent = SAC(n_states,n_actions,cfg) - return env,agent - -def train(cfg,env,agent): - print('开始训练!') - print(f'环境:{cfg.env_name}, 算法:{cfg.algo_name}, 设备:{cfg.device}') - rewards = [] # 记录所有回合的奖励 - ma_rewards = [] # 记录所有回合的滑动平均奖励 - for i_ep in range(cfg.train_eps): - ep_reward = 0 # 记录一回合内的奖励 - state = env.reset() # 重置环境,返回初始状态 - for i_step in range(cfg.max_steps): - action = agent.policy_net.get_action(state) - next_state, reward, done, _ = env.step(action) - agent.memory.push(state, action, reward, next_state, done) - agent.update() - state = next_state - ep_reward += reward - if done: - break - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - if (i_ep+1)%10 == 0: - print(f'回合:{i_ep+1}/{cfg.train_eps}, 奖励:{ep_reward:.3f}') - print('完成训练!') - return rewards, ma_rewards - -def test(cfg,env,agent): - print('开始测试!') - print(f'环境:{cfg.env_name}, 算法:{cfg.algo_name}, 设备:{cfg.device}') - rewards = [] # 记录所有回合的奖励 - ma_rewards = [] # 记录所有回合的滑动平均奖励 - for i_ep in range(cfg.test_eps): - state = env.reset() - ep_reward = 0 - for i_step in range(cfg.max_steps): - action = agent.policy_net.get_action(state) - next_state, reward, done, _ = env.step(action) - state = next_state - ep_reward += reward - if done: - break - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) - else: - 
ma_rewards.append(ep_reward) - print(f"回合:{i_ep+1}/{cfg.test_eps},奖励:{ep_reward:.1f}") - print('完成测试!') - return rewards, ma_rewards - -if __name__ == "__main__": - cfg=SACConfig() - plot_cfg = PlotConfig() - # 训练 - env, agent = env_agent_config(cfg, seed=1) - rewards, ma_rewards = train(cfg, env, agent) - make_dir(plot_cfg.result_path, plot_cfg.model_path) # 创建保存结果和模型路径的文件夹 - agent.save(path=plot_cfg.model_path) # 保存模型 - save_results(rewards, ma_rewards, tag='train', - path=plot_cfg.result_path) # 保存结果 - plot_rewards(rewards, ma_rewards, plot_cfg, tag="train") # 画出结果 - # 测试 - env, agent = env_agent_config(cfg, seed=10) - agent.load(path=plot_cfg.model_path) # 导入模型 - rewards, ma_rewards = test(cfg, env, agent) - save_results(rewards, ma_rewards, tag='test', path=plot_cfg.result_path) # 保存结果 - plot_rewards(rewards, ma_rewards, plot_cfg, tag="test") # 画出结果 - - - - diff --git a/projects/codes/SoftActorCritic/task0_train.ipynb b/projects/codes/SoftActorCritic/task0_train.ipynb deleted file mode 100644 index 3be10c6..0000000 --- a/projects/codes/SoftActorCritic/task0_train.ipynb +++ /dev/null @@ -1,221 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import sys\n", - "from pathlib import Path\n", - "curr_path = str(Path().absolute())\n", - "parent_path = str(Path().absolute().parent)\n", - "sys.path.append(parent_path) # add current terminal path to sys.path" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import torch\n", - "import datetime\n", - "\n", - "from SAC.env import NormalizedActions\n", - "from SAC.agent import SAC\n", - "from common.utils import save_results, make_dir\n", - "from common.plot import plot_rewards\n", - "\n", - "curr_time = datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\") # obtain current time" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": 
[], - "source": [ - "class SACConfig:\n", - " def __init__(self) -> None:\n", - " self.algo = 'SAC'\n", - " self.env = 'Pendulum-v0'\n", - " self.result_path = curr_path+\"/outputs/\" +self.env+'/'+curr_time+'/results/' # path to save results\n", - " self.model_path = curr_path+\"/outputs/\" +self.env+'/'+curr_time+'/models/' # path to save models\n", - " self.train_eps = 300\n", - " self.train_steps = 500\n", - " self.test_eps = 50\n", - " self.eval_steps = 500\n", - " self.gamma = 0.99\n", - " self.mean_lambda=1e-3\n", - " self.std_lambda=1e-3\n", - " self.z_lambda=0.0\n", - " self.soft_tau=1e-2\n", - " self.value_lr = 3e-4\n", - " self.soft_q_lr = 3e-4\n", - " self.policy_lr = 3e-4\n", - " self.capacity = 1000000\n", - " self.hidden_dim = 256\n", - " self.batch_size = 128\n", - " self.device=torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "def env_agent_config(cfg,seed=1):\n", - " env = NormalizedActions(gym.make(\"Pendulum-v0\"))\n", - " env.seed(seed)\n", - " n_actions = env.action_space.shape[0]\n", - " n_states = env.observation_space.shape[0]\n", - " agent = SAC(n_states,n_actions,cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg,env,agent):\n", - " print('Start to train !')\n", - " print(f'Env: {cfg.env}, Algorithm: {cfg.algo}, Device: {cfg.device}')\n", - " rewards = []\n", - " ma_rewards = [] # moveing average reward\n", - " for i_ep in range(cfg.train_eps):\n", - " state = env.reset()\n", - " ep_reward = 0\n", - " for i_step in range(cfg.train_steps):\n", - " action = agent.policy_net.get_action(state)\n", - " next_state, reward, done, _ = env.step(action)\n", - " agent.memory.push(state, action, reward, next_state, done)\n", - " agent.update()\n", - " state = next_state\n", - " ep_reward += reward\n", - " if done:\n", - 
" break\n", - " if (i_ep+1)%10==0:\n", - " print(f\"Episode:{i_ep+1}/{cfg.train_eps}, Reward:{ep_reward:.3f}\")\n", - " rewards.append(ep_reward)\n", - " if ma_rewards:\n", - " ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward)\n", - " else:\n", - " ma_rewards.append(ep_reward) \n", - " print('Complete training!')\n", - " return rewards, ma_rewards" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def eval(cfg,env,agent):\n", - " print('Start to eval !')\n", - " print(f'Env: {cfg.env}, Algorithm: {cfg.algo}, Device: {cfg.device}')\n", - " rewards = []\n", - " ma_rewards = [] # moveing average reward\n", - " for i_ep in range(cfg.test_eps):\n", - " state = env.reset()\n", - " ep_reward = 0\n", - " for i_step in range(cfg.eval_steps):\n", - " action = agent.policy_net.get_action(state)\n", - " next_state, reward, done, _ = env.step(action)\n", - " state = next_state\n", - " ep_reward += reward\n", - " if done:\n", - " break\n", - " if (i_ep+1)%10==0:\n", - " print(f\"Episode:{i_ep+1}/{cfg.train_eps}, Reward:{ep_reward:.3f}\")\n", - " rewards.append(ep_reward)\n", - " if ma_rewards:\n", - " ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward)\n", - " else:\n", - " ma_rewards.append(ep_reward) \n", - " print('Complete evaling!')\n", - " return rewards, ma_rewards\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "ename": "DeprecatedEnv", - "evalue": "Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m~/anaconda3/envs/py37/lib/python3.7/site-packages/gym/envs/registration.py\u001b[0m in \u001b[0;36mspec\u001b[0;34m(self, path)\u001b[0m\n\u001b[1;32m 157\u001b[0m 
\u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 158\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menv_specs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mid\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 159\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mKeyError\u001b[0m: 'Pendulum-v0'", - "\nDuring handling of the above exception, another exception occurred:\n", - "\u001b[0;31mDeprecatedEnv\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# train\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0menv\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0magent\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0menv_agent_config\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcfg\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mseed\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mrewards\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mma_rewards\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcfg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0menv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0magent\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mmake_dir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcfg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult_path\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mcfg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m\u001b[0m in \u001b[0;36menv_agent_config\u001b[0;34m(cfg, seed)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0menv_agent_config\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcfg\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mseed\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0menv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNormalizedActions\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgym\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmake\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Pendulum-v0\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0menv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mseed\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseed\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mn_actions\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0menv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maction_space\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mn_states\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0menv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobservation_space\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda3/envs/py37/lib/python3.7/site-packages/gym/envs/registration.py\u001b[0m in \u001b[0;36mmake\u001b[0;34m(id, **kwargs)\u001b[0m\n\u001b[1;32m 233\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 234\u001b[0m 
\u001b[0;32mdef\u001b[0m \u001b[0mmake\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 235\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mregistry\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmake\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 236\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 237\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda3/envs/py37/lib/python3.7/site-packages/gym/envs/registration.py\u001b[0m in \u001b[0;36mmake\u001b[0;34m(self, path, **kwargs)\u001b[0m\n\u001b[1;32m 126\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 127\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Making new env: %s\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 128\u001b[0;31m \u001b[0mspec\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mspec\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 129\u001b[0m \u001b[0menv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mspec\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmake\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 130\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0menv\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - 
"\u001b[0;32m~/anaconda3/envs/py37/lib/python3.7/site-packages/gym/envs/registration.py\u001b[0m in \u001b[0;36mspec\u001b[0;34m(self, path)\u001b[0m\n\u001b[1;32m 185\u001b[0m raise error.DeprecatedEnv(\n\u001b[1;32m 186\u001b[0m \"Env {} not found (valid versions include {})\".format(\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmatching_envs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m )\n\u001b[1;32m 189\u001b[0m )\n", - "\u001b[0;31mDeprecatedEnv\u001b[0m: Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])" - ] - } - ], - "source": [ - "if __name__ == \"__main__\":\n", - " cfg=SACConfig()\n", - " \n", - " # train\n", - " env,agent = env_agent_config(cfg,seed=1)\n", - " rewards, ma_rewards = train(cfg, env, agent)\n", - " make_dir(cfg.result_path, cfg.model_path)\n", - " agent.save(path=cfg.model_path)\n", - " save_results(rewards, ma_rewards, tag='train', path=cfg.result_path)\n", - " plot_rewards(rewards, ma_rewards, tag=\"train\",\n", - " algo=cfg.algo, path=cfg.result_path)\n", - " # eval\n", - " env,agent = env_agent_config(cfg,seed=10)\n", - " agent.load(path=cfg.model_path)\n", - " rewards,ma_rewards = eval(cfg,env,agent)\n", - " save_results(rewards,ma_rewards,tag='eval',path=cfg.result_path)\n", - " plot_rewards(rewards,ma_rewards,tag=\"eval\",env=cfg.env,algo = cfg.algo,path=cfg.result_path)\n" - ] - } - ], - "metadata": { - "interpreter": { - "hash": "fe38df673a99c62a9fea33a7aceda74c9b65b12ee9d076c5851d98b692a4989a" - }, - "kernelspec": { - "display_name": "Python 3.7.10 64-bit ('mujoco': conda)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.10" - }, - "metadata": { - "interpreter": { - "hash": 
"fd81e6a9e450d5c245c1a0b5da0b03c89c450f614a13afa2acb1654375922756" - } - }, - "orig_nbformat": 2 - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/models/checkpoint.pth b/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/models/checkpoint.pth deleted file mode 100644 index fc80e6f..0000000 Binary files a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/models/checkpoint.pth and /dev/null differ diff --git a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/params.json b/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/params.json deleted file mode 100644 index 988c303..0000000 --- a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/params.json +++ /dev/null @@ -1 +0,0 @@ -{"algo_name": "SoftQ", "env_name": "CartPole-v0", "train_eps": 200, "test_eps": 20, "max_steps": 200, "gamma": 0.99, "alpha": 4, "lr": 0.0001, "memory_capacity": 50000, "batch_size": 128, "target_update": 2, "device": "cpu", "seed": 10, "result_path": "/Users/jj/Desktop/rl-tutorials/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/", "model_path": "/Users/jj/Desktop/rl-tutorials/codes/SoftQ/outputs/CartPole-v0/20220818-154333/models/", "show_fig": false, "save_fig": true} \ No newline at end of file diff --git a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/testing_curve.png b/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/testing_curve.png deleted file mode 100644 index 83750e7..0000000 Binary files a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/testing_curve.png and /dev/null differ diff --git a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/testing_results.csv b/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/testing_results.csv deleted file mode 100644 index b74878b..0000000 --- a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/testing_results.csv +++ 
/dev/null @@ -1,21 +0,0 @@ -episodes,rewards -0,200.0 -1,200.0 -2,200.0 -3,200.0 -4,200.0 -5,200.0 -6,200.0 -7,200.0 -8,199.0 -9,200.0 -10,200.0 -11,200.0 -12,200.0 -13,200.0 -14,200.0 -15,200.0 -16,200.0 -17,200.0 -18,200.0 -19,200.0 diff --git a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/training_curve.png b/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/training_curve.png deleted file mode 100644 index 9f3164b..0000000 Binary files a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/training_curve.png and /dev/null differ diff --git a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/training_results.csv b/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/training_results.csv deleted file mode 100644 index 0f52c1c..0000000 --- a/projects/codes/SoftQ/outputs/CartPole-v0/20220818-154333/results/training_results.csv +++ /dev/null @@ -1,201 +0,0 @@ -episodes,rewards -0,21.0 -1,23.0 -2,24.0 -3,27.0 -4,33.0 -5,18.0 -6,47.0 -7,18.0 -8,18.0 -9,21.0 -10,26.0 -11,31.0 -12,11.0 -13,17.0 -14,22.0 -15,16.0 -16,17.0 -17,34.0 -18,20.0 -19,11.0 -20,50.0 -21,15.0 -22,11.0 -23,39.0 -24,11.0 -25,28.0 -26,37.0 -27,26.0 -28,63.0 -29,18.0 -30,17.0 -31,13.0 -32,9.0 -33,15.0 -34,13.0 -35,21.0 -36,17.0 -37,22.0 -38,20.0 -39,31.0 -40,9.0 -41,10.0 -42,11.0 -43,15.0 -44,18.0 -45,10.0 -46,30.0 -47,14.0 -48,36.0 -49,26.0 -50,21.0 -51,15.0 -52,9.0 -53,14.0 -54,10.0 -55,27.0 -56,14.0 -57,15.0 -58,22.0 -59,12.0 -60,20.0 -61,10.0 -62,12.0 -63,29.0 -64,11.0 -65,13.0 -66,27.0 -67,50.0 -68,29.0 -69,40.0 -70,29.0 -71,18.0 -72,27.0 -73,11.0 -74,15.0 -75,10.0 -76,13.0 -77,11.0 -78,17.0 -79,13.0 -80,18.0 -81,24.0 -82,15.0 -83,34.0 -84,11.0 -85,35.0 -86,26.0 -87,9.0 -88,19.0 -89,19.0 -90,16.0 -91,25.0 -92,18.0 -93,37.0 -94,46.0 -95,88.0 -96,26.0 -97,55.0 -98,43.0 -99,141.0 -100,89.0 -101,151.0 -102,47.0 -103,56.0 -104,64.0 -105,56.0 -106,49.0 -107,87.0 -108,58.0 -109,55.0 -110,57.0 -111,165.0 -112,31.0 -113,200.0 
-114,57.0 -115,107.0 -116,46.0 -117,45.0 -118,64.0 -119,69.0 -120,67.0 -121,65.0 -122,47.0 -123,63.0 -124,134.0 -125,60.0 -126,89.0 -127,99.0 -128,51.0 -129,109.0 -130,131.0 -131,156.0 -132,118.0 -133,185.0 -134,86.0 -135,149.0 -136,138.0 -137,143.0 -138,114.0 -139,130.0 -140,139.0 -141,106.0 -142,135.0 -143,164.0 -144,156.0 -145,155.0 -146,200.0 -147,186.0 -148,64.0 -149,200.0 -150,135.0 -151,135.0 -152,168.0 -153,200.0 -154,200.0 -155,200.0 -156,167.0 -157,198.0 -158,188.0 -159,200.0 -160,200.0 -161,200.0 -162,200.0 -163,200.0 -164,200.0 -165,200.0 -166,200.0 -167,200.0 -168,189.0 -169,200.0 -170,146.0 -171,200.0 -172,200.0 -173,200.0 -174,115.0 -175,170.0 -176,200.0 -177,200.0 -178,178.0 -179,200.0 -180,200.0 -181,200.0 -182,200.0 -183,200.0 -184,200.0 -185,200.0 -186,120.0 -187,200.0 -188,200.0 -189,200.0 -190,200.0 -191,200.0 -192,200.0 -193,200.0 -194,200.0 -195,200.0 -196,200.0 -197,200.0 -198,200.0 -199,200.0 diff --git a/projects/codes/SoftQ/softq.py b/projects/codes/SoftQ/softq.py deleted file mode 100644 index a9a38e1..0000000 --- a/projects/codes/SoftQ/softq.py +++ /dev/null @@ -1,71 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F -from collections import deque -import random -from torch.distributions import Categorical -import gym -import numpy as np - -class SoftQ: - def __init__(self,n_actions,model,memory,cfg): - self.memory = memory - self.alpha = cfg.alpha - self.gamma = cfg.gamma # discount factor - self.batch_size = cfg.batch_size - self.device = torch.device(cfg.device) - self.policy_net = model.to(self.device) - self.target_net = model.to(self.device) - self.target_net.load_state_dict(self.policy_net.state_dict()) # copy parameters - self.optimizer = torch.optim.Adam(self.policy_net.parameters(), lr=cfg.lr) - self.losses = [] # save losses - - def sample_action(self,state): - state = torch.FloatTensor(state).unsqueeze(0).to(self.device) - with torch.no_grad(): - q = self.policy_net(state) - v = self.alpha * 
torch.log(torch.sum(torch.exp(q/self.alpha), dim=1, keepdim=True)).squeeze() - dist = torch.exp((q-v)/self.alpha) - dist = dist / torch.sum(dist) - c = Categorical(dist) - a = c.sample() - return a.item() - def predict_action(self,state): - state = torch.tensor(np.array(state), device=self.device, dtype=torch.float).unsqueeze(0) - with torch.no_grad(): - q = self.policy_net(state) - v = self.alpha * torch.log(torch.sum(torch.exp(q/self.alpha), dim=1, keepdim=True)).squeeze() - dist = torch.exp((q-v)/self.alpha) - dist = dist / torch.sum(dist) - c = Categorical(dist) - a = c.sample() - return a.item() - def update(self): - if len(self.memory) < self.batch_size: # when the memory capacity does not meet a batch, the network will not update - return - state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample(self.batch_size) - state_batch = torch.tensor(np.array(state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - action_batch = torch.tensor(np.array(action_batch), device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize,1) - reward_batch = torch.tensor(np.array(reward_batch), device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize,1) - next_state_batch = torch.tensor(np.array(next_state_batch), device=self.device, dtype=torch.float) # shape(batchsize,n_states) - done_batch = torch.tensor(np.array(done_batch), device=self.device, dtype=torch.float).unsqueeze(1) # shape(batchsize,1) - # print(state_batch.shape,action_batch.shape,reward_batch.shape,next_state_batch.shape,done_batch.shape) - with torch.no_grad(): - next_q = self.target_net(next_state_batch) - next_v = self.alpha * torch.log(torch.sum(torch.exp(next_q/self.alpha), dim=1, keepdim=True)) - y = reward_batch + (1 - done_batch ) * self.gamma * next_v - loss = F.mse_loss(self.policy_net(state_batch).gather(1, action_batch.long()), y) - self.losses.append(loss) - self.optimizer.zero_grad() - loss.backward() - 
self.optimizer.step() - def save_model(self, path): - from pathlib import Path - # create path - Path(path).mkdir(parents=True, exist_ok=True) - torch.save(self.target_net.state_dict(), path+'checkpoint.pth') - - def load_model(self, path): - self.target_net.load_state_dict(torch.load(path+'checkpoint.pth')) - for target_param, param in zip(self.target_net.parameters(), self.policy_net.parameters()): - param.data.copy_(target_param.data) \ No newline at end of file diff --git a/projects/codes/SoftQ/task0.py b/projects/codes/SoftQ/task0.py deleted file mode 100644 index fd67aa4..0000000 --- a/projects/codes/SoftQ/task0.py +++ /dev/null @@ -1,142 +0,0 @@ -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # current path -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add path to system path - -import argparse -import datetime -import gym -import torch -import random -import numpy as np -import torch.nn as nn -from common.memories import ReplayBufferQue -from common.models import MLP -from common.utils import save_results,all_seed,plot_rewards,save_args -from softq import SoftQ - -def get_args(): - """ hyperparameters - """ - curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - parser = argparse.ArgumentParser(description="hyperparameters") - parser.add_argument('--algo_name',default='SoftQ',type=str,help="name of algorithm") - parser.add_argument('--env_name',default='CartPole-v0',type=str,help="name of environment") - parser.add_argument('--train_eps',default=200,type=int,help="episodes of training") - parser.add_argument('--test_eps',default=20,type=int,help="episodes of testing") - parser.add_argument('--max_steps',default=200,type=int,help="maximum steps per episode") - parser.add_argument('--gamma',default=0.99,type=float,help="discounted factor") - parser.add_argument('--alpha',default=4,type=float,help="alpha") - 
parser.add_argument('--lr',default=0.0001,type=float,help="learning rate") - parser.add_argument('--memory_capacity',default=50000,type=int,help="memory capacity") - parser.add_argument('--batch_size',default=128,type=int) - parser.add_argument('--target_update',default=2,type=int) - parser.add_argument('--device',default='cpu',type=str,help="cpu or cuda") - parser.add_argument('--seed',default=10,type=int,help="seed") - parser.add_argument('--result_path',default=curr_path + "/outputs/" + parser.parse_args().env_name + \ - '/' + curr_time + '/results/' ) - parser.add_argument('--model_path',default=curr_path + "/outputs/" + parser.parse_args().env_name + \ - '/' + curr_time + '/models/' ) - parser.add_argument('--show_fig',default=False,type=bool,help="if show figure or not") - parser.add_argument('--save_fig',default=True,type=bool,help="if save figure or not") - args = parser.parse_args() - return args - -class SoftQNetwork(nn.Module): - '''Actually almost same to common.models.MLP - ''' - def __init__(self,input_dim,output_dim): - super(SoftQNetwork,self).__init__() - self.fc1 = nn.Linear(input_dim, 64) - self.relu = nn.ReLU() - self.fc2 = nn.Linear(64, 256) - self.fc3 = nn.Linear(256, output_dim) - - def forward(self, x): - x = self.relu(self.fc1(x)) - x = self.relu(self.fc2(x)) - x = self.fc3(x) - return x - -def env_agent_config(cfg): - ''' create env and agent - ''' - env = gym.make(cfg.env_name) # create env - if cfg.seed !=0: # set random seed - all_seed(env,seed=cfg.seed) - n_states = env.observation_space.shape[0] # state dimension - n_actions = env.action_space.n # action dimension - print(f"state dim: {n_states}, action dim: {n_actions}") - # model = MLP(n_states,n_actions) - model = SoftQNetwork(n_states,n_actions) - memory = ReplayBufferQue(cfg.memory_capacity) # replay buffer - agent = SoftQ(n_actions,model,memory,cfg) # create agent - return env, agent - -def train(cfg, env, agent): - ''' training - ''' - print("start training!") - print(f"Env: 
{cfg.env_name}, Algo: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - steps = [] # record steps for all episodes, sometimes need - for i_ep in range(cfg.train_eps): - ep_reward = 0 # reward per episode - ep_step = 0 - state = env.reset() # reset and obtain initial state - while True: - # for _ in range(cfg.max_steps): - ep_step += 1 - action = agent.sample_action(state) # sample action - next_state, reward, done, _ = env.step(action) # update env and return transitions - agent.memory.push((state, action, reward, next_state, done)) # save transitions - state = next_state # update next state for env - agent.update() # update agent - ep_reward += reward - if done: - break - if (i_ep + 1) % cfg.target_update == 0: # target net update, target_update means "C" in pseucodes - agent.target_net.load_state_dict(agent.policy_net.state_dict()) - steps.append(ep_step) - rewards.append(ep_reward) - if (i_ep + 1) % 10 == 0: - print(f'Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.2f}') - print("finish training!") - res_dic = {'episodes':range(len(rewards)),'rewards':rewards} - return res_dic -def test(cfg, env, agent): - print("start testing!") - print(f"Env: {cfg.env_name}, Algo: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - for i_ep in range(cfg.test_eps): - ep_reward = 0 # reward per episode - state = env.reset() # reset and obtain initial state - while True: - action = agent.predict_action(state) # predict action - next_state, reward, done, _ = env.step(action) - state = next_state - ep_reward += reward - if done: - break - rewards.append(ep_reward) - print(f'Episode: {i_ep+1}/{cfg.test_eps},Reward: {ep_reward:.2f}') - print("finish testing!") - env.close() - return {'episodes':range(len(rewards)),'rewards':rewards} - -if __name__ == "__main__": - cfg = get_args() - # 训练 - env, agent = env_agent_config(cfg) - res_dic = train(cfg, env, agent) - save_args(cfg,path = cfg.result_path) 
# 保存参数到模型路径上 - agent.save_model(path = cfg.model_path) # 保存模型 - save_results(res_dic, tag = 'train', path = cfg.result_path) - plot_rewards(res_dic['rewards'], cfg, path = cfg.result_path,tag = "train") - # 测试 - env, agent = env_agent_config(cfg) # 也可以不加,加这一行的是为了避免训练之后环境可能会出现问题,因此新建一个环境用于测试 - agent.load_model(path = cfg.model_path) # 导入模型 - res_dic = test(cfg, env, agent) - save_results(res_dic, tag='test', - path = cfg.result_path) # 保存结果 - plot_rewards(res_dic['rewards'], cfg, path = cfg.result_path,tag = "test") # 画出结果 \ No newline at end of file diff --git a/projects/codes/TD3/README.md b/projects/codes/TD3/README.md deleted file mode 100644 index 8001e9c..0000000 --- a/projects/codes/TD3/README.md +++ /dev/null @@ -1 +0,0 @@ -这是对[Implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3)](https://arxiv.org/abs/1802.09477)的复现 \ No newline at end of file diff --git a/projects/codes/TD3/agent.py b/projects/codes/TD3/agent.py deleted file mode 100644 index f77a912..0000000 --- a/projects/codes/TD3/agent.py +++ /dev/null @@ -1,177 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-12-22 10:40:05 -LastEditor: JiangJi -LastEditTime: 2021-12-22 10:43:55 -Discription: -''' -import copy -import numpy as np -import torch -import torch.nn as nn -import torch.nn.functional as F -from TD3.memory import ReplayBuffer - -class Actor(nn.Module): - - def __init__(self, input_dim, output_dim, max_action): - '''[summary] - - Args: - input_dim (int): 输入维度,这里等于n_states - output_dim (int): 输出维度,这里等于n_actions - max_action (int): action的最大值 - ''' - super(Actor, self).__init__() - - self.l1 = nn.Linear(input_dim, 256) - self.l2 = nn.Linear(256, 256) - self.l3 = nn.Linear(256, output_dim) - self.max_action = max_action - - def forward(self, state): - - a = F.relu(self.l1(state)) - a = F.relu(self.l2(a)) - return self.max_action * torch.tanh(self.l3(a)) - - -class Critic(nn.Module): - def __init__(self, input_dim, 
output_dim): - super(Critic, self).__init__() - - # Q1 architecture - self.l1 = nn.Linear(input_dim + output_dim, 256) - self.l2 = nn.Linear(256, 256) - self.l3 = nn.Linear(256, 1) - - # Q2 architecture - self.l4 = nn.Linear(input_dim + output_dim, 256) - self.l5 = nn.Linear(256, 256) - self.l6 = nn.Linear(256, 1) - - - def forward(self, state, action): - sa = torch.cat([state, action], 1) - - q1 = F.relu(self.l1(sa)) - q1 = F.relu(self.l2(q1)) - q1 = self.l3(q1) - - q2 = F.relu(self.l4(sa)) - q2 = F.relu(self.l5(q2)) - q2 = self.l6(q2) - return q1, q2 - - - def Q1(self, state, action): - sa = torch.cat([state, action], 1) - - q1 = F.relu(self.l1(sa)) - q1 = F.relu(self.l2(q1)) - q1 = self.l3(q1) - return q1 - - -class TD3(object): - def __init__( - self, - input_dim, - output_dim, - max_action, - cfg, - ): - self.max_action = max_action - self.gamma = cfg.gamma - self.lr = cfg.lr - self.policy_noise = cfg.policy_noise - self.noise_clip = cfg.noise_clip - self.policy_freq = cfg.policy_freq - self.batch_size = cfg.batch_size - self.device = cfg.device - self.total_it = 0 - - self.actor = Actor(input_dim, output_dim, max_action).to(self.device) - self.actor_target = copy.deepcopy(self.actor) - self.actor_optimizer = torch.optim.Adam(self.actor.parameters(), lr=3e-4) - - self.critic = Critic(input_dim, output_dim).to(self.device) - self.critic_target = copy.deepcopy(self.critic) - self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), lr=3e-4) - self.memory = ReplayBuffer(input_dim, output_dim) - - def choose_action(self, state): - state = torch.FloatTensor(state.reshape(1, -1)).to(self.device) - return self.actor(state).cpu().data.numpy().flatten() - - def update(self): - self.total_it += 1 - - # Sample replay buffer - state, action, next_state, reward, not_done = self.memory.sample(self.batch_size) - - with torch.no_grad(): - # Select action according to policy and add clipped noise - noise = ( - torch.randn_like(action) * self.policy_noise - 
).clamp(-self.noise_clip, self.noise_clip) - - next_action = ( - self.actor_target(next_state) + noise - ).clamp(-self.max_action, self.max_action) - - # Compute the target Q value - target_Q1, target_Q2 = self.critic_target(next_state, next_action) - target_Q = torch.min(target_Q1, target_Q2) - target_Q = reward + not_done * self.gamma * target_Q - - # Get current Q estimates - current_Q1, current_Q2 = self.critic(state, action) - - # Compute critic loss - critic_loss = F.mse_loss(current_Q1, target_Q) + F.mse_loss(current_Q2, target_Q) - - # Optimize the critic - self.critic_optimizer.zero_grad() - critic_loss.backward() - self.critic_optimizer.step() - - # Delayed policy updates - if self.total_it % self.policy_freq == 0: - - # Compute actor loss - actor_loss = -self.critic.Q1(state, self.actor(state)).mean() - - # Optimize the actor - self.actor_optimizer.zero_grad() - actor_loss.backward() - self.actor_optimizer.step() - - # Soft-update the frozen target models (self.lr serves as the update rate tau) - for param, target_param in zip(self.critic.parameters(), self.critic_target.parameters()): - target_param.data.copy_(self.lr * param.data + (1 - self.lr) * target_param.data) - - for param, target_param in zip(self.actor.parameters(), self.actor_target.parameters()): - target_param.data.copy_(self.lr * param.data + (1 - self.lr) * target_param.data) - - - def save(self, path): - torch.save(self.critic.state_dict(), path + "td3_critic") - torch.save(self.critic_optimizer.state_dict(), path + "td3_critic_optimizer") - - torch.save(self.actor.state_dict(), path + "td3_actor") - torch.save(self.actor_optimizer.state_dict(), path + "td3_actor_optimizer") - - - def load(self, path): - self.critic.load_state_dict(torch.load(path + "td3_critic")) - self.critic_optimizer.load_state_dict(torch.load(path + "td3_critic_optimizer")) - self.critic_target = copy.deepcopy(self.critic) - - self.actor.load_state_dict(torch.load(path + 
"td3_actor")) - self.actor_optimizer.load_state_dict(torch.load(path +
"td3_actor_optimizer")) - self.actor_target = copy.deepcopy(self.actor) - diff --git a/projects/codes/TD3/memory.py b/projects/codes/TD3/memory.py deleted file mode 100644 index bcf38bb..0000000 --- a/projects/codes/TD3/memory.py +++ /dev/null @@ -1,44 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-04-13 11:00:13 -LastEditor: John -LastEditTime: 2021-04-15 01:25:14 -Discription: -Environment: -''' -import numpy as np -import torch - - -class ReplayBuffer(object): - def __init__(self, n_states, n_actions, max_size=int(1e6)): - self.max_size = max_size - self.ptr = 0 - self.size = 0 - self.state = np.zeros((max_size, n_states)) - self.action = np.zeros((max_size, n_actions)) - self.next_state = np.zeros((max_size, n_states)) - self.reward = np.zeros((max_size, 1)) - self.not_done = np.zeros((max_size, 1)) - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - def push(self, state, action, next_state, reward, done): - self.state[self.ptr] = state - self.action[self.ptr] = action - self.next_state[self.ptr] = next_state - self.reward[self.ptr] = reward - self.not_done[self.ptr] = 1. 
- done - self.ptr = (self.ptr + 1) % self.max_size - self.size = min(self.size + 1, self.max_size) - - def sample(self, batch_size): - ind = np.random.randint(0, self.size, size=batch_size) - return ( - torch.FloatTensor(self.state[ind]).to(self.device), - torch.FloatTensor(self.action[ind]).to(self.device), - torch.FloatTensor(self.next_state[ind]).to(self.device), - torch.FloatTensor(self.reward[ind]).to(self.device), - torch.FloatTensor(self.not_done[ind]).to(self.device) - ) \ No newline at end of file diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_actor b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_actor deleted file mode 100644 index 2b3b481..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_actor and /dev/null differ diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_actor_optimizer b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_actor_optimizer deleted file mode 100644 index 9bb6195..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_actor_optimizer and /dev/null differ diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_critic b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_critic deleted file mode 100644 index cccfb71..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_critic and /dev/null differ diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_critic_optimizer b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_critic_optimizer deleted file mode 100644 index 1446c66..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/models/td3_critic_optimizer and /dev/null differ diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/ma_rewards_train.npy 
b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/ma_rewards_train.npy deleted file mode 100644 index 96d40db..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/ma_rewards_train.npy and /dev/null differ diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/rewards_curve_train.png b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/rewards_curve_train.png deleted file mode 100644 index e310371..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/rewards_curve_train.png and /dev/null differ diff --git a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/rewards_train.npy b/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/rewards_train.npy deleted file mode 100644 index 718e407..0000000 Binary files a/projects/codes/TD3/outputs/HalfCheetah-v2/20210416-130341/results/rewards_train.npy and /dev/null differ diff --git a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_actor b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_actor deleted file mode 100644 index 40533d9..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_actor and /dev/null differ diff --git a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_actor_optimizer b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_actor_optimizer deleted file mode 100644 index e91a68f..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_actor_optimizer and /dev/null differ diff --git a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_critic b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_critic deleted file mode 100644 index ef6b3e5..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_critic and /dev/null differ diff --git 
a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_critic_optimizer b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_critic_optimizer deleted file mode 100644 index 8094beb..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/models/td3_critic_optimizer and /dev/null differ diff --git a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_ma_rewards.npy b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_ma_rewards.npy deleted file mode 100644 index 288eb69..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_ma_rewards.npy and /dev/null differ diff --git a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_rewards.npy b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_rewards.npy deleted file mode 100644 index 5bdee4a..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_rewards.npy and /dev/null differ diff --git a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_rewards_curve.png b/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_rewards_curve.png deleted file mode 100644 index 31e873c..0000000 Binary files a/projects/codes/TD3/outputs/Pendulum-v1/20211119-123814/results/train_rewards_curve.png and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/ma_rewards_train.npy b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/ma_rewards_train.npy deleted file mode 100644 index 017dbba..0000000 Binary files a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/ma_rewards_train.npy and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/rewards_curve_train.png b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/rewards_curve_train.png deleted file mode 100644 index 098872d..0000000 Binary files 
a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/rewards_curve_train.png and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/rewards_train.npy b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/rewards_train.npy deleted file mode 100644 index 3ef20c3..0000000 Binary files a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/rewards_train.npy and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_actor b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_actor deleted file mode 100644 index 10e7154..0000000 Binary files a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_actor and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_actor_optimizer b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_actor_optimizer deleted file mode 100644 index ac8989e..0000000 Binary files a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_actor_optimizer and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_critic b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_critic deleted file mode 100644 index 5e16302..0000000 Binary files a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_critic and /dev/null differ diff --git a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_critic_optimizer b/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_critic_optimizer deleted file mode 100644 index 3b7d759..0000000 Binary files a/projects/codes/TD3/outputs/Reacher-v2/20210415-021952/td3_critic_optimizer and /dev/null differ diff --git a/projects/codes/TD3/task0_eval.py b/projects/codes/TD3/task0_eval.py deleted file mode 100644 index cb977b4..0000000 --- a/projects/codes/TD3/task0_eval.py +++ /dev/null @@ -1,89 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-04-23 20:36:23 -LastEditor: JiangJi -LastEditTime: 
2021-04-23 20:37:22 -Discription: -Environment: -''' -import sys,os -curr_path = os.path.dirname(__file__) -parent_path=os.path.dirname(curr_path) -sys.path.append(parent_path) # add current terminal path to sys.path - -import torch -import gym -import numpy as np -import datetime - - -from TD3.agent import TD3 -from common.plot import plot_rewards -from common.utils import save_results,make_dir - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - -class TD3Config: - def __init__(self) -> None: - self.algo = 'TD3 and Random' - self.env = 'HalfCheetah-v2' - self.seed = 0 - self.result_path = curr_path+"/results/" +self.env+'/'+curr_time+'/results/' # path to save results - self.model_path = curr_path+"/results/" +self.env+'/'+curr_time+'/models/' # path to save models - self.start_timestep = 25e3 # Time steps initial random policy is used - self.eval_freq = 5e3 # How often (time steps) we evaluate - self.max_timestep = 200000 # Max time steps to run environment - self.expl_noise = 0.1 # Std of Gaussian exploration noise - self.batch_size = 256 # Batch size for both actor and critic - self.gamma = 0.99 # gamma factor - self.lr = 0.0005 # Target network update rate - self.policy_noise = 0.2 # Noise added to target policy during critic update - self.noise_clip = 0.5 # Range to clip target policy noise - self.policy_freq = 2 # Frequency of delayed policy updates - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - -# Runs the policy for eval_episodes episodes and returns the per-episode and moving-average rewards -# A fixed seed is used for the eval environment -def eval(env_name,agent, seed, eval_episodes=50): - eval_env = gym.make(env_name) - eval_env.seed(seed + 100) - rewards,ma_rewards =[],[] - for i_episode in range(eval_episodes): - ep_reward = 0 - state, done = eval_env.reset(), False - while not done: - eval_env.render() - action = 
agent.choose_action(np.array(state)) - state, reward, done, _ = eval_env.step(action) - ep_reward += reward -
print(f"Episode:{i_episode+1}, Reward:{ep_reward:.3f}") - rewards.append(ep_reward) - # compute the moving-average reward (exponential smoothing) - if ma_rewards: - ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - return rewards,ma_rewards - -if __name__ == "__main__": - cfg = TD3Config() - env = gym.make(cfg.env) - env.seed(cfg.seed) # Set seeds - torch.manual_seed(cfg.seed) - np.random.seed(cfg.seed) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.shape[0] - max_action = float(env.action_space.high[0]) - td3= TD3(n_states,n_actions,max_action,cfg) - cfg.model_path = './TD3/results/HalfCheetah-v2/20210416-130341/models/' - td3.load(cfg.model_path) - td3_rewards,td3_ma_rewards = eval(cfg.env,td3,cfg.seed) - make_dir(cfg.result_path,cfg.model_path) - save_results(td3_rewards,td3_ma_rewards,tag='eval',path=cfg.result_path) - plot_rewards({'td3_rewards':td3_rewards,'td3_ma_rewards':td3_ma_rewards,},tag="eval",env=cfg.env,algo = cfg.algo,path=cfg.result_path) - # cfg.result_path = './TD3/results/HalfCheetah-v2/20210416-130341/' - # agent.load(cfg.result_path) - # eval(cfg.env,agent, cfg.seed) \ No newline at end of file diff --git a/projects/codes/TD3/task0_train.py b/projects/codes/TD3/task0_train.py deleted file mode 100644 index 58e4af9..0000000 --- a/projects/codes/TD3/task0_train.py +++ /dev/null @@ -1,173 +0,0 @@ -import sys,os -curr_path = os.path.dirname(__file__) -parent_path=os.path.dirname(curr_path) -sys.path.append(parent_path) # add current terminal path to sys.path - -import torch -import gym -import numpy as np -import datetime - - -from TD3.agent import TD3 -from common.plot import plot_rewards -from common.utils import save_results,make_dir - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - - -class TD3Config: - def __init__(self) -> None: - self.algo = 'TD3' - self.env = 'HalfCheetah-v2' - 
self.seed = 0 - self.result_path = curr_path+"/results/" +self.env+'/'+curr_time+'/results/' #
path to save results - self.model_path = curr_path+"/results/" +self.env+'/'+curr_time+'/models/' # path to save models - self.start_timestep = 25e3 # Time steps initial random policy is used - self.eval_freq = 5e3 # How often (time steps) we evaluate - # self.train_eps = 800 - self.max_timestep = 4000000 # Max time steps to run environment - self.expl_noise = 0.1 # Std of Gaussian exploration noise - self.batch_size = 256 # Batch size for both actor and critic - self.gamma = 0.99 # gamma factor - self.lr = 0.0005 # Target network update rate - self.policy_noise = 0.2 # Noise added to target policy during critic update - self.noise_clip = 0.5 # Range to clip target policy noise - self.policy_freq = 2 # Frequency of delayed policy updates - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - -# Runs policy for X episodes and returns average reward -# A fixed seed is used for the eval environment -def eval(env,agent, seed, eval_episodes=10): - eval_env = gym.make(env) - eval_env.seed(seed + 100) - avg_reward = 0. 
- for _ in range(eval_episodes): - state, done = eval_env.reset(), False - while not done: - # eval_env.render() - action = agent.choose_action(np.array(state)) - state, reward, done, _ = eval_env.step(action) - avg_reward += reward - avg_reward /= eval_episodes - print("---------------------------------------") - print(f"Evaluation over {eval_episodes} episodes: {avg_reward:.3f}") - print("---------------------------------------") - return avg_reward - -def train(cfg,env,agent): - # Evaluate untrained policy - evaluations = [eval(cfg.env,agent, cfg.seed)] - state, done = env.reset(), False - ep_reward = 0 - ep_timesteps = 0 - episode_num = 0 - rewards = [] - ma_rewards = [] # moving average reward - for t in range(int(cfg.max_timestep)): - ep_timesteps += 1 - # Select action randomly or according to policy - if t < cfg.start_timestep: - action = env.action_space.sample() - else: - action = ( - agent.choose_action(np.array(state)) - + np.random.normal(0, max_action * cfg.expl_noise, size=n_actions) - ).clip(-max_action, max_action) - # Perform action - next_state, reward, done, _ = env.step(action) - done_bool = float(done) if ep_timesteps < env._max_episode_steps else 0 - # Store data in replay buffer - agent.memory.push(state, action, next_state, reward, done_bool) - state = next_state - ep_reward += reward - # Train agent after collecting sufficient data - if t >= cfg.start_timestep: - agent.update() - if done: - # +1 to account for 0 indexing.
+0 on ep_timesteps since it will increment +1 even if done=True - print(f"Episode:{episode_num+1}, Episode T:{ep_timesteps}, Reward:{ep_reward:.3f}") - # Reset environment - state, done = env.reset(), False - rewards.append(ep_reward) - # compute the moving-average reward (exponential smoothing) - if ma_rewards: - ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - ep_reward = 0 - ep_timesteps = 0 - episode_num += 1 - # Evaluate episode - if (t + 1) % cfg.eval_freq == 0: - evaluations.append(eval(cfg.env,agent, cfg.seed)) - return rewards, ma_rewards -# def train(cfg,env,agent): -# evaluations = [eval(cfg.env,agent,cfg.seed)] -# ep_reward = 0 -# tot_timestep = 0 -# rewards = [] -# ma_rewards = [] # moving average reward -# for i_ep in range(int(cfg.train_eps)): -# state, done = env.reset(), False -# ep_reward = 0 -# ep_timestep = 0 -# while not done: -# ep_timestep += 1 -# tot_timestep +=1 -# # Select action randomly or according to policy -# if tot_timestep < cfg.start_timestep: -# action = env.action_space.sample() -# else: -# action = ( -# agent.choose_action(np.array(state)) -# + np.random.normal(0, max_action * cfg.expl_noise, size=n_actions) -# ).clip(-max_action, max_action) -# # action = ( -# # agent.choose_action(np.array(state)) -# # + np.random.normal(0, max_action * cfg.expl_noise, size=n_actions) -# # ).clip(-max_action, max_action) -# # Perform action -# next_state, reward, done, _ = env.step(action) -# done_bool = float(done) if ep_timestep < env._max_episode_steps else 0 - -# # Store data in replay buffer -# agent.memory.push(state, action, next_state, reward, done_bool) -# state = next_state -# ep_reward += reward -# # Train agent after collecting sufficient data -# if tot_timestep >= cfg.start_timestep: -# agent.update() -# print(f"Episode:{i_ep}/{cfg.train_eps}, Episode Timestep:{ep_timestep}, Reward:{ep_reward:.3f}") -# rewards.append(ep_reward) -# # compute the moving-average reward (exponential 
smoothing) -# if ma_rewards: -# ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) -# else: -#
ma_rewards.append(ep_reward) -# # Evaluate episode -# if (i_ep+1) % cfg.eval_freq == 0: -# evaluations.append(eval(cfg.env,agent, cfg.seed)) -# return rewards,ma_rewards - - -if __name__ == "__main__": - cfg = TD3Config() - env = gym.make(cfg.env) - env.seed(cfg.seed) # Set seeds - torch.manual_seed(cfg.seed) - np.random.seed(cfg.seed) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.shape[0] - max_action = float(env.action_space.high[0]) - agent = TD3(n_states,n_actions,max_action,cfg) - rewards,ma_rewards = train(cfg,env,agent) - make_dir(cfg.result_path,cfg.model_path) - agent.save(path=cfg.model_path) - save_results(rewards,ma_rewards,tag='train',path=cfg.result_path) - plot_rewards(rewards,ma_rewards,tag="train",env=cfg.env,algo = cfg.algo,path=cfg.result_path) - # cfg.result_path = './TD3/results/HalfCheetah-v2/20210416-130341/' - # agent.load(cfg.result_path) - # eval(cfg.env,agent, cfg.seed) - - diff --git a/projects/codes/TD3/task1_eval.py b/projects/codes/TD3/task1_eval.py deleted file mode 100644 index 0d28c48..0000000 --- a/projects/codes/TD3/task1_eval.py +++ /dev/null @@ -1,83 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: JiangJi -Email: johnjim0816@gmail.com -Date: 2021-04-23 20:36:23 -LastEditor: JiangJi -LastEditTime: 2021-04-28 10:14:33 -Discription: -Environment: -''' -import sys,os -curr_path = os.path.dirname(__file__) -parent_path=os.path.dirname(curr_path) -sys.path.append(parent_path) # add current terminal path to sys.path - -import torch -import gym -import numpy as np -import datetime - - -from TD3.agent import TD3 -from common.plot import plot_rewards -from common.utils import save_results,make_dir - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - -class TD3Config: - def __init__(self) -> None: - self.algo = 'TD3' - self.env = 'Pendulum-v0' - self.seed = 0 - self.result_path = curr_path+"/results/" +self.env+'/'+curr_time+'/results/' # path to save results - 
self.model_path = curr_path+"/results/" +self.env+'/'+curr_time+'/models/' # path to save models - self.batch_size = 256 # Batch size for both actor and critic - self.gamma = 0.99 # gamma factor - self.lr = 0.0005 # Target network update rate - self.policy_noise = 0.2 # Noise added to target policy during critic update - self.noise_clip = 0.5 # Range to clip target policy noise - self.policy_freq = 2 # Frequency of delayed policy updates - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - -# Runs the policy for eval_episodes episodes and returns the per-episode and moving-average rewards -# A fixed seed is used for the eval environment -def eval(env_name,agent, seed, eval_episodes=50): - eval_env = gym.make(env_name) - eval_env.seed(seed + 100) - rewards,ma_rewards =[],[] - for i_episode in range(eval_episodes): - ep_reward = 0 - state, done = eval_env.reset(), False - while not done: - # eval_env.render() - action = agent.choose_action(np.array(state)) - state, reward, done, _ = eval_env.step(action) - ep_reward += reward - print(f"Episode:{i_episode+1}, Reward:{ep_reward:.3f}") - rewards.append(ep_reward) - # compute the moving-average reward (exponential smoothing) - if ma_rewards: - ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - return rewards,ma_rewards - -if __name__ == "__main__": - cfg = TD3Config() - env = gym.make(cfg.env) - env.seed(cfg.seed) # Set seeds - torch.manual_seed(cfg.seed) - np.random.seed(cfg.seed) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.shape[0] - max_action = float(env.action_space.high[0]) - td3= TD3(n_states,n_actions,max_action,cfg) - cfg.model_path = './TD3/results/Pendulum-v0/20210428-092059/models/' - cfg.result_path = './TD3/results/Pendulum-v0/20210428-092059/results/' - td3.load(cfg.model_path) - rewards,ma_rewards = eval(cfg.env,td3,cfg.seed) - make_dir(cfg.result_path,cfg.model_path) - 
save_results(rewards,ma_rewards,tag='eval',path=cfg.result_path) -
plot_rewards(rewards,ma_rewards,tag="train",env=cfg.env,algo = cfg.algo,path=cfg.result_path) \ No newline at end of file diff --git a/projects/codes/TD3/task1_train.py b/projects/codes/TD3/task1_train.py deleted file mode 100644 index 868f686..0000000 --- a/projects/codes/TD3/task1_train.py +++ /dev/null @@ -1,122 +0,0 @@ -import sys,os -curr_path = os.path.dirname(os.path.abspath(__file__)) # absolute path of the directory containing this file -parent_path = os.path.dirname(curr_path) # parent path -sys.path.append(parent_path) # add the parent path to sys.path - -import torch -import gym -import numpy as np -import datetime - -from TD3.agent import TD3 -from common.plot import plot_rewards -from common.utils import save_results,make_dir - -curr_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - - -class TD3Config: - def __init__(self) -> None: - self.algo = 'TD3' # algorithm name - self.env_name = 'Pendulum-v1' # environment name - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use the GPU if available - self.train_eps = 600 # number of training episodes - self.start_timestep = 25e3 # Time steps initial random policy is used - self.epsilon_start = 50 # Episodes initial random policy is used - self.eval_freq = 10 # How often (episodes) we evaluate - self.max_timestep = 100000 # Max time steps to run environment - self.expl_noise = 0.1 # Std of Gaussian exploration noise - self.batch_size = 256 # Batch size for both actor and critic - self.gamma = 0.9 # gamma factor - self.lr = 0.0005 # Target network update rate (used as tau by the agent) - self.policy_noise = 0.2 # Noise added to target policy during critic update - self.noise_clip = 0.3 # Range to clip target policy noise - self.policy_freq = 2 # Frequency of delayed policy updates -class PlotConfig(TD3Config): - def __init__(self) -> None: - super().__init__() - self.result_path = curr_path+"/outputs/" + self.env_name + \ - 
'/'+curr_time+'/results/' # path to save results - self.model_path = curr_path+"/outputs/" + self.env_name + \ - '/'+curr_time+'/models/' # path to save models - self.save = True # whether to save figures - - - -# Runs policy for X episodes and returns average
reward -# A fixed seed is used for the eval environment -def eval(env,agent, seed, eval_episodes=10): - eval_env = gym.make(env) - eval_env.seed(seed + 100) - avg_reward = 0. - for _ in range(eval_episodes): - state, done = eval_env.reset(), False - while not done: - # eval_env.render() - action = agent.choose_action(np.array(state)) - state, reward, done, _ = eval_env.step(action) - avg_reward += reward - avg_reward /= eval_episodes - print("---------------------------------------") - print(f"Evaluation over {eval_episodes} episodes: {avg_reward:.3f}") - print("---------------------------------------") - return avg_reward - -def train(cfg,env,agent): - print('Start training!') - print(f'Env:{cfg.env_name}, Algorithm:{cfg.algo}, Device:{cfg.device}') - rewards = [] # rewards of all episodes - ma_rewards = [] # moving-average rewards of all episodes - for i_ep in range(int(cfg.train_eps)): - ep_reward = 0 - ep_timesteps = 0 - state, done = env.reset(), False - while not done: - ep_timesteps += 1 - # Select action randomly or according to policy - if i_ep < cfg.epsilon_start: - action = env.action_space.sample() - else: - action = ( - agent.choose_action(np.array(state)) - + np.random.normal(0, max_action * cfg.expl_noise, size=n_actions) - ).clip(-max_action, max_action) - # Perform action - next_state, reward, done, _ = env.step(action) - done_bool = float(done) if ep_timesteps < env._max_episode_steps else 0 - # Store data in replay buffer - agent.memory.push(state, action, next_state, reward, done_bool) - state = next_state - ep_reward += reward - # Train agent after collecting sufficient data - if i_ep+1 >= cfg.epsilon_start: - agent.update() - if (i_ep+1)%10 == 0: - print('Episode:{}/{}, Reward:{:.2f}'.format(i_ep+1, cfg.train_eps, ep_reward)) - rewards.append(ep_reward) - if ma_rewards: - ma_rewards.append(0.9*ma_rewards[-1]+0.1*ep_reward) - else: - ma_rewards.append(ep_reward) - print('Training finished!') - return rewards, 
ma_rewards - - -if __name__ == "__main__": - cfg = TD3Config() - plot_cfg = PlotConfig() - env =
gym.make(cfg.env_name) - env.seed(1) # random seed - torch.manual_seed(1) - np.random.seed(1) - n_states = env.observation_space.shape[0] - n_actions = env.action_space.shape[0] - max_action = float(env.action_space.high[0]) - agent = TD3(n_states,n_actions,max_action,cfg) - rewards,ma_rewards = train(cfg,env,agent) - make_dir(plot_cfg.result_path,plot_cfg.model_path) - agent.save(path=plot_cfg.model_path) - save_results(rewards,ma_rewards,tag='train',path=plot_cfg.result_path) - plot_rewards(rewards,ma_rewards,plot_cfg,tag="train") - - diff --git a/projects/codes/assets/image-20200820174307301.png b/projects/codes/assets/image-20200820174307301.png deleted file mode 100644 index 1197da0..0000000 Binary files a/projects/codes/assets/image-20200820174307301.png and /dev/null differ diff --git a/projects/codes/assets/image-20200820174814084.png b/projects/codes/assets/image-20200820174814084.png deleted file mode 100644 index 4c9e3dc..0000000 Binary files a/projects/codes/assets/image-20200820174814084.png and /dev/null differ diff --git a/projects/codes/common/atari_wrappers.py b/projects/codes/common/atari_wrappers.py deleted file mode 100644 index 48dab94..0000000 --- a/projects/codes/common/atari_wrappers.py +++ /dev/null @@ -1,284 +0,0 @@ -import numpy as np -import os -os.environ.setdefault('PATH', '') -from collections import deque -import gym -from gym import spaces -import cv2 -cv2.ocl.setUseOpenCL(False) -from .wrappers import TimeLimit - - -class NoopResetEnv(gym.Wrapper): - def __init__(self, env, noop_max=30): - """Sample initial states by taking random number of no-ops on reset. - No-op is assumed to be action 0.
- """ - gym.Wrapper.__init__(self, env) - self.noop_max = noop_max - self.override_num_noops = None - self.noop_action = 0 - assert env.unwrapped.get_action_meanings()[0] == 'NOOP' - - def reset(self, **kwargs): - """ Do no-op action for a number of steps in [1, noop_max].""" - self.env.reset(**kwargs) - if self.override_num_noops is not None: - noops = self.override_num_noops - else: - noops = self.unwrapped.np_random.randint(1, self.noop_max + 1) #pylint: disable=E1101 - assert noops > 0 - obs = None - for _ in range(noops): - obs, _, done, _ = self.env.step(self.noop_action) - if done: - obs = self.env.reset(**kwargs) - return obs - - def step(self, ac): - return self.env.step(ac) - -class FireResetEnv(gym.Wrapper): - def __init__(self, env): - """Take action on reset for environments that are fixed until firing.""" - gym.Wrapper.__init__(self, env) - assert env.unwrapped.get_action_meanings()[1] == 'FIRE' - assert len(env.unwrapped.get_action_meanings()) >= 3 - - def reset(self, **kwargs): - self.env.reset(**kwargs) - obs, _, done, _ = self.env.step(1) - if done: - self.env.reset(**kwargs) - obs, _, done, _ = self.env.step(2) - if done: - self.env.reset(**kwargs) - return obs - - def step(self, ac): - return self.env.step(ac) - -class EpisodicLifeEnv(gym.Wrapper): - def __init__(self, env): - """Make end-of-life == end-of-episode, but only reset on true game over. - Done by DeepMind for the DQN and co. since it helps value estimation. 
- """ - gym.Wrapper.__init__(self, env) - self.lives = 0 - self.was_real_done = True - - def step(self, action): - obs, reward, done, info = self.env.step(action) - self.was_real_done = done - # check current lives, make loss of life terminal, - # then update lives to handle bonus lives - lives = self.env.unwrapped.ale.lives() - if lives < self.lives and lives > 0: - # for Qbert sometimes we stay in lives == 0 condition for a few frames - # so it's important to keep lives > 0, so that we only reset once - # the environment advertises done. - done = True - self.lives = lives - return obs, reward, done, info - - def reset(self, **kwargs): - """Reset only when lives are exhausted. - This way all states are still reachable even though lives are episodic, - and the learner need not know about any of this behind-the-scenes. - """ - if self.was_real_done: - obs = self.env.reset(**kwargs) - else: - # no-op step to advance from terminal/lost life state - obs, _, _, _ = self.env.step(0) - self.lives = self.env.unwrapped.ale.lives() - return obs - -class MaxAndSkipEnv(gym.Wrapper): - def __init__(self, env, skip=4): - """Return only every `skip`-th frame""" - gym.Wrapper.__init__(self, env) - # most recent raw observations (for max pooling across time steps) - self._obs_buffer = np.zeros((2,)+env.observation_space.shape, dtype=np.uint8) - self._skip = skip - - def step(self, action): - """Repeat action, sum reward, and max over last observations.""" - total_reward = 0.0 - done = None - for i in range(self._skip): - obs, reward, done, info = self.env.step(action) - if i == self._skip - 2: self._obs_buffer[0] = obs - if i == self._skip - 1: self._obs_buffer[1] = obs - total_reward += reward - if done: - break - # Note that the observation on the done=True frame - # doesn't matter - max_frame = self._obs_buffer.max(axis=0) - - return max_frame, total_reward, done, info - - def reset(self, **kwargs): - return self.env.reset(**kwargs) - -class ClipRewardEnv(gym.RewardWrapper): - 
def __init__(self, env): - gym.RewardWrapper.__init__(self, env) - - def reward(self, reward): - """Bin reward to {+1, 0, -1} by its sign.""" - return np.sign(reward) - - -class WarpFrame(gym.ObservationWrapper): - def __init__(self, env, width=84, height=84, grayscale=True, dict_space_key=None): - """ - Warp frames to 84x84 as done in the Nature paper and later work. - If the environment uses dictionary observations, `dict_space_key` can be specified which indicates which - observation should be warped. - """ - super().__init__(env) - self._width = width - self._height = height - self._grayscale = grayscale - self._key = dict_space_key - if self._grayscale: - num_colors = 1 - else: - num_colors = 3 - - new_space = gym.spaces.Box( - low=0, - high=255, - shape=(self._height, self._width, num_colors), - dtype=np.uint8, - ) - if self._key is None: - original_space = self.observation_space - self.observation_space = new_space - else: - original_space = self.observation_space.spaces[self._key] - self.observation_space.spaces[self._key] = new_space - assert original_space.dtype == np.uint8 and len(original_space.shape) == 3 - - def observation(self, obs): - if self._key is None: - frame = obs - else: - frame = obs[self._key] - - if self._grayscale: - frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY) - frame = cv2.resize( - frame, (self._width, self._height), interpolation=cv2.INTER_AREA - ) - if self._grayscale: - frame = np.expand_dims(frame, -1) - - if self._key is None: - obs = frame - else: - obs = obs.copy() - obs[self._key] = frame - return obs - - -class FrameStack(gym.Wrapper): - def __init__(self, env, k): - """Stack k last frames. - Returns lazy array, which is much more memory efficient. 
- See Also - -------- - baselines.common.atari_wrappers.LazyFrames - """ - gym.Wrapper.__init__(self, env) - self.k = k - self.frames = deque([], maxlen=k) - shp = env.observation_space.shape - self.observation_space = spaces.Box(low=0, high=255, shape=(shp[:-1] + (shp[-1] * k,)), dtype=env.observation_space.dtype) - - def reset(self): - ob = self.env.reset() - for _ in range(self.k): - self.frames.append(ob) - return self._get_ob() - - def step(self, action): - ob, reward, done, info = self.env.step(action) - self.frames.append(ob) - return self._get_ob(), reward, done, info - - def _get_ob(self): - assert len(self.frames) == self.k - return LazyFrames(list(self.frames)) - -class ScaledFloatFrame(gym.ObservationWrapper): - def __init__(self, env): - gym.ObservationWrapper.__init__(self, env) - self.observation_space = gym.spaces.Box(low=0, high=1, shape=env.observation_space.shape, dtype=np.float32) - - def observation(self, observation): - # careful! This undoes the memory optimization, use - # with smaller replay buffers only. - return np.array(observation).astype(np.float32) / 255.0 - -class LazyFrames(object): - def __init__(self, frames): - """This object ensures that common frames between the observations are only stored once. - It exists purely to optimize memory usage which can be huge for DQN's 1M frames replay - buffers. - This object should only be converted to numpy array before being passed to the model. 
- You'd not believe how complex the previous solution was.""" - self._frames = frames - self._out = None - - def _force(self): - if self._out is None: - self._out = np.concatenate(self._frames, axis=-1) - self._frames = None - return self._out - - def __array__(self, dtype=None): - out = self._force() - if dtype is not None: - out = out.astype(dtype) - return out - - def __len__(self): - return len(self._force()) - - def __getitem__(self, i): - return self._force()[i] - - def count(self): - frames = self._force() - return frames.shape[frames.ndim - 1] - - def frame(self, i): - return self._force()[..., i] - -def make_atari(env_id, max_episode_steps=None): - env = gym.make(env_id) - assert 'NoFrameskip' in env.spec.id - env = NoopResetEnv(env, noop_max=30) - env = MaxAndSkipEnv(env, skip=4) - if max_episode_steps is not None: - env = TimeLimit(env, max_episode_steps=max_episode_steps) - return env - -def wrap_deepmind(env, episode_life=True, clip_rewards=True, frame_stack=False, scale=False): - """Configure environment for DeepMind-style Atari. 
- """ - if episode_life: - env = EpisodicLifeEnv(env) - if 'FIRE' in env.unwrapped.get_action_meanings(): - env = FireResetEnv(env) - env = WarpFrame(env) - if scale: - env = ScaledFloatFrame(env) - if clip_rewards: - env = ClipRewardEnv(env) - if frame_stack: - env = FrameStack(env, 4) - return env \ No newline at end of file diff --git a/projects/codes/common/config.py b/projects/codes/common/config.py deleted file mode 100644 index da0beb9..0000000 --- a/projects/codes/common/config.py +++ /dev/null @@ -1,38 +0,0 @@ - -class DefaultConfig: - def __init__(self) -> None: - pass - def print_cfg(self): - print(self.__dict__) -class GeneralConfig(DefaultConfig): - def __init__(self) -> None: - self.env_name = "CartPole-v1" # name of environment - self.algo_name = "DQN" # name of algorithm - self.mode = "train" # train or test - self.seed = 0 # random seed - self.device = "cuda" # device to use - self.train_eps = 200 # number of episodes for training - self.test_eps = 20 # number of episodes for testing - self.eval_eps = 10 # number of episodes for evaluation - self.eval_per_episode = 5 # evaluation per episode - self.max_steps = 200 # max steps for each episode - self.load_checkpoint = False - self.load_path = None # path to load model - self.show_fig = False # show figure or not - self.save_fig = True # save figure or not - -class AlgoConfig(DefaultConfig): - def __init__(self) -> None: - # set epsilon_start=epsilon_end can obtain fixed epsilon=epsilon_end - # self.epsilon_start = 0.95 # epsilon start value - # self.epsilon_end = 0.01 # epsilon end value - # self.epsilon_decay = 500 # epsilon decay rate - self.gamma = 0.95 # discount factor - # self.lr = 0.0001 # learning rate - # self.buffer_size = 100000 # size of replay buffer - # self.batch_size = 64 # batch size - # self.target_update = 4 # target network update frequency -class MergedConfig: - def __init__(self) -> None: - pass - \ No newline at end of file diff --git a/projects/codes/common/launcher.py 
b/projects/codes/common/launcher.py deleted file mode 100644 index 2c0793c..0000000 --- a/projects/codes/common/launcher.py +++ /dev/null @@ -1,124 +0,0 @@ -from common.utils import get_logger,save_results,save_cfgs,plot_rewards,merge_class_attrs,load_cfgs -from common.config import GeneralConfig,AlgoConfig,MergedConfig -import time -from pathlib import Path -import datetime -import argparse - -class Launcher: - def __init__(self) -> None: - self.get_cfg() - def get_cfg(self): - self.cfgs = {'general_cfg':GeneralConfig(),'algo_cfg':AlgoConfig()} # create config - def process_yaml_cfg(self): - ''' load yaml config - ''' - parser = argparse.ArgumentParser(description="hyperparameters") - parser.add_argument('--yaml', default = None, type=str,help='the path of config file') - args = parser.parse_args() - if args.yaml is not None: - load_cfgs(self.cfgs, args.yaml) - def print_cfg(self,cfg): - ''' print parameters - ''' - cfg_dict = vars(cfg) - print("Hyperparameters:") - print(''.join(['=']*80)) - tplt = "{:^20}\t{:^20}\t{:^20}" - print(tplt.format("Name", "Value", "Type")) - for k,v in cfg_dict.items(): - print(tplt.format(k,v,str(type(v)))) - print(''.join(['=']*80)) - def env_agent_config(self,cfg,logger): - env,agent = None,None - return env,agent - def train_one_episode(self,env, agent, cfg): - ep_reward = 0 - ep_step = 0 - return agent,ep_reward,ep_step - def test_one_episode(self, env, agent, cfg): - ep_reward = 0 - ep_step = 0 - return agent,ep_reward,ep_step - def evaluate(self, env, agent, cfg): - sum_eval_reward = 0 - for _ in range(cfg.eval_eps): - _,eval_ep_reward,_ = self.test_one_episode(env, agent, cfg) - sum_eval_reward += eval_ep_reward - mean_eval_reward = sum_eval_reward/cfg.eval_eps - return mean_eval_reward - # def train(self,cfg, env, agent,logger): - # res_dic = {} - # return res_dic - # def test(self,cfg, env, agent,logger): - # res_dic = {} - # return res_dic - def create_path(self,cfg): - curr_time = 
datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # obtain current time - self.task_dir = f"{cfg.mode.capitalize()}_{cfg.env_name}_{cfg.algo_name}_{curr_time}" - Path(self.task_dir).mkdir(parents=True, exist_ok=True) - self.model_dir = f"{self.task_dir}/models/" - self.res_dir = f"{self.task_dir}/results/" - self.log_dir = f"{self.task_dir}/logs/" - def run(self): - self.process_yaml_cfg() # load yaml config - cfg = MergedConfig() # merge config - cfg = merge_class_attrs(cfg,self.cfgs['general_cfg']) - cfg = merge_class_attrs(cfg,self.cfgs['algo_cfg']) - self.print_cfg(cfg) # print the configuration - self.create_path(cfg) # create the path to save the results - logger = get_logger(self.log_dir) # create the logger - env, agent = self.env_agent_config(cfg,logger) - if cfg.load_checkpoint: - agent.load_model(f"{cfg.load_path}/models/") - logger.info(f"Start {cfg.mode}ing!") - logger.info(f"Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}") - rewards = [] # record rewards for all episodes - steps = [] # record steps for all episodes - if cfg.mode.lower() == 'train': - best_ep_reward = -float('inf') - for i_ep in range(cfg.train_eps): - agent,ep_reward,ep_step = self.train_one_episode(env, agent, cfg) - logger.info(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.3f}, Step: {ep_step}") - rewards.append(ep_reward) - steps.append(ep_step) - # for _ in range - if (i_ep+1)%cfg.eval_per_episode == 0: - mean_eval_reward = self.evaluate(env, agent, cfg) - if mean_eval_reward >= best_ep_reward: # update best reward - logger.info(f"Current episode {i_ep+1} has the best eval reward: {mean_eval_reward:.3f}") - best_ep_reward = mean_eval_reward - agent.save_model(self.model_dir) # save models with best reward - # env.close() - elif cfg.mode.lower() == 'test': - for i_ep in range(cfg.test_eps): - agent,ep_reward,ep_step = self.test_one_episode(env, agent, cfg) - logger.info(f"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.3f}, Step: {ep_step}") 
- rewards.append(ep_reward) - steps.append(ep_step) - agent.save_model(self.model_dir) # save models - # env.close() - logger.info(f"Finish {cfg.mode}ing!") - res_dic = {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps} - save_results(res_dic, self.res_dir) # save results - save_cfgs(self.cfgs, self.task_dir) # save config - plot_rewards(rewards, title=f"{cfg.mode.lower()}ing curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}" ,fpath= self.res_dir) - # def run(self): - # self.process_yaml_cfg() # load yaml config - # cfg = MergedConfig() # merge config - # cfg = merge_class_attrs(cfg,self.cfgs['general_cfg']) - # cfg = merge_class_attrs(cfg,self.cfgs['algo_cfg']) - # self.print_cfg(cfg) # print the configuration - # self.create_path(cfg) # create the path to save the results - # logger = get_logger(self.log_dir) # create the logger - # env, agent = self.env_agent_config(cfg,logger) - # if cfg.load_checkpoint: - # agent.load_model(f"{cfg.load_path}/models/") - # if cfg.mode.lower() == 'train': - # res_dic = self.train(cfg, env, agent,logger) - # elif cfg.mode.lower() == 'test': - # res_dic = self.test(cfg, env, agent,logger) - # save_results(res_dic, self.res_dir) # save results - # save_cfgs(self.cfgs, self.task_dir) # save config - # agent.save_model(self.model_dir) # save models - # plot_rewards(res_dic['rewards'], title=f"{cfg.mode.lower()}ing curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}" ,fpath= self.res_dir) \ No newline at end of file diff --git a/projects/codes/common/memories.py b/projects/codes/common/memories.py deleted file mode 100644 index fd50ab9..0000000 --- a/projects/codes/common/memories.py +++ /dev/null @@ -1,207 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -@Author: John -@Email: johnjim0816@gmail.com -@Date: 2020-06-10 15:27:16 -@LastEditor: John -LastEditTime: 2022-08-28 23:44:06 -@Discription: -@Environment: python 3.7.7 -''' -import random -import numpy as np -from collections import deque -class 
ReplayBuffer: - def __init__(self, capacity): - self.capacity = capacity # 经验回放的容量 - self.buffer = [] # 缓冲区 - self.position = 0 - - def push(self, state, action, reward, next_state, done): - ''' 缓冲区是一个队列,容量超出时去掉开始存入的转移(transition) - ''' - if len(self.buffer) < self.capacity: - self.buffer.append(None) - self.buffer[self.position] = (state, action, reward, next_state, done) - self.position = (self.position + 1) % self.capacity - - def sample(self, batch_size): - batch = random.sample(self.buffer, batch_size) # 随机采出小批量转移 - state, action, reward, next_state, done = zip(*batch) # 解压成状态,动作等 - return state, action, reward, next_state, done - - def __len__(self): - ''' 返回当前存储的量 - ''' - return len(self.buffer) - -class ReplayBufferQue: - def __init__(self, capacity: int) -> None: - self.capacity = capacity - self.buffer = deque(maxlen=self.capacity) - def push(self,transitions): - '''_summary_ - Args: - trainsitions (tuple): _description_ - ''' - self.buffer.append(transitions) - def sample(self, batch_size: int, sequential: bool = False): - if batch_size > len(self.buffer): - batch_size = len(self.buffer) - if sequential: # sequential sampling - rand = random.randint(0, len(self.buffer) - batch_size) - batch = [self.buffer[i] for i in range(rand, rand + batch_size)] - return zip(*batch) - else: - batch = random.sample(self.buffer, batch_size) - return zip(*batch) - def clear(self): - self.buffer.clear() - def __len__(self): - return len(self.buffer) - -class PGReplay(ReplayBufferQue): - '''replay buffer for policy gradient based methods, each time these methods will sample all transitions - Args: - ReplayBufferQue (_type_): _description_ - ''' - def __init__(self): - self.buffer = deque() - def sample(self): - ''' sample all the transitions - ''' - batch = list(self.buffer) - return zip(*batch) - -class SumTree: - '''SumTree for the per(Prioritized Experience Replay) DQN. 
- This SumTree code is a modified version and the original code is from: - https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/5.2_Prioritized_Replay_DQN/RL_brain.py - ''' - def __init__(self, capacity: int): - self.capacity = capacity - self.data_pointer = 0 - self.n_entries = 0 - self.tree = np.zeros(2 * capacity - 1) - self.data = np.zeros(capacity, dtype = object) - - def update(self, tree_idx, p): - '''Update the sampling weight - ''' - change = p - self.tree[tree_idx] - self.tree[tree_idx] = p - - while tree_idx != 0: - tree_idx = (tree_idx - 1) // 2 - self.tree[tree_idx] += change - - def add(self, p, data): - '''Adding new data to the sumTree - ''' - tree_idx = self.data_pointer + self.capacity - 1 - self.data[self.data_pointer] = data - # print ("tree_idx=", tree_idx) - # print ("nonzero = ", np.count_nonzero(self.tree)) - self.update(tree_idx, p) - - self.data_pointer += 1 - if self.data_pointer >= self.capacity: - self.data_pointer = 0 - - if self.n_entries < self.capacity: - self.n_entries += 1 - - def get_leaf(self, v): - '''Sampling the data - ''' - parent_idx = 0 - while True: - cl_idx = 2 * parent_idx + 1 - cr_idx = cl_idx + 1 - if cl_idx >= len(self.tree): - leaf_idx = parent_idx - break - else: - if v <= self.tree[cl_idx] : - parent_idx = cl_idx - else: - v -= self.tree[cl_idx] - parent_idx = cr_idx - - data_idx = leaf_idx - self.capacity + 1 - return leaf_idx, self.tree[leaf_idx], self.data[data_idx] - - def total(self): - return int(self.tree[0]) - -class ReplayTree: - '''ReplayTree for the per(Prioritized Experience Replay) DQN. - ''' - def __init__(self, capacity): - self.capacity = capacity # the capacity for memory replay - self.tree = SumTree(capacity) - self.abs_err_upper = 1. - - ## hyper parameter for calculating the importance sampling weight - self.beta_increment_per_sampling = 0.001 - self.alpha = 0.6 - self.beta = 0.4 - self.epsilon = 0.01 - self.abs_err_upper = 1. 
- - def __len__(self): - ''' return the num of storage - ''' - return self.tree.total() - - def push(self, error, sample): - '''Push the sample into the replay according to the importance sampling weight - ''' - p = (np.abs(error) + self.epsilon) ** self.alpha - self.tree.add(p, sample) - - - def sample(self, batch_size): - '''This is for sampling a batch data and the original code is from: - https://github.com/rlcode/per/blob/master/prioritized_memory.py - ''' - pri_segment = self.tree.total() / batch_size - - priorities = [] - batch = [] - idxs = [] - - is_weights = [] - - self.beta = np.min([1., self.beta + self.beta_increment_per_sampling]) - min_prob = np.min(self.tree.tree[-self.tree.capacity:]) / self.tree.total() - - for i in range(batch_size): - a = pri_segment * i - b = pri_segment * (i+1) - - s = random.uniform(a, b) - idx, p, data = self.tree.get_leaf(s) - - priorities.append(p) - batch.append(data) - idxs.append(idx) - prob = p / self.tree.total() - - sampling_probabilities = np.array(priorities) / self.tree.total() - is_weights = np.power(self.tree.n_entries * sampling_probabilities, -self.beta) - is_weights /= is_weights.max() - - return zip(*batch), idxs, is_weights - - def batch_update(self, tree_idx, abs_errors): - '''Update the importance sampling weight - ''' - abs_errors += self.epsilon - - clipped_errors = np.minimum(abs_errors, self.abs_err_upper) - ps = np.power(clipped_errors, self.alpha) - - for ti, p in zip(tree_idx, ps): - self.tree.update(ti, p) diff --git a/projects/codes/common/models.py b/projects/codes/common/models.py deleted file mode 100644 index 41d1b17..0000000 --- a/projects/codes/common/models.py +++ /dev/null @@ -1,139 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-12 21:14:12 -LastEditor: John -LastEditTime: 2022-10-31 23:53:06 -Discription: -Environment: -''' -import torch -import torch.nn as nn -import torch.nn.functional as F -from torch.distributions import 
Categorical - -class MLP(nn.Module): - def __init__(self, input_dim,output_dim,hidden_dim=128): - """ 初始化q网络,为全连接网络 - input_dim: 输入的特征数即环境的状态维度 - output_dim: 输出的动作维度 - """ - super(MLP, self).__init__() - self.fc1 = nn.Linear(input_dim, hidden_dim) # 输入层 - self.fc2 = nn.Linear(hidden_dim,hidden_dim) # 隐藏层 - self.fc3 = nn.Linear(hidden_dim, output_dim) # 输出层 - - def forward(self, x): - # 各层对应的激活函数 - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - return self.fc3(x) - -class ActorSoftmax(nn.Module): - def __init__(self, input_dim, output_dim, hidden_dim=256): - super(ActorSoftmax, self).__init__() - self.fc1 = nn.Linear(input_dim, hidden_dim) - self.fc2 = nn.Linear(hidden_dim, hidden_dim) - self.fc3 = nn.Linear(hidden_dim, output_dim) - def forward(self,x): - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - probs = F.softmax(self.fc3(x),dim=1) - return probs - -class ActorSoftmaxTanh(nn.Module): - def __init__(self, input_dim, output_dim, hidden_dim=256): - super(ActorSoftmaxTanh, self).__init__() - self.fc1 = nn.Linear(input_dim, hidden_dim) - self.fc2 = nn.Linear(hidden_dim, hidden_dim) - self.fc3 = nn.Linear(hidden_dim, output_dim) - def forward(self,x): - x = F.tanh(self.fc1(x)) - x = F.tanh(self.fc2(x)) - probs = F.softmax(self.fc3(x),dim=1) - return probs -class ActorNormal(nn.Module): - def __init__(self, n_states,n_actions, hidden_dim=256): - super(ActorNormal, self).__init__() - self.fc1 = nn.Linear(n_states, hidden_dim) - self.fc2 = nn.Linear(hidden_dim, hidden_dim) - self.fc3 = nn.Linear(hidden_dim, n_actions) - self.fc4 = nn.Linear(hidden_dim, n_actions) - def forward(self,x): - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - mu = torch.tanh(self.fc3(x)) - sigma = F.softplus(self.fc4(x)) + 0.001 # avoid 0 - return mu,sigma -# class ActorSoftmax(nn.Module): -# def __init__(self,input_dim, output_dim, -# hidden_dim=256): -# super(ActorSoftmax, self).__init__() -# self.actor = nn.Sequential( -# nn.Linear(input_dim, hidden_dim), -# nn.ReLU(), -# 
nn.Linear(hidden_dim, hidden_dim), -# nn.ReLU(), -# nn.Linear(hidden_dim, output_dim), -# nn.Softmax(dim=-1) -# ) -# def forward(self, state): -# probs = self.actor(state) -# dist = Categorical(probs) -# return dist -class Critic(nn.Module): - def __init__(self,input_dim,output_dim,hidden_dim=256): - super(Critic,self).__init__() - assert output_dim == 1 # critic must output a single value - self.fc1 = nn.Linear(input_dim, hidden_dim) - self.fc2 = nn.Linear(hidden_dim, hidden_dim) - self.fc3 = nn.Linear(hidden_dim, output_dim) - def forward(self,x): - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - value = self.fc3(x) - return value - -class ActorCriticSoftmax(nn.Module): - def __init__(self, input_dim, output_dim, actor_hidden_dim=256,critic_hidden_dim=256): - super(ActorCriticSoftmax, self).__init__() - - self.critic_fc1 = nn.Linear(input_dim, critic_hidden_dim) - self.critic_fc2 = nn.Linear(critic_hidden_dim, 1) - - self.actor_fc1 = nn.Linear(input_dim, actor_hidden_dim) - self.actor_fc2 = nn.Linear(actor_hidden_dim, output_dim) - - def forward(self, state): - # state = Variable(torch.from_numpy(state).float().unsqueeze(0)) - value = F.relu(self.critic_fc1(state)) - value = self.critic_fc2(value) - - policy_dist = F.relu(self.actor_fc1(state)) - policy_dist = F.softmax(self.actor_fc2(policy_dist), dim=1) - - return value, policy_dist - -class ActorCritic(nn.Module): - def __init__(self, input_dim, output_dim, hidden_dim=256): - super(ActorCritic, self).__init__() - self.critic = nn.Sequential( - nn.Linear(input_dim, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, 1) - ) - - self.actor = nn.Sequential( - nn.Linear(input_dim, hidden_dim), - nn.ReLU(), - nn.Linear(hidden_dim, output_dim), - nn.Softmax(dim=1), - ) - - def forward(self, x): - value = self.critic(x) - probs = self.actor(x) - dist = Categorical(probs) - return dist, value \ No newline at end of file diff --git a/projects/codes/common/multiprocessing_env.py 
b/projects/codes/common/multiprocessing_env.py deleted file mode 100644 index 28c8aba..0000000 --- a/projects/codes/common/multiprocessing_env.py +++ /dev/null @@ -1,153 +0,0 @@ -# 该代码来自 openai baseline,用于多线程环境 -# https://github.com/openai/baselines/tree/master/baselines/common/vec_env - -import numpy as np -from multiprocessing import Process, Pipe - -def worker(remote, parent_remote, env_fn_wrapper): - parent_remote.close() - env = env_fn_wrapper.x() - while True: - cmd, data = remote.recv() - if cmd == 'step': - ob, reward, done, info = env.step(data) - if done: - ob = env.reset() - remote.send((ob, reward, done, info)) - elif cmd == 'reset': - ob = env.reset() - remote.send(ob) - elif cmd == 'reset_task': - ob = env.reset_task() - remote.send(ob) - elif cmd == 'close': - remote.close() - break - elif cmd == 'get_spaces': - remote.send((env.observation_space, env.action_space)) - else: - raise NotImplementedError - -class VecEnv(object): - """ - An abstract asynchronous, vectorized environment. - """ - def __init__(self, num_envs, observation_space, action_space): - self.num_envs = num_envs - self.observation_space = observation_space - self.action_space = action_space - - def reset(self): - """ - Reset all the environments and return an array of - observations, or a tuple of observation arrays. - If step_async is still doing work, that work will - be cancelled and step_wait() should not be called - until step_async() is invoked again. - """ - pass - - def step_async(self, actions): - """ - Tell all the environments to start taking a step - with the given actions. - Call step_wait() to get the results of the step. - You should not call this if a step_async run is - already pending. - """ - pass - - def step_wait(self): - """ - Wait for the step taken with step_async(). - Returns (obs, rews, dones, infos): - - obs: an array of observations, or a tuple of - arrays of observations. 
- - rews: an array of rewards - - dones: an array of "episode done" booleans - - infos: a sequence of info objects - """ - pass - - def close(self): - """ - Clean up the environments' resources. - """ - pass - - def step(self, actions): - self.step_async(actions) - return self.step_wait() - - -class CloudpickleWrapper(object): - """ - Uses cloudpickle to serialize contents (otherwise multiprocessing tries to use pickle) - """ - def __init__(self, x): - self.x = x - def __getstate__(self): - import cloudpickle - return cloudpickle.dumps(self.x) - def __setstate__(self, ob): - import pickle - self.x = pickle.loads(ob) - - -class SubprocVecEnv(VecEnv): - def __init__(self, env_fns, spaces=None): - """ - envs: list of gym environments to run in subprocesses - """ - self.waiting = False - self.closed = False - nenvs = len(env_fns) - self.nenvs = nenvs - self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)]) - self.ps = [Process(target=worker, args=(work_remote, remote, CloudpickleWrapper(env_fn))) - for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)] - for p in self.ps: - p.daemon = True # if the main process crashes, we should not cause things to hang - p.start() - for remote in self.work_remotes: - remote.close() - - self.remotes[0].send(('get_spaces', None)) - observation_space, action_space = self.remotes[0].recv() - VecEnv.__init__(self, len(env_fns), observation_space, action_space) - - def step_async(self, actions): - for remote, action in zip(self.remotes, actions): - remote.send(('step', action)) - self.waiting = True - - def step_wait(self): - results = [remote.recv() for remote in self.remotes] - self.waiting = False - obs, rews, dones, infos = zip(*results) - return np.stack(obs), np.stack(rews), np.stack(dones), infos - - def reset(self): - for remote in self.remotes: - remote.send(('reset', None)) - return np.stack([remote.recv() for remote in self.remotes]) - - def reset_task(self): - for remote in 
self.remotes: - remote.send(('reset_task', None)) - return np.stack([remote.recv() for remote in self.remotes]) - - def close(self): - if self.closed: - return - if self.waiting: - for remote in self.remotes: - remote.recv() - for remote in self.remotes: - remote.send(('close', None)) - for p in self.ps: - p.join() - self.closed = True - - def __len__(self): - return self.nenvs \ No newline at end of file diff --git a/projects/codes/common/utils.py b/projects/codes/common/utils.py deleted file mode 100644 index 212ec5f..0000000 --- a/projects/codes/common/utils.py +++ /dev/null @@ -1,195 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-12 16:02:24 -LastEditor: John -LastEditTime: 2022-11-14 10:27:43 -Discription: -Environment: -''' -import os -import numpy as np -from pathlib import Path -import matplotlib.pyplot as plt -import seaborn as sns -import yaml -import pandas as pd -from functools import wraps -from time import time -import logging -from pathlib import Path - - -from matplotlib.font_manager import FontProperties # 导入字体模块 - -def chinese_font(): - ''' 设置中文字体,注意需要根据自己电脑情况更改字体路径,否则还是默认的字体 - ''' - try: - font = FontProperties( - fname='/System/Library/Fonts/STHeiti Light.ttc', size=15) # fname系统字体路径,此处是mac的 - except: - font = None - return font - -def plot_rewards_cn(rewards, ma_rewards, cfg, tag='train'): - ''' 中文画图 - ''' - sns.set() - plt.figure() - plt.title(u"{}环境下{}算法的学习曲线".format(cfg.env_name, - cfg.algo_name), fontproperties=chinese_font()) - plt.xlabel(u'回合数', fontproperties=chinese_font()) - plt.plot(rewards) - plt.plot(ma_rewards) - plt.legend((u'奖励', u'滑动平均奖励',), loc="best", prop=chinese_font()) - if cfg.save: - plt.savefig(cfg.result_path+f"{tag}_rewards_curve_cn") - # plt.show() -def smooth(data, weight=0.9): - '''用于平滑曲线,类似于Tensorboard中的smooth - - Args: - data (List):输入数据 - weight (Float): 平滑权重,处于0-1之间,数值越高说明越平滑,一般取0.9 - - Returns: - smoothed (List): 平滑后的数据 - ''' - last = data[0] # 
First value in the plot (first timestep) - smoothed = list() - for point in data: - smoothed_val = last * weight + (1 - weight) * point # 计算平滑值 - smoothed.append(smoothed_val) - last = smoothed_val - return smoothed - -def plot_rewards(rewards,title="learning curve",fpath=None,save_fig=True,show_fig=False): - sns.set() - plt.figure() # 创建一个图形实例,方便同时多画几个图 - plt.title(f"{title}") - plt.xlabel('epsiodes') - plt.plot(rewards, label='rewards') - plt.plot(smooth(rewards), label='smoothed') - plt.legend() - if save_fig: - plt.savefig(f"{fpath}/learning_curve.png") - if show_fig: - plt.show() - -def plot_losses(losses, algo="DQN", save=True, path='./'): - sns.set() - plt.figure() - plt.title("loss curve of {}".format(algo)) - plt.xlabel('epsiodes') - plt.plot(losses, label='rewards') - plt.legend() - if save: - plt.savefig(path+"losses_curve") - plt.show() - -def save_results(res_dic,fpath = None): - ''' save results - ''' - Path(fpath).mkdir(parents=True, exist_ok=True) - df = pd.DataFrame(res_dic) - df.to_csv(f"{fpath}/res.csv",index=None) -def merge_class_attrs(ob1, ob2): - ob1.__dict__.update(ob2.__dict__) - return ob1 -def get_logger(fpath): - Path(fpath).mkdir(parents=True, exist_ok=True) - logger = logging.getLogger(name='r') # set root logger if not set name - logger.setLevel(logging.DEBUG) - formatter = logging.Formatter( - '%(asctime)s - %(name)s - %(levelname)s: - %(message)s', - datefmt='%Y-%m-%d %H:%M:%S') - # output to file by using FileHandler - fh = logging.FileHandler(fpath+"log.txt") - fh.setLevel(logging.DEBUG) - fh.setFormatter(formatter) - # output to screen by using StreamHandler - ch = logging.StreamHandler() - ch.setLevel(logging.DEBUG) - ch.setFormatter(formatter) - # add Handler - logger.addHandler(ch) - logger.addHandler(fh) - return logger -def save_cfgs(cfgs, fpath): - ''' save config - ''' - Path(fpath).mkdir(parents=True, exist_ok=True) - - with open(f"{fpath}/config.yaml", 'w') as f: - for cfg_type in cfgs: - yaml.dump({cfg_type: 
cfgs[cfg_type].__dict__}, f, default_flow_style=False) -def load_cfgs(cfgs, fpath): - with open(fpath) as f: - load_cfg = yaml.load(f,Loader=yaml.FullLoader) - for cfg_type in cfgs: - for k, v in load_cfg[cfg_type].items(): - setattr(cfgs[cfg_type], k, v) -# def del_empty_dir(*paths): -# ''' 删除目录下所有空文件夹 -# ''' -# for path in paths: -# dirs = os.listdir(path) -# for dir in dirs: -# if not os.listdir(os.path.join(path, dir)): -# os.removedirs(os.path.join(path, dir)) - -# class NpEncoder(json.JSONEncoder): -# def default(self, obj): -# if isinstance(obj, np.integer): -# return int(obj) -# if isinstance(obj, np.floating): -# return float(obj) -# if isinstance(obj, np.ndarray): -# return obj.tolist() -# return json.JSONEncoder.default(self, obj) - -# def save_args(args,path=None): -# # save parameters -# Path(path).mkdir(parents=True, exist_ok=True) -# with open(f"{path}/params.json", 'w') as fp: -# json.dump(args, fp,cls=NpEncoder) -# print("Parameters saved!") - - -def timing(func): - ''' a decorator to print the running time of a function - ''' - @wraps(func) - def wrap(*args, **kw): - ts = time() - result = func(*args, **kw) - te = time() - print(f"func: {func.__name__}, took: {te-ts:2.4f} seconds") - return result - return wrap -def all_seed(env,seed = 1): - ''' omnipotent seed for RL, attention the position of seed function, you'd better put it just following the env create function - Args: - env (_type_): - seed (int, optional): _description_. Defaults to 1. 
- ''' - import torch - import numpy as np - import random - if seed == 0: - return - # print(f"seed = {seed}") - env.seed(seed) # env config - np.random.seed(seed) - random.seed(seed) - torch.manual_seed(seed) # config for CPU - torch.cuda.manual_seed(seed) # config for GPU - os.environ['PYTHONHASHSEED'] = str(seed) # config for python scripts - # config for cudnn - torch.backends.cudnn.deterministic = True - torch.backends.cudnn.benchmark = False - torch.backends.cudnn.enabled = False - \ No newline at end of file diff --git a/projects/codes/common/wrappers.py b/projects/codes/common/wrappers.py deleted file mode 100644 index 4793b36..0000000 --- a/projects/codes/common/wrappers.py +++ /dev/null @@ -1,29 +0,0 @@ -import gym - -class TimeLimit(gym.Wrapper): - def __init__(self, env, max_episode_steps=None): - super(TimeLimit, self).__init__(env) - self._max_episode_steps = max_episode_steps - self._elapsed_steps = 0 - - def step(self, ac): - observation, reward, done, info = self.env.step(ac) - self._elapsed_steps += 1 - if self._elapsed_steps >= self._max_episode_steps: - done = True - info['TimeLimit.truncated'] = True - return observation, reward, done, info - - def reset(self, **kwargs): - self._elapsed_steps = 0 - return self.env.reset(**kwargs) - -class ClipActionsWrapper(gym.Wrapper): - def step(self, action): - import numpy as np - action = np.nan_to_num(action) - action = np.clip(action, self.action_space.low, self.action_space.high) - return self.env.step(action) - - def reset(self, **kwargs): - return self.env.reset(**kwargs) \ No newline at end of file diff --git a/projects/codes/envs/README.md b/projects/codes/envs/README.md deleted file mode 100644 index d30725b..0000000 --- a/projects/codes/envs/README.md +++ /dev/null @@ -1,18 +0,0 @@ -# 环境说明汇总 - -## 算法SAR一览 - -说明:SAR分别指状态(S)、动作(A)以及奖励(R),下表的Reward Range表示每回合能获得的奖励范围,Steps表示环境中每回合的最大步数 - -| Environment ID | Observation Space | Action Space | Reward Range | Steps | -| 
:--------------------------------: | :---------------: | :----------: | :----------: | :------: | -| CartPole-v0 | Box(4,) | Discrete(2) | [0,200] | 200 | -| CartPole-v1 | Box(4,) | Discrete(2) | [0,500] | 500 | -| CliffWalking-v0 | Discrete(48) | Discrete(4) | [-inf,-13] | [13,inf] | -| FrozenLake-v1(*is_slippery*=False) | Discrete(16) | Discrete(4) | 0 or 1 | [6,inf] | - -## Environment Descriptions - -[OpenAI Gym](./gym_info.md) -[MuJoCo](./mujoco_info.md) - diff --git a/projects/codes/envs/assets/action_grid.png b/projects/codes/envs/assets/action_grid.png deleted file mode 100644 index 7759f8b..0000000 Binary files a/projects/codes/envs/assets/action_grid.png and /dev/null differ diff --git a/projects/codes/envs/assets/gym_info_20211130180023.png b/projects/codes/envs/assets/gym_info_20211130180023.png deleted file mode 100644 index 723b67f..0000000 Binary files a/projects/codes/envs/assets/gym_info_20211130180023.png and /dev/null differ diff --git a/projects/codes/envs/assets/image-20200820174307301.png b/projects/codes/envs/assets/image-20200820174307301.png deleted file mode 100644 index 1197da0..0000000 Binary files a/projects/codes/envs/assets/image-20200820174307301.png and /dev/null differ diff --git a/projects/codes/envs/assets/image-20200820174814084.png b/projects/codes/envs/assets/image-20200820174814084.png deleted file mode 100644 index 4c9e3dc..0000000 Binary files a/projects/codes/envs/assets/image-20200820174814084.png and /dev/null differ diff --git a/projects/codes/envs/assets/image-20201007211441036.png b/projects/codes/envs/assets/image-20201007211441036.png deleted file mode 100644 index ae5b0f8..0000000 Binary files a/projects/codes/envs/assets/image-20201007211441036.png and /dev/null differ diff --git a/projects/codes/envs/assets/image-20201007211858925.png b/projects/codes/envs/assets/image-20201007211858925.png deleted file mode 100644 index 0bbb5b2..0000000 Binary files a/projects/codes/envs/assets/image-20201007211858925.png and /dev/null differ diff 
--git a/projects/codes/envs/assets/image-20210429150622353.png b/projects/codes/envs/assets/image-20210429150622353.png deleted file mode 100644 index 1216b4c..0000000 Binary files a/projects/codes/envs/assets/image-20210429150622353.png and /dev/null differ diff --git a/projects/codes/envs/assets/image-20210429150630806.png b/projects/codes/envs/assets/image-20210429150630806.png deleted file mode 100644 index 45107d5..0000000 Binary files a/projects/codes/envs/assets/image-20210429150630806.png and /dev/null differ diff --git a/projects/codes/envs/assets/track_big.png b/projects/codes/envs/assets/track_big.png deleted file mode 100644 index f7b3dc1..0000000 Binary files a/projects/codes/envs/assets/track_big.png and /dev/null differ diff --git a/projects/codes/envs/blackjack.py b/projects/codes/envs/blackjack.py deleted file mode 100644 index 87f02d2..0000000 --- a/projects/codes/envs/blackjack.py +++ /dev/null @@ -1,122 +0,0 @@ -import gym -from gym import spaces -from gym.utils import seeding - -def cmp(a, b): - return int((a > b)) - int((a < b)) - -# 1 = Ace, 2-10 = Number cards, Jack/Queen/King = 10 -deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10] - - -def draw_card(np_random): - return np_random.choice(deck) - - -def draw_hand(np_random): - return [draw_card(np_random), draw_card(np_random)] - - -def usable_ace(hand): # Does this hand have a usable ace? - return 1 in hand and sum(hand) + 10 <= 21 - - -def sum_hand(hand): # Return current hand total - if usable_ace(hand): - return sum(hand) + 10 - return sum(hand) - - -def is_bust(hand): # Is this hand a bust? - return sum_hand(hand) > 21 - - -def score(hand): # What is the score of this hand (0 if bust) - return 0 if is_bust(hand) else sum_hand(hand) - - -def is_natural(hand): # Is this hand a natural blackjack? 
- return sorted(hand) == [1, 10] - - -class BlackjackEnv(gym.Env): - """Simple blackjack environment - Blackjack is a card game where the goal is to obtain cards that sum to as - near as possible to 21 without going over. The player plays against a fixed - dealer. - Face cards (Jack, Queen, King) have point value 10. - Aces can either count as 11 or 1, and it's called 'usable' at 11. - This game is played with an infinite deck (or with replacement). - The game starts with each (player and dealer) having one face up and one - face down card. - The player can request additional cards (hit=1) until they decide to stop - (stick=0) or exceed 21 (bust). - After the player sticks, the dealer reveals their facedown card, and draws - until their sum is 17 or greater. If the dealer goes bust the player wins. - If neither player nor dealer busts, the outcome (win, lose, draw) is - decided by whose sum is closer to 21. The reward for winning is +1, - drawing is 0, and losing is -1. - The observation is a 3-tuple of: the player's current sum, - the dealer's one showing card (1-10 where 1 is ace), - and whether or not the player holds a usable ace (0 or 1). - This environment corresponds to the version of the blackjack problem - described in Example 5.1 in Reinforcement Learning: An Introduction - by Sutton and Barto (1998).
- https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html - """ - def __init__(self, natural=False): - self.action_space = spaces.Discrete(2) - self.observation_space = spaces.Tuple(( - spaces.Discrete(32), - spaces.Discrete(11), - spaces.Discrete(2))) - self._seed() - - # Flag to pay out 1.5 on a "natural" blackjack win, like casino rules - # Ref: http://www.bicyclecards.com/how-to-play/blackjack/ - self.natural = natural - # Start the first game - self._reset() - self.n_actions = 2 # Number of actions: stick (0) or hit (1) - - def reset(self): - return self._reset() - - def step(self, action): - return self._step(action) - - def _seed(self, seed=None): - self.np_random, seed = seeding.np_random(seed) - return [seed] - - def _step(self, action): - assert self.action_space.contains(action) - if action: # hit: add a card to the player's hand and return - self.player.append(draw_card(self.np_random)) - if is_bust(self.player): - done = True - reward = -1 - else: - done = False - reward = 0 - else: # stick: play out the dealer's hand, and score - done = True - while sum_hand(self.dealer) < 17: - self.dealer.append(draw_card(self.np_random)) - reward = cmp(score(self.player), score(self.dealer)) - if self.natural and is_natural(self.player) and reward == 1: - reward = 1.5 - return self._get_obs(), reward, done, {} - - def _get_obs(self): - return (sum_hand(self.player), self.dealer[0], usable_ace(self.player)) - - def _reset(self): - self.dealer = draw_hand(self.np_random) - self.player = draw_hand(self.np_random) - - # Auto-draw another card if the score is less than 12 - while sum_hand(self.player) < 12: - self.player.append(draw_card(self.np_random)) - - return self._get_obs() diff --git a/projects/codes/envs/cliff_walking.py b/projects/codes/envs/cliff_walking.py deleted file mode 100644 index 05b9b2e..0000000 --- a/projects/codes/envs/cliff_walking.py +++ /dev/null @@ -1,85 +0,0 @@ -import numpy as np -import sys -from io import StringIO -from gym.envs.toy_text import discrete - - -UP = 0 -RIGHT = 1 -DOWN = 2 -LEFT = 3 -
-class CliffWalkingEnv(discrete.DiscreteEnv): - - metadata = {'render.modes': ['human', 'ansi']} - - def _limit_coordinates(self, coord): - coord[0] = min(coord[0], self.shape[0] - 1) - coord[0] = max(coord[0], 0) - coord[1] = min(coord[1], self.shape[1] - 1) - coord[1] = max(coord[1], 0) - return coord - - def _calculate_transition_prob(self, current, delta): - new_position = np.array(current) + np.array(delta) - new_position = self._limit_coordinates(new_position).astype(int) - new_state = np.ravel_multi_index(tuple(new_position), self.shape) - reward = -100.0 if self._cliff[tuple(new_position)] else -1.0 - is_done = self._cliff[tuple(new_position)] or (tuple(new_position) == (3,11)) - return [(1.0, new_state, reward, is_done)] - - def __init__(self): - self.shape = (4, 12) - - nS = np.prod(self.shape) - n_actions = 4 - - # Cliff Location - self._cliff = np.zeros(self.shape, dtype=bool) - self._cliff[3, 1:-1] = True - - # Calculate transition probabilities - P = {} - for s in range(nS): - position = np.unravel_index(s, self.shape) - P[s] = { a : [] for a in range(n_actions) } - P[s][UP] = self._calculate_transition_prob(position, [-1, 0]) - P[s][RIGHT] = self._calculate_transition_prob(position, [0, 1]) - P[s][DOWN] = self._calculate_transition_prob(position, [1, 0]) - P[s][LEFT] = self._calculate_transition_prob(position, [0, -1]) - - # We always start in state (3, 0) - isd = np.zeros(nS) - isd[np.ravel_multi_index((3,0), self.shape)] = 1.0 - - super(CliffWalkingEnv, self).__init__(nS, n_actions, P, isd) - - def render(self, mode='human', close=False): - self._render(mode, close) - - def _render(self, mode='human', close=False): - if close: - return - - outfile = StringIO() if mode == 'ansi' else sys.stdout - - for s in range(self.nS): - position = np.unravel_index(s, self.shape) - # print(self.s) - if self.s == s: - output = " x " - elif position == (3,11): - output = " T " - elif self._cliff[position]: - output = " C " - else: - output = " o " - - if
position[1] == 0: - output = output.lstrip() - if position[1] == self.shape[1] - 1: - output = output.rstrip() - output += "\n" - - outfile.write(output) - outfile.write("\n") diff --git a/projects/codes/envs/gridworld.py b/projects/codes/envs/gridworld.py deleted file mode 100644 index cf3aec2..0000000 --- a/projects/codes/envs/gridworld.py +++ /dev/null @@ -1,125 +0,0 @@ -import io -import numpy as np -import sys -from gym.envs.toy_text import discrete - -UP = 0 -RIGHT = 1 -DOWN = 2 -LEFT = 3 - -class GridworldEnv(discrete.DiscreteEnv): - """ - Grid World environment from Sutton's Reinforcement Learning book chapter 4. - You are an agent on an MxN grid and your goal is to reach the terminal - state at the top left or the bottom right corner. - - For example, a 4x4 grid looks as follows: - - T o o o - o x o o - o o o o - o o o T - - x is your position and T are the two terminal states. - - You can take actions in each direction (UP=0, RIGHT=1, DOWN=2, LEFT=3). - Actions going off the edge leave you in your current state. - You receive a reward of -1 at each step until you reach a terminal state. 
- """ - - metadata = {'render.modes': ['human', 'ansi']} - - def __init__(self, shape=[4,4]): - if not isinstance(shape, (list, tuple)) or not len(shape) == 2: - raise ValueError('shape argument must be a list/tuple of length 2') - - self.shape = shape - - nS = np.prod(shape) - n_actions = 4 - - MAX_Y = shape[0] - MAX_X = shape[1] - - P = {} - grid = np.arange(nS).reshape(shape) - it = np.nditer(grid, flags=['multi_index']) - - while not it.finished: - s = it.iterindex - y, x = it.multi_index - - # P[s][a] = (prob, next_state, reward, is_done) - P[s] = {a : [] for a in range(n_actions)} - - is_done = lambda s: s == 0 or s == (nS - 1) - reward = 0.0 if is_done(s) else -1.0 - - # We're stuck in a terminal state - if is_done(s): - P[s][UP] = [(1.0, s, reward, True)] - P[s][RIGHT] = [(1.0, s, reward, True)] - P[s][DOWN] = [(1.0, s, reward, True)] - P[s][LEFT] = [(1.0, s, reward, True)] - # Not a terminal state - else: - ns_up = s if y == 0 else s - MAX_X - ns_right = s if x == (MAX_X - 1) else s + 1 - ns_down = s if y == (MAX_Y - 1) else s + MAX_X - ns_left = s if x == 0 else s - 1 - P[s][UP] = [(1.0, ns_up, reward, is_done(ns_up))] - P[s][RIGHT] = [(1.0, ns_right, reward, is_done(ns_right))] - P[s][DOWN] = [(1.0, ns_down, reward, is_done(ns_down))] - P[s][LEFT] = [(1.0, ns_left, reward, is_done(ns_left))] - - it.iternext() - - # Initial state distribution is uniform - isd = np.ones(nS) / nS - - # We expose the model of the environment for educational purposes - # This should not be used in any model-free learning algorithm - self.P = P - - super(GridworldEnv, self).__init__(nS, n_actions, P, isd) - - def _render(self, mode='human', close=False): - """ Renders the current gridworld layout - - For example, a 4x4 grid with the mode="human" looks like: - T o o o - o x o o - o o o o - o o o T - where x is your position and T are the two terminal states. 
- """ - if close: - return - - outfile = io.StringIO() if mode == 'ansi' else sys.stdout - - grid = np.arange(self.nS).reshape(self.shape) - it = np.nditer(grid, flags=['multi_index']) - while not it.finished: - s = it.iterindex - y, x = it.multi_index - - if self.s == s: - output = " x " - elif s == 0 or s == self.nS - 1: - output = " T " - else: - output = " o " - - if x == 0: - output = output.lstrip() - if x == self.shape[1] - 1: - output = output.rstrip() - - outfile.write(output) - - if x == self.shape[1] - 1: - outfile.write("\n") - - it.iternext() diff --git a/projects/codes/envs/gridworld_env.py b/projects/codes/envs/gridworld_env.py deleted file mode 100644 index 9d0724a..0000000 --- a/projects/codes/envs/gridworld_env.py +++ /dev/null @@ -1,100 +0,0 @@ -import gym -import turtle -import numpy as np - -# turtle tutorial : https://docs.python.org/3.3/library/turtle.html - -def GridWorld(gridmap=None, is_slippery=False): - if gridmap is None: - gridmap = ['SFFF', 'FHFH', 'FFFH', 'HFFG'] - env = gym.make("FrozenLake-v0", desc=gridmap, is_slippery=is_slippery) - env = FrozenLakeWapper(env) - return env - - -class FrozenLakeWapper(gym.Wrapper): - def __init__(self, env): - gym.Wrapper.__init__(self, env) - self.max_y = env.desc.shape[0] - self.max_x = env.desc.shape[1] - self.t = None - self.unit = 50 - - def draw_box(self, x, y, fillcolor='', line_color='gray'): - self.t.up() - self.t.goto(x * self.unit, y * self.unit) - self.t.color(line_color) - self.t.fillcolor(fillcolor) - self.t.setheading(90) - self.t.down() - self.t.begin_fill() - for _ in range(4): - self.t.forward(self.unit) - self.t.right(90) - self.t.end_fill() - - def move_player(self, x, y): - self.t.up() - self.t.setheading(90) - self.t.fillcolor('red') - self.t.goto((x + 0.5) * self.unit, (y + 0.5) * self.unit) - - def render(self): - if self.t is None: - self.t = turtle.Turtle() - self.wn = turtle.Screen() - self.wn.setup(self.unit * self.max_x + 100, - self.unit * self.max_y + 100) -
 self.wn.setworldcoordinates(0, 0, self.unit * self.max_x, - self.unit * self.max_y) - self.t.shape('circle') - self.t.width(2) - self.t.speed(0) - self.t.color('gray') - for i in range(self.desc.shape[0]): - for j in range(self.desc.shape[1]): - x = j - y = self.max_y - 1 - i - if self.desc[i][j] == b'S': # Start - self.draw_box(x, y, 'white') - elif self.desc[i][j] == b'F': # Frozen ice - self.draw_box(x, y, 'white') - elif self.desc[i][j] == b'G': # Goal - self.draw_box(x, y, 'yellow') - elif self.desc[i][j] == b'H': # Hole - self.draw_box(x, y, 'black') - else: - self.draw_box(x, y, 'white') - self.t.shape('turtle') - - x_pos = self.s % self.max_x - y_pos = self.max_y - 1 - int(self.s / self.max_x) - self.move_player(x_pos, y_pos) - - - -if __name__ == '__main__': - # Environment 1: FrozenLake; whether the ice is slippery is configurable - # 0 left, 1 down, 2 right, 3 up - env = gym.make("FrozenLake-v0", is_slippery=False) - env = FrozenLakeWapper(env) - - # Environment 2: CliffWalking, the cliff environment - # env = gym.make("CliffWalking-v0") # 0 up, 1 right, 2 down, 3 left - # env = CliffWalkingWapper(env) - - # Environment 3: custom grid world with a configurable map: S is the Start, F is Floor, H is a Hole, G is the exit Goal - # gridmap = [ - # 'SFFF', - # 'FHFF', - # 'FFFF', - # 'HFGF' ] - # env = GridWorld(gridmap) - - env.reset() - for step in range(10): - action = np.random.randint(0, 4) - obs, reward, done, info = env.step(action) - print('step {}: action {}, obs {}, reward {}, done {}, info {}'.format(\ - step, action, obs, reward, done, info)) - # env.render() # render one frame \ No newline at end of file diff --git a/projects/codes/envs/gym_info.md b/projects/codes/envs/gym_info.md deleted file mode 100644 index 49da18f..0000000 --- a/projects/codes/envs/gym_info.md +++ /dev/null @@ -1,50 +0,0 @@ -# OpenAI Gym Environment Notes -## Classic Control - -### [CartPole v0](https://github.com/openai/gym/wiki/CartPole-v0) - -image-20200820174307301 - -The pole is balanced by pushing the cart left or right, so the action space consists of two actions. Every step yields a reward; if the pole can no longer be kept balanced, done becomes true and the episode fails. Ideally, an episode runs for the full 200 steps, so the total reward per episode reaches its maximum of 200. - -### 
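CartPole-v1 - -The ```CartPole v1``` environment is identical to ```CartPole v0```; the only differences are the maximum number of steps per episode (max_episode_steps) and the reward threshold (reward_threshold). The relevant source code is shown below: - -![](assets/gym_info_20211130180023.png) - -First, the reward threshold (reward_threshold) is a pass mark set by Gym: for ```CartPole v0```, an algorithm whose reward converges above 195 is considered to have solved the task. However, ```CartPole v0``` has a max_episode_steps of 200 and a maximum reward of 1 per step, so the maximum reward per episode is 200, which is higher than Gym's reward threshold. My guess is that Gym provides this threshold as a reference line for learners; it is not used when actually writing algorithms, so it can be ignored. - -As for the maximum number of steps per episode, ```CartPole v1``` allows longer episodes and correspondingly demands a higher reward, so ```v1``` can be seen as a harder version of ```v0```. - - -### [Pendulum-v0](https://github.com/openai/gym/wiki/Pendulum-v0) - -Note: in gym versions after 0.18.0, Pendulum-v0 has been renamed Pendulum-v1 -image-20200820174814084 - -The pendulum starts at a random position, and the goal is to swing it up so that it stays upright. The action space is continuous, with values in [-2,2]. The reward per step ranges from -16.27 (worst) to 0 (best). The best reported result so far is a reward of -123.11 ± 6.86 over 100 episodes. - -### CliffWalking-v0 - -In the cliff-walking problem (CliffWalking), the agent moves on a 4 x 12 grid, starting at the bottom-left cell, and the goal is to reach the bottom-right cell. At each step the agent can move one cell up, down, left, or right, and every move yields a reward of -1. - -image-20201007211441036 - -In the figure, the red cells are the cliff, and the numbers are the position information the agent observes (the observation), 48 distinct values from 0 to 47. The agent's movement is subject to the following rules: - -* The agent cannot leave the grid: if it takes an action that would move it off the grid, it stays in place for that step but still receives a reward of -1 - -* If the agent "falls off the cliff", it is immediately returned to the start position and receives a reward of -100 - -* When the agent reaches the goal, the episode ends; the episode's total reward is the sum of its per-step rewards - -The actual rendered simulation looks like this: - -image-20201007211858925 - -Since reaching the goal from the start takes at least 13 steps and each step gives a reward of -1, the optimal policy achieves a total reward of -13 per episode. - -## References - -[Gym environment source code](https://github.com/openai/gym/tree/master/gym/envs) \ No newline at end of file diff --git a/projects/codes/envs/mujoco_info.md b/projects/codes/envs/mujoco_info.md deleted file mode 100644 index aaa8cbb..0000000 --- a/projects/codes/envs/mujoco_info.md +++ /dev/null @@ -1,42 +0,0 @@ -# MuJoCo - -MuJoCo (Multi-Joint dynamics with Contact) is a physics simulator used in research such as robot control and optimization. For installation, see [Installing MuJoCo and mujoco_py on Mac](https://blog.csdn.net/JohnJim0/article/details/115656392?spm=1001.2014.3001.5501) - - - -## HalfCheetah-v2 - - - -This environment is built on the MuJoCo simulation engine; its goal is to make a two-legged "cheetah" run as fast as possible (the figure below shows HalfCheetah-v2; see https://gym.openai.com/envs/HalfCheetah-v2/). - -image-20210429150630806 - -Action space: Box(6,). Each leg has three joints, 6 joints in total, and each joint's action range is [-1, 1]. - -Observation space: Box(17, 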
), covering various quantities; each value lies in the range ![img](assets/9cd6ae68c9aad008ede4139da358ec26.svg), and the values mainly describe the "cheetah"'s own pose and related information. - -Reward: the reward at each step depends on the cheetah's forward velocity during that step and the control cost of its action. The code defining the reward is: - -```python -def step(self, action): - xposbefore = self.sim.data.qpos[0] - self.do_simulation(action, self.frame_skip) - xposafter = self.sim.data.qpos[0] - ob = self._get_obs() - reward_ctrl = - 0.1 * np.square(action).sum() - reward_run = (xposafter - xposbefore)/self.dt - # =========== reward =========== - reward = reward_ctrl + reward_run - # =========== reward =========== - done = False - return ob, reward, done, dict(reward_run=reward_run, reward_ctrl=reward_ctrl) -``` - -Note that `done` is always False in the code above: the cheetah falling over does not end an episode, which ends only when the environment's time limit is reached. - -However, this environment has some issues: I could not find an upper bound on the per-episode reward, and in experiments a well-trained agent ran beyond the edge of the platform: - -image-20210429150622353 - -Also, because training time was limited, the reward was still rising slowly when training stopped; I suspect this may be a gym bug. \ No newline at end of file diff --git a/projects/codes/envs/racetrack.py b/projects/codes/envs/racetrack.py deleted file mode 100644 index 69836d5..0000000 --- a/projects/codes/envs/racetrack.py +++ /dev/null @@ -1,242 +0,0 @@ -import time -import random -import numpy as np -import os -import matplotlib.pyplot as plt -import matplotlib.patheffects as pe -from IPython.display import clear_output -from gym.spaces import Discrete,Box -from matplotlib import colors -import gym - -class RacetrackEnv(gym.Env) : - """ - Class representing a race-track environment inspired by exercise 5.12 in Sutton & Barto 2018 (p.111). - Please do not make changes to this class - it will be overwritten with a clean version when it comes to marking. - - The dynamics of this environment are detailed in this coursework exercise's jupyter notebook, although I have - included rather verbose comments here for those of you who are interested in how the environment has been - implemented (though this should not impact your solution code). - """ - - ACTIONS_DICT = { - 0 : (1, -1), # Acc Vert., Brake Horiz. - 1 : (1, 0), # Acc Vert., Hold Horiz. - 2 : (1, 1), # Acc Vert., Acc Horiz.
- 3 : (0, -1), # Hold Vert., Brake Horiz. - 4 : (0, 0), # Hold Vert., Hold Horiz. - 5 : (0, 1), # Hold Vert., Acc Horiz. - 6 : (-1, -1), # Brake Vert., Brake Horiz. - 7 : (-1, 0), # Brake Vert., Hold Horiz. - 8 : (-1, 1) # Brake Vert., Acc Horiz. - } - - - CELL_TYPES_DICT = { - 0 : "track", - 1 : "wall", - 2 : "start", - 3 : "goal" - } - - - def __init__(self) : - # Load racetrack map from file. - self.track = np.flip(np.loadtxt(os.path.dirname(__file__)+"/track.txt", dtype = int), axis = 0) - - - # Discover start grid squares. - self.initial_states = [] - for y in range(self.track.shape[0]) : - for x in range(self.track.shape[1]) : - if (self.CELL_TYPES_DICT[self.track[y, x]] == "start") : - self.initial_states.append((y, x)) - high= np.array([np.finfo(np.float32).max, np.finfo(np.float32).max, np.finfo(np.float32).max, np.finfo(np.float32).max]) - self.observation_space = Box(low=-high, high=high, shape=(4,), dtype=np.float32) - self.action_space = Discrete(9) - self.is_reset = False - - def step(self, action : int) : - """ - Takes a given action in the environment's current state, and returns the next state, - the reward, whether the next state is done or not, and an (empty) info dict. - - Arguments: - action {int} -- The action to take in the environment's current state. Should be an integer in the range [0-8]. - - Raises: - RuntimeError: Raised when the environment needs resetting.\n - TypeError: Raised when an action of an invalid type is given.\n - ValueError: Raised when an action outside the range [0-8] is given.\n - - Returns: - A tuple of (next_state, reward, done, info), where info is always an empty dict:\n - {np.ndarray} -- The next state, an array of (y_pos, x_pos, y_velocity, x_velocity).\n - {int} -- The reward earned by taking the given action in the current environment state.\n - {bool} -- Whether the environment's next state is done or not.\n - - """ - - # Check whether a reset is needed.
- if (not self.is_reset) : - raise RuntimeError(".step() has been called when .reset() is needed.\n" + - "You need to call .reset() before using .step() for the first time, and after an episode ends.\n" + - ".reset() initialises the environment at the start of an episode, then returns an initial state.") - - # Check that action is the correct type (either a python integer or a numpy integer). - if (not (isinstance(action, int) or isinstance(action, np.integer))) : - raise TypeError("action should be an integer.\n" + - "action value {} of type {} was supplied.".format(action, type(action))) - - # Check that action is an allowed value. - if (action < 0 or action > 8) : - raise ValueError("action must be an integer in the range [0-8] corresponding to one of the legal actions.\n" + - "action value {} was supplied.".format(action)) - - - # Update Velocity. - # With probability 0.8, update velocity components as intended. - if (np.random.uniform() < 0.8) : - (d_y, d_x) = self.ACTIONS_DICT[action] - # With probability 0.2, do not change velocity components. - else : - (d_y, d_x) = (0, 0) - - self.velocity = (self.velocity[0] + d_y, self.velocity[1] + d_x) - - # Keep velocity within bounds [-10, 10]. - # self.velocity is a tuple, so assigning to self.velocity[0] or - # self.velocity[1] would raise a TypeError; instead, clamp each - # component and rebuild the tuple. - v_y = self.velocity[0] - v_x = self.velocity[1] - v_y = max(-10, min(v_y, 10)) - v_x = max(-10, min(v_x, 10)) - self.velocity = (v_y, v_x) - - # Update Position. - new_position = (self.position[0] + self.velocity[0], self.position[1] + self.velocity[1]) - - reward = 0 - done = False - - # If position is out-of-bounds, return to start and set velocity components to zero. - if (new_position[0] < 0 or new_position[1] < 0 or new_position[0] >= self.track.shape[0] or new_position[1] >= self.track.shape[1]) : - self.position = random.choice(self.initial_states) - self.velocity = (0, 0) - reward -= 10 - # If position is in a wall grid-square, return to start and set velocity components to zero.
- elif (self.CELL_TYPES_DICT[self.track[new_position]] == "wall") : - self.position = random.choice(self.initial_states) - self.velocity = (0, 0) - reward -= 10 - # If position is in a track grid-square or a start-square, update position. - elif (self.CELL_TYPES_DICT[self.track[new_position]] in ["track", "start"]) : - self.position = new_position - # If position is in a goal grid-square, end episode. - elif (self.CELL_TYPES_DICT[self.track[new_position]] == "goal") : - self.position = new_position - reward += 10 - done = True - # If this gets reached, then the student has touched something they shouldn't have. Naughty! - else : - raise RuntimeError("You've met with a terrible fate, haven't you?\nDon't modify things you shouldn't!") - - # Penalise every timestep. - reward -= 1 - - # Require a reset if the current state is done. - if (done) : - self.is_reset = False - - # Return next state, reward, and whether the episode has ended. - return np.array([self.position[0], self.position[1], self.velocity[0], self.velocity[1]]), reward, done,{} - - - def reset(self) : - """ - Resets the environment, ready for a new episode to begin, then returns an initial state. - The initial state will be a starting grid square randomly chosen using a uniform distribution, - with both components of the velocity being zero. - - Returns: - {np.ndarray} -- an initial state, an array of (y_pos, x_pos, y_velocity, x_velocity). - """ - - # Pick random starting grid-square. - self.position = random.choice(self.initial_states) - - # Set both velocity components to zero. - self.velocity = (0, 0) - - self.is_reset = True - - return np.array([self.position[0], self.position[1], self.velocity[0], self.velocity[1]]) - - - def render(self, mode = 'human') : - """ - Renders a pretty matplotlib plot representing the current state of the environment. - Calling this method on subsequent timesteps will update the plot. - This is VERY VERY SLOW and will slow down training a lot.
 Only use for debugging/testing. - - Arguments: - mode {str} -- Unused; kept for compatibility with the gym.Env interface. Each frame is displayed for a fixed 0.1 seconds. - - """ - # Turn interactive mode on. - plt.ion() - fig = plt.figure(num = "env_render") - ax = plt.gca() - ax.clear() - clear_output(wait = True) - - # Prepare the environment plot and mark the car's position. - env_plot = np.copy(self.track) - env_plot[self.position] = 4 - env_plot = np.flip(env_plot, axis = 0) - - # Plot the gridworld. - cmap = colors.ListedColormap(["white", "black", "green", "red", "yellow"]) - bounds = list(range(6)) - norm = colors.BoundaryNorm(bounds, cmap.N) - ax.imshow(env_plot, cmap = cmap, norm = norm, zorder = 0) - - # Plot the velocity. - if (not self.velocity == (0, 0)) : - ax.arrow(self.position[1], self.track.shape[0] - 1 - self.position[0], self.velocity[1], -self.velocity[0], - path_effects=[pe.Stroke(linewidth=1, foreground='black')], color = "yellow", width = 0.1, length_includes_head = True, zorder = 2) - - # Set up axes. - ax.grid(which = 'major', axis = 'both', linestyle = '-', color = 'k', linewidth = 2, zorder = 1) - ax.set_xticks(np.arange(-0.5, self.track.shape[1] , 1)) - ax.set_xticklabels([]) - ax.set_yticks(np.arange(-0.5, self.track.shape[0], 1)) - ax.set_yticklabels([]) - - # Draw everything. - #fig.canvas.draw() - #fig.canvas.flush_events() - plt.show() - # Pause so the rendered frame stays visible. - time.sleep(0.1) - - def get_actions(self) : - """ - Returns the available actions in the current state - will always be a list - of integers in the range [0-8].
- """ - return [*self.ACTIONS_DICT] -if __name__ == "__main__": - num_steps = 1000000 - env = RacetrackEnv() - state = env.reset() - print(state) - for _ in range(num_steps) : - - next_state, reward, done,_ = env.step(random.choice(env.get_actions())) - print(next_state) - env.render() - - if (done) : - _ = env.reset() diff --git a/projects/codes/envs/racetrack_env.md b/projects/codes/envs/racetrack_env.md deleted file mode 100644 index c5e2d7f..0000000 --- a/projects/codes/envs/racetrack_env.md +++ /dev/null @@ -1,37 +0,0 @@ -## The Racetrack Environment -We have implemented a custom environment called "Racetrack" for you to use during this piece of coursework. It is inspired by the environment described in the course textbook (Reinforcement Learning, Sutton & Barto, 2018, Exercise 5.12), but is not exactly the same. - -### Environment Description - -Consider driving a race car around a turn on a racetrack. In order to complete the race as quickly as possible, you would want to drive as fast as you can but, to avoid running off the track, you must slow down while turning. - -In our simplified racetrack environment, the agent is at one of a discrete set of grid positions. The agent also has a discrete speed in two directions, $x$ and $y$. So the state is represented as follows: -$$(\text{position}_y, \text{position}_x, \text{velocity}_y, \text{velocity}_x)$$ - -The agent collects a reward of -1 at each time step, an additional -10 for leaving the track (i.e., ending up on a black grid square in the figure below), and an additional +10 for reaching the finish line (any of the red grid squares). The agent starts each episode in a randomly selected grid-square on the starting line (green grid squares) with a speed of zero in both directions. At each time step, the agent can change its speed in both directions. Each speed can be changed by +1, -1 or 0, giving a total of nine actions. 
For example, the agent may change its speed in the $x$ direction by -1 and its speed in the $y$ direction by +1. The agent's speed cannot be greater than +10 or less than -10 in either direction. - - - - -The agent's next state is determined by its current grid square, its current speed in two directions, and the changes it makes to its speed in the two directions. This environment is stochastic. When the agent tries to change its speed, no change occurs (in either direction) with probability 0.2. In other words, 20% of the time, the agent's action is ignored and the car's speed remains the same in both directions. - -If the agent leaves the track, it is returned to a random start grid-square and has its speed set to zero in both directions; the episode continues. An episode ends only when the agent transitions to a goal grid-square. - - - -### Environment Implementation -We have implemented the above environment in the `racetrack.py` file, for you to use in this coursework. Please use this implementation instead of writing your own, and please do not modify the environment. - -We provide a `RacetrackEnv` class for your agents to interact with. The class has the following methods: -- **`reset()`** - this method initialises the environment, chooses a random starting state, and returns it. This method should be called before the start of every episode. -- **`step(action)`** - this method takes an integer action (more on this later), and executes one time-step in the environment. It returns a tuple containing the next state, the reward collected, whether the next state is a terminal state, and an empty info dictionary (following the Gym convention). -- **`render(mode='human')`** - this method renders a matplotlib graph representing the environment, displaying each time-step for a fixed 0.1 seconds. This method is useful for testing and debugging, but should not be used during training since it is *very* slow. **Do not use this method in your final submission**.
-- **`get_actions()`** - a simple method that returns the available actions in the current state. Always returns a list containing integers in the range [0-8] (more on this later). - -In our code, states are represented as NumPy arrays of four integers. For example, if the agent is in a grid square with coordinates ($Y = 2$, $X = 3$), and is moving zero cells vertically and one cell horizontally per time-step, the state is represented as `array([2, 3, 0, 1])`. Arrays of this kind will be returned by the `reset()` and `step(action)` methods. - -There are nine actions available to the agent in each state, as described above. However, to simplify your code, we have represented each of the nine actions as an integer in the range [0-8]. The table below shows the index of each action, along with the corresponding changes it will cause to the agent's speed in each direction. - - - -For example, taking action 8 will increase the agent's speed in the $x$ direction, but decrease its speed in the $y$ direction.
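The velocity dynamics described above can be sketched in a few lines of Python. This is a minimal, self-contained sketch, not the coursework implementation: `update_velocity`, `ACTIONS`, `MAX_SPEED` and `IGNORE_PROB` are hypothetical names chosen here for illustration. The action table mirrors `ACTIONS_DICT` in `racetrack.py`, the 0.2 probability of an ignored action and the ±10 speed limit come from the description, and position updates, walls and rewards are deliberately left out.

```python
import random

# Action index -> (change in y-speed, change in x-speed),
# mirroring RacetrackEnv.ACTIONS_DICT in racetrack.py.
ACTIONS = {
    0: (1, -1), 1: (1, 0), 2: (1, 1),
    3: (0, -1), 4: (0, 0), 5: (0, 1),
    6: (-1, -1), 7: (-1, 0), 8: (-1, 1),
}
MAX_SPEED = 10      # speed limit in each direction
IGNORE_PROB = 0.2   # probability that the chosen action has no effect


def update_velocity(velocity, action, rng=random):
    """Hypothetical helper: apply one action to a (v_y, v_x) velocity tuple.

    With probability IGNORE_PROB the action is ignored; otherwise each speed
    component changes by -1, 0 or +1. Both components are then clamped to
    [-MAX_SPEED, MAX_SPEED]. `rng` is anything with a random() method.
    """
    if not 0 <= action <= 8:
        raise ValueError(f"action must be in [0-8], got {action}")
    if rng.random() < IGNORE_PROB:
        d_y, d_x = 0, 0
    else:
        d_y, d_x = ACTIONS[action]
    v_y = min(max(velocity[0] + d_y, -MAX_SPEED), MAX_SPEED)
    v_x = min(max(velocity[1] + d_x, -MAX_SPEED), MAX_SPEED)
    return (v_y, v_x)


if __name__ == "__main__":
    rng = random.Random(0)
    v = (0, 0)
    for _ in range(5):
        # Action 8: brake vertically, accelerate horizontally.
        v = update_velocity(v, 8, rng)
    print(v)
```

Passing the random source as a parameter keeps the stochastic step testable: a stub whose `random()` returns a fixed value makes the outcome deterministic.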
\ No newline at end of file diff --git a/projects/codes/envs/register.py b/projects/codes/envs/register.py deleted file mode 100644 index 38074cf..0000000 --- a/projects/codes/envs/register.py +++ /dev/null @@ -1,34 +0,0 @@ - -from gym.envs.registration import register - -def register_env(env_name): - if env_name == 'Racetrack-v0': - register( - id='Racetrack-v0', - entry_point='envs.racetrack:RacetrackEnv', - max_episode_steps=1000, - kwargs={} - ) - elif env_name == 'FrozenLakeNoSlippery-v1': - register( - id='FrozenLakeNoSlippery-v1', - entry_point='gym.envs.toy_text.frozen_lake:FrozenLakeEnv', - kwargs={'map_name':"4x4",'is_slippery':False}, - ) - else: - print("The env name is wrong, or the environment does not need to be registered!") - -# if __name__ == "__main__": -# import random -# import gym -# env = gym.make('FrozenLakeNoSlippery-v1') -# num_steps = 1000000 -# state = env.reset() -# n_actions = env.action_space.n -# print(state) -# for _ in range(num_steps) : -# next_state, reward, done,_ = env.step(random.choice(range(n_actions))) -# print(next_state) -# if (done) : -# _ = env.reset() - \ No newline at end of file diff --git a/projects/codes/envs/snake/README.md b/projects/codes/envs/snake/README.md deleted file mode 100644 index b49b4e8..0000000 --- a/projects/codes/envs/snake/README.md +++ /dev/null @@ -1,38 +0,0 @@ -# Snake - -Snake traces back to Blockade, a 1976 arcade game. The player steers the snake up, down, left, and right to eat food, which makes its body grow. The snake gradually speeds up as it eats, and the game ends when it hits a wall or its own body. - -![image-20200901202636603](img/image-20200901202636603.png) - -As shown in the figure, the game board for this task is 560x560. The green part is our agent (the snake), the red square is the food, and walls surround the board. Whenever the food is eaten, new food appears at a random position. Each snake segment, and the food, is 40x40. 
Excluding the walls (also 40 thick), the snake can move within a 480x480 area, i.e. a 12x12 grid. The environment's state and related information are as follows: - -* state: a tuple (adjoining_wall_x, adjoining_wall_y, food_dir_x, food_dir_y, adjoining_body_top, adjoining_body_bottom, adjoining_body_left, adjoining_body_right).
- - * [adjoining_wall_x, adjoining_wall_y]: whether the snake head is next to a wall, 9 states in total - - adjoining_wall_x: 0 means no wall next to the head along the x axis, 1 means a wall to the left of the head, 2 means a wall to the right; adjoining_wall_y: 0 means no wall next to the head along the y axis, 1 means a wall above the head, 2 means a wall below - - Note that [0, 0] also covers the case where the snake has run outside the 480x480 area - - * [food_dir_x, food_dir_y]: the position of the food relative to the snake head - - food_dir_x: 0 means the food and the head share the same x coordinate, 1 means the food is to the left of the head (not necessarily adjacent), 2 means it is to the right - - food_dir_y: 0 means the food and the head share the same y coordinate, 1 means the food is above the head, 2 means it is below - - * [adjoining_body_top, adjoining_body_bottom, adjoining_body_left, adjoining_body_right]: whether part of the snake's body is next to the head - - adjoining_body_top: 1 if a body segment is directly above the head, 0 otherwise - - adjoining_body_bottom: 1 if a body segment is directly below the head, 0 otherwise - - adjoining_body_left: 1 if a body segment is directly to the left of the head, 0 otherwise - - adjoining_body_right: 1 if a body segment is directly to the right of the head, 0 otherwise - -* action: up, down, left, or right - -* reward: +1 for eating food, -1 when the snake dies, and -0.1 otherwise - - - diff --git a/projects/codes/envs/snake/agent.py b/projects/codes/envs/snake/agent.py deleted file mode 100644 index b32de9d..0000000 --- a/projects/codes/envs/snake/agent.py +++ /dev/null @@ -1,106 +0,0 @@ -import numpy as np -import utils -import random -import math - - -class Agent: - - def __init__(self, actions, Ne, C, gamma): - self.actions = actions - self.Ne = Ne # used in exploration function - self.C = C - self.gamma = gamma - - # Create the Q and N Table to work with - self.Q = utils.create_q_table() - self.N = 
utils.create_q_table() - self.reset() - - def train(self): - self._train = True - - def eval(self): - self._train = False - - # At the end of training save the trained model - def save_model(self, model_path): - utils.save(model_path, self.Q) - - # Load the trained model for evaluation - def load_model(self, model_path): - self.Q = utils.load(model_path) - - def reset(self): - self.points = 0 - self.s = None - self.a = None - - def f(self,u,n): - if n < self.Ne: - return 1 - return u - - def R(self,points,dead): - if dead: - return -1 - elif points > self.points: - return 1 - return -0.1 - - def get_state(self, state): - # [adjoining_wall_x, adjoining_wall_y] - adjoining_wall_x = int(state[0] == utils.WALL_SIZE) + 2 * int(state[0] ==
utils.DISPLAY_SIZE - utils.WALL_SIZE - utils.GRID_SIZE) - adjoining_wall_y = int(state[1] == utils.WALL_SIZE) + 2 * int(state[1] == utils.DISPLAY_SIZE - utils.WALL_SIZE - utils.GRID_SIZE) # the last playable cell starts one GRID_SIZE before the far wall - # [food_dir_x, food_dir_y] - food_dir_x = 1 + int(state[0] < state[3]) - int(state[0] == state[3]) - food_dir_y = 1 + int(state[1] < state[4]) - int(state[1] == state[4]) - # [adjoining_body_top, adjoining_body_bottom, adjoining_body_left, adjoining_body_right] - adjoining_body = [(state[0] - body_state[0], state[1] - body_state[1]) for body_state in state[2]] # (dx, dy) offsets stored as tuples - adjoining_body_top = int((0, utils.GRID_SIZE) in adjoining_body) - adjoining_body_bottom = int((0, -utils.GRID_SIZE) in adjoining_body) - adjoining_body_left = int((utils.GRID_SIZE, 0) in adjoining_body) - adjoining_body_right = int((-utils.GRID_SIZE, 0) in adjoining_body) - return adjoining_wall_x, adjoining_wall_y, food_dir_x, food_dir_y, adjoining_body_top, adjoining_body_bottom, adjoining_body_left, adjoining_body_right - - - def update(self, _state, points, dead): - if self.s: - maxq = max(self.Q[_state]) - reward = self.R(points,dead) - alpha = self.C / (self.C + self.N[self.s][self.a]) - self.Q[self.s][self.a] += alpha * (reward + self.gamma * maxq - self.Q[self.s][self.a]) - self.N[self.s][self.a] += 1.0 - - def choose_action(self, state, points, dead): - ''' - :param state: a list of [snake_head_x, snake_head_y, snake_body, food_x, food_y] from environment. - :param points: float, the current points from environment - :param dead: boolean, if the snake is dead - :return: the index of action. 0, 1, 2, 3 indicate right, left, up, down respectively - Return the index of action the snake needs to take, according to the state and points known from environment. - Tips: you need to discretize the state to the state space defined on the webpage first.
- (Note that [adjoining_wall_x=0, adjoining_wall_y=0] is also the case when snake runs out of the 480x480 board) - ''' - - _state = self.get_state(state) - Qs = self.Q[_state][:] - - if self._train: - self.update(_state, points, dead) - if dead: - self.reset() - return - Ns = self.N[_state] - Fs = [self.f(Qs[a], Ns[a]) for a in self.actions] - action = np.argmax(Fs) - self.s = _state - self.a = action - else: - if dead: - self.reset() - return - action = np.argmax(Qs) - - self.points = points - return action diff --git a/projects/codes/envs/snake/example_assignment_and_report2.pdf b/projects/codes/envs/snake/example_assignment_and_report2.pdf deleted file mode 100644 index 84008c0..0000000 Binary files a/projects/codes/envs/snake/example_assignment_and_report2.pdf and /dev/null differ diff --git a/projects/codes/envs/snake/main.py b/projects/codes/envs/snake/main.py deleted file mode 100644 index 16776ad..0000000 --- a/projects/codes/envs/snake/main.py +++ /dev/null @@ -1,185 +0,0 @@ -import pygame -from pygame.locals import * -import argparse - -from agent import Agent -from snake_env import SnakeEnv -import utils -import time - -def get_args(): - parser = argparse.ArgumentParser(description='CS440 MP4 Snake') - - parser.add_argument('--human', default = False, action="store_true", - help='make the game human-playable - default False') - - parser.add_argument('--model_name', dest="model_name", type=str, default="checkpoint3.npy", - help='name of model to save if training or to load if evaluating - default checkpoint3.npy') - - parser.add_argument('--train_episodes', dest="train_eps", type=int, default=10000, - help='number of training episodes - default 10000') - - parser.add_argument('--test_episodes', dest="test_eps", type=int, default=1000, - help='number of testing episodes - default 1000') - - parser.add_argument('--show_episodes', dest="show_eps", type=int, default=10, - help='number of displayed episodes - default 10') - - parser.add_argument('--window',
dest="window", type=int, default=100, - help='number of episodes to keep running stats for during training - default 100') - - parser.add_argument('--Ne', dest="Ne", type=int, default=40, - help='the Ne parameter used in exploration function - default 40') - - parser.add_argument('--C', dest="C", type=int, default=40, - help='the C parameter used in learning rate - default 40') - - parser.add_argument('--gamma', dest="gamma", type=float, default=0.2, - help='the discount factor gamma - default 0.2') - - parser.add_argument('--snake_head_x', dest="snake_head_x", type=int, default=200, - help='initialized x position of snake head - default 200') - - parser.add_argument('--snake_head_y', dest="snake_head_y", type=int, default=200, - help='initialized y position of snake head - default 200') - - parser.add_argument('--food_x', dest="food_x", type=int, default=80, - help='initialized x position of food - default 80') - - parser.add_argument('--food_y', dest="food_y", type=int, default=80, - help='initialized y position of food - default 80') - cfg = parser.parse_args() - return cfg - -class Application: - def __init__(self, args): - self.args = args - self.env = SnakeEnv(args.snake_head_x, args.snake_head_y, args.food_x, args.food_y) - self.agent = Agent(self.env.get_actions(), args.Ne, args.C, args.gamma) - - def execute(self): - if not self.args.human: - if self.args.train_eps != 0: - self.train() - self.eval() - self.show_games() - - def train(self): - print("Train Phase:") - self.agent.train() - window = self.args.window - self.points_results = [] - first_eat = True - start = time.time() - - for game in range(1, self.args.train_eps + 1): - state = self.env.get_state() - dead = False - action = self.agent.choose_action(state, 0, dead) - while not dead: - state, points, dead = self.env.step(action) - - # For debug convenience, you can check if your Q-table matches ours for given setting of parameters - # (see Debug Convenience part on homework 4 web
page) - if first_eat and points == 1: - self.agent.save_model(utils.CHECKPOINT) - first_eat = False - - action = self.agent.choose_action(state, points, dead) - - - points = self.env.get_points() - self.points_results.append(points) - if game % self.args.window == 0: - print( - "Games:", len(self.points_results) - window, "-", len(self.points_results), - "Points (Average:", sum(self.points_results[-window:])/window, - "Max:", max(self.points_results[-window:]), - "Min:", min(self.points_results[-window:]),")", - ) - self.env.reset() - print("Training takes", time.time() - start, "seconds") - self.agent.save_model(self.args.model_name) - - def eval(self): - print("Evaluation Phase:") - self.agent.eval() - self.agent.load_model(self.args.model_name) - points_results = [] - start = time.time() - - for game in range(1, self.args.test_eps + 1): - state = self.env.get_state() - dead = False - action = self.agent.choose_action(state, 0, dead) - while not dead: - state, points, dead = self.env.step(action) - action = self.agent.choose_action(state, points, dead) - points = self.env.get_points() - points_results.append(points) - self.env.reset() - - print("Testing takes", time.time() - start, "seconds") - print("Number of Games:", len(points_results)) - print("Average Points:", sum(points_results)/len(points_results)) - print("Max Points:", max(points_results)) - print("Min Points:", min(points_results)) - - def show_games(self): - print("Display Games") - self.env.display() - pygame.event.pump() - self.agent.eval() - points_results = [] - end = False - for game in range(1, self.args.show_eps + 1): - state = self.env.get_state() - dead = False - action = self.agent.choose_action(state, 0, dead) - count = 0 - while not dead: - count +=1 - pygame.event.pump() - keys = pygame.key.get_pressed() - if keys[K_ESCAPE] or self.check_quit(): - end = True - break - state, points, dead = self.env.step(action) - # Qlearning agent - if not self.args.human: - action =
self.agent.choose_action(state, points, dead) - # for human player - else: - for event in pygame.event.get(): - if event.type == pygame.KEYDOWN: - if event.key == pygame.K_UP: - action = 2 - elif event.key == pygame.K_DOWN: - action = 3 - elif event.key == pygame.K_LEFT: - action = 1 - elif event.key == pygame.K_RIGHT: - action = 0 - if end: - break - self.env.reset() - points_results.append(points) - print("Game:", str(game)+"/"+str(self.args.show_eps), "Points:", points) - if len(points_results) == 0: - return - print("Average Points:", sum(points_results)/len(points_results)) - - def check_quit(self): - for event in pygame.event.get(): - if event.type == pygame.QUIT: - return True - return False - - -def main(): - cfg = get_args() - app = Application(cfg) - app.execute() - -if __name__ == "__main__": - main() diff --git a/projects/codes/envs/snake/snake_env.py b/projects/codes/envs/snake/snake_env.py deleted file mode 100644 index a4afe0a..0000000 --- a/projects/codes/envs/snake/snake_env.py +++ /dev/null @@ -1,202 +0,0 @@ -import random -import pygame -import utils - -class SnakeEnv: - def __init__(self, snake_head_x, snake_head_y, food_x, food_y): - self.game = Snake(snake_head_x, snake_head_y, food_x, food_y) - self.render = False - - def get_actions(self): - return self.game.get_actions() - - def reset(self): - return self.game.reset() - - def get_points(self): - return self.game.get_points() - - def get_state(self): - return self.game.get_state() - - def step(self, action): - state, points, dead = self.game.step(action) - if self.render: - self.draw(state, points, dead) - # return state, reward, done - return state, points, dead - - def draw(self, state, points, dead): - snake_head_x, snake_head_y, snake_body, food_x, food_y = state - self.display.fill(utils.BLUE) - pygame.draw.rect( self.display, utils.BLACK, - [ - utils.GRID_SIZE, - utils.GRID_SIZE, - utils.DISPLAY_SIZE - utils.GRID_SIZE * 2, - utils.DISPLAY_SIZE - utils.GRID_SIZE * 2 - ]) - - # draw 
snake head - pygame.draw.rect( - self.display, - utils.GREEN, - [ - snake_head_x, - snake_head_y, - utils.GRID_SIZE, - utils.GRID_SIZE - ], - 3 - ) - # draw snake body - for seg in snake_body: - pygame.draw.rect( - self.display, - utils.GREEN, - [ - seg[0], - seg[1], - utils.GRID_SIZE, - utils.GRID_SIZE, - ], - 1 - ) - # draw food - pygame.draw.rect( - self.display, - utils.RED, - [ - food_x, - food_y, - utils.GRID_SIZE, - utils.GRID_SIZE - ] - ) - - text_surface = self.font.render("Points: " + str(points), True, utils.BLACK) - text_rect = text_surface.get_rect() - text_rect.center = ((280),(25)) - self.display.blit(text_surface, text_rect) - pygame.display.flip() - if dead: - # slow clock if dead - self.clock.tick(1) - else: - self.clock.tick(5) - - return - - - def display(self): - pygame.init() - pygame.display.set_caption('MP4: Snake') - self.clock = pygame.time.Clock() - pygame.font.init() - - self.font = pygame.font.Font(pygame.font.get_default_font(), 15) - self.display = pygame.display.set_mode((utils.DISPLAY_SIZE, utils.DISPLAY_SIZE), pygame.HWSURFACE) - self.draw(self.game.get_state(), self.game.get_points(), False) - self.render = True - -class Snake: - def __init__(self, snake_head_x, snake_head_y, food_x, food_y): - self.init_snake_head_x,self.init_snake_head_y = snake_head_x,snake_head_y # initial position of the snake head - self.init_food_x, self.init_food_y = food_x, food_y # initial position of the food - self.reset() - - def reset(self): - self.points = 0 - self.snake_head_x, self.snake_head_y = self.init_snake_head_x, self.init_snake_head_y - self.food_x, self.food_y = self.init_food_x, self.init_food_y - self.snake_body = [] # positions of the body segments (excluding the head) - - def get_points(self): - return self.points - - def get_actions(self): - return [0, 1, 2, 3] - - def get_state(self): - return [ - self.snake_head_x, - self.snake_head_y, - self.snake_body, - self.food_x, - self.food_y - ] - - def move(self, action): - '''Move the snake head according to the action and return whether the snake died - ''' - delta_x = delta_y = 0 - if action == 0: # right - delta_x = utils.GRID_SIZE -
elif action == 1: - delta_x = - utils.GRID_SIZE - elif action == 2: - delta_y = - utils.GRID_SIZE - elif action == 3: - delta_y = utils.GRID_SIZE - old_body_head = None - if len(self.snake_body) == 1: - old_body_head = self.snake_body[0] - - self.snake_body.append((self.snake_head_x, self.snake_head_y)) - self.snake_head_x += delta_x - self.snake_head_y += delta_y - - if len(self.snake_body) > self.points: # no food was eaten, so drop the tail segment - del(self.snake_body[0]) - - self.handle_eatfood() - - # With a non-empty body, the head overlapping any body segment counts as a self-collision - if len(self.snake_body) >= 1: - for seg in self.snake_body: - if self.snake_head_x == seg[0] and self.snake_head_y == seg[1]: - return True - - # With a one-segment body, moving the head back onto that segment's previous position (reversing) also counts as a self-collision - if len(self.snake_body) == 1: - if old_body_head == (self.snake_head_x, self.snake_head_y): - return True - - # check whether the head hit a wall - if (self.snake_head_x < utils.GRID_SIZE or self.snake_head_y < utils.GRID_SIZE or - self.snake_head_x + utils.GRID_SIZE > utils.DISPLAY_SIZE-utils.GRID_SIZE or self.snake_head_y + utils.GRID_SIZE > utils.DISPLAY_SIZE-utils.GRID_SIZE): - return True - - return False - - def step(self, action): - is_dead = self.move(action) - return self.get_state(), self.get_points(), is_dead - - def handle_eatfood(self): - if (self.snake_head_x == self.food_x) and (self.snake_head_y == self.food_y): - self.random_food() - self.points += 1 - - def random_food(self): - '''Place the food at a random grid position - ''' - max_x = (utils.DISPLAY_SIZE - utils.WALL_SIZE - utils.GRID_SIZE) - max_y = (utils.DISPLAY_SIZE - utils.WALL_SIZE - utils.GRID_SIZE) - - self.food_x = random.randint(utils.WALL_SIZE, max_x)//utils.GRID_SIZE * utils.GRID_SIZE - self.food_y = random.randint(utils.WALL_SIZE, max_y)//utils.GRID_SIZE * utils.GRID_SIZE - - while self.check_food_on_snake(): # food must not spawn on the snake - self.food_x = random.randint(utils.WALL_SIZE, max_x)//utils.GRID_SIZE * utils.GRID_SIZE - self.food_y = random.randint(utils.WALL_SIZE, max_y)//utils.GRID_SIZE * utils.GRID_SIZE - - def check_food_on_snake(self): - if self.food_x ==
self.snake_head_x and self.food_y == self.snake_head_y: - return True - for seg in self.snake_body: - if self.food_x == seg[0] and self.food_y == seg[1]: - return True - return False - - diff --git a/projects/codes/envs/snake/utils.py b/projects/codes/envs/snake/utils.py deleted file mode 100644 index 01c9b00..0000000 --- a/projects/codes/envs/snake/utils.py +++ /dev/null @@ -1,55 +0,0 @@ -import numpy as np -DISPLAY_SIZE = 560 -GRID_SIZE = 40 -WALL_SIZE = 40 -WHITE = (255, 255, 255) -RED = (255, 0, 0) -BLUE = (72, 61, 139) -BLACK = (0, 0, 0) -GREEN = (0, 255, 0) - -NUM_ADJOINING_WALL_X_STATES=3 -NUM_ADJOINING_WALL_Y_STATES=3 -NUM_FOOD_DIR_X=3 -NUM_FOOD_DIR_Y=3 -NUM_ADJOINING_BODY_TOP_STATES=2 -NUM_ADJOINING_BODY_BOTTOM_STATES=2 -NUM_ADJOINING_BODY_LEFT_STATES=2 -NUM_ADJOINING_BODY_RIGHT_STATES=2 -NUM_ACTIONS = 4 - -CHECKPOINT = 'checkpoint.npy' - -def create_q_table(): - return np.zeros((NUM_ADJOINING_WALL_X_STATES, NUM_ADJOINING_WALL_Y_STATES, NUM_FOOD_DIR_X, NUM_FOOD_DIR_Y, - NUM_ADJOINING_BODY_TOP_STATES, NUM_ADJOINING_BODY_BOTTOM_STATES, NUM_ADJOINING_BODY_LEFT_STATES, - NUM_ADJOINING_BODY_RIGHT_STATES, NUM_ACTIONS)) - -def sanity_check(arr): - if (type(arr) is np.ndarray and - arr.shape==(NUM_ADJOINING_WALL_X_STATES, NUM_ADJOINING_WALL_Y_STATES, NUM_FOOD_DIR_X, NUM_FOOD_DIR_Y, - NUM_ADJOINING_BODY_TOP_STATES, NUM_ADJOINING_BODY_BOTTOM_STATES, NUM_ADJOINING_BODY_LEFT_STATES, - NUM_ADJOINING_BODY_RIGHT_STATES,NUM_ACTIONS)): - return True - else: - return False - -def save(filename, arr): - if sanity_check(arr): - np.save(filename,arr) - return True - else: - print("Failed to save model") - return False - -def load(filename): - try: - arr = np.load(filename) - if sanity_check(arr): - print("Loaded model successfully") - return arr - print("Model loaded is not in the required format") - return None - except OSError: - print("File does not exist or cannot be read") - return None \ No newline at end of file diff --git a/projects/codes/envs/stochastic_mdp.py
b/projects/codes/envs/stochastic_mdp.py deleted file mode 100644 index 3c1ad4d..0000000 --- a/projects/codes/envs/stochastic_mdp.py +++ /dev/null @@ -1,53 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -''' -Author: John -Email: johnjim0816@gmail.com -Date: 2021-03-24 22:12:19 -LastEditor: John -LastEditTime: 2021-03-26 17:12:43 -Description: -Environment: -''' -import numpy as np -import random - - -class StochasticMDP: - def __init__(self): - self.end = False - self.curr_state = 2 - self.n_actions = 2 - self.n_states = 6 - self.p_right = 0.5 - - def reset(self): - self.end = False - self.curr_state = 2 - state = np.zeros(self.n_states) - state[self.curr_state - 1] = 1. - return state - - def step(self, action): - if self.curr_state != 1: - if action == 1: - if random.random() < self.p_right and self.curr_state < self.n_states: - self.curr_state += 1 - else: - self.curr_state -= 1 - - if action == 0: - self.curr_state -= 1 - if self.curr_state == self.n_states: - self.end = True - - state = np.zeros(self.n_states) - state[self.curr_state - 1] = 1.
- - if self.curr_state == 1: - if self.end: - return state, 1.00, True, {} - else: - return state, 1.00/100.00, True, {} - else: - return state, 0.0, False, {} diff --git a/projects/codes/envs/track.txt b/projects/codes/envs/track.txt deleted file mode 100644 index 4bbe230..0000000 --- a/projects/codes/envs/track.txt +++ /dev/null @@ -1,15 +0,0 @@ -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 0 0 0 0 0 3 3 3 3 3 1 -1 1 1 1 1 1 0 0 0 0 0 0 0 3 3 3 3 3 1 -1 1 1 1 1 0 0 0 0 0 0 0 0 3 3 3 3 3 1 -1 1 1 1 0 0 0 0 0 0 0 0 0 3 3 3 3 3 1 -1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 -1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 -1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 \ No newline at end of file diff --git a/projects/codes/envs/windy_gridworld.py b/projects/codes/envs/windy_gridworld.py deleted file mode 100644 index 2a9d4a4..0000000 --- a/projects/codes/envs/windy_gridworld.py +++ /dev/null @@ -1,83 +0,0 @@ -import gym -import numpy as np -import sys -from io import StringIO -from gym.envs.toy_text import discrete - -UP = 0 -RIGHT = 1 -DOWN = 2 -LEFT = 3 - -class WindyGridworldEnv(discrete.DiscreteEnv): - - metadata = {'render.modes': ['human', 'ansi']} - - def _limit_coordinates(self, coord): - coord[0] = min(coord[0], self.shape[0] - 1) - coord[0] = max(coord[0], 0) - coord[1] = min(coord[1], self.shape[1] - 1) - coord[1] = max(coord[1], 0) - return coord - - def _calculate_transition_prob(self, current, delta, winds): - new_position = np.array(current) + np.array(delta) + np.array([-1, 0]) * winds[tuple(current)] - new_position = self._limit_coordinates(new_position).astype(int) - new_state = np.ravel_multi_index(tuple(new_position), self.shape) - is_done = tuple(new_position) == (3, 7) - return [(1.0, new_state, -1.0, is_done)] -
def __init__(self): - self.shape = (7, 10) - - nS = np.prod(self.shape) - n_actions = 4 - - # Wind strength - winds = np.zeros(self.shape) - winds[:,[3,4,5,8]] = 1 - winds[:,[6,7]] = 2 - - # Calculate transition probabilities - P = {} - for s in range(nS): - position = np.unravel_index(s, self.shape) - P[s] = { a : [] for a in range(n_actions) } - P[s][UP] = self._calculate_transition_prob(position, [-1, 0], winds) - P[s][RIGHT] = self._calculate_transition_prob(position, [0, 1], winds) - P[s][DOWN] = self._calculate_transition_prob(position, [1, 0], winds) - P[s][LEFT] = self._calculate_transition_prob(position, [0, -1], winds) - - # We always start in state (3, 0) - isd = np.zeros(nS) - isd[np.ravel_multi_index((3,0), self.shape)] = 1.0 - - super(WindyGridworldEnv, self).__init__(nS, n_actions, P, isd) - - def render(self, mode='human', close=False): - self._render(mode, close) - - def _render(self, mode='human', close=False): - if close: - return - - outfile = StringIO() if mode == 'ansi' else sys.stdout - - for s in range(self.nS): - position = np.unravel_index(s, self.shape) - # print(self.s) - if self.s == s: - output = " x " - elif position == (3,7): - output = " T " - else: - output = " o " - - if position[1] == 0: - output = output.lstrip() - if position[1] == self.shape[1] - 1: - output = output.rstrip() - output += "\n" - - outfile.write(output) - outfile.write("\n") diff --git a/projects/codes/envs/wrappers.py b/projects/codes/envs/wrappers.py deleted file mode 100644 index 0baa03b..0000000 --- a/projects/codes/envs/wrappers.py +++ /dev/null @@ -1,79 +0,0 @@ -import gym -import turtle -class CliffWalkingWapper(gym.Wrapper): - def __init__(self, env): - gym.Wrapper.__init__(self, env) - self.t = None - self.unit = 50 - self.max_x = 12 - self.max_y = 4 - - def draw_x_line(self, y, x0, x1, color='gray'): - assert x1 > x0 - self.t.color(color) - self.t.setheading(0) - self.t.up() - self.t.goto(x0, y) - self.t.down() - self.t.forward(x1 - x0) - - def draw_y_line(self, x,
y0, y1, color='gray'): - assert y1 > y0 - self.t.color(color) - self.t.setheading(90) - self.t.up() - self.t.goto(x, y0) - self.t.down() - self.t.forward(y1 - y0) - - def draw_box(self, x, y, fillcolor='', line_color='gray'): - self.t.up() - self.t.goto(x * self.unit, y * self.unit) - self.t.color(line_color) - self.t.fillcolor(fillcolor) - self.t.setheading(90) - self.t.down() - self.t.begin_fill() - for i in range(4): - self.t.forward(self.unit) - self.t.right(90) - self.t.end_fill() - - def move_player(self, x, y): - self.t.up() - self.t.setheading(90) - self.t.fillcolor('red') - self.t.goto((x + 0.5) * self.unit, (y + 0.5) * self.unit) - - def render(self): - if self.t == None: - self.t = turtle.Turtle() - self.wn = turtle.Screen() - self.wn.setup(self.unit * self.max_x + 100, - self.unit * self.max_y + 100) - self.wn.setworldcoordinates(0, 0, self.unit * self.max_x, - self.unit * self.max_y) - self.t.shape('circle') - self.t.width(2) - self.t.speed(0) - self.t.color('gray') - for _ in range(2): - self.t.forward(self.max_x * self.unit) - self.t.left(90) - self.t.forward(self.max_y * self.unit) - self.t.left(90) - for i in range(1, self.max_y): - self.draw_x_line( - y=i * self.unit, x0=0, x1=self.max_x * self.unit) - for i in range(1, self.max_x): - self.draw_y_line( - x=i * self.unit, y0=0, y1=self.max_y * self.unit) - - for i in range(1, self.max_x - 1): - self.draw_box(i, 0, 'black') - self.draw_box(self.max_x - 1, 0, 'yellow') - self.t.shape('turtle') - - x_pos = self.s % self.max_x - y_pos = self.max_y - 1 - int(self.s / self.max_x) - self.move_player(x_pos, y_pos) \ No newline at end of file diff --git a/projects/codes/scripts/DQN_Acrobot-v1.sh b/projects/codes/scripts/DQN_Acrobot-v1.sh deleted file mode 100644 index 623a0cc..0000000 --- a/projects/codes/scripts/DQN_Acrobot-v1.sh +++ /dev/null @@ -1,3 +0,0 @@ -# run DQN on Acrobot-v1, not the best tuned parameters -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python 
$codes_dir/DQN/main.py --env_name Acrobot-v1 --train_eps 100 --epsilon_decay 1500 --lr 0.002 --memory_capacity 200000 --batch_size 128 --device cuda \ No newline at end of file diff --git a/projects/codes/scripts/DQN_CartPole-v1.sh b/projects/codes/scripts/DQN_CartPole-v1.sh deleted file mode 100644 index e4fe811..0000000 --- a/projects/codes/scripts/DQN_CartPole-v1.sh +++ /dev/null @@ -1,3 +0,0 @@ -# run DQN on CartPole-v1, not finished yet -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/DQN/main.py --env_name CartPole-v1 --train_eps 2000 --gamma 0.99 --epsilon_decay 6000 --lr 0.00001 --memory_capacity 200000 --batch_size 64 --device cuda \ No newline at end of file diff --git a/projects/codes/scripts/DoubleDQN_CartPole-v0.sh b/projects/codes/scripts/DoubleDQN_CartPole-v0.sh deleted file mode 100644 index 0154227..0000000 --- a/projects/codes/scripts/DoubleDQN_CartPole-v0.sh +++ /dev/null @@ -1,3 +0,0 @@ -# run Double DQN on CartPole-v0 -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/DoubleDQN/main.py --device cuda \ No newline at end of file diff --git a/projects/codes/scripts/PolicyGradient_CartPole-v0.sh b/projects/codes/scripts/PolicyGradient_CartPole-v0.sh deleted file mode 100644 index d7e0a69..0000000 --- a/projects/codes/scripts/PolicyGradient_CartPole-v0.sh +++ /dev/null @@ -1,13 +0,0 @@ -# source conda; if you are already in the proper conda environment, comment out the lines up to and including "conda activate easyrl" -if [ -f "$HOME/anaconda3/etc/profile.d/conda.sh" ]; then - echo "source file at ~/anaconda3/etc/profile.d/conda.sh" - source ~/anaconda3/etc/profile.d/conda.sh -elif [ -f "$HOME/opt/anaconda3/etc/profile.d/conda.sh" ]; then - echo "source file at ~/opt/anaconda3/etc/profile.d/conda.sh" - source ~/opt/anaconda3/etc/profile.d/conda.sh -else - echo 'please manually configure the conda source path' -fi -conda activate easyrl # easyrl here can be changed to another name of conda env that
you have created -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/PolicyGradient/main.py \ No newline at end of file diff --git a/projects/codes/scripts/Qlearning_CliffWalking-v0.sh b/projects/codes/scripts/Qlearning_CliffWalking-v0.sh deleted file mode 100644 index 6ba8b53..0000000 --- a/projects/codes/scripts/Qlearning_CliffWalking-v0.sh +++ /dev/null @@ -1,2 +0,0 @@ -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/QLearning/main.py --env_name CliffWalking-v0 --train_eps 400 --gamma 0.90 --epsilon_start 0.95 --epsilon_end 0.01 --epsilon_decay 300 --lr 0.1 --device cpu \ No newline at end of file diff --git a/projects/codes/scripts/Qlearning_FrozenLakeNoSlippery-v1.sh b/projects/codes/scripts/Qlearning_FrozenLakeNoSlippery-v1.sh deleted file mode 100644 index c4638fe..0000000 --- a/projects/codes/scripts/Qlearning_FrozenLakeNoSlippery-v1.sh +++ /dev/null @@ -1,2 +0,0 @@ -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/QLearning/main.py --env_name FrozenLakeNoSlippery-v1 --train_eps 800 --epsilon_start 0.70 --epsilon_end 0.1 --epsilon_decay 2000 --gamma 0.9 --lr 0.9 --device cpu \ No newline at end of file diff --git a/projects/codes/scripts/Qlearning_Racetrack-v0.sh b/projects/codes/scripts/Qlearning_Racetrack-v0.sh deleted file mode 100644 index aba42b2..0000000 --- a/projects/codes/scripts/Qlearning_Racetrack-v0.sh +++ /dev/null @@ -1,2 +0,0 @@ -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/QLearning/main.py --env_name Racetrack-v0 --device cpu \ No newline at end of file diff --git a/projects/codes/scripts/Sarsa_CliffWalking-v0.sh b/projects/codes/scripts/Sarsa_CliffWalking-v0.sh deleted file mode 100644 index 9207c9d..0000000 --- a/projects/codes/scripts/Sarsa_CliffWalking-v0.sh +++ /dev/null @@ -1,2 +0,0 @@ -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/Sarsa/main.py 
--env_name CliffWalking-v0 --train_eps 400 --gamma 0.90 --epsilon_start 0.95 --epsilon_end 0.01 --epsilon_decay 300 --lr 0.1 --device cpu \ No newline at end of file diff --git a/projects/codes/scripts/Sarsa_FrozenLakeNoSlippery-v1.sh b/projects/codes/scripts/Sarsa_FrozenLakeNoSlippery-v1.sh deleted file mode 100644 index 9c77e75..0000000 --- a/projects/codes/scripts/Sarsa_FrozenLakeNoSlippery-v1.sh +++ /dev/null @@ -1,2 +0,0 @@ -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/Sarsa/main.py --env_name FrozenLakeNoSlippery-v1 --train_eps 800 --ep_max_steps 10 --epsilon_start 0.50 --epsilon_end 0.01 --epsilon_decay 2000 --gamma 0.9 --lr 0.1 --device cpu \ No newline at end of file diff --git a/projects/codes/scripts/Sarsa_Racetrack-v0.sh b/projects/codes/scripts/Sarsa_Racetrack-v0.sh deleted file mode 100644 index ff8317e..0000000 --- a/projects/codes/scripts/Sarsa_Racetrack-v0.sh +++ /dev/null @@ -1,2 +0,0 @@ -codes_dir=$(dirname $(dirname $(readlink -f "$0"))) # "codes" path -python $codes_dir/Sarsa/main.py --env_name Racetrack-v0 \ No newline at end of file diff --git a/projects/notebooks/2.Sarsa.ipynb b/projects/notebooks/2.Sarsa.ipynb deleted file mode 100644 index 493cb59..0000000 --- a/projects/notebooks/2.Sarsa.ipynb +++ /dev/null @@ -1,896 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm\n", - "\n", - "Please read the Q-learning tutorial before this one. Sarsa follows the same basic pattern as Q-learning. The fundamental difference is that Sarsa first selects the next action with its $\\varepsilon$-greedy behavior policy and then updates with the action it actually takes, whereas Q-learning updates with the action that has the highest Q-value in the next state and only afterwards acts $\\varepsilon$-greedily. In other words, Sarsa is on-policy and Q-learning is off-policy. As the code below shows, Sarsa and Q-learning differ only slightly, in the update step." 
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from collections import defaultdict\n", - "import torch\n", - "import math\n", - "class Sarsa(object):\n", - " def __init__(self,\n", - " n_actions,cfg):\n", - " self.n_actions = n_actions \n", - " self.lr =
cfg.lr \n", - " self.gamma = cfg.gamma \n", - " self.sample_count = 0 \n", - " self.epsilon_start = cfg.epsilon_start\n", - " self.epsilon_end = cfg.epsilon_end\n", - " self.epsilon_decay = cfg.epsilon_decay \n", - " self.Q = defaultdict(lambda: np.zeros(n_actions)) # Q table\n", - " def sample(self, state):\n", - " self.sample_count += 1\n", - " self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \\\n", - " math.exp(-1. * self.sample_count / self.epsilon_decay) # probability of picking a random action; decays exponentially with the number of samples\n", - " best_action = np.argmax(self.Q[state])\n", - " action_probs = np.ones(self.n_actions, dtype=float) * self.epsilon / self.n_actions\n", - " action_probs[best_action] += (1.0 - self.epsilon)\n", - " action = np.random.choice(np.arange(len(action_probs)), p=action_probs) \n", - " return action\n", - " def predict(self,state):\n", - " return np.argmax(self.Q[state])\n", - " def update(self, state, action, reward, next_state, next_action,done):\n", - " Q_predict = self.Q[state][action]\n", - " if done:\n", - " Q_target = reward # terminal state: there is no next action to bootstrap from\n", - " else:\n", - " Q_target = reward + self.gamma * self.Q[next_state][next_action] # unlike Q-learning, Sarsa uses the Q-value of the next action it actually chose\n", - " self.Q[state][action] += self.lr * (Q_target - Q_predict) \n", - " def save(self,path):\n", - " '''Save the Q table to a file\n", - " '''\n", - " import dill\n", - " torch.save(\n", - " obj=self.Q,\n", - " f=path+\"sarsa_model.pkl\",\n", - " pickle_module=dill\n", - " )\n", - " def load(self, path):\n", - " '''Load the Q table from a file\n", - " '''\n", - " import dill\n", - " self.Q = torch.load(f=path+'sarsa_model.pkl',pickle_module=dill)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Define training\n", - "\n", - "Again, training differs little from Q-learning." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg,env,agent):\n", - " print('开始训练!')\n", - " print(f'环境:{cfg.env_name}, 算法:{cfg.algo_name}, 
设备:{cfg.device}')\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg.train_eps):\n", - " ep_reward = 0 # reward of the current episode\n", - " state = env.reset() # reset the environment, i.e. start a new episode\n", - " action = agent.sample(state) # choose the first action with the epsilon-greedy policy\n", - " while True:\n", - " # no resampling here: the action was chosen before the loop or as next_action in the previous step\n", - " next_state, reward, done, _ = env.step(action) # interact with the environment for one step\n", - " next_action = agent.sample(next_state) # choose the next action with the same policy (on-policy)\n", - " agent.update(state, action, reward, next_state, next_action,done) # update the Q table\n", - " state = next_state # move to the next state\n", - " action = next_action # the action used in the update is the one actually taken next\n", - " ep_reward += reward\n", - " if done:\n", - " break\n", - " rewards.append(ep_reward)\n", - " print(f\"回合:{i_ep+1}/{cfg.train_eps},奖励:{ep_reward:.1f},Epsilon:{agent.epsilon}\")\n", - " print('完成训练!')\n", - " return {\"rewards\":rewards}\n", - " \n", - "def test(cfg,env,agent):\n", - " print('开始测试!')\n", - " print(f'环境:{cfg.env_name}, 算法:{cfg.algo_name}, 设备:{cfg.device}')\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg.test_eps):\n", - " ep_reward = 0 # reward of the current episode\n", - " state = env.reset() # reset the environment, i.e. start a new episode\n", - " while True:\n", - " action = agent.predict(state) # choose the greedy action\n", - " next_state, reward, done, _ = env.step(action) # interact with the environment for one step\n", - " state = next_state # move to the next state\n", - " ep_reward += reward\n", - " if done:\n", - " break\n", - " rewards.append(ep_reward)\n", - " print(f\"回合数:{i_ep+1}/{cfg.test_eps}, 奖励:{ep_reward:.1f}\")\n", - " print('完成测试!')\n", - " return {\"rewards\":rewards}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Define the environment\n", - "To see the difference between Q-learning and Sarsa concretely, we use the same environment as in the Q-learning tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import turtle\n", - "import numpy as np\n", - "\n", - "# turtle tutorial : https://docs.python.org/3.3/library/turtle.html\n", - "\n", - "def GridWorld(gridmap=None, is_slippery=False):\n", - " if gridmap is None:\n", - " gridmap = ['SFFF', 'FHFH', 
'FFFH', 'HFFG']\n", - " env = gym.make(\"FrozenLake-v0\", desc=gridmap, is_slippery=is_slippery) # pass the argument through instead of hard-coding False\n", - " env = FrozenLakeWapper(env)\n", - " return env\n", - "\n", - "\n", - "class FrozenLakeWapper(gym.Wrapper):\n", - " def __init__(self, env):\n", - " gym.Wrapper.__init__(self, env)\n", - " self.max_y = env.desc.shape[0]\n", - " self.max_x = env.desc.shape[1]\n", - " self.t = None\n", - " self.unit = 50\n", - "\n", - " def draw_box(self, x, y, fillcolor='', line_color='gray'):\n", - " self.t.up()\n", - " self.t.goto(x * self.unit, y * self.unit)\n", - " self.t.color(line_color)\n", - " self.t.fillcolor(fillcolor)\n", - " self.t.setheading(90)\n", - " self.t.down()\n", - " self.t.begin_fill()\n", - " for _ in range(4):\n", - " self.t.forward(self.unit)\n", - " self.t.right(90)\n", - " self.t.end_fill()\n", - "\n", - " def move_player(self, x, y):\n", - " self.t.up()\n", - " self.t.setheading(90)\n", - " self.t.fillcolor('red')\n", - " self.t.goto((x + 0.5) * self.unit, (y + 0.5) * self.unit)\n", - "\n", - " def render(self):\n", - " if self.t is None:\n", - " self.t = turtle.Turtle()\n", - " self.wn = turtle.Screen()\n", - " self.wn.setup(self.unit * self.max_x + 100,\n", - " self.unit * self.max_y + 100)\n", - " self.wn.setworldcoordinates(0, 0, self.unit * self.max_x,\n", - " self.unit * self.max_y)\n", - " self.t.shape('circle')\n", - " self.t.width(2)\n", - " self.t.speed(0)\n", - " self.t.color('gray')\n", - " for i in range(self.desc.shape[0]):\n", - " for j in range(self.desc.shape[1]):\n", - " x = j\n", - " y = self.max_y - 1 - i\n", - " if self.desc[i][j] == b'S': # Start\n", - " self.draw_box(x, y, 'white')\n", - " elif self.desc[i][j] == b'F': # Frozen ice\n", - " self.draw_box(x, y, 'white')\n", - " elif self.desc[i][j] == b'G': # Goal\n", - " self.draw_box(x, y, 'yellow')\n", - " elif self.desc[i][j] == b'H': # Hole\n", - " self.draw_box(x, y, 'black')\n", - " else:\n", - " self.draw_box(x, y, 'white')\n", - " self.t.shape('turtle')\n", - "\n", - " x_pos = 
self.s % self.max_x\n", - " y_pos = self.max_y - 1 - int(self.s / self.max_x)\n", - " self.move_player(x_pos, y_pos)\n", - "\n", - "\n", - "class CliffWalkingWapper(gym.Wrapper):\n", - " def __init__(self, env):\n", - " gym.Wrapper.__init__(self, env)\n", - " self.t = None\n", - " self.unit = 50\n", - " self.max_x = 12\n", - " self.max_y = 4\n", - "\n", - " def draw_x_line(self, y, x0, x1, color='gray'):\n", - " assert x1 > x0\n", - " self.t.color(color)\n", - " self.t.setheading(0)\n", - " self.t.up()\n", - " self.t.goto(x0, y)\n", - " self.t.down()\n", - " self.t.forward(x1 - x0)\n", - "\n", - " def draw_y_line(self, x, y0, y1, color='gray'):\n", - " assert y1 > y0\n", - " self.t.color(color)\n", - " self.t.setheading(90)\n", - " self.t.up()\n", - " self.t.goto(x, y0)\n", - " self.t.down()\n", - " self.t.forward(y1 - y0)\n", - "\n", - " def draw_box(self, x, y, fillcolor='', line_color='gray'):\n", - " self.t.up()\n", - " self.t.goto(x * self.unit, y * self.unit)\n", - " self.t.color(line_color)\n", - " self.t.fillcolor(fillcolor)\n", - " self.t.setheading(90)\n", - " self.t.down()\n", - " self.t.begin_fill()\n", - " for i in range(4):\n", - " self.t.forward(self.unit)\n", - " self.t.right(90)\n", - " self.t.end_fill()\n", - "\n", - " def move_player(self, x, y):\n", - " self.t.up()\n", - " self.t.setheading(90)\n", - " self.t.fillcolor('red')\n", - " self.t.goto((x + 0.5) * self.unit, (y + 0.5) * self.unit)\n", - "\n", - " def render(self):\n", - " if self.t == None:\n", - " self.t = turtle.Turtle()\n", - " self.wn = turtle.Screen()\n", - " self.wn.setup(self.unit * self.max_x + 100,\n", - " self.unit * self.max_y + 100)\n", - " self.wn.setworldcoordinates(0, 0, self.unit * self.max_x,\n", - " self.unit * self.max_y)\n", - " self.t.shape('circle')\n", - " self.t.width(2)\n", - " self.t.speed(0)\n", - " self.t.color('gray')\n", - " for _ in range(2):\n", - " self.t.forward(self.max_x * self.unit)\n", - " self.t.left(90)\n", - " self.t.forward(self.max_y * 
self.unit)\n", - " self.t.left(90)\n", - " for i in range(1, self.max_y):\n", - " self.draw_x_line(\n", - " y=i * self.unit, x0=0, x1=self.max_x * self.unit)\n", - " for i in range(1, self.max_x):\n", - " self.draw_y_line(\n", - " x=i * self.unit, y0=0, y1=self.max_y * self.unit)\n", - "\n", - " for i in range(1, self.max_x - 1):\n", - " self.draw_box(i, 0, 'black')\n", - " self.draw_box(self.max_x - 1, 0, 'yellow')\n", - " self.t.shape('turtle')\n", - "\n", - " x_pos = self.s % self.max_x\n", - " y_pos = self.max_y - 1 - int(self.s / self.max_x)\n", - " self.move_player(x_pos, y_pos)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "def env_agent_config(cfg,seed=1):\n", - " '''Create the environment and the agent\n", - " Args:\n", - " cfg (argparse.Namespace): hyperparameters\n", - " seed (int, optional): random seed. Defaults to 1.\n", - " Returns:\n", - " env (gym.Env): environment\n", - " agent (Sarsa): agent\n", - " ''' \n", - " env = gym.make(cfg.env_name) \n", - " env = CliffWalkingWapper(env)\n", - " env.seed(seed) # set the random seed\n", - " n_states = env.observation_space.n # number of states\n", - " n_actions = env.action_space.n # number of actions\n", - " print(f\"状态数:{n_states},动作数:{n_actions}\")\n", - " agent = Sarsa(n_actions,cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Set the hyperparameters\n", - "The hyperparameters are the same as in the Q-learning tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import datetime\n", - "import argparse\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "def get_args():\n", - " \"\"\" Return the hyperparameters (argparse defaults; no command-line input is parsed)\n", - " \"\"\"\n", - " curr_time = datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\") # current time (not used below)\n", - " parser = argparse.ArgumentParser(description=\"hyperparameters\") \n", - " parser.add_argument('--algo_name',default='Sarsa',type=str,help=\"name of algorithm\")\n", - " parser.add_argument('--env_name',default='CliffWalking-v0',type=str,help=\"name of environment\")\n", - " 
parser.add_argument('--train_eps',default=400,type=int,help=\"episodes of training\") # number of training episodes\n", - " parser.add_argument('--test_eps',default=20,type=int,help=\"episodes of testing\") # number of test episodes\n", - " parser.add_argument('--gamma',default=0.90,type=float,help=\"discount factor\") # discount factor\n", - " parser.add_argument('--epsilon_start',default=0.95,type=float,help=\"initial value of epsilon\") # initial epsilon of the e-greedy policy\n", - " parser.add_argument('--epsilon_end',default=0.01,type=float,help=\"final value of epsilon\") # final epsilon of the e-greedy policy\n", - " parser.add_argument('--epsilon_decay',default=300,type=int,help=\"decay time constant of epsilon\") # larger values make epsilon decay more slowly (see Sarsa.sample)\n", - " parser.add_argument('--lr',default=0.1,type=float,help=\"learning rate\")\n", - " parser.add_argument('--device',default='cpu',type=str,help=\"cpu or cuda\") \n", - " args = parser.parse_args([]) \n", - " return args\n", - "\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve, similar to the smoothing option in TensorBoard\n", - "\n", - " Args:\n", - " data (List): input data\n", - " weight (Float): smoothing weight in [0, 1]; higher means smoother, typically 0.9\n", - "\n", - " Returns:\n", - " smoothed (List): smoothed data\n", - " '''\n", - " last = data[0] # First value in the plot (first timestep)\n", - " smoothed = list()\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # exponential moving average\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,cfg, tag='train'):\n", - " sns.set()\n", - " plt.figure() # create a new figure so that several plots can be drawn\n", - " plt.title(f\"{tag}ing curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}\")\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. I'm ready!\n", - "Look closely and you will see that Sarsa converges somewhat faster but to a lower final reward, while Q-learning behaves the other way round. Why this happens is left for you to think about." - ] - }, - { - "cell_type": "code", - 
"execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "状态数:48,动作数:4\n", - "开始训练!\n", - "环境:CliffWalking-v0, 算法:Sarsa, 设备:cpu\n", - "回合:1/400,奖励:-1524.0,Epsilon:0.2029722781251147\n", - "回合:2/400,奖励:-1294.0,Epsilon:0.011808588201828951\n", - "回合:3/400,奖励:-192.0,Epsilon:0.01050118158853445\n", - "回合:4/400,奖励:-346.0,Epsilon:0.010049747911736582\n", - "回合:5/400,奖励:-252.0,Epsilon:0.010009240861841986\n", - "回合:6/400,奖励:-168.0,Epsilon:0.010003005072880926\n", - "回合:7/400,奖励:-393.0,Epsilon:0.01000042188120369\n", - "回合:8/400,奖励:-169.0,Epsilon:0.010000136281659052\n", - "回合:9/400,奖励:-97.0,Epsilon:0.010000071145264558\n", - "回合:10/400,奖励:-134.0,Epsilon:0.010000029022085234\n", - "回合:11/400,奖励:-124.0,Epsilon:0.010000012655059554\n", - "回合:12/400,奖励:-74.0,Epsilon:0.010000007701309915\n", - "回合:13/400,奖励:-135.0,Epsilon:0.010000003120699265\n", - "回合:14/400,奖励:-84.0,Epsilon:0.010000001776639691\n", - "回合:15/400,奖励:-101.0,Epsilon:0.010000000903081117\n", - "回合:16/400,奖励:-111.0,Epsilon:0.010000000429438717\n", - "回合:17/400,奖励:-114.0,Epsilon:0.010000000200165738\n", - "回合:18/400,奖励:-114.0,Epsilon:0.010000000093299278\n", - "回合:19/400,奖励:-82.0,Epsilon:0.010000000053829002\n", - "回合:20/400,奖励:-85.0,Epsilon:0.01000000003044167\n", - "回合:21/400,奖励:-108.0,Epsilon:0.010000000014768242\n", - "回合:22/400,奖励:-66.0,Epsilon:0.010000000009479634\n", - "回合:23/400,奖励:-74.0,Epsilon:0.010000000005768887\n", - "回合:24/400,奖励:-114.0,Epsilon:0.010000000002688936\n", - "回合:25/400,奖励:-98.0,Epsilon:0.010000000001394421\n", - "回合:26/400,奖励:-94.0,Epsilon:0.010000000000742658\n", - "回合:27/400,奖励:-58.0,Epsilon:0.010000000000502822\n", - "回合:28/400,奖励:-100.0,Epsilon:0.010000000000257298\n", - "回合:29/400,奖励:-208.0,Epsilon:0.010000000000123995\n", - "回合:30/400,奖励:-184.0,Epsilon:0.010000000000070121\n", - "回合:31/400,奖励:-62.0,Epsilon:0.010000000000046227\n", - "回合:32/400,奖励:-117.0,Epsilon:0.01000000000002112\n", - 
"回合:33/400,奖励:-47.0,Epsilon:0.010000000000015387\n", - "回合:34/400,奖励:-54.0,Epsilon:0.0100000000000107\n", - "回合:35/400,奖励:-120.0,Epsilon:0.010000000000004792\n", - "回合:36/400,奖励:-75.0,Epsilon:0.010000000000002897\n", - "回合:37/400,奖励:-62.0,Epsilon:0.01000000000000191\n", - "回合:38/400,奖励:-70.0,Epsilon:0.010000000000001194\n", - "回合:39/400,奖励:-67.0,Epsilon:0.010000000000000762\n", - "回合:40/400,奖励:-87.0,Epsilon:0.010000000000000425\n", - "回合:41/400,奖励:-92.0,Epsilon:0.01000000000000023\n", - "回合:42/400,奖励:-79.0,Epsilon:0.010000000000000136\n", - "回合:43/400,奖励:-49.0,Epsilon:0.010000000000000097\n", - "回合:44/400,奖励:-103.0,Epsilon:0.010000000000000049\n", - "回合:45/400,奖励:-40.0,Epsilon:0.010000000000000037\n", - "回合:46/400,奖励:-214.0,Epsilon:0.010000000000000018\n", - "回合:47/400,奖励:-83.0,Epsilon:0.01000000000000001\n", - "回合:48/400,奖励:-62.0,Epsilon:0.010000000000000007\n", - "回合:49/400,奖励:-37.0,Epsilon:0.010000000000000005\n", - "回合:50/400,奖励:-73.0,Epsilon:0.010000000000000004\n", - "回合:51/400,奖励:-66.0,Epsilon:0.010000000000000002\n", - "回合:52/400,奖励:-48.0,Epsilon:0.010000000000000002\n", - "回合:53/400,奖励:-96.0,Epsilon:0.01\n", - "回合:54/400,奖励:-189.0,Epsilon:0.01\n", - "回合:55/400,奖励:-42.0,Epsilon:0.01\n", - "回合:56/400,奖励:-46.0,Epsilon:0.01\n", - "回合:57/400,奖励:-85.0,Epsilon:0.01\n", - "回合:58/400,奖励:-52.0,Epsilon:0.01\n", - "回合:59/400,奖励:-86.0,Epsilon:0.01\n", - "回合:60/400,奖励:-41.0,Epsilon:0.01\n", - "回合:61/400,奖励:-51.0,Epsilon:0.01\n", - "回合:62/400,奖励:-59.0,Epsilon:0.01\n", - "回合:63/400,奖励:-145.0,Epsilon:0.01\n", - "回合:64/400,奖励:-76.0,Epsilon:0.01\n", - "回合:65/400,奖励:-43.0,Epsilon:0.01\n", - "回合:66/400,奖励:-49.0,Epsilon:0.01\n", - "回合:67/400,奖励:-36.0,Epsilon:0.01\n", - "回合:68/400,奖励:-41.0,Epsilon:0.01\n", - "回合:69/400,奖励:-69.0,Epsilon:0.01\n", - "回合:70/400,奖励:-38.0,Epsilon:0.01\n", - "回合:71/400,奖励:-63.0,Epsilon:0.01\n", - "回合:72/400,奖励:-46.0,Epsilon:0.01\n", - "回合:73/400,奖励:-30.0,Epsilon:0.01\n", - "回合:74/400,奖励:-45.0,Epsilon:0.01\n", - "回合:75/400,奖励:-38.0,Epsilon:0.01\n", - 
"回合:76/400,奖励:-88.0,Epsilon:0.01\n", - "回合:77/400,奖励:-19.0,Epsilon:0.01\n", - "回合:78/400,奖励:-40.0,Epsilon:0.01\n", - "回合:79/400,奖励:-62.0,Epsilon:0.01\n", - "回合:80/400,奖励:-25.0,Epsilon:0.01\n", - "回合:81/400,奖励:-54.0,Epsilon:0.01\n", - "回合:82/400,奖励:-41.0,Epsilon:0.01\n", - "回合:83/400,奖励:-57.0,Epsilon:0.01\n", - "回合:84/400,奖励:-52.0,Epsilon:0.01\n", - "回合:85/400,奖励:-42.0,Epsilon:0.01\n", - "回合:86/400,奖励:-51.0,Epsilon:0.01\n", - "回合:87/400,奖励:-53.0,Epsilon:0.01\n", - "回合:88/400,奖励:-42.0,Epsilon:0.01\n", - "回合:89/400,奖励:-53.0,Epsilon:0.01\n", - "回合:90/400,奖励:-31.0,Epsilon:0.01\n", - "回合:91/400,奖励:-75.0,Epsilon:0.01\n", - "回合:92/400,奖励:-148.0,Epsilon:0.01\n", - "回合:93/400,奖励:-41.0,Epsilon:0.01\n", - "回合:94/400,奖励:-47.0,Epsilon:0.01\n", - "回合:95/400,奖励:-184.0,Epsilon:0.01\n", - "回合:96/400,奖励:-34.0,Epsilon:0.01\n", - "回合:97/400,奖励:-45.0,Epsilon:0.01\n", - "回合:98/400,奖励:-52.0,Epsilon:0.01\n", - "回合:99/400,奖励:-44.0,Epsilon:0.01\n", - "回合:100/400,奖励:-49.0,Epsilon:0.01\n", - "回合:101/400,奖励:-30.0,Epsilon:0.01\n", - "回合:102/400,奖励:-49.0,Epsilon:0.01\n", - "回合:103/400,奖励:-23.0,Epsilon:0.01\n", - "回合:104/400,奖励:-37.0,Epsilon:0.01\n", - "回合:105/400,奖励:-37.0,Epsilon:0.01\n", - "回合:106/400,奖励:-44.0,Epsilon:0.01\n", - "回合:107/400,奖励:-40.0,Epsilon:0.01\n", - "回合:108/400,奖励:-28.0,Epsilon:0.01\n", - "回合:109/400,奖励:-50.0,Epsilon:0.01\n", - "回合:110/400,奖励:-46.0,Epsilon:0.01\n", - "回合:111/400,奖励:-28.0,Epsilon:0.01\n", - "回合:112/400,奖励:-35.0,Epsilon:0.01\n", - "回合:113/400,奖励:-35.0,Epsilon:0.01\n", - "回合:114/400,奖励:-45.0,Epsilon:0.01\n", - "回合:115/400,奖励:-38.0,Epsilon:0.01\n", - "回合:116/400,奖励:-39.0,Epsilon:0.01\n", - "回合:117/400,奖励:-27.0,Epsilon:0.01\n", - "回合:118/400,奖励:-49.0,Epsilon:0.01\n", - "回合:119/400,奖励:-27.0,Epsilon:0.01\n", - "回合:120/400,奖励:-25.0,Epsilon:0.01\n", - "回合:121/400,奖励:-50.0,Epsilon:0.01\n", - "回合:122/400,奖励:-41.0,Epsilon:0.01\n", - "回合:123/400,奖励:-22.0,Epsilon:0.01\n", - "回合:124/400,奖励:-38.0,Epsilon:0.01\n", - "回合:125/400,奖励:-125.0,Epsilon:0.01\n", - 
"回合:126/400,奖励:-25.0,Epsilon:0.01\n", - "回合:127/400,奖励:-40.0,Epsilon:0.01\n", - "回合:128/400,奖励:-33.0,Epsilon:0.01\n", - "回合:129/400,奖励:-56.0,Epsilon:0.01\n", - "回合:130/400,奖励:-32.0,Epsilon:0.01\n", - "回合:131/400,奖励:-21.0,Epsilon:0.01\n", - "回合:132/400,奖励:-33.0,Epsilon:0.01\n", - "回合:133/400,奖励:-23.0,Epsilon:0.01\n", - "回合:134/400,奖励:-33.0,Epsilon:0.01\n", - "回合:135/400,奖励:-34.0,Epsilon:0.01\n", - "回合:136/400,奖励:-33.0,Epsilon:0.01\n", - "回合:137/400,奖励:-21.0,Epsilon:0.01\n", - "回合:138/400,奖励:-40.0,Epsilon:0.01\n", - "回合:139/400,奖励:-23.0,Epsilon:0.01\n", - "回合:140/400,奖励:-31.0,Epsilon:0.01\n", - "回合:141/400,奖励:-31.0,Epsilon:0.01\n", - "回合:142/400,奖励:-26.0,Epsilon:0.01\n", - "回合:143/400,奖励:-26.0,Epsilon:0.01\n", - "回合:144/400,奖励:-32.0,Epsilon:0.01\n", - "回合:145/400,奖励:-27.0,Epsilon:0.01\n", - "回合:146/400,奖励:-33.0,Epsilon:0.01\n", - "回合:147/400,奖励:-35.0,Epsilon:0.01\n", - "回合:148/400,奖励:-21.0,Epsilon:0.01\n", - "回合:149/400,奖励:-23.0,Epsilon:0.01\n", - "回合:150/400,奖励:-33.0,Epsilon:0.01\n", - "回合:151/400,奖励:-25.0,Epsilon:0.01\n", - "回合:152/400,奖励:-41.0,Epsilon:0.01\n", - "回合:153/400,奖励:-31.0,Epsilon:0.01\n", - "回合:154/400,奖励:-28.0,Epsilon:0.01\n", - "回合:155/400,奖励:-133.0,Epsilon:0.01\n", - "回合:156/400,奖励:-22.0,Epsilon:0.01\n", - "回合:157/400,奖励:-21.0,Epsilon:0.01\n", - "回合:158/400,奖励:-33.0,Epsilon:0.01\n", - "回合:159/400,奖励:-33.0,Epsilon:0.01\n", - "回合:160/400,奖励:-24.0,Epsilon:0.01\n", - "回合:161/400,奖励:-34.0,Epsilon:0.01\n", - "回合:162/400,奖励:-20.0,Epsilon:0.01\n", - "回合:163/400,奖励:-21.0,Epsilon:0.01\n", - "回合:164/400,奖励:-126.0,Epsilon:0.01\n", - "回合:165/400,奖励:-36.0,Epsilon:0.01\n", - "回合:166/400,奖励:-18.0,Epsilon:0.01\n", - "回合:167/400,奖励:-35.0,Epsilon:0.01\n", - "回合:168/400,奖励:-26.0,Epsilon:0.01\n", - "回合:169/400,奖励:-24.0,Epsilon:0.01\n", - "回合:170/400,奖励:-33.0,Epsilon:0.01\n", - "回合:171/400,奖励:-17.0,Epsilon:0.01\n", - "回合:172/400,奖励:-23.0,Epsilon:0.01\n", - "回合:173/400,奖励:-26.0,Epsilon:0.01\n", - "回合:174/400,奖励:-23.0,Epsilon:0.01\n", - "回合:175/400,奖励:-21.0,Epsilon:0.01\n", 
- "回合:176/400,奖励:-35.0,Epsilon:0.01\n", - "回合:177/400,奖励:-26.0,Epsilon:0.01\n", - "回合:178/400,奖励:-17.0,Epsilon:0.01\n", - "回合:179/400,奖励:-20.0,Epsilon:0.01\n", - "回合:180/400,奖励:-28.0,Epsilon:0.01\n", - "回合:181/400,奖励:-34.0,Epsilon:0.01\n", - "回合:182/400,奖励:-27.0,Epsilon:0.01\n", - "回合:183/400,奖励:-22.0,Epsilon:0.01\n", - "回合:184/400,奖励:-24.0,Epsilon:0.01\n", - "回合:185/400,奖励:-26.0,Epsilon:0.01\n", - "回合:186/400,奖励:-20.0,Epsilon:0.01\n", - "回合:187/400,奖励:-30.0,Epsilon:0.01\n", - "回合:188/400,奖励:-28.0,Epsilon:0.01\n", - "回合:189/400,奖励:-15.0,Epsilon:0.01\n", - "回合:190/400,奖励:-30.0,Epsilon:0.01\n", - "回合:191/400,奖励:-29.0,Epsilon:0.01\n", - "回合:192/400,奖励:-22.0,Epsilon:0.01\n", - "回合:193/400,奖励:-25.0,Epsilon:0.01\n", - "回合:194/400,奖励:-21.0,Epsilon:0.01\n", - "回合:195/400,奖励:-19.0,Epsilon:0.01\n", - "回合:196/400,奖励:-23.0,Epsilon:0.01\n", - "回合:197/400,奖励:-21.0,Epsilon:0.01\n", - "回合:198/400,奖励:-32.0,Epsilon:0.01\n", - "回合:199/400,奖励:-30.0,Epsilon:0.01\n", - "回合:200/400,奖励:-22.0,Epsilon:0.01\n", - "回合:201/400,奖励:-20.0,Epsilon:0.01\n", - "回合:202/400,奖励:-27.0,Epsilon:0.01\n", - "回合:203/400,奖励:-21.0,Epsilon:0.01\n", - "回合:204/400,奖励:-26.0,Epsilon:0.01\n", - "回合:205/400,奖励:-19.0,Epsilon:0.01\n", - "回合:206/400,奖励:-17.0,Epsilon:0.01\n", - "回合:207/400,奖励:-31.0,Epsilon:0.01\n", - "回合:208/400,奖励:-18.0,Epsilon:0.01\n", - "回合:209/400,奖励:-24.0,Epsilon:0.01\n", - "回合:210/400,奖励:-17.0,Epsilon:0.01\n", - "回合:211/400,奖励:-26.0,Epsilon:0.01\n", - "回合:212/400,奖励:-27.0,Epsilon:0.01\n", - "回合:213/400,奖励:-33.0,Epsilon:0.01\n", - "回合:214/400,奖励:-16.0,Epsilon:0.01\n", - "回合:215/400,奖励:-32.0,Epsilon:0.01\n", - "回合:216/400,奖励:-19.0,Epsilon:0.01\n", - "回合:217/400,奖励:-20.0,Epsilon:0.01\n", - "回合:218/400,奖励:-15.0,Epsilon:0.01\n", - "回合:219/400,奖励:-119.0,Epsilon:0.01\n", - "回合:220/400,奖励:-26.0,Epsilon:0.01\n", - "回合:221/400,奖励:-26.0,Epsilon:0.01\n", - "回合:222/400,奖励:-22.0,Epsilon:0.01\n", - "回合:223/400,奖励:-22.0,Epsilon:0.01\n", - "回合:224/400,奖励:-15.0,Epsilon:0.01\n", - 
"回合:225/400,奖励:-24.0,Epsilon:0.01\n", - "回合:226/400,奖励:-15.0,Epsilon:0.01\n", - "回合:227/400,奖励:-31.0,Epsilon:0.01\n", - "回合:228/400,奖励:-24.0,Epsilon:0.01\n", - "回合:229/400,奖励:-20.0,Epsilon:0.01\n", - "回合:230/400,奖励:-20.0,Epsilon:0.01\n", - "回合:231/400,奖励:-22.0,Epsilon:0.01\n", - "回合:232/400,奖励:-15.0,Epsilon:0.01\n", - "回合:233/400,奖励:-19.0,Epsilon:0.01\n", - "回合:234/400,奖励:-21.0,Epsilon:0.01\n", - "回合:235/400,奖励:-27.0,Epsilon:0.01\n", - "回合:236/400,奖励:-15.0,Epsilon:0.01\n", - "回合:237/400,奖励:-25.0,Epsilon:0.01\n", - "回合:238/400,奖励:-22.0,Epsilon:0.01\n", - "回合:239/400,奖励:-16.0,Epsilon:0.01\n", - "回合:240/400,奖励:-18.0,Epsilon:0.01\n", - "回合:241/400,奖励:-13.0,Epsilon:0.01\n", - "回合:242/400,奖励:-13.0,Epsilon:0.01\n", - "回合:243/400,奖励:-13.0,Epsilon:0.01\n", - "回合:244/400,奖励:-23.0,Epsilon:0.01\n", - "回合:245/400,奖励:-29.0,Epsilon:0.01\n", - "回合:246/400,奖励:-26.0,Epsilon:0.01\n", - "回合:247/400,奖励:-19.0,Epsilon:0.01\n", - "回合:248/400,奖励:-21.0,Epsilon:0.01\n", - "回合:249/400,奖励:-17.0,Epsilon:0.01\n", - "回合:250/400,奖励:-17.0,Epsilon:0.01\n", - "回合:251/400,奖励:-15.0,Epsilon:0.01\n", - "回合:252/400,奖励:-20.0,Epsilon:0.01\n", - "回合:253/400,奖励:-23.0,Epsilon:0.01\n", - "回合:254/400,奖励:-19.0,Epsilon:0.01\n", - "回合:255/400,奖励:-21.0,Epsilon:0.01\n", - "回合:256/400,奖励:-19.0,Epsilon:0.01\n", - "回合:257/400,奖励:-17.0,Epsilon:0.01\n", - "回合:258/400,奖励:-17.0,Epsilon:0.01\n", - "回合:259/400,奖励:-15.0,Epsilon:0.01\n", - "回合:260/400,奖励:-21.0,Epsilon:0.01\n", - "回合:261/400,奖励:-17.0,Epsilon:0.01\n", - "回合:262/400,奖励:-19.0,Epsilon:0.01\n", - "回合:263/400,奖励:-19.0,Epsilon:0.01\n", - "回合:264/400,奖励:-15.0,Epsilon:0.01\n", - "回合:265/400,奖励:-19.0,Epsilon:0.01\n", - "回合:266/400,奖励:-17.0,Epsilon:0.01\n", - "回合:267/400,奖励:-15.0,Epsilon:0.01\n", - "回合:268/400,奖励:-19.0,Epsilon:0.01\n", - "回合:269/400,奖励:-27.0,Epsilon:0.01\n", - "回合:270/400,奖励:-15.0,Epsilon:0.01\n", - "回合:271/400,奖励:-17.0,Epsilon:0.01\n", - "回合:272/400,奖励:-17.0,Epsilon:0.01\n", - "回合:273/400,奖励:-25.0,Epsilon:0.01\n", - "回合:274/400,奖励:-19.0,Epsilon:0.01\n", - 
"回合:275/400,奖励:-22.0,Epsilon:0.01\n", - "回合:276/400,奖励:-23.0,Epsilon:0.01\n", - "回合:277/400,奖励:-18.0,Epsilon:0.01\n", - "回合:278/400,奖励:-23.0,Epsilon:0.01\n", - "回合:279/400,奖励:-21.0,Epsilon:0.01\n", - "回合:280/400,奖励:-21.0,Epsilon:0.01\n", - "回合:281/400,奖励:-21.0,Epsilon:0.01\n", - "回合:282/400,奖励:-19.0,Epsilon:0.01\n", - "回合:283/400,奖励:-18.0,Epsilon:0.01\n", - "回合:284/400,奖励:-15.0,Epsilon:0.01\n", - "回合:285/400,奖励:-19.0,Epsilon:0.01\n", - "回合:286/400,奖励:-19.0,Epsilon:0.01\n", - "回合:287/400,奖励:-21.0,Epsilon:0.01\n", - "回合:288/400,奖励:-15.0,Epsilon:0.01\n", - "回合:289/400,奖励:-32.0,Epsilon:0.01\n", - "回合:290/400,奖励:-18.0,Epsilon:0.01\n", - "回合:291/400,奖励:-17.0,Epsilon:0.01\n", - "回合:292/400,奖励:-15.0,Epsilon:0.01\n", - "回合:293/400,奖励:-24.0,Epsilon:0.01\n", - "回合:294/400,奖励:-22.0,Epsilon:0.01\n", - "回合:295/400,奖励:-31.0,Epsilon:0.01\n", - "回合:296/400,奖励:-17.0,Epsilon:0.01\n", - "回合:297/400,奖励:-19.0,Epsilon:0.01\n", - "回合:298/400,奖励:-19.0,Epsilon:0.01\n", - "回合:299/400,奖励:-20.0,Epsilon:0.01\n", - "回合:300/400,奖励:-21.0,Epsilon:0.01\n", - "回合:301/400,奖励:-26.0,Epsilon:0.01\n", - "回合:302/400,奖励:-20.0,Epsilon:0.01\n", - "回合:303/400,奖励:-16.0,Epsilon:0.01\n", - "回合:304/400,奖励:-20.0,Epsilon:0.01\n", - "回合:305/400,奖励:-21.0,Epsilon:0.01\n", - "回合:306/400,奖励:-16.0,Epsilon:0.01\n", - "回合:307/400,奖励:-19.0,Epsilon:0.01\n", - "回合:308/400,奖励:-24.0,Epsilon:0.01\n", - "回合:309/400,奖励:-20.0,Epsilon:0.01\n", - "回合:310/400,奖励:-17.0,Epsilon:0.01\n", - "回合:311/400,奖励:-16.0,Epsilon:0.01\n", - "回合:312/400,奖励:-25.0,Epsilon:0.01\n", - "回合:313/400,奖励:-16.0,Epsilon:0.01\n", - "回合:314/400,奖励:-19.0,Epsilon:0.01\n", - "回合:315/400,奖励:-19.0,Epsilon:0.01\n", - "回合:316/400,奖励:-27.0,Epsilon:0.01\n", - "回合:317/400,奖励:-15.0,Epsilon:0.01\n", - "回合:318/400,奖励:-15.0,Epsilon:0.01\n", - "回合:319/400,奖励:-15.0,Epsilon:0.01\n", - "回合:320/400,奖励:-19.0,Epsilon:0.01\n", - "回合:321/400,奖励:-23.0,Epsilon:0.01\n", - "回合:322/400,奖励:-24.0,Epsilon:0.01\n", - "回合:323/400,奖励:-15.0,Epsilon:0.01\n", - "回合:324/400,奖励:-20.0,Epsilon:0.01\n", - 
"回合:325/400,奖励:-18.0,Epsilon:0.01\n", - "回合:326/400,奖励:-19.0,Epsilon:0.01\n", - "回合:327/400,奖励:-19.0,Epsilon:0.01\n", - "回合:328/400,奖励:-26.0,Epsilon:0.01\n", - "回合:329/400,奖励:-16.0,Epsilon:0.01\n", - "回合:330/400,奖励:-18.0,Epsilon:0.01\n", - "回合:331/400,奖励:-15.0,Epsilon:0.01\n", - "回合:332/400,奖励:-15.0,Epsilon:0.01\n", - "回合:333/400,奖励:-17.0,Epsilon:0.01\n", - "回合:334/400,奖励:-17.0,Epsilon:0.01\n", - "回合:335/400,奖励:-16.0,Epsilon:0.01\n", - "回合:336/400,奖励:-24.0,Epsilon:0.01\n", - "回合:337/400,奖励:-15.0,Epsilon:0.01\n", - "回合:338/400,奖励:-18.0,Epsilon:0.01\n", - "回合:339/400,奖励:-16.0,Epsilon:0.01\n", - "回合:340/400,奖励:-15.0,Epsilon:0.01\n", - "回合:341/400,奖励:-18.0,Epsilon:0.01\n", - "回合:342/400,奖励:-15.0,Epsilon:0.01\n", - "回合:343/400,奖励:-20.0,Epsilon:0.01\n", - "回合:344/400,奖励:-18.0,Epsilon:0.01\n", - "回合:345/400,奖励:-17.0,Epsilon:0.01\n", - "回合:346/400,奖励:-19.0,Epsilon:0.01\n", - "回合:347/400,奖励:-15.0,Epsilon:0.01\n", - "回合:348/400,奖励:-15.0,Epsilon:0.01\n", - "回合:349/400,奖励:-15.0,Epsilon:0.01\n", - "回合:350/400,奖励:-18.0,Epsilon:0.01\n", - "回合:351/400,奖励:-16.0,Epsilon:0.01\n", - "回合:352/400,奖励:-16.0,Epsilon:0.01\n", - "回合:353/400,奖励:-15.0,Epsilon:0.01\n", - "回合:354/400,奖励:-20.0,Epsilon:0.01\n", - "回合:355/400,奖励:-15.0,Epsilon:0.01\n", - "回合:356/400,奖励:-17.0,Epsilon:0.01\n", - "回合:357/400,奖励:-15.0,Epsilon:0.01\n", - "回合:358/400,奖励:-17.0,Epsilon:0.01\n", - "回合:359/400,奖励:-15.0,Epsilon:0.01\n", - "回合:360/400,奖励:-16.0,Epsilon:0.01\n", - "回合:361/400,奖励:-15.0,Epsilon:0.01\n", - "回合:362/400,奖励:-18.0,Epsilon:0.01\n", - "回合:363/400,奖励:-17.0,Epsilon:0.01\n", - "回合:364/400,奖励:-22.0,Epsilon:0.01\n", - "回合:365/400,奖励:-15.0,Epsilon:0.01\n", - "回合:366/400,奖励:-15.0,Epsilon:0.01\n", - "回合:367/400,奖励:-15.0,Epsilon:0.01\n", - "回合:368/400,奖励:-16.0,Epsilon:0.01\n", - "回合:369/400,奖励:-16.0,Epsilon:0.01\n", - "回合:370/400,奖励:-15.0,Epsilon:0.01\n", - "回合:371/400,奖励:-20.0,Epsilon:0.01\n", - "回合:372/400,奖励:-15.0,Epsilon:0.01\n", - "回合:373/400,奖励:-15.0,Epsilon:0.01\n", - "回合:374/400,奖励:-15.0,Epsilon:0.01\n", - 
"回合:375/400,奖励:-16.0,Epsilon:0.01\n", - "回合:376/400,奖励:-15.0,Epsilon:0.01\n", - "回合:377/400,奖励:-15.0,Epsilon:0.01\n", - "回合:378/400,奖励:-17.0,Epsilon:0.01\n", - "回合:379/400,奖励:-20.0,Epsilon:0.01\n", - "回合:380/400,奖励:-17.0,Epsilon:0.01\n", - "回合:381/400,奖励:-15.0,Epsilon:0.01\n", - "回合:382/400,奖励:-15.0,Epsilon:0.01\n", - "回合:383/400,奖励:-15.0,Epsilon:0.01\n", - "回合:384/400,奖励:-15.0,Epsilon:0.01\n", - "回合:385/400,奖励:-16.0,Epsilon:0.01\n", - "回合:386/400,奖励:-15.0,Epsilon:0.01\n", - "回合:387/400,奖励:-18.0,Epsilon:0.01\n", - "回合:388/400,奖励:-15.0,Epsilon:0.01\n", - "回合:389/400,奖励:-15.0,Epsilon:0.01\n", - "回合:390/400,奖励:-15.0,Epsilon:0.01\n", - "回合:391/400,奖励:-16.0,Epsilon:0.01\n", - "回合:392/400,奖励:-18.0,Epsilon:0.01\n", - "回合:393/400,奖励:-15.0,Epsilon:0.01\n", - "回合:394/400,奖励:-15.0,Epsilon:0.01\n", - "回合:395/400,奖励:-15.0,Epsilon:0.01\n", - "回合:396/400,奖励:-20.0,Epsilon:0.01\n", - "回合:397/400,奖励:-15.0,Epsilon:0.01\n", - "回合:398/400,奖励:-15.0,Epsilon:0.01\n", - "回合:399/400,奖励:-15.0,Epsilon:0.01\n", - "回合:400/400,奖励:-15.0,Epsilon:0.01\n", - "完成训练!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "开始测试!\n", - "环境:CliffWalking-v0, 算法:Sarsa, 设备:cpu\n", - "回合数:1/20, 奖励:-15.0\n", - "回合数:2/20, 奖励:-15.0\n", - "回合数:3/20, 奖励:-15.0\n", - "回合数:4/20, 奖励:-15.0\n", - "回合数:5/20, 奖励:-15.0\n", - "回合数:6/20, 奖励:-15.0\n", - "回合数:7/20, 奖励:-15.0\n", - "回合数:8/20, 奖励:-15.0\n", - "回合数:9/20, 奖励:-15.0\n", - "回合数:10/20, 奖励:-15.0\n", - "回合数:11/20, 奖励:-15.0\n", - "回合数:12/20, 奖励:-15.0\n", - "回合数:13/20, 奖励:-15.0\n", - "回合数:14/20, 奖励:-15.0\n", - "回合数:15/20, 奖励:-15.0\n", - "回合数:16/20, 奖励:-15.0\n", - "回合数:17/20, 奖励:-15.0\n", - "回合数:18/20, 奖励:-15.0\n", - "回合数:19/20, 奖励:-15.0\n", - "回合数:20/20, 奖励:-15.0\n", - "完成测试!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
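" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# get the hyperparameters\n", - "cfg = get_args() \n", - "# train\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"train\") \n", - "# test\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"test\") # plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.12 ('rl_tutorials')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "4f613f1ab80ec98dc1b91d6e720de51301598a187317378e53e49b773c1123dd" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/3.DQN.ipynb b/projects/notebooks/3.DQN.ipynb deleted file mode 100644 index 6b73846..0000000 --- a/projects/notebooks/3.DQN.ipynb +++ /dev/null @@ -1,541 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm\n", - "\n", - "As the tutorial explains, DQN is essentially an extension of Q-learning to more complex environments, and after a series of refinements it reached a fairly complete form with Nature DQN (the Nature paper by Volodymyr Mnih et al.). DQN makes three main changes:\n", - "* A deep neural network replaces the Q table. The reason is easy to see: a table cannot scale to large or continuous state spaces.\n", - "* Experience replay (a replay buffer) is used. This has several benefits. First, training reuses a pool of past transitions instead of discarding each one after a single use, which greatly improves sample efficiency. Second, a point often raised in interviews, it reduces the correlation between samples: in principle, collecting experience and learning are decoupled. Time-ordered training data can make training unstable, so shuffling the samples before learning improves stability, for the same reason samples are shuffled when splitting training and test sets in deep learning.\n", - "* Two networks are used, a policy network and a target network. The policy network is updated at every step, and its parameters are copied to the target network only every few steps. This also stabilizes training and keeps the Q-value estimates from diverging. Imagine that a transition (introduced in the
Q-learning tutorial; make sure you remember it!) causes a badly overestimated Q-value, and the next few samples drawn from the replay buffer happen to do the same: the Q-value estimate is then likely to diverge and never come back. Another analogy: in RPGs or level-based games, some players chasing a record save and load all the time, reloading an earlier save whenever they make a mistake they are unhappy with. Now suppose reloading is not allowed, just as DQN cannot roll back during training. Would it not make sense to keep two save files, one saved every frame and another saved only after a good result, that is, every so often? In the end, the file saved at intervals is usually better than the one saved every frame. You could of course keep even more save files, i.e. give DQN several target networks, but for DQN this is rarely necessary, since extra networks are unlikely to improve results much." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.1. Define the model\n", - "\n", - "As mentioned above, DQN's model is no longer a Q table but a deep neural network. Here we use a simple three-layer fully connected network (FCN), also called a multilayer perceptron (MLP). How to write networks in PyTorch is not covered here; the code below is for reference only." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "import torch.nn as nn\n", - "import torch.nn.functional as F\n", - "class MLP(nn.Module):\n", - " def __init__(self, n_states,n_actions,hidden_dim=128):\n", - " \"\"\" Initialize the Q network as a fully connected network\n", - " \"\"\"\n", - " super(MLP, self).__init__()\n", - " self.fc1 = nn.Linear(n_states, hidden_dim) # input layer\n", - " self.fc2 = nn.Linear(hidden_dim,hidden_dim) # hidden layer\n", - " self.fc3 = nn.Linear(hidden_dim, n_actions) # output layer: one Q-value per action\n", - " \n", - " def forward(self, x):\n", - " # ReLU activations on the first two layers; the output layer stays linear\n", - " x = F.relu(self.fc1(x)) \n", - " x = F.relu(self.fc2(x))\n", - " return self.fc3(x)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.2. Define experience replay\n", - "\n", - "First, the replay buffer has a fixed capacity, and the network should only be updated once enough transitions have been stored; otherwise we fall back to the earlier step-by-step updates. A replay buffer usually provides two operations. The first is push, which appends a transition to the buffer in order and, when the buffer is full, evicts the oldest transition. This is exactly queue behavior, and the implementation below uses `collections.deque` with `maxlen`, which discards the oldest item automatically. The second is sample, which simply draws one or more transitions at random (how many is given by batch_size) for the DQN update. With the functionality clear, you can implement it however you like; a reference implementation follows." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import deque\n", - "import random\n", - "class ReplayBuffer(object):\n", - " def __init__(self, capacity: int) -> None:\n", - " self.capacity = capacity\n", - " self.buffer = deque(maxlen=self.capacity)\n", - " def push(self,transitions):\n", - " ''' Store a transition in the replay buffer\n", - " '''\n", - " 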
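self.buffer.append(transitions)\n", - " def sample(self, batch_size: int, sequential: bool = False):\n", - " if batch_size > len(self.buffer): # if the batch size exceeds the number of stored transitions, use the current buffer size\n", - " batch_size = len(self.buffer)\n", - " if sequential: # sequential sampling\n", - " rand = random.randint(0, len(self.buffer) - batch_size)\n", - " batch = [self.buffer[i] for i in range(rand, rand + batch_size)]\n", - " return zip(*batch)\n", - " else: # random sampling\n", - " batch = random.sample(self.buffer, batch_size)\n", - " return zip(*batch)\n", - " def clear(self):\n", - " ''' Empty the replay buffer\n", - " '''\n", - " self.buffer.clear()\n", - " def __len__(self):\n", - " ''' Return the number of stored transitions\n", - " '''\n", - " return len(self.buffer)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.3. Define the algorithm itself\n", - "\n", - "For more advanced algorithms, defining the algorithm takes more work. We first define some submodules, and once they are in place we can implement the core of the algorithm. As you can see below, if the submodules are stripped away, DQN has essentially the same structure as Q-learning. Since neural networks are usually written with Torch or TensorFlow, it is worth learning these tools first, e.g. with \"eat_pytorch_in_20_days\".\n", - "\n", - "Here we mainly analyze the DQN update, i.e. the update function. First, deep neural networks are almost always trained with gradient descent (or one of its variants), as follows:\n", - "$$\n", - "\\theta_i \\leftarrow \\theta_i - \\lambda \\nabla_{\\theta_{i}} L_{i}\\left(\\theta_{i}\\right)\n", - "$$\n", - "where $\\lambda$ is the learning rate. So what is $\\theta$? Recall that a key difference between DQN and Q-learning is that DQN replaces the Q-table with a neural network. $\\theta$ is simply the parameters of that network, which is why the Q-function is usually written $Q\\left(s_{i}, a_{i} ; \\theta\\right)$. In reinforcement learning, what we optimize is the long-term value of each action in a given state; always choosing the action with the highest value then yields an optimal policy. The same holds when a neural network represents the Q-table. The state dimension sets the size of the input layer and the number of actions sets the size of the output layer. The network then plays the same role as the Q-table in Q-learning, except that it can handle much larger (even continuous) state spaces and generalize across similar states.\n", - "\n", - "Having explained why $\\theta$ is what we optimize, let us look at the code. Anyone with a little Torch experience knows that implementing the formula above only requires three steps: define an optimizer, compute the loss, and step the optimizer, as follows:\n", - "```python\n", - "optimizer = optim.Adam(Q_net.parameters(), lr=0.01) # define the optimizer for the network Q_net with learning rate 0.01\n", - "loss = ... 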
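# compute the loss (omitted here)\n", - "# then call zero_grad() on the optimizer, backpropagate the loss, and call step() on the optimizer; this is a fixed pattern\n", - "optimizer.zero_grad() \n", - "loss.backward()\n", - "optimizer.step() \n", - "```\n", - "I strongly recommend learning how gradient descent works in deep learning and implementing it yourself with NumPy, which makes the whole process much clearer. The pattern above is a shortcut that assumes you already understand how gradient descent is implemented, so that you can move on to other topics more easily. It is like learning a competitive game. If you have never played that kind of game, you start with basic attacks and learn each skill one by one to build a foundation. Only then do you learn skill combos, i.e. fixed patterns. If you skip the basics and go straight to the patterns, you will likely be lost. This is especially true because many high-level players use shorthand names for combos, such as lightspeed QA or 1233321, which are hard to follow at first. Once you have the basics and have learned many patterns, you may find that the basics no longer seem to be used directly. You may even feel you have forgotten them, but in fact you have not, and a quick review brings them back. My point is that fundamentals matter, but do not get stuck on them unless your research requires it. Likewise, after learning simple arithmetic in primary school, we quickly move on to memorizing the multiplication table rather than dwelling on what one plus one equals. The same is true at university. The trouble is that a question often looks worth studying, and we do not realize we are really dwelling on what one plus one equals. I have noticed this problem in many readers while studying and discussing with them.\n", - "\n", - "Back to the topic. Careful readers will notice a gap between the math and the code; once practice bridges that gap, reproducing papers later becomes much easier. So far we have covered how the parameters are updated, but the key question is how the loss is computed. In DQN the loss is relatively simple:\n", - "$$\n", - "L(\\theta)=\\left(y_{i}-Q\\left(s_{i}, a_{i} ; \\theta\\right)\\right)^{2}\n", - "$$\n", - "Here $y_{i}$ is usually called the target (expected) value and $Q\\left(s_{i}, a_{i} ; \\theta\\right)$ the predicted (actual) value. In deep learning this loss is called the mean squared error (MSE) loss, i.e. MSELoss, and it goes back to the method of least squares. Interested readers can look into the various loss functions in deep learning and when each is used.\n", - "In DQN, $y_{i}$ is usually defined as follows, where $\\theta^{-}$ denotes the parameters of the target network:\n", - "$$\n", - "y_{i}= \\begin{cases}r_{i} & \\text {if } s_{i+1} \\text { is terminal} \\\\ r_{i}+\\gamma \\max _{a^{\\prime}} Q\\left(s_{i+1}, a^{\\prime} ; \\theta^{-}\\right) & \\text {otherwise}\\end{cases}\n", - "$$\n", - "In other words, the maximum Q-value of the next state is used to build the target value, because the true value cannot be computed directly and can only be approximated. This approximation can cause problems such as overestimation. Later improved DQN algorithms such as Double DQN offer some remedies, so we will not dig into why this approximation is used here. Note also the check for a terminal state: if the next state $s_{i+1}$ is terminal, there is no further state, so DQN simply uses the reward as the target value." - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "import torch.optim as optim\n", - "import math\n", - "import numpy as np\n", - "class DQN:\n", - " def __init__(self,model,memory,cfg):\n", - "\n", - " self.n_actions = cfg['n_actions'] \n", - " self.device = torch.device(cfg['device']) \n", - " self.gamma = cfg['gamma'] # discount factor for rewards\n", - " # parameters of the epsilon-greedy policy\n", - " self.sample_count =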
0 # step counter used for epsilon decay\n", - " self.epsilon = cfg['epsilon_start']\n", - " self.epsilon_start = cfg['epsilon_start']\n", - " self.epsilon_end = cfg['epsilon_end']\n", - " self.epsilon_decay = cfg['epsilon_decay']\n", - " self.batch_size = cfg['batch_size']\n", - " from copy import deepcopy # local import so this cell stays self-contained\n", - " self.policy_net = model.to(self.device)\n", - " self.target_net = deepcopy(model).to(self.device) # a separate copy, so the target network does not share weights with the policy network\n", - " # copy parameters to the target network\n", - " for target_param, param in zip(self.target_net.parameters(),self.policy_net.parameters()): \n", - " target_param.data.copy_(param.data)\n", - " self.optimizer = optim.Adam(self.policy_net.parameters(), lr=cfg['lr']) # optimizer\n", - " self.memory = memory # replay buffer\n", - " def sample_action(self, state):\n", - " ''' Sample an action with the epsilon-greedy policy (used during training)\n", - " '''\n", - " self.sample_count += 1\n", - " # exponential decay of epsilon\n", - " self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \\\n", - " math.exp(-1. * self.sample_count / self.epsilon_decay) \n", - " if random.random() > self.epsilon:\n", - " with torch.no_grad():\n", - " state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)\n", - " q_values = self.policy_net(state)\n", - " action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value\n", - " else:\n", - " action = random.randrange(self.n_actions)\n", - " return action\n", - " @torch.no_grad() # disables gradient tracking, equivalent to wrapping the body in with torch.no_grad():\n", - " def predict_action(self, state):\n", - " ''' Predict the greedy action (used during testing)\n", - " '''\n", - " state = torch.tensor(state, device=self.device, dtype=torch.float32).unsqueeze(dim=0)\n", - " q_values = self.policy_net(state)\n", - " action = q_values.max(1)[1].item() # choose action corresponding to the maximum q value\n", - " return action\n", - " def update(self):\n", - " if len(self.memory) < self.batch_size: # skip the update until the buffer holds at least one batch\n", - " return\n", - " # randomly sample a batch of transitions from the replay buffer\n", - " state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample(\n", - " self.batch_size)\n", - 
" # convert the data to tensors\n", - " state_batch = torch.tensor(np.array(state_batch), device=self.device, dtype=torch.float)\n", - " action_batch = torch.tensor(action_batch, device=self.device).unsqueeze(1) \n", - " reward_batch = torch.tensor(reward_batch, device=self.device, dtype=torch.float) \n", - " next_state_batch = torch.tensor(np.array(next_state_batch), device=self.device, dtype=torch.float)\n", - " done_batch = torch.tensor(np.float32(done_batch), device=self.device)\n", - " q_values = self.policy_net(state_batch).gather(dim=1, index=action_batch) # Q(s_t, a_t) for the current states and the actions taken\n", - " next_q_values = self.target_net(next_state_batch).max(1)[0].detach() # max over a' of Q(s_{t+1}, a') from the target network\n", - " # target Q-values; for terminal transitions done_batch=1, so the target equals the reward\n", - " expected_q_values = reward_batch + self.gamma * next_q_values * (1-done_batch)\n", - " loss = nn.MSELoss()(q_values, expected_q_values.unsqueeze(1)) # mean squared error loss\n", - " # optimize the model\n", - " self.optimizer.zero_grad() \n", - " loss.backward()\n", - " # clip gradients to [-1, 1] to prevent exploding gradients\n", - " for param in self.policy_net.parameters(): \n", - " param.grad.data.clamp_(-1, 1)\n", - " self.optimizer.step() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Define training" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg, env, agent):\n", - " ''' Training loop\n", - " '''\n", - " print(\"Start training!\")\n", - " rewards = [] # rewards of all episodes\n", - " steps = []\n", - " for i_ep in range(cfg['train_eps']):\n", - " ep_reward = 0 # reward accumulated within one episode\n", - " ep_step = 0\n", - " state = env.reset() # reset the environment and get the initial state\n", - " for _ in range(cfg['ep_max_steps']):\n", - " ep_step += 1\n", - " action = agent.sample_action(state) # choose an action\n", - " next_state, reward, done, _ = env.step(action) # step the environment and get the transition\n", - " agent.memory.push((state, action, reward,next_state, done)) # store the transition\n", - " state = next_state # move to the next state\n", - " agent.update() # update the agent\n", - " ep_reward += reward # accumulate the reward\n", - 
" if done:\n", - " break\n", - " if (i_ep + 1) % cfg['target_update'] == 0: # update the agent's target network\n", - " agent.target_net.load_state_dict(agent.policy_net.state_dict())\n", - " steps.append(ep_step)\n", - " rewards.append(ep_reward)\n", - " if (i_ep + 1) % 10 == 0:\n", - " print(f\"Episode: {i_ep+1}/{cfg['train_eps']}, Reward: {ep_reward:.2f}, Epsilon: {agent.epsilon:.3f}\")\n", - " print(\"Finish training!\")\n", - " env.close()\n", - " return {'rewards':rewards}\n", - "\n", - "def test(cfg, env, agent):\n", - " print(\"Start testing!\")\n", - " rewards = [] # rewards of all episodes\n", - " steps = []\n", - " for i_ep in range(cfg['test_eps']):\n", - " ep_reward = 0 # reward accumulated within one episode\n", - " ep_step = 0\n", - " state = env.reset() # reset the environment and get the initial state\n", - " for _ in range(cfg['ep_max_steps']):\n", - " ep_step+=1\n", - " action = agent.predict_action(state) # choose an action\n", - " next_state, reward, done, _ = env.step(action) # step the environment and get the transition\n", - " state = next_state # move to the next state\n", - " ep_reward += reward # accumulate the reward\n", - " if done:\n", - " break\n", - " steps.append(ep_step)\n", - " rewards.append(ep_reward)\n", - " print(f\"Episode: {i_ep+1}/{cfg['test_eps']}, Reward: {ep_reward:.2f}\")\n", - " print(\"Finish testing!\")\n", - " env.close()\n", - " return {'rewards':rewards}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3.
Define the environment" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import os\n", - "def all_seed(env,seed = 1):\n", - " ''' Catch-all seeding function\n", - " '''\n", - " env.seed(seed) # env config\n", - " np.random.seed(seed)\n", - " random.seed(seed)\n", - " torch.manual_seed(seed) # config for CPU\n", - " torch.cuda.manual_seed(seed) # config for GPU\n", - " os.environ['PYTHONHASHSEED'] = str(seed) # config for python scripts\n", - " # config for cudnn\n", - " torch.backends.cudnn.deterministic = True\n", - " torch.backends.cudnn.benchmark = False\n", - " torch.backends.cudnn.enabled = False\n", - "def env_agent_config(cfg):\n", - " env = gym.make(cfg['env_name']) # create the environment\n", - " if cfg['seed'] !=0:\n", - " all_seed(env,seed=cfg['seed'])\n", - " n_states = env.observation_space.shape[0]\n", - " n_actions = env.action_space.n\n", - " print(f\"State dim: {n_states}, action dim: {n_actions}\")\n", - " cfg.update({\"n_states\":n_states,\"n_actions\":n_actions}) # add n_states and n_actions to cfg\n", - " model = MLP(n_states, n_actions, hidden_dim = cfg['hidden_dim']) # create the model\n", - " memory = ReplayBuffer(cfg['memory_capacity'])\n", - " agent = DQN(model,memory,cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Set the hyperparameters" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "import argparse\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "def get_args():\n", - " \"\"\" Hyperparameters\n", - " \"\"\"\n", - " parser = argparse.ArgumentParser(description=\"hyperparameters\") \n", - " parser.add_argument('--algo_name',default='DQN',type=str,help=\"name of algorithm\")\n", - " parser.add_argument('--env_name',default='CartPole-v0',type=str,help=\"name of environment\")\n", - " parser.add_argument('--train_eps',default=200,type=int,help=\"episodes of training\")\n", - " 
parser.add_argument('--test_eps',default=20,type=int,help=\"episodes of testing\")\n", - " parser.add_argument('--ep_max_steps',default = 100000,type=int,help=\"steps per episode, much larger value can simulate infinite steps\")\n", - " parser.add_argument('--gamma',default=0.95,type=float,help=\"discount factor\")\n", - " parser.add_argument('--epsilon_start',default=0.95,type=float,help=\"initial value of epsilon\")\n", - " parser.add_argument('--epsilon_end',default=0.01,type=float,help=\"final value of epsilon\")\n", - " parser.add_argument('--epsilon_decay',default=500,type=int,help=\"decay rate of epsilon, the higher value, the slower decay\")\n", - " parser.add_argument('--lr',default=0.0001,type=float,help=\"learning rate\")\n", - " parser.add_argument('--memory_capacity',default=100000,type=int,help=\"memory capacity\")\n", - " parser.add_argument('--batch_size',default=64,type=int)\n", - " parser.add_argument('--target_update',default=4,type=int)\n", - " parser.add_argument('--hidden_dim',default=256,type=int)\n", - " parser.add_argument('--device',default='cpu',type=str,help=\"cpu or cuda\") \n", - " parser.add_argument('--seed',default=10,type=int,help=\"seed\") \n", - " args = parser.parse_args([])\n", - " args = {**vars(args)} # convert to a dict \n", - " ## print the hyperparameters\n", - " print(\"Hyperparameters\")\n", - " print(''.join(['=']*80))\n", - " tplt = \"{:^20}\\t{:^20}\\t{:^20}\"\n", - " print(tplt.format(\"Name\", \"Value\", \"Type\"))\n", - " for k,v in args.items():\n", - " print(tplt.format(k,v,str(type(v)))) \n", - " print(''.join(['=']*80)) \n", - " return args\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve with an exponential moving average, similar to the smoothing in TensorBoard\n", - " '''\n", - " last = data[0] \n", - " smoothed = []\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # compute the smoothed value\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,cfg, tag='train'):\n", - " ''' Plot the reward curve\n", - " '''\n", - "
" sns.set()\n", - " plt.figure() # create a new figure so that several plots can be drawn\n", - " plt.title(f\"{tag}ing curve on {cfg['device']} of {cfg['algo_name']} for {cfg['env_name']}\")\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()\n", - " plt.show()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. Ready to go!" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Hyperparameters\n", - "================================================================================\n", - " Name \t Value \t Type \n", - " algo_name \t DQN \t \n", - " env_name \t CartPole-v0 \t \n", - " train_eps \t 200 \t \n", - " test_eps \t 20 \t \n", - " ep_max_steps \t 100000 \t \n", - " gamma \t 0.95 \t \n", - " epsilon_start \t 0.95 \t \n", - " epsilon_end \t 0.01 \t \n", - " epsilon_decay \t 500 \t \n", - " lr \t 0.0001 \t \n", - " memory_capacity \t 100000 \t \n", - " batch_size \t 64 \t \n", - " target_update \t 4 \t \n", - " hidden_dim \t 256 \t \n", - " device \t cpu \t \n", - " seed \t 10 \t \n", - "================================================================================\n", - "State dim: 4, action dim: 2\n", - "Start training!\n", - "Episode: 10/200, Reward: 14.00, Epsilon: 0.611\n", - "Episode: 20/200, Reward: 10.00, Epsilon: 0.470\n", - "Episode: 30/200, Reward: 11.00, Epsilon: 0.372\n", - "Episode: 40/200, Reward: 18.00, Epsilon: 0.302\n", - "Episode: 50/200, Reward: 15.00, Epsilon: 0.228\n", - "Episode: 60/200, Reward: 62.00, Epsilon: 0.121\n", - "Episode: 70/200, Reward: 128.00, Epsilon: 0.039\n", - "Episode: 80/200, Reward: 200.00, Epsilon: 0.011\n", - "Episode: 90/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 100/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 110/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 120/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 130/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 140/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 150/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 160/200, Reward: 200.00, Epsilon: 0.010\n", - 
"Episode: 170/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 180/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 190/200, Reward: 200.00, Epsilon: 0.010\n", - "Episode: 200/200, Reward: 200.00, Epsilon: 0.010\n", - "Finish training!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Start testing!\n", - "Episode: 1/20, Reward: 200.00\n", - "Episode: 2/20, Reward: 200.00\n", - "Episode: 3/20, Reward: 200.00\n", - "Episode: 4/20, Reward: 200.00\n", - "Episode: 5/20, Reward: 200.00\n", - "Episode: 6/20, Reward: 200.00\n", - "Episode: 7/20, Reward: 200.00\n", - "Episode: 8/20, Reward: 200.00\n", - "Episode: 9/20, Reward: 200.00\n", - "Episode: 10/20, Reward: 200.00\n", - "Episode: 11/20, Reward: 200.00\n", - "Episode: 12/20, Reward: 200.00\n", - "Episode: 13/20, Reward: 200.00\n", - "Episode: 14/20, Reward: 200.00\n", - "Episode: 15/20, Reward: 200.00\n", - "Episode: 16/20, Reward: 200.00\n", - "Episode: 17/20, Reward: 200.00\n", - "Episode: 18/20, Reward: 200.00\n", - "Episode: 19/20, Reward: 200.00\n", - "Episode: 20/20, Reward: 200.00\n", - "Finish testing!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# get the hyperparameters\n", - "cfg = get_args() \n", - "# train\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"train\") \n", - "# test\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"test\") # plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.12 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "f5a9629e9f3b9957bf68a43815f911e93447d47b3d065b6a8a04975e44c504d9" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/A2C.ipynb b/projects/notebooks/A2C.ipynb deleted file mode 100644 index 8966eac..0000000 --- a/projects/notebooks/A2C.ipynb +++ /dev/null @@ -1,370 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "import torch.optim as optim\n", - "import torch.nn as nn\n", - "import torch.nn.functional as F\n", - "from torch.distributions import Categorical\n", - "import numpy as np\n", - "from multiprocessing import Process, Pipe\n", - "import argparse\n", - "import gym" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Build the Actor and Critic networks" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "class ActorCritic(nn.Module):\n", - " ''' A2C network model containing an actor and a critic\n", - " '''\n", - " def __init__(self, input_dim, output_dim, hidden_dim):\n", - " super(ActorCritic, self).__init__()\n", - " self.critic =
nn.Sequential(\n", - " nn.Linear(input_dim, hidden_dim),\n", - " nn.ReLU(),\n", - " nn.Linear(hidden_dim, 1)\n", - " )\n", - " \n", - " self.actor = nn.Sequential(\n", - " nn.Linear(input_dim, hidden_dim),\n", - " nn.ReLU(),\n", - " nn.Linear(hidden_dim, output_dim),\n", - " nn.Softmax(dim=1),\n", - " )\n", - " \n", - " def forward(self, x):\n", - " value = self.critic(x)\n", - " probs = self.actor(x)\n", - " dist = Categorical(probs)\n", - " return dist, value" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "class A2C:\n", - " ''' A2C algorithm\n", - " '''\n", - " def __init__(self,n_states,n_actions,cfg) -> None:\n", - " self.gamma = cfg.gamma\n", - " self.device = cfg.device\n", - " self.model = ActorCritic(n_states, n_actions, cfg.hidden_dim).to(self.device)\n", - " self.optimizer = optim.Adam(self.model.parameters(), lr=cfg.lr)\n", - "\n", - " def compute_returns(self,next_value, rewards, masks):\n", - " R = next_value\n", - " returns = []\n", - " for step in reversed(range(len(rewards))):\n", - " R = rewards[step] + self.gamma * R * masks[step]\n", - " returns.insert(0, R)\n", - " return returns" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "def make_envs(env_name):\n", - " def _thunk():\n", - " env = gym.make(env_name)\n", - " env.seed(2)\n", - " return env\n", - " return _thunk\n", - "def test_env(env,model,vis=False):\n", - " state = env.reset()\n", - " if vis: env.render()\n", - " done = False\n", - " total_reward = 0\n", - " while not done:\n", - " state = torch.FloatTensor(state).unsqueeze(0).to(cfg.device)\n", - " dist, _ = model(state)\n", - " next_state, reward, done, _ = env.step(dist.sample().cpu().numpy()[0])\n", - " state = next_state\n", - " if vis: env.render()\n", - " total_reward += reward\n", - " return total_reward\n", - "\n", - "def compute_returns(next_value, rewards, masks, gamma=0.99):\n", - " R = next_value\n", - " returns =
[]\n", - " for step in reversed(range(len(rewards))):\n", - " R = rewards[step] + gamma * R * masks[step]\n", - " returns.insert(0, R)\n", - " return returns\n", - "\n", - "\n", - "def train(cfg,envs):\n", - " print('Start training!')\n", - " print(f'Env:{cfg.env_name}, Algorithm:{cfg.algo_name}, Device:{cfg.device}')\n", - " env = gym.make(cfg.env_name) # a single env\n", - " env.seed(10)\n", - " n_states = envs.observation_space.shape[0]\n", - " n_actions = envs.action_space.n\n", - " model = ActorCritic(n_states, n_actions, cfg.hidden_dim).to(cfg.device)\n", - " optimizer = optim.Adam(model.parameters(), lr=cfg.lr)\n", - " step_idx = 0\n", - " test_rewards = []\n", - " test_ma_rewards = []\n", - " state = envs.reset()\n", - " while step_idx < cfg.max_steps:\n", - " log_probs = []\n", - " values = []\n", - " rewards = []\n", - " masks = []\n", - " entropy = 0\n", - " # rollout trajectory\n", - " for _ in range(cfg.n_steps):\n", - " state = torch.FloatTensor(state).to(cfg.device)\n", - " dist, value = model(state)\n", - " action = dist.sample()\n", - " next_state, reward, done, _ = envs.step(action.cpu().numpy())\n", - " log_prob = dist.log_prob(action)\n", - " entropy += dist.entropy().mean()\n", - " log_probs.append(log_prob)\n", - " values.append(value)\n", - " rewards.append(torch.FloatTensor(reward).unsqueeze(1).to(cfg.device))\n", - " masks.append(torch.FloatTensor(1 - done).unsqueeze(1).to(cfg.device))\n", - " state = next_state\n", - " step_idx += 1\n", - " if step_idx % 200 == 0:\n", - " test_reward = np.mean([test_env(env,model) for _ in range(10)])\n", - " print(f\"step_idx:{step_idx}, test_reward:{test_reward}\")\n", - " test_rewards.append(test_reward)\n", - " if test_ma_rewards:\n", - " test_ma_rewards.append(0.9*test_ma_rewards[-1]+0.1*test_reward)\n", - " else:\n", - " test_ma_rewards.append(test_reward) \n", - " # plot(step_idx, test_rewards) \n", - " next_state = torch.FloatTensor(next_state).to(cfg.device)\n", - " _, next_value = model(next_state)\n", - " 
returns = compute_returns(next_value, rewards, masks, gamma=cfg.gamma)\n", - " log_probs = torch.cat(log_probs)\n", - " returns = torch.cat(returns).detach()\n", - " values = torch.cat(values)\n", - " advantage = returns - values\n", - " actor_loss = -(log_probs * advantage.detach()).mean()\n", - " critic_loss = advantage.pow(2).mean()\n", - " loss = actor_loss + 0.5 * critic_loss - 0.001 * entropy\n", - " optimizer.zero_grad()\n", - " loss.backward()\n", - " optimizer.step()\n", - " print('Finish training!')\n", - " return test_rewards, test_ma_rewards" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import seaborn as sns \n", - "def plot_rewards(rewards, ma_rewards, cfg, tag='train'):\n", - " sns.set()\n", - " plt.figure() # create a new figure so that several plots can be drawn\n", - " plt.title(\"learning curve on {} of {} for {}\".format(\n", - " cfg.device, cfg.algo_name, cfg.env_name))\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(ma_rewards, label='ma rewards')\n", - " plt.legend()\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Start training!\n", - "Env:CartPole-v0, Algorithm:A2C, Device:cuda\n", - "step_idx:200, test_reward:18.6\n", - "step_idx:400, test_reward:19.7\n", - "step_idx:600, test_reward:24.2\n", - "step_idx:800, test_reward:19.5\n", - "step_idx:1000, test_reward:33.9\n", - "step_idx:1200, test_reward:36.1\n", - "step_idx:1400, test_reward:32.6\n", - "step_idx:1600, test_reward:36.3\n", - "step_idx:1800, test_reward:38.9\n", - "step_idx:2000, test_reward:60.8\n", - "step_idx:2200, test_reward:41.9\n", - "step_idx:2400, test_reward:42.2\n", - "step_idx:2600, test_reward:71.6\n", - "step_idx:2800, test_reward:123.6\n", - "step_idx:3000, test_reward:57.5\n", - "step_idx:3200, test_reward:155.4\n", - "step_idx:3400,
test_reward:111.4\n", - "step_idx:3600, test_reward:133.8\n", - "step_idx:3800, test_reward:133.8\n", - "step_idx:4000, test_reward:114.3\n", - "step_idx:4200, test_reward:165.5\n", - "step_idx:4400, test_reward:119.4\n", - "step_idx:4600, test_reward:173.4\n", - "step_idx:4800, test_reward:115.4\n", - "step_idx:5000, test_reward:159.7\n", - "step_idx:5200, test_reward:178.1\n", - "step_idx:5400, test_reward:137.8\n", - "step_idx:5600, test_reward:146.0\n", - "step_idx:5800, test_reward:187.4\n", - "step_idx:6000, test_reward:200.0\n", - "step_idx:6200, test_reward:169.2\n", - "step_idx:6400, test_reward:167.8\n", - "step_idx:6600, test_reward:184.3\n", - "step_idx:6800, test_reward:162.3\n", - "step_idx:7000, test_reward:125.4\n", - "step_idx:7200, test_reward:150.6\n", - "step_idx:7400, test_reward:152.6\n", - "step_idx:7600, test_reward:122.5\n", - "step_idx:7800, test_reward:136.3\n", - "step_idx:8000, test_reward:131.4\n", - "step_idx:8200, test_reward:174.6\n", - "step_idx:8400, test_reward:91.7\n", - "step_idx:8600, test_reward:170.1\n", - "step_idx:8800, test_reward:166.0\n", - "step_idx:9000, test_reward:150.2\n", - "step_idx:9200, test_reward:104.6\n", - "step_idx:9400, test_reward:147.2\n", - "step_idx:9600, test_reward:111.8\n", - "step_idx:9800, test_reward:118.7\n", - "step_idx:10000, test_reward:102.6\n", - "step_idx:10200, test_reward:99.0\n", - "step_idx:10400, test_reward:64.6\n", - "step_idx:10600, test_reward:133.7\n", - "step_idx:10800, test_reward:119.7\n", - "step_idx:11000, test_reward:112.6\n", - "step_idx:11200, test_reward:116.1\n", - "step_idx:11400, test_reward:116.3\n", - "step_idx:11600, test_reward:116.2\n", - "step_idx:11800, test_reward:115.3\n", - "step_idx:12000, test_reward:109.7\n", - "step_idx:12200, test_reward:110.3\n", - "step_idx:12400, test_reward:131.4\n", - "step_idx:12600, test_reward:128.3\n", - "step_idx:12800, test_reward:128.8\n", - "step_idx:13000, test_reward:119.8\n", - "step_idx:13200, test_reward:108.6\n", - 
"step_idx:13400, test_reward:128.4\n", - "step_idx:13600, test_reward:138.2\n", - "step_idx:13800, test_reward:119.1\n", - "step_idx:14000, test_reward:140.7\n", - "step_idx:14200, test_reward:145.3\n", - "step_idx:14400, test_reward:154.1\n", - "step_idx:14600, test_reward:165.2\n", - "step_idx:14800, test_reward:138.2\n", - "step_idx:15000, test_reward:143.5\n", - "step_idx:15200, test_reward:125.4\n", - "step_idx:15400, test_reward:137.1\n", - "step_idx:15600, test_reward:150.1\n", - "step_idx:15800, test_reward:132.9\n", - "step_idx:16000, test_reward:140.4\n", - "step_idx:16200, test_reward:141.3\n", - "step_idx:16400, test_reward:135.5\n", - "step_idx:16600, test_reward:135.5\n", - "step_idx:16800, test_reward:125.6\n", - "step_idx:17000, test_reward:126.8\n", - "step_idx:17200, test_reward:124.7\n", - "step_idx:17400, test_reward:129.6\n", - "step_idx:17600, test_reward:114.3\n", - "step_idx:17800, test_reward:57.3\n", - "step_idx:18000, test_reward:164.7\n", - "step_idx:18200, test_reward:165.8\n", - "step_idx:18400, test_reward:196.7\n", - "step_idx:18600, test_reward:198.8\n", - "step_idx:18800, test_reward:200.0\n", - "step_idx:19000, test_reward:199.6\n", - "step_idx:19200, test_reward:189.5\n", - "step_idx:19400, test_reward:177.9\n", - "step_idx:19600, test_reward:159.3\n", - "step_idx:19800, test_reward:127.7\n", - "step_idx:20000, test_reward:143.6\n", - "Finish training!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import easydict\n", - "from common.multiprocessing_env import SubprocVecEnv\n", - "cfg = easydict.EasyDict({\n", - " \"algo_name\": 'A2C',\n", - " \"env_name\": 'CartPole-v0',\n", - " \"n_envs\": 8,\n", - " \"max_steps\": 20000,\n", - " \"n_steps\":5,\n", - " \"gamma\":0.99,\n", - " \"lr\": 1e-3,\n", - " \"hidden_dim\": 256,\n", - " \"device\":torch.device(\n", - " \"cuda\" if torch.cuda.is_available() else \"cpu\")\n", - "})\n", - "envs = [make_envs(cfg.env_name) for i in range(cfg.n_envs)]\n", - "envs = SubprocVecEnv(envs) \n", - "rewards,ma_rewards = train(cfg,envs)\n", - "plot_rewards(rewards, ma_rewards, cfg, tag=\"train\") # plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.12 ('rl_tutorials')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "4f613f1ab80ec98dc1b91d6e720de51301598a187317378e53e49b773c1123dd" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/DDPG.ipynb b/projects/notebooks/DDPG.ipynb deleted file mode 100644 index 5194644..0000000 --- a/projects/notebooks/DDPG.ipynb +++ /dev/null @@ -1,559 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.1.
Define the model\n", - "\n", - "Note that in DDPG the input of the critic network is the state concatenated with the action." - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "import torch.nn as nn\n", - "import torch.nn.functional as F\n", - "class Actor(nn.Module):\n", - " def __init__(self, n_states, n_actions, hidden_dim = 256, init_w=3e-3):\n", - " super(Actor, self).__init__() \n", - " self.linear1 = nn.Linear(n_states, hidden_dim)\n", - " self.linear2 = nn.Linear(hidden_dim, hidden_dim)\n", - " self.linear3 = nn.Linear(hidden_dim, n_actions)\n", - " \n", - " self.linear3.weight.data.uniform_(-init_w, init_w)\n", - " self.linear3.bias.data.uniform_(-init_w, init_w)\n", - " \n", - " def forward(self, x):\n", - " x = F.relu(self.linear1(x))\n", - " x = F.relu(self.linear2(x))\n", - " x = torch.tanh(self.linear3(x))\n", - " return x\n", - " \n", - "class Critic(nn.Module):\n", - " def __init__(self, n_states, n_actions, hidden_dim=256, init_w=3e-3):\n", - " super(Critic, self).__init__()\n", - " \n", - " self.linear1 = nn.Linear(n_states + n_actions, hidden_dim)\n", - " self.linear2 = nn.Linear(hidden_dim, hidden_dim)\n", - " self.linear3 = nn.Linear(hidden_dim, 1)\n", - " # initialize the output layer with small random values\n", - " self.linear3.weight.data.uniform_(-init_w, init_w)\n", - " self.linear3.bias.data.uniform_(-init_w, init_w)\n", - " \n", - " def forward(self, state, action):\n", - " # concatenate along dimension 1\n", - " x = torch.cat([state, action], 1)\n", - " x = F.relu(self.linear1(x))\n", - " x = F.relu(self.linear2(x))\n", - " x = self.linear3(x)\n", - " return x" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.2 Define the replay buffer" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import deque\n", - "import random\n", - "class ReplayBuffer:\n", - " def __init__(self, capacity: int) -> None:\n", - " self.capacity = capacity\n", - " self.buffer = deque(maxlen=self.capacity)\n", - " def
push(self,transitions):\n", - " '''Store a transition in the replay buffer\n", - " Args:\n", - " transitions (tuple): (state, action, reward, next_state, done)\n", - " '''\n", - " self.buffer.append(transitions)\n", - " def sample(self, batch_size: int, sequential: bool = False):\n", - " if batch_size > len(self.buffer):\n", - " batch_size = len(self.buffer)\n", - " if sequential: # sequential sampling\n", - " rand = random.randint(0, len(self.buffer) - batch_size)\n", - " batch = [self.buffer[i] for i in range(rand, rand + batch_size)]\n", - " return zip(*batch)\n", - " else:\n", - " batch = random.sample(self.buffer, batch_size)\n", - " return zip(*batch)\n", - " def clear(self):\n", - " self.buffer.clear()\n", - " def __len__(self):\n", - " return len(self.buffer)" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "import copy\n", - "import torch.optim as optim\n", - "import numpy as np\n", - "class DDPG:\n", - " def __init__(self, models,memories,cfg):\n", - " self.device = torch.device(cfg['device'])\n", - " self.critic = models['critic'].to(self.device)\n", - " self.target_critic = copy.deepcopy(models['critic']).to(self.device) # separate copy, not the same module as self.critic\n", - " self.actor = models['actor'].to(self.device)\n", - " self.target_actor = copy.deepcopy(models['actor']).to(self.device) # separate copy, not the same module as self.actor\n", - " \n", - " # copy parameters to the target networks\n", - " for target_param, param in zip(self.target_critic.parameters(), self.critic.parameters()):\n", - " target_param.data.copy_(param.data)\n", - " for target_param, param in zip(self.target_actor.parameters(), self.actor.parameters()):\n", - " target_param.data.copy_(param.data)\n", - " self.critic_optimizer = optim.Adam(\n", - " self.critic.parameters(), lr=cfg['critic_lr'])\n", - " self.actor_optimizer = optim.Adam(self.actor.parameters(), lr=cfg['actor_lr'])\n", - " self.memory = memories['memory']\n", - " self.batch_size = cfg['batch_size']\n", - " self.gamma = cfg['gamma']\n", - " self.tau = cfg['tau'] # soft-update coefficient\n", - " def sample_action(self, state):\n", - " state =
torch.FloatTensor(state).unsqueeze(0).to(self.device)\n", - " action = self.actor(state)\n", - " return action.detach().cpu().numpy()[0, 0]\n", - " @torch.no_grad()\n", - " def predict_action(self, state):\n", - " ''' 用于预测,不需要计算梯度\n", - " '''\n", - " state = torch.FloatTensor(state).unsqueeze(0).to(self.device)\n", - " action = self.actor(state)\n", - " return action.cpu().numpy()[0, 0]\n", - " def update(self):\n", - " if len(self.memory) < self.batch_size: # 当memory中不满足一个批量时,不更新策略\n", - " return\n", - " # 从经验回放中中随机采样一个批量的transition\n", - " state, action, reward, next_state, done = self.memory.sample(self.batch_size)\n", - " # 转变为张量\n", - " state = torch.FloatTensor(np.array(state)).to(self.device)\n", - " next_state = torch.FloatTensor(np.array(next_state)).to(self.device)\n", - " action = torch.FloatTensor(np.array(action)).to(self.device)\n", - " reward = torch.FloatTensor(reward).unsqueeze(1).to(self.device)\n", - " done = torch.FloatTensor(np.float32(done)).unsqueeze(1).to(self.device)\n", - " # 注意看伪代码,这里的actor损失就是对应策略即actor输出的action下对应critic值的负均值\n", - " actor_loss = self.critic(state, self.actor(state))\n", - " actor_loss = - actor_loss.mean()\n", - "\n", - " next_action = self.target_actor(next_state)\n", - " target_value = self.target_critic(next_state, next_action.detach())\n", - " # 这里的expected_value就是伪代码中间的y_i \n", - " expected_value = reward + (1.0 - done) * self.gamma * target_value\n", - " expected_value = torch.clamp(expected_value, -np.inf, np.inf)\n", - "\n", - " actual_value = self.critic(state, action)\n", - " critic_loss = nn.MSELoss()(actual_value, expected_value.detach())\n", - " \n", - " self.actor_optimizer.zero_grad()\n", - " actor_loss.backward()\n", - " self.actor_optimizer.step()\n", - " self.critic_optimizer.zero_grad()\n", - " critic_loss.backward()\n", - " self.critic_optimizer.step()\n", - " # 各自目标网络的参数软更新\n", - " for target_param, param in zip(self.target_critic.parameters(), self.critic.parameters()):\n", - " 
target_param.data.copy_(\n", - " target_param.data * (1.0 - self.tau) +\n", - " param.data * self.tau\n", - " )\n", - " for target_param, param in zip(self.target_actor.parameters(), self.actor.parameters()):\n", - " target_param.data.copy_(\n", - " target_param.data * (1.0 - self.tau) +\n", - " param.data * self.tau\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Define Training and Testing\n", - "\n", - "Note that the test function does not add action noise." - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "class OUNoise(object):\n", - " '''Ornstein–Uhlenbeck noise\n", - " '''\n", - " def __init__(self, action_space, mu=0.0, theta=0.15, max_sigma=0.3, min_sigma=0.3, decay_period=100000):\n", - " self.mu = mu # OU parameter: long-run mean\n", - " self.theta = theta # OU parameter: mean-reversion rate\n", - " self.sigma = max_sigma # OU parameter: noise scale\n", - " self.max_sigma = max_sigma\n", - " self.min_sigma = min_sigma\n", - " self.decay_period = decay_period\n", - " self.n_actions = action_space.shape[0]\n", - " self.low = action_space.low\n", - " self.high = action_space.high\n", - " self.reset()\n", - " def reset(self):\n", - " self.obs = np.ones(self.n_actions) * self.mu\n", - " def evolve_obs(self):\n", - " x = self.obs\n", - " dx = self.theta * (self.mu - x) + self.sigma * np.random.randn(self.n_actions)\n", - " self.obs = x + dx\n", - " return self.obs\n", - " def get_action(self, action, t=0):\n", - " ou_obs = self.evolve_obs()\n", - " self.sigma = self.max_sigma - (self.max_sigma - self.min_sigma) * min(1.0, t / self.decay_period) # sigma decays linearly from max_sigma to min_sigma over decay_period steps\n", - " return np.clip(action + ou_obs, self.low, self.high) # add noise to the action, then clip it to the action bounds\n", - "\n", - "def train(cfg, env, agent):\n", - " print(\"Start training!\")\n", - " ou_noise = OUNoise(env.action_space) # action noise\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg['train_eps']):\n", - " state = env.reset()\n", - " ou_noise.reset()\n", - " ep_reward = 0\n", - " for i_step in range(cfg['max_steps']):\n", - " action =
agent.sample_action(state)\n", - " action = ou_noise.get_action(action, i_step+1) \n", - " next_state, reward, done, _ = env.step(action)\n", - " ep_reward += reward\n", - " agent.memory.push((state, action, reward, next_state, done))\n", - " agent.update()\n", - " state = next_state\n", - " if done:\n", - " break\n", - " if (i_ep+1)%10 == 0:\n", - " print(f\"Episode: {i_ep+1}/{cfg['train_eps']}, Reward: {ep_reward:.2f}\")\n", - " rewards.append(ep_reward)\n", - " print(\"Finished training!\")\n", - " return {'rewards':rewards}\n", - "def test(cfg, env, agent):\n", - " print(\"Start testing!\")\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg['test_eps']):\n", - " state = env.reset() \n", - " ep_reward = 0\n", - " for i_step in range(cfg['max_steps']):\n", - " action = agent.predict_action(state)\n", - " next_state, reward, done, _ = env.step(action)\n", - " ep_reward += reward\n", - " state = next_state\n", - " if done:\n", - " break\n", - " rewards.append(ep_reward)\n", - " print(f\"Episode: {i_ep+1}/{cfg['test_eps']}, Reward: {ep_reward:.2f}\")\n", - " print(\"Finished testing!\")\n", - " return {'rewards':rewards}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3.
Define the Environment" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import os\n", - "import torch\n", - "import numpy as np\n", - "import random\n", - "class NormalizedActions(gym.ActionWrapper):\n", - " ''' Rescale actions from [-1, 1] (the actor's tanh output range) to the environment's [low, high] range\n", - " '''\n", - " def action(self, action):\n", - " low_bound = self.action_space.low\n", - " upper_bound = self.action_space.high\n", - " action = low_bound + (action + 1.0) * 0.5 * (upper_bound - low_bound)\n", - " action = np.clip(action, low_bound, upper_bound)\n", - " return action\n", - "\n", - " def reverse_action(self, action):\n", - " low_bound = self.action_space.low\n", - " upper_bound = self.action_space.high\n", - " action = 2 * (action - low_bound) / (upper_bound - low_bound) - 1\n", - " action = np.clip(action, -1.0, 1.0) # the normalized action lies in [-1, 1]\n", - " return action\n", - "def all_seed(env,seed = 1):\n", - " ''' Seed every source of randomness (env, NumPy, random, PyTorch)\n", - " '''\n", - " env.seed(seed) # env config\n", - " np.random.seed(seed)\n", - " random.seed(seed)\n", - " torch.manual_seed(seed) # config for CPU\n", - " torch.cuda.manual_seed(seed) # config for GPU\n", - " os.environ['PYTHONHASHSEED'] = str(seed) # config for python scripts\n", - " # config for cudnn\n", - " torch.backends.cudnn.deterministic = True\n", - " torch.backends.cudnn.benchmark = False\n", - " torch.backends.cudnn.enabled = False\n", - "def env_agent_config(cfg):\n", - " env = NormalizedActions(gym.make(cfg['env_name'])) # wrap the env so actions in [-1, 1] are rescaled to its action range\n", - " if cfg['seed'] !=0:\n", - " all_seed(env,seed=cfg['seed'])\n", - " n_states = env.observation_space.shape[0]\n", - " n_actions = env.action_space.shape[0]\n", - " cfg.update({\"n_states\":n_states,\"n_actions\":n_actions}) # add n_states and n_actions to cfg\n", - " models = {\"actor\":Actor(n_states,n_actions,hidden_dim=cfg['actor_hidden_dim']),\"critic\":Critic(n_states,n_actions,hidden_dim=cfg['critic_hidden_dim'])}\n", - " memories = {\"memory\":ReplayBuffer(cfg['memory_capacity'])}\n", - " agent
= DDPG(models,memories,cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Set Hyperparameters" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "import argparse\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "def get_args():\n", - " \"\"\" Hyperparameters\n", - " \"\"\"\n", - " parser = argparse.ArgumentParser(description=\"hyperparameters\") \n", - " parser.add_argument('--algo_name',default='DDPG',type=str,help=\"name of algorithm\")\n", - " parser.add_argument('--env_name',default='Pendulum-v1',type=str,help=\"name of environment\")\n", - " parser.add_argument('--train_eps',default=300,type=int,help=\"episodes of training\")\n", - " parser.add_argument('--test_eps',default=20,type=int,help=\"episodes of testing\")\n", - " parser.add_argument('--max_steps',default=100000,type=int,help=\"steps per episode, much larger value can simulate infinite steps\")\n", - " parser.add_argument('--gamma',default=0.99,type=float,help=\"discount factor\")\n", - " parser.add_argument('--critic_lr',default=1e-3,type=float,help=\"learning rate of critic\")\n", - " parser.add_argument('--actor_lr',default=1e-4,type=float,help=\"learning rate of actor\")\n", - " parser.add_argument('--memory_capacity',default=8000,type=int,help=\"memory capacity\")\n", - " parser.add_argument('--batch_size',default=128,type=int)\n", - " parser.add_argument('--target_update',default=2,type=int)\n", - " parser.add_argument('--tau',default=1e-2,type=float)\n", - " parser.add_argument('--critic_hidden_dim',default=256,type=int)\n", - " parser.add_argument('--actor_hidden_dim',default=256,type=int)\n", - " parser.add_argument('--device',default='cpu',type=str,help=\"cpu or cuda\") \n", - " parser.add_argument('--seed',default=1,type=int,help=\"random seed\")\n", - " args = parser.parse_args([]) \n", - " args = {**vars(args)} # convert args to a dict \n", - " # print the parameters\n", - "
print(\"Training parameters:\")\n", - " print(''.join(['=']*80))\n", - " tplt = \"{:^20}\\t{:^20}\\t{:^20}\"\n", - " print(tplt.format(\"Name\",\"Value\",\"Type\"))\n", - " for k,v in args.items():\n", - " print(tplt.format(k,v,str(type(v)))) \n", - " print(''.join(['=']*80)) \n", - " return args\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve, similar to the smoothing slider in TensorBoard\n", - "\n", - " Args:\n", - " data (List): input data\n", - " weight (Float): smoothing weight in [0, 1]; higher values give a smoother curve, 0.9 is typical\n", - "\n", - " Returns:\n", - " smoothed (List): smoothed data\n", - " '''\n", - " last = data[0] # First value in the plot (first timestep)\n", - " smoothed = list()\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # exponential moving average\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,cfg,path=None,tag='train'):\n", - " sns.set()\n", - " plt.figure() # create a new figure so several plots can be drawn\n", - " plt.title(f\"{tag}ing curve on {cfg['device']} of {cfg['algo_name']} for {cfg['env_name']}\")\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. Ready to Go!"
- ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Training parameters:\n", - "================================================================================\n", - " Name \t Value \t Type \n", - " algo_name \t DDPG \t \n", - " env_name \t Pendulum-v1 \t \n", - " train_eps \t 300 \t \n", - " test_eps \t 20 \t \n", - " max_steps \t 100000 \t \n", - " gamma \t 0.99 \t \n", - " critic_lr \t 0.001 \t \n", - " actor_lr \t 0.0001 \t \n", - " memory_capacity \t 8000 \t \n", - " batch_size \t 128 \t \n", - " target_update \t 2 \t \n", - " tau \t 0.01 \t \n", - " critic_hidden_dim \t 256 \t \n", - " actor_hidden_dim \t 256 \t \n", - " device \t cpu \t \n", - " seed \t 1 \t \n", - "================================================================================\n", - "Start training!\n", - "Episode: 10/300, Reward: -1549.57\n", - "Episode: 20/300, Reward: -1515.84\n", - "Episode: 30/300, Reward: -1413.30\n", - "Episode: 40/300, Reward: -972.99\n", - "Episode: 50/300, Reward: -829.94\n", - "Episode: 60/300, Reward: -727.91\n", - "Episode: 70/300, Reward: -954.71\n", - "Episode: 80/300, Reward: -1318.39\n", - "Episode: 90/300, Reward: -981.19\n", - "Episode: 100/300, Reward: -1262.05\n", - "Episode: 110/300, Reward: -640.49\n", - "Episode: 120/300, Reward: -1100.00\n", - "Episode: 130/300, Reward: -764.66\n", - "Episode: 140/300, Reward: -352.27\n", - "Episode: 150/300, Reward: -891.03\n", - "Episode: 160/300, Reward: -1318.07\n", - "Episode: 170/300, Reward: -124.30\n", - "Episode: 180/300, Reward: -240.08\n", - "Episode: 190/300, Reward: -491.77\n", - "Episode: 200/300, Reward: -1000.77\n", - "Episode: 210/300, Reward: -128.87\n", - "Episode: 220/300, Reward: -950.32\n", - "Episode: 230/300, Reward: -122.48\n", - "Episode: 240/300, Reward: -246.52\n", - "Episode: 250/300, Reward: -374.37\n", - "Episode: 260/300, Reward: -368.25\n", - "Episode: 270/300, Reward: -364.17\n", - "Episode: 280/300, Reward: -725.39\n", - "Episode: 290/300, Reward: -131.21\n", - "Episode: 300/300, Reward: -610.10\n", - "Finished training!\n", - "Start testing!\n", - "Episode: 1/20, Reward: -116.05\n", - "Episode: 2/20, Reward: -126.18\n", - "Episode: 3/20, Reward: -231.46\n", - "Episode: 4/20, Reward: -246.40\n", - "Episode: 5/20, Reward: -304.69\n", - "Episode: 6/20, Reward: -124.40\n", - "Episode: 7/20, Reward: -1.06\n", - "Episode: 8/20, Reward: -114.20\n", - "Episode: 9/20, Reward: -348.97\n",
- "Episode: 10/20, Reward: -116.11\n", - "Episode: 11/20, Reward: -117.20\n", - "Episode: 12/20, Reward: -118.66\n", - "Episode: 13/20, Reward: -235.18\n", - "Episode: 14/20, Reward: -356.14\n", - "Episode: 15/20, Reward: -118.39\n", - "Episode: 16/20, Reward: -351.94\n", - "Episode: 17/20, Reward: -114.51\n", - "Episode: 18/20, Reward: -124.78\n", - "Episode: 19/20, Reward: -226.47\n", - "Episode: 20/20, Reward: -121.49\n", - "Finished testing!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAAHJCAYAAABws7ggAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC7AklEQVR4nOydd3wb5f3H36ctD3lvO8tJHLI3hIQQdliBphQokAYKpS0jbeBXAoWywk6AAmGPAmE07BbChrKzEwgh286wHe8lD1nzfn9IJ9uxHS9tPe/Xyy/b0o3n0Z3uPvedkizLMgKBQCAQCARhiirYAxAIBAKBQCAYCELMCAQCgUAgCGuEmBEIBAKBQBDWCDEjEAgEAoEgrBFiRiAQCAQCQVgjxIxAIBAIBIKwRogZgUAgEAgEYY0QMwKBQCAQCMIaIWYEXeKvWoqiRqNAIBAIfI0QM4JOfPHFFyxZssTn2920aRNXXnml9/+SkhIKCgp45513fL4vQWSwdu1aTjvtNMaOHcsVV1zR5TILFiygoKDA+zNq1CgmTZrE/Pnzefnll3E4HD5dvqCggLFjxzJnzhzuuOMOGhoaOo2pvLyc5cuXc9ZZZzFp0iQmTZrEr371K5555hksFkuP8169ejUnnHACY8eO5dZbb+3DJ9Y7Dp9PQUEB48eP58wzz+TZZ5/F5XL5fJ9HGstjjz3m93WCyZdffklBQUGwhxHRaII9AEHo8eKLL/plu2+++SaFhYXe/9PT01m1ahWDBg3yy/4E4c8DDzyAy+XimWeeISUlpdvlRo8ezW233QaA0+mkoaGBb775hnvvvZeNGzfyz3/+E5VK5ZPlAex2O7/88gsPPfQQO3bs4PXXX0eSJADWrVvHokWLSEhI4KKLLqKgoACXy8W6det48skn+fTTT3n11VfR6/XdzufOO+9kyJAh3HfffWRkZPTvw+uB8847j9/85jfe/y0WC59++inLly/HbDZz/fXX+2W/0ca6devEZxkAhJgRBA2dTsfEiRODPQxBCFNfX8+0adM49thjj7hcXFxcp3PpxBNPZNiwYdx999188MEHzJs3z6fLT5s2jebmZh599FF++uknJk6cSG1tLYsXL2bIkCH861//IiYmxrv8zJkzOemkk/jtb3/LSy+91MFK2dW8Z86cydFHH33EeQ+EzMzMTnOaMWMGRUVFvPrqqyxatAitVuu3/Uc6TU1NPPvsszz77LPEx8fT0tIS7CFFNMLNJOjAggULWL9+PevXr6egoIB169YB7ovrrbfeyrHHHsu4ceM4//zzWbNmTYd1v//+e84//3wmTZrEtGnT+POf/+y1xNx44428++67lJaWel1Lh7uZ3nnnHUaPHs1PP/3EBRdcwLhx4zjhhBN4/vnnO+ynsrKSxYsXM336dKZNm8att97Kww8/zIknnnjEuVVWVrJkyRJmzJjBpEmTuOSSS9iyZQvQvcvrxhtv7LDdBQsW8H//938sWrSIiRMnctlll3HaaaexaNGiTvs755xz+POf/+z9//PPP2f+/PmMGzeOmTNnctddd/V4gXM6nbz66qucffbZjB8/njlz5rB8+XKsVmuHMV566aW8/fbbXpfMOeecwzfffHPEbQO89957/OpXv2LChAnMmTOHBx98EJvNBsBjjz3GiSeeyP/+9z/mzp3LhAkTOP/8873nBLiPWUFBASUlJR22e+KJJ3LjjTcecd/79+9n0aJFzJw5k4kTJ7JgwQI2bdoEtB2P0tJS3nvvvQ7nYl+45JJLyMjI4N///rdflh87diwAhw4dAuC1116jpqaGu+66q4OQUZgwYQILFy7s8j1wP8Ur7ojHH3+8w2f7/fffc9FFFzFlyhSOPvporr/+esrKyrzrKt+fN998k5kzZzJ9+nT27t
3bq3kcPqfm5mav+0yxjJ1yyimMHTuW0047jZUrV3ZYZ8GCBdx8880888wzzJkzh3HjxnHhhReydevWDsutX7+eCy64gAkTJnDaaafxww8/dDn/w4/1ggULWLBgQZfj7e05WFBQwOuvv86NN97IlClTmD59OnfddRetra3cf//9HHPMMRx99NHcfPPNHb5fh7N582YKCgr43//+1+H1HTt2UFBQwGeffQbAW2+9xRtvvMGtt97KJZdc0u32BL5BiBlBB2677TZGjx7N6NGjWbVqFWPGjMFqtbJw4UK++OILFi9ezIoVK8jMzOSKK67wCpri4mKuuuoqxo4dy5NPPsndd9/Nvn37uPLKK3G5XFx11VUcf/zxpKWlsWrVKubMmdPl/l0uF3/9618544wzeOaZZ5g8eTIPPPAA3377LQA2m42FCxeyefNm/v73v3Pvvfeyc+dOXnjhhSPOq7m5md/+9resW7eOv/3tb6xYsQK9Xs/vf/979u/f36fP6KOPPiI2NpYnn3ySK664gnnz5vH111/T1NTkXaawsJCdO3dyzjnnAPD+++9z9dVXM2zYMB5//HGuueYa/vvf/3LVVVcdMSj61ltv5d577+Xkk0/mySef5OKLL+aVV17ptN62bdt4/vnnWbRoEY8//jhqtZprr722y3gOhVdffZUlS5YwZswYVqxYwZVXXsnKlSu56667vMvU1tayZMkSLrroIh555BEMBgOXX345O3bs6NNndjh79+5l/vz5lJSUcMstt7B8+XIkSWLhwoWsX7/e64JMS0vj+OOP956LfUWlUjFjxgy2bt3aKRbGF8vv27cPgLy8PMAdb1ZQUMCIESO6XWfJkiXd3tzGjBnDqlWrALcbaNWqVaSnp/Pee+/x+9//nqysLB566CFuuukmtmzZwgUXXEBNTY13fafTyQsvvMDdd9/NTTfdRH5+fo9z6GpOsbGxXrfe7bffzqOPPsq8efN46qmnmDt3Lvfccw+PP/54h/U++eQTvvjiC2655RYeeughqqurufbaa3E6nQD88ssv/P73vyc+Pp5HH32U3/3ud1x33XV9Ht9AWLZsGTqdjhUrVnDuueeycuVKzj33XMrKyli+fDkLFizgrbfe6iTW2jN58mQGDRrE6tWrO7z+wQcfkJiYyPHHHw+4xdSXX37JhRde6Nc5CdwIN5OgA8OHDycuLg7Aa4J+44032LlzJ2+88QYTJkwAYPbs2SxYsIDly5fz9ttvs3XrVlpbW/njH//o9fFnZmbyxRdf0NLSwqBBg0hOTu7gWurKKiHLMldddZXXlz9lyhQ+++wzvvrqK4477jj++9//UlRUxNtvv+19Kj7mmGM4+eSTjzgvxSr07rvvctRRRwHui9K5557Lhg0bmDFjRq8/I61Wyx133IFOpwNg0KBBPPbYY3z++eece+65gPvCZjKZOPHEE5FlmeXLl3PcccexfPly73aGDBnCpZdeytdff92luNu7dy9vvfUW119/vdclMXPmTNLT07nhhhv45ptvvBfOxsZG3nnnHW/8UUxMDJdccok3gPZwXC4Xjz/+OCeffHIH8WKxWFi9ejV2u937/+233+6dl/JZP/PMMzz88MO9/swOZ8WKFeh0Ol5++WXv+TZnzhzOOussHnjgAd566y0mTpyITqcjOTl5QO7I1NRU7HY79fX1pKam9mt5WZY7iJuGhgbWr1/Pk08+yaRJk7zn4sGDB5k5c2anbXYljDSazpff9u4sxQ3kcrlYvnw5s2bN4sEHH/QuO3nyZM444wyef/55brjhBu/rf/rTn7p9WGiPy+XyjkuWZaqrq3n//ff58ssvueKKK5AkiX379vHGG29w3XXXec/BWbNmIUkSTz/9NBdddBFJSUneOT7//PPe49nc3MySJUvYsWMHY8eO5emnnyYlJYUnn3zS675KSkpi8eLFPY7VVwwfPpw777wTgOnTp/Pmm29it9tZvnw5Go2GWbNm8cknn7B58+Yjbm
fevHm88MILtLa2YjAYkGWZDz/8kLlz53a4LggCh7DMCHpkzZo1pKWlMWbMGBwOBw6HA6fTyQknnMC2bdtoaGhgwoQJ6PV6zjvvPO6++26+/fZbRo0axeLFi70Xt94yadIk79/KzUwRPmvXriUvL8978wD3DeCEE0444jY3bdpEbm6uV8gAGI1GPvnkkw5BkL1h2LBh3gsWuJ/KJ0+ezIcffuh9bfXq1d4LW1FREeXl5Zx44onez8/hcDBt2jTi4uL4/vvvu9zP+vXrATjzzDM7vH7mmWeiVqs7mOKTk5M7XDwzMzMBus2c2bdvHzU1NZxyyikdXr/88st55513vDcbjUbDWWed5X3fYDAwe/ZsNmzY0P0H1AvWr1/PCSec0OHc0Gg0nHnmmWzbto3m5uYBbb89igVLCdDtz/IbNmxgzJgx3p9jjz2W6667jrFjx/Lggw96l+0qC8jhcHRYV/npLfv27aOqqqrDcQD3zXLSpEne80Sh/Tl+JJ544gnvWJTsrMcff5wLLriAa6+9FnB/32RZ7nTunnjiiVitVq9bEDo+CAHehxrlHNy0aRPHHXdchzicU089FbVa3evPYqC0v7ao1WqSkpIYM2ZMB2GZmJhIY2Mj0Cb42l/3wC1mWlpavK6mzZs3c+jQIa8lVhB4hGVG0CP19fVUVVV1ewGuqqpi+PDhvPLKKzzzzDO89dZbvPzyy5hMJi666CL++te/9vpGAu4bZntUKpX3BlNXV9dlVsuRMl2UOfS0TG+JjY3t9No555zD0qVLqauro6SkhAMHDnDPPfd49w1wxx13cMcdd3Rat7Kyssv9KC6itLS0Dq9rNBqSkpK8F1xwC7P2HOnm2n5MPX0mqampnSwIKSkp3vX7S0NDQ5dWktTUVGRZpqmpqcvPuT9UVFRgMBhITEzs9/JjxozxHjtJktDr9WRlZXUS6jk5OZSWlnZ4TaPR8NZbb3n/f+ONN3jjjTd6PX7ls+7u89q+fXuH17qLxzmc888/n/PPPx9wzyk2Npbc3NwOYkPZ9+GCWqGiosL79+HnoJINppyDDQ0NXiuOgnIuB4quHqyO9Hn9/e9/59133/X+n5OTw5dffsngwYOZNGkSq1ev5vTTT2f16tUMGjSIyZMn+2Xcgp4RYkbQI/Hx8QwZMqSDi6Q9ubm5AIwfP54VK1Zgs9nYtGkTq1at4qmnnmLUqFGcfvrpPhlLRkZGlzEu7eMGuiI+Pr5TgCC4n6gSEhK8Akp58lLobQbC6aefzl133cXnn39OUVEROTk5TJkyBQCTyQTADTfcwPTp0zutm5CQ0OU2lderqqrIycnxvm6326mrqxvQTUAZU21tbYfX6+rq2L59u/cJtivRUl1d7RVB3YmmniwrCQkJVFdXd3q9qqoKwGc3OIfDwbp165g8eXKvLADdLR8bG8u4ceN6XP/EE0/kmWeeobi42BtHA3RY96uvvurTHBRR1d3n1d/PKj09vcc5KefJSy+91KW4zM7O7vX+EhMTO81BluUOcV1HOp+6E7f9PQd7wzXXXMPFF1/s/b+9RXbevHnce++9NDY28vHHH/Pb3/52wPsT9B/hZhJ0on19DXD7lsvKykhJSWHcuHHen++//57nnnsOtVrNiy++yAknnIDNZkOn0zFjxgyWLl0KtGV6HL7d/jB9+nRKSko6BKC2trZ6A4S7Y+rUqRQXF7Nnzx7va1arlWuvvZa33nrL+8TW/knTbrd3ysboDpPJxAknnMAXX3zBJ598wrx587wX2WHDhpGSkkJJSUmHzy8jI4MHH3yw05N1+7kCnQINV69ejdPp9Iql/jBs2DCSkpI6ZWT85z//4corr/TGzBz+2ba2tvLNN994Y4yUz628vNy7TGFhYY+Wm2nTpvG///2vQ9C00+lk9erVjBs3rsNNYyCsWrWKqqqqXt9o+rr84Vx88cUkJiZy4403dpibgtPppKioqE/bHDp0KGlpaXzwwQcdXi8uLubHH3/0qzVg6t
SpgFvktj93a2treeSRR/pkoZsxYwbffPNNB9fnt99+6z3XoOvzqaGhoUN9qsPp7znYG3JzczvMu33huzPOOANZlnnkkUeoqanpkMovCDzCMiPohMlkYsuWLaxZs4bRo0czf/58XnnlFS677DL+9Kc/kZWVxQ8//MCzzz7LJZdcglar5ZhjjmH58uVcffXVXHLJJajVav7973+j0+m88Swmk4nq6mq+/vrrXvv1D+ess87imWee4eqrr+Yvf/kLJpOJf/3rX9TU1BzxKXH+/PmsXLmSP//5zyxatIikpCRefvll7HY7F110EQkJCUyaNImVK1cyePBgEhISePnll2ltbe212X7evHksWrQIp9PZwXeuVqtZvHgxt956K2q1mhNOOAGz2cwTTzxBRUVFt+674cOH86tf/YpHH30Ui8XCtGnT2LFjBytWrODoo4/muOOO69uH1w4l2+nOO+8kJSWFE088kX379vHoo49y8cUXd7AW3XTTTfz1r38lJSWF559/npaWFm/K+dFHH43BYOC+++7jL3/5i7fuSk8unWuuuYZvvvmG3/3ud1x55ZVotVpeeeUViouLee655/o8n6amJn788UfA/YReV1fHd999x6pVq5g3bx6nnnrqgJbvLRkZGaxYsYK//OUvzJs3jwsuuIAxY8agUqnYtm0bb7/9Nvv37+/TjU+lUnHddddx0003cf311zNv3jzq6upYsWIFCQkJXHbZZf0aa28oKChg3rx5/OMf/6C0tJSxY8eyb98+Hn74YXJzcxkyZEivt3X11Vfz+eefc/nll3PFFVdQW1vLP//5zw5urYKCArKysnj88ceJi4vzBhof7sJqT3/PwYGiZC699tprTJo0icGDB/t1f4IjI8SMoBMXX3wx27Zt4w9/+AP33nsvZ599Nq+++ioPPvggy5Yto7GxkZycHK6//np+//vfAzBq1CieeuopHn/8ca677jqcTidjx47lhRdeYNiwYYBbUHz99ddcffXVLFq0iDPOOKPPY9NoNDz//PPcfffd3H777Wg0GubNm0diYqI3TbYr4uLieOWVV3jggQdYunQpLpeLiRMn8vLLL3vdAffddx9Lly7llltuIS4ujvPOO48pU6bw5ptv9mpsxx9/PPHx8eTl5TF06NAO7/3mN78hNjaW5557jlWrVhETE8PkyZNZvnx5B3fE4dx9990MHjyYt99+m2effZb09HR+97vfcdVVVw3Y0nXxxRcTExPD888/z6pVq8jMzOQPf/gDf/jDHzosd/vtt3PPPfdQW1vL5MmTef31170XbpPJxGOPPcaDDz7I1VdfTU5ODtdccw3vvffeEfc9YsQIXnvtNW+asSRJjB8/npdfftlrDegL27dv54ILLgDa4j9GjhzJ7bff3mWAd1+X7wtTp07l/fff5/XXX+fjjz/m2WefxWazkZWVxTHHHMPDDz/M6NGj+7TN+fPnExsby9NPP83VV19NXFwcxx13HNddd12nmCpfc++99/L000/z73//m/LyclJSUjjjjDP461//2qfg3SFDhvDKK69w3333sXjxYlJSUliyZAn33Xefdxm1Ws2jjz7KPffcw3XXXUdqaioLFy6kqKio2+93f89BX3DOOefw+eefc/bZZ/t9X4IjI8mi858gjNizZw9FRUWceuqpHYKKzzvvPDIzM1mxYkUQRxdZPPbYY6xYsYJdu3YFeygCgUBwRIRlRhBWtLS08Je//IWLLrqIU045BafTyYcffsi2bdv4v//7v2APTyAQCARBQIgZQVgxYcIE/vnPf/L888/z3nvvIcsyo0eP5rnnnuOYY44J9vAEAoFAEASEm0kgEAgEAkFYI1KzBQKBQCAQhDVCzAgEAoFAIAhrhJgRCAQCgUAQ1ggxIxAIBAKBIKyJmmwmWZZxuXwf66xSSX7ZbigSTXOF6JqvmGvkEk3zFXONPFQqqVeNiqNGzLhcMrW1A2881h6NRkVSUixmcwsOR9fdiSOFaJ
orRNd8xVwjl2iar5hrZJKcHIta3bOYEW4mgUAgEAgEYY0QMwKBQCAQCMIaIWYEAoFAIBCENULMCAQCgUAgCGuiJgBYIBAIBOGJy+XC6XT0sIxEa6sam82K0xnZWT6RMle1WoNK5RubihAzAoFAIAhJZFnGbK7FYmnq1fLV1SpcrsjO7lGIlLkajXGYTMm9Sr8+EkLMCAQCgSAkUYRMXFwSOp2+xxueWi2FtaWiL4T7XGVZxmaz0tRUB0BCQsqAtifEjEAgEAhCDpfL6RUycXGmXq2j0agivu6KQiTMVafTA9DUVEd8fNKAXE4iAFggEAgEIYfT6QTabniCyEQ5vj3FRPVESIoZl8vFo48+ynHHHcfEiRP5wx/+QHFxcbCHJRAIBIIAM9BYCkFo46vjG5Ji5oknnuC1115j6dKl/Pvf/8blcnHFFVdgs9mCPTSBQCAQCAQhRsiJGZvNxgsvvMCiRYuYM2cOo0aN4uGHH6a8vJxPP/002MMTCAQCgUAQYoRcAPDOnTtpbm5mxowZ3tdMJhOjR49mw4YNnHXWWf3etkbjW+2mVqs6/I5kommuEF3zFXONXMJ5vi5X39wPirdCkkAO3ySfXuHPuV5zzZVkZWVz8823+3bDPaBWSwO6R4ecmCkvLwcgKyurw+vp6ene9/qDSiWRlBQ7oLF1h8lk9Mt2Q5Fomiv4Z76NLTY+/H4fc2cMISEudIIbo+nYRtNcITzn29qqprpa1eebXDgKt/7ij7lKkoQkDUxY9AWXS0KlUpGQEIPBYOj3dkJOzFgsFgB0Ol2H1/V6PQ0NDf3ersslYza3DGhsh6NWqzCZjJjNFpzO8E6R64lomiv4d76vfbabj9cdpKKmmQWnFfh02/0hmo5tNM0Vwnu+NpvVU/lX7lUKsiS55+t0uqLCMuOvucqyjCz37jP3BU6njMvloqGhBYvF2el9k8nYK9EWcmJGUWY2m62DSrNarRiNA3u68NfBcTpdYZ/v31uiaa7gn/nuLXGL8l0H60Lqs4ymYxtNc4XwnG93BeFkWcZm73ouGo3/b8I6rarPGTizZk3lssv+wIcfvo/DYWfFimfJzMzi2Wef5NNPP6K5uYmhQ/O54oo/MX36MRQW7mXhwgt5/vlXKCgYBcBNN/0fmzdv4MMPv0StVuNyuTjzzJO59trrOO20M3j//fd4661/U1xcjEolMXLkKBYtuo5Ro0YDcN55ZzNnzkmsXfs9dXW13HXXA4wZM46nnnqMTz/9GLvdxjnn/Br5MHX02msree+9t6iqqiQ1NY0zz5zHwoWX+zzLrLeitTtCTswo7qXKykoGDRrkfb2yspKCguA/xQoEA8HpcnGwohGAkspmrDYnep06yKMSCMIDWZa595XN7C3tv5V+oAzPTeCmiyf3+Wb+7rtvsnz5ozgcTvLyBnH77Tdz4MA+br11KWlp6Xz//TfccMNfueee5Rx77CyysrLZsGEtBQWjcDqdbNmykZaWFnbv3slRR41h+/ZtNDY2MmPGLL7++n88/PADLFlyCxMmTKK6upp//nMZ9913Fy+++Jp3DO+88wb33/8w8fHxDBs2nH/+cxnff/8tN998GxkZWbz88gv89NMWsrNzAPjuu29YufJf3HnnPeTlDeGXX7Zy1123kZWVzWmnneHTz3WghJyYGTVqFHFxcaxbt84rZsxmM9u3b+eSSy4J8ugEgoFRVt2CzfP04ZJl9pWZGTU4KcijEgjCiDAtO3PaaWd4rSQlJcV8/vkn/OtfrzJihPsh/cILL2Hv3j289trLHHvsLGbOPI4NG9ZxySWXsmPHL2g0WsaOHcfmzRs56qgxfP/9d0yYMAmTyURCQgI33vgPTj31dAAyM7M466x5PPTQAx3GcMwxM5k27WgAWlqa+eijD7j++iXMmDELgJtuupXNmzd6lz90qASdTktmZjaZmZlkZmaSmppORkam3z+vvhJyYkan03HJJZewfPlykpOTycnJYdmyZWRmZnLqqa
cGe3gCwYDYV27u8H/hoQYhZgSCXiJJEjddPPkIbib/l/jvj5sJIDe3zdOwe/cuAK666ooOyzgcDuLi4gGYOfM4/vvfd7FaW9mwYR1TpkwlMzObTZs2cvHFC/n++++YO/dMACZOnMz+/ft48cXnOHBgPyUlByks3NupEWVubp7374MHD2C32xk1aoz3Nb1ez8iRbR6QU089g9Wr/8tvfzufIUOGMW3a0cyZcxKZmULM9IpFixbhcDi45ZZbaG1tZdq0aTz//PNotdpgD00gGBD7y90uJqNejcXqpLDU3MMaAoGgPZIkdeua1WhUqFWhabrR69syF2XZLTIef/xZYmI6Ztkq/YkmTZqKVqtly5bNbNy4ntNOO4OsrCzeeecNysvL2LNnF3ff7ba8fPrpx9x9922ceurpjB07nnPOmU9RUSEPPXR/t2NQTFzKWBQ0mjZZkJiYyL/+9Rrbtm1lw4Z1rFu3hjfffJ3LL/8jl132h4F9ID4mJHPY1Go1f/vb31izZg1btmzhmWeeITc3N9jDEggGzP4yt5iZNS4bcFtmDg+4EwgEkc3QofkA1NRUk5ub5/1Zvfq/fPjh+4BbVEyfPoPvvvua7du3MWXKNMaPn4jT6eT5558mP384WVnu68irr77I2Wefy803386vf30+EydOprS0BKDb68ugQYPR6fRs3fqT9zWHw8GePbu9/3/66Ue8++5bjB8/kcsv/yPPPOPezxdfhF4B25AUMwJBJOJwuiiubALg+InZaNQSjS12quotQR6ZQCAIJMOG5XPsscexbNm9fPfdN5SWlvDqqy/xyisvkpPT9uA+a9ZsPvzwfVJT08jJyUWvNzB27Hg++eRDZs+e410uPT2Dn3/+iV27dlJaWsKqVa/yzjtvAHTbBigmJobzzjufF154mq+//pIDB/azfPm9VFdXeZex2aw8/vgjfPzxasrKDvHTTz+yZctmxo4d758PZgCEpJtJIIhESquacThdxBo0ZKXEMDgjnsJDZgpLzaQnxQR7eAKBIIDceee9PPPM4yxbdg+NjWays3O58cZ/cPrpbVXuZ8yYidPpZPLkqd7Xpk6dzubNG5k9+3jva4sX38ADD9zNNddciU6nZfjwkdxyyx3cdtvf2blzOxMmTOpyDH/84zXodHoeeuh+WlpaOPHEU5g5c7b3/bPOOpeGhgZefPE5KisriI+PZ86ck/jznxf54RMZGJIcJTZup9NFbW2zT7ep0ahISoqlrq457Go49JVomiv4Z75f/VjKyx/vYsyQJK6/cBKvf76HzzYWc+LkHC45NXhlB6Lp2EbTXCG852u326ipKSMlJQutVtfzCgQmADhUiJS59nSck5Nje1U0T7iZBIIAocTLDMkyAZCf4/4tgoAFAoFgYAgxIxAEiP2etOwhme7Uy+E5CQAUVzZhtXUu4y0QCASC3iHEjEAQAOwOJ6VVbjfnkEy3RSbZZCApXo9Llr1CRyAQCAR9R4gZgSAAFFc243TJxMdoSTa11XoYlu1xNR0SYkYgEAj6ixAzAkEAaHMxmTpUD83PdruaCoPYa0YgEAjCHSFmBIIA4A3+9cTLKChxM4WlonieQCAQ9BchZgSCAOC1zGR1FDODM+NQqyTMLXaqGlqDMTSBQCAIe4SYEQj8jNXupLS6Y/CvglajZlCGW+AUCVeTQCAQ9AshZgQCP1Nc0YQsQ0KcjqR4faf3Rb0ZgUAgGBhCzAgEfmafx8U09DCrjIISN7P3kLDMCAQCQX8QYkYg8DPdBf8qKBlNJZVNWO2ieJ5AIPAvDoeDVate9f7//PNPc955Z/t8P/7ablcIMSMQ+Jnugn8Vkk16EuJ0OF0yB8obAzk0gUAQhXz22cc89tjDwR6GTxFiRiDwIxarg/KaFqBz8K+CJEkMF/VmBAJBgIjEMhCaYA9AIIhkDlY0IgMpJj2m2O47/+bnJLBpdxV7hZgRCI6ILMvgsHXzngrZ352kNboOhS97w5o13/Pcc0+xf3
8RRmMMM2bM5Nprr2Pv3t0sXnw1d955H0899RgVFRWMHTuOm2++nddfX8nHH69Go9Hym99cyMKFl3u399FHH7Bq1ascPHiQ5ORkzjrrHBYsuAy1Wg1ARUU5Tz/9OBs3rqelpZnx4ydy1VV/YfjwEXz44fvcc88dAMyaNZVHH33Ku91XXnmRt99+g4aGBsaMGcsNN9xMXt4gAJqamnj88Uf49tv/YbfbKSg4iquuWsSoUaO96//nP+/w2msvU1VVxbRp08nKyu73x9xXhJgRCPzI/nIlXqZrq4yCN6PpkBlZlvt8sRQIogFZlmn57924KvYGbQzqjBEY5/2919/R+vp6br75b1xzzWKOPXYWlZUVLF16G0888Qinnno6TqeTl19+gdtuuwuHw8Hf/vZXLr30Is466xyeeeYlPv30I5599klmzTqe/PzhvPHGazz11AoWLbqOKVOms337Nh566H4aGhr4y1+up6WlmT//+XKys3O4774H0Wp1vPDCM1xzzR948cXXOemkU2hqauLRRx/kP//5GJMpgS1bNlFeXsbPP//EsmWPYLfbWLr0Vu67bymPP/4ssizzt78tQqczcP/9/yQuLo6PP17Nn/98OU8//S9GjhzFZ599zEMP3c9f/vJ/TJ06nW+++R/PPPME6ekZfj4iboSbSSDwI/vKjhwvozA4I95dPK/ZRo0onicQdItEeAn9qqoKbDYbGRmZZGZmMX78RO6//yF+/esLvMtcccWfGDVqNGPHjmfKlGkYjUauumoRgwYNZsGCSwEoKtqLLMu88spLzJ9/Pueddz55eYM47bQzuPzyP/Huu2/S1NTEJ598RENDPUuX3s/o0WMZMWIkt99+F3q9gXfeeQO93kBcXBwAKSmpaLVaADQaDbfeupThw0dw1FFjOOec+ezcuR2ATZs2sG3bzyxdei9jxoxl8OAh/PGPVzNmzDjefPPfALz11ipOPvlU5s//DYMGDeaSSy5l5szjAvY5C8uMQOBHemuZ0WnVDMqIY19ZI3sPNZCaaAzE8ASCsEKSJIzz/t6tm0mjUeEIMTfTiBEFnHzyaSxZspiUlFSmTTuaY489jtmz57B1648A5ObmeZc3Go1kZWV796HXGwCw2+3U19dRW1vD+PETO+xj0qTJOBwODhzYT2HhXvLyBpOUlOR9X683MHr0GAoLC7sdZ3JyCrGxcd7/4+NNWK1WAHbv3oksy/z612d1WMdms3mXKSray8knn9bh/bFjx7Nnz+7efEwDRogZgcBPNLfaqayzADC4m7Ts9uRnJ7CvrJHCUjPHjM709/AEgrBEkiTQdi4+CSBpVEiSn8VMP7j99rv5/e//wNq1P7BhwzqWLv0H48dP9MbBaDQdb8XdiaXuAnddLrnddrpbxoVGo+52jCpV944al8tFbGwszz//Sqf3FMsOSMhyx8/+8Hn5E+FmEgj8hJJmnZZoIM6o7WFpdxAwiIwmgSCS+OWXbTz66IMMGjSE88+/iGXLHuGmm25l06YN1NXV9WlbyckpJCeneC06Cj/9tAWtVktOTi75+SMoLj5AXV2t932r1crOnTsYMmQY0L1Y6o5hw4bT3NyM3W4nNzfP+/Pqqy/x3XdfAzBixEi2bv2pw3o7d+7o034GghAzAoGf6K2LSSE/271ccWUTNlE8TyCICGJjY3nnnTd54olHKSkppqhoL1988Sm5uYNITEzs8/Z++9sFvPPOG7z99puUlBTz6acf88ILzzBv3q+Ii4vjlFPmkpCQyD/+cSM7dvzC3r17uPPOW7BYLJxzznzA7coCt9iwWnuO0Tv66BmMGDGS2267ic2bN1JSUsxjjz3Ehx++7xVIl1xyKd988z9ee+1liosP8tZb/+arr77o8/z6ixAzAoGf2N/L4F+FlAQDCbHu4nn7RfE8gSAiGDJkKHffvYzNmzdy2WUX8ec/X45KpebBBx/tV9bib397CVdf/Rf+/e9XueSS3/Dcc09y8cULWbToegDi4u
J47LGniY838Ze/XMVVV12B1WrlySefJzs7B4DJk6cxevRY/vzn3/P999/1uE+1Ws3DDz/BqFGjufXWG1m48EJ+/HELd9+9jClTpgFw7LGzuO22u1i9+r8sXHghX3/9Py688JI+z6+/SHIkVs/pAqfTRW1ts0+3qdGoSEqKpa6u2f9BZ0EmmuYKvpnvDU/+QHVDK3/77SSOGpzU8wrAind+ZvPuKn5zQj6nHz24X/vtK9F0bKNprhDe87XbbdTUlJGSkoVW232NpvYEJAA4RIiUufZ0nJOTY1Gre7a7CMuMQOAHGltsVHtSrAdn9M4yA6KDtkAgEPQHIWYEAj+gBP9mJMcQY+h9RH9+u7YGUWI0FQgEggEjxIxA4Af2ecTM0F6kZLdnSKa7eF5Ds40asyieJxAIBL1BiBmBwA94g3/7KGZ0WjV56e7CVcLVJBAIBL1DiBmBwA9407KzepeW3R5Rb0YgaEO4WyMbXx1fIWYEAh/T0GSlrtGKBAzKiOtx+cNR6s0UHhJiRhC9KB2gbTZrkEci8CfK8VWrB1YtWLQzEAh8jGKVyUqNxaDr+1dMscwcrHAXz9Npuy9BLhBEKiqVGqMxjqYmd5VcnU7fY10Wl0vC6YwOS064z1WWZWw2K01NdRiNcUdsp9AbhJgRCHxMW+XfvsXLKKQmGDDF6jA32zhQ0ciI3EQfjk4gCB9MpmQAr6DpCZVKhcsV/rVXekOkzNVojPMe54EgxIxA4GOU4N+h/YiXAXfflPxsE1v2VFNYahZiRhC1SJJEQkIK8fFJOJ2OIy6rVkskJMTQ0NAS1haL3hApc1WrNQO2yCgIMSMQ+BBZlr1p2f21zIDb1eQWMyJuRiBQqVSoVEeuAqzRqDAYDFgszoiojHskommuvSXgAcCbNm2ioKCg08+6deu8y6xZs4b58+czYcIE5s6dy+rVqwM9TIGgX9Q1WjE321BJkjfFuj8oQcB7D4nieQKBQNATAbfM7Nq1i0GDBvHaa691eD0hwZOOWljIH//4Ry677DKWLVvGV199xQ033EBycjIzZswI9HAFgj6hxMvkpMUOKHB3SJbJXTyvyUat2UpKgsFXQxQIBIKII+BiZvfu3QwfPpy0tLQu33/ppZcoKChg8eLFAOTn57N9+3aee+45IWYEIc/+8v4VyzscvVZNbnocB8obKTzUIMSMQCAQHIGAu5l27dpFfn5+t+9v3Lixk2g55phj2LRpkzC3C0Ke/WX9L5Z3OF5Xk4ibEQgEgiMScMvMnj17SEpKYv78+VRUVDBy5EgWL17M+PHjASgvLyczM7PDOunp6VgsFurq6khO7n8Kl0bjW+2mtCXvTXvycCea5gr9m68sy143U35OwoDPt5F5iXy5uZR9ZWafn7vtiaZjG01zheiar5hrdONTMVNSUsJJJ53U7ftfffUVjY2NtLS0cMstt6BWq3nllVe45JJLeOeddxg+fDitra3odB2j1pX/bTZbv8emUkkkJcX2e/0jYTIZ/bLdUCSa5gp9m29FbQtNFjsatcT4gnS0moEVu5syJgv+8wsHyhuJjTP4vXheNB3baJorRNd8xVyjE5+KmYyMDD788MNu309PT2fDhg0YjUa0Wi0A48aNY/v27axcuZI77rgDvV7fSbQo/xuN/T9wLpeM2dzS7/W7Qq1WYTIZMZstOJ2RnR4XTXOF/s33x50VAOSmx9HUOPCO1zpJ9hbP+3FHOSPyEge8za6IpmMbTXOF6JqvmGtkYjIZe2WB8qmY0Wq1R4yHATCZOsYSqFQq8vPzqahw3wiysrKorKzssExlZSUxMTHExw8sqNJf+fhOpytqcv2jaa7Qt/kqNWGGZMT77DMalmXix73V7DpY3+8ifL0lmo5tNM0Vomu+Yq7RSUAdbt988w2TJk2iuLjY+5rD4WDnzp0MHz4cgKlTp7J+/foO661du5bJkyf7rFKgQOAPfBn8q5Cf495WkWg6KRAIBN0SUHUwefJkkpKSWLJkCd
u2bWPXrl0sWbKE+vp6Lr30UgAWLFjA1q1bWb58OYWFhbzwwgt8/PHHXHHFFYEcqkDQJ9oH/w40Lbs9wz1NJwsPmX22TYFAIIg0Aipm4uLiePHFF0lNTeXyyy/nggsuoL6+nldeeYXU1FQARowYwRNPPMHXX3/Nueeey5tvvsmyZctEjRlBSFNZb8FidaBRq8hO9V2g+ZBMEypJoq7RSq154HE4AoFAEIkEPDV70KBBPProo0dcZvbs2cyePTtAIxIIBo7iYhqUEYfGh+mSep2a3PRYDlY0sbe0gekmUTxPIBAIDkcEoQgEPsBXlX+7It/jaioSriaBQCDoEiFmBAIf4A3+zfR9xtHwbE/cjKgELBAIBF0ixIxAMEBcLpn9FUomkz8sM26BdKCiEbtIwxQIBIJOCDEjEAyQ8toWrDYnOq2K7BTfV5lOSzQSZ9TicMoc8IgmgUAgELQhxIxAMECUeJnBGfGoVJLPty9JkjdFu0i4mgQCgaATQswIBAPEn/EyCoqraa8IAhYIBIJOCDEjEAwQb7E8P8TLKOSLIGCBQCDoFiFmBIIB4HS5OFjh+8q/hzM0y4QkIYrnCQQCQRcIMSMQDICy6hZsDhcGnZqM5Bi/7UevU5OXFgeIejMCgUBwOELMCAQDYF+7YnkqyffBv+1RiuftFa4mgUAg6IAQMwLBAGhrLum/4F8FJQi4UHTQFggEgg4IMSMQDABvJpMfg38VFMvMgXJRPE8gEAjaI8SMQNBPHE4XxZVNgH+DfxXS2xXPO1gpiucJBAKBghAzAkE/Ka1qxuF0EaPXkJZo9Pv+JEkiP9vjaioVQcACgUCgIMSMQNBPvJ2ys+KR/Bz8q6C4mkS9GYFAIGhDiBmBoJ8EMvhXwStmRBCwQCAQeBFiRiDoJ21tDPwfL6MwNCseSYJas5W6RmvA9isQCAShjBAzAkE/sDuclFR5gn8DkMmkYNBpyPUUzxOuJoFAIHAjxIxA0A+KK5txumTijFpSTIaA7lu4mgQCgaAjQswIBP0gGMG/CiKjSSAQCDoixIxA0A+UeJmhAQz+VVAsM/vLG3E4RfE8gUAgEGJGIOgH7S0zgSYjSSme5+JgRVPA9y8QCAShhhAzAkEfsdqdlFY3A4FNy1aQJIlhXleTiJsRCAQCIWYEgj5SXNGELENCnI6keH1QxiCCgAUCgaANIWYEgj6yz+NiCka8jIIIAhYIBII2hJgRCPpIMIrlHc7QLBOSBDXmVuqbRPE8gUAQ3QgxIxD0kWAG/yoY9RpyUkXxPIFAIAAhZgSCPmGxOiivaQFgcBDdTADDc4SrSSAQCECIGYGgTxysaEQGkk16EmJ1QR3LsGwRBCwQCAQgxIxA0CeC0Sm7O/I9lhlRPE8gEEQ7QswIBH2gTcwEL15GITM5hliDBrvDRXGlKJ4nEAiiFyFmBII+sL8s+MG/CpIkeevN7BVBwAKBIIoRYkYg6CUtrXYq6ixAaLiZAG8l4KJDIghYIBBEL0LMCAS95IDHxZSaYCDOqA3yaNx4KwELy4xAIIhihJgRCHrJPiVeJis0rDIAw7JMSEB1QysNonieQCCIUvwqZm699VZuvPHGTq+vWbOG+fPnM2HCBObOncvq1as7vG+1WrnjjjuYMWMGkyZN4vrrr6e2ttafQxUIekSJlxkaAsG/Cka9hpy0WAD2inozAoEgSvGLmHG5XDz00EOsWrWq03uFhYX88Y9/5LjjjuOdd97hN7/5DTfccANr1qzxLnP77bfz3Xff8dhjj/HSSy9RVFTEokWL/DFUgaDXhFImU3uUejNFot6MQCCIUjS+3mBhYSE333wzBw4cIDs7u9P7L730EgUFBSxevBiA/Px8tm/fznPPPceMGTOoqKjgvffe46mnnmLq1KkAPPTQQ8ydO5ctW7YwadIkXw9ZIOiRxhYb1Q2tQPAr/x5Ofo6Jb346JOJmBAJB1OJzy8zatWvJz8/ngw8+IDc3t9P7GzduZMaMGR1eO+aYY9
i0aROyLLNp0ybvawpDhw4lIyODDRs2+Hq4AkGvUIJ/M5JjiDH4/BlgQAz3BAGL4nkCgSBa8flV+eKLLz7i++Xl5WRmZnZ4LT09HYvFQl1dHRUVFSQlJaHX6zstU15ePqCxaTS+1W5qtarD70gmmuYKned70FOUbliWyefn0UDJSY8j1qChudXBoZoWb7p2b4mmYxtNc4Xomq+Ya3TTJzFTUlLCSSed1O37a9asITk5+YjbaG1tRafr2NNG+d9ms2GxWDq9D6DX67Fa+5+toVJJJCXF9nv9I2EyGf2y3VAkmuYKbfMtqW4GYHR+qt/Oo4FQMCSZzTsrKauzMGVMVr+2EU3HNprmCtE1XzHX6KRPYiYjI4MPP/yw2/cTEhJ63IZer8dms3V4TfnfaDRiMBg6vQ/uDCejsf8HzuWSMZtb+r1+V6jVKkwmI2azBWeEm/ejaa7Qeb67D9QBkJmop66uOcij68yQ9Dg276xk6+4qZo7J6NO60XRso2muEF3zFXONTEwmY68sUH0SM1qtlvz8/H4PCiArK4vKysoOr1VWVhITE0N8fDyZmZnU19djs9k6WGgqKyvJyOjbRfpwHA7/HHSn0+W3bYca0TRXcM+3pt5CbaMVCchJjQ3J+Q/11L7ZU1Lf7/FF07GNprlCdM1XzDU6CbjDberUqaxfv77Da2vXrmXy5MmoVCqmTJmCy+XyBgID7Nu3j4qKCqZNmxbo4QoE3pTsrNRYDLrQCv5VGNq+eF5zZ8umQCAQRDIBFzMLFixg69atLF++nMLCQl544QU+/vhjrrjiCsDtyjrzzDO55ZZbWLduHVu3buW6665j+vTpTJw4MdDDFQhCtr5Me2IMGrJT3bE8IkVbIBBEGwEXMyNGjOCJJ57g66+/5txzz+XNN99k2bJlHdK1ly5dyowZM7jmmmu4/PLLGTZsGI8++mighyoQAO06ZYewmAF3vRmAQlE8TyAQRBl+tZmvXLmyy9dnz57N7Nmzu10vJiaGu+66i7vuustfQxMIeoUsy22WmRDqydQV+dkJfPNTGYWirYFAIIgyRJK6QHAE6hqtNDTbUEkSeelxwR7OEVE6aO8vM4vieQKBIKoQYkYgOAL7PC6m7NRY9Fp1kEdzZDJTYojRa7A5XJRUNQV7OAKBQBAwhJgRCI7AvjLFxRTa8TIAKknyVv8VriaBQBBNCDEjEByBfYfcomBoiAf/KiiuJhEELBAIogkhZgSCbpBl2etmCvXgXwVvRpNIzxYIBFGEEDMCQTdU1llosthRqyRy00I7+FdhmEd0VdW3YhbF8wQCn2FuttHcag/2MATdEJrlTAWCEGBvcT0AuWlxaEOsU3Z3xBi0ZKXEUFbTwoGKRsYNSwn2kASCsKeu0cpNT6/BJctMHZXOCZNyGJ6TgCRJwR6awIMQMwNgy/uriKv4kZjYGOLjY5G0eiSNDjTtfmuV//Wg0R3hfZ37fSk8bprRwJ5id3PJcAj+bU9aopGymhbqGvvfZT5YWKwOXv1sN80WO3qdGr3W86NTo/P8bdCp0WlVbe+1W0b5W6tVoRI3GoGPOFDRiM3TA2ntLxWs/aWC3LRYTpiUwzFjMjHqxa002IgjMABiStaTKleBDZx1PtqoWttB+Kgy8tGNm4s6Jc9HOxD0lr0l9UBbE8dwISleDxCWYubHPdX8sK3cJ9vqIHh0bUIoPlaPWgKDTo1RryFGr8HY7idGr8GgV3tf1+vUQhhFOfWe79KQzHhy0+NYv72CkqpmVn66mze+KmTG6AzmTMphUEZ4PfhEEkLMDID0827ilw0b+GV3GS57Kzqc6FQO8pJ0DEnTkxIrgcMGDhuyw3rYbxvYre7fznaxDU47stMOVpABV0M5jt3fo84di27CGaizjxKmzQAgy7LXzRTqbQwOJylOETOtQR5J36n1jHl4bgJTC9Kx2p3Y7E6sNidWu+fH+7fL/V67123tOgjb7C5sdheNDCzOQQIMeg0xenUn0WM8TPgY9R
pyUmPFTa0Lymtb2LSrkpOn5KHXhXbNpsNRHgyGZJn43WkFXHDicH74uZyvfiylrKaFr348xFc/HiI/x8QJk3KYNiodrSa85hjuCDEzAJLSUjn74vM4uqqR9dsr+PrHUnYerAcLcAhSTAZmT8zmuPFZJHpuMF0hyy5w2D1CxyNwHDbk1kbsu77DsW8DzpJtWEq2oUoZjG7CXDTDpiGpxOHzF5V1FppbHWjVKm8Dx3ChzTITfgHAtZ6bxqhBSZw6re/WSJcsewSOyy2EbE5aPWLHZnNid8moNWqq65ppttixWJ1YrA7vT4v3b/frTpeMDN73oWdrlyTB3X84hszkmD6PP5J555siNu6sJD5Gx+wJ2cEeTp+oa3If96Q4HQCxBi2nTMvj5Km57DpYz/+2lLJ5dxWFpWYKS828/vkeZo3PYs6kHDKSxHkQCMTd0AdoNSqOHp3B0aMzKKtp5usfD/H9z2XUmFt595si/vvdPiaOSGXOxByOGpLUyWQtSSrQ6pG0nQWPJm88LnMltp8/xb7rG1w1B2j98mmk9W+hG3cq2oLZSDpjoKYaNSgp2YMy49CowyuOKZzdTHVmz00jvnvxfyRUkoRBp8Gg6/p9jUZFUlIsdXXNOBxHbvkgyzI2h6ud2HF2IXo6CqC9pQ2Ym23sPFgnxMxhVNVbOvwOJxQ3U+Jh56UkSYwanMSowUk0NFn5ZmsZ3/xYSo3Zyifri/lkfTFjhiQxZ1IuE0ekoFaF17UknBBixsdkpcRy4UkjmD97GBt3VfLVj4fYW9LApl1VbNpVRXqikeMnZjNzfBammG6uuIehMqVjmHkJ+innYtv+JfZfPkduqsG65nWsm/6DbvQJaMeegiom0b+TiyIUMRNu8TLQdsENRzeT9wm4n2LGl0iS5I25OZJltT1vf13I6jUHKDpkZs7EHD+PMLxQBEFYiuxenJcJcXrOPnYIZx4zmK1FNXy1pZSfC2v4ZX8dv+yvIzHObZE6fmJOSJzfvmTNtnKsDmdQz3khZvyETqvm2LFZHDs2i5KqJr7ecogffimjst7Cm18V8u63RUwemcYJk3IYmZfYqzgYyRCHfvI8dOPnYt/zA7atHyM3lGP7cTW2rZ+gHTED7fi5qJPERXSgeCv/hqGYSfZcKJtbHdjsTnQh3lOqPcqNLqmX4iHUUNpJFB0S7STa43C6vHWPwlHM1PfhvFSpJCYOT2Xi8FSq6i1889Mhvv3pEPVNNv77/X4++OEAE0ekcsKkri314UZji43nVm9HQmL2hOygzUeImQCQmxbHxaeO5Lw5+azfUcFXP5ayr6yR9TsqWb+jkqyUGI6fmMOxYzOJM2p73J6k0aE7ag7aUbNxHPgR+08f4azYg33Xt9h3fYt60AR0409HnVUggoX7gUuW2V/u7skUjmLGqNeg06qw2V3UNVnDxmfvcLpo9NzwkkzhKmbc7STKqpuxWB0iZdeDudmG7Pk73MSMze6kudUB9N1imJZo5NfH53POrKFs2lXF/7aUsru4ns27q9i8u4r0JCNzJuYwa3xWr679ociekgZkGTJTjEEVZuKbFkD0OjXHTcjmuAnZHChv5KsfS1n7SwVlNS38+4s9vP11IdNGpTNnYg75OaYehYgkqdAOmYx2yGScFXux/fQRjv2bcR78CcvBn1ClDUU34XQ0Q6YgqcLn6TzYVNS20GpzotepyUqNQT5yaEXIIUkSSXF6Kuos1DeGj5ipb7IiAxq1RHyYXtgTYnWkmAzUmFvZV2Zm9JDkYA8pJGgvYOoarciyHDYPWoqLSadV9VucatRtcZWlVU18pVjq6yy88b+9vPNNEdNGpfPr44eRbDL4cvh+Z7cn63NkXmJQxyHETJAYnBnPwrmjOP+E4azdXsHXW0o5WNnED9vK+WFbOblpsV5rTW++QOqM4RhPvRZXfTm2nz/Bvvs7XFX7aP38CaT4NHTjTkNbcFyXQcaCjpRWNQMwJNOEWqXC4QozNYP7CbKizhJWT8
H1nuyrxDh92NzoumJYtokacytFh4SYUWh/HlrtTixWJzGG8Lj9tHcx+eK8zGlnqV+3o4L/bS7lQEUja34pR62S+P2ZRw14H4Fkj6ce18jcxKCOQ4RWBxmjXsMJk3K47bJp3PK7qcwal4VOo6KkqplXP9vNQ2/82KftqRIzMRy3kNiLHkQ3+RwkfRxyYxXWH16h6bXrsG58B5dF+POPREVdCwBZaeGVkt2ecMxoUmrMJId5cKSIm+mMYt3o7v9QxhvH5ePzUq9TM3tCNrdeOpU/nDUagB0HfFV9NTC02hwcKG8ChGVG4EGSJIZlmxiWbeLCk4bz/bZyXv98D4WlZhpbbMT3MvNJQWU0oZ/6K3QTz8C+6zt3sHBjFbbN/8X200doR85EN24uqsRMP80ofFFSR7NSwlnMuE3V4SRm6rpJfw03vGKmzBxW7hR/Un/YeVjX2EpOmNRvUoSXv85LSZKYNDIVlSRRY26lusFCakJ4lNsoPGTGJcukmPSkJATXPSYsMyFIjEHLKVPzyEpxxzrsKWno97YkjR7dmJOIveB+DCdfjSptGDjt2Hd8RfMbN2H59DGclUW+GnpEUFnnETNhcrHtCq9lJgyfgJPjwytm4HAGZ8SjVkmYm23UmMMvPd4fdLLMhKHI9meGnUGn8faA23Ww3m/78TV7PPEyI4JslQEhZkIaxWynBFgNBEmlQjtsGjHn/gPj2TehHjQRkHHs30TLe3fS8sH9OEp+QZblnjYV8VTWh7+YSYwLPzdTpFhmdFo1uWlxgHA1KSiWGb2nTEA4nZfdFczzNQWe6/0uH1zvA4U3+DfI8TIgxExIo5wgSoCVL5AkCU1WATFz/0rMb+5GM3ImSGqch3Zg+XAZLe/egb1oA3IYBr36ApvdSa2nCm04u5mSTWEoZpoUy0x4ixmAYTkibqY9de0aNUJnt1Mo09bKwM9iZlAiALvDxDLjcLoo9JzfwjIjOCIj8tw1Kw6UN9Fqc/h8++qkHIxz/kDshfejHXsKqHW4qvfT+vnjNL/5d2w7v3Y3vYwiqhrcbgGjXo0ptm9xSqGEYplpaLLhcoWHtU1pZRDulhmAYVltcTPRjizLXkGgxBOFk8iu91MA8OGMyE1EktyW4XD4fPaXN2J3uIgzaslOCX75ByFmQpjUBCPJJj0uWfbrE54qPhXDsRcTe9FydJPngT4WuaEc6zf/ovnfN7iDh23h10+lP1R54mXSk2LCOnAzIVaHSpJwyTINzaHfcNIly9RHkmXGc9M+UN6IwxmdVk4Fi9WBze7+DJQilOFwswblvPQUcvTzeWnUaxicocTNhH5WkzdeJjchJK6VQsyEOIqryRdxMz3hzoCaT9xvl6M/5gKkmETk5jqsa/9Nw8rrqP369YhP6670pGVnJIVHNkF3qFQSCZ4Ov/VhEATc2GLH6ZKRJMLaIqaQkRxDjF6D3eGipKop2MMJKopwiTVoSPd8r8IlML2x2eY+LwnMeam4mnaGgaspVIrlKQgxE+IovsiBZDT1FUlnRDf+dGJ/uwz97MuQEjKQrc3Uf/cWDSuvo/WHV3E11QRsPIFECf4Nl6q5XSHLLmSXK6xqzShNMU2xurDrUt4VKkliqKg3A3RMbVaq2za22LH30LU8FFDGHqjzsiAvCQj9IGCXLLO31H1PChUxI+rMhDgjc91xM4WHGnA4XQG90EtqLbpRx6MdeRxy8WbsP32ErbwQ+7bPsP/yJZoRx6CbcEZENbas9LqZQs8yIzvtyC0NyC31uCwNnr89/7c0IFvcf8stZiR9DOO1J1BEYpiImchxMSkMyzLxy75aig6ZOXFysEcTPNqnNscaNGjUKhxOF/VNVtISQ+971p5AZ9iNzEtAwt1Spb7J2utu7YHmUFUzza0O9Fo1gzLigj0cQIiZkCcrNZZYg4bmVgcHKhrJ9zSyCySSSoU2fzrpU+ZQ9fN6LJvex3loB47d3+PY/T2awZPQTTwTdcbwgI/N1yhiJiM5MJYZWZ
bB1uIWIy31XkHiaidUZEsDrpYGsDb3frutjcxp/S8NhinUmQf5cQa+wXvTCNGLd38QlYDdtE9tliSJ5Hi9N8g11MVMX7pl+4IYg5a89DgOVjaxu7ie6UdlBGS/fUWxHOXnuFu+hAJCzIQ4KkliRG4iP+6tZk9xQ1DEjIIkSWjzxiJljcZZWYTtx9U49m/GcWALjgNbUGeNcoua3LEhERDWVxxOl7fImb8sM7Is4yjagP2Xz3E11SBbGsDZh0w1lQYpJgEpJgGVMQEpJtHzf6Lnf/ePbcsH2Hf8j3NjNlFYYkd2LUJShe7XPVIK5rVHcTOV17bQ3Gon1hCezTMHSp0SQOsRBIntxEyo403LDqDFcOSgRA5WNrHrYOiKmVDpx9Se0L26CbyMzHOLmd3F9cw9OjSestXpwzCeei3OukPubt17fsBZthNL2U5UKYPRTTwTzdCpSCGi2ntDrbkVp0tGq1H5xazsqi+j9ftXcJb+0vlNXQwqrzBJQDImtPs/0fN/AuhjeyUU9bN+x8HWWDKKPiDfshXLRw9hPPlqJH1o1s5pM+eHf/CvgilGR1qigap6dwftsUNTgj2koHB4anNyWMVyBb5cwKhBSXy+sSRk42ZkWQ654F8QYiYsUOrN7CmpxyXLqELI6qFOysY453JcU8/FtvUT7Du/wlVzgNYvnkBKyMAwcwGa3LHBHmav8MbLJBp9+hnLDiu2LR9g++lDcDlBrUE34Uw0eeO8wkXS+PYmLkkScsFJPLe1mUvjv0VXup2W/9yFce5iVKZ0n+7LF0SiZQZgWHYCVfXuDtrRKmYOFwSJYSRm6oMQy6UIhEPVzZibbSGX3VfV0Ep9kw21SvK6UkOB8HlsjmIGZ8Sj06pobnVQVt37uIlAoopLwXDsRcRd9BC6Ked6atVUYPlwOa1fP4/ch3iPYKFkMvnSj+84sIXmN2/GtuV9cDlR540n9jf3oJ/6K9QZw1HFp/lcyCgkxuv5xZ7H481nIMUm4aovo+XdO3GU7/bL/gaCvzoTBxsRN9O5gm5bll3o961SXGSBtMzEGbXkprktqIEoydFXlPoyQ7Li0XnaU4QCQsyEARq1yhsrszuAKdr9QTLEoZ9yLnG/Xe6uKoyEfde3NL95M479W4I9vCPiy0wmV2MVlk8ewfLJI8iN1UixyRhOuTaglhHl5rHfmoh0+t9RpQ5BtjZh+eAB7Ht+CMgYeoMsy1EhZqKx75nD6aKxuWPROeW8DIdaM4FoMtkV3hTtEKw3E0r9mNojxEyYMMKTor0nBJV6V0g6I4ZjL8Y47yakhEzklnosnz6C5YuncLU2Bnt4XeILMSM77Vi3vE/zGzfjOLAFJDW6CWcQe/69aIdOCWhgtE6rJtbg9iTXO43EnH0TmiFTwOWg9X/PYN34bkjcYC1WJ1a7E4iMVgbtGZQej0Yt0WSxU1UfHVW029PQZEMG1CqJuBh3AHRSmPQNs9qcWKzu4PxAi2yleN6u4tCrBLw7hDplt0eImTDB20Hbh00nA4EmcySxv74T3YQzQJJwFK6l5Y2/Yy9cHxI30vYobqb+ihlHyS80v/UPbBveBqcNddYoYs67E/3R5yNpg3OT9pr0m6xIWj2GU652HwvAtvk/tH75FLIjuO0OFHdDrEHj7aocKWg1KvLS3SXqo9HV5C2YF6f3xqElhUnfMGXsep0aoz6w4aXK9b6kqpkmS+j0x2toslJRZ0Gi7QE7VBBiJkzIz05ArZKoNVupbgivJzxJo0N/9PnEnPMPVEm5yK2NtH7xBK2frcDVUh/s4QHuipaV7foy9Wnd5josXzyJ5cNlyA3lSEYThhOuxHjWkqAXFPQGW3qaOEqSCv3R56OffRlIahyF62j8z304m4PnvgxG+qu/kF1O5MNS7aM5bqarJo0JcTokCZwuGXNL6PYNC5aLCdwVh7M8zRtDKW5GqUSfkxYXcqUG/Co3b731Vmw2G/fdd1+H1y+77DJ++KGjz3769OmsXLkSAK
vVyn333cfHH39Ma2srJ554IjfffDPJycn+HG5Io9epGZQRz74yM3uKG0hNCO1iU12hTh9GzPzbsW15H9uWD3Ds34SjbCeGGRehGXFsUGvT1DdacThdqFUSKabeXbxklxP7L59j3fgu2FtBktCOPgn91F+FTAp0cnzX8Qm6Ucejik/D8tkKnBV7Kf3XjcSc/lcwZQd8jOHULVt22pGbanE11eBqrEJuqsHVWN32u7kOVGo0gyeiGX4MmrzxDMs28cWm6Oyg3VVqs1qlIiFWR32TjbrG0K1yG6hu2d1RMCiJspoWdh2sZ/LItKCM4XDaUrJDyyoDfhIzLpeLf/7zn6xatYpf/epXnd7ftWsXt99+OyeffLL3Na22TeXdfvvtbNy4kcceewydTsdtt93GokWLeOWVV/wx3LBhZF4C+8rM7C6pZ8bYzGAPp19Iag36qb9CM3QqrV8/h6v6AK1fPYu6cC2G4y5FFRec9FXFKpOSYOhVRUtn+R5av3sZV20xAKr0YRhmLUSdOtiv4+wryo2iq/gETc5oYs/9B5aPH8bRUIn5naUYT7464Kn0dSHULVt2WHE11iA3VXf47WqqRm6sRm5pAHpwjThdOIo24CjaALoYRuVMYrgmlgMVYHe40GqixyB+eCaTQlK8wStmhmYFY2Q9095FFgxGDUrkqy2lIRU3o4Q5hFJ9GQWfi5nCwkJuvvlmDhw4QHZ256e8mpoaampqmDBhAmlpndVmRUUF7733Hk899RRTp04F4KGHHmLu3Lls2bKFSZMm+XrIYcPI3EQ+WV8cUmbH/qJOySPm3Fuxbf0I26b3cBb/TPObN6M/+gK0Rx2PJAX2gt/beBmXxYxt/ZvYd33rfkEfi376b9COmh3wMfeGnppNqhKziP/1bVg/X0Fr8Q4sHz2EfuYl6EafGLAxtmUy+b/GjGy3YqusxlZajKO+yitSXE01brHSm+B0tQ5VfApSfCqqOOV329+ypQH73rU49q5FbqlHs+97rjVBvSuG2q9KSJ14PKqUQWFZJbuvdGfdSIrXs68stIOAg51hV+ARDMUVTSFRQdpidVBc6e4APyLEMpnAD2Jm7dq15Ofn8/jjj/PXv/610/u7du1CkiSGDh3a5fqbNm0C4JhjjvG+NnToUDIyMtiwYcOAxIzGx09Eak/TR3WAmj+OGuJO1yuracFicxAfE7hiSv6Zqwrt1HkY8qfS/OVzOCv2Yv3uJZz71hMz5/eoEwJXyru6wR2Empkcg0aj6jRfWXZh2/41lrVveGvm6I46HuMx56MyxgdsnH0l1VMzp77J2u35r9YnkHzRbRx6bwXWXd9h/e5lMFdgPPa3AangXO95Ak5JMPjkOyrbrTgbKnB5fpztfsvNddT3tAGtAXV8Gqr4FFQmj1BRfsenIhnjjyxEElLQZw5DPvZCHGU7se1eQ+POtSSqWqDoS1qKvkSVlINuxAx0I2egNvnPhRDoa9ThdHdsUxLcwrWh2eaz67Kv59rQ7Nvzsq+kJBrJTI6hvLaFokNmJrVzNQXjuO47YEaW3UVF00KwEa/PxczFF198xPd3795NfHw8d955J99//z0xMTHMnTuXq666Cp1OR0VFBUlJSej1HdVweno65eXl/R6XSiWRlOSfOAaTKTAHNikplryMeIorGimtbWVGTlJA9tsev8w1aQQpQ+7BvPEjar96DUfpDhpX3UzyCRdjmno6ksr/GS5KcawhOYkdzhOTyYi1rIjqj5/BemgPALr0IaSe/gcMuaP8Pq6BMjjHHYxa32Tr8fzP/vVfqf8+j7qvX8e69RPULdWk/2oxKp1/z29ziztbY1B2Qq+/oy67FXttGfa6Mhy15Z6/3b+dTbVHXFdliEWTkI4mMR1NQhrahDQ0CWnu1xLSUBl61zKiV6RMg7HT+PdHJ/Dz118xN6OMXGsRrrpSWte/Rev6t9DnFhA35jjijjoWdax/YhECdY06nIZujm22J8Or2erw+XXZV3NtbHF/d/
KyTH67d/TEhJFplK89wL6KJk48ekin9wN5XA9UHgBg7PDUoH0eR6JPYqakpISTTjqp2/fXrFnTY5Du7t27sVqtjB8/nssuu4wdO3bwwAMPcOjQIR544AEsFgs6XWeLg16vx2rtv0nS5ZIxm1v6vX5XqNUqTCYjZrMFp9Pl0213x/AcE8UVjWzeUc6o3MCVkg7IXEecQHz6GFq+eh5H6Q5qPvsX9Vu/JfbEK1An+TcwtaTS7V6I16upq2tGrVYRq3NR8dlKWn/+HGQZtAaM03+NftzJWFRqLHWhX9VY44nvMDfbqKxq7DJeQzm2jY2tMOZ0YvXJNH/xDC17N1H8r78Td8Z1qOL8F3xf5YlX0kpQ1+4z7cnCciQkfSyqhAzUCZmoEtJRJWaiTshAm5xFYnpah/PYBdg8P7QCrb69TgBkpSXwqn0QFc0FPHDFYmxFG7Ht+QFHyQ6sJbuwluyi5tMX0OSNQzdyBrqhk5G0A3e7BeMapSDLMjUe960GucOxNWrcYrG8urnD6wPB13OtqnOfB4efl4FkaEYcAD/truowhmAc1592V7nHlBkX0M/DZDL2ygLVJzGTkZHBhx9+2O37CQk9P1XceeedLFmyxLvsyJEj0Wq1LF68mBtuuAGDwYDN1jldz2q1YjQOTIU6HP456E6ny2/bPpzhOQn8b3Mpuw7WBWyf7fH7XGNTMZxxA/adX2Nd+2+cFXsxv3ELusnnopvgHyuNLMtU1LovXCkGF9byIqjZT93G93A21wOgyT8G/TEXoIpNwukCXIH/7PuDXqNCq1Fhd7iorrccsVWDcmxVQ6YRc1YSlk8fxVl9EPNbt2M87a+o04b4fHw2uwOp1cwITT3xB7+laXsFrrpDuBrKkXtK29fHojJluMWKKQNVgufHlIFkiOtyFdkj5gL5nQUY7LkpVdRZaLCpiRsxC+OIWbha6nHsXYe9cC2uqn04Dv6E4+BPtGh0aIZMRjv8GHcX+gF2PA/0fAGaW+3YPPuMN2o77N/kcZHXNlp9Pi5fzNXlkqn3WGtNMbqgXGvBfb0H2F9uprHZ1qneTaCOq93hpOiQOy17eHZC0D6PI9Gnb4hWqyU/P39gO9RoOomeESNGAFBeXk5mZib19fXYbLYOFprKykoyMkKzHXogUUpIHyhvotXmwKCLvF6hkiShO2oOmrxxtH77Es7irdg2vIVj3wYMx1+OOqX/ncPd2SrVyI1VuMzVuBqrsNVXcq1hPykxTRg/eJn2z+WqxCz0MxegyRk98IkFAUmSSIrTU1lvoa7R2uu+U+qM4cSc+w8sH/8TV10pLe/fg+GEP6IdOqVf45BlGbm51i1U6g7hqnf/dtSWcleS+xOX10On8mD9ECyhSKxBS0ZyDBWe+Ifx+e6sPVVMIrrxp6Ebfxqu+jLse9di37sW2VyBwxNELOnj0ORPRzN8BuqM4WETOKwE0MYaNJ16+LRVAW5FluWQm5O5xYZLlpEkMMUGL/A22WTwdl7fU9LgPW8Czb6yRhxOGVOsziftXvxBwO+ECxYsIDc3l3vvvdf72s8//4xWq2XIkCGkpaXhcrnYtGkTM2bMAGDfvn1UVFQwbdq0QA835EhJMJBi0lNjtlJ4yMyYIZFbe0cVl4Jx7mIce36gdc1ruKoP0PLOHegmnYVu0tlI6s6nr+xyuOuANLqFimyuavu7sQrZ0rnWhwTkttuUZDShMqWRMOZYXCNOwCmHXpZSX0iKbxMzfUEVn0bMOTdj+fwJnCXbaP1sBfLRv0E7/vRubz6y7EJurMFVX4qr7hDOduIFe+fGghLgkqEeE+lD8lElZaNKzEaVmBl2gqUnhmWZPGKm65uSKjEL/dRfoZtyLq6qfdj3rsFRuA7ZYsa+/Uvs279Eik9FN+YUtONOCcnsufYcqU6Lkqpts7uwWB3EhFgBNuW7khCr61WpBn9SMCiJqvoydhXXBU3MtPVjSgg54akQcDFz2mmncc899zB+/HhmzZrFzz//zAMPPM
Dll19OXFwccXFxnHnmmdxyyy3cc889GI1GbrvtNqZPn87EiRMDPdyQZEReIjW/VLCnuD6ixQy4LQvakTNR547B+t1KHPs3Ydv8Hxz7NqIddypySz0us8fS0ljljqWQezCBao3u7JT4NKT4NA406fhgazOm9CyuuHA2klaPRqMiMSnW7RsOQZNqX+gpPftISLoYjHMXY/3hVezbv8S67g1cDeXoj70EuakWp0e0tFlbysDZTVVXSeW2qiRmu0VLUg7banU89WU1+YNSueG0yQOZZsgzLNvEml/Ke6wELEkS6vRhqNOHIR9zIc5DO9zCZt8m5MZqrGtfx1m+G8OcK5D8HJw9ELoqmKeg9A1rbnVQ22gNWTETClWpC/IS+W5rWVCbTir1ZUKtH1N7Ai5mLrnkEiRJYuXKldxzzz2kpaVx6aWXcuWVV3qXWbp0Kffccw/XXHMNALNnz+aWW24J9FBDlpG5iaz9pSIi6s30FlVMIoZTrsGxbwPW71biqivF+s2/ul5YrUEVl4pkSkMVn+ZOp41v+xt9x2yVHd8W8Yt9P7NTs4LWQ8mfJA5AzABIKjX6mQtQJWZhXfMa9p3fYN/5Ld0Wj1Np3JaVRLdgcQuXbLel5TBrWvnaA9ipD4mbhr9R2hrsKzP32rUiqdRocseiyR2LPGsh9l3fYF3zbxz7N9HynzKMpy5ClRCaBTS7K5inkBSvp7nVQX2jldy00LLAeYVYCFQnVppO7i9rDEpogcsls9fTxiDUOmW3x6+fitKe4HAuvvjiI6Zwx8TEcNddd3HXXXf5a2hhjaKOiw6ZcThdaIJUQyLQSJKEdth01NlHYdv4Lq76MiSlBojHyqKKT0WKSeiTCb6tYF7fejKFC0ndtDToC5IkoRt7CipTGpYvnnK7jNQ6VIlZbWIlKRt1Yg6SKa3XgdqBLJgXbPLS49CoVTS3Oqios5CZ3LfzTdLo0I05GXXqECyfrcBVd4jmd+/AeOKf0Aya4KdR95+e2gEkxRsoqWqmNgQL59WHUL+w1AQjKSYDNeZW9pY2MHZoYF1NxZVNtNqcGPVq8tJDS3S2J/KiR6OA7JQY4oxamix2DpQ3kp8Ten0y/InKEI9h1u98tj0lNTi9l8Gx4YbyZFzvg5uGZtBE4i55BNliRopPGXDcRiiZ8/2NRq1icGYchaVmig419FnMKKgzhhMz/3ZaP3scZ8UeLB//E920+egmnhVS8QxHcjMBJMW7Ezx8cV76mlA7LwsGJfLDtnJ2HawPuJhRPADDcxJRqULn/Dqc6HikjzAkSfK2X1d8mYL+U1HXu1YG4UpbzEznANz+IGn1qExpPglAjaSO2b1hWJb7ezvQDtqqmESMZy1Be9QJgIxtw9vuAG2bxQej9A09u5nc1rhQtMyEkpsJ2lob7ApCaEFbP6bQfmgWYiZMUXpj7CluCO5AwpyWVgdNFndCcG/TlsMNRSjUN7nTTUOJUHsC9jdK3MxAxQy4m7YajluIfvZloNJ44miW4mqoGPC2fUHPbiblvAw9MRNKbiZoi5vZd8iM1e4M2H5lWWaPR0CFYj+m9ggxE6YoXUv3lNSH3A0qnKjyxMuYYrSdClJFCglxOiQJnC6ZxpZOlVyChtPlosFTmCxUbhr+Jt8jZoorm7A7fHNT0o06npizb0SKSfTG0TgObvXJtvuLw+nytqno3s3kfr3WHHpiJtREdlqikaR4PU6XTFFp4B5gK+osmFvsaNQqhmYFruJ8fxBiJkwZlBGHTusOJjxUHfpl9UOVCk/J8kgN/gVQq1SYYt3xCb5yNfkCc7MdlyyjVkneirCRTkqCAVOMFqdL5kBFk8+2q8TRqDKGg60Fy8cPY93yAXKQHnQUy4ZGLRFv7Drt2hvLFWKWGYvVQavNLTRDxc0kSZLXOrMzgCnaSrzMsKz4LluhhBKhPTpBt2jUKvKz3T7MPVGUou1rquojO15GIXmA6dn+oNYjrBLidCEdWOhLJEliWLZv4mYORxWTSM
xZN6I9ag7uOJq3aP50Ba4gxNHUN7otbolx+m6DkpUqwE0WO7YAuk56QhFXBp06pKy1wYib8RbL8wipUEaImTBGcTXtLhFxM/2lIsIzmRQSfZjR5Ct6iqmIVIZ642Z8/711x9Fciv64S0Glxl64gdIX/44zwHE0SvBvdy4mgBi9Bp3naT+UrDOh5mJSKBiUBLhFsM1HLsqeaKv8mxiQ/Q0EIWbCmJFKRlNxfdDMyeFOZYRnMil44xNCSMwoY+ku2yVS8WUQcHfojppDzNk3IcUkYq86SONbt+EoDlwcTV0vjq0kSQOqTu0vQlXMZCQZSYjV4XC6KCr137mjUNdopbqhFUkiLMp/CDETxgzLSUCtkqhrtFLTEDqxEOGE4mZKixIxE5qWmcgvmNeeoZkmJKC6oRVzczetH3yAOmM4pt/cgT6nANnaguWjh7H+GJg4mt5a3UJRzNT3kFIeLDrEzRyo8/v+FKvMoPT4kHK3dYcQM2GMXqtmcGY8IOrN9Aeb3em9iGZEcAAwhKZlJlSfgP1NjEFDZor7fPOndQZAFZtE9iV3oBvtqUez/i1av3gCuYumn77E62Y6TBAcLqRCUcz0VOwvmChxMzsPBkDMePsxhb5VBoSYCXsUX+ZuUW+mzyhWmRi9hlhD6D95DIRQzByJVjED7VxNZf7/3koaLbFzLvPG0TiKNtDyn7twmSv9ts+6RisqXGRRhe3nT7F8/gRNr15H04t/xrrlfWSXu3nrQPuG+YNQPi9HeuJm9pY0+Cy1vzv2hFG8DIh2BmHPiLwEPl7vrjcj6BtKvExakjGkysD7gyST25UjbhqhwbDsBL7/uecO2r5Ed9Qc1Ek57r5OtSVtfZ3yxvlk+3JrE86KvTgr9nJW0yaykirR/eTk8DPOtuFtnKXbMZxwJcnxoXdehqqbCdytbOJjtDS22Nl9sJ7sJP+4aJssdkqq3CU/RoZwp+z2CDET5ihVGctqWjC32KKmXocvUDKZMiI8XgbaLsytNicWqyPoPnBZltsCgKNRzGS1ddB2yTKqAIlpdeYIYubf7hY0lYVYPn4I3bTz0E04o0+CXpZduOrLcVbsweURMK76Mu/7gz2bknUxaDKGo84cgTpjOC5zJdYfXsV5aActb/2DnBG/AQbWBNXXhLKbSZIkCvIS2birim1F1WRPyfXLfpQu2ZnJMd4aVaGOEDNhTpxRS05qLKXVzewpbmBKQVqwhxQ2REuNGQC9p2aGxeqgrtEadDHT3OrA4fS4GkLwCdjf5KbHotOosFidlNe0kJ0aG7B9q2KTiDn7Rqzfr8S+8xts69/EVX0Aw/GXI2m7Phay3YqzqshreXFW7AVr52KdqsQsXCnDeP0nF/scadyx+Gx02nZF87KPQpM5EssXT+GqOUDOtn8xP2YU3zbO8Nd0+4TT5aKhObSrUhcMSnKLmcIaTvWTmAmXfkztEWImAhiRl+gWMyX1Qsz0gUpP9d9I7cl0OMnxekqtDuqarAG9eXZFrdkdgBofow35yqL+QK1SMSQznt0lDRQdMgf8eEhqLYbZv0eVOhTrD6/gKFpPS/0hjKcuQmVKx9VU4xYt5XvcVpeagyC7DpuEDnX6UNQZbquLOmM4kiGOksom1m5YT5xR21HIeFAlZhFz7i1Y17+F/edPON6wk+GOCuy1w9Em5wToE+gac7MdWQaVFPiq1LIs98o65g0C3l/rfSDwNb3pxyTLMrK5Emf5bpwVe1ElZqEbP9cv4+kNQsxEACNzE/hqS6k3lU7QOyrrFTdTZGcyKSTG6ymtbqYuBHrhhFojv2AwLDvBLWbKzMwanxWUMehGn4AqOZfWzx5zx9G8czuS1oDcXNtpWSk22S1aPC4jVUoekqrzLaS7TKYO21JrMcz4Larso6j+6ClyNHVY3r0DeebFaAtmBy2GTXEx+bsqtSzLyA3lHitXIc7KvbjqDiHFp6FOG4I6bSiqtKGoUwcjaTvGxWSnxRJr1NJssbO/rJEhnoxWX2
G1O9lf3gh0jJeRXQ5cNcVu8VK+B2f5bmRLu5gvjQ7tuFORpOA8nAgxEwEoJ9zBiiZabQ4MOnFYe8LhdFHtqc0TLZYZbxpsCMQnRGvBvPZ4M5oC2DiwKzSZI4iZfweWzx7DVVmEbGsBSYUqdbDX4qLOGI4qLqVX2+tLYLdu8ESedv6as51fUEA51m/+hbPkFwzHLUTSB9566K+gdNlmwVm1r81NV1nYpZtONlfgMFfgKFznfkGSUCVmo/IIHHXaUFTJeYwalMimXVXsPFjnczFTdMiM0yWTES+R2FiIda9buDgrC8FxWF0klcY9rswRaPKPDpqQASFmIoJkk4EUk4EacyuFpWbGDE0O9pBCnpqGVmQZdBoViXHhEeA2UJJCqKWBt6iaKboK5rVHETMlVc1Y7U70WnXQxuKOo7kJR/HPSDoj6rRh3cbP9ERbwbzefa+08Uk8eegUbp5STdr+T3EUrae5shDjiX9CnTmiX2PoL77IZPJaXSoLvZYXV10JHF6sUK1BnToUVUa+WzCmDMJlrsRZtQ9X1T6cVfuRm2tx1ZXiqivFsft793qSml8bM8iPicW1+xDOYcehSs7p0krWF1zNdTjL9yBv2cj/mfaSo6mj9aPDxqyP9VjnRrp/UgcjaULj+inETIQwMi+BNb+0sru4XoiZXlDZLvg30tOyFUKpQFmbZSY0LoTBICleT0KcjoYmGwfKG4OeAiuptWiHTB7wdnrjZmpPUryeIiSKko5l8IRpWL54Ermxipb370U35Vx0E89CUgXmib8/mUy9tbpIcSluIZCe73HTDUJSd7wFq0zpaHLHev93tdR7hY0icuTWRmJbDjHTALTsoeWdz0GtRZUyqM16kzYUVWJmt5YSdzZamddd5Czfg9xYBUAeeJWBFJ/WQbyokrKCan05EkLMRAgj8hJZ80uFqDfTS7w1ZqLExQShJWbqorSVQXskSWJYlokte6opOmQOupjxFX111ShWkLpGK+r04cT++k5av30JR+FabBvfwXloB4YTrkQVm+S3MSv0NHa31aUCZ+Xevlld0vP7NX5VTCKqwZPQDJ7Utv+mGuSa/Xy0+huyqGS4sR6VoxVXZSGuykLsyspaA+rUwe7Ym7ShSMYEt7XIE7DbSXBJElJyHt+WxbLHlsavzzuVnEH+yZbyB0LMRAhKlcbCQ2YcThcadWiq51Chsi66gn8htGJm2txM0RszA25Xk1vMRE4F7752Q1fOAeW8lHRGDCf+EUfeWFq/W+mtSWOYc7n3pu4v6hrdcXTt3UyuphocB3/CWfyz24Jhbeq0nhSX4rW4dGd18QWSJCHFp6JJSqckL4Z3tpdz/pRhnDrKgLN6H87Kfbiq9+Os3g/2Vpxlu3CW7WoTOO3R6NxjzhzpDupOz2dftZ03X95IrEFDVl5wM8v6ihAzEUJWSgxxRi1NFjv7yxsZHgZdToOJNy07CmrMKCimc3OzLeiCVwQAuxmW7f6eFpUFrhKwv+mzm6mLWC5JktCOnIU6fTiWL5/EVX0AyyePoB1zMvqjz/dbnEZdkw0VLtIdJVjXfYOjeCuu2pKOC/nI6jJQxuansH57ObuKG5h7zBBUiZloh7vr9cguJ676Q7g87iln1T5ki9nthsoYgTpzBKrUQZ3ibHYXu11NI3ITA1bI0VcIMRMhSJLEiNwEtuypZk9xvRAzPVAZRQXzFOKNWjRqCYdTpr7JSmpCcObeanNgsTqA6E7NBhiSGY8E1Jqt1DdZw76AoN3horHFbQfotWXmCE1QVYmZxJzTVpPG/svnOMt2YTjpz6iTsn02bldrI46DWznD/jkjEkuJXW/Dm7cjSajS89EMmoAmZzSqlMF+sbr0lbH57uyy3SUNuFxyh1RySaVGnZyHOjkPbcFxvd7mnjBrLtme4B8Rgc8YmZfIlj3V7C6u5/RjBgd7OCGLyyV7q/9mRFHMjCRJJMbpqW5opb7RFjQxo8QlGDxViaMZo15DdlospVXNFB0yM3
lkeBe9bPBYZTRqFXHGzgXzukIRM/WN1i4Lxyk1aTS5Y2j96jlctcW0vHM7+gHUpJFlGVfNQRwHf3JbXyoKAZmJypD1sWjyxrkFTO44JENcn/fhb4ZlJ2DQqbFYHRRXNjF4gCnaLllmj6eNQTjGb0X3lSTCUE7AvaUNAe33Em7UNVpxOGXUKonkKEsNTop3i5naxlYgOE9ffY2piHTys00RI2baXEy6XosM5TywOVw0tzq6FUGavPHE/PpOWv/3LM7SXzw1abZhOO7SXtWkke2tOEp/wXnwJxwHtyK31Hd435mQw5flSexlMH+7Yj6SKnip8r1BrVYxMi+RrYU17CquH7CYKatupsliR6dVMTjDt7VrAoEQMxHEoIw49Fo1za0ODlU1k5seek8ToYASL5OaaPRrlc9QpP1TcLCI5gaTXTEsO4FvfiqLiCDg/hSd02rU3ni/+kbrES06qphEjGdcj33rx1jXv42jaAPNlUXu7t+5BZ2WdzVUuK0vB3/CWbYLXI62NzU61Nmj3daXQePZUSXxwaofyU6NDXkhozBqcJJbzBys49RpeQPa1m6PVSY/OyEsE0iEmIkg1CoV+Tkmtu+vY3dJvRAz3eCNl4kiF5NCKGQ0+avKarji7aBd3tgp9iHc6K/VLSleT5PFTm2jtcfrliSp0E04A3XWKCxfPoVsrqTl/XtxTZtPwpz52Et+wbrvRxwHf0JuKO+4bnyaR7xMQJ1V0CGQuK7I3fU7nGofHTXYHXi8u7h+wNb4tn5M4RcvA0LMRBwjcxPdYqa4nhMnh0+NgECipGVHU/CvQvuaHsGirknUmGlPdmosep0aq83Joerwtqj2NZNJISleT3Flk7cCb29Qpw8jdv4dtH73Mo69a2hd/xb7N74LLmfbQpIaddZINIPGox40AVVCVrfuL+/Yw0hkD86M91rjS6uayRvAudPWKTvRN4MLMELMRBiKqt5T0tDrLqzRRjRmMiko7QOCKmbMwjLTHpVKYmhmPDsP1lNUZg5vMTMAywy0dVPvLZLOiPHEP2LPHUvr9yvB3opkTEDtDd4dg6TrXS2pcIzl0qhVDM9N4Jd9tew6WNdvMVPdYKHWbEWtksjPFpYZQQgwLCcBtUqirtFKdUNrVFW47S1ey0wUfjahZZkJn5uGvxmabXKLmUMNzJ7gu5TjQNNvN5NSa6af7k/tyJnoh00iXm2lSZOE09nzOodTF6a1jwryEt1ipriek6f2L25mT7E7XmZQRjx6XXjECx1O+EX5CI6IXqv2RrXv9vhABW3IshzVbqZET/O/+iZ3GmwwCNebhj8ZluUpnncovIvnDcTNBF3XmuktKkMcuvRB/e4dFI5uJoCCQYmA+3rf3+/0Ls+9oiBMXUwgxExEorQ2EH2aOmNusWO1O5EgaHVWgklinB4JcDhlGi1dFjn3Kw6ni8ZmdzmyaG9l0B6lg3ZpdTOtNkcPS4cmsixT1+g5tv10MwUzy07Zd3KYxXINzTKh06hobLFzqKalX9sI52J5CkLMRCDKCbm7OPxTPX2NkpadbDKg1UTf6a9Rq4iPdVtnlNiVQFLfZEUGNGqJ+F4WVYsGkuL1JMXrkWXYX9YY7OH0i+ZWBw6nC+i/ZSZY7k+H04XZI7LDzTKjUavI91R833Wwrs/rm1tslHlE0AjPg3A4En1X8yhAOSHLa1u8X1CBm2h2MSl442aCkJ6t3KwS4/QiOP0wFOtMuPZpUo5tnFHb5wcFRcw0tzqw2vsR8DJAzM02ZECtkoiPCT+Rrbiadh2s7/O6SrxMTmpsr6s2hyJCzEQgcUYtOanuipjC1dQRIWaCa9Kv85ryw+vpNxAoWSThGjczkPpBRr0GvdYdeBrM8zIxTheWldOVWJdd/YibaXMxJfp2UAHG52KmrKyM6667jpkzZzJt2jQuv/xy9uzZ02GZjz76iDPOOIPx48dz7rnnsmbNmg7v19XVcf311zNt2jSmT5/OHXfcgcVi8fVQIxrlxBSupo
5Ec1q2gi+CLfuL96YhxEwnvJaZMK0EXD+ALDVJkrznRDBcTeF+Xg7LNqFRqzA32yiv7VvcjJIoMjJMi+Up+FTM2Gw2rrzySqqqqnjqqad47bXXiI2NZeHChdTW1gKwdu1a/va3v3HhhRfy7rvvMmPGDK688koKCwu921m0aBEHDhzgxRdf5JFHHuHrr7/m9ttv9+VQIx7lxNwtLDMdaEvL7l3tiUgkMSQsM+EVZBkIBmfGo5Ik6ptsfa63Egq0dyH2h+Rgipmm8M6w02rU5HvE8K4+ZLFarA4OVjQB4VssT8GnYmbjxo3s3r2b5cuXM27cOEaMGMGyZctoaWnhyy+/BODZZ5/l5JNP5ne/+x35+fksWbKEMWPG8NJLLwGwZcsW1q9fz/3338+YMWOYMWMGd955J//5z3+oqKjw5XAjGuXEPFjRiMUantkR/kAJAI5my0xyEFsahPsTsD/Ra9Xkprndw+Hoahpom4rEIMZy1UfAeelN0e5D3EzhIXdT4tQEQ9g33fWpmBkxYgTPPPMMGRkZbTtQuXdhNptxuVxs3ryZGTNmdFjv6KOPZsOGDYBbEKWlpZGfn+99f/r06UiSxKZNm3w53Igm2WQgxWRAlt0nrACaW+00t7qFXTQWzFMIqjm/ScTMHIk2V1P4iZmBuJkAkj2p+sHIsouEQo79iZtRwhDCOYtJwacVgNPS0jj++OM7vLZy5UpaW1uZOXMmZrOZlpYWMjMzOyyTnp5Oebm7IVhFRQVZWVkd3tfpdCQmJlJWVjag8Wl8nIqr9nQWVYdoh9GCQYn8sK2cvaVmJo5IG9C2Qn2uvaHGc5FMiNMR20PGQiTMtzuUqtD1jVY0GlVA56o8AackGnz+fewNoX5ch+cm8tWPh9hXZvbJ5xPQY+sRBCkJ/Tu2KQluy0B9s7Vf6w9krg1NNs8YjEE5L/tKV3MdOTjJW/29ttFKRnLPrvS9njCEUYOTwmLeR6JPYqakpISTTjqp2/fXrFlDcnKy9//PPvuMBx98kEsvvZSCggKvYNHpOnYl1ev1WK3uL4LFYun0/uHL9AeVSiIpKbbf6x8Jkyk0n/Injcrgh23lFJWZfTb3UJ1rb2je767BkJMW1+vPI5zn2x16o/v71WJ1YIjRY9S7LwP+nqvLJXutQUNzk0kKoqsvVI/rpKMy4IPt7K9oxGQy+kyEBGK+9R5BMDgnsV/XmzxPFWRzi31A16v+zLXBU8JiUHaC3+4T/uDwuY4clMSO/bUcrG5hVP6RH2DtDqfXAjhtbFZYzbsr+iRmMjIy+PDDD7t9PyGhLRr69ddfZ+nSpcybN48bbrgBcAsScAcKt8dqtWI0ug+KwWDo9L6yTExM/4M2XS4Zs7l/1RG7Q61WYTIZMZstOD3FokKJ3FT357XrQB2VVY0DKhIX6nPtDfs8TyEp8Xrq6pqPuGwkzPdIGHRqWm1O9hXXkpseH5C5NjRZcbpkJAlwOno8Bv4g1I9rrE6FUa/GYnXy8+5Kb2uS/hKo+dodbUXn1LKrX8dW68mIrq639Gv9/s5VlmWqG9yJAVrkoJyXfaW7uY7IMbFjfy2bd5QzbWTqEbexp7gem8NFfIyWWK0UsvPurajvk5jRarUdYlm6Y9myZTz33HNcdtllLFmyxFscKzExkZiYGCorKzssX1lZ6Y2zyczM5PPPP+/wvs1mo76+nvT09L4MtxMOh3++zE6ny2/bHgjpCQbijFqaLHYKSxoY7oPUu1Cda28oq3F/WVMTDL2eQzjP90gkxespq2mhus5CVor7iczfc63ypMWbYnUg++/72BtC+bgOyTSx40Ade4rrvfWiBoq/51vtObYatQqDVt2vfSV4XL/1TVasNgdqVf8evvo615ZWOza7e/l4ozZkz4uuOHyuSuzLzgN1Pc5jxwG3pXpkbiJOpwwEp1ebr/C5k0wRMkuWLOHGG2/sUOVTkiQmT57M+vXrO6yzbt06pk6dCsC0adMoLy/nwIED3veV5adMme
Lr4UY0kiQxQqRoe6nyFsyL3rRshaQgZDSJgnm9IxyDgNsymXT9ruwcH6tDrZKQ5bYYlkCg1FuKNWjQacOzY7TC8JwE1CqJGrPVKzC7Q6kvE+7F8hR8KmbWrVvHc889x4IFCzj77LOpqqry/jQ3u5+KL7vsMlavXs2//vUvCgsLeeCBB9ixYwcLFy4EYMKECUyePJnFixezdetW1q5dy6233sq5557bIUtK0DtGeovn1Qd1HKFAhSiY58Xb0iCAGU0DrUMSLXgrAYdRW4N6H9RpUUkSiXGevmEBPC8jIS1bQa9TM8TjmjxSvRmXS2ZPiTuTaWQYN5dsj0/FzAcffAC4M5hmzZrV4eeFF14AYNasWdxzzz28/vrr/OpXv2Lt2rU89dRTXveVJEmsWLGC3NxcFi5cyF//+ldmz54tiub1E0XM7C1x1xOIVqw2p/dpT4iZto7VwRAzomDekVEsM2XVzWFTI8pX9YOCUTbAa1WKEJE9shd9mkqqmrBYHeh1avLS4wIzMD/j09TspUuXsnTp0h6XO/fcczn33HO7fT8lJYVHH33UhyOLXgZlxKHXqmmxOiitao6YE7evKPEasQYNsYbwbabmK4JqmYnvnK0oaMMUqyM1wUB1Qyv7ysyMHpLc80pBZqAF8xSS4g2AObDnZVPkWGYACvKS+GjtQXYeoYO2YpVxu6XCOyVbITJmIegWtUpFfo77SS+aXU0VosFkB4L5BCwsMz0TbnEzvnAztV8/kLFc9RFmmRmRm4AkQXVDa7dtMSKlH1N7hJiJAkZ6ItyjuYN2Vb0I/m2PIiiCEQAczlVWA8WwrPASM75yMyUF080UIeelUa9pi5vpwtUky7I3ISTc+zG1R4iZKGBEuyDgvraHjxSUnkxpUdzGoD3KTcfcZMMRgHorsixH3E3DnwxTgoAPNYTFd9Z3bqYgiJkIczOB29UEsKu4s6upqt5CQ5MNjVryWgAjASFmooBh2SbUKnc33qqG8OvG6wsUN1OGcDMBEB+jdafBEpg0WIvVidXuBCLrpuEvBmXEoVZJmFvs1IT4d1aWZW/13wG7mbxiJnBzjjQ3Exw5CFjJchqSZUKrCe9U9PYIMRMF6LVt6Xp7ojRupkqkZXfAnQYbuKdg5eYUa9CgD/NaHoFAp1WT6wnWD/UU7SaL3Wvd852byRYQi5TD6cLcYu+w70hgZG4CEu6HuPrDXMl7PM0lR0ZAc8n2CDETJYyI4nozDqeLGk8gXDR3yz4c5eJdG4Cn4EjoShxowiUIWBHD8TFaNAPsJaUIbIfTRZPFPuCx9YRyo1erJOJ6aD4bTsQYtORluMXw4daZtniZyAn+BSFmogZFhe/2pORFE9UNrciy20JlihVpwQqBzGiqM0deXIK/CZcgYF9lMgFoNSriPaIiEOdlfaPbPZYYp0fVz8rFoYo3bqZdinZDk5XKOgsS7rTsSEKImShB6ctUUdvi7RAbLbQP/u1vqfVIJDmQYqZJtDLoK/mem82BisaABGn3F19lMikEMgg4ki2GBUrcTDtrvPIwm5ceR0yE1dsSYiZKiDNqyUlzN62LtrgZEfzbNYpJv7taFL5EtDLoOxlJRmINGuwOFyVVTcEeTrf4OkstkLVmfC3EQomReYlIQFlN2wNspPVjao8QM1FEm6upPqjjCDRVomBelwT0CVgpmGcSBfN6iyRJDA0DV5Mv3UwASZ5zRHFN+pP6xsi1GLofYN1xM4qI8RbLE2JGEM6M8AR8KdHs0UKlJ5MpTYiZDgRDzESiOd+fhEMQcJ0Sd+Izy4yn2WQgLDNNkW0x9LqaDtbR0mqnpNJt4Yukyr8KQsxEEYpl5mBlY9g0sPMFlYqbSWQydcArZsxWv6fBRlozv0ChiJnCkBYzPnYzKdWphcgeMAUeC8yu4nr2ljYg47ZQJ0Tg91CImSgi2WQgNcGALENhaXRYZ1wuWbQy6AbladTudNHY4r80WLvD6U2zVbp1C3qH4maqqG2hudX/qcr9we
duJo+wqA9INlNkixmleF5pVTObd1e5X4uw+jIKQsxEGSOiLG6m1tyK0yWjUUsRe8HqL+3TYGsaLH7bj/L0q9OoiNFr/LafSCQ+RuetjbQvBK0z7YWqr7OZav0sZmRZjshWBu0xxejITnUnfnz/czkQmfEyIMRM1DEyyuJmvPEyiUZUKpGWfTjK07Q/S+a3N+WL1Pi+E8pxM3WeNgZajYpYg2+EqiJmLFYHrTb/ucObWx3YHe6UdyVOJxJRXE1Ol9uVHGnF8hSEmIkyFFVeVGb2fpEjGSVeRjSY7BrliTQQlhlhGesfQxUxE4JtDdr3NfKVUDXqNeh17pYX/oybUcYeZ9RGVI+iw1GCgAES4nQRey0UYibKyEyOIT5Gi93h4kB5Y7CH43cqRU+mI5IcH1jLjKDvtLfMhFoHbX/VaUkOQNxMpGcyKRS0cyuNzE2MWOuoEDNRhiRJURU3481kEsG/XaLchKrrA2GZETVm+sOg9Hg0aokmi90bzB4q+Euoegs6+lPMRInITojTk5nsvv5FarwMCDETlSg1BqKh6WT7VgaCzigX8ho/VgGOlpuGv9BqVAzKcHe9D7W4GV9nMil4LTN+rDXTlskUufEyCr85IZ+jR2dw7NjMYA/FbwgxE4Uopaz3ljTgCjGztS+RZdnrZhKtDLrGmzniTzdTBPe/CRSh2nTSX26mxABkNNVGUYuNSSPS+OO8MRgjOJtQiJkoZFBGHHqdmharg9Kq5mAPx280NNuw2V1IEqQkCBdHVyhP1IFxM0X+TcNfDAvRIGB/CdVAxMzUC5EdUQgxE4WoVSqGey6O2/fXBnk0/kOJl0kxGdCoxaneFUocS5PFjtXu9Pn2nS4XDZ70XXHT6D+KmDlY0RhSWYj1fqrsHAjLjBDZkYW4wkcpowYnAfDGl3t56eOd3sJXkUSl6JbdI0a9Gr3Wf2mw5mY7LllGJUmYYiI/NsFfpCUaiTNqcThlDlaGRhaiLMte60aij+NOkj0i26/ZTFHkZooGhJiJUk6ZmseMMRnIwNc/HuKmp9fwvy2luFyRE0NTWe8J/hWZTN0iSVK7Hk2+j5upbXRvMzFeJ4oWDgBJkkKueF6jxY7D6b5e+FoQKJYZc7MNh9P3lii7w9XWYkNYZiICIWaiFJ1WzR/OHsONF08mNy2O5lYHKz/ZxdKXNrI3Qvo2KZaZdJHJdESSTf7rnh3pvW8CiRIEHCptDZRja4rR+tyNGx+jRa2SkMHrpvQlikVJo1YRZ9T6fPuCwCPETJQzMi+R2y6bykUnj8Co13CgopF7Vm7ihdU7MDf7/iISSLxiRriZjog/e+HU+immIhoJNcuMvzKZAFSS5LX21PkhPbvNxaSL2CJy0YYQMwLUKhUnT83j3iuPYdb4LAC++7mMm55Zy+cbi3G6QifgsC8IMdM7lCBg/1pmRDbZQFHaGlTWW0Iixq3OTzVmFJL8aTEUmUwRhxAzAi+mWB2/P+Mobl4whcEZ8VisDl77fA93/Gsjuw7WBXt4faLJYqfF6m5SJwrmHZk2N5PvY2ZExojviDVovcJ8fwikaPvbhaiIJH+IGXFeRh5CzAg6kZ+TwD8WTuV3pxUQa9BQUtXE3S9v4sFXN/k1u8CXKFaZxDidN1tH0DVeN5NZ3DRCnaFK3EwIiBl/upmg7Zzxp8gWmUyRgxAzgi5RqSTmTMrh3j/OYM7EbCTgq80l3PDkD3y87qBfMgx8iZLJlC4ymXqk7aYhxEyoMzTT3dZgX1nw07P97mby43kp3EyRhxAzgiMSZ9Tyu7mjuO330ykYlESrzckb/9vLbS+sZ0cIF9wTmUy9R6np0dBk82l8lCzLbQHA4qbhE4Yolpny4Ftm/O5mEiJb0AeEmBH0imHZJh649jguP+so4oxaympaWPbvH3nyvW3U+rFJYX8Rwb+9xxTrrgHjkmXMzb4LLG1udXgteMKc7xsGZ8QjSW7h6Y+bfF8InJvJn9lM4ryMFISYEfQalUri+Ik53P
vHYzhpci6SBBt2VvL3Z9eyes3+kCqzrjSYFGKmZ1QqydsLx5c3DkXkxsdo0WrEpcYX6HVqclJjgeDGzdjsTppb3QH2/rbM1DdZkX3YENdduVi02Ig0xBVG0GdiDVouPnUkt106jRG5CdjsLt7+uohbn1/Hz0U1wR4eICwzfSXF447zpZgRcQn+YUgIBAErx1anURHjp07MitXE4ZRp9GEqepPFLiyGEYgQM4J+MygjnhsvnswfzhpNQqyOijoLD7/xE4+9vdWvXZh7otXm8Bb8EzEzvUPpKu7LzBFRMM8/KBlNwUzPbu9i8lfROY1ahSnW3fOpzoeZdsrYhcUwsvD5kSwrK+O6665j5syZTJs2jcsvv5w9e/Z0WObUU0+loKCgw8+NN97ofb+uro7rr7+eadOmMX36dO644w4sluDdHAXdI0kSM8Zmcs+Vx3DqtDxUksSWPdXc/Nw63v9+Hy4fmod7i2KViTNqiTGIUuW9ISXBY5nxYbVVb4CoSRTM8yVKW4P95Y0+db/0BX9nMikk+aEKcH2Axi4ILD61D9psNq688koSExN56qmnMBgMPPbYYyxcuJAPPviA5ORkWlpaKC4u5umnn2bMmDHedQ2GtgveokWLsFgsvPjii5jNZm6++WZaWlq4//77fTlcgQ8x6jVceNIIjhufxauf7WbnwXre/XYfCXF6Zk/IDuhYhIup76Qm+L5LcZtlRnTL9iU5abFo1CqaWx1U1lvICEL5gfrGwMScJMXrOVDR6FP3p78DlwXBwaeWmY0bN7J7926WL1/OuHHjGDFiBMuWLaOlpYUvv/wSgL179+JyuZg0aRJpaWnen/h4d/2ELVu2sH79eu6//37GjBnDjBkzuPPOO/nPf/5DRUWFL4cr8AM5aXH87beTOPvYIQB8tqE44E+PVSL4t88kJ/g+ZqZOtDLwCxq1ikEZcUDw4mYCJQj8kdEk0rIjE5+KmREjRvDMM8+QkZHRtgOVexdms/tLt2vXLlJTU0lISOhyGxs3biQtLY38/Hzva9OnT0eSJDZt2uTL4Qr8hCRJnDY9D71WTWl1MzsPBLYVQoWoMdNn2mJm/OFmEjcNXzM0U+mgHZzieQFzM/mhCrBwM0UmPnUzpaWlcfzxx3d4beXKlbS2tjJz5kzALWZiYmJYtGgRmzdvJikpiV//+tf87ne/Q6VSUVFRQVZWVodt6HQ6EhMTKSsrG9D4ND4O9lJ72t4rvyOZvs7VFKdn1vgsvthUwhebSxk3PNWfw+tAdYNbzGSlxPb7mEfbsU1tFzOjVks+CepUhFFqotHn373+EinHNT/XxBebYX+5+Yifrb/m2+ARBCmJBr8eW0VkNzTZetxPb+eqpGWnJPh37P4kUs5jX9InMVNSUsJJJ53U7ftr1qwhOTnZ+/9nn33Ggw8+yKWXXkpBQQEAe/bswWw2c9ppp3H11VezadMmli1bRkNDA3/5y1+wWCzodJ197Hq9Hqu1/0+NKpVEUlJsv9c/EiZT9FgA+jLXX580ki82lbBlTxVWF2Sm+OfzP5yqBvdT3PDByQM+5tFybPV2JwA2uwudQUdczMDiXCxWh7fR57C8pJALxA734zpxVCb8dzsHKpowmYw93tR8Pd8GT7bg4OxEv11Xle0DNLTYer2fnuZqbnGneedlJ/h17IEg3M9jX9InMZORkcGHH37Y7fvtXUevv/46S5cuZd68edxwww3e15999lmsVqs3RqagoICmpiaefPJJrr32WgwGAzabrdO2rVYrMTH9D3RzuWTM5pZ+r98VarUKk8mI2WzBGeK9igZKf+Yap1Mxdlgy24pqeefL3fz25JF+HiXYHS6qPW4mo0airq65X9uJxmMbZ9TSZLGzr7iO3PS4AW2zrMb9uRt0aqwWG1ZL5+90MIiU4xqjlTDo1LTanGzbU8mgjPgul/PHfF2yTI3ngUGN3O/vWG/QSO54u+p6S4/76e1clbIRWgm/jt2fRMp53Bt6I9ahj2JGq9
V2iGXpjmXLlvHcc89x2WWXsWTJkg4ma51O18nyMnLkSFpaWmhoaCAzM5PPP/+8w/s2m436+nrS09P7MtxOOPxUodbpdPlt26FGX+d68pRcthXV8tWWQ5x97BAMOv8U2FIor2lGxn0TNerUAz4u0XRsk+L1NFnsVNVbyEweWIaMIiiT4vUh+flFwnEdkhnPzoP17C1pILsHq6cv52tutuF0yUhAnEHj188x3ui26FmsThqbbRh7UaDvSHO12Z00eQrwxRu1YX8ORMJ57Ct87nBThMySJUu48cYbOwgZWZY5+eSTWbFiRYd1fv75Z9LS0khKSmLatGmUl5dz4MAB7/vr168HYMqUKb4ersDPjB2WQnqSEYvVwZpt5X7fX/vgX38V84pUfJk5IhpM+p8hQSqe5y06F6tD4+eYDaNeg1GvBtoCdweCsg2tRkWswb8PVoLA4tMzcd26dTz33HMsWLCAs88+m6qqKu9Pc3MzkiRxyimn8Pzzz/Phhx9y8OBBVq1axXPPPceiRYsAmDBhApMnT2bx4sVs3bqVtWvXcuutt3Luued2yJIShAcqSeKkKbkAfL6pxO9p2lWixky/STb5rtaMSH/1P0O9bQ0Cm9EUqEwmBSW1v9aX52Wc/yoXC4KDT6XpBx98ALgzmFauXNnhvWuuuYZrr72W66+/nri4OB566CHKy8vJzc3l5ptv5vzzzwfcab0rVqzgjjvuYOHChej1eubOnctNN93ky6EKAsiscVm8800RZTUtbN9fx5ihyT2v1E/aCuYFvpBYuKMID5/cNLx9mUSNGX8xNNMdJ1NS1YTd4USrUQdkv/UBFqpJcToOVTf7RmQ3iYJ5kYpPxczSpUtZunTpkXeo0XD11Vdz9dVXd7tMSkoKjz76qC+HJggiRr2GWePcadqfbyz2q5ipqHcHeQvLTN9p36V4oCi9dIRlxn+kJBiIj9HS2GLnYGUT+dld1+7yNYGuoOtLy0ygKhcLAo9IUhcEBMXVtLWwhoo632aVtadKFMzrN8km38XM1ImO2X5HkqR2TScD52pqczMFpk2FIpp86v4UBfMiDiFmBAEhMzmGccNSkIEvN5X6ZR9Ol4tqT8qosMz0HeUJ2CdiRtw0AsIQj6spkG0N6gNsmUn2YWC6cDNFLkLMCALGKVPd1pnvfj6ExVNQzZfUmq04XTIatUpcrPqBYkVpstixO5z93o7D6aLRU1RNtDLwL21BwIETM4G2uiX6UMwEOt5HEDiEmBEEjNFDk8lMjsFidfKDH9K023fLVolMhT4Ta9Cg85R3r2vqf5G7+iYrMqBRS946IQL/oIiZ8poWvzwgdEV9gK1uyT7szyQshpGLEDOCgHF4mrbLx2nalZ5YHBEv0z8kSWp7Cjb3/8bhDRAV6a9+xxSrI8WkRwb2l/s/bsZmd9Lc6hZNgbbMmFvsOAZQ7dYly97g9sT4wMT7CAKHEDOCgHLs2EyMejUVtS38sq/Wp9uurBc1ZgaK8sRaN4CMJkXMJAtTfkAIZPE85bzQaVW9qsbrC+KNWjRqtygeSBBwU4sdp8v9AJUoLDMRhxAzgoDiTtPOBuDzjSU+3bbiZkoTlpl+o8S4KCms/SHQqbvRTiDjZuqDUHROkiSv+PCFyDbFaP1euVgQeMQRFQScE6fkIAE/F9VQXuu7NG3FMpMhLDP9RrHM1A4gPqHNMiMK5gWCod6MJv+7mYJV2dkXGU0ikymyEWJGEHAykmIYn58CwBc+ss64ZFm0MvABvqjpISwzgWVwptsyU2Nuxdzi3+7kwRIEvshoCnTgsiCwCDEjCAonT8sD4LttZbS0DjwLo6HJhs3hQiVJ3h5Dgr7jfQIeiDm/ScTMBJIYg8bb5dzfcTPBygZK9kENJNEvLLIRYkYQFEYPTiIrJQarzcn3P5cNeHtKJlNqgkH4wweAL56AlVYGwjITOIZmBcbVFOiCeQo+OS+FmymiEVd9QVCQJImTp7qtM1/4IE3bG/wrXEwDQnnibm
iy9euYtE9/FZaZwBGoIOBAd8xW8EXMjCiYF9kIMSMIGseOycSo11BZb+HnwpoBbUukZfuGhDgdkgROl+yt4tsXGj3prxLuGiiCwDC0XXq27OP6Te0JliDwpWVGiJnIRIgZQdDQ69TMnpAFuIvoDQTFMpMh0rIHhFqlIsEjQvrTpVip0mqK0wl3XwAZlBGHWiVhbrFTax542f+ucFvdgtN1WrHM1DdZ+23FFQHAkY242giCyomTc5GAX/bVcqi6ud/bEW4m35E0gIwmUTAvOGg1anLSYgH/uZqCaXUzxeqQ8FgMW+x9Xj8YlYsFgUWIGUFQSUs0MnFEKuCOnekPsiy3czPF+Gxs0Yq3e3Y/MpratzIQBBZ/x80o4tYUG3irm0atwhTnFlD96dEUjMrFgsAixIwg6CiBwN9vK6Olte9PXU0WOxarAwlITxRp2QPF29JgQJYZcRwCjb/FTLDrBw3kvAxG5WJBYBFiRhB0Rg1KJCctFpvdxbdb+56mrVhlEuP1aDVqXw8v6lBaGgxEzIhGfoFniKcS8IGKRp83cYXgZTIpJA0gCFjUmIl8hJgRBB1JkjjZ0037i00luFx9uxB7g39FvIxPEJaZ8CQnLRadRoXF6qTCh21CFIItCHwhZkSNmchFiBlBSHDMmExiDRqqG1r5qbC6T+uKBpO+ZSBpsOKmETzUKhWDMpTieb53NQWrYJ6CTywzIpYrYhFiRhAS6LVqZk/oXzftStGTyae0L1DWl5olsiyLbKYgM8SPlYDD2s0kqv9GPELMCEKGEybnIEmw40AdpVVNvV6vst5tUs8QmUw+QbngW+1OLFZnr9ezWJ1Y7c4O2xAElvbF83xNsCvo+ioAWBCZCDEjCBlSE4xMHpEG9K2InnAz+Ra9Vk2MJ321L+nZSspsrEGDXisCsYOBImYOVjbhcLp8uu1guxCTTAMoGSCq/0Y8QswIQoqTp7oDgddsK6fJ0nOatsXq8BbREm4m36FkNPWlcJ64YQSf9CQjMXoNdoeL0qr+F6E8HKvdSYvVU3QuWG4mz36tNicWz1h6g0uWaQhS5WJB4BBiRhBSjMxLJC89DpvDxbdbD/W4vGKViY/RimJYPkS5cdT2oUCZ6JYdfFSS1BY3U+47V5MiavVaNUZ9cKxuel2bxbAvrTYam22iX1gUIMSMIKRon6b95aYSnK4jm8qrRINJv5DYj5YGIvg3NPBH3Ex7F1Mwi871p9WGYjEMRuViQeAQR1YQchw9OoM4o5Yas5Uf9xy5m3ZFnTv4Nz1RBP/6Em9GU1PvO2d7M0ZEkGVQGZKpVAL2XUZTWyZTcC0bipjpk8VQlAuICoSYEYQcOq2a4ye607S/2FR8xGVFWrZ/8NaaMff9ppFsEgXzgslQj5uptKrZm102UIKdyaTQH4uhyGSKDoSYEYQkJ0zKQSVJ7DxYT3Fl92naws3kH7xpsH3KZgqNG160kxSvJyFWh0uWOVDuG+tMqFg3kvtRa0YEpkcHQswIQpJkk4HJBZ407Y3dW2cqFMuMSMv2Kf2KTRBPwCGBJEneuJmiQ76Jmwl2wTyF/lSnDhUhJvAvQswIQpZTPGnaa7dX0NjSOXbDZnd6L1TCMuNbFDFjbrFjd/Rcr8TucHpT6ZW0bkHw8GY0+UjMhIqbqT+WGeFmig6EmBGELMNzEhicEY/d4eKbnzqnaVc1uOM5jHoNcUZtoIcX0cQZtd7Mj4ZeuJqUm4tOo/KmzwqCh9cy46OMplBpB5DYH/enqDETFQgxIwhZJEnyFtH7cnNppzTtSm8mkzGo6aKRiCRJJMW7M1d6c+NoHy8jjkXwGZLptsxU1Lb0qvjkkehQdC7I1g0luLyxlxZDEG6maEGIGUFIM/2odOJjtNQ1Wtmyu2M37SqRyeRX+tILRwT/hhbxMTpSE9w3/r3FdQPalrfonAQJQU7NjjVovBbD+l6I7PbVgoMtxAT+RYgZQUij1ag5fmIO0DkQuEJkMvmVvgRbCjETegzLdr
", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# Get the parameters\n", - "cfg = get_args() \n", - "# Train\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"train\") \n", - "# Test\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"test\") # Plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.13 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.13" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "8994a120d39b6e6a2ecc94b4007f5314b68aa69fc88a7f00edf21be39b41f49c" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/MonteCarlo.ipynb b/projects/notebooks/MonteCarlo.ipynb deleted file mode 100644 index c0613ce..0000000 --- a/projects/notebooks/MonteCarlo.ipynb +++ /dev/null @@ -1,480 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import defaultdict\n", - "import numpy as np\n", - "class FisrtVisitMC:\n", - " ''' On-Policy First-Visit MC Control\n", - " '''\n", - " def __init__(self,cfg):\n", - " self.n_actions = cfg.n_actions\n", - " self.epsilon = cfg.epsilon\n", - " self.gamma = cfg.gamma \n", - " self.Q_table = defaultdict(lambda: np.zeros(cfg.n_actions))\n", - " self.returns_sum = defaultdict(float) # sum of returns for each (state, action) pair\n", - " self.returns_count = defaultdict(float) # number of returns for each (state, action) pair\n", - " \n", - " def sample_action(self,state):\n", - " state = str(state)\n", - " if np.random.uniform(0, 1) > self.epsilon:\n", - " 
action = np.argmax(self.Q_table[state]) # exploit: choose the action with the largest Q(s,a)\n", - " else:\n", - " action = np.random.choice(self.n_actions) # explore: choose a random action\n", - " return action\n", - " # if state in self.Q_table.keys():\n", - " # best_action = np.argmax(self.Q_table[state])\n", - " # action_probs = np.ones(self.n_actions, dtype=float) * self.epsilon / self.n_actions\n", - " # action_probs[best_action] += (1.0 - self.epsilon)\n", - " # action = np.random.choice(np.arange(len(action_probs)), p=action_probs)\n", - " # else:\n", - " # action = np.random.randint(0,self.n_actions)\n", - " # return action\n", - " def predict_action(self,state):\n", - " ''' Greedy action selection, used at test time\n", - " '''\n", - " state = str(state)\n", - " # No epsilon exploration here:\n", - " # always choose the action with the largest Q(s,a)\n", - " action = np.argmax(self.Q_table[state])\n", - " return action\n", - " # if state in self.Q_table.keys():\n", - " # best_action = np.argmax(self.Q_table[state])\n", - " # action_probs = np.ones(self.n_actions, dtype=float) * self.epsilon / self.n_actions\n", - " # action_probs[best_action] += (1.0 - self.epsilon)\n", - " # action = np.argmax(self.Q_table[state])\n", - " # else:\n", - " # action = np.random.randint(0,self.n_actions)\n", - " # return action\n", - " def update(self,one_ep_transition):\n", - " # Find all (state, action) pairs we've visited in this one_ep_transition\n", - " # We convert each state to a string so that we can use it as a dict key\n", - " sa_in_episode = set([(str(x[0]), x[1]) for x in one_ep_transition])\n", - " for state, action in sa_in_episode:\n", - " sa_pair = (state, action)\n", - " # Find the first occurrence of the (state, action) pair in the one_ep_transition\n", - "\n", - " first_occurence_idx = next(i for i,x in enumerate(one_ep_transition)\n", - " if str(x[0]) == state and x[1] == action)\n", - " # Sum up the discounted rewards since the first occurrence\n", - " G = sum([x[2]*(self.gamma**i) for i,x in 
enumerate(one_ep_transition[first_occurence_idx:])])\n", - " # Calculate the average return for this (state, action) pair over all sampled episodes\n", - " self.returns_sum[sa_pair] += G\n", - " self.returns_count[sa_pair] += 1.0\n", - " self.Q_table[state][action] = self.returns_sum[sa_pair] / self.returns_count[sa_pair]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Define training" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg,env,agent):\n", - " print('Start training!')\n", - " print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg.train_eps):\n", - " ep_reward = 0 # cumulative reward of the current episode\n", - " one_ep_transition = []\n", - " state = env.reset(seed=cfg.seed) # reset the environment, i.e. start a new episode\n", - " for _ in range(cfg.max_steps):\n", - " action = agent.sample_action(state) # sample an action with the epsilon-greedy policy\n", - " next_state, reward, terminated, info = env.step(action) # take one step in the environment\n", - " one_ep_transition.append((state, action, reward)) # store the transition\n", - " state = next_state # move to the next state\n", - " ep_reward += reward \n", - " if terminated:\n", - " break\n", - " agent.update(one_ep_transition) # MC needs complete returns, so update once per finished episode\n", - " rewards.append(ep_reward)\n", - " if (i_ep+1)%10==0:\n", - " print(f\"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.1f}\")\n", - " print('Finish training!')\n", - " return {\"rewards\":rewards}\n", - "def test(cfg,env,agent):\n", - " print('Start testing!')\n", - " print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg.test_eps):\n", - " ep_reward = 0 # cumulative reward of the current episode\n", - " state = env.reset(seed=cfg.seed) # reset the environment to start a new episode\n", - " for _ in range(cfg.max_steps):\n", - " action = agent.predict_action(state) # choose an action with the learned policy\n", - " next_state, reward, terminated, info = env.step(action) # take one step in the environment\n", - " state = next_state # move to the next state\n", - " ep_reward += reward\n", - " if terminated:\n", - " break\n", - " 
rewards.append(ep_reward)\n", - " print(f\"Episode: {i_ep+1}/{cfg.test_eps}, Reward: {ep_reward:.1f}\")\n", - " print('Finish testing!')\n", - " return {\"rewards\":rewards}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Define the environment" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import turtle\n", - "import numpy as np\n", - "\n", - "# turtle tutorial : https://docs.python.org/3.3/library/turtle.html\n", - "\n", - "class CliffWalkingWapper(gym.Wrapper):\n", - " def __init__(self, env):\n", - " gym.Wrapper.__init__(self, env)\n", - " self.t = None # turtle used for rendering, created lazily in render()\n", - " self.unit = 50 # size of one grid cell in pixels\n", - " self.max_x = 12 # grid width (columns)\n", - " self.max_y = 4 # grid height (rows)\n", - "\n", - " def draw_x_line(self, y, x0, x1, color='gray'):\n", - " assert x1 > x0\n", - " self.t.color(color)\n", - " self.t.setheading(0)\n", - " self.t.up()\n", - " self.t.goto(x0, y)\n", - " self.t.down()\n", - " self.t.forward(x1 - x0)\n", - "\n", - " def draw_y_line(self, x, y0, y1, color='gray'):\n", - " assert y1 > y0\n", - " self.t.color(color)\n", - " self.t.setheading(90)\n", - " self.t.up()\n", - " self.t.goto(x, y0)\n", - " self.t.down()\n", - " self.t.forward(y1 - y0)\n", - "\n", - " def draw_box(self, x, y, fillcolor='', line_color='gray'):\n", - " self.t.up()\n", - " self.t.goto(x * self.unit, y * self.unit)\n", - " self.t.color(line_color)\n", - " self.t.fillcolor(fillcolor)\n", - " self.t.setheading(90)\n", - " self.t.down()\n", - " self.t.begin_fill()\n", - " for i in range(4):\n", - " self.t.forward(self.unit)\n", - " self.t.right(90)\n", - " self.t.end_fill()\n", - "\n", - " def move_player(self, x, y):\n", - " self.t.up()\n", - " self.t.setheading(90)\n", - " self.t.fillcolor('red')\n", - " self.t.goto((x + 0.5) * self.unit, (y + 0.5) * self.unit)\n", - "\n", - " def render(self):\n", - " if self.t is None:\n", - " self.t = turtle.Turtle()\n", - " self.wn = turtle.Screen()\n", - " self.wn.setup(self.unit * self.max_x + 100,\n", - " 
self.unit * self.max_y + 100)\n", - " self.wn.setworldcoordinates(0, 0, self.unit * self.max_x,\n", - " self.unit * self.max_y)\n", - " self.t.shape('circle')\n", - " self.t.width(2)\n", - " self.t.speed(0)\n", - " self.t.color('gray')\n", - " for _ in range(2):\n", - " self.t.forward(self.max_x * self.unit)\n", - " self.t.left(90)\n", - " self.t.forward(self.max_y * self.unit)\n", - " self.t.left(90)\n", - " for i in range(1, self.max_y):\n", - " self.draw_x_line(\n", - " y=i * self.unit, x0=0, x1=self.max_x * self.unit)\n", - " for i in range(1, self.max_x):\n", - " self.draw_y_line(\n", - " x=i * self.unit, y0=0, y1=self.max_y * self.unit)\n", - "\n", - " for i in range(1, self.max_x - 1):\n", - " self.draw_box(i, 0, 'black') # cliff cells along the bottom row\n", - " self.draw_box(self.max_x - 1, 0, 'yellow') # goal cell\n", - " self.t.shape('turtle')\n", - "\n", - " x_pos = self.s % self.max_x # column of the current state\n", - " y_pos = self.max_y - 1 - int(self.s / self.max_x) # row, flipped so the cliff row is drawn at the bottom\n", - " self.move_player(x_pos, y_pos)" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import os\n", - "def all_seed(env,seed = 1):\n", - " ''' Set all random seeds for RL; it is best to call this right after the environment is created\n", - " Args:\n", - " env (gym.Env): the environment to seed\n", - " seed (int, optional): the random seed. 
Defaults to 1.\n", - " '''\n", - " import torch\n", - " import numpy as np\n", - " import random\n", - " # print(f\"seed = {seed}\")\n", - " env.seed(seed) # env config\n", - " np.random.seed(seed)\n", - " random.seed(seed)\n", - " torch.manual_seed(seed) # config for CPU\n", - " torch.cuda.manual_seed(seed) # config for GPU\n", - " os.environ['PYTHONHASHSEED'] = str(seed) # only affects Python processes started after this point\n", - " # config for cudnn\n", - " torch.backends.cudnn.deterministic = True\n", - " torch.backends.cudnn.benchmark = False\n", - " torch.backends.cudnn.enabled = False\n", - " \n", - "def env_agent_config(cfg):\n", - " '''Create the environment and the agent\n", - " ''' \n", - " env = gym.make(cfg.env_name,new_step_api=True) # create the environment\n", - " env = CliffWalkingWapper(env)\n", - " if cfg.seed !=0: # set random seed\n", - " all_seed(env,seed=cfg.seed) \n", - " try: # number of states (discrete) or state dimension (continuous)\n", - " n_states = env.observation_space.n # print(hasattr(env.observation_space, 'n'))\n", - " except AttributeError:\n", - " n_states = env.observation_space.shape[0]\n", - " n_actions = env.action_space.n # number of actions\n", - " setattr(cfg, 'n_states', n_states) # add the state dimension to the config\n", - " setattr(cfg, 'n_actions', n_actions) # add the action dimension to the config\n", - " agent = FisrtVisitMC(cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Set the parameters" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "class Config:\n", - " '''Configuration parameters\n", - " '''\n", - " def __init__(self):\n", - " self.env_name = 'CliffWalking-v0' # environment name\n", - " self.algo_name = \"FirstVisitMC\" # algorithm name\n", - " self.train_eps = 400 # number of training episodes\n", - " self.test_eps = 20 # number of testing episodes\n", - " self.max_steps = 200 # maximum number of steps per episode\n", - " self.epsilon = 0.1 # exploration rate of the epsilon-greedy policy\n", - " self.gamma = 0.9 # discount factor\n", - " self.lr = 0.5 # learning rate (not used by FisrtVisitMC, which averages returns)\n", - " self.seed = 1 # random seed\n", - " # if torch.cuda.is_available(): # use a GPU if one is available\n", - " # 
self.device = torch.device('cuda')\n", - " # else:\n", - " # self.device = torch.device('cpu')\n", - " self.device = torch.device('cpu')\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve with an exponential moving average\n", - " '''\n", - " last = data[0] # First value in the plot (first timestep)\n", - " smoothed = list()\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # compute the smoothed value\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,title=\"learning curve\"):\n", - " sns.set()\n", - " plt.figure() # create a new figure so that several plots can be drawn\n", - " plt.title(f\"{title}\")\n", - " plt.xlim(0, len(rewards)) # set the x-axis range\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. Ready to go!" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Start training!\n", - "Env: CliffWalking-v0, Algorithm: FirstVisitMC, Device: cpu\n", - "Episode: 10/400, Reward: -200.0\n", - "Episode: 20/400, Reward: -200.0\n", - "Episode: 30/400, Reward: -200.0\n", - "Episode: 40/400, Reward: -200.0\n", - "Episode: 50/400, Reward: -200.0\n", - "Episode: 60/400, Reward: -200.0\n", - "Episode: 70/400, Reward: -200.0\n", - "Episode: 80/400, Reward: -200.0\n", - "Episode: 90/400, Reward: -200.0\n", - "Episode: 100/400, Reward: -200.0\n", - "Episode: 110/400, Reward: -200.0\n", - "Episode: 120/400, Reward: -200.0\n", - "Episode: 130/400, Reward: -200.0\n", - "Episode: 140/400, Reward: -200.0\n", - "Episode: 150/400, Reward: -200.0\n", - "Episode: 160/400, Reward: -200.0\n", - "Episode: 170/400, Reward: -200.0\n", - "Episode: 180/400, Reward: -200.0\n", - "Episode: 190/400, Reward: -200.0\n", - "Episode: 200/400, Reward: -200.0\n", - "Episode: 210/400, Reward: -200.0\n", - "Episode: 220/400, Reward: -200.0\n", - "Episode: 230/400, Reward: -200.0\n", - "Episode: 240/400, Reward: -200.0\n", - "Episode: 250/400, Reward: -200.0\n", - "Episode: 260/400, Reward: -200.0\n", - "Episode: 270/400, Reward: -299.0\n", - "Episode: 280/400, Reward: -200.0\n", - "Episode: 290/400, Reward: -200.0\n", - "Episode: 300/400, Reward: -200.0\n", - 
"Episode: 310/400, Reward: -200.0\n", - "Episode: 320/400, Reward: -200.0\n", - "Episode: 330/400, Reward: -200.0\n", - "Episode: 340/400, Reward: -200.0\n", - "Episode: 350/400, Reward: -200.0\n", - "Episode: 360/400, Reward: -200.0\n", - "Episode: 370/400, Reward: -200.0\n", - "Episode: 380/400, Reward: -200.0\n", - "Episode: 390/400, Reward: -200.0\n", - "Episode: 400/400, Reward: -200.0\n", - "Finish training!\n", - "Start testing!\n", - "Env: CliffWalking-v0, Algorithm: FirstVisitMC, Device: cpu\n", - "Episode: 1/20, Reward: -200.0\n", - "Episode: 2/20, Reward: -200.0\n", - "Episode: 3/20, Reward: -200.0\n", - "Episode: 4/20, Reward: -200.0\n", - "Episode: 5/20, Reward: -200.0\n", - "Episode: 6/20, Reward: -200.0\n", - "Episode: 7/20, Reward: -200.0\n", - "Episode: 8/20, Reward: -200.0\n", - "Episode: 9/20, Reward: -200.0\n", - "Episode: 10/20, Reward: -299.0\n", - "Episode: 11/20, Reward: -200.0\n", - "Episode: 12/20, Reward: -200.0\n", - "Episode: 13/20, Reward: -200.0\n", - "Episode: 14/20, Reward: -200.0\n", - "Episode: 15/20, Reward: -200.0\n", - "Episode: 16/20, Reward: -200.0\n", - "Episode: 17/20, Reward: -200.0\n", - "Episode: 18/20, Reward: -200.0\n", - "Episode: 19/20, Reward: -200.0\n", - "Episode: 20/20, Reward: -200.0\n", - "Finish testing!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# Get the parameters\n", - "cfg = Config() \n", - "# Train\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], title=f\"training curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}\") \n", - "# Test\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], title=f\"testing curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}\") # Plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.12 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "f5a9629e9f3b9957bf68a43815f911e93447d47b3d065b6a8a04975e44c504d9" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/PolicyGradient.ipynb b/projects/notebooks/PolicyGradient.ipynb deleted file mode 100644 index b6326da..0000000 --- a/projects/notebooks/PolicyGradient.ipynb +++ /dev/null @@ -1,202 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm\n", - "\n", - "The most basic policy gradient algorithm is REINFORCE, also known as the Monte-Carlo Policy Gradient algorithm. The gradient of our policy optimization objective is:\n", - "\n", - "$$\n", - "\\nabla_\\theta J(\\theta) = \\mathbb{E}_{\\pi_\\theta}\\left[\\Psi_{\\pi} \\nabla_\\theta \\log \\pi_\\theta\\left(a_t \\mid s_t\\right)\\right]\n", - "$$\n", - "\n", - "Here, in REINFORCE, $\\Psi_{\\pi}$ is the discounted return (see the pseudocode for the exact formula). It can also be estimated with the advantage, which gives the familiar A3C algorithm; both this and GAE are covered later.\n", - "\n", - "### 1.1. 
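Policy function design\n", - "\n", - "Since policy gradient methods compute the gradient directly with respect to the policy function, how should the policy function be designed? There are generally two choices: a softmax policy, used for discrete action spaces, and a Gaussian policy $\\mathcal{N}\\left(\\phi(s)^T \\theta, \\sigma^2\\right)$, used mostly for continuous action spaces.\n", - "\n", - "The softmax policy can be written as:\n", - "$$\n", - "\\pi_\\theta(s, a)=\\frac{e^{\\phi(s, a)^T \\theta}}{\\sum_b e^{\\phi(s, b)^T \\theta}}\n", - "$$\n", - "with the corresponding gradient:\n", - "$$\n", - "\\nabla_\\theta \\log \\pi_\\theta(s, a)=\\phi(s, a)-\\mathbb{E}_{\\pi_\\theta}[\\phi(s, \\cdot)]\n", - "$$\n", - "For the Gaussian policy the gradient is:\n", - "$$\n", - "\\nabla_\\theta \\log \\pi_\\theta(s, a)=\\frac{\\left(a-\\phi(s)^T \\theta\\right) \\phi(s)}{\\sigma^2}\n", - "$$\n", - "In some special cases, however, such as this demo, where the action space is discrete with only two actions, the policy can be implemented with a Bernoulli distribution. This approach is not generally recommended; it is shown here to illustrate a special case and to prompt some thought, for example about how the Bernoulli, binomial and Gaussian distributions are related. Put simply, a binomial distribution with $n = 1$ is a Bernoulli distribution, and as $n \\rightarrow \\infty$ it is well approximated by a Gaussian distribution.\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.2. Model design\n", - "\n", - "As mentioned above, although this demo uses a discrete action space, there are only two actions, so the policy can be represented by a Bernoulli distribution (the special case of the binomial distribution with $n = 1$). A Bernoulli distribution is parameterized by a single probability, and the actions sampled from it can only be 0 or 1, like the heads and tails of a coin toss. The policy model is therefore an MLP that takes the state as input, with an activation function after its last layer that outputs the probability of choosing action 1. (If you are unsure what activation functions do, it may help to review the deep learning basics; in short, they add non-linearity to a neural network.) Since the output must be a probability between 0 and 1, the sigmoid function meets our needs. A reference implementation follows." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "import torch.nn as nn\n", - "import torch.nn.functional as F\n", - "class PGNet(nn.Module):\n", - " def __init__(self, input_dim,output_dim,hidden_dim=128):\n", - " \"\"\" Initialize the policy network, a fully connected network\n", - " input_dim: number of input features, i.e. the state dimension of the environment\n", - " output_dim: dimension of the output action\n", - " \"\"\"\n", - " super(PGNet, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim) # input layer\n", - " self.fc2 = nn.Linear(hidden_dim,hidden_dim) # hidden layer\n", - " self.fc3 = nn.Linear(hidden_dim, output_dim) # output layer\n", - " def forward(self, x):\n", - " x = F.relu(self.fc1(x))\n", - " x = F.relu(self.fc2(x))\n", - " x = 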
torch.sigmoid(self.fc3(x)) # squash to (0, 1): probability of choosing action 1\n", - " return x" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.3. Update function design\n", - "\n", - "As mentioned earlier, the loss function of the policy gradient algorithm is built from the gradient of our optimization objective:\n", - "$$\n", - "\\nabla_\\theta J(\\theta) = \\mathbb{E}_{\\pi_\\theta}\\left[\\Psi_{\\pi} \\nabla_\\theta \\log \\pi_\\theta\\left(a_t \\mid s_t\\right)\\right]\n", - "$$\n", - "\n", - "We split it into two parts, $\\Psi_{\\pi}$ and $\\nabla_\\theta \\log \\pi_\\theta\\left(a_t \\mid s_t\\right)$, and compute them separately. First consider the value part $\\Psi_{\\pi}$: in REINFORCE it is the discounted return from the current time step onward:\n", - "$$\n", - "G_t = \\sum_{k=t}^{T} \\gamma^{k-t} r_{k}\n", - "$$\n", - "\n", - "Implemented directly, this can be a little convoluted, so it helps to look at it backwards. Within one episode, let the terminal time step be $T$; then the corresponding return is $G_T=r_T$, and $G_{T-1}=r_{T-1}+\\gamma r_T$, i.e. $G_t=r_t+\\gamma G_{t+1}$. The code therefore uses a dynamic-programming trick:\n", - "```python\n", - "running_add = running_add * self.gamma + reward_pool[i] # running_add starts at 0\n", - "```\n", - "This update also loops backwards. After the first iteration the value is:\n", - "$$\n", - "running\\_add = r_T\n", - "$$\n", - "after the second iteration:\n", - "$$\n", - "running\\_add = r_T*\\gamma+r_{T-1}\n", - "$$\n", - "and after the third iteration:\n", - "$$\n", - "running\\_add = (r_T*\\gamma+r_{T-1})*\\gamma+r_{T-2} = r_T*\\gamma^2+r_{T-1}*\\gamma+r_{T-2}\n", - "$$\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "from torch.distributions import Bernoulli\n", - "from torch.autograd import Variable\n", - "import numpy as np\n", - "\n", - "class PolicyGradient:\n", - " \n", - " def __init__(self, model,memory,cfg):\n", - " self.gamma = cfg['gamma']\n", - " self.device = torch.device(cfg['device']) \n", - " self.memory = memory\n", - " self.policy_net = model.to(self.device)\n", - " self.optimizer = torch.optim.RMSprop(self.policy_net.parameters(), lr=cfg['lr'])\n", - "\n", - " def sample_action(self,state):\n", - "\n", - " state = torch.from_numpy(state).float()\n", - " state = Variable(state)\n", - " probs = self.policy_net(state)\n", - " m = Bernoulli(probs) # Bernoulli distribution\n", - " action = m.sample()\n", - " \n", - " action = 
action.data.numpy().astype(int)[0] # 转为标量\n", - " return action\n", - " def predict_action(self,state):\n", - "\n", - " state = torch.from_numpy(state).float()\n", - " state = Variable(state)\n", - " probs = self.policy_net(state)\n", - " m = Bernoulli(probs) # 伯努利分布\n", - " action = m.sample()\n", - " action = action.data.numpy().astype(int)[0] # 转为标量\n", - " return action\n", - " \n", - " def update(self):\n", - " state_pool,action_pool,reward_pool= self.memory.sample()\n", - " state_pool,action_pool,reward_pool = list(state_pool),list(action_pool),list(reward_pool)\n", - " # Discount reward\n", - " running_add = 0\n", - " for i in reversed(range(len(reward_pool))):\n", - " if reward_pool[i] == 0:\n", - " running_add = 0\n", - " else:\n", - " running_add = running_add * self.gamma + reward_pool[i]\n", - " reward_pool[i] = running_add\n", - "\n", - " # Normalize reward\n", - " reward_mean = np.mean(reward_pool)\n", - " reward_std = np.std(reward_pool)\n", - " for i in range(len(reward_pool)):\n", - " reward_pool[i] = (reward_pool[i] - reward_mean) / reward_std\n", - "\n", - " # Gradient Desent\n", - " self.optimizer.zero_grad()\n", - "\n", - " for i in range(len(reward_pool)):\n", - " state = state_pool[i]\n", - " action = Variable(torch.FloatTensor([action_pool[i]]))\n", - " reward = reward_pool[i]\n", - " state = Variable(torch.from_numpy(state).float())\n", - " probs = self.policy_net(state)\n", - " m = Bernoulli(probs)\n", - " loss = -m.log_prob(action) * reward # Negtive score function x reward\n", - " # print(loss)\n", - " loss.backward()\n", - " self.optimizer.step()\n", - " self.memory.clear()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.13 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "name": "python", - "version": "3.7.13" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "8994a120d39b6e6a2ecc94b4007f5314b68aa69fc88a7f00edf21be39b41f49c" - } - } - }, - "nbformat": 
4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/Q-learning/Q-learning探索策略研究.ipynb b/projects/notebooks/Q-learning/Q-learning探索策略研究.ipynb deleted file mode 100644 index 40583fd..0000000 --- a/projects/notebooks/Q-learning/Q-learning探索策略研究.ipynb +++ /dev/null @@ -1,32 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q learning with different exploration strategies\n", - "\n", - "Authors: [johnjim0816](https://github.com/johnjim0816)\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.13 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "name": "python", - "version": "3.7.13" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "8994a120d39b6e6a2ecc94b4007f5314b68aa69fc88a7f00edf21be39b41f49c" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/Q-learning/QLearning.ipynb b/projects/notebooks/Q-learning/QLearning.ipynb deleted file mode 100644 index debb47e..0000000 --- a/projects/notebooks/Q-learning/QLearning.ipynb +++ /dev/null @@ -1,459 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the Algorithm\n", - "RL algorithms follow a fairly fixed pattern. An agent usually provides sample (sample an action during training), predict (predict an action during testing), update (update the algorithm), and methods to save and load the model. Of these, sample and update differ from one algorithm to another, while the other methods are largely the same." - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import math\n", - "import torch\n", - "from collections import defaultdict\n", - "\n", - "class QLearning(object):\n", - " def __init__(self,n_states,\n", - " n_actions,cfg):\n", - " self.n_actions = n_actions \n", - " self.lr = cfg.lr # learning rate\n", - " self.gamma = cfg.gamma # discount factor\n", - " self.epsilon = cfg.epsilon_start # current exploration rate\n", - " self.sample_count = 0 # number of actions sampled so far (drives the epsilon decay)\n", - " self.epsilon_start = cfg.epsilon_start\n", - " self.epsilon_end = cfg.epsilon_end\n", - " self.epsilon_decay = cfg.epsilon_decay\n", - " self.Q_table = 
defaultdict(lambda: np.zeros(n_actions)) # Q table: maps each state (as a string key) to an array of Q(s,a) values, one per action; unseen states start at zero\n", - " def sample_action(self, state):\n", - " ''' Sample an action (used during training)\n", - " '''\n", - " self.sample_count += 1\n", - " self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \\\n", - " math.exp(-1. * self.sample_count / self.epsilon_decay) # epsilon decays exponentially with the number of sampled actions\n", - " # epsilon-greedy strategy\n", - " if np.random.uniform(0, 1) > self.epsilon:\n", - " action = np.argmax(self.Q_table[str(state)]) # choose the action with the largest Q(s,a)\n", - " else:\n", - " action = np.random.choice(self.n_actions) # choose an action uniformly at random\n", - " return action\n", - " def predict_action(self,state):\n", - " ''' Predict (choose) an action greedily (used during testing)\n", - " '''\n", - " action = np.argmax(self.Q_table[str(state)])\n", - " return action\n", - " def update(self, state, action, reward, next_state, terminated):\n", - " Q_predict = self.Q_table[str(state)][action] \n", - " if terminated: # terminal state: no future value\n", - " Q_target = reward \n", - " else:\n", - " Q_target = reward + self.gamma * np.max(self.Q_table[str(next_state)]) \n", - " self.Q_table[str(state)][action] += self.lr * (Q_target - Q_predict)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Define Training\n", - "The training procedure of RL algorithms is also fairly fixed:\n", - "```python\n", - "for i_ep in range(train_eps): # loop over episodes\n", - " state = env.reset() # reset the environment, i.e. start a new episode\n", - " while True: # for more complex games you can cap the steps per episode, e.g. while ep_step < 100 allows at most 100 steps\n", - " action = agent.sample_action(state) # sample an action with the exploration strategy\n", - " next_state, reward, terminated, _ = env.step(action) # interact with the environment for one step\n", - " agent.memory.push(state, action, reward, next_state, terminated) # store the transition (only for agents with a replay memory, e.g. DQN)\n", - " agent.update(state, action, reward, next_state, terminated) # update the agent\n", - " state = next_state # move to the next state\n", - " if terminated:\n", - " break\n", - "```\n", -
"首先对于每个回合,回合开始时环境需要重置,好比我们每次开一把游戏需要从头再来一样。我们可以设置智能体在每回合数的最大步长,尤其是对于比较复杂的游戏,这样做的好处之一就是帮助智能体在训练中快速收敛,比如我们先验地知道最优解的大概步数,那么理论上智能体收敛时也应该是这个步数附近,设置最大步数可以方便智能体接近这个最优解。在每个回合中,智能体首先需要采样(sample),或者说采用探索策略例如常见的$\\varepsilon$-greedy策略或者UCB探索策略等等。采样的过程是将当前的状态state作为输入,智能体采样输出动作action。然后环境根据采样出来的动作反馈出下一个状态以及相应的reward等信息。接下来对于具有memory的智能体例如包含replay memory的DQN来说,需要将相应的transition(记住这个词,中文不好翻译,通常是状态、动作、奖励等信息)。紧接着就是智能体更新,对于深度强化学习此时一般从memory中随机采样一些transition进行更新,对于Q learning一般是采样上一次的transition。更新公式是比较关键的部分,但是也很通用,一般基于值的算法更新公式都是一个套路如下:\n", - "$$\n", - "y_{j}= \\begin{cases}r_{j} & \\text { for terminal } s_{t+1} \\\\ r_{j}+\\gamma \\max _{a^{\\prime}} Q\\left(s_{t+1}, a^{\\prime} ; \\theta\\right) & \\text { for non-terminal } s_{t+1}\\end{cases}\n", - "$$\n", - "智能体更新完之后,通常需要更新状态,即```state = next_state```,然后会检查是否完成了这一回合的游戏,即```terminated==True```,注意完成并不代表这回合成功,也有可能是失败的太离谱,等同学们有了自定义强化学习环境的经验就知道了(等你长大就知道了XD)。\n", - "如果需要记录奖励、损失等等的话可以再加上,如下方代码,实际项目中更多地使用tensorboard来记录相应的数据,甚至于笔者就在这些教学代码中使用过,但是看起来有些繁琐,容易给大家增加不必要的学习难度,因此学有余力以及需要在项目研究中做强化学习的可以去看看,也很简单。\n", - "此外稍微复杂一些的强化学习不是一次性写完代码就能收敛的,这时需要我们做一个调参侠。为了检查我们参数调得好不好,可以在终端print出奖励、损失以及epsilon等随着回合数的变化,这点说明一下强化学习的训练过程一般都是先探索然后收敛的,官方的话就是权衡exploration and exploitation。e-greedy策略的做法就是前期探索,然后逐渐减小探索率至慢慢收敛,也就是这个epsilon。这个值越大比如0.9就说明智能体90%的概率在随机探索,通常情况下会设置三个值,epsilon_start、epsilon_end以及epsilon_decay,即初始值、终止值和衰减率,其中初始值一般是0.95不变,终止值是0.01,也就是说即使在收敛阶段也让智能体保持很小概率的探索,这样做的原因就是智能体已经学出了一个不错的策略,但是保不齐还有更好的策略,好比我们知道要出人头地学历高比较重要,但是“人还是要有梦想的,万一实现了呢”,总是存在意外的可能,对吧。回归正题,比较关键的是epsilon_decay这个衰减率,这个epsilon衰减太快了学来的策略往往过拟合,好比一条只能选择一朵花的花道上,你早早选择了一朵看起来还可以的花,却错过了后面更多的好花。但是衰减的太慢会影响收敛的速度,好比你走过了花道的尽头也还没选出一朵花来,相比前者不如更甚。当然强化学习的调参相比于深度学习只能说是有过之无不及,比较复杂,不止epsilon这一个,这就需要同学们的耐心学习了。\n", - "强化学习测试的代码跟训练基本上是一样的,因此我放到同一个代码段里。相比于训练代码,测试代码主要有以下几点不同:1、测试模型的过程是不需要更新的,这个是不言而喻的;2、测试代码不需要采样(sample)动作,相比之代替的是预测(sample)动作,其区别就是采样动作时可能会使用各种策略例如$\\varepsilon$-greedy策略,而预测动作不需要,只需要根据训练时学习好的Q表或者网络模型代入状态得到动作即可;3、测试过程终端一般只需要看奖励,不需要看epislon等,反正它在测试中也是无意义的。" - ] - }, - { - "cell_type": "code", - 
"execution_count": 50, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg,env,agent):\n", - " print('开始训练!')\n", - " print(f'环境:{cfg.env_name}, 算法:{cfg.algo_name}, 设备:{cfg.device}')\n", - " rewards = [] # 记录奖励\n", - " for i_ep in range(cfg.train_eps):\n", - " ep_reward = 0 # 记录每个回合的奖励\n", - " state = env.reset(seed=cfg.seed) # 重置环境,即开始新的回合\n", - " while True:\n", - " action = agent.sample_action(state) # 根据算法采样一个动作\n", - " next_state, reward, terminated, info = env.step(action) # 与环境进行一次动作交互\n", - " agent.update(state, action, reward, next_state, terminated) # Q学习算法更新\n", - " state = next_state # 更新状态\n", - " ep_reward += reward\n", - " if terminated:\n", - " break\n", - " rewards.append(ep_reward)\n", - " if (i_ep+1)%20==0:\n", - " print(f\"回合:{i_ep+1}/{cfg.train_eps},奖励:{ep_reward:.1f},Epsilon:{agent.epsilon:.3f}\")\n", - " print('完成训练!')\n", - " return {\"rewards\":rewards}\n", - "def test(cfg,env,agent):\n", - " print('开始测试!')\n", - " print(f'环境:{cfg.env_name}, 算法:{cfg.algo_name}, 设备:{cfg.device}')\n", - " rewards = [] # 记录所有回合的奖励\n", - " for i_ep in range(cfg.test_eps):\n", - " ep_reward = 0 # 记录每个episode的reward\n", - " state = env.reset(seed=cfg.seed) # 重置环境, 重新开一局(即开始新的一个回合)\n", - " while True:\n", - " action = agent.predict_action(state) # 根据算法选择一个动作\n", - " next_state, reward, terminated, info = env.step(action) # 与环境进行一个交互\n", - " state = next_state # 更新状态\n", - " ep_reward += reward\n", - " if terminated:\n", - " break\n", - " rewards.append(ep_reward)\n", - " print(f\"回合数:{i_ep+1}/{cfg.test_eps}, 奖励:{ep_reward:.1f}\")\n", - " print('完成测试!')\n", - " return {\"rewards\":rewards}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3、定义环境\n", - "\n", - "OpenAI Gym中其实集成了很多强化学习环境,足够大家学习了,但是在做强化学习的应用中免不了要自己创建环境,比如在本项目中其实不太好找到Qlearning能学出来的环境,Qlearning实在是太弱了,需要足够简单的环境才行,因此本项目写了一个环境,大家感兴趣的话可以看一下,一般环境接口最关键的部分即使reset和step。" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [], - 
"source": [ - "import gym\n", - "import turtle\n", - "import numpy as np\n", - "\n", - "# turtle tutorial : https://docs.python.org/3.3/library/turtle.html\n", - "\n", - "class CliffWalkingWapper(gym.Wrapper):\n", - " def __init__(self, env):\n", - " gym.Wrapper.__init__(self, env)\n", - " self.t = None\n", - " self.unit = 50\n", - " self.max_x = 12\n", - " self.max_y = 4\n", - "\n", - " def draw_x_line(self, y, x0, x1, color='gray'):\n", - " assert x1 > x0\n", - " self.t.color(color)\n", - " self.t.setheading(0)\n", - " self.t.up()\n", - " self.t.goto(x0, y)\n", - " self.t.down()\n", - " self.t.forward(x1 - x0)\n", - "\n", - " def draw_y_line(self, x, y0, y1, color='gray'):\n", - " assert y1 > y0\n", - " self.t.color(color)\n", - " self.t.setheading(90)\n", - " self.t.up()\n", - " self.t.goto(x, y0)\n", - " self.t.down()\n", - " self.t.forward(y1 - y0)\n", - "\n", - " def draw_box(self, x, y, fillcolor='', line_color='gray'):\n", - " self.t.up()\n", - " self.t.goto(x * self.unit, y * self.unit)\n", - " self.t.color(line_color)\n", - " self.t.fillcolor(fillcolor)\n", - " self.t.setheading(90)\n", - " self.t.down()\n", - " self.t.begin_fill()\n", - " for i in range(4):\n", - " self.t.forward(self.unit)\n", - " self.t.right(90)\n", - " self.t.end_fill()\n", - "\n", - " def move_player(self, x, y):\n", - " self.t.up()\n", - " self.t.setheading(90)\n", - " self.t.fillcolor('red')\n", - " self.t.goto((x + 0.5) * self.unit, (y + 0.5) * self.unit)\n", - "\n", - " def render(self):\n", - " if self.t == None:\n", - " self.t = turtle.Turtle()\n", - " self.wn = turtle.Screen()\n", - " self.wn.setup(self.unit * self.max_x + 100,\n", - " self.unit * self.max_y + 100)\n", - " self.wn.setworldcoordinates(0, 0, self.unit * self.max_x,\n", - " self.unit * self.max_y)\n", - " self.t.shape('circle')\n", - " self.t.width(2)\n", - " self.t.speed(0)\n", - " self.t.color('gray')\n", - " for _ in range(2):\n", - " self.t.forward(self.max_x * self.unit)\n", - " self.t.left(90)\n", - " 
self.t.forward(self.max_y * self.unit)\n", - " self.t.left(90)\n", - " for i in range(1, self.max_y):\n", - " self.draw_x_line(\n", - " y=i * self.unit, x0=0, x1=self.max_x * self.unit)\n", - " for i in range(1, self.max_x):\n", - " self.draw_y_line(\n", - " x=i * self.unit, y0=0, y1=self.max_y * self.unit)\n", - "\n", - " for i in range(1, self.max_x - 1): # cliff cells along the bottom row\n", - " self.draw_box(i, 0, 'black')\n", - " self.draw_box(self.max_x - 1, 0, 'yellow') # goal cell\n", - " self.t.shape('turtle')\n", - "\n", - " # convert the wrapped env's state index self.s into grid coordinates\n", - " # (gym counts rows from the top, while turtle's y axis points up)\n", - " x_pos = self.s % self.max_x\n", - " y_pos = self.max_y - 1 - int(self.s / self.max_x)\n", - " self.move_player(x_pos, y_pos)" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "def env_agent_config(cfg,seed=1):\n", - " '''Create the environment and the agent\n", - " ''' \n", - " env = gym.make(cfg.env_name,new_step_api=True) \n", - " env = CliffWalkingWapper(env)\n", - " n_states = env.observation_space.n # number of discrete states\n", - " n_actions = env.action_space.n # number of discrete actions\n", - " agent = QLearning(n_states,n_actions,cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Set the Parameters\n", - "\n", - "At this point all the Q-learning modules are complete. Next we set some parameters so you can experiment with tuning; the defaults are values the author has already tuned. We also define a plotting function that shows how the reward changes over training." - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [], - "source": [ - "import datetime\n", - "import argparse\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "class Config:\n", - " '''Configuration parameters\n", - " '''\n", - " def __init__(self):\n", - " self.env_name = 'CliffWalking-v0' # environment name\n", - " self.algo_name = 'Q-Learning' # algorithm name\n", - " self.train_eps = 400 # number of training episodes\n", - " self.test_eps = 20 # number of test episodes\n", - " self.max_steps = 200 # maximum steps per episode (not enforced by the train/test loops above)\n", - " self.epsilon_start = 0.95 # initial epsilon of the e-greedy strategy\n", - " self.epsilon_end = 0.01 # final epsilon of the e-greedy strategy\n", - " self.epsilon_decay = 300 # epsilon decay constant: larger values make epsilon decay more slowly\n", - " self.gamma = 0.9 # discount factor\n", - " self.lr = 0.1 
# learning rate\n", - " self.seed = 1 # random seed\n", - " if torch.cuda.is_available(): # use a GPU if available\n", - " self.device = torch.device('cuda')\n", - " else:\n", - " self.device = torch.device('cpu')\n", - "\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve with an exponential moving average\n", - " '''\n", - " last = data[0] # First value in the plot (first timestep)\n", - " smoothed = list()\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # compute the smoothed value\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,title=\"learning curve\"):\n", - " sns.set()\n", - " plt.figure() # create a new figure so that several plots can be drawn\n", - " plt.title(f\"{title}\")\n", - " plt.xlim(0, len(rewards)) # set the x-axis range\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. I'm Ready!\n", - "\n", - "Now we can finally shout \"I'm ready!\" like SpongeBob. Follow the comments to see the results." - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "c:\\Users\\24438\\anaconda3\\envs\\easyrl\\lib\\site-packages\\gym\\core.py:318: DeprecationWarning: \u001b[33mWARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.\u001b[0m\n", - " \"Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API.
This will be the default behaviour in future.\"\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "开始训练!\n", - "环境:CliffWalking-v0, 算法:Q-Learning, 设备:cuda\n", - "回合:20/400,奖励:-126.0,Epsilon:0.010\n", - "回合:40/400,奖励:-43.0,Epsilon:0.010\n", - "回合:60/400,奖励:-37.0,Epsilon:0.010\n", - "回合:80/400,奖励:-52.0,Epsilon:0.010\n", - "回合:100/400,奖励:-49.0,Epsilon:0.010\n", - "回合:120/400,奖励:-38.0,Epsilon:0.010\n", - "回合:140/400,奖励:-26.0,Epsilon:0.010\n", - "回合:160/400,奖励:-23.0,Epsilon:0.010\n", - "回合:180/400,奖励:-17.0,Epsilon:0.010\n", - "回合:200/400,奖励:-36.0,Epsilon:0.010\n", - "回合:220/400,奖励:-18.0,Epsilon:0.010\n", - "回合:240/400,奖励:-29.0,Epsilon:0.010\n", - "回合:260/400,奖励:-13.0,Epsilon:0.010\n", - "回合:280/400,奖励:-16.0,Epsilon:0.010\n", - "回合:300/400,奖励:-13.0,Epsilon:0.010\n", - "回合:320/400,奖励:-14.0,Epsilon:0.010\n", - "回合:340/400,奖励:-13.0,Epsilon:0.010\n", - "回合:360/400,奖励:-13.0,Epsilon:0.010\n", - "回合:380/400,奖励:-13.0,Epsilon:0.010\n", - "回合:400/400,奖励:-13.0,Epsilon:0.010\n", - "完成训练!\n", - "开始测试!\n", - "环境:CliffWalking-v0, 算法:Q-Learning, 设备:cuda\n", - "回合数:1/20, 奖励:-13.0\n", - "回合数:2/20, 奖励:-13.0\n", - "回合数:3/20, 奖励:-13.0\n", - "回合数:4/20, 奖励:-13.0\n", - "回合数:5/20, 奖励:-13.0\n", - "回合数:6/20, 奖励:-13.0\n", - "回合数:7/20, 奖励:-13.0\n", - "回合数:8/20, 奖励:-13.0\n", - "回合数:9/20, 奖励:-13.0\n", - "回合数:10/20, 奖励:-13.0\n", - "回合数:11/20, 奖励:-13.0\n", - "回合数:12/20, 奖励:-13.0\n", - "回合数:13/20, 奖励:-13.0\n", - "回合数:14/20, 奖励:-13.0\n", - "回合数:15/20, 奖励:-13.0\n", - "回合数:16/20, 奖励:-13.0\n", - "回合数:17/20, 奖励:-13.0\n", - "回合数:18/20, 奖励:-13.0\n", - "回合数:19/20, 奖励:-13.0\n", - "回合数:20/20, 奖励:-13.0\n", - "完成测试!\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "c:\\Users\\24438\\anaconda3\\envs\\easyrl\\lib\\site-packages\\seaborn\\rcmod.py:400: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", - " if LooseVersion(mpl.__version__) >= \"3.0\":\n", - "c:\\Users\\24438\\anaconda3\\envs\\easyrl\\lib\\site-packages\\setuptools\\_distutils\\version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n", - " other = LooseVersion(other)\n" - ] - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkgAAAHJCAYAAAB+GsZPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAACFiElEQVR4nO3dd3hUZcLG4d+Zkt6BJFTpofcWQVB0BRXRRV1dERXFrqhYwBU72CgWFLF3XVfFCvYC6kcHAUF6bwkhjfTMzPn+mMwwkwIBEkIyz31dXpJzzpx536nPvO0YpmmaiIiIiIiXpaYLICIiInKyUUASERERKUUBSURERKQUBSQRERGRUhSQREREREpRQBIREREpRQFJREREpBQFJBEREZFSFJBqMa3xKSIiUj0UkGqpn376ifHjx1fJuWbPnk1SUhK7du2q1ttI7ZCUlMSMGTNOyH2tX7+eCy+8kE6dOnHuuece8dj//Oc/DB48mC5dunD66aczbtw4Vq5cWan72rVrF0lJScyePbsqil6tJkyYwODBg0/IfeXk5HDjjTfStWtXevfuzbZt26rlflJSUnj66acZOnQoXbt2ZcCAAdx4440sXbrU77hRo0YxatQo79+lX49z5szhjDPOoFOnTjz44IPs27ePkSNH0rlzZ/r06UO7du148skny9z/o48+SlJSEg899FCZfY899hjdunWjqKioUnXxLdOiRYtISkpi0aJFFR4/ePBgJkyYUKlznyirV69m1KhRdO/enQEDBjB9+vRK1z9Q2Gq6AHJs3nrrrSo71+mnn85HH31EfHx8td5GpLQXX3yRPXv28OKLLxIXF1fhcZ9//jkPPPAAHTp04LbbbqNx48bs27ePTz75hH//+9/cc889jB49+gSWvHrdfPPNXHnllSfkvj7//HN++eUXHnzwQdq0aUOTJk2q/D6WLVvGLbfcQmxsLFdeeSUtWrQgMzOTjz76iFGjRvHEE09w4YUXlnvbjz76iMTERO/fjz76KM2bN+fJJ58kISGBt99+mz///JMpU6aQkJDA008/zYoVK8qc57fffiMmJobff/+9zL4lS5bQq1cvgoKCqqzOvl544QUiIiKq5dzHYufOnYwePZpu3brx7LPPsnnzZp555hkyMzN59NFHa7p4Jw0FJCEuLu6wX05VdRuR0jIyMmjbti2DBg2q8Jh169YxceJEhg0bxuOPP47Fcqjhe/jw4UyePJmnnnqKpKQkTj311BNR7GrXrFmzE3ZfmZmZAFx++eUYhlEt57/jjjto3rw5b775JqGhod59Q4YM4frrr+fBBx9kwIAB1K9fv8ztu3XrVuZ8/fv3p2/fvt6/4+PjvS2Q/fr14/XXX6ewsJDg4GAAduzYwY4dO7jrrruYNm0aW7ZsoWXLlgBkZ2ezceNG/vnPf1Z53T06dOhQbec+Fq+++irh4eHMnDmToKAgBg0aREhICI899hg33ngjjRo1qukinhTUxVYLjRo1isWLF7N48WJv066nmfe///0vZ5xxBj169OCPP/4A4OOPP2bEiBF069aNLl26cMEFF/DNN994z1e6u2zChAlcffXVfPrppwwZMoROnTpxwQUXMH/+/OO6DcCKFSsYOXIk3bp14/TTT
vq/bWze7Z52XZn+d3B3be3an0MTn2bpomL3rDOPPWm5ZOcVMa1kHE1UmJ28QndLye60XL/uhl37c9mdlsvu/Tk0T4z0tvy0aRLNxl1Z3nDkq3e7BDbvzmbrXvd6MNv2HiSYIi5I2MmBnLVYnYXEWnLLHY/RwOrf2tTK7p7pY2LhgDOMCEshIUYxqdYE9haEUOiy0qlvX6yndGfyB6uwFOfRypZCgjWbLEs0oy45nfzwhkx8cSEWXJze4xRGnt3W7z6CgHNadQdgXU4U2/Ye5NTBXbFZDX5Ylc7GXVl8lteLwSFryHCFExkTS9OktsS17UquKwRXcBSYIcx4/ncAHhvQl8j9Oby/dQ2N64cT3bCp977qRx/qzrhqaBL/HNiS6JIvrX4dE+nXkSrTJL7yXRNAmXEY7jV01CJQFUp/WEeFVW1QiQoP8gakSJ/7Kh3EoqspIEnNKN1qcrKEgppQusWo9GNTUxSQqsln87cAsGRdCqd2OrS6RnZeERt3ZtGj7aGF6ILtVm9X1W8r99KiUSQ5ecUM6taYNVvTvR+e4A5IDpfL53yH9mUcLGTN1kODOvel5/HAa4sA96QR03R3e/Tv3JCNu8ouVBFMEZ3yFnFdzFpMRxGb3/uB/qaV0bH7CN7m4BQrh7q0LFZ2FUXhwEpks3bsCmvHohVbcGHQoVtnflm2nfb2PWS6whg2/EzmrjjA6i0HiAyxMGFUb6Iy88nNd5BQ0rU09fbB/LRsN//9aSMAHZvHYk9sjR2IiQwh42AhLRtFHfYxH3V2kt/f913Rk9fnrOX31fB7YTsAruyfRLteTQmLDaewZBBnJHDZ4NYUOlw0qhdGw3phXJZTRNumFc+OMAzjpPrCKj0OpypbOQJd6S+uyCoOSL6hKMIvIB26H6vFKBOCpXY7WVtNakLZ7saT47HQO64a5BUcCi1pWf4Do9/7bj1L1+/n2vPa079zQwqLnT7jeGDT7ix+Xr4Lp8u9qGD75v5jDvYcyCUnr+IZUeXNyIJDC5c2bhBO55b1CLJZiI+yc0WnQpYvXkWuGcw5oSuJXF9AJwvuphkfRnQiRvM+pOTbaJhYj+Am7QnLsbM7LYdOHRPJ2JTGmsXuKdX/6tiKn/7O4Y8cd/fQDU0SuLxeHCs2xnJa14aEh9hpWM9/sLnVYqFv+3hvQKofc6i15oIBLVi5KY1ubepXWO+KlA4OYRUM6D3bZxCnAZzdu2m5x52swkut9xQZevKEt9ouolTYjAyv2g9v3zDre1+lt5e3bIfUXidrKKgJZVrTTpIfAydHKWqxbfuysdusfpeiOJB9aEZYSrr7siiL1qYQExHEXyUtPH9uSiO5UyIH8/yXVPcdAL07LZe9B9zji7q0qseqzQfYlZpDRsmS9D2TGrBs/f4yZWrdJJq0zPyS1XShd7t474q7SfE2wtZ9zZQ26+HANviriPN8xtoaUQkEtR8EweH839pUtm1PxajXjCv/NQzDMPCdX3RKxKGBqJ76B9utNG4QTouGUazYmEZMRBARoXYiQu0M7Xv4mSTREcF0b1OfPzem+a1VMrBrIwZ2Pba1S0JLXZqjrv4KL10vtSBVnWpvQfIbd3Tovny78qKr+D6l5pUJSHX0s6kyfAOSxTAqnJl6op0cpail8gqKeeK95RQ7XNxxSVe6tHLPkDng02q0MzWHnak5vPzlGr/bLlu/n7te+MP7xRYSZKWgyElpnkUf+3ZIYNXmA6RkuANXaLCV/p0alhuQ/jmgBXmFDr5esJ2rh7YjNTOfJetSOcW6n7MPfEXRnkNT7a0RcazMCKeJNY0Vzracf8ltGFZ3mQYmmURvTKNpQsQRZ/rEx4Zx3fkdiAoLwma10LKROyA1jT+6a3Fdf35HMnMKjzhDqrJKv9FKt7TUFcFBVm83KiggVaWIkNJjkKqvBcm35c9v+0nUnStVIyzYhoF7Mm5YsA2rpXrW8qkNfFuMwkJsJ81aVQpIx2F/ZgHFJbNiXpi9mqduTCY2M
pgDPusN7T2Qy+bd5V+YJiu3iKySi/IlxIaRlVvobfXxzBrz6NKqnl+IatEwilaND43J6dM+nsV/p9K2aYx3Sm+PNvVx7lhJvS2/cF/0dvdsskIwIhsQ3P18ghonUa9FKx5/8if2pOXSuEE4w62HXhIWw/BbJuBIfK+5NbhHEzJzijitS8PD3KKs4CBrlYUjKNuyUlEXW21nMQzCgt2LW0LVt3IEMt9ft0E2C8H2qh387vtcVTQGqaoHhkvNs5SMK8stcAT0+CNw/5C1GO4rHZxMj0Xd/LaoZnvScvl28Q6a+sw2czhdbNmTRc+keL8FGZ0u09u9dTiRYXbCQmzegHRal0bsTM2hqNhFvagQwkPstG4c7e2iS2oWS2RYEE3jI9iZmsN5yc35R6+mNImPwJWdSvH63yjetBDzoLuFKdHqvl6RvW1/QvpdhhESgdVmwTAMEuJC2ZOWS0wV/koNDbYx8h9tj3xgNSvdghRah5uxPR+2oBakqhQSZMVqMXC6TCLD7FX+67YyY5BOpgkBUnUiQu3kFjgCevwRuCe9hIXYyMkvPqnWgzp5SlILbN2bjcUweOmLv0gt6erylZrp3nag1IrVvitPg/uCo56LoHpEhgURFmLzHts8MZI2jaNZsy2DZgnuIHbThZ1Yuy2DwmIHPZPclwS4dURn0rIKaFoyLdyZsoncb6ZBUUn5gsOxJw3E1rQzlqh4LJFlBzonxLpbbKIj6t56KKHB/r/262oLEkBYsB33Ak/+6+nI8TEMg4hQO1m5RdXSMuc3BqmCFqSqHhguJ4eIUDspGfknVatJTYkoWQ+sdJd2Taq73xZVYNHaFBatTWHMsA64TJOn3l+O1Wohv4KZYvsz3V9OnoCUGBfGvvRDK0FfflYbYiKC6dUunv6dG/LOt+tYs80diKLC7cSXzNyyGAYN64VzaqeGrNmWQfc27m6u0GAbPZP8u7waxITSICYUV/Z+itb+RPGaH8HpwNKgBUGd/oGtRU8M2+GDz4AuDdmyO+uou8NqA98WpGC7FVs1XbPnZODbnagWpKpVvQGp/BaksBCbt+VKXWx1k6flKOIkajWpKZ6Wo5OpNU3PymF4BlZ/On8zbZvEUORwQTkr8TaqH86etFz2l6xo7Rmkff6pzXn160PXKhvUrZF38b4GMaEkxIV5A1JkWBCnJLrHFDVLiMBus5DcKZGuresdcUR/4fIvKFr6mfdva7OuhJ55M4a9ci1CzRIimXBFz0odW9v4thjV1RlsHp76GZQdWCzHx/OhXdUDtN3nLL8FyeLTchWlLrY6ydNydDKFgpri+cw6mVrT6vY3RhXZtvcgxcUVX6IgqWkMe9JySc3Mx+F0kVUyjqhjC//rH5Ve2Tg+9tBg5MgwOy0bRTH2oi40qn9oe9hhvuhc+dkU/TmH4tXfAWBt3IGgzkOxNu180swCqGmhgRSQgg/9ArPU0otWnqw8waW6WpD6dnBffLf0+71Lq3qs3JRG83Ku0Si1X7OESP7vr300O8rZvnWRZ6bmydT6Xbe/MarIgewCMg4WVLg/qVkMv6zYzYGsQlIz8jFxXxE8MszOlUOTeOfb9ZzZo0mZ2yXEHloM0fMrsrKLITrTtpM352kozAUgqPfFBHcfdhS1Cgx+AakOjz+CQ0sYnEwfMHVFYj33j5aG9atuhqWHYRjcMLz869SMPrc9LpepwFtH/aNXE3q0rU+9qJAjH1zHnd2rKVaL4XfliZpWt78xqkh2btFh97dsFIXNasHhdLFio3vWWJMG4RiGwendGnNKQiSNSq0cDfhNZz+aX6auzL3kz50KhblYYpsQ1PMC7C17V/r2gSTIZvGO46jrAckzQy/yJGqiriuG929O11b1j3i5m+qgcFR3GYbhd43HQNYkPoKrhrar6WL4qdvfGCeAYUBMRDANYkLYeyCPJX+7p/Q3Tzz0QdqiYfkfqvWjD/1qqOwqqq7sVPLmTMEsOIil/imEDRuPEVT1v2rrCqNkVdac/
OLDdlfWBTER7pAdF61fo1XNbrPSuknF1+YTkbpHAekwQoOt5BceWt26fnRImWurRYe7V45uEBPK3gN57EjNAQ5dguNwbFYLV5/TjqxKrhzt2LGS/F9ecbccxTQi9Jy7FI4qITTYWhKQ6vbLvU/7BAqKnHRvffTXrBMREX91+xvjOLlM/78HdG7I579v9dvm6Ttu0TCKVZsPeLdXdlBlZa8xVrj8S4qWzgbA0qAloWffhiX0xDf310aecUh1vYst2G7lH71q10V2RUROVnV3UZhjlJNfzGNvL+Hr/9uGw2dKf2xkMB18ZqV1a12f5I4JnN+/BQCnd/MPOo3qlx1zdKwc2//0hiN7h8GEDb8PS3hslZ2/rvMEo7regiQiIlVH3xilfPTzRrbuPcjWvQf9tndqEUdc5KF1hRLjwvjX4Nbev6MjgmnbNIYNOzOxWowqW5DQLMqn4Pd3ALB3HkJI8r+r5LyBJCEujHU7Mqv0Gm8iIlK3KSCVsmz9/jLbzujemOH9mxMeavdefTk6ouyssxsv6Mj7P2yodLdZZRQu/hgzN919gdneI6rsvIHkssFtGNC5YY3MQBIRkdpJAclHTn4xBUXOMtsvO7O1d5HHqPAgsnKLyg1IMRHB3PLPzlVWHseevyle+zMAIQNHH/GSIVK+4CArrRprBpKIiFSexiD5WLkxrdztVp/usiYlF4VtXD+iWsviTN9F/vczALAnnYatcYdqvT8RERE5RC1IPlIz88tss1kNLD6X7bhheEcOZBXQNL76ApLpdFDwwwtQlIcloTXBp15RbfclIiIiZSkg+TBNs8y20oOtI0Lt1X4xvaJV3+LK2ocRGkXYkDsqfdFZERERqRrqYvNRTj6qstloleXKy6RoxZcABPe7DCOkervyREREpCwFJB/ltSDZbSf2ISpa/hU4irDEt8TWOvmE3reIiIi4KSD5KKcBCZv1xF0o0pW1j+J1vwIQ3OcSDEMXqRQREakJCki+arCLzXS5KPj1dXA5sTbtjK1R+xNyvyIiIlKWApIPVzkJyX6CAlLx+vk4UzaCPYSQAVedkPsUERGR8ikg+SqvBekEjEEyTZPi1d8BENzzQiyRuhq7iIhITVJA8uGqoS425+41uDL3gj0Ee7tB1X5/IiIicngKSH7K62Kr/oHSnsuJ2Nv2xwgKrfb7ExERkcNTQPJRE+sgmYW5OHasBMDe/vRqvS8RERGpHAUkH+WupF3NY5CKty4FlxNLXBOscU2r9b5ERESkchSQfJS3DlJ1z2JzbFoIgK11v2q9HxEREak8BSQfJ7qLzZm6BeeevwEDe6u+1XY/IiIicnQUkHycyEuNmKZJ4eKPAbC1ORVLZINquR8RERE5egpIR1BdLUiu1M3u1iOLjeBe/6yW+xAREZFjo4Dko9x1kGzVM82/2DP2qGVvLQwpIiJyklFA8lVeF1s1tCCZLheOLYvd59fgbBERkZOOApKPEzVI27l3HWZ+NgSHY23cscrPLyIiIsdHAcmHJx8F2Q89LNURkByb3d1r9ha9MKy2Kj+/iIiIHB8FJB+eWWxBNqt3W1XPYjOdDoq3LgO09pGIiMjJSgHJh6eLLdivBalqB2k7d/0FhbkYYTFYE5Oq9NwiIiJSNRSQfJglnWxB9kMtSFXdxVZcMjjb1rI3hkUPv4iIyMlI39A+PC1I1dXFZpoudwsSYGveo8rOKyIiIlVLAcmHNyBV0yBt14Gd7tlrtmCsCW2q7LwiIiJStRSQ/FRvF5tz9xoArI3aafaaiIjISUwBycehLrZDD4u9CgdpO3a5A5KtSacqO6eIiIhUPQUkH4dmsfm0IFXRGCTTUYRz33oALQ4pIiJyklNA8uFdB8lnDFJVDdJ27tsATgdGeByWmIZVck4RERGpHgpIPrwraduqfgySo2T2mrVxRwyjei6AKyIiIlVDAcnHoUuN+Ezzr6KA5PSOP1L3moiIyMlOAclHeV1sVdGC5MrLwpW+EzCwKiCJiIic9
BSQfJWzUGRVDNJ27tsAgCWuCZaQyOM+n4iIiFQvLcbjw+XTghQXFUxRsYvwkON/iDwByZrY9rjPJSIiItVPAakcFsPg4dF9cLrMKulic+7bCIA1Uatni4iI1AYKSD486yAZBkSE2qvmnMUFuA7sANSCJCIiUltoDJIPzyDtqpyG70zdAqYLI6Ieloi4KjuviIiIVB8FJB/mkQ85ahp/JCIiUvsoIPk41IJUdefU+CMREZHaRwHJh3cMElWTkEyXE2fKJkABSUREpDZRQCpHVbUguQ7sBEchBIViiW1cNScVERGRaqeA5MNVxV1szpSS7rWENhiGHmoREZHaQt/avqq4i03jj0RERGonBSQfLp91kKqCc/9WAKwNWlbNCUVEROSEUEDy421COv4zFeZiHtwPgLX+Kcd/QhERETlhTnhAWrZsGUlJSWX+W7RokfeYBQsWMGLECLp27crQoUOZM2eO3zkKCwt55JFHSE5Opnv37tx1112kp6cfd9k8s9gsVdCE5EzbDoAR2QAjJOK4zyciIiInzgm/1Mj69etp1qwZH3zwgd/26OhoADZv3swNN9zA6NGjmTJlCr/++iv33nsvcXFxJCcnA/Dwww+zdOlSZsyYQVBQEA899BBjx47lvffeO66yedZBqgqHuteaV9k5RURE5MQ44QFpw4YNtG7dmgYNGpS7/+233yYpKYk777wTgFatWrF27Vpee+01kpOTSUlJ4fPPP2fWrFn06tULgOnTpzN06FBWrFhB9+7dj7lsnnhUFS1IrrRt7nPVb37c5xIREZETq0ZakHr27Fnh/qVLl3LWWWf5bevXrx+TJ0/GNE2WLVvm3ebRokULEhISWLJkyXEFJA+r1cBmO77eR1dJF1tQQovjPld1sVotfv8PFKq36h0IVG/VOxBU5ZUvSjvhAWnjxo3ExsYyYsQIUlJSaNu2LXfeeSddunQBYN++fSQmJvrdJj4+nvz8fDIyMkhJSSE2Npbg4OAyx+zbt++4ymaxuF9YEREhxMaGH/N5XEUFZGSnAlCvVTus4cd+rhMhKiq0potQI1TvwKJ6BxbVW45XlQakXbt2ceaZZ1a4/9dff+XgwYPk5eUxceJErFYr7733HldccQWzZ8+mdevWFBQUEBQU5Hc7z99FRUXk5+eX2Q8QHBxMYWHhcZXf4XQBkJdXSEZG7rGfJ9U9/sgIjSS7yAZFx36u6mS1WoiKCiU7Ox9nSd0DgeqtegcC1Vv1DgTR0aHexo2qVqUBKSEhgblz51a4Pz4+niVLlhAaGordbgegc+fOrF27lnfffZdHHnmE4OBgioqK/G7n+Ts0NJSQkJAy+8E9sy009PiSs6tkISSX08ThOPYXWHHaLgAsMY2O6zwnitPpqhXlrGqqd2BRvQOL6h0YqnBuVRlVGpDsdjutWrU67DFRUVF+f1ssFlq1akVKSgoADRs2JDU11e+Y1NRUwsLCiIyMJDExkczMTIqKivxaklJTU0lISDjOGnguNXJ8nZqujN0Auv6aiIhILXVCR3PNnz+f7t27s3PnTu82h8PBunXraN26NQC9evVi8eLFfrdbuHAhPXr0wGKx0LNnT1wul3ewNsDWrVtJSUmhd+/ex1U+bxI9zkFfTm9AanR8JxIREZEacUIDUo8ePYiNjWX8+PH89ddfrF+/nvHjx5OZmcnVV18NwKhRo1i1ahVTp05l8+bNvPHGG3z77beMGTMGcHfjnXfeeUycOJFFixaxatUqxo0bR58+fejWrdtxlc+sooW0XRl7ALUgiYiI1FYnNCBFRETw1ltvUb9+fa699louvfRSMjMzee+996hfvz4Abdq0YebMmcybN48LL7yQjz/+mClTpngXiQR47LHHSE5O5tZbb+Xaa6+lZcuWPP/888ddPs9CkcfTxWYWF2IeTAMUkERERGqrEz7Nv1mzZkcMMwMHDmTgwIEV7g8LC2PSpElMmjSpSsvm7WE7jiYkV+ZewMQIicQSGnXE40VER
OTkE1grSh3BoRakYz+HS+OPREREaj0FJB+HxiAde0LSDDYREZHaTwGpPMfRgqQZbCIiIrWfApIPV0kT0vE8KJrBJiIiUvspIPk6zlHapkMz2EREROoCBSQfnnxkOcYuNs1gExERqRsUkHyYx3lRF1e6xh+JiIjUBQpIPryz2I6xi00z2EREROoGBSQfx7sOktM7QFstSCIiIrWZApKPQ9eqVQuSiIhIIFNA8uXtYjuGm2oGm4iISJ2hgOTDdWgp7aO/rWawiYiI1BkKSOU4li42zWATERGpOxSQfLiOY5C2K1MraIuIiNQVCki+jmMMkruLDSwxDauwQCIiIlITFJB8HM8sNldWCgCW6IQqLJGIiIjUBAUkH8e6DpJpunBlpwJgiU6s6mKJiIjICaaA5ONYLzRi5maAsxgMK0ZEvSotk4iIiJx4Ckg+PLP8LUfZhOTpXjOiGmBYrFVdLBERETnBFJB8mMe4DpIrax+g8UciIiJ1hQJSOY72YrXeAdpRCkgiIiJ1gQKSD9cxLqStGWwiIiJ1iwKSr2OdxeadwaaAJCIiUhcoIPnwDkE6ioRkmiaunJKL1EY2qI5iiYiIyAmmgOTDLJnofzQNSGZhDjiK3LcLj62GUomIiMiJpoDkwzyGS42YOQfctwmNwrAFVUOpRERE5ERTQPJheq81UvmE5PIEpIj61VAiERERqQkKSD6OqYstJx0AS0RcNZRIREREaoICko9j6WI71IKkS4yIiIjUFQpIvrzrIB3FLLaSgGRRQBIREakzFJB8eLvY1IIkIiIS0BSQfBzTOkgHS1qQIhWQRERE6goFJB/eWWyVPd5ZjJmfBagFSUREpC5RQPJhliQkSyUbkMzcDPc/rHaM4IhqKpWIiIicaApIPrwNSJXsYnPlZboPD489qm45ERERObkpIJUwffrXKpt1PC1IlrCYaiiRiIiI1BQFpHJUti3IE5B0DTYREZG6RQGpHJXtLnMpIImIiNRJCkjlqHQXW8kYJHWxiYiI1C0KSOVQF5uIiEhgU0Aqh7rYREREApsC0jEyTdOni00BSUREpC5RQCrhu4h2pRqQCnPBWew+Piy6WsokIiIiNUMBqRyV6WJz5ZV0rwVHYNiCqrtIIiIicgIpIHn4NCFVpgHJzM10H6vxRyIiInWOAlI5KtOCdGgGW0w1l0ZERERONAWkclRmDJKni00DtEVEROoeBaRjpDWQRERE6i4FpHJYKjNI2zMGSatoi4iI1DkKSOWpRBeb6eliUwuSiIhInaOAVI7KzWJTF5uIiEhdpYBUwvSZ53+kWWymy4GZf9B9rLrYRERE6hwFpFIq1XqUlwWYYLFihEZWd5FERETkBFNAKuWo1kAKi8Ew9BCKiIjUNfp2L6VyayBluo/V+CMREZE6SQHJo2QIUmUCkqcFyaLxRyIiInWSAlIZR3OZEbUgiYiI1EUKSKVUqostP8t9bGh0NZdGREREaoICUgnPJP9KdbGVTPG3hEZVX4FERESkxigglWJUpostP9t9bJgCkoiISF2kgFRK5VqQSgJSiAKSiIhIXaSAVMqRApJpmocCkrrYRERE6iQFpDKOkJCK88HlcB+pgCQiIlInKSB5lIzSthypBamk9Qh7CIYtqHrLJCIiIjVCAekoudS9JiIiUucpIJVypGuxafyRiIhI3VetAenBBx9kwoQJZbYvWLCAESNG0LVrV4YOHcqcOXP89hcWFvLII4+QnJxM9+7dueuuu0hPTz+qcxytyq6D5AlIWgNJRESk7qqWgORyuZg+fTofffRRmX2bN2/mhhtu4LTTTmP27Nlccskl3HvvvSxYsMB7zMMPP8zvv//OjBkzePvtt9myZQtjx449qnMcq0q3IGmKv4iISJ1lq+oTbt68mfvvv5/t27fTqFGjMvvffvttkpKSuPPOOwFo1aoVa9eu5bXXXiM5OZmUlBQ+//xzZs2aRa9evQCYPn06Q4cOZcWKFXTv3v2I5zgeR1oG6VAXW
+Rx3Y+IiIicvKq8BWnhwoW0atWKr7/+miZNmpTZv3Tp0jIhpl+/fixbtgzTNFm2bJl3m0eLFi1ISEhgyZIllTrHcalkF5vGIImIiNRdVd6CNHLkyMPu37dvH4mJiX7b4uPjyc/PJyMjg5SUFGJjYwkODi5zzL59+yp1jri4uGMouTtYWQwDm+0wubHAfR02W0TM4Y+rBaxWi9//A4XqrXoHAtVb9Q4Elbn6xbE6qoC0a9cuzjzzzAr3L1iw4IjhpKCggKAg//WDPH8XFRWRn59fZj9AcHAwhYWFlTrH8bBaDGJjwyvcn1PkDkhR8fGEHua42iQqKrSmi1AjVO/AonoHFtVbjtdRBaSEhATmzp1b4f7o6OgjniM4OLhMiPH8HRoaSkhISLkhp7CwkNDQ0Eqd43iYJmRk5Fa435GTBUCuI4iCwxxXG1itFqKiQsnOzsfpdNV0cU4Y1Vv1DgSqt+odCKKjQ7FYqqfV7KgCkt1up1WrVsd1hw0bNiQ1NdVvW2pqKmFhYURGRpKYmEhmZiZFRUV+rUSpqakkJCRU6hzHwnfkksNR/ovLdDowC92hyBUUiVnBcbWN0+mqsM51meodWFTvwKJ6B4bjHXZ8OCe8s7JXr14sXrzYb9vChQvp0aMHFouFnj174nK5vIO1AbZu3UpKSgq9e/eu1DmOScmDfLj+TLNk/BGGBYLDju1+RERE5KR3wgPSqFGjWLVqFVOnTmXz5s288cYbfPvtt4wZMwZwd+Odd955TJw4kUWLFrFq1SrGjRtHnz596NatW6XOcTwOG5C8ayBFYhiBNRBOREQkkJzwb/k2bdowc+ZM5s2bx4UXXsjHH3/MlClT/KbtP/bYYyQnJ3Prrbdy7bXX0rJlS55//vmjOsexMg4zz19T/EVERAJDlU/z9/Xuu++Wu33gwIEMHDiwwtuFhYUxadIkJk2aVOExRzrHsapUC5ICkoiISJ2mfqLSDpOQFJBEREQCgwJSKYdbc8qlgCQiIhIQFJBKURebiIiIKCCV8CylYByui63AHZAsCkgiIiJ1mgJSKZVrQTq2xShFRESkdlBA8vAsFHm4Q/LdC0UaIWpBEhERqcsUkMooPyKZpnmoBSnsyNecExERkdpLAakUS0VNSEV54HIA7pW0RUREpO5SQPI6fB+b5yK12IIwbEHlHyQiIiJ1ggJSKRXNYjMLctz7gyNOZHFERESkBigglfBO869of0kLkhESfkLKIyIiIjVHAamUiqb5ewOSWpBERETqPAWkUo7cxaYWJBERkbpOAamUI3axqQVJRESkzlNAKqXCFqTCkhYkjUESERGp8xSQPI4wSluz2ERERAKHAlIpR+5iUwuSiIhIXaeAVMqRutgIUQuSiIhIXaeAVKLS6yCpBUlERKTOU0AqpcJ1kDQGSUREJGAoIJVSXhebabqgMM+9X7PYRERE6jwFpMooysfTCacWJBERkbpPAakUSzldbJ7uNewhGFbbiS2QiIiInHAKSKWV18WmAdoiIiIBRQGplPIGaXtX0VZAEhERCQgKSCXMknn+RjkT/U3PAG0FJBERkYCggORVMgi73Bakki62oLATWSARERGpIQpIpZS3DJJZ5GlBUkASEREJBApIpZS7DlJJFxvqYhMREQkICkillLuSdpG62ERERAKJAlIleAdpKyCJiIgEBAWkUsrtYivKd+/TGCQREZGAoIBUymFnsSkgiYiIBAQFpFION4uNIA3SFhERCQQKSCVK1okst4uNQk3zFxERCSQKSB6elbRL5SPTNA+tg6RB2iIiIgFBAelIigvAdAG61IiIiEigUEAqxVKqCck7/shiA6u9BkokIiIiJ5oCUmmlu9h8xh+VOz5JRERE6hwFpFIMym9B0vgjERGRwKGAVEqZQdolayChGWwiIiIBQwGplDKdaGpBEhERCTgKSCXMknn+pccZHRqDpBlsIiIigUIBqbQKutjUgiQiIhI4FJA8ShaKtJQOSEVqQRIREQk0C
khlVLAOklqQREREAoYCUillljrSddhEREQCjgJSKWUGaRcpIImIiAQaBaRSSjcgaZC2iIhI4FFAKlEyRruchSK1DpKIiEigUUAqpcJLjWgWm4iISMBQQCrNJx+ZLicUF7j/0BgkERGRgKGAVIpfF1tR/qHt6mITEREJGApIpfh2sXnXQLKHYFisNVQiEREROdEUkDxKRmn7tiBpBpuIiEhgUkAqxT8gaQ0kERGRQKSAVEp5XWxqQRIREQksCkglylsHydPFpuuwiYiIBBYFpNL8ZrFpDSQREZFApIBUisWnCUljkERERAKTAtJhaAySiIhIYFJA8nKPQtIsNhEREVFAKsV/FpvWQRIREQlECkillNeCpOuwiYiIBBYFpBKHpvn7JKSSC9Ua9tATXyARERGpMQpIHp5LjfhuKrlYrRGkgCQiIhJIFJBK8etiKy4JSPaQGiqNiIiI1IRqDUgPPvggEyZMKLN99OjRJCUl+f03atQo7/7CwkIeeeQRkpOT6d69O3fddRfp6el+51iwYAEjRoyga9euDB06lDlz5lRNoUsSkmma3i421IIkIiISUKolILlcLqZPn85HH31U7v7169fz8MMP8/vvv3v/mzFjhne/Z9+MGTN4++232bJlC2PHjvXu37x5MzfccAOnnXYas2fP5pJLLuHee+9lwYIFx112bwOSowjMkqn/GoMkIiISUGxVfcLNmzdz//33s337dho1alRm/4EDBzhw4ABdu3alQYMGZfanpKTw+eefM2vWLHr16gXA9OnTGTp0KCtWrKB79+68/fbbJCUlceeddwLQqlUr1q5dy2uvvUZycvJxld/TxebpXsMwwBZ0XOcUERGR2qXKW5AWLlxIq1at+Prrr2nSpEmZ/evXr8cwDFq0aFHu7ZctWwZAv379vNtatGhBQkICS5YsAWDp0qVlglC/fv1YtmyZu2vsOHhnsRWVdK/ZQ/xntomIiEidV+UtSCNHjjzs/g0bNhAZGcmjjz7KH3/8QVhYGEOHDuXmm28mKCiIlJQUYmNjCQ4O9rtdfHw8+/btA2Dfvn0kJiaW2Z+fn09GRgZxcXHHXH6r1cBms+BwHZrib7PVzbHsVqvF7/+BQvVWvQOB6q16B4LqbL84qoC0a9cuzjzzzAr3L1iw4IjhZMOGDRQWFtKlSxdGjx7N33//zdNPP82ePXt4+umnyc/PJyiobJdWcHAwhYWFABQUFJQ5xvN3UVHR0VTJy9PuFBYaTGxsOPlZcBCwhYYRGxt+TOesLaKiAnOMleodWFTvwKJ6y/E6qoCUkJDA3LlzK9wfHR19xHM8+uijjB8/3nts27Ztsdvt3Hnnndx7772EhISUG3IKCwsJDXU/8cHBwWWO8fztOeZYFRQUkZGRS1F6BgAuazAZGbnHdc6TldVqISoqlOzsfJxOV00X54RRvVXvQKB6q96BIDo6FIulelrNjiog2e12WrVqdXx3aLOVCVJt2rQBDnWdZWZmUlRU5NdKlJqaSkJCAgANGzYkNTXV7xypqamEhYURGRl5bAUraUIyXSYOhwtnQckgbVsIDkfdfrE5na46X8fyqN6BRfUOLKp3YDjOYceHdcI7K0eNGsV9993nt2316tXY7XaaN29Oz549cblc3sHaAFu3biUlJYXevXsD0KtXLxYvXux3joULF9KjR4/jT5KedZC0iraIiEjAOuEBaciQIXzxxRd8+OGH7Ny5k7lz5/L0009z7bXXEhERQUJCAueddx4TJ05k0aJFrFq1inHjxtGnTx+6desGuEPWqlWrmDp1Kps3b+aNN97g22+/ZcyYMcddvjLT/LWKtoiISMCp8llsR3LFFVdgGAbvvvsujz/+OA0aNODqq6/m+uuv9x7z2GOP8fjjj3PrrbcCMHDgQCZOnOjd36ZNG2bOnMmUKVN4++23adKkCVOmTDnONZBKFoX0/FkyzV8tSCIiIoGnWgPSu+++W+72kSNHHnY5gLCwMCZNmsSkSZMqP
GbgwIEMHDjwuMtYmmfNI12HTUREJHAF1oIJleDtYvMuFKkWJBERkUCjgFTCMxDe28VW7OliUwuSiIhIoFFAKkVdbCIiIqKAVFqpLjYN0hYREQk8CkilHOpi0zR/ERGRQKWA5FEyCMnQQpEiIiIBTwGpFE8LkukZpK0WJBERkYCjgFSKYRiYpss7i01dbCIiIoFHAamEd5q/waFwBBhBYTVSHhEREak5Ckjl8Iw/wmIFq71mCyMiIiInnAJSKRbDwCzKA9ytR55B2yIiIhI4FJBKM3xakNS9JiIiEpAUkEoxDMDbgqQp/iIiIoFIAamEWTJK22qxaA0kERGRAKeA5OVOSO4xSJ6ApC42ERGRQKSAVIrVYmAWurvYUAuSiIhIQFJAKsViMXzGIKkFSUREJBApIJU4NAbJ0BgkERGRAKeAVIrFojFIIiIigU4BqYTnUiPuFiRN8xcREQlkCkge5qFZbHgXilRAEhERCUQKSKWoi01EREQUkEqU28UWrIAkIiISiBSQSrFoDJKIiEjAU0DyKGlCshgmFBe4/1AXm4iISEBSQCrh7WJzFnq3qQVJREQkMCkgebkjktVZ0npktWFY7TVYHhEREakpCkgeni42h2awiYiIBDoFpBKeLjaLo6SLTQFJREQkYCkglWIpGYNk2INruCQiIiJSUxSQSrG4igEwbApIIiIigUoBqRTDM4tNLUgiIiIBSwGpFIuzCFALkoiISCBTQCrFcLgDEgpIIiIiAUsBqRRDg7RFREQCngJSaZ5p/ragmi2HiIiI1BgFJB8WwwCHxiCJiIgEOgUkH1aLgelQF5uIiEigU0DyYbEYUOyZ5h9Ss4URERGRGqOA5MOvBUldbCIiIgFLAcmHxXJoDJIGaYuIiAQuBSQfVouBWawxSCIiIoFOAcmHxaeLTQtFioiIBC4FJB/uaf5qQRIREQl0Ckg+/LrY1IIkIiISsBSQfLgHaauLTUREJNApIPmwGy5wOQF1sYmIiAQyBSQfwRbnoT/UgiQiIhKwFJB8hFgc7n8YVgyrrWYLIyIiIjVGAclHkFESkOxaJFJERCSQKSD5CC5pQdIMNhERkcCmgOQjGE8LkgKSiIhIIFNA8uHpYlMLkoiISGBTQPKhgCQiIiKggOQnSF1sIiIiAmguu4/KtCC5XC6cTseJKlK1crkMCgqsFBUV4nSaNV2cE6a8elutNiwW/V4QERE3BSQfdord/7CVneZvmibZ2enk5+ec4FJVr7Q0Cy6Xq6aLccKVV+/Q0AiiouIwDKOGSiUiIicLBSQf9pIutvIuM+IJRxERsQQFBdeZL1Gr1Qio1iMP33qbpklRUSE5ORkAREfXq8miiYjISUAByUeQtwXJPyC5XE5vOIqIiKqBklUfm82CwxF4LUil6x0U5H7Oc3IyiIyMVXebiEiA07eAD5tZfguS0+m+RpvnS1TqJs/zW1fGmImIyLFTQPJhr6AFyaOudKtJ+fT8ioiIhwKSD5vpDkhaB0lERCSwKSD58LQglTdIW0RERAKHApIPTwuSFoo8Odx66/VMnvxwTRdDREQCkAKSD3WxiYiICCgg+bG6Dj9IW0RERAKD1kHy4W1BqmQXm2maFBXXzBpCQXbLMc26GjCgF6NHX8fcuV/hcBTz0kuv06BBAq+++hLff/8Nubk5tGjRijFjbqRPn35s3ryJq666jNdff4+kpHYA3Hff3SxfvoS5c3/GarXicrkYPvxsbrttHEOGnMtXX33OJ5/8l507d2KxGLRt246xY8fRrl0HAC6++HxOP/1MFi78g4yMdCZNepqOHTsza9YMvv/+W4qLi7jggoswTf8FLD/44F0+//wT9u9PpX79Bpx33nCuuupazT4TEZEqV+UBae/evUyZMoVFixZRVFREly5dmDBhAm3atPEe88033zBjxgx27dpFy5YtGT9+PMnJyd79GRkZTJo0ifnz52MYBueddx733nsvoaGhlT7HsbC6itz/qEQLkmmaPPHec
jbtzjqu+zxWrZtEc9/IHscUDj777GOmTn0eh8NJs2bNmDjxPrZv38qDDz5Ggwbx/PHHfO699w4ef3wqp546gIYNG7FkyUKSktrhdDpZsWIpeXl5bNiwjvbtO7J27RoOHjxIcvIA5s37hWeeeZrx4yfStWt30tLSePbZKTz55CTeeusDbxlmz/4fTz31DJGRkbRs2Zpnn53CH3/8xv33P0RCQkPeeecNVq5cQaNGjQH4/ff5vPvumzz66OM0bdqcNWtWMWnSQzRs2IghQ86tssdVREQEqriLraioiOuvv579+/cza9YsPvjgA8LDw7nqqqtIT08HYOHChdxzzz1cdtllfPbZZyQnJ3P99dezefNm73nGjh3L9u3beeutt3juueeYN28eDz/8sHd/Zc5xLKxH2YJELW24GDLkXNq160CnTp3ZuXMHP/74Hf/5z0P06NGLpk2bcdllV3DWWUP44IN3AOjf/zSWLFkEwN9/r8Fms9OpU2eWL18KwIIFv9O1a3eioqKIjo5mwoQHGDLkXBITG9KpU2eGDRvOli2b/MrQr19/evfuS7t2HXA4ivnmm6+57robSU4eQMuWrbjvvgeJizt0yY89e3YRFGQnMbERiYmJnHnm2Tz77Et07drjBD1qIiISSKq0BWnp0qVs2LCB+fPnk5CQAMCUKVPo27cvP//8MxdffDGvvvoqZ511FldeeSUA48ePZ8WKFbz99ts8+uijrFixgsWLFzN37lxatWoFwKOPPsqYMWMYN24cCQkJRzzHsbKY7hWzKzNI2zAM7hvZo9Z1sQE0adLM++8NG9YDcPPNY/yOcTgcREREAu6A9OWXn1FYWMCSJYvo2bMXiYmNWLZsKSNHXsWCBb8zdOgwALp168G2bVt5663X2L59G7t27WDz5k1lLgzbpElT77937NhOcXEx7dp19G4LDg6mbdsk799nn30uc+Z8yb//PYLmzVvSu3dfTj/9TBITE4/pMRARETmcKg1Ibdq04ZVXXvGGI8B7Tavs7GxcLhfLly9nwoQJfrfr27cv33//PeAOWQ0aNPCGI4A+ffpgGAbLli1j6NChRzzHsTDwGe9iC6rcbQyD4CDrMd9nTQkOPhQAPcHlxRdfJSws3O84z3PXvXsv7HY7K1YsZ+nSxQwZci4NGzZk9uz/sW/fXjZu3MDkyYMA+P77b5k8+SHOPvscOnXqwgUXjGDLls1Mn/5UhWXwNMWZpn+IstkOvTxjYmJ4880P+OuvVSxZsohFixbw8ccfcu21NzB69HXH94CIiIiUUqUBqUGDBgwaNMhv27vvvktBQQH9+/cnOzubvLy8Mr/64+Pj2bdvHwApKSk0bNjQb39QUBAxMTHs3bu3Uuc4Ft62GMPAFhzs1zrjctXSvrQjMAxo1ao1AAcOpNG2bTvvvpdffhGr1cqYMTdis9no0yeZ33+fx9q1f/Gf/zxE/fr1cTqdvP76y7Rs2ZqGDRsB8P77b3H++Rdy9933ec/122/zAPe4rfJavZo1O4WgoGBWrVpJmzbuViOHw8HGjRvo0aMXAN9//w0HDx7koov+RZcu3bj22ht46qlJ/PTT90cdkDxFMAwoNQ4cAKvVwGarexM8rVaL3/8DheqtegeCQK13dc7ROaqAtGvXLs4888wK9y9YsIC4uDjv3z/88APTpk3j6quvJikpyRtggoL8W2iCg4MpLCwEID8/v8x+32MKCgqOeI5j4WlBMuwhxMVF+O0rKLCSlmapM1+cFsuherRs2Yr+/U9j6tQnufvu8bRs2Yqff/6R9957i4kTH/YeN3DgIJ544jHq12/AKae4u+g6d+7Cd9/N5eqrr/Uel5CQyOrVK9m0aT0RERH89ts8Zs/+HwAul8PbcuRbhqioCC655FLeeOMV4uMb0KJFS95//13S0vZjGO7jHI5iZs58jqioCLp27UFqagp//rmcbt16HPNzUvqDxOUysFgsREeHERISckznrA2iokKPfFAdpHoHFtVbj
tdRBaSEhATmzp1b4f7o6Gjvvz/88EMee+wxhg8fzr333gsc6lYpKiryu11hYaF3hlpISEiZ/Z5jwsLCKnWO42ILIiMj129TUVEhLpcLp9PE4aiZMUdVyeVy18Mw3CHh0Uef4OWXX+TJJydz8GA2jRo1KRlofZ63vn37norT6aRHj17ebT179mHZsqWceupA77Y77riHp5+ezE03XUdQkJ3WrdsyceIjPPTQf/jrr7/o2rW7Xxk8rr/+Fuz2IKZOfZK8vDwGD/4H/fsPxDTdx5177nAyMjJ4/fVXSU1NITIyktNPP5Obbhp71M+Jp95Op8uvBcnpNHG5XGRl5ZGf7zyeh/ikZLVaiIoKJTs7H6ez9r+OK0v1Vr0DQaDWOzo61DscpKodVUCy2+1+Y4MqMmXKFF577TVGjx7N+PHjvd0qMTExhIWFkZqa6nd8amqqd9xSYmIiP/74o9/+oqIiMjMziY+Pr9Q5joV3DJItuMwXrtNZTj9MLfX770u9//aEg+DgEMaOvYuxY++q8HZRUdHMm7fIb9uVV17DlVde47etUaPGPPvszDK3P/PMs73//uSTr8rs93TnjRlzY4VlGDnyKkaOvKrC/ZXlqXd53WtAnQnCFXE6XXW6fhVRvQOL6h0YKvocrwpVHrs84Wj8+PFMmDDBb8yJYRj06NGDxYsX+91m0aJF9OrlHmvSu3dv9u3bx/bt2737Pcf37NmzUuc4Fp5S6jIjIiIiUqWDtBctWsRrr73GqFGjOP/889m/f793X1hYGOHh4YwePZrrr7+eDh06MHDgQD799FP+/vtvJk+eDEDXrl3p0aMHd955Jw8//DB5eXk8+OCDXHjhhd4WoiOd41h4W5B0oVoREZGAV6UtSF9//TXgnrk2YMAAv//eeOMNAAYMGMDjjz/Ohx9+yD//+U8WLlzIrFmzvF13hmHwwgsv0KRJE6666iruuOMOBg4c6LdQ5JHOcSy8M5vUgiQiIhLwDLP0Ba8CVGZ6JukvXYvtlO6EDrndb19xcREHDuylXr2G2O2VWyOptnDPEAuc/mqP8updl59ncNc5NjacjIzcgHrOVW/VOxAEar3j4sKrbWmD2j9nvcqoi01ERETcFJBKaJC2iIiIeCgglfCd5i8iIiKBTQGphLcFSV1sIiIiAU8BqYRakERERMRDAamEWpBqhsPh4KOP3vf+/frrL3PxxedX+f1U13lFRKRuUkDy8rQg1b3p3SezH374lhkznqnpYoiIiPhRQCphMdwByQiOqOGSBBYtwyUiIiejKr3USG1mxb2wliW68he8NU0THEXVVaTDswX5XeeushYs+IPXXpvFtm1bCA0N49RTB3DrrXeyadMG7rzzFh599ElmzZpBSkoKnTp15v77H+bDD9/l22/nYLPZueSSy7jqqmu95/vmm6/573/fZ+fOHcTFxTFs2AWMGjUaq9UKQErKPl5++UWWLl1MXl4uXbp04+abb6d16zbMnfsVjz/+CAADBvTi+ednec/73ntv8emn/yMrK4uOHTtx773307RpMwBycnJ48cXn+O23XyguLiYpqT033zyWdu06eG//xRez+eCDd9i/fz+9e/ehYcNGx/Qwi4hIYFJAKmEAJgaWqPhKHW+aJnlfTsaVsql6C1YBa0IbQof/56hCUmZmJvfffw+33nonp546gNTUFCZNeoiZM5/j7LPPwel08s47b/DQQ5NwOBzcc88dXH315QwbdgGvvPI233//Da+++hIDBgyiVavW/O9/HzBr1gvceuud9O7dl7Vr/2L69KfIysri9tvvIi8vl5tuupZGjRrz5JPTsNuDeOONV7j11ut4660POfPMf5CTk8Pzz0/jiy++JSoqmhUrlrFv315Wr17JlCnPUVxcxGOPPciTTz7Giy++imma3HPPWIKCQnjqqWeJiIjg22/ncNNN1/Lyy2/Stm07fvjhW6ZPf4rbb7+bXr36MH/+L7zyykzi4ysffkVEJLCpi82HGRaHc
RRjkAyOvgWnJu3fn0JRUREJCYkkJjakS5duTJ36LBdddKn3mDFjbqRduw506tSFnj17Exoays03j6VZs1MYNepqALZs2YRpmrz33tuMGPEvRoy4hKZNmzFkyLlce+2NfPbZx+Tk5PDdd9+QlZXJY489RYcOnWjTpi0PPzyJ4OAQZs/+H8HBIUREuLs069Wrj91uB8Bms/Hgg4/RunUb2rfvyAUXjGDdurUALFu2hL/+Ws1jjz1Bx46dOOWU5txwwy107NiZjz/+LwCffPIRZ511NiNGXEKzZqdwxRVX07//aSfwkRYRkdpOLUg+XJGVb2EwDIPQ4f+pVV1sbdokcdZZQxg//k7q1atP7959Oe20gfTvP4hVq/4EoEmTpt7jQ0NDadiwkfd+goNDACguLiYzM4P09AN06dLN7z66d++Bw+Fg+/ZtbN68iaZNTyE2Nta7Pzg4hA4dOrJ58+YKyxkXV4/w8ENjwSIjoygsLARgw4Z1mKbJRRcN87tNUVGR95gtWzZx1llD/PZ36tSFjRs3VOZhEhERUUDyE3V0XTCGYdS6a7c9/PBkrrnmOhYu/D+WLFnEww9PpEuXbt5xRTab/0uiohBW0eBql8v0OU9Fx7iw2awVltFiqbhh0+VyER4ezuuvv1dmn6cFCgxM0/9ijaXrJSIicjjqYvN1lAGptlmz5i+ef34azZo151//upwpU57j/vsfYtmyJWRkZBzVueLi6hEXV8/b8uSxcuUK7HY7jRs3oVWrNuzcuZ2MjHTv/sLCQtat+5vmzVsCFQewirRs2Zrc3FyKi4tp0qSp97/333+b33+fB0CbNm1ZtWql3+3Wrfv7qO5HREQCmwKSDyM6saaLUK3Cw8OZPftjZs58nl27drJlyyZ+/PF7mjRpRkxMzFGf79//HsXs2f/js88+YdeunXz//be88cYrDB/+TyIiIvjHP4YSHR3DAw9M4O+/17Bp00YefXQi+fn5XHDBCMDdjQfuAFNYWHDE++zbN5k2bdry0EP3sXz5Unbt2smMGdOZO/crb+i64oqrmT//Fz744B127tzBJ5/8l19//emo6yciIoFL/Q4+Qho0PfJBtVjz5i2YPHkKb775Kp999jEWi4VevfowbdrzpKTsO+rz/fvfVxAUZOejjz7gueemEh+fwMiRV3H55aMAiIiIYMaMl3nhhWe5/fabAejSpSsvvfQ6jRo1BqBHj9506NCJm266hgceeOyI92m1WnnmmZnMnPkcDz44gfz8fJo3b8nkyVPo2bM3AKeeOoCHHprEG2+8wmuvzaJjx85cdtkV/PDDt0ddRxERCUyGqZX6ACjMzyevwIXD4Sqzr7i4iAMH9lKvXkPs9rq10rbNZim3znVdefWuy88zuOscGxtORkZuQD3nqrfqHQgCtd5xceFYrdXTGaYuthLBJV09IiIiIgpIIiIiIqUoIImIiIiUooAkIiIiUooCkoiIiEgpCkhHQRP+6jY9vyIi4qGAVAlWq/uyGEVFhTVcEqlOnufXatXyYCIigU7fBJVgsVgJDY0gJ8d9OY6goOCjvkTGycrlMnA6A6/lxLfepmlSVFRITk4GoaERh70WnIiIBAYFpEqKiooD8IakusJiseByBc6iYh7l1Ts0NML7PIuISGBTQKokwzCIjq5HZGQsTqejpotTJaxWg+joMLKy8gKqFam8elutNrUciYiIlwLSUbJYLFgsdeMyFDabhZCQEPLznQG1NH2g1ltERCpPP5lFRERESlFAEhERESlFAUlERESkFMPU6nheTmfgjUexWi2qdwBRvQOL6h1YArHeFotRbcvuKCCJiIiIlKIuNhEREZFSFJBERERESlFAEhERESlFAUlERESkFAUkERERkVIUkERERERKUUASERERKUUBSURERKQUBSQRERGRUhSQREREREpRQBIREREpRQFJREREpBQFJBEREZFSAjoguVwunn/+eU477TS6devGddddx86dO2u6WFUuJSWFp
KSkMv/Nnj0bgL///psrrriCbt26MXjwYN55550aLvHxe/nllxk1apTftiPVsy68Hsqr98SJE8s894MHD/bur631zszM5MEHH2TgwIH06NGDf//73yxdutS7f8GCBYwYMYKuXbsydOhQ5syZ43f7wsJCHnnkEZKTk+nevTt33XUX6enpJ7oaR+1I9R49enSZ59v3NVFb633gwAHuuece+vXrR/fu3bn++uvZvHmzd39dfX8fqd519f3ta+vWrXTv3t37nQUn6Pk2A9iMGTPMvn37mr/88ov5999/m9dcc4159tlnm4WFhTVdtCr166+/mp07dzZTUlLM1NRU73/5+flmenq62bdvX/O+++4zN23aZH7yySdm586dzU8++aSmi33M3nvvPbNdu3bmFVdc4d1WmXrW9tdDefU2TdO8+OKLzenTp/s99wcOHPDur631Hj16tDls2DBzyZIl5pYtW8xHHnnE7NKli7l582Zz06ZNZufOnc3p06ebmzZtMl977TWzQ4cO5v/93/95bz9hwgTzrLPOMpcsWWKuXLnSvPDCC82RI0fWYI0q53D1Nk3TTE5ONj/44AO/5zsjI8N7+9pa70svvdS85JJLzJUrV5qbNm0yb7vtNnPAgAFmXl5enX5/H67epll3398eRUVF5ogRI8y2bduan376qWmaJ+7zPGADUmFhodm9e3fz/fff927Lysoyu3TpYn711Vc1WLKq98orr5jnn39+uftmzZplDhgwwCwuLvZumzZtmnn22WefqOJVmX379pk33HCD2a1bN3Po0KF+QeFI9azNr4fD1dvlcpndunUzv//++3JvW1vrvW3bNrNt27bm0qVLvdtcLpd51llnmc8++6z5wAMPmBdffLHfbcaNG2dec801pmm6H7N27dqZv/76q3f/li1bzLZt25rLly8/MZU4Bkeqd1pamtm2bVtzzZo15d6+ttY7MzPTHDdunLl+/Xrvtr///tts27atuXLlyjr7/j5Svevq+9vXtGnTzCuvvNIvIJ2o5ztgu9jWrVtHbm4uycnJ3m1RUVF06NCBJUuW1GDJqt769etp1apVufuWLl1Knz59sNls3m39+vVj27ZtpKWlnagiVok1a9Zgt9v58ssv6dq1q9++I9WzNr8eDlfvHTt2kJeXR8uWLcu9bW2td2xsLK+88gqdO3f2bjMMA8MwyM7OZunSpX51AvfzvWzZMkzTZNmyZd5tHi1atCAhIaFW13v9+vUYhkGLFi3KvX1trXd0dDTTpk2jbdu2AKSnp/PWW2+RmJhI69at6+z7+0j1rqvvb48lS5bw0Ucf8eSTT/ptP1HPt+3Ih9RN+/btA6Bhw4Z+2+Pj47376ooNGzYQGxvLyJEj2bp1K6eccgo33XQTAwcOZN++fd43n0d8fDwAe/fupX79+jVR5GMyePBgv753X0eqZ21+PRyu3hs2bADg3XffZf78+VgsFgYOHMidd95JZGRkra13VFQUgwYN8tv23XffsX37dv7zn//w2WefkZiY6Lc/Pj6e/Px8MjIySElJITY2luDg4DLH1OZ6b9iwgcjISB599FH++OMPwsLCGDp0KDfffDNBQUG1tt6+HnjgAf73v/8RFBTESy+9RFhYWJ1+f3uUV++6+v4GyM7O5t5772XixIllyn+inu+AbUHKz88HICgoyG97cHAwhYWFNVGkauFwONiyZQtZWVncdtttvPLKK3Tr1o3rr7+eBQsWUFBQUO5jANSpx+FI9ayrr4cNGzZgsViIj49n1qxZTJgwgd9//52bb74Zl8tVZ+q9fPly7rvvPs4++2xOP/30cp9vz99FRUXk5+eX2Q+1v94bNmygsLCQLl268Nprr3HTTTfx8ccfM3HiRIA6Ue+rrrqKTz/9lGHDhnHLLbewZs2agHh/l1fvuvz+fvjhh+nevTvnn39+mX0n6vkO2BakkJAQwP1h6fk3uB/c0NDQmipWlbPZbCxatAir1eqtZ6dOndi4cSOvv
/46ISEhFBUV+d3G8wIKCws74eWtLkeqZ119Pdx0001cfvnlxMbGAtC2bVsaNGjAv/71L1avXl0n6v3jjz9y991306NHD6ZOnQq4PwhLP9+ev0NDQ8t9PUDtr/ejjz7K+PHjiY6OBtzPt91u58477+Tee++tE/Vu3bo1AJMnT2blypW89957AfH+Lq/ekydPrpPv788//5ylS5fy1Vdflbv/RD3fAduC5Gl6S01N9duemppKQkJCTRSp2oSHh/u9SADatGlDSkoKiYmJ5T4GQJ16HI5Uz7r6erBYLN4PT482bdoA7mbq2l7v9957j9tuu40zzjiDWbNmeX9FNmzYsNw6hYWFERkZSWJiIpmZmWU+ZGt7vW02mzccefg+37W13unp6cyZMweHw+HdZrFYaN26NampqXX2/X2ketfV9/enn37KgQMHOP300+nevTvdu3cH4KGHHmLMmDEn7PkO2IDUrl07IiIiWLRokXdbdnY2a9eupXfv3jVYsqq1ceNGevTo4VdPgL/++ovWrVvTu3dvli1bhtPp9O5buHAhLVq0oF69eie6uNXmSPWsq6+He++9l6uvvtpv2+rVqwH3L9LaXO8PPviAxx57jJEjRzJ9+nS/5vRevXqxePFiv+MXLlxIjx49sFgs9OzZE5fL5R20DO61VlJSUmp1vUeNGsV9993nd/zq1aux2+00b9681tY7LS2NcePGsWDBAu+24uJi1q5dS6tWrers+/tI9a6r7++pU6cyd+5cPv/8c+9/AGPHjmXy5Mkn7vmumol4tdP06dPNPn36mD/++KPfOglFRUU1XbQq43Q6zYsuusg899xzzSVLlpibNm0yH3/8cbNTp07m+vXrzbS0NLN3797m+PHjzY0bN5qffvqp2blzZ3P27Nk1XfTjMn78eL/p7pWpZ114PZSu948//mi2bdvWnDFjhrl9+3bz119/NQcPHmyOGzfOe0xtrPeWLVvMjh07mrfccovf+i+pqalmdna2uWHDBrNjx47mlClTzE2bNpmvv/56mXWQxo0bZw4ePNhcuHChdz2g0mtInWyOVO93333XbN++vfnBBx+YO3bsMOfMmWP27dvXnD59uvcctbHepmmaY8aMMc8++2xz8eLF5vr1681x48aZvXv3Nnfv3l2n39+Hq3ddfX+Xx3ea/4l6vgM6IDkcDvPpp582+/XrZ3br1s287rrrzJ07d9Z0sarc/v37zQkTJpj9+/c3O3fubF566aXmkiVLvPtXrlxp/utf/zI7depknnHGGea7775bg6WtGqWDgmkeuZ514fVQXr3nzp1rXnjhhWaXLl3M/v37m08++aRZUFDg3V8b6/3SSy+Zbdu2Lfe/8ePHm6ZpmvPmzTOHDRtmdurUyRw6dKg5Z84cv3Pk5uaa999/v9mrVy+zV69e5rhx48z09PSaqE6lVabe7733nnnOOed4X+cvvfSS6XQ6veeojfU2TdPMzs42H3roIbN///5mly5dzGuuucbcsGGDd39dfX8fqd518f1dHt+AZJon5vk2TNM0q6pZTERERKQuCNgxSCIiIiIVUUASERERKUUBSURERKQUBSQRERGRUhSQREREREpRQBIREREpRQFJREREpBQFJBEREZFSFJBEpE7YtWsXSUlJzJ49+7jPNWHCBAYPHlwFpRKR2spW0wUQEakK8fHxfPTRRzRr1qymiyIidYACkojUCUFBQXTr1q2miyEidYS62ETkhPj4448577zz6NSpE6effjozZszA6XQC7i6tUaNG8cknn3DGGWfQvXt3rrrqKtatW+e9vcvl4plnnmHw4MF06tSJwYMHM23aNIqLi4Hyu9i2bdvG2LFj6d+/P926dWPUqFEsW7bMr1xZWVncd9999OnTh969ezNlyhRcLleZ8v/444+MGDGCzp07079/fyZNmkReXp53f0FBAQ8//DADBw6kU6dODB06lNdff71KH0MROXHUgiQi1e7ll1/mmWee4YorruC++
+7j77//ZsaMGezdu5fHH38cgL///pstW7Ywbtw4oqOjef7557niiiuYO3cu8fHxvPrqq3z44YeMHz+epk2bsnLlSp555hnsdjtjx44tc5+bNm3iX//6F82bN2fixInY7XbeeecdrrrqKt544w369OmDy+VizJgx7N69m/HjxxMTE8Nrr73G6tWriY+P957rq6++4u677+b888/njjvuYPfu3TzzzDNs2rSJN998E8MwePzxx/n9998ZP3489evXZ/78+Tz99NPExMRw0UUXnbDHWkSqhgKSiFSrgwcPMnPmTC699FImTpwIwIABA4iJiWHixImMHj3ae9ysWbPo1asXAF26dOGss87inXfe4e6772bx4sV06tTJGzb69OlDaGgokZGR5d7vCy+8QFBQEO+88w4REREAnH766QwbNoynn36aTz75hPnz57Nq1SpeffVVBg4cCEBycrLfAG3TNJk6dSqnnXYaU6dO9W5v3rw5V199NfPmzeP0009n8eLF9O/fn/POOw+Avn37EhYWRr169ary4RSRE0RdbCJSrVasWEFBQQGDBw/G4XB4//OEkD/++AOAJk2aeMMRuAddd+/enSVLlgDuwPHHH39w+eWX89prr7Fp0yauuOIKLrjggnLvd/HixZxxxhnecARgs9k477zz+Ouvv8jNzWXp0qXY7XZOO+007zFhYWEMGjTI+/eWLVvYt29fmfL37t2biIgIb/n79u3L//73P6677jree+89du7cyS233MLpp59eNQ+kiJxQakESkWqVmZkJwPXXX1/u/tTUVAASEhLK7KtXrx5r1qwBYMyYMYSHh/Ppp58ydepUpkyZQps2bZg4cSL9+vUrc9usrCzq169fZnv9+vUxTZOcnByysrKIiYnBMAy/Yxo0aFCm/I888giPPPJIheW///77SUxM5Msvv+Sxxx7jscceo3v37jz88MO0a9eu3LqLyMlLAUlEqlVUVBQAU6dOpXnz5mX2169fn+eee46MjIwy+9LS0rxdVBaLhZEjRzJy5EgOHDjAvHnzmDVrFrfddpu3FcdXdHQ0aWlpZbbv378fgNjYWGJjY8nIyMDpdGK1Wr3HeEKRb/nvvfde+vTpU+79gHsW3U033cRNN93Enj17+OWXX5g5cyZ33XUXc+bMqejhEZGTlLrYRKRade3aFbvdTkpKCp07d/b+Z7PZmD59Ort27QLcM842b97svV1KSgorVqwgOTkZgMsuu4xJkyYB7palESNGMHLkSLKzs8nJySlzv7179+aXX37x2+d0OpkzZw6dO3cmKCiI5ORkHA4HP/74o/eYoqIiv8DVsmVL6tWrx65du/zKn5CQwLRp01i7di0FBQUMGTKEN954A4BGjRoxcuRIzjvvPPbs2VOFj6aInChqQRKRahUbG8uYMWN47rnnyMnJoW/fvqSkpPDcc89hGIa3+8k0TW688UbuvPNOrFYrL7zwAtHR0YwaNQpwB5433niD+vXr0717d1JSUnjzzTfp06cPcXFxflPuAW699Vbmz5/PlVdeyfXXX4/dbveODXrttdcA94DsAQMGMHHiRA4cOEDjxo155513SE9P97ZcWa1W7rzzTh588EGsVitnnHEG2dnZzJw5k5SUFDp27EhISAgdO3bkhRdewG63k5SUxNatW/nss88YMmTICXy0RaSqGKZpmjVdCBGp+95//30++OADtm/fTnR0NMnJyYwbN45GjRoxYcIEFi9ezHXXXceLL75Ifn4+p556KuPHj6dJkyYAOBwOXnrpJb788kv27dtHZGQkgwcP5q677iI2NpZdu3Zx5pln8sQTTzBixAjAvXTA9OnTWbp0KYZh0KVLF2699Va/weD5+flMnTqVOXPmUFhYyLnnnktYWBg//fQTP//8s/e4uXPn8tprr7Fx40bCwsLo0aMHd9xxB0lJSQDk5OTw7LPP8tNPP7F//37q1avHueeey+23305ISMgJfKRFpCooIIlIjfMEJN9AI
iJSkzQGSURERKQUBSQRERGRUtTFJiIiIlKKWpBERERESlFAEhERESlFAUlERESkFAUkERERkVIUkERERERKUUASERERKUUBSURERKQUBSQRERGRUv4fwQ4H0Ypddl0AAAAASUVORK5CYII=", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# 获取参数\n", - "cfg = Config() \n", - "# 训练\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], title=f\"training curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}\") \n", - "# 测试\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], title=f\"testing curve on {cfg.device} of {cfg.algo_name} for {cfg.env_name}\") # 画出结果" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.12 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "f5a9629e9f3b9957bf68a43815f911e93447d47b3d065b6a8a04975e44c504d9" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/notebooks/Value Iteration/README.md b/projects/notebooks/Value Iteration/README.md deleted file mode 100644 index e69de29..0000000 diff --git a/projects/notebooks/common/multiprocessing_env.py b/projects/notebooks/common/multiprocessing_env.py deleted file mode 100644 index 28c8aba..0000000 --- a/projects/notebooks/common/multiprocessing_env.py +++ /dev/null @@ -1,153 +0,0 @@ -# 该代码来自 openai baseline,用于多线程环境 -# https://github.com/openai/baselines/tree/master/baselines/common/vec_env - -import numpy as np -from multiprocessing import Process, Pipe - -def worker(remote, parent_remote, env_fn_wrapper): - parent_remote.close() - env = env_fn_wrapper.x() - while True: - cmd, data = remote.recv() - if cmd == 'step': - ob, reward, done, info = env.step(data) - if done: - ob = env.reset() - remote.send((ob, reward, done, info)) - elif cmd == 'reset': - ob = 
env.reset() - remote.send(ob) - elif cmd == 'reset_task': - ob = env.reset_task() - remote.send(ob) - elif cmd == 'close': - remote.close() - break - elif cmd == 'get_spaces': - remote.send((env.observation_space, env.action_space)) - else: - raise NotImplementedError - -class VecEnv(object): - """ - An abstract asynchronous, vectorized environment. - """ - def __init__(self, num_envs, observation_space, action_space): - self.num_envs = num_envs - self.observation_space = observation_space - self.action_space = action_space - - def reset(self): - """ - Reset all the environments and return an array of - observations, or a tuple of observation arrays. - If step_async is still doing work, that work will - be cancelled and step_wait() should not be called - until step_async() is invoked again. - """ - pass - - def step_async(self, actions): - """ - Tell all the environments to start taking a step - with the given actions. - Call step_wait() to get the results of the step. - You should not call this if a step_async run is - already pending. - """ - pass - - def step_wait(self): - """ - Wait for the step taken with step_async(). - Returns (obs, rews, dones, infos): - - obs: an array of observations, or a tuple of - arrays of observations. - - rews: an array of rewards - - dones: an array of "episode done" booleans - - infos: a sequence of info objects - """ - pass - - def close(self): - """ - Clean up the environments' resources. 
- """ - pass - - def step(self, actions): - self.step_async(actions) - return self.step_wait() - - -class CloudpickleWrapper(object): - """ - Uses cloudpickle to serialize contents (otherwise multiprocessing tries to use pickle) - """ - def __init__(self, x): - self.x = x - def __getstate__(self): - import cloudpickle - return cloudpickle.dumps(self.x) - def __setstate__(self, ob): - import pickle - self.x = pickle.loads(ob) - - -class SubprocVecEnv(VecEnv): - def __init__(self, env_fns, spaces=None): - """ - envs: list of gym environments to run in subprocesses - """ - self.waiting = False - self.closed = False - nenvs = len(env_fns) - self.nenvs = nenvs - self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)]) - self.ps = [Process(target=worker, args=(work_remote, remote, CloudpickleWrapper(env_fn))) - for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)] - for p in self.ps: - p.daemon = True # if the main process crashes, we should not cause things to hang - p.start() - for remote in self.work_remotes: - remote.close() - - self.remotes[0].send(('get_spaces', None)) - observation_space, action_space = self.remotes[0].recv() - VecEnv.__init__(self, len(env_fns), observation_space, action_space) - - def step_async(self, actions): - for remote, action in zip(self.remotes, actions): - remote.send(('step', action)) - self.waiting = True - - def step_wait(self): - results = [remote.recv() for remote in self.remotes] - self.waiting = False - obs, rews, dones, infos = zip(*results) - return np.stack(obs), np.stack(rews), np.stack(dones), infos - - def reset(self): - for remote in self.remotes: - remote.send(('reset', None)) - return np.stack([remote.recv() for remote in self.remotes]) - - def reset_task(self): - for remote in self.remotes: - remote.send(('reset_task', None)) - return np.stack([remote.recv() for remote in self.remotes]) - - def close(self): - if self.closed: - return - if self.waiting: - for remote in 
self.remotes: - remote.recv() - for remote in self.remotes: - remote.send(('close', None)) - for p in self.ps: - p.join() - self.closed = True - - def __len__(self): - return self.nenvs \ No newline at end of file diff --git a/projects/parl_tutorials/DDPG.ipynb b/projects/parl_tutorials/DDPG.ipynb deleted file mode 100644 index a0db09f..0000000 --- a/projects/parl_tutorials/DDPG.ipynb +++ /dev/null @@ -1,465 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[32m[09-28 00:38:01 MainThread @utils.py:73]\u001b[0m paddlepaddle version: 2.3.2.\n" - ] - } - ], - "source": [ - "import parl\n", - "import paddle\n", - "import paddle.nn as nn\n", - "import paddle.nn.functional as F\n", - "class Actor(parl.Model):\n", - " def __init__(self, n_states, n_actions):\n", - " super(Actor, self).__init__()\n", - "\n", - " self.l1 = nn.Linear(n_states, 400)\n", - " self.l2 = nn.Linear(400, 300)\n", - " self.l3 = nn.Linear(300, n_actions)\n", - "\n", - " def forward(self, state):\n", - " x = F.relu(self.l1(state))\n", - " x = F.relu(self.l2(x))\n", - " return paddle.tanh(self.l3(x))\n", - "\n", - "class Critic(parl.Model):\n", - " def __init__(self, n_states, n_actions):\n", - " super(Critic, self).__init__()\n", - "\n", - " self.l1 = nn.Linear(n_states, 400)\n", - " self.l2 = nn.Linear(400 + n_actions, 300)\n", - " self.l3 = nn.Linear(300, 1)\n", - "\n", - " def forward(self, state, action):\n", - " x = F.relu(self.l1(state))\n", - " x = F.relu(self.l2(paddle.concat([x, action], 1)))\n", - " return self.l3(x)\n", - "class ActorCritic(parl.Model):\n", - " def __init__(self, n_states, n_actions):\n", - " super(ActorCritic, self).__init__()\n", - " self.actor_model = Actor(n_states, n_actions)\n", - " self.critic_model = Critic(n_states, n_actions)\n", 
- "\n", - " def 
policy(self, state):\n", - " return self.actor_model(state)\n", - "\n", - " def value(self, state, action):\n", - " return self.critic_model(state, action)\n", - "\n", - " def get_actor_params(self):\n", - " return self.actor_model.parameters()\n", - "\n", - " def get_critic_params(self):\n", - " return self.critic_model.parameters()" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import deque\n", - "import random\n", - "class ReplayBuffer:\n", - " def __init__(self, capacity: int) -> None:\n", - " self.capacity = capacity\n", - " self.buffer = deque(maxlen=self.capacity)\n", - " def push(self,transitions):\n", - " '''Append one transition to the buffer.\n", - " Args:\n", - " transitions (tuple): (state, action, reward, next_state, done)\n", - " '''\n", - " self.buffer.append(transitions)\n", - " def sample(self, batch_size: int, sequential: bool = False):\n", - " if batch_size > len(self.buffer):\n", - " batch_size = len(self.buffer)\n", - " if sequential: # sequential sampling\n", - " rand = random.randint(0, len(self.buffer) - batch_size)\n", - " batch = [self.buffer[i] for i in range(rand, rand + batch_size)]\n", - " return zip(*batch)\n", - " else:\n", - " batch = random.sample(self.buffer, batch_size)\n", - " return zip(*batch)\n", - " def clear(self):\n", - " self.buffer.clear()\n", - " def __len__(self):\n", - " return len(self.buffer)" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import parl\n", - "import paddle\n", - "import numpy as np\n", - "\n", - "\n", - "class DDPGAgent(parl.Agent):\n", - " def __init__(self, algorithm,memory,cfg):\n", - " super(DDPGAgent, self).__init__(algorithm)\n", - " self.n_actions = cfg['n_actions']\n", - " self.expl_noise = cfg['expl_noise']\n", - " self.batch_size = cfg['batch_size'] \n", - " self.memory = memory\n", - " self.alg.sync_target(decay=0)\n", - 
"\n", - " def sample_action(self, state):\n", - " action_numpy = 
self.predict_action(state)\n", - " action_noise = np.random.normal(0, self.expl_noise, size=self.n_actions)\n", - " action = (action_numpy + action_noise).clip(-1, 1)\n", - " return action\n", - "\n", - " def predict_action(self, state):\n", - " state = paddle.to_tensor(state.reshape(1, -1), dtype='float32')\n", - " action = self.alg.predict(state)\n", - " action_numpy = action.cpu().numpy()[0]\n", - " return action_numpy\n", - "\n", - " def update(self):\n", - " if len(self.memory) < self.batch_size: \n", - " return\n", - " state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample(\n", - " self.batch_size)\n", - " done_batch = np.expand_dims(done_batch , -1)\n", - " reward_batch = np.expand_dims(reward_batch, -1)\n", - " state_batch = paddle.to_tensor(state_batch, dtype='float32')\n", - " action_batch = paddle.to_tensor(action_batch, dtype='float32')\n", - " reward_batch = paddle.to_tensor(reward_batch, dtype='float32')\n", - " next_state_batch = paddle.to_tensor(next_state_batch, dtype='float32')\n", - " done_batch = paddle.to_tensor(done_batch, dtype='float32')\n", - " critic_loss, actor_loss = self.alg.learn(state_batch, action_batch, reward_batch, next_state_batch,\n", - " done_batch)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg, env, agent):\n", - " ''' Train the agent\n", - " '''\n", - " print(f\"Start training!\")\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg[\"train_eps\"]):\n", - " ep_reward = 0 \n", - " state = env.reset() \n", - " for i_step in range(cfg['max_steps']):\n", - " action = agent.sample_action(state) # sample an action with exploration noise\n", - " next_state, reward, done, _ = env.step(action) \n", - " agent.memory.push((state, action, reward,next_state, done)) \n", - " state = next_state \n", - " agent.update() \n", - " ep_reward += reward \n", - " if done:\n", - " break\n", - " 
rewards.append(ep_reward)\n", - " if (i_ep + 1) % 10 == 0:\n", - " 
print(f\"Episode: {i_ep+1}/{cfg['train_eps']}, reward: {ep_reward:.2f}\")\n", - " print(\"Finished training!\")\n", - " env.close()\n", - " res_dic = {'episodes':range(len(rewards)),'rewards':rewards}\n", - " return res_dic\n", - "\n", - "def test(cfg, env, agent):\n", - " print(\"Start testing!\")\n", - " rewards = [] # rewards of all episodes\n", - " for i_ep in range(cfg['test_eps']):\n", - " ep_reward = 0 \n", - " state = env.reset() \n", - " for i_step in range(cfg['max_steps']):\n", - " action = agent.predict_action(state) \n", - " next_state, reward, done, _ = env.step(action) \n", - " state = next_state \n", - " ep_reward += reward \n", - " if done:\n", - " break\n", - " rewards.append(ep_reward)\n", - " print(f\"Episode: {i_ep+1}/{cfg['test_eps']}, reward: {ep_reward:.2f}\")\n", - " print(\"Finished testing!\")\n", - " env.close()\n", - " return {'episodes':range(len(rewards)),'rewards':rewards}\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import gym\n", - "import os\n", - "import paddle\n", - "import numpy as np\n", - "import random\n", - "from parl.algorithms import DDPG\n", - "class NormalizedActions(gym.ActionWrapper):\n", - " ''' Rescale actions from [-1, 1] to the environment's [low, high] action range\n", - " '''\n", - " def action(self, action):\n", - " low_bound = self.action_space.low\n", - " upper_bound = self.action_space.high\n", - " action = low_bound + (action + 1.0) * 0.5 * (upper_bound - low_bound)\n", - " action = np.clip(action, low_bound, upper_bound)\n", - " return action\n", - "\n", - " def reverse_action(self, action):\n", - " low_bound = self.action_space.low\n", - " upper_bound = self.action_space.high\n", - " action = 2 * (action - low_bound) / (upper_bound - low_bound) - 1\n", - " action = np.clip(action, -1.0, 1.0) # the normalized action lives in [-1, 1]\n", - " return action\n", - "def all_seed(env,seed = 1):\n", - " ''' Seed the env, NumPy, random and Paddle in one 
call\n", - " '''\n", - " env.seed(seed) # env config\n", - " np.random.seed(seed)\n", - " random.seed(seed)\n", - " paddle.seed(seed)\n", - "def env_agent_config(cfg):\n", - " env 
= NormalizedActions(gym.make(cfg['env_name'])) # rescale actions from [-1, 1] to the env's action bounds\n", - " if cfg['seed'] !=0:\n", - " all_seed(env,seed=cfg['seed'])\n", - " n_states = env.observation_space.shape[0]\n", - " n_actions = env.action_space.shape[0]\n", - " print(f\"State dim: {n_states}, action dim: {n_actions}\")\n", - " cfg.update({\"n_states\":n_states,\"n_actions\":n_actions}) # store n_states and n_actions in cfg\n", - " memory = ReplayBuffer(cfg['memory_capacity'])\n", - " model = ActorCritic(n_states, n_actions)\n", - " algorithm = DDPG(model, gamma=cfg['gamma'], tau=cfg['tau'], actor_lr=cfg['actor_lr'], critic_lr=cfg['critic_lr'])\n", - " agent = DDPGAgent(algorithm,memory,cfg)\n", - " return env,agent" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "import argparse\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "def get_args():\n", - " \"\"\" Hyperparameters\n", - " \"\"\"\n", - " parser = argparse.ArgumentParser(description=\"hyperparameters\") \n", - " parser.add_argument('--algo_name',default='DDPG',type=str,help=\"name of algorithm\")\n", - " parser.add_argument('--env_name',default='Pendulum-v0',type=str,help=\"name of environment\")\n", - " parser.add_argument('--train_eps',default=200,type=int,help=\"episodes of training\")\n", - " parser.add_argument('--test_eps',default=20,type=int,help=\"episodes of testing\")\n", - " parser.add_argument('--max_steps',default=100000,type=int,help=\"steps per episode, much larger value can simulate infinite steps\")\n", - " parser.add_argument('--gamma',default=0.99,type=float,help=\"discount factor\")\n", - " parser.add_argument('--critic_lr',default=1e-3,type=float,help=\"learning rate of critic\")\n", - " parser.add_argument('--actor_lr',default=1e-4,type=float,help=\"learning rate of actor\")\n", - " parser.add_argument('--memory_capacity',default=80000,type=int,help=\"memory capacity\")\n", - " 
parser.add_argument('--expl_noise',default=0.1,type=float)\n", - " 
parser.add_argument('--batch_size',default=128,type=int)\n", - " parser.add_argument('--target_update',default=2,type=int)\n", - " parser.add_argument('--tau',default=1e-2,type=float)\n", - " parser.add_argument('--critic_hidden_dim',default=256,type=int)\n", - " parser.add_argument('--actor_hidden_dim',default=256,type=int)\n", - " parser.add_argument('--device',default='cpu',type=str,help=\"cpu or cuda\") \n", - " parser.add_argument('--seed',default=1,type=int,help=\"random seed\")\n", - " args = parser.parse_args([]) \n", - " args = {**vars(args)} # convert args to a dict \n", - " # print the hyperparameters\n", - " print(\"Hyperparameters:\")\n", - " print(''.join(['=']*80))\n", - " tplt = \"{:^20}\\t{:^20}\\t{:^20}\"\n", - " print(tplt.format(\"Name\",\"Value\",\"Type\"))\n", - " for k,v in args.items():\n", - " print(tplt.format(k,v,str(type(v)))) \n", - " print(''.join(['=']*80)) \n", - " return args\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve with an exponential moving average, similar to the smoothing in TensorBoard\n", - "\n", - " Args:\n", - " data (List): input data\n", - " weight (Float): smoothing weight in [0, 1]; higher values give a smoother curve, 0.9 is typical\n", - "\n", - " Returns:\n", - " smoothed (List): smoothed data\n", - " '''\n", - " last = data[0] # First value in the plot (first timestep)\n", - " smoothed = list()\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # exponential moving average\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,cfg,path=None,tag='train'):\n", - " sns.set()\n", - " plt.figure() # new figure, so several plots can be drawn in one run\n", - " plt.title(f\"{tag}ing curve on {cfg['device']} of {cfg['algo_name']} for {cfg['env_name']}\")\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()\n" - ] - }, - { - 
"cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Hyperparameters:\n", - "
"================================================================================\n", - " Name \t Value \t Type \n", - " algo_name \t DDPG \t \n", - " env_name \t Pendulum-v0 \t \n", - " train_eps \t 200 \t \n", - " test_eps \t 20 \t \n", - " max_steps \t 100000 \t \n", - " gamma \t 0.99 \t \n", - " critic_lr \t 0.001 \t \n", - " actor_lr \t 0.0001 \t \n", - " memory_capacity \t 80000 \t \n", - " expl_noise \t 0.1 \t \n", - " batch_size \t 128 \t \n", - " target_update \t 2 \t \n", - " tau \t 0.01 \t \n", - " critic_hidden_dim \t 256 \t \n", - " actor_hidden_dim \t 256 \t \n", - " device \t cpu \t \n", - " seed \t 1 \t \n", - "================================================================================\n", - "State dim: 3, action dim: 1\n", - "Start training!\n", - "Episode: 10/200, reward: -945.22\n", - "Episode: 20/200, reward: -700.56\n", - "Episode: 30/200, reward: -128.48\n", - "Episode: 40/200, reward: -266.74\n", - "Episode: 50/200, reward: -387.26\n", - "Episode: 60/200, reward: -133.07\n", - "Episode: 70/200, reward: -243.47\n", - "Episode: 80/200, reward: -383.76\n", - "Episode: 90/200, reward: -130.47\n", - "Episode: 100/200, reward: -385.78\n", - "Episode: 110/200, reward: -128.11\n", - "Episode: 120/200, reward: -245.72\n", - "Episode: 130/200, reward: -3.26\n", - "Episode: 140/200, reward: -231.93\n", - "Episode: 150/200, reward: -122.84\n", - "Episode: 160/200, reward: -370.19\n", - "Episode: 170/200, reward: -126.60\n", - "Episode: 180/200, reward: -118.99\n", - "Episode: 190/200, reward: -115.58\n", - "Episode: 200/200, reward: -246.70\n", - "Finished training!\n", - "Start testing!\n", - "Episode: 1/20, reward: -122.76\n", - "Episode: 2/20, reward: -1.78\n", - "Episode: 3/20, reward: -128.77\n", - "Episode: 4/20, reward: -124.03\n", - "Episode: 5/20, reward: -125.87\n", - "Episode: 6/20, reward: -130.87\n", - "Episode: 7/20, reward: -127.97\n", - "Episode: 8/20, reward: -134.63\n", - "Episode: 9/20, reward: -126.38\n", - "Episode: 10/20, reward: -1.42\n", - "Episode: 11/20, 
reward: -126.13\n", - "Episode: 12/20, reward: -1.88\n", - "Episode: 13/20, reward: -133.22\n", - "Episode: 14/20, reward: -132.14\n", - "Episode: 15/20, reward: -245.42\n", - "Episode: 16/20, reward: -123.41\n", - "Episode: 17/20, reward: -127.20\n", - "Episode: 18/20, reward: -130.53\n", - "Episode: 19/20, reward: -129.29\n", - "Episode: 20/20, reward: -288.72\n", - "Finished testing!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# Load the hyperparameters\n", - "cfg = get_args() \n", - "# Train\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"train\") \n", - "# Test\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"test\") # plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.13 ('parl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.13" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "29c8e495d55843cb894bac6655c13e4a65f834e86169d4dce1750654c48fe628" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/parl_tutorials/DQN.ipynb b/projects/parl_tutorials/DQN.ipynb deleted file mode 100644 index 4b02022..0000000 --- a/projects/parl_tutorials/DQN.ipynb +++ /dev/null @@ -1,538 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Define the algorithm\n", - "Compared with Q-learning, DQN is essentially designed to cope with more complex environments, and it took several rounds of improvement before it reached a mature form in Nature DQN (the Nature paper by Volodymyr Mnih et al.). DQN makes three main changes:\n", - "* It replaces the Q-table with a deep neural network. The reason is easy to see: a table cannot scale to large or continuous state spaces.\n", - "* It uses experience replay (a replay buffer). This has several benefits. First, training reuses a pool of past transitions instead of discarding each one after a single update, which greatly improves sample efficiency. 
Second, a point often raised in interviews, it reduces the correlation between samples: collecting experience and learning are decoupled, and time-ordered training data can be unstable, so shuffling it before learning makes training more stable, for the same reason that samples are shuffled when splitting training and test sets in deep learning.\n", - "* It uses two networks, a policy network and a target network. The policy network is updated at every step, and its parameters are copied to the target network only every few steps. This also stabilizes training and keeps the Q-value estimates from diverging. Imagine that a transition (introduced with Q-learning; make sure you remember it!) 
leads to a badly overestimated Q-value; if the next few samples drawn from the replay buffer happen to do the same, the Q-values can easily diverge and never come back. Another analogy: in an RPG or level-based game, some players save and load constantly to chase a record, reloading an earlier save whenever they make a mistake or dislike the outcome. Now suppose reloading is not allowed, just as DQN cannot roll back during training. You could keep two save slots: one saved every frame, and another saved only after a good result, i.e. every so many steps. In the end, the slot saved at intervals is usually better than the one saved every frame. You could also keep more slots, i.e. give DQN several target networks, but for DQN this is rarely worthwhile: extra networks do not necessarily improve results much." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.1 Define the model" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[32m[09-26 17:18:11 MainThread @utils.py:73]\u001b[0m paddlepaddle version: 2.3.2.\n" - ] - } - ], - "source": [ - "\n", - "import paddle\n", - "import paddle.nn as nn\n", - "import paddle.nn.functional as F\n", - "import parl\n", - "\n", - "class MLP(parl.Model):\n", - " \"\"\" Fully connected network (MLP) for the CartPole problem.\n", - " Args:\n", - " input_dim (int): Dimension of observation space.\n", - " output_dim (int): Dimension of action space.\n", - " \"\"\"\n", - "\n", - " def __init__(self, input_dim, output_dim):\n", - " super(MLP, self).__init__()\n", - " hidden_dim1 = 256\n", - " hidden_dim2 = 256\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim1)\n", - " self.fc2 = nn.Linear(hidden_dim1, hidden_dim2)\n", - " self.fc3 = nn.Linear(hidden_dim2, output_dim)\n", - "\n", - " def forward(self, state):\n", - " x = F.relu(self.fc1(state))\n", - " x = F.relu(self.fc2(x))\n", - " x = self.fc3(x)\n", - " return x" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1.2 Define the replay buffer" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import deque\n", - "class ReplayBuffer:\n", - " def __init__(self, capacity: int) -> None:\n", - " self.capacity = capacity\n", - " self.buffer = deque(maxlen=self.capacity)\n", - " def push(self,transitions):\n", - " '''Append one transition to the buffer.\n", - " Args:\n", - " transitions (tuple): 
(state, action, reward, next_state, done)\n", - " '''\n", - " self.buffer.append(transitions)\n", - " def sample(self, batch_size: int, sequential: bool = False):\n", - " if batch_size > len(self.buffer):\n", - " batch_size = len(self.buffer)\n", - " if sequential: # sequential sampling\n", - " rand = random.randint(0, len(self.buffer) - batch_size)\n", - " batch = [self.buffer[i] for i in range(rand, rand + batch_size)]\n", - " return zip(*batch)\n", - " else:\n", - " batch = random.sample(self.buffer, batch_size)\n", - " return zip(*batch)\n", - " def clear(self):\n", - " self.buffer.clear()\n", - " def __len__(self):\n", - " return len(self.buffer)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1.3 Define the agent" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import random # the module; `from random import random` would break random.random() and random.sample()\n", - "import parl\n", - "import paddle\n", - "import math\n", - "import numpy as np\n", - "\n", - "\n", - "class DQNAgent(parl.Agent):\n", - " \"\"\"Agent of DQN.\n", - " \"\"\"\n", - "\n", - " def __init__(self, algorithm, memory,cfg):\n", - " super(DQNAgent, self).__init__(algorithm)\n", - " self.n_actions = cfg['n_actions']\n", - " self.epsilon = cfg['epsilon_start']\n", - " self.sample_count = 0 \n", - " self.epsilon_start = cfg['epsilon_start']\n", - " self.epsilon_end = cfg['epsilon_end']\n", - " self.epsilon_decay = cfg['epsilon_decay']\n", - " self.batch_size = cfg['batch_size']\n", - " self.global_step = 0\n", - " self.update_target_steps = 600\n", - " self.memory = memory # replay buffer\n", - "\n", - " def sample_action(self, state):\n", - " self.sample_count += 1\n", - " # epsilon decays (here exponentially) to balance exploration and exploitation\n", - " self.epsilon = self.epsilon_end + (self.epsilon_start - self.epsilon_end) * \\\n", - " math.exp(-1. 
* self.sample_count / self.epsilon_decay) \n", - " if random.random() < self.epsilon:\n", - " action = np.random.randint(self.n_actions)\n", - " else:\n", - " action = self.predict_action(state)\n", - " return action\n", - "\n", - " def predict_action(self, state):\n", - " state = paddle.to_tensor(state , dtype='float32')\n", - " q_values = self.alg.predict(state) # self.alg is the algorithm passed to parl.Agent\n", - " action = q_values.argmax().numpy()[0]\n", - " return action\n", - "\n", - " def update(self):\n", - " \"\"\"Update the model with a batch sampled from the replay buffer\n", - " Sampled batch shapes:\n", - " state(np.float32): (batch_size, obs_dim)\n", - " action(np.int32): (batch_size)\n", - " reward(np.float32): (batch_size)\n", - " next_state(np.float32): (batch_size, obs_dim)\n", - " done(np.float32): (batch_size)\n", - " Returns:\n", - " the loss returned by self.alg.learn\n", - " \"\"\"\n", - " if len(self.memory) < self.batch_size: # skip the update until the buffer holds at least one batch\n", - " return\n", - " \n", - " if self.global_step % self.update_target_steps == 0:\n", - " self.alg.sync_target()\n", - " self.global_step += 1\n", - " state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.memory.sample(\n", - " self.batch_size)\n", - " action_batch = np.expand_dims(action_batch, axis=-1)\n", - " reward_batch = np.expand_dims(reward_batch, axis=-1)\n", - " done_batch = np.expand_dims(done_batch, axis=-1)\n", - "\n", - " state_batch = paddle.to_tensor(state_batch, dtype='float32')\n", - " action_batch = paddle.to_tensor(action_batch, dtype='int32')\n", - " reward_batch = paddle.to_tensor(reward_batch, dtype='float32')\n", - " next_state_batch = paddle.to_tensor(next_state_batch, dtype='float32')\n", - " done_batch = paddle.to_tensor(done_batch, dtype='float32')\n", - " return self.alg.learn(state_batch, action_batch, reward_batch, next_state_batch, done_batch) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "## 2. 
Define training and testing" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "def train(cfg, env, agent):\n", - " ''' Train the agent\n", - " '''\n", - " print(f\"Start training!\")\n", - " print(f\"Env: {cfg['env_name']}, algorithm: {cfg['algo_name']}, device: {cfg['device']}\")\n", - " rewards = [] # record rewards for all episodes\n", - " steps = []\n", - " for i_ep in range(cfg[\"train_eps\"]):\n", - " ep_reward = 0 # reward per episode\n", - " ep_step = 0\n", - " state = env.reset() # reset and obtain initial state\n", - " for _ in range(cfg['ep_max_steps']):\n", - " ep_step += 1\n", - " action = agent.sample_action(state) # sample action\n", - " next_state, reward, done, _ = env.step(action) # update env and return transitions\n", - " agent.memory.push((state, action, reward,next_state, done)) # save transitions\n", - " state = next_state # update next state for env\n", - " agent.update() # update agent\n", - " ep_reward += reward # accumulate the episode reward\n", - " if done:\n", - " break\n", - " steps.append(ep_step)\n", - " rewards.append(ep_reward)\n", - " if (i_ep + 1) % 10 == 0:\n", - " print(f\"Episode: {i_ep+1}/{cfg['train_eps']}, reward: {ep_reward:.2f}, epsilon: {agent.epsilon:.3f}\")\n", - " print(\"Finished training!\")\n", - " env.close()\n", - " res_dic = {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}\n", - " return res_dic\n", - "\n", - "def test(cfg, env, agent):\n", - " print(\"Start testing!\")\n", - " print(f\"Env: {cfg['env_name']}, algorithm: {cfg['algo_name']}, device: {cfg['device']}\")\n", - " rewards = [] # record rewards for all episodes\n", - " steps = []\n", - " for i_ep in range(cfg['test_eps']):\n", - " ep_reward = 0 # reward per episode\n", - " ep_step = 0\n", - " state = env.reset() # reset and obtain initial state\n", - " for _ in range(cfg['ep_max_steps']):\n", - " ep_step+=1\n", - " action = agent.predict_action(state) # predict action\n", - " next_state, reward, done, _ = 
env.step(action) \n", - " state = next_state \n", - " ep_reward += reward \n", - " if done:\n", - " break\n", - " 
steps.append(ep_step)\n", - " rewards.append(ep_reward)\n", - " print(f\"Episode: {i_ep+1}/{cfg['test_eps']}, reward: {ep_reward:.2f}\")\n", - " print(\"Finished testing!\")\n", - " env.close()\n", - " return {'episodes':range(len(rewards)),'rewards':rewards,'steps':steps}\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Define the environment" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/jj/opt/anaconda3/envs/easyrl/lib/python3.7/site-packages/gym/envs/registration.py:250: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.\n", - " for plugin in metadata.entry_points().get(entry_point, []):\n" - ] - } - ], - "source": [ - "import gym\n", - "import paddle\n", - "import numpy as np\n", - "import random\n", - "import os\n", - "from parl.algorithms import DQN\n", - "def all_seed(env,seed = 1):\n", - " ''' Seed everything used in RL (env, NumPy, random, Paddle); call it right after creating the env\n", - " Args:\n", - " env (gym.Env): the environment to seed\n", - " seed (int, optional): random seed. 
Defaults to 1.\n", - " '''\n", - " print(f\"seed = {seed}\")\n", - " env.seed(seed) # seed the environment\n", - " np.random.seed(seed)\n", - " random.seed(seed)\n", - " paddle.seed(seed)\n", - " \n", - "def env_agent_config(cfg):\n", - " ''' create env and agent\n", - " '''\n", - " env = gym.make(cfg['env_name']) \n", - " if cfg['seed'] !=0: # set random seed\n", - " all_seed(env,seed=cfg[\"seed\"]) \n", - " n_states = env.observation_space.shape[0] # state dimension\n", - " n_actions = env.action_space.n # action dimension\n", - " print(f\"n_states: {n_states}, n_actions: {n_actions}\")\n", - " cfg.update({\"n_states\":n_states,\"n_actions\":n_actions}) # store n_states and n_actions in cfg\n", - " model = MLP(n_states,n_actions)\n", - " algo = DQN(model, gamma=cfg['gamma'], lr=cfg['lr'])\n", - " memory = ReplayBuffer(cfg[\"memory_capacity\"]) # replay buffer\n", - " agent = DQNAgent(algo,memory,cfg) # create agent\n", - " return env, agent" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Set the hyperparameters" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/jj/opt/anaconda3/envs/easyrl/lib/python3.7/site-packages/seaborn/rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n", - " if LooseVersion(mpl.__version__) >= \"3.0\":\n", - "/Users/jj/opt/anaconda3/envs/easyrl/lib/python3.7/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", - " other = LooseVersion(other)\n" - ] - } - ], - "source": [ - "import argparse\n", - "import seaborn as sns\n", - "import matplotlib.pyplot as plt\n", - "def get_args():\n", - " \"\"\" Hyperparameters\n", - " \"\"\"\n", - " parser = argparse.ArgumentParser(description=\"hyperparameters\") \n", - " parser.add_argument('--algo_name',default='DQN',type=str,help=\"name of algorithm\")\n", - " parser.add_argument('--env_name',default='CartPole-v0',type=str,help=\"name of environment\")\n", - " parser.add_argument('--train_eps',default=200,type=int,help=\"episodes of training\") # number of training episodes\n", - " parser.add_argument('--test_eps',default=20,type=int,help=\"episodes of testing\") # number of testing episodes\n", - " parser.add_argument('--ep_max_steps',default = 100000,type=int,help=\"steps per episode, much larger value can simulate infinite steps\")\n", - " parser.add_argument('--gamma',default=0.99,type=float,help=\"discount factor\") # discount factor\n", - " parser.add_argument('--epsilon_start',default=0.95,type=float,help=\"initial value of epsilon\") # initial epsilon of the e-greedy policy\n", - " parser.add_argument('--epsilon_end',default=0.01,type=float,help=\"final value of epsilon\") # final epsilon of the e-greedy policy\n", - " parser.add_argument('--epsilon_decay',default=200,type=int,help=\"decay constant of epsilon, in sampled steps\") # epsilon = end + (start - end) * exp(-steps / epsilon_decay)\n", - " parser.add_argument('--memory_capacity',default=200000,type=int) # replay memory capacity\n", - " parser.add_argument('--memory_warmup_size',default=200,type=int) # replay memory warm-up size (not used by DQNAgent)\n", - " parser.add_argument('--batch_size',default=64,type=int,help=\"batch size of training\") # samples per training update\n", - " parser.add_argument('--targe_update_fre',default=200,type=int,help=\"frequency of target network update\") # target network update frequency (not used: DQNAgent hard-codes 
600)\n", - " parser.add_argument('--seed',default=10,type=int,help=\"seed\") \n", - " parser.add_argument('--lr',default=0.0001,type=float,help=\"learning rate\")\n", - " parser.add_argument('--device',default='cpu',type=str,help=\"cpu or 
gpu\") \n", - " args = parser.parse_args([]) \n", - " args = {**vars(args)} # convert args to a dict \n", - " return args\n", - "def smooth(data, weight=0.9): \n", - " '''Smooth a curve with an exponential moving average, similar to the smoothing in TensorBoard\n", - "\n", - " Args:\n", - " data (List): input data\n", - " weight (Float): smoothing weight in [0, 1]; higher values give a smoother curve, 0.9 is typical\n", - "\n", - " Returns:\n", - " smoothed (List): smoothed data\n", - " '''\n", - " last = data[0] # First value in the plot (first timestep)\n", - " smoothed = list()\n", - " for point in data:\n", - " smoothed_val = last * weight + (1 - weight) * point # exponential moving average\n", - " smoothed.append(smoothed_val) \n", - " last = smoothed_val \n", - " return smoothed\n", - "\n", - "def plot_rewards(rewards,cfg,path=None,tag='train'):\n", - " sns.set()\n", - " plt.figure() # new figure, so several plots can be drawn in one run\n", - " plt.title(f\"{tag}ing curve on {cfg['device']} of {cfg['algo_name']} for {cfg['env_name']}\")\n", - " plt.xlabel('episodes')\n", - " plt.plot(rewards, label='rewards')\n", - " plt.plot(smooth(rewards), label='smoothed')\n", - " plt.legend()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. See the results!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "seed = 10\n", - "n_states: 4, n_actions: 2\n", - "开始训练!\n", - "环境:CartPole-v0,算法:DQN,设备:cpu\n", - "回合:10/200,奖励:10.00,Epislon: 0.062\n", - "回合:20/200,奖励:85.00,Epislon: 0.014\n", - "回合:30/200,奖励:41.00,Epislon: 0.011\n", - "回合:40/200,奖励:31.00,Epislon: 0.010\n", - "回合:50/200,奖励:22.00,Epislon: 0.010\n", - "回合:60/200,奖励:10.00,Epislon: 0.010\n", - "回合:70/200,奖励:10.00,Epislon: 0.010\n", - "回合:80/200,奖励:22.00,Epislon: 0.010\n", - "回合:90/200,奖励:30.00,Epislon: 0.010\n", - "回合:100/200,奖励:20.00,Epislon: 0.010\n", - "回合:110/200,奖励:15.00,Epislon: 0.010\n", - "回合:120/200,奖励:45.00,Epislon: 0.010\n", - "回合:130/200,奖励:73.00,Epislon: 0.010\n", - "回合:140/200,奖励:180.00,Epislon: 0.010\n", - "回合:150/200,奖励:163.00,Epislon: 0.010\n", - "回合:160/200,奖励:191.00,Epislon: 0.010\n", - "回合:170/200,奖励:200.00,Epislon: 0.010\n", - "回合:180/200,奖励:200.00,Epislon: 0.010\n", - "回合:190/200,奖励:200.00,Epislon: 0.010\n", - "回合:200/200,奖励:200.00,Epislon: 0.010\n", - "完成训练!\n", - "开始测试!\n", - "环境:CartPole-v0,算法:DQN,设备:cpu\n", - "回合:1/20,奖励:200.00\n", - "回合:2/20,奖励:200.00\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/jj/opt/anaconda3/envs/easyrl/lib/python3.7/site-packages/seaborn/rcmod.py:400: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n", - " if LooseVersion(mpl.__version__) >= \"3.0\":\n", - "/Users/jj/opt/anaconda3/envs/easyrl/lib/python3.7/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", - " other = LooseVersion(other)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "回合:3/20,奖励:200.00\n", - "回合:4/20,奖励:200.00\n", - "回合:5/20,奖励:200.00\n", - "回合:6/20,奖励:200.00\n", - "回合:7/20,奖励:200.00\n", - "回合:8/20,奖励:193.00\n", - "回合:9/20,奖励:200.00\n", - "回合:10/20,奖励:200.00\n", - "回合:11/20,奖励:200.00\n", - "回合:12/20,奖励:200.00\n", - "回合:13/20,奖励:200.00\n", - "回合:14/20,奖励:194.00\n", - "回合:15/20,奖励:200.00\n", - "回合:16/20,奖励:200.00\n", - "回合:17/20,奖励:200.00\n", - "回合:18/20,奖励:200.00\n", - "回合:19/20,奖励:199.00\n", - "回合:20/20,奖励:200.00\n", - "完成测试!\n" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# Get the hyperparameters\n", - "cfg = get_args() \n", - "# Train\n", - "env, agent = env_agent_config(cfg)\n", - "res_dic = train(cfg, env, agent)\n", - " \n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"train\") \n", - "# Test\n", - "res_dic = test(cfg, env, agent)\n", - "plot_rewards(res_dic['rewards'], cfg, tag=\"test\") # plot the results" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.7.13 ('easyrl')", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.13" - }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "8994a120d39b6e6a2ecc94b4007f5314b68aa69fc88a7f00edf21be39b41f49c" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/projects/parl_tutorials/README.md b/projects/parl_tutorials/README.md deleted file mode 100644 index fd03c9a..0000000 --- a/projects/parl_tutorials/README.md +++ /dev/null @@ -1,19 +0,0 @@ -## Environment setup - -Because ```parl``` and ```paddle``` tend to cause version conflicts with notebook-related packages, creating a dedicated Conda environment is recommended: -```bash -conda create -n parl python=3.7 -``` - -Then install ```parl``` and ```paddle```: -```bash -pip install parl==2.0.5 - -pip install paddlepaddle-gpu==2.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple - -pip install paddlepaddle==2.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple -``` -Install the remaining dependencies: -```bash -pip install -r parl_requirements.txt -``` \ No newline at end of file diff --git a/projects/parl_tutorials/parl_requirements.txt b/projects/parl_tutorials/parl_requirements.txt deleted file mode 100644 index cc8624d..0000000 --- 
a/projects/parl_tutorials/parl_requirements.txt +++ /dev/null @@ -1,7 +0,0 @@ -gym==0.19.0 -ipykernel==6.0.0 -jupyter==1.0.0 -pyzmq==18.1.1 -jupyter-client==7.0.0 -matplotlib==3.5.3 -seaborn==0.12.0 \ No
newline at end of file diff --git a/projects/requirements.txt b/projects/requirements.txt deleted file mode 100644 index 5cda89e..0000000 --- a/projects/requirements.txt +++ /dev/null @@ -1,11 +0,0 @@ -pyyaml==6.0 -ipykernel==6.15.1 -jupyter==1.0.0 -matplotlib==3.5.3 -seaborn==0.12.1 -dill==0.3.5.1 -argparse==1.4.0 -pandas==1.3.5 -pyglet==1.5.26 -importlib-metadata<5.0 -setuptools==65.2.0 \ No newline at end of file