<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>高轲用's Blog</title><link>https://blog.gaokeyong.top/</link><description>Know when to stop!</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>gaokeyong@outlook.com (高轲用)</managingEditor><webMaster>gaokeyong@outlook.com (高轲用)</webMaster><lastBuildDate>Thu, 14 Jul 2022 17:00:12 +0800</lastBuildDate><atom:link href="https://blog.gaokeyong.top/index.xml" rel="self" type="application/rss+xml"/><item><title>Notes on Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR)</title><link>https://blog.gaokeyong.top/ptupcdr-notes/</link><pubDate>Thu, 14 Jul 2022 17:00:12 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/ptupcdr-notes/</guid><description><![CDATA[<h2 id="概述">Overview</h2>
<p>Recommender systems play an increasingly important role in web and mobile applications, yet the cold-start problem remains highly challenging.</p>
<p>Fortunately, a cold-start user's interactions in an auxiliary source domain can assist cold-start recommendation in the target domain. How to transfer user preferences from the source domain to the target domain is the key question in cross-domain recommendation (CDR), a promising approach to the cold-start problem. Most existing methods build a common preference bridge to transfer preferences for all users.</p>
<p>Since preferences differ from user to user, the preference bridge should differ across users as well. Following this idea, the paper proposes a new framework named Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR). Specifically, it learns a meta network that takes users' characteristic embeddings as input and generates a personalized bridge function for each user, enabling personalized preference transfer. To learn the meta network stably, the paper adopts a task-oriented optimization procedure. With the meta-generated personalized bridge function, a user's preference embedding in the source domain can be transformed into the target domain, and the transformed embedding serves as the initial embedding for that cold-start user in the target domain. Extensive experiments on large real-world datasets evaluate the effectiveness of PTUPCDR in both the cold-start and the warm-start stage.</p>
<p>Overall, the paper's main contributions are threefold:</p>
<ul>
<li>To address the cold-start problem in CDR, the paper proposes a new method, PTUPCDR, which uses a meta network to generate a personalized bridge function for each user, given the user's characteristics encoded from the source domain.</li>
<li>To learn the meta network stably, the paper adopts a task-oriented optimization procedure that mitigates the side effects of unreasonable user embeddings.</li>
<li>Extensive experiments on three cross-domain tasks built from the Amazon review dataset demonstrate the effectiveness and robustness of PTUPCDR in both cold-start and warm-start scenarios, whereas existing methods have only demonstrated effectiveness in the cold-start scenario.</li>
</ul>
<p>The paper was accepted at WSDM 2022; see <a href="https://arxiv.org/abs/2110.11154" target="_blank" rel="noopener noreffer">https://arxiv.org/abs/2110.11154</a>.</p>
<p>The accompanying code is at <a href="https://github.com/easezyc/WSDM2022-PTUPCDR" target="_blank" rel="noopener noreffer">https://github.com/easezyc/WSDM2022-PTUPCDR</a>.</p>
<h2 id="相关术语">Terminology</h2>
<h3 id="跨域推荐-cross-domain-recommendation">Cross-domain Recommendation</h3>
<p>Cross-domain recommendation was proposed to tackle the long-standing data-sparsity problem: it leverages feedback or ratings from multiple domains to improve recommendation accuracy in a variety of ways.</p>
<p>Inspired by transfer learning, CDR is a promising way to alleviate data sparsity and the cold-start problem in the target domain with the help of an auxiliary (source) domain. PTUPCDR is the first work to learn a personalized transfer bridge for each user.</p>
<p>For example, suppose a company runs both a music app and a video app. Transferring a user's interest features learned in the music app over to the video app is a cross-domain recommendation problem.</p>
<h3 id="冷启动推荐-cold-start-recommendation">Cold-start Recommendation</h3>
<p>Making recommendations for new users or items that have not yet produced any interactions in the system is a well-known challenge for recommender systems, called the cold-start problem; its counterpart is the warm-start problem.</p>
<p>Continuing the example: some users are active on the company's music app but have only just joined its video app. Because these users have almost no interactions with the video app, their interests in the video domain cannot be mined from their behavior there; this is a cold-start problem.</p>
<p>There are two broad ways to tackle the cold-start problem: actively, by designing decision strategies, or by exploiting auxiliary information. PTUPCDR belongs to the latter.</p>
<h3 id="元学习-meta-learning">Meta Learning</h3>
<p>Also known as &quot;learning to learn&quot;, meta learning aims to improve performance on new tasks by training on similar tasks. Meta-learning methods include metric-based, gradient-based, and parameter-generation-based approaches.</p>
<p>PTUPCDR is a parameter-generation-based method.</p>
<h2 id="模型">Model</h2>
<h3 id="问题背景">Problem Setting</h3>
<p>In CDR there are a source domain and a target domain. Each domain has a user set $\mathcal{U} = \{u_1, u_2, \dots\}$, an item set $\mathcal{V} = \{v_1, v_2, \dots\}$, and a rating matrix $\mathcal{R}$. $r_{ij} \in \mathcal{R}$ denotes the interaction between user $u_i$ and item $v_j$. To distinguish the two domains, the paper writes the source domain's user set, item set, and rating matrix as $\mathcal{U}^s, \mathcal{V}^s, \mathcal{R}^s$, and the target domain's as $\mathcal{U}^t, \mathcal{V}^t, \mathcal{R}^t$. The overlapping users between the two domains are defined as $\mathcal{U}^o = \mathcal{U}^s \cap \mathcal{U}^t$ <em>(users with interactions in both the source and the target domain)</em>. In contrast, $\mathcal{V}^s$ and $\mathcal{V}^t$ are disjoint: no item belongs to both domains.</p>
<p>In latent factor models, users and items are mapped to dense vectors, also called factors or embeddings. In this paper, $\bm{u}^d_i \in \mathbb{R}^{k}$ and $\bm{v}^d_j \in \mathbb{R}^{k}$ denote the embeddings of user $u^d_i$ and item $v^d_j$, where $k$ is the embedding dimension and $d \in \{s, t\}$ indicates the domain. For each user $u_i$, her chronological list of interacted items in the source domain is $\mathcal{S}_{u_i} = \{v^s_{t_1}, v^s_{t_2}, \cdots, v^s_{t_n}\}$, where $n$ is the number of interacted items and $v^s_{t_n}$ is the source-domain item interacted with at timestamp $t_n$.</p>
<h3 id="sec:attention">Characteristic Encoder</h3>
<p>The first step toward generating a personalized bridge function is to extract each user's personalized, transferable characteristics from interacted items. However, cold-start users have no interacted items in the target domain, so the interacted items $\mathcal{S}$ in the source domain must be used. Note that the characteristics we seek are those that actually help knowledge transfer.</p>
<p>Intuitively, different items contribute differently to knowledge transfer. An <strong>attention mechanism</strong> lets different parts contribute differently when compressed into a single representation. The paper therefore applies attention over the item embeddings and takes a weighted sum:</p>
<div>
$$\bm{p}_{u_i} = \sum_{v^s_j \in \mathcal{S}_{u_i}} a_j \bm{v}^s_j,$$
</div>
<p>Here $\bm{p}_{u_i} \in \mathbb{R}^{k}$ is the transferable characteristic embedding of user $u_i$, and $a_j$ is the attention score of item $v_j$ (which can be read as $v_j$'s importance for predicting the personalized bridge function). An item irrelevant to the target domain contributes little to any user's personalized bridge function, so the paper learns the attention scores from the item embeddings with an attention network, formally defined as:</p>
<div>
$$
\begin{aligned}
    a'_j &= h(\bm{v}_j;\theta),\\
    a_j &= \frac{\exp(a'_j)}{\sum_{v^s_l \in \mathcal{S}_{u_i}} \exp(a'_l)},
\end{aligned}
$$
</div>
<p>Here $h(\cdot)$ denotes the attention network and $\theta$ its parameters; in this paper, $h(\cdot)$ is a two-layer feed-forward network. Note that the normalized attention score $a_j$ helps identify the interacted items that are useful for a particular user. Each user's characteristics can then serve as input to guide the generation of the personalized bridge function.</p>
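As a concrete illustration, here is a minimal NumPy sketch of the characteristic encoder. The layer sizes, initialization, and tanh activation are assumptions for illustration, not taken from the paper's released code:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                              # embedding dimension (the paper uses k = 10)

# Hypothetical parameters of the two-layer attention network h(v; theta).
W1, b1 = rng.normal(scale=0.5, size=(k, k)), np.zeros(k)
w2, b2 = rng.normal(scale=0.5, size=k), 0.0

def attention_score(v):
    """Unnormalized score a'_j = h(v_j; theta)."""
    return w2 @ np.tanh(W1 @ v + b1) + b2

def encode(items):
    """Characteristic encoder: softmax-weighted sum of the user's
    source-domain item embeddings -> transferable characteristic p_u."""
    scores = np.array([attention_score(v) for v in items])
    a = np.exp(scores - scores.max())   # softmax over interacted items
    a /= a.sum()
    return (a[:, None] * items).sum(axis=0)

items = rng.normal(size=(5, k))     # five interacted items in the source domain
p_u = encode(items)                 # p_u has shape (k,)
```

Because the weights sum to one, $\bm{p}_{u_i}$ is a convex combination of the item embeddings, dominated by the items the network scores highly.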
<h3 id="sec:meta">Meta Network</h3>
<p>As noted above, the relationship between a user's preferences in different domains varies from user to user; in other words, preference transfer needs to be personalized. Intuitively, there is some connection between this preference relationship and the user's characteristics. Based on this intuition, the paper proposes a meta network that takes a user's transferable characteristics as input and generates a personalized bridge function between the user's embeddings in the source and target domains. The meta network is formulated as:</p>
<div>
$$\bm{w}_{u_i} = g(\bm{p}_{u_i}; \phi),$$
</div>
<p>Here $g(\cdot)$ is the meta network, with parameters $\phi$; in this paper it is a two-layer feed-forward network. $\bm{w}_{u_i}$ is a vector whose size depends on the structure of the bridge function. The personalized bridge function is written as:</p>
<div>
$$f_{u_i}(\cdot;\bm{w}_{u_i}),$$
</div>
<p>where $\bm{w}_{u_i}$ serves as the parameters of the bridge function $f(\cdot)$. The bridge function could take any structure; for simplicity and clarity, the paper follows EMCDR and uses a linear layer as $f(\cdot)$. To match the size of the bridge parameters, the vector $\bm{w}_{u_i} \in \mathbb{R}^{k^2}$ is reshaped into a matrix $\bm{w}_{u_i} \in \mathbb{R}^{k \times k}$. Note that $\bm{w}_{u_i}$ is used as the bridge function's parameters, not as its input. The generated bridge function depends on the user's characteristics and differs from user to user, hence the name personalized bridge function.</p>
<p>Through the personalized bridge function, the personalized transformed user embedding is obtained as:</p>
<div>
$$\hat{\bm{u}}_i^t = f_{u_i}(\bm{u}^s_i;\bm{w}_{u_i}),$$
</div>
<p>Here $\bm{u}^s_i$ is user $u_i$'s embedding in the source domain and $\hat{\bm{u}}_i^t$ is the transformed embedding, which is finally used for prediction.</p>
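Continuing the NumPy sketch, the meta network and the personalized linear bridge can be expressed as follows. The hidden size $2k$ and output size $k\times k$ follow the paper's implementation details, while the weights and activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 10

# Hypothetical parameters of the two-layer meta network g(p_u; phi):
# hidden size 2k, output size k*k, per the paper's implementation details.
W1, b1 = rng.normal(scale=0.1, size=(2 * k, k)), np.zeros(2 * k)
W2, b2 = rng.normal(scale=0.1, size=(k * k, 2 * k)), np.zeros(k * k)

def meta_network(p_u):
    """g(p_u; phi): emits the bridge parameters w_{u_i} as a length k*k vector."""
    return W2 @ np.tanh(W1 @ p_u + b1) + b2

p_u = rng.normal(size=k)             # transferable characteristic from the encoder
u_s = rng.normal(size=k)             # the user's source-domain embedding

w = meta_network(p_u).reshape(k, k)  # reshape into the linear bridge's weight matrix
u_t_hat = w @ u_s                    # personalized bridge f_{u_i}(u_s; w_{u_i})

# Task-oriented squared error for one observed target-domain rating r_ij:
v_t = rng.normal(size=k)             # a target-domain item embedding
r_ij = 4.0
loss = (r_ij - u_t_hat @ v_t) ** 2
```

The key design point is visible here: $\bm{w}_{u_i}$ is the *output* of the meta network, used as weights, so each user effectively gets their own bridge matrix.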
<h3 id="sec:taskoriented">Task-oriented Optimization</h3>
<p>To train the meta network and the characteristic encoder, one could follow existing bridge-based methods and use a <b>mapping-oriented optimization</b> procedure that minimizes the distance:</p>
<div>
$$\mathcal{L} = \sum_{u_i \in \mathcal{U}^o} || \hat{\bm{u}}^t_i - \bm{u}_i^t ||^2$$
</div>
<p>Here $\hat{\bm{u}}^t_i$ is the embedding transformed from the source-domain user $\bm{u}^s_i$, and $\bm{u}_i^t$ is the user's embedding in the target domain. The mapping-oriented procedure pulls the transformed embedding $\hat{\bm{u}}_i^t$ toward the target-domain embedding $\bm{u}_i^t$.</p>
<p>However, because some users have only limited interactions, the user embedding $\bm{u}_i^t$ may not be reasonable or accurate, and learning against such relatively unreasonable embeddings harms the model. The paper therefore proposes a task-oriented optimization procedure for training the meta network and the characteristic encoder, which directly uses the performance of the final recommendation task as the objective. Since this paper focuses on rating prediction, the task-oriented loss is formulated as:</p>
<div>
$$\min_{\theta, \phi} \frac{1}{|\mathcal{R}^t_o|} \sum_{r_{ij} \in \mathcal{R}^t_o} (r_{ij} - f_{u_i}(\bm{u}^s_i;\bm{w}_{u_i})\bm{v}_j)^2$$
</div>
<p>Here $\mathcal{R}^t_o = \{r_{ij} \mid u_i \in \mathcal{U}^o, v_j \in \mathcal{V}^t\}$ denotes the interactions of the overlapping users in the target domain.</p>
<p>Compared with the mapping-oriented procedure, task-oriented optimization has two advantages:</p>
<ol>
<li>Task-oriented optimization mitigates the influence of unreasonable embeddings: it directly uses the rating data, which are ground truth rather than approximate intermediate results.</li>
<li>The task-oriented procedure has more training samples, which helps avoid overfitting. For example, with $N$ overlapping users each having $M$ ratings, the mapping-oriented procedure learns the mapping function from $|\mathcal{U}^o| = N$ samples, whereas the task-oriented procedure uses $|\mathcal{R}^t_o| = M \times N$ user-item ratings.</li>
</ol>
<h3 id="整体流程">Overall Procedure</h3>
<p>The overall architecture of PTUPCDR is shown in the paper's figure. Training is divided into three steps: the pre-training, meta, and initialization stages. After training, the method can be applied in both the cold-start and the warm-start stage.</p>
<p><strong>Pre-training stage</strong>: this step learns a latent space for each domain separately. The loss function is:
$$\min_{\bm{u}, \bm{v}} \frac{1}{|\mathcal{R}|} \sum_{r_{ij}\in \mathcal{R}} (r_{ij} - \bm{u}_i \bm{v}_j)^2,$$
where $|\mathcal{R}|$ is the number of ratings. After pre-training, we obtain the pre-trained embeddings
$\bm{u}^s, \bm{u}^t, \bm{v}^s, \bm{v}^t$.</p>
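For concreteness, here is a minimal NumPy sketch of this pre-training objective on a toy rating set, using plain SGD on the squared error. The learning rate, initialization, and toy data are illustrative assumptions; the paper's setup uses Adam and real rating matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n_users, n_items = 10, 4, 6

U = rng.normal(scale=0.1, size=(n_users, k))   # user embeddings u_i
V = rng.normal(scale=0.1, size=(n_items, k))   # item embeddings v_j
ratings = [(0, 1, 5.0), (2, 3, 3.0)]           # toy (user, item, rating) triples

lr = 0.05
for _ in range(200):                           # SGD on (r_ij - u_i . v_j)^2
    for i, j, r in ratings:
        err = r - U[i] @ V[j]
        U[i] += lr * err * V[j]
        V[j] += lr * err * U[i]

mse = sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings) / len(ratings)
```

After this step, the rows of `U` and `V` play the role of the pre-trained embeddings $\bm{u}, \bm{v}$ for one domain.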
<p><strong>Meta stage</strong>:
whereas existing methods directly train a common bridge function, PTUPCDR trains the characteristic encoder and the meta network.</p>
<p><strong>Initialization stage</strong>:
when a new user arrives (CDR assumes the new user has some interactions in the source domain), the transformed embedding
$\hat{\bm{u}}_i^t = f_{u_i}(\bm{u}^s_i;\bm{w}_{u_i})$ is used to initialize the user's embedding in the target domain.</p>
<p><strong>Test stage</strong>:
for extreme cold-start users with no interactions in the target domain, the initial embedding
$\hat{\bm{u}}_i^t = f_{u_i}(\bm{u}^s_i;\bm{w}_{u_i})$ is used directly for prediction. For warm-start users with some interactions in the target domain, it is convenient to fine-tune the initial embedding on the new interactions and predict with the fine-tuned embedding.</p>
<h2 id="论文中的实验">Experiments</h2>
<p><strong>Dataset:</strong> the Amazon-5cores dataset<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> is used.</p>
<p><strong>Baselines:</strong></p>
<ul>
<li>
<p><strong>TGT</strong>: uses only the data in the target domain</p>
</li>
<li>
<p><strong>CMF</strong>: users share the same embeddings across the source and target domains</p>
</li>
<li>
<p><strong>EMCDR</strong>: first learns embeddings with matrix factorization (MF), then uses a network to map user embeddings from the auxiliary domain to the target domain.</p>
</li>
<li>
<p><strong>DCDCSR</strong>: a bridge-based method that takes into account each individual user's rating sparsity in the different domains</p>
</li>
<li>
<p><strong>SSCDR</strong>: a semi-supervised bridge-based method.</p>
</li>
</ul>
<p><strong>Implementation details:</strong></p>
<ul>
<li>
<p>the Adam optimizer's initial learning rate is tuned by grid search over 0.001, 0.005, 0.01, 0.02, and 0.1</p>
</li>
<li>
<p>embedding dimension: 10</p>
</li>
<li>
<p>mini-batch size: 512</p>
</li>
<li>
<p>the meta network has $2\times k$ hidden units, where $k$ is the embedding dimension, and its output dimension is $k\times k$</p>
</li>
<li>
<p>the attention network is a two-layer network with $k$ hidden units</p>
</li>
</ul>
<h3 id="实验结果">Results</h3>
<p>PTUPCDR clearly outperforms the best baseline in most cases, which shows that PTUPCDR is effective for cold-start recommendation.</p>
<p>The experimental data also show that PTUPCDR helps substantially in the warm-start stage.</p>
<h2 id="总结">Summary</h2>
<p>The paper observes that the single bridge function shared by all users in existing work can hardly capture the varied relationships between user preferences in the source and target domains. It therefore proposes a new framework, PTUPCDR: a meta network learned over users' characteristic embeddings generates personalized bridge functions, achieving personalized transfer of user preferences. Extensive experiments on real-world datasets validate the effectiveness of PTUPCDR in both the cold-start and the warm-start stage.</p>
<p>Overall, the paper's focus is personalization: it fully considers personalization across domains and across users, models this abstract notion mathematically, captures potentially related variables, and builds an appropriate hierarchical link between them and the prediction task, yielding a personalized knowledge-transfer bridge function. Comparing the trained model against state-of-the-art work in cold-start and warm-start settings demonstrates the effectiveness and robustness of PTUPCDR in both scenarios.</p>
<section class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="http://jmcauley.ucsd.edu/data/amazon/" target="_blank" rel="noopener noreffer">http://jmcauley.ucsd.edu/data/amazon/</a>&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</section>
]]></description></item><item><title>Functional Programming (1): A First Look at Functional Thinking</title><link>https://blog.gaokeyong.top/fp-haskell-1/</link><pubDate>Thu, 14 Jul 2022 09:37:53 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/fp-haskell-1/</guid><description><![CDATA[<h2 id="列表的操作">Operations on Lists</h2>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-Haskell" data-lang="Haskell"><span class="o">&gt;</span> <span class="n">head</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="mi">1</span>
<span class="o">&gt;</span> <span class="n">tail</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="o">&gt;</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="o">!!</span> <span class="mi">2</span>
<span class="mi">3</span>
<span class="o">&gt;</span> <span class="n">take</span> <span class="mi">3</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="o">&gt;</span> <span class="n">drop</span> <span class="mi">3</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="p">[</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="o">&gt;</span> <span class="n">length</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="mi">5</span>
<span class="o">&gt;</span> <span class="n">sum</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="mi">15</span>
<span class="o">&gt;</span> <span class="n">product</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="mi">120</span>
<span class="o">&gt;</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="o">++</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="o">&gt;</span> <span class="n">reverse</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="p">[</span><span class="mi">5</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="函数应用">Function Application</h2>
<p>In Haskell, function application is written with a space, and multiplication with the asterisk *.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-Haskell" data-lang="Haskell"><span class="nf">f</span> <span class="n">a</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="o">*</span><span class="n">d</span>
</code></pre></td></tr></table>
</div>
</div><table>
<thead>
<tr>
<th style="text-align:left">Mathematics</th>
<th style="text-align:left">Haskell</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><code>f(x)</code></td>
<td style="text-align:left"><code>f x</code></td>
</tr>
<tr>
<td style="text-align:left"><code>f(x, y)</code></td>
<td style="text-align:left"><code>f x y</code></td>
</tr>
<tr>
<td style="text-align:left"><code>f(g(x))</code></td>
<td style="text-align:left"><code>f (g x)</code></td>
</tr>
<tr>
<td style="text-align:left"><code>f(g(x), y)</code></td>
<td style="text-align:left"><code>f (g x) y</code></td>
</tr>
<tr>
<td style="text-align:left"><code>f(x)g(y)</code></td>
<td style="text-align:left"><code>f x * g y</code></td>
</tr>
</tbody>
</table>
<h2 id="haskell脚本">Haskell Scripts</h2>
<p>User-defined functions live in a script: a text file made up of a sequence of definitions, conventionally with the <code>.hs</code> extension.</p>
<p>When developing a Haskell script, it is useful to keep two windows open: an editor on the script, and a session running GHCi.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-Haskell" data-lang="Haskell"><span class="c1">-- test.hs</span>
<span class="nf">double</span> <span class="n">x</span> <span class="ow">=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">x</span>
<span class="nf">quadruple</span> <span class="n">x</span> <span class="ow">=</span> <span class="n">double</span> <span class="p">(</span><span class="n">double</span> <span class="n">x</span><span class="p">)</span>
</code></pre></td></tr></table>
</div>
</div><p>Run <code>ghci test.hs</code> in a terminal; entering <code>double 3</code> gives <code>6</code>, and <code>quadruple 3</code> gives <code>12</code>.</p>
<p>Without closing GHCi, add a factorial function and an averaging function to <code>test.hs</code>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-Haskell" data-lang="Haskell"><span class="nf">double</span> <span class="n">x</span> <span class="ow">=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">x</span>
<span class="nf">quadruple</span> <span class="n">x</span> <span class="ow">=</span> <span class="n">double</span> <span class="p">(</span><span class="n">double</span> <span class="n">x</span><span class="p">)</span>
<span class="nf">factorial</span> <span class="n">n</span> <span class="ow">=</span> <span class="n">product</span> <span class="p">[</span><span class="mi">1</span><span class="o">..</span><span class="n">n</span><span class="p">]</span>
<span class="nf">average</span> <span class="n">ns</span> <span class="ow">=</span> <span class="n">sum</span> <span class="n">ns</span> <span class="p">`</span><span class="n">div</span><span class="p">`</span> <span class="n">length</span> <span class="n">ns</span>
</code></pre></td></tr></table>
</div>
</div><div class="details admonition note open">
        <div class="details-summary admonition-title">
            <i class="icon fas fa-pencil-alt fa-fw"></i>Note<i class="details-icon fas fa-angle-right fa-fw"></i>
        </div>
        <div class="details-content">
            <div class="admonition-content"><ul>
<li><code>div</code> is enclosed in backticks, not ordinary quotation marks.</li>
<li><code>x `f` y</code> is just syntactic sugar for <code>f x y</code>.</li>
</ul>
</div>
        </div>
    </div>
<p>GHCi does not automatically detect that the script has changed, so before using the new definitions you must run the reload command <code>:reload</code>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-Haskell" data-lang="Haskell"><span class="nf">ghci</span><span class="o">&gt;</span> <span class="o">:</span><span class="n">reload</span>
<span class="p">[</span><span class="mi">1</span> <span class="kr">of</span> <span class="mi">1</span><span class="p">]</span> <span class="kt">Compiling</span> <span class="kt">Main</span>             <span class="p">(</span> <span class="n">test</span><span class="o">.</span><span class="n">hs</span><span class="p">,</span> <span class="n">interpreted</span> <span class="p">)</span>
<span class="kt">Ok</span><span class="p">,</span> <span class="n">one</span> <span class="kr">module</span> <span class="err">loaded.</span>
<span class="err">ghci&gt;</span> <span class="err">factorial</span> <span class="err">5</span>
<span class="err">120</span>
<span class="err">ghci&gt;</span> <span class="err">average</span> <span class="err">[1,1,4,5,1,4]</span>
<span class="err">2</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="常用ghci命令">Common GHCi Commands</h2>
<table>
<thead>
<tr>
<th style="text-align:left">Command</th>
<th style="text-align:left">Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><code>:load </code><em><code>name</code></em></td>
<td style="text-align:left">Load script <em>name</em></td>
</tr>
<tr>
<td style="text-align:left"><code>:reload</code></td>
<td style="text-align:left">Reload the current script</td>
</tr>
<tr>
<td style="text-align:left"><code>:set editor </code><em><code>name</code></em></td>
<td style="text-align:left">Set the editor to <em>name</em></td>
</tr>
<tr>
<td style="text-align:left"><code>:edit </code><em><code>name</code></em></td>
<td style="text-align:left">Edit script <em>name</em></td>
</tr>
<tr>
<td style="text-align:left"><code>:edit</code></td>
<td style="text-align:left">Edit the current script</td>
</tr>
<tr>
<td style="text-align:left"><code>:type </code><em><code>expr</code></em></td>
<td style="text-align:left">Show the type of expression <em>expr</em></td>
</tr>
<tr>
<td style="text-align:left"><code>:?</code></td>
<td style="text-align:left">List all commands</td>
</tr>
<tr>
<td style="text-align:left"><code>:quit</code></td>
<td style="text-align:left">Quit GHCi</td>
</tr>
</tbody>
</table>
<h2 id="函数式思维">Functional Thinking</h2>
<p>Use mathematical functions as the basic building blocks for solving information-processing problems.</p>
<ul>
<li><u>Define from scratch</u> a few basic functions</li>
<li><u>Compose</u> existing functions into new ones</li>
</ul>
<h3 id="自然数上的-fold-函数">The <code>fold</code> Function on Natural Numbers</h3>
<p>The functions <code>plus</code>, <code>mult</code>, and <code>expn</code> share a common pattern, and this commonality can be captured in a single function.</p>
<div>
$$
\begin{aligned}
foldn&:(A\rightarrow A) \rightarrow (A \rightarrow (\mathbb{N} \rightarrow A)) \\
foldn(h)(c)(0)&\doteq c \\
foldn(h)(c)(succ(n))&\doteq h(foldn(h)(c)(n)) \\
\end{aligned}
$$
</div>
<p>From this it follows that</p>
<div>
$$
\begin{aligned}
h&:A\rightarrow A \\
c&:A \\
\end{aligned}
$$
</div>
<p>Given a function $h:A\rightarrow A$ and a value $c:A$, let $f=foldn(h)(c)$. Then by the definition above:</p>
<div>
$$
\begin{aligned}
f(0)&\doteq c \\
f(succ(n))&\doteq h(f(n)) \\
\end{aligned}
$$
</div>
<div class="details admonition note open">
        <div class="details-summary admonition-title">
            <i class="icon fas fa-pencil-alt fa-fw"></i>Note<i class="details-icon fas fa-angle-right fa-fw"></i>
        </div>
        <div class="details-content">
            <div class="admonition-content"><p>Another way to understand the $foldn$ function:</p>
<ul>
<li>For any natural number $n$:
<div>
$$n=(\underbrace{succ\cdot succ\cdots succ}_\text{n $succ$ functions})(0)$$
</div>
</li>
<li>Given $f=foldn(h)(c)$:
<div>
$$f(n)=(\underbrace{h\cdot h\cdots h}_\text{n $h$ functions})(c)$$
</div>
</li>
</ul>
</div>
        </div>
    </div>
<p>Using <code>foldn</code>, the functions <code>plus</code>, <code>mult</code>, and <code>expn</code> can be defined more concisely:</p>
<div>
$$
\begin{aligned}
plus&:\mathbb{N}\rightarrow (\mathbb{N}\rightarrow \mathbb{N}) \\
plus(n)&\doteq foldn(succ)(n) \\
m&=(\underbrace{succ\cdot succ\cdots succ}_\text{m $succ$ functions})(0) \\
plus(n)(m)&=(\underbrace{succ\cdot succ\cdots succ}_\text{m $succ$ functions})(n) \\
\end{aligned}
$$
</div>
<div>
$$
\begin{aligned}
mult&:\mathbb{N}\rightarrow (\mathbb{N}\rightarrow \mathbb{N}) \\
mult(n)&\doteq foldn(plus(n))(0) \\
mult(n)(m)&=(\underbrace{plus(n)\cdot plus(n)\cdots plus(n)}_\text{m $plus(n)$ functions})(0) \\
\end{aligned}
$$
</div>
<div>
$$
\begin{aligned}
expn&:\mathbb{N}\rightarrow (\mathbb{N}\rightarrow \mathbb{N}) \\
expn(n)&\doteq foldn(mult(n))(1) \\
expn(n)(m)&=(\underbrace{mult(n)\cdot mult(n)\cdots mult(n)}_\text{m $mult(n)$ functions})(1) \\
\end{aligned}
$$
</div>
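The definitions above translate almost directly into Haskell. A small sketch, using `Integer` in place of the abstract ℕ (the name `expn` follows the text):

```haskell
-- foldn h c n applies h to c exactly n times: primitive recursion on naturals.
foldn :: (a -> a) -> a -> Integer -> a
foldn _ c 0 = c
foldn h c n = h (foldn h c (n - 1))

plus, mult, expn :: Integer -> Integer -> Integer
plus n = foldn succ n      -- plus n m: apply succ m times, starting from n
mult n = foldn (plus n) 0  -- mult n m: add n, m times, starting from 0
expn n = foldn (mult n) 1  -- expn n m: multiply by n, m times, starting from 1
```

For example, `plus 2 3` evaluates to `5` and `expn 2 10` to `1024`.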
<h3 id="fact函数">The <code>fact</code> Function</h3>
<p>To define the <code>fact</code> and <code>fib</code> functions with <code>foldn</code>, first introduce two auxiliary functions:</p>
<div>
$$
\begin{aligned}
outl&:A\times B \rightarrow A \\
outl(a,b)&\doteq a
\end{aligned}
$$
</div>
<div>
$$
\begin{aligned}
outr&:A\times B \rightarrow B \\
outr(a,b)&\doteq b
\end{aligned}
$$
</div>
<p>The <code>fact</code> function is defined as follows:</p>
<div>
$$
\begin{aligned}
f&:\mathbb{N}\times \mathbb{N}\rightarrow \mathbb{N}\times \mathbb{N} \\
f(m,n)&\doteq (m+1,(m+1)\times n) \\
fact&:\mathbb{N}\rightarrow \mathbb{N} \\
fact&\doteq outr\cdot foldn(f)(0,1) \\
\\
m&=(\underbrace{succ\cdot succ\cdots succ}_\text{m $succ$ functions})(0) \\
fact(m)&=outr((\underbrace{f\cdot f\cdots f}_\text{m $f$ functions})(0,1)) \\
\end{aligned}
$$
</div>
<h3 id="fib函数">The <code>fib</code> Function</h3>
<div>
$$
\begin{aligned}
g&:\mathbb{N}\times \mathbb{N}\rightarrow \mathbb{N}\times \mathbb{N} \\
g(m,n)&\doteq (n,(m+n)) \\
fib&:\mathbb{N}\rightarrow \mathbb{N} \\
fib&\doteq outl\cdot foldn(g)(0,1) \\
\\
m&=(\underbrace{succ\cdot succ\cdots succ}_\text{m $succ$ functions})(0) \\
fib(m)&=outl((\underbrace{g\cdot g\cdots g}_\text{m $g$ functions})(0,1)) \\
\end{aligned}
$$
</div>
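These two definitions can likewise be sketched in Haskell (the `foldn` definition is repeated so the snippet stands alone):

```haskell
-- Repeated from the previous sketch.
foldn :: (a -> a) -> a -> Integer -> a
foldn _ c 0 = c
foldn h c n = h (foldn h c (n - 1))

-- Projections out of a pair, as in the text.
outl :: (a, b) -> a
outl (a, _) = a

outr :: (a, b) -> b
outr (_, b) = b

-- fact m = outr (f^m (0,1)) where f (m,n) = (m+1, (m+1)*n).
fact :: Integer -> Integer
fact = outr . foldn (\(m, n) -> (m + 1, (m + 1) * n)) (0, 1)

-- fib m = outl (g^m (0,1)) where g (m,n) = (n, m+n).
fib :: Integer -> Integer
fib = outl . foldn (\(m, n) -> (n, m + n)) (0, 1)
```

Carrying a pair through the fold is what lets a single-step recursion scheme compute functions, like <code>fib</code>, that seem to need two previous values.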
<h2 id="序列list以及序列上的fold函数">Lists and the <code>fold</code> Function on Lists</h2>
<p>TBD</p>
<h2 id="list-相关函数的重定义">Redefining List Functions</h2>
<p>TBD</p>
<h2 id="一种排序算法">A Sorting Algorithm</h2>
<p>TBD</p>
]]></description></item><item><title>Using the MIPSsim Simulator on Linux</title><link>https://blog.gaokeyong.top/mipssim-wine/</link><pubDate>Sun, 01 May 2022 07:49:49 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/mipssim-wine/</guid><description><![CDATA[<p>Wine is a compatibility layer that can run Windows applications on several POSIX-compliant operating systems, such as <strong>Linux, macOS, and BSD</strong>. The end result of running the MIPSsim simulator on Linux looks like this:</p>
<p>A bit of setup is needed before it works, though.</p>
<h2 id="安装-wine">Installing Wine</h2>
<p>You need to install <a href="https://wiki.winehq.org/Download" target="_blank" rel="noopener noreffer"><code>wine</code></a> (the 64-bit build), <a href="https://wiki.winehq.org/Gecko#Installing" target="_blank" rel="noopener noreffer"><code>wine-gecko</code></a>, and <a href="https://wiki.winehq.org/Winetricks#Getting_winetricks" target="_blank" rel="noopener noreffer"><code>winetricks</code></a>. Use your system's package manager, or follow the instructions behind the links above.</p>
<p>The steps below were verified to work with Wine 7.7.</p>
<h2 id="配置中文字体">Configuring Chinese Fonts</h2>
<p>Simply run:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">winetricks fakechinese
</code></pre></td></tr></table>
</div>
</div><h2 id="安装net-framework-46">Installing .NET Framework 4.6</h2>
<p><small>See: <a href="https://appdb.winehq.org/objectManager.php?sClass=version&amp;iId=32828" target="_blank" rel="noopener noreffer">WineHQ  - .NET Framework 4.6</a></small></p>
<p>Run:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="nv">LANG</span><span class="o">=</span>zh_CN.UTF-8 winetricks --force dotnet46 corefonts riched20
</code></pre></td></tr></table>
</div>
</div><div class="details admonition warning open">
        <div class="details-summary admonition-title">
            <i class="icon fas fa-exclamation-triangle fa-fw"></i>Warning<i class="details-icon fas fa-angle-right fa-fw"></i>
        </div>
        <div class="details-content">
            <div class="admonition-content">Be sure to install dotnet<b style="color:red;">46</b>; either a newer or an older version is likely to cause problems.</div>
        </div>
    </div>
<p>The installation may take up to three rounds of the setup wizard to complete.</p>
<h2 id="运行模拟器">Running the Simulator</h2>
<p>Unzip <code>计算机系统结构实验指导书及模拟器-发布版.zip</code> (the lab-manual-and-simulator bundle) and, in the program's directory, run <code>LANG=zh_CN.UTF-8 wine64 &quot;MIPS模拟器(64位).exe&quot;</code>. It runs like this:</p>
<h2 id="相关问题">Known Issues</h2>
<h3 id="分辨率过高字体过小">Fonts too small on a high-resolution display</h3>
<p>Adjust the Wine DPI: run <code>LANG=zh_CN.UTF-8 winecfg</code> and change the screen resolution on the <code>显示</code> (Graphics) tab:</p>
<h3 id="载入程序时提示汇编错误">Assembly error when loading a program</h3>
<p>Loading a program may fail with an assembly error: &quot;missing required code segment definition&quot;. This is probably because the file uses <code>\n</code> line endings, while the simulator requires <code>\r\n</code>.</p>
<p>You can switch the line-ending mode in the lower-right corner of VS Code.</p>
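Alternatively, the conversion can be done from the command line. A sketch using GNU sed on a hypothetical `demo.s` (the `unix2dos` tool, if installed, does the same job):

```shell
# Create a sample source file with Unix (LF) line endings,
# then append a CR before each LF so the simulator accepts it.
printf '.text\nADD $1, $2, $3\n' > demo.s
sed -i 's/$/\r/' demo.s   # LF -> CRLF in place (GNU sed syntax)
```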
]]></description></item><item><title>Visualizing Program Profiles with Interactive Flame Graphs on Linux</title><link>https://blog.gaokeyong.top/flame-graph/</link><pubDate>Fri, 22 Apr 2022 21:52:00 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/flame-graph/</guid><description><![CDATA[<p>Working out why CPUs are busy is a routine task of performance analysis, and it usually involves profiling stack traces. Profiling by sampling at a fixed rate is a coarse but effective way to see which code paths are hot (busy on-CPU). It usually works by setting up a timed interrupt that collects the current program counter, function address, or the whole stack back-trace, and turning that into something human-readable when a summary report is printed.</p>
<p>perf, the system-wide profiler in linux-tools, provides a performance-analysis framework: through perf, applications can draw on the PMU, tracepoints, and in-kernel counters for performance statistics. Tracepoints are hooks scattered through the kernel source that fire when particular code paths are executed, a property that various trace/debug tools exploit. perf records the events that tracepoints generate and produces reports; by analyzing these reports, a performance engineer can learn all kinds of kernel details from a program's run and pinpoint its bottlenecks.</p>
<p>However, the profile data perf produces can run to thousands of lines, and humans, being visual creatures, are rather bad at reading piles of numbers and making sense of them. A flame graph is a visualization of sampled stack traces. Beyond CPU profiling, see the flame graph homepage for this visualization's other uses.</p>
<p>Here I will show how to use <a href="https://www.brendangregg.com/flamegraphs.html" target="_blank" rel="noopener noreffer">Flame Graph</a> to generate a CPU flame graph of a program profile. It works together with perf, and the generated SVG is <strong>interactive</strong>: you can zoom in or search to locate a specific part of the program. It looks like this:</p>
<h1 id="安装">Installation</h1>
<p>First install perf; on Arch Linux, for example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">sudo pacman -S perf
</code></pre></td></tr></table>
</div>
</div><p>Clone the <a href="https://github.com/brendangregg/FlameGraph" target="_blank" rel="noopener noreffer">FlameGraph repository</a>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">git clone https://github.com/brendangregg/FlameGraph  <span class="c1"># or download it from github</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="使用">Usage</h1>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="nb">cd</span> FlameGraph
sudo perf record -F <span class="m">99</span> -a -g -- COMMAND <span class="o">[</span>ARGS<span class="o">]</span>
</code></pre></td></tr></table>
</div>
</div><p><code>COMMAND [ARGS]</code> is the program to profile together with its arguments. The options mean: sample stack frames at 99 Hz (<code>-F 99</code>), record events on all CPU cores (<code>-a</code>), and enable call-graph recording for both kernel and user space (<code>-g</code>). Other commonly used options include:</p>
<ul>
<li><code>-p, --pid=</code>: record events for the given process pids (comma-separated)</li>
<li><code>-e, --event=</code>: select the PMU event to record</li>
<li><code>-C, --cpu</code>: sample only on the given CPU cores, e.g. <code>0,1</code> or <code>0-2</code></li>
<li><code>-o, --output=</code>: output file name</li>
</ul>
<p>When the program finishes, the profile data <code>perf.data</code> is written to the current directory. The scripts in the FlameGraph repository then turn it into a flame graph:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">sudo perf script <span class="p">|</span> ~/FlameGraph/stackcollapse-perf.pl &gt; out.perf-folded
~/FlameGraph/flamegraph.pl out.perf-folded &gt; perf.svg
</code></pre></td></tr></table>
</div>
</div><p><code>perf.svg</code> is generated in the current directory. Open it with a browser (Firefox, for example):</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">firefox perf.svg
</code></pre></td></tr></table>
</div>
</div><p>You can explore the flame graph by clicking around, or use the search box in the upper-right corner to locate a particular function.</p>
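For reference, the intermediate "folded" format that `stackcollapse-perf.pl` emits is plain text: one line per unique stack, frames joined by semicolons, followed by a sample count. The frame names below are made up for illustration:

```shell
cat > out.perf-folded <<'EOF'
main;compute;hot_loop 90
main;compute;memcpy 12
main;io_wait 9
EOF
# flamegraph.pl turns such a file into the interactive SVG, with each
# line becoming a stack of boxes whose width is proportional to the count:
#   ~/FlameGraph/flamegraph.pl out.perf-folded > perf.svg
```

Because the format is so simple, any profiler whose output you can collapse into these lines can feed a flame graph, not just perf.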
<h1 id="参考">References</h1>
<p><a href="https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html" target="_blank" rel="noopener noreffer">CPU Flame Graphs</a></p>
<p><a href="https://missing.csail.mit.edu/2020/debugging-profiling/" target="_blank" rel="noopener noreffer">Debugging and Profiling - the missing semester of your cs education</a></p>
]]></description></item><item><title>RSS: A Venerable Content Subscription Technology</title><link>https://blog.gaokeyong.top/rss/</link><pubDate>Wed, 02 Mar 2022 20:44:59 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/rss/</guid><description><![CDATA[<h1 id="何谓rss">What Is RSS?</h1>
<p>For young people today, RSS may be an entirely unfamiliar concept; for the older generation of netizens, especially those with a deeper knowledge of the internet, it probably is not. In that era, WeChat, Weibo, and other social platforms and messaging tools did not yet exist; blogs, forums, and news sites were where netizens spent their time online. But manually opening every favorite blog and news site to check for updates was tedious, so RSS emerged as a form of content aggregation: it quickly checks and collects updates from the sites you follow, while also letting bloggers and news outlets create feeds and publish them to the internet.</p>
<p><strong>RSS (Really Simple Syndication)</strong> is a web feed format specification used to <strong>aggregate sites that publish frequently updated content</strong>, such as blog posts, news headlines, audio, or video. An RSS document (also called a feed, web feed, or channel) contains either the full text or an excerpt, together with metadata such as publication dates and authorship. <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
<p>By using RSS, you can separate the information you need from the information you don't (promotions, spam, and so on); you can also create your own news channel and publish it to the internet.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></p>
<p>Today, as the internet grows increasingly centralized and blogs and traditional forums decline, most people aggregate information through platforms such as WeChat Official Accounts, Zhihu, Bilibili, and Toutiao. Yet as a "decentralized" protocol, RSS still holds irreplaceable advantages.</p>
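At its core an RSS feed is just XML: a <code>channel</code> containing <code>item</code> entries. A reader can therefore be sketched with nothing but the Python standard library; the feed string below is a made-up example, not a real endpoint:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 document (for illustration only).
feed = """<rss version="2.0"><channel>
<title>Example Blog</title>
<item><title>Post 1</title><link>https://example.com/1</link></item>
<item><title>Post 2</title><link>https://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(feed)
for item in root.iter("item"):
    # Each <item> is one published entry: title plus permalink.
    print(item.findtext("title"), "->", item.findtext("link"))
```

A real reader would fetch the feed over HTTP and compare `guid`/`pubDate` fields against what it has already seen, but the parsing step is essentially the above.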
<h1 id="我的rss折腾笔记">My RSS Tinkering Notes</h1>
<h2 id="kindle">Kindle</h2>
<p><a href="https://github.com/cdhigh/KindleEar" target="_blank" rel="noopener noreffer">KindleEar</a> can generate nicely typeset, magazine-style mobi/epub files and push them automatically every day to your Kindle or another mailbox. Because hosting it requires Google App Engine (GAE), which in turn requires a foreign bank card, I never managed to set it up successfully.</p>
<h2 id="rsshub-radar">RSSHub Radar</h2>
<p><a href="https://github.com/DIYgod/RSSHub-Radar" target="_blank" rel="noopener noreffer">RSSHub Radar</a> is a browser extension developed by <a href="https://diygod.me/" target="_blank" rel="noopener noreffer">DIYgod</a> that helps you quickly discover and subscribe to the RSS and RSSHub feeds of the site you are visiting. See the author's official introduction in <a href="https://diygod.me/rsshub-radar/" target="_blank" rel="noopener noreffer">this article</a>.</p>
<h2 id="all-about-rss">All-about-RSS</h2>
<p><strong>Most importantly</strong>, you are very likely to find whatever RSS-related apps, tools, and services you need in <a href="https://github.com/AboutRSS/ALL-about-RSS" target="_blank" rel="noopener noreffer">this list</a>.</p>
<h1 id="后记">Postscript</h1>
<p>I write this late at night, nostalgic for the internet of those years. Decentralization, mutual help, curiosity, anonymity: these things worth cherishing have been worn down into rarities by an ever more centralized web. Blogs, forums, email, RSS, and IRC have been swept into the tide of history by a new generation of social media carrying convenience and commerce with it. I dedicate this post to that bygone era.</p>
<h1 id="参考文献">References</h1>
<section class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1" role="doc-endnote">
<p>RSS. Wikipedia (Chinese) <a href="https://zh.wikipedia.org/wiki/RSS" target="_blank" rel="noopener noreffer">https://zh.wikipedia.org/wiki/RSS</a>&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Introduction to RSS. Runoob <a href="https://www.runoob.com/rss/rss-intro.html" target="_blank" rel="noopener noreffer">https://www.runoob.com/rss/rss-intro.html</a>&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</section>
]]></description></item><item><title>C++ Object-Oriented Programming Review Notes 2021</title><link>https://blog.gaokeyong.top/cpp-oop-review-2021/</link><pubDate>Mon, 20 Sep 2021 14:10:38 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/cpp-oop-review-2021/</guid><description><![CDATA[<p>A quick review of parts of C++ object-oriented programming, recording the points I found worth revisiting.</p>
<h1 id="io-stream">IO Stream</h1>
<p>When extracting data with the extraction operator <code>&gt;&gt;</code>, <strong>whitespace characters (spaces, newlines, tabs) act as delimiters</strong>, so an extracted string cannot contain whitespace. To read one whole line at a time, use <code>getline</code>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c++" data-lang="c++"><span class="n">ifstream</span> <span class="n">myFile</span><span class="p">;</span>
<span class="n">myFile</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="s">&#34;in&#34;</span><span class="p">);</span>
<span class="n">string</span> <span class="n">str</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">myFile</span><span class="p">.</span><span class="n">eof</span><span class="p">())</span>
<span class="p">{</span>
    <span class="n">getline</span><span class="p">(</span><span class="n">myFile</span><span class="p">,</span> <span class="n">str</span><span class="p">);</span>
    <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">str</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">myFile</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
</code></pre></td></tr></table>
</div>
</div><p>To check whether the file was opened successfully (for example, whether it exists at all), call the member function <code>is_open()</code>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c++" data-lang="c++"><span class="n">ifstream</span> <span class="n">myFile</span><span class="p">;</span>
<span class="n">myFile</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="s">&#34;in2&#34;</span><span class="p">);</span>
<span class="n">string</span> <span class="n">str</span><span class="p">;</span>
<span class="k">if</span><span class="p">(</span><span class="n">myFile</span><span class="p">.</span><span class="n">is_open</span><span class="p">()){</span>
    <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">myFile</span><span class="p">.</span><span class="n">eof</span><span class="p">())</span>
    <span class="p">{</span>
        <span class="n">getline</span><span class="p">(</span><span class="n">myFile</span><span class="p">,</span> <span class="n">str</span><span class="p">);</span>
        <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">str</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span><span class="k">else</span><span class="p">{</span>
    <span class="n">puts</span><span class="p">(</span><span class="s">&#34;Not exists. &#34;</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">myFile</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="stl">STL</h1>
<h2 id="vector">Vector</h2>
<p>Compared with C arrays, a vector:</p>
<ul>
<li>is often called a "dynamic array", since it can grow and shrink as needed</li>
<li>comes with built-in member functions for manipulating its elements</li>
</ul>
<p>Commonly used member functions:</p>
<ul>
<li><code>push_back()</code></li>
<li><code>size()</code></li>
<li><code>pop_back()</code></li>
<li><code>clear()</code></li>
<li><code>empty()</code></li>
<li><code>capacity()</code></li>
<li><code>reserve()</code></li>
<li><code>resize()</code></li>
</ul>
<h3 id="迭代器">Iterators</h3>
<p>Elements of sequence and associative containers are accessed through "iterators". An iterator is a variable that acts as an intermediary between a container and the algorithms that operate on it: it can point to an element in the container, and through it you can read and write that element. In this respect, iterators are similar to pointers.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c++" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;::</span><span class="n">iterator</span> <span class="n">iter</span> <span class="o">=</span> <span class="n">list</span><span class="p">.</span><span class="n">begin</span><span class="p">();</span> <span class="n">iter</span> <span class="o">!=</span> <span class="n">list</span><span class="p">.</span><span class="n">end</span><span class="p">();</span> <span class="n">iter</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="o">*</span><span class="n">iter</span> <span class="o">&lt;&lt;</span> <span class="sc">&#39; &#39;</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="class">Class</h1>
<h2 id="访问限定符">Access Specifiers</h2>
<ul>
<li><code>public</code>: accessible from anywhere</li>
<li><code>private</code>: accessible only from within the class itself</li>
<li><code>protected</code>: accessible only to the class's own member functions and to member functions of its derived classes</li>
</ul>
<h2 id="static"><code>static</code></h2>
<h3 id="静态成员变量">Static member variables</h3>
<p>A static member variable exists independently of any object instance, and all objects of the class share it: there is exactly one copy, shared by every object, rather than a separate copy per object.</p>
<div class="details admonition note open">
        <div class="details-summary admonition-title">
            <i class="icon fas fa-pencil-alt fa-fw"></i>Note<i class="details-icon fas fa-angle-right fa-fw"></i>
        </div>
        <div class="details-content">
            <div class="admonition-content">Note: for a static member variable to be usable, it must be defined (and given its initial value) outside the class, typically at file scope near the top of the .cpp file.</div>
        </div>
    </div>
<h3 id="静态成员函数">Static member functions</h3>
<p>A static member function belongs to the class itself rather than to any single instance, and can be called through any object of the class (or directly via the class name).</p>
<p>A static member function can access only static member variables, static member functions, and data and functions outside the class.</p>
<h1 id="多态polymorphism">Polymorphism</h1>
<p><strong>Member functions</strong>: polymorphism only takes effect when a derived-class object is accessed indirectly through a base-class pointer or reference.</p>
<p><strong>Destructors</strong>: if a class is used as a base class, its destructor should usually be declared <code>virtual</code>; this is important. If the base class's destructor is not virtual, then when <code>delete</code> is applied to a base-class pointer that actually points to a derived-class object, only the base class's destructor is called; the derived class's destructor is not, which can leak memory. If the base destructor is virtual, <code>delete</code> on the base pointer invokes the destructors bottom-up along the inheritance chain: the most-derived class's destructor runs first, then each level upward to the pointer's declared type.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
<p><strong>Pure virtual functions</strong>: virtual functions that are declared but not defined; they provide an interface that subtypes can override:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c++" data-lang="c++"><span class="k">class</span> <span class="nc">Base</span> <span class="p">{</span>
    <span class="k">public</span><span class="o">:</span>
    <span class="k">virtual</span> <span class="kt">int</span> <span class="n">func</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></td></tr></table>
</div>
</div><p>含有一个或多个纯虚函数的类为抽象类，抽象类本身不能产生对象实例，否则代码将不能编译。</p>
<h1 id="参考资料">References</h1>
<section class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://blog.csdn.net/ring0hx/article/details/1605254" target="_blank" rel="noopener noreffer">C++ Virtual详解</a>.悦峰&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</section>
]]></description></item><item><title>Neural Network Notes (4): Spatial Batch Normalization &amp; Spatial Group Normalization</title><link>https://blog.gaokeyong.top/nn-notes-4/</link><pubDate>Fri, 10 Sep 2021 12:38:59 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/nn-notes-4/</guid><description><![CDATA[<p>Following the assignment, here we implement Spatial Batch Normalization and Spatial Group Normalization, which are used to improve CNN training.</p>
<h1 id="spatial-batch-normalization">Spatial Batch Normalization</h1>
<p>Recall the BN layer from a plain fully connected network: the input is $X_{input}=(N, D)$ and the output has the same shape $(N, D)$; its job is to normalize the input. Here, for data coming from a convolutional layer, $X_{input}=(N,C,H,W)$ and the output shape is also $(N,C,H,W)$, where $N$ is the number of samples in a mini-batch, $C$ is the number of feature maps (one per receptive-field filter), and $(H, W)$ gives each feature map's size.</p>
<p>When the feature maps are produced by convolution, we want to normalize each of the $C$ feature channels so that the statistics (mean, standard deviation, etc.) are roughly consistent across the $N$ images and across spatial positions $(H,W)$ within an image. That is, spatial batch normalization computes a mean and a variance for each of the $C$ feature channels, taken over the $N$ images and the spatial dimensions $(H,W)$ of that channel. You can think of the earlier $D$ as playing the role of $C$ here, and the earlier $N$ as $N\times H \times W$.</p>
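The dimension bookkeeping can be checked with a tiny NumPy sketch (the shapes are chosen arbitrarily): flattening $(N,C,H,W)$ to $(N\times H\times W, C)$ makes the per-channel mean identical to averaging directly over axes $(0,2,3)$:

```python
import numpy as np

# Shapes only: each channel's statistics pool over N, H and W,
# exactly as vanilla BN pools over N.
N, C, H, W = 2, 3, 4, 4
x = np.random.randn(N, C, H, W)

x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)  # (N*H*W, C)
mu = x_flat.mean(axis=0)                         # one mean per channel

assert x_flat.shape == (N * H * W, C)
assert np.allclose(mu, x.mean(axis=(0, 2, 3)))   # same per-channel statistic
```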
<h2 id="前向传播">Forward pass</h2>
<p>Transpose and reshape the input $X_{input}=(N, C, H, W)$ to shape $(N\times H\times W, C)$, turning it into an ordinary BN input; pass it to the vanilla BN forward function, then reshape the output back to $(N, C, H, W)$. The code:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">x_new</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="n">C</span><span class="p">)</span>
    <span class="n">out</span><span class="p">,</span><span class="n">cache</span><span class="o">=</span><span class="n">batchnorm_forward</span><span class="p">(</span><span class="n">x_new</span><span class="p">,</span><span class="n">gamma</span><span class="p">,</span><span class="n">beta</span><span class="p">,</span><span class="n">bn_param</span><span class="p">)</span>
    <span class="n">out</span><span class="o">=</span><span class="n">out</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">H</span><span class="p">,</span><span class="n">W</span><span class="p">,</span><span class="n">C</span><span class="p">)</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">))</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="反向传播">Backward pass</h2>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="o">=</span><span class="n">dout</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">dout_new</span><span class="o">=</span><span class="n">dout</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="n">C</span><span class="p">)</span>
    <span class="n">dx</span><span class="p">,</span> <span class="n">dgamma</span><span class="p">,</span> <span class="n">dbeta</span> <span class="o">=</span> <span class="n">batchnorm_backward_alt</span><span class="p">(</span><span class="n">dout_new</span><span class="p">,</span><span class="n">cache</span><span class="p">)</span>
    <span class="n">dx</span> <span class="o">=</span> <span class="n">dx</span><span class="o">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">N</span><span class="p">,</span><span class="n">H</span><span class="p">,</span><span class="n">W</span><span class="p">,</span><span class="n">C</span><span class="p">))</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">))</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="spatial-group-normalization">Spatial Group Normalization</h1>
<p>Spatial Group Normalization can be viewed as a remedy for the fact that Layer Normalization does not perform as well on CNNs as Batch Normalization does.</p>
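Unlike batch normalization, group normalization's statistics are independent of the batch dimension. A sketch of the per-group computation (sizes are arbitrary), mirroring the reshape in the assignment code:

```python
import numpy as np

# Split the C channels into G groups and compute mean/var per (sample, group),
# over the group's channels and all spatial positions.
N, C, H, W, G = 2, 6, 4, 4, 3
x = np.random.randn(N, C, H, W)

xg = x.reshape(N, G, C // G, H, W)
mean = xg.mean(axis=(2, 3, 4), keepdims=True)   # shape (N, G, 1, 1, 1)
var = xg.var(axis=(2, 3, 4), keepdims=True)
x_norm = ((xg - mean) / np.sqrt(var + 1e-5)).reshape(N, C, H, W)

# Each (sample, group) slice now has (approximately) zero mean.
assert np.allclose(x_norm.reshape(N, G, -1).mean(axis=2), 0, atol=1e-6)
```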
<h2 id="前向传播-1">Forward pass</h2>
<p>Implemented following the code in the paper:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">cache</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">gn_param</span><span class="p">)</span>

    <span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">x_new</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">N</span><span class="p">,</span><span class="n">G</span><span class="p">,</span><span class="n">C</span><span class="o">//</span><span class="n">G</span><span class="p">,</span><span class="n">H</span><span class="p">,</span><span class="n">W</span><span class="p">))</span>
    <span class="n">mean</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x_new</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span><span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">var</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="n">x_new</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span><span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">x_new</span><span class="o">=</span><span class="p">(</span><span class="n">x_new</span><span class="o">-</span><span class="n">mean</span><span class="p">)</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var</span><span class="o">+</span><span class="n">eps</span><span class="p">)</span>
    <span class="n">x_new</span><span class="o">=</span><span class="n">x_new</span><span class="o">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="p">))</span>
    <span class="n">out</span><span class="o">=</span><span class="n">x_new</span><span class="o">*</span><span class="n">gamma</span><span class="o">+</span><span class="n">beta</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="反向传播-1">Backward pass</h2>
<p>I referred to this <a href="https://canva4.github.io/2020/08/12/CS231n-Assignment2-%E5%AE%9E%E7%8E%B0%E6%97%B6%E9%81%87%E5%88%B0%E7%9A%84%E9%97%AE%E9%A2%98/" target="_blank" rel="noopener noreffer">blog post</a>. The derivation itself is not hard, but the implementation is fairly tricky.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="o">=</span><span class="n">dout</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">x</span><span class="p">,</span> <span class="n">x_new</span><span class="p">,</span> <span class="n">mean</span><span class="p">,</span> <span class="n">var</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">gn_param</span><span class="o">=</span><span class="n">cache</span>
    <span class="n">eps</span><span class="o">=</span><span class="n">gn_param</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;eps&#34;</span><span class="p">,</span> <span class="mf">1e-5</span><span class="p">)</span>

    <span class="n">dgamma</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dout</span> <span class="o">*</span> <span class="n">x_new</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">C</span> <span class="o">//</span> <span class="n">G</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="p">)</span>
    <span class="c1"># to pass the gradient check, this must be reshaped to (1, C, 1, 1)</span>
    <span class="n">dbeta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dout</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

    <span class="n">dx_new</span> <span class="o">=</span> <span class="p">(</span><span class="n">dout</span> <span class="o">*</span> <span class="n">gamma</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">C</span> <span class="o">//</span> <span class="n">G</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="p">)</span>
    <span class="n">mean</span> <span class="o">=</span> <span class="n">mean</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">var</span> <span class="o">=</span> <span class="n">var</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">dL_dvar</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.5</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dx_new</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mean</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">power</span><span class="p">(</span><span class="n">var</span><span class="o">.</span><span class="n">squeeze</span><span class="p">()</span> <span class="o">+</span> <span class="n">eps</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.5</span><span class="p">)</span>
    <span class="n">dL_dvar</span> <span class="o">=</span> <span class="n">dL_dvar</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

    <span class="n">mid</span> <span class="o">=</span> <span class="n">H</span> <span class="o">*</span> <span class="n">W</span> <span class="o">*</span> <span class="n">C</span> <span class="o">//</span> <span class="n">G</span>
    <span class="c1"># add L--&gt;y--&gt;x_hat--&gt;x_i</span>
    <span class="n">dx</span> <span class="o">=</span> <span class="n">dx_new</span> <span class="o">/</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>
    <span class="c1"># add L--&gt;mean--&gt;x_i</span>
    <span class="n">dx</span> <span class="o">+=</span> <span class="p">((</span><span class="o">-</span><span class="mi">1</span> <span class="o">/</span> <span class="n">mid</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dx_new</span> <span class="o">/</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var</span> <span class="o">+</span> <span class="n">eps</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">)))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">dL_dvar</span> <span class="o">*</span> <span class="p">(</span>
        <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">/</span> <span class="n">mid</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">)))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="c1"># add L--&gt;var--&gt;x_i</span>
    <span class="n">dx</span> <span class="o">+=</span> <span class="p">(</span><span class="mi">2</span> <span class="o">/</span> <span class="n">mid</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">*</span> <span class="n">dL_dvar</span>
    <span class="n">dx</span> <span class="o">=</span> <span class="n">dx</span><span class="o">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="p">))</span>

    <span class="c1"># dgamma=np.sum(dout*x,axis=(0,2,3)).reshape(1, C, 1, 1)</span>
    <span class="c1"># dbeta=dout.sum(axis=(0,2,3)).reshape((1, C, 1, 1))</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div>]]></description></item><item><title>Neural Network Notes (3): Convolutional Neural Networks</title><link>https://blog.gaokeyong.top/nn-notes-3/</link><pubDate>Mon, 06 Sep 2021 23:58:41 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/nn-notes-3/</guid><description><![CDATA[<p>This post collects my notes and experiment excerpts from working through the lab <code>assignment2/ConvolutionalNetworks.ipynb</code> of the CS231N 2021 course.</p>
<h1 id="卷积运算">The Convolution Operation</h1>
<h2 id="前向传播">Forward Pass</h2>
<p>The input consists of $N$ data points, each of height $H$, width $W$, and $C$ channels. Each input is convolved with $F$ different filters, each of dimension $HH\times WW\times C$. The stride and the amount of zero padding are also passed as parameters. The code implements the forward pass that produces the convolution output:</p>
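<p>Before the implementation below, the output spatial size can be sanity-checked against the formula $1 + (H + 2\cdot pad - HH)\,//\,stride$ (a small hypothetical example, independent of the assignment code):</p>

```python
def conv_out_size(size, filter_size, pad, stride):
    # Output spatial extent of a convolution:
    # 1 + (size + 2*pad - filter_size) // stride
    return 1 + (size + 2 * pad - filter_size) // stride

# A 32-wide input with a 7x7 filter, pad=3, stride=1 keeps its size.
print(conv_out_size(32, 7, pad=3, stride=1))  # 32
# With stride 2 the feature map shrinks to roughly half.
print(conv_out_size(32, 7, pad=3, stride=2))  # 16
```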
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">F</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">HH</span><span class="p">,</span> <span class="n">WW</span><span class="o">=</span><span class="n">w</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">stride</span><span class="p">,</span><span class="n">pad</span><span class="o">=</span><span class="n">conv_param</span><span class="p">[</span><span class="s1">&#39;stride&#39;</span><span class="p">],</span><span class="n">conv_param</span><span class="p">[</span><span class="s1">&#39;pad&#39;</span><span class="p">]</span>
    <span class="n">H_prime</span><span class="o">=</span><span class="mi">1</span><span class="o">+</span><span class="p">(</span><span class="n">H</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="n">pad</span><span class="o">-</span><span class="n">HH</span><span class="p">)</span><span class="o">//</span><span class="n">stride</span>
    <span class="n">W_prime</span><span class="o">=</span><span class="mi">1</span><span class="o">+</span><span class="p">(</span><span class="n">W</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="n">pad</span><span class="o">-</span><span class="n">WW</span><span class="p">)</span><span class="o">//</span><span class="n">stride</span>

    <span class="n">x_pad</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">pad</span><span class="p">(</span><span class="n">x</span><span class="p">,((</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),(</span><span class="n">pad</span><span class="p">,</span><span class="n">pad</span><span class="p">),(</span><span class="n">pad</span><span class="p">,</span><span class="n">pad</span><span class="p">)))</span>
    <span class="n">out</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">F</span><span class="p">,</span><span class="n">H_prime</span><span class="p">,</span><span class="n">W_prime</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
      <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">F</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">H_prime</span><span class="p">):</span>
          <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">W_prime</span><span class="p">):</span>
            <span class="c1"># print(x[n,:,i*stride:i*stride+HH+1,j*stride:j*stride+WW+1].shape)</span>
            <span class="n">out</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">f</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">w</span><span class="p">[</span><span class="n">f</span><span class="p">,:,:,:]</span><span class="o">*</span><span class="n">x_pad</span><span class="p">[</span><span class="n">n</span><span class="p">,:,</span><span class="n">i</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">i</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">HH</span><span class="p">,</span><span class="n">j</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">j</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">WW</span><span class="p">])</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="n">f</span><span class="p">]</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="反向传播">Backward Pass</h2>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">x</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">conv_param</span><span class="o">=</span><span class="n">cache</span>
    <span class="n">N</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">W</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">F</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">HH</span><span class="p">,</span> <span class="n">WW</span><span class="o">=</span><span class="n">w</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">stride</span><span class="p">,</span><span class="n">pad</span><span class="o">=</span><span class="n">conv_param</span><span class="p">[</span><span class="s1">&#39;stride&#39;</span><span class="p">],</span><span class="n">conv_param</span><span class="p">[</span><span class="s1">&#39;pad&#39;</span><span class="p">]</span>
    <span class="n">H_prime</span><span class="o">=</span><span class="mi">1</span><span class="o">+</span><span class="p">(</span><span class="n">H</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="n">pad</span><span class="o">-</span><span class="n">HH</span><span class="p">)</span><span class="o">//</span><span class="n">stride</span>
    <span class="n">W_prime</span><span class="o">=</span><span class="mi">1</span><span class="o">+</span><span class="p">(</span><span class="n">W</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="n">pad</span><span class="o">-</span><span class="n">WW</span><span class="p">)</span><span class="o">//</span><span class="n">stride</span>

    <span class="n">x_pad</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">pad</span><span class="p">(</span><span class="n">x</span><span class="p">,((</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),(</span><span class="n">pad</span><span class="p">,</span><span class="n">pad</span><span class="p">),(</span><span class="n">pad</span><span class="p">,</span><span class="n">pad</span><span class="p">)))</span>
    <span class="n">dx_pad</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">x_pad</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="n">dw</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">w</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="n">db</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">b</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
      <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">F</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">h</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">H_prime</span><span class="p">):</span>
          <span class="k">for</span> <span class="n">w_mid</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">W_prime</span><span class="p">):</span>
            <span class="n">dx_pad</span><span class="p">[</span><span class="n">n</span><span class="p">,</span> <span class="p">:,</span> <span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">HH</span><span class="p">,</span> <span class="n">w_mid</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">w_mid</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">WW</span><span class="p">]</span><span class="o">+=</span><span class="n">dout</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">f</span><span class="p">,</span><span class="n">h</span><span class="p">,</span><span class="n">w_mid</span><span class="p">]</span><span class="o">*</span><span class="n">w</span><span class="p">[</span><span class="n">f</span><span class="p">,:,:,:]</span>
            <span class="n">dw</span><span class="p">[</span><span class="n">f</span><span class="p">,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="p">:]</span><span class="o">+=</span><span class="n">dout</span><span class="p">[</span><span class="n">n</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w_mid</span><span class="p">]</span><span class="o">*</span><span class="n">x_pad</span><span class="p">[</span><span class="n">n</span><span class="p">,</span> <span class="p">:,</span> <span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">HH</span><span class="p">,</span> <span class="n">w_mid</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">w_mid</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">WW</span><span class="p">]</span>
            <span class="n">db</span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="o">+=</span><span class="n">dout</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">f</span><span class="p">,</span><span class="n">h</span><span class="p">,</span><span class="n">w_mid</span><span class="p">]</span>
    <span class="n">dx</span><span class="o">=</span><span class="n">dx_pad</span><span class="p">[:,:,</span><span class="n">pad</span><span class="p">:</span><span class="n">H</span><span class="o">+</span><span class="n">pad</span><span class="p">,</span><span class="n">pad</span><span class="p">:</span><span class="n">W</span><span class="o">+</span><span class="n">pad</span><span class="p">]</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="池化层">Pooling Layer</h1>
<p>It is common to periodically insert a pooling layer between successive convolutional layers. Its role is to progressively reduce the spatial size of the data volume, which cuts down the number of parameters in the network, reduces the computation required, and also helps control overfitting. The pooling layer applies a MAX operation independently to every depth slice of the input, resizing it spatially. The most common form uses 2x2 filters with a stride of 2 to downsample each depth slice, discarding 75% of the activations. Intuitively, a MAX pooling layer keeps, from each window, the most strongly "fired" activation.</p>
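<p>As a tiny illustration (hypothetical numbers, not from the assignment), a 2x2 max pool with stride 2 on a single 4x4 depth slice keeps exactly one of every four activations:</p>

```python
import numpy as np

x = np.arange(16).reshape(4, 4)  # one depth slice
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # max over each non-overlapping 2x2 window
        out[i, j] = x[2*i:2*i+2, 2*j:2*j+2].max()
print(out)                # [[ 5.  7.]
                          #  [13. 15.]]
print(out.size / x.size)  # 0.25, i.e. 75% of activations discarded
```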
<h2 id="前向传播-1">Forward Pass</h2>
<p>The code is as follows:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">N</span><span class="p">,</span><span class="n">C</span><span class="p">,</span><span class="n">H</span><span class="p">,</span><span class="n">W</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">pool_height</span><span class="p">,</span> <span class="n">pool_width</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">pool_param</span><span class="p">[</span><span class="s1">&#39;pool_height&#39;</span><span class="p">],</span> <span class="n">pool_param</span><span class="p">[</span><span class="s1">&#39;pool_width&#39;</span><span class="p">],</span> <span class="n">pool_param</span><span class="p">[</span><span class="s1">&#39;stride&#39;</span><span class="p">]</span>
    <span class="n">H_prime</span><span class="o">=</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="n">H</span> <span class="o">-</span> <span class="n">pool_height</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride</span>
    <span class="n">W_prime</span><span class="o">=</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="n">W</span> <span class="o">-</span> <span class="n">pool_width</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride</span>
    <span class="n">out</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">C</span><span class="p">,</span><span class="n">H_prime</span><span class="p">,</span><span class="n">W_prime</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
      <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">C</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">H_prime</span><span class="p">):</span>
          <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">W_prime</span><span class="p">):</span>
            <span class="n">out</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">i</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">i</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">pool_height</span><span class="p">,</span><span class="n">j</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">j</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">pool_width</span><span class="p">])</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="反向传播-1">Backward Pass</h2>
<p>The code is as follows:</p>
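<p>The key step in the backward pass below is locating the arg-max inside each pooling window: <code>np.argmax</code> returns a flat index, and <code>np.unravel_index</code> converts it back to 2D coordinates, so that only the max position receives the upstream gradient. A minimal standalone illustration:</p>

```python
import numpy as np

window = np.array([[0.1, 0.9],
                   [0.4, 0.3]])
flat = np.argmax(window)                    # index into the flattened window
ind = np.unravel_index(flat, window.shape)  # row 0, column 1 holds the max

# Route the upstream gradient only to the max position.
grad = np.zeros_like(window)
grad[ind] = 1.0
print(grad)  # [[0. 1.]
             #  [0. 0.]]
```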
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">x</span><span class="p">,</span> <span class="n">pool_param</span><span class="o">=</span><span class="n">cache</span>
    <span class="n">N</span><span class="p">,</span><span class="n">C</span><span class="p">,</span><span class="n">H</span><span class="p">,</span><span class="n">W</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">pool_height</span><span class="p">,</span> <span class="n">pool_width</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">pool_param</span><span class="p">[</span><span class="s1">&#39;pool_height&#39;</span><span class="p">],</span> <span class="n">pool_param</span><span class="p">[</span><span class="s1">&#39;pool_width&#39;</span><span class="p">],</span> <span class="n">pool_param</span><span class="p">[</span><span class="s1">&#39;stride&#39;</span><span class="p">]</span>
    <span class="n">H_prime</span><span class="o">=</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="n">H</span> <span class="o">-</span> <span class="n">pool_height</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride</span>
    <span class="n">W_prime</span><span class="o">=</span><span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="n">W</span> <span class="o">-</span> <span class="n">pool_width</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride</span>
    <span class="n">dx</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
      <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">C</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">h</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">H_prime</span><span class="p">):</span>
          <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">W_prime</span><span class="p">):</span>
            <span class="c1"># print(x[n,c,h*stride:h*stride+pool_height,w*stride:w*stride+pool_width].shape)</span>
            <span class="n">ind</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">unravel_index</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">pool_height</span><span class="p">,</span><span class="n">w</span><span class="o">*</span><span class="n">stride</span><span class="p">:</span><span class="n">w</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">pool_width</span><span class="p">]),</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">pool_height</span><span class="p">,</span><span class="n">pool_width</span><span class="p">))</span>
            <span class="n">dx</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">h</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">ind</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">w</span><span class="o">*</span><span class="n">stride</span><span class="o">+</span><span class="n">ind</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span><span class="o">+=</span><span class="n">dout</span><span class="p">[</span><span class="n">n</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">]</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><p>The assignment also provides fast API implementations of the convolution and pooling operations. The fast convolution achieves speedups of roughly 602x for the forward pass and 736x for the backward pass, while the fast pooling achieves roughly 183x and 58x, respectively.</p>
<h1 id="三层卷积网络">Three-Layer Convolutional Network</h1>
<p>The three-layer convolutional network here has the architecture <code>conv - relu - 2x2 max pool - affine - relu - affine - softmax</code>, implemented in the assignment as the class <code>ThreeLayerConvNet</code>.</p>
<h2 id="参数初始化">Parameter Initialization</h2>
<p>In the assignment, the convolutional layer's <code>padding</code> and <code>stride</code> are chosen so that the output has the same height and width as the input. Concretely, the stride is $S=1$ and the padding is $P=\left\lfloor\frac{F-1}{2}\right\rfloor$, where $F$ is the filter (receptive field) size. Working this out, it appears that $F$ must be odd for the output to match the input's spatial size exactly.</p>
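<p>This claim can be checked numerically (a quick hypothetical check, independent of the assignment code): with $S=1$ and $P=\left\lfloor\frac{F-1}{2}\right\rfloor$, the output size equals the input size exactly when $F$ is odd:</p>

```python
def same_conv_out(H, F):
    # stride S = 1, padding P = floor((F - 1) / 2)
    P = (F - 1) // 2
    return 1 + (H + 2 * P - F)  # the // S is omitted since S == 1

for F in range(1, 8):
    print(F, same_conv_out(32, F))
# odd F (1, 3, 5, 7) yield 32; even F (2, 4, 6) yield 31
```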
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="c1"># conv layer parameters</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">num_filters</span><span class="p">,</span><span class="n">input_dim</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">filter_size</span><span class="p">,</span><span class="n">filter_size</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_filters</span><span class="p">)</span>
        <span class="c1"># hidden affine layer parameters</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W2&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">num_filters</span><span class="o">*</span><span class="p">(</span><span class="n">input_dim</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">//</span><span class="mi">2</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">input_dim</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">//</span><span class="mi">2</span><span class="p">),</span><span class="n">hidden_dim</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b2&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dim</span><span class="p">)</span>
        <span class="c1"># output affine layer parameters</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W3&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">hidden_dim</span><span class="p">,</span><span class="n">num_classes</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b3&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_classes</span><span class="p">)</span>

        <span class="k">pass</span>

        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="损失函数和梯度计算">Loss Function and Gradient Computation</h2>
<p>Here we can use the "sandwich" layers provided in <code>cs231n/layer_utils.py</code>, which bundle the forward and backward passes of several stacked layers, e.g. <code>conv - relu - max_pool</code>, into single functions.</p>
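<p>The idea behind a "sandwich" layer is simply to compose the individual forward passes while collecting their caches, then unwind them in reverse order in the backward pass. Below is a minimal toy sketch of this pattern using a simplified affine + ReLU pair (the helper functions here are hypothetical stand-ins, not the actual cs231n implementations, which also handle biases):</p>

```python
import numpy as np

def affine_forward(x, w):
    return x @ w, (x, w)

def affine_backward(dout, cache):
    x, w = cache
    return dout @ w.T, x.T @ dout          # dx, dw

def relu_forward(x):
    return np.maximum(0, x), x

def relu_backward(dout, cache):
    return dout * (cache > 0)

def affine_relu_forward(x, w):
    # Compose the forward passes; keep both caches for backprop.
    a, fc_cache = affine_forward(x, w)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    # Unwind in reverse order.
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    return affine_backward(da, fc_cache)

x = np.array([[1.0, -2.0]])
w = np.array([[1.0], [1.0]])
out, cache = affine_relu_forward(x, w)     # 1 - 2 = -1, ReLU clips to 0
dx, dw = affine_relu_backward(np.ones_like(out), cache)
print(out, dx, dw)
```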
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span><span class="lnt">38
</span><span class="lnt">39
</span><span class="lnt">40
</span><span class="lnt">41
</span><span class="lnt">42
</span><span class="lnt">43
</span><span class="lnt">44
</span><span class="lnt">45
</span><span class="lnt">46
</span><span class="lnt">47
</span><span class="lnt">48
</span><span class="lnt">49
</span><span class="lnt">50
</span><span class="lnt">51
</span><span class="lnt">52
</span><span class="lnt">53
</span><span class="lnt">54
</span><span class="lnt">55
</span><span class="lnt">56
</span><span class="lnt">57
</span><span class="lnt">58
</span><span class="lnt">59
</span><span class="lnt">60
</span><span class="lnt">61
</span><span class="lnt">62
</span><span class="lnt">63
</span><span class="lnt">64
</span><span class="lnt">65
</span><span class="lnt">66
</span><span class="lnt">67
</span><span class="lnt">68
</span><span class="lnt">69
</span><span class="lnt">70
</span><span class="lnt">71
</span><span class="lnt">72
</span><span class="lnt">73
</span><span class="lnt">74
</span><span class="lnt">75
</span><span class="lnt">76
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="s2">&#34;&#34;&#34;
</span><span class="s2">        Evaluate loss and gradient for the three-layer convolutional network.
</span><span class="s2">
</span><span class="s2">        Input / output: Same API as TwoLayerNet in fc_net.py.
</span><span class="s2">        &#34;&#34;&#34;</span>
        <span class="n">W1</span><span class="p">,</span> <span class="n">b1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s2">&#34;W1&#34;</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s2">&#34;b1&#34;</span><span class="p">]</span>
        <span class="n">W2</span><span class="p">,</span> <span class="n">b2</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s2">&#34;W2&#34;</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s2">&#34;b2&#34;</span><span class="p">]</span>
        <span class="n">W3</span><span class="p">,</span> <span class="n">b3</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s2">&#34;W3&#34;</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s2">&#34;b3&#34;</span><span class="p">]</span>

        <span class="c1"># pass conv_param to the forward pass for the convolutional layer</span>
        <span class="c1"># Padding and stride chosen to preserve the input spatial size</span>
        <span class="n">filter_size</span> <span class="o">=</span> <span class="n">W1</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
        <span class="n">conv_param</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&#34;stride&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&#34;pad&#34;</span><span class="p">:</span> <span class="p">(</span><span class="n">filter_size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">//</span> <span class="mi">2</span><span class="p">}</span>

        <span class="c1"># pass pool_param to the forward pass for the max-pooling layer</span>
        <span class="n">pool_param</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&#34;pool_height&#34;</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">&#34;pool_width&#34;</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">&#34;stride&#34;</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span>

        <span class="n">scores</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># TODO: Implement the forward pass for the three-layer convolutional net,  #</span>
        <span class="c1"># computing the class scores for X and storing them in the scores          #</span>
        <span class="c1"># variable.                                                                #</span>
        <span class="c1">#                                                                          #</span>
        <span class="c1"># Remember you can use the functions defined in cs231n/fast_layers.py and  #</span>
        <span class="c1"># cs231n/layer_utils.py in your implementation (already imported).         #</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="n">out_conv_relu_pool</span><span class="p">,</span> <span class="n">cache_conv_relu_pool</span><span class="o">=</span><span class="n">conv_relu_pool_forward</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">W1</span><span class="p">,</span><span class="n">b1</span><span class="p">,</span><span class="n">conv_param</span><span class="p">,</span><span class="n">pool_param</span><span class="p">)</span>
        <span class="n">out_affine_relu</span><span class="p">,</span> <span class="n">cache_affine_relu</span><span class="o">=</span><span class="n">affine_relu_forward</span><span class="p">(</span><span class="n">out_conv_relu_pool</span><span class="p">,</span><span class="n">W2</span><span class="p">,</span><span class="n">b2</span><span class="p">)</span>
        <span class="n">scores</span><span class="p">,</span> <span class="n">cache_output_affine</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">out_affine_relu</span><span class="p">,</span><span class="n">W3</span><span class="p">,</span><span class="n">b3</span><span class="p">)</span>

        <span class="k">pass</span>

        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
        <span class="c1">############################################################################</span>
        <span class="c1">#                             END OF YOUR CODE                             #</span>
        <span class="c1">############################################################################</span>

        <span class="k">if</span> <span class="n">y</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">scores</span>

        <span class="n">loss</span><span class="p">,</span> <span class="n">grads</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="p">{}</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># TODO: Implement the backward pass for the three-layer convolutional net, #</span>
        <span class="c1"># storing the loss and gradients in the loss and grads variables. Compute  #</span>
        <span class="c1"># data loss using softmax, and make sure that grads[k] holds the gradients #</span>
        <span class="c1"># for self.params[k]. Don&#39;t forget to add L2 regularization!               #</span>
        <span class="c1">#                                                                          #</span>
        <span class="c1"># NOTE: To ensure that your implementation matches ours and you pass the   #</span>
        <span class="c1"># automated tests, make sure that your L2 regularization includes a factor #</span>
        <span class="c1"># of 0.5 to simplify the expression for the gradient.                      #</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="n">loss</span><span class="p">,</span> <span class="n">grad</span><span class="o">=</span><span class="n">softmax_loss</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
        <span class="n">loss</span><span class="o">+=</span><span class="mf">0.5</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">W1</span><span class="o">*</span><span class="n">W1</span><span class="p">)</span>
        <span class="n">loss</span><span class="o">+=</span><span class="mf">0.5</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">W2</span><span class="o">*</span><span class="n">W2</span><span class="p">)</span>
        <span class="n">loss</span><span class="o">+=</span><span class="mf">0.5</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">W3</span><span class="o">*</span><span class="n">W3</span><span class="p">)</span>

        <span class="n">grad</span><span class="p">,</span> <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W3&#39;</span><span class="p">],</span> <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b3&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">affine_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">cache_output_affine</span><span class="p">)</span>
        <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W3&#39;</span><span class="p">]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">W3</span>
        <span class="n">grad</span><span class="p">,</span> <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W2&#39;</span><span class="p">],</span> <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b2&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">affine_relu_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">cache_affine_relu</span><span class="p">)</span>
        <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W2&#39;</span><span class="p">]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">W2</span>
        <span class="n">grad</span><span class="p">,</span> <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">],</span> <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">conv_relu_pool_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span> <span class="n">cache_conv_relu_pool</span><span class="p">)</span>
        <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">W1</span>

        <span class="k">pass</span>

        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
        <span class="c1">############################################################################</span>
        <span class="c1">#                             END OF YOUR CODE                             #</span>
        <span class="c1">############################################################################</span>

        <span class="k">return</span> <span class="n">loss</span><span class="p">,</span> <span class="n">grads</span>
</code></pre></td></tr></table>
</div>
</div><p>The assignment also contains two further parts, Spatial Batch Normalization and Spatial Group Normalization, which I will leave for next time.</p>
]]></description></item><item><title>神经网络笔记（二）——Batch Normalization &amp; DropOut</title><link>https://blog.gaokeyong.top/nn-notes-2/</link><pubDate>Sun, 05 Sep 2021 15:41:56 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/nn-notes-2/</guid><description><![CDATA[<h1 id="batch-normalization">Batch Normalization</h1>
<p>The corresponding notebook is <code>BatchNormalization.ipynb</code>.</p>
<p>Quoting the <a href="https://cs231n.github.io/neural-networks-2/#batchnorm" target="_blank" rel="noopener noreffer">official course notes</a>:</p>
<blockquote>
<p>Batch Normalization. Batch Normalization, proposed only recently (2015) by Ioffe and Szegedy, alleviates much of the headache of properly initializing a neural network: its approach is to force the activations throughout the network to follow a standard Gaussian distribution at the start of training. This is feasible because normalization is a simple, differentiable operation. In practice, applying the trick usually means inserting a BatchNorm layer between fully connected layers (or convolutional layers, covered later) and the activation function. We will not expand on the technique here since the reference above explains it clearly; what you should know is that Batch Normalization has become very common in neural networks, and in practice networks that use it are significantly more robust to bad initialization. To sum up in one sentence: Batch Normalization can be understood as doing preprocessing before every layer of the network, but integrated into the network itself in a differentiable way. Done!<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
</blockquote>
<p>The <a href="https://arxiv.org/pdf/1502.03167.pdf" target="_blank" rel="noopener noreffer">Batch Normalization paper</a> describes the phenomenon of <em>Internal Covariate Shift</em>: the distribution of each layer's inputs changes during training as the parameters of the preceding layers change, so every layer must keep adapting to a new input distribution. When the network is deep, small changes in early layers can be amplified through later layers into exponentially large shifts, which makes it hard to choose a suitable learning rate and can saturate the non-linearities, making the network hard to train. BN greatly reduces Internal Covariate Shift, allowing us to use higher learning rates and to be less careful about parameter initialization.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> <sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
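<p>A toy illustration of this phenomenon (my own sketch, not from the paper): with a fixed mini-batch, changing only the first layer's parameters already shifts the distribution that the second layer receives as input:</p>

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(1000, 50)        # one fixed mini-batch
W1 = np.random.randn(50, 50) * 0.1   # first-layer weights

def layer2_input(W):
    # the activations that feed layer 2 depend on layer 1's parameters
    return np.maximum(0, x @ W)

before = layer2_input(W1)
after = layer2_input(W1 + 0.5)       # pretend a gradient step moved W1
shift = after.mean() - before.mean() # layer 2 now sees very different statistics
```

<p>Every parameter update thus changes the input distribution of all later layers; this is exactly the effect that BN tries to suppress.</p>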
<p>The idea of BN is to fix the layer inputs via normalization, in the hope of speeding up training. It is well known that training converges faster when the network inputs are whitened.</p>
<p>Normalizing the mean and standard deviation of a unit can reduce the expressive power of the network containing it. To preserve the network's expressiveness, the normalized input is therefore transformed as</p>
<p>$$ y^{(k)}=\gamma^{(k)}\hat{x}^{(k)}+\beta^{(k)} $$</p>
<p>In particular, setting $\gamma^{(k)}=\sigma^{(k)}$ and $\beta^{(k)}=\mu^{(k)}$ recovers the identity transform and preserves the distribution information of the original input features, which to some extent guarantees the representational power of the input data. $\gamma$ and $\beta$ are two parameters to be learned.</p>
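<p>A quick numerical check (my own sketch, not from the assignment): setting $\gamma$ to the batch standard deviation and $\beta$ to the batch mean makes the BN output reproduce its input exactly:</p>

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(64, 3) * 5.0 + 2.0   # a batch of 64 samples, 3 features
eps = 1e-12

mu = x.mean(axis=0)
sigma = np.sqrt(x.var(axis=0) + eps)
x_hat = (x - mu) / sigma                 # zero mean, unit variance per feature

# choosing gamma = sigma, beta = mu recovers the identity transform
y = sigma * x_hat + mu
assert np.allclose(y, x)
```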
<h2 id="前向传播与后向传播">Forward and Backward Pass</h2>
<h3 id="前向传播">Forward Pass</h3>
<p>During training, a running mean and variance are updated on every mini-batch; at test time they are used to normalize the inputs. The update is:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="c1"># momentum is the decay coefficient; in PyTorch its value is 0.1</span>
<span class="n">running_mean</span> <span class="o">=</span> <span class="n">momentum</span> <span class="o">*</span> <span class="n">running_mean</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">momentum</span><span class="p">)</span> <span class="o">*</span> <span class="n">sample_mean</span>
<span class="n">running_var</span> <span class="o">=</span> <span class="n">momentum</span> <span class="o">*</span> <span class="n">running_var</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">momentum</span><span class="p">)</span> <span class="o">*</span> <span class="n">sample_var</span>
</code></pre></td></tr></table>
</div>
</div><p>Code from the assignment:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="k">if</span> <span class="n">mode</span> <span class="o">==</span> <span class="s2">&#34;train&#34;</span><span class="p">:</span>
    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">sample_mean</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">sample_var</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

    <span class="n">x_hat</span><span class="o">=</span><span class="p">(</span><span class="n">x</span><span class="o">-</span><span class="n">sample_mean</span><span class="p">)</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sample_var</span><span class="o">+</span><span class="n">eps</span><span class="p">)</span>

    <span class="n">out</span><span class="o">=</span><span class="n">gamma</span><span class="o">*</span><span class="n">x_hat</span><span class="o">+</span><span class="n">beta</span>
    <span class="n">cache</span><span class="o">=</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x_hat</span><span class="p">,</span> <span class="n">sample_mean</span><span class="p">,</span> <span class="n">sample_var</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span>

    <span class="n">running_mean</span> <span class="o">=</span> <span class="n">momentum</span> <span class="o">*</span> <span class="n">running_mean</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">momentum</span><span class="p">)</span> <span class="o">*</span> <span class="n">sample_mean</span>
    <span class="n">running_var</span> <span class="o">=</span> <span class="n">momentum</span> <span class="o">*</span> <span class="n">running_var</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">momentum</span><span class="p">)</span> <span class="o">*</span> <span class="n">sample_var</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="k">elif</span> <span class="n">mode</span> <span class="o">==</span> <span class="s2">&#34;test&#34;</span><span class="p">:</span>
    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">x_hat</span><span class="o">=</span><span class="p">(</span><span class="n">x</span><span class="o">-</span><span class="n">running_mean</span><span class="p">)</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">running_var</span><span class="o">+</span><span class="n">eps</span><span class="p">)</span>
    <span class="n">out</span><span class="o">=</span><span class="n">gamma</span><span class="o">*</span><span class="n">x_hat</span><span class="o">+</span><span class="n">beta</span>
    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h3 id="后向传播">Backward Pass</h3>
<p>The partial derivatives to compute are $\frac{\partial L}{\partial x_i}$, $\frac{\partial L}{\partial \gamma}$ and $\frac{\partial L}{\partial \beta}$; the derivation in the paper can be consulted. The implementation is as follows:</p>
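<p>For reference, the gradients being implemented (restated from the paper, in the notation above, with batch size $m$) are:</p>
<p>$$\frac{\partial L}{\partial \hat{x}_i}=\frac{\partial L}{\partial y_i}\cdot\gamma,\qquad \frac{\partial L}{\partial \gamma}=\sum_{i=1}^{m}\frac{\partial L}{\partial y_i}\hat{x}_i,\qquad \frac{\partial L}{\partial \beta}=\sum_{i=1}^{m}\frac{\partial L}{\partial y_i}$$</p>
<p>$$\frac{\partial L}{\partial \sigma_B^2}=-\frac{1}{2}\sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\,(x_i-\mu_B)\,(\sigma_B^2+\epsilon)^{-3/2}$$</p>
<p>$$\frac{\partial L}{\partial \mu_B}=\sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\cdot\frac{-1}{\sqrt{\sigma_B^2+\epsilon}}+\frac{\partial L}{\partial \sigma_B^2}\cdot\frac{1}{m}\sum_{i=1}^{m}-2(x_i-\mu_B)$$</p>
<p>$$\frac{\partial L}{\partial x_i}=\frac{\partial L}{\partial \hat{x}_i}\cdot\frac{1}{\sqrt{\sigma_B^2+\epsilon}}+\frac{\partial L}{\partial \sigma_B^2}\cdot\frac{2(x_i-\mu_B)}{m}+\frac{\partial L}{\partial \mu_B}\cdot\frac{1}{m}$$</p>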
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

<span class="n">N</span><span class="p">,</span><span class="n">D</span> <span class="o">=</span> <span class="n">dout</span><span class="o">.</span><span class="n">shape</span>
<span class="n">x</span><span class="p">,</span> <span class="n">x_hat</span><span class="p">,</span> <span class="n">sample_mean</span><span class="p">,</span> <span class="n">sample_var</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">eps</span> <span class="o">=</span> <span class="n">cache</span>
<span class="n">dx_hat</span><span class="o">=</span><span class="n">dout</span><span class="o">*</span><span class="n">gamma</span>
<span class="n">dvar</span><span class="o">=-</span><span class="mf">0.5</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dx_hat</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="o">-</span><span class="n">sample_mean</span><span class="p">),</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">power</span><span class="p">(</span><span class="n">sample_var</span><span class="o">+</span><span class="n">eps</span><span class="p">,</span><span class="o">-</span><span class="mf">1.5</span><span class="p">)</span>
<span class="n">dmean</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dx_hat</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mf">1.0</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sample_var</span><span class="o">+</span><span class="n">eps</span><span class="p">)),</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span><span class="o">+</span><span class="n">dvar</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="o">-</span><span class="n">sample_mean</span><span class="p">),</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span><span class="o">/</span><span class="n">N</span>
<span class="n">dx</span><span class="o">=</span><span class="n">dx_hat</span><span class="o">/</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sample_var</span><span class="o">+</span><span class="n">eps</span><span class="p">))</span><span class="o">+</span><span class="n">dvar</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="o">-</span><span class="n">sample_mean</span><span class="p">)</span><span class="o">/</span><span class="n">N</span><span class="o">+</span><span class="n">dmean</span><span class="o">/</span><span class="n">N</span>
<span class="n">dgamma</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dout</span><span class="o">*</span><span class="n">x_hat</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">dbeta</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dout</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

<span class="k">pass</span>

<span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h2 id="将bn添加到fully-connected-net中">Adding BN to the Fully Connected Net</h2>
<h3 id="初始化">Initialization</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

<span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">input_dim</span><span class="p">,</span><span class="n">hidden_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="n">num_classes</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_classes</span><span class="p">)</span>
<span class="k">pass</span>

<span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
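<p>As a sanity check on the initialization scheme above, the same logic can be written as a small standalone function. This is a hedged sketch, not the assignment's actual class: the name <code>init_params</code> and its signature are illustrative, but the rules mirror the snippet — Gaussian weights scaled by <code>weight_scale</code>, zero biases, and batchnorm <code>gamma</code>/<code>beta</code> only for hidden layers (never for the final score layer).</p>

```python
import numpy as np

def init_params(input_dim, hidden_dims, num_classes, weight_scale=1e-2,
                use_batchnorm=True):
    """Illustrative re-implementation of the initialization loop above."""
    params = {}
    dims = [input_dim] + hidden_dims + [num_classes]
    for i in range(len(dims) - 1):
        # W_{i+1}: (fan_in, fan_out) Gaussian; b_{i+1}: zeros of fan_out
        params['W%d' % (i + 1)] = weight_scale * np.random.randn(dims[i], dims[i + 1])
        params['b%d' % (i + 1)] = np.zeros(dims[i + 1])
        # gamma/beta exist only for hidden layers, not the last affine layer
        if use_batchnorm and i < len(hidden_dims):
            params['gamma%d' % (i + 1)] = np.ones(dims[i + 1])
            params['beta%d' % (i + 1)] = np.zeros(dims[i + 1])
    return params

p = init_params(3072, [100, 100], 10)
print(p['W1'].shape, p['W3'].shape, p['gamma2'].shape)
```

<p>With <code>input_dim=3072</code>, <code>hidden_dims=[100, 100]</code>, <code>num_classes=10</code>, this yields W1 of shape (3072, 100), W3 of shape (100, 10), and gamma/beta for layers 1 and 2 only.</p>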
</div><h3 id="计算scores">Computing the scores</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span><span class="lnt">38
</span><span class="lnt">39
</span><span class="lnt">40
</span><span class="lnt">41
</span><span class="lnt">42
</span><span class="lnt">43
</span><span class="lnt">44
</span><span class="lnt">45
</span><span class="lnt">46
</span><span class="lnt">47
</span><span class="lnt">48
</span><span class="lnt">49
</span><span class="lnt">50
</span><span class="lnt">51
</span><span class="lnt">52
</span><span class="lnt">53
</span><span class="lnt">54
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="n">aff_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">bn_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">relu_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">aff_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">bn_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">relu_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
          <span class="c1"># affine forward</span>
          <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
          <span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
            <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">])</span>
          <span class="k">else</span><span class="p">:</span>
            <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">relu_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)])</span>
          <span class="n">aff_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_out</span><span class="p">)</span>
          <span class="n">aff_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_cache</span><span class="p">)</span>
          <span class="c1"># BN forward</span>
          <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
              <span class="n">tgamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">tbeta</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">bnp</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bn_params</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
              <span class="n">bn_out</span><span class="p">,</span> <span class="n">bn_cache</span><span class="o">=</span><span class="n">batchnorm_forward</span><span class="p">(</span><span class="n">aff_out</span><span class="p">,</span><span class="n">tgamma</span><span class="p">,</span><span class="n">tbeta</span><span class="p">,</span><span class="n">bnp</span><span class="p">)</span>
              <span class="n">bn_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_out</span><span class="p">)</span>
              <span class="n">bn_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_cache</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
              <span class="n">bn_out</span><span class="o">=</span><span class="n">aff_out</span>
          <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;layernorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
              <span class="n">tgamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">tbeta</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">bnp</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bn_params</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
              <span class="n">bn_out</span><span class="p">,</span> <span class="n">bn_cache</span><span class="o">=</span><span class="n">layernorm_forward</span><span class="p">(</span><span class="n">aff_out</span><span class="p">,</span><span class="n">tgamma</span><span class="p">,</span><span class="n">tbeta</span><span class="p">,</span><span class="n">bnp</span><span class="p">)</span>
              <span class="n">bn_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_out</span><span class="p">)</span>
              <span class="n">bn_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_cache</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
              <span class="n">bn_out</span><span class="o">=</span><span class="n">aff_out</span>
          <span class="k">else</span><span class="p">:</span>
            <span class="n">bn_out</span><span class="o">=</span><span class="n">aff_out</span>
          <span class="c1"># ReLU forward</span>
          <span class="n">relu_out</span><span class="p">,</span> <span class="n">relu_cache</span><span class="o">=</span><span class="n">relu_forward</span><span class="p">(</span><span class="n">bn_out</span><span class="p">)</span>
          <span class="n">relu_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">relu_out</span><span class="p">)</span>
          <span class="n">relu_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">relu_cache</span><span class="p">)</span>
          <span class="k">pass</span>
        <span class="n">i</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span>
        <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">relu_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)])</span>
        <span class="n">aff_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_out</span><span class="p">)</span>
        <span class="n">aff_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_cache</span><span class="p">)</span>
        <span class="n">scores</span><span class="o">=</span><span class="n">aff_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>

        <span class="k">pass</span>

        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
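<p>The per-layer chain above (affine → normalization → ReLU, then one final affine that produces the class scores) can be traced with minimal stand-ins for the <code>affine_forward</code> / <code>relu_forward</code> helpers. These stand-in bodies are assumptions following the common cs231n convention of returning <code>(out, cache)</code>; normalization and dropout are omitted here to keep the sketch short.</p>

```python
import numpy as np

def affine_forward(x, w, b):
    # Flatten each sample, apply the linear map; cache inputs for backprop.
    out = x.reshape(x.shape[0], -1) @ w + b
    return out, (x, w, b)

def relu_forward(x):
    # Elementwise max(0, x); cache the input for backprop.
    return np.maximum(0, x), x

np.random.seed(0)
X = np.random.randn(4, 6)                      # 4 samples, 6 features
W1, b1 = 0.01 * np.random.randn(6, 5), np.zeros(5)
W2, b2 = 0.01 * np.random.randn(5, 3), np.zeros(3)

h, _ = affine_forward(X, W1, b1)               # hidden pre-activation
h, _ = relu_forward(h)                         # nonlinearity
scores, _ = affine_forward(h, W2, b2)          # last affine: class scores
print(scores.shape)  # (4, 3)
```

<p>The final layer deliberately has no ReLU: the raw affine output is fed to the softmax loss, exactly as the loop above stores <code>scores=aff_outs[-1]</code>.</p>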
</div><h3 id="计算梯度">Computing the gradients</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="c1"># print(self.num_layers, len(drop_caches),len(relu_caches),len(bn_caches),len(aff_caches))</span>
        <span class="n">loss</span><span class="p">,</span> <span class="n">grad</span><span class="o">=</span><span class="n">softmax_loss</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
        <span class="n">i</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span>
        <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">affine_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">aff_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">):</span>
          <span class="n">loss</span><span class="o">+=</span><span class="mf">0.5</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]))</span>
        <span class="c1"># backprop</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
          <span class="n">grad</span><span class="o">=</span><span class="n">relu_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span> <span class="n">relu_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">:</span>
              <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">batchnorm_backward_alt</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">bn_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;layernorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">:</span>
              <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">layernorm_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">bn_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">affine_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">aff_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span>


        <span class="k">pass</span>

        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
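<p>The backward pass above starts from <code>softmax_loss</code>, adds the <code>0.5*reg*sum(W**2)</code> L2 term for every weight matrix, and adds <code>reg*W</code> to each weight gradient. A quick way to convince yourself the pieces fit is a numerical gradient check on the last layer. The <code>softmax_loss</code> body below is a stand-in written from the standard formulation, not the course's exact source:</p>

```python
import numpy as np

def softmax_loss(scores, y):
    # Numerically stable softmax cross-entropy; returns loss and dL/dscores.
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    N = scores.shape[0]
    loss = -np.log(probs[np.arange(N), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1.0
    return loss, dscores / N

np.random.seed(0)
h = np.random.randn(5, 4)           # last hidden activations
W = 0.1 * np.random.randn(4, 3)     # last-layer weights
y = np.array([0, 2, 1, 0, 2])
reg = 0.5

loss, dscores = softmax_loss(h @ W, y)
loss += 0.5 * reg * np.sum(W ** 2)  # same L2 term as the loop above
dW = h.T @ dscores + reg * W        # affine backward + reg gradient

# Central-difference check on one entry of dW.
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
lp, _ = softmax_loss(h @ Wp, y); lp += 0.5 * reg * np.sum(Wp ** 2)
Wm = W.copy(); Wm[0, 0] -= eps
lm, _ = softmax_loss(h @ Wm, y); lm += 0.5 * reg * np.sum(Wm ** 2)
num_grad = (lp - lm) / (2 * eps)
```

<p>The analytic entry <code>dW[0, 0]</code> should agree with <code>num_grad</code> to several decimal places, mirroring the relative-error checks used throughout the assignment.</p>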
</div><h1 id="dropout">Dropout</h1>
<p><strong>Dropout</strong> is a simple yet remarkably effective regularization method, complementary to L1 regularization, L2 regularization, and max-norm constraints. During training, dropout keeps each neuron active with probability p (a hyperparameter) and sets it to zero otherwise. The course experiments show that dropout effectively combats overfitting. With dropout layers added, the complete loss function is implemented as follows:</p>
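<p>The mechanism in the paragraph above can be sketched in a few lines. This is a hedged, standalone sketch of <em>inverted</em> dropout (the variant the assignment's helpers use): neurons are kept with probability p and the surviving activations are scaled by 1/p at train time, so test time is a plain identity. The function name and signature are illustrative, not the course's exact <code>dropout_forward</code> API.</p>

```python
import numpy as np

def dropout_forward_sketch(x, p, mode="train"):
    """Inverted dropout: keep each unit with probability p, rescale by 1/p."""
    if mode == "train":
        mask = (np.random.rand(*x.shape) < p) / p  # zero with prob 1-p
        return x * mask
    return x  # test time: no scaling needed thanks to the 1/p at train time

np.random.seed(0)
x = np.ones((1000, 100))
out = dropout_forward_sketch(x, p=0.5)
# About half the activations are zeroed, the rest are doubled,
# so the mean activation stays close to the original mean of 1.
print(out.mean())
```

<p>The 1/p rescaling is the design choice that keeps expected activations identical between train and test, which is why the test-time branch can simply return its input.</p>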
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">  1
</span><span class="lnt">  2
</span><span class="lnt">  3
</span><span class="lnt">  4
</span><span class="lnt">  5
</span><span class="lnt">  6
</span><span class="lnt">  7
</span><span class="lnt">  8
</span><span class="lnt">  9
</span><span class="lnt"> 10
</span><span class="lnt"> 11
</span><span class="lnt"> 12
</span><span class="lnt"> 13
</span><span class="lnt"> 14
</span><span class="lnt"> 15
</span><span class="lnt"> 16
</span><span class="lnt"> 17
</span><span class="lnt"> 18
</span><span class="lnt"> 19
</span><span class="lnt"> 20
</span><span class="lnt"> 21
</span><span class="lnt"> 22
</span><span class="lnt"> 23
</span><span class="lnt"> 24
</span><span class="lnt"> 25
</span><span class="lnt"> 26
</span><span class="lnt"> 27
</span><span class="lnt"> 28
</span><span class="lnt"> 29
</span><span class="lnt"> 30
</span><span class="lnt"> 31
</span><span class="lnt"> 32
</span><span class="lnt"> 33
</span><span class="lnt"> 34
</span><span class="lnt"> 35
</span><span class="lnt"> 36
</span><span class="lnt"> 37
</span><span class="lnt"> 38
</span><span class="lnt"> 39
</span><span class="lnt"> 40
</span><span class="lnt"> 41
</span><span class="lnt"> 42
</span><span class="lnt"> 43
</span><span class="lnt"> 44
</span><span class="lnt"> 45
</span><span class="lnt"> 46
</span><span class="lnt"> 47
</span><span class="lnt"> 48
</span><span class="lnt"> 49
</span><span class="lnt"> 50
</span><span class="lnt"> 51
</span><span class="lnt"> 52
</span><span class="lnt"> 53
</span><span class="lnt"> 54
</span><span class="lnt"> 55
</span><span class="lnt"> 56
</span><span class="lnt"> 57
</span><span class="lnt"> 58
</span><span class="lnt"> 59
</span><span class="lnt"> 60
</span><span class="lnt"> 61
</span><span class="lnt"> 62
</span><span class="lnt"> 63
</span><span class="lnt"> 64
</span><span class="lnt"> 65
</span><span class="lnt"> 66
</span><span class="lnt"> 67
</span><span class="lnt"> 68
</span><span class="lnt"> 69
</span><span class="lnt"> 70
</span><span class="lnt"> 71
</span><span class="lnt"> 72
</span><span class="lnt"> 73
</span><span class="lnt"> 74
</span><span class="lnt"> 75
</span><span class="lnt"> 76
</span><span class="lnt"> 77
</span><span class="lnt"> 78
</span><span class="lnt"> 79
</span><span class="lnt"> 80
</span><span class="lnt"> 81
</span><span class="lnt"> 82
</span><span class="lnt"> 83
</span><span class="lnt"> 84
</span><span class="lnt"> 85
</span><span class="lnt"> 86
</span><span class="lnt"> 87
</span><span class="lnt"> 88
</span><span class="lnt"> 89
</span><span class="lnt"> 90
</span><span class="lnt"> 91
</span><span class="lnt"> 92
</span><span class="lnt"> 93
</span><span class="lnt"> 94
</span><span class="lnt"> 95
</span><span class="lnt"> 96
</span><span class="lnt"> 97
</span><span class="lnt"> 98
</span><span class="lnt"> 99
</span><span class="lnt">100
</span><span class="lnt">101
</span><span class="lnt">102
</span><span class="lnt">103
</span><span class="lnt">104
</span><span class="lnt">105
</span><span class="lnt">106
</span><span class="lnt">107
</span><span class="lnt">108
</span><span class="lnt">109
</span><span class="lnt">110
</span><span class="lnt">111
</span><span class="lnt">112
</span><span class="lnt">113
</span><span class="lnt">114
</span><span class="lnt">115
</span><span class="lnt">116
</span><span class="lnt">117
</span><span class="lnt">118
</span><span class="lnt">119
</span><span class="lnt">120
</span><span class="lnt">121
</span><span class="lnt">122
</span><span class="lnt">123
</span><span class="lnt">124
</span><span class="lnt">125
</span><span class="lnt">126
</span><span class="lnt">127
</span><span class="lnt">128
</span><span class="lnt">129
</span><span class="lnt">130
</span><span class="lnt">131
</span><span class="lnt">132
</span><span class="lnt">133
</span><span class="lnt">134
</span><span class="lnt">135
</span><span class="lnt">136
</span><span class="lnt">137
</span><span class="lnt">138
</span><span class="lnt">139
</span><span class="lnt">140
</span><span class="lnt">141
</span><span class="lnt">142
</span><span class="lnt">143
</span><span class="lnt">144
</span><span class="lnt">145
</span><span class="lnt">146
</span><span class="lnt">147
</span><span class="lnt">148
</span><span class="lnt">149
</span><span class="lnt">150
</span><span class="lnt">151
</span><span class="lnt">152
</span><span class="lnt">153
</span><span class="lnt">154
</span><span class="lnt">155
</span><span class="lnt">156
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">    <span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="s2">&#34;&#34;&#34;Compute loss and gradient for the fully connected net.
</span><span class="s2">        
</span><span class="s2">        Inputs:
</span><span class="s2">        - X: Array of input data of shape (N, d_1, ..., d_k)
</span><span class="s2">        - y: Array of labels, of shape (N,). y[i] gives the label for X[i].
</span><span class="s2">
</span><span class="s2">        Returns:
</span><span class="s2">        If y is None, then run a test-time forward pass of the model and return:
</span><span class="s2">        - scores: Array of shape (N, C) giving classification scores, where
</span><span class="s2">            scores[i, c] is the classification score for X[i] and class c.
</span><span class="s2">
</span><span class="s2">        If y is not None, then run a training-time forward and backward pass and
</span><span class="s2">        return a tuple of:
</span><span class="s2">        - loss: Scalar value giving the loss
</span><span class="s2">        - grads: Dictionary with the same keys as self.params, mapping parameter
</span><span class="s2">            names to gradients of the loss with respect to those parameters.
</span><span class="s2">        &#34;&#34;&#34;</span>
        <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
        <span class="n">mode</span> <span class="o">=</span> <span class="s2">&#34;test&#34;</span> <span class="k">if</span> <span class="n">y</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="s2">&#34;train&#34;</span>

        <span class="c1"># Set train/test mode for batchnorm params and dropout param since they</span>
        <span class="c1"># behave differently during training and testing.</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">use_dropout</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">dropout_param</span><span class="p">[</span><span class="s2">&#34;mode&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">mode</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span> <span class="o">==</span> <span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">bn_param</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">bn_params</span><span class="p">:</span>
                <span class="n">bn_param</span><span class="p">[</span><span class="s2">&#34;mode&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">mode</span>
        <span class="n">scores</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># TODO: Implement the forward pass for the fully connected net, computing  #</span>
        <span class="c1"># the class scores for X and storing them in the scores variable.          #</span>
        <span class="c1">#                                                                          #</span>
        <span class="c1"># When using dropout, you&#39;ll need to pass self.dropout_param to each       #</span>
        <span class="c1"># dropout forward pass.                                                    #</span>
        <span class="c1">#                                                                          #</span>
        <span class="c1"># When using batch normalization, you&#39;ll need to pass self.bn_params[0] to #</span>
        <span class="c1"># the forward pass for the first batch normalization layer, pass           #</span>
        <span class="c1"># self.bn_params[1] to the forward pass for the second batch normalization #</span>
        <span class="c1"># layer, etc.                                                              #</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="n">aff_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">bn_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">relu_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">drop_outs</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">aff_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">bn_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">relu_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">drop_caches</span><span class="o">=</span><span class="p">[]</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
          <span class="c1"># affine forward</span>
          <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
          <span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
            <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">])</span>
          <span class="k">else</span><span class="p">:</span>
            <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">drop_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)])</span>
          <span class="n">aff_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_out</span><span class="p">)</span>
          <span class="n">aff_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_cache</span><span class="p">)</span>
          <span class="c1"># BN forward</span>
          <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
              <span class="n">tgamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">tbeta</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">bnp</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bn_params</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
              <span class="n">bn_out</span><span class="p">,</span> <span class="n">bn_cache</span><span class="o">=</span><span class="n">batchnorm_forward</span><span class="p">(</span><span class="n">aff_out</span><span class="p">,</span><span class="n">tgamma</span><span class="p">,</span><span class="n">tbeta</span><span class="p">,</span><span class="n">bnp</span><span class="p">)</span>
              <span class="n">bn_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_out</span><span class="p">)</span>
              <span class="n">bn_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_cache</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
              <span class="n">bn_out</span><span class="o">=</span><span class="n">aff_out</span>
          <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;layernorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
              <span class="n">tgamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">tbeta</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
              <span class="n">bnp</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bn_params</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
              <span class="n">bn_out</span><span class="p">,</span> <span class="n">bn_cache</span><span class="o">=</span><span class="n">layernorm_forward</span><span class="p">(</span><span class="n">aff_out</span><span class="p">,</span><span class="n">tgamma</span><span class="p">,</span><span class="n">tbeta</span><span class="p">,</span><span class="n">bnp</span><span class="p">)</span>
              <span class="n">bn_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_out</span><span class="p">)</span>
              <span class="n">bn_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">bn_cache</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
              <span class="n">bn_out</span><span class="o">=</span><span class="n">aff_out</span>
          <span class="k">else</span><span class="p">:</span>
            <span class="n">bn_out</span><span class="o">=</span><span class="n">aff_out</span>
          <span class="c1"># ReLU forward</span>
          <span class="n">relu_out</span><span class="p">,</span> <span class="n">relu_cache</span><span class="o">=</span><span class="n">relu_forward</span><span class="p">(</span><span class="n">bn_out</span><span class="p">)</span>
          <span class="n">relu_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">relu_out</span><span class="p">)</span>
          <span class="n">relu_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">relu_cache</span><span class="p">)</span>
          <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">use_dropout</span><span class="p">:</span>
            <span class="n">drop_out</span><span class="p">,</span> <span class="n">drop_cache</span><span class="o">=</span><span class="n">dropout_forward</span><span class="p">(</span><span class="n">relu_out</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">dropout_param</span><span class="p">)</span>
            <span class="n">drop_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">drop_cache</span><span class="p">)</span>
          <span class="k">else</span><span class="p">:</span>
            <span class="n">drop_out</span><span class="o">=</span><span class="n">relu_out</span>
          <span class="n">drop_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">drop_out</span><span class="p">)</span>
        <span class="n">i</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span>
        <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">drop_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)])</span>
        <span class="n">aff_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_out</span><span class="p">)</span>
        <span class="n">aff_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_cache</span><span class="p">)</span>
        <span class="n">scores</span><span class="o">=</span><span class="n">aff_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>


        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
        <span class="c1">############################################################################</span>
        <span class="c1">#                             END OF YOUR CODE                             #</span>
        <span class="c1">############################################################################</span>

        <span class="c1"># If test mode return early.</span>
        <span class="k">if</span> <span class="n">mode</span> <span class="o">==</span> <span class="s2">&#34;test&#34;</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">scores</span>

        <span class="n">loss</span><span class="p">,</span> <span class="n">grads</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="p">{}</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># TODO: Implement the backward pass for the fully connected net. Store the #</span>
        <span class="c1"># loss in the loss variable and gradients in the grads dictionary. Compute #</span>
        <span class="c1"># data loss using softmax, and make sure that grads[k] holds the gradients #</span>
        <span class="c1"># for self.params[k]. Don&#39;t forget to add L2 regularization!               #</span>
        <span class="c1">#                                                                          #</span>
        <span class="c1"># When using batch/layer normalization, you don&#39;t need to regularize the   #</span>
        <span class="c1"># scale and shift parameters.                                              #</span>
        <span class="c1">#                                                                          #</span>
        <span class="c1"># NOTE: To ensure that your implementation matches ours and you pass the   #</span>
        <span class="c1"># automated tests, make sure that your L2 regularization includes a factor #</span>
        <span class="c1"># of 0.5 to simplify the expression for the gradient.                      #</span>
        <span class="c1">############################################################################</span>
        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="c1"># print(self.num_layers, len(drop_caches),len(relu_caches),len(bn_caches),len(aff_caches))</span>
        <span class="n">loss</span><span class="p">,</span> <span class="n">grad</span><span class="o">=</span><span class="n">softmax_loss</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
        <span class="n">i</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span>
        <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">affine_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">aff_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="c1"># the L2 term also contributes to the last layer, which the loop below skips</span>
        <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">):</span>
          <span class="n">loss</span><span class="o">+=</span><span class="mf">0.5</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]))</span>
        <span class="c1"># backprop</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
          <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">use_dropout</span><span class="p">:</span>
            <span class="n">grad</span><span class="o">=</span><span class="n">dropout_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">drop_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="n">grad</span><span class="o">=</span><span class="n">relu_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span> <span class="n">relu_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">:</span>
              <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">batchnorm_backward_alt</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">bn_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span><span class="o">==</span><span class="s2">&#34;layernorm&#34;</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">:</span>
              <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">layernorm_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">bn_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">affine_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">aff_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
          <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span>



        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
        <span class="c1">############################################################################</span>
        <span class="c1">#                             END OF YOUR CODE                             #</span>
        <span class="c1">############################################################################</span>

        <span class="k">return</span> <span class="n">loss</span><span class="p">,</span> <span class="n">grads</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="参考资料">References</h1>
<section class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://zhuanlan.zhihu.com/p/21560667?refer=intelligentunit" target="_blank" rel="noopener noreffer">CS231n课程笔记翻译：神经网络笔记 2</a>&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://zhuanlan.zhihu.com/p/34879333" target="_blank" rel="noopener noreffer">Batch Normalization原理与实战</a>&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://www.bilibili.com/video/BV1Wv411h7kN?p=15" target="_blank" rel="noopener noreffer">李宏毅2021春机器学习课程</a>&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</section>
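As an end-to-end sanity check on the forward/backward wiring above, here is a condensed, self-contained NumPy sketch. This is not the course starter code: the layer helpers are re-implemented from scratch and the net is simplified to a two-layer affine → ReLU → affine model with softmax loss and 0.5·reg·‖W‖² regularization, so the analytic gradients (including the L2 term on every weight matrix, last layer included) can be verified against a numerical gradient.

```python
import numpy as np

def affine_forward(x, w, b):
    # out = x @ w + b on flattened inputs; cache inputs for the backward pass
    out = x.reshape(x.shape[0], -1) @ w + b
    return out, (x, w, b)

def affine_backward(dout, cache):
    x, w, b = cache
    dx = (dout @ w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T @ dout
    db = dout.sum(axis=0)
    return dx, dw, db

def relu_forward(x):
    return np.maximum(0, x), x

def relu_backward(dout, x):
    return dout * (x > 0)

def softmax_loss(scores, y):
    # numerically stable softmax + mean cross-entropy, with gradient on scores
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    N = scores.shape[0]
    loss = -log_probs[np.arange(N), y].mean()
    dscores = np.exp(log_probs)
    dscores[np.arange(N), y] -= 1
    return loss, dscores / N

def loss_and_grads(params, X, y, reg):
    h, c1 = affine_forward(X, params['W1'], params['b1'])
    a, xr = relu_forward(h)
    scores, c2 = affine_forward(a, params['W2'], params['b2'])
    loss, dscores = softmax_loss(scores, y)
    loss += 0.5 * reg * (np.sum(params['W1'] ** 2) + np.sum(params['W2'] ** 2))
    da, dW2, db2 = affine_backward(dscores, c2)
    dh = relu_backward(da, xr)
    _, dW1, db1 = affine_backward(dh, c1)
    # the L2 term contributes reg * W to EVERY weight gradient, last layer included
    return loss, {'W1': dW1 + reg * params['W1'], 'b1': db1,
                  'W2': dW2 + reg * params['W2'], 'b2': db2}
```

A centered-difference check on any weight matrix should agree with the analytic gradient to well below 1e-6.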
]]></description></item><item><title>Neural Network Notes (1): Fully Connected Nets</title><link>https://blog.gaokeyong.top/nn-notes-1/</link><pubDate>Sat, 04 Sep 2021 20:33:38 +0800</pubDate><author>gaokeyong@outlook.com (高轲用)</author><guid>https://blog.gaokeyong.top/nn-notes-1/</guid><description><![CDATA[<p>This post is a digest of my notes and experiments from working through the lab <code>assignment2/FullyConnectedNets.ipynb</code> of the CS231N-2021 course.</p>
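As a quick orientation, the shape bookkeeping for an arbitrary-depth fully connected net can be summarized in a small standalone sketch. The function name <code>init_params</code> and its defaults are illustrative, not part of the assignment's API; note that the gamma (scale, initialized to ones) and beta (shift, initialized to zeros) parameters needed by batch/layer normalization exist only for the hidden layers, never for the output layer.

```python
import numpy as np

def init_params(hidden_dims, input_dim=3 * 32 * 32, num_classes=10,
                weight_scale=1e-2, normalization=None):
    # dims[i-1] -> dims[i] gives the shape of W_i; biases match the output width
    dims = [input_dim] + list(hidden_dims) + [num_classes]
    params = {}
    for i in range(1, len(dims)):
        params['W' + str(i)] = weight_scale * np.random.randn(dims[i - 1], dims[i])
        params['b' + str(i)] = np.zeros(dims[i])
        # normalization layers sit between hidden affine layers only,
        # so the output layer gets no gamma/beta
        if normalization in ('batchnorm', 'layernorm') and i < len(dims) - 1:
            params['gamma' + str(i)] = np.ones(dims[i])   # scale, init to 1
            params['beta' + str(i)] = np.zeros(dims[i])   # shift, init to 0
    return params
```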
<h1 id="参数初始化">Parameter Initialization</h1>
<p>Complete <code>cs231n/classifiers/fc_net.py</code> to implement network initialization, the forward pass, and the backward pass. The core code follows.</p>
<p>Parameter initialization:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="k">class</span> <span class="nc">FullyConnectedNet</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span>
        <span class="bp">self</span><span class="p">,</span>
        <span class="n">hidden_dims</span><span class="p">,</span>
        <span class="n">input_dim</span><span class="o">=</span><span class="mi">3</span> <span class="o">*</span> <span class="mi">32</span> <span class="o">*</span> <span class="mi">32</span><span class="p">,</span>
        <span class="n">num_classes</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
        <span class="n">dropout_keep_ratio</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
        <span class="n">normalization</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
        <span class="n">reg</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span>
        <span class="n">weight_scale</span><span class="o">=</span><span class="mf">1e-2</span><span class="p">,</span>
        <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
        <span class="n">seed</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
    <span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span> <span class="o">=</span> <span class="n">normalization</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">use_dropout</span> <span class="o">=</span> <span class="n">dropout_keep_ratio</span> <span class="o">!=</span> <span class="mi">1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">reg</span> <span class="o">=</span> <span class="n">reg</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dtype</span> <span class="o">=</span> <span class="n">dtype</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="p">{}</span>

        <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">input_dim</span><span class="p">,</span><span class="n">hidden_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)):</span>
          <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
          <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">weight_scale</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="n">num_classes</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_classes</span><span class="p">)</span>
        <span class="c1"># gamma (scale) and beta (shift) parameters for batch/layer normalization</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span> <span class="ow">in</span> <span class="p">(</span><span class="s2">&#34;batchnorm&#34;</span><span class="p">,</span> <span class="s2">&#34;layernorm&#34;</span><span class="p">):</span>
          <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">)):</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;gamma&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;beta&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_dims</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>

        <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><p>Loss and gradient computation:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span><span class="lnt">38
</span><span class="lnt">39
</span><span class="lnt">40
</span><span class="lnt">41
</span><span class="lnt">42
</span><span class="lnt">43
</span><span class="lnt">44
</span><span class="lnt">45
</span><span class="lnt">46
</span><span class="lnt">47
</span><span class="lnt">48
</span><span class="lnt">49
</span><span class="lnt">50
</span><span class="lnt">51
</span><span class="lnt">52
</span><span class="lnt">53
</span><span class="lnt">54
</span><span class="lnt">55
</span><span class="lnt">56
</span><span class="lnt">57
</span><span class="lnt">58
</span><span class="lnt">59
</span><span class="lnt">60
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
    <span class="n">mode</span> <span class="o">=</span> <span class="s2">&#34;test&#34;</span> <span class="k">if</span> <span class="n">y</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="s2">&#34;train&#34;</span>

    <span class="c1"># Set train/test mode for batchnorm params and dropout param since they</span>
    <span class="c1"># behave differently during training and testing.</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">use_dropout</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dropout_param</span><span class="p">[</span><span class="s2">&#34;mode&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">mode</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">normalization</span> <span class="o">==</span> <span class="s2">&#34;batchnorm&#34;</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">bn_param</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">bn_params</span><span class="p">:</span>
            <span class="n">bn_param</span><span class="p">[</span><span class="s2">&#34;mode&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">mode</span>
    <span class="n">scores</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">aff_outs</span><span class="o">=</span><span class="p">[]</span>
    <span class="n">relu_outs</span><span class="o">=</span><span class="p">[]</span>
    <span class="n">aff_caches</span><span class="o">=</span><span class="p">[]</span>
    <span class="n">relu_caches</span><span class="o">=</span><span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">):</span>
        <span class="c1"># affine forward</span>
        <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
        <span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
            <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W1&#39;</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b1&#39;</span><span class="p">])</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">aff_out</span><span class="p">,</span> <span class="n">aff_cache</span><span class="o">=</span><span class="n">affine_forward</span><span class="p">(</span><span class="n">relu_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)],</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)])</span>
        <span class="n">aff_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_out</span><span class="p">)</span>
        <span class="n">aff_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">aff_cache</span><span class="p">)</span>
        <span class="c1"># ReLU forward</span>
        <span class="n">relu_out</span><span class="p">,</span> <span class="n">relu_cache</span><span class="o">=</span><span class="n">relu_forward</span><span class="p">(</span><span class="n">aff_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="n">relu_outs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">relu_out</span><span class="p">)</span>
        <span class="n">relu_caches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">relu_cache</span><span class="p">)</span>
        <span class="k">pass</span>
    <span class="n">scores</span><span class="o">=</span><span class="n">aff_outs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>

    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
    
    <span class="c1"># If test mode return early.</span>
    <span class="k">if</span> <span class="n">mode</span> <span class="o">==</span> <span class="s2">&#34;test&#34;</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">scores</span>

    <span class="n">loss</span><span class="p">,</span> <span class="n">grads</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="p">{}</span>
    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">loss</span><span class="p">,</span> <span class="n">grad</span><span class="o">=</span><span class="n">softmax_loss</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">):</span>
        <span class="n">loss</span><span class="o">+=</span><span class="mf">0.5</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)]))</span>
    <span class="c1"># backprop</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">i</span><span class="o">&lt;</span><span class="bp">self</span><span class="o">.</span><span class="n">num_layers</span><span class="p">:</span> <span class="n">grad</span><span class="o">=</span><span class="n">relu_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span> <span class="n">relu_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="n">grad</span><span class="p">,</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)],</span><span class="n">grads</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="n">affine_backward</span><span class="p">(</span><span class="n">grad</span><span class="p">,</span><span class="n">aff_caches</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="n">grads</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">+=</span><span class="bp">self</span><span class="o">.</span><span class="n">reg</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;W&#39;</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span>


    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
    
    <span class="k">return</span> <span class="n">loss</span><span class="p">,</span> <span class="n">grads</span>
</code></pre></td></tr></table>
</div>
</div><h1 id="参数更新">Parameter Updates</h1>
<p>The analytic gradients computed by backpropagation are used to perform the parameter updates.</p>
<h2 id="随机梯度下降sgd及各种更新方法">Stochastic Gradient Descent (SGD) and Its Variants</h2>
<h3 id="普通更新">Vanilla Update</h3>
<p>Change the parameters along the direction of the negative gradient. Given a parameter vector x and its gradient dx, the simplest form of the update is:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"># Vanilla update
x += - learning_rate * dx
</code></pre></td></tr></table>
</div>
</div><p>Here <code>learning_rate</code> is a hyperparameter, a fixed constant. When the gradient is computed over the whole dataset, as long as the learning rate is low enough, this update is guaranteed to make non-negative progress on the loss function.</p>
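<p>As a quick sanity check, here is a minimal runnable sketch on a hypothetical 1-D quadratic (not part of the assignment) showing that the vanilla update makes steady progress when the learning rate is small enough:</p>

```python
# Toy problem (hypothetical): f(x) = 0.5 * x**2, so the gradient is dx = x.
learning_rate = 0.1  # small enough for this problem
x = 5.0

losses = []
for _ in range(100):
    dx = x                      # analytic gradient of 0.5 * x**2
    x += -learning_rate * dx    # vanilla update
    losses.append(0.5 * x ** 2)

# On this convex problem the loss shrinks monotonically.
assert all(a > b for a, b in zip(losses, losses[1:]))
assert losses[-1] < 1e-6
```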
<h3 id="动量momentum更新">Momentum Update</h3>
<p>This method almost always achieves better convergence rates on deep networks. It can be viewed as a physically motivated approach to the optimization problem.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"># Momentum update
v = mu * v - learning_rate * dx # integrate velocity
x += v # integrate position
</code></pre></td></tr></table>
</div>
</div><p>The hyperparameter here, although commonly called momentum, is physically more consistent with a coefficient of friction: it damps the velocity and reduces the kinetic energy of the system; otherwise the particle would never come to a stop at the bottom of the valley. With cross-validation, this parameter is usually set to one of [0.5, 0.9, 0.95, 0.99]. Similar to annealing the learning rate over time (discussed below), a schedule in which the momentum increases in the later stages of learning can sometimes slightly improve optimization. A typical setting is to start with a momentum of about 0.5 and anneal it to 0.99 over multiple epochs.</p>
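<p>The speedup is easy to illustrate with a short sketch (a hypothetical ill-conditioned quadratic, not assignment code): with the same learning rate and step budget, the momentum update ends up much closer to the optimum than the vanilla update:</p>

```python
import numpy as np

# Hypothetical ill-conditioned quadratic: f(x) = 0.5 * x @ A @ x, optimum at 0.
A = np.diag([1.0, 50.0])
learning_rate, mu = 0.02, 0.9

def run(use_momentum):
    x = np.array([1.0, 1.0])
    v = np.zeros_like(x)
    for _ in range(100):
        dx = A @ x                           # analytic gradient
        if use_momentum:
            v = mu * v - learning_rate * dx  # integrate velocity
            x = x + v                        # integrate position
        else:
            x = x - learning_rate * dx       # vanilla update
    return np.linalg.norm(x)                 # distance to the optimum

dist_sgd, dist_mom = run(False), run(True)
assert dist_mom < dist_sgd  # momentum converges faster on this problem
```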
<h4 id="实验代码">Assignment Code</h4>
<p>Complete the <code>sgd_momentum</code> function in <code>cs231n/optim.py</code> to implement the momentum update (note that this is not Nesterov momentum). The code snippet is as follows:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">sgd_momentum</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">dw</span><span class="p">,</span> <span class="n">config</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="s2">&#34;&#34;&#34;
</span><span class="s2">    Performs stochastic gradient descent with momentum.
</span><span class="s2">
</span><span class="s2">    config format:
</span><span class="s2">    - learning_rate: Scalar learning rate.
</span><span class="s2">    - momentum: Scalar between 0 and 1 giving the momentum value.
</span><span class="s2">      Setting momentum = 0 reduces to sgd.
</span><span class="s2">    - velocity: A numpy array of the same shape as w and dw used to store a
</span><span class="s2">      moving average of the gradients.
</span><span class="s2">    &#34;&#34;&#34;</span>
    <span class="k">if</span> <span class="n">config</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
        <span class="n">config</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">config</span><span class="o">.</span><span class="n">setdefault</span><span class="p">(</span><span class="s2">&#34;learning_rate&#34;</span><span class="p">,</span> <span class="mf">1e-2</span><span class="p">)</span>
    <span class="n">config</span><span class="o">.</span><span class="n">setdefault</span><span class="p">(</span><span class="s2">&#34;momentum&#34;</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">)</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;velocity&#34;</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">w</span><span class="p">))</span>

    <span class="n">next_w</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="c1">###########################################################################</span>
    <span class="c1"># TODO: Implement the momentum update formula. Store the updated value in #</span>
    <span class="c1"># the next_w variable. You should also use and update the velocity v.     #</span>
    <span class="c1">###########################################################################</span>
    <span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

    <span class="n">v</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s1">&#39;momentum&#39;</span><span class="p">]</span><span class="o">*</span><span class="n">v</span><span class="o">-</span><span class="n">config</span><span class="p">[</span><span class="s1">&#39;learning_rate&#39;</span><span class="p">]</span><span class="o">*</span><span class="n">dw</span>
    <span class="n">next_w</span><span class="o">=</span><span class="n">w</span><span class="o">+</span><span class="n">v</span>

    <span class="k">pass</span>

    <span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><p>The relative-error output for reference:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback">next_w error:  8.882347033505819e-09
velocity error:  4.269287743278663e-09
</code></pre></td></tr></table>
</div>
</div><p>In the experiment we can see directly that SGD+momentum converges faster.</p>
<h3 id="nesterov动量">Nesterov Momentum</h3>
<p>It differs slightly from vanilla momentum and has recently gained popularity. It enjoys stronger theoretical convergence guarantees for convex functions, and in practice it also works slightly better than standard momentum.</p>
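<p>The relationship between the two common formulations can be checked numerically (a sketch on a hypothetical quadratic, not assignment code): the textbook form evaluates the gradient at the lookahead point x + mu * v, while the popular rewrite keeps the lookahead point itself as its parameter variable:</p>

```python
import numpy as np

# Hypothetical quadratic f(x) = 0.5 * x @ A @ x with gradient g(x) = A @ x.
A = np.diag([1.0, 10.0])
g = lambda x: A @ x
mu, learning_rate = 0.9, 0.05

x1 = np.array([1.0, -1.0]); v1 = np.zeros(2)  # textbook lookahead form
x2 = x1.copy();             v2 = np.zeros(2)  # rewritten form

for _ in range(50):
    # Form 1: gradient taken at the lookahead point x + mu * v.
    v1 = mu * v1 - learning_rate * g(x1 + mu * v1)
    x1 = x1 + v1
    # Form 2: the common rewrite (velocity update stays the same).
    v_prev = v2
    v2 = mu * v2 - learning_rate * g(x2)
    x2 = x2 + (-mu * v_prev + (1 + mu) * v2)

# The rewrite's variable tracks exactly the lookahead point of form 1.
assert np.allclose(x2, x1 + mu * v1)
```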
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback">v_prev = v # back this up
v = mu * v - learning_rate * dx # velocity update stays the same
x += -mu * v_prev + (1 + mu) * v # position update changes form
</code></pre></td></tr></table>
</div>
</div><h2 id="学习率退火">Annealing the Learning Rate</h2>
<p>If the learning rate is high, the system contains too much kinetic energy and the parameter vector bounces around chaotically, unable to settle into the deeper and narrower parts of the loss function.</p>
<p>In practice, step decay of the learning rate is often slightly preferred, because the hyperparameters it involves (the decay fraction and the step timing in units of epochs) are more interpretable than those of other decay schedules.</p>
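<p>A minimal sketch of step decay (the numbers are hypothetical: halve the learning rate every 10 epochs):</p>

```python
base_lr, decay, step_every = 0.1, 0.5, 10  # hypothetical schedule

def step_decay_lr(epoch):
    # Halve the learning rate once every `step_every` epochs.
    return base_lr * decay ** (epoch // step_every)

lrs = [step_decay_lr(e) for e in range(30)]
# Epochs 0-9 use 0.1, epochs 10-19 use 0.05, epochs 20-29 use 0.025.
assert lrs[0] == 0.1 and lrs[10] == 0.05 and lrs[20] == 0.025
```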
<h3 id="二阶方法">Second-Order Methods</h3>
<p>These require computing the Hessian matrix, which is very costly in both time and memory. In deep learning and convolutional neural networks, second-order methods such as L-BFGS are uncommon; instead, the various stochastic gradient descent methods based on (Nesterov) momentum are more widely used, because they are simpler and scale more easily.</p>
<h3 id="逐参数适应学习率方法">Per-Parameter Adaptive Learning Rate Methods</h3>
<p>All the methods discussed so far manipulate the learning rate globally and identically for every parameter. Tuning the learning rate is computationally expensive, so much work has gone into methods that adapt the learning rate automatically, even per parameter. Many of these methods still require other hyperparameters, but the argument is that they behave well across a broader range of hyperparameter values than the raw learning rate does.</p>
<p>The CS231n assignment requires implementing RMSprop and Adam, both of which can be viewed as improvements over Adagrad.</p>
<h4 id="adagrad">Adagrad</h4>
<p>The core idea is that weights receiving large gradients have their effective learning rate reduced, while weights receiving small gradients have their effective learning rate increased. One downside is that in deep learning the monotonically decaying learning rate usually proves too aggressive and stops learning too early. <a href="https://zhuanlan.zhihu.com/p/29920135" target="_blank" rel="noopener noreffer">Here</a> is an intuitive explanation of Adagrad. The code is as follows:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="c1"># Assume the gradient dx and parameter vector x</span>
<span class="n">cache</span> <span class="o">+=</span> <span class="n">dx</span><span class="o">**</span><span class="mi">2</span>
<span class="n">x</span> <span class="o">+=</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">dx</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">cache</span><span class="p">)</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>
</code></pre></td></tr></table>
</div>
</div><h4 id="rmsprop">RMSprop</h4>
<p>This method modifies Adagrad in a very simple way to make it less aggressive about monotonically decreasing the learning rate. Specifically, it uses a moving average of squared gradients:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="n">cache</span> <span class="o">=</span>  <span class="n">decay_rate</span> <span class="o">*</span> <span class="n">cache</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">decay_rate</span><span class="p">)</span> <span class="o">*</span> <span class="n">dx</span><span class="o">**</span><span class="mi">2</span>
<span class="n">x</span> <span class="o">+=</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">dx</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">cache</span><span class="p">)</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>
</code></pre></td></tr></table>
</div>
</div><p>In the code above, decay_rate is a hyperparameter with typical values [0.9, 0.99, 0.999]. The x+= update is identical to Adagrad's, but the cache variable differs. RMSProp therefore still modulates the learning rate of each weight based on the magnitudes of its gradients, which again works well; unlike Adagrad, however, the updates do not monotonically shrink the learning rate.</p>
<p>The implementation in the assignment is as follows:</p>
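<p>The contrast is easy to demonstrate on a stream of constant gradients (a hypothetical toy setting): Adagrad's cache grows without bound, so its effective step size decays toward zero, while RMSprop's moving-average cache saturates and the step size levels off:</p>

```python
import numpy as np

learning_rate, decay_rate, eps = 0.1, 0.9, 1e-8
dx = 1.0  # a constant stream of gradients

cache_ada = cache_rms = 0.0
ada_steps, rms_steps = [], []
for _ in range(1000):
    cache_ada += dx ** 2  # Adagrad: cache grows forever
    ada_steps.append(learning_rate * dx / (np.sqrt(cache_ada) + eps))
    cache_rms = decay_rate * cache_rms + (1 - decay_rate) * dx ** 2  # RMSprop
    rms_steps.append(learning_rate * dx / (np.sqrt(cache_rms) + eps))

assert ada_steps[-1] < 0.1 * ada_steps[0]         # Adagrad step keeps shrinking
assert abs(rms_steps[-1] - learning_rate) < 1e-6  # RMSprop step stabilizes
```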
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

<span class="n">config</span><span class="p">[</span><span class="s2">&#34;cache&#34;</span><span class="p">]</span> <span class="o">=</span>  <span class="n">config</span><span class="p">[</span><span class="s2">&#34;decay_rate&#34;</span><span class="p">]</span> <span class="o">*</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;cache&#34;</span><span class="p">]</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;decay_rate&#34;</span><span class="p">])</span> <span class="o">*</span> <span class="n">dw</span><span class="o">**</span><span class="mi">2</span>
<span class="n">next_w</span> <span class="o">=</span> <span class="n">w</span> <span class="o">-</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;learning_rate&#34;</span><span class="p">]</span> <span class="o">*</span> <span class="n">dw</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;cache&#34;</span><span class="p">])</span> <span class="o">+</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;epsilon&#34;</span><span class="p">])</span>
<span class="k">pass</span>

<span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div><h4 id="adam">Adam</h4>
<p>Adam是最近才提出的一种更新方法，它看起来像是RMSProp的动量版。简化的代码是下面这样：</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="n">m</span> <span class="o">=</span> <span class="n">beta1</span><span class="o">*</span><span class="n">m</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">beta1</span><span class="p">)</span><span class="o">*</span><span class="n">dx</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">beta2</span><span class="o">*</span><span class="n">v</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">beta2</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">dx</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="n">x</span> <span class="o">+=</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">m</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>
</code></pre></td></tr></table>
</div>
</div><p>Note that this update looks exactly like RMSProp, except that the smoothed gradient m is used instead of the raw gradient vector dx. The paper recommends eps=1e-8, beta1=0.9, beta2=0.999. In practice Adam is recommended as the default algorithm; it generally works a bit better than RMSProp, though SGD+Nesterov momentum is also worth trying. The full Adam update also includes a bias-correction mechanism: because m and v are initialized to zero, they are biased toward zero before fully warming up, which requires compensation.</p>
<p><b>The Adam implementation in the assignment must include the bias-correction mechanism.</b> Following the <a href="https://arxiv.org/pdf/1412.6980.pdf" target="_blank" rel="noopener noreffer">paper</a>, the relevant implementation in the assignment is as follows:</p>
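<p>The bias correction is easy to verify in isolation (a hypothetical one-step example): after the first step m equals (1-beta1)*dx, strongly biased toward zero, and dividing by (1 - beta1**t) restores the true gradient scale:</p>

```python
import numpy as np

beta1, beta2 = 0.9, 0.999
dx = 2.0  # some observed gradient (hypothetical)

m = v = 0.0
t = 1
m = beta1 * m + (1 - beta1) * dx      # m is about 0.2: biased toward zero
v = beta2 * v + (1 - beta2) * dx ** 2
m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment

# After one step, the corrected moments recover dx and dx**2.
assert np.isclose(m_hat, dx) and np.isclose(v_hat, dx ** 2)
```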
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="c1"># *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>

<span class="n">config</span><span class="p">[</span><span class="s2">&#34;t&#34;</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">config</span><span class="p">[</span><span class="s2">&#34;m&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;beta1&#34;</span><span class="p">]</span><span class="o">*</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;m&#34;</span><span class="p">]</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;beta1&#34;</span><span class="p">])</span><span class="o">*</span><span class="n">dw</span>
<span class="n">config</span><span class="p">[</span><span class="s2">&#34;v&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;beta2&#34;</span><span class="p">]</span><span class="o">*</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;v&#34;</span><span class="p">]</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;beta2&#34;</span><span class="p">])</span><span class="o">*</span><span class="p">(</span><span class="n">dw</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="n">m_hat</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;m&#34;</span><span class="p">]</span><span class="o">/</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;beta1&#34;</span><span class="p">]</span><span class="o">**</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;t&#34;</span><span class="p">])</span>
<span class="n">v_hat</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;v&#34;</span><span class="p">]</span><span class="o">/</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;beta2&#34;</span><span class="p">]</span><span class="o">**</span><span class="n">config</span><span class="p">[</span><span class="s2">&#34;t&#34;</span><span class="p">])</span>
<span class="n">next_w</span> <span class="o">=</span> <span class="n">w</span> <span class="o">-</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;learning_rate&#34;</span><span class="p">]</span> <span class="o">*</span> <span class="n">m_hat</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">v_hat</span><span class="p">)</span> <span class="o">+</span> <span class="n">config</span><span class="p">[</span><span class="s2">&#34;epsilon&#34;</span><span class="p">])</span>
<span class="k">pass</span>

<span class="c1"># *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****</span>
</code></pre></td></tr></table>
</div>
</div>]]></description></item></channel></rss>