This post collects my notes and experiments from completing assignment2/ConvolutionalNetworks.ipynb in the CS231N-2021 course.
Convolution operation
Forward pass
The input consists of $N$ data points, each with $C$ channels, height $H$, and width $W$. Each input is convolved with $F$ different filters, where each filter spans $HH \times WW$ spatially over all $C$ channels. The convolution also takes a stride $S$ and zero-padding $P$ as parameters, so the output spatial size is $H' = 1 + (H + 2P - HH)/S$ and $W' = 1 + (W + 2P - WW)/S$. The forward pass that computes the convolution output is implemented as follows:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
N, C, H, W = x.shape
F, C, HH, WW = w.shape
stride, pad = conv_param['stride'], conv_param['pad']
H_prime = 1 + (H + 2 * pad - HH) // stride
W_prime = 1 + (W + 2 * pad - WW) // stride
# zero-pad only the two spatial dimensions
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
out = np.zeros(shape=(N, F, H_prime, W_prime))
for n in range(N):
    for f in range(F):
        for i in range(H_prime):
            for j in range(W_prime):
                # dot product between filter f and the receptive field at output position (i, j)
                out[n, f, i, j] = np.sum(w[f, :, :, :] * x_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW]) + b[f]
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
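As a quick sanity check (a minimal sketch, assuming the function above is exposed as conv_forward_naive in cs231n/layers.py, as in the assignment), the output shape can be compared against the size formula:

import numpy as np
from cs231n.layers import conv_forward_naive  # assumed location, as in the assignment

x = np.random.randn(2, 3, 8, 8)   # N=2, C=3, H=W=8
w = np.random.randn(4, 3, 3, 3)   # F=4, HH=WW=3
b = np.zeros(4)
conv_param = {'stride': 1, 'pad': 1}

out, _ = conv_forward_naive(x, w, b, conv_param)
# H' = 1 + (8 + 2*1 - 3) // 1 = 8, W' likewise, so the shape should be (2, 4, 8, 8)
print(out.shape)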
Backward pass
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
x, w, b, conv_param = cache
N, C, H, W = x.shape
F, C, HH, WW = w.shape
stride, pad = conv_param['stride'], conv_param['pad']
H_prime = 1 + (H + 2 * pad - HH) // stride
W_prime = 1 + (W + 2 * pad - WW) // stride
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
dx_pad = np.zeros(shape=x_pad.shape)
dw = np.zeros(shape=w.shape)
db = np.zeros(shape=b.shape)
for n in range(N):
    for f in range(F):
        for h in range(H_prime):
            for w_mid in range(W_prime):
                # each output element dout[n, f, h, w_mid] distributes its upstream gradient
                # to its receptive field in dx, to the filter weights, and to the bias
                dx_pad[n, :, h*stride:h*stride+HH, w_mid*stride:w_mid*stride+WW] += dout[n, f, h, w_mid] * w[f, :, :, :]
                dw[f, :, :, :] += dout[n, f, h, w_mid] * x_pad[n, :, h*stride:h*stride+HH, w_mid*stride:w_mid*stride+WW]
                db[f] += dout[n, f, h, w_mid]
# strip the padding to recover the gradient w.r.t. the original input
dx = dx_pad[:, :, pad:H+pad, pad:W+pad]
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
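A minimal numerical gradient check (a sketch, assuming the naive implementations above and the gradient-check helper shipped with the assignment):

import numpy as np
from cs231n.layers import conv_forward_naive, conv_backward_naive
from cs231n.gradient_check import eval_numerical_gradient_array  # assumed helper from the assignment

x = np.random.randn(2, 3, 7, 7)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3)
conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
dx, dw, db = conv_backward_naive(dout, cache)

# compare the analytic gradient w.r.t. x against a numerical one; dw and db can be checked the same way
dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))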
Pooling layer
It is common to periodically insert a pooling layer between successive convolutional layers. Its role is to progressively reduce the spatial size of the data volume, which reduces the number of parameters in the network, lowers the computational cost, and also helps control overfitting. The pooling layer applies a MAX operation independently to every depth slice of the input and resizes it spatially. The most common form uses 2x2 filters with a stride of 2 to downsample each depth slice, discarding 75% of the activations. Intuitively, a max-pooling layer keeps only the strongest activation within each pooling window.
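A tiny worked example (a NumPy-only sketch) of 2x2 max pooling with stride 2 on a single 4x4 depth slice; 16 values go in, 4 come out, so 75% of the activations are discarded:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# split the 4x4 slice into four non-overlapping 2x2 windows and take the max of each
out = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out)  # [[6 8]
            #  [3 4]]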
Forward pass
The code is as follows:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
N, C, H, W = x.shape
pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
H_prime = 1 + (H - pool_height) // stride
W_prime = 1 + (W - pool_width) // stride
out = np.zeros(shape=(N, C, H_prime, W_prime))
for n in range(N):
    for c in range(C):
        for i in range(H_prime):
            for j in range(W_prime):
                # take the maximum over the pooling window of this depth slice
                out[n, c, i, j] = np.max(x[n, c, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width])
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
Backward pass
The code is as follows:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
x, pool_param = cache
N, C, H, W = x.shape
pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
H_prime = 1 + (H - pool_height) // stride
W_prime = 1 + (W - pool_width) // stride
dx = np.zeros(shape=x.shape)
for n in range(N):
    for c in range(C):
        for h in range(H_prime):
            for w in range(W_prime):
                # the gradient flows only to the element that achieved the max in each window
                ind = np.unravel_index(np.argmax(x[n, c, h*stride:h*stride+pool_height, w*stride:w*stride+pool_width]), shape=(pool_height, pool_width))
                dx[n, c, h*stride+ind[0], w*stride+ind[1]] += dout[n, c, h, w]
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
The assignment also provides fast API versions of the convolution and pooling operations. In my runs, fast convolution achieved speedups of roughly 602x for the forward pass and 736x for the backward pass, while fast pooling achieved roughly 183x and 58x, respectively.
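A minimal timing sketch of how such a comparison can be made (assuming the fast implementation lives in cs231n/fast_layers.py as conv_forward_fast, as in the assignment; the exact speedup depends on the input size and the machine):

import time
import numpy as np
from cs231n.layers import conv_forward_naive
from cs231n.fast_layers import conv_forward_fast  # assumed fast implementation from the assignment

x = np.random.randn(10, 3, 31, 31)
w = np.random.randn(25, 3, 3, 3)
b = np.random.randn(25)
conv_param = {'stride': 2, 'pad': 1}

t0 = time.time()
out_naive, _ = conv_forward_naive(x, w, b, conv_param)
t1 = time.time()
out_fast, _ = conv_forward_fast(x, w, b, conv_param)
t2 = time.time()

print('speedup: %.1fx' % ((t1 - t0) / (t2 - t1)))
print('max difference:', np.max(np.abs(out_naive - out_fast)))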
Three-layer convolutional network
The three-layer convolutional network here has the structure conv - relu - 2x2 max pool - affine - relu - affine - softmax, and is implemented in the class ThreeLayerConvNet in the assignment.
Parameter initialization
In the assignment, the padding and stride of the convolutional layer are chosen so that its output has the same height and width as its input. Specifically, the stride is $S=1$ and the padding is $P=\left\lfloor\frac{F-1}{2}\right\rfloor$, where $F$ is the filter (receptive field) size. Working this out, it appears that $F$ must be odd for the output to keep exactly the same spatial size as the input.
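A short derivation of that claim: with $S=1$ and input width $W$, the output width is
$$W' = \frac{W + 2P - F}{1} + 1 = W + 2P - F + 1,$$
so $W' = W$ requires $2P = F - 1$, i.e. $P = \frac{F-1}{2}$, which is an integer only when $F$ is odd. For an even $F$, the floor $P=\left\lfloor\frac{F-1}{2}\right\rfloor$ gives $W' = W - 1$.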
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# conv layer parameters: num_filters filters of shape (C, filter_size, filter_size)
self.params['W1'] = weight_scale * np.random.randn(num_filters, input_dim[0], filter_size, filter_size)
self.params['b1'] = np.zeros(num_filters)
# hidden affine layer parameters: the 2x2 max pool halves each spatial dimension
self.params['W2'] = weight_scale * np.random.randn(num_filters * (input_dim[1] // 2) * (input_dim[2] // 2), hidden_dim)
self.params['b2'] = np.zeros(hidden_dim)
# output affine layer parameters
self.params['W3'] = weight_scale * np.random.randn(hidden_dim, num_classes)
self.params['b3'] = np.zeros(num_classes)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
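A quick way to check the resulting shapes (a sketch, assuming the class lives in cs231n/classifiers/cnn.py with the assignment's defaults of input_dim=(3, 32, 32), num_filters=32, filter_size=7, hidden_dim=100, num_classes=10):

from cs231n.classifiers.cnn import ThreeLayerConvNet  # assumed location of the class

model = ThreeLayerConvNet()
for name, p in sorted(model.params.items()):
    print(name, p.shape)
# expected: W1 (32, 3, 7, 7), W2 (32*16*16, 100) = (8192, 100), W3 (100, 10)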
Loss and gradient computation
Here we can use the "sandwich" layers provided in cs231n/layer_utils.py, which combine the forward and backward passes of several layers, such as conv - relu - max_pool, into a single function each.
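As a rough sketch of what such a sandwich layer looks like (assuming the primitive layer functions from the assignment; the actual layer_utils.py wraps the fast implementations rather than the naive ones):

def conv_relu_pool_forward(x, w, b, conv_param, pool_param):
    """Convenience layer: conv -> relu -> 2x2 max pool in one call."""
    a, conv_cache = conv_forward_naive(x, w, b, conv_param)
    s, relu_cache = relu_forward(a)
    out, pool_cache = max_pool_forward_naive(s, pool_param)
    cache = (conv_cache, relu_cache, pool_cache)
    return out, cache

def conv_relu_pool_backward(dout, cache):
    """Backward pass for the conv-relu-pool convenience layer."""
    conv_cache, relu_cache, pool_cache = cache
    ds = max_pool_backward_naive(dout, pool_cache)
    da = relu_backward(ds, relu_cache)
    dx, dw, db = conv_backward_naive(da, conv_cache)
    return dx, dw, db

The loss function of ThreeLayerConvNet then composes these sandwich layers: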
def loss(self, X, y=None):
    """
    Evaluate loss and gradient for the three-layer convolutional network.
    Input / output: Same API as TwoLayerNet in fc_net.py.
    """
    W1, b1 = self.params["W1"], self.params["b1"]
    W2, b2 = self.params["W2"], self.params["b2"]
    W3, b3 = self.params["W3"], self.params["b3"]

    # pass conv_param to the forward pass for the convolutional layer
    # Padding and stride chosen to preserve the input spatial size
    filter_size = W1.shape[2]
    conv_param = {"stride": 1, "pad": (filter_size - 1) // 2}

    # pass pool_param to the forward pass for the max-pooling layer
    pool_param = {"pool_height": 2, "pool_width": 2, "stride": 2}

    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the three-layer convolutional net, #
    # computing the class scores for X and storing them in the scores         #
    # variable.                                                               #
    #                                                                         #
    # Remember you can use the functions defined in cs231n/fast_layers.py and #
    # cs231n/layer_utils.py in your implementation (already imported).        #
    ############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # conv - relu - 2x2 max pool
    out_conv_relu_pool, cache_conv_relu_pool = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
    # hidden affine - relu
    out_affine_relu, cache_affine_relu = affine_relu_forward(out_conv_relu_pool, W2, b2)
    # output affine layer producing the class scores
    scores, cache_output_affine = affine_forward(out_affine_relu, W3, b3)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ############################################################################
    #                             END OF YOUR CODE                            #
    ############################################################################

    if y is None:
        return scores

    loss, grads = 0, {}
    ############################################################################
    # TODO: Implement the backward pass for the three-layer convolutional net,#
    # storing the loss and gradients in the loss and grads variables. Compute #
    # data loss using softmax, and make sure that grads[k] holds the gradients#
    # for self.params[k]. Don't forget to add L2 regularization!              #
    #                                                                         #
    # NOTE: To ensure that your implementation matches ours and you pass the  #
    # automated tests, make sure that your L2 regularization includes a factor#
    # of 0.5 to simplify the expression for the gradient.                     #
    ############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    loss, grad = softmax_loss(scores, y)
    loss += 0.5 * self.reg * np.sum(W1 * W1)
    loss += 0.5 * self.reg * np.sum(W2 * W2)
    loss += 0.5 * self.reg * np.sum(W3 * W3)
    grad, grads['W3'], grads['b3'] = affine_backward(grad, cache_output_affine)
    grads['W3'] += self.reg * W3  # L2 regularization gradient (elementwise, not np.sum)
    grad, grads['W2'], grads['b2'] = affine_relu_backward(grad, cache_affine_relu)
    grads['W2'] += self.reg * W2
    grad, grads['W1'], grads['b1'] = conv_relu_pool_backward(grad, cache_conv_relu_pool)
    grads['W1'] += self.reg * W1
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ############################################################################
    #                             END OF YOUR CODE                            #
    ############################################################################

    return loss, grads
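A quick way to sanity-check the whole network (a sketch, assuming the assignment's Solver class and CIFAR-10 loading helper) is to overfit a tiny subset of the training data; training accuracy should approach 100% if the implementation is correct:

from cs231n.classifiers.cnn import ThreeLayerConvNet
from cs231n.solver import Solver
from cs231n.data_utils import get_CIFAR10_data  # assumed data-loading helper from the assignment

data = get_CIFAR10_data()
small_data = {
    'X_train': data['X_train'][:100], 'y_train': data['y_train'][:100],
    'X_val': data['X_val'], 'y_val': data['y_val'],
}

model = ThreeLayerConvNet(weight_scale=1e-2)
solver = Solver(model, small_data, num_epochs=15, batch_size=50,
                update_rule='adam', optim_config={'learning_rate': 1e-3},
                verbose=True, print_every=1)
solver.train()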
The assignment also contains two further parts, Spatial Batch Normalization and Spatial Group Normalization, which I will leave for next time.