Neural Network Notes (3): Convolutional Neural Networks


This post collects notes and excerpts from my experiments while working through the lab assignment2/ConvolutionalNetworks.ipynb of the CS231N-2021 course.

The Convolution Operation

Forward Pass

The input consists of $N$ data points, each with $C$ channels, height $H$, and width $W$. Each input is convolved with $F$ different filters, where each filter spans all $C$ channels and has spatial extent $HH \times WW$. The stride and the amount of zero-padding are also passed in as parameters. The forward pass that computes the convolution output is implemented as follows:

    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    N, C, H, W=x.shape
    F, C, HH, WW=w.shape
    stride,pad=conv_param['stride'],conv_param['pad']
    H_prime=1+(H+2*pad-HH)//stride
    W_prime=1+(W+2*pad-WW)//stride

    x_pad=np.pad(x,((0,0),(0,0),(pad,pad),(pad,pad)))
    out=np.zeros(shape=(N,F,H_prime,W_prime))
    for n in range(N):
      for f in range(F):
        for i in range(H_prime):
          for j in range(W_prime):
            # Multiply the filter element-wise with the padded input window, sum, and add the bias.
            out[n,f,i,j]=np.sum(w[f,:,:,:]*x_pad[n,:,i*stride:i*stride+HH,j*stride:j*stride+WW])+b[f]
    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
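
As a quick check of the output-size formula $H'=1+(H+2P-HH)/S$ (and likewise for $W'$), here is a minimal usage sketch. It assumes the code above is the body of conv_forward_naive in cs231n/layers.py, as in the assignment scaffold; the concrete sizes are made up for illustration.

    import numpy as np
    from cs231n.layers import conv_forward_naive  # assumed location of the code above

    # Made-up sizes: 2 images, 3 channels, 4x4 spatial extent; 3 filters of size 3x3, stride 2, pad 1.
    x = np.random.randn(2, 3, 4, 4)
    w = np.random.randn(3, 3, 3, 3)
    b = np.random.randn(3)
    out, _ = conv_forward_naive(x, w, b, {'stride': 2, 'pad': 1})

    # H' = 1 + (4 + 2*1 - 3) // 2 = 2, and likewise W' = 2.
    print(out.shape)  # (2, 3, 2, 2)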

Backward Pass

Each output element out[n, f, i, j] is the sum of filter f multiplied element-wise with one padded input window, plus the bias b[f]. Its upstream gradient dout[n, f, i, j] therefore flows back both to that window (scaled by the filter weights, accumulating into dx) and to the filter weights (scaled by the window values, accumulating into dw), while db[f] simply sums dout over all positions and data points:

    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x, w, b, conv_param=cache
    N, C, H, W=x.shape
    F, C, HH, WW=w.shape
    stride,pad=conv_param['stride'],conv_param['pad']
    H_prime=1+(H+2*pad-HH)//stride
    W_prime=1+(W+2*pad-WW)//stride

    x_pad=np.pad(x,((0,0),(0,0),(pad,pad),(pad,pad)))
    dx_pad=np.zeros(shape=x_pad.shape)
    dw=np.zeros(shape=w.shape)
    db=np.zeros(shape=b.shape)
    for n in range(N):
      for f in range(F):
        for h in range(H_prime):
          for w_mid in range(W_prime):
            dx_pad[n, :, h*stride:h*stride+HH, w_mid*stride:w_mid*stride+WW]+=dout[n,f,h,w_mid]*w[f,:,:,:]
            dw[f, :, :, :]+=dout[n, f, h, w_mid]*x_pad[n, :, h*stride:h*stride+HH, w_mid*stride:w_mid*stride+WW]
            db[f]+=dout[n,f,h,w_mid]
    dx=dx_pad[:,:,pad:H+pad,pad:W+pad]
    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
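
The notebook verifies backward passes like this one with a numerical gradient check. Below is a sketch of such a check; it assumes the two implementations above are conv_forward_naive and conv_backward_naive in cs231n/layers.py and that eval_numerical_gradient_array from cs231n/gradient_check.py is available. If the implementation is correct, the analytic and numerical gradients should agree to within a very small error.

    import numpy as np
    from cs231n.layers import conv_forward_naive, conv_backward_naive   # assumed locations
    from cs231n.gradient_check import eval_numerical_gradient_array

    np.random.seed(0)
    x = np.random.randn(4, 3, 5, 5)
    w = np.random.randn(2, 3, 3, 3)
    b = np.random.randn(2)
    dout = np.random.randn(4, 2, 5, 5)
    conv_param = {'stride': 1, 'pad': 1}

    # Analytic gradients from the implementation above.
    out, cache = conv_forward_naive(x, w, b, conv_param)
    dx, dw, db = conv_backward_naive(dout, cache)

    # Numerical gradients computed by finite differences.
    dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
    dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)
    db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)

    print('dx error:', np.max(np.abs(dx - dx_num)))
    print('dw error:', np.max(np.abs(dw - dw_num)))
    print('db error:', np.max(np.abs(db - db_num)))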

Pooling Layer

It is common to periodically insert a pooling layer between successive convolutional layers. Its role is to progressively reduce the spatial size of the representation, which cuts down the number of parameters in the network, reduces the computation required, and also helps control overfitting. The pooling layer applies a MAX operation independently to every depth slice of the input, shrinking it spatially. The most common form uses 2x2 filters with a stride of 2 to downsample each depth slice, discarding 75% of the activations. Intuitively, a max pooling layer keeps, within each window, only the most strongly activated value.
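
To make the 2x2, stride-2 case concrete, here is a tiny NumPy illustration on a single made-up 4x4 depth slice (my own example, not taken from the assignment):

    import numpy as np

    x = np.array([[1., 3., 2., 1.],
                  [4., 6., 5., 0.],
                  [9., 2., 1., 8.],
                  [3., 7., 4., 6.]])
    # Split the slice into non-overlapping 2x2 windows and keep only the maximum of each.
    out = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(out)  # [[6. 5.]
                #  [9. 8.]]  -- 12 of the 16 activations (75%) are discarded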

Forward Pass

The code is as follows:

    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    N,C,H,W=x.shape
    pool_height, pool_width, stride=pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_prime=1 + (H - pool_height) // stride
    W_prime=1 + (W - pool_width) // stride
    out=np.zeros(shape=(N,C,H_prime,W_prime))
    for n in range(N):
      for c in range(C):
        for i in range(H_prime):
          for j in range(W_prime):
            out[n,c,i,j]=np.max(x[n,c,i*stride:i*stride+pool_height,j*stride:j*stride+pool_width])
    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

Backward Pass

In the backward pass, each upstream gradient dout[n, c, h, w] is routed entirely to the location that attained the maximum inside its pooling window; every other position in the window receives zero gradient.

The code is as follows:

    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x, pool_param=cache
    N,C,H,W=x.shape
    pool_height, pool_width, stride=pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_prime=1 + (H - pool_height) // stride
    W_prime=1 + (W - pool_width) // stride
    dx=np.zeros(shape=x.shape)
    for n in range(N):
      for c in range(C):
        for h in range(H_prime):
          for w in range(W_prime):
            # Find the (row, col) of the max inside the pooling window; the upstream gradient flows only there.
            ind=np.unravel_index(np.argmax(x[n,c,h*stride:h*stride+pool_height,w*stride:w*stride+pool_width]),shape=(pool_height,pool_width))
            dx[n,c,h*stride+ind[0],w*stride+ind[1]]+=dout[n,c,h,w]
    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

The assignment also provides fast implementations of the convolution and pooling operations as an API. In my runs, the fast convolution's forward and backward passes were roughly 602x and 736x faster than the naive versions, and the fast pooling's forward and backward passes were roughly 183x and 58x faster.
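
Those numbers come from timing the naive and fast implementations on the same inputs; a sketch of such a benchmark is below. It assumes the fast versions are exposed as conv_forward_fast etc. in cs231n/fast_layers.py, as in the assignment scaffold; the input sizes are made up but representative, and the exact speedup depends on the machine.

    from time import time
    import numpy as np
    from cs231n.layers import conv_forward_naive
    from cs231n.fast_layers import conv_forward_fast   # assumed API of the fast version

    x = np.random.randn(100, 3, 31, 31)
    w = np.random.randn(25, 3, 3, 3)
    b = np.random.randn(25)
    conv_param = {'stride': 2, 'pad': 1}

    t0 = time()
    out_naive, _ = conv_forward_naive(x, w, b, conv_param)
    t1 = time()
    out_fast, _ = conv_forward_fast(x, w, b, conv_param)
    t2 = time()

    print('forward speedup: %.1fx' % ((t1 - t0) / (t2 - t1)))
    print('max abs difference:', np.max(np.abs(out_naive - out_fast)))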

Three-Layer Convolutional Network

The three-layer convolutional network here has the architecture conv - relu - 2x2 max pool - affine - relu - affine - softmax, and is implemented in the assignment as the class ThreeLayerConvNet.

Parameter Initialization

In the assignment, the convolutional layer's padding and stride are set so that its output has the same height and width as the input. Concretely, the stride is $S=1$ and the padding is $P=\left\lfloor\frac{F-1}{2}\right\rfloor$, where $F$ is the filter (receptive field) size. Working through the arithmetic (see the quick check below), it seems that $F$ must be odd for the output to keep the same spatial size as the input.
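
A quick check of that claim: with $S=1$ the output height is $H'=1+(H+2P-F)/S=H+2P-F+1$. If $F$ is odd, then $P=\left\lfloor\frac{F-1}{2}\right\rfloor=\frac{F-1}{2}$, so $H'=H+(F-1)-F+1=H$ and the spatial size is preserved. If $F$ is even, $P=\frac{F-2}{2}$ and $H'=H-1$, one row (and one column) short. So this "same"-padding scheme indeed requires an odd filter size.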

        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # conv layer parameters
        self.params['W1']=weight_scale*np.random.randn(num_filters,input_dim[0],filter_size,filter_size)
        self.params['b1']=np.zeros(num_filters)
        # hidden affine layer parameters
        self.params['W2']=weight_scale*np.random.randn(num_filters*(input_dim[1]//2)*(input_dim[2]//2),hidden_dim)
        self.params['b2']=np.zeros(hidden_dim)
        # output affine layer parameters
        self.params['W3']=weight_scale*np.random.randn(hidden_dim,num_classes)
        self.params['b3']=np.zeros(num_classes)

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
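
As a quick sanity check of these shapes, one can instantiate the model with CIFAR-10-sized inputs and print its parameter shapes. The sketch assumes the class lives in cs231n/classifiers/cnn.py as in the assignment scaffold; the hyperparameter values are just illustrative.

    import numpy as np
    from cs231n.classifiers.cnn import ThreeLayerConvNet   # assumed module path

    model = ThreeLayerConvNet(input_dim=(3, 32, 32), num_filters=32, filter_size=7,
                              hidden_dim=100, num_classes=10, weight_scale=1e-3)
    for name in sorted(model.params):
        print(name, model.params[name].shape)
    # Expected: W1 (32, 3, 7, 7), W2 (8192, 100), W3 (100, 10), b1 (32,), b2 (100,), b3 (10,)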

Loss and Gradient Computation

Here one can use the "sandwich" layers provided in cs231n/layer_utils.py, which bundle the forward and backward passes of a sequence of layers such as conv - relu - max_pool into single functions.
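
For reference, a "sandwich" layer is just a pair of thin wrappers that chain the individual forward/backward functions and bundle their caches into one tuple. The sketch below is a paraphrase of what conv_relu_pool_forward/conv_relu_pool_backward in cs231n/layer_utils.py roughly look like, not a verbatim copy; it assumes the fast conv/pool functions from cs231n/fast_layers.py and relu_forward/relu_backward from cs231n/layers.py.

    from cs231n.layers import relu_forward, relu_backward
    from cs231n.fast_layers import (conv_forward_fast, conv_backward_fast,
                                    max_pool_forward_fast, max_pool_backward_fast)

    def conv_relu_pool_forward(x, w, b, conv_param, pool_param):
        """Convenience layer: conv -> relu -> max pool."""
        a, conv_cache = conv_forward_fast(x, w, b, conv_param)
        s, relu_cache = relu_forward(a)
        out, pool_cache = max_pool_forward_fast(s, pool_param)
        return out, (conv_cache, relu_cache, pool_cache)

    def conv_relu_pool_backward(dout, cache):
        """Backward pass for the conv -> relu -> max pool convenience layer."""
        conv_cache, relu_cache, pool_cache = cache
        ds = max_pool_backward_fast(dout, pool_cache)
        da = relu_backward(ds, relu_cache)
        dx, dw, db = conv_backward_fast(da, conv_cache)
        return dx, dw, db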

    def loss(self, X, y=None):
        """
        Evaluate loss and gradient for the three-layer convolutional network.

        Input / output: Same API as TwoLayerNet in fc_net.py.
        """
        W1, b1 = self.params["W1"], self.params["b1"]
        W2, b2 = self.params["W2"], self.params["b2"]
        W3, b3 = self.params["W3"], self.params["b3"]

        # pass conv_param to the forward pass for the convolutional layer
        # Padding and stride chosen to preserve the input spatial size
        filter_size = W1.shape[2]
        conv_param = {"stride": 1, "pad": (filter_size - 1) // 2}

        # pass pool_param to the forward pass for the max-pooling layer
        pool_param = {"pool_height": 2, "pool_width": 2, "stride": 2}

        scores = None
        ############################################################################
        # TODO: Implement the forward pass for the three-layer convolutional net,  #
        # computing the class scores for X and storing them in the scores          #
        # variable.                                                                #
        #                                                                          #
        # Remember you can use the functions defined in cs231n/fast_layers.py and  #
        # cs231n/layer_utils.py in your implementation (already imported).         #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        out_conv_relu_pool, cache_conv_relu_pool=conv_relu_pool_forward(X,W1,b1,conv_param,pool_param)
        out_affine_relu, cache_affine_relu=affine_relu_forward(out_conv_relu_pool,W2,b2)
        scores, cache_output_affine=affine_forward(out_affine_relu,W3,b3)

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        if y is None:
            return scores

        loss, grads = 0, {}
        ############################################################################
        # TODO: Implement the backward pass for the three-layer convolutional net, #
        # storing the loss and gradients in the loss and grads variables. Compute  #
        # data loss using softmax, and make sure that grads[k] holds the gradients #
        # for self.params[k]. Don't forget to add L2 regularization!               #
        #                                                                          #
        # NOTE: To ensure that your implementation matches ours and you pass the   #
        # automated tests, make sure that your L2 regularization includes a factor #
        # of 0.5 to simplify the expression for the gradient.                      #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        loss, grad=softmax_loss(scores,y)
        loss+=0.5*self.reg*np.sum(W1*W1)
        loss+=0.5*self.reg*np.sum(W2*W2)
        loss+=0.5*self.reg*np.sum(W3*W3)

        # Backpropagate through the three layers; each weight gradient also gets its L2 term reg * W.
        grad, grads['W3'], grads['b3']=affine_backward(grad,cache_output_affine)
        grads['W3']+=self.reg*W3
        grad, grads['W2'], grads['b2']=affine_relu_backward(grad,cache_affine_relu)
        grads['W2']+=self.reg*W2
        grad, grads['W1'], grads['b1']=conv_relu_pool_backward(grad, cache_conv_relu_pool)
        grads['W1']+=self.reg*W1

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads
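
A useful sanity check before training: with small random weights and no regularization, the softmax loss on random data should be close to $\ln(10)\approx 2.3$ for 10 classes, and turning regularization on should increase it slightly. A minimal version of that check, assuming the class and module path from the assignment scaffold:

    import numpy as np
    from cs231n.classifiers.cnn import ThreeLayerConvNet   # assumed module path

    np.random.seed(0)
    model = ThreeLayerConvNet(input_dim=(3, 32, 32), num_classes=10,
                              weight_scale=1e-3, reg=0.0)
    X = np.random.randn(50, 3, 32, 32)
    y = np.random.randint(10, size=50)

    loss, grads = model.loss(X, y)
    print('Initial loss (no regularization):', loss)    # roughly log(10) ~= 2.3

    model.reg = 0.5
    loss, grads = model.loss(X, y)
    print('Initial loss (with regularization):', loss)  # slightly larger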

The assignment also includes two further parts, Spatial Batch Normalization and Spatial Group Normalization, which I will leave for a later post.