A Detailed Introduction to Grad-CAM with a PyTorch Implementation

Author: Vinícius Almeida 2023-04-20 15:13:11

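Grad-CAM (Gradient-weighted Class Activation Mapping) is a technique for visualizing which parts of an input image contribute most to a deep neural network's prediction. By localizing the relevant image regions, it makes the network's decision process more interpretable and easier to visualize.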

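The basic idea of Grad-CAM is to work with the last convolutional layer, whose feature maps combine high-level semantic information with spatial detail. We compute the gradients of the class score with respect to this layer's feature maps and apply global average pooling to them, which gives one weight per channel. These weights are then used to form a weighted combination of the feature maps, followed by a ReLU. The result is a class activation map (CAM) in which each pixel indicates how important that region is for the classification result.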
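For reference, here is the same computation in the notation of the original Grad-CAM paper (Selvaraju et al.). In this notation, $y^c$ is the score for class $c$, $A^k$ is the $k$-th feature map of the chosen layer, and $Z$ is the number of spatial positions $(i, j)$ in that map:

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\mathrm{Grad\text{-}CAM}}^c = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The $\alpha_k^c$ are the per-channel weights obtained by global average pooling of the gradients. The ReLU keeps only the regions whose features have a positive influence on the class score.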

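Compared with the original CAM method, Grad-CAM works with a much wider range of CNN architectures. It does not require modifying the network structure or relying on a specific layer arrangement, whereas CAM needs a global average pooling layer feeding directly into the classifier. Grad-CAM can also be used to visualize features and to analyze specific layers or units of a network.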

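In PyTorch, we can use hooks, registering a forward hook and a backward hook on the network. The forward hook records the target layer's output feature maps, and the backward hook records the gradients with respect to that output. In this article, we'll walk through in detail how to implement Grad-CAM in PyTorch.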

Loading and inspecting the pretrained model

To demonstrate Grad-CAM, I'll use a chest X-ray dataset from Kaggle and a pretrained classifier I built, which classifies an X-ray as showing pneumonia or not.

model_path = "your/model/path/"
 
 # instantiate your model
 model = XRayClassifier()
 
 # load your model. Here we're loading on CPU since we're not going to do
 # large amounts of inference
 model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
 
 # put it in evaluation mode for inference
 model.eval()
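The loading step assumes that model_path holds a state_dict (the parameters and buffers), not a pickled model object. Below is a minimal round trip, sketched under two assumptions: a small hypothetical stand-in model replaces XRayClassifier, and an in-memory buffer replaces the file at model_path.

```python
import io

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Hypothetical stand-in architecture; in the article this would be XRayClassifier()
    return nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4), nn.ReLU(), nn.Dropout(0.2))


trained = make_model()

# Save only the state_dict, which is what load_state_dict() expects to receive
buffer = io.BytesIO()  # in-memory stand-in for the file at model_path
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Re-create the architecture first, then copy the saved tensors into it on the CPU
restored = make_model()
restored.load_state_dict(torch.load(buffer, map_location=torch.device("cpu")))
restored.eval()  # Dropout becomes a no-op, BatchNorm uses its running statistics
```

Re-creating the architecture before loading is what makes this pattern robust: only tensors are stored, so the class definition must be available when loading.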

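First, let's look at the model's architecture. As mentioned above, we need to identify the last convolutional layer, and specifically its activation (the output of its activation function). This layer holds the most complex features the model has learned while still preserving spatial information, so it is the most useful one for understanding the model's behavior. Here is the code for our demo model: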

import torch
import torch.nn as nn
import torch.nn.functional as F

# hyperparameters
nc = 3 # number of channels
nf = 64 # number of features to begin with
dropout = 0.2
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# setup a resnet block and its forward function
class ResNetBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResNetBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

# setup the final model structure
class XRayClassifier(nn.Module):
    def __init__(self, nc=nc, nf=nf, dropout=dropout):
        super(XRayClassifier, self).__init__()

        self.resnet_blocks = nn.Sequential(
            ResNetBlock(nc,   nf,    stride=2), # (B, C, H, W) -> (B, NF, H/2, W/2), i.e., (64,64,128,128)
            ResNetBlock(nf,   nf*2,  stride=2), # (64,128,64,64)
            ResNetBlock(nf*2, nf*4,  stride=2), # (64,256,32,32)
            ResNetBlock(nf*4, nf*8,  stride=2), # (64,512,16,16)
            ResNetBlock(nf*8, nf*16, stride=2), # (64,1024,8,8)
        )

        self.classifier = nn.Sequential(
            nn.Conv2d(nf*16, 1, 8, 1, 0, bias=False),
            nn.Dropout(p=dropout),
            nn.Sigmoid(),
        )

    def forward(self, input):
        # move the input to the device that holds the model's weights (the CPU when
        # loaded as above); using the global `device` would fail on a CUDA machine
        output = self.resnet_blocks(input.to(next(self.parameters()).device))
        output = self.classifier(output)
        return output

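The model takes 3-channel 256x256 images, so it expects input of shape [batch size, 3, 256, 256]. Each ResNet block ends with a ReLU activation, and the last block outputs 1024 feature maps of size 8x8. For our purpose, we therefore select the last ResNet block. Printing the model gives: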

XRayClassifier(
  (resnet_blocks): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResNetBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): ResNetBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): ResNetBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (4): ResNetBlock(
      (conv1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (classifier): Sequential(
    (0): Conv2d(1024, 1, kernel_size=(8, 8), stride=(1, 1), bias=False)
    (1): Dropout(p=0.2, inplace=False)
    (2): Sigmoid()
  )
)
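The spatial sizes in the comments of XRayClassifier, including the 8x8 output of the last block, follow from the standard Conv2d output-size formula. For dilation 1, that formula is floor((H + 2·padding - kernel) / stride) + 1. A quick check in plain Python, using the kernel, stride and padding values from the model definition above:

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a Conv2d along one dimension (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 256  # input height/width expected by XRayClassifier
sizes = []
for _ in range(5):  # five ResNetBlocks
    size = conv_out(size, kernel=3, stride=2, padding=1)  # conv1 halves the size
    size = conv_out(size, kernel=3, stride=1, padding=1)  # conv2 keeps it unchanged
    sizes.append(size)

print(sizes)  # [128, 64, 32, 16, 8]

# The classifier's Conv2d(1024, 1, kernel_size=8) then collapses 8x8 to 1x1
print(conv_out(sizes[-1], kernel=8, stride=1, padding=0))  # 1
```

The 1x1 shortcut convolutions (kernel 1, stride 2, padding 0) produce the same sizes, which is why the residual addition in each block lines up.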

In PyTorch, we can easily select it through the model's attributes:

model.resnet_blocks[-1]
#ResNetBlock(
# (conv1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
# (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
# (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# (shortcut): Sequential(
#   (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
#   (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# )
#)
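When the target layer is nested more deeply, or its name comes from a config file, it can be handier to look it up by its dotted name. Here is a sketch using nn.Module.get_submodule (available in PyTorch 1.9 and later). It runs on a small hypothetical stand-in model with the same attribute layout as XRayClassifier:

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with a resnet_blocks Sequential, like XRayClassifier."""

    def __init__(self):
        super().__init__()
        self.resnet_blocks = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU()),
        )
        self.classifier = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        return self.classifier(self.resnet_blocks(x))


toy = TinyNet()

by_index = toy.resnet_blocks[-1]               # attribute access + indexing, as in the article
by_name = toy.get_submodule("resnet_blocks.1")  # the same module, by its dotted name

# named_modules() lists every candidate layer with the name get_submodule expects
for name, module in toy.named_modules():
    print(name or "<root>", type(module).__name__)
```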

PyTorch hooks

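PyTorch provides a number of hook functions that can process the information flowing through a model during the forward or backward pass. We can use them to inspect intermediate gradient values or to modify the output of a specific layer.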

Here we will focus on two methods:

register_full_backward_hook(hook, prepend=False)

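This method registers a backward hook on the module. The hook is called every time the gradients with respect to the module are computed, for example when backward() is called. It receives the module itself, the gradients with respect to the module's inputs (grad_input), and the gradients with respect to its outputs (grad_output). Both gradient arguments are tuples: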

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

It returns a torch.utils.hooks.RemovableHandle, which can be used to remove the hook. We'll come back to this later.

register_forward_hook(hook, *, prepend=False, with_kwargs=False)

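This is very similar to the previous method, except that the hook runs during the forward pass, right after the module's forward() has computed its output. Its arguments are slightly different: args holds the module's positional inputs, and output gives you access to the layer's output: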

hook(module, args, output) -> None or modified output

It also returns a torch.utils.hooks.RemovableHandle.
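Here is a minimal, self-contained check of both hooks on a toy network. The network and all names below are made up for illustration. It confirms that the forward hook sees the layer's output and that the backward hook's grad_output[0] is the gradient of the model output with respect to that same tensor:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Conv2d(1, 2, 3), nn.ReLU(), nn.Flatten(), nn.Linear(2 * 4 * 4, 1))
target = net[1]  # the ReLU whose output we want to inspect

captured = {}


def fwd_hook(module, args, output):
    # args: tuple of positional inputs; output: what the module returned
    captured["activation"] = output.detach()


def bwd_hook(module, grad_input, grad_output):
    # grad_output: tuple with one gradient per module output
    captured["gradient"] = grad_output[0].detach()


h_fwd = target.register_forward_hook(fwd_hook)
h_bwd = target.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 1, 6, 6)
out = net(x)     # the forward hook fires here
out.backward()   # the backward hook fires here (out has one element)

h_fwd.remove()
h_bwd.remove()

print(captured["activation"].shape)  # torch.Size([1, 2, 4, 4])
print(captured["gradient"].shape)    # torch.Size([1, 2, 4, 4])
```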

Adding hooks to the model

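To compute Grad-CAM, we need to define a backward hook and a forward hook. We need two things from the last convolutional block: the gradients of the model's output with respect to that block's output, and the block's activations, i.e. the output of its activation function. The hooks extract these values for us during inference and backpropagation.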

# defines two global scope variables to store our gradients and activations
gradients = None
activations = None

def backward_hook(module, grad_input, grad_output):
    global gradients # refers to the variable in the global scope
    print('Backward hook running...')
    gradients = grad_output
    # In this case, we expect it to be torch.Size([batch size, 1024, 8, 8])
    print(f'Gradients size: {gradients[0].size()}')
    # We need the 0 index because the tensor containing the gradients comes
    # inside a one element tuple.

def forward_hook(module, args, output):
    global activations # refers to the variable in the global scope
    print('Forward hook running...')
    activations = output
    # In this case, we expect it to be torch.Size([batch size, 1024, 8, 8])
    print(f'Activations size: {activations.size()}')

With the hook functions and the variables for storing the activations and gradients defined, we can register the hooks on the layer of interest:

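# Each register_* call returns a RemovableHandle. Note that these assignments rebind
# the names backward_hook/forward_hook from the hook functions to their handles,
# which are used later to remove the hooks. (prepend requires PyTorch 2.0 or later.)
backward_hook = model.resnet_blocks[-1].register_full_backward_hook(backward_hook, prepend=False)
forward_hook = model.resnet_blocks[-1].register_forward_hook(forward_hook, prepend=False)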
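Because hooks stay attached until they are removed, one option is to tie their lifetime to a with block. The capture helper below is a hypothetical sketch, not part of PyTorch. It stores the captured tensors in a dict instead of globals and always removes the hooks, even if the forward or backward pass raises. The toy model is also only for illustration:

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def capture(layer: nn.Module):
    """Hypothetical helper: record a layer's output and its gradient, then unhook."""
    store = {}

    def fwd(module, args, output):
        store["activations"] = output.detach()

    def bwd(module, grad_input, grad_output):
        store["gradients"] = grad_output[0].detach()

    handles = [layer.register_forward_hook(fwd), layer.register_full_backward_hook(bwd)]
    try:
        yield store
    finally:
        for handle in handles:  # runs even when the with-body raises
            handle.remove()


toy = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 1),
)
with capture(toy[1]) as store:
    toy(torch.randn(1, 3, 8, 8)).backward()

print(store["gradients"].shape)  # torch.Size([1, 4, 8, 8])
```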

Retrieving the gradients and activations

Now that the hooks are set up on the model, let's load an image and compute its Grad-CAM.

from PIL import Image

img_path = "/your/image/path/"
image = Image.open(img_path).convert('RGB')

Before running inference, we also need to preprocess it:

from torchvision import transforms

image_size = 256
transform = transforms.Compose([
    transforms.Resize(image_size, antialias=True),  # shorter side -> 256
    transforms.CenterCrop(image_size),              # 256x256 crop
    transforms.ToTensor(),                          # HWC [0, 255] -> CHW [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1]
])

img_tensor = transform(image) # stores the tensor that represents the image

Now we can run the forward pass and backpropagate from the model's output:

# unsqueeze(0) adds the batch dimension: [3, 256, 256] -> [1, 3, 256, 256]
model(img_tensor.unsqueeze(0)).backward()

The hook functions print:

Forward hook running...
Activations size: torch.Size([1, 1024, 8, 8])
Backward hook running...
Gradients size: torch.Size([1, 1024, 8, 8])
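Calling .backward() without arguments works here because the model returns a single sigmoid output, i.e. a one-element tensor. A multi-class classifier returns one score per class, so the usual approach is to backpropagate the score of one chosen class; the original paper uses the score before the softmax. Here is a sketch with a hypothetical 3-class model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical 3-class model returning raw scores (logits) of shape [batch, 3]
multi = nn.Sequential(
    nn.Conv2d(3, 4, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 3),
)

x = torch.randn(1, 3, 16, 16)
scores = multi(x)  # shape [1, 3]

# scores.backward() would raise: implicit gradients need a single-element output.
class_idx = scores.argmax(dim=1).item()  # explain the predicted class (any class works)
scores[0, class_idx].backward()
```

With hooks registered on a convolutional layer, the captured gradients would then describe the chosen class only.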

With the gradients and activations captured, we can generate the heatmap.

Computing Grad-CAM

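To compute Grad-CAM, we use the formula from the original paper with a couple of small modifications: the gradients are also averaged over the batch dimension, and the weighted channels are averaged rather than summed. First, pool the gradients into one weight per channel: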

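# average over the batch (dim 0) and spatial (dims 2, 3) dimensions:
# one weight per channel, shape [1024] (assumes a batch of one image)
pooled_gradients = torch.mean(gradients[0], dim=[0, 2, 3])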

import torch.nn.functional as F
import matplotlib.pyplot as plt

# weight the channels by corresponding gradients
for i in range(activations.size()[1]):
    activations[:, i, :, :] *= pooled_gradients[i]

# average the weighted channels (the paper sums them; the constant
# factor disappears after normalization)
heatmap = torch.mean(activations, dim=1).squeeze()

# relu on top of the heatmap: keep only positive contributions
heatmap = F.relu(heatmap)

# normalize the heatmap to [0, 1]
heatmap /= torch.max(heatmap)

# draw the heatmap
plt.matshow(heatmap.detach())

Plotted with matshow, the result is an 8x8 heatmap.

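The activations consist of 1024 feature maps, each with an 8x8 spatial resolution, capturing different aspects of the input image. The gradients obtained through the hook indicate how much each feature map influences the final prediction, and averaging them gives one importance weight per channel. Multiplying each feature map by its weight and averaging the weighted maps across channels yields a single heatmap. The ReLU then keeps only the regions that increase the model's output score, so the heatmap shows the areas of the image that matter most for the prediction. This is Grad-CAM: a visual explanation of the model's decision process that can help us interpret and debug its behavior.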
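The per-channel loop can also be written with broadcasting. The sketch below uses random tensors shaped like the article's activations and gradients (not real data) to check that both versions give the same normalized map. It also guards the normalization against an all-zero map, a case in which the division by torch.max(heatmap) in the code above would divide by zero:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Random stand-ins with the shapes printed by the hooks: [1, 1024, 8, 8]
acts = torch.rand(1, 1024, 8, 8)
grads = torch.randn(1, 1024, 8, 8)

# Loop version, as in the article (on a copy, because it works in place)
weighted = acts.clone()
pooled = grads.mean(dim=[0, 2, 3])
for i in range(weighted.size(1)):
    weighted[:, i, :, :] *= pooled[i]
loop_map = F.relu(weighted.mean(dim=1).squeeze())

# Vectorized version: [1, 1024, 1, 1] weights broadcast over [1, 1024, 8, 8]
weights = grads.mean(dim=(2, 3), keepdim=True)
vec_map = F.relu((weights * acts).mean(dim=1)).squeeze(0)  # [8, 8]

# Normalize to [0, 1]; clamp_min avoids division by zero for an all-zero map
loop_map = loop_map / loop_map.max().clamp_min(1e-8)
vec_map = vec_map / vec_map.max().clamp_min(1e-8)
```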

But what does this map actually represent? Overlaying it on the original image makes it much easier to interpret.

Combining the original image with the heatmap

The following code overlays the heatmap we generated on the original image:

from torchvision.transforms.functional import to_pil_image
from matplotlib import colormaps
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Create a figure and plot the first image
fig, ax = plt.subplots()
ax.axis('off') # removes the axis markers

# First plot the original image. img_tensor was normalized to [-1, 1],
# so map it back to [0, 1] before converting it to a PIL image
ax.imshow(to_pil_image(img_tensor * 0.5 + 0.5, mode='RGB'))

# Resize the heatmap to the same size as the input image, using bicubic
# resampling to increase its resolution.
# We need heatmap.detach() because a tensor that requires gradients
# can't be converted to a numpy array
overlay = to_pil_image(heatmap.detach(), mode='F').resize((256, 256), resample=Image.BICUBIC)

# Apply any colormap you want; squaring the values de-emphasizes weak activations
cmap = colormaps['jet']
overlay = (255 * cmap(np.asarray(overlay) ** 2)[:, :, :3]).astype(np.uint8)

# Plot the heatmap on the same axes (both images are 256x256, so they align),
# but with alpha < 1 (this defines the transparency of the heatmap)
ax.imshow(overlay, alpha=0.4, interpolation='nearest')

# Show the plot
plt.show()

This is much easier to interpret. Since this is a normal X-ray, there is nothing in particular to point out.

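Now consider another example, an X-ray labeled as pneumonia. Here Grad-CAM highlights the areas of the chest X-ray that a doctor would need to examine to determine whether pneumonia is present, with the red regions located around the lungs. This suggests that our model has indeed learned something meaningful.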
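As an alternative to resizing with PIL, the map can be upsampled with torch.nn.functional.interpolate and alpha-blended into a single RGB array, which is convenient for saving the result with any image library. The sketch below uses random stand-in tensors instead of the real heatmap and image. It swaps bicubic for bilinear interpolation, which cannot overshoot the [0, 1] range, and assumes matplotlib 3.5+ for matplotlib.colormaps:

```python
import numpy as np
import torch
import torch.nn.functional as F
from matplotlib import colormaps

torch.manual_seed(0)
cam_small = torch.rand(8, 8)      # stand-in for the normalized 8x8 Grad-CAM map
img = torch.rand(3, 256, 256)     # stand-in for the unnormalized image, values in [0, 1]

# interpolate expects [N, C, H, W], so add and then drop two singleton dimensions
up = F.interpolate(cam_small[None, None], size=(256, 256), mode="bilinear", align_corners=False)[0, 0]
up = up.clamp(0, 1).numpy()

heat_rgb = colormaps["jet"](up)[..., :3]  # [256, 256, 3], floats in [0, 1]
img_rgb = img.permute(1, 2, 0).numpy()    # CHW -> HWC

alpha = 0.4  # same transparency as in the overlay above
blended = (1 - alpha) * img_rgb + alpha * heat_rgb
blended_u8 = (255 * blended).round().astype(np.uint8)
```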

Removing the hooks

To remove the hooks from the model, just call the remove() method on the handles returned when they were registered:

backward_hook.remove()
forward_hook.remove()

Summary

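This article should help clarify how Grad-CAM works and how to implement it in PyTorch. Because PyTorch's hooks can be attached to any module, the same code can be adapted to other models by registering the hooks on the appropriate layer, typically the last convolutional block.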

Editor: Hua Xuan | Source: DeepHub IMBA
