Implementing Non-Maximum Suppression (NMS) in PyTorch

Posted by: 數據派THU | Date: 2022-11-20 | Source: 工程師

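NMS stands for non-maximum suppression. As the name suggests, it suppresses elements that are not maxima and searches for local maxima. The object detection algorithms common in recent years (including R-CNN, SPPNet, Fast R-CNN, and Faster R-CNN) all end up finding many rectangular boxes in an image that may contain an object, and then assign class probabilities to each box.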

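If you work in computer vision (especially object detection), you have certainly heard of non-maximum suppression (NMS). There are many good articles online that give a proper overview. In short, non-maximum suppression reduces the number of output bounding boxes using a heuristic, such as intersection over union (IoU).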

The PyTorch documentation says: NMS iteratively removes lower-scoring boxes which have an IoU greater than iou_threshold with another (higher-scoring) box.
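Since everything that follows depends on IoU, here is a minimal plain-Python sketch of how it can be computed for two boxes in (x1, y1, x2, y2) format. The helper name `iou` and the sample boxes are made up for illustration; this is not the torchvision implementation.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


a = (0.0, 0.0, 2.0, 2.0)
b = (1.0, 1.0, 3.0, 3.0)
# Intersection is 1 x 1 = 1, union is 4 + 4 - 1 = 7
print(iou(a, b))
```

Identical boxes give an IoU of 1 and disjoint boxes give 0, which is why a threshold between the two can separate "same object" from "different object".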

To see how it works, let's load an image and create some bounding boxes.

    from PIL import Image
    import torch
    import matplotlib.pyplot as plt
    import numpy as np

    # credit https://i0.wp.com/craffic.co.in/wp-content/uploads/2021/02/ai-remastered-rick-astley-never-gonna-give-you-up.jpg?w=1600&ssl=1
    img = Image.open("./samples/never-gonna-give-you-up.webp")
    img

We create two boxes by hand, one for the face and one for the microphone.

    original_bboxes = torch.tensor([
        # head
        [565, 73, 862, 373],
        # mic
        [807, 309, 865, 434]
    ]).float()

    w, h = img.size
    # we need them in range [0, 1]: x coordinates divided by width, y by height
    original_bboxes[..., 0] /= w
    original_bboxes[..., 1] /= h
    original_bboxes[..., 2] /= w
    original_bboxes[..., 3] /= h

These bboxes are all in the range [0, 1]. This is not required, but it is very useful when there are multiple classes (we will see why later).

    from torchvision.utils import draw_bounding_boxes
    from torchvision.transforms.functional import to_tensor

    def plot_bboxes(img: Image.Image, bboxes: torch.Tensor, *args, **kwargs):
        w, h = img.size
        # from [0, 1] to image size
        bboxes = bboxes.clone()
        bboxes[..., 0] *= w
        bboxes[..., 1] *= h
        bboxes[..., 2] *= w
        bboxes[..., 3] *= h
        plt.figure()
        img_with_bboxes = draw_bounding_boxes((to_tensor(img) * 255).to(torch.uint8), bboxes, *args, width=4, **kwargs)
        return plt.imshow(img_with_bboxes.permute(1, 2, 0).numpy())

    plot_bboxes(img, original_bboxes, labels=["head", "mic"])

For illustration, let's add some overlapping boxes.

    scaling = torch.tensor([1, .96, .97, 1.02])
    shifting = torch.tensor([0, 0.001, 0.002, -0.002])

    # broadcasting magic: (2, 1, 4) * (4, 1) -> (2, 4, 4), then flattened to (8, 4)
    bboxes = (original_bboxes[:, None, :] * scaling[..., None] + shifting[..., None]).view(-1, 4)

    plot_bboxes(img, bboxes, colors=[*["yellow"] * 4, *["blue"] * 4], labels=[*["head"] * 4, *["mic"] * 4])

As you can see, there are now 8 bboxes (4 per object). We also need to define a score for each of them, which is normally output by the model.
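To make the broadcasting step easier to follow, the same "one variant per (scale, shift) pair" idea can be sketched with plain lists. The two boxes below are hypothetical stand-ins for the normalized head and mic boxes, not the values computed from the image.

```python
# Hypothetical normalized boxes in (x1, y1, x2, y2) format
boxes = [
    [0.35, 0.08, 0.54, 0.41],  # "head"
    [0.50, 0.34, 0.54, 0.48],  # "mic"
]
scaling = [1, 0.96, 0.97, 1.02]
shifting = [0, 0.001, 0.002, -0.002]

# For every box, emit one variant per (scale, shift) pair:
# each coordinate c becomes c * scale + shift
variants = [
    [c * s + t for c in box]
    for box in boxes
    for s, t in zip(scaling, shifting)
]

print(len(variants))             # 2 boxes * 4 variants each
print(variants[0] == boxes[0])   # the first pair is scale 1, shift 0
```

The variants come out grouped per box (four for the head, then four for the mic), which is the ordering the score and label vectors rely on.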

    scores = torch.tensor([
        0.98, 0.85, 0.5, 0.2,  # for head
        1, 0.92, 0.3, 0.1      # for mic
    ])

Next come the class labels: 0 stands for the face (head) and 1 for the microphone.

    labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])

Finally, let's shuffle the data:

    perm = torch.randperm(scores.shape[0])
    bboxes = bboxes[perm]
    scores = scores[perm]
    labels = labels[perm]

Let's look at the result:

    plot_bboxes(img, bboxes,
                colors=["yellow" if el.item() == 0 else "blue" for el in labels],
                labels=["head" if el.item() == 0 else "mic" for el in labels])

Good, we have now simulated the output of a model. On to the main topic.

NMS works by iteratively removing lower-scoring bounding boxes that overlap a higher-scoring one. The steps are as follows.

    bboxes are sorted by score in decreasing order
    init a vector keep with ones
    for i in range(len(bboxes)):
        # was suppressed
        if keep[i] == 0:
            continue
        # compare with all the lower-scoring boxes that follow
        for j in range(i + 1, len(bboxes)):
            if keep[j]:
                if iou(bboxes[i], bboxes[j]) > iou_threshold:
                    keep[j] = 0
    return keep
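The steps above can be turned into a small runnable sketch in plain Python. The names `iou` and `nms_reference` and the three sample boxes are illustrative assumptions; the function returns indices into the original (unsorted) list, highest score first.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_reference(bboxes, scores, iou_threshold):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    # Positions sorted by decreasing score
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    keep = [True] * len(order)
    for i in range(len(order)):
        if not keep[i]:
            continue  # already suppressed by a higher-scoring box
        for j in range(i + 1, len(order)):
            if keep[j] and iou(bboxes[order[i]], bboxes[order[j]]) > iou_threshold:
                keep[j] = False
    return [idx for idx, kept in zip(order, keep) if kept]


bboxes = [
    (0.0, 0.0, 10.0, 10.0),    # index 0
    (1.0, 1.0, 11.0, 11.0),    # index 1, IoU with index 0 is 81 / 119
    (20.0, 20.0, 30.0, 30.0),  # index 2, far from the others
]
scores = [0.6, 0.9, 0.8]
print(nms_reference(bboxes, scores, 0.45))  # [1, 2]
```

Box 0 is dropped because it overlaps the higher-scoring box 1 with IoU of about 0.68; raising the threshold above that value keeps all three.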

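Our PyTorch implementation takes three parameters (this is actually copied and pasted from the PyTorch documentation):

bboxes (Tensor[N, 4]) – the boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
scores (Tensor[N]) – the score of each box
iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold
The return value is the indices of the boxes that were not suppressed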

    from torchvision.ops.boxes import box_iou

    def nms(bboxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float) -> torch.Tensor:
        # positions sorted by score, highest first
        order = torch.argsort(-scores)
        indices = torch.arange(bboxes.shape[0])
        keep = torch.ones_like(indices, dtype=torch.bool)
        for i in indices:
            if keep[i]:
                bbox = bboxes[order[i]]
                # IoU with every later box; already suppressed boxes are zeroed out
                iou = box_iou(bbox[None, ...], bboxes[order[i + 1:]] * keep[i + 1:][..., None])
                # iou has shape (1, M): take row 0 so nonzero yields 1-D positions
                overlapped = torch.nonzero(iou[0] > iou_threshold).flatten()
                keep[overlapped + i + 1] = False
        return order[keep]

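Let's go through the function step by step:

    order = torch.argsort(-scores)

Get the indices that sort the boxes by score in decreasing order (negating the scores turns the ascending sort into a descending one).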
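As a small illustration of what this argsort step produces, here is the same idea in plain Python on four made-up scores. Sorting positions by the negated score mirrors `torch.argsort(-scores)`.

```python
scores = [0.5, 0.98, 0.2, 0.85]

# Positions of the scores, ordered from highest to lowest score
order = sorted(range(len(scores)), key=lambda k: -scores[k])

print(order)                       # [1, 3, 0, 2]
print([scores[k] for k in order])  # [0.98, 0.85, 0.5, 0.2]
```

Note that `order` holds positions in the original list, not the scores themselves, which is what later allows the result to be mapped back to the unsorted boxes.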

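    indices = torch.arange(bboxes.shape[0])

Create the indices used to iterate over the bboxes.

    keep = torch.ones_like(indices, dtype=torch.bool)

keep is the vector that records whether a bbox should be kept: if keep[i] == 1, then bboxes[order[i]] has not been suppressed.

    for i in indices:
        ...

The for loop iterates over all the boxes; a box is only processed if it has not been suppressed, that is, if keep[i] == 1.

    bbox = bboxes[order[i]]

Fetch the bbox through its position in the sorted order.

    iou = box_iou(bbox[None, ...], bboxes[order[i + 1:]] * keep[i + 1:][..., None])

Compute the IoU between the current bbox and all the remaining candidate bboxes. The multiplication sets every suppressed box to zero (because keep is 0 for it), so its IoU with the current box is 0.

    bboxes[order[i + 1:]]

Compare only with the boxes that come later in the sorted order. We need to skip the current box, which is why the slice starts at i + 1.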

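    overlapped = torch.nonzero(iou[0] > iou_threshold).flatten()
    keep[overlapped + i + 1] = False

Find the positions whose IoU is greater than iou_threshold and mark them as suppressed. box_iou returns a matrix of shape (1, M), so we take row 0 first; calling torch.nonzero on the 2-D matrix would return (row, column) pairs, and the row index 0 would wrongly suppress position i + 1 as well.

We sliced the bboxes earlier with bboxes[order[i + 1:]], so we need to add the offset of the slice to these positions, which is the reason for the + i + 1.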
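The index offset is easy to get wrong, so here is a tiny plain-Python sketch with made-up IoU values. It assumes we are at sorted position `i = 2` and that the four IoUs belong to sorted positions 3 to 6.

```python
i = 2
iou_threshold = 0.45
# Hypothetical IoUs of the current box with sorted positions 3, 4, 5, 6
ious_after_i = [0.1, 0.7, 0.0, 0.9]
keep = [True] * 7

# Positions relative to the slice that starts at i + 1
local = [j for j, value in enumerate(ious_after_i) if value > iou_threshold]
# Shift them back to positions in the full sorted order
shifted = [j + i + 1 for j in local]
for p in shifted:
    keep[p] = False

print(local)    # [1, 3]
print(shifted)  # [4, 6]
```

Without the shift, positions 1 and 3 would be suppressed instead, which are boxes that were already processed.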

Finally we return order[keep], which maps the kept positions back to the original (unsorted) box indices. That completes this simple function.
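The final mapping can be checked by hand on a toy case. The values below are invented for illustration: three boxes, where the second one in sorted order was suppressed.

```python
scores = [0.2, 0.98, 0.85]   # original, unsorted order
order = [1, 2, 0]            # positions sorted by decreasing score
keep = [True, False, True]   # keep flags, aligned with `order`

# Equivalent of order[keep]: select the original indices that survived
kept_original = [idx for idx, kept in zip(order, keep) if kept]

print(kept_original)                       # [1, 0]
print([scores[k] for k in kept_original])  # [0.98, 0.2]
```

The returned indices are in decreasing score order, so they can be used directly to index the original boxes, scores and labels.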

Let's look at the result.

    nms_indices = nms(bboxes, scores, .45)
    plot_bboxes(img,
                bboxes[nms_indices],
                colors=["yellow" if el.item() == 0 else "blue" for el in labels[nms_indices]],
                labels=["head" if el.item() == 0 else "mic" for el in labels[nms_indices]])

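Because there are multiple classes, NMS should only compare IoU within the same class. Remember that we kept the boxes in the range [0, 1]? We can add the label to the coordinates, which shifts the boxes of different classes apart so that they can never overlap.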
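A minimal plain-Python sketch of why the label offset works, assuming coordinates in [0, 1] and integer class labels. The boxes and the helper names `iou` and `offset` are made up for illustration.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def offset(box, label):
    """Shift every coordinate by the class label."""
    return tuple(c + label for c in box)


head = (0.30, 0.10, 0.60, 0.50)  # class 0
mic = (0.50, 0.40, 0.62, 0.60)   # class 1, overlaps the head box

print(iou(head, mic) > 0)                    # True: they overlap in the image
print(iou(offset(head, 0), offset(mic, 1)))  # 0.0: different classes no longer overlap
# Two boxes of the same class are shifted together, so their IoU is unchanged
print(abs(iou(offset(head, 1), offset(mic, 1)) - iou(head, mic)) < 1e-9)
```

The trick relies on every coordinate being at most 1, so that a shift of 1 per class is enough to separate the classes. torchvision also offers `torchvision.ops.batched_nms`, which takes a per-box class index and performs NMS per class.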

    nms_indices = nms(bboxes + labels[..., None], scores, .45)
    plot_bboxes(img,
                bboxes[nms_indices],
                colors=["yellow" if el.item() == 0 else "blue" for el in labels[nms_indices]],
                labels=["head" if el.item() == 0 else "mic" for el in labels[nms_indices]])

If we change the threshold to 0.1, we get the figure below.

Let's compare with the official PyTorch implementation:

    from torchvision.ops.boxes import nms as torch_nms

    nms_indices = torch_nms(bboxes + labels[..., None], scores, .45)
    plot_bboxes(img,
                bboxes[nms_indices],
                colors=["yellow" if el.item() == 0 else "blue" for el in labels[nms_indices]],
                labels=["head" if el.item() == 0 else "mic" for el in labels[nms_indices]])

The result is the same. Now let's look at the timing:

    %%timeit
    nms(bboxes + labels[..., None], scores, .45)
    # 534 μs ± 22.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    %%timeit
    torch_nms(bboxes + labels[..., None], scores, .45)
    # 54.4 μs ± 3.29 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

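Our implementation is about 10 times slower, ha! That is to be expected, because we do not use a custom C++ kernel. It does not mean our implementation is useless, though: by writing the code by hand we have understood exactly how NMS works, which is the real point of this article. In short, we have seen how to implement non-maximum suppression in PyTorch, which is very helpful for understanding object detection.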
