ByteTrack: Multi-Object Tracking by Associating Every Detection Box

This article describes ByteTrack, a multi-object tracking method published in 2021.

The paper, “ByteTrack: Multi-Object Tracking by Associating Every Detection Box”, is available on arXiv, and the official implementation, ByteTrack, is available on GitHub.

Introduction

ByteTrack improves tracking performance by devising a new data association algorithm. As the paper's title, “Associating Every Detection Box”, suggests, the idea is to use all detection boxes for data association. In the implementation, however, only detection boxes with a score higher than 0.1 are used, so “associating almost every detection box” would describe the implementation more accurately.
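ByteTrack's association, detailed in the next section, groups detections by score before matching them. The minimal Python sketch below shows that grouping, including the 0.1 cutoff that drops the lowest-scoring boxes entirely. It is an illustration, not the ByteTrack code: the (x1, y1, x2, y2, score) tuple layout and the name split_by_score are assumptions, high_thresh=0.6 is the threshold used in the paper's experiments, and low_thresh=0.1 mirrors the implementation's lower bound.

```python
from typing import List, Tuple

# Hypothetical detection layout for this sketch: (x1, y1, x2, y2, score).
Detection = Tuple[float, float, float, float, float]


def split_by_score(
    dets: List[Detection], high_thresh: float = 0.6, low_thresh: float = 0.1
) -> Tuple[List[Detection], List[Detection], List[Detection]]:
    """Split detections into (high score, low score, discarded) groups.

    Boundary handling (> vs >=) is a choice made for this sketch.
    """
    high = [d for d in dets if d[4] > high_thresh]
    low = [d for d in dets if low_thresh < d[4] <= high_thresh]
    discarded = [d for d in dets if d[4] <= low_thresh]
    return high, low, discarded


dets = [
    (0, 0, 10, 20, 0.9),
    (12, 0, 22, 20, 0.4),
    (40, 0, 50, 20, 0.1),
    (60, 0, 70, 20, 0.05),
]
high, low, discarded = split_by_score(dets)
print(len(high), len(low), len(discarded))  # 1 1 2
```

Because the cutoff is "higher than 0.1", a box scoring exactly 0.1 falls into the discarded group here.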

Next, we use the figure below to describe the problem that ByteTrack tries to solve. Figure (a) shows the detection boxes and their scores in frames t₁, t₂, and t₃. Figure (b) shows an example of data association that uses only high score detection boxes, i.e., those with scores higher than 0.5. At frame t₁, three tracklets are generated, but at frames t₂ and t₃, occlusion lowers the detection score of the red tracklet's object below 0.5, so the red tracklet disappears. If we instead use every detection box regardless of its score, more false positives are introduced and performance suffers. This trade-off between losing true objects and adding false positives is the detection dilemma.
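The dilemma can be made concrete with a toy frame. The detections and scores below are invented for illustration (they are not the values in the figure), and count_outcomes is a hypothetical helper. With a single threshold, 0.5 misses the occluded object, while 0.1 lets a background false positive through.

```python
# Hypothetical detections in one frame: (description, score, is_real_object).
# The scores are invented for illustration, not taken from the paper's figure.
FRAME = [
    ("pedestrian, fully visible", 0.9, True),
    ("pedestrian, partly occluded", 0.4, True),
    ("background clutter", 0.2, False),
]


def count_outcomes(dets, thresh):
    """Return (missed real objects, false positives) when keeping scores > thresh."""
    kept = [d for d in dets if d[1] > thresh]
    missed = sum(1 for d in dets if d[2] and d not in kept)
    false_pos = sum(1 for d in kept if not d[2])
    return missed, false_pos


for thresh in (0.5, 0.1):
    missed, fp = count_outcomes(FRAME, thresh)
    print(f"threshold {thresh}: missed={missed}, false positives={fp}")
# threshold 0.5: missed=1, false positives=0
# threshold 0.1: missed=0, false positives=1
```

No single threshold gets both numbers to zero, which is the gap ByteTrack's two-stage association targets.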

Source: ByteTrack

Data Association of ByteTrack

Here we take a closer look at ByteTrack’s data association algorithm. ByteTrack performs data association in two stages, which the paper calls the first association and the second association.

The first association matches high score detections to tracklets. Here, a high score detection is a detection whose score is higher than a threshold (0.6 in the paper’s experiments). In the first association, the Kalman filter updates the state vector of each matched tracklet, as shown in Figure (b). Unmatched high score detections and unmatched tracklets are moved to the remaining detections (D_remain in the paper) and the remaining tracklets (T_remain), respectively. At frames t₂ and t₃ in Figure (b), the red tracklet is moved to the remaining tracklets because it has no corresponding high score detection.
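The first association can be sketched as IoU-based matching between tracklet boxes and high score detection boxes. This is a simplified illustration under several assumptions: the tracklet boxes are taken to be already predicted by the Kalman filter (prediction and update are omitted), the names iou and greedy_match are hypothetical, min_iou=0.2 is an arbitrary choice, and greedy highest-IoU-first matching stands in for the optimal assignment solver a real tracker would use.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes: List[Box], det_boxes: List[Box], min_iou: float = 0.2):
    """Match tracklets to detections, highest IoU first.

    Returns (matches, remaining tracklet indices, remaining detection indices).
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(track_boxes)
         for di, d in enumerate(det_boxes)),
        reverse=True,
    )
    matches = []
    used_t, used_d = set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    remain_tracks = [i for i in range(len(track_boxes)) if i not in used_t]
    remain_dets = [i for i in range(len(det_boxes)) if i not in used_d]
    return matches, remain_tracks, remain_dets


tracks = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]  # predicted tracklet boxes
high_dets = [(1, 1, 11, 11), (21, 20, 31, 30)]                 # high score detection boxes
matches, remain_tracks, remain_dets = greedy_match(tracks, high_dets)
print(matches, remain_tracks, remain_dets)  # [(1, 1), (0, 0)] [2] []
```

Tracklet 2 has no overlapping high score detection, so it becomes a remaining tracklet, just as the red tracklet does at frames t₂ and t₃.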

The second association matches low score detections to the remaining tracklets. In the paper, a low score detection is one whose score is below the threshold; in the implementation, it is one whose score is below the threshold but above 0.1. In Figure (c), the remaining tracklet drawn as a red dashed line is associated with detections scoring 0.4 at frame t₂ and 0.1 at frame t₃. Remaining tracklets that are still unmatched after the second association are moved to the re-remain tracklets (T_re-remain in the paper). Note that unmatched low score detections are not added to the remaining detections; they are discarded.
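Putting both associations together, one frame of the two-stage procedure can be sketched as below. The setup loosely mirrors Figure (c): the occluded tracklet (index 1) has only a 0.4 score detection. This is an assumption-laden illustration: the boxes are invented, Kalman prediction and update as well as track creation and deletion are omitted, greedy matching replaces optimal assignment, and byte_associate is a hypothetical name.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, float]                  # (box, score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: List[Box], dets: List[Box], min_iou: float):
    """Highest-IoU-first matching; returns (pairs, unmatched tracks, unmatched dets)."""
    cands = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    pairs, used_t, used_d = [], set(), set()
    for sim, ti, di in cands:
        if sim < min_iou:
            break
        if ti not in used_t and di not in used_d:
            pairs.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return (
        pairs,
        [i for i in range(len(tracks)) if i not in used_t],
        [i for i in range(len(dets)) if i not in used_d],
    )


def byte_associate(
    tracks: List[Box], dets: List[Det],
    high_thresh: float = 0.6, low_thresh: float = 0.1, min_iou: float = 0.2,
) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Return (track index -> det index, re-remain tracks, remaining high score dets)."""
    high = [i for i, (_, s) in enumerate(dets) if s > high_thresh]
    low = [i for i, (_, s) in enumerate(dets) if low_thresh < s <= high_thresh]

    # First association: every tracklet vs. high score detections.
    pairs, remain_t, remain_d = greedy_match(tracks, [dets[i][0] for i in high], min_iou)
    matches = {t: high[d] for t, d in pairs}

    # Second association: only the remaining tracklets vs. low score detections.
    pairs2, re_remain, _ = greedy_match(
        [tracks[t] for t in remain_t], [dets[i][0] for i in low], min_iou
    )
    for t, d in pairs2:
        matches[remain_t[t]] = low[d]

    # Unmatched low score detections are dropped; unmatched high score ones
    # (the remaining detections) would be candidates for new tracklets.
    return matches, [remain_t[t] for t in re_remain], [high[d] for d in remain_d]


tracks = [(0, 0, 10, 20), (12, 0, 22, 20), (40, 0, 50, 20)]  # predicted boxes
dets = [
    ((0, 0, 10, 20), 0.9),    # matches tracklet 0
    ((13, 0, 23, 20), 0.4),   # occluded object of tracklet 1
    ((40, 0, 50, 20), 0.8),   # matches tracklet 2
    ((80, 80, 90, 90), 0.3),  # low score background box
]
matches, re_remain, remain_dets = byte_associate(tracks, dets)
print(matches, re_remain, remain_dets)  # {2: 2, 0: 0, 1: 1} [] []
```

Setting low_thresh=0.6, so that no detection counts as low score, reproduces the high-score-only behaviour of Figure (b): tracklet 1 then ends up in the re-remain tracklets instead of being matched, and the background box at index 3 is never used either way.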

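In other words, ByteTrack resolves the detection dilemma by associating low score detections only with the remaining tracklets, i.e., those that found no corresponding high score detection. Low score boxes can thus recover occluded objects, while unmatched low score boxes, which are likely background, are discarded instead of starting new tracklets.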

Versatility of the ByteTrack Algorithm

Here we look at the versatility of ByteTrack’s data association algorithm.

The ByteTrack implementation uses intersection-over-union (IoU) as the similarity metric for both the first and second associations. The paper, on the other hand, describes the metrics for the two associations generically as Similarity#1 and Similarity#2, so other metrics can be plugged in.

The paper compares IoU and Re-ID as similarity metrics on the MOT17 and BDD100K benchmarks. Re-ID is a similarity metric based on appearance features. On MOT17, either IoU or Re-ID is a good choice for Similarity#1 in the first association: IoU achieves better MOTA and fewer ID switches (IDs), while Re-ID achieves a higher IDF1. On BDD100K, Re-ID outperforms IoU. For Similarity#2 in the second association, however, the paper recommends IoU, because low score detections are often affected by occlusion or motion blur, which makes their appearance features unreliable.
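Because the two metrics are just parameters, this recommendation can be sketched by passing a Re-ID similarity as Similarity#1 and IoU as Similarity#2. Everything here is hypothetical: the 2-D "appearance embeddings" stand in for features from a Re-ID network, cosine similarity is one common Re-ID metric rather than a detail taken from the paper, greedy matching replaces optimal assignment, and the thresholds min_sim1 and min_sim2 are arbitrary. The low score detection's appearance is deliberately corrupted, as by occlusion, so it no longer resembles its tracklet even though its box still overlaps.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Item:
    """A tracklet (predicted box) or a detection; score is ignored for tracklets."""
    box: Box
    feat: Sequence[float]  # toy appearance embedding (hypothetical Re-ID output)
    score: float = 1.0


Similarity = Callable[[Item, Item], float]


def iou_sim(a: Item, b: Item) -> float:
    """IoU of the two boxes."""
    iw = max(0.0, min(a.box[2], b.box[2]) - max(a.box[0], b.box[0]))
    ih = max(0.0, min(a.box[3], b.box[3]) - max(a.box[1], b.box[1]))
    inter = iw * ih
    area_a = (a.box[2] - a.box[0]) * (a.box[3] - a.box[1])
    area_b = (b.box[2] - b.box[0]) * (b.box[3] - b.box[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reid_sim(a: Item, b: Item) -> float:
    """Cosine similarity of the appearance embeddings."""
    dot = sum(x * y for x, y in zip(a.feat, b.feat))
    norm = math.sqrt(sum(x * x for x in a.feat)) * math.sqrt(sum(y * y for y in b.feat))
    return dot / norm if norm > 0 else 0.0


def match(tracks: List[Item], dets: List[Item], sim: Similarity, min_sim: float):
    """Greedy highest-similarity-first matching; returns (pairs, unmatched tracks)."""
    cands = sorted(
        ((sim(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    pairs, used_t, used_d = [], set(), set()
    for s, ti, di in cands:
        if s < min_sim:
            break
        if ti not in used_t and di not in used_d:
            pairs.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return pairs, [i for i in range(len(tracks)) if i not in used_t]


def byte_associate(tracks: List[Item], dets: List[Item], sim1: Similarity, sim2: Similarity,
                   high: float = 0.6, low: float = 0.1,
                   min_sim1: float = 0.5, min_sim2: float = 0.2) -> Dict[int, int]:
    """Two-stage association with pluggable metrics; returns track index -> det index."""
    hi_idx = [i for i, d in enumerate(dets) if d.score > high]
    lo_idx = [i for i, d in enumerate(dets) if low < d.score <= high]
    pairs1, remain = match(tracks, [dets[i] for i in hi_idx], sim1, min_sim1)
    result = {t: hi_idx[d] for t, d in pairs1}
    pairs2, _ = match([tracks[t] for t in remain], [dets[i] for i in lo_idx], sim2, min_sim2)
    for t, d in pairs2:
        result[remain[t]] = lo_idx[d]
    return result


tracks = [Item((0, 0, 10, 20), (1.0, 0.0)), Item((30, 0, 40, 20), (0.0, 1.0))]
dets = [
    Item((31, 0, 41, 20), (0.1, 1.0), score=0.9),  # clear view of tracklet 1
    Item((1, 0, 11, 20), (0.0, 1.0), score=0.3),   # occluded tracklet 0: appearance unreliable
]
print(byte_associate(tracks, dets, sim1=reid_sim, sim2=iou_sim))   # {1: 0, 0: 1}
print(byte_associate(tracks, dets, sim1=reid_sim, sim2=reid_sim))  # {1: 0}
```

With IoU as Similarity#2, tracklet 0 is recovered through its overlapping low score box; with Re-ID as Similarity#2, the corrupted appearance causes the match to be missed.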

The paper also applies ByteTrack’s data association algorithm, which it calls BYTE, to nine existing trackers and shows that it improves the performance of many of them.