
YOLOX: Exceeding YOLO Series in 2021

Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, Jian Sun
Megvii Technology

Figure 1: Speed-accuracy trade-off of accurate models (top) and size-accuracy curve of lite models on mobile devices (bottom) for YOLOX and other state-of-the-art object detectors.

Abstract

In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector YOLOX.

We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA, to achieve state-of-the-art results across a large scale range of models: For YOLOX-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L, with roughly the same amount of parameters as YOLOv4-CSP and YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on Tesla V100, exceeding YOLOv5-L by 1.8% AP.

Further, we won the 1st Place on Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model. We hope this report can provide useful experience for developers and researchers in practical scenes, and we also provide deploy versions with ONNX, TensorRT, NCNN, and Openvino supported. Source code is available at https://github.com/Megvii-BaseDetection/YOLOX.

1. Introduction

With the development of object detection, the YOLO series [23, 24, 25, 1, 7] always pursues the optimal speed and accuracy trade-off for real-time applications. They extract the most advanced detection technologies available at the time (e.g., anchors [26] for YOLOv2 [24], Residual Net [9] for YOLOv3 [25]) and optimize the implementation for best practice.

Currently, YOLOv5 [7] holds the best trade-off performance with 48.2% AP on COCO at 13.7 ms.¹ Nevertheless, over the past two years, the major advances in object detection academia have focused on anchor-free detectors [29, 40, 14], advanced label assignment strategies [37, 36, 12, 41, 22, 4], and end-to-end (NMS-free) detectors [2, 32, 39]. These have not been integrated into YOLO families yet, as YOLOv4 and YOLOv5 are still anchor-based detectors with hand-crafted assigning rules for training. That is what brings us here: delivering those recent advancements to the YOLO series with experienced optimization.

¹ We choose the YOLOv5-L model at 640×640 resolution and test the model with FP16-precision and batch=1 on a V100 to align the settings of YOLOv4 [1] and YOLOv4-CSP [30] for a fair comparison.

Considering YOLOv4 and YOLOv5 may be a little over-optimized for the anchor-based pipeline, we choose YOLOv3 [25] as our start point (we set YOLOv3-SPP as the default YOLOv3). Indeed, YOLOv3 is still one of the most widely used detectors in industry due to the limited computation resources and the insufficient software support in various practical applications. As shown in Fig. 1, with the experienced updates of the above techniques, we boost YOLOv3 to 47.3% AP (YOLOX-DarkNet53) on COCO with 640×640 resolution, surpassing the current best practice of YOLOv3 (44.3% AP, ultralytics version²) by a large margin.

Moreover, when switching to the advanced YOLOv5 architecture that adopts an advanced CSPNet [31] backbone and an additional PAN [19] head, YOLOX-L achieves 50.0% AP on COCO with 640×640 resolution, outperforming the counterpart YOLOv5-L by 1.8% AP. We also test our design strategies on models of small size. YOLOX-Tiny and YOLOX-Nano (only 0.91M parameters and 1.08G FLOPs) outperform the corresponding counterparts YOLOv4-Tiny and NanoDet³ by 10% AP and 1.8% AP, respectively. We have released our code, with ONNX, TensorRT, NCNN and Openvino supported. One more thing worth mentioning: we won the 1st Place on Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model.

2. YOLOX

2.1. YOLOX-DarkNet53

We choose YOLOv3 [25] with Darknet53 as our baseline.

In the following part, we will walk through the whole system designs in YOLOX step by step.

Implementation details. Our training settings are mostly consistent from the baseline to our final model. We train the models for a total of 300 epochs with 5 epochs warm-up on COCO train2017 [17]. We use stochastic gradient descent (SGD) for training, with a learning rate of lr × BatchSize/64 (linear scaling [8]), an initial lr of 0.01, and the cosine lr schedule. The weight decay is 0.0005 and the SGD momentum is 0.9. The batch size is 128 by default on typical 8-GPU devices. Other batch sizes, including single-GPU training, also work well.
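As a rough, non-authoritative sketch of this schedule (not the released implementation), the snippet below combines the linear scaling rule with a 5-epoch warm-up and cosine decay; the linear warm-up shape and the function name are assumptions.

```python
import math

def lr_at_epoch(epoch, total_epochs=300, warmup_epochs=5,
                base_lr=0.01, batch_size=128):
    """Sketch of the schedule described above: the base lr is scaled by
    BatchSize/64, ramped up over the warm-up epochs (linear ramp is an
    assumption), then decayed with a cosine curve."""
    lr = base_lr * batch_size / 64.0                        # linear scaling rule [8]
    if epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs             # warm-up phase
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * lr * (1.0 + math.cos(math.pi * progress))  # cosine decay
```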

The input size is evenly drawn from 448 to 832 with 32 strides. FPS and latency in this report are all measured with FP16-precision and batch=1 on a single Tesla V100.

Table 1: The effect of decoupled head for end-to-end YOLO in terms of AP (%) on COCO.

Models            Vanilla    End-to-end
YOLOv3 head       38.5       34.3 (-4.2)
Decoupled head    39.6       38.8 (-0.8)

YOLOv3 baseline. Our baseline adopts the architecture of DarkNet53 backbone and an SPP layer, referred to as YOLOv3-SPP in some papers [1, 7]. We slightly change some training strategies compared to the original implementation [25], adding EMA weights updating, cosine lr schedule, IoU loss and IoU-aware branch. We use BCE Loss for training the cls and obj branches, and IoU Loss for training the reg branch. These general training tricks are orthogonal to the key improvement of YOLOX, so we put them on the baseline. Moreover, we only conduct RandomHorizontalFlip, ColorJitter and multi-scale for data augmentation and discard the RandomResizedCrop strategy, because we found RandomResizedCrop is kind of overlapped with the planned mosaic augmentation.
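To make the baseline loss split above concrete, here is a minimal, hypothetical PyTorch-style sketch: BCE for the cls and obj branches and an IoU loss (1 - IoU) for box regression. The tensor shapes, helper names, and the assumption that predictions are already matched to targets are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def box_iou(pred, target):
    """IoU for matched (x1, y1, x2, y2) box pairs of shape (N, 4)."""
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    return inter / (area_p + area_t - inter + 1e-7)

def baseline_loss(cls_logits, obj_logits, pred_boxes,
                  cls_targets, obj_targets, box_targets):
    # BCE for the cls and obj branches, IoU loss (1 - IoU) for the reg branch.
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    obj_loss = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    reg_loss = (1.0 - box_iou(pred_boxes, box_targets)).mean()
    return cls_loss + obj_loss + reg_loss
```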

With those enhancements, our baseline achieves 38.5% AP on COCO val, as shown in Tab. 2.

Decoupled head. In object detection, the conflict between classification and regression tasks is a well-known problem [27, 34]. Thus a decoupled head for classification and localization is widely used in most of the one-stage and two-stage detectors [16, 29, 35, 34]. However, as the YOLO series' backbones and feature pyramids (e.g., FPN [13], PAN [20]) continuously evolve, their detection heads remain coupled, as shown in Fig. 2. Our two analytical experiments indicate that the coupled detection head may harm the performance.

1) Replacing YOLO's head with a decoupled one greatly improves the converging speed, as shown in Fig. 3. 2) The decoupled head is essential to the end-to-end version of YOLO (which will be described next). One can tell from Tab. 1 that the end-to-end property decreases by 4.2% AP with the coupled head, while the decrease reduces to 0.8% AP for a decoupled head. We thus replace the YOLO detect head with a lite decoupled head as in Fig. 2. Concretely, it contains a 1×1 conv layer to reduce the channel dimension, followed by two parallel branches with two 3×3 conv layers each.
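The following PyTorch-style sketch shows one way such a lite decoupled head could be wired up. The hidden width (256), the BatchNorm/SiLU choices, and routing the objectness output through the regression branch are assumptions for illustration, not the authors' exact module.

```python
import torch.nn as nn

class LiteDecoupledHead(nn.Module):
    """Illustrative decoupled head: a 1x1 conv reduces the channel dimension,
    then two parallel branches of two 3x3 convs feed the classification output
    and the regression/objectness outputs (layout assumed from the text)."""

    def __init__(self, in_channels, num_classes, width=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, width, 1), nn.BatchNorm2d(width), nn.SiLU())

        def branch():
            return nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1), nn.BatchNorm2d(width), nn.SiLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.BatchNorm2d(width), nn.SiLU())

        self.cls_branch, self.reg_branch = branch(), branch()
        self.cls_out = nn.Conv2d(width, num_classes, 1)  # per-class scores
        self.reg_out = nn.Conv2d(width, 4, 1)            # box offsets
        self.obj_out = nn.Conv2d(width, 1, 1)            # objectness

    def forward(self, x):
        x = self.stem(x)
        cls_feat, reg_feat = self.cls_branch(x), self.reg_branch(x)
        return self.cls_out(cls_feat), self.reg_out(reg_feat), self.obj_out(reg_feat)
```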

