Model ZOO
This conversion document is adapted from the configs documentation of MMYOLO. The Download links in the tables below point to the original PyTorch models. For convenience, you can instead use the YOLO-series ONNX models we have uploaded to HuggingFace.
You can download the ONNX model of your choice from the following link: https://huggingface.co/CtrlX/JetYOLO/tree/main
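If you only want one file rather than browsing the repository page, HuggingFace serves raw files under a `resolve/<revision>/<filename>` URL. The sketch below builds that URL and downloads the file with the Python standard library. The file name `rtmdet_tiny_example.onnx` is a hypothetical placeholder, so check the repository listing for the real file names.

```python
import urllib.request
from pathlib import Path

REPO_ID = "CtrlX/JetYOLO"  # repository from the link above


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build the direct-download URL for one file in a HuggingFace model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download(filename: str, dest_dir: str = ".") -> Path:
    """Download one file into dest_dir and return its local path."""
    dest = Path(dest_dir) / Path(filename).name
    urllib.request.urlretrieve(resolve_url(filename), dest)
    return dest


if __name__ == "__main__":
    # Hypothetical file name: replace it with a name from the repository listing.
    print(resolve_url("rtmdet_tiny_example.onnx"))
```

If you already use the `huggingface_hub` package, `hf_hub_download(repo_id=..., filename=...)` does the same and also caches the file locally.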
Note:
- Model names preceded by (×) have not been exported to ONNX and uploaded to HuggingFace. You can refer to the `doc/model_convert.md` document to try exporting these ONNX models yourself.
- If you are wondering why the ONNX models are divided into two categories (with EfficientNMS as the backend, and decode only), please refer to the `doc/model_convert.md` document.
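As a rough illustration of the difference: the EfficientNMS TensorRT plugin performs non-maximum suppression (NMS) inside the engine and returns final detections, whereas a decode-only model returns every candidate box with its class scores, so NMS has to run afterwards. The sketch below shows a minimal greedy, class-agnostic NMS in NumPy for that second case. It assumes boxes in `(x1, y1, x2, y2)` pixel format with shape `(N, 4)` and scores with shape `(N, num_classes)`; the thresholds are illustrative, and the actual output layout of each ONNX file may differ (see `doc/model_convert.md`).

```python
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.65) -> np.ndarray:
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_thr]
    return np.array(keep, dtype=np.int64)


def postprocess(boxes, cls_scores, score_thr=0.25, iou_thr=0.65):
    """Filter decode-only outputs by best class score, then apply class-agnostic NMS."""
    scores = cls_scores.max(axis=1)
    labels = cls_scores.argmax(axis=1)
    mask = scores >= score_thr
    boxes, scores, labels = boxes[mask], scores[mask], labels[mask]
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep], labels[keep]
```

A per-class variant would run the same loop separately for each label; the class-agnostic form is kept here for brevity.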
RTMDet
Object Detection
| Model | size | Params(M) | FLOPs(G) | TRT-FP16-Latency(ms) | box AP | TTA box AP | Config | Download |
|---|---|---|---|---|---|---|---|---|
| RTMDet-tiny | 640 | 4.8 | 8.1 | 0.98 | 41.0 | 42.7 | config | model / log |
| (×) RTMDet-tiny * | 640 | 4.8 | 8.1 | 0.98 | 41.8 (+0.8) | 43.2 (+0.5) | config | model / log |
| RTMDet-s | 640 | 8.89 | 14.8 | 1.22 | 44.6 | 45.8 | config | model / log |
| (×) RTMDet-s * | 640 | 8.89 | 14.8 | 1.22 | 45.7 (+1.1) | 47.3 (+1.5) | config | model / log |
| RTMDet-m | 640 | 24.71 | 39.27 | 1.62 | 49.3 | 50.9 | config | model / log |
| (×) RTMDet-m * | 640 | 24.71 | 39.27 | 1.62 | 50.2 (+0.9) | 51.9 (+1.0) | config | model / log |
| RTMDet-l | 640 | 52.3 | 80.23 | 2.44 | 51.4 | 53.1 | config | model / log |
| (×) RTMDet-l * | 640 | 52.3 | 80.23 | 2.44 | 52.3 (+0.9) | 53.7 (+0.6) | config | model / log |
| RTMDet-x | 640 | 94.86 | 141.67 | 3.10 | 52.8 | 54.2 | config | model / log |
Note:
- The inference speed of RTMDet is measured on an NVIDIA 3090 GPU with TensorRT 8.4.3, cuDNN 8.2.0, FP16, batch size=1, and without NMS.
- For a fair comparison, the bbox post-processing config was changed after PR#9494 to be consistent with YOLOv5/6/7, which brings an AP improvement of about 0.1~0.3%.
- `TTA` means Test Time Augmentation. It performs 3 multi-scale transformations on the image, each followed by 2 flipping transformations (flipped and not flipped). To enable it, specify `--tta` when testing. See TTA for details.
- `*` means the checkpoints are trained with knowledge distillation. More details can be found in RTMDet distillation.
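The latency figures in the RTMDet table are per-image timings at batch size 1 with NMS excluded. If you want to compare your own deployment against them, a common pattern is to discard some warm-up runs and report the median of many timed runs. The sketch below is a generic, framework-agnostic harness of our own, not the script used to produce the table; `infer` stands for any callable that runs one forward pass. For GPU back ends, `infer` must wait for the GPU to finish before returning, otherwise the timings will be too low.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(infer: Callable[[], object], warmup: int = 20, runs: int = 200) -> float:
    """Return the median wall-clock latency of infer() in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up runs absorb lazy initialization, autotuning and cache effects
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace it with a real single-image (batch size 1) inference call.
    print(f"{measure_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```

The median is used rather than the mean so that occasional scheduling hiccups do not skew the result.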
YOLOv5
COCO
| Backbone | Arch | size | Mask Refine | SyncBN | AMP | Mem (GB) | box AP | TTA box AP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv5-n | P5 | 640 | No | Yes | Yes | 1.5 | 28.0 | 30.7 | config | model / log |
| YOLOv5-n | P5 | 640 | Yes | Yes | Yes | 1.5 | 28.0 | | config | model / log |
| YOLOv5u-n | P5 | 640 | Yes | Yes | Yes | | | | config | model / log |
| YOLOv5-s | P5 | 640 | No | Yes | Yes | 2.7 | 37.7 | 40.2 | config | model / log |
| YOLOv5-s | P5 | 640 | Yes | Yes | Yes | 2.7 | 38.0 (+0.3) | | config | model / log |
| YOLOv5u-s | P5 | 640 | Yes | Yes | Yes | | | | config | model / log |
| YOLOv5-m | P5 | 640 | No | Yes | Yes | 5.0 | 45.3 | 46.9 | config | model / log |
| YOLOv5-m | P5 | 640 | Yes | Yes | Yes | 5.0 | 45.3 | | config | model / log |
| YOLOv5u-m | P5 | 640 | Yes | Yes | Yes | | | | config | model / log |
| YOLOv5-l | P5 | 640 | No | Yes | Yes | 8.1 | 48.8 | 49.9 | config | model / log |
| YOLOv5-l | P5 | 640 | Yes | Yes | Yes | 8.1 | 49.3 (+0.5) | | config | model / log |
| YOLOv5u-l | P5 | 640 | Yes | Yes | Yes | | | | config | model / log |
| YOLOv5-x | P5 | 640 | No | Yes | Yes | 12.2 | 50.2 | | config | model / log |
| YOLOv5-x | P5 | 640 | Yes | Yes | Yes | 12.2 | 50.9 (+0.7) | | config | model / log |
| YOLOv5u-x | P5 | 640 | Yes | Yes | Yes | | | | config | model / log |
| (×) YOLOv5-n | P6 | 1280 | No | Yes | Yes | 5.8 | 35.9 | | config | model / log |
| (×) YOLOv5-s | P6 | 1280 | No | Yes | Yes | 10.5 | 44.4 | | config | model / log |
| (×) YOLOv5-m | P6 | 1280 | No | Yes | Yes | 19.1 | 51.3 | | config | model / log |
| (×) YOLOv5-l | P6 | 1280 | No | Yes | Yes | 30.5 | 53.7 | | config | model / log |
Note:
- `fast` means that `YOLOv5DetDataPreprocessor` and `yolov5_collate` are used for data preprocessing, which is faster for training but less flexible for multitasking. We recommend the `fast` config if you only care about object detection.
- `detect` means that the network input is fixed to `640x640` and the post-processing thresholds are modified.
- `SyncBN` means SyncBN is used; `AMP` indicates training with mixed precision.
- We use 8x A100 for training, and the single-GPU batch size is 16. This is different from the official code.
- The performance is unstable and may fluctuate by about 0.4 mAP, and the best-performing weights in `COCO` training of `YOLOv5` may not come from the last epoch.
- `TTA` means Test Time Augmentation. It performs 3 multi-scale transformations on the image, each followed by 2 flipping transformations (flipped and not flipped). To enable it, specify `--tta` when testing. See TTA for details.
- The `Mask Refine` training results correspond to the performance of the weights officially released by YOLOv5. `Mask Refine` means refining the bbox with the mask while loading annotations and transforming after `YOLOv5RandomAffine`; `Copy Paste` means using `YOLOv5CopyPaste`.
- `YOLOv5u` models use the same loss functions and split Detect head as `YOLOv8` models for improved performance, but only require 300 epochs of training.
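To see why refining boxes with masks can help: after a random affine transform (rotation, shear, scaling), box annotations alone only let you take the axis-aligned hull of the four transformed box corners, which is usually looser than the object. With a polygon mask, the box can instead be recomputed from the transformed polygon. The sketch below compares the two with NumPy on a diamond-shaped object rotated by 45°. It illustrates the idea only and is not the `YOLOv5RandomAffine` implementation.

```python
import numpy as np


def affine_points(points: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    ones = np.ones((points.shape[0], 1))
    return np.hstack([points, ones]) @ matrix.T


def hull_box(points: np.ndarray) -> np.ndarray:
    """Axis-aligned (x1, y1, x2, y2) box enclosing a set of points."""
    return np.concatenate([points.min(axis=0), points.max(axis=0)])


def rotation(deg: float) -> np.ndarray:
    """2x3 affine matrix for a rotation about the origin."""
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0]])


# A diamond-shaped object (polygon mask) and its ground-truth box.
polygon = np.array([[50.0, 0.0], [100.0, 50.0], [50.0, 100.0], [0.0, 50.0]])
box = hull_box(polygon)  # (0, 0, 100, 100)
corners = np.array([[box[0], box[1]], [box[2], box[1]], [box[2], box[3]], [box[0], box[3]]])

m = rotation(45.0)
box_only = hull_box(affine_points(corners, m))      # from the 4 box corners: looser
mask_refined = hull_box(affine_points(polygon, m))  # from the polygon: tighter
```

Here the box-only result is twice as wide and tall as the mask-refined one, even though both describe the same object.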
YOLOv6
COCO
| Backbone | Arch | Size | Epoch | SyncBN | AMP | Mem (GB) | Box AP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv6-n | P5 | 640 | 400 | Yes | Yes | 6.04 | 36.2 | config | model / log |
| YOLOv6-t | P5 | 640 | 400 | Yes | Yes | 8.13 | 41.0 | config | model / log |
| YOLOv6-s | P5 | 640 | 400 | Yes | Yes | 8.88 | 44.0 | config | model / log |
| YOLOv6-m | P5 | 640 | 300 | Yes | Yes | 16.69 | 48.4 | config | model / log |
| YOLOv6-l | P5 | 640 | 300 | Yes | Yes | 20.86 | 51.0 | config | model / log |
Note:
- The official m and l models use knowledge distillation, which our version does not support yet; it will be implemented in MMRazor in the future.
- The performance is unstable and may fluctuate by about 0.3 mAP.
- If you need 300-epoch weights for the nano, tiny and small models, you can train them with the 300-epoch configs we provide, or convert the official weights with the converter script.
- We have noticed that the base model was recently released officially in v6. Although its accuracy is lower, it is more efficient. We will also provide the base model configuration in the future.
YOLOv7
COCO
| Backbone | Arch | Size | SyncBN | AMP | Mem (GB) | Box AP | Config | Download |
|---|---|---|---|---|---|---|---|---|
| YOLOv7-tiny | P5 | 640 | Yes | Yes | 2.7 | 37.5 | config | model / log |
| YOLOv7-l | P5 | 640 | Yes | Yes | 10.3 | 50.9 | config | model / log |
| YOLOv7-x | P5 | 640 | Yes | Yes | 13.7 | 52.8 | config | model / log |
| YOLOv7-w | P6 | 1280 | Yes | Yes | 27.0 | 54.1 | config | model / log |
| YOLOv7-e | P6 | 1280 | Yes | Yes | 42.5 | 55.1 | config | model / log |
Note:
- In the official YOLOv7 code, the `random_perspective` data augmentation used in COCO object detection training relies on mask annotations, which leads to higher performance. Object detection should not use mask annotations, so only box annotations are used in `MMYOLO`. We will use the mask annotations in the instance segmentation task.
- The performance is unstable and may fluctuate by about 0.3 mAP. The performance shown above is from the best model.
- If you need the weights of `YOLOv7-e2e`, you can train it with the configs we provide, or convert the official weights with the converter script.
- `fast` means that `YOLOv5DetDataPreprocessor` and `yolov5_collate` are used for data preprocessing, which is faster for training but less flexible for multitasking. We recommend the `fast` config if you only care about object detection.
- `SyncBN` means SyncBN is used; `AMP` indicates training with mixed precision.
- We use 8x A100 for training, and the single-GPU batch size is 16. This is different from the official code.
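The training setup in these notes (8 GPUs with a single-GPU batch size of 16) gives a total batch size of 8 × 16 = 128. If you reproduce training with a different number of GPUs or per-GPU batch size, one common heuristic is the linear scaling rule: scale the learning rate in proportion to the total batch size. The sketch below applies that rule. It is a general heuristic, not something these configs are guaranteed to do for you, and `base_lr=0.01` is a placeholder, so read the actual value from the config you train with.

```python
def total_batch_size(num_gpus: int, per_gpu: int) -> int:
    """Total images per optimizer step across all GPUs."""
    return num_gpus * per_gpu


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the batch size."""
    return base_lr * new_batch / base_batch


reference = total_batch_size(8, 16)  # setup described in the notes: 8 GPUs x 16 images
mine = total_batch_size(2, 16)       # e.g. reproducing on 2 GPUs
# base_lr=0.01 is a placeholder; take the real value from your config.
print(reference, mine, linearly_scaled_lr(0.01, reference, mine))
```

Even with the learning rate rescaled, results trained at a different total batch size may not match the table exactly.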
YOLOv8
COCO
| Backbone | Arch | size | Mask Refine | SyncBN | AMP | Mem (GB) | box AP | TTA box AP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8-n | P5 | 640 | No | Yes | Yes | 2.8 | 37.2 | | config | model / log |
| YOLOv8-n | P5 | 640 | Yes | Yes | Yes | 2.5 | 37.4 (+0.2) | 39.9 | config | model / log |
| YOLOv8-s | P5 | 640 | No | Yes | Yes | 4.0 | 44.2 | | config | model / log |
| YOLOv8-s | P5 | 640 | Yes | Yes | Yes | 4.0 | 45.1 (+0.9) | 46.8 | config | model / log |
| YOLOv8-m | P5 | 640 | No | Yes | Yes | 7.2 | 49.8 | | config | model / log |
| YOLOv8-m | P5 | 640 | Yes | Yes | Yes | 7.0 | 50.6 (+0.8) | 52.3 | config | model / log |
| YOLOv8-l | P5 | 640 | No | Yes | Yes | 9.8 | 52.1 | | config | model / log |
| YOLOv8-l | P5 | 640 | Yes | Yes | Yes | 9.1 | 53.0 (+0.9) | 54.4 | config | model / log |
| YOLOv8-x | P5 | 640 | No | Yes | Yes | 12.2 | 52.7 | | config | model / log |
| YOLOv8-x | P5 | 640 | Yes | Yes | Yes | 12.4 | 54.0 (+1.3) | 55.0 | config | model / log |
Note:
- We use 8x A100 for training, and the single-GPU batch size is 16. This is different from the official code, but has no effect on performance.
- The performance is unstable and may fluctuate by about 0.3 mAP, and the best-performing weights in `COCO` training of `YOLOv8` may not come from the last epoch. The performance shown above is from the best model.
- We provide scripts to convert official weights to MMYOLO.
- `SyncBN` means SyncBN is used; `AMP` indicates training with mixed precision.
- The `Mask Refine` training results correspond to the performance of the weights officially released by YOLOv8. `Mask Refine` means refining the bbox with the mask while loading annotations and transforming after `YOLOv5RandomAffine`, and the L and X models use `Copy Paste`.
- `TTA` means Test Time Augmentation. It performs 3 multi-scale transformations on the image, each followed by 2 flipping transformations (flipped and not flipped). To enable it, specify `--tta` when testing. See TTA for details.
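The TTA scheme in these notes yields 3 × 2 = 6 augmented views per image, and detections from a resized or flipped view must be mapped back to the original image before the views are merged. The sketch below enumerates the views and maps a box back for a horizontal flip. The three scales `(640, 640)`, `(320, 320)` and `(960, 960)` are our assumption of typical values, so check the TTA section of the config you use; for simplicity the sketch also assumes a plain resize without letterbox padding.

```python
from itertools import product

# Assumed TTA settings: three input scales, each with and without a horizontal flip.
SCALES = [(640, 640), (320, 320), (960, 960)]
FLIPS = [True, False]


def tta_views():
    """Enumerate every (scale, flip) combination: 3 x 2 = 6 views."""
    return list(product(SCALES, FLIPS))


def box_to_original(box, view_w, view_h, orig_w, orig_h, flipped):
    """Map an (x1, y1, x2, y2) box predicted on a resized, possibly
    horizontally flipped view back to original-image coordinates."""
    x1, y1, x2, y2 = box
    if flipped:
        # Undo the horizontal flip in view coordinates; x1 and x2 swap roles.
        x1, x2 = view_w - x2, view_w - x1
    # Undo the plain resize (no letterbox padding assumed).
    sx, sy = view_w / orig_w, view_h / orig_h
    return (x1 / sx, y1 / sy, x2 / sx, y2 / sy)
```

Once all six views are mapped back, their detections are typically merged (for example with NMS) into a single result.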