isLinXu committed on
Commit 5a441f1 · 1 Parent(s): 2793310

Add Spaces YAML config, dependencies and system packages

Files changed (3):
  1. README.md +92 -361
  2. packages.txt +2 -0
  3. requirements.txt +8 -0
README.md CHANGED
@@ -1,364 +1,95 @@
- <div align="center">
- <h1>YOLO-MASTER</h1>
-
-
- <p align="left"> <a href="https://huggingface.co/spaces/xx"> <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue" alt="Hugging Face Spaces"> </a> <a href="https://colab.research.google.com/github/isLinXu/YOLO-Master"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"> </a> <a href="https://arxiv.org/abs/2512.23273"> <img src="https://img.shields.io/badge/arXiv-2512.23273-b31b1b.svg" alt="arXiv"> </a> <a href="https://github.com/isLinXu/YOLO-Master/releases"> <img src="https://img.shields.io/badge/%F0%9F%93%A6-Model%20Zoo-orange" alt="Model Zoo"> </a> <a href="./LICENSE"> <img src="https://img.shields.io/badge/License-AGPL%203.0-blue.svg" alt="AGPL 3.0"> </a> <a href="https://github.com/ultralytics/ultralytics"> <img src="https://img.shields.io/badge/Ultralytics-YOLO-blue" alt="Ultralytics"> </a> </p>
-
-
- <p align="center">
- YOLO-Master:
- <b><u>M</u></b>OE-<b><u>A</u></b>ccelerated with
- <b><u>S</u></b>pecialized <b><u>T</u></b>ransformers for
- <b><u>E</u></b>nhanced <b><u>R</u></b>eal-time Detection.
- </p>
- </div>
-
- <div align="center">
- <div style="text-align: center; margin-bottom: 8px;">
- <a href="https://github.com/isLinXu" style="text-decoration: none;"><b>Xu Lin</b></a><sup>1*</sup>&nbsp;&nbsp;
- <a href="https://pjl1995.github.io/" style="text-decoration: none;"><b>Jinlong Peng</b></a><sup>1*</sup>&nbsp;&nbsp;
- <a href="https://scholar.google.com/citations?user=fa4NkScAAAAJ" style="text-decoration: none;"><b>Zhenye Gan</b></a><sup>1</sup>&nbsp;&nbsp;
- <a href="https://scholar.google.com/citations?hl=en&user=cU0UfhwAAAAJ" style="text-decoration: none;"><b>Jiawen Zhu</b></a><sup>2</sup>&nbsp;&nbsp;
- <a href="https://scholar.google.com/citations?user=JIKuf4AAAAAJ&hl=zh-TW" style="text-decoration: none;"><b>Jun Liu</b></a><sup>1</sup>
- </div>
-
- <div style="text-align: center; margin-bottom: 4px; font-size: 0.95em;">
- <sup>1</sup>Tencent Youtu Lab &nbsp;&nbsp;&nbsp;
- <sup>2</sup>Singapore Management University
- </div>
-
- <div style="text-align: center; margin-bottom: 12px; font-size: 0.85em; color: #666; font-style: italic;">
- <sup>*</sup>Equal Contribution
- </div>
-
- <div style="text-align: center;">
- <div style="font-family: 'Courier New', Courier, monospace; font-size: 0.85em; background-color: #f6f8fa; padding: 10px; border-radius: 6px; display: inline-block; line-height: 1.4; text-align: left;">
- {gatilin, jeromepeng, wingzygan, juliusliu}@tencent.com <br>
- jwzhu.2022@phdcs.smu.edu.sg
- </div>
- </div>
- </div>
- <br>
-
- [English](README.md) | [简体中文](README_CN.md)
-
45
  ---
-
- ## 💡 A Humble Beginning (Introduction)
-
- > **"Exploring the frontiers of Dynamic Intelligence in YOLO."**
-
- This work represents our passionate exploration of the evolution of Real-Time Object Detection (RTOD). To the best of our knowledge, **YOLO-Master is the first work to deeply integrate Mixture-of-Experts (MoE) with the YOLO architecture on general-purpose datasets.**
-
- Most existing YOLO models rely on static, dense computation, allocating the same computational budget to a simple sky background as to a complex, crowded intersection. We believe detection models should be more "adaptive", much like the human visual system. While this initial exploration may not be perfect, it demonstrates the significant potential of **Efficient Sparse MoE (ES-MoE)** in balancing high precision with ultra-low latency. We are committed to continuous iteration and optimization to refine this approach.
-
- Looking forward, we draw inspiration from the transformative advances in LLMs and VLMs. We plan to extend these insights to fundamental vision tasks, with the ultimate goal of tackling more ambitious frontiers such as Open-Vocabulary Detection and Open-Set Segmentation.
-
- <details>
- <summary>
- <font size="+1"><b>Abstract</b></font>
- </summary>
- Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and computational resources: over-allocating to trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance.
-
- To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. Additionally, the routing network adaptively learns to activate only the most relevant experts, thereby improving detection performance while minimizing computational overhead during inference.
-
- Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP at 1.62 ms latency, outperforming YOLOv13-N by +0.8% mAP with 17.8% faster inference. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code: [isLinXu/YOLO-Master](https://github.com/isLinXu/YOLO-Master)
- </details>
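Conceptually, the adaptive computation described in the abstract is a top-k sparse gate over a small pool of experts: a routing network scores every expert, and only the highest-scoring few run per token. The following NumPy sketch is illustrative only; shapes and names are hypothetical, and the paper's actual ES-MoE block, diversity-enhancing objective, and expert design are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_moe_forward(tokens, gate_w, experts, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:  (N, D) input features
    gate_w:  (D, E) routing-network weights
    experts: list of E callables, each mapping (1, D) -> (1, D)
    """
    logits = tokens @ gate_w                        # (N, E) routing scores
    probs = softmax(logits, axis=-1)
    top = np.argsort(-probs, axis=-1)[:, :top_k]    # top-k expert ids per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        chosen = top[i]
        w = probs[i, chosen]
        w = w / w.sum()                             # renormalize over active experts
        for e_idx, weight in zip(chosen, w):
            out[i] += weight * experts[e_idx](tok[None, :])[0]
    return out, top
```

Because the gate weights are renormalized over the active experts, only `top_k` of the `E` experts contribute any compute per token, which is the source of the sparsity savings.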
-
  ---

- ## 🎨 Architecture
-
- <div align="center">
- <img width="90%" alt="YOLO-Master Architecture" src="https://github.com/user-attachments/assets/6caa1065-af77-4f77-8faf-7551c013dacd" />
- <p><i>YOLO-Master introduces ES-MoE blocks to achieve "compute-on-demand" via dynamic routing.</i></p>
- </div>
-
- ### 📚 In-Depth Documentation
- For a deep dive into the design philosophy of the MoE modules, detailed routing mechanisms, and optimization guides for deployment on various hardware (GPU/CPU/NPU), please refer to our Wiki:
- 👉 **[Wiki: MoE Modules Explained](wiki/MoE_Modules_Explanation_EN.md)**
-
-
- ## 📖 Table of Contents
-
- - [A Humble Beginning](#-a-humble-beginning-introduction)
- - [Architecture](#-architecture)
- - [Updates](#-updates-latest-first)
- - [Main Results](#-main-results)
-   - [Detection](#detection)
-   - [Segmentation](#segmentation)
-   - [Classification](#classification)
- - [Detection Examples](#-detection-examples)
- - [Supported Tasks](#-supported-tasks)
- - [Quick Start](#-quick-start)
-   - [Installation](#installation)
-   - [Validation](#validation)
-   - [Training](#training)
-   - [Inference](#inference)
-   - [Export](#export)
-   - [Gradio Demo](#gradio-demo)
- - [Community & Contributing](#-community--contributing)
- - [License](#-license)
- - [Acknowledgements](#-acknowledgements)
- - [Citation](#-citation)
-
-
- ## 🚀 Updates (Latest First)
-
- - **2025/12/30**: arXiv paper published.
-
- ## 📊 Main Results
- ### Detection
- <div align="center">
- <img width="450" alt="Radar chart comparing YOLO models on various datasets" src="https://github.com/user-attachments/assets/743fa632-659b-43b1-accf-f865c8b66754"/>
- </div>
-
-
- <div align="center">
- <p><b>Table 1. Comparison with state-of-the-art Nano-scale detectors across five benchmarks.</b></p>
- <table style="border-collapse:collapse; width:100%; font-family:sans-serif; text-align:center; border-top:2px solid #000; border-bottom:2px solid #000; font-size:0.9em;">
- <thead>
- <tr style="border-bottom:1px solid #ddd;">
- <th style="padding:8px; border-right:1px solid #ddd;">Dataset</th>
- <th colspan="2" style="border-right:1px solid #ddd;">COCO</th>
- <th colspan="2" style="border-right:1px solid #ddd;">PASCAL VOC</th>
- <th colspan="2" style="border-right:1px solid #ddd;">VisDrone</th>
- <th colspan="2" style="border-right:1px solid #ddd;">KITTI</th>
- <th colspan="2" style="border-right:1px solid #ddd;">SKU-110K</th>
- <th>Efficiency</th>
- </tr>
- <tr style="border-bottom:1px solid #000;">
- <th style="padding:8px; border-right:1px solid #ddd;">Method</th>
- <th>mAP<br>(%)</th>
- <th style="border-right:1px solid #ddd;">mAP<sub>50</sub><br>(%)</th>
- <th>mAP<br>(%)</th>
- <th style="border-right:1px solid #ddd;">mAP<sub>50</sub><br>(%)</th>
- <th>mAP<br>(%)</th>
- <th style="border-right:1px solid #ddd;">mAP<sub>50</sub><br>(%)</th>
- <th>mAP<br>(%)</th>
- <th style="border-right:1px solid #ddd;">mAP<sub>50</sub><br>(%)</th>
- <th>mAP<br>(%)</th>
- <th style="border-right:1px solid #ddd;">mAP<sub>50</sub><br>(%)</th>
- <th>Latency<br>(ms)</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv10-N</td>
- <td>38.5</td><td style="border-right:1px solid #ddd;">53.8</td>
- <td>60.6</td><td style="border-right:1px solid #ddd;">80.3</td>
- <td>18.7</td><td style="border-right:1px solid #ddd;">32.4</td>
- <td>66.0</td><td style="border-right:1px solid #ddd;">88.3</td>
- <td>57.4</td><td style="border-right:1px solid #ddd;">90.0</td>
- <td>1.84</td>
- </tr>
- <tr>
- <td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv11-N</td>
- <td>39.4</td><td style="border-right:1px solid #ddd;">55.3</td>
- <td>61.0</td><td style="border-right:1px solid #ddd;">81.2</td>
- <td>18.5</td><td style="border-right:1px solid #ddd;">32.2</td>
- <td>67.8</td><td style="border-right:1px solid #ddd;">89.8</td>
- <td>57.4</td><td style="border-right:1px solid #ddd;">90.0</td>
- <td>1.50</td>
- </tr>
- <tr>
- <td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv12-N</td>
- <td>40.6</td><td style="border-right:1px solid #ddd;">56.7</td>
- <td>60.7</td><td style="border-right:1px solid #ddd;">80.8</td>
- <td>18.3</td><td style="border-right:1px solid #ddd;">31.7</td>
- <td>67.6</td><td style="border-right:1px solid #ddd;">89.3</td>
- <td>57.4</td><td style="border-right:1px solid #ddd;">90.0</td>
- <td>1.64</td>
- </tr>
- <tr style="border-bottom:1px solid #000;">
- <td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv13-N</td>
- <td>41.6</td><td style="border-right:1px solid #ddd;">57.8</td>
- <td>60.7</td><td style="border-right:1px solid #ddd;">80.3</td>
- <td>17.5</td><td style="border-right:1px solid #ddd;">30.6</td>
- <td>67.7</td><td style="border-right:1px solid #ddd;">90.6</td>
- <td>57.5</td><td style="border-right:1px solid #ddd;">90.3</td>
- <td>1.97</td>
- </tr>
- <tr style="background-color:#f9f9f9;">
- <td style="padding:8px; text-align:left; border-right:1px solid #ddd;"><b>YOLO-Master-N</b></td>
- <td><b>42.4</b></td><td style="border-right:1px solid #ddd;"><b>59.2</b></td>
- <td><b>62.1</b></td><td style="border-right:1px solid #ddd;"><b>81.9</b></td>
- <td><b>19.6</b></td><td style="border-right:1px solid #ddd;"><b>33.7</b></td>
- <td><b>69.2</b></td><td style="border-right:1px solid #ddd;"><b>91.3</b></td>
- <td><b>58.2</b></td><td style="border-right:1px solid #ddd;"><b>90.6</b></td>
- <td><b>1.62</b></td>
- </tr>
- </tbody>
- </table>
- </div>
-
- ### Segmentation
-
- | **Model** | **Size** | **mAP<sup>box</sup> (%)** | **mAP<sup>mask</sup> (%)** | **Gain (mAP<sup>mask</sup>)** |
- | --------------------- | -------- | -------------- | --------------- | ------------------ |
- | YOLOv11-seg-N | 640 | 38.9 | 32.0 | - |
- | YOLOv12-seg-N | 640 | 39.9 | 32.8 | Baseline |
- | **YOLO-Master-seg-N** | **640** | **42.9** | **35.6** | **+2.8%** 🚀 |
-
- ### Classification
-
- | **Model** | **Dataset** | **Input Size** | **Top-1 Acc (%)** | **Top-5 Acc (%)** | **Comparison** |
- | --------------------- | ------------ | -------------- | ----------------- | ----------------- | ----------------- |
- | YOLOv11-cls-N | ImageNet | 224 | 70.0 | 89.4 | Baseline |
- | YOLOv12-cls-N | ImageNet | 224 | 71.7 | 90.5 | +1.7% Top-1 |
- | **YOLO-Master-cls-N** | **ImageNet** | **224** | **76.6** | **93.4** | **+4.9% Top-1** 🔥 |
-
- ## 🖼️ Detection Examples
-
- <div align="center">
- <img width="1416" height="856" alt="Detection Examples" src="https://github.com/user-attachments/assets/0e1fbe4a-34e7-489e-b936-6d121ede5cf6" /> </div>
- <table border="0"> <tr> <td align="center" style="font-weight: bold; background-color: #f6f8fa;"> <b>Detection</b> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/db350acd-1d91-4be6-96b2-6bdf8aac57e8" alt="Detection 1" style="width:100%; display:block; border-radius:4px;"> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/b6c80dbd-120e-428b-8d26-ea2b38a40b47" alt="Detection 2" style="width:100%; display:block; border-radius:4px;"> </td> </tr> <tr> <td align="center" style="font-weight: bold; background-color: #f6f8fa;"> <b>Segmentation</b> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/edb05e3c-cd83-41db-89f8-8ef09fc22798" alt="Segmentation 1" style="width:100%; display:block; border-radius:4px;"> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/ea138674-d7c7-48fb-b272-3ec211d161bf" alt="Segmentation 2" style="width:100%; display:block; border-radius:4px;"> </td> </tr> </table>
-
-
-
- ## 🧩 Supported Tasks
-
- YOLO-Master builds upon the robust Ultralytics framework, inheriting support for various computer vision tasks. While our research primarily focuses on Real-Time Object Detection, the codebase also supports:
-
- | Task | Status | Description |
- |:-----|:------:|:------------|
- | **Object Detection** | ✅ | Real-time object detection with ES-MoE acceleration. |
- | **Instance Segmentation** | ✅ | Experimental support (inherited from Ultralytics). |
- | **Pose Estimation** | 🚧 | Experimental support (inherited from Ultralytics). |
- | **OBB Detection** | 🚧 | Experimental support (inherited from Ultralytics). |
- | **Classification** | ✅ | Image classification support. |
-
- ## ⚙️ Quick Start
-
- ### Installation
-
- <details open>
- <summary><strong>Install via pip (Recommended)</strong></summary>
-
- ```bash
- # 1. Create and activate a new environment
- conda create -n yolo_master python=3.11 -y
- conda activate yolo_master
-
- # 2. Clone the repository
- git clone https://github.com/isLinXu/YOLO-Master
- cd YOLO-Master
-
- # 3. Install dependencies
- pip install -r requirements.txt
- pip install -e .
-
- # 4. Optional: Install FlashAttention for faster training (CUDA required)
- pip install flash_attn
- ```
- </details>
-
- ### Validation
-
- Validate the model accuracy on the COCO dataset.
-
- ```python
- from ultralytics import YOLO
-
- # Load the pretrained model
- model = YOLO("yolo_master_n.pt")
-
- # Run validation
- metrics = model.val(data="coco.yaml", save_json=True)
- print(metrics.box.map)  # mAP50-95
- ```
-
- ### Training
-
- Train a new model on COCO or your custom dataset.
-
- ```python
- from ultralytics import YOLO
-
- # Build a new model from its YAML definition
- model = YOLO('cfg/models/master/v0/det/yolo-master-n.yaml')
-
- # Train the model
- results = model.train(
-     data='coco.yaml',
-     epochs=600,
-     batch=256,
-     imgsz=640,
-     device="0,1,2,3",  # Use multiple GPUs
-     scale=0.5,
-     mosaic=1.0,
-     mixup=0.0,
-     copy_paste=0.1
- )
- ```
-
- ### Inference
-
- Run inference on images or videos.
-
- **Python:**
- ```python
- from ultralytics import YOLO
-
- model = YOLO("yolo_master_n.pt")
- results = model("path/to/image.jpg")
- results[0].show()
- ```
-
- **CLI:**
- ```bash
- yolo predict model=yolo_master_n.pt source='path/to/image.jpg' show=True
- ```
-
- ### Export
-
- Export the model to other formats for deployment (TensorRT, ONNX, etc.).
-
- ```python
- from ultralytics import YOLO
-
- model = YOLO("yolo_master_n.pt")
- model.export(format="engine", half=True)  # Export to TensorRT
- # formats: onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs
- ```
-
- ### Gradio Demo
-
- Launch a local web interface to test the model interactively. The app provides a user-friendly Gradio dashboard for model inference, with automatic model scanning, task switching (Detection, Segmentation, Classification), and real-time visualization.
-
- ```bash
- python app.py
- # Open http://127.0.0.1:7860 in your browser
- ```
335
-
336
- ## 🤝 Community & Contributing
337
-
338
- We welcome contributions! Please check out our [Contribution Guidelines](CONTRIBUTING.md) for details on how to get involved.
339
-
340
- - **Issues**: Report bugs or request features [here](https://github.com/isLinXu/YOLO-Master/issues).
341
- - **Pull Requests**: Submit your improvements.
342
-
343
- ## 📄 License
344
-
345
- This project is licensed under the [GNU Affero General Public License v3.0 (AGPL-3.0)](LICENSE).
346
-
347
- ## 🙏 Acknowledgements
348
-
349
- This work builds upon the excellent [Ultralytics](https://github.com/ultralytics/ultralytics) framework. Huge thanks to the community for contributions, deployments, and tutorials!
350
-
351
- ## 📝 Citation
352
-
353
- If you use YOLO-Master in your research, please cite our paper:
354
-
355
- ```bibtex
356
- @article{lin2025yolomaster,
357
- title={{YOLO-Master}: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection},
358
- author={Lin, Xu and Peng, Jinlong and Gan, Zhenye and Zhu, Jiawen and Liu, Jun},
359
- journal={arXiv preprint arXiv:},
360
- year={2025}
361
- }
362
- ```
363
-
364
- ⭐ **If you find this work useful, please star the repository!**
  ---
+ title: YOLO Master WebUI Demo
+ emoji: 🚀
+ colorFrom: green
+ colorTo: blue
+ sdk: gradio
+ sdk_version: "4.44.0"
+ app_file: app.py
+ pinned: false
  ---

+ # YOLO Master WebUI Demo
+
+ This Space runs a Gradio-based YOLO Master WebUI demo.
+
+ > import os
+ > import gc
+ > import warnings
+ > from pathlib import Path
+ > from typing import List, Dict, Optional, Tuple, Any
+ >
+ > import gradio as gr
+ > import numpy as np
+ > import pandas as pd
+ > import cv2
+ > import torch
+ > from ultralytics import YOLO
+ > try:
+ >     from huggingface_hub import hf_hub_download
+ > except Exception:
+ >     hf_hub_download = None
+ >
+ > # Ignore unnecessary warnings
+ > warnings.filterwarnings("ignore")
+ >
+ >
+ > class GlobalConfig:
+ >     """Global configuration parameters for easy modification."""
+ >     # Default model files mapping
+ >     DEFAULT_MODELS = {
+ >         "detect": "ckpts/yolo-master-v0.1-n.pt",
+ >         "seg": "ckpts/yolo-master-seg-n.pt",
+ >         "cls": "ckpts/yolo-master-cls-n.pt",
+ >         "pose": "yolov8n-pose.pt",
+ >         "obb": "yolov8n-obb.pt"
+ >     }
+ >     # Allowed image formats
+ >     IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
+ >     # UI Theme
+ >     THEME = gr.themes.Soft(primary_hue="blue", neutral_hue="slate")
+ >     DEFAULT_IMAGE_DIR = "./image"
+ >
+ >
+ > class ModelManager:
+ >     """Handles model scanning, loading, and memory management."""
+ >     def __init__(self, ckpts_root: Path):
+ >         self.ckpts_root = ckpts_root
+ >         self.current_model: Optional[YOLO] = None
+ >         self.current_model_path: str = ""
+ >         self.current_task: str = "detect"
+ >
+ >     def scan_checkpoints(self) -> Dict[str, List[str]]:
+ >         """Scan the checkpoint directory and categorize models by task."""
+ >         model_map = {k: [] for k in GlobalConfig.DEFAULT_MODELS.keys()}
+ >
+ >         if not self.ckpts_root.exists():
+ >             return model_map
+ >
+ >         # Recursively find all .pt files
+ >         for p in self.ckpts_root.rglob("*.pt"):
+ >             if p.is_dir():
+ >                 continue
+ >
+ >             path_str = str(p.absolute())
+ >             filename = p.name.lower()
+ >             parent = p.parent.name.lower()
+ >
+ >             # Keyword-based task classification
+ >             if "seg" in filename or "seg" in parent:
+ >                 model_map["seg"].append(path_str)
+ >             elif "cls" in filename or "class" in filename or "cls" in parent:
+ >                 model_map["cls"].append(path_str)
+ >             elif "pose" in filename or "pose" in parent:
+ >                 model_map["pose"].append(path_str)
+ >             elif "obb" in filename or "obb" in parent:
+ >                 model_map["obb"].append(path_str)
+ >             else:
+ >                 model_map["detect"].append(path_str)  # Default to detect
+ >
+ >         # Deduplicate and sort
+ >         for k in model_map:
+ >             model_map[k] = sorted(set(model_map[k]))
+ >
+ >         return model_map
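The keyword routing inside `scan_checkpoints` can be exercised on its own. Below is a standalone restatement of that branch logic as a hypothetical helper (not part of the repo), which makes the precedence of the keywords easy to check:

```python
def classify_task(filename: str, parent: str = "") -> str:
    """Mirror the keyword rules used by ModelManager.scan_checkpoints.

    Checks filename and parent-directory name, in the same order of
    precedence as the diff above: seg -> cls -> pose -> obb -> detect.
    """
    name, parent = filename.lower(), parent.lower()
    if "seg" in name or "seg" in parent:
        return "seg"
    if "cls" in name or "class" in name or "cls" in parent:
        return "cls"
    if "pose" in name or "pose" in parent:
        return "pose"
    if "obb" in name or "obb" in parent:
        return "obb"
    return "detect"  # anything unmatched is treated as a detector
```

Note that the rules are purely lexical, so a checkpoint whose name or folder accidentally contains one of the keywords will be mis-filed; the order of the branches decides ties.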
packages.txt ADDED
@@ -0,0 +1,2 @@
+ ffmpeg
+ libgl1
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ ultralytics
+ gradio==4.44.0
+ opencv-python
+ pillow
+ numpy
+ matplotlib
+ torch==2.1.2
+ torchvision==0.16.2