Spaces: Build error

isLinXu committed · Commit 5a441f1 · Parent(s): 2793310

Add Spaces YAML config, dependencies and system packages

Files changed:
- README.md +92 -361
- packages.txt +2 -0
- requirements.txt +8 -0
README.md
CHANGED

@@ -1,364 +1,95 @@

Removed (old README):
<div align="center">
<h1>YOLO-MASTER</h1>

<p align="left"> <a href="https://huggingface.co/spaces/xx"> <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue" alt="Hugging Face Spaces"> </a> <a href="https://colab.research.google.com/github/isLinXu/YOLO-Master"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"> </a> <a href="https://arxiv.org/abs/2512.23273"> <img src="https://img.shields.io/badge/arXiv-2512.23273-b31b1b.svg" alt="arXiv"> </a> <a href="https://github.com/isLinXu/YOLO-Master/releases"> <img src="https://img.shields.io/badge/%F0%9F%93%A6-Model%20Zoo-orange" alt="Model Zoo"> </a> <a href="./LICENSE"> <img src="https://img.shields.io/badge/License-AGPL%203.0-blue.svg" alt="AGPL 3.0"> </a> <a href="https://github.com/ultralytics/ultralytics"> <img src="https://img.shields.io/badge/Ultralytics-YOLO-blue" alt="Ultralytics"> </a> </p>

<p align="center">
YOLO-Master:
<b><u>M</u></b>OE-<b><u>A</u></b>ccelerated with
<b><u>S</u></b>pecialized <b><u>T</u></b>ransformers for
<b><u>E</u></b>nhanced <b><u>R</u></b>eal-time Detection.
</p>
</div>

<div align="center">
<div style="text-align: center; margin-bottom: 8px;">
<a href="https://github.com/isLinXu" style="text-decoration: none;"><b>Xu Lin</b></a><sup>1*</sup>
<a href="https://pjl1995.github.io/" style="text-decoration: none;"><b>Jinlong Peng</b></a><sup>1*</sup>
<a href="https://scholar.google.com/citations?user=fa4NkScAAAAJ" style="text-decoration: none;"><b>Zhenye Gan</b></a><sup>1</sup>
<a href="https://scholar.google.com/citations?hl=en&user=cU0UfhwAAAAJ" style="text-decoration: none;"><b>Jiawen Zhu</b></a><sup>2</sup>
<a href="https://scholar.google.com/citations?user=JIKuf4AAAAAJ&hl=zh-TW" style="text-decoration: none;"><b>Jun Liu</b></a><sup>1</sup>
</div>

<div style="text-align: center; margin-bottom: 4px; font-size: 0.95em;">
<sup>1</sup>Tencent Youtu Lab
<sup>2</sup>Singapore Management University
</div>

<div style="text-align: center; margin-bottom: 12px; font-size: 0.85em; color: #666; font-style: italic;">
<sup>*</sup>Equal Contribution
</div>

<div style="text-align: center;">
<div style="font-family: 'Courier New', Courier, monospace; font-size: 0.85em; background-color: #f6f8fa; padding: 10px; border-radius: 6px; display: inline-block; line-height: 1.4; text-align: left;">
{gatilin, jeromepeng, wingzygan, juliusliu}@tencent.com <br>
jwzhu.2022@phdcs.smu.edu.sg
</div>
</div>
</div>
<br>

[English](README.md) | [简体中文](README_CN.md)

---

Looking forward, we draw inspiration from the transformative advancements in LLMs and VLMs. We are committed to refining this approach and extending these insights to fundamental vision tasks, with the ultimate goal of tackling more ambitious frontiers like Open-Vocabulary Detection and Open-Set Segmentation.

<details>
<summary>
<font size="+1"><b>Abstract</b></font>
</summary>
Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and computational resources: they over-allocate on trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance.

To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. Additionally, the routing network adaptively learns to activate only the most relevant experts, thereby improving detection performance while minimizing computational overhead during inference.

Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP with 1.62 ms latency, outperforming YOLOv13-N by +0.8% mAP with 17.8% faster inference. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code: [isLinXu/YOLO-Master](https://github.com/isLinXu/YOLO-Master)
</details>
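The ES-MoE block itself is defined in the paper and codebase; purely as an illustration of the general idea of instance-conditional top-k expert routing (every name, shape, and the routing formula below are our own assumptions, not the paper's implementation), a minimal NumPy sketch:

```python
import numpy as np

def topk_moe(x, experts, router_w, k=2):
    """Instance-conditional sparse MoE: the router scores every expert on
    this input, only the top-k experts are executed, and their outputs are
    blended with renormalized softmax weights of the selected scores."""
    logits = router_w @ x                      # one routing score per expert
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # renormalize over selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
dim, n_experts = 8, 4
# Each "expert" is just a random linear map in this toy example.
weights = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
experts = [lambda v, W=W: W @ v for W in weights]
router_w = rng.standard_normal((n_experts, dim))

x = rng.standard_normal(dim)
y = topk_moe(x, experts, router_w, k=2)
print(y.shape)  # (8,)
```

Only k of the experts actually run per input, so compute scales with k rather than with the total expert count; the diversity-enhancing training objective described above is not modeled here.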

---

<div align="center">
<table>
<tbody>
<tr>
<td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv11-N</td>
<td>39.4</td><td style="border-right:1px solid #ddd;">55.3</td>
<td>61.0</td><td style="border-right:1px solid #ddd;">81.2</td>
<td>18.5</td><td style="border-right:1px solid #ddd;">32.2</td>
<td>67.8</td><td style="border-right:1px solid #ddd;">89.8</td>
<td>57.4</td><td style="border-right:1px solid #ddd;">90.0</td>
<td>1.50</td>
</tr>
<tr>
<td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv12-N</td>
<td>40.6</td><td style="border-right:1px solid #ddd;">56.7</td>
<td>60.7</td><td style="border-right:1px solid #ddd;">80.8</td>
<td>18.3</td><td style="border-right:1px solid #ddd;">31.7</td>
<td>67.6</td><td style="border-right:1px solid #ddd;">89.3</td>
<td>57.4</td><td style="border-right:1px solid #ddd;">90.0</td>
<td>1.64</td>
</tr>
<tr style="border-bottom:1px solid #000;">
<td style="padding:6px; text-align:left; border-right:1px solid #ddd;">YOLOv13-N</td>
<td>41.6</td><td style="border-right:1px solid #ddd;">57.8</td>
<td>60.7</td><td style="border-right:1px solid #ddd;">80.3</td>
<td>17.5</td><td style="border-right:1px solid #ddd;">30.6</td>
<td>67.7</td><td style="border-right:1px solid #ddd;">90.6</td>
<td>57.5</td><td style="border-right:1px solid #ddd;">90.3</td>
<td>1.97</td>
</tr>
<tr style="background-color:#f9f9f9;">
<td style="padding:8px; text-align:left; border-right:1px solid #ddd;"><b>YOLO-Master-N</b></td>
<td><b>42.4</b></td><td style="border-right:1px solid #ddd;"><b>59.2</b></td>
<td><b>62.1</b></td><td style="border-right:1px solid #ddd;"><b>81.9</b></td>
<td><b>19.6</b></td><td style="border-right:1px solid #ddd;"><b>33.7</b></td>
<td><b>69.2</b></td><td style="border-right:1px solid #ddd;"><b>91.3</b></td>
<td><b>58.2</b></td><td style="border-right:1px solid #ddd;"><b>90.6</b></td>
<td><b>1.62</b></td>
</tr>
</tbody>
</table>
</div>

### Segmentation

| **Model**             | **Size** | **mAPbox (%)** | **mAPmask (%)** | **Gain (mAPmask)** |
| --------------------- | -------- | -------------- | --------------- | ------------------ |
| YOLOv11-seg-N         | 640      | 38.9           | 32.0            | -                  |
| YOLOv12-seg-N         | 640      | 39.9           | 32.8            | Baseline           |
| **YOLO-Master-seg-N** | **640**  | **42.9**       | **35.6**        | **+2.8%** 🚀       |

### Classification

| **Model**             | **Dataset**  | **Input Size** | **Top-1 Acc (%)** | **Top-5 Acc (%)** | **Comparison**     |
| --------------------- | ------------ | -------------- | ----------------- | ----------------- | ------------------ |
| YOLOv11-cls-N         | ImageNet     | 224            | 70.0              | 89.4              | Baseline           |
| YOLOv12-cls-N         | ImageNet     | 224            | 71.7              | 90.5              | +1.7% Top-1        |
| **YOLO-Master-cls-N** | **ImageNet** | **224**        | **76.6**          | **93.4**          | **+4.9% Top-1** 🔥 |

## 🖼️ Detection Examples

<div align="center">
<img width="1416" height="856" alt="Detection Examples" src="https://github.com/user-attachments/assets/0e1fbe4a-34e7-489e-b936-6d121ede5cf6" /> </div>
<table border="0"> <tr> <td align="center" style="font-weight: bold; background-color: #f6f8fa;"> <b>Detection</b> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/db350acd-1d91-4be6-96b2-6bdf8aac57e8" alt="Detection 1" style="width:100%; display:block; border-radius:4px;"> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/b6c80dbd-120e-428b-8d26-ea2b38a40b47" alt="Detection 2" style="width:100%; display:block; border-radius:4px;"> </td> </tr> <tr> <td align="center" style="font-weight: bold; background-color: #f6f8fa;"> <b>Segmentation</b> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/edb05e3c-cd83-41db-89f8-8ef09fc22798" alt="Segmentation 1" style="width:100%; display:block; border-radius:4px;"> </td> <td width="45%"> <img src="https://github.com/user-attachments/assets/ea138674-d7c7-48fb-b272-3ec211d161bf" alt="Segmentation 2" style="width:100%; display:block; border-radius:4px;"> </td> </tr> </table>

## 🧩 Supported Tasks

YOLO-Master builds upon the robust Ultralytics framework, inheriting support for various computer vision tasks. While our research primarily focuses on Real-Time Object Detection, the codebase is capable of supporting:

| Task | Status | Description |
|:-----|:------:|:------------|
| **Object Detection** | ✅ | Real-time object detection with ES-MoE acceleration. |
| **Instance Segmentation** | ✅ | Experimental support (inherited from Ultralytics). |
| **Pose Estimation** | 🚧 | Experimental support (inherited from Ultralytics). |
| **OBB Detection** | 🚧 | Experimental support (inherited from Ultralytics). |
| **Classification** | ✅ | Image classification support. |

## ⚙️ Quick Start

### Installation

<details open>
<summary><strong>Install via pip (Recommended)</strong></summary>

```bash
# 1. Create and activate a new environment
conda create -n yolo_master python=3.11 -y
conda activate yolo_master

# 2. Clone the repository
git clone https://github.com/isLinXu/YOLO-Master
cd YOLO-Master

# 3. Install dependencies
pip install -r requirements.txt
pip install -e .

# 4. Optional: install FlashAttention for faster training (CUDA required)
pip install flash_attn
```
</details>

### Validation

Validate the model accuracy on the COCO dataset.

```python
from ultralytics import YOLO

# Load the pretrained model
model = YOLO("yolo_master_n.pt")

# Run validation
metrics = model.val(data="coco.yaml", save_json=True)
print(metrics.box.map)  # mAP50-95
```
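`metrics.box.map` reports COCO-style mAP50-95, i.e. AP averaged over the ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. With hypothetical per-threshold APs (the numbers below are made up, not model results), the averaging step is just:

```python
# mAP50-95 averages AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95.
thresholds = [0.50 + 0.05 * i for i in range(10)]
# Hypothetical per-threshold APs (AP typically falls as the IoU threshold rises).
ap_per_iou = {t: 0.60 - 0.4 * (t - 0.50) for t in thresholds}
map50_95 = sum(ap_per_iou[t] for t in thresholds) / len(thresholds)
print(round(map50_95, 2))  # 0.51
```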
### Training

Train a new model on your custom dataset or on COCO.

```python
from ultralytics import YOLO

# Build a new model from its YAML definition
model = YOLO('cfg/models/master/v0/det/yolo-master-n.yaml')

# Train the model
results = model.train(
    data='coco.yaml',
    epochs=600,
    batch=256,
    imgsz=640,
    device="0,1,2,3",  # Use multiple GPUs
    scale=0.5,
    mosaic=1.0,
    mixup=0.0,
    copy_paste=0.1
)
```
### Inference

Run inference on images or videos.

**Python:**
```python
from ultralytics import YOLO

model = YOLO("yolo_master_n.pt")
results = model("path/to/image.jpg")
results[0].show()
```

**CLI:**
```bash
yolo predict model=yolo_master_n.pt source='path/to/image.jpg' show=True
```
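Ultralytics handles input resizing internally, letterboxing each image onto the square `imgsz` canvas. As a rough standalone sketch of that scale-and-pad arithmetic (our own minimal version, not the library's code):

```python
def letterbox_params(h, w, size=640):
    """Scale factor and padding that map an h*w image onto a size*size
    canvas while preserving aspect ratio (bars fill the remainder)."""
    r = min(size / h, size / w)              # fit the longer side exactly
    new_h, new_w = round(h * r), round(w * r)
    pad_h, pad_w = size - new_h, size - new_w
    top, left = pad_h // 2, pad_w // 2       # center the resized image
    return r, (new_h, new_w), (top, pad_h - top, left, pad_w - left)

r, shape, pads = letterbox_params(480, 640)
print(r, shape, pads)  # 1.0 (480, 640) (80, 80, 0, 0)
```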
### Export

Export the model to other formats for deployment (TensorRT, ONNX, etc.).

```python
from ultralytics import YOLO

model = YOLO("yolo_master_n.pt")
model.export(format="engine", half=True)  # Export to TensorRT
# Other formats: onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs
```
### Gradio Demo

Launch a local web interface to test the model interactively. This application provides a user-friendly Gradio dashboard for model inference, supporting automatic model scanning, task switching (Detection, Segmentation, Classification), and real-time visualization.

```bash
python app.py
# Open http://127.0.0.1:7860 in your browser
```
## 🤝 Community & Contributing

We welcome contributions! Please check out our [Contribution Guidelines](CONTRIBUTING.md) for details on how to get involved.

- **Issues**: Report bugs or request features [here](https://github.com/isLinXu/YOLO-Master/issues).
- **Pull Requests**: Submit your improvements.

## 📄 License

This project is licensed under the [GNU Affero General Public License v3.0 (AGPL-3.0)](LICENSE).

## 🙏 Acknowledgements

This work builds upon the excellent [Ultralytics](https://github.com/ultralytics/ultralytics) framework. Huge thanks to the community for contributions, deployments, and tutorials!

## 📝 Citation

If you use YOLO-Master in your research, please cite our paper:

```bibtex
@article{lin2025yolomaster,
  title={{YOLO-Master}: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection},
  author={Lin, Xu and Peng, Jinlong and Gan, Zhenye and Zhu, Jiawen and Liu, Jun},
  journal={arXiv preprint arXiv:2512.23273},
  year={2025}
}
```

⭐ **If you find this work useful, please star the repository!**

Added (new README):

---
title: YOLO Master WebUI Demo
emoji: 🚀
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# YOLO Master WebUI Demo

This Space runs a Gradio-based YOLO Master WebUI demo.

```python
import os
import gc
import warnings
from pathlib import Path
from typing import List, Dict, Optional, Tuple, Any

import gradio as gr
import numpy as np
import pandas as pd
import cv2
import torch
from ultralytics import YOLO
try:
    from huggingface_hub import hf_hub_download
except Exception:
    hf_hub_download = None

# Ignore unnecessary warnings
warnings.filterwarnings("ignore")


class GlobalConfig:
    """Global configuration parameters for easy modification."""
    # Default model files mapping
    DEFAULT_MODELS = {
        "detect": "ckpts/yolo-master-v0.1-n.pt",
        "seg": "ckpts/yolo-master-seg-n.pt",
        "cls": "ckpts/yolo-master-cls-n.pt",
        "pose": "yolov8n-pose.pt",
        "obb": "yolov8n-obb.pt"
    }
    # Allowed image formats
    IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
    # UI Theme
    THEME = gr.themes.Soft(primary_hue="blue", neutral_hue="slate")
    DEFAULT_IMAGE_DIR = "./image"


class ModelManager:
    """Handles model scanning, loading, and memory management."""
    def __init__(self, ckpts_root: Path):
        self.ckpts_root = ckpts_root
        self.current_model: Optional[YOLO] = None
        self.current_model_path: str = ""
        self.current_task: str = "detect"

    def scan_checkpoints(self) -> Dict[str, List[str]]:
        """Scans the checkpoint directory and categorizes models by task."""
        model_map = {k: [] for k in GlobalConfig.DEFAULT_MODELS.keys()}

        if not self.ckpts_root.exists():
            return model_map

        # Recursively find all .pt files
        for p in self.ckpts_root.rglob("*.pt"):
            if p.is_dir():
                continue

            path_str = str(p.absolute())
            filename = p.name.lower()
            parent = p.parent.name.lower()

            # Classify by task keywords in the file or directory name
            if "seg" in filename or "seg" in parent:
                model_map["seg"].append(path_str)
            elif "cls" in filename or "class" in filename or "cls" in parent:
                model_map["cls"].append(path_str)
            elif "pose" in filename or "pose" in parent:
                model_map["pose"].append(path_str)
            elif "obb" in filename or "obb" in parent:
                model_map["obb"].append(path_str)
            else:
                model_map["detect"].append(path_str)  # Default to detect

        # Deduplicate and sort
        for k in model_map:
            model_map[k] = sorted(set(model_map[k]))

        return model_map
```
packages.txt
ADDED

@@ -0,0 +1,2 @@

ffmpeg
libgl1
requirements.txt
ADDED

@@ -0,0 +1,8 @@

ultralytics
gradio==4.44.0
opencv-python
pillow
numpy
matplotlib
torch==2.1.2
torchvision==0.16.2