---
license: apache-2.0
tags:
- object-detection
- yolo
- edge-ai
- quantization
datasets:
- MTID
- Roboflow
- VisDrone
library_name: ultralytics
---

# YOLO11 GhostConv + Knowledge Distillation + Quantization

This notebook implements a complete model optimization pipeline for YOLO11 targeting edge devices, including: custom architecture with GhostConv, Knowledge Distillation, and Quantization.

## Table of Contents

- [Overview](#overview)
- [Notebook Structure](#notebook-structure)
- [System Requirements](#system-requirements)
- [Installation](#installation)
- [Usage Guide](#usage-guide)
- [Results](#results)
- [References](#references)

## Overview

This notebook implements a 3-stage YOLO11 optimization pipeline:

### 1. Custom Architecture (YOLO11n-GhostConv)
- Replace Conv layers with **GhostConv** to reduce parameters
- Retain C3k2 and C2PSA blocks for feature extraction
- Architecture optimized for traffic dataset (5 classes)

### 2. Knowledge Distillation (KD)
- Teacher model: YOLO11l (large model)
- Student model: YOLO11n-GhostConv (custom lightweight)
- Techniques:
  - Feature-based distillation (MSE loss)
  - Logit-based distillation (KL divergence)
  - Temperature scaling (T=4.0)
  - Progressive KD with warmup epochs

### 3. Quantization
- FP32 → INT8 quantization with TFLite
- FP32 → FP16 quantization
- Calibration dataset for INT8
- Performance comparison: FP32 vs INT8 vs FP16

## Notebook Structure

### Section 1: Initialization
- Mount Google Drive
- Setup project directories
- Import Ultralytics modules (GhostConv, C3k2, C2PSA)
- Clone and install Ultralytics from source

### Section 2: Custom Architecture
- Define YOLO11_TinyGhost architecture in YAML
- Backbone with GhostConv layers
- Head with Detect layer for 5 classes
- Train baseline model (50 epochs)

### Section 3: Knowledge Distillation
**Class implementations:**
- `KDConfig`: Configuration for KD training
- `KnowledgeDistillationTrainer`: Custom trainer inheriting from DetectionTrainer
- Forward hooks to capture intermediate features
- Feature distillation loss (normalized MSE)
- Logit distillation loss (KL divergence with temperature)
- Combined loss: `(1-α-β)*L_hard + α*L_feature + β*L_logit`

**Training strategy:**
- Warmup phase (8 epochs): hard loss only
- After warmup: combine hard + KD losses
- KD layers: ["model.4", "model.6", "model.10"] (P3, P4, PSA)
- Hyperparameters: α=0.3, β=0.2, T=4.0

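The warmup strategy above can be sketched as a per-epoch weight schedule. This is a hedged illustration rather than the notebook's code: `kd_loss_weights` is a hypothetical name, and it assumes zero-based epoch numbering and a hard switch (no ramp) at the end of warmup.

```python
def kd_loss_weights(epoch, warmup_epochs=8, alpha=0.3, beta=0.2):
    """Return (w_hard, w_feature, w_logit) for a zero-based epoch index."""
    if epoch < warmup_epochs:
        # Warmup phase: train on the ground-truth (hard) loss only
        return 1.0, 0.0, 0.0
    # After warmup: (1 - alpha - beta) * L_hard + alpha * L_feature + beta * L_logit
    return 1.0 - alpha - beta, alpha, beta
```

With the defaults, epochs 0 through 7 use the hard loss alone, and from epoch 8 onward the three weights are roughly 0.5, 0.3, and 0.2, so they still sum to 1.
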
### Section 4: Visualization
- Training metrics plotting (mAP, loss curves)
- F1 score tracking
- Precision/Recall curves
- Box/Class/DFL loss comparison

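When F1 is not logged directly, it can be derived from precision and recall. A minimal sketch, assuming the training log is the `results.csv` that Ultralytics writes to the run directory and that its columns are named `metrics/precision(B)` and `metrics/recall(B)` (names can differ between versions, so treat them as assumptions). `f1_score` and `f1_per_epoch` are hypothetical helpers, not notebook functions.

```python
import csv


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


def f1_per_epoch(csv_path,
                 precision_col="metrics/precision(B)",
                 recall_col="metrics/recall(B)"):
    """Read a training log CSV and return one F1 value per row (epoch)."""
    scores = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Some versions pad column names with spaces, so strip the keys
            clean = {key.strip(): value for key, value in row.items()}
            scores.append(f1_score(float(clean[precision_col]),
                                   float(clean[recall_col])))
    return scores
```

The resulting list can be plotted against the epoch index alongside the mAP and loss curves.
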
### Section 5: Fine-tuning
- Load best KD checkpoint
- Fine-tune on multi-view intersection dataset
- Freeze 3 backbone layers
- Low learning rate (1e-5) with cosine scheduler

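The fine-tuning settings listed above map onto Ultralytics training arguments roughly as follows. This is a sketch under assumptions: `freeze`, `lr0`, and `cos_lr` are existing `train()` arguments, but the checkpoint path, the dataset YAML, and the `fine_tune` wrapper are placeholders, and the notebook may set further options not shown here.

```python
# Training overrides taken from the list above
FINETUNE_ARGS = {
    "freeze": 3,     # freeze the first 3 layers of the model
    "lr0": 1e-5,     # low initial learning rate
    "cos_lr": True,  # cosine learning-rate schedule
}


def fine_tune(checkpoint_path, data_yaml):
    """Load a KD checkpoint and fine-tune it (hypothetical wrapper)."""
    from ultralytics import YOLO  # imported lazily; requires ultralytics

    model = YOLO(checkpoint_path)
    return model.train(data=data_yaml, **FINETUNE_ARGS)
```

Keeping the overrides in one dictionary makes it easy to log them next to the run and to reuse them across datasets.
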
### Section 6: Quantization
**Export formats:**
- INT8 TFLite (with calibration dataset)
- FP16 TFLite

**Evaluation:**
- Compare mAP50 and mAP50-95
- FP32 vs INT8 vs FP16
- Image size: 416x416

## System Requirements

### Hardware
- GPU: CUDA-compatible (T4 or better recommended)
- RAM: 16GB+
- Storage: 10GB+ for datasets and models

### Software
```
Python >= 3.8
PyTorch >= 1.13
CUDA >= 11.3
Google Colab (recommended)
```

## Installation

### 1. Clone Ultralytics from source
```bash
!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics
!pip install -e .
```

### 2. Dependencies
```bash
!pip install torch torchvision
!pip install matplotlib pandas
!pip install opencv-python pillow
```

### 3. Dataset structure
```
dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
```

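Before training, it can help to confirm that the layout above is complete. The following is a minimal sketch, not part of the notebook: `find_unlabeled_images` is a hypothetical helper, and it assumes the usual YOLO convention that each image in `images/<split>/` has a same-named `.txt` file in `labels/<split>/`.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image types


def find_unlabeled_images(dataset_dir, split="train"):
    """Return image paths in images/<split>/ with no labels/<split>/<stem>.txt."""
    root = Path(dataset_dir)
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    missing = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        if not (label_dir / f"{image_path.stem}.txt").is_file():
            missing.append(image_path)
    return missing
```

An image without a label file is usually treated as a background image by YOLO-style loaders, so a non-empty result is a prompt to check the data, not necessarily an error.
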
## Usage Guide

### Step 1: Prepare Data
```python
PROJECT_DIR = "/content/drive/MyDrive/yolo_ghostblock"
DATASET_DIR = "/content/drive/MyDrive/dataset/yolo_mtid_motor/dataset"
```

### Step 2: Train Baseline GhostConv Model
```python
from ultralytics import YOLO

# Build the custom GhostConv model from its YAML definition and train it
model = YOLO("yolo11_tinyghost.yaml")
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    epochs=50,
    imgsz=640,
    device=0
)
```

### Step 3: Knowledge Distillation
```python
from ultralytics import YOLO

# Load teacher and student
teacher = YOLO("path/to/teacher.pt")
student = YOLO("path/to/student.pt")

# Create KD trainer (create_kd_trainer_class is a notebook helper, not part of Ultralytics)
TrainerClass = create_kd_trainer_class(
    teacher_model=teacher,
    kd_alpha=0.3,
    kd_beta=0.2,
    kd_temperature=4.0,
    kd_layers=["model.4", "model.6", "model.10"]
)

# Train with KD
trainer = TrainerClass(overrides={...})
trainer.train()
```

### Step 4: Quantization
```python
# Export INT8
model.export(
    format="tflite",
    int8=True,
    data=CALIB_YAML,
    imgsz=416
)

# Evaluate quantized model
model_int8 = YOLO("best_int8.tflite")
metrics = model_int8.val(data=DATA_YAML, imgsz=416)
```

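The export above needs a calibration dataset behind `CALIB_YAML`. One way to build it, sketched here under assumptions, is to draw a fixed-size random sample of training images and write their paths to a text file that a dataset YAML can point to. `sample_calibration_images`, `write_image_list`, and the output file name are hypothetical; how the notebook actually builds its calibration set is not shown in this README.

```python
import random
from pathlib import Path


def sample_calibration_images(image_dir, n_samples=100, seed=0):
    """Pick up to n_samples image paths from image_dir, reproducibly."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    rng = random.Random(seed)  # fixed seed so the subset is repeatable
    count = min(n_samples, len(images))
    return sorted(rng.sample(images, count))


def write_image_list(paths, list_file):
    """Write one image path per line."""
    Path(list_file).write_text("\n".join(str(p) for p in paths) + "\n")
```

A call such as `write_image_list(sample_calibration_images(f"{DATASET_DIR}/images/train"), "calib.txt")` would produce the list. Which split of the YAML the exporter reads for calibration can depend on the Ultralytics version, so it is worth checking the export documentation before relying on this.
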
## Results

### Model Comparison

| Model | Parameters | Size | mAP50 | mAP50-95 |
|-------|-----------|------|-------|----------|
| YOLO11l (Teacher) | ~20M | ~40MB | 0.95+ | 0.80+ |
| YOLO11n-Ghost | ~2M | ~4MB | 0.92+ | 0.75+ |
| + KD | ~2M | ~4MB | 0.94+ | 0.78+ |
| + INT8 | ~2M | ~1MB | 0.93+ | 0.76+ |

### Quantization Impact
- **FP32 → INT8**: ~75% size reduction, ~1-2% mAP drop
- **FP32 → FP16**: ~50% size reduction, ~0.5% mAP drop

### Training Curves
- Box Loss: converges after ~30 epochs
- mAP50: plateaus at ~35-40 epochs
- F1 Score: 0.85-0.90 range

## Technical Details

### GhostConv Architecture
```yaml
backbone:
  - [-1, 1, GhostConv, [64, 3, 2]]
  - [-1, 1, GhostConv, [128, 3, 2]]
  - [-1, 1, C3k2, [256, False, 0.25]]
  # ...
```

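To see where the savings come from, the weight count of a GhostConv layer can be compared with a standard convolution. The sketch below assumes the Ultralytics GhostConv layout: a primary convolution producing half of the output channels, followed by a 5x5 depthwise convolution that generates the other half. It counts convolution weights only (no BatchNorm parameters), uses the unscaled channel counts from the YAML (64 to 128, 3x3 kernel), and the function names are illustrative.

```python
def conv_weights(c_in, c_out, kernel, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    return (c_in // groups) * c_out * kernel * kernel


def ghostconv_weights(c_in, c_out, kernel):
    """Weights in a GhostConv: primary conv plus a cheap 5x5 depthwise conv."""
    hidden = c_out // 2
    primary = conv_weights(c_in, hidden, kernel)
    cheap = conv_weights(hidden, hidden, 5, groups=hidden)  # depthwise
    return primary + cheap


standard = conv_weights(64, 128, 3)    # 73,728 weights
ghost = ghostconv_weights(64, 128, 3)  # 38,464 weights
print(f"GhostConv uses {ghost / standard:.1%} of the standard weights")  # 52.2%
```

The primary convolution accounts for almost all of the remaining cost, which is why the saving stays close to one half for typical channel counts.
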
### KD Loss Formula
```
L_total = (1 - α - β) * L_hard + α * L_feature + β * L_logit

L_feature = MSE(normalize(S_feat), normalize(T_feat))
L_logit = KL(softmax(z_S / T), softmax(z_T / T)) * T²

# S_feat, T_feat: student and teacher features
# z_S, z_T: student and teacher logits; T: temperature
```

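A small numeric sketch of the logit and combined losses may make the formulas concrete. It is framework-free on purpose and is not the notebook's trainer code: the function names are hypothetical, the feature loss is passed in as a plain number, and the code assumes the teacher distribution is the KL target, the convention of PyTorch's `kl_div(input, target)`.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    return kl * temperature ** 2


def total_loss(l_hard, l_feature, l_logit, alpha=0.3, beta=0.2):
    """(1 - alpha - beta) * L_hard + alpha * L_feature + beta * L_logit."""
    return (1 - alpha - beta) * l_hard + alpha * l_feature + beta * l_logit


# Made-up logits and loss values, for illustration only
l_logit = logit_kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0])
print(total_loss(l_hard=1.2, l_feature=0.4, l_logit=l_logit))
```

The factor T² compensates for the gradients of the softened distributions shrinking as the temperature grows, so the logit term stays on a comparable scale to the hard loss.
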
### Quantization Config
- **INT8**: Post-training quantization with calibration
- **Calibration**: 100-200 images from training set
- **Input**: uint8 [0, 255] or float32 normalized

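Post-training INT8 quantization relies on calibration data to choose a value range for each tensor. The sketch below illustrates the underlying affine mapping on plain Python floats. It is a simplified illustration of the general technique, not the TFLite converter's implementation, and all function names and sample values are hypothetical.

```python
def affine_params(x_min, x_max, qmin=-128, qmax=127):
    """Derive (scale, zero_point) so [x_min, x_max] maps onto [qmin, qmax]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must include 0
    if x_max == x_min:
        return 1.0, 0  # degenerate all-zero range
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 value, clipping to the representable range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


calibration_values = [-1.0, 0.5, 2.0]  # stand-in for observed activations
scale, zero_point = affine_params(min(calibration_values), max(calibration_values))
for x in calibration_values:
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

Values inside the calibrated range come back with an error of at most half a quantization step, while values outside it are clipped. That is why a calibration set that misses real operating conditions tends to cost accuracy.
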
## Hyperparameters

### Training
- **Epochs**: 40-50
- **Batch size**: 16
- **Image size**: 640x640
- **Learning rate**: 5e-5 (baseline), 1e-5 (fine-tune)
- **Optimizer**: AdamW with cosine scheduler

### Knowledge Distillation
- **α (feature)**: 0.3
- **β (logit)**: 0.2
- **Temperature**: 4.0
- **Warmup epochs**: 8
- **KD layers**: P3, P4, PSA output

### Quantization
- **Format**: TFLite
- **Input size**: 416x416 (edge deployment)
- **Calibration samples**: 100

## Troubleshooting

### Issue 1: CUDA Out of Memory
```python
# Reduce batch size (passed to model.train as batch=...)
batch = 8

# Enable mixed precision (passed to model.train as amp=...)
amp = True
```

### Issue 2: Feature Shape Mismatch in KD
- Check teacher and student architecture compatibility
- Verify KD layer names match between models
- Ensure input sizes are consistent

### Issue 3: INT8 Quantization Accuracy Drop
- Increase number of calibration samples
- Use representative dataset (diverse conditions)
- Consider QAT (Quantization-Aware Training)

## References

### Papers
- [YOLO11](https://docs.ultralytics.com/models/yolo11/)
- [GhostNet: More Features from Cheap Operations](https://arxiv.org/abs/1911.11907)
- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper](https://arxiv.org/abs/1806.08342)

### Resources
- [Ultralytics Documentation](https://docs.ultralytics.com/)
- [TFLite Quantization Guide](https://www.tensorflow.org/lite/performance/post_training_quantization)

## Key Features

### Architecture Optimization
- **GhostConv**: Reduces FLOPs by ~50% compared to standard convolutions
- **Lightweight backbone**: Maintains accuracy while reducing parameters
- **Flexible head**: Supports multiple detection tasks

### Knowledge Distillation
- **Multi-level distillation**: Combines feature and logit knowledge transfer
- **Temperature-scaled softmax**: Smooths probability distributions
- **Progressive training**: Warmup phase for stable convergence

### Model Compression
- **INT8 quantization**: 4x memory reduction
- **FP16 quantization**: 2x memory reduction
- **Edge-ready**: Optimized for mobile/embedded deployment

## Best Practices

### Training
1. Start with pre-trained weights when possible
2. Use data augmentation (mosaic, mixup, etc.)
3. Monitor validation metrics closely
4. Apply early stopping (patience=10-15)

### Knowledge Distillation
1. Ensure teacher model is well-trained (mAP50 > 0.90)
2. Match batch normalization statistics
3. Use appropriate temperature (T=3-5 for object detection)
4. Gradually increase KD loss weight

### Quantization
1. Use diverse calibration dataset
2. Test on representative test set
3. Profile inference speed on target device
4. Consider hybrid quantization (some layers FP32)

## Performance Metrics

### Speed Benchmarks
| Model | FP32 (ms) | FP16 (ms) | INT8 (ms) | Device |
|-------|-----------|-----------|-----------|---------|
| YOLO11l | 45 | 28 | N/A | T4 GPU |
| YOLO11n-Ghost | 12 | 8 | N/A | T4 GPU |
| INT8 TFLite | N/A | N/A | 25 | Edge TPU |

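Latency figures like these depend heavily on how they are measured. A minimal timing sketch, assuming a zero-argument callable that runs one inference (for example a wrapper around `interpreter.invoke()`); `benchmark_ms` is a hypothetical helper, and the warmup and run counts are arbitrary defaults.

```python
import statistics
import time


def benchmark_ms(run_once, warmup=10, runs=100):
    """Return (mean, median) latency in milliseconds for a zero-argument callable."""
    for _ in range(warmup):
        run_once()  # warm caches and lazy initialization before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), statistics.median(timings)
```

For GPU models, execution is asynchronous, so the device should be synchronized inside `run_once` before the clock is read. Otherwise the timings come out too optimistic.
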
### Accuracy vs Efficiency
- **YOLO11l**: Highest accuracy, largest model
- **YOLO11n-Ghost**: Best accuracy/size trade-off
- **+ KD**: Closes gap with teacher
- **+ INT8**: Minimal accuracy loss, deployable

## Workflow Summary

```mermaid
graph LR
    A[YOLO11l Teacher] --> B[Design GhostConv Student]
    B --> C[Train Baseline]
    C --> D[Knowledge Distillation]
    D --> E[Fine-tune]
    E --> F[Quantize INT8/FP16]
    F --> G[Deploy to Edge]
```

## Deployment

### TFLite Conversion
```python
# Export to TFLite INT8
model.export(
    format="tflite",
    int8=True,
    data="calibration.yaml",
    imgsz=416
)
```

### Inference Example
```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load TFLite model and look up its input/output tensors
interpreter = tf.lite.Interpreter(model_path="best_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess image: RGB, 416x416, with a leading batch dimension
img = Image.open("test.jpg").convert("RGB").resize((416, 416))
input_data = np.expand_dims(np.array(img, dtype=np.uint8), axis=0)
# Float-input exports expect values normalized to [0, 1]
if input_details[0]["dtype"] == np.float32:
    input_data = input_data.astype(np.float32) / 255.0

# Run inference and read the raw output tensor
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
```

## Contributing

Contributions are welcome! Areas for improvement:
- Additional distillation techniques (attention transfer, etc.)
- QAT implementation
- More lightweight architectures
- Deployment examples for different platforms

## License

This notebook follows the Ultralytics AGPL-3.0 License.

## Acknowledgments

- [Ultralytics](https://ultralytics.com/) for YOLO11 framework
- [GhostNet](https://github.com/huawei-noah/ghostnet) for efficient convolution design
- Google Colab for compute resources

---

**Note**: This notebook is designed to run on Google Colab with GPU runtime. Adjust paths and configurations for local environments as needed.

**Last Updated**: January 2026
**Version**: v11
**Compatibility**: Ultralytics 8.3+ (YOLO11 is not available in earlier releases)