---
license: apache-2.0
tags:
- object-detection
- yolo
- edge-ai
- quantization
datasets:
- MTID
- Roboflow
- VisDrone
library_name: ultralytics
---
# YOLO11 GhostConv + Knowledge Distillation + Quantization
This notebook implements a complete model optimization pipeline for YOLO11 targeting edge devices, including: custom architecture with GhostConv, Knowledge Distillation, and Quantization.
## 📋 Table of Contents
- [Overview](#overview)
- [Notebook Structure](#notebook-structure)
- [System Requirements](#system-requirements)
- [Installation](#installation)
- [Usage Guide](#usage-guide)
- [Results](#results)
- [References](#references)
## 🎯 Overview
This notebook implements a 3-stage YOLO11 optimization pipeline:
### 1. Custom Architecture (YOLO11n-GhostConv)
- Replace Conv layers with **GhostConv** to reduce parameters
- Retain C3k2 and C2PSA blocks for feature extraction
- Architecture optimized for traffic dataset (5 classes)
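
To see where the parameter savings come from, the sketch below counts convolution weights for a standard convolution and for a GhostConv-style layer. It assumes the layout used by the Ultralytics `GhostConv` module (a primary convolution producing half of the output channels, followed by a 5x5 depthwise convolution producing the other half) and ignores BatchNorm parameters. The helper names are illustrative and not part of the notebook.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int, cheap_k: int = 5) -> int:
    """Weights of a GhostConv-style layer (assumed layout, no bias).

    Half of the output channels come from a regular k x k convolution,
    the other half from a depthwise cheap_k x cheap_k convolution applied
    to the first half.
    """
    c_half = c_out // 2
    primary = conv_params(c_in, c_half, k)
    cheap = c_half * cheap_k * cheap_k  # depthwise: one filter per channel
    return primary + cheap


standard = conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3)
print(standard, ghost, round(ghost / standard, 3))
```

For a 64-to-128-channel 3x3 layer this gives 73,728 versus 38,464 weights, a reduction of roughly 48%.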
### 2. Knowledge Distillation (KD)
- Teacher model: YOLO11l (large model)
- Student model: YOLO11n-GhostConv (custom lightweight)
- Techniques:
  - Feature-based distillation (MSE loss)
  - Logit-based distillation (KL divergence)
  - Temperature scaling (T=4.0)
  - Progressive KD with warmup epochs
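
Temperature scaling divides the logits by T before the softmax, which flattens the distribution so the student also learns from the teacher's non-maximal classes. The following minimal sketch shows the effect with T=1 and T=4; the logit values are made up for illustration and do not come from the notebook.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [4.0, 1.0, 0.0]  # illustrative values, not from the notebook
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in soft])
```

With these logits the top-class probability falls from about 0.94 at T=1 to about 0.54 at T=4, while the smaller classes gain probability mass.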
### 3. Quantization
- FP32 → INT8 quantization with TFLite
- FP32 → FP16 quantization
- Calibration dataset for INT8
- Performance comparison: FP32 vs INT8 vs FP16
## 📁 Notebook Structure
### Section 1: Initialization
- Mount Google Drive
- Setup project directories
- Import Ultralytics modules (GhostConv, C3k2, C2PSA)
- Clone and install Ultralytics from source
### Section 2: Custom Architecture
- Define YOLO11_TinyGhost architecture in YAML
- Backbone with GhostConv layers
- Head with Detect layer for 5 classes
- Train baseline model (50 epochs)
### Section 3: Knowledge Distillation
**Class implementations:**
- `KDConfig`: Configuration for KD training
- `KnowledgeDistillationTrainer`: Custom trainer inheriting from `DetectionTrainer`
  - Forward hooks to capture intermediate features
  - Feature distillation loss (normalized MSE)
  - Logit distillation loss (KL divergence with temperature)
  - Combined loss: `(1-α-β)*L_hard + α*L_feature + β*L_logit`
**Training strategy:**
- Warmup phase (8 epochs): hard loss only
- After warmup: combine hard + KD losses
- KD layers: ["model.4", "model.6", "model.10"] (P3, P4, PSA)
- Hyperparameters: α=0.3, β=0.2, T=4.0
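
The training strategy above can be sketched as a per-epoch weight schedule. This sketch assumes 0-based epoch indices and a hard switch from hard-loss-only to the combined loss once warmup ends; the notebook's trainer may ramp the KD weights differently. The function names are hypothetical.

```python
def kd_loss_weights(epoch, warmup_epochs=8, alpha=0.3, beta=0.2):
    """Return (w_hard, w_feature, w_logit) for a 0-based epoch index."""
    if epoch < warmup_epochs:
        return 1.0, 0.0, 0.0  # warmup: hard (detection) loss only
    return 1.0 - alpha - beta, alpha, beta


def combined_loss(l_hard, l_feature, l_logit, epoch):
    """Weighted sum of the three loss terms for the given epoch."""
    w_hard, w_feature, w_logit = kd_loss_weights(epoch)
    return w_hard * l_hard + w_feature * l_feature + w_logit * l_logit


print(kd_loss_weights(0))  # warmup epoch
print(kd_loss_weights(8))  # first epoch with KD enabled
```

With α=0.3 and β=0.2, the hard loss keeps a weight of about 0.5 after warmup, so the detection objective still dominates either KD term.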
### Section 4: Visualization
- Training metrics plotting (mAP, loss curves)
- F1 score tracking
- Precision/Recall curves
- Box/Class/DFL loss comparison
### Section 5: Fine-tuning
- Load best KD checkpoint
- Fine-tune on multi-view intersection dataset
- Freeze 3 backbone layers
- Low learning rate (1e-5) with cosine scheduler
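
As a rough sketch, these fine-tuning settings map onto Ultralytics training arguments as follows. `freeze`, `lr0`, and `cos_lr` are existing `train()` arguments; the function names, the checkpoint path, and the epoch count are placeholders, since this README does not state them.

```python
def build_finetune_args(data_yaml, epochs):
    """Training overrides for the fine-tuning stage (a sketch)."""
    return {
        "data": data_yaml,
        "epochs": epochs,  # not stated in this README; choose per dataset
        "freeze": 3,       # freeze the first 3 layers
        "lr0": 1e-5,       # low initial learning rate
        "cos_lr": True,    # cosine learning-rate schedule
    }


def finetune(kd_checkpoint, data_yaml, epochs):
    """Load the best KD checkpoint and fine-tune it."""
    from ultralytics import YOLO  # lazy import: only needed when training

    model = YOLO(kd_checkpoint)
    return model.train(**build_finetune_args(data_yaml, epochs))
```

Keeping the argument dictionary separate from the training call makes it easy to log or reuse the exact settings of a run.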
### Section 6: Quantization
**Export formats:**
- INT8 TFLite (with calibration dataset)
- FP16 TFLite
**Evaluation:**
- Compare mAP50 and mAP50-95
- FP32 vs INT8 vs FP16
- Image size: 416x416
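
One way to organize this comparison is sketched below. `metrics.box.map50` and `metrics.box.map` are the mAP50 and mAP50-95 values returned by Ultralytics validation; the function names and the model paths passed in are hypothetical.

```python
def evaluate_formats(model_paths, data_yaml, imgsz=416):
    """Validate each exported model and collect (mAP50, mAP50-95)."""
    from ultralytics import YOLO  # lazy import: only needed when validating

    results = {}
    for name, path in model_paths.items():
        metrics = YOLO(path).val(data=data_yaml, imgsz=imgsz)
        results[name] = (metrics.box.map50, metrics.box.map)
    return results


def drops_vs_baseline(results, baseline="FP32"):
    """Absolute mAP drop of every format relative to the baseline."""
    base50, base5095 = results[baseline]
    return {
        name: (base50 - m50, base5095 - m5095)
        for name, (m50, m5095) in results.items()
        if name != baseline
    }
```

`evaluate_formats` expects a mapping such as `{"FP32": ..., "FP16": ..., "INT8": ...}` from format name to model file, and `drops_vs_baseline` then reports how many mAP points each quantized format loses.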
## 🔧 System Requirements
### Hardware
- GPU: CUDA-compatible (T4 or better recommended)
- RAM: 16GB+
- Storage: 10GB+ for datasets and models
### Software
```
Python >= 3.8
PyTorch >= 1.13
CUDA >= 11.3
Google Colab (recommended)
```
## 📦 Installation
### 1. Clone Ultralytics from source
```bash
!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics
!pip install -e .
```
### 2. Dependencies
```bash
pip install torch torchvision
pip install matplotlib pandas
pip install opencv-python pillow
```
### 3. Dataset structure
```
dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
```
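
Before training, it can help to confirm that the dataset folder matches this layout. The helper below is a small sketch using only the standard library; its name and the `EXPECTED_ENTRIES` list are illustrative.

```python
from pathlib import Path

# Entries taken from the directory tree above
EXPECTED_ENTRIES = [
    "images/train",
    "images/val",
    "labels/train",
    "labels/val",
    "data.yaml",
]


def missing_dataset_entries(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_dataset_entries(...)` on the dataset root should return an empty list when the layout is complete.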
## 🚀 Usage Guide
### Step 1: Prepare Data
```python
PROJECT_DIR = "/content/drive/MyDrive/yolo_ghostblock"
DATASET_DIR = "/content/drive/MyDrive/dataset/yolo_mtid_motor/dataset"
```
### Step 2: Train Baseline GhostConv Model
```python
from ultralytics import YOLO

# Build the custom GhostConv model from its YAML definition
model = YOLO("yolo11_tinyghost.yaml")
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    epochs=50,
    imgsz=640,
    device=0,
)
```
### Step 3: Knowledge Distillation
```python
from ultralytics import YOLO

# Load teacher and student
teacher = YOLO("path/to/teacher.pt")
student = YOLO("path/to/student.pt")

# Create the KD trainer class (create_kd_trainer_class is a helper from the notebook)
TrainerClass = create_kd_trainer_class(
    teacher_model=teacher,
    kd_alpha=0.3,
    kd_beta=0.2,
    kd_temperature=4.0,
    kd_layers=["model.4", "model.6", "model.10"],
)

# Train with KD
trainer = TrainerClass(overrides={...})
trainer.train()
```
### Step 4: Quantization
```python
from ultralytics import YOLO

# Export INT8; CALIB_YAML is the dataset YAML used for calibration
model.export(
    format="tflite",
    int8=True,
    data=CALIB_YAML,
    imgsz=416,
)

# Evaluate the quantized model
model_int8 = YOLO("best_int8.tflite")
metrics = model_int8.val(data=DATA_YAML, imgsz=416)
```
## 📊 Results
### Model Comparison
| Model | Parameters | Size | mAP50 | mAP50-95 |
|-------|-----------|------|-------|----------|
| YOLO11l (Teacher) | ~20M | ~40MB | 0.95+ | 0.80+ |
| YOLO11n-Ghost | ~2M | ~4MB | 0.92+ | 0.75+ |
| + KD | ~2M | ~4MB | 0.94+ | 0.78+ |
| + INT8 | ~2M | ~1MB | 0.93+ | 0.76+ |
### Quantization Impact
- **FP32 → INT8**: ~75% size reduction, ~1-2% mAP drop
- **FP32 → FP16**: ~50% size reduction, ~0.5% mAP drop
### Training Curves
- Box Loss: converges after ~30 epochs
- mAP50: reaches plateau ~35-40 epochs
- F1 Score: 0.85-0.90 range
## 📖 Technical Details
### GhostConv Architecture
```yaml
backbone:
  - [-1, 1, GhostConv, [64, 3, 2]]
  - [-1, 1, GhostConv, [128, 3, 2]]
  - [-1, 1, C3k2, [256, False, 0.25]]
  # ...
```
### KD Loss Formula
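```
L_total   = (1 - α - β) * L_hard + α * L_feature + β * L_logit
L_feature = MSE(normalize(F_s), normalize(F_t))
L_logit   = T² * KL(softmax(z_t / T) || softmax(z_s / T))
```
Here `F_s` and `F_t` are the student and teacher features, `z_s` and `z_t` are the student and teacher logits, and `T` is the temperature.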
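
The two KD terms can be written out in plain Python to make the formula concrete. This is a minimal sketch on flat lists rather than tensors. It assumes that `normalize` means L2 normalization and that the KL term is taken from the teacher distribution to the student distribution, as in standard distillation; the notebook's trainer may differ in both respects.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    return temperature ** 2 * kl


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def feature_kd_loss(student_feat, teacher_feat):
    """MSE between L2-normalized feature vectors of equal length."""
    s = l2_normalize(student_feat)
    t = l2_normalize(teacher_feat)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
```

The T² factor compensates for the 1/T² scaling of the soft-target gradients, so the logit term stays comparable in magnitude to the hard loss when the temperature changes.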
### Quantization Config
- **INT8**: Post-training quantization with calibration
- **Calibration**: 100-200 images from training set
- **Input**: uint8 [0, 255] or float32 normalized
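
Post-training INT8 quantization maps real values to integers with a scale and a zero-point, following `real = scale * (q - zero_point)`. The sketch below derives both from a calibration range and round-trips a value. It illustrates the arithmetic only and is not the TFLite converter's implementation; the calibration values are made up.

```python
def affine_qparams(x_min, x_max, q_min=-128, q_max=127):
    """Scale and zero-point mapping [x_min, x_max] onto [q_min, q_max]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must contain 0
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(q_min - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a real value to a clamped integer."""
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Map an integer back to an approximate real value."""
    return scale * (q - zero_point)


# Illustrative activation values, as if observed on calibration images
calibration_values = [-0.8, 0.1, 1.9, 3.2]
scale, zp = affine_qparams(min(calibration_values), max(calibration_values))
restored = dequantize(quantize(1.9, scale, zp), scale, zp)
print(scale, zp, restored)
```

The round-trip error is at most half a quantization step, which is why a calibration set that captures the true activation range matters: a range that is too wide wastes resolution, and one that is too narrow clips values.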
## ⚙️ Hyperparameters
### Training
- **Epochs**: 40-50
- **Batch size**: 16
- **Image size**: 640x640
- **Learning rate**: 5e-5 (baseline), 1e-5 (fine-tune)
- **Optimizer**: AdamW with cosine scheduler
### Knowledge Distillation
- **α (feature)**: 0.3
- **β (logit)**: 0.2
- **Temperature**: 4.0
- **Warmup epochs**: 8
- **KD layers**: P3, P4, PSA output
### Quantization
- **Format**: TFLite
- **Input size**: 416x416 (edge deployment)
- **Calibration samples**: 100
## 🐛 Troubleshooting
### Issue 1: CUDA Out of Memory
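```python
# Reduce the batch size and enable mixed precision when training
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    batch=8,   # reduced from 16
    amp=True,  # automatic mixed precision
)
```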
### Issue 2: Feature Shape Mismatch in KD
- Check teacher and student architecture compatibility
- Verify KD layer names match between models
- Ensure input sizes are consistent
### Issue 3: INT8 Quantization Accuracy Drop
- Increase number of calibration samples
- Use representative dataset (diverse conditions)
- Consider QAT (Quantization-Aware Training)
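
A simple step toward a more representative calibration set is to sample images at random from the whole training set instead of taking the first files in a folder. The helper below is a sketch using only the standard library; its name, the suffix list, and the default of 200 samples are assumptions.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def sample_calibration_images(image_dir, num_samples=200, seed=0):
    """Pick a reproducible random subset of images for INT8 calibration."""
    images = sorted(
        p for p in Path(image_dir).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if len(images) <= num_samples:
        return images  # fewer images than requested: use all of them
    return sorted(random.Random(seed).sample(images, num_samples))
```

The selected files can then be copied or listed in the dataset that the calibration YAML points to. Random sampling does not guarantee coverage of rare conditions such as night or rain, so those may still need to be added by hand.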
## 📚 References
### Papers
- [YOLO11](https://docs.ultralytics.com/models/yolo11/)
- [GhostNet: More Features from Cheap Operations](https://arxiv.org/abs/1911.11907)
- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877)
### Resources
- [Ultralytics Documentation](https://docs.ultralytics.com/)
- [TFLite Quantization Guide](https://www.tensorflow.org/lite/performance/post_training_quantization)
## 🎯 Key Features
### Architecture Optimization
- **GhostConv**: Reduces FLOPs by ~50% compared to standard convolutions
- **Lightweight backbone**: Maintains accuracy while reducing parameters
- **Flexible head**: Supports multiple detection tasks
### Knowledge Distillation
- **Multi-level distillation**: Combines feature and logit knowledge transfer
- **Temperature-scaled softmax**: Smooths probability distributions
- **Progressive training**: Warmup phase for stable convergence
### Model Compression
- **INT8 quantization**: 4x memory reduction
- **FP16 quantization**: 2x memory reduction
- **Edge-ready**: Optimized for mobile/embedded deployment
## 💡 Best Practices
### Training
1. Start with pre-trained weights when possible
2. Use data augmentation (mosaic, mixup, etc.)
3. Monitor validation metrics closely
4. Apply early stopping (patience=10-15)
### Knowledge Distillation
1. Ensure teacher model is well-trained (mAP > 90%)
2. Match batch normalization statistics
3. Use appropriate temperature (T=3-5 for object detection)
4. Gradually increase KD loss weight
### Quantization
1. Use diverse calibration dataset
2. Test on representative test set
3. Profile inference speed on target device
4. Consider hybrid quantization (some layers FP32)
## 📈 Performance Metrics
### Speed Benchmarks
| Model | FP32 (ms) | FP16 (ms) | INT8 (ms) | Device |
|-------|-----------|-----------|-----------|---------|
| YOLO11l | 45 | 28 | N/A | T4 GPU |
| YOLO11n-Ghost | 12 | 8 | N/A | T4 GPU |
| INT8 TFLite | N/A | N/A | 25 | Edge TPU |
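
Latency figures like these depend heavily on how they are measured. The harness below is a generic sketch of one way to time a single inference call, with warmup runs and a median over repeated runs. It is not the procedure used to produce the table, and the function name is hypothetical.

```python
import statistics
import time


def benchmark_ms(run_once, warmup=10, runs=50):
    """Median latency of run_once() in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm caches and lazy initialisation before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

With a TFLite interpreter whose input tensor is already set, `benchmark_ms(interpreter.invoke)` times the model alone. On a GPU, the timed callable should also wait for the device to finish, since kernel launches are asynchronous.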
### Accuracy vs Efficiency
- **YOLO11l**: Highest accuracy, largest model
- **YOLO11n-Ghost**: Best accuracy/size trade-off
- **+ KD**: Closes gap with teacher
- **+ INT8**: Minimal accuracy loss, deployable
## 🔄 Workflow Summary
```mermaid
graph LR
A[YOLO11l Teacher] --> B[Design GhostConv Student]
B --> C[Train Baseline]
C --> D[Knowledge Distillation]
D --> E[Fine-tune]
E --> F[Quantize INT8/FP16]
F --> G[Deploy to Edge]
```
## 🚀 Deployment
### TFLite Conversion
```python
# Export to TFLite INT8
model.export(
    format="tflite",
    int8=True,
    data="calibration.yaml",
    imgsz=416,
)
```
### Inference Example
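```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TFLite model and query its tensor metadata
interpreter = tf.lite.Interpreter(model_path="best_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess: RGB, 416x416, NHWC layout with a batch dimension
img = Image.open("test.jpg").convert("RGB").resize((416, 416))
input_data = np.array(img, dtype=np.uint8).reshape(1, 416, 416, 3)
# If the exported model expects float32 input, scale to [0, 1] instead
if input_details[0]["dtype"] == np.float32:
    input_data = input_data.astype(np.float32) / 255.0

# Run inference
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
```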
## 👥 Contributing
Contributions are welcome! Areas for improvement:
- Additional distillation techniques (attention transfer, etc.)
- QAT implementation
- More lightweight architectures
- Deployment examples for different platforms
## 📄 License
This notebook follows the Ultralytics AGPL-3.0 License.
## 🙏 Acknowledgments
- [Ultralytics](https://ultralytics.com/) for YOLO11 framework
- [GhostNet](https://github.com/huawei-noah/ghostnet) for efficient convolution design
- Google Colab for compute resources
---
**Note**: This notebook is designed to run on Google Colab with GPU runtime. Adjust paths and configurations for local environments as needed.
**Last Updated**: January 2026
**Version**: v11
**Compatibility**: Ultralytics 8.3+ (the first release line with YOLO11 support)