---
license: apache-2.0
tags:
- semantic-segmentation
- knowledge-distillation
- multimodal
- model-compression
- pytorch
---

# 🚀 HMKD-ICMR: Heterogeneous Model Knowledge Distillation via Dual Alignment for Semantic Segmentation

Mingzhu Xu¹, Jing Wang¹, Mingcai Wang¹, Yiping Li¹, Yupeng Hu¹✉, Xuemeng Song¹, Weili Guan¹

¹Affiliation (Please update if needed)

Official implementation of **HMKD**, a Heterogeneous Model Knowledge Distillation framework with Dual Alignment for Semantic Segmentation.

🔗 **Conference:** ICMR 2025  
🔗 **Task:** Semantic Segmentation  
🔗 **Framework:** PyTorch

---

## 📌 Model Information

### 1. Model Name

**HMKD** (Heterogeneous Model Knowledge Distillation)

---

### 2. Task Type & Applicable Tasks

- **Task Type:** Semantic Segmentation / Model Compression
- **Core Task:** Knowledge Distillation for segmentation
- **Applicable Scenarios:**
  - Lightweight model deployment
  - Cross-architecture distillation
  - Efficient semantic understanding

---

### 3. Project Introduction

Semantic segmentation models often rely on heavy architectures, limiting their deployment in resource-constrained environments. Knowledge distillation (KD) provides a promising solution by transferring knowledge from a large teacher model to a compact student model.

**HMKD** introduces a **Dual Alignment Distillation Framework**, which:

- Aligns heterogeneous architectures between teacher and student models
- Performs **feature-level and prediction-level alignment**
- Bridges the representation gap across different model families
- Improves segmentation accuracy while maintaining efficiency
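The dual alignment idea above (feature-level plus prediction-level) can be sketched as a single distillation loss. This is an illustrative PyTorch sketch, not the official HMKD implementation: the `DualAlignmentKDLoss` name, the 1×1 projection layer, the temperature, and the loss weights are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAlignmentKDLoss(nn.Module):
    """Illustrative dual-alignment distillation loss (not the official HMKD code).

    Combines feature-level alignment (student features projected into the
    teacher's channel space, then matched by MSE) with prediction-level
    alignment (temperature-scaled KL divergence over per-pixel class logits).
    """

    def __init__(self, student_channels, teacher_channels,
                 temperature=4.0, feat_weight=1.0, pred_weight=1.0):
        super().__init__()
        # A 1x1 conv bridges the channel gap between heterogeneous backbones
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.t = temperature
        self.feat_weight = feat_weight
        self.pred_weight = pred_weight

    def forward(self, s_feat, t_feat, s_logits, t_logits):
        # Feature-level alignment: project, resize if needed, then MSE
        s_proj = self.proj(s_feat)
        if s_proj.shape[-2:] != t_feat.shape[-2:]:
            s_proj = F.interpolate(s_proj, size=t_feat.shape[-2:],
                                   mode="bilinear", align_corners=False)
        feat_loss = F.mse_loss(s_proj, t_feat)

        # Prediction-level alignment: KL on softened per-pixel class maps
        pred_loss = F.kl_div(
            F.log_softmax(s_logits / self.t, dim=1),
            F.softmax(t_logits / self.t, dim=1),
            reduction="batchmean",
        ) * (self.t ** 2)

        return self.feat_weight * feat_loss + self.pred_weight * pred_loss
```

In a real heterogeneous pairing (e.g. a SegFormer teacher and a ResNet student), the projection layer would need to match the specific feature shapes of each backbone stage.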
---

### 4. Training Data Source

Supported datasets:

- **Cityscapes**
- **CamVid**

| Dataset | Train | Val | Test | Classes |
|------------|-------|-----|------|---------|
| Cityscapes | 2975 | 500 | 1525 | 19 |
| CamVid | 367 | 101 | 233 | 11 |

---

## 🚀 Environment Setup

- Ubuntu 20.04.4 LTS
- Python 3.8.10 (Anaconda recommended)
- CUDA 11.3
- PyTorch 1.11.0
- NCCL 2.10.3

### Install dependencies:

```bash
pip install timm==0.3.2
pip install mmcv-full==1.2.7
pip install opencv-python==4.5.1.48
```

---

## ⚙️ Pre-trained Weights

### Initialization Weights

- ResNet-18
- ResNet-101
- SegFormer-B0
- SegFormer-B4

(Download from the official PyTorch and Google Drive links.)

---

### Trained Weights

Download the trained HMKD models:

- Baidu Cloud: https://pan.baidu.com/s/1xw_6ts5VNV73vXeOLAokwQ?pwd=jvx8

---

## 🚀 Training

1. Download the datasets and pre-trained weights
2. Generate the dataset path lists (.txt files)
3. Update the dataset paths in the code

### Run training:

```bash
# Background run with logging
CUDA_VISIBLE_DEVICES=0,1 nohup python -m torch.distributed.launch --nproc_per_node=2 train_NEW_AEU_kd.py > train.log 2>&1 &

# Foreground run
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train_NEW_AEU_kd.py
```

---

## ⚠️ Notes

- Designed for research purposes
- Performance depends on the teacher-student architecture pairing
- Multi-GPU training is recommended

---

## 📝 Citation

```bibtex
@inproceedings{HMKD,
  author={Xu, Mingzhu and Wang, Jing and Wang, Mingcai and Li, Yiping and Hu, Yupeng and Song, Xuemeng and Guan, Weili},
  booktitle={Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR)},
  title={Heterogeneous Model Knowledge Distillation via Dual Alignment for Semantic Segmentation},
  year={2025}
}
```

---

## 📬 Contact

For questions or collaboration, please contact the corresponding author.

---