---
license: cc-by-nc-sa-4.0
tags:
- neural-architecture-search
- evolutionary-computation
- computer-vision
- depth-estimation
- object-detection
- semantic-segmentation
- 3d-gaussian-splatting
- mamba
- vision-transformer
- multi-objective-optimization
datasets:
- imagenet-1k
- detection-datasets/coco
- scene_parse_150
- kitti
- nyu_depth_v2
- RealEstate10K
metrics:
- mAP
- miou
- abs_rel
- psnr
- ssim
pipeline_tag: depth-estimation
library_name: pytorch
---

# EvoNAS: Dual-Domain Representation Alignment for Geometry-Aware Architecture Search

<p align="center">
  <a href="https://arxiv.org/abs/2603.19563"><img src="https://img.shields.io/badge/Paper-arXiv-red" alt="arXiv"></a>
  <a href="https://github.com/EMI-group/evonas"><img src="https://img.shields.io/badge/Code-GitHub-blue" alt="GitHub"></a>
</p>

## Overview

EvoNAS is a multi-objective evolutionary neural architecture search framework that discovers **Pareto-optimal vision backbones** bridging 2D dense prediction and 3D rendering. It features:

- **Hybrid VSS-ViT Search Space**: Combines Vision State Space (Mamba) blocks with Vision Transformers
- **CA-DDKD**: Cross-Architecture Dual-Domain Knowledge Distillation via DCT constraints
- **DMMPE**: Hardware-isolated distributed evaluation engine for unbiased latency measurement
- **Progressive Supernet Training (PST)**: Curriculum-based weight-sharing optimization

The discovered **EvoNets** achieve state-of-the-art accuracy-efficiency trade-offs across object detection, semantic segmentation, monocular depth estimation, and novel view synthesis.

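The "dual-domain" idea behind CA-DDKD is to align student and teacher features both spatially and in the DCT frequency domain. The sketch below illustrates that constraint only; the function name `dual_domain_loss`, the `alpha` weighting, and the NumPy/SciPy formulation are illustrative assumptions, not the repository's actual API:

```python
import numpy as np
from scipy.fft import dctn


def dual_domain_loss(student_feat, teacher_feat, alpha=0.5):
    # Spatial-domain alignment: mean absolute difference of feature maps.
    spatial = np.abs(student_feat - teacher_feat).mean()
    # Frequency-domain alignment: compare 2D DCT coefficients over the
    # last two (spatial) axes; norm="ortho" keeps the transform orthonormal.
    freq = np.abs(
        dctn(student_feat, axes=(-2, -1), norm="ortho")
        - dctn(teacher_feat, axes=(-2, -1), norm="ortho")
    ).mean()
    # Blend the two domains; alpha weights the frequency term.
    return (1.0 - alpha) * spatial + alpha * freq
```

Identical features give zero loss in both domains, while mismatched high-frequency detail is penalized by the DCT term even when spatial averages agree.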
## Model Zoo

### Searched Architectures (EvoNets)

#### Object Detection on COCO (Mask R-CNN)

| Model | Params | MACs | AP^b | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:----:|:-------:|:----------:|:---:|:------:|
| EvoNet-C1 | 33M | 190G | 45.4 | 50.2ms | 26 FPS | 1.39 | [Download](./EvoNAS/evonet_c1_best_coco_bbox_mAP_0.45400) |
| EvoNet-C2 | 36M | 202G | 47.1 | 55.4ms | 23 FPS | 1.29 | [Download](./EvoNAS/evonet_c2_best_coco_bbox_mAP_0.46800) |
| EvoNet-C3 | 42M | 228G | 48.5 | 66.9ms | 18 FPS | 1.15 | [Download](./EvoNAS/evonet_c3_best_coco_bbox_mAP_0.48300) |

#### Semantic Segmentation on ADE20K (UPerNet)

| Model | Params | MACs | mIoU | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:----:|:-------:|:----------:|:---:|:------:|
| EvoNet-A1 | 23M | 711G | 44.1 | 77.3ms | 14 FPS | 1.93 | [Download](./EvoNAS/evonet_a1_best_mIoU_44.10000) |
| EvoNet-A2 | 26M | 724G | 47.3 | 81.0ms | 13 FPS | 1.79 | [Download](./EvoNAS/evonet_a2_best_mIoU_47.26000) |
| EvoNet-A3 | 32M | 754G | 49.7 | 94.8ms | 12 FPS | 1.57 | [Download](./EvoNAS/evonet_a3_best_mIoU_49.72000) |

#### Monocular Depth Estimation on KITTI

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:--------:|:---:|:-------:|:----------:|:---:|:------:|
| EvoNet-K1 | 18.0M | 27.3G | 0.060 | 0.960 | 18.6ms | 117 FPS | 5.34 | [Download](./EvoNAS/evonet_k1_best_abs_rel_0.05967) |
| EvoNet-K2 | 22.6M | 36.2G | 0.056 | 0.966 | 24.6ms | 83 FPS | 4.28 | [Download](./EvoNAS/evonet_k2_best_abs_rel_0.05517) |
| EvoNet-K3 | 26.3M | 45.0G | 0.054 | 0.969 | 28.0ms | 65 FPS | 3.68 | [Download](./EvoNAS/evonet_k3_best_abs_rel_0.05375) |

#### Monocular Depth Estimation on NYU Depth v2

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:--------:|:---:|:-------:|:----------:|:---:|:------:|
| EvoNet-N1 | 19.1M | 21.7G | 0.095 | 0.912 | 21.8ms | 138 FPS | 4.77 | [Download](./EvoNAS/evonet_n1_best_abs_rel_0.09456) |
| EvoNet-N2 | 24.1M | 27.1G | 0.089 | 0.926 | 25.9ms | 107 FPS | 3.85 | [Download](./EvoNAS/evonet_n2_best_abs_rel_0.08818) |
| EvoNet-N3 | 30.3M | 33.9G | 0.085 | 0.932 | 30.8ms | 88 FPS | 3.08 | [Download](./EvoNAS/evonet_n3_best_abs_rel_0.08475) |

#### Novel View Synthesis on RealEstate10K (3DGS)

| Model | Params | PSNR↑ | SSIM↑ | LPIPS↓ | Latency | Throughput | Weight |
|:------|:------:|:-----:|:-----:|:------:|:-------:|:----------:|:------:|
| EvoNet-D | 44M | 26.41 | 0.871 | 0.127 | 88ms | 27 FPS | [Download](./NVS/epoch_9-step_150000.ckpt) |

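Each table above lists points along an accuracy–latency trade-off: no EvoNet in a family is both more accurate and faster than a sibling. Pareto dominance between such configurations can be checked in a few lines (a generic sketch, not code from this repository):

```python
def dominates(a, b):
    # a and b are tuples of objectives to minimize, e.g. (error, latency_ms).
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    # Keep the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# KITTI EvoNets as (Abs Rel, latency in ms): each trades accuracy for speed.
kitti = [(0.060, 18.6), (0.056, 24.6), (0.054, 28.0)]
```

Here `pareto_front(kitti)` keeps all three models, since none dominates another.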
### Supernet Checkpoints

| Checkpoint | Description | Weight |
|:-----------|:------------|:------:|
| supernet_imagenet_1k | Stage 1: ImageNet-1K pretrained VSS-ViT supernet | [Download](./supernet_imagenet_1k.pth) |
| supernet_nyu | Stage 2: Fine-tuned on NYU Depth v2 with CA-DDKD | [Download](./SuperNet_FT/supernet_nyu) |
| supernet_kitti | Stage 2: Fine-tuned on KITTI with CA-DDKD | [Download](./SuperNet_FT/supernet_kitti) |
| supernet_ade20k | Stage 2: Fine-tuned on ADE20K with CA-DDKD | [Download](./SuperNet_FT/supernet_ade20k.pth) |
| supernet_coco | Stage 2: Fine-tuned on COCO with CA-DDKD | [Download](./SuperNet_FT/supernet_coco.pth) |

### Teacher Models (Depth Anything)

| Checkpoint | Description | Weight |
|:-----------|:------------|:------:|
| nyu_depth_anything | Depth Anything metric indoor teacher | [Download](./pre_DA/nyu_depth_anything_metric_depth_indoor.pt) |
| kitti_depth_anything | Depth Anything metric outdoor teacher | [Download](./pre_DA/kitti_depth_anything_metric_depth_outdoor.pt) |
| ade20k_vitl | ViT-L teacher for ADE20K segmentation | [Download](./pre_DA/ade20k_vitl_mIoU_59.4.pth) |
| coco_dinov2 | DINOv2 teacher for COCO detection | [Download](./pre_DA/coco_dinov2_epoch_12.pth) |

## Quick Start

```python
# Download a specific model
from huggingface_hub import hf_hub_download

# Example: Download EvoNet-N3 (NYU Depth v2)
ckpt_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="EvoNAS/evonet_n3_best_abs_rel_0.08475",
)

# Example: Download the ImageNet-1K pretrained supernet
supernet_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="supernet_imagenet_1k.pth",
)
```

```python
# Download all checkpoints
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    local_dir="./evonas_checkpoints",
)
```

## Usage

Please refer to our [GitHub repository](https://github.com/EMI-Group/evonas) for full training, search, and evaluation instructions.

### Inference Example (Monocular Depth Estimation)

```python
import torch
from networks.EvoMambaDepthNet import EvoMambaDepthNet

# Define the searched architecture genotype
evonet_n3_genotype = {
    # Replace with the actual searched genotype from the search logs
    "d_state": [...],
    "ssm_expand": [...],
    "mlp_ratio": [...],
    "depth": [...],
}

model = EvoMambaDepthNet(genotype=evonet_n3_genotype)
checkpoint = torch.load("evonet_n3_best_abs_rel_0.08475", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Run inference on a preprocessed RGB image tensor of shape (1, 3, H, W)
with torch.no_grad():
    depth = model(image_tensor)
```

## File Structure

```
.
├── EvoNAS/                   # Searched EvoNet checkpoints
│   ├── evonet_c{1,2,3}_*     # COCO object detection
│   ├── evonet_a{1,2,3}_*     # ADE20K semantic segmentation
│   ├── evonet_k{1,2,3}_*     # KITTI depth estimation
│   ├── evonet_n{1,2,3}_*     # NYU v2 depth estimation
│   └── logs/                 # Training logs
├── NVS/                      # Novel view synthesis checkpoint
│   └── epoch_9-step_150000.ckpt
├── SuperNet_FT/              # Fine-tuned supernet checkpoints
│   ├── supernet_ade20k.pth
│   ├── supernet_coco.pth
│   ├── supernet_kitti
│   └── supernet_nyu
├── pre_DA/                   # Teacher model checkpoints
│   ├── ade20k_vitl_mIoU_59.4.pth
│   ├── coco_dinov2_epoch_12.pth
│   ├── kitti_depth_anything_metric_depth_outdoor.pt
│   └── nyu_depth_anything_metric_depth_indoor.pt
└── supernet_imagenet_1k.pth  # ImageNet-1K pretrained supernet
```

## Citation

```bibtex
@article{zhang2025evonas,
  title={Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search},
  author={Zhang, Haoyu and Yu, Zhihao and Wang, Rui and Jin, Yaochu and Liu, Qiqi and Cheng, Ran},
  journal={arXiv preprint arXiv:2603.19563},
  year={2025}
}
```

## Acknowledgements

We thank the open-source community behind [PyTorch](https://pytorch.org/), [Mamba SSM](https://github.com/state-spaces/mamba), [Spatial-Mamba](https://github.com/EdwardChasel/Spatial-Mamba), [MMDetection](https://github.com/open-mmlab/mmdetection), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Depth Anything](https://github.com/LiheYoung/Depth-Anything), [pymoo](https://github.com/anyoptimization/pymoo), and [timm](https://github.com/huggingface/pytorch-image-models).