---
license: cc-by-nc-sa-4.0
tags:
- neural-architecture-search
- evolutionary-computation
- computer-vision
- depth-estimation
- object-detection
- semantic-segmentation
- 3d-gaussian-splatting
- mamba
- vision-transformer
- multi-objective-optimization
datasets:
- imagenet-1k
- detection-datasets/coco
- scene_parse_150
- kitti
- nyu_depth_v2
- RealEstate10K
metrics:
- mAP
- miou
- abs_rel
- psnr
- ssim
pipeline_tag: depth-estimation
library_name: pytorch
---

# EvoNAS: Dual-Domain Representation Alignment for Geometry-Aware Architecture Search

<p align="center">
<a href="https://arxiv.org/abs/2603.19563"><img src="https://img.shields.io/badge/Paper-arXiv-red" alt="arXiv"></a>
<a href="https://github.com/EMI-group/evonas"><img src="https://img.shields.io/badge/Code-GitHub-blue" alt="GitHub"></a>
</p>

## Overview

EvoNAS is a multi-objective evolutionary neural architecture search framework that discovers **Pareto-optimal vision backbones** bridging 2D dense prediction and 3D rendering. It features:

- **Hybrid VSS-ViT Search Space**: combines Vision State Space (Mamba) blocks with Vision Transformers
- **CA-DDKD**: Cross-Architecture Dual-Domain Knowledge Distillation via DCT constraints
- **DMMPE**: a hardware-isolated distributed evaluation engine for unbiased latency measurement
- **Progressive Supernet Training (PST)**: curriculum-based weight-sharing optimization

The discovered **EvoNets** achieve state-of-the-art accuracy-efficiency trade-offs across object detection, semantic segmentation, monocular depth estimation, and novel view synthesis.
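The search retains only architectures that are not dominated on the competing objectives (e.g. error vs. latency). As an illustration of that selection criterion — not the repository's actual implementation — a minimal non-dominated filter over candidate `(latency_ms, error)` pairs can be sketched as:

```python
def pareto_front(candidates):
    """Return the non-dominated subset of (latency_ms, error) pairs.

    A candidate is dominated if some other candidate is no worse on
    both objectives and strictly better on at least one (both minimized).
    """
    front = []
    for i, (lat_i, err_i) in enumerate(candidates):
        dominated = any(
            (lat_j <= lat_i and err_j <= err_i)
            and (lat_j < lat_i or err_j < err_i)
            for j, (lat_j, err_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((lat_i, err_i))
    return front


# Hypothetical candidates: (latency in ms, Abs Rel error).
cands = [(18.6, 0.060), (24.6, 0.056), (28.0, 0.054), (30.0, 0.058)]
front = pareto_front(cands)  # the last point is dominated by (28.0, 0.054)
```

In the actual framework this filter sits inside an evolutionary loop (selection, crossover, mutation) over architecture genotypes.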

## Model Zoo

### Searched Architectures (EvoNets)

#### Object Detection on COCO (Mask R-CNN)

| Model | Params | MACs | AP^b | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:----:|:-------:|:----------:|:---:|:------:|
| EvoNet-C1 | 33M | 190G | 45.4 | 50.2ms | 26 FPS | 1.39 | [Download](./EvoNAS/evonet_c1_best_coco_bbox_mAP_0.45400) |
| EvoNet-C2 | 36M | 202G | 47.1 | 55.4ms | 23 FPS | 1.29 | [Download](./EvoNAS/evonet_c2_best_coco_bbox_mAP_0.46800) |
| EvoNet-C3 | 42M | 228G | 48.5 | 66.9ms | 18 FPS | 1.15 | [Download](./EvoNAS/evonet_c3_best_coco_bbox_mAP_0.48300) |

#### Semantic Segmentation on ADE20K (UPerNet)

| Model | Params | MACs | mIoU | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:----:|:-------:|:----------:|:---:|:------:|
| EvoNet-A1 | 23M | 711G | 44.1 | 77.3ms | 14 FPS | 1.93 | [Download](./EvoNAS/evonet_a1_best_mIoU_44.10000) |
| EvoNet-A2 | 26M | 724G | 47.3 | 81.0ms | 13 FPS | 1.79 | [Download](./EvoNAS/evonet_a2_best_mIoU_47.26000) |
| EvoNet-A3 | 32M | 754G | 49.7 | 94.8ms | 12 FPS | 1.57 | [Download](./EvoNAS/evonet_a3_best_mIoU_49.72000) |

#### Monocular Depth Estimation on KITTI

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:--------:|:---:|:-------:|:----------:|:---:|:------:|
| EvoNet-K1 | 18.0M | 27.3G | 0.060 | 0.960 | 18.6ms | 117 FPS | 5.34 | [Download](./EvoNAS/evonet_k1_best_abs_rel_0.05967) |
| EvoNet-K2 | 22.6M | 36.2G | 0.056 | 0.966 | 24.6ms | 83 FPS | 4.28 | [Download](./EvoNAS/evonet_k2_best_abs_rel_0.05517) |
| EvoNet-K3 | 26.3M | 45.0G | 0.054 | 0.969 | 28.0ms | 65 FPS | 3.68 | [Download](./EvoNAS/evonet_k3_best_abs_rel_0.05375) |

#### Monocular Depth Estimation on NYU Depth v2

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:--------:|:---:|:-------:|:----------:|:---:|:------:|
| EvoNet-N1 | 19.1M | 21.7G | 0.095 | 0.912 | 21.8ms | 138 FPS | 4.77 | [Download](./EvoNAS/evonet_n1_best_abs_rel_0.09456) |
| EvoNet-N2 | 24.1M | 27.1G | 0.089 | 0.926 | 25.9ms | 107 FPS | 3.85 | [Download](./EvoNAS/evonet_n2_best_abs_rel_0.08818) |
| EvoNet-N3 | 30.3M | 33.9G | 0.085 | 0.932 | 30.8ms | 88 FPS | 3.08 | [Download](./EvoNAS/evonet_n3_best_abs_rel_0.08475) |
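Abs Rel and δ₁ in the depth tables are the standard monocular depth metrics: mean relative error, and the fraction of pixels whose prediction/ground-truth ratio is within 1.25×. A minimal sketch of how they are conventionally computed (illustrative only, not the repository's evaluation code):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Abs Rel and delta_1 over valid (gt > 0) pixels."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)   # mean relative error (lower is better)
    ratio = np.maximum(pred / gt, gt / pred)    # per-pixel max ratio
    delta1 = np.mean(ratio < 1.25)              # fraction within 1.25x (higher is better)
    return abs_rel, delta1

# Toy ground-truth / prediction depths in metres.
gt = np.array([2.0, 4.0, 8.0])
pred = np.array([2.2, 4.0, 7.0])
abs_rel, delta1 = depth_metrics(pred, gt)
```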

#### Novel View Synthesis on RealEstate10K (3DGS)

| Model | Params | PSNR↑ | SSIM↑ | LPIPS↓ | Latency | Throughput | Weight |
|:------|:------:|:-----:|:-----:|:------:|:-------:|:----------:|:------:|
| EvoNet-D | 44M | 26.41 | 0.871 | 0.127 | 88ms | 27 FPS | [Download](./NVS/epoch_9-step_150000.ckpt) |

### Supernet Checkpoints

| Checkpoint | Description | Weight |
|:-----------|:------------|:------:|
| supernet_imagenet_1k | Stage 1: ImageNet-1K pretrained VSS-ViT supernet | [Download](./supernet_imagenet_1k.pth) |
| supernet_nyu | Stage 2: fine-tuned on NYU Depth v2 with CA-DDKD | [Download](./SuperNet_FT/supernet_nyu) |
| supernet_kitti | Stage 2: fine-tuned on KITTI with CA-DDKD | [Download](./SuperNet_FT/supernet_kitti) |
| supernet_ade20k | Stage 2: fine-tuned on ADE20K with CA-DDKD | [Download](./SuperNet_FT/supernet_ade20k.pth) |
| supernet_coco | Stage 2: fine-tuned on COCO with CA-DDKD | [Download](./SuperNet_FT/supernet_coco.pth) |

### Teacher Models (Depth Anything)

| Checkpoint | Description | Weight |
|:-----------|:------------|:------:|
| nyu_depth_anything | Depth Anything metric indoor teacher | [Download](./pre_DA/nyu_depth_anything_metric_depth_indoor.pt) |
| kitti_depth_anything | Depth Anything metric outdoor teacher | [Download](./pre_DA/kitti_depth_anything_metric_depth_outdoor.pt) |
| ade20k_vitl | ViT-L teacher for ADE20K segmentation | [Download](./pre_DA/ade20k_vitl_mIoU_59.4.pth) |
| coco_dinov2 | DINOv2 teacher for COCO detection | [Download](./pre_DA/coco_dinov2_epoch_12.pth) |
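CA-DDKD distills these teachers into the supernet in both the spatial and the DCT (frequency) domain. The actual loss is defined in the paper and repository; purely as a sketch of the idea, a dual-domain feature-matching loss might combine a spatial MSE with an MSE over the low-frequency block of a 2D DCT (all names and the low-frequency masking choice below are our assumptions, not the repository's code):

```python
import math
import torch
import torch.nn.functional as F

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = torch.arange(n).float()
    basis = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] *= 1 / math.sqrt(n)
    basis[1:] *= math.sqrt(2 / n)
    return basis

def dual_domain_loss(student_feat, teacher_feat, alpha=0.5, keep=0.25):
    """Spatial MSE plus MSE over the low-frequency 2D DCT coefficients."""
    spatial = F.mse_loss(student_feat, teacher_feat)
    h, w = student_feat.shape[-2:]
    Dh, Dw = dct_matrix(h), dct_matrix(w)
    s_freq = Dh @ student_feat @ Dw.T   # separable 2D DCT over the last two dims
    t_freq = Dh @ teacher_feat @ Dw.T
    kh, kw = max(1, int(h * keep)), max(1, int(w * keep))
    # Constrain only the low-frequency (top-left) coefficients; with the full
    # orthonormal transform the frequency MSE would equal the spatial MSE.
    freq = F.mse_loss(s_freq[..., :kh, :kw], t_freq[..., :kh, :kw])
    return alpha * spatial + (1 - alpha) * freq
```

Restricting the frequency term to a sub-band is one plausible way a DCT constraint differs from a plain spatial one; see the paper for the actual formulation.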

## Quick Start

```python
# Download a specific model
from huggingface_hub import hf_hub_download

# Example: download EvoNet-N3 (NYU Depth v2)
ckpt_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="EvoNAS/evonet_n3_best_abs_rel_0.08475",
)

# Example: download the ImageNet-1K pretrained supernet
supernet_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="supernet_imagenet_1k.pth",
)
```

```python
# Download all checkpoints
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    local_dir="./evonas_checkpoints",
)
```

## Usage

Please refer to our [GitHub repository](https://github.com/EMI-Group/evonas) for full training, search, and evaluation instructions.

### Inference Example (Monocular Depth Estimation)

```python
import torch
from networks.EvoMambaDepthNet import EvoMambaDepthNet

# Define the searched architecture genotype
evonet_n3_genotype = {
    # Replace with the actual searched genotype from the search logs
    "d_state": [...],
    "ssm_expand": [...],
    "mlp_ratio": [...],
    "depth": [...],
}

model = EvoMambaDepthNet(genotype=evonet_n3_genotype)
checkpoint = torch.load("evonet_n3_best_abs_rel_0.08475", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Run inference
with torch.no_grad():
    depth = model(image_tensor)
```
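The `image_tensor` above is assumed to be a normalized `(1, 3, H, W)` float batch. A typical preprocessing step looks like the following; ImageNet normalization is an assumption here, so check the repository's data pipeline for the statistics it actually uses:

```python
import torch

# Assumed normalization statistics (standard ImageNet values).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess(image_uint8):
    """HWC uint8 RGB image -> normalized (1, 3, H, W) float tensor."""
    x = torch.from_numpy(image_uint8).float() / 255.0  # scale to [0, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                # HWC -> 1xCxHxW
    return (x - IMAGENET_MEAN) / IMAGENET_STD

# e.g. image_tensor = preprocess(np.asarray(Image.open("rgb.png").convert("RGB")))
```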

## File Structure

```
.
├── EvoNAS/                       # Searched EvoNet checkpoints
│   ├── evonet_c{1,2,3}_*         # COCO object detection
│   ├── evonet_a{1,2,3}_*         # ADE20K semantic segmentation
│   ├── evonet_k{1,2,3}_*         # KITTI depth estimation
│   ├── evonet_n{1,2,3}_*         # NYU v2 depth estimation
│   └── logs/                     # Training logs
├── NVS/                          # Novel view synthesis checkpoint
│   └── epoch_9-step_150000.ckpt
├── SuperNet_FT/                  # Fine-tuned supernet checkpoints
│   ├── supernet_ade20k.pth
│   ├── supernet_coco.pth
│   ├── supernet_kitti
│   └── supernet_nyu
├── pre_DA/                       # Teacher model checkpoints
│   ├── ade20k_vitl_mIoU_59.4.pth
│   ├── coco_dinov2_epoch_12.pth
│   ├── kitti_depth_anything_metric_depth_outdoor.pt
│   └── nyu_depth_anything_metric_depth_indoor.pt
└── supernet_imagenet_1k.pth      # ImageNet-1K pretrained supernet
```

## Citation

```bibtex
@article{zhang2025evonas,
  title={Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search},
  author={Zhang, Haoyu and Yu, Zhihao and Wang, Rui and Jin, Yaochu and Liu, Qiqi and Cheng, Ran},
  journal={arXiv preprint arXiv:2603.19563},
  year={2025}
}
```

## Acknowledgements

We thank the open-source community behind [PyTorch](https://pytorch.org/), [Mamba SSM](https://github.com/state-spaces/mamba), [Spatial-Mamba](https://github.com/EdwardChasel/Spatial-Mamba), [MMDetection](https://github.com/open-mmlab/mmdetection), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Depth Anything](https://github.com/LiheYoung/Depth-Anything), [pymoo](https://github.com/anyoptimization/pymoo), and [timm](https://github.com/huggingface/pytorch-image-models).