---
license: cc-by-nc-sa-4.0
tags:
- neural-architecture-search
- evolutionary-computation
- computer-vision
- depth-estimation
- object-detection
- semantic-segmentation
- 3d-gaussian-splatting
- mamba
- vision-transformer
- multi-objective-optimization
datasets:
- imagenet-1k
- detection-datasets/coco
- scene_parse_150
- kitti
- nyu_depth_v2
- RealEstate10K
metrics:
- mAP
- miou
- abs_rel
- psnr
- ssim
pipeline_tag: depth-estimation
library_name: pytorch
---

# EvoNAS: Dual-Domain Representation Alignment for Geometry-Aware Architecture Search

<p align="center">
  <a href="https://arxiv.org/abs/2603.19563"><img src="https://img.shields.io/badge/Paper-arXiv-red" alt="arXiv"></a>
  <a href="https://github.com/EMI-group/evonas"><img src="https://img.shields.io/badge/Code-GitHub-blue" alt="GitHub"></a>
</p>

## Overview

EvoNAS is a multi-objective evolutionary neural architecture search framework that discovers **Pareto-optimal vision backbones** bridging 2D dense prediction and 3D rendering. It features:

- **Hybrid VSS-ViT Search Space**: Combines Vision State Space (Mamba) blocks with Vision Transformers
- **CA-DDKD**: Cross-Architecture Dual-Domain Knowledge Distillation via DCT constraints
- **DMMPE**: Hardware-isolated distributed evaluation engine for unbiased latency measurement
- **Progressive Supernet Training (PST)**: Curriculum-based weight-sharing optimization

The discovered **EvoNets** achieve state-of-the-art accuracy-efficiency trade-offs across object detection, semantic segmentation, monocular depth estimation, and novel view synthesis.

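The "dual-domain" idea behind CA-DDKD is to align student and teacher features both spatially and in the DCT frequency domain. The sketch below illustrates that constraint only; the function name `dual_domain_loss`, the `alpha` weighting, and the NumPy/SciPy formulation are illustrative assumptions, not the repository's actual API:

```python
import numpy as np
from scipy.fft import dctn


def dual_domain_loss(student_feat, teacher_feat, alpha=0.5):
    # Spatial-domain alignment: mean absolute difference of feature maps.
    spatial = np.abs(student_feat - teacher_feat).mean()
    # Frequency-domain alignment: compare 2D DCT coefficients over the
    # last two (spatial) axes; norm="ortho" keeps the transform orthonormal.
    freq = np.abs(
        dctn(student_feat, axes=(-2, -1), norm="ortho")
        - dctn(teacher_feat, axes=(-2, -1), norm="ortho")
    ).mean()
    # Blend the two domains; alpha weights the frequency term.
    return (1.0 - alpha) * spatial + alpha * freq
```

Identical features give zero loss in both domains, while mismatched high-frequency detail is penalized by the DCT term even when spatial averages agree.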
## Model Zoo

### Searched Architectures (EvoNets)

#### Object Detection on COCO (Mask R-CNN)

| Model | Params | MACs | AP^b | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:----:|:-------:|:----------:|:---:|:------:|
| EvoNet-C1 | 33M | 190G | 45.4 | 50.2ms | 26 FPS | 1.39 | [Download](./EvoNAS/evonet_c1_best_coco_bbox_mAP_0.45400) |
| EvoNet-C2 | 36M | 202G | 47.1 | 55.4ms | 23 FPS | 1.29 | [Download](./EvoNAS/evonet_c2_best_coco_bbox_mAP_0.46800) |
| EvoNet-C3 | 42M | 228G | 48.5 | 66.9ms | 18 FPS | 1.15 | [Download](./EvoNAS/evonet_c3_best_coco_bbox_mAP_0.48300) |

#### Semantic Segmentation on ADE20K (UPerNet)

| Model | Params | MACs | mIoU | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:----:|:-------:|:----------:|:---:|:------:|
| EvoNet-A1 | 23M | 711G | 44.1 | 77.3ms | 14 FPS | 1.93 | [Download](./EvoNAS/evonet_a1_best_mIoU_44.10000) |
| EvoNet-A2 | 26M | 724G | 47.3 | 81.0ms | 13 FPS | 1.79 | [Download](./EvoNAS/evonet_a2_best_mIoU_47.26000) |
| EvoNet-A3 | 32M | 754G | 49.7 | 94.8ms | 12 FPS | 1.57 | [Download](./EvoNAS/evonet_a3_best_mIoU_49.72000) |

#### Monocular Depth Estimation on KITTI

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:--------:|:---:|:-------:|:----------:|:---:|:------:|
| EvoNet-K1 | 18.0M | 27.3G | 0.060 | 0.960 | 18.6ms | 117 FPS | 5.34 | [Download](./EvoNAS/evonet_k1_best_abs_rel_0.05967) |
| EvoNet-K2 | 22.6M | 36.2G | 0.056 | 0.966 | 24.6ms | 83 FPS | 4.28 | [Download](./EvoNAS/evonet_k2_best_abs_rel_0.05517) |
| EvoNet-K3 | 26.3M | 45.0G | 0.054 | 0.969 | 28.0ms | 65 FPS | 3.68 | [Download](./EvoNAS/evonet_k3_best_abs_rel_0.05375) |

#### Monocular Depth Estimation on NYU Depth v2

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|:------|:------:|:----:|:--------:|:---:|:-------:|:----------:|:---:|:------:|
| EvoNet-N1 | 19.1M | 21.7G | 0.095 | 0.912 | 21.8ms | 138 FPS | 4.77 | [Download](./EvoNAS/evonet_n1_best_abs_rel_0.09456) |
| EvoNet-N2 | 24.1M | 27.1G | 0.089 | 0.926 | 25.9ms | 107 FPS | 3.85 | [Download](./EvoNAS/evonet_n2_best_abs_rel_0.08818) |
| EvoNet-N3 | 30.3M | 33.9G | 0.085 | 0.932 | 30.8ms | 88 FPS | 3.08 | [Download](./EvoNAS/evonet_n3_best_abs_rel_0.08475) |

#### Novel View Synthesis on RealEstate10K (3DGS)

| Model | Params | PSNR↑ | SSIM↑ | LPIPS↓ | Latency | Throughput | Weight |
|:------|:------:|:-----:|:-----:|:------:|:-------:|:----------:|:------:|
| EvoNet-D | 44M | 26.41 | 0.871 | 0.127 | 88ms | 27 FPS | [Download](./NVS/epoch_9-step_150000.ckpt) |

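Each table above lists points along an accuracy–latency trade-off: no EvoNet in a family is both more accurate and faster than a sibling. Pareto dominance between such configurations can be checked in a few lines (a generic sketch, not code from this repository):

```python
def dominates(a, b):
    # a and b are tuples of objectives to minimize, e.g. (error, latency_ms).
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    # Keep the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# KITTI EvoNets as (Abs Rel, latency in ms): each trades accuracy for speed.
kitti = [(0.060, 18.6), (0.056, 24.6), (0.054, 28.0)]
```

Here `pareto_front(kitti)` keeps all three models, since none dominates another.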
### Supernet Checkpoints

| Checkpoint | Description | Weight |
|:-----------|:------------|:------:|
| supernet_imagenet_1k | Stage 1: ImageNet-1K pretrained VSS-ViT supernet | [Download](./supernet_imagenet_1k.pth) |
| supernet_nyu | Stage 2: Fine-tuned on NYU Depth v2 with CA-DDKD | [Download](./SuperNet_FT/supernet_nyu) |
| supernet_kitti | Stage 2: Fine-tuned on KITTI with CA-DDKD | [Download](./SuperNet_FT/supernet_kitti) |
| supernet_ade20k | Stage 2: Fine-tuned on ADE20K with CA-DDKD | [Download](./SuperNet_FT/supernet_ade20k.pth) |
| supernet_coco | Stage 2: Fine-tuned on COCO with CA-DDKD | [Download](./SuperNet_FT/supernet_coco.pth) |

### Teacher Models (Depth Anything)

| Checkpoint | Description | Weight |
|:-----------|:------------|:------:|
| nyu_depth_anything | Depth Anything metric indoor teacher | [Download](./pre_DA/nyu_depth_anything_metric_depth_indoor.pt) |
| kitti_depth_anything | Depth Anything metric outdoor teacher | [Download](./pre_DA/kitti_depth_anything_metric_depth_outdoor.pt) |
| ade20k_vitl | ViT-L teacher for ADE20K segmentation | [Download](./pre_DA/ade20k_vitl_mIoU_59.4.pth) |
| coco_dinov2 | DINOv2 teacher for COCO detection | [Download](./pre_DA/coco_dinov2_epoch_12.pth) |

## Quick Start

```python
# Download a specific model
from huggingface_hub import hf_hub_download

# Example: Download EvoNet-N3 (NYU Depth v2)
ckpt_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="EvoNAS/evonet_n3_best_abs_rel_0.08475",
)

# Example: Download the ImageNet-1K pretrained supernet
supernet_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="supernet_imagenet_1k.pth",
)
```

```python
# Download all checkpoints
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    local_dir="./evonas_checkpoints",
)
```

## Usage

Please refer to our [GitHub repository](https://github.com/EMI-Group/evonas) for full training, search, and evaluation instructions.

### Inference Example (Monocular Depth Estimation)

```python
import torch
from networks.EvoMambaDepthNet import EvoMambaDepthNet

# Define the searched architecture genotype
evonet_n3_genotype = {
    # Replace with the actual searched genotype from the search logs
    "d_state": [...],
    "ssm_expand": [...],
    "mlp_ratio": [...],
    "depth": [...],
}

model = EvoMambaDepthNet(genotype=evonet_n3_genotype)
checkpoint = torch.load("evonet_n3_best_abs_rel_0.08475", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Run inference on a preprocessed RGB image tensor of shape (1, 3, H, W)
with torch.no_grad():
    depth = model(image_tensor)
```

## File Structure

```
.
├── EvoNAS/                   # Searched EvoNet checkpoints
│   ├── evonet_c{1,2,3}_*     # COCO object detection
│   ├── evonet_a{1,2,3}_*     # ADE20K semantic segmentation
│   ├── evonet_k{1,2,3}_*     # KITTI depth estimation
│   ├── evonet_n{1,2,3}_*     # NYU v2 depth estimation
│   └── logs/                 # Training logs
├── NVS/                      # Novel view synthesis checkpoint
│   └── epoch_9-step_150000.ckpt
├── SuperNet_FT/              # Fine-tuned supernet checkpoints
│   ├── supernet_ade20k.pth
│   ├── supernet_coco.pth
│   ├── supernet_kitti
│   └── supernet_nyu
├── pre_DA/                   # Teacher model checkpoints
│   ├── ade20k_vitl_mIoU_59.4.pth
│   ├── coco_dinov2_epoch_12.pth
│   ├── kitti_depth_anything_metric_depth_outdoor.pt
│   └── nyu_depth_anything_metric_depth_indoor.pt
└── supernet_imagenet_1k.pth  # ImageNet-1K pretrained supernet
```

## Citation

```bibtex
@article{zhang2025evonas,
  title={Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search},
  author={Zhang, Haoyu and Yu, Zhihao and Wang, Rui and Jin, Yaochu and Liu, Qiqi and Cheng, Ran},
  journal={arXiv preprint arXiv:2603.19563},
  year={2025}
}
```

## Acknowledgements

We thank the open-source community behind [PyTorch](https://pytorch.org/), [Mamba SSM](https://github.com/state-spaces/mamba), [Spatial-Mamba](https://github.com/EdwardChasel/Spatial-Mamba), [MMDetection](https://github.com/open-mmlab/mmdetection), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Depth Anything](https://github.com/LiheYoung/Depth-Anything), [pymoo](https://github.com/anyoptimization/pymoo), and [timm](https://github.com/huggingface/pytorch-image-models).