---
license: apache-2.0
---

**[CVPR'25] Official PyTorch implementation of "[MobileMamba: Lightweight Multi-Receptive Visual Mamba Network](https://arxiv.org/pdf/2411.15941)".**
---
[![arXiv](https://img.shields.io/badge/Arxiv-2411.15941-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2411.15941)
[![github](https://img.shields.io/github/stars/lewandofskee/MobileMamba.svg?style=social)](https://github.com/lewandofskee/MobileMamba)

[Haoyang He<sup>1*</sup>](https://scholar.google.com/citations?hl=zh-CN&user=8NfQv1sAAAAJ),
[Jiangning Zhang<sup>2*</sup>](https://zhangzjn.github.io),
[Yuxuan Cai<sup>3</sup>](https://scholar.google.com/citations?hl=zh-CN&user=J9lTFAUAAAAJ),
[Hongxu Chen<sup>1</sup>](https://scholar.google.com/citations?hl=zh-CN&user=uFT3YfMAAAAJ),
[Xiaobin Hu<sup>2</sup>](https://scholar.google.com/citations?hl=zh-CN&user=3lMuodUAAAAJ),

[Zhenye Gan<sup>2</sup>](https://scholar.google.com/citations?user=fa4NkScAAAAJ&hl=zh-CN),
[Yabiao Wang<sup>2</sup>](https://scholar.google.com/citations?user=xiK4nFUAAAAJ&hl=zh-CN),
[Chengjie Wang<sup>2</sup>](https://scholar.google.com/citations?hl=zh-CN&user=fqte5H4AAAAJ),
Yunsheng Wu<sup>2</sup>,
[Lei Xie<sup>1†</sup>](https://scholar.google.com/citations?hl=zh-CN&user=7ZZ_-m0AAAAJ)

<sup>1</sup>College of Control Science and Engineering, Zhejiang University,
<sup>2</sup>Youtu Lab, Tencent,
<sup>3</sup>Huazhong University of Science and Technology

> **Abstract:** Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. CNNs, with their local receptive fields, struggle to capture long-range dependencies, while Transformers, despite their global modeling capabilities, are limited by quadratic computational complexity in high-resolution scenarios. Recently, state-space models have gained popularity in the visual domain due to their linear computational complexity. Despite their low FLOPs, current lightweight Mamba-based models exhibit suboptimal throughput.
>
> In this work, we propose the MobileMamba framework, which balances efficiency and performance. We design a three-stage network to significantly improve inference speed. At a fine-grained level, we introduce the Multi-Receptive Field Feature Interaction (MRFFI) module, comprising the Long-Range Wavelet Transform-Enhanced Mamba (WTE-Mamba), Efficient Multi-Kernel Depthwise Convolution (MK-DeConv), and Eliminate Redundant Identity components. This module integrates multi-receptive field information and enhances high-frequency detail extraction. Additionally, we employ training and testing strategies to further improve performance and efficiency.
>
> MobileMamba achieves up to 83.6% Top-1 accuracy on ImageNet-1K, surpassing existing state-of-the-art methods while running up to 21× faster than LocalVim on GPU. Extensive experiments on high-resolution downstream tasks demonstrate that MobileMamba surpasses current efficient models, achieving an optimal balance between speed and accuracy.


------
# Classification Results
### Image Classification for [ImageNet-1K](https://www.image-net.org):
| Model | FLOPs | #Params | Resolution | Top-1 | Cfg | Log | Model |
|--------------------------|:-----:|:-------:|:----------:|:-----:|:---------------------------------------------:|:--------------------------------------------------:|:----------------------------------------------------:|
| MobileMamba-T2 | 255M | 8.8M | 192 x 192 | 71.5 | [cfg](configs/mobilemamba/mobilemamba_t2.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2/mobilemamba_t2.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2/mobilemamba_t2.pth) |
| MobileMamba-T2† | 255M | 8.8M | 192 x 192 | 76.9 | [cfg](configs/mobilemamba/mobilemamba_t2s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2s/mobilemamba_t2s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2s/mobilemamba_t2s.pth) |
| MobileMamba-T4 | 413M | 14.2M | 192 x 192 | 76.1 | [cfg](configs/mobilemamba/mobilemamba_t4.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4/mobilemamba_t4.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4/mobilemamba_t4.pth) |
| MobileMamba-T4† | 413M | 14.2M | 192 x 192 | 78.9 | [cfg](configs/mobilemamba/mobilemamba_t4s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4s/mobilemamba_t4s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4s/mobilemamba_t4s.pth) |
| MobileMamba-S6 | 652M | 15.0M | 224 x 224 | 78.0 | [cfg](configs/mobilemamba/mobilemamba_s6.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6/mobilemamba_s6.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6/mobilemamba_s6.pth) |
| MobileMamba-S6† | 652M | 15.0M | 224 x 224 | 80.7 | [cfg](configs/mobilemamba/mobilemamba_s6s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6s/mobilemamba_s6s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6s/mobilemamba_s6s.pth) |
| MobileMamba-B1 | 1080M | 17.1M | 256 x 256 | 79.9 | [cfg](configs/mobilemamba/mobilemamba_b1.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1/mobilemamba_b1.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1/mobilemamba_b1.pth) |
| MobileMamba-B1† | 1080M | 17.1M | 256 x 256 | 82.2 | [cfg](configs/mobilemamba/mobilemamba_b1s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1s/mobilemamba_b1s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1s/mobilemamba_b1s.pth) |
| MobileMamba-B2 | 2427M | 17.1M | 384 x 384 | 81.6 | [cfg](configs/mobilemamba/mobilemamba_b2.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2/mobilemamba_b2.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2/mobilemamba_b2.pth) |
| MobileMamba-B2† | 2427M | 17.1M | 384 x 384 | 83.3 | [cfg](configs/mobilemamba/mobilemamba_b2s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2s/mobilemamba_b2s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2s/mobilemamba_b2s.pth) |
| MobileMamba-B4 | 4313M | 17.1M | 512 x 512 | 82.5 | [cfg](configs/mobilemamba/mobilemamba_b4.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4/mobilemamba_b4.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4/mobilemamba_b4.pth) |
| MobileMamba-B4† | 4313M | 17.1M | 512 x 512 | 83.6 | [cfg](configs/mobilemamba/mobilemamba_b4s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4s/mobilemamba_b4s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4s/mobilemamba_b4s.pth) |

------
# Downstream Results
## Object Detection and Instance Segmentation Results
### Object Detection and Instance Segmentation Performance Based on [Mask-RCNN](https://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf) for [COCO2017](https://cocodataset.org):
| Backbone | AP<sup>b</sup> | AP<sup>b</sup><sub>50</sub> | AP<sup>b</sup><sub>75</sub> | AP<sup>b</sup><sub>S</sub> | AP<sup>b</sup><sub>M</sub> | AP<sup>b</sup><sub>L</sub> | AP<sup>m</sup> | AP<sup>m</sup><sub>50</sub> | AP<sup>m</sup><sub>75</sub> | AP<sup>m</sup><sub>S</sub> | AP<sup>m</sup><sub>M</sub> | AP<sup>m</sup><sub>L</sub> | #Params | FLOPs | Cfg | Log | Model |
|:--------:|:--------------:|:---------------------------:|:---------------------------:|:--------------------------:|:--------------------------:|:--------------------------:|:--------------:|:---------------------------:|:---------------------------:|:--------------------------:|:--------------------------:|:--------------------------:|:-------:|:-----:|:------------------------------------------:|:------------------------------------------:|:--------------------------------------------:|
| MobileMamba-B1 | 40.6 | 61.8 | 43.8 | 22.4 | 43.5 | 55.9 | 37.4 | 58.9 | 39.9 | 17.1 | 39.9 | 56.4 | 38.0M | 178G | [cfg](downstream/det/configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/maskrcnn.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/maskrcnn.pth) |

### Object Detection Performance Based on [RetinaNet](https://openaccess.thecvf.com/content_ICCV_2017/papers/Lin_Focal_Loss_for_ICCV_2017_paper.pdf) for [COCO2017](https://cocodataset.org):
| Backbone | AP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | #Params | FLOPs | Cfg | Log | Model |
|:--------:|:----:|:---------------:|:---------------:|:--------------:|:--------------:|:--------------:|:-------:|:-----:|:-------------------------------------------:|:-------------------------------------------:|:---------------------------------------------:|
| MobileMamba-B1 | 39.6 | 59.8 | 42.4 | 21.5 | 43.4 | 53.9 | 27.1M | 151G | [cfg](downstream/det/configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/retinanet.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/retinanet.pth) |

### Object Detection Performance Based on [SSDLite](https://openaccess.thecvf.com/content_ICCV_2019/papers/Howard_Searching_for_MobileNetV3_ICCV_2019_paper.pdf) for [COCO2017](https://cocodataset.org):
| Backbone | AP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | #Params | FLOPs | Cfg | Log | Model |
|:-------------------:|:----:|:---------------:|:---------------:|:--------------:|:--------------:|:--------------:|:-------:|:-----:|:-----------------------------------------------------------------------------:|:------------------------------:|:-----------------------------------------------:|
| MobileMamba-B1 | 24.0 | 39.5 | 24.0 | 3.1 | 23.4 | 46.9 | 18.0M | 1.7G | [cfg](downstream/det/configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite.pth) |
| MobileMamba-B1-r512 | 29.5 | 47.7 | 30.4 | 8.9 | 35.0 | 47.0 | 18.0M | 4.4G | [cfg](downstream/det/configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite_512.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite_512.pth) |

## Semantic Segmentation Results
### Semantic Segmentation Based on [Semantic FPN](https://openaccess.thecvf.com/content_CVPR_2019/papers/Kirillov_Panoptic_Feature_Pyramid_Networks_CVPR_2019_paper.pdf) for [ADE20k](http://sceneparsing.csail.mit.edu/):
| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs | Cfg | Log | Model |
|:--------:|:----:|:----:|:----:|:-------:|:-----:|:-------------------------------------:|:-------------------------------------:|:---------------------------------------:|
| MobileMamba-B4 | 79.9 | 42.5 | 53.7 | 19.8M | 5.6G | [cfg](downstream/seg/configs/sem_fpn/fpn_mobilemamba_b4-160k_ade20k-512x512.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/fpn.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/fpn.pth) |

### Semantic Segmentation Based on [DeepLabv3](https://arxiv.org/pdf/1706.05587.pdf) for [ADE20k](http://sceneparsing.csail.mit.edu/):
| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs | Cfg | Log | Model |
|:--------------:|:----:|:----:|:----:|:-------:|:-----:|:-------------------------------------------:|:-------------------------------------------:|:---------------------------------------------:|
| MobileMamba-B4 | 76.3 | 36.6 | 47.1 | 23.4M | 4.7G | [cfg](downstream/seg/configs/deeplabv3/deeplabv3_mobilemamba_b4-80k_ade20k-512x512.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/deeplabv3.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/deeplabv3.pth) |

### Semantic Segmentation Based on [PSPNet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf) for [ADE20k](http://sceneparsing.csail.mit.edu/):
| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs | Cfg | Log | Model |
|:--------:|:----:|:----:|:----:|:-------:|:-----:|:----------------------------------------:|:----------------------------------------:|:------------------------------------------:|
| MobileMamba-B4 | 76.2 | 36.9 | 47.9 | 20.5M | 4.5G | [cfg](downstream/seg/configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/pspnet.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/pspnet.pth) |

------
# All Pretrained Weights and Logs

The model weights and log files for all classification and downstream tasks are available for download via [Google Drive](https://drive.google.com/file/d/1EDqWI6JKMaLZRSRWt9aM7VXaNvosStGE/view?usp=drive_link) and [Hugging Face](https://huggingface.co/Lewandofski/MobileMamba/tree/main).
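
To fetch a single classification checkpoint from the command line, the sketch below downloads it into the `weights/` path that the test commands expect. It assumes Hugging Face's usual `resolve/main` download URLs (the table's `blob/main` links with `blob` replaced by `resolve`); `fetch_mobilemamba_weights`, `HF_REPO_URL` and the `DRY_RUN` switch are hypothetical names, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): download one classification
# checkpoint to weights/MobileMamba_<V>/mobilemamba_<v>.pth.
# Assumption: Hugging Face serves raw files under .../resolve/main/<path>.
HF_REPO_URL="https://huggingface.co/Lewandofski/MobileMamba/resolve/main"

fetch_mobilemamba_weights() {
  local v="$1"   # variant suffix used in the file names: t2, t2s, t4, ..., b4s
  # t2s -> MobileMamba_T2s: upper-case only the first two characters
  local dir="MobileMamba_$(printf '%s' "${v:0:2}" | tr '[:lower:]' '[:upper:]')${v:2}"
  local rel="$dir/mobilemamba_$v.pth"
  if [ -n "${DRY_RUN:-}" ]; then
    # only show what would be downloaded and where it would be stored
    echo "$HF_REPO_URL/$rel -> weights/$rel"
    return 0
  fi
  mkdir -p "weights/$dir"
  # -f: fail on HTTP errors, -L: follow redirects to the file storage backend
  curl -fL -o "weights/$rel" "$HF_REPO_URL/$rel"
}
```

For example, `fetch_mobilemamba_weights b1s` should place the checkpoint at `weights/MobileMamba_B1s/mobilemamba_b1s.pth`. Alternatively, `huggingface-cli download Lewandofski/MobileMamba --local-dir weights` mirrors the whole repository, which should also cover the `downstream/det` and `downstream/seg` checkpoints referenced as `../../weights/downstream/...` in the downstream commands.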

------
# Classification
## Environments
```shell
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip3 install timm==0.6.5 tensorboardX einops torchprofile fvcore==0.1.5.post20221221
cd model/lib_mamba/kernels/selective_scan && pip install . && cd ../../../..
# (optional) NVIDIA Apex
git clone https://github.com/NVIDIA/apex && cd apex && pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
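
Because the `selective_scan` kernel is compiled against whichever PyTorch build is installed, it can help to confirm the pinned build before running `pip install .`. The helper below is a hypothetical sketch that only compares version strings; the expected value `2.1.2+cu118` is an assumption derived from the version and CUDA 11.8 wheel index pinned above.

```shell
# Hypothetical helper: compare an installed torch version string against the
# build pinned above (torch 2.1.2 from the CUDA 11.8 wheel index).
check_torch_build() {
  local actual="$1"                   # e.g. output of python3 -c 'import torch; print(torch.__version__)'
  local expected="${2:-2.1.2+cu118}"  # assumed version string of the pinned wheel
  if [ "$actual" = "$expected" ]; then
    echo "OK: torch $actual"
    return 0
  fi
  echo "Mismatch: found torch '$actual', expected '$expected'" >&2
  return 1
}
```

Usage: `check_torch_build "$(python3 -c 'import torch; print(torch.__version__)')"` should print `OK: torch 2.1.2+cu118` when the pinned wheel is installed, and returns a non-zero status otherwise.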

## Prepare ImageNet-1K Dataset
Download and extract the [ImageNet-1K](http://image-net.org/) dataset into the following directory structure:

```
imagenet
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   └── ...
│   └── ...
├── train.txt (optional)
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   └── ...
│   └── ...
└── val.txt (optional)
```
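
A quick way to catch layout mistakes before launching training is to count the class folders in each split. The sketch below is a hypothetical helper (not part of the repository) that assumes the layout above, with one sub-folder per WordNet ID in both `train` and `val`. Note that the official ILSVRC2012 validation images ship in a single flat folder, so they have to be sorted into class folders first.

```shell
# Hypothetical helper: sanity-check an ImageNet-style folder laid out as
# <root>/{train,val}/<wnid>/*.JPEG by counting class sub-folders per split.
check_imagenet_layout() {
  local root="$1"
  local expected="${2:-1000}"   # ImageNet-1K has 1000 classes
  local split n
  for split in train val; do
    if [ ! -d "$root/$split" ]; then
      echo "missing directory: $root/$split" >&2
      return 1
    fi
    # one directory per WordNet ID, e.g. n01440764
    n=$(find "$root/$split" -mindepth 1 -maxdepth 1 -type d | wc -l)
    if [ "$n" -ne "$expected" ]; then
      echo "$split: found $n class folders, expected $expected" >&2
      return 1
    fi
  done
  echo "OK: $root has $expected class folders in train and val"
}
```

`check_imagenet_layout /path/to/imagenet` should report `OK` for a complete copy; the optional second argument overrides the expected class count, which is handy for testing on a small subset.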

## Test
Test with 8 GPUs in one node:

<details>
<summary>
MobileMamba-T2
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T2/mobilemamba_t2.pth
```
This should give `Top-1: 73.638 (Top-5: 91.422)`
</details>

<details>
<summary>
MobileMamba-T2†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T2s/mobilemamba_t2s.pth
```
This should give `Top-1: 76.934 (Top-5: 93.100)`
</details>

<details>
<summary>
MobileMamba-T4
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T4/mobilemamba_t4.pth
```
This should give `Top-1: 76.086 (Top-5: 92.772)`
</details>

<details>
<summary>
MobileMamba-T4†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T4s/mobilemamba_t4s.pth
```
This should give `Top-1: 78.914 (Top-5: 94.160)`
</details>

<details>
<summary>
MobileMamba-S6
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_S6/mobilemamba_s6.pth
```
This should give `Top-1: 78.002 (Top-5: 93.992)`
</details>

<details>
<summary>
MobileMamba-S6†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_S6s/mobilemamba_s6s.pth
```
This should give `Top-1: 80.742 (Top-5: 95.182)`
</details>

<details>
<summary>
MobileMamba-B1
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B1/mobilemamba_b1.pth
```
This should give `Top-1: 79.948 (Top-5: 94.924)`
</details>

<details>
<summary>
MobileMamba-B1†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B1s/mobilemamba_b1s.pth
```
This should give `Top-1: 82.234 (Top-5: 95.872)`
</details>

<details>
<summary>
MobileMamba-B2
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B2/mobilemamba_b2.pth
```
This should give `Top-1: 81.624 (Top-5: 95.890)`
</details>

<details>
<summary>
MobileMamba-B2†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B2s/mobilemamba_b2s.pth
```
This should give `Top-1: 83.260 (Top-5: 96.438)`
</details>

<details>
<summary>
MobileMamba-B4
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B4/mobilemamba_b4.pth
```
This should give `Top-1: 82.496 (Top-5: 96.252)`
</details>

<details>
<summary>
MobileMamba-B4†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B4s/mobilemamba_b4s.pth
```
This should give `Top-1: 83.644 (Top-5: 96.606)`
</details>
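
Each test run above reports its accuracy as `Top-1: ... (Top-5: ...)`. Assuming that exact format (copied from the expected outputs in this section), the hypothetical helper below extracts the Top-1 value from a line of output and checks it against the expected number within a small tolerance. Plain shell arithmetic is integer-only, so the floating-point comparison is delegated to `awk`.

```shell
# Hypothetical helper: parse "Top-1: X (Top-5: Y)" and compare X with an
# expected value. Returns 0 if |X - expected| <= tol, 1 if not, 2 if no value.
check_top1() {
  local line="$1" expected="$2" tol="${3:-0.05}"
  local top1
  top1=$(printf '%s\n' "$line" | sed -n 's/.*Top-1: \([0-9.]*\).*/\1/p')
  if [ -z "$top1" ]; then
    echo "no 'Top-1:' value found" >&2
    return 2
  fi
  # awk exits 0 when the absolute difference is within the tolerance
  if awk -v a="$top1" -v b="$expected" -v t="$tol" \
       'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= t) }'; then
    echo "OK: Top-1 $top1 (expected $expected)"
  else
    echo "Top-1 $top1 differs from expected $expected by more than $tol" >&2
    return 1
  fi
}
```

For example, after saving a run's output with `... | tee test_t2.log`, call `check_top1 "$(grep 'Top-1:' test_t2.log | tail -n 1)" 73.638`.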

## Train
Train with 8 GPUs in one node:

<details>
<summary>
MobileMamba-T2
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2 -m train
```
</details>

<details>
<summary>
MobileMamba-T2†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2s -m train
```
</details>

<details>
<summary>
MobileMamba-T4
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4 -m train
```
</details>

<details>
<summary>
MobileMamba-T4†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4s -m train
```
</details>

<details>
<summary>
MobileMamba-S6
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6 -m train
```
</details>

<details>
<summary>
MobileMamba-S6†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6s -m train
```
</details>

<details>
<summary>
MobileMamba-B1
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1 -m train
```
</details>

<details>
<summary>
MobileMamba-B1†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1s -m train
```
</details>

<details>
<summary>
MobileMamba-B2
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2 -m train
```
</details>

<details>
<summary>
MobileMamba-B2†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2s -m train
```
</details>

<details>
<summary>
MobileMamba-B4
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4 -m train
```
</details>

<details>
<summary>
MobileMamba-B4†
</summary>

```
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4s -m train
```
</details>
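
The train and test commands above differ only in the variant name, so they can be generated from it. The helper below is a hypothetical sketch that reproduces the path pattern used in this README (`configs/mobilemamba/mobilemamba_<v>` and `weights/MobileMamba_<V>/mobilemamba_<v>.pth`) and only prints the command, so it can be inspected before it is run.

```shell
# Hypothetical helper: print the launch command for a MobileMamba variant.
# Usage: mobilemamba_cmd <train|test> <variant> [nproc_per_node]
mobilemamba_cmd() {
  local mode="$1" v="$2" nproc="${3:-8}"
  case "$v" in
    t2|t2s|t4|t4s|s6|s6s|b1|b1s|b2|b2s|b4|b4s) ;;
    *) echo "unknown variant: $v" >&2; return 1 ;;
  esac
  # t2s -> MobileMamba_T2s: upper-case only the first two characters
  local dir="MobileMamba_$(printf '%s' "${v:0:2}" | tr '[:lower:]' '[:upper:]')${v:2}"
  local cmd="python3 -m torch.distributed.launch --nproc_per_node=$nproc --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_$v -m $mode"
  case "$mode" in
    train) ;;
    test) cmd="$cmd model.model_kwargs.checkpoint_path=weights/$dir/mobilemamba_$v.pth" ;;
    *) echo "mode must be 'train' or 'test'" >&2; return 1 ;;
  esac
  printf '%s\n' "$cmd"
}
```

Run the printed command with, for example, `eval "$(mobilemamba_cmd train s6)"`. The optional third argument sets `--nproc_per_node`; the reported results use 8 GPUs, and a different GPU count likely changes the effective batch size unless the config is adjusted accordingly.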

------
# Downstream Tasks
## Environments
```shell
pip3 install terminaltables pycocotools prettytable xtcocotools
pip3 install mmpretrain==1.2.0 mmdet==3.3.0 mmsegmentation==1.2.2
pip3 install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
# run from the downstream/ directory
cd det/backbones/lib_mamba/kernels/selective_scan && pip install . && cd ../../../..
```
## Prepare COCO and ADE20k Datasets
Download and extract the [COCO2017](https://cocodataset.org) and [ADE20k](http://sceneparsing.csail.mit.edu/) datasets into the following directory structure:

```
downstream
├── det
│   └── data
│       └── coco
│           ├── annotations
│           ├── train2017
│           ├── val2017
│           └── test2017
└── seg
    └── data
        └── ade
            └── ADEChallengeData2016
                ├── annotations
                └── images
```
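
The hypothetical helper below checks that every folder in the tree above exists. It assumes it is run from the directory that contains `downstream/` (or is given that path explicitly), and it only tests for directories, not their contents.

```shell
# Hypothetical helper: report any dataset folder from the tree above that is missing.
check_downstream_data() {
  local base="${1:-downstream}" missing=0 d
  for d in \
      det/data/coco/annotations det/data/coco/train2017 \
      det/data/coco/val2017 det/data/coco/test2017 \
      seg/data/ade/ADEChallengeData2016/annotations \
      seg/data/ade/ADEChallengeData2016/images; do
    if [ ! -d "$base/$d" ]; then
      echo "missing: $base/$d" >&2
      missing=$((missing + 1))
    fi
  done
  # non-zero exit status when anything is missing
  [ "$missing" -eq 0 ] && echo "OK: all dataset folders found under $base"
}
```

`check_downstream_data` (or `check_downstream_data /path/to/downstream`) lists each missing folder on stderr and returns a non-zero status if any are absent.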

## Object Detection
Run the commands below from `downstream/det`.

<details>
<summary>
Mask-RCNN
</summary>

#### Train:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py 4
```

#### Test:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py ../../weights/downstream/det/maskrcnn.pth 4
```
</details>

<details>
<summary>
RetinaNet
</summary>

#### Train:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py 4
```

#### Test:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py ../../weights/downstream/det/retinanet.pth 4
```
</details>

<details>
<summary>
SSDLite
</summary>

#### Train with 320 x 320 resolution:

```
./tools/dist_train.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py 8
```

#### Test with 320 x 320 resolution:

```
./tools/dist_test.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py ../../weights/downstream/det/ssdlite.pth 8
```

#### Train with 512 x 512 resolution:

```
./tools/dist_train.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py 8
```

#### Test with 512 x 512 resolution:

```
./tools/dist_test.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py ../../weights/downstream/det/ssdlite_512.pth 8
```
</details>
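
The detection commands follow MMDetection's `tools/dist_train.sh CONFIG GPUS` and `tools/dist_test.sh CONFIG CHECKPOINT GPUS` pattern. The sketch below is a hypothetical helper that maps each detector name to the config and checkpoint paths used above and prints the matching command. It defaults to 8 GPUs for SSDLite (as in its commands and `8gpu` config names) and to 4 GPUs otherwise.

```shell
# Hypothetical helper: print the train/test command for one of the detectors
# above. Paths are relative to downstream/det, as in the original commands.
det_cmd() {
  local mode="$1" name="$2" cfg default_gpus=4
  case "$name" in
    maskrcnn)    cfg=configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py ;;
    retinanet)   cfg=configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py ;;
    ssdlite)     cfg=configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py; default_gpus=8 ;;
    ssdlite_512) cfg=configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py; default_gpus=8 ;;
    *) echo "unknown detector: $name" >&2; return 1 ;;
  esac
  local gpus="${3:-$default_gpus}"
  # checkpoint file names mirror the Hugging Face layout downstream/det/<name>.pth
  local ckpt="../../weights/downstream/det/$name.pth"
  case "$mode" in
    train) echo "./tools/dist_train.sh $cfg $gpus" ;;
    test)  echo "./tools/dist_test.sh $cfg $ckpt $gpus" ;;
    *) echo "mode must be 'train' or 'test'" >&2; return 1 ;;
  esac
}
```

Set the visible devices first (for example `export CUDA_VISIBLE_DEVICES=0,1,2,3`), then run `eval "$(det_cmd train maskrcnn)"` from `downstream/det`.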

## Semantic Segmentation
Run the commands below from `downstream/seg`.

<details>
<summary>
DeepLabV3
</summary>

#### Train:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/deeplabv3/deeplabv3_mobilemamba_b4-80k_ade20k-512x512.py 4
```

#### Test:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/deeplabv3/deeplabv3_mobilemamba_b4-80k_ade20k-512x512.py ../../weights/downstream/seg/deeplabv3.pth 4
```
</details>

<details>
<summary>
Semantic FPN
</summary>

#### Train:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/sem_fpn/fpn_mobilemamba_b4-160k_ade20k-512x512.py 4
```

#### Test:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/sem_fpn/fpn_mobilemamba_b4-160k_ade20k-512x512.py ../../weights/downstream/seg/fpn.pth 4
```
</details>

<details>
<summary>
PSPNet
</summary>

#### Train:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py 4
```

#### Test:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py ../../weights/downstream/seg/pspnet.pth 4
```
</details>
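
In the commands above, the GPU list in `CUDA_VISIBLE_DEVICES` and the trailing GPU-count argument must agree. A minimal, hypothetical helper can derive the count from the device list so the two cannot drift apart:

```shell
# Hypothetical helper: count the comma-separated device IDs in a
# CUDA_VISIBLE_DEVICES-style list, e.g. "0,1,2,3" -> 4.
count_gpus() {
  # one line per device ID; grep -c . counts the non-empty lines
  printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}
```

For example, with `export CUDA_VISIBLE_DEVICES=0,1,2,3`, running `./tools/dist_train.sh configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py "$(count_gpus "$CUDA_VISIBLE_DEVICES")"` passes `4`, matching the original command. Training on a different number of GPUs may change the total batch size relative to the reported setup.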

# Citation
If our work is helpful for your research, please consider citing:
```bibtex
@article{mobilemamba,
  title={MobileMamba: Lightweight Multi-Receptive Visual Mamba Network},
  author={Haoyang He and Jiangning Zhang and Yuxuan Cai and Hongxu Chen and Xiaobin Hu and Zhenye Gan and Yabiao Wang and Chengjie Wang and Yunsheng Wu and Lei Xie},
  journal={arXiv preprint arXiv:2411.15941},
  year={2024}
}
```

# Acknowledgements
We thank the following repositories, among others, for their help with our research:
- [EMO](https://github.com/zhangzjn/EMO)
- [EfficientViT](https://github.com/microsoft/Cream/tree/main/EfficientViT)
- [VMamba](https://github.com/MzeroMiko/VMamba)
- [TIMM](https://github.com/rwightman/pytorch-image-models)
- [MMDetection](https://github.com/open-mmlab/mmdetection)
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation)