Lewandofski
/

MobileMamba

Lewandofski committed on Mar 11, 2025

Commit

3d09deb

1 Parent(s): 000d147

Update README.md

README.md +550 -3

README.md CHANGED Viewed

@@ -1,3 +1,550 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+**[CVPR'25] Official PyTorch implementation of "[**MobileMamba: Lightweight Multi-Receptive Visual Mamba Network**](https://arxiv.org/pdf/2411.15941)".**
+---
+[![arXiv](https://img.shields.io/badge/Arxiv-2411.15941-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2411.15941)
+[![github](https://img.shields.io/github/stars/lewandofskee/MobileMamba.svg?style=social)](https://github.com/lewandofskee/MobileMamba)
+[Haoyang He<sup>1*</sup>](https://scholar.google.com/citations?hl=zh-CN&user=8NfQv1sAAAAJ),
+[Jiangning Zhang<sup>2*</sup>](https://zhangzjn.github.io),
+[Yuxuan Cai<sup>3</sup>](https://scholar.google.com/citations?hl=zh-CN&user=J9lTFAUAAAAJ),
+[Hongxu Chen<sup>1</sup>](https://scholar.google.com/citations?hl=zh-CN&user=uFT3YfMAAAAJ),
+[Xiaobin Hu<sup>2</sup>](https://scholar.google.com/citations?hl=zh-CN&user=3lMuodUAAAAJ),
+[Zhenye Gan<sup>2</sup>](https://scholar.google.com/citations?user=fa4NkScAAAAJ&hl=zh-CN),
+[Yabiao Wang<sup>2</sup>](https://scholar.google.com/citations?user=xiK4nFUAAAAJ&hl=zh-CN),
+[Chengjie Wang<sup>2</sup>](https://scholar.google.com/citations?hl=zh-CN&user=fqte5H4AAAAJ),
+Yunsheng Wu<sup>2</sup>,
+[Lei Xie<sup>1†</sup>](https://scholar.google.com/citations?hl=zh-CN&user=7ZZ_-m0AAAAJ)
+<sup>1</sup>College of Control Science and Engineering, Zhejiang University,
+<sup>2</sup>Youtu Lab, Tencent,
+<sup>3</sup>Huazhong University of Science and Technology
+> **Abstract:** Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. CNNs, with their local receptive fields, struggle to capture long-range dependencies, while Transformers, despite their global modeling capabilities, are limited by quadratic computational complexity in high-resolution scenarios. Recently, state-space models have gained popularity in the visual domain due to their linear computational complexity. Despite their low FLOPs, current lightweight Mamba-based models exhibit suboptimal throughput.
+In this work, we propose the MobileMamba framework, which balances efficiency and performance. We design a three-stage network to enhance inference speed significantly. At a fine-grained level, we introduce the Multi-Receptive Field Feature Interaction (MRFFI) module, comprising the Long-Range Wavelet Transform-Enhanced Mamba (WTE-Mamba), Efficient Multi-Kernel Depthwise Convolution (MK-DeConv), and Eliminate Redundant Identity components. This module integrates multi-receptive field information and enhances high-frequency detail extraction. Additionally, we employ training and testing strategies to further improve performance and efficiency.
+MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods, and is up to 21× faster than LocalVim on GPU. Extensive experiments on high-resolution downstream tasks demonstrate that MobileMamba surpasses current efficient models, achieving an optimal balance between speed and accuracy.
+------
+# Classification Results
+### Image Classification for [ImageNet-1K](https://www.image-net.org):
+| Model                    | FLOPs | #Params | Resolution | Top-1 |                      Cfg                      |                        Log                         |                        Model                         |
+|--------------------------|:-----:|:-------:|:----------:|:-----:|:---------------------------------------------:|:--------------------------------------------------:|:----------------------------------------------------:|
+| MobileMamba-T2           | 255M  |  8.8M   | 192 x 192  | 73.6  | [cfg](configs/mobilemamba/mobilemamba_t2.py)  |  [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2/mobilemamba_t2.txt)  |  [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2/mobilemamba_t2.pth)  |
+| MobileMamba-T2† | 255M  |  8.8M   | 192 x 192  | 76.9  | [cfg](configs/mobilemamba/mobilemamba_t2s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2s/mobilemamba_t2s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T2s/mobilemamba_t2s.pth) |
+| MobileMamba-T4           | 413M  |  14.2M  | 192 x 192  | 76.1  | [cfg](configs/mobilemamba/mobilemamba_t4.py)  |  [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4/mobilemamba_t4.txt)  |  [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4/mobilemamba_t4.pth)  |
+| MobileMamba-T4†          | 413M  |  14.2M  | 192 x 192  | 78.9  | [cfg](configs/mobilemamba/mobilemamba_t4s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4s/mobilemamba_t4s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_T4s/mobilemamba_t4s.pth) |
+| MobileMamba-S6           | 652M  |  15.0M  | 224 x 224  | 78.0  | [cfg](configs/mobilemamba/mobilemamba_s6.py)  |  [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6/mobilemamba_s6.txt)  |  [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6/mobilemamba_s6.pth)  |
+| MobileMamba-S6†          | 652M  |  15.0M  | 224 x 224  | 80.7  | [cfg](configs/mobilemamba/mobilemamba_s6s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6s/mobilemamba_s6s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_S6s/mobilemamba_s6s.pth) |
+| MobileMamba-B1           | 1080M |  17.1M  | 256 x 256  | 79.9  | [cfg](configs/mobilemamba/mobilemamba_b1.py)  |  [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1/mobilemamba_b1.txt)  |  [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1/mobilemamba_b1.pth)  |
+| MobileMamba-B1†          | 1080M |  17.1M  | 256 x 256  | 82.2  | [cfg](configs/mobilemamba/mobilemamba_b1s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1s/mobilemamba_b1s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B1s/mobilemamba_b1s.pth) |
+| MobileMamba-B2           | 2427M |  17.1M  | 384 x 384  | 81.6  | [cfg](configs/mobilemamba/mobilemamba_b2.py)  |  [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2/mobilemamba_b2.txt)  |  [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2/mobilemamba_b2.pth)  |
+| MobileMamba-B2†          | 2427M |  17.1M  | 384 x 384  | 83.3  | [cfg](configs/mobilemamba/mobilemamba_b2s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2s/mobilemamba_b2s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B2s/mobilemamba_b2s.pth) |
+| MobileMamba-B4           | 4313M |  17.1M  | 512 x 512  | 82.5  | [cfg](configs/mobilemamba/mobilemamba_b4.py)  |  [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4/mobilemamba_b4.txt)  |  [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4/mobilemamba_b4.pth)  |
+| MobileMamba-B4†          | 4313M |  17.1M  | 512 x 512  | 83.6  | [cfg](configs/mobilemamba/mobilemamba_b4s.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4s/mobilemamba_b4s.txt) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/MobileMamba_B4s/mobilemamba_b4s.pth) |
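
The config, log, and checkpoint links in the table follow a regular naming pattern, so they can be derived from the variant name. The sketch below is a hypothetical helper (the function names and the `dagger` flag are not part of the repository). It assumes the `s` suffix seen in the `†` rows, and it uses the Hugging Face `resolve` URL form, which serves the raw file, instead of the `blob` page links shown in the table.

```python
REPO_URL = "https://huggingface.co/Lewandofski/MobileMamba"  # repository linked in the table

VARIANTS = ("T2", "T4", "S6", "B1", "B2", "B4")  # rows of the table above


def artifact_names(variant: str, dagger: bool = False) -> dict:
    """Return the config path and repo-relative log/checkpoint paths for a table row.

    `dagger=True` selects the row marked with a dagger (file names end in "s").
    Hypothetical helper: it only mirrors the naming visible in the table.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    suffix = "s" if dagger else ""
    stem = variant.lower() + suffix            # e.g. "t2" or "t2s"
    folder = f"MobileMamba_{variant}{suffix}"  # e.g. "MobileMamba_T2s"
    return {
        "cfg": f"configs/mobilemamba/mobilemamba_{stem}.py",
        "log": f"{folder}/mobilemamba_{stem}.txt",
        "model": f"{folder}/mobilemamba_{stem}.pth",
    }


def download_url(variant: str, dagger: bool = False) -> str:
    """Build a direct-download URL ("resolve" instead of the "blob" page view)."""
    return f"{REPO_URL}/resolve/main/{artifact_names(variant, dagger)['model']}"


if __name__ == "__main__":
    for v in VARIANTS:
        print(download_url(v))
        print(download_url(v, dagger=True))
```

The returned `model` path can also be passed as the `filename` argument of `huggingface_hub.hf_hub_download` with `repo_id="Lewandofski/MobileMamba"`, if that package is installed.
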
+------
+# Downstream Results
+## Object Detection and Instance Segmentation Results
+### Object Detection and Instance Segmentation Performance Based on [Mask-RCNN](https://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf) for [COCO2017](https://cocodataset.org):
+| Backbone | AP<sup>b</sup> | AP<sup>b</sup><sub>50</sub> | AP<sup>b</sup><sub>75</sub> | AP<sup>b</sup><sub>S</sub> | AP<sup>b</sup><sub>M</sub> | AP<sup>b</sup><sub>L</sub> | AP<sup>m</sup> | AP<sup>m</sup><sub>50</sub> | AP<sup>m</sup><sub>75</sub> | AP<sup>m</sup><sub>S</sub> | AP<sup>m</sup><sub>M</sub> | AP<sup>m</sup><sub>L</sub> | #Params | FLOPs |                    Cfg                     |                    Log                     |                    Model                     |
+|:--------:|:--------------:|:---------------------------:|:---------------------------:|:--------------------------:|:--------------------------:|:--------------------------:|:--------------:|:---------------------------:|:---------------------------:|:--------------------------:|:--------------------------:|:--------------------------:|:-------:|:-----:|:------------------------------------------:|:------------------------------------------:|:--------------------------------------------:|
+|  MobileMamba-B1  |      40.6      |            61.8             |            43.8             |            22.4            |            43.5            |            55.9            |      37.4      |            58.9             |            39.9             |            17.1            |            39.9            |            56.4            |  38.0M  | 178G  | [cfg](downstream/det/configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/maskrcnn.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/maskrcnn.pth) |
+### Object Detection Performance Based on [RetinaNet](https://openaccess.thecvf.com/content_ICCV_2017/papers/Lin_Focal_Loss_for_ICCV_2017_paper.pdf) for [COCO2017](https://cocodataset.org):
+| Backbone |  AP  | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | #Params | FLOPs |                     Cfg                     |                     Log                     |                     Model                     |
+|:--------:|:----:|:---------------:|:---------------:|:--------------:|:--------------:|:--------------:|:-------:|:-----:|:-------------------------------------------:|:-------------------------------------------:|:---------------------------------------------:|
+|  MobileMamba-B1  | 39.6 |      59.8       |      42.4       |      21.5      |      43.4      |      53.9      |  27.1M  | 151G  | [cfg](downstream/det/configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/retinanet.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/retinanet.pth) |
+### Object Detection Performance Based on [SSDLite](https://openaccess.thecvf.com/content_ICCV_2019/papers/Howard_Searching_for_MobileNetV3_ICCV_2019_paper.pdf) for [COCO2017](https://cocodataset.org):
+|      Backbone       |  AP  | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | #Params | FLOPs |                                      Cfg                                      |              Log               |                      Model                      |
+|:-------------------:|:----:|:---------------:|:---------------:|:--------------:|:--------------:|:--------------:|:-------:|:-----:|:-----------------------------------------------------------------------------:|:------------------------------:|:-----------------------------------------------:|
+|   MobileMamba-B1    | 24.0 |      39.5       |      24.0       |      3.1       |      23.4      |      46.9      |  18.0M  | 1.7G  |   [cfg](downstream/det/configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py)   |   [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite.log)   |   [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite.pth)   |
+| MobileMamba-B1-r512 | 29.5 |      47.7       |      30.4       |      8.9       |      35.0      |      47.0      |  18.0M  | 4.4G  | [cfg](downstream/det/configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite_512.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/det/ssdlite_512.pth) |
+## Semantic Segmentation Results
+### Semantic Segmentation Based on [Semantic FPN](https://openaccess.thecvf.com/content_CVPR_2019/papers/Kirillov_Panoptic_Feature_Pyramid_Networks_CVPR_2019_paper.pdf) for [ADE20k](http://sceneparsing.csail.mit.edu/):
+| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs |                  Cfg                  |                  Log                  |                  Model                  |
+|:--------:|:----:|:----:|:----:|:-------:|:-----:|:-------------------------------------:|:-------------------------------------:|:---------------------------------------:|
+| MobileMamba-B4 | 79.9 | 42.5 | 53.7 |  19.8M  | 5.6G  | [cfg](downstream/seg/configs/sem_fpn/fpn_mobilemamba_b4-160k_ade20k-512x512.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/fpn.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/fpn.pth) |
+### Semantic Segmentation Based on [DeepLabv3](https://arxiv.org/pdf/1706.05587.pdf) for [ADE20k](http://sceneparsing.csail.mit.edu/):
+|    Backbone    | aAcc | mIoU | mAcc | #Params | FLOPs |                     Cfg                     |                     Log                     |                     Model                     |
+|:--------------:|:----:|:----:|:----:|:-------:|:-----:|:-------------------------------------------:|:-------------------------------------------:|:---------------------------------------------:|
+| MobileMamba-B4 | 76.3 | 36.6 | 47.1 |  23.4M  | 4.7G  | [cfg](downstream/seg/configs/deeplabv3/deeplabv3_mobilemamba_b4-80k_ade20k-512x512.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/deeplabv3.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/deeplabv3.pth) |
+### Semantic Segmentation Based on [PSPNet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf) for [ADE20k](http://sceneparsing.csail.mit.edu/):
+| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs |                   Cfg                    |                   Log                    |                   Model                    |
+|:--------:|:----:|:----:|:----:|:-------:|:-----:|:----------------------------------------:|:----------------------------------------:|:------------------------------------------:|
+| MobileMamba-B4 | 76.2 | 36.9 | 47.9 |  20.5M  | 4.5G  | [cfg](downstream/seg/configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py) | [log](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/pspnet.log) | [model](https://huggingface.co/Lewandofski/MobileMamba/blob/main/downstream/seg/pspnet.pth) |
+------
+# All Pretrained Weights and Logs
+The model weights and log files for all classification and downstream tasks are available for download via [Google Drive](https://drive.google.com/file/d/1EDqWI6JKMaLZRSRWt9aM7VXaNvosStGE/view?usp=drive_link) and [Hugging Face](https://huggingface.co/Lewandofski/MobileMamba/tree/main).
+------
+# Classification
+## Environments
+```shell
+pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
+pip3 install timm==0.6.5 tensorboardX einops torchprofile fvcore==0.1.5.post20221221
+cd model/lib_mamba/kernels/selective_scan && pip install . && cd ../../../..
+# Optional: build NVIDIA Apex from source, then return to the repository root
+git clone https://github.com/NVIDIA/apex && cd apex && pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && cd ..
+```
+## Prepare ImageNet-1K Dataset
+Download the [ImageNet-1K](http://image-net.org/) dataset and extract it into the following directory structure:
+```
+├── imagenet
+    ├── train
+        ├── n01440764
+            ├── n01440764_10026.JPEG
+            ├── ...
+        ├── ...
+    ├── train.txt (optional)
+    ├── val
+        ├── n01440764
+            ├── ILSVRC2012_val_00000293.JPEG
+            ├── ...
+        ├── ...
+    └── val.txt (optional)
+```
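
Before launching a run, it can help to confirm that the extracted dataset matches the tree above. The following is a minimal sketch, not part of the repository: `check_imagenet_layout` is a hypothetical helper that only looks for the `train` and `val` folders, their class subfolders, and at least one `.JPEG` file per class. The optional `train.txt` and `val.txt` files are not checked.

```python
from pathlib import Path


def check_imagenet_layout(root: str) -> list:
    """Return a list of problems found under `root`; an empty list means the layout looks right."""
    problems = []
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        for class_dir in class_dirs:
            # Each class folder (e.g. n01440764) should hold at least one .JPEG file.
            if not any(class_dir.glob("*.JPEG")):
                problems.append(f"no .JPEG files in {class_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_imagenet_layout("imagenet"):
        print(problem)
```
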
+## Test
+Test with 8 GPUs in one node:
+<details>
+<summary>
+MobileMamba-T2
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T2/mobilemamba_t2.pth
+```
+This should give `Top-1: 73.638 (Top-5: 91.422)`
+</details>
+<details>
+<summary>
+MobileMamba-T2†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T2s/mobilemamba_t2s.pth
+```
+This should give `Top-1: 76.934 (Top-5: 93.100)`
+</details>
+<details>
+<summary>
+MobileMamba-T4
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T4/mobilemamba_t4.pth
+```
+This should give `Top-1: 76.086 (Top-5: 92.772)`
+</details>
+<details>
+<summary>
+MobileMamba-T4†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_T4s/mobilemamba_t4s.pth
+```
+This should give `Top-1: 78.914 (Top-5: 94.160)`
+</details>
+<details>
+<summary>
+MobileMamba-S6
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_S6/mobilemamba_s6.pth
+```
+This should give `Top-1: 78.002 (Top-5: 93.992)`
+</details>
+<details>
+<summary>
+MobileMamba-S6†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_S6s/mobilemamba_s6s.pth
+```
+This should give `Top-1: 80.742 (Top-5: 95.182)`
+</details>
+<details>
+<summary>
+MobileMamba-B1
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B1/mobilemamba_b1.pth
+```
+This should give `Top-1: 79.948 (Top-5: 94.924)`
+</details>
+<details>
+<summary>
+MobileMamba-B1†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B1s/mobilemamba_b1s.pth
+```
+This should give `Top-1: 82.234 (Top-5: 95.872)`
+</details>
+<details>
+<summary>
+MobileMamba-B2
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B2/mobilemamba_b2.pth
+```
+This should give `Top-1: 81.624 (Top-5: 95.890)`
+</details>
+<details>
+<summary>
+MobileMamba-B2†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B2s/mobilemamba_b2s.pth
+```
+This should give `Top-1: 83.260 (Top-5: 96.438)`
+</details>
+<details>
+<summary>
+MobileMamba-B4
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4 -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B4/mobilemamba_b4.pth
+```
+This should give `Top-1: 82.496 (Top-5: 96.252)`
+</details>
+<details>
+<summary>
+MobileMamba-B4†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4s -m test model.model_kwargs.checkpoint_path=weights/MobileMamba_B4s/mobilemamba_b4s.pth
+```
+This should give `Top-1: 83.644 (Top-5: 96.606)`
+</details>
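
To compare a test run against the numbers quoted above, the summary line can be parsed and checked against the one-decimal value in the results table. This is a sketch under the assumption that the output of `run.py` contains a line in exactly the quoted form (`Top-1: ... (Top-5: ...)`); the function names are hypothetical.

```python
import re

# Pattern for the summary line quoted above, e.g. "Top-1: 76.086 (Top-5: 92.772)".
_RESULT_RE = re.compile(r"Top-1:\s*([\d.]+)\s*\(Top-5:\s*([\d.]+)\)")


def parse_accuracy(text: str):
    """Return (top1, top5) as floats from the last matching line, or None if absent."""
    matches = _RESULT_RE.findall(text)
    if not matches:
        return None
    top1, top5 = matches[-1]
    return float(top1), float(top5)


def matches_table(top1: float, table_value: float, tol: float = 0.05) -> bool:
    """True when `top1` lies within rounding distance of the one-decimal table value."""
    return abs(top1 - table_value) <= tol + 1e-9


if __name__ == "__main__":
    top1, top5 = parse_accuracy("Top-1: 76.086 (Top-5: 92.772)")
    print(top1, top5, matches_table(top1, 76.1))
```
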
+## Train
+Train with 8 GPUs in one node:
+<details>
+<summary>
+MobileMamba-T2
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2 -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-T2†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t2s -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-T4
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4 -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-T4†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_t4s -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-S6
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6 -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-S6†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_s6s -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-B1
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1 -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-B1†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b1s -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-B2
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2 -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-B2†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b2s -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-B4
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4 -m train
+```
+</details>
+<details>
+<summary>
+MobileMamba-B4†
+</summary>
+```
+python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobilemamba/mobilemamba_b4s -m train
+```
+</details>
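
The test and train commands listed above differ only in the config stem, the mode, and (for testing) the checkpoint path. As an illustration, the sketch below assembles the same argument list programmatically. `launch_command` is a hypothetical helper, and the checkpoint folder naming is inferred from the paths in the test commands.

```python
import shlex


def launch_command(variant: str, mode: str, nproc: int = 8) -> list:
    """Assemble the launcher arguments shown above for one config.

    `variant` is the config stem, e.g. "t2" or "b4s"; `mode` is "train" or "test".
    """
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc}", "--nnodes=1", "--use_env",
        "run.py", "-c", f"configs/mobilemamba/mobilemamba_{variant}", "-m", mode,
    ]
    if mode == "test":
        # Checkpoint folders use an upper-case stem, e.g. weights/MobileMamba_T2s/.
        folder = "MobileMamba_" + variant[:2].upper() + variant[2:]
        cmd.append(
            f"model.model_kwargs.checkpoint_path=weights/{folder}/mobilemamba_{variant}.pth"
        )
    return cmd


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run to execute it.
    print(shlex.join(launch_command("t2s", "test")))
```
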
+------
+# Downstream Tasks
+## Environments
+```shell
+pip3 install terminaltables pycocotools prettytable xtcocotools
+pip3 install mmpretrain==1.2.0 mmdet==3.3.0 mmsegmentation==1.2.2
+pip3 install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
+cd det/backbones/lib_mamba/kernels/selective_scan && pip install . && cd ../../../..
+```
+## Prepare COCO and ADE20k Datasets
+Download the [COCO2017](https://cocodataset.org) and [ADE20k](http://sceneparsing.csail.mit.edu/) datasets and extract them into the following directory structure:
+```
+downstream
+├── det
+├──── data
+│   ├──── coco
+│   │   ├──── annotations
+│   │   ├──── train2017
+│   │   ├──── val2017
+│   │   ├──── test2017
+├── seg
+├──── data
+│   ├──── ade
+│   │   ├──── ADEChallengeData2016
+│   │   ├──────── annotations
+│   │   ├──────── images
+```
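
The directory tree above can be checked the same way before starting a downstream run. The sketch below simply lists which of the expected folders are still missing; `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, and the paths are copied from the tree, relative to the `downstream` folder.

```python
from pathlib import Path

# Directories taken from the tree above, relative to the `downstream` folder.
EXPECTED_DIRS = (
    "det/data/coco/annotations",
    "det/data/coco/train2017",
    "det/data/coco/val2017",
    "det/data/coco/test2017",
    "seg/data/ade/ADEChallengeData2016/annotations",
    "seg/data/ade/ADEChallengeData2016/images",
)


def missing_dirs(downstream_root: str) -> list:
    """Return the expected dataset directories that do not exist yet."""
    root = Path(downstream_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dirs("downstream"):
        print("missing:", d)
```
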
+## Object Detection
+<details>
+<summary>
+Mask-RCNN
+</summary>
+#### Train:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py 4
+```
+#### Test:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/mask_rcnn/mask-rcnn_mobilemamba_b1_fpn_1x_coco.py ../../weights/downstream/det/maskrcnn.pth 4
+```
+</details>
+<details>
+<summary>
+RetinaNet
+</summary>
+#### Train:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py 4
+```
+#### Test:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/retinanet/retinanet_mobilemamba_b1_fpn_1x_coco.py ../../weights/downstream/det/retinanet.pth 4
+```
+</details>
+<details>
+<summary>
+SSDLite
+</summary>
+#### Train with 320 x 320 resolution:
+```
+./tools/dist_train.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py 8
+```
+#### Test with 320 x 320 resolution:
+```
+./tools/dist_test.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_coco.py ../../weights/downstream/det/ssdlite.pth 8
+```
+#### Train with 512 x 512 resolution:
+```
+./tools/dist_train.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py 8
+```
+#### Test with 512 x 512 resolution:
+```
+./tools/dist_test.sh configs/ssd/ssdlite_mobilemamba_b1_8gpu_2lr_512_coco.py ../../weights/downstream/det/ssdlite_512.pth 8
+```
+</details>
+## Semantic Segmentation
+<details>
+<summary>
+DeepLabV3
+</summary>
+#### Train:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/deeplabv3/deeplabv3_mobilemamba_b4-80k_ade20k-512x512.py 4
+```
+#### Test:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/deeplabv3/deeplabv3_mobilemamba_b4-80k_ade20k-512x512.py ../../weights/downstream/seg/deeplabv3.pth 4
+```
+</details>
+<details>
+<summary>
+Semantic FPN
+</summary>
+#### Train:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/sem_fpn/fpn_mobilemamba_b4-160k_ade20k-512x512.py 4
+```
+#### Test:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/sem_fpn/fpn_mobilemamba_b4-160k_ade20k-512x512.py ../../weights/downstream/seg/fpn.pth 4
+```
+</details>
+<details>
+<summary>
+PSPNet
+</summary>
+#### Train:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py 4
+```
+#### Test:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/pspnet/pspnet_mobilemamba_b4-80k_ade20k-512x512.py ../../weights/downstream/seg/pspnet.pth 4
+```
+</details>
+# Citation
+If our work is helpful for your research, please consider citing:
+```bibtex
+@article{mobilemamba,
+  title={MobileMamba: Lightweight Multi-Receptive Visual Mamba Network},
+  author={Haoyang He and Jiangning Zhang and Yuxuan Cai and Hongxu Chen and Xiaobin Hu and Zhenye Gan and Yabiao Wang and Chengjie Wang and Yunsheng Wu and Lei Xie},
+  journal={arXiv preprint arXiv:2411.15941},
+  year={2024}
+}
+```
+# Acknowledgements
+We thank the following repositories, among others, for their assistance with our research:
+- [EMO](https://github.com/zhangzjn/EMO)
+- [EfficientViT](https://github.com/microsoft/Cream/tree/main/EfficientViT)
+- [VMamba](https://github.com/MzeroMiko/VMamba)
+- [TIMM](https://github.com/rwightman/pytorch-image-models)
+- [MMDetection](https://github.com/open-mmlab/mmdetection)
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation)