Timing1 committed · Commit 05c8d05 (verified) · Parent: 6b1fcc5

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ Official pretrained checkpoints for **"RNN as Linear Transformer: A Closer Inves
 Mamba, originally introduced for language modeling, has recently garnered attention as an effective backbone for vision tasks. However, its underlying mechanism in visual domains remains poorly understood. In this work, we systematically investigate Mamba’s representational properties and make three primary contributions. First, we theoretically analyze Mamba’s relationship to Softmax and Linear Attention, confirming that it can be viewed as a low-rank approximation of Softmax Attention and thereby bridging the representational gap between Softmax and Linear forms. Second, we introduce a novel binary segmentation metric for activation map evaluation, extending qualitative assessments to a quantitative measure that demonstrates Mamba’s capacity to model long-range dependencies. Third, by leveraging DINO for self-supervised pretraining, we obtain clearer activation maps than those produced by standard supervised approaches, highlighting Mamba’s potential for interpretability. Notably, our model also achieves a 78.5\% linear probing accuracy on ImageNet, underscoring its strong performance. We hope this work can provide valuable insights for future investigations of Mamba-based vision architectures.
 ## Links
 
-- Paper: [arXiv](link)
+- Paper: [arXiv](https://arxiv.org/abs/2511.18380)
 - Code: [GitHub](https://github.com/yangtiming/Dino-Mamba)
 
 ## Citation