Add model card and metadata for SELF1E

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ library_name: transformers
+ pipeline_tag: image-segmentation
+ ---
+
+ # SELF1E: Rethinking MLLM Itself as a Segmenter with a Single Segmentation Token
+
+ This repository contains the weights for **SELF1E** (segmentation from the MLLM it**SELF** with **1** segmentation **E**mbedding), an approach that enables Multi-modal Large Language Models (MLLMs) to perform high-quality segmentation without external specialist decoders.
+
+ - **Paper:** [Rethinking MLLM Itself as a Segmenter with a Single Segmentation Token](https://huggingface.co/papers/2603.19026)
+ - **GitHub Repository:** [https://github.com/ANDYZAQ/SELF1E](https://github.com/ANDYZAQ/SELF1E)
+
+ ## Highlights
+
+ - ✅ **No external expert decoder** for text-guided referring segmentation.
+ - ✅ **Only 1 `[SEG]` token** for segmentation.
+ - ✅ **Competitive results** while eliminating the need for external decoders (like SAM).
+ - 🚀 A step forward for integrating segmentation ability directly inside MLLMs.
+
+ ## Introduction
+
+ SELF1E investigates whether and how segmentation ability can be unlocked from the MLLM it**SELF** with **1** segmentation **E**mbedding while achieving competitive results. The approach targets a fundamental limitation, the loss of resolution in pixel-shuffled image features from MLLMs, by:
+ 1. Retaining image features at their original uncompressed resolution and refilling them with residual features.
+ 2. Integrating pixel-unshuffle operations to bring out fine-grained details.
+ 3. Redesigning the attention mask with dual perception pathways (image-to-image and image-to-segmentation).
+
+ ## Citation
+
+ If you find this project useful in your research, please consider citing:
+
+ ```bibtex
+ @inproceedings{zhang2026self1e,
+ author = {Zhang, Anqi and Ji, Xiaokang and Gao, Guangyu and Jiao, Jianbo and Liu, Chi Harold and Wei, Yunchao},
+ title = {SELF1E: Rethinking MLLM Itself as a Segmenter with a Single Segmentation Token},
+ booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+ year = {2026},
+ }
+ ```
+
+ ## Acknowledgement
+ This work is built upon the [LISA](https://github.com/JIA-Lab-research/LISA) framework, and some of the training settings are borrowed from [PSALM](https://github.com/zamling/PSALM).
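
For readers of the Introduction in this card, the resolution trade-off it mentions can be illustrated with a generic rearrangement pair. This is a minimal sketch and not the SELF1E implementation: the function names `space_to_depth` and `depth_to_space` are hypothetical, and naming for the two directions varies between codebases, so neutral names are used. One direction folds each `r x r` spatial block into the channel axis (fewer positions, more channels); the other unfolds it again, which is why the spatial detail is recoverable as long as the folded channels are kept.

```python
# Illustrative sketch only: names are hypothetical, not taken from the SELF1E code.

def space_to_depth(grid, r):
    """Fold each r x r spatial block into the channel axis.

    grid: H x W list of lists, each cell a list of C channel values.
    Returns an (H/r) x (W/r) grid whose cells hold C*r*r values.
    """
    h, w = len(grid), len(grid[0])
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    out = []
    for i in range(0, h, r):
        row = []
        for j in range(0, w, r):
            cell = []
            # Concatenate the channels of the r*r neighbours in row-major order.
            for di in range(r):
                for dj in range(r):
                    cell.extend(grid[i + di][j + dj])
            row.append(cell)
        out.append(row)
    return out


def depth_to_space(grid, r):
    """Inverse of space_to_depth: unfold channels back into r x r spatial blocks."""
    h, w = len(grid), len(grid[0])
    c = len(grid[0][0]) // (r * r)  # channels per position after unfolding
    out = [[None] * (w * r) for _ in range(h * r)]
    for i in range(h):
        for j in range(w):
            cell = grid[i][j]
            for k in range(r * r):
                di, dj = divmod(k, r)
                out[i * r + di][j * r + dj] = cell[k * c:(k + 1) * c]
    return out


# A 4 x 4 feature map with one channel per position.
features = [[[4 * i + j] for j in range(4)] for i in range(4)]

compressed = space_to_depth(features, 2)  # 2 x 2 grid, 4 channels per cell
restored = depth_to_space(compressed, 2)  # back to 4 x 4, 1 channel per cell

print(len(compressed), len(compressed[0]), len(compressed[0][0]))  # 2 2 4
print(restored == features)  # True
```

The round trip is lossless here only because nothing is done to the folded channels; how SELF1E retains and refills full-resolution features in practice is described in the paper and repository linked in the card.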