Add model card and metadata for Heima
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -1,4 +1,33 @@
 ---
 base_model:
 - meta-llama/Llama-3.2-11B-Vision-Instruct
-
+library_name: transformers
+pipeline_tag: image-text-to-text
+license: llama3.2
+---
+
+# Heima: Efficient Reasoning with Hidden Thinking
+
+Heima (short for "hidden llama") is a Chain-of-Thought (CoT) compression framework designed for Multimodal Large Language Models (MLLMs). It condenses lengthy textual reasoning into a small set of abstract "thinking tokens," preserving essential reasoning capabilities while significantly improving inference efficiency.
+
+- **Paper:** [Efficient Reasoning with Hidden Thinking](https://huggingface.co/papers/2501.19201)
+- **Repository:** [https://github.com/shawnricecake/heima](https://github.com/shawnricecake/heima)
+
+## Model Description
+The Heima framework addresses the redundancy and verbosity of traditional textual CoT. By training the model to utilize latent thinking tokens, the Heima Encoder can maintain high problem-solving accuracy while reducing the number of generated tokens.
+
+This repository contains the weights for the Heima Encoder, based on the Llama-3.2-11B-Vision-Instruct architecture. To reconstruct the reasoning process into human-readable text, an associated Heima Decoder (interpreter) can be used to map the thinking tokens back into textual sequences.
+
+## Performance
+Experiments across diverse reasoning benchmarks demonstrate that Heima improves reasoning efficiency while maintaining or even achieving better zero-shot accuracy compared to standard verbose CoT methods.
+
+## Citation
+If you find Heima useful for your research, please cite:
+```bibtex
+@article{shen2025efficient,
+title={Efficient Reasoning with Hidden Thinking},
+author={Shen, Xuan and Wang, Yizhou and Shi, Xiangxi and Wang, Yanzhi and Zhao, Pu and Gu, Jiuxiang},
+journal={arXiv preprint arXiv:2501.19201},
+year={2025}
+}
+```
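The metadata block this change adds to README.md can be sanity-checked with a short script. What follows is a minimal sketch and not part of the PR. It assumes the README text is available as a string and handles only a small subset of YAML (flat `key: value` pairs and simple `- item` lists), which is enough for the four fields shown in the diff. The helper name `parse_front_matter` is hypothetical.

```python
def parse_front_matter(text):
    """Parse a small YAML subset from a README front matter block.

    Supports only `key: value` pairs and `- item` lists under a bare `key:`.
    Returns {} if the block is missing or never closed with `---`.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    list_key = None  # key currently collecting `- item` entries, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- ") and list_key is not None:
            meta[list_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                list_key = None
            else:
                meta[key] = []
                list_key = key
    return {}  # no closing delimiter: treat the block as malformed


# Front matter as it appears on the new side of the diff.
README = """---
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
library_name: transformers
pipeline_tag: image-text-to-text
license: llama3.2
---

# Heima: Efficient Reasoning with Hidden Thinking
"""

meta = parse_front_matter(README)
missing = [k for k in ("base_model", "library_name", "pipeline_tag", "license")
           if k not in meta]
print(meta)
print("missing keys:", missing)
```

For anything beyond this flat layout, a real YAML parser would be the safer choice; the hand-rolled parser here only keeps the sketch free of third-party dependencies.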