Improve model card metadata and structure
#1
opened by nielsr (HF Staff)
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: image-feature-extraction
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- vision
- image-tokenization
---

# Communication-Inspired Tokenization for Structured Image Representations

<p align="left">
<a href="https://huggingface.co/papers/2602.20731">Paper</a> •
<a href="https://araachie.github.io/comit/">Project Website</a> •
<a href="https://github.com/Araachie/comit">GitHub</a>
</p>

<p align="left">
<a href="https://araachie.github.io">Aram Davtyan</a> •
<a href="https://www.cvg.unibe.ch/people/sahin">Yusuf Sahin</a> •
<a href="https://people.epfl.ch/yasaman.haghighi?lang=en">Yasaman Haghighi</a> •
<a href="https://www.cvg.unibe.ch/people/acuaviva">Pablo Acuaviva</a> •
<a href="https://people.epfl.ch/alexandre.alahi?lang=en">Alexandre Alahi</a> •
<a href="https://www.cvg.unibe.ch/people/favaro">Paolo Favaro</a>
</p>

COMmunication inspired Tokenization (**COMiT**) is a framework for learning structured discrete visual token sequences. Unlike traditional tokenizers optimized primarily for reconstruction, COMiT constructs a latent message by iteratively observing localized image crops and recurrently updating its discrete representation, resulting in interpretable, object-centric token structure.
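The crop-then-update loop described above can be caricatured in a few lines of plain Python. This is a toy sketch under loose assumptions (a hash-based recurrent state standing in for the learned encoder, modular quantization standing in for the codebook); the names `raster_scan_crops` and `encode` are illustrative and are not the COMiT API:

```python
# Toy sketch of a communication-style tokenizer: a discrete message is
# refined recurrently as localized crops are observed one at a time.
# NOT the COMiT model; all names here are illustrative.

def raster_scan_crops(image, size=2):
    """Split a 2D grid (list of lists) into size x size crops, row-major."""
    crops = []
    for i in range(0, len(image), size):
        for j in range(0, len(image[0]), size):
            crops.append(tuple(image[y][x]
                               for y in range(i, i + size)
                               for x in range(j, j + size)))
    return crops

def encode(image, codebook_size=16, num_crops=None):
    """Fold each observed crop into a running discrete message (token list)."""
    crops = raster_scan_crops(image)
    if num_crops is not None:
        crops = crops[:num_crops]  # truncate the crop list, as num_crops does below
    msgs, state = [], 0
    for crop in crops:
        state = (state * 1000003 + hash(crop)) & 0xFFFFFFFF  # recurrent update
        msgs.append(state % codebook_size)                   # quantize to a token id
    return msgs

image = [[p for p in range(r * 4, r * 4 + 4)] for r in range(4)]  # 4x4 "image"
print(encode(image, num_crops=3))  # three discrete token ids in [0, 16)
```

The point of the sketch is the structure: the message is built incrementally, one crop at a time, rather than produced in a single global pass.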
## Installation

Follow the instructions at the [official repository](https://github.com/Araachie/comit):

```bash
git clone https://github.com/Araachie/comit.git
cd comit
conda create -n comit python==3.11 -y
conda activate comit
pip install -e .
```

## Usage

### Loading the Model

You can download and load the pre-trained `COMiT-L` model directly from the Hub:

```python
import torch
from comit import COMiT

device = "cuda" if torch.cuda.is_available() else "cpu"
model = COMiT.from_pretrained('cvg-unibe/comit-l')
model.eval().to(device)
```

### Encoding Images (Tokenization)

With a pretrained COMiT model, images can be encoded into token sequences as follows:

```python
with torch.no_grad():
    ...
        order="adaptive",  # One of ["raster_scan", "random", "adaptive"] or a list of crop indices
        num_crops=3,  # Used to truncate the list of crops to embed
    )

    # Get token indices (discrete IDs)
    token_ids = model.quantizer.codes_to_indices(token_dict["msgs"])
```
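
The `codes_to_indices` step converts quantized code vectors into integer token ids. In a typical vector-quantization setup this is a nearest-neighbour lookup against a fixed codebook, which can be sketched in plain Python (a generic VQ illustration, not the actual COMiT quantizer):

```python
# Generic nearest-neighbour "codes to indices" lookup, as used in typical
# vector quantizers. Illustrative only; NOT the actual COMiT quantizer.

def codes_to_indices(codes, codebook):
    """Map each code vector to the index of its closest codebook entry."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(c, codebook[k]))
            for c in codes]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(codes_to_indices([(0.1, -0.1), (0.9, 0.2)], codebook))  # → [0, 1]
```

Working with integer ids rather than raw code vectors is what makes the token sequence compact and easy to feed to downstream sequence models.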

### Decoding Tokens (Reconstruction)

To visually probe the information in the token sequences, one can decode the tokens back into images:

```python
...
```

## Licensing

Unless otherwise noted, the model weights are licensed under the Apache License 2.0.
For the code licensing, see [GitHub licensing](https://github.com/Araachie/comit?tab=readme-ov-file#licensing).

## Citation

If you find this work helpful, please consider citing:

```bibtex
@misc{davtyan2026comit,