Improve model card metadata and structure

#1
by nielsr - opened
Files changed (1)
  1. README.md +35 -17
README.md CHANGED
@@ -1,12 +1,22 @@
 ---
+license: apache-2.0
+pipeline_tag: image-feature-extraction
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
-license: apache-2.0
+- vision
+- image-tokenization
 ---
+
 # Communication-Inspired Tokenization for Structured Image Representations
-</h1>
-<p align="left">
+
+<p align="left">
+<a href="https://huggingface.co/papers/2602.20731">Paper</a> •
+<a href="https://araachie.github.io/comit/">Project Website</a> •
+<a href="https://github.com/Araachie/comit">GitHub</a>
+</p>
+
+<p align="left">
 <a href="https://araachie.github.io">Aram Davtyan</a> •
 <a href="https://www.cvg.unibe.ch/people/sahin">Yusuf Sahin</a> •
 <a href="https://people.epfl.ch/yasaman.haghighi?lang=en">Yasaman Haghighi</a> •
@@ -14,28 +24,38 @@ license: apache-2.0
 <a href="https://www.cvg.unibe.ch/people/acuaviva">Pablo Acuaviva</a> •
 <a href="https://people.epfl.ch/alexandre.alahi?lang=en">Alexandre Alahi</a> •
 <a href="https://www.cvg.unibe.ch/people/favaro">Paolo Favaro</a>
-</p>
+</p>
 
-Official pre-trained models for the paper: https://arxiv.org/abs/2602.20731
+COMmunication inspired Tokenization (**COMiT**) is a framework for learning structured discrete visual token sequences. Unlike traditional tokenizers optimized primarily for reconstruction, COMiT constructs a latent message by iteratively observing localized image crops and recurrently updating its discrete representation, resulting in interpretable, object-centric token structure.
 
-Project's website: https://araachie.github.io/comit/
-
 ## Installation
 
-Follow the instructions at https://github.com/Araachie/comit
+Follow the instructions at the [official repository](https://github.com/Araachie/comit):
+
+```bash
+git clone https://github.com/Araachie/comit.git
+cd comit
+conda create -n comit python==3.11 -y
+conda activate comit
+pip install -e .
+```
 
 ## Usage
 
-Example usage, downloading `COMiT-L` from the Hugging Face Hub:
+### Loading the Model
+You can download and load the pre-trained `COMiT-L` model directly from the Hub:
 
 ```python
+import torch
 from comit import COMiT
 
+device = "cuda" if torch.cuda.is_available() else "cpu"
 model = COMiT.from_pretrained('cvg-unibe/comit-l')
 model.eval().to(device)
 ```
 
-With a pretrained COMiT model images can be encoded into token sequences as follows:
+### Encoding Images (Tokenization)
+With a pretrained COMiT model, images can be encoded into token sequences as follows:
 
 ```python
 with torch.no_grad():
@@ -45,14 +65,12 @@ with torch.no_grad():
 order="adaptive", # One of ["raster_scan", "random", "adaptive"] or a list of crop indices
 num_crops=3, # Used to truncate the list of crops to embed
 )
-```
 
-By default the tokenization pipeline returns a list of 256 6-dimensional tokens. If token indices are needed instead, they can be obtained via:
-
-```python
+# Get token indices (discrete IDs)
 token_ids = model.quantizer.codes_to_indices(token_dict["msgs"])
 ```
 
+### Decoding Tokens (Reconstruction)
 To visually probe the information in the token sequences, one can decode the tokens back into images:
 
 ```python
@@ -83,12 +101,12 @@ with torch.no_grad():
 
 ## Licensing
 
-Unless otherwise noted, the model weights are licensed under Apache license 2.0.
-For the code licensing, see https://github.com/Araachie/comit?tab=readme-ov-file#licensing
+Unless otherwise noted, the model weights are licensed under the Apache License 2.0.
+For the code licensing, see [GitHub licensing](https://github.com/Araachie/comit?tab=readme-ov-file#licensing).
 
 ## Citation
 
-If you find this work helpful, please consider citing our work:
+If you find this work helpful, please consider citing:
 
 ```bibtex
 @misc{davtyan2026comit,
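
---

Note: the model description added in this diff says COMiT builds its message by iteratively observing localized crops and recurrently updating a discrete representation, truncating to `num_crops` crops. A toy sketch of that control flow is below; every name here (`tokenize`, `encode`, `update`, `quantize`) is illustrative only and is not COMiT's actual API:

```python
# Toy control-flow sketch of crop-by-crop message refinement.
# All names are hypothetical stand-ins; COMiT's real interfaces differ.

def tokenize(image_crops, encode, update, quantize, num_crops=3):
    """Fold crops into a message one at a time, then discretize it."""
    message = None
    for crop in image_crops[:num_crops]:     # truncate, like num_crops above
        features = encode(crop)              # embed one localized crop
        message = update(message, features)  # recurrent state update
    return quantize(message)                 # discretize into token output

# Usage with trivial stand-in functions, just to show the data flow:
result = tokenize(
    ["ab", "abcd", "x", "zzzz"],
    encode=len,
    update=lambda m, f: (m or 0) + f,
    quantize=lambda m: m % 10,
    num_crops=3,
)
```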
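Note: the usage example calls `model.quantizer.codes_to_indices(...)` to turn 6-dimensional discrete tokens into single integer IDs. As background, a minimal sketch of how such a mapping can work, assuming an FSQ-style quantizer that packs per-dimension code levels via mixed-radix encoding (the level counts and the scheme itself are assumptions, not COMiT's documented internals):

```python
# Hypothetical sketch: pack per-dimension code levels into one integer ID
# with mixed-radix encoding. COMiT's actual quantizer may use a different
# scheme; the level counts below are made up for illustration.

def codes_to_indices(codes, levels):
    """codes: tokens as lists of per-dimension level indices.
    levels: number of levels per dimension (the mixed radix)."""
    indices = []
    for code in codes:
        idx = 0
        for level, base in zip(code, levels):
            assert 0 <= level < base
            idx = idx * base + level  # shift by this dimension's radix
        indices.append(idx)
    return indices

def indices_to_codes(indices, levels):
    """Inverse mapping: recover per-dimension levels from flat IDs."""
    codes = []
    for idx in indices:
        code = []
        for base in reversed(levels):
            code.append(idx % base)
            idx //= base
        codes.append(list(reversed(code)))
    return codes

levels = [8, 8, 8, 5, 5, 5]  # 6 dims -> 8*8*8*5*5*5 = 64000 possible IDs
tokens = [[0, 0, 0, 0, 0, 0], [7, 7, 7, 4, 4, 4], [3, 1, 4, 1, 2, 3]]
ids = codes_to_indices(tokens, levels)
assert indices_to_codes(ids, levels) == tokens  # round-trip is lossless
```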