Commit ce980b9
Parent(s): 13fe96e

Update README sections and remove deprecated checkpoint/base_model fields

- README.md +23 -5
- config.yaml +0 -1
README.md CHANGED
@@ -5,7 +5,6 @@ tags:
 - image-generation
 - flextok
 - autoregressive
-base_model: ZhitongGao/gridtok_d36_t2i_256
 ---
 
 # gridtok_d36_t2i_256
@@ -23,6 +22,9 @@ FlexTok Autoregressive Text-to-Image Model
 - **Tokenizer**: ZhitongGao/GridAR_256
 - **Text Encoder**: google/flan-t5-xl
 
+## Installation
+For install instructions, please see https://github.com/EPFL-VILAB/search-over-token/.
+
 ## Usage
 
 ```python
@@ -31,19 +33,35 @@ from flextok_ar.utils.helpers import load_model
 
 # Load model
 model, tokenizer, cfg = load_model(
-    model_id="
+    model_id="ZhitongGao/GridAR-3B-T2I",
     device="cuda"
 )
 
 # Generate image
 images = model.generate(
-    data_dict={
+    data_dict={"text": ["A serene lake at sunset"]},
     num_samples=1,
     cfg_factor=3.0,
     temperature=1.0,
 )
 ```
 
-##
+## Citation
+
+If you find this repository helpful, please consider citing our work:
 
-
+```bibtex
+@article{gao2026ordered,
+  title={(1D) Ordered Tokens Enable Efficient Test-Time Search},
+  author={Zhitong Gao and Parham Rezaei and Ali Cy and Mingqiao Ye and Nata{\v{s}}a Jovanovi{\'{c}} and Jesse Allardice and Afshin Dehghan and Amir Zamir and Roman Bachmann and O{\u{g}}uzhan Fatih Kar},
+  journal={arxiv 2026},
+  year={2026}
+}
+
+@article{flextok,
+  title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
+  author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
+  journal={arXiv 2025},
+  year={2025}
+}
+```
config.yaml CHANGED
@@ -4,7 +4,6 @@
 # Uses nomup checkpoint (muP scaling baked into weights)
 
 model:
-  checkpoint: /capstor/scratch/cscs/zgao/flextok/release/flextok_ar/checkpoints/2d_grid_d36/checkpoint_nomup.pth
   modality: image
 
 # For 2D grid, the AR checkpoint does NOT contain image_tokenizer weights.
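This commit drops `model.checkpoint` from config.yaml and `base_model` from the README front matter, so any local tooling that still reads or writes those fields may need updating. The following is a minimal sketch, assuming both files have already been parsed into plain Python dicts (for example by a YAML loader). The helper `find_stale_keys` is hypothetical, as are the key lists, the abbreviated sample dicts, and the placeholder path. None of them are part of the repository. The helper reports which of the removed fields are still present.

```python
from typing import Any, Iterable, Mapping, List

# Fields this commit removes (per its message); grouping them here is illustrative only.
DEPRECATED_CONFIG_KEYS = ("checkpoint",)  # formerly under `model:` in config.yaml
DEPRECATED_CARD_KEYS = ("base_model",)    # formerly in the README.md front matter


def find_stale_keys(section: Mapping[str, Any], keys: Iterable[str], prefix: str = "") -> List[str]:
    """Return the names (with an optional prefix) of `keys` still present in `section`."""
    return [f"{prefix}{key}" for key in keys if key in section]


# Abbreviated stand-ins for already-parsed YAML; the checkpoint path is a placeholder.
old_config = {"model": {"checkpoint": "/path/to/checkpoint_nomup.pth", "modality": "image"}}
new_config = {"model": {"modality": "image"}}
old_card = {"tags": ["image-generation", "flextok", "autoregressive"],
            "base_model": "ZhitongGao/gridtok_d36_t2i_256"}
new_card = {"tags": ["image-generation", "flextok", "autoregressive"]}

print(find_stale_keys(old_config["model"], DEPRECATED_CONFIG_KEYS, "model."))  # ['model.checkpoint']
print(find_stale_keys(new_config["model"], DEPRECATED_CONFIG_KEYS, "model."))  # []
print(find_stale_keys(old_card, DEPRECATED_CARD_KEYS))  # ['base_model']
print(find_stale_keys(new_card, DEPRECATED_CARD_KEYS))  # []
```

Keeping the helper a plain membership check over an already-loaded mapping avoids tying it to any particular YAML library.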