EPFL-VILAB/GridAR-3B-T2I

@@ -5,7 +5,6 @@ tags:
 - image-generation
 - flextok
 - autoregressive
-base_model: ZhitongGao/gridtok_d36_t2i_256
 ---
 # gridtok_d36_t2i_256
@@ -23,6 +22,9 @@ FlexTok Autoregressive Text-to-Image Model
 - **Tokenizer**: ZhitongGao/GridAR_256
 - **Text Encoder**: google/flan-t5-xl
 ## Usage
 ```python
@@ -31,19 +33,35 @@ from flextok_ar.utils.helpers import load_model
 # Load model
 model, tokenizer, cfg = load_model(
-    model_id="{model_id}",
     device="cuda"
 )
 # Generate image
 images = model.generate(
-    data_dict={{"text": ["A serene lake at sunset"]}},
     num_samples=1,
     cfg_factor=3.0,
     temperature=1.0,
 )
 ```
-## Configuration
-This model uses the configuration from `configs/t2i/gridtok_d36_t2i_256.yaml`.

 - image-generation
 - flextok
 - autoregressive
 ---
 # gridtok_d36_t2i_256
 - **Tokenizer**: ZhitongGao/GridAR_256
 - **Text Encoder**: google/flan-t5-xl
+## Installation
+For install instructions, please see https://github.com/EPFL-VILAB/search-over-token/.
 ## Usage
 ```python
 # Load model
 model, tokenizer, cfg = load_model(
+    model_id="ZhitongGao/GridAR-3B-T2I",
     device="cuda"
 )
 # Generate image
 images = model.generate(
+    data_dict={"text": ["A serene lake at sunset"]},
     num_samples=1,
     cfg_factor=3.0,
     temperature=1.0,
 )
 ```
+## Citation
+If you find this repository helpful, please consider citing our work:
+```bibtex
+@article{gao2026ordered,
+  title={(1D) Ordered Tokens Enable Efficient Test-Time Search},
+  author={Zhitong Gao and Parham Rezaei and Ali Cy and Mingqiao Ye and Nata{\v{s}}a Jovanovi{\'{c}} and Jesse Allardice and Afshin Dehghan and Amir Zamir and Roman Bachmann and O{\u{g}}uzhan Fatih Kar},
+  journal={arXiv 2026},
+  year={2026}
+}
+@article{flextok,
+  title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
+  author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
+  journal={arXiv 2025},
+  year={2025}
+}
+```
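
Reviewer note: the usage snippet passes `cfg_factor=3.0` and `temperature=1.0` to `model.generate` without saying what they control. The sketch below shows how classifier-free guidance and temperature are commonly combined when sampling one token in an autoregressive model. It is an illustration under assumptions, not the `flextok_ar` implementation: the function name `guided_sample` and its inputs are hypothetical, and the library may apply guidance differently.

```python
import numpy as np


def guided_sample(cond_logits, uncond_logits, cfg_factor=3.0, temperature=1.0, rng=None):
    """Sample one token id using classifier-free guidance (hypothetical helper).

    cond_logits:   logits predicted with the text prompt
    uncond_logits: logits predicted with an empty / dropped prompt
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng

    cond = np.asarray(cond_logits, dtype=np.float64)
    uncond = np.asarray(uncond_logits, dtype=np.float64)

    # Guidance: move away from the unconditional prediction toward the
    # conditional one. cfg_factor=1.0 reduces to plain conditional sampling.
    guided = uncond + cfg_factor * (cond - uncond)

    # Temperature: values below 1.0 sharpen the distribution, above 1.0 flatten it.
    scaled = guided / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs = probs / probs.sum()

    token = int(rng.choice(probs.size, p=probs))
    return token, probs
```

Under this formulation, a larger `cfg_factor` concentrates probability on tokens the prompt favors, which usually trades sample diversity for prompt adherence.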

config.yaml CHANGED

@@ -4,7 +4,6 @@
 # Uses nomup checkpoint (muP scaling baked into weights)
 model:
-  checkpoint: /capstor/scratch/cscs/zgao/flextok/release/flextok_ar/checkpoints/2d_grid_d36/checkpoint_nomup.pth
   modality: image
   # For 2D grid, the AR checkpoint does NOT contain image_tokenizer weights.

 # Uses nomup checkpoint (muP scaling baked into weights)
 model:
   modality: image
   # For 2D grid, the AR checkpoint does NOT contain image_tokenizer weights.