AR Models with FlexTok
Autoregressive Text-to-Image Model with 2D Grid Tokens (a Controlled Baseline for FlexTok)
For installation instructions, please see https://github.com/EPFL-VILAB/search-over-tokens/.
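The title contrasts this model's 2D grid tokens with FlexTok's 1D token sequences. An autoregressive model still predicts one token at a time, so a 2D token grid has to be flattened into a sequence. A common convention is raster-scan (row-major) order. This is an assumption for illustration, not confirmed here for GridAR. The minimal sketch below flattens a grid and rebuilds it. `raster_scan` and `unflatten` are hypothetical helpers, not part of `flextok_ar`.

```python
def raster_scan(grid):
    """Flatten an H x W grid of token ids row by row, top-left to bottom-right."""
    return [tok for row in grid for tok in row]


def unflatten(seq, height, width):
    """Inverse of raster_scan: rebuild the H x W grid from a flat token sequence."""
    if len(seq) != height * width:
        raise ValueError("sequence length must equal height * width")
    return [seq[r * width:(r + 1) * width] for r in range(height)]


# Toy 2 x 3 grid of token ids (hypothetical values)
grid = [[5, 9, 2], [7, 1, 4]]
seq = raster_scan(grid)
print(seq)  # [5, 9, 2, 7, 1, 4]
```

With this ordering, any prefix of the sequence covers only the top rows of the image. That is one reason a grid-token model makes a natural controlled baseline against a tokenizer whose 1D tokens are ordered differently.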
```python
# Generate an image from a text prompt
from flextok_ar.utils.helpers import load_model

# Load the model, tokenizer, and config
model, tokenizer, cfg = load_model(
    model_id="ZhitongGao/GridAR-3B-T2I",
    device="cuda",
)

# Generate an image
images = model.generate(
    data_dict={"text": ["A serene lake at sunset"]},
    cfg_factor=3.0,    # classifier-free guidance scale
    temperature=1.0,   # sampling temperature
)
```
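The `cfg_factor` and `temperature` arguments above control sampling. Below is a minimal sketch of how these two knobs usually act on next-token logits. It assumes the standard classifier-free guidance formulation, `uncond + cfg_factor * (cond - uncond)`, followed by temperature-scaled softmax sampling. It is not taken from the `flextok_ar` code, and `cfg_logits` and `sample_with_temperature` are hypothetical helpers.

```python
import math
import random


def cfg_logits(cond, uncond, cfg_factor):
    """Standard classifier-free guidance: move conditional logits away from unconditional ones."""
    return [u + cfg_factor * (c - u) for c, u in zip(cond, uncond)]


def sample_with_temperature(logits, temperature, rng):
    """Sample a token index from softmax(logits / temperature)."""
    if temperature <= 0:
        # Zero-temperature limit: greedy decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy logits over a 3-token vocabulary (hypothetical values)
cond = [2.0, 1.0, 0.0]    # logits given the text prompt
uncond = [1.0, 1.0, 1.0]  # logits with the prompt dropped
guided = cfg_logits(cond, uncond, cfg_factor=3.0)
print(guided)  # [4.0, 1.0, -2.0]

token = sample_with_temperature(guided, temperature=1.0, rng=random.Random(0))
```

With `cfg_factor=1.0` the guided logits equal the conditional ones. Larger values sharpen adherence to the prompt, and lowering `temperature` makes sampling more deterministic.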
If you find this repository helpful, please consider citing our work:
```bibtex
@article{gao2026ordered,
  title={(1D) Ordered Tokens Enable Efficient Test-Time Search},
  author={Zhitong Gao and Parham Rezaei and Ali Cy and Mingqiao Ye and Nata{\v{s}}a Jovanovi{\'{c}} and Jesse Allardice and Afshin Dehghan and Amir Zamir and Roman Bachmann and O{\u{g}}uzhan Fatih Kar},
  journal={arXiv 2026},
  year={2026}
}

@article{flextok,
  title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
  author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
  journal={arXiv 2025},
  year={2025}
}
```