ZhitongGao committed
Commit ce980b9 · 1 Parent(s): 13fe96e

Update README sections and remove deprecated checkpoint/base_model fields

Files changed (2)
  1. README.md +23 -5
  2. config.yaml +0 -1
README.md CHANGED
@@ -5,7 +5,6 @@ tags:
 - image-generation
 - flextok
 - autoregressive
-base_model: ZhitongGao/gridtok_d36_t2i_256
 ---
 
 # gridtok_d36_t2i_256
@@ -23,6 +22,9 @@ FlexTok Autoregressive Text-to-Image Model
 - **Tokenizer**: ZhitongGao/GridAR_256
 - **Text Encoder**: google/flan-t5-xl
 
+## Installation
+For install instructions, please see https://github.com/EPFL-VILAB/search-over-token/.
+
 ## Usage
 
 ```python
@@ -31,19 +33,35 @@ from flextok_ar.utils.helpers import load_model
 
 # Load model
 model, tokenizer, cfg = load_model(
-    model_id="{model_id}",
+    model_id="ZhitongGao/GridAR-3B-T2I",
     device="cuda"
 )
 
 # Generate image
 images = model.generate(
-    data_dict={{"text": ["A serene lake at sunset"]}},
+    data_dict={"text": ["A serene lake at sunset"]},
     num_samples=1,
     cfg_factor=3.0,
     temperature=1.0,
 )
 ```
 
-## Configuration
+## Citation
+
+If you find this repository helpful, please consider citing our work:
 
-This model uses the configuration from `configs/t2i/gridtok_d36_t2i_256.yaml`.
+```bibtex
+@article{gao2026ordered,
+  title={(1D) Ordered Tokens Enable Efficient Test-Time Search},
+  author={Zhitong Gao and Parham Rezaei and Ali Cy and Mingqiao Ye and Nata{\v{s}}a Jovanovi{\'{c}} and Jesse Allardice and Afshin Dehghan and Amir Zamir and Roman Bachmann and O{\u{g}}uzhan Fatih Kar},
+  journal={arxiv 2026},
+  year={2026}
+}
+
+@article{flextok,
+  title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
+  author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
+  journal={arXiv 2025},
+  year={2025}
+}
+```
config.yaml CHANGED
@@ -4,7 +4,6 @@
 # Uses nomup checkpoint (muP scaling baked into weights)
 
 model:
-  checkpoint: /capstor/scratch/cscs/zgao/flextok/release/flextok_ar/checkpoints/2d_grid_d36/checkpoint_nomup.pth
   modality: image
 
   # For 2D grid, the AR checkpoint does NOT contain image_tokenizer weights.
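
Note on the README.md usage hunk: the removed lines `model_id="{model_id}"` and `data_dict={{"text": ...}}` look like an unrendered Python `str.format` template, in which `{name}` is a placeholder and `{{` / `}}` are escapes for literal braces. That is an inference from the diff, not something the commit states. A minimal sketch of how such a template would render into the added lines, assuming a hypothetical `TEMPLATE` string and `render_usage` helper (neither appears in this repository):

```python
# Hypothetical README template fragment; the real generator (if any) is not part of this commit.
TEMPLATE = (
    'model_id="{model_id}",\n'
    'data_dict={{"text": ["A serene lake at sunset"]}},\n'
)


def render_usage(model_id: str) -> str:
    """Fill the {model_id} placeholder; str.format collapses '{{' and '}}' to single braces."""
    return TEMPLATE.format(model_id=model_id)


rendered = render_usage("ZhitongGao/GridAR-3B-T2I")
print(rendered)
```

Under that assumption, the commit's README change is equivalent to committing the rendered text rather than the raw template.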