Add metadata and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +17 -135
README.md CHANGED
@@ -1,51 +1,26 @@
 # TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

 <p align="center">
- <a href="https://arxiv.org/abs/2604.07340"><img src="https://img.shields.io/badge/Paper-Arxiv-b31b1b.svg" alt="arXiv"></a>&nbsp;
- <a href="https://huggingface.co/inclusionAI/TC-AE/tree/main"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow" alt="Models"></a>
 </p>
- <div align="center">
- <a href="https://tliby.github.io/" target="_blank">Teng&nbsp;Li</a><sup>1,2*</sup>,
- <a href="https://huang-ziyuan.github.io/" target="_blank">Ziyuan&nbsp;Huang</a><sup>1,*,✉</sup>,
- <a href="https://scholar.google.com/citations?user=kwDXTpAAAAAJ&hl=en" target="_blank">Cong&nbsp;Chen</a><sup>1,3,*</sup>,
- <a href="https://ychenl.github.io/" target="_blank">Yangfu&nbsp;Li</a><sup>1,4</sup>,
- <a href="https://qc-ly.github.io/" target="_blank">Yuanhuiyi&nbsp;Lyu</a><sup>1,5</sup>, <br>
- <a href="#" target="_blank">Dandan&nbsp;Zheng</a><sup>1</sup>,
- <a href="https://scholar.google.com/citations?user=Ljk2BvIAAAAJ&hl=en" target="_blank">Chunhua&nbsp;Shen</a><sup>3</sup>,
- <a href="https://eejzhang.people.ust.hk/" target="_blank">Jun&nbsp;Zhang</a><sup>2✉</sup><br>
- <sup>1</sup>Inclusion AI, Ant Group, <sup>2</sup>HKUST, <sup>3</sup>ZJU, <sup>4</sup>ECNU, <sup>5</sup>HKUST (GZ) <br>
- <sup>*</sup>Equal contribution, ✉ Corresponding authors <br>
- </div>
-
-
-
-
-
- ## News
-
- - [2026/04/09] Research paper, code, and models are released for TC-AE!
-

 ## Introduction

- <p align="center">
- <img src="assets/pipeline.png" width=98%>
- <p>
-
-
-
- **TC-AE** is a novel Vision Transformer (ViT)-based tokenizer for deep image compression and visual generation. Traditional deep compression methods typically increase channel dimensions to maintain reconstruction quality at high compression ratios, but this often leads to representation collapse that degrades generative performance. TC-AE addresses this fundamental challenge from a new perspective: **optimizing the token space** — the critical bridge between pixels and latent representations. By scaling token numbers and enhancing their semantic structure, TC-AE achieves superior reconstruction and generation quality. Key Innovations:
-
- - Token Space Optimization: First to address representation collapse through token sapce optimization
- - Staged Token Compression: Decomposes token-to-latent mapping into two stages, reducing structural information loss in the bottleneck
- - Semantic Enhancement: Incorporates self-supervised learning to produce more generative-friendly latents

- 🚀 In this codebase, we release:

- - Pre-trained TC-AE tokenizer weights and evaluation code
- - Diffusion model training and evaluation code
-
- ## Environment Setup

 To set up the environment for TC-AE, follow these steps:

@@ -55,19 +30,9 @@ conda activate tcae
 pip install -r requirements.txt
 ```

- ## Download Checkpoints
-
- Download the pre-trained TC-AE weights and place them in the `results/` directory:
-
-
- | Tokenizer | Compression Ratio | rFID | LPIPS | Pretrained Weights |
- | --------- | ----------------- | ---- | ----- | ------------------------------------------------------------ |
- | TC-AE-SL | f32d128 | 0.35 | 0.060 | [![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow)](https://huggingface.co/inclusionAI/TC-AE/tree/main) |

-
- ## Reconstruction Evaluation
-
- ##### Image Reconstruction Demo

 ```shell
 python tcae/script/demo_recon.py \
@@ -78,92 +43,9 @@ python tcae/script/demo_recon.py \
 --rank 0
 ```

- ##### ImageNet Evaluation
-
- Evaluate reconstruction quality on ImageNet validation set:
-
- ```shell
- python tcae/script/eval_recon.py \
- --ckpt_path results/tcae.pt \
- --dataset_root /path/to/imagenet_val \
- --config configs/TC-AE-SL.yaml \
- --rank 0
- ```
-
- ## Generation Evaluation
-
- Our DiT architecture and training pipeline are based on [RAE](https://github.com/bytetriper/RAE) and [VA-VAE](https://github.com/hustvl/LightningDiT).
-
- ##### Prepare ImageNet Latents for Training
-
- Extract and cache latent representations from ImageNet training set:
-
- ```shell
- accelerate launch \
- --mixed_precision bf16 \
- diffusion/script/extract_features.py \
- --data_path /path/to/imagenet_train \
- --batch_size 50 \
- --tokenizer_cfg_path configs/TC-AE-SL.yaml \
- --tokenizer_ckpt_path results/tcae.pt
- ```
-
- This will cache latents to `results/cached_latents/imagenet_train_256/`.
-
- ##### Training
-
- Train a DiT-XL model on the extracted latents:
-
- ```shell
- mkdir -p results/dit
- torchrun --standalone --nproc_per_node=8 \
- diffusion/script/train_dit.py \
- --config configs/DiT-XL.yaml \
- --data-path results/cached_latents/imagenet_train_256 \
- --results-dir results/dit \
- --image-size 256 \
- --precision bf16
- ```
-
- ##### Sampling
-
- Generate images using the trained diffusion model:
-
- ```shell
- mkdir -p results/dit/samples
- torchrun --standalone --nnodes=1 --nproc_per_node=8 \
- diffusion/script/sample_ddp_dit.py \
- --config configs/DiT-XL.yaml \
- --sample-dir results/dit/samples \
- --precision bf16 \
- --label-sampling equal \
- --tokenizer_cfg_path configs/TC-AE-SL.yaml \
- --tokenizer_ckpt_path results/tcae.pt
- ```
-
- ##### Evaluation
-
- Download the ImageNet reference statistics: [adm_in256_stats.npz](https://huggingface.co/jjiaweiyang/l-DeTok/commit/28ef58d254bb1bde10e331372fe542e5458f3b5f#d2h-232267) and place it in `results/`.
-
- ```shell
- python diffusion/script/eval_dit.py \
- --generated_dir results/dit/samples/DiT-0100000-cfg-1.00-bs100-ODE-50-euler-bf16 \
- --reference_npz results/adm_in256_stats.npz \
- --batch-size 512 \
- --num-workers 8
- ```
-
- ## Acknowledgements
-
- The codebase is built on [HieraTok](https://arxiv.org/abs/2509.23736), [RAE](https://github.com/bytetriper/RAE), [VA-VAE](https://github.com/hustvl/LightningDiT), [iBOT](https://github.com/bytedance/ibot). Thanks for their efforts!
-
- ## License
-
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
-
 ## Citation

- ```
 @article{li2026tcae,
 title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
 author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},
 
+ ---
+ license: mit
+ pipeline_tag: image-to-image
+ ---
+
 # TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

+ TC-AE is a novel Vision Transformer (ViT)-based tokenizer for deep image compression and visual generation. It addresses the challenge of latent representation collapse at high compression ratios by optimizing the token space.
+
 <p align="center">
+ <a href="https://huggingface.co/papers/2604.07340"><img src="https://img.shields.io/badge/Paper-Arxiv-b31b1b.svg" alt="arXiv"></a>&nbsp;
+ <a href="https://github.com/inclusionAI/TC-AE"><img src="https://img.shields.io/badge/Code-GitHub-blue?logo=github" alt="GitHub"></a>
 </p>

 ## Introduction

+ TC-AE achieves substantially improved reconstruction and generative performance under deep compression through two key innovations:
+ 1. **Staged Token Compression**: Decomposes the token-to-latent mapping into two stages, reducing structural information loss in the bottleneck.
+ 2. **Semantic Enhancement**: Incorporates joint self-supervised training to produce more generative-friendly latents.
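As a back-of-the-envelope illustration (not part of the diff itself), the token capacity at the released TC-AE-SL setting, f32d128 from the checkpoint table, can be sketched as follows; the 256×256×3 input resolution is an assumption for the sake of arithmetic:

```python
# Latent-shape arithmetic for an f32d128 tokenizer: 32x spatial
# downsampling, 128-dimensional latents per token. The 256x256x3
# input resolution is assumed purely for illustration.
image_h, image_w, image_c = 256, 256, 3
downsample_f, latent_d = 32, 128

# 32x downsampling on each axis yields an 8x8 token grid.
num_tokens = (image_h // downsample_f) * (image_w // downsample_f)
latent_values = num_tokens * latent_d      # values in the latent code
pixel_values = image_h * image_w * image_c # values in the raw image

compression = pixel_values / latent_values
print(num_tokens, latent_values, compression)  # 64 8192 24.0
```

Scaling the token count (here, the 8×8 grid) rather than the per-token channel dimension is exactly the axis TC-AE optimizes.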

+ ## Usage

+ ### Environment Setup

 To set up the environment for TC-AE, follow these steps:

 pip install -r requirements.txt
 ```

+ ### Image Reconstruction Demo

+ To use the TC-AE tokenizer for image reconstruction, you can run the following script using the pre-trained weights:

 ```shell
 python tcae/script/demo_recon.py \

 --rank 0
 ```

 ## Citation

+ ```bibtex
 @article{li2026tcae,
 title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
 author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},