---
license: mit
pipeline_tag: image-to-image
---

# TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

TC-AE is a novel Vision Transformer (ViT)-based tokenizer for deep image compression and visual generation. It addresses latent representation collapse at high compression ratios by optimizing the token space.


## Introduction

TC-AE achieves substantially improved reconstruction and generative performance under deep compression through two key innovations:

1. **Staged Token Compression**: Decomposes the token-to-latent mapping into two stages, reducing structural information loss in the bottleneck.
2. **Semantic Enhancement**: Incorporates joint self-supervised training to produce more generation-friendly latents.

## Usage

### Environment Setup

To set up the environment for TC-AE, follow these steps:

```shell
conda create -n tcae python=3.9
conda activate tcae
pip install -r requirements.txt
```

### Image Reconstruction Demo

To reconstruct images with the TC-AE tokenizer using the pre-trained weights, run:

```shell
python tcae/script/demo_recon.py \
    --img_folder /path/to/your/images \
    --output_folder /path/to/output \
    --ckpt_path results/tcae.pt \
    --config configs/TC-AE-SL.yaml \
    --rank 0
```

## Citation

```bibtex
@article{li2026tcae,
  title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
  author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},
  journal={arXiv preprint arXiv:2604.07340},
  year={2026}
}
```
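For intuition, the "deep compression" ratio can be quantified as the number of input elements mapped to each latent grid. The sketch below uses assumed dimensions (a 256×256 RGB input, 32× spatial downsampling, 32 latent channels) purely for illustration; they are not taken from TC-AE's actual configuration:

```python
# Illustrative arithmetic for "deep compression": how many input values map
# to one latent grid. All dimensions below are assumptions for illustration,
# not TC-AE's actual configuration.

def compression_ratio(img_size: int, img_channels: int,
                      down_factor: int, latent_channels: int) -> float:
    """Elements in the input image divided by elements in the latent grid."""
    latent_size = img_size // down_factor
    return (img_size * img_size * img_channels) / (
        latent_size * latent_size * latent_channels)

# E.g. a 256x256 RGB image, 32x spatial downsampling, 32 latent channels:
print(compression_ratio(256, 3, 32, 32))  # -> 96.0
```

Higher downsampling factors shrink the latent grid quadratically, which is why preserving structural information in the bottleneck becomes the central challenge at deep compression ratios.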