---
license: mit
pipeline_tag: image-to-image
---
# TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
TC-AE is a Vision Transformer (ViT)-based tokenizer for deep image compression and visual generation. It addresses latent representation collapse at high compression ratios by optimizing the token space.
## Introduction
TC-AE achieves substantially improved reconstruction and generative performance under deep compression through two key innovations:
- Staged Token Compression: Decomposes token-to-latent mapping into two stages, reducing structural information loss in the bottleneck.
- Semantic Enhancement: Incorporates joint self-supervised training to produce more generative-friendly latents.
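To make the first idea concrete, here is a minimal NumPy sketch of what a staged token-to-latent mapping looks like in principle: rather than projecting all tokens straight into a small latent, compression is split into a token-count stage and a channel-dim stage. All shapes and the random linear maps below are illustrative assumptions, not TC-AE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 256, 512        # input tokens x channels (e.g. 16x16 ViT patches)
N_mid, D_lat = 64, 32  # intermediate token count, final latent channels

tokens = rng.standard_normal((N, D))

# Stage 1: compress along the token axis (merge spatial tokens),
# keeping the full channel dimension intact.
W_merge = rng.standard_normal((N_mid, N)) / np.sqrt(N)
mid = W_merge @ tokens               # shape (N_mid, D)

# Stage 2: compress along the channel axis into the latent bottleneck.
W_proj = rng.standard_normal((D, D_lat)) / np.sqrt(D)
latent = mid @ W_proj                # shape (N_mid, D_lat)

print(latent.shape)
```

Because each stage discards information along only one axis at a time, the bottleneck never has to absorb both spatial and channel reduction in a single projection, which is the intuition behind reduced structural loss.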
## Usage

### Environment Setup
To set up the environment for TC-AE, follow these steps:
```shell
conda create -n tcae python=3.9
conda activate tcae
pip install -r requirements.txt
```
### Image Reconstruction Demo
To use the TC-AE tokenizer for image reconstruction, you can run the following script using the pre-trained weights:
```shell
python tcae/script/demo_recon.py \
    --img_folder /path/to/your/images \
    --output_folder /path/to/output \
    --ckpt_path results/tcae.pt \
    --config configs/TC-AE-SL.yaml \
    --rank 0
```
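Once the demo has written reconstructions to the output folder, a common sanity check is the PSNR between each source image and its reconstruction. The sketch below shows that metric on stand-in arrays (the repo may report its own metrics; the `psnr` helper and the synthetic "images" here are illustrative only):

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a reference and a reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: lossless reconstruction
    return 10.0 * np.log10(peak ** 2 / mse)

# Stand-in "images": a reference and a slightly noisy reconstruction.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
rec = np.clip(ref + rng.normal(0, 2, ref.shape), 0, 255)

print(round(psnr(ref, rec), 2))  # higher is better
```

In practice you would load each image pair from `--img_folder` and `--output_folder` in place of the synthetic arrays.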
## Citation
```bibtex
@article{li2026tcae,
  title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
  author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},
  journal={arXiv preprint arXiv:2604.07340},
  year={2026}
}
```