🐱 DiT-IC Model Card

Source code is available on GitHub.

Model Description

  • Developed by: NJU VisionLab
  • Model type: Diffusion Transformer based image compression model
  • Model size: 1.049 B parameters
  • Resolution: Supports arbitrary resolution images

This model performs the diffusion process in a 32× latent space to reduce memory usage and accelerate inference.

The model is based on the pretrained text-to-image generative model SANA-600M.
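To see why the 32× latent space matters, consider the arithmetic: an H×W image maps to an (H/32)×(W/32) latent grid, so the diffusion transformer operates on 32² = 1024× fewer spatial positions than it would in pixel space. A minimal sketch of this calculation (the helper `latent_grid` is illustrative, not part of the released code):

```python
# Illustrative arithmetic only: a 32x spatial downsampling factor shrinks
# an H x W image to an (H/32) x (W/32) latent grid, reducing the number of
# spatial positions the diffusion transformer must process by 32*32 = 1024x.
def latent_grid(height: int, width: int, factor: int = 32):
    """Return the latent spatial size and the spatial-position reduction."""
    lh, lw = height // factor, width // factor
    reduction = (height * width) / (lh * lw)
    return (lh, lw), reduction

size, reduction = latent_grid(1024, 1024)
# a 1024x1024 image becomes a 32x32 latent grid: 1024x fewer positions
```

Since transformer attention cost grows quadratically with the number of tokens, this reduction is what makes arbitrary-resolution inference practical in memory and time.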

Quick Start

For training and inference scripts, please visit our GitHub Repository.

Limitations

  • The model may fail to reconstruct tiny text at low bitrates.
  • Fingers and other fine structures may not be generated properly.

Citation

If you use this model, please cite:

@inproceedings{shi2026ditic,
  title={DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression},
  author={Shi, Junqi and Lu, Ming and Li, Xingchen and Ke, Anle and Zhang, Ruiqi and Ma, Zhan},
  booktitle={CVPR},
  year={2026}
}