# CoD
CoD (Compression-oriented Diffusion) is the first diffusion foundation model designed and trained from scratch specifically for image compression. A lightweight condition encoder extracts image-native features, a VQ information bottleneck compresses them into a compact bitstream, and a Diffusion Transformer reconstructs the image conditioned on the quantized representation.
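The encode–quantize–decode pipeline above can be sketched with a toy nearest-neighbour vector quantizer. This is a minimal NumPy illustration with made-up shapes and codebook size, not the actual CoD implementation:

```python
import numpy as np

# Toy VQ bottleneck: continuous encoder features are snapped to their
# nearest codebook entries, and only the integer indices need to be
# transmitted. Shapes and codebook size here are illustrative.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 16))   # 256 entries -> 8 bits per index
features = rng.standard_normal((64, 16))    # 64 feature vectors from the encoder

# Nearest-neighbour assignment by squared Euclidean distance.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)              # the bitstream: 64 indices x 8 bits
quantized = codebook[indices]               # what the decoder is conditioned on

bits = indices.size * np.log2(len(codebook))
print(f"bitstream size: {bits:.0f} bits")   # 64 x 8 = 512 bits
```

In the real model the decoder is a Diffusion Transformer rather than a table lookup, but the information bottleneck has the same shape: only the quantization indices cross the channel.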
## Base CoD Models (`cod/`)
| Model | Space | BPP | Config | Checkpoint |
|---|---|---|---|---|
| CoD (pixel) | Pixel | 0.0039 | CoD_pixel_vpred.yaml | CoD_pixel_vpred.pt |
| CoD (latent) | Latent | 0.0039 | CoD_latent_vpred.yaml | CoD_latent_vpred.pt |
| CoD (latent, 64-bit) | Latent | 0.00024 | CoD_latent_vpred_64bits.yaml | CoD_latent_vpred_64bits.pt |
## One-Step CoD Models (`finetuned_one_step_cod/`)
These models reconstruct in a single forward pass, achieve better performance, and cover a wider range of bitrates.
| Model | BPP | Config | Checkpoint |
|---|---|---|---|
| bpp_0_0039 | 0.0039 | bpp_0_0039.yaml | bpp_0_0039.pt |
| bpp_0_0039_noise_1 | 0.0039 | bpp_0_0039_noise_1.yaml | bpp_0_0039_noise_1.pt |
| bpp_0_0312 | 0.0312 | bpp_0_0312.yaml | bpp_0_0312.pt |
| bpp_0_1250 | 0.1250 | bpp_0_1250.yaml | bpp_0_1250.pt |
## Perceptual Loss Models (`perceptual_loss_illm_dec/`)
| Model | Checkpoint |
|---|---|
| msillm_quality_vlo2 | msillm_quality_vlo2.pt |
| msillm_quality_1 | msillm_quality_1.pt |
| msillm_quality_2 | msillm_quality_2.pt |
| msillm_quality_3 | msillm_quality_3.pt |
| msillm_quality_4 | msillm_quality_4.pt |
## Results

Metrics evaluated on Kodak (512×512):
| Model | BPP | PSNR | LPIPS | DISTS | FID |
|---|---|---|---|---|---|
| CoD (pixel) | 0.0039 | 16.21 | 0.434 | 0.186 | 46.0 |
| CoD (latent) | 0.0039 | 15.03 | 0.415 | 0.188 | 45.7 |
| CoD (latent, 64-bit) | 0.00024 | 10.09 | 0.686 | 0.288 | 69.5 |
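The BPP column is bits per pixel, so the absolute code sizes on a 512×512 image are tiny. Assuming the round rates come from power-of-two bit budgets (an inference from the "64-bit" model name, not stated explicitly in this card), the arithmetic works out as:

```python
# bpp = total_bits / pixel_count; a 512x512 image has 262144 pixels.
pixels = 512 * 512
for bits in (64, 1024, 8192, 32768):
    print(f"{bits:>6} bits ({bits // 8:>5} bytes) -> {bits / pixels:.8f} bpp")
```

These values (0.00024414, 0.00390625, 0.03125, 0.125) round to the 0.00024, 0.0039, 0.0312, and 0.1250 rates listed in the tables; at 0.0039 bpp an entire image is represented by roughly 128 bytes.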
Note: CoD (latent) at 0.0039 bpp uses `--cfg 1.25`; CoD (latent, 64-bit) uses `--cfg 3.0`.
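For reference, the PSNR column uses the standard peak signal-to-noise ratio for 8-bit images; a minimal sketch (the `psnr` helper is illustrative, not part of the CoD codebase):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB for images on a [0, max_val] scale."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of one gray level (MSE = 1) gives ~48.13 dB.
print(round(psnr(np.zeros((8, 8)), np.ones((8, 8))), 2))  # 48.13
```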
## Installation

```bash
git clone https://github.com/microsoft/GenCodec.git
cd GenCodec/CoD
pip install -r requirements.txt
```
## Download Models

```bash
# Download base CoD models
huggingface-cli download zhaoyangjia/CoD --include "cod/*" --local-dir ./pretrained/CoD

# Download one-step models
huggingface-cli download zhaoyangjia/CoD --include "finetuned_one_step_cod/*" --local-dir ./pretrained/CoD

# Download perceptual loss models
huggingface-cli download zhaoyangjia/CoD --include "perceptual_loss_illm_dec/*" --local-dir ./pretrained/CoD

# Download a specific model
huggingface-cli download zhaoyangjia/CoD cod/CoD_pixel_vpred.pt cod/CoD_pixel_vpred.yaml --local-dir ./pretrained/CoD

# Download everything
huggingface-cli download zhaoyangjia/CoD --local-dir ./pretrained/CoD
```
## Inference

```bash
# Pixel-space model
python -m cod.inference evaluate \
    --ckpt ./pretrained/CoD/cod/CoD_pixel_vpred.pt \
    --config ./pretrained/CoD/cod/CoD_pixel_vpred.yaml \
    --input <image_dir> --output <recon_dir> \
    --step 25 --cfg 3.0 --sampler adam2

# For the latent model, use --cfg 1.25
python -m cod.inference evaluate \
    --ckpt ./pretrained/CoD/cod/CoD_latent_vpred.pt \
    --config ./pretrained/CoD/cod/CoD_latent_vpred.yaml \
    --input <image_dir> --output <recon_dir> \
    --step 25 --cfg 1.25 --sampler adam2
```
## One-Step Inference

```bash
python -m downstream.finetuned_one_step_cod evaluate \
    --ckpt ./pretrained/CoD/finetuned_one_step_cod/bpp_0_0039.pt \
    --config ./pretrained/CoD/finetuned_one_step_cod/bpp_0_0039.yaml \
    --input <image_dir> --output <recon_dir>
```
## Perceptual Loss Inference

Requires NeuralCompression (installed automatically via torch.hub).

```bash
python -m downstream.perceptual_loss_inference \
    --ckpt ./pretrained/CoD/perceptual_loss_illm_dec/msillm_quality_1.pt \
    --quality 1 \
    --input <image_dir> --output <recon_dir>
```
## Citation

```bibtex
@inproceedings{jia2025cod,
  title     = {CoD: A Diffusion Foundation Model for Image Compression},
  author    = {Jia, Zhaoyang and Zheng, Zihan and Xue, Naifu and Li, Jiahao and Li, Bin and Guo, Zongyu and Zhang, Xiaoyi and Li, Houqiang and Lu, Yan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```
## License

MIT