CoD: A Diffusion Foundation Model for Image Compression

CoD (Compression-oriented Diffusion) is the first diffusion foundation model designed and trained from scratch specifically for image compression. A lightweight condition encoder extracts image-native features, a VQ information bottleneck compresses them into a compact bitstream, and a Diffusion Transformer reconstructs the image conditioned on the quantized representation.
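
The VQ bottleneck can be illustrated with a minimal nearest-neighbor quantizer (a generic sketch for intuition only; the codebook size, shapes, and function names here are hypothetical, not CoD's actual implementation):

```python
import numpy as np

def vq_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (N, D) array of condition-encoder outputs.
    codebook: (K, D) array of learned code vectors.
    Returns (indices, quantized): the integer indices are what gets
    entropy-coded into the bitstream; the quantized vectors condition
    the diffusion decoder.
    """
    # Squared Euclidean distance between every feature and every code.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = d2.argmin(axis=1)       # (N,) symbols to transmit
    quantized = codebook[indices]     # (N, D) dequantized conditioning
    return indices, quantized

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 codes -> 4 bits per symbol
feats = rng.normal(size=(8, 4))
idx, q = vq_quantize(feats, codebook)
```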

Available Models

Base CoD Models (cod/)

| Model | Space | BPP | Config | Checkpoint |
|---|---|---|---|---|
| CoD (pixel) | Pixel | 0.0039 | `CoD_pixel_vpred.yaml` | `CoD_pixel_vpred.pt` |
| CoD (latent) | Latent | 0.0039 | `CoD_latent_vpred.yaml` | `CoD_latent_vpred.pt` |
| CoD (latent, 64-bit) | Latent | 0.00024 | `CoD_latent_vpred_64bits.yaml` | `CoD_latent_vpred_64bits.pt` |

One-Step CoD (finetuned_one_step_cod/)

Finetuned to reconstruct in a single forward pass, with better performance and a wider range of bitrates.

| Model | BPP | Config | Checkpoint |
|---|---|---|---|
| bpp_0_0039 | 0.0039 | `bpp_0_0039.yaml` | `bpp_0_0039.pt` |
| bpp_0_0039_noise_1 | 0.0039 | `bpp_0_0039_noise_1.yaml` | `bpp_0_0039_noise_1.pt` |
| bpp_0_0312 | 0.0312 | `bpp_0_0312.yaml` | `bpp_0_0312.pt` |
| bpp_0_1250 | 0.1250 | `bpp_0_1250.yaml` | `bpp_0_1250.pt` |

CoD as Perceptual Loss (perceptual_loss_illm_dec/)

| Model | Checkpoint |
|---|---|
| msillm_quality_vlo2 | `msillm_quality_vlo2.pt` |
| msillm_quality_1 | `msillm_quality_1.pt` |
| msillm_quality_2 | `msillm_quality_2.pt` |
| msillm_quality_3 | `msillm_quality_3.pt` |
| msillm_quality_4 | `msillm_quality_4.pt` |

Performance

Metrics evaluated on Kodak (512x512):

| Model | BPP | PSNR (dB) ↑ | LPIPS ↓ | DISTS ↓ | FID ↓ |
|---|---|---|---|---|---|
| CoD (pixel) | 0.0039 | 16.21 | 0.434 | 0.186 | 46.0 |
| CoD (latent) | 0.0039 | 15.03 | 0.415 | 0.188 | 45.7 |
| CoD (latent, 64-bit) | 0.00024 | 10.09 | 0.686 | 0.288 | 69.5 |

Note: CoD (latent) at 0.0039 bpp uses --cfg 1.25. CoD (latent, 64-bit) uses --cfg 3.0.
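
These bitrates correspond to tiny absolute payloads at 512x512. The arithmetic is just bits-per-pixel times pixel count (simple illustration, not repo code):

```python
def total_bits(bpp, height, width):
    """Payload size in bits: bits-per-pixel times the pixel count."""
    return bpp * height * width

# 0.0039 bpp at 512x512 is about 1022 bits, i.e. roughly 128 bytes
# for the whole image.
print(total_bits(0.0039, 512, 512))

# The "64-bit" model name matches its bitrate: a single 64-bit payload
# per 512x512 image gives 64 / (512 * 512) ≈ 0.000244 bpp.
print(64 / (512 * 512))
```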

Quick Start

Installation

```bash
git clone https://github.com/microsoft/GenCodec/CoD.git
cd CoD
pip install -r requirements.txt
```

Download Checkpoints

```bash
# Download base CoD models
huggingface-cli download jzyustc/CoD --include "cod/*" --local-dir ./pretrained/CoD

# Download one-step models
huggingface-cli download jzyustc/CoD --include "finetuned_one_step_cod/*" --local-dir ./pretrained/CoD

# Download perceptual loss models
huggingface-cli download jzyustc/CoD --include "perceptual_loss_illm_dec/*" --local-dir ./pretrained/CoD

# Download a specific model
huggingface-cli download jzyustc/CoD cod/CoD_pixel_vpred.pt cod/CoD_pixel_vpred.yaml --local-dir ./pretrained/CoD

# Download everything
huggingface-cli download jzyustc/CoD --local-dir ./pretrained/CoD
```

Base CoD Inference

```bash
python -m cod.inference evaluate \
    --ckpt ./pretrained/CoD/cod/CoD_pixel_vpred.pt \
    --config ./pretrained/CoD/cod/CoD_pixel_vpred.yaml \
    --input <image_dir> --output <recon_dir> \
    --step 25 --cfg 3.0 --sampler adam2

# For the latent model, use --cfg 1.25
python -m cod.inference evaluate \
    --ckpt ./pretrained/CoD/cod/CoD_latent_vpred.pt \
    --config ./pretrained/CoD/cod/CoD_latent_vpred.yaml \
    --input <image_dir> --output <recon_dir> \
    --step 25 --cfg 1.25 --sampler adam2
```
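
The `--cfg` flag sets the classifier-free guidance scale. At each sampling step, diffusion samplers typically combine the conditional and unconditional predictions as below (a generic illustration of the standard CFG formula, not CoD's sampler code):

```python
import numpy as np

def apply_cfg(pred_uncond, pred_cond, cfg):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by the guidance scale."""
    return pred_uncond + cfg * (pred_cond - pred_uncond)

u = np.array([0.0, 1.0])   # toy unconditional prediction
c = np.array([1.0, 3.0])   # toy conditional prediction

# cfg=1.0 recovers the purely conditional prediction ...
print(apply_cfg(u, c, 1.0))   # [1. 3.]
# ... while cfg=3.0 (the pixel model's setting) pushes past it.
print(apply_cfg(u, c, 3.0))   # [3. 7.]
```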

One-Step CoD Inference

```bash
python -m downstream.finetuned_one_step_cod evaluate \
    --ckpt ./pretrained/CoD/finetuned_one_step_cod/bpp_0_0039.pt \
    --config ./pretrained/CoD/finetuned_one_step_cod/bpp_0_0039.yaml \
    --input <image_dir> --output <recon_dir>
```

Perceptual Loss Inference

Requires NeuralCompression (installed automatically via torch.hub).

```bash
python -m downstream.perceptual_loss_inference \
    --ckpt ./pretrained/CoD/perceptual_loss_illm_dec/msillm_quality_1.pt \
    --quality 1 \
    --input <image_dir> --output <recon_dir>
```
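
Using a frozen network as a perceptual loss usually means comparing two images in its feature space rather than in pixel space. The sketch below shows that general pattern only (the `feat` extractor here is a stand-in placeholder, not the repo's decoder or loss definition):

```python
import numpy as np

def perceptual_loss(feat_fn, x, y):
    """Mean squared distance in the feature space of a frozen network."""
    fx, fy = feat_fn(x), feat_fn(y)
    return float(((fx - fy) ** 2).mean())

# Placeholder "feature extractor": any fixed nonlinear map works for
# illustration; in this repo the frozen decoder would play this role.
feat = lambda x: np.tanh(2.0 * x)

a = np.zeros((4, 4))
print(perceptual_loss(feat, a, a))        # identical inputs -> 0.0
print(perceptual_loss(feat, a, a + 0.5))  # differing inputs -> positive
```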

Citation

```bibtex
@inproceedings{jia2025cod,
    title     = {CoD: A Diffusion Foundation Model for Image Compression},
    author    = {Jia, Zhaoyang and Zheng, Zihan and Xue, Naifu and Li, Jiahao and Li, Bin and Guo, Zongyu and Zhang, Xiaoyi and Li, Houqiang and Lu, Yan},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026}
}
```

License

MIT
