# CoD
CoD (Compression-oriented Diffusion) is the first diffusion foundation model designed and trained from scratch specifically for image compression. A lightweight condition encoder extracts image-native features, a VQ information bottleneck compresses them into a compact bitstream, and a Diffusion Transformer reconstructs the image conditioned on the quantized representation.
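The encode–quantize–decode pipeline above can be sketched with a toy nearest-neighbour vector quantizer. This is a minimal NumPy illustration with made-up shapes and codebook size, not the actual CoD implementation:

```python
import numpy as np

# Toy VQ bottleneck: continuous encoder features are snapped to their
# nearest codebook entries, and only the integer indices need to be
# transmitted. Shapes and codebook size here are illustrative.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 16))   # 256 entries -> 8 bits per index
features = rng.standard_normal((64, 16))    # 64 feature vectors from the encoder

# Nearest-neighbour assignment by squared Euclidean distance.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)              # the bitstream: 64 indices x 8 bits
quantized = codebook[indices]               # what the decoder is conditioned on

bits = indices.size * np.log2(len(codebook))
print(f"bitstream size: {bits:.0f} bits")   # 64 x 8 = 512 bits
```

In the real model the decoder is a Diffusion Transformer rather than a table lookup, but the information bottleneck has the same shape: only the quantization indices cross the channel.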
## Base CoD Models (`cod/`)
| Model | Space | BPP | Config | Checkpoint |
|---|---|---|---|---|
| CoD (pixel) | Pixel | 0.0039 | CoD_pixel_vpred.yaml | CoD_pixel_vpred.pt |
| CoD (latent) | Latent | 0.0039 | CoD_latent_vpred.yaml | CoD_latent_vpred.pt |
| CoD (latent, 64-bit) | Latent | 0.00024 | CoD_latent_vpred_64bits.yaml | CoD_latent_vpred_64bits.pt |
## One-Step CoD Models (`finetuned_one_step_cod/`)
These models reconstruct in a single forward pass, achieve better performance, and cover a wider range of bitrates.
| Model | BPP | Config | Checkpoint |
|---|---|---|---|
| bpp_0_0039 | 0.0039 | bpp_0_0039.yaml | bpp_0_0039.pt |
| bpp_0_0039_noise_1 | 0.0039 | bpp_0_0039_noise_1.yaml | bpp_0_0039_noise_1.pt |
| bpp_0_0312 | 0.0312 | bpp_0_0312.yaml | bpp_0_0312.pt |
| bpp_0_1250 | 0.1250 | bpp_0_1250.yaml | bpp_0_1250.pt |
## Perceptual Loss Models (`perceptual_loss_illm_dec/`)
| Model | Checkpoint |
|---|---|
| msillm_quality_vlo2 | msillm_quality_vlo2.pt |
| msillm_quality_1 | msillm_quality_1.pt |
| msillm_quality_2 | msillm_quality_2.pt |
| msillm_quality_3 | msillm_quality_3.pt |
| msillm_quality_4 | msillm_quality_4.pt |
## Results

Metrics evaluated on Kodak (512×512):
| Model | BPP | PSNR | LPIPS | DISTS | FID |
|---|---|---|---|---|---|
| CoD (pixel) | 0.0039 | 16.21 | 0.434 | 0.186 | 46.0 |
| CoD (latent) | 0.0039 | 15.03 | 0.415 | 0.188 | 45.7 |
| CoD (latent, 64-bit) | 0.00024 | 10.09 | 0.686 | 0.288 | 69.5 |
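The BPP column is bits per pixel, so the absolute code sizes on a 512×512 image are tiny. Assuming the round rates come from power-of-two bit budgets (an inference from the "64-bit" model name, not stated explicitly in this card), the arithmetic works out as:

```python
# bpp = total_bits / pixel_count; a 512x512 image has 262144 pixels.
pixels = 512 * 512
for bits in (64, 1024, 8192, 32768):
    print(f"{bits:>6} bits ({bits // 8:>5} bytes) -> {bits / pixels:.8f} bpp")
```

These values (0.00024414, 0.00390625, 0.03125, 0.125) round to the 0.00024, 0.0039, 0.0312, and 0.1250 rates listed in the tables; at 0.0039 bpp an entire image is represented by roughly 128 bytes.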
Note: CoD (latent) at 0.0039 bpp uses `--cfg 1.25`; CoD (latent, 64-bit) uses `--cfg 3.0`.
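For reference, the PSNR column uses the standard peak signal-to-noise ratio for 8-bit images; a minimal sketch (the `psnr` helper is illustrative, not part of the CoD codebase):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB for images on a [0, max_val] scale."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of one gray level (MSE = 1) gives ~48.13 dB.
print(round(psnr(np.zeros((8, 8)), np.ones((8, 8))), 2))  # 48.13
```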
## Installation

```bash
git clone https://github.com/microsoft/GenCodec.git
cd GenCodec/CoD
pip install -r requirements.txt
```
## Download Models

```bash
# Download base CoD models
huggingface-cli download zhaoyangjia/CoD --include "cod/*" --local-dir ./pretrained/CoD

# Download one-step models
huggingface-cli download zhaoyangjia/CoD --include "finetuned_one_step_cod/*" --local-dir ./pretrained/CoD

# Download perceptual loss models
huggingface-cli download zhaoyangjia/CoD --include "perceptual_loss_illm_dec/*" --local-dir ./pretrained/CoD

# Download a specific model
huggingface-cli download zhaoyangjia/CoD cod/CoD_pixel_vpred.pt cod/CoD_pixel_vpred.yaml --local-dir ./pretrained/CoD

# Download everything
huggingface-cli download zhaoyangjia/CoD --local-dir ./pretrained/CoD
```
## Inference

```bash
# Pixel-space model
python -m cod.inference evaluate \
    --ckpt ./pretrained/CoD/cod/CoD_pixel_vpred.pt \
    --config ./pretrained/CoD/cod/CoD_pixel_vpred.yaml \
    --input <image_dir> --output <recon_dir> \
    --step 25 --cfg 3.0 --sampler adam2

# For the latent model, use --cfg 1.25
python -m cod.inference evaluate \
    --ckpt ./pretrained/CoD/cod/CoD_latent_vpred.pt \
    --config ./pretrained/CoD/cod/CoD_latent_vpred.yaml \
    --input <image_dir> --output <recon_dir> \
    --step 25 --cfg 1.25 --sampler adam2
```
## One-Step Inference

```bash
python -m downstream.finetuned_one_step_cod evaluate \
    --ckpt ./pretrained/CoD/finetuned_one_step_cod/bpp_0_0039.pt \
    --config ./pretrained/CoD/finetuned_one_step_cod/bpp_0_0039.yaml \
    --input <image_dir> --output <recon_dir>
```
## Perceptual Loss Inference

Requires NeuralCompression (installed automatically via torch.hub).

```bash
python -m downstream.perceptual_loss_inference \
    --ckpt ./pretrained/CoD/perceptual_loss_illm_dec/msillm_quality_1.pt \
    --quality 1 \
    --input <image_dir> --output <recon_dir>
```
## Citation

```bibtex
@inproceedings{jia2025cod,
  title     = {CoD: A Diffusion Foundation Model for Image Compression},
  author    = {Jia, Zhaoyang and Zheng, Zihan and Xue, Naifu and Li, Jiahao and Li, Bin and Guo, Zongyu and Zhang, Xiaoyi and Li, Houqiang and Lu, Yan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```
## License

MIT