Dual-Constrained Diffusion Image Compression for Operational Rate-Distortion-Perception Optimization
Abstract
DCIC introduces a dual-constrained diffusion image compression framework that jointly optimizes fidelity and perceptual realism through distortion and idempotence constraints, enabling flexible trade-offs across the rate-distortion-perception surface without additional rate overhead.
The rate-distortion-perception (RDP) trade-off extends classical rate-distortion theory by imposing a distributional constraint on reconstructions, providing a unified framework for neural image compression that jointly governs fidelity and perceptual realism. While prior work achieves near-optimal rate-perception trade-offs, practical frameworks that explicitly realize the full RDP surface remain scarce, primarily because of the difficulty of introducing common randomness at the decoder. We propose DCIC (Dual-Constrained Diffusion Image Compression), which integrates a learned codec with a diffusion-based decoder governed by joint distortion and idempotence constraints. The distortion constraint bounds reconstruction fidelity relative to the base codec output; the idempotence constraint, which requires that re-encoding the restored image recovers the base codec reconstruction, serves as a tractable surrogate for the distributional perception requirement. Together, they steer the reverse denoising process via iterative optimization with consistent noise injection, realizing common randomness without additional rate overhead. At a fixed rate, dual attenuation factors (K_D, K_P) jointly navigate the Pareto frontier of the distortion-perception plane, enabling continuously adjustable fidelity-realism trade-offs from a single bitstream. DCIC_RD (K_P = 0) and DCIC_RP (K_D = 0) arise as boundary curves, with DCIC_RDP (K_D = K_P = 1) realizing the optimal interior operating point. Experiments on CelebA-HQ, CLIC2020, and ImageNet-1K across CNN, Transformer, and hybrid architectures confirm that DCIC_RDP achieves superior BD-PSNR over all perceptual codecs, while DCIC_RP matches dedicated perception-oriented methods in BD-FID, validating the practical value of full RDP surface navigation.
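To make the roles of the two attenuation factors concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a linear orthogonal projection as a stand-in for the learned base codec, replaces the diffusion decoder with plain gradient steps on two quadratic penalties, and models "consistent noise injection" as a seed shared by encoder and decoder. The names `make_linear_codec`, `guided_refine`, `k_d`, and `k_p` are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_linear_codec(dim, latent_dim, seed=0):
    """Toy stand-in for a learned codec (assumption): an orthogonal projection."""
    rng = np.random.default_rng(seed)
    # Reduced QR gives a (dim, latent_dim) matrix with orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, latent_dim)))

    def encode(x):
        return q.T @ x

    def decode(z):
        return q @ z

    return encode, decode


def guided_refine(x_hat, encode, decode, k_d, k_p,
                  steps=200, lr=0.1, noise_scale=0.5, seed=1234):
    """Refine a noisy start under two weighted penalties (illustrative only).

    distortion penalty : 0.5 * ||x - x_hat||^2
    idempotence penalty: 0.5 * ||decode(encode(x)) - x_hat||^2
    """
    # A fixed seed plays the role of common randomness in this sketch:
    # the same noise is drawn on every call, so no extra bits are needed.
    rng = np.random.default_rng(seed)
    x = x_hat + noise_scale * rng.standard_normal(x_hat.shape)
    for _ in range(steps):
        grad_d = x - x_hat
        resid = decode(encode(x)) - x_hat
        # For a symmetric projection P, the gradient of the idempotence
        # penalty is P @ (P @ x - x_hat), i.e. the residual re-projected.
        grad_p = decode(encode(resid))
        x = x - lr * (k_d * grad_d + k_p * grad_p)
    return x


if __name__ == "__main__":
    enc, dec = make_linear_codec(dim=16, latent_dim=4)
    y = np.random.default_rng(7).standard_normal(16)   # toy "image"
    x_hat = dec(enc(y))                                # base codec output
    for k_d, k_p in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
        x = guided_refine(x_hat, enc, dec, k_d, k_p)
        dist = np.linalg.norm(x - x_hat)
        idem = np.linalg.norm(dec(enc(x)) - x_hat)
        print(f"k_d={k_d} k_p={k_p} distortion={dist:.3e} idempotence={idem:.3e}")
```

In this toy setting, setting `k_d = 0` leaves the component outside the codec's range free, so the output can keep injected detail while still re-encoding to the base reconstruction; setting `k_p = 0` simply pulls the output back to the base reconstruction. The real method operates on a learned codec and a diffusion prior, so this only illustrates how two weighted constraints can share one bitstream.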