✨ Highlights
• ⚡ Up to 3.4× faster on dense multi-region captioning, with stable per-image latency
• PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs)
• New benchmark: ParaDLC-Bench jointly evaluates caption quality AND inference efficiency
• Code, models & benchmark all open-sourced