---
license: mit
pipeline_tag: text-generation
library_name: peft
---

# CDLM-LLaDA LoRA adapter for LLaDA-8B-Instruct

This repository hosts the LoRA adapter for the LLaDA-8B-Instruct diffusion LLM (dLLM), produced with the CDLM (Consistency Diffusion Language Models) method. CDLM integrates consistency modeling and a block-wise causal attention mask so the student model becomes fully KV-cache compatible while retaining strong local bidirectional modeling within each block. In practice, the adapter enables significantly faster inference with competitive quality.

- GitHub: https://github.com/SqueezeAILab/CDLM
- Paper: [CDLM: Consistency Diffusion Language Models For Faster Sampling](https://huggingface.co/papers/2511.19269)

## Model details

- Base model: GSAI-ML/LLaDA-8B-Instruct
- Method: CDLM (consistency distillation + block-wise causal masking for KV-cache compatibility)
- Format: PEFT LoRA adapter (`adapter_model.safetensors`, `adapter_config.json`)
- Intended use: attach this adapter to the base LLaDA-8B-Instruct model for accelerated inference via the CDLM decoding path

## How to use

This is a LoRA adapter, not a full model. You must load the base model and then attach this adapter. For the best speedups, use the CDLM inference path in the accompanying codebase.

## License

This adapter is released under the MIT License. The base model is governed by its own license; please ensure compliance with the base model's terms.

## Citation

```bibtex
@article{kim2025cdlm,
  title   = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
  author  = {Kim, Minseo and Xu, Chenfeng and Hooper, Coleman and Singh, Harman and Athiwaratkun, Ben and Zhang, Ce and Keutzer, Kurt and Gholami, Amir},
  journal = {arXiv preprint arXiv:2511.19269},
  year    = {2025},
  url     = {https://arxiv.org/abs/2511.19269}
}
```