---
license: mit
pipeline_tag: text-generation
library_name: peft
---

# CDLM-LLaDA LoRA adapter for LLaDA-8B-Instruct

This repository hosts the LoRA adapter for the LLaDA-8B-Instruct diffusion LLM (dLLM), produced with the CDLM (Consistency Diffusion Language Models) method. CDLM integrates consistency modeling and a block-wise causal attention mask so the student model becomes fully KV-cache compatible while retaining strong local bidirectional modeling within each block. In practice, the adapter enables significantly faster inference with competitive quality.

- GitHub: https://github.com/SqueezeAILab/CDLM
- Paper: [CDLM: Consistency Diffusion Language Models For Faster Sampling](https://huggingface.co/papers/2511.19269)

## Model details

- Base model: GSAI-ML/LLaDA-8B-Instruct
- Method: CDLM (consistency distillation + block-wise causal masking for KV-cache compatibility)
- Format: PEFT LoRA adapter (`adapter_model.safetensors`, `adapter_config.json`)
- Intended use: attach this adapter to the base LLaDA-8B-Instruct model for accelerated inference via the CDLM decoding path

## How to use

This is a LoRA adapter, not a full model. You must load the base model and then attach this adapter. For the best speedups, use the CDLM inference path in the accompanying codebase.

## License

This adapter is released under the MIT License. The base model is governed by its own license; please ensure compliance with the base model's terms.

## Citation

```bibtex
@article{kim2025cdlm,
  title   = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
  author  = {Kim, Minseo and Xu, Chenfeng and Hooper, Coleman and Singh, Harman and Athiwaratkun, Ben and Zhang, Ce and Keutzer, Kurt and Gholami, Amir},
  journal = {arXiv preprint arXiv:2511.19269},
  year    = {2025},
  url     = {https://arxiv.org/abs/2511.19269}
}
```