---
license: apple-amlr
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
pipeline_tag: text-generation
tags:
- code
- diffusion
- Dream
- diffusion language model
---


### CADD-Base-7B

CADD-Base-7B is a masked diffusion language model (MDM) for code generation that incorporates Continuously Augmented Discrete Diffusion (CADD): a continuous flow-matching signal that guides the discrete denoising process.

Key idea: at each diffusion step, a continuous embedding `z_continuous` is added to the embeddings of masked tokens, and `z_continuous` follows a linear flow-matching trajectory from noise to the clean token embeddings. This augmentation is orthogonal to the discrete unmasking strategy, so CADD can be combined with any MDM unmasking algorithm.
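
To make the key idea concrete, here is a minimal plain-Python sketch (lists stand in for tensors). It assumes a time convention in which `t = 1` is pure noise and `t = 0` is the clean embedding `z_0`, so the linear path is `z_t = (1 - t) * z_0 + t * noise`, and it assumes `z_continuous` is added element-wise to the embedding of each masked position. It shows the training-time view where `z_0` is known; during sampling, `z_0` would instead be estimated from the model's logits. The helpers `interpolate` and `augment_masked` and the toy vectors are hypothetical and not taken from the CADD codebase.

```python
import random
from typing import List

Vector = List[float]


def interpolate(z_0: Vector, noise: Vector, t: float) -> Vector:
    """Point on the linear path between clean embedding z_0 (t=0) and noise (t=1).

    Assumed convention (hypothetical): z_t = (1 - t) * z_0 + t * noise.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(z_0, noise)]


def augment_masked(
    token_embeddings: List[Vector],
    is_masked: List[bool],
    z_continuous: List[Vector],
) -> List[Vector]:
    """Add z_continuous element-wise to masked positions only.

    Unmasked positions keep their original token embedding unchanged.
    """
    out = []
    for emb, masked, z in zip(token_embeddings, is_masked, z_continuous):
        if masked:
            out.append([e + c for e, c in zip(emb, z)])
        else:
            out.append(list(emb))
    return out


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    mask_embedding = [0.0] * dim                           # toy [MASK] embedding
    clean = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]   # toy clean embeddings z_0
    noise = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _z in clean]

    t = 0.75  # close to the noise end of the path
    z_t = [interpolate(z0, eps, t) for z0, eps in zip(clean, noise)]

    # Position 0 is still masked; position 1 has already been unmasked.
    embeddings = [mask_embedding, clean[1]]
    print(augment_masked(embeddings, [True, False], z_t))
```

Because only masked positions receive the continuous signal, already-decoded tokens are untouched, which is what lets the augmentation sit alongside any discrete unmasking rule.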

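#### Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "apple/CADD-Base-7B"
# trust_remote_code=True is required: the model ships custom modeling and generation code
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

prompt = "def fibonacci(n):\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

output = model.diffusion_generate(
    input_ids,
    max_new_tokens=512,             # number of new (initially masked) tokens
    steps=512,                      # diffusion steps; equal to max_new_tokens here
    temperature=0.1,                # low temperature: near-greedy token sampling
    alg="entropy",                  # unmasking strategy (see the table below)
    alg_temp=0.0,                   # 0.0: select positions to unmask deterministically
    use_cadd=True,                  # enable the continuous CADD augmentation
    cadd_sampling_mode="weighted",  # how z_0 is estimated from logits
)

# Decode the first (and only) sequence in the batch
print(tokenizer.decode(output[0], skip_special_tokens=True))
```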
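
The usage example unmasks positions with `alg="entropy"`. As a hedged illustration of confidence-based unmasking, the sketch below scores each still-masked position from its predicted distribution and unmasks the highest-scoring ones first (the deterministic choice assumed for `alg_temp=0.0`). It assumes the options behave like the confidence scores in Dream's generation utilities, which this model reuses: `"maskgit_plus"` uses the top-token probability, `"topk_margin"` the gap between the two largest probabilities, and `"entropy"` the negative entropy. `"origin"` (random unmasking on a schedule) is omitted. The helper names and the fixed `k` are hypothetical; in the real sampler, the number of tokens unmasked per step follows the step schedule.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs: List[float], alg: str) -> float:
    """Score one masked position; a higher score means it is unmasked sooner."""
    if alg == "maskgit_plus":
        return max(probs)                                  # probability of the top token
    if alg == "topk_margin":
        top1, top2 = sorted(probs, reverse=True)[:2]
        return top1 - top2                                 # gap between the two best tokens
    if alg == "entropy":
        eps = 1e-10
        return sum(p * math.log(p + eps) for p in probs)   # negative entropy
    raise ValueError(f"unsupported alg: {alg}")


def pick_positions(masked_logits: Dict[int, List[float]], alg: str, k: int) -> List[int]:
    """Return the k masked positions with the highest confidence scores."""
    scores = {pos: confidence(softmax(lg), alg) for pos, lg in masked_logits.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy logits over a 3-token vocabulary for three masked positions.
    masked_logits = {
        3: [4.0, 0.0, 0.0],  # sharply peaked: most confident
        5: [1.0, 1.0, 0.0],  # two-way tie: zero margin
        7: [0.0, 0.0, 0.0],  # uniform: highest entropy
    }
    for alg in ("maskgit_plus", "topk_margin", "entropy"):
        print(alg, pick_positions(masked_logits, alg, k=1))
```

The three scores agree on clear-cut cases like position 3 but rank ambiguous positions differently: the margin score treats a two-way tie as maximally uncertain, while entropy also accounts for how much probability is spread over the rest of the vocabulary.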

#### CADD Sampling Parameters

| Parameter | Type | Default | Description |
|:---|:---:|:---:|:---|
| `use_cadd` | bool | `True` | Enable CADD continuous augmentation |
| `cadd_sampling_mode` | str | `"argmax"` | How to estimate `z_0` from logits: `"weighted"` or `"argmax"` |
| `alg` | str | `"origin"` | Unmasking strategy: `"entropy"`, `"origin"`, `"maskgit_plus"`, `"topk_margin"` |
| `temperature` | float | `1.0` | Sampling temperature for token prediction |
| `steps` | int | `512` | Number of diffusion steps |
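
For `cadd_sampling_mode`, one plausible reading (an assumption; the table only says both modes estimate `z_0` from logits) is that `"argmax"` uses the embedding of the single most likely token, while `"weighted"` uses the probability-weighted average of all token embeddings under the softmax of the logits. The plain-Python sketch below uses a hypothetical toy embedding table and a hypothetical `estimate_z0` helper; it is not the model's API.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_z0(logits: List[float], embedding_table: List[Vector], mode: str) -> Vector:
    """Estimate the clean embedding z_0 for one masked position (assumed semantics)."""
    if mode == "argmax":
        # Embedding of the single most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return list(embedding_table[best])
    if mode == "weighted":
        # Probability-weighted average of all token embeddings.
        probs = softmax(logits)
        dim = len(embedding_table[0])
        return [sum(p * emb[d] for p, emb in zip(probs, embedding_table)) for d in range(dim)]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy 3-token vocabulary, 2-d embeddings
    logits = [2.0, 1.0, -1.0]
    print("argmax  :", estimate_z0(logits, table, "argmax"))  # [1.0, 0.0]
    print("weighted:", estimate_z0(logits, table, "weighted"))
```

Under this reading, the weighted estimate blends candidate embeddings when the model is uncertain, whereas argmax commits to one token; when the prediction is sharply peaked, the two estimates nearly coincide.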

#### More details

- Paper: [Continuously Augmented Discrete Diffusion Model for Categorical Generative Modeling](https://arxiv.org/abs/2510.01329) (ICLR 2026)
- GitHub: [apple/ml-CADD](https://github.com/apple/ml-CADD)

#### Citation

```bibtex
@article{zheng2025continuously,
  title={Continuously augmented discrete diffusion model for categorical generative modeling},
  author={Zheng, Huangjie and Gong, Shansan and Zhang, Ruixiang and Chen, Tianrong and Gu, Jiatao and Zhou, Mingyuan and Jaitly, Navdeep and Zhang, Yizhe},
  journal={arXiv preprint arXiv:2510.01329},
  year={2025}
}
```

#### Acknowledgment

For this Hugging Face release, we build on and improve [DiffuCoder](https://github.com/apple/ml-diffucoder), reusing the modeling architecture and generation utilities of [Dream](https://huggingface.co/Dream-org/Dream-v0-Base-7B).