---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---

# WeDLM-7B

**WeDLM-7B** is a diffusion language model that performs parallel decoding under standard causal attention, initialized from [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).

This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-7B-Instruct](https://huggingface.co/tencent/WeDLM-7B-Instruct).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))

print(outputs[0]["text"])
```

## HuggingFace Transformers

For **training** or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```

> ⚠️ **Note:** The HuggingFace interface is provided for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen2.5-7B | WeDLM-7B |
|:----------|:----------:|:--------:|
| ARC-C (0-shot) | 89.93 | 90.70 |
| GSM8K (3-shot) | 79.23 | 84.76 |
| MATH (4-shot) | 43.40 | 48.20 |
| HumanEval (4-shot) | 59.14 | 68.90 |
| MMLU (5-shot) | 71.62 | 71.93 |
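
Reading the table as point deltas makes the pattern easier to see: the largest gains over the Qwen2.5-7B base come on generation-heavy benchmarks (HumanEval, GSM8K), while knowledge-recall benchmarks (MMLU, ARC-C) are essentially unchanged. A quick sketch using the scores above:

```python
# Score pairs copied from the table above: (Qwen2.5-7B base, WeDLM-7B).
scores = {
    "ARC-C": (89.93, 90.70),
    "GSM8K": (79.23, 84.76),
    "MATH": (43.40, 48.20),
    "HumanEval": (59.14, 68.90),
    "MMLU": (71.62, 71.93),
}

# Absolute improvement of WeDLM-7B over the base model, in points.
deltas = {name: round(wedlm - base, 2) for name, (base, wedlm) in scores.items()}

for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: +{delta:.2f}")
```

HumanEval shows the largest absolute gain (+9.76 points) and MMLU the smallest (+0.31).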

## Citation

```bibtex
@article{liu2025wedlm,
  title={WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference},
  author={Liu, Aiwei and He, Minghua and Zeng, Shaoxun and Zhang, Linhao and Wu, Chuhan and Jia, Wei and Liu, Yuan and Yu, Yang and Zhou, Xiao and Zhou, Jie},
  year={2025}
}
```

## License

Apache 2.0