---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---

# WeDLM-7B

**WeDLM-7B** is a diffusion language model that performs parallel decoding under standard causal attention, initialized from [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).

This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-7B-Instruct](https://huggingface.co/tencent/WeDLM-7B-Instruct).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))

print(outputs[0]["text"])
```

## HuggingFace Transformers

For **training** or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```

> ⚠️ **Note:** The HuggingFace interface is provided for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen2.5-7B | WeDLM-7B |
|:----------|:----------:|:--------:|
| ARC-C (0-shot) | 89.93 | 90.70 |
| GSM8K (3-shot) | 79.23 | 84.76 |
| MATH (4-shot) | 43.40 | 48.20 |
| HumanEval (4-shot) | 59.14 | 68.90 |
| MMLU (5-shot) | 71.62 | 71.93 |
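
Reading the table as point deltas makes the pattern easier to see: the largest gains over the Qwen2.5-7B base come on generation-heavy benchmarks (HumanEval, GSM8K), while knowledge-recall benchmarks (MMLU, ARC-C) are essentially unchanged. A quick sketch using the scores above:

```python
# Score pairs copied from the table above: (Qwen2.5-7B base, WeDLM-7B).
scores = {
    "ARC-C": (89.93, 90.70),
    "GSM8K": (79.23, 84.76),
    "MATH": (43.40, 48.20),
    "HumanEval": (59.14, 68.90),
    "MMLU": (71.62, 71.93),
}

# Absolute improvement of WeDLM-7B over the base model, in points.
deltas = {name: round(wedlm - base, 2) for name, (base, wedlm) in scores.items()}

for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: +{delta:.2f}")
```

HumanEval shows the largest absolute gain (+9.76 points) and MMLU the smallest (+0.31).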

## Citation

```bibtex
@article{liu2025wedlm,
  title={WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference},
  author={Liu, Aiwei and He, Minghua and Zeng, Shaoxun and Zhang, Linhao and Wu, Chuhan and Jia, Wei and Liu, Yuan and Yu, Yang and Zhou, Xiao and Zhou, Jie},
  year={2025}
}
```

## License

Apache 2.0