---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---

# WeDLM-7B

**WeDLM-7B** is a diffusion language model that performs parallel decoding under standard causal attention, initialized from [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).

This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-7B-Instruct](https://huggingface.co/tencent/WeDLM-7B-Instruct).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))

print(outputs[0]["text"])
```
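
The `generate` call above already accepts a list of prompts, so batching several prompts into one call is a natural extension. The sketch below is a minimal, untested example built only from the interface shown above; it assumes each returned item is a dict with a `"text"` field, mirroring the single-prompt snippet:

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

# Several prompts in a single generate() call; each result is assumed to
# mirror the single-prompt example above (one dict with a "text" field per prompt).
prompts = [
    "The theory of relativity states that",
    "The capital of France is",
]
params = SamplingParams(temperature=0.7, max_tokens=256)

for prompt, output in zip(prompts, llm.generate(prompts, params)):
    print(f"=== {prompt}\n{output['text']}\n")
```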

## HuggingFace Transformers

For **training** or simple forward passes, you can load via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```
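
Since the Transformers path is positioned as the training route, here is a minimal sketch of computing a causal-LM loss with it. This assumes the remote-code model follows the standard `AutoModelForCausalLM` convention of returning `loss` when `labels` are passed; whether that matches WeDLM's actual (diffusion) training objective is not claimed here:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

batch = tokenizer(
    "The theory of relativity states that energy and mass are related.",
    return_tensors="pt",
).to(model.device)

# Assumption: as with standard HF causal LMs, passing labels (the input ids
# themselves, shifted internally) makes the forward pass return a loss.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)
```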

> ⚠️ **Note:** The HuggingFace interface is for training/forward pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen2.5-7B | WeDLM-7B |
|:----------|:----------:|:--------:|
| ARC-C (0-shot) | 89.93 | 90.70 |
| GSM8K (3-shot) | 79.23 | 84.76 |
| MATH (4-shot) | 43.40 | 48.20 |
| HumanEval (4-shot) | 59.14 | 68.90 |
| MMLU (5-shot) | 71.62 | 71.93 |

## Citation

```bibtex
@article{liu2025wedlm,
  title={WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference},
  author={Liu, Aiwei and He, Minghua and Zeng, Shaoxun and Zhang, Linhao and Wu, Chuhan and Jia, Wei and Liu, Yuan and Yu, Yang and Zhou, Xiao and Zhou, Jie},
  year={2025}
}
```

## License

Apache 2.0