---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen3-8B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---
# WeDLM-8B
**WeDLM-8B** is a diffusion language model, initialized from [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), that performs parallel decoding under standard causal attention.
This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct).
πŸ“„ Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | πŸ’» [GitHub](https://github.com/tencent/WeDLM)
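To give a feel for what "parallel decoding" means here: a diffusion-style LM starts from masked positions and commits several tokens per step once it is confident about them, instead of emitting exactly one token per step left-to-right. The toy below is illustrative only; `toy_scores` is a hand-written stand-in for a real model's per-position confidences, and the thresholding rule is an assumption for the sketch, not WeDLM's actual algorithm.

```python
# Toy sketch of confidence-based parallel decoding (illustration only,
# NOT WeDLM's algorithm). `toy_scores` stands in for a real model's
# per-position predictions and confidences.

MASK = None

def toy_scores(seq, target):
    """Pretend model: for each masked position, return (predicted_token, confidence).
    Confidence is higher for 'easy' tokens (vowels, here) and for positions
    whose neighbors are already filled in."""
    out = {}
    for i, tok in enumerate(seq):
        if tok is not MASK:
            continue
        known = sum(seq[j] is not MASK for j in (i - 1, i + 1) if 0 <= j < len(seq))
        base = 0.8 if target[i] in "aeiou" else 0.55
        out[i] = (target[i], base + 0.2 * known)
    return out

def parallel_decode(length, target, threshold=0.7):
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        scores = toy_scores(seq, target)
        # Commit every position whose confidence clears the threshold;
        # if none does, commit the single most confident one so we still progress.
        confident = [i for i, (_, c) in scores.items() if c >= threshold]
        if not confident:
            confident = [max(scores, key=lambda i: scores[i][1])]
        for i in confident:
            seq[i] = scores[i][0]
        steps += 1
    return seq, steps

target = list("parallel")
seq, steps = parallel_decode(len(target), target)
print("".join(seq), "decoded in", steps, "steps vs", len(target), "for one token at a time")
```

In this toy run the "easy" tokens are committed in the first step and the rest in the second, so an 8-token sequence finishes in 2 steps instead of 8; this per-step multi-token commit is the source of the throughput gain that the `wedlm` engine exploits.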
## Model Details
| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) |
| Parameters | 8B |
| Context Length | 32,768 |
## Quick Start (Recommended)
For **fast inference**, use the `wedlm` engine:
```bash
pip install git+https://github.com/tencent/WeDLM.git
```
```python
from wedlm import LLM, SamplingParams
llm = LLM(model="tencent/WeDLM-8B")
prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
print(outputs[0]["text"])
```
## HuggingFace Transformers
For **training** or simple forward passes, you can load via Transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-8B",
    trust_remote_code=True,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers on available devices automatically
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)  # single forward pass; token logits in outputs.logits
```
> ⚠️ **Note:** The HuggingFace interface is for training/forward pass convenience. For optimized inference throughput, use the `wedlm` engine above.
## Performance
| Benchmark | Qwen3-8B | WeDLM-8B |
|:----------|:--------:|:--------:|
| ARC-C (0-shot) | 92.66 | **92.92** |
| GSM8K (3-shot) | 85.97 | **90.20** |
| MATH (4-shot) | 50.80 | **53.60** |
| HumanEval (4-shot) | 68.90 | **75.00** |
| MMLU (5-shot) | 74.03 | **75.46** |
| **Average** | 72.61 | **74.72** |
## Citation
Coming soon.
## License
Apache 2.0