---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---

# WeDLM-7B

**WeDLM-7B** is a diffusion language model that performs parallel decoding under standard causal attention, initialized from [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).

This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-7B-Instruct](https://huggingface.co/tencent/WeDLM-7B-Instruct).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")
prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0]["text"])
```

## HuggingFace Transformers

For **training** or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```

> ⚠️ **Note:** The HuggingFace interface is intended for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.
## Performance

| Benchmark | Qwen2.5-7B | WeDLM-7B |
|:----------|:----------:|:--------:|
| ARC-C (0-shot) | 89.93 | 90.70 |
| GSM8K (3-shot) | 79.23 | 84.76 |
| MATH (4-shot) | 43.40 | 48.20 |
| HumanEval (4-shot) | 59.14 | 68.90 |
| MMLU (5-shot) | 71.62 | 71.93 |

## Citation

```bibtex
@article{liu2025wedlm,
  title={WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference},
  author={Liu, Aiwei and He, Minghua and Zeng, Shaoxun and Zhang, Linhao and Wu, Chuhan and Jia, Wei and Liu, Yuan and Yu, Yang and Zhou, Xiao and Zhou, Jie},
  year={2025}
}
```

## License

Apache 2.0