---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
---

# WeDLM-7B

**WeDLM-7B** is a diffusion language model that performs parallel decoding under standard causal attention, initialized from [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).

This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-7B-Instruct](https://huggingface.co/tencent/WeDLM-7B-Instruct).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Initialized From | [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))

print(outputs[0]["text"])
```
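
Since `generate` already takes a list of prompts, batching several prompts in one call is a natural extension of the example above. A minimal sketch, assuming the engine returns one result dict per prompt, in order, with the generated text under the `"text"` key (as shown above):

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

# Several prompts submitted in a single call.
prompts = [
    "The theory of relativity states that",
    "In machine learning, attention is",
]
params = SamplingParams(temperature=0.7, max_tokens=128)

# Assumed: generate() returns one result dict per prompt, in order,
# mirroring the single-prompt example above.
for prompt, out in zip(prompts, llm.generate(prompts, params)):
    print(f"=== {prompt}")
    print(out["text"])
```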

## HuggingFace Transformers

For **training** or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```
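
For training, the usual Transformers pattern is to pass `labels` and read the returned loss. A minimal sketch, assuming the remote-code model follows the standard causal-LM output interface (`.logits` always, `.loss` when `labels` are supplied); WeDLM's own training objective may differ from plain next-token prediction, so treat this as an interface illustration:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)

# Assumed: passing labels yields a scalar .loss, as with standard HF causal LMs.
outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss)          # scalar loss (backprop-ready for fine-tuning)
print(outputs.logits.shape)  # (batch_size, seq_len, vocab_size)
```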

> ⚠️ **Note:** The HuggingFace interface is for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen2.5-7B | WeDLM-7B |
|:----------|:----------:|:--------:|
| ARC-C (0-shot) | 89.93 | 90.70 |
| GSM8K (3-shot) | 79.23 | 84.76 |
| MATH (4-shot) | 43.40 | 48.20 |
| HumanEval (4-shot) | 59.14 | 68.90 |
| MMLU (5-shot) | 71.62 | 71.93 |

## Citation

```bibtex
@article{liu2025wedlm,
  title={WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference},
  author={Liu, Aiwei and He, Minghua and Zeng, Shaoxun and Zhang, Linhao and Wu, Chuhan and Jia, Wei and Liu, Yuan and Yu, Yang and Zhou, Xiao and Zhou, Jie},
  year={2025}
}
```

## License

Apache 2.0