---
license: apache-2.0
language:
- en
- zh
base_model: tencent/WeDLM-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
- chat
- instruct
---

# WeDLM-7B-Instruct

**WeDLM-7B-Instruct** is an instruction-tuned diffusion language model that performs parallel decoding under standard causal attention, fine-tuned from [WeDLM-7B](https://huggingface.co/tencent/WeDLM-7B).

For the base (pretrained) version, see [WeDLM-7B](https://huggingface.co/tencent/WeDLM-7B).

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Base Model | [WeDLM-7B](https://huggingface.co/tencent/WeDLM-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from transformers import AutoTokenizer
from wedlm import LLM, SamplingParams

# Load the inference engine and the matching tokenizer.
llm = LLM(model="tencent/WeDLM-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)

# Build a single-turn chat prompt with the model's chat template.
prompt = "Explain the difference between machine learning and deep learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], SamplingParams(temperature=0.3, max_tokens=512))
print(outputs[0]["text"])
```
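
`llm.generate` already takes a list of prompts, so several requests can be decoded in one call. Below is a minimal batching sketch that reuses `llm` and `tokenizer` from the snippet above; it assumes the engine returns one output dict per prompt, in input order, as in the single-prompt example.

```python
# Minimal batching sketch (assumption: llm.generate returns one output
# dict per prompt, in input order, as in the single-prompt example above).
questions = [
    "What is a diffusion language model?",
    "Summarize parallel decoding in one sentence.",
]
texts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}], tokenize=False, add_generation_prompt=True
    )
    for q in questions
]
outputs = llm.generate(texts, SamplingParams(temperature=0.3, max_tokens=256))
for question, out in zip(questions, outputs):
    print(question, "->", out["text"])
```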

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language known for its simplicity and readability."},
    {"role": "user", "content": "Show me a hello world example."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.3, max_tokens=256))
print(outputs[0]["text"])
```
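
To keep the conversation going, append the generated reply as an `assistant` turn and generate again. A short sketch, assuming `outputs[0]["text"]` holds the reply as in the single-turn example:

```python
# Sketch: feed the reply back as an assistant turn, then ask a follow-up.
# Assumes outputs[0]["text"] holds the generated text, as shown above.
messages.append({"role": "assistant", "content": outputs[0]["text"]})
messages.append({"role": "user", "content": "Can you add a comment to that example?"})
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.3, max_tokens=256))
print(outputs[0]["text"])
```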

## HuggingFace Transformers

For **training** or simple forward passes:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B-Instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Build a chat-formatted prompt and run a single forward pass.
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)
```
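
The forward pass above returns raw model outputs. Here is a small sketch for inspecting them, under the assumption that the remote-code model follows the usual Hugging Face causal-LM convention of exposing a `logits` tensor of shape `(batch, sequence_length, vocab_size)`; check the WeDLM repository if its output type differs.

```python
import torch

# Assumption: the remote-code model exposes standard causal-LM style
# outputs with a `logits` tensor; verify against the WeDLM repository.
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)  # expected: (batch, sequence_length, vocab_size)
```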

> ⚠️ **Note:** The HuggingFace interface is for training/forward pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen2.5-7B-Instruct | WeDLM-7B-Instruct |
|:----------|:-------------------:|:-----------------:|
| ARC-C (0-shot) | 86.09 | 89.59 |
| GSM8K (3-shot) | 89.91 | 87.57 |
| MATH (4-shot) | 45.00 | 55.40 |
| HumanEval (4-shot) | 76.22 | 75.00 |
| MMLU (5-shot) | 71.98 | 70.52 |

## Citation (Coming soon)

## License

Apache 2.0