feedseawave committed
Commit c2c2eee · verified · 1 Parent(s): 681a30b

Upload 3 files
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ WeDLM-8B-Instruct-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ WeDLM-8B-Instruct-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,112 @@
---
license: apache-2.0
language:
- en
- zh
base_model: tencent/WeDLM-8B-Instruct
tags:
- gguf
- llama-cpp
- wedlm
- tencent
- qwen3
- quantized
library_name: gguf
pipeline_tag: text-generation
---

# WeDLM-8B-Instruct-GGUF

**First GGUF quantization of Tencent's WeDLM-8B-Instruct.**

Quantized with llama.cpp release b7688.

Original model: [tencent/WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct)

## About

WeDLM is an 8B-parameter instruction-tuned model from Tencent that supports English and Chinese. Its attention uses QK Norm, similar to Qwen3.

This GGUF uses the `qwen3` architecture identifier for maximum llama.cpp compatibility.

## Available Files

| Filename | Quant | Size (GiB) | Description |
|----------|-------|------------|-------------|
| WeDLM-8B-Instruct-Q4_K_M.gguf | Q4_K_M | 4.68 | Good quality; recommended for most use cases |
| WeDLM-8B-Instruct-Q8_0.gguf | Q8_0 | 8.11 | High quality; best accuracy |

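The sizes above correspond to the raw byte counts in the Git LFS pointers at the bottom of this commit, interpreted as binary gigabytes (GiB); a quick check with awk:

```shell
# Convert the raw LFS byte counts to GiB (1 GiB = 2^30 bytes)
for bytes in 5027782208 8709516864; do
  awk -v b="$bytes" 'BEGIN { printf "%.2f GiB\n", b / (1024 ^ 3) }'
done
# prints:
# 4.68 GiB
# 8.11 GiB
```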
## Performance Benchmarks

### CPU (16 threads, Zen 4)

| Quant | Prompt Processing | Text Generation |
|-------|-------------------|-----------------|
| Q4_K_M | 88.65 t/s | 8.27 t/s |
| Q8_0 | 50.80 t/s | 5.17 t/s |

### GPU (RTX 4060 Laptop, 8 GB VRAM)

| Quant | Prompt Processing | Text Generation |
|-------|-------------------|-----------------|
| Q4_K_M | **1833.84 t/s** | **37.08 t/s** |

*Q4_K_M is recommended for the RTX 4060: it fits in 8 GB of VRAM.*

## Prompt Format (ChatML)

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```

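For scripted use, the template can be assembled with `printf` (a minimal sketch; the system and user strings are placeholders):

```shell
# Build a ChatML prompt; the open <|im_start|>assistant tag leaves room for the reply
system="You are a helpful AI assistant."
user="Hello!"
# Note: command substitution strips the trailing newline after the assistant tag
prompt=$(printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
  "$system" "$user")
printf '%s\n' "$prompt"
```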
## Usage

### llama.cpp

```bash
./llama-cli -m WeDLM-8B-Instruct-Q4_K_M.gguf \
  -p "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n" \
  -n 256 -ngl 99
```

### Ollama

```bash
# Create a Modelfile; the stop parameter keeps generation from running past the turn
cat > Modelfile << 'EOF'
FROM ./WeDLM-8B-Instruct-Q4_K_M.gguf
TEMPLATE "<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n"
PARAMETER stop "<|im_end|>"
EOF

ollama create wedlm -f Modelfile
ollama run wedlm
```

## Hardware Requirements

| Quant | Min VRAM | Recommended system RAM |
|-------|----------|------------------------|
| Q4_K_M | 6 GB | 8 GB |
| Q8_0 | 10 GB | 12 GB |

## Model Architecture

- Parameters: 8.19B
- Layers: 36
- Hidden size: 4096
- Attention heads: 32 (8 KV heads, GQA)
- Context length: 16,384
- Features: QK Norm, SwiGLU, RoPE (theta = 1M)

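These figures also allow a sanity check of the quantization density and the attention-cache memory cost. A back-of-envelope sketch, assuming an fp16 KV cache (2 bytes per value) and head_dim = hidden size / attention heads = 128:

```shell
# Effective bits per weight for Q4_K_M: file bytes * 8 / parameter count
awk 'BEGIN { printf "Q4_K_M: %.2f bits/weight\n", 5027782208 * 8 / 8.19e9 }'
# prints: Q4_K_M: 4.91 bits/weight

# Full-context fp16 KV cache: 2 (K+V) * layers * kv_heads * head_dim * context * 2 bytes
awk 'BEGIN {
  head_dim = 4096 / 32
  bytes = 2 * 36 * 8 * head_dim * 16384 * 2
  printf "KV cache @ 16384 ctx: %.2f GiB\n", bytes / (1024 ^ 3)
}'
# prints: KV cache @ 16384 ctx: 2.25 GiB
```

So Q4_K_M (4.68 GiB) plus a full 16K-token KV cache (2.25 GiB) lands around 7 GiB, consistent with the 8 GB VRAM recommendation above.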
## Acknowledgements

- Original model: [Tencent WeDLM Team](https://huggingface.co/tencent)
- Inference framework: [llama.cpp](https://github.com/ggml-org/llama.cpp)

## Disclaimer

This is an unofficial quantization. For official support, please refer to the original model repository.
WeDLM-8B-Instruct-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5c8b938dab334f03b9184e68f6466736153adb9d98b3f43119ee8c51852e1975
size 5027782208
WeDLM-8B-Instruct-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d75abe66fd2c3f980d2090efc6bf74013eeca7b99322b00fbd3b8e65cdbef239
size 8709516864