g023 committed on
Commit 156c198 · verified · 1 Parent(s): cd8aa79

Upload 6 files

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen3-g023-tiny-v2-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-g023-tiny-v2-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-g023-tiny-v2-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-g023-tiny-v2-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-g023-tiny-v2-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-g023-tiny-v2-Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93c5f34612e203562c15ed55059ef9b91d0e4ebe9747d79ea2478cf479876c5c
+ size 814695424
Qwen3-g023-tiny-v2-Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e18787fb7ed95c5201f1c42709d1d1c1a6bbc540c3a709750cf44bed305d75e
+ size 987841536
Qwen3-g023-tiny-v2-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b225c0fd5eae542b24e60c742f106c0ee7353df8f85ef0d06d8b22282db5ccda
+ size 1164067840
Qwen3-g023-tiny-v2-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1d73d2e9396ce9a7a85adddcf63861be3df4aa382a7aae2d46064c834259d24
+ size 1500365824
Qwen3-g023-tiny-v2-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fb5fb8b5b6a3d9308dc522c4b8fa8e80fa861212f9d543cf929ba0fbb6feb18
+ size 1941416960
README.md CHANGED
@@ -1,3 +1,212 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model: Qwen/Qwen3-1.7B
+ tags:
+ - qwen3
+ - gguf
+ - layer-surgery
+ - small-language-model
+ - optimized
+ - thinking
+ - text-generation
+ - skip-connections
+ - interpolation
+ model_name: Qwen3-g023-tiny-v2
+ pipeline_tag: text-generation
+ library_name: llama.cpp
+ quantized_by: g023
+ ---
+
+ # Qwen3-g023-tiny-v2 (GGUF)
+
+ **An advanced 30-layer Qwen3 variant built with swap, interpolation, and skip-bridge layer surgery.**
+
+ Created through layer surgery combining multi-swap, interpolation, and bridge (skip-connection) techniques. Scores **94.3/100** on the author's in-house benchmark, a 6.5-point improvement over the original Qwen3-1.7B baseline (87.8/100) and the highest score achieved across ~250 configurations tested in two phases of experimentation. (These are my own benchmarks, so results may vary if you run your own tests.)
+ ## Available Quantizations
+
+ | Quantization | Bits/Weight | Description | Download |
+ |:---:|:---:|:---|:---:|
+ | **Q8_0** | 8.00 | Highest quality, virtually lossless | [Qwen3-g023-tiny-v2-Q8_0.gguf](./Qwen3-g023-tiny-v2-Q8_0.gguf) |
+ | **Q6_K** | 6.57 | Excellent quality, good compression | [Qwen3-g023-tiny-v2-Q6_K.gguf](./Qwen3-g023-tiny-v2-Q6_K.gguf) |
+ | **Q4_K_M** | 4.85 | Good balance of quality and size | [Qwen3-g023-tiny-v2-Q4_K_M.gguf](./Qwen3-g023-tiny-v2-Q4_K_M.gguf) |
+ | **Q3_K_M** | 3.91 | High compression, moderate quality loss | [Qwen3-g023-tiny-v2-Q3_K_M.gguf](./Qwen3-g023-tiny-v2-Q3_K_M.gguf) |
+ | **Q2_K** | 3.35 | Maximum compression, significant quality loss | [Qwen3-g023-tiny-v2-Q2_K.gguf](./Qwen3-g023-tiny-v2-Q2_K.gguf) |
+
+ ## Model Details
+
+ | Parameter | Value |
+ |:---|:---|
+ | Architecture | Qwen3ForCausalLM |
+ | Layers | **30** (28 original + 2 from surgery) |
+ | Hidden Size | 2,048 |
+ | Intermediate Size | 6,144 |
+ | Attention Heads | 16 query / 8 key-value (GQA) |
+ | Head Dimension | 128 |
+ | Vocabulary | 151,936 tokens |
+ | Max Context | 40,960 tokens |
+ | RoPE θ | 1,000,000 |
+ | Tied Embeddings | Yes |
+ | Total Parameters | **~1.82B** |
+ | Precision (source) | bfloat16 |
+
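The ~1.82B figure can be sanity-checked from the dimensions in the table. A minimal sketch, assuming the standard Qwen3 layer layout (tied embedding/LM head, GQA q/k/v/o projections, gated MLP, RMSNorms including the per-head q/k norms, no biases):

```python
# Weights-only parameter estimate from the model-card dimensions.
vocab, hidden, inter = 151_936, 2_048, 6_144
n_layers, n_q_heads, n_kv_heads, head_dim = 30, 16, 8, 128

embed = vocab * hidden                                    # tied with LM head
attn = (hidden * n_q_heads * head_dim                     # q projection
        + 2 * hidden * n_kv_heads * head_dim              # k and v projections
        + n_q_heads * head_dim * hidden)                  # o projection
mlp = 3 * hidden * inter                                  # gate, up, down
norms = 2 * hidden + 2 * head_dim                         # layer norms + q/k norms
total = embed + n_layers * (attn + mlp + norms) + hidden  # + final norm

print(f"~{total / 1e9:.2f}B parameters")
```

The result lands at ~1.82B, matching the table.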
+ ## Surgery Operations
+
+ This model was created by applying three surgical operations to [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B):
+
+ 1. **Multi-swap: layers 12↔13 and 16↔17** – Reorders attention blocks at two critical points in the network for improved representational flow through the mid-layers.
+ 2. **Interpolation: layers 20 & 22 (α=0.5)** – Creates a new layer by blending the weights of layers 20 and 22 in equal proportion, producing a smoother transition in the upper layers.
+ 3. **Bridge (skip connection): layer 5 → after layer 20** – Copies the early layer 5 block and inserts it after layer 20, creating a skip connection that helps preserve low-level features deep in the network.
+
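The three operations can be sketched on a stand-in layer list. Integers stand in for whole decoder-layer weight sets; real surgery would act on the 28 entries of `model.model.layers` in the Qwen3-1.7B checkpoint, and the exact insertion indices here are illustrative (the card does not pin them down):

```python
layers = list(range(28))                       # 28 original decoder layers

# 1. Multi-swap: reorder 12<->13 and 16<->17
layers[12], layers[13] = layers[13], layers[12]
layers[16], layers[17] = layers[17], layers[16]

# 2. Interpolation (alpha = 0.5): per weight tensor, W_new = a*W20 + (1-a)*W22
alpha = 0.5
interp = alpha * layers[20] + (1 - alpha) * layers[22]
layers.insert(21, interp)                      # now 29 layers

# 3. Bridge: insert a copy of (original) layer 5 after layer 20
layers.insert(21, layers[5])                   # now 30 layers

print(len(layers))  # 30, matching the model card
```

With real tensors, the blend in step 2 runs over every parameter in the two layers' state dicts.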
+ ### Why These Operations Work
+
+ - **Multi-swap** corrects suboptimal layer ordering left over from pre-training, allowing better information flow through the network's critical middle section.
+ - **Interpolation** creates a synthetic transition layer that smooths the representation gap between layers 20 and 22, reducing the information bottleneck.
+ - **Bridge/skip connections** address the "forgetting problem" in deep networks by reintroducing early feature representations at later stages – a technique inspired by ResNet's residual connections, but applied at the transformer-layer level.
+
+ ## Benchmark Results
+
+ | Metric | Original (28L) | [v1 (27L)](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF) | **v2 (30L)** | Δ vs Original |
+ |:---|:---:|:---:|:---:|:---:|
+ | **Overall Score** | 87.8 / 100 | 92.9 / 100 | **94.3 / 100** | **+6.5** |
+ | **Factual Accuracy** | 15/17 (88%) | 17/17 (100%) | **16/17 (94%)** | **+6%** |
+ | Avg Perplexity | – | 15.70 | **15.17** | – |
+ | Thinking Mode | ✅ | ✅ | ✅ | – |
+ | Non-Thinking Mode | ✅ | ✅ | ✅ | – |
+
+ Evaluated with an in-house test suite covering 17 factual questions, 2 completion-coherence tests, perplexity measurement, repetition analysis, and thinking/non-thinking mode verification.
+
+ ## Features
+
+ - **Thinking mode**: Full `<think>` / `</think>` reasoning support – toggle via the `enable_thinking` parameter
+ - **Non-thinking mode**: Direct responses without chain-of-thought overhead
+ - **Tool calling**: Full function/tool-calling support
+ - **System prompts**: Standard system-message support
+ - **Chat template**: Qwen3 ChatML template embedded in the GGUF
+
+ ## Usage
+
+ ### With Ollama
+
+ ```bash
+ # Download the GGUF and create from a Modelfile
+ cat > Modelfile << 'EOF'
+ FROM ./Qwen3-g023-tiny-v2-Q4_K_M.gguf
+
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER min_p 0.0
+
+ TEMPLATE """{{- if .System }}
+ <|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}
+ {{- range .Messages }}
+ {{- if eq .Role "user" }}
+ <|im_start|>user
+ {{ .Content }}<|im_end|>
+ {{- else if eq .Role "assistant" }}
+ <|im_start|>assistant
+ {{ .Content }}<|im_end|>
+ {{- end }}
+ {{- end }}
+ <|im_start|>assistant
+ """
+ SYSTEM "You are a helpful assistant."
+ EOF
+
+ ollama create qwen3-tiny-v2 -f Modelfile
+ ollama run qwen3-tiny-v2
+ ```
+
124
+
125
+ ### With llama.cpp
126
+
127
+ ```bash
128
+ # Interactive chat
129
+ llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
130
+ --chat-template chatml -cnv
131
+
132
+ # Thinking mode
133
+ llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
134
+ -p "<|im_start|>user\nExplain quantum computing<|im_end|>\n<|im_start|>assistant\n<think>\n" \
135
+ -n 512
136
+
137
+ # Non-thinking mode
138
+ llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
139
+ -p "<|im_start|>user\n/no_think What is 2+2?<|im_end|>\n<|im_start|>assistant\n" \
140
+ -n 128
141
+ ```
142
+
143
+ ### With Python (llama-cpp-python)
144
+
145
+ ```python
146
+ from llama_cpp import Llama
147
+
148
+ model = Llama("Qwen3-g023-tiny-v2-Q4_K_M.gguf", n_ctx=4096)
149
+ response = model.create_chat_completion(
150
+ messages=[
151
+ {"role": "system", "content": "You are a helpful assistant."},
152
+ {"role": "user", "content": "What is the capital of France?"},
153
+ ],
154
+ temperature=0.6,
155
+ )
156
+ print(response["choices"][0]["message"]["content"])
157
+ ```
158
+
159
+ ## System Requirements
160
+
161
+ | Quantization | RAM (CPU) | VRAM (GPU) |
162
+ |:---:|:---:|:---:|
163
+ | Q8_0 | ~2.2 GB | ~2.2 GB |
164
+ | Q6_K | ~1.8 GB | ~1.8 GB |
165
+ | Q4_K_M | ~1.4 GB | ~1.4 GB |
166
+ | Q3_K_M | ~1.2 GB | ~1.2 GB |
167
+ | Q2_K | ~1.0 GB | ~1.0 GB |
168
+
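For reference, these figures line up to within about 0.1 GB with GGUF file size plus an f16 KV cache at ~2K context. This is a reconstruction, not the author's stated method, and actual memory use grows with context length:

```python
# KV cache: K and V, per layer, per kv-head, per head-dim, per position, f16.
n_layers, n_kv_heads, head_dim = 30, 8, 128
ctx_len, bytes_per_elem = 2_048, 2
kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Exact file sizes (bytes) from the LFS pointers in this commit.
file_bytes = {
    "Q8_0": 1_941_416_960,
    "Q6_K": 1_500_365_824,
    "Q4_K_M": 1_164_067_840,
    "Q3_K_M": 987_841_536,
    "Q2_K": 814_695_424,
}
for name, size in file_bytes.items():
    print(name, round((size + kv_cache) / 1e9, 1), "GB")
```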
+ ## v1 vs v2
+
+ This model (v2) is the **Phase 2 champion**, using multi-operation surgery for the highest overall score.
+
+ | | [v1](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF) | v2 (this model) |
+ |:---|:---:|:---:|
+ | Layers | 27 | 30 |
+ | Parameters | ~1.67B | ~1.82B |
+ | Operations | del + swap | swap + interpolate + bridge |
+ | Score | 92.9 / 100 | 94.3 / 100 |
+ | Factual | 100% (17/17) | 94% (16/17) |
+ | Perplexity | 15.70 | 15.17 |
+ | Use Case | Max factual accuracy | Max overall score |
+
+ **v1** is recommended when factual accuracy is paramount (100% vs 94%).
+ **v2** is recommended when overall quality matters more (94.3 vs 92.9).
+
+ ## Methodology
+
+ Layer surgery was performed through a systematic, test-driven process across two phases:
+
+ 1. **Phase 1** (~150 configs): Exhaustive search across deletion, duplication, swapping, interpolation, and combined operations → champion: del_10 + swap_11↔12 (v1)
+ 2. **Phase 2** (~95 configs): Advanced techniques including tripling, multi-swap, layer reversal, cycling, weight scaling, layer merging, skip bridges, and synthesis → champion: this model (v2)
+ 3. **Evaluation**: Each configuration scored on factual accuracy (17 questions), completion coherence, perplexity, repetition ratio, and thinking-mode functionality
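A composite score over those five signals might look something like the weighted blend below. This is purely illustrative: `overall_score`, the coefficients, and the perplexity scaling are all hypothetical, since the actual scoring code is not published:

```python
def overall_score(factual_pct, coherence_pct, ppl, repetition_ratio, thinking_ok):
    """Hypothetical composite of the five signals named in the methodology."""
    ppl_score = max(0.0, 100.0 - (ppl - 10.0) * 5.0)   # invented scaling
    rep_score = 100.0 * (1.0 - repetition_ratio)
    mode_score = 100.0 if thinking_ok else 0.0
    return (0.4 * factual_pct + 0.2 * coherence_pct
            + 0.2 * ppl_score + 0.1 * rep_score + 0.1 * mode_score)

# Example with v2's published factual % and perplexity
# (coherence and repetition values are invented here):
print(round(overall_score(94, 100, 15.17, 0.05, True), 1))
```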
193
+
194
+ ### Phase 2 Leaderboard (Top 5)
195
+
196
+ | Rank | Configuration | Score | Factual | PPL |
197
+ |:---:|:---|:---:|:---:|:---:|
198
+ | πŸ₯‡ | swap(12↔13,16↔17) + interp(20↔22) + bridge(5β†’20) | **94.3** | 94% | 15.17 |
199
+ | πŸ₯ˆ | swap(12↔13,16↔17) + interp(20↔22) | 93.9 | 94% | 14.74 |
200
+ | πŸ₯‰ | swap(12↔13) + interp(20↔22) + bridge(5β†’20) | 93.4 | 94% | 15.66 |
201
+ | 4 | multi-swap(12↔13,16↔17) | 93.1 | 100% | 14.90 |
202
+ | 5 | Phase 1 champion (del_10 + swap_11↔12) | 92.9 | 100% | 15.70 |
203
+
+ ## Credits
+
+ - **Base model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) by the Qwen team at Alibaba
+ - **Quantization**: llama.cpp
+ - **Surgery**: g023
+
+ ## License
+
+ Apache 2.0 – same as the original Qwen3-1.7B model.