Commit 1db2f0f (verified), committed by g023
Parent(s): 156c198

Update README.md

Files changed (1): README.md (+17 -9)
README.md CHANGED
@@ -29,7 +29,7 @@ Created through innovative layer surgery combining multi-swap, interpolation, an
 
 | Quantization | Bits/Weight | Description | Download |
 |:---:|:---:|:---|:---:|
-| **Q8_0** | 8.00 | Highest quality, virtually lossless | [Qwen3-g023-tiny-v2-Q8_0.gguf](./Qwen3-g023-tiny-v2-Q8_0.gguf) |
+| **Q8_0** | 8.00 | Highest quality, virtually lossless (USE THIS ONE) | [Qwen3-g023-tiny-v2-Q8_0.gguf](./Qwen3-g023-tiny-v2-Q8_0.gguf) |
 | **Q6_K** | 6.57 | Excellent quality, good compression | [Qwen3-g023-tiny-v2-Q6_K.gguf](./Qwen3-g023-tiny-v2-Q6_K.gguf) |
 | **Q4_K_M** | 4.85 | Good balance of quality and size | [Qwen3-g023-tiny-v2-Q4_K_M.gguf](./Qwen3-g023-tiny-v2-Q4_K_M.gguf) |
 | **Q3_K_M** | 3.91 | High compression, moderate quality loss | [Qwen3-g023-tiny-v2-Q3_K_M.gguf](./Qwen3-g023-tiny-v2-Q3_K_M.gguf) |
@@ -93,12 +93,20 @@ Evaluated using a comprehensive test suite with 17 factual questions, 2 completi
 ```bash
 # Download the GGUF and create from Modelfile
 cat > Modelfile << 'EOF'
-FROM ./Qwen3-g023-tiny-v2-Q4_K_M.gguf
+FROM ./Qwen3-g023-tiny-v2-Q8_0.gguf
 
-PARAMETER temperature 0.6
+PARAMETER temperature 1.0
 PARAMETER top_p 0.95
-PARAMETER top_k 20
-PARAMETER min_p 0.0
+PARAMETER top_k 45
+PARAMETER min_p 0.1
+PARAMETER num_ctx 40000
+PARAMETER mirostat 2
+PARAMETER mirostat_tau 5.0
+PARAMETER mirostat_eta 0.1
+PARAMETER repeat_last_n 16384
+PARAMETER repeat_penalty 1.1
+PARAMETER presence_penalty 0.5
+PARAMETER frequency_penalty 1.0
 
 TEMPLATE """{{- if .System }}
 <|im_start|>system
@@ -126,16 +134,16 @@ ollama run qwen3-tiny-v2
 
 ```bash
 # Interactive chat
-llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
+llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
   --chat-template chatml -cnv
 
 # Thinking mode
-llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
+llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
   -p "<|im_start|>user\nExplain quantum computing<|im_end|>\n<|im_start|>assistant\n<think>\n" \
   -n 512
 
 # Non-thinking mode
-llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
+llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
   -p "<|im_start|>user\n/no_think What is 2+2?<|im_end|>\n<|im_start|>assistant\n" \
   -n 128
 ```
@@ -145,7 +153,7 @@ llama-cli -m Qwen3-g023-tiny-v2-Q4_K_M.gguf \
 ```python
 from llama_cpp import Llama
 
-model = Llama("Qwen3-g023-tiny-v2-Q4_K_M.gguf", n_ctx=4096)
+model = Llama("Qwen3-g023-tiny-v2-Q8_0.gguf", n_ctx=4096)
 response = model.create_chat_completion(
     messages=[
         {"role": "system", "content": "You are a helpful assistant."},
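The sampler settings this commit adds to the Modelfile can also be carried over when the model is driven through llama-cpp-python instead of Ollama. A minimal sketch of that translation follows; the name mapping (e.g. Ollama's `mirostat` vs. llama-cpp-python's `mirostat_mode`) and the helper function are assumptions for illustration, not part of the commit:

```python
# Sketch: translate the sampling PARAMETER lines from the new Modelfile into
# keyword arguments for llama-cpp-python's create_chat_completion().
# num_ctx and repeat_last_n are context-level settings, not sampling kwargs,
# so they belong at Llama(...) construction time and are omitted here.

MODELFILE_PARAMS = """\
temperature 1.0
top_p 0.95
top_k 45
min_p 0.1
mirostat 2
mirostat_tau 5.0
mirostat_eta 0.1
repeat_penalty 1.1
presence_penalty 0.5
frequency_penalty 1.0
"""

# Ollama parameter name -> llama-cpp-python kwarg (assumed mapping; verify
# against your installed llama-cpp-python version).
NAME_MAP = {"mirostat": "mirostat_mode"}

def modelfile_to_kwargs(text: str) -> dict:
    """Parse 'name value' lines into a kwargs dict with numeric values."""
    kwargs = {}
    for line in text.splitlines():
        name, raw = line.split()
        value = int(raw) if raw.lstrip("-").isdigit() else float(raw)
        kwargs[NAME_MAP.get(name, name)] = value
    return kwargs

sampling = modelfile_to_kwargs(MODELFILE_PARAMS)
print(sampling)
```

The resulting dict could then be splatted into the call from the README's Python example, e.g. `model.create_chat_completion(messages=..., **sampling)`.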
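The two llama-cli invocations in the diff differ only in how the raw ChatML prompt is assembled: thinking mode pre-opens a `<think>` block for the assistant, while non-thinking mode prefixes the user turn with `/no_think`. A small sketch of that prompt construction (the helper name is illustrative, not from the README):

```python
# Sketch: build the raw ChatML prompt strings passed to llama-cli -p above.

def chatml_prompt(user_text: str, thinking: bool) -> str:
    """Return the prompt string for thinking or non-thinking mode."""
    if thinking:
        # Pre-open a <think> block so the model begins with its reasoning.
        return (f"<|im_start|>user\n{user_text}<|im_end|>\n"
                f"<|im_start|>assistant\n<think>\n")
    # Prefix /no_think to suppress the reasoning block entirely.
    return (f"<|im_start|>user\n/no_think {user_text}<|im_end|>\n"
            f"<|im_start|>assistant\n")

print(chatml_prompt("Explain quantum computing", thinking=True))
print(chatml_prompt("What is 2+2?", thinking=False))
```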