preethamBelweave committed on
Commit cb80baf · verified · 1 Parent(s): 9876a12

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +29 -0
README.md CHANGED
@@ -182,6 +182,35 @@ response = generate(
  print(response)
  ```
 
+ ## GGUF Quantizations
+
+ For local inference with **LM Studio**, **llama.cpp**, or **Ollama**, download a GGUF variant:
+
+ | Quantization | Size | BPW | Best For |
+ |--------------|------|-----|----------|
+ | [Q4_K_M](https://huggingface.co/belweave/kai-2/blob/main/kai-2-Q4_K_M.gguf) | ~4.7 GB | 4.91 | Speed/quality balance (recommended) |
+ | [Q5_K_M](https://huggingface.co/belweave/kai-2/blob/main/kai-2-Q5_K_M.gguf) | ~5.4 GB | 5.71 | Slightly higher quality |
+ | [Q8_0](https://huggingface.co/belweave/kai-2/blob/main/kai-2-Q8_0.gguf) | ~8.1 GB | 8.50 | Near-lossless quality |
+
+ ### LM Studio
+
+ 1. Download any `.gguf` file above
+ 2. Open LM Studio → **My Models** → **Load from Disk**
+ 3. Select the downloaded `.gguf` file
+ 4. In **Chat Settings**, ensure the system prompt is left **empty** (the model's chat template already handles identity)
+
+ ### llama.cpp / Ollama
+
+ ```bash
+ # llama.cpp
+ ./llama-cli -m kai-2-Q4_K_M.gguf -p "Who are you?"
+
+ # Ollama (create a Modelfile)
+ echo 'FROM ./kai-2-Q4_K_M.gguf' > Modelfile
+ ollama create kai-2 -f Modelfile
+ ollama run kai-2
+ ```
+
  ## Model Architecture Details
 
  - **Hidden Size:** 3,584
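
Review note on the quantization table added in this commit: a GGUF file's size should be roughly the number of weights times BPW, divided by 8 bits per byte. The sketch below uses that relation to check that the three rows agree with each other. It assumes decimal gigabytes and ignores metadata overhead. The names `TABLE`, `implied_parameters`, and `relative_spread` are illustrative and do not come from the repository.

```python
# Rows copied from the quantization table: name -> (approx. size in GB, bits per weight).
TABLE = {
    "Q4_K_M": (4.7, 4.91),
    "Q5_K_M": (5.4, 5.71),
    "Q8_0": (8.1, 8.50),
}

BYTES_PER_GB = 1e9  # assumption: the table uses decimal gigabytes


def implied_parameters(table):
    """Return the weight count each row implies: size in bits / bits per weight."""
    return {
        name: size_gb * BYTES_PER_GB * 8 / bpw
        for name, (size_gb, bpw) in table.items()
    }


def relative_spread(values):
    """Return (max - min) / min over the dict's values."""
    lo, hi = min(values.values()), max(values.values())
    return (hi - lo) / lo


if __name__ == "__main__":
    implied = implied_parameters(TABLE)
    for name, count in implied.items():
        print(f"{name}: ~{count / 1e9:.2f}B weights")
    print(f"spread across rows: {relative_spread(implied):.1%}")
```

Each row implies roughly 7.6 to 7.7 billion weights, and the rows differ by under 2%, so the listed sizes and BPW values are consistent with one another given that the sizes are rounded.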