</div>

<h1 style="margin-top: 0rem;">🌙 Kimi K2 Usage Guidelines</h1>

</div>

It is recommended to have at least 128GB of unified memory (RAM) to run the small quants. With 16GB of VRAM and 256GB of RAM, expect 5+ tokens/sec.

For best results, use any 2-bit XL quant or above.

Set the temperature to 0.6 (recommended) to reduce repetition and incoherence.

- Use llama.cpp's [PR #14654](https://github.com/ggml-org/llama.cpp/pull/14654) or [our llama.cpp fork](https://github.com/unslothai/llama.cpp) (easier to work with)
- For complete detailed instructions, see our guide: [docs.unsloth.ai/basics/kimi-k2](https://docs.unsloth.ai/basics/kimi-k2)
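As a quick illustration of the settings above, here is a minimal `llama-cli` invocation sketch. The model filename is a placeholder (substitute the quant you actually downloaded); `-m`, `--temp`, and `-p` are standard llama.cpp flags.

```shell
# Hypothetical sketch: run a small Kimi K2 GGUF quant with llama.cpp's llama-cli.
# "Kimi-K2-Instruct-UD-Q2_K_XL.gguf" is a placeholder filename.
# --temp 0.6 is the recommended temperature to reduce repetition and incoherence.
./llama-cli \
  -m Kimi-K2-Instruct-UD-Q2_K_XL.gguf \
  --temp 0.6 \
  -p "What is 1+1?"
```

For multi-file quants, point `-m` at the first split file; llama.cpp picks up the remaining shards automatically.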
<div align="center">
<picture>