huihui-ai/Kimi-K2-Instruct-GGUF
This model was converted from unsloth/Kimi-K2-Instruct-BF16 to GGUF.
Below are the conversion commands and notes on running the model with Ollama.
BF16 to f16.gguf
- Use the llama.cpp conversion script to convert Kimi-K2-Instruct-BF16 to GGUF format. This requires approximately 2.1 TB of additional disk space.
python convert_hf_to_gguf.py /home/admin/models/unsloth/Kimi-K2-Instruct-BF16 --outfile /home/admin/models/unsloth/Kimi-K2-Instruct-BF16/ggml-model-f16.gguf --outtype f16
- Use the llama.cpp quantization program to quantize the model (llama-quantize must be compiled first). Other quantization types are available; Q2_K is converted first here and requires approximately 347 GB of additional disk space.
llama-quantize /home/admin/models/unsloth/Kimi-K2-Instruct-BF16/ggml-model-f16.gguf /home/admin/models/unsloth/Kimi-K2-Instruct-BF16/ggml-model-Q2_K.gguf Q2_K
- Use llama-cli to test.
llama-cli -m /home/admin/models/unsloth/Kimi-K2-Instruct-BF16/ggml-model-Q2_K.gguf -n 2048
Use with ollama
The current version of Ollama (0.9.6) sets LLAMA_MAX_EXPERTS to 256 in llama-hparams.h. To run this model properly, change the value to 384 manually and recompile Ollama.
-- #define LLAMA_MAX_EXPERTS 256 // DeepSeekV3
++ #define LLAMA_MAX_EXPERTS 384 // Kimi-K2-Instruct
For how to recompile Ollama, refer to the Development guide.
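The one-line change above can also be applied with sed. This is a minimal sketch, not part of the original instructions: `patch_max_experts` is a hypothetical helper, and the header path is passed as an argument because the location of llama-hparams.h inside the Ollama source tree is not stated here.

```shell
# Hypothetical helper: change LLAMA_MAX_EXPERTS from 256 to 384 in the given header.
# Usage (path is an assumption, adjust to your checkout):
#   patch_max_experts path/to/llama-hparams.h
patch_max_experts() {
    header="$1"
    # Only edit the line that defines the macro; keep a .bak copy of the original file.
    sed -i.bak -E '/#define[[:space:]]+LLAMA_MAX_EXPERTS[[:space:]]/ s/256/384/' "$header"
    # Print the resulting line so the change can be checked by eye.
    grep -n 'LLAMA_MAX_EXPERTS' "$header"
}
```

After patching, Ollama still has to be rebuilt as described in the Development guide.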
You can use huihui_ai/kimi-k2:1026b-Q2_K directly:
ollama run huihui_ai/kimi-k2:1026b-Q2_K
Parameter description
1. num_gpu
num_gpu is set to 1 inside the model, so by default only one layer is loaded onto the GPU and all other layers are loaded into CPU memory. Adjust num_gpu according to your GPU configuration.
/set parameter num_gpu 2
2. num_thread
"num_thread" sets the number of CPU threads used for inference. It is recommended to use half the number of cores in your computer; otherwise, the CPU will run at 100%.
/set parameter num_thread 32
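The "half of the cores" rule can be computed rather than guessed. The sketch below is an illustration only: `half_threads` is a hypothetical helper, and note that `nproc` reports logical CPUs, which may be more than the number of physical cores.

```shell
# Hypothetical helper: suggest a num_thread value as half of the given core count.
half_threads() {
    cores="$1"
    half=$(( cores / 2 ))
    # Never suggest fewer than one thread.
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# On Linux the core count can come from nproc (coreutils);
# on macOS, `sysctl -n hw.ncpu` is the usual equivalent.
#   half_threads "$(nproc)"
```

On a 64-core machine this suggests 32, which matches the value used above.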
3. num_ctx
"num_ctx" for ollama sets the size of the context window, in tokens, that the model can use during inference.
/set parameter num_ctx 4096
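The `/set parameter` commands only apply to the current session. To keep the three settings, they can be written into a Modelfile and built into a local model. This is a sketch under assumptions: `write_modelfile` and the model name `kimi-k2-local` are hypothetical, and the values are simply the examples used above, not tuned recommendations.

```shell
# Hypothetical helper: write a Modelfile that layers the three parameters
# on top of the published model.
write_modelfile() {
    cat > "$1" <<'EOF'
FROM huihui_ai/kimi-k2:1026b-Q2_K
PARAMETER num_gpu 2
PARAMETER num_thread 32
PARAMETER num_ctx 4096
EOF
}

# Usage (kimi-k2-local is a made-up name for the derived model):
#   write_modelfile Modelfile
#   ollama create kimi-k2-local -f Modelfile
#   ollama run kimi-k2-local
```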
Donation
You can follow x.com/support_huihui to get the latest model information from huihui.ai.
Your donation helps us continue further development and improvement; a cup of coffee can do it.
- bitcoin:
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
Model tree for huihui-ai/Kimi-K2-Instruct-GGUF
Base model
moonshotai/Kimi-K2-Instruct