Instructions to use mlx-community/Kimi-K2-Instruct-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/Kimi-K2-Instruct-4bit with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
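If you want tokens printed as they are produced rather than after the full completion, recent versions of mlx-lm also expose a streaming API. A minimal sketch, assuming a version where `stream_generate` yields response chunks with a `.text` field:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a story about Einstein"}],
    add_generation_prompt=True,
)

# Print each chunk as it arrives instead of waiting for the full completion
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```

- Notebooks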
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mlx-community/Kimi-K2-Instruct-4bit with Pi:
Start the MLX server
```sh
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/Kimi-K2-Instruct-4bit"
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/Kimi-K2-Instruct-4bit" }
      ]
    }
  }
}
```

Run Pi
```sh
# Start Pi in your project directory:
pi
```
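Before pointing Pi at the server, you can sanity-check the endpoint with any OpenAI-compatible client. A minimal sketch using the `openai` Python package, with the base URL and dummy API key matching the models.json above:

```python
from openai import OpenAI

# mlx_lm.server listens on port 8080 by default; the API key is unused
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="mlx-community/Kimi-K2-Instruct-4bit",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```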
- Hermes Agent
How to use mlx-community/Kimi-K2-Instruct-4bit with Hermes Agent:
Start the MLX server
```sh
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/Kimi-K2-Instruct-4bit"
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/Kimi-K2-Instruct-4bit
```
Run Hermes
```sh
hermes
```
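If Hermes can't find the model, it is worth checking that the id you configured matches what the server reports. A small sketch, assuming your mlx-lm version exposes the `/v1/models` listing endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

# The id printed here should match the `model.default` set above
for model in client.models.list().data:
    print(model.id)
```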
- MLX LM
How to use mlx-community/Kimi-K2-Instruct-4bit with MLX LM:
Generate or start a chat session
```sh
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "mlx-community/Kimi-K2-Instruct-4bit"
```
Run an OpenAI-compatible server
```sh
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "mlx-community/Kimi-K2-Instruct-4bit"

# Calling the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mlx-community/Kimi-K2-Instruct-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
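The server also supports streamed responses. A minimal sketch with the `openai` Python client, assuming your mlx-lm version implements the standard `stream` flag on chat completions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# Stream tokens back as server-sent events instead of one final payload
stream = client.chat.completions.create(
    model="mlx-community/Kimi-K2-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```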
How? (Model type kimi_k2 not supported)
You say you converted this with version 0.26.0, but when I try to load it, it says ValueError: Model type kimi_k2 not supported. How?
me too.
code:
```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")
```
This is the log:
```
model-00175-of-00180.safetensors: 100% 3.17G/3.17G [00:29<00:00, 527MB/s]
model-00176-of-00180.safetensors: 100% 3.17G/3.17G [00:28<00:00, 586MB/s]
model-00177-of-00180.safetensors: 100% 3.26G/3.26G [00:27<00:00, 74.1MB/s]
model-00178-of-00180.safetensors: 100% 3.17G/3.17G [00:26<00:00, 316MB/s]
model-00179-of-00180.safetensors: 100% 3.17G/3.17G [00:26<00:00, 395MB/s]
model-00180-of-00180.safetensors: 100% 3.86G/3.86G [00:22<00:00, 631MB/s]
model.safetensors.index.json: 221k/? [00:00<00:00, 17.2MB/s]
special_tokens_map.json: 100% 760/760 [00:00<00:00, 162kB/s]
modeling_deepseek.py: 75.8k/? [00:00<00:00, 4.46MB/s]
tiktoken.model: 100% 2.80M/2.80M [00:11<00:00, 253kB/s]
tokenization_kimi.py: 11.3k/? [00:00<00:00, 943kB/s]
tokenizer_config.json: 2.74k/? [00:00<00:00, 84.5kB/s]
ERROR:root:Model type kimi_k2 not supported.
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:65, in _get_classes(config)
     64 try:
---> 65     arch = importlib.import_module(f"mlx_lm.models.{model_type}")
     66 except ImportError:
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    125     level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1004, in _find_and_load_unlocked(name, import_)
ModuleNotFoundError: No module named 'mlx_lm.models.kimi_k2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[1], line 3
      1 from mlx_lm import load, generate
----> 3 model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:258, in load(path_or_hf_repo, tokenizer_config, model_config, adapter_path, lazy)
    235 """
    236 Load the model and tokenizer from a given path or a huggingface repository.
    237
   (...)
    254     ValueError: If model class or args class are not found.
    255 """
    256 model_path, _ = get_model_path(path_or_hf_repo)
--> 258 model, config = load_model(model_path, lazy)
    259 if adapter_path is not None:
    260     model = load_adapters(model, adapter_path)
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:185, in load_model(model_path, lazy, strict, model_config, get_model_classes)
    182 for wf in weight_files:
    183     weights.update(mx.load(wf))
--> 185 model_class, model_args_class = get_model_classes(config=config)
    187 model_args = model_args_class.from_dict(config)
    188 model = model_class(model_args)
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:69, in _get_classes(config)
     67     msg = f"Model type {model_type} not supported."
     68     logging.error(msg)
---> 69     raise ValueError(msg)
     71 return arch.Model, arch.ModelArgs
ValueError: Model type kimi_k2 not supported.
```
Same here. Does anyone have a solution for that? Are there any alternatives?
I can't test this and it may not work at all, but you could try locating the mlx_lm package in your IDE, going into its models folder, copying the "deepseek_v3.py" file, and renaming the copy to "kimi_k2.py".
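A variant of the same idea that avoids editing site-packages: alias the missing module to the DeepSeek-V3 implementation before loading. This is an untested sketch, and it only works if the checkpoint's architecture really matches mlx-lm's deepseek_v3 code:

```python
import importlib
import sys

# Make "mlx_lm.models.kimi_k2" resolve to the existing DeepSeek-V3 module,
# so mlx_lm's model-type lookup finds it instead of raising ValueError.
sys.modules["mlx_lm.models.kimi_k2"] = importlib.import_module(
    "mlx_lm.models.deepseek_v3"
)

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")
```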
Someone please make a 3-bit or 2-bit quant of this model. It would fit on a single 512GB M3 ultra.
I had this problem too. Workaround: clone MLX-LM directly from GitHub and use DeepseekV3 as the model_type.
I managed to get inference running on two 512GB M3 Ultras, but I am failing to find a way to expose this distributed inference as an OpenAI-compatible HTTP API server.
> Someone please make a 3-bit or 2-bit quant of this model. It would fit on a single 512GB M3 ultra.
https://huggingface.co/mlx-community/moonshotai_Kimi-K2-Instruct-mlx-3bit
Thanks for the update. By the way, is there any way to quantize large models (DeepSeek, GLM-4.5, Kimi-K2) that exceed the local machine's RAM?
Could you please support chunk-based conversion & quantization instead of loading the full model into memory?
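For what it's worth, the shard layout of these checkpoints would in principle allow quantizing one safetensors file at a time, so peak memory stays near one shard instead of the whole model. A rough, untested sketch of the idea using MLX's quantization primitives (this only illustrates the memory pattern; a checkpoint mlx-lm can actually load would also need its exact tensor-naming and config conventions):

```python
from pathlib import Path

import mlx.core as mx

src = Path("Kimi-K2-Instruct")        # directory with the original shards
dst = Path("Kimi-K2-Instruct-4bit")   # output directory
dst.mkdir(exist_ok=True)

for shard in sorted(src.glob("model-*.safetensors")):
    weights = mx.load(str(shard))     # only this shard is in memory
    out = {}
    for name, w in weights.items():
        # Quantize 2-D weight matrices whose columns divide the group size;
        # pass everything else through unchanged.
        if w.ndim == 2 and w.shape[1] % 64 == 0:
            wq, scales, biases = mx.quantize(w, group_size=64, bits=4)
            out[name] = wq
            out[name + ".scales"] = scales
            out[name + ".biases"] = biases
        else:
            out[name] = w
    mx.save_safetensors(str(dst / shard.name), out)
    del weights, out                  # release before loading the next shard
```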