Instructions to use mlx-community/Kimi-K2-Instruct-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/Kimi-K2-Instruct-4bit with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
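If you want tokens printed as they are produced rather than after the full completion, recent versions of mlx-lm also expose a streaming API. A minimal sketch, assuming a version where `stream_generate` yields response chunks with a `.text` field:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a story about Einstein"}],
    add_generation_prompt=True,
)

# Print each chunk as it arrives instead of waiting for the full completion
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```

- Notebooks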
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mlx-community/Kimi-K2-Instruct-4bit with Pi:
Start the MLX server
```sh
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/Kimi-K2-Instruct-4bit"
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/Kimi-K2-Instruct-4bit" }
      ]
    }
  }
}
```

Run Pi
```sh
# Start Pi in your project directory:
pi
```
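Before pointing Pi at the server, you can sanity-check the endpoint with any OpenAI-compatible client. A minimal sketch using the `openai` Python package, with the base URL and dummy API key matching the models.json above:

```python
from openai import OpenAI

# mlx_lm.server listens on port 8080 by default; the API key is unused
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="mlx-community/Kimi-K2-Instruct-4bit",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```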
- Hermes Agent
How to use mlx-community/Kimi-K2-Instruct-4bit with Hermes Agent:
Start the MLX server
```sh
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/Kimi-K2-Instruct-4bit"
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/Kimi-K2-Instruct-4bit
```
Run Hermes
```sh
hermes
```
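If Hermes can't find the model, it is worth checking that the id you configured matches what the server reports. A small sketch, assuming your mlx-lm version exposes the `/v1/models` listing endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

# The id printed here should match the `model.default` set above
for model in client.models.list().data:
    print(model.id)
```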
- MLX LM
How to use mlx-community/Kimi-K2-Instruct-4bit with MLX LM:
Generate or start a chat session
```sh
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "mlx-community/Kimi-K2-Instruct-4bit"
```
Run an OpenAI-compatible server
```sh
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "mlx-community/Kimi-K2-Instruct-4bit"

# Calling the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mlx-community/Kimi-K2-Instruct-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
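The server also supports streamed responses. A minimal sketch with the `openai` Python client, assuming your mlx-lm version implements the standard `stream` flag on chat completions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# Stream tokens back as server-sent events instead of one final payload
stream = client.chat.completions.create(
    model="mlx-community/Kimi-K2-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```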
How? (Model type kimi_k2 not supported)
You say you converted this with version 0.26.0, but when I try to load it, it says ValueError: Model type kimi_k2 not supported. How?
me too.
code:
```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")
```
This is the log:
```
model-00175-of-00180.safetensors: 100% 3.17G/3.17G [00:29<00:00, 527MB/s]
model-00176-of-00180.safetensors: 100% 3.17G/3.17G [00:28<00:00, 586MB/s]
model-00177-of-00180.safetensors: 100% 3.26G/3.26G [00:27<00:00, 74.1MB/s]
model-00178-of-00180.safetensors: 100% 3.17G/3.17G [00:26<00:00, 316MB/s]
model-00179-of-00180.safetensors: 100% 3.17G/3.17G [00:26<00:00, 395MB/s]
model-00180-of-00180.safetensors: 100% 3.86G/3.86G [00:22<00:00, 631MB/s]
model.safetensors.index.json: 221k/? [00:00<00:00, 17.2MB/s]
special_tokens_map.json: 100% 760/760 [00:00<00:00, 162kB/s]
modeling_deepseek.py: 75.8k/? [00:00<00:00, 4.46MB/s]
tiktoken.model: 100% 2.80M/2.80M [00:11<00:00, 253kB/s]
tokenization_kimi.py: 11.3k/? [00:00<00:00, 943kB/s]
tokenizer_config.json: 2.74k/? [00:00<00:00, 84.5kB/s]
ERROR:root:Model type kimi_k2 not supported.
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:65, in _get_classes(config)
     64 try:
---> 65     arch = importlib.import_module(f"mlx_lm.models.{model_type}")
     66 except ImportError:
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    125     level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1004, in _find_and_load_unlocked(name, import_)
ModuleNotFoundError: No module named 'mlx_lm.models.kimi_k2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[1], line 3
      1 from mlx_lm import load, generate
----> 3 model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:258, in load(path_or_hf_repo, tokenizer_config, model_config, adapter_path, lazy)
    235 """
    236 Load the model and tokenizer from a given path or a huggingface repository.
    237
   (...)
    254     ValueError: If model class or args class are not found.
    255 """
    256 model_path, _ = get_model_path(path_or_hf_repo)
--> 258 model, config = load_model(model_path, lazy)
    259 if adapter_path is not None:
    260     model = load_adapters(model, adapter_path)
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:185, in load_model(model_path, lazy, strict, model_config, get_model_classes)
    182 for wf in weight_files:
    183     weights.update(mx.load(wf))
--> 185 model_class, model_args_class = get_model_classes(config=config)
    187 model_args = model_args_class.from_dict(config)
    188 model = model_class(model_args)
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/mlx_lm/utils.py:69, in _get_classes(config)
     67     msg = f"Model type {model_type} not supported."
     68     logging.error(msg)
---> 69     raise ValueError(msg)
     71 return arch.Model, arch.ModelArgs
ValueError: Model type kimi_k2 not supported.
```
Same here. Does anyone have a solution for that? Are there any alternatives?
I can't test this and it may not work at all, but you could try locating the mlx_lm package in your IDE, going into its models folder, copying the "deepseek_v3.py" file, and renaming the copy to "kimi_k2.py".
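A variant of the same idea that avoids editing site-packages: alias the missing module to the DeepSeek-V3 implementation before loading. This is an untested sketch, and it only works if the checkpoint's architecture really matches mlx-lm's deepseek_v3 code:

```python
import importlib
import sys

# Make "mlx_lm.models.kimi_k2" resolve to the existing DeepSeek-V3 module,
# so mlx_lm's model-type lookup finds it instead of raising ValueError.
sys.modules["mlx_lm.models.kimi_k2"] = importlib.import_module(
    "mlx_lm.models.deepseek_v3"
)

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2-Instruct-4bit")
```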
Someone please make a 3-bit or 2-bit quant of this model. It would fit on a single 512GB M3 ultra.
I had this problem too. Workaround: clone MLX-LM directly from GitHub and use DeepseekV3 as the model_type.
I managed to get inference running on two 512GB M3 Ultras, but I am failing to find a way to expose this distributed inference as an OpenAI-compatible HTTP API server.
> Someone please make a 3-bit or 2-bit quant of this model. It would fit on a single 512GB M3 ultra.
https://huggingface.co/mlx-community/moonshotai_Kimi-K2-Instruct-mlx-3bit
Thanks for the update. By the way, is there any way to quantize large models (DeepSeek, GLM-4.5, Kimi-K2) that exceed the local machine's RAM?
Could you please support chunk-based conversion & quantization instead of loading the full model into memory?
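For what it's worth, the shard layout of these checkpoints would in principle allow quantizing one safetensors file at a time, so peak memory stays near one shard instead of the whole model. A rough, untested sketch of the idea using MLX's quantization primitives (this only illustrates the memory pattern; a checkpoint mlx-lm can actually load would also need its exact tensor-naming and config conventions):

```python
from pathlib import Path

import mlx.core as mx

src = Path("Kimi-K2-Instruct")        # directory with the original shards
dst = Path("Kimi-K2-Instruct-4bit")   # output directory
dst.mkdir(exist_ok=True)

for shard in sorted(src.glob("model-*.safetensors")):
    weights = mx.load(str(shard))     # only this shard is in memory
    out = {}
    for name, w in weights.items():
        # Quantize 2-D weight matrices whose columns divide the group size;
        # pass everything else through unchanged.
        if w.ndim == 2 and w.shape[1] % 64 == 0:
            wq, scales, biases = mx.quantize(w, group_size=64, bits=4)
            out[name] = wq
            out[name + ".scales"] = scales
            out[name + ".biases"] = biases
        else:
            out[name] = w
    mx.save_safetensors(str(dst / shard.name), out)
    del weights, out                  # release before loading the next shard
```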