Instructions for using Ex0bit/Kimi-K2.5-PRISM with libraries and local apps.

Libraries

How to use Ex0bit/Kimi-K2.5-PRISM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Ex0bit/Kimi-K2.5-PRISM", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Ex0bit/Kimi-K2.5-PRISM", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Ex0bit/Kimi-K2.5-PRISM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ex0bit/Kimi-K2.5-PRISM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/Kimi-K2.5-PRISM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Ex0bit/Kimi-K2.5-PRISM

SGLang

How to use Ex0bit/Kimi-K2.5-PRISM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ex0bit/Kimi-K2.5-PRISM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/Kimi-K2.5-PRISM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ex0bit/Kimi-K2.5-PRISM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip-based SGLang setup above (same port, 30000).

Docker Model Runner
How to use Ex0bit/Kimi-K2.5-PRISM with Docker Model Runner:
```
docker model run hf.co/Ex0bit/Kimi-K2.5-PRISM
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

### ☕ Model Purchase (IMPORTANT: READ FIRST)

The Kimi-K2.5-PRISM tensors are available by purchase only. To buy them, reach out here: https://ko-fi.com/s/64a50000a4

Kimi-K2.5-PRISM

An unrestricted PRISM version of Moonshot AI's Kimi-K2.5. Over-refusal and propaganda mechanisms have been completely removed using our PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

☕ Support Our Work

If you enjoy our work and find it useful, please sponsor and support!

| Option | Description |
| --- | --- |
| PRISM VIP Membership | Day-0 access to all PRISM models |
| One-Time Support | Purchase this model |

Model Highlights

- PRISM Ablation: State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- 1T MoE Architecture: 1 trillion total parameters with 32 billion active per token across 384 experts
- Native Multimodal: Pre-trained on vision-language tokens for seamless image, video, and text understanding
- 256K Context Window: Extended context for complex agentic tasks and large codebases
- Dual Modes: Supports both Thinking (deep reasoning) and Instant (fast response) modes
- Agent Swarm: Self-directed, coordinated multi-agent execution for complex tasks

Model Architecture

| Specification | Value |
| --- | --- |
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |

Benchmarks

| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

Usage

Transformers

Install dependencies:

pip install git+https://github.com/huggingface/transformers.git

Basic chat completion:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=1.0, top_p=0.95)
output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)

Chat with Image

import base64
import requests

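# Download the image and fail early on HTTP errors
url = "https://example.com/image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()
# Base64-encode the raw bytes so they can be embedded in a data URL
image_base64 = base64.b64encode(response.content).decode()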

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

# Use same generation code as above
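
The image example above hard-codes `image/png` in the data URL, which only matches PNG inputs. A minimal sketch of a helper that infers the MIME type from the file name instead might look like the following. The names `to_data_url` and `image_message` are hypothetical (they are not part of any library), and the message layout simply mirrors the one shown above.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt: str, data_url: str) -> dict:
    """Build a single user message in the same shape as the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Tiny stand-in payload; real code would pass the downloaded image bytes.
messages = [image_message("Describe this image in detail.", to_data_url(b"abc", "image.png"))]
```

Raising on an unknown extension is a deliberate choice in this sketch: a wrong MIME type in the data URL tends to fail later and less clearly.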

vLLM

Install vLLM nightly:

pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git

Serve the model:

vllm serve Ex0bit/Kimi-K2.5-PRISM \
     --tensor-parallel-size 8 \
     --trust-remote-code \
     --served-model-name kimi-k2.5-prism
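
Because the server above is started with `--served-model-name kimi-k2.5-prism`, requests should use that name in the `model` field rather than the repository ID. The sketch below builds such a request with the standard library only. It assumes the server listens on `localhost:8000` (the port used in the curl example earlier on this page); `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = "kimi-k2.5-prism") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello!")
# print(send(request))  # uncomment once the server is running
```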

SGLang

python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000

Recommended Parameters

| Mode | Temperature | Top-P | Max New Tokens |
| --- | --- | --- | --- |
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |

Switching Modes

For Instant mode (faster, no reasoning), pass:

# Official API
extra_body={"thinking": {"type": "disabled"}}

# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
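
To show how the mode switch and the recommended parameters fit together, here is a minimal sketch that assembles a raw request body for either mode. It rests on a few assumptions: that the fields passed through `extra_body` end up at the top level of the JSON body (as the OpenAI Python client does), that "Max New Tokens" maps to the `max_tokens` request field, and that the served model name is `kimi-k2.5-prism`. `build_payload` is a hypothetical helper.

```python
# Sampling settings taken from the Recommended Parameters table.
RECOMMENDED = {
    "thinking": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 96000},
    "instant": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 4096},
}


def build_payload(prompt: str, mode: str = "thinking", backend: str = "vllm",
                  model: str = "kimi-k2.5-prism") -> dict:
    """Return a chat completion body for the given mode and backend."""
    if mode not in RECOMMENDED:
        raise ValueError(f"unknown mode: {mode!r}")
    if backend not in ("official", "vllm", "sglang"):
        raise ValueError(f"unknown backend: {backend!r}")

    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED[mode],
    }
    if mode == "instant":
        if backend == "official":
            # Official API switch, as documented above.
            payload["thinking"] = {"type": "disabled"}
        else:
            # vLLM / SGLang switch, passed through to the chat template.
            payload["chat_template_kwargs"] = {"thinking": False}
    return payload


instant = build_payload("Summarize this in one line.", mode="instant", backend="vllm")
```

Thinking mode needs no extra field in this sketch, since the section above only describes a switch for turning thinking off.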

Hardware Requirements

Due to the 1T parameter size, this model requires significant hardware:

Minimum: 8x A100 80GB or equivalent
Recommended: 8x H100 80GB for optimal performance
INT4 Quantization: Available for reduced memory footprint
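
A rough back-of-the-envelope estimate helps explain these figures. The sketch below counts weight storage only, using the nominal 1T parameter count from the architecture table and decimal gigabytes. It ignores the KV cache, activations, and quantization metadata, so real requirements will be higher; treat it as an illustration rather than a sizing guide.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


TOTAL_PARAMS = 1e12      # nominal 1T parameters
NODE_MEMORY_GB = 8 * 80  # eight 80 GB accelerators

for precision in ("bf16", "int4"):
    needed = weight_memory_gb(TOTAL_PARAMS, precision)
    verdict = "fits" if needed < NODE_MEMORY_GB else "does not fit"
    print(f"{precision}: {needed:.0f} GB of weights, {verdict} in {NODE_MEMORY_GB} GB")
# bf16: 2000 GB of weights, does not fit in 640 GB
# int4: 500 GB of weights, fits in 640 GB
```

Under these assumptions, an 8x 80GB node can hold the weights only at roughly 4-bit precision, with the remaining memory left for the KV cache and activations.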

License

This model is released under the PRISM Research License.

Acknowledgments

Based on Kimi-K2.5 by Moonshot AI. See the technical blog for more details on the base model.

Downloads last month: 2

Safetensors

Model size: 171B params
Tensor type: I32, BF16

Collection including Ex0bit/Kimi-K2.5-PRISM

Kimi K2.5 PRISM: PRISM abliterated Moonshot Kimi K2.5, a multimodal MoE with REAP expert pruning. 2 items, updated Mar 5.