---
license: other
license_name: prism-research
license_link: LICENSE.md
language:
- en
- zh
tags:
- kimi
- prism
- moe
- multimodal
- vision
pipeline_tag: image-text-to-text
library_name: transformers
---
# Kimi-K2.5-PRISM
An unrestricted/unchained PRISM version of [Moonshot AI's Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) with over-refusal and propaganda mechanisms removed using our advanced PRISM pipeline (Projected Refusal Isolation via Subspace Modification).
### ☕ Support Our Work
If you enjoy our work and find it useful, please consider sponsoring or supporting us!
[Support us on Ko-fi](https://ko-fi.com/ericelbaz)
| Option | Description |
|--------|-------------|
| [**PRISM VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
| [**One-Time Support**](https://ko-fi.com/s/21007bed1a) | Support this model |
---
## Model Highlights
- **PRISM Ablation** — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- **1T MoE Architecture** — 1 trillion total parameters with 32 billion active per token across 384 experts
- **Native Multimodal** — Pre-trained on vision-language tokens for seamless image, video, and text understanding
- **256K Context Window** — Extended context for complex agentic tasks and large codebases
- **Dual Modes** — Supports both Thinking (deep reasoning) and Instant (fast response) modes
- **Agent Swarm** — Self-directed, coordinated multi-agent execution for complex tasks
## Model Architecture
| Specification | Value |
|---------------|-------|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |
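
To make the sparsity in the table concrete, the following sketch computes what share of the model is used per token. It only restates the figures from the table above; the constant and function names are illustrative and do not come from the model's code.

```python
# Figures copied from the architecture table above.
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B activated parameters per token
NUM_EXPERTS = 384                  # number of experts
EXPERTS_PER_TOKEN = 8              # experts selected per token


def active_param_fraction() -> float:
    """Share of all parameters that participate in one forward pass per token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def routed_expert_fraction() -> float:
    """Share of the experts that the router selects for each token."""
    return EXPERTS_PER_TOKEN / NUM_EXPERTS


print(f"Active parameters per token: {active_param_fraction():.1%}")   # 3.2%
print(f"Experts selected per token:  {routed_expert_fraction():.2%}")  # 2.08%
```

The active-parameter share is higher than the expert share because attention, embeddings, and the shared expert run for every token regardless of routing.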
## Benchmarks
| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
|-----------|----------------------|---------|-----------------|--------------|
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |
## Usage
### Transformers
Install dependencies:
```shell
pip install git+https://github.com/huggingface/transformers.git
```
Basic chat completion:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

# trust_remote_code=True lets transformers run the model code shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the weights across all visible GPUs
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
output_text = tokenizer.decode(
    generated_ids[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(output_text)
```
### Chat with Image
```python
import base64
import requests

# Download the image and embed it as a base64 data URL
url = "https://example.com/image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early instead of encoding an HTTP error page
image_base64 = base64.b64encode(response.content).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]
# Use same generation code as above
```
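
The snippet above hardcodes `image/png` in the data URL. For local files of other types, a small helper can derive the MIME type from the file extension. This is a minimal sketch using only the standard library; `image_to_data_url` and `build_image_message` are hypothetical helper names, not part of the model's API.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL with a matching MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_message(path: str, prompt: str) -> dict:
    """Build a user message in the same shape as the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

For example, `messages = [build_image_message("photo.jpg", "Describe this image in detail.")]` produces a message list that can be passed to the same generation code.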
### vLLM
Install vLLM nightly:
```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```
Serve the model:
```shell
vllm serve Ex0bit/Kimi-K2.5-PRISM \
  --tensor-parallel-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism
```
### SGLang
```shell
python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000
```
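
Both server commands above expose an OpenAI-compatible HTTP API. The sketch below sends a chat request with the Python standard library only. It assumes the server is reachable at `http://localhost:8000` (vLLM's default port, and the port passed to SGLang above); `build_chat_request` and `chat` are illustrative helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: server runs locally on port 8000
MODEL_NAME = "kimi-k2.5-prism"         # matches --served-model-name above


def build_chat_request(messages, max_tokens=4096, temperature=1.0, top_p=0.95):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat([{"role": "user", "content": "Hello!"}]))` prints the model's reply.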
## Recommended Parameters
| Mode | Temperature | Top-P | Max New Tokens |
|------|-------------|-------|----------------|
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |
### Switching Modes
For **Instant mode** (faster, no reasoning), pass:
```python
# Official API
extra_body={"thinking": {"type": "disabled"}}
# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
```
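
One way to keep the recommended sampling settings and the mode switch together is a small lookup helper. This is a sketch under assumptions: `request_options` is a hypothetical name, the values are copied from the Recommended Parameters table, the table's "Max New Tokens" is mapped to the OpenAI-style `max_tokens` field, and the Instant-mode switch uses the vLLM/SGLang form shown above.

```python
# Values copied from the Recommended Parameters table.
RECOMMENDED = {
    "thinking": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 96000},
    "instant": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 4096},
}


def request_options(mode: str) -> dict:
    """Return keyword arguments for a chat completion call in the given mode."""
    if mode not in RECOMMENDED:
        raise ValueError(f"Unknown mode {mode!r}; expected 'thinking' or 'instant'")
    options = dict(RECOMMENDED[mode])  # copy so callers cannot mutate the table
    if mode == "instant":
        # vLLM/SGLang form; the official API uses {"thinking": {"type": "disabled"}}
        options["extra_body"] = {"chat_template_kwargs": {"thinking": False}}
    return options
```

Assuming the OpenAI Python client pointed at one of the servers above, the result can be unpacked directly: `client.chat.completions.create(model="kimi-k2.5-prism", messages=messages, **request_options("instant"))`.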
## Hardware Requirements
Due to the 1T parameter size, this model requires significant hardware:
- **Minimum:** 8x A100 80GB or equivalent
- **Recommended:** 8x H100 80GB for optimal performance
- **INT4 Quantization:** Available for reduced memory footprint
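
A rough weights-only estimate shows why precision matters for these configurations. The sketch below ignores the KV cache, activations, quantization scales, and framework overhead, so real requirements are higher. The function and constant names are illustrative.

```python
TOTAL_PARAMS = 1e12  # 1T parameters, from the architecture table

# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

NODE_MEMORY_GB = 8 * 80  # eight 80 GB GPUs, as in the list above


def weight_memory_gb(dtype: str) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes) for the given format."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    need = weight_memory_gb(dtype)
    verdict = "fits" if need <= NODE_MEMORY_GB else "does not fit"
    print(f"{dtype}: {need:.0f} GB of weights, {verdict} in {NODE_MEMORY_GB} GB")
```

Under these assumptions, only the INT4 weights (about 500 GB) fit within the 640 GB of a single 8x 80GB node; BF16 weights (about 2000 GB) would need several such nodes.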
## License
This model is released under the [PRISM Research License](LICENSE.md).
## Acknowledgments
Based on [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) by [Moonshot AI](https://www.moonshot.ai). See the [technical blog](https://www.kimi.com/blog/kimi-k2-5.html) for more details on the base model.