---
license: other
license_name: prism-research
license_link: LICENSE.md
language:
- en
- zh
tags:
- kimi
- prism
- moe
- multimodal
- vision
pipeline_tag: image-text-to-text
library_name: transformers
---

![Parameters](https://img.shields.io/badge/Parameters-1T_(32B_Active)-blue)
![Architecture](https://img.shields.io/badge/Architecture-MoE-green)
![Context](https://img.shields.io/badge/Context-256K-orange)
![Multimodal](https://img.shields.io/badge/Multimodal-Vision%20%2B%20Text-purple)


<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/ZBA5B381EC5oOmnAV7TPC.png" width="400"/>
</p>

# Kimi-K2.5-PRISM

An unrestricted PRISM variant of [Moonshot AI's Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5), with over-refusal and propaganda behaviors removed by our PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

<div align="center">

### ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)

| Option | Description |
|--------|-------------|
| [**PRISM VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
| [**One-Time Support**](https://ko-fi.com/s/21007bed1a) | Support this model |

</div>

---

## Model Highlights

- **PRISM Ablation** — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- **1T MoE Architecture** — 1 trillion total parameters with 32 billion active per token across 384 experts
- **Native Multimodal** — Pre-trained on vision-language tokens for seamless image, video, and text understanding
- **256K Context Window** — Extended context for complex agentic tasks and large codebases
- **Dual Modes** — Supports both Thinking (deep reasoning) and Instant (fast response) modes
- **Agent Swarm** — Self-directed, coordinated multi-agent execution for complex tasks

## Model Architecture

| Specification | Value |
|---------------|-------|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |

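The routing numbers in the table (384 experts, 8 selected per token, 1 shared expert) explain why only about 32B of the 1T parameters are active for any given token. The sketch below illustrates a generic top-k MoE layer in plain Python. It is not Kimi-K2.5's implementation: the softmax-over-top-k gating and the names `route` and `moe_forward` are assumptions for illustration, and real experts are SwiGLU feed-forward blocks operating on vectors rather than scalar functions.

```python
import math

NUM_EXPERTS = 384  # routed experts (from the table)
TOP_K = 8          # routed experts selected per token (from the table)


def route(router_logits, top_k=TOP_K):
    """Return (expert_indices, gate_weights) for one token.

    Assumed gating: keep the k highest-scoring experts and
    softmax-normalize their logits so the weights sum to 1.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def moe_forward(x, experts, shared_expert, router_logits):
    """Shared expert always runs; only the top-k routed experts are evaluated."""
    idx, weights = route(router_logits)
    out = shared_expert(x)
    for i, w in zip(idx, weights):
        out += w * experts[i](x)
    return out
```

Only `TOP_K` routed experts plus the shared expert are computed per token, while every expert must still be stored, which is the gap between active and total parameters.
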
## Benchmarks

Scores for the base Kimi-K2.5 model in Thinking mode, alongside other frontier models:

| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
|-----------|----------------------|---------|-----------------|--------------|
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

## Usage

### Transformers

Install dependencies:

```shell
pip install git+https://github.com/huggingface/transformers.git
pip install torch accelerate  # accelerate is required for device_map="auto"
```

Basic chat completion:

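```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

# trust_remote_code lets transformers load the custom code shipped in the repository
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard layers across all visible GPUs (needs accelerate)
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# temperature/top_p follow the Thinking-mode recommendations;
# raise max_new_tokens for long reasoning traces
generated_ids = model.generate(
    **inputs, max_new_tokens=4096, do_sample=True, temperature=1.0, top_p=0.95
)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
output_text = tokenizer.decode(generated_ids[0][prompt_len:], skip_special_tokens=True)
print(output_text)
```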

### Chat with Image

```python
import base64
import requests

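# Download the image and encode it as base64 for an inline data URL
url = "https://example.com/image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()
image_base64 = base64.b64encode(response.content).decode()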

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

# Use same generation code as above
```

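The example above hard-codes `image/png` in the data URL. A small helper can build the same OpenAI-style message from a local file and infer the MIME type from the file extension using Python's standard `mimetypes` module. The name `image_message` and its default prompt are illustrative, not part of any library.

```python
import base64
import mimetypes
from pathlib import Path


def image_message(image_path, prompt="Describe this image in detail."):
    """Build a user message with an inline base64 image (hypothetical helper).

    The MIME type is guessed from the file extension instead of
    being hard-coded to image/png.
    """
    path = Path(image_path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

For example, `messages = [image_message("photo.jpg")]` produces a `data:image/jpeg;base64,...` URL.
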
### vLLM

Install vLLM nightly:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

Serve the model:

```shell
vllm serve Ex0bit/Kimi-K2.5-PRISM \
  --tensor-parallel-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism

### SGLang

```shell
python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000
```

## Recommended Parameters

| Mode | Temperature | Top-P | Max New Tokens |
|------|-------------|-------|----------------|
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |

### Switching Modes

For **Instant mode** (faster, no reasoning), pass one of the following `extra_body` values to an OpenAI-compatible `chat.completions.create` call:

```python
# Official API
extra_body={"thinking": {"type": "disabled"}}

# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
```

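Because the OpenAI client merges `extra_body` fields into the top level of the JSON request, the same switch can be sent with only the Python standard library. The sketch below assumes a vLLM or SGLang server started as shown above on `localhost:8000` (vLLM's default port, and the port set in the SGLang command), and assumes Thinking is the chat template's default, so no flag is sent in that mode. `build_payload`, `chat`, and `MODES` are illustrative names; the sampling values come from the Recommended Parameters table.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: local vLLM/SGLang server
MODEL = "kimi-k2.5-prism"              # matches --served-model-name

# Recommended sampling parameters per mode (from the table above)
MODES = {
    "thinking": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 96000},
    "instant": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 4096},
}


def build_payload(messages, mode="thinking"):
    """Build a /chat/completions request body for the selected mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    payload = {"model": MODEL, "messages": messages, **MODES[mode]}
    if mode == "instant":
        # Equivalent to extra_body={"chat_template_kwargs": {"thinking": False}}
        payload["chat_template_kwargs"] = {"thinking": False}
    return payload


def chat(messages, mode="thinking", timeout=600):
    """POST the request and return the assistant message text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(messages, mode)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat([{"role": "user", "content": "Hello!"}], mode="instant")` returns the reply text.
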
## Hardware Requirements

Although only 32B parameters are active per token, all 1T parameters (every expert) must be resident in GPU memory, so this model requires significant hardware:

- **Minimum:** 8x A100 80GB or equivalent
- **Recommended:** 8x H100 80GB for optimal performance
- **INT4 Quantization:** Available for reduced memory footprint

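A back-of-the-envelope, weights-only estimate shows why numeric precision matters at this scale. The sketch multiplies the 1T total parameters by bytes per parameter and compares the result with one 8x80 GB node (640 GB). It ignores KV cache, activations, quantization scales, and framework overhead, and the helper names are illustrative.

```python
TOTAL_PARAMS = 1.0e12  # 1T total parameters (model card)
GPU_MEMORY_GB = 80     # per-GPU memory for A100/H100 80GB
NUM_GPUS = 8

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(precision, params=TOTAL_PARAMS):
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_node(precision, num_gpus=NUM_GPUS, gpu_gb=GPU_MEMORY_GB):
    """True if the weights alone fit in the node's combined GPU memory."""
    return weight_memory_gb(precision) < num_gpus * gpu_gb


for precision in BYTES_PER_PARAM:
    print(f"{precision:>5}: {weight_memory_gb(precision):6.0f} GB of weights "
          f"vs {NUM_GPUS * GPU_MEMORY_GB} GB on the node")
```

By this estimate, BF16 weights (about 2,000 GB) and FP8 weights (about 1,000 GB) exceed a single 8x80 GB node, while 4-bit weights (about 500 GB) fit with roughly 140 GB left for the KV cache and everything else.
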
## License

This model is released under the [PRISM Research License](LICENSE.md).

## Acknowledgments

Based on [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) by [Moonshot AI](https://www.moonshot.ai). See the [technical blog](https://www.kimi.com/blog/kimi-k2-5.html) for more details on the base model.