---
license: other
license_name: prism-research
license_link: LICENSE.md
language:
- en
- zh
tags:
- kimi
- prism
- moe
- multimodal
- vision
pipeline_tag: image-text-to-text
library_name: transformers
---

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/ZBA5B381EC5oOmnAV7TPC.png" width="400"/>
</p>

# Kimi-K2.5-PRISM

An unrestricted/unchained PRISM version of [Moonshot AI's Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) with over-refusal and propaganda mechanisms removed using our advanced PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

<div align="center">

### ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

[Support us on Ko-fi](https://ko-fi.com/ericelbaz)

| Option | Description |
|--------|-------------|
| [**PRISM VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
| [**One-Time Support**](https://ko-fi.com/s/21007bed1a) | Support this model |

</div>

---

## Model Highlights

- **PRISM Ablation**: State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- **1T MoE Architecture**: 1 trillion total parameters with 32 billion active per token across 384 experts
- **Native Multimodal**: Pre-trained on vision-language tokens for seamless image, video, and text understanding
- **256K Context Window**: Extended context for complex agentic tasks and large codebases
- **Dual Modes**: Supports both Thinking (deep reasoning) and Instant (fast response) modes
- **Agent Swarm**: Self-directed, coordinated multi-agent execution for complex tasks

## Model Architecture

| Specification | Value |
|---------------|-------|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |
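
To make the sparsity implied by the table concrete, the arithmetic below works out how much of the model is used per token. It is a rough illustration only: it treats "1T" and "32B" as round numbers and assumes the 384 experts are the routed experts, with the shared expert counted separately.

```python
# Figures copied from the specification table; "1T" and "32B" are rounded.
TOTAL_PARAMS = 1e12
ACTIVE_PARAMS = 32e9
NUM_EXPERTS = 384        # assumption: routed experts only
SELECTED_EXPERTS = 8
SHARED_EXPERTS = 1

# Share of all parameters that participate in one token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Share of the routed experts that the router selects for one token.
routed_fraction = SELECTED_EXPERTS / NUM_EXPERTS

# Experts evaluated per token in an MoE layer (routed plus shared).
experts_per_token = SELECTED_EXPERTS + SHARED_EXPERTS

print(f"Active parameters per token: {active_fraction:.1%}")
print(f"Routed experts selected per token: {routed_fraction:.2%}")
print(f"Experts evaluated per token: {experts_per_token}")
```

Under these assumptions, roughly 3% of the parameters are active for any given token, which is why the compute cost per token is much lower than the total parameter count suggests.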

## Benchmarks

| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
|-----------|----------------------|---------|-----------------|--------------|
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

## Usage

### Transformers

Install dependencies:

```shell
pip install git+https://github.com/huggingface/transformers.git
```

Basic chat completion:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

# trust_remote_code=True lets Transformers run the custom code shipped in the repo.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the chat template to tensors and move them to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
output_text = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)
print(output_text)
```

### Chat with Image

```python
import base64

import requests

# Download the image and base64-encode it for an inline data URL.
url = "https://example.com/image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()
image_base64 = base64.b64encode(response.content).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

# Use same generation code as above
```
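
The example above hardcodes `image/png` in the data URL, which is only right for PNG input. As a minimal sketch, the helpers below guess the MIME type from the file name and build a message in the same shape. The names `image_bytes_to_data_url` and `image_message` are illustrative and not part of any library.

```python
import base64
import mimetypes


def image_bytes_to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(prompt: str, data_url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

For a local file, something like `image_message("Describe this image in detail.", image_bytes_to_data_url(open("photo.jpg", "rb").read(), "photo.jpg"))` would yield a message with an `image/jpeg` data URL.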

### vLLM

Install vLLM nightly:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

Serve the model:

```shell
vllm serve Ex0bit/Kimi-K2.5-PRISM \
  --tensor-parallel-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism
```

### SGLang

```shell
python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000
```
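
Both vLLM and SGLang expose an OpenAI-compatible HTTP API, so a server started with either command can be queried with a plain HTTP request. The sketch below uses only the standard library. It assumes the server is reachable at `localhost:8000` (the port in the SGLang command, and vLLM's default) and was started with the served model name `kimi-k2.5-prism`. The helper names are illustrative.

```python
import json
import urllib.request

# Assumption: the server runs on this machine and listens on port 8000.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "kimi-k2.5-prism") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=600) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(chat("Hello!"))
```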

## Recommended Parameters

| Mode | Temperature | Top-P | Max New Tokens |
|------|-------------|-------|----------------|
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |

### Switching Modes

For **Instant mode** (faster, no reasoning), pass:

```python
# Official API
extra_body={"thinking": {"type": "disabled"}}

# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
```
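
One way to keep the recommended sampling values and the mode switch together is a small lookup helper, sketched below. The function name `mode_kwargs` and the backend labels are hypothetical, and mapping "Max New Tokens" to the OpenAI-style `max_tokens` field is an assumption about the client being used.

```python
# Values copied from the Recommended Parameters table.
RECOMMENDED = {
    "thinking": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 96000},
    "instant": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 4096},
}

BACKENDS = ("official", "vllm", "sglang")


def mode_kwargs(mode: str, backend: str = "vllm") -> dict:
    """Return request kwargs for a mode, adding extra_body when thinking is disabled."""
    if mode not in RECOMMENDED:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(RECOMMENDED)}")
    if backend not in BACKENDS:
        raise ValueError(f"Unknown backend {backend!r}; expected one of {BACKENDS}")

    kwargs = dict(RECOMMENDED[mode])
    if mode == "instant":
        if backend == "official":
            kwargs["extra_body"] = {"thinking": {"type": "disabled"}}
        else:
            # vLLM and SGLang take the switch through the chat template.
            kwargs["extra_body"] = {"chat_template_kwargs": {"thinking": False}}
    return kwargs
```

With the OpenAI Python client, the result could be splatted into the call, for example `client.chat.completions.create(model="kimi-k2.5-prism", messages=messages, **mode_kwargs("instant"))`.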

## Hardware Requirements

With 1T total parameters, this model needs a multi-GPU node:

- **Minimum:** 8x A100 80GB or equivalent
- **Recommended:** 8x H100 80GB for optimal performance
- **INT4 Quantization:** Available for reduced memory footprint
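
A back-of-the-envelope estimate helps when comparing these configurations against a given precision. The sketch below counts weight storage only, treating "1T" as exactly 10^12 parameters and ignoring the KV cache, activations, and quantization metadata, so real requirements will be higher. The byte widths are the standard sizes for each format, not figures from this model card.

```python
TOTAL_PARAMS = 1e12  # "1T" from the model card, rounded
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

GPU_COUNT = 8
GPU_MEMORY_GB = 80
total_gpu_gb = GPU_COUNT * GPU_MEMORY_GB


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    needed = weight_memory_gb(TOTAL_PARAMS, width)
    verdict = "fits" if needed <= total_gpu_gb else "does not fit"
    print(f"{name}: ~{needed:,.0f} GB of weights, {verdict} in {total_gpu_gb} GB")
```

Under these assumptions only the INT4 estimate (about 500 GB) fits in the 640 GB of an 8x 80GB node, so the 8-GPU figures above presumably refer to quantized weights. Checking the actual checkpoint size in the repository is the reliable way to confirm.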

## License

This model is released under the [PRISM Research License](LICENSE.md).

## Acknowledgments

Based on [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) by [Moonshot AI](https://www.moonshot.ai). See the [technical blog](https://www.kimi.com/blog/kimi-k2-5.html) for more details on the base model.