	---
	license: other
	license_name: prism-research
	license_link: LICENSE.md
	language:
	- en
	- zh
	tags:
	- kimi
	- prism
	- moe
	- multimodal
	- vision
	pipeline_tag: image-text-to-text
	library_name: transformers
	---

	[![Parameters](https://img.shields.io/badge/Parameters-1T_(32B_Active)-blue)]()
	[![Architecture](https://img.shields.io/badge/Architecture-MoE-green)]()
	[![Context](https://img.shields.io/badge/Context-256K-orange)]()
	[![Multimodal](https://img.shields.io/badge/Multimodal-Vision%20%2B%20Text-purple)]()



	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/ZBA5B381EC5oOmnAV7TPC.png" width="400"/>
	</p>

	# Kimi-K2.5-PRISM

	An unrestricted PRISM variant of [Moonshot AI's Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5), with over-refusal and propaganda behaviors removed by our PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

	<div align="center">

	### ☕ Support Our Work

	If you enjoy our work and find it useful, please consider supporting us!

	[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)

	| Option | Description |
	|--------|-------------|
	| [PRISM VIP Membership](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
	| [One-Time Support](https://ko-fi.com/s/21007bed1a) | Support this model |

	</div>

	---

	## Model Highlights

	- PRISM Ablation: removes over-refusal behaviors while preserving model capabilities
	- 1T MoE Architecture: 1 trillion total parameters, 32 billion activated per token, routed across 384 experts
	- Native Multimodal: pre-trained on vision-language tokens for image, video, and text understanding
	- 256K Context Window: long context for complex agentic tasks and large codebases
	- Dual Modes: supports both Thinking (deep reasoning) and Instant (fast response) modes
	- Agent Swarm: self-directed, coordinated multi-agent execution for complex tasks

	## Model Architecture

	| Specification | Value |
	|---------------|-------|
	| Architecture | Mixture-of-Experts (MoE) |
	| Total Parameters | 1T |
	| Activated Parameters | 32B |
	| Number of Layers | 61 |
	| Attention Hidden Dimension | 7168 |
	| Number of Attention Heads | 64 |
	| Number of Experts | 384 |
	| Selected Experts per Token | 8 |
	| Shared Experts | 1 |
	| Vocabulary Size | 160K |
	| Context Length | 256K |
	| Attention Mechanism | MLA |
	| Activation Function | SwiGLU |
	| Vision Encoder | MoonViT (400M) |
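
To illustrate what "Selected Experts per Token: 8" and "Shared Experts: 1" mean, here is a minimal toy sketch of top-k expert routing in plain Python. It assumes generic softmax top-k gating over scalar "experts"; `route` and `moe_layer` are hypothetical names, and Kimi-K2.5's actual router (scoring function, normalization, load balancing) may differ.

```python
import math
import random
from typing import Callable

NUM_EXPERTS = 384  # "Number of Experts" from the table
TOP_K = 8          # "Selected Experts per Token"


def route(logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Return (expert index, weight) for the k highest-scoring routed experts.

    Toy softmax top-k gating: the softmax is taken over the selected logits
    only, so the k weights sum to 1. The real Kimi router may differ.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: float, logits: list[float],
              experts: list[Callable[[float], float]],
              shared: Callable[[float], float]) -> float:
    """The shared expert always runs; routed experts are mixed by their weights."""
    return shared(x) + sum(w * experts[i](x) for i, w in route(logits))


rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(router_logits)
print(f"routed experts used for this token: {len(chosen)} of {NUM_EXPERTS}, plus 1 shared")
```

Only the selected routed experts and the shared expert run for a given token, which is why the activated parameter count (32B) is a small fraction of the total (1T).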

	## Benchmarks

	| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
	|-----------|----------------------|---------|-----------------|--------------|
	| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
	| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
	| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
	| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
	| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
	| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
	| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
	| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
	| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

	## Usage

	### Transformers

	Install dependencies:

	```shell
	pip install git+https://github.com/huggingface/transformers.git
	```

	Basic chat completion:

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

	tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    MODEL_PATH,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	    trust_remote_code=True,
	)

	messages = [
	    {"role": "system", "content": "You are Kimi, an AI assistant."},
	    {"role": "user", "content": "Hello!"},
	]

	inputs = tokenizer.apply_chat_template(
	    messages,
	    tokenize=True,
	    add_generation_prompt=True,
	    return_dict=True,
	    return_tensors="pt",
	).to(model.device)

	generated_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=1.0, top_p=0.95)
	output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
	print(output_text)
	```

	### Chat with Image

	```python
	import base64
	import requests

	# Load image
	url = "https://example.com/image.png"
	response = requests.get(url, timeout=30)
	response.raise_for_status()
	image_base64 = base64.b64encode(response.content).decode()

	messages = [
	    {
	        "role": "user",
	        "content": [
	            {"type": "text", "text": "Describe this image in detail."},
	            {
	                "type": "image_url",
	                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
	            },
	        ],
	    }
	]

	# Use the same generation code as above
	```
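
The example above downloads a remote image and hard-codes a PNG MIME type. For local files, a small helper can build the data URL and guess the MIME type from the file extension. This is a sketch assuming the extension matches the actual image format; `image_to_data_url` and `image_message` are hypothetical helper names that produce the same message shape as above.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    # Assumption: the file extension reflects the real image format.
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(path: str, prompt: str) -> dict:
    """Build a user message with a text part and an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

For example, `messages = [image_message("photo.jpg", "Describe this image in detail.")]` yields a `data:image/jpeg;base64,...` URL instead of a hard-coded PNG one.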

	### vLLM

	Install vLLM nightly:

	```shell
	pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
	pip install git+https://github.com/huggingface/transformers.git
	```

	Serve the model:

	```shell
	vllm serve Ex0bit/Kimi-K2.5-PRISM \
	  --tensor-parallel-size 8 \
	  --trust-remote-code \
	  --served-model-name kimi-k2.5-prism
	```

	### SGLang

	```shell
	python3 -m sglang.launch_server \
	  --model-path Ex0bit/Kimi-K2.5-PRISM \
	  --tp-size 8 \
	  --trust-remote-code \
	  --served-model-name kimi-k2.5-prism \
	  --host 0.0.0.0 \
	  --port 8000
	```
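
Both servers expose an OpenAI-compatible chat completions endpoint. A minimal standard-library client might look like the sketch below. It assumes the server listens on `localhost:8000` (the SGLang `--port` above, and vLLM's default port) and serves the model as `kimi-k2.5-prism`; `build_request` and `chat` are hypothetical helpers, not part of either project.

```python
import json
import urllib.request

# Assumptions: a vLLM or SGLang server started as above listens on
# localhost:8000 and serves the model under --served-model-name.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "kimi-k2.5-prism"


def build_request(prompt: str, temperature: float = 1.0, top_p: float = 0.95,
                  max_tokens: int = 4096) -> urllib.request.Request:
    """Build a POST request for a single-turn OpenAI-style chat completion."""
    body = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **params) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_request(prompt, **params), timeout=600) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With a server running, `chat("Hello!")` returns the reply text; pass `temperature`, `top_p`, or `max_tokens` to override the defaults.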

	## Recommended Parameters

	| Mode | Temperature | Top-P | Max New Tokens |
	|------|-------------|-------|----------------|
	| Thinking | 1.0 | 0.95 | 96000 |
	| Instant | 0.6 | 0.95 | 4096 |

	### Switching Modes

	For Instant mode (faster responses, no reasoning), pass the matching `extra_body` to your OpenAI client's chat completion call:

	```python
	# Official API
	extra_body={"thinking": {"type": "disabled"}}

	# vLLM/SGLang
	extra_body={"chat_template_kwargs": {"thinking": False}}
	```
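
When sending raw JSON instead of using a client's `extra_body` (the OpenAI Python client merges `extra_body` keys into the top level of the request body), the mode switch goes directly into the body. The sketch below combines the recommended parameters with the switches above. It assumes "Max New Tokens" maps to the `max_tokens` request field and that Thinking mode needs no switch, since only Instant mode is toggled above; `request_body` is a hypothetical helper.

```python
import copy
from typing import Any

# Sampling settings from the Recommended Parameters table;
# "Max New Tokens" is sent as the OpenAI-style `max_tokens` field.
RECOMMENDED = {
    "thinking": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 96000},
    "instant": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 4096},
}

# Instant-mode switches from the snippet above, keyed by backend.
INSTANT_SWITCH = {
    "official": {"thinking": {"type": "disabled"}},
    "vllm": {"chat_template_kwargs": {"thinking": False}},
    "sglang": {"chat_template_kwargs": {"thinking": False}},
}


def request_body(messages: list[dict[str, Any]], mode: str = "thinking",
                 backend: str = "vllm", model: str = "kimi-k2.5-prism") -> dict[str, Any]:
    """Return a chat-completions JSON body for the given mode and backend."""
    if mode not in RECOMMENDED:
        raise ValueError(f"unknown mode: {mode!r}")
    if backend not in INSTANT_SWITCH:
        raise ValueError(f"unknown backend: {backend!r}")
    body = {"model": model, "messages": messages, **RECOMMENDED[mode]}
    if mode == "instant":
        # Deep-copy so callers cannot mutate the shared switch templates.
        body.update(copy.deepcopy(INSTANT_SWITCH[backend]))
    return body
```

For the official API, pass that service's model name via `model`; the default here is the `--served-model-name` used in the vLLM and SGLang commands.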

	## Hardware Requirements

	Due to the 1T parameter size, this model requires significant hardware:

	- Minimum: 8x A100 80GB or equivalent
	- Recommended: 8x H100 80GB for optimal performance
	- INT4 Quantization: Available for reduced memory footprint
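
A back-of-the-envelope estimate of weight memory shows why the precision of the weights matters for the 8x 80 GB configurations above. This sketch counts weights only (1 GB taken as 10^9 bytes) and ignores the KV cache, activations, and quantization scales, so real requirements are higher.

```python
TOTAL_PARAMS = 1e12   # "1T" total parameters
GPU_MEMORY_GB = 80    # per-GPU memory of an 80 GB A100/H100
NUM_GPUS = 8

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[precision] / 1e9


budget_gb = NUM_GPUS * GPU_MEMORY_GB
for precision in BYTES_PER_PARAM:
    need = weight_memory_gb(precision)
    verdict = "fits" if need < budget_gb else "does not fit"
    print(f"{precision}: ~{need:,.0f} GB of weights vs {budget_gb} GB total -> {verdict}")
```

At BF16 the weights alone need about 2,000 GB, far more than the 640 GB on eight 80 GB GPUs; the budget allows at most 640e9 × 8 / 1e12 ≈ 5.1 bits per parameter, so only a roughly 4-bit format fits.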

	## License

	This model is released under the [PRISM Research License](LICENSE.md).

	## Acknowledgments

	Based on [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) by [Moonshot AI](https://www.moonshot.ai). See the [technical blog](https://www.kimi.com/blog/kimi-k2-5.html) for more details on the base model.