---
license: other
license_name: prism-research
license_link: LICENSE.md
language:
- en
- zh
tags:
- kimi
- prism
- moe
- multimodal
- vision
pipeline_tag: image-text-to-text
library_name: transformers
---

[![Parameters](https://img.shields.io/badge/Parameters-1T_(32B_Active)-blue)]() [![Architecture](https://img.shields.io/badge/Architecture-MoE-green)]() [![Context](https://img.shields.io/badge/Context-256K-orange)]() [![Multimodal](https://img.shields.io/badge/Multimodal-Vision%20%2B%20Text-purple)]()

# Kimi-K2.5-PRISM

An unrestricted/unchained PRISM version of [Moonshot AI's Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) with over-refusal and propaganda mechanisms removed using our advanced PRISM pipeline (Projected Refusal Isolation via Subspace Modification).
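The PRISM pipeline itself is not published, but subspace-modification methods of this family generally estimate a "refusal direction" from activations and project it out of the model's weights. A generic NumPy sketch of that projection step (an illustration of the broader technique, not the actual PRISM implementation; all names here are hypothetical):

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's rows along direction v.

    Generic refusal-direction ablation: W' = (I - v v^T) W, where v is a
    unit vector typically estimated from activation differences between
    refusing and complying prompts.
    """
    v = v / np.linalg.norm(v)  # normalize so the projector is idempotent
    return W - np.outer(v, v) @ W

# Toy example: after ablation, W' has no component along v.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
W_ablated = ablate_direction(W, v)
v_unit = v / np.linalg.norm(v)
print(np.allclose(v_unit @ W_ablated, 0.0))  # True: direction fully removed
```

Applied across a model's projection matrices, this removes the isolated subspace while leaving all orthogonal directions (and hence most capabilities) untouched.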
### ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)

| Option | Description |
|--------|-------------|
| [**PRISM VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
| [**One-Time Support**](https://ko-fi.com/s/21007bed1a) | Support this model |
---

## Model Highlights

- **PRISM Ablation** — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- **1T MoE Architecture** — 1 trillion total parameters with 32 billion active per token across 384 experts
- **Native Multimodal** — Pre-trained on vision-language tokens for seamless image, video, and text understanding
- **256K Context Window** — Extended context for complex agentic tasks and large codebases
- **Dual Modes** — Supports both Thinking (deep reasoning) and Instant (fast response) modes
- **Agent Swarm** — Self-directed, coordinated multi-agent execution for complex tasks

## Model Architecture

| Specification | Value |
|---------------|-------|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |

## Benchmarks

| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
|-----------|----------------------|---------|-----------------|--------------|
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

## Usage

### Transformers

Install dependencies:

```shell
pip install git+https://github.com/huggingface/transformers.git
```

Basic chat completion:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
)
output_text = tokenizer.decode(
    generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(output_text)
```

### Chat with Image

```python
import base64

import requests

# Load an image and encode it as a base64 data URL
url = "https://example.com/image.png"
image_base64 = base64.b64encode(requests.get(url).content).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

# Use the same generation code as above
```

### vLLM

Install vLLM nightly:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

Serve the model:

```shell
vllm serve Ex0bit/Kimi-K2.5-PRISM \
  --tensor-parallel-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism
```

### SGLang

```shell
python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000
```

## Recommended Parameters

| Mode | Temperature | Top-P | Max New Tokens |
|------|-------------|-------|----------------|
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |

### Switching Modes

For **Instant mode** (faster, no reasoning), pass:

```python
# Official API
extra_body={"thinking": {"type": "disabled"}}

# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
```

## Hardware Requirements

Due to the 1T parameter size, this model requires significant hardware:

- **Minimum:** 8x A100 80GB or equivalent
- **Recommended:** 8x H100 80GB for optimal performance
- **INT4 Quantization:** Available for reduced memory footprint

## License

This model is released under the [PRISM Research License](LICENSE.md).

## Acknowledgments

Based on [Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) by [Moonshot AI](https://www.moonshot.ai). See the [technical blog](https://www.kimi.com/blog/kimi-k2-5.html) for more details on the base model.
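Both the vLLM and SGLang servers above expose an OpenAI-compatible HTTP endpoint, so Instant mode can be toggled per request. A minimal stdlib-only client sketch; the `localhost:8000` endpoint and `kimi-k2.5-prism` model name assume the launch flags shown in the serving sections, and the sampling values follow the Recommended Parameters table:

```python
import json
import urllib.request

# Assumes a vLLM/SGLang server started as shown above (port 8000 by default).
BASE_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt: str, instant: bool = False) -> dict:
    """Build an OpenAI-style chat request, optionally in Instant mode."""
    payload = {
        "model": "kimi-k2.5-prism",  # matches --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6 if instant else 1.0,  # per Recommended Parameters
        "top_p": 0.95,
        "max_tokens": 4096,
    }
    if instant:
        # Extra body field from the Switching Modes section (vLLM/SGLang form)
        payload["chat_template_kwargs"] = {"thinking": False}
    return payload

def chat(prompt: str, instant: bool = False) -> str:
    """Send one chat turn to the served model and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt, instant)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Hello!", instant=True))
```

The official `openai` Python client works the same way: point `base_url` at the server and pass the mode switch via `extra_body`.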