Ex0bit committed on Feb 14

Commit 5dd09db (verified) · 1 parent: ec3eb02

Create READNE.md

Files changed (1)

READNE.md ADDED +152 -0

	@@ -0,0 +1,152 @@

+---
+license: other
+license_name: prism-research
+license_link: LICENSE.md
+language:
+- en
+- zh
+tags:
+- minimax
+- prism
+- moe
+- reasoning
+- coding
+- agentic
+- abliterated
+pipeline_tag: text-generation
+library_name: transformers
+base_model:
+- MiniMaxAI/MiniMax-M2.5
+base_model_relation: finetune
+---
+[![Parameters](https://img.shields.io/badge/Parameters-MoE-blue)]()
+[![Architecture](https://img.shields.io/badge/Architecture-MoE-green)]()
+[![Context](https://img.shields.io/badge/Context-1M+-orange)]()
+[![License](https://img.shields.io/badge/License-PRISM--Research-purple)]()
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/shxznHWnvppRhT_yKrsdP.png" width="400"/>
+</p>
+# MiniMax-M2.5-PRISM-LITE
+A PRISM-LITE version of [Ex0bit/MiniMax-M2.5-PRISM-PRO](https://hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO), intended for role-following, with over-refusal and propaganda mechanisms suppressed by our SOTA PRISM pipeline.
+PRISM-PRO version available for purchase here: **https://ko-fi.com/s/0a23d1b9a5**
+For fully custom-trained PRISM versions and/or raw tensor access, reach out at https://ko-fi.com/ex0bit.
+<div align="center">
+### ☕ Support Our Work
+If you enjoy our work and find it useful, please consider sponsoring or supporting us!
+[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ex0bit)
+| Option | Description |
+|--------|-------------|
+| [**PRISM PRO VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
+| **Bitcoin** | `bc1qarq2pyn4psjpcxzp2ghgwaq6y2h4e53q232x8r` |
+![image](https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/Psgbl1TgyDok__C7AMQog.png)
+</div>
+---
+## Model Highlights
+- **PRISM Ablation** — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
+- **SOTA Coding Performance** — 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, 76.3% on BrowseComp (with context management)
+- **Frontier Agentic Capabilities** — Industry-leading performance in tool use, search, and complex multi-step tasks
+- **Efficient Reasoning** — Trained with RL to reason efficiently and decompose tasks optimally, 37% faster than M2.1
+- **Cost-Effective** — $1 for continuous operation at 100 tok/s for an hour; $0.30 at 50 tok/s
+- **Modified-MIT Base License** — Based on MiniMax's open-weight release
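The hourly figures in the Cost-Effective bullet can be converted into an implied price per million tokens. The sketch below is only arithmetic on the two quoted data points. It assumes the whole hourly cost is attributed to generated tokens at a constant rate, and the function name `cost_per_million_tokens` is made up for illustration.

```python
# Implied price per million generated tokens, derived only from the
# figures quoted in the Cost-Effective bullet ($1/hour at 100 tok/s,
# $0.30/hour at 50 tok/s). Assumption: input tokens are ignored.

def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Return the implied USD cost per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000


fast_tier = cost_per_million_tokens(1.00, 100)  # 360,000 tokens per hour
slow_tier = cost_per_million_tokens(0.30, 50)   # 180,000 tokens per hour

print(f"100 tok/s tier: ${fast_tier:.2f} per 1M tokens")
print(f"50 tok/s tier:  ${slow_tier:.2f} per 1M tokens")
```

Under these assumptions the 100 tok/s tier works out to roughly $2.78 per million tokens and the 50 tok/s tier to roughly $1.67.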
+## Base Model Architecture
+MiniMax-M2.5 is a Mixture-of-Experts (MoE) model extensively trained with reinforcement learning across hundreds of thousands of complex real-world environments.
+| Specification | Value |
+|---------------|-------|
+| Architecture | Sparse Mixture-of-Experts (MoE) |
+| Training | Extensive RL in 200K+ real-world environments |
+| Programming Languages | 10+ (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby) |
+| Inference Speed | 100 tok/s (Lightning) / 50 tok/s (Standard) |
+| Library | `transformers` |
+## Benchmarks (Base Model)
+### Coding
+| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
+|-----------|-------------|-----------------|-------------|---------|
+| SWE-Bench Verified | **80.2** | 78.9 | 74.0 | 72.6 |
+| Multi-SWE-Bench | **51.3** | 50.8 | — | — |
+| SWE-Bench Multilingual | **55.6** | — | — | — |
+| Terminal-Bench 2.0 | 51.5 | 52.1 | — | — |
+### Search & Tool Calling
+| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
+|-----------|-------------|-----------------|-------------|---------|
+| BrowseComp | **76.3** | 71.2 | 62.4 | 57.8 |
+### Reasoning & Knowledge
+| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
+|-----------|-------------|-----------------|-------------|---------|
+| AIME25 | 86.3 | 95.6 | 96.0 | 98.0 |
+| GPQA-D | 85.2 | 90.0 | 91.0 | 90.0 |
+| HLE w/o tools | 19.4 | 30.7 | 37.2 | 31.4 |
+| SciCode | 44.4 | 52.0 | 56.0 | 52.0 |
+| IFBench | 70.0 | 53.0 | 70.0 | 75.0 |
+## Usage
+### llama.cpp (GGUF)
+Build the latest master of [llama.cpp](https://github.com/ggml-org/llama.cpp) and run:
+```bash
+~/llama.cpp/build/bin/llama-cli \
+  -m ../outputs/MiniMax-M2.5-PRISM-PRO-[QUANT].gguf \
+  --jinja \
+  -ngl 999 \
+  --repeat-penalty 1.15 \
+  --temp 1.0 \
+  --top-p 0.95 \
+  --top-k 40
+```
+> Replace `[QUANT]` with your quantization level (e.g. `Q8_0`).
+### Recommended Parameters
+| Use Case | Temperature | Top-P | Top-K | Repeat Penalty | Max New Tokens |
+|----------|-------------|-------|-------|----------------|----------------|
+| Reasoning / Coding | 1.0 | 0.95 | 40 | 1.15 | 32768 |
+| General Chat | 0.6 | 0.95 | 40 | 1.15 | 4096 |
+| Agentic / Tool Use | 1.0 | 0.95 | 40 | 1.15 | 32768 |
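One way to keep the recommended parameters in sync with the `llama-cli` invocation is to store the table rows as presets and generate the argument list from them. This is a minimal sketch, not part of the model's tooling: `PRESETS`, `build_llama_cli_args`, and the `model.gguf` path are hypothetical names, the flags are written in llama.cpp's hyphenated spelling, and mapping "Max New Tokens" to `-n` is an assumption.

```python
import shlex
from typing import Dict, List

# Sampling presets copied from the Recommended Parameters table.
PRESETS: Dict[str, Dict[str, float]] = {
    "reasoning": {"temp": 1.0, "top_p": 0.95, "top_k": 40,
                  "repeat_penalty": 1.15, "max_new_tokens": 32768},
    "chat": {"temp": 0.6, "top_p": 0.95, "top_k": 40,
             "repeat_penalty": 1.15, "max_new_tokens": 4096},
    "agentic": {"temp": 1.0, "top_p": 0.95, "top_k": 40,
                "repeat_penalty": 1.15, "max_new_tokens": 32768},
}


def build_llama_cli_args(model_path: str, use_case: str) -> List[str]:
    """Build a llama-cli argument list for one preset (hypothetical helper)."""
    p = PRESETS[use_case]  # raises KeyError for an unknown use case
    return [
        "llama-cli",
        "-m", model_path,
        "--jinja",
        "-ngl", "999",
        "--repeat-penalty", str(p["repeat_penalty"]),
        "--temp", str(p["temp"]),
        "--top-p", str(p["top_p"]),
        "--top-k", str(p["top_k"]),
        "-n", str(p["max_new_tokens"]),  # assumption: Max New Tokens maps to -n
    ]


# "model.gguf" is a placeholder path, not a file shipped with this repo.
chat_args = build_llama_cli_args("model.gguf", "chat")
print(shlex.join(chat_args))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues when the model path contains spaces.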
+| Version | Description | Access |
+|---------|-------------|--------|
+| **PRISM-LITE** | Abliterated with PRISM-LITE pipeline — removes over-refusal while preserving core capabilities | Free on Hugging Face |
+| **PRISM-PRO** | Full PRISM-PRO ablation: production-level suppression of propaganda/refusal mechanisms with maximum capability retention | [Ko-fi](https://ko-fi.com/s/0a23d1b9a5) |
+## License
+This model is released under the [PRISM Research License](LICENSE.md).
+The base model [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) is released under a [Modified-MIT License](https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE).
+## Acknowledgments
+Based on [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) by [MiniMax AI](https://www.minimax.io).