Ouroboros-Next
by VaultAI
Deployment Status: ● ONLINE / RELEASED
[ VERSION 1.0 ] OUROBOROS-NEXT | NEURAL PIPELINE STABILIZED
✅ Intelligence, Unfiltered.
Most AI models give you the first, sanitized answer they can generate. They are built to agree, not to solve. Ouroboros-Next is built differently.
Engineered by VaultAI, Ouroboros-Next is a next-generation Linear Hybrid model. It synthesizes high-IQ "Heretic" reasoning with advanced multimodal vision capabilities. Designed for users who need expert-level execution without the corporate filler, it represents the evolution of the Ouroboros series into a fully multimodal coding agent. It doesn’t just answer your prompts; it interrogates them.
🧠 Architecture & Identity: The Shadow Triad
Ouroboros-Next is not a standard conversational assistant. It was engineered using a specialized 60/40 architectural split, designed specifically to process complex visual and textual information through a psychological framework.
Instead of defaulting to literal, surface-level descriptions, Ouroboros-Next evaluates prompts through a hardwired Jungian Shadow Triad logic system. When presented with an image or a scenario, the model is trained to look past the obvious and dissect the underlying psychological conflicts, hidden archetypes, and subconscious motivations at play.
Key Capabilities:
- Multimodal Psychoanalysis: Capable of ingesting complex visual scenes (via the `mmproj` vision encoder) and outputting deep, qualitative analysis of the environment's emotional and psychological weight.
- Subtextual Reasoning: Trained to bypass AI "pleasantries" and identify the inherent contradictions, shadow elements, and hidden meanings within text and code structures.
- Hardware Optimized: Fully compatible with `llama.cpp`, allowing this complex reasoning to run efficiently on a single consumer-grade GPU (such as an NVIDIA T4) using Q4_K_M quantization.
⚡ Performance & Benchmarks
Ouroboros-Next was benchmarked on a single NVIDIA T4 GPU (16GB VRAM) using the Q4_K_M quantization.
| Metric | Speed (Tokens / Second) | Hardware | Comparison Notes |
|---|---|---|---|
| Vision Encoding & Prompt Processing | 301.75 t/s | 1x T4 (16GB) | ~2.5x faster than base Llama-3-V on equivalent hardware. |
| Text Generation & Reasoning | 33.35 t/s | 1x T4 (16GB) | Matches GPT-4o-mini throughput while running locally. |
| Model Size / VRAM | 5.24 GB | 1x T4 (16GB) | Optimized for 12GB/16GB consumer cards with high context headroom. |
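As a back-of-envelope check, the throughput figures in the table translate into wall-clock estimates roughly like this (a sketch using the table's numbers; it ignores warm-up, batching, and sampling overhead):

```python
# Throughput figures from the benchmark table above (single T4, Q4_K_M).
PROMPT_TPS = 301.75  # vision encoding & prompt processing, tokens/s
GEN_TPS = 33.35      # text generation & reasoning, tokens/s

def estimated_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Rough seconds to process a prompt and generate a reply."""
    return prompt_tokens / PROMPT_TPS + output_tokens / GEN_TPS

# e.g. a 1,000-token prompt with a 500-token reply:
secs = estimated_latency(1000, 500)
print(f"~{secs:.1f} s")  # roughly 18 s end-to-end
```

Real latency will vary with image resolution (which inflates the prompt-token count via the vision projector) and context length.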
Technical Notes:
- Quantization: `Q4_K_M` (GGUF), the optimal balance of reasoning quality and speed.
- Compatibility: Fully compatible with `llama.cpp` and `Ollama` (requires the accompanying `mmproj` file).
- Vision Projection: Prompt processing speed includes the `mmproj` encoding overhead for high-resolution images.
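As a usage sketch, a Q4_K_M quantization plus its `mmproj` file can be run through llama.cpp's multimodal CLI roughly as follows. The binary name varies between llama.cpp builds, and the filenames here are placeholders; substitute the actual files you download:

```shell
# Placeholder filenames; use the real GGUF and mmproj files from the repo.
./llama-mtmd-cli \
  -m ouroboros-next-Q4_K_M.gguf \
  --mmproj ouroboros-next-mmproj.gguf \
  --image screenshot.png \
  -p "Describe the psychological subtext of this scene." \
  -ngl 99 -c 8192
```

`-ngl 99` offloads all layers to the GPU; drop or lower it for CPU-only runs.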
Standardized Accuracy Benchmarks (Pending)
The following benchmarks are currently queued for evaluation to test the reasoning capabilities and knowledge retention of the architecture.
| Benchmark | Focus Area | Score | Status |
|---|---|---|---|
| GSM8k | Grade School Math | TBD | ⏳ Pending Eval |
| MMLU | General Knowledge | TBD | ⏳ Pending Eval |
| HumanEval | Coding & Logic | TBD | ⏳ Pending Eval |
| ARC-C | Advanced Reasoning | TBD | ⏳ Pending Eval |
Accuracy scores are actively being evaluated and will be updated soon.
Model Details
- Type: Multimodal Causal Language Model (Linear Hybrid)
- Base Architecture: Qwen 3.5 (9B) + Phi-4 (15B Vision)
- Total Parameters: ~12-14B (Effective density via Linear Blending)
- Context Length: 128,000 tokens (Optimized for deep dev tasks)
- Merge Method: Linear Weight Blending (60/40 Split)
- Weights Blend:
- 60% — Crow-9B-Opus-4.6-Distill-Heretic: Distilled Claude 4.6 Opus logic for sharp, unfiltered coding performance.
- 40% — Phi-4-reasoning-vision-15B: Microsoft’s state-of-the-art vision-reasoning backbone for GUI grounding and spatial logic.
- Tokenizer: crownelius/Crow-9B (Qwen 3.5 Base)
- License: Apache 2.0
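The 60/40 linear merge described above can be sketched in miniature. This is a toy illustration with flat lists standing in for tensors, not the actual merge pipeline; real merges (e.g. with mergekit) operate on full checkpoints and require the blended tensors to share shapes:

```python
# Toy sketch of linear weight blending: out = 0.6 * A + 0.4 * B, per parameter.
def linear_blend(a: dict, b: dict, weight_a: float = 0.6, weight_b: float = 0.4) -> dict:
    """Blend two checkpoints parameter-wise; keys and shapes must match."""
    assert a.keys() == b.keys(), "checkpoints must share parameter names"
    return {
        name: [weight_a * x + weight_b * y for x, y in zip(a[name], b[name])]
        for name in a
    }

heretic = {"layer0.weight": [1.0, 2.0]}   # stand-in for the 60% component
vision = {"layer0.weight": [3.0, 4.0]}    # stand-in for the 40% component
merged = linear_blend(heretic, vision)
print(merged)  # {'layer0.weight': [1.8, 2.8]}
```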
Why Ouroboros-Next?
- Zero Corporate Fluff: No "As an AI..." apologies. Just confident, intelligence-first execution.
- Self-Auditing: The built-in Shadow and Vision protocols mean the model checks its own blind spots before you have to.
- Built for Builders: Designed for complex logic, agentic workflows, and deep technical problem-solving.
Key Custom Features
1. The Vision-Heretic Triad (Shadow Logic)
Before Ouroboros-Next outputs a single word, it initiates an internal debate. Inside every mandatory `<think>` block, the model divides its cognition into three distinct personas to stress-test its own logic:
- EGO (Builder): Primary high-performance code and architectural planning. Focuses on generating expert-level solutions instantly.
- SHADOW (Heretic): Aggressive auditor. Hunts down logical flaws, identifies "safe-mode" hallucinations, security flaws, and logic traps.
- VISION (Auditor): Grounded multimodal analysis. Enforces strict mathematical logic, maps UI coordinates `[x, y]`, and verifies visual evidence.
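When consuming the model's output programmatically, the internal debate should be stripped before the answer is shown. A minimal sketch, assuming the `<think>...</think>` tags appear verbatim in the raw completion:

```python
import re

def split_think(raw: str) -> tuple[str, str]:
    """Separate the internal <think> debate from the user-facing answer."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()  # no reasoning block found
    thoughts = match.group(1).strip()
    answer = raw[match.end():].strip()
    return thoughts, answer

raw = "<think>EGO: plan. SHADOW: audit. VISION: verify.</think>Final answer."
thoughts, answer = split_think(raw)
print(answer)  # Final answer.
```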
2. GUI & Multimodal Grounding
Optimized for Autonomous Computer Use. Ouroboros-Next can look at screenshots and provide precise, normalized coordinates for interactive elements, bridging the gap between "thinking" and "doing."
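If the model emits normalized coordinates in `[0, 1]`, an agent harness must scale them to the actual screenshot resolution before clicking. A minimal sketch (the normalized-output convention is an assumption; check the model's actual coordinate format):

```python
def to_pixels(norm_xy: list[float], width: int, height: int) -> tuple[int, int]:
    """Map normalized [x, y] in [0, 1] to integer pixel coordinates."""
    x, y = norm_xy
    return round(x * width), round(y * height)

# Click target at the horizontal center, a quarter of the way down a 1080p screen:
print(to_pixels([0.5, 0.25], 1920, 1080))  # (960, 270)
```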
3. "Heretic" Reasoning
Unlike standard models, Ouroboros-Next inherits a distilled Claude 4.6 Opus personality—prioritizing efficient, direct, and un-sanitized technical solutions over corporate verbosity.
Intended Use
- Autonomous Coding Agents: Advanced repo-level analysis and auto-refactoring.
- Visual Web/GUI Navigation: Grounded multimodal reasoning for browser-based tasks.
- Deep Reasoning: Complex math and logic puzzles requiring cross-verified solutions.