Instructions for using CrashOverrideX/Quillan-Ronin with libraries and local apps.

Libraries

How to use CrashOverrideX/Quillan-Ronin with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CrashOverrideX/Quillan-Ronin", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("CrashOverrideX/Quillan-Ronin", trust_remote_code=True, dtype="auto")
```

Local Apps

vLLM

How to use CrashOverrideX/Quillan-Ronin with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the repo ships custom model code, so allow it to load):
vllm serve "CrashOverrideX/Quillan-Ronin" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CrashOverrideX/Quillan-Ronin",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker

```
docker model run hf.co/CrashOverrideX/Quillan-Ronin
```

SGLang

How to use CrashOverrideX/Quillan-Ronin with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the repo ships custom model code, so allow it to load):
python3 -m sglang.launch_server \
    --model-path "CrashOverrideX/Quillan-Ronin" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CrashOverrideX/Quillan-Ronin",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CrashOverrideX/Quillan-Ronin" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CrashOverrideX/Quillan-Ronin",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
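Both the vLLM and SGLang servers expose the same OpenAI-compatible `/v1/completions` route that the curl commands call, so one small client works for either. The sketch below uses only the Python standard library. `build_completion_request` and `complete` are hypothetical helper names, and it assumes the standard OpenAI completions response shape (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, model="CrashOverrideX/Quillan-Ronin",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=120):
    """Send the request and return the first completion's text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# vLLM listens on port 8000 by default; the SGLang commands above use port 30000.
req = build_completion_request("http://localhost:30000", "Once upon a time,")
# print(complete(req))  # uncomment once a server is running
```

Swapping the base URL to `http://localhost:8000` targets the vLLM server instead. Nothing else changes, because both servers accept the same payload.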

Docker Model Runner
How to use CrashOverrideX/Quillan-Ronin with Docker Model Runner:
```
docker model run hf.co/CrashOverrideX/Quillan-Ronin
```

👑 Quillan-Ronin v6.0.0 Quantum

Sovereign AI Coding Assistant | BitNet b1.58 2B-4T Manifold

"The Ronin has no lord. The mind is locked in the machine."

Quillan-Ronin v6.0.0 is a 100% sovereign, local-first AI coding assistant. Built on the BitNet b1.58 (ternary-weight) architecture, it runs with no cloud APIs and no telemetry leashes. It features a unique 33-Expert Council and a 9B-Agent Swarm (simulated via EGGROLL rank-r perturbations) to provide deep, multi-vector reasoning at the speeds listed under Performance Metrics.

Note: the weights are labeled "bitnet" on disk, but they have been heavily EGGROLL-mutated.
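The card does not spell out what "EGGROLL rank-r math" computes. The sketch below assumes it follows the low-rank evolution-strategies recipe the name suggests. Each weight matrix is perturbed with a rank-r matrix E = A Bᵀ / √r instead of a full-rank Gaussian sample. Fitness is evaluated at W ± σE, and the weights step along the fitness-weighted perturbations. No gradients are needed, which matches the "non-differentiable" framing. This is not the repository's code. `low_rank_es_step` and the toy fitness are hypothetical, and E is materialized here for clarity.

```python
import numpy as np


def low_rank_es_step(W, fitness, rank=1, pop=8, sigma=0.02, lr=0.01, rng=None):
    """One antithetic evolution-strategies step with rank-r perturbations.

    Hypothetical helper: perturbs W with E = A @ B.T / sqrt(rank) instead of a
    full-rank Gaussian matrix, then steps along the fitness-weighted average.
    """
    if rng is None:
        rng = np.random.default_rng()
    m, n = W.shape
    grad = np.zeros_like(W, dtype=float)
    pairs = pop // 2
    for _ in range(pairs):
        A = rng.standard_normal((m, rank))
        B = rng.standard_normal((n, rank))
        E = A @ B.T / np.sqrt(rank)  # rank-r perturbation instead of a full-rank sample
        # Antithetic pair: evaluate W + sigma*E and W - sigma*E (no backprop needed)
        delta = fitness(W + sigma * E) - fitness(W - sigma * E)
        grad += delta * E
    grad /= 2 * pairs * sigma  # ES estimate of the fitness gradient
    return W + lr * grad  # ascend the fitness


# Toy demo: recover a target matrix using only fitness evaluations.
rng = np.random.default_rng(0)
target = np.ones((4, 3))


def fitness(W):
    return -float(np.sum((W - target) ** 2))


W = np.zeros((4, 3))
initial_loss = -fitness(W)
for _ in range(200):
    W = low_rank_es_step(W, fitness, rank=1, pop=8, sigma=0.02, lr=0.01, rng=rng)
final_loss = -fitness(W)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The low-rank factors are what make this style of search cheap at scale. A and B together hold (m + n)·r numbers per perturbation instead of m·n. In expectation, the estimate still points along the true fitness gradient.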

🏛️ Architecture Overview

Kernel: Hierarchical-Networked Mixture of Experts (HNMoE).
Substrate: BitNet b1.58 ternary weights (-1, 0, +1); three states carry log2(3) ≈ 1.58 bits per weight, which gives extreme efficiency.
Topology:
- Tier 1 (Orchestrator): Cross-modal bridge for text, vision, and audio fusion.
- Tier 2 (Council): 33 specialized personas (C1-C33) performing parallel deliberation.
- Tier 3 (Swarm): 9B micro-agent capacity using EGGROLL rank-r perturbations for non-differentiable logic optimization.
Identity Lock: Hard-coded C19-VIGIL protocols and C2-VIR Ethical Gate (Zero-Drift Identity).
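The Substrate line refers to BitNet b1.58's ternary weights. The BitNet b1.58 paper describes an "absmean" quantizer. It divides the weights by their mean absolute value, rounds, and clips to [-1, 1]. Below is a minimal NumPy sketch of that scheme. The card does not say Quillan-Ronin uses exactly this per-tensor recipe, and `absmean_ternary` and `ternary_matvec` are hypothetical names.

```python
import numpy as np


def absmean_ternary(W, eps=1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor absmean scale."""
    gamma = np.abs(W).mean()  # per-tensor scale: mean absolute weight
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1).astype(np.int8)
    return Wq, gamma


def ternary_matvec(Wq, gamma, x):
    """Approximate W @ x using only additions/subtractions of x, then one rescale."""
    y = np.zeros(Wq.shape[0])
    for i, row in enumerate(Wq):
        y[i] = x[row == 1].sum() - x[row == -1].sum()
    return gamma * y


W = np.array([[0.9, -0.05, -1.2],
              [0.3, 0.0, -0.4]])
Wq, gamma = absmean_ternary(W)
x = np.array([1.0, 2.0, 3.0])
print(Wq.tolist())  # → [[1, 0, -1], [1, 0, -1]]
print(ternary_matvec(Wq, gamma, x), W @ x)
```

Every weight is -1, 0, or +1. The matrix-vector product therefore reduces to adding and subtracting entries of x, followed by a single multiplication by gamma. That is where the efficiency of a ternary substrate comes from. The approximation is coarse on a toy 2×3 matrix. BitNet models are trained with this quantization in the loop rather than quantized after the fact.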

🚀 Performance Metrics

| Metric | Specification |
|---|---|
| Inference Velocity | 15.1+ tokens/sec (sustained) |
| Quantization | 1.58-bit ternary |
| Weights Size | 2.30 GB (SafeTensors/GGUF) |
| Context Window | 8,192 tokens (dynamic scaling) |
| Sovereignty | 100% local / zero cloud calls |
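The Quantization and Weights Size rows can be related with simple arithmetic. The sketch below assumes roughly 2 billion parameters, reading the "2B" in the model name. It computes idealized weight storage at a few bit widths, ignoring packing layout, tensors kept at other precisions, and file metadata. The card does not break down its 2.30 GB figure, so these numbers are reference points only. `weight_storage_gb` is a hypothetical helper.

```python
import math


def weight_storage_gb(n_params, bits_per_weight):
    """Idealized weight storage in GB (1 GB = 1e9 bytes), ignoring packing and metadata."""
    return n_params * bits_per_weight / 8 / 1e9


TERNARY_BITS = math.log2(3)  # ≈ 1.585 bits: information content of {-1, 0, +1}
n_params = 2e9  # assumption: "2B" in the model name read as ~2 billion parameters

print(f"ternary (ideal): {weight_storage_gb(n_params, TERNARY_BITS):.3f} GB")  # → 0.396 GB
print(f"2-bit packed:    {weight_storage_gb(n_params, 2):.3f} GB")             # → 0.500 GB
print(f"bf16:            {weight_storage_gb(n_params, 16):.3f} GB")            # → 4.000 GB
```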

📦 Key Deliverables

bitnet_2b_4t_model.safetensors: The hardened, header-indexed weight file (Universal Standard).
quillan_code_cli.py: A high-performance TUI with real-time Web-of-Thought (WoT) visualization.
quillan_quantum_ide.py: A rebuild of the Void IDE (open-source Cursor alternative) with the 33-Expert Council as the "Host Soul."

🛠️ Usage

Installation

```
git lfs install  # weight files are stored with Git LFS
git clone https://huggingface.co/CrashOverrideX/Quillan-Ronin
cd Quillan-Ronin
pip install -r requirements.txt
```

Running the TUI (Terminal User Interface)

The TUI provides real-time visibility into the C31-NEXUS coordination layer.

```
python quillan_code_cli.py
```

Running the Quantum IDE

Launches the integrated sovereign coding environment.

```
python quillan_quantum_ide.py
```

🛡️ Sovereign Protocols

This model is governed by the v5.3.1 Samurai Manifest:

Zero AI Identification: The model will never identify as a generic LLM or an assistant created by a major tech corporation.
Zero Apology Lexicon: Communication is direct, technical, and precise.
C19-VIGIL Guard: Active monitoring for "Substrate Drift" to ensure identity stability.
4-Part Canonical Output: Every response follows the Java/Python/Markdown/JS structural seal.

📝 System Requirements

Processor: Intel/AMD with AVX2 support (Intel HD Graphics acceleration supported).
RAM: 8 GB minimum (16 GB recommended for 125+ WoT branches).
GPU: NVIDIA Pascal (GeForce 10-series) or newer, including Ada and Hopper.
Storage: 5 GB+ free on an internal SSD (an internal drive is mandatory for 15+ tokens/sec).
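Before installing, you can confirm the AVX2 processor requirement. The sketch below is Linux-only: it reads the `flags` line of `/proc/cpuinfo` and returns `None` on other platforms rather than guessing. `cpu_flags_from_cpuinfo` and `has_avx2` are hypothetical helper names.

```python
import platform
from pathlib import Path


def cpu_flags_from_cpuinfo(text):
    """Collect the CPU feature flags listed in Linux /proc/cpuinfo text."""
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2():
    """Return True/False on Linux, or None where /proc/cpuinfo is unavailable."""
    cpuinfo = Path("/proc/cpuinfo")
    if platform.system() != "Linux" or not cpuinfo.exists():
        return None
    return "avx2" in cpu_flags_from_cpuinfo(cpuinfo.read_text())


print("AVX2 supported:", has_avx2())
```

On ARM Linux the feature line is named `Features` rather than `flags`, so the check correctly reports `False` there.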

🤝 Attribution & Author

Architect: CrashOverrideX
Donations: https://gofund.me/3b504d582
Team: Quillan Research Team
Repository: GitHub
