Instructions for using Drissman/hermythos-rdt with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use Drissman/hermythos-rdt with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

# Downloads the GGUF file from the Hub (requires huggingface-hub) and loads it.
llm = Llama.from_pretrained(
    repo_id="Drissman/hermythos-rdt",
    filename="bonsai-rdt-q4_k_m.gguf",
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response["choices"][0]["message"]["content"])
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Drissman/hermythos-rdt with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Drissman/hermythos-rdt:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Drissman/hermythos-rdt:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Drissman/hermythos-rdt:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Drissman/hermythos-rdt:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Drissman/hermythos-rdt:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Drissman/hermythos-rdt:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Drissman/hermythos-rdt:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Drissman/hermythos-rdt:Q4_K_M
Use Docker
docker model run hf.co/Drissman/hermythos-rdt:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Drissman/hermythos-rdt with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Drissman/hermythos-rdt"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Drissman/hermythos-rdt",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Drissman/hermythos-rdt:Q4_K_M
- Ollama
How to use Drissman/hermythos-rdt with Ollama:
ollama run hf.co/Drissman/hermythos-rdt:Q4_K_M
- Unsloth Studio
How to use Drissman/hermythos-rdt with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Drissman/hermythos-rdt to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Drissman/hermythos-rdt to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Drissman/hermythos-rdt to start chatting
- Pi
How to use Drissman/hermythos-rdt with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Drissman/hermythos-rdt:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Drissman/hermythos-rdt:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Drissman/hermythos-rdt with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Drissman/hermythos-rdt:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Drissman/hermythos-rdt:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use Drissman/hermythos-rdt with Docker Model Runner:
docker model run hf.co/Drissman/hermythos-rdt:Q4_K_M
- Lemonade
How to use Drissman/hermythos-rdt with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Drissman/hermythos-rdt:Q4_K_M
Run and chat with the model
lemonade run user.hermythos-rdt-Q4_K_M
List all available models
lemonade list
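Both llama-server and vllm serve expose the OpenAI-compatible /v1/chat/completions endpoint used in the curl example above. A minimal stdlib-only Python client sketch, assuming a server is already running on its default port (8080 for llama-server, 8000 for vLLM); the helper names build_chat_request, extract_reply and ask are hypothetical:

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    body = json.loads(response_body)
    return body["choices"][0]["message"]["content"]


def ask(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one prompt to a running server and return its reply (needs a live server)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

With llama-server running, ask("http://localhost:8080/v1", "Drissman/hermythos-rdt:Q4_K_M", "What is the capital of France?") returns the reply text; for vLLM, use port 8000 and the model name "Drissman/hermythos-rdt". Keeping request construction separate from the network call lets the payload logic be checked without a server.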
HERMYTHOS: Agentic AI Engine + Ternary RDT Model
HERMES (Rust agentic engine) + MYTHOS (1.58-bit ternary RDT model). One sovereign binary. Zero Big Tech dependence.
Qwen3-8B base → Ternary Bonsai weights → QAT fine-tuned → Q4_K_M quantized
4.68 GB · 4.90 BPW · 399 GGUF tensors · 73.5% QAT accuracy · CPU-native
Why HERMYTHOS exists
Every frontier model runs on someone else's cloud. Every agent platform phones home. Every fine-tune assumes NVIDIA.
HERMYTHOS breaks all three assumptions.
- Sovereign inference: runs CPU-only on a laptop (5 tok/s on Intel Core Ultra, Q4_K_M). No GPU required. No API key needed.
- Open weights: 1.58-bit ternary model derived from Qwen3-8B via QAT on an H100, quantized to Q4_K_M (4.7 GB). You own every parameter.
- Agent-native engine: 22 Rust crates (29 total), cybernetic loop with 8 tools, 3 frontends (TUI / Flutter Web / Open WebUI). Zero Python at runtime.
The model itself is only half the equation. The Rust engine (hermythos-server) provides the agentic scaffolding: tool execution, FSM-based cybernetic loops, memory persistence, and multi-agent LLM debate via RecursiveMAS.
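The engine's loop is written in Rust, and its actual states and tool interfaces are not documented in this card. As a rough illustration only, the following hypothetical Python sketch models an FSM-style agent loop (think, act, observe, repeat until the model answers) with a stubbed model function; the State names, the Decision tuple format and run_agent are made up for the example and are not the hermythos-server design:

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple

# Hypothetical decision format returned by the model:
#   ("tool", tool_name, argument)  or  ("answer", final_text, "")
Decision = Tuple[str, str, str]


class State(Enum):
    THINK = auto()    # ask the model for the next decision
    ACT = auto()      # execute the chosen tool
    OBSERVE = auto()  # record the tool result in memory
    DONE = auto()     # the model produced a final answer


def run_agent(model: Callable[[List[str]], Decision],
              tools: Dict[str, Callable[[str], str]],
              task: str,
              max_steps: int = 16) -> str:
    """Drive the THINK -> ACT -> OBSERVE cycle until DONE or the step budget runs out."""
    memory: List[str] = [f"task: {task}"]
    state = State.THINK
    decision: Decision = ("", "", "")
    result = ""
    for _ in range(max_steps):
        if state is State.THINK:
            decision = model(memory)
            state = State.ACT if decision[0] == "tool" else State.DONE
        elif state is State.ACT:
            _, tool_name, argument = decision
            result = tools[tool_name](argument)
            state = State.OBSERVE
        elif state is State.OBSERVE:
            memory.append(f"{decision[1]}({decision[2]}) -> {result}")
            state = State.THINK
        if state is State.DONE:
            return decision[1]
    raise RuntimeError("step budget exhausted before the model answered")
```

The explicit step budget keeps a model that never answers from looping forever; a real loop would also need tool error handling and memory limits, which are omitted here.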
Quick Install
# One command. Downloads the model + engine.
curl -fsSL https://raw.githubusercontent.com/drissman/hermythos/main/scripts/install.sh | bash
Or manually:
# 1. Get the model
hf download Drissman/hermythos-rdt bonsai-rdt-q4_k_m.gguf --local-dir ./models
# 2. Clone the engine
git clone https://github.com/drissman/hermythos
cd hermythos
cargo run -p hermythos-server --release -- --model ../models/bonsai-rdt-q4_k_m.gguf
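Before launching the engine, it can help to confirm that the downloaded file is a complete GGUF file. A minimal sketch, assuming the GGUF v2/v3 header layout (the 4-byte magic GGUF, then a little-endian uint32 version, a uint64 tensor count and a uint64 metadata key/value count); parse_gguf_header and read_gguf_header are hypothetical names:

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # magic (4) + version (4) + tensor count (8) + metadata KV count (8)


def parse_gguf_header(data: bytes) -> dict:
    """Decode the fixed GGUF v2/v3 header from the first 24 bytes of a file."""
    if len(data) < HEADER_SIZE or data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", data[4:HEADER_SIZE])
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


def read_gguf_header(path: Path) -> dict:
    """Read only the header bytes, so checking a multi-GB model file is instant."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

For example, read_gguf_header(Path("models/bonsai-rdt-q4_k_m.gguf")), run from the directory where hf download was executed, returns the format version and tensor count; a ValueError usually points to a wrong or partial file.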
Technical Specs
| Spec | Value |
|---|---|
| Architecture | Qwen3-8B → Ternary Bonsai (BitLinear 1.58-bit) |
| Base model | prism-ml/Ternary-Bonsai-8B-unpacked |
| Training | QAT LoRA (rank 128), 3 epochs, 150 ChatML examples |
| Loss | 4.94 → 3.43 (QAT on H100 GPU) |
| Accuracy | 21.8% → 73.5% (ternary fidelity) |
| Layers patched | 252 (all Linear → BitLinear {-1, 0, +1}) |
| Quantization | Q4_K_M via llama.cpp (399 tensors, 642 s) |
| Final size | 4.68 GB (down from 15.6 GB FP16) |
| BPW | 4.90 bits per weight |
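Three numbers in this table can be cross-checked with simple arithmetic, under assumptions that the table does not state: standard Qwen3-8B dimensions (36 transformer blocks, roughly 8.2 billion parameters, an output head not tied to the embedding) and llama.cpp's GGUF tensor naming. If those hold, 7 linear projections per block give the 252 patched layers, 11 tensors per block plus 3 model-level tensors give 399 GGUF tensors, and 4.90 BPW over 8.2B parameters gives 4.68 when measured in GiB (2^30 bytes):

```python
# Assumed Qwen3-8B layout (not stated in the table above): 36 transformer
# blocks, ~8.2e9 parameters, and an output projection not tied to the embedding.
N_BLOCKS = 36

# Linear projections per block (llama.cpp GGUF name stems, ".weight" omitted).
LINEAR_PER_BLOCK = ["attn_q", "attn_k", "attn_v", "attn_output",
                    "ffn_gate", "ffn_up", "ffn_down"]
# Per-block RMSNorm weights, including Qwen3's q/k norms.
NORMS_PER_BLOCK = ["attn_norm", "attn_q_norm", "attn_k_norm", "ffn_norm"]
# Model-level tensors: token embedding, final norm, output head.
GLOBAL_TENSORS = ["token_embd", "output_norm", "output"]

# Every linear projection is a candidate for Linear -> BitLinear patching.
linear_layers = N_BLOCKS * len(LINEAR_PER_BLOCK)
# Every tensor written to the GGUF file (what llama-quantize iterates over).
gguf_tensors = (N_BLOCKS * (len(LINEAR_PER_BLOCK) + len(NORMS_PER_BLOCK))
                + len(GLOBAL_TENSORS))


def file_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GiB (2**30 bytes): params * BPW / 8 bytes."""
    return n_params * bits_per_weight / 8 / 2**30


print(linear_layers, gguf_tensors, round(file_size_gib(8.2e9, 4.90), 2))
```

Running it prints 252 399 4.68, which suggests the "399" in the quantization row counts tensors rather than transformer layers.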
Why Ternary Matters
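Standard LLMs store each weight as a 16-bit float, so an 8B model needs about 16 GB. Ternary weights take only three values, {-1, 0, +1}, which carry log2(3) ≈ 1.58 bits of information each: roughly 10× denser than FP16 (16 / 1.58 ≈ 10.1). They also remove multiplication from the weight matrix products, since each weight either adds, skips, or subtracts an activation.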
FP16 matmul: multiply-add-multiply-add... (expensive)
Ternary matmul: add-skip-subtract... (just additions)
This means:
- Runs on CPU: no GPU required. A laptop-grade Core Ultra reaches 5 tok/s.
- Runs on RISC-V: no CUDA dependency, no NVIDIA lock-in.
- About 3.3× smaller than FP16: the 4.68 GB Q4_K_M file (down from 15.6 GB) fits in the RAM and on the disk of most machines built after 2015.
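The add-skip-subtract pattern can be illustrated with a plain matrix-vector product. This is a minimal sketch, assuming a single floating-point scale per matrix as in BitNet b1.58-style quantization; it shows the arithmetic only and is not the hermythos-compute AVX2 kernel:

```python
from typing import List, Sequence


def ternary_matvec(weights: Sequence[Sequence[int]], x: Sequence[float],
                   scale: float = 1.0) -> List[float]:
    """Compute scale * (W @ x) for W with entries in {-1, 0, +1}.

    The inner loop has no multiplications: +1 adds x[j], -1 subtracts it,
    0 skips it. Only the final per-row scaling multiplies.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xj in zip(row, x):
            if w == 1:
                acc += xj
            elif w == -1:
                acc -= xj
            elif w != 0:
                raise ValueError(f"non-ternary weight {w}")
        out.append(acc * scale)  # one multiply per output element for the scale
    return out
```

Activations remain real numbers and the scale still costs one multiply per output element, so it is the weight-side multiplications that disappear.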
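The trade-off is training complexity: ternary quantization requires QAT (Quantization-Aware Training) with a Straight-Through Estimator. This model was fine-tuned on an H100, but inference runs anywhere. Note that the released GGUF is quantized to Q4_K_M (4.90 BPW), so llama.cpp runs it with its standard k-quant kernels rather than an add-only ternary kernel.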
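The QAT trade-off can be sketched on a single linear unit. Assumptions: BitNet b1.58-style absmean ternarization (the quantizer actually used for Ternary Bonsai is not stated here), plain SGD on a squared error, and no LoRA adapters, unlike the PEFT-based run credited in this card. The forward pass uses ternary weights; the Straight-Through Estimator treats rounding and clipping as the identity, so gradients update full-precision latent weights:

```python
from typing import List, Tuple


def absmean_ternarize(w: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Absmean quantizer (BitNet b1.58 style, assumed): w ~ gamma * q, q in {-1, 0, +1}."""
    gamma = sum(abs(v) for v in w) / len(w)
    q = [max(-1, min(1, round(v / (gamma + eps)))) for v in w]
    return q, gamma


def qat_step(w: List[float], x: List[float], target: float,
             lr: float) -> Tuple[List[float], float]:
    """One SGD step for a single linear unit y = gamma * (q . x) with squared error.

    Forward: uses the ternary weights q.
    Backward (STE): round/clip are treated as the identity, so the gradient of the
    effective weights is applied to the latent float weights w. For simplicity the
    dependence of gamma on w is ignored.
    """
    q, gamma = absmean_ternarize(w)
    y = gamma * sum(qi * xi for qi, xi in zip(q, x))  # quantized forward pass
    loss = (y - target) ** 2
    grad_y = 2.0 * (y - target)                        # dLoss/dy
    # dy/d(effective weight i) = x_i; the STE passes it straight to w_i.
    new_w = [wi - lr * grad_y * xi for wi, xi in zip(w, x)]
    return new_w, loss
```

The latent weights accumulate small updates that would be lost if the ternary values were trained directly, which is why QAT keeps both copies during training and ships only the quantized one.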
Architecture โ 3-Layer Stack
┌──────────────────────────────────────────┐
│ UI LAYER - 3 Frontends                   │
│ TUI (ratatui) · Flutter Web · Open WebUI │
└──────────────┬───────────────────────────┘
               │ WebSocket / OpenAI API
┌──────────────▼───────────────────────────┐
│ ORCHESTRATION - Rust/Tokio (22 crates)   │
│ Agent Core · Tools · Memory · Skills     │
│ RecursiveMAS · MDASH · Cluster · Faber   │
└──────────────┬───────────────────────────┘
               │ GGUF · llama.cpp
┌──────────────▼───────────────────────────┐
│ MODEL - BonsaiRDT Ternary                │
│ Qwen3-8B base · BitLinear · 252 layers   │
│ 1.58-bit weights · 4.68 GB Q4_K_M        │
└──────────────────────────────────────────┘
The engine (22 crates) includes:
- hermythos-server: WebSocket + OpenAI-compatible API backend
- hermythos-mas: RecursiveMAS multi-agent topologies
- hermythos-cluster: distributed GRPO + debate orchestration
- hermythos-compute: TERNARY format, CPU backend (AVX2)
- faber-*: sovereign data platform (Faber Foundry, MIT)
270+ tests, all passing under cargo test --workspace.
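The on-disk TERNARY format of hermythos-compute is not documented in this card. As an illustration only, one common way to store ternary weights close to the log2(3) ≈ 1.58-bit bound is base-3 packing: five trits fit in one byte because 3^5 = 243 ≤ 256, giving 8 / 5 = 1.6 bits per weight. The function names pack_trits and unpack_trits are hypothetical:

```python
from typing import List


def pack_trits(weights: List[int]) -> bytes:
    """Pack ternary weights {-1, 0, +1} five per byte in base 3 (3**5 = 243 <= 256).

    The first weight of each group is the least significant trit. A short final
    group leaves the high trits of the last byte unused; unpacking ignores them.
    """
    out = bytearray()
    for start in range(0, len(weights), 5):
        group = weights[start:start + 5]
        value = 0
        for w in reversed(group):
            if w not in (-1, 0, 1):
                raise ValueError(f"non-ternary weight {w}")
            value = value * 3 + (w + 1)  # map -1, 0, +1 to digits 0, 1, 2
        out.append(value)
    return bytes(out)


def unpack_trits(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_trits: recover the first `count` weights."""
    weights: List[int] = []
    for byte in packed:
        for _ in range(5):
            weights.append(byte % 3 - 1)  # digit 0, 1, 2 back to -1, 0, +1
            byte //= 3
    return weights[:count]
```

Decoding costs a few integer divisions per byte, so a simpler 2-bit-per-weight layout trades some density for cheaper unpacking; which trade-off hermythos-compute makes is not stated.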
Performance
| Backend | Hardware | tok/s |
|---|---|---|
| llama.cpp CPU | Intel Core Ultra 7 165U (10-core) | 5.09 |
| llama.cpp GPU (Intel Arc OpenCL) | WSL2 D3D12 translation | 1.80 (slower than CPU) |
| llama.cpp CPU | AMD Ryzen 9 | ~12 (estimated) |
Recommendation: on WSL2, run CPU-only. The GPU path is slower because of D3D12 translation overhead.
Roadmap: 100 Days
✅ Distribution → ⬜ Documentation → ⬜ External Testing → ⬜ Auto-Evolution
| Day | Phase | Status |
|---|---|---|
| 1-10 | Distribution (HF repo, install script, QAT) | ✅ DONE |
| 11-25 | Documentation (README, quickstart, architecture) | In progress (NOW) |
| 26-50 | External testing + feedback loop | ⬜ |
| 51-80 | Continual Self-Evolution (IT16) | ⬜ |
| 81-100 | Sovereignty (offline mode, zero cloud, Faber) | ⬜ |
Persona-Driven Design
HERMYTHOS ships with 6 interaction modes tuned by BMAD (Brain-inspired Multi-Agent Distillation):
| Mode | Style | Use |
|---|---|---|
| TDA | Dense, matrices, sharp decisions | Technical architecture |
| Bilan (Review) | KPIs, gap analysis, retrospectives | Project review |
| Coaching | Mentorship, concrete plans | Onboarding |
| Personnel (Personal) | Individual optimization | Self-improvement |
| Général (General) | Conceptual explanations | Discovery |
| Cybernétique (Cybernetic) | 2nd-order systemic analysis | Meta-cognition |
Credits
- Architecture & Training: Driss NAAMANE (Senior Cloud Architect / TDA)
- Base Model: Qwen3-8B (Alibaba) + Ternary Bonsai (prism-ml)
- QAT Pipeline: H100 RunPod, TRL + PEFT
- Quantization: llama.cpp Q4_K_M
- Engine: 22 Rust crates, 270+ tests, Apache 2.0 / MIT dual-licensed
License
Model weights: Apache 2.0 (inherited from Qwen3)
Engine: MIT
"A model you own. An engine you control. A platform no one can take away from you."