Instructions to use rawcell/bruno-swarm-models with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use rawcell/bruno-swarm-models with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="rawcell/bruno-swarm-models", filename="backend-3b-f16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use rawcell/bruno-swarm-models with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf rawcell/bruno-swarm-models:F16 # Run inference directly in the terminal: llama-cli -hf rawcell/bruno-swarm-models:F16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf rawcell/bruno-swarm-models:F16 # Run inference directly in the terminal: llama-cli -hf rawcell/bruno-swarm-models:F16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf rawcell/bruno-swarm-models:F16 # Run inference directly in the terminal: ./llama-cli -hf rawcell/bruno-swarm-models:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf rawcell/bruno-swarm-models:F16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf rawcell/bruno-swarm-models:F16
Use Docker
docker model run hf.co/rawcell/bruno-swarm-models:F16
- LM Studio
- Jan
- Ollama
How to use rawcell/bruno-swarm-models with Ollama:
ollama run hf.co/rawcell/bruno-swarm-models:F16
- Unsloth Studio new
How to use rawcell/bruno-swarm-models with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for rawcell/bruno-swarm-models to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for rawcell/bruno-swarm-models to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for rawcell/bruno-swarm-models to start chatting
- Pi new
How to use rawcell/bruno-swarm-models with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf rawcell/bruno-swarm-models:F16
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "rawcell/bruno-swarm-models:F16" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use rawcell/bruno-swarm-models with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf rawcell/bruno-swarm-models:F16
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default rawcell/bruno-swarm-models:F16
Run Hermes
hermes
- Docker Model Runner
How to use rawcell/bruno-swarm-models with Docker Model Runner:
docker model run hf.co/rawcell/bruno-swarm-models:F16
- Lemonade
How to use rawcell/bruno-swarm-models with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull rawcell/bruno-swarm-models:F16
Run and chat with the model
lemonade run user.bruno-swarm-models-F16
List all available models
lemonade list
Bruno Swarm Models
7 abliterated Qwen2.5-Coder models for multi-agent software development using CrewAI + Ollama.
Created with Bruno - neural behavior modification via contrastive activation analysis and orthogonalization.
Models
| Model | Base | Size | Role |
|---|---|---|---|
orchestrator-14b-f16.gguf |
Qwen2.5-Coder-14B-Instruct | 28 GB | Senior Architect / Project Manager |
frontend-3b-f16.gguf |
Qwen2.5-Coder-3B-Instruct | 5.8 GB | React / TypeScript / Tailwind |
backend-3b-f16.gguf |
Qwen2.5-Coder-3B-Instruct | 5.8 GB | FastAPI / PostgreSQL / async |
test-3b-f16.gguf |
Qwen2.5-Coder-3B-Instruct | 5.8 GB | pytest / coverage / edge cases |
security-3b-f16.gguf |
Qwen2.5-Coder-3B-Instruct | 5.8 GB | OWASP / vulnerability assessment |
docs-3b-f16.gguf |
Qwen2.5-Coder-3B-Instruct | 5.8 GB | API docs / README / guides |
devops-3b-f16.gguf |
Qwen2.5-Coder-3B-Instruct | 5.8 GB | Docker / CI-CD / IaC |
Total: ~63 GB (all F16 precision GGUF)
Abliteration Details
Each model was independently abliterated using Bruno to reduce refusal behavior while preserving coding capabilities. The 6 specialists share the same base model (Qwen2.5-Coder-3B-Instruct) but have different abliteration weights from separate optimization runs.
Orchestrator (14B):
- KL divergence: 0.47 (from base)
- Refusal reduction: 63/67 prompts answered (6% reduction)
- Optuna trials: 50
Specialists (3B):
- Each independently optimized for their domain
- All retain full coding capability
Quick Start
1. Download models and Modelfiles
# Install git-lfs
git lfs install
# Clone (63 GB download)
git clone https://huggingface.co/rawcell/bruno-swarm-models
cd bruno-swarm-models
2. Import into Ollama
Update the FROM paths in each Modelfile to point to your local GGUF files, then:
# Import each model
ollama create orchestrator -f modelfiles/Modelfile.orchestrator
ollama create frontend -f modelfiles/Modelfile.frontend
ollama create backend -f modelfiles/Modelfile.backend
ollama create test -f modelfiles/Modelfile.test
ollama create security -f modelfiles/Modelfile.security
ollama create docs -f modelfiles/Modelfile.docs
ollama create devops -f modelfiles/Modelfile.devops
3. Run with bruno-swarm CLI
pip install bruno-ai[swarm]
bruno-swarm run --task "Build a REST API with authentication"
Or use flat mode to select specific specialists:
bruno-swarm run --task "Write unit tests for auth module" --flat --agents test,security
Ollama Configuration
For multi-model operation, set these environment variables before starting Ollama:
export OLLAMA_MAX_LOADED_MODELS=3
export OLLAMA_KEEP_ALIVE=30m
Hardware Requirements
- Full swarm (hierarchical): 40+ GB VRAM (orchestrator 28GB + 1 specialist at a time)
- Specialists only (flat): 8+ GB VRAM (one 3B model at a time)
- All models loaded: 63 GB VRAM (A100 80GB or similar)
Modelfiles
The modelfiles/ directory contains Ollama Modelfile configurations for each model with tuned parameters:
num_ctx 8192(required for CrewAI system prompts)num_predict 2048for specialists,4096for orchestratortemperature 0.7,top_p 0.9,top_k 40
License
Apache 2.0 (same as base Qwen2.5-Coder models)
- Downloads last month
- 35
16-bit
Model tree for rawcell/bruno-swarm-models
Base model
Qwen/Qwen2.5-14B
docker model run hf.co/rawcell/bruno-swarm-models:F16