---
license: apache-2.0
tags:
- abliterated
- gguf
- ollama
- crewai
- multi-agent
- qwen2.5-coder
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
- Qwen/Qwen2.5-Coder-3B-Instruct
---

# Bruno Swarm Models

Seven abliterated Qwen2.5-Coder models for multi-agent software development using [CrewAI](https://github.com/crewai/crewai) + [Ollama](https://ollama.com).

Created with [Bruno](https://github.com/rawcell/heretic), a tool for neural behavior modification via contrastive activation analysis and orthogonalization.

## Models

| Model | Base | Size | Role |
|-------|------|------|------|
| `orchestrator-14b-f16.gguf` | Qwen2.5-Coder-14B-Instruct | 28 GB | Senior Architect / Project Manager |
| `frontend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | React / TypeScript / Tailwind |
| `backend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | FastAPI / PostgreSQL / async |
| `test-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | pytest / coverage / edge cases |
| `security-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | OWASP / vulnerability assessment |
| `docs-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | API docs / README / guides |
| `devops-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | Docker / CI-CD / IaC |

Total: ~63 GB (all F16 precision GGUF)

## Abliteration Details

Each model was independently abliterated using Bruno to reduce refusal behavior while preserving coding capabilities. The 6 specialists share the same base model (Qwen2.5-Coder-3B-Instruct) but have different abliteration weights from separate optimization runs.

**Orchestrator (14B)**:
- KL divergence: 0.47 (from base)
- Refusal reduction: 63/67 prompts answered (about 94%); 4/67 (about 6%) were still refused
- Optuna trials: 50

**Specialists (3B)**:
- Each independently optimized for its domain
- All retain full coding capability

## Quick Start

### 1. Download models and Modelfiles

```bash
# Install git-lfs
git lfs install

# Clone (63 GB download)
git clone https://huggingface.co/rawcell/bruno-swarm-models
cd bruno-swarm-models
```

### 2. Import into Ollama

Update the `FROM` paths in each Modelfile to point to your local GGUF files, then:

```bash
# Import each model
ollama create orchestrator -f modelfiles/Modelfile.orchestrator
ollama create frontend -f modelfiles/Modelfile.frontend
ollama create backend -f modelfiles/Modelfile.backend
ollama create test -f modelfiles/Modelfile.test
ollama create security -f modelfiles/Modelfile.security
ollama create docs -f modelfiles/Modelfile.docs
ollama create devops -f modelfiles/Modelfile.devops
```

### 3. Run with bruno-swarm CLI

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "bruno-ai[swarm]"
bruno-swarm run --task "Build a REST API with authentication"
```

Or use flat mode to select specific specialists:

```bash
bruno-swarm run --task "Write unit tests for auth module" --flat --agents test,security
```

## Ollama Configuration

For multi-model operation, set these environment variables before starting Ollama:

```bash
export OLLAMA_MAX_LOADED_MODELS=3
export OLLAMA_KEEP_ALIVE=30m
```

## Hardware Requirements

- **Full swarm (hierarchical)**: 40+ GB VRAM (orchestrator 28 GB + 1 specialist at a time)
- **Specialists only (flat)**: 8+ GB VRAM (one 3B model at a time)
- **All models loaded**: 63 GB VRAM (A100 80GB or similar)

## Modelfiles

The `modelfiles/` directory contains Ollama Modelfile configurations for each model with tuned parameters:

- `num_ctx 8192` (required for CrewAI system prompts)
- `num_predict 2048` for specialists, `4096` for orchestrator
- `temperature 0.7`, `top_p 0.9`, `top_k 40`

## License

Apache 2.0 (same as base Qwen2.5-Coder models)
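
## Example: Checking the VRAM Figures

The hardware requirements above follow from the file sizes in the Models table. The sketch below reproduces that arithmetic so you can estimate other combinations. The function name `weights_gb` is hypothetical and not part of bruno-swarm. The calculation counts model weights only; KV cache and runtime overhead are not included and are presumably why the stated requirements leave headroom above these numbers.

```python
# File sizes in GB, taken from the Models table in this README.
ORCHESTRATOR_GB = 28.0
SPECIALIST_GB = 5.8
NUM_SPECIALISTS = 6


def weights_gb(orchestrator: bool, specialists_loaded: int) -> float:
    """Return the combined size (GB) of the GGUF weights loaded at once.

    Weights only: KV cache and runtime overhead are NOT counted, so
    actual VRAM use will be higher than the value returned here.
    """
    if not 0 <= specialists_loaded <= NUM_SPECIALISTS:
        raise ValueError("specialists_loaded must be between 0 and 6")
    total = specialists_loaded * SPECIALIST_GB
    if orchestrator:
        total += ORCHESTRATOR_GB
    return round(total, 1)


# Hierarchical mode: orchestrator plus one specialist at a time.
hierarchical = weights_gb(orchestrator=True, specialists_loaded=1)
# Flat mode: a single 3B specialist at a time.
flat = weights_gb(orchestrator=False, specialists_loaded=1)
# Every model resident simultaneously.
everything = weights_gb(orchestrator=True, specialists_loaded=NUM_SPECIALISTS)

print(f"hierarchical: {hierarchical} GB")  # hierarchical: 33.8 GB
print(f"flat: {flat} GB")                  # flat: 5.8 GB
print(f"all: {everything} GB")             # all: 62.8 GB
```

These results line up with the 40+ GB, 8+ GB, and ~63 GB figures listed under Hardware Requirements. With `OLLAMA_MAX_LOADED_MODELS=3`, the orchestrator plus two specialists would come to 39.6 GB of weights, so whether three models actually stay resident depends on how much memory is left after context buffers.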