---
license: apache-2.0
tags:
- abliterated
- gguf
- ollama
- crewai
- multi-agent
- qwen2.5-coder
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
- Qwen/Qwen2.5-Coder-3B-Instruct
---
|
|
|
|
|
# Bruno Swarm Models |
|
|
|
|
|
7 abliterated Qwen2.5-Coder models for multi-agent software development using [CrewAI](https://github.com/crewai/crewai) + [Ollama](https://ollama.com). |
|
|
|
|
|
Created with [Bruno](https://github.com/rawcell/heretic), a tool for neural behavior modification via contrastive activation analysis and orthogonalization.
|
|
|
|
|
## Models

| Model | Base | Size | Role |
|-------|------|------|------|
| `orchestrator-14b-f16.gguf` | Qwen2.5-Coder-14B-Instruct | 28 GB | Senior Architect / Project Manager |
| `frontend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | React / TypeScript / Tailwind |
| `backend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | FastAPI / PostgreSQL / async |
| `test-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | pytest / coverage / edge cases |
| `security-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | OWASP / vulnerability assessment |
| `docs-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | API docs / README / guides |
| `devops-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | Docker / CI-CD / IaC |

Total: ~63 GB (all F16-precision GGUF)
|
|
|
|
|
## Abliteration Details |
|
|
|
|
|
Each model was independently abliterated using Bruno to reduce refusal behavior while preserving coding capabilities. The 6 specialists share the same base model (Qwen2.5-Coder-3B-Instruct) but have different abliteration weights from separate optimization runs. |
|
|
|
|
|
**Orchestrator (14B)**:
- KL divergence from base: 0.47
- Refusals: 63 of 67 test prompts answered (~6% residual refusal rate)
- Optuna trials: 50
|
|
|
|
|
**Specialists (3B)**:
- Each independently optimized for its domain
- All retain full coding capability
|
|
|
|
|
## Quick Start |
|
|
|
|
|
### 1. Download models and Modelfiles |
|
|
|
|
|
```bash
# Install git-lfs
git lfs install

# Clone (63 GB download)
git clone https://huggingface.co/rawcell/bruno-swarm-models
cd bruno-swarm-models
```
|
|
|
|
|
### 2. Import into Ollama |
|
|
|
|
|
Update the `FROM` paths in each Modelfile to point to your local GGUF files, then: |
|
|
|
|
|
```bash
# Import each model
ollama create orchestrator -f modelfiles/Modelfile.orchestrator
ollama create frontend -f modelfiles/Modelfile.frontend
ollama create backend -f modelfiles/Modelfile.backend
ollama create test -f modelfiles/Modelfile.test
ollama create security -f modelfiles/Modelfile.security
ollama create docs -f modelfiles/Modelfile.docs
ollama create devops -f modelfiles/Modelfile.devops
```
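Once imported, each model is reachable through Ollama's HTTP API (default `http://localhost:11434`). A minimal sketch in Python using only the standard library; the helper names `build_payload` and `ask_specialist` are our own, but the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields are the standard Ollama API:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Construct a non-streaming generate request for one swarm model."""
    return {
        "model": model,   # e.g. "backend", as named via `ollama create` above
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
    }

def ask_specialist(model: str, prompt: str) -> str:
    """Send a prompt to a locally imported model (requires a running Ollama server)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Payload construction alone needs no server:
payload = build_payload("backend", "Write a FastAPI healthcheck endpoint.")
```

This is the same call path CrewAI uses under the hood when an agent is bound to an Ollama model, so it is a quick way to smoke-test each import before wiring up the full swarm.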
|
|
|
|
|
### 3. Run with bruno-swarm CLI |
|
|
|
|
|
```bash
pip install bruno-ai[swarm]
bruno-swarm run --task "Build a REST API with authentication"
```
|
|
|
|
|
Or use flat mode to select specific specialists: |
|
|
|
|
|
```bash
bruno-swarm run --task "Write unit tests for auth module" --flat --agents test,security
```
|
|
|
|
|
## Ollama Configuration |
|
|
|
|
|
For multi-model operation, set these environment variables before starting Ollama: |
|
|
|
|
|
```bash
export OLLAMA_MAX_LOADED_MODELS=3
export OLLAMA_KEEP_ALIVE=30m
```
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
- **Full swarm (hierarchical)**: 40+ GB VRAM (28 GB orchestrator plus one specialist at a time)
- **Specialists only (flat)**: 8+ GB VRAM (one 3B model at a time)
- **All models loaded**: 63 GB VRAM (A100 80 GB or similar)
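As a back-of-the-envelope check, the figures above follow from the model table. A hypothetical helper summing F16 model sizes; the ~20% headroom factor for KV cache and runtime overhead is our assumption, not a measured value:

```python
# F16 GGUF sizes in GB, taken from the model table above
MODEL_SIZES_GB = {
    "orchestrator": 28.0,
    "frontend": 5.8,
    "backend": 5.8,
    "test": 5.8,
    "security": 5.8,
    "docs": 5.8,
    "devops": 5.8,
}

def vram_estimate_gb(agents: list[str], headroom: float = 1.2) -> float:
    """Rough VRAM needed to hold the given models at once.

    `headroom` is an assumed multiplier for KV cache and runtime overhead.
    """
    return sum(MODEL_SIZES_GB[a] for a in agents) * headroom

# Hierarchical mode keeps the orchestrator plus one specialist resident:
hierarchical = vram_estimate_gb(["orchestrator", "backend"])  # ~40.6 GB
```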
|
|
|
|
|
## Modelfiles |
|
|
|
|
|
The `modelfiles/` directory contains Ollama Modelfile configurations for each model with tuned parameters:
- `num_ctx 8192` (required for CrewAI system prompts)
- `num_predict 2048` for specialists, `4096` for orchestrator
- `temperature 0.7`, `top_p 0.9`, `top_k 40`
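Putting those parameters together, a specialist Modelfile looks roughly like the sketch below; the `FROM` path and `SYSTEM` prompt are placeholders, so adjust them to your local layout and agent role:

```
# Placeholder path - point at your local GGUF file
FROM ./backend-3b-f16.gguf

PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

SYSTEM "You are a backend specialist focused on FastAPI, PostgreSQL, and async Python."
```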
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 (same as the base Qwen2.5-Coder models)
|
|
|