---
license: apache-2.0
tags:
- abliterated
- gguf
- ollama
- crewai
- multi-agent
- qwen2.5-coder
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
- Qwen/Qwen2.5-Coder-3B-Instruct
---

# Bruno Swarm Models

7 abliterated Qwen2.5-Coder models for multi-agent software development using [CrewAI](https://github.com/crewai/crewai) + [Ollama](https://ollama.com).

Created with [Bruno](https://github.com/rawcell/heretic), a tool for neural behavior modification via contrastive activation analysis and orthogonalization.

## Models

| Model | Base | Size | Role |
|-------|------|------|------|
| `orchestrator-14b-f16.gguf` | Qwen2.5-Coder-14B-Instruct | 28 GB | Senior Architect / Project Manager |
| `frontend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | React / TypeScript / Tailwind |
| `backend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | FastAPI / PostgreSQL / async |
| `test-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | pytest / coverage / edge cases |
| `security-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | OWASP / vulnerability assessment |
| `docs-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | API docs / README / guides |
| `devops-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | Docker / CI-CD / IaC |

Total: ~63 GB (all F16-precision GGUF).

## Abliteration Details

Each model was independently abliterated with Bruno to reduce refusal behavior while preserving coding capability. The six specialists share the same base model (Qwen2.5-Coder-3B-Instruct) but carry different abliteration weights from separate optimization runs.

**Orchestrator (14B)**:
- KL divergence from base: 0.47
- Refusals: 63/67 test prompts answered (~6% residual refusal rate)
- Optuna trials: 50

**Specialists (3B)**:
- Each independently optimized for its own domain
- All retain full coding capability

## Quick Start

### 1. Download models and Modelfiles

```bash
# Install git-lfs
git lfs install

# Clone (63 GB download)
git clone https://huggingface.co/rawcell/bruno-swarm-models
cd bruno-swarm-models
```

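A fresh clone can silently contain git-lfs pointer stubs instead of the real weights. A quick sanity check, using only the filenames from the table above (the helper name `missing_models` is ours, not part of any shipped tooling):

```python
# Verify the seven GGUF files actually downloaded (run inside the cloned repo).
from pathlib import Path

EXPECTED = [
    "orchestrator-14b-f16.gguf",
    "frontend-3b-f16.gguf",
    "backend-3b-f16.gguf",
    "test-3b-f16.gguf",
    "security-3b-f16.gguf",
    "docs-3b-f16.gguf",
    "devops-3b-f16.gguf",
]

def missing_models(repo_dir: str = ".") -> list[str]:
    """Return the expected GGUF files that are not present on disk."""
    return [name for name in EXPECTED if not Path(repo_dir, name).is_file()]

print(missing_models())  # lists any files still missing
```

An empty list means every model is in place; anything else usually means `git lfs pull` still needs to run.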
### 2. Import into Ollama

Update the `FROM` paths in each Modelfile to point to your local GGUF files, then:

```bash
# Import each model
ollama create orchestrator -f modelfiles/Modelfile.orchestrator
ollama create frontend -f modelfiles/Modelfile.frontend
ollama create backend -f modelfiles/Modelfile.backend
ollama create test -f modelfiles/Modelfile.test
ollama create security -f modelfiles/Modelfile.security
ollama create docs -f modelfiles/Modelfile.docs
ollama create devops -f modelfiles/Modelfile.devops
```

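Once imported, any of the models can also be queried directly over Ollama's local REST API, independent of CrewAI. A minimal sketch, assuming Ollama is running on its default port 11434 (the helper names `build_payload` and `ask` are ours):

```python
# Query one imported model via Ollama's /api/generate endpoint (stdlib only).
import json
import urllib.request

def build_payload(model: str, prompt: str) -> dict:
    """Request body for a single non-streaming generation."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one prompt to a locally imported model and return the response text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with the model imported):
# print(ask("backend", "Write a FastAPI health-check endpoint."))
```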
### 3. Run with the bruno-swarm CLI

```bash
pip install "bruno-ai[swarm]"
bruno-swarm run --task "Build a REST API with authentication"
```

Or use flat mode to select specific specialists:

```bash
bruno-swarm run --task "Write unit tests for auth module" --flat --agents test,security
```

## Ollama Configuration

For multi-model operation, set these environment variables before starting the Ollama server:

```bash
export OLLAMA_MAX_LOADED_MODELS=3
export OLLAMA_KEEP_ALIVE=30m
```

`OLLAMA_MAX_LOADED_MODELS` caps how many models stay resident concurrently; `OLLAMA_KEEP_ALIVE` keeps a model in memory between requests instead of unloading it immediately.

## Hardware Requirements

- **Full swarm (hierarchical)**: 40+ GB VRAM (28 GB orchestrator + one specialist at a time)
- **Specialists only (flat)**: 8+ GB VRAM (one 3B model at a time)
- **All models loaded**: ~63 GB VRAM (A100 80GB or similar)

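These figures follow from the sizes in the Models table; our reading is that the gap between 33.8 GB of weights and the quoted 40+ GB covers runtime overhead (KV cache, framework buffers). A quick back-of-the-envelope check:

```python
# Sanity-check the headline VRAM numbers against the per-model sizes.
ORCHESTRATOR_GB = 28.0   # orchestrator-14b-f16.gguf
SPECIALIST_GB = 5.8      # each 3B specialist
N_SPECIALISTS = 6

total_gb = ORCHESTRATOR_GB + N_SPECIALISTS * SPECIALIST_GB   # ~62.8, i.e. the quoted ~63 GB
hierarchical_peak_gb = ORCHESTRATOR_GB + SPECIALIST_GB       # 33.8 GB of weights resident at once

print(total_gb, hierarchical_peak_gb)
```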
## Modelfiles

The `modelfiles/` directory contains an Ollama Modelfile for each model with tuned parameters:

- `num_ctx 8192` (required to fit CrewAI system prompts)
- `num_predict 2048` for specialists, `4096` for the orchestrator
- `temperature 0.7`, `top_p 0.9`, `top_k 40`

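For reference, a Modelfile with these parameters looks roughly like the sketch below; the local GGUF path and the `SYSTEM` text are illustrative, not the exact shipped values:

```
FROM ./backend-3b-f16.gguf

PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

SYSTEM """You are a backend specialist: FastAPI, PostgreSQL, async Python."""
```

Check the shipped files in `modelfiles/` for the actual system prompts before importing.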
## License

Apache 2.0, the same license as the base Qwen2.5-Coder models.