---
license: apache-2.0
tags:
- abliterated
- gguf
- ollama
- crewai
- multi-agent
- qwen2.5-coder
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
- Qwen/Qwen2.5-Coder-3B-Instruct
---
# Bruno Swarm Models
Seven abliterated Qwen2.5-Coder models for multi-agent software development with [CrewAI](https://github.com/crewai/crewai) + [Ollama](https://ollama.com).
Created with [Bruno](https://github.com/rawcell/heretic), a tool for neural behavior modification via contrastive activation analysis and orthogonalization.
## Models
| Model | Base | Size | Role |
|-------|------|------|------|
| `orchestrator-14b-f16.gguf` | Qwen2.5-Coder-14B-Instruct | 28 GB | Senior Architect / Project Manager |
| `frontend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | React / TypeScript / Tailwind |
| `backend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | FastAPI / PostgreSQL / async |
| `test-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | pytest / coverage / edge cases |
| `security-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | OWASP / vulnerability assessment |
| `docs-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | API docs / README / guides |
| `devops-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | Docker / CI-CD / IaC |
Total: ~63 GB (all F16 precision GGUF)
## Abliteration Details
Each model was independently abliterated using Bruno to reduce refusal behavior while preserving coding capabilities. The 6 specialists share the same base model (Qwen2.5-Coder-3B-Instruct) but have different abliteration weights from separate optimization runs.
**Orchestrator (14B)**:
- KL divergence from base: 0.47
- Refusals: 63/67 test prompts answered (~6% residual refusal rate)
- Optuna trials: 50
**Specialists (3B)**:
- Each independently optimized for their domain
- All retain full coding capability
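For intuition, the core orthogonalization step behind abliteration can be sketched in a few lines of PyTorch. This is a schematic illustration, not Bruno's actual implementation; the tensor names and shapes are assumptions:
```python
import torch

def refusal_direction(h_refused: torch.Tensor, h_answered: torch.Tensor) -> torch.Tensor:
    """Contrastive activation analysis: estimate the 'refusal direction' as the
    normalized difference between mean activations on refused vs. answered prompts."""
    # h_refused, h_answered: (num_prompts, hidden_dim)
    d = h_refused.mean(dim=0) - h_answered.mean(dim=0)
    return d / d.norm()

def orthogonalize(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix so the layer can no
    longer write along it: W' = (I - d d^T) W."""
    # W: (out_features, in_features), d: (out_features,), unit norm
    return W - torch.outer(d, d @ W)
```
In practice Bruno searches over which layers to modify and how strongly (the Optuna trials above), trading refusal reduction against KL divergence from the base model.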
## Quick Start
### 1. Download models and Modelfiles
```bash
# Install git-lfs
git lfs install
# Clone (63 GB download)
git clone https://huggingface.co/rawcell/bruno-swarm-models
cd bruno-swarm-models
```
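If you'd rather not use git-lfs, the same repository can be fetched with `huggingface_hub` (a sketch; the `local_dir` value is an arbitrary choice):
```python
from huggingface_hub import snapshot_download

# Downloads all GGUF files and Modelfiles (~63 GB) into ./bruno-swarm-models
snapshot_download(
    repo_id="rawcell/bruno-swarm-models",
    local_dir="bruno-swarm-models",
)
```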
### 2. Import into Ollama
Update the `FROM` paths in each Modelfile to point to your local GGUF files, as in the example below.
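A hypothetical edited Modelfile might look like this (the path is illustrative; the parameters match the Modelfiles section below):
```
FROM /path/to/bruno-swarm-models/frontend-3b-f16.gguf
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
```
Then import each model: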
```bash
# Import each model
ollama create orchestrator -f modelfiles/Modelfile.orchestrator
ollama create frontend -f modelfiles/Modelfile.frontend
ollama create backend -f modelfiles/Modelfile.backend
ollama create test -f modelfiles/Modelfile.test
ollama create security -f modelfiles/Modelfile.security
ollama create docs -f modelfiles/Modelfile.docs
ollama create devops -f modelfiles/Modelfile.devops
```
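You can sanity-check an import with a one-off prompt (model name as created above; the prompt is arbitrary):
```bash
ollama run backend "Write a FastAPI health-check endpoint"
```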
### 3. Run with bruno-swarm CLI
```bash
pip install "bruno-ai[swarm]"
bruno-swarm run --task "Build a REST API with authentication"
```
Or use flat mode to select specific specialists:
```bash
bruno-swarm run --task "Write unit tests for auth module" --flat --agents test,security
```
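The CLI wraps CrewAI; if you want to wire agents up yourself, a minimal sketch looks roughly like this (assuming a recent CrewAI release with the `LLM` class, and the Ollama model names created above; roles and tasks are illustrative):
```python
from crewai import Agent, Crew, Task, LLM

# Point CrewAI at the locally imported Ollama models
backend_llm = LLM(model="ollama/backend", base_url="http://localhost:11434")
test_llm = LLM(model="ollama/test", base_url="http://localhost:11434")

backend = Agent(
    role="Backend Developer",
    goal="Implement FastAPI endpoints with async PostgreSQL access",
    backstory="Specialist model abliterated for backend work.",
    llm=backend_llm,
)
tester = Agent(
    role="Test Engineer",
    goal="Write pytest suites covering edge cases",
    backstory="Specialist model abliterated for testing.",
    llm=test_llm,
)

build = Task(
    description="Build a REST API with authentication",
    expected_output="FastAPI application code",
    agent=backend,
)
verify = Task(
    description="Write unit tests for the auth module",
    expected_output="pytest test suite",
    agent=tester,
)

crew = Crew(agents=[backend, tester], tasks=[build, verify])
print(crew.kickoff())
```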
## Ollama Configuration
For multi-model operation, set these environment variables before starting Ollama:
```bash
# Allow up to 3 models to stay resident in memory at once
export OLLAMA_MAX_LOADED_MODELS=3
# Keep idle models loaded for 30 minutes instead of unloading right away
export OLLAMA_KEEP_ALIVE=30m
```
## Hardware Requirements
- **Full swarm (hierarchical)**: 40+ GB VRAM (orchestrator 28 GB + one specialist at a time)
- **Specialists only (flat)**: 8+ GB VRAM (one 3B model at a time)
- **All models loaded**: 63 GB VRAM (A100 80GB or similar)
## Modelfiles
The `modelfiles/` directory contains Ollama Modelfile configurations for each model with tuned parameters:
- `num_ctx 8192` (required for CrewAI system prompts)
- `num_predict 2048` for specialists, `4096` for orchestrator
- `temperature 0.7`, `top_p 0.9`, `top_k 40`
## License
Apache 2.0 (same as base Qwen2.5-Coder models)