# Agent Zero — ZeroGPU Native
**This repo contains the reference implementation for the Agent Zero ZeroGPU Space.**
## ➡️ Live Space
The fully operational Space is at:
**https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration**
## Architecture
- **SDK**: Gradio (required for ZeroGPU)
- **Hardware**: ZeroGPU (H200, 70GB VRAM)
- **Decorator**: `@spaces.GPU(duration=180)` for all inference functions
- **Models**: Loaded on-demand per request via `transformers`
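The on-demand pattern above can be sketched as follows. This is a minimal illustration, not the Space's actual code: `generate` and `_MODEL_CACHE` are hypothetical names, and the `ImportError` fallback is only there so the sketch runs outside a Space, where the `spaces` package is unavailable.

```python
try:
    import spaces
    gpu = spaces.GPU
except ImportError:
    # Outside a ZeroGPU Space: substitute a no-op decorator for local testing.
    def gpu(duration=60):
        def wrap(fn):
            return fn
        return wrap

_MODEL_CACHE = {}  # hypothetical per-process cache, keyed by repo id

@gpu(duration=180)  # GPU is allocated per request, held for at most 180 s
def generate(repo_id: str, prompt: str) -> str:
    # Model loading happens inside the decorated function, so nothing
    # touches the GPU at import time; the first request pays the load cost.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    if repo_id not in _MODEL_CACHE:
        tok = AutoTokenizer.from_pretrained(repo_id)
        model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
        _MODEL_CACHE[repo_id] = (tok, model)
    tok, model = _MODEL_CACHE[repo_id]
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    return tok.decode(out[0], skip_special_tokens=True)
```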
## Models (ScottzillaSystems Fleet)
| Model | Tier | Size | Architecture | Repo |
|---|---|---|---|---|
| ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | `ScottzillaSystems/ChatGPT-5` |
| Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated` |
| SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | `ScottzillaSystems/supergemma4-e4b-abliterated` |
| Cydonia 24B | T2 | 24B | MistralForCausalLM | `ScottzillaSystems/Cydonia-24B-v4.1` |
| Qwen3.6 27B | T3 | 27.8B | CausalLM | `ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated` |
| Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated` |
| Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Qwen3.5-9B` |
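For reference, the fleet table can be expressed as a plain tier-to-repo mapping. The `FLEET` name and the dict layout are assumptions for illustration; only the tiers and repo ids come from the table above.

```python
# Hypothetical registry mirroring the fleet table; not part of the Space's API.
FLEET = {
    "T0": ["ScottzillaSystems/ChatGPT-5"],
    "T1": [
        "ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated",
        "ScottzillaSystems/supergemma4-e4b-abliterated",
        "ScottzillaSystems/Qwen3.5-9B",
    ],
    "T2": ["ScottzillaSystems/Cydonia-24B-v4.1"],
    "T3": ["ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated"],
    "VL": ["ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated"],
}
```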
## Key Design Decisions
1. **ZeroGPU requires Gradio SDK** — Docker SDK is not supported
2. **Models load inside `@spaces.GPU`** — GPU is allocated per-request
3. **`AutoModelForImageTextToText`** for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
4. **`AutoModelForCausalLM`** for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
5. **Smart auto-routing** with fallback chain (T3→T2→T1→T0)
6. **No LiteLLM, no TGI, no Docker Compose** — pure transformers + ZeroGPU
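The fallback chain in decision 5 amounts to walking the tiers from largest to smallest until one succeeds. A minimal sketch, where `route` and `try_generate` are hypothetical names standing in for the Space's real routing and per-tier inference calls:

```python
FALLBACK_CHAIN = ["T3", "T2", "T1", "T0"]  # largest model first

def route(prompt, try_generate, start="T3"):
    """Walk the chain from `start` downward; return the first tier that succeeds."""
    chain = FALLBACK_CHAIN[FALLBACK_CHAIN.index(start):]
    last_err = None
    for tier in chain:
        try:
            return tier, try_generate(tier, prompt)
        except Exception as err:  # e.g. OOM or load failure: fall through to next tier
            last_err = err
    raise RuntimeError("all tiers failed") from last_err
```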
## Setup
1. Set `HF_TOKEN` as Space Secret
2. Set hardware to ZeroGPU in Space Settings
3. Done — models load on first request