# Agent Zero — ZeroGPU Native

**This repo contains the reference implementation for the Agent Zero ZeroGPU Space.**

## ➡️ Live Space

The fully operational Space is at:
**https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration**

## Architecture

- **SDK**: Gradio (required for ZeroGPU)
- **Hardware**: ZeroGPU (H200, 70 GB VRAM)
- **Decorator**: `@spaces.GPU(duration=180)` on all inference functions
- **Models**: loaded on demand, per request, via `transformers`

## Models (ScottzillaSystems Fleet)

| Model | Tier | Size | Architecture | Repo |
|---|---|---|---|---|
| ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | `ScottzillaSystems/ChatGPT-5` |
| Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated` |
| SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | `ScottzillaSystems/supergemma4-e4b-abliterated` |
| Cydonia 24B | T2 | 24B | MistralForCausalLM | `ScottzillaSystems/Cydonia-24B-v4.1` |
| Qwen3.6 27B | T3 | 27.8B | CausalLM | `ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated` |
| Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated` |
| Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Qwen3.5-9B` |

## Key Design Decisions

1. **ZeroGPU requires the Gradio SDK** — the Docker SDK is not supported
2. **Models load inside `@spaces.GPU`** — the GPU is allocated per request (see the first sketch below)
3. **`AutoModelForImageTextToText`** for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
4. **`AutoModelForCausalLM`** for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
5. **Smart auto-routing** with a fallback chain (T3 → T2 → T1 → T0; see the routing sketch below)
6. **No LiteLLM, no TGI, no Docker Compose** — pure `transformers` + ZeroGPU

## Setup

1. Set `HF_TOKEN` as a Space Secret
2. Set the hardware to ZeroGPU in the Space Settings
3. Done — models load on the first request
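
## Sketch: On-Demand Loading Under `@spaces.GPU`

A minimal sketch of the per-request loading pattern described in the design decisions above. The default repo ID and generation parameters are illustrative choices, not the Space's actual configuration; the `token=` argument assumes the `HF_TOKEN` secret from the Setup section.

```python
import os

import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DEFAULT_MODEL = "ScottzillaSystems/Cydonia-24B-v4.1"  # any text-only tier from the fleet table


@spaces.GPU(duration=180)  # ZeroGPU grants the H200 only while this function runs
def generate(prompt: str, model_id: str = DEFAULT_MODEL) -> str:
    token = os.environ.get("HF_TOKEN")  # Space Secret, see Setup
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="cuda",
        token=token,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens and return only the completion
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Loading inside the decorated function is what makes the per-request model (design decision 2) work: no weights occupy the GPU between calls, and each call stays within its 180-second allocation window.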
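
## Sketch: Multimodal Loading Path

Design decision 3 routes the multimodal fleet members through `AutoModelForImageTextToText`. A hedged sketch, assuming the listed repos resolve through that auto class and ship a chat-template-aware processor:

```python
import spaces
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

VL_MODEL = "ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated"


@spaces.GPU(duration=180)
def describe_image(image_path: str, question: str) -> str:
    processor = AutoProcessor.from_pretrained(VL_MODEL)
    model = AutoModelForImageTextToText.from_pretrained(
        VL_MODEL, torch_dtype=torch.bfloat16, device_map="cuda"
    )
    # Chat-template message with one image and one text part
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": question},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return processor.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```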
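
## Sketch: Auto-Routing Fallback Chain

Design decision 5 calls for auto-routing with a T3 → T2 → T1 → T0 fallback chain. A sketch under the assumption that any load or inference failure should demote the request one tier; the tier-to-repo mapping comes from the fleet table, and `generate` is the function from the first sketch.

```python
# Tier → repo mapping, taken from the fleet table above
FALLBACK_CHAIN = [
    ("T3", "ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated"),
    ("T2", "ScottzillaSystems/Cydonia-24B-v4.1"),
    ("T1", "ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated"),
    ("T0", "ScottzillaSystems/ChatGPT-5"),
]


def route(prompt: str) -> str:
    """Try the highest tier first; demote on any load or inference failure."""
    last_error = None
    for tier, repo in FALLBACK_CHAIN:
        try:
            return generate(prompt, model_id=repo)  # `generate` from the first sketch
        except Exception as exc:  # e.g. CUDA OOM, hub download errors
            last_error = exc
    raise RuntimeError("all tiers in the fallback chain failed") from last_error
```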
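
## Sketch: Gradio Entry Point

Because ZeroGPU requires the Gradio SDK (design decision 1), the Space entry point is a plain Gradio app. A minimal wiring of the router above; the labels and layout are illustrative:

```python
import gradio as gr

demo = gr.Interface(
    fn=route,  # the fallback router from the previous sketch
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Textbox(label="Completion"),
    title="Agent Zero",
)

if __name__ == "__main__":
    demo.launch()
```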