# Agent Zero — ZeroGPU Native
This repo contains the reference implementation for the Agent Zero ZeroGPU Space.
## ➡️ Live Space

The fully operational Space is at:
https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration
## Architecture

- **SDK**: Gradio (required for ZeroGPU)
- **Hardware**: ZeroGPU (H200, 70GB VRAM)
- **Decorator**: `@spaces.GPU(duration=180)` for all inference functions
- **Models**: loaded on-demand per request via `transformers`
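The decorator pattern above can be sketched as follows. This is a minimal illustration, not the Space's actual code: the function name, prompt handling, and generation parameters are assumptions, and a no-op fallback stands in for `spaces` when running outside a ZeroGPU Space.

```python
# Sketch of the per-request ZeroGPU pattern (illustrative names/params).
try:
    import spaces  # provided by the ZeroGPU runtime inside a Space
    gpu = spaces.GPU(duration=180)  # GPU attached for up to 180 s per call
except ImportError:
    # Running outside a Space: fall back to a no-op decorator so the
    # sketch still imports and runs.
    def gpu(fn):
        return fn

@gpu
def generate(prompt: str, model_id: str = "ScottzillaSystems/ChatGPT-5") -> str:
    # Heavy imports and model loading happen *inside* the decorated
    # function, so the GPU is only held while a request is being served.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[-1]:],
                      skip_special_tokens=True)
```

Because the model is loaded per call, nothing GPU-resident survives between requests, which is what makes the per-request allocation model of ZeroGPU work.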
## Models (ScottzillaSystems Fleet)
| Model | Tier | Size | Architecture | Repo |
|---|---|---|---|---|
| ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | ScottzillaSystems/ChatGPT-5 |
| Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated |
| SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | ScottzillaSystems/supergemma4-e4b-abliterated |
| Cydonia 24B | T2 | 24B | MistralForCausalLM | ScottzillaSystems/Cydonia-24B-v4.1 |
| Qwen3.6 27B | T3 | 27.8B | CausalLM | ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated |
| Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated |
| Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | ScottzillaSystems/Qwen3.5-9B |
## Key Design Decisions

- ZeroGPU requires the Gradio SDK — the Docker SDK is not supported
- Models load inside `@spaces.GPU` — the GPU is allocated per request
- `AutoModelForImageTextToText` for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
- `AutoModelForCausalLM` for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
- Smart auto-routing with a fallback chain (T3 → T2 → T1 → T0)
- No LiteLLM, no TGI, no Docker Compose — pure transformers + ZeroGPU
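The T3 → T2 → T1 → T0 fallback chain can be sketched in pure Python. This is a hypothetical illustration — the tier labels and repo IDs come from the table above, but the `route` function and its availability model are assumptions, not the Space's actual router.

```python
# Hypothetical sketch of the tier fallback chain (illustrative logic).
TIERS = ["T3", "T2", "T1", "T0"]  # highest capability first

MODELS = {
    "T3": "ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated",
    "T2": "ScottzillaSystems/Cydonia-24B-v4.1",
    "T1": "ScottzillaSystems/Qwen3.5-9B",
    "T0": "ScottzillaSystems/ChatGPT-5",
}

def route(preferred: str, available: set[str]) -> str:
    """Walk down from the preferred tier until an available model is found."""
    start = TIERS.index(preferred)
    for tier in TIERS[start:]:
        if tier in available:
            return MODELS[tier]
    raise RuntimeError("no model available in any tier")
```

For example, a request preferring T3 while only T1 and T0 are available would fall through to the T1 model.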
## Setup

1. Set `HF_TOKEN` as a Space Secret
2. Set the hardware to ZeroGPU in the Space settings
3. Done — models load on first request
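A Space Secret is exposed to the app as an environment variable, so the token can be read at runtime and passed to `transformers` when pulling gated or private repos. A minimal sketch (the helper name is illustrative):

```python
# Sketch: a Space Secret named HF_TOKEN surfaces as an env var inside
# the running Space (helper name is illustrative).
import os

def get_hf_token() -> "str | None":
    """Read the token set in Space Settings -> Secrets, if present."""
    return os.environ.get("HF_TOKEN")

# Usage (illustrative):
#   AutoTokenizer.from_pretrained(repo_id, token=get_hf_token())
```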