
Agent Zero — ZeroGPU Native

This repo contains the reference implementation for the Agent Zero ZeroGPU Space.

➡️ Live Space

The fully operational Space is at:
https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration

Architecture

  • SDK: Gradio (required for ZeroGPU)
  • Hardware: ZeroGPU (H200, 70GB VRAM)
  • Decorator: @spaces.GPU(duration=180) for all inference functions
  • Models: Loaded on-demand per request via transformers
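The per-request pattern above can be sketched as follows. This is a minimal illustration, not this repo's actual code: it assumes the Hugging Face `spaces` package (present on ZeroGPU Spaces; the no-op fallback decorator only lets the sketch run off-Space), and the function name and generation parameters are hypothetical.

```python
# Sketch of the ZeroGPU per-request pattern: the GPU is allocated only
# while a @spaces.GPU-decorated function runs, and the model is loaded
# on demand inside that function.
try:
    import spaces
    gpu = spaces.GPU
except ImportError:
    # Hypothetical no-op stand-in so the sketch runs outside a Space.
    def gpu(duration=60):
        def deco(fn):
            return fn
        return deco

@gpu(duration=180)  # matches the 180 s duration used in this repo
def generate(repo_id: str, prompt: str) -> str:
    # Imports and model loading stay inside the decorated function so the
    # Space starts without holding a GPU.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```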

Models (ScottzillaSystems Fleet)

| Model | Tier | Size | Architecture | Repo |
|---|---|---|---|---|
| ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | ScottzillaSystems/ChatGPT-5 |
| Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated |
| SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | ScottzillaSystems/supergemma4-e4b-abliterated |
| Cydonia 24B | T2 | 24B | MistralForCausalLM | ScottzillaSystems/Cydonia-24B-v4.1 |
| Qwen3.6 27B | T3 | 27.8B | CausalLM | ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated |
| Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated |
| Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | ScottzillaSystems/Qwen3.5-9B |
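For routing, the fleet above can be expressed as a simple registry mapping model names to their tier and repo id. This is an illustrative sketch; the dict name, structure, and helper function are assumptions, only the names, tiers, and repo ids come from the table.

```python
# Illustrative registry of the fleet: model name -> (tier, repo id).
MODEL_REGISTRY = {
    "ChatGPT-5":       ("T0", "ScottzillaSystems/ChatGPT-5"),
    "Qwen3.5 9B Opus": ("T1", "ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated"),
    "SuperGemma4":     ("T1", "ScottzillaSystems/supergemma4-e4b-abliterated"),
    "Cydonia 24B":     ("T2", "ScottzillaSystems/Cydonia-24B-v4.1"),
    "Qwen3.6 27B":     ("T3", "ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated"),
    "Qwen3 VL 8B":     ("VL", "ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated"),
    "Qwen3.5 9B Base": ("T1", "ScottzillaSystems/Qwen3.5-9B"),
}

def models_in_tier(tier: str) -> list[str]:
    """Return the model names registered for a given tier."""
    return [name for name, (t, _) in MODEL_REGISTRY.items() if t == tier]
```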

Key Design Decisions

  1. ZeroGPU requires Gradio SDK — Docker SDK is not supported
  2. Models load inside @spaces.GPU — GPU is allocated per-request
  3. AutoModelForImageTextToText for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
  4. AutoModelForCausalLM for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
  5. Smart auto-routing with fallback chain (T3→T2→T1→T0)
  6. No LiteLLM, no TGI, no Docker Compose — pure transformers + ZeroGPU
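The T3→T2→T1→T0 fallback chain from point 5 can be sketched as plain tier iteration. The `run_model` callable is a hypothetical stand-in for the actual inference call, and the error-handling convention (raise to trigger fallback) is an assumption for illustration.

```python
# Sketch of a fallback chain: try the requested tier, then step down.
FALLBACK_CHAIN = ["T3", "T2", "T1", "T0"]

def route(prompt, run_model, start_tier="T3"):
    """Try the requested tier first, then fall back down the chain.

    `run_model(tier, prompt)` is a hypothetical inference hook; it is
    expected to raise on failure (e.g. model unavailable, out of memory)
    so the router can try the next tier.
    """
    start = FALLBACK_CHAIN.index(start_tier)
    last_err = None
    for tier in FALLBACK_CHAIN[start:]:
        try:
            return tier, run_model(tier, prompt)
        except Exception as err:  # this tier failed: fall through
            last_err = err
    raise RuntimeError("all tiers failed") from last_err
```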

Setup

  1. Set HF_TOKEN as a Space Secret
  2. Set hardware to ZeroGPU in Space Settings
  3. Done — models load on first request
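Space Secrets are exposed to the running app as environment variables, so step 1 is consumed roughly like this (passing the token to `from_pretrained` is shown as a commented assumption, not code from this repo):

```python
import os

# Space Secrets (step 1) appear as environment variables at runtime.
hf_token = os.environ.get("HF_TOKEN")

# Illustrative use: authenticate downloads of gated or private repos, e.g.
# AutoModelForCausalLM.from_pretrained(repo_id, token=hf_token)
```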