# Agent Zero — ZeroGPU Native

**This repo contains the reference implementation for the Agent Zero ZeroGPU Space.**

## ➡️ Live Space

The fully operational Space is at:  
**https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration**

## Architecture

- **SDK**: Gradio (required for ZeroGPU)
- **Hardware**: ZeroGPU (H200, 70GB VRAM)
- **Decorator**: `@spaces.GPU(duration=180)` for all inference functions
- **Models**: Loaded on-demand per request via `transformers`
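
The per-request pattern above can be sketched as follows. This is an illustrative sketch, not the Space's actual code: the `generate` function, its prompt handling, and the choice of `Cydonia-24B-v4.1` are assumptions for demonstration.

```python
# Sketch of the ZeroGPU per-request pattern; names and model choice are
# illustrative assumptions, not the Space's actual implementation.
try:
    import spaces
    gpu = spaces.GPU  # the real decorator inside a Hugging Face Space
except ImportError:
    # Outside a Space, substitute a no-op decorator so the sketch still runs.
    def gpu(**kwargs):
        def wrap(fn):
            return fn
        return wrap

MODEL_ID = "ScottzillaSystems/Cydonia-24B-v4.1"  # one repo from the fleet table

@gpu(duration=180)  # GPU is allocated only while this call runs
def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Model loads on demand, inside the GPU-decorated function.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```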

## Models (ScottzillaSystems Fleet)

| Model | Tier | Size | Architecture | Repo |
|---|---|---|---|---|
| ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | `ScottzillaSystems/ChatGPT-5` |
| Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated` |
| SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | `ScottzillaSystems/supergemma4-e4b-abliterated` |
| Cydonia 24B | T2 | 24B | MistralForCausalLM | `ScottzillaSystems/Cydonia-24B-v4.1` |
| Qwen3.6 27B | T3 | 27.8B | CausalLM | `ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated` |
| Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated` |
| Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Qwen3.5-9B` |

## Key Design Decisions

1. **ZeroGPU requires Gradio SDK** — Docker SDK is not supported
2. **Models load inside `@spaces.GPU`** — GPU is allocated per-request
3. **`AutoModelForImageTextToText`** for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
4. **`AutoModelForCausalLM`** for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
5. **Smart auto-routing** with fallback chain (T3→T2→T1→T0)
6. **No LiteLLM, no TGI, no Docker Compose** — pure transformers + ZeroGPU
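
The fallback chain in decision 5 can be sketched as plain tier-walking logic. The `route` helper and its signature are illustrative assumptions; the Space's actual routing may differ.

```python
# Sketch of the T3→T2→T1→T0 fallback chain; route() is a hypothetical
# helper, not the Space's actual API.
FALLBACK_CHAIN = ["T3", "T2", "T1", "T0"]

def route(preferred: str, available: set) -> str:
    """Walk the chain from the preferred tier down to T0,
    returning the first tier that is currently available."""
    start = FALLBACK_CHAIN.index(preferred)
    for tier in FALLBACK_CHAIN[start:]:
        if tier in available:
            return tier
    raise RuntimeError("no tier available")
```

For example, a request preferring T3 while only T1 and T0 models are loadable would be served by T1.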

## Setup

1. Set `HF_TOKEN` as Space Secret
2. Set hardware to ZeroGPU in Space Settings
3. Done — models load on first request