# Agent Zero — Native HF Space

**Fixed version that loads your actual model weights** — not a proxy.

## What was wrong with the old Agent Zero

The old Agent Zero (`agent-zero`, `agent-zero-pentesting`, etc.) was designed as a **Docker Compose multi-service stack** — LiteLLM proxy + TGI endpoints + PostgreSQL + SearXNG. On HF Spaces, only a single Docker container runs. The orchestrator tries to connect to `http://localhost:4000` (the LiteLLM proxy), which **doesn't exist**, so **no models ever load**.
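
To see the failure directly, a quick probe from inside the container (a sketch; `/health` is the LiteLLM proxy's usual health route) shows the connection being refused:

```python
import requests

# The old stack expected a LiteLLM proxy on port 4000. In a single HF Spaces
# container that proxy is never started, so this probe always fails.
try:
    requests.get("http://localhost:4000/health", timeout=2)
    print("LiteLLM proxy is up")
except requests.ConnectionError:
    print("connection refused on :4000, so no models can ever be served")
```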

The `models_loaded: 3` in the logs was fake — the `service_monitor` was reporting the `ollama` container's health, not actual model availability.

## What this does

- Loads your **actual model weights** from your HF repos via `AutoModelForCausalLM.from_pretrained()`
- No LiteLLM, no TGI, no PostgreSQL, no Docker Compose
- Models load on demand and persist in an in-memory cache (see the sketch after this list)
- ZeroGPU compatible (`@spaces.GPU` decorator)
- Select any model from the catalog dropdown
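
A minimal sketch of that load-on-demand path, assuming `torch`, `transformers`, and the `spaces` package (function and variable names here are illustrative, not the Space's actual code):

```python
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# In-memory cache: each model is downloaded and loaded once, then reused.
_LOADED: dict[str, tuple] = {}

def load_model(repo_id: str):
    """Load a model from its HF repo on first use and cache it."""
    if repo_id not in _LOADED:
        tokenizer = AutoTokenizer.from_pretrained(repo_id)
        model = AutoModelForCausalLM.from_pretrained(
            repo_id,
            torch_dtype=torch.bfloat16,  # half-precision weights to save memory
            device_map="auto",           # place weights on the GPU when one is attached
        )
        _LOADED[repo_id] = (tokenizer, model)
    return _LOADED[repo_id]

@spaces.GPU  # ZeroGPU attaches a GPU only for the duration of this call
def generate(repo_id: str, prompt: str) -> str:
    tokenizer, model = load_model(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```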

## Models available

| Model | Tier | Params | Repo |
|---|---|---|---|
| chatgpt5 | T0 | 494M | `ScottzillaSystems/ChatGPT-5-Chat` |
| qwen3.5-9b | T1 | 9.6B | `ScottzillaSystems/Qwen3.5-9B-Chat` |
| cydonia-24b | T2 | 24B | `ScottzillaSystems/Cydonia-24B-v4.1` |
| qwen3.5-27b | T3 | 27B | `ScottzillaSystems/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled` |
| fallen-command | T4 | 111B | `ScottzillaSystems/Fallen-Command-A-111B-Chat` |
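
In code, the catalog above can be as simple as a mapping from dropdown name to repo, fed straight into the loader (a sketch; the variable name is hypothetical):

```python
# Hypothetical catalog: dropdown name -> HF repo, tiers as in the table above.
MODEL_CATALOG = {
    "chatgpt5": "ScottzillaSystems/ChatGPT-5-Chat",
    "qwen3.5-9b": "ScottzillaSystems/Qwen3.5-9B-Chat",
    "cydonia-24b": "ScottzillaSystems/Cydonia-24B-v4.1",
    "qwen3.5-27b": "ScottzillaSystems/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled",
    "fallen-command": "ScottzillaSystems/Fallen-Command-A-111B-Chat",
}

# e.g. generate(MODEL_CATALOG["chatgpt5"], "Hello!") with the sketch above
```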

## Hardware

The Space is currently configured to start on `cpu-basic`. Upgrade to `a10g-large` or `a100-large` to run the larger models; ZeroGPU (`zero-a10g`) handles models up to 24B.
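
Hardware can be changed in the Space settings UI, or programmatically via `huggingface_hub` (a sketch; the Space ID is a placeholder):

```python
from huggingface_hub import HfApi

api = HfApi()  # requires a token with write access to the Space
# Move the Space off cpu-basic onto a GPU tier (paid hardware is billed).
api.request_space_hardware(
    repo_id="your-username/agent-zero-native",  # placeholder Space ID
    hardware="a10g-large",                      # or "a100-large", "zero-a10g"
)
```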