ZeroR3 commited on
Commit
0a86f4b
Β·
1 Parent(s): 16bce8b

feat: HF Space judging-readiness pass

Browse files

- Hero verified-stats one-liner at top (256K, 31/31, 200K needle 3/3, 9/9 e2e, $4.12)
- Reframe Status: HF Spaces don't ship MI300X by default β†’ CPU mock is by design,
not because the project doesn't work. Verified numbers come from real MI300X
stress test on AMD Developer Cloud (124 min, $4.12)
- Add demo video link (youtu.be/BvSBR1QazLU)
- Add Lablab project page + AMD Developer Forum thread #505 + GitHub links
- Memory-architecture comparison as table (MI300X vs H100 80GB)
- Soften like-CTA: factual rather than pleading
- README.md: parallel updates for Space card display

Files changed (2) hide show
  1. README.md +13 -5
  2. app.py +24 -18
README.md CHANGED
@@ -80,13 +80,21 @@ Full stress test on a single AMD MI300X x1 (AMD Developer Cloud, $1.99/hr, vLLM
80
  - βœ… 14.5 active continuous queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
81
  - βœ… Owned MI300X ($18K) breaks even vs Cursor in 3–6 months at team-of-100 usage
82
 
83
- This Space currently runs CPU-basic with the **mock LLM backend** because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. **Final demo wires to a live MI300X endpoint** during the judging window.
84
 
85
- Full evidence pack (7 JSON results + 5 PNG plots + e2e prompts/answers + 2Γ— rocm-smi snapshots + run logs) is in the repo:
86
- [github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test)
87
- Extended PHASE 1+2 narrative (24-cell matrix + AITER A/B): [extended/SUMMARY.md](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended).
88
 
89
- If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** πŸ€—
 
 
 
 
 
 
 
 
 
 
90
 
91
  ## Author
92
 
 
80
  - βœ… 14.5 active continuous queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
81
  - βœ… Owned MI300X ($18K) breaks even vs Cursor in 3–6 months at team-of-100 usage
82
 
83
+ ## Demo backend
84
 
85
+ HF Spaces ship CPU / consumer GPUs by default β€” not MI300X. So this Space serves a **CPU mock for UI demonstration only**. The verified performance numbers above come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).
 
 
86
 
87
+ To wire a real MI300X endpoint, set Space secrets `VLLM_BASE_URL` + `MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8` against a vLLM 0.17.1 server. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
88
+
89
+ ## Evidence
90
+
91
+ - **1-minute demo video**: <https://youtu.be/BvSBR1QazLU>
92
+ - **Lablab project page**: <https://lablab.ai/ai-hackathons/amd-developer/repomind/repomind>
93
+ - **AMD Developer Forum thread #505** (AITER FP8 regression filed): <https://devcommunity.amd.com/t/repomind-open-source-repo-scale-coding-agent-on-a-single-mi300x-256k-context-fp8-31-31x-concurrency-verified/505>
94
+ - **Full evidence pack** (7 JSON results + 5 PNG plots + e2e prompts/answers + 2Γ— rocm-smi snapshots + run logs): [github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test)
95
+ - **Extended PHASE 1+2 narrative** (24-cell matrix + AITER A/B): [extended/SUMMARY.md](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended)
96
+
97
+ Built for the AMD Developer Hackathon 2026 β€” eligible for the **Hugging Face Special Prize**. If the verified MI300X numbers are useful, a Space like is appreciated. πŸ€—
98
 
99
  ## Author
100
 
app.py CHANGED
@@ -36,11 +36,10 @@ from ingestion.cloner import clone
36
  VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "").strip()
37
  MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8").strip()
38
  LIVE_BACKEND = bool(VLLM_BASE_URL)
39
- BACKEND_LABEL = "🟒 Live AMD MI300X" if LIVE_BACKEND else "🟑 Mock backend (CPU-basic, demo mode)"
40
- BACKEND_HINT = (
41
- f"Connected to vLLM endpoint: `{VLLM_BASE_URL}` Β· model `{MODEL_NAME}`"
42
- if LIVE_BACKEND else
43
- "Set the Space secrets `VLLM_BASE_URL` + `MODEL_NAME` to wire a real MI300X backend."
44
  )
45
 
46
 
@@ -51,23 +50,30 @@ HEADER_MD = f"""
51
  Ingest a git repository (up to 256K tokens, FP8) on a single GPU and
52
  reason across the whole codebase with multi-step tool use.
53
 
54
- > πŸ“¦ GitHub: <a href="https://github.com/SRKRZ23/repomind" target="_blank" rel="noopener noreferrer">SRKRZ23/repomind</a> Β· MIT
55
- > πŸ† Built for the <a href="https://lablab.ai/ai-hackathons/amd-developer" target="_blank" rel="noopener noreferrer">AMD Developer Hackathon 2026</a>
56
- > πŸ€— HF Special Prize candidate Β· πŸ›‘ Conservative claim discipline applied
57
 
58
- ### Why AMD MI300X (verified 2026-05-05 on real hardware)
 
 
 
59
 
60
- - Qwen3-Coder-Next-FP8 weights = **77.29 GiB** in VRAM (verified)
61
- - 256K KV cache @ FP8 = **94.58 GiB** available (2,065,744 tokens, verified)
62
- - Activations + framework overhead β†’ peak 176/191.7 GiB β‰ˆ **92% utilization**
63
- - NVIDIA H100 80 GB cannot accommodate this on a single card by VRAM
64
- accounting (~143 GB > 80 GB); MI300X 192 GB has the headroom
65
 
66
- ### Status
 
 
 
 
 
 
 
 
 
 
67
 
68
  **Backend right now**: {BACKEND_LABEL}
69
 
70
- {BACKEND_HINT}
71
  """
72
 
73
 
@@ -237,8 +243,8 @@ with gr.Blocks(
237
  <a href="mailto:razikovsardor1@gmail.com">razikovsardor1@gmail.com</a> Β·
238
  <a href="mailto:razikovs777@gmail.com">razikovs777@gmail.com</a>
239
  </p>
240
- <p><em>If the MI300X memory-architecture story resonates,
241
- <strong>a like on this Space helps with the Hugging Face Special Prize judging.</strong> πŸ€—</em></p>
242
  """
243
  )
244
 
 
36
  VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "").strip()
37
  MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8").strip()
38
  LIVE_BACKEND = bool(VLLM_BASE_URL)
39
+ BACKEND_LABEL = (
40
+ f"🟒 Live AMD MI300X (vLLM endpoint `{VLLM_BASE_URL}`, model `{MODEL_NAME}`)"
41
+ if LIVE_BACKEND
42
+ else "🟑 CPU mock β€” HF Spaces ship CPU/T4 by default, not MI300X"
 
43
  )
44
 
45
 
 
50
  Ingest a git repository (up to 256K tokens, FP8) on a single GPU and
51
  reason across the whole codebase with multi-step tool use.
52
 
53
+ > **Verified on a single MI300X (2026-05-05):** 256K context Β· 31/31 concurrent users at 8K–64K Β· 200K needle-in-haystack 3/3 Β· 9/9 end-to-end repo questions correct Β· $4.12 total stress test cost Β· AITER FP8 attention backend regression filed for AMD review.
 
 
54
 
55
+ > 🎬 <a href="https://youtu.be/BvSBR1QazLU" target="_blank" rel="noopener noreferrer">1-minute demo video</a> ·
56
+ > πŸ“¦ <a href="https://github.com/SRKRZ23/repomind" target="_blank" rel="noopener noreferrer">GitHub source (MIT)</a> Β·
57
+ > πŸ† <a href="https://lablab.ai/ai-hackathons/amd-developer/repomind/repomind" target="_blank" rel="noopener noreferrer">Lablab project page</a> Β·
58
+ > πŸ› <a href="https://devcommunity.amd.com/t/repomind-open-source-repo-scale-coding-agent-on-a-single-mi300x-256k-context-fp8-31-31x-concurrency-verified/505" target="_blank" rel="noopener noreferrer">AMD Developer Forum thread #505</a>
59
 
60
+ ### Why AMD MI300X β€” memory architecture
 
 
 
 
61
 
62
+ | Component | Verified on MI300X | NVIDIA H100 80 GB |
63
+ |---|---|---|
64
+ | Qwen3-Coder-Next-FP8 weights in VRAM | **77.29 GiB** | fits |
65
+ | 256K KV cache @ FP8 (2,065,744 tokens) | **94.58 GiB** available | cannot fit |
66
+ | Total peak utilization | **176 / 191.7 GiB (92%)** | cannot accommodate (~143 GB > 80 GB) |
67
+
68
+ This is a memory-architecture story. AMD MI300X 192 GB has the headroom on a single card; NVIDIA H100 80 GB cannot accommodate the same configuration by VRAM accounting.
69
+
70
+ ### Demo backend
71
+
72
+ **This Space serves a CPU mock for UI demonstration only** β€” HF Spaces don't ship MI300X GPUs. The verified performance numbers above and in the *Verified evidence* tab come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).
73
 
74
  **Backend right now**: {BACKEND_LABEL}
75
 
76
+ To wire a real MI300X endpoint, set Space secrets `VLLM_BASE_URL` + `MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8`. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
77
  """
78
 
79
 
 
243
  <a href="mailto:razikovsardor1@gmail.com">razikovsardor1@gmail.com</a> Β·
244
  <a href="mailto:razikovs777@gmail.com">razikovs777@gmail.com</a>
245
  </p>
246
+ <p><em>Built for the AMD Developer Hackathon 2026 β€” eligible for the
247
+ <strong>Hugging Face Special Prize</strong>. If the verified MI300X numbers are useful, a Space like is appreciated. πŸ€—</em></p>
248
  """
249
  )
250