repomind

Sleeping

ZeroR3 commited on 20 days ago

Commit

0a86f4b

1 Parent(s): 16bce8b

feat: HF Space judging-readiness pass

- Hero verified-stats one-liner at top (256K, 31/31, 200K needle 3/3, 9/9 e2e, $4.12)
- Reframe Status: HF Spaces don't ship MI300X by default → CPU mock is by design,
not because the project doesn't work. Verified numbers come from real MI300X
stress test on AMD Developer Cloud (124 min, $4.12)
- Add demo video link (youtu.be/BvSBR1QazLU)
- Add Lablab project page + AMD Developer Forum thread #505 + GitHub links
- Memory-architecture comparison as table (MI300X vs H100 80GB)
- Soften like-CTA: factual rather than pleading
- README.md: parallel updates for Space card display

Files changed (2) hide show

README.md +13 -5
app.py +24 -18

README.md CHANGED Viewed

@@ -80,13 +80,21 @@ Full stress test on a single AMD MI300X x1 (AMD Developer Cloud, $1.99/hr, vLLM
 - ✅ 14.5 active continuous queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
 - ✅ Owned MI300X ($18K) breaks even vs Cursor in 3–6 months at team-of-100 usage
-This Space currently runs CPU-basic with the **mock LLM backend** because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. **Final demo wires to a live MI300X endpoint** during the judging window.
-Full evidence pack (7 JSON results + 5 PNG plots + e2e prompts/answers + 2× rocm-smi snapshots + run logs) is in the repo:
-[github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test)
-Extended PHASE 1+2 narrative (24-cell matrix + AITER A/B): [extended/SUMMARY.md](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended).
-If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** 🤗
 ## Author

 - ✅ 14.5 active continuous queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
 - ✅ Owned MI300X ($18K) breaks even vs Cursor in 3–6 months at team-of-100 usage
+## Demo backend
+HF Spaces ship CPU / consumer GPUs by default — not MI300X. So this Space serves a **CPU mock for UI demonstration only**. The verified performance numbers above come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).
+To wire a real MI300X endpoint, set Space secrets `VLLM_BASE_URL` + `MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8` against a vLLM 0.17.1 server. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
+## Evidence
+- **1-minute demo video**: <https://youtu.be/BvSBR1QazLU>
+- **Lablab project page**: <https://lablab.ai/ai-hackathons/amd-developer/repomind/repomind>
+- **AMD Developer Forum thread #505** (AITER FP8 regression filed): <https://devcommunity.amd.com/t/repomind-open-source-repo-scale-coding-agent-on-a-single-mi300x-256k-context-fp8-31-31x-concurrency-verified/505>
+- **Full evidence pack** (7 JSON results + 5 PNG plots + e2e prompts/answers + 2× rocm-smi snapshots + run logs): [github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test)
+- **Extended PHASE 1+2 narrative** (24-cell matrix + AITER A/B): [extended/SUMMARY.md](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended)
+Built for the AMD Developer Hackathon 2026 — eligible for the **Hugging Face Special Prize**. If the verified MI300X numbers are useful, a Space like is appreciated. 🤗
 ## Author

app.py CHANGED Viewed

@@ -36,11 +36,10 @@ from ingestion.cloner import clone
 VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "").strip()
 MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8").strip()
 LIVE_BACKEND = bool(VLLM_BASE_URL)
-BACKEND_LABEL = "🟢 Live AMD MI300X" if LIVE_BACKEND else "🟡 Mock backend (CPU-basic, demo mode)"
-BACKEND_HINT = (
-    f"Connected to vLLM endpoint: `{VLLM_BASE_URL}` · model `{MODEL_NAME}`"
-    if LIVE_BACKEND else
-    "Set the Space secrets `VLLM_BASE_URL` + `MODEL_NAME` to wire a real MI300X backend."
 )
@@ -51,23 +50,30 @@ HEADER_MD = f"""
 Ingest a git repository (up to 256K tokens, FP8) on a single GPU and
 reason across the whole codebase with multi-step tool use.
-> 📦 GitHub: <a href="https://github.com/SRKRZ23/repomind" target="_blank" rel="noopener noreferrer">SRKRZ23/repomind</a> · MIT
-> 🏆 Built for the <a href="https://lablab.ai/ai-hackathons/amd-developer" target="_blank" rel="noopener noreferrer">AMD Developer Hackathon 2026</a>
-> 🤗 HF Special Prize candidate · 🛡 Conservative claim discipline applied
-### Why AMD MI300X (verified 2026-05-05 on real hardware)
-- Qwen3-Coder-Next-FP8 weights = **77.29 GiB** in VRAM (verified)
-- 256K KV cache @ FP8 = **94.58 GiB** available (2,065,744 tokens, verified)
-- Activations + framework overhead → peak 176/191.7 GiB ≈ **92% utilization**
-- NVIDIA H100 80 GB cannot accommodate this on a single card by VRAM
-  accounting (~143 GB > 80 GB); MI300X 192 GB has the headroom
-### Status
 **Backend right now**: {BACKEND_LABEL}
-{BACKEND_HINT}
 """
@@ -237,8 +243,8 @@ with gr.Blocks(
           <a href="mailto:razikovsardor1@gmail.com">razikovsardor1@gmail.com</a> ·
           <a href="mailto:razikovs777@gmail.com">razikovs777@gmail.com</a>
         </p>
-        <p><em>If the MI300X memory-architecture story resonates,
-          <strong>a like on this Space helps with the Hugging Face Special Prize judging.</strong> 🤗</em></p>
         """
     )

 VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "").strip()
 MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8").strip()
 LIVE_BACKEND = bool(VLLM_BASE_URL)
+BACKEND_LABEL = (
+    f"🟢 Live AMD MI300X (vLLM endpoint `{VLLM_BASE_URL}`, model `{MODEL_NAME}`)"
+    if LIVE_BACKEND
+    else "🟡 CPU mock — HF Spaces ship CPU/T4 by default, not MI300X"
 )
 Ingest a git repository (up to 256K tokens, FP8) on a single GPU and
 reason across the whole codebase with multi-step tool use.
+> **Verified on a single MI300X (2026-05-05):** 256K context · 31/31 concurrent users at 8K–64K · 200K needle-in-haystack 3/3 · 9/9 end-to-end repo questions correct · $4.12 total stress test cost · AITER FP8 attention backend regression filed for AMD review.
+> 🎬 <a href="https://youtu.be/BvSBR1QazLU" target="_blank" rel="noopener noreferrer">1-minute demo video</a> ·
+> 📦 <a href="https://github.com/SRKRZ23/repomind" target="_blank" rel="noopener noreferrer">GitHub source (MIT)</a> ·
+> 🏆 <a href="https://lablab.ai/ai-hackathons/amd-developer/repomind/repomind" target="_blank" rel="noopener noreferrer">Lablab project page</a> ·
+> 🐛 <a href="https://devcommunity.amd.com/t/repomind-open-source-repo-scale-coding-agent-on-a-single-mi300x-256k-context-fp8-31-31x-concurrency-verified/505" target="_blank" rel="noopener noreferrer">AMD Developer Forum thread #505</a>
+### Why AMD MI300X — memory architecture
+| Component | Verified on MI300X | NVIDIA H100 80 GB |
+|---|---|---|
+| Qwen3-Coder-Next-FP8 weights in VRAM | **77.29 GiB** | fits |
+| 256K KV cache @ FP8 (2,065,744 tokens) | **94.58 GiB** available | cannot fit |
+| Total peak utilization | **176 / 191.7 GiB (92%)** | cannot accommodate (~143 GB > 80 GB) |
+This is a memory-architecture story. AMD MI300X 192 GB has the headroom on a single card; NVIDIA H100 80 GB cannot accommodate the same configuration by VRAM accounting.
+### Demo backend
+**This Space serves a CPU mock for UI demonstration only** — HF Spaces don't ship MI300X GPUs. The verified performance numbers above and in the *Verified evidence* tab come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).
 **Backend right now**: {BACKEND_LABEL}
+To wire a real MI300X endpoint, set Space secrets `VLLM_BASE_URL` + `MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8`. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
 """
           <a href="mailto:razikovsardor1@gmail.com">razikovsardor1@gmail.com</a> ·
           <a href="mailto:razikovs777@gmail.com">razikovs777@gmail.com</a>
         </p>
+        <p><em>Built for the AMD Developer Hackathon 2026 — eligible for the
+          <strong>Hugging Face Special Prize</strong>. If the verified MI300X numbers are useful, a Space like is appreciated. 🤗</em></p>
         """
     )