Spaces:

lablab-ai-amd-developer-hackathon
/

ForgeSight

Sleeping

App Files Files Community

rasAli02 commited on May 5

Commit

cb59dbe

1 Parent(s): 53cd64c

feat: update AMD inference server to 165.245.143.46 and increase timeout

Browse files

Files changed (2) hide show

README.md +6 -0
agents.py +4 -4

README.md CHANGED Viewed

@@ -23,6 +23,12 @@ tags:
 # 🔍 ForgeSight — Multimodal Quality-Control Copilot
 > **AMD + lablab.ai Hackathon** — Track 2 (AMD Developer Cloud) · Track 1 (AI Agents) · Track 3 (Vision & Multimodal AI)
 ForgeSight is a production-ready AI system that performs automated visual quality control on the **AMD Instinct MI300X** GPU. Upload a product image and a 4-agent agentic pipeline delivers a structured defect report in seconds.

 # 🔍 ForgeSight — Multimodal Quality-Control Copilot
+### ⚡ Live Status (Hackathon Mode)
+- **Primary Inference**: AMD Instinct MI300X (192GB VRAM)
+- **Backend**: FastAPI + vLLM on ROCm
+- **Current Server**: `165.245.143.46` (vLLM via Token Auth)
+- **Status**: ✅ **ONLINE** (Live Inference Active)
 > **AMD + lablab.ai Hackathon** — Track 2 (AMD Developer Cloud) · Track 1 (AI Agents) · Track 3 (Vision & Multimodal AI)
 ForgeSight is a production-ready AI system that performs automated visual quality control on the **AMD Instinct MI300X** GPU. Upload a product image and a 4-agent agentic pipeline delivers a structured defect report in seconds.

agents.py CHANGED Viewed

@@ -15,17 +15,17 @@ import httpx  # async HTTP — lightweight, no extra deps beyond requirements
 # ── AMD vLLM inference endpoint ─────────────────────────────────────────────
 # vLLM exposes an OpenAI-compatible API at /v1/chat/completions.
 # Set AMD_INFERENCE_URL in your .env to point at the running vLLM server.
-# Example: http://129.212.191.163:8000   (direct port — ensure firewall allows it)
 # Or use the Jupyter proxy route: http://129.212.191.163/proxy/8000
 AMD_INFERENCE_URL = os.environ.get(
     "AMD_INFERENCE_URL",
-    "http://129.212.184.42"
 ).rstrip("/")
 # Token for the AMD inference server (if required)
 AMD_INFERENCE_TOKEN = os.environ.get(
     "AMD_INFERENCE_TOKEN",
-    "sr49urlf/6cgbSvhp8lg1EyTiHd2VvsOa6dev8Rc/vfK83fra"
 )
 # The model name vLLM is serving (used in the chat/completions request).
@@ -33,7 +33,7 @@ AMD_INFERENCE_TOKEN = os.environ.get(
 AMD_MODEL_NAME = os.environ.get("AMD_MODEL_NAME", "Qwen/Qwen2-VL-7B-Instruct")
 # Timeout (seconds) to wait for the AMD server before falling back to mock.
-AMD_TIMEOUT = float(os.environ.get("AMD_TIMEOUT", "30"))
 # ── System prompts ───────────────────────────────────────────────────────────
 INSPECTOR_SYSTEM = """You are the INSPECTOR agent of ForgeSight — a multimodal quality-control copilot

 # ── AMD vLLM inference endpoint ─────────────────────────────────────────────
 # vLLM exposes an OpenAI-compatible API at /v1/chat/completions.
 # Set AMD_INFERENCE_URL in your .env to point at the running vLLM server.
+# Example: http://129.212.191.163   (direct port — ensure firewall allows it)
 # Or use the Jupyter proxy route: http://129.212.191.163/proxy/8000
 AMD_INFERENCE_URL = os.environ.get(
     "AMD_INFERENCE_URL",
+    "http://165.245.143.46:8000"
 ).rstrip("/")
 # Token for the AMD inference server (if required)
 AMD_INFERENCE_TOKEN = os.environ.get(
     "AMD_INFERENCE_TOKEN",
+    "5peRa6unb0DdXvzB3Pbck48IgNTDmxeJSUvE4NdnhvW70FcaX"
 )
 # The model name vLLM is serving (used in the chat/completions request).
 AMD_MODEL_NAME = os.environ.get("AMD_MODEL_NAME", "Qwen/Qwen2-VL-7B-Instruct")
 # Timeout (seconds) to wait for the AMD server before falling back to mock.
+AMD_TIMEOUT = float(os.environ.get("AMD_TIMEOUT", "60"))
 # ── System prompts ───────────────────────────────────────────────────────────
 INSPECTOR_SYSTEM = """You are the INSPECTOR agent of ForgeSight — a multimodal quality-control copilot