rasAli02 committed on
Commit 11c606d · verified · 1 Parent(s): 5afad50

Upload folder using huggingface_hub

Files changed (2)
  1. .gitattributes +35 -35
  2. README.md +128 -128
.gitattributes CHANGED
@@ -1,35 +1,35 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
 
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
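Both sides of this hunk read identically as rendered, which often points to an invisible change such as line endings (CRLF versus LF) or trailing whitespace. That cause is an assumption, since the page does not show the raw bytes. A minimal sketch for checking a pair of file versions, using a hypothetical helper named `differs_only_in_line_endings` and illustrative sample data:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def differs_only_in_line_endings(old: bytes, new: bytes) -> bool:
    """True when the two versions differ, but only in their line endings."""
    return old != new and normalize_newlines(old) == normalize_newlines(new)


# Illustrative data, not the actual file contents from this commit.
old_version = (
    b"*.7z filter=lfs diff=lfs merge=lfs -text\r\n"
    b"*.zip filter=lfs diff=lfs merge=lfs -text\r\n"
)
new_version = (
    b"*.7z filter=lfs diff=lfs merge=lfs -text\n"
    b"*.zip filter=lfs diff=lfs merge=lfs -text\n"
)

print(differs_only_in_line_endings(old_version, new_version))
```

Running the same check on the real blobs from both commits would confirm or rule out the line-ending explanation.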
README.md CHANGED
@@ -1,128 +1,128 @@
- ---
- title: ForgeSight
- emoji: 🔍
- colorFrom: red
- colorTo: gray
- sdk: docker
- pinned: true
- license: mit
- short_description: "Multimodal QC Copilot · AMD MI300X · Qwen2-VL"
- tags:
- - amd
- - rocm
- - mi300x
- - qwen
- - qwen2-vl
- - vllm
- - quality-control
- - agents
- - multimodal
- - industrial-ai
- - vision
- ---
-
- # 🔍 ForgeSight: Multimodal Quality-Control Copilot
-
- ### ⚡ Live Status (Hackathon Mode)
- - **Primary Inference**: AMD Instinct MI300X (192GB VRAM)
- - **Backend**: FastAPI + vLLM on ROCm
- - **Status**: ✅ **ONLINE** (Live Inference Active)
- - **Current Server**: `165.245.137.80` (vLLM via Token Auth)
-
- > **AMD + lablab.ai Hackathon**: Track 2 (AMD Developer Cloud) · Track 1 (AI Agents) · Track 3 (Vision & Multimodal AI)
-
- ForgeSight is a production-ready AI system that performs automated visual quality control on the **AMD Instinct MI300X** GPU. Upload a product image and a 4-agent agentic pipeline delivers a structured defect report in seconds.
-
- ---
-
- ## 🤖 Qwen2-VL: The Brain of ForgeSight
-
- ForgeSight is powered entirely by **[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)**, Alibaba Cloud's state-of-the-art multimodal vision-language model.
-
- ### Why Qwen2-VL?
-
- | Capability | How ForgeSight uses it |
- | --- | --- |
- | **Image understanding** | Reads raw product images: scratches, cracks, misalignments |
- | **Structured JSON output** | Each agent returns typed JSON: verdicts, defect lists, action codes |
- | **Long-context reasoning** | Diagnostician agent cross-references inspector findings over 8K tokens |
- | **Multilingual** | Operator notes can be submitted in any language |
- | **192 GB VRAM on MI300X** | Entire 7B model fits in GPU memory with headroom for 88× concurrent sessions |
-
- ### How Qwen2-VL is used across the 4-agent pipeline
-
- ```text
- Image Input (JPEG/PNG/WEBP)
-                        │
-                        ▼
- ┌─────────────────────────────────────────────────────────┐
- │  Agent 1 · INSPECTOR (Qwen2-VL)                         │
- │  → Detects defects, produces verdict: pass / warn / fail│
- └──────────────────────┬──────────────────────────────────┘
-                        │  inspector_report
-                        ▼
- ┌─────────────────────────────────────────────────────────┐
- │  Agent 2 · DIAGNOSTICIAN (Qwen2-VL)                     │
- │  → Classifies root cause, estimates severity            │
- └──────────────────────┬──────────────────────────────────┘
-                        │  diagnostic_report
-                        ▼
- ┌─────────────────────────────────────────────────────────┐
- │  Agent 3 · ACTION (Qwen2-VL)                            │
- │  → Maps defects to priority codes (P0–P3) + actions     │
- └──────────────────────┬──────────────────────────────────┘
-                        │  action_plan
-                        ▼
- ┌─────────────────────────────────────────────────────────┐
- │  Agent 4 · REPORTER (Qwen2-VL)                          │
- │  → Writes a human-readable QC report + social post      │
- └─────────────────────────────────────────────────────────┘
-                        │
-                        ▼
- Structured JSON → React Dashboard
- ```
-
- ---
-
- ## 🏗️ Architecture
-
- | Layer | Technology |
- | --- | --- |
- | **Hardware** | AMD Instinct MI300X · 192 GB HBM3 |
- | **Runtime** | ROCm 7.2.1 · PyTorch 2.10 (ROCm build) |
- | **Inference** | vLLM 0.20.1 (ROCm wheels) · OpenAI-compatible API |
- | **Model** | Qwen/Qwen2-VL-7B-Instruct |
- | **Backend** | FastAPI + Gradio · Python 3.12 |
- | **Persistence** | MongoDB Atlas (motor async driver) |
- | **Frontend** | React 18 · Recharts · Lucide |
- | **Deployment** | Hugging Face Spaces (Docker) |
-
- ---
-
- ## 🚀 Running Locally
-
- ```bash
- # 1. Start vLLM on your AMD GPU
- python -m vllm.entrypoints.openai.api_server \
-   --model Qwen/Qwen2-VL-7B-Instruct \
-   --host 0.0.0.0 --port 8000 \
-   --allowed-origins '["*"]'
-
- # 2. Set environment variables
- export AMD_INFERENCE_URL=http://localhost:8000
- export AMD_MODEL_NAME=Qwen/Qwen2-VL-7B-Instruct
- export MONGO_URL=mongodb+srv://... # optional
-
- # 3. Start the backend
- pip install -r requirements.txt
- python app.py
- ```
-
- ---
-
- ## 🎯 Hackathon Track Alignment
-
- - **Track 2 · AMD Developer Cloud** *(primary)*: Real MI300X inference via ROCm/vLLM
- - **Track 1 · AI Agents**: 4-agent agentic workflow (Inspector → Diagnostician → Action → Reporter)
- - **Track 3 · Vision & Multimodal AI**: Qwen2-VL processing product images for industrial QC
- - **Qwen Challenge**: Qwen2-VL-7B-Instruct is the sole model powering all four agents end-to-end
 
+ ---
+ title: ForgeSight
+ emoji: 🔍
+ colorFrom: red
+ colorTo: gray
+ sdk: docker
+ pinned: true
+ license: mit
+ short_description: "Multimodal QC Copilot · AMD MI300X · Qwen2-VL"
+ tags:
+ - amd
+ - rocm
+ - mi300x
+ - qwen
+ - qwen2-vl
+ - vllm
+ - quality-control
+ - agents
+ - multimodal
+ - industrial-ai
+ - vision
+ ---
+
+ # 🔍 ForgeSight: Multimodal Quality-Control Copilot
+
+ ### ⚡ Live Status (Hackathon Mode)
+ - **Primary Inference**: AMD Instinct MI300X (192GB VRAM)
+ - **Backend**: FastAPI + vLLM on ROCm
+ - **Status**: ✅ **ONLINE** (Live Inference Active)
+ - **Current Server**: `165.245.137.80` (vLLM via Token Auth)
+
+ > **AMD + lablab.ai Hackathon**: Track 2 (AMD Developer Cloud) · Track 1 (AI Agents) · Track 3 (Vision & Multimodal AI)
+
+ ForgeSight is a production-ready AI system that performs automated visual quality control on the **AMD Instinct MI300X** GPU. Upload a product image and a 4-agent agentic pipeline delivers a structured defect report in seconds.
+
+ ---
+
+ ## 🤖 Qwen2-VL: The Brain of ForgeSight
+
+ ForgeSight is powered entirely by **[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)**, Alibaba Cloud's state-of-the-art multimodal vision-language model.
+
+ ### Why Qwen2-VL?
+
+ | Capability | How ForgeSight uses it |
+ | --- | --- |
+ | **Image understanding** | Reads raw product images: scratches, cracks, misalignments |
+ | **Structured JSON output** | Each agent returns typed JSON: verdicts, defect lists, action codes |
+ | **Long-context reasoning** | Diagnostician agent cross-references inspector findings over 8K tokens |
+ | **Multilingual** | Operator notes can be submitted in any language |
+ | **192 GB VRAM on MI300X** | Entire 7B model fits in GPU memory with headroom for 88× concurrent sessions |
+
+ ### How Qwen2-VL is used across the 4-agent pipeline
+
+ ```text
+ Image Input (JPEG/PNG/WEBP)
+                        │
+                        ▼
+ ┌─────────────────────────────────────────────────────────┐
+ │  Agent 1 · INSPECTOR (Qwen2-VL)                         │
+ │  → Detects defects, produces verdict: pass / warn / fail│
+ └──────────────────────┬──────────────────────────────────┘
+                        │  inspector_report
+                        ▼
+ ┌─────────────────────────────────────────────────────────┐
+ │  Agent 2 · DIAGNOSTICIAN (Qwen2-VL)                     │
+ │  → Classifies root cause, estimates severity            │
+ └──────────────────────┬──────────────────────────────────┘
+                        │  diagnostic_report
+                        ▼
+ ┌─────────────────────────────────────────────────────────┐
+ │  Agent 3 · ACTION (Qwen2-VL)                            │
+ │  → Maps defects to priority codes (P0–P3) + actions     │
+ └──────────────────────┬──────────────────────────────────┘
+                        │  action_plan
+                        ▼
+ ┌─────────────────────────────────────────────────────────┐
+ │  Agent 4 · REPORTER (Qwen2-VL)                          │
+ │  → Writes a human-readable QC report + social post      │
+ └─────────────────────────────────────────────────────────┘
+                        │
+                        ▼
+ Structured JSON → React Dashboard
+ ```
+
+ ---
+
+ ## 🏗️ Architecture
+
+ | Layer | Technology |
+ | --- | --- |
+ | **Hardware** | AMD Instinct MI300X · 192 GB HBM3 |
+ | **Runtime** | ROCm 7.2.1 · PyTorch 2.10 (ROCm build) |
+ | **Inference** | vLLM 0.20.1 (ROCm wheels) · OpenAI-compatible API |
+ | **Model** | Qwen/Qwen2-VL-7B-Instruct |
+ | **Backend** | FastAPI + Gradio · Python 3.12 |
+ | **Persistence** | MongoDB Atlas (motor async driver) |
+ | **Frontend** | React 18 · Recharts · Lucide |
+ | **Deployment** | Hugging Face Spaces (Docker) |
+
+ ---
+
+ ## 🚀 Running Locally
+
+ ```bash
+ # 1. Start vLLM on your AMD GPU
+ python -m vllm.entrypoints.openai.api_server \
+   --model Qwen/Qwen2-VL-7B-Instruct \
+   --host 0.0.0.0 --port 8000 \
+   --allowed-origins '["*"]'
+
+ # 2. Set environment variables
+ export AMD_INFERENCE_URL=http://localhost:8000
+ export AMD_MODEL_NAME=Qwen/Qwen2-VL-7B-Instruct
+ export MONGO_URL=mongodb+srv://... # optional
+
+ # 3. Start the backend
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ ---
+
+ ## 🎯 Hackathon Track Alignment
+
+ - **Track 2 · AMD Developer Cloud** *(primary)*: Real MI300X inference via ROCm/vLLM
+ - **Track 1 · AI Agents**: 4-agent agentic workflow (Inspector → Diagnostician → Action → Reporter)
+ - **Track 3 · Vision & Multimodal AI**: Qwen2-VL processing product images for industrial QC
+ - **Qwen Challenge**: Qwen2-VL-7B-Instruct is the sole model powering all four agents end-to-end
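
The README in this diff describes a chain in which each agent receives the previous agent's report. A minimal sketch of that hand-off follows, assuming a caller-supplied `ask_model` function (hypothetical, standing in for a request to the vLLM OpenAI-compatible endpoint) and assuming each agent replies with a JSON object. The key names `inspector_report`, `diagnostic_report`, and `action_plan` come from the README diagram; `final_report`, `run_pipeline`, and `stub_model` are illustrative names, not ForgeSight's actual code.

```python
import json
from typing import Callable

# (agent name, key under which its output is stored). The first three keys
# appear in the README diagram; "final_report" is an assumed name.
AGENTS = [
    ("inspector", "inspector_report"),
    ("diagnostician", "diagnostic_report"),
    ("action", "action_plan"),
    ("reporter", "final_report"),
]


def run_pipeline(image_ref: str, ask_model: Callable[[str, dict], str]) -> dict:
    """Run the four agents in order, passing accumulated context forward."""
    context: dict = {"image": image_ref}
    for agent, key in AGENTS:
        raw = ask_model(agent, context)  # model reply as a JSON string
        context[key] = json.loads(raw)   # raises ValueError on malformed JSON
    return context


def stub_model(agent: str, context: dict) -> str:
    """Hypothetical stand-in for the real model call; returns canned JSON."""
    seen = sorted(k for k in context if k != "image")
    return json.dumps({"agent": agent, "saw": seen})


result = run_pipeline("part-001.png", stub_model)
print(json.dumps(result, indent=2))
```

Passing the model call in as a parameter keeps the chaining logic testable without a GPU; a real `ask_model` would build a chat request with the image and the prior reports and return the model's text reply.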