Space: lablab-ai-amd-developer-hackathon/ROCKIT-Vision-Intelligence

Billavenu committed on May 10

Commit fb12ddc (verified) · 1 parent: d38139f

adding filleeeesssss

Files changed (40)

.env.example +35 -0
.gitattributes +2 -0
.gitignore +6 -0
README.md +8 -8
__pycache__/config.cpython-312.pyc +0 -0
__pycache__/embedding.cpython-312.pyc +0 -0
__pycache__/ingest.cpython-312.pyc +0 -0
__pycache__/search.cpython-312.pyc +0 -0
__pycache__/seed_data.cpython-312.pyc +0 -0
__pycache__/vector_store.cpython-312.pyc +0 -0
app.py +405 -0
assests/Architecture.svg +3 -0
assests/GPU_Compute.png +0 -0
assests/architecture.png +0 -0
assests/dataFlow.png +0 -0
assests/data_flow.svg +3 -0
assests/gpu_compute_tiers.svg +3 -0
assests/rockit_logo.png +3 -0
config.py +96 -0
data/images/.gitkeep +2 -0
data/indexes/.gitkeep +2 -0
data/projects/default/images/car.jpg +0 -0
data/projects/default/images/dog.jpg +0 -0
data/projects/default/images/mountain.jpg +0 -0
data/projects/default/indexes/image_index.npz +3 -0
data/projects/default/indexes/image_index_meta.json +1 -0
data/projects/default/indexes/video_index.npz +3 -0
data/projects/default/indexes/video_index_meta.json +1 -0
data/projects/default/videos/sample.mp4 +3 -0
data/videos/.gitkeep +2 -0
embedding.py +245 -0
ingest.py +340 -0
ingest_sample_vision.py +254 -0
query_vision_image.py +91 -0
query_vision_video.py +111 -0
requirements.txt +9 -0
search.py +123 -0
seed_data.py +70 -0
test_store.py +50 -0
vector_store.py +420 -0

.env.example ADDED

	@@ -0,0 +1,35 @@

+# ─── ROCKIT Vision Intelligence ───
+#
+# On HF Spaces: set these as Secrets in Space Settings.
+# Locally: copy to .env and edit.
+# HF token (for dataset persistence + Inference API)
+HF_TOKEN=hf_your_token_here
+# Persistent dataset repo (optional)
+# HF_DATASET_REPO=your-username/aria-index
+# GPU mode
+USE_GPU=false
+# Embedding model (auto-selected if not set)
+# GPU:  Qwen/Qwen3-VL-Embedding-2B  (2048d)
+# GPU:  Qwen/Qwen3-VL-Embedding-8B  (4096d)
+# CPU:  openai/clip-vit-large-patch14 (768d)
+# EMBED_MODEL=Qwen/Qwen3-VL-Embedding-2B
+# EMBED_DIM=2048
+# LLM for result interpretation
+# LLM_MODEL=Qwen/Qwen3-35B-A3B
+# LLM_FALLBACK=Qwen/Qwen3-1.7B
+# Video frame interval (seconds)
+FRAME_EVERY_SEC=5
+# Auto-seed on first launch
+AUTO_SEED=true
+SEED_DATASET=nlphuji/flickr30k
+SEED_SPLIT=test[:200]
+# Default project name
+DEFAULT_PROJECT=default
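`FRAME_EVERY_SEC` above sets how densely videos are sampled, and the README states that ffmpeg extracts one frame every N seconds. `ingest.py` is not shown in this view, so the following is only a sketch of how such a call could be assembled. `frame_extraction_cmd` is a hypothetical helper, the `<stem>_%05d.jpg` naming and `-q:v 2` quality flag are illustrative choices, and the sampling relies on ffmpeg's `fps` filter with a fractional rate (`fps=1/N`).

```python
from pathlib import Path


def frame_extraction_cmd(video: str, out_dir: str, every_sec: int = 5) -> list[str]:
    """Build (but do not run) an ffmpeg command keeping one frame every `every_sec` seconds.

    Hypothetical helper: the output naming pattern and JPEG quality setting
    are illustrative, not taken from the repository.
    """
    if every_sec < 1:
        raise ValueError("every_sec must be a positive number of seconds")
    pattern = Path(out_dir) / f"{Path(video).stem}_%05d.jpg"
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",          # only report real errors
        "-i", video,
        "-vf", f"fps=1/{every_sec}",   # output rate of 1/N frames per second
        "-q:v", "2",                   # JPEG quality for .jpg output (lower = better)
        str(pattern),
    ]
```

Running it would then be `subprocess.run(frame_extraction_cmd(path, out_dir, FRAME_EVERY_SEC), check=True)`. Building the argument list separately keeps it testable without ffmpeg installed and avoids shell-quoting problems with user-supplied file names.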

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assests/rockit_logo.png filter=lfs diff=lfs merge=lfs -text
+data/projects/default/videos/sample.mp4 filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,6 @@

+.env
+data/
+__pycache__/
+*.pyc
+.DS_Store
+.ipynb_checkpoints/

README.md CHANGED

@@ -12,7 +12,7 @@ license: apache-2.0
 <div align="center">
-# ARIA Vision Intelligence
+# ROCKIT Vision Intelligence
 ### GPU-Accelerated Multimodal Search Engine
@@ -28,7 +28,7 @@ license: apache-2.0
 ## What Is This?
-ARIA Vision Intelligence is an **open-source, self-hosted multimodal search engine** that lets you create isolated projects, ingest visual media (images, videos), and query them with natural language. It is built for the **AMD Hackathon** and designed to showcase GPU-accelerated approximate nearest-neighbor (ANN) search using the **hipVS CAGRA** graph index on AMD ROCm hardware.
+ROCKIT Vision Intelligence is an **open-source, self-hosted multimodal search engine** that lets you create isolated projects, ingest visual media (images, videos), and query them with natural language. It is built for the **AMD Hackathon** and designed to showcase GPU-accelerated approximate nearest-neighbor (ANN) search using the **hipVS CAGRA** graph index on AMD ROCm hardware.
 The core idea is simple:
@@ -60,7 +60,7 @@ projects/
 ```
 ### Native Multimodal Embedding (No Captioning)
-Unlike caption-then-embed pipelines, ARIA uses **true vision-language embedding models** that encode images, video frames, and text queries into the **same vector space** directly. No intermediate captioning step — no information loss.
+Unlike caption-then-embed pipelines, ROCKIT uses **true vision-language embedding models** that encode images, video frames, and text queries into the **same vector space** directly. No intermediate captioning step — no information loss.
 | Tier | Model | Dim | Use Case |
 |------|-------|-----|----------|
@@ -69,7 +69,7 @@ Unlike caption-then-embed pipelines, ARIA uses **true vision-language embedding
 | CPU fallback | `openai/clip-vit-large-patch14` | 768 | Free-tier HF Spaces, dev |
 ### CAGRA Graph Index (hipVS)
-The CAGRA graph index is the fastest known ANN algorithm for GPU-resident data. ARIA rebuilds the CAGRA graph on every insert because this project is **optimized for inference and query speed**, not ingestion throughput. A 100K-vector CAGRA rebuild takes ~2 seconds on an MI250X — negligible compared to the embedding cost.
+The CAGRA graph index is the fastest known ANN algorithm for GPU-resident data. ROCKIT rebuilds the CAGRA graph on every insert because this project is **optimized for inference and query speed**, not ingestion throughput. A 100K-vector CAGRA rebuild takes ~2 seconds on an MI250X — negligible compared to the embedding cost.
 ### NVMe → VRAM Async Hot-Swap
 Indexes live in three tiers of memory. When a project is queried, its index is **asynchronously copied from NVMe into VRAM** via pinned-memory DMA, without blocking other projects. When VRAM fills up, least-recently-used indexes are evicted back to NVMe — not deleted.
@@ -89,7 +89,7 @@ Indexes live in three tiers of memory. When a project is queried, its index is *
 This design lets you run **dozens of projects** on a single GPU by keeping only the active ones hot. Full VRAM capacity is utilized.
 ### LLM-Interpreted Results
-Raw vector search returns `(id, score)` tuples. Before showing results to the user, ARIA passes them through an LLM that interprets the matches, merges adjacent video timestamps into time ranges, and generates a human-readable summary.
+Raw vector search returns `(id, score)` tuples. Before showing results to the user, ROCKIT passes them through an LLM that interprets the matches, merges adjacent video timestamps into time ranges, and generates a human-readable summary.
 | Tier | Model | Notes |
 |------|-------|-------|
@@ -111,7 +111,7 @@ Raw vector search returns `(id, score)` tuples. Before showing results to the us
 ## GPU Compute Tiers
-ARIA automatically detects available hardware and selects the best backend:
+ROCKIT automatically detects available hardware and selects the best backend:
 ![GPU Compute Tiers](assests/GPU_Compute.png)
@@ -231,7 +231,7 @@ Each project is an isolated workspace with its own sources, embeddings, and CAGR
 Upload images or videos. For videos, ffmpeg extracts one representative frame every N seconds. Every image and frame is embedded directly by the vision-language model (Qwen3-VL or CLIP) — no captioning, no text intermediary.
 ### 3. CAGRA Build
-After every insert, the CAGRA graph index is **fully rebuilt** from the updated vector set. This is intentional: ARIA is optimized for query speed, not ingestion throughput. A 100K rebuild takes ~2s on MI250X. The built graph is immediately serialized to NVMe.
+After every insert, the CAGRA graph index is **fully rebuilt** from the updated vector set. This is intentional: ROCKIT is optimized for query speed, not ingestion throughput. A 100K rebuild takes ~2s on MI250X. The built graph is immediately serialized to NVMe.
 ### 4. Search
 When you search, the query text is embedded by the same model. The CAGRA index is loaded into VRAM (if not already hot) via async pinned-memory DMA, and searched in microseconds. Results are post-processed: video frame hits are merged into time ranges, and the full result set is sent to the LLM for a human-friendly summary.
@@ -261,5 +261,5 @@ Apache 2.0
 ---
 <div align="center">
-<i>Built for the AMD Hackathon — ARIA Vision Intelligence Platform</i>
+<i>Built for the AMD Hackathon — ROCKIT Vision Intelligence Platform</i>
 </div>
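The README describes keeping only active project indexes in VRAM and evicting the least-recently-used ones back to NVMe without deleting them. The eviction policy on its own can be sketched with an `OrderedDict`. This is a hypothetical illustration, not the repository's `vector_store` code: `HotIndexCache` and `load_cold` are invented names, capacity is counted in indexes rather than bytes of VRAM, and loading is a plain synchronous call rather than an async pinned-memory copy.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class HotIndexCache:
    """LRU set of 'hot' indexes backed by a 'cold' loader.

    `load_cold` stands in for reading a serialized index from NVMe. Evicting
    an entry only drops it from the hot set; the cold copy is untouched, so
    the next access simply reloads it.
    """

    def __init__(self, capacity: int, load_cold: Callable[[Hashable], object]):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.load_cold = load_cold
        self._hot: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> object:
        if key in self._hot:
            self._hot.move_to_end(key)  # hit: mark as most recently used
            return self._hot[key]
        index = self.load_cold(key)  # miss: bring the index in from cold storage
        self._hot[key] = index
        if len(self._hot) > self.capacity:
            self._hot.popitem(last=False)  # evict the least recently used index
        return index

    def hot_keys(self) -> list:
        """Keys currently hot, ordered from least to most recently used."""
        return list(self._hot)
```

Because the cold copy is never deleted, an evicted project costs only a reload on its next query, which is the trade-off the hot-swap design relies on.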

__pycache__/config.cpython-312.pyc ADDED

Binary file (4.31 kB).

__pycache__/embedding.cpython-312.pyc ADDED

Binary file (10.1 kB).

__pycache__/ingest.cpython-312.pyc ADDED

Binary file (14.8 kB).

__pycache__/search.cpython-312.pyc ADDED

Binary file (5.32 kB).

__pycache__/seed_data.cpython-312.pyc ADDED

Binary file (3.17 kB).

__pycache__/vector_store.cpython-312.pyc ADDED

Binary file (24.3 kB).

app.py ADDED

	@@ -0,0 +1,405 @@

+#!/usr/bin/env python3
+"""
+HF_Space_hipVS/app.py
+=====================
+ROCKIT Vision Intelligence — Hugging Face Space
+GPU-accelerated multimodal search engine.
+  - Embedding: Qwen3-VL-Embedding (GPU) / CLIP (CPU)
+  - Search:    CAGRA (hipVS) -> PyTorch -> NumPy
+  - UI:        Premium Gradio Demo
+"""
+import logging
+import sys
+import os
+from pathlib import Path
+import gradio as gr
+sys.path.insert(0, str(Path(__file__).parent))
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(name)s] %(message)s")
+logger = logging.getLogger("rockit-vision")
+from config import (
+    USE_GPU, EMBED_MODEL, EMBED_DIM, LLM_MODEL, LLM_FALLBACK,
+    FRAME_EVERY_SEC, HF_TOKEN, HF_DATASET_REPO, AUTO_SEED,
+    DEFAULT_PROJECT, DATA_DIR
+)
+from vector_store import get_store, list_projects
+from ingest import (
+    ingest_images, ingest_videos,
+    ingest_single_image, ingest_single_video,
+    HAS_FFMPEG,
+)
+from search import search_images, search_videos
+import seed_data
+# ── Helpers ──────────────────────────────────────────────────────────────────
+def get_system_info(project: str = DEFAULT_PROJECT) -> str:
+    img_store = get_store(project, "image_index")
+    vid_store = get_store(project, "video_index")
+    return "\n".join([
+        f"### Project Context: `{project}`\n",
+        "| Hardware & Models | Status |",
+        "|:---|:---|",
+        f"| **GPU Acceleration** | {'🚀 Enabled' if USE_GPU else '🐢 Disabled (CPU)'} |",
+        f"| **Search Backend** | {img_store.mode} |",
+        f"| **Vision Model** | `{EMBED_MODEL.split('/')[-1]}` ({EMBED_DIM}d) |",
+        f"| **Reasoning LLM** | `{LLM_MODEL.split('/')[-1]}` |",
+        f"| **Media Engine** | {'ffmpeg detected' if HAS_FFMPEG else 'ffmpeg MISSING'} |",
+        "\n| Index Stats | Count | Location |",
+        "|:---|:---|:---|",
+        f"| Images | {img_store.count} | {('VRAM (Hot)' if img_store.in_vram else 'NVMe (Cold)')} |",
+        f"| Video Frames | {vid_store.count} | {('VRAM (Hot)' if vid_store.in_vram else 'NVMe (Cold)')} |",
+    ])
+def get_projects_list() -> list[str]:
+    projects = list_projects()
+    if DEFAULT_PROJECT not in projects:
+        projects.insert(0, DEFAULT_PROJECT)
+    return projects
+# ── Callbacks ────────────────────────────────────────────────────────────────
+def handle_image_upload(files, project, progress=gr.Progress()):
+    if not files:
+        return "No files uploaded.", get_system_info(project)
+    results = []
+    for i, f in enumerate(files):
+        progress((i + 1) / len(files), desc=f"Embedding {Path(f).name}...")
+        ok, msg = ingest_single_image(f, project=project)
+        results.append(msg)
+    return "\n".join(results), get_system_info(project)
+def handle_video_upload(files, project, progress=gr.Progress()):
+    if not files:
+        return "No files uploaded.", get_system_info(project)
+    results = []
+    for f in files:
+        count, msg = ingest_single_video(f, project=project, progress_callback=progress)
+        results.append(msg)
+    return "\n".join(results), get_system_info(project)
+def handle_batch_ingest(project, progress=gr.Progress()):
+    img_count, img_log = ingest_images(project=project, progress_callback=progress)
+    vid_count, vid_log = ingest_videos(project=project, progress_callback=progress)
+    log = (
+        f"=== Batch Ingest Results ===\n\n"
+        f"Successfully indexed {img_count} images and {vid_count} video frames into project '{project}'."
+    )
+    return log, get_system_info(project)
+def handle_seed(project, progress=gr.Progress()):
+    count, log = seed_data.run(project=project, progress_callback=progress)
+    return log, get_system_info(project)
+def handle_clear(project):
+    get_store(project, "image_index").clear()
+    get_store(project, "video_index").clear()
+    return f"All indexes cleared for project '{project}'.", get_system_info(project)
+def handle_search(query, mode, top_k, project):
+    if not query.strip():
+        return "Please enter a search query.", [], ""
+    if mode == "Image Search":
+        result = search_images(query, project=project, top_k=int(top_k))
+        summary = result["llm_summary"]
+        gallery_items = []
+        for r in result["results"]:
+            path = r.get("file_path", "")
+            name = r.get("file_name", "Unknown")
+            score = r.get("score", 0)
+            if path and os.path.exists(path):
+                gallery_items.append((path, f"{name} (Score: {score:.3f})"))
+        return summary, gallery_items, result["store_info"]
+    else:
+        result = search_videos(query, project=project, top_k=int(top_k))
+        summary = result["llm_summary"]
+        gallery_items = []
+        for m in result["matches"]:
+            path = m.get("representative_frame", "")
+            name = m.get("video_name", "Unknown")
+            time_range = f"{m['start']} - {m['end']}"
+            score = m.get("score", 0)
+            if path and os.path.exists(path):
+                gallery_items.append((path, f"{name} @ {time_range} (Score: {score:.3f})"))
+        return summary, gallery_items, result["store_info"]
+def handle_create_project(name):
+    if not name or not name.strip():
+        return "Enter a project name.", gr.update()
+    name = name.strip().lower().replace(" ", "-")
+    from config import get_project_dir
+    get_project_dir(name)
+    return f"Project '{name}' created.", gr.update(choices=get_projects_list(), value=name)
+# ── CSS ──────────────────────────────────────────────────────────────────────
+CSS = """
+@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800&display=swap');
+body { font-family: 'Inter', sans-serif !important; }
+.gradio-container {
+    max-width: 1300px !important;
+    margin: 0 auto !important;
+    background-color: #050505 !important;
+}
+.main-header {
+    text-align: center;
+    background: linear-gradient(135deg, #0f0f1b 0%, #1a1a2e 100%);
+    padding: 3rem 2rem;
+    border-radius: 24px;
+    margin-bottom: 2rem;
+    border: 1px solid rgba(255,255,255,0.05);
+    box-shadow: 0 10px 30px rgba(0,0,0,0.5);
+    display: flex;
+    flex-direction: column;
+    align-items: center;
+}
+.logo-container img {
+    max-width: 120px;
+    margin-bottom: 1.5rem;
+    filter: drop-shadow(0 0 15px rgba(233, 69, 96, 0.4));
+}
+.main-header h1 {
+    background: linear-gradient(90deg, #e94560, #a033ff, #4cc9f0);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+    font-size: 3.2rem !important;
+    font-weight: 800 !important;
+    margin: 0;
+    letter-spacing: -1px;
+}
+.main-header p.subtitle {
+    color: #94a3b8;
+    font-size: 1.1rem;
+    margin-top: 0.5rem;
+}
+.card {
+    background: #11111b !important;
+    border: 1px solid rgba(255,255,255,0.08) !important;
+    border-radius: 16px !important;
+    padding: 1rem !important;
+}
+#search-btn {
+    background: linear-gradient(135deg, #e94560 0%, #533483 100%) !important;
+    border: none !important;
+    font-weight: 700 !important;
+    color: white !important;
+    transition: all 0.3s ease;
+}
+#search-btn:hover {
+    transform: translateY(-2px);
+    box-shadow: 0 5px 15px rgba(233, 69, 96, 0.4);
+}
+.stat-box {
+    background: rgba(255,255,255,0.03);
+    border-radius: 12px;
+    padding: 1rem;
+    border: 1px solid rgba(255,255,255,0.05);
+}
+.gallery-container {
+    background: #0a0a0f !important;
+    border-radius: 12px !important;
+}
+footer { display: none !important; }
+"""
+# ── Build UI ─────────────────────────────────────────────────────────────────
+def build_ui():
+    logo_path = "assests/rockit_logo.png"
+    arch_path = "assests/Architecture.svg"
+    flow_path = "assests/data_flow.svg"
+    gpu_path = "assests/gpu_compute_tiers.svg"
+    with gr.Blocks(
+        title="ROCKIT Vision Intelligence",
+        theme=gr.themes.Default(
+            primary_hue="rose",
+            secondary_hue="indigo",
+            neutral_hue="slate",
+        ),
+        css=CSS,
+    ) as app:
+        with gr.Column(elem_classes="main-header"):
+            if os.path.exists(logo_path):
+                gr.Image(logo_path, show_label=False, container=False, width=100, elem_classes="logo-container")
+            gr.HTML("<h1>ROCKIT Vision Intelligence</h1>")
+            gr.Markdown("GPU-Accelerated Multimodal Search Platform", elem_classes="subtitle")
+        with gr.Row():
+            with gr.Column(scale=3):
+                with gr.Group(elem_classes="card"):
+                    gr.Markdown("### 🗂️ Project Selection")
+                    with gr.Row():
+                        project_select = gr.Dropdown(
+                            choices=get_projects_list(),
+                            value=DEFAULT_PROJECT,
+                            label="Active Workspace",
+                            scale=4,
+                            interactive=True,
+                        )
+                        refresh_btn = gr.Button("🔄", scale=1)
+                    with gr.Accordion("Create New Project", open=False):
+                        new_project_name = gr.Textbox(label="Project ID", placeholder="e.g. security-cam")
+                        create_btn = gr.Button("Initialize Project", variant="secondary")
+                        create_status = gr.Markdown()
+                with gr.Group(elem_classes="card", visible=True):
+                    gr.Markdown("### ⚙️ System Status")
+                    system_info = gr.Markdown(value=get_system_info())
+            with gr.Column(scale=7):
+                with gr.Tabs():
+                    # ── Tab 1: Search ──────────────────────────────────────────
+                    with gr.Tab("🔍 Search"):
+                        with gr.Group(elem_classes="card"):
+                            with gr.Row():
+                                with gr.Column(scale=4):
+                                    query_input = gr.Textbox(
+                                        label="Natural Language Query",
+                                        placeholder='Try "a cat sitting on a laptop" or "someone running in a park"',
+                                        lines=2,
+                                    )
+                                with gr.Column(scale=1):
+                                    search_mode = gr.Radio(["Image Search", "Video Intelligence"], value="Image Search", label="Search Mode")
+                                    top_k = gr.Slider(1, 50, value=12, step=1, label="Results Count")
+                            search_btn = gr.Button("Execute Semantic Search", variant="primary", elem_id="search-btn", size="lg")
+                        gr.Markdown("### 🤖 AI Interpretation")
+                        search_summary = gr.Markdown("*Results will appear here...*", elem_classes="card")
+                        gr.Markdown("### 🖼️ Visual Matches")
+                        result_gallery = gr.Gallery(
+                            label="Retrieved Media",
+                            columns=[3, 4],
+                            rows=[2],
+                            object_fit="contain",
+                            height="auto",
+                            elem_classes="gallery-container"
+                        )
+                        with gr.Accordion("Technical Details", open=False):
+                            store_info = gr.Textbox(label="Vector Store Engine", interactive=False)
+                    # ── Tab 2: Upload ──────────────────────────────────────────
+                    with gr.Tab("📤 Ingest Media"):
+                        with gr.Row():
+                            with gr.Column():
+                                with gr.Group(elem_classes="card"):
+                                    gr.Markdown("#### 🖼️ Image Ingestion")
+                                    img_upload = gr.File(label="Select Images", file_types=["image"], file_count="multiple")
+                                    img_btn = gr.Button("Embed & Index Images")
+                                    img_log = gr.Textbox(label="Status", lines=4, interactive=False)
+                            with gr.Column():
+                                with gr.Group(elem_classes="card"):
+                                    gr.Markdown("#### 🎥 Video Intelligence")
+                                    vid_upload = gr.File(label="Select Videos", file_types=["video"], file_count="multiple")
+                                    vid_btn = gr.Button("Extract & Index Frames")
+                                    vid_log = gr.Textbox(label="Status", lines=4, interactive=False)
+                        with gr.Group(elem_classes="card"):
+                            gr.Markdown("#### ⚡ Batch Operations")
+                            with gr.Row():
+                                seed_btn = gr.Button("Seed Demo Data", variant="secondary")
+                                batch_btn = gr.Button("Re-index Folder", variant="secondary")
+                                clear_btn = gr.Button("Purge All Indexes", variant="stop")
+                            action_log = gr.Markdown()
+                    # ── Tab 3: Workflow ────────────────────────────────────────
+                    with gr.Tab("🧠 How It Works"):
+                        gr.Markdown("""
+                        ### Direct Multimodal Embedding
+                        ROCKIT doesn't use captioning models. It uses **Vision-Language Models (VLM)** to encode visual features
+                        directly into the same vector space as text. This preserves subtle details that text captions often lose.
+                        """)
+                        with gr.Row():
+                            with gr.Column():
+                                gr.Markdown("#### 1. System Architecture")
+                                if os.path.exists(arch_path):
+                                    gr.Image(arch_path, show_label=False)
+                            with gr.Column():
+                                gr.Markdown("#### 2. Query Flow")
+                                if os.path.exists(flow_path):
+                                    gr.Image(flow_path, show_label=False)
+                        gr.Markdown("---")
+                        with gr.Row():
+                            with gr.Column():
+                                gr.Markdown("#### 3. GPU Acceleration Tiers")
+                                if os.path.exists(gpu_path):
+                                    gr.Image(gpu_path, show_label=False)
+                            with gr.Column():
+                                gr.Markdown("""
+                                #### Hot/Cold Memory Management
+                                To support dozens of projects on a single GPU, ROCKIT implements an **NVMe-to-VRAM Async Swap**.
+                                - **Cold Store (NVMe):** Indexes are serialized as `.cagra` files.
+                                - **Hot Cache (VRAM):** Active projects are copied into VRAM using pinned-memory DMA.
+                                - **LRU Eviction:** Least recently used indexes are purged from VRAM to make room for new ones.
+                                """)
+        # Event Bindings
+        project_select.change(fn=get_system_info, inputs=[project_select], outputs=[system_info])
+        refresh_btn.click(fn=lambda: gr.update(choices=get_projects_list()), outputs=[project_select])
+        create_btn.click(
+            fn=handle_create_project,
+            inputs=[new_project_name],
+            outputs=[create_status, project_select],
+        )
+        search_btn.click(
+            fn=handle_search,
+            inputs=[query_input, search_mode, top_k, project_select],
+            outputs=[search_summary, result_gallery, store_info]
+        )
+        query_input.submit(
+            fn=handle_search,
+            inputs=[query_input, search_mode, top_k, project_select],
+            outputs=[search_summary, result_gallery, store_info]
+        )
+        img_btn.click(fn=handle_image_upload, inputs=[img_upload, project_select], outputs=[img_log, system_info])
+        vid_btn.click(fn=handle_video_upload, inputs=[vid_upload, project_select], outputs=[vid_log, system_info])
+        seed_btn.click(fn=handle_seed, inputs=[project_select], outputs=[action_log, system_info])
+        batch_btn.click(fn=handle_batch_ingest, inputs=[project_select], outputs=[action_log, system_info])
+        clear_btn.click(fn=handle_clear, inputs=[project_select], outputs=[action_log, system_info])
+    return app
+if __name__ == "__main__":
+    if seed_data.is_needed():
+        logger.info("Auto-seeding default project from HF Dataset...")
+        try:
+            seed_data.run()
+        except Exception as e:
+            logger.error(f"Auto-seeding failed: {e}")
+    app = build_ui()
+    app.launch(server_name="0.0.0.0", server_port=7860, share=False)
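In the video branch of `handle_search`, each match is expected to carry `video_name`, `start`, `end` and `score`, and the README says adjacent frame hits are merged into time ranges before display. `search.py` is not part of this view, so the following is only a sketch of one plausible merge rule: hits from the same video are joined when they are at most one sampling interval apart, and each range keeps its best score. `FrameHit` and `merge_hits` are hypothetical names, and timestamps are assumed to be numeric seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHit:
    video_name: str
    timestamp: float  # seconds from the start of the video
    score: float


def merge_hits(hits, frame_every_sec: float = 5.0):
    """Collapse frame hits into per-video time ranges.

    Consecutive hits from the same video whose timestamps differ by at most
    `frame_every_sec` are treated as one continuous match. Each range keeps
    the best score among its frames; ranges are returned best-first.
    """
    ranges = []
    for hit in sorted(hits, key=lambda h: (h.video_name, h.timestamp)):
        last = ranges[-1] if ranges else None
        if (
            last is not None
            and last["video_name"] == hit.video_name
            and hit.timestamp - last["end"] <= frame_every_sec
        ):
            last["end"] = hit.timestamp  # extend the current range
            last["score"] = max(last["score"], hit.score)
        else:
            ranges.append({
                "video_name": hit.video_name,
                "start": hit.timestamp,
                "end": hit.timestamp,
                "score": hit.score,
            })
    return sorted(ranges, key=lambda r: r["score"], reverse=True)
```

The returned dicts reuse the `video_name`, `start`, `end` and `score` keys that `handle_search` reads; `representative_frame` is left out of this sketch.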

assests/Architecture.svg ADDED

assests/GPU_Compute.png ADDED

assests/architecture.png ADDED

assests/dataFlow.png ADDED

assests/data_flow.svg ADDED

assests/gpu_compute_tiers.svg ADDED

assests/rockit_logo.png ADDED

Git LFS Details

SHA256: 98f99203fd7f4d41d670495b9d0555d2d62c44cea6dc51af69f754e86329a30f
Pointer size: 131 Bytes
Size of remote file: 612 kB

config.py ADDED

	@@ -0,0 +1,96 @@

+# HF_Space_hipVS/config.py
+# ========================
+# Environment-aware configuration.
+# Auto-scales model selection by hardware tier.
+import os
+import logging
+from pathlib import Path
+logger = logging.getLogger(__name__)
+# ── Core Flags ──────────────────────────────────────────────────────────────
+USE_GPU = os.environ.get("USE_GPU", "false").lower() in ("true", "1", "yes")
+HF_TOKEN = os.environ.get("HF_TOKEN", "")
+HF_DATASET_REPO = os.environ.get("HF_DATASET_REPO", "")
+# ── Device ──────────────────────────────────────────────────────────────────
+DEVICE = "cuda" if USE_GPU else "cpu"
+TORCH_DTYPE = "float16" if USE_GPU else "float32"
+# ── Embedding Model (multimodal — images + text, NO captioning) ─────────────
+#
+# GPU:  Qwen3-VL-Embedding-2B  (2048d) or Qwen3-VL-Embedding-8B (4096d)
+# CPU:  CLIP ViT-L/14 (768d) — lightweight, runs on free HF Spaces
+#
+if USE_GPU:
+    EMBED_MODEL = os.environ.get("EMBED_MODEL", "Qwen/Qwen3-VL-Embedding-2B")
+    EMBED_DIM = int(os.environ.get("EMBED_DIM", "2048"))
+else:
+    EMBED_MODEL = os.environ.get("EMBED_MODEL", "openai/clip-vit-large-patch14")
+    EMBED_DIM = int(os.environ.get("EMBED_DIM", "768"))
+# ── LLM (search result interpretation) ─────────────────────────────────────
+#
+# Primary:  Qwen3-35B-A3B  (MoE: 35B total, 3B active — fast + smart)
+# Fallback: Qwen3-1.7B     (dense, runs on anything)
+#
+LLM_MODEL = os.environ.get("LLM_MODEL", "Qwen/Qwen3-35B-A3B")
+LLM_FALLBACK = os.environ.get("LLM_FALLBACK", "Qwen/Qwen3-1.7B")
+# ── Video Frame Extraction ─────────────────────────────────────────────────
+FRAME_EVERY_SEC = int(os.environ.get("FRAME_EVERY_SEC", "5"))
+# ── Data Directories ────────────────────────────────────────────────────────
+DATA_DIR = Path(os.environ.get("DATA_DIR", str(Path(__file__).parent / "data")))
+PROJECTS_DIR = DATA_DIR / "projects"
+DEFAULT_PROJECT = os.environ.get("DEFAULT_PROJECT", "default")
+SWAP_PATH = Path(os.environ.get("SWAP_PATH", str(DATA_DIR / "indexes")))
+# Ensure base directories
+for d in (PROJECTS_DIR, SWAP_PATH):
+    d.mkdir(parents=True, exist_ok=True)
+# ── Per-project directories ─────────────────────────────────────────────────
+def get_project_dir(project: str = DEFAULT_PROJECT) -> Path:
+    """Return the root directory for a project, creating it if needed."""
+    p = PROJECTS_DIR / project
+    for sub in ("images", "videos", "indexes"):
+        (p / sub).mkdir(parents=True, exist_ok=True)
+    return p
+# Ensure default project exists
+get_project_dir(DEFAULT_PROJECT)
+# ── Seeding ─────────────────────────────────────────────────────────────────
+SEED_DATASET = os.environ.get("SEED_DATASET", "nlphuji/flickr30k")
+SEED_SPLIT = os.environ.get("SEED_SPLIT", "test[:200]")
+AUTO_SEED = os.environ.get("AUTO_SEED", "true").lower() in ("true", "1", "yes")
+# ── File Extensions ─────────────────────────────────────────────────────────
+IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"}
+VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}
+# ── Startup Log ─────────────────────────────────────────────────────────────
+logger.info("=" * 55)
+logger.info("  ROCKIT Vision Intelligence")
+logger.info("=" * 55)
+logger.info(f"  USE_GPU       : {USE_GPU}")
+logger.info(f"  DEVICE        : {DEVICE}")
+logger.info(f"  EMBED_MODEL   : {EMBED_MODEL}")
+logger.info(f"  EMBED_DIM     : {EMBED_DIM}")
+logger.info(f"  LLM_MODEL     : {LLM_MODEL}")
+logger.info(f"  LLM_FALLBACK  : {LLM_FALLBACK}")
+logger.info(f"  SWAP_PATH     : {SWAP_PATH}")
+logger.info(f"  HF_TOKEN      : {'set' if HF_TOKEN else 'NOT SET'}")
+logger.info(f"  HF_DATASET    : {HF_DATASET_REPO or 'local only'}")
+logger.info(f"  AUTO_SEED     : {AUTO_SEED}")
+logger.info("=" * 55)

data/images/.gitkeep ADDED

@@ -0,0 +1,2 @@

+# This directory stores uploaded images for embedding.
+# Place your image files here (.jpg, .png, .webp, etc.)

data/indexes/.gitkeep ADDED

	@@ -0,0 +1,2 @@


+# Persisted vector indexes are stored here as .npz files.
+# This directory is auto-created by the vector_store module.

data/projects/default/images/car.jpg ADDED

data/projects/default/images/dog.jpg ADDED

data/projects/default/images/mountain.jpg ADDED

data/projects/default/indexes/image_index.npz ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7497eb8fd578e2e4d7c14cf0b26e70b9f0bc6c05b62b00bc4685590cd5ad7503
+size 29169
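The `.npz` indexes and `sample.mp4` are committed as Git LFS pointer files: a few `key value` lines that start with the spec `version` line. If a clone or Space build fetches the pointer but not the object, loading the index would likely fail on this small text file. Below is a minimal sketch of a pre-load check, assuming the pointer layout shown here. `read_lfs_pointer` is a hypothetical helper, not part of the repo.

```python
import tempfile
from pathlib import Path
from typing import Optional

# First line of every Git LFS pointer file.
LFS_HEADER = "version https://git-lfs.github.com/spec/v1"


def read_lfs_pointer(path: Path) -> Optional[dict]:
    """Hypothetical check: return pointer fields if `path` is an LFS pointer, else None."""
    with open(path, "rb") as fh:
        head = fh.read(1024)  # pointers are tiny; real .npz/.mp4 payloads are not
    try:
        text = head.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary data, so this is the real object
    lines = text.splitlines()
    if not lines or lines[0].strip() != LFS_HEADER:
        return None
    fields = {}
    for line in lines:
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ptr = Path(tmp) / "image_index.npz"
        # Dummy pointer for the demo (oid shortened on purpose).
        ptr.write_text(LFS_HEADER + "\noid sha256:abc123\nsize 29169\n")
        print(read_lfs_pointer(ptr)["size"])  # 29169
```

A loader could call this before opening an index and log a clear "LFS object missing" message instead of a parsing error.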

data/projects/default/indexes/image_index_meta.json ADDED

	@@ -0,0 +1 @@

+ [{"file_name": "mountain_sunset.jpg", "file_size": "245.3KB", "resolution": "1920x1080", "file_path": "/data/images/mountain_sunset.jpg"}, {"file_name": "dog_park.jpg", "file_size": "189.7KB", "resolution": "1280x720", "file_path": "/data/images/dog_park.jpg"}, {"file_name": "red_car.jpg", "file_size": "312.1KB", "resolution": "1920x1080", "file_path": "/data/images/red_car.jpg"}, {"file_name": "ocean_waves.jpg", "file_size": "276.4KB", "resolution": "2560x1440", "file_path": "/data/images/ocean_waves.jpg"}, {"file_name": "city_night.jpg", "file_size": "198.2KB", "resolution": "1920x1080", "file_path": "/data/images/city_night.jpg"}, {"file_name": "cat_windowsill.jpg", "file_size": "145.6KB", "resolution": "1280x960", "file_path": "/data/images/cat_windowsill.jpg"}, {"file_name": "forest_trail.jpg", "file_size": "334.8KB", "resolution": "2560x1440", "file_path": "/data/images/forest_trail.jpg"}, {"file_name": "beach_sunset.jpg", "file_size": "267.9KB", "resolution": "1920x1080", "file_path": "/data/images/beach_sunset.jpg"}, {"file_name": "snow_mountain.jpg", "file_size": "289.3KB", "resolution": "3840x2160", "file_path": "/data/images/snow_mountain.jpg"}, {"file_name": "flower_garden.jpg", "file_size": "203.5KB", "resolution": "1600x1200", "file_path": "/data/images/flower_garden.jpg"}]
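The ten metadata entries (mountain_sunset.jpg, dog_park.jpg, ...) match the SAMPLE_IMAGES list in ingest_sample_vision.py, and their `file_path` values point at `/data/images/...`, while the images committed under `data/projects/default/images` are car.jpg, dog.jpg and mountain.jpg. The committed index therefore appears to describe synthetic entries rather than files on disk. Below is a minimal sketch of a consistency check, assuming the metadata is a JSON list of dicts with a `file_name` key as shown. `missing_indexed_files` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def missing_indexed_files(meta_json: Path, image_dir: Path) -> list[str]:
    """Hypothetical check: file_name entries in the metadata with no file in image_dir."""
    records = json.loads(meta_json.read_text(encoding="utf-8"))
    on_disk = {p.name for p in image_dir.iterdir() if p.is_file()}
    # Records without a file_name are reported as "?".
    return [r.get("file_name", "?") for r in records
            if r.get("file_name") not in on_disk]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp) / "images"
        images.mkdir()
        (images / "car.jpg").write_bytes(b"")
        meta = Path(tmp) / "image_index_meta.json"
        meta.write_text(json.dumps([{"file_name": "car.jpg"},
                                    {"file_name": "red_car.jpg"}]))
        print(missing_indexed_files(meta, images))  # ['red_car.jpg']
```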

data/projects/default/indexes/video_index.npz ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:43c04bb0e533e610d5d5fc4a9d1a8823cf504e846e3ed3ed465608c0a550bf79
+size 57673

data/projects/default/indexes/video_index_meta.json ADDED

	@@ -0,0 +1 @@

+ [{"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 0.5, "timestamp_label": "00:00", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 5.0, "timestamp_label": "00:05", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 10.0, "timestamp_label": "00:10", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 15.0, "timestamp_label": "00:15", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 20.0, "timestamp_label": "00:20", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 25.0, "timestamp_label": "00:25", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 30.0, "timestamp_label": "00:30", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 35.0, "timestamp_label": "00:35", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 40.0, "timestamp_label": "00:40", "duration_total": 120.0}, {"video_path": "/data/videos/nature_doc.mp4", "video_name": "nature_doc.mp4", "timestamp_sec": 45.0, "timestamp_label": "00:45", "duration_total": 120.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 0.5, "timestamp_label": "00:00", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 5.0, "timestamp_label": "00:05", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 10.0, "timestamp_label": "00:10", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 15.0, "timestamp_label": "00:15", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 20.0, "timestamp_label": "00:20", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 25.0, "timestamp_label": "00:25", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 30.0, "timestamp_label": "00:30", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 35.0, "timestamp_label": "00:35", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 40.0, "timestamp_label": "00:40", "duration_total": 60.0}, {"video_path": "/data/videos/big_buck_bunny.mp4", "video_name": "big_buck_bunny.mp4", "timestamp_sec": 45.0, "timestamp_label": "00:45", "duration_total": 60.0}]

data/projects/default/videos/sample.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3bb938fb70049e3e45f533b37ccae995ae96516e04c2f35b0c1142e47b2a39c1
+size 788493
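The video metadata above stores one record per sampled frame, 5 s apart in the sample index, and llm_summarize asks the LLM to highlight "time ranges". Below is a deterministic sketch that merges frame hits from the same video into `(start, end)` ranges before prompting, assuming each result carries `video_name` and `timestamp_sec` like these records. `merge_hits` and its `gap` parameter are hypothetical and not part of the repo.

```python
from collections import defaultdict


def merge_hits(hits: list[dict], gap: float = 5.0) -> dict[str, list[tuple[float, float]]]:
    """Hypothetical post-processing: merge per-frame hits into (start, end) ranges.

    Hits from the same video whose timestamps are at most `gap` seconds apart
    are joined; `gap` should match the frame interval (5 s in the sample index).
    """
    by_video: dict[str, list[float]] = defaultdict(list)
    for hit in hits:
        by_video[hit["video_name"]].append(float(hit["timestamp_sec"]))

    ranges: dict[str, list[tuple[float, float]]] = {}
    for name, times in by_video.items():
        times.sort()
        start = end = times[0]
        merged = []
        for t in times[1:]:
            if t - end <= gap:
                end = t  # close enough: extend the current range
            else:
                merged.append((start, end))
                start = end = t  # gap too large: start a new range
        merged.append((start, end))
        ranges[name] = merged
    return ranges


if __name__ == "__main__":
    hits = [
        {"video_name": "nature_doc.mp4", "timestamp_sec": 10.0},
        {"video_name": "nature_doc.mp4", "timestamp_sec": 5.0},
        {"video_name": "nature_doc.mp4", "timestamp_sec": 30.0},
        {"video_name": "big_buck_bunny.mp4", "timestamp_sec": 0.5},
    ]
    print(merge_hits(hits))
    # {'nature_doc.mp4': [(5.0, 10.0), (30.0, 30.0)], 'big_buck_bunny.mp4': [(0.5, 0.5)]}
```

Each range could then be rendered with the same `mm:ss` formatting as `timestamp_label` (e.g. `00:05-00:10`).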

data/videos/.gitkeep ADDED

	@@ -0,0 +1,2 @@


+# This directory stores uploaded videos for frame extraction and embedding.
+# Place your video files here (.mp4, .mov, .avi, etc.)

embedding.py ADDED

	@@ -0,0 +1,245 @@

+# HF_Space_hipVS/embedding.py
+# ============================
+# Multimodal embedding + LLM calls.
+#
+# Embedding strategy: NO CAPTIONING.
+#   GPU:  Qwen3-VL-Embedding (2B or 8B) — encodes images AND text into same space
+#   CPU:  CLIP ViT-L/14 — same idea, lighter weight
+#
+# LLM strategy:
+#   Primary:  Qwen3-35B-A3B (local or HF Inference API)
+#   Fallback: Qwen3-1.7B or HF Inference API
+import logging
+import io
+import numpy as np
+from PIL import Image as PILImage
+logger = logging.getLogger(__name__)
+# ── Lazy-loaded model singletons ─────────────────────────────────────────────
+_embed_model = None
+_embed_processor = None
+_embed_tokenizer = None
+_is_clip = False
+def _load_embed_model():
+    """
+    Lazy-init the multimodal embedding model.
+    GPU path: Qwen3-VL-Embedding via transformers
+    CPU path: CLIP via transformers (CLIPModel + CLIPProcessor)
+    """
+    global _embed_model, _embed_processor, _embed_tokenizer, _is_clip
+    if _embed_model is not None:
+        return
+    import torch
+    from config import EMBED_MODEL, DEVICE, USE_GPU
+    model_lower = EMBED_MODEL.lower()
+    if "clip" in model_lower:
+        # ── CLIP path (CPU fallback) ────────────────────────────────────
+        from transformers import CLIPModel, CLIPProcessor
+        logger.info(f"Loading CLIP model: {EMBED_MODEL} on {DEVICE}")
+        _embed_model = CLIPModel.from_pretrained(EMBED_MODEL).to(DEVICE)
+        _embed_processor = CLIPProcessor.from_pretrained(EMBED_MODEL)
+        _embed_model.eval()
+        _is_clip = True
+        logger.info("CLIP model loaded")
+    else:
+        # ── Qwen3-VL-Embedding path (GPU) ──────────────────────────────
+        from transformers import AutoModel, AutoProcessor
+        dtype = torch.float16 if USE_GPU else torch.float32
+        logger.info(f"Loading Qwen3-VL-Embedding: {EMBED_MODEL} on {DEVICE}")
+        _embed_model = AutoModel.from_pretrained(
+            EMBED_MODEL,
+            torch_dtype=dtype,
+            trust_remote_code=True,
+        ).to(DEVICE)
+        _embed_processor = AutoProcessor.from_pretrained(
+            EMBED_MODEL,
+            trust_remote_code=True,
+        )
+        _embed_model.eval()
+        _is_clip = False
+        logger.info("Qwen3-VL-Embedding model loaded")
+# ── Text Embedding ──────────────────────────────────────────────────────────
+def embed_text(text: str) -> np.ndarray:
+    """
+    Embed a text string into the shared multimodal vector space.
+    Works with both CLIP and Qwen3-VL-Embedding.
+    Returns a normalized float32 numpy vector.
+    """
+    import torch
+    from config import DEVICE
+    _load_embed_model()
+    with torch.no_grad():
+        if _is_clip:
+            inputs = _embed_processor(text=[text], return_tensors="pt", padding=True, truncation=True).to(DEVICE)
+            features = _embed_model.get_text_features(**inputs)
+        else:
+            # Qwen3-VL-Embedding: text-only input
+            inputs = _embed_processor(text=[text], return_tensors="pt", padding=True, truncation=True).to(DEVICE)
+            outputs = _embed_model(**inputs)
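+            # Prefer pooler_output; otherwise fall back to the first token's hidden state
+            # (decoder-style embedders often pool the last token instead; verify for this model)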
+            if hasattr(outputs, "pooler_output") and outputs.pooler_output is not None:
+                features = outputs.pooler_output
+            else:
+                features = outputs.last_hidden_state[:, 0, :]
+    vec = features.squeeze(0).cpu().float().numpy()
+    # L2 normalize
+    norm = np.linalg.norm(vec)
+    if norm > 0:
+        vec = vec / norm
+    return vec
+def embed_texts(texts: list[str]) -> np.ndarray:
+    """Batch embed multiple texts. Returns (N, D) float32 array."""
+    import torch
+    from config import DEVICE
+    _load_embed_model()
+    with torch.no_grad():
+        if _is_clip:
+            inputs = _embed_processor(text=texts, return_tensors="pt", padding=True, truncation=True).to(DEVICE)
+            features = _embed_model.get_text_features(**inputs)
+        else:
+            inputs = _embed_processor(text=texts, return_tensors="pt", padding=True, truncation=True).to(DEVICE)
+            outputs = _embed_model(**inputs)
+            if hasattr(outputs, "pooler_output") and outputs.pooler_output is not None:
+                features = outputs.pooler_output
+            else:
+                features = outputs.last_hidden_state[:, 0, :]
+    vecs = features.cpu().float().numpy()
+    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
+    norms = np.where(norms == 0, 1, norms)
+    return vecs / norms
+# ── Image Embedding (direct, no captioning) ─────────────────────────────────
+def embed_image(image: PILImage.Image) -> np.ndarray:
+    """
+    Embed a PIL Image directly into the shared vector space.
+    No captioning step — the vision encoder handles it natively.
+    Returns a normalized float32 numpy vector.
+    """
+    import torch
+    from config import DEVICE
+    _load_embed_model()
+    if image.mode != "RGB":
+        image = image.convert("RGB")
+    with torch.no_grad():
+        if _is_clip:
+            inputs = _embed_processor(images=image, return_tensors="pt").to(DEVICE)
+            features = _embed_model.get_image_features(**inputs)
+        else:
+            # Qwen3-VL-Embedding: image input via processor
+            inputs = _embed_processor(images=image, return_tensors="pt").to(DEVICE)
+            outputs = _embed_model(**inputs)
+            if hasattr(outputs, "pooler_output") and outputs.pooler_output is not None:
+                features = outputs.pooler_output
+            else:
+                features = outputs.last_hidden_state[:, 0, :]
+    vec = features.squeeze(0).cpu().float().numpy()
+    norm = np.linalg.norm(vec)
+    if norm > 0:
+        vec = vec / norm
+    return vec
+def embed_image_bytes(data: bytes, mime_type: str = "image/jpeg") -> np.ndarray:
+    """Embed raw image bytes. Returns a normalized float32 vector.
+    mime_type is currently unused; PIL infers the format from the bytes.
+    """
+    image = PILImage.open(io.BytesIO(data))
+    return embed_image(image)
+# ── LLM Summarization ──────────────────────────────────────────────────────
+def llm_summarize(query: str, search_results: list[dict], mode: str = "image") -> str:
+    """
+    Pass search results through an LLM for human-friendly interpretation.
+    Tries LLM_MODEL, then LLM_FALLBACK, via the HF Inference API;
+    returns a plain-text listing of the results if both fail.
+    """
+    from config import LLM_MODEL, LLM_FALLBACK, HF_TOKEN
+    if not search_results:
+        return f'No results found for "{query}". Try uploading more media or using different search terms.'
+    # Build prompt context
+    if mode == "video":
+        results_text = "\n".join(
+            f"  - Video: {r.get('video_name', '?')}, "
+            f"Time: {r.get('timestamp_label', '?')} ({r.get('timestamp_sec', 0):.1f}s), "
+            f"Score: {r.get('score', 0):.4f}"
+            for r in search_results
+        )
+        instruction = (
+            "You are a vision search assistant. Summarize the video search results below. "
+            "Highlight the most relevant moments and time ranges. Be concise. Use markdown."
+        )
+    else:
+        results_text = "\n".join(
+            f"  - Image: {r.get('file_name', '?')}, "
+            f"Score: {r.get('score', 0):.4f}"
+            for r in search_results
+        )
+        instruction = (
+            "You are a vision search assistant. Summarize the image search results below. "
+            "Highlight the most relevant matches. Be concise. Use markdown."
+        )
+    prompt = (
+        f"{instruction}\n\n"
+        f"User query: \"{query}\"\n\n"
+        f"Search results ({len(search_results)} matches):\n{results_text}\n\n"
+        f"Summary:"
+    )
+    # Query each model through the HF Inference API (no local model is loaded here)
+    for model_id in (LLM_MODEL, LLM_FALLBACK):
+        try:
+            from huggingface_hub import InferenceClient
+            client = InferenceClient(
+                model=model_id,
+                token=HF_TOKEN if HF_TOKEN else None,
+            )
+            response = client.text_generation(
+                prompt,
+                max_new_tokens=300,
+                temperature=0.7,
+                do_sample=True,
+            )
+            if response and response.strip():
+                return response.strip()
+        except Exception as e:
+            logger.warning(f"LLM {model_id} failed: {e}")
+            continue
+    # Plain text fallback
+    return (
+        f"**Found {len(search_results)} results for \"{query}\"**\n\n"
+        f"_(LLM summary unavailable)_\n\n"
+        f"```\n{results_text}\n```"
+    )
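embed_text, embed_texts and embed_image all L2-normalize their output, so if the vector store scores candidates by inner product (an assumption, since vector_store.py is not shown here), the score equals cosine similarity and ranking by dot product is the same as ranking by cosine. Below is a pure-Python sketch of that brute-force ranking; the repo itself works with NumPy arrays, and `l2_normalize` and `rank_by_cosine` are hypothetical names.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale to unit length; a zero vector is returned unchanged (same guard as embed_text)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def rank_by_cosine(query: list[float], items: dict[str, list[float]],
                   top_k: int = 3) -> list[tuple[str, float]]:
    """Hypothetical brute-force ranking: (id, cosine score) pairs, best first."""
    q = l2_normalize(query)
    scored = []
    for item_id, vec in items.items():
        v = l2_normalize(vec)
        # For unit-length vectors, cosine similarity is just the dot product.
        scored.append((item_id, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    items = {"dog": [2.0, 0.0], "car": [0.0, 3.0], "dog_car": [1.0, 1.0]}
    for item_id, score in rank_by_cosine([1.0, 0.0], items, top_k=2):
        print(item_id, round(score, 4))
    # dog 1.0
    # dog_car 0.7071
```

Because both text and image vectors live in the same normalized space, the same scoring applies whether the query came from embed_text or embed_image.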

ingest.py ADDED

	@@ -0,0 +1,340 @@

+# HF_Space_hipVS/ingest.py
+# =========================
+# Ingestion pipeline — embeds images/frames DIRECTLY with Qwen3-VL or CLIP.
+# No captioning step. The vision-language model encodes images and text
+# into the same vector space natively.
+#
+# The CAGRA index is rebuilt after each ingest call (per upload, or once per batch),
+# since it is optimized for querying rather than incremental inserts.
+import logging
+import os
+import shutil
+import subprocess
+import tempfile
+import time
+from pathlib import Path
+from PIL import Image as PILImage
+from config import (
+    EMBED_DIM,
+    FRAME_EVERY_SEC,
+    IMAGE_EXTENSIONS,
+    VIDEO_EXTENSIONS,
+    get_project_dir,
+    DEFAULT_PROJECT,
+)
+from embedding import embed_image, embed_image_bytes
+from vector_store import get_store
+logger = logging.getLogger(__name__)
+# ── Helpers ──────────────────────────────────────────────────────────────────
+def fmt_time(seconds: float) -> str:
+    m, s = divmod(int(seconds), 60)
+    return f"{m:02d}:{s:02d}"
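+def check_ffmpeg() -> bool:
+    """Return True only if both ffmpeg (frame extraction) and ffprobe (duration) run."""
+    for tool in ("ffmpeg", "ffprobe"):
+        try:
+            subprocess.run([tool, "-version"], capture_output=True, timeout=5)
+        except (FileNotFoundError, subprocess.TimeoutExpired):
+            return False
+    return True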
+HAS_FFMPEG = check_ffmpeg()
+def get_duration(video_path: str) -> float:
+    try:
+        r = subprocess.run(
+            ["ffprobe", "-v", "error",
+             "-show_entries", "format=duration",
+             "-of", "default=noprint_wrappers=1:nokey=1",
+             video_path],
+            capture_output=True, text=True, timeout=30,
+        )
+        return float(r.stdout.strip())
+    except Exception as e:
+        logger.warning(f"ffprobe error: {e}")
+        return 0.0
+def extract_frame(video_path: str, timestamp_sec: float, out_path: str) -> bool:
+    """Write one JPEG frame at timestamp_sec (scaled to 640 px wide) to out_path."""
+    try:
+        result = subprocess.run(
+            ["ffmpeg", "-y",
+             "-ss", f"{timestamp_sec:.3f}",
+             "-i", video_path,
+             "-frames:v", "1",
+             "-q:v", "2",
+             "-vf", "scale=640:-1",
+             out_path],
+            capture_output=True, timeout=30,
+        )
+    except subprocess.TimeoutExpired:
+        return False  # treat a hung ffmpeg call like a failed extraction
+    return result.returncode == 0 and os.path.exists(out_path) and os.path.getsize(out_path) > 0
+def get_image_meta(path: Path) -> dict:
+    stat = path.stat()
+    size = f"{round(stat.st_size / 1024, 1)}KB"
+    try:
+        with PILImage.open(path) as img:
+            res = f"{img.width}x{img.height}"
+    except Exception:
+        res = "unknown"
+    return {
+        "file_path": str(path.resolve()),
+        "file_name": path.name,
+        "file_size": size,
+        "resolution": res,
+    }
+# ── Image Ingestion ─────────────────────────────────────────────────────────
+def ingest_images(project: str = DEFAULT_PROJECT, progress_callback=None) -> tuple[int, str]:
+    """Ingest all images from a project's images/ directory."""
+    proj_dir = get_project_dir(project)
+    image_dir = proj_dir / "images"
+    store = get_store(project, "image_index")
+    files = sorted(
+        f for f in image_dir.iterdir()
+        if f.suffix.lower() in IMAGE_EXTENSIONS
+    )[:200]
+    if not files:
+        return 0, f"No images found in {image_dir}"
+    store.clear()
+    log = [f"[{project}] Found {len(files)} images\n"]
+    import numpy as np
+    all_vectors = []
+    all_ids = []
+    all_meta = []
+    for i, p in enumerate(files):
+        meta = get_image_meta(p)
+        try:
+            with PILImage.open(p) as img:  # close the file handle after embedding
+                vec = embed_image(img)  # direct multimodal embed, no captioning
+            all_vectors.append(vec)
+            all_ids.append(meta["file_name"])
+            all_meta.append(meta)
+            log.append(f"  [{i+1}/{len(files)}] {p.name} ({meta['resolution']})")
+        except Exception as e:
+            log.append(f"  [{i+1}/{len(files)}] {p.name}: FAILED ({e})")
+        if progress_callback:
+            progress_callback((i + 1) / len(files), desc=f"Embedding {p.name}...")
+    if all_vectors:
+        vectors = np.stack(all_vectors)
+        store.add(vectors, all_ids, all_meta)  # CAGRA rebuilt inside add()
+    log.append(f"\n{len(all_vectors)} images indexed ({store.mode})")
+    return len(all_vectors), "\n".join(log)
+def ingest_single_image(file_path: str, project: str = DEFAULT_PROJECT) -> tuple[bool, str]:
+    """Ingest a single uploaded image. CAGRA is rebuilt."""
+    path = Path(file_path)
+    proj_dir = get_project_dir(project)
+    dest = proj_dir / "images" / path.name
+    shutil.copy2(str(path), str(dest))
+    store = get_store(project, "image_index")
+    meta = get_image_meta(dest)
+    try:
+        with PILImage.open(dest) as img:  # close the file handle after embedding
+            vec = embed_image(img)
+        store.append_and_rebuild(vec, meta["file_name"], meta)
+        return True, f"Indexed: {path.name} ({meta['resolution']})"
+    except Exception as e:
+        return False, f"Failed: {path.name} -- {e}"
+def ingest_image_from_pil(
+    image: PILImage.Image,
+    file_name: str,
+    extra_meta: dict | None = None,
+    project: str = DEFAULT_PROJECT,
+) -> tuple[bool, str]:
+    """Ingest a PIL Image directly (used by seed_data). No CAGRA rebuild per-image."""
+    proj_dir = get_project_dir(project)
+    dest = proj_dir / "images" / file_name
+    store = get_store(project, "image_index")
+    try:
+        if not dest.exists():
+            image.save(str(dest))
+        vec = embed_image(image)
+        meta = {
+            "file_name": file_name,
+            "file_path": str(dest.resolve()),
+            **(extra_meta or {})
+        }
+        store.append(vec, file_name, meta)  # no rebuild — seed_data calls rebuild at end
+        return True, file_name
+    except Exception as e:
+        return False, str(e)
+# ── Video Ingestion ─────────────────────────────────────────────────────────
+def ingest_videos(project: str = DEFAULT_PROJECT, progress_callback=None) -> tuple[int, str]:
+    """Ingest all videos from a project's videos/ directory."""
+    if not HAS_FFMPEG:
+        return 0, "ffmpeg not found -- install ffmpeg for video ingestion."
+    proj_dir = get_project_dir(project)
+    video_dir = proj_dir / "videos"
+    store = get_store(project, "video_index")
+    frames_root = proj_dir / "videos" / "frames"
+    frames_root.mkdir(parents=True, exist_ok=True)
+    files = sorted(
+        f for f in video_dir.iterdir()
+        if f.suffix.lower() in VIDEO_EXTENSIONS
+    )
+    if not files:
+        return 0, f"No videos found in {video_dir}"
+    store.clear()
+    log = [f"[{project}] Found {len(files)} video(s) -- frame interval: {FRAME_EVERY_SEC}s\n"]
+    total = 0
+    for video_path in files:
+        video_str = str(video_path.resolve())
+        duration = get_duration(video_str)
+        if duration <= 0:
+            log.append(f"  Skipping {video_path.name} (duration unreadable)")
+            continue
+        timestamps = [0.5]
+        t = float(FRAME_EVERY_SEC)
+        while t < duration:
+            timestamps.append(round(t, 2))
+            t += FRAME_EVERY_SEC
+        if (duration - 1.0) not in timestamps:
+            timestamps.append(round(max(0, duration - 1.0), 2))
+        timestamps = sorted(set(timestamps))
+        log.append(f"  {video_path.name} ({duration:.1f}s -> {len(timestamps)} frames)")
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            for idx, ts in enumerate(timestamps):
+                frame_path = os.path.join(tmp_dir, f"frame_{idx:05d}.jpg")
+                if not extract_frame(video_str, ts, frame_path):
+                    continue
+                try:
+                    with open(frame_path, "rb") as f:
+                        frame_data = f.read()
+                    # Save frame permanently
+                    perm_frame_path = frames_root / f"{video_path.name}_{ts:.2f}.jpg"
+                    shutil.copy2(frame_path, str(perm_frame_path))
+                    vec = embed_image_bytes(frame_data)
+                    frame_meta = {
+                        "video_path": video_str,
+                        "video_name": video_path.name,
+                        "frame_path": str(perm_frame_path.resolve()),
+                        "timestamp_sec": ts,
+                        "timestamp_label": fmt_time(ts),
+                        "duration_total": round(duration, 2),
+                    }
+                    store.append(vec, f"{video_path.name}@{ts}", frame_meta)
+                    total += 1
+                    time.sleep(0.05)
+                except Exception as e:
+                    log.append(f"    ts={fmt_time(ts)}: FAILED ({e})")
+                if progress_callback:
+                    progress_callback(
+                        (idx + 1) / len(timestamps),
+                        desc=f"{video_path.name} frame {idx+1}/{len(timestamps)}",
+                    )
+        log.append(f"    Done ({len(timestamps)} frames)")
+    # Rebuild CAGRA once for all videos
+    if store.has_data():
+        store.rebuild_gpu_index()
+        store._persist()
+    log.append(f"\n{total} video frames indexed ({store.mode})")
+    return total, "\n".join(log)
+def ingest_single_video(file_path: str, project: str = DEFAULT_PROJECT, progress_callback=None) -> tuple[int, str]:
+    """Ingest a single uploaded video. CAGRA rebuilt at end."""
+    path = Path(file_path)
+    proj_dir = get_project_dir(project)
+    dest = proj_dir / "videos" / path.name
+    shutil.copy2(str(path), str(dest))
+    if not HAS_FFMPEG:
+        return 0, "ffmpeg not found"
+    store = get_store(project, "video_index")
+    video_str = str(dest.resolve())
+    duration = get_duration(video_str)
+    if duration <= 0:
+        return 0, f"Could not read duration for {path.name}"
+    frames_root = proj_dir / "videos" / "frames"
+    frames_root.mkdir(parents=True, exist_ok=True)
+    timestamps = [0.5]
+    t = float(FRAME_EVERY_SEC)
+    while t < duration:
+        timestamps.append(round(t, 2))
+        t += FRAME_EVERY_SEC
+    timestamps = sorted(set(timestamps))
+    count = 0
+    with tempfile.TemporaryDirectory() as tmp_dir:
+        for idx, ts in enumerate(timestamps):
+            frame_path = os.path.join(tmp_dir, f"frame_{idx:05d}.jpg")
+            if not extract_frame(video_str, ts, frame_path):
+                continue
+            try:
+                with open(frame_path, "rb") as f:
+                    frame_data = f.read()
+                # Save frame permanently
+                perm_frame_path = frames_root / f"{path.name}_{ts:.2f}.jpg"
+                shutil.copy2(frame_path, str(perm_frame_path))
+                vec = embed_image_bytes(frame_data)
+                frame_meta = {
+                    "video_path": video_str,
+                    "video_name": path.name,
+                    "frame_path": str(perm_frame_path.resolve()),
+                    "timestamp_sec": ts,
+                    "timestamp_label": fmt_time(ts),
+                    "duration_total": round(duration, 2),
+                }
+                store.append(vec, f"{path.name}@{ts}", frame_meta)
+                count += 1
+            except Exception as e:
+                logger.error(f"Frame embed error: {e}")
+            if progress_callback:
+                progress_callback((idx + 1) / len(timestamps))
+    # Rebuild CAGRA after all frames
+    if store.has_data():
+        store.rebuild_gpu_index()
+        store._persist()
+    return count, f"{count} frames indexed for {path.name} ({duration:.1f}s)"
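ingest_videos and ingest_single_video build their frame schedules separately, and only the batch path adds a frame one second before the end, so the same clip can be indexed with different frames depending on how it was ingested. Below is a minimal sketch of a shared helper that reproduces the batch schedule. `frame_timestamps` and `tail_offset` are hypothetical, and the `every <= 0` guard is an addition: with the original loop, a zero FRAME_EVERY_SEC would never terminate.

```python
def frame_timestamps(duration: float, every: float, tail_offset: float = 1.0) -> list[float]:
    """Hypothetical shared helper mirroring the schedule in ingest_videos.

    Samples 0.5 s, then every `every` seconds below `duration`, plus one tail
    frame `tail_offset` seconds before the end (clamped at 0 for short clips).
    """
    if duration <= 0 or every <= 0:
        return []
    stamps = {0.5}
    t = float(every)
    while t < duration:
        stamps.add(round(t, 2))
        t += every
    # The set removes the tail frame if it coincides with a regular sample.
    stamps.add(round(max(0.0, duration - tail_offset), 2))
    return sorted(stamps)


if __name__ == "__main__":
    print(frame_timestamps(12.0, 5))  # [0.5, 5.0, 10.0, 11.0]
    print(frame_timestamps(11.0, 5))  # [0.5, 5.0, 10.0]
```

Both ingestion functions could then call `frame_timestamps(duration, FRAME_EVERY_SEC)` so uploads and batch runs index identical frames.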

ingest_sample_vision.py ADDED

	@@ -0,0 +1,254 @@

+#!/usr/bin/env python3
+"""
+ingest_sample_vision.py
+========================
+Populates the index with synthetic sample data (no model download needed).
+Uses pseudo-random embeddings seeded by MD5 hashes of each description and
+of its individual words, so texts that share words get correlated vectors.
+That is enough to demonstrate the full search pipeline.
+After ingestion, runs a sample query and prints results in the same
+format as the original SurrealDB-based scripts.
+Usage:
+  python ingest_sample_vision.py
+"""
+import hashlib
+import json
+import numpy as np
+from config import DEFAULT_PROJECT, EMBED_DIM
+from vector_store import get_store
+# -- Synthetic embedding (no model needed) ------------------------------------
+def fake_embed(text: str, dim: int = EMBED_DIM) -> np.ndarray:
+    """
+    Deterministic pseudo-embedding from text.
+    Same text always produces the same vector; texts that share words
+    produce correlated vectors (via shared word-level hashing).
+    """
+    rng = np.random.RandomState(int(hashlib.md5(text.encode()).hexdigest(), 16) % 2**31)
+    vec = rng.randn(dim).astype(np.float32)
+    # Mix in word-level hashes so "mountain landscape" is closer to "mountain" than "car"
+    words = text.lower().split()
+    for w in words:
+        word_seed = int(hashlib.md5(w.encode()).hexdigest(), 16) % 2**31
+        word_rng = np.random.RandomState(word_seed)
+        vec += word_rng.randn(dim).astype(np.float32) * 0.5
+    norm = np.linalg.norm(vec)
+    if norm > 0:
+        vec /= norm
+    return vec
+# -- Sample Data --------------------------------------------------------------
+SAMPLE_IMAGES = [
+    {"file_name": "mountain_sunset.jpg",   "file_size": "245.3KB", "resolution": "1920x1080", "description": "a majestic mountain with sunset colors"},
+    {"file_name": "dog_park.jpg",          "file_size": "189.7KB", "resolution": "1280x720",  "description": "a dog playing in the park"},
+    {"file_name": "red_car.jpg",           "file_size": "312.1KB", "resolution": "1920x1080", "description": "a red sports car on a highway"},
+    {"file_name": "ocean_waves.jpg",       "file_size": "276.4KB", "resolution": "2560x1440", "description": "ocean waves crashing on rocks"},
+    {"file_name": "city_night.jpg",        "file_size": "198.2KB", "resolution": "1920x1080", "description": "city skyline at night with lights"},
+    {"file_name": "cat_windowsill.jpg",    "file_size": "145.6KB", "resolution": "1280x960",  "description": "a cat sitting on a windowsill"},
+    {"file_name": "forest_trail.jpg",      "file_size": "334.8KB", "resolution": "2560x1440", "description": "a forest trail with tall trees and sunlight"},
+    {"file_name": "beach_sunset.jpg",      "file_size": "267.9KB", "resolution": "1920x1080", "description": "golden sunset over a sandy beach"},
+    {"file_name": "snow_mountain.jpg",     "file_size": "289.3KB", "resolution": "3840x2160", "description": "snow covered mountain peak under blue sky"},
+    {"file_name": "flower_garden.jpg",     "file_size": "203.5KB", "resolution": "1600x1200", "description": "colorful flowers in a garden"},
+]
+SAMPLE_VIDEO_FRAMES = [
+    {"video_name": "nature_doc.mp4", "video_path": "/data/videos/nature_doc.mp4", "duration_total": 120.0, "frames": [
+        (0.5,  "a wide shot of african savanna"),
+        (5.0,  "a rhino walking through grass"),
+        (10.0, "close up of a rhino face"),
+        (15.0, "birds flying over the savanna"),
+        (20.0, "a zebra herd drinking water"),
+        (25.0, "sunset over the savanna landscape"),
+        (30.0, "a lion resting under a tree"),
+        (35.0, "elephants crossing a river"),
+        (40.0, "aerial view of the grasslands"),
+        (45.0, "a cheetah running at full speed"),
+    ]},
+    {"video_name": "big_buck_bunny.mp4", "video_path": "/data/videos/big_buck_bunny.mp4", "duration_total": 60.0, "frames": [
+        (0.5,  "animated forest scene with butterflies"),
+        (5.0,  "a big bunny sitting in a meadow"),
+        (10.0, "the bunny stretching and yawning"),
+        (15.0, "small animals annoying the bunny"),
+        (20.0, "the bunny looking angry"),
+        (25.0, "the bunny chasing small creatures"),
+        (30.0, "a bird flying through the forest"),
+        (35.0, "the bunny setting up a trap"),
+        (40.0, "an explosion of fruit"),
+        (45.0, "the bunny laughing happily"),
+    ]},
+]
+# -- Helpers ------------------------------------------------------------------
+def fmt(seconds: float) -> str:
+    m, s = divmod(int(seconds), 60)
+    return f"{m:02d}:{s:02d}"
+# -- Main ---------------------------------------------------------------------
+def main():
+    print(f"\n{'='*60}")
+    print(f"  ARIA Vision — Sample Ingestion (Synthetic Embeddings)")
+    print(f"{'='*60}")
+    print(f"  Embed dim: {EMBED_DIM}")
+    print(f"  Project  : {DEFAULT_PROJECT}")
+    print()
+    # -- 1. Clear old indexes ---------------------------------------------
+    print("[1/4] Clearing old indexes...")
+    img_store = get_store(DEFAULT_PROJECT, "image_index")
+    vid_store = get_store(DEFAULT_PROJECT, "video_index")
+    img_store.clear()
+    vid_store.clear()
+    print("  Done.\n")
+    # -- 2. Ingest sample images ------------------------------------------
+    print("[2/4] Ingesting sample images...")
+    img_vecs = []
+    img_ids = []
+    img_meta = []
+    for img in SAMPLE_IMAGES:
+        vec = fake_embed(img["description"])
+        img_vecs.append(vec)
+        img_ids.append(img["file_name"])
+        img_meta.append({
+            "file_name": img["file_name"],
+            "file_size": img["file_size"],
+            "resolution": img["resolution"],
+            "file_path": f"/data/images/{img['file_name']}",
+        })
+        print(f"  OK {img['file_name']} ({img['resolution']})")
+    img_store.add(np.stack(img_vecs), img_ids, img_meta)
+    print(f"  {len(img_ids)} images indexed -> {img_store}\n")
+    # -- 3. Ingest sample video frames ------------------------------------
+    print("[3/4] Ingesting sample video frames...")
+    total_frames = 0
+    for video in SAMPLE_VIDEO_FRAMES:
+        print(f"  {video['video_name']} ({video['duration_total']:.0f}s -> {len(video['frames'])} frames)")
+        for ts, desc in video["frames"]:
+            vec = fake_embed(desc)
+            frame_meta = {
+                "video_path": video["video_path"],
+                "video_name": video["video_name"],
+                "timestamp_sec": ts,
+                "timestamp_label": fmt(ts),
+                "duration_total": video["duration_total"],
+            }
+            vid_store.append(vec, f"{video['video_name']}@{ts}", frame_meta)
+            total_frames += 1
+    # Rebuild CAGRA once after all frames
+    vid_store.rebuild_gpu_index()
+    vid_store._persist()
+    print(f"  {total_frames} video frames indexed -> {vid_store}\n")
+    # -- 4. Run sample queries --------------------------------------------
+    print("[4/4] Running sample queries...\n")
+    # --- Image query ---
+    query = "a majestic mountain"
+    print(f"{'='*60}")
+    print(f"  ARIA Vision — Image Search")
+    print(f"{'='*60}")
+    print(f"  Query: \"{query}\"")
+    print()
+    qvec = fake_embed(query)
+    results = img_store.search(qvec, top_k=5)
+    print(f"  {'-'*56}")
+    print(f"  {'Rank':<6} {'File':<25} {'Size':<10} {'Resolution':<12} {'Score':<8}")
+    print(f"  {'-'*56}")
+    for i, r in enumerate(results):
+        print(f"  {i+1:<6} {r.get('file_name','?'):<25} "
+              f"{r.get('file_size','?'):<10} "
+              f"{r.get('resolution','?'):<12} "
+              f"{r.get('score',0):.4f}")
+    print(f"  {'-'*56}")
+    output_img = {
+        "mode": "Image",
+        "query": query,
+        "results": [
+            {
+                "file_path": r.get("file_path", ""),
+                "file_name": r.get("file_name", ""),
+                "file_size": r.get("file_size", ""),
+                "resolution": r.get("resolution", ""),
+                "score": round(r.get("score", 0), 4),
+            }
+            for r in results
+        ],
+    }
+    print(f"\n  JSON Response:")
+    print(f"  {json.dumps(output_img, indent=2)}")
+    # --- Video query ---
+    query2 = "a big bunny"
+    print(f"\n{'='*60}")
+    print(f"  ARIA Vision — Video Intelligence Search")
+    print(f"{'='*60}")
+    print(f"  Query: \"{query2}\"")
+    print()
+    qvec2 = fake_embed(query2)
+    vid_results = vid_store.search(qvec2, top_k=10)
+    # Merge into time ranges
+    from search import _merge_video_hits
+    spans = _merge_video_hits(vid_results, gap=10.0)
+    print(f"  {'-'*62}")
+    print(f"  {'#':<4} {'Video':<24} {'Time Range':<16} {'Duration':<9} {'Frames':<7} {'Score'}")
+    print(f"  {'-'*62}")
+    for i, s in enumerate(spans):
+        dur = s["end_sec"] - s["start_sec"]
+        print(f"  {i+1:<4} {s['video_name'][:23]:<24} "
+              f"{fmt(s['start_sec'])} -> {fmt(s['end_sec']):<9} "
+              f"{dur:4.0f}s     "
+              f"{s['frames']:<7} "
+              f"{s['peak_score']:.4f}")
+    print(f"  {'-'*62}")
+    output_vid = {
+        "mode": "Video Intelligence",
+        "query": query2,
+        "matches": [
+            {
+                "video_name": s["video_name"],
+                "video_path": s.get("video_path", ""),
+                "start": fmt(s["start_sec"]),
+                "end": fmt(s["end_sec"]),
+                "start_seconds": s["start_sec"],
+                "end_seconds": s["end_sec"],
+                "score": round(s["peak_score"], 4),
+                "frames_matched": s["frames"],
+            }
+            for s in spans
+        ],
+    }
+    print(f"\n  JSON Response:")
+    print(f"  {json.dumps(output_vid, indent=2)}")
+    print(f"\n{'='*60}")
+    print(f"  OK Done — {len(img_ids)} images + {total_frames} video frames indexed")
+    print(f"  Store: {img_store}")
+    print(f"  Store: {vid_store}")
+    print(f"{'='*60}\n")
+if __name__ == "__main__":
+    main()
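The ingestion above relies on `fake_embed`, which is defined earlier in ingest_sample_vision.py and is not part of this excerpt. The sketch below shows one way a synthetic embedder for a demo like this could work. Everything in it is an assumption, not the script's code: the name `synthetic_embed`, the SHA-256 seeding, and the default `dim=512`.

```python
import hashlib

import numpy as np


def synthetic_embed(text: str, dim: int = 512) -> np.ndarray:
    """Hypothetical helper: map text to a deterministic unit-length float32 vector."""
    # Seed from a stable digest; Python's built-in hash() is salted per process,
    # so it would produce different vectors on every run.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    vec = np.random.default_rng(seed).standard_normal(dim).astype(np.float32)
    return vec / np.linalg.norm(vec)


a = synthetic_embed("a majestic mountain")
b = synthetic_embed("a majestic mountain")
c = synthetic_embed("a big bunny")
print(float(a @ b), float(a @ c))
```

Identical strings always map to the same vector, and different strings give nearly orthogonal vectors. Searches against an index built this way therefore test the storage and ranking plumbing, not semantic similarity.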

query_vision_image.py ADDED

	@@ -0,0 +1,91 @@

+#!/usr/bin/env python3
+"""
+query_vision_image.py
+======================
+Query the HF-native image_index using a text prompt.
+Embeds the text with CLIP / Qwen3-VL, then performs cosine
+similarity search against stored image embeddings.
+Usage:
+  python query_vision_image.py "sunset over water"
+"""
+import sys
+import json
+from pathlib import Path
+from config import DEFAULT_PROJECT, EMBED_MODEL, EMBED_DIM
+from vector_store import get_store
+from embedding import embed_text
+TOP_K = 5
+MIN_SCORE = 0.15 # Adjusted for HF-native CLIP/Qwen scores
+def search_images(query: str):
+    print(f"\n{'='*60}")
+    print(f"  ARIA Vision — Image Search (HF-Native)")
+    print(f"{'='*60}")
+    print(f"  Query: \"{query}\"")
+    print(f"  Model: {EMBED_MODEL} ({EMBED_DIM}d)")
+    print()
+    print("  [1/3] Embedding query text...", end=" ", flush=True)
+    query_vector = embed_text(query)
+    print("✓")
+    print("  [2/3] Searching image_index...", end=" ", flush=True)
+    store = get_store(DEFAULT_PROJECT, "image_index")
+    raw_results = store.search(query_vector, top_k=TOP_K)
+    if not raw_results:
+        print("no results.")
+        print("\n  ⚠ No images found. Did you run ingest_sample_vision.py first?")
+        return
+    rows = [r for r in raw_results if r.get("score", 0) >= MIN_SCORE]
+    print(f"✓ ({len(rows)} matches)")
+    print(f"\n  [3/3] Results:")
+    print(f"  {'─'*56}")
+    print(f"  {'Rank':<6} {'File':<25} {'Size':<10} {'Resolution':<12} {'Score':<8}")
+    print(f"  {'─'*56}")
+    for i, row in enumerate(rows):
+        file_name = row.get("file_name", Path(row.get("file_path", "?")).name)
+        print(
+            f"  {i+1:<6} {file_name[:24]:<25} "
+            f"{row.get('file_size', '?'):<10} "
+            f"{row.get('resolution', '?'):<12} "
+            f"{row.get('score', 0):.4f}"
+        )
+    print(f"  {'─'*56}")
+    output = {
+        "mode": "Image",
+        "query": query,
+        "results": [
+            {
+                "file_path": r.get("file_path", ""),
+                "file_name": r.get("file_name", ""),
+                "file_size": r.get("file_size", ""),
+                "resolution": r.get("resolution", ""),
+                "score": round(r.get("score", 0), 4),
+            }
+            for r in rows
+        ],
+    }
+    print(f"\n  JSON Response:")
+    print(f"  {json.dumps(output, indent=2)}")
+    print()
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: python query_vision_image.py \"your search query\"")
+        sys.exit(1)
+    query = " ".join(sys.argv[1:])
+    search_images(query)
+if __name__ == "__main__":
+    main()
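The script embeds the prompt, calls `store.search`, and then drops rows below `MIN_SCORE`. On the CPU tier, that amounts to a dot product between unit vectors followed by a top-k selection. Here is a minimal NumPy sketch of that flow. Random vectors stand in for CLIP/Qwen3-VL embeddings, and the helper name `cosine_top_k` is illustrative.

```python
import numpy as np


def cosine_top_k(index: np.ndarray, query: np.ndarray, top_k: int, min_score: float):
    """Return (row, score) pairs for the best top_k rows, keeping only score >= min_score."""
    # Unit-normalize rows and the query so the dot product equals cosine similarity.
    index = index / np.clip(np.linalg.norm(index, axis=1, keepdims=True), 1e-12, None)
    query = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = index @ query
    k = min(top_k, len(scores))
    # argpartition selects the k largest scores in O(n); only those k are then sorted.
    top = np.argpartition(scores, -k)[-k:]
    top = top[np.argsort(scores[top])[::-1]]
    return [(int(i), float(scores[i])) for i in top if scores[i] >= min_score]


rng = np.random.default_rng(0)
vectors = rng.standard_normal((100, 64)).astype(np.float32)  # stand-ins for image embeddings
query = vectors[7] + 0.1 * rng.standard_normal(64).astype(np.float32)  # a noisy copy of row 7
hits = cosine_top_k(vectors, query, top_k=5, min_score=0.15)
print(hits)
```

`MIN_SCORE` is applied after the top-k cut, so a query can return fewer than `TOP_K` rows, or none at all.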

query_vision_video.py ADDED

	@@ -0,0 +1,111 @@

+#!/usr/bin/env python3
+"""
+query_vision_video.py
+======================
+Query HF-native video_index using a text prompt.
+Each row is one embedded frame at a specific second.
+Adjacent high-scoring frames are merged into contiguous time ranges.
+Usage:
+  python query_vision_video.py "rhino running in the wild"
+  python query_vision_video.py "person waving"  --top 10  --min-score 0.15
+"""
+import sys
+import json
+from pathlib import Path
+from config import DEFAULT_PROJECT, EMBED_MODEL, EMBED_DIM
+from vector_store import get_store
+from embedding import embed_text
+from search import _merge_video_hits, _fmt as fmt
+TOP_K = 30
+MIN_SCORE = 0.15 # Adjusted for HF-native CLIP/Qwen scores
+MERGE_GAP_SEC = 10
+def search_video(query: str, top_k: int = TOP_K, min_score: float = MIN_SCORE):
+    print(f"\n{'='*60}")
+    print(f"  ARIA Vision — Video Intelligence Search (HF-Native)")
+    print(f"{'='*60}")
+    print(f"  Query      : \"{query}\"")
+    print(f"  Model      : {EMBED_MODEL} ({EMBED_DIM}d)")
+    print(f"  Min score  : {min_score}  |  Merge gap: {MERGE_GAP_SEC}s  |  Fetch top: {top_k}")
+    print()
+    print("  [1/3] Embedding query...", end=" ", flush=True)
+    qvec = embed_text(query)
+    print("✓")
+    print("  [2/3] Searching video_index...", end=" ", flush=True)
+    store = get_store(DEFAULT_PROJECT, "video_index")
+    raw_results = store.search(qvec, top_k=top_k)
+    if not raw_results:
+        print("no results.\n  ⚠ Run ingest_sample_vision.py first.")
+        return
+    print(f"✓  ({len(raw_results)} raw frames returned)")
+    hits = [r for r in raw_results if r.get("score", 0) >= min_score]
+    if not hits:
+        top3 = sorted(raw_results, key=lambda r: -r.get("score", 0))[:3]
+        print(f"\n  ⚠ No frames above score threshold ({min_score}).")
+        print(f"  Top 3 raw scores: {[round(r.get('score',0),4) for r in top3]}")
+        return
+    print(f"  [3/3] Merging {len(hits)} hits into time ranges...")
+    spans = _merge_video_hits(hits, gap=MERGE_GAP_SEC)
+    print()
+    print(f"  {'─'*62}")
+    print(f"  {'#':<4} {'Video':<24} {'Time Range':<16} {'Duration':<9} {'Frames':<7} {'Score'}")
+    print(f"  {'─'*62}")
+    for i, s in enumerate(spans):
+        dur = s["end_sec"] - s["start_sec"]
+        print(
+            f"  {i+1:<4} {s['video_name'][:23]:<24} "
+            f"{fmt(s['start_sec'])} → {fmt(s['end_sec']):<9} "
+            f"{dur:4.0f}s     "
+            f"{s['frames']:<7} "
+            f"{s['peak_score']:.4f}"
+        )
+    print(f"  {'─'*62}")
+    output = {
+        "mode":    "Video Intelligence",
+        "query":   query,
+        "matches": [
+            {
+                "video_name":    s["video_name"],
+                "video_path":    s.get("video_path", ""),
+                "start":         fmt(s["start_sec"]),
+                "end":           fmt(s["end_sec"]),
+                "start_seconds": s["start_sec"],
+                "end_seconds":   s["end_sec"],
+                "score":         round(s["peak_score"], 4),
+                "frames_matched": s["frames"],
+            }
+            for s in spans
+        ],
+    }
+    print()
+    print("  JSON Response:")
+    print(f"  {json.dumps(output, indent=2)}")
+def main():
+    import argparse
+    # argparse consumes flag values, so only the positional words form the query.
+    parser = argparse.ArgumentParser(description="Query video_index with a text prompt.")
+    parser.add_argument("query", nargs="+", help="text prompt to search for")
+    parser.add_argument("--top", type=int, default=TOP_K, help="number of raw frames to fetch")
+    parser.add_argument("--min-score", type=float, default=MIN_SCORE, help="minimum frame score to keep")
+    args = parser.parse_args()
+    search_video(" ".join(args.query), top_k=args.top, min_score=args.min_score)
+if __name__ == "__main__":
+    main()

requirements.txt ADDED

	@@ -0,0 +1,9 @@

+gradio>=4.0
+transformers>=4.40
+datasets>=2.19
+huggingface_hub>=0.23
+Pillow>=10.0
+numpy>=1.24
+torch
+accelerate>=0.30
+python-dotenv>=1.0

search.py ADDED

	@@ -0,0 +1,123 @@

+# HF_Space_hipVS/search.py
+# =========================
+# Search: embed the query, search the project's vector store, then have the LLM summarize the hits.
+import logging
+from embedding import embed_text, llm_summarize
+from vector_store import get_store
+from config import DEFAULT_PROJECT
+logger = logging.getLogger(__name__)
+def _fmt(seconds: float) -> str:
+    m, s = divmod(int(seconds), 60)
+    return f"{m:02d}:{s:02d}"
+def _merge_video_hits(hits: list[dict], gap: float = 10.0) -> list[dict]:
+    """Merge adjacent frame-level hits into time ranges."""
+    if not hits:
+        return []
+    by_video: dict[str, list[dict]] = {}
+    for h in hits:
+        by_video.setdefault(h.get("video_name", "?"), []).append(h)
+    merged = []
+    for video_name, frames in by_video.items():
+        frames.sort(key=lambda x: x.get("timestamp_sec", 0))
+        cur = {
+            "video_name": video_name,
+            "video_path": frames[0].get("video_path", ""),
+            "start_sec":  frames[0].get("timestamp_sec", 0),
+            "end_sec":    frames[0].get("timestamp_sec", 0),
+            "peak_score": frames[0].get("score", 0),
+            "frames":     1,
+        }
+        for f in frames[1:]:
+            ts = f.get("timestamp_sec", 0)
+            if ts <= cur["end_sec"] + gap:
+                cur["end_sec"]    = ts
+                cur["peak_score"] = max(cur["peak_score"], f.get("score", 0))
+                cur["frames"]    += 1
+            else:
+                merged.append(cur)
+                cur = {
+                    "video_name": video_name,
+                    "video_path": f.get("video_path", ""),
+                    "start_sec":  ts,
+                    "end_sec":    ts,
+                    "peak_score": f.get("score", 0),
+                    "frames":     1,
+                }
+        merged.append(cur)
+    return sorted(merged, key=lambda x: -x["peak_score"])
+def search_images(query: str, project: str = DEFAULT_PROJECT, top_k: int = 10, min_score: float = 0.15) -> dict:
+    store = get_store(project, "image_index")
+    if store.count == 0:
+        return {
+            "query": query, "results": [],
+            "llm_summary": f"No images indexed in project '{project}'. Upload images first.",
+            "store_info": str(store),
+        }
+    query_vec = embed_text(query)
+    raw = store.search(query_vec, top_k=top_k)
+    filtered = [r for r in raw if r.get("score", 0) >= min_score]
+    summary = llm_summarize(query, filtered, mode="image")
+    return {
+        "query": query,
+        "results": filtered,
+        "llm_summary": summary,
+        "store_info": str(store),
+    }
+def search_videos(query: str, project: str = DEFAULT_PROJECT, top_k: int = 30, min_score: float = 0.15) -> dict:
+    store = get_store(project, "video_index")
+    if store.count == 0:
+        return {
+            "query": query, "matches": [],
+            "llm_summary": f"No videos indexed in project '{project}'. Upload videos first.",
+            "store_info": str(store),
+        }
+    query_vec = embed_text(query)
+    raw = store.search(query_vec, top_k=top_k)
+    filtered = [r for r in raw if r.get("score", 0) >= min_score]
+    spans = _merge_video_hits(filtered)
+    result_for_llm = [
+        {
+            "video_name": s["video_name"],
+            "timestamp_sec": s["start_sec"],
+            "timestamp_label": f"{_fmt(s['start_sec'])} - {_fmt(s['end_sec'])}",
+            "score": s["peak_score"],
+        }
+        for s in spans
+    ]
+    summary = llm_summarize(query, result_for_llm, mode="video")
+    return {
+        "query": query,
+        "matches": [
+            {
+                "id": i + 1,
+                "video_name": s["video_name"],
+                "start": _fmt(s["start_sec"]),
+                "end": _fmt(s["end_sec"]),
+                "start_seconds": s["start_sec"],
+                "end_seconds": s["end_sec"],
+                "score": round(s["peak_score"], 4),
+                "frames": s["frames"],
+                "representative_frame": s.get("frame_path", ""),
+            }
+            for i, s in enumerate(spans)
+        ],
+        "llm_summary": summary,
+        "store_info": str(store),
+    }
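`_merge_video_hits` processes each video's frames in time order. It starts a new span whenever the next frame lies more than `gap` seconds past the current span's end, and it ranks spans by their best frame score. Below is a standalone sketch of the same rule. It is a simplified re-implementation for illustration, without the `video_path` bookkeeping, and the timestamps and scores are made up.

```python
def merge_hits(hits: list[dict], gap: float = 10.0) -> list[dict]:
    """Group frame hits per video, then join frames whose timestamps are within `gap` seconds."""
    by_video: dict[str, list[dict]] = {}
    for h in hits:
        by_video.setdefault(h["video_name"], []).append(h)
    spans: list[dict] = []
    for name, frames in by_video.items():
        frames.sort(key=lambda f: f["timestamp_sec"])
        cur = None
        for f in frames:
            ts, score = f["timestamp_sec"], f["score"]
            if cur is not None and ts <= cur["end_sec"] + gap:
                # Close enough to the running span: extend it.
                cur["end_sec"] = ts
                cur["peak_score"] = max(cur["peak_score"], score)
                cur["frames"] += 1
            else:
                # First frame, or too far from the running span: start a new span.
                cur = {"video_name": name, "start_sec": ts, "end_sec": ts,
                       "peak_score": score, "frames": 1}
                spans.append(cur)
    # Best-scoring spans first, as in search.py.
    return sorted(spans, key=lambda s: -s["peak_score"])


hits = [  # made-up frame hits for one video
    {"video_name": "bunny.mp4", "timestamp_sec": 10.0, "score": 0.31},
    {"video_name": "bunny.mp4", "timestamp_sec": 15.0, "score": 0.42},
    {"video_name": "bunny.mp4", "timestamp_sec": 30.0, "score": 0.28},
    {"video_name": "bunny.mp4", "timestamp_sec": 50.0, "score": 0.20},
]
for s in merge_hits(hits):
    print(s["start_sec"], s["end_sec"], s["frames"], s["peak_score"])
```

With `gap=10`, the frames at 10 s and 15 s merge into one span. The frame at 30 s is 15 s past that span's end, so it opens a new span. Because the gap is measured from the span's latest frame, a steady run of closely spaced frames can chain into one long span.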

seed_data.py ADDED

	@@ -0,0 +1,70 @@

+# HF_Space_hipVS/seed_data.py
+# =============================
+# Auto-seed from a HF Dataset so the Space launches with content indexed.
+# Called on first launch if AUTO_SEED=true and the default project is empty.
+import logging
+from config import SEED_DATASET, SEED_SPLIT, HF_TOKEN, DEFAULT_PROJECT
+logger = logging.getLogger(__name__)
+def run(project: str = DEFAULT_PROJECT, progress_callback=None) -> tuple[int, str]:
+    """Seed a project with images from a HF dataset."""
+    from datasets import load_dataset
+    from ingest import ingest_image_from_pil
+    from vector_store import get_store
+    log = [f"Seeding project '{project}' from {SEED_DATASET} [{SEED_SPLIT}]\n"]
+    try:
+        ds = load_dataset(SEED_DATASET, split=SEED_SPLIT, token=HF_TOKEN or None)
+        log.append(f"Loaded {len(ds)} items")
+    except Exception as e:
+        msg = f"Failed to load dataset: {e}"
+        logger.error(msg)
+        return 0, msg
+    count = 0
+    total = len(ds)
+    for i, item in enumerate(ds):
+        image = item.get("image")
+        if image is None:
+            continue
+        filename = item.get("filename", f"seed_{i:05d}.jpg")
+        extra = {"source": SEED_DATASET}
+        # Grab any available caption as metadata (not used for embedding)
+        for key in ("caption", "sentences", "text"):
+            if key in item:
+                val = item[key]
+                extra["caption_hint"] = val[0] if isinstance(val, list) else str(val)
+                break
+        ok, _ = ingest_image_from_pil(image, filename, extra, project=project)
+        if ok:
+            count += 1
+            if count <= 3 or count % 50 == 0:
+                log.append(f"  [{count}/{total}] {filename}")
+        if progress_callback:
+            progress_callback((i + 1) / total, desc=f"Seeding {i+1}/{total}...")
+    # Rebuild CAGRA once after all images
+    store = get_store(project, "image_index")
+    if store.has_data():
+        store.rebuild_gpu_index()
+        store._persist()
+    log.append(f"\nSeeding complete: {count} images indexed")
+    log.append(f"Store: {store}")
+    return count, "\n".join(log)
+def is_needed() -> bool:
+    from config import AUTO_SEED
+    from vector_store import get_store
+    store = get_store(DEFAULT_PROJECT, "image_index")
+    return AUTO_SEED and not store.has_data()

test_store.py ADDED

	@@ -0,0 +1,50 @@

+"""Smoke test for multi-project vector store."""
+import sys
+import numpy as np
+sys.path.insert(0, "HF_Space_hipVS")
+from vector_store import get_store, list_projects, VectorStore
+# Test multi-project isolation
+DIM = 768  # CLIP dim for CPU
+store_a = get_store("project-alpha", "image_index")
+store_b = get_store("project-beta", "image_index")
+vecs_a = np.random.randn(30, DIM).astype(np.float32)
+vecs_b = np.random.randn(50, DIM).astype(np.float32)
+store_a.add(vecs_a, [f"alpha_{i}" for i in range(30)])
+store_b.add(vecs_b, [f"beta_{i}" for i in range(50)])
+print(f"Store A: {store_a}")
+print(f"Store B: {store_b}")
+# Search in A should not return B's vectors
+query = np.random.randn(DIM).astype(np.float32)
+results_a = store_a.search(query, top_k=3)
+results_b = store_b.search(query, top_k=3)
+print(f"Search A: {[r['id'] for r in results_a]}")
+print(f"Search B: {[r['id'] for r in results_b]}")
+# Verify isolation
+assert all("alpha" in r["id"] for r in results_a), "Project A returned non-alpha results!"
+assert all("beta" in r["id"] for r in results_b), "Project B returned non-beta results!"
+# Test append_and_rebuild
+store_a.append_and_rebuild(np.random.randn(DIM).astype(np.float32), "alpha_new", {"test": True})
+print(f"After append_and_rebuild: {store_a}")
+# Test registry caching (get_store returns the cached instance for the same key)
+store_c = get_store("project-alpha", "image_index")
+assert store_c.count == 31
+print(f"Cached store same ref: {store_c is store_a}")
+# List projects
+print(f"Projects: {list_projects()}")
+# Cleanup
+store_a.clear()
+store_b.clear()
+print("All tests passed")
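vector_store.py (listed next) normalizes every vector and builds its CAGRA index with `metric="sqeuclidean"`, while the PyTorch and NumPy tiers score by dot product. The two approaches agree because, for unit vectors a and b, ||a - b||^2 = 2 - 2 * cos(a, b). Both therefore return the same nearest neighbours, and cosine similarity can be recovered from a distance d as 1 - d/2. Here is a quick NumPy check of that identity on random data.

```python
import numpy as np

rng = np.random.default_rng(42)
db = rng.standard_normal((200, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit rows, like VectorStore._normalize
q = rng.standard_normal(32)
q /= np.linalg.norm(q)

cos = db @ q                         # score used by the NumPy and PyTorch tiers
sq_l2 = ((db - q) ** 2).sum(axis=1)  # squared L2 distance, as a "sqeuclidean" index reports

# For unit vectors ||a - b||^2 = 2 - 2*cos(a, b), so cosine is recoverable from the distance.
recovered = 1.0 - sq_l2 / 2.0
print(np.abs(recovered - cos).max())
```

This conversion makes a CAGRA distance comparable with the `min_score` thresholds (0.15 in these scripts) that the other tiers use. Negating the distance instead yields values in [-4, 0], which no positive threshold would pass.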

vector_store.py ADDED

	@@ -0,0 +1,420 @@

+# HF_Space_hipVS/vector_store.py
+# ================================
+# Multi-project vector store with 3-tier GPU acceleration.
+#
+# Key design:
+#   - Each project gets its own VectorStore instances (image_index, video_index)
+#   - CAGRA is rebuilt by add()/append_and_rebuild(); append() leaves the rebuild to the caller (optimized for query, not ingestion)
+#   - Indexes swap between NVMe and VRAM via async pinned-memory DMA
+#   - Multiple projects coexist by LRU-evicting cold indexes from VRAM
+#
+# Tiers:
+#   1. CAGRA graph (hipVS / cuVS) — ANN search in ~50us
+#   2. PyTorch flat tensor (hipBLAS matmul) — brute-force GPU
+#   3. NumPy CPU cosine similarity — works everywhere
+import json
+import logging
+import threading
+import numpy as np
+from pathlib import Path
+from config import USE_GPU, HF_TOKEN, HF_DATASET_REPO, SWAP_PATH, get_project_dir
+logger = logging.getLogger(__name__)
+# ── GPU Backend Detection ────────────────────────────────────────────────────
+_HIPVS_AVAILABLE = False
+_TORCH_CUDA_AVAILABLE = False
+_cagra = None
+if USE_GPU:
+    try:
+        from cuvs.neighbors import cagra as _cagra_mod
+        _cagra = _cagra_mod
+        _HIPVS_AVAILABLE = True
+        logger.info("Tier 1: hipVS (cuvs) -- CAGRA index enabled")
+    except ImportError:
+        pass
+    if not _HIPVS_AVAILABLE:
+        try:
+            import torch
+            if torch.cuda.is_available():
+                _TORCH_CUDA_AVAILABLE = True
+                props = torch.cuda.get_device_properties(0)
+                name = props.name.lower()
+                backend = "ROCm" if ("amd" in name or "radeon" in name) else "CUDA"
+                logger.info(f"Tier 2: PyTorch {backend} -- flat GPU search ({props.name})")
+        except ImportError:
+            pass
+if not _HIPVS_AVAILABLE and not _TORCH_CUDA_AVAILABLE:
+    logger.info("Tier 3: NumPy CPU vector search")
+# ── HF Dataset Persistence ──────────────────────────────────────────────────
+def _hf_save(name: str, ids: list[str], vectors: np.ndarray, metadata: list[dict]):
+    if not HF_DATASET_REPO or not HF_TOKEN:
+        return
+    try:
+        from datasets import Dataset
+        records = [
+            {"id": ids[i], "vector": vectors[i].tolist(), "metadata": json.dumps(metadata[i])}
+            for i in range(len(ids))
+        ]
+        ds = Dataset.from_list(records)
+        repo = f"{HF_DATASET_REPO}-{name}"
+        ds.push_to_hub(repo, token=HF_TOKEN, private=True)
+        logger.info(f"[{name}] Pushed {len(records)} vectors to HF Dataset")
+    except Exception as e:
+        logger.warning(f"[{name}] HF push failed: {e}")
+def _hf_load(name: str):
+    if not HF_DATASET_REPO or not HF_TOKEN:
+        return None
+    try:
+        from datasets import load_dataset
+        repo = f"{HF_DATASET_REPO}-{name}"
+        ds = load_dataset(repo, token=HF_TOKEN, split="train")
+        logger.info(f"[{name}] Loaded {len(ds)} vectors from HF Dataset")
+        return ds
+    except Exception:
+        return None
+# ── VectorStore ──────────────────────────────────────────────────────────────
+class VectorStore:
+    """
+    GPU-backed vector store with NVMe swap and CAGRA rebuild-on-insert.
+    Lifecycle:
+      1. add(vectors, ids, meta)         — bulk add + CAGRA rebuild + persist
+      2. append(vector, id, meta)        — single add, NO rebuild (caller decides)
+      3. append_and_rebuild(v, id, meta) — single add + CAGRA rebuild + persist
+      4. search(query, top_k)            — search (auto-loads from NVMe if needed)
+      5. evict()                         — free VRAM, keep NVMe
+      6. restore()                       — NVMe -> VRAM (async, pinned DMA)
+    """
+    def __init__(self, name: str, index_dir: Path | None = None):
+        self.name = name
+        self._index_dir = index_dir or SWAP_PATH
+        self._index_dir.mkdir(parents=True, exist_ok=True)
+        self._vectors: np.ndarray | None = None
+        self._ids: list[str] = []
+        self._metadata: list[dict] = []
+        # GPU state
+        self._gpu_index = None    # CAGRA index object
+        self._gpu_vecs = None     # torch tensor (flat fallback)
+        self._in_vram = False
+        # File paths
+        self._npz_file = self._index_dir / f"{name}.npz"
+        self._meta_file = self._index_dir / f"{name}_meta.json"
+        self._cagra_file = self._index_dir / f"{name}.cagra"
+        # Load from disk on init
+        if self._npz_file.exists():
+            self._load_from_disk()
+        else:
+            self._load_from_hf()
+    # ── Add / Append ─────────────────────────────────────────────────────────
+    def add(self, vectors: np.ndarray, ids: list[str], metadata: list[dict] | None = None):
+        """Replace all stored vectors with `vectors` (existing entries are discarded), rebuild CAGRA, persist."""
+        if len(vectors) == 0:
+            return
+        self._vectors = vectors.astype(np.float32)
+        self._ids = list(ids)
+        self._metadata = metadata or [{} for _ in ids]
+        self._normalize()
+        self.rebuild_gpu_index()
+        self._persist()
+        logger.info(f"[{self.name}] Indexed {len(ids)} vectors (mode={self.mode})")
+    def append(self, vector: np.ndarray, vid: str, meta: dict | None = None):
+        """Append one vector. NO CAGRA rebuild (batch callers rebuild at end)."""
+        vector = vector.astype(np.float32).reshape(1, -1)
+        norm = np.linalg.norm(vector)
+        if norm > 0:
+            vector = vector / norm
+        if self._vectors is not None and len(self._vectors) > 0:
+            self._vectors = np.vstack([self._vectors, vector])
+        else:
+            self._vectors = vector
+        self._ids.append(vid)
+        self._metadata.append(meta or {})
+        self._in_vram = False  # invalidate GPU index
+    def append_and_rebuild(self, vector: np.ndarray, vid: str, meta: dict | None = None):
+        """Append one vector + rebuild CAGRA + persist."""
+        self.append(vector, vid, meta)
+        self.rebuild_gpu_index()
+        self._persist()
+    # ── Search ───────────────────────────────────────────────────────────────
+    def search(self, query: np.ndarray, top_k: int = 10) -> list[dict]:
+        """
+        Cosine similarity search. Rebuilds the GPU index from the in-memory vectors if it is not in VRAM.
+        Returns list of dicts: {id, score, ...metadata}
+        """
+        if self._vectors is None or len(self._vectors) == 0:
+            return []
+        query = query.astype(np.float32)
+        norm = np.linalg.norm(query)
+        if norm > 0:
+            query = query / norm
+        # Auto-load GPU index if needed
+        if ((_HIPVS_AVAILABLE or _TORCH_CUDA_AVAILABLE) and not self._in_vram):
+            self.rebuild_gpu_index()
+        if _HIPVS_AVAILABLE and self._gpu_index is not None:
+            return self._search_cagra(query, top_k)
+        elif _TORCH_CUDA_AVAILABLE and self._in_vram:
+            return self._search_torch(query, top_k)
+        return self._search_numpy(query, top_k)
+    def _search_numpy(self, query: np.ndarray, top_k: int) -> list[dict]:
+        scores = self._vectors @ query
+        k = min(top_k, len(self._ids))
+        if len(scores) > top_k:
+            idx = np.argpartition(scores, -k)[-k:]
+            idx = idx[np.argsort(scores[idx])[::-1]]
+        else:
+            idx = np.argsort(scores)[::-1][:k]
+        return [{"id": self._ids[i], "score": float(scores[i]), **self._metadata[i]} for i in idx]
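+    def _search_cagra(self, query: np.ndarray, top_k: int) -> list[dict]:
+        import cupy as cp
+        k = min(top_k, len(self._ids))
+        q = cp.asarray(query.reshape(1, -1))
+        search_params = _cagra.SearchParams()
+        distances, indices = _cagra.search(search_params, self._gpu_index, q, k)
+        # cuVS returns device arrays; wrap them with CuPy before copying to host.
+        distances = cp.asarray(distances).get()[0]
+        indices = cp.asarray(indices).get()[0]
+        results = []
+        for idx, dist in zip(indices.tolist(), distances.tolist()):
+            if 0 <= idx < len(self._ids):
+                # Squared L2 on unit vectors equals 2 - 2*cos, so cos = 1 - dist/2;
+                # this keeps scores comparable with the torch/NumPy tiers and min_score.
+                results.append({"id": self._ids[idx], "score": 1.0 - float(dist) / 2.0, **self._metadata[idx]})
+        return results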
+    def _search_torch(self, query: np.ndarray, top_k: int) -> list[dict]:
+        import torch
+        q = torch.from_numpy(query).to(self._gpu_vecs.device, dtype=self._gpu_vecs.dtype).unsqueeze(0)
+        scores = (q @ self._gpu_vecs.T).squeeze(0)
+        k = min(top_k, len(self._ids))
+        top_scores, top_idx = torch.topk(scores, k=k)
+        return [
+            {"id": self._ids[i], "score": float(s), **self._metadata[i]}
+            for i, s in zip(top_idx.cpu().tolist(), top_scores.cpu().tolist())
+        ]
+    # ── GPU Index Build (CAGRA rebuilt on every insert) ──────────────────────
+    def rebuild_gpu_index(self):
+        """Build/rebuild the GPU index from current vectors."""
+        if self._vectors is None or len(self._vectors) == 0:
+            return
+        if _HIPVS_AVAILABLE:
+            self._build_cagra()
+        elif _TORCH_CUDA_AVAILABLE:
+            self._build_torch()
+    def _build_cagra(self):
+        import cupy as cp
+        d_vecs = cp.asarray(self._vectors)
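+        # IndexParams options are constructor arguments; squared L2 on
+        # unit-normalized vectors ranks neighbours the same way as cosine.
+        params = _cagra.IndexParams(
+            metric="sqeuclidean",
+            graph_degree=64,
+            intermediate_graph_degree=128,
+            build_algo="ivf_pq",
+        )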
+        logger.info(f"[{self.name}] Building CAGRA ({self._vectors.shape}) ...")
+        self._gpu_index = _cagra.build(params, d_vecs)
+        # Serialize to NVMe for fast restore after eviction
+        _cagra.serialize(str(self._cagra_file), self._gpu_index)
+        self._in_vram = True
+        logger.info(f"[{self.name}] CAGRA built + serialized")
+    def _build_torch(self):
+        import torch
+        self._gpu_vecs = torch.from_numpy(self._vectors).cuda().half()
+        self._in_vram = True
+    # ── NVMe <-> VRAM Swap ───────────────────────────────────────────────────
+    def evict(self):
+        """Free VRAM. NVMe files stay intact for fast restore()."""
+        if not self._in_vram:
+            return
+        self._gpu_index = None
+        self._gpu_vecs = None
+        if _TORCH_CUDA_AVAILABLE:
+            # Only the flat-tensor tier allocates through PyTorch's caching allocator.
+            import torch
+            torch.cuda.empty_cache()
+        self._in_vram = False
+        logger.info(f"[{self.name}] Evicted from VRAM")
+    def restore(self):
+        """
+        Restore the GPU index to VRAM without re-embedding or re-reading source files.
+        CAGRA is deserialized from its NVMe file; the flat-tensor tier copies the
+        vectors through pinned host memory with a non-blocking transfer.
+        """
+        if self._in_vram:
+            return
+        if _HIPVS_AVAILABLE and self._cagra_file.exists():
+            logger.info(f"[{self.name}] Restoring CAGRA from NVMe ...")
+            self._gpu_index = _cagra.deserialize(str(self._cagra_file))
+            self._in_vram = True
+            logger.info(f"[{self.name}] CAGRA restored to VRAM")
+        elif _TORCH_CUDA_AVAILABLE and self._vectors is not None:
+            import torch
+            # Pinned memory -> VRAM (async DMA copy)
+            pinned = torch.from_numpy(self._vectors).pin_memory()
+            self._gpu_vecs = pinned.to("cuda", non_blocking=True, dtype=torch.float16)
+            self._in_vram = True
+            logger.info(f"[{self.name}] Flat tensor restored to VRAM (async)")
+        # Load IDs if needed
+        if not self._ids and self._npz_file.exists():
+            data = np.load(self._npz_file, allow_pickle=True)
+            self._ids = data["ids"].tolist()
+            if self._meta_file.exists():
+                with open(self._meta_file, "r") as f:
+                    self._metadata = json.load(f)
+    # ── Persistence ──────────────────────────────────────────────────────────
+    def _persist(self):
+        self._save_to_disk()
+        if HF_DATASET_REPO and HF_TOKEN:
+            _hf_save(self.name, self._ids, self._vectors, self._metadata)
+    def _save_to_disk(self):
+        if self._vectors is None:
+            return
+        np.savez_compressed(self._npz_file, vectors=self._vectors, ids=np.array(self._ids, dtype=object))
+        with open(self._meta_file, "w") as f:
+            json.dump(self._metadata, f)
+    def _load_from_disk(self):
+        try:
+            data = np.load(self._npz_file, allow_pickle=True)
+            self._vectors = data["vectors"].astype(np.float32)
+            self._ids = data["ids"].tolist()
+            if self._meta_file.exists():
+                with open(self._meta_file, "r") as f:
+                    self._metadata = json.load(f)
+            else:
+                self._metadata = [{} for _ in self._ids]
+            logger.info(f"[{self.name}] Loaded {len(self._ids)} vectors from disk")
+        except Exception as e:
+            logger.error(f"[{self.name}] Disk load failed: {e}")
+    def _load_from_hf(self):
+        ds = _hf_load(self.name)
+        if ds is None or len(ds) == 0:
+            return
+        try:
+            # list(...) ensures plain Python lists so later append() calls work.
+            self._ids = list(ds["id"])
+            self._vectors = np.array(list(ds["vector"]), dtype=np.float32)
+            self._metadata = [json.loads(m) for m in ds["metadata"]]
+            self._save_to_disk()
+        except Exception as e:
+            logger.error(f"[{self.name}] HF load failed: {e}")
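`_load_from_hf` reads three parallel columns (`id`, `vector`, and `metadata` as JSON strings) but never checks that they line up. Below is a small sketch of that column-to-store conversion with a length check added. It assumes the columns can be iterated like lists. `rows_from_columns` is a hypothetical name, and a plain dict stands in for the dataset object.

```python
import json

import numpy as np


def rows_from_columns(columns) -> tuple[list[str], np.ndarray, list[dict]]:
    """Convert {"id", "vector", "metadata"} columns into store-ready values."""
    ids = [str(i) for i in columns["id"]]
    # list(...) materializes lazy column objects before NumPy sees them
    vectors = np.asarray(list(columns["vector"]), dtype=np.float32)
    metadata = [json.loads(m) for m in columns["metadata"]]
    # Misaligned columns would silently pair vectors with the wrong IDs
    if not (len(ids) == len(vectors) == len(metadata)):
        raise ValueError("id / vector / metadata columns differ in length")
    return ids, vectors, metadata


cols = {"id": ["car.jpg", "dog.jpg"], "vector": [[0.1, 0.2], [0.3, 0.4]],
        "metadata": ['{"type": "image"}', "{}"]}
ids, vecs, meta = rows_from_columns(cols)
print(ids, vecs.shape, vecs.dtype, meta)  # ['car.jpg', 'dog.jpg'] (2, 2) float32 [{'type': 'image'}, {}]
```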
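+    def _normalize(self):
+        # Row-wise L2 normalization: afterwards a dot product equals cosine similarity.
+        # All-zero rows are divided by 1 instead of 0, so they stay zero instead of becoming NaN.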
+        if self._vectors is None:
+            return
+        norms = np.linalg.norm(self._vectors, axis=1, keepdims=True)
+        norms = np.where(norms == 0, 1, norms)
+        self._vectors = self._vectors / norms
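Once rows are unit length, cosine similarity reduces to a plain matrix-vector product, which is what makes a flat (brute-force) search cheap. Below is a minimal NumPy sketch of that idea, mirroring `_normalize`'s zero-norm guard. `l2_normalize` and `cosine_top_k` are hypothetical names, and this is not the repo's search code.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Same guard as _normalize: divide by 1 instead of 0 for empty rows.
    norms = np.where(norms == 0, 1, norms)
    return vectors / norms


def cosine_top_k(index: np.ndarray, query: np.ndarray, k: int) -> list[int]:
    """Rank rows of an already-normalized index by cosine similarity to query."""
    q = l2_normalize(query.reshape(1, -1))[0]
    scores = index @ q  # dot product == cosine once both sides are unit length
    return np.argsort(-scores)[:k].tolist()


index = l2_normalize(np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]], dtype=np.float32))
print(cosine_top_k(index, np.array([2.0, 0.0]), k=2))  # [2, 0]
```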
+    # ── Utilities ────────────────────────────────────────────────────────────
+    def clear(self):
+        self._vectors = None
+        self._ids = []
+        self._metadata = []
+        self._gpu_index = None
+        self._gpu_vecs = None
+        self._in_vram = False
+        for f in (self._npz_file, self._meta_file, self._cagra_file):
+            if f.exists():
+                f.unlink()
+    def has_data(self) -> bool:
+        return self._vectors is not None and len(self._ids) > 0
+    @property
+    def count(self) -> int:
+        return len(self._ids) if self._ids else 0
+    @property
+    def in_vram(self) -> bool:
+        return self._in_vram
+    @property
+    def mode(self) -> str:
+        if _HIPVS_AVAILABLE:
+            return "CAGRA (hipVS GPU)"
+        elif _TORCH_CUDA_AVAILABLE:
+            return "Flat Tensor (GPU)"
+        return "NumPy (CPU)"
+    def __len__(self):
+        return self.count
+    def __repr__(self):
+        vram = "VRAM" if self._in_vram else "NVMe"
+        return f"VectorStore('{self.name}', n={self.count}, {self.mode}, {vram})"
+# ── Multi-Project Store Registry ────────────────────────────────────────────
+_stores: dict[str, VectorStore] = {}
+_lock = threading.Lock()
+def get_store(project: str, index_name: str) -> VectorStore:
+    """
+    Get or create a VectorStore for a specific project + index.
+    Stores are cached per "project/index_name" key (repeated calls return the same
+    instance) and share the process's GPU memory pool.
+    """
+    key = f"{project}/{index_name}"
+    with _lock:
+        if key not in _stores:
+            proj_dir = get_project_dir(project)
+            idx_dir = proj_dir / "indexes"
+            _stores[key] = VectorStore(index_name, index_dir=idx_dir)
+            logger.info(f"Store created: {_stores[key]}")
+        return _stores[key]
+def list_projects() -> list[str]:
+    """List every project directory under PROJECTS_DIR, sorted by name (directories without index files are included)."""
+    from config import PROJECTS_DIR
+    projects = []
+    if PROJECTS_DIR.exists():
+        for p in sorted(PROJECTS_DIR.iterdir()):
+            if p.is_dir():
+                projects.append(p.name)
+    return projects
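If only projects that already contain a built index should be listed, one option is to filter on the `indexes/` subfolder. The sketch below assumes indexes are saved as `*.npz` files there, matching the layout in this commit (`data/projects/default/indexes/image_index.npz`). `list_indexed_projects` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def list_indexed_projects(projects_dir: Path) -> list[str]:
    """Return sorted project names whose indexes/ folder holds a .npz file."""
    if not projects_dir.exists():
        return []
    return [
        p.name
        for p in sorted(projects_dir.iterdir())
        # any() stops at the first match instead of listing the whole folder
        if p.is_dir() and (p / "indexes").is_dir()
        and any((p / "indexes").glob("*.npz"))
    ]


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "default" / "indexes").mkdir(parents=True)
    (root / "default" / "indexes" / "image_index.npz").touch()
    (root / "empty" / "indexes").mkdir(parents=True)
    print(list_indexed_projects(root))  # ['default']
```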
+def evict_all():
+    """Evict all stores from VRAM."""
+    with _lock:
+        for store in _stores.values():
+            if store.in_vram:
+                store.evict()