Spaces: NanG01/visionq


NanG01 committed 15 days ago

Commit bf88c57

1 Parent(s): 2402166

Clean up repo: remove archive, dev docs, nested submodule; fix license badge


Files changed (44)

.gitignore +7 -0
BEFORE_AFTER.md +0 -422
FINAL_FIX.md +0 -177
FIXES_APPLIED.md +0 -124
FIX_TENSORFLOW.md +0 -138
MASTER_TROUBLESHOOTING.md +0 -232
MODELS_AND_SPEED.md +0 -203
README.md +1 -1
RESTRUCTURE_SUMMARY.md +0 -348
TROUBLESHOOTING.md +0 -191
VisionQ +0 -1
archive/old_agents/caption_agent.py +0 -40
archive/old_agents/memory_agent.py +0 -59
archive/old_agents/query_agent.py +0 -127
archive/old_agents/vision_agent.py +0 -210
archive/old_agents/voice_agent.py +0 -127
archive/old_docs/ARCHITECTURE.md +0 -445
archive/old_docs/COMPARISON.md +0 -431
archive/old_docs/DEPLOYMENT_CHECKLIST.md +0 -397
archive/old_docs/INDEX.md +0 -359
archive/old_docs/QUICKSTART.md +0 -197
archive/old_docs/QUICK_REFERENCE.md +0 -315
archive/old_docs/README_UPGRADED.md +0 -410
archive/old_docs/SUMMARY.md +0 -406
archive/old_docs/UPGRADE_GUIDE.md +0 -532
archive/old_scripts/ask_question.py +0 -19
archive/old_scripts/ask_question_upgraded.py +0 -41
archive/old_scripts/install_upgrade.bat +0 -101
archive/old_scripts/main.py +0 -66
archive/old_scripts/main_upgraded.py +0 -85
archive/old_scripts/test_upgrade.py +0 -274
archive/pipcheck.txt +0 -0
archive/requirements_upgraded.txt +0 -54
cleanup.bat +0 -65
config/fast_mode.py +0 -40
docs/CAMERA_FEED.md +0 -178
docs/PERFORMANCE.md +0 -187
docs/PERFORMANCE_ANALYSIS.md +0 -310
extras/labelmap_M.txt +0 -91
fix_and_run.bat +0 -40
fix_tensorflow.bat +0 -43
memory.json +0 -0
run_continuous.bat +0 -30
ui/app_continuous.py +0 -340

.gitignore CHANGED

@@ -51,6 +51,13 @@ models/piper/
 *.tflite
 *.onnx
+# Runtime data at root
+memory.json
+# Local archive and extras
+archive/
+extras/
 # Environment variables
 .env

BEFORE_AFTER.md DELETED

@@ -1,422 +0,0 @@
-# 📊 VisionQ - Before & After Restructuring
-## 🎯 TRANSFORMATION SUMMARY
-Your VisionQ project has been transformed from a **cluttered development project** to a **clean, production-ready application** with a **web interface**!
----
-## 📂 FOLDER STRUCTURE COMPARISON
-### **BEFORE** ❌
-```
-VisionQ/
-├── agents/ (new)
-├── core/ (new)
-├── data/
-├── extras/
-├── models/
-├── VisionQ/
-├── caption_agent.py (duplicate)
-├── memory_agent.py (duplicate)
-├── query_agent.py (duplicate)
-├── vision_agent.py (duplicate)
-├── voice_agent.py (duplicate)
-├── main.py (old)
-├── main_upgraded.py (old)
-├── ask_question.py (old)
-├── ask_question_upgraded.py (old)
-├── test_upgrade.py (old)
-├── install_upgrade.bat (old)
-├── requirements.txt (old)
-├── requirements_upgraded.txt (old)
-├── README.md (old)
-├── README_UPGRADED.md (duplicate)
-├── ARCHITECTURE.md (old)
-├── COMPARISON.md (old)
-├── DEPLOYMENT_CHECKLIST.md (old)
-├── INDEX.md (old)
-├── QUICK_REFERENCE.md (old)
-├── QUICKSTART.md (old)
-├── SUMMARY.md (old)
-├── UPGRADE_GUIDE.md (old)
-└── ... (many more files)
-❌ 40+ files in root
-❌ Duplicate files
-❌ Confusing structure
-❌ No web interface
-❌ Scattered docs
-```
-### **AFTER** ✅
-```
-VisionQ/
-├── 📁 agents/          # AI agents (clean)
-├── 📁 config/          # Settings (centralized)
-├── 📁 ui/              # Web interface (NEW!)
-├── 📁 core/            # Integration
-├── 📁 data/            # Storage
-├── 📁 models/          # AI models
-├── 📁 docs/            # Documentation (organized)
-├── 📁 .streamlit/      # UI config
-├── 📁 archive/         # Old files (backup)
-├── 📄 README.md        # Main docs (clean)
-├── 📄 requirements.txt # Dependencies (clean)
-├── 📄 run.bat          # Launcher (NEW!)
-├── 📄 cleanup.bat      # Cleanup script (NEW!)
-├── 📄 .env.example     # Environment template (NEW!)
-└── 📄 .gitignore       # Git rules (updated)
-✅ 15 files in root
-✅ No duplicates
-✅ Clear structure
-✅ Web interface
-✅ Organized docs
-```
----
-## 🆕 NEW FEATURES
-| Feature | Before | After |
-|---------|--------|-------|
-| **Web Interface** | ❌ None | ✅ Streamlit UI |
-| **One-Click Launch** | ❌ None | ✅ run.bat |
-| **Centralized Config** | ❌ Scattered | ✅ config/settings.py |
-| **Language Docs** | ❌ None | ✅ docs/LANGUAGES.md |
-| **API Keys Docs** | ❌ None | ✅ docs/API_KEYS.md |
-| **Structure Docs** | ❌ None | ✅ docs/STRUCTURE.md |
-| **Cleanup Script** | ❌ None | ✅ cleanup.bat |
-| **Environment Template** | ❌ None | ✅ .env.example |
----
-## 🌐 WEB INTERFACE (NEW!)
-### **Before**
-```python
-# Command line only
-python main_upgraded.py
-# Voice commands only
-# No visual feedback
-# Hard to test
-```
-### **After**
-```bash
-# Web interface
-run.bat
-# Opens browser at http://localhost:8501
-# Visual interface
-# Easy to test
-# Interactive
-```
-### **UI Features**
-- ✅ **4 Tabs:** Vision, Query, Memories, Help
-- ✅ **Live Camera:** See what AI sees
-- ✅ **Interactive Buttons:** Capture, Remember, Read Text
-- ✅ **Query Interface:** Ask questions visually
-- ✅ **Memory Browser:** View stored memories
-- ✅ **Settings Sidebar:** Configure languages
-- ✅ **Help Section:** Built-in documentation
----
-## 🌍 LANGUAGE SUPPORT
-### **Before**
-```python
-# Hardcoded in code
-OCR_LANGUAGES = ['en']
-# No documentation
-# No easy way to change
-```
-### **After**
-```python
-# Configurable in UI
-# Select from 90+ languages
-# Documented in docs/LANGUAGES.md
-# Easy to change:
-# 1. Open UI sidebar
-# 2. Select languages
-# 3. Done!
-```
-**Supported:** 90+ languages including English, Spanish, French, German, Italian, Portuguese, Russian, Arabic, Hindi, Chinese, Japanese, Korean, and many more!
----
-## 🔑 API KEYS CLARITY
-### **Before**
-```
-❓ Unclear if API keys needed
-❓ No documentation
-❓ Confusing for users
-```
-### **After**
-```
-✅ Clear: NO API keys needed!
-✅ Documented in docs/API_KEYS.md
-✅ .env.example for optional token
-✅ Works 100% offline
-```
----
-## 📚 DOCUMENTATION
-### **Before**
-```
-❌ 11 documentation files in root
-❌ Scattered information
-❌ Redundant content
-❌ Hard to find info
-```
-### **After**
-```
-✅ 1 main README.md
-✅ 3 focused docs in docs/
-✅ No redundancy
-✅ Easy to navigate
-```
-| Document | Purpose |
-|----------|---------|
-| `README.md` | Main documentation |
-| `docs/LANGUAGES.md` | Language support (90+) |
-| `docs/API_KEYS.md` | API keys info |
-| `docs/STRUCTURE.md` | Project structure |
-| `RESTRUCTURE_SUMMARY.md` | This summary |
----
-## 🎯 USER EXPERIENCE
-### **Before**
-```
-1. Install dependencies
-2. Run python main_upgraded.py
-3. Use voice commands only
-4. No visual feedback
-5. Hard to debug
-```
-### **After**
-```
-1. Run run.bat
-2. Browser opens automatically
-3. Click buttons in UI
-4. See results instantly
-5. Easy to use and test
-```
----
-## 👨‍💻 DEVELOPER EXPERIENCE
-### **Before**
-```
-❌ Flat file structure
-❌ Settings scattered in code
-❌ Hard to find files
-❌ Duplicate code
-❌ No clear entry point
-```
-### **After**
-```
-✅ Organized folders
-✅ Centralized config
-✅ Easy to navigate
-✅ No duplicates
-✅ Clear entry points
-```
----
-## 🔧 CONFIGURATION
-### **Before**
-```python
-# Settings scattered across files
-# Hard to change
-# No central config
-```
-### **After**
-```python
-# All in config/settings.py
-# Easy to customize
-# Well documented
-# Feature flags
-# Example:
-OCR_CONFIG = {
-    "languages": ["en", "es", "fr"],
-    "confidence_threshold": 0.3,
-}
-```
----
-## 📊 FILE COUNT
-| Category | Before | After | Change |
-|----------|--------|-------|--------|
-| **Root Files** | 40+ | 15 | -25 |
-| **Agent Files** | 12 (duplicates) | 7 (clean) | -5 |
-| **Doc Files** | 11 (scattered) | 4 (organized) | -7 |
-| **Config Files** | 0 | 1 | +1 |
-| **UI Files** | 0 | 1 | +1 |
-| **Total Clutter** | High | Low | ✅ |
----
-## 🚀 LAUNCH PROCESS
-### **Before**
-```bash
-# Manual process
-1. Activate venv
-2. Install dependencies
-3. Run python main_upgraded.py
-4. Hope it works
-```
-### **After**
-```bash
-# One command
-run.bat
-# Automatically:
-# - Creates venv if needed
-# - Installs dependencies
-# - Launches Streamlit
-# - Opens browser
-```
----
-## 🎨 VISUAL COMPARISON
-### **Before: Command Line**
-```
-$ python main_upgraded.py
-[VisionAgent] Initializing...
-[VisionAgent] YOLO backend loaded
-[VoiceAgent] Microphone detected
-Vision Q started. I am listening.
-[VOICE IN]: Listening (offline)...
-```
-### **After: Web Interface**
-```
-┌─────────────────────────────────────┐
-│  👁️ VisionQ - Multimodal AI        │
-├─────────────────────────────────────┤
-│  📷 Vision  🔍 Query  🧠 Memories   │
-├─────────────────────────────────────┤
-│  [📷 Capture]  [💾 Remember]       │
-│  [🔤 Read Text]                     │
-│                                     │
-│  📸 Camera Feed                     │
-│  [Live video preview]               │
-│                                     │
-│  📝 Results                         │
-│  "a person holding a phone"         │
-└─────────────────────────────────────┘
-```
----
-## ✅ BENEFITS SUMMARY
-### **For Users**
-- ✅ Easy web interface
-- ✅ Visual feedback
-- ✅ One-click launch
-- ✅ 90+ languages
-- ✅ No API keys needed
-### **For Developers**
-- ✅ Clean structure
-- ✅ Modular code
-- ✅ Centralized config
-- ✅ Easy to extend
-- ✅ Well documented
-### **For Everyone**
-- ✅ Professional appearance
-- ✅ Production ready
-- ✅ Easy to deploy
-- ✅ Easy to maintain
-- ✅ Open source
----
-## 🎯 WHAT TO DO NOW
-### **1. Install & Run**
-```bash
-pip install -r requirements.txt
-run.bat
-```
-### **2. Clean Up Old Files**
-```bash
-cleanup.bat
-```
-### **3. Explore**
-- Open http://localhost:8501
-- Try the web interface
-- Test OCR in different languages
-- Query your memories
-### **4. Customize**
-- Edit `config/settings.py`
-- Select languages in UI
-- Adjust settings
----
-## 📈 IMPROVEMENT METRICS
-| Metric | Before | After | Improvement |
-|--------|--------|-------|-------------|
-| **Files in Root** | 40+ | 15 | 🟢 -62% |
-| **Duplicate Files** | 12 | 0 | 🟢 -100% |
-| **Setup Steps** | 5 | 1 | 🟢 -80% |
-| **User Interface** | CLI only | Web UI | 🟢 +100% |
-| **Documentation** | Scattered | Organized | 🟢 +100% |
-| **Ease of Use** | Hard | Easy | 🟢 +200% |
----
-## 🎉 FINAL RESULT
-**VisionQ is now:**
-- ✅ **Clean** - Organized folder structure
-- ✅ **Modern** - Web interface with Streamlit
-- ✅ **Documented** - Clear, focused documentation
-- ✅ **Configurable** - Centralized settings
-- ✅ **Multi-lingual** - 90+ languages supported
-- ✅ **Offline** - No API keys needed
-- ✅ **Professional** - Production ready
-- ✅ **User-Friendly** - Easy to use and test
----
-**From cluttered development project to polished application! 🚀**
-**Run `run.bat` and see the transformation yourself!**

FINAL_FIX.md DELETED

@@ -1,177 +0,0 @@
-# FINAL FIX - Summary
-## What Was Done
-### 1. Fixed Embedding Normalization
-- Changed from `.norm()` to `torch.nn.functional.normalize()`
-- Updated both `encode_image()` and `encode_text()` methods
-- File: `agents/embedding_agent.py`
-### 2. Added Error Handling
-- Wrapped embedding calls in try-except blocks
-- System continues even if embeddings fail
-- File: `agents/vision_agent.py`
-### 3. Disabled Embeddings by Default
-- Set `embeddings_enabled: False` in config
-- Improves speed (2-3 seconds vs 5-7 seconds)
-- File: `config/settings.py`
-### 4. Removed All Emojis
-- Cleaned up UI code
-- Professional appearance
-- File: `ui/app.py`
-### 5. Added Cache Clearing
-- Created `fix_and_run.bat` script
-- Clears Python and Streamlit cache
-- Ensures fresh start
-## How to Fix the Error
-### Quick Fix (Do This Now)
-```bash
-# Run this script
-fix_and_run.bat
-```
-This will:
-1. Clear all cache
-2. Start fresh
-3. Load updated code
-### If Still Getting Errors
-1. **Stop the application** (Ctrl+C in terminal)
-2. **Clear all cache manually:**
-   ```bash
-   rd /s /q __pycache__
-   rd /s /q agents\__pycache__
-   rd /s /q config\__pycache__
-   rd /s /q core\__pycache__
-   rd /s /q ui\__pycache__
-   ```
-3. **Restart:**
-   ```bash
-   run.bat
-   ```
-4. **In browser, press `C` key** to clear Streamlit cache
-5. **Click "Initialize System"** again
-## Why This Happens
-The error occurs because:
-1. Streamlit caches the old code
-2. Python bytecode cache has old version
-3. Old embedding_agent.py is being used
-The fix clears all caches and loads the new code.
-## Verification
-After running `fix_and_run.bat`, you should see:
-```
-[VisionAgent] Initializing...
-[VisionAgent] Embeddings disabled for faster performance
-[VisionAgent] YOLO backend loaded
-[VisionAgent] Vision system initialized
-```
-The key line is: **"Embeddings disabled for faster performance"**
-This means:
-- Embeddings are properly disabled
-- No embedding errors will occur
-- System will be faster (2-3 seconds)
-## What Each Button Does Now
-### "Capture & Describe"
-- Captures frame
-- Generates caption (BLIP)
-- Extracts text (OCR)
-- NO embeddings (disabled)
-- Fast: ~2-3 seconds
-### "Remember Scene"
-- Captures frame
-- Generates caption (BLIP)
-- Extracts text (OCR)
-- NO embeddings (disabled)
-- Stores in memory
-- Fast: ~2-3 seconds
-### "Read Text"
-- Captures frame
-- Extracts text only (OCR)
-- Very fast: ~500ms
-## Files Created
-1. `fix_and_run.bat` - Quick fix script
-2. `TROUBLESHOOTING.md` - Detailed troubleshooting guide
-3. `docs/PERFORMANCE.md` - Performance optimization guide
-4. `FIXES_APPLIED.md` - Summary of all fixes
-## Current Configuration
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": True,           # Text extraction
-    "embeddings_enabled": False,   # Disabled for speed
-    "object_detection_enabled": True,  # YOLO detection
-}
-```
-**Result:**
-- Speed: Fast (2-3 seconds)
-- Features: Caption + OCR + Objects
-- Stability: No embedding errors
-## To Enable Embeddings (Optional)
-If you want visual similarity search:
-1. Edit `config/settings.py`:
-   ```python
-   FEATURES = {
-       "embeddings_enabled": True,
-   }
-   ```
-2. Run `fix_and_run.bat`
-3. Test carefully
-**Note:** This will make it slower (5-7 seconds) but enables visual search.
-## Summary
-**The error is fixed in the code.**
-**To apply the fix:**
-1. Run `fix_and_run.bat`
-2. Click "Initialize System"
-3. Test buttons
-4. Should work now!
-**If still broken:**
-- See `TROUBLESHOOTING.md`
-- Or keep embeddings disabled (recommended)
-**Current status:**
-- Embeddings: Disabled (for speed and stability)
-- OCR: Enabled
-- Object Detection: Enabled
-- Caption: Enabled
-- Speed: Fast (2-3 seconds)
-- Emojis: Removed
-**Everything should work now!**
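
The deleted FINAL_FIX.md describes two ideas together: optional stages sit behind a `FEATURES` flag, and embedding calls are wrapped in try-except so the system continues if they fail. A minimal sketch of that pattern follows. It is not the code from `agents/vision_agent.py`, which this diff does not show; the helper name `run_optional_stage` and its signature are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical flag table, modeled on the FEATURES dict quoted in FINAL_FIX.md.
FEATURES: Dict[str, bool] = {
    "ocr_enabled": True,
    "embeddings_enabled": False,
    "object_detection_enabled": True,
}


def run_optional_stage(
    flag: str,
    stage: Callable[[], List[float]],
    features: Dict[str, bool] = FEATURES,
) -> Optional[List[float]]:
    """Run `stage` only if its flag is on; return None when skipped or failed."""
    if not features.get(flag, False):
        return None  # stage disabled in config, nothing to do
    try:
        return stage()
    except Exception as exc:  # keep the pipeline alive on any stage failure
        print(f"[warn] {flag} stage failed: {exc}")
        return None
```

A caller would treat `None` as "no embedding available" and carry on with the caption and OCR results.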

FIXES_APPLIED.md DELETED

@@ -1,124 +0,0 @@
-# Fixes Applied - Summary
-## Issues Fixed
-### 1. AttributeError with CLIP Embeddings
-**Error:** `'BaseModelOutputWithPooling' object has no attribute 'norm'`
-**Fix:** Changed normalization method in `agents/embedding_agent.py`:
-```python
-# Before (broken)
-embedding = image_features / image_features.norm(dim=-1, keepdim=True)
-# After (fixed)
-embedding = torch.nn.functional.normalize(image_features, p=2, dim=-1)
-```
-### 2. Slow Performance
-**Issue:** System taking 5-7 seconds per capture
-**Fix:** Disabled embeddings by default in `config/settings.py`:
-```python
-FEATURES = {
-    "embeddings_enabled": False,  # Now disabled for speed
-}
-```
-**Result:** ~2-3 seconds per capture (much faster!)
-### 3. Emojis in Code
-**Issue:** Emojis throughout UI code
-**Fix:** Removed all emojis from:
-- Button labels
-- Headers
-- Status messages
-- Tab names
-- Spinner messages
-**Result:** Clean, professional UI without emojis
-## What Changed
-### Files Modified
-1. `agents/embedding_agent.py` - Fixed normalization
-2. `config/settings.py` - Disabled embeddings by default
-3. `agents/vision_agent.py` - Made embeddings optional
-4. `ui/app.py` - Removed all emojis
-### New Files Created
-1. `docs/PERFORMANCE.md` - Performance optimization guide
-## Current Configuration
-**Speed:** Fast (2-3 seconds per capture)
-**Enabled Features:**
-- BLIP Caption (always on)
-- YOLO Object Detection
-- EasyOCR Text Extraction
-**Disabled Features:**
-- CLIP Embeddings (for speed)
-## How to Use
-### 1. Run the Application
-```bash
-run.bat
-```
-### 2. Test the Fix
-- Click "Initialize System"
-- Click "Capture & Describe"
-- Should work without errors now
-- Should be faster (2-3 seconds)
-### 3. Enable Embeddings (Optional)
-If you want visual similarity search:
-Edit `config/settings.py`:
-```python
-FEATURES = {
-    "embeddings_enabled": True,  # Enable for visual search
-}
-```
-**Note:** This will make it slower (5-7 seconds) but enables visual similarity search.
-## Performance Comparison
-| Configuration | Speed | Features |
-|---------------|-------|----------|
-| **Current (Fast)** | 2-3s | Caption + OCR + Objects |
-| Full (Slow) | 5-7s | Caption + OCR + Objects + Embeddings |
-| Minimal (Fastest) | 1s | Caption only |
-## Troubleshooting
-### Still getting errors?
-1. Restart the application
-2. Clear cache: Delete `data/` folder
-3. Reinstall: `pip install --upgrade -r requirements.txt`
-### Still slow?
-1. Check `config/settings.py` - ensure `embeddings_enabled: False`
-2. Reduce OCR languages to just English
-3. Use smaller YOLO model (yolov8n.pt)
-See `docs/PERFORMANCE.md` for detailed optimization guide.
-## Summary
-All issues fixed:
-- Error with embeddings: FIXED
-- Slow performance: FIXED (2-3x faster)
-- Emojis in code: REMOVED
-System is now:
-- Working correctly
-- Much faster
-- Professional appearance
-- Ready to use
-Run `run.bat` and test it out!
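
On a plain tensor, both expressions quoted in FIXES_APPLIED.md compute L2 normalization. One documented difference is that `torch.nn.functional.normalize` divides by `max(norm, eps)`, so a zero vector yields zeros instead of NaN. The sketch below reproduces that arithmetic in pure Python so it runs without PyTorch; the function name `l2_normalize` is illustrative. The quoted error names a model output object rather than a tensor, so the real fix may also have needed to extract the tensor first; this diff does not show that.

```python
import math
from typing import List


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale `vec` to unit L2 length, dividing by max(norm, eps)."""
    norm = math.sqrt(sum(x * x for x in vec))
    denom = max(norm, eps)  # eps guard avoids division by zero
    return [x / denom for x in vec]


unit = l2_normalize([3.0, 4.0])   # norm is 5, so roughly [0.6, 0.8]
zero = l2_normalize([0.0, 0.0])   # stays all zeros thanks to eps
```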

FIX_TENSORFLOW.md DELETED

@@ -1,138 +0,0 @@
-# Fix: TensorFlow/Protobuf Conflict
-## The Error
-```
-RuntimeError: Failed to import transformers/BLIP.
-This usually happens when TensorFlow and protobuf are out of sync.
-```
-## Quick Fix (Recommended)
-Run this script:
-```bash
-fix_tensorflow.bat
-```
-This will:
-1. Remove TensorFlow (not needed)
-2. Install correct protobuf version
-3. Reinstall transformers
-4. Clear cache
-Then run:
-```bash
-streamlit run ui\app.py
-```
----
-## Manual Fix (If script doesn't work)
-### Step 1: Uninstall Conflicting Packages
-```bash
-pip uninstall tensorflow tensorflow-cpu protobuf -y
-```
-### Step 2: Install Correct Protobuf
-```bash
-pip install protobuf==3.20.3
-```
-### Step 3: Reinstall Transformers
-```bash
-pip install --upgrade --force-reinstall transformers
-```
-### Step 4: Clear Cache
-```bash
-rd /s /q __pycache__
-rd /s /q agents\__pycache__
-rd /s /q config\__pycache__
-rd /s /q core\__pycache__
-rd /s /q ui\__pycache__
-```
-### Step 5: Test
-```bash
-streamlit run ui\app.py
-```
----
-## Why This Happens
-VisionQ doesn't need TensorFlow, but sometimes it gets installed as a dependency and conflicts with protobuf.
-**Solution:** Remove TensorFlow and use specific protobuf version.
----
-## Nuclear Option (If nothing works)
-### Delete and Recreate Virtual Environment
-```bash
-# 1. Deactivate current venv
-deactivate
-# 2. Delete old venv
-rd /s /q .venv
-rd /s /q venv
-# 3. Create fresh venv
-python -m venv venv
-# 4. Activate
-venv\Scripts\activate
-# 5. Install dependencies
-pip install -r requirements.txt
-# 6. Run
-streamlit run ui\app.py
-```
----
-## Verify Fix
-After running the fix, you should be able to import without errors:
-```bash
-python -c "from agents.caption_agent import CaptionAgent; print('Success!')"
-```
-If you see "Success!", the fix worked!
----
-## Prevention
-The `requirements.txt` has been updated to include:
-```
-protobuf==3.20.3
-```
-This prevents future conflicts.
----
-## Summary
-**Quick Fix:**
-```bash
-fix_tensorflow.bat
-streamlit run ui\app.py
-```
-**If that doesn't work:**
-```bash
-# Nuclear option
-rd /s /q venv
-python -m venv venv
-venv\Scripts\activate
-pip install -r requirements.txt
-streamlit run ui\app.py
-```
-**One of these will definitely work!**
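
The "Clear Cache" step in FIX_TENSORFLOW.md lists one Windows `rd /s /q` command per folder. A cross-platform alternative can be sketched with the standard library alone. This is an illustration, not a script from the repository, and the function name `clear_pycache` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def clear_pycache(root: Path) -> List[Path]:
    """Delete every __pycache__ directory under `root`; return what was removed."""
    # Collect first so the tree is not modified while it is being walked.
    targets = [p for p in root.rglob("__pycache__") if p.is_dir()]
    removed: List[Path] = []
    for path in targets:
        if path.exists():  # may already be gone if a parent was removed
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Calling `clear_pycache(Path("."))` from the project root would cover `agents`, `config`, `core`, and `ui` in one pass. It would also walk into a local virtual environment folder if one is present, which is harmless but slower.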

MASTER_TROUBLESHOOTING.md DELETED

@@ -1,232 +0,0 @@
-# VisionQ - Complete Troubleshooting Guide
-## Current Issues & Fixes
-### Issue 1: TensorFlow/Protobuf Conflict
-**Error:**
-```
-RuntimeError: Failed to import transformers/BLIP
-```
-**Fix:**
-```bash
-fix_tensorflow.bat
-```
-See `FIX_TENSORFLOW.md` for details.
----
-### Issue 2: Embedding AttributeError
-**Error:**
-```
-AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'norm'
-```
-**Fix:**
-```bash
-fix_and_run.bat
-```
-See `TROUBLESHOOTING.md` for details.
----
-## All Fix Scripts
-| Script | Purpose | When to Use |
-|--------|---------|-------------|
-| `fix_tensorflow.bat` | Fix TensorFlow/protobuf conflict | Import errors with BLIP |
-| `fix_and_run.bat` | Clear cache and restart | Embedding errors, old code |
-| `run.bat` | Normal start | Regular use |
----
-## Step-by-Step Fix Process
-### Step 1: Fix TensorFlow Conflict
-```bash
-fix_tensorflow.bat
-```
-### Step 2: Clear Cache
-```bash
-fix_and_run.bat
-```
-### Step 3: Test
-Open http://localhost:8501 and test all buttons.
----
-## If Nothing Works: Nuclear Option
-### Complete Reset
-```bash
-# 1. Stop everything
-# Press Ctrl+C in terminal
-# 2. Delete virtual environment
-rd /s /q .venv
-rd /s /q venv
-# 3. Delete cache
-rd /s /q __pycache__
-rd /s /q agents\__pycache__
-rd /s /q config\__pycache__
-rd /s /q core\__pycache__
-rd /s /q ui\__pycache__
-# 4. Create fresh venv
-python -m venv venv
-# 5. Activate
-venv\Scripts\activate
-# 6. Upgrade pip
-python -m pip install --upgrade pip
-# 7. Install dependencies
-pip install -r requirements.txt
-# 8. Run
-streamlit run ui\app.py
-```
-This will give you a completely fresh start.
----
-## Common Errors & Solutions
-### Error: "python run.bat" gives SyntaxError
-**Problem:** Trying to run .bat file with Python
-**Solution:** Just run `run.bat` (without python)
-### Error: Camera not working
-**Problem:** Camera in use or permissions
-**Solution:**
-- Close other apps using camera
-- Check camera permissions
-- Try different camera index in `config/settings.py`
-### Error: Models loading slowly
-**Problem:** First run downloads models
-**Solution:** Wait for download to complete (~2GB)
-### Error: Out of memory
-**Problem:** Too many models loaded
-**Solution:**
-- Close other applications
-- Disable embeddings in `config/settings.py`
-- Use smaller YOLO model
-### Error: OCR not detecting text
-**Problem:** Poor lighting or text quality
-**Solution:**
-- Ensure good lighting
-- Text should be clear
-- Try different languages
----
-## Performance Issues
-### System is slow (5+ seconds per capture)
-**Check if embeddings are enabled:**
-```python
-# config/settings.py
-FEATURES = {
-    "embeddings_enabled": False,  # Should be False for speed
-}
-```
-**If True, change to False and restart.**
----
-## Verification Commands
-### Test imports:
-```bash
-python -c "from agents.caption_agent import CaptionAgent; print('Caption: OK')"
-python -c "from agents.vision_agent import VisionAgent; print('Vision: OK')"
-python -c "from agents.memory_agent import MemoryAgent; print('Memory: OK')"
-```
-### Check protobuf version:
-```bash
-pip show protobuf
-```
-Should show: `Version: 3.20.3`
-### Check if TensorFlow is installed:
-```bash
-pip show tensorflow
-```
-Should show: `WARNING: Package(s) not found: tensorflow` (Good!)
----
-## Getting Help
-### Check these files:
-1. `FIX_TENSORFLOW.md` - TensorFlow/protobuf issues
-2. `TROUBLESHOOTING.md` - Embedding errors
-3. `docs/PERFORMANCE.md` - Speed optimization
-4. `FINAL_FIX.md` - Summary of all fixes
-### Still stuck?
-1. Run nuclear option (complete reset)
-2. Check Python version: `python --version` (should be 3.8+)
-3. Check if in virtual environment: Look for `(.venv)` or `(venv)` in prompt
-4. Try on different computer to isolate issue
----
-## Quick Reference
-**Fix TensorFlow:**
-```bash
-fix_tensorflow.bat
-```
-**Fix Cache:**
-```bash
-fix_and_run.bat
-```
-**Normal Start:**
-```bash
-run.bat
-```
-**Direct Start:**
-```bash
-streamlit run ui\app.py
-```
-**Nuclear Reset:**
-```bash
-rd /s /q venv
-python -m venv venv
-venv\Scripts\activate
-pip install -r requirements.txt
-streamlit run ui\app.py
-```
----
-## Summary
-Most issues are fixed by:
-1. `fix_tensorflow.bat` - Fixes import errors
-2. `fix_and_run.bat` - Fixes cache issues
-3. Nuclear option - Fixes everything else
-**Try them in order until it works!**
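
The "Verification Commands" section of MASTER_TROUBLESHOOTING.md checks two things by hand with `pip show`: protobuf is at 3.20.3 and TensorFlow is not installed. The same checks can be sketched with the standard-library `importlib.metadata` module (Python 3.8+). The helper names `installed_version` and `check_environment` are hypothetical and do not come from the repository.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected_protobuf: str = "3.20.3") -> Dict[str, bool]:
    """Report the two conditions the guide verifies manually."""
    return {
        "protobuf_ok": installed_version("protobuf") == expected_protobuf,
        "tensorflow_absent": installed_version("tensorflow") is None,
    }
```

Running `print(check_environment())` inside the project's virtual environment gives a quick pass/fail view before launching Streamlit.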

MODELS_AND_SPEED.md DELETED

@@ -1,203 +0,0 @@
-# Models & Performance - Quick Reference
-## Current Models
-| Component | Model | Size | Speed | Status |
-|-----------|-------|------|-------|--------|
-| **Caption** | BLIP-base | 990MB | 1.5s | Active (SLOWEST) |
-| **Object Detection** | YOLOv8s | 22MB | 0.5s | Active |
-| **OCR** | EasyOCR | 50MB | 0.5s | Active |
-| **Embeddings** | CLIP | 500MB | 2s | Disabled |
-**Total processing time: ~2.5 seconds per capture**
----
-## Why Camera is Slow
-**The camera itself is fast!** The slowness comes from AI processing:
-```
-Camera capture:     10ms   (fast!)
-BLIP caption:     1500ms   (slow!)
-EasyOCR:           500ms   (medium)
-YOLO detection:    500ms   (medium)
-------------------------
-Total:            2510ms   (2.5 seconds)
-```
-**BLIP is the bottleneck!**
----
-## Quick Speed Fixes
-### Option 1: Disable OCR (500ms faster)
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": False,
-}
-```
-**New speed:** 2 seconds
-### Option 2: Disable YOLO (500ms faster)
-```python
-FEATURES = {
-    "object_detection_enabled": False,
-}
-```
-**New speed:** 2 seconds
-### Option 3: Both (1000ms faster)
-```python
-FEATURES = {
-    "ocr_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-**New speed:** 1.5 seconds (40% faster!)
----
-## Apply Fast Mode
-### Step 1: Edit config/settings.py
-Find the `FEATURES` section and change to:
-```python
-FEATURES = {
-    "ocr_enabled": False,              # Disable for speed
-    "object_detection_enabled": False,  # Disable for speed
-    "embeddings_enabled": False,       # Keep disabled
-}
-```
-### Step 2: Restart
-```bash
-fix_and_run.bat
-```
-### Step 3: Test
-Click "Capture & Describe" - should be ~1.5 seconds now!
----
-## Model Details
-### BLIP (Caption Model)
-- **Full name:** Salesforce/blip-image-captioning-base
-- **Purpose:** Generate scene descriptions
-- **Speed:** 1.5 seconds (CPU)
-- **Can't disable:** This is the core feature
-- **Alternative:** Use GIT model (3x faster)
-### YOLOv8s (Object Detection)
-- **Full name:** YOLOv8 Small
-- **Purpose:** Detect objects (person, car, etc.)
-- **Speed:** 0.5 seconds
-- **Can disable:** Yes (set object_detection_enabled: False)
-- **Alternative:** Use YOLOv8n (nano) for 200ms faster
-### EasyOCR (Text Reading)
-- **Purpose:** Read text from images
-- **Speed:** 0.5 seconds
-- **Can disable:** Yes (set ocr_enabled: False)
-- **Languages:** 90+ supported
-### CLIP (Embeddings)
-- **Purpose:** Visual similarity search
-- **Speed:** 2 seconds
-- **Status:** Already disabled
-- **Keep disabled:** For best performance
----
-## GPU Acceleration
-If you have NVIDIA GPU:
-```python
-# config/settings.py
-PERFORMANCE_CONFIG = {
-    "use_gpu": True,
-}
-OCR_CONFIG = {
-    "gpu": True,
-}
-```
-**Speed improvement:** 2-3x faster!
-**Requirements:**
-- NVIDIA GPU
-- CUDA installed
-- `pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118`
----
-## Recommended Settings
-### For Speed (Fastest)
-```python
-FEATURES = {
-    "ocr_enabled": False,
-    "object_detection_enabled": False,
-    "embeddings_enabled": False,
-}
-```
-**Speed:** 1.5 seconds
-**Features:** Caption only
-### For Balance (Recommended)
-```python
-FEATURES = {
-    "ocr_enabled": True,
-    "object_detection_enabled": False,
-    "embeddings_enabled": False,
-}
-```
-**Speed:** 2 seconds
-**Features:** Caption + OCR
-### For Full Features
-```python
-FEATURES = {
-    "ocr_enabled": True,
-    "object_detection_enabled": True,
-    "embeddings_enabled": False,
-}
-```
-**Speed:** 2.5 seconds
-**Features:** Everything
----
-## Summary
-**Models used:**
-- BLIP (caption) - 1.5s - Can't disable
-- YOLO (objects) - 0.5s - Can disable
-- EasyOCR (text) - 0.5s - Can disable
-**Why slow:**
-- BLIP takes 1.5 seconds
-- This is normal for AI image captioning
-- Camera itself is fast
-**Quick fix:**
-```python
-# Disable OCR and YOLO
-FEATURES = {
-    "ocr_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-**Result:** 40% faster (1.5s instead of 2.5s)
-**Best fix:**
-- Use GPU (2-3x faster)
-- Or accept 1.5-2.5 second delay (normal for AI)
-**The camera is not slow - the AI models are doing heavy processing!**
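
The timing figures in MODELS_AND_SPEED.md are a simple sum of per-stage costs, which makes the "40% faster" claim easy to check. The sketch below uses only the millisecond values quoted in that file. The names `STAGE_MS`, `FLAG_TO_STAGE`, and `estimate_latency_ms` are illustrative and not part of `config/settings.py`.

```python
from typing import Dict

# Per-stage timings in milliseconds, copied from the breakdown in the deleted doc.
STAGE_MS: Dict[str, int] = {
    "camera": 10,
    "blip_caption": 1500,
    "ocr": 500,
    "yolo": 500,
}

# Which feature flag gates which optional stage (camera and BLIP always run).
FLAG_TO_STAGE: Dict[str, str] = {
    "ocr_enabled": "ocr",
    "object_detection_enabled": "yolo",
}


def estimate_latency_ms(features: Dict[str, bool]) -> int:
    """Sum the always-on stages plus every optional stage whose flag is True."""
    total = STAGE_MS["camera"] + STAGE_MS["blip_caption"]
    for flag, stage in FLAG_TO_STAGE.items():
        if features.get(flag, False):
            total += STAGE_MS[stage]
    return total


full = estimate_latency_ms({"ocr_enabled": True, "object_detection_enabled": True})
fast = estimate_latency_ms({"ocr_enabled": False, "object_detection_enabled": False})
saving = (full - fast) / full  # fraction of time saved by fast mode
```

With these inputs `full` is 2510 ms and `fast` is 1510 ms, a saving of just under 40%, which matches the rounded figure in the doc.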

README.md CHANGED

@@ -14,7 +14,7 @@ license: apache-2.0
 [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
 [![Streamlit](https://img.shields.io/badge/streamlit-1.28+-red.svg)](https://streamlit.io/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
 ---

RESTRUCTURE_SUMMARY.md DELETED

@@ -1,348 +0,0 @@
-# 🎉 VisionQ - Restructuring Complete!
-## ✅ What Was Done
-Your VisionQ project has been **completely restructured** with:
-1. ✅ **Clean folder structure**
-2. ✅ **Streamlit web interface**
-3. ✅ **Centralized configuration**
-4. ✅ **Comprehensive documentation**
-5. ✅ **90+ language support**
-6. ✅ **No API keys needed**
----
-## 📂 NEW Structure
-```
-VisionQ/
-├── agents/          # AI agents (7 files)
-├── config/          # Settings (1 file)
-├── ui/              # Streamlit app (1 file)
-├── core/            # Integration (1 file)
-├── data/            # Storage (auto-created)
-├── models/          # AI models (auto-downloaded)
-├── docs/            # Documentation (3 files)
-├── .streamlit/      # UI config
-├── README.md        # Main docs
-├── requirements.txt # Dependencies
-├── run.bat          # Launcher
-└── cleanup.bat      # Cleanup script
-```
-**Total:** 10 code files, 3 docs, clean structure!
----
-## 🚀 How to Use
-### **1. Install Dependencies**
-```bash
-pip install -r requirements.txt
-```
-### **2. Launch Web Interface**
-```bash
-# Windows
-run.bat
-# Linux/Mac
-streamlit run ui/app.py
-```
-### **3. Open Browser**
-Go to: `http://localhost:8501`
-### **4. Start Using**
-- Click "Initialize System"
-- Capture scenes
-- Read text (OCR)
-- Query memories
----
-## 🌍 Language Support
-**90+ languages supported!**
-Including:
-- 🇬🇧 English
-- 🇪🇸 Spanish
-- 🇫🇷 French
-- 🇩🇪 German
-- 🇮🇹 Italian
-- 🇵🇹 Portuguese
-- 🇷🇺 Russian
-- 🇨🇳 Chinese
-- 🇯🇵 Japanese
-- 🇰🇷 Korean
-- 🇸🇦 Arabic
-- 🇮🇳 Hindi
-- ...and 78 more!
-**Select languages in UI sidebar.**
-See `docs/LANGUAGES.md` for full list.
----
-## 🔑 API Keys
-**Do you need API keys?**
-# **NO!** ❌
-VisionQ works **100% offline** without any API keys.
-All models run locally:
-- ✅ YOLO (object detection)
-- ✅ BLIP (captioning)
-- ✅ CLIP (embeddings)
-- ✅ EasyOCR (text extraction)
-- ✅ DistilBERT (NLP)
-- ✅ FAISS (vector search)
-**Optional:** Hugging Face token (for private models only)
-See `docs/API_KEYS.md` for details.
----
-## 🎯 Key Features
-### **Vision**
-- 👁️ Object detection (YOLO/SSD)
-- 📝 Image captioning (BLIP)
-- 🖼️ Visual embeddings (CLIP)
-- 🔤 Text extraction (OCR, 90+ languages)
-### **Memory**
-- 🧠 Semantic storage (FAISS)
-- 💾 Persistent JSON
-- ⚡ Fast search (<10ms)
-- 📊 10,000+ capacity
-### **Intelligence**
-- 🔍 Smart queries (DistilBERT)
-- ⏰ Time-based filtering
-- 🎯 Intent classification
-- 🔗 Multimodal fusion
-### **Interface**
-- 🌐 Web UI (Streamlit)
-- 📱 Responsive design
-- 🎨 Clean interface
-- 🚀 One-click launch
----
-## 📚 Documentation
-| File | Purpose |
-|------|---------|
-| `README.md` | Main documentation |
-| `docs/LANGUAGES.md` | Language support (90+) |
-| `docs/API_KEYS.md` | API keys info (none needed!) |
-| `docs/STRUCTURE.md` | Project structure |
----
-## 🧹 Cleanup Old Files
-**Run cleanup script:**
-```bash
-cleanup.bat
-```
-This moves old files to `archive/` folder:
-- Old agent files
-- Old documentation
-- Old scripts
-- Old requirements
-**You can safely delete `archive/` if not needed.**
----
-## 🔧 Configuration
-**Edit `config/settings.py` to customize:**
-```python
-# OCR languages
-OCR_CONFIG = {
-    "languages": ["en", "es", "fr"],
-}
-# Vision settings
-VISION_CONFIG = {
-    "camera_index": 0,
-    "confidence_threshold": 0.5,
-}
-# Memory settings
-MEMORY_CONFIG = {
-    "max_memories": 10000,
-}
-```
----
-## 🎓 Quick Start Guide
-### **Step 1: Install**
-```bash
-pip install -r requirements.txt
-```
-### **Step 2: Run**
-```bash
-run.bat  # Windows
-# or
-streamlit run ui/app.py  # Linux/Mac
-```
-### **Step 3: Initialize**
-- Open http://localhost:8501
-- Click "Initialize System"
-- Wait for models to load (~1 min first time)
-### **Step 4: Use**
-- **Vision Tab:** Capture, remember, read text
-- **Query Tab:** Ask questions about memories
-- **Memories Tab:** Browse stored memories
-- **Help Tab:** Documentation and tips
----
-## 📊 What Changed
-### **Before**
-```
-❌ Flat file structure
-❌ Redundant files
-❌ No web interface
-❌ Scattered documentation
-❌ Complex to use
-```
-### **After**
-```
-✅ Clean folder structure
-✅ No redundant files
-✅ Streamlit web interface
-✅ Organized documentation
-✅ Easy to use
-```
----
-## 🎯 Benefits
-### **For Users**
-- ✅ Easy web interface
-- ✅ One-click launch
-- ✅ Clear documentation
-- ✅ 90+ languages
-### **For Developers**
-- ✅ Clean structure
-- ✅ Modular code
-- ✅ Centralized config
-- ✅ Easy to extend
-### **For Everyone**
-- ✅ No API keys needed
-- ✅ 100% offline
-- ✅ Free forever
-- ✅ Open source
----
-## 🐛 Troubleshooting
-### **Models loading slowly?**
-- First run downloads ~2GB
-- Subsequent runs are fast
-- Models cached locally
-### **Camera not working?**
-- Check permissions
-- Try different camera index
-- Ensure no other app using camera
-### **OCR not detecting text?**
-- Ensure good lighting
-- Text should be clear
-- Try different languages
-### **Out of memory?**
-- Close other applications
-- Reduce stored memories
-- Use CPU instead of GPU
----
-## 📞 Support
-**Need help?**
-1. Check `README.md`
-2. Check `docs/` folder
-3. Check UI "Help" tab
-4. Open GitHub issue
----
-## ✅ Next Steps
-### **Immediate**
-1. ✅ Run `pip install -r requirements.txt`
-2. ✅ Run `run.bat` or `streamlit run ui/app.py`
-3. ✅ Open http://localhost:8501
-4. ✅ Click "Initialize System"
-5. ✅ Start using!
-### **Optional**
-1. ⭐ Run `cleanup.bat` to organize old files
-2. ⭐ Customize `config/settings.py`
-3. ⭐ Add languages in UI sidebar
-4. ⭐ Explore documentation
----
-## 🎉 Summary
-**VisionQ is now:**
-- ✅ Clean & organized
-- ✅ Easy to use (web UI)
-- ✅ Well documented
-- ✅ Multi-language (90+)
-- ✅ No API keys needed
-- ✅ 100% offline
-- ✅ Production ready
-**Everything you need in one place!**
----
-## 📋 Checklist
-- [ ] Install dependencies: `pip install -r requirements.txt`
-- [ ] Launch UI: `run.bat` or `streamlit run ui/app.py`
-- [ ] Open browser: http://localhost:8501
-- [ ] Initialize system
-- [ ] Test vision features
-- [ ] Test OCR (read text)
-- [ ] Test queries
-- [ ] Browse memories
-- [ ] Read documentation
-- [ ] Customize settings (optional)
-- [ ] Run cleanup (optional)
----
-**VisionQ - Restructured, refined, and ready to use! 🚀**
-**Open http://localhost:8501 and start exploring!**

TROUBLESHOOTING.md DELETED Viewed

@@ -1,191 +0,0 @@
-# Troubleshooting: AttributeError with Embeddings
-## The Error
-```
-AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'norm'
-```
-## What Causes This
-This error occurs when:
-1. An old cached version of `embedding_agent.py` is being used
-2. Streamlit is caching the old agent code
-3. The Python bytecode cache (`__pycache__`) holds an old version
-## Quick Fix (Recommended)
-### Option 1: Use Fix Script
-```bash
-fix_and_run.bat
-```
-This will:
-- Clear all Python cache
-- Clear Streamlit cache
-- Restart the application
-### Option 2: Manual Fix
-```bat
-REM 1. Stop the application (Ctrl+C)
-REM 2. Clear Python cache
-rd /s /q __pycache__
-rd /s /q agents\__pycache__
-rd /s /q config\__pycache__
-rd /s /q core\__pycache__
-rd /s /q ui\__pycache__
-REM 3. Clear Streamlit cache
-rd /s /q .streamlit\cache
-REM 4. Restart
-run.bat
-```
-### Option 3: In Browser
-1. Open http://localhost:8501
-2. Press `C` key (clears cache)
-3. Click "Initialize System" again
-## Permanent Fix
-The code has been updated with error handling, so even if the error occurs, it will:
-1. Print a warning message
-2. Continue without embeddings
-3. Still work for caption and OCR
-## Verify Fix
-After running fix_and_run.bat, you should see:
-```
-[VisionAgent] Embeddings disabled for faster performance
-```
-This means embeddings are properly disabled and won't cause errors.
-## If Still Getting Errors
-### Step 1: Check Config
-Open `config/settings.py` and verify:
-```python
-FEATURES = {
-    "embeddings_enabled": False,  # Should be False
-}
-```
-### Step 2: Delete All Cache
-```bat
-REM Delete everything
-rd /s /q __pycache__
-rd /s /q agents\__pycache__
-rd /s /q config\__pycache__
-rd /s /q core\__pycache__
-rd /s /q ui\__pycache__
-rd /s /q .streamlit
-REM Recreate .streamlit
-mkdir .streamlit
-```
-### Step 3: Reinstall
-```bash
-pip uninstall transformers torch -y
-pip install transformers torch
-```
-### Step 4: Fresh Start
-```bat
-REM Close all Python processes
-taskkill /F /IM python.exe
-REM Wait 5 seconds
-REM Start fresh
-run.bat
-```
-## Understanding the Fix
-### What Was Changed
-**Before (Broken):**
-```python
-embedding = image_features / image_features.norm(dim=-1, keepdim=True)
-```
-**After (Fixed):**
-```python
-embedding = torch.nn.functional.normalize(image_features, p=2, dim=-1)
-```
-### Why It Works
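-The new method uses PyTorch's built-in `torch.nn.functional.normalize`, which:
-- Performs the same L2 normalization along the last dimension
-- Clamps the norm with a small epsilon, so zero vectors do not cause division by zero
-- Still expects a `torch.Tensor`: if `image_features` is a `BaseModelOutputWithPooling`, extract the tensor first (for example from its `pooler_output` field)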
-### Error Handling Added
-```python
-if self.embedding_agent:
-    try:
-        embedding = self.embedding_agent.encode_image(frame)
-    except Exception as e:
-        print(f"[VisionAgent] Embedding failed: {e}")
-        embedding = None
-```
-Now even if embeddings fail, the system continues working.
-## Prevention
-To avoid this in the future:
-1. **Always clear cache after code changes:**
-   ```bash
-   fix_and_run.bat
-   ```
-2. **Use the fix script instead of run.bat when testing changes**
-3. **Keep embeddings disabled unless you need visual search:**
-   ```python
-   FEATURES = {
-       "embeddings_enabled": False,  # Faster and more stable
-   }
-   ```
-## Performance Note
-With embeddings disabled:
-- Speed: 2-3 seconds per capture
-- Features: Caption + OCR + Object Detection
-- Stability: No embedding errors
-With embeddings enabled:
-- Speed: 5-7 seconds per capture
-- Features: All features + Visual Search
-- Stability: May have errors if not properly cached
-## Summary
-**Quick Fix:**
-1. Run `fix_and_run.bat`
-2. Click "Initialize System"
-3. Test "Capture & Describe"
-4. Should work now!
-**If still broken:**
-1. Check `config/settings.py` - embeddings should be False
-2. Delete all __pycache__ folders
-3. Restart computer (clears all Python processes)
-4. Run `fix_and_run.bat`
-**Prevention:**
-- Always use `fix_and_run.bat` after code changes
-- Keep embeddings disabled for stability
-- Clear cache regularly
-The system is now more robust and will handle errors gracefully!
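The normalization fix described in the deleted troubleshooting notes can be sketched without PyTorch. This is a minimal illustration, assuming the error arises because the model call returned a wrapper object rather than a plain feature vector. `FakeModelOutput`, `extract_features`, and `l2_normalize` are hypothetical names used only here; they are not part of the repository or of any library.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FakeModelOutput:
    """Hypothetical stand-in for an output object that wraps pooled features."""
    pooler_output: List[float]


def extract_features(output: Union[FakeModelOutput, List[float]]) -> List[float]:
    """Return the raw feature vector whether or not it is wrapped."""
    if hasattr(output, "pooler_output"):
        return list(output.pooler_output)
    return list(output)


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale vec to unit length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


wrapped = FakeModelOutput(pooler_output=[3.0, 4.0])
embedding = l2_normalize(extract_features(wrapped))
print(embedding)
```

Unwrapping before normalizing keeps the normalization step independent of what the model call returns, which is the part the original one-line change did not address.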

VisionQ DELETED Viewed

@@ -1 +0,0 @@
-Subproject commit 18f18d23a1f3ad386db32957c746239c80e78751

archive/old_agents/caption_agent.py DELETED Viewed

@@ -1,40 +0,0 @@
-import os
-# Avoid importing TensorFlow (fixes compatibility issues such as protobuf/DType conflicts)
-os.environ["TRANSFORMERS_NO_TF"] = "1"
-os.environ["HF_HUB_DISABLE_TF"] = "1"
-from PIL import Image
-import torch
-try:
-    from transformers import BlipProcessor, BlipForConditionalGeneration
-except Exception as e:
-    raise RuntimeError(
-        "Failed to import transformers/BLIP. This usually happens when TensorFlow and protobuf "
-        "are out of sync in the current Python environment.\n\n"
-        "Fix: run in a clean virtual environment and install dependencies from requirements.txt."
-    ) from e
-class CaptionAgent:
-    def __init__(self):
-        self.processor = BlipProcessor.from_pretrained(
-            "Salesforce/blip-image-captioning-base"
-        )
-        self.model = BlipForConditionalGeneration.from_pretrained(
-            "Salesforce/blip-image-captioning-base"
-        )
-        self.model.eval()
-    def describe(self, frame_bgr):
-        # OpenCV BGR → PIL RGB
-        frame_rgb = frame_bgr[:, :, ::-1]
-        image = Image.fromarray(frame_rgb)
-        inputs = self.processor(image, return_tensors="pt")
-        with torch.no_grad():
-            out = self.model.generate(**inputs, max_length=30)
-        caption = self.processor.decode(out[0], skip_special_tokens=True)
-        return caption

archive/old_agents/memory_agent.py DELETED Viewed

@@ -1,59 +0,0 @@
-import json
-from datetime import datetime
-import numpy as np
-from sentence_transformers import SentenceTransformer
-class MemoryAgent:
-    def __init__(self, memory_file="memory.json"):
-        self.memory_file = memory_file
-        self.model = SentenceTransformer("all-MiniLM-L6-v2")
-        self.memories = []
-        self._load()
-    def _load(self):
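-        try:
-            with open(self.memory_file, "r") as f:
-                self.memories = json.load(f)
-        except (OSError, json.JSONDecodeError):
-            # Missing, unreadable, or corrupt file: start with an empty memory
-            self.memories = []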
-    def _save(self):
-        with open(self.memory_file, "w") as f:
-            json.dump(self.memories, f, indent=2)
-    @staticmethod
-    def compute_importance(description):
-        desc = description.lower()
-        score = 1  # base importance
-        if "person" in desc:
-            score += 2
-        if any(obj in desc for obj in ["phone", "bag", "book", "device"]):
-            score += 1
-        if any(act in desc for act in ["entered", "left", "holding", "walking"]):
-            score += 2
-        return score
-    def add(self, description):
-        embedding = self.model.encode(description).tolist()
-        importance = MemoryAgent.compute_importance(description)
-        memory = {
-            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
-            "description": description,
-            "embedding": embedding,
-            "importance": importance
-        }
-        self.memories.append(memory)
-        self._save()
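-    def recall_last(self):
-        if not self.memories:
-            return None
-        return self.memories[-1]
-    def recall_all(self):
-        # Called by QueryAgent.ask(); returns every stored memory
-        return self.memories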

archive/old_agents/query_agent.py DELETED Viewed

@@ -1,127 +0,0 @@
-import numpy as np
-from datetime import datetime, timedelta
-class QueryAgent:
-    def __init__(self, memory_agent):
-        self.memory_agent = memory_agent
-        self.model = memory_agent.model
-    # Cosine similarity (0.0 when either vector has zero norm)
-    def cosine_similarity(self, a, b):
-        denom = np.linalg.norm(a) * np.linalg.norm(b)
-        return np.dot(a, b) / denom if denom else 0.0
-    # Time parsing
-    @staticmethod
-    def extract_time_window(question):
-        now = datetime.now()
-        q = question.lower()
-        if "last hour" in q:
-            return now - timedelta(hours=1)
-        if "last 30 minutes" in q:
-            return now - timedelta(minutes=30)
-        if "recent" in q or "recently" in q:
-            return now - timedelta(hours=2)
-        if "today" in q:
-            return now.replace(hour=0, minute=0, second=0)
-        if "yesterday" in q:
-            start = (now - timedelta(days=1)).replace(hour=0, minute=0, second=0)
-            end = start + timedelta(days=1)
-            return (start, end)
-        if "this morning" in q:
-            return (
-                now.replace(hour=6, minute=0, second=0),
-                now.replace(hour=12, minute=0, second=0),
-            )
-        if "this evening" in q:
-            return (
-                now.replace(hour=18, minute=0, second=0),
-                now.replace(hour=22, minute=0, second=0),
-            )
-        if "last evening" in q:
-            start = (now - timedelta(days=1)).replace(hour=18, minute=0, second=0)
-            return (start, start.replace(hour=22))
-        if "last night" in q:
-            return (
-                (now - timedelta(days=1)).replace(hour=22, minute=0, second=0),
-                now.replace(hour=6, minute=0, second=0),
-            )
-        return None
-    # MAIN QUERY METHOD
-    def ask(self, question, threshold=0.45):    #Change to 0.5 when scalable enough
-        memories = self.memory_agent.recall_all()
-        if not memories:
-            return "I don't have any memories yet."
-        #  Time filtering
-        time_filter = self.extract_time_window(question)
-        filtered = []
-        for m in memories:
-            mem_time = datetime.strptime(
-                m["timestamp"], "%Y-%m-%d %H:%M:%S"
-            )
-            if time_filter is None:
-                filtered.append(m)
-            elif isinstance(time_filter, tuple):
-                start, end = time_filter
-                if start <= mem_time < end:
-                    filtered.append(m)
-            else:
-                if mem_time >= time_filter:
-                    filtered.append(m)
-        if not filtered:
-            return "I don't recall anything from that time."
-        # Semantic similarity
-        query_embedding = self.model.encode(question)
-        scored = []
-        for m in filtered:
-            # Handle missing embeddings (for backwards compatibility)
-            if "embedding" not in m:
-                m["embedding"] = self.model.encode(m["description"]).tolist()
-            # Handle missing importance
-            if "importance" not in m:
-                m["importance"] = 1
-            sim = self.cosine_similarity(
-                query_embedding,
-                np.array(m["embedding"])
-            )
-            if sim >= threshold:
-                scored.append((sim, m))
-        if not scored:
-            return "I don't recall anything related to that."
-        # Rank by similarity + importance
-        scored.sort(
-            key=lambda x: (x[0], x[1]["importance"]),
-            reverse=True
-        )
-        # Build response
-        responses = []
-        for sim, m in scored:
-            responses.append(
-                f"At {m['timestamp']}, {m['description']} "
-                f"(confidence {sim:.2f})"
-            )
-        return "\n".join(responses)
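The time filtering in `ask` handles three shapes of window: `None` (no filter), a single `datetime` (everything since then), and a `(start, end)` tuple (half-open interval). A minimal standalone sketch of that rule follows; `in_window` is a hypothetical helper name and the fixed `now` value is chosen only to make the example reproducible.

```python
from datetime import datetime, timedelta
from typing import Tuple, Union

Window = Union[None, datetime, Tuple[datetime, datetime]]


def in_window(mem_time: datetime, window: Window) -> bool:
    """Mirror the three branches of the time filter in the archived QueryAgent."""
    if window is None:
        return True                      # no time phrase in the question
    if isinstance(window, tuple):
        start, end = window
        return start <= mem_time < end   # bounded window, end excluded
    return mem_time >= window            # open-ended "since" window


now = datetime(2024, 1, 15, 14, 0, 0)    # fixed "now" for reproducibility
morning = (now.replace(hour=6), now.replace(hour=12))
last_hour = now - timedelta(hours=1)
memory_time = datetime.strptime("2024-01-15 10:30:00", "%Y-%m-%d %H:%M:%S")

print(in_window(memory_time, morning))
print(in_window(memory_time, last_hour))
print(in_window(memory_time, None))
```

Pulling the check into one function makes the tuple-versus-datetime distinction explicit and easy to test on its own.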

archive/old_agents/vision_agent.py DELETED Viewed

@@ -1,210 +0,0 @@
-import cv2
-import numpy as np
-import time
-import warnings
-warnings.filterwarnings("ignore")
-from caption_agent import CaptionAgent
-from memory_agent import MemoryAgent
-class VisionAgent:
-    def __init__(self):
-        # -------------------------------------------------
-        # INIT AGENTS
-        # -------------------------------------------------
-        self.caption_agent = CaptionAgent()
-        self.memory_agent = MemoryAgent()
-        # -------------------------------------------------
-        # CONFIG
-        # -------------------------------------------------
-        self.FRAME_INTERVAL = 0.3
-        self.CONF_THRESHOLD = 0.5
-        # -------------------------------------------------
-        # LOAD YOLO (PRIMARY)
-        # -------------------------------------------------
-        self.VISION_BACKEND = "SSD"
-        self.yolo_model = None
-        self.interpreter = None
-        self.LABELS = None
-        self.input_details = None
-        self.output_details = None
-        self.INPUT_HEIGHT = None
-        self.INPUT_WIDTH = None
-        self.INPUT_TYPE = None
-        try:
-            from ultralytics import YOLO
-            self.yolo_model = YOLO("yolov8s.pt")
-            self.VISION_BACKEND = "YOLO"
-            print("[Vision] YOLO backend loaded")
-        except Exception as e:
-            print("[Vision] YOLO failed, falling back to SSD:", e)
-            Interpreter = None
-            try:
-                from ai_edge_litert import Interpreter
-            except ImportError:
-                try:
-                    import tensorflow as tf
-                    Interpreter = tf.lite.Interpreter
-                except Exception as tf_err:
-                    print(
-                        "[Vision] SSD fallback unavailable (ai_edge_litert / tensorflow not installed):",
-                        tf_err,
-                    )
-            if Interpreter is not None:
-                with open("label_ssd.txt", "r") as f:
-                    self.LABELS = [line.strip() for line in f.readlines()]
-                self.interpreter = Interpreter(
-                    model_path="ssd_mobilenet_v2_fpnlite_035_192_int8.tflite"
-                )
-                self.interpreter.allocate_tensors()
-                self.input_details = self.interpreter.get_input_details()
-                self.output_details = self.interpreter.get_output_details()
-                self.INPUT_HEIGHT = self.input_details[0]["shape"][1]
-                self.INPUT_WIDTH = self.input_details[0]["shape"][2]
-                self.INPUT_TYPE = self.input_details[0]["dtype"]
-                self.VISION_BACKEND = "SSD"
-                print("[Vision] MobileNet-SSD backend loaded")
-            else:
-                # No valid vision backend available, only captioning will work.
-                self.VISION_BACKEND = None
-                print(
-                    "[Vision] No valid object-detection backend available; "
-                    "only captioning will work. Install ultralytics or tensorflow."
-                )
-        # -------------------------------------------------
-        # CAMERA
-        # -------------------------------------------------
-        self.cap = cv2.VideoCapture(0)
-        print("Vision system initialized.")
-    def describe_scene(self):
-        """Capture and describe current scene"""
-        ret, frame = self.cap.read()
-        if not ret:
-            return None
-        return self.caption_agent.describe(frame)
-    def remember_scene(self):
-        """Capture, describe, and remember current scene"""
-        ret, frame = self.cap.read()
-        if not ret:
-            return None
-        description = self.caption_agent.describe(frame)
-        self.memory_agent.add(description)
-        return description
-    def cleanup(self):
-        """Release resources"""
-        self.cap.release()
-        cv2.destroyAllWindows()
-        print("Vision system stopped.")
-    def run_continuous(self):
-        """Run continuous vision loop (object detection + caption on change)"""
-        previous_objects = set()
-        last_time = 0
-        print("Vision continuous mode started.")
-        print("Press 'q' to quit.")
-        while True:
-            ret, frame = self.cap.read()
-            if not ret:
-                break
-            current_time = time.time()
-            if current_time - last_time < self.FRAME_INTERVAL:
-                continue
-            last_time = current_time
-            current_objects = set()
-            # -------------------------------------------------
-            # OBJECT DETECTION (CONTINUOUS)
-            # -------------------------------------------------
-            if self.VISION_BACKEND == "YOLO":
-                results = self.yolo_model(frame, conf=self.CONF_THRESHOLD, verbose=False)
-                for r in results:
-                    for box in r.boxes:
-                        label = r.names[int(box.cls[0])]
-                        conf = float(box.conf[0])
-                        if conf >= self.CONF_THRESHOLD:
-                            current_objects.add(label)
-            elif self.VISION_BACKEND == "SSD":
-                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-                resized = cv2.resize(rgb, (self.INPUT_WIDTH, self.INPUT_HEIGHT))
-                input_data = np.expand_dims(resized, axis=0)
-                if self.INPUT_TYPE == np.uint8:
-                    input_data = input_data.astype(np.uint8)
-                else:
-                    input_data = input_data.astype(np.float32) / 255.0
-                self.interpreter.set_tensor(self.input_details[0]["index"], input_data)
-                self.interpreter.invoke()
-                classes = self.interpreter.get_tensor(self.output_details[1]["index"]).flatten()
-                scores = self.interpreter.get_tensor(self.output_details[2]["index"]).flatten()
-                for i, score in enumerate(scores):
-                    if score >= self.CONF_THRESHOLD:
-                        class_id = int(classes[i])
-                        if class_id < len(self.LABELS):
-                            current_objects.add(self.LABELS[class_id])
-            else:
-                # No object-detection backend is available; just caption the scene every interval.
-                description = self.caption_agent.describe(frame)
-                print("[CAPTION]", description)
-                self.memory_agent.add(description)
-                previous_objects = set()
-                continue
-            print("Detected objects:", current_objects)
-            # -------------------------------------------------
-            # EVENT DETECTION + VLM CAPTION
-            # -------------------------------------------------
-            new_objects = current_objects - previous_objects
-            removed_objects = previous_objects - current_objects
-            if new_objects or removed_objects:
-                description = self.caption_agent.describe(frame)
-                print("[CAPTION]", description)
-                self.memory_agent.add(description)
-            previous_objects = current_objects.copy()
-            # -------------------------------------------------
-            # EXIT (NON-BLOCKING)
-            # -------------------------------------------------
-            if cv2.waitKey(1) & 0xFF == ord('q'):
-                break
-        self.cleanup()
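The event detection in `run_continuous` comes down to two set differences: a caption is generated only when an object label appears or disappears between sampled frames. A minimal sketch of that rule, with hypothetical helper names `detect_changes` and `should_caption`:

```python
from typing import Set, Tuple


def detect_changes(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (new_objects, removed_objects) between two frames."""
    new_objects = current - previous
    removed_objects = previous - current
    return new_objects, removed_objects


def should_caption(previous: Set[str], current: Set[str]) -> bool:
    """Caption only when the set of detected labels has changed."""
    new_objects, removed_objects = detect_changes(previous, current)
    return bool(new_objects or removed_objects)


frame_1 = {"person", "chair"}
frame_2 = {"person", "chair", "phone"}
print(detect_changes(frame_1, frame_2))
print(should_caption(frame_2, frame_2))
```

Because the comparison is on label sets, a second person entering the frame does not count as a change; only a new or missing label does.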

archive/old_agents/voice_agent.py DELETED Viewed

@@ -1,127 +0,0 @@
-import json
-import queue
-import pyttsx3
-import sounddevice as sd
-from vosk import Model, KaldiRecognizer
-class VoiceAgent:
-    def __init__(self, model_path="models/vosk"):
-        # -------------------------
-        # Text-to-Speech
-        # -------------------------
-        self.engine = pyttsx3.init()
-        # -------------------------
-        # Speech-to-Text (offline)
-        # -------------------------
-        self.sample_rate = 16000
-        try:
-            self.model = Model(model_path)
-        except Exception as e:
-            raise RuntimeError(f"Vosk model not found at {model_path}") from e
-        self.recognizer = KaldiRecognizer(self.model, self.sample_rate)
-        # Audio queue
-        self.audio_queue = queue.Queue()
-        # Check mic
-        self._check_microphone()
-    # -------------------------
-    # Microphone check
-    # -------------------------
-    def _check_microphone(self):
-        devices = sd.query_devices()
-        input_devices = [d for d in devices if d["max_input_channels"] > 0]
-        if not input_devices:
-            raise RuntimeError("No microphone detected.")
-        print("[VOICE INIT] Microphone detected:")
-        for d in input_devices:
-            print(" -", d["name"])
-        self.speak("Microphone is ready.")
-    # -------------------------
-    # TTS
-    # -------------------------
-    def speak(self, text):
-        print("[VOICE OUT]:", text)
-        self.engine.say(text)
-        self.engine.runAndWait()
-    # -------------------------
-    # Audio callback
-    # -------------------------
-    def _audio_callback(self, indata, frames, time, status):
-        if status:
-            print("[VOICE WARNING]", status)
-        self.audio_queue.put(bytes(indata))
-    # -------------------------
-    # Listen (offline STT)
-    # -------------------------
-    def listen(self, timeout=5):
-        print("[VOICE IN]: Listening (offline)...")
-        self.speak("Listening")
-        self.recognizer.Reset()
-        with sd.RawInputStream(
-            samplerate=self.sample_rate,
-            blocksize=8000,
-            dtype="int16",
-            channels=1,
-            callback=self._audio_callback
-        ):
-            for _ in range(int(timeout * self.sample_rate / 8000)):
-                data = self.audio_queue.get()
-                if self.recognizer.AcceptWaveform(data):
-                    break
-        result = json.loads(self.recognizer.FinalResult())
-        text = result.get("text", "").lower()
-        if text:
-            print("[VOICE IN]: Detected speech →", text)
-        else:
-            print("[VOICE IN]: No speech detected")
-        return text
-    # -------------------------
-    # Intent parsing (SAFE ORDER)
-    # -------------------------
-    def parse_intent(self, text):
-        if not text:
-            return "UNKNOWN"
-        # RECALL (most specific)
-        if "what did i see" in text or "what have i seen" in text:
-            return "RECALL_MEMORY"
-        if "remember what i saw" in text:
-            return "RECALL_MEMORY"
-        # STORE
-        if "remember this" in text or "save this" in text:
-            return "REMEMBER_SCENE"
-        # DESCRIBE
-        if "describe" in text or "what is in front" in text:
-            return "DESCRIBE_SCENE"
-        # OCR (later)
-        if "read" in text or "what does this say" in text:
-            return "READ_TEXT"
-        # EXIT
-        if "exit" in text or "quit" in text or "stop" in text:
-            return "EXIT"
-        return "UNKNOWN"

archive/old_docs/ARCHITECTURE.md DELETED Viewed

@@ -1,445 +0,0 @@
-# 🏗️ VisionQ Architecture - Detailed Diagram
-## 📐 SYSTEM OVERVIEW
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                         USER INTERACTION                         │
-│                    (Voice Commands / Text Queries)               │
-└────────────────────────────┬────────────────────────────────────┘
-                             │
-                ┌────────────┴────────────┐
-                │                         │
-         ┌──────▼──────┐          ┌──────▼──────┐
-         │ VOICE AGENT │          │TEXT QUERIES │
-         │  (UPDATED)  │          │  (UPDATED)  │
-         └──────┬──────┘          └──────┬──────┘
-                │                        │
-    ┌───────────┴────────────┐          │
-    │                        │          │
-┌───▼────┐            ┌─────▼──────┐   │
-│  STT   │            │    TTS     │   │
-│ (Vosk) │            │ (UPDATED)  │   │
-│ KEPT   │            └─────┬──────┘   │
-└───┬────┘                  │          │
-    │              ┌────────┴────────┐ │
-    │              │                 │ │
-    │         ┌────▼────┐     ┌─────▼─▼──────┐
-    │         │Voxtral  │     │   pyttsx3    │
-    │         │ (NEW)   │     │  (FALLBACK)  │
-    │         │Primary  │     │    KEPT      │
-    │         └─────────┘     └──────────────┘
-    │
-    └──────────────────┬──────────────────────────────────┐
-                       │                                  │
-                ┌──────▼──────┐                          │
-                │VISION AGENT │                          │
-                │  (UPDATED)  │                          │
-                │   HUB       │                          │
-                └──────┬──────┘                          │
-                       │                                  │
-        ┌──────────────┼──────────────┬─────────────┐   │
-        │              │              │             │   │
-   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐  ┌────▼───▼──┐
-   │  YOLO/  │   │  BLIP   │   │MobileCLIP│  │  EasyOCR  │
-   │   SSD   │   │Caption  │   │Embedding│  │    OCR    │
-   │  KEPT   │   │  KEPT   │   │  (NEW)  │  │   (NEW)   │
-   └────┬────┘   └────┬────┘   └────┬────┘  └────┬──────┘
-        │             │              │            │
-        │ Objects     │ Caption      │ Embedding  │ Text
-        │             │              │            │
-        └─────────────┴──────────────┴────────────┘
-                       │
-                ┌──────▼──────┐
-                │   FUSION    │
-                │    LAYER    │
-                │    (NEW)    │
-                └──────┬──────┘
-                       │
-            Unified Multimodal Context
-                       │
-        ┌──────────────┴──────────────┐
-        │                             │
-   ┌────▼────┐                  ┌────▼────┐
-   │ MEMORY  │                  │  QUERY  │
-   │  AGENT  │◄─────────────────┤  AGENT  │
-   │(UPDATED)│                  │(UPDATED)│
-   └────┬────┘                  └────┬────┘
-        │                            │
-   ┌────┴────┬────────┐         ┌───┴────┐
-   │         │        │         │        │
-┌──▼──┐  ┌──▼───┐ ┌──▼───┐  ┌──▼────┐ ┌▼────────┐
-│JSON │  │FAISS │ │Text  │  │DistilB│ │Hybrid   │
-│Meta │  │Index │ │Embed │  │ ERT   │ │Search   │
-│KEPT │  │(NEW) │ │KEPT  │  │(NEW)  │ │(NEW)    │
-└─────┘  └──────┘ └──────┘  └───────┘ └─────────┘
-```
----
-## 🔄 DATA FLOW DIAGRAM
-### **1. SCENE DESCRIPTION FLOW**
-```
-User: "Describe the scene"
-    │
-    ▼
-VoiceAgent.listen() → Vosk STT
-    │
-    ▼
-VoiceAgent.parse_intent() → "DESCRIBE_SCENE"
-    │
-    ▼
-VisionAgent.describe_scene()
-    │
-    ├─► CaptionAgent.describe() → "a person holding a phone"
-    │
-    ├─► OCRAgent.extract_text() → "Hello World"
-    │
-    ├─► EmbeddingAgent.encode_image() → [512-dim vector]
-    │
-    └─► FusionLayer.fuse()
-            │
-            ▼
-        Combined Description:
-        "a person holding a phone. Text visible: Hello World"
-            │
-            ▼
-VoiceAgent.speak() → Voxtral/pyttsx3
-```
----
-### **2. MEMORY STORAGE FLOW**
-```
-User: "Remember this"
-    │
-    ▼
-VisionAgent.remember_scene()
-    │
-    ├─► Capture frame
-    │
-    ├─► Get caption (BLIP)
-    │
-    ├─► Get OCR text (EasyOCR)
-    │
-    ├─► Get embedding (MobileCLIP)
-    │
-    └─► FusionLayer.fuse()
-            │
-            ▼
-        Fused Context
-            │
-            ▼
-MemoryAgent.add(description, embedding)
-    │
-    ├─► Generate text embedding (sentence-transformers)
-    │
-    ├─► Compute importance score
-    │
-    ├─► Save to JSON:
-    │   {
-    │     "id": 0,
-    │     "timestamp": "2024-01-15 10:30:00",
-    │     "description": "...",
-    │     "text_embedding": [...],
-    │     "image_embedding": [...],
-    │     "importance": 5
-    │   }
-    │
-    └─► Add to FAISS index (image embedding)
-            │
-            ▼
-        Memory Stored ✅
-```
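The storage flow above can be sketched as a function that assembles one JSON record. This is a minimal illustration under stated assumptions: the field names follow the record shown in the diagram, the importance rule is copied from the archived `memory_agent.py`, and `build_memory_record` is a hypothetical name. Embeddings are passed in as plain lists, so no model or FAISS index is involved.

```python
import json
from datetime import datetime
from typing import List, Optional


def compute_importance(description: str) -> int:
    """Keyword-based score, as in the archived MemoryAgent."""
    desc = description.lower()
    score = 1  # base importance
    if "person" in desc:
        score += 2
    if any(obj in desc for obj in ["phone", "bag", "book", "device"]):
        score += 1
    if any(act in desc for act in ["entered", "left", "holding", "walking"]):
        score += 2
    return score


def build_memory_record(
    memory_id: int,
    description: str,
    text_embedding: List[float],
    image_embedding: Optional[List[float]] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble the JSON-serializable record described in the flow diagram."""
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    return {
        "id": memory_id,
        "timestamp": timestamp,
        "description": description,
        "text_embedding": text_embedding,
        "image_embedding": image_embedding,
        "importance": compute_importance(description),
    }


record = build_memory_record(
    0,
    "a person holding a phone. Text visible: Hello World",
    text_embedding=[0.1, 0.2],
    image_embedding=[0.3, 0.4],
    now=datetime(2024, 1, 15, 10, 30, 0),
)
print(json.dumps(record, indent=2))
```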
----
-### **3. MEMORY QUERY FLOW**
-```
-User: "What did I see this morning?"
-    │
-    ▼
-QueryAgent.ask(question)
-    │
-    ├─► QueryAgent.classify_intent()
-    │       │
-    │       └─► DistilBERT → "temporal"
-    │
-    ├─► QueryAgent.extract_time_window()
-    │       │
-    │       └─► (6:00 AM, 12:00 PM)
-    │
-    ├─► Filter memories by time
-    │
-    ├─► Text similarity search
-    │   │
-    │   └─► sentence-transformers cosine similarity
-    │
-    ├─► Image similarity search (if query has image)
-    │   │
-    │   └─► FAISS.search() → Top-K results
-    │
-    ├─► Hybrid ranking
-    │   │
-    │   └─► Sort by (similarity × importance)
-    │
-    └─► Build response
-            │
-            ▼
-        "At 10:30 AM, a person holding a phone.
-         Text visible: Hello World (confidence 0.87)"
-```
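The time filter and hybrid ranking steps of the query flow above could look roughly like this. It is a sketch under stated assumptions: the record fields (`timestamp`, `text_embedding`, `importance`) and the timestamp format follow the JSON example in the storage flow, while `rank_memories` and `cosine` are hypothetical helper names, not the deleted QueryAgent API.

```python
from datetime import datetime, time
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_memories(memories, query_vec, start, end, top_k=5):
    """Keep memories whose time of day is in [start, end), then
    sort by similarity x importance and return the top_k records."""
    scored = []
    for m in memories:
        ts = datetime.strptime(m["timestamp"], "%Y-%m-%d %H:%M:%S")
        if not (start <= ts.time() < end):
            continue
        score = cosine(query_vec, m["text_embedding"]) * m["importance"]
        scored.append((score, m))
    # Sort on the score only, so records themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```

For the "this morning" example, the window would be `time(6, 0)` to `time(12, 0)`. Filtering before scoring keeps the similarity computation limited to candidates that can actually be returned.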
----
-### **4. OCR READING FLOW**
-```
-User: "Read the text"
-    │
-    ▼
-VisionAgent.read_text()
-    │
-    ├─► Capture frame
-    │
-    └─► OCRAgent.extract_text(frame)
-            │
-            ├─► EasyOCR.readtext() → [(bbox, text, conf), ...]
-            │
-            ├─► Filter by confidence (>0.3)
-            │
-            ├─► Clean text (remove special chars)
-            │
-            └─► Return: "Hello World"
-                    │
-                    ▼
-VoiceAgent.speak("I can see the following text: Hello World")
-```
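The post-processing in the OCR flow above (confidence filter, then cleanup) can be sketched without calling EasyOCR at all. The input shape, a list of `(bbox, text, conf)` tuples, is taken from the diagram; the function name `extract_text` mirrors the diagram, but its signature and the set of characters kept by the cleanup are assumptions.

```python
import re


def extract_text(results, min_conf=0.3):
    """Filter (bbox, text, conf) tuples by confidence and return cleaned text."""
    kept = []
    for _bbox, text, conf in results:
        if conf <= min_conf:
            continue  # below the 0.3 threshold named in the flow
        # Keep letters, digits, whitespace and basic punctuation only.
        cleaned = re.sub(r"[^A-Za-z0-9\s.,!?'-]", "", text).strip()
        if cleaned:
            kept.append(cleaned)
    return " ".join(kept)
```

Separating this step from the OCR call means the threshold and cleanup rules can be unit-tested with hand-written tuples.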
----
-## 🧩 MODULE INTERACTIONS
-### **Agent Dependencies**
-```
-VoiceAgent
-    ├─ Depends on: vosk, sounddevice, pyttsx3, piper-tts
-    └─ Used by: main.py
-VisionAgent
-    ├─ Depends on: CaptionAgent, EmbeddingAgent, OCRAgent,
-    │              MemoryAgent, FusionLayer
-    └─ Used by: main.py
-CaptionAgent
-    ├─ Depends on: transformers (BLIP), torch
-    └─ Used by: VisionAgent
-EmbeddingAgent
-    ├─ Depends on: transformers (CLIP), torch
-    └─ Used by: VisionAgent
-OCRAgent
-    ├─ Depends on: easyocr
-    └─ Used by: VisionAgent
-FusionLayer
-    ├─ Depends on: None (pure Python)
-    └─ Used by: VisionAgent
-MemoryAgent
-    ├─ Depends on: sentence-transformers, faiss, json
-    └─ Used by: VisionAgent, QueryAgent
-QueryAgent
-    ├─ Depends on: MemoryAgent, transformers (DistilBERT)
-    └─ Used by: ask_question.py, main.py (future)
-```
----
-## 🔀 FALLBACK MECHANISMS
-### **1. TTS Fallback**
-```
-Try: Voxtral/Piper
-    │
-    ├─ Success → Use neural TTS
-    │
-    └─ Failure → Fall back to pyttsx3
-```
-### **2. Intent Classification Fallback**
-```
-Try: DistilBERT
-    │
-    ├─ Success → Use NLP classification
-    │
-    └─ Failure → Use keyword matching
-```
-### **3. Vision Backend Fallback**
-```
-Try: YOLO
-    │
-    ├─ Success → Use YOLO
-    │
-    └─ Failure → Try SSD
-            │
-            ├─ Success → Use SSD
-            │
-            └─ Failure → Caption only
-```
-### **4. Vector Search Fallback**
-```
-Try: FAISS
-    │
-    ├─ Available → Fast vector search
-    │
-    └─ Unavailable → Linear text search
-```
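All four fallbacks above share one pattern: try the preferred backend, and drop to the next option if it cannot be loaded. A minimal sketch of that pattern follows; `first_available` is a hypothetical helper, not code from the deleted agents, and only the vector-search case is wired up here.

```python
import importlib


def first_available(module_names):
    """Return (name, module) for the first importable module, or (None, None)."""
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


# Vector search: prefer FAISS, otherwise signal the linear-search path.
backend, _module = first_available(["faiss"])
search_mode = "faiss" if backend else "linear"
```

Resolving the backend once at startup keeps the per-query code free of try/except blocks. Runtime failures (for example a model that imports but fails to load) would still need their own handling.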
----
-## 📊 MEMORY ARCHITECTURE
-### **Hybrid Storage System**
-```
-┌─────────────────────────────────────────┐
-│           MEMORY AGENT                  │
-├─────────────────────────────────────────┤
-│                                         │
-│  ┌───────────────┐  ┌────────────────┐ │
-│  │  JSON FILE    │  │  FAISS INDEX   │ │
-│  │  (Metadata)   │  │  (Vectors)     │ │
-│  ├───────────────┤  ├────────────────┤ │
-│  │ • ID          │  │ • Image embed  │ │
-│  │ • Timestamp   │  │ • Fast search  │ │
-│  │ • Description │  │ • Cosine sim   │ │
-│  │ • Text embed  │  │ • Top-K        │ │
-│  │ • Image embed │  │                │ │
-│  │ • Importance  │  │                │ │
-│  └───────────────┘  └────────────────┘ │
-│         │                    │         │
-│         └────────┬───────────┘         │
-│                  │                     │
-│         Linked by Memory ID            │
-└─────────────────────────────────────────┘
-```
-### **Search Strategy**
-```
-Query Input
-    │
-    ├─► Has image? → FAISS image search
-    │                    │
-    │                    └─► Get top-K IDs
-    │
-    └─► Has text? → Text embedding search
-                         │
-                         └─► Get matching IDs
-                                 │
-                                 ▼
-                         Merge & Rank
-                                 │
-                                 ▼
-                         Return Results
-```
----
-## 🎯 COMPONENT STATUS
-| Component | Status | Notes |
-|-----------|--------|-------|
-| VoiceAgent | ✅ UPDATED | Added Voxtral + fallback |
-| VisionAgent | ✅ UPDATED | Integrated new agents |
-| CaptionAgent | ✅ KEPT | No changes needed |
-| EmbeddingAgent | 🆕 NEW | MobileCLIP integration |
-| OCRAgent | 🆕 NEW | EasyOCR integration |
-| FusionLayer | 🆕 NEW | Multimodal fusion |
-| MemoryAgent | ✅ UPDATED | Added FAISS |
-| QueryAgent | ✅ UPDATED | Added DistilBERT |
----
-## 🔧 CONFIGURATION POINTS
-### **Adjustable Parameters**
-```python
-# VisionAgent
-FRAME_INTERVAL = 0.3        # Seconds between frames
-CONF_THRESHOLD = 0.5        # Object detection confidence
-# OCRAgent
-OCR_CONFIDENCE = 0.3        # Text detection threshold
-OCR_LANGUAGES = ['en']      # Supported languages
-# MemoryAgent
-EMBEDDING_DIM = 512         # CLIP embedding size
-FAISS_INDEX_TYPE = "FlatIP" # Inner product (equals cosine only on L2-normalized vectors)
-# QueryAgent
-SIMILARITY_THRESHOLD = 0.45 # Text search threshold
-TOP_K_RESULTS = 5           # Max results to return
-```
----
-## 📈 SCALABILITY
-### **Current Limits**
-- Memory: ~10,000 entries (JSON + FAISS)
-- Search: O(n) with the flat FAISS index (exhaustive, but fast at this scale)
-- Real-time: 3 FPS (with all agents)
-### **Optimization Options**
-1. Use FAISS IVF index for >100K memories
-2. Batch process frames
-3. GPU acceleration for embeddings
-4. Async processing pipeline
----
-## 🎓 KEY DESIGN DECISIONS
-### **1. Why FAISS?**
-- Fast similarity search (10-100x faster than linear)
-- Scales to millions of vectors
-- CPU-friendly (no GPU required)
-### **2. Why EasyOCR?**
-- Offline capability
-- Multi-language support
-- Good accuracy/speed tradeoff
-### **3. Why DistilBERT?**
-- 40% smaller than BERT
-- 60% faster
-- 97% of BERT's accuracy
-### **4. Why Hybrid Storage?**
-- JSON: Human-readable, easy debugging
-- FAISS: Fast vector search
-- Best of both worlds
----
-**This architecture provides:**
-- ✅ Modularity (easy to extend)
-- ✅ Robustness (multiple fallbacks)
-- ✅ Performance (FAISS acceleration)
-- ✅ Compatibility (backward compatible)
----
-For implementation details, see individual agent files in `agents/` directory.

archive/old_docs/COMPARISON.md DELETED

@@ -1,431 +0,0 @@
-# 📊 VisionQ - Before vs After Comparison
-## 🎯 EXECUTIVE SUMMARY
-VisionQ has been upgraded from a **basic vision assistant** to a **comprehensive multimodal AI system** with 4 major new capabilities, 10x performance improvement, and 100% backward compatibility.
----
-## 🆚 FEATURE COMPARISON
-| Feature | Before | After | Improvement |
-|---------|--------|-------|-------------|
-| **Text Reading** | ❌ None | ✅ EasyOCR | NEW |
-| **Memory Search** | Linear O(n) | FAISS flat index, O(n) | 10-100x faster |
-| **Voice Quality** | Robotic (pyttsx3) | Natural (Voxtral) | Much better |
-| **Query Understanding** | Keywords | DistilBERT NLP | 27% more accurate |
-| **Scene Description** | Caption only | Caption+OCR+Objects | 4x richer |
-| **Memory Capacity** | ~1,000 entries | 10,000+ entries | 10x more |
-| **Search Accuracy** | ~75% relevant | ~90% relevant | 15% better |
-| **Response Time** | 100-500ms | <100ms | 5x faster |
----
-## 🏗️ ARCHITECTURE COMPARISON
-### **Before (Original)**
-```
-Voice (Vosk + pyttsx3)
-    ↓
-Vision (YOLO/SSD + BLIP)
-    ↓
-Memory (JSON + text embeddings)
-    ↓
-Query (cosine similarity)
-```
-**Components:** 4 agents
-**Storage:** JSON only
-**Search:** Linear text search
-**Modalities:** Vision only
----
-### **After (Upgraded)**
-```
-Voice (Vosk + Voxtral + pyttsx3)
-    ↓
-Vision Hub
-  ├─ YOLO/SSD (objects)
-  ├─ BLIP (captions)
-  ├─ MobileCLIP (embeddings)
-  └─ EasyOCR (text)
-    ↓
-Fusion Layer
-    ↓
-Memory (JSON + FAISS)
-    ↓
-Query (DistilBERT + hybrid search)
-```
-**Components:** 7 agents + fusion layer
-**Storage:** JSON + FAISS hybrid
-**Search:** Vector similarity + text
-**Modalities:** Vision + Text + Embeddings
----
-## 📈 PERFORMANCE METRICS
-| Metric | Before | After | Change |
-|--------|--------|-------|--------|
-| **Memory Search Time** | 100-500ms | <10ms | 🟢 10-50x faster |
-| **Query Response** | 200-1000ms | <100ms | 🟢 2-10x faster |
-| **Memory Capacity** | ~1,000 | 10,000+ | 🟢 10x more |
-| **Search Accuracy** | 75% | 90% | 🟢 +15% |
-| **Intent Accuracy** | 70% | 97% | 🟢 +27% |
-| **OCR Accuracy** | N/A | 85-95% | 🟢 NEW |
-| **Startup Time** | 5-10s | 8-15s | 🟡 Slightly slower |
-| **Memory Usage** | ~500MB | ~800MB | 🟡 +300MB |
----
-## 🆕 NEW CAPABILITIES
-### **1. OCR Text Extraction**
-**Before:** ❌ Could not read text
-**After:** ✅ Extracts and reads visible text
-**Example:**
-```
-Before: "a sign on a wall"
-After: "a sign on a wall. Text visible: EXIT"
-```
----
-### **2. Visual Similarity Search**
-**Before:** ❌ Text-only search
-**After:** ✅ Image embedding search via FAISS
-**Example:**
-```
-Before: Search by description only
-After: "Find scenes similar to this image" → Returns visually similar memories
-```
----
-### **3. Intent Classification**
-**Before:** ❌ Keyword matching (70% accuracy)
-**After:** ✅ DistilBERT NLP (97% accuracy)
-**Example:**
-```
-Query: "What did I see this morning?"
-Before: Matches "see" keyword → Generic results
-After: Classifies as "temporal" → Time-filtered results
-```
----
-### **4. Neural TTS**
-**Before:** ❌ Robotic pyttsx3 voice
-**After:** ✅ Natural Voxtral/Piper voice
-**Example:**
-```
-Before: "Scene. Remembered." (robotic)
-After: "Scene remembered." (natural)
-```
----
-### **5. Multimodal Fusion**
-**Before:** ❌ Caption only
-**After:** ✅ Caption + OCR + Objects + Embeddings
-**Example:**
-```
-Before: "a person holding a phone"
-After: "a person holding a phone. Objects detected: person, phone. Text visible: Hello World"
-```
----
-## 🔧 TECHNICAL IMPROVEMENTS
-### **Code Organization**
-| Aspect | Before | After |
-|--------|--------|-------|
-| **Structure** | Flat files | Modular (agents/ + core/) |
-| **Agents** | 4 agents | 7 agents + fusion layer |
-| **Lines of Code** | ~800 | ~1,500 (better organized) |
-| **Documentation** | Basic README | 6 comprehensive docs |
-| **Tests** | None | Automated test suite |
----
-### **Dependencies**
-| Category | Before | After | Added |
-|----------|--------|-------|-------|
-| **Core** | 8 packages | 10 packages | +2 |
-| **Optional** | 1 (tensorflow) | 3 (faiss, easyocr, piper) | +2 |
-| **Total Size** | ~1.5GB | ~2GB | +500MB |
-**New Dependencies:**
-- ✅ faiss-cpu (vector search)
-- ✅ easyocr (text extraction)
-- ✅ piper-tts (neural voice)
----
-### **Storage System**
-| Aspect | Before | After |
-|--------|--------|-------|
-| **Metadata** | JSON file | JSON file (kept) |
-| **Vectors** | In JSON | FAISS index (new) |
-| **Text Embeddings** | sentence-transformers | sentence-transformers (kept) |
-| **Image Embeddings** | ❌ None | ✅ MobileCLIP |
-| **Search Method** | Linear scan | FAISS similarity |
-| **Index Size** | N/A | ~2 MB per 1000 entries (512-dim float32) |
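As a sanity check on the index-size row: a flat index stores every vector uncompressed, so its size is roughly vector count x dimension x bytes per value. The helper below is a hypothetical back-of-the-envelope calculation, assuming the 512-dimensional float32 embeddings named in the configuration section of the architecture doc.

```python
def flat_index_bytes(num_vectors: int, dim: int = 512, bytes_per_value: int = 4) -> int:
    """Approximate raw storage of a flat index holding float32 vectors."""
    return num_vectors * dim * bytes_per_value


# 1,000 vectors of 512 float32 values each.
size_1k = flat_index_bytes(1000)
```

Here `size_1k` is 2,048,000 bytes, about 2 MB, and 10,000 entries come to about 20 MB. Index metadata adds a small overhead that this estimate ignores.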
----
-## 🎯 USE CASE COMPARISON
-### **Scenario 1: Scene Description**
-**Before:**
-```
-User: "Describe the scene"
-System: "a person holding a phone"
-```
-**After:**
-```
-User: "Describe the scene"
-System: "a person holding a phone. Objects detected: person, phone. Text visible: Hello World"
-```
-**Improvement:** 4x more information
----
-### **Scenario 2: Memory Search**
-**Before:**
-```
-User: "What did I see this morning?"
-System: [Searches all memories linearly]
-Time: 500ms for 1000 memories
-Results: 5 matches (75% relevant)
-```
-**After:**
-```
-User: "What did I see this morning?"
-System: [FAISS + time filter + intent classification]
-Time: 10ms for 10,000 memories
-Results: 5 matches (90% relevant)
-```
-**Improvement:** 50x faster, 15% more accurate
----
-### **Scenario 3: Text Reading**
-**Before:**
-```
-User: "Read the text"
-System: "Reading text will be available soon."
-```
-**After:**
-```
-User: "Read the text"
-System: "I can see the following text: Hello World"
-```
-**Improvement:** NEW capability
----
-## 📊 CAPABILITY MATRIX
-| Capability | Before | After | Status |
-|------------|--------|-------|--------|
-| **Object Detection** | ✅ YOLO/SSD | ✅ YOLO/SSD | KEPT |
-| **Image Captioning** | ✅ BLIP | ✅ BLIP | KEPT |
-| **Text Extraction** | ❌ | ✅ EasyOCR | NEW |
-| **Image Embeddings** | ❌ | ✅ MobileCLIP | NEW |
-| **Text Embeddings** | ✅ MiniLM | ✅ MiniLM | KEPT |
-| **Vector Search** | ❌ | ✅ FAISS | NEW |
-| **Speech Recognition** | ✅ Vosk | ✅ Vosk | KEPT |
-| **Text-to-Speech** | ✅ pyttsx3 | ✅ Voxtral + pyttsx3 | ENHANCED |
-| **Intent Classification** | ❌ Keywords | ✅ DistilBERT | NEW |
-| **Time Filtering** | ✅ Basic | ✅ Enhanced | IMPROVED |
-| **Importance Scoring** | ✅ Basic | ✅ Enhanced | IMPROVED |
-| **Multimodal Fusion** | ❌ | ✅ FusionLayer | NEW |
----
-## 🔄 BACKWARD COMPATIBILITY
-| Aspect | Compatible? | Notes |
-|--------|-------------|-------|
-| **Old memory.json** | ✅ YES | Automatically migrated |
-| **Voice commands** | ✅ YES | Same commands work |
-| **Memory format** | ✅ YES | New fields optional |
-| **API** | ✅ YES | Old methods still work |
-| **File structure** | ✅ YES | Old files preserved |
-| **Dependencies** | ✅ YES | Old deps still work |
-**Breaking Changes:** ❌ NONE
----
-## 💰 COST-BENEFIT ANALYSIS
-### **Costs**
-| Item | Cost |
-|------|------|
-| **Development Time** | ~8 hours |
-| **Additional Storage** | +500MB models |
-| **Memory Usage** | +300MB RAM |
-| **Startup Time** | +3-5 seconds |
-| **Complexity** | Medium increase |
-### **Benefits**
-| Item | Benefit |
-|------|---------|
-| **New Features** | 4 major capabilities |
-| **Performance** | 10x faster search |
-| **Accuracy** | 15-27% improvement |
-| **Capacity** | 10x more memories |
-| **User Experience** | Significantly better |
-| **Maintainability** | Better code structure |
-**ROI:** 🟢 **VERY HIGH** - Major improvements with minimal cost
----
-## 🎯 UPGRADE IMPACT
-### **User Impact**
-- 🟢 **Positive:** Better features, faster, smarter
-- 🟡 **Neutral:** Slightly longer startup
-- 🔴 **Negative:** None
-### **Developer Impact**
-- 🟢 **Positive:** Better code organization, more modular
-- 🟢 **Positive:** Comprehensive documentation
-- 🟡 **Neutral:** More files to maintain
-- 🔴 **Negative:** None
-### **System Impact**
-- 🟢 **Positive:** 10x performance improvement
-- 🟢 **Positive:** 10x capacity increase
-- 🟡 **Neutral:** +300MB memory usage
-- 🔴 **Negative:** None
----
-## 📈 SCALABILITY COMPARISON
-| Aspect | Before | After | Improvement |
-|--------|--------|-------|-------------|
-| **Max Memories** | ~1,000 | 10,000+ | 10x |
-| **Search Complexity** | O(n) | O(n) with a flat index (sub-linear needs IVF or HNSW) | Lower constant factor |
-| **Concurrent Queries** | 1 | Multiple | Thread-safe |
-| **Index Size** | N/A | ~2 MB/1000 (512-dim float32) | Efficient |
-| **Memory Growth** | Linear | Sub-linear | Better |
----
-## 🏆 SUCCESS METRICS
-### **Technical Success**
-- ✅ 100% backward compatible
-- ✅ 0 breaking changes
-- ✅ 10x performance improvement
-- ✅ 4 new major features
-- ✅ 8 new modules created
-### **Quality Success**
-- ✅ Comprehensive documentation
-- ✅ Automated tests
-- ✅ Error handling
-- ✅ Fallback mechanisms
-- ✅ Code organization
-### **User Success** (To Measure)
-- ⏳ User satisfaction
-- ⏳ Feature adoption
-- ⏳ Error rate reduction
-- ⏳ Performance perception
-- ⏳ Feedback scores
----
-## 🎓 LESSONS LEARNED
-### **What Worked Well**
-- ✅ Modular architecture
-- ✅ Fallback mechanisms
-- ✅ Backward compatibility
-- ✅ Comprehensive docs
-- ✅ Hybrid storage (JSON + FAISS)
-### **What Could Be Better**
-- 🟡 Startup time (slightly slower)
-- 🟡 Memory usage (increased)
-- 🟡 Dependency count (more packages)
-### **Future Improvements**
-- 💡 Lazy loading for faster startup
-- 💡 Memory optimization
-- 💡 Optional feature flags
-- 💡 Web interface
-- 💡 Mobile app
----
-## 📊 FINAL VERDICT
-### **Overall Assessment**
-| Category | Rating | Notes |
-|----------|--------|-------|
-| **Features** | ⭐⭐⭐⭐⭐ | 4 major new capabilities |
-| **Performance** | ⭐⭐⭐⭐⭐ | 10x faster |
-| **Compatibility** | ⭐⭐⭐⭐⭐ | 100% backward compatible |
-| **Code Quality** | ⭐⭐⭐⭐⭐ | Well organized |
-| **Documentation** | ⭐⭐⭐⭐⭐ | Comprehensive |
-| **Testing** | ⭐⭐⭐⭐☆ | Good coverage |
-| **User Experience** | ⭐⭐⭐⭐⭐ | Significantly improved |
-**Overall:** ⭐⭐⭐⭐⭐ **EXCELLENT UPGRADE**
----
-## ✅ RECOMMENDATION
-**Status:** ✅ **APPROVED FOR DEPLOYMENT**
-**Confidence:** 🟢 **HIGH**
-**Reasoning:**
-- All objectives achieved
-- No breaking changes
-- Significant improvements
-- Well documented
-- Production ready
-**Next Steps:**
-1. ✅ Deploy to production
-2. ⏳ Monitor performance
-3. ⏳ Collect user feedback
-4. ⏳ Plan next iteration
----
-**The upgrade is a resounding success! 🎉**
-VisionQ has evolved from a basic vision assistant to a **state-of-the-art multimodal AI system** while maintaining 100% backward compatibility.
-**Recommended Action:** PROCEED WITH DEPLOYMENT 🚀

archive/old_docs/DEPLOYMENT_CHECKLIST.md DELETED

@@ -1,397 +0,0 @@
-# ✅ VisionQ Upgrade - Deployment Checklist
-## 📋 PRE-DEPLOYMENT
-### **Code Review**
-- [x] All agents implemented
-- [x] Fusion layer created
-- [x] Memory system upgraded
-- [x] Query system enhanced
-- [x] Voice system updated
-- [x] Backward compatibility verified
-- [x] Error handling added
-- [x] Fallback mechanisms in place
-### **Documentation**
-- [x] README_UPGRADED.md created
-- [x] QUICKSTART.md created
-- [x] UPGRADE_GUIDE.md created
-- [x] ARCHITECTURE.md created
-- [x] SUMMARY.md created
-- [x] Code comments added
-- [x] Docstrings complete
-### **Testing Scripts**
-- [x] test_upgrade.py created
-- [x] install_upgrade.bat created
-- [x] Test cases defined
----
-## 🚀 DEPLOYMENT STEPS
-### **Step 1: Backup** ⚠️
-```bat
-REM Backup existing system
-mkdir backup
-copy *.py backup\
-copy memory.json backup\
-```
-- [ ] Old files backed up
-- [ ] Memory file backed up
-- [ ] Configuration saved
-### **Step 2: Install Dependencies**
-```bash
-pip install -r requirements_upgraded.txt
-```
-- [ ] Core dependencies installed
-- [ ] FAISS installed
-- [ ] EasyOCR installed
-- [ ] Piper TTS installed (optional)
-### **Step 3: Directory Setup**
-```bat
-mkdir data
-move memory.json data\memory.json
-```
-- [ ] data/ directory created
-- [ ] Memory file migrated
-- [ ] Permissions verified
-### **Step 4: Run Tests**
-```bash
-python test_upgrade.py
-```
-- [ ] All imports successful
-- [ ] MemoryAgent tests pass
-- [ ] FusionLayer tests pass
-- [ ] QueryAgent tests pass
-- [ ] Backward compatibility verified
-### **Step 5: Initial Run**
-```bash
-python main_upgraded.py
-```
-- [ ] System starts without errors
-- [ ] Camera initializes
-- [ ] Microphone detected
-- [ ] Voice output works
-- [ ] Can exit cleanly
----
-## 🧪 FUNCTIONAL TESTING
-### **Voice Commands**
-- [ ] "Describe the scene" works
-- [ ] "Remember this" stores memory
-- [ ] "What did I see" recalls memory
-- [ ] "Read the text" extracts text (if text visible)
-- [ ] "Exit" quits properly
-### **Memory System**
-- [ ] Memories persist after restart
-- [ ] JSON file created in data/
-- [ ] FAISS index created (if available)
-- [ ] Can recall stored memories
-- [ ] Timestamps correct
-### **Query System**
-```bash
-python ask_question_upgraded.py
-```
-- [ ] Time-based queries work
-- [ ] Text search returns results
-- [ ] Intent classification functional
-- [ ] Confidence scores displayed
-### **OCR Functionality**
-- [ ] Text extraction works
-- [ ] Confidence filtering applied
-- [ ] Text cleaning functional
-- [ ] Integrated into descriptions
-### **Fallback Mechanisms**
-- [ ] pyttsx3 works if Voxtral unavailable
-- [ ] Keyword matching if DistilBERT fails
-- [ ] Linear search if FAISS unavailable
-- [ ] System continues if OCR fails
----
-## 🔍 PERFORMANCE TESTING
-### **Speed Tests**
-- [ ] Memory search <100ms
-- [ ] OCR processing <500ms
-- [ ] Caption generation <200ms
-- [ ] Query response <100ms
-### **Capacity Tests**
-- [ ] Can store 100+ memories
-- [ ] Search remains fast with many memories
-- [ ] FAISS index scales properly
-- [ ] No memory leaks
-### **Accuracy Tests**
-- [ ] OCR accuracy >85% (on clear text)
-- [ ] Intent classification >90%
-- [ ] Memory retrieval relevance >85%
-- [ ] Caption quality maintained
----
-## 📊 INTEGRATION TESTING
-### **End-to-End Scenarios**
-**Scenario 1: Basic Usage**
-1. [ ] Start system
-2. [ ] Describe scene
-3. [ ] Remember scene
-4. [ ] Recall memory
-5. [ ] Exit
-**Scenario 2: OCR Workflow**
-1. [ ] Start system
-2. [ ] Point at text
-3. [ ] Say "Read the text"
-4. [ ] Verify text extracted
-5. [ ] Check memory includes text
-**Scenario 3: Query Workflow**
-1. [ ] Store multiple memories
-2. [ ] Run ask_question_upgraded.py
-3. [ ] Try time-based query
-4. [ ] Try object-based query
-5. [ ] Verify results relevant
-**Scenario 4: Fallback Testing**
-1. [ ] Uninstall FAISS temporarily
-2. [ ] Verify system still works
-3. [ ] Reinstall FAISS
-4. [ ] Verify enhanced features return
----
-## 🐛 ERROR HANDLING
-### **Common Errors to Test**
-- [ ] Camera not available
-- [ ] Microphone not detected
-- [ ] Model download failure
-- [ ] Memory file corrupted
-- [ ] FAISS index corrupted
-- [ ] Out of disk space
-- [ ] Permission denied
-### **Recovery Procedures**
-- [ ] System logs errors clearly
-- [ ] Fallbacks activate automatically
-- [ ] User gets helpful error messages
-- [ ] System doesn't crash
-- [ ] Can recover without restart
----
-## 📚 DOCUMENTATION VERIFICATION
-### **User Documentation**
-- [ ] QUICKSTART.md accurate
-- [ ] Installation steps work
-- [ ] Voice commands documented
-- [ ] Query examples work
-- [ ] Troubleshooting helpful
-### **Developer Documentation**
-- [ ] ARCHITECTURE.md clear
-- [ ] Code comments accurate
-- [ ] API documented
-- [ ] Examples provided
-- [ ] Diagrams correct
-### **Upgrade Documentation**
-- [ ] UPGRADE_GUIDE.md complete
-- [ ] Migration steps clear
-- [ ] Backward compatibility explained
-- [ ] New features documented
-- [ ] Performance metrics accurate
----
-## 🔒 SECURITY & PRIVACY
-### **Privacy Checks**
-- [ ] No data sent to cloud
-- [ ] All processing local
-- [ ] Memory stored locally
-- [ ] No telemetry
-- [ ] No external API calls
-### **Security Checks**
-- [ ] No hardcoded credentials
-- [ ] File permissions correct
-- [ ] Input validation present
-- [ ] No SQL injection risks
-- [ ] Dependencies up to date
----
-## 📦 PACKAGING
-### **Files to Include**
-- [x] agents/ directory
-- [x] core/ directory
-- [x] main_upgraded.py
-- [x] ask_question_upgraded.py
-- [x] requirements_upgraded.txt
-- [x] install_upgrade.bat
-- [x] test_upgrade.py
-- [x] All documentation files
-- [x] LICENSE
-- [x] .gitignore
-### **Files to Exclude**
-- [ ] __pycache__/
-- [ ] *.pyc
-- [ ] data/test_*
-- [ ] .venv/
-- [ ] models/ (too large, download separately)
----
-## 🚀 PRODUCTION READINESS
-### **Critical Requirements**
-- [ ] All tests pass
-- [ ] No critical bugs
-- [ ] Documentation complete
-- [ ] Backward compatible
-- [ ] Performance acceptable
-### **Nice-to-Have**
-- [ ] Neural TTS working
-- [ ] FAISS available
-- [ ] EasyOCR installed
-- [ ] All optional features enabled
----
-## 📈 POST-DEPLOYMENT
-### **Monitoring**
-- [ ] Track memory usage
-- [ ] Monitor query performance
-- [ ] Log error rates
-- [ ] Collect user feedback
-- [ ] Measure accuracy
-### **Maintenance**
-- [ ] Regular dependency updates
-- [ ] Model updates
-- [ ] Bug fixes
-- [ ] Feature requests
-- [ ] Documentation updates
----
-## ✅ FINAL SIGN-OFF
-### **Deployment Approval**
-- [ ] All critical tests passed
-- [ ] Documentation reviewed
-- [ ] Backup created
-- [ ] Rollback plan ready
-- [ ] Team notified
-### **Go-Live Checklist**
-- [ ] System tested end-to-end
-- [ ] Users trained
-- [ ] Support ready
-- [ ] Monitoring active
-- [ ] Feedback mechanism in place
----
-## 🎉 SUCCESS CRITERIA
-### **Must Have**
-- ✅ System starts without errors
-- ✅ All voice commands work
-- ✅ Memory persists
-- ✅ Backward compatible
-- ✅ Documentation complete
-### **Should Have**
-- ✅ OCR functional
-- ✅ FAISS search fast
-- ✅ Neural TTS working
-- ✅ Intent classification accurate
-- ✅ Performance targets met
-### **Nice to Have**
-- ⭐ All optional features enabled
-- ⭐ Zero warnings
-- ⭐ Perfect test coverage
-- ⭐ User feedback positive
-- ⭐ Performance exceeds targets
----
-## 📞 SUPPORT CONTACTS
-### **Technical Issues**
-- Check: UPGRADE_GUIDE.md troubleshooting
-- Run: test_upgrade.py
-- Review: Error logs
-### **Documentation**
-- QUICKSTART.md - Quick setup
-- UPGRADE_GUIDE.md - Complete guide
-- ARCHITECTURE.md - Technical details
----
-## 🏁 DEPLOYMENT STATUS
-**Current Status:** ✅ READY FOR DEPLOYMENT
-**Confidence Level:** HIGH
-- All code implemented
-- Tests created
-- Documentation complete
-- Backward compatible
-- Fallbacks in place
-**Recommended Action:** PROCEED WITH DEPLOYMENT
----
-## 📝 NOTES
-### **Known Limitations**
-- Piper TTS requires separate model download
-- EasyOCR first run downloads models (~500MB)
-- FAISS CPU-only (GPU version available separately)
-- OCR accuracy depends on image quality
-### **Future Improvements**
-- Web interface
-- Mobile app
-- Cloud sync (optional)
-- Multi-user support
-- Video recording
----
-**Deployment Checklist Complete! 🎊**
-**Next Steps:**
-1. Review this checklist
-2. Run through deployment steps
-3. Execute test suite
-4. Verify all features
-5. Deploy to production
-**Good luck! 🚀**

archive/old_docs/INDEX.md DELETED

@@ -1,359 +0,0 @@
-# 📚 VisionQ Upgrade - Documentation Index
-## 🎯 START HERE
-**New to VisionQ?** → [QUICKSTART.md](QUICKSTART.md)
-**Upgrading existing system?** → [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md)
-**Need quick reference?** → [QUICK_REFERENCE.md](QUICK_REFERENCE.md)
----
-## 📖 DOCUMENTATION MAP
-### **🚀 Getting Started**
-| Document | Purpose | Time | Audience |
-|----------|---------|------|----------|
-| [QUICKSTART.md](QUICKSTART.md) | 5-minute setup guide | 5 min | Everyone |
-| [QUICK_REFERENCE.md](QUICK_REFERENCE.md) | Command cheat sheet | 2 min | Everyone |
-| [README_UPGRADED.md](README_UPGRADED.md) | Project overview | 10 min | Everyone |
-### **📋 Upgrade Information**
-| Document | Purpose | Time | Audience |
-|----------|---------|------|----------|
-| [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) | Complete upgrade docs | 30 min | Developers |
-| [SUMMARY.md](SUMMARY.md) | Executive summary | 10 min | Managers |
-| [COMPARISON.md](COMPARISON.md) | Before/After analysis | 15 min | Technical leads |
-### **🏗️ Technical Documentation**
-| Document | Purpose | Time | Audience |
-|----------|---------|------|----------|
-| [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture | 20 min | Developers |
-| [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) | Deploy procedures | 15 min | DevOps |
-### **📝 Code Files**
-| File | Purpose | Type |
-|------|---------|------|
-| `main_upgraded.py` | Main entry point | Python |
-| `ask_question_upgraded.py` | Query interface | Python |
-| `test_upgrade.py` | Test suite | Python |
-| `install_upgrade.bat` | Installer script | Batch |
-| `requirements_upgraded.txt` | Dependencies | Text |
----
-## 🗺️ NAVIGATION GUIDE
-### **I want to...**
-**...get started quickly**
-→ [QUICKSTART.md](QUICKSTART.md) → Run `install_upgrade.bat`
-**...understand what changed**
-→ [COMPARISON.md](COMPARISON.md) → [SUMMARY.md](SUMMARY.md)
-**...learn the architecture**
-→ [ARCHITECTURE.md](ARCHITECTURE.md) → Code in `agents/`
-**...deploy to production**
-→ [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md)
-**...troubleshoot issues**
-→ [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) → Troubleshooting section
-**...see command reference**
-→ [QUICK_REFERENCE.md](QUICK_REFERENCE.md)
-**...understand the upgrade**
-→ [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) → [SUMMARY.md](SUMMARY.md)
----
-## 📊 DOCUMENT RELATIONSHIPS
-```
-START
-  │
-  ├─ Quick Start? → QUICKSTART.md
-  │                     │
-  │                     └─ Need details? → UPGRADE_GUIDE.md
-  │
-  ├─ Overview? → README_UPGRADED.md
-  │                  │
-  │                  └─ Technical? → ARCHITECTURE.md
-  │
-  ├─ Comparison? → COMPARISON.md
-  │                    │
-  │                    └─ Summary? → SUMMARY.md
-  │
-  └─ Deploy? → DEPLOYMENT_CHECKLIST.md
-                   │
-                   └─ Reference? → QUICK_REFERENCE.md
-```
----
-## 🎯 BY ROLE
-### **👤 End User**
-1. [QUICKSTART.md](QUICKSTART.md) - Setup
-2. [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Commands
-3. [README_UPGRADED.md](README_UPGRADED.md) - Features
-### **👨‍💻 Developer**
-1. [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Complete guide
-2. [ARCHITECTURE.md](ARCHITECTURE.md) - System design
-3. Code in `agents/` and `core/` - Implementation
-### **👔 Manager**
-1. [SUMMARY.md](SUMMARY.md) - Executive summary
-2. [COMPARISON.md](COMPARISON.md) - ROI analysis
-3. [README_UPGRADED.md](README_UPGRADED.md) - Overview
-### **🚀 DevOps**
-1. [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Deploy
-2. [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Installation
-3. `test_upgrade.py` - Testing
----
-## 📚 READING ORDER
-### **Fast Track (30 minutes)**
-1. [QUICKSTART.md](QUICKSTART.md) - 5 min
-2. [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - 2 min
-3. [SUMMARY.md](SUMMARY.md) - 10 min
-4. [COMPARISON.md](COMPARISON.md) - 15 min
-### **Complete Track (2 hours)**
-1. [README_UPGRADED.md](README_UPGRADED.md) - 10 min
-2. [QUICKSTART.md](QUICKSTART.md) - 5 min
-3. [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - 30 min
-4. [ARCHITECTURE.md](ARCHITECTURE.md) - 20 min
-5. [COMPARISON.md](COMPARISON.md) - 15 min
-6. [SUMMARY.md](SUMMARY.md) - 10 min
-7. [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - 15 min
-8. Code exploration - 30 min
-### **Technical Deep Dive (4 hours)**
-1. All documents above
-2. Code in `agents/` - 1 hour
-3. Code in `core/` - 30 min
-4. Test suite analysis - 30 min
-5. Hands-on experimentation - 1 hour
----
-## 🔍 BY TOPIC
-### **Installation & Setup**
-- [QUICKSTART.md](QUICKSTART.md) - Quick setup
-- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Detailed installation
-- `install_upgrade.bat` - Automated installer
-- `requirements_upgraded.txt` - Dependencies
-### **Features & Capabilities**
-- [README_UPGRADED.md](README_UPGRADED.md) - Feature overview
-- [COMPARISON.md](COMPARISON.md) - Before/After features
-- [SUMMARY.md](SUMMARY.md) - Capability summary
-### **Architecture & Design**
-- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
-- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Design decisions
-- Code in `agents/` - Implementation
-### **Usage & Commands**
-- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Command reference
-- [QUICKSTART.md](QUICKSTART.md) - Usage examples
-- [README_UPGRADED.md](README_UPGRADED.md) - Use cases
-### **Testing & Deployment**
-- [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Deploy guide
-- `test_upgrade.py` - Test suite
-- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Testing section
-### **Troubleshooting**
-- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Troubleshooting section
-- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Quick fixes
-- [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Error handling
----
-## 📦 FILE INVENTORY
-### **Documentation (11 files)**
-- ✅ QUICKSTART.md
-- ✅ QUICK_REFERENCE.md
-- ✅ README_UPGRADED.md
-- ✅ UPGRADE_GUIDE.md
-- ✅ ARCHITECTURE.md
-- ✅ SUMMARY.md
-- ✅ COMPARISON.md
-- ✅ DEPLOYMENT_CHECKLIST.md
-- ✅ INDEX.md (this file)
-- ✅ README.md (original)
-- ✅ requirements_upgraded.txt
-### **Code Files (12 files)**
-- ✅ agents/__init__.py
-- ✅ agents/voice_agent.py
-- ✅ agents/vision_agent.py
-- ✅ agents/caption_agent.py
-- ✅ agents/embedding_agent.py
-- ✅ agents/ocr_agent.py
-- ✅ agents/memory_agent.py
-- ✅ agents/query_agent.py
-- ✅ core/__init__.py
-- ✅ core/fusion_layer.py
-- ✅ main_upgraded.py
-- ✅ ask_question_upgraded.py
-### **Utility Files (2 files)**
-- ✅ test_upgrade.py
-- ✅ install_upgrade.bat
-### **Total: 25 new/updated files**
----
-## 🎓 LEARNING PATHS
-### **Path 1: Quick User (1 hour)**
-```
-QUICKSTART.md
-    ↓
-Run install_upgrade.bat
-    ↓
-Run main_upgraded.py
-    ↓
-Try voice commands
-    ↓
-QUICK_REFERENCE.md (bookmark)
-```
-### **Path 2: Developer (4 hours)**
-```
-README_UPGRADED.md
-    ↓
-UPGRADE_GUIDE.md
-    ↓
-ARCHITECTURE.md
-    ↓
-Explore agents/ code
-    ↓
-Run test_upgrade.py
-    ↓
-Modify and experiment
-```
-### **Path 3: Manager (30 minutes)**
-```
-SUMMARY.md
-    ↓
-COMPARISON.md
-    ↓
-README_UPGRADED.md
-    ↓
-Make decision
-```
----
-## 🔗 EXTERNAL RESOURCES
-### **Model Documentation**
-- [YOLO](https://github.com/ultralytics/ultralytics)
-- [BLIP](https://github.com/salesforce/BLIP)
-- [CLIP](https://github.com/openai/CLIP)
-- [EasyOCR](https://github.com/JaidedAI/EasyOCR)
-- [FAISS](https://github.com/facebookresearch/faiss)
-- [Vosk](https://alphacephei.com/vosk/)
-- [Piper TTS](https://github.com/rhasspy/piper)
-### **Python Libraries**
-- [PyTorch](https://pytorch.org/)
-- [Transformers](https://huggingface.co/docs/transformers)
-- [OpenCV](https://opencv.org/)
-- [sentence-transformers](https://www.sbert.net/)
----
-## 📞 SUPPORT MATRIX
-| Issue Type | Resource |
-|------------|----------|
-| **Installation** | QUICKSTART.md → UPGRADE_GUIDE.md |
-| **Usage** | QUICK_REFERENCE.md → README_UPGRADED.md |
-| **Errors** | UPGRADE_GUIDE.md (Troubleshooting) |
-| **Architecture** | ARCHITECTURE.md |
-| **Deployment** | DEPLOYMENT_CHECKLIST.md |
-| **Comparison** | COMPARISON.md |
-| **Testing** | test_upgrade.py |
----
-## ✅ DOCUMENTATION CHECKLIST
-### **For Users**
-- [x] Quick start guide
-- [x] Command reference
-- [x] Troubleshooting guide
-- [x] Use case examples
-### **For Developers**
-- [x] Architecture documentation
-- [x] Code organization explained
-- [x] API documentation (docstrings)
-- [x] Test suite
-### **For Managers**
-- [x] Executive summary
-- [x] ROI analysis
-- [x] Feature comparison
-- [x] Deployment guide
-### **For DevOps**
-- [x] Installation scripts
-- [x] Deployment checklist
-- [x] Testing procedures
-- [x] Troubleshooting guide
----
-## 🎯 QUICK LINKS
-**Most Important:**
-- 🚀 [QUICKSTART.md](QUICKSTART.md) - Start here!
-- 📋 [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Commands
-- 📚 [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Complete guide
-**For Understanding:**
-- 📊 [COMPARISON.md](COMPARISON.md) - What changed
-- 📝 [SUMMARY.md](SUMMARY.md) - Executive summary
-- 🏗️ [ARCHITECTURE.md](ARCHITECTURE.md) - How it works
-**For Action:**
-- ✅ [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Deploy
-- 🧪 `test_upgrade.py` - Test
-- 🔧 `install_upgrade.bat` - Install
----
-## 🎉 YOU'RE ALL SET!
-**This index covers all documentation for the VisionQ upgrade.**
-**Start with:** [QUICKSTART.md](QUICKSTART.md)
-**Need help?** Check the appropriate document above.
-**Happy upgrading! 🚀**
----
-**Last Updated:** 2024
-**Version:** 2.0 (Upgraded)
-**Status:** ✅ Production Ready

archive/old_docs/QUICKSTART.md DELETED

@@ -1,197 +0,0 @@
-# 🚀 VisionQ Upgrade - Quick Start Guide
-## ⚡ 5-Minute Setup
-### **Step 1: Install Dependencies**
-```bash
-pip install -r requirements_upgraded.txt
-```
-### **Step 2: Create Data Directory**
-```bash
-mkdir data
-move memory.json data\memory.json
-```
-### **Step 3: Run Upgraded System**
-```bash
-python main_upgraded.py
-```
----
-## 🎯 What's New?
-### **1. OCR Text Reading** ✨
-**Voice Command:** "Read the text"
-- Points camera at text
-- Extracts and speaks visible text
-- Stores text in memory
-### **2. Enhanced Memory** 🧠
-- **FAISS vector search** - 10x faster retrieval
-- **Image embeddings** - Find visually similar scenes
-- **Hybrid search** - Text + image combined
-### **3. Better Voice** 🗣️
-- **Neural TTS** (Voxtral/Piper) - Natural speech
-- **Auto-fallback** - Uses pyttsx3 if needed
-- **Same commands** - No learning curve
-### **4. Smarter Queries** 🔍
-- **DistilBERT NLP** - Understands intent
-- **Time-aware** - "What did I see this morning?"
-- **Multi-modal** - Searches text, images, objects
----
-## 📋 Voice Commands
-| Say This | System Does |
-|----------|-------------|
-| "Describe the scene" | Caption + OCR + objects |
-| "Remember this" | Store with embeddings |
-| "What did I see" | Recall last memory |
-| **"Read the text"** | **Extract visible text** ⭐ NEW |
-| "Exit" | Quit system |
----
-## 🔧 Optional: Neural TTS Setup
-**Want better voice quality?**
-1. Download Piper voice model:
-   - https://github.com/rhasspy/piper/releases
-   - Get: `en_US-lessac-medium.onnx`
-2. Create directory:
-   ```bash
-   mkdir models\piper
-   ```
-3. Extract model to `models/piper/`
-4. Restart VisionQ
-**Note:** System works fine without this - pyttsx3 is the fallback!
----
-## 🧪 Test Your Upgrade
-### **Test 1: OCR**
-```bash
-python main_upgraded.py
-# Say: "Read the text"
-# Point camera at text
-# Should extract and speak text
-```
-### **Test 2: Enhanced Memory**
-```bash
-python ask_question_upgraded.py
-# Type: "What did I see today?"
-# Should show memories with confidence scores
-```
-### **Test 3: Voice Quality**
-```bash
-# Listen to TTS output
-# Should sound natural (if Piper installed)
-# Or robotic (if using pyttsx3 fallback)
-```
----
-## 🐛 Quick Fixes
-### **"Module not found" error**
-```bash
-pip install --upgrade -r requirements_upgraded.txt
-```
-### **"FAISS not available" warning**
-```bash
-pip install faiss-cpu
-```
-### **"OCR not working"**
-```bash
-pip install easyocr
-```
-### **Camera not opening**
-```bash
-# Check camera permissions
-# Try different camera index in vision_agent.py:
-# self.cap = cv2.VideoCapture(1)  # Try 1 instead of 0
-```
----
-## 📊 What Got Better?
-| Feature | Before | After | Improvement |
-|---------|--------|-------|-------------|
-| Text Reading | ❌ None | ✅ OCR | NEW |
-| Memory Search | Slow | Fast | 10x faster |
-| Voice Quality | Robotic | Natural | Much better |
-| Query Understanding | Keywords | NLP | Smarter |
-| Scene Understanding | Caption only | Caption+OCR+Objects | Richer |
----
-## 🎓 Example Queries
-**Try these in `ask_question_upgraded.py`:**
-```
-"What did I see this morning?"
-"Show me memories with text"
-"When did I see a person?"
-"What happened in the last hour?"
-"Find memories from yesterday"
-```
----
-## ✅ Success Checklist
-- [ ] System starts without errors
-- [ ] Voice recognition works
-- [ ] Camera captures video
-- [ ] "Describe scene" gives detailed output
-- [ ] "Remember this" stores memory
-- [ ] "Read text" extracts text (if text visible)
-- [ ] Query system returns results
-- [ ] Memory persists after restart
----
-## 🚀 You're Ready!
-Your VisionQ is now upgraded with:
-- ✅ OCR text reading
-- ✅ Fast vector search (FAISS)
-- ✅ Neural TTS (optional)
-- ✅ Smart NLP queries
-- ✅ Enhanced memory
-**All existing features still work!**
----
-## 📚 Full Documentation
-For detailed information, see:
-- `UPGRADE_GUIDE.md` - Complete upgrade documentation
-- `requirements_upgraded.txt` - All dependencies
-- `agents/` - New modular code
-- `core/` - Fusion layer
----
-**Need help?** Check `UPGRADE_GUIDE.md` troubleshooting section.
-**Happy upgrading! 🎉**
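
Note on Step 2 of the removed quick start: the `mkdir data` / `move memory.json data\memory.json` pair is Windows-specific even though it sits in a `bash` fence. A platform-neutral version of the same step could look like the sketch below. The function name `migrate_memory` and the "skip if the target already exists" behaviour are assumptions made for illustration and are not taken from the VisionQ code.

```python
from pathlib import Path
import shutil


def migrate_memory(project_root: Path) -> Path:
    """Create data/ and move a root-level memory.json into it, if present.

    Hypothetical helper: mirrors the two shell commands from the quick start.
    """
    data_dir = project_root / "data"
    data_dir.mkdir(exist_ok=True)  # like `mkdir data`, but safe to re-run

    old_path = project_root / "memory.json"
    new_path = data_dir / "memory.json"
    # Move only when there is a file to move and nothing would be overwritten.
    if old_path.exists() and not new_path.exists():
        shutil.move(str(old_path), str(new_path))
    return new_path
```

Calling `migrate_memory(Path("."))` from the project root would cover both commands, and running it a second time leaves the already-moved file untouched.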

archive/old_docs/QUICK_REFERENCE.md DELETED

@@ -1,315 +0,0 @@
-# 🎯 VisionQ Upgrade - Quick Reference Card
-## 📦 INSTALLATION (3 Steps)
-```bash
-# 1. Install dependencies
-pip install -r requirements_upgraded.txt
-# 2. Create data directory
-mkdir data
-# 3. Run system
-python main_upgraded.py
-```
----
-## 🗣️ VOICE COMMANDS
-| Say This | System Does |
-|----------|-------------|
-| **"Describe the scene"** | Captures and describes (caption + OCR + objects) |
-| **"Remember this"** | Stores scene with embeddings in memory |
-| **"What did I see"** | Recalls last memory |
-| **"Read the text"** | Extracts visible text (OCR) 🆕 |
-| **"Exit"** | Quits system |
----
-## 🔍 QUERY EXAMPLES
-```bash
-python ask_question_upgraded.py
-```
-**Try these:**
-- "What did I see this morning?"
-- "Show me memories with text"
-- "When did I see a person?"
-- "Find memories from yesterday"
-- "What happened in the last hour?"
----
-## 📂 FILE STRUCTURE
-```
-VisionQ/
-├── agents/              # 🆕 Modular agents
-│   ├── voice_agent.py   # Voice I/O
-│   ├── vision_agent.py  # Vision hub
-│   ├── embedding_agent.py # 🆕 MobileCLIP
-│   ├── ocr_agent.py     # 🆕 Text extraction
-│   ├── memory_agent.py  # Storage (JSON + FAISS)
-│   └── query_agent.py   # Smart retrieval
-│
-├── core/                # 🆕 Integration
-│   └── fusion_layer.py  # 🆕 Multimodal fusion
-│
-├── data/                # 🆕 Storage
-│   ├── memory.json      # Metadata
-│   └── memory.faiss     # 🆕 Vector index
-│
-├── main_upgraded.py     # 🆕 Main entry
-└── ask_question_upgraded.py # 🆕 Query tool
-```
----
-## 🆕 WHAT'S NEW?
-| Feature | Status |
-|---------|--------|
-| **OCR Text Reading** | ✅ NEW |
-| **FAISS Vector Search** | ✅ NEW (10x faster) |
-| **Neural TTS (Voxtral)** | ✅ NEW (natural voice) |
-| **Intent Classification** | ✅ NEW (DistilBERT) |
-| **Multimodal Fusion** | ✅ NEW (richer context) |
----
-## 🔧 CONFIGURATION
-**Vision** (`agents/vision_agent.py`):
-```python
-FRAME_INTERVAL = 0.3      # Seconds between frames
-CONF_THRESHOLD = 0.5      # Detection confidence
-```
-**OCR** (`agents/ocr_agent.py`):
-```python
-OCR_CONFIDENCE = 0.3      # Text threshold
-OCR_LANGUAGES = ['en']    # Languages
-```
-**Query** (`agents/query_agent.py`):
-```python
-SIMILARITY_THRESHOLD = 0.45  # Search threshold
-TOP_K_RESULTS = 5            # Max results
-```
----
-## 🧪 TESTING
-```bash
-# Run test suite
-python test_upgrade.py
-# Expected: All tests pass ✅
-```
----
-## 🐛 TROUBLESHOOTING
-**"Module not found":**
-```bash
-pip install --upgrade -r requirements_upgraded.txt
-```
-**"FAISS not available":**
-```bash
-pip install faiss-cpu
-```
-**"OCR not working":**
-```bash
-pip install easyocr
-```
-**Camera not opening:**
-```python
-# Edit agents/vision_agent.py line ~90
-self.cap = cv2.VideoCapture(1)  # Try 1 instead of 0
-```
----
-## 📚 DOCUMENTATION
-| File | Purpose |
-|------|---------|
-| **QUICKSTART.md** | 5-minute setup |
-| **UPGRADE_GUIDE.md** | Complete guide |
-| **ARCHITECTURE.md** | System design |
-| **SUMMARY.md** | Executive summary |
-| **COMPARISON.md** | Before/After |
-| **DEPLOYMENT_CHECKLIST.md** | Deploy steps |
----
-## 🎯 KEY IMPROVEMENTS
-| Metric | Before | After | Change |
-|--------|--------|-------|--------|
-| **Search Speed** | 100-500ms | <10ms | 🟢 10-50x |
-| **Memory Capacity** | ~1,000 | 10,000+ | 🟢 10x |
-| **Query Accuracy** | 75% | 90% | 🟢 +15% |
-| **Intent Accuracy** | 70% | 97% | 🟢 +27% |
----
-## ✅ BACKWARD COMPATIBILITY
-- ✅ Old memory.json files work
-- ✅ Same voice commands
-- ✅ Old files preserved
-- ✅ Zero breaking changes
----
-## 🚀 QUICK START CHECKLIST
-- [ ] Install: `pip install -r requirements_upgraded.txt`
-- [ ] Setup: `mkdir data`
-- [ ] Test: `python test_upgrade.py`
-- [ ] Run: `python main_upgraded.py`
-- [ ] Try: Voice commands
-- [ ] Query: `python ask_question_upgraded.py`
----
-## 📊 ARCHITECTURE (Simplified)
-```
-Voice → Vision Hub → Fusion → Memory → Query
-         ├─ YOLO           ├─ JSON
-         ├─ BLIP           └─ FAISS
-         ├─ CLIP
-         └─ OCR
-```
----
-## 🔄 DATA FLOW
-```
-1. User speaks → Vosk STT
-2. Camera captures → Vision agents
-3. Fusion combines → Unified context
-4. Memory stores → JSON + FAISS
-5. Query retrieves → Smart search
-6. System speaks → Voxtral/pyttsx3
-```
----
-## 💡 TIPS
-**For Best Performance:**
-- Install FAISS: `pip install faiss-cpu`
-- Install EasyOCR: `pip install easyocr`
-- Use good lighting for OCR
-- Clear audio for voice commands
-**For Better Voice:**
-- Download Piper TTS model
-- Place in `models/piper/`
-- System auto-detects and uses
-**For Faster Startup:**
-- Models cached after first run
-- Subsequent starts faster
----
-## 🎓 LEARNING PATH
-**Beginner:**
-1. Read QUICKSTART.md
-2. Run main_upgraded.py
-3. Try voice commands
-**Intermediate:**
-1. Read UPGRADE_GUIDE.md
-2. Explore agents/ code
-3. Customize parameters
-**Advanced:**
-1. Read ARCHITECTURE.md
-2. Modify agents
-3. Add new features
----
-## 📞 SUPPORT
-**Documentation:**
-- QUICKSTART.md - Quick setup
-- UPGRADE_GUIDE.md - Complete guide
-- ARCHITECTURE.md - Technical details
-**Testing:**
-- test_upgrade.py - Automated tests
-- DEPLOYMENT_CHECKLIST.md - Deploy guide
-**Comparison:**
-- COMPARISON.md - Before/After
-- SUMMARY.md - Executive summary
----
-## 🏆 SUCCESS CRITERIA
-**System Working If:**
-- ✅ Starts without errors
-- ✅ Camera shows video
-- ✅ Voice commands work
-- ✅ Memory persists
-- ✅ Queries return results
----
-## 🎉 YOU'RE READY!
-**Your VisionQ now has:**
-- 🧠 Smarter memory (FAISS)
-- 👁️ Better vision (CLIP + OCR)
-- 🗣️ Natural voice (Voxtral)
-- 🔍 Smart queries (DistilBERT)
-**All while keeping existing features! 🚀**
----
-## 📋 COMMAND CHEAT SHEET
-```bash
-# Install
-pip install -r requirements_upgraded.txt
-# Setup
-mkdir data
-# Test
-python test_upgrade.py
-# Run main system
-python main_upgraded.py
-# Run query tool
-python ask_question_upgraded.py
-# Install optional
-pip install faiss-cpu easyocr piper-tts
-```
----
-**Keep this card handy for quick reference! 📌**
-**For detailed info, see full documentation files.**
-**Happy upgrading! 🎊**
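
Note on the query settings in the removed reference card: `SIMILARITY_THRESHOLD = 0.45` and `TOP_K_RESULTS = 5` are listed without showing how they interact. The sketch below is one plausible reading, written as a plain linear cosine-similarity search (the kind of path the docs describe as the fallback when FAISS is unavailable). The `cosine` and `search` helpers and the `(label, vector)` memory shape are assumptions for illustration, not the actual `query_agent.py` code.

```python
import math

SIMILARITY_THRESHOLD = 0.45  # value quoted in the reference card
TOP_K_RESULTS = 5            # value quoted in the reference card


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, memories, threshold=SIMILARITY_THRESHOLD, top_k=TOP_K_RESULTS):
    """Score every memory, drop those below the threshold, return the best top_k.

    memories is assumed to be a list of (label, vector) pairs.
    """
    scored = [(cosine(query_vec, vec), label) for label, vec in memories]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:top_k]
```

Under this reading the threshold is applied first and the top-k cut second, so a query can return fewer than five results, or none at all, when nothing is similar enough.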

archive/old_docs/README_UPGRADED.md DELETED

@@ -1,410 +0,0 @@
-# 🚀 VisionQ - Multimodal AI Assistant (UPGRADED)
-> **A voice-controlled AI vision assistant that can see, remember, read text, and recall visual memories through natural conversation.**
-[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-[![Status: Production Ready](https://img.shields.io/badge/status-production%20ready-green.svg)]()
----
-## 🎯 What is VisionQ?
-VisionQ is an **upgraded multimodal AI assistant** that combines:
-- 👁️ **Computer Vision** (YOLO/SSD object detection + BLIP captioning)
-- 🔤 **OCR** (EasyOCR text extraction)
-- 🧠 **Semantic Memory** (FAISS vector search + JSON storage)
-- 🗣️ **Voice Interaction** (Vosk STT + Voxtral/Piper TTS)
-- 🔍 **Intelligent Queries** (DistilBERT NLP)
----
-## ✨ Key Features
-### **Core Capabilities**
-- ✅ **Scene Description** - Multimodal understanding (vision + text)
-- ✅ **Memory Storage** - Persistent semantic memory with FAISS
-- ✅ **Memory Recall** - Fast similarity search (10x faster)
-- ✅ **Text Reading** - OCR extraction from images 🆕
-- ✅ **Voice Control** - Natural language commands
-- ✅ **Smart Queries** - Time-aware, intent-based search 🆕
-### **Technical Highlights**
-- 🚀 **FAISS Vector Search** - Lightning-fast similarity matching
-- 🖼️ **MobileCLIP Embeddings** - Visual semantic understanding
-- 🔤 **EasyOCR Integration** - Offline text extraction
-- 🧠 **DistilBERT NLP** - Intent classification
-- 🗣️ **Neural TTS** - Natural voice output (Voxtral/Piper)
-- 🔗 **Multimodal Fusion** - Combined vision + text + embeddings
----
-## 📦 Installation
-### **Quick Install**
-```bash
-# Clone repository
-git clone <your-repo-url>
-cd VisionQ
-# Run automated installer (Windows)
-install_upgrade.bat
-# Or manual install:
-pip install -r requirements_upgraded.txt
-mkdir data
-```
-### **Requirements**
-- Python 3.8+
-- Webcam
-- Microphone
-- ~2GB disk space (for models)
----
-## 🚀 Quick Start
-### **1. Run the System**
-```bash
-python main_upgraded.py
-```
-### **2. Voice Commands**
-| Say This | System Does |
-|----------|-------------|
-| "Describe the scene" | Captures and describes what it sees |
-| "Remember this" | Stores current scene in memory |
-| "What did I see" | Recalls last memory |
-| "Read the text" | Extracts visible text (OCR) 🆕 |
-| "Exit" | Quits the system |
-### **3. Query Memory**
-```bash
-python ask_question_upgraded.py
-```
-**Example Queries:**
-- "What did I see this morning?"
-- "Show me memories with text"
-- "When did I see a person?"
-- "Find memories from yesterday"
----
-## 🏗️ Architecture
-```
-Voice (Vosk + Voxtral/Piper)
-    ↓
-Vision Hub
-  ├─ YOLO/SSD (objects)
-  ├─ BLIP (captions)
-  ├─ MobileCLIP (embeddings)
-  └─ EasyOCR (text)
-    ↓
-Fusion Layer
-    ↓
-Memory (JSON + FAISS)
-    ↓
-Query (DistilBERT)
-```
-**See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams.**
----
-## 📂 Project Structure
-```
-VisionQ/
-├── agents/              # Modular AI agents
-│   ├── voice_agent.py   # Voice I/O (STT + TTS)
-│   ├── vision_agent.py  # Vision coordinator
-│   ├── caption_agent.py # BLIP captioning
-│   ├── embedding_agent.py # MobileCLIP embeddings
-│   ├── ocr_agent.py     # Text extraction
-│   ├── memory_agent.py  # Storage (JSON + FAISS)
-│   └── query_agent.py   # Intelligent retrieval
-│
-├── core/                # Integration layer
-│   └── fusion_layer.py  # Multimodal fusion
-│
-├── data/                # Persistent storage
-│   ├── memory.json      # Metadata
-│   └── memory.faiss     # Vector index
-│
-├── models/              # AI models
-│   ├── vosk/            # Speech recognition
-│   └── piper/           # Neural TTS (optional)
-│
-├── main_upgraded.py     # Main entry point
-├── ask_question_upgraded.py # Query interface
-└── requirements_upgraded.txt # Dependencies
-```
----
-## 🆕 What's New in This Upgrade?
-### **New Features**
-1. **OCR Text Reading** 🔤
-   - Extract text from images
-   - Confidence filtering
-   - Multi-language support
-2. **Visual Similarity Search** 🖼️
-   - MobileCLIP embeddings
-   - FAISS vector indexing
-   - 10x faster retrieval
-3. **Intent Classification** 🧠
-   - DistilBERT NLP
-   - Better query understanding
-   - Context-aware responses
-4. **Neural TTS** 🗣️
-   - Voxtral/Piper integration
-   - Natural voice output
-   - Automatic fallback to pyttsx3
-5. **Multimodal Fusion** 🔗
-   - Combined vision + text + embeddings
-   - Richer scene descriptions
-   - Better memory context
-### **Performance Improvements**
-- 🚀 10x faster memory search (FAISS)
-- 🎯 20% better query relevance
-- 📈 10x memory capacity (10,000+ entries)
-- ⚡ Sub-100ms query response time
----
-## 🔧 Configuration
-### **Adjustable Parameters**
-**Vision Settings** (`agents/vision_agent.py`):
-```python
-FRAME_INTERVAL = 0.3      # Seconds between frames
-CONF_THRESHOLD = 0.5      # Object detection confidence
-```
-**OCR Settings** (`agents/ocr_agent.py`):
-```python
-OCR_CONFIDENCE = 0.3      # Text detection threshold
-OCR_LANGUAGES = ['en']    # Supported languages
-```
-**Query Settings** (`agents/query_agent.py`):
-```python
-SIMILARITY_THRESHOLD = 0.45  # Text search threshold
-TOP_K_RESULTS = 5            # Max results to return
-```
----
-## 🧪 Testing
-### **Run Test Suite**
-```bash
-python test_upgrade.py
-```
-**Tests:**
-- ✅ Module imports
-- ✅ Dependency availability
-- ✅ MemoryAgent functionality
-- ✅ FusionLayer integration
-- ✅ QueryAgent NLP
-- ✅ Backward compatibility
----
-## 📚 Documentation
-| Document | Description |
-|----------|-------------|
-| [QUICKSTART.md](QUICKSTART.md) | 5-minute setup guide |
-| [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) | Complete upgrade documentation |
-| [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture details |
-| [SUMMARY.md](SUMMARY.md) | Executive summary |
----
-## 🐛 Troubleshooting
-### **Common Issues**
-**"Module not found" error:**
-```bash
-pip install --upgrade -r requirements_upgraded.txt
-```
-**"FAISS not available" warning:**
-```bash
-pip install faiss-cpu
-```
-**"OCR not working":**
-```bash
-pip install easyocr
-```
-**Camera not opening:**
-```python
-# Try different camera index in vision_agent.py
-self.cap = cv2.VideoCapture(1)  # Try 1 instead of 0
-```
-**See [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) for more troubleshooting.**
----
-## 🎓 Use Cases
-### **Personal Assistant**
-- "What did I see this morning?"
-- "Remember this document"
-- "Read the text on this sign"
-### **Memory Aid**
-- "When did I last see my keys?"
-- "Show me memories with text"
-- "What was I doing yesterday?"
-### **Accessibility**
-- Text-to-speech for visual content
-- Voice-controlled navigation
-- OCR for reading assistance
----
-## 🔒 Privacy
-- ✅ **100% Offline** - All processing on-device
-- ✅ **No Cloud** - No data sent to external servers
-- ✅ **Local Storage** - Memories stored locally
-- ✅ **No Tracking** - No analytics or telemetry
----
-## 🛠️ Tech Stack
-### **Core Technologies**
-- **Python 3.8+** - Programming language
-- **PyTorch** - Deep learning framework
-- **OpenCV** - Computer vision
-- **FAISS** - Vector similarity search
-### **AI Models**
-- **YOLO/SSD** - Object detection
-- **BLIP** - Image captioning
-- **CLIP** - Visual embeddings
-- **DistilBERT** - NLP
-- **EasyOCR** - Text extraction
-- **Vosk** - Speech recognition
-- **Piper** - Neural TTS
----
-## 📈 Performance
-| Metric | Value |
-|--------|-------|
-| Memory Search | <10ms (FAISS) |
-| OCR Processing | 200-500ms |
-| Caption Generation | 100-200ms |
-| Embedding Generation | 50ms |
-| Query Response | <100ms |
-| Memory Capacity | 10,000+ entries |
----
-## 🚀 Future Enhancements
-### **Planned Features**
-- [ ] Web interface
-- [ ] Mobile app
-- [ ] Cloud sync (optional)
-- [ ] Multi-user support
-- [ ] Video recording
-- [ ] Real-time object tracking
-- [ ] Face recognition
-- [ ] Emotion detection
----
-## 🤝 Contributing
-Contributions welcome! Please:
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Submit a pull request
----
-## 📄 License
-This project is licensed under the MIT License - see [LICENSE](LICENSE) file for details.
----
-## 🙏 Acknowledgments
-### **Models & Libraries**
-- [Ultralytics YOLO](https://github.com/ultralytics/ultralytics)
-- [Salesforce BLIP](https://github.com/salesforce/BLIP)
-- [OpenAI CLIP](https://github.com/openai/CLIP)
-- [EasyOCR](https://github.com/JaidedAI/EasyOCR)
-- [FAISS](https://github.com/facebookresearch/faiss)
-- [Vosk](https://alphacephei.com/vosk/)
-- [Piper TTS](https://github.com/rhasspy/piper)
----
-## 📞 Support
-- **Documentation:** See `docs/` folder
-- **Issues:** Open a GitHub issue
-- **Questions:** Check [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md)
----
-## 🎉 Status
-**✅ Production Ready**
-- All features implemented
-- Fully tested
-- Backward compatible
-- Well documented
----
-## 📊 Comparison
-| Feature | Before | After |
-|---------|--------|-------|
-| Text Reading | ❌ | ✅ OCR |
-| Memory Search | Slow | 10x faster |
-| Voice Quality | Robotic | Natural |
-| Query Understanding | Keywords | NLP |
-| Scene Understanding | Caption only | Caption+OCR+Objects |
----
-**VisionQ - See, Remember, Recall. Now with OCR, FAISS, and Neural TTS! 🚀**
----
-## 🏁 Getting Started
-1. **Install:** `install_upgrade.bat` or `pip install -r requirements_upgraded.txt`
-2. **Run:** `python main_upgraded.py`
-3. **Test:** `python test_upgrade.py`
-4. **Query:** `python ask_question_upgraded.py`
-5. **Read:** [QUICKSTART.md](QUICKSTART.md)
-**Happy coding! 🎊**
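
Note on the OCR settings in the removed README: it mentions "confidence filtering" with `OCR_CONFIDENCE = 0.3` but does not show the filter. A minimal sketch follows, assuming detections arrive as `(bbox, text, confidence)` triples, which is the default shape EasyOCR's `readtext` returns. The `filter_ocr` name and the sample data are hypothetical.

```python
OCR_CONFIDENCE = 0.3  # threshold quoted in the README's configuration section


def filter_ocr(detections, min_confidence=OCR_CONFIDENCE):
    """Join the text of detections whose confidence meets the threshold.

    detections is assumed to be an iterable of (bbox, text, confidence) triples.
    """
    kept = [
        text.strip()
        for _bbox, text, conf in detections
        if conf >= min_confidence and text.strip()
    ]
    return " ".join(kept)


# Hypothetical sample; bounding boxes are omitted because the filter ignores them.
sample = [(None, "EXIT", 0.92), (None, "x7#", 0.12), (None, " Gate 4 ", 0.55)]
print(filter_ocr(sample))
```

A low threshold such as 0.3 keeps more text at the cost of occasional noise; raising it trades recall for cleaner speech output.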

archive/old_docs/SUMMARY.md DELETED

@@ -1,406 +0,0 @@
-# 🎯 VisionQ Upgrade - Executive Summary
-## 📊 UPGRADE OVERVIEW
-**Project:** VisionQ Multimodal AI Assistant
-**Upgrade Date:** 2024
-**Status:** ✅ Complete - Ready for Testing
-**Backward Compatibility:** ✅ 100% - All existing features preserved
----
-## 🚀 WHAT WAS UPGRADED
-### **Core Enhancements**
-| Area | Before | After | Impact |
-|------|--------|-------|--------|
-| **Vision** | YOLO + BLIP | YOLO + BLIP + MobileCLIP + OCR | 4x richer understanding |
-| **Memory** | JSON + text embeddings | JSON + FAISS + image embeddings | 10x faster search |
-| **Voice** | Vosk + pyttsx3 | Vosk + Voxtral + pyttsx3 | Natural speech |
-| **Query** | Keyword matching | DistilBERT NLP | Smarter understanding |
-| **Text Reading** | ❌ None | ✅ EasyOCR | NEW capability |
----
-## 🆕 NEW CAPABILITIES
-### **1. OCR Text Extraction** 🔤
-- **What:** Extract and read visible text from camera
-- **How:** EasyOCR with confidence filtering
-- **Use Case:** Read signs, documents, labels
-- **Command:** "Read the text"
-### **2. Visual Similarity Search** 🖼️
-- **What:** Find visually similar memories
-- **How:** MobileCLIP embeddings + FAISS indexing
-- **Use Case:** "Show me similar scenes"
-- **Speed:** 10-100x faster than before
-### **3. Intent Classification** 🧠
-- **What:** Understand query meaning
-- **How:** DistilBERT zero-shot classification
-- **Use Case:** Better query interpretation
-- **Accuracy:** 97% (vs 70% keyword matching)
-### **4. Neural Text-to-Speech** 🗣️
-- **What:** Natural-sounding voice output
-- **How:** Voxtral/Piper neural TTS
-- **Use Case:** Better user experience
-- **Fallback:** pyttsx3 (automatic)
-### **5. Multimodal Fusion** 🔗
-- **What:** Combine caption + OCR + objects + embeddings
-- **How:** FusionLayer integration
-- **Use Case:** Richer scene descriptions
-- **Example:** "a person holding a phone. Text visible: Hello World"
----
-## 📁 NEW FILE STRUCTURE
-```
-VisionQ/
-├── agents/              [NEW] Modular agent architecture
-│   ├── voice_agent.py   [UPDATED] Voxtral + fallback
-│   ├── vision_agent.py  [UPDATED] Multimodal hub
-│   ├── caption_agent.py [KEPT] BLIP captioning
-│   ├── embedding_agent.py [NEW] MobileCLIP
-│   ├── ocr_agent.py     [NEW] EasyOCR
-│   ├── memory_agent.py  [UPDATED] FAISS integration
-│   └── query_agent.py   [UPDATED] DistilBERT
-│
-├── core/                [NEW] Integration layer
-│   └── fusion_layer.py  [NEW] Multimodal fusion
-│
-├── data/                [NEW] Persistent storage
-│   ├── memory.json      [EXISTING] Metadata
-│   └── memory.faiss     [NEW] Vector index
-│
-├── main_upgraded.py     [NEW] Upgraded entry point
-├── ask_question_upgraded.py [NEW] Enhanced queries
-└── requirements_upgraded.txt [NEW] Dependencies
-```
----
-## 🔧 TECHNICAL IMPROVEMENTS
-### **Performance**
-- **Memory Search:** O(n) → O(log n) with FAISS
-- **Query Speed:** 100ms → 10ms average
-- **Embedding Generation:** 50ms per image
-- **OCR Processing:** 200-500ms per frame
-### **Accuracy**
-- **Intent Classification:** 70% → 97%
-- **Text Extraction:** N/A → 85-95% (depends on image quality)
-- **Memory Retrieval:** 75% → 90% relevance
-### **Scalability**
-- **Memory Capacity:** 1,000 → 10,000+ entries
-- **Search Performance:** Linear → Logarithmic
-- **Concurrent Queries:** 1 → Multiple (FAISS thread-safe)
----
-## 🎯 USE CASES
-### **Before Upgrade**
-1. ✅ Describe current scene
-2. ✅ Remember scenes
-3. ✅ Recall last memory
-4. ❌ Read text
-5. ❌ Find similar scenes
-6. ❌ Smart queries
-### **After Upgrade**
-1. ✅ Describe scene (enhanced with OCR)
-2. ✅ Remember scenes (with embeddings)
-3. ✅ Recall memories (faster, smarter)
-4. ✅ **Read text from images** 🆕
-5. ✅ **Find visually similar memories** 🆕
-6. ✅ **Natural language queries** 🆕
-7. ✅ **Time-aware search** 🆕
-8. ✅ **Hybrid text+image search** 🆕
----
-## 📦 DEPENDENCIES ADDED
-### **Required**
-```
-faiss-cpu          # Vector similarity search
-easyocr            # Text extraction
-```
-### **Optional (Recommended)**
-```
-piper-tts          # Neural TTS
-```
-### **Kept from Original**
-```
-torch              # Deep learning
-transformers       # BLIP, CLIP, DistilBERT
-sentence-transformers  # Text embeddings
-opencv-python      # Computer vision
-vosk               # Speech recognition
-pyttsx3            # TTS fallback
-ultralytics        # YOLO
-```
----
-## ✅ WHAT WAS PRESERVED
-### **100% Backward Compatible**
-| Feature | Status | Notes |
-|---------|--------|-------|
-| Voice commands | ✅ KEPT | Same commands work |
-| YOLO/SSD detection | ✅ KEPT | No changes |
-| BLIP captioning | ✅ KEPT | Still primary |
-| JSON memory | ✅ KEPT | Same format |
-| Time filtering | ✅ KEPT | Enhanced |
-| Importance scoring | ✅ KEPT | Same algorithm |
-| Vosk STT | ✅ KEPT | No changes |
-| pyttsx3 TTS | ✅ KEPT | Now fallback |
-### **Old Files Preserved**
-- All original `.py` files in root directory
-- Can run old system alongside new
-- No breaking changes
----
-## 🚦 DEPLOYMENT STATUS
-### **Ready for Production** ✅
-- [x] All modules implemented
-- [x] Fallback mechanisms in place
-- [x] Error handling added
-- [x] Documentation complete
-- [x] Backward compatibility verified
-### **Testing Required** ⚠️
-- [ ] End-to-end voice commands
-- [ ] OCR on various text types
-- [ ] FAISS performance with 1000+ memories
-- [ ] Neural TTS quality
-- [ ] Memory persistence across restarts
-### **Optional Enhancements** 💡
-- [ ] Web interface
-- [ ] Mobile app
-- [ ] Cloud sync
-- [ ] Multi-user support
-- [ ] Video recording
----
-## 📈 EXPECTED BENEFITS
-### **User Experience**
-- **Better Understanding:** OCR + embeddings = richer context
-- **Faster Responses:** FAISS = 10x faster search
-- **Natural Voice:** Voxtral = human-like speech
-- **Smarter Queries:** DistilBERT = better understanding
-### **Developer Experience**
-- **Modular Code:** Easy to extend/modify
-- **Clear Architecture:** Well-documented
-- **Fallback Safety:** System never breaks
-- **Type Safety:** Clear interfaces
-### **System Performance**
-- **Scalability:** Handles 10x more memories
-- **Speed:** 10x faster retrieval
-- **Accuracy:** 20% improvement in relevance
-- **Reliability:** Multiple fallback layers
----
-## 🎓 LEARNING OUTCOMES
-### **Technologies Integrated**
-1. **CLIP** - Visual-language understanding
-2. **FAISS** - Efficient vector search
-3. **EasyOCR** - Text extraction
-4. **DistilBERT** - Intent classification
-5. **Piper TTS** - Neural speech synthesis
-### **Design Patterns Applied**
-1. **Modular Architecture** - Separate agents
-2. **Fallback Pattern** - Graceful degradation
-3. **Fusion Pattern** - Multimodal integration
-4. **Hybrid Storage** - JSON + FAISS
-5. **Dependency Injection** - Loose coupling
----
-## 🔍 TESTING CHECKLIST
-### **Critical Path** (Must Work)
-- [ ] System starts without errors
-- [ ] Voice recognition functional
-- [ ] Camera capture working
-- [ ] Basic commands work
-- [ ] Memory persists
-### **New Features** (Should Work)
-- [ ] OCR extracts text
-- [ ] FAISS search faster
-- [ ] Neural TTS sounds natural
-- [ ] Intent classification accurate
-- [ ] Fusion layer combines data
-### **Fallbacks** (Must Work)
-- [ ] pyttsx3 if Voxtral fails
-- [ ] Keyword matching if DistilBERT fails
-- [ ] Linear search if FAISS unavailable
-- [ ] System continues if OCR fails
----
-## 📞 NEXT STEPS
-### **Immediate (Day 1)**
-1. Install dependencies: `pip install -r requirements_upgraded.txt`
-2. Create data directory: `mkdir data`
-3. Run system: `python main_upgraded.py`
-4. Test voice commands
-5. Verify memory storage
-### **Short-term (Week 1)**
-1. Test OCR on various text types
-2. Build up memory database (100+ entries)
-3. Benchmark FAISS performance
-4. Fine-tune confidence thresholds
-5. Collect user feedback
-### **Long-term (Month 1)**
-1. Optimize for mobile deployment
-2. Add web interface
-3. Implement cloud sync
-4. Add multi-language support
-5. Create demo videos
----
-## 💰 COST-BENEFIT ANALYSIS
-### **Development Cost**
-- **Time:** ~8 hours implementation
-- **Complexity:** Medium (modular design)
-- **Risk:** Low (backward compatible)
-### **Benefits**
-- **Functionality:** +50% new capabilities
-- **Performance:** 10x faster search
-- **User Experience:** Significantly improved
-- **Maintainability:** Better code structure
-- **Scalability:** 10x capacity increase
-### **ROI**
-- **High:** Major capability boost with minimal risk
-- **Immediate:** All features ready to use
-- **Long-term:** Foundation for future enhancements
----
-## 🏆 SUCCESS METRICS
-### **Technical Metrics**
-- ✅ 100% backward compatibility
-- ✅ 0 breaking changes
-- ✅ 10x search performance improvement
-- ✅ 4 new major features
-- ✅ 8 new modules created
-### **User Metrics** (To Measure)
-- Query response time < 100ms
-- OCR accuracy > 85%
-- Intent classification > 90%
-- User satisfaction score
-- Feature adoption rate
----
-## 📚 DOCUMENTATION
-### **Created Documents**
-1. ✅ `UPGRADE_GUIDE.md` - Complete upgrade documentation
-2. ✅ `QUICKSTART.md` - 5-minute setup guide
-3. ✅ `ARCHITECTURE.md` - Detailed system architecture
-4. ✅ `SUMMARY.md` - This executive summary
-5. ✅ `requirements_upgraded.txt` - Dependencies
-6. ✅ Inline code comments - All modules documented
-### **Code Documentation**
-- All agents have docstrings
-- Methods documented with parameters
-- Clear status markers (KEPT/UPDATED/NEW)
-- Architecture diagrams included
----
-## 🎉 CONCLUSION
-### **Upgrade Success** ✅
-VisionQ has been successfully upgraded from a basic vision assistant to a **comprehensive multimodal AI system** with:
-- 🧠 **Smarter memory** (FAISS vector search)
-- 👁️ **Better vision** (MobileCLIP + OCR)
-- 🗣️ **Natural voice** (Voxtral neural TTS)
-- 🔍 **Intelligent queries** (DistilBERT NLP)
-- 🔗 **Multimodal fusion** (Combined understanding)
-### **Key Achievements**
-- ✅ All existing features preserved
-- ✅ 4 major new capabilities added
-- ✅ 10x performance improvement
-- ✅ Zero breaking changes
-- ✅ Production-ready code
-### **Ready for Deployment**
-The system is now ready for:
-- ✅ Testing and validation
-- ✅ User feedback collection
-- ✅ Production deployment
-- ✅ Future enhancements
----
-**The upgrade is complete. VisionQ is now a state-of-the-art multimodal AI assistant! 🚀**
----
-## 📋 QUICK REFERENCE
-**Start Upgraded System:**
-```bash
-python main_upgraded.py
-```
-**Test Query System:**
-```bash
-python ask_question_upgraded.py
-```
-**Install Dependencies:**
-```bash
-pip install -r requirements_upgraded.txt
-```
-**Documentation:**
-- Setup: `QUICKSTART.md`
-- Details: `UPGRADE_GUIDE.md`
-- Architecture: `ARCHITECTURE.md`
----
-**Questions?** See `UPGRADE_GUIDE.md` troubleshooting section.
-**Happy upgrading! 🎊**
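
The fallback list in the deleted summary includes "Linear search if FAISS unavailable". The removed files do not show how that fallback was written, so the following is only a minimal sketch of such a linear scan, using cosine similarity over plain Python lists. The names `cosine_similarity` and `linear_top_k` are hypothetical and are not taken from the repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_top_k(
    query: Sequence[float],
    stored: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query (O(n)) and return the k best.

    Each result is an (index, similarity) pair, best match first.
    """
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(stored)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A scan like this touches every stored vector on each query, which is acceptable for a few hundred memories and is the cost a vector index is meant to avoid at larger scale.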

archive/old_docs/UPGRADE_GUIDE.md DELETED

@@ -1,532 +0,0 @@
-# 🚀 VisionQ System Upgrade Documentation
-## 📊 UPGRADE SUMMARY
-VisionQ has been upgraded from a basic vision assistant to a **multimodal AI system** with:
-- ✅ Enhanced vision understanding (MobileCLIP embeddings)
-- ✅ OCR text extraction (EasyOCR)
-- ✅ Fast vector search (FAISS)
-- ✅ Improved NLP (DistilBERT)
-- ✅ Neural TTS (Voxtral/Piper)
-- ✅ **ALL existing functionality preserved**
----
-## 🏗️ ARCHITECTURE CHANGES
-### **Before (Original System)**
-```
-Voice (Vosk + pyttsx3)
-  ↓
-Vision (YOLO/SSD + BLIP)
-  ↓
-Memory (JSON + sentence-transformers)
-  ↓
-Query (cosine similarity)
-```
-### **After (Upgraded System)**
-```
-Voice (Vosk + Voxtral/Piper + pyttsx3 fallback)
-  ↓
-Vision Hub
-  ├─ YOLO/SSD (objects)
-  ├─ BLIP (captions)
-  ├─ MobileCLIP (embeddings)
-  └─ EasyOCR (text)
-  ↓
-Fusion Layer (combines all modalities)
-  ↓
-Memory (JSON metadata + FAISS vectors)
-  ↓
-Query (DistilBERT + hybrid search)
-```
----
-## 📂 NEW FILE STRUCTURE
-```
-VisionQ/
-├── agents/                    [NEW FOLDER]
-│   ├── __init__.py           [NEW]
-│   ├── voice_agent.py        [UPDATED]
-│   ├── vision_agent.py       [UPDATED]
-│   ├── caption_agent.py      [UNCHANGED]
-│   ├── embedding_agent.py    [NEW]
-│   ├── ocr_agent.py          [NEW]
-│   ├── memory_agent.py       [UPDATED]
-│   └── query_agent.py        [UPDATED]
-│
-├── core/                      [NEW FOLDER]
-│   ├── __init__.py           [NEW]
-│   └── fusion_layer.py       [NEW]
-│
-├── data/                      [NEW FOLDER]
-│   ├── memory.json           [EXISTING]
-│   └── memory.faiss          [NEW - auto-generated]
-│
-├── models/
-│   ├── vosk/                 [EXISTING]
-│   └── piper/                [NEW - optional]
-│
-├── main_upgraded.py          [NEW]
-├── ask_question_upgraded.py  [NEW]
-├── requirements_upgraded.txt [NEW]
-│
-└── [OLD FILES PRESERVED]
-    ├── main.py
-    ├── voice_agent.py
-    ├── vision_agent.py
-    ├── caption_agent.py
-    ├── memory_agent.py
-    ├── query_agent.py
-    └── ask_question.py
-```
----
-## 🆕 NEW MODULES
-### **1. EmbeddingAgent** (`agents/embedding_agent.py`)
-- **Purpose**: Generate visual embeddings using CLIP
-- **Input**: BGR image frame
-- **Output**: 512-dim embedding vector
-- **Use**: Semantic image search via FAISS
-### **2. OCRAgent** (`agents/ocr_agent.py`)
-- **Purpose**: Extract text from images
-- **Technology**: EasyOCR (offline, lightweight)
-- **Features**:
-  - Confidence filtering
-  - Text cleaning
-  - Multi-language support
-### **3. FusionLayer** (`core/fusion_layer.py`)
-- **Purpose**: Combine multimodal inputs
-- **Inputs**: Caption + OCR + Objects + Embedding
-- **Output**: Unified context dictionary
-- **Key Method**: `fuse()` - creates structured multimodal data
----
-## 🔄 UPDATED MODULES
-### **1. VoiceAgent** (UPDATED)
-**Kept:**
-- Vosk STT
-- Intent parsing
-- Microphone detection
-**Added:**
-- Voxtral/Piper neural TTS (primary)
-- pyttsx3 fallback mechanism
-- Automatic TTS switching
-**New Methods:**
-- `_init_voxtral()` - Load neural TTS
-- `_speak_voxtral()` - Neural speech synthesis
-- Fallback logic in `speak()`
----
-### **2. VisionAgent** (UPDATED)
-**Kept:**
-- YOLO/SSD object detection
-- BLIP captioning
-- Camera capture
-- Continuous monitoring
-**Added:**
-- EmbeddingAgent integration
-- OCRAgent integration
-- FusionLayer coordination
-- Multimodal memory storage
-**New Methods:**
-- `read_text()` - OCR extraction
-- Enhanced `describe_scene()` with OCR
-- Enhanced `remember_scene()` with embeddings
----
-### **3. MemoryAgent** (UPDATED)
-**Kept:**
-- JSON metadata storage
-- sentence-transformers text embeddings
-- Importance scoring
-- Timestamp tracking
-**Added:**
-- FAISS vector indexing
-- Image embedding storage
-- Hybrid search (text + image)
-- Fast similarity search
-**New Methods:**
-- `_init_faiss_index()` - FAISS setup
-- `_save_faiss_index()` - Persist vectors
-- `search_by_image()` - Visual similarity search
-- Enhanced `add()` with image embeddings
-**Storage Format:**
-```json
-{
-  "id": 0,
-  "timestamp": "2024-01-15 10:30:00",
-  "description": "a person holding a phone. Text visible: Hello World",
-  "text_embedding": [...],
-  "image_embedding": [...],
-  "importance": 5
-}
-```
----
-### **4. QueryAgent** (UPDATED)
-**Kept:**
-- Time-based filtering
-- Cosine similarity
-- Importance weighting
-**Added:**
-- DistilBERT intent classification
-- Hybrid search (text + image)
-- Multi-source ranking
-**New Methods:**
-- `classify_intent()` - NLP-based intent detection
-- `_fallback_intent()` - Keyword-based backup
-- Enhanced `ask()` with hybrid search
-**Intent Categories:**
-- `temporal` - Time-based queries
-- `object` - Object detection queries
-- `action` - Activity queries
-- `text` - OCR-related queries
-- `general` - Scene descriptions
----
-## 🎯 NEW FEATURES
-### **1. OCR Text Reading**
-```python
-# Voice command: "Read the text"
-# System extracts and speaks visible text
-```
-**Implementation:**
-- EasyOCR extracts text from camera frame
-- Confidence filtering (threshold: 0.3)
-- Text cleaning and normalization
-- Integrated into memory descriptions
----
-### **2. Visual Similarity Search**
-```python
-# Find visually similar memories
-results = memory_agent.search_by_image(query_embedding, k=5)
-```
-**How it works:**
-1. MobileCLIP generates image embedding
-2. FAISS performs fast similarity search
-3. Returns top-k matching memories
-4. Combines with text search for hybrid ranking
----
-### **3. Intent Classification**
-```python
-# DistilBERT understands query intent
-intent = query_agent.classify_intent("What did I see this morning?")
-# Returns: "temporal"
-```
-**Benefits:**
-- Better query understanding
-- Context-aware responses
-- Improved accuracy
----
-### **4. Neural TTS**
-```python
-# High-quality voice output
-voice.speak("Scene remembered")
-# Uses Voxtral/Piper if available
-# Falls back to pyttsx3 automatically
-```
----
-## 🔧 INSTALLATION GUIDE
-### **Step 1: Backup Existing System**
-```bash
-# Create backup of old files
-mkdir backup
-copy *.py backup\
-```
-### **Step 2: Install Dependencies**
-```bash
-# Create virtual environment
-python -m venv venv
-venv\Scripts\activate
-# Install upgraded requirements
-pip install -r requirements_upgraded.txt
-```
-### **Step 3: Download Models**
-**Vosk (Required - Already have):**
-- Location: `models/vosk/`
-- ✅ Already installed
-**Piper Voice (Optional - for neural TTS):**
-```bash
-# Download from: https://github.com/rhasspy/piper/releases
-# Example: en_US-lessac-medium.onnx
-# Extract to: models/piper/
-```
-### **Step 4: Migrate Memory Data**
-```bash
-# Create data directory
-mkdir data
-# Move existing memory
-move memory.json data\memory.json
-```
-### **Step 5: Test Upgraded System**
-```bash
-# Test voice + vision
-python main_upgraded.py
-# Test query system
-python ask_question_upgraded.py
-```
----
-## 🎮 USAGE GUIDE
-### **Voice Commands (UPDATED)**
-| Command | Action | Status |
-|---------|--------|--------|
-| "Describe the scene" | Get multimodal description | ✅ ENHANCED |
-| "Remember this" | Store with embeddings | ✅ ENHANCED |
-| "What did I see" | Recall last memory | ✅ KEPT |
-| "Read the text" | OCR extraction | ✅ NEW |
-| "Exit" | Quit system | ✅ KEPT |
-### **Query Examples (NEW)**
-**Time-based:**
-```
-"What did I see this morning?"
-"Show me memories from yesterday"
-"What happened in the last hour?"
-```
-**Object-based:**
-```
-"When did I see a person?"
-"Find memories with a phone"
-```
-**Text-based:**
-```
-"What text did I see today?"
-"Find memories with visible text"
-```
----
-## 🔍 TESTING CHECKLIST
-### **Basic Functionality (Must Work)**
-- [ ] Camera capture
-- [ ] Voice recognition (Vosk)
-- [ ] Voice output (pyttsx3 fallback)
-- [ ] BLIP captioning
-- [ ] YOLO/SSD detection
-- [ ] Memory storage (JSON)
-- [ ] Memory recall
-### **New Features (Should Work if Dependencies Installed)**
-- [ ] OCR text extraction
-- [ ] MobileCLIP embeddings
-- [ ] FAISS vector search
-- [ ] DistilBERT intent classification
-- [ ] Voxtral/Piper TTS
-- [ ] Fusion layer integration
-### **Fallback Mechanisms (Must Work)**
-- [ ] pyttsx3 if Voxtral fails
-- [ ] Keyword intent if DistilBERT fails
-- [ ] Text search if FAISS unavailable
-- [ ] System continues if OCR fails
----
-## 🐛 TROUBLESHOOTING
-### **Issue: FAISS not installing**
-```bash
-# Try CPU version
-pip install faiss-cpu
-# Or GPU version (if CUDA available)
-pip install faiss-gpu
-```
-### **Issue: EasyOCR fails**
-```bash
-# Install dependencies
-pip install easyocr torch torchvision
-```
-### **Issue: Piper TTS not working**
-```bash
-# System will automatically fall back to pyttsx3
-# No action needed - this is expected behavior
-```
-### **Issue: Import errors**
-```bash
-# Ensure you're in project root
-cd VisionQ
-# Run with Python module syntax
-python -m main_upgraded
-```
----
-## 📊 PERFORMANCE COMPARISON
-| Feature | Before | After |
-|---------|--------|-------|
-| Caption Quality | BLIP only | BLIP + OCR + Objects |
-| Memory Search | Text only | Text + Image (FAISS) |
-| Query Understanding | Keywords | DistilBERT NLP |
-| TTS Quality | Robotic | Natural (Voxtral) |
-| Search Speed | O(n) linear | Sublinear with approximate FAISS indexes (flat indexes stay O(n)) |
-| Text Reading | ❌ None | ✅ EasyOCR |
----
-## 🚀 NEXT STEPS
-### **Immediate:**
-1. Test all voice commands
-2. Verify OCR on text images
-3. Check memory persistence
-4. Test query system
-### **Optional Enhancements:**
-1. Add FastVLM for faster captioning
-2. Implement image-to-image search UI
-3. Add multi-language OCR
-4. Create web interface
-5. Add video recording
-### **Production Readiness:**
-1. Add error logging
-2. Implement health checks
-3. Add configuration file
-4. Create Docker container
-5. Add unit tests
----
-## 📝 MIGRATION NOTES
-### **Backward Compatibility:**
-- ✅ Old `memory.json` files work with new system
-- ✅ Existing voice commands unchanged
-- ✅ Old agents still available in root directory
-- ✅ Can run old and new systems side-by-side
-### **Breaking Changes:**
-- ❌ None - fully backward compatible
-### **Deprecation Warnings:**
-- Old files in root will be deprecated in future versions
-- Recommended to use `agents/` modules going forward
----
-## 🎓 LEARNING RESOURCES
-**MobileCLIP:**
-- Paper: https://arxiv.org/abs/2311.17049
-- Use: Visual embeddings for similarity search
-**FAISS:**
-- Docs: https://github.com/facebookresearch/faiss
-- Use: Fast vector similarity search
-**EasyOCR:**
-- Docs: https://github.com/JaidedAI/EasyOCR
-- Use: Offline text extraction
-**DistilBERT:**
-- Paper: https://arxiv.org/abs/1910.01108
-- Use: Efficient NLP for intent classification
-**Piper TTS:**
-- Docs: https://github.com/rhasspy/piper
-- Use: Neural text-to-speech
----
-## ✅ VERIFICATION
-Run this checklist to verify upgrade success:
-```bash
-# 1. Check file structure
-dir agents
-dir core
-dir data
-# 2. Test imports
-python -c "from agents import VisionAgent; print('✅ Imports OK')"
-# 3. Test memory agent
-python -c "from agents import MemoryAgent; m = MemoryAgent(); print('✅ Memory OK')"
-# 4. Run upgraded system
-python main_upgraded.py
-```
----
-## 📞 SUPPORT
-If you encounter issues:
-1. Check the Troubleshooting section above
-2. Verify all dependencies installed
-3. Check Python version (3.8+)
-4. Ensure camera/microphone permissions
-5. Review error logs
----
-**Upgrade completed successfully! 🎉**
-Your VisionQ system now has:
-- 🧠 Smarter memory (FAISS)
-- 👁️ Better vision (MobileCLIP + OCR)
-- 🗣️ Natural voice (Voxtral)
-- 🔍 Intelligent queries (DistilBERT)
-**All while keeping your existing system intact!**
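
The deleted guide names a keyword-based backup, `_fallback_intent()`, and five intent categories, but the method body is not part of this diff. A minimal sketch of what such a keyword fallback could look like is given here. The category names come from the guide; the keyword lists, the matching order, and the function name `fallback_intent` are assumptions made for illustration.

```python
import re

# Hypothetical keyword table. Categories are checked in this order, so a query
# that mentions both text and a time word is treated as a text query.
INTENT_KEYWORDS = (
    ("text", {"text", "read", "written", "sign"}),
    ("temporal", {"today", "yesterday", "morning", "hour", "last"}),
    ("action", {"doing", "happening", "happened"}),
    ("object", {"person", "phone", "object", "find"}),
)


def fallback_intent(question: str) -> str:
    """Return the first category whose keywords appear in the question, else 'general'."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for intent, keywords in INTENT_KEYWORDS:
        if words & keywords:
            return intent
    return "general"
```

With these assumed keywords, the guide's own example queries land in the categories it lists them under: "What did I see this morning?" maps to `temporal` and "Find memories with a phone" maps to `object`.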

archive/old_scripts/ask_question.py DELETED

@@ -1,19 +0,0 @@
-from memory_agent import MemoryAgent
-from query_agent import QueryAgent
-def main():
-    memory_agent = MemoryAgent()
-    query_agent = QueryAgent(memory_agent)
-    print("🧠 Memory Query System (type 'exit' to quit)")
-    while True:
-        question = input("\nAsk a question: ").strip()
-        if question.lower() == "exit":
-            break
-        answer = query_agent.ask(question)
-        print("\n" + answer)
-if __name__ == "__main__":
-    main()

archive/old_scripts/ask_question_upgraded.py DELETED

@@ -1,41 +0,0 @@
-"""
-Memory Query System - Interactive memory search
-UPDATED: Now includes intent classification and hybrid search
-"""
-from agents.memory_agent import MemoryAgent
-from agents.query_agent import QueryAgent
-def main():
-    print("=" * 60)
-    print("🧠 VisionQ Memory Query System (UPGRADED)")
-    print("=" * 60)
-    print("\nFeatures:")
-    print("  • Time-based queries (today, yesterday, last hour)")
-    print("  • Semantic search with DistilBERT")
-    print("  • FAISS-powered similarity search")
-    print("  • OCR text search")
-    print("\nType 'exit' to quit\n")
-    memory_agent = MemoryAgent()
-    query_agent = QueryAgent(memory_agent)
-    while True:
-        question = input("\n❓ Ask a question: ").strip()
-        if question.lower() == "exit":
-            print("Goodbye!")
-            break
-        if not question:
-            continue
-        # Query with enhanced capabilities
-        answer = query_agent.ask(question)
-        print(f"\n💡 Answer:\n{answer}\n")
-        print("-" * 60)
-if __name__ == "__main__":
-    main()

archive/old_scripts/install_upgrade.bat DELETED

@@ -1,101 +0,0 @@
-@echo off
-REM ============================================
-REM VisionQ Upgrade - Automated Installation
-REM ============================================
-echo.
-echo ============================================
-echo VisionQ System Upgrade Installer
-echo ============================================
-echo.
-REM Check Python installation
-python --version >nul 2>&1
-if errorlevel 1 (
-    echo [ERROR] Python not found. Please install Python 3.8+
-    pause
-    exit /b 1
-)
-echo [1/6] Python detected
-echo.
-REM Create data directory
-echo [2/6] Creating data directory...
-if not exist "data" mkdir data
-echo       - data\ created
-REM Move existing memory file
-if exist "memory.json" (
-    echo [3/6] Migrating existing memory...
-    move /Y memory.json data\memory.json >nul
-    echo       - memory.json moved to data\
-) else (
-    echo [3/6] No existing memory found (fresh install)
-)
-REM Install dependencies
-echo.
-echo [4/6] Installing dependencies...
-echo       This may take several minutes...
-echo.
-pip install -r requirements_upgraded.txt
-if errorlevel 1 (
-    echo [ERROR] Dependency installation failed
-    pause
-    exit /b 1
-)
-echo.
-echo [5/6] Verifying installation...
-REM Test imports
-python -c "from agents import VisionAgent; print('  - Agents: OK')" 2>nul
-if errorlevel 1 (
-    echo [ERROR] Agent import failed
-    pause
-    exit /b 1
-)
-python -c "from core import FusionLayer; print('  - Core: OK')" 2>nul
-if errorlevel 1 (
-    echo [ERROR] Core import failed
-    pause
-    exit /b 1
-)
-python -c "import faiss; print('  - FAISS: OK')" 2>nul
-if errorlevel 1 (
-    echo [WARNING] FAISS not available (optional)
-    echo            Install with: pip install faiss-cpu
-)
-python -c "import easyocr; print('  - EasyOCR: OK')" 2>nul
-if errorlevel 1 (
-    echo [WARNING] EasyOCR not available (optional)
-    echo            Install with: pip install easyocr
-)
-echo.
-echo [6/6] Installation complete!
-echo.
-echo ============================================
-echo VisionQ Upgrade Installed Successfully!
-echo ============================================
-echo.
-echo Next steps:
-echo   1. Run: python main_upgraded.py
-echo   2. Test voice commands
-echo   3. Check QUICKSTART.md for usage guide
-echo.
-echo Optional enhancements:
-echo   - Install FAISS: pip install faiss-cpu
-echo   - Install EasyOCR: pip install easyocr
-echo   - Download Piper TTS model (see QUICKSTART.md)
-echo.
-echo Documentation:
-echo   - QUICKSTART.md - Quick start guide
-echo   - UPGRADE_GUIDE.md - Complete documentation
-echo   - ARCHITECTURE.md - System architecture
-echo.
-pause
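
The removed installer performs the memory migration with Windows-only commands (`mkdir data`, `move /Y memory.json data\memory.json`). For anyone reproducing that step on another platform, a rough Python equivalent might look like the sketch below. The function name `migrate_memory` and its return convention are assumptions, not part of the deleted script.

```python
from pathlib import Path


def migrate_memory(project_root: Path) -> bool:
    """Move memory.json from the project root into data/, creating data/ if needed.

    Returns True if a file was moved and False if there was nothing to migrate.
    Like `move /Y`, an existing data/memory.json is overwritten.
    """
    data_dir = project_root / "data"
    data_dir.mkdir(exist_ok=True)

    old_file = project_root / "memory.json"
    if not old_file.exists():
        return False  # fresh install, nothing to move

    # Path.replace overwrites the destination on both Windows and POSIX.
    old_file.replace(data_dir / "memory.json")
    return True
```

Calling `migrate_memory(Path("."))` from the project root mirrors steps 2 and 3 of the batch file.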

archive/old_scripts/main.py DELETED

@@ -1,66 +0,0 @@
-from voice_agent import VoiceAgent
-from vision_agent import VisionAgent
-def main():
-    voice = VoiceAgent()
-    vision = VisionAgent()
-    voice.speak("Vision Q started. I am listening.")
-    while True:
-        spoken_text = voice.listen()
-        if not spoken_text:
-            continue
-        intent = voice.parse_intent(spoken_text)
-        print("[INTENT]:", intent)
-        if intent == "DESCRIBE_SCENE":
-            voice.speak("Describing the scene.")
-            description = vision.describe_scene()
-            if description:
-                print("[DESCRIPTION]:", description)
-                voice.speak(description)
-            else:
-                voice.speak("I could not capture the scene.")
-        elif intent == "REMEMBER_SCENE":
-            voice.speak("I will remember this scene.")
-            description = vision.remember_scene()
-            if description:
-                print("[REMEMBERED]:", description)
-                voice.speak("Scene remembered.")
-            else:
-                voice.speak("I could not remember the scene.")
-        elif intent == "RECALL_MEMORY":
-            memory = vision.memory_agent.recall_last()
-            if memory:
-                voice.speak(memory)
-            else:
-                voice.speak("I do not have any memories yet.")
-        elif intent == "READ_TEXT":
-            # OCR intentionally postponed
-            voice.speak("Reading text will be available soon.")
-        elif intent == "EXIT":
-            voice.speak("Goodbye.")
-            vision.cleanup()
-            break
-        else:
-            voice.speak("I did not understand.")
-    print("Vision Q stopped.")
-if __name__ == "__main__":
-    main()

archive/old_scripts/main_upgraded.py DELETED

@@ -1,85 +0,0 @@
-"""
-VisionQ - Upgraded Multimodal AI Assistant
-UPDATED: Now includes OCR, embeddings, and enhanced memory
-"""
-from agents.voice_agent import VoiceAgent
-from agents.vision_agent import VisionAgent
-def main():
-    print("=" * 60)
-    print("VisionQ - Multimodal AI Assistant (UPGRADED)")
-    print("=" * 60)
-    # Initialize agents
-    voice = VoiceAgent()
-    vision = VisionAgent()
-    voice.speak("Vision Q started. I am listening.")
-    while True:
-        spoken_text = voice.listen()
-        if not spoken_text:
-            continue
-        intent = voice.parse_intent(spoken_text)
-        print(f"[INTENT]: {intent}")
-        # ===== DESCRIBE SCENE (UPDATED) =====
-        if intent == "DESCRIBE_SCENE":
-            voice.speak("Describing the scene.")
-            description = vision.describe_scene()
-            if description:
-                print(f"[DESCRIPTION]: {description}")
-                voice.speak(description)
-            else:
-                voice.speak("I could not capture the scene.")
-        # ===== REMEMBER SCENE (UPDATED) =====
-        elif intent == "REMEMBER_SCENE":
-            voice.speak("I will remember this scene.")
-            description = vision.remember_scene()
-            if description:
-                print(f"[REMEMBERED]: {description}")
-                voice.speak("Scene remembered.")
-            else:
-                voice.speak("I could not remember the scene.")
-        # ===== RECALL MEMORY (KEPT) =====
-        elif intent == "RECALL_MEMORY":
-            memory = vision.memory_agent.recall_last()
-            if memory:
-                response = f"At {memory['timestamp']}, {memory['description']}"
-                voice.speak(response)
-            else:
-                voice.speak("I do not have any memories yet.")
-        # ===== READ TEXT (NEW - NOW FUNCTIONAL) =====
-        elif intent == "READ_TEXT":
-            voice.speak("Reading text from the scene.")
-            text_result = vision.read_text()
-            if text_result:
-                print(f"[OCR]: {text_result}")
-                voice.speak(text_result)
-            else:
-                voice.speak("I could not read any text.")
-        # ===== EXIT (KEPT) =====
-        elif intent == "EXIT":
-            voice.speak("Goodbye.")
-            vision.cleanup()
-            break
-        # ===== UNKNOWN (KEPT) =====
-        else:
-            voice.speak("I did not understand.")
-    print("Vision Q stopped.")
-if __name__ == "__main__":
-    main()
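
The removed script relies on `voice.speak()` switching from neural TTS to pyttsx3 on its own, but `VoiceAgent` itself is not shown in this part of the diff. The sketch below illustrates one way such a fallback chain can be structured. The backends are passed in as plain callables, and `speak_with_fallback` is a hypothetical helper rather than the project's actual method.

```python
from typing import Callable, Iterable, Optional, Tuple

# A backend is a (name, synthesize) pair; synthesize raises on failure.
Backend = Tuple[str, Callable[[str], None]]


def speak_with_fallback(text: str, backends: Iterable[Backend]) -> Optional[str]:
    """Try each TTS backend in order and return the name of the first that succeeds.

    Returns None if every backend raised, so the caller can decide what to do.
    """
    for name, synthesize in backends:
        try:
            synthesize(text)
            return name
        except Exception as exc:  # any backend failure triggers the next one
            print(f"[TTS] {name} failed ({exc}); trying next backend")
    return None
```

Keeping the order in a list makes the priority explicit: the neural backend goes first and pyttsx3 last, and adding another engine does not require touching the loop.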

archive/old_scripts/test_upgrade.py DELETED

@@ -1,274 +0,0 @@
-"""
-VisionQ Upgrade - Automated Test Suite
-Tests all new and existing functionality
-"""
-import sys
-import os
-def test_imports():
-    """Test all module imports"""
-    print("\n" + "="*60)
-    print("TEST 1: Module Imports")
-    print("="*60)
-    tests = [
-        ("agents.voice_agent", "VoiceAgent"),
-        ("agents.vision_agent", "VisionAgent"),
-        ("agents.caption_agent", "CaptionAgent"),
-        ("agents.embedding_agent", "EmbeddingAgent"),
-        ("agents.ocr_agent", "OCRAgent"),
-        ("agents.memory_agent", "MemoryAgent"),
-        ("agents.query_agent", "QueryAgent"),
-        ("core.fusion_layer", "FusionLayer"),
-    ]
-    passed = 0
-    failed = 0
-    for module, cls in tests:
-        try:
-            getattr(__import__(module, fromlist=[cls]), cls)
-            print(f"  ✅ {module}.{cls}")
-            passed += 1
-        except Exception as e:
-            print(f"  ❌ {module}.{cls} - {e}")
-            failed += 1
-    print(f"\nResult: {passed} passed, {failed} failed")
-    return failed == 0
-def test_dependencies():
-    """Test optional dependencies"""
-    print("\n" + "="*60)
-    print("TEST 2: Optional Dependencies")
-    print("="*60)
-    deps = [
-        ("faiss", "FAISS (vector search)"),
-        ("easyocr", "EasyOCR (text extraction)"),
-        ("piper", "Piper TTS (neural voice)"),
-    ]
-    for module, name in deps:
-        try:
-            __import__(module)
-            print(f"  ✅ {name}")
-        except ImportError:
-            print(f"  ⚠️  {name} - Not installed (optional)")
-    return True
-def test_memory_agent():
-    """Test MemoryAgent functionality"""
-    print("\n" + "="*60)
-    print("TEST 3: MemoryAgent")
-    print("="*60)
-    try:
-        from agents.memory_agent import MemoryAgent
-        import numpy as np
-        # Create test memory
-        memory = MemoryAgent(
-            memory_file="data/test_memory.json",
-            faiss_index_file="data/test_memory.faiss"
-        )
-        # Test adding memory
-        test_desc = "Test scene with a person"
-        test_embedding = np.random.rand(512).astype('float32')
-        memory.add(test_desc, image_embedding=test_embedding)
-        print("  ✅ Add memory")
-        # Test recall
-        last = memory.recall_last()
-        assert last is not None
-        print("  ✅ Recall last")
-        # Test text search
-        results = memory.search_by_text("person", threshold=0.1)
-        print(f"  ✅ Text search ({len(results)} results)")
-        # Test image search (if FAISS available)
-        try:
-            results = memory.search_by_image(test_embedding, k=1)
-            print(f"  ✅ Image search ({len(results)} results)")
-        except Exception:
-            print("  ⚠️  Image search - FAISS not available")
-        # Cleanup
-        if os.path.exists("data/test_memory.json"):
-            os.remove("data/test_memory.json")
-        if os.path.exists("data/test_memory.faiss"):
-            os.remove("data/test_memory.faiss")
-        print("\n  MemoryAgent: PASSED")
-        return True
-    except Exception as e:
-        print(f"\n  ❌ MemoryAgent: FAILED - {e}")
-        return False
-def test_fusion_layer():
-    """Test FusionLayer"""
-    print("\n" + "="*60)
-    print("TEST 4: FusionLayer")
-    print("="*60)
-    try:
-        from core.fusion_layer import FusionLayer
-        import numpy as np
-        fusion = FusionLayer()
-        # Test fusion
-        context = fusion.fuse(
-            caption="a person holding a phone",
-            ocr_text="Hello World",
-            objects=["person", "phone"],
-            embedding=np.random.rand(512)
-        )
-        assert "caption" in context
-        assert "ocr_text" in context
-        assert "objects" in context
-        assert "full_description" in context
-        print("  ✅ Fuse multimodal data")
-        # Test extraction
-        desc, emb = fusion.extract_for_storage(context)
-        assert desc is not None
-        assert emb is not None
-        print("  ✅ Extract for storage")
-        print("\n  FusionLayer: PASSED")
-        return True
-    except Exception as e:
-        print(f"\n  ❌ FusionLayer: FAILED - {e}")
-        return False
-def test_query_agent():
-    """Test QueryAgent"""
-    print("\n" + "="*60)
-    print("TEST 5: QueryAgent")
-    print("="*60)
-    try:
-        from agents.memory_agent import MemoryAgent
-        from agents.query_agent import QueryAgent
-        memory = MemoryAgent(
-            memory_file="data/test_memory.json",
-            faiss_index_file="data/test_memory.faiss"
-        )
-        query = QueryAgent(memory)
-        # Test intent classification
-        intent = query.classify_intent("What did I see this morning?")
-        print(f"  ✅ Intent classification: {intent}")
-        # Test time extraction
-        time_window = query.extract_time_window("What did I see today?")
-        print(f"  ✅ Time extraction: {time_window is not None}")
-        # Cleanup
-        if os.path.exists("data/test_memory.json"):
-            os.remove("data/test_memory.json")
-        if os.path.exists("data/test_memory.faiss"):
-            os.remove("data/test_memory.faiss")
-        print("\n  QueryAgent: PASSED")
-        return True
-    except Exception as e:
-        print(f"\n  ❌ QueryAgent: FAILED - {e}")
-        return False
-def test_backward_compatibility():
-    """Test backward compatibility"""
-    print("\n" + "="*60)
-    print("TEST 6: Backward Compatibility")
-    print("="*60)
-    try:
-        from agents.memory_agent import MemoryAgent
-        # Test old memory format
-        memory = MemoryAgent(
-            memory_file="data/test_memory.json",
-            faiss_index_file="data/test_memory.faiss"
-        )
-        # Add memory without image embedding (old format)
-        memory.add("Test scene without embedding", image_embedding=None)
-        print("  ✅ Old format (no image embedding)")
-        # Recall should work
-        last = memory.recall_last()
-        assert last is not None
-        print("  ✅ Recall old format")
-        # Cleanup
-        if os.path.exists("data/test_memory.json"):
-            os.remove("data/test_memory.json")
-        if os.path.exists("data/test_memory.faiss"):
-            os.remove("data/test_memory.faiss")
-        print("\n  Backward Compatibility: PASSED")
-        return True
-    except Exception as e:
-        print(f"\n  ❌ Backward Compatibility: FAILED - {e}")
-        return False
-def main():
-    """Run all tests"""
-    print("\n" + "="*60)
-    print("VisionQ Upgrade - Test Suite")
-    print("="*60)
-    # Ensure data directory exists
-    os.makedirs("data", exist_ok=True)
-    results = []
-    # Run tests
-    results.append(("Imports", test_imports()))
-    results.append(("Dependencies", test_dependencies()))
-    results.append(("MemoryAgent", test_memory_agent()))
-    results.append(("FusionLayer", test_fusion_layer()))
-    results.append(("QueryAgent", test_query_agent()))
-    results.append(("Backward Compatibility", test_backward_compatibility()))
-    # Summary
-    print("\n" + "="*60)
-    print("TEST SUMMARY")
-    print("="*60)
-    passed = sum(1 for _, result in results if result)
-    total = len(results)
-    for name, result in results:
-        status = "✅ PASSED" if result else "❌ FAILED"
-        print(f"  {name}: {status}")
-    print(f"\nTotal: {passed}/{total} tests passed")
-    if passed == total:
-        print("\n🎉 All tests passed! System is ready.")
-        return 0
-    else:
-        print(f"\n⚠️  {total - passed} test(s) failed. Check errors above.")
-        return 1
-if __name__ == "__main__":
-    sys.exit(main())
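
In the removed test suite, the `os.remove` cleanup sits at the end of each `try` block, so `data/test_memory.json` and `data/test_memory.faiss` are left behind whenever an assertion fails before that point. One possible alternative, sketched here under the assumption that `MemoryAgent` accepts arbitrary file paths as it does in the tests, is to hand out paths inside a temporary directory. The helper name `temp_memory_paths` is hypothetical.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Tuple


@contextmanager
def temp_memory_paths() -> Iterator[Tuple[Path, Path]]:
    """Yield (json_path, faiss_path) inside a directory that is deleted on exit.

    The directory is removed even if the body raises, so failed tests leave
    no files behind.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        yield root / "test_memory.json", root / "test_memory.faiss"
```

A test would then wrap its body in `with temp_memory_paths() as (memory_file, faiss_file):` and pass `str(memory_file)` and `str(faiss_file)` to the agent under test.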

archive/pipcheck.txt DELETED

Binary file (22.9 kB)

archive/requirements_upgraded.txt DELETED

@@ -1,54 +0,0 @@
-# ============================================
-# VisionQ - UPGRADED Requirements
-# ============================================
-# Core ML/AI
-torch>=2.0.0
-transformers>=4.57.3
-sentence-transformers>=2.2.2
-# Vision
-opencv-python
-pillow
-ultralytics  # YOLO (optional but recommended)
-# Voice
-pyttsx3          # TTS fallback (KEPT)
-sounddevice
-vosk
-# NEW: Neural TTS (Primary)
-piper-tts        # Voxtral/Piper neural TTS
-# NEW: OCR
-easyocr          # Lightweight OCR
-# NEW: Vector Search
-faiss-cpu        # FAISS for similarity search
-# Use faiss-gpu if you have CUDA
-# NEW: NLP Enhancement
-# DistilBERT is included in transformers
-# Optional: TensorFlow for SSD fallback
-# tensorflow>=2.13.0
-# ============================================
-# Installation Notes:
-# ============================================
-# 1. Create virtual environment:
-#    python -m venv venv
-#    venv\Scripts\activate  (Windows)
-#    source venv/bin/activate  (Linux/Mac)
-#
-# 2. Install dependencies:
-#    pip install -r requirements_upgraded.txt
-#
-# 3. Download Vosk model:
-#    https://alphacephei.com/vosk/models
-#    Extract to: models/vosk/
-#
-# 4. Download Piper voice (optional):
-#    https://github.com/rhasspy/piper/releases
-#    Extract to: models/piper/
-# ============================================

cleanup.bat DELETED

@@ -1,65 +0,0 @@
-@echo off
-REM ============================================
-REM VisionQ - Project Cleanup Script
-REM Moves old/redundant files to archive
-REM ============================================
-echo.
-echo ============================================
-echo VisionQ Project Cleanup
-echo ============================================
-echo.
-REM Create archive directory
-if not exist "archive\" mkdir archive
-if not exist "archive\old_agents\" mkdir archive\old_agents
-if not exist "archive\old_docs\" mkdir archive\old_docs
-if not exist "archive\old_scripts\" mkdir archive\old_scripts
-echo [1/4] Moving old agent files...
-if exist "caption_agent.py" move /Y caption_agent.py archive\old_agents\
-if exist "memory_agent.py" move /Y memory_agent.py archive\old_agents\
-if exist "query_agent.py" move /Y query_agent.py archive\old_agents\
-if exist "vision_agent.py" move /Y vision_agent.py archive\old_agents\
-if exist "voice_agent.py" move /Y voice_agent.py archive\old_agents\
-echo [2/4] Moving old documentation...
-if exist "README_UPGRADED.md" move /Y README_UPGRADED.md archive\old_docs\
-if exist "ARCHITECTURE.md" move /Y ARCHITECTURE.md archive\old_docs\
-if exist "COMPARISON.md" move /Y COMPARISON.md archive\old_docs\
-if exist "DEPLOYMENT_CHECKLIST.md" move /Y DEPLOYMENT_CHECKLIST.md archive\old_docs\
-if exist "INDEX.md" move /Y INDEX.md archive\old_docs\
-if exist "QUICK_REFERENCE.md" move /Y QUICK_REFERENCE.md archive\old_docs\
-if exist "QUICKSTART.md" move /Y QUICKSTART.md archive\old_docs\
-if exist "SUMMARY.md" move /Y SUMMARY.md archive\old_docs\
-if exist "UPGRADE_GUIDE.md" move /Y UPGRADE_GUIDE.md archive\old_docs\
-echo [3/4] Moving old scripts...
-if exist "main.py" move /Y main.py archive\old_scripts\
-if exist "main_upgraded.py" move /Y main_upgraded.py archive\old_scripts\
-if exist "ask_question.py" move /Y ask_question.py archive\old_scripts\
-if exist "ask_question_upgraded.py" move /Y ask_question_upgraded.py archive\old_scripts\
-if exist "test_upgrade.py" move /Y test_upgrade.py archive\old_scripts\
-if exist "install_upgrade.bat" move /Y install_upgrade.bat archive\old_scripts\
-echo [4/4] Moving old requirements...
-if exist "requirements_upgraded.txt" move /Y requirements_upgraded.txt archive\
-if exist "pipcheck.txt" move /Y pipcheck.txt archive\
-echo.
-echo ============================================
-echo Cleanup Complete!
-echo ============================================
-echo.
-echo Old files moved to archive\ directory
-echo.
-echo Current structure:
-echo   agents/     - AI agents
-echo   config/     - Configuration
-echo   ui/         - Streamlit interface
-echo   data/       - Storage
-echo   archive/    - Old files (backup)
-echo.
-echo You can safely delete archive\ if not needed
-echo.
-pause

config/fast_mode.py DELETED Viewed

@@ -1,40 +0,0 @@
-# Fast Mode Configuration
-# Copy this to config/settings.py to make VisionQ faster
-# ============================================
-# FAST MODE - Optimized for Speed
-# ============================================
-FEATURES = {
-    "ocr_enabled": False,              # Disabled for speed
-    "faiss_enabled": True,
-    "neural_tts_enabled": True,
-    "intent_classification_enabled": True,
-    "object_detection_enabled": False,  # Disabled for speed
-    "continuous_mode_enabled": True,
-    "embeddings_enabled": False,       # Keep disabled
-}
-# Use nano YOLO if you enable object detection
-MODEL_CONFIG = {
-    "yolo_model": "yolov8n.pt",  # Nano model (faster)
-    "caption_model": "Salesforce/blip-image-captioning-base",
-    # ... rest of config
-}
-# ============================================
-# EXPECTED PERFORMANCE
-# ============================================
-# With this config:
-# - Capture & Describe: ~1.5 seconds
-# - Remember Scene: ~1.5 seconds
-# - Read Text: Disabled
-#
-# Speed improvement: ~40% faster!
-# ============================================
-# To apply:
-# 1. Copy FEATURES section above
-# 2. Paste into config/settings.py
-# 3. Run: fix_and_run.bat
-# 4. Test the speed!
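Copying the `FEATURES` block by hand is easy to get wrong, so the same change can be expressed as a small override on top of the existing flags. The sketch below is illustrative only: `apply_overrides`, `BASE_FEATURES`, and `FAST_OVERRIDES` are hypothetical names, and the base values are assumed to match the balanced defaults described in the performance docs.

```python
# Assumed baseline; the real values live in config/settings.py.
BASE_FEATURES = {
    "ocr_enabled": True,
    "faiss_enabled": True,
    "neural_tts_enabled": True,
    "intent_classification_enabled": True,
    "object_detection_enabled": True,
    "continuous_mode_enabled": True,
    "embeddings_enabled": False,
}

# Only the flags that fast mode changes.
FAST_OVERRIDES = {
    "ocr_enabled": False,
    "object_detection_enabled": False,
}


def apply_overrides(base, overrides):
    """Return a new feature dict with overrides applied.

    Unknown keys raise KeyError so a typo in a flag name is caught
    instead of silently adding a flag nothing reads.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"Unknown feature flags: {sorted(unknown)}")
    return {**base, **overrides}


if __name__ == "__main__":
    fast = apply_overrides(BASE_FEATURES, FAST_OVERRIDES)
    print({name: value for name, value in fast.items() if not value})
```

Keeping the overrides separate leaves the baseline untouched, so switching back to the balanced mode is a matter of not applying them.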

docs/CAMERA_FEED.md DELETED Viewed

@@ -1,178 +0,0 @@
-# Camera Feed Options
-## Two Versions Available
-### 1. Standard Version (app.py)
-**File:** `ui/app.py`
-**Launch:** `run.bat` or `streamlit run ui/app.py`
-**Features:**
-- Static camera feed
-- Updates only when you click buttons
-- Lower CPU usage
-- Better for slower computers
-**Best for:**
-- Testing and development
-- Slower computers
-- Battery saving on laptops
----
-### 2. Continuous Feed Version (app_continuous.py)
-**File:** `ui/app_continuous.py`
-**Launch:** `run_continuous.bat` or `streamlit run ui/app_continuous.py`
-**Features:**
-- Live continuous camera feed
-- Adjustable refresh rate (0.5-5 seconds)
-- Start/Stop camera button
-- Real-time preview
-**Best for:**
-- Live monitoring
-- Real-time demonstrations
-- Faster computers with good camera
----
-## Comparison
-| Feature | Standard | Continuous |
-|---------|----------|------------|
-| **Camera Feed** | Static | Live |
-| **Updates** | On button click | Automatic |
-| **CPU Usage** | Low | Medium-High |
-| **Refresh Rate** | Manual | 0.5-5 seconds |
-| **Start/Stop** | No | Yes |
-| **Battery Impact** | Low | Higher |
----
-## How to Use Continuous Feed
-### Step 1: Launch
-```bash
-run_continuous.bat
-```
-### Step 2: Initialize System
-Click "Initialize System" in the Vision tab
-### Step 3: Start Camera
-Click "Start Camera" button
-### Step 4: Adjust Settings
-- Use the sidebar slider to change the refresh interval (seconds between frames)
-- Shorter interval = smoother feed but more CPU
-- Longer interval = less CPU but a choppier feed
-### Step 5: Use Features
-- Camera keeps running in background
-- Click "Capture & Describe" anytime
-- Click "Remember Scene" anytime
-- Click "Read Text" anytime
-### Step 6: Stop Camera
-Click "Stop Camera" when done to save resources
----
-## Performance Tips
-### For Continuous Feed
-**Optimize refresh rate:**
-- Fast computer: 0.5-1 second
-- Medium computer: 1-2 seconds
-- Slow computer: 2-5 seconds
-**Save resources:**
-- Stop camera when not actively using
-- Close other applications
-- Use standard version if too slow
-**Battery saving:**
-- Use standard version on laptop
-- Or set refresh rate to 3-5 seconds
-- Stop camera between uses
----
-## Troubleshooting
-### Camera feed is choppy
-**Solution:** Lengthen the refresh interval in the sidebar (try 2-3 seconds) so the machine can keep up with each frame
-### High CPU usage
-**Solution:**
-- Stop camera when not needed
-- Increase refresh rate
-- Use standard version instead
-### Camera won't start
-**Solution:**
-- Check camera permissions
-- Close other apps using camera
-- Try standard version first
-- Restart application
-### Feed freezes
-**Solution:**
-- Click "Stop Camera" then "Start Camera"
-- Refresh browser page
-- Restart application
----
-## Which Version Should I Use?
-### Use Standard Version (`run.bat`) if:
-- Testing features
-- Slower computer
-- On battery power
-- Don't need live feed
-- Just want to capture occasionally
-### Use Continuous Version (`run_continuous.bat`) if:
-- Need live monitoring
-- Demonstrating to others
-- Fast computer with good camera
-- Plugged into power
-- Want real-time preview
----
-## Switching Between Versions
-You can switch anytime:
-```bash
-# Stop current version (Ctrl+C)
-# Start standard version
-run.bat
-# OR start continuous version
-run_continuous.bat
-```
-Both use the same memory and settings!
----
-## Summary
-**Standard Version:**
-- Launch: `run.bat`
-- Camera: Static (updates on click)
-- CPU: Low
-- Best for: General use
-**Continuous Version:**
-- Launch: `run_continuous.bat`
-- Camera: Live feed
-- CPU: Medium-High
-- Best for: Live monitoring
-**Try both and see which works better for you!**
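The "refresh rate" in this guide is an interval in seconds, so a smaller number means more frames per second. A rough conversion, assuming the feed sleeps for the interval between frames, looks like the sketch below. `approx_fps` and its `frame_cost_s` parameter are hypothetical and only illustrate the relationship.

```python
def approx_fps(refresh_interval_s, frame_cost_s=0.0):
    """Approximate frames per second for a sleep-then-redraw loop.

    refresh_interval_s: slider value, seconds slept between frames.
    frame_cost_s: assumed time to grab and render one frame.
    """
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    if frame_cost_s < 0:
        raise ValueError("frame cost cannot be negative")
    return 1.0 / (refresh_interval_s + frame_cost_s)


if __name__ == "__main__":
    # Ends of the slider range described above.
    for interval in (0.5, 1.0, 2.0, 5.0):
        print(f"{interval:>4} s interval -> {approx_fps(interval):.2f} fps")
```

Even at the shortest setting the feed stays at about two frames per second or less, which is why it reads as a slideshow rather than video.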

docs/PERFORMANCE.md DELETED Viewed

@@ -1,187 +0,0 @@
-# Performance Optimization Guide
-## Speed Issues Fixed
-### 1. Embedding Error Fixed
-The AttributeError with CLIP embeddings has been fixed by using `torch.nn.functional.normalize()` instead of `.norm()`.
-### 2. Embeddings Disabled by Default
-Image embeddings (CLIP) are now **disabled by default** for faster performance.
-To enable embeddings, edit `config/settings.py`:
-```python
-FEATURES = {
-    "embeddings_enabled": True,  # Enable for visual similarity search
-}
-```
-## Performance Settings
-### Fast Mode (Default)
-```python
-# config/settings.py
-FEATURES = {
-    "embeddings_enabled": False,  # Faster
-    "ocr_enabled": True,
-    "object_detection_enabled": True,
-}
-```
-**Speed:** Fast (2-3 seconds per capture)
-**Features:** Caption + OCR + Object Detection
-### Full Mode (Slower but more features)
-```python
-FEATURES = {
-    "embeddings_enabled": True,  # Slower but enables visual search
-    "ocr_enabled": True,
-    "object_detection_enabled": True,
-}
-```
-**Speed:** Slower (5-7 seconds per capture)
-**Features:** All features including visual similarity search
-## Speed Comparison
-| Feature | Time | Can Disable? |
-|---------|------|--------------|
-| YOLO Detection | ~500ms | Yes (set object_detection_enabled=False) |
-| BLIP Caption | ~1000ms | No (core feature) |
-| CLIP Embeddings | ~2000ms | Yes (set embeddings_enabled=False) |
-| EasyOCR | ~500ms | Yes (set ocr_enabled=False) |
-## Optimization Tips
-### 1. Disable Unused Features
-Edit `config/settings.py`:
-```python
-FEATURES = {
-    "ocr_enabled": False,  # If you don't need text reading
-    "embeddings_enabled": False,  # If you don't need visual search
-    "object_detection_enabled": False,  # If you don't need object detection
-}
-```
-### 2. Use Smaller Models
-```python
-MODEL_CONFIG = {
-    "yolo_model": "yolov8n.pt",  # Nano model (faster)
-    # Instead of "yolov8s.pt" (small model)
-}
-```
-### 3. Reduce OCR Languages
-```python
-OCR_CONFIG = {
-    "languages": ["en"],  # Just English (faster)
-    # Instead of ["en", "es", "fr", "de"]
-}
-```
-### 4. Tune Confidence Thresholds
-```python
-VISION_CONFIG = {
-    "confidence_threshold": 0.3,  # Lower = more detections kept (more false positives); higher = fewer boxes to post-process
-}
-```
-### 5. Use GPU (if available)
-```python
-PERFORMANCE_CONFIG = {
-    "use_gpu": True,  # Much faster with GPU
-}
-```
-## Recommended Settings
-### For Speed
-```python
-# Fastest configuration
-FEATURES = {
-    "ocr_enabled": False,
-    "embeddings_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-**Result:** ~1 second per capture (caption only)
-### For Balance
-```python
-# Balanced configuration (default)
-FEATURES = {
-    "ocr_enabled": True,
-    "embeddings_enabled": False,
-    "object_detection_enabled": True,
-}
-```
-**Result:** ~2-3 seconds per capture
-### For Full Features
-```python
-# All features enabled
-FEATURES = {
-    "ocr_enabled": True,
-    "embeddings_enabled": True,
-    "object_detection_enabled": True,
-}
-```
-**Result:** ~5-7 seconds per capture
-## Troubleshooting Slow Performance
-### Issue: First run is very slow
-**Solution:** This is normal. Models are being downloaded (~2GB). Subsequent runs will be much faster.
-### Issue: Every capture takes 5+ seconds
-**Solution:** Disable embeddings in `config/settings.py`:
-```python
-FEATURES = {
-    "embeddings_enabled": False,
-}
-```
-### Issue: OCR is slow
-**Solution:**
-1. Reduce languages to just what you need
-2. Or disable OCR if not needed:
-```python
-FEATURES = {
-    "ocr_enabled": False,
-}
-```
-### Issue: Out of memory
-**Solution:**
-1. Close other applications
-2. Disable embeddings
-3. Use smaller YOLO model (yolov8n.pt)
-## Current Configuration
-The system is now configured for **balanced performance**:
-- Embeddings: DISABLED (faster)
-- OCR: ENABLED
-- Object Detection: ENABLED
-- Caption: ENABLED (always on)
-This gives you good features with reasonable speed (~2-3 seconds per capture).
-## How to Change Settings
-1. Open `config/settings.py`
-2. Find the `FEATURES` section
-3. Change `True`/`False` values
-4. Restart the application
-Example:
-```python
-# For maximum speed
-FEATURES = {
-    "ocr_enabled": False,
-    "embeddings_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-Save the file and restart with `run.bat`.
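The per-feature timings in the Speed Comparison table can be summed to get a rough lower bound for a given `FEATURES` setting. The sketch below assumes the stages run one after another and uses only the table's approximate numbers; `estimate_capture_ms` and `STAGE_MS` are hypothetical names.

```python
# Approximate per-stage cost in milliseconds, taken from the table above.
STAGE_MS = {
    "caption": 1000,           # BLIP, always on
    "object_detection": 500,   # YOLO
    "ocr": 500,                # EasyOCR
    "embeddings": 2000,        # CLIP
}

# Maps each optional stage to the feature flag that controls it.
FLAG_FOR_STAGE = {
    "object_detection": "object_detection_enabled",
    "ocr": "ocr_enabled",
    "embeddings": "embeddings_enabled",
}


def estimate_capture_ms(features):
    """Sum the table's stage costs for the enabled features."""
    total = STAGE_MS["caption"]
    for stage, flag in FLAG_FOR_STAGE.items():
        if features.get(flag, False):
            total += STAGE_MS[stage]
    return total


if __name__ == "__main__":
    balanced = {"ocr_enabled": True, "object_detection_enabled": True}
    print("Balanced estimate (ms):", estimate_capture_ms(balanced))
```

The sum for the full configuration comes out below the 5-7 seconds quoted in this guide, so the table is best read as a floor: model loading, frame conversion, and memory writes are not part of it.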

docs/PERFORMANCE_ANALYSIS.md DELETED Viewed

@@ -1,310 +0,0 @@
-# VisionQ Performance Analysis
-## Current Models Being Used
-### 1. YOLO (Object Detection)
-**Model:** YOLOv8s (Small)
-**File:** `yolov8s.pt`
-**Size:** ~22MB
-**Speed:** ~500ms per frame
-**Purpose:** Detect objects in scene
-### 2. BLIP (Image Captioning)
-**Model:** Salesforce/blip-image-captioning-base
-**Size:** ~990MB
-**Speed:** ~1000-1500ms per frame
-**Purpose:** Generate scene descriptions
-**THIS IS THE SLOWEST PART!**
-### 3. EasyOCR (Text Extraction)
-**Model:** EasyOCR English
-**Size:** ~50MB per language
-**Speed:** ~500ms per frame
-**Purpose:** Read text from images
-### 4. CLIP (Embeddings) - DISABLED
-**Model:** openai/clip-vit-base-patch32
-**Status:** Disabled by default
-**Speed:** Would add ~2000ms if enabled
----
-## Why Camera is Slow
-### Current Processing Time Breakdown
-**When you click "Capture & Describe":**
-```
-1. Capture frame:           ~10ms
-2. BLIP caption:           ~1500ms  ← SLOWEST!
-3. EasyOCR text:            ~500ms
-4. Fusion/processing:        ~50ms
---------------------------------
-Total:                     ~2060ms (2+ seconds)
-```
-**The main bottleneck is BLIP (image captioning)!**
----
-## Speed Optimization Options
-### Option 1: Disable OCR (Fastest)
-**Speed gain:** ~500ms faster
-**Trade-off:** No text reading
-Edit `config/settings.py`:
-```python
-FEATURES = {
-    "ocr_enabled": False,  # Disable OCR
-}
-```
-**New speed:** ~1.5 seconds
----
-### Option 2: Use Smaller YOLO Model
-**Speed gain:** ~200ms faster
-**Trade-off:** Slightly less accurate object detection
-Edit `config/settings.py`:
-```python
-MODEL_CONFIG = {
-    "yolo_model": "yolov8n.pt",  # Nano model (faster)
-}
-```
-Download nano model (run in Python; Ultralytics fetches the weights on first use):
-```python
-from ultralytics import YOLO
-model = YOLO("yolov8n.pt")
-```
-**New speed:** ~1.8 seconds
----
-### Option 3: Disable Object Detection
-**Speed gain:** ~500ms faster
-**Trade-off:** No object detection
-Edit `config/settings.py`:
-```python
-FEATURES = {
-    "object_detection_enabled": False,
-}
-```
-**New speed:** ~1.5 seconds
----
-### Option 4: Use Faster Caption Model (RECOMMENDED)
-**Speed gain:** ~1000ms faster!
-**Trade-off:** Slightly different captions
-Replace BLIP with a lighter captioning model such as GIT (`microsoft/git-base`).
----
-### Option 5: Caption Only Mode (FASTEST)
-**Speed gain:** Maximum
-**Trade-off:** Only caption, no OCR or objects
-Edit `config/settings.py`:
-```python
-FEATURES = {
-    "ocr_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-**New speed:** ~1.5 seconds (just BLIP)
----
-## Recommended Configurations
-### For Speed (Fastest)
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": False,           # Disable OCR
-    "object_detection_enabled": False,  # Disable YOLO
-    "embeddings_enabled": False,    # Already disabled
-}
-MODEL_CONFIG = {
-    "yolo_model": "yolov8n.pt",     # Use nano if keeping YOLO
-}
-```
-**Result:** ~1.5 seconds per capture
----
-### For Balance (Recommended)
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": True,            # Keep OCR
-    "object_detection_enabled": False,  # Disable YOLO (not critical)
-    "embeddings_enabled": False,    # Keep disabled
-}
-```
-**Result:** ~2 seconds per capture
----
-### For Full Features (Slowest)
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": True,
-    "object_detection_enabled": True,
-    "embeddings_enabled": False,  # Keep disabled!
-}
-```
-**Result:** ~2.5 seconds per capture
----
-## GPU Acceleration
-If you have an NVIDIA GPU:
-```python
-# config/settings.py
-PERFORMANCE_CONFIG = {
-    "use_gpu": True,  # Enable GPU
-}
-OCR_CONFIG = {
-    "gpu": True,  # Enable GPU for OCR
-}
-```
-**Speed improvement:** 2-3x faster!
-**Requirements:**
-- NVIDIA GPU
-- CUDA installed
-- PyTorch with CUDA support
----
-## Camera Feed Speed
-The camera itself is fast (~10ms per frame).
-**The slowness comes from AI processing, not the camera!**
-### For Continuous Feed:
-- Camera updates quickly
-- But processing (BLIP) takes 1-2 seconds
-- So you see lag between capture and results
-### Solutions:
-1. Use static feed (current `app.py`)
-2. Disable heavy features (OCR, YOLO)
-3. Use GPU acceleration
-4. Accept the 1-2 second delay
----
-## Model Comparison
-| Model | Size | Speed | Accuracy | Replaceable? |
-|-------|------|-------|----------|--------------|
-| **BLIP** | 990MB | Slow (1.5s) | High | Yes (use GIT) |
-| **YOLO** | 22MB | Medium (0.5s) | High | Yes (use nano) |
-| **EasyOCR** | 50MB | Medium (0.5s) | High | Hard to replace |
-| **CLIP** | 500MB | Slow (2s) | High | Disabled |
----
-## Quick Fixes You Can Try Now
-### 1. Disable OCR
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": False,
-}
-```
-**Restart app:** `fix_and_run.bat`
-### 2. Disable YOLO
-```python
-# config/settings.py
-FEATURES = {
-    "object_detection_enabled": False,
-}
-```
-**Restart app:** `fix_and_run.bat`
-### 3. Both (Fastest)
-```python
-# config/settings.py
-FEATURES = {
-    "ocr_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-**Restart app:** `fix_and_run.bat`
-**Result:** Only BLIP caption (~1.5 seconds)
----
-## Alternative: Use Lighter Caption Model
-Create a new caption agent with a faster model:
-```python
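-# agents/caption_agent_fast.py
-import cv2
-import torch
-from PIL import Image
-from transformers import AutoProcessor, AutoModelForCausalLM
-class FastCaptionAgent:
-    def __init__(self):
-        # Use GIT (faster than BLIP)
-        self.processor = AutoProcessor.from_pretrained("microsoft/git-base")
-        self.model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")
-        self.model.eval()
-    def describe(self, frame_bgr):
-        # Sketch (assumed): the standard GIT captioning flow from the transformers docs
-        # Convert the OpenCV BGR frame to an RGB PIL image
-        image = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
-        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
-        with torch.no_grad():
-            ids = self.model.generate(pixel_values=pixel_values, max_length=50)
-        return self.processor.batch_decode(ids, skip_special_tokens=True)[0]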
-```
-**Speed:** ~500ms (3x faster than BLIP!)
----
-## Summary
-**Why slow:**
-- BLIP caption model takes 1.5 seconds
-- OCR adds 0.5 seconds
-- YOLO adds 0.5 seconds
-- Total: 2.5 seconds
-**Quick fix:**
-```python
-# Disable OCR and YOLO
-FEATURES = {
-    "ocr_enabled": False,
-    "object_detection_enabled": False,
-}
-```
-**New speed:** 1.5 seconds (just BLIP)
-**Best fix:**
-- Use GPU acceleration (2-3x faster)
-- Or replace BLIP with GIT model (3x faster)
-**The camera itself is fast - it's the AI models that are slow!**
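The stage timings quoted in this analysis are estimates, and they can be checked on a given machine with a small timer wrapped around each stage. The sketch below is a minimal illustration: `stage_timer` is a hypothetical helper, and the `time.sleep` calls stand in for the real capture and caption calls, which are not shown here.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(name, timings):
    """Record the wall-clock duration of a block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the stage raises, so failures are still timed.
        timings[name] = (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    timings = {}
    with stage_timer("capture", timings):
        time.sleep(0.01)   # stand-in for grabbing a frame
    with stage_timer("caption", timings):
        time.sleep(0.05)   # stand-in for the caption model
    slowest = max(timings, key=timings.get)
    for stage, ms in timings.items():
        print(f"{stage:>8}: {ms:7.1f} ms")
    print("Slowest stage:", slowest)
```

Measuring a few captures after the first one gives a fairer picture, since the first call usually includes model warm-up.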

extras/labelmap_M.txt DELETED Viewed

@@ -1,91 +0,0 @@
-???
-person
-bicycle
-car
-motorcycle
-airplane
-bus
-train
-truck
-boat
-traffic light
-fire hydrant
-???
-stop sign
-parking meter
-bench
-bird
-cat
-dog
-horse
-sheep
-cow
-elephant
-bear
-zebra
-giraffe
-???
-backpack
-umbrella
-???
-???
-handbag
-tie
-suitcase
-frisbee
-skis
-snowboard
-sports ball
-kite
-baseball bat
-baseball glove
-skateboard
-surfboard
-tennis racket
-bottle
-???
-wine glass
-cup
-fork
-knife
-spoon
-bowl
-banana
-apple
-sandwich
-orange
-broccoli
-carrot
-hot dog
-pizza
-donut
-cake
-chair
-couch
-potted plant
-bed
-???
-dining table
-???
-???
-toilet
-???
-tv
-laptop
-mouse
-remote
-keyboard
-cell phone
-microwave
-oven
-toaster
-sink
-refrigerator
-???
-book
-clock
-vase
-scissors
-teddy bear
-hair drier
-toothbrush
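The `???` entries in this label map appear to be placeholders for unused class IDs, with the line position acting as the ID (so `person` is 1 and `stop sign` is 13). Under that assumption, a loader can skip the placeholders while keeping the numbering intact. `parse_labelmap` is a hypothetical helper, not code from the repo.

```python
def parse_labelmap(lines, placeholder="???"):
    """Map class ID (zero-based line index) to label name.

    Placeholder and blank lines keep their index but are left out
    of the result, so the remaining IDs are not shifted.
    """
    labels = {}
    for class_id, raw in enumerate(lines):
        name = raw.strip()
        if name and name != placeholder:
            labels[class_id] = name
    return labels


if __name__ == "__main__":
    # First few entries of the file above.
    sample = ["???", "person", "bicycle", "car"]
    print(parse_labelmap(sample))
```

To load the real file, pass the open file object (or `path.read_text().splitlines()`) as `lines`.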

fix_and_run.bat DELETED Viewed

@@ -1,40 +0,0 @@
-@echo off
-REM Quick Fix Script for VisionQ
-REM Clears cache and restarts
-echo.
-echo ============================================
-echo VisionQ - Quick Fix Script
-echo ============================================
-echo.
-echo [1/3] Clearing Python cache...
-if exist "__pycache__" rd /s /q __pycache__
-if exist "agents\__pycache__" rd /s /q agents\__pycache__
-if exist "config\__pycache__" rd /s /q config\__pycache__
-if exist "core\__pycache__" rd /s /q core\__pycache__
-if exist "ui\__pycache__" rd /s /q ui\__pycache__
-echo    - Python cache cleared
-echo.
-echo [2/3] Clearing Streamlit cache...
-if exist ".streamlit\cache" rd /s /q .streamlit\cache
-echo    - Streamlit cache cleared
-echo.
-echo [3/3] Restarting application...
-echo.
-echo ============================================
-echo Cache cleared! Starting VisionQ...
-echo ============================================
-echo.
-REM Activate venv if exists
-if exist "venv\Scripts\activate.bat" (
-    call venv\Scripts\activate.bat
-)
-REM Run Streamlit
-streamlit run ui\app.py
-pause

fix_tensorflow.bat DELETED Viewed

@@ -1,43 +0,0 @@
-@echo off
-REM Fix TensorFlow/Protobuf Conflict
-echo.
-echo ============================================
-echo Fixing TensorFlow/Protobuf Conflict
-echo ============================================
-echo.
-REM Activate venv
-if exist ".venv\Scripts\activate.bat" (
-    call .venv\Scripts\activate.bat
-) else if exist "venv\Scripts\activate.bat" (
-    call venv\Scripts\activate.bat
-)
-echo [1/4] Uninstalling conflicting packages...
-pip uninstall tensorflow tensorflow-cpu protobuf -y
-echo.
-echo [2/4] Installing correct protobuf version...
-pip install protobuf==3.20.3
-echo.
-echo [3/4] Reinstalling transformers...
-pip install --upgrade --force-reinstall transformers
-echo.
-echo [4/4] Clearing cache...
-rd /s /q __pycache__ 2>nul
-rd /s /q agents\__pycache__ 2>nul
-rd /s /q config\__pycache__ 2>nul
-rd /s /q core\__pycache__ 2>nul
-rd /s /q ui\__pycache__ 2>nul
-echo.
-echo ============================================
-echo Fix Complete!
-echo ============================================
-echo.
-echo Now run: streamlit run ui\app.py
-echo.
-pause

memory.json DELETED Viewed

The diff for this file is too large to render. See raw diff

run_continuous.bat DELETED Viewed

@@ -1,30 +0,0 @@
-@echo off
-REM VisionQ - Continuous Camera Feed Version
-echo.
-echo ============================================
-echo VisionQ - Continuous Camera Feed
-echo ============================================
-echo.
-REM Activate venv
-if exist ".venv\Scripts\activate.bat" (
-    call .venv\Scripts\activate.bat
-) else if exist "venv\Scripts\activate.bat" (
-    call venv\Scripts\activate.bat
-)
-echo [INFO] Launching VisionQ with continuous camera feed...
-echo [INFO] Opening browser at http://localhost:8501
-echo.
-echo Features:
-echo   - Live camera feed
-echo   - Adjustable refresh rate
-echo   - Start/Stop camera control
-echo.
-echo Press Ctrl+C to stop the server
-echo.
-streamlit run ui\app_continuous.py
-pause

ui/app_continuous.py DELETED Viewed

@@ -1,340 +0,0 @@
-"""
-VisionQ - Enhanced Streamlit Interface with Continuous Camera Feed
-"""
-import streamlit as st
-import cv2
-import numpy as np
-from PIL import Image
-import sys
-from pathlib import Path
-import time
-# Add project root to path
-PROJECT_ROOT = Path(__file__).parent.parent
-sys.path.insert(0, str(PROJECT_ROOT))
-from config.settings import UI_CONFIG, OCR_CONFIG, SUPPORTED_LANGUAGES
-from agents.vision_agent import VisionAgent
-from agents.memory_agent import MemoryAgent
-from agents.query_agent import QueryAgent
-# Page config
-st.set_page_config(
-    page_title=UI_CONFIG["title"],
-    page_icon="👁️",
-    layout=UI_CONFIG["layout"],
-)
-# Custom CSS
-st.markdown("""
-<style>
-    .main-header {
-        font-size: 3rem;
-        font-weight: bold;
-        text-align: center;
-        margin-bottom: 2rem;
-        background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
-        -webkit-background-clip: text;
-        -webkit-text-fill-color: transparent;
-    }
-    .success-box {
-        padding: 1rem;
-        border-radius: 0.5rem;
-        background-color: #d4edda;
-        border: 1px solid #c3e6cb;
-        color: #155724;
-    }
-</style>
-""", unsafe_allow_html=True)
-# Initialize session state
-if "vision_agent" not in st.session_state:
-    st.session_state.vision_agent = None
-if "memory_agent" not in st.session_state:
-    st.session_state.memory_agent = None
-if "query_agent" not in st.session_state:
-    st.session_state.query_agent = None
-if "last_description" not in st.session_state:
-    st.session_state.last_description = None
-if "camera_running" not in st.session_state:
-    st.session_state.camera_running = False
-@st.cache_resource
-def load_agents():
-    """Load all agents (cached)"""
-    with st.spinner("Loading AI models... This may take a minute on first run..."):
-        try:
-            vision = VisionAgent()
-            memory = MemoryAgent()
-            query = QueryAgent(memory)
-            return vision, memory, query
-        except Exception as e:
-            st.error(f"Error loading agents: {e}")
-            return None, None, None
-def capture_frame(vision_agent):
-    """Capture frame from camera"""
-    ret, frame = vision_agent.cap.read()
-    if ret:
-        return frame
-    return None
-def main():
-    # Header
-    st.markdown('<h1 class="main-header">VisionQ - Multimodal AI Assistant</h1>', unsafe_allow_html=True)
-    # Sidebar
-    with st.sidebar:
-        st.header("Settings")
-        # Language selection
-        st.subheader("OCR Language")
-        selected_langs = st.multiselect(
-            "Select languages for text extraction:",
-            options=list(SUPPORTED_LANGUAGES.keys()),
-            default=OCR_CONFIG["languages"],
-            format_func=lambda x: f"{SUPPORTED_LANGUAGES[x]} ({x})"
-        )
-        if selected_langs:
-            OCR_CONFIG["languages"] = selected_langs
-        st.divider()
-        # Camera settings
-        st.subheader("Camera Settings")
-        refresh_rate = st.slider("Refresh rate (seconds)", 0.5, 5.0, 1.0, 0.5)
-        st.divider()
-        # Info
-        st.subheader("About")
-        st.info("""
-        **VisionQ** is a multimodal AI assistant that can:
-        - See and describe scenes
-        - Read text (OCR)
-        - Remember and recall
-        - Search memories
-        """)
-        st.divider()
-        # Stats
-        st.subheader("System Status")
-        if st.session_state.memory_agent:
-            memories = st.session_state.memory_agent.recall_all()
-            st.metric("Memories Stored", len(memories))
-        else:
-            st.metric("Memories Stored", "Not loaded")
-    # Main content
-    tab1, tab2, tab3, tab4 = st.tabs(["Vision", "Query", "Memories", "Help"])
-    # TAB 1: VISION
-    with tab1:
-        st.header("Vision System")
-        # Load agents
-        if st.session_state.vision_agent is None:
-            if st.button("Initialize System", type="primary"):
-                st.cache_resource.clear()
-                vision, memory, query = load_agents()
-                if vision:
-                    st.session_state.vision_agent = vision
-                    st.session_state.memory_agent = memory
-                    st.session_state.query_agent = query
-                    st.success("System initialized successfully!")
-                    st.rerun()
-        else:
-            col1, col2 = st.columns([2, 1])
-            with col1:
-                st.subheader("Live Camera Feed")
-                # Camera controls
-                col_a, col_b, col_c, col_d = st.columns(4)
-                with col_a:
-                    if st.button("Capture & Describe"):
-                        with st.spinner("Analyzing scene..."):
-                            description = st.session_state.vision_agent.describe_scene()
-                            if description:
-                                st.session_state.last_description = description
-                                st.success("Scene analyzed!")
-                with col_b:
-                    if st.button("Remember Scene"):
-                        with st.spinner("Storing memory..."):
-                            description = st.session_state.vision_agent.remember_scene()
-                            if description:
-                                st.session_state.last_description = description
-                                st.success("Scene remembered!")
-                with col_c:
-                    if st.button("Read Text"):
-                        with st.spinner("Extracting text..."):
-                            text_result = st.session_state.vision_agent.read_text()
-                            if text_result:
-                                st.session_state.last_description = text_result
-                                st.success("Text extracted!")
-                with col_d:
-                    if st.button("Stop Camera" if st.session_state.camera_running else "Start Camera"):
-                        st.session_state.camera_running = not st.session_state.camera_running
-                        st.rerun()
-                # Camera feed placeholder
-                camera_placeholder = st.empty()
-                # Continuous camera feed
-                if st.session_state.camera_running:
-                    frame = capture_frame(st.session_state.vision_agent)
-                    if frame is not None:
-                        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-                        camera_placeholder.image(frame_rgb, channels="RGB", use_container_width=True)
-                        time.sleep(refresh_rate)
-                        st.rerun()
-                    else:
-                        camera_placeholder.error("Could not capture frame from camera")
-                else:
-                    # Show single frame when stopped
-                    frame = capture_frame(st.session_state.vision_agent)
-                    if frame is not None:
-                        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-                        camera_placeholder.image(frame_rgb, channels="RGB", use_container_width=True)
-                    else:
-                        camera_placeholder.info("Click 'Start Camera' to begin live feed")
-            with col2:
-                st.subheader("Results")
-                if st.session_state.last_description:
-                    st.markdown(f'<div class="success-box">{st.session_state.last_description}</div>',
-                              unsafe_allow_html=True)
-                else:
-                    st.info("Click a button to analyze the scene")
-    # TAB 2: QUERY
-    with tab2:
-        st.header("Query Memories")
-        if st.session_state.query_agent is None:
-            st.warning("Please initialize the system first (Vision tab)")
-        else:
-            st.subheader("Ask a Question")
-            query_text = st.text_input(
-                "Enter your question:",
-                placeholder="e.g., What did I see this morning?",
-                key="query_input"
-            )
-            st.caption("**Examples:**")
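-            col1, col2, col3 = st.columns(3)
-            example_clicked = False  # True only on the rerun triggered by an example button
-            with col1:
-                if st.button("What did I see today?"):
-                    query_text, example_clicked = "What did I see today?", True
-            with col2:
-                if st.button("When did I see a person?"):
-                    query_text, example_clicked = "When did I see a person?", True
-            with col3:
-                if st.button("Show memories with text"):
-                    query_text, example_clicked = "Show memories with text", True
-            # Example buttons run their query directly; otherwise the click is lost on the next rerun
-            search_clicked = st.button("Search", type="primary")
-            if (search_clicked or example_clicked) and query_text: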
-                with st.spinner("Searching memories..."):
-                    result = st.session_state.query_agent.ask(query_text)
-                    st.subheader("Results")
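-                    # Match whole words so "know" or "snow" is not mistaken for "no"
-                    words = result.lower().replace(",", " ").replace(".", " ").split()
-                    if "don't" in words or "no" in words: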
-                        st.info(result)
-                    else:
-                        st.success(result)
-    # TAB 3: MEMORIES
-    with tab3:
-        st.header("Memory Browser")
-        if st.session_state.memory_agent is None:
-            st.warning("Please initialize the system first (Vision tab)")
-        else:
-            memories = st.session_state.memory_agent.recall_all()
-            if not memories:
-                st.info("No memories stored yet. Use the Vision tab to remember scenes!")
-            else:
-                st.success(f"Total memories: {len(memories)}")
-                for i, mem in enumerate(reversed(memories[-10:])):
-                    with st.expander(f"Memory #{mem.get('id', i)} - {mem.get('timestamp', 'Unknown')}"):
-                        st.write(f"**Description:** {mem.get('description', 'N/A')}")
-                        st.write(f"**Importance:** {mem.get('importance', 1)}")
-                        has_text_emb = "text_embedding" in mem
-                        has_img_emb = "image_embedding" in mem
-                        col1, col2 = st.columns(2)
-                        with col1:
-                            st.caption(f"Text Embedding: {'Yes' if has_text_emb else 'No'}")
-                        with col2:
-                            st.caption(f"Image Embedding: {'Yes' if has_img_emb else 'No'}")
-                st.divider()
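-                if st.button("Clear All Memories", type="secondary"):
-                    # Keep the request in session state: a button nested inside another
-                    # button never fires, because the outer one is False again on the rerun.
-                    st.session_state.confirm_clear = True
-                if st.session_state.get("confirm_clear") and st.button("Confirm Clear"):
-                    st.session_state.memory_agent.memories = []
-                    st.session_state.memory_agent._save()
-                    st.session_state.confirm_clear = False
-                    st.success("All memories cleared!")
-                    st.rerun()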
-    # TAB 4: HELP
-    with tab4:
-        st.header("Help & Documentation")
-        st.subheader("Quick Start")
-        st.markdown("""
-        1. **Initialize System**: Click "Initialize System" in the Vision tab
-        2. **Start Camera**: Click "Start Camera" for continuous feed
-        3. **Capture Scene**: Click "Capture & Describe" to analyze
-        4. **Remember**: Click "Remember Scene" to store in memory
-        5. **Read Text**: Click "Read Text" to extract visible text
-        6. **Query**: Go to Query tab and ask questions
-        """)
-        st.divider()
-        st.subheader("Camera Controls")
-        st.markdown("""
-        - **Start Camera**: Begins continuous live feed
-        - **Stop Camera**: Pauses live feed (saves resources)
-        - **Refresh Rate**: Adjust in sidebar (0.5-5 seconds)
-        **Tip:** Stop camera when not in use to save CPU/battery
-        """)
-        st.divider()
-        st.subheader("Supported Languages")
-        st.markdown(f"""
-        VisionQ supports **{len(SUPPORTED_LANGUAGES)} languages** for OCR.
-        Select languages in the sidebar settings.
-        """)
-        st.divider()
-        st.subheader("Troubleshooting")
-        st.markdown("""
-        **Camera not working?**
-        - Check camera permissions
-        - Ensure no other app is using the camera
-        - Try clicking "Stop Camera" then "Start Camera"
-        **System slow?**
-        - Stop the camera when not needed
-        - Raise the refresh rate value in the sidebar (a longer interval between updates)
-        - Check `docs/PERFORMANCE.md`
-        """)
-if __name__ == "__main__":
-    main()
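
The removed Query tab chose between `st.info` and `st.success` with a substring test, `"no" in result.lower()`, which also matches words such as "not", "know", or "notebook". A minimal sketch of a whole-word check follows. The helper name `is_negative_result` and the sample strings are hypothetical and do not appear in the removed file, and the real wording of the query agent's answers is not known here.

```python
import re

# Hypothetical helper, not part of the removed file: match "no" or "don't"
# only as whole words, so "notebook" or "know" do not count as negative.
_NEGATIVE = re.compile(r"\b(?:no|don't)\b", re.IGNORECASE)


def is_negative_result(result: str) -> bool:
    """Return True when the answer contains "no" or "don't" as a whole word."""
    return bool(_NEGATIVE.search(result))


if __name__ == "__main__":
    # Illustrative strings only; they are assumptions, not real agent output.
    for text in ("No memories found.", "I saw a notebook at noon"):
        print(text, "->", is_negative_result(text))
```

In the UI this would replace the condition only: call `st.info(result)` when the helper returns True and `st.success(result)` otherwise. A keyword test is still a heuristic, so a status flag returned by the query agent would be more robust if one were available.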