Clean up repo: remove archive, dev docs, nested submodule; fix license badge
- .gitignore +7 -0
- BEFORE_AFTER.md +0 -422
- FINAL_FIX.md +0 -177
- FIXES_APPLIED.md +0 -124
- FIX_TENSORFLOW.md +0 -138
- MASTER_TROUBLESHOOTING.md +0 -232
- MODELS_AND_SPEED.md +0 -203
- README.md +1 -1
- RESTRUCTURE_SUMMARY.md +0 -348
- TROUBLESHOOTING.md +0 -191
- VisionQ +0 -1
- archive/old_agents/caption_agent.py +0 -40
- archive/old_agents/memory_agent.py +0 -59
- archive/old_agents/query_agent.py +0 -127
- archive/old_agents/vision_agent.py +0 -210
- archive/old_agents/voice_agent.py +0 -127
- archive/old_docs/ARCHITECTURE.md +0 -445
- archive/old_docs/COMPARISON.md +0 -431
- archive/old_docs/DEPLOYMENT_CHECKLIST.md +0 -397
- archive/old_docs/INDEX.md +0 -359
- archive/old_docs/QUICKSTART.md +0 -197
- archive/old_docs/QUICK_REFERENCE.md +0 -315
- archive/old_docs/README_UPGRADED.md +0 -410
- archive/old_docs/SUMMARY.md +0 -406
- archive/old_docs/UPGRADE_GUIDE.md +0 -532
- archive/old_scripts/ask_question.py +0 -19
- archive/old_scripts/ask_question_upgraded.py +0 -41
- archive/old_scripts/install_upgrade.bat +0 -101
- archive/old_scripts/main.py +0 -66
- archive/old_scripts/main_upgraded.py +0 -85
- archive/old_scripts/test_upgrade.py +0 -274
- archive/pipcheck.txt +0 -0
- archive/requirements_upgraded.txt +0 -54
- cleanup.bat +0 -65
- config/fast_mode.py +0 -40
- docs/CAMERA_FEED.md +0 -178
- docs/PERFORMANCE.md +0 -187
- docs/PERFORMANCE_ANALYSIS.md +0 -310
- extras/labelmap_M.txt +0 -91
- fix_and_run.bat +0 -40
- fix_tensorflow.bat +0 -43
- memory.json +0 -0
- run_continuous.bat +0 -30
- ui/app_continuous.py +0 -340
.gitignore
CHANGED
@@ -51,6 +51,13 @@ models/piper/
 *.tflite
 *.onnx
 
+# Runtime data at root
+memory.json
+
+# Local archive and extras
+archive/
+extras/
+
 # Environment variables
 .env
 
|
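To sanity-check which paths the entries in this hunk cover, here is a minimal sketch in Python. It hand-codes only the rules visible in the hunk (two suffix patterns, two file names, two directory names) and is not a general gitignore matcher; the function name `is_ignored` and the sample paths are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Rules copied from the .gitignore hunk; everything else here is illustrative.
IGNORED_SUFFIXES = {".tflite", ".onnx"}   # *.tflite, *.onnx
IGNORED_FILES = {"memory.json", ".env"}   # memory.json, .env
IGNORED_DIRS = {"archive", "extras"}      # archive/, extras/


def is_ignored(path: str) -> bool:
    """Simplified check for the patterns shown in this .gitignore hunk."""
    p = PurePosixPath(path)
    # A directory pattern such as "archive/" ignores everything beneath it.
    if any(part in IGNORED_DIRS for part in p.parts[:-1]):
        return True
    # Patterns without a slash match the file name at any depth.
    return p.name in IGNORED_FILES or p.suffix in IGNORED_SUFFIXES


for sample in ("archive/old_scripts/main.py", "memory.json", "ui/app.py"):
    print(sample, is_ignored(sample))
```

Ignore rules only affect untracked files, which is consistent with this commit also deleting `memory.json`, `archive/`, and `extras/` from the tree rather than relying on the new entries alone.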
BEFORE_AFTER.md
DELETED
|
@@ -1,422 +0,0 @@
|
|
| 1 |
-
# 📊 VisionQ - Before & After Restructuring
|
| 2 |
-
|
| 3 |
-
## 🎯 TRANSFORMATION SUMMARY
|
| 4 |
-
|
| 5 |
-
Your VisionQ project has been transformed from a **cluttered development project** to a **clean, production-ready application** with a **web interface**!
|
| 6 |
-
|
| 7 |
-
---
|
| 8 |
-
|
| 9 |
-
## 📂 FOLDER STRUCTURE COMPARISON
|
| 10 |
-
|
| 11 |
-
### **BEFORE** ❌
|
| 12 |
-
```
|
| 13 |
-
VisionQ/
|
| 14 |
-
├── agents/ (new)
|
| 15 |
-
├── core/ (new)
|
| 16 |
-
├── data/
|
| 17 |
-
├── extras/
|
| 18 |
-
├── models/
|
| 19 |
-
├── VisionQ/
|
| 20 |
-
├── caption_agent.py (duplicate)
|
| 21 |
-
├── memory_agent.py (duplicate)
|
| 22 |
-
├── query_agent.py (duplicate)
|
| 23 |
-
├── vision_agent.py (duplicate)
|
| 24 |
-
├── voice_agent.py (duplicate)
|
| 25 |
-
├── main.py (old)
|
| 26 |
-
├── main_upgraded.py (old)
|
| 27 |
-
├── ask_question.py (old)
|
| 28 |
-
├── ask_question_upgraded.py (old)
|
| 29 |
-
├── test_upgrade.py (old)
|
| 30 |
-
├── install_upgrade.bat (old)
|
| 31 |
-
├── requirements.txt (old)
|
| 32 |
-
├── requirements_upgraded.txt (old)
|
| 33 |
-
├── README.md (old)
|
| 34 |
-
├── README_UPGRADED.md (duplicate)
|
| 35 |
-
├── ARCHITECTURE.md (old)
|
| 36 |
-
├── COMPARISON.md (old)
|
| 37 |
-
├── DEPLOYMENT_CHECKLIST.md (old)
|
| 38 |
-
├── INDEX.md (old)
|
| 39 |
-
├── QUICK_REFERENCE.md (old)
|
| 40 |
-
├── QUICKSTART.md (old)
|
| 41 |
-
├── SUMMARY.md (old)
|
| 42 |
-
├── UPGRADE_GUIDE.md (old)
|
| 43 |
-
└── ... (many more files)
|
| 44 |
-
|
| 45 |
-
❌ 40+ files in root
|
| 46 |
-
❌ Duplicate files
|
| 47 |
-
❌ Confusing structure
|
| 48 |
-
❌ No web interface
|
| 49 |
-
❌ Scattered docs
|
| 50 |
-
```
|
| 51 |
-
|
| 52 |
-
### **AFTER** ✅
|
| 53 |
-
```
|
| 54 |
-
VisionQ/
|
| 55 |
-
├── 📁 agents/ # AI agents (clean)
|
| 56 |
-
├── 📁 config/ # Settings (centralized)
|
| 57 |
-
├── 📁 ui/ # Web interface (NEW!)
|
| 58 |
-
├── 📁 core/ # Integration
|
| 59 |
-
├── 📁 data/ # Storage
|
| 60 |
-
├── 📁 models/ # AI models
|
| 61 |
-
├── 📁 docs/ # Documentation (organized)
|
| 62 |
-
├── 📁 .streamlit/ # UI config
|
| 63 |
-
├── 📁 archive/ # Old files (backup)
|
| 64 |
-
├── 📄 README.md # Main docs (clean)
|
| 65 |
-
├── 📄 requirements.txt # Dependencies (clean)
|
| 66 |
-
├── 📄 run.bat # Launcher (NEW!)
|
| 67 |
-
├── 📄 cleanup.bat # Cleanup script (NEW!)
|
| 68 |
-
├── 📄 .env.example # Environment template (NEW!)
|
| 69 |
-
└── 📄 .gitignore # Git rules (updated)
|
| 70 |
-
|
| 71 |
-
✅ 15 files in root
|
| 72 |
-
✅ No duplicates
|
| 73 |
-
✅ Clear structure
|
| 74 |
-
✅ Web interface
|
| 75 |
-
✅ Organized docs
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
---
|
| 79 |
-
|
| 80 |
-
## 🆕 NEW FEATURES
|
| 81 |
-
|
| 82 |
-
| Feature | Before | After |
|
| 83 |
-
|---------|--------|-------|
|
| 84 |
-
| **Web Interface** | ❌ None | ✅ Streamlit UI |
|
| 85 |
-
| **One-Click Launch** | ❌ None | ✅ run.bat |
|
| 86 |
-
| **Centralized Config** | ❌ Scattered | ✅ config/settings.py |
|
| 87 |
-
| **Language Docs** | ❌ None | ✅ docs/LANGUAGES.md |
|
| 88 |
-
| **API Keys Docs** | ❌ None | ✅ docs/API_KEYS.md |
|
| 89 |
-
| **Structure Docs** | ❌ None | ✅ docs/STRUCTURE.md |
|
| 90 |
-
| **Cleanup Script** | ❌ None | ✅ cleanup.bat |
|
| 91 |
-
| **Environment Template** | ❌ None | ✅ .env.example |
|
| 92 |
-
|
| 93 |
-
---
|
| 94 |
-
|
| 95 |
-
## 🌐 WEB INTERFACE (NEW!)
|
| 96 |
-
|
| 97 |
-
### **Before**
|
| 98 |
-
```python
|
| 99 |
-
# Command line only
|
| 100 |
-
python main_upgraded.py
|
| 101 |
-
|
| 102 |
-
# Voice commands only
|
| 103 |
-
# No visual feedback
|
| 104 |
-
# Hard to test
|
| 105 |
-
```
|
| 106 |
-
|
| 107 |
-
### **After**
|
| 108 |
-
```bash
|
| 109 |
-
# Web interface
|
| 110 |
-
run.bat
|
| 111 |
-
|
| 112 |
-
# Opens browser at http://localhost:8501
|
| 113 |
-
# Visual interface
|
| 114 |
-
# Easy to test
|
| 115 |
-
# Interactive
|
| 116 |
-
```
|
| 117 |
-
|
| 118 |
-
### **UI Features**
|
| 119 |
-
- ✅ **4 Tabs:** Vision, Query, Memories, Help
|
| 120 |
-
- ✅ **Live Camera:** See what AI sees
|
| 121 |
-
- ✅ **Interactive Buttons:** Capture, Remember, Read Text
|
| 122 |
-
- ✅ **Query Interface:** Ask questions visually
|
| 123 |
-
- ✅ **Memory Browser:** View stored memories
|
| 124 |
-
- ✅ **Settings Sidebar:** Configure languages
|
| 125 |
-
- ✅ **Help Section:** Built-in documentation
|
| 126 |
-
|
| 127 |
-
---
|
| 128 |
-
|
| 129 |
-
## 🌍 LANGUAGE SUPPORT
|
| 130 |
-
|
| 131 |
-
### **Before**
|
| 132 |
-
```python
|
| 133 |
-
# Hardcoded in code
|
| 134 |
-
OCR_LANGUAGES = ['en']
|
| 135 |
-
|
| 136 |
-
# No documentation
|
| 137 |
-
# No easy way to change
|
| 138 |
-
```
|
| 139 |
-
|
| 140 |
-
### **After**
|
| 141 |
-
```python
|
| 142 |
-
# Configurable in UI
|
| 143 |
-
# Select from 90+ languages
|
| 144 |
-
# Documented in docs/LANGUAGES.md
|
| 145 |
-
|
| 146 |
-
# Easy to change:
|
| 147 |
-
# 1. Open UI sidebar
|
| 148 |
-
# 2. Select languages
|
| 149 |
-
# 3. Done!
|
| 150 |
-
```
|
| 151 |
-
|
| 152 |
-
**Supported:** 90+ languages including English, Spanish, French, German, Italian, Portuguese, Russian, Arabic, Hindi, Chinese, Japanese, Korean, and many more!
|
| 153 |
-
|
| 154 |
-
---
|
| 155 |
-
|
| 156 |
-
## 🔑 API KEYS CLARITY
|
| 157 |
-
|
| 158 |
-
### **Before**
|
| 159 |
-
```
|
| 160 |
-
❓ Unclear if API keys needed
|
| 161 |
-
❓ No documentation
|
| 162 |
-
❓ Confusing for users
|
| 163 |
-
```
|
| 164 |
-
|
| 165 |
-
### **After**
|
| 166 |
-
```
|
| 167 |
-
✅ Clear: NO API keys needed!
|
| 168 |
-
✅ Documented in docs/API_KEYS.md
|
| 169 |
-
✅ .env.example for optional token
|
| 170 |
-
✅ Works 100% offline
|
| 171 |
-
```
|
| 172 |
-
|
| 173 |
-
---
|
| 174 |
-
|
| 175 |
-
## 📚 DOCUMENTATION
|
| 176 |
-
|
| 177 |
-
### **Before**
|
| 178 |
-
```
|
| 179 |
-
❌ 11 documentation files in root
|
| 180 |
-
❌ Scattered information
|
| 181 |
-
❌ Redundant content
|
| 182 |
-
❌ Hard to find info
|
| 183 |
-
```
|
| 184 |
-
|
| 185 |
-
### **After**
|
| 186 |
-
```
|
| 187 |
-
✅ 1 main README.md
|
| 188 |
-
✅ 3 focused docs in docs/
|
| 189 |
-
✅ No redundancy
|
| 190 |
-
✅ Easy to navigate
|
| 191 |
-
```
|
| 192 |
-
|
| 193 |
-
| Document | Purpose |
|
| 194 |
-
|----------|---------|
|
| 195 |
-
| `README.md` | Main documentation |
|
| 196 |
-
| `docs/LANGUAGES.md` | Language support (90+) |
|
| 197 |
-
| `docs/API_KEYS.md` | API keys info |
|
| 198 |
-
| `docs/STRUCTURE.md` | Project structure |
|
| 199 |
-
| `RESTRUCTURE_SUMMARY.md` | This summary |
|
| 200 |
-
|
| 201 |
-
---
|
| 202 |
-
|
| 203 |
-
## 🎯 USER EXPERIENCE
|
| 204 |
-
|
| 205 |
-
### **Before**
|
| 206 |
-
```
|
| 207 |
-
1. Install dependencies
|
| 208 |
-
2. Run python main_upgraded.py
|
| 209 |
-
3. Use voice commands only
|
| 210 |
-
4. No visual feedback
|
| 211 |
-
5. Hard to debug
|
| 212 |
-
```
|
| 213 |
-
|
| 214 |
-
### **After**
|
| 215 |
-
```
|
| 216 |
-
1. Run run.bat
|
| 217 |
-
2. Browser opens automatically
|
| 218 |
-
3. Click buttons in UI
|
| 219 |
-
4. See results instantly
|
| 220 |
-
5. Easy to use and test
|
| 221 |
-
```
|
| 222 |
-
|
| 223 |
-
---
|
| 224 |
-
|
| 225 |
-
## 👨💻 DEVELOPER EXPERIENCE
|
| 226 |
-
|
| 227 |
-
### **Before**
|
| 228 |
-
```
|
| 229 |
-
❌ Flat file structure
|
| 230 |
-
❌ Settings scattered in code
|
| 231 |
-
❌ Hard to find files
|
| 232 |
-
❌ Duplicate code
|
| 233 |
-
❌ No clear entry point
|
| 234 |
-
```
|
| 235 |
-
|
| 236 |
-
### **After**
|
| 237 |
-
```
|
| 238 |
-
✅ Organized folders
|
| 239 |
-
✅ Centralized config
|
| 240 |
-
✅ Easy to navigate
|
| 241 |
-
✅ No duplicates
|
| 242 |
-
✅ Clear entry points
|
| 243 |
-
```
|
| 244 |
-
|
| 245 |
-
---
|
| 246 |
-
|
| 247 |
-
## 🔧 CONFIGURATION
|
| 248 |
-
|
| 249 |
-
### **Before**
|
| 250 |
-
```python
|
| 251 |
-
# Settings scattered across files
|
| 252 |
-
# Hard to change
|
| 253 |
-
# No central config
|
| 254 |
-
```
|
| 255 |
-
|
| 256 |
-
### **After**
|
| 257 |
-
```python
|
| 258 |
-
# All in config/settings.py
|
| 259 |
-
# Easy to customize
|
| 260 |
-
# Well documented
|
| 261 |
-
# Feature flags
|
| 262 |
-
|
| 263 |
-
# Example:
|
| 264 |
-
OCR_CONFIG = {
|
| 265 |
-
"languages": ["en", "es", "fr"],
|
| 266 |
-
"confidence_threshold": 0.3,
|
| 267 |
-
}
|
| 268 |
-
```
|
| 269 |
-
|
| 270 |
-
---
|
| 271 |
-
|
| 272 |
-
## 📊 FILE COUNT
|
| 273 |
-
|
| 274 |
-
| Category | Before | After | Change |
|
| 275 |
-
|----------|--------|-------|--------|
|
| 276 |
-
| **Root Files** | 40+ | 15 | -25 |
|
| 277 |
-
| **Agent Files** | 12 (duplicates) | 7 (clean) | -5 |
|
| 278 |
-
| **Doc Files** | 11 (scattered) | 4 (organized) | -7 |
|
| 279 |
-
| **Config Files** | 0 | 1 | +1 |
|
| 280 |
-
| **UI Files** | 0 | 1 | +1 |
|
| 281 |
-
| **Total Clutter** | High | Low | ✅ |
|
| 282 |
-
|
| 283 |
-
---
|
| 284 |
-
|
| 285 |
-
## 🚀 LAUNCH PROCESS
|
| 286 |
-
|
| 287 |
-
### **Before**
|
| 288 |
-
```bash
|
| 289 |
-
# Manual process
|
| 290 |
-
1. Activate venv
|
| 291 |
-
2. Install dependencies
|
| 292 |
-
3. Run python main_upgraded.py
|
| 293 |
-
4. Hope it works
|
| 294 |
-
```
|
| 295 |
-
|
| 296 |
-
### **After**
|
| 297 |
-
```bash
|
| 298 |
-
# One command
|
| 299 |
-
run.bat
|
| 300 |
-
|
| 301 |
-
# Automatically:
|
| 302 |
-
# - Creates venv if needed
|
| 303 |
-
# - Installs dependencies
|
| 304 |
-
# - Launches Streamlit
|
| 305 |
-
# - Opens browser
|
| 306 |
-
```
|
| 307 |
-
|
| 308 |
-
---
|
| 309 |
-
|
| 310 |
-
## 🎨 VISUAL COMPARISON
|
| 311 |
-
|
| 312 |
-
### **Before: Command Line**
|
| 313 |
-
```
|
| 314 |
-
$ python main_upgraded.py
|
| 315 |
-
[VisionAgent] Initializing...
|
| 316 |
-
[VisionAgent] YOLO backend loaded
|
| 317 |
-
[VoiceAgent] Microphone detected
|
| 318 |
-
Vision Q started. I am listening.
|
| 319 |
-
[VOICE IN]: Listening (offline)...
|
| 320 |
-
```
|
| 321 |
-
|
| 322 |
-
### **After: Web Interface**
|
| 323 |
-
```
|
| 324 |
-
┌─────────────────────────────────────┐
|
| 325 |
-
│ 👁️ VisionQ - Multimodal AI │
|
| 326 |
-
├─────────────────────────────────────┤
|
| 327 |
-
│ 📷 Vision 🔍 Query 🧠 Memories │
|
| 328 |
-
├─────────────────────────────────────┤
|
| 329 |
-
│ [📷 Capture] [💾 Remember] │
|
| 330 |
-
│ [🔤 Read Text] │
|
| 331 |
-
│ │
|
| 332 |
-
│ 📸 Camera Feed │
|
| 333 |
-
│ [Live video preview] │
|
| 334 |
-
│ │
|
| 335 |
-
│ 📝 Results │
|
| 336 |
-
│ "a person holding a phone" │
|
| 337 |
-
└─────────────────────────────────────┘
|
| 338 |
-
```
|
| 339 |
-
|
| 340 |
-
---
|
| 341 |
-
|
| 342 |
-
## ✅ BENEFITS SUMMARY
|
| 343 |
-
|
| 344 |
-
### **For Users**
|
| 345 |
-
- ✅ Easy web interface
|
| 346 |
-
- ✅ Visual feedback
|
| 347 |
-
- ✅ One-click launch
|
| 348 |
-
- ✅ 90+ languages
|
| 349 |
-
- ✅ No API keys needed
|
| 350 |
-
|
| 351 |
-
### **For Developers**
|
| 352 |
-
- ✅ Clean structure
|
| 353 |
-
- ✅ Modular code
|
| 354 |
-
- ✅ Centralized config
|
| 355 |
-
- ✅ Easy to extend
|
| 356 |
-
- ✅ Well documented
|
| 357 |
-
|
| 358 |
-
### **For Everyone**
|
| 359 |
-
- ✅ Professional appearance
|
| 360 |
-
- ✅ Production ready
|
| 361 |
-
- ✅ Easy to deploy
|
| 362 |
-
- ✅ Easy to maintain
|
| 363 |
-
- ✅ Open source
|
| 364 |
-
|
| 365 |
-
---
|
| 366 |
-
|
| 367 |
-
## 🎯 WHAT TO DO NOW
|
| 368 |
-
|
| 369 |
-
### **1. Install & Run**
|
| 370 |
-
```bash
|
| 371 |
-
pip install -r requirements.txt
|
| 372 |
-
run.bat
|
| 373 |
-
```
|
| 374 |
-
|
| 375 |
-
### **2. Clean Up Old Files**
|
| 376 |
-
```bash
|
| 377 |
-
cleanup.bat
|
| 378 |
-
```
|
| 379 |
-
|
| 380 |
-
### **3. Explore**
|
| 381 |
-
- Open http://localhost:8501
|
| 382 |
-
- Try the web interface
|
| 383 |
-
- Test OCR in different languages
|
| 384 |
-
- Query your memories
|
| 385 |
-
|
| 386 |
-
### **4. Customize**
|
| 387 |
-
- Edit `config/settings.py`
|
| 388 |
-
- Select languages in UI
|
| 389 |
-
- Adjust settings
|
| 390 |
-
|
| 391 |
-
---
|
| 392 |
-
|
| 393 |
-
## 📈 IMPROVEMENT METRICS
|
| 394 |
-
|
| 395 |
-
| Metric | Before | After | Improvement |
|
| 396 |
-
|--------|--------|-------|-------------|
|
| 397 |
-
| **Files in Root** | 40+ | 15 | 🟢 -62% |
|
| 398 |
-
| **Duplicate Files** | 12 | 0 | 🟢 -100% |
|
| 399 |
-
| **Setup Steps** | 5 | 1 | 🟢 -80% |
|
| 400 |
-
| **User Interface** | CLI only | Web UI | 🟢 +100% |
|
| 401 |
-
| **Documentation** | Scattered | Organized | 🟢 +100% |
|
| 402 |
-
| **Ease of Use** | Hard | Easy | 🟢 +200% |
|
| 403 |
-
|
| 404 |
-
---
|
| 405 |
-
|
| 406 |
-
## 🎉 FINAL RESULT
|
| 407 |
-
|
| 408 |
-
**VisionQ is now:**
|
| 409 |
-
- ✅ **Clean** - Organized folder structure
|
| 410 |
-
- ✅ **Modern** - Web interface with Streamlit
|
| 411 |
-
- ✅ **Documented** - Clear, focused documentation
|
| 412 |
-
- ✅ **Configurable** - Centralized settings
|
| 413 |
-
- ✅ **Multi-lingual** - 90+ languages supported
|
| 414 |
-
- ✅ **Offline** - No API keys needed
|
| 415 |
-
- ✅ **Professional** - Production ready
|
| 416 |
-
- ✅ **User-Friendly** - Easy to use and test
|
| 417 |
-
|
| 418 |
-
---
|
| 419 |
-
|
| 420 |
-
**From cluttered development project to polished application! 🚀**
|
| 421 |
-
|
| 422 |
-
**Run `run.bat` and see the transformation yourself!**
|
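The percentages in the "IMPROVEMENT METRICS" table of the deleted BEFORE_AFTER.md can be reproduced from the before/after counts it quotes. The sketch below is a worked check, assuming "40+" is read as exactly 40; `percent_change` is an illustrative helper, not something from the repository. Rows without counts (for example "User Interface" or "Ease of Use") cannot be derived this way.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("percent change is undefined for before == 0")
    return (after - before) / before * 100.0


# Counts taken from the deleted table; "40+" is treated as 40 (an assumption).
metrics = {
    "Files in Root": (40, 15),
    "Duplicate Files": (12, 0),
    "Setup Steps": (5, 1),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
```

Under that assumption the first row works out to a 62.5% reduction, which the table reports as -62%; the other two rows match the table's -100% and -80%.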
FINAL_FIX.md
DELETED
|
@@ -1,177 +0,0 @@
|
|
| 1 |
-
# FINAL FIX - Summary
|
| 2 |
-
|
| 3 |
-
## What Was Done
|
| 4 |
-
|
| 5 |
-
### 1. Fixed Embedding Normalization
|
| 6 |
-
- Changed from `.norm()` to `torch.nn.functional.normalize()`
|
| 7 |
-
- Updated both `encode_image()` and `encode_text()` methods
|
| 8 |
-
- File: `agents/embedding_agent.py`
|
| 9 |
-
|
| 10 |
-
### 2. Added Error Handling
|
| 11 |
-
- Wrapped embedding calls in try-except blocks
|
| 12 |
-
- System continues even if embeddings fail
|
| 13 |
-
- File: `agents/vision_agent.py`
|
| 14 |
-
|
| 15 |
-
### 3. Disabled Embeddings by Default
|
| 16 |
-
- Set `embeddings_enabled: False` in config
|
| 17 |
-
- Improves speed (2-3 seconds vs 5-7 seconds)
|
| 18 |
-
- File: `config/settings.py`
|
| 19 |
-
|
| 20 |
-
### 4. Removed All Emojis
|
| 21 |
-
- Cleaned up UI code
|
| 22 |
-
- Professional appearance
|
| 23 |
-
- File: `ui/app.py`
|
| 24 |
-
|
| 25 |
-
### 5. Added Cache Clearing
|
| 26 |
-
- Created `fix_and_run.bat` script
|
| 27 |
-
- Clears Python and Streamlit cache
|
| 28 |
-
- Ensures fresh start
|
| 29 |
-
|
| 30 |
-
## How to Fix the Error
|
| 31 |
-
|
| 32 |
-
### Quick Fix (Do This Now)
|
| 33 |
-
|
| 34 |
-
```bash
|
| 35 |
-
# Run this script
|
| 36 |
-
fix_and_run.bat
|
| 37 |
-
```
|
| 38 |
-
|
| 39 |
-
This will:
|
| 40 |
-
1. Clear all cache
|
| 41 |
-
2. Start fresh
|
| 42 |
-
3. Load updated code
|
| 43 |
-
|
| 44 |
-
### If Still Getting Errors
|
| 45 |
-
|
| 46 |
-
1. **Stop the application** (Ctrl+C in terminal)
|
| 47 |
-
|
| 48 |
-
2. **Clear all cache manually:**
|
| 49 |
-
```bash
|
| 50 |
-
rd /s /q __pycache__
|
| 51 |
-
rd /s /q agents\__pycache__
|
| 52 |
-
rd /s /q config\__pycache__
|
| 53 |
-
rd /s /q core\__pycache__
|
| 54 |
-
rd /s /q ui\__pycache__
|
| 55 |
-
```
|
| 56 |
-
|
| 57 |
-
3. **Restart:**
|
| 58 |
-
```bash
|
| 59 |
-
run.bat
|
| 60 |
-
```
|
| 61 |
-
|
| 62 |
-
4. **In browser, press `C` key** to clear Streamlit cache
|
| 63 |
-
|
| 64 |
-
5. **Click "Initialize System"** again
|
| 65 |
-
|
| 66 |
-
## Why This Happens
|
| 67 |
-
|
| 68 |
-
The error occurs because:
|
| 69 |
-
1. Streamlit caches the old code
|
| 70 |
-
2. Python bytecode cache has old version
|
| 71 |
-
3. Old embedding_agent.py is being used
|
| 72 |
-
|
| 73 |
-
The fix clears all caches and loads the new code.
|
| 74 |
-
|
| 75 |
-
## Verification
|
| 76 |
-
|
| 77 |
-
After running `fix_and_run.bat`, you should see:
|
| 78 |
-
|
| 79 |
-
```
|
| 80 |
-
[VisionAgent] Initializing...
|
| 81 |
-
[VisionAgent] Embeddings disabled for faster performance
|
| 82 |
-
[VisionAgent] YOLO backend loaded
|
| 83 |
-
[VisionAgent] Vision system initialized
|
| 84 |
-
```
|
| 85 |
-
|
| 86 |
-
The key line is: **"Embeddings disabled for faster performance"**
|
| 87 |
-
|
| 88 |
-
This means:
|
| 89 |
-
- Embeddings are properly disabled
|
| 90 |
-
- No embedding errors will occur
|
| 91 |
-
- System will be faster (2-3 seconds)
|
| 92 |
-
|
| 93 |
-
## What Each Button Does Now
|
| 94 |
-
|
| 95 |
-
### "Capture & Describe"
|
| 96 |
-
- Captures frame
|
| 97 |
-
- Generates caption (BLIP)
|
| 98 |
-
- Extracts text (OCR)
|
| 99 |
-
- NO embeddings (disabled)
|
| 100 |
-
- Fast: ~2-3 seconds
|
| 101 |
-
|
| 102 |
-
### "Remember Scene"
|
| 103 |
-
- Captures frame
|
| 104 |
-
- Generates caption (BLIP)
|
| 105 |
-
- Extracts text (OCR)
|
| 106 |
-
- NO embeddings (disabled)
|
| 107 |
-
- Stores in memory
|
| 108 |
-
- Fast: ~2-3 seconds
|
| 109 |
-
|
| 110 |
-
### "Read Text"
|
| 111 |
-
- Captures frame
|
| 112 |
-
- Extracts text only (OCR)
|
| 113 |
-
- Very fast: ~500ms
|
| 114 |
-
|
| 115 |
-
## Files Created
|
| 116 |
-
|
| 117 |
-
1. `fix_and_run.bat` - Quick fix script
|
| 118 |
-
2. `TROUBLESHOOTING.md` - Detailed troubleshooting guide
|
| 119 |
-
3. `docs/PERFORMANCE.md` - Performance optimization guide
|
| 120 |
-
4. `FIXES_APPLIED.md` - Summary of all fixes
|
| 121 |
-
|
| 122 |
-
## Current Configuration
|
| 123 |
-
|
| 124 |
-
```python
|
| 125 |
-
# config/settings.py
|
| 126 |
-
FEATURES = {
|
| 127 |
-
"ocr_enabled": True, # Text extraction
|
| 128 |
-
"embeddings_enabled": False, # Disabled for speed
|
| 129 |
-
"object_detection_enabled": True, # YOLO detection
|
| 130 |
-
}
|
| 131 |
-
```
|
| 132 |
-
|
| 133 |
-
**Result:**
|
| 134 |
-
- Speed: Fast (2-3 seconds)
|
| 135 |
-
- Features: Caption + OCR + Objects
|
| 136 |
-
- Stability: No embedding errors
|
| 137 |
-
|
| 138 |
-
## To Enable Embeddings (Optional)
|
| 139 |
-
|
| 140 |
-
If you want visual similarity search:
|
| 141 |
-
|
| 142 |
-
1. Edit `config/settings.py`:
|
| 143 |
-
```python
|
| 144 |
-
FEATURES = {
|
| 145 |
-
"embeddings_enabled": True,
|
| 146 |
-
}
|
| 147 |
-
```
|
| 148 |
-
|
| 149 |
-
2. Run `fix_and_run.bat`
|
| 150 |
-
|
| 151 |
-
3. Test carefully
|
| 152 |
-
|
| 153 |
-
**Note:** This will make it slower (5-7 seconds) but enables visual search.
|
| 154 |
-
|
| 155 |
-
## Summary
|
| 156 |
-
|
| 157 |
-
**The error is fixed in the code.**
|
| 158 |
-
|
| 159 |
-
**To apply the fix:**
|
| 160 |
-
1. Run `fix_and_run.bat`
|
| 161 |
-
2. Click "Initialize System"
|
| 162 |
-
3. Test buttons
|
| 163 |
-
4. Should work now!
|
| 164 |
-
|
| 165 |
-
**If still broken:**
|
| 166 |
-
- See `TROUBLESHOOTING.md`
|
| 167 |
-
- Or keep embeddings disabled (recommended)
|
| 168 |
-
|
| 169 |
-
**Current status:**
|
| 170 |
-
- Embeddings: Disabled (for speed and stability)
|
| 171 |
-
- OCR: Enabled
|
| 172 |
-
- Object Detection: Enabled
|
| 173 |
-
- Caption: Enabled
|
| 174 |
-
- Speed: Fast (2-3 seconds)
|
| 175 |
-
- Emojis: Removed
|
| 176 |
-
|
| 177 |
-
**Everything should work now!**
|
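Items 2 and 3 of the deleted FINAL_FIX.md describe gating embeddings behind a config flag and letting the pipeline continue if the embedding call fails. The sketch below shows that pattern in isolation. `FEATURES` mirrors the dict quoted in the deleted file; `describe_frame`, `fake_caption`, and `failing_embedder` are hypothetical stand-ins and not the project's actual `agents/vision_agent.py` code.

```python
from typing import Callable, Optional, Sequence

# Mirrors the FEATURES dict quoted in FINAL_FIX.md.
FEATURES = {
    "ocr_enabled": True,
    "embeddings_enabled": False,
    "object_detection_enabled": True,
}


def describe_frame(
    frame: bytes,
    caption_fn: Callable[[bytes], str],
    embed_fn: Callable[[bytes], Sequence[float]],
    features: Optional[dict] = None,
) -> dict:
    """Caption a frame; attach an embedding only if enabled and working."""
    features = FEATURES if features is None else features
    result = {"caption": caption_fn(frame), "embedding": None}
    if features.get("embeddings_enabled", False):
        try:
            result["embedding"] = list(embed_fn(frame))
        except Exception as exc:  # embeddings are optional, so keep going
            print(f"[VisionAgent] Embedding failed, continuing: {exc}")
    return result


# Hypothetical stand-ins for the real caption and embedding models.
def fake_caption(frame: bytes) -> str:
    return "a person holding a phone"


def failing_embedder(frame: bytes) -> Sequence[float]:
    raise AttributeError("object has no attribute 'norm'")


print(describe_frame(b"", fake_caption, failing_embedder))
```

With the flag off, the embedder is never called; with it on, a failure is logged and the caption is still returned.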
FIXES_APPLIED.md
DELETED
|
@@ -1,124 +0,0 @@
|
|
| 1 |
-
# Fixes Applied - Summary
|
| 2 |
-
|
| 3 |
-
## Issues Fixed
|
| 4 |
-
|
| 5 |
-
### 1. AttributeError with CLIP Embeddings
|
| 6 |
-
**Error:** `'BaseModelOutputWithPooling' object has no attribute 'norm'`
|
| 7 |
-
|
| 8 |
-
**Fix:** Changed normalization method in `agents/embedding_agent.py`:
|
| 9 |
-
```python
|
| 10 |
-
# Before (broken)
|
| 11 |
-
embedding = image_features / image_features.norm(dim=-1, keepdim=True)
|
| 12 |
-
|
| 13 |
-
# After (fixed)
|
| 14 |
-
embedding = torch.nn.functional.normalize(image_features, p=2, dim=-1)
|
| 15 |
-
```
|
| 16 |
-
|
| 17 |
-
### 2. Slow Performance
|
| 18 |
-
**Issue:** System taking 5-7 seconds per capture
|
| 19 |
-
|
| 20 |
-
**Fix:** Disabled embeddings by default in `config/settings.py`:
|
| 21 |
-
```python
|
| 22 |
-
FEATURES = {
|
| 23 |
-
"embeddings_enabled": False, # Now disabled for speed
|
| 24 |
-
}
|
| 25 |
-
```
|
| 26 |
-
|
| 27 |
-
**Result:** ~2-3 seconds per capture (much faster!)
|
| 28 |
-
|
| 29 |
-
### 3. Emojis in Code
|
| 30 |
-
**Issue:** Emojis throughout UI code
|
| 31 |
-
|
| 32 |
-
**Fix:** Removed all emojis from:
|
| 33 |
-
- Button labels
|
| 34 |
-
- Headers
|
| 35 |
-
- Status messages
|
| 36 |
-
- Tab names
|
| 37 |
-
- Spinner messages
|
| 38 |
-
|
| 39 |
-
**Result:** Clean, professional UI without emojis
|
| 40 |
-
|
| 41 |
-
## What Changed
|
| 42 |
-
|
| 43 |
-
### Files Modified
|
| 44 |
-
1. `agents/embedding_agent.py` - Fixed normalization
|
| 45 |
-
2. `config/settings.py` - Disabled embeddings by default
|
| 46 |
-
3. `agents/vision_agent.py` - Made embeddings optional
|
| 47 |
-
4. `ui/app.py` - Removed all emojis
|
| 48 |
-
|
| 49 |
-
### New Files Created
|
| 50 |
-
1. `docs/PERFORMANCE.md` - Performance optimization guide
|
| 51 |
-
|
| 52 |
-
## Current Configuration
|
| 53 |
-
|
| 54 |
-
**Speed:** Fast (2-3 seconds per capture)
|
| 55 |
-
|
| 56 |
-
**Enabled Features:**
|
| 57 |
-
- BLIP Caption (always on)
|
| 58 |
-
- YOLO Object Detection
|
| 59 |
-
- EasyOCR Text Extraction
|
| 60 |
-
|
| 61 |
-
**Disabled Features:**
|
| 62 |
-
- CLIP Embeddings (for speed)
|
| 63 |
-
|
| 64 |
-
## How to Use
|
| 65 |
-
|
| 66 |
-
### 1. Run the Application
|
| 67 |
-
```bash
|
| 68 |
-
run.bat
|
| 69 |
-
```
|
| 70 |
-
|
| 71 |
-
### 2. Test the Fix
|
| 72 |
-
- Click "Initialize System"
|
| 73 |
-
- Click "Capture & Describe"
|
| 74 |
-
- Should work without errors now
|
| 75 |
-
- Should be faster (2-3 seconds)
|
| 76 |
-
|
| 77 |
-
### 3. Enable Embeddings (Optional)
|
| 78 |
-
If you want visual similarity search:
|
| 79 |
-
|
| 80 |
-
Edit `config/settings.py`:
|
| 81 |
-
```python
|
| 82 |
-
FEATURES = {
|
| 83 |
-
"embeddings_enabled": True, # Enable for visual search
|
| 84 |
-
}
|
| 85 |
-
```
|
| 86 |
-
|
| 87 |
-
**Note:** This will make it slower (5-7 seconds) but enables visual similarity search.
|
| 88 |
-
|
| 89 |
-
## Performance Comparison
|
| 90 |
-
|
| 91 |
-
| Configuration | Speed | Features |
|
| 92 |
-
|---------------|-------|----------|
|
| 93 |
-
| **Current (Fast)** | 2-3s | Caption + OCR + Objects |
|
| 94 |
-
| Full (Slow) | 5-7s | Caption + OCR + Objects + Embeddings |
|
| 95 |
-
| Minimal (Fastest) | 1s | Caption only |
|
| 96 |
-
|
| 97 |
-
## Troubleshooting
|
| 98 |
-
|
| 99 |
-
### Still getting errors?
|
| 100 |
-
1. Restart the application
|
| 101 |
-
2. Clear cache: Delete `data/` folder
|
| 102 |
-
3. Reinstall: `pip install --upgrade -r requirements.txt`
|
| 103 |
-
|
| 104 |
-
### Still slow?
|
| 105 |
-
1. Check `config/settings.py` - ensure `embeddings_enabled: False`
|
| 106 |
-
2. Reduce OCR languages to just English
|
| 107 |
-
3. Use smaller YOLO model (yolov8n.pt)
|
| 108 |
-
|
| 109 |
-
See `docs/PERFORMANCE.md` for detailed optimization guide.
|
| 110 |
-
|
| 111 |
-
## Summary
|
| 112 |
-
|
| 113 |
-
All issues fixed:
|
| 114 |
-
- Error with embeddings: FIXED
|
| 115 |
-
- Slow performance: FIXED (2-3x faster)
|
| 116 |
-
- Emojis in code: REMOVED
|
| 117 |
-
|
| 118 |
-
System is now:
|
| 119 |
-
- Working correctly
|
| 120 |
-
- Much faster
|
| 121 |
-
- Professional appearance
|
| 122 |
-
- Ready to use
|
| 123 |
-
|
| 124 |
-
Run `run.bat` and test it out!
|
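Both the "Before" and "After" lines in the deleted FIXES_APPLIED.md compute an L2 normalization. As a plain-Python sketch of what `torch.nn.functional.normalize(x, p=2, dim=-1)` computes for a single vector, namely `v / max(||v||_2, eps)`: the helper `l2_normalize` below is illustrative and uses no PyTorch. The quoted error names a `BaseModelOutputWithPooling` object, which suggests the value being normalized was a model output wrapper rather than a tensor; the deleted note does not show how the tensor was obtained, so that step is left out here.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale `vec` to unit Euclidean length: v / max(||v||_2, eps)."""
    norm = math.sqrt(sum(x * x for x in vec))
    denom = max(norm, eps)  # eps guards against division by zero
    return [x / denom for x in vec]


unit = l2_normalize([3.0, 4.0])
print(unit)  # direction is preserved, length becomes 1
```

The `eps` floor is the practical difference from dividing by `.norm()` directly: an all-zero vector yields zeros instead of a division by zero.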
FIX_TENSORFLOW.md
DELETED
|
@@ -1,138 +0,0 @@
|
|
| 1 |
-
# Fix: TensorFlow/Protobuf Conflict
|
| 2 |
-
|
| 3 |
-
## The Error
|
| 4 |
-
```
|
| 5 |
-
RuntimeError: Failed to import transformers/BLIP.
|
| 6 |
-
This usually happens when TensorFlow and protobuf are out of sync.
|
| 7 |
-
```
|
| 8 |
-
|
| 9 |
-
## Quick Fix (Recommended)
|
| 10 |
-
|
| 11 |
-
Run this script:
|
| 12 |
-
```bash
|
| 13 |
-
fix_tensorflow.bat
|
| 14 |
-
```
|
| 15 |
-
|
| 16 |
-
This will:
|
| 17 |
-
1. Remove TensorFlow (not needed)
|
| 18 |
-
2. Install correct protobuf version
|
| 19 |
-
3. Reinstall transformers
|
| 20 |
-
4. Clear cache
|
| 21 |
-
|
| 22 |
-
Then run:
|
| 23 |
-
```bash
|
| 24 |
-
streamlit run ui\app.py
|
| 25 |
-
```
|
| 26 |
-
|
| 27 |
-
---
|
| 28 |
-
|
| 29 |
-
## Manual Fix (If script doesn't work)
|
| 30 |
-
|
| 31 |
-
### Step 1: Uninstall Conflicting Packages
|
| 32 |
-
```bash
|
| 33 |
-
pip uninstall tensorflow tensorflow-cpu protobuf -y
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
### Step 2: Install Correct Protobuf
|
| 37 |
-
```bash
|
| 38 |
-
pip install protobuf==3.20.3
|
| 39 |
-
```
|
| 40 |
-
|
| 41 |
-
### Step 3: Reinstall Transformers
|
| 42 |
-
```bash
|
| 43 |
-
pip install --upgrade --force-reinstall transformers
|
| 44 |
-
```
|
| 45 |
-
|
| 46 |
-
### Step 4: Clear Cache
|
| 47 |
-
```bash
|
| 48 |
-
rd /s /q __pycache__
|
| 49 |
-
rd /s /q agents\__pycache__
|
| 50 |
-
rd /s /q config\__pycache__
|
| 51 |
-
rd /s /q core\__pycache__
|
| 52 |
-
rd /s /q ui\__pycache__
|
| 53 |
-
```
|
| 54 |
-
|
| 55 |
-
### Step 5: Test
|
| 56 |
-
```bash
|
| 57 |
-
streamlit run ui\app.py
|
| 58 |
-
```
|
| 59 |
-
|
| 60 |
-
---
|
| 61 |
-
|
| 62 |
-
## Why This Happens
|
| 63 |
-
|
| 64 |
-
VisionQ doesn't need TensorFlow, but sometimes it gets installed as a dependency and conflicts with protobuf.
|
| 65 |
-
|
| 66 |
-
**Solution:** Remove TensorFlow and use specific protobuf version.
|
| 67 |
-
|
| 68 |
-
---
|
| 69 |
-
|
| 70 |
-
## Nuclear Option (If nothing works)
|
| 71 |
-
|
| 72 |
-
### Delete and Recreate Virtual Environment
|
| 73 |
-
|
| 74 |
-
```bash
|
| 75 |
-
# 1. Deactivate current venv
|
| 76 |
-
deactivate
|
| 77 |
-
|
| 78 |
-
# 2. Delete old venv
|
| 79 |
-
rd /s /q .venv
|
| 80 |
-
rd /s /q venv
|
| 81 |
-
|
| 82 |
-
# 3. Create fresh venv
|
| 83 |
-
python -m venv venv
|
| 84 |
-
|
| 85 |
-
# 4. Activate
|
| 86 |
-
venv\Scripts\activate
|
| 87 |
-
|
| 88 |
-
# 5. Install dependencies
|
| 89 |
-
pip install -r requirements.txt
|
| 90 |
-
|
| 91 |
-
# 6. Run
|
| 92 |
-
streamlit run ui\app.py
|
| 93 |
-
```
|
| 94 |
-
|
| 95 |
-
---
|
| 96 |
-
|
| 97 |
-
## Verify Fix
|
| 98 |
-
|
| 99 |
-
After running the fix, you should be able to import without errors:
|
| 100 |
-
|
| 101 |
-
```bash
|
| 102 |
-
python -c "from agents.caption_agent import CaptionAgent; print('Success!')"
|
| 103 |
-
```
|
| 104 |
-
|
| 105 |
-
If you see "Success!", the fix worked!
|
| 106 |
-
|
| 107 |
-
---
|
| 108 |
-
|
| 109 |
-
## Prevention
|
| 110 |
-
|
| 111 |
-
The `requirements.txt` has been updated to include:
|
| 112 |
-
```
|
| 113 |
-
protobuf==3.20.3
|
| 114 |
-
```
|
| 115 |
-
|
| 116 |
-
This prevents future conflicts.
|
| 117 |
-
|
| 118 |
-
---
|
| 119 |
-
|
| 120 |
-
## Summary
|
| 121 |
-
|
| 122 |
-
**Quick Fix:**
|
| 123 |
-
```bash
|
| 124 |
-
fix_tensorflow.bat
|
| 125 |
-
streamlit run ui\app.py
|
| 126 |
-
```
|
| 127 |
-
|
| 128 |
-
**If that doesn't work:**
|
| 129 |
-
```bash
|
| 130 |
-
# Nuclear option
|
| 131 |
-
rd /s /q venv
|
| 132 |
-
python -m venv venv
|
| 133 |
-
venv\Scripts\activate
|
| 134 |
-
pip install -r requirements.txt
|
| 135 |
-
streamlit run ui\app.py
|
| 136 |
-
```
|
| 137 |
-
|
| 138 |
-
**One of these will definitely work!**
|
MASTER_TROUBLESHOOTING.md
DELETED
|
@@ -1,232 +0,0 @@
|
|
| 1 |
-
# VisionQ - Complete Troubleshooting Guide
|
| 2 |
-
|
| 3 |
-
## Current Issues & Fixes
|
| 4 |
-
|
| 5 |
-
### Issue 1: TensorFlow/Protobuf Conflict
|
| 6 |
-
|
| 7 |
-
**Error:**
|
| 8 |
-
```
|
| 9 |
-
RuntimeError: Failed to import transformers/BLIP
|
| 10 |
-
```
|
| 11 |
-
|
| 12 |
-
**Fix:**
|
| 13 |
-
```bash
|
| 14 |
-
fix_tensorflow.bat
|
| 15 |
-
```
|
| 16 |
-
|
| 17 |
-
See `FIX_TENSORFLOW.md` for details.
|
| 18 |
-
|
| 19 |
-
---
|
| 20 |
-
|
| 21 |
-
### Issue 2: Embedding AttributeError
|
| 22 |
-
|
| 23 |
-
**Error:**
|
| 24 |
-
```
|
| 25 |
-
AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'norm'
|
| 26 |
-
```
|
| 27 |
-
|
| 28 |
-
**Fix:**
|
| 29 |
-
```bash
|
| 30 |
-
fix_and_run.bat
|
| 31 |
-
```
|
| 32 |
-
|
| 33 |
-
See `TROUBLESHOOTING.md` for details.
|
| 34 |
-
|
| 35 |
-
---
|
| 36 |
-
|
| 37 |
-
## All Fix Scripts
|
| 38 |
-
|
| 39 |
-
| Script | Purpose | When to Use |
|
| 40 |
-
|--------|---------|-------------|
|
| 41 |
-
| `fix_tensorflow.bat` | Fix TensorFlow/protobuf conflict | Import errors with BLIP |
|
| 42 |
-
| `fix_and_run.bat` | Clear cache and restart | Embedding errors, old code |
|
| 43 |
-
| `run.bat` | Normal start | Regular use |
|
| 44 |
-
|
| 45 |
-
---
|
| 46 |
-
|
| 47 |
-
## Step-by-Step Fix Process
|
| 48 |
-
|
| 49 |
-
### Step 1: Fix TensorFlow Conflict
|
| 50 |
-
```bash
|
| 51 |
-
fix_tensorflow.bat
|
| 52 |
-
```
|
| 53 |
-
|
| 54 |
-
### Step 2: Clear Cache
|
| 55 |
-
```bash
|
| 56 |
-
fix_and_run.bat
|
| 57 |
-
```
|
| 58 |
-
|
| 59 |
-
### Step 3: Test
|
| 60 |
-
Open http://localhost:8501 and test all buttons.
|
| 61 |
-
|
| 62 |
-
---
|
| 63 |
-
|
| 64 |
-
## If Nothing Works: Nuclear Option
|
| 65 |
-
|
| 66 |
-
### Complete Reset
|
| 67 |
-
|
| 68 |
-
```bash
|
| 69 |
-
# 1. Stop everything
|
| 70 |
-
# Press Ctrl+C in terminal
|
| 71 |
-
|
| 72 |
-
# 2. Delete virtual environment
|
| 73 |
-
rd /s /q .venv
|
| 74 |
-
rd /s /q venv
|
| 75 |
-
|
| 76 |
-
# 3. Delete cache
|
| 77 |
-
rd /s /q __pycache__
|
| 78 |
-
rd /s /q agents\__pycache__
|
| 79 |
-
rd /s /q config\__pycache__
|
| 80 |
-
rd /s /q core\__pycache__
|
| 81 |
-
rd /s /q ui\__pycache__
|
| 82 |
-
|
| 83 |
-
# 4. Create fresh venv
|
| 84 |
-
python -m venv venv
|
| 85 |
-
|
| 86 |
-
# 5. Activate
|
| 87 |
-
venv\Scripts\activate
|
| 88 |
-
|
| 89 |
-
# 6. Upgrade pip
|
| 90 |
-
python -m pip install --upgrade pip
|
| 91 |
-
|
| 92 |
-
# 7. Install dependencies
|
| 93 |
-
pip install -r requirements.txt
|
| 94 |
-
|
| 95 |
-
# 8. Run
|
| 96 |
-
streamlit run ui\app.py
|
| 97 |
-
```
|
| 98 |
-
|
| 99 |
-
This will give you a completely fresh start.
|
| 100 |
-
|
| 101 |
-
---
|
| 102 |
-
|
| 103 |
-
## Common Errors & Solutions
|
| 104 |
-
|
| 105 |
-
### Error: "python run.bat" gives SyntaxError
|
| 106 |
-
**Problem:** Trying to run .bat file with Python
|
| 107 |
-
**Solution:** Just run `run.bat` (without python)
|
| 108 |
-
|
| 109 |
-
### Error: Camera not working
|
| 110 |
-
**Problem:** Camera in use or permissions
|
| 111 |
-
**Solution:**
|
| 112 |
-
- Close other apps using camera
|
| 113 |
-
- Check camera permissions
|
| 114 |
-
- Try different camera index in `config/settings.py`
|
| 115 |
-
|
| 116 |
-
### Error: Models loading slowly
|
| 117 |
-
**Problem:** First run downloads models
|
| 118 |
-
**Solution:** Wait for download to complete (~2GB)
|
| 119 |
-
|
| 120 |
-
### Error: Out of memory
|
| 121 |
-
**Problem:** Too many models loaded
|
| 122 |
-
**Solution:**
|
| 123 |
-
- Close other applications
|
| 124 |
-
- Disable embeddings in `config/settings.py`
|
| 125 |
-
- Use smaller YOLO model
|
| 126 |
-
|
| 127 |
-
### Error: OCR not detecting text
|
| 128 |
-
**Problem:** Poor lighting or text quality
|
| 129 |
-
**Solution:**
|
| 130 |
-
- Ensure good lighting
|
| 131 |
-
- Text should be clear
|
| 132 |
-
- Try different languages
|
| 133 |
-
|
| 134 |
-
---
|
| 135 |
-
|
| 136 |
-
## Performance Issues
|
| 137 |
-
|
| 138 |
-
### System is slow (5+ seconds per capture)
|
| 139 |
-
|
| 140 |
-
**Check if embeddings are enabled:**
|
| 141 |
-
```python
|
| 142 |
-
# config/settings.py
|
| 143 |
-
FEATURES = {
|
| 144 |
-
"embeddings_enabled": False, # Should be False for speed
|
| 145 |
-
}
|
| 146 |
-
```
|
| 147 |
-
|
| 148 |
-
**If True, change to False and restart.**
|
| 149 |
-
|
| 150 |
-
---
|
| 151 |
-
|
| 152 |
-
## Verification Commands
|
| 153 |
-
|
| 154 |
-
### Test imports:
|
| 155 |
-
```bash
|
| 156 |
-
python -c "from agents.caption_agent import CaptionAgent; print('Caption: OK')"
|
| 157 |
-
python -c "from agents.vision_agent import VisionAgent; print('Vision: OK')"
|
| 158 |
-
python -c "from agents.memory_agent import MemoryAgent; print('Memory: OK')"
|
| 159 |
-
```
|
| 160 |
-
|
| 161 |
-
### Check protobuf version:
|
| 162 |
-
```bash
|
| 163 |
-
pip show protobuf
|
| 164 |
-
```
|
| 165 |
-
Should show: `Version: 3.20.3`
|
| 166 |
-
|
| 167 |
-
### Check if TensorFlow is installed:
|
| 168 |
-
```bash
|
| 169 |
-
pip show tensorflow
|
| 170 |
-
```
|
| 171 |
-
Should show: `WARNING: Package(s) not found: tensorflow` (Good!)
|
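The two `pip show` checks above can also be run from Python in one go. The following is a minimal sketch, assuming only the standard-library `importlib.metadata` module (Python 3.8+); the function names `installed_version` and `check_environment` are illustrative and do not exist in the VisionQ code base:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> dict:
    """Summarize the two checks from the guide: protobuf pinned, TensorFlow absent."""
    protobuf = installed_version("protobuf")
    tensorflow = installed_version("tensorflow")
    return {
        "protobuf_version": protobuf,
        "protobuf_ok": protobuf == "3.20.3",  # version pinned in requirements.txt
        "tensorflow_absent": tensorflow is None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

This only looks up the `tensorflow` distribution, mirroring `pip show tensorflow`. A separately installed `tensorflow-cpu` would need its own lookup.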
| 172 |
-
|
| 173 |
-
---
|
| 174 |
-
|
| 175 |
-
## Getting Help
|
| 176 |
-
|
| 177 |
-
### Check these files:
|
| 178 |
-
1. `FIX_TENSORFLOW.md` - TensorFlow/protobuf issues
|
| 179 |
-
2. `TROUBLESHOOTING.md` - Embedding errors
|
| 180 |
-
3. `docs/PERFORMANCE.md` - Speed optimization
|
| 181 |
-
4. `FINAL_FIX.md` - Summary of all fixes
|
| 182 |
-
|
| 183 |
-
### Still stuck?
|
| 184 |
-
|
| 185 |
-
1. Run nuclear option (complete reset)
|
| 186 |
-
2. Check Python version: `python --version` (should be 3.8+)
|
| 187 |
-
3. Check if in virtual environment: Look for `(.venv)` or `(venv)` in prompt
|
| 188 |
-
4. Try on different computer to isolate issue
|
| 189 |
-
|
| 190 |
-
---
|
| 191 |
-
|
| 192 |
-
## Quick Reference
|
| 193 |
-
|
| 194 |
-
**Fix TensorFlow:**
|
| 195 |
-
```bash
|
| 196 |
-
fix_tensorflow.bat
|
| 197 |
-
```
|
| 198 |
-
|
| 199 |
-
**Fix Cache:**
|
| 200 |
-
```bash
|
| 201 |
-
fix_and_run.bat
|
| 202 |
-
```
|
| 203 |
-
|
| 204 |
-
**Normal Start:**
|
| 205 |
-
```bash
|
| 206 |
-
run.bat
|
| 207 |
-
```
|
| 208 |
-
|
| 209 |
-
**Direct Start:**
|
| 210 |
-
```bash
|
| 211 |
-
streamlit run ui\app.py
|
| 212 |
-
```
|
| 213 |
-
|
| 214 |
-
**Nuclear Reset:**
|
| 215 |
-
```bash
|
| 216 |
-
rd /s /q venv
|
| 217 |
-
python -m venv venv
|
| 218 |
-
venv\Scripts\activate
|
| 219 |
-
pip install -r requirements.txt
|
| 220 |
-
streamlit run ui\app.py
|
| 221 |
-
```
|
| 222 |
-
|
| 223 |
-
---
|
| 224 |
-
|
| 225 |
-
## Summary
|
| 226 |
-
|
| 227 |
-
Most issues are fixed by:
|
| 228 |
-
1. `fix_tensorflow.bat` - Fixes import errors
|
| 229 |
-
2. `fix_and_run.bat` - Fixes cache issues
|
| 230 |
-
3. Nuclear option - Fixes everything else
|
| 231 |
-
|
| 232 |
-
**Try them in order until it works!**
|
MODELS_AND_SPEED.md
DELETED
|
@@ -1,203 +0,0 @@
|
|
| 1 |
-
# Models & Performance - Quick Reference
|
| 2 |
-
|
| 3 |
-
## Current Models
|
| 4 |
-
|
| 5 |
-
| Component | Model | Size | Speed | Status |
|
| 6 |
-
|-----------|-------|------|-------|--------|
|
| 7 |
-
| **Caption** | BLIP-base | 990MB | 1.5s | Active (SLOWEST) |
|
| 8 |
-
| **Object Detection** | YOLOv8s | 22MB | 0.5s | Active |
|
| 9 |
-
| **OCR** | EasyOCR | 50MB | 0.5s | Active |
|
| 10 |
-
| **Embeddings** | CLIP | 500MB | 2s | Disabled |
|
| 11 |
-
|
| 12 |
-
**Total processing time: ~2.5 seconds per capture**
|
| 13 |
-
|
| 14 |
-
---
|
| 15 |
-
|
| 16 |
-
## Why Camera is Slow
|
| 17 |
-
|
| 18 |
-
**The camera itself is fast!** The slowness comes from AI processing:
|
| 19 |
-
|
| 20 |
-
```
|
| 21 |
-
Camera capture: 10ms (fast!)
|
| 22 |
-
BLIP caption: 1500ms (slow!)
|
| 23 |
-
EasyOCR: 500ms (medium)
|
| 24 |
-
YOLO detection: 500ms (medium)
|
| 25 |
-
------------------------
|
| 26 |
-
Total: 2510ms (2.5 seconds)
|
| 27 |
-
```
|
| 28 |
-
|
| 29 |
-
**BLIP is the bottleneck!**
|
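The breakdown above is simple addition, so the effect of each feature flag can be estimated before editing `config/settings.py`. Here is a small sketch using the per-stage figures quoted in this document. The helper `estimate_capture_ms` is hypothetical, and the timings are the document's CPU estimates rather than measurements:

```python
# Per-stage timings in milliseconds, copied from the breakdown above.
STAGE_MS = {
    "camera": 10,
    "caption": 1500,  # BLIP, always on
    "ocr": 500,  # EasyOCR
    "object_detection": 500,  # YOLO
}


def estimate_capture_ms(ocr_enabled: bool = True,
                        object_detection_enabled: bool = True) -> int:
    """Sum the stage timings for the stages that are switched on."""
    total = STAGE_MS["camera"] + STAGE_MS["caption"]
    if ocr_enabled:
        total += STAGE_MS["ocr"]
    if object_detection_enabled:
        total += STAGE_MS["object_detection"]
    return total


if __name__ == "__main__":
    print(estimate_capture_ms())  # 2510 (everything enabled)
    print(estimate_capture_ms(ocr_enabled=False))  # 2010
    print(estimate_capture_ms(False, False))  # 1510 (caption only)
```

The caption stage accounts for 1500 of the 2510 ms, which is why disabling OCR and object detection removes at most about 40% of the total.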
| 30 |
-
|
| 31 |
-
---
|
| 32 |
-
|
| 33 |
-
## Quick Speed Fixes
|
| 34 |
-
|
| 35 |
-
### Option 1: Disable OCR (500ms faster)
|
| 36 |
-
```python
|
| 37 |
-
# config/settings.py
|
| 38 |
-
FEATURES = {
|
| 39 |
-
"ocr_enabled": False,
|
| 40 |
-
}
|
| 41 |
-
```
|
| 42 |
-
**New speed:** 2 seconds
|
| 43 |
-
|
| 44 |
-
### Option 2: Disable YOLO (500ms faster)
|
| 45 |
-
```python
|
| 46 |
-
FEATURES = {
|
| 47 |
-
"object_detection_enabled": False,
|
| 48 |
-
}
|
| 49 |
-
```
|
| 50 |
-
**New speed:** 2 seconds
|
| 51 |
-
|
| 52 |
-
### Option 3: Both (1000ms faster)
|
| 53 |
-
```python
|
| 54 |
-
FEATURES = {
|
| 55 |
-
"ocr_enabled": False,
|
| 56 |
-
"object_detection_enabled": False,
|
| 57 |
-
}
|
| 58 |
-
```
|
| 59 |
-
**New speed:** 1.5 seconds (40% faster!)
|
| 60 |
-
|
| 61 |
-
---
|
| 62 |
-
|
| 63 |
-
## Apply Fast Mode
|
| 64 |
-
|
| 65 |
-
### Step 1: Edit config/settings.py
|
| 66 |
-
|
| 67 |
-
Find the `FEATURES` section and change to:
|
| 68 |
-
```python
|
| 69 |
-
FEATURES = {
|
| 70 |
-
"ocr_enabled": False, # Disable for speed
|
| 71 |
-
"object_detection_enabled": False, # Disable for speed
|
| 72 |
-
"embeddings_enabled": False, # Keep disabled
|
| 73 |
-
}
|
| 74 |
-
```
|
| 75 |
-
|
| 76 |
-
### Step 2: Restart
|
| 77 |
-
```bash
|
| 78 |
-
fix_and_run.bat
|
| 79 |
-
```
|
| 80 |
-
|
| 81 |
-
### Step 3: Test
|
| 82 |
-
Click "Capture & Describe" - should be ~1.5 seconds now!
|
| 83 |
-
|
| 84 |
-
---
|
| 85 |
-
|
| 86 |
-
## Model Details
|
| 87 |
-
|
| 88 |
-
### BLIP (Caption Model)
|
| 89 |
-
- **Full name:** Salesforce/blip-image-captioning-base
|
| 90 |
-
- **Purpose:** Generate scene descriptions
|
| 91 |
-
- **Speed:** 1.5 seconds (CPU)
|
| 92 |
-
- **Can't disable:** This is the core feature
|
| 93 |
-
- **Alternative:** Use GIT model (3x faster)
|
| 94 |
-
|
| 95 |
-
### YOLOv8s (Object Detection)
|
| 96 |
-
- **Full name:** YOLOv8 Small
|
| 97 |
-
- **Purpose:** Detect objects (person, car, etc.)
|
| 98 |
-
- **Speed:** 0.5 seconds
|
| 99 |
-
- **Can disable:** Yes (set object_detection_enabled: False)
|
| 100 |
-
- **Alternative:** Use YOLOv8n (nano) for 200ms faster
|
| 101 |
-
|
| 102 |
-
### EasyOCR (Text Reading)
|
| 103 |
-
- **Purpose:** Read text from images
|
| 104 |
-
- **Speed:** 0.5 seconds
|
| 105 |
-
- **Can disable:** Yes (set ocr_enabled: False)
|
| 106 |
-
- **Languages:** 90+ supported
|
| 107 |
-
|
| 108 |
-
### CLIP (Embeddings)
|
| 109 |
-
- **Purpose:** Visual similarity search
|
| 110 |
-
- **Speed:** 2 seconds
|
| 111 |
-
- **Status:** Already disabled
|
| 112 |
-
- **Keep disabled:** For best performance
|
| 113 |
-
|
| 114 |
-
---
|
| 115 |
-
|
| 116 |
-
## GPU Acceleration
|
| 117 |
-
|
| 118 |
-
If you have NVIDIA GPU:
|
| 119 |
-
|
| 120 |
-
```python
|
| 121 |
-
# config/settings.py
|
| 122 |
-
PERFORMANCE_CONFIG = {
|
| 123 |
-
"use_gpu": True,
|
| 124 |
-
}
|
| 125 |
-
|
| 126 |
-
OCR_CONFIG = {
|
| 127 |
-
"gpu": True,
|
| 128 |
-
}
|
| 129 |
-
```
|
| 130 |
-
|
| 131 |
-
**Speed improvement:** 2-3x faster!
|
| 132 |
-
|
| 133 |
-
**Requirements:**
|
| 134 |
-
- NVIDIA GPU
|
| 135 |
-
- CUDA installed
|
| 136 |
-
- `pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118`
|
| 137 |
-
|
| 138 |
-
---
|
| 139 |
-
|
| 140 |
-
## Recommended Settings
|
| 141 |
-
|
| 142 |
-
### For Speed (Fastest)
|
| 143 |
-
```python
|
| 144 |
-
FEATURES = {
|
| 145 |
-
"ocr_enabled": False,
|
| 146 |
-
"object_detection_enabled": False,
|
| 147 |
-
"embeddings_enabled": False,
|
| 148 |
-
}
|
| 149 |
-
```
|
| 150 |
-
**Speed:** 1.5 seconds
|
| 151 |
-
**Features:** Caption only
|
| 152 |
-
|
| 153 |
-
### For Balance (Recommended)
|
| 154 |
-
```python
|
| 155 |
-
FEATURES = {
|
| 156 |
-
"ocr_enabled": True,
|
| 157 |
-
"object_detection_enabled": False,
|
| 158 |
-
"embeddings_enabled": False,
|
| 159 |
-
}
|
| 160 |
-
```
|
| 161 |
-
**Speed:** 2 seconds
|
| 162 |
-
**Features:** Caption + OCR
|
| 163 |
-
|
| 164 |
-
### For Full Features
|
| 165 |
-
```python
|
| 166 |
-
FEATURES = {
|
| 167 |
-
"ocr_enabled": True,
|
| 168 |
-
"object_detection_enabled": True,
|
| 169 |
-
"embeddings_enabled": False,
|
| 170 |
-
}
|
| 171 |
-
```
|
| 172 |
-
**Speed:** 2.5 seconds
|
| 173 |
-
**Features:** Everything
|
| 174 |
-
|
| 175 |
-
---
|
| 176 |
-
|
| 177 |
-
## Summary
|
| 178 |
-
|
| 179 |
-
**Models used:**
|
| 180 |
-
- BLIP (caption) - 1.5s - Can't disable
|
| 181 |
-
- YOLO (objects) - 0.5s - Can disable
|
| 182 |
-
- EasyOCR (text) - 0.5s - Can disable
|
| 183 |
-
|
| 184 |
-
**Why slow:**
|
| 185 |
-
- BLIP takes 1.5 seconds
|
| 186 |
-
- This is normal for AI image captioning
|
| 187 |
-
- Camera itself is fast
|
| 188 |
-
|
| 189 |
-
**Quick fix:**
|
| 190 |
-
```python
|
| 191 |
-
# Disable OCR and YOLO
|
| 192 |
-
FEATURES = {
|
| 193 |
-
"ocr_enabled": False,
|
| 194 |
-
"object_detection_enabled": False,
|
| 195 |
-
}
|
| 196 |
-
```
|
| 197 |
-
**Result:** 40% faster (1.5s instead of 2.5s)
|
| 198 |
-
|
| 199 |
-
**Best fix:**
|
| 200 |
-
- Use GPU (2-3x faster)
|
| 201 |
-
- Or accept 1.5-2.5 second delay (normal for AI)
|
| 202 |
-
|
| 203 |
-
**The camera is not slow - the AI models are doing heavy processing!**
|
README.md
CHANGED
|
@@ -14,7 +14,7 @@ license: apache-2.0
|
|
| 14 |
|
| 15 |
[](https://www.python.org/downloads/)
|
| 16 |
[](https://streamlit.io/)
|
| 17 |
-
[](https://www.python.org/downloads/)
|
| 16 |
[](https://streamlit.io/)
|
| 17 |
+
[](LICENSE)
|
| 18 |
|
| 19 |
---
|
| 20 |
|
RESTRUCTURE_SUMMARY.md
DELETED
|
@@ -1,348 +0,0 @@
|
|
| 1 |
-
# 🎉 VisionQ - Restructuring Complete!
|
| 2 |
-
|
| 3 |
-
## ✅ What Was Done
|
| 4 |
-
|
| 5 |
-
Your VisionQ project has been **completely restructured** with:
|
| 6 |
-
|
| 7 |
-
1. ✅ **Clean folder structure**
|
| 8 |
-
2. ✅ **Streamlit web interface**
|
| 9 |
-
3. ✅ **Centralized configuration**
|
| 10 |
-
4. ✅ **Comprehensive documentation**
|
| 11 |
-
5. ✅ **90+ language support**
|
| 12 |
-
6. ✅ **No API keys needed**
|
| 13 |
-
|
| 14 |
-
---
|
| 15 |
-
|
| 16 |
-
## 📂 NEW Structure
|
| 17 |
-
|
| 18 |
-
```
|
| 19 |
-
VisionQ/
|
| 20 |
-
├── agents/ # AI agents (7 files)
|
| 21 |
-
├── config/ # Settings (1 file)
|
| 22 |
-
├── ui/ # Streamlit app (1 file)
|
| 23 |
-
├── core/ # Integration (1 file)
|
| 24 |
-
├── data/ # Storage (auto-created)
|
| 25 |
-
├── models/ # AI models (auto-downloaded)
|
| 26 |
-
├── docs/ # Documentation (3 files)
|
| 27 |
-
├── .streamlit/ # UI config
|
| 28 |
-
├── README.md # Main docs
|
| 29 |
-
├── requirements.txt # Dependencies
|
| 30 |
-
├── run.bat # Launcher
|
| 31 |
-
└── cleanup.bat # Cleanup script
|
| 32 |
-
```
|
| 33 |
-
|
| 34 |
-
**Total:** 10 code files, 3 docs, clean structure!
|
| 35 |
-
|
| 36 |
-
---
|
| 37 |
-
|
| 38 |
-
## 🚀 How to Use
|
| 39 |
-
|
| 40 |
-
### **1. Install Dependencies**
|
| 41 |
-
```bash
|
| 42 |
-
pip install -r requirements.txt
|
| 43 |
-
```
|
| 44 |
-
|
| 45 |
-
### **2. Launch Web Interface**
|
| 46 |
-
```bash
|
| 47 |
-
# Windows
|
| 48 |
-
run.bat
|
| 49 |
-
|
| 50 |
-
# Linux/Mac
|
| 51 |
-
streamlit run ui/app.py
|
| 52 |
-
```
|
| 53 |
-
|
| 54 |
-
### **3. Open Browser**
|
| 55 |
-
Go to: `http://localhost:8501`
|
| 56 |
-
|
| 57 |
-
### **4. Start Using**
|
| 58 |
-
- Click "Initialize System"
|
| 59 |
-
- Capture scenes
|
| 60 |
-
- Read text (OCR)
|
| 61 |
-
- Query memories
|
| 62 |
-
|
| 63 |
-
---
|
| 64 |
-
|
| 65 |
-
## 🌍 Language Support
|
| 66 |
-
|
| 67 |
-
**90+ languages supported!**
|
| 68 |
-
|
| 69 |
-
Including:
|
| 70 |
-
- 🇬🇧 English
|
| 71 |
-
- 🇪🇸 Spanish
|
| 72 |
-
- 🇫🇷 French
|
| 73 |
-
- 🇩🇪 German
|
| 74 |
-
- 🇮🇹 Italian
|
| 75 |
-
- 🇵🇹 Portuguese
|
| 76 |
-
- 🇷🇺 Russian
|
| 77 |
-
- 🇨🇳 Chinese
|
| 78 |
-
- 🇯🇵 Japanese
|
| 79 |
-
- 🇰🇷 Korean
|
| 80 |
-
- 🇸🇦 Arabic
|
| 81 |
-
- 🇮🇳 Hindi
|
| 82 |
-
- ...and 78 more!
|
| 83 |
-
|
| 84 |
-
**Select languages in UI sidebar.**
|
| 85 |
-
|
| 86 |
-
See `docs/LANGUAGES.md` for full list.
|
| 87 |
-
|
| 88 |
-
---
|
| 89 |
-
|
| 90 |
-
## 🔑 API Keys
|
| 91 |
-
|
| 92 |
-
**Do you need API keys?**
|
| 93 |
-
|
| 94 |
-
# **NO!** ❌
|
| 95 |
-
|
| 96 |
-
VisionQ works **100% offline** without any API keys.
|
| 97 |
-
|
| 98 |
-
All models run locally:
|
| 99 |
-
- ✅ YOLO (object detection)
|
| 100 |
-
- ✅ BLIP (captioning)
|
| 101 |
-
- ✅ CLIP (embeddings)
|
| 102 |
-
- ✅ EasyOCR (text extraction)
|
| 103 |
-
- ✅ DistilBERT (NLP)
|
| 104 |
-
- ✅ FAISS (vector search)
|
| 105 |
-
|
| 106 |
-
**Optional:** Hugging Face token (for private models only)
|
| 107 |
-
|
| 108 |
-
See `docs/API_KEYS.md` for details.
|
| 109 |
-
|
| 110 |
-
---
|
| 111 |
-
|
| 112 |
-
## 🎯 Key Features
|
| 113 |
-
|
| 114 |
-
### **Vision**
|
| 115 |
-
- 👁️ Object detection (YOLO/SSD)
|
| 116 |
-
- 📝 Image captioning (BLIP)
|
| 117 |
-
- 🖼️ Visual embeddings (CLIP)
|
| 118 |
-
- 🔤 Text extraction (OCR, 90+ languages)
|
| 119 |
-
|
| 120 |
-
### **Memory**
|
| 121 |
-
- 🧠 Semantic storage (FAISS)
|
| 122 |
-
- 💾 Persistent JSON
|
| 123 |
-
- ⚡ Fast search (<10ms)
|
| 124 |
-
- 📊 10,000+ capacity
|
| 125 |
-
|
| 126 |
-
### **Intelligence**
|
| 127 |
-
- 🔍 Smart queries (DistilBERT)
|
| 128 |
-
- ⏰ Time-based filtering
|
| 129 |
-
- 🎯 Intent classification
|
| 130 |
-
- 🔗 Multimodal fusion
|
| 131 |
-
|
| 132 |
-
### **Interface**
|
| 133 |
-
- 🌐 Web UI (Streamlit)
|
| 134 |
-
- 📱 Responsive design
|
| 135 |
-
- 🎨 Clean interface
|
| 136 |
-
- 🚀 One-click launch
|
| 137 |
-
|
| 138 |
-
---
|
| 139 |
-
|
| 140 |
-
## 📚 Documentation
|
| 141 |
-
|
| 142 |
-
| File | Purpose |
|
| 143 |
-
|------|---------|
|
| 144 |
-
| `README.md` | Main documentation |
|
| 145 |
-
| `docs/LANGUAGES.md` | Language support (90+) |
|
| 146 |
-
| `docs/API_KEYS.md` | API keys info (none needed!) |
|
| 147 |
-
| `docs/STRUCTURE.md` | Project structure |
|
| 148 |
-
|
| 149 |
-
---
|
| 150 |
-
|
| 151 |
-
## 🧹 Cleanup Old Files
|
| 152 |
-
|
| 153 |
-
**Run cleanup script:**
|
| 154 |
-
```bash
|
| 155 |
-
cleanup.bat
|
| 156 |
-
```
|
| 157 |
-
|
| 158 |
-
This moves old files to `archive/` folder:
|
| 159 |
-
- Old agent files
|
| 160 |
-
- Old documentation
|
| 161 |
-
- Old scripts
|
| 162 |
-
- Old requirements
|
| 163 |
-
|
| 164 |
-
**You can safely delete `archive/` if not needed.**
|
| 165 |
-
|
| 166 |
-
---
|
| 167 |
-
|
| 168 |
-
## 🔧 Configuration
|
| 169 |
-
|
| 170 |
-
**Edit `config/settings.py` to customize:**
|
| 171 |
-
|
| 172 |
-
```python
|
| 173 |
-
# OCR languages
|
| 174 |
-
OCR_CONFIG = {
|
| 175 |
-
"languages": ["en", "es", "fr"],
|
| 176 |
-
}
|
| 177 |
-
|
| 178 |
-
# Vision settings
|
| 179 |
-
VISION_CONFIG = {
|
| 180 |
-
"camera_index": 0,
|
| 181 |
-
"confidence_threshold": 0.5,
|
| 182 |
-
}
|
| 183 |
-
|
| 184 |
-
# Memory settings
|
| 185 |
-
MEMORY_CONFIG = {
|
| 186 |
-
"max_memories": 10000,
|
| 187 |
-
}
|
| 188 |
-
```
|
| 189 |
-
|
| 190 |
-
---
|
| 191 |
-
|
| 192 |
-
## 🎓 Quick Start Guide
|
| 193 |
-
|
| 194 |
-
### **Step 1: Install**
|
| 195 |
-
```bash
|
| 196 |
-
pip install -r requirements.txt
|
| 197 |
-
```
|
| 198 |
-
|
| 199 |
-
### **Step 2: Run**
|
| 200 |
-
```bash
|
| 201 |
-
run.bat # Windows
|
| 202 |
-
# or
|
| 203 |
-
streamlit run ui/app.py # Linux/Mac
|
| 204 |
-
```
|
| 205 |
-
|
| 206 |
-
### **Step 3: Initialize**
|
| 207 |
-
- Open http://localhost:8501
|
| 208 |
-
- Click "Initialize System"
|
| 209 |
-
- Wait for models to load (~1 min first time)
|
| 210 |
-
|
| 211 |
-
### **Step 4: Use**
|
| 212 |
-
- **Vision Tab:** Capture, remember, read text
|
| 213 |
-
- **Query Tab:** Ask questions about memories
|
| 214 |
-
- **Memories Tab:** Browse stored memories
|
| 215 |
-
- **Help Tab:** Documentation and tips
|
| 216 |
-
|
| 217 |
-
---
|
| 218 |
-
|
| 219 |
-
## 📊 What Changed
|
| 220 |
-
|
| 221 |
-
### **Before**
|
| 222 |
-
```
|
| 223 |
-
❌ Flat file structure
|
| 224 |
-
❌ Redundant files
|
| 225 |
-
❌ No web interface
|
| 226 |
-
❌ Scattered documentation
|
| 227 |
-
❌ Complex to use
|
| 228 |
-
```
|
| 229 |
-
|
| 230 |
-
### **After**
|
| 231 |
-
```
|
| 232 |
-
✅ Clean folder structure
|
| 233 |
-
✅ No redundant files
|
| 234 |
-
✅ Streamlit web interface
|
| 235 |
-
✅ Organized documentation
|
| 236 |
-
✅ Easy to use
|
| 237 |
-
```
|
| 238 |
-
|
| 239 |
-
---
|
| 240 |
-
|
| 241 |
-
## 🎯 Benefits
|
| 242 |
-
|
| 243 |
-
### **For Users**
|
| 244 |
-
- ✅ Easy web interface
|
| 245 |
-
- ✅ One-click launch
|
| 246 |
-
- ✅ Clear documentation
|
| 247 |
-
- ✅ 90+ languages
|
| 248 |
-
|
| 249 |
-
### **For Developers**
|
| 250 |
-
- ✅ Clean structure
|
| 251 |
-
- ✅ Modular code
|
| 252 |
-
- ✅ Centralized config
|
| 253 |
-
- ✅ Easy to extend
|
| 254 |
-
|
| 255 |
-
### **For Everyone**
|
| 256 |
-
- ✅ No API keys needed
|
| 257 |
-
- ✅ 100% offline
|
| 258 |
-
- ✅ Free forever
|
| 259 |
-
- ✅ Open source
|
| 260 |
-
|
| 261 |
-
---
|
| 262 |
-
|
| 263 |
-
## 🐛 Troubleshooting
|
| 264 |
-
|
| 265 |
-
### **Models loading slowly?**
|
| 266 |
-
- First run downloads ~2GB
|
| 267 |
-
- Subsequent runs are fast
|
| 268 |
-
- Models cached locally
|
| 269 |
-
|
| 270 |
-
### **Camera not working?**
|
| 271 |
-
- Check permissions
|
| 272 |
-
- Try different camera index
|
| 273 |
-
- Ensure no other app using camera
|
| 274 |
-
|
| 275 |
-
### **OCR not detecting text?**
|
| 276 |
-
- Ensure good lighting
|
| 277 |
-
- Text should be clear
|
| 278 |
-
- Try different languages
|
| 279 |
-
|
| 280 |
-
### **Out of memory?**
|
| 281 |
-
- Close other applications
|
| 282 |
-
- Reduce stored memories
|
| 283 |
-
- Use CPU instead of GPU
|
| 284 |
-
|
| 285 |
-
---
|
| 286 |
-
|
| 287 |
-
## 📞 Support
|
| 288 |
-
|
| 289 |
-
**Need help?**
|
| 290 |
-
|
| 291 |
-
1. Check `README.md`
|
| 292 |
-
2. Check `docs/` folder
|
| 293 |
-
3. Check UI "Help" tab
|
| 294 |
-
4. Open GitHub issue
|
| 295 |
-
|
| 296 |
-
---
|
| 297 |
-
|
| 298 |
-
## ✅ Next Steps
|
| 299 |
-
|
| 300 |
-
### **Immediate**
|
| 301 |
-
1. ✅ Run `pip install -r requirements.txt`
|
| 302 |
-
2. ✅ Run `run.bat` or `streamlit run ui/app.py`
|
| 303 |
-
3. ✅ Open http://localhost:8501
|
| 304 |
-
4. ✅ Click "Initialize System"
|
| 305 |
-
5. ✅ Start using!
|
| 306 |
-
|
| 307 |
-
### **Optional**
|
| 308 |
-
1. ⭐ Run `cleanup.bat` to organize old files
|
| 309 |
-
2. ⭐ Customize `config/settings.py`
|
| 310 |
-
3. ⭐ Add languages in UI sidebar
|
| 311 |
-
4. ⭐ Explore documentation
|
| 312 |
-
|
| 313 |
-
---
|
| 314 |
-
|
| 315 |
-
## 🎉 Summary
|
| 316 |
-
|
| 317 |
-
**VisionQ is now:**
|
| 318 |
-
- ✅ Clean & organized
|
| 319 |
-
- ✅ Easy to use (web UI)
|
| 320 |
-
- ✅ Well documented
|
| 321 |
-
- ✅ Multi-language (90+)
|
| 322 |
-
- ✅ No API keys needed
|
| 323 |
-
- ✅ 100% offline
|
| 324 |
-
- ✅ Production ready
|
| 325 |
-
|
| 326 |
-
**Everything you need in one place!**
|
| 327 |
-
|
| 328 |
-
---
|
| 329 |
-
|
| 330 |
-
## 📋 Checklist
|
| 331 |
-
|
| 332 |
-
- [ ] Install dependencies: `pip install -r requirements.txt`
|
| 333 |
-
- [ ] Launch UI: `run.bat` or `streamlit run ui/app.py`
|
| 334 |
-
- [ ] Open browser: http://localhost:8501
|
| 335 |
-
- [ ] Initialize system
|
| 336 |
-
- [ ] Test vision features
|
| 337 |
-
- [ ] Test OCR (read text)
|
| 338 |
-
- [ ] Test queries
|
| 339 |
-
- [ ] Browse memories
|
| 340 |
-
- [ ] Read documentation
|
| 341 |
-
- [ ] Customize settings (optional)
|
| 342 |
-
- [ ] Run cleanup (optional)
|
| 343 |
-
|
| 344 |
-
---
|
| 345 |
-
|
| 346 |
-
**VisionQ - Restructured, refined, and ready to use! 🚀**
|
| 347 |
-
|
| 348 |
-
**Open http://localhost:8501 and start exploring!**
|
TROUBLESHOOTING.md
DELETED
|
@@ -1,191 +0,0 @@
|
|
| 1 |
-
# Troubleshooting: AttributeError with Embeddings
|
| 2 |
-
|
| 3 |
-
## The Error
|
| 4 |
-
|
| 5 |
-
```
|
| 6 |
-
AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'norm'
|
| 7 |
-
```
|
| 8 |
-
|
| 9 |
-
## What Causes This
|
| 10 |
-
|
| 11 |
-
This error occurs when:
|
| 12 |
-
1. Old cached version of embedding_agent.py is being used
|
| 13 |
-
2. Streamlit is caching the old agent code
|
| 14 |
-
3. Python bytecode cache (__pycache__) has old version
|
| 15 |
-
|
| 16 |
-
## Quick Fix (Recommended)
|
| 17 |
-
|
| 18 |
-
### Option 1: Use Fix Script
|
| 19 |
-
```bash
|
| 20 |
-
fix_and_run.bat
|
| 21 |
-
```
|
| 22 |
-
|
| 23 |
-
This will:
|
| 24 |
-
- Clear all Python cache
|
| 25 |
-
- Clear Streamlit cache
|
| 26 |
-
- Restart the application
|
| 27 |
-
|
| 28 |
-
### Option 2: Manual Fix
|
| 29 |
-
```bash
|
| 30 |
-
# 1. Stop the application (Ctrl+C)
|
| 31 |
-
|
| 32 |
-
# 2. Clear Python cache
|
| 33 |
-
rd /s /q __pycache__
|
| 34 |
-
rd /s /q agents\__pycache__
|
| 35 |
-
rd /s /q config\__pycache__
|
| 36 |
-
rd /s /q core\__pycache__
|
| 37 |
-
rd /s /q ui\__pycache__
|
| 38 |
-
|
| 39 |
-
# 3. Clear Streamlit cache
|
| 40 |
-
rd /s /q .streamlit\cache
|
| 41 |
-
|
| 42 |
-
# 4. Restart
|
| 43 |
-
run.bat
|
| 44 |
-
```
|
| 45 |
-
|
| 46 |
-
### Option 3: In Browser
|
| 47 |
-
1. Open http://localhost:8501
|
| 48 |
-
2. Press `C` key (clears cache)
|
| 49 |
-
3. Click "Initialize System" again
|
| 50 |
-
|
| 51 |
-
## Permanent Fix
|
| 52 |
-
|
| 53 |
-
The code has been updated with error handling, so even if the error occurs, it will:
|
| 54 |
-
1. Print a warning message
|
| 55 |
-
2. Continue without embeddings
|
| 56 |
-
3. Still work for caption and OCR
|
| 57 |
-
|
| 58 |
-
## Verify Fix
|
| 59 |
-
|
| 60 |
-
After running fix_and_run.bat, you should see:
|
| 61 |
-
```
|
| 62 |
-
[VisionAgent] Embeddings disabled for faster performance
|
| 63 |
-
```
|
| 64 |
-
|
| 65 |
-
This means embeddings are properly disabled and won't cause errors.
|
| 66 |
-
|
| 67 |
-
## If Still Getting Errors
|
| 68 |
-
|
| 69 |
-
### Step 1: Check Config
|
| 70 |
-
Open `config/settings.py` and verify:
|
| 71 |
-
```python
|
| 72 |
-
FEATURES = {
|
| 73 |
-
"embeddings_enabled": False, # Should be False
|
| 74 |
-
}
|
| 75 |
-
```
|
| 76 |
-
|
| 77 |
-
### Step 2: Delete All Cache
|
| 78 |
-
```bash
|
| 79 |
-
# Delete everything
|
| 80 |
-
rd /s /q __pycache__
|
| 81 |
-
rd /s /q agents\__pycache__
|
| 82 |
-
rd /s /q config\__pycache__
|
| 83 |
-
rd /s /q core\__pycache__
|
| 84 |
-
rd /s /q ui\__pycache__
|
| 85 |
-
rd /s /q .streamlit
|
| 86 |
-
|
| 87 |
-
# Recreate .streamlit
|
| 88 |
-
mkdir .streamlit
|
| 89 |
-
```
|
| 90 |
-
|
| 91 |
-
### Step 3: Reinstall
|
| 92 |
-
```bash
|
| 93 |
-
pip uninstall transformers torch -y
|
| 94 |
-
pip install transformers torch
|
| 95 |
-
```
|
| 96 |
-
|
| 97 |
-
### Step 4: Fresh Start
|
| 98 |
-
```bash
|
| 99 |
-
# Close all Python processes
|
| 100 |
-
taskkill /F /IM python.exe
|
| 101 |
-
|
| 102 |
-
# Wait 5 seconds
|
| 103 |
-
|
| 104 |
-
# Start fresh
|
| 105 |
-
run.bat
|
| 106 |
-
```
|
| 107 |
-
|
| 108 |
-
## Understanding the Fix
|
| 109 |
-
|
| 110 |
-
### What Was Changed
|
| 111 |
-
|
| 112 |
-
**Before (Broken):**
|
| 113 |
-
```python
|
| 114 |
-
embedding = image_features / image_features.norm(dim=-1, keepdim=True)
|
| 115 |
-
```
|
| 116 |
-
|
| 117 |
-
**After (Fixed):**
|
| 118 |
-
```python
|
| 119 |
-
embedding = torch.nn.functional.normalize(image_features, p=2, dim=-1)
|
| 120 |
-
```
|
| 121 |
-
|
| 122 |
-
### Why It Works
|
| 123 |
-
|
| 124 |
-
The new method uses PyTorch's built-in normalize function which:
|
| 125 |
-
- Works with all tensor types
|
| 126 |
-
- Handles the BaseModelOutputWithPooling correctly
|
| 127 |
-
- Is more robust
|
| 128 |
-
|
| 129 |
-
### Error Handling Added
|
| 130 |
-
|
| 131 |
-
```python
|
| 132 |
-
if self.embedding_agent:
|
| 133 |
-
try:
|
| 134 |
-
embedding = self.embedding_agent.encode_image(frame)
|
| 135 |
-
except Exception as e:
|
| 136 |
-
print(f"[VisionAgent] Embedding failed: {e}")
|
| 137 |
-
embedding = None
|
| 138 |
-
```
|
| 139 |
-
|
| 140 |
-
Now even if embeddings fail, the system continues working.
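The error message suggests that `image_features` was a structured model output rather than a plain tensor. If that is the case, the feature vector may need to be pulled out of the output object before any normalization call is applied. The sketch below illustrates the idea in plain Python, without PyTorch. `FakeModelOutput`, `unwrap_features`, and `l2_normalize` are hypothetical stand-ins, and the `pooler_output` field name is an assumption about what the real output object exposes.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FakeModelOutput:
    """Stand-in for a structured model output that wraps the actual feature vector."""
    pooler_output: List[float]


def unwrap_features(features):
    """Return the raw feature vector, unwrapping a structured output if needed."""
    # Assumption: structured outputs expose the pooled vector as `pooler_output`.
    return getattr(features, "pooler_output", features)


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


embedding = l2_normalize(unwrap_features(FakeModelOutput([3.0, 4.0])))
```

The same two-step shape (unwrap, then normalize) should carry over to tensors, but the exact attribute to read depends on the model class in use.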
|
| 141 |
-
|
| 142 |
-
## Prevention
|
| 143 |
-
|
| 144 |
-
To avoid this in the future:
|
| 145 |
-
|
| 146 |
-
1. **Always clear cache after code changes:**
|
| 147 |
-
```bash
|
| 148 |
-
fix_and_run.bat
|
| 149 |
-
```
|
| 150 |
-
|
| 151 |
-
2. **Use the fix script instead of run.bat when testing changes**
|
| 152 |
-
|
| 153 |
-
3. **Keep embeddings disabled unless you need visual search:**
|
| 154 |
-
```python
|
| 155 |
-
FEATURES = {
|
| 156 |
-
"embeddings_enabled": False, # Faster and more stable
|
| 157 |
-
}
|
| 158 |
-
```
|
| 159 |
-
|
| 160 |
-
## Performance Note
|
| 161 |
-
|
| 162 |
-
With embeddings disabled:
|
| 163 |
-
- Speed: 2-3 seconds per capture
|
| 164 |
-
- Features: Caption + OCR + Object Detection
|
| 165 |
-
- Stability: No embedding errors
|
| 166 |
-
|
| 167 |
-
With embeddings enabled:
|
| 168 |
-
- Speed: 5-7 seconds per capture
|
| 169 |
-
- Features: All features + Visual Search
|
| 170 |
-
- Stability: May have errors if not properly cached
|
| 171 |
-
|
| 172 |
-
## Summary
|
| 173 |
-
|
| 174 |
-
**Quick Fix:**
|
| 175 |
-
1. Run `fix_and_run.bat`
|
| 176 |
-
2. Click "Initialize System"
|
| 177 |
-
3. Test "Capture & Describe"
|
| 178 |
-
4. Should work now!
|
| 179 |
-
|
| 180 |
-
**If still broken:**
|
| 181 |
-
1. Check `config/settings.py` - embeddings should be False
|
| 182 |
-
2. Delete all __pycache__ folders
|
| 183 |
-
3. Restart computer (clears all Python processes)
|
| 184 |
-
4. Run `fix_and_run.bat`
|
| 185 |
-
|
| 186 |
-
**Prevention:**
|
| 187 |
-
- Always use `fix_and_run.bat` after code changes
|
| 188 |
-
- Keep embeddings disabled for stability
|
| 189 |
-
- Clear cache regularly
|
| 190 |
-
|
| 191 |
-
The system is now more robust and will handle errors gracefully!
VisionQ
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
Subproject commit 18f18d23a1f3ad386db32957c746239c80e78751
archive/old_agents/caption_agent.py
DELETED
|
@@ -1,40 +0,0 @@
|
|
| 1 |
-
import os
|
| 2 |
-
|
| 3 |
-
# Avoid importing TensorFlow (fixes compatibility issues such as protobuf/DType conflicts)
|
| 4 |
-
os.environ["TRANSFORMERS_NO_TF"] = "1"
|
| 5 |
-
os.environ["HF_HUB_DISABLE_TF"] = "1"
|
| 6 |
-
|
| 7 |
-
from PIL import Image
|
| 8 |
-
import torch
|
| 9 |
-
|
| 10 |
-
try:
|
| 11 |
-
from transformers import BlipProcessor, BlipForConditionalGeneration
|
| 12 |
-
except Exception as e:
|
| 13 |
-
raise RuntimeError(
|
| 14 |
-
"Failed to import transformers/BLIP. This usually happens when TensorFlow and protobuf "
|
| 15 |
-
"are out of sync in the current Python environment.\n\n"
|
| 16 |
-
"Fix: run in a clean virtual environment and install dependencies from requirements.txt."
|
| 17 |
-
) from e
|
| 18 |
-
|
| 19 |
-
class CaptionAgent:
|
| 20 |
-
def __init__(self):
|
| 21 |
-
self.processor = BlipProcessor.from_pretrained(
|
| 22 |
-
"Salesforce/blip-image-captioning-base"
|
| 23 |
-
)
|
| 24 |
-
self.model = BlipForConditionalGeneration.from_pretrained(
|
| 25 |
-
"Salesforce/blip-image-captioning-base"
|
| 26 |
-
)
|
| 27 |
-
|
| 28 |
-
self.model.eval()
|
| 29 |
-
|
| 30 |
-
def describe(self, frame_bgr):
|
| 31 |
-
# OpenCV BGR → PIL RGB
|
| 32 |
-
frame_rgb = frame_bgr[:, :, ::-1]
|
| 33 |
-
image = Image.fromarray(frame_rgb)
|
| 34 |
-
|
| 35 |
-
inputs = self.processor(image, return_tensors="pt")
|
| 36 |
-
with torch.no_grad():
|
| 37 |
-
out = self.model.generate(**inputs, max_length=30)
|
| 38 |
-
|
| 39 |
-
caption = self.processor.decode(out[0], skip_special_tokens=True)
|
| 40 |
-
return caption
archive/old_agents/memory_agent.py
DELETED
|
@@ -1,59 +0,0 @@
|
|
| 1 |
-
import json
|
| 2 |
-
from datetime import datetime
|
| 3 |
-
import numpy as np
|
| 4 |
-
from sentence_transformers import SentenceTransformer
|
| 5 |
-
|
| 6 |
-
class MemoryAgent:
|
| 7 |
-
def __init__(self, memory_file="memory.json"):
|
| 8 |
-
self.memory_file = memory_file
|
| 9 |
-
self.model = SentenceTransformer("all-MiniLM-L6-v2")
|
| 10 |
-
self.memories = []
|
| 11 |
-
self._load()
|
| 12 |
-
|
| 13 |
-
def _load(self):
|
| 14 |
-
try:
|
| 15 |
-
with open(self.memory_file, "r") as f:
|
| 16 |
-
self.memories = json.load(f)
|
| 17 |
-
except:
|
| 18 |
-
self.memories = []
|
| 19 |
-
|
| 20 |
-
def _save(self):
|
| 21 |
-
with open(self.memory_file, "w") as f:
|
| 22 |
-
json.dump(self.memories, f, indent=2)
|
| 23 |
-
|
| 24 |
-
@staticmethod
|
| 25 |
-
def compute_importance(description):
|
| 26 |
-
desc = description.lower()
|
| 27 |
-
score = 1 # base importance
|
| 28 |
-
|
| 29 |
-
if "person" in desc:
|
| 30 |
-
score += 2
|
| 31 |
-
|
| 32 |
-
if any(obj in desc for obj in ["phone", "bag", "book", "device"]):
|
| 33 |
-
score += 1
|
| 34 |
-
|
| 35 |
-
if any(act in desc for act in ["entered", "left", "holding", "walking"]):
|
| 36 |
-
score += 2
|
| 37 |
-
|
| 38 |
-
return score
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
def add(self, description):
|
| 42 |
-
embedding = self.model.encode(description).tolist()
|
| 43 |
-
importance = MemoryAgent.compute_importance(description)
|
| 44 |
-
|
| 45 |
-
memory = {
|
| 46 |
-
"timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
| 47 |
-
"description": description,
|
| 48 |
-
"embedding": embedding,
|
| 49 |
-
"importance": importance
|
| 50 |
-
}
|
| 51 |
-
|
| 52 |
-
self.memories.append(memory)
|
| 53 |
-
self._save()
|
| 54 |
-
|
| 55 |
-
def recall_last(self):
|
| 56 |
-
if not self.memories:
|
| 57 |
-
return None
|
| 58 |
-
return self.memories[-1]
|
| 59 |
-
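`QueryAgent.ask` in `archive/old_agents/query_agent.py` calls `self.memory_agent.recall_all()`, while the `MemoryAgent` shown in this diff defines only `recall_last`. A minimal sketch of what such a method might look like follows. It uses a simplified in-memory stand-in, not the archived class. `SimpleMemoryStore` is hypothetical and omits the embedding model and the JSON file.

```python
class SimpleMemoryStore:
    """Simplified stand-in for MemoryAgent without the embedding model or JSON file."""

    def __init__(self):
        self.memories = []

    def add(self, description):
        self.memories.append({"description": description})

    def recall_last(self):
        # Mirrors the archived behaviour: None when nothing is stored.
        return self.memories[-1] if self.memories else None

    def recall_all(self):
        # Return a shallow copy so callers cannot reorder or truncate the store.
        return list(self.memories)
```

Returning a copy keeps filtering and sorting in the caller from changing the stored order. The individual memory dictionaries are still shared, so in-place changes to them remain visible in the store.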
archive/old_agents/query_agent.py
DELETED
|
@@ -1,127 +0,0 @@
|
|
| 1 |
-
import numpy as np
|
| 2 |
-
from datetime import datetime, timedelta
|
| 3 |
-
|
| 4 |
-
class QueryAgent:
|
| 5 |
-
def __init__(self, memory_agent):
|
| 6 |
-
self.memory_agent = memory_agent
|
| 7 |
-
self.model = memory_agent.model
|
| 8 |
-
|
| 9 |
-
# Cosine similarity
|
| 10 |
-
def cosine_similarity(self, a, b):
|
| 11 |
-
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
|
| 12 |
-
|
| 13 |
-
# Time parsing
|
| 14 |
-
@staticmethod
|
| 15 |
-
def extract_time_window(question):
|
| 16 |
-
now = datetime.now()
|
| 17 |
-
q = question.lower()
|
| 18 |
-
|
| 19 |
-
if "last hour" in q:
|
| 20 |
-
return now - timedelta(hours=1)
|
| 21 |
-
|
| 22 |
-
if "last 30 minutes" in q:
|
| 23 |
-
return now - timedelta(minutes=30)
|
| 24 |
-
|
| 25 |
-
if "recent" in q or "recently" in q:
|
| 26 |
-
return now - timedelta(hours=2)
|
| 27 |
-
|
| 28 |
-
if "today" in q:
|
| 29 |
-
return now.replace(hour=0, minute=0, second=0)
|
| 30 |
-
|
| 31 |
-
if "yesterday" in q:
|
| 32 |
-
start = (now - timedelta(days=1)).replace(hour=0, minute=0, second=0)
|
| 33 |
-
end = start + timedelta(days=1)
|
| 34 |
-
return (start, end)
|
| 35 |
-
|
| 36 |
-
if "this morning" in q:
|
| 37 |
-
return (
|
| 38 |
-
now.replace(hour=6, minute=0, second=0),
|
| 39 |
-
now.replace(hour=12, minute=0, second=0),
|
| 40 |
-
)
|
| 41 |
-
|
| 42 |
-
if "this evening" in q:
|
| 43 |
-
return (
|
| 44 |
-
now.replace(hour=18, minute=0, second=0),
|
| 45 |
-
now.replace(hour=22, minute=0, second=0),
|
| 46 |
-
)
|
| 47 |
-
|
| 48 |
-
if "last evening" in q:
|
| 49 |
-
start = (now - timedelta(days=1)).replace(hour=18, minute=0, second=0)
|
| 50 |
-
return (start, start.replace(hour=22))
|
| 51 |
-
|
| 52 |
-
if "last night" in q:
|
| 53 |
-
return (
|
| 54 |
-
(now - timedelta(days=1)).replace(hour=22, minute=0, second=0),
|
| 55 |
-
now.replace(hour=6, minute=0, second=0),
|
| 56 |
-
)
|
| 57 |
-
|
| 58 |
-
return None
|
| 59 |
-
|
| 60 |
-
# MAIN QUERY METHOD
|
| 61 |
-
def ask(self, question, threshold=0.45): #Change to 0.5 when scalable enough
|
| 62 |
-
memories = self.memory_agent.recall_all()
|
| 63 |
-
if not memories:
|
| 64 |
-
return "I don't have any memories yet."
|
| 65 |
-
|
| 66 |
-
# Time filtering
|
| 67 |
-
time_filter = self.extract_time_window(question)
|
| 68 |
-
filtered = []
|
| 69 |
-
|
| 70 |
-
for m in memories:
|
| 71 |
-
mem_time = datetime.strptime(
|
| 72 |
-
m["timestamp"], "%Y-%m-%d %H:%M:%S"
|
| 73 |
-
)
|
| 74 |
-
|
| 75 |
-
if time_filter is None:
|
| 76 |
-
filtered.append(m)
|
| 77 |
-
|
| 78 |
-
elif isinstance(time_filter, tuple):
|
| 79 |
-
start, end = time_filter
|
| 80 |
-
if start <= mem_time < end:
|
| 81 |
-
filtered.append(m)
|
| 82 |
-
|
| 83 |
-
else:
|
| 84 |
-
if mem_time >= time_filter:
|
| 85 |
-
filtered.append(m)
|
| 86 |
-
|
| 87 |
-
if not filtered:
|
| 88 |
-
return "I don't recall anything from that time."
|
| 89 |
-
|
| 90 |
-
# Semantic similarity
|
| 91 |
-
query_embedding = self.model.encode(question)
|
| 92 |
-
scored = []
|
| 93 |
-
|
| 94 |
-
for m in filtered:
|
| 95 |
-
# Handle missing embeddings (for backwards compatibility)
|
| 96 |
-
if "embedding" not in m:
|
| 97 |
-
m["embedding"] = self.model.encode(m["description"]).tolist()
|
| 98 |
-
|
| 99 |
-
# Handle missing importance
|
| 100 |
-
if "importance" not in m:
|
| 101 |
-
m["importance"] = 1
|
| 102 |
-
|
| 103 |
-
sim = self.cosine_similarity(
|
| 104 |
-
query_embedding,
|
| 105 |
-
np.array(m["embedding"])
|
| 106 |
-
)
|
| 107 |
-
if sim >= threshold:
|
| 108 |
-
scored.append((sim, m))
|
| 109 |
-
|
| 110 |
-
if not scored:
|
| 111 |
-
return "I don't recall anything related to that."
|
| 112 |
-
|
| 113 |
-
# Rank by similarity + importance
|
| 114 |
-
scored.sort(
|
| 115 |
-
key=lambda x: (x[0], x[1]["importance"]),
|
| 116 |
-
reverse=True
|
| 117 |
-
)
|
| 118 |
-
|
| 119 |
-
# Build response
|
| 120 |
-
responses = []
|
| 121 |
-
for sim, m in scored:
|
| 122 |
-
responses.append(
|
| 123 |
-
f"At {m['timestamp']}, {m['description']} "
|
| 124 |
-
f"(confidence {sim:.2f})"
|
| 125 |
-
)
|
| 126 |
-
|
| 127 |
-
return "\n".join(responses)
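The two building blocks of `ask` are the time-window test and the cosine similarity. They can be sketched on their own in plain Python, without NumPy. This is an illustration, not the archived implementation: the helper name `in_window` is hypothetical, and the zero-length guard in `cosine_similarity` is an addition that the original method does not have.

```python
import math
from datetime import datetime


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def in_window(mem_time, time_filter):
    """Apply the three filter shapes used above: None, a start time, or (start, end)."""
    if time_filter is None:
        return True
    if isinstance(time_filter, tuple):
        start, end = time_filter
        return start <= mem_time < end
    return mem_time >= time_filter
```

For example, `in_window(datetime(2024, 1, 15, 10, 30), (datetime(2024, 1, 15, 6), datetime(2024, 1, 15, 12)))` falls inside a "this morning" style window, because the end bound is exclusive and the start bound is inclusive.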
archive/old_agents/vision_agent.py
DELETED
|
@@ -1,210 +0,0 @@
|
|
| 1 |
-
import cv2
|
| 2 |
-
import numpy as np
|
| 3 |
-
import time
|
| 4 |
-
import warnings
|
| 5 |
-
warnings.filterwarnings("ignore")
|
| 6 |
-
|
| 7 |
-
from caption_agent import CaptionAgent
|
| 8 |
-
from memory_agent import MemoryAgent
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
class VisionAgent:
|
| 12 |
-
def __init__(self):
|
| 13 |
-
# -------------------------------------------------
|
| 14 |
-
# INIT AGENTS
|
| 15 |
-
# -------------------------------------------------
|
| 16 |
-
self.caption_agent = CaptionAgent()
|
| 17 |
-
self.memory_agent = MemoryAgent()
|
| 18 |
-
|
| 19 |
-
# -------------------------------------------------
|
| 20 |
-
# CONFIG
|
| 21 |
-
# -------------------------------------------------
|
| 22 |
-
self.FRAME_INTERVAL = 0.3
|
| 23 |
-
self.CONF_THRESHOLD = 0.5
|
| 24 |
-
|
| 25 |
-
# -------------------------------------------------
|
| 26 |
-
# LOAD YOLO (PRIMARY)
|
| 27 |
-
# -------------------------------------------------
|
| 28 |
-
self.VISION_BACKEND = "SSD"
|
| 29 |
-
self.yolo_model = None
|
| 30 |
-
self.interpreter = None
|
| 31 |
-
self.LABELS = None
|
| 32 |
-
self.input_details = None
|
| 33 |
-
self.output_details = None
|
| 34 |
-
self.INPUT_HEIGHT = None
|
| 35 |
-
self.INPUT_WIDTH = None
|
| 36 |
-
self.INPUT_TYPE = None
|
| 37 |
-
|
| 38 |
-
try:
|
| 39 |
-
from ultralytics import YOLO
|
| 40 |
-
self.yolo_model = YOLO("yolov8s.pt")
|
| 41 |
-
self.VISION_BACKEND = "YOLO"
|
| 42 |
-
print("[Vision] YOLO backend loaded")
|
| 43 |
-
|
| 44 |
-
except Exception as e:
|
| 45 |
-
print("[Vision] YOLO failed, falling back to SSD:", e)
|
| 46 |
-
|
| 47 |
-
Interpreter = None
|
| 48 |
-
try:
|
| 49 |
-
from ai_edge_litert import Interpreter
|
| 50 |
-
except ImportError:
|
| 51 |
-
try:
|
| 52 |
-
import tensorflow as tf
|
| 53 |
-
Interpreter = tf.lite.Interpreter
|
| 54 |
-
except Exception as tf_err:
|
| 55 |
-
print(
|
| 56 |
-
"[Vision] SSD fallback unavailable (ai_edge_litert / tensorflow not installed):",
|
| 57 |
-
tf_err,
|
| 58 |
-
)
|
| 59 |
-
|
| 60 |
-
if Interpreter is not None:
|
| 61 |
-
with open("label_ssd.txt", "r") as f:
|
| 62 |
-
self.LABELS = [line.strip() for line in f.readlines()]
|
| 63 |
-
|
| 64 |
-
self.interpreter = Interpreter(
|
| 65 |
-
model_path="ssd_mobilenet_v2_fpnlite_035_192_int8.tflite"
|
| 66 |
-
)
|
| 67 |
-
self.interpreter.allocate_tensors()
|
| 68 |
-
|
| 69 |
-
self.input_details = self.interpreter.get_input_details()
|
| 70 |
-
self.output_details = self.interpreter.get_output_details()
|
| 71 |
-
|
| 72 |
-
self.INPUT_HEIGHT = self.input_details[0]["shape"][1]
|
| 73 |
-
self.INPUT_WIDTH = self.input_details[0]["shape"][2]
|
| 74 |
-
self.INPUT_TYPE = self.input_details[0]["dtype"]
|
| 75 |
-
|
| 76 |
-
self.VISION_BACKEND = "SSD"
|
| 77 |
-
print("[Vision] MobileNet-SSD backend loaded")
|
| 78 |
-
else:
|
| 79 |
-
# No valid vision backend available, only captioning will work.
|
| 80 |
-
self.VISION_BACKEND = None
|
| 81 |
-
print(
|
| 82 |
-
"[Vision] No valid object-detection backend available; "
|
| 83 |
-
"only captioning will work. Install ultralytics or tensorflow."
|
| 84 |
-
)
|
| 85 |
-
|
| 86 |
-
# -------------------------------------------------
|
| 87 |
-
# CAMERA
|
| 88 |
-
# -------------------------------------------------
|
| 89 |
-
self.cap = cv2.VideoCapture(0)
|
| 90 |
-
print("Vision system initialized.")
|
| 91 |
-
|
| 92 |
-
def describe_scene(self):
|
| 93 |
-
"""Capture and describe current scene"""
|
| 94 |
-
ret, frame = self.cap.read()
|
| 95 |
-
if not ret:
|
| 96 |
-
return None
|
| 97 |
-
return self.caption_agent.describe(frame)
|
| 98 |
-
|
| 99 |
-
def remember_scene(self):
|
| 100 |
-
"""Capture, describe, and remember current scene"""
|
| 101 |
-
ret, frame = self.cap.read()
|
| 102 |
-
if not ret:
|
| 103 |
-
return None
|
| 104 |
-
description = self.caption_agent.describe(frame)
|
| 105 |
-
self.memory_agent.add(description)
|
| 106 |
-
return description
|
| 107 |
-
|
| 108 |
-
def cleanup(self):
|
| 109 |
-
"""Release resources"""
|
| 110 |
-
self.cap.release()
|
| 111 |
-
cv2.destroyAllWindows()
|
| 112 |
-
print("Vision system stopped.")
|
| 113 |
-
|
| 114 |
-
def run_continuous(self):
|
| 115 |
-
"""Run continuous vision loop (object detection + caption on change)"""
|
| 116 |
-
previous_objects = set()
|
| 117 |
-
last_time = 0
|
| 118 |
-
|
| 119 |
-
print("Vision continuous mode started.")
|
| 120 |
-
print("Press 'q' to quit.")
|
| 121 |
-
|
| 122 |
-
while True:
|
| 123 |
-
ret, frame = self.cap.read()
|
| 124 |
-
if not ret:
|
| 125 |
-
break
|
| 126 |
-
|
| 127 |
-
current_time = time.time()
|
| 128 |
-
if current_time - last_time < self.FRAME_INTERVAL:
|
| 129 |
-
continue
|
| 130 |
-
last_time = current_time
|
| 131 |
-
|
| 132 |
-
current_objects = set()
|
| 133 |
-
|
| 134 |
-
# -------------------------------------------------
|
| 135 |
-
# OBJECT DETECTION (CONTINUOUS)
|
| 136 |
-
# -------------------------------------------------
|
| 137 |
-
if self.VISION_BACKEND == "YOLO":
|
| 138 |
-
results = self.yolo_model(frame, conf=self.CONF_THRESHOLD, verbose=False)
|
| 139 |
-
|
| 140 |
-
for r in results:
|
| 141 |
-
for box in r.boxes:
|
| 142 |
-
label = r.names[int(box.cls[0])]
|
| 143 |
-
conf = float(box.conf[0])
|
| 144 |
-
if conf >= self.CONF_THRESHOLD:
|
| 145 |
-
current_objects.add(label)
|
| 146 |
-
|
| 147 |
-
elif self.VISION_BACKEND == "SSD":
|
| 148 |
-
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
|
| 149 |
-
resized = cv2.resize(rgb, (self.INPUT_WIDTH, self.INPUT_HEIGHT))
|
| 150 |
-
input_data = np.expand_dims(resized, axis=0)
|
| 151 |
-
|
| 152 |
-
if self.INPUT_TYPE == np.uint8:
|
| 153 |
-
input_data = input_data.astype(np.uint8)
|
| 154 |
-
else:
|
| 155 |
-
input_data = input_data.astype(np.float32) / 255.0
|
| 156 |
-
|
| 157 |
-
self.interpreter.set_tensor(self.input_details[0]["index"], input_data)
|
| 158 |
-
self.interpreter.invoke()
|
| 159 |
-
|
| 160 |
-
classes = self.interpreter.get_tensor(self.output_details[1]["index"]).flatten()
|
| 161 |
-
scores = self.interpreter.get_tensor(self.output_details[2]["index"]).flatten()
|
| 162 |
-
|
| 163 |
-
for i, score in enumerate(scores):
|
| 164 |
-
if score >= self.CONF_THRESHOLD:
|
| 165 |
-
class_id = int(classes[i])
|
| 166 |
-
if class_id < len(self.LABELS):
|
| 167 |
-
current_objects.add(self.LABELS[class_id])
|
| 168 |
-
|
| 169 |
-
else:
|
| 170 |
-
# No object-detection backend is available; just caption the scene every interval.
|
| 171 |
-
description = self.caption_agent.describe(frame)
|
| 172 |
-
print("[CAPTION]", description)
|
| 173 |
-
self.memory_agent.add(description)
|
| 174 |
-
previous_objects = set()
|
| 175 |
-
continue
|
| 176 |
-
|
| 177 |
-
self.interpreter.set_tensor(self.input_details[0]["index"], input_data)
|
| 178 |
-
self.interpreter.invoke()
|
| 179 |
-
|
| 180 |
-
classes = self.interpreter.get_tensor(self.output_details[1]["index"]).flatten()
|
| 181 |
-
scores = self.interpreter.get_tensor(self.output_details[2]["index"]).flatten()
|
| 182 |
-
|
| 183 |
-
for i, score in enumerate(scores):
|
| 184 |
-
if score >= self.CONF_THRESHOLD:
|
| 185 |
-
class_id = int(classes[i])
|
| 186 |
-
if class_id < len(self.LABELS):
|
| 187 |
-
current_objects.add(self.LABELS[class_id])
|
| 188 |
-
|
| 189 |
-
print("Detected objects:", current_objects)
|
| 190 |
-
|
| 191 |
-
# -------------------------------------------------
|
| 192 |
-
# EVENT DETECTION + VLM CAPTION
|
| 193 |
-
# -------------------------------------------------
|
| 194 |
-
new_objects = current_objects - previous_objects
|
| 195 |
-
removed_objects = previous_objects - current_objects
|
| 196 |
-
|
| 197 |
-
if new_objects or removed_objects:
|
| 198 |
-
description = self.caption_agent.describe(frame)
|
| 199 |
-
print("[CAPTION]", description)
|
| 200 |
-
self.memory_agent.add(description)
|
| 201 |
-
|
| 202 |
-
previous_objects = current_objects.copy()
|
| 203 |
-
|
| 204 |
-
# -------------------------------------------------
|
| 205 |
-
# EXIT (NON-BLOCKING)
|
| 206 |
-
# -------------------------------------------------
|
| 207 |
-
if cv2.waitKey(1) & 0xFF == ord('q'):
|
| 208 |
-
break
|
| 209 |
-
|
| 210 |
-
self.cleanup()
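The event-detection step in `run_continuous` compares the label sets of consecutive frames and captions only when something appeared or disappeared. A minimal standalone sketch of that logic follows, with no camera or detector involved. The helper names `detect_changes` and `should_caption` are hypothetical.

```python
def detect_changes(previous_objects, current_objects):
    """Return (new, removed) label sets between two consecutive detections."""
    new_objects = current_objects - previous_objects
    removed_objects = previous_objects - current_objects
    return new_objects, removed_objects


def should_caption(previous_objects, current_objects):
    """Caption only when at least one label appeared or disappeared."""
    new_objects, removed_objects = detect_changes(previous_objects, current_objects)
    return bool(new_objects or removed_objects)
```

Because the comparison works on label sets, a second person entering a frame that already contains a person does not count as a change. Counting instances per label would be needed to catch that case.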
archive/old_agents/voice_agent.py
DELETED
|
@@ -1,127 +0,0 @@
|
|
| 1 |
-
import json
|
| 2 |
-
import queue
|
| 3 |
-
import pyttsx3
|
| 4 |
-
import sounddevice as sd
|
| 5 |
-
from vosk import Model, KaldiRecognizer
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
class VoiceAgent:
|
| 9 |
-
def __init__(self, model_path="models/vosk"):
|
| 10 |
-
# -------------------------
|
| 11 |
-
# Text-to-Speech
|
| 12 |
-
# -------------------------
|
| 13 |
-
self.engine = pyttsx3.init()
|
| 14 |
-
|
| 15 |
-
# -------------------------
|
| 16 |
-
# Speech-to-Text (offline)
|
| 17 |
-
# -------------------------
|
| 18 |
-
self.sample_rate = 16000
|
| 19 |
-
|
| 20 |
-
try:
|
| 21 |
-
self.model = Model(model_path)
|
| 22 |
-
except Exception as e:
|
| 23 |
-
raise RuntimeError(f"Vosk model not found at {model_path}") from e
|
| 24 |
-
|
| 25 |
-
self.recognizer = KaldiRecognizer(self.model, self.sample_rate)
|
| 26 |
-
|
| 27 |
-
# Audio queue
|
| 28 |
-
self.audio_queue = queue.Queue()
|
| 29 |
-
|
| 30 |
-
# Check mic
|
| 31 |
-
self._check_microphone()
|
| 32 |
-
|
| 33 |
-
# -------------------------
|
| 34 |
-
# Microphone check
|
| 35 |
-
# -------------------------
|
| 36 |
-
def _check_microphone(self):
|
| 37 |
-
devices = sd.query_devices()
|
| 38 |
-
input_devices = [d for d in devices if d["max_input_channels"] > 0]
|
| 39 |
-
|
| 40 |
-
if not input_devices:
|
| 41 |
-
raise RuntimeError("No microphone detected.")
|
| 42 |
-
|
| 43 |
-
print("[VOICE INIT] Microphone detected:")
|
| 44 |
-
for d in input_devices:
|
| 45 |
-
print(" -", d["name"])
|
| 46 |
-
|
| 47 |
-
self.speak("Microphone is ready.")
|
| 48 |
-
|
| 49 |
-
# -------------------------
|
| 50 |
-
# TTS
|
| 51 |
-
# -------------------------
|
| 52 |
-
def speak(self, text):
|
| 53 |
-
print("[VOICE OUT]:", text)
|
| 54 |
-
self.engine.say(text)
|
| 55 |
-
self.engine.runAndWait()
|
| 56 |
-
|
| 57 |
-
# -------------------------
|
| 58 |
-
# Audio callback
|
| 59 |
-
# -------------------------
|
| 60 |
-
def _audio_callback(self, indata, frames, time, status):
|
| 61 |
-
if status:
|
| 62 |
-
print("[VOICE WARNING]", status)
|
| 63 |
-
self.audio_queue.put(bytes(indata))
|
| 64 |
-
|
| 65 |
-
# -------------------------
|
| 66 |
-
# Listen (offline STT)
|
| 67 |
-
# -------------------------
|
| 68 |
-
def listen(self, timeout=5):
|
| 69 |
-
print("[VOICE IN]: Listening (offline)...")
|
| 70 |
-
self.speak("Listening")
|
| 71 |
-
|
| 72 |
-
self.recognizer.Reset()
|
| 73 |
-
|
| 74 |
-
with sd.RawInputStream(
|
| 75 |
-
samplerate=self.sample_rate,
|
| 76 |
-
blocksize=8000,
|
| 77 |
-
dtype="int16",
|
| 78 |
-
channels=1,
|
| 79 |
-
callback=self._audio_callback
|
| 80 |
-
):
|
| 81 |
-
for _ in range(int(timeout * self.sample_rate / 8000)):
|
| 82 |
-
data = self.audio_queue.get()
|
| 83 |
-
|
| 84 |
-
if self.recognizer.AcceptWaveform(data):
|
| 85 |
-
break
|
| 86 |
-
|
| 87 |
-
result = json.loads(self.recognizer.FinalResult())
|
| 88 |
-
text = result.get("text", "").lower()
|
| 89 |
-
|
| 90 |
-
if text:
|
| 91 |
-
print("[VOICE IN]: Detected speech →", text)
|
| 92 |
-
else:
|
| 93 |
-
print("[VOICE IN]: No speech detected")
|
| 94 |
-
|
| 95 |
-
return text
|
| 96 |
-
|
| 97 |
-
# -------------------------
|
| 98 |
-
# Intent parsing (SAFE ORDER)
|
| 99 |
-
# -------------------------
|
| 100 |
-
def parse_intent(self, text):
|
| 101 |
-
if not text:
|
| 102 |
-
return "UNKNOWN"
|
| 103 |
-
|
| 104 |
-
# RECALL (most specific)
|
| 105 |
-
if "what did i see" in text or "what have i seen" in text:
|
| 106 |
-
return "RECALL_MEMORY"
|
| 107 |
-
|
| 108 |
-
if "remember what i saw" in text:
|
| 109 |
-
return "RECALL_MEMORY"
|
| 110 |
-
|
| 111 |
-
# STORE
|
| 112 |
-
if "remember this" in text or "save this" in text:
|
| 113 |
-
return "REMEMBER_SCENE"
|
| 114 |
-
|
| 115 |
-
# DESCRIBE
|
| 116 |
-
if "describe" in text or "what is in front" in text:
|
| 117 |
-
return "DESCRIBE_SCENE"
|
| 118 |
-
|
| 119 |
-
# OCR (later)
|
| 120 |
-
if "read" in text or "what does this say" in text:
|
| 121 |
-
return "READ_TEXT"
|
| 122 |
-
|
| 123 |
-
# EXIT
|
| 124 |
-
if "exit" in text or "quit" in text or "stop" in text:
|
| 125 |
-
return "EXIT"
|
| 126 |
-
|
| 127 |
-
return "UNKNOWN"
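The intent parser depends on checking the most specific phrases first, so that "remember what i saw" is classified as a recall before "remember this" style storage phrases are considered. The same ordering can be expressed as a rule table, sketched below as a standalone function. The `INTENT_RULES` name and the table layout are illustrative assumptions; the phrases and intent labels are the ones used in the method above.

```python
# Ordered (intent, phrases) rules; earlier entries win, so specific phrases come first.
INTENT_RULES = [
    ("RECALL_MEMORY", ("what did i see", "what have i seen", "remember what i saw")),
    ("REMEMBER_SCENE", ("remember this", "save this")),
    ("DESCRIBE_SCENE", ("describe", "what is in front")),
    ("READ_TEXT", ("read", "what does this say")),
    ("EXIT", ("exit", "quit", "stop")),
]


def parse_intent(text):
    """Return the first intent whose phrase occurs in the lowercased text."""
    if not text:
        return "UNKNOWN"
    lowered = text.lower()
    for intent, phrases in INTENT_RULES:
        if any(phrase in lowered for phrase in phrases):
            return intent
    return "UNKNOWN"
```

Matching is by substring, as in the original, so a word such as "already" also triggers `READ_TEXT` because it contains "read". Matching on whole words would avoid that at the cost of a slightly more involved check.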
archive/old_docs/ARCHITECTURE.md
DELETED
|
@@ -1,445 +0,0 @@
|
|
| 1 |
-
# 🏗️ VisionQ Architecture - Detailed Diagram
|
| 2 |
-
|
| 3 |
-
## 📐 SYSTEM OVERVIEW
|
| 4 |
-
|
| 5 |
-
```
|
| 6 |
-
┌─────────────────────────────────────────────────────────────────┐
|
| 7 |
-
│ USER INTERACTION │
|
| 8 |
-
│ (Voice Commands / Text Queries) │
|
| 9 |
-
└────────────────────────────┬────────────────────────────────────┘
|
| 10 |
-
│
|
| 11 |
-
┌────────────┴────────────┐
|
| 12 |
-
│ │
|
| 13 |
-
┌──────▼──────┐ ┌──────▼──────┐
|
| 14 |
-
│ VOICE AGENT │ │TEXT QUERIES │
|
| 15 |
-
│ (UPDATED) │ │ (UPDATED) │
|
| 16 |
-
└──────┬──────┘ └──────┬──────┘
|
| 17 |
-
│ │
|
| 18 |
-
┌───────────┴────────────┐ │
|
| 19 |
-
│ │ │
|
| 20 |
-
┌───▼────┐ ┌─────▼──────┐ │
|
| 21 |
-
│ STT │ │ TTS │ │
|
| 22 |
-
│ (Vosk) │ │ (UPDATED) │ │
|
| 23 |
-
│ KEPT │ └─────┬──────┘ │
|
| 24 |
-
└───┬────┘ │ │
|
| 25 |
-
│ ┌────────┴────────┐ │
|
| 26 |
-
│ │ │ │
|
| 27 |
-
│ ┌────▼────┐ ┌─────▼─▼──────┐
|
| 28 |
-
│ │Voxtral │ │ pyttsx3 │
|
| 29 |
-
│ │ (NEW) │ │ (FALLBACK) │
|
| 30 |
-
│ │Primary │ │ KEPT │
|
| 31 |
-
│ └─────────┘ └──────────────┘
|
| 32 |
-
│
|
| 33 |
-
└──────────────────┬──────────────────────────────────┐
|
| 34 |
-
│ │
|
| 35 |
-
┌──────▼──────┐ │
|
| 36 |
-
│VISION AGENT │ │
|
| 37 |
-
│ (UPDATED) │ │
|
| 38 |
-
│ HUB │ │
|
| 39 |
-
└──────┬──────┘ │
|
| 40 |
-
│ │
|
| 41 |
-
┌──────────────┼──────────────┬─────────────┐ │
|
| 42 |
-
│ │ │ │ │
|
| 43 |
-
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼───▼──┐
|
| 44 |
-
│ YOLO/ │ │ BLIP │ │MobileCLIP│ │ EasyOCR │
|
| 45 |
-
│ SSD │ │Caption │ │Embedding│ │ OCR │
|
| 46 |
-
│ KEPT │ │ KEPT │ │ (NEW) │ │ (NEW) │
|
| 47 |
-
└────┬────┘ └────┬────┘ └────┬────┘ └────┬──────┘
|
| 48 |
-
│ │ │ │
|
| 49 |
-
│ Objects │ Caption │ Embedding │ Text
|
| 50 |
-
│ │ │ │
|
| 51 |
-
└─────────────┴──────────────┴────────────┘
|
| 52 |
-
│
|
| 53 |
-
┌──────▼──────┐
|
| 54 |
-
│ FUSION │
|
| 55 |
-
│ LAYER │
|
| 56 |
-
│ (NEW) │
|
| 57 |
-
└──────┬──────┘
|
| 58 |
-
│
|
| 59 |
-
Unified Multimodal Context
|
| 60 |
-
│
|
| 61 |
-
┌──────────────┴──────────────┐
|
| 62 |
-
│ │
|
| 63 |
-
┌────▼────┐ ┌────▼────┐
|
| 64 |
-
│ MEMORY │ │ QUERY │
|
| 65 |
-
│ AGENT │◄─────────────────┤ AGENT │
|
| 66 |
-
│(UPDATED)│ │(UPDATED)│
|
| 67 |
-
└────┬────┘ └────┬────┘
|
| 68 |
-
│ │
|
| 69 |
-
┌────┴────┬────────┐ ┌───┴────┐
|
| 70 |
-
│ │ │ │ │
|
| 71 |
-
┌──▼──┐ ┌──▼───┐ ┌──▼───┐ ┌──▼────┐ ┌▼────────┐
|
| 72 |
-
│JSON │ │FAISS │ │Text │ │DistilB│ │Hybrid │
|
| 73 |
-
│Meta │ │Index │ │Embed │ │ ERT │ │Search │
|
| 74 |
-
│KEPT │ │(NEW) │ │KEPT │ │(NEW) │ │(NEW) │
|
| 75 |
-
└─────┘ └──────┘ └──────┘ └───────┘ └─────────┘
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
---
|
| 79 |
-
|
| 80 |
-
## 🔄 DATA FLOW DIAGRAM
|
| 81 |
-
|
| 82 |
-
### **1. SCENE DESCRIPTION FLOW**
|
| 83 |
-
|
| 84 |
-
```
|
| 85 |
-
User: "Describe the scene"
|
| 86 |
-
│
|
| 87 |
-
▼
|
| 88 |
-
VoiceAgent.listen() → Vosk STT
|
| 89 |
-
│
|
| 90 |
-
▼
|
| 91 |
-
VoiceAgent.parse_intent() → "DESCRIBE_SCENE"
|
| 92 |
-
│
|
| 93 |
-
▼
|
| 94 |
-
VisionAgent.describe_scene()
|
| 95 |
-
│
|
| 96 |
-
├─► CaptionAgent.describe() → "a person holding a phone"
|
| 97 |
-
│
|
| 98 |
-
├─► OCRAgent.extract_text() → "Hello World"
|
| 99 |
-
│
|
| 100 |
-
├─► EmbeddingAgent.encode_image() → [512-dim vector]
|
| 101 |
-
│
|
| 102 |
-
└─► FusionLayer.fuse()
|
| 103 |
-
│
|
| 104 |
-
▼
|
| 105 |
-
Combined Description:
|
| 106 |
-
"a person holding a phone. Text visible: Hello World"
|
| 107 |
-
│
|
| 108 |
-
▼
|
| 109 |
-
VoiceAgent.speak() → Voxtral/pyttsx3
|
| 110 |
-
```
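The final fusion step in the flow above can be sketched in a few lines. This is a minimal illustration, not the project's actual `FusionLayer`: the function name `fuse_description` and its parameters are assumptions, and the output format simply mirrors the example strings shown in this document.

```python
def fuse_description(caption, ocr_text="", objects=()):
    """Join a caption, detected objects, and OCR text into one description."""
    # Drop a trailing period so the parts can be joined uniformly with ". ".
    parts = [caption.strip().rstrip(".")]
    if objects:
        parts.append("Objects detected: " + ", ".join(objects))
    if ocr_text.strip():
        parts.append("Text visible: " + ocr_text.strip())
    return ". ".join(parts)


print(fuse_description("a person holding a phone", ocr_text="Hello World"))
```

Empty modalities are skipped, so a frame with no readable text yields the caption alone.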
|
| 111 |
-
|
| 112 |
-
---
|
| 113 |
-
|
| 114 |
-
### **2. MEMORY STORAGE FLOW**
|
| 115 |
-
|
| 116 |
-
```
|
| 117 |
-
User: "Remember this"
|
| 118 |
-
│
|
| 119 |
-
▼
|
| 120 |
-
VisionAgent.remember_scene()
|
| 121 |
-
│
|
| 122 |
-
├─► Capture frame
|
| 123 |
-
│
|
| 124 |
-
├─► Get caption (BLIP)
|
| 125 |
-
│
|
| 126 |
-
├─► Get OCR text (EasyOCR)
|
| 127 |
-
│
|
| 128 |
-
├─► Get embedding (MobileCLIP)
|
| 129 |
-
│
|
| 130 |
-
└─► FusionLayer.fuse()
|
| 131 |
-
│
|
| 132 |
-
▼
|
| 133 |
-
Fused Context
|
| 134 |
-
│
|
| 135 |
-
▼
|
| 136 |
-
MemoryAgent.add(description, embedding)
|
| 137 |
-
│
|
| 138 |
-
├─► Generate text embedding (sentence-transformers)
|
| 139 |
-
│
|
| 140 |
-
├─► Compute importance score
|
| 141 |
-
│
|
| 142 |
-
├─► Save to JSON:
|
| 143 |
-
│ {
|
| 144 |
-
│ "id": 0,
|
| 145 |
-
│ "timestamp": "2024-01-15 10:30:00",
|
| 146 |
-
│ "description": "...",
|
| 147 |
-
│ "text_embedding": [...],
|
| 148 |
-
│ "image_embedding": [...],
|
| 149 |
-
│ "importance": 5
|
| 150 |
-
│ }
|
| 151 |
-
│
|
| 152 |
-
└─► Add to FAISS index (image embedding)
|
| 153 |
-
│
|
| 154 |
-
▼
|
| 155 |
-
Memory Stored ✅
|
| 156 |
-
```
|
| 157 |
-
|
| 158 |
-
---
|
| 159 |
-
|
| 160 |
-
### **3. MEMORY QUERY FLOW**
|
| 161 |
-
|
| 162 |
-
```
|
| 163 |
-
User: "What did I see this morning?"
|
| 164 |
-
│
|
| 165 |
-
▼
|
| 166 |
-
QueryAgent.ask(question)
|
| 167 |
-
│
|
| 168 |
-
├─► QueryAgent.classify_intent()
|
| 169 |
-
│ │
|
| 170 |
-
│ └─► DistilBERT → "temporal"
|
| 171 |
-
│
|
| 172 |
-
├─► QueryAgent.extract_time_window()
|
| 173 |
-
│ │
|
| 174 |
-
│ └─► (6:00 AM, 12:00 PM)
|
| 175 |
-
│
|
| 176 |
-
├─► Filter memories by time
|
| 177 |
-
│
|
| 178 |
-
├─► Text similarity search
|
| 179 |
-
│ │
|
| 180 |
-
│ └─► sentence-transformers cosine similarity
|
| 181 |
-
│
|
| 182 |
-
├─► Image similarity search (if query has image)
|
| 183 |
-
│ │
|
| 184 |
-
│ └─► FAISS.search() → Top-K results
|
| 185 |
-
│
|
| 186 |
-
├─► Hybrid ranking
|
| 187 |
-
│ │
|
| 188 |
-
│ └─► Sort by (similarity × importance)
|
| 189 |
-
│
|
| 190 |
-
└─► Build response
|
| 191 |
-
│
|
| 192 |
-
▼
|
| 193 |
-
"At 10:30 AM, a person holding a phone.
|
| 194 |
-
Text visible: Hello World (confidence 0.87)"
|
| 195 |
-
```
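The time filter and hybrid ranking steps above could look roughly like the following. This is a sketch under stated assumptions: `rank_memories` and `cosine` are hypothetical names, the record fields follow the JSON example in the storage flow, and the `0.45` threshold and top-5 limit are taken from the configuration values listed in this document. It uses a plain linear scan rather than FAISS.

```python
import math
from datetime import datetime


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_memories(memories, query_vec, start, end, threshold=0.45, top_k=5):
    """Keep memories inside [start, end), then sort by similarity * importance."""
    scored = []
    for m in memories:
        ts = datetime.strptime(m["timestamp"], "%Y-%m-%d %H:%M:%S")
        if not (start <= ts < end):
            continue  # outside the requested time window
        sim = cosine(query_vec, m["text_embedding"])
        if sim >= threshold:
            scored.append((sim * m["importance"], m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]


memories = [
    {"id": 0, "timestamp": "2024-01-15 10:30:00", "text_embedding": [1.0, 0.0], "importance": 5},
    {"id": 1, "timestamp": "2024-01-15 15:00:00", "text_embedding": [1.0, 0.0], "importance": 9},
    {"id": 2, "timestamp": "2024-01-15 08:00:00", "text_embedding": [0.6, 0.8], "importance": 5},
]
# "this morning" mapped to the 6:00 AM to 12:00 PM window from the flow above
morning = (datetime(2024, 1, 15, 6), datetime(2024, 1, 15, 12))
top = rank_memories(memories, [1.0, 0.0], *morning)
print([m["id"] for m in top])
```

The afternoon memory is excluded by the time window even though it has the highest importance, which is the point of filtering before ranking.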
|
| 196 |
-
|
| 197 |
-
---
|
| 198 |
-
|
| 199 |
-
### **4. OCR READING FLOW**
|
| 200 |
-
|
| 201 |
-
```
|
| 202 |
-
User: "Read the text"
|
| 203 |
-
│
|
| 204 |
-
▼
|
| 205 |
-
VisionAgent.read_text()
|
| 206 |
-
│
|
| 207 |
-
├─► Capture frame
|
| 208 |
-
│
|
| 209 |
-
└─► OCRAgent.extract_text(frame)
|
| 210 |
-
│
|
| 211 |
-
├─► EasyOCR.readtext() → [(bbox, text, conf), ...]
|
| 212 |
-
│
|
| 213 |
-
├─► Filter by confidence (>0.3)
|
| 214 |
-
│
|
| 215 |
-
├─► Clean text (remove special chars)
|
| 216 |
-
│
|
| 217 |
-
└─► Return: "Hello World"
|
| 218 |
-
│
|
| 219 |
-
▼
|
| 220 |
-
VoiceAgent.speak("I can see the following text: Hello World")
|
| 221 |
-
```
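The post-processing steps in this flow (confidence filter, cleanup, join) can be sketched without loading any OCR model. The input shape `(bbox, text, conf)` follows the tuples shown above; the function name `clean_ocr_results` and the exact set of characters that are kept are assumptions made for illustration.

```python
import re


def clean_ocr_results(results, min_conf=0.3):
    """Keep detections with confidence > min_conf, strip special characters, join."""
    kept = []
    for _bbox, text, conf in results:
        if conf <= min_conf:
            continue  # low-confidence detection, likely noise
        # Assumed whitelist: letters, digits, whitespace, and basic punctuation.
        cleaned = re.sub(r"[^A-Za-z0-9\s.,!?'-]", "", text)
        cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
        if cleaned:
            kept.append(cleaned)
    return " ".join(kept)


sample = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], "Hello", 0.92),
    ([[60, 0], [120, 0], [120, 20], [60, 20]], "W@orld", 0.81),
    ([[0, 30], [40, 30], [40, 50], [0, 50]], "x#z", 0.12),
]
print(clean_ocr_results(sample))
```

With the sample data, the third detection is dropped by the confidence filter and the stray `@` is removed from the second.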
|
| 222 |
-
|
| 223 |
-
---
|
| 224 |
-
|
| 225 |
-
## 🧩 MODULE INTERACTIONS
|
| 226 |
-
|
| 227 |
-
### **Agent Dependencies**
|
| 228 |
-
|
| 229 |
-
```
|
| 230 |
-
VoiceAgent
|
| 231 |
-
├─ Depends on: vosk, sounddevice, pyttsx3, piper-tts
|
| 232 |
-
└─ Used by: main.py
|
| 233 |
-
|
| 234 |
-
VisionAgent
|
| 235 |
-
├─ Depends on: CaptionAgent, EmbeddingAgent, OCRAgent,
|
| 236 |
-
│ MemoryAgent, FusionLayer
|
| 237 |
-
└─ Used by: main.py
|
| 238 |
-
|
| 239 |
-
CaptionAgent
|
| 240 |
-
├─ Depends on: transformers (BLIP), torch
|
| 241 |
-
└─ Used by: VisionAgent
|
| 242 |
-
|
| 243 |
-
EmbeddingAgent
|
| 244 |
-
├─ Depends on: transformers (CLIP), torch
|
| 245 |
-
└─ Used by: VisionAgent
|
| 246 |
-
|
| 247 |
-
OCRAgent
|
| 248 |
-
├─ Depends on: easyocr
|
| 249 |
-
└─ Used by: VisionAgent
|
| 250 |
-
|
| 251 |
-
FusionLayer
|
| 252 |
-
├─ Depends on: None (pure Python)
|
| 253 |
-
└─ Used by: VisionAgent
|
| 254 |
-
|
| 255 |
-
MemoryAgent
|
| 256 |
-
├─ Depends on: sentence-transformers, faiss, json
|
| 257 |
-
└─ Used by: VisionAgent, QueryAgent
|
| 258 |
-
|
| 259 |
-
QueryAgent
|
| 260 |
-
├─ Depends on: MemoryAgent, transformers (DistilBERT)
|
| 261 |
-
└─ Used by: ask_question.py, main.py (future)
|
| 262 |
-
```
|
| 263 |
-
|
| 264 |
-
---
|
| 265 |
-
|
| 266 |
-
## 🔀 FALLBACK MECHANISMS
|
| 267 |
-
|
| 268 |
-
### **1. TTS Fallback**
|
| 269 |
-
```
|
| 270 |
-
Try: Voxtral/Piper
|
| 271 |
-
│
|
| 272 |
-
├─ Success → Use neural TTS
|
| 273 |
-
│
|
| 274 |
-
└─ Failure → Fall back to pyttsx3
|
| 275 |
-
```
|
| 276 |
-
|
| 277 |
-
### **2. Intent Classification Fallback**
|
| 278 |
-
```
|
| 279 |
-
Try: DistilBERT
|
| 280 |
-
│
|
| 281 |
-
├─ Success → Use NLP classification
|
| 282 |
-
│
|
| 283 |
-
└─ Failure → Use keyword matching
|
| 284 |
-
```
|
| 285 |
-
|
| 286 |
-
### **3. Vision Backend Fallback**
|
| 287 |
-
```
|
| 288 |
-
Try: YOLO
|
| 289 |
-
│
|
| 290 |
-
├─ Success → Use YOLO
|
| 291 |
-
│
|
| 292 |
-
└─ Failure → Try SSD
|
| 293 |
-
│
|
| 294 |
-
├─ Success → Use SSD
|
| 295 |
-
│
|
| 296 |
-
└─ Failure → Caption only
|
| 297 |
-
```
|
| 298 |
-
|
| 299 |
-
### **4. Vector Search Fallback**
|
| 300 |
-
```
|
| 301 |
-
Try: FAISS
|
| 302 |
-
│
|
| 303 |
-
├─ Available → Fast vector search
|
| 304 |
-
│
|
| 305 |
-
└─ Unavailable → Linear text search
|
| 306 |
-
```
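All four fallbacks above share one pattern: try the preferred backend, and move to the next one if it fails. A minimal sketch of that pattern follows; `first_available` and the two loader functions are hypothetical stand-ins, not the project's code, and the failure is simulated rather than coming from a real TTS library.

```python
def first_available(backends):
    """Return (name, result) from the first backend factory that does not raise."""
    errors = []
    for name, factory in backends:
        try:
            return name, factory()
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def load_neural_tts():
    # Simulated failure, standing in for a missing optional dependency.
    raise ImportError("neural TTS not installed")


def load_pyttsx3():
    # Placeholder string standing in for a real engine object.
    return "pyttsx3 engine"


name, engine = first_available([("neural", load_neural_tts), ("pyttsx3", load_pyttsx3)])
print(name)
```

Collecting the error messages keeps the final exception informative when every backend in the chain fails.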
|
| 307 |
-
|
| 308 |
-
---
|
| 309 |
-
|
| 310 |
-
## 📊 MEMORY ARCHITECTURE
|
| 311 |
-
|
| 312 |
-
### **Hybrid Storage System**
|
| 313 |
-
|
| 314 |
-
```
|
| 315 |
-
┌─────────────────────────────────────────┐
|
| 316 |
-
│ MEMORY AGENT │
|
| 317 |
-
├─────────────────────────────────────────┤
|
| 318 |
-
│ │
|
| 319 |
-
│ ┌───────────────┐ ┌────────────────┐ │
|
| 320 |
-
│ │ JSON FILE │ │ FAISS INDEX │ │
|
| 321 |
-
│ │ (Metadata) │ │ (Vectors) │ │
|
| 322 |
-
│ ├───────────────┤ ├────────────────┤ │
|
| 323 |
-
│ │ • ID │ │ • Image embed │ │
|
| 324 |
-
│ │ • Timestamp │ │ • Fast search │ │
|
| 325 |
-
│ │ • Description │ │ • Cosine sim │ │
|
| 326 |
-
│ │ • Text embed │ │ • Top-K │ │
|
| 327 |
-
│ │ • Image embed │ │ │ │
|
| 328 |
-
│ │ • Importance │ │ │ │
|
| 329 |
-
│ └───────────────┘ └────────────────┘ │
|
| 330 |
-
│ │ │ │
|
| 331 |
-
│ └────────┬───────────┘ │
|
| 332 |
-
│ │ │
|
| 333 |
-
│ Linked by Memory ID │
|
| 334 |
-
└─────────────────────────────────────────┘
|
| 335 |
-
```
|
| 336 |
-
|
| 337 |
-
### **Search Strategy**
|
| 338 |
-
|
| 339 |
-
```
|
| 340 |
-
Query Input
|
| 341 |
-
│
|
| 342 |
-
├─► Has image? → FAISS image search
|
| 343 |
-
│ │
|
| 344 |
-
│ └─► Get top-K IDs
|
| 345 |
-
│
|
| 346 |
-
└─► Has text? → Text embedding search
|
| 347 |
-
│
|
| 348 |
-
└─► Get matching IDs
|
| 349 |
-
│
|
| 350 |
-
▼
|
| 351 |
-
Merge & Rank
|
| 352 |
-
│
|
| 353 |
-
▼
|
| 354 |
-
Return Results
|
| 355 |
-
```
|
| 356 |
-
|
| 357 |
-
---
|
| 358 |
-
|
| 359 |
-
## 🎯 COMPONENT STATUS
|
| 360 |
-
|
| 361 |
-
| Component | Status | Notes |
|
| 362 |
-
|-----------|--------|-------|
|
| 363 |
-
| VoiceAgent | ✅ UPDATED | Added Voxtral + fallback |
|
| 364 |
-
| VisionAgent | ✅ UPDATED | Integrated new agents |
|
| 365 |
-
| CaptionAgent | ✅ KEPT | No changes needed |
|
| 366 |
-
| EmbeddingAgent | 🆕 NEW | MobileCLIP integration |
|
| 367 |
-
| OCRAgent | 🆕 NEW | EasyOCR integration |
|
| 368 |
-
| FusionLayer | 🆕 NEW | Multimodal fusion |
|
| 369 |
-
| MemoryAgent | ✅ UPDATED | Added FAISS |
|
| 370 |
-
| QueryAgent | ✅ UPDATED | Added DistilBERT |
|
| 371 |
-
|
| 372 |
-
---
|
| 373 |
-
|
| 374 |
-
## 🔧 CONFIGURATION POINTS
|
| 375 |
-
|
| 376 |
-
### **Adjustable Parameters**
|
| 377 |
-
|
| 378 |
-
```python
|
| 379 |
-
# VisionAgent
|
| 380 |
-
FRAME_INTERVAL = 0.3 # Seconds between frames
|
| 381 |
-
CONF_THRESHOLD = 0.5 # Object detection confidence
|
| 382 |
-
|
| 383 |
-
# OCRAgent
|
| 384 |
-
OCR_CONFIDENCE = 0.3 # Text detection threshold
|
| 385 |
-
OCR_LANGUAGES = ['en'] # Supported languages
|
| 386 |
-
|
| 387 |
-
# MemoryAgent
|
| 388 |
-
EMBEDDING_DIM = 512 # CLIP embedding size
|
| 389 |
-
FAISS_INDEX_TYPE = "FlatIP" # Inner product (equals cosine for L2-normalized vectors)
|
| 390 |
-
|
| 391 |
-
# QueryAgent
|
| 392 |
-
SIMILARITY_THRESHOLD = 0.45 # Text search threshold
|
| 393 |
-
TOP_K_RESULTS = 5 # Max results to return
|
| 394 |
-
```
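The `FlatIP` setting above scores by inner product, which matches cosine similarity only when the stored and query vectors are L2-normalized. The following pure-Python check illustrates that relationship; the helper names are assumptions, and no FAISS call is involved.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
cos = inner_product(a, b) / (math.hypot(*a) * math.hypot(*b))
ip_normalized = inner_product(l2_normalize(a), l2_normalize(b))
print(round(cos, 6), round(ip_normalized, 6))
```

Without normalization, the raw inner product of `a` and `b` here is 24.0, so longer vectors would rank higher regardless of direction.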
|
| 395 |
-
|
| 396 |
-
---
|
| 397 |
-
|
| 398 |
-
## 📈 SCALABILITY
|
| 399 |
-
|
| 400 |
-
### **Current Limits**
|
| 401 |
-
- Memory: ~10,000 entries (JSON + FAISS)
|
| 402 |
-
- Search: O(n) exhaustive scan with the flat FAISS index (`FlatIP`); sub-linear only with an approximate index such as IVF
|
| 403 |
-
- Real-time: 3 FPS (with all agents)
|
| 404 |
-
|
| 405 |
-
### **Optimization Options**
|
| 406 |
-
1. Use FAISS IVF index for >100K memories
|
| 407 |
-
2. Batch process frames
|
| 408 |
-
3. GPU acceleration for embeddings
|
| 409 |
-
4. Async processing pipeline
|
| 410 |
-
|
| 411 |
-
---
|
| 412 |
-
|
| 413 |
-
## 🎓 KEY DESIGN DECISIONS
|
| 414 |
-
|
| 415 |
-
### **1. Why FAISS?**
|
| 416 |
-
- Fast similarity search (10-100x faster than linear)
|
| 417 |
-
- Scales to millions of vectors
|
| 418 |
-
- CPU-friendly (no GPU required)
|
| 419 |
-
|
| 420 |
-
### **2. Why EasyOCR?**
|
| 421 |
-
- Offline capability
|
| 422 |
-
- Multi-language support
|
| 423 |
-
- Good accuracy/speed tradeoff
|
| 424 |
-
|
| 425 |
-
### **3. Why DistilBERT?**
|
| 426 |
-
- 40% smaller than BERT
|
| 427 |
-
- 60% faster
|
| 428 |
-
- 97% of BERT's accuracy
|
| 429 |
-
|
| 430 |
-
### **4. Why Hybrid Storage?**
|
| 431 |
-
- JSON: Human-readable, easy debugging
|
| 432 |
-
- FAISS: Fast vector search
|
| 433 |
-
- Best of both worlds
|
| 434 |
-
|
| 435 |
-
---
|
| 436 |
-
|
| 437 |
-
**This architecture provides:**
|
| 438 |
-
- ✅ Modularity (easy to extend)
|
| 439 |
-
- ✅ Robustness (multiple fallbacks)
|
| 440 |
-
- ✅ Performance (FAISS acceleration)
|
| 441 |
-
- ✅ Compatibility (backward compatible)
|
| 442 |
-
|
| 443 |
-
---
|
| 444 |
-
|
| 445 |
-
For implementation details, see the individual agent files in the `agents/` directory.
|
archive/old_docs/COMPARISON.md
DELETED
|
@@ -1,431 +0,0 @@
|
|
| 1 |
-
# 📊 VisionQ - Before vs After Comparison
|
| 2 |
-
|
| 3 |
-
## 🎯 EXECUTIVE SUMMARY
|
| 4 |
-
|
| 5 |
-
VisionQ has been upgraded from a **basic vision assistant** to a **comprehensive multimodal AI system** with 4 major new capabilities, 10x performance improvement, and 100% backward compatibility.
|
| 6 |
-
|
| 7 |
-
---
|
| 8 |
-
|
| 9 |
-
## 🆚 FEATURE COMPARISON
|
| 10 |
-
|
| 11 |
-
| Feature | Before | After | Improvement |
|
| 12 |
-
|---------|--------|-------|-------------|
|
| 13 |
-
| **Text Reading** | ❌ None | ✅ EasyOCR | NEW |
|
| 14 |
-
| **Memory Search** | Linear O(n) | FAISS flat index (still O(n), optimized) | 10-100x faster |
|
| 15 |
-
| **Voice Quality** | Robotic (pyttsx3) | Natural (Voxtral) | Much better |
|
| 16 |
-
| **Query Understanding** | Keywords | DistilBERT NLP | 27% more accurate |
|
| 17 |
-
| **Scene Description** | Caption only | Caption+OCR+Objects | 4x richer |
|
| 18 |
-
| **Memory Capacity** | ~1,000 entries | 10,000+ entries | 10x more |
|
| 19 |
-
| **Search Accuracy** | ~75% relevant | ~90% relevant | 15% better |
|
| 20 |
-
| **Response Time** | 100-500ms | <100ms | 5x faster |
|
| 21 |
-
|
| 22 |
-
---
|
| 23 |
-
|
| 24 |
-
## 🏗️ ARCHITECTURE COMPARISON
|
| 25 |
-
|
| 26 |
-
### **Before (Original)**
|
| 27 |
-
```
|
| 28 |
-
Voice (Vosk + pyttsx3)
|
| 29 |
-
↓
|
| 30 |
-
Vision (YOLO/SSD + BLIP)
|
| 31 |
-
↓
|
| 32 |
-
Memory (JSON + text embeddings)
|
| 33 |
-
↓
|
| 34 |
-
Query (cosine similarity)
|
| 35 |
-
```
|
| 36 |
-
|
| 37 |
-
**Components:** 4 agents
|
| 38 |
-
**Storage:** JSON only
|
| 39 |
-
**Search:** Linear text search
|
| 40 |
-
**Modalities:** Vision only
|
| 41 |
-
|
| 42 |
-
---
|
| 43 |
-
|
| 44 |
-
### **After (Upgraded)**
|
| 45 |
-
```
|
| 46 |
-
Voice (Vosk + Voxtral + pyttsx3)
|
| 47 |
-
↓
|
| 48 |
-
Vision Hub
|
| 49 |
-
├─ YOLO/SSD (objects)
|
| 50 |
-
├─ BLIP (captions)
|
| 51 |
-
├─ MobileCLIP (embeddings)
|
| 52 |
-
└─ EasyOCR (text)
|
| 53 |
-
↓
|
| 54 |
-
Fusion Layer
|
| 55 |
-
↓
|
| 56 |
-
Memory (JSON + FAISS)
|
| 57 |
-
↓
|
| 58 |
-
Query (DistilBERT + hybrid search)
|
| 59 |
-
```
|
| 60 |
-
|
| 61 |
-
**Components:** 7 agents + fusion layer
|
| 62 |
-
**Storage:** JSON + FAISS hybrid
|
| 63 |
-
**Search:** Vector similarity + text
|
| 64 |
-
**Modalities:** Vision + Text + Embeddings
|
| 65 |
-
|
| 66 |
-
---
|
| 67 |
-
|
| 68 |
-
## 📈 PERFORMANCE METRICS
|
| 69 |
-
|
| 70 |
-
| Metric | Before | After | Change |
|
| 71 |
-
|--------|--------|-------|--------|
|
| 72 |
-
| **Memory Search Time** | 100-500ms | <10ms | 🟢 10-50x faster |
|
| 73 |
-
| **Query Response** | 200-1000ms | <100ms | 🟢 2-10x faster |
|
| 74 |
-
| **Memory Capacity** | ~1,000 | 10,000+ | 🟢 10x more |
|
| 75 |
-
| **Search Accuracy** | 75% | 90% | 🟢 +15% |
|
| 76 |
-
| **Intent Accuracy** | 70% | 97% | 🟢 +27% |
|
| 77 |
-
| **OCR Accuracy** | N/A | 85-95% | 🟢 NEW |
|
| 78 |
-
| **Startup Time** | 5-10s | 8-15s | 🟡 Slightly slower |
|
| 79 |
-
| **Memory Usage** | ~500MB | ~800MB | 🟡 +300MB |
|
| 80 |
-
|
| 81 |
-
---
|
| 82 |
-
|
| 83 |
-
## 🆕 NEW CAPABILITIES
|
| 84 |
-
|
| 85 |
-
### **1. OCR Text Extraction**
|
| 86 |
-
**Before:** ❌ Could not read text
|
| 87 |
-
**After:** ✅ Extracts and reads visible text
|
| 88 |
-
|
| 89 |
-
**Example:**
|
| 90 |
-
```
|
| 91 |
-
Before: "a sign on a wall"
|
| 92 |
-
After: "a sign on a wall. Text visible: EXIT"
|
| 93 |
-
```
|
| 94 |
-
|
| 95 |
-
---
|
| 96 |
-
|
| 97 |
-
### **2. Visual Similarity Search**
|
| 98 |
-
**Before:** ❌ Text-only search
|
| 99 |
-
**After:** ✅ Image embedding search via FAISS
|
| 100 |
-
|
| 101 |
-
**Example:**
|
| 102 |
-
```
|
| 103 |
-
Before: Search by description only
|
| 104 |
-
After: "Find scenes similar to this image" → Returns visually similar memories
|
| 105 |
-
```
|
| 106 |
-
|
| 107 |
-
---
|
| 108 |
-
|
| 109 |
-
### **3. Intent Classification**
|
| 110 |
-
**Before:** ❌ Keyword matching (70% accuracy)
|
| 111 |
-
**After:** ✅ DistilBERT NLP (97% accuracy)
|
| 112 |
-
|
| 113 |
-
**Example:**
|
| 114 |
-
```
|
| 115 |
-
Query: "What did I see this morning?"
|
| 116 |
-
Before: Matches "see" keyword → Generic results
|
| 117 |
-
After: Classifies as "temporal" → Time-filtered results
|
| 118 |
-
```
|
| 119 |
-
|
| 120 |
-
---
|
| 121 |
-
|
| 122 |
-
### **4. Neural TTS**
|
| 123 |
-
**Before:** ❌ Robotic pyttsx3 voice
|
| 124 |
-
**After:** ✅ Natural Voxtral/Piper voice
|
| 125 |
-
|
| 126 |
-
**Example:**
|
| 127 |
-
```
|
| 128 |
-
Before: "Scene. Remembered." (robotic)
|
| 129 |
-
After: "Scene remembered." (natural)
|
| 130 |
-
```
|
| 131 |
-
|
| 132 |
-
---
|
| 133 |
-
|
| 134 |
-
### **5. Multimodal Fusion**
|
| 135 |
-
**Before:** ❌ Caption only
|
| 136 |
-
**After:** ✅ Caption + OCR + Objects + Embeddings
|
| 137 |
-
|
| 138 |
-
**Example:**
|
| 139 |
-
```
|
| 140 |
-
Before: "a person holding a phone"
|
| 141 |
-
After: "a person holding a phone. Objects detected: person, phone. Text visible: Hello World"
|
| 142 |
-
```
|
| 143 |
-
|
| 144 |
-
---
|
| 145 |
-
|
| 146 |
-
## 🔧 TECHNICAL IMPROVEMENTS
|
| 147 |
-
|
| 148 |
-
### **Code Organization**
|
| 149 |
-
|
| 150 |
-
| Aspect | Before | After |
|
| 151 |
-
|--------|--------|-------|
|
| 152 |
-
| **Structure** | Flat files | Modular (agents/ + core/) |
|
| 153 |
-
| **Agents** | 4 agents | 7 agents + fusion layer |
|
| 154 |
-
| **Lines of Code** | ~800 | ~1,500 (better organized) |
|
| 155 |
-
| **Documentation** | Basic README | 6 comprehensive docs |
|
| 156 |
-
| **Tests** | None | Automated test suite |
|
| 157 |
-
|
| 158 |
-
---
|
| 159 |
-
|
| 160 |
-
### **Dependencies**
|
| 161 |
-
|
| 162 |
-
| Category | Before | After | Added |
|
| 163 |
-
|----------|--------|-------|-------|
|
| 164 |
-
| **Core** | 8 packages | 10 packages | +2 |
|
| 165 |
-
| **Optional** | 1 (tensorflow) | 3 (faiss, easyocr, piper) | +2 |
|
| 166 |
-
| **Total Size** | ~1.5GB | ~2GB | +500MB |
|
| 167 |
-
|
| 168 |
-
**New Dependencies:**
|
| 169 |
-
- ✅ faiss-cpu (vector search)
|
| 170 |
-
- ✅ easyocr (text extraction)
|
| 171 |
-
- ✅ piper-tts (neural voice)
|
| 172 |
-
|
| 173 |
-
---
|
| 174 |
-
|
| 175 |
-
### **Storage System**
|
| 176 |
-
|
| 177 |
-
| Aspect | Before | After |
|
| 178 |
-
|--------|--------|-------|
|
| 179 |
-
| **Metadata** | JSON file | JSON file (kept) |
|
| 180 |
-
| **Vectors** | In JSON | FAISS index (new) |
|
| 181 |
-
| **Text Embeddings** | sentence-transformers | sentence-transformers (kept) |
|
| 182 |
-
| **Image Embeddings** | ❌ None | ✅ MobileCLIP |
|
| 183 |
-
| **Search Method** | Linear scan | FAISS similarity |
|
| 184 |
-
| **Index Size** | N/A | ~2 MB per 1,000 entries (512-dim float32 vectors) |
|
| 185 |
-
|
| 186 |
-
---
|
| 187 |
-
|
| 188 |
-
## 🎯 USE CASE COMPARISON
|
| 189 |
-
|
| 190 |
-
### **Scenario 1: Scene Description**
|
| 191 |
-
|
| 192 |
-
**Before:**
|
| 193 |
-
```
|
| 194 |
-
User: "Describe the scene"
|
| 195 |
-
System: "a person holding a phone"
|
| 196 |
-
```
|
| 197 |
-
|
| 198 |
-
**After:**
|
| 199 |
-
```
|
| 200 |
-
User: "Describe the scene"
|
| 201 |
-
System: "a person holding a phone. Objects detected: person, phone. Text visible: Hello World"
|
| 202 |
-
```
|
| 203 |
-
|
| 204 |
-
**Improvement:** 4x more information
|
| 205 |
-
|
| 206 |
-
---
|
| 207 |
-
|
| 208 |
-
### **Scenario 2: Memory Search**
|
| 209 |
-
|
| 210 |
-
**Before:**
|
| 211 |
-
```
|
| 212 |
-
User: "What did I see this morning?"
|
| 213 |
-
System: [Searches all memories linearly]
|
| 214 |
-
Time: 500ms for 1000 memories
|
| 215 |
-
Results: 5 matches (75% relevant)
|
| 216 |
-
```
|
| 217 |
-
|
| 218 |
-
**After:**
|
| 219 |
-
```
|
| 220 |
-
User: "What did I see this morning?"
|
| 221 |
-
System: [FAISS + time filter + intent classification]
|
| 222 |
-
Time: 10ms for 10,000 memories
|
| 223 |
-
Results: 5 matches (90% relevant)
|
| 224 |
-
```
|
| 225 |
-
|
| 226 |
-
**Improvement:** 50x faster, 15% more accurate
|
| 227 |
-
|
| 228 |
-
---
|
| 229 |
-
|
| 230 |
-
### **Scenario 3: Text Reading**
|
| 231 |
-
|
| 232 |
-
**Before:**
|
| 233 |
-
```
|
| 234 |
-
User: "Read the text"
|
| 235 |
-
System: "Reading text will be available soon."
|
| 236 |
-
```
|
| 237 |
-
|
| 238 |
-
**After:**
|
| 239 |
-
```
|
| 240 |
-
User: "Read the text"
|
| 241 |
-
System: "I can see the following text: Hello World"
|
| 242 |
-
```
|
| 243 |
-
|
| 244 |
-
**Improvement:** NEW capability
|
| 245 |
-
|
| 246 |
-
---
|
| 247 |
-
|
| 248 |
-
## 📊 CAPABILITY MATRIX
|
| 249 |
-
|
| 250 |
-
| Capability | Before | After | Status |
|
| 251 |
-
|------------|--------|-------|--------|
|
| 252 |
-
| **Object Detection** | ✅ YOLO/SSD | ✅ YOLO/SSD | KEPT |
|
| 253 |
-
| **Image Captioning** | ✅ BLIP | ✅ BLIP | KEPT |
|
| 254 |
-
| **Text Extraction** | ❌ | ✅ EasyOCR | NEW |
|
| 255 |
-
| **Image Embeddings** | ❌ | ✅ MobileCLIP | NEW |
|
| 256 |
-
| **Text Embeddings** | ✅ MiniLM | ✅ MiniLM | KEPT |
|
| 257 |
-
| **Vector Search** | ❌ | ✅ FAISS | NEW |
|
| 258 |
-
| **Speech Recognition** | ✅ Vosk | ✅ Vosk | KEPT |
|
| 259 |
-
| **Text-to-Speech** | ✅ pyttsx3 | ✅ Voxtral + pyttsx3 | ENHANCED |
|
| 260 |
-
| **Intent Classification** | ❌ Keywords | ✅ DistilBERT | NEW |
|
| 261 |
-
| **Time Filtering** | ✅ Basic | ✅ Enhanced | IMPROVED |
|
| 262 |
-
| **Importance Scoring** | ✅ Basic | ✅ Enhanced | IMPROVED |
|
| 263 |
-
| **Multimodal Fusion** | ❌ | ✅ FusionLayer | NEW |
|
| 264 |
-
|
| 265 |
-
---
|
| 266 |
-
|
| 267 |
-
## 🔄 BACKWARD COMPATIBILITY
|
| 268 |
-
|
| 269 |
-
| Aspect | Compatible? | Notes |
|
| 270 |
-
|--------|-------------|-------|
|
| 271 |
-
| **Old memory.json** | ✅ YES | Automatically migrated |
|
| 272 |
-
| **Voice commands** | ✅ YES | Same commands work |
|
| 273 |
-
| **Memory format** | ✅ YES | New fields optional |
|
| 274 |
-
| **API** | ✅ YES | Old methods still work |
|
| 275 |
-
| **File structure** | ✅ YES | Old files preserved |
|
| 276 |
-
| **Dependencies** | ✅ YES | Old deps still work |
|
| 277 |
-
|
| 278 |
-
**Breaking Changes:** ❌ NONE
|
| 279 |
-
|
| 280 |
-
---
|
| 281 |
-
|
| 282 |
-
## 💰 COST-BENEFIT ANALYSIS
|
| 283 |
-
|
| 284 |
-
### **Costs**
|
| 285 |
-
| Item | Cost |
|
| 286 |
-
|------|------|
|
| 287 |
-
| **Development Time** | ~8 hours |
|
| 288 |
-
| **Additional Storage** | +500MB models |
|
| 289 |
-
| **Memory Usage** | +300MB RAM |
|
| 290 |
-
| **Startup Time** | +3-5 seconds |
|
| 291 |
-
| **Complexity** | Medium increase |
|
| 292 |
-
|
| 293 |
-
### **Benefits**
|
| 294 |
-
| Item | Benefit |
|
| 295 |
-
|------|---------|
|
| 296 |
-
| **New Features** | 4 major capabilities |
|
| 297 |
-
| **Performance** | 10x faster search |
|
| 298 |
-
| **Accuracy** | 15-27% improvement |
|
| 299 |
-
| **Capacity** | 10x more memories |
|
| 300 |
-
| **User Experience** | Significantly better |
|
| 301 |
-
| **Maintainability** | Better code structure |
|
| 302 |
-
|
| 303 |
-
**ROI:** 🟢 **VERY HIGH** - Major improvements with minimal cost
|
| 304 |
-
|
| 305 |
-
---
|
| 306 |
-
|
| 307 |
-
## 🎯 UPGRADE IMPACT
|
| 308 |
-
|
| 309 |
-
### **User Impact**
|
| 310 |
-
- 🟢 **Positive:** Better features, faster, smarter
|
| 311 |
-
- 🟡 **Neutral:** Slightly longer startup
|
| 312 |
-
- 🔴 **Negative:** None
|
| 313 |
-
|
| 314 |
-
### **Developer Impact**
|
| 315 |
-
- 🟢 **Positive:** Better code organization, more modular
|
| 316 |
-
- 🟢 **Positive:** Comprehensive documentation
|
| 317 |
-
- 🟡 **Neutral:** More files to maintain
|
| 318 |
-
- 🔴 **Negative:** None
|
| 319 |
-
|
| 320 |
-
### **System Impact**
|
| 321 |
-
- 🟢 **Positive:** 10x performance improvement
|
| 322 |
-
- 🟢 **Positive:** 10x capacity increase
|
| 323 |
-
- 🟡 **Neutral:** +300MB memory usage
|
| 324 |
-
- 🔴 **Negative:** None
|
| 325 |
-
|
| 326 |
-
---
|
| 327 |
-
|
| 328 |
-
## 📈 SCALABILITY COMPARISON
|
| 329 |
-
|
| 330 |
-
| Aspect | Before | After | Improvement |
|
| 331 |
-
|--------|--------|-------|-------------|
|
| 332 |
-
| **Max Memories** | ~1,000 | 10,000+ | 10x |
|
| 333 |
-
| **Search Complexity** | O(n) | O(n) with a flat index; sub-linear with IVF/HNSW | Lower constant factor |
|
| 334 |
-
| **Concurrent Queries** | 1 | Multiple | Thread-safe |
|
| 335 |
-
| **Index Size** | N/A | ~2 MB/1,000 | Efficient |
|
| 336 |
-
| **Memory Growth** | Linear | Sub-linear | Better |
|
| 337 |
-
|
| 338 |
-
---
|
| 339 |
-
|
| 340 |
-
## 🏆 SUCCESS METRICS
|
| 341 |
-
|
| 342 |
-
### **Technical Success**
|
| 343 |
-
- ✅ 100% backward compatible
|
| 344 |
-
- ✅ 0 breaking changes
|
| 345 |
-
- ✅ 10x performance improvement
|
| 346 |
-
- ✅ 4 new major features
|
| 347 |
-
- ✅ 8 new modules created
|
| 348 |
-
|
| 349 |
-
### **Quality Success**
|
| 350 |
-
- ✅ Comprehensive documentation
|
| 351 |
-
- ✅ Automated tests
|
| 352 |
-
- ✅ Error handling
|
| 353 |
-
- ✅ Fallback mechanisms
|
| 354 |
-
- ✅ Code organization
|
| 355 |
-
|
| 356 |
-
### **User Success** (To Measure)
|
| 357 |
-
- ⏳ User satisfaction
|
| 358 |
-
- ⏳ Feature adoption
|
| 359 |
-
- ⏳ Error rate reduction
|
| 360 |
-
- ⏳ Performance perception
|
| 361 |
-
- ⏳ Feedback scores
|
| 362 |
-
|
| 363 |
-
---
|
| 364 |
-
|
| 365 |
-
## 🎓 LESSONS LEARNED
|
| 366 |
-
|
| 367 |
-
### **What Worked Well**
|
| 368 |
-
- ✅ Modular architecture
|
| 369 |
-
- ✅ Fallback mechanisms
|
| 370 |
-
- ✅ Backward compatibility
|
| 371 |
-
- ✅ Comprehensive docs
|
| 372 |
-
- ✅ Hybrid storage (JSON + FAISS)
|
| 373 |
-
|
| 374 |
-
### **What Could Be Better**
|
| 375 |
-
- 🟡 Startup time (slightly slower)
|
| 376 |
-
- 🟡 Memory usage (increased)
|
| 377 |
-
- 🟡 Dependency count (more packages)
|
| 378 |
-
|
| 379 |
-
### **Future Improvements**
|
| 380 |
-
- 💡 Lazy loading for faster startup
|
| 381 |
-
- 💡 Memory optimization
|
| 382 |
-
- 💡 Optional feature flags
|
| 383 |
-
- 💡 Web interface
|
| 384 |
-
- 💡 Mobile app
|
| 385 |
-
|
| 386 |
-
---
|
| 387 |
-
|
| 388 |
-
## 📊 FINAL VERDICT
|
| 389 |
-
|
| 390 |
-
### **Overall Assessment**
|
| 391 |
-
|
| 392 |
-
| Category | Rating | Notes |
|
| 393 |
-
|----------|--------|-------|
|
| 394 |
-
| **Features** | ⭐⭐⭐⭐⭐ | 4 major new capabilities |
|
| 395 |
-
| **Performance** | ⭐⭐⭐⭐⭐ | 10x faster |
|
| 396 |
-
| **Compatibility** | ⭐⭐⭐⭐⭐ | 100% backward compatible |
|
| 397 |
-
| **Code Quality** | ⭐⭐⭐⭐⭐ | Well organized |
|
| 398 |
-
| **Documentation** | ⭐⭐⭐⭐⭐ | Comprehensive |
|
| 399 |
-
| **Testing** | ⭐⭐⭐⭐☆ | Good coverage |
|
| 400 |
-
| **User Experience** | ⭐⭐⭐⭐⭐ | Significantly improved |
|
| 401 |
-
|
| 402 |
-
**Overall:** ⭐⭐⭐⭐⭐ **EXCELLENT UPGRADE**
|
| 403 |
-
|
| 404 |
-
---
|
| 405 |
-
|
| 406 |
-
## ✅ RECOMMENDATION
|
| 407 |
-
|
| 408 |
-
**Status:** ✅ **APPROVED FOR DEPLOYMENT**
|
| 409 |
-
|
| 410 |
-
**Confidence:** 🟢 **HIGH**
|
| 411 |
-
|
| 412 |
-
**Reasoning:**
|
| 413 |
-
- All objectives achieved
|
| 414 |
-
- No breaking changes
|
| 415 |
-
- Significant improvements
|
| 416 |
-
- Well documented
|
| 417 |
-
- Production ready
|
| 418 |
-
|
| 419 |
-
**Next Steps:**
|
| 420 |
-
1. ✅ Deploy to production
|
| 421 |
-
2. ⏳ Monitor performance
|
| 422 |
-
3. ⏳ Collect user feedback
|
| 423 |
-
4. ⏳ Plan next iteration
|
| 424 |
-
|
| 425 |
-
---
|
| 426 |
-
|
| 427 |
-
**The upgrade is a resounding success! 🎉**
|
| 428 |
-
|
| 429 |
-
VisionQ has evolved from a basic vision assistant to a **state-of-the-art multimodal AI system** while maintaining 100% backward compatibility.
|
| 430 |
-
|
| 431 |
-
**Recommended Action:** PROCEED WITH DEPLOYMENT 🚀
|
archive/old_docs/DEPLOYMENT_CHECKLIST.md
DELETED
|
@@ -1,397 +0,0 @@
|
|
| 1 |
-
# ✅ VisionQ Upgrade - Deployment Checklist
|
| 2 |
-
|
| 3 |
-
## 📋 PRE-DEPLOYMENT
|
| 4 |
-
|
| 5 |
-
### **Code Review**
|
| 6 |
-
- [x] All agents implemented
|
| 7 |
-
- [x] Fusion layer created
|
| 8 |
-
- [x] Memory system upgraded
|
| 9 |
-
- [x] Query system enhanced
|
| 10 |
-
- [x] Voice system updated
|
| 11 |
-
- [x] Backward compatibility verified
|
| 12 |
-
- [x] Error handling added
|
| 13 |
-
- [x] Fallback mechanisms in place
|
| 14 |
-
|
| 15 |
-
### **Documentation**
|
| 16 |
-
- [x] README_UPGRADED.md created
|
| 17 |
-
- [x] QUICKSTART.md created
|
| 18 |
-
- [x] UPGRADE_GUIDE.md created
|
| 19 |
-
- [x] ARCHITECTURE.md created
|
| 20 |
-
- [x] SUMMARY.md created
|
| 21 |
-
- [x] Code comments added
|
| 22 |
-
- [x] Docstrings complete
|
| 23 |
-
|
| 24 |
-
### **Testing Scripts**
|
| 25 |
-
- [x] test_upgrade.py created
|
| 26 |
-
- [x] install_upgrade.bat created
|
| 27 |
-
- [x] Test cases defined
|
| 28 |
-
|
| 29 |
-
---
|
| 30 |
-
|
| 31 |
-
## 🚀 DEPLOYMENT STEPS
|
| 32 |
-
|
| 33 |
-
### **Step 1: Backup** ⚠️
|
| 34 |
-
```bash
|
| 35 |
-
# Backup existing system
|
| 36 |
-
mkdir backup
|
| 37 |
-
copy *.py backup\
|
| 38 |
-
copy memory.json backup\
|
| 39 |
-
```
|
| 40 |
-
- [ ] Old files backed up
|
| 41 |
-
- [ ] Memory file backed up
|
| 42 |
-
- [ ] Configuration saved
|
| 43 |
-
|
| 44 |
-
### **Step 2: Install Dependencies**
|
| 45 |
-
```bash
|
| 46 |
-
pip install -r requirements_upgraded.txt
|
| 47 |
-
```
|
| 48 |
-
- [ ] Core dependencies installed
|
| 49 |
-
- [ ] FAISS installed
|
| 50 |
-
- [ ] EasyOCR installed
|
| 51 |
-
- [ ] Piper TTS installed (optional)
|
| 52 |
-
|
| 53 |
-
### **Step 3: Directory Setup**
|
| 54 |
-
```bash
|
| 55 |
-
mkdir data
|
| 56 |
-
move memory.json data\memory.json
|
| 57 |
-
```
|
| 58 |
-
- [ ] data/ directory created
|
| 59 |
-
- [ ] Memory file migrated
|
| 60 |
-
- [ ] Permissions verified
|
| 61 |
-
|
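The directory setup in Step 3 can also be scripted so that it is safe to run more than once. The following is a minimal sketch, not the project's own migration code: the helper name `migrate_memory` is hypothetical, and it assumes `memory.json` sits in the project root as the commands above imply.

```python
import shutil
from pathlib import Path


def migrate_memory(root: Path) -> Path:
    """Create root/data and move root/memory.json into it, if present.

    Hypothetical helper for illustration; returns the target path.
    """
    data_dir = root / "data"
    data_dir.mkdir(exist_ok=True)  # no error if data/ already exists

    src = root / "memory.json"
    dst = data_dir / "memory.json"
    # Only move when there is something to move and nothing to overwrite.
    if src.exists() and not dst.exists():
        shutil.move(str(src), str(dst))
    return dst
```

Calling `migrate_memory(Path("."))` from the project root mirrors the `mkdir` and `move` commands; a second call leaves the already-migrated file untouched.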
| 62 |
-
### **Step 4: Run Tests**
|
| 63 |
-
```bash
|
| 64 |
-
python test_upgrade.py
|
| 65 |
-
```
|
| 66 |
-
- [ ] All imports successful
|
| 67 |
-
- [ ] MemoryAgent tests pass
|
| 68 |
-
- [ ] FusionLayer tests pass
|
| 69 |
-
- [ ] QueryAgent tests pass
|
| 70 |
-
- [ ] Backward compatibility verified
|
| 71 |
-
|
| 72 |
-
### **Step 5: Initial Run**
|
| 73 |
-
```bash
|
| 74 |
-
python main_upgraded.py
|
| 75 |
-
```
|
| 76 |
-
- [ ] System starts without errors
|
| 77 |
-
- [ ] Camera initializes
|
| 78 |
-
- [ ] Microphone detected
|
| 79 |
-
- [ ] Voice output works
|
| 80 |
-
- [ ] Can exit cleanly
|
| 81 |
-
|
| 82 |
-
---
|
| 83 |
-
|
| 84 |
-
## 🧪 FUNCTIONAL TESTING
|
| 85 |
-
|
| 86 |
-
### **Voice Commands**
|
| 87 |
-
- [ ] "Describe the scene" works
|
| 88 |
-
- [ ] "Remember this" stores memory
|
| 89 |
-
- [ ] "What did I see" recalls memory
|
| 90 |
-
- [ ] "Read the text" extracts text (if text visible)
|
| 91 |
-
- [ ] "Exit" quits properly
|
| 92 |
-
|
| 93 |
-
### **Memory System**
|
| 94 |
-
- [ ] Memories persist after restart
|
| 95 |
-
- [ ] JSON file created in data/
|
| 96 |
-
- [ ] FAISS index created (if available)
|
| 97 |
-
- [ ] Can recall stored memories
|
| 98 |
-
- [ ] Timestamps correct
|
| 99 |
-
|
| 100 |
-
### **Query System**
|
| 101 |
-
```bash
|
| 102 |
-
python ask_question_upgraded.py
|
| 103 |
-
```
|
| 104 |
-
- [ ] Time-based queries work
|
| 105 |
-
- [ ] Text search returns results
|
| 106 |
-
- [ ] Intent classification functional
|
| 107 |
-
- [ ] Confidence scores displayed
|
| 108 |
-
|
| 109 |
-
### **OCR Functionality**
|
| 110 |
-
- [ ] Text extraction works
|
| 111 |
-
- [ ] Confidence filtering applied
|
| 112 |
-
- [ ] Text cleaning functional
|
| 113 |
-
- [ ] Integrated into descriptions
|
| 114 |
-
|
| 115 |
-
### **Fallback Mechanisms**
|
| 116 |
-
- [ ] pyttsx3 works if Voxtral unavailable
|
| 117 |
-
- [ ] Keyword matching if DistilBERT fails
|
| 118 |
-
- [ ] Linear search if FAISS unavailable
|
| 119 |
-
- [ ] System continues if OCR fails
|
| 120 |
-
|
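The "linear search if FAISS unavailable" item can be pictured as an optional import with a pure-Python path behind it. This is a sketch under assumptions, not the MemoryAgent implementation: the function name `top_k` is hypothetical, and it assumes similarity is a plain inner product over equal-length vectors.

```python
from typing import List, Sequence

try:
    import faiss  # optional dependency
    import numpy as np
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False


def top_k(vectors: Sequence[Sequence[float]],
          query: Sequence[float], k: int) -> List[int]:
    """Return indices of the k vectors with the largest inner product."""
    if not vectors:
        return []
    k = min(k, len(vectors))

    if HAVE_FAISS:
        # Exact inner-product index; no training step is needed.
        index = faiss.IndexFlatIP(len(query))
        index.add(np.asarray(vectors, dtype="float32"))
        _, ids = index.search(np.asarray([query], dtype="float32"), k)
        return [int(i) for i in ids[0]]

    # Fallback: score every vector, then sort by score (highest first).
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Both branches rank by the same measure, so uninstalling FAISS (as in Scenario 4) should change speed rather than the order of results.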
| 121 |
-
---
|
| 122 |
-
|
| 123 |
-
## 🔍 PERFORMANCE TESTING
|
| 124 |
-
|
| 125 |
-
### **Speed Tests**
|
| 126 |
-
- [ ] Memory search <100ms
|
| 127 |
-
- [ ] OCR processing <500ms
|
| 128 |
-
- [ ] Caption generation <200ms
|
| 129 |
-
- [ ] Query response <100ms
|
| 130 |
-
|
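The latency targets above can be checked with a small timing helper. This is a rough sketch: `elapsed_ms` and `within_budget` are hypothetical names, and the callable passed in stands for whichever agent call is being measured.

```python
import time
from typing import Callable


def elapsed_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time of fn() over several runs, in ms."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def within_budget(fn: Callable[[], object], budget_ms: float,
                  repeats: int = 5) -> bool:
    """True if the best observed run of fn() is under budget_ms."""
    return elapsed_ms(fn, repeats) < budget_ms
```

Taking the best of several runs reduces noise from warm-up and background load; for a worst-case check, the same loop could track `max` instead of `min`.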
| 131 |
-
### **Capacity Tests**
|
| 132 |
-
- [ ] Can store 100+ memories
|
| 133 |
-
- [ ] Search remains fast with many memories
|
| 134 |
-
- [ ] FAISS index scales properly
|
| 135 |
-
- [ ] No memory leaks
|
| 136 |
-
|
| 137 |
-
### **Accuracy Tests**
|
| 138 |
-
- [ ] OCR accuracy >85% (on clear text)
|
| 139 |
-
- [ ] Intent classification >90%
|
| 140 |
-
- [ ] Memory retrieval relevance >85%
|
| 141 |
-
- [ ] Caption quality maintained
|
| 142 |
-
|
| 143 |
-
---
|
| 144 |
-
|
| 145 |
-
## 📊 INTEGRATION TESTING
|
| 146 |
-
|
| 147 |
-
### **End-to-End Scenarios**
|
| 148 |
-
|
| 149 |
-
**Scenario 1: Basic Usage**
|
| 150 |
-
1. [ ] Start system
|
| 151 |
-
2. [ ] Describe scene
|
| 152 |
-
3. [ ] Remember scene
|
| 153 |
-
4. [ ] Recall memory
|
| 154 |
-
5. [ ] Exit
|
| 155 |
-
|
| 156 |
-
**Scenario 2: OCR Workflow**
|
| 157 |
-
1. [ ] Start system
|
| 158 |
-
2. [ ] Point at text
|
| 159 |
-
3. [ ] Say "Read the text"
|
| 160 |
-
4. [ ] Verify text extracted
|
| 161 |
-
5. [ ] Check memory includes text
|
| 162 |
-
|
| 163 |
-
**Scenario 3: Query Workflow**
|
| 164 |
-
1. [ ] Store multiple memories
|
| 165 |
-
2. [ ] Run ask_question_upgraded.py
|
| 166 |
-
3. [ ] Try time-based query
|
| 167 |
-
4. [ ] Try object-based query
|
| 168 |
-
5. [ ] Verify results relevant
|
| 169 |
-
|
| 170 |
-
**Scenario 4: Fallback Testing**
|
| 171 |
-
1. [ ] Uninstall FAISS temporarily
|
| 172 |
-
2. [ ] Verify system still works
|
| 173 |
-
3. [ ] Reinstall FAISS
|
| 174 |
-
4. [ ] Verify enhanced features return
|
| 175 |
-
|
| 176 |
-
---
|
| 177 |
-
|
| 178 |
-
## 🐛 ERROR HANDLING
|
| 179 |
-
|
| 180 |
-
### **Common Errors to Test**
|
| 181 |
-
- [ ] Camera not available
|
| 182 |
-
- [ ] Microphone not detected
|
| 183 |
-
- [ ] Model download failure
|
| 184 |
-
- [ ] Memory file corrupted
|
| 185 |
-
- [ ] FAISS index corrupted
|
| 186 |
-
- [ ] Out of disk space
|
| 187 |
-
- [ ] Permission denied
|
| 188 |
-
|
| 189 |
-
### **Recovery Procedures**
|
| 190 |
-
- [ ] System logs errors clearly
|
| 191 |
-
- [ ] Fallbacks activate automatically
|
| 192 |
-
- [ ] User gets helpful error messages
|
| 193 |
-
- [ ] System doesn't crash
|
| 194 |
-
- [ ] Can recover without restart
|
| 195 |
-
|
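For the "memory file corrupted" case, one way to keep the system running is to treat an unreadable file as an empty memory store. The sketch below assumes `memory.json` holds a JSON list of records, which the checklist does not state; `load_memories` is a hypothetical name.

```python
import json
from pathlib import Path


def load_memories(path: Path) -> list:
    """Load memories from path, returning [] if missing or unreadable.

    Assumes the file holds a JSON list; anything else is treated as
    corrupted rather than raising.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else []
```

A real recovery path would probably also log the failure and keep a copy of the bad file, so that the error is visible rather than silently discarded.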
| 196 |
-
---
|
| 197 |
-
|
| 198 |
-
## 📚 DOCUMENTATION VERIFICATION
|
| 199 |
-
|
| 200 |
-
### **User Documentation**
|
| 201 |
-
- [ ] QUICKSTART.md accurate
|
| 202 |
-
- [ ] Installation steps work
|
| 203 |
-
- [ ] Voice commands documented
|
| 204 |
-
- [ ] Query examples work
|
| 205 |
-
- [ ] Troubleshooting helpful
|
| 206 |
-
|
| 207 |
-
### **Developer Documentation**
|
| 208 |
-
- [ ] ARCHITECTURE.md clear
|
| 209 |
-
- [ ] Code comments accurate
|
| 210 |
-
- [ ] API documented
|
| 211 |
-
- [ ] Examples provided
|
| 212 |
-
- [ ] Diagrams correct
|
| 213 |
-
|
| 214 |
-
### **Upgrade Documentation**
|
| 215 |
-
- [ ] UPGRADE_GUIDE.md complete
|
| 216 |
-
- [ ] Migration steps clear
|
| 217 |
-
- [ ] Backward compatibility explained
|
| 218 |
-
- [ ] New features documented
|
| 219 |
-
- [ ] Performance metrics accurate
|
| 220 |
-
|
| 221 |
-
---
|
| 222 |
-
|
| 223 |
-
## 🔒 SECURITY & PRIVACY
|
| 224 |
-
|
| 225 |
-
### **Privacy Checks**
|
| 226 |
-
- [ ] No data sent to cloud
|
| 227 |
-
- [ ] All processing local
|
| 228 |
-
- [ ] Memory stored locally
|
| 229 |
-
- [ ] No telemetry
|
| 230 |
-
- [ ] No external API calls
|
| 231 |
-
|
| 232 |
-
### **Security Checks**
|
| 233 |
-
- [ ] No hardcoded credentials
|
| 234 |
-
- [ ] File permissions correct
|
| 235 |
-
- [ ] Input validation present
|
| 236 |
-
- [ ] No SQL injection risks
|
| 237 |
-
- [ ] Dependencies up to date
|
| 238 |
-
|
| 239 |
-
---
|
| 240 |
-
|
| 241 |
-
## 📦 PACKAGING
|
| 242 |
-
|
| 243 |
-
### **Files to Include**
|
| 244 |
-
- [x] agents/ directory
|
| 245 |
-
- [x] core/ directory
|
| 246 |
-
- [x] main_upgraded.py
|
| 247 |
-
- [x] ask_question_upgraded.py
|
| 248 |
-
- [x] requirements_upgraded.txt
|
| 249 |
-
- [x] install_upgrade.bat
|
| 250 |
-
- [x] test_upgrade.py
|
| 251 |
-
- [x] All documentation files
|
| 252 |
-
- [x] LICENSE
|
| 253 |
-
- [x] .gitignore
|
| 254 |
-
|
| 255 |
-
### **Files to Exclude**
|
| 256 |
-
- [ ] __pycache__/
|
| 257 |
-
- [ ] *.pyc
|
| 258 |
-
- [ ] data/test_*
|
| 259 |
-
- [ ] .venv/
|
| 260 |
-
- [ ] models/ (too large, download separately)
|
| 261 |
-
|
| 262 |
-
---
|
| 263 |
-
|
| 264 |
-
## 🚀 PRODUCTION READINESS
|
| 265 |
-
|
| 266 |
-
### **Critical Requirements**
|
| 267 |
-
- [ ] All tests pass
|
| 268 |
-
- [ ] No critical bugs
|
| 269 |
-
- [ ] Documentation complete
|
| 270 |
-
- [ ] Backward compatible
|
| 271 |
-
- [ ] Performance acceptable
|
| 272 |
-
|
| 273 |
-
### **Nice-to-Have**
|
| 274 |
-
- [ ] Neural TTS working
|
| 275 |
-
- [ ] FAISS available
|
| 276 |
-
- [ ] EasyOCR installed
|
| 277 |
-
- [ ] All optional features enabled
|
| 278 |
-
|
| 279 |
-
---
|
| 280 |
-
|
| 281 |
-
## 📈 POST-DEPLOYMENT
|
| 282 |
-
|
| 283 |
-
### **Monitoring**
|
| 284 |
-
- [ ] Track memory usage
|
| 285 |
-
- [ ] Monitor query performance
|
| 286 |
-
- [ ] Log error rates
|
| 287 |
-
- [ ] Collect user feedback
|
| 288 |
-
- [ ] Measure accuracy
|
| 289 |
-
|
| 290 |
-
### **Maintenance**
|
| 291 |
-
- [ ] Regular dependency updates
|
| 292 |
-
- [ ] Model updates
|
| 293 |
-
- [ ] Bug fixes
|
| 294 |
-
- [ ] Feature requests
|
| 295 |
-
- [ ] Documentation updates
|
| 296 |
-
|
| 297 |
-
---
|
| 298 |
-
|
| 299 |
-
## ✅ FINAL SIGN-OFF
|
| 300 |
-
|
| 301 |
-
### **Deployment Approval**
|
| 302 |
-
- [ ] All critical tests passed
|
| 303 |
-
- [ ] Documentation reviewed
|
| 304 |
-
- [ ] Backup created
|
| 305 |
-
- [ ] Rollback plan ready
|
| 306 |
-
- [ ] Team notified
|
| 307 |
-
|
| 308 |
-
### **Go-Live Checklist**
|
| 309 |
-
- [ ] System tested end-to-end
|
| 310 |
-
- [ ] Users trained
|
| 311 |
-
- [ ] Support ready
|
| 312 |
-
- [ ] Monitoring active
|
| 313 |
-
- [ ] Feedback mechanism in place
|
| 314 |
-
|
| 315 |
-
---
|
| 316 |
-
|
| 317 |
-
## 🎉 SUCCESS CRITERIA
|
| 318 |
-
|
| 319 |
-
### **Must Have**
|
| 320 |
-
- ✅ System starts without errors
|
| 321 |
-
- ✅ All voice commands work
|
| 322 |
-
- ✅ Memory persists
|
| 323 |
-
- ✅ Backward compatible
|
| 324 |
-
- ✅ Documentation complete
|
| 325 |
-
|
| 326 |
-
### **Should Have**
|
| 327 |
-
- ✅ OCR functional
|
| 328 |
-
- ✅ FAISS search fast
|
| 329 |
-
- ✅ Neural TTS working
|
| 330 |
-
- ✅ Intent classification accurate
|
| 331 |
-
- ✅ Performance targets met
|
| 332 |
-
|
| 333 |
-
### **Nice to Have**
|
| 334 |
-
- ⭐ All optional features enabled
|
| 335 |
-
- ⭐ Zero warnings
|
| 336 |
-
- ⭐ Perfect test coverage
|
| 337 |
-
- ⭐ User feedback positive
|
| 338 |
-
- ⭐ Performance exceeds targets
|
| 339 |
-
|
| 340 |
-
---
|
| 341 |
-
|
| 342 |
-
## 📞 SUPPORT CONTACTS
|
| 343 |
-
|
| 344 |
-
### **Technical Issues**
|
| 345 |
-
- Check: UPGRADE_GUIDE.md troubleshooting
|
| 346 |
-
- Run: test_upgrade.py
|
| 347 |
-
- Review: Error logs
|
| 348 |
-
|
| 349 |
-
### **Documentation**
|
| 350 |
-
- QUICKSTART.md - Quick setup
|
| 351 |
-
- UPGRADE_GUIDE.md - Complete guide
|
| 352 |
-
- ARCHITECTURE.md - Technical details
|
| 353 |
-
|
| 354 |
-
---
|
| 355 |
-
|
| 356 |
-
## 🏁 DEPLOYMENT STATUS
|
| 357 |
-
|
| 358 |
-
**Current Status:** ✅ READY FOR DEPLOYMENT
|
| 359 |
-
|
| 360 |
-
**Confidence Level:** HIGH
|
| 361 |
-
- All code implemented
|
| 362 |
-
- Tests created
|
| 363 |
-
- Documentation complete
|
| 364 |
-
- Backward compatible
|
| 365 |
-
- Fallbacks in place
|
| 366 |
-
|
| 367 |
-
**Recommended Action:** PROCEED WITH DEPLOYMENT
|
| 368 |
-
|
| 369 |
-
---
|
| 370 |
-
|
| 371 |
-
## 📝 NOTES
|
| 372 |
-
|
| 373 |
-
### **Known Limitations**
|
| 374 |
-
- Piper TTS requires separate model download
|
| 375 |
-
- EasyOCR first run downloads models (~500MB)
|
| 376 |
-
- FAISS CPU-only (GPU version available separately)
|
| 377 |
-
- OCR accuracy depends on image quality
|
| 378 |
-
|
| 379 |
-
### **Future Improvements**
|
| 380 |
-
- Web interface
|
| 381 |
-
- Mobile app
|
| 382 |
-
- Cloud sync (optional)
|
| 383 |
-
- Multi-user support
|
| 384 |
-
- Video recording
|
| 385 |
-
|
| 386 |
-
---
|
| 387 |
-
|
| 388 |
-
**Deployment Checklist Complete! 🎊**
|
| 389 |
-
|
| 390 |
-
**Next Steps:**
|
| 391 |
-
1. Review this checklist
|
| 392 |
-
2. Run through deployment steps
|
| 393 |
-
3. Execute test suite
|
| 394 |
-
4. Verify all features
|
| 395 |
-
5. Deploy to production
|
| 396 |
-
|
| 397 |
-
**Good luck! 🚀**
|
archive/old_docs/INDEX.md
DELETED
|
@@ -1,359 +0,0 @@
|
|
| 1 |
-
# 📚 VisionQ Upgrade - Documentation Index
|
| 2 |
-
|
| 3 |
-
## 🎯 START HERE
|
| 4 |
-
|
| 5 |
-
**New to VisionQ?** → [QUICKSTART.md](QUICKSTART.md)
|
| 6 |
-
**Upgrading existing system?** → [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md)
|
| 7 |
-
**Need quick reference?** → [QUICK_REFERENCE.md](QUICK_REFERENCE.md)
|
| 8 |
-
|
| 9 |
-
---
|
| 10 |
-
|
| 11 |
-
## 📖 DOCUMENTATION MAP
|
| 12 |
-
|
| 13 |
-
### **🚀 Getting Started**
|
| 14 |
-
|
| 15 |
-
| Document | Purpose | Time | Audience |
|
| 16 |
-
|----------|---------|------|----------|
|
| 17 |
-
| [QUICKSTART.md](QUICKSTART.md) | 5-minute setup guide | 5 min | Everyone |
|
| 18 |
-
| [QUICK_REFERENCE.md](QUICK_REFERENCE.md) | Command cheat sheet | 2 min | Everyone |
|
| 19 |
-
| [README_UPGRADED.md](README_UPGRADED.md) | Project overview | 10 min | Everyone |
|
| 20 |
-
|
| 21 |
-
### **📋 Upgrade Information**
|
| 22 |
-
|
| 23 |
-
| Document | Purpose | Time | Audience |
|
| 24 |
-
|----------|---------|------|----------|
|
| 25 |
-
| [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) | Complete upgrade docs | 30 min | Developers |
|
| 26 |
-
| [SUMMARY.md](SUMMARY.md) | Executive summary | 10 min | Managers |
|
| 27 |
-
| [COMPARISON.md](COMPARISON.md) | Before/After analysis | 15 min | Technical leads |
|
| 28 |
-
|
| 29 |
-
### **🏗️ Technical Documentation**
|
| 30 |
-
|
| 31 |
-
| Document | Purpose | Time | Audience |
|
| 32 |
-
|----------|---------|------|----------|
|
| 33 |
-
| [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture | 20 min | Developers |
|
| 34 |
-
| [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) | Deploy procedures | 15 min | DevOps |
|
| 35 |
-
|
| 36 |
-
### **📝 Code Files**
|
| 37 |
-
|
| 38 |
-
| File | Purpose | Type |
|
| 39 |
-
|------|---------|------|
|
| 40 |
-
| `main_upgraded.py` | Main entry point | Python |
|
| 41 |
-
| `ask_question_upgraded.py` | Query interface | Python |
|
| 42 |
-
| `test_upgrade.py` | Test suite | Python |
|
| 43 |
-
| `install_upgrade.bat` | Installer script | Batch |
|
| 44 |
-
| `requirements_upgraded.txt` | Dependencies | Text |
|
| 45 |
-
|
| 46 |
-
---
|
| 47 |
-
|
| 48 |
-
## 🗺️ NAVIGATION GUIDE
|
| 49 |
-
|
| 50 |
-
### **I want to...**
|
| 51 |
-
|
| 52 |
-
**...get started quickly**
|
| 53 |
-
→ [QUICKSTART.md](QUICKSTART.md) → Run `install_upgrade.bat`
|
| 54 |
-
|
| 55 |
-
**...understand what changed**
|
| 56 |
-
→ [COMPARISON.md](COMPARISON.md) → [SUMMARY.md](SUMMARY.md)
|
| 57 |
-
|
| 58 |
-
**...learn the architecture**
|
| 59 |
-
→ [ARCHITECTURE.md](ARCHITECTURE.md) → Code in `agents/`
|
| 60 |
-
|
| 61 |
-
**...deploy to production**
|
| 62 |
-
→ [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md)
|
| 63 |
-
|
| 64 |
-
**...troubleshoot issues**
|
| 65 |
-
→ [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) → Troubleshooting section
|
| 66 |
-
|
| 67 |
-
**...see command reference**
|
| 68 |
-
→ [QUICK_REFERENCE.md](QUICK_REFERENCE.md)
|
| 69 |
-
|
| 70 |
-
**...understand the upgrade**
|
| 71 |
-
→ [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) → [SUMMARY.md](SUMMARY.md)
|
| 72 |
-
|
| 73 |
-
---
|
| 74 |
-
|
| 75 |
-
## 📊 DOCUMENT RELATIONSHIPS
|
| 76 |
-
|
| 77 |
-
```
|
| 78 |
-
START
|
| 79 |
-
│
|
| 80 |
-
├─ Quick Start? → QUICKSTART.md
|
| 81 |
-
│ │
|
| 82 |
-
│ └─ Need details? → UPGRADE_GUIDE.md
|
| 83 |
-
│
|
| 84 |
-
├─ Overview? → README_UPGRADED.md
|
| 85 |
-
│ │
|
| 86 |
-
│ └─ Technical? → ARCHITECTURE.md
|
| 87 |
-
│
|
| 88 |
-
├─ Comparison? → COMPARISON.md
|
| 89 |
-
│ │
|
| 90 |
-
│ └─ Summary? → SUMMARY.md
|
| 91 |
-
│
|
| 92 |
-
└─ Deploy? → DEPLOYMENT_CHECKLIST.md
|
| 93 |
-
│
|
| 94 |
-
└─ Reference? → QUICK_REFERENCE.md
|
| 95 |
-
```
|
| 96 |
-
|
| 97 |
-
---
|
| 98 |
-
|
| 99 |
-
## 🎯 BY ROLE
|
| 100 |
-
|
| 101 |
-
### **👤 End User**
|
| 102 |
-
1. [QUICKSTART.md](QUICKSTART.md) - Setup
|
| 103 |
-
2. [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Commands
|
| 104 |
-
3. [README_UPGRADED.md](README_UPGRADED.md) - Features
|
| 105 |
-
|
| 106 |
-
### **👨‍💻 Developer**
|
| 107 |
-
1. [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Complete guide
|
| 108 |
-
2. [ARCHITECTURE.md](ARCHITECTURE.md) - System design
|
| 109 |
-
3. Code in `agents/` and `core/` - Implementation
|
| 110 |
-
|
| 111 |
-
### **👔 Manager**
|
| 112 |
-
1. [SUMMARY.md](SUMMARY.md) - Executive summary
|
| 113 |
-
2. [COMPARISON.md](COMPARISON.md) - ROI analysis
|
| 114 |
-
3. [README_UPGRADED.md](README_UPGRADED.md) - Overview
|
| 115 |
-
|
| 116 |
-
### **🚀 DevOps**
|
| 117 |
-
1. [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Deploy
|
| 118 |
-
2. [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Installation
|
| 119 |
-
3. `test_upgrade.py` - Testing
|
| 120 |
-
|
| 121 |
-
---
|
| 122 |
-
|
| 123 |
-
## 📚 READING ORDER
|
| 124 |
-
|
| 125 |
-
### **Fast Track (30 minutes)**
|
| 126 |
-
1. [QUICKSTART.md](QUICKSTART.md) - 5 min
|
| 127 |
-
2. [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - 2 min
|
| 128 |
-
3. [SUMMARY.md](SUMMARY.md) - 10 min
|
| 129 |
-
4. [COMPARISON.md](COMPARISON.md) - 15 min
|
| 130 |
-
|
| 131 |
-
### **Complete Track (2 hours)**
|
| 132 |
-
1. [README_UPGRADED.md](README_UPGRADED.md) - 10 min
|
| 133 |
-
2. [QUICKSTART.md](QUICKSTART.md) - 5 min
|
| 134 |
-
3. [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - 30 min
|
| 135 |
-
4. [ARCHITECTURE.md](ARCHITECTURE.md) - 20 min
|
| 136 |
-
5. [COMPARISON.md](COMPARISON.md) - 15 min
|
| 137 |
-
6. [SUMMARY.md](SUMMARY.md) - 10 min
|
| 138 |
-
7. [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - 15 min
|
| 139 |
-
8. Code exploration - 30 min
|
| 140 |
-
|
| 141 |
-
### **Technical Deep Dive (4 hours)**
|
| 142 |
-
1. All documents above
|
| 143 |
-
2. Code in `agents/` - 1 hour
|
| 144 |
-
3. Code in `core/` - 30 min
|
| 145 |
-
4. Test suite analysis - 30 min
|
| 146 |
-
5. Hands-on experimentation - 1 hour
|
| 147 |
-
|
| 148 |
-
---
|
| 149 |
-
|
| 150 |
-
## 🔍 BY TOPIC
|
| 151 |
-
|
| 152 |
-
### **Installation & Setup**
|
| 153 |
-
- [QUICKSTART.md](QUICKSTART.md) - Quick setup
|
| 154 |
-
- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Detailed installation
|
| 155 |
-
- `install_upgrade.bat` - Automated installer
|
| 156 |
-
- `requirements_upgraded.txt` - Dependencies
|
| 157 |
-
|
| 158 |
-
### **Features & Capabilities**
|
| 159 |
-
- [README_UPGRADED.md](README_UPGRADED.md) - Feature overview
|
| 160 |
-
- [COMPARISON.md](COMPARISON.md) - Before/After features
|
| 161 |
-
- [SUMMARY.md](SUMMARY.md) - Capability summary
|
| 162 |
-
|
| 163 |
-
### **Architecture & Design**
|
| 164 |
-
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
|
| 165 |
-
- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Design decisions
|
| 166 |
-
- Code in `agents/` - Implementation
|
| 167 |
-
|
| 168 |
-
### **Usage & Commands**
|
| 169 |
-
- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Command reference
|
| 170 |
-
- [QUICKSTART.md](QUICKSTART.md) - Usage examples
|
| 171 |
-
- [README_UPGRADED.md](README_UPGRADED.md) - Use cases
|
| 172 |
-
|
| 173 |
-
### **Testing & Deployment**
|
| 174 |
-
- [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Deploy guide
|
| 175 |
-
- `test_upgrade.py` - Test suite
|
| 176 |
-
- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Testing section
|
| 177 |
-
|
| 178 |
-
### **Troubleshooting**
|
| 179 |
-
- [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Troubleshooting section
|
| 180 |
-
- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Quick fixes
|
| 181 |
-
- [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Error handling
|
| 182 |
-
|
| 183 |
-
---
|
| 184 |
-
|
| 185 |
-
## 📦 FILE INVENTORY
|
| 186 |
-
|
| 187 |
-
### **Documentation (11 files)**
|
| 188 |
-
- ✅ QUICKSTART.md
|
| 189 |
-
- ✅ QUICK_REFERENCE.md
|
| 190 |
-
- ✅ README_UPGRADED.md
|
| 191 |
-
- ✅ UPGRADE_GUIDE.md
|
| 192 |
-
- ✅ ARCHITECTURE.md
|
| 193 |
-
- ✅ SUMMARY.md
|
| 194 |
-
- ✅ COMPARISON.md
|
| 195 |
-
- ✅ DEPLOYMENT_CHECKLIST.md
|
| 196 |
-
- ✅ INDEX.md (this file)
|
| 197 |
-
- ✅ README.md (original)
|
| 198 |
-
- ✅ requirements_upgraded.txt
|
| 199 |
-
|
| 200 |
-
### **Code Files (12 files)**
|
| 201 |
-
- ✅ agents/__init__.py
|
| 202 |
-
- ✅ agents/voice_agent.py
|
| 203 |
-
- ✅ agents/vision_agent.py
|
| 204 |
-
- ✅ agents/caption_agent.py
|
| 205 |
-
- ✅ agents/embedding_agent.py
|
| 206 |
-
- ✅ agents/ocr_agent.py
|
| 207 |
-
- ✅ agents/memory_agent.py
|
| 208 |
-
- ✅ agents/query_agent.py
|
| 209 |
-
- ✅ core/__init__.py
|
| 210 |
-
- ✅ core/fusion_layer.py
|
| 211 |
-
- ✅ main_upgraded.py
|
| 212 |
-
- ✅ ask_question_upgraded.py
|
| 213 |
-
|
| 214 |
-
### **Utility Files (2 files)**
|
| 215 |
-
- ✅ test_upgrade.py
|
| 216 |
-
- ✅ install_upgrade.bat
|
| 217 |
-
|
| 218 |
-
### **Total: 25 new/updated files**
|
| 219 |
-
|
| 220 |
-
---
|
| 221 |
-
|
| 222 |
-
## 🎓 LEARNING PATHS
|
| 223 |
-
|
| 224 |
-
### **Path 1: Quick User (1 hour)**
|
| 225 |
-
```
|
| 226 |
-
QUICKSTART.md
|
| 227 |
-
↓
|
| 228 |
-
Run install_upgrade.bat
|
| 229 |
-
↓
|
| 230 |
-
Run main_upgraded.py
|
| 231 |
-
↓
|
| 232 |
-
Try voice commands
|
| 233 |
-
↓
|
| 234 |
-
QUICK_REFERENCE.md (bookmark)
|
| 235 |
-
```
|
| 236 |
-
|
| 237 |
-
### **Path 2: Developer (4 hours)**
|
| 238 |
-
```
|
| 239 |
-
README_UPGRADED.md
|
| 240 |
-
↓
|
| 241 |
-
UPGRADE_GUIDE.md
|
| 242 |
-
↓
|
| 243 |
-
ARCHITECTURE.md
|
| 244 |
-
↓
|
| 245 |
-
Explore agents/ code
|
| 246 |
-
↓
|
| 247 |
-
Run test_upgrade.py
|
| 248 |
-
↓
|
| 249 |
-
Modify and experiment
|
| 250 |
-
```
|
| 251 |
-
|
| 252 |
-
### **Path 3: Manager (30 minutes)**
|
| 253 |
-
```
|
| 254 |
-
SUMMARY.md
|
| 255 |
-
↓
|
| 256 |
-
COMPARISON.md
|
| 257 |
-
↓
|
| 258 |
-
README_UPGRADED.md
|
| 259 |
-
↓
|
| 260 |
-
Make decision
|
| 261 |
-
```
|
| 262 |
-
|
| 263 |
-
---
|
| 264 |
-
|
| 265 |
-
## 🔗 EXTERNAL RESOURCES
|
| 266 |
-
|
| 267 |
-
### **Model Documentation**
|
| 268 |
-
- [YOLO](https://github.com/ultralytics/ultralytics)
|
| 269 |
-
- [BLIP](https://github.com/salesforce/BLIP)
|
| 270 |
-
- [CLIP](https://github.com/openai/CLIP)
|
| 271 |
-
- [EasyOCR](https://github.com/JaidedAI/EasyOCR)
|
| 272 |
-
- [FAISS](https://github.com/facebookresearch/faiss)
|
| 273 |
-
- [Vosk](https://alphacephei.com/vosk/)
|
| 274 |
-
- [Piper TTS](https://github.com/rhasspy/piper)
|
| 275 |
-
|
| 276 |
-
### **Python Libraries**
|
| 277 |
-
- [PyTorch](https://pytorch.org/)
|
| 278 |
-
- [Transformers](https://huggingface.co/docs/transformers)
|
| 279 |
-
- [OpenCV](https://opencv.org/)
|
| 280 |
-
- [sentence-transformers](https://www.sbert.net/)
|
| 281 |
-
|
| 282 |
-
---
|
| 283 |
-
|
| 284 |
-
## 📞 SUPPORT MATRIX
|
| 285 |
-
|
| 286 |
-
| Issue Type | Resource |
|
| 287 |
-
|------------|----------|
|
| 288 |
-
| **Installation** | QUICKSTART.md → UPGRADE_GUIDE.md |
|
| 289 |
-
| **Usage** | QUICK_REFERENCE.md → README_UPGRADED.md |
|
| 290 |
-
| **Errors** | UPGRADE_GUIDE.md (Troubleshooting) |
|
| 291 |
-
| **Architecture** | ARCHITECTURE.md |
|
| 292 |
-
| **Deployment** | DEPLOYMENT_CHECKLIST.md |
|
| 293 |
-
| **Comparison** | COMPARISON.md |
|
| 294 |
-
| **Testing** | test_upgrade.py |
|
| 295 |
-
|
| 296 |
-
---
|
| 297 |
-
|
| 298 |
-
## ✅ DOCUMENTATION CHECKLIST
|
| 299 |
-
|
| 300 |
-
### **For Users**
|
| 301 |
-
- [x] Quick start guide
|
| 302 |
-
- [x] Command reference
|
| 303 |
-
- [x] Troubleshooting guide
|
| 304 |
-
- [x] Use case examples
|
| 305 |
-
|
| 306 |
-
### **For Developers**
|
| 307 |
-
- [x] Architecture documentation
|
| 308 |
-
- [x] Code organization explained
|
| 309 |
-
- [x] API documentation (docstrings)
|
| 310 |
-
- [x] Test suite
|
| 311 |
-
|
| 312 |
-
### **For Managers**
|
| 313 |
-
- [x] Executive summary
|
| 314 |
-
- [x] ROI analysis
|
| 315 |
-
- [x] Feature comparison
|
| 316 |
-
- [x] Deployment guide
|
| 317 |
-
|
| 318 |
-
### **For DevOps**
|
| 319 |
-
- [x] Installation scripts
|
| 320 |
-
- [x] Deployment checklist
|
| 321 |
-
- [x] Testing procedures
|
| 322 |
-
- [x] Troubleshooting guide
|
| 323 |
-
|
| 324 |
-
---
|
| 325 |
-
|
| 326 |
-
## 🎯 QUICK LINKS
|
| 327 |
-
|
| 328 |
-
**Most Important:**
|
| 329 |
-
- 🚀 [QUICKSTART.md](QUICKSTART.md) - Start here!
|
| 330 |
-
- 📋 [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Commands
|
| 331 |
-
- 📚 [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) - Complete guide
|
| 332 |
-
|
| 333 |
-
**For Understanding:**
|
| 334 |
-
- 📊 [COMPARISON.md](COMPARISON.md) - What changed
|
| 335 |
-
- 📝 [SUMMARY.md](SUMMARY.md) - Executive summary
|
| 336 |
-
- 🏗️ [ARCHITECTURE.md](ARCHITECTURE.md) - How it works
|
| 337 |
-
|
| 338 |
-
**For Action:**
|
| 339 |
-
- ✅ [DEPLOYMENT_CHECKLIST.md](DEPLOYMENT_CHECKLIST.md) - Deploy
|
| 340 |
-
- 🧪 `test_upgrade.py` - Test
|
| 341 |
-
- 🔧 `install_upgrade.bat` - Install
|
| 342 |
-
|
| 343 |
-
---
|
| 344 |
-
|
| 345 |
-
## 🎉 YOU'RE ALL SET!
|
| 346 |
-
|
| 347 |
-
**This index covers all documentation for the VisionQ upgrade.**
|
| 348 |
-
|
| 349 |
-
**Start with:** [QUICKSTART.md](QUICKSTART.md)
|
| 350 |
-
|
| 351 |
-
**Need help?** Check the appropriate document above.
|
| 352 |
-
|
| 353 |
-
**Happy upgrading! 🚀**
|
| 354 |
-
|
| 355 |
-
---
|
| 356 |
-
|
| 357 |
-
**Last Updated:** 2024
|
| 358 |
-
**Version:** 2.0 (Upgraded)
|
| 359 |
-
**Status:** ✅ Production Ready
|
archive/old_docs/QUICKSTART.md
DELETED
|
@@ -1,197 +0,0 @@
|
|
| 1 |
-
# 🚀 VisionQ Upgrade - Quick Start Guide
|
| 2 |
-
|
| 3 |
-
## ⚡ 5-Minute Setup
|
| 4 |
-
|
| 5 |
-
### **Step 1: Install Dependencies**
|
| 6 |
-
```bash
|
| 7 |
-
pip install -r requirements_upgraded.txt
|
| 8 |
-
```
|
| 9 |
-
|
| 10 |
-
### **Step 2: Create Data Directory**
|
| 11 |
-
```bash
|
| 12 |
-
mkdir data
|
| 13 |
-
move memory.json data\memory.json
|
| 14 |
-
```
|
| 15 |
-
|
| 16 |
-
### **Step 3: Run Upgraded System**
|
| 17 |
-
```bash
|
| 18 |
-
python main_upgraded.py
|
| 19 |
-
```
|
| 20 |
-
|
| 21 |
-
---
|
| 22 |
-
|
| 23 |
-
## 🎯 What's New?
|
| 24 |
-
|
| 25 |
-
### **1. OCR Text Reading** ✨
|
| 26 |
-
**Voice Command:** "Read the text"
|
| 27 |
-
- Points camera at text
|
| 28 |
-
- Extracts and speaks visible text
|
| 29 |
-
- Stores text in memory
|
| 30 |
-
|
| 31 |
-
### **2. Enhanced Memory** 🧠
|
| 32 |
-
- **FAISS vector search** - 10x faster retrieval
|
| 33 |
-
- **Image embeddings** - Find visually similar scenes
|
| 34 |
-
- **Hybrid search** - Text + image combined
|
| 35 |
-
|
| 36 |
-
### **3. Better Voice** 🗣️
|
| 37 |
-
- **Neural TTS** (Voxtral/Piper) - Natural speech
|
| 38 |
-
- **Auto-fallback** - Uses pyttsx3 if needed
|
| 39 |
-
- **Same commands** - No learning curve
|
| 40 |
-
|
| 41 |
-
### **4. Smarter Queries** 🔍
|
| 42 |
-
- **DistilBERT NLP** - Understands intent
|
| 43 |
-
- **Time-aware** - "What did I see this morning?"
|
| 44 |
-
- **Multi-modal** - Searches text, images, objects
|
| 45 |
-
|
| 46 |
-
---
|
| 47 |
-
|
| 48 |
-
## 📋 Voice Commands
|
| 49 |
-
|
| 50 |
-
| Say This | System Does |
|
| 51 |
-
|----------|-------------|
|
| 52 |
-
| "Describe the scene" | Caption + OCR + objects |
|
| 53 |
-
| "Remember this" | Store with embeddings |
|
| 54 |
-
| "What did I see" | Recall last memory |
|
| 55 |
-
| **"Read the text"** | **Extract visible text** ⭐ NEW |
|
| 56 |
-
| "Exit" | Quit system |
|
| 57 |
-
|
| 58 |
-
---
|
| 59 |
-
|
| 60 |
-
## 🔧 Optional: Neural TTS Setup
|
| 61 |
-
|
| 62 |
-
**Want better voice quality?**
|
| 63 |
-
|
| 64 |
-
1. Download Piper voice model:
|
| 65 |
-
- https://github.com/rhasspy/piper/releases
|
| 66 |
-
- Get: `en_US-lessac-medium.onnx`
|
| 67 |
-
|
| 68 |
-
2. Create directory:
|
| 69 |
-
```bash
|
| 70 |
-
mkdir models\piper
|
| 71 |
-
```
|
| 72 |
-
|
| 73 |
-
3. Extract model to `models/piper/`
|
| 74 |
-
|
| 75 |
-
4. Restart VisionQ
|
| 76 |
-
|
| 77 |
-
**Note:** System works fine without this - pyttsx3 is the fallback!
|
| 78 |
-
|
| 79 |
-
---
|
| 80 |
-
|
| 81 |
-
## 🧪 Test Your Upgrade
|
| 82 |
-
|
| 83 |
-
### **Test 1: OCR**
|
| 84 |
-
```bash
|
| 85 |
-
python main_upgraded.py
|
| 86 |
-
# Say: "Read the text"
|
| 87 |
-
# Point camera at text
|
| 88 |
-
# Should extract and speak text
|
| 89 |
-
```
|
| 90 |
-
|
| 91 |
-
### **Test 2: Enhanced Memory**
|
| 92 |
-
```bash
|
| 93 |
-
python ask_question_upgraded.py
|
| 94 |
-
# Type: "What did I see today?"
|
| 95 |
-
# Should show memories with confidence scores
|
| 96 |
-
```
|
| 97 |
-
|
| 98 |
-
### **Test 3: Voice Quality**
|
| 99 |
-
```bash
|
| 100 |
-
# Listen to TTS output
|
| 101 |
-
# Should sound natural (if Piper installed)
|
| 102 |
-
# Or robotic (if using pyttsx3 fallback)
|
| 103 |
-
```
|
| 104 |
-
|
| 105 |
-
---
|
| 106 |
-
|
| 107 |
-
## 🐛 Quick Fixes
|
| 108 |
-
|
| 109 |
-
### **"Module not found" error**
|
| 110 |
-
```bash
|
| 111 |
-
pip install --upgrade -r requirements_upgraded.txt
|
| 112 |
-
```
|
| 113 |
-
|
| 114 |
-
### **"FAISS not available" warning**
|
| 115 |
-
```bash
|
| 116 |
-
pip install faiss-cpu
|
| 117 |
-
```
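If FAISS cannot be installed, a brute-force scan is a reasonable stand-in for a small memory. The sketch below is an assumption about how such a fallback could look, not VisionQ's actual code; `HAVE_FAISS`, `cosine` and `brute_force_search` are hypothetical names.

```python
import importlib.util
import math

# True when the optional `faiss` module (installed via faiss-cpu) is importable.
HAVE_FAISS = importlib.util.find_spec("faiss") is not None


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query, vectors, k=5):
    """Return up to k (index, score) pairs, best match first.

    Linear scan over every stored vector, so cost grows with memory size.
    """
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A caller could check `HAVE_FAISS` once at startup and print the warning only when it is `False`.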
|
| 118 |
-
|
| 119 |
-
### **"OCR not working"**
|
| 120 |
-
```bash
|
| 121 |
-
pip install easyocr
|
| 122 |
-
```
|
| 123 |
-
|
| 124 |
-
### **Camera not opening**
|
| 125 |
-
```bash
|
| 126 |
-
# Check camera permissions
|
| 127 |
-
# Try different camera index in vision_agent.py:
|
| 128 |
-
# self.cap = cv2.VideoCapture(1) # Try 1 instead of 0
|
| 129 |
-
```
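Rather than editing the index by hand, the indices can be probed in order. A minimal sketch, assuming OpenCV (`opencv-python`) is installed for the real probe; `find_camera` and `opencv_can_open` are hypothetical helpers, not functions from `vision_agent.py`.

```python
def find_camera(can_open, max_index=3):
    """Return the first index in 0..max_index-1 that can_open accepts, else None."""
    for index in range(max_index):
        if can_open(index):
            return index
    return None


def opencv_can_open(index):
    """Probe one camera index with OpenCV."""
    import cv2  # imported lazily so find_camera works without OpenCV

    cap = cv2.VideoCapture(index)
    try:
        return cap.isOpened()
    finally:
        cap.release()  # free the device so the main program can open it
```

`find_camera(opencv_can_open)` returns the index to pass to `cv2.VideoCapture`, or `None` when no camera responds, which usually points to a permissions problem.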
|
| 130 |
-
|
| 131 |
-
---
|
| 132 |
-
|
| 133 |
-
## 📊 What Got Better?
|
| 134 |
-
|
| 135 |
-
| Feature | Before | After | Improvement |
|
| 136 |
-
|---------|--------|-------|-------------|
|
| 137 |
-
| Text Reading | ❌ None | ✅ OCR | NEW |
|
| 138 |
-
| Memory Search | Slow | Fast | 10x faster |
|
| 139 |
-
| Voice Quality | Robotic | Natural | Much better |
|
| 140 |
-
| Query Understanding | Keywords | NLP | Smarter |
|
| 141 |
-
| Scene Understanding | Caption only | Caption+OCR+Objects | Richer |
|
| 142 |
-
|
| 143 |
-
---
|
| 144 |
-
|
| 145 |
-
## 🎓 Example Queries
|
| 146 |
-
|
| 147 |
-
**Try these in `ask_question_upgraded.py`:**
|
| 148 |
-
|
| 149 |
-
```
|
| 150 |
-
"What did I see this morning?"
|
| 151 |
-
"Show me memories with text"
|
| 152 |
-
"When did I see a person?"
|
| 153 |
-
"What happened in the last hour?"
|
| 154 |
-
"Find memories from yesterday"
|
| 155 |
-
```
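Time-aware queries like these need a phrase-to-time-range step before any search runs. The sketch below shows one way to do that with the standard library. It is not the DistilBERT-based logic the guide mentions; `time_window` is a hypothetical name, and the boundaries (morning means 00:00 to 12:00) are assumptions.

```python
from datetime import datetime, timedelta


def time_window(query, now):
    """Map a time phrase in `query` to a (start, end) datetime range, or None.

    Only phrases from the example queries are handled.
    """
    text = query.lower()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if "last hour" in text:
        return now - timedelta(hours=1), now
    if "this morning" in text:
        return midnight, midnight + timedelta(hours=12)
    if "yesterday" in text:
        return midnight - timedelta(days=1), midnight
    if "today" in text:
        return midnight, now
    return None  # no time phrase: search all memories
```

Memories whose timestamp falls inside the returned range would then be passed to the similarity search.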
|
| 156 |
-
|
| 157 |
-
---
|
| 158 |
-
|
| 159 |
-
## ✅ Success Checklist
|
| 160 |
-
|
| 161 |
-
- [ ] System starts without errors
|
| 162 |
-
- [ ] Voice recognition works
|
| 163 |
-
- [ ] Camera captures video
|
| 164 |
-
- [ ] "Describe scene" gives detailed output
|
| 165 |
-
- [ ] "Remember this" stores memory
|
| 166 |
-
- [ ] "Read text" extracts text (if text visible)
|
| 167 |
-
- [ ] Query system returns results
|
| 168 |
-
- [ ] Memory persists after restart
|
| 169 |
-
|
| 170 |
-
---
|
| 171 |
-
|
| 172 |
-
## 🚀 You're Ready!
|
| 173 |
-
|
| 174 |
-
Your VisionQ is now upgraded with:
|
| 175 |
-
- ✅ OCR text reading
|
| 176 |
-
- ✅ Fast vector search (FAISS)
|
| 177 |
-
- ✅ Neural TTS (optional)
|
| 178 |
-
- ✅ Smart NLP queries
|
| 179 |
-
- ✅ Enhanced memory
|
| 180 |
-
|
| 181 |
-
**All existing features still work!**
|
| 182 |
-
|
| 183 |
-
---
|
| 184 |
-
|
| 185 |
-
## 📚 Full Documentation
|
| 186 |
-
|
| 187 |
-
For detailed information, see:
|
| 188 |
-
- `UPGRADE_GUIDE.md` - Complete upgrade documentation
|
| 189 |
-
- `requirements_upgraded.txt` - All dependencies
|
| 190 |
-
- `agents/` - New modular code
|
| 191 |
-
- `core/` - Fusion layer
|
| 192 |
-
|
| 193 |
-
---
|
| 194 |
-
|
| 195 |
-
**Need help?** Check `UPGRADE_GUIDE.md` troubleshooting section.
|
| 196 |
-
|
| 197 |
-
**Happy upgrading! 🎉**
|
archive/old_docs/QUICK_REFERENCE.md
DELETED
|
@@ -1,315 +0,0 @@
|
|
| 1 |
-
# 🎯 VisionQ Upgrade - Quick Reference Card
|
| 2 |
-
|
| 3 |
-
## 📦 INSTALLATION (3 Steps)
|
| 4 |
-
|
| 5 |
-
```bash
|
| 6 |
-
# 1. Install dependencies
|
| 7 |
-
pip install -r requirements_upgraded.txt
|
| 8 |
-
|
| 9 |
-
# 2. Create data directory
|
| 10 |
-
mkdir data
|
| 11 |
-
|
| 12 |
-
# 3. Run system
|
| 13 |
-
python main_upgraded.py
|
| 14 |
-
```
|
| 15 |
-
|
| 16 |
-
---
|
| 17 |
-
|
| 18 |
-
## 🗣️ VOICE COMMANDS
|
| 19 |
-
|
| 20 |
-
| Say This | System Does |
|
| 21 |
-
|----------|-------------|
|
| 22 |
-
| **"Describe the scene"** | Captures and describes (caption + OCR + objects) |
|
| 23 |
-
| **"Remember this"** | Stores scene with embeddings in memory |
|
| 24 |
-
| **"What did I see"** | Recalls last memory |
|
| 25 |
-
| **"Read the text"** | Extracts visible text (OCR) 🆕 |
|
| 26 |
-
| **"Exit"** | Quits system |
|
| 27 |
-
|
| 28 |
-
---
|
| 29 |
-
|
| 30 |
-
## 🔍 QUERY EXAMPLES
|
| 31 |
-
|
| 32 |
-
```bash
|
| 33 |
-
python ask_question_upgraded.py
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
**Try these:**
|
| 37 |
-
- "What did I see this morning?"
|
| 38 |
-
- "Show me memories with text"
|
| 39 |
-
- "When did I see a person?"
|
| 40 |
-
- "Find memories from yesterday"
|
| 41 |
-
- "What happened in the last hour?"
|
| 42 |
-
|
| 43 |
-
---
|
| 44 |
-
|
| 45 |
-
## 📂 FILE STRUCTURE
|
| 46 |
-
|
| 47 |
-
```
|
| 48 |
-
VisionQ/
|
| 49 |
-
├── agents/ # 🆕 Modular agents
|
| 50 |
-
│ ├── voice_agent.py # Voice I/O
|
| 51 |
-
│ ├── vision_agent.py # Vision hub
|
| 52 |
-
│ ├── embedding_agent.py # 🆕 MobileCLIP
|
| 53 |
-
│ ├── ocr_agent.py # 🆕 Text extraction
|
| 54 |
-
│ ├── memory_agent.py # Storage (JSON + FAISS)
|
| 55 |
-
│ └── query_agent.py # Smart retrieval
|
| 56 |
-
│
|
| 57 |
-
├── core/ # 🆕 Integration
|
| 58 |
-
│ └── fusion_layer.py # 🆕 Multimodal fusion
|
| 59 |
-
│
|
| 60 |
-
├── data/ # 🆕 Storage
|
| 61 |
-
│ ├── memory.json # Metadata
|
| 62 |
-
│ └── memory.faiss # 🆕 Vector index
|
| 63 |
-
│
|
| 64 |
-
├── main_upgraded.py # 🆕 Main entry
|
| 65 |
-
└── ask_question_upgraded.py # 🆕 Query tool
|
| 66 |
-
```
|
| 67 |
-
|
| 68 |
-
---
|
| 69 |
-
|
| 70 |
-
## 🆕 WHAT'S NEW?
|
| 71 |
-
|
| 72 |
-
| Feature | Status |
|
| 73 |
-
|---------|--------|
|
| 74 |
-
| **OCR Text Reading** | ✅ NEW |
|
| 75 |
-
| **FAISS Vector Search** | ✅ NEW (10x faster) |
|
| 76 |
-
| **Neural TTS (Voxtral)** | ✅ NEW (natural voice) |
|
| 77 |
-
| **Intent Classification** | ✅ NEW (DistilBERT) |
|
| 78 |
-
| **Multimodal Fusion** | ✅ NEW (richer context) |
|
| 79 |
-
|
| 80 |
-
---
|
| 81 |
-
|
| 82 |
-
## 🔧 CONFIGURATION
|
| 83 |
-
|
| 84 |
-
**Vision** (`agents/vision_agent.py`):
|
| 85 |
-
```python
|
| 86 |
-
FRAME_INTERVAL = 0.3 # Seconds between frames
|
| 87 |
-
CONF_THRESHOLD = 0.5 # Detection confidence
|
| 88 |
-
```
|
| 89 |
-
|
| 90 |
-
**OCR** (`agents/ocr_agent.py`):
|
| 91 |
-
```python
|
| 92 |
-
OCR_CONFIDENCE = 0.3 # Text threshold
|
| 93 |
-
OCR_LANGUAGES = ['en'] # Languages
|
| 94 |
-
```
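To illustrate what `OCR_CONFIDENCE` controls, here is a minimal sketch of confidence filtering. It assumes detections arrive as `(bounding_box, text, confidence)` tuples, the layout EasyOCR's `Reader.readtext` returns by default; `filter_ocr` is a hypothetical name, not a function from `ocr_agent.py`.

```python
OCR_CONFIDENCE = 0.3  # value from the settings above


def filter_ocr(results, min_confidence=OCR_CONFIDENCE):
    """Join the text of detections whose confidence meets the threshold."""
    words = [text for _bbox, text, conf in results if conf >= min_confidence]
    return " ".join(words)
```

Raising the threshold drops more noisy detections but may also discard faint or small text.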
|
| 95 |
-
|
| 96 |
-
**Query** (`agents/query_agent.py`):
|
| 97 |
-
```python
|
| 98 |
-
SIMILARITY_THRESHOLD = 0.45 # Search threshold
|
| 99 |
-
TOP_K_RESULTS = 5 # Max results
|
| 100 |
-
```
|
| 101 |
-
|
| 102 |
-
---
|
| 103 |
-
|
| 104 |
-
## 🧪 TESTING
|
| 105 |
-
|
| 106 |
-
```bash
|
| 107 |
-
# Run test suite
|
| 108 |
-
python test_upgrade.py
|
| 109 |
-
|
| 110 |
-
# Expected: All tests pass ✅
|
| 111 |
-
```
|
| 112 |
-
|
| 113 |
-
---
|
| 114 |
-
|
| 115 |
-
## 🐛 TROUBLESHOOTING
|
| 116 |
-
|
| 117 |
-
**"Module not found":**
|
| 118 |
-
```bash
|
| 119 |
-
pip install --upgrade -r requirements_upgraded.txt
|
| 120 |
-
```
|
| 121 |
-
|
| 122 |
-
**"FAISS not available":**
|
| 123 |
-
```bash
|
| 124 |
-
pip install faiss-cpu
|
| 125 |
-
```
|
| 126 |
-
|
| 127 |
-
**"OCR not working":**
|
| 128 |
-
```bash
|
| 129 |
-
pip install easyocr
|
| 130 |
-
```
|
| 131 |
-
|
| 132 |
-
**Camera not opening:**
|
| 133 |
-
```python
|
| 134 |
-
# Edit agents/vision_agent.py line ~90
|
| 135 |
-
self.cap = cv2.VideoCapture(1) # Try 1 instead of 0
|
| 136 |
-
```
|
| 137 |
-
|
| 138 |
-
---
|
| 139 |
-
|
| 140 |
-
## 📚 DOCUMENTATION
|
| 141 |
-
|
| 142 |
-
| File | Purpose |
|
| 143 |
-
|------|---------|
|
| 144 |
-
| **QUICKSTART.md** | 5-minute setup |
|
| 145 |
-
| **UPGRADE_GUIDE.md** | Complete guide |
|
| 146 |
-
| **ARCHITECTURE.md** | System design |
|
| 147 |
-
| **SUMMARY.md** | Executive summary |
|
| 148 |
-
| **COMPARISON.md** | Before/After |
|
| 149 |
-
| **DEPLOYMENT_CHECKLIST.md** | Deploy steps |
|
| 150 |
-
|
| 151 |
-
---
|
| 152 |
-
|
| 153 |
-
## 🎯 KEY IMPROVEMENTS
|
| 154 |
-
|
| 155 |
-
| Metric | Before | After | Change |
|
| 156 |
-
|--------|--------|-------|--------|
|
| 157 |
-
| **Search Speed** | 100-500ms | <10ms | 🟢 10-50x |
|
| 158 |
-
| **Memory Capacity** | ~1,000 | 10,000+ | 🟢 10x |
|
| 159 |
-
| **Query Accuracy** | 75% | 90% | 🟢 +15% |
|
| 160 |
-
| **Intent Accuracy** | 70% | 97% | 🟢 +27% |
|
| 161 |
-
|
| 162 |
-
---
|
| 163 |
-
|
| 164 |
-
## ✅ BACKWARD COMPATIBILITY
|
| 165 |
-
|
| 166 |
-
- ✅ Old memory.json files work
|
| 167 |
-
- ✅ Same voice commands
|
| 168 |
-
- ✅ Old files preserved
|
| 169 |
-
- ✅ Zero breaking changes
|
| 170 |
-
|
| 171 |
-
---
|
| 172 |
-
|
| 173 |
-
## 🚀 QUICK START CHECKLIST
|
| 174 |
-
|
| 175 |
-
- [ ] Install: `pip install -r requirements_upgraded.txt`
|
| 176 |
-
- [ ] Setup: `mkdir data`
|
| 177 |
-
- [ ] Test: `python test_upgrade.py`
|
| 178 |
-
- [ ] Run: `python main_upgraded.py`
|
| 179 |
-
- [ ] Try: Voice commands
|
| 180 |
-
- [ ] Query: `python ask_question_upgraded.py`
|
| 181 |
-
|
| 182 |
-
---
|
| 183 |
-
|
| 184 |
-
## 📊 ARCHITECTURE (Simplified)
|
| 185 |
-
|
| 186 |
-
```
|
| 187 |
-
Voice → Vision Hub → Fusion → Memory → Query
|
| 188 |
-
├─ YOLO ├─ JSON
|
| 189 |
-
├─ BLIP └─ FAISS
|
| 190 |
-
├─ CLIP
|
| 191 |
-
└─ OCR
|
| 192 |
-
```
|
| 193 |
-
|
| 194 |
-
---
|
| 195 |
-
|
| 196 |
-
## 🔄 DATA FLOW
|
| 197 |
-
|
| 198 |
-
```
|
| 199 |
-
1. User speaks → Vosk STT
|
| 200 |
-
2. Camera captures → Vision agents
|
| 201 |
-
3. Fusion combines → Unified context
|
| 202 |
-
4. Memory stores → JSON + FAISS
|
| 203 |
-
5. Query retrieves → Smart search
|
| 204 |
-
6. System speaks → Voxtral/pyttsx3
|
| 205 |
-
```
|
| 206 |
-
|
| 207 |
-
---
|
| 208 |
-
|
| 209 |
-
## 💡 TIPS
|
| 210 |
-
|
| 211 |
-
**For Best Performance:**
|
| 212 |
-
- Install FAISS: `pip install faiss-cpu`
|
| 213 |
-
- Install EasyOCR: `pip install easyocr`
|
| 214 |
-
- Use good lighting for OCR
|
| 215 |
-
- Clear audio for voice commands
|
| 216 |
-
|
| 217 |
-
**For Better Voice:**
|
| 218 |
-
- Download Piper TTS model
|
| 219 |
-
- Place in `models/piper/`
|
| 220 |
-
- System auto-detects and uses it
|
| 221 |
-
|
| 222 |
-
**For Faster Startup:**
|
| 223 |
-
- Models cached after first run
|
| 224 |
-
- Subsequent starts faster
|
| 225 |
-
|
| 226 |
-
---
|
| 227 |
-
|
| 228 |
-
## 🎓 LEARNING PATH
|
| 229 |
-
|
| 230 |
-
**Beginner:**
|
| 231 |
-
1. Read QUICKSTART.md
|
| 232 |
-
2. Run main_upgraded.py
|
| 233 |
-
3. Try voice commands
|
| 234 |
-
|
| 235 |
-
**Intermediate:**
|
| 236 |
-
1. Read UPGRADE_GUIDE.md
|
| 237 |
-
2. Explore agents/ code
|
| 238 |
-
3. Customize parameters
|
| 239 |
-
|
| 240 |
-
**Advanced:**
|
| 241 |
-
1. Read ARCHITECTURE.md
|
| 242 |
-
2. Modify agents
|
| 243 |
-
3. Add new features
|
| 244 |
-
|
| 245 |
-
---
|
| 246 |
-
|
| 247 |
-
## 📞 SUPPORT
|
| 248 |
-
|
| 249 |
-
**Documentation:**
|
| 250 |
-
- QUICKSTART.md - Quick setup
|
| 251 |
-
- UPGRADE_GUIDE.md - Complete guide
|
| 252 |
-
- ARCHITECTURE.md - Technical details
|
| 253 |
-
|
| 254 |
-
**Testing:**
|
| 255 |
-
- test_upgrade.py - Automated tests
|
| 256 |
-
- DEPLOYMENT_CHECKLIST.md - Deploy guide
|
| 257 |
-
|
| 258 |
-
**Comparison:**
|
| 259 |
-
- COMPARISON.md - Before/After
|
| 260 |
-
- SUMMARY.md - Executive summary
|
| 261 |
-
|
| 262 |
-
---
|
| 263 |
-
|
| 264 |
-
## 🏆 SUCCESS CRITERIA
|
| 265 |
-
|
| 266 |
-
**System Working If:**
|
| 267 |
-
- ✅ Starts without errors
|
| 268 |
-
- ✅ Camera shows video
|
| 269 |
-
- ✅ Voice commands work
|
| 270 |
-
- ✅ Memory persists
|
| 271 |
-
- ✅ Queries return results
|
| 272 |
-
|
| 273 |
-
---
|
| 274 |
-
|
| 275 |
-
## 🎉 YOU'RE READY!
|
| 276 |
-
|
| 277 |
-
**Your VisionQ now has:**
|
| 278 |
-
- 🧠 Smarter memory (FAISS)
|
| 279 |
-
- 👁️ Better vision (CLIP + OCR)
|
| 280 |
-
- 🗣️ Natural voice (Voxtral)
|
| 281 |
-
- 🔍 Smart queries (DistilBERT)
|
| 282 |
-
|
| 283 |
-
**All while keeping existing features! 🚀**
|
| 284 |
-
|
| 285 |
-
---
|
| 286 |
-
|
| 287 |
-
## 📋 COMMAND CHEAT SHEET
|
| 288 |
-
|
| 289 |
-
```bash
|
| 290 |
-
# Install
|
| 291 |
-
pip install -r requirements_upgraded.txt
|
| 292 |
-
|
| 293 |
-
# Setup
|
| 294 |
-
mkdir data
|
| 295 |
-
|
| 296 |
-
# Test
|
| 297 |
-
python test_upgrade.py
|
| 298 |
-
|
| 299 |
-
# Run main system
|
| 300 |
-
python main_upgraded.py
|
| 301 |
-
|
| 302 |
-
# Run query tool
|
| 303 |
-
python ask_question_upgraded.py
|
| 304 |
-
|
| 305 |
-
# Install optional
|
| 306 |
-
pip install faiss-cpu easyocr piper-tts
|
| 307 |
-
```
|
| 308 |
-
|
| 309 |
-
---
|
| 310 |
-
|
| 311 |
-
**Keep this card handy for quick reference! 📌**
|
| 312 |
-
|
| 313 |
-
**For detailed info, see full documentation files.**
|
| 314 |
-
|
| 315 |
-
**Happy upgrading! 🎊**
|
archive/old_docs/README_UPGRADED.md
DELETED
|
@@ -1,410 +0,0 @@
|
|
| 1 |
-
# 🚀 VisionQ - Multimodal AI Assistant (UPGRADED)
|
| 2 |
-
|
| 3 |
-
> **A voice-controlled AI vision assistant that can see, remember, read text, and recall visual memories through natural conversation.**
|
| 4 |
-
|
| 5 |
-
[Python](https://www.python.org/downloads/)
|
| 6 |
-
[License](LICENSE)
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
---
|
| 10 |
-
|
| 11 |
-
## 🎯 What is VisionQ?
|
| 12 |
-
|
| 13 |
-
VisionQ is an **upgraded multimodal AI assistant** that combines:
|
| 14 |
-
- 👁️ **Computer Vision** (YOLO/SSD object detection + BLIP captioning)
|
| 15 |
-
- 🔤 **OCR** (EasyOCR text extraction)
|
| 16 |
-
- 🧠 **Semantic Memory** (FAISS vector search + JSON storage)
|
| 17 |
-
- 🗣️ **Voice Interaction** (Vosk STT + Voxtral/Piper TTS)
|
| 18 |
-
- 🔍 **Intelligent Queries** (DistilBERT NLP)
|
| 19 |
-
|
| 20 |
-
---
|
| 21 |
-
|
| 22 |
-
## ✨ Key Features
|
| 23 |
-
|
| 24 |
-
### **Core Capabilities**
|
| 25 |
-
- ✅ **Scene Description** - Multimodal understanding (vision + text)
|
| 26 |
-
- ✅ **Memory Storage** - Persistent semantic memory with FAISS
|
| 27 |
-
- ✅ **Memory Recall** - Fast similarity search (10x faster)
|
| 28 |
-
- ✅ **Text Reading** - OCR extraction from images 🆕
|
| 29 |
-
- ✅ **Voice Control** - Natural language commands
|
| 30 |
-
- ✅ **Smart Queries** - Time-aware, intent-based search 🆕
|
| 31 |
-
|
| 32 |
-
### **Technical Highlights**
|
| 33 |
-
- 🚀 **FAISS Vector Search** - Lightning-fast similarity matching
|
| 34 |
-
- 🖼️ **MobileCLIP Embeddings** - Visual semantic understanding
|
| 35 |
-
- 🔤 **EasyOCR Integration** - Offline text extraction
|
| 36 |
-
- 🧠 **DistilBERT NLP** - Intent classification
|
| 37 |
-
- 🗣️ **Neural TTS** - Natural voice output (Voxtral/Piper)
|
| 38 |
-
- 🔗 **Multimodal Fusion** - Combined vision + text + embeddings
|
| 39 |
-
|
| 40 |
-
---
|
| 41 |
-
|
| 42 |
-
## 📦 Installation
|
| 43 |
-
|
| 44 |
-
### **Quick Install**
|
| 45 |
-
```bash
|
| 46 |
-
# Clone repository
|
| 47 |
-
git clone <your-repo-url>
|
| 48 |
-
cd VisionQ
|
| 49 |
-
|
| 50 |
-
# Run automated installer (Windows)
|
| 51 |
-
install_upgrade.bat
|
| 52 |
-
|
| 53 |
-
# Or manual install:
|
| 54 |
-
pip install -r requirements_upgraded.txt
|
| 55 |
-
mkdir data
|
| 56 |
-
```
|
| 57 |
-
|
| 58 |
-
### **Requirements**
|
| 59 |
-
- Python 3.8+
|
| 60 |
-
- Webcam
|
| 61 |
-
- Microphone
|
| 62 |
-
- ~2GB disk space (for models)
|
| 63 |
-
|
| 64 |
-
---
|
| 65 |
-
|
| 66 |
-
## 🚀 Quick Start
|
| 67 |
-
|
| 68 |
-
### **1. Run the System**
|
| 69 |
-
```bash
|
| 70 |
-
python main_upgraded.py
|
| 71 |
-
```
|
| 72 |
-
|
| 73 |
-
### **2. Voice Commands**
|
| 74 |
-
| Say This | System Does |
|
| 75 |
-
|----------|-------------|
|
| 76 |
-
| "Describe the scene" | Captures and describes what it sees |
|
| 77 |
-
| "Remember this" | Stores current scene in memory |
|
| 78 |
-
| "What did I see" | Recalls last memory |
|
| 79 |
-
| "Read the text" | Extracts visible text (OCR) 🆕 |
|
| 80 |
-
| "Exit" | Quits the system |
|
| 81 |
-
|
| 82 |
-
### **3. Query Memory**
|
| 83 |
-
```bash
|
| 84 |
-
python ask_question_upgraded.py
|
| 85 |
-
```
|
| 86 |
-
|
| 87 |
-
**Example Queries:**
|
| 88 |
-
- "What did I see this morning?"
|
| 89 |
-
- "Show me memories with text"
|
| 90 |
-
- "When did I see a person?"
|
| 91 |
-
- "Find memories from yesterday"
|
| 92 |
-
|
| 93 |
-
---
|
| 94 |
-
|
| 95 |
-
## 🏗️ Architecture
|
| 96 |
-
|
| 97 |
-
```
|
| 98 |
-
Voice (Vosk + Voxtral/Piper)
|
| 99 |
-
↓
|
| 100 |
-
Vision Hub
|
| 101 |
-
├─ YOLO/SSD (objects)
|
| 102 |
-
├─ BLIP (captions)
|
| 103 |
-
├─ MobileCLIP (embeddings)
|
| 104 |
-
└─ EasyOCR (text)
|
| 105 |
-
↓
|
| 106 |
-
Fusion Layer
|
| 107 |
-
↓
|
| 108 |
-
Memory (JSON + FAISS)
|
| 109 |
-
↓
|
| 110 |
-
Query (DistilBERT)
|
| 111 |
-
```
|
| 112 |
-
|
| 113 |
-
**See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams.**
|
| 114 |
-
|
| 115 |
-
---
|
| 116 |
-
|
| 117 |
-
## 📂 Project Structure
|
| 118 |
-
|
| 119 |
-
```
|
| 120 |
-
VisionQ/
|
| 121 |
-
├── agents/ # Modular AI agents
|
| 122 |
-
│ ├── voice_agent.py # Voice I/O (STT + TTS)
|
| 123 |
-
│ ├── vision_agent.py # Vision coordinator
|
| 124 |
-
│ ├── caption_agent.py # BLIP captioning
|
| 125 |
-
│ ├── embedding_agent.py # MobileCLIP embeddings
|
| 126 |
-
│ ├── ocr_agent.py # Text extraction
|
| 127 |
-
│ ├── memory_agent.py # Storage (JSON + FAISS)
|
| 128 |
-
│ └── query_agent.py # Intelligent retrieval
|
| 129 |
-
│
|
| 130 |
-
├── core/ # Integration layer
|
| 131 |
-
│ └── fusion_layer.py # Multimodal fusion
|
| 132 |
-
│
|
| 133 |
-
├── data/ # Persistent storage
|
| 134 |
-
│ ├── memory.json # Metadata
|
| 135 |
-
│ └── memory.faiss # Vector index
|
| 136 |
-
│
|
| 137 |
-
├── models/ # AI models
|
| 138 |
-
│ ├── vosk/ # Speech recognition
|
| 139 |
-
│ └── piper/ # Neural TTS (optional)
|
| 140 |
-
│
|
| 141 |
-
├── main_upgraded.py # Main entry point
|
| 142 |
-
├── ask_question_upgraded.py # Query interface
|
| 143 |
-
└── requirements_upgraded.txt # Dependencies
|
| 144 |
-
```
|
| 145 |
-
|
| 146 |
-
---
|
| 147 |
-
|
| 148 |
-
## 🆕 What's New in This Upgrade?
|
| 149 |
-
|
| 150 |
-
### **New Features**
|
| 151 |
-
1. **OCR Text Reading** 🔤
|
| 152 |
-
- Extract text from images
|
| 153 |
-
- Confidence filtering
|
| 154 |
-
- Multi-language support
|
| 155 |
-
|
| 156 |
-
2. **Visual Similarity Search** 🖼️
|
| 157 |
-
- MobileCLIP embeddings
|
| 158 |
-
- FAISS vector indexing
|
| 159 |
-
- 10x faster retrieval
|
| 160 |
-
|
| 161 |
-
3. **Intent Classification** 🧠
|
| 162 |
-
- DistilBERT NLP
|
| 163 |
-
- Better query understanding
|
| 164 |
-
- Context-aware responses
|
| 165 |
-
|
| 166 |
-
4. **Neural TTS** 🗣️
|
| 167 |
-
- Voxtral/Piper integration
|
| 168 |
-
- Natural voice output
|
| 169 |
-
- Automatic fallback to pyttsx3
|
| 170 |
-
|
| 171 |
-
5. **Multimodal Fusion** 🔗
|
| 172 |
-
- Combined vision + text + embeddings
|
| 173 |
-
- Richer scene descriptions
|
| 174 |
-
- Better memory context
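As a rough illustration of multimodal fusion, the sketch below merges a caption, detected objects and OCR text into one description string. The function name `fuse` and the output format are assumptions; the real `core/fusion_layer.py` is not shown here and may work differently.

```python
def fuse(caption, objects, ocr_text=""):
    """Combine caption, detected objects and OCR text into one description."""
    parts = [caption.strip()]
    if objects:
        # De-duplicate and sort so repeated detections read cleanly.
        parts.append("Objects: " + ", ".join(sorted(set(objects))))
    if ocr_text.strip():
        parts.append('Text: "' + ocr_text.strip() + '"')
    return ". ".join(parts)
```

Storing the fused string alongside the embeddings gives text search something richer than the caption alone to match against.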
|
| 175 |
-
|
| 176 |
-
### **Performance Improvements**
|
| 177 |
-
- 🚀 10x faster memory search (FAISS)
|
| 178 |
-
- 🎯 20% better query relevance
|
| 179 |
-
- 📈 10x memory capacity (10,000+ entries)
|
| 180 |
-
- ⚡ Sub-100ms query response time
|
| 181 |
-
|
| 182 |
-
---
|
| 183 |
-
|
| 184 |
-
## 🔧 Configuration
|
| 185 |
-
|
| 186 |
-
### **Adjustable Parameters**
|
| 187 |
-
|
| 188 |
-
**Vision Settings** (`agents/vision_agent.py`):
|
| 189 |
-
```python
|
| 190 |
-
FRAME_INTERVAL = 0.3 # Seconds between frames
|
| 191 |
-
CONF_THRESHOLD = 0.5 # Object detection confidence
|
| 192 |
-
```
|
| 193 |
-
|
| 194 |
-
**OCR Settings** (`agents/ocr_agent.py`):
|
| 195 |
-
```python
|
| 196 |
-
OCR_CONFIDENCE = 0.3 # Text detection threshold
|
| 197 |
-
OCR_LANGUAGES = ['en'] # Supported languages
|
| 198 |
-
```
|
| 199 |
-
|
| 200 |
-
**Query Settings** (`agents/query_agent.py`):
|
| 201 |
-
```python
|
| 202 |
-
SIMILARITY_THRESHOLD = 0.45 # Text search threshold
|
| 203 |
-
TOP_K_RESULTS = 5 # Max results to return
|
| 204 |
-
```
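How the two query settings interact can be sketched as a filter followed by a cut-off. The `(memory_id, similarity)` pair shape and the name `select_results` are assumptions for illustration, as is the choice of `>=` for the threshold comparison.

```python
SIMILARITY_THRESHOLD = 0.45  # value from the settings above
TOP_K_RESULTS = 5            # value from the settings above


def select_results(scored, threshold=SIMILARITY_THRESHOLD, top_k=TOP_K_RESULTS):
    """Keep matches scoring at least `threshold`, best first, at most `top_k`."""
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Lowering the threshold widens recall, while `TOP_K_RESULTS` caps how many matches are returned.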
|
| 205 |
-
|
| 206 |
-
---
|
| 207 |
-
|
| 208 |
-
## 🧪 Testing
|
| 209 |
-
|
| 210 |
-
### **Run Test Suite**
|
| 211 |
-
```bash
|
| 212 |
-
python test_upgrade.py
|
| 213 |
-
```
|
| 214 |
-
|
| 215 |
-
**Tests:**
|
| 216 |
-
- ✅ Module imports
|
| 217 |
-
- ✅ Dependency availability
|
| 218 |
-
- ✅ MemoryAgent functionality
|
| 219 |
-
- ✅ FusionLayer integration
|
| 220 |
-
- ✅ QueryAgent NLP
|
| 221 |
-
- ✅ Backward compatibility
|
| 222 |
-
|
| 223 |
-
---
|
| 224 |
-
|
| 225 |
-
## 📚 Documentation
|
| 226 |
-
|
| 227 |
-
| Document | Description |
|
| 228 |
-
|----------|-------------|
|
| 229 |
-
| [QUICKSTART.md](QUICKSTART.md) | 5-minute setup guide |
|
| 230 |
-
| [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) | Complete upgrade documentation |
|
| 231 |
-
| [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture details |
|
| 232 |
-
| [SUMMARY.md](SUMMARY.md) | Executive summary |
|
| 233 |
-
|
| 234 |
-
---
|
| 235 |
-
|
| 236 |
-
## 🐛 Troubleshooting
|
| 237 |
-
|
| 238 |
-
### **Common Issues**
|
| 239 |
-
|
| 240 |
-
**"Module not found" error:**
|
| 241 |
-
```bash
|
| 242 |
-
pip install --upgrade -r requirements_upgraded.txt
|
| 243 |
-
```
|
| 244 |
-
|
| 245 |
-
**"FAISS not available" warning:**
|
| 246 |
-
```bash
|
| 247 |
-
pip install faiss-cpu
|
| 248 |
-
```
|
| 249 |
-
|
| 250 |
-
**"OCR not working":**
|
| 251 |
-
```bash
|
| 252 |
-
pip install easyocr
|
| 253 |
-
```
|
| 254 |
-
|
| 255 |
-
**Camera not opening:**
|
| 256 |
-
```python
|
| 257 |
-
# Try different camera index in vision_agent.py
|
| 258 |
-
self.cap = cv2.VideoCapture(1) # Try 1 instead of 0
|
| 259 |
-
```
|
| 260 |
-
|
| 261 |
-
**See [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md) for more troubleshooting.**
|
| 262 |
-
|
| 263 |
-
---
|
| 264 |
-
|
| 265 |
-
## 🎓 Use Cases
|
| 266 |
-
|
| 267 |
-
### **Personal Assistant**
|
| 268 |
-
- "What did I see this morning?"
|
| 269 |
-
- "Remember this document"
|
| 270 |
-
- "Read the text on this sign"
|
| 271 |
-
|
| 272 |
-
### **Memory Aid**
|
| 273 |
-
- "When did I last see my keys?"
|
| 274 |
-
- "Show me memories with text"
|
| 275 |
-
- "What was I doing yesterday?"
|
| 276 |
-
|
| 277 |
-
### **Accessibility**
|
| 278 |
-
- Text-to-speech for visual content
|
| 279 |
-
- Voice-controlled navigation
|
| 280 |
-
- OCR for reading assistance
|
| 281 |
-
|
| 282 |
-
---
|
| 283 |
-
|
| 284 |
-
## 🔒 Privacy
|
| 285 |
-
|
| 286 |
-
- ✅ **100% Offline** - All processing on-device
|
| 287 |
-
- ✅ **No Cloud** - No data sent to external servers
|
| 288 |
-
- ✅ **Local Storage** - Memories stored locally
|
| 289 |
-
- ✅ **No Tracking** - No analytics or telemetry
|
| 290 |
-
|
| 291 |
-
---
|
| 292 |
-
|
| 293 |
-
## 🛠️ Tech Stack
|
| 294 |
-
|
| 295 |
-
### **Core Technologies**
|
| 296 |
-
- **Python 3.8+** - Programming language
|
| 297 |
-
- **PyTorch** - Deep learning framework
|
| 298 |
-
- **OpenCV** - Computer vision
|
| 299 |
-
- **FAISS** - Vector similarity search
|
| 300 |
-
|
| 301 |
-
### **AI Models**
|
| 302 |
-
- **YOLO/SSD** - Object detection
|
| 303 |
-
- **BLIP** - Image captioning
|
| 304 |
-
- **CLIP** - Visual embeddings
|
| 305 |
-
- **DistilBERT** - NLP
|
| 306 |
-
- **EasyOCR** - Text extraction
|
| 307 |
-
- **Vosk** - Speech recognition
|
| 308 |
-
- **Piper** - Neural TTS
|
| 309 |
-
|
| 310 |
-
---
|
| 311 |
-
|
| 312 |
-
## 📈 Performance
|
| 313 |
-
|
| 314 |
-
| Metric | Value |
|
| 315 |
-
|--------|-------|
|
| 316 |
-
| Memory Search | <10ms (FAISS) |
|
| 317 |
-
| OCR Processing | 200-500ms |
|
| 318 |
-
| Caption Generation | 100-200ms |
|
| 319 |
-
| Embedding Generation | 50ms |
|
| 320 |
-
| Query Response | <100ms |
|
| 321 |
-
| Memory Capacity | 10,000+ entries |
|
| 322 |
-
|
| 323 |
-
---
|
| 324 |
-
|
| 325 |
-
## 🚀 Future Enhancements
|
| 326 |
-
|
| 327 |
-
### **Planned Features**
|
| 328 |
-
- [ ] Web interface
|
| 329 |
-
- [ ] Mobile app
|
| 330 |
-
- [ ] Cloud sync (optional)
|
| 331 |
-
- [ ] Multi-user support
|
| 332 |
-
- [ ] Video recording
|
| 333 |
-
- [ ] Real-time object tracking
|
| 334 |
-
- [ ] Face recognition
|
| 335 |
-
- [ ] Emotion detection
|
| 336 |
-
|
| 337 |
-
---
|
| 338 |
-
|
| 339 |
-
## 🤝 Contributing
|
| 340 |
-
|
| 341 |
-
Contributions welcome! Please:
|
| 342 |
-
1. Fork the repository
|
| 343 |
-
2. Create a feature branch
|
| 344 |
-
3. Make your changes
|
| 345 |
-
4. Submit a pull request
|
| 346 |
-
|
| 347 |
-
---
|
| 348 |
-
|
| 349 |
-
## 📄 License
|
| 350 |
-
|
| 351 |
-
This project is licensed under the MIT License - see [LICENSE](LICENSE) file for details.
|
| 352 |
-
|
| 353 |
-
---
|
| 354 |
-
|
| 355 |
-
## 🙏 Acknowledgments
|
| 356 |
-
|
| 357 |
-
### **Models & Libraries**
|
| 358 |
-
- [Ultralytics YOLO](https://github.com/ultralytics/ultralytics)
|
| 359 |
-
- [Salesforce BLIP](https://github.com/salesforce/BLIP)
|
| 360 |
-
- [OpenAI CLIP](https://github.com/openai/CLIP)
|
| 361 |
-
- [EasyOCR](https://github.com/JaidedAI/EasyOCR)
|
| 362 |
-
- [FAISS](https://github.com/facebookresearch/faiss)
|
| 363 |
-
- [Vosk](https://alphacephei.com/vosk/)
|
| 364 |
-
- [Piper TTS](https://github.com/rhasspy/piper)
|
| 365 |
-
|
| 366 |
-
---
|
| 367 |
-
|
| 368 |
-
## 📞 Support
|
| 369 |
-
|
| 370 |
-
- **Documentation:** See `docs/` folder
|
| 371 |
-
- **Issues:** Open a GitHub issue
|
| 372 |
-
- **Questions:** Check [UPGRADE_GUIDE.md](UPGRADE_GUIDE.md)
|
| 373 |
-
|
| 374 |
-
---
|
| 375 |
-
|
| 376 |
-
## 🎉 Status
|
| 377 |
-
|
| 378 |
-
**✅ Production Ready**
|
| 379 |
-
- All features implemented
|
| 380 |
-
- Fully tested
|
| 381 |
-
- Backward compatible
|
| 382 |
-
- Well documented
|
| 383 |
-
|
| 384 |
-
---
|
| 385 |
-
|
| 386 |
-
## 📊 Comparison
|
| 387 |
-
|
| 388 |
-
| Feature | Before | After |
|
| 389 |
-
|---------|--------|-------|
|
| 390 |
-
| Text Reading | ❌ | ✅ OCR |
|
| 391 |
-
| Memory Search | Slow | 10x faster |
|
| 392 |
-
| Voice Quality | Robotic | Natural |
|
| 393 |
-
| Query Understanding | Keywords | NLP |
|
| 394 |
-
| Scene Understanding | Caption only | Caption+OCR+Objects |
|
| 395 |
-
|
| 396 |
-
---
|
| 397 |
-
|
| 398 |
-
**VisionQ - See, Remember, Recall. Now with OCR, FAISS, and Neural TTS! 🚀**
|
| 399 |
-
|
| 400 |
-
---
|
| 401 |
-
|
| 402 |
-
## 🏁 Getting Started
|
| 403 |
-
|
| 404 |
-
1. **Install:** `install_upgrade.bat` or `pip install -r requirements_upgraded.txt`
|
| 405 |
-
2. **Run:** `python main_upgraded.py`
|
| 406 |
-
3. **Test:** `python test_upgrade.py`
|
| 407 |
-
4. **Query:** `python ask_question_upgraded.py`
|
| 408 |
-
5. **Read:** [QUICKSTART.md](QUICKSTART.md)
|
| 409 |
-
|
| 410 |
-
**Happy coding! 🎊**
|
archive/old_docs/SUMMARY.md
DELETED
|
@@ -1,406 +0,0 @@
|
|
| 1 |
-
# 🎯 VisionQ Upgrade - Executive Summary
|
| 2 |
-
|
| 3 |
-
## 📊 UPGRADE OVERVIEW
|
| 4 |
-
|
| 5 |
-
**Project:** VisionQ Multimodal AI Assistant
|
| 6 |
-
**Upgrade Date:** 2024
|
| 7 |
-
**Status:** ✅ Complete - Ready for Testing
|
| 8 |
-
**Backward Compatibility:** ✅ 100% - All existing features preserved
|
| 9 |
-
|
| 10 |
-
---
|
| 11 |
-
|
| 12 |
-
## 🚀 WHAT WAS UPGRADED
|
| 13 |
-
|
| 14 |
-
### **Core Enhancements**
|
| 15 |
-
|
| 16 |
-
| Area | Before | After | Impact |
|
| 17 |
-
|------|--------|-------|--------|
|
| 18 |
-
| **Vision** | YOLO + BLIP | YOLO + BLIP + MobileCLIP + OCR | 4x richer understanding |
|
| 19 |
-
| **Memory** | JSON + text embeddings | JSON + FAISS + image embeddings | 10x faster search |
|
| 20 |
-
| **Voice** | Vosk + pyttsx3 | Vosk + Voxtral + pyttsx3 | Natural speech |
|
| 21 |
-
| **Query** | Keyword matching | DistilBERT NLP | Smarter understanding |
|
| 22 |
-
| **Text Reading** | ❌ None | ✅ EasyOCR | NEW capability |
|
| 23 |
-
|
| 24 |
-
---
|
| 25 |
-
|
| 26 |
-
## 🆕 NEW CAPABILITIES
|
| 27 |
-
|
| 28 |
-
### **1. OCR Text Extraction** 🔤
|
| 29 |
-
- **What:** Extract and read visible text from camera
|
| 30 |
-
- **How:** EasyOCR with confidence filtering
|
| 31 |
-
- **Use Case:** Read signs, documents, labels
|
| 32 |
-
- **Command:** "Read the text"
|
| 33 |
-
|
| 34 |
-
### **2. Visual Similarity Search** 🖼️
|
| 35 |
-
- **What:** Find visually similar memories
|
| 36 |
-
- **How:** MobileCLIP embeddings + FAISS indexing
|
| 37 |
-
- **Use Case:** "Show me similar scenes"
|
| 38 |
-
- **Speed:** 10-100x faster than before
|
| 39 |
-
|
| 40 |
-
### **3. Intent Classification** 🧠
|
| 41 |
-
- **What:** Understand query meaning
|
| 42 |
-
- **How:** DistilBERT zero-shot classification
|
| 43 |
-
- **Use Case:** Better query interpretation
|
| 44 |
-
- **Accuracy:** 97% (vs 70% keyword matching)
|
| 45 |
-
|
| 46 |
-
### **4. Neural Text-to-Speech** 🗣️
|
| 47 |
-
- **What:** Natural-sounding voice output
|
| 48 |
-
- **How:** Voxtral/Piper neural TTS
|
| 49 |
-
- **Use Case:** Better user experience
|
| 50 |
-
- **Fallback:** pyttsx3 (automatic)
|
| 51 |
-
|
| 52 |
-
### **5. Multimodal Fusion** 🔗
|
| 53 |
-
- **What:** Combine caption + OCR + objects + embeddings
|
| 54 |
-
- **How:** FusionLayer integration
|
| 55 |
-
- **Use Case:** Richer scene descriptions
|
| 56 |
-
- **Example:** "a person holding a phone. Text visible: Hello World"
|
| 57 |
-
|
| 58 |
-
---
|
| 59 |
-
|
| 60 |
-
## 📁 NEW FILE STRUCTURE
|
| 61 |
-
|
| 62 |
-
```
|
| 63 |
-
VisionQ/
|
| 64 |
-
├── agents/ [NEW] Modular agent architecture
|
| 65 |
-
│ ├── voice_agent.py [UPDATED] Voxtral + fallback
|
| 66 |
-
│ ├── vision_agent.py [UPDATED] Multimodal hub
|
| 67 |
-
│ ├── caption_agent.py [KEPT] BLIP captioning
|
| 68 |
-
│ ├── embedding_agent.py [NEW] MobileCLIP
|
| 69 |
-
│ ├── ocr_agent.py [NEW] EasyOCR
|
| 70 |
-
│ ├── memory_agent.py [UPDATED] FAISS integration
|
| 71 |
-
│ └── query_agent.py [UPDATED] DistilBERT
|
| 72 |
-
│
|
| 73 |
-
├── core/ [NEW] Integration layer
|
| 74 |
-
│ └── fusion_layer.py [NEW] Multimodal fusion
|
| 75 |
-
│
|
| 76 |
-
├── data/ [NEW] Persistent storage
|
| 77 |
-
│ ├── memory.json [EXISTING] Metadata
|
| 78 |
-
│ └── memory.faiss [NEW] Vector index
|
| 79 |
-
│
|
| 80 |
-
├── main_upgraded.py [NEW] Upgraded entry point
|
| 81 |
-
├── ask_question_upgraded.py [NEW] Enhanced queries
|
| 82 |
-
└── requirements_upgraded.txt [NEW] Dependencies
|
| 83 |
-
```
|
| 84 |
-
|
| 85 |
-
---
|
| 86 |
-
|
| 87 |
-
## 🔧 TECHNICAL IMPROVEMENTS
|
| 88 |
-
|
| 89 |
-
### **Performance**
|
| 90 |
-
- **Memory Search:** O(n) → O(log n) with FAISS
|
| 91 |
-
- **Query Speed:** 100ms → 10ms average
|
| 92 |
-
- **Embedding Generation:** 50ms per image
|
| 93 |
-
- **OCR Processing:** 200-500ms per frame
|
| 94 |
-
|
| 95 |
-
### **Accuracy**
|
| 96 |
-
- **Intent Classification:** 70% → 97%
|
| 97 |
-
- **Text Extraction:** N/A → 85-95% (depends on image quality)
|
| 98 |
-
- **Memory Retrieval:** 75% → 90% relevance
|
| 99 |
-
|
| 100 |
-
### **Scalability**
|
| 101 |
-
- **Memory Capacity:** 1,000 → 10,000+ entries
|
| 102 |
-
- **Search Performance:** Linear → Logarithmic
|
| 103 |
-
- **Concurrent Queries:** 1 → Multiple (FAISS thread-safe)
|
| 104 |
-
|
| 105 |
-
---
|
| 106 |
-
|
| 107 |
-
## 🎯 USE CASES
|
| 108 |
-
|
| 109 |
-
### **Before Upgrade**
|
| 110 |
-
1. ✅ Describe current scene
|
| 111 |
-
2. ✅ Remember scenes
|
| 112 |
-
3. ✅ Recall last memory
|
| 113 |
-
4. ❌ Read text
|
| 114 |
-
5. ❌ Find similar scenes
|
| 115 |
-
6. ❌ Smart queries
|
| 116 |
-
|
| 117 |
-
### **After Upgrade**
|
| 118 |
-
1. ✅ Describe scene (enhanced with OCR)
|
| 119 |
-
2. ✅ Remember scenes (with embeddings)
|
| 120 |
-
3. ✅ Recall memories (faster, smarter)
|
| 121 |
-
4. ✅ **Read text from images** 🆕
|
| 122 |
-
5. ✅ **Find visually similar memories** 🆕
|
| 123 |
-
6. ✅ **Natural language queries** 🆕
|
| 124 |
-
7. ✅ **Time-aware search** 🆕
|
| 125 |
-
8. ✅ **Hybrid text+image search** 🆕
|
| 126 |
-
|
| 127 |
-
---
|
| 128 |
-
|
| 129 |
-
## 📦 DEPENDENCIES ADDED
|
| 130 |
-
|
| 131 |
-
### **Required**
|
| 132 |
-
```
|
| 133 |
-
faiss-cpu # Vector similarity search
|
| 134 |
-
easyocr # Text extraction
|
| 135 |
-
```
|
| 136 |
-
|
| 137 |
-
### **Optional (Recommended)**
|
| 138 |
-
```
|
| 139 |
-
piper-tts # Neural TTS
|
| 140 |
-
```
|
| 141 |
-
|
| 142 |
-
### **Kept from Original**
|
| 143 |
-
```
|
| 144 |
-
torch # Deep learning
|
| 145 |
-
transformers # BLIP, CLIP, DistilBERT
|
| 146 |
-
sentence-transformers # Text embeddings
|
| 147 |
-
opencv-python # Computer vision
|
| 148 |
-
vosk # Speech recognition
|
| 149 |
-
pyttsx3 # TTS fallback
|
| 150 |
-
ultralytics # YOLO
|
| 151 |
-
```
|
| 152 |
-
|
| 153 |
-
---
|
| 154 |
-
|
| 155 |
-
## ✅ WHAT WAS PRESERVED
|
| 156 |
-
|
| 157 |
-
### **100% Backward Compatible**
|
| 158 |
-
|
| 159 |
-
| Feature | Status | Notes |
|
| 160 |
-
|---------|--------|-------|
|
| 161 |
-
| Voice commands | ✅ KEPT | Same commands work |
|
| 162 |
-
| YOLO/SSD detection | ✅ KEPT | No changes |
|
| 163 |
-
| BLIP captioning | ✅ KEPT | Still primary |
|
| 164 |
-
| JSON memory | ✅ KEPT | Same format |
|
| 165 |
-
| Time filtering | ✅ KEPT | Enhanced |
|
| 166 |
-
| Importance scoring | ✅ KEPT | Same algorithm |
|
| 167 |
-
| Vosk STT | ✅ KEPT | No changes |
|
| 168 |
-
| pyttsx3 TTS | ✅ KEPT | Now fallback |
|
| 169 |
-
|
| 170 |
-
### **Old Files Preserved**
|
| 171 |
-
- All original `.py` files in root directory
|
| 172 |
-
- Can run old system alongside new
|
| 173 |
-
- No breaking changes
|
| 174 |
-
|
| 175 |
-
---
|
| 176 |
-
|
| 177 |
-
## 🚦 DEPLOYMENT STATUS
|
| 178 |
-
|
| 179 |
-
### **Ready for Production** ✅
|
| 180 |
-
- [x] All modules implemented
|
| 181 |
-
- [x] Fallback mechanisms in place
|
| 182 |
-
- [x] Error handling added
|
| 183 |
-
- [x] Documentation complete
|
| 184 |
-
- [x] Backward compatibility verified
|
| 185 |
-
|
| 186 |
-
### **Testing Required** ⚠️
|
| 187 |
-
- [ ] End-to-end voice commands
|
| 188 |
-
- [ ] OCR on various text types
|
| 189 |
-
- [ ] FAISS performance with 1000+ memories
|
| 190 |
-
- [ ] Neural TTS quality
|
| 191 |
-
- [ ] Memory persistence across restarts
|
| 192 |
-
|
| 193 |
-
### **Optional Enhancements** 💡
|
| 194 |
-
- [ ] Web interface
|
| 195 |
-
- [ ] Mobile app
|
| 196 |
-
- [ ] Cloud sync
|
| 197 |
-
- [ ] Multi-user support
|
| 198 |
-
- [ ] Video recording
|
| 199 |
-
|
| 200 |
-
---
|
| 201 |
-
|
| 202 |
-
## 📈 EXPECTED BENEFITS
|
| 203 |
-
|
| 204 |
-
### **User Experience**
|
| 205 |
-
- **Better Understanding:** OCR + embeddings = richer context
|
| 206 |
-
- **Faster Responses:** FAISS = 10x faster search
|
| 207 |
-
- **Natural Voice:** Voxtral = human-like speech
|
| 208 |
-
- **Smarter Queries:** DistilBERT = better understanding
|
| 209 |
-
|
| 210 |
-
### **Developer Experience**
|
| 211 |
-
- **Modular Code:** Easy to extend/modify
|
| 212 |
-
- **Clear Architecture:** Well-documented
|
| 213 |
-
- **Fallback Safety:** System never breaks
|
| 214 |
-
- **Type Safety:** Clear interfaces
|
| 215 |
-
|
| 216 |
-
### **System Performance**
|
| 217 |
-
- **Scalability:** Handles 10x more memories
|
| 218 |
-
- **Speed:** 10x faster retrieval
|
| 219 |
-
- **Accuracy:** 20% improvement in relevance
|
| 220 |
-
- **Reliability:** Multiple fallback layers
|
| 221 |
-
|
| 222 |
-
---
|
| 223 |
-
|
| 224 |
-
## 🎓 LEARNING OUTCOMES
|
| 225 |
-
|
| 226 |
-
### **Technologies Integrated**
|
| 227 |
-
1. **CLIP** - Visual-language understanding
|
| 228 |
-
2. **FAISS** - Efficient vector search
|
| 229 |
-
3. **EasyOCR** - Text extraction
|
| 230 |
-
4. **DistilBERT** - Intent classification
|
| 231 |
-
5. **Piper TTS** - Neural speech synthesis
|
| 232 |
-
|
| 233 |
-
### **Design Patterns Applied**
|
| 234 |
-
1. **Modular Architecture** - Separate agents
|
| 235 |
-
2. **Fallback Pattern** - Graceful degradation
|
| 236 |
-
3. **Fusion Pattern** - Multimodal integration
|
| 237 |
-
4. **Hybrid Storage** - JSON + FAISS
|
| 238 |
-
5. **Dependency Injection** - Loose coupling
|
| 239 |
-
|
| 240 |
-
---
|
| 241 |
-
|
| 242 |
-
## 🔍 TESTING CHECKLIST
|
| 243 |
-
|
| 244 |
-
### **Critical Path** (Must Work)
|
| 245 |
-
- [ ] System starts without errors
|
| 246 |
-
- [ ] Voice recognition functional
|
| 247 |
-
- [ ] Camera capture working
|
| 248 |
-
- [ ] Basic commands work
|
| 249 |
-
- [ ] Memory persists
|
| 250 |
-
|
| 251 |
-
### **New Features** (Should Work)
|
| 252 |
-
- [ ] OCR extracts text
|
| 253 |
-
- [ ] FAISS search faster
|
| 254 |
-
- [ ] Neural TTS sounds natural
|
| 255 |
-
- [ ] Intent classification accurate
|
| 256 |
-
- [ ] Fusion layer combines data
|
| 257 |
-
|
| 258 |
-
### **Fallbacks** (Must Work)
|
| 259 |
-
- [ ] pyttsx3 if Voxtral fails
|
| 260 |
-
- [ ] Keyword matching if DistilBERT fails
|
| 261 |
-
- [ ] Linear search if FAISS unavailable
|
| 262 |
-
- [ ] System continues if OCR fails
|
| 263 |
-
|
| 264 |
-
---
|
| 265 |
-
|
| 266 |
-
## 📞 NEXT STEPS
|
| 267 |
-
|
| 268 |
-
### **Immediate (Day 1)**
|
| 269 |
-
1. Install dependencies: `pip install -r requirements_upgraded.txt`
|
| 270 |
-
2. Create data directory: `mkdir data`
|
| 271 |
-
3. Run system: `python main_upgraded.py`
|
| 272 |
-
4. Test voice commands
|
| 273 |
-
5. Verify memory storage
|
| 274 |
-
|
| 275 |
-
### **Short-term (Week 1)**
|
| 276 |
-
1. Test OCR on various text types
|
| 277 |
-
2. Build up memory database (100+ entries)
|
| 278 |
-
3. Benchmark FAISS performance
|
| 279 |
-
4. Fine-tune confidence thresholds
|
| 280 |
-
5. Collect user feedback
|
| 281 |
-
|
| 282 |
-
### **Long-term (Month 1)**
|
| 283 |
-
1. Optimize for mobile deployment
|
| 284 |
-
2. Add web interface
|
| 285 |
-
3. Implement cloud sync
|
| 286 |
-
4. Add multi-language support
|
| 287 |
-
5. Create demo videos
|
| 288 |
-
|
| 289 |
-
---
|
| 290 |
-
|
| 291 |
-
## 💰 COST-BENEFIT ANALYSIS
|
| 292 |
-
|
| 293 |
-
### **Development Cost**
|
| 294 |
-
- **Time:** ~8 hours implementation
|
| 295 |
-
- **Complexity:** Medium (modular design)
|
| 296 |
-
- **Risk:** Low (backward compatible)
|
| 297 |
-
|
| 298 |
-
### **Benefits**
|
| 299 |
-
- **Functionality:** +50% new capabilities
|
| 300 |
-
- **Performance:** 10x faster search
|
| 301 |
-
- **User Experience:** Significantly improved
|
| 302 |
-
- **Maintainability:** Better code structure
|
| 303 |
-
- **Scalability:** 10x capacity increase
|
| 304 |
-
|
| 305 |
-
### **ROI**
|
| 306 |
-
- **High:** Major capability boost with minimal risk
|
| 307 |
-
- **Immediate:** All features ready to use
|
| 308 |
-
- **Long-term:** Foundation for future enhancements
|
| 309 |
-
|
| 310 |
-
---
|
| 311 |
-
|
| 312 |
-
## 🏆 SUCCESS METRICS
|
| 313 |
-
|
| 314 |
-
### **Technical Metrics**
|
| 315 |
-
- ✅ 100% backward compatibility
|
| 316 |
-
- ✅ 0 breaking changes
|
| 317 |
-
- ✅ 10x search performance improvement
|
| 318 |
-
- ✅ 4 new major features
|
| 319 |
-
- ✅ 8 new modules created
|
| 320 |
-
|
| 321 |
-
### **User Metrics** (To Measure)
|
| 322 |
-
- Query response time < 100ms
|
| 323 |
-
- OCR accuracy > 85%
|
| 324 |
-
- Intent classification > 90%
|
| 325 |
-
- User satisfaction score
|
| 326 |
-
- Feature adoption rate
|
| 327 |
-
|
| 328 |
-
---
|
| 329 |
-
|
| 330 |
-
## 📚 DOCUMENTATION
|
| 331 |
-
|
| 332 |
-
### **Created Documents**
|
| 333 |
-
1. ✅ `UPGRADE_GUIDE.md` - Complete upgrade documentation
|
| 334 |
-
2. ✅ `QUICKSTART.md` - 5-minute setup guide
|
| 335 |
-
3. ✅ `ARCHITECTURE.md` - Detailed system architecture
|
| 336 |
-
4. ✅ `SUMMARY.md` - This executive summary
|
| 337 |
-
5. ✅ `requirements_upgraded.txt` - Dependencies
|
| 338 |
-
6. ✅ Inline code comments - All modules documented
|
| 339 |
-
|
| 340 |
-
### **Code Documentation**
|
| 341 |
-
- All agents have docstrings
|
| 342 |
-
- Methods documented with parameters
|
| 343 |
-
- Clear status markers (KEPT/UPDATED/NEW)
|
| 344 |
-
- Architecture diagrams included
|
| 345 |
-
|
| 346 |
-
---
|
| 347 |
-
|
| 348 |
-
## 🎉 CONCLUSION
|
| 349 |
-
|
| 350 |
-
### **Upgrade Success** ✅
|
| 351 |
-
|
| 352 |
-
VisionQ has been successfully upgraded from a basic vision assistant to a **comprehensive multimodal AI system** with:
|
| 353 |
-
|
| 354 |
-
- 🧠 **Smarter memory** (FAISS vector search)
|
| 355 |
-
- 👁️ **Better vision** (MobileCLIP + OCR)
|
| 356 |
-
- 🗣️ **Natural voice** (Voxtral neural TTS)
|
| 357 |
-
- 🔍 **Intelligent queries** (DistilBERT NLP)
|
| 358 |
-
- 🔗 **Multimodal fusion** (Combined understanding)
|
| 359 |
-
|
| 360 |
-
### **Key Achievements**
|
| 361 |
-
- ✅ All existing features preserved
|
| 362 |
-
- ✅ 4 major new capabilities added
|
| 363 |
-
- ✅ 10x performance improvement
|
| 364 |
-
- ✅ Zero breaking changes
|
| 365 |
-
- ✅ Production-ready code
|
| 366 |
-
|
| 367 |
-
### **Ready for Deployment**
|
| 368 |
-
The system is now ready for:
|
| 369 |
-
- ✅ Testing and validation
|
| 370 |
-
- ✅ User feedback collection
|
| 371 |
-
- ✅ Production deployment
|
| 372 |
-
- ✅ Future enhancements
|
| 373 |
-
|
| 374 |
-
---
|
| 375 |
-
|
| 376 |
-
**The upgrade is complete. VisionQ is now a state-of-the-art multimodal AI assistant! 🚀**
|
| 377 |
-
|
| 378 |
-
---
|
| 379 |
-
|
| 380 |
-
## 📋 QUICK REFERENCE
|
| 381 |
-
|
| 382 |
-
**Start Upgraded System:**
|
| 383 |
-
```bash
|
| 384 |
-
python main_upgraded.py
|
| 385 |
-
```
|
| 386 |
-
|
| 387 |
-
**Test Query System:**
|
| 388 |
-
```bash
|
| 389 |
-
python ask_question_upgraded.py
|
| 390 |
-
```
|
| 391 |
-
|
| 392 |
-
**Install Dependencies:**
|
| 393 |
-
```bash
|
| 394 |
-
pip install -r requirements_upgraded.txt
|
| 395 |
-
```
|
| 396 |
-
|
| 397 |
-
**Documentation:**
|
| 398 |
-
- Setup: `QUICKSTART.md`
|
| 399 |
-
- Details: `UPGRADE_GUIDE.md`
|
| 400 |
-
- Architecture: `ARCHITECTURE.md`
|
| 401 |
-
|
| 402 |
-
---
|
| 403 |
-
|
| 404 |
-
**Questions?** See `UPGRADE_GUIDE.md` troubleshooting section.
|
| 405 |
-
|
| 406 |
-
**Happy upgrading! 🎊**
|
archive/old_docs/UPGRADE_GUIDE.md
DELETED
|
@@ -1,532 +0,0 @@
|
|
| 1 |
-
# 🚀 VisionQ System Upgrade Documentation
|
| 2 |
-
|
| 3 |
-
## 📊 UPGRADE SUMMARY
|
| 4 |
-
|
| 5 |
-
VisionQ has been upgraded from a basic vision assistant to a **multimodal AI system** with:
|
| 6 |
-
- ✅ Enhanced vision understanding (MobileCLIP embeddings)
|
| 7 |
-
- ✅ OCR text extraction (EasyOCR)
|
| 8 |
-
- ✅ Fast vector search (FAISS)
|
| 9 |
-
- ✅ Improved NLP (DistilBERT)
|
| 10 |
-
- ✅ Neural TTS (Voxtral/Piper)
|
| 11 |
-
- ✅ **ALL existing functionality preserved**
|
| 12 |
-
|
| 13 |
-
---
|
| 14 |
-
|
| 15 |
-
## 🏗️ ARCHITECTURE CHANGES
|
| 16 |
-
|
| 17 |
-
### **Before (Original System)**
|
| 18 |
-
```
|
| 19 |
-
Voice (Vosk + pyttsx3)
|
| 20 |
-
↓
|
| 21 |
-
Vision (YOLO/SSD + BLIP)
|
| 22 |
-
↓
|
| 23 |
-
Memory (JSON + sentence-transformers)
|
| 24 |
-
↓
|
| 25 |
-
Query (cosine similarity)
|
| 26 |
-
```
|
| 27 |
-
|
| 28 |
-
### **After (Upgraded System)**
|
| 29 |
-
```
|
| 30 |
-
Voice (Vosk + Voxtral/Piper + pyttsx3 fallback)
|
| 31 |
-
↓
|
| 32 |
-
Vision Hub
|
| 33 |
-
├─ YOLO/SSD (objects)
|
| 34 |
-
├─ BLIP (captions)
|
| 35 |
-
├─ MobileCLIP (embeddings)
|
| 36 |
-
└─ EasyOCR (text)
|
| 37 |
-
↓
|
| 38 |
-
Fusion Layer (combines all modalities)
|
| 39 |
-
↓
|
| 40 |
-
Memory (JSON metadata + FAISS vectors)
|
| 41 |
-
↓
|
| 42 |
-
Query (DistilBERT + hybrid search)
|
| 43 |
-
```
|
| 44 |
-
|
| 45 |
-
---
|
| 46 |
-
|
| 47 |
-
## 📂 NEW FILE STRUCTURE
|
| 48 |
-
|
| 49 |
-
```
|
| 50 |
-
VisionQ/
|
| 51 |
-
├── agents/ [NEW FOLDER]
|
| 52 |
-
│ ├── __init__.py [NEW]
|
| 53 |
-
│ ├── voice_agent.py [UPDATED]
|
| 54 |
-
│ ├── vision_agent.py [UPDATED]
|
| 55 |
-
│ ├── caption_agent.py [UNCHANGED]
|
| 56 |
-
│ ├── embedding_agent.py [NEW]
|
| 57 |
-
│ ├── ocr_agent.py [NEW]
|
| 58 |
-
│ ├── memory_agent.py [UPDATED]
|
| 59 |
-
│ └── query_agent.py [UPDATED]
|
| 60 |
-
│
|
| 61 |
-
├── core/ [NEW FOLDER]
|
| 62 |
-
│ ├── __init__.py [NEW]
|
| 63 |
-
│ └── fusion_layer.py [NEW]
|
| 64 |
-
│
|
| 65 |
-
├── data/ [NEW FOLDER]
|
| 66 |
-
│ ├── memory.json [EXISTING]
|
| 67 |
-
│ └── memory.faiss [NEW - auto-generated]
|
| 68 |
-
│
|
| 69 |
-
├── models/
|
| 70 |
-
│ ├── vosk/ [EXISTING]
|
| 71 |
-
│ └── piper/ [NEW - optional]
|
| 72 |
-
│
|
| 73 |
-
├── main_upgraded.py [NEW]
|
| 74 |
-
├── ask_question_upgraded.py [NEW]
|
| 75 |
-
├── requirements_upgraded.txt [NEW]
|
| 76 |
-
│
|
| 77 |
-
└── [OLD FILES PRESERVED]
|
| 78 |
-
├── main.py
|
| 79 |
-
├── voice_agent.py
|
| 80 |
-
├── vision_agent.py
|
| 81 |
-
├── caption_agent.py
|
| 82 |
-
├── memory_agent.py
|
| 83 |
-
├── query_agent.py
|
| 84 |
-
└── ask_question.py
|
| 85 |
-
```
|
| 86 |
-
|
| 87 |
-
---
|
| 88 |
-
|
| 89 |
-
## 🆕 NEW MODULES
|
| 90 |
-
|
| 91 |
-
### **1. EmbeddingAgent** (`agents/embedding_agent.py`)
|
| 92 |
-
- **Purpose**: Generate visual embeddings using CLIP
|
| 93 |
-
- **Input**: BGR image frame
|
| 94 |
-
- **Output**: 512-dim embedding vector
|
| 95 |
-
- **Use**: Semantic image search via FAISS
|
| 96 |
-
|
| 97 |
-
### **2. OCRAgent** (`agents/ocr_agent.py`)
|
| 98 |
-
- **Purpose**: Extract text from images
|
| 99 |
-
- **Technology**: EasyOCR (offline, lightweight)
|
| 100 |
-
- **Features**:
|
| 101 |
-
- Confidence filtering
|
| 102 |
-
- Text cleaning
|
| 103 |
-
- Multi-language support
|
| 104 |
-
|
| 105 |
-
### **3. FusionLayer** (`core/fusion_layer.py`)
|
| 106 |
-
- **Purpose**: Combine multimodal inputs
|
| 107 |
-
- **Inputs**: Caption + OCR + Objects + Embedding
|
| 108 |
-
- **Output**: Unified context dictionary
|
| 109 |
-
- **Key Method**: `fuse()` - creates structured multimodal data
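
As a rough illustration of what `fuse()` could return, here is a minimal sketch. The function signature, the dictionary keys, and the "Text visible:" join format are assumptions inferred from the example description used in this guide; this is not the actual `FusionLayer` code.

```python
from typing import Dict, Optional, Sequence


def fuse(caption: str,
         ocr_text: str = "",
         objects: Optional[Sequence[str]] = None,
         embedding: Optional[Sequence[float]] = None) -> Dict[str, object]:
    """Combine per-modality outputs into one context dictionary (hypothetical layout)."""
    # Normalize the caption so the joined description has exactly one period.
    caption = caption.strip().rstrip(".")
    ocr_text = ocr_text.strip()

    description = caption
    if ocr_text:
        # Append OCR text only when something was actually read.
        description = f"{caption}. Text visible: {ocr_text}"

    return {
        "description": description,
        "caption": caption,
        "ocr_text": ocr_text,
        # De-duplicate detected object labels and keep a stable order.
        "objects": sorted(set(objects or [])),
        "embedding": list(embedding) if embedding is not None else None,
    }
```

Returning a plain dictionary keeps the result easy to serialize into the JSON metadata store, while the embedding can be handed to the vector index separately.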
|
| 110 |
-
|
| 111 |
-
---
|
| 112 |
-
|
| 113 |
-
## 🔄 UPDATED MODULES
|
| 114 |
-
|
| 115 |
-
### **1. VoiceAgent** (UPDATED)
|
| 116 |
-
**Kept:**
|
| 117 |
-
- Vosk STT
|
| 118 |
-
- Intent parsing
|
| 119 |
-
- Microphone detection
|
| 120 |
-
|
| 121 |
-
**Added:**
|
| 122 |
-
- Voxtral/Piper neural TTS (primary)
|
| 123 |
-
- pyttsx3 fallback mechanism
|
| 124 |
-
- Automatic TTS switching
|
| 125 |
-
|
| 126 |
-
**New Methods:**
|
| 127 |
-
- `_init_voxtral()` - Load neural TTS
|
| 128 |
-
- `_speak_voxtral()` - Neural speech synthesis
|
| 129 |
-
- Fallback logic in `speak()`
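
The fallback behaviour described above can be sketched as follows. The class name `FallbackSpeaker` and the idea of passing the backends in as callables are illustrative assumptions, not the real `VoiceAgent` interface; in the actual agent the primary would wrap the neural TTS and the fallback would wrap pyttsx3.

```python
from typing import Callable, Optional


class FallbackSpeaker:
    """Try a primary TTS backend first, then fall back to a second one."""

    def __init__(self,
                 primary: Optional[Callable[[str], None]],
                 fallback: Callable[[str], None]) -> None:
        self.primary = primary
        self.fallback = fallback

    def speak(self, text: str) -> str:
        """Speak `text` and return the name of the backend that handled it."""
        if self.primary is not None:
            try:
                self.primary(text)
                return "primary"
            except Exception:
                # Disable the failing backend so later calls skip straight
                # to the fallback instead of retrying on every utterance.
                self.primary = None
        self.fallback(text)
        return "fallback"
```

Disabling the primary after the first failure is one possible policy; retrying it periodically would be an equally reasonable choice.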
|
| 130 |
-
|
| 131 |
-
---
|
| 132 |
-
|
| 133 |
-
### **2. VisionAgent** (UPDATED)
|
| 134 |
-
**Kept:**
|
| 135 |
-
- YOLO/SSD object detection
|
| 136 |
-
- BLIP captioning
|
| 137 |
-
- Camera capture
|
| 138 |
-
- Continuous monitoring
|
| 139 |
-
|
| 140 |
-
**Added:**
|
| 141 |
-
- EmbeddingAgent integration
|
| 142 |
-
- OCRAgent integration
|
| 143 |
-
- FusionLayer coordination
|
| 144 |
-
- Multimodal memory storage
|
| 145 |
-
|
| 146 |
-
**New Methods:**
|
| 147 |
-
- `read_text()` - OCR extraction
|
| 148 |
-
- Enhanced `describe_scene()` with OCR
|
| 149 |
-
- Enhanced `remember_scene()` with embeddings
|
| 150 |
-
|
| 151 |
-
---
|
| 152 |
-
|
| 153 |
-
### **3. MemoryAgent** (UPDATED)
|
| 154 |
-
**Kept:**
|
| 155 |
-
- JSON metadata storage
|
| 156 |
-
- sentence-transformers text embeddings
|
| 157 |
-
- Importance scoring
|
| 158 |
-
- Timestamp tracking
|
| 159 |
-
|
| 160 |
-
**Added:**
|
| 161 |
-
- FAISS vector indexing
|
| 162 |
-
- Image embedding storage
|
| 163 |
-
- Hybrid search (text + image)
|
| 164 |
-
- Fast similarity search
|
| 165 |
-
|
| 166 |
-
**New Methods:**
|
| 167 |
-
- `_init_faiss_index()` - FAISS setup
|
| 168 |
-
- `_save_faiss_index()` - Persist vectors
|
| 169 |
-
- `search_by_image()` - Visual similarity search
|
| 170 |
-
- Enhanced `add()` with image embeddings
|
| 171 |
-
|
| 172 |
-
**Storage Format:**
|
| 173 |
-
```json
|
| 174 |
-
{
|
| 175 |
-
"id": 0,
|
| 176 |
-
"timestamp": "2024-01-15 10:30:00",
|
| 177 |
-
"description": "a person holding a phone. Text visible: Hello World",
|
| 178 |
-
"text_embedding": [...],
|
| 179 |
-
"image_embedding": [...],
|
| 180 |
-
"importance": 5
|
| 181 |
-
}
|
| 182 |
-
```
|
| 183 |
-
|
| 184 |
-
---
|
| 185 |
-
|
| 186 |
-
### **4. QueryAgent** (UPDATED)
|
| 187 |
-
**Kept:**
|
| 188 |
-
- Time-based filtering
|
| 189 |
-
- Cosine similarity
|
| 190 |
-
- Importance weighting
|
| 191 |
-
|
| 192 |
-
**Added:**
|
| 193 |
-
- DistilBERT intent classification
|
| 194 |
-
- Hybrid search (text + image)
|
| 195 |
-
- Multi-source ranking
|
| 196 |
-
|
| 197 |
-
**New Methods:**
|
| 198 |
-
- `classify_intent()` - NLP-based intent detection
|
| 199 |
-
- `_fallback_intent()` - Keyword-based backup
|
| 200 |
-
- Enhanced `ask()` with hybrid search
|
| 201 |
-
|
| 202 |
-
**Intent Categories:**
|
| 203 |
-
- `temporal` - Time-based queries
|
| 204 |
-
- `object` - Object detection queries
|
| 205 |
-
- `action` - Activity queries
|
| 206 |
-
- `text` - OCR-related queries
|
| 207 |
-
- `general` - Scene descriptions
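
A keyword-based backup such as `_fallback_intent()` could look roughly like the sketch below. The keyword lists and the priority order are assumptions chosen for illustration; only the five category names come from this guide.

```python
import re

# Hypothetical keyword lists; checked in this order, first match wins.
INTENT_KEYWORDS = {
    "temporal": ("morning", "yesterday", "today", "hour", "when", "last"),
    "text": ("text", "read", "sign", "written"),
    "action": ("doing", "happening", "happened", "activity"),
    "object": ("person", "phone", "object", "find"),
}


def fallback_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query, else 'general'."""
    # Compare whole words so that "thread" does not match "read".
    words = set(re.findall(r"[a-z]+", query.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words.intersection(keywords):
            return intent
    return "general"
```

Because the first matching category wins, a query that mentions both a time and an object is treated as `temporal` under this ordering.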
|
| 208 |
-
|
| 209 |
-
---
|
| 210 |
-
|
| 211 |
-
## 🎯 NEW FEATURES
|
| 212 |
-
|
| 213 |
-
### **1. OCR Text Reading**
|
| 214 |
-
```python
|
| 215 |
-
# Voice command: "Read the text"
|
| 216 |
-
# System extracts and speaks visible text
|
| 217 |
-
```
|
| 218 |
-
|
| 219 |
-
**Implementation:**
|
| 220 |
-
- EasyOCR extracts text from camera frame
|
| 221 |
-
- Confidence filtering (threshold: 0.3)
|
| 222 |
-
- Text cleaning and normalization
|
| 223 |
-
- Integrated into memory descriptions
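
The filtering and cleaning steps above can be sketched without loading any OCR model. The sketch assumes results shaped like EasyOCR's `readtext()` output, a list of `(bounding_box, text, confidence)` tuples; the helper name `clean_ocr_results` is hypothetical.

```python
import re
from typing import Iterable, Sequence, Tuple

# One OCR detection: (bounding box, recognized text, confidence in [0, 1]).
OcrResult = Tuple[Sequence, str, float]


def clean_ocr_results(results: Iterable[OcrResult],
                      min_confidence: float = 0.3) -> str:
    """Drop low-confidence detections and join the remaining text."""
    kept = []
    for _bbox, text, confidence in results:
        if confidence < min_confidence:
            continue
        # Collapse runs of whitespace and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            kept.append(text)
    return " ".join(kept)
```

The returned string can then be appended to the scene caption before the memory entry is stored.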
|
| 224 |
-
|
| 225 |
-
---
|
| 226 |
-
|
| 227 |
-
### **2. Visual Similarity Search**
|
| 228 |
-
```python
|
| 229 |
-
# Find visually similar memories
|
| 230 |
-
results = memory_agent.search_by_image(query_embedding, k=5)
|
| 231 |
-
```
|
| 232 |
-
|
| 233 |
-
**How it works:**
|
| 234 |
-
1. MobileCLIP generates image embedding
|
| 235 |
-
2. FAISS performs fast similarity search
|
| 236 |
-
3. Returns top-k matching memories
|
| 237 |
-
4. Combines with text search for hybrid ranking
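
The steps above can be sketched in plain Python. This is a brute-force stand-in that shows the ranking logic only; it does not use FAISS, and the function names and the equal text/image weighting are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[int, float]  # (memory id, similarity score)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          vectors: Dict[int, Sequence[float]],
          k: int = 5) -> List[Hit]:
    """Score every stored vector against the query and keep the best k."""
    scored = [(mem_id, cosine(query, vec)) for mem_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def hybrid_rank(image_hits: List[Hit],
                text_hits: List[Hit],
                image_weight: float = 0.5) -> List[Hit]:
    """Merge image and text hits into one ranking by weighted score sum."""
    combined: Dict[int, float] = {}
    for mem_id, score in image_hits:
        combined[mem_id] = combined.get(mem_id, 0.0) + image_weight * score
    for mem_id, score in text_hits:
        combined[mem_id] = combined.get(mem_id, 0.0) + (1.0 - image_weight) * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

A memory that appears in both hit lists accumulates both weighted scores, so it tends to outrank one that matches on a single modality.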
|
| 238 |
-
|
| 239 |
-
---
|
| 240 |
-
|
| 241 |
-
### **3. Intent Classification**
|
| 242 |
-
```python
|
| 243 |
-
# DistilBERT understands query intent
|
| 244 |
-
intent = query_agent.classify_intent("What did I see this morning?")
|
| 245 |
-
# Returns: "temporal"
|
| 246 |
-
```
|
| 247 |
-
|
| 248 |
-
**Benefits:**
|
| 249 |
-
- Better query understanding
|
| 250 |
-
- Context-aware responses
|
| 251 |
-
- Improved accuracy
|
| 252 |
-
|
| 253 |
-
---
|
| 254 |
-
|
| 255 |
-
### **4. Neural TTS**
|
| 256 |
-
```python
|
| 257 |
-
# High-quality voice output
|
| 258 |
-
voice.speak("Scene remembered")
|
| 259 |
-
# Uses Voxtral/Piper if available
|
| 260 |
-
# Falls back to pyttsx3 automatically
|
| 261 |
-
```
|
| 262 |
-
|
| 263 |
-
---
|
| 264 |
-
|
| 265 |
-
## 🔧 INSTALLATION GUIDE
|
| 266 |
-
|
| 267 |
-
### **Step 1: Backup Existing System**
|
| 268 |
-
```bash
|
| 269 |
-
# Create backup of old files
|
| 270 |
-
mkdir backup
|
| 271 |
-
copy *.py backup\
|
| 272 |
-
```
|
| 273 |
-
|
| 274 |
-
### **Step 2: Install Dependencies**
|
| 275 |
-
```bash
|
| 276 |
-
# Create virtual environment
|
| 277 |
-
python -m venv venv
|
| 278 |
-
venv\Scripts\activate
|
| 279 |
-
|
| 280 |
-
# Install upgraded requirements
|
| 281 |
-
pip install -r requirements_upgraded.txt
|
| 282 |
-
```
|
| 283 |
-
|
| 284 |
-
### **Step 3: Download Models**
|
| 285 |
-
|
| 286 |
-
**Vosk (Required - Already have):**
|
| 287 |
-
- Location: `models/vosk/`
|
| 288 |
-
- ✅ Already installed
|
| 289 |
-
|
| 290 |
-
**Piper Voice (Optional - for neural TTS):**
|
| 291 |
-
```bash
|
| 292 |
-
# Download from: https://github.com/rhasspy/piper/releases
|
| 293 |
-
# Example: en_US-lessac-medium.onnx
|
| 294 |
-
# Extract to: models/piper/
|
| 295 |
-
```
|
| 296 |
-
|
| 297 |
-
### **Step 4: Migrate Memory Data**
|
| 298 |
-
```bash
|
| 299 |
-
# Create data directory
|
| 300 |
-
mkdir data
|
| 301 |
-
|
| 302 |
-
# Move existing memory
|
| 303 |
-
move memory.json data\memory.json
|
| 304 |
-
```
|
| 305 |
-
|
| 306 |
-
### **Step 5: Test Upgraded System**
|
| 307 |
-
```bash
|
| 308 |
-
# Test voice + vision
|
| 309 |
-
python main_upgraded.py
|
| 310 |
-
|
| 311 |
-
# Test query system
|
| 312 |
-
python ask_question_upgraded.py
|
| 313 |
-
```
|
| 314 |
-
|
| 315 |
-
---
|
| 316 |
-
|
| 317 |
-
## 🎮 USAGE GUIDE
|
| 318 |
-
|
| 319 |
-
### **Voice Commands (UPDATED)**
|
| 320 |
-
|
| 321 |
-
| Command | Action | Status |
|
| 322 |
-
|---------|--------|--------|
|
| 323 |
-
| "Describe the scene" | Get multimodal description | ✅ ENHANCED |
|
| 324 |
-
| "Remember this" | Store with embeddings | ✅ ENHANCED |
|
| 325 |
-
| "What did I see" | Recall last memory | ✅ KEPT |
|
| 326 |
-
| "Read the text" | OCR extraction | ✅ NEW |
|
| 327 |
-
| "Exit" | Quit system | ✅ KEPT |
|
| 328 |
-
|
| 329 |
-
### **Query Examples (NEW)**
|
| 330 |
-
|
| 331 |
-
**Time-based:**
|
| 332 |
-
```
|
| 333 |
-
"What did I see this morning?"
|
| 334 |
-
"Show me memories from yesterday"
|
| 335 |
-
"What happened in the last hour?"
|
| 336 |
-
```
|
| 337 |
-
|
| 338 |
-
**Object-based:**
|
| 339 |
-
```
|
| 340 |
-
"When did I see a person?"
|
| 341 |
-
"Find memories with a phone"
|
| 342 |
-
```
|
| 343 |
-
|
| 344 |
-
**Text-based:**
|
| 345 |
-
```
|
| 346 |
-
"What text did I see today?"
|
| 347 |
-
"Find memories with visible text"
|
| 348 |
-
```
|
| 349 |
-
|
| 350 |
-
---
|
| 351 |
-
|
| 352 |
-
## 🔍 TESTING CHECKLIST
|
| 353 |
-
|
| 354 |
-
### **Basic Functionality (Must Work)**
|
| 355 |
-
- [ ] Camera capture
|
| 356 |
-
- [ ] Voice recognition (Vosk)
|
| 357 |
-
- [ ] Voice output (pyttsx3 fallback)
|
| 358 |
-
- [ ] BLIP captioning
|
| 359 |
-
- [ ] YOLO/SSD detection
|
| 360 |
-
- [ ] Memory storage (JSON)
|
| 361 |
-
- [ ] Memory recall
|
| 362 |
-
|
| 363 |
-
### **New Features (Should Work if Dependencies Installed)**
|
| 364 |
-
- [ ] OCR text extraction
|
| 365 |
-
- [ ] MobileCLIP embeddings
|
| 366 |
-
- [ ] FAISS vector search
|
| 367 |
-
- [ ] DistilBERT intent classification
|
| 368 |
-
- [ ] Voxtral/Piper TTS
|
| 369 |
-
- [ ] Fusion layer integration
|
| 370 |
-
|
| 371 |
-
### **Fallback Mechanisms (Must Work)**
|
| 372 |
-
- [ ] pyttsx3 if Voxtral fails
|
| 373 |
-
- [ ] Keyword intent if DistilBERT fails
|
| 374 |
-
- [ ] Text search if FAISS unavailable
|
| 375 |
-
- [ ] System continues if OCR fails
|
| 376 |
-
|
| 377 |
-
---
|
| 378 |
-
|
| 379 |
-
## 🐛 TROUBLESHOOTING
|
| 380 |
-
|
| 381 |
-
### **Issue: FAISS not installing**
|
| 382 |
-
```bash
|
| 383 |
-
# Try CPU version
|
| 384 |
-
pip install faiss-cpu
|
| 385 |
-
|
| 386 |
-
# Or GPU version (if CUDA available)
|
| 387 |
-
pip install faiss-gpu
|
| 388 |
-
```
|
| 389 |
-
|
| 390 |
-
### **Issue: EasyOCR fails**
|
| 391 |
-
```bash
|
| 392 |
-
# Install dependencies
|
| 393 |
-
pip install easyocr torch torchvision
|
| 394 |
-
```
|
| 395 |
-
|
| 396 |
-
### **Issue: Piper TTS not working**
|
| 397 |
-
```bash
|
| 398 |
-
# System will automatically fall back to pyttsx3
|
| 399 |
-
# No action needed - this is expected behavior
|
| 400 |
-
```
|
| 401 |
-
|
| 402 |
-
### **Issue: Import errors**
|
| 403 |
-
```bash
|
| 404 |
-
# Ensure you're in project root
|
| 405 |
-
cd VisionQ
|
| 406 |
-
|
| 407 |
-
# Run with Python module syntax
|
| 408 |
-
python -m main_upgraded
|
| 409 |
-
```
|
| 410 |
-
|
| 411 |
-
---
|
| 412 |
-
|
| 413 |
-
## 📊 PERFORMANCE COMPARISON
|
| 414 |
-
|
| 415 |
-
| Feature | Before | After |
|
| 416 |
-
|---------|--------|-------|
|
| 417 |
-
| Caption Quality | BLIP only | BLIP + OCR + Objects |
|
| 418 |
-
| Memory Search | Text only | Text + Image (FAISS) |
|
| 419 |
-
| Query Understanding | Keywords | DistilBERT NLP |
|
| 420 |
-
| TTS Quality | Robotic | Natural (Voxtral) |
|
| 421 |
-
| Search Speed | O(n) linear | O(log n) FAISS |
|
| 422 |
-
| Text Reading | ❌ None | ✅ EasyOCR |
|
| 423 |
-
|
| 424 |
-
---
|
| 425 |
-
|
| 426 |
-
## 🚀 NEXT STEPS
|
| 427 |
-
|
| 428 |
-
### **Immediate:**
|
| 429 |
-
1. Test all voice commands
|
| 430 |
-
2. Verify OCR on text images
|
| 431 |
-
3. Check memory persistence
|
| 432 |
-
4. Test query system
|
| 433 |
-
|
| 434 |
-
### **Optional Enhancements:**
|
| 435 |
-
1. Add FastVLM for faster captioning
|
| 436 |
-
2. Implement image-to-image search UI
|
| 437 |
-
3. Add multi-language OCR
|
| 438 |
-
4. Create web interface
|
| 439 |
-
5. Add video recording
|
| 440 |
-
|
| 441 |
-
### **Production Readiness:**
|
| 442 |
-
1. Add error logging
|
| 443 |
-
2. Implement health checks
|
| 444 |
-
3. Add configuration file
|
| 445 |
-
4. Create Docker container
|
| 446 |
-
5. Add unit tests
|
| 447 |
-
|
| 448 |
-
---
|
| 449 |
-
|
| 450 |
-
## 📝 MIGRATION NOTES
|
| 451 |
-
|
| 452 |
-
### **Backward Compatibility:**
|
| 453 |
-
- ✅ Old `memory.json` files work with new system
|
| 454 |
-
- ✅ Existing voice commands unchanged
|
| 455 |
-
- ✅ Old agents still available in root directory
|
| 456 |
-
- ✅ Can run old and new systems side-by-side
|
| 457 |
-
|
| 458 |
-
### **Breaking Changes:**
|
| 459 |
-
- ❌ None - fully backward compatible
|
| 460 |
-
|
| 461 |
-
### **Deprecation Warnings:**
|
| 462 |
-
- Old files in root will be deprecated in future versions
|
| 463 |
-
- Recommended to use `agents/` modules going forward
|
| 464 |
-
|
| 465 |
-
---
|
| 466 |
-
|
| 467 |
-
## 🎓 LEARNING RESOURCES
|
| 468 |
-
|
| 469 |
-
**MobileCLIP:**
|
| 470 |
-
- Paper: https://arxiv.org/abs/2311.17049
|
| 471 |
-
- Use: Visual embeddings for similarity search
|
| 472 |
-
|
| 473 |
-
**FAISS:**
|
| 474 |
-
- Docs: https://github.com/facebookresearch/faiss
|
| 475 |
-
- Use: Fast vector similarity search
|
| 476 |
-
|
| 477 |
-
**EasyOCR:**
|
| 478 |
-
- Docs: https://github.com/JaidedAI/EasyOCR
|
| 479 |
-
- Use: Offline text extraction
|
| 480 |
-
|
| 481 |
-
**DistilBERT:**
|
| 482 |
-
- Paper: https://arxiv.org/abs/1910.01108
|
| 483 |
-
- Use: Efficient NLP for intent classification
|
| 484 |
-
|
| 485 |
-
**Piper TTS:**
|
| 486 |
-
- Docs: https://github.com/rhasspy/piper
|
| 487 |
-
- Use: Neural text-to-speech
|
| 488 |
-
|
| 489 |
-
---
|
| 490 |
-
|
| 491 |
-
## ✅ VERIFICATION
|
| 492 |
-
|
| 493 |
-
Run this checklist to verify upgrade success:
|
| 494 |
-
|
| 495 |
-
```bash
|
| 496 |
-
# 1. Check file structure
|
| 497 |
-
dir agents
|
| 498 |
-
dir core
|
| 499 |
-
dir data
|
| 500 |
-
|
| 501 |
-
# 2. Test imports
|
| 502 |
-
python -c "from agents import VisionAgent; print('✅ Imports OK')"
|
| 503 |
-
|
| 504 |
-
# 3. Test memory agent
|
| 505 |
-
python -c "from agents import MemoryAgent; m = MemoryAgent(); print('✅ Memory OK')"
|
| 506 |
-
|
| 507 |
-
# 4. Run upgraded system
|
| 508 |
-
python main_upgraded.py
|
| 509 |
-
```
|
| 510 |
-
|
| 511 |
-
---
|
| 512 |
-
|
| 513 |
-
## 📞 SUPPORT
|
| 514 |
-
|
| 515 |
-
If you encounter issues:
|
| 516 |
-
1. Check `TROUBLESHOOTING.md` section above
|
| 517 |
-
2. Verify all dependencies installed
|
| 518 |
-
3. Check Python version (3.8+)
|
| 519 |
-
4. Ensure camera/microphone permissions
|
| 520 |
-
5. Review error logs
|
| 521 |
-
|
| 522 |
-
---
|
| 523 |
-
|
| 524 |
-
**Upgrade completed successfully! 🎉**
|
| 525 |
-
|
| 526 |
-
Your VisionQ system now has:
|
| 527 |
-
- 🧠 Smarter memory (FAISS)
|
| 528 |
-
- 👁️ Better vision (MobileCLIP + OCR)
|
| 529 |
-
- 🗣️ Natural voice (Voxtral)
|
| 530 |
-
- 🔍 Intelligent queries (DistilBERT)
|
| 531 |
-
|
| 532 |
-
**All while keeping your existing system intact!**
|
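Reviewer note on the removed guide above: its "Fallback Mechanisms" checklist (for example, text search if FAISS is unavailable) describes a common optional-dependency pattern. The following is a minimal sketch of that pattern, not code from this repository. The helper names `optional_import` and `keyword_search` are hypothetical, and the word-overlap ranking is an assumption standing in for whatever text search the project actually used.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def keyword_search(memories, query):
    """Fallback search: rank descriptions by number of shared lowercase words."""
    query_words = set(query.lower().split())
    scored = []
    for text in memories:
        overlap = len(query_words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, text))
    # Highest overlap first; equal scores keep insertion order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Pick the search backend once at startup instead of failing on import.
faiss = optional_import("faiss")  # None when FAISS is not installed
backend = "faiss" if faiss is not None else "keyword"
```

With this shape, a missing optional package degrades one feature rather than stopping the whole assistant, which is the behavior the checklist asks for.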
archive/old_scripts/ask_question.py
DELETED
|
@@ -1,19 +0,0 @@
|
|
| 1 |
-
from memory_agent import MemoryAgent
|
| 2 |
-
from query_agent import QueryAgent
|
| 3 |
-
|
| 4 |
-
def main():
|
| 5 |
-
memory_agent = MemoryAgent()
|
| 6 |
-
query_agent = QueryAgent(memory_agent)
|
| 7 |
-
|
| 8 |
-
print("🧠 Memory Query System (type 'exit' to quit)")
|
| 9 |
-
|
| 10 |
-
while True:
|
| 11 |
-
question = input("\nAsk a question: ").strip()
|
| 12 |
-
if question.lower() == "exit":
|
| 13 |
-
break
|
| 14 |
-
|
| 15 |
-
answer = query_agent.ask(question)
|
| 16 |
-
print("\n" + answer)
|
| 17 |
-
|
| 18 |
-
if __name__ == "__main__":
|
| 19 |
-
main()
|
archive/old_scripts/ask_question_upgraded.py
DELETED
|
@@ -1,41 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Memory Query System - Interactive memory search
|
| 3 |
-
UPDATED: Now includes intent classification and hybrid search
|
| 4 |
-
"""
|
| 5 |
-
|
| 6 |
-
from agents.memory_agent import MemoryAgent
|
| 7 |
-
from agents.query_agent import QueryAgent
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
def main():
|
| 11 |
-
print("=" * 60)
|
| 12 |
-
print("🧠 VisionQ Memory Query System (UPGRADED)")
|
| 13 |
-
print("=" * 60)
|
| 14 |
-
print("\nFeatures:")
|
| 15 |
-
print(" • Time-based queries (today, yesterday, last hour)")
|
| 16 |
-
print(" • Semantic search with DistilBERT")
|
| 17 |
-
print(" • FAISS-powered similarity search")
|
| 18 |
-
print(" • OCR text search")
|
| 19 |
-
print("\nType 'exit' to quit\n")
|
| 20 |
-
|
| 21 |
-
memory_agent = MemoryAgent()
|
| 22 |
-
query_agent = QueryAgent(memory_agent)
|
| 23 |
-
|
| 24 |
-
while True:
|
| 25 |
-
question = input("\n❓ Ask a question: ").strip()
|
| 26 |
-
|
| 27 |
-
if question.lower() == "exit":
|
| 28 |
-
print("Goodbye!")
|
| 29 |
-
break
|
| 30 |
-
|
| 31 |
-
if not question:
|
| 32 |
-
continue
|
| 33 |
-
|
| 34 |
-
# Query with enhanced capabilities
|
| 35 |
-
answer = query_agent.ask(question)
|
| 36 |
-
print(f"\n💡 Answer:\n{answer}\n")
|
| 37 |
-
print("-" * 60)
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
if __name__ == "__main__":
|
| 41 |
-
main()
|
archive/old_scripts/install_upgrade.bat
DELETED
|
@@ -1,101 +0,0 @@
|
|
| 1 |
-
@echo off
|
| 2 |
-
REM ============================================
|
| 3 |
-
REM VisionQ Upgrade - Automated Installation
|
| 4 |
-
REM ============================================
|
| 5 |
-
|
| 6 |
-
echo.
|
| 7 |
-
echo ============================================
|
| 8 |
-
echo VisionQ System Upgrade Installer
|
| 9 |
-
echo ============================================
|
| 10 |
-
echo.
|
| 11 |
-
|
| 12 |
-
REM Check Python installation
|
| 13 |
-
python --version >nul 2>&1
|
| 14 |
-
if errorlevel 1 (
|
| 15 |
-
echo [ERROR] Python not found. Please install Python 3.8+
|
| 16 |
-
pause
|
| 17 |
-
exit /b 1
|
| 18 |
-
)
|
| 19 |
-
|
| 20 |
-
echo [1/6] Python detected
|
| 21 |
-
echo.
|
| 22 |
-
|
| 23 |
-
REM Create data directory
|
| 24 |
-
echo [2/6] Creating data directory...
|
| 25 |
-
if not exist "data" mkdir data
|
| 26 |
-
echo - data\ created
|
| 27 |
-
|
| 28 |
-
REM Move existing memory file
|
| 29 |
-
if exist "memory.json" (
|
| 30 |
-
echo [3/6] Migrating existing memory...
|
| 31 |
-
move /Y memory.json data\memory.json >nul
|
| 32 |
-
echo - memory.json moved to data\
|
| 33 |
-
) else (
|
| 34 |
-
echo [3/6] No existing memory found (fresh install)
|
| 35 |
-
)
|
| 36 |
-
|
| 37 |
-
REM Install dependencies
|
| 38 |
-
echo.
|
| 39 |
-
echo [4/6] Installing dependencies...
|
| 40 |
-
echo This may take several minutes...
|
| 41 |
-
echo.
|
| 42 |
-
pip install -r requirements_upgraded.txt
|
| 43 |
-
if errorlevel 1 (
|
| 44 |
-
echo [ERROR] Dependency installation failed
|
| 45 |
-
pause
|
| 46 |
-
exit /b 1
|
| 47 |
-
)
|
| 48 |
-
|
| 49 |
-
echo.
|
| 50 |
-
echo [5/6] Verifying installation...
|
| 51 |
-
|
| 52 |
-
REM Test imports
|
| 53 |
-
python -c "from agents import VisionAgent; print(' - Agents: OK')" 2>nul
|
| 54 |
-
if errorlevel 1 (
|
| 55 |
-
echo [ERROR] Agent import failed
|
| 56 |
-
pause
|
| 57 |
-
exit /b 1
|
| 58 |
-
)
|
| 59 |
-
|
| 60 |
-
python -c "from core import FusionLayer; print(' - Core: OK')" 2>nul
|
| 61 |
-
if errorlevel 1 (
|
| 62 |
-
echo [ERROR] Core import failed
|
| 63 |
-
pause
|
| 64 |
-
exit /b 1
|
| 65 |
-
)
|
| 66 |
-
|
| 67 |
-
python -c "import faiss; print(' - FAISS: OK')" 2>nul
|
| 68 |
-
if errorlevel 1 (
|
| 69 |
-
echo [WARNING] FAISS not available (optional)
|
| 70 |
-
echo Install with: pip install faiss-cpu
|
| 71 |
-
)
|
| 72 |
-
|
| 73 |
-
python -c "import easyocr; print(' - EasyOCR: OK')" 2>nul
|
| 74 |
-
if errorlevel 1 (
|
| 75 |
-
echo [WARNING] EasyOCR not available (optional)
|
| 76 |
-
echo Install with: pip install easyocr
|
| 77 |
-
)
|
| 78 |
-
|
| 79 |
-
echo.
|
| 80 |
-
echo [6/6] Installation complete!
|
| 81 |
-
echo.
|
| 82 |
-
echo ============================================
|
| 83 |
-
echo VisionQ Upgrade Installed Successfully!
|
| 84 |
-
echo ============================================
|
| 85 |
-
echo.
|
| 86 |
-
echo Next steps:
|
| 87 |
-
echo 1. Run: python main_upgraded.py
|
| 88 |
-
echo 2. Test voice commands
|
| 89 |
-
echo 3. Check QUICKSTART.md for usage guide
|
| 90 |
-
echo.
|
| 91 |
-
echo Optional enhancements:
|
| 92 |
-
echo - Install FAISS: pip install faiss-cpu
|
| 93 |
-
echo - Install EasyOCR: pip install easyocr
|
| 94 |
-
echo - Download Piper TTS model (see QUICKSTART.md)
|
| 95 |
-
echo.
|
| 96 |
-
echo Documentation:
|
| 97 |
-
echo - QUICKSTART.md - Quick start guide
|
| 98 |
-
echo - UPGRADE_GUIDE.md - Complete documentation
|
| 99 |
-
echo - ARCHITECTURE.md - System architecture
|
| 100 |
-
echo.
|
| 101 |
-
pause
|
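Reviewer note on the removed installer above: its "Move existing memory file" step relocates `memory.json` into `data\` using `move /Y`. A cross-platform equivalent can be sketched in Python as follows. The function name `migrate_memory` is hypothetical and this is only an illustration of the same step, not code from the repository.

```python
from pathlib import Path


def migrate_memory(root):
    """Move root/memory.json to root/data/memory.json if it exists.

    Returns the new path, or None when there was nothing to migrate.
    """
    root = Path(root)
    source = root / "memory.json"
    data_dir = root / "data"
    data_dir.mkdir(exist_ok=True)  # same effect as: if not exist "data" mkdir data
    if not source.exists():
        return None  # fresh install, nothing to move
    target = data_dir / "memory.json"
    # Path.replace overwrites an existing target, similar to `move /Y`.
    source.replace(target)
    return target
```

Calling `migrate_memory(".")` from the project root would mirror steps 2 and 3 of the batch script.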
archive/old_scripts/main.py
DELETED
|
@@ -1,66 +0,0 @@
|
|
| 1 |
-
from voice_agent import VoiceAgent
|
| 2 |
-
from vision_agent import VisionAgent
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
def main():
|
| 6 |
-
voice = VoiceAgent()
|
| 7 |
-
vision = VisionAgent()
|
| 8 |
-
|
| 9 |
-
voice.speak("Vision Q started. I am listening.")
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
while True:
|
| 13 |
-
spoken_text = voice.listen()
|
| 14 |
-
if not spoken_text:
|
| 15 |
-
continue
|
| 16 |
-
|
| 17 |
-
intent = voice.parse_intent(spoken_text)
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
print("[INTENT]:", intent)
|
| 21 |
-
|
| 22 |
-
if intent == "DESCRIBE_SCENE":
|
| 23 |
-
voice.speak("Describing the scene.")
|
| 24 |
-
description = vision.describe_scene()
|
| 25 |
-
|
| 26 |
-
if description:
|
| 27 |
-
print("[DESCRIPTION]:", description)
|
| 28 |
-
voice.speak(description)
|
| 29 |
-
else:
|
| 30 |
-
voice.speak("I could not capture the scene.")
|
| 31 |
-
|
| 32 |
-
elif intent == "REMEMBER_SCENE":
|
| 33 |
-
voice.speak("I will remember this scene.")
|
| 34 |
-
description = vision.remember_scene()
|
| 35 |
-
|
| 36 |
-
if description:
|
| 37 |
-
print("[REMEMBERED]:", description)
|
| 38 |
-
voice.speak("Scene remembered.")
|
| 39 |
-
else:
|
| 40 |
-
voice.speak("I could not remember the scene.")
|
| 41 |
-
|
| 42 |
-
elif intent == "RECALL_MEMORY":
|
| 43 |
-
memory = vision.memory_agent.recall_last()
|
| 44 |
-
if memory:
|
| 45 |
-
voice.speak(memory)
|
| 46 |
-
else:
|
| 47 |
-
voice.speak("I do not have any memories yet.")
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
elif intent == "READ_TEXT":
|
| 51 |
-
# OCR intentionally postponed
|
| 52 |
-
voice.speak("Reading text will be available soon.")
|
| 53 |
-
|
| 54 |
-
elif intent == "EXIT":
|
| 55 |
-
voice.speak("Goodbye.")
|
| 56 |
-
vision.cleanup()
|
| 57 |
-
break
|
| 58 |
-
|
| 59 |
-
else:
|
| 60 |
-
voice.speak("I did not understand.")
|
| 61 |
-
|
| 62 |
-
print("Vision Q stopped.")
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
if __name__ == "__main__":
|
| 66 |
-
main()
|
archive/old_scripts/main_upgraded.py
DELETED
|
@@ -1,85 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
VisionQ - Upgraded Multimodal AI Assistant
|
| 3 |
-
UPDATED: Now includes OCR, embeddings, and enhanced memory
|
| 4 |
-
"""
|
| 5 |
-
|
| 6 |
-
from agents.voice_agent import VoiceAgent
|
| 7 |
-
from agents.vision_agent import VisionAgent
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
def main():
|
| 11 |
-
print("=" * 60)
|
| 12 |
-
print("VisionQ - Multimodal AI Assistant (UPGRADED)")
|
| 13 |
-
print("=" * 60)
|
| 14 |
-
|
| 15 |
-
# Initialize agents
|
| 16 |
-
voice = VoiceAgent()
|
| 17 |
-
vision = VisionAgent()
|
| 18 |
-
|
| 19 |
-
voice.speak("Vision Q started. I am listening.")
|
| 20 |
-
|
| 21 |
-
while True:
|
| 22 |
-
spoken_text = voice.listen()
|
| 23 |
-
if not spoken_text:
|
| 24 |
-
continue
|
| 25 |
-
|
| 26 |
-
intent = voice.parse_intent(spoken_text)
|
| 27 |
-
print(f"[INTENT]: {intent}")
|
| 28 |
-
|
| 29 |
-
# ===== DESCRIBE SCENE (UPDATED) =====
|
| 30 |
-
if intent == "DESCRIBE_SCENE":
|
| 31 |
-
voice.speak("Describing the scene.")
|
| 32 |
-
description = vision.describe_scene()
|
| 33 |
-
|
| 34 |
-
if description:
|
| 35 |
-
print(f"[DESCRIPTION]: {description}")
|
| 36 |
-
voice.speak(description)
|
| 37 |
-
else:
|
| 38 |
-
voice.speak("I could not capture the scene.")
|
| 39 |
-
|
| 40 |
-
# ===== REMEMBER SCENE (UPDATED) =====
|
| 41 |
-
elif intent == "REMEMBER_SCENE":
|
| 42 |
-
voice.speak("I will remember this scene.")
|
| 43 |
-
description = vision.remember_scene()
|
| 44 |
-
|
| 45 |
-
if description:
|
| 46 |
-
print(f"[REMEMBERED]: {description}")
|
| 47 |
-
voice.speak("Scene remembered.")
|
| 48 |
-
else:
|
| 49 |
-
voice.speak("I could not remember the scene.")
|
| 50 |
-
|
| 51 |
-
# ===== RECALL MEMORY (KEPT) =====
|
| 52 |
-
elif intent == "RECALL_MEMORY":
|
| 53 |
-
memory = vision.memory_agent.recall_last()
|
| 54 |
-
if memory:
|
| 55 |
-
response = f"At {memory['timestamp']}, {memory['description']}"
|
| 56 |
-
voice.speak(response)
|
| 57 |
-
else:
|
| 58 |
-
voice.speak("I do not have any memories yet.")
|
| 59 |
-
|
| 60 |
-
# ===== READ TEXT (NEW - NOW FUNCTIONAL) =====
|
| 61 |
-
elif intent == "READ_TEXT":
|
| 62 |
-
voice.speak("Reading text from the scene.")
|
| 63 |
-
text_result = vision.read_text()
|
| 64 |
-
|
| 65 |
-
if text_result:
|
| 66 |
-
print(f"[OCR]: {text_result}")
|
| 67 |
-
voice.speak(text_result)
|
| 68 |
-
else:
|
| 69 |
-
voice.speak("I could not read any text.")
|
| 70 |
-
|
| 71 |
-
# ===== EXIT (KEPT) =====
|
| 72 |
-
elif intent == "EXIT":
|
| 73 |
-
voice.speak("Goodbye.")
|
| 74 |
-
vision.cleanup()
|
| 75 |
-
break
|
| 76 |
-
|
| 77 |
-
# ===== UNKNOWN (KEPT) =====
|
| 78 |
-
else:
|
| 79 |
-
voice.speak("I did not understand.")
|
| 80 |
-
|
| 81 |
-
print("Vision Q stopped.")
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
if __name__ == "__main__":
|
| 85 |
-
main()
|
archive/old_scripts/test_upgrade.py
DELETED
|
@@ -1,274 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
VisionQ Upgrade - Automated Test Suite
|
| 3 |
-
Tests all new and existing functionality
|
| 4 |
-
"""
|
| 5 |
-
|
| 6 |
-
import sys
|
| 7 |
-
import os
|
| 8 |
-
|
| 9 |
-
def test_imports():
|
| 10 |
-
"""Test all module imports"""
|
| 11 |
-
print("\n" + "="*60)
|
| 12 |
-
print("TEST 1: Module Imports")
|
| 13 |
-
print("="*60)
|
| 14 |
-
|
| 15 |
-
tests = [
|
| 16 |
-
("agents.voice_agent", "VoiceAgent"),
|
| 17 |
-
("agents.vision_agent", "VisionAgent"),
|
| 18 |
-
("agents.caption_agent", "CaptionAgent"),
|
| 19 |
-
("agents.embedding_agent", "EmbeddingAgent"),
|
| 20 |
-
("agents.ocr_agent", "OCRAgent"),
|
| 21 |
-
("agents.memory_agent", "MemoryAgent"),
|
| 22 |
-
("agents.query_agent", "QueryAgent"),
|
| 23 |
-
("core.fusion_layer", "FusionLayer"),
|
| 24 |
-
]
|
| 25 |
-
|
| 26 |
-
passed = 0
|
| 27 |
-
failed = 0
|
| 28 |
-
|
| 29 |
-
for module, cls in tests:
|
| 30 |
-
try:
|
| 31 |
-
exec(f"from {module} import {cls}")
|
| 32 |
-
print(f" ✅ {module}.{cls}")
|
| 33 |
-
passed += 1
|
| 34 |
-
except Exception as e:
|
| 35 |
-
print(f" ❌ {module}.{cls} - {e}")
|
| 36 |
-
failed += 1
|
| 37 |
-
|
| 38 |
-
print(f"\nResult: {passed} passed, {failed} failed")
|
| 39 |
-
return failed == 0
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
def test_dependencies():
|
| 43 |
-
"""Test optional dependencies"""
|
| 44 |
-
print("\n" + "="*60)
|
| 45 |
-
print("TEST 2: Optional Dependencies")
|
| 46 |
-
print("="*60)
|
| 47 |
-
|
| 48 |
-
deps = [
|
| 49 |
-
("faiss", "FAISS (vector search)"),
|
| 50 |
-
("easyocr", "EasyOCR (text extraction)"),
|
| 51 |
-
("piper", "Piper TTS (neural voice)"),
|
| 52 |
-
]
|
| 53 |
-
|
| 54 |
-
for module, name in deps:
|
| 55 |
-
try:
|
| 56 |
-
__import__(module)
|
| 57 |
-
print(f" ✅ {name}")
|
| 58 |
-
except ImportError:
|
| 59 |
-
print(f" ⚠️ {name} - Not installed (optional)")
|
| 60 |
-
|
| 61 |
-
return True
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
def test_memory_agent():
|
| 65 |
-
"""Test MemoryAgent functionality"""
|
| 66 |
-
print("\n" + "="*60)
|
| 67 |
-
print("TEST 3: MemoryAgent")
|
| 68 |
-
print("="*60)
|
| 69 |
-
|
| 70 |
-
try:
|
| 71 |
-
from agents.memory_agent import MemoryAgent
|
| 72 |
-
import numpy as np
|
| 73 |
-
|
| 74 |
-
# Create test memory
|
| 75 |
-
memory = MemoryAgent(
|
| 76 |
-
memory_file="data/test_memory.json",
|
| 77 |
-
faiss_index_file="data/test_memory.faiss"
|
| 78 |
-
)
|
| 79 |
-
|
| 80 |
-
# Test adding memory
|
| 81 |
-
test_desc = "Test scene with a person"
|
| 82 |
-
test_embedding = np.random.rand(512).astype('float32')
|
| 83 |
-
memory.add(test_desc, image_embedding=test_embedding)
|
| 84 |
-
print(" ✅ Add memory")
|
| 85 |
-
|
| 86 |
-
# Test recall
|
| 87 |
-
last = memory.recall_last()
|
| 88 |
-
assert last is not None
|
| 89 |
-
print(" ✅ Recall last")
|
| 90 |
-
|
| 91 |
-
# Test text search
|
| 92 |
-
results = memory.search_by_text("person", threshold=0.1)
|
| 93 |
-
print(f" ✅ Text search ({len(results)} results)")
|
| 94 |
-
|
| 95 |
-
# Test image search (if FAISS available)
|
| 96 |
-
try:
|
| 97 |
-
results = memory.search_by_image(test_embedding, k=1)
|
| 98 |
-
print(f" ✅ Image search ({len(results)} results)")
|
| 99 |
-
except:
|
| 100 |
-
print(" ⚠️ Image search - FAISS not available")
|
| 101 |
-
|
| 102 |
-
# Cleanup
|
| 103 |
-
if os.path.exists("data/test_memory.json"):
|
| 104 |
-
os.remove("data/test_memory.json")
|
| 105 |
-
if os.path.exists("data/test_memory.faiss"):
|
| 106 |
-
os.remove("data/test_memory.faiss")
|
| 107 |
-
|
| 108 |
-
print("\n MemoryAgent: PASSED")
|
| 109 |
-
return True
|
| 110 |
-
|
| 111 |
-
except Exception as e:
|
| 112 |
-
print(f"\n ❌ MemoryAgent: FAILED - {e}")
|
| 113 |
-
return False
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
def test_fusion_layer():
|
| 117 |
-
"""Test FusionLayer"""
|
| 118 |
-
print("\n" + "="*60)
|
| 119 |
-
print("TEST 4: FusionLayer")
|
| 120 |
-
print("="*60)
|
| 121 |
-
|
| 122 |
-
try:
|
| 123 |
-
from core.fusion_layer import FusionLayer
|
| 124 |
-
import numpy as np
|
| 125 |
-
|
| 126 |
-
fusion = FusionLayer()
|
| 127 |
-
|
| 128 |
-
# Test fusion
|
| 129 |
-
context = fusion.fuse(
|
| 130 |
-
caption="a person holding a phone",
|
| 131 |
-
ocr_text="Hello World",
|
| 132 |
-
objects=["person", "phone"],
|
| 133 |
-
embedding=np.random.rand(512)
|
| 134 |
-
)
|
| 135 |
-
|
| 136 |
-
assert "caption" in context
|
| 137 |
-
assert "ocr_text" in context
|
| 138 |
-
assert "objects" in context
|
| 139 |
-
assert "full_description" in context
|
| 140 |
-
print(" ✅ Fuse multimodal data")
|
| 141 |
-
|
| 142 |
-
# Test extraction
|
| 143 |
-
desc, emb = fusion.extract_for_storage(context)
|
| 144 |
-
assert desc is not None
|
| 145 |
-
assert emb is not None
|
| 146 |
-
print(" ✅ Extract for storage")
|
| 147 |
-
|
| 148 |
-
print("\n FusionLayer: PASSED")
|
| 149 |
-
return True
|
| 150 |
-
|
| 151 |
-
except Exception as e:
|
| 152 |
-
print(f"\n ❌ FusionLayer: FAILED - {e}")
|
| 153 |
-
return False
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
def test_query_agent():
|
| 157 |
-
"""Test QueryAgent"""
|
| 158 |
-
print("\n" + "="*60)
|
| 159 |
-
print("TEST 5: QueryAgent")
|
| 160 |
-
print("="*60)
|
| 161 |
-
|
| 162 |
-
try:
|
| 163 |
-
from agents.memory_agent import MemoryAgent
|
| 164 |
-
from agents.query_agent import QueryAgent
|
| 165 |
-
|
| 166 |
-
memory = MemoryAgent(
|
| 167 |
-
memory_file="data/test_memory.json",
|
| 168 |
-
faiss_index_file="data/test_memory.faiss"
|
| 169 |
-
)
|
| 170 |
-
query = QueryAgent(memory)
|
| 171 |
-
|
| 172 |
-
# Test intent classification
|
| 173 |
-
intent = query.classify_intent("What did I see this morning?")
|
| 174 |
-
print(f" ✅ Intent classification: {intent}")
|
| 175 |
-
|
| 176 |
-
# Test time extraction
|
| 177 |
-
time_window = query.extract_time_window("What did I see today?")
|
| 178 |
-
print(f" ✅ Time extraction: {time_window is not None}")
|
| 179 |
-
|
| 180 |
-
# Cleanup
|
| 181 |
-
if os.path.exists("data/test_memory.json"):
|
| 182 |
-
os.remove("data/test_memory.json")
|
| 183 |
-
if os.path.exists("data/test_memory.faiss"):
|
| 184 |
-
os.remove("data/test_memory.faiss")
|
| 185 |
-
|
| 186 |
-
print("\n QueryAgent: PASSED")
|
| 187 |
-
return True
|
| 188 |
-
|
| 189 |
-
except Exception as e:
|
| 190 |
-
print(f"\n ❌ QueryAgent: FAILED - {e}")
|
| 191 |
-
return False
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
def test_backward_compatibility():
|
| 195 |
-
"""Test backward compatibility"""
|
| 196 |
-
print("\n" + "="*60)
|
| 197 |
-
print("TEST 6: Backward Compatibility")
|
| 198 |
-
print("="*60)
|
| 199 |
-
|
| 200 |
-
try:
|
| 201 |
-
from agents.memory_agent import MemoryAgent
|
| 202 |
-
|
| 203 |
-
# Test old memory format
|
| 204 |
-
memory = MemoryAgent(
|
| 205 |
-
memory_file="data/test_memory.json",
|
| 206 |
-
faiss_index_file="data/test_memory.faiss"
|
| 207 |
-
)
|
| 208 |
-
|
| 209 |
-
# Add memory without image embedding (old format)
|
| 210 |
-
memory.add("Test scene without embedding", image_embedding=None)
|
| 211 |
-
print(" ✅ Old format (no image embedding)")
|
| 212 |
-
|
| 213 |
-
# Recall should work
|
| 214 |
-
last = memory.recall_last()
|
| 215 |
-
assert last is not None
|
| 216 |
-
print(" ✅ Recall old format")
|
| 217 |
-
|
| 218 |
-
# Cleanup
|
| 219 |
-
if os.path.exists("data/test_memory.json"):
|
| 220 |
-
os.remove("data/test_memory.json")
|
| 221 |
-
if os.path.exists("data/test_memory.faiss"):
|
| 222 |
-
os.remove("data/test_memory.faiss")
|
| 223 |
-
|
| 224 |
-
print("\n Backward Compatibility: PASSED")
|
| 225 |
-
return True
|
| 226 |
-
|
| 227 |
-
except Exception as e:
|
| 228 |
-
print(f"\n ❌ Backward Compatibility: FAILED - {e}")
|
| 229 |
-
return False
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
def main():
|
| 233 |
-
"""Run all tests"""
|
| 234 |
-
print("\n" + "="*60)
|
| 235 |
-
print("VisionQ Upgrade - Test Suite")
|
| 236 |
-
print("="*60)
|
| 237 |
-
|
| 238 |
-
# Ensure data directory exists
|
| 239 |
-
os.makedirs("data", exist_ok=True)
|
| 240 |
-
|
| 241 |
-
results = []
|
| 242 |
-
|
| 243 |
-
# Run tests
|
| 244 |
-
results.append(("Imports", test_imports()))
|
| 245 |
-
results.append(("Dependencies", test_dependencies()))
|
| 246 |
-
results.append(("MemoryAgent", test_memory_agent()))
|
| 247 |
-
results.append(("FusionLayer", test_fusion_layer()))
|
| 248 |
-
results.append(("QueryAgent", test_query_agent()))
|
| 249 |
-
results.append(("Backward Compatibility", test_backward_compatibility()))
|
| 250 |
-
|
| 251 |
-
# Summary
|
| 252 |
-
print("\n" + "="*60)
|
| 253 |
-
print("TEST SUMMARY")
|
| 254 |
-
print("="*60)
|
| 255 |
-
|
| 256 |
-
passed = sum(1 for _, result in results if result)
|
| 257 |
-
total = len(results)
|
| 258 |
-
|
| 259 |
-
for name, result in results:
|
| 260 |
-
status = "✅ PASSED" if result else "❌ FAILED"
|
| 261 |
-
print(f" {name}: {status}")
|
| 262 |
-
|
| 263 |
-
print(f"\nTotal: {passed}/{total} tests passed")
|
| 264 |
-
|
| 265 |
-
if passed == total:
|
| 266 |
-
print("\n🎉 All tests passed! System is ready.")
|
| 267 |
-
return 0
|
| 268 |
-
else:
|
| 269 |
-
print(f"\n⚠️ {total - passed} test(s) failed. Check errors above.")
|
| 270 |
-
return 1
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
if __name__ == "__main__":
|
| 274 |
-
sys.exit(main())
|
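Reviewer note on the removed test suite above: `test_imports` builds an import statement as a string and runs it through `exec`. The same check can be done with `importlib`, which avoids executing generated source. The sketch below is an alternative under that assumption, with the hypothetical name `check_imports`. It is not the project's code.

```python
import importlib


def check_imports(targets):
    """Return {"module.Name": error message or None} for each (module, name) pair."""
    results = {}
    for module_name, attr in targets:
        label = f"{module_name}.{attr}"
        try:
            module = importlib.import_module(module_name)
            getattr(module, attr)  # raises AttributeError if the name is missing
            results[label] = None
        except (ImportError, AttributeError) as exc:
            results[label] = str(exc)
    return results
```

For example, `check_imports([("agents.memory_agent", "MemoryAgent")])` would report `None` for that entry when the import succeeds and the error text otherwise.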
archive/pipcheck.txt
DELETED
|
Binary file (22.9 kB)
|
archive/requirements_upgraded.txt
DELETED
|
@@ -1,54 +0,0 @@
|
|
| 1 |
-
# ============================================
|
| 2 |
-
# VisionQ - UPGRADED Requirements
|
| 3 |
-
# ============================================
|
| 4 |
-
|
| 5 |
-
# Core ML/AI
|
| 6 |
-
torch>=2.0.0
|
| 7 |
-
transformers>=4.57.3
|
| 8 |
-
sentence-transformers>=2.2.2
|
| 9 |
-
|
| 10 |
-
# Vision
|
| 11 |
-
opencv-python
|
| 12 |
-
pillow
|
| 13 |
-
ultralytics # YOLO (optional but recommended)
|
| 14 |
-
|
| 15 |
-
# Voice
|
| 16 |
-
pyttsx3 # TTS fallback (KEPT)
|
| 17 |
-
sounddevice
|
| 18 |
-
vosk
|
| 19 |
-
|
| 20 |
-
# NEW: Neural TTS (Primary)
|
| 21 |
-
piper-tts # Voxtral/Piper neural TTS
|
| 22 |
-
|
| 23 |
-
# NEW: OCR
|
| 24 |
-
easyocr # Lightweight OCR
|
| 25 |
-
|
| 26 |
-
# NEW: Vector Search
|
| 27 |
-
faiss-cpu # FAISS for similarity search
|
| 28 |
-
# Use faiss-gpu if you have CUDA
|
| 29 |
-
|
| 30 |
-
# NEW: NLP Enhancement
|
| 31 |
-
# DistilBERT is included in transformers
|
| 32 |
-
|
| 33 |
-
# Optional: TensorFlow for SSD fallback
|
| 34 |
-
# tensorflow>=2.13.0
|
| 35 |
-
|
| 36 |
-
# ============================================
|
| 37 |
-
# Installation Notes:
|
| 38 |
-
# ============================================
|
| 39 |
-
# 1. Create virtual environment:
|
| 40 |
-
# python -m venv venv
|
| 41 |
-
# venv\Scripts\activate (Windows)
|
| 42 |
-
# source venv/bin/activate (Linux/Mac)
|
| 43 |
-
#
|
| 44 |
-
# 2. Install dependencies:
|
| 45 |
-
# pip install -r requirements.txt
|
| 46 |
-
#
|
| 47 |
-
# 3. Download Vosk model:
|
| 48 |
-
# https://alphacephei.com/vosk/models
|
| 49 |
-
# Extract to: models/vosk/
|
| 50 |
-
#
|
| 51 |
-
# 4. Download Piper voice (optional):
|
| 52 |
-
# https://github.com/rhasspy/piper/releases
|
| 53 |
-
# Extract to: models/piper/
|
| 54 |
-
# ============================================
cleanup.bat
DELETED
|
@@ -1,65 +0,0 @@
|
|
| 1 |
-
@echo off
|
| 2 |
-
REM ============================================
|
| 3 |
-
REM VisionQ - Project Cleanup Script
|
| 4 |
-
REM Moves old/redundant files to archive
|
| 5 |
-
REM ============================================
|
| 6 |
-
|
| 7 |
-
echo.
|
| 8 |
-
echo ============================================
|
| 9 |
-
echo VisionQ Project Cleanup
|
| 10 |
-
echo ============================================
|
| 11 |
-
echo.
|
| 12 |
-
|
| 13 |
-
REM Create archive directory
|
| 14 |
-
if not exist "archive\" mkdir archive
|
| 15 |
-
if not exist "archive\old_agents\" mkdir archive\old_agents
|
| 16 |
-
if not exist "archive\old_docs\" mkdir archive\old_docs
|
| 17 |
-
if not exist "archive\old_scripts\" mkdir archive\old_scripts
|
| 18 |
-
|
| 19 |
-
echo [1/4] Moving old agent files...
|
| 20 |
-
if exist "caption_agent.py" move /Y caption_agent.py archive\old_agents\
|
| 21 |
-
if exist "memory_agent.py" move /Y memory_agent.py archive\old_agents\
|
| 22 |
-
if exist "query_agent.py" move /Y query_agent.py archive\old_agents\
|
| 23 |
-
if exist "vision_agent.py" move /Y vision_agent.py archive\old_agents\
|
| 24 |
-
if exist "voice_agent.py" move /Y voice_agent.py archive\old_agents\
|
| 25 |
-
|
| 26 |
-
echo [2/4] Moving old documentation...
|
| 27 |
-
if exist "README_UPGRADED.md" move /Y README_UPGRADED.md archive\old_docs\
|
| 28 |
-
if exist "ARCHITECTURE.md" move /Y ARCHITECTURE.md archive\old_docs\
|
| 29 |
-
if exist "COMPARISON.md" move /Y COMPARISON.md archive\old_docs\
|
| 30 |
-
if exist "DEPLOYMENT_CHECKLIST.md" move /Y DEPLOYMENT_CHECKLIST.md archive\old_docs\
|
| 31 |
-
if exist "INDEX.md" move /Y INDEX.md archive\old_docs\
|
| 32 |
-
if exist "QUICK_REFERENCE.md" move /Y QUICK_REFERENCE.md archive\old_docs\
|
| 33 |
-
if exist "QUICKSTART.md" move /Y QUICKSTART.md archive\old_docs\
|
| 34 |
-
if exist "SUMMARY.md" move /Y SUMMARY.md archive\old_docs\
|
| 35 |
-
if exist "UPGRADE_GUIDE.md" move /Y UPGRADE_GUIDE.md archive\old_docs\
|
| 36 |
-
|
| 37 |
-
echo [3/4] Moving old scripts...
|
| 38 |
-
if exist "main.py" move /Y main.py archive\old_scripts\
|
| 39 |
-
if exist "main_upgraded.py" move /Y main_upgraded.py archive\old_scripts\
|
| 40 |
-
if exist "ask_question.py" move /Y ask_question.py archive\old_scripts\
|
| 41 |
-
if exist "ask_question_upgraded.py" move /Y ask_question_upgraded.py archive\old_scripts\
|
| 42 |
-
if exist "test_upgrade.py" move /Y test_upgrade.py archive\old_scripts\
|
| 43 |
-
if exist "install_upgrade.bat" move /Y install_upgrade.bat archive\old_scripts\
|
| 44 |
-
|
| 45 |
-
echo [4/4] Moving old requirements...
|
| 46 |
-
if exist "requirements_upgraded.txt" move /Y requirements_upgraded.txt archive\
|
| 47 |
-
if exist "pipcheck.txt" move /Y pipcheck.txt archive\
|
| 48 |
-
|
| 49 |
-
echo.
|
| 50 |
-
echo ============================================
|
| 51 |
-
echo Cleanup Complete!
|
| 52 |
-
echo ============================================
|
| 53 |
-
echo.
|
| 54 |
-
echo Old files moved to archive\ directory
|
| 55 |
-
echo.
|
| 56 |
-
echo Current structure:
|
| 57 |
-
echo agents/ - AI agents
|
| 58 |
-
echo config/ - Configuration
|
| 59 |
-
echo ui/ - Streamlit interface
|
| 60 |
-
echo data/ - Storage
|
| 61 |
-
echo archive/ - Old files (backup)
|
| 62 |
-
echo.
|
| 63 |
-
echo You can safely delete archive\ if not needed
|
| 64 |
-
echo.
|
| 65 |
-
pause
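Reader's note on the deleted `cleanup.bat` above: the script repeats one pattern, "if the file exists, move it into an archive folder, overwriting any earlier copy". A minimal cross-platform sketch of that pattern in Python follows. The function name `archive_files` and its parameters are hypothetical and were not part of the repository.

```python
import shutil
from pathlib import Path


def archive_files(root, names, dest_subdir):
    """Move each file in `names` that exists under `root` into root/dest_subdir.

    Returns the list of names that were actually moved.
    """
    root = Path(root)
    dest = root / dest_subdir
    # Same effect as the batch script's `if not exist ... mkdir` lines.
    dest.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in names:
        src = root / name
        if not src.is_file():
            continue  # like `if exist`: silently skip missing files
        target = dest / name
        if target.exists():
            target.unlink()  # mimic `move /Y`, which overwrites without asking
        shutil.move(str(src), str(target))
        moved.append(name)
    return moved
```

Calling `archive_files(".", ["main.py", "main_upgraded.py"], "archive/old_scripts")` would correspond to part of step 3/4 of the batch file.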
config/fast_mode.py
DELETED
|
@@ -1,40 +0,0 @@
|
|
| 1 |
-
# Fast Mode Configuration
|
| 2 |
-
# Copy this to config/settings.py to make VisionQ faster
|
| 3 |
-
|
| 4 |
-
# ============================================
|
| 5 |
-
# FAST MODE - Optimized for Speed
|
| 6 |
-
# ============================================
|
| 7 |
-
|
| 8 |
-
FEATURES = {
|
| 9 |
-
"ocr_enabled": False, # Disabled for speed
|
| 10 |
-
"faiss_enabled": True,
|
| 11 |
-
"neural_tts_enabled": True,
|
| 12 |
-
"intent_classification_enabled": True,
|
| 13 |
-
"object_detection_enabled": False, # Disabled for speed
|
| 14 |
-
"continuous_mode_enabled": True,
|
| 15 |
-
"embeddings_enabled": False, # Keep disabled
|
| 16 |
-
}
|
| 17 |
-
|
| 18 |
-
# Use nano YOLO if you enable object detection
|
| 19 |
-
MODEL_CONFIG = {
|
| 20 |
-
"yolo_model": "yolov8n.pt", # Nano model (faster)
|
| 21 |
-
"caption_model": "Salesforce/blip-image-captioning-base",
|
| 22 |
-
# ... rest of config
|
| 23 |
-
}
|
| 24 |
-
|
| 25 |
-
# ============================================
|
| 26 |
-
# EXPECTED PERFORMANCE
|
| 27 |
-
# ============================================
|
| 28 |
-
# With this config:
|
| 29 |
-
# - Capture & Describe: ~1.5 seconds
|
| 30 |
-
# - Remember Scene: ~1.5 seconds
|
| 31 |
-
# - Read Text: Disabled
|
| 32 |
-
#
|
| 33 |
-
# Speed improvement: ~40% faster!
|
| 34 |
-
# ============================================
|
| 35 |
-
|
| 36 |
-
# To apply:
|
| 37 |
-
# 1. Copy FEATURES section above
|
| 38 |
-
# 2. Paste into config/settings.py
|
| 39 |
-
# 3. Run: fix_and_run.bat
|
| 40 |
-
# 4. Test the speed!
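Reader's note on the deleted `config/fast_mode.py` above: the file asks the user to copy its `FEATURES` block into `config/settings.py` by hand. A hedged alternative is to keep the fast-mode values as a small override dict and merge them onto the defaults. The sketch below assumes the default flag values described in the deleted performance docs. `apply_overrides`, `DEFAULT_FEATURES`, and `FAST_MODE` are illustrative names, not the project's API.

```python
def apply_overrides(defaults, overrides):
    """Return a new dict with `overrides` applied on top of `defaults`.

    Unknown keys are rejected so a misspelled flag fails loudly.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown feature flags: {sorted(unknown)}")
    merged = dict(defaults)  # copy, so the defaults stay untouched
    merged.update(overrides)
    return merged


# Assumed defaults (the "balanced" configuration from the deleted docs).
DEFAULT_FEATURES = {
    "ocr_enabled": True,
    "object_detection_enabled": True,
    "embeddings_enabled": False,
}

# The two flags that the fast-mode file turns off.
FAST_MODE = {
    "ocr_enabled": False,
    "object_detection_enabled": False,
}

features = apply_overrides(DEFAULT_FEATURES, FAST_MODE)
```

Keeping the override separate from the defaults avoids the copy-and-paste step and makes it easy to switch back.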
docs/CAMERA_FEED.md
DELETED
|
@@ -1,178 +0,0 @@
|
|
| 1 |
-
# Camera Feed Options
|
| 2 |
-
|
| 3 |
-
## Two Versions Available
|
| 4 |
-
|
| 5 |
-
### 1. Standard Version (app.py)
|
| 6 |
-
**File:** `ui/app.py`
|
| 7 |
-
**Launch:** `run.bat` or `streamlit run ui/app.py`
|
| 8 |
-
|
| 9 |
-
**Features:**
|
| 10 |
-
- Static camera feed
|
| 11 |
-
- Updates only when you click buttons
|
| 12 |
-
- Lower CPU usage
|
| 13 |
-
- Better for slower computers
|
| 14 |
-
|
| 15 |
-
**Best for:**
|
| 16 |
-
- Testing and development
|
| 17 |
-
- Slower computers
|
| 18 |
-
- Battery saving on laptops
|
| 19 |
-
|
| 20 |
-
---
|
| 21 |
-
|
| 22 |
-
### 2. Continuous Feed Version (app_continuous.py)
|
| 23 |
-
**File:** `ui/app_continuous.py`
|
| 24 |
-
**Launch:** `run_continuous.bat` or `streamlit run ui/app_continuous.py`
|
| 25 |
-
|
| 26 |
-
**Features:**
|
| 27 |
-
- Live continuous camera feed
|
| 28 |
-
- Adjustable refresh rate (0.5-5 seconds)
|
| 29 |
-
- Start/Stop camera button
|
| 30 |
-
- Real-time preview
|
| 31 |
-
|
| 32 |
-
**Best for:**
|
| 33 |
-
- Live monitoring
|
| 34 |
-
- Real-time demonstrations
|
| 35 |
-
- Faster computers with good camera
|
| 36 |
-
|
| 37 |
-
---
|
| 38 |
-
|
| 39 |
-
## Comparison
|
| 40 |
-
|
| 41 |
-
| Feature | Standard | Continuous |
|
| 42 |
-
|---------|----------|------------|
|
| 43 |
-
| **Camera Feed** | Static | Live |
|
| 44 |
-
| **Updates** | On button click | Automatic |
|
| 45 |
-
| **CPU Usage** | Low | Medium-High |
|
| 46 |
-
| **Refresh Rate** | Manual | 0.5-5 seconds |
|
| 47 |
-
| **Start/Stop** | No | Yes |
|
| 48 |
-
| **Battery Impact** | Low | Higher |
|
| 49 |
-
|
| 50 |
-
---
|
| 51 |
-
|
| 52 |
-
## How to Use Continuous Feed
|
| 53 |
-
|
| 54 |
-
### Step 1: Launch
|
| 55 |
-
```bash
|
| 56 |
-
run_continuous.bat
|
| 57 |
-
```
|
| 58 |
-
|
| 59 |
-
### Step 2: Initialize System
|
| 60 |
-
Click "Initialize System" in the Vision tab
|
| 61 |
-
|
| 62 |
-
### Step 3: Start Camera
|
| 63 |
-
Click "Start Camera" button
|
| 64 |
-
|
| 65 |
-
### Step 4: Adjust Settings
|
| 66 |
-
- Use sidebar slider to change refresh rate
|
| 67 |
-
- Lower rate = smoother but more CPU
|
| 68 |
-
- Higher rate = less CPU but choppier
|
| 69 |
-
|
| 70 |
-
### Step 5: Use Features
|
| 71 |
-
- Camera keeps running in background
|
| 72 |
-
- Click "Capture & Describe" anytime
|
| 73 |
-
- Click "Remember Scene" anytime
|
| 74 |
-
- Click "Read Text" anytime
|
| 75 |
-
|
| 76 |
-
### Step 6: Stop Camera
|
| 77 |
-
Click "Stop Camera" when done to save resources
|
| 78 |
-
|
| 79 |
-
---
|
| 80 |
-
|
| 81 |
-
## Performance Tips
|
| 82 |
-
|
| 83 |
-
### For Continuous Feed
|
| 84 |
-
|
| 85 |
-
**Optimize refresh rate:**
|
| 86 |
-
- Fast computer: 0.5-1 second
|
| 87 |
-
- Medium computer: 1-2 seconds
|
| 88 |
-
- Slow computer: 2-5 seconds
|
| 89 |
-
|
| 90 |
-
**Save resources:**
|
| 91 |
-
- Stop camera when not actively using
|
| 92 |
-
- Close other applications
|
| 93 |
-
- Use standard version if too slow
|
| 94 |
-
|
| 95 |
-
**Battery saving:**
|
| 96 |
-
- Use standard version on laptop
|
| 97 |
-
- Or set refresh rate to 3-5 seconds
|
| 98 |
-
- Stop camera between uses
|
| 99 |
-
|
| 100 |
-
---
|
| 101 |
-
|
| 102 |
-
## Troubleshooting
|
| 103 |
-
|
| 104 |
-
### Camera feed is choppy
|
| 105 |
-
**Solution:** Increase refresh rate in sidebar (try 2-3 seconds)
|
| 106 |
-
|
| 107 |
-
### High CPU usage
|
| 108 |
-
**Solution:**
|
| 109 |
-
- Stop camera when not needed
|
| 110 |
-
- Increase refresh rate
|
| 111 |
-
- Use standard version instead
|
| 112 |
-
|
| 113 |
-
### Camera won't start
|
| 114 |
-
**Solution:**
|
| 115 |
-
- Check camera permissions
|
| 116 |
-
- Close other apps using camera
|
| 117 |
-
- Try standard version first
|
| 118 |
-
- Restart application
|
| 119 |
-
|
| 120 |
-
### Feed freezes
|
| 121 |
-
**Solution:**
|
| 122 |
-
- Click "Stop Camera" then "Start Camera"
|
| 123 |
-
- Refresh browser page
|
| 124 |
-
- Restart application
|
| 125 |
-
|
| 126 |
-
---
|
| 127 |
-
|
| 128 |
-
## Which Version Should I Use?
|
| 129 |
-
|
| 130 |
-
### Use Standard Version (`run.bat`) if:
|
| 131 |
-
- Testing features
|
| 132 |
-
- Slower computer
|
| 133 |
-
- On battery power
|
| 134 |
-
- Don't need live feed
|
| 135 |
-
- Just want to capture occasionally
|
| 136 |
-
|
| 137 |
-
### Use Continuous Version (`run_continuous.bat`) if:
|
| 138 |
-
- Need live monitoring
|
| 139 |
-
- Demonstrating to others
|
| 140 |
-
- Fast computer with good camera
|
| 141 |
-
- Plugged into power
|
| 142 |
-
- Want real-time preview
|
| 143 |
-
|
| 144 |
-
---
|
| 145 |
-
|
| 146 |
-
## Switching Between Versions
|
| 147 |
-
|
| 148 |
-
You can switch anytime:
|
| 149 |
-
|
| 150 |
-
```bash
|
| 151 |
-
# Stop current version (Ctrl+C)
|
| 152 |
-
|
| 153 |
-
# Start standard version
|
| 154 |
-
run.bat
|
| 155 |
-
|
| 156 |
-
# OR start continuous version
|
| 157 |
-
run_continuous.bat
|
| 158 |
-
```
|
| 159 |
-
|
| 160 |
-
Both use the same memory and settings!
|
| 161 |
-
|
| 162 |
-
---
|
| 163 |
-
|
| 164 |
-
## Summary
|
| 165 |
-
|
| 166 |
-
**Standard Version:**
|
| 167 |
-
- Launch: `run.bat`
|
| 168 |
-
- Camera: Static (updates on click)
|
| 169 |
-
- CPU: Low
|
| 170 |
-
- Best for: General use
|
| 171 |
-
|
| 172 |
-
**Continuous Version:**
|
| 173 |
-
- Launch: `run_continuous.bat`
|
| 174 |
-
- Camera: Live feed
|
| 175 |
-
- CPU: Medium-High
|
| 176 |
-
- Best for: Live monitoring
|
| 177 |
-
|
| 178 |
-
**Try both and see which works better for you!**
docs/PERFORMANCE.md
DELETED
|
@@ -1,187 +0,0 @@
|
|
| 1 |
-
# Performance Optimization Guide
|
| 2 |
-
|
| 3 |
-
## Speed Issues Fixed
|
| 4 |
-
|
| 5 |
-
### 1. Embedding Error Fixed
|
| 6 |
-
The AttributeError with CLIP embeddings has been fixed by using `torch.nn.functional.normalize()` instead of `.norm()`.
|
| 7 |
-
|
| 8 |
-
### 2. Embeddings Disabled by Default
|
| 9 |
-
Image embeddings (CLIP) are now **disabled by default** for faster performance.
|
| 10 |
-
|
| 11 |
-
To enable embeddings, edit `config/settings.py`:
|
| 12 |
-
```python
|
| 13 |
-
FEATURES = {
|
| 14 |
-
"embeddings_enabled": True, # Enable for visual similarity search
|
| 15 |
-
}
|
| 16 |
-
```
|
| 17 |
-
|
| 18 |
-
## Performance Settings
|
| 19 |
-
|
| 20 |
-
### Fast Mode (Default)
|
| 21 |
-
```python
|
| 22 |
-
# config/settings.py
|
| 23 |
-
FEATURES = {
|
| 24 |
-
"embeddings_enabled": False, # Faster
|
| 25 |
-
"ocr_enabled": True,
|
| 26 |
-
"object_detection_enabled": True,
|
| 27 |
-
}
|
| 28 |
-
```
|
| 29 |
-
|
| 30 |
-
**Speed:** Fast (2-3 seconds per capture)
|
| 31 |
-
**Features:** Caption + OCR + Object Detection
|
| 32 |
-
|
| 33 |
-
### Full Mode (Slower but more features)
|
| 34 |
-
```python
|
| 35 |
-
FEATURES = {
|
| 36 |
-
"embeddings_enabled": True, # Slower but enables visual search
|
| 37 |
-
"ocr_enabled": True,
|
| 38 |
-
"object_detection_enabled": True,
|
| 39 |
-
}
|
| 40 |
-
```
|
| 41 |
-
|
| 42 |
-
**Speed:** Slower (5-7 seconds per capture)
|
| 43 |
-
**Features:** All features including visual similarity search
|
| 44 |
-
|
| 45 |
-
## Speed Comparison
|
| 46 |
-
|
| 47 |
-
| Feature | Time | Can Disable? |
|
| 48 |
-
|---------|------|--------------|
|
| 49 |
-
| YOLO Detection | ~500ms | Yes (set object_detection_enabled=False) |
|
| 50 |
-
| BLIP Caption | ~1000ms | No (core feature) |
|
| 51 |
-
| CLIP Embeddings | ~2000ms | Yes (set embeddings_enabled=False) |
|
| 52 |
-
| EasyOCR | ~500ms | Yes (set ocr_enabled=False) |
|
| 53 |
-
|
| 54 |
-
## Optimization Tips
|
| 55 |
-
|
| 56 |
-
### 1. Disable Unused Features
|
| 57 |
-
Edit `config/settings.py`:
|
| 58 |
-
```python
|
| 59 |
-
FEATURES = {
|
| 60 |
-
"ocr_enabled": False, # If you don't need text reading
|
| 61 |
-
"embeddings_enabled": False, # If you don't need visual search
|
| 62 |
-
"object_detection_enabled": False, # If you don't need object detection
|
| 63 |
-
}
|
| 64 |
-
```
|
| 65 |
-
|
| 66 |
-
### 2. Use Smaller Models
|
| 67 |
-
```python
|
| 68 |
-
MODEL_CONFIG = {
|
| 69 |
-
"yolo_model": "yolov8n.pt", # Nano model (faster)
|
| 70 |
-
# Instead of "yolov8s.pt" (small model)
|
| 71 |
-
}
|
| 72 |
-
```
|
| 73 |
-
|
| 74 |
-
### 3. Reduce OCR Languages
|
| 75 |
-
```python
|
| 76 |
-
OCR_CONFIG = {
|
| 77 |
-
"languages": ["en"], # Just English (faster)
|
| 78 |
-
# Instead of ["en", "es", "fr", "de"]
|
| 79 |
-
}
|
| 80 |
-
```
|
| 81 |
-
|
| 82 |
-
### 4. Lower Confidence Thresholds
|
| 83 |
-
```python
|
| 84 |
-
VISION_CONFIG = {
|
| 85 |
-
"confidence_threshold": 0.3, # Lower = faster but less accurate
|
| 86 |
-
}
|
| 87 |
-
```
|
| 88 |
-
|
| 89 |
-
### 5. Use GPU (if available)
|
| 90 |
-
```python
|
| 91 |
-
PERFORMANCE_CONFIG = {
|
| 92 |
-
"use_gpu": True, # Much faster with GPU
|
| 93 |
-
}
|
| 94 |
-
```
|
| 95 |
-
|
| 96 |
-
## Recommended Settings
|
| 97 |
-
|
| 98 |
-
### For Speed
|
| 99 |
-
```python
|
| 100 |
-
# Fastest configuration
|
| 101 |
-
FEATURES = {
|
| 102 |
-
"ocr_enabled": False,
|
| 103 |
-
"embeddings_enabled": False,
|
| 104 |
-
"object_detection_enabled": False,
|
| 105 |
-
}
|
| 106 |
-
```
|
| 107 |
-
**Result:** ~1 second per capture (caption only)
|
| 108 |
-
|
| 109 |
-
### For Balance
|
| 110 |
-
```python
|
| 111 |
-
# Balanced configuration (default)
|
| 112 |
-
FEATURES = {
|
| 113 |
-
"ocr_enabled": True,
|
| 114 |
-
"embeddings_enabled": False,
|
| 115 |
-
"object_detection_enabled": True,
|
| 116 |
-
}
|
| 117 |
-
```
|
| 118 |
-
**Result:** ~2-3 seconds per capture
|
| 119 |
-
|
| 120 |
-
### For Full Features
|
| 121 |
-
```python
|
| 122 |
-
# All features enabled
|
| 123 |
-
FEATURES = {
|
| 124 |
-
"ocr_enabled": True,
|
| 125 |
-
"embeddings_enabled": True,
|
| 126 |
-
"object_detection_enabled": True,
|
| 127 |
-
}
|
| 128 |
-
```
|
| 129 |
-
**Result:** ~5-7 seconds per capture
|
| 130 |
-
|
| 131 |
-
## Troubleshooting Slow Performance
|
| 132 |
-
|
| 133 |
-
### Issue: First run is very slow
|
| 134 |
-
**Solution:** This is normal. Models are being downloaded (~2GB). Subsequent runs will be much faster.
|
| 135 |
-
|
| 136 |
-
### Issue: Every capture takes 5+ seconds
|
| 137 |
-
**Solution:** Disable embeddings in `config/settings.py`:
|
| 138 |
-
```python
|
| 139 |
-
FEATURES = {
|
| 140 |
-
"embeddings_enabled": False,
|
| 141 |
-
}
|
| 142 |
-
```
|
| 143 |
-
|
| 144 |
-
### Issue: OCR is slow
|
| 145 |
-
**Solution:**
|
| 146 |
-
1. Reduce languages to just what you need
|
| 147 |
-
2. Or disable OCR if not needed:
|
| 148 |
-
```python
|
| 149 |
-
FEATURES = {
|
| 150 |
-
"ocr_enabled": False,
|
| 151 |
-
}
|
| 152 |
-
```
|
| 153 |
-
|
| 154 |
-
### Issue: Out of memory
|
| 155 |
-
**Solution:**
|
| 156 |
-
1. Close other applications
|
| 157 |
-
2. Disable embeddings
|
| 158 |
-
3. Use smaller YOLO model (yolov8n.pt)
|
| 159 |
-
|
| 160 |
-
## Current Configuration
|
| 161 |
-
|
| 162 |
-
The system is now configured for **balanced performance**:
|
| 163 |
-
- Embeddings: DISABLED (faster)
|
| 164 |
-
- OCR: ENABLED
|
| 165 |
-
- Object Detection: ENABLED
|
| 166 |
-
- Caption: ENABLED (always on)
|
| 167 |
-
|
| 168 |
-
This gives you good features with reasonable speed (~2-3 seconds per capture).
|
| 169 |
-
|
| 170 |
-
## How to Change Settings
|
| 171 |
-
|
| 172 |
-
1. Open `config/settings.py`
|
| 173 |
-
2. Find the `FEATURES` section
|
| 174 |
-
3. Change `True`/`False` values
|
| 175 |
-
4. Restart the application
|
| 176 |
-
|
| 177 |
-
Example:
|
| 178 |
-
```python
|
| 179 |
-
# For maximum speed
|
| 180 |
-
FEATURES = {
|
| 181 |
-
"ocr_enabled": False,
|
| 182 |
-
"embeddings_enabled": False,
|
| 183 |
-
"object_detection_enabled": False,
|
| 184 |
-
}
|
| 185 |
-
```
|
| 186 |
-
|
| 187 |
-
Save the file and restart with `run.bat`.
docs/PERFORMANCE_ANALYSIS.md
DELETED
|
@@ -1,310 +0,0 @@
|
|
| 1 |
-
# VisionQ Performance Analysis
|
| 2 |
-
|
| 3 |
-
## Current Models Being Used
|
| 4 |
-
|
| 5 |
-
### 1. YOLO (Object Detection)
|
| 6 |
-
**Model:** YOLOv8s (Small)
|
| 7 |
-
**File:** `yolov8s.pt`
|
| 8 |
-
**Size:** ~22MB
|
| 9 |
-
**Speed:** ~500ms per frame
|
| 10 |
-
**Purpose:** Detect objects in scene
|
| 11 |
-
|
| 12 |
-
### 2. BLIP (Image Captioning)
|
| 13 |
-
**Model:** Salesforce/blip-image-captioning-base
|
| 14 |
-
**Size:** ~990MB
|
| 15 |
-
**Speed:** ~1000-1500ms per frame
|
| 16 |
-
**Purpose:** Generate scene descriptions
|
| 17 |
-
**THIS IS THE SLOWEST PART!**
|
| 18 |
-
|
| 19 |
-
### 3. EasyOCR (Text Extraction)
|
| 20 |
-
**Model:** EasyOCR English
|
| 21 |
-
**Size:** ~50MB per language
|
| 22 |
-
**Speed:** ~500ms per frame
|
| 23 |
-
**Purpose:** Read text from images
|
| 24 |
-
|
| 25 |
-
### 4. CLIP (Embeddings) - DISABLED
|
| 26 |
-
**Model:** openai/clip-vit-base-patch32
|
| 27 |
-
**Status:** Disabled by default
|
| 28 |
-
**Speed:** Would add ~2000ms if enabled
|
| 29 |
-
|
| 30 |
-
---
|
| 31 |
-
|
| 32 |
-
## Why Camera is Slow
|
| 33 |
-
|
| 34 |
-
### Current Processing Time Breakdown
|
| 35 |
-
|
| 36 |
-
**When you click "Capture & Describe":**
|
| 37 |
-
```
|
| 38 |
-
1. Capture frame: ~10ms
|
| 39 |
-
2. BLIP caption: ~1500ms ← SLOWEST!
|
| 40 |
-
3. EasyOCR text: ~500ms
|
| 41 |
-
4. Fusion/processing: ~50ms
|
| 42 |
-
--------------------------------
|
| 43 |
-
Total: ~2060ms (2+ seconds)
|
| 44 |
-
```
|
| 45 |
-
|
| 46 |
-
**The main bottleneck is BLIP (image captioning)!**
|
| 47 |
-
|
| 48 |
-
---
|
| 49 |
-
|
| 50 |
-
## Speed Optimization Options
|
| 51 |
-
|
| 52 |
-
### Option 1: Disable OCR (Fastest)
|
| 53 |
-
**Speed gain:** ~500ms faster
|
| 54 |
-
**Trade-off:** No text reading
|
| 55 |
-
|
| 56 |
-
Edit `config/settings.py`:
|
| 57 |
-
```python
|
| 58 |
-
FEATURES = {
|
| 59 |
-
"ocr_enabled": False, # Disable OCR
|
| 60 |
-
}
|
| 61 |
-
```
|
| 62 |
-
|
| 63 |
-
**New speed:** ~1.5 seconds
|
| 64 |
-
|
| 65 |
-
---
|
| 66 |
-
|
| 67 |
-
### Option 2: Use Smaller YOLO Model
|
| 68 |
-
**Speed gain:** ~200ms faster
|
| 69 |
-
**Trade-off:** Slightly less accurate object detection
|
| 70 |
-
|
| 71 |
-
Edit `config/settings.py`:
|
| 72 |
-
```python
|
| 73 |
-
MODEL_CONFIG = {
|
| 74 |
-
"yolo_model": "yolov8n.pt", # Nano model (faster)
|
| 75 |
-
}
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
Download nano model:
|
| 79 |
-
```bash
|
| 80 |
-
# In Python
|
| 81 |
-
from ultralytics import YOLO
|
| 82 |
-
model = YOLO("yolov8n.pt")
|
| 83 |
-
```
|
| 84 |
-
|
| 85 |
-
**New speed:** ~1.8 seconds
|
| 86 |
-
|
| 87 |
-
---
|
| 88 |
-
|
| 89 |
-
### Option 3: Disable Object Detection
|
| 90 |
-
**Speed gain:** ~500ms faster
|
| 91 |
-
**Trade-off:** No object detection
|
| 92 |
-
|
| 93 |
-
Edit `config/settings.py`:
|
| 94 |
-
```python
|
| 95 |
-
FEATURES = {
|
| 96 |
-
"object_detection_enabled": False,
|
| 97 |
-
}
|
| 98 |
-
```
|
| 99 |
-
|
| 100 |
-
**New speed:** ~1.5 seconds
|
| 101 |
-
|
| 102 |
-
---
|
| 103 |
-
|
| 104 |
-
### Option 4: Use Faster Caption Model (RECOMMENDED)
|
| 105 |
-
**Speed gain:** ~1000ms faster!
|
| 106 |
-
**Trade-off:** Slightly different captions
|
| 107 |
-
|
| 108 |
-
Replace BLIP with a faster model like GIT or BLIP-2 small.
|
| 109 |
-
|
| 110 |
-
---
|
| 111 |
-
|
| 112 |
-
### Option 5: Caption Only Mode (FASTEST)
|
| 113 |
-
**Speed gain:** Maximum
|
| 114 |
-
**Trade-off:** Only caption, no OCR or objects
|
| 115 |
-
|
| 116 |
-
Edit `config/settings.py`:
|
| 117 |
-
```python
|
| 118 |
-
FEATURES = {
|
| 119 |
-
"ocr_enabled": False,
|
| 120 |
-
"object_detection_enabled": False,
|
| 121 |
-
}
|
| 122 |
-
```
|
| 123 |
-
|
| 124 |
-
**New speed:** ~1.5 seconds (just BLIP)
|
| 125 |
-
|
| 126 |
-
---
|
| 127 |
-
|
| 128 |
-
## Recommended Configurations
|
| 129 |
-
|
| 130 |
-
### For Speed (Fastest)
|
| 131 |
-
```python
|
| 132 |
-
# config/settings.py
|
| 133 |
-
FEATURES = {
|
| 134 |
-
"ocr_enabled": False, # Disable OCR
|
| 135 |
-
"object_detection_enabled": False, # Disable YOLO
|
| 136 |
-
"embeddings_enabled": False, # Already disabled
|
| 137 |
-
}
|
| 138 |
-
|
| 139 |
-
MODEL_CONFIG = {
|
| 140 |
-
"yolo_model": "yolov8n.pt", # Use nano if keeping YOLO
|
| 141 |
-
}
|
| 142 |
-
```
|
| 143 |
-
|
| 144 |
-
**Result:** ~1.5 seconds per capture
|
| 145 |
-
|
| 146 |
-
---
|
| 147 |
-
|
| 148 |
-
### For Balance (Recommended)
|
| 149 |
-
```python
|
| 150 |
-
# config/settings.py
|
| 151 |
-
FEATURES = {
|
| 152 |
-
"ocr_enabled": True, # Keep OCR
|
| 153 |
-
"object_detection_enabled": False, # Disable YOLO (not critical)
|
| 154 |
-
"embeddings_enabled": False, # Keep disabled
|
| 155 |
-
}
|
| 156 |
-
```
|
| 157 |
-
|
| 158 |
-
**Result:** ~2 seconds per capture
|
| 159 |
-
|
| 160 |
-
---
|
| 161 |
-
|
| 162 |
-
### For Full Features (Slowest)
|
| 163 |
-
```python
|
| 164 |
-
# config/settings.py
|
| 165 |
-
FEATURES = {
|
| 166 |
-
"ocr_enabled": True,
|
| 167 |
-
"object_detection_enabled": True,
|
| 168 |
-
"embeddings_enabled": False, # Keep disabled!
|
| 169 |
-
}
|
| 170 |
-
```
|
| 171 |
-
|
| 172 |
-
**Result:** ~2.5 seconds per capture
|
| 173 |
-
|
| 174 |
-
---
|
| 175 |
-
|
| 176 |
-
## GPU Acceleration
|
| 177 |
-
|
| 178 |
-
If you have an NVIDIA GPU:
|
| 179 |
-
|
| 180 |
-
```python
|
| 181 |
-
# config/settings.py
|
| 182 |
-
PERFORMANCE_CONFIG = {
|
| 183 |
-
"use_gpu": True, # Enable GPU
|
| 184 |
-
}
|
| 185 |
-
|
| 186 |
-
OCR_CONFIG = {
|
| 187 |
-
"gpu": True, # Enable GPU for OCR
|
| 188 |
-
}
|
| 189 |
-
```
|
| 190 |
-
|
| 191 |
-
**Speed improvement:** 2-3x faster!
|
| 192 |
-
|
| 193 |
-
**Requirements:**
|
| 194 |
-
- NVIDIA GPU
|
| 195 |
-
- CUDA installed
|
| 196 |
-
- PyTorch with CUDA support
|
| 197 |
-
|
| 198 |
-
---
|
| 199 |
-
|
| 200 |
-
## Camera Feed Speed
|
| 201 |
-
|
| 202 |
-
The camera itself is fast (~10ms per frame).
|
| 203 |
-
|
| 204 |
-
**The slowness comes from AI processing, not the camera!**
|
| 205 |
-
|
| 206 |
-
### For Continuous Feed:
|
| 207 |
-
- Camera updates quickly
|
| 208 |
-
- But processing (BLIP) takes 1-2 seconds
|
| 209 |
-
- So you see lag between capture and results
|
| 210 |
-
|
| 211 |
-
### Solutions:
|
| 212 |
-
1. Use static feed (current `app.py`)
|
| 213 |
-
2. Disable heavy features (OCR, YOLO)
|
| 214 |
-
3. Use GPU acceleration
|
| 215 |
-
4. Accept the 1-2 second delay
|
| 216 |
-
|
| 217 |
-
---
|
| 218 |
-
|
| 219 |
-
## Model Comparison
|
| 220 |
-
|
| 221 |
-
| Model | Size | Speed | Accuracy | Replaceable? |
|
| 222 |
-
|-------|------|-------|----------|--------------|
|
| 223 |
-
| **BLIP** | 990MB | Slow (1.5s) | High | Yes (use GIT) |
|
| 224 |
-
| **YOLO** | 22MB | Medium (0.5s) | High | Yes (use nano) |
|
| 225 |
-
| **EasyOCR** | 50MB | Medium (0.5s) | High | Hard to replace |
|
| 226 |
-
| **CLIP** | 500MB | Slow (2s) | High | Disabled |
|
| 227 |
-
|
| 228 |
-
---
|
| 229 |
-
|
| 230 |
-
## Quick Fixes You Can Try Now
|
| 231 |
-
|
| 232 |
-
### 1. Disable OCR
|
| 233 |
-
```python
|
| 234 |
-
# config/settings.py
|
| 235 |
-
FEATURES = {
|
| 236 |
-
"ocr_enabled": False,
|
| 237 |
-
}
|
| 238 |
-
```
|
| 239 |
-
**Restart app:** `fix_and_run.bat`
|
| 240 |
-
|
| 241 |
-
### 2. Disable YOLO
|
| 242 |
-
```python
|
| 243 |
-
# config/settings.py
|
| 244 |
-
FEATURES = {
|
| 245 |
-
"object_detection_enabled": False,
|
| 246 |
-
}
|
| 247 |
-
```
|
| 248 |
-
**Restart app:** `fix_and_run.bat`
|
| 249 |
-
|
| 250 |
-
### 3. Both (Fastest)
|
| 251 |
-
```python
|
| 252 |
-
# config/settings.py
|
| 253 |
-
FEATURES = {
|
| 254 |
-
"ocr_enabled": False,
|
| 255 |
-
"object_detection_enabled": False,
|
| 256 |
-
}
|
| 257 |
-
```
|
| 258 |
-
**Restart app:** `fix_and_run.bat`
|
| 259 |
-
|
| 260 |
-
**Result:** Only BLIP caption (~1.5 seconds)
|
| 261 |
-
|
| 262 |
-
---
|
| 263 |
-
|
| 264 |
-
## Alternative: Use Lighter Caption Model
|
| 265 |
-
|
| 266 |
-
Create a new caption agent with a faster model:
|
| 267 |
-
|
| 268 |
-
```python
|
| 269 |
-
# agents/caption_agent_fast.py
|
| 270 |
-
from transformers import AutoProcessor, AutoModelForCausalLM
|
| 271 |
-
|
| 272 |
-
class FastCaptionAgent:
|
| 273 |
-
def __init__(self):
|
| 274 |
-
# Use GIT (faster than BLIP)
|
| 275 |
-
self.processor = AutoProcessor.from_pretrained("microsoft/git-base")
|
| 276 |
-
self.model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")
|
| 277 |
-
self.model.eval()
|
| 278 |
-
|
| 279 |
-
def describe(self, frame_bgr):
|
| 280 |
-
# Same as BLIP but faster
|
| 281 |
-
...
|
| 282 |
-
```
|
| 283 |
-
|
| 284 |
-
**Speed:** ~500ms (3x faster than BLIP!)
|
| 285 |
-
|
| 286 |
-
---
|
| 287 |
-
|
| 288 |
-
## Summary
|
| 289 |
-
|
| 290 |
-
**Why slow:**
|
| 291 |
-
- BLIP caption model takes 1.5 seconds
|
| 292 |
-
- OCR adds 0.5 seconds
|
| 293 |
-
- YOLO adds 0.5 seconds
|
| 294 |
-
- Total: 2.5 seconds
|
| 295 |
-
|
| 296 |
-
**Quick fix:**
|
| 297 |
-
```python
|
| 298 |
-
# Disable OCR and YOLO
|
| 299 |
-
FEATURES = {
|
| 300 |
-
"ocr_enabled": False,
|
| 301 |
-
"object_detection_enabled": False,
|
| 302 |
-
}
|
| 303 |
-
```
|
| 304 |
-
**New speed:** 1.5 seconds (just BLIP)
|
| 305 |
-
|
| 306 |
-
**Best fix:**
|
| 307 |
-
- Use GPU acceleration (2-3x faster)
|
| 308 |
-
- Or replace BLIP with GIT model (3x faster)
|
| 309 |
-
|
| 310 |
-
**The camera itself is fast - it's the AI models that are slow!**
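Reader's note on the deleted `docs/PERFORMANCE_ANALYSIS.md` above: its per-stage timings can be combined into a rough latency estimate for any combination of feature flags. The sketch below does only that arithmetic. The millisecond values are the document's own approximations, not measurements, and `estimate_capture_ms` is a hypothetical helper name.

```python
# Approximate per-stage costs in milliseconds, taken from the deleted document.
STAGE_COST_MS = {
    "capture": 10,
    "caption": 1500,           # BLIP, always on
    "ocr": 500,                # EasyOCR
    "object_detection": 500,   # YOLOv8s
    "embeddings": 2000,        # CLIP
    "fusion": 50,
}

# Which optional stage each FEATURES flag switches on.
FLAG_TO_STAGE = {
    "ocr_enabled": "ocr",
    "object_detection_enabled": "object_detection",
    "embeddings_enabled": "embeddings",
}


def estimate_capture_ms(features):
    """Sum the always-on stages plus every optional stage whose flag is True."""
    total = (
        STAGE_COST_MS["capture"]
        + STAGE_COST_MS["caption"]
        + STAGE_COST_MS["fusion"]
    )
    for flag, stage in FLAG_TO_STAGE.items():
        if features.get(flag, False):
            total += STAGE_COST_MS[stage]
    return total
```

With only OCR enabled this gives 2060 ms, which matches the "Capture & Describe" breakdown in the document. With every optional stage disabled it gives 1560 ms, in line with the "~1.5 seconds (just BLIP)" figure.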
extras/labelmap_M.txt
DELETED
@@ -1,91 +0,0 @@
-???
-person
-bicycle
-car
-motorcycle
-airplane
-bus
-train
-truck
-boat
-traffic light
-fire hydrant
-???
-stop sign
-parking meter
-bench
-bird
-cat
-dog
-horse
-sheep
-cow
-elephant
-bear
-zebra
-giraffe
-???
-backpack
-umbrella
-???
-???
-handbag
-tie
-suitcase
-frisbee
-skis
-snowboard
-sports ball
-kite
-baseball bat
-baseball glove
-skateboard
-surfboard
-tennis racket
-bottle
-???
-wine glass
-cup
-fork
-knife
-spoon
-bowl
-banana
-apple
-sandwich
-orange
-broccoli
-carrot
-hot dog
-pizza
-donut
-cake
-chair
-couch
-potted plant
-bed
-???
-dining table
-???
-???
-toilet
-???
-tv
-laptop
-mouse
-remote
-keyboard
-cell phone
-microwave
-oven
-toaster
-sink
-refrigerator
-???
-book
-clock
-vase
-scissors
-teddy bear
-hair drier
-toothbrush
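Reviewer note: the removed label map has one class name per line, with `???` on lines that carry no label. As a rough illustration of how a file in this shape could be read, here is a minimal sketch. It assumes class ids are zero-based line indexes, which this diff does not confirm, and `parse_labelmap` is a hypothetical helper, not code from this repository.

```python
from typing import Dict, Iterable

PLACEHOLDER = "???"  # marker the deleted file uses on lines without a label


def parse_labelmap(lines: Iterable[str]) -> Dict[int, str]:
    """Map zero-based line index -> label, skipping placeholder and blank lines."""
    labels: Dict[int, str] = {}
    for index, raw in enumerate(lines):
        name = raw.strip()
        if name and name != PLACEHOLDER:
            labels[index] = name
    return labels
```

Passing an open file object works as well, since it iterates line by line. Keeping the placeholders out of the mapping means a lookup for an unused id fails loudly instead of returning `???`.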
fix_and_run.bat
DELETED
@@ -1,40 +0,0 @@
-@echo off
-REM Quick Fix Script for VisionQ
-REM Clears cache and restarts
-
-echo.
-echo ============================================
-echo VisionQ - Quick Fix Script
-echo ============================================
-echo.
-
-echo [1/3] Clearing Python cache...
-if exist "__pycache__" rd /s /q __pycache__
-if exist "agents\__pycache__" rd /s /q agents\__pycache__
-if exist "config\__pycache__" rd /s /q config\__pycache__
-if exist "core\__pycache__" rd /s /q core\__pycache__
-if exist "ui\__pycache__" rd /s /q ui\__pycache__
-echo - Python cache cleared
-
-echo.
-echo [2/3] Clearing Streamlit cache...
-if exist ".streamlit\cache" rd /s /q .streamlit\cache
-echo - Streamlit cache cleared
-
-echo.
-echo [3/3] Restarting application...
-echo.
-echo ============================================
-echo Cache cleared! Starting VisionQ...
-echo ============================================
-echo.
-
-REM Activate venv if exists
-if exist "venv\Scripts\activate.bat" (
-    call venv\Scripts\activate.bat
-)
-
-REM Run Streamlit
-streamlit run ui\app.py
-
-pause
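Reviewer note: the deleted script clears a fixed list of `__pycache__` folders and only runs on Windows. If a replacement is ever wanted, the same cleanup could be sketched in Python as below. `clear_pycache` is a hypothetical helper, not part of this repository. Unlike the batch file, it walks the whole tree under `root`, so it would also clear caches inside a virtual environment stored there.

```python
import shutil
from pathlib import Path
from typing import List


def clear_pycache(root: Path) -> List[Path]:
    """Delete every __pycache__ directory under root and return the removed paths."""
    removed: List[Path] = []
    # sorted() materializes the matches first so deleting does not disturb the walk.
    for cache_dir in sorted(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir, ignore_errors=True)
            removed.append(cache_dir)
    return removed
```

Calling `clear_pycache(Path("."))` from the project root would cover the five directories the batch file listed, plus any others below it.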
fix_tensorflow.bat
DELETED
@@ -1,43 +0,0 @@
-@echo off
-REM Fix TensorFlow/Protobuf Conflict
-
-echo.
-echo ============================================
-echo Fixing TensorFlow/Protobuf Conflict
-echo ============================================
-echo.
-
-REM Activate venv
-if exist ".venv\Scripts\activate.bat" (
-    call .venv\Scripts\activate.bat
-) else if exist "venv\Scripts\activate.bat" (
-    call venv\Scripts\activate.bat
-)
-
-echo [1/4] Uninstalling conflicting packages...
-pip uninstall tensorflow tensorflow-cpu protobuf -y
-
-echo.
-echo [2/4] Installing correct protobuf version...
-pip install protobuf==3.20.3
-
-echo.
-echo [3/4] Reinstalling transformers...
-pip install --upgrade --force-reinstall transformers
-
-echo.
-echo [4/4] Clearing cache...
-rd /s /q __pycache__ 2>nul
-rd /s /q agents\__pycache__ 2>nul
-rd /s /q config\__pycache__ 2>nul
-rd /s /q core\__pycache__ 2>nul
-rd /s /q ui\__pycache__ 2>nul
-
-echo.
-echo ============================================
-echo Fix Complete!
-echo ============================================
-echo.
-echo Now run: streamlit run ui\app.py
-echo.
-pause
memory.json
DELETED
The diff for this file is too large to render.
run_continuous.bat
DELETED
@@ -1,30 +0,0 @@
-@echo off
-REM VisionQ - Continuous Camera Feed Version
-
-echo.
-echo ============================================
-echo VisionQ - Continuous Camera Feed
-echo ============================================
-echo.
-
-REM Activate venv
-if exist ".venv\Scripts\activate.bat" (
-    call .venv\Scripts\activate.bat
-) else if exist "venv\Scripts\activate.bat" (
-    call venv\Scripts\activate.bat
-)
-
-echo [INFO] Launching VisionQ with continuous camera feed...
-echo [INFO] Opening browser at http://localhost:8501
-echo.
-echo Features:
-echo - Live camera feed
-echo - Adjustable refresh rate
-echo - Start/Stop camera control
-echo.
-echo Press Ctrl+C to stop the server
-echo.
-
-streamlit run ui\app_continuous.py
-
-pause
ui/app_continuous.py
DELETED
@@ -1,340 +0,0 @@
-"""
-VisionQ - Enhanced Streamlit Interface with Continuous Camera Feed
-"""
-
-import streamlit as st
-import cv2
-import numpy as np
-from PIL import Image
-import sys
-from pathlib import Path
-import time
-
-# Add project root to path
-PROJECT_ROOT = Path(__file__).parent.parent
-sys.path.insert(0, str(PROJECT_ROOT))
-
-from config.settings import UI_CONFIG, OCR_CONFIG, SUPPORTED_LANGUAGES
-from agents.vision_agent import VisionAgent
-from agents.memory_agent import MemoryAgent
-from agents.query_agent import QueryAgent
-
-# Page config
-st.set_page_config(
-    page_title=UI_CONFIG["title"],
-    page_icon="👁️",
-    layout=UI_CONFIG["layout"],
-)
-
-# Custom CSS
-st.markdown("""
-<style>
-    .main-header {
-        font-size: 3rem;
-        font-weight: bold;
-        text-align: center;
-        margin-bottom: 2rem;
-        background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
-        -webkit-background-clip: text;
-        -webkit-text-fill-color: transparent;
-    }
-    .success-box {
-        padding: 1rem;
-        border-radius: 0.5rem;
-        background-color: #d4edda;
-        border: 1px solid #c3e6cb;
-        color: #155724;
-    }
-</style>
-""", unsafe_allow_html=True)
-
-# Initialize session state
-if "vision_agent" not in st.session_state:
-    st.session_state.vision_agent = None
-if "memory_agent" not in st.session_state:
-    st.session_state.memory_agent = None
-if "query_agent" not in st.session_state:
-    st.session_state.query_agent = None
-if "last_description" not in st.session_state:
-    st.session_state.last_description = None
-if "camera_running" not in st.session_state:
-    st.session_state.camera_running = False
-
-@st.cache_resource
-def load_agents():
-    """Load all agents (cached)"""
-    with st.spinner("Loading AI models... This may take a minute on first run..."):
-        try:
-            vision = VisionAgent()
-            memory = MemoryAgent()
-            query = QueryAgent(memory)
-            return vision, memory, query
-        except Exception as e:
-            st.error(f"Error loading agents: {e}")
-            return None, None, None
-
-def capture_frame(vision_agent):
-    """Capture frame from camera"""
-    ret, frame = vision_agent.cap.read()
-    if ret:
-        return frame
-    return None
-
-def main():
-    # Header
-    st.markdown('<h1 class="main-header">VisionQ - Multimodal AI Assistant</h1>', unsafe_allow_html=True)
-
-    # Sidebar
-    with st.sidebar:
-        st.header("Settings")
-
-        # Language selection
-        st.subheader("OCR Language")
-        selected_langs = st.multiselect(
-            "Select languages for text extraction:",
-            options=list(SUPPORTED_LANGUAGES.keys()),
-            default=OCR_CONFIG["languages"],
-            format_func=lambda x: f"{SUPPORTED_LANGUAGES[x]} ({x})"
-        )
-
-        if selected_langs:
-            OCR_CONFIG["languages"] = selected_langs
-
-        st.divider()
-
-        # Camera settings
-        st.subheader("Camera Settings")
-        refresh_rate = st.slider("Refresh rate (seconds)", 0.5, 5.0, 1.0, 0.5)
-
-        st.divider()
-
-        # Info
-        st.subheader("About")
-        st.info("""
-        **VisionQ** is a multimodal AI assistant that can:
-        - See and describe scenes
-        - Read text (OCR)
-        - Remember and recall
-        - Search memories
-        """)
-
-        st.divider()
-
-        # Stats
-        st.subheader("System Status")
-        if st.session_state.memory_agent:
-            memories = st.session_state.memory_agent.recall_all()
-            st.metric("Memories Stored", len(memories))
-        else:
-            st.metric("Memories Stored", "Not loaded")
-
-    # Main content
-    tab1, tab2, tab3, tab4 = st.tabs(["Vision", "Query", "Memories", "Help"])
-
-    # TAB 1: VISION
-    with tab1:
-        st.header("Vision System")
-
-        # Load agents
-        if st.session_state.vision_agent is None:
-            if st.button("Initialize System", type="primary"):
-                st.cache_resource.clear()
-                vision, memory, query = load_agents()
-                if vision:
-                    st.session_state.vision_agent = vision
-                    st.session_state.memory_agent = memory
-                    st.session_state.query_agent = query
-                    st.success("System initialized successfully!")
-                    st.rerun()
-        else:
-            col1, col2 = st.columns([2, 1])
-
-            with col1:
-                st.subheader("Live Camera Feed")
-
-                # Camera controls
-                col_a, col_b, col_c, col_d = st.columns(4)
-
-                with col_a:
-                    if st.button("Capture & Describe"):
-                        with st.spinner("Analyzing scene..."):
-                            description = st.session_state.vision_agent.describe_scene()
-                            if description:
-                                st.session_state.last_description = description
-                                st.success("Scene analyzed!")
-
-                with col_b:
-                    if st.button("Remember Scene"):
-                        with st.spinner("Storing memory..."):
-                            description = st.session_state.vision_agent.remember_scene()
-                            if description:
-                                st.session_state.last_description = description
-                                st.success("Scene remembered!")
-
-                with col_c:
-                    if st.button("Read Text"):
-                        with st.spinner("Extracting text..."):
-                            text_result = st.session_state.vision_agent.read_text()
-                            if text_result:
-                                st.session_state.last_description = text_result
-                                st.success("Text extracted!")
-
-                with col_d:
-                    if st.button("Stop Camera" if st.session_state.camera_running else "Start Camera"):
-                        st.session_state.camera_running = not st.session_state.camera_running
-                        st.rerun()
-
-                # Camera feed placeholder
-                camera_placeholder = st.empty()
-
-                # Continuous camera feed
-                if st.session_state.camera_running:
-                    frame = capture_frame(st.session_state.vision_agent)
-                    if frame is not None:
-                        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-                        camera_placeholder.image(frame_rgb, channels="RGB", use_container_width=True)
-                        time.sleep(refresh_rate)
-                        st.rerun()
-                    else:
-                        camera_placeholder.error("Could not capture frame from camera")
-                else:
-                    # Show single frame when stopped
-                    frame = capture_frame(st.session_state.vision_agent)
-                    if frame is not None:
-                        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-                        camera_placeholder.image(frame_rgb, channels="RGB", use_container_width=True)
-                    else:
-                        camera_placeholder.info("Click 'Start Camera' to begin live feed")
-
-            with col2:
-                st.subheader("Results")
-
-                if st.session_state.last_description:
-                    st.markdown(f'<div class="success-box">{st.session_state.last_description}</div>',
-                               unsafe_allow_html=True)
-                else:
-                    st.info("Click a button to analyze the scene")
-
-    # TAB 2: QUERY
-    with tab2:
-        st.header("Query Memories")
-
-        if st.session_state.query_agent is None:
-            st.warning("Please initialize the system first (Vision tab)")
-        else:
-            st.subheader("Ask a Question")
-
-            query_text = st.text_input(
-                "Enter your question:",
-                placeholder="e.g., What did I see this morning?",
-                key="query_input"
-            )
-
-            st.caption("**Examples:**")
-            col1, col2, col3 = st.columns(3)
-            with col1:
-                if st.button("What did I see today?"):
-                    query_text = "What did I see today?"
-            with col2:
-                if st.button("When did I see a person?"):
-                    query_text = "When did I see a person?"
-            with col3:
-                if st.button("Show memories with text"):
-                    query_text = "Show memories with text"
-
-            if st.button("Search", type="primary") and query_text:
-                with st.spinner("Searching memories..."):
-                    result = st.session_state.query_agent.ask(query_text)
-
-                st.subheader("Results")
-                if "don't" in result.lower() or "no" in result.lower():
-                    st.info(result)
-                else:
-                    st.success(result)
-
-    # TAB 3: MEMORIES
-    with tab3:
-        st.header("Memory Browser")
-
-        if st.session_state.memory_agent is None:
-            st.warning("Please initialize the system first (Vision tab)")
-        else:
-            memories = st.session_state.memory_agent.recall_all()
-
-            if not memories:
-                st.info("No memories stored yet. Use the Vision tab to remember scenes!")
-            else:
-                st.success(f"Total memories: {len(memories)}")
-
-                for i, mem in enumerate(reversed(memories[-10:])):
-                    with st.expander(f"Memory #{mem.get('id', i)} - {mem.get('timestamp', 'Unknown')}"):
-                        st.write(f"**Description:** {mem.get('description', 'N/A')}")
-                        st.write(f"**Importance:** {mem.get('importance', 1)}")
-
-                        has_text_emb = "text_embedding" in mem
-                        has_img_emb = "image_embedding" in mem
-
-                        col1, col2 = st.columns(2)
-                        with col1:
-                            st.caption(f"Text Embedding: {'Yes' if has_text_emb else 'No'}")
-                        with col2:
-                            st.caption(f"Image Embedding: {'Yes' if has_img_emb else 'No'}")
-
-                st.divider()
-                if st.button("Clear All Memories", type="secondary"):
-                    if st.button("Confirm Clear"):
-                        st.session_state.memory_agent.memories = []
-                        st.session_state.memory_agent._save()
-                        st.success("All memories cleared!")
-                        st.rerun()
-
-    # TAB 4: HELP
-    with tab4:
-        st.header("Help & Documentation")
-
-        st.subheader("Quick Start")
-        st.markdown("""
-        1. **Initialize System**: Click "Initialize System" in the Vision tab
-        2. **Start Camera**: Click "Start Camera" for continuous feed
-        3. **Capture Scene**: Click "Capture & Describe" to analyze
-        4. **Remember**: Click "Remember Scene" to store in memory
-        5. **Read Text**: Click "Read Text" to extract visible text
-        6. **Query**: Go to Query tab and ask questions
-        """)
-
-        st.divider()
-
-        st.subheader("Camera Controls")
-        st.markdown("""
-        - **Start Camera**: Begins continuous live feed
-        - **Stop Camera**: Pauses live feed (saves resources)
-        - **Refresh Rate**: Adjust in sidebar (0.5-5 seconds)
-
-        **Tip:** Stop camera when not in use to save CPU/battery
-        """)
-
-        st.divider()
-
-        st.subheader("Supported Languages")
-        st.markdown(f"""
-        VisionQ supports **{len(SUPPORTED_LANGUAGES)} languages** for OCR.
-        Select languages in the sidebar settings.
-        """)
-
-        st.divider()
-
-        st.subheader("Troubleshooting")
-        st.markdown("""
-        **Camera not working?**
-        - Check camera permissions
-        - Ensure no other app is using camera
-        - Try clicking "Stop Camera" then "Start Camera"
-
-        **System slow?**
-        - Stop camera when not needed
-        - Increase refresh rate in sidebar
-        - Check `docs/PERFORMANCE.md`
-        """)
-
-if __name__ == "__main__":
-    main()