# 📂 VisionQ - Project Structure

## 🎯 Clean & Organized Structure

```
VisionQ/
│
├── 📁 agents/                  # AI Agents (Core Logic)
│   ├── __init__.py
│   ├── voice_agent.py          # Voice I/O (STT + TTS)
│   ├── vision_agent.py         # Vision coordinator
│   ├── caption_agent.py        # Image captioning (BLIP)
│   ├── embedding_agent.py      # Visual embeddings (CLIP)
│   ├── ocr_agent.py            # Text extraction (EasyOCR)
│   ├── memory_agent.py         # Storage (JSON + FAISS)
│   └── query_agent.py          # Smart retrieval (DistilBERT)
│
├── 📁 config/                  # Configuration
│   └── settings.py             # Centralized settings
│
├── 📁 ui/                      # Web Interface
│   └── app.py                  # Streamlit application
│
├── 📁 data/                    # Storage (auto-created)
│   ├── memory.json             # Memory metadata
│   ├── memory.faiss            # Vector index
│   └── visionq.log             # Application logs
│
├── 📁 models/                  # AI Models (auto-downloaded)
│   ├── vosk/                   # Speech recognition
│   └── piper/                  # Neural TTS (optional)
│
├── 📁 docs/                    # Documentation
│   ├── LANGUAGES.md            # Language support (90+)
│   └── API_KEYS.md             # API keys info (none needed!)
│
├── 📁 core/                    # Core Integration
│   ├── __init__.py
│   └── fusion_layer.py         # Multimodal fusion
│
├── 📁 .streamlit/              # Streamlit Config
│   └── config.toml             # UI theme & settings
│
├── 📁 archive/                 # Old Files (after cleanup)
│   ├── old_agents/
│   ├── old_docs/
│   └── old_scripts/
│
├── 📄 README.md                # Main documentation
├── 📄 requirements.txt         # Dependencies
├── 📄 .env.example             # Environment template
├── 📄 .gitignore               # Git ignore rules
├── 📄 run.bat                  # Quick launcher (Windows)
├── 📄 cleanup.bat              # Project cleanup script
└── 📄 LICENSE                  # MIT License
```

---

## 📋 File Descriptions

### **Core Agents (`agents/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `voice_agent.py` | Voice I/O | ~200 | Vosk STT, Voxtral/pyttsx3 TTS, fallback |
| `vision_agent.py` | Vision hub | ~250 | YOLO/SSD, coordinates all vision agents |
| `caption_agent.py` | Captioning | ~50 | BLIP image-to-text |
| `embedding_agent.py` | Embeddings | ~60 | CLIP visual embeddings |
| `ocr_agent.py` | Text extraction | ~80 | EasyOCR, 90+ languages |
| `memory_agent.py` | Storage | ~200 | JSON + FAISS hybrid storage |
| `query_agent.py` | Retrieval | ~180 | DistilBERT NLP, hybrid search |

### **Configuration (`config/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `settings.py` | Central config | ~200 | All settings, feature flags, paths |

### **User Interface (`ui/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `app.py` | Streamlit UI | ~400 | Web interface, 4 tabs, interactive |

### **Integration (`core/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `fusion_layer.py` | Multimodal fusion | ~80 | Combines caption + OCR + embeddings |

### **Documentation (`docs/`)**

| File | Purpose | Pages |
|------|---------|-------|
| `LANGUAGES.md` | Language support | 5 |
| `API_KEYS.md` | API keys info | 4 |

### **Root Files**

| File | Purpose |
|------|---------|
| `README.md` | Main documentation |
| `requirements.txt` | Python dependencies |
| `.env.example` | Environment template |
| `.gitignore` | Git ignore rules |
| `run.bat` | Quick launcher |
| `cleanup.bat` | Cleanup script |
| `LICENSE` | MIT License |

---

## 🎯 Design Principles

### **1. Modularity**

- Each agent is independent
- Clear separation of concerns
- Easy to extend/modify

### **2. Configuration**

- Centralized in `config/settings.py`
- Environment variables supported
- Feature flags for easy toggling

### **3. User-Friendly**

- Streamlit UI for easy testing
- One-click launcher (`run.bat`)
- Clear documentation

### **4. Clean Structure**

- No redundant files
- Logical folder organization
- Archive for old files

---

## 📊 File Statistics

### **Code Files**

| Category | Files | Lines | Size |
|----------|-------|-------|------|
| Agents | 7 | ~1,100 | ~40KB |
| Config | 1 | ~200 | ~8KB |
| UI | 1 | ~400 | ~15KB |
| Core | 1 | ~80 | ~3KB |
| **Total** | **10** | **~1,780** | **~66KB** |

### **Documentation**

| Category | Files | Pages | Size |
|----------|-------|-------|------|
| Main | 1 | 3 | ~15KB |
| Guides | 2 | 9 | ~40KB |
| **Total** | **3** | **12** | **~55KB** |

### **Total Project**

- **Code Files:** 10
- **Documentation:** 3
- **Config Files:** 4
- **Total Lines:** ~2,000
- **Total Size:** ~120KB (excluding models)

---

## 🔄 Data Flow

```
User Input (UI/Voice)
    ↓
Vision Agent (Hub)
    ├─► YOLO/SSD → Objects
    ├─► BLIP → Caption
    ├─► CLIP → Embeddings
    └─► OCR → Text
    ↓
Fusion Layer
    ↓
Memory Agent (Storage)
    ├─► JSON (metadata)
    └─► FAISS (vectors)
    ↓
Query Agent (Retrieval)
    ├─► DistilBERT (intent)
    ├─► FAISS (similarity)
    └─► Time filter
    ↓
Response (UI/Voice)
```

---

## 🗂️ Directory Purposes

### **`agents/`** - AI Agents

- **Purpose:** Core AI functionality
- **Contains:** All intelligent agents
- **Modify:** To change AI behavior

### **`config/`** - Configuration

- **Purpose:** Centralized settings
- **Contains:** All configuration
- **Modify:** To customize behavior

### **`ui/`** - User Interface

- **Purpose:** Web interface
- **Contains:** Streamlit app
- **Modify:** To change UI

### **`core/`** - Integration

- **Purpose:** Multimodal fusion
- **Contains:** Integration logic
- **Modify:** To change fusion

### **`data/`** - Storage

- **Purpose:** Persistent data
- **Contains:** Memories, logs
- **Modify:** Never (auto-managed)

### **`models/`** - AI Models

- **Purpose:** Model storage
- **Contains:** Downloaded models
- **Modify:** Never (auto-managed)

### **`docs/`** - Documentation

- **Purpose:** User guides
- **Contains:** Documentation
- **Modify:** To update docs

### **`archive/`** - Old Files

- **Purpose:** Backup
- **Contains:** Old/deprecated files
- **Modify:** Can delete if not needed

---

## 🚀 Quick Navigation

### **Want to...**

**...change settings?** → `config/settings.py`

**...modify UI?** → `ui/app.py`

**...add new agent?** → Create in `agents/` folder

**...change OCR languages?** → `config/settings.py` → `OCR_CONFIG`

**...see memories?** → `data/memory.json`

**...check logs?** → `data/visionq.log`

**...understand languages?** → `docs/LANGUAGES.md`

**...learn about API keys?** → `docs/API_KEYS.md`

---

## 🧹 Cleanup Process

### **Before Cleanup**

```
VisionQ/
├── agents/ (new)
├── config/ (new)
├── ui/ (new)
├── caption_agent.py (old)
├── memory_agent.py (old)
├── vision_agent.py (old)
├── main.py (old)
├── main_upgraded.py (old)
└── ... (many old files)
```

### **After Cleanup**

```
VisionQ/
├── agents/ (clean)
├── config/ (clean)
├── ui/ (clean)
├── data/ (clean)
├── docs/ (clean)
├── archive/ (old files)
└── ... (only essential files)
```

**Run:** `cleanup.bat` to organize

---

## 📦 What Gets Downloaded

### **First Run Downloads (~2GB)**

| Model | Size | Purpose |
|-------|------|---------|
| YOLO | ~50MB | Object detection |
| BLIP | ~1GB | Image captioning |
| CLIP | ~500MB | Visual embeddings |
| DistilBERT | ~250MB | NLP |
| EasyOCR (per language) | ~50MB | Text extraction |
| sentence-transformers | ~100MB | Text embeddings |

**Location:** `~/.cache/` (system cache)

**Note:** Models are shared across projects!

---

## 🎯 Best Practices

### **Development**

1. **Modify agents** in `agents/` folder
2. **Change settings** in `config/settings.py`
3. **Update UI** in `ui/app.py`
4. **Test changes** with `run.bat`

### **Deployment**

1. **Keep** `agents/`, `config/`, `ui/`
2. **Include** `requirements.txt`, `README.md`
3. **Exclude** `data/`, `models/`, `archive/`
4. **Add** `.env` for production settings

### **Maintenance**

1. **Update** dependencies regularly
2. **Clean** old memories periodically
3. **Check** logs for errors
4. **Backup** `data/` folder

---

## ✅ Structure Benefits

### **Clean**

- ✅ No redundant files
- ✅ Logical organization
- ✅ Easy to navigate

### **Modular**

- ✅ Independent agents
- ✅ Clear responsibilities
- ✅ Easy to extend

### **User-Friendly**

- ✅ Streamlit UI
- ✅ One-click launch
- ✅ Clear documentation

### **Maintainable**

- ✅ Centralized config
- ✅ Consistent structure
- ✅ Well documented

---

**VisionQ - Clean, organized, and ready to use! 🚀**
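
---

The storage and retrieval half of the data flow described above (fused record, memory storage, similarity search with a time filter) can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's code: `Memory`, `MemoryStore`, and `cosine` are hypothetical names, the embeddings are hand-written two-dimensional vectors rather than CLIP outputs, and a brute-force list scan stands in for the FAISS index.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    """Hypothetical fused record: what a fusion layer might hand to storage."""
    caption: str         # would come from the captioning agent
    ocr_text: str        # would come from the OCR agent
    objects: list        # would come from object detection
    embedding: list      # would come from the embedding agent
    timestamp: datetime


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Stand-in for the JSON + FAISS pair: a list plus brute-force search."""

    def __init__(self):
        self.records = []

    def add(self, memory):
        self.records.append(memory)

    def search(self, query_embedding, k=1, since=None):
        """Return the k most similar records, optionally only those at or after `since`."""
        candidates = [m for m in self.records
                      if since is None or m.timestamp >= since]
        candidates.sort(key=lambda m: cosine(query_embedding, m.embedding),
                        reverse=True)
        return candidates[:k]


store = MemoryStore()
store.add(Memory("a red mug on a desk", "", ["cup"], [1.0, 0.0],
                 datetime(2024, 1, 1)))
store.add(Memory("a street sign", "STOP", ["stop sign"], [0.0, 1.0],
                 datetime(2024, 6, 1)))

# Similarity only: the query vector is closest to the mug record.
best = store.search([0.9, 0.1], k=1)
# Similarity plus time filter: the mug record is too old, so the sign is returned.
recent = store.search([0.9, 0.1], k=1, since=datetime(2024, 3, 1))
print(best[0].caption)
print(recent[0].caption)
```

Keeping metadata and vectors behind one small interface like this makes it easy to swap the list scan for a real vector index later without touching the callers.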