# 📁 VisionQ - Project Structure

## 🎯 Clean & Organized Structure

```
VisionQ/
│
├── 📁 agents/                  # AI Agents (Core Logic)
│   ├── __init__.py
│   ├── voice_agent.py          # Voice I/O (STT + TTS)
│   ├── vision_agent.py         # Vision coordinator
│   ├── caption_agent.py        # Image captioning (BLIP)
│   ├── embedding_agent.py      # Visual embeddings (CLIP)
│   ├── ocr_agent.py            # Text extraction (EasyOCR)
│   ├── memory_agent.py         # Storage (JSON + FAISS)
│   └── query_agent.py          # Smart retrieval (DistilBERT)
│
├── 📁 config/                  # Configuration
│   └── settings.py             # Centralized settings
│
├── 📁 ui/                      # Web Interface
│   └── app.py                  # Streamlit application
│
├── 📁 data/                    # Storage (auto-created)
│   ├── memory.json             # Memory metadata
│   ├── memory.faiss            # Vector index
│   └── visionq.log             # Application logs
│
├── 📁 models/                  # AI Models (auto-downloaded)
│   ├── vosk/                   # Speech recognition
│   └── piper/                  # Neural TTS (optional)
│
├── 📁 docs/                    # Documentation
│   ├── LANGUAGES.md            # Language support (90+)
│   └── API_KEYS.md             # API keys info (none needed!)
│
├── 📁 core/                    # Core Integration
│   ├── __init__.py
│   └── fusion_layer.py         # Multimodal fusion
│
├── 📁 .streamlit/              # Streamlit Config
│   └── config.toml             # UI theme & settings
│
├── 📁 archive/                 # Old Files (after cleanup)
│   ├── old_agents/
│   ├── old_docs/
│   └── old_scripts/
│
├── 📄 README.md                # Main documentation
├── 📄 requirements.txt         # Dependencies
├── 📄 .env.example             # Environment template
├── 📄 .gitignore               # Git ignore rules
├── 📄 run.bat                  # Quick launcher (Windows)
├── 📄 cleanup.bat              # Project cleanup script
└── 📄 LICENSE                  # MIT License
```

---

## 📄 File Descriptions

### **Core Agents (`agents/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `voice_agent.py` | Voice I/O | ~200 | Vosk STT, Voxtral/pyttsx3 TTS, fallback |
| `vision_agent.py` | Vision hub | ~250 | YOLO/SSD, coordinates all vision agents |
| `caption_agent.py` | Captioning | ~50 | BLIP image-to-text |
| `embedding_agent.py` | Embeddings | ~60 | CLIP visual embeddings |
| `ocr_agent.py` | Text extraction | ~80 | EasyOCR, 90+ languages |
| `memory_agent.py` | Storage | ~200 | JSON + FAISS hybrid storage |
| `query_agent.py` | Retrieval | ~180 | DistilBERT NLP, hybrid search |
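
A TTS fallback like the one in the `voice_agent.py` row is commonly implemented by trying each engine in order until one succeeds. This is a minimal sketch under the assumption that each engine is wrapped in a plain callable that raises on failure. `speak_with_fallback` and the stand-in engines are hypothetical and are not the actual code in `voice_agent.py`.

```python
from typing import Callable, Sequence

# Hypothetical type: a TTS backend takes text, speaks it, and raises on failure.
TTSBackend = Callable[[str], None]


def speak_with_fallback(text: str, backends: Sequence[tuple[str, TTSBackend]]) -> str:
    """Try each (name, backend) pair in order; return the name of the one that worked."""
    errors = []
    for name, backend in backends:
        try:
            backend(text)
            return name
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All TTS backends failed: " + "; ".join(errors))


# Stand-in engines; a real setup would wrap the optional neural engine and pyttsx3.
def neural_tts(text: str) -> None:
    raise OSError("voice model not found")  # simulate the optional model being absent


spoken = []


def basic_tts(text: str) -> None:
    spoken.append(text)  # a real backend would produce audio here


used = speak_with_fallback("Hello", [("neural", neural_tts), ("pyttsx3", basic_tts)])
print(used)  # pyttsx3
```

Ordering the backends from best quality to most reliable keeps the app usable even when the optional neural model is missing.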

### **Configuration (`config/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `settings.py` | Central config | ~200 | All settings, feature flags, paths |

### **User Interface (`ui/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `app.py` | Streamlit UI | ~400 | Web interface, 4 tabs, interactive |

### **Integration (`core/`)**

| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `fusion_layer.py` | Multimodal fusion | ~80 | Combines caption + OCR + embeddings |
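
`fusion_layer.py` combines the caption, OCR text, and embeddings into a single record. The sketch below shows one way to do that. It assumes a record with a merged text field and an L2-normalized embedding. The `FusedMemory` fields and the `fuse` function are hypothetical and do not reflect the real module's API.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class FusedMemory:
    """Hypothetical record shape; the real fusion_layer.py may use different fields."""
    caption: str
    ocr_text: str
    objects: list[str]
    embedding: list[float]
    text: str  # combined text used for searching
    timestamp: float = field(default_factory=time.time)


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(caption: str, ocr_text: str, objects: list[str], embedding: list[float]) -> FusedMemory:
    # Merge the textual signals into one searchable string, skipping empty parts.
    parts = [caption.strip()]
    if objects:
        parts.append("Objects: " + ", ".join(sorted(set(objects))))
    if ocr_text.strip():
        parts.append("Text: " + ocr_text.strip())
    text = ". ".join(p for p in parts if p)
    # Normalize the visual embedding so an inner product equals cosine similarity.
    return FusedMemory(caption, ocr_text, objects, l2_normalize(embedding), text)


m = fuse("a red mug on a desk", "COFFEE", ["cup", "laptop", "cup"], [3.0, 4.0])
print(m.text)       # a red mug on a desk. Objects: cup, laptop. Text: COFFEE
print(m.embedding)  # [0.6, 0.8]
```

Normalizing at fusion time lets a later inner-product search (for example a FAISS `IndexFlatIP`) behave like cosine similarity.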

### **Documentation (`docs/`)**

| File | Purpose | Pages |
|------|---------|-------|
| `LANGUAGES.md` | Language support | 5 |
| `API_KEYS.md` | API keys info | 4 |

### **Root Files**

| File | Purpose |
|------|---------|
| `README.md` | Main documentation |
| `requirements.txt` | Python dependencies |
| `.env.example` | Environment template |
| `.gitignore` | Git ignore rules |
| `run.bat` | Quick launcher (Windows) |
| `cleanup.bat` | Project cleanup script |
| `LICENSE` | MIT License |

---

## 🎯 Design Principles

### **1. Modularity**
- Each agent is independent
- Clear separation of concerns
- Easy to extend or modify
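
One way to keep agents independent is to have each one follow the same small contract and exchange data only through what it receives and returns. This sketch assumes a hypothetical `Agent` protocol with a single `process` method and includes a toy agent. VisionQ's real agents may expose different method names.

```python
from typing import Any, Protocol


class Agent(Protocol):
    """Hypothetical minimal contract; VisionQ's agents may expose different methods."""
    name: str

    def process(self, data: dict[str, Any]) -> dict[str, Any]: ...


class UppercaseCaptionAgent:
    """Toy agent: reads one key and adds one new key, leaving the rest untouched."""
    name = "uppercase_caption"

    def process(self, data: dict[str, Any]) -> dict[str, Any]:
        return {**data, "caption_upper": data.get("caption", "").upper()}


def run_pipeline(agents: list[Agent], data: dict[str, Any]) -> dict[str, Any]:
    # Agents only communicate through the shared dict, so each can be swapped independently.
    for agent in agents:
        data = agent.process(data)
    return data


result = run_pipeline([UppercaseCaptionAgent()], {"caption": "a cat"})
print(result)  # {'caption': 'a cat', 'caption_upper': 'A CAT'}
```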

### **2. Configuration**
- Centralized in `config/settings.py`
- Environment variables supported
- Feature flags for easy toggling
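
A centralized `settings.py` with environment-variable overrides and feature flags might look roughly like this. Only `OCR_CONFIG` and the `data/` file names come from this document. Every `VISIONQ_*` variable and the other names are hypothetical, so check the real `config/settings.py` and `.env.example` for the actual keys.

```python
import os
from pathlib import Path


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean feature flag from the environment (e.g. VISIONQ_ENABLE_OCR=false)."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Paths (hypothetical names). A real settings.py would more likely anchor this to
# Path(__file__).resolve().parent.parent; the working directory keeps the sketch simple.
BASE_DIR = Path(os.getenv("VISIONQ_HOME", str(Path.cwd())))
DATA_DIR = Path(os.getenv("VISIONQ_DATA_DIR", str(BASE_DIR / "data")))
MEMORY_JSON = DATA_DIR / "memory.json"
MEMORY_INDEX = DATA_DIR / "memory.faiss"
LOG_FILE = DATA_DIR / "visionq.log"

# Feature flags for easy toggling (hypothetical flag names).
FEATURES = {
    "ocr": env_flag("VISIONQ_ENABLE_OCR", True),
    "voice": env_flag("VISIONQ_ENABLE_VOICE", True),
}

# OCR settings; EasyOCR's Reader takes a list of language codes such as ["en", "fr"].
OCR_CONFIG = {
    "languages": [
        lang.strip()
        for lang in os.getenv("VISIONQ_OCR_LANGS", "en").split(",")
        if lang.strip()
    ],
    "gpu": env_flag("VISIONQ_OCR_GPU", False),
}
```

With this layout, setting `VISIONQ_OCR_LANGS=en,fr` in `.env` or in the shell changes the OCR languages without editing code.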

### **3. User-Friendly**
- Streamlit UI for easy testing
- One-click launcher (`run.bat`)
- Clear documentation

### **4. Clean Structure**
- No redundant files
- Logical folder organization
- Old files moved to `archive/`

---

## 📊 File Statistics

### **Code Files**

| Category | Files | Lines | Size |
|----------|-------|-------|------|
| Agents | 7 | ~1,100 | ~40KB |
| Config | 1 | ~200 | ~8KB |
| UI | 1 | ~400 | ~15KB |
| Core | 1 | ~80 | ~3KB |
| **Total** | **10** | **~1,780** | **~66KB** |

### **Documentation**

| Category | Files | Pages | Size |
|----------|-------|-------|------|
| Main (`README.md`) | 1 | 3 | ~15KB |
| Guides (`docs/`) | 2 | 9 | ~40KB |
| **Total** | **3** | **12** | **~55KB** |

### **Total Project**

- **Code Files:** 10
- **Documentation:** 3
- **Config Files:** 4
- **Total Lines:** ~2,000
- **Total Size:** ~120KB (excluding models)
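
The numbers above are approximate. A small script like this one, which is a hypothetical helper and does not ship with the project, can regenerate the Code Files rows from the working tree. It counts every `.py` file, including `__init__.py`, so its totals may differ slightly from the table.

```python
from pathlib import Path


def code_stats(root: Path, folders: list[str], pattern: str = "*.py") -> dict[str, tuple[int, int, int]]:
    """Return {folder: (file_count, line_count, size_in_bytes)} for files matching pattern."""
    stats: dict[str, tuple[int, int, int]] = {}
    for folder in folders:
        base = root / folder
        if not base.is_dir():
            stats[folder] = (0, 0, 0)  # missing folder: report zeros instead of failing
            continue
        files = sorted(p for p in base.rglob(pattern) if p.is_file())
        lines = size = 0
        for path in files:
            data = path.read_bytes()
            size += len(data)
            lines += data.count(b"\n")  # newline count; a final line without "\n" is not counted
        stats[folder] = (len(files), lines, size)
    return stats


if __name__ == "__main__":
    # Print Markdown rows in the same shape as the "Code Files" table.
    results = code_stats(Path("."), ["agents", "config", "ui", "core"])
    for folder, (n_files, n_lines, n_bytes) in results.items():
        print(f"| `{folder}/` | {n_files} | ~{n_lines:,} | ~{n_bytes / 1024:.0f}KB |")
```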

---

## 🔄 Data Flow

```
User Input (UI/Voice)
        ↓
Vision Agent (Hub)
  ├─► YOLO/SSD → Objects
  ├─► BLIP → Caption
  ├─► CLIP → Embeddings
  └─► OCR → Text
        ↓
Fusion Layer
        ↓
Memory Agent (Storage)
  ├─► JSON (metadata)
  └─► FAISS (vectors)
        ↓
Query Agent (Retrieval)
  ├─► DistilBERT (intent)
  ├─► FAISS (similarity)
  └─► Time filter
        ↓
Response (UI/Voice)
```
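
The retrieval half of this flow, a time filter followed by similarity ranking, can be sketched in plain Python. This is a simplified stand-in and not VisionQ's implementation. A keyword rule replaces the DistilBERT step, brute-force cosine similarity replaces FAISS, and the memory fields (`id`, `time`, `vec`) are assumed.

```python
import math
from datetime import datetime, timedelta
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def start_of_day(t: datetime) -> datetime:
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def time_window_start(query: str, now: datetime) -> Optional[datetime]:
    """Keyword stand-in for query understanding (VisionQ uses DistilBERT at this stage)."""
    q = query.lower()
    if "today" in q:
        return start_of_day(now)
    if "yesterday" in q:
        return start_of_day(now - timedelta(days=1))
    if "last week" in q:
        return now - timedelta(days=7)
    return None  # no time constraint detected


def search(query_text: str, query_vec: list[float], memories: list[dict],
           now: datetime, top_k: int = 3) -> list[str]:
    # 1. Time filter: drop memories older than the detected window.
    since = time_window_start(query_text, now)
    candidates = [m for m in memories if since is None or m["time"] >= since]
    # 2. Similarity: rank the remaining memories by cosine similarity to the query embedding.
    candidates.sort(key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["id"] for m in candidates[:top_k]]


now = datetime(2024, 5, 10, 15, 0)
memories = [
    {"id": "keys-on-desk", "time": datetime(2024, 5, 10, 9, 0), "vec": [1.0, 0.0]},
    {"id": "keys-in-car", "time": datetime(2024, 5, 1, 9, 0), "vec": [0.9, 0.1]},
    {"id": "coffee-mug", "time": datetime(2024, 5, 10, 11, 0), "vec": [0.0, 1.0]},
]
print(search("where were my keys today?", [1.0, 0.0], memories, now))
# ['keys-on-desk', 'coffee-mug']
print(search("where are my keys?", [1.0, 0.0], memories, now, top_k=2))
# ['keys-on-desk', 'keys-in-car']
```

Filtering by time before ranking keeps the similarity step from returning a strong match that falls outside the time range the user asked about.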

---

## 🗂️ Directory Purposes

### **`agents/`** - AI Agents
- **Purpose:** Core AI functionality
- **Contains:** All seven agents (voice, vision, caption, embedding, OCR, memory, query)
- **Modify:** To change AI behavior

### **`config/`** - Configuration
- **Purpose:** Centralized settings
- **Contains:** `settings.py` (all settings, feature flags, paths)
- **Modify:** To customize behavior

### **`ui/`** - User Interface
- **Purpose:** Web interface
- **Contains:** Streamlit app (`app.py`)
- **Modify:** To change the UI

### **`core/`** - Integration
- **Purpose:** Multimodal fusion
- **Contains:** `fusion_layer.py` (combines caption, OCR, and embeddings)
- **Modify:** To change how results are fused

### **`data/`** - Storage
- **Purpose:** Persistent data
- **Contains:** Memories (`memory.json`, `memory.faiss`) and logs (`visionq.log`)
- **Modify:** Never by hand (auto-managed)
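
Because `memory.json` holds metadata and `memory.faiss` holds vectors, the two files must stay in sync. One simple approach is to align them by position, so that the n-th record describes the n-th vector. The sketch below illustrates this idea with a hypothetical `HybridMemoryStore` class. To stay dependency-free, it keeps vectors in a plain list and saves them to a hypothetical `memory_vectors.json` instead of a FAISS index. With FAISS, that list would typically become an `IndexFlatIP`, saved and loaded with `faiss.write_index` and `faiss.read_index`.

```python
import json
from pathlib import Path


class HybridMemoryStore:
    """Hypothetical sketch: metadata and vectors in two files, aligned by position."""

    def __init__(self, data_dir: Path):
        self.meta_path = data_dir / "memory.json"
        # Stand-in for data/memory.faiss so the sketch needs no FAISS install.
        self.vec_path = data_dir / "memory_vectors.json"
        self.records: list[dict] = []
        self.vectors: list[list[float]] = []
        if self.meta_path.exists() and self.vec_path.exists():
            self.records = json.loads(self.meta_path.read_text(encoding="utf-8"))
            self.vectors = json.loads(self.vec_path.read_text(encoding="utf-8"))
            if len(self.records) != len(self.vectors):
                raise ValueError("memory.json and the vector file are out of sync")

    def add(self, record: dict, vector: list[float]) -> int:
        """Append metadata and vector together; the returned id indexes both."""
        self.records.append(record)
        self.vectors.append(vector)
        return len(self.records) - 1

    def get(self, memory_id: int) -> tuple[dict, list[float]]:
        return self.records[memory_id], self.vectors[memory_id]

    def save(self) -> None:
        # Always write both files together so positions keep matching.
        self.meta_path.parent.mkdir(parents=True, exist_ok=True)
        self.meta_path.write_text(json.dumps(self.records, indent=2), encoding="utf-8")
        self.vec_path.write_text(json.dumps(self.vectors), encoding="utf-8")


store = HybridMemoryStore(Path("data"))
new_id = store.add({"caption": "a red mug on a desk"}, [0.6, 0.8])
record, vector = store.get(new_id)
```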

### **`models/`** - AI Models
- **Purpose:** Model storage
- **Contains:** Downloaded models (Vosk, optional Piper)
- **Modify:** Never by hand (auto-managed)

### **`docs/`** - Documentation
- **Purpose:** User guides
- **Contains:** `LANGUAGES.md`, `API_KEYS.md`
- **Modify:** To update docs

### **`archive/`** - Old Files
- **Purpose:** Backup of pre-cleanup files
- **Contains:** Old/deprecated agents, docs, and scripts
- **Modify:** Can be deleted if not needed

---

## 🔍 Quick Navigation

### **Want to...**

**...change settings?**
→ `config/settings.py`

**...modify the UI?**
→ `ui/app.py`

**...add a new agent?**
→ Create a new module in the `agents/` folder

**...change OCR languages?**
→ `config/settings.py` → `OCR_CONFIG`

**...see memories?**
→ `data/memory.json`

**...check logs?**
→ `data/visionq.log`

**...see supported languages?**
→ `docs/LANGUAGES.md`

**...learn about API keys?**
→ `docs/API_KEYS.md`

---

## 🧹 Cleanup Process

### **Before Cleanup**
```
VisionQ/
├── agents/              (new)
├── config/              (new)
├── ui/                  (new)
├── caption_agent.py     (old)
├── memory_agent.py      (old)
├── vision_agent.py      (old)
├── main.py              (old)
├── main_upgraded.py     (old)
└── ...                  (many old files)
```

### **After Cleanup**
```
VisionQ/
├── agents/              (clean)
├── config/              (clean)
├── ui/                  (clean)
├── data/                (clean)
├── docs/                (clean)
├── archive/             (old files)
└── ...                  (only essential files)
```

**Run:** `cleanup.bat` to move old files into `archive/`

---

## 📦 What Gets Downloaded

### **First Run Downloads (~2GB)**

| Model | Size | Purpose |
|-------|------|---------|
| YOLO | ~50MB | Object detection |
| BLIP | ~1GB | Image captioning |
| CLIP | ~500MB | Visual embeddings |
| DistilBERT | ~250MB | NLP |
| EasyOCR (per language) | ~50MB | Text extraction |
| sentence-transformers | ~100MB | Text embeddings |

**Location:** Mostly under `~/.cache/` (system cache), e.g. `~/.cache/huggingface/` for Hugging Face models. EasyOCR uses `~/.EasyOCR/` by default.

**Note:** Cached models are shared across projects that use the same libraries, so they are downloaded only once.

---

## 🎯 Best Practices

### **Development**

1. **Modify agents** in the `agents/` folder
2. **Change settings** in `config/settings.py`
3. **Update the UI** in `ui/app.py`
4. **Test changes** with `run.bat`

### **Deployment**

1. **Keep** `agents/`, `config/`, `core/`, `ui/`
2. **Include** `requirements.txt`, `README.md`
3. **Exclude** `data/`, `models/`, `archive/`
4. **Add** a `.env` (based on `.env.example`) for production settings

### **Maintenance**

1. **Update** dependencies regularly
2. **Clean** old memories periodically
3. **Check** logs for errors
4. **Back up** the `data/` folder
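
The standard library is enough for the backup step. Here is a minimal sketch, assuming backups go to a hypothetical `backups/` folder next to the project. Keep that folder outside `data/` so an archive never includes earlier archives.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_data(data_dir: Path = Path("data"), backup_dir: Path = Path("backups")) -> Path:
    """Zip data_dir into backup_dir/data-YYYYmmdd-HHMMSS.zip and return the archive path."""
    if not data_dir.is_dir():
        raise FileNotFoundError(f"No data folder at {data_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # make_archive appends ".zip" to base_name and returns the full archive path.
    archive = shutil.make_archive(str(backup_dir / f"data-{stamp}"), "zip", root_dir=str(data_dir))
    return Path(archive)
```

Running `backup_data()` from the project root before a cleanup or upgrade saves `memory.json`, `memory.faiss`, and the log as a single timestamped zip.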

---

## ✅ Structure Benefits

### **Clean**
- ✅ No redundant files
- ✅ Logical organization
- ✅ Easy to navigate

### **Modular**
- ✅ Independent agents
- ✅ Clear responsibilities
- ✅ Easy to extend

### **User-Friendly**
- ✅ Streamlit UI
- ✅ One-click launch
- ✅ Clear documentation

### **Maintainable**
- ✅ Centralized config
- ✅ Consistent structure
- ✅ Well documented

---

**VisionQ - Clean, organized, and ready to use! 🚀**