VisionQ - Project Structure
Clean & Organized Structure
VisionQ/
│
├── agents/                  # AI Agents (Core Logic)
│   ├── __init__.py
│   ├── voice_agent.py       # Voice I/O (STT + TTS)
│   ├── vision_agent.py      # Vision coordinator
│   ├── caption_agent.py     # Image captioning (BLIP)
│   ├── embedding_agent.py   # Visual embeddings (CLIP)
│   ├── ocr_agent.py         # Text extraction (EasyOCR)
│   ├── memory_agent.py      # Storage (JSON + FAISS)
│   └── query_agent.py       # Smart retrieval (DistilBERT)
│
├── config/                  # Configuration
│   └── settings.py          # Centralized settings
│
├── ui/                      # Web Interface
│   └── app.py               # Streamlit application
│
├── data/                    # Storage (auto-created)
│   ├── memory.json          # Memory metadata
│   ├── memory.faiss         # Vector index
│   └── visionq.log          # Application logs
│
├── models/                  # AI Models (auto-downloaded)
│   ├── vosk/                # Speech recognition
│   └── piper/               # Neural TTS (optional)
│
├── docs/                    # Documentation
│   ├── LANGUAGES.md         # Language support (90+)
│   └── API_KEYS.md          # API keys info (none needed!)
│
├── core/                    # Core Integration
│   ├── __init__.py
│   └── fusion_layer.py      # Multimodal fusion
│
├── .streamlit/              # Streamlit Config
│   └── config.toml          # UI theme & settings
│
├── archive/                 # Old Files (after cleanup)
│   ├── old_agents/
│   ├── old_docs/
│   └── old_scripts/
│
├── README.md                # Main documentation
├── requirements.txt         # Dependencies
├── .env.example             # Environment template
├── .gitignore               # Git ignore rules
├── run.bat                  # Quick launcher (Windows)
├── cleanup.bat              # Project cleanup script
└── LICENSE                  # MIT License
File Descriptions
Core Agents (agents/)
| File | Purpose | Lines | Key Features |
|---|---|---|---|
| voice_agent.py | Voice I/O | ~200 | Vosk STT, Voxtral/pyttsx3 TTS, fallback |
| vision_agent.py | Vision hub | ~250 | YOLO/SSD, coordinates all vision agents |
| caption_agent.py | Captioning | ~50 | BLIP image-to-text |
| embedding_agent.py | Embeddings | ~60 | CLIP visual embeddings |
| ocr_agent.py | Text extraction | ~80 | EasyOCR, 90+ languages |
| memory_agent.py | Storage | ~200 | JSON + FAISS hybrid storage |
| query_agent.py | Retrieval | ~180 | DistilBERT NLP, hybrid search |
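The voice agent's "fallback" feature amounts to trying speech backends in order until one succeeds. A minimal sketch of that pattern follows, assuming each backend is a plain callable that raises when it cannot speak; `speak_with_fallback` and the stub backends are hypothetical names, not the actual API of `voice_agent.py`, where the real callables would wrap engines such as Voxtral or pyttsx3.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical backend type: a (name, speak) pair; speak raises if it cannot run.
Backend = Tuple[str, Callable[[str], None]]


def speak_with_fallback(text: str, backends: Sequence[Backend]) -> str:
    """Try each backend in order and return the name of the first one that worked."""
    errors = []
    for name, speak in backends:
        try:
            speak(text)
            return name
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    def neural_tts(text: str) -> None:
        # Stand-in for a neural engine whose model is not installed.
        raise ImportError("neural TTS model not available")

    def simple_tts(text: str) -> None:
        # Stand-in for a basic offline engine.
        print(f"[simple] {text}")

    print("used:", speak_with_fallback("Hello", [("neural", neural_tts), ("simple", simple_tts)]))
```

Ordering the list from best quality to most robust means the app degrades gracefully instead of going silent when an optional model is missing.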
Configuration (config/)
| File | Purpose | Lines | Key Features |
|---|---|---|---|
| settings.py | Central config | ~200 | All settings, feature flags, paths |
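`settings.py` is described as holding all settings, feature flags, and paths, with environment-variable support. A minimal sketch of that layout follows. Apart from `OCR_CONFIG` (the name Quick Navigation points to) and the `data/` file names, everything here, including `env_flag`, `env_list`, `FEATURES`, and the `VISIONQ_*` variable names, is a hypothetical placeholder rather than the project's actual settings. Loading `.env` itself (for example with python-dotenv, if the project uses it) is not shown; the sketch only reads the process environment.

```python
import os
from pathlib import Path
from typing import List


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment; 1/true/yes/on (any case) mean True."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: str) -> List[str]:
    """Read a comma-separated list from the environment, dropping empty items."""
    return [item.strip() for item in os.getenv(name, default).split(",") if item.strip()]


# Paths (assumes the app is started from the project root).
BASE_DIR = Path.cwd()
DATA_DIR = Path(os.getenv("VISIONQ_DATA_DIR", str(BASE_DIR / "data")))
MEMORY_JSON = DATA_DIR / "memory.json"
MEMORY_INDEX = DATA_DIR / "memory.faiss"
LOG_FILE = DATA_DIR / "visionq.log"

# Feature flags: switch optional components on or off without editing agent code.
FEATURES = {
    "ocr": env_flag("VISIONQ_ENABLE_OCR", True),
    "voice": env_flag("VISIONQ_ENABLE_VOICE", False),
}

# OCR settings; languages are EasyOCR language codes such as "en" or "fr".
OCR_CONFIG = {
    "languages": env_list("VISIONQ_OCR_LANGS", "en"),
    "gpu": env_flag("VISIONQ_OCR_GPU", False),
}
```

With this layout, setting `VISIONQ_OCR_LANGS=en,fr` in the environment would give `OCR_CONFIG["languages"] == ["en", "fr"]` without any code change.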
User Interface (ui/)
| File | Purpose | Lines | Key Features |
|---|---|---|---|
| app.py | Streamlit UI | ~400 | Web interface, 4 tabs, interactive |
Integration (core/)
| File | Purpose | Lines | Key Features |
|---|---|---|---|
| fusion_layer.py | Multimodal fusion | ~80 | Combines caption + OCR + embeddings |
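The fusion layer combines caption, OCR text, and embeddings into one record. One plausible shape for that step is sketched below: the text outputs are merged into a single searchable summary, and the image embedding is L2-normalized so that inner-product search later behaves like cosine similarity. The `fuse` function and the record fields are hypothetical; the real `fusion_layer.py` may weight or combine the modalities differently.

```python
import math
import time
from typing import Dict, List, Optional


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(caption: str, ocr_text: str, objects: List[str],
         embedding: List[float], timestamp: Optional[float] = None) -> Dict:
    """Merge the outputs of the vision agents into one memory record."""
    parts = [caption.strip()]
    if objects:
        # Deduplicate and sort detected object labels for a stable summary.
        parts.append("Objects: " + ", ".join(sorted(set(objects))))
    if ocr_text.strip():
        parts.append("Text: " + ocr_text.strip())
    return {
        "timestamp": timestamp if timestamp is not None else time.time(),
        "caption": caption,
        "ocr_text": ocr_text,
        "objects": objects,
        "summary": ". ".join(p for p in parts if p),
        "embedding": l2_normalize(embedding),
    }
```

For example, a caption "a desk with a laptop", objects `["laptop", "cup", "laptop"]`, and OCR text "Invoice 42" produce the summary "a desk with a laptop. Objects: cup, laptop. Text: Invoice 42".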
Documentation (docs/)
| File | Purpose | Pages |
|---|---|---|
| LANGUAGES.md | Language support | 5 |
| API_KEYS.md | API keys info | 4 |
Root Files
| File | Purpose |
|---|---|
| README.md | Main documentation |
| requirements.txt | Python dependencies |
| .env.example | Environment template |
| .gitignore | Git ignore rules |
| run.bat | Quick launcher |
| cleanup.bat | Cleanup script |
| LICENSE | MIT License |
Design Principles
1. Modularity
- Each agent is independent
- Clear separation of concerns
- Easy to extend/modify
2. Configuration
- Centralized in `config/settings.py`
- Environment variables supported (template in `.env.example`)
- Feature flags for easy toggling
3. User-Friendly
- Streamlit UI for easy testing
- One-click launcher (`run.bat`)
- Clear documentation
4. Clean Structure
- No redundant files
- Logical folder organization
- Archive for old files
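Principle 1 (independent agents that are easy to extend) can be expressed as a small shared interface: each agent exposes one method, and the hub loops over whichever agents are registered. A minimal sketch under that assumption follows; `Agent`, `process`, `VisionHub`, and the toy agents are hypothetical names, not the classes in `agents/`.

```python
from typing import Any, Dict, List, Protocol


class Agent(Protocol):
    """Anything with a name and a process(image) method counts as an agent."""
    name: str

    def process(self, image: Any) -> Any: ...


class VisionHub:
    """Runs every registered agent on the same input and collects the results."""

    def __init__(self) -> None:
        self._agents: List[Agent] = []

    def register(self, agent: Agent) -> None:
        self._agents.append(agent)

    def analyze(self, image: Any) -> Dict[str, Any]:
        # Agents are independent: one failing agent does not stop the others.
        results: Dict[str, Any] = {}
        for agent in self._agents:
            try:
                results[agent.name] = agent.process(image)
            except Exception as exc:
                results[agent.name] = {"error": str(exc)}
        return results


class UppercaseCaptioner:
    """Toy stand-in for a captioning agent."""
    name = "caption"

    def process(self, image: Any) -> str:
        return f"an image of {image}".upper()


class BrokenOCR:
    """Toy agent that always fails, to show error isolation."""
    name = "ocr"

    def process(self, image: Any) -> str:
        raise RuntimeError("OCR model not loaded")
```

Under this design, adding a new agent means writing one class in `agents/` and one `register(...)` call; nothing else in the hub changes.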
File Statistics
Code Files
| Category | Files | Lines | Size |
|---|---|---|---|
| Agents | 7 | ~1,100 | ~40KB |
| Config | 1 | ~200 | ~8KB |
| UI | 1 | ~400 | ~15KB |
| Core | 1 | ~80 | ~3KB |
| Total | 10 | ~1,780 | ~66KB |
Documentation
| Category | Files | Pages | Size |
|---|---|---|---|
| Main | 1 | 3 | ~15KB |
| Guides | 2 | 9 | ~40KB |
| Total | 3 | 12 | ~55KB |
Total Project
- Code Files: 10
- Documentation: 3
- Config Files: 4
- Total Lines: ~2,000
- Total Size: ~120KB (excluding models)
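The figures above are approximate. A small standard-library script along these lines could regenerate the per-folder file, line, and size counts; the folder list comes from the structure above, and `folder_stats` is a hypothetical helper rather than part of the project.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def folder_stats(root: Path, folders: Iterable[str],
                 pattern: str = "*.py") -> Dict[str, Tuple[int, int, int]]:
    """Return {folder: (file_count, line_count, size_in_bytes)} for files matching pattern."""
    stats: Dict[str, Tuple[int, int, int]] = {}
    for folder in folders:
        base = root / folder
        if not base.is_dir():
            stats[folder] = (0, 0, 0)  # folder missing, e.g. before first run
            continue
        files = sorted(p for p in base.rglob(pattern) if p.is_file())
        lines = 0
        size = 0
        for path in files:
            # errors="replace" keeps counting even if a file is not valid UTF-8
            with path.open(encoding="utf-8", errors="replace") as handle:
                lines += sum(1 for _ in handle)
            size += path.stat().st_size
        stats[folder] = (len(files), lines, size)
    return stats


if __name__ == "__main__":
    results = folder_stats(Path("."), ["agents", "config", "ui", "core"])
    for folder, (count, lines, size) in results.items():
        print(f"{folder:<8} {count:>3} files {lines:>6} lines {size / 1024:>7.1f} KB")
```

This counts `__init__.py` files too, so its file totals may come out slightly higher than the table's.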
Data Flow
User Input (UI/Voice)
↓
Vision Agent (Hub)
├──► YOLO/SSD → Objects
├──► BLIP → Caption
├──► CLIP → Embeddings
└──► OCR → Text
↓
Fusion Layer
↓
Memory Agent (Storage)
├──► JSON (metadata)
└──► FAISS (vectors)
↓
Query Agent (Retrieval)
├──► DistilBERT (intent)
├──► FAISS (similarity)
└──► Time filter
↓
Response (UI/Voice)
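The storage and retrieval path in the diagram (JSON metadata, a vector index, similarity search plus a time filter) can be sketched without FAISS. The version below swaps FAISS for a brute-force inner-product search over unit-length vectors, which ranks results the same way as an exact FAISS `IndexFlatIP` on normalized vectors but scales linearly with the number of memories. `MemoryStore`, its methods, and the JSON record layout are hypothetical, and DistilBERT intent detection is left out.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Optional


class MemoryStore:
    """JSON-backed memory list with brute-force cosine search (a FAISS stand-in)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.records: List[Dict] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
        )

    @staticmethod
    def _unit(vec: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)

    def add(self, summary: str, embedding: List[float], timestamp: float) -> None:
        # Store unit vectors so a plain dot product equals cosine similarity.
        self.records.append(
            {"summary": summary, "embedding": self._unit(embedding), "timestamp": timestamp}
        )

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.records), encoding="utf-8")

    def search(self, query_vec: List[float], k: int = 3,
               since: Optional[float] = None, until: Optional[float] = None) -> List[Dict]:
        """Return the k most similar records, optionally restricted to a time window."""
        q = self._unit(query_vec)
        # Time filter first, then similarity ranking on what is left.
        candidates = [
            r for r in self.records
            if (since is None or r["timestamp"] >= since)
            and (until is None or r["timestamp"] <= until)
        ]
        scored = [(sum(a * b for a, b in zip(q, r["embedding"])), r) for r in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [dict(r, score=score) for score, r in scored[:k]]
```

Keeping the metadata in JSON and only the vectors in the index is what lets a query such as "where were my keys yesterday?" combine a time window with similarity ranking.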
Directory Purposes
agents/ - AI Agents
Purpose: Core AI functionality
Contains: All intelligent agents
Modify: To change AI behavior
config/ - Configuration
Purpose: Centralized settings
Contains: All configuration
Modify: To customize behavior
ui/ - User Interface
Purpose: Web interface
Contains: Streamlit app
Modify: To change UI
core/ - Integration
Purpose: Multimodal fusion
Contains: Integration logic
Modify: To change fusion
data/ - Storage
Purpose: Persistent data
Contains: Memories, logs
Modify: Never (auto-managed)
models/ - AI Models
Purpose: Model storage
Contains: Downloaded models
Modify: Never (auto-managed)
docs/ - Documentation
Purpose: User guides
Contains: Documentation
Modify: To update docs
archive/ - Old Files
Purpose: Backup
Contains: Old/deprecated files
Modify: Can delete if not needed
Quick Navigation
Want to...
...change settings?
→ config/settings.py
...modify UI?
→ ui/app.py
...add new agent?
→ Create in agents/ folder
...change OCR languages?
→ config/settings.py → OCR_CONFIG
...see memories?
→ data/memory.json
...check logs?
→ data/visionq.log
...understand languages?
→ docs/LANGUAGES.md
...learn about API keys?
→ docs/API_KEYS.md
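Logs end up in `data/visionq.log`. One way to set that up with the standard `logging` module is sketched below; the function name, logger name, and format string are assumptions, not necessarily what VisionQ uses.

```python
import logging
from pathlib import Path


def setup_logging(log_file: Path = Path("data") / "visionq.log",
                  level: int = logging.INFO) -> logging.Logger:
    """Send log records to a file (creating its folder if needed) and to the console."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("visionq")
    logger.setLevel(level)
    # Streamlit reruns the script on every interaction; only attach handlers once.
    if not logger.handlers:
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        for handler in (logging.FileHandler(log_file, encoding="utf-8"), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

Agents can then call `logging.getLogger("visionq")` (or a child such as `"visionq.ocr"`) and their messages land in the same file.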
Cleanup Process
Before Cleanup
VisionQ/
├── agents/ (new)
├── config/ (new)
├── ui/ (new)
├── caption_agent.py (old)
├── memory_agent.py (old)
├── vision_agent.py (old)
├── main.py (old)
├── main_upgraded.py (old)
└── ... (many old files)
After Cleanup
VisionQ/
├── agents/ (clean)
├── config/ (clean)
├── ui/ (clean)
├── data/ (clean)
├── docs/ (clean)
├── archive/ (old files)
└── ... (only essential files)
Run `cleanup.bat` to move the old files into `archive/`.
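`cleanup.bat` is a Windows batch file whose contents are not shown here. A cross-platform Python sketch of the same idea (moving legacy top-level files into `archive/` rather than deleting them) follows. The file names come from the "Before Cleanup" tree, but the `LEGACY_FILES` mapping of files to archive subfolders and the `archive_legacy` function are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical mapping from archive subfolder to legacy top-level files.
LEGACY_FILES: Dict[str, List[str]] = {
    "old_agents": ["caption_agent.py", "memory_agent.py", "vision_agent.py"],
    "old_scripts": ["main.py", "main_upgraded.py"],
}


def archive_legacy(root: Path, mapping: Dict[str, Iterable[str]] = LEGACY_FILES) -> List[Path]:
    """Move legacy files from the project root into archive/<subfolder>/; return new paths."""
    moved: List[Path] = []
    for subfolder, names in mapping.items():
        target_dir = root / "archive" / subfolder
        for name in names:
            source = root / name  # only root-level files; agents/ copies are untouched
            if not source.is_file():
                continue  # already archived, or never existed
            target_dir.mkdir(parents=True, exist_ok=True)
            moved.append(Path(shutil.move(str(source), str(target_dir / name))))
    return moved
```

Moving instead of deleting keeps the step reversible, and running it a second time is a no-op.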
What Gets Downloaded
First Run Downloads (~2GB)
| Model | Size | Purpose |
|---|---|---|
| YOLO | ~50MB | Object detection |
| BLIP | ~1GB | Image captioning |
| CLIP | ~500MB | Visual embeddings |
| DistilBERT | ~250MB | NLP |
| EasyOCR (per language) | ~50MB | Text extraction |
| sentence-transformers | ~100MB | Text embeddings |
Location: ~/.cache/ (system cache)
Note: Models are shared across projects!
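With one OCR language, the table's sizes add up to 50 + 1,000 + 500 + 250 + 50 + 100 = 1,950MB (about 2GB), and the table lists about 50MB for each EasyOCR language. A pre-flight check along these lines could compare that estimate with the free space on the disk holding the cache; the sizes are the approximate figures from the table, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

# Approximate first-run download sizes in MB, copied from the table above.
MODEL_SIZES_MB = {
    "YOLO": 50,
    "BLIP": 1000,
    "CLIP": 500,
    "DistilBERT": 250,
    "sentence-transformers": 100,
}
EASYOCR_PER_LANGUAGE_MB = 50


def estimated_download_mb(ocr_languages: int = 1) -> int:
    """Total expected download size for the given number of EasyOCR languages."""
    return sum(MODEL_SIZES_MB.values()) + EASYOCR_PER_LANGUAGE_MB * ocr_languages


def has_room_for_models(cache_dir: Path = Path.home() / ".cache", ocr_languages: int = 1) -> bool:
    """True if the disk holding cache_dir has more free space than the estimate."""
    # disk_usage needs an existing path, so walk up to the nearest existing parent.
    probe = cache_dir
    while not probe.exists():
        probe = probe.parent
    free_mb = shutil.disk_usage(probe).free / (1024 * 1024)
    return free_mb > estimated_download_mb(ocr_languages)
```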
Best Practices
Development
- Modify agents in the `agents/` folder
- Change settings in `config/settings.py`
- Update the UI in `ui/app.py`
- Test changes with `run.bat`
Deployment
- Keep `agents/`, `core/`, `config/`, `ui/`
- Include `requirements.txt`, `README.md`
- Exclude `data/`, `models/`, `archive/`
- Add `.env` for production settings
Maintenance
- Update dependencies regularly
- Clean old memories periodically
- Check logs for errors
- Back up the `data/` folder
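The maintenance steps for `data/` (backing it up, clearing out old memories) can be scripted. A minimal sketch follows, using `shutil.make_archive` for the backup and an age cut-off for pruning. It assumes `memory.json` holds a JSON list of records with a Unix `timestamp` field, which is a guess about the real schema; the function names are hypothetical. Pruning the JSON alone would likely leave `memory.faiss` out of sync, so the index would need rebuilding afterwards, and the backup folder should live outside `data/`.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_data(data_dir: Path, backup_dir: Path) -> Path:
    """Zip the whole data folder into backup_dir/visionq-data-<timestamp>.zip."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"visionq-data-{stamp}"
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=str(data_dir)))


def prune_memories(memory_json: Path, max_age_days: float, now: Optional[float] = None) -> int:
    """Drop records older than max_age_days; return how many were removed."""
    records = json.loads(memory_json.read_text(encoding="utf-8"))
    cutoff = (now if now is not None else time.time()) - max_age_days * 86400
    # Records without a timestamp are kept rather than guessed at.
    kept = [r for r in records if r.get("timestamp") is None or r["timestamp"] >= cutoff]
    memory_json.write_text(json.dumps(kept, indent=2), encoding="utf-8")
    return len(records) - len(kept)
```

Running `backup_data` before `prune_memories` means a bad cut-off can always be undone from the zip.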
Structure Benefits
Clean
- No redundant files
- Logical organization
- Easy to navigate
Modular
- Independent agents
- Clear responsibilities
- Easy to extend
User-Friendly
- Streamlit UI
- One-click launch
- Clear documentation
Maintainable
- Centralized config
- Consistent structure
- Well documented
VisionQ - Clean, organized, and ready to use!