
📂 VisionQ - Project Structure

🎯 Clean & Organized Structure

VisionQ/
│
├── 📁 agents/                  # AI Agents (Core Logic)
│   ├── __init__.py
│   ├── voice_agent.py         # Voice I/O (STT + TTS)
│   ├── vision_agent.py        # Vision coordinator
│   ├── caption_agent.py       # Image captioning (BLIP)
│   ├── embedding_agent.py     # Visual embeddings (CLIP)
│   ├── ocr_agent.py           # Text extraction (EasyOCR)
│   ├── memory_agent.py        # Storage (JSON + FAISS)
│   └── query_agent.py         # Smart retrieval (DistilBERT)
│
├── 📁 config/                  # Configuration
│   └── settings.py            # Centralized settings
│
├── 📁 ui/                      # Web Interface
│   └── app.py                 # Streamlit application
│
├── 📁 data/                    # Storage (auto-created)
│   ├── memory.json            # Memory metadata
│   ├── memory.faiss           # Vector index
│   └── visionq.log            # Application logs
│
├── 📁 models/                  # AI Models (auto-downloaded)
│   ├── vosk/                  # Speech recognition
│   └── piper/                 # Neural TTS (optional)
│
├── 📁 docs/                    # Documentation
│   ├── LANGUAGES.md           # Language support (90+)
│   └── API_KEYS.md            # API keys info (none needed!)
│
├── 📁 core/                    # Core Integration
│   ├── __init__.py
│   └── fusion_layer.py        # Multimodal fusion
│
├── 📁 .streamlit/              # Streamlit Config
│   └── config.toml            # UI theme & settings
│
├── 📁 archive/                 # Old Files (after cleanup)
│   ├── old_agents/
│   ├── old_docs/
│   └── old_scripts/
│
├── 📄 README.md                # Main documentation
├── 📄 requirements.txt         # Dependencies
├── 📄 .env.example             # Environment template
├── 📄 .gitignore               # Git ignore rules
├── 📄 run.bat                  # Quick launcher (Windows)
├── 📄 cleanup.bat              # Project cleanup script
└── 📄 LICENSE                  # MIT License

📋 File Descriptions

Core Agents (agents/)

| File | Purpose | Lines | Key features |
|------|---------|-------|--------------|
| voice_agent.py | Voice I/O | ~200 | Vosk STT, Voxtral/pyttsx3 TTS, fallback |
| vision_agent.py | Vision hub | ~250 | YOLO/SSD, coordinates all vision agents |
| caption_agent.py | Captioning | ~50 | BLIP image-to-text |
| embedding_agent.py | Embeddings | ~60 | CLIP visual embeddings |
| ocr_agent.py | Text extraction | ~80 | EasyOCR, 90+ languages |
| memory_agent.py | Storage | ~200 | JSON + FAISS hybrid storage |
| query_agent.py | Retrieval | ~180 | DistilBERT NLP, hybrid search |
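The memory_agent.py row describes hybrid JSON + FAISS storage. Below is a minimal sketch of that split. It assumes one metadata record and one embedding vector per memory, matched by position. To keep the sketch dependency-free, a JSON file of unit-normalised vectors and a brute-force cosine scan stand in for memory.faiss and FAISS search. The class name HybridMemory, its methods and the vectors.json file are hypothetical, not the project's actual API.

```python
import json
import math
import tempfile
from pathlib import Path


class HybridMemory:
    """Hypothetical sketch: metadata in memory.json, vectors in a parallel index file.

    Vector i belongs to record i. A JSON file and a brute-force cosine scan
    stand in for memory.faiss and a FAISS search.
    """

    def __init__(self, data_dir):
        self.meta_path = Path(data_dir) / "memory.json"
        self.index_path = Path(data_dir) / "vectors.json"  # stand-in for memory.faiss
        self.records, self.vectors = [], []
        if self.meta_path.exists() and self.index_path.exists():
            self.records = json.loads(self.meta_path.read_text(encoding="utf-8"))
            self.vectors = json.loads(self.index_path.read_text(encoding="utf-8"))

    @staticmethod
    def _unit(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def add(self, metadata, embedding):
        """Store one memory; unit-normalising makes the dot product a cosine similarity."""
        self.records.append(dict(metadata, id=len(self.records)))
        self.vectors.append(self._unit(embedding))
        self.meta_path.write_text(json.dumps(self.records, indent=2), encoding="utf-8")
        self.index_path.write_text(json.dumps(self.vectors), encoding="utf-8")
        return self.records[-1]["id"]

    def search(self, query_embedding, k=3):
        """Return up to k (score, metadata) pairs, best match first."""
        q = self._unit(query_embedding)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [(scores[i], self.records[i]) for i in ranked[:k]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = HybridMemory(tmp)
        memory.add({"caption": "a red mug on a desk"}, [1.0, 0.0, 0.0])
        memory.add({"caption": "a NO PARKING sign"}, [0.0, 1.0, 0.0])
        print(memory.search([0.9, 0.1, 0.0], k=1)[0][1]["caption"])  # a red mug on a desk
```

With FAISS installed, one common equivalent of this scan is a `faiss.IndexFlatIP` over L2-normalised vectors, saved with `faiss.write_index`. This document does not say whether memory_agent.py does exactly that.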

Configuration (config/)

| File | Purpose | Lines | Key features |
|------|---------|-------|--------------|
| settings.py | Central config | ~200 | All settings, feature flags, paths |
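settings.py centralizes paths, feature flags and environment-variable overrides. Below is a minimal sketch of that pattern, assuming plain module-level constants that are read once at import time. The name OCR_CONFIG comes from this document. Every environment-variable name (VISIONQ_DATA_DIR, VISIONQ_ENABLE_OCR, VISIONQ_OCR_LANGS, ...) and the keys inside OCR_CONFIG are hypothetical; the real names live in config/settings.py and .env.example.

```python
import os
from pathlib import Path


def env_flag(name: str, default: bool) -> bool:
    """Parse a boolean feature flag such as VISIONQ_ENABLE_OCR=false."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: list) -> list:
    """Parse a comma-separated list such as VISIONQ_OCR_LANGS=en,fr."""
    value = os.getenv(name, "")
    items = [item.strip() for item in value.split(",") if item.strip()]
    return items or list(default)


# Paths (hypothetical env var names); data/ holds the memory files and the log.
DATA_DIR = Path(os.getenv("VISIONQ_DATA_DIR", "data"))
MEMORY_JSON = DATA_DIR / "memory.json"
FAISS_INDEX = DATA_DIR / "memory.faiss"
LOG_FILE = DATA_DIR / "visionq.log"

# Feature flags: switch whole agents on or off without touching their code.
FEATURES = {
    "ocr": env_flag("VISIONQ_ENABLE_OCR", True),
    "voice": env_flag("VISIONQ_ENABLE_VOICE", True),
}

# Language codes handed to EasyOCR,
# e.g. easyocr.Reader(OCR_CONFIG["languages"], gpu=OCR_CONFIG["gpu"]).
OCR_CONFIG = {
    "languages": env_list("VISIONQ_OCR_LANGS", ["en"]),
    "gpu": env_flag("VISIONQ_OCR_GPU", False),
}


def ensure_data_dir() -> Path:
    """Create data/ on first use, matching its 'auto-created' role."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    return DATA_DIR
```

Reading the environment in one module keeps the agents free of `os.getenv` calls. A `.env` file would typically be loaded into the environment before this module is imported, for example with python-dotenv's `load_dotenv()`.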

User Interface (ui/)

| File | Purpose | Lines | Key features |
|------|---------|-------|--------------|
| app.py | Streamlit UI | ~400 | Web interface, 4 tabs, interactive |

Integration (core/)

| File | Purpose | Lines | Key features |
|------|---------|-------|--------------|
| fusion_layer.py | Multimodal fusion | ~80 | Combines caption + OCR + embeddings |
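fusion_layer.py combines the caption, OCR text and embeddings into one record. Below is a minimal sketch of one way to do that. It assumes the fused result is a single text string for keyword/NLP search, plus the CLIP vector kept as-is for similarity search. The dataclass FusedMemory, the function fuse and the text template are hypothetical, not the project's actual fusion logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FusedMemory:
    """Hypothetical fused record: one text field for NLP search, one vector for similarity."""
    text: str
    objects: List[str]
    embedding: List[float] = field(default_factory=list)


def fuse(caption: str, ocr_text: str, objects: List[str], embedding: List[float]) -> FusedMemory:
    """Merge caption, detected objects and OCR text into one searchable description."""
    unique_objects = list(dict.fromkeys(objects))  # dedupe, keep detection order
    parts = [caption.strip()]
    if unique_objects:
        parts.append("Objects: " + ", ".join(unique_objects))
    if ocr_text.strip():
        parts.append('Text: "' + ocr_text.strip() + '"')
    text = ". ".join(p for p in parts if p)  # skip an empty caption
    return FusedMemory(text=text, objects=unique_objects, embedding=list(embedding))


if __name__ == "__main__":
    fused = fuse("a man holding a coffee cup", "CAFE", ["person", "cup", "person"], [0.1, 0.2])
    print(fused.text)  # a man holding a coffee cup. Objects: person, cup. Text: "CAFE"
```

Keeping the text and the vector side by side lets a query match on words (a sign's text) or on visual similarity, whichever carries the signal.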

Documentation (docs/)

| File | Purpose | Pages |
|------|---------|-------|
| LANGUAGES.md | Language support | 5 |
| API_KEYS.md | API keys info | 4 |

Root Files

| File | Purpose |
|------|---------|
| README.md | Main documentation |
| requirements.txt | Python dependencies |
| .env.example | Environment template |
| .gitignore | Git ignore rules |
| run.bat | Quick launcher |
| cleanup.bat | Cleanup script |
| LICENSE | MIT License |

🎯 Design Principles

1. Modularity

  • Each agent is independent
  • Clear separation of concerns
  • Easy to extend/modify

2. Configuration

  • Centralized in config/settings.py
  • Environment variables supported
  • Feature flags for easy toggling

3. User-Friendly

  • Streamlit UI for easy testing
  • One-click launcher (run.bat)
  • Clear documentation

4. Clean Structure

  • No redundant files
  • Logical folder organization
  • Archive for old files

📊 File Statistics

Code Files

| Category | Files | Lines | Size |
|----------|-------|-------|------|
| Agents | 7 | ~1,100 | ~40KB |
| Config | 1 | ~200 | ~8KB |
| UI | 1 | ~400 | ~15KB |
| Core | 1 | ~80 | ~3KB |
| Total | 10 | ~1,780 | ~66KB |

Documentation

| Category | Files | Pages | Size |
|----------|-------|-------|------|
| Main | 1 | 3 | ~15KB |
| Guides | 2 | 9 | ~40KB |
| Total | 3 | 12 | ~55KB |

Total Project

  • Code Files: 10
  • Documentation: 3
  • Config Files: 4
  • Total Lines: ~2,000
  • Total Size: ~120KB (excluding models)

🔄 Data Flow

User Input (UI/Voice)
    ↓
Vision Agent (Hub)
    ├─► YOLO/SSD → Objects
    ├─► BLIP → Caption
    ├─► CLIP → Embeddings
    └─► OCR → Text
    ↓
Fusion Layer
    ↓
Memory Agent (Storage)
    ├─► JSON (metadata)
    └─► FAISS (vectors)
    ↓
Query Agent (Retrieval)
    ├─► DistilBERT (intent)
    ├─► FAISS (similarity)
    └─► Time filter
    ↓
Response (UI/Voice)
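The flow above can be read as a chain of function calls. Below is a minimal sketch, assuming each model is wrapped in one function. The four analyzers are stubs with hard-coded outputs that stand in for YOLO/SSD, BLIP, CLIP and EasyOCR. A plain list stands in for JSON + FAISS storage, and a word-overlap score stands in for DistilBERT intent parsing and FAISS similarity. All function names are hypothetical; the point is the order of the stages, not the models.

```python
import re
import time

# Stub analyzers: hard-coded outputs standing in for the real models.
def detect_objects(image):   # YOLO/SSD
    return ["cup", "laptop"]

def caption(image):          # BLIP
    return "a cup next to a laptop"

def embed(image):            # CLIP
    return [0.2, 0.9]

def read_text(image):        # EasyOCR
    return "DEADLINE FRIDAY"


MEMORY = []  # stands in for the JSON + FAISS storage of memory_agent.py


def remember(image):
    """Vision hub -> fusion -> storage: analyze one frame and store the fused record."""
    objects = detect_objects(image)
    record = {
        "text": f"{caption(image)}. Objects: {', '.join(objects)}. Text: {read_text(image)}",
        "embedding": embed(image),
        "timestamp": time.time(),
    }
    MEMORY.append(record)
    return record


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def answer(question, max_age_s=3600.0):
    """Query step: time filter first, then rank the remaining memories by word overlap."""
    now = time.time()
    best, best_score = None, 0
    for record in MEMORY:
        if now - record["timestamp"] > max_age_s:
            continue  # outside the time window
        score = len(tokens(question) & tokens(record["text"]))
        if score > best_score:
            best, best_score = record, score
    return best["text"] if best else "I don't remember seeing that."


if __name__ == "__main__":
    remember(image=None)
    print(answer("Where is my cup?"))  # a cup next to a laptop. Objects: cup, laptop. Text: DEADLINE FRIDAY
```

Filtering by time before ranking keeps a question like "what did I see this morning?" from matching stale memories that happen to share more words.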

πŸ—‚οΈ Directory Purposes

agents/ - AI Agents

Purpose: Core AI functionality
Contains: All intelligent agents
Modify: To change AI behavior

config/ - Configuration

Purpose: Centralized settings
Contains: All configuration
Modify: To customize behavior

ui/ - User Interface

Purpose: Web interface
Contains: Streamlit app
Modify: To change UI

core/ - Integration

Purpose: Multimodal fusion
Contains: Integration logic
Modify: To change fusion

data/ - Storage

Purpose: Persistent data
Contains: Memories, logs
Modify: Not by hand (auto-managed)

models/ - AI Models

Purpose: Model storage
Contains: Downloaded models
Modify: Never (auto-managed)

docs/ - Documentation

Purpose: User guides
Contains: Documentation
Modify: To update docs

archive/ - Old Files

Purpose: Backup
Contains: Old/deprecated files
Modify: Can delete if not needed


🚀 Quick Navigation

Want to...

...change settings? → config/settings.py

...modify UI? → ui/app.py

...add a new agent? → Create a new module in the agents/ folder

...change OCR languages? → config/settings.py → OCR_CONFIG

...see memories? → data/memory.json

...check logs? → data/visionq.log

...check supported languages? → docs/LANGUAGES.md

...learn about API keys? → docs/API_KEYS.md
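Below is a minimal template for a new module in agents/. It assumes a new agent follows a simple shape: take an input and return a small dict that the vision hub can merge. Everything here is hypothetical: the file name agents/color_agent.py, the ColorAgent class, its process() method and the toy dominant-colour logic are illustrations, not the interface the existing agents share. The lazy _load() mirrors the common practice of loading a model only on first use.

```python
# agents/color_agent.py (hypothetical example agent)
import logging
from collections import Counter
from typing import Dict, Iterable, Tuple

logger = logging.getLogger("visionq.color_agent")

Pixel = Tuple[int, int, int]


class ColorAgent:
    """Toy agent: reports the dominant basic colour among a frame's RGB pixels."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled   # e.g. wired to a feature flag in config/settings.py
        self._palette = None     # loaded lazily, like a model on first use

    def _load(self):
        if self._palette is None:
            logger.info("Loading colour palette")
            self._palette = {
                "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
                "white": (255, 255, 255), "black": (0, 0, 0),
            }
        return self._palette

    def _nearest(self, pixel: Pixel) -> str:
        """Name of the palette colour with the smallest squared RGB distance."""
        palette = self._load()
        return min(palette, key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, palette[name])))

    def process(self, pixels: Iterable[Pixel]) -> Dict:
        """Return a small result dict the hub could merge into a memory record."""
        if not self.enabled:
            return {"agent": "color", "skipped": True}
        counts = Counter(self._nearest(p) for p in pixels)
        dominant = counts.most_common(1)[0][0] if counts else None
        return {"agent": "color", "dominant": dominant}


if __name__ == "__main__":
    print(ColorAgent().process([(250, 10, 10), (240, 20, 5), (0, 0, 250)]))
```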


🧹 Cleanup Process

Before Cleanup

VisionQ/
├── agents/ (new)
├── config/ (new)
├── ui/ (new)
├── caption_agent.py (old)
├── memory_agent.py (old)
├── vision_agent.py (old)
├── main.py (old)
├── main_upgraded.py (old)
└── ... (many old files)

After Cleanup

VisionQ/
├── agents/ (clean)
├── config/ (clean)
├── ui/ (clean)
├── data/ (clean)
├── docs/ (clean)
├── archive/ (old files)
└── ... (only essential files)

Run cleanup.bat to move old files into archive/.
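cleanup.bat is a Windows batch script, and its contents are not shown in this document. As a rough cross-platform illustration of the same idea, below is a Python sketch that moves leftover root-level files into archive/. The file-to-subfolder mapping (ARCHIVE_MAP) is an assumption based on the Before/After trees above, not the actual contents of cleanup.bat.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed mapping from leftover root-level files to archive/ subfolders.
ARCHIVE_MAP = {
    "caption_agent.py": "old_agents",
    "memory_agent.py": "old_agents",
    "vision_agent.py": "old_agents",
    "main.py": "old_scripts",
    "main_upgraded.py": "old_scripts",
}


def archive_old_files(project_root, mapping=ARCHIVE_MAP, dry_run=False):
    """Move each listed root-level file into archive/<subfolder>/ and return the (src, dest) moves.

    Only files directly in project_root are touched, so agents/caption_agent.py stays put.
    With dry_run=True nothing is moved; the planned moves are just returned.
    """
    root = Path(project_root)
    moves = []
    for name, subfolder in mapping.items():
        src = root / name
        if not src.is_file():
            continue  # already archived or never existed
        dest = root / "archive" / subfolder / name
        moves.append((src, dest))
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
    return moves


if __name__ == "__main__":
    # Try it in a scratch directory rather than on the real project.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "main.py").write_text("print('old entry point')")
        for src, dest in archive_old_files(root):
            print(f"{src.name} -> {dest.relative_to(root)}")
```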


📦 What Gets Downloaded

First Run Downloads (~2GB)

| Model | Size | Purpose |
|-------|------|---------|
| YOLO | ~50MB | Object detection |
| BLIP | ~1GB | Image captioning |
| CLIP | ~500MB | Visual embeddings |
| DistilBERT | ~250MB | NLP |
| EasyOCR (per language) | ~50MB | Text extraction |
| sentence-transformers | ~100MB | Text embeddings |

Location: mostly ~/.cache/ (system cache); EasyOCR keeps its models in ~/.EasyOCR/ by default.

Note: Models in these shared caches are reused across projects!


🎯 Best Practices

Development

  1. Modify agents in agents/ folder
  2. Change settings in config/settings.py
  3. Update UI in ui/app.py
  4. Test changes with run.bat

Deployment

  1. Keep agents/, config/, ui/
  2. Include requirements.txt, README.md
  3. Exclude data/, models/, archive/
  4. Create .env from .env.example for production settings

Maintenance

  1. Update dependencies regularly
  2. Clean old memories periodically
  3. Check logs for errors
  4. Backup data/ folder
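For the last two maintenance steps, below is a minimal sketch. It assumes memory.json holds a JSON list of records that carry a Unix `timestamp` field; that schema is an assumption, so check memory_agent.py for the real one. Records without a timestamp are kept. If the FAISS index maps vectors to records by position, pruning the JSON alone would leave memory.faiss out of sync, so the index would need rebuilding from the kept records. This sketch covers only the JSON file and the backup, and the backup should be taken first.

```python
import json
import shutil
import tempfile
import time
from datetime import datetime
from pathlib import Path


def backup_data_dir(data_dir="data", backup_root="backups"):
    """Copy data/ to backups/data-YYYYmmdd-HHMMSS/ and return the copy's path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")  # one-second resolution
    dest = Path(backup_root) / f"data-{stamp}"
    shutil.copytree(data_dir, dest)  # creates missing parent folders; fails if dest exists
    return dest


def prune_memories(json_path, max_age_days=30, now=None):
    """Remove records older than max_age_days from memory.json; return how many were removed."""
    path = Path(json_path)
    records = json.loads(path.read_text(encoding="utf-8"))
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    # Records without a timestamp are kept rather than guessed at.
    kept = [r for r in records if r.get("timestamp", cutoff) >= cutoff]
    path.write_text(json.dumps(kept, indent=2), encoding="utf-8")
    return len(records) - len(kept)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        old, fresh = time.time() - 90 * 86400, time.time()
        (data / "memory.json").write_text(json.dumps([{"timestamp": old}, {"timestamp": fresh}]))
        print(backup_data_dir(data, Path(tmp) / "backups").name)
        print(prune_memories(data / "memory.json", max_age_days=30))  # 1
```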

✅ Structure Benefits

Clean

  • ✅ No redundant files
  • ✅ Logical organization
  • ✅ Easy to navigate

Modular

  • ✅ Independent agents
  • ✅ Clear responsibilities
  • ✅ Easy to extend

User-Friendly

  • ✅ Streamlit UI
  • ✅ One-click launch
  • ✅ Clear documentation

Maintainable

  • ✅ Centralized config
  • ✅ Consistent structure
  • ✅ Well documented

VisionQ - Clean, organized, and ready to use! 🚀