# 📂 VisionQ - Project Structure
## 🎯 Clean & Organized Structure
```
VisionQ/
│
├── 📁 agents/                 # AI Agents (Core Logic)
│   ├── __init__.py
│   ├── voice_agent.py         # Voice I/O (STT + TTS)
│   ├── vision_agent.py        # Vision coordinator
│   ├── caption_agent.py       # Image captioning (BLIP)
│   ├── embedding_agent.py     # Visual embeddings (CLIP)
│   ├── ocr_agent.py           # Text extraction (EasyOCR)
│   ├── memory_agent.py        # Storage (JSON + FAISS)
│   └── query_agent.py         # Smart retrieval (DistilBERT)
│
├── 📁 config/                 # Configuration
│   └── settings.py            # Centralized settings
│
├── 📁 ui/                     # Web Interface
│   └── app.py                 # Streamlit application
│
├── 📁 data/                   # Storage (auto-created)
│   ├── memory.json            # Memory metadata
│   ├── memory.faiss           # Vector index
│   └── visionq.log            # Application logs
│
├── 📁 models/                 # AI Models (auto-downloaded)
│   ├── vosk/                  # Speech recognition
│   └── piper/                 # Neural TTS (optional)
│
├── 📁 docs/                   # Documentation
│   ├── LANGUAGES.md           # Language support (90+)
│   └── API_KEYS.md            # API keys info (none needed!)
│
├── 📁 core/                   # Core Integration
│   ├── __init__.py
│   └── fusion_layer.py        # Multimodal fusion
│
├── 📁 .streamlit/             # Streamlit Config
│   └── config.toml            # UI theme & settings
│
├── 📁 archive/                # Old Files (after cleanup)
│   ├── old_agents/
│   ├── old_docs/
│   └── old_scripts/
│
├── 📄 README.md               # Main documentation
├── 📄 requirements.txt        # Dependencies
├── 📄 .env.example            # Environment template
├── 📄 .gitignore              # Git ignore rules
├── 📄 run.bat                 # Quick launcher (Windows)
├── 📄 cleanup.bat             # Project cleanup script
└── 📄 LICENSE                 # MIT License
```
---
## 📋 File Descriptions
### **Core Agents (`agents/`)**
| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `voice_agent.py` | Voice I/O | ~200 | Vosk STT, Voxtral/pyttsx3 TTS, fallback |
| `vision_agent.py` | Vision hub | ~250 | YOLO/SSD, coordinates all vision agents |
| `caption_agent.py` | Captioning | ~50 | BLIP image-to-text |
| `embedding_agent.py` | Embeddings | ~60 | CLIP visual embeddings |
| `ocr_agent.py` | Text extraction | ~80 | EasyOCR, 90+ languages |
| `memory_agent.py` | Storage | ~200 | JSON + FAISS hybrid storage |
| `query_agent.py` | Retrieval | ~180 | DistilBERT NLP, hybrid search |
### **Configuration (`config/`)**
| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `settings.py` | Central config | ~200 | All settings, feature flags, paths |
### **User Interface (`ui/`)**
| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `app.py` | Streamlit UI | ~400 | Web interface, 4 tabs, interactive |
### **Integration (`core/`)**
| File | Purpose | Lines | Key Features |
|------|---------|-------|--------------|
| `fusion_layer.py` | Multimodal fusion | ~80 | Combines caption + OCR + embeddings |
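
As a rough illustration of what "combines caption + OCR + embeddings" could look like, the sketch below merges the three outputs into a single record. The `FusedMemory` dataclass and the `fuse` function are hypothetical names chosen for this example; the actual `fusion_layer.py` may be organized differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class FusedMemory:
    """Hypothetical record holding the outputs of the vision agents."""
    caption: str
    ocr_text: str
    embedding: List[float]
    objects: List[str] = field(default_factory=list)
    timestamp: str = ""

    @property
    def searchable_text(self) -> str:
        # Join the non-empty text parts so a retrieval step can match on them.
        parts = (self.caption, self.ocr_text, " ".join(self.objects))
        return " ".join(part for part in parts if part)


def fuse(caption: str, ocr_text: str, embedding, objects: Optional[List[str]] = None) -> FusedMemory:
    """Combine caption, OCR text, and embedding into one timestamped record."""
    return FusedMemory(
        caption=caption.strip(),
        ocr_text=ocr_text.strip(),
        embedding=list(embedding),
        objects=list(objects or []),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

A record built this way keeps the text fields and the vector together, so the text could go to JSON and the vector to the index.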
### **Documentation (`docs/`)**
| File | Purpose | Pages |
|------|---------|-------|
| `LANGUAGES.md` | Language support | 5 |
| `API_KEYS.md` | API keys info | 4 |
### **Root Files**
| File | Purpose |
|------|---------|
| `README.md` | Main documentation |
| `requirements.txt` | Python dependencies |
| `.env.example` | Environment template |
| `.gitignore` | Git ignore rules |
| `run.bat` | Quick launcher |
| `cleanup.bat` | Cleanup script |
| `LICENSE` | MIT License |
---
## 🎯 Design Principles
### **1. Modularity**
- Each agent is independent
- Clear separation of concerns
- Easy to extend/modify
### **2. Configuration**
- Centralized in `config/settings.py`
- Environment variables supported
- Feature flags for easy toggling
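
To make the configuration principle concrete, here is a minimal sketch of how a centralized settings module might read environment variables and expose feature flags. The environment variable names (`VISIONQ_*`), the `env_flag` helper, and the `FEATURES` dict are assumptions made for illustration. `OCR_CONFIG` is named elsewhere in this document, but its shape here is a guess.

```python
import os
from pathlib import Path


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean feature flag from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Paths: the file names match the data/ folder in the project tree.
DATA_DIR = Path(os.environ.get("VISIONQ_DATA_DIR", "data"))  # hypothetical variable
MEMORY_JSON = DATA_DIR / "memory.json"
MEMORY_INDEX = DATA_DIR / "memory.faiss"
LOG_FILE = DATA_DIR / "visionq.log"

# Feature flags (hypothetical names) that can be toggled without code changes.
FEATURES = {
    "ocr": env_flag("VISIONQ_ENABLE_OCR", True),
    "voice": env_flag("VISIONQ_ENABLE_VOICE", True),
}

# Assumed shape: a list of language codes, overridable via the environment.
OCR_CONFIG = {
    "languages": os.environ.get("VISIONQ_OCR_LANGS", "en").split(","),
}
```

With a layout like this, setting `VISIONQ_ENABLE_OCR=0` before launching would switch OCR off without editing any code.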
### **3. User-Friendly**
- Streamlit UI for easy testing
- One-click launcher (`run.bat`)
- Clear documentation
### **4. Clean Structure**
- No redundant files
- Logical folder organization
- Archive for old files
---
## 📊 File Statistics
### **Code Files**
| Category | Files | Lines | Size |
|----------|-------|-------|------|
| Agents | 7 | ~1,100 | ~40KB |
| Config | 1 | ~200 | ~8KB |
| UI | 1 | ~400 | ~15KB |
| Core | 1 | ~80 | ~3KB |
| **Total** | **10** | **~1,780** | **~66KB** |
### **Documentation**
| Category | Files | Pages | Size |
|----------|-------|-------|------|
| Main | 1 | 3 | ~15KB |
| Guides | 2 | 9 | ~40KB |
| **Total** | **3** | **12** | **~55KB** |
### **Total Project**
- **Code Files:** 10
- **Documentation:** 3
- **Config Files:** 4
- **Total Lines:** ~2,000
- **Total Size:** ~120KB (excluding models)
---
## 🔄 Data Flow
```
User Input (UI/Voice)
        ↓
Vision Agent (Hub)
  ├─► YOLO/SSD → Objects
  ├─► BLIP → Caption
  ├─► CLIP → Embeddings
  └─► OCR → Text
        ↓
Fusion Layer
        ↓
Memory Agent (Storage)
  ├─► JSON (metadata)
  └─► FAISS (vectors)
        ↓
Query Agent (Retrieval)
  ├─► DistilBERT (intent)
  ├─► FAISS (similarity)
  └─► Time filter
        ↓
Response (UI/Voice)
```
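
The storage and retrieval half of this flow can be sketched in a few lines. This is not the project's implementation: a plain Python list and a cosine-similarity loop stand in for FAISS, and `MemoryStore`, `add`, and `search` are hypothetical names. It only illustrates how metadata, vectors, and a time filter could fit together.

```python
import math
from datetime import datetime, timedelta


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Stand-in for the JSON + FAISS pair: a metadata list plus parallel vectors."""

    def __init__(self):
        self.metadata = []  # would be persisted as JSON
        self.vectors = []   # would be held in the vector index

    def add(self, caption, vector, timestamp):
        self.metadata.append({"caption": caption, "timestamp": timestamp})
        self.vectors.append(list(vector))

    def search(self, query_vector, top_k=3, since=None):
        """Rank memories by similarity, skipping those older than `since`."""
        scored = []
        for meta, vec in zip(self.metadata, self.vectors):
            if since is not None and meta["timestamp"] < since:
                continue  # time filter
            scored.append((cosine(query_vector, vec), meta))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Toy vectors; real embeddings would come from the embedding agent.
now = datetime(2024, 1, 2, 12, 0)
store = MemoryStore()
store.add("keys on the kitchen table", [1.0, 0.0], now - timedelta(days=2))
store.add("keys on the desk", [0.9, 0.1], now - timedelta(hours=1))
store.add("a dog in the park", [0.0, 1.0], now - timedelta(hours=2))

# Only memories from the last day are considered.
results = store.search([1.0, 0.0], top_k=1, since=now - timedelta(days=1))
print(results[0][1]["caption"])
```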
---
## 🗂️ Directory Purposes
### **`agents/`** - AI Agents
- **Purpose:** Core AI functionality
- **Contains:** All intelligent agents
- **Modify:** To change AI behavior
### **`config/`** - Configuration
- **Purpose:** Centralized settings
- **Contains:** All configuration
- **Modify:** To customize behavior
### **`ui/`** - User Interface
- **Purpose:** Web interface
- **Contains:** Streamlit app
- **Modify:** To change UI
### **`core/`** - Integration
- **Purpose:** Multimodal fusion
- **Contains:** Integration logic
- **Modify:** To change fusion
### **`data/`** - Storage
- **Purpose:** Persistent data
- **Contains:** Memories, logs
- **Modify:** Never (auto-managed)
### **`models/`** - AI Models
- **Purpose:** Model storage
- **Contains:** Downloaded models
- **Modify:** Never (auto-managed)
### **`docs/`** - Documentation
- **Purpose:** User guides
- **Contains:** Documentation
- **Modify:** To update docs
### **`archive/`** - Old Files
- **Purpose:** Backup
- **Contains:** Old/deprecated files
- **Modify:** Can delete if not needed
---
## 🚀 Quick Navigation
### **Want to...**
**...change settings?**
→ `config/settings.py`

**...modify UI?**
→ `ui/app.py`

**...add new agent?**
→ Create in `agents/` folder

**...change OCR languages?**
→ `config/settings.py` → `OCR_CONFIG`

**...see memories?**
→ `data/memory.json`

**...check logs?**
→ `data/visionq.log`

**...understand languages?**
→ `docs/LANGUAGES.md`

**...learn about API keys?**
→ `docs/API_KEYS.md`

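
For the "add new agent" case, a new module in `agents/` might look roughly like the sketch below. It assumes agents are plain classes exposing a single processing method; the interface the existing agents actually share is not shown in this document, so `KeywordAgent`, `AgentResult`, and `process` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    """Hypothetical container for what an agent returns."""
    agent: str
    output: dict


class KeywordAgent:
    """Hypothetical new agent, e.g. saved as agents/keyword_agent.py."""

    name = "keyword"

    def __init__(self, min_length: int = 4):
        self.min_length = min_length

    def process(self, caption: str) -> AgentResult:
        # Lowercase the words, drop trailing punctuation, keep the longer ones.
        words = [w.strip(".,!?").lower() for w in caption.split()]
        keywords = sorted({w for w in words if len(w) >= self.min_length})
        return AgentResult(agent=self.name, output={"keywords": keywords})


result = KeywordAgent().process("A red mug on a wooden desk.")
print(result.output["keywords"])
```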
---
## 🧹 Cleanup Process
### **Before Cleanup**
```
VisionQ/
├── agents/            (new)
├── config/            (new)
├── ui/                (new)
├── caption_agent.py   (old)
├── memory_agent.py    (old)
├── vision_agent.py    (old)
├── main.py            (old)
├── main_upgraded.py   (old)
└── ...                (many old files)
```
### **After Cleanup**
```
VisionQ/
├── agents/    (clean)
├── config/    (clean)
├── ui/        (clean)
├── data/      (clean)
├── docs/      (clean)
├── archive/   (old files)
└── ...        (only essential files)
```
**Run:** `cleanup.bat` to move old files into `archive/`.

---
## 📦 What Gets Downloaded
### **First Run Downloads (~2GB)**
| Model | Size | Purpose |
|-------|------|---------|
| YOLO | ~50MB | Object detection |
| BLIP | ~1GB | Image captioning |
| CLIP | ~500MB | Visual embeddings |
| DistilBERT | ~250MB | NLP |
| EasyOCR (per language) | ~50MB | Text extraction |
| sentence-transformers | ~100MB | Text embeddings |

**Location:** `~/.cache/` (system cache)

**Note:** Models are shared across projects!

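
The ~2GB figure is simply the sum of the table rows. The short sketch below reproduces that arithmetic and shows how the total would grow with extra OCR languages, assuming roughly 50MB per language as listed. The function name and dict are hypothetical.

```python
# Approximate sizes in MB, taken from the table above (1GB treated as 1000MB).
DOWNLOADS_MB = {
    "YOLO": 50,
    "BLIP": 1000,
    "CLIP": 500,
    "DistilBERT": 250,
    "EasyOCR (one language)": 50,
    "sentence-transformers": 100,
}


def total_download_mb(extra_ocr_languages: int = 0) -> int:
    """Estimate the first-run download size, with optional extra OCR languages."""
    return sum(DOWNLOADS_MB.values()) + 50 * extra_ocr_languages


print(total_download_mb())   # 1950
print(total_download_mb(2))  # 2050
```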
---
## 🎯 Best Practices
### **Development**
1. **Modify agents** in `agents/` folder
2. **Change settings** in `config/settings.py`
3. **Update UI** in `ui/app.py`
4. **Test changes** with `run.bat`
### **Deployment**
1. **Keep** `agents/`, `config/`, `core/`, `ui/`
2. **Include** `requirements.txt`, `README.md`
3. **Exclude** `data/`, `models/`, `archive/`
4. **Add** `.env` for production settings
### **Maintenance**
1. **Update** dependencies regularly
2. **Clean** old memories periodically
3. **Check** logs for errors
4. **Backup** `data/` folder
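
One way to script the backup step is with the standard library alone. The sketch below zips the `data/` folder into a timestamped archive. The `backups/` target folder and the `backup_data` function are assumptions for illustration, not part of the project.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_data(data_dir="data", backup_dir="backups"):
    """Zip the data folder into backup_dir and return the archive path."""
    data_path = Path(data_dir)
    if not data_path.is_dir():
        raise FileNotFoundError(f"No such directory: {data_path}")

    backup_path = Path(backup_dir)
    backup_path.mkdir(parents=True, exist_ok=True)

    # A timestamped name keeps earlier backups from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive_base = backup_path / f"visionq-data-{stamp}"

    # make_archive appends the ".zip" extension and returns the full file name.
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(data_path))
    return Path(archive)
```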
---
## ✅ Structure Benefits
### **Clean**
- ✅ No redundant files
- ✅ Logical organization
- ✅ Easy to navigate
### **Modular**
- ✅ Independent agents
- ✅ Clear responsibilities
- ✅ Easy to extend
### **User-Friendly**
- ✅ Streamlit UI
- ✅ One-click launch
- ✅ Clear documentation
### **Maintainable**
- ✅ Centralized config
- ✅ Consistent structure
- ✅ Well documented
---
**VisionQ - Clean, organized, and ready to use! 🚀**