---
title: Real Time Image Captioning
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
python_version: "3.10"
app_file: app.py
pinned: false
---
# 👁 ClearPath: Real-Time Scene Description for Visually Impaired People
ClearPath is a fully open-source Python system that describes visual scenes in plain language,
classifies each description as **SAFE** or **DANGEROUS** with a regex engine, and reads the result aloud.
```
┌──────────┐    ┌──────────────┐    ┌────────────────┐    ┌────────────────┐    ┌──────┐
│ Input    │───▶│ Qwen2-VL     │───▶│ Regex Safety   │───▶│ SAFE/DANGEROUS │───▶│ TTS  │
│ (Image / │    │ Captioning   │    │ Classifier     │    │ + Hazard tags  │    │      │
│ Video /  │    │ (HuggingFace │    │ (15 categories │    └────────────────┘    └──────┘
│ Camera)  │    │ open src)    │    │ ~30 patterns)  │
└──────────┘    └──────────────┘    └────────────────┘
```
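The flow in the diagram can be sketched as three stages chained together. This is only an illustration of the data flow: `SceneResult`, `describe_scene`, and the three callables passed in are hypothetical names, not the actual interfaces of `scene_captioner.py`, `safety_classifier.py`, or `tts_engine.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SceneResult:
    caption: str
    label: str  # "SAFE" or "DANGEROUS"
    hazards: List[str] = field(default_factory=list)


def describe_scene(
    image: object,
    captioner: Callable[[object], str],       # stand-in for the Qwen2-VL step
    classifier: Callable[[str], List[str]],   # stand-in for the regex step
    speaker: Callable[[str], None],           # stand-in for the TTS step
) -> SceneResult:
    """Run caption -> classify -> speak, in the order shown in the diagram."""
    caption = captioner(image)
    hazards = classifier(caption)
    # Any matched hazard tag makes the scene DANGEROUS; otherwise it is SAFE.
    label = "DANGEROUS" if hazards else "SAFE"
    speaker(f"{label}. {caption}")
    return SceneResult(caption=caption, label=label, hazards=hazards)
```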
---
## 📁 Project Structure
```
scene_description/
├── app.py                  ← Gradio web UI (main entry point)
├── cli.py                  ← Command-line interface
├── scene_captioner.py      ← Qwen2-VL image captioning module
├── safety_classifier.py    ← Regex-based SAFE/DANGEROUS classifier
├── tts_engine.py           ← Text-to-Speech (pyttsx3 / gTTS)
├── requirements.txt
├── tests/
│   └── test_safety_classifier.py
└── README.md
```
---
## ⚙️ Setup
### 1. Create a virtual environment
```bash
python -m venv venv
# Activate it with ONE of the following, depending on your OS:
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
> **GPU (recommended):** Install the CUDA-enabled PyTorch version first:
> ```bash
> pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
> ```
### 3. (Optional) HuggingFace login for gated models
```bash
huggingface-cli login
```
The default model, `Qwen/Qwen2-VL-2B-Instruct`, is **not gated**, so no login is required to use it.

---
## 🚀 Running
### Web UI (Gradio)
```bash
python app.py
```
Open **http://localhost:7860** in your browser.
The web UI supports:
- 📁 Image upload (drag & drop)
- 📷 Live webcam capture
- 🎬 Video file analysis (frame-by-frame)
### Command Line
```bash
# Single image
python cli.py --image photo.jpg --speak
# Video file (capture every 3 seconds)
python cli.py --video footage.mp4 --interval 3 --speak
# Live webcam loop
python cli.py --camera --speak
# Use larger model for better quality
python cli.py --image photo.jpg --model Qwen/Qwen2-VL-7B-Instruct
```
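As a rough illustration of what `--interval 3` implies for a video file, the sketch below converts a sampling interval in seconds into frame indices. It assumes a constant frame rate, and `frame_indices` is a hypothetical helper; `cli.py` may select frames differently.

```python
from typing import List


def frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return the indices of the frames to caption, one every `interval_s` seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Number of frames between two consecutive samples (at least 1).
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 fps, sampled every 3 seconds
print(frame_indices(total_frames=300, fps=30.0, interval_s=3.0))
# → [0, 90, 180, 270]
```

Sampling by interval rather than captioning every frame keeps the number of model calls proportional to the clip length in seconds instead of the frame count.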
### Run Tests
```bash
python -m pytest tests/ -v
```
---
## 🧠 Models
| Model | Size | VRAM | Quality |
|-------|------|------|---------|
| `Qwen/Qwen2-VL-2B-Instruct` | ~5 GB | ~5 GB | Good ✅ (default) |
| `Qwen/Qwen2-VL-7B-Instruct` | ~14 GB | ~14 GB | Better ⭐ |
| `Qwen/Qwen2.5-VL-3B-Instruct` | ~6 GB | ~6 GB | Good + newer |
| `Salesforce/blip2-opt-2.7b` | ~5 GB | ~5 GB | Fallback only |
Switch model via environment variable:
```bash
QWEN_MODEL=Qwen/Qwen2-VL-7B-Instruct python app.py
```
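One plausible way for the application to pick up this variable is a lookup with a fallback to the default model from the table. `DEFAULT_MODEL` and `resolve_model_id` are hypothetical names used for illustration; the actual lookup in `app.py` is not shown in this README. Note that the inline `VAR=value command` form above is POSIX shell syntax, so on Windows the variable has to be set before running the command.

```python
import os
from typing import Mapping, Optional

# Default taken from the model table; the name DEFAULT_MODEL is an assumption.
DEFAULT_MODEL = "Qwen/Qwen2-VL-2B-Instruct"


def resolve_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID from QWEN_MODEL, falling back to the default."""
    env = os.environ if env is None else env
    # Treat an unset, empty, or whitespace-only value as "not configured".
    value = env.get("QWEN_MODEL", "").strip()
    return value or DEFAULT_MODEL
```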
---
## 🔍 Safety Classifier — Hazard Categories
The regex engine covers **15 hazard categories** with ~30 pattern groups:
| Category | Examples |
|----------|---------|
| `fire` | fire, flames, burning, blaze, smoke |
| `flood` | flooding, flash flood, submerged |
| `storm` | tornado, hurricane, lightning |
| `traffic` | oncoming car, near collision |
| `crash` | accident, wreck, overturned vehicle |
| `weapon` | gun, knife, rifle, blade, bomb |
| `violence` | brawl, riot, shooting, assault |
| `fall` | cliff, ledge, scaffolding, steep drop |
| `collapse` | rubble, debris, cave-in |
| `electrical` | exposed wire, live wire, sparking |
| `injury` | blood, wound, bleeding, unconscious |
| `slip` | wet floor, icy road, black ice |
| `construction` | heavy machinery, crane, unsafe structure |
| `chemical` | chemical spill, gas leak, toxic fumes |
| `crowd` | stampede, crowd crush, panic |
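
To make the classification step concrete, here is a minimal sketch of a regex-based classifier covering three of the categories above. The patterns are guessed from the example keywords in the table and are not the project's real patterns; `HAZARD_PATTERNS` and `classify` are hypothetical names. Word boundaries (`\b`) keep "fire" from matching inside "fireplace", but plain keyword matching cannot handle negation, so a caption such as "no fire visible" would still be tagged `fire`.

```python
import re
from typing import Dict, List, Tuple

# Illustrative subset only: three of the 15 categories, with patterns
# guessed from the table's examples (assumption, not the real pattern set).
HAZARD_PATTERNS: Dict[str, re.Pattern] = {
    "fire": re.compile(r"\b(fire|flames?|burning|blaze|smoke)\b", re.IGNORECASE),
    "weapon": re.compile(r"\b(guns?|kni(?:fe|ves)|rifles?|blades?|bombs?)\b", re.IGNORECASE),
    "slip": re.compile(r"\b(wet floor|icy road|black ice)\b", re.IGNORECASE),
}


def classify(caption: str) -> Tuple[str, List[str]]:
    """Return ("SAFE" or "DANGEROUS", matched hazard tags) for a caption."""
    tags = [name for name, pattern in HAZARD_PATTERNS.items() if pattern.search(caption)]
    return ("DANGEROUS" if tags else "SAFE", tags)


print(classify("A pan on the stove with flames and thick smoke"))
# → ('DANGEROUS', ['fire'])
print(classify("A person walking a dog in a sunny park"))
# → ('SAFE', [])
```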
---
## ♿ Accessibility Features
- **Auto TTS** — every description is read aloud automatically
- `aria-live` regions in the web UI for screen reader support
- High-contrast dark theme with clear visual indicators
- Keyboard-navigable Gradio interface
---
## 📄 License
MIT License — free for personal and commercial use.