---
title: Real Time Image Captioning
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
python_version: "3.10"
app_file: app.py
pinned: false
---
# 👁 ClearPath — Real-Time Scene Description for Visually-Impaired People
A fully open-source Python system that describes visual scenes in plain language
and labels each description **SAFE** or **DANGEROUS** by running a regex engine over the generated caption.
```
┌─────────┐    ┌──────────────┐    ┌───────────────┐    ┌────────────────┐    ┌──────┐
│ Input   │───▶│ Qwen2-VL     │───▶│ Regex Safety  │───▶│ SAFE/DANGEROUS │───▶│ TTS  │
│ (Image/ │    │ Captioning   │    │ Classifier    │    │ + Hazard tags  │    │      │
│ Video / │    │ (HuggingFace │    │ (15 categories│    └────────────────┘    └──────┘
│ Camera) │    │  open src)   │    │ ~30 patterns) │
└─────────┘    └──────────────┘    └───────────────┘
```
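
The data flow in the diagram can be sketched as a single function that chains the three stages. This is a minimal illustration, not the code in `app.py`: `SceneResult`, `run_pipeline`, the injected callables, and the spoken message format are all assumptions made for the example, and the stages are stubbed so it runs without a model or audio device.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SceneResult:
    """Hypothetical container for one processed frame."""
    caption: str
    label: str  # "SAFE" or "DANGEROUS"
    hazards: List[str] = field(default_factory=list)


def run_pipeline(
    image: object,
    caption_fn: Callable[[object], str],
    classify_fn: Callable[[str], Tuple[str, List[str]]],
    speak_fn: Callable[[str], None],
) -> SceneResult:
    """Caption the image, classify the caption, then speak the outcome."""
    caption = caption_fn(image)
    label, hazards = classify_fn(caption)
    # Assumed message format: the label first, so a warning is heard early.
    speak_fn(f"{label}. {caption}")
    return SceneResult(caption, label, hazards)


# Usage with stand-in stages (no model, no audio).
spoken: List[str] = []
result = run_pipeline(
    image=None,
    caption_fn=lambda _img: "A fire is burning in the kitchen.",
    classify_fn=lambda text: ("DANGEROUS", ["fire"]) if "fire" in text else ("SAFE", []),
    speak_fn=spoken.append,
)
print(result.label, result.hazards)  # DANGEROUS ['fire']
```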
---
## 📁 Project Structure
```
scene_description/
├── app.py                  ← Gradio web UI (main entry point)
├── cli.py                  ← Command-line interface
├── scene_captioner.py      ← Qwen2-VL image captioning module
├── safety_classifier.py    ← Regex-based SAFE/DANGEROUS classifier
├── tts_engine.py           ← Text-to-Speech (pyttsx3 / gTTS)
├── requirements.txt
├── tests/
│   └── test_safety_classifier.py
└── README.md
```
---
## ⚙️ Setup
### 1. Create a virtual environment
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
> **GPU (recommended):** Install the CUDA-enabled PyTorch version first:
> ```bash
> pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
> ```
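
To confirm that the CUDA build is actually being used after installation, a quick check along these lines can help. `pick_device` is a hypothetical helper written for this example, not a function from the repository; it falls back to the CPU when PyTorch is missing or sees no GPU.

```python
def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build can see a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed and the command above needs to be rerun.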
### 3. (Optional) Hugging Face login for gated models
```bash
huggingface-cli login
```
Qwen2-VL-2B is **not gated**, so the default model needs no login.

---
## 🚀 Running
### Web UI (Gradio)
```bash
python app.py
```
Open **http://localhost:7860** in your browser.

The web UI supports:
- 📁 Image upload (drag & drop)
- 📷 Live webcam capture
- 🎬 Video file analysis (frame-by-frame)
### Command Line
```bash
# Single image
python cli.py --image photo.jpg --speak
# Video file (sample one frame every 3 seconds)
python cli.py --video footage.mp4 --interval 3 --speak
# Live webcam loop
python cli.py --camera --speak
# Use larger model for better quality
python cli.py --image photo.jpg --model Qwen/Qwen2-VL-7B-Instruct
```
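
The flags above could be parsed, and `--interval` turned into frame positions, roughly as follows. This is a sketch under assumptions rather than the contents of `cli.py`: the mutually exclusive input group, the 3-second default interval, and the `frame_indices` helper are all hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the examples above."""
    parser = argparse.ArgumentParser(description="Describe a scene and flag hazards.")
    # Assumption: exactly one input source is given per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", help="path to a single image")
    source.add_argument("--video", help="path to a video file")
    source.add_argument("--camera", action="store_true", help="read from the webcam")
    # Assumption: the default interval is 3 seconds.
    parser.add_argument("--interval", type=float, default=3.0,
                        help="seconds between sampled video frames")
    parser.add_argument("--speak", action="store_true", help="read results aloud")
    parser.add_argument("--model", default="Qwen/Qwen2-VL-2B-Instruct",
                        help="Hugging Face model ID")
    return parser


def frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Indices of the frames to caption when sampling once every interval_s seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))  # never step by less than one frame
    return list(range(0, total_frames, step))


args = build_parser().parse_args(["--video", "footage.mp4", "--interval", "3", "--speak"])
print(frame_indices(total_frames=300, fps=30.0, interval_s=args.interval))
# [0, 90, 180, 270]
```

Sampling by index keeps the cost proportional to the video length divided by the interval, which matters because each sampled frame requires a full captioning pass.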
### Run Tests
```bash
python -m pytest tests/ -v
```
---
## 🧠 Models
| Model | Size | VRAM | Quality |
|-------|------|------|---------|
| `Qwen/Qwen2-VL-2B-Instruct` | ~5 GB | ~5 GB | Good ✅ (default) |
| `Qwen/Qwen2-VL-7B-Instruct` | ~14 GB | ~14 GB | Better ⭐ |
| `Qwen/Qwen2.5-VL-3B-Instruct` | ~6 GB | ~6 GB | Good + newer |
| `Salesforce/blip2-opt-2.7b` | ~5 GB | ~5 GB | Fallback only |

Switch models by setting the `QWEN_MODEL` environment variable:
```bash
QWEN_MODEL=Qwen/Qwen2-VL-7B-Instruct python app.py
```
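
On the Python side, reading that variable with a fallback to the default model might look like the sketch below. `resolve_model_id` and `DEFAULT_MODEL` are hypothetical names chosen for the example; only the `QWEN_MODEL` variable and the default model ID come from this README.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "Qwen/Qwen2-VL-2B-Instruct"


def resolve_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID from QWEN_MODEL, or the default when unset or blank."""
    env = os.environ if env is None else env
    value = env.get("QWEN_MODEL", "").strip()
    return value or DEFAULT_MODEL


print(resolve_model_id({"QWEN_MODEL": "Qwen/Qwen2-VL-7B-Instruct"}))
# Qwen/Qwen2-VL-7B-Instruct
```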
---
## 🔍 Safety Classifier — Hazard Categories
The regex engine covers **15 hazard categories** with ~30 pattern groups:

| Category | Examples |
|----------|---------|
| `fire` | fire, flames, burning, blaze, smoke |
| `flood` | flooding, flash flood, submerged |
| `storm` | tornado, hurricane, lightning |
| `traffic` | oncoming car, near collision |
| `crash` | accident, wreck, overturned vehicle |
| `weapon` | gun, knife, rifle, blade, bomb |
| `violence` | brawl, riot, shooting, assault |
| `fall` | cliff, ledge, scaffolding, steep drop |
| `collapse` | rubble, debris, cave-in |
| `electrical` | exposed wire, live wire, sparking |
| `injury` | blood, wound, bleeding, unconscious |
| `slip` | wet floor, icy road, black ice |
| `construction` | heavy machinery, crane, unsafe structure |
| `chemical` | chemical spill, gas leak, toxic fumes |
| `crowd` | stampede, crowd crush, panic |
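
A category-to-pattern classifier of this kind can be sketched in a few lines. The version below is illustrative only: it covers five of the fifteen categories, and the patterns and the `classify` signature are assumptions, not the ones in `safety_classifier.py`. Keyword matching like this cannot tell "a fire" from "no fire", so a real pattern set would likely need extra handling for negation and harmless compounds.

```python
import re
from typing import Dict, List, Tuple

# Illustrative subset; the real patterns in safety_classifier.py may differ.
HAZARD_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "fire": re.compile(r"\b(fire|flames?|burning|blaze|smoke)\b", re.IGNORECASE),
    "flood": re.compile(r"\b(flood(ing|ed)?|flash flood|submerged)\b", re.IGNORECASE),
    "weapon": re.compile(r"\b(guns?|kni(fe|ves)|rifles?|blades?|bombs?)\b", re.IGNORECASE),
    "electrical": re.compile(r"\b(exposed|live)\s+wires?\b|\bsparking\b", re.IGNORECASE),
    "slip": re.compile(r"\b(wet floor|icy road|black ice)\b", re.IGNORECASE),
}


def classify(caption: str) -> Tuple[str, List[str]]:
    """Return ("DANGEROUS", matched categories) or ("SAFE", [])."""
    hazards = [name for name, pattern in HAZARD_PATTERNS.items() if pattern.search(caption)]
    return ("DANGEROUS" if hazards else "SAFE", hazards)


print(classify("Smoke and flames are rising from a parked car."))
# ('DANGEROUS', ['fire'])
print(classify("A quiet park with people walking dogs."))
# ('SAFE', [])
```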
---
## ♿ Accessibility Features
- **Auto TTS** — every description is read aloud automatically
- `aria-live` regions in the web UI for screen reader support
- High-contrast dark theme with clear visual indicators
- Keyboard-navigable Gradio interface
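
The automatic read-aloud step could be implemented with pyttsx3 along these lines. This is a sketch, not the contents of `tts_engine.py`: the `speak` helper and the `SpeechEngine` protocol are hypothetical, while `pyttsx3.init()`, `say()`, and `runAndWait()` are the library's standard calls. Accepting the engine as a parameter lets the function be exercised without an audio device.

```python
from typing import Optional, Protocol


class SpeechEngine(Protocol):
    """The two pyttsx3 engine methods this sketch relies on."""
    def say(self, text: str) -> None: ...
    def runAndWait(self) -> None: ...


def speak(text: str, engine: Optional[SpeechEngine] = None) -> bool:
    """Read text aloud; return False when there is nothing to say."""
    text = text.strip()
    if not text:
        return False
    if engine is None:
        import pyttsx3  # imported lazily so this module loads without audio support
        engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()  # blocks until the utterance has finished
    return True


# Typical call once a description is ready (requires pyttsx3 and an audio device):
# speak("DANGEROUS. Smoke is rising from a parked car.")
```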
---
## 📄 License
MIT License, free for personal and commercial use.