---
title: Real Time Image Captioning
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
python_version: "3.10"
app_file: app.py
pinned: false
---

# 👁 ClearPath — Real-Time Scene Description for Visually-Impaired People

A fully open-source Python system that describes visual scenes in plain language,
classifies each description as **SAFE** or **DANGEROUS** using a regex engine, and reads the result aloud.

```
┌─────────┐    ┌──────────────┐    ┌─────────────────┐    ┌────────────────┐    ┌──────┐
│  Input  │───▶│  Qwen2-VL    │───▶│  Regex Safety   │───▶│ SAFE/DANGEROUS │───▶│  TTS │
│ (Image/ │    │  Captioning  │    │  Classifier     │    │  + Hazard tags │    │      │
│ Video / │    │ (HuggingFace │    │ (15 categories, │    └────────────────┘    └──────┘
│ Camera) │    │  open source)│    │  ~30 patterns)  │
└─────────┘    └──────────────┘    └─────────────────┘
```

---

## 📁 Project Structure

```
scene_description/
├── app.py                  ← Gradio web UI (main entry point)
├── cli.py                  ← Command-line interface
├── scene_captioner.py      ← Qwen2-VL image captioning module
├── safety_classifier.py    ← Regex-based SAFE/DANGEROUS classifier
├── tts_engine.py           ← Text-to-Speech (pyttsx3 / gTTS)
├── requirements.txt
├── tests/
│   └── test_safety_classifier.py
└── README.md
```

---

## ⚙️ Setup

### 1. Create a virtual environment
```bash
python -m venv venv
# Run only the line that matches your platform:
source venv/bin/activate        # Linux/macOS
venv\Scripts\activate           # Windows
```

### 2. Install dependencies
```bash
pip install -r requirements.txt
```

> **GPU (recommended):** Install the CUDA-enabled PyTorch build before installing `requirements.txt` (the command below targets CUDA 12.1):
> ```bash
> pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
> ```

### 3. (Optional) HuggingFace login for gated models
```bash
huggingface-cli login
```
Qwen2-VL-2B is **not gated** — no login required for the default model.

---

## 🚀 Running

### Web UI (Gradio)
```bash
python app.py
```
Open **http://localhost:7860** in your browser.

Supports:
- 📁 Image upload (drag & drop)
- 📷 Live webcam capture
- 🎬 Video file analysis (frame-by-frame)

### Command Line
```bash
# Single image
python cli.py --image photo.jpg --speak

# Video file (capture every 3 seconds)
python cli.py --video footage.mp4 --interval 3 --speak

# Live webcam loop
python cli.py --camera --speak

# Use a larger model for better caption quality
python cli.py --image photo.jpg --model Qwen/Qwen2-VL-7B-Instruct
```
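The `--interval` option samples the video every few seconds rather than captioning every frame. As a rough sketch of how such sampling could map to frame indices, assuming the frame count and frame rate are already known (for example, read from the video container), the hypothetical helper `frame_indices_for_interval` below picks evenly spaced frames. It is an illustration only, not the code in `cli.py`.

```python
def frame_indices_for_interval(total_frames: int, fps: float, interval_s: float) -> list[int]:
    """Return indices of frames spaced roughly `interval_s` seconds apart.

    Hypothetical helper for illustration; not part of cli.py.
    """
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Number of frames between two consecutive samples (never less than 1).
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 3 seconds.
    print(frame_indices_for_interval(total_frames=300, fps=30.0, interval_s=3.0))
    # → [0, 90, 180, 270]
```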

### Run Tests
```bash
python -m pytest tests/ -v
```

---

## 🧠 Models

| Model | Size | VRAM | Quality |
|-------|------|------|---------|
| `Qwen/Qwen2-VL-2B-Instruct` | ~5 GB | ~5 GB | Good ✅ (default) |
| `Qwen/Qwen2-VL-7B-Instruct` | ~14 GB | ~14 GB | Better ⭐ |
| `Qwen/Qwen2.5-VL-3B-Instruct` | ~6 GB | ~6 GB | Good + newer |
| `Salesforce/blip2-opt-2.7b` | ~5 GB | ~5 GB | Fallback only |

Switch models by setting the `QWEN_MODEL` environment variable:
```bash
QWEN_MODEL=Qwen/Qwen2-VL-7B-Instruct python app.py
```
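On the Python side, reading that variable with a fallback to the default model might look like the sketch below. The function name `resolve_model_id` and the `env` parameter are assumptions made for illustration; the actual lookup in `app.py` may differ.

```python
import os

# Default model ID, taken from the table above.
DEFAULT_MODEL = "Qwen/Qwen2-VL-2B-Instruct"


def resolve_model_id(env=None) -> str:
    """Return the model ID from QWEN_MODEL, or the default if it is unset or blank.

    Hypothetical helper; `env` defaults to os.environ and exists to ease testing.
    """
    env = os.environ if env is None else env
    value = env.get("QWEN_MODEL", "").strip()
    return value or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model_id())
```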

---

## 🔍 Safety Classifier — Hazard Categories

The regex engine covers **15 hazard categories** with ~30 pattern groups:

| Category | Examples |
|----------|---------|
| `fire` | fire, flames, burning, blaze, smoke |
| `flood` | flooding, flash flood, submerged |
| `storm` | tornado, hurricane, lightning |
| `traffic` | oncoming car, near collision |
| `crash` | accident, wreck, overturned vehicle |
| `weapon` | gun, knife, rifle, blade, bomb |
| `violence` | brawl, riot, shooting, assault |
| `fall` | cliff, ledge, scaffolding, steep drop |
| `collapse` | rubble, debris, cave-in |
| `electrical` | exposed wire, live wire, sparking |
| `injury` | blood, wound, bleeding, unconscious |
| `slip` | wet floor, icy road, black ice |
| `construction` | heavy machinery, crane, unsafe structure |
| `chemical` | chemical spill, gas leak, toxic fumes |
| `crowd` | stampede, crowd crush, panic |

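To make the idea concrete, here is a minimal sketch of a regex-based classifier covering three of the categories above. The pattern strings, the `HAZARD_PATTERNS` dictionary, and the `classify` function are assumptions built from the example keywords in the table; the real patterns in `safety_classifier.py` are not reproduced here. Word boundaries (`\b`) keep "fire" from matching inside "fireplace", though a harmless phrase such as "a fire in the hearth" would still be flagged, which is an inherent limit of keyword matching.

```python
import re

# Illustrative patterns for three categories from the table above.
# These are assumptions, not the patterns shipped in safety_classifier.py.
HAZARD_PATTERNS = {
    "fire": re.compile(r"\b(fire|flames?|burning|blaze|smoke)\b", re.IGNORECASE),
    "slip": re.compile(r"\b(wet floor|icy road|black ice)\b", re.IGNORECASE),
    "weapon": re.compile(r"\b(gun|knife|rifle|blade|bomb)\b", re.IGNORECASE),
}


def classify(caption: str) -> tuple[str, list[str]]:
    """Return ("DANGEROUS", tags) if any pattern matches, else ("SAFE", [])."""
    tags = [name for name, pattern in HAZARD_PATTERNS.items() if pattern.search(caption)]
    return ("DANGEROUS" if tags else "SAFE", tags)


if __name__ == "__main__":
    print(classify("Smoke is rising from a burning car."))  # → ('DANGEROUS', ['fire'])
    print(classify("A person sits on a park bench."))       # → ('SAFE', [])
```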
---

## ♿ Accessibility Features

- **Auto TTS** — every description is read aloud automatically
- `aria-live` regions in the web UI for screen reader support
- High-contrast dark theme with clear visual indicators
- Keyboard-navigable Gradio interface

---

## 📄 License

MIT License — free for personal and commercial use.