docs: plainer voice, drop AI writing tics and negation framing
README.md
CHANGED
@@ -18,7 +18,7 @@ tags:
  - sharing-is-caring
  ---

- # 👁️ Iris

  > Built for the **Build Small Hackathon** · **Backyard AI** track · for my father, who is blind.

@@ -27,16 +27,16 @@ tags:
  **Demo video:** _‹add link›_ · **Social post:** _‹add link›_

  Iris is a voice-first assistant for blind and low-vision people. Open it on a phone,
- point the camera, and
-
-

  ## What it does
- - 👁️ **Describe**
- - 🎤 **Ask, hands-free**
- - 💵 **Read money
- - 💊 **Read medicine**
- - 📡 **Live mode**

  ## How to use it
  - **Tap** anywhere → describe what's in front of you.
@@ -44,28 +44,28 @@ the button**, and in live mode Iris simply listens.
  - **Double-tap** → toggle live mode (hands-free listening + new-thing alerts). Say *"stop"* to turn it off.
  - First run: **choose your language by voice** ("say your language"). Language & accessibility toggles sit in the top corners.

- ##
-
- - **The whole screen is one button**
- - **It talks first.**
- - **Hands-free**
- - **For low vision too:**
- - **

- ## How it works
  | Stage | Model | Params |
  |---|---|---|
  | Speech-to-text | Whisper small (faster-whisper) | ~0.24B |
  | Vision-language | **Qwen3-VL-2B-Instruct** | ~2B |
  | Text-to-speech | Piper (pt_BR / en_US) | <1B |

- **
-
- on **ZeroGPU**

- ##
- Iris describes
-

  ## Run locally
  ```bash
@@ -18,7 +18,7 @@ tags:
  - sharing-is-caring
  ---

+ # 👁️ Iris: your father's eyes, by voice

  > Built for the **Build Small Hackathon** · **Backyard AI** track · for my father, who is blind.

@@ -27,16 +27,16 @@ tags:
  **Demo video:** _‹add link›_ · **Social post:** _‹add link›_

  Iris is a voice-first assistant for blind and low-vision people. Open it on a phone,
+ point the camera, and it tells you what's around you, out loud, in your language.
+ **The whole screen is the button**, so there's nothing small to aim for. In live
+ mode it just listens and answers.

  ## What it does
+ - 👁️ **Describe**: tap anywhere. *"A table ahead with a mug on the right."*
+ - 🎤 **Ask, hands-free**: in live mode, just speak. *"What color is this shirt?"*, *"read this label"*, *"is anyone here?"*
+ - 💵 **Read money and bills**: *"how much do I have?"* counts the banknotes. Point at an electricity bill and it reads the **amount and due date**.
+ - 💊 **Read medicine**: reads the dose and instructions on a box, exactly as written.
+ - 📡 **Live mode**: double-tap, or say *"live mode"*. Iris describes the scene once, then speaks up only when something new comes into view.

  ## How to use it
  - **Tap** anywhere → describe what's in front of you.
@@ -44,28 +44,28 @@ the button**, and in live mode Iris simply listens.
  - **Double-tap** → toggle live mode (hands-free listening + new-thing alerts). Say *"stop"* to turn it off.
  - First run: **choose your language by voice** ("say your language"). Language & accessibility toggles sit in the top corners.

+ ## Built for a blind user first
+ Accessibility shaped the whole interface, because the person it was made for asked for it:
+ - **The whole screen is one button.** Tap to describe, hold to ask, double-tap for live mode. Nothing small to find, no menus.
+ - **It talks first.** A spoken welcome on the first tap, and you **choose your language by voice**.
+ - **Hands-free.** In live mode it listens continuously, so there are no buttons to press.
+ - **For low vision too:** large buttons with clear labels and real SVG icons, plus a **high-contrast and larger-text** mode.
+ - **Standards:** keyboard focus rings, ARIA live regions, haptic feedback, and it honours the system's reduced-motion and contrast settings.

+ ## How it works: small models only, ≤ 32B total
  | Stage | Model | Params |
  |---|---|---|
  | Speech-to-text | Whisper small (faster-whisper) | ~0.24B |
  | Vision-language | **Qwen3-VL-2B-Instruct** | ~2B |
  | Text-to-speech | Piper (pt_BR / en_US) | <1B |

+ **About 2.5B total**, well under 32B, and **every model is ≤ 4B** (Tiny Titan). The
+ voice-first frontend is custom, built on **`gr.Server`** (Off-Brand). Inference runs
+ in the Space on **ZeroGPU**, with no third-party model APIs.

+ ## Safety
+ Iris describes surroundings and reads text. Don't use it to get around or avoid
+ obstacles. It can't judge distance reliably and isn't safe to walk by.

  ## Run locally
  ```bash
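The parameter budget claimed in the README's "How it works" section can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the Space: `STAGES` and `check_budget` are hypothetical names, the figures are the approximate ones from the table, and Piper's "<1B" is treated as an upper bound of 1.0B because the README gives no exact number.

```python
# Approximate parameter counts in billions, copied from the README table.
# Piper is listed only as "<1B", so 1.0 is used as a conservative upper
# bound (an assumption, not a measured size).
STAGES = {
    "speech-to-text (Whisper small)": 0.24,
    "vision-language (Qwen3-VL-2B-Instruct)": 2.0,
    "text-to-speech (Piper)": 1.0,
}

TOTAL_LIMIT_B = 32.0     # cap on the sum, per the README heading
PER_MODEL_LIMIT_B = 4.0  # "every model is <= 4B" (Tiny Titan)


def check_budget(stages, total_limit=TOTAL_LIMIT_B, per_model_limit=PER_MODEL_LIMIT_B):
    """Return (total, ok), where ok means both limits hold."""
    total = sum(stages.values())
    ok = total <= total_limit and all(p <= per_model_limit for p in stages.values())
    return total, ok


total, ok = check_budget(STAGES)
print(f"upper bound on total: {total:.2f}B, within limits: {ok}")
```

Even with the upper bound for Piper, the sum stays far below 32B. The README's own figure of about 2.5B is lower, presumably because Piper is well under the bound used here.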
app.py
CHANGED
@@ -31,7 +31,7 @@ def _path(f):
      return f["path"] if isinstance(f, dict) else f.path


- # voice commands (live mode on/off)
  _OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                       r"silenci\w*|quieto|chega|modo manual|manual)\b")
@@ -31,7 +31,7 @@ def _path(f):
      return f["path"] if isinstance(f, dict) else f.path


+ # voice commands (live mode on/off), tolerant of transcription errors (e.g. "modo ao fio")
  _OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                       r"silenci\w*|quieto|chega|modo manual|manual)\b")
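To illustrate how the `_OFF_RE` pattern in this hunk behaves, here is a small sketch. The pattern is copied from the diff. The `is_off_command` helper is hypothetical, and lowercasing the transcript before matching is an assumption about how `app.py` calls the regex, since the pattern is compiled without `re.IGNORECASE`.

```python
import re

# Pattern copied from the app.py hunk.
_OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                     r"silenci\w*|quieto|chega|modo manual|manual)\b")


def is_off_command(transcript: str) -> bool:
    """Hypothetical helper: True if the transcript asks to leave live mode."""
    # The pattern has no re.IGNORECASE flag, so this sketch lowercases the
    # text first (an assumption, not confirmed by the diff).
    return _OFF_RE.search(transcript.lower()) is not None


for phrase in ["Stop", "pode desligar", "para de falar", "what color is this shirt?"]:
    print(phrase, "->", is_off_command(phrase))
```

The `\w*` suffixes let stems such as `deslig` cover inflected forms (`desliga`, `desligar`), while the `\b` word boundaries keep `stop` from firing inside a longer word such as `stopped`.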