nextmarte and Claude Opus 4.8 committed on
Commit 5d19f12 · 1 parent: 04c91be

docs: plainer voice, drop AI writing tics and negation framing

Files changed (2)
  1. README.md +23 -23
  2. app.py +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
  - sharing-is-caring
  ---
 
- # 👁️ Iris your father's eyes, by voice
+ # 👁️ Iris: your father's eyes, by voice
 
  > Built for the **Build Small Hackathon** · **Backyard AI** track · for my father, who is blind.
 
@@ -27,16 +27,16 @@ tags:
  **Demo video:** _‹add link›_ · **Social post:** _‹add link›_
 
  Iris is a voice-first assistant for blind and low-vision people. Open it on a phone,
- point the camera, and Iris tells you what's around you out loud, in your language.
- There are no menus to find and no small buttons to hunt for: **the whole screen is
- the button**, and in live mode Iris simply listens.
+ point the camera, and it tells you what's around you, out loud, in your language.
+ **The whole screen is the button**, so there's nothing small to aim for. In live
+ mode it just listens and answers.
 
  ## What it does
- - 👁️ **Describe** tap anywhere: *"A table ahead with a mug on the right."*
- - 🎤 **Ask, hands-free** in live mode just speak: *"what color is this shirt?"*, *"read this label"*, *"is anyone here?"*
- - 💵 **Read money & bills** *"how much do I have?"* counts the banknotes; point at an electricity bill and it reads the **amount and due date**.
- - 💊 **Read medicine** reads the dose and instructions on a box, exactly as written.
- - 📡 **Live mode** double-tap (or say *"live mode"*): Iris describes the scene once, then **sparingly** announces *new* things that enter without chattering.
+ - 👁️ **Describe**: tap anywhere. *"A table ahead with a mug on the right."*
+ - 🎤 **Ask, hands-free**: in live mode, just speak. *"What color is this shirt?"*, *"read this label"*, *"is anyone here?"*
+ - 💵 **Read money and bills**: *"how much do I have?"* counts the banknotes. Point at an electricity bill and it reads the **amount and due date**.
+ - 💊 **Read medicine**: reads the dose and instructions on a box, exactly as written.
+ - 📡 **Live mode**: double-tap, or say *"live mode"*. Iris describes the scene once, then speaks up only when something new comes into view.
 
  ## How to use it
  - **Tap** anywhere → describe what's in front of you.
@@ -44,28 +44,28 @@ the button**, and in live mode Iris simply listens.
  - **Double-tap** → toggle live mode (hands-free listening + new-thing alerts). Say *"stop"* to turn it off.
  - First run: **choose your language by voice** ("say your language"). Language & accessibility toggles sit in the top corners.
 
- ## Accessibility the starting point, not a coat of paint
- Iris was designed for a blind user **first**, so the whole interface gets out of the way:
- - **The whole screen is one button** tap to describe, hold to ask, double-tap for live mode. No precise targets to find, no menus to navigate.
- - **It talks first.** Spoken onboarding on the first interaction, and you **choose your language by voice**.
- - **Hands-free** continuous listening in live mode — once it's on, no buttons at all.
- - **For low vision too:** big, clearly *labelled* buttons (real SVG icons, not emoji) and a **high-contrast / larger-text** toggle.
- - **Built to standard:** keyboard focus rings, ARIA live regions, haptic feedback, and it honours the OS *reduced-motion* and *more-contrast* settings.
+ ## Built for a blind user first
+ Accessibility shaped the whole interface, because the person it was made for asked for it:
+ - **The whole screen is one button.** Tap to describe, hold to ask, double-tap for live mode. Nothing small to find, no menus.
+ - **It talks first.** A spoken welcome on the first tap, and you **choose your language by voice**.
+ - **Hands-free.** In live mode it listens continuously, so there are no buttons to press.
+ - **For low vision too:** large buttons with clear labels and real SVG icons, plus a **high-contrast and larger-text** mode.
+ - **Standards:** keyboard focus rings, ARIA live regions, haptic feedback, and it honours the system's reduced-motion and contrast settings.
 
- ## How it works small models only, ≤ 32B total
+ ## How it works: small models only, ≤ 32B total
  | Stage | Model | Params |
  |---|---|---|
  | Speech-to-text | Whisper small (faster-whisper) | ~0.24B |
  | Vision-language | **Qwen3-VL-2B-Instruct** | ~2B |
  | Text-to-speech | Piper (pt_BR / en_US) | <1B |
 
- ** 2.5B total** comfortably under 32B, and **every model is ≤ 4B** (Tiny Titan).
- Custom voice-first frontend via **`gr.Server`** (Off-Brand). Inference runs in-Space
- on **ZeroGPU** no third-party model APIs.
+ **About 2.5B total**, well under 32B, and **every model is ≤ 4B** (Tiny Titan). The
+ voice-first frontend is custom, built on **`gr.Server`** (Off-Brand). Inference runs
+ in the Space on **ZeroGPU**, with no third-party model APIs.
 
- ## Not a mobility aid
- Iris describes the environment and reads text. It is **not** an obstacle-avoidance or
- navigation device and must not be relied on for physical safety.
+ ## Safety
+ Iris describes surroundings and reads text. Don't use it to get around or avoid
+ obstacles. It can't judge distance reliably and isn't safe to walk by.
 
  ## Run locally
  ```bash
app.py CHANGED
@@ -31,7 +31,7 @@ def _path(f):
  return f["path"] if isinstance(f, dict) else f.path
 
 
- # voice commands (live mode on/off) robust to transcription errors (e.g. "modo ao fio")
+ # voice commands (live mode on/off), tolerant of transcription errors (e.g. "modo ao fio")
  _OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
  r"silenci\w*|quieto|chega|modo manual|manual)\b")
 
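The app.py change only rewords a comment, but the `_OFF_RE` pattern it sits above is easy to sanity-check on its own. The sketch below copies the pattern verbatim from the hunk. `wants_live_off` is a hypothetical helper, and lowercasing the transcript before searching is an assumption: the pattern has no `re.IGNORECASE` flag, and the hunk doesn't show how app.py actually calls it.

```python
import re

# Pattern copied verbatim from the app.py hunk: phrases that turn live mode off.
# Adjacent string literals are concatenated into one pattern by Python.
_OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                     r"silenci\w*|quieto|chega|modo manual|manual)\b")


def wants_live_off(transcript: str) -> bool:
    """Hypothetical helper: True if a speech-to-text transcript contains an 'off' phrase.

    Assumes the transcript is lowercased first, since _OFF_RE is case-sensitive.
    """
    return _OFF_RE.search(transcript.lower()) is not None


if __name__ == "__main__":
    samples = ["Stop", "pare de falar", "desliga o modo", "turn off live mode",
               "what color is this shirt?", "modo ao fio", "stopped"]
    for phrase in samples:
        print(f"{phrase!r}: {wants_live_off(phrase)}")
```

The `\b` word boundaries keep a longer word such as "stopped" from triggering it, and the mistranscribed live-mode phrase "modo ao fio" from the comment does not match the off pattern either.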