Space: build-small-hackathon/iris

nextmarte Claude Opus 4.8 committed on Jun 10

Commit 5d19f12 · 1 parent: 04c91be

docs: plainer voice, drop AI writing tics and negation framing

Files changed (2)

README.md +23 -23
app.py +1 -1

README.md CHANGED

@@ -18,7 +18,7 @@ tags:
   - sharing-is-caring
 ---
-# 👁️ Iris — your father's eyes, by voice
 > Built for the **Build Small Hackathon** · **Backyard AI** track · for my father, who is blind.
@@ -27,16 +27,16 @@ tags:
 **Demo video:** _‹add link›_ · **Social post:** _‹add link›_
 Iris is a voice-first assistant for blind and low-vision people. Open it on a phone,
-point the camera, and Iris tells you what's around you — out loud, in your language.
-There are no menus to find and no small buttons to hunt for: **the whole screen is
-the button**, and in live mode Iris simply listens.
 ## What it does
-- 👁️ **Describe** — tap anywhere: *"A table ahead with a mug on the right."*
-- 🎤 **Ask, hands-free** — in live mode just speak: *"what color is this shirt?"*, *"read this label"*, *"is anyone here?"*
-- 💵 **Read money & bills** — *"how much do I have?"* counts the banknotes; point at an electricity bill and it reads the **amount and due date**.
-- 💊 **Read medicine** — reads the dose and instructions on a box, exactly as written.
-- 📡 **Live mode** — double-tap (or say *"live mode"*): Iris describes the scene once, then **sparingly** announces *new* things that enter — without chattering.
 ## How to use it
 - **Tap** anywhere → describe what's in front of you.
@@ -44,28 +44,28 @@ the button**, and in live mode Iris simply listens.
 - **Double-tap** → toggle live mode (hands-free listening + new-thing alerts). Say *"stop"* to turn it off.
 - First run: **choose your language by voice** ("say your language"). Language & accessibility toggles sit in the top corners.
-## Accessibility — the starting point, not a coat of paint
-Iris was designed for a blind user **first**, so the whole interface gets out of the way:
-- **The whole screen is one button** — tap to describe, hold to ask, double-tap for live mode. No precise targets to find, no menus to navigate.
-- **It talks first.** Spoken onboarding on the first interaction, and you **choose your language by voice**.
-- **Hands-free** continuous listening in live mode — once it's on, no buttons at all.
-- **For low vision too:** big, clearly *labelled* buttons (real SVG icons, not emoji) and a **high-contrast / larger-text** toggle.
-- **Built to standard:** keyboard focus rings, ARIA live regions, haptic feedback, and it honours the OS *reduced-motion* and *more-contrast* settings.
-## How it works — small models only, ≤ 32B total
 | Stage | Model | Params |
 |---|---|---|
 | Speech-to-text | Whisper small (faster-whisper) | ~0.24B |
 | Vision-language | **Qwen3-VL-2B-Instruct** | ~2B |
 | Text-to-speech | Piper (pt_BR / en_US) | <1B |
-**≈ 2.5B total** — comfortably under 32B, and **every model is ≤ 4B** (Tiny Titan).
-Custom voice-first frontend via **`gr.Server`** (Off-Brand). Inference runs in-Space
-on **ZeroGPU** — no third-party model APIs.
-## Not a mobility aid
-Iris describes the environment and reads text. It is **not** an obstacle-avoidance or
-navigation device and must not be relied on for physical safety.
 ## Run locally
 ```bash

   - sharing-is-caring
 ---
+# 👁️ Iris: your father's eyes, by voice
 > Built for the **Build Small Hackathon** · **Backyard AI** track · for my father, who is blind.
 **Demo video:** _‹add link›_ · **Social post:** _‹add link›_
 Iris is a voice-first assistant for blind and low-vision people. Open it on a phone,
+point the camera, and it tells you what's around you, out loud, in your language.
+**The whole screen is the button**, so there's nothing small to aim for. In live
+mode it just listens and answers.
 ## What it does
+- 👁️ **Describe**: tap anywhere. *"A table ahead with a mug on the right."*
+- 🎤 **Ask, hands-free**: in live mode, just speak. *"What color is this shirt?"*, *"read this label"*, *"is anyone here?"*
+- 💵 **Read money and bills**: *"how much do I have?"* counts the banknotes. Point at an electricity bill and it reads the **amount and due date**.
+- 💊 **Read medicine**: reads the dose and instructions on a box, exactly as written.
+- 📡 **Live mode**: double-tap, or say *"live mode"*. Iris describes the scene once, then speaks up only when something new comes into view.
 ## How to use it
 - **Tap** anywhere → describe what's in front of you.
 - **Double-tap** → toggle live mode (hands-free listening + new-thing alerts). Say *"stop"* to turn it off.
 - First run: **choose your language by voice** ("say your language"). Language & accessibility toggles sit in the top corners.
+## Built for a blind user first
+Accessibility shaped the whole interface, because the person it was made for asked for it:
+- **The whole screen is one button.** Tap to describe, hold to ask, double-tap for live mode. Nothing small to find, no menus.
+- **It talks first.** A spoken welcome on the first tap, and you **choose your language by voice**.
+- **Hands-free.** In live mode it listens continuously, so there are no buttons to press.
+- **For low vision too:** large buttons with clear labels and real SVG icons, plus a **high-contrast and larger-text** mode.
+- **Standards:** keyboard focus rings, ARIA live regions, haptic feedback, and it honours the system's reduced-motion and contrast settings.
+## How it works: small models only, ≤ 32B total
 | Stage | Model | Params |
 |---|---|---|
 | Speech-to-text | Whisper small (faster-whisper) | ~0.24B |
 | Vision-language | **Qwen3-VL-2B-Instruct** | ~2B |
 | Text-to-speech | Piper (pt_BR / en_US) | <1B |
+**About 2.5B total**, well under 32B, and **every model is ≤ 4B** (Tiny Titan). The
+voice-first frontend is custom, built on **`gr.Server`** (Off-Brand). Inference runs
+in the Space on **ZeroGPU**, with no third-party model APIs.
+## Safety
+Iris describes surroundings and reads text. Don't use it to get around or avoid
+obstacles. It can't judge distance reliably and isn't safe to walk by.
 ## Run locally
 ```bash

app.py CHANGED

@@ -31,7 +31,7 @@ def _path(f):
     return f["path"] if isinstance(f, dict) else f.path
-# voice commands (live mode on/off) — robust to transcription errors (e.g. "modo ao fio")
 _OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                      r"silenci\w*|quieto|chega|modo manual|manual)\b")

     return f["path"] if isinstance(f, dict) else f.path
+# voice commands (live mode on/off), tolerant of transcription errors (e.g. "modo ao fio")
 _OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                      r"silenci\w*|quieto|chega|modo manual|manual)\b")
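
The hunk only touches the comment, but the pattern under it is easier to read with a quick check. Below is a minimal sketch of how `_OFF_RE` might be applied to a transcript. The helper name `is_off_command` and the lowercasing step are assumptions for illustration, not code from `app.py`; the diff does not show how the app normalizes text before matching.

```python
import re

# Pattern copied verbatim from the app.py hunk above.
_OFF_RE = re.compile(r"\b(pare|parar|para de|deslig\w*|cancela\w*|stop|turn off|"
                     r"silenci\w*|quieto|chega|modo manual|manual)\b")


def is_off_command(transcript: str) -> bool:
    """Hypothetical helper: True if the transcript contains an "off" phrase."""
    # The pattern is lowercase and compiled without re.IGNORECASE, so this
    # sketch lowercases the transcript first (an assumption about the app).
    return _OFF_RE.search(transcript.lower()) is not None


print(is_off_command("Stop"))                       # True
print(is_off_command("pode desligar agora"))        # True ("deslig\w*" matches)
print(is_off_command("what color is this shirt?"))  # False
```

The `\w*` suffixes let one stem cover several verb forms ("desliga", "desligar", "desligue"). Since the pattern uses `search`, an off word anywhere in a longer sentence would also count as a match under this sketch.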