---
title: AI Assistant For Visually Impaired
emoji: πŸ’¬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- inference-api
license: mit
short_description: AI-Assistant-for-Visually-Impaired
tags:
- mcp-in-action-track-consumer
- building-mcp-track-consumer
---
# Accessibility Voice Agent β€” MCP Tools
### **Track:** mcp-in-action-track-consumer
### **Team:** Team
### **Authors:** @subhash4face (*Subhash Mankunnu*) and *Athira AR*
Model Context Protocol (MCP) + Gradio 5 + Hugging Face Inference + ElevenLabs
A fully accessible, voice-driven AI assistant demonstrating how MCP tools can enable **speech-to-text**, **image understanding**, and **text-to-speech** workflows for low-vision and visually impaired users.
This project showcases a real-world use case of MCP tools working together inside an agent-style UI.
---
## πŸ”„ Workflow Diagram β€” MCP Tools
![Workflow Diagram](./WorkFlow.png)
## πŸš€ Demo Video
πŸ‘‰ *https://youtu.be/af4Y89g2HPE*
## πŸš€ Social Media Post - LinkedIn
πŸ‘‰ *https://www.linkedin.com/posts/subhashmankunnu_hugginface-share-7400924735989010432-a9sH?utm_source=share&utm_medium=member_desktop&rcm=ACoAAASVxnsB9ojyfy-Kef3IWvBPf4c3pUSOaWw*
---
## 🌟 Key Features
### πŸ”Š Text-to-Speech (TTS) via ElevenLabs
**MCP Tool:** `speak_text`
- Converts any assistant message to natural speech
- Returns base64 audio + WAV playback
- Helps low-vision users receive spoken responses
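A minimal sketch of the tool's shape, with the real ElevenLabs request replaced by a stub (`synthesize` and the fake RIFF bytes are illustrative assumptions, not the app's actual backend call):

```python
import base64

def speak_text(text: str) -> dict:
    """Return spoken audio for `text` as base64 WAV (sketch)."""
    def synthesize(t: str) -> bytes:
        # Placeholder: real code would call the ElevenLabs TTS API here.
        return b"RIFF" + len(t).to_bytes(4, "little") + b"WAVE"

    audio = synthesize(text)
    return {
        "audio_b64": base64.b64encode(audio).decode("ascii"),
        "mime_type": "audio/wav",
    }
```

Returning base64 keeps the tool result JSON-serializable, so any MCP client can relay the audio to a player.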
---
### 🎀 Speech-to-Text (STT) via Whisper / Local fallback
**MCP Tool:** `transcribe_audio`
- OpenAI Whisper STT or local fallback
- Great for hands-free usage
- Tool-call log shows backend + duration
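The fallback-plus-logging behavior can be sketched like this (both backend functions are stand-ins, not the app's real Whisper or local calls):

```python
import time

def _whisper_api(path: str) -> str:
    # Stand-in for the OpenAI Whisper call; fails when no API key is set.
    raise RuntimeError("OPENAI_API_KEY not configured")

def _local_stt(path: str) -> str:
    # Stand-in for a local speech-to-text model.
    return f"(local transcript of {path})"

def transcribe_audio(audio_path: str) -> dict:
    """Try Whisper first, fall back to local STT, and report which
    backend answered plus how long it took (for the tool-call log)."""
    for name, backend in [("whisper-api", _whisper_api), ("local", _local_stt)]:
        start = time.perf_counter()
        try:
            text = backend(audio_path)
        except Exception:
            continue  # this backend failed; try the next one
        return {"text": text, "backend": name,
                "duration_s": round(time.perf_counter() - start, 3)}
    raise RuntimeError("all STT backends failed")
```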
---
### πŸ–Ό Image Description via OpenAI / Gemini / HF Inference
**MCP Tool:** `describe_image`
- Multimodal accessibility
- Describes any uploaded image in plain language
- Uses the Hugging Face Inference API instead of a local BLIP model
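A sketch of the tool's interface: in the real app the captioner would wrap a Hugging Face Inference API image-captioning call, while here a stub caption is injected so the plain-language framing is the focus:

```python
def describe_image(image_path: str, captioner=None) -> str:
    """Turn an image into a plain-language description (sketch).

    `captioner` is any callable mapping an image path to a caption;
    the default below is a hard-coded stand-in, not a real model.
    """
    if captioner is None:
        captioner = lambda path: "a red bicycle leaning against a brick wall"
    caption = captioner(image_path).strip().rstrip(".")
    return f"The image shows {caption}."
```

Injecting the captioner keeps the tool testable offline and makes swapping OpenAI, Gemini, or HF backends a one-line change.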
---
### 🧩 Fully MCP-powered
Every capability is wrapped as an MCP tool, making this app a template for:
- Agents
- Assistive technologies
- Multimodal accessibility apps
- Voice-driven workflows
- Cross-backend tool orchestration
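The wrapping pattern above can be pictured as a name-to-function registry that an agent dispatches against (an illustrative sketch, not the actual MCP SDK wiring):

```python
TOOLS: dict = {}

def mcp_tool(fn):
    """Register a function under its name so an agent can call it."""
    TOOLS[fn.__name__] = fn
    return fn

@mcp_tool
def speak_text(text: str) -> str:
    return f"<audio for: {text!r}>"

@mcp_tool
def describe_image(path: str) -> str:
    return f"<description of: {path!r}>"

# An agent dispatches a tool call by name:
reply = TOOLS["describe_image"]("door.jpg")
```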
---
## πŸ’‘ Real Use Case: Accessibility
Designed for:
- Low-vision users
- Voice-interface users
- Anyone needing automated image descriptions
- Hands-free workflows
- Assistive technology research
---
## πŸ›  Tech Stack
- MCP Server (Python)
- Gradio 5
- OpenAI Whisper (STT)
- ElevenLabs (TTS)
- Gemini Vision (optional)
- Hugging Face Inference API (image captioning)
- Python
---
## 🏁 How to Run Locally
```bash
pip install -r requirements.txt
python app.py
```