Spaces:
Running
A newer version of the Gradio SDK is available:
6.1.0
title: Atlas
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: ATLAS - Gradio x HuggingFace Hackathon
tags:
- mcp-in-action-track-enterprise
- mcp-in-action-track-consumer
ATLAS
Important
- Watch ATLAS' video overview here: [Youtube(https://youtu.be/-nn9mkU5jqk)]
- ATLAS works entirely through mock MCP tools - no external dependencies required. Just clone and run.
- Social media link: [LinkedIn(https://www.linkedin.com/posts/andrei-d-zamfir_atlas-demo-overview-gradio-x-mcp-hackathon-activity-7401038354537951232-Nylu?utm_source=share&utm_medium=member_desktop&rcm=ACoAACmwO_QBj6ltvKCp4p2M88UmBqnVqE7jwxM)]
Overview
ATLAS is a multimodal AI work companion built for the Gradio x MCP Hackathon. It demonstrates how a voice-driven assistant can augment knowledge work by:
- Listening to your requests through voice (STT)
- Speaking responses and updates (TTS)
- Seeing your screen to understand context (vision)
- Acting on your behalf through MCP tool integrations
The goal is to showcase how modern LLMs can be integrated into daily workflows to handle context retrieval, document analysis, and environment automation, all through natural conversation.
Key Goals
Multimodal Work Companion
- Voice: hands-free interaction during calls/meetings
- Vision: screen analysis for real-time context
- Text: conversational interface with persistent context
Practical Automation
- Email context absorption
- Customer data retrieval
- Document lookup and analysis
- Environment automation (API permissions, integrations)
Proof-of-Concept (POC)
- Simple RAG without database infrastructure
- Mock MCP tools for easy setup
- Adaptable to any office workflow
Functionalities & Offerings
1. Audio Service
- STT: Converts voice input to text for hands-free operation
- TTS: Speaks AI responses for natural conversation flow
2. Text (LLM) Service
- Built on modern LLM APIs
- Handles multi-turn conversation with context retention
- Tool-calling orchestration for MCP integration
- Dynamic prompt engineering for context-aware responses
3. Vision Service
- Screen capture analysis for understanding user context
- Document reading and interpretation
- Visual feedback integration into conversation flow
4. MCP Integration
- Customer Data Tools: Retrieve CRM information on demand
- Document Retrieval: Simple RAG implementation without database
- Environment Automation: API permission management, integration testing
- Email Processing: Context absorption and response generation
Demo Scenario
The hackathon demo showcases a realistic CSM/sales rep workflow:
- Email arrives β ATLAS reads and absorbs context using vision
- Customer data needed β Retrieves from mock CRM
- Documents requested β Pulls relevant customer files
- API call fails (401) β User encounters auth error in Postman
- ATLAS fixes it β Updates access permissions automatically
- Verification β API call succeeds
- Response draft β Generates email reply based on full context
All through natural voice conversation.
Tech Stack
| Component | Technology |
|---|---|
| UI Framework | Gradio 6 |
| LLM | HuggingFace/Nebius APIs |
| STT | Speech-to-text model: Whisper |
| TTS | Text-to-speech model: Kokoro |
| Vision | Vision language model: Gemma |
| Tool Integration | MCP (Model Context Protocol) |
| RAG | Simple document retrieval (no vector DB) |
Quickstart
- Install dependencies:
pip install -r requirements.txt
Configure
.envwith your API keys.Launch the Gradio app:
python app.py
- Interact by voice or text:
- Click "Record" to begin voice interaction
- Ask ATLAS to retrieve customer data, or pull documents
- Share screen for visual context
- Request environment automations (API permissions, etc.)
Adaptability
While built for CSM/sales rep workflows, ATLAS adapts to any office role:
- Support Engineers: Ticket context + documentation retrieval + environment automation
- Account Managers: Client data + document analysis + meeting prep
- Project Managers: Task context + resource lookup + status updates
- Developers: API testing + documentation + environment management
Simply swap the MCP tools to match your workflow.
Architecture
ATLAS uses a simple but effective architecture:
- Gradio UI β User interaction layer (voice/text/vision)
- LLM Core β Reasoning and orchestration
- MCP Tools β Lightweight integrations (no heavy infra)
- Simple RAG β Document retrieval without vector databases
Focus on clarity and practical value over architectural complexity.
Contact
a.zamfir@hotmail.com
LinkedIn: Andrei Zamfir https://www.linkedin.com/in/andrei-d-zamfir/