Spaces:
Running
Running
| title: Atlas | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: gradio | |
| sdk_version: 6.0.1 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: ATLAS - Gradio x HuggingFace Hackathon | |
| tags: | |
| - mcp-in-action-track-enterprise | |
| - mcp-in-action-track-consumer | |
| # ATLAS | |
| ## Important | |
| 1. **Watch** ATLAS' video overview here: [Youtube(https://youtu.be/-nn9mkU5jqk)] | |
| 2. **ATLAS works entirely through mock MCP tools** - no external dependencies required. Just clone and run. | |
| 3. **Social media link:** [LinkedIn(https://www.linkedin.com/posts/andrei-d-zamfir_atlas-demo-overview-gradio-x-mcp-hackathon-activity-7401038354537951232-Nylu?utm_source=share&utm_medium=member_desktop&rcm=ACoAACmwO_QBj6ltvKCp4p2M88UmBqnVqE7jwxM)] | |
| ## Overview | |
| ATLAS is a multimodal AI work companion built for the Gradio x MCP Hackathon. It demonstrates how a voice-driven assistant can augment knowledge work by: | |
| - **Listening** to your requests through voice (STT) | |
| - **Speaking** responses and updates (TTS) | |
| - **Seeing** your screen to understand context (vision) | |
| - **Acting** on your behalf through MCP tool integrations | |
| The goal is to showcase how modern LLMs can be integrated into daily workflows to handle context retrieval, document analysis, and environment automation, all through natural conversation. | |
| ## Key Goals | |
| 1. **Multimodal Work Companion** | |
| - Voice: hands-free interaction during calls/meetings | |
| - Vision: screen analysis for real-time context | |
| - Text: conversational interface with persistent context | |
| 2. **Practical Automation** | |
| - Email context absorption | |
| - Customer data retrieval | |
| - Document lookup and analysis | |
| - Environment automation (API permissions, integrations) | |
| 3. **Proof-of-Concept (POC)** | |
| - Simple RAG without database infrastructure | |
| - Mock MCP tools for easy setup | |
| - Adaptable to any office workflow | |
| ## Functionalities & Offerings | |
| ### 1. Audio Service | |
| - **STT**: Converts voice input to text for hands-free operation | |
| - **TTS**: Speaks AI responses for natural conversation flow | |
| ### 2. Text (LLM) Service | |
| - Built on modern LLM APIs | |
| - Handles multi-turn conversation with context retention | |
| - Tool-calling orchestration for MCP integration | |
| - Dynamic prompt engineering for context-aware responses | |
| ### 3. Vision Service | |
| - Screen capture analysis for understanding user context | |
| - Document reading and interpretation | |
| - Visual feedback integration into conversation flow | |
| ### 4. MCP Integration | |
| - **Customer Data Tools**: Retrieve CRM information on demand | |
| - **Document Retrieval**: Simple RAG implementation without database | |
| - **Environment Automation**: API permission management, integration testing | |
| - **Email Processing**: Context absorption and response generation | |
| ## Demo Scenario | |
| The hackathon demo showcases a realistic CSM/sales rep workflow: | |
| 1. **Email arrives** β ATLAS reads and absorbs context using vision | |
| 2. **Customer data needed** β Retrieves from mock CRM | |
| 3. **Documents requested** β Pulls relevant customer files | |
| 4. **API call fails (401)** β User encounters auth error in Postman | |
| 5. **ATLAS fixes it** β Updates access permissions automatically | |
| 6. **Verification** β API call succeeds | |
| 7. **Response draft** β Generates email reply based on full context | |
| All through natural voice conversation. | |
| ## Tech Stack | |
| | Component | Technology | | |
| |--------------------|--------------------------------------------------| | |
| | UI Framework | Gradio 6 | | |
| | LLM | HuggingFace/Nebius APIs | | |
| | STT | Speech-to-text model: Whisper | | |
| | TTS | Text-to-speech model: Kokoro | | |
| | Vision | Vision language model: Gemma | | |
| | Tool Integration | MCP (Model Context Protocol) | | |
| | RAG | Simple document retrieval (no vector DB) | | |
| ## Quickstart | |
| 1. **Install dependencies**: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 2. **Configure** `.env` with your API keys. | |
| 3. **Launch** the Gradio app: | |
| ```bash | |
| python app.py | |
| ``` | |
| 4. **Interact** by voice or text: | |
| - Click "Record" to begin voice interaction | |
| - Ask ATLAS to retrieve customer data, or pull documents | |
| - Share screen for visual context | |
| - Request environment automations (API permissions, etc.) | |
| ## Adaptability | |
| While built for CSM/sales rep workflows, ATLAS adapts to any office role: | |
| - **Support Engineers**: Ticket context + documentation retrieval + environment automation | |
| - **Account Managers**: Client data + document analysis + meeting prep | |
| - **Project Managers**: Task context + resource lookup + status updates | |
| - **Developers**: API testing + documentation + environment management | |
| Simply swap the MCP tools to match your workflow. | |
| ## Architecture | |
| ATLAS uses a simple but effective architecture: | |
| 1. **Gradio UI** β User interaction layer (voice/text/vision) | |
| 2. **LLM Core** β Reasoning and orchestration | |
| 3. **MCP Tools** β Lightweight integrations (no heavy infra) | |
| 4. **Simple RAG** β Document retrieval without vector databases | |
| Focus on clarity and practical value over architectural complexity. | |
| ## Contact | |
| <a.zamfir@hotmail.com> | |
| LinkedIn: Andrei Zamfir <https://www.linkedin.com/in/andrei-d-zamfir/> | |