title: Atlas
emoji: 🌖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: ATLAS - Gradio x HuggingFace Hackathon
tags:
  - mcp-in-action-track-enterprise
  - mcp-in-action-track-consumer

ATLAS

Important

  1. Watch the ATLAS video overview here: [YouTube](https://youtu.be/-nn9mkU5jqk)
  2. ATLAS works entirely through mock MCP tools - no external dependencies required. Just clone and run.
  3. Social media link: [LinkedIn](https://www.linkedin.com/posts/andrei-d-zamfir_atlas-demo-overview-gradio-x-mcp-hackathon-activity-7401038354537951232-Nylu?utm_source=share&utm_medium=member_desktop&rcm=ACoAACmwO_QBj6ltvKCp4p2M88UmBqnVqE7jwxM)

Overview

ATLAS is a multimodal AI work companion built for the Gradio x MCP Hackathon. It demonstrates how a voice-driven assistant can augment knowledge work by:

  • Listening to your requests through voice (STT)
  • Speaking responses and updates (TTS)
  • Seeing your screen to understand context (vision)
  • Acting on your behalf through MCP tool integrations

The goal is to showcase how modern LLMs can be integrated into daily workflows to handle context retrieval, document analysis, and environment automation, all through natural conversation.

Key Goals

  1. Multimodal Work Companion

    • Voice: hands-free interaction during calls/meetings
    • Vision: screen analysis for real-time context
    • Text: conversational interface with persistent context
  2. Practical Automation

    • Email context absorption
    • Customer data retrieval
    • Document lookup and analysis
    • Environment automation (API permissions, integrations)
  3. Proof-of-Concept (POC)

    • Simple RAG without database infrastructure
    • Mock MCP tools for easy setup
    • Adaptable to any office workflow

Functionalities & Offerings

1. Audio Service

  • STT: Converts voice input to text for hands-free operation
  • TTS: Speaks AI responses for natural conversation flow

2. Text (LLM) Service

  • Built on modern LLM APIs
  • Handles multi-turn conversation with context retention
  • Tool-calling orchestration for MCP integration
  • Dynamic prompt engineering for context-aware responses
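
Tool-calling orchestration can be pictured as a small dispatcher: the LLM emits a tool name plus JSON arguments, and the app routes that call to a registered Python function. The sketch below is only an illustration of that idea, not the ATLAS implementation. The names `TOOLS`, `register_tool`, `dispatch_tool_call`, the `absorb_email` tool, and the `{"name": ..., "arguments": ...}` call shape are all assumptions made for the example.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the dispatcher can find it."""
    TOOLS[func.__name__] = func
    return func


@register_tool
def absorb_email(sender: str, subject: str) -> dict:
    """Mock tool: returns a canned context summary instead of reading a mailbox."""
    return {"sender": sender, "subject": subject, "summary": "mock summary"}


def dispatch_tool_call(tool_call_json: str) -> str:
    """Run one tool call of the assumed form {"name": ..., "arguments": {...}}.

    Returns a JSON string that could be appended to the conversation
    as the tool result.
    """
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:  # wrong or missing arguments from the model
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})
```

Returning errors as JSON instead of raising lets the model see what went wrong and retry with corrected arguments, which is one reasonable design choice for a voice-driven loop.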

3. Vision Service

  • Screen capture analysis for understanding user context
  • Document reading and interpretation
  • Visual feedback integration into conversation flow

4. MCP Integration

  • Customer Data Tools: Retrieve CRM information on demand
  • Document Retrieval: Simple RAG implementation without database
  • Environment Automation: API permission management, integration testing
  • Email Processing: Context absorption and response generation
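
To make "mock MCP tools" concrete, here is a minimal sketch of what a mock customer-data tool could look like: an MCP-style descriptor (name, description, JSON Schema for inputs) paired with a handler that reads from an in-memory dictionary. The records, the `get_customer` name, and the `company` parameter are invented for illustration and are not taken from the ATLAS code.

```python
from typing import Dict, Optional

# Canned records standing in for a real CRM (all values are made up).
MOCK_CRM: Dict[str, dict] = {
    "acme": {"name": "Acme Inc.", "tier": "enterprise", "owner": "J. Doe"},
    "globex": {"name": "Globex", "tier": "starter", "owner": "A. Smith"},
}

# MCP-style tool descriptor: a name, a description, and a JSON Schema for inputs.
GET_CUSTOMER_TOOL = {
    "name": "get_customer",
    "description": "Look up a customer record by company name (mock data).",
    "inputSchema": {
        "type": "object",
        "properties": {"company": {"type": "string"}},
        "required": ["company"],
    },
}


def get_customer(company: str) -> dict:
    """Return the mock record for `company`, matched case-insensitively."""
    record: Optional[dict] = MOCK_CRM.get(company.strip().lower())
    if record is None:
        return {"found": False, "company": company}
    return {"found": True, **record}
```

Because the handler only touches a dictionary, it runs with no network access or credentials; swapping in a real CRM would mean replacing the body of `get_customer` while keeping the descriptor unchanged.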

Demo Scenario

The hackathon demo showcases a realistic CSM/sales rep workflow:

  1. Email arrives → ATLAS reads and absorbs context using vision
  2. Customer data needed → Retrieves from mock CRM
  3. Documents requested → Pulls relevant customer files
  4. API call fails (401) → User encounters auth error in Postman
  5. ATLAS fixes it → Updates access permissions automatically
  6. Verification → API call succeeds
  7. Response draft → Generates email reply based on full context

All through natural voice conversation.
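
Steps 4 to 6 of the scenario (a 401, a permission update, a successful retry) can be simulated end to end with a few lines of mock code. This is a hedged sketch under assumed names: `PERMISSIONS`, `call_api`, `grant_scope`, `fix_and_verify`, the key `demo-key`, and the scope strings are hypothetical and do not reflect how ATLAS stores or updates permissions.

```python
from typing import Dict, Set

# Hypothetical in-memory permission store: API key -> granted scopes.
PERMISSIONS: Dict[str, Set[str]] = {"demo-key": {"read:customers"}}


def call_api(api_key: str, required_scope: str) -> int:
    """Mock endpoint: HTTP-style 200 if the key has the scope, else 401."""
    return 200 if required_scope in PERMISSIONS.get(api_key, set()) else 401


def grant_scope(api_key: str, scope: str) -> None:
    """Mock automation tool: add `scope` to the key's permissions."""
    PERMISSIONS.setdefault(api_key, set()).add(scope)


def fix_and_verify(api_key: str, scope: str) -> Dict[str, int]:
    """Reproduce the failure, apply the fix if needed, then re-run the call."""
    before = call_api(api_key, scope)
    if before == 401:
        grant_scope(api_key, scope)
    after = call_api(api_key, scope)
    return {"before": before, "after": after}
```

Calling `fix_and_verify("demo-key", "write:orders")` on a fresh store returns `{"before": 401, "after": 200}`, mirroring the fail, fix, verify sequence in the demo.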

Tech Stack

| Component | Technology |
| --- | --- |
| UI Framework | Gradio 6 |
| LLM | HuggingFace/Nebius APIs |
| STT | Speech-to-text model: Whisper |
| TTS | Text-to-speech model: Kokoro |
| Vision | Vision language model: Gemma |
| Tool Integration | MCP (Model Context Protocol) |
| RAG | Simple document retrieval (no vector DB) |

Quickstart

  1. Install dependencies:

       pip install -r requirements.txt

  2. Configure .env with your API keys.

  3. Launch the Gradio app:

       python app.py

  4. Interact by voice or text:
    • Click "Record" to begin voice interaction
    • Ask ATLAS to retrieve customer data or pull documents
    • Share screen for visual context
    • Request environment automations (API permissions, etc.)

Adaptability

While built for CSM/sales rep workflows, ATLAS adapts to any office role:

  • Support Engineers: Ticket context + documentation retrieval + environment automation
  • Account Managers: Client data + document analysis + meeting prep
  • Project Managers: Task context + resource lookup + status updates
  • Developers: API testing + documentation + environment management

Simply swap the MCP tools to match your workflow.

Architecture

ATLAS uses a simple but effective architecture:

  1. Gradio UI → User interaction layer (voice/text/vision)
  2. LLM Core → Reasoning and orchestration
  3. MCP Tools → Lightweight integrations (no heavy infra)
  4. Simple RAG → Document retrieval without vector databases

The design favors clarity and practical value over architectural complexity.
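
One way to do document retrieval without a vector database is plain keyword overlap: tokenize the query, count matching terms in each document, and pass the top hits to the LLM as context. The sketch below shows that technique under stated assumptions. The corpus, file names, and the functions `tokenize`, `retrieve`, and `build_prompt` are hypothetical, and the actual ATLAS retrieval logic may differ.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical in-memory corpus: document name -> text.
DOCUMENTS: Dict[str, str] = {
    "contract.txt": "Master service agreement with renewal terms and pricing.",
    "onboarding.txt": "Onboarding checklist covering API keys and integrations.",
    "notes.txt": "Meeting notes about renewal pricing concerns.",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: Dict[str, str], top_k: int = 2) -> List[Tuple[str, int]]:
    """Rank documents by how many query-term occurrences they contain."""
    query_terms = set(tokenize(query))
    scored = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


def build_prompt(query: str, documents: Dict[str, str]) -> str:
    """Prepend the retrieved documents to the question as LLM context."""
    hits = retrieve(query, documents)
    context = "\n".join(f"[{name}] {documents[name]}" for name, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Keyword overlap misses synonyms and paraphrases, so it suits a small, known document set like a demo corpus; a larger deployment would likely need embeddings or a proper search index.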

Contact

a.zamfir@hotmail.com
LinkedIn: Andrei Zamfir https://www.linkedin.com/in/andrei-d-zamfir/