---
title: Aileen 3 Core - Information Foraging MCP Server
emoji: 👩🏻💼
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: cc-by-4.0
short_description: Use priors to surface novel insights in noisy communications
tags:
  - building-mcp-track-enterprise
  - building-mcp-track-customer
  - anthropic
  - gemini
  - elevenlabs
---
# Aileen 3 Core: Information Foraging MCP Server

<div style="display: flex; justify-content: center; gap: 10px; margin-bottom: 1em">
  <a href="https://ndurner.de/links/aileen3-hf-space">
    <img alt="HF Space" src="https://img.shields.io/badge/HuggingFace-Gradio%206%20Space-yellow?logo=huggingface">
  </a>
  <a href="https://youtu.be/r56najKVS4I">
    <img alt="Demo video" src="https://img.shields.io/badge/YouTube-MCP%20demo%20video-red?logo=youtube">
  </a>
  <a href="https://ndurner.de/links/aileen3-linkedin">
    <img alt="LinkedIn post" src="https://img.shields.io/badge/🔗 LinkedIn-Post-blue?logo=linkedin">
  </a>
</div>
> **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank
## ♨️ Problem: The Noise-Signal Ratio

Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts.

Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know. It treats every sentence as equally important.
## ✅ Solution: Expectation-Driven Analysis

**Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**.

Instead of asking "Summarize this video," Aileen 3 Core lets users task a Large Language Model with:

*"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."*

As part of a larger agentic AI system, the prior knowledge can even be derived from a memory bank.
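The expectation-driven tasking can be sketched as a tiny prompt template. The function and field names below are illustrative assumptions for this README, not the server's internal prompt construction:

```python
def briefing_prompt(known: str, expected: str, gaps: str) -> str:
    """Illustrative prompt shape for an expectation-driven briefing.

    The wording here is an assumption for illustration; the actual prompt
    construction lives inside `aileen3_mcp`.
    """
    return (
        "Here is what I already know:\n" + known + "\n\n"
        "Here is what I expect the speaker to say:\n" + expected + "\n\n"
        "Knowledge gaps I want filled:\n" + gaps + "\n\n"
        "Report only where the talk deviates from this baseline."
    )
```

The key point is that the user's priors travel with the request, so the model can score content by deviation rather than by generic salience.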
### 💪 Key Capabilities

* **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
* **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., confusing the German treaty "NOOTS" with "emergency state") by feeding media metadata as priors to the model.
* **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
* **🔌 Universal MCP Support:** Works with **Claude Desktop** or any custom agent.
---

## 🏗️ Architecture

Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.

<img src="readme-assets/architecture.webp" width="668" alt="Aileen 3 Core architecture diagram">

## 🚀 Quick Start: Claude Desktop

Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.
1. **Install:**
   ```bash
   # Install the MCP server package from a local clone of this repository
   pip install -e ./mcp
   ```
2. **Obtain a Google Gemini API key:** [Google AI Studio](https://aistudio.google.com)
3. **Configure `claude_desktop_config.json`:** The Gemini API key is read from the environment, so it can also be set here:
   ```json
   {
     "mcpServers": {
       "aileen3-mcp": {
         "command": "python",
         "args": ["-m", "aileen3_mcp.server"],
         "env": {
           "GEMINI_API_KEY": "AI..."
         }
       }
     }
   }
   ```
4. **Restart Claude Desktop.**
5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:

   > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.

<img src="readme-assets/claude-slide-translation.webp" width="668" alt="Screenshot of Claude Desktop: slide translation with Aileen 3 Core">
### 🔍 Debugging

The message exchange and Claude-facing error messages can be read from the Claude log files:

```bash
tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
```
## 🧪 The Gradio Space (Interactive Demo)

We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step by step:

1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
2. **Hallucination Check:** Demonstrates how a lack of context leads to speech recognition errors.
3. **Context-Biased Transcription:** Fixes these errors by establishing priors.
4. **Expectation-Driven Analysis:** The core engine in action.
5. **Slide Translation:** Extracting and localizing visual assets.

[**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)
## 📘 MCP server overview

The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.

Environment prerequisites:

- `GEMINI_API_KEY` set to a valid Gemini API key
- `ffmpeg`, `deno`, and `yt-dlp` installed and on `PATH`

Optional configuration:

- `AILEEN3_ANALYSIS_MODEL` overrides the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` is recommended for accuracy).
- `AILEEN3_CACHE_DIR` changes the base cache directory (default: `~/.cache/aileen3`).
- `AILEEN3_DEBUG=1` enables additional debug artefacts on disk.
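Putting the prerequisites and optional variables together, a typical launch environment might look like this (the values are illustrative; adjust them to your setup):

```shell
# Illustrative launch environment for the MCP server.
export GEMINI_API_KEY="AI..."                         # required: AI Studio key
export AILEEN3_ANALYSIS_MODEL="gemini-3-pro-preview"  # optional: favor accuracy
export AILEEN3_CACHE_DIR="$HOME/.cache/aileen3"       # optional: default shown
export AILEEN3_DEBUG=1                                # optional: debug artefacts

# Report any external tools missing from PATH before starting the server:
for tool in ffmpeg deno yt-dlp; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
# then: python -m aileen3_mcp.server
```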
### ⭐️ Example client integration

The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:

- `command`: `sys.executable`
- `args`: `["-m", "aileen3_mcp.server"]`
- `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
- `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs beyond the default of 30 seconds

When integrating this MCP server into your own agent or client:

- Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
- Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.
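In plain Python, the stdio launch specification described above amounts to the following sketch. The helper name and dict shape are illustrative; adapt them to whatever parameter type your MCP client library expects:

```python
import os
import sys


def aileen3_server_params() -> dict:
    """Illustrative stdio launch spec for spawning Aileen 3 Core,
    mirroring the Aileen 3 Agent integration described above."""
    return {
        "command": sys.executable,            # run under the client's interpreter
        "args": ["-m", "aileen3_mcp.server"],
        "env": {                              # forward the key into the MCP process
            "GEMINI_API_KEY": os.environ.get("GEMINI_API_KEY", ""),
        },
        "timeout": 1200,                      # seconds; the 30 s default is too short
    }
```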
### 🛠️ MCP tools overview

All tools are registered in `aileen3_mcp.server.make_app` and exposed via a stdio MCP server for use by the Gradio demo, Claude Desktop, and other clients.

In short, the public tools are:

- `health`
- `search_youtube`
- `start_media_retrieval` / `get_media_retrieval_status`
- `start_slide_extraction` / `get_extracted_slides`
- `translate_slide`
- `start_media_analysis` / `get_media_analysis_result`
- `start_media_transcription` / `get_media_transcription_result`

These tools are designed to be called from an agentic AI system that:

- first chooses a media `source` (optionally using `search_youtube`),
- then calls `start_media_retrieval`,
- and finally uses the `reference` token to drive analysis, transcription, or slide translation.

For detailed tool contracts (arguments, return payloads, and error shapes), see `mcp/README.md`.
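The choreography above can be sketched as a polling loop. Here `call_tool` stands in for whatever invocation primitive your MCP client exposes; the argument and result field names are illustrative assumptions, and the authoritative contracts are in `mcp/README.md`:

```python
import time


def forage(call_tool, source: str, expectations: str) -> dict:
    """Sketch of the retrieval -> analysis flow; payload shapes are assumed."""
    job = call_tool("start_media_retrieval", {"source": source})
    reference = job["reference"]  # token that identifies the cached media
    # Poll until the media is retrieved and cached.
    while call_tool("get_media_retrieval_status",
                    {"reference": reference})["status"] != "done":
        time.sleep(5)
    # Kick off expectation-driven analysis against the cached media.
    call_tool("start_media_analysis",
              {"reference": reference, "expectations": expectations})
    return call_tool("get_media_analysis_result",
                     {"reference": reference, "wait_seconds": 600})
```

The same `reference` token can instead be handed to the transcription or slide-extraction tools, so one retrieval serves several analyses.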
## 🏆 Hackathon Context & Journey

Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) hackathon and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)).

The required supplements - HuggingFace Space, demo video, and social media post - are behind the badges at the very top of this page.
## 📦 Local Development

```bash
# Build the Docker image
docker build -t aileen3-core .

# Run the Gradio interface
docker run -it -p 7860:7860 aileen3-core
```
## 🛡️ Security & privacy

- Your Gemini key is used only server-side to call Gemini models.
- Media is downloaded to a cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
- No analytics or third-party telemetry included.
## 🚧 Limitations

- `translate_slide` does not currently benefit from priors; translation quality could be improved that way.
- No AI safety guardrails (tone, style, anti-prompt-injection, ...) included.
- No cost control included.
- Hallucination risk: Aileen may make mistakes.
- The remote MCP operating mode is untested and would rely on external access protection.
## 👾 Troubleshooting

- Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
- Long jobs: increase the transport timeout (10–20 min) and leverage `wait_seconds` plus polling the `get_*` tools.
- YouTube access:
  * ensure YouTube is reachable
  * ensure `yt-dlp` is recent
  * if the site's JS protection breaks downloads, install `yt-dlp-ejs` (see the Space health check)