---
title: Aileen 3 Core - Information Foraging MCP Server
emoji: 👩🏻‍💼
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: cc-by-4.0
short_description: Use priors to surface novel insights in noisy communications
tags:
  - building-mcp-track-enterprise
  - building-mcp-track-customer
  - anthropic
  - gemini
  - elevenlabs
---

# Aileen 3 Core: Information Foraging MCP Server
*HF Space · Demo video · LinkedIn post*
> **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank

## ♨️ Problem: The Noise-Signal Ratio

Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts. Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know: it treats every sentence as equally important.

## ✅ Solution: Expectation-Driven Analysis

**Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**. Instead of asking "Summarize this video," Aileen 3 Core lets users task a Large Language Model with: *"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."* As part of a larger agentic AI system, the prior knowledge can even be derived from a memory bank.

### 💪 Key Capabilities

* **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
* **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., confusing the German treaty "NOOTS" for "emergency state") by feeding media metadata as priors to the model.
* **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
* **🔌 Universal MCP Support:** Works with **Claude Desktop** or any custom agent.

---

## 🏗️ Architecture

Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.

## 🚀 Quick Start: Claude Desktop

Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.

1.
**Install:**

   ```bash
   # Clone and install dependencies
   pip install -e ./mcp
   ```

2. Obtain a Google Gemini API key: [Google AI Studio](https://aistudio.google.com)

3. **Configure `claude_desktop_config.json`:** The Gemini API key is read from the environment, so it can also be set here:

   ```json
   {
     "mcpServers": {
       "aileen3-mcp": {
         "command": "python",
         "args": ["-m", "aileen3_mcp.server"],
         "env": {
           "GEMINI_API_KEY": "AI..."
         }
       }
     }
   }
   ```

4. Restart Claude.

5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:

   > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.

*Screenshot of Claude Desktop: slide translation with Aileen 3 Core*

### 🔍 Debugging

The message exchange and the Claude-facing error messages can be read from the Claude log files:

```
tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
```

## 🧪 The Gradio Space (Interactive Demo)

We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step by step:

1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
2. **Hallucination Check:** Demonstrates how lack of context leads to speech recognition errors.
3. **Context-Biased Transcription:** Fixes these errors by establishing priors.
4. **Expectation-Driven Analysis:** The core engine in action.
5. **Slide Translation:** Extracting and localizing visual assets.

[**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)

## 📘 MCP server overview

The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.
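Those external dependencies can be checked up front with a few lines of standard-library Python — a minimal sketch (the `check_prerequisites` helper is illustrative, not part of the package):

```python
import os
import shutil

def check_prerequisites() -> dict[str, bool]:
    """Report whether Aileen 3 Core's external dependencies are available."""
    report = {tool: shutil.which(tool) is not None
              for tool in ("ffmpeg", "deno", "yt-dlp")}
    # The Gemini key only needs to be present; validity is checked at call time.
    report["GEMINI_API_KEY"] = bool(os.environ.get("GEMINI_API_KEY"))
    return report

missing = [name for name, ok in check_prerequisites().items() if not ok]
if missing:
    print("Missing prerequisites:", ", ".join(missing))
```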
Environment prerequisites:

- `GEMINI_API_KEY` set to a valid Gemini API key
- `ffmpeg`, `deno`, `yt-dlp` installed and on `PATH`

Optional configuration:

- `AILEEN3_ANALYSIS_MODEL` to override the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` is recommended for accuracy).
- `AILEEN3_CACHE_DIR` to change the base cache directory (default: `~/.cache/aileen3`).
- `AILEEN3_DEBUG=1` to enable additional debug artefacts on disk.

### ⭐️ Example client integration

The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:

- `command`: `sys.executable`
- `args`: `["-m", "aileen3_mcp.server"]`
- `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
- `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs beyond the 30-second default

When integrating this MCP server into your own agent or client:

- Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
- Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.

### 🛠️ MCP tools overview

All tools are registered in `aileen3_mcp.server.make_app` and exposed via a stdio MCP server for use by the Gradio demo, Claude Desktop, and other clients.
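A minimal Python client for this server can be sketched with the official MCP Python SDK (`pip install mcp`); the SDK imports are done inside the function so the sketch can be loaded even without the SDK installed:

```python
import asyncio
import os

async def run_health_check():
    """Spawn the stdio server and call its `health` tool once (sketch)."""
    # Imported lazily so this sketch stays loadable without the SDK present.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(
        command="python",
        args=["-m", "aileen3_mcp.server"],
        env={"GEMINI_API_KEY": os.environ.get("GEMINI_API_KEY", "")},
    )
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            return await session.call_tool("health", {})

# asyncio.run(run_health_check())  # requires the SDK and aileen3_mcp installed
```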
In short, the public tools are:

- `health`
- `search_youtube`
- `start_media_retrieval` / `get_media_retrieval_status`
- `start_slide_extraction` / `get_extracted_slides`
- `translate_slide`
- `start_media_analysis` / `get_media_analysis_result`
- `start_media_transcription` / `get_media_transcription_result`

These tools are designed to be called from an agentic AI system that:

- first chooses a media `source` (optionally using `search_youtube`)
- then calls `start_media_retrieval`
- and finally uses the `reference` token to drive analysis, transcription, or slide translation.

For detailed tool contracts (arguments, return payloads, and error shapes), see `mcp/README.md`.

## 🏆 Hackathon Context & Journey

Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)). The required supplements - HuggingFace Space, demo video, and social media post - are behind the badges at the very top of this page.

## 📦 Local Development

```bash
# Build the Docker image
docker build -t aileen3-core .

# Run the Gradio interface
docker run -it -p 7860:7860 aileen3-core
```

## 🛡️ Security & privacy

- Your Gemini key is used only server-side to call Gemini models.
- Media is downloaded to a cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
- No analytics or third-party telemetry included.

## 🚧 Limitations

- `translate_slide` does not currently benefit from priors; translation quality could be improved that way.
- No AI safety guardrails (tone, style, anti-prompt-injection, ...) included.
- No cost control included.
- Hallucination risk - Aileen may make mistakes.
- Remote MCP operating mode not tested; it would rely on external access protection.

## 👾 Troubleshooting

- Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
- Long jobs: increase the transport timeout (10–20 min) and leverage `wait_seconds` plus polling the `get_*` tools.
- YouTube access:
  - ensure YouTube is reachable
  - ensure `yt-dlp` is recent
  - if the site's JS protection breaks, install `yt-dlp-ejs` (see the Space health check).
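The start/poll pattern recommended above can be wrapped in a small client-side helper. A sketch with a stand-in status source — the payload shape with a `state` key is illustrative here; the real return shapes of the `get_*` tools are documented in `mcp/README.md`:

```python
import time

def poll_until_done(get_status, wait_seconds=15, timeout=1200):
    """Call `get_status` until the job leaves the 'running' state or `timeout` elapses.

    `get_status` stands in for an MCP get_* tool call (e.g. get_media_analysis_result)
    invoked with the job's reference token and `wait_seconds`.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(wait_seconds)
        if status.get("state") != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still running after {timeout}s")

# Demo with a fake status source that finishes on the third poll:
calls = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
print(poll_until_done(lambda _w: next(calls), wait_seconds=0, timeout=5))
# → {'state': 'done'}
```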