---
title: Aileen 3 Core - Information Foraging MCP Server
emoji: 👩🏻‍💼
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: cc-by-4.0
short_description: Use priors to surface novel insights in noisy communications
tags:
  - building-mcp-track-enterprise
  - building-mcp-track-customer
  - anthropic
  - gemini
  - elevenlabs
---

# Aileen 3 Core: Information Foraging MCP Server
*HF Space · Demo video · LinkedIn post*
> **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank

## ♨️ Problem: The Noise-Signal Ratio

Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts. Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know: it treats every sentence as equally important.

## ✅ Solution: Expectation-Driven Analysis

**Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**. Instead of asking "Summarize this video," Aileen 3 Core lets users task a Large Language Model with: *"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."* As part of a larger agentic AI system, the prior knowledge can even be derived from a memory bank.

### 💪 Key Capabilities

* **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
* **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., confusing the German treaty "NOOTS" for "emergency state") by feeding media metadata as priors to the model.
* **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
* **🔌 Universal MCP Support:** Works with **Claude Desktop** or any custom agent.

---

## 🏗️ Architecture

Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.

## 🚀 Quick Start: Claude Desktop

Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.

1.
**Install:**

   ```bash
   # Clone and install dependencies
   pip install -e ./mcp
   ```

2. Obtain a Google Gemini API key: [Google AI Studio](https://aistudio.google.com)

3. **Configure `claude_desktop_config.json`:** The Gemini API key is read from the environment, so it can also be set here:

   ```json
   {
     "mcpServers": {
       "aileen3-mcp": {
         "command": "python",
         "args": ["-m", "aileen3_mcp.server"],
         "env": {
           "GEMINI_API_KEY": "AI..."
         }
       }
     }
   }
   ```

4. Restart Claude.

5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:

   > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.

*Screenshot of Claude Desktop: slide translation with Aileen 3 Core*

### 🔍 Debugging

The message exchange and the Claude-facing error messages can be read from the Claude log files:

```
tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
```

## 🧪 The Gradio Space (Interactive Demo)

We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step by step:

1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
2. **Hallucination Check:** Demonstrates how lack of context leads to speech recognition errors.
3. **Context-Biased Transcription:** Fixes these errors by establishing priors.
4. **Expectation-Driven Analysis:** The core engine in action.
5. **Slide Translation:** Extracting and localizing visual assets.

[**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)

## 📘 MCP server overview

The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.
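Those external dependencies can be checked up front with a few lines of standard-library Python — a minimal sketch (the `check_prerequisites` helper is illustrative, not part of the package):

```python
import os
import shutil

def check_prerequisites() -> dict[str, bool]:
    """Report whether Aileen 3 Core's external dependencies are available."""
    report = {tool: shutil.which(tool) is not None
              for tool in ("ffmpeg", "deno", "yt-dlp")}
    # The Gemini key only needs to be present; validity is checked at call time.
    report["GEMINI_API_KEY"] = bool(os.environ.get("GEMINI_API_KEY"))
    return report

missing = [name for name, ok in check_prerequisites().items() if not ok]
if missing:
    print("Missing prerequisites:", ", ".join(missing))
```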
Environment prerequisites:

- `GEMINI_API_KEY` set to a valid Gemini API key
- `ffmpeg`, `deno`, `yt-dlp` installed and on `PATH`

Optional configuration:

- `AILEEN3_ANALYSIS_MODEL` to override the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` is recommended for accuracy).
- `AILEEN3_CACHE_DIR` to change the base cache directory (default: `~/.cache/aileen3`).
- `AILEEN3_DEBUG=1` to enable additional debug artefacts on disk.

### ⭐️ Example client integration

The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:

- `command`: `sys.executable`
- `args`: `["-m", "aileen3_mcp.server"]`
- `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
- `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs beyond the 30-second default

When integrating this MCP server into your own agent or client:

- Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
- Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.

### 🛠️ MCP tools overview

All tools are registered in `aileen3_mcp.server.make_app` and exposed via a stdio MCP server for use by the Gradio demo, Claude Desktop, and other clients.
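A minimal Python client for this server can be sketched with the official MCP Python SDK (`pip install mcp`); the SDK imports are done inside the function so the sketch can be loaded even without the SDK installed:

```python
import asyncio
import os

async def run_health_check():
    """Spawn the stdio server and call its `health` tool once (sketch)."""
    # Imported lazily so this sketch stays loadable without the SDK present.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(
        command="python",
        args=["-m", "aileen3_mcp.server"],
        env={"GEMINI_API_KEY": os.environ.get("GEMINI_API_KEY", "")},
    )
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            return await session.call_tool("health", {})

# asyncio.run(run_health_check())  # requires the SDK and aileen3_mcp installed
```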
In short, the public tools are:

- `health`
- `search_youtube`
- `start_media_retrieval` / `get_media_retrieval_status`
- `start_slide_extraction` / `get_extracted_slides`
- `translate_slide`
- `start_media_analysis` / `get_media_analysis_result`
- `start_media_transcription` / `get_media_transcription_result`

These tools are designed to be called from an agentic AI system that:

- first chooses a media `source` (optionally using `search_youtube`)
- then calls `start_media_retrieval`
- and finally uses the `reference` token to drive analysis, transcription, or slide translation.

For detailed tool contracts (arguments, return payloads, and error shapes), see `mcp/README.md`.

## 🏆 Hackathon Context & Journey

Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)). The required supplements - HuggingFace Space, demo video, and social media post - are behind the badges at the very top of this page.

## 📦 Local Development

```bash
# Build the Docker image
docker build -t aileen3-core .

# Run the Gradio interface
docker run -it -p 7860:7860 aileen3-core
```

## 🛡️ Security & privacy

- Your Gemini key is used only server-side to call Gemini models.
- Media is downloaded to a cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
- No analytics or third-party telemetry included.

## 🚧 Limitations

- `translate_slide` does not currently benefit from priors; translation quality could be improved that way.
- No AI safety guardrails (tone, style, anti-prompt-injection, ...) included.
- No cost control included.
- Hallucination risk - Aileen may make mistakes.
- Remote MCP operating mode not tested; it would rely on external access protection.

## 👾 Troubleshooting

- Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
- Long jobs: increase the transport timeout (10–20 min) and leverage `wait_seconds` plus polling the `get_*` tools.
- YouTube access:
  - ensure YouTube is reachable
  - ensure `yt-dlp` is recent
  - if the site's JS protection breaks, install `yt-dlp-ejs` (see the Space health check).
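The start/poll pattern recommended above can be wrapped in a small client-side helper. A sketch with a stand-in status source — the payload shape with a `state` key is illustrative here; the real return shapes of the `get_*` tools are documented in `mcp/README.md`:

```python
import time

def poll_until_done(get_status, wait_seconds=15, timeout=1200):
    """Call `get_status` until the job leaves the 'running' state or `timeout` elapses.

    `get_status` stands in for an MCP get_* tool call (e.g. get_media_analysis_result)
    invoked with the job's reference token and `wait_seconds`.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(wait_seconds)
        if status.get("state") != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still running after {timeout}s")

# Demo with a fake status source that finishes on the third poll:
calls = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
print(poll_until_done(lambda _w: next(calls), wait_seconds=0, timeout=5))
# → {'state': 'done'}
```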