Commit: rewording

Files changed:
- .github/README.md (+236 −42)
- README.md (+238 −44)
- demo/setup_cell.py (+1 −1)
- mcp/README.md (+15 −25)
- mcp/src/aileen3_mcp/media_tools.py (+1 −1)
.github/README.md
CHANGED
# Aileen 3 Core: Information Foraging MCP Server

<div style="display: flex; justify-content: center; gap: 10px; margin-bottom: 1em">
<a href="https://ndurner.de/links/aileen3-hf-space"><img alt="HuggingFace Space Badge" src="https://img.shields.io/badge/Gradio%206-HuggingFace%20Space-yellow?logo=gradio"></img></a>
<a href="https://ndurner.de/links/aileen3-linkedin"><img alt="LinkedIn Post Badge" src="https://img.shields.io/badge/🔗%20LinkedIn-Post-blue?logo=linkedin"></img></a>
<a href="https://ndurner.de/links/aileen3-hf-video"><img alt="MCP Demo Video Badge" src="https://img.shields.io/badge/MCP-Demo%20Video-red?logo=YouTube"></img></a>
🔜<a href="https://ndurner.de/links/aileen3-agent-github"><img alt="Agent Github Badge" src="https://img.shields.io/badge/Agent-Github-lightgray?logo=github"></img></a>
<a href="https://ndurner.de/links/aileen3-kaggle-video"><img alt="Agent Demo Video Badge" src="https://img.shields.io/badge/Agent-Demo%20Video-lightgray?logo=YouTube"></img></a>
<a href="https://ndurner.de/links/aileen3-kaggle-writeup"><img alt="Agent Kaggle Writeup Badge" src="https://img.shields.io/badge/Agent-Writeup-lightgray?logo=kaggle"></img></a>
</div>

> **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank

## ♨️ Problem: The Noise-Signal Ratio
Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts.

Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know. It treats every sentence as equally important.

## ✅ Solution: Expectation-Driven Analysis
**Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**.

Instead of asking "Summarize this video," Aileen 3 Core allows users to task a Large Language Model with:
*"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."* As part of a larger agentic AI system, the prior knowledge can even be derived from a memory bank.

### Key Capabilities
* **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
* **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., mistaking the German treaty "NOOTS" for "emergency state") by feeding media metadata as priors to the model.
* **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
* **🔌 Universal MCP Support:** Works with **Claude Desktop** or any custom agent.

---

## 🏗️ Architecture

Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.

```mermaid
graph LR
    User[User / Agent] -->|Priors & Expectations| MCP[Aileen 3 Core MCP]
    MCP -->|Retrieval| YT[YouTube/Media]
    MCP -->|Visuals| Slides[Slide Extraction]
    MCP -->|Audio| Trans[Transcription]
    MCP -->|Reasoning| Gemini[Google Gemini]

    YT --> Gemini
    Slides --> Gemini
    Trans --> Gemini

    Gemini -->|Briefing: Surprises Only| MCP
    Gemini -->|Localized slides| MCP
    MCP -->|High-Signal Update| User
```

## 🚀 Quick Start: Claude Desktop

Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.

1. **Install:**
   ```bash
   # Clone and install dependencies
   pip install -e ./mcp
   ```

2. Obtain a Google Gemini API key: [Google AI Studio](https://aistudio.google.com)

3. **Configure `claude_desktop_config.json`:** The Gemini API key is read from the environment, so it can also be set here:
   ```json
   {
     "mcpServers": {
       "aileen3-mcp": {
         "command": "python",
         "args": ["-m", "aileen3_mcp.server"],
         "env": {
           "GEMINI_API_KEY": "AI..."
         }
       }
     }
   }
   ```

4. Restart Claude

5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:

> Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.



### 🔍 Debugging
The message exchange and Claude-facing error messages can be read from the Claude log files:
```
tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
```

## 🧪 The Gradio Space (Interactive Demo)

We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step by step:

1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
2. **Hallucination Check:** Demonstrates how lack of context leads to speech recognition errors.
3. **Context-biased Transcription:** Fixes these errors by establishing priors.
4. **Expectation-driven Analysis:** The core engine in action.
5. **Slide Translation:** Extracting and localizing visual assets.

[**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)

## 📘 MCP server overview

The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.

Environment prerequisites:

- `GEMINI_API_KEY` set to a valid Gemini API key
- `ffmpeg` installed and on `PATH`

Optional configuration:

- `AILEEN3_ANALYSIS_MODEL` to override the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` recommended for accuracy).
- `AILEEN3_CACHE_DIR` to change the base cache directory (default: `~/.cache/aileen3`).
- `AILEEN3_DEBUG=1` to enable additional debug artefacts on disk.
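Taken together, these variables make up a minimal launch sketch (the key value is a placeholder; the model choice and cache path just restate the options above):

```shell
# Configure, then start the MCP server over stdio.
export GEMINI_API_KEY="AI..."                          # AI Studio key (placeholder)
export AILEEN3_ANALYSIS_MODEL="gemini-3-pro-preview"   # optional: accuracy over the free-tier default
export AILEEN3_CACHE_DIR="$HOME/.cache/aileen3"        # optional: shown with its default
python -m aileen3_mcp.server
```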

### ⭐️ Example client integration

The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:

- `command`: `sys.executable`
- `args`: `["-m", "aileen3_mcp.server"]`
- `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
- `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs beyond the 30-second default

When integrating this MCP into your own agent or client:

- Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
- Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.
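The `wait_seconds`-plus-polling pattern can be sketched as a small client-side helper. This is a hypothetical sketch, not part of the server: `get_status` stands in for whichever `get_*` tool your client invokes, and is assumed to return the job-envelope dicts documented below.

```python
import time


def poll_until_done(get_status, reference, wait_seconds=55, deadline_s=1200):
    """Poll a get_* tool until its job envelope reports completion.

    `get_status` is any callable taking (reference=..., wait_seconds=...)
    and returning dicts such as {"status": "running"} or
    {"status": "done", ...}, as described in the tool reference.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        result = get_status(reference=reference, wait_seconds=wait_seconds)
        status = result.get("status")
        if status == "done":
            return result
        if result.get("is_error") or status == "not_found":
            raise RuntimeError(str(result.get("detail", status)))
        # "pending" / "running": loop again; wait_seconds already blocked
        # server-side, so no extra client-side sleep is needed.
    raise TimeoutError(f"job {reference!r} did not finish in {deadline_s}s")
```

The generous `deadline_s` mirrors the 1200-second transport timeout used by the Aileen 3 Agent integration above.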

### 🛠️ MCP tools and definitions

#### 🩺 Health and search

- `health() -> { ok, detail, ffmpeg, gemini_api_key }`
  - Purpose: Lightweight health probe mirroring the Gradio demo’s health check. Confirms that `ffmpeg` is callable and `GEMINI_API_KEY` is present.
  - Usage: Call before running longer flows to surface missing runtime dependencies early.

- `search_youtube(query: str, max_results: int = 10) -> { videos: [...] }`
  - Purpose: Fast YouTube search using `yt-dlp` (no downloads).
  - Arguments:
    - `query` (required): Free-form search terms (e.g. `"taler auditor bachelorthesis"`).
    - `max_results` (optional, default `10`, clamped to `1–50`).
  - Returns: `videos` list with `id`, `title`, `webpage_url`, `duration_seconds`, `channel`, `channel_id`.
  - Typical flow: Use from an agent to shortlist candidate videos before picking one `source` for retrieval.
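As an illustration of that shortlisting flow, a hypothetical helper might rank the returned `videos` before retrieval (field names as documented above; the duration cap is an assumption):

```python
def shortlist_sources(videos, max_duration_s=3600):
    """Keep videos under a duration cap, shortest first, and return their
    `webpage_url` values as candidate `source` arguments for retrieval."""
    kept = [v for v in videos if (v.get("duration_seconds") or 0) <= max_duration_s]
    kept.sort(key=lambda v: v.get("duration_seconds") or 0)
    return [v["webpage_url"] for v in kept]
```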

#### 📺 Media retrieval (entry point)

- `start_media_retrieval(source: str, prefer_audio_only: bool = False, wait_seconds: int = 54) -> dict`
  - Purpose: Download long-form media (YouTube, podcasts, HTTP URLs) and normalize basic metadata.
  - Arguments:
    - `source`: YouTube URL/ID, podcast URL, or other `yt-dlp`-supported locator.
    - `prefer_audio_only`: When `true`, prefer audio-first formats; use when visuals are not needed.
    - `wait_seconds`: How long to block before returning; if the job is still running, you get status + reference.
  - Returns:
    - On success: `{ reference, status: "done", metadata: {...}, cached? }`
    - In progress: `{ reference, status: "pending"|"running", progress?, job_id }`
    - On error: `{ is_error: true, status, detail, reference }`
  - Typical flow: This is the first call once you have chosen a `source`. The `reference` token is required for all downstream tools.

- `get_media_retrieval_status(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Poll the retrieval job or fetch cached metadata.
  - Returns:
    - `{ status: "done", reference, metadata }` when cached or finished.
    - `{ status: "pending"|"running", ... }` while in flight.
    - `{ status: "not_found", reference }` if no job or cache exists.

#### 🖼️ Slides: extraction and translation

- `start_slide_extraction(reference: str, wait_seconds: int = 55) -> dict`
  - Purpose: Extract representative slide stills from a downloaded video.
  - Note: Full media analysis (`start_media_analysis`) automatically triggers slide extraction; call this explicitly only if you need slides on their own.
  - Returns: Standard job envelope with `slides` once done, or `status` + `job_id` while running.

- `get_extracted_slides(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Fetch extracted slides or current extraction status.
  - Returns: `{ status: "done", reference, slides: [...] }` on success, otherwise a job status or `{ status: "not_found" }`. Slides include indices that are used by `translate_slide`.

- `translate_slide(reference: str, slide_index: int, language: str) -> ImageContent`
  - Purpose: Translate a single slide image into another language using Gemini image-to-image.
  - Arguments:
    - `reference`: Token from `start_media_retrieval`.
    - `slide_index`: Zero-based index into `get_extracted_slides.slides[].index`.
    - `language`: Target language name (e.g. `"German"`, `"Spanish"`).
  - Returns: `ImageContent` with base64-encoded translated slide image. Responses are cached per `(reference, language, slide_index)`.

#### ⛳️ Expectation-driven analysis

- `start_media_analysis(reference: str, priors: object, wait_seconds: int = 55) -> dict`
  - Purpose: Run expectation-driven analysis over the media’s audio and slides, surfacing *surprises* and *new actors* instead of rehashing everything.
  - Arguments:
    - `reference`: Token produced by `start_media_retrieval`.
    - `priors`: Object with optional string fields:
      - `context`: Scene setting (participants, venue, goal, spelled names).
      - `expectations`: What the user already expects to hear.
      - `prior_knowledge`: What the user already knows from past work.
      - `questions`: Concrete questions to be answered.
  - Important: Only populate `priors` with information coming from the user or trusted tools (e.g. Memory Bank); do not invent priors in the agent.
  - Returns: Same job envelope pattern as retrieval. When `status: "done"`, the payload includes an `analysis` markdown briefing optimised for fast reading.

- `get_media_analysis_result(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Poll for completion or fetch cached analysis for a `reference`.
  - Returns:
    - `status: "done"` with `analysis` text on success.
    - `status: "pending"|"running"` during processing.
    - Errors include `is_error: true`, `detail`, `reference`.
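Assembling the `priors` object from user input can be sketched as a tiny hypothetical builder that drops empty fields, keeping the payload limited to what the user (or a trusted tool) actually supplied:

```python
def build_priors(context="", expectations="", prior_knowledge="", questions=""):
    """Assemble the optional-string `priors` object for start_media_analysis.

    Per the note above, only user- or trusted-tool-supplied text belongs
    here; empty fields are omitted rather than sent as blanks.
    """
    fields = {
        "context": context,
        "expectations": expectations,
        "prior_knowledge": prior_knowledge,
        "questions": questions,
    }
    return {key: value for key, value in fields.items() if value}
```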

#### ✍️ Transcription

- `start_media_transcription(reference: str, context: str = "", prefer_audio_only: bool = False, wait_seconds: int = 55) -> dict`
  - Purpose: Produce a diarized, speaker-labelled transcription of the media’s audio channel.
  - Arguments:
    - `reference`: From `start_media_retrieval`.
    - `context`: Optional grounding text with names, acronyms, or domain hints.
    - `prefer_audio_only`: When `true`, skip slide context for cheaper audio-only runs.
    - `wait_seconds`: Poll window before returning.
  - Returns: Job envelope, with `transcription` once `status: "done"`.

- `get_media_transcription_result(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Retrieve a previously computed transcription or current job status.
  - Returns: Same pattern as `get_media_analysis_result`, but with `transcription` instead of `analysis`.

## 🏆 Hackathon Context & Journey
Aileen 3 Core was built for [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)).

While most agents are passive summarizers, Aileen 3 represents a shift toward **active information foraging**, enabling professionals to filter signal from an ocean of noise.

## 📦 Local Development

```bash
# Build the Docker image
docker build -t aileen3-core .

# Run the Gradio interface
docker run -it -p 7860:7860 aileen3-core
```

## 🛡️ Security & privacy
- Your Gemini key is used only server-side to call Gemini models.
- Media is downloaded to a cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
- No analytics or third-party telemetry included.

## 🚧 Limitations
- `translate_slide` does not currently benefit from priors; translation quality could be improved that way.
- No AI safety guardrails (tone, style, anti prompt-injection, ...)
- No cost control
- Hallucination risk: Aileen may make mistakes.
- Remote MCP operating mode not tested; it would rely on external access protection.

## 👾 Troubleshooting
- Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
- Long jobs: increase the transport timeout (10–20 min) and leverage `wait_seconds` plus polling via the `get_*` tools.
- YouTube access:
  * ensure YouTube is reachable
  * keep `yt-dlp` recent
  * if site JS protection breaks, install `yt-dlp-ejs` (see the Space health check).
README.md
CHANGED
---
title: Aileen 3 Core - Information Foraging MCP Server
emoji: 👩🏻‍💼
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: cc-by-4.0
short_description: Turns 45-minute conference videos into 2-minute "surprise briefs". Expectation-driven, slide-aware, Gemini-powered.
tags:
- building-mcp-track-enterprise
- building-mcp-track-customer
---
| 14 |
|
| 15 |
+
# Aileen 3 Core: Information Foraging MCP Server
|
| 16 |
+
|
| 17 |
+
<div style="display: flex; justify-content: center; gap: 10px; margin-bottom: 1em">
|
| 18 |
+
<a href="https://ndurner.de/links/aileen3-hf-space"><img alt="HuggingFace Space Badge" src="https://img.shields.io/badge/Gradio%206-HuggingFace%20Space-yellow?logo=gradio"></img></a>
|
| 19 |
+
<a href="https://ndurner.de/links/aileen3-linkedin"><img alt="LinkedIn Post Badge" src="https://img.shields.io/badge/🔗%20LinkedIn-Post-blue?logo=linkedin"></img></a>
|
| 20 |
+
<a href="https://ndurner.de/links/aileen3-hf-video"><img alt="MCP Demo Video Badge" src="https://img.shields.io/badge/MCP-Demo%20Video-red?logo=YouTube"></img></a>
|
| 21 |
+
🔜<a href="https://ndurner.de/links/aileen3-agent-github"><img alt="Agent Agent Github Badge" src="https://img.shields.io/badge/Agent-Github-lightgray?logo=github"></img></a>
|
| 22 |
+
<a href="https://ndurner.de/links/aileen3-kaggle-video"><img alt="Agent Demo Video Badge" src="https://img.shields.io/badge/Agent-Demo%20Video-lightgray?logo=YouTube"></img></a>
|
| 23 |
<a href="https://ndurner.de/links/aileen3-kaggle-writeup"><img alt="Agent Kaggle Writeup" src="https://img.shields.io/badge/Agent-Writeup-lightgray?logo=kaggle"></img></a>
|
|
|
|
| 24 |
</div>
|
| 25 |
|
| 26 |
+
> **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank
|
| 27 |
+
|
| 28 |
+
## ♨️ Problem: The Noise-Signal Ratio
|
| 29 |
+
Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts.
|
| 30 |
+
|
| 31 |
+
Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know. It treats every sentence as equally important.
|
| 32 |
+
|
| 33 |
+
## ✅ Solution: Expectation-Driven Analysis
|
| 34 |
+
**Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**.
|
| 35 |
+
|
| 36 |
+
Instead of asking "Summarize this video," Aileen 3 Core allows users to task a Large Language Model with:
|
| 37 |
+
*"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."*. As part of a larger agentic AI system, the prior knowledge can even be derived from a memory bank.
|
| 38 |
+
|
| 39 |
+
|
| 40 |
+
### Key Capabilities
|
| 41 |
+
* **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
|
| 42 |
+
* **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., confusing the German treaty "NOOTS" for "emergency state") by feeding media metadata as priors to the model.
|
| 43 |
+
* **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
|
| 44 |
+
* **🔌 Universal MCP Support:** Works with **Claude Desktop**, or any custom Agent.
|
| 45 |
+
|
| 46 |
+
---
|
| 47 |
+
|
| 48 |
+
## 🏗️ Architecture
|
| 49 |
+
|
| 50 |
+
Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.
|
| 51 |
+
|
| 52 |
+
```mermaid
|
| 53 |
+
graph LR
|
| 54 |
+
User[User / Agent] -->|Priors & Expectations| MCP[Aileen 3 Core MCP]
|
| 55 |
+
MCP -->|Retrieval| YT[YouTube/Media]
|
| 56 |
+
MCP -->|Visuals| Slides[Slide Extraction]
|
| 57 |
+
MCP -->|Audio| Trans[Transcription]
|
| 58 |
+
MCP -->|Reasoning| Gemini[Google Gemini]
|
| 59 |
+
|
| 60 |
+
YT --> Gemini
|
| 61 |
+
Slides --> Gemini
|
| 62 |
+
Trans --> Gemini
|
| 63 |
+
|
| 64 |
+
Gemini -->|Briefing: Surprises Only| MCP
|
| 65 |
+
Gemini -->|Localized slides| MCP
|
| 66 |
+
MCP -->|High-Signal Update| User
|
| 67 |
```
|
| 68 |
+
|
| 69 |
+
## 🚀 Quick Start: Claude Desktop
|
| 70 |
+
|
| 71 |
+
Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.
|
| 72 |
+
|
| 73 |
+
1. **Install:**
|
| 74 |
+
```bash
|
| 75 |
+
# Clone and install dependencies
|
| 76 |
+
pip install -e ./mcp
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
+
2. Obtain a Google Gemini API key: [Google AI Studio](https://aistudio.google.com)
|
| 80 |
+
|
| 81 |
+
3. **Configure `claude_desktop_config.json`:**. The Gemini API key will be read from the environment, so can also be set here:
|
| 82 |
+
```json
|
| 83 |
+
{
|
| 84 |
+
"mcpServers": {
|
| 85 |
+
"aileen3-mcp": {
|
| 86 |
+
"command": "python",
|
| 87 |
+
"args": ["-m", "aileen3_mcp.server"],
|
| 88 |
+
"env": {
|
| 89 |
+
"GEMINI_API_KEY": "AI..."
|
| 90 |
+
}
|
| 91 |
+
}
|
| 92 |
}
|
| 93 |
}
|
| 94 |
+
```
|
|
|
|
|
|
|
| 95 |
4. Restart Claude
|
| 96 |
|
| 97 |
+
5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:
|
|
|
|
| 98 |
> Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.
|
| 99 |
|
| 100 |

|
| 101 |
|
| 102 |
+
### 🔍 Debugging
|
| 103 |
The message exchange and Claude-facing error messages can be read from Claude log files:
|
| 104 |
```
|
| 105 |
tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
|
| 106 |
```
|
| 107 |
|
|
|
|
| 108 |
|
| 109 |
+
## 🧪 The Gradio Space (Interactive Demo)
|
| 110 |
+
|
| 111 |
+
We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step-by-step:
|
| 112 |
+
|
| 113 |
+
1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
|
| 114 |
+
2. **Hallucination Check:** Demonstrates how lack of context leads to speech recognition errors.
|
| 115 |
+
3. **Context-biased Transcription:** Fixes these errors by establishing priors.
|
| 116 |
+
4. **Expectation-driven Analysis:** The core engine in action.
|
| 117 |
+
5. **Slide Translation:** Extracting and localizing visual assets.
|
| 118 |
+
|
| 119 |
+
[**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)
|
| 120 |
+
|
| 121 |
+
## 📘 MCP server overview
|
| 122 |
+
|
| 123 |
+
The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.
|
| 124 |
+
|
| 125 |
+
Environment prerequisites:
|
| 126 |
+
|
| 127 |
+
- `GEMINI_API_KEY` set to a valid Gemini API key
|
| 128 |
+
- `ffmpeg` installed and on `PATH`
|
| 129 |
+
|
| 130 |
+
Optional configuration:
|
| 131 |
+
|
| 132 |
+
- `AILEEN3_ANALYSIS_MODEL` to override the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` recommended for accuracy).
|
| 133 |
+
- `AILEEN3_CACHE_DIR` to change the base cache directory (default: `~/.cache/aileen3`).
|
| 134 |
+
- `AILEEN3_DEBUG=1` to enable additional debug artefacts on disk.
|
| 135 |
+
|
| 136 |
+
### ⭐️ Example client integration
|
| 137 |
+
|
| 138 |
+
The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:
|
| 139 |
+
|
| 140 |
+
- `command`: `sys.executable`
|
| 141 |
+
- `args`: `["-m", "aileen3_mcp.server"]`
|
| 142 |
+
- `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
|
| 143 |
+
- `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs beyond the 30 seconds default
|
| 144 |
+
|
| 145 |
+
When integrating this MCP into your own agent or client:
|
| 146 |
+
|
| 147 |
+
- Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
|
| 148 |
+
- Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.
|
| 149 |
+
|
| 150 |
+
### 🛠️ MCP tools and definitions

#### 🩺 Health and search

- `health() -> { ok, detail, ffmpeg, gemini_api_key }`
  - Purpose: Lightweight health probe mirroring the Gradio demo’s health check. Confirms that `ffmpeg` is callable and `GEMINI_API_KEY` is present.
  - Usage: Call before running longer flows to surface missing runtime dependencies early.

- `search_youtube(query: str, max_results: int = 10) -> { videos: [...] }`
  - Purpose: Fast YouTube search using `yt-dlp` (no downloads).
  - Arguments:
    - `query` (required): Free-form search terms (e.g. `"taler auditor bachelorthesis"`).
    - `max_results` (optional, default `10`, clamped to `1–50`).
  - Returns: `videos` list with `id`, `title`, `webpage_url`, `duration_seconds`, `channel`, `channel_id`.
  - Typical flow: Use from an agent to shortlist candidate videos before picking one `source` for retrieval.
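As an illustration of agent-side use of the `videos` payload, here is a hypothetical shortlist filter. The field names match the documented return shape; the helper itself is not part of aileen3.

```python
def shortlist_videos(videos, max_minutes=60):
    """Hypothetical agent-side filter over `search_youtube` results.

    Keeps videos at most `max_minutes` long and returns (title, url) pairs;
    entries without a known duration are skipped.
    """
    picks = []
    for video in videos:
        duration = video.get("duration_seconds")
        if duration is not None and duration <= max_minutes * 60:
            picks.append((video["title"], video["webpage_url"]))
    return picks
```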
#### 📺 Media retrieval (entry point)

- `start_media_retrieval(source: str, prefer_audio_only: bool = False, wait_seconds: int = 54) -> dict`
  - Purpose: Download long-form media (YouTube, podcasts, HTTP URLs) and normalize basic metadata.
  - Arguments:
    - `source`: YouTube URL/ID, podcast URL, or other `yt-dlp`-supported locator.
    - `prefer_audio_only`: When `true`, prefer audio-first formats; use when visuals are not needed.
    - `wait_seconds`: How long to block before returning; if the job is still running, you get status + reference.
  - Returns:
    - On success: `{ reference, status: "done", metadata: {...}, cached? }`
    - In progress: `{ reference, status: "pending"|"running", progress?, job_id }`
    - On error: `{ is_error: true, status, detail, reference }`
  - Typical flow: This is the first call once you have chosen a `source`. The `reference` token is required for all downstream tools.

- `get_media_retrieval_status(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Poll the retrieval job or fetch cached metadata.
  - Returns:
    - `{ status: "done", reference, metadata }` when cached or finished.
    - `{ status: "pending"|"running", ... }` while in flight.
    - `{ status: "not_found", reference }` if no job or cache exists.
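The start/poll pattern above can be sketched as a small client-side loop. `call_tool(name, **kwargs)` stands in for however your MCP client invokes tools; the tool names and payload fields follow this README.

```python
import time

def wait_for_retrieval(call_tool, source, poll_interval=30, max_wait=1200):
    """Client-side polling sketch for the retrieval job envelope.

    `call_tool` is a placeholder for your MCP client's invoke function.
    Returns the last envelope seen: done, an error, or still pending/running
    once `max_wait` seconds have elapsed.
    """
    result = call_tool("start_media_retrieval", source=source, wait_seconds=54)
    waited = 0
    while result.get("status") in ("pending", "running") and waited < max_wait:
        time.sleep(poll_interval)
        waited += poll_interval
        result = call_tool(
            "get_media_retrieval_status",
            reference=result["reference"],
            wait_seconds=0,
        )
    return result
```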
#### 🖼️ Slides: extraction and translation

- `start_slide_extraction(reference: str, wait_seconds: int = 55) -> dict`
  - Purpose: Extract representative slide stills from a downloaded video.
  - Note: Full media analysis (`start_media_analysis`) automatically triggers slide extraction; call this explicitly only if you need slides on their own.
  - Returns: Standard job envelope with `slides` once done, or `status` + `job_id` while running.

- `get_extracted_slides(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Fetch extracted slides or the current extraction status.
  - Returns: `{ status: "done", reference, slides: [...] }` on success, otherwise a job status or `{ status: "not_found" }`. Slides include indices that are used by `translate_slide`.

- `translate_slide(reference: str, slide_index: int, language: str) -> ImageContent`
  - Purpose: Translate a single slide image into another language using Gemini image-to-image.
  - Arguments:
    - `reference`: Token from `start_media_retrieval`.
    - `slide_index`: Zero-based index into `get_extracted_slides.slides[].index`.
    - `language`: Target language name (e.g. `"German"`, `"Spanish"`).
  - Returns: `ImageContent` with a base64-encoded translated slide image. Responses are cached per `(reference, language, slide_index)`.
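A translated slide comes back as base64 image data. A minimal sketch of persisting it, assuming the payload is accessible as a dict with a `data` field (per the MCP `ImageContent` schema); some clients expose `.data` as an attribute instead, so adapt to your client's representation.

```python
import base64

def save_slide_image(image_payload, path):
    """Decode the base64 `data` field of an ImageContent-style payload to disk.

    Dict-style access is an assumption for this sketch; adjust if your MCP
    client returns a typed object instead of a dict.
    """
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(image_payload["data"]))
    return path
```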
#### ⛳️ Expectation-driven analysis

- `start_media_analysis(reference: str, priors: object, wait_seconds: int = 55) -> dict`
  - Purpose: Run expectation-driven analysis over the media’s audio and slides, surfacing *surprises* and *new actors* instead of rehashing everything.
  - Arguments:
    - `reference`: Token produced by `start_media_retrieval`.
    - `priors`: Object with optional string fields:
      - `context`: Scene setting (participants, venue, goal, spelled names).
      - `expectations`: What the user already expects to hear.
      - `prior_knowledge`: What the user already knows from past work.
      - `questions`: Concrete questions to be answered.
  - Important: Only populate `priors` with information coming from the user or trusted tools (e.g. Memory Bank); do not invent priors in the agent.
  - Returns: Same job envelope pattern as retrieval. When `status: "done"`, the payload includes an `analysis` markdown briefing optimised for fast reading.

- `get_media_analysis_result(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Poll for completion or fetch the cached analysis for a `reference`.
  - Returns:
    - `status: "done"` with `analysis` text on success.
    - `status: "pending"|"running"` during processing.
    - Errors include `is_error: true`, `detail`, `reference`.
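To keep the "only user-supplied priors" constraint explicit in client code, a guard like the following could validate the priors object before calling the tool. This is a hypothetical helper; only the field names come from this README.

```python
ALLOWED_PRIOR_FIELDS = {"context", "expectations", "prior_knowledge", "questions"}

def build_priors(**fields):
    """Hypothetical client-side guard for the `priors` argument.

    Accepts only the documented prior fields and drops empty values, so the
    agent cannot smuggle in invented or malformed priors.
    """
    unknown = set(fields) - ALLOWED_PRIOR_FIELDS
    if unknown:
        raise ValueError(f"unsupported prior fields: {sorted(unknown)}")
    return {key: value for key, value in fields.items() if value}
```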
#### ✍️ Transcription

- `start_media_transcription(reference: str, context: str = "", prefer_audio_only: bool = False, wait_seconds: int = 55) -> dict`
  - Purpose: Produce a diarized, speaker-labelled transcription of the media’s audio channel.
  - Arguments:
    - `reference`: From `start_media_retrieval`.
    - `context`: Optional grounding text with names, acronyms, or domain hints.
    - `prefer_audio_only`: When `true`, skip slide context for cheaper audio-only runs.
    - `wait_seconds`: Poll window before returning.
  - Returns: Job envelope, with `transcription` once `status: "done"`.

- `get_media_transcription_result(reference: str, wait_seconds: int = 0) -> dict`
  - Purpose: Retrieve a previously computed transcription or the current job status.
  - Returns: Same pattern as `get_media_analysis_result`, but with `transcription` instead of `analysis`.
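Chaining the tools above, a two-step transcription flow looks roughly like this. `call_tool(name, **kwargs)` is again a placeholder for your MCP client's invoke function; long jobs may still come back as pending/running and then need polling via the corresponding `get_*` tools.

```python
def transcribe_source(call_tool, source, context=""):
    """Two-step sketch: retrieve media, then request a diarized transcript.

    Returns the retrieval envelope unchanged if the download is still in
    flight, so the caller can poll get_media_retrieval_status first.
    """
    retrieval = call_tool("start_media_retrieval", source=source, wait_seconds=54)
    if retrieval.get("status") != "done":
        return retrieval  # caller should poll get_media_retrieval_status
    return call_tool(
        "start_media_transcription",
        reference=retrieval["reference"],
        context=context,
        wait_seconds=55,
    )
```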
## 🏆 Hackathon Context & Journey

Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)).

While most agents are passive summarizers, Aileen 3 represents a shift toward **active information foraging**, enabling professionals to filter signal from an ocean of noise.
## 📦 Local Development

```bash
# Build the Docker image
docker build -t aileen3-core .

# Run the Gradio interface
docker run -it -p 7860:7860 aileen3-core
```
## 🛡️ Security & privacy

- Your Gemini key is used only server-side to call Gemini models.
- Media is downloaded to a cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
- No analytics or third-party telemetry are included.
## 🚧 Limitations

- `translate_slide` does not currently benefit from priors; feeding them in could improve translation quality.
- No AI safety guardrails (tone, style, prompt-injection defenses, ...)
- No cost controls
- Hallucination risk: Aileen may make mistakes.
- Remote MCP operating mode is untested; it would rely on external access protection.
## 👾 Troubleshooting

- Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
- Long jobs: increase the transport timeout (10–20 min) and leverage `wait_seconds` plus polling via the `get_*` tools.
- YouTube access:
  * ensure YouTube is reachable
  * keep `yt-dlp` recent
  * if the site’s JS protection breaks downloads, install `yt-dlp-ejs` (see the Space health check).
demo/setup_cell.py
CHANGED

@@ -12,7 +12,7 @@ def render_setup_cell() -> gr.Textbox:
     The returned textbox component is used by other cells to pass GEMINI_API_KEY
     into the MCP server environment.

-    This
+    This Space runs your key locally in the container to call Gemini. You can revoke it any time.
     """
     with cell("🔑 Setup: Gemini API key"):
         gr.Markdown(
mcp/README.md
CHANGED

@@ -1,6 +1,6 @@
 # Aileen3 MCP Server

-Lightweight MCP server exposing
+Lightweight stdio MCP server exposing Aileen 3’s media tools for use by the Gradio demo, Claude Desktop, and other MCP clients.

 ## Quick start

@@ -9,32 +9,22 @@ python -m pip install -e ./mcp
 aileen3-mcp  # starts the stdio MCP server
 ```

-The server
-
-2) `search_youtube` — finds YouTube videos using the yt-dlp Python API.
-
-   - **Arguments:**
-     - `query` (str, required): Free-form search terms, e.g. `"lofi hip hop beats"`.
-     - `max_results` (int, optional, default `10`, bounds `1–50`): number of videos to return.
-   - **Returns:** object with a `videos` array. Each entry includes `id`, `title`, `webpage_url`,
-     `duration_seconds`, `channel`, `channel_id`, `thumbnail_url`.
-   - **Usage note:** Keep `max_results` small (≤10) for faster responses. The tool only searches; it does not download media.
-
-{
-  "name": "search_youtube",
-  "arguments": {
-    "query": "python packaging tutorial",
-    "max_results": 5
-  }
-}
-```
-
-## ToDo
-* write proper project description: add to README.md and pyproject.toml
+The server entrypoint is `aileen3_mcp.server.make_app`, which registers all tools on a `FastMCP` instance. For a complete description of available tools (health probes, YouTube search, media retrieval, slide extraction and translation, analysis, transcription), see the project root `README.md` under **“MCP tools and definitions”**.
+
+In short, the public tools are:
+
+- `health`
+- `search_youtube`
+- `start_media_retrieval` / `get_media_retrieval_status`
+- `start_slide_extraction` / `get_extracted_slides`
+- `translate_slide`
+- `start_media_analysis` / `get_media_analysis_result`
+- `start_media_transcription` / `get_media_transcription_result`
+
+These tools are designed to be called from an agentic chat interface that:
+
+- first chooses a media `source` (optionally using `search_youtube`)
+- then calls `start_media_retrieval`
+- and finally uses the `reference` token to drive analysis, transcription, or slide translation.
+
+For detailed contracts (arguments, return payloads, and example usage), consult `README.md` in the repository root.
mcp/src/aileen3_mcp/media_tools.py
CHANGED

@@ -1302,7 +1302,7 @@ def register_media_tools(app: FastMCP) -> None:
     async def start_slide_extraction(ctx: Context, reference: str, wait_seconds: int = 55) -> dict:
         """Extract representative slide stills from a downloaded video.

-        Note: media analysis (start_media_analysis) includes slides extraction, so no need to call this function
+        Note: media analysis (start_media_analysis) includes slide extraction, so there is no need to call this function explicitly when aiming for full media analysis.
         """
         metadata = _load_json(_metadata_path(reference))
         if not metadata or not Path(metadata.get("download_path", "")).exists():