ndurner committed
Commit f6292c0 · 1 Parent(s): 340f3f7
.github/README.md CHANGED
@@ -1,66 +1,260 @@
1
- # Aileen 3 Core
2
- <div style="display: flex; justify-content: center; gap: 10px;">
3
- <a href="https://ndurner.de/links/aileen3-hf-space"><img alt="HuggingFace Space Badge" src="https://img.shields.io/badge/HuggingFace-Space-yellow?logo=huggingface"></img></a>
4
- <a href="https://ndurner.de/links/aileen3-linkedin"><img alt="LinkedIn Post Badge" src="https://img.shields.io/badge/LinkedIn-Post-blue?logo=linkedin"></img></a>
5
- <a href="https://ndurner.de/links/aileen3-hf-video"><img alt="MCP Demo Video Badge" src="https://img.shields.io/badge/MCP%20Demo-Video-red?logo=YouTube"></img></a>
 
 
 
6
  <a href="https://ndurner.de/links/aileen3-kaggle-writeup"><img alt="Agent Kaggle Writeup" src="https://img.shields.io/badge/Agent-Writeup-lightgray?logo=kaggle"></img></a>
7
- <a href="https://ndurner.de/links/aileen3-kaggle-video"><img alt="Agent Demo Video Badge" src="https://img.shields.io/badge/Agent%20Demo-Video-lightgray?logo=YouTube"></img></a>
8
  </div>
9
 
10
- ## Introduction
11
- Large Language Models (LLMs) rely on **tools** - sometimes provided by **MCP servers** - to interact with the outside world. Aileen 3 Core is an MCP server that focuses on **Information Foraging**: mining for novel insights from high-noise sources to create dense briefings for time efficient consumption by the user. Grounded in cognitive science, Aileen 3 models novelty as prediction error against explicit priors such as user expectations, facts from an AI Memory Bank, or media context. To that end, the Aileen 3 Core MCP server provides media access and analysis services backed by Google Gemini.
12
-
13
- ### Competition submission
14
- Aileen 3 Core is [a contender](https://huggingface.co/spaces/ndurner/aileen3-core) in the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) hackathon. [Aileen 3 Agent](https://github.com/ndurner/aileen3-agent), an agentic system built on this MCP server, is a [capstone project](https://ndurner.de/links/aileen3-kaggle-writeup) to the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents).
15
-
16
- ## Using
17
- ### Using in Claude Desktop
18
- #### Installing
19
- 1. Optionally, create a new Python virtual environment
20
- * example: `python3 -m venv .venv-claude`
21
- 2. Install the Aileen MCP: `pip install ./mcp`
22
- * (or `pip install -e ./mcp` if you want to make live changes to this source tree)
23
- 3. Obtain Google Gemini API key: [https://aistudio.google.com](../Google AI Studio)
24
- 3. Add reference to `claude_desktop_config.json`. The Gemini API key will be read from the environment, so can be set here:
25
  ```
26
- {
27
- ...
28
- "mcpServers": {
29
- "aileen3-mcp": {
30
- "command": "/Users/.../aileen3-core/.venv-claude/bin/python",
31
- "args": [
32
- "-m",
33
- "aileen3_mcp.server"
34
- ],
35
- "env": {
36
- "GEMINI_API_KEY": "AI..."
37
  }
38
  }
39
- }
40
- }
41
- ```
42
  4. Restart Claude
43
 
44
- #### Using the MCP server
45
- The model Haiku 4.5 is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitely in the prompt, e.g.:
46
  > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.
47
 
48
  ![Screenshot of Claude Desktop: slide translation with Aileen 3 Core](../readme-assets/claude-slide-translation.webp)
49
 
50
- #### Debugging
51
  The message exchange and Claude-facing error messages can be read from Claude log files:
52
  ```
53
  tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
54
  ```
55
 
56
- ## Local development
57
 
58
- Build and run the Docker Space image locally:
59
 
60
  ```bash
 
61
  docker build -t aileen3-core .
 
 
62
  docker run -it -p 7860:7860 aileen3-core
63
  ```
64
 
65
- ## Troubleshooting
66
- - Error message: "google.genai.errors.ClientError: 401 UNAUTHENTICATED. {'error': {'code': 401, 'message': 'API keys are not supported by this API. Expected OAuth2 access token or other authentication credentials that assert a principal. ...". This may be a catch-all error message and the underlying problem may be something different. We solved this by using an older Gemini API key from Gemini Studio (Nov 10 vs. Nov 23; same project). The newer API keys may have been generated by Vertex AI, but we did not investigate this further. The working key starts with "AI", the broken keys start with "AQ.".
1
+ # Aileen 3 Core: Information Foraging MCP Server
2
+
3
+ <div style="display: flex; justify-content: center; gap: 10px; margin-bottom: 1em">
4
+ <a href="https://ndurner.de/links/aileen3-hf-space"><img alt="HuggingFace Space Badge" src="https://img.shields.io/badge/Gradio%206-HuggingFace%20Space-yellow?logo=gradio"></img></a>
5
+ <a href="https://ndurner.de/links/aileen3-linkedin"><img alt="LinkedIn Post Badge" src="https://img.shields.io/badge/🔗%20LinkedIn-Post-blue?logo=linkedin"></img></a>
6
+ <a href="https://ndurner.de/links/aileen3-hf-video"><img alt="MCP Demo Video Badge" src="https://img.shields.io/badge/MCP-Demo%20Video-red?logo=YouTube"></img></a>
7
+ 🔜<a href="https://ndurner.de/links/aileen3-agent-github"><img alt="Agent GitHub Badge" src="https://img.shields.io/badge/Agent-Github-lightgray?logo=github"></img></a>
8
+ <a href="https://ndurner.de/links/aileen3-kaggle-video"><img alt="Agent Demo Video Badge" src="https://img.shields.io/badge/Agent-Demo%20Video-lightgray?logo=YouTube"></img></a>
9
  <a href="https://ndurner.de/links/aileen3-kaggle-writeup"><img alt="Agent Kaggle Writeup" src="https://img.shields.io/badge/Agent-Writeup-lightgray?logo=kaggle"></img></a>
 
10
  </div>
11
 
12
+ > **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank
13
+
14
+ ## ♨️ Problem: The Noise-Signal Ratio
15
+ Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts.
16
+
17
+ Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know. It treats every sentence as equally important.
18
+
19
+ ## Solution: Expectation-Driven Analysis
20
+ **Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**.
21
+
22
+ Instead of asking "Summarize this video," Aileen 3 Core allows users to task a Large Language Model with:
23
+ *"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."* As part of a larger agentic system, the prior knowledge can even be derived from a memory bank.
24
+
25
+
26
+ ### Key Capabilities
27
+ * **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
28
+ * **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., confusing the German treaty "NOOTS" for "emergency state") by feeding media metadata as priors to the model.
29
+ * **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
30
+ * **🔌 Universal MCP Support:** Works with **Claude Desktop** or any custom agent.
31
+
32
+ ---
33
+
34
+ ## 🏗️ Architecture
35
+
36
+ Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.
37
+
38
+ ```mermaid
39
+ graph LR
40
+ User[User / Agent] -->|Priors & Expectations| MCP[Aileen 3 Core MCP]
41
+ MCP -->|Retrieval| YT[YouTube/Media]
42
+ MCP -->|Visuals| Slides[Slide Extraction]
43
+ MCP -->|Audio| Trans[Transcription]
44
+ MCP -->|Reasoning| Gemini[Google Gemini]
45
+
46
+ YT --> Gemini
47
+ Slides --> Gemini
48
+ Trans --> Gemini
49
+
50
+ Gemini -->|Briefing: Surprises Only| MCP
51
+ Gemini -->|Localized slides| MCP
52
+ MCP -->|High-Signal Update| User
53
  ```
54
+
55
+ ## 🚀 Quick Start: Claude Desktop
56
+
57
+ Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.
58
+
59
+ 1. **Install:**
60
+ ```bash
61
+ # From a clone of this repository, install in editable mode
62
+ pip install -e ./mcp
63
+ ```
64
+
65
+ 2. Obtain a Google Gemini API key: [Google AI Studio](https://aistudio.google.com)
66
+
67
+ 3. **Configure `claude_desktop_config.json`.** The Gemini API key is read from the environment, so it can also be set here:
68
+ ```json
69
+ {
70
+ "mcpServers": {
71
+ "aileen3-mcp": {
72
+ "command": "python",
73
+ "args": ["-m", "aileen3_mcp.server"],
74
+ "env": {
75
+ "GEMINI_API_KEY": "AI..."
76
+ }
77
+ }
78
  }
79
  }
80
+ ```
 
 
81
  4. Restart Claude
82
 
83
+ 5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:
 
84
  > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.
85
 
86
  ![Screenshot of Claude Desktop: slide translation with Aileen 3 Core](../readme-assets/claude-slide-translation.webp)
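The config entry from step 3 can also be merged in programmatically. A minimal sketch, operating on a temporary copy so it is safe to run anywhere; the helper name is ours, and the real macOS path is `~/Library/Application Support/Claude/claude_desktop_config.json`:

```python
# Sketch: merge the "aileen3-mcp" entry into an existing
# claude_desktop_config.json without clobbering other servers.
import json
import os
import tempfile

def add_aileen3_entry(config_path: str, python_cmd: str, api_key: str) -> dict:
    """Insert the aileen3-mcp server entry, preserving existing settings."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config.setdefault("mcpServers", {})["aileen3-mcp"] = {
        "command": python_cmd,
        "args": ["-m", "aileen3_mcp.server"],
        "env": {"GEMINI_API_KEY": api_key},
    }
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config

# Demo against a temporary file rather than the live Claude config.
with tempfile.TemporaryDirectory() as d:
    merged = add_aileen3_entry(os.path.join(d, "claude_desktop_config.json"),
                               "python", "AI...")
```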
87
 
88
+ ### 🔍 Debugging
89
  The message exchange and Claude-facing error messages can be read from Claude log files:
90
  ```
91
  tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
92
  ```
93
 
 
94
 
95
+ ## 🧪 The Gradio Space (Interactive Demo)
96
+
97
+ We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step-by-step:
98
+
99
+ 1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
100
+ 2. **Hallucination Check:** Demonstrates how lack of context leads to speech recognition errors.
101
+ 3. **Context-biased Transcription:** Fixes these errors by establishing priors.
102
+ 4. **Expectation-driven Analysis:** The core engine in action.
103
+ 5. **Slide Translation:** Extracting and localizing visual assets.
104
+
105
+ [**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)
106
+
107
+ ## 📘 MCP server overview
108
+
109
+ The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.
110
+
111
+ Environment prerequisites:
112
+
113
+ - `GEMINI_API_KEY` set to a valid Gemini API key
114
+ - `ffmpeg` installed and on `PATH`
115
+
116
+ Optional configuration:
117
+
118
+ - `AILEEN3_ANALYSIS_MODEL` to override the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` recommended for accuracy).
119
+ - `AILEEN3_CACHE_DIR` to change the base cache directory (default: `~/.cache/aileen3`).
120
+ - `AILEEN3_DEBUG=1` to enable additional debug artefacts on disk.
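The optional variables above resolve as sketched below; the helper is ours for illustration, not part of the package, but the defaults mirror the documented ones:

```python
# Sketch: resolving the optional AILEEN3_* variables with their documented defaults.
from pathlib import Path

def resolve_settings(env: dict) -> dict:
    return {
        "analysis_model": env.get("AILEEN3_ANALYSIS_MODEL", "gemini-flash-latest"),
        "cache_dir": Path(env.get("AILEEN3_CACHE_DIR",
                                  str(Path.home() / ".cache" / "aileen3"))),
        "debug": env.get("AILEEN3_DEBUG") == "1",
    }

defaults = resolve_settings({})  # nothing set: free-tier-friendly defaults
tuned = resolve_settings({"AILEEN3_ANALYSIS_MODEL": "gemini-3-pro-preview",
                          "AILEEN3_DEBUG": "1"})
```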
121
+
122
+ ### ⭐️ Example client integration
123
+
124
+ The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:
125
+
126
+ - `command`: `sys.executable`
127
+ - `args`: `["-m", "aileen3_mcp.server"]`
128
+ - `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
129
+ - `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs beyond the 30-second default
130
+
131
+ When integrating this MCP into your own agent or client:
132
+
133
+ - Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
134
+ - Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.
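The spawn parameters above can be expressed as plain data and adapted to `google.adk`'s `McpToolset` or any other stdio MCP client; the field names below follow the bullet list, not any specific client API:

```python
# Sketch: stdio spawn parameters for the aileen3_mcp server, as plain data.
import os
import sys

def aileen3_stdio_params(timeout_seconds: int = 1200) -> dict:
    return {
        "command": sys.executable,            # interpreter with aileen3_mcp installed
        "args": ["-m", "aileen3_mcp.server"],
        # Forward the key explicitly into the MCP process environment.
        "env": {"GEMINI_API_KEY": os.environ.get("GEMINI_API_KEY", "")},
        "timeout": timeout_seconds,           # well above the 30 s transport default
    }

params = aileen3_stdio_params()
```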
135
+
136
+ ### 🛠️ MCP tools and definitions
137
+ #### 🩺 Health and search
138
+
139
+ - `health() -> { ok, detail, ffmpeg, gemini_api_key }`
140
+ - Purpose: Lightweight health probe mirroring the Gradio demo’s health check. Confirms that `ffmpeg` is callable and `GEMINI_API_KEY` is present.
141
+ - Usage: Call before running longer flows to surface missing runtime dependencies early.
142
+
143
+ - `search_youtube(query: str, max_results: int = 10) -> { videos: [...] }`
144
+ - Purpose: Fast YouTube search using `yt-dlp` (no downloads).
145
+ - Arguments:
146
+ - `query` (required): Free-form search terms (e.g. `"taler auditor bachelorthesis"`).
147
+ - `max_results` (optional, default `10`, clamped to `1–50`).
148
+ - Returns: `videos` list with `id`, `title`, `webpage_url`, `duration_seconds`, `channel`, `channel_id`.
149
+ - Typical flow: Use from an agent to shortlist candidate videos before picking one `source` for retrieval.
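A shortlist step might look like the following sketch; `videos` mimics the documented return shape with made-up entries, and the filter function is ours:

```python
# Sketch: shortlist search_youtube results by duration before retrieval.
def longform(videos, min_seconds=600):
    # Keep only talks long enough to be worth a full retrieval/analysis pass.
    return [v for v in videos if (v.get("duration_seconds") or 0) >= min_seconds]

videos = [
    {"id": "a1", "title": "Conference keynote",
     "webpage_url": "https://youtube.com/watch?v=a1",
     "duration_seconds": 2700, "channel": "ConfChannel", "channel_id": "c1"},
    {"id": "b2", "title": "Teaser clip",
     "webpage_url": "https://youtube.com/watch?v=b2",
     "duration_seconds": 30, "channel": "ConfChannel", "channel_id": "c1"},
]
candidates = longform(videos)
```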
150
+
151
+ #### 📺 Media retrieval (entry point)
152
+
153
+ - `start_media_retrieval(source: str, prefer_audio_only: bool = False, wait_seconds: int = 54) -> dict`
154
+ - Purpose: Download long-form media (YouTube, podcasts, HTTP URLs) and normalize basic metadata.
155
+ - Arguments:
156
+ - `source`: YouTube URL/ID, podcast URL, or other `yt-dlp`-supported locator.
157
+ - `prefer_audio_only`: When `true`, prefer audio-first formats; use when visuals are not needed.
158
+ - `wait_seconds`: How long to block before returning; if the job is still running, you get status + reference.
159
+ - Returns:
160
+ - On success: `{ reference, status: "done", metadata: {...}, cached? }`
161
+ - In progress: `{ reference, status: "pending"|"running", progress?, job_id }`
162
+ - On error: `{ is_error: true, status, detail, reference }`
163
+ - Typical flow: This is the first call once you have chosen a `source`. The `reference` token is required for all downstream tools.
164
+
165
+ - `get_media_retrieval_status(reference: str, wait_seconds: int = 0) -> dict`
166
+ - Purpose: Poll the retrieval job or fetch cached metadata.
167
+ - Returns:
168
+ - `{ status: "done", reference, metadata }` when cached or finished.
169
+ - `{ status: "pending"|"running", ... }` while in flight.
170
+ - `{ status: "not_found", reference }` if no job or cache exists.
171
+
172
+ #### 🖼️ Slides: extraction and translation
173
+
174
+ - `start_slide_extraction(reference: str, wait_seconds: int = 55) -> dict`
175
+ - Purpose: Extract representative slide stills from a downloaded video.
176
+ - Note: Full media analysis (`start_media_analysis`) automatically triggers slide extraction; call this explicitly only if you need slides on their own.
177
+ - Returns: Standard job envelope with `slides` once done or `status` + `job_id` while running.
178
+
179
+ - `get_extracted_slides(reference: str, wait_seconds: int = 0) -> dict`
180
+ - Purpose: Fetch extracted slides or current extraction status.
181
+ - Returns: `{ status: "done", reference, slides: [...] }` on success, otherwise a job status or `{ status: "not_found" }`. Slides include indices that are used by `translate_slide`.
182
+
183
+ - `translate_slide(reference: str, slide_index: int, language: str) -> ImageContent`
184
+ - Purpose: Translate a single slide image into another language using Gemini image-to-image.
185
+ - Arguments:
186
+ - `reference`: Token from `start_media_retrieval`.
187
+ - `slide_index`: Zero-based index into `get_extracted_slides.slides[].index`.
188
+ - `language`: Target language name (e.g. `"German"`, `"Spanish"`).
189
+ - Returns: `ImageContent` with base64-encoded translated slide image. Responses are cached per `(reference, language, slide_index)`.
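Persisting the result can be sketched as follows; per the MCP content schema, image content carries base64 `data` plus a `mimeType`, and the plain-dict access below is an assumption about how your client surfaces `ImageContent`:

```python
# Sketch: decode a translate_slide ImageContent payload to disk.
import base64
import os
import tempfile

def save_slide(image_content: dict, path: str) -> int:
    """Decode the base64 payload to a file; returns bytes written."""
    raw = base64.b64decode(image_content["data"])
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)

# Four stand-in bytes in place of a real translated PNG.
demo = {"type": "image",
        "data": base64.b64encode(b"\x89PNG").decode("ascii"),
        "mimeType": "image/png"}
with tempfile.TemporaryDirectory() as d:
    written = save_slide(demo, os.path.join(d, "slide-3-de.png"))
```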
190
+
191
+ #### ⛳️ Expectation-driven analysis
192
+
193
+ - `start_media_analysis(reference: str, priors: object, wait_seconds: int = 55) -> dict`
194
+ - Purpose: Run expectation-driven analysis over the media’s audio and slides, surfacing *surprises* and *new actors* instead of rehashing everything.
195
+ - Arguments:
196
+ - `reference`: Token produced by `start_media_retrieval`.
197
+ - `priors`: Object with optional string fields:
198
+ - `context`: Scene setting (participants, venue, goal, spelled names).
199
+ - `expectations`: What the user already expects to hear.
200
+ - `prior_knowledge`: What the user already knows from past work.
201
+ - `questions`: Concrete questions to be answered.
202
+ - Important: Only populate `priors` with information coming from the user or trusted tools (e.g. Memory Bank); do not invent priors in the agent.
203
+ - Returns: Same job envelope pattern as retrieval. When `status: "done"`, the payload includes an `analysis` markdown briefing optimised for fast reading.
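A `priors` object might look like the sketch below; all four fields are optional strings, and the values here are invented placeholders that should in practice come from the user or a trusted memory bank, never from the agent itself:

```python
# Sketch: a priors object for start_media_analysis (placeholder values).
priors = {
    "context": "Recorded panel; two speakers; names spelled in the description.",
    "expectations": "Mostly a recap of guidance that is already published.",
    "prior_knowledge": "I have followed this working group for a year.",
    "questions": "Was any concrete timeline committed to?",
}
```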
204
+
205
+ - `get_media_analysis_result(reference: str, wait_seconds: int = 0) -> dict`
206
+ - Purpose: Poll for completion or fetch cached analysis for a `reference`.
207
+ - Returns:
208
+ - `status: "done"` with `analysis` text on success.
209
+ - `status: "pending"|"running"` during processing.
210
+ - Errors include `is_error: true`, `detail`, `reference`.
211
+
212
+ #### ✍️ Transcription
213
+
214
+ - `start_media_transcription(reference: str, context: str = "", prefer_audio_only: bool = False, wait_seconds: int = 55) -> dict`
215
+ - Purpose: Produce a diarized, speaker-labelled transcription of the media’s audio channel.
216
+ - Arguments:
217
+ - `reference`: From `start_media_retrieval`.
218
+ - `context`: Optional grounding text with names, acronyms, or domain hints.
219
+ - `prefer_audio_only`: When `true`, skip slide context for cheaper audio-only runs.
220
+ - `wait_seconds`: Poll window before returning.
221
+ - Returns: Job envelope, with `transcription` once `status: "done"`.
222
+
223
+ - `get_media_transcription_result(reference: str, wait_seconds: int = 0) -> dict`
224
+ - Purpose: Retrieve a previously computed transcription or current job status.
225
+ - Returns: Same pattern as `get_media_analysis_result`, but with `transcription` instead of `analysis`.
226
+
227
+ ## 🏆 Hackathon Context & Journey
228
+ Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)).
229
+
230
+ While most agents are passive summarizers, Aileen 3 represents a shift toward **active information foraging**, enabling professionals to filter signal from an ocean of noise.
231
+
232
+ ## 📦 Local Development
233
 
234
  ```bash
235
+ # Build the Docker image
236
  docker build -t aileen3-core .
237
+
238
+ # Run the Gradio interface
239
  docker run -it -p 7860:7860 aileen3-core
240
  ```
241
 
242
+ ## 🛡️ Security & privacy
243
+ - Your Gemini key is used only server-side to call Gemini models.
244
+ - Media is downloaded to a cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
245
+ - No analytics or third-party telemetry included.
246
+
247
+ ## 🚧 Limitations
248
+ - `translate_slide` does not currently benefit from priors; supplying them could improve translation quality
249
+ - No AI safety guardrails (tone, style, prompt-injection defenses, ...)
250
+ - No cost control
251
+ - Hallucination risk: Aileen may make mistakes.
252
+ - Remote MCP operating mode not tested; would rely on external access protection
253
+
254
+ ## 👾 Troubleshooting
255
+ - Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
256
+ - Long jobs: increase the transport timeout (10–20 min) and use `wait_seconds` plus the polling `get_*` tools.
257
+ - YouTube access:
258
+ * ensure YouTube is reachable
259
+ * ensure `yt-dlp` is recent
260
+ * if the site's JS protection breaks, install `yt-dlp-ejs` (see the Space health check)
README.md CHANGED
@@ -1,80 +1,274 @@
1
  ---
2
- title: Aileen3 Core
3
  emoji: 👩🏻‍💼
4
  colorFrom: purple
5
  colorTo: blue
6
  sdk: docker
7
  pinned: false
8
  license: cc-by-4.0
9
- short_description: Aileen 3 Core - Information Foraging MCP
10
  tags:
11
  - building-mcp-track-enterprise
12
  - building-mcp-track-customer
13
  ---
14
 
15
- # Aileen 3 Core
16
- <div style="display: flex; justify-content: center; gap: 10px;">
17
- <a href="https://ndurner.de/links/aileen3-hf-space"><img alt="HuggingFace Space Badge" src="https://img.shields.io/badge/HuggingFace-Space-yellow?logo=huggingface"></img></a>
18
- <a href="https://ndurner.de/links/aileen3-linkedin"><img alt="LinkedIn Post Badge" src="https://img.shields.io/badge/LinkedIn-Post-blue?logo=linkedin"></img></a>
19
- <a href="https://ndurner.de/links/aileen3-hf-video"><img alt="MCP Demo Video Badge" src="https://img.shields.io/badge/MCP%20Demo-Video-red?logo=YouTube"></img></a>
 
 
 
20
  <a href="https://ndurner.de/links/aileen3-kaggle-writeup"><img alt="Agent Kaggle Writeup" src="https://img.shields.io/badge/Agent-Writeup-lightgray?logo=kaggle"></img></a>
21
- <a href="https://ndurner.de/links/aileen3-kaggle-video"><img alt="Agent Demo Video Badge" src="https://img.shields.io/badge/Agent%20Demo-Video-lightgray?logo=YouTube"></img></a>
22
  </div>
23
 
24
- ## Introduction
25
- Large Language Models (LLMs) rely on **tools** - sometimes provided by **MCP servers** - to interact with the outside world. Aileen 3 Core is an MCP server that focuses on **Information Foraging**: mining for novel insights from high-noise sources to create dense briefings for time efficient consumption by the user. Grounded in cognitive science, Aileen 3 models novelty as prediction error against explicit priors such as user expectations, facts from an AI Memory Bank, or media context. To that end, the Aileen 3 Core MCP server provides media access and analysis services backed by Google Gemini.
26
-
27
- ### Competition submission
28
- Aileen 3 Core is [a contender](https://huggingface.co/spaces/ndurner/aileen3-core) in the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) hackathon. [Aileen 3 Agent](https://github.com/ndurner/aileen3-agent), an agentic system built on this MCP server, is a [capstone project](https://ndurner.de/links/aileen3-kaggle-writeup) to the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents).
29
-
30
- ## Using
31
- ### Using in Claude Desktop
32
- #### Installing
33
- 1. Optionally, create a new Python virtual environment
34
- * example: `python3 -m venv .venv-claude`
35
- 2. Install the Aileen MCP: `pip install ./mcp`
36
- * (or `pip install -e ./mcp` if you want to make live changes to this source tree)
37
- 3. Obtain Google Gemini API key: [https://aistudio.google.com](Google AI Studio)
38
- 3. Add reference to `claude_desktop_config.json`. The Gemini API key will be read from the environment, so can be set here:
39
  ```
40
- {
41
- ...
42
- "mcpServers": {
43
- "aileen3-mcp": {
44
- "command": "/Users/.../aileen3-core/.venv-claude/bin/python",
45
- "args": [
46
- "-m",
47
- "aileen3_mcp.server"
48
- ],
49
- "env": {
50
- "GEMINI_API_KEY": "AI..."
51
  }
52
  }
53
- }
54
- }
55
- ```
56
  4. Restart Claude
57
 
58
- #### Using the MCP server
59
- The model Haiku 4.5 is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitely in the prompt, e.g.:
60
  > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.
61
 
62
  ![Screenshot of Claude Desktop: slide translation with Aileen 3 Core](readme-assets/claude-slide-translation.webp)
63
 
64
- #### Debugging
65
  The message exchange and Claude-facing error messages can be read from Claude log files:
66
  ```
67
  tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
68
  ```
69
 
70
- ## Local development
71
 
72
- Build and run the Docker Space image locally:
73
 
74
  ```bash
 
75
  docker build -t aileen3-core .
 
 
76
  docker run -it -p 7860:7860 aileen3-core
77
  ```
78
 
79
- ## Troubleshooting
80
- - Error message: "google.genai.errors.ClientError: 401 UNAUTHENTICATED. {'error': {'code': 401, 'message': 'API keys are not supported by this API. Expected OAuth2 access token or other authentication credentials that assert a principal. ...". This may be a catch-all error message and the underlying problem may be something different. We solved this by using an older Gemini API key from Gemini Studio (Nov 10 vs. Nov 23; same project). The newer API keys may have been generated by Vertex AI, but we did not investigate this further. The working key starts with "AI", the broken keys start with "AQ.".
1
  ---
2
+ title: Aileen 3 Core - Information Foraging MCP Server
3
  emoji: 👩🏻‍💼
4
  colorFrom: purple
5
  colorTo: blue
6
  sdk: docker
7
  pinned: false
8
  license: cc-by-4.0
9
+ short_description: Turns 45-minute conference videos into 2-minute "surprise briefs". Expectation-driven, slide-aware, Gemini-powered.
10
  tags:
11
  - building-mcp-track-enterprise
12
  - building-mcp-track-customer
13
  ---
14
 
15
+ # Aileen 3 Core: Information Foraging MCP Server
16
+
17
+ <div style="display: flex; justify-content: center; gap: 10px; margin-bottom: 1em">
18
+ <a href="https://ndurner.de/links/aileen3-hf-space"><img alt="HuggingFace Space Badge" src="https://img.shields.io/badge/Gradio%206-HuggingFace%20Space-yellow?logo=gradio"></img></a>
19
+ <a href="https://ndurner.de/links/aileen3-linkedin"><img alt="LinkedIn Post Badge" src="https://img.shields.io/badge/🔗%20LinkedIn-Post-blue?logo=linkedin"></img></a>
20
+ <a href="https://ndurner.de/links/aileen3-hf-video"><img alt="MCP Demo Video Badge" src="https://img.shields.io/badge/MCP-Demo%20Video-red?logo=YouTube"></img></a>
21
+ 🔜<a href="https://ndurner.de/links/aileen3-agent-github"><img alt="Agent GitHub Badge" src="https://img.shields.io/badge/Agent-Github-lightgray?logo=github"></img></a>
22
+ <a href="https://ndurner.de/links/aileen3-kaggle-video"><img alt="Agent Demo Video Badge" src="https://img.shields.io/badge/Agent-Demo%20Video-lightgray?logo=YouTube"></img></a>
23
  <a href="https://ndurner.de/links/aileen3-kaggle-writeup"><img alt="Agent Kaggle Writeup" src="https://img.shields.io/badge/Agent-Writeup-lightgray?logo=kaggle"></img></a>
 
24
  </div>
25
 
26
+ > **"Information is surprises. You learn something when things don’t turn out the way you expected."** ⸺ Roger Schank
27
+
28
+ ## ♨️ Problem: The Noise-Signal Ratio
29
+ Professionals working at the intersection of regulation and technology drink from a firehose of information. Staying current requires monitoring hours of conferences, webinars, and podcasts.
30
+
31
+ Standard AI **summarization fails** here because it creates "flat" summaries that rehash what you already know. It treats every sentence as equally important.
32
+
33
+ ## Solution: Expectation-Driven Analysis
34
+ **Aileen 3 Core** is a Model Context Protocol (MCP) server designed for **Information Foraging**. Grounded in cognitive science, it models "novelty" as **prediction error**.
35
+
36
+ Instead of asking "Summarize this video," Aileen 3 Core allows users to task a Large Language Model with:
37
+ *"Here is what I already know, and here is what I expect the speaker to say. Tell me only where they deviate from this baseline."* As part of a larger agentic system, the prior knowledge can even be derived from a memory bank.
38
+
39
+
40
+ ### Key Capabilities
41
+ * **⛳️ Expectation-Driven Briefings:** Uses Google Gemini to analyze audio/video against user-supplied priors (context, expectations, and knowledge gaps) to surface genuine surprises.
42
+ * **🔍 Context-Biased Transcription:** Prevents hallucinations (e.g., confusing the German treaty "NOOTS" for "emergency state") by feeding media metadata as priors to the model.
43
+ * **🖼️ Visual Slide Extraction:** Automatically detects, extracts, and, on request, translates slide stills from video feeds, treating slides as high-density information artifacts.
44
+ * **🔌 Universal MCP Support:** Works with **Claude Desktop** or any custom agent.
45
+
46
+ ---
47
+
48
+ ## 🏗️ Architecture
49
+
50
+ Aileen 3 Core exposes tools that bridge the gap between raw media and reasoning agents.
51
+
52
+ ```mermaid
53
+ graph LR
54
+ User[User / Agent] -->|Priors & Expectations| MCP[Aileen 3 Core MCP]
55
+ MCP -->|Retrieval| YT[YouTube/Media]
56
+ MCP -->|Visuals| Slides[Slide Extraction]
57
+ MCP -->|Audio| Trans[Transcription]
58
+ MCP -->|Reasoning| Gemini[Google Gemini]
59
+
60
+ YT --> Gemini
61
+ Slides --> Gemini
62
+ Trans --> Gemini
63
+
64
+ Gemini -->|Briefing: Surprises Only| MCP
65
+ Gemini -->|Localized slides| MCP
66
+ MCP -->|High-Signal Update| User
67
  ```
68
+
69
+ ## 🚀 Quick Start: Claude Desktop
70
+
71
+ Aileen 3 Core is designed to be the "eyes and ears" for your local LLM client.
72
+
73
+ 1. **Install:**
74
+ ```bash
75
+ # From a clone of this repository, install in editable mode
76
+ pip install -e ./mcp
77
+ ```
78
+
79
+ 2. Obtain a Google Gemini API key: [Google AI Studio](https://aistudio.google.com)
80
+
81
+ 3. **Configure `claude_desktop_config.json`.** The Gemini API key is read from the environment, so it can also be set here:
82
+ ```json
83
+ {
84
+ "mcpServers": {
85
+ "aileen3-mcp": {
86
+ "command": "python",
87
+ "args": ["-m", "aileen3_mcp.server"],
88
+ "env": {
89
+ "GEMINI_API_KEY": "AI..."
90
+ }
91
+ }
92
  }
93
  }
94
+ ```
 
 
95
  4. Restart Claude
96
 
97
+ 5. The Haiku 4.5 model is sufficient for basic tasks. To make your plans fully transparent to the LLM, refer to "aileen3" explicitly in the prompt, e.g.:
 
98
  > Use aileen3 to translate slide 3 from YouTube video reference eXP-PvKcI9A to German.
99
 
100
  ![Screenshot of Claude Desktop: slide translation with Aileen 3 Core](readme-assets/claude-slide-translation.webp)
101
 
102
+ ### 🔍 Debugging
103
  The message exchange and Claude-facing error messages can be read from Claude log files:
104
  ```
105
  tail -n 20 -F ~/Library/Logs/Claude/mcp*.log
106
  ```
107
 
 
108
 
109
+ ## 🧪 The Gradio Space (Interactive Demo)
110
+
111
+ We have built a custom **Gradio 6** application that acts as a visual frontend for the MCP server. It demonstrates the pipeline step-by-step:
112
+
113
+ 1. **Health Check:** Verifies `ffmpeg`, `yt-dlp`, and Gemini connectivity.
114
+ 2. **Hallucination Check:** Demonstrates how lack of context leads to speech recognition errors.
115
+ 3. **Context-biased Transcription:** Fixes these errors by establishing priors.
116
+ 4. **Expectation-driven Analysis:** The core engine in action.
117
+ 5. **Slide Translation:** Extracting and localizing visual assets.
118
+
119
+ [**👉 Try the Live Demo Here**](https://ndurner.de/links/aileen3-hf-space)
120
+
121
+ ## 📘 MCP server overview
+ 
+ The MCP server is implemented in `mcp/src/aileen3_mcp` and exposes tools over stdio via `aileen3_mcp.server`. Google Gemini powers the analysis, transcription, and slide translation flows. Media retrieval is handled by `yt-dlp` and `ffmpeg`.
+ 
+ Environment prerequisites:
+ 
+ - `GEMINI_API_KEY` set to a valid Gemini API key
+ - `ffmpeg` installed and on `PATH`
+ 
+ Optional configuration:
+ 
+ - `AILEEN3_ANALYSIS_MODEL` to override the default Gemini model used for expectation-driven analysis (defaults to `gemini-flash-latest` for straightforward experimentation on the free tier of Google AI Studio; `gemini-3-pro-preview` recommended for accuracy).
+ - `AILEEN3_CACHE_DIR` to change the base cache directory (default: `~/.cache/aileen3`).
+ - `AILEEN3_DEBUG=1` to enable additional debug artefacts on disk.
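
A minimal sketch of forwarding these variables when spawning the server from Python; the key value is a placeholder and the dictionary shape is illustrative, not a server API:

```python
import os

# Placeholder key; real keys come from Google AI Studio
env = {
    "GEMINI_API_KEY": "AI-example-not-a-real-key",
    # Optional overrides; values mirror the documented defaults/recommendations
    "AILEEN3_ANALYSIS_MODEL": "gemini-3-pro-preview",
    "AILEEN3_CACHE_DIR": os.path.expanduser("~/.cache/aileen3"),
    "AILEEN3_DEBUG": "1",
}

# A client would pass this mapping to the spawned MCP process, e.g. a stdio
# transport with command=sys.executable, args=["-m", "aileen3_mcp.server"],
# env={**os.environ, **env}.
```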
+ 
+ ### ⭐️ Example client integration
+ 
+ The companion project [Aileen 3 Agent](https://ndurner.de/links/aileen3-agent-github) uses this MCP server via the `google.adk` `McpToolset`, spawning `aileen3_mcp.server` over stdio with:
+ 
+ - `command`: `sys.executable`
+ - `args`: `["-m", "aileen3_mcp.server"]`
+ - `env`: explicitly forwarding `GEMINI_API_KEY` into the MCP process
+ - `timeout`: `1200` seconds at the MCP transport level, to accommodate long-running video analysis and transcription jobs well beyond the 30-second default
+ 
+ When integrating this MCP server into your own agent or client:
+ 
+ - Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
+ - Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.
+ 
+ ### 🛠️ MCP tools and definitions
+ 
+ #### 🩺 Health and search
+ 
+ - `health() -> { ok, detail, ffmpeg, gemini_api_key }`
+   - Purpose: Lightweight health probe mirroring the Gradio demo’s health check. Confirms that `ffmpeg` is callable and `GEMINI_API_KEY` is present.
+   - Usage: Call before running longer flows to surface missing runtime dependencies early.
+ 
+ - `search_youtube(query: str, max_results: int = 10) -> { videos: [...] }`
+   - Purpose: Fast YouTube search using `yt-dlp` (no downloads).
+   - Arguments:
+     - `query` (required): Free-form search terms (e.g. `"taler auditor bachelorthesis"`).
+     - `max_results` (optional, default `10`, clamped to `1–50`).
+   - Returns: `videos` list with `id`, `title`, `webpage_url`, `duration_seconds`, `channel`, `channel_id`.
+   - Typical flow: Use from an agent to shortlist candidate videos before picking one `source` for retrieval.
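
As an illustration, an MCP client would issue a tool call shaped like the dictionary below; the clamping helper mirrors the documented `1–50` bound (both are sketches, not server code):

```python
def clamp_max_results(value: int, lo: int = 1, hi: int = 50) -> int:
    # Mirrors the documented clamping of max_results to 1-50
    return max(lo, min(hi, value))

# Tool-call payload shape (values illustrative)
call = {
    "name": "search_youtube",
    "arguments": {
        "query": "taler auditor bachelorthesis",
        "max_results": clamp_max_results(5),
    },
}
```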
+ 
+ #### 📺 Media retrieval (entry point)
+ 
+ - `start_media_retrieval(source: str, prefer_audio_only: bool = False, wait_seconds: int = 54) -> dict`
+   - Purpose: Download long-form media (YouTube, podcasts, HTTP URLs) and normalize basic metadata.
+   - Arguments:
+     - `source`: YouTube URL/ID, podcast URL, or other `yt-dlp`-supported locator.
+     - `prefer_audio_only`: When `true`, prefer audio-first formats; use when visuals are not needed.
+     - `wait_seconds`: How long to block before returning; if the job is still running, you get a status plus the reference.
+   - Returns:
+     - On success: `{ reference, status: "done", metadata: {...}, cached? }`
+     - In progress: `{ reference, status: "pending"|"running", progress?, job_id }`
+     - On error: `{ is_error: true, status, detail, reference }`
+   - Typical flow: This is the first call once you have chosen a `source`. The `reference` token is required for all downstream tools.
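
Client-side handling of the three documented envelope shapes can be sketched as follows (`interpret_envelope` is a hypothetical helper, not part of the server):

```python
def interpret_envelope(env: dict) -> str:
    """Map a start_media_retrieval envelope to a short client-side verdict."""
    if env.get("is_error"):
        return f"error: {env.get('detail', 'unknown')}"
    if env.get("status") == "done":
        # The reference token unlocks all downstream tools
        return f"done: {env['reference']}"
    # "pending" or "running": keep polling get_media_retrieval_status
    return f"{env['status']}: job {env.get('job_id')}"
```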
+ 
+ - `get_media_retrieval_status(reference: str, wait_seconds: int = 0) -> dict`
+   - Purpose: Poll the retrieval job or fetch cached metadata.
+   - Returns:
+     - `{ status: "done", reference, metadata }` when cached or finished.
+     - `{ status: "pending"|"running", ... }` while in flight.
+     - `{ status: "not_found", reference }` if no job or cache exists.
+ 
+ #### 🖼️ Slides: extraction and translation
+ 
+ - `start_slide_extraction(reference: str, wait_seconds: int = 55) -> dict`
+   - Purpose: Extract representative slide stills from a downloaded video.
+   - Note: Full media analysis (`start_media_analysis`) automatically triggers slide extraction; call this explicitly only if you need slides on their own.
+   - Returns: Standard job envelope with `slides` once done, or `status` + `job_id` while running.
+ 
+ - `get_extracted_slides(reference: str, wait_seconds: int = 0) -> dict`
+   - Purpose: Fetch extracted slides or the current extraction status.
+   - Returns: `{ status: "done", reference, slides: [...] }` on success, otherwise a job status or `{ status: "not_found" }`. Slides include the indices used by `translate_slide`.
+ 
+ - `translate_slide(reference: str, slide_index: int, language: str) -> ImageContent`
+   - Purpose: Translate a single slide image into another language using Gemini image-to-image.
+   - Arguments:
+     - `reference`: Token from `start_media_retrieval`.
+     - `slide_index`: Zero-based index into `get_extracted_slides.slides[].index`.
+     - `language`: Target language name (e.g. `"German"`, `"Spanish"`).
+   - Returns: `ImageContent` with a base64-encoded translated slide image. Responses are cached per `(reference, language, slide_index)`.
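
The per-`(reference, language, slide_index)` caching can be pictured as a keyed memo; `cached_translate` and `fake_translate` below are illustrative stand-ins, not the server's actual implementation:

```python
_translation_cache = {}

def cached_translate(reference, language, slide_index, translate):
    # Repeated requests for the same (reference, language, slide_index)
    # are served from cache instead of re-invoking Gemini.
    key = (reference, language, slide_index)
    if key not in _translation_cache:
        _translation_cache[key] = translate(reference, language, slide_index)
    return _translation_cache[key]

calls = []

def fake_translate(reference, language, slide_index):
    calls.append((reference, language, slide_index))
    return b"translated-slide-bytes"

first = cached_translate("ref-123", "German", 3, fake_translate)
second = cached_translate("ref-123", "German", 3, fake_translate)  # cache hit
```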
+ 
+ #### ⛳️ Expectation-driven analysis
+ 
+ - `start_media_analysis(reference: str, priors: object, wait_seconds: int = 55) -> dict`
+   - Purpose: Run expectation-driven analysis over the media’s audio and slides, surfacing *surprises* and *new actors* instead of rehashing everything.
+   - Arguments:
+     - `reference`: Token produced by `start_media_retrieval`.
+     - `priors`: Object with optional string fields:
+       - `context`: Scene setting (participants, venue, goal, spelled names).
+       - `expectations`: What the user already expects to hear.
+       - `prior_knowledge`: What the user already knows from past work.
+       - `questions`: Concrete questions to be answered.
+   - Important: Only populate `priors` with information coming from the user or trusted tools (e.g. the Memory Bank); do not invent priors in the agent.
+   - Returns: Same job envelope pattern as retrieval. When `status: "done"`, the payload includes an `analysis` markdown briefing optimised for fast reading.
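
A filled-in `priors` object might look like this; the values are illustrative and would, in a real agent, come from the user or a trusted Memory Bank:

```python
# Illustrative priors for a hypothetical conference-talk briefing.
priors = {
    "context": "Recorded conference talk; speaker Jane Doe (spelled D-o-e), venue ExampleConf.",
    "expectations": "Mostly a recap of the project's published roadmap.",
    "prior_knowledge": "The 1.0 release shipped last month with a plugin API.",
    "questions": "Were any post-1.0 plugins or partners announced?",
}
```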
+ 
+ - `get_media_analysis_result(reference: str, wait_seconds: int = 0) -> dict`
+   - Purpose: Poll for completion or fetch cached analysis for a `reference`.
+   - Returns:
+     - `status: "done"` with `analysis` text on success.
+     - `status: "pending"|"running"` during processing.
+     - Errors include `is_error: true`, `detail`, `reference`.
+ 
+ #### ✍️ Transcription
+ 
+ - `start_media_transcription(reference: str, context: str = "", prefer_audio_only: bool = False, wait_seconds: int = 55) -> dict`
+   - Purpose: Produce a diarized, speaker-labelled transcription of the media’s audio channel.
+   - Arguments:
+     - `reference`: From `start_media_retrieval`.
+     - `context`: Optional grounding text with names, acronyms, or domain hints.
+     - `prefer_audio_only`: When `true`, skip slide context for cheaper audio-only runs.
+     - `wait_seconds`: Poll window before returning.
+   - Returns: Job envelope, with `transcription` once `status: "done"`.
+ 
+ - `get_media_transcription_result(reference: str, wait_seconds: int = 0) -> dict`
+   - Purpose: Retrieve a previously computed transcription or the current job status.
+   - Returns: Same pattern as `get_media_analysis_result`, but with `transcription` instead of `analysis`.
+ 
+ ## 🏆 Hackathon Context & Journey
+ 
+ Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) hackathon and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)).
+ 
+ While most agents are passive summarizers, Aileen 3 represents a shift toward **active information foraging**, enabling professionals to filter signal from an ocean of noise.
+ 
+ ## 📦 Local Development
 
  ```bash
+ # Build the Docker image
  docker build -t aileen3-core .
+ 
+ # Run the Gradio interface
  docker run -it -p 7860:7860 aileen3-core
  ```
 
+ ## 🛡️ Security & privacy
+ 
+ - Your Gemini API key is used only server-side to call Gemini models.
+ - Media is downloaded to a local cache for repeatability; clear `~/.cache/aileen3` to remove artefacts.
+ - No analytics or third-party telemetry are included.
+ 
+ ## 🚧 Limitations
+ 
+ - `translate_slide` currently does not benefit from priors; supplying them could improve translation quality.
+ - No AI safety guardrails (tone, style, prompt-injection defenses, ...)
+ - No cost control
+ - Hallucination risk: Aileen may make mistakes.
+ - Remote MCP operating mode is untested; it would rely on external access protection.
+ 
+ ## 👾 Troubleshooting
+ 
+ - Gemini 401 “API keys are not supported…”: use an AI Studio key starting with “AI…”, not a Vertex key (“AQ…”).
+ - Long jobs: increase the transport timeout (10–20 min) and use `wait_seconds` plus the polling `get_*` tools.
+ - YouTube access:
+   * ensure YouTube is reachable
+   * `yt-dlp` is recent
+   * if site JS protection breaks, install `yt-dlp-ejs` (see the Space health check).
demo/setup_cell.py CHANGED
@@ -12,7 +12,7 @@ def render_setup_cell() -> gr.Textbox:
      The returned textbox component is used by other cells to pass GEMINI_API_KEY
      into the MCP server environment.
 
-     This is recommended practice for the Gradio/Anthropic hackathon
+     This Space runs your key locally in the container to call Gemini. You can revoke it any time.
      """
      with cell("🔑 Setup: Gemini API key"):
          gr.Markdown(
mcp/README.md CHANGED
@@ -1,6 +1,6 @@
  # Aileen3 MCP Server
 
- Lightweight MCP server exposing a health check tool for local use by the demo app.
+ Lightweight stdio MCP server exposing Aileen 3’s media tools for use by the Gradio demo, Claude Desktop, and other MCP clients.
 
  ## Quick start
 
@@ -9,32 +9,22 @@ python -m pip install -e ./mcp
  aileen3-mcp # starts the stdio MCP server
  ```
 
- The server provides two tools:
-
- 1) `health` returns an `{ "ok": true, "detail": "…" }` payload.
- 2) `search_youtube` — finds YouTube videos using the yt-dlp Python API.
-
- ### `search_youtube` tool contract
-
- - **Purpose:** Lightweight YouTube search (no downloads). Ideal for LLM agents to shortlist videos.
- - **Arguments:**
-   - `query` (str, required): Free-form search terms, e.g. `"lofi hip hop beats"`.
-   - `max_results` (int, optional, default `10`, bounds `1–50`): number of videos to return.
- - **Returns:** object with a `videos` array. Each entry includes `id`, `title`, `webpage_url`,
-   `duration_seconds`, `channel`, `channel_id`, `thumbnail_url`.
- - **Usage note:** Keep `max_results` small (≤10) for faster responses. The tool only searches; it does not download media.
-
- Example MCP tool call shape:
-
- ```json
- {
-   "name": "search_youtube",
-   "arguments": {
-     "query": "python packaging tutorial",
-     "max_results": 5
-   }
- }
- ```
-
- ## ToDo
- * write proper project description: add to README.md and pyproject.toml
+ The server entrypoint is `aileen3_mcp.server.make_app`, which registers all tools on a `FastMCP` instance. For a complete description of available tools (health probes, YouTube search, media retrieval, slide extraction and translation, analysis, transcription), see the project root `README.md` under **“MCP tools and definitions”**.
+
+ In short, the public tools are:
+
+ - `health`
+ - `search_youtube`
+ - `start_media_retrieval` / `get_media_retrieval_status`
+ - `start_slide_extraction` / `get_extracted_slides`
+ - `translate_slide`
+ - `start_media_analysis` / `get_media_analysis_result`
+ - `start_media_transcription` / `get_media_transcription_result`
+
+ These tools are designed to be called from an agentic chat interface that:
+
+ - first chooses a media `source` (optionally using `search_youtube`)
+ - then calls `start_media_retrieval`
+ - and finally uses the `reference` token to drive analysis, transcription, or slide translation.
+
+ For detailed contracts (arguments, return payloads, and example usage), consult `README.md` in the repository root.
mcp/src/aileen3_mcp/media_tools.py CHANGED
@@ -1302,7 +1302,7 @@ def register_media_tools(app: FastMCP) -> None:
      async def start_slide_extraction(ctx: Context, reference: str, wait_seconds: int = 55) -> dict:
          """Extract representative slide stills from a downloaded video.
 
-         Note: media analysis (start_media_analysis) includes slides extraction, so no need to call this function explicitely when aiming for full media analysis
+         Note: media analysis (start_media_analysis) includes slides extraction, so no need to call this function explicitly when aiming for full media analysis
          """
          metadata = _load_json(_metadata_path(reference))
          if not metadata or not Path(metadata.get("download_path", "")).exists():