ndurner committed on
Commit 5f045d0 · 1 Parent(s): 12f6601

improve short desc

Files changed (2):
  1. .github/README.md +23 -92
  2. README.md +1 -1
.github/README.md CHANGED
@@ -118,96 +118,27 @@ When integrating this MCP into your own agent or client:
  - Set transport-level timeouts generously (10–20 minutes) and rely on the tools’ `wait_seconds` argument plus status polling for progress.
  - Ensure `GEMINI_API_KEY` (and any optional `AILEEN3_*` variables you use) are visible in the environment of the MCP server process, not just the client.
 
- ### 🛠️ MCP tools and definitions
- #### 🩺 Health and search
-
- - `health() -> { ok, detail, ffmpeg, gemini_api_key }`
-   - Purpose: Lightweight health probe mirroring the Gradio demo’s health check. Confirms that `ffmpeg` is callable and `GEMINI_API_KEY` is present.
-   - Usage: Call before running longer flows to surface missing runtime dependencies early.
-
- - `search_youtube(query: str, max_results: int = 10) -> { videos: [...] }`
-   - Purpose: Fast YouTube search using `yt-dlp` (no downloads).
-   - Arguments:
-     - `query` (required): Free-form search terms (e.g. `"taler auditor bachelorthesis"`).
-     - `max_results` (optional, default `10`, clamped to `1–50`).
-   - Returns: `videos` list with `id`, `title`, `webpage_url`, `duration_seconds`, `channel`, `channel_id`.
-   - Typical flow: Use from an agent to shortlist candidate videos before picking one `source` for retrieval.
-
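As a sanity check on the search contract above, a client can mirror the documented clamping of `max_results` and pre-filter results before choosing a `source`. This is a hypothetical client-side sketch; `clamp_max_results`, `shortlist`, and the sample data are illustrative, not part of the MCP API:

```python
# Hypothetical client-side helpers around search_youtube results.

def clamp_max_results(n: int) -> int:
    # Mirror the server's documented clamp of max_results to 1-50.
    return max(1, min(50, n))

def shortlist(videos: list[dict], max_duration_s: int = 7200) -> list[dict]:
    # Keep videos within a duration budget, preserving search ranking.
    # Keys follow the documented videos[] shape (duration_seconds etc.).
    return [v for v in videos if (v.get("duration_seconds") or 0) <= max_duration_s]

sample = [
    {"id": "a1", "title": "Short talk", "duration_seconds": 1800},
    {"id": "b2", "title": "Full-day stream", "duration_seconds": 28800},
]
picked = shortlist(sample)
```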
- #### 📺 Media retrieval (entry point)
-
- - `start_media_retrieval(source: str, prefer_audio_only: bool = False, wait_seconds: int = 54) -> dict`
-   - Purpose: Download long-form media (YouTube, podcasts, HTTP URLs) and normalize basic metadata.
-   - Arguments:
-     - `source`: YouTube URL/ID, podcast URL, or other `yt-dlp`-supported locator.
-     - `prefer_audio_only`: When `true`, prefer audio-first formats; use when visuals are not needed.
-     - `wait_seconds`: How long to block before returning; if the job is still running, you get status + reference.
-   - Returns:
-     - On success: `{ reference, status: "done", metadata: {...}, cached? }`
-     - In progress: `{ reference, status: "pending"|"running", progress?, job_id }`
-     - On error: `{ is_error: true, status, detail, reference }`
-   - Typical flow: This is the first call once you have chosen a `source`. The `reference` token is required for all downstream tools.
-
- - `get_media_retrieval_status(reference: str, wait_seconds: int = 0) -> dict`
-   - Purpose: Poll the retrieval job or fetch cached metadata.
-   - Returns:
-     - `{ status: "done", reference, metadata }` when cached or finished.
-     - `{ status: "pending"|"running", ... }` while in flight.
-     - `{ status: "not_found", reference }` if no job or cache exists.
-
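The job-envelope contract above (status `pending`/`running`/`done`, plus `is_error` and `not_found`) lends itself to a small polling helper. A minimal sketch assuming only the documented fields; `wait_for_done` is an illustrative name, and the stubbed states stand in for real `get_media_retrieval_status` responses:

```python
import time
from typing import Callable

def wait_for_done(poll_fn: Callable[[], dict], timeout_s: float = 1200.0,
                  interval_s: float = 0.0) -> dict:
    # poll_fn stands in for get_media_retrieval_status (or the analysis/
    # transcription getters); in real use, interval_s would be several seconds.
    deadline = time.monotonic() + timeout_s
    while True:
        result = poll_fn()
        status = result.get("status")
        if status in ("done", "not_found") or result.get("is_error"):
            return result  # terminal: success, unknown reference, or error
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        time.sleep(interval_s)

# Stubbed job that reports pending, then running, then done.
_states = iter([{"status": "pending", "job_id": "j1"},
                {"status": "running", "job_id": "j1"},
                {"status": "done", "reference": "ref-1", "metadata": {}}])
final = wait_for_done(lambda: next(_states))
```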
- #### 🖼️ Slides: extraction and translation
-
- - `start_slide_extraction(reference: str, wait_seconds: int = 55) -> dict`
-   - Purpose: Extract representative slide stills from a downloaded video.
-   - Note: Full media analysis (`start_media_analysis`) automatically triggers slide extraction; call this explicitly only if you need slides on their own.
-   - Returns: Standard job envelope with `slides` once done or `status` + `job_id` while running.
-
- - `get_extracted_slides(reference: str, wait_seconds: int = 0) -> dict`
-   - Purpose: Fetch extracted slides or current extraction status.
-   - Returns: `{ status: "done", reference, slides: [...] }` on success, otherwise a job status or `{ status: "not_found" }`. Slides include indices that are used by `translate_slide`.
-
- - `translate_slide(reference: str, slide_index: int, language: str) -> ImageContent`
-   - Purpose: Translate a single slide image into another language using Gemini image-to-image.
-   - Arguments:
-     - `reference`: Token from `start_media_retrieval`.
-     - `slide_index`: Zero-based index into `get_extracted_slides.slides[].index`.
-     - `language`: Target language name (e.g. `"German"`, `"Spanish"`).
-   - Returns: `ImageContent` with base64-encoded translated slide image. Responses are cached per `(reference, language, slide_index)`.
-
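Since `translate_slide` returns MCP `ImageContent` (a base64 `data` field plus a `mimeType`), a client must decode it before saving or displaying the slide. A minimal sketch under that assumption, with a fabricated payload in place of a real response:

```python
import base64

def decode_image_content(content: dict) -> bytes:
    # Field names ("data", "mimeType") follow the MCP ImageContent shape;
    # "data" carries the base64-encoded image bytes.
    return base64.b64decode(content["data"])

# Fabricated payload standing in for a real translate_slide response.
fake = {
    "type": "image",
    "mimeType": "image/png",
    "data": base64.b64encode(b"\x89PNG...").decode("ascii"),
}
raw = decode_image_content(fake)
```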
- #### ⛳️ Expectation-driven analysis
-
- - `start_media_analysis(reference: str, priors: object, wait_seconds: int = 55) -> dict`
-   - Purpose: Run expectation-driven analysis over the media’s audio and slides, surfacing *surprises* and *new actors* instead of rehashing everything.
-   - Arguments:
-     - `reference`: Token produced by `start_media_retrieval`.
-     - `priors`: Object with optional string fields:
-       - `context`: Scene setting (participants, venue, goal, spelled names).
-       - `expectations`: What the user already expects to hear.
-       - `prior_knowledge`: What the user already knows from past work.
-       - `questions`: Concrete questions to be answered.
-   - Important: Only populate `priors` with information coming from the user or trusted tools (e.g. Memory Bank); do not invent priors in the agent.
-   - Returns: Same job envelope pattern as retrieval. When `status: "done"`, the payload includes an `analysis` markdown briefing optimised for fast reading.
-
- - `get_media_analysis_result(reference: str, wait_seconds: int = 0) -> dict`
-   - Purpose: Poll for completion or fetch cached analysis for a `reference`.
-   - Returns:
-     - `status: "done"` with `analysis` text on success.
-     - `status: "pending"|"running"` during processing.
-     - Errors include `is_error: true`, `detail`, `reference`.
-
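One way to honour the "do not invent priors" rule above is to assemble the `priors` object only from text the user actually supplied, dropping empty fields. A hypothetical helper; `build_priors` and the sample values are illustrative, while the field names match the documented schema:

```python
def build_priors(context: str = "", expectations: str = "",
                 prior_knowledge: str = "", questions: str = "") -> dict:
    # All four fields are optional strings per the documented schema;
    # omit blanks rather than fabricating priors in the agent.
    fields = {
        "context": context,
        "expectations": expectations,
        "prior_knowledge": prior_knowledge,
        "questions": questions,
    }
    return {k: v for k, v in fields.items() if v.strip()}

# Only user-supplied text goes in; the empty fields are dropped.
priors = build_priors(
    context="Panel on GNU Taler auditing; speakers spelled out by the user",
    questions="Did they discuss auditor APIs?",
)
```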
- #### ✍️ Transcription
-
- - `start_media_transcription(reference: str, context: str = "", prefer_audio_only: bool = False, wait_seconds: int = 55) -> dict`
-   - Purpose: Produce a diarized, speaker-labelled transcription of the media’s audio channel.
-   - Arguments:
-     - `reference`: From `start_media_retrieval`.
-     - `context`: Optional grounding text with names, acronyms, or domain hints.
-     - `prefer_audio_only`: When `true`, skip slide context for cheaper audio-only runs.
-     - `wait_seconds`: Poll window before returning.
-   - Returns: Job envelope, with `transcription` once `status: "done"`.
-
- - `get_media_transcription_result(reference: str, wait_seconds: int = 0) -> dict`
-   - Purpose: Retrieve a previously computed transcription or current job status.
-   - Returns: Same pattern as `get_media_analysis_result`, but with `transcription` instead of `analysis`.
+ ### 🛠️ MCP tools overview
+
+ All tools are registered in `aileen3_mcp.server.make_app` and exposed via a stdio MCP server for use by the Gradio demo, Claude Desktop, and other clients.
+
+ In short, the public tools are:
+
+ - `health`
+ - `search_youtube`
+ - `start_media_retrieval` / `get_media_retrieval_status`
+ - `start_slide_extraction` / `get_extracted_slides`
+ - `translate_slide`
+ - `start_media_analysis` / `get_media_analysis_result`
+ - `start_media_transcription` / `get_media_transcription_result`
+
+ These tools are designed to be called from an agentic chat interface that:
+
+ - first chooses a media `source` (optionally using `search_youtube`)
+ - then calls `start_media_retrieval`
+ - and finally uses the `reference` token to drive analysis, transcription, or slide translation.
+
+ For detailed tool contracts (arguments, return payloads, and error shapes), see `mcp/README.md`.
 
  ## 🏆 Hackathon Context & Journey
  Aileen 3 Core was built for the [MCP's 1st Birthday - Hosted by Anthropic and Gradio](https://huggingface.co/MCP-1st-Birthday) and serves as the backbone for the [Aileen 3 Agent](https://ndurner.de/links/aileen3-kaggle-writeup) (developed for the [AI Agents Intensive Course with Google](https://www.kaggle.com/learn-guide/5-day-agents)).
@@ -231,8 +162,8 @@ docker run -it -p 7860:7860 aileen3-core
 
  ## 🚧 Limitations
  - `translate_slide` does currently not benefit from priors; translation quality could be improved that way
- - No AI safety guardrails (tone, style, anti prompt-injection, ...)
- - No cost control
+ - No AI safety guardrails (tone, style, anti prompt-injection, ...) included
+ - No cost control included
  - Hallucination risk - Aileen may make mistakes.
  - Remote MCP operating mode not tested; would rely on external access protection
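The overall flow described in this diff (search, retrieve, then analyse via the `reference` token) can be sketched with a stubbed tool caller standing in for a real MCP client session; all responses below are canned, not real server output:

```python
def fake_call_tool(name: str, args: dict) -> dict:
    # Stand-in for an MCP client's call_tool; returns canned payloads
    # shaped like the documented tool results.
    canned = {
        "search_youtube": {"videos": [{"id": "abc123", "title": "Demo talk",
                                       "webpage_url": "https://youtu.be/abc123"}]},
        "start_media_retrieval": {"reference": "ref-1", "status": "done",
                                  "metadata": {"title": "Demo talk"}},
        "start_media_analysis": {"reference": "ref-1", "status": "done",
                                 "analysis": "## Briefing\n..."},
    }
    return canned[name]

# 1. shortlist a source, 2. retrieve it, 3. analyse with user-supplied priors
videos = fake_call_tool("search_youtube", {"query": "demo talk"})["videos"]
retrieval = fake_call_tool("start_media_retrieval",
                           {"source": videos[0]["webpage_url"]})
analysis = fake_call_tool("start_media_analysis",
                          {"reference": retrieval["reference"],
                           "priors": {"questions": "What changed?"}})
```

In a real client, each `start_*` call would be followed by status polling via the matching `get_*` tool until `status` is `"done"`.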
README.md CHANGED
@@ -6,7 +6,7 @@ colorTo: blue
  sdk: docker
  pinned: false
  license: cc-by-4.0
- short_description: Expectation-driven briefs from talks, lectures, panels, ...
+ short_description: Use priors to surface novel insights in noisy communications
  tags:
  - building-mcp-track-enterprise
  - building-mcp-track-customer