agentbee

Sleeping

mangubee Claude commited on Jan 13

Commit

61ecfdb

1 Parent(s): 7bb1d7c

fix: clarify tool descriptions to prevent LLM selecting vision for YouTube

Problem: LLM was selecting vision tool for YouTube URLs, which cannot
process them directly because vision description mentioned "YouTube links".

Changes:
- Vision: Removed "videos, YouTube links" from description
- Vision: Added NOTE about using youtube_transcript for YouTube
- YouTube: Enhanced description to be more explicit about being FIRST choice
- YouTube: Added "ONLY tool that can process YouTube URLs directly"

This should guide LLM to select youtube_transcript for YouTube URLs.

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1) hide show

src/tools/__init__.py +3 -3

src/tools/__init__.py CHANGED Viewed

@@ -66,7 +66,7 @@ TOOLS = {
     },
     "vision": {
         "function": analyze_image,
-        "description": "Analyze images or videos using multimodal AI vision models. Describe visual content, identify objects, read text from images, answer questions about photos or screenshots. Use when question mentions images, photos, pictures, videos, YouTube links, or visual content.",
         "parameters": {
             "image_path": {
                 "description": "Path to the image file to analyze",
@@ -82,10 +82,10 @@ TOOLS = {
     },
     "youtube_transcript": {
         "function": youtube_transcript,
-        "description": "Extract transcript from YouTube video URL. Use when question asks about YouTube video content like: dialogue, speech, bird species identification, character quotes, or any content discussed in the video. Handles youtube.com, youtu.be, and shorts URLs. Returns full transcript text or uses Whisper audio transcription as fallback.",
         "parameters": {
             "url": {
-                "description": "YouTube video URL (youtube.com, youtu.be, or shorts)",
                 "type": "string"
             }
         },

     },
     "vision": {
         "function": analyze_image,
+        "description": "Analyze images using multimodal AI vision models. Describe visual content, identify objects, read text from images, answer questions about photos or screenshots. Use when question mentions images, photos, pictures, screenshots, or visual content. NOTE: For YouTube videos, use youtube_transcript tool instead. For video files (MP4, AVI, etc.), the file must be downloaded first.",
         "parameters": {
             "image_path": {
                 "description": "Path to the image file to analyze",
     },
     "youtube_transcript": {
         "function": youtube_transcript,
+        "description": "Extract transcript from YouTube video URLs (youtube.com, youtu.be, shorts). Use this tool FIRST when question mentions YouTube, video, or contains a YouTube URL. This tool handles video content by extracting the transcript (what is said/discussed in the video). Falls back to Whisper audio transcription if captions are unavailable. This is the ONLY tool that can process YouTube URLs directly.",
         "parameters": {
             "url": {
+                "description": "YouTube video URL (youtube.com/watch?v=ID, youtu.be/ID, or shorts/ID format)",
                 "type": "string"
             }
         },