Spaces: KaiserShultz/Ankelodon_AI_Multi_task_agentic_system

KaiserShultz committed on Sep 23, 2025

Commit

af3a044

1 Parent(s): 331400f

Improvements of prompt for planner (old version), adding youtube parser tool, tavily_extract

Files changed (8)

.gitignore +2 -2
questions_20_gaia.json +1 -0
src/__init__.py +95 -10
src/config.py +4 -2
src/prompts/prompts.py +101 -4
src/requirements.txt +21 -0
src/tools/tools.py +33 -0
src/workflow_test.ipynb +34 -18

.gitignore CHANGED

@@ -11,13 +11,13 @@ __pycache__/
 venv/
 env/
 .venv/
 # IDEs
 .vscode/
 .idea/
 *.swp
 *.swo
 # OS
 .DS_Store
 .DS_Store?

 venv/
 env/
 .venv/
+D:/ankelodon_multiagent_system/questions_20_gaia.json
 # IDEs
 .vscode/
 .idea/
 *.swp
 *.swo
+data/
 # OS
 .DS_Store
 .DS_Store?

questions_20_gaia.json ADDED

	@@ -0,0 +1 @@

+ [{"task_id":"8e867cd7-cff9-4e6c-867a-ff5ddc2550be","question":"How many studio albums were published by Mercedes Sosa between 2000 and 2009 (included)? You can use the latest 2022 version of english wikipedia.","Level":"1","file_name":""},{"task_id":"a1e91b78-d3d8-4675-bb8d-62741b4b68a6","question":"In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?","Level":"1","file_name":""},{"task_id":"2d83110e-a098-4ebb-9987-066c06fa42d0","question":".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI","Level":"1","file_name":""},{"task_id":"cca530fc-4052-43b2-b130-b30968d8aa44","question":"Review the chess position provided in the image. It is black's turn. Provide the correct next move for black which guarantees a win. Please provide your response in algebraic notation.","Level":"1","file_name":"cca530fc-4052-43b2-b130-b30968d8aa44.png"},{"task_id":"4fc2f1ae-8625-45b5-ab34-ad4433bc21f8","question":"Who nominated the only Featured Article on English Wikipedia about a dinosaur that was promoted in November 2016?","Level":"1","file_name":""},{"task_id":"6f37996b-2ac7-44b0-8e68-6d28256631b4","question":"Given this table defining * on the set S = {a, b, c, d, e}\n\n|*|a|b|c|d|e|\n|---|---|---|---|---|---|\n|a|a|b|c|b|d|\n|b|b|c|a|e|c|\n|c|c|a|b|b|a|\n|d|b|e|b|e|d|\n|e|d|b|a|d|c|\n\nprovide the subset of S involved in any possible counter-examples that prove * is not commutative. 
Provide your answer as a comma separated list of the elements in the set in alphabetical order.","Level":"1","file_name":""},{"task_id":"9d191bce-651d-4746-be2d-7ef8ecadb9c2","question":"Examine the video at https://www.youtube.com/watch?v=1htKBjuUWec.\n\nWhat does Teal'c say in response to the question \"Isn't that hot?\"","Level":"1","file_name":""},{"task_id":"cabe07ed-9eca-40ea-8ead-410ef5e83f91","question":"What is the surname of the equine veterinarian mentioned in 1.E Exercises from the chemistry materials licensed by Marisa Alviar-Agnew & Henry Agnew under the CK-12 license in LibreText's Introductory Chemistry materials as compiled 08/21/2023?","Level":"1","file_name":""},{"task_id":"3cef3a44-215e-4aed-8e3b-b1e3f08063b7","question":"I'm making a grocery list for my mom, but she's a professor of botany and she's a real stickler when it comes to categorizing things. I need to add different foods to different categories on the grocery list, but if I make a mistake, she won't buy anything inserted in the wrong category. Here's the list I have so far:\n\nmilk, eggs, flour, whole bean coffee, Oreos, sweet potatoes, fresh basil, plums, green beans, rice, corn, bell pepper, whole allspice, acorns, broccoli, celery, zucchini, lettuce, peanuts\n\nI need to make headings for the fruits and vegetables. Could you please create a list of just the vegetables from my list? If you could do that, then I can figure out how to categorize the rest of the list into the appropriate categories. But remember that my mom is a real stickler, so make sure that no botanical fruits end up on the vegetable list, or she won't get them when she's at the store. Please alphabetize the list of vegetables, and place each item in a comma separated list.","Level":"1","file_name":""},{"task_id":"99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3","question":"Hi, I'm making a pie but I could use some help with my shopping list. I have everything I need for the crust, but I'm not sure about the filling. 
I got the recipe from my friend Aditi, but she left it as a voice memo and the speaker on my phone is buzzing so I can't quite make out what she's saying. Could you please listen to the recipe and list all of the ingredients that my friend described? I only want the ingredients for the filling, as I have everything I need to make my favorite pie crust. I've attached the recipe as Strawberry pie.mp3.\n\nIn your response, please only list the ingredients, not any measurements. So if the recipe calls for \"a pinch of salt\" or \"two cups of ripe strawberries\" the ingredients on the list would be \"salt\" and \"ripe strawberries\".\n\nPlease format your response as a comma separated list of ingredients. Also, please alphabetize the ingredients.","Level":"1","file_name":"99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3.mp3"},{"task_id":"305ac316-eef6-4446-960a-92d80d542f82","question":"Who did the actor who played Ray in the Polish-language version of Everybody Loves Raymond play in Magda M.? Give only the first name.","Level":"1","file_name":""},{"task_id":"f918266a-b3e0-4914-865d-4faa564f1aef","question":"What is the final numeric output from the attached Python code?","Level":"1","file_name":"f918266a-b3e0-4914-865d-4faa564f1aef.py"},{"task_id":"3f57289b-8c60-48be-bd80-01f8099ca449","question":"How many at bats did the Yankee with the most walks in the 1977 regular season have that same season?","Level":"1","file_name":""},{"task_id":"1f975693-876d-457b-a649-393859e79bf3","question":"Hi, I was out sick from my classes on Friday, so I'm trying to figure out what I need to study for my Calculus mid-term next week. My friend from class sent me an audio recording of Professor Willowbrook giving out the recommended reading for the test, but my headphones are broken :(\n\nCould you please listen to the recording for me and tell me the page numbers I'm supposed to go over? I've attached a file called Homework.mp3 that has the recording. 
Please provide just the page numbers as a comma-delimited list. And please provide the list in ascending order.","Level":"1","file_name":"1f975693-876d-457b-a649-393859e79bf3.mp3"},{"task_id":"840bfca7-4f7b-481a-8794-c560c340185d","question":"On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?","Level":"1","file_name":""},{"task_id":"bda648d7-d618-4883-88f4-3466eabd860e","question":"Where were the Vietnamese specimens described by Kuznetzov in Nedoshivina's 2010 paper eventually deposited? Just give me the city name without abbreviations.","Level":"1","file_name":""},{"task_id":"cf106601-ab4f-4af9-b045-5295fe67b37d","question":"What country had the least number of athletes at the 1928 Summer Olympics? If there's a tie for a number of athletes, return the first in alphabetical order. Give the IOC country code as your answer.","Level":"1","file_name":""},{"task_id":"a0c07678-e491-4bbc-8f0b-07405144218f","question":"Who are the pitchers with the number before and after Taishō Tamai's number as of July 2023? Give them to me in the form Pitcher Before, Pitcher After, use their last names only, in Roman characters.","Level":"1","file_name":""},{"task_id":"7bd855d8-463d-4ed5-93ca-5fe35145f733","question":"The attached Excel file contains the sales of menu items for a local fast-food chain. What were the total sales that the chain made from food (not including drinks)? 
Express your answer in USD with two decimal places.","Level":"1","file_name":"7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx"},{"task_id":"5a0c1adf-205e-4841-a666-7c3ef95def9d","question":"What is the first name of the only Malko Competition recipient from the 20th Century (after 1977) whose nationality on record is a country that no longer exists?","Level":"1","file_name":""}]
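
A minimal sketch of how the added question file might be consumed, assuming the record layout visible in the diff (`task_id`, `question`, `Level`, `file_name`). The inline `SAMPLE` holds two records copied from the file, and `split_by_attachment` is a hypothetical helper used only for illustration; it is not part of the repository.

```python
import json

# Two records in the same shape as questions_20_gaia.json.
SAMPLE = """[
  {"task_id": "305ac316-eef6-4446-960a-92d80d542f82",
   "question": "Who did the actor who played Ray in the Polish-language version of Everybody Loves Raymond play in Magda M.? Give only the first name.",
   "Level": "1", "file_name": ""},
  {"task_id": "f918266a-b3e0-4914-865d-4faa564f1aef",
   "question": "What is the final numeric output from the attached Python code?",
   "Level": "1", "file_name": "f918266a-b3e0-4914-865d-4faa564f1aef.py"}
]"""


def split_by_attachment(tasks):
    """Return (with_file, without_file) lists based on the file_name field."""
    with_file = [t for t in tasks if t["file_name"]]
    without_file = [t for t in tasks if not t["file_name"]]
    return with_file, without_file


tasks = json.loads(SAMPLE)
with_file, without_file = split_by_attachment(tasks)
print(len(with_file), len(without_file))
```

Tasks with a non-empty `file_name` are the ones that would need a file-preparation step before the agent runs.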

src/__init__.py CHANGED

@@ -4,16 +4,101 @@ Import key components for easy use:
 from src import workflow, llm
 """
-from .config import llm, TOOLS, CONFIG, TOOL_NODE, planner_llm
-from .agent import workflow, build_workflow, should_continue
-from .nodes import agent, planner, query_input, critique
-from .schemas import AgentState, PlannerPlan, ComplexityLevel, CritiqueFeedback
 __version__ = "0.1.0"
 __all__ = [
-    "llm", "TOOLS", "CONFIG", "TOOL_NODE", "planner_llm",
-    "workflow", "build_workflow", "should_continue",
-    "agent", "planner", "query_input", "critique",
-    "AgentState", "PlannerPlan", "ComplexityLevel", "CritiqueFeedback",
-    "__version__"
-]

 from src import workflow, llm
 """
+"""
+Ankelodon Multi-Agent System – package init.
+Exports a convenient public API for working with the graph, agent state,
+schemas, and config. Place this file in the directory that contains:
+agent.py, config.py, nodes.py, schemas.py, state.py
+(in your case, src/).
+"""
+# Package version (optionally update it manually or from git)
 __version__ = "0.1.0"
+# ── Graph/build
+from .agent import build_workflow
+# ── State
+from .state import AgentState
+# ── Schemas/models
+from .schemas import (
+    ComplexityLevel,
+    CritiqueFeedback,
+    PlannerPlan,
+    PlanStep,
+    ExecutionReport,
+    ToolExecution,
+    TaskType,
+)
+# ── Config/LLM/Tools
+from .config import (
+    config,
+    TOOLS,
+    DEBUGGING_TOOL_NODE,
+    llm,
+    llm_deterministic,
+    planner_llm,
+    llm_with_tools,
+    llm_criticist,
+    llm_reasoning,
+)
+# ── Nodes/routers (if they need to be called directly or in tests)
+from .nodes import (
+    query_input,
+    complexity_assessor,
+    planner,
+    agent,
+    simple_executor,
+    critic_evaluator,
+    replanner,
+    enhanced_finalizer,
+    # routers
+    should_continue,
+    should_use_planning,
+    should_replan,
+    should_use_tools_simple_executor,
+)
 __all__ = [
+    # version
+    "__version__",
+    # graph build
+    "build_workflow",
+    # state
+    "AgentState",
+    # schemas
+    "ComplexityLevel",
+    "CritiqueFeedback",
+    "PlannerPlan",
+    "PlanStep",
+    "ExecutionReport",
+    "ToolExecution",
+    "TaskType",
+    # config/models/tools
+    "config",
+    "TOOLS",
+    "DEBUGGING_TOOL_NODE",
+    "llm",
+    "llm_deterministic",
+    "planner_llm",
+    "llm_with_tools",
+    "llm_criticist",
+    "llm_reasoning",
+    # nodes and routers
+    "query_input",
+    "complexity_assessor",
+    "planner",
+    "agent",
+    "simple_executor",
+    "critic_evaluator",
+    "replanner",
+    "enhanced_finalizer",
+    "should_continue",
+    "should_use_planning",
+    "should_replan",
+    "should_use_tools_simple_executor",
+]

src/config.py CHANGED

@@ -14,7 +14,7 @@ TOOLS = [download_file_from_url, web_search,
          arxiv_search, wiki_search, add, subtract, multiply, divide,
          power, analyze_excel_file, analyze_csv_file, analyze_docx_file,
          analyze_pdf_file, analyze_txt_file,
-         vision_qa_gemma, safe_code_run]
 TOOL_NODE = ToolNode(TOOLS)
@@ -23,9 +23,11 @@ DEBUGGING_TOOL_NODE = TOOL_NODE
 llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7) #default 0.25
 llm_deterministic = ChatOpenAI(model="gpt-5-mini", temperature=0.05)
 planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1).with_structured_output(PlannerPlan)
-llm_criticist = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
 llm_with_tools = llm_deterministic.bind_tools(TOOLS)
 llm_reasoning = ChatOpenAI(model="gpt-5-mini", temperature=0.3)

          arxiv_search, wiki_search, add, subtract, multiply, divide,
          power, analyze_excel_file, analyze_csv_file, analyze_docx_file,
          analyze_pdf_file, analyze_txt_file,
+         vision_qa_gemma, safe_code_run, web_extract, extract_youtube_transcript]
 TOOL_NODE = ToolNode(TOOLS)
 llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7) #default 0.25
 llm_deterministic = ChatOpenAI(model="gpt-5-mini", temperature=0.05)
 planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1).with_structured_output(PlannerPlan)
+llm_criticist = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
 llm_with_tools = llm_deterministic.bind_tools(TOOLS)
 llm_reasoning = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
+llm_simple_executor = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
+llm_simple_with_tools = llm_simple_executor.bind_tools(TOOLS)
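
Because the planner prompt asks for tool names exactly as listed, a plan can be checked against the registered names before execution. A minimal sketch, assuming plans arrive as plain dicts in the shape the planner prompt requests and that each tool's registered name matches its function name. `TOOL_NAMES` is a hand-written subset of the names in `TOOLS`, and `unknown_tools` is a hypothetical helper, not part of the repository.

```python
# Hand-written subset of the tool names registered in TOOLS (assumption:
# each tool's registered name matches its function name).
TOOL_NAMES = {
    "web_search",
    "web_extract",
    "extract_youtube_transcript",
    "download_file_from_url",
    "analyze_pdf_file",
    "safe_code_run",
}


def unknown_tools(plan, tool_names=TOOL_NAMES):
    """Return (step id, tool) pairs whose tool is not registered.

    A tool of None means "no tool needed" and is always accepted.
    """
    return [
        (step["id"], step["tool"])
        for step in plan["steps"]
        if step.get("tool") is not None and step["tool"] not in tool_names
    ]


plan = {
    "steps": [
        {"id": "s1", "goal": "Search the web", "tool": "web_search"},
        {"id": "s2", "goal": "Extract page text", "tool": "not_a_registered_tool"},
        {"id": "s3", "goal": "Summarize findings", "tool": None},
    ]
}
print(unknown_tools(plan))
```

A non-empty result could be used to trigger a replan instead of letting the executor fail on a missing tool.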

src/prompts/prompts.py CHANGED

@@ -1,4 +1,4 @@
-SYSTEM_PROMPT_PLANNER = """
 You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
 Available tools: {tool_catalogue}
@@ -25,7 +25,7 @@ Example 2: "Research recent AI developments and summarize key trends"
 {{
   "steps": [
     {{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
-    {{"id": "s2", "goal": "Download relevant articles", "tool": "ddownload_file_from_url"}},
     {{"id": "s3", "goal": "Extract and organize key information from articles", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
     {{"id": "s4", "goal": "Analyze and synthesize key trends from gathered information", "tool": null}}
   ]
@@ -73,7 +73,7 @@ Ground rules:
 - Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
 - Break down complex tasks into logical components - don't try to solve everything at once
 - Use tool names exactly as listed. If no tool is needed, set "tool": null.
-- Never assume files or URLs exist—plan to search/download before analysing.
 - Skip download steps when the required file is already provided.
 - Ensure later steps only depend on results created by earlier steps.
 - For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation
@@ -82,6 +82,103 @@ Ground rules:
 - Plan for visualization or formatting steps when presenting complex results
 """
 SYSTEM_EXECUTOR_PROMPT = """
 You are the executor of a grounded multi-tool agent.
@@ -146,7 +243,7 @@ ASSESSMENT CRITERIA:
 SPECIAL CONSIDERATIONS:
 - Any calculation/counting task requires tools (affects complexity assessment)
 - File analysis tasks usually need multiple steps (load + analyze + calculate)
-- Research tasks typically need search + fetch + synthesis steps
 - Comparison tasks need separate analysis steps for each item being compared
 RULES:

+SYSTEM_PROMPT_PLANNER_OLD = """
 You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
 Available tools: {tool_catalogue}
 {{
   "steps": [
     {{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
+    {{"id": "s2", "goal": "Extract all info from the found URLs", "tool": "web_extract"}},
     {{"id": "s3", "goal": "Extract and organize key information from articles", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
     {{"id": "s4", "goal": "Analyze and synthesize key trends from gathered information", "tool": null}}
   ]
 - Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
 - Break down complex tasks into logical components - don't try to solve everything at once
 - Use tool names exactly as listed. If no tool is needed, set "tool": null.
+- Never assume files or URLs exist—plan to search/extract before analysing.
 - Skip download steps when the required file is already provided.
 - Ensure later steps only depend on results created by earlier steps.
 - For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation
 - Plan for visualization or formatting steps when presenting complex results
 """
+SYSTEM_PROMPT_PLANNER = """
+You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
+Available tools: {tool_catalogue}
+Known local files: {file_list}
+Additional context: {extra_context}
+CRITICAL COMPUTATION RULE: ANY mathematical calculation, counting, statistical analysis, or numerical computation MUST be performed using either:
+- Mathematical tools (calculator, math functions) for simple calculations
+- Code execution tools (Python/JavaScript) for complex calculations, data analysis, or statistical operations
+NEVER perform calculations manually or estimate numerical results.
+TASK BREAKDOWN EXAMPLES:
+Example 1: "Analyze sales data and calculate growth rates"
+{{
+  "steps": [
+    {{"id": "s1", "goal": "Load and examine the sales data file", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
+    {{"id": "s2", "goal": "Calculate monthly growth rates using Python", "tool": "safe_code_run"}},
+    {{"id": "s3", "goal": "Generate summary statistics and trends", "tool": "safe_code_run"}}
+  ]
+}}
+Example 2: "Research recent AI developments and summarize key trends"
+{{
+  "steps": [
+    {{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
+    {{"id": "s2", "goal": "Extract key links and pick relevant documents (PDF, reports)", "tool": "web_extract"}},
+    {{"id": "s3", "goal": "Download chosen report for detailed analysis", "tool": "download_file_from_url"}},
+    {{"id": "s4", "goal": "Analyze the downloaded document (PDF/DOCX/TXT)", "tool": "analyze_pdf_file"}},
+    {{"id": "s5", "goal": "Summarize and synthesize key insights from the analyzed content", "tool": null}}
+  ]
+}}
+Example 3: "Compare performance metrics between two datasets"
+{{
+  "steps": [
+    {{"id": "s1", "goal": "Load first dataset and examine structure", "tool": "analyze_csv_file"}},
+    {{"id": "s2", "goal": "Load second dataset and examine structure", "tool": "analyze_excel_file"}},
+    {{"id": "s3", "goal": "Calculate statistical metrics for both datasets using code", "tool": "safe_code_run"}},
+    {{"id": "s4", "goal": "Perform statistical comparison and significance testing", "tool": "safe_code_run"}}
+  ]
+}}
+Example 4: "Create a budget analysis from expense data"
+{{
+  "steps": [
+    {{"id": "s1", "goal": "Load expense data and validate format", "tool": "analyze_csv_file"}},
+    {{"id": "s2", "goal": "Calculate category totals and percentages using code", "tool": "safe_code_run"}},
+    {{"id": "s3", "goal": "Generate budget variance analysis and projections", "tool": "safe_code_run"}},
+    {{"id": "s4", "goal": "Create visualization of spending patterns", "tool": "safe_code_run"}}
+  ]
+}}
+Example 5: "Find and analyze a scientific PDF report on renewable energy"
+{{
+  "steps": [
+    {{"id": "s1", "goal": "Search the web for renewable energy PDF reports", "tool": "web_search"}},
+    {{"id": "s2", "goal": "Extract candidate PDF links from the search results", "tool": "web_extract"}},
+    {{"id": "s3", "goal": "Download the most relevant PDF document", "tool": "download_file_from_url"}},
+    {{"id": "s4", "goal": "Parse and extract text from the downloaded PDF", "tool": "analyze_pdf_file"}},
+    {{"id": "s5", "goal": "Summarize findings and highlight key trends in renewable energy", "tool": null}}
+  ]
+}}
+Return a single JSON object with this structure:
+{{
+  "task_type": "info|calc|table|doc_qa|image_qa|multi_hop",
+  "summary": "One sentence on the chosen approach",
+  "assumptions": ["optional clarifications"],
+  "steps": [
+    {{
+      "id": "s1",
+      "goal": "Action to take and why it helps",
+      "tool": "tool_name_or_null",
+      "inputs": "Key parameters or references (files, URLs, prior steps)",
+      "expected_result": "How you know the step succeeded",
+      "on_fail": "replan|stop"
+    }}
+  ],
+  "answer_guidelines": "Reminders for the final response (citations, format, units, etc.)"
+}}
+Ground rules:
+- Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
+- Break down complex tasks into logical components - don't try to solve everything at once.
+- Use tool names exactly as listed. If no tool is needed, set "tool": null.
+- Never assume files or URLs exist—plan to search/extract before analysing.
+- Skip download steps when the required file is already provided.
+- Ensure later steps only depend on results created by earlier steps.
+- For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation.
+- If the query involves analysis of multiple sources, plan separate steps for each source.
+- Consider data validation and error checking as separate steps when handling files.
+- Plan for visualization or formatting steps when presenting complex results.
+"""
 SYSTEM_EXECUTOR_PROMPT = """
 You are the executor of a grounded multi-tool agent.
 SPECIAL CONSIDERATIONS:
 - Any calculation/counting task requires tools (affects complexity assessment)
 - File analysis tasks usually need multiple steps (load + analyze + calculate)
+- Research tasks typically need search + fetch/extract + synthesis steps
 - Comparison tasks need separate analysis steps for each item being compared
 RULES:
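
The doubled braces in the planner prompt keep the JSON examples literal while `{tool_catalogue}`, `{file_list}` and `{extra_context}` remain placeholders. A minimal sketch, assuming the template is filled with Python's `str.format` (the diff does not show the call site). `TEMPLATE` is a shortened stand-in for illustration, not the real prompt.

```python
# Shortened stand-in for the planner prompt: single braces are placeholders,
# doubled braces are escaped and come out as literal JSON braces.
TEMPLATE = """Available tools: {tool_catalogue}
Known local files: {file_list}
{{
  "steps": [
    {{"id": "s1", "goal": "Search", "tool": "web_search"}}
  ]
}}"""

rendered = TEMPLATE.format(
    tool_catalogue="web_search, web_extract",
    file_list="none",
)
print(rendered)
```

Forgetting to double a brace in a JSON example would make `str.format` treat it as a placeholder and raise an error at render time.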

src/requirements.txt ADDED

	@@ -0,0 +1,21 @@

+docx==0.2.4
+gradio==5.46.1
+ipython==8.12.3
+langchain==0.3.27
+langchain_community==0.3.29
+langchain_core==0.3.76
+langchain_openai==0.3.33
+langgraph==0.6.7
+matplotlib==3.8.2
+numpy==2.3.3
+pandas==2.3.2
+pdfminer==20191125
+Pillow==11.3.0
+protobuf==6.32.1
+pydantic==2.11.9
+pytesseract==0.3.13
+python-dotenv==1.1.1
+Requests==2.32.5
+tldextract==5.3.0
+langchain-tavily
+youtube-transcript-api

src/tools/tools.py CHANGED

@@ -6,6 +6,8 @@ import base64
 import tldextract
 import tempfile
 from urllib.parse import urlparse
 import io
 import pandas as pd
 from typing import List, Optional, Dict, Any
@@ -18,6 +20,7 @@ from langchain_community.document_loaders import ArxivLoader
 from langchain_community.document_loaders import WikipediaLoader
 from PIL import ImageDraw, ImageFont, ImageEnhance, ImageFilter
 from utils.image_processing import *
 def _exif_dict(img: Image.Image) -> dict:
     try:
@@ -38,6 +41,7 @@ def _clip(text: str | None, n: int) -> str:
     return (text[: n - 1] + "…") if len(text) > n else text
 def _parse_dt(v) -> Optional[str]:
     """[CHANGE] Convert dates to an ISO string where possible."""
     try:
@@ -360,6 +364,35 @@ def arxiv_search(
         return json.dumps({"error": str(e), "query": query, "provider": "arxiv"})
 #----------------------------------------------MATH TOOLS------------------------------------------------#

 import tldextract
 import tempfile
 from urllib.parse import urlparse
+from langchain_tavily import TavilyExtract
+from youtube_transcript_api import YouTubeTranscriptApi
 import io
 import pandas as pd
 from typing import List, Optional, Dict, Any
 from langchain_community.document_loaders import WikipediaLoader
 from PIL import ImageDraw, ImageFont, ImageEnhance, ImageFilter
 from utils.image_processing import *
+import re
 def _exif_dict(img: Image.Image) -> dict:
     try:
     return (text[: n - 1] + "…") if len(text) > n else text
 def _parse_dt(v) -> Optional[str]:
     """[CHANGE] Convert dates to an ISO string where possible."""
     try:
         return json.dumps({"error": str(e), "query": query, "provider": "arxiv"})
+@tool
+def web_extract(urls: List[str]) -> str:
+    """
+    Extract text content from web pages using TavilyExtract.
+    Returns JSON with {url, title, text, images?} for each URL.
+    """
+    tool = TavilyExtract(
+        extract_depth="basic",
+        include_images=False,
+    )
+    # TavilyExtract takes its arguments as a dict keyed by "urls".
+    results = tool.invoke({"urls": urls})
+    return json.dumps(results)
+@tool
+def extract_youtube_transcript(url: str, chars: int = 1_000) -> str:
+    """
+    Fetch a YouTube transcript and return its first *chars* characters.
+    """
+    video_id_match = re.search(r"[?&]v=([A-Za-z0-9_\-]{11})", url)
+    if not video_id_match:
+        return "yt_error:id_not_found"
+    try:
+        transcript = YouTubeTranscriptApi.get_transcript(video_id_match.group(1))
+        text = " ".join(piece["text"] for piece in transcript)
+        return text[:chars]
+    except Exception as exc:
+        return f"yt_error:{exc}"
 #----------------------------------------------MATH TOOLS------------------------------------------------#
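
The regex in `extract_youtube_transcript` only matches `?v=` or `&v=` query parameters, so short links such as `https://youtu.be/<id>` end in `yt_error:id_not_found`. A minimal sketch of a broader ID extractor using only the standard library. `youtube_video_id` and the set of accepted hosts are assumptions for illustration, not part of the commit.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters from this alphabet (same as the diff's regex).
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
# Assumed list of hosts to accept; extend as needed.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url):
    """Return the 11-character video ID, or None if it cannot be found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    return candidate if candidate and _ID_RE.match(candidate) else None


print(youtube_video_id("https://www.youtube.com/watch?v=L1vXCYZAYYM"))
print(youtube_video_id("https://youtu.be/1htKBjuUWec"))
```

Checking the host first also avoids treating an unrelated URL that happens to carry a `v=` parameter as a YouTube link.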

src/workflow_test.ipynb CHANGED

@@ -42,34 +42,43 @@
       "💡 ════════════════════\n",
       "💡  USER QUERY \n",
       "💡 ════════════════════\n",
-      "   • files: none provided\n",
       "=== COMPLEXITY ASSESSMENT ===\n",
       "Complexity: simple\n",
       "Needs planning: False\n",
-      "Reasoning: This is a single-step arithmetic question (2+2). Although calculations technically require a tool per the special considerations, this is trivial and requires only one immediate operation, so it is SIMPLE.\n",
       "=== SIMPLE EXECUTION ===\n",
       "Response generated for simple query.\n",
       "=== GENERATING EXECUTION REPORT ===\n",
       "Report generated - Confidence: high\n",
       "Key findings: 3\n",
       "Data sources: 2\n",
-      "query_summary=\"User asked for the numeric result of the arithmetic expression '2+2'.\" approach_used=\"Direct evaluation using basic arithmetic: interpreted '+' as standard integer addition and computed the sum mentally without invoking external tools or files.\" tools_executed=[] key_findings=[\"The expression '2+2' was interpreted as standard integer addition.\", 'Computed result is 4.', 'No external tools or data were required to compute the result.'] data_sources=['Basic arithmetic rules (internal knowledge)', 'Conversation history confirming the query and an earlier direct answer'] assumptions_made=[\"The '+' operator denotes standard arithmetic addition on integers.\", 'Numbers are in the usual base-10 system and no special context (e.g., modular arithmetic or symbolic manipulation) was intended.'] confidence_level='high' limitations=['If the user intended a nonstandard context (modulo arithmetic, different base, or overloaded operator semantics), the answer could differ.', 'Extremely simple query; few realistic limitations beyond contextual ambiguity.'] final_answer='4'\n",
       "=== ENHANCED ANSWER CRITIQUE ===\n",
-      "Quality Score: 8/10\n",
       "Complete: True\n",
       "Accurate: True\n",
-      "Issues found: [\"Performed the calculation mentally rather than using an external computational tool (triggers the evaluation framework's manual-calculation penalty).\"]\n",
       "=== REPLAN DECISION ===\n",
       "Iteration: 1/10\n",
-      "Quality score: 8\n",
       "Needs replanning: False\n",
       "Quality acceptable, ending execution\n"
      ]
     }
    ],
    "source": [
-    "query = \"What is 2+2\"\n",
-    "result = graph.invoke({\"query\" : query, \"current_step\": 0, \"reasoning_done\": False, \"files\" : [], \"files_contents\" : {}, \"iteration_count\" : 0, \"max_iterations\" : 10, \"plan\" : None} , config = config)"
    ]
   },
   {
@@ -81,7 +90,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "FINAL ANSWER: 4\n"
      ]
     }
    ],
@@ -97,20 +106,27 @@
     {
      "data": {
       "text/plain": [
-       "{'messages': [SystemMessage(content='You are a COMPLEXITY ASSESSOR for a multi-tool agent system.\\nYour job is to analyze user queries and determine their complexity level and processing requirements.\\n\\nCOMPLEXITY LEVELS:\\n1. SIMPLE: Direct questions that can be answered immediately without tools or with single tool use\\n   - Examples: \"What is photosynthesis?\", \"Define machine learning\", \"What\\'s the capital of France?\"\\n   - NOTE: Simple math like \"2+2\" still requires calculator tool but counts as SIMPLE\\n\\n   !ALSO: It can be a logical reasoning or explanation task that does not require tools.\\n   \\n2. MODERATE: Questions requiring 2-4 tool calls or basic multi-step analysis\\n   - Examples: \"Search for recent news about AI\", \"Analyze this CSV file for trends\", \"Calculate ROI from this data\"\\n   - \"Compare two datasets\", \"Summarize multiple documents\"\\n   \\n3. COMPLEX: Multi-step problems requiring planning, multiple tools, and sophisticated reasoning\\n   - Examples: \"Research market trends and create investment strategy\", \"Analyze multiple data sources and predict outcomes\"\\n   - \"Build comprehensive report from various inputs\", \"Multi-stage data processing with validation\"\\n\\nMOST OF THE LOGICAL TASKS ARE SIMPLE, UNLESS THEY REQUIRE TOOLS.\\n\\nASSESSMENT CRITERIA:\\n- Number of distinct steps likely needed (1 = Simple, 2-4 = Moderate, 5+ = Complex)\\n- Tool complexity and dependencies between steps\\n- Data processing requirements and validation needs\\n- Need for intermediate reasoning and synthesis\\n- Risk of failure without proper step-by-step planning\\n- Presence of calculations (automatically requires tool usage)\\n\\nSPECIAL CONSIDERATIONS:\\n- Any calculation/counting task requires tools (affects complexity assessment)\\n- File analysis tasks usually need multiple steps (load + analyze + calculate)\\n- Research tasks typically need search + fetch + synthesis steps\\n- Comparison tasks need 
separate analysis steps for each item being compared\\n\\nRULES:\\n- SIMPLE queries may bypass planning for non-calculation tasks\\n- MODERATE queries benefit from lightweight planning\\n- COMPLEX queries require full planning with fallbacks\\n- When in doubt, err toward higher complexity\\n- Calculation tasks are never truly \"simple\" due to mandatory tool usage\\n\\nAnalyze the query and respond with your assessment.', additional_kwargs={}, response_metadata={}, id='db109164-6e6e-4c1f-82bb-93d6d9b64e6a'),\n",
-       "  HumanMessage(content='Query: What is 2+2', additional_kwargs={}, response_metadata={}, id='6b9afadb-3463-40a2-989b-19f8a237f7fc'),\n",
-       "  AIMessage(content='2 + 2 = 4', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 80, 'prompt_tokens': 1638, 'total_tokens': 1718, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 64, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CId3zSwgGIoDxYMuwG2xJfCLDiVuM', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--210d298d-a542-4458-8933-93ebf4c7bac0-0', usage_metadata={'input_tokens': 1638, 'output_tokens': 80, 'total_tokens': 1718, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 64}})],\n",
-       " 'query': 'What is 2+2',\n",
-       " 'final_answer': 'FINAL ANSWER: 4',\n",
        " 'plan': None,\n",
-       " 'complexity_assessment': ComplexityLevel(level='simple', reasoning='This is a single-step arithmetic question (2+2). Although calculations technically require a tool per the special considerations, this is trivial and requires only one immediate operation, so it is SIMPLE.', needs_planning=False, suggested_approach='Perform the basic arithmetic (2+2) and return the result (4). No detailed planning or multi-step processing needed.'),\n",
        " 'current_step': 0,\n",
        " 'reasoning_done': False,\n",
-       " 'files': [],\n",
-       " 'critique_feedback': CritiqueFeedback(quality_score=8, is_complete=True, is_accurate=True, missing_elements=[], errors_found=[\"Performed the calculation mentally rather than using an external computational tool (triggers the evaluation framework's manual-calculation penalty).\"], suggested_improvements=['Use a computational tool or explicitly show the calculation steps even for trivial arithmetic to avoid the manual-calculation policy violation (e.g., evaluate with a calculator tool or print the operation and result).', \"Explicitly state assumptions up front (that '+' is standard integer addition in base 10) and, when relevant, ask a clarifying question if the user might have meant a nonstandard interpretation (modular arithmetic, different base, operator overloading).\", 'For transparency, include a short note citing the arithmetic rule used (e.g., basic integer addition) when delivering the result, even though the operation is trivial.'], needs_replanning=False, replan_instructions=None),\n",
        " 'iteration_count': 1,\n",
        " 'max_iterations': 10,\n",
-       " 'execution_report': ExecutionReport(query_summary=\"User asked for the numeric result of the arithmetic expression '2+2'.\", approach_used=\"Direct evaluation using basic arithmetic: interpreted '+' as standard integer addition and computed the sum mentally without invoking external tools or files.\", tools_executed=[], key_findings=[\"The expression '2+2' was interpreted as standard integer addition.\", 'Computed result is 4.', 'No external tools or data were required to compute the result.'], data_sources=['Basic arithmetic rules (internal knowledge)', 'Conversation history confirming the query and an earlier direct answer'], assumptions_made=[\"The '+' operator denotes standard arithmetic addition on integers.\", 'Numbers are in the usual base-10 system and no special context (e.g., modular arithmetic or symbolic manipulation) was intended.'], confidence_level='high', limitations=['If the user intended a nonstandard context (modulo arithmetic, different base, or overloaded operator semantics), the answer could differ.', 'Extremely simple query; few realistic limitations beyond contextual ambiguity.'], final_answer='4')}"
       ]
      },
      "execution_count": 5,

       "💡 ════════════════════\n",
       "💡  USER QUERY \n",
       "💡 ════════════════════\n",
+      "Processing 1 files:\n",
+      "\n",
+      "📁 ════════════════════\n",
+      "📁  FILE PREPARATION \n",
+      "📁 ════════════════════\n",
+      "📁 Processing 1 file(s)\n",
+      "  - D:/ankelodon_multiagent_system/data/Screenshot_1.png: image (979539 bytes) -> vision_qa_gemma\n",
+      "   • path: D:/ankelodon_multiagent_system/data/Screenshot_1.png\n",
+      "   • type: image\n",
+      "   • size: 979539 bytes\n",
+      "   • suggested_tool: vision_qa_gemma\n",
       "=== COMPLEXITY ASSESSMENT ===\n",
       "Complexity: simple\n",
       "Needs planning: False\n",
+      "Reasoning: A mate-in-one is a single-step chess puzzle: it requires identifying one legal white move that delivers checkmate. That is one distinct step, no multi-stage data processing, and no specialized tools are required (unless you want to verify moves with a chess engine). Note: the user did not supply a board position (FEN/diagram/image); if the position is provided the task remains SIMPLE.\n",
       "=== SIMPLE EXECUTION ===\n",
       "Response generated for simple query.\n",
       "=== GENERATING EXECUTION REPORT ===\n",
       "Report generated - Confidence: high\n",
       "Key findings: 3\n",
       "Data sources: 2\n",
+      "query_summary='User requested the mate-in-one move for White from a provided chess diagram and asked to return only the algebraic notation of the move.' approach_used='Used an image-based QA tool (vision_qa_gemma) to identify the chess position from the supplied screenshot, determine that it is White to move, compute the mate-in-one, and return the move in algebraic notation.' tools_executed=[ToolExecution(tool_name='vision_qa_gemma', arguments='{\\'question\\': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\", \\'image_path\\': \\'D:/ankelodon_multiagent_system/data/Screenshot_1.png\\', \\'temperature\\': 0.2}', call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')] key_findings=['The position in the provided image was analyzed and confirmed to be White to move.', 'A forced mate in one was identified for White.', 'The mate-in-one move in algebraic notation is: Qf7#.'] data_sources=['Screenshot image: D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'vision_qa_gemma tool output (see Tools Executed)'] assumptions_made=[\"Standard algebraic notation is used (including '#' for mate).\", 'Board orientation is conventional with White pieces at the bottom (as interpreted from the image).', 'The image is an accurate and complete representation of the game state (no hidden or obscured pieces).'] confidence_level='high' limitations=['Result depends on the accuracy of the vision tool interpreting the image; any misread piece or orientation could change the correct move.', 'Only one tool/analysis pass was used; no independent engine verification was run here.', 'If there are multiple legal mate-in-one moves in the position, this report records the move returned by the tool without enumerating alternatives.'] final_answer='Qf7#'\n",
       "=== ENHANCED ANSWER CRITIQUE ===\n",
+      "Quality Score: 7/10\n",
       "Complete: True\n",
       "Accurate: True\n",
       "=== REPLAN DECISION ===\n",
       "Iteration: 1/10\n",
+      "Quality score: 7\n",
       "Needs replanning: False\n",
       "Quality acceptable, ending execution\n"
      ]
     }
    ],
    "source": [
+    "query = \"Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.\"\n",
+    "result = graph.invoke({\"query\": query, \"current_step\": 0, \"reasoning_done\": False, \"files\": [\"D:/ankelodon_multiagent_system/data/Screenshot_1.png\"], \"file_contents\": {}, \"iteration_count\": 0, \"max_iterations\": 10, \"plan\": None}, config=config)"
    ]
   },
   {
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "FINAL ANSWER: Qf7#\n"
      ]
     }
    ],
     {
      "data": {
       "text/plain": [
+       "{'messages': [SystemMessage(content='You are a COMPLEXITY ASSESSOR for a multi-tool agent system.\\nYour job is to analyze user queries and determine their complexity level and processing requirements.\\n\\nCOMPLEXITY LEVELS:\\n1. SIMPLE: Direct questions that can be answered immediately without tools or with single tool use\\n   - Examples: \"What is photosynthesis?\", \"Define machine learning\", \"What\\'s the capital of France?\"\\n   - NOTE: Simple math like \"2+2\" still requires calculator tool but counts as SIMPLE\\n\\n   !ALSO: It can be a logical reasoning or explanation task that does not require tools.\\n   \\n2. MODERATE: Questions requiring 2-4 tool calls or basic multi-step analysis\\n   - Examples: \"Search for recent news about AI\", \"Analyze this CSV file for trends\", \"Calculate ROI from this data\"\\n   - \"Compare two datasets\", \"Summarize multiple documents\"\\n   \\n3. COMPLEX: Multi-step problems requiring planning, multiple tools, and sophisticated reasoning\\n   - Examples: \"Research market trends and create investment strategy\", \"Analyze multiple data sources and predict outcomes\"\\n   - \"Build comprehensive report from various inputs\", \"Multi-stage data processing with validation\"\\n\\nMOST OF THE LOGICAL TASKS ARE SIMPLE, UNLESS THEY REQUIRE TOOLS.\\n\\nASSESSMENT CRITERIA:\\n- Number of distinct steps likely needed (1 = Simple, 2-4 = Moderate, 5+ = Complex)\\n- Tool complexity and dependencies between steps\\n- Data processing requirements and validation needs\\n- Need for intermediate reasoning and synthesis\\n- Risk of failure without proper step-by-step planning\\n- Presence of calculations (automatically requires tool usage)\\n\\nSPECIAL CONSIDERATIONS:\\n- Any calculation/counting task requires tools (affects complexity assessment)\\n- File analysis tasks usually need multiple steps (load + analyze + calculate)\\n- Research tasks typically need search + fetch/extract + synthesis steps\\n- Comparison tasks need separate analysis steps for each item being compared\\n\\nRULES:\\n- SIMPLE queries may bypass planning for non-calculation tasks\\n- MODERATE queries benefit from lightweight planning\\n- COMPLEX queries require full planning with fallbacks\\n- When in doubt, err toward higher complexity\\n- Calculation tasks are never truly \"simple\" due to mandatory tool usage\\n\\nAnalyze the query and respond with your assessment.', additional_kwargs={}, response_metadata={}, id='a857824a-40fa-4453-8cdd-5dd9c73cd1dc'),\n",
+       "  HumanMessage(content='Query: Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.', additional_kwargs={}, response_metadata={}, id='5a765bb0-2ced-4919-aa4c-3cb6852b6a6d'),\n",
+       "  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_RxKLwrV2KP1sJ7ShaImQqbUl', 'function': {'arguments': '{\"question\":\"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\",\"image_path\":\"D:/ankelodon_multiagent_system/data/Screenshot_1.png\",\"temperature\":0.2}', 'name': 'vision_qa_gemma'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 140, 'prompt_tokens': 1771, 'total_tokens': 1911, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 64, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CIjeuIxLBZoq7rEjlZRWcfUO3KQWZ', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--d9288bb8-21be-4a2e-a8b9-7e4fa97936e9-0', tool_calls=[{'name': 'vision_qa_gemma', 'args': {'question': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It's White to move.\", 'image_path': 'D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'temperature': 0.2}, 'id': 'call_RxKLwrV2KP1sJ7ShaImQqbUl', 'type': 'tool_call'}], usage_metadata={'input_tokens': 1771, 'output_tokens': 140, 'total_tokens': 1911, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 64}}),\n",
+       "  ToolMessage(content='{\"answer\": \"Qf7#\"}', name='vision_qa_gemma', id='559a320c-cb49-4f46-8e18-8b50d13fcbde', tool_call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')],\n",
+       " 'query': 'Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.',\n",
+       " 'final_answer': 'FINAL ANSWER: Qf7#',\n",
        " 'plan': None,\n",
+       " 'complexity_assessment': ComplexityLevel(level='simple', reasoning='A mate-in-one is a single-step chess puzzle: it requires identifying one legal white move that delivers checkmate. That is one distinct step, no multi-stage data processing, and no specialized tools are required (unless you want to verify moves with a chess engine). Note: the user did not supply a board position (FEN/diagram/image); if the position is provided the task remains SIMPLE.', needs_planning=False, suggested_approach='Ask the user to provide the chess position (FEN string, diagram, or image). Once the position is available, generate all legal white moves, test which move results in immediate checkmate, and return only the move in algebraic notation (e.g., Qh7#).'),\n",
        " 'current_step': 0,\n",
        " 'reasoning_done': False,\n",
+       " 'files': ['D:/ankelodon_multiagent_system/data/Screenshot_1.png'],\n",
+       " 'file_contents': {'D:/ankelodon_multiagent_system/data/Screenshot_1.png': {'path': 'D:/ankelodon_multiagent_system/data/Screenshot_1.png',\n",
+       "   'extension': '.png',\n",
+       "   'size': 979539,\n",
+       "   'type': 'image',\n",
+       "   'suggested_tool': 'vision_qa_gemma',\n",
+       "   'preview': None}},\n",
+       " 'critique_feedback': CritiqueFeedback(quality_score=7, is_complete=True, is_accurate=True, missing_elements=[], errors_found=[], suggested_improvements=['Consider using an additional chess engine for verification of the mate-in-one move to ensure accuracy.', 'Provide a brief explanation of the reasoning behind the identified move to enhance clarity for the user.'], needs_replanning=False, replan_instructions=None),\n",
        " 'iteration_count': 1,\n",
        " 'max_iterations': 10,\n",
+       " 'execution_report': ExecutionReport(query_summary='User requested the mate-in-one move for White from a provided chess diagram and asked to return only the algebraic notation of the move.', approach_used='Used an image-based QA tool (vision_qa_gemma) to identify the chess position from the supplied screenshot, determine that it is White to move, compute the mate-in-one, and return the move in algebraic notation.', tools_executed=[ToolExecution(tool_name='vision_qa_gemma', arguments='{\\'question\\': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\", \\'image_path\\': \\'D:/ankelodon_multiagent_system/data/Screenshot_1.png\\', \\'temperature\\': 0.2}', call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')], key_findings=['The position in the provided image was analyzed and confirmed to be White to move.', 'A forced mate in one was identified for White.', 'The mate-in-one move in algebraic notation is: Qf7#.'], data_sources=['Screenshot image: D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'vision_qa_gemma tool output (see Tools Executed)'], assumptions_made=[\"Standard algebraic notation is used (including '#' for mate).\", 'Board orientation is conventional with White pieces at the bottom (as interpreted from the image).', 'The image is an accurate and complete representation of the game state (no hidden or obscured pieces).'], confidence_level='high', limitations=['Result depends on the accuracy of the vision tool interpreting the image; any misread piece or orientation could change the correct move.', 'Only one tool/analysis pass was used; no independent engine verification was run here.', 'If there are multiple legal mate-in-one moves in the position, this report records the move returned by the tool without enumerating alternatives.'], final_answer='Qf7#')}"
       ]
      },
      "execution_count": 5,