KaiserShultz committed on
Commit
af3a044
·
1 Parent(s): 331400f

Improvements of prompt for planner (old version), adding youtube parser tool, tavily_extract

.gitignore CHANGED
@@ -11,13 +11,13 @@ __pycache__/
11
  venv/
12
  env/
13
  .venv/
14
-
15
  # IDEs
16
  .vscode/
17
  .idea/
18
  *.swp
19
  *.swo
20
-
21
  # OS
22
  .DS_Store
23
  .DS_Store?
 
11
  venv/
12
  env/
13
  .venv/
14
+ D:/ankelodon_multiagent_system/questions_20_gaia.json
15
  # IDEs
16
  .vscode/
17
  .idea/
18
  *.swp
19
  *.swo
20
+ data/
21
  # OS
22
  .DS_Store
23
  .DS_Store?
questions_20_gaia.json ADDED
@@ -0,0 +1 @@
 
 
1
+ [{"task_id":"8e867cd7-cff9-4e6c-867a-ff5ddc2550be","question":"How many studio albums were published by Mercedes Sosa between 2000 and 2009 (included)? You can use the latest 2022 version of english wikipedia.","Level":"1","file_name":""},{"task_id":"a1e91b78-d3d8-4675-bb8d-62741b4b68a6","question":"In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?","Level":"1","file_name":""},{"task_id":"2d83110e-a098-4ebb-9987-066c06fa42d0","question":".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI","Level":"1","file_name":""},{"task_id":"cca530fc-4052-43b2-b130-b30968d8aa44","question":"Review the chess position provided in the image. It is black's turn. Provide the correct next move for black which guarantees a win. Please provide your response in algebraic notation.","Level":"1","file_name":"cca530fc-4052-43b2-b130-b30968d8aa44.png"},{"task_id":"4fc2f1ae-8625-45b5-ab34-ad4433bc21f8","question":"Who nominated the only Featured Article on English Wikipedia about a dinosaur that was promoted in November 2016?","Level":"1","file_name":""},{"task_id":"6f37996b-2ac7-44b0-8e68-6d28256631b4","question":"Given this table defining * on the set S = {a, b, c, d, e}\n\n|*|a|b|c|d|e|\n|---|---|---|---|---|---|\n|a|a|b|c|b|d|\n|b|b|c|a|e|c|\n|c|c|a|b|b|a|\n|d|b|e|b|e|d|\n|e|d|b|a|d|c|\n\nprovide the subset of S involved in any possible counter-examples that prove * is not commutative. 
Provide your answer as a comma separated list of the elements in the set in alphabetical order.","Level":"1","file_name":""},{"task_id":"9d191bce-651d-4746-be2d-7ef8ecadb9c2","question":"Examine the video at https://www.youtube.com/watch?v=1htKBjuUWec.\n\nWhat does Teal'c say in response to the question \"Isn't that hot?\"","Level":"1","file_name":""},{"task_id":"cabe07ed-9eca-40ea-8ead-410ef5e83f91","question":"What is the surname of the equine veterinarian mentioned in 1.E Exercises from the chemistry materials licensed by Marisa Alviar-Agnew & Henry Agnew under the CK-12 license in LibreText's Introductory Chemistry materials as compiled 08/21/2023?","Level":"1","file_name":""},{"task_id":"3cef3a44-215e-4aed-8e3b-b1e3f08063b7","question":"I'm making a grocery list for my mom, but she's a professor of botany and she's a real stickler when it comes to categorizing things. I need to add different foods to different categories on the grocery list, but if I make a mistake, she won't buy anything inserted in the wrong category. Here's the list I have so far:\n\nmilk, eggs, flour, whole bean coffee, Oreos, sweet potatoes, fresh basil, plums, green beans, rice, corn, bell pepper, whole allspice, acorns, broccoli, celery, zucchini, lettuce, peanuts\n\nI need to make headings for the fruits and vegetables. Could you please create a list of just the vegetables from my list? If you could do that, then I can figure out how to categorize the rest of the list into the appropriate categories. But remember that my mom is a real stickler, so make sure that no botanical fruits end up on the vegetable list, or she won't get them when she's at the store. Please alphabetize the list of vegetables, and place each item in a comma separated list.","Level":"1","file_name":""},{"task_id":"99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3","question":"Hi, I'm making a pie but I could use some help with my shopping list. I have everything I need for the crust, but I'm not sure about the filling. 
I got the recipe from my friend Aditi, but she left it as a voice memo and the speaker on my phone is buzzing so I can't quite make out what she's saying. Could you please listen to the recipe and list all of the ingredients that my friend described? I only want the ingredients for the filling, as I have everything I need to make my favorite pie crust. I've attached the recipe as Strawberry pie.mp3.\n\nIn your response, please only list the ingredients, not any measurements. So if the recipe calls for \"a pinch of salt\" or \"two cups of ripe strawberries\" the ingredients on the list would be \"salt\" and \"ripe strawberries\".\n\nPlease format your response as a comma separated list of ingredients. Also, please alphabetize the ingredients.","Level":"1","file_name":"99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3.mp3"},{"task_id":"305ac316-eef6-4446-960a-92d80d542f82","question":"Who did the actor who played Ray in the Polish-language version of Everybody Loves Raymond play in Magda M.? Give only the first name.","Level":"1","file_name":""},{"task_id":"f918266a-b3e0-4914-865d-4faa564f1aef","question":"What is the final numeric output from the attached Python code?","Level":"1","file_name":"f918266a-b3e0-4914-865d-4faa564f1aef.py"},{"task_id":"3f57289b-8c60-48be-bd80-01f8099ca449","question":"How many at bats did the Yankee with the most walks in the 1977 regular season have that same season?","Level":"1","file_name":""},{"task_id":"1f975693-876d-457b-a649-393859e79bf3","question":"Hi, I was out sick from my classes on Friday, so I'm trying to figure out what I need to study for my Calculus mid-term next week. My friend from class sent me an audio recording of Professor Willowbrook giving out the recommended reading for the test, but my headphones are broken :(\n\nCould you please listen to the recording for me and tell me the page numbers I'm supposed to go over? I've attached a file called Homework.mp3 that has the recording. 
Please provide just the page numbers as a comma-delimited list. And please provide the list in ascending order.","Level":"1","file_name":"1f975693-876d-457b-a649-393859e79bf3.mp3"},{"task_id":"840bfca7-4f7b-481a-8794-c560c340185d","question":"On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?","Level":"1","file_name":""},{"task_id":"bda648d7-d618-4883-88f4-3466eabd860e","question":"Where were the Vietnamese specimens described by Kuznetzov in Nedoshivina's 2010 paper eventually deposited? Just give me the city name without abbreviations.","Level":"1","file_name":""},{"task_id":"cf106601-ab4f-4af9-b045-5295fe67b37d","question":"What country had the least number of athletes at the 1928 Summer Olympics? If there's a tie for a number of athletes, return the first in alphabetical order. Give the IOC country code as your answer.","Level":"1","file_name":""},{"task_id":"a0c07678-e491-4bbc-8f0b-07405144218f","question":"Who are the pitchers with the number before and after Taishō Tamai's number as of July 2023? Give them to me in the form Pitcher Before, Pitcher After, use their last names only, in Roman characters.","Level":"1","file_name":""},{"task_id":"7bd855d8-463d-4ed5-93ca-5fe35145f733","question":"The attached Excel file contains the sales of menu items for a local fast-food chain. What were the total sales that the chain made from food (not including drinks)? 
Express your answer in USD with two decimal places.","Level":"1","file_name":"7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx"},{"task_id":"5a0c1adf-205e-4841-a666-7c3ef95def9d","question":"What is the first name of the only Malko Competition recipient from the 20th Century (after 1977) whose nationality on record is a country that no longer exists?","Level":"1","file_name":""}]
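Each record in `questions_20_gaia.json` carries `task_id`, `question`, `Level`, and `file_name`, where an empty `file_name` means the task has no attachment. A minimal sketch of splitting tasks on that field follows; the helper name `split_by_attachment` is hypothetical and not part of the commit, and the two sample records are copied from the file above.

```python
import json
from typing import Any


def split_by_attachment(raw_json: str) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split GAIA-style task records into (with_file, without_file)."""
    tasks = json.loads(raw_json)
    # An empty string in "file_name" means the question has no attached file.
    with_file = [task for task in tasks if task.get("file_name")]
    without_file = [task for task in tasks if not task.get("file_name")]
    return with_file, without_file


# Two records taken from the dataset above.
SAMPLE = json.dumps([
    {
        "task_id": "f918266a-b3e0-4914-865d-4faa564f1aef",
        "question": "What is the final numeric output from the attached Python code?",
        "Level": "1",
        "file_name": "f918266a-b3e0-4914-865d-4faa564f1aef.py",
    },
    {
        "task_id": "3f57289b-8c60-48be-bd80-01f8099ca449",
        "question": "How many at bats did the Yankee with the most walks in the 1977 regular season have that same season?",
        "Level": "1",
        "file_name": "",
    },
])

with_file, without_file = split_by_attachment(SAMPLE)
print(len(with_file), len(without_file))  # 1 1
```

Tasks in the first group need the attachment fetched before the agent runs; the second group can go straight to search or reasoning tools.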
src/__init__.py CHANGED
@@ -4,16 +4,101 @@ Import key components for easy use:
4
  from src import workflow, llm
5
  """
6
 
7
- from .config import llm, TOOLS, CONFIG, TOOL_NODE, planner_llm
8
- from .agent import workflow, build_workflow, should_continue
9
- from .nodes import agent, planner, query_input, critique
10
- from .schemas import AgentState, PlannerPlan, ComplexityLevel, CritiqueFeedback
11
 
 
12
  __version__ = "0.1.0"
13
  __all__ = [
14
- "llm", "TOOLS", "CONFIG", "TOOL_NODE", "planner_llm",
15
- "workflow", "build_workflow", "should_continue",
16
- "agent", "planner", "query_input", "critique",
17
- "AgentState", "PlannerPlan", "ComplexityLevel", "CritiqueFeedback",
18
- "__version__"
19
- ]
4
  from src import workflow, llm
5
  """
6
 
7
+ """
8
+ Ankelodon Multi-Agent System – package init.
9
+
10
+ ЭкспортируСт ΡƒΠ΄ΠΎΠ±Π½Ρ‹ΠΉ ΠΏΡƒΠ±Π»ΠΈΡ‡Π½Ρ‹ΠΉ API для Ρ€Π°Π±ΠΎΡ‚Ρ‹ с Π³Ρ€Π°Ρ„ΠΎΠΌ, состояниСм Π°Π³Π΅Π½Ρ‚Π°,
11
+ схСмами ΠΈ ΠΊΠΎΠ½Ρ„ΠΈΠ³ΠΎΠΌ. Клади этот Ρ„Π°ΠΉΠ» Π² Π΄ΠΈΡ€Π΅ΠΊΡ‚ΠΎΡ€ΠΈΡŽ, Π³Π΄Π΅ Π»Π΅ΠΆΠ°Ρ‚:
12
+ agent.py, config.py, nodes.py, schemas.py, state.py
13
+ (Ρƒ тСбя это src/).
14
+ """
15
 
16
+ # ВСрсия ΠΏΠ°ΠΊΠ΅Ρ‚Π° (ΠΏΠΎ ТСланию обновляй Π²Ρ€ΡƒΡ‡Π½ΡƒΡŽ/ΠΈΠ· git)
17
  __version__ = "0.1.0"
18
+
19
+ # ── Π“Ρ€Π°Ρ„/сборка
20
+ from .agent import build_workflow
21
+
22
+ # ── БостояниС
23
+ from .state import AgentState
24
+
25
+ # ── Π‘Ρ…Π΅ΠΌΡ‹/ΠΌΠΎΠ΄Π΅Π»ΠΈ
26
+ from .schemas import (
27
+ ComplexityLevel,
28
+ CritiqueFeedback,
29
+ PlannerPlan,
30
+ PlanStep,
31
+ ExecutionReport,
32
+ ToolExecution,
33
+ TaskType,
34
+ )
35
+
36
+ # ── ΠšΠΎΠ½Ρ„ΠΈΠ³/LLM/Tools
37
+ from .config import (
38
+ config,
39
+ TOOLS,
40
+ DEBUGGING_TOOL_NODE,
41
+ llm,
42
+ llm_deterministic,
43
+ planner_llm,
44
+ llm_with_tools,
45
+ llm_criticist,
46
+ llm_reasoning,
47
+ )
48
+
49
+ # ── Π£Π·Π»Ρ‹/Ρ€ΠΎΡƒΡ‚Π΅Ρ€Ρ‹ (Ссли Π½ΡƒΠΆΠ½ΠΎ Π²Ρ‹Π·Ρ‹Π²Π°Ρ‚ΡŒ Π½Π°ΠΏΡ€ΡΠΌΡƒΡŽ ΠΈΠ»ΠΈ для тСстов)
50
+ from .nodes import (
51
+ query_input,
52
+ complexity_assessor,
53
+ planner,
54
+ agent,
55
+ simple_executor,
56
+ critic_evaluator,
57
+ replanner,
58
+ enhanced_finalizer,
59
+ # routers
60
+ should_continue,
61
+ should_use_planning,
62
+ should_replan,
63
+ should_use_tools_simple_executor,
64
+ )
65
+
66
  __all__ = [
67
+ # version
68
+ "__version__",
69
+ # graph build
70
+ "build_workflow",
71
+ # state
72
+ "AgentState",
73
+ # schemas
74
+ "ComplexityLevel",
75
+ "CritiqueFeedback",
76
+ "PlannerPlan",
77
+ "PlanStep",
78
+ "ExecutionReport",
79
+ "ToolExecution",
80
+ "TaskType",
81
+ # config/models/tools
82
+ "config",
83
+ "TOOLS",
84
+ "DEBUGGING_TOOL_NODE",
85
+ "llm",
86
+ "llm_deterministic",
87
+ "planner_llm",
88
+ "llm_with_tools",
89
+ "llm_criticist",
90
+ "llm_reasoning",
91
+ # nodes and routers
92
+ "query_input",
93
+ "complexity_assessor",
94
+ "planner",
95
+ "agent",
96
+ "simple_executor",
97
+ "critic_evaluator",
98
+ "replanner",
99
+ "enhanced_finalizer",
100
+ "should_continue",
101
+ "should_use_planning",
102
+ "should_replan",
103
+ "should_use_tools_simple_executor",
104
+ ]
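Because this change rewrites both the imports and `__all__`, it is easy for the two to drift apart (a name listed in `__all__` that is never imported makes `from src import *` fail). A minimal sketch of a consistency check follows; `missing_exports` is a hypothetical helper, and `demo_pkg` is a stand-in module built in memory rather than the real `src` package.

```python
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Hypothetical stand-in for the package; a real check would import `src` itself.
demo = types.ModuleType("demo_pkg")
demo.__version__ = "0.1.0"
demo.build_workflow = lambda: None
demo.__all__ = ["__version__", "build_workflow", "AgentState"]

print(missing_exports(demo))  # ['AgentState']
```

Run against the real package in a test, an empty list would indicate that every advertised name is importable.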
src/config.py CHANGED
@@ -14,7 +14,7 @@ TOOLS = [download_file_from_url, web_search,
14
  arxiv_search, wiki_search, add, subtract, multiply, divide,
15
  power, analyze_excel_file, analyze_csv_file, analyze_docx_file,
16
  analyze_pdf_file, analyze_txt_file,
17
- vision_qa_gemma, safe_code_run]
18
 
19
 
20
  TOOL_NODE = ToolNode(TOOLS)
@@ -23,9 +23,11 @@ DEBUGGING_TOOL_NODE = TOOL_NODE
23
  llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7) #default 0.25
24
  llm_deterministic = ChatOpenAI(model="gpt-5-mini", temperature=0.05)
25
  planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1).with_structured_output(PlannerPlan)
26
- llm_criticist = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
27
  llm_with_tools = llm_deterministic.bind_tools(TOOLS)
28
  llm_reasoning = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
 
 
29
 
30
 
31
 
 
14
  arxiv_search, wiki_search, add, subtract, multiply, divide,
15
  power, analyze_excel_file, analyze_csv_file, analyze_docx_file,
16
  analyze_pdf_file, analyze_txt_file,
17
+ vision_qa_gemma, safe_code_run, web_extract, extract_youtube_transcript]
18
 
19
 
20
  TOOL_NODE = ToolNode(TOOLS)
 
23
  llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7) #default 0.25
24
  llm_deterministic = ChatOpenAI(model="gpt-5-mini", temperature=0.05)
25
  planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1).with_structured_output(PlannerPlan)
26
+ llm_criticist = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
27
  llm_with_tools = llm_deterministic.bind_tools(TOOLS)
28
  llm_reasoning = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
29
+ llm_simple_executor = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
30
+ llm_simple_with_tools = llm_simple_executor.bind_tools(TOOLS)
31
 
32
 
33
 
src/prompts/prompts.py CHANGED
@@ -1,4 +1,4 @@
1
- SYSTEM_PROMPT_PLANNER = """
2
  You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
3
 
4
  Available tools: {tool_catalogue}
@@ -25,7 +25,7 @@ Example 2: "Research recent AI developments and summarize key trends"
25
  {{
26
  "steps": [
27
  {{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
28
- {{"id": "s2", "goal": "Download relevant articles", "tool": "ddownload_file_from_url"}},
29
  {{"id": "s3", "goal": "Extract and organize key information from articles", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
30
  {{"id": "s4", "goal": "Analyze and synthesize key trends from gathered information", "tool": null}}
31
  ]
@@ -73,7 +73,7 @@ Ground rules:
73
  - Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
74
  - Break down complex tasks into logical components - don't try to solve everything at once
75
  - Use tool names exactly as listed. If no tool is needed, set "tool": null.
76
- - Never assume files or URLs exist—plan to search/download before analysing.
77
  - Skip download steps when the required file is already provided.
78
  - Ensure later steps only depend on results created by earlier steps.
79
  - For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation
@@ -82,6 +82,103 @@ Ground rules:
82
  - Plan for visualization or formatting steps when presenting complex results
83
  """
84
 
85
  SYSTEM_EXECUTOR_PROMPT = """
86
  You are the executor of a grounded multi-tool agent.
87
 
@@ -146,7 +243,7 @@ ASSESSMENT CRITERIA:
146
  SPECIAL CONSIDERATIONS:
147
  - Any calculation/counting task requires tools (affects complexity assessment)
148
  - File analysis tasks usually need multiple steps (load + analyze + calculate)
149
- - Research tasks typically need search + fetch + synthesis steps
150
  - Comparison tasks need separate analysis steps for each item being compared
151
 
152
  RULES:
 
1
+ SYSTEM_PROMPT_PLANNER_OLD = """
2
  You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
3
 
4
  Available tools: {tool_catalogue}
 
25
  {{
26
  "steps": [
27
  {{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
28
+ {{"id": "s2", "goal": "Extract all info from the URLs found", "tool": "web_extract"}},
29
  {{"id": "s3", "goal": "Extract and organize key information from articles", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
30
  {{"id": "s4", "goal": "Analyze and synthesize key trends from gathered information", "tool": null}}
31
  ]
 
73
  - Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
74
  - Break down complex tasks into logical components - don't try to solve everything at once
75
  - Use tool names exactly as listed. If no tool is needed, set "tool": null.
76
+ - Never assume files or URLs exist—plan to search/extract before analysing.
77
  - Skip download steps when the required file is already provided.
78
  - Ensure later steps only depend on results created by earlier steps.
79
  - For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation
 
82
  - Plan for visualization or formatting steps when presenting complex results
83
  """
84
 
85
+ SYSTEM_PROMPT_PLANNER = """
86
+ You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
87
+
88
+ Available tools: {tool_catalogue}
89
+ Known local files: {file_list}
90
+ Additional context: {extra_context}
91
+
92
+ CRITICAL COMPUTATION RULE: ANY mathematical calculation, counting, statistical analysis, or numerical computation MUST be performed using either:
93
+ - Mathematical tools (calculator, math functions) for simple calculations
94
+ - Code execution tools (Python/JavaScript) for complex calculations, data analysis, or statistical operations
95
+ NEVER perform calculations manually or estimate numerical results.
96
+
97
+ TASK BREAKDOWN EXAMPLES:
98
+
99
+ Example 1: "Analyze sales data and calculate growth rates"
100
+ {{
101
+ "steps": [
102
+ {{"id": "s1", "goal": "Load and examine the sales data file", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
103
+ {{"id": "s2", "goal": "Calculate monthly growth rates using Python", "tool": "safe_code_run"}},
104
+ {{"id": "s3", "goal": "Generate summary statistics and trends", "tool": "safe_code_run"}}
105
+ ]
106
+ }}
107
+
108
+ Example 2: "Research recent AI developments and summarize key trends"
109
+ {{
110
+ "steps": [
111
+ {{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
112
+ {{"id": "s2", "goal": "Extract key links and pick relevant documents (PDF, reports)", "tool": "web_extract"}},
113
+ {{"id": "s3", "goal": "Download chosen report for detailed analysis", "tool": "download_file_from_url"}},
114
+ {{"id": "s4", "goal": "Analyze the downloaded document (PDF/DOCX/TXT)", "tool": "analyze_pdf_file"}},
115
+ {{"id": "s5", "goal": "Summarize and synthesize key insights from the analyzed content", "tool": null}}
116
+ ]
117
+ }}
118
+
119
+ Example 3: "Compare performance metrics between two datasets"
120
+ {{
121
+ "steps": [
122
+ {{"id": "s1", "goal": "Load first dataset and examine structure", "tool": "analyze_csv_file"}},
123
+ {{"id": "s2", "goal": "Load second dataset and examine structure", "tool": "analyze_excel_file"}},
124
+ {{"id": "s3", "goal": "Calculate statistical metrics for both datasets using code", "tool": "safe_code_run"}},
125
+ {{"id": "s4", "goal": "Perform statistical comparison and significance testing", "tool": "safe_code_run"}}
126
+ ]
127
+ }}
128
+
129
+ Example 4: "Create a budget analysis from expense data"
130
+ {{
131
+ "steps": [
132
+ {{"id": "s1", "goal": "Load expense data and validate format", "tool": "analyze_csv_file"}},
133
+ {{"id": "s2", "goal": "Calculate category totals and percentages using code", "tool": "safe_code_run"}},
134
+ {{"id": "s3", "goal": "Generate budget variance analysis and projections", "tool": "safe_code_run"}},
135
+ {{"id": "s4", "goal": "Create visualization of spending patterns", "tool": "safe_code_run"}}
136
+ ]
137
+ }}
138
+
139
+ Example 5: "Find and analyze a scientific PDF report on renewable energy"
140
+ {{
141
+ "steps": [
142
+ {{"id": "s1", "goal": "Search the web for renewable energy PDF reports", "tool": "web_search"}},
143
+ {{"id": "s2", "goal": "Extract candidate PDF links from the search results", "tool": "web_extract"}},
144
+ {{"id": "s3", "goal": "Download the most relevant PDF document", "tool": "download_file_from_url"}},
145
+ {{"id": "s4", "goal": "Parse and extract text from the downloaded PDF", "tool": "analyze_pdf_file"}},
146
+ {{"id": "s5", "goal": "Summarize findings and highlight key trends in renewable energy", "tool": null}}
147
+ ]
148
+ }}
149
+
150
+ Return a single JSON object with this structure:
151
+ {{
152
+ "task_type": "info|calc|table|doc_qa|image_qa|multi_hop",
153
+ "summary": "One sentence on the chosen approach",
154
+ "assumptions": ["optional clarifications"],
155
+ "steps": [
156
+ {{
157
+ "id": "s1",
158
+ "goal": "Action to take and why it helps",
159
+ "tool": "tool_name_or_null",
160
+ "inputs": "Key parameters or references (files, URLs, prior steps)",
161
+ "expected_result": "How you know the step succeeded",
162
+ "on_fail": "replan|stop"
163
+ }}
164
+ ],
165
+ "answer_guidelines": "Reminders for the final response (citations, format, units, etc.)"
166
+ }}
167
+
168
+ Ground rules:
169
+ - Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
170
+ - Break down complex tasks into logical components - don't try to solve everything at once.
171
+ - Use tool names exactly as listed. If no tool is needed, set "tool": null.
172
+ - Never assume files or URLs exist—plan to search/extract before analysing.
173
+ - Skip download steps when the required file is already provided.
174
+ - Ensure later steps only depend on results created by earlier steps.
175
+ - For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation.
176
+ - If the query involves analysis of multiple sources, plan separate steps for each source.
177
+ - Consider data validation and error checking as separate steps when handling files.
178
+ - Plan for visualization or formatting steps when presenting complex results.
179
+ """
180
+
181
+
182
  SYSTEM_EXECUTOR_PROMPT = """
183
  You are the executor of a grounded multi-tool agent.
184
 
 
243
  SPECIAL CONSIDERATIONS:
244
  - Any calculation/counting task requires tools (affects complexity assessment)
245
  - File analysis tasks usually need multiple steps (load + analyze + calculate)
246
+ - Research tasks typically need search + fetch/extract + synthesis steps
247
  - Comparison tasks need separate analysis steps for each item being compared
248
 
249
  RULES:
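The planner prompt asks for a JSON object whose steps use tool names "exactly as listed" and an `on_fail` value of `replan` or `stop`. A minimal sketch of checking a parsed plan against those two rules before it reaches the executor follows; `plan_problems` and the `fetch_page` tool name are hypothetical, and the commit itself relies on `with_structured_output(PlannerPlan)` rather than on this helper.

```python
from typing import Any

ALLOWED_ON_FAIL = {"replan", "stop"}


def plan_problems(plan: dict[str, Any], tool_names: set[str]) -> list[str]:
    """Collect human-readable problems found in a planner output dict."""
    problems: list[str] = []
    steps = plan.get("steps") or []
    if not steps:
        problems.append("plan has no steps")
    seen_ids: set[str] = set()
    for index, step in enumerate(steps, start=1):
        step_id = step.get("id", f"<step {index}>")
        if step_id in seen_ids:
            problems.append(f"{step_id}: duplicate id")
        seen_ids.add(step_id)
        tool = step.get("tool")
        # "tool": null in the JSON becomes None; anything else must be a known tool.
        if tool is not None and tool not in tool_names:
            problems.append(f"{step_id}: unknown tool {tool!r}")
        on_fail = step.get("on_fail")
        if on_fail is not None and on_fail not in ALLOWED_ON_FAIL:
            problems.append(f"{step_id}: on_fail must be 'replan' or 'stop'")
    return problems


catalogue = {"web_search", "web_extract", "download_file_from_url", "analyze_pdf_file"}
plan = {
    "steps": [
        {"id": "s1", "goal": "Search", "tool": "web_search", "on_fail": "replan"},
        {"id": "s2", "goal": "Fetch", "tool": "fetch_page", "on_fail": "retry"},
        {"id": "s3", "goal": "Summarize", "tool": None},
    ]
}
for problem in plan_problems(plan, catalogue):
    print(problem)
```

A non-empty result could be fed back to the replanner instead of letting the executor attempt a tool call that cannot resolve.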
src/requirements.txt ADDED
@@ -0,0 +1,21 @@
1
+ docx==0.2.4
2
+ gradio==5.46.1
3
+ ipython==8.12.3
4
+ langchain==0.3.27
5
+ langchain_community==0.3.29
6
+ langchain_core==0.3.76
7
+ langchain_openai==0.3.33
8
+ langgraph==0.6.7
9
+ matplotlib==3.8.2
10
+ numpy==2.3.3
11
+ pandas==2.3.2
12
+ pdfminer==20191125
13
+ Pillow==11.3.0
14
+ protobuf==6.32.1
15
+ pydantic==2.11.9
16
+ pytesseract==0.3.13
17
+ python-dotenv==1.1.1
18
+ Requests==2.32.5
19
+ tldextract==5.3.0
20
+ langchain-tavily
21
+ youtube-transcript-api
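Most entries in this file are pinned with `==`, while the two packages added for the new tools are not, so their installed versions can change between environments. A minimal sketch for listing unpinned entries follows; `unpinned_requirements` is a hypothetical helper, and the sample text reuses four lines from the file above.

```python
def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that carry no exact '==' version pin."""
    unpinned = []
    for line in requirements_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry and "==" not in entry:
            unpinned.append(entry)
    return unpinned


SAMPLE = """\
langgraph==0.6.7
pydantic==2.11.9
langchain-tavily
youtube-transcript-api
"""
print(unpinned_requirements(SAMPLE))  # ['langchain-tavily', 'youtube-transcript-api']
```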
src/tools/tools.py CHANGED
@@ -6,6 +6,8 @@ import base64
6
  import tldextract
7
  import tempfile
8
  from urllib.parse import urlparse
 
 
9
  import io
10
  import pandas as pd
11
  from typing import List, Optional, Dict, Any
@@ -18,6 +20,7 @@ from langchain_community.document_loaders import ArxivLoader
18
  from langchain_community.document_loaders import WikipediaLoader
19
  from PIL import ImageDraw, ImageFont, ImageEnhance, ImageFilter
20
  from utils.image_processing import *
 
21
 
22
  def _exif_dict(img: Image.Image) -> dict:
23
  try:
@@ -38,6 +41,7 @@ def _clip(text: str | None, n: int) -> str:
38
  return (text[: n - 1] + "…") if len(text) > n else text
39
 
40
 
 
41
  def _parse_dt(v) -> Optional[str]:
42
  """[CHANGE] Convert dates to an ISO string when possible."""
43
  try:
@@ -360,6 +364,35 @@ def arxiv_search(
360
  return json.dumps({"error": str(e), "query": query, "provider": "arxiv"})
361
 
362
 
363
 
364
  #----------------------------------------------MATH TOOLS------------------------------------------------#
365
 
 
6
  import tldextract
7
  import tempfile
8
  from urllib.parse import urlparse
9
+ from langchain_tavily import TavilyExtract
10
+ from youtube_transcript_api import YouTubeTranscriptApi
11
  import io
12
  import pandas as pd
13
  from typing import List, Optional, Dict, Any
 
20
  from langchain_community.document_loaders import WikipediaLoader
21
  from PIL import ImageDraw, ImageFont, ImageEnhance, ImageFilter
22
  from utils.image_processing import *
23
+ import re
24
 
25
  def _exif_dict(img: Image.Image) -> dict:
26
  try:
 
41
  return (text[: n - 1] + "…") if len(text) > n else text
42
 
43
 
44
+
45
  def _parse_dt(v) -> Optional[str]:
46
  """[CHANGE] Convert dates to an ISO string when possible."""
47
  try:
 
364
  return json.dumps({"error": str(e), "query": query, "provider": "arxiv"})
365
 
366
 
367
+ @tool
368
+ def web_extract(urls : List[str]) -> str:
369
+ """
370
+ Extract text content from web pages using TavilyExtract.
371
+ Returns JSON with {url, title, text, images?} for each URL.
372
+ """
373
+
374
+ tool = TavilyExtract(
375
+ extract_depth="basic",
376
+ include_images=False,
377
+ )
378
+ results = tool.invoke({"urls": urls})  # TavilyExtract takes a dict input with a "urls" key
379
+ return json.dumps(results)
380
+
381
+ @tool
382
+ def extract_youtube_transcript(url: str, chars: int = 1_000) -> str:
383
+ """
384
+ Fetch a YouTube transcript, truncated to the first *chars* characters.
385
+ """
386
+
387
+ video_id_match = re.search(r"[?&]v=([A-Za-z0-9_\-]{11})", url)
388
+ if not video_id_match:
389
+ return "yt_error:id_not_found"
390
+ try:
391
+ transcript = YouTubeTranscriptApi.get_transcript(video_id_match.group(1))
392
+ text = " ".join(piece["text"] for piece in transcript)
393
+ return text[:chars]
394
+ except Exception as exc:
395
+ return f"yt_error:{exc}"
396
 
397
  #----------------------------------------------MATH TOOLS------------------------------------------------#
398
 
src/workflow_test.ipynb CHANGED
@@ -42,34 +42,43 @@
42
  "💡 ════════════════════\n",
43
  "💡 USER QUERY \n",
44
  "💡 ════════════════════\n",
45
- " • files: none provided\n",
46
  "=== COMPLEXITY ASSESSMENT ===\n",
47
  "Complexity: simple\n",
48
  "Needs planning: False\n",
49
- "Reasoning: This is a single-step arithmetic question (2+2). Although calculations technically require a tool per the special considerations, this is trivial and requires only one immediate operation, so it is SIMPLE.\n",
50
  "=== SIMPLE EXECUTION ===\n",
51
  "Response generated for simple query.\n",
52
  "=== GENERATING EXECUTION REPORT ===\n",
53
  "Report generated - Confidence: high\n",
54
  "Key findings: 3\n",
55
  "Data sources: 2\n",
56
- "query_summary=\"User asked for the numeric result of the arithmetic expression '2+2'.\" approach_used=\"Direct evaluation using basic arithmetic: interpreted '+' as standard integer addition and computed the sum mentally without invoking external tools or files.\" tools_executed=[] key_findings=[\"The expression '2+2' was interpreted as standard integer addition.\", 'Computed result is 4.', 'No external tools or data were required to compute the result.'] data_sources=['Basic arithmetic rules (internal knowledge)', 'Conversation history confirming the query and an earlier direct answer'] assumptions_made=[\"The '+' operator denotes standard arithmetic addition on integers.\", 'Numbers are in the usual base-10 system and no special context (e.g., modular arithmetic or symbolic manipulation) was intended.'] confidence_level='high' limitations=['If the user intended a nonstandard context (modulo arithmetic, different base, or overloaded operator semantics), the answer could differ.', 'Extremely simple query; few realistic limitations beyond contextual ambiguity.'] final_answer='4'\n",
57
  "=== ENHANCED ANSWER CRITIQUE ===\n",
58
- "Quality Score: 8/10\n",
59
  "Complete: True\n",
60
  "Accurate: True\n",
61
- "Issues found: [\"Performed the calculation mentally rather than using an external computational tool (triggers the evaluation framework's manual-calculation penalty).\"]\n",
62
  "=== REPLAN DECISION ===\n",
63
  "Iteration: 1/10\n",
64
- "Quality score: 8\n",
65
  "Needs replanning: False\n",
66
  "Quality acceptable, ending execution\n"
67
  ]
68
  }
69
  ],
70
  "source": [
71
- "query = \"What is 2+2\"\n",
72
- "result = graph.invoke({\"query\" : query, \"current_step\": 0, \"reasoning_done\": False, \"files\" : [], \"files_contents\" : {}, \"iteration_count\" : 0, \"max_iterations\" : 10, \"plan\" : None} , config = config)"
73
  ]
74
  },
75
  {
@@ -81,7 +90,7 @@
81
  "name": "stdout",
82
  "output_type": "stream",
83
  "text": [
84
- "FINAL ANSWER: 4\n"
85
  ]
86
  }
87
  ],
@@ -97,20 +106,27 @@
97
  {
98
  "data": {
99
  "text/plain": [
100
- "{'messages': [SystemMessage(content='You are a COMPLEXITY ASSESSOR for a multi-tool agent system.\\nYour job is to analyze user queries and determine their complexity level and processing requirements.\\n\\nCOMPLEXITY LEVELS:\\n1. SIMPLE: Direct questions that can be answered immediately without tools or with single tool use\\n - Examples: \"What is photosynthesis?\", \"Define machine learning\", \"What\\'s the capital of France?\"\\n - NOTE: Simple math like \"2+2\" still requires calculator tool but counts as SIMPLE\\n\\n !ALSO: It can be a logical reasoning or explanation task that does not require tools.\\n \\n2. MODERATE: Questions requiring 2-4 tool calls or basic multi-step analysis\\n - Examples: \"Search for recent news about AI\", \"Analyze this CSV file for trends\", \"Calculate ROI from this data\"\\n - \"Compare two datasets\", \"Summarize multiple documents\"\\n \\n3. COMPLEX: Multi-step problems requiring planning, multiple tools, and sophisticated reasoning\\n - Examples: \"Research market trends and create investment strategy\", \"Analyze multiple data sources and predict outcomes\"\\n - \"Build comprehensive report from various inputs\", \"Multi-stage data processing with validation\"\\n\\nMOST OF THE LOGICAL TASKS ARE SIMPLE, UNLESS THEY REQUIRE TOOLS.\\n\\nASSESSMENT CRITERIA:\\n- Number of distinct steps likely needed (1 = Simple, 2-4 = Moderate, 5+ = Complex)\\n- Tool complexity and dependencies between steps\\n- Data processing requirements and validation needs\\n- Need for intermediate reasoning and synthesis\\n- Risk of failure without proper step-by-step planning\\n- Presence of calculations (automatically requires tool usage)\\n\\nSPECIAL CONSIDERATIONS:\\n- Any calculation/counting task requires tools (affects complexity assessment)\\n- File analysis tasks usually need multiple steps (load + analyze + calculate)\\n- Research tasks typically need search + fetch + synthesis steps\\n- Comparison tasks need separate analysis steps for 
each item being compared\\n\\nRULES:\\n- SIMPLE queries may bypass planning for non-calculation tasks\\n- MODERATE queries benefit from lightweight planning\\n- COMPLEX queries require full planning with fallbacks\\n- When in doubt, err toward higher complexity\\n- Calculation tasks are never truly \"simple\" due to mandatory tool usage\\n\\nAnalyze the query and respond with your assessment.', additional_kwargs={}, response_metadata={}, id='db109164-6e6e-4c1f-82bb-93d6d9b64e6a'),\n",
101
- " HumanMessage(content='Query: What is 2+2', additional_kwargs={}, response_metadata={}, id='6b9afadb-3463-40a2-989b-19f8a237f7fc'),\n",
102
- " AIMessage(content='2 + 2 = 4', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 80, 'prompt_tokens': 1638, 'total_tokens': 1718, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 64, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CId3zSwgGIoDxYMuwG2xJfCLDiVuM', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--210d298d-a542-4458-8933-93ebf4c7bac0-0', usage_metadata={'input_tokens': 1638, 'output_tokens': 80, 'total_tokens': 1718, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 64}})],\n",
103
- " 'query': 'What is 2+2',\n",
104
- " 'final_answer': 'FINAL ANSWER: 4',\n",
 
105
  " 'plan': None,\n",
106
- " 'complexity_assessment': ComplexityLevel(level='simple', reasoning='This is a single-step arithmetic question (2+2). Although calculations technically require a tool per the special considerations, this is trivial and requires only one immediate operation, so it is SIMPLE.', needs_planning=False, suggested_approach='Perform the basic arithmetic (2+2) and return the result (4). No detailed planning or multi-step processing needed.'),\n",
107
  " 'current_step': 0,\n",
108
  " 'reasoning_done': False,\n",
109
- " 'files': [],\n",
110
- " 'critique_feedback': CritiqueFeedback(quality_score=8, is_complete=True, is_accurate=True, missing_elements=[], errors_found=[\"Performed the calculation mentally rather than using an external computational tool (triggers the evaluation framework's manual-calculation penalty).\"], suggested_improvements=['Use a computational tool or explicitly show the calculation steps even for trivial arithmetic to avoid the manual-calculation policy violation (e.g., evaluate with a calculator tool or print the operation and result).', \"Explicitly state assumptions up front (that '+' is standard integer addition in base 10) and, when relevant, ask a clarifying question if the user might have meant a nonstandard interpretation (modular arithmetic, different base, operator overloading).\", 'For transparency, include a short note citing the arithmetic rule used (e.g., basic integer addition) when delivering the result, even though the operation is trivial.'], needs_replanning=False, replan_instructions=None),\n",
 
 
 
 
 
 
111
  " 'iteration_count': 1,\n",
112
  " 'max_iterations': 10,\n",
113
- " 'execution_report': ExecutionReport(query_summary=\"User asked for the numeric result of the arithmetic expression '2+2'.\", approach_used=\"Direct evaluation using basic arithmetic: interpreted '+' as standard integer addition and computed the sum mentally without invoking external tools or files.\", tools_executed=[], key_findings=[\"The expression '2+2' was interpreted as standard integer addition.\", 'Computed result is 4.', 'No external tools or data were required to compute the result.'], data_sources=['Basic arithmetic rules (internal knowledge)', 'Conversation history confirming the query and an earlier direct answer'], assumptions_made=[\"The '+' operator denotes standard arithmetic addition on integers.\", 'Numbers are in the usual base-10 system and no special context (e.g., modular arithmetic or symbolic manipulation) was intended.'], confidence_level='high', limitations=['If the user intended a nonstandard context (modulo arithmetic, different base, or overloaded operator semantics), the answer could differ.', 'Extremely simple query; few realistic limitations beyond contextual ambiguity.'], final_answer='4')}"
114
  ]
115
  },
116
  "execution_count": 5,
 
42
  "💡 ════════════════════\n",
43
  "💡 USER QUERY \n",
44
  "💡 ════════════════════\n",
45
+ "Processing 1 files:\n",
46
+ "\n",
47
+ "📁 ════════════════════\n",
48
+ "📁 FILE PREPARATION \n",
49
+ "📁 ════════════════════\n",
50
+ "📁 Processing 1 file(s)\n",
51
+ " - D:/ankelodon_multiagent_system/data/Screenshot_1.png: image (979539 bytes) -> vision_qa_gemma\n",
52
+ " • path: D:/ankelodon_multiagent_system/data/Screenshot_1.png\n",
53
+ " • type: image\n",
54
+ " • size: 979539 bytes\n",
55
+ " • suggested_tool: vision_qa_gemma\n",
56
  "=== COMPLEXITY ASSESSMENT ===\n",
57
  "Complexity: simple\n",
58
  "Needs planning: False\n",
59
+ "Reasoning: A mate-in-one is a single-step chess puzzle: it requires identifying one legal white move that delivers checkmate. That is one distinct step, no multi-stage data processing, and no specialized tools are required (unless you want to verify moves with a chess engine). Note: the user did not supply a board position (FEN/diagram/image); if the position is provided the task remains SIMPLE.\n",
60
  "=== SIMPLE EXECUTION ===\n",
61
  "Response generated for simple query.\n",
62
  "=== GENERATING EXECUTION REPORT ===\n",
63
  "Report generated - Confidence: high\n",
64
  "Key findings: 3\n",
65
  "Data sources: 2\n",
66
+ "query_summary='User requested the mate-in-one move for White from a provided chess diagram and asked to return only the algebraic notation of the move.' approach_used='Used an image-based QA tool (vision_qa_gemma) to identify the chess position from the supplied screenshot, determine that it is White to move, compute the mate-in-one, and return the move in algebraic notation.' tools_executed=[ToolExecution(tool_name='vision_qa_gemma', arguments='{\\'question\\': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\", \\'image_path\\': \\'D:/ankelodon_multiagent_system/data/Screenshot_1.png\\', \\'temperature\\': 0.2}', call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')] key_findings=['The position in the provided image was analyzed and confirmed to be White to move.', 'A forced mate in one was identified for White.', 'The mate-in-one move in algebraic notation is: Qf7#.'] data_sources=['Screenshot image: D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'vision_qa_gemma tool output (see Tools Executed)'] assumptions_made=[\"Standard algebraic notation is used (including '#' for mate).\", 'Board orientation is conventional with White pieces at the bottom (as interpreted from the image).', 'The image is an accurate and complete representation of the game state (no hidden or obscured pieces).'] confidence_level='high' limitations=['Result depends on the accuracy of the vision tool interpreting the image; any misread piece or orientation could change the correct move.', 'Only one tool/analysis pass was used; no independent engine verification was run here.', 'If there are multiple legal mate-in-one moves in the position, this report records the move returned by the tool without enumerating alternatives.'] final_answer='Qf7#'\n",
67
  "=== ENHANCED ANSWER CRITIQUE ===\n",
68
+ "Quality Score: 7/10\n",
69
  "Complete: True\n",
70
  "Accurate: True\n",
 
71
  "=== REPLAN DECISION ===\n",
72
  "Iteration: 1/10\n",
73
+ "Quality score: 7\n",
74
  "Needs replanning: False\n",
75
  "Quality acceptable, ending execution\n"
76
  ]
77
  }
78
  ],
79
  "source": [
80
+ "query = \"Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.\"\n",
81
+ "result = graph.invoke({\"query\" : query, \"current_step\": 0, \"reasoning_done\": False, \"files\" : [\"D:/ankelodon_multiagent_system/data/Screenshot_1.png\"], \"file_contents\" : {}, \"iteration_count\" : 0, \"max_iterations\" : 10, \"plan\" : None} , config = config)"
82
  ]
83
  },
84
  {
 
90
  "name": "stdout",
91
  "output_type": "stream",
92
  "text": [
93
+ "FINAL ANSWER: Qf7#\n"
94
  ]
95
  }
96
  ],
 
106
  {
107
  "data": {
108
  "text/plain": [
109
+ "{'messages': [SystemMessage(content='You are a COMPLEXITY ASSESSOR for a multi-tool agent system.\\nYour job is to analyze user queries and determine their complexity level and processing requirements.\\n\\nCOMPLEXITY LEVELS:\\n1. SIMPLE: Direct questions that can be answered immediately without tools or with single tool use\\n - Examples: \"What is photosynthesis?\", \"Define machine learning\", \"What\\'s the capital of France?\"\\n - NOTE: Simple math like \"2+2\" still requires calculator tool but counts as SIMPLE\\n\\n !ALSO: It can be a logical reasoning or explanation task that does not require tools.\\n \\n2. MODERATE: Questions requiring 2-4 tool calls or basic multi-step analysis\\n - Examples: \"Search for recent news about AI\", \"Analyze this CSV file for trends\", \"Calculate ROI from this data\"\\n - \"Compare two datasets\", \"Summarize multiple documents\"\\n \\n3. COMPLEX: Multi-step problems requiring planning, multiple tools, and sophisticated reasoning\\n - Examples: \"Research market trends and create investment strategy\", \"Analyze multiple data sources and predict outcomes\"\\n - \"Build comprehensive report from various inputs\", \"Multi-stage data processing with validation\"\\n\\nMOST OF THE LOGICAL TASKS ARE SIMPLE, UNLESS THEY REQUIRE TOOLS.\\n\\nASSESSMENT CRITERIA:\\n- Number of distinct steps likely needed (1 = Simple, 2-4 = Moderate, 5+ = Complex)\\n- Tool complexity and dependencies between steps\\n- Data processing requirements and validation needs\\n- Need for intermediate reasoning and synthesis\\n- Risk of failure without proper step-by-step planning\\n- Presence of calculations (automatically requires tool usage)\\n\\nSPECIAL CONSIDERATIONS:\\n- Any calculation/counting task requires tools (affects complexity assessment)\\n- File analysis tasks usually need multiple steps (load + analyze + calculate)\\n- Research tasks typically need search + fetch/extract + synthesis steps\\n- Comparison tasks need separate analysis steps 
for each item being compared\\n\\nRULES:\\n- SIMPLE queries may bypass planning for non-calculation tasks\\n- MODERATE queries benefit from lightweight planning\\n- COMPLEX queries require full planning with fallbacks\\n- When in doubt, err toward higher complexity\\n- Calculation tasks are never truly \"simple\" due to mandatory tool usage\\n\\nAnalyze the query and respond with your assessment.', additional_kwargs={}, response_metadata={}, id='a857824a-40fa-4453-8cdd-5dd9c73cd1dc'),\n",
110
+ " HumanMessage(content='Query: Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.', additional_kwargs={}, response_metadata={}, id='5a765bb0-2ced-4919-aa4c-3cb6852b6a6d'),\n",
111
+ " AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_RxKLwrV2KP1sJ7ShaImQqbUl', 'function': {'arguments': '{\"question\":\"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\",\"image_path\":\"D:/ankelodon_multiagent_system/data/Screenshot_1.png\",\"temperature\":0.2}', 'name': 'vision_qa_gemma'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 140, 'prompt_tokens': 1771, 'total_tokens': 1911, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 64, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CIjeuIxLBZoq7rEjlZRWcfUO3KQWZ', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--d9288bb8-21be-4a2e-a8b9-7e4fa97936e9-0', tool_calls=[{'name': 'vision_qa_gemma', 'args': {'question': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It's White to move.\", 'image_path': 'D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'temperature': 0.2}, 'id': 'call_RxKLwrV2KP1sJ7ShaImQqbUl', 'type': 'tool_call'}], usage_metadata={'input_tokens': 1771, 'output_tokens': 140, 'total_tokens': 1911, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 64}}),\n",
112
+ " ToolMessage(content='{\"answer\": \"Qf7#\"}', name='vision_qa_gemma', id='559a320c-cb49-4f46-8e18-8b50d13fcbde', tool_call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')],\n",
113
+ " 'query': 'Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.',\n",
114
+ " 'final_answer': 'FINAL ANSWER: Qf7#',\n",
115
  " 'plan': None,\n",
116
+ " 'complexity_assessment': ComplexityLevel(level='simple', reasoning='A mate-in-one is a single-step chess puzzle: it requires identifying one legal white move that delivers checkmate. That is one distinct step, no multi-stage data processing, and no specialized tools are required (unless you want to verify moves with a chess engine). Note: the user did not supply a board position (FEN/diagram/image); if the position is provided the task remains SIMPLE.', needs_planning=False, suggested_approach='Ask the user to provide the chess position (FEN string, diagram, or image). Once the position is available, generate all legal white moves, test which move results in immediate checkmate, and return only the move in algebraic notation (e.g., Qh7#).'),\n",
117
  " 'current_step': 0,\n",
118
  " 'reasoning_done': False,\n",
119
+ " 'files': ['D:/ankelodon_multiagent_system/data/Screenshot_1.png'],\n",
120
+ " 'file_contents': {'D:/ankelodon_multiagent_system/data/Screenshot_1.png': {'path': 'D:/ankelodon_multiagent_system/data/Screenshot_1.png',\n",
121
+ " 'extension': '.png',\n",
122
+ " 'size': 979539,\n",
123
+ " 'type': 'image',\n",
124
+ " 'suggested_tool': 'vision_qa_gemma',\n",
125
+ " 'preview': None}},\n",
126
+ " 'critique_feedback': CritiqueFeedback(quality_score=7, is_complete=True, is_accurate=True, missing_elements=[], errors_found=[], suggested_improvements=['Consider using an additional chess engine for verification of the mate-in-one move to ensure accuracy.', 'Provide a brief explanation of the reasoning behind the identified move to enhance clarity for the user.'], needs_replanning=False, replan_instructions=None),\n",
127
  " 'iteration_count': 1,\n",
128
  " 'max_iterations': 10,\n",
129
+ " 'execution_report': ExecutionReport(query_summary='User requested the mate-in-one move for White from a provided chess diagram and asked to return only the algebraic notation of the move.', approach_used='Used an image-based QA tool (vision_qa_gemma) to identify the chess position from the supplied screenshot, determine that it is White to move, compute the mate-in-one, and return the move in algebraic notation.', tools_executed=[ToolExecution(tool_name='vision_qa_gemma', arguments='{\\'question\\': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\", \\'image_path\\': \\'D:/ankelodon_multiagent_system/data/Screenshot_1.png\\', \\'temperature\\': 0.2}', call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')], key_findings=['The position in the provided image was analyzed and confirmed to be White to move.', 'A forced mate in one was identified for White.', 'The mate-in-one move in algebraic notation is: Qf7#.'], data_sources=['Screenshot image: D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'vision_qa_gemma tool output (see Tools Executed)'], assumptions_made=[\"Standard algebraic notation is used (including '#' for mate).\", 'Board orientation is conventional with White pieces at the bottom (as interpreted from the image).', 'The image is an accurate and complete representation of the game state (no hidden or obscured pieces).'], confidence_level='high', limitations=['Result depends on the accuracy of the vision tool interpreting the image; any misread piece or orientation could change the correct move.', 'Only one tool/analysis pass was used; no independent engine verification was run here.', 'If there are multiple legal mate-in-one moves in the position, this report records the move returned by the tool without enumerating alternatives.'], final_answer='Qf7#')}"
130
  ]
131
  },
132
  "execution_count": 5,