Commit
·
af3a044
1
Parent(s):
331400f
Improve the planner prompt (old version kept), add a YouTube transcript tool and tavily_extract
- .gitignore +2 -2
- questions_20_gaia.json +1 -0
- src/__init__.py +95 -10
- src/config.py +4 -2
- src/prompts/prompts.py +101 -4
- src/requirements.txt +21 -0
- src/tools/tools.py +33 -0
- src/workflow_test.ipynb +34 -18
.gitignore
CHANGED
|
@@ -11,13 +11,13 @@ __pycache__/
|
|
| 11 |
venv/
|
| 12 |
env/
|
| 13 |
.venv/
|
| 14 |
-
|
| 15 |
# IDEs
|
| 16 |
.vscode/
|
| 17 |
.idea/
|
| 18 |
*.swp
|
| 19 |
*.swo
|
| 20 |
-
|
| 21 |
# OS
|
| 22 |
.DS_Store
|
| 23 |
.DS_Store?
|
|
|
|
| 11 |
venv/
|
| 12 |
env/
|
| 13 |
.venv/
|
| 14 |
+
/questions_20_gaia.json
|
| 15 |
# IDEs
|
| 16 |
.vscode/
|
| 17 |
.idea/
|
| 18 |
*.swp
|
| 19 |
*.swo
|
| 20 |
+
data/
|
| 21 |
# OS
|
| 22 |
.DS_Store
|
| 23 |
.DS_Store?
|
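Review note: Git matches `.gitignore` patterns against paths relative to the directory that holds the `.gitignore` file, so an entry written as an absolute path with a drive letter will not match anything inside the repository. Below is a minimal sketch of a check for such entries. The function name and the sample patterns are hypothetical and are not part of this commit.

```python
import re

# Matches a Windows drive prefix such as "D:/" or "C:\" (assumed heuristic).
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def absolute_gitignore_patterns(lines):
    """Return the patterns that look like absolute filesystem paths."""
    flagged = []
    for raw in lines:
        pattern = raw.strip()
        # Skip blank lines and comments.
        if not pattern or pattern.startswith("#"):
            continue
        if _DRIVE_PREFIX.match(pattern):
            flagged.append(pattern)
    return flagged


# Hypothetical sample, not the repository's real file.
sample = ["venv/", "# IDEs", "D:/project/questions.json", "data/"]
print(absolute_gitignore_patterns(sample))
```

A leading `/` is deliberately not flagged: in a `.gitignore` it anchors the pattern to the directory of the file rather than to the filesystem root.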
questions_20_gaia.json
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"task_id":"8e867cd7-cff9-4e6c-867a-ff5ddc2550be","question":"How many studio albums were published by Mercedes Sosa between 2000 and 2009 (included)? You can use the latest 2022 version of english wikipedia.","Level":"1","file_name":""},{"task_id":"a1e91b78-d3d8-4675-bb8d-62741b4b68a6","question":"In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?","Level":"1","file_name":""},{"task_id":"2d83110e-a098-4ebb-9987-066c06fa42d0","question":".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI","Level":"1","file_name":""},{"task_id":"cca530fc-4052-43b2-b130-b30968d8aa44","question":"Review the chess position provided in the image. It is black's turn. Provide the correct next move for black which guarantees a win. Please provide your response in algebraic notation.","Level":"1","file_name":"cca530fc-4052-43b2-b130-b30968d8aa44.png"},{"task_id":"4fc2f1ae-8625-45b5-ab34-ad4433bc21f8","question":"Who nominated the only Featured Article on English Wikipedia about a dinosaur that was promoted in November 2016?","Level":"1","file_name":""},{"task_id":"6f37996b-2ac7-44b0-8e68-6d28256631b4","question":"Given this table defining * on the set S = {a, b, c, d, e}\n\n|*|a|b|c|d|e|\n|---|---|---|---|---|---|\n|a|a|b|c|b|d|\n|b|b|c|a|e|c|\n|c|c|a|b|b|a|\n|d|b|e|b|e|d|\n|e|d|b|a|d|c|\n\nprovide the subset of S involved in any possible counter-examples that prove * is not commutative. 
Provide your answer as a comma separated list of the elements in the set in alphabetical order.","Level":"1","file_name":""},{"task_id":"9d191bce-651d-4746-be2d-7ef8ecadb9c2","question":"Examine the video at https://www.youtube.com/watch?v=1htKBjuUWec.\n\nWhat does Teal'c say in response to the question \"Isn't that hot?\"","Level":"1","file_name":""},{"task_id":"cabe07ed-9eca-40ea-8ead-410ef5e83f91","question":"What is the surname of the equine veterinarian mentioned in 1.E Exercises from the chemistry materials licensed by Marisa Alviar-Agnew & Henry Agnew under the CK-12 license in LibreText's Introductory Chemistry materials as compiled 08/21/2023?","Level":"1","file_name":""},{"task_id":"3cef3a44-215e-4aed-8e3b-b1e3f08063b7","question":"I'm making a grocery list for my mom, but she's a professor of botany and she's a real stickler when it comes to categorizing things. I need to add different foods to different categories on the grocery list, but if I make a mistake, she won't buy anything inserted in the wrong category. Here's the list I have so far:\n\nmilk, eggs, flour, whole bean coffee, Oreos, sweet potatoes, fresh basil, plums, green beans, rice, corn, bell pepper, whole allspice, acorns, broccoli, celery, zucchini, lettuce, peanuts\n\nI need to make headings for the fruits and vegetables. Could you please create a list of just the vegetables from my list? If you could do that, then I can figure out how to categorize the rest of the list into the appropriate categories. But remember that my mom is a real stickler, so make sure that no botanical fruits end up on the vegetable list, or she won't get them when she's at the store. Please alphabetize the list of vegetables, and place each item in a comma separated list.","Level":"1","file_name":""},{"task_id":"99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3","question":"Hi, I'm making a pie but I could use some help with my shopping list. I have everything I need for the crust, but I'm not sure about the filling. 
I got the recipe from my friend Aditi, but she left it as a voice memo and the speaker on my phone is buzzing so I can't quite make out what she's saying. Could you please listen to the recipe and list all of the ingredients that my friend described? I only want the ingredients for the filling, as I have everything I need to make my favorite pie crust. I've attached the recipe as Strawberry pie.mp3.\n\nIn your response, please only list the ingredients, not any measurements. So if the recipe calls for \"a pinch of salt\" or \"two cups of ripe strawberries\" the ingredients on the list would be \"salt\" and \"ripe strawberries\".\n\nPlease format your response as a comma separated list of ingredients. Also, please alphabetize the ingredients.","Level":"1","file_name":"99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3.mp3"},{"task_id":"305ac316-eef6-4446-960a-92d80d542f82","question":"Who did the actor who played Ray in the Polish-language version of Everybody Loves Raymond play in Magda M.? Give only the first name.","Level":"1","file_name":""},{"task_id":"f918266a-b3e0-4914-865d-4faa564f1aef","question":"What is the final numeric output from the attached Python code?","Level":"1","file_name":"f918266a-b3e0-4914-865d-4faa564f1aef.py"},{"task_id":"3f57289b-8c60-48be-bd80-01f8099ca449","question":"How many at bats did the Yankee with the most walks in the 1977 regular season have that same season?","Level":"1","file_name":""},{"task_id":"1f975693-876d-457b-a649-393859e79bf3","question":"Hi, I was out sick from my classes on Friday, so I'm trying to figure out what I need to study for my Calculus mid-term next week. My friend from class sent me an audio recording of Professor Willowbrook giving out the recommended reading for the test, but my headphones are broken :(\n\nCould you please listen to the recording for me and tell me the page numbers I'm supposed to go over? I've attached a file called Homework.mp3 that has the recording. 
Please provide just the page numbers as a comma-delimited list. And please provide the list in ascending order.","Level":"1","file_name":"1f975693-876d-457b-a649-393859e79bf3.mp3"},{"task_id":"840bfca7-4f7b-481a-8794-c560c340185d","question":"On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?","Level":"1","file_name":""},{"task_id":"bda648d7-d618-4883-88f4-3466eabd860e","question":"Where were the Vietnamese specimens described by Kuznetzov in Nedoshivina's 2010 paper eventually deposited? Just give me the city name without abbreviations.","Level":"1","file_name":""},{"task_id":"cf106601-ab4f-4af9-b045-5295fe67b37d","question":"What country had the least number of athletes at the 1928 Summer Olympics? If there's a tie for a number of athletes, return the first in alphabetical order. Give the IOC country code as your answer.","Level":"1","file_name":""},{"task_id":"a0c07678-e491-4bbc-8f0b-07405144218f","question":"Who are the pitchers with the number before and after Taishō Tamai's number as of July 2023? Give them to me in the form Pitcher Before, Pitcher After, use their last names only, in Roman characters.","Level":"1","file_name":""},{"task_id":"7bd855d8-463d-4ed5-93ca-5fe35145f733","question":"The attached Excel file contains the sales of menu items for a local fast-food chain. What were the total sales that the chain made from food (not including drinks)? 
Express your answer in USD with two decimal places.","Level":"1","file_name":"7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx"},{"task_id":"5a0c1adf-205e-4841-a666-7c3ef95def9d","question":"What is the first name of the only Malko Competition recipient from the 20th Century (after 1977) whose nationality on record is a country that no longer exists?","Level":"1","file_name":""}]
|
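Review note: each record in `questions_20_gaia.json` carries `task_id`, `question`, `Level`, and `file_name`, where an empty `file_name` means the task has no attachment. The sketch below counts tasks per attachment extension, which is one way to see which file tools a run would need. The helper name and the three sample records are made up for illustration.

```python
import json
from collections import Counter
from pathlib import PurePosixPath


def attachment_types(records):
    """Count tasks by attachment extension; '' means no attachment."""
    counts = Counter()
    for record in records:
        suffix = PurePosixPath(record.get("file_name", "")).suffix.lower()
        counts[suffix] += 1
    return counts


# Hypothetical records that follow the same schema as the added file.
sample = json.loads(
    '[{"task_id": "t1", "question": "q", "Level": "1", "file_name": ""},'
    ' {"task_id": "t2", "question": "q", "Level": "1", "file_name": "t2.mp3"},'
    ' {"task_id": "t3", "question": "q", "Level": "1", "file_name": "t3.xlsx"}]'
)
print(dict(attachment_types(sample)))
```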
src/__init__.py
CHANGED
|
@@ -4,16 +4,101 @@ Import key components for easy use:
|
|
| 4 |
from src import workflow, llm
|
| 5 |
"""
|
| 6 |
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
|
|
|
|
| 12 |
__version__ = "0.1.0"
|
| 13 |
__all__ = [
|
| 14 |
-
|
| 15 |
-
"
|
| 16 |
-
|
| 17 |
-
"
|
| 18 |
-
|
| 19 |
-
|
|
|
|
| 4 |
from src import workflow, llm
|
| 5 |
"""
|
| 6 |
|
| 7 |
+
"""
|
| 8 |
+
Ankelodon Multi-Agent System - package init.
|
| 9 |
+
|
| 10 |
+
Exports a convenient public API for working with the graph, the agent state,
|
| 11 |
+
schemas, and config. Put this file in the directory that contains:
|
| 12 |
+
agent.py, config.py, nodes.py, schemas.py, state.py
|
| 13 |
+
(in your case, src/).
|
| 14 |
+
"""
|
| 15 |
|
| 16 |
+
# Package version (optionally update manually or from git)
|
| 17 |
__version__ = "0.1.0"
|
| 18 |
+
|
| 19 |
+
# -- Graph/build
|
| 20 |
+
from .agent import build_workflow
|
| 21 |
+
|
| 22 |
+
# -- State
|
| 23 |
+
from .state import AgentState
|
| 24 |
+
|
| 25 |
+
# -- Schemas/models
|
| 26 |
+
from .schemas import (
|
| 27 |
+
ComplexityLevel,
|
| 28 |
+
CritiqueFeedback,
|
| 29 |
+
PlannerPlan,
|
| 30 |
+
PlanStep,
|
| 31 |
+
ExecutionReport,
|
| 32 |
+
ToolExecution,
|
| 33 |
+
TaskType,
|
| 34 |
+
)
|
| 35 |
+
|
| 36 |
+
# -- Config/LLM/Tools
|
| 37 |
+
from .config import (
|
| 38 |
+
config,
|
| 39 |
+
TOOLS,
|
| 40 |
+
DEBUGGING_TOOL_NODE,
|
| 41 |
+
llm,
|
| 42 |
+
llm_deterministic,
|
| 43 |
+
planner_llm,
|
| 44 |
+
llm_with_tools,
|
| 45 |
+
llm_criticist,
|
| 46 |
+
llm_reasoning,
|
| 47 |
+
)
|
| 48 |
+
|
| 49 |
+
# -- Nodes/routers (if you need to call them directly or for tests)
|
| 50 |
+
from .nodes import (
|
| 51 |
+
query_input,
|
| 52 |
+
complexity_assessor,
|
| 53 |
+
planner,
|
| 54 |
+
agent,
|
| 55 |
+
simple_executor,
|
| 56 |
+
critic_evaluator,
|
| 57 |
+
replanner,
|
| 58 |
+
enhanced_finalizer,
|
| 59 |
+
# routers
|
| 60 |
+
should_continue,
|
| 61 |
+
should_use_planning,
|
| 62 |
+
should_replan,
|
| 63 |
+
should_use_tools_simple_executor,
|
| 64 |
+
)
|
| 65 |
+
|
| 66 |
__all__ = [
|
| 67 |
+
# version
|
| 68 |
+
"__version__",
|
| 69 |
+
# graph build
|
| 70 |
+
"build_workflow",
|
| 71 |
+
# state
|
| 72 |
+
"AgentState",
|
| 73 |
+
# schemas
|
| 74 |
+
"ComplexityLevel",
|
| 75 |
+
"CritiqueFeedback",
|
| 76 |
+
"PlannerPlan",
|
| 77 |
+
"PlanStep",
|
| 78 |
+
"ExecutionReport",
|
| 79 |
+
"ToolExecution",
|
| 80 |
+
"TaskType",
|
| 81 |
+
# config/models/tools
|
| 82 |
+
"config",
|
| 83 |
+
"TOOLS",
|
| 84 |
+
"DEBUGGING_TOOL_NODE",
|
| 85 |
+
"llm",
|
| 86 |
+
"llm_deterministic",
|
| 87 |
+
"planner_llm",
|
| 88 |
+
"llm_with_tools",
|
| 89 |
+
"llm_criticist",
|
| 90 |
+
"llm_reasoning",
|
| 91 |
+
# nodes and routers
|
| 92 |
+
"query_input",
|
| 93 |
+
"complexity_assessor",
|
| 94 |
+
"planner",
|
| 95 |
+
"agent",
|
| 96 |
+
"simple_executor",
|
| 97 |
+
"critic_evaluator",
|
| 98 |
+
"replanner",
|
| 99 |
+
"enhanced_finalizer",
|
| 100 |
+
"should_continue",
|
| 101 |
+
"should_use_planning",
|
| 102 |
+
"should_replan",
|
| 103 |
+
"should_use_tools_simple_executor",
|
| 104 |
+
]
|
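Review note: the new `__init__.py` lists every public name twice, once in an import and once in `__all__`, so the two can drift apart. A minimal sketch of a consistency check follows. It runs against a stand-in module built with `types.ModuleType`; the `demo` module and the `missing_exports` helper are assumptions for illustration, not code from this repository.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Hypothetical stand-in for a package; not the real `src` module.
demo = types.ModuleType("demo")
demo.build_workflow = lambda: None
demo.__all__ = ["build_workflow", "AgentState"]

print(missing_exports(demo))
```

Against the real package the same call would be `missing_exports(src)` after `import src`, assuming the package imports cleanly in the test environment.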
src/config.py
CHANGED
|
@@ -14,7 +14,7 @@ TOOLS = [download_file_from_url, web_search,
|
|
| 14 |
arxiv_search, wiki_search, add, subtract, multiply, divide,
|
| 15 |
power, analyze_excel_file, analyze_csv_file, analyze_docx_file,
|
| 16 |
analyze_pdf_file, analyze_txt_file,
|
| 17 |
-
vision_qa_gemma, safe_code_run]
|
| 18 |
|
| 19 |
|
| 20 |
TOOL_NODE = ToolNode(TOOLS)
|
|
@@ -23,9 +23,11 @@ DEBUGGING_TOOL_NODE = TOOL_NODE
|
|
| 23 |
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7) #default 0.25
|
| 24 |
llm_deterministic = ChatOpenAI(model="gpt-5-mini", temperature=0.05)
|
| 25 |
planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1).with_structured_output(PlannerPlan)
|
| 26 |
-
llm_criticist = ChatOpenAI(model="gpt-
|
| 27 |
llm_with_tools = llm_deterministic.bind_tools(TOOLS)
|
| 28 |
llm_reasoning = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
|
|
|
|
|
|
|
| 29 |
|
| 30 |
|
| 31 |
|
|
|
|
| 14 |
arxiv_search, wiki_search, add, subtract, multiply, divide,
|
| 15 |
power, analyze_excel_file, analyze_csv_file, analyze_docx_file,
|
| 16 |
analyze_pdf_file, analyze_txt_file,
|
| 17 |
+
vision_qa_gemma, safe_code_run, web_extract, extract_youtube_transcript]
|
| 18 |
|
| 19 |
|
| 20 |
TOOL_NODE = ToolNode(TOOLS)
|
|
|
|
| 23 |
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7) #default 0.25
|
| 24 |
llm_deterministic = ChatOpenAI(model="gpt-5-mini", temperature=0.05)
|
| 25 |
planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1).with_structured_output(PlannerPlan)
|
| 26 |
+
llm_criticist = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
|
| 27 |
llm_with_tools = llm_deterministic.bind_tools(TOOLS)
|
| 28 |
llm_reasoning = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
|
| 29 |
+
llm_simple_executor = ChatOpenAI(model="gpt-5-mini", temperature=0.3)
|
| 30 |
+
llm_simple_with_tools = llm_simple_executor.bind_tools(TOOLS)
|
| 31 |
|
| 32 |
|
| 33 |
|
src/prompts/prompts.py
CHANGED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
|
| 2 |
You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
|
| 3 |
|
| 4 |
Available tools: {tool_catalogue}
|
|
@@ -25,7 +25,7 @@ Example 2: "Research recent AI developments and summarize key trends"
|
|
| 25 |
{{
|
| 26 |
"steps": [
|
| 27 |
{{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
|
| 28 |
-
{{"id": "s2", "goal": "
|
| 29 |
{{"id": "s3", "goal": "Extract and organize key information from articles", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
|
| 30 |
{{"id": "s4", "goal": "Analyze and synthesize key trends from gathered information", "tool": null}}
|
| 31 |
]
|
|
@@ -73,7 +73,7 @@ Ground rules:
|
|
| 73 |
- Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
|
| 74 |
- Break down complex tasks into logical components - don't try to solve everything at once
|
| 75 |
- Use tool names exactly as listed. If no tool is needed, set "tool": null.
|
| 76 |
-
- Never assume files or URLs exist - plan to search/
|
| 77 |
- Skip download steps when the required file is already provided.
|
| 78 |
- Ensure later steps only depend on results created by earlier steps.
|
| 79 |
- For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation
|
|
@@ -82,6 +82,103 @@ Ground rules:
|
|
| 82 |
- Plan for visualization or formatting steps when presenting complex results
|
| 83 |
"""
|
| 84 |
|
|
| 85 |
SYSTEM_EXECUTOR_PROMPT = """
|
| 86 |
You are the executor of a grounded multi-tool agent.
|
| 87 |
|
|
@@ -146,7 +243,7 @@ ASSESSMENT CRITERIA:
|
|
| 146 |
SPECIAL CONSIDERATIONS:
|
| 147 |
- Any calculation/counting task requires tools (affects complexity assessment)
|
| 148 |
- File analysis tasks usually need multiple steps (load + analyze + calculate)
|
| 149 |
-
- Research tasks typically need search + fetch + synthesis steps
|
| 150 |
- Comparison tasks need separate analysis steps for each item being compared
|
| 151 |
|
| 152 |
RULES:
|
|
|
|
| 1 |
+
SYSTEM_PROMPT_PLANNER_OLD = """
|
| 2 |
You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
|
| 3 |
|
| 4 |
Available tools: {tool_catalogue}
|
|
|
|
| 25 |
{{
|
| 26 |
"steps": [
|
| 27 |
{{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
|
| 28 |
+
{{"id": "s2", "goal": "Extract all information from the URLs found", "tool": "web_extract"}},
|
| 29 |
{{"id": "s3", "goal": "Extract and organize key information from articles", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
|
| 30 |
{{"id": "s4", "goal": "Analyze and synthesize key trends from gathered information", "tool": null}}
|
| 31 |
]
|
|
|
|
| 73 |
- Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
|
| 74 |
- Break down complex tasks into logical components - don't try to solve everything at once
|
| 75 |
- Use tool names exactly as listed. If no tool is needed, set "tool": null.
|
| 76 |
+
- Never assume files or URLs exist - plan to search/extract before analysing.
|
| 77 |
- Skip download steps when the required file is already provided.
|
| 78 |
- Ensure later steps only depend on results created by earlier steps.
|
| 79 |
- For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation
|
|
|
|
| 82 |
- Plan for visualization or formatting steps when presenting complex results
|
| 83 |
"""
|
| 84 |
|
| 85 |
+
SYSTEM_PROMPT_PLANNER = """
|
| 86 |
+
You are the planner of a multi-tool agent. Build a short, realistic plan that the executor can follow.
|
| 87 |
+
|
| 88 |
+
Available tools: {tool_catalogue}
|
| 89 |
+
Known local files: {file_list}
|
| 90 |
+
Additional context: {extra_context}
|
| 91 |
+
|
| 92 |
+
CRITICAL COMPUTATION RULE: ANY mathematical calculation, counting, statistical analysis, or numerical computation MUST be performed using either:
|
| 93 |
+
- Mathematical tools (calculator, math functions) for simple calculations
|
| 94 |
+
- Code execution tools (Python/JavaScript) for complex calculations, data analysis, or statistical operations
|
| 95 |
+
NEVER perform calculations manually or estimate numerical results.
|
| 96 |
+
|
| 97 |
+
TASK BREAKDOWN EXAMPLES:
|
| 98 |
+
|
| 99 |
+
Example 1: "Analyze sales data and calculate growth rates"
|
| 100 |
+
{{
|
| 101 |
+
"steps": [
|
| 102 |
+
{{"id": "s1", "goal": "Load and examine the sales data file", "tool": "analyze_(csv, docx, pdf etc.)_file"}},
|
| 103 |
+
{{"id": "s2", "goal": "Calculate monthly growth rates using Python", "tool": "safe_code_run"}},
|
| 104 |
+
{{"id": "s3", "goal": "Generate summary statistics and trends", "tool": "safe_code_run"}}
|
| 105 |
+
]
|
| 106 |
+
}}
|
| 107 |
+
|
| 108 |
+
Example 2: "Research recent AI developments and summarize key trends"
|
| 109 |
+
{{
|
| 110 |
+
"steps": [
|
| 111 |
+
{{"id": "s1", "goal": "Search for recent AI news and developments", "tool": "web_search"}},
|
| 112 |
+
{{"id": "s2", "goal": "Extract key links and pick relevant documents (PDF, reports)", "tool": "web_extract"}},
|
| 113 |
+
{{"id": "s3", "goal": "Download chosen report for detailed analysis", "tool": "download_file_from_url"}},
|
| 114 |
+
{{"id": "s4", "goal": "Analyze the downloaded document (PDF/DOCX/TXT)", "tool": "analyze_pdf_file"}},
|
| 115 |
+
{{"id": "s5", "goal": "Summarize and synthesize key insights from the analyzed content", "tool": null}}
|
| 116 |
+
]
|
| 117 |
+
}}
|
| 118 |
+
|
| 119 |
+
Example 3: "Compare performance metrics between two datasets"
|
| 120 |
+
{{
|
| 121 |
+
"steps": [
|
| 122 |
+
{{"id": "s1", "goal": "Load first dataset and examine structure", "tool": "analyze_csv_file"}},
|
| 123 |
+
{{"id": "s2", "goal": "Load second dataset and examine structure", "tool": "analyze_excel_file"}},
|
| 124 |
+
{{"id": "s3", "goal": "Calculate statistical metrics for both datasets using code", "tool": "safe_code_run"}},
|
| 125 |
+
{{"id": "s4", "goal": "Perform statistical comparison and significance testing", "tool": "safe_code_run"}}
|
| 126 |
+
]
|
| 127 |
+
}}
|
| 128 |
+
|
| 129 |
+
Example 4: "Create a budget analysis from expense data"
|
| 130 |
+
{{
|
| 131 |
+
"steps": [
|
| 132 |
+
{{"id": "s1", "goal": "Load expense data and validate format", "tool": "analyze_csv_file"}},
|
| 133 |
+
{{"id": "s2", "goal": "Calculate category totals and percentages using code", "tool": "safe_code_run"}},
|
| 134 |
+
{{"id": "s3", "goal": "Generate budget variance analysis and projections", "tool": "safe_code_run"}},
|
| 135 |
+
{{"id": "s4", "goal": "Create visualization of spending patterns", "tool": "safe_code_run"}}
|
| 136 |
+
]
|
| 137 |
+
}}
|
| 138 |
+
|
| 139 |
+
Example 5: "Find and analyze a scientific PDF report on renewable energy"
|
| 140 |
+
{{
|
| 141 |
+
"steps": [
|
| 142 |
+
{{"id": "s1", "goal": "Search the web for renewable energy PDF reports", "tool": "web_search"}},
|
| 143 |
+
{{"id": "s2", "goal": "Extract candidate PDF links from the search results", "tool": "web_extract"}},
|
| 144 |
+
{{"id": "s3", "goal": "Download the most relevant PDF document", "tool": "download_file_from_url"}},
|
| 145 |
+
{{"id": "s4", "goal": "Parse and extract text from the downloaded PDF", "tool": "analyze_pdf_file"}},
|
| 146 |
+
{{"id": "s5", "goal": "Summarize findings and highlight key trends in renewable energy", "tool": null}}
|
| 147 |
+
]
|
| 148 |
+
}}
|
| 149 |
+
|
| 150 |
+
Return a single JSON object with this structure:
|
| 151 |
+
{{
|
| 152 |
+
"task_type": "info|calc|table|doc_qa|image_qa|multi_hop",
|
| 153 |
+
"summary": "One sentence on the chosen approach",
|
| 154 |
+
"assumptions": ["optional clarifications"],
|
| 155 |
+
"steps": [
|
| 156 |
+
{{
|
| 157 |
+
"id": "s1",
|
| 158 |
+
"goal": "Action to take and why it helps",
|
| 159 |
+
"tool": "tool_name_or_null",
|
| 160 |
+
"inputs": "Key parameters or references (files, URLs, prior steps)",
|
| 161 |
+
"expected_result": "How you know the step succeeded",
|
| 162 |
+
"on_fail": "replan|stop"
|
| 163 |
+
}}
|
| 164 |
+
],
|
| 165 |
+
"answer_guidelines": "Reminders for the final response (citations, format, units, etc.)"
|
| 166 |
+
}}
|
| 167 |
+
|
| 168 |
+
Ground rules:
|
| 169 |
+
- Prefer 2-4 steps for most tasks. Single steps only for truly trivial queries. Calculation tasks must use tools always.
|
| 170 |
+
- Break down complex tasks into logical components - don't try to solve everything at once.
|
| 171 |
+
- Use tool names exactly as listed. If no tool is needed, set "tool": null.
|
| 172 |
+
- Never assume files or URLs exist - plan to search/extract before analysing.
|
| 173 |
+
- Skip download steps when the required file is already provided.
|
| 174 |
+
- Ensure later steps only depend on results created by earlier steps.
|
| 175 |
+
- For any numerical work: ALWAYS use tools (calculator/code) - never manual calculation.
|
| 176 |
+
- If the query involves analysis of multiple sources, plan separate steps for each source.
|
| 177 |
+
- Consider data validation and error checking as separate steps when handling files.
|
| 178 |
+
- Plan for visualization or formatting steps when presenting complex results.
|
| 179 |
+
"""
|
| 180 |
+
|
| 181 |
+
|
| 182 |
SYSTEM_EXECUTOR_PROMPT = """
|
| 183 |
You are the executor of a grounded multi-tool agent.
|
| 184 |
|
|
|
|
| 243 |
SPECIAL CONSIDERATIONS:
|
| 244 |
- Any calculation/counting task requires tools (affects complexity assessment)
|
| 245 |
- File analysis tasks usually need multiple steps (load + analyze + calculate)
|
| 246 |
+
- Research tasks typically need search + fetch/extract + synthesis steps
|
| 247 |
- Comparison tasks need separate analysis steps for each item being compared
|
| 248 |
|
| 249 |
RULES:
|
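Review note: the planner prompt relies on two mechanics. First, it is filled with `str.format`, so literal JSON braces are written as `{{` and `}}`. Second, it asks for tool names "exactly as listed" and for `on_fail` to be `replan` or `stop`. The sketch below illustrates both with a miniature template and a small plan check. `TEMPLATE`, `render`, `plan_problems`, and the sample plan (including the unknown tool `fetch_page`) are hypothetical and only mirror the shape of the prompt above.

```python
import json

# Hypothetical miniature of the planner template; doubled braces are literal.
TEMPLATE = 'Available tools: {tool_catalogue}\nReturn: {{"steps": []}}'

ALLOWED_ON_FAIL = {"replan", "stop"}


def render(template, tool_names):
    """Fill the placeholder; str.format turns '{{' and '}}' into '{' and '}'."""
    return template.format(tool_catalogue=", ".join(sorted(tool_names)))


def plan_problems(plan, tool_names):
    """List human-readable problems found in a planner response."""
    problems = []
    for step in plan.get("steps", []):
        tool = step.get("tool")
        # A null tool is allowed; any other name must be registered.
        if tool is not None and tool not in tool_names:
            problems.append(f"{step.get('id')}: unknown tool {tool!r}")
        on_fail = step.get("on_fail")
        if on_fail is not None and on_fail not in ALLOWED_ON_FAIL:
            problems.append(f"{step.get('id')}: bad on_fail {on_fail!r}")
    return problems


tools = {"web_search", "web_extract", "safe_code_run"}
plan = json.loads(
    '{"steps": ['
    '{"id": "s1", "goal": "search", "tool": "web_search", "on_fail": "replan"},'
    '{"id": "s2", "goal": "fetch", "tool": "fetch_page", "on_fail": "retry"},'
    '{"id": "s3", "goal": "summarize", "tool": null}]}'
)
print(render(TEMPLATE, tools))
print(plan_problems(plan, tools))
```

A check like this could run on the few-shot examples in the prompt as well as on live planner output, since an example that names an unregistered tool teaches the model to emit it.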
src/requirements.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
| 1 |
+
docx==0.2.4
|
| 2 |
+
gradio==5.46.1
|
| 3 |
+
ipython==8.12.3
|
| 4 |
+
langchain==0.3.27
|
| 5 |
+
langchain_community==0.3.29
|
| 6 |
+
langchain_core==0.3.76
|
| 7 |
+
langchain_openai==0.3.33
|
| 8 |
+
langgraph==0.6.7
|
| 9 |
+
matplotlib==3.8.2
|
| 10 |
+
numpy==2.3.3
|
| 11 |
+
pandas==2.3.2
|
| 12 |
+
pdfminer==20191125
|
| 13 |
+
Pillow==11.3.0
|
| 14 |
+
protobuf==6.32.1
|
| 15 |
+
pydantic==2.11.9
|
| 16 |
+
pytesseract==0.3.13
|
| 17 |
+
python-dotenv==1.1.1
|
| 18 |
+
Requests==2.32.5
|
| 19 |
+
tldextract==5.3.0
|
| 20 |
+
langchain-tavily
|
| 21 |
+
youtube-transcript-api
|
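Review note: most entries in `requirements.txt` are pinned with `==`, while `langchain-tavily` and `youtube-transcript-api` are not, so their installed versions can change between environments. A minimal sketch of a check that lists unpinned entries follows. The helper name is hypothetical, and it only recognises exact `==` pins.

```python
def unpinned(requirement_lines):
    """Return requirement entries that lack an exact '==' version pin."""
    names = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            names.append(line)
    return names


# Sample lines in the same style as the file above.
sample = ["langgraph==0.6.7", "langchain-tavily", "", "youtube-transcript-api"]
print(unpinned(sample))
```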
src/tools/tools.py
CHANGED
|
@@ -6,6 +6,8 @@ import base64
|
|
| 6 |
import tldextract
|
| 7 |
import tempfile
|
| 8 |
from urllib.parse import urlparse
|
|
|
|
|
|
|
| 9 |
import io
|
| 10 |
import pandas as pd
|
| 11 |
from typing import List, Optional, Dict, Any
|
|
@@ -18,6 +20,7 @@ from langchain_community.document_loaders import ArxivLoader
|
|
| 18 |
from langchain_community.document_loaders import WikipediaLoader
|
| 19 |
from PIL import ImageDraw, ImageFont, ImageEnhance, ImageFilter
|
| 20 |
from utils.image_processing import *
|
|
|
|
| 21 |
|
| 22 |
def _exif_dict(img: Image.Image) -> dict:
|
| 23 |
try:
|
|
@@ -38,6 +41,7 @@ def _clip(text: str | None, n: int) -> str:
|
|
| 38 |
return (text[: n - 1] + "…") if len(text) > n else text
|
| 39 |
|
| 40 |
|
|
|
|
| 41 |
def _parse_dt(v) -> Optional[str]:
|
| 42 |
"""[ΠΠΠΠΠΠΠΠΠ] Convert the date to an ISO string, if possible."""
|
| 43 |
try:
|
|
@@ -360,6 +364,35 @@ def arxiv_search(
|
|
| 360 |
return json.dumps({"error": str(e), "query": query, "provider": "arxiv"})
|
| 361 |
|
| 362 |
|
|
| 363 |
|
| 364 |
#----------------------------------------------MATH TOOLS------------------------------------------------#
|
| 365 |
|
|
|
|
| 6 |
import tldextract
|
| 7 |
import tempfile
|
| 8 |
from urllib.parse import urlparse
|
| 9 |
+
from langchain_tavily import TavilyExtract
|
| 10 |
+
from youtube_transcript_api import YouTubeTranscriptApi
|
| 11 |
import io
|
| 12 |
import pandas as pd
|
| 13 |
from typing import List, Optional, Dict, Any
|
|
|
|
| 20 |
from langchain_community.document_loaders import WikipediaLoader
|
| 21 |
from PIL import ImageDraw, ImageFont, ImageEnhance, ImageFilter
|
| 22 |
from utils.image_processing import *
|
| 23 |
+
import re
|
| 24 |
|
| 25 |
def _exif_dict(img: Image.Image) -> dict:
|
| 26 |
try:
|
|
|
|
| 41 |
return (text[: n - 1] + "…") if len(text) > n else text
|
| 42 |
|
| 43 |
|
| 44 |
+
|
| 45 |
def _parse_dt(v) -> Optional[str]:
|
| 46 |
"""[ΠΠΠΠΠΠΠΠΠ] Convert the date to an ISO string, if possible."""
|
| 47 |
try:
|
|
|
|
| 364 |
return json.dumps({"error": str(e), "query": query, "provider": "arxiv"})
|
| 365 |
|
| 366 |
|
| 367 |
+
@tool
|
| 368 |
+
def web_extract(urls: List[str]) -> str:
|
| 369 |
+
"""
|
| 370 |
+
Extract text content from web pages using TavilyExtract.
|
| 371 |
+
Returns JSON with {url, title, text, images?} for each URL.
|
| 372 |
+
"""
|
| 373 |
+
|
| 374 |
+
tool = TavilyExtract(
|
| 375 |
+
extract_depth="basic",
|
| 376 |
+
include_images=False,
|
| 377 |
+
)
|
| 378 |
+
results = tool.invoke({"urls": urls})
|
| 379 |
+
return json.dumps(results)
|
| 380 |
+
|
| 381 |
+
@tool
|
| 382 |
+
def extract_youtube_transcript(url: str, chars: int = 1_000) -> str:
|
| 383 |
+
"""
|
| 384 |
+
Fetch a YouTube transcript, truncated to the first *chars* characters.
|
| 385 |
+
"""
|
| 386 |
+
|
| 387 |
+
video_id_match = re.search(r"[?&]v=([A-Za-z0-9_\-]{11})", url)
|
| 388 |
+
if not video_id_match:
|
| 389 |
+
return "yt_error:id_not_found"
|
| 390 |
+
try:
|
| 391 |
+
transcript = YouTubeTranscriptApi.get_transcript(video_id_match.group(1))
|
| 392 |
+
text = " ".join(piece["text"] for piece in transcript)
|
| 393 |
+
return text[:chars]
|
| 394 |
+
except Exception as exc:
|
| 395 |
+
return f"yt_error:{exc}"
|
| 396 |
|
| 397 |
#----------------------------------------------MATH TOOLS------------------------------------------------#
|
| 398 |
|
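Review note: `extract_youtube_transcript` finds the video ID with a regex on the `v=` query parameter, which covers `watch?v=...` links but returns `yt_error:id_not_found` for short links and path-based URLs. The sketch below is one possible broader parser built only on the standard library. The function name, the set of accepted hosts, and the `shorts`/`embed`/`live` path prefixes are assumptions, not part of this commit.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are treated here as exactly 11 URL-safe characters,
# matching the length used by the regex in the tool above.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


print(youtube_video_id("https://www.youtube.com/watch?v=L1vXCYZAYYM"))
print(youtube_video_id("https://youtu.be/1htKBjuUWec"))
```

Checking the host before reading the ID also keeps the tool from treating an arbitrary site with a `v=` parameter as a YouTube link.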
src/workflow_test.ipynb
CHANGED
|
@@ -42,34 +42,43 @@
|
|
| 42 |
"π‘ ββββββββββββββββββββ\n",
|
| 43 |
"π‘ USER QUERY \n",
|
| 44 |
"π‘ ββββββββββββββββββββ\n",
|
| 45 |
-
"
|
|
|
| 46 |
"=== COMPLEXITY ASSESSMENT ===\n",
|
| 47 |
"Complexity: simple\n",
|
| 48 |
"Needs planning: False\n",
|
| 49 |
-
"Reasoning:
|
| 50 |
"=== SIMPLE EXECUTION ===\n",
|
| 51 |
"Response generated for simple query.\n",
|
| 52 |
"=== GENERATING EXECUTION REPORT ===\n",
|
| 53 |
"Report generated - Confidence: high\n",
|
| 54 |
"Key findings: 3\n",
|
| 55 |
"Data sources: 2\n",
|
| 56 |
-
"query_summary
|
| 57 |
"=== ENHANCED ANSWER CRITIQUE ===\n",
|
| 58 |
-
"Quality Score:
|
| 59 |
"Complete: True\n",
|
| 60 |
"Accurate: True\n",
|
| 61 |
-
"Issues found: [\"Performed the calculation mentally rather than using an external computational tool (triggers the evaluation framework's manual-calculation penalty).\"]\n",
|
| 62 |
"=== REPLAN DECISION ===\n",
|
| 63 |
"Iteration: 1/10\n",
|
| 64 |
-
"Quality score:
|
| 65 |
"Needs replanning: False\n",
|
| 66 |
"Quality acceptable, ending execution\n"
|
| 67 |
]
|
| 68 |
}
|
| 69 |
],
|
| 70 |
"source": [
|
| 71 |
-
"query = \"
|
| 72 |
-
"result = graph.invoke({\"query\" : query, \"current_step\": 0, \"reasoning_done\": False, \"files\" : [], \"files_contents\" : {}, \"iteration_count\" : 0, \"max_iterations\" : 10, \"plan\" : None} , config = config)"
|
| 73 |
]
|
| 74 |
},
|
| 75 |
{
|
|
@@ -81,7 +90,7 @@
|
|
| 81 |
"name": "stdout",
|
| 82 |
"output_type": "stream",
|
| 83 |
"text": [
|
| 84 |
-
"FINAL ANSWER:
|
| 85 |
]
|
| 86 |
}
|
| 87 |
],
|
|
@@ -97,20 +106,27 @@
|
|
| 97 |
{
|
| 98 |
"data": {
|
| 99 |
"text/plain": [
|
| 100 |
-
"{'messages': [SystemMessage(content='You are a COMPLEXITY ASSESSOR for a multi-tool agent system.\\nYour job is to analyze user queries and determine their complexity level and processing requirements.\\n\\nCOMPLEXITY LEVELS:\\n1. SIMPLE: Direct questions that can be answered immediately without tools or with single tool use\\n - Examples: \"What is photosynthesis?\", \"Define machine learning\", \"What\\'s the capital of France?\"\\n - NOTE: Simple math like \"2+2\" still requires calculator tool but counts as SIMPLE\\n\\n !ALSO: It can be a logical reasoning or explanation task that does not require tools.\\n \\n2. MODERATE: Questions requiring 2-4 tool calls or basic multi-step analysis\\n - Examples: \"Search for recent news about AI\", \"Analyze this CSV file for trends\", \"Calculate ROI from this data\"\\n - \"Compare two datasets\", \"Summarize multiple documents\"\\n \\n3. COMPLEX: Multi-step problems requiring planning, multiple tools, and sophisticated reasoning\\n - Examples: \"Research market trends and create investment strategy\", \"Analyze multiple data sources and predict outcomes\"\\n - \"Build comprehensive report from various inputs\", \"Multi-stage data processing with validation\"\\n\\nMOST OF THE LOGICAL TASKS ARE SIMPLE, UNLESS THEY REQUIRE TOOLS.\\n\\nASSESSMENT CRITERIA:\\n- Number of distinct steps likely needed (1 = Simple, 2-4 = Moderate, 5+ = Complex)\\n- Tool complexity and dependencies between steps\\n- Data processing requirements and validation needs\\n- Need for intermediate reasoning and synthesis\\n- Risk of failure without proper step-by-step planning\\n- Presence of calculations (automatically requires tool usage)\\n\\nSPECIAL CONSIDERATIONS:\\n- Any calculation/counting task requires tools (affects complexity assessment)\\n- File analysis tasks usually need multiple steps (load + analyze + calculate)\\n- Research tasks typically need search + fetch + synthesis steps\\n- Comparison tasks need separate analysis steps for each 
item being compared\\n\\nRULES:\\n- SIMPLE queries may bypass planning for non-calculation tasks\\n- MODERATE queries benefit from lightweight planning\\n- COMPLEX queries require full planning with fallbacks\\n- When in doubt, err toward higher complexity\\n- Calculation tasks are never truly \"simple\" due to mandatory tool usage\\n\\nAnalyze the query and respond with your assessment.', additional_kwargs={}, response_metadata={}, id='
|
| 101 |
-
" HumanMessage(content='Query:
|
| 102 |
-
" AIMessage(content='
|
| 103 |
-
"
|
| 104 |
-
" '
|
|
|
|
| 105 |
" 'plan': None,\n",
|
| 106 |
-
" 'complexity_assessment': ComplexityLevel(level='simple', reasoning='
|
| 107 |
" 'current_step': 0,\n",
|
| 108 |
" 'reasoning_done': False,\n",
|
| 109 |
-
" 'files': [],\n",
|
| 110 |
-
" '
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 111 |
" 'iteration_count': 1,\n",
|
| 112 |
" 'max_iterations': 10,\n",
|
| 113 |
-
" 'execution_report': ExecutionReport(query_summary
|
| 114 |
]
|
| 115 |
},
|
| 116 |
"execution_count": 5,
|
|
|
|
| 42 |
"π‘ ββββββββββββββββββββ\n",
|
| 43 |
"π‘ USER QUERY \n",
|
| 44 |
"π‘ ββββββββββββββββββββ\n",
|
| 45 |
+
"Processing 1 files:\n",
|
| 46 |
+
"\n",
|
| 47 |
+
"π ββββββββββββββββββββ\n",
|
| 48 |
+
"π FILE PREPARATION \n",
|
| 49 |
+
"π ββββββββββββββββββββ\n",
|
| 50 |
+
"π Processing 1 file(s)\n",
|
| 51 |
+
" - D:/ankelodon_multiagent_system/data/Screenshot_1.png: image (979539 bytes) -> vision_qa_gemma\n",
|
| 52 |
+
" • path: D:/ankelodon_multiagent_system/data/Screenshot_1.png\n",
|
| 53 |
+
" • type: image\n",
|
| 54 |
+
" • size: 979539 bytes\n",
|
| 55 |
+
" • suggested_tool: vision_qa_gemma\n",
|
| 56 |
"=== COMPLEXITY ASSESSMENT ===\n",
|
| 57 |
"Complexity: simple\n",
|
| 58 |
"Needs planning: False\n",
|
| 59 |
+
"Reasoning: A mate-in-one is a single-step chess puzzle: it requires identifying one legal white move that delivers checkmate. That is one distinct step, no multi-stage data processing, and no specialized tools are required (unless you want to verify moves with a chess engine). Note: the user did not supply a board position (FEN/diagram/image); if the position is provided the task remains SIMPLE.\n",
|
| 60 |
"=== SIMPLE EXECUTION ===\n",
|
| 61 |
"Response generated for simple query.\n",
|
| 62 |
"=== GENERATING EXECUTION REPORT ===\n",
|
| 63 |
"Report generated - Confidence: high\n",
|
| 64 |
"Key findings: 3\n",
|
| 65 |
"Data sources: 2\n",
|
| 66 |
+
"query_summary='User requested the mate-in-one move for White from a provided chess diagram and asked to return only the algebraic notation of the move.' approach_used='Used an image-based QA tool (vision_qa_gemma) to identify the chess position from the supplied screenshot, determine that it is White to move, compute the mate-in-one, and return the move in algebraic notation.' tools_executed=[ToolExecution(tool_name='vision_qa_gemma', arguments='{\\'question\\': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\", \\'image_path\\': \\'D:/ankelodon_multiagent_system/data/Screenshot_1.png\\', \\'temperature\\': 0.2}', call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')] key_findings=['The position in the provided image was analyzed and confirmed to be White to move.', 'A forced mate in one was identified for White.', 'The mate-in-one move in algebraic notation is: Qf7#.'] data_sources=['Screenshot image: D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'vision_qa_gemma tool output (see Tools Executed)'] assumptions_made=[\"Standard algebraic notation is used (including '#' for mate).\", 'Board orientation is conventional with White pieces at the bottom (as interpreted from the image).', 'The image is an accurate and complete representation of the game state (no hidden or obscured pieces).'] confidence_level='high' limitations=['Result depends on the accuracy of the vision tool interpreting the image; any misread piece or orientation could change the correct move.', 'Only one tool/analysis pass was used; no independent engine verification was run here.', 'If there are multiple legal mate-in-one moves in the position, this report records the move returned by the tool without enumerating alternatives.'] final_answer='Qf7#'\n",
|
| 67 |
"=== ENHANCED ANSWER CRITIQUE ===\n",
|
| 68 |
+
"Quality Score: 7/10\n",
|
| 69 |
"Complete: True\n",
|
| 70 |
"Accurate: True\n",
|
|
|
|
| 71 |
"=== REPLAN DECISION ===\n",
|
| 72 |
"Iteration: 1/10\n",
|
| 73 |
+
"Quality score: 7\n",
|
| 74 |
"Needs replanning: False\n",
|
| 75 |
"Quality acceptable, ending execution\n"
|
| 76 |
]
|
| 77 |
}
|
| 78 |
],
|
| 79 |
"source": [
|
| 80 |
+
"query = \"Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.\"\n",
|
| 81 |
+
"result = graph.invoke({\"query\" : query, \"current_step\": 0, \"reasoning_done\": False, \"files\" : [\"D:/ankelodon_multiagent_system/data/Screenshot_1.png\"], \"files_contents\" : {}, \"iteration_count\" : 0, \"max_iterations\" : 10, \"plan\" : None} , config = config)"
|
| 82 |
]
|
| 83 |
},
|
| 84 |
{
|
|
|
|
| 90 |
"name": "stdout",
|
| 91 |
"output_type": "stream",
|
| 92 |
"text": [
|
| 93 |
+
"FINAL ANSWER: Qf7#\n"
|
| 94 |
]
|
| 95 |
}
|
| 96 |
],
|
|
|
|
| 106 |
{
|
| 107 |
"data": {
|
| 108 |
"text/plain": [
|
| 109 |
+
"{'messages': [SystemMessage(content='You are a COMPLEXITY ASSESSOR for a multi-tool agent system.\\nYour job is to analyze user queries and determine their complexity level and processing requirements.\\n\\nCOMPLEXITY LEVELS:\\n1. SIMPLE: Direct questions that can be answered immediately without tools or with single tool use\\n - Examples: \"What is photosynthesis?\", \"Define machine learning\", \"What\\'s the capital of France?\"\\n - NOTE: Simple math like \"2+2\" still requires calculator tool but counts as SIMPLE\\n\\n !ALSO: It can be a logical reasoning or explanation task that does not require tools.\\n \\n2. MODERATE: Questions requiring 2-4 tool calls or basic multi-step analysis\\n - Examples: \"Search for recent news about AI\", \"Analyze this CSV file for trends\", \"Calculate ROI from this data\"\\n - \"Compare two datasets\", \"Summarize multiple documents\"\\n \\n3. COMPLEX: Multi-step problems requiring planning, multiple tools, and sophisticated reasoning\\n - Examples: \"Research market trends and create investment strategy\", \"Analyze multiple data sources and predict outcomes\"\\n - \"Build comprehensive report from various inputs\", \"Multi-stage data processing with validation\"\\n\\nMOST OF THE LOGICAL TASKS ARE SIMPLE, UNLESS THEY REQUIRE TOOLS.\\n\\nASSESSMENT CRITERIA:\\n- Number of distinct steps likely needed (1 = Simple, 2-4 = Moderate, 5+ = Complex)\\n- Tool complexity and dependencies between steps\\n- Data processing requirements and validation needs\\n- Need for intermediate reasoning and synthesis\\n- Risk of failure without proper step-by-step planning\\n- Presence of calculations (automatically requires tool usage)\\n\\nSPECIAL CONSIDERATIONS:\\n- Any calculation/counting task requires tools (affects complexity assessment)\\n- File analysis tasks usually need multiple steps (load + analyze + calculate)\\n- Research tasks typically need search + fetch/extract + synthesis steps\\n- Comparison tasks need separate analysis steps 
for each item being compared\\n\\nRULES:\\n- SIMPLE queries may bypass planning for non-calculation tasks\\n- MODERATE queries benefit from lightweight planning\\n- COMPLEX queries require full planning with fallbacks\\n- When in doubt, err toward higher complexity\\n- Calculation tasks are never truly \"simple\" due to mandatory tool usage\\n\\nAnalyze the query and respond with your assessment.', additional_kwargs={}, response_metadata={}, id='a857824a-40fa-4453-8cdd-5dd9c73cd1dc'),\n",
|
| 110 |
+
" HumanMessage(content='Query: Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.', additional_kwargs={}, response_metadata={}, id='5a765bb0-2ced-4919-aa4c-3cb6852b6a6d'),\n",
|
| 111 |
+
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_RxKLwrV2KP1sJ7ShaImQqbUl', 'function': {'arguments': '{\"question\":\"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\",\"image_path\":\"D:/ankelodon_multiagent_system/data/Screenshot_1.png\",\"temperature\":0.2}', 'name': 'vision_qa_gemma'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 140, 'prompt_tokens': 1771, 'total_tokens': 1911, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 64, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CIjeuIxLBZoq7rEjlZRWcfUO3KQWZ', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--d9288bb8-21be-4a2e-a8b9-7e4fa97936e9-0', tool_calls=[{'name': 'vision_qa_gemma', 'args': {'question': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It's White to move.\", 'image_path': 'D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'temperature': 0.2}, 'id': 'call_RxKLwrV2KP1sJ7ShaImQqbUl', 'type': 'tool_call'}], usage_metadata={'input_tokens': 1771, 'output_tokens': 140, 'total_tokens': 1911, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 64}}),\n",
|
| 112 |
+
" ToolMessage(content='{\"answer\": \"Qf7#\"}', name='vision_qa_gemma', id='559a320c-cb49-4f46-8e18-8b50d13fcbde', tool_call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')],\n",
|
| 113 |
+
" 'query': 'Find the chekmate in one move and provide only algebraic notation of the move, its a white turn.',\n",
|
| 114 |
+
" 'final_answer': 'FINAL ANSWER: Qf7#',\n",
|
| 115 |
" 'plan': None,\n",
|
| 116 |
+
" 'complexity_assessment': ComplexityLevel(level='simple', reasoning='A mate-in-one is a single-step chess puzzle: it requires identifying one legal white move that delivers checkmate. That is one distinct step, no multi-stage data processing, and no specialized tools are required (unless you want to verify moves with a chess engine). Note: the user did not supply a board position (FEN/diagram/image); if the position is provided the task remains SIMPLE.', needs_planning=False, suggested_approach='Ask the user to provide the chess position (FEN string, diagram, or image). Once the position is available, generate all legal white moves, test which move results in immediate checkmate, and return only the move in algebraic notation (e.g., Qh7#).'),\n",
|
| 117 |
" 'current_step': 0,\n",
|
| 118 |
" 'reasoning_done': False,\n",
|
| 119 |
+
" 'files': ['D:/ankelodon_multiagent_system/data/Screenshot_1.png'],\n",
|
| 120 |
+
" 'file_contents': {'D:/ankelodon_multiagent_system/data/Screenshot_1.png': {'path': 'D:/ankelodon_multiagent_system/data/Screenshot_1.png',\n",
|
| 121 |
+
" 'extension': '.png',\n",
|
| 122 |
+
" 'size': 979539,\n",
|
| 123 |
+
" 'type': 'image',\n",
|
| 124 |
+
" 'suggested_tool': 'vision_qa_gemma',\n",
|
| 125 |
+
" 'preview': None}},\n",
|
| 126 |
+
" 'critique_feedback': CritiqueFeedback(quality_score=7, is_complete=True, is_accurate=True, missing_elements=[], errors_found=[], suggested_improvements=['Consider using an additional chess engine for verification of the mate-in-one move to ensure accuracy.', 'Provide a brief explanation of the reasoning behind the identified move to enhance clarity for the user.'], needs_replanning=False, replan_instructions=None),\n",
|
| 127 |
" 'iteration_count': 1,\n",
|
| 128 |
" 'max_iterations': 10,\n",
|
| 129 |
+
" 'execution_report': ExecutionReport(query_summary='User requested the mate-in-one move for White from a provided chess diagram and asked to return only the algebraic notation of the move.', approach_used='Used an image-based QA tool (vision_qa_gemma) to identify the chess position from the supplied screenshot, determine that it is White to move, compute the mate-in-one, and return the move in algebraic notation.', tools_executed=[ToolExecution(tool_name='vision_qa_gemma', arguments='{\\'question\\': \"Identify the chess position and find mate in one for White. Provide the algebraic notation of the move only. It\\'s White to move.\", \\'image_path\\': \\'D:/ankelodon_multiagent_system/data/Screenshot_1.png\\', \\'temperature\\': 0.2}', call_id='call_RxKLwrV2KP1sJ7ShaImQqbUl')], key_findings=['The position in the provided image was analyzed and confirmed to be White to move.', 'A forced mate in one was identified for White.', 'The mate-in-one move in algebraic notation is: Qf7#.'], data_sources=['Screenshot image: D:/ankelodon_multiagent_system/data/Screenshot_1.png', 'vision_qa_gemma tool output (see Tools Executed)'], assumptions_made=[\"Standard algebraic notation is used (including '#' for mate).\", 'Board orientation is conventional with White pieces at the bottom (as interpreted from the image).', 'The image is an accurate and complete representation of the game state (no hidden or obscured pieces).'], confidence_level='high', limitations=['Result depends on the accuracy of the vision tool interpreting the image; any misread piece or orientation could change the correct move.', 'Only one tool/analysis pass was used; no independent engine verification was run here.', 'If there are multiple legal mate-in-one moves in the position, this report records the move returned by the tool without enumerating alternatives.'], final_answer='Qf7#')}"
|
| 130 |
]
|
| 131 |
},
|
| 132 |
"execution_count": 5,
|