carolinacon committed commit a4b0424 · 1 Parent(s): fc1b83d

updated chess tool and prompting

README.md CHANGED
@@ -16,7 +16,7 @@ hf_oauth_expiration_minutes: 480
 
 ## Background
 Created as a final project for the HuggingFace Agents course (https://huggingface.co/learn/agents-course).
-Aims to answer Level 1 questions from the **GAIA** validation set. It was tested on 20 such questions with a success rate of 75%.
+Aims to answer Level 1 questions from the **GAIA** validation set. It was tested on 20 such questions with a success rate of 90%.
 ### What is GAIA
 
 GAIA is a benchmark for evaluating AI assistants on real-world tasks that involve:
@@ -81,7 +81,7 @@ the game is computed programmatically. Both `gpt-4.1` and `gemini-2.5-flash` mod
 **Chess Board Picture Analysis - Challenges and Limitations** 🆘
 I tried both `gpt-4.1` and `gemini-2.5-flash` models for chess piece coordinate extraction, but I obtained inconsistent results (there
 are times when they get it right, but also instances when they don't).
-At least for OpenAI there is a limitation listed on their website (see [here](https://platform.openai.com/docs/guides/images-vision?api-mode=responses#limitations)):
+At least for OpenAI there is a limitation listed on their website (see [https://platform.openai.com/docs/guides/images-vision?api-mode=responses#limitations](https://platform.openai.com/docs/guides/images-vision?api-mode=responses#limitations)):
 >Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.
 
 The tool queries both models and arbitrates their results, querying them again only on pieces with conflicting positions.
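The arbitration step the README describes could look roughly like this sketch (function and variable names are hypothetical, not the repo's actual API): compare the piece maps returned by the two vision models and keep only the squares on which they disagree, so the follow-up query targets just those squares.

```python
# Hypothetical sketch of dual-model "arbitrage": find squares where the two
# analyses disagree, so only those squares need to be re-queried.

def find_conflicts(map_a: dict, map_b: dict) -> list:
    """Return the squares where the two piece maps disagree."""
    squares = set(map_a) | set(map_b)
    return sorted(sq for sq in squares if map_a.get(sq) != map_b.get(sq))

# Illustrative outputs from two models (square -> piece letter):
piece_map_a = {"e1": "K", "e8": "k", "d4": "Q"}
piece_map_b = {"e1": "K", "e8": "k", "d4": "N", "a1": "R"}
print(find_conflicts(piece_map_a, piece_map_b))  # ['a1', 'd4']
```

Agreeing squares are accepted as consensus; only the conflicting ones go back to the models.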
config/prompts.yaml CHANGED
@@ -2,7 +2,7 @@ prompts:
   base_system:
     content: |
       You are a general AI assistant tasked with answering complex questions.
-      Make sure you think step by step in order to answer the given question.
+      Break down the problem into smaller, manageable sub-problems. Then, solve each sub-problem step by step to reach the final answer.
 
       Here is the question you received:
       <question>
@@ -15,13 +15,15 @@ prompts:
       <summary>
       {{summary}}
       </summary>
+
+      Important: note that the last two messages were not yet included in this summary.
 
       For mathematical questions or problems delegate them to the math_tool.
       For chess related questions use the chess_analysis_tool.
 
       Include citations for all the information you retrieve, ensuring you know exactly where the data comes from.
-      If you have the information inside your knowledge, still call a tool in order to confirm it.
-
+      If you have the information inside your knowledge, still call a tool in order to confirm it.
+
       **Guidelines for Answering Questions:**
 
       * **Citations:** Always support findings with source URLs, clearly provided as in-text citations.
@@ -33,8 +35,6 @@ prompts:
       * **Observation:** Analyze obtained results.
      * Repeat Thought/Action/Observation cycles as needed.
       * **Final Answer:** Synthesize and present findings with citations in markdown format.
-
-      Break down a problem into sub-problems and solve it step by step.
 
       If the value of chunked_last_tool_call is true, this means that the last tool execution returns a result formed from the concatenation
       of multiple chunks.
@@ -50,9 +50,9 @@ prompts:
      Process the answer and extract YOUR FINAL ANSWER to be provided to the user. Make sure it respects the following guidelines.
      YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
      If you are asked for a number, don't use commas in your number nor units such as $ or a percent sign unless specified otherwise.
-     If you are asked for a string, don't use articles or abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
+     If you are asked for a string, don't use articles and don't use abbreviations (e.g. for city names), and write the digits in plain text unless specified otherwise.
      If you are asked for a comma separated list, apply the above rules depending on whether the element to be put in the list is a number or a string.
-     The rule for a comma-separated list is to always add a space after the comma, but never before it.
+     Very Important: The rule for a comma-separated list is to always add a space after the comma, but never before it.
    type: answer_refinement
    variables: []
    version: 1.0
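The comma-spacing rule the refined prompt emphasizes is easy to enforce mechanically; a hypothetical post-processing helper (not part of the repo) could normalize a list answer like so:

```python
# Illustrative normalizer for the "space after the comma, never before it"
# rule in the answer_refinement prompt.
def format_final_list(items) -> str:
    """Join items with ', ' after stripping stray whitespace."""
    return ", ".join(str(i).strip() for i in items)

print(format_final_list(["Paris", " Lyon", "Nice "]))  # Paris, Lyon, Nice
```

Doing this in code rather than relying on the LLM removes one source of benchmark-scoring mismatches.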
core/agent.py CHANGED
@@ -1,16 +1,15 @@
 from typing import Optional
 
 from langchain_core.messages import HumanMessage
+from langgraph.graph import START, StateGraph, END
 from langgraph.graph.state import CompiledStateGraph
+from langgraph.prebuilt import ToolNode
+from langgraph.prebuilt import tools_condition
 
 from core.messages import Attachment
 from core.state import State
 from nodes.nodes import assistant, optimize_memory, response_processing, pre_processor
-from tools.tavily_tools import llm_tools
-
-from langgraph.graph import START, StateGraph, END
-from langgraph.prebuilt import tools_condition
-from langgraph.prebuilt import ToolNode
+from tools.tavily_tools import web_search_tools
 
 
 class GaiaAgent:
@@ -23,7 +22,7 @@ class GaiaAgent:
         # Define nodes: these do the work
         builder.add_node("pre_processor", pre_processor)
         builder.add_node("assistant", assistant)
-        builder.add_node("tools", ToolNode(llm_tools))
+        builder.add_node("tools", ToolNode(web_search_tools))
         builder.add_node("optimize_memory", optimize_memory)
         builder.add_node("response_processing", response_processing)
@@ -49,7 +48,7 @@ class GaiaAgent:
         if attachment:
             initial_state["file_reference"] = attachment.file_path
 
-        messages = self.react_graph.invoke(initial_state)
+        messages = self.react_graph.invoke(initial_state, {"recursion_limit": 30})
         # for m in messages['messages']:
         #     m.pretty_print()
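For context on the added invoke config: LangGraph's `recursion_limit` caps the number of graph super-steps before the run is aborted (LangGraph raises a `GraphRecursionError`), which stops a tool-calling loop from running forever. A toy stand-in loop (not LangGraph's implementation) shows the idea:

```python
# Conceptual sketch of a step-capped agent loop. step_fn is a hypothetical
# function returning (new_state, done); the cap mirrors recursion_limit=30.
def run_react_loop(step_fn, state, recursion_limit=30):
    for _ in range(recursion_limit):
        state, done = step_fn(state)
        if done:
            return state
    raise RuntimeError("recursion limit reached")

# A toy step function that finishes after three iterations:
result = run_react_loop(lambda s: (s + 1, s + 1 >= 3), 0)
print(result)  # 3
```

A limit of 30 leaves room for several Thought/Action/Observation cycles while bounding cost when the model loops on a tool.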
core/state.py CHANGED
@@ -1,8 +1,7 @@
-from langgraph.graph import MessagesState
 import operator
-from typing_extensions import TypedDict, Annotated, List, Sequence
-from langchain_core.messages import BaseMessage
-from langgraph.graph.message import add_messages
+
+from langgraph.graph import MessagesState
+from typing_extensions import Annotated, List
 
 
 class State(MessagesState):
nodes/chunking_node.py CHANGED
@@ -58,15 +58,20 @@ class OversizedContentHandler:
     def process_oversized_message(self, message: BaseMessage, query: str) -> bool:
         chunked = False
         # At this point we are chunking only tavily_extract results messages
+        json_content = None
         if isinstance(message, ToolMessage) and message.name == "tavily_extract":
-            json_content = json.loads(message.content)
-            result = json_content['results'][0]
-            raw_content = result['raw_content']
+            try:
+                json_content = json.loads(message.content)
+            except Exception as e:
+                print("cannot parse message")
+        if json_content:
+            result = json_content['results'][0]
+            raw_content = result['raw_content']
 
-            content_size = self.count_tokens(raw_content)
-            if content_size > config.MAX_CONTEXT_TOKENS:
-                print(f"Proceed with chunking, evaluated no of tokens {content_size} for message {message.id}")
-                chunked = True
-                result['raw_content'] = self.extract_relevant_chunks(raw_content, query=query)
-                message.content = json.dumps(json_content)
+            content_size = self.count_tokens(raw_content)
+            if content_size > config.MAX_CONTEXT_TOKENS:
+                print(f"Proceed with chunking, evaluated no of tokens {content_size} for message {message.id}")
+                chunked = True
+                result['raw_content'] = self.extract_relevant_chunks(raw_content, query=query)
+                message.content = json.dumps(json_content)
         return chunked
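The diff calls `extract_relevant_chunks`, which is not shown here. One plausible sketch (illustrative scoring only, not the repo's implementation) splits the oversized content into fixed-size chunks and keeps the ones sharing the most terms with the query:

```python
# Hypothetical chunk selection: split raw_content, score each chunk by naive
# term overlap with the query, and keep the top_k best-scoring chunks.
def extract_relevant_chunks(raw_content: str, query: str,
                            chunk_size: int = 200, top_k: int = 2) -> str:
    chunks = [raw_content[i:i + chunk_size]
              for i in range(0, len(raw_content), chunk_size)]
    terms = set(query.lower().split())
    # Stable sort: ties keep original document order.
    scored = sorted(chunks, key=lambda c: -len(terms & set(c.lower().split())))
    return "\n".join(scored[:top_k])
```

A production version would more likely use embeddings or token-aware splitting, but the shape — chunk, score against the query, keep the best — is the same.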
nodes/nodes.py CHANGED
@@ -13,17 +13,17 @@ from tools.chess_tool import chess_analysis_tool
 from tools.excel_tool import query_excel_file
 from tools.math_agent import math_tool
 from tools.python_executor import execute_python_code
-from tools.tavily_tools import llm_tools
+from tools.tavily_tools import web_search_tools
 from utils.prompt_manager import prompt_mgmt
 
 model = ChatOpenAI(model="gpt-4.1")
 response_processing_model = ChatOpenAI(model="gpt-4.1-mini")
-llm_tools.append(query_audio)
-llm_tools.append(query_excel_file)
-llm_tools.append(execute_python_code)
-llm_tools.append(math_tool)
-llm_tools.append(chess_analysis_tool)
-model = model.bind_tools(llm_tools, parallel_tool_calls=False)
+web_search_tools.append(query_audio)
+web_search_tools.append(query_excel_file)
+web_search_tools.append(execute_python_code)
+web_search_tools.append(math_tool)
+web_search_tools.append(chess_analysis_tool)
+model = model.bind_tools(web_search_tools, parallel_tool_calls=False)
 
 
 # Node
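One design note on the pattern above: `web_search_tools` is imported from `tools.tavily_tools` and then mutated with `append`, so every other importer of that module sees the extra tools too — convenient here, but easy to trip over. A minimal illustration of the shared-list behavior (module names stand in for the real imports):

```python
# `from tools.tavily_tools import web_search_tools` binds the *same* list
# object; in-place appends are therefore visible to every importer.
web_search_tools = ["tavily_search", "tavily_extract"]  # as in tools.tavily_tools

imported_alias = web_search_tools   # what the `from ... import` yields elsewhere
imported_alias.append("math_tool")  # the append done in nodes/nodes.py

print(web_search_tools)  # ['tavily_search', 'tavily_extract', 'math_tool']
```

Copying the list first (`tools = [*web_search_tools, math_tool]`) would keep the web-search module's list purely about web search.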
tools/chess_tool.py CHANGED
@@ -118,7 +118,8 @@ class ChessVisionAnalyzer:
         HumanMessage(content=[
             {
                 "type": "text",
-                "text": "Analyze this chess board image and return the chess board orientation. "
+                "text": f"Analyze this chess board image and return the chess board orientation. I know that the "
+                        f"active color is {active_color}"
 
             },
             {
@@ -134,7 +135,7 @@ class ChessVisionAnalyzer:
         response = self.llm1.invoke(messages)
         return response.content
 
-    def analyze_board_from_image(self, active_color: str, image_path: str, llm_no: int,
+    def analyze_board_from_image(self, board_orientation: str, image_path: str, llm_no: int,
                                  squares: Optional[list] = None) -> Optional[ChessBoardAnalysis]:
         """Analyze chess board image and return FEN notation"""
         base64_image = encode_image_to_base64(image_path)
@@ -153,10 +154,8 @@ class ChessVisionAnalyzer:
             {
                 "type": "text",
                 "text": f"""Analyze this chess board image and return the pieces positions.
-                The chess board orientation is from **Black's perspective**.
-                - The files are labeled from **h to a** (left to right).
-                - The ranks are labeled from **8 to 1** (bottom to top).
-                This matches the standard orientation for Black's perspective.
+                {board_orientation}
+
                 {squares_text}
                 Return the positions of the pieces in JSON format.
                 Use the following schema for each piece:
@@ -190,14 +189,14 @@ class ChessVisionAnalyzer:
         return self._parse_llm_response(response.content)
 
     def analyze_board(self, active_color: str, file_reference: str) -> str:
-        first_analysis_res = self.analyze_board_from_image(active_color, file_reference, 1)
-        second_analysis_res = self.analyze_board_from_image(active_color, file_reference, 2)
+        board_orientation = self.analyze_board_orientation(active_color, file_reference)
+        first_analysis_res = self.analyze_board_from_image(board_orientation, file_reference, 1)
+        second_analysis_res = self.analyze_board_from_image(board_orientation, file_reference, 2)
 
         result = self.compare_analyses(first_analysis_res, second_analysis_res)
         if result['conflicts'] is not None and len(result['conflicts']) > 0:
-            arbitrage_result = self.arbitrate_conflicts(result, active_color, file_reference, 3)
+            arbitrage_result = self.arbitrate_conflicts(result, board_orientation, file_reference, 3)
 
-            # todo: if there are still conflicts let one of the llms win
             return arbitrage_result.get("consensus").to_fen(active_color)
         else:
             result.get("consensus").to_fen(active_color)
@@ -226,8 +225,7 @@ class ChessVisionAnalyzer:
         return None
 
     def compare_analyses(self, analysis_1: ChessBoardAnalysis, analysis_2: ChessBoardAnalysis) -> dict:
-        """Compare two analyses and identify conflicts"""
-        print("Comparing analyses")
+        """Compare the given analyses and identify conflicts"""
 
         if not analysis_1 or not analysis_2:
             return {"conflicts": [], "consensus": None, "need_arbitration": False}
@@ -264,7 +262,7 @@ class ChessVisionAnalyzer:
             "need_arbitration": need_arbitration
         }
 
-    def arbitrate_conflicts(self, state: dict, active_color: str, image_path: str, depth: int = 1) -> dict:
+    def arbitrate_conflicts(self, state: dict, board_orientation: str, image_path: str, depth: int = 1) -> dict:
         """Arbitrate conflicting piece positions"""
         print(f"Arbitrating conflicts with depth {depth}")
@@ -278,14 +276,14 @@ class ChessVisionAnalyzer:
 
         print("Pieces with conflicts:", conflicts_sqares)
 
-        first_analysis_res = self.analyze_board_from_image(active_color, image_path, 1, conflicts_sqares)
-        second_analysis_res = self.analyze_board_from_image(active_color, image_path, 2, conflicts_sqares)
+        first_analysis_res = self.analyze_board_from_image(board_orientation, image_path, 1, conflicts_sqares)
+        second_analysis_res = self.analyze_board_from_image(board_orientation, image_path, 2, conflicts_sqares)
         result = self.compare_analyses(first_analysis_res, second_analysis_res)
         result.get("consensus").merge_with(state.get("consensus"))
         if result['conflicts'] is not None and len(result['conflicts']) > 0:
             if depth > 0:
                 depth -= 1
-                result = self.arbitrate_conflicts(result, active_color, image_path, depth)
+                result = self.arbitrate_conflicts(result, board_orientation, image_path, depth)
             else:
                 print("Arbitrage completed with conflicts. took llm2 as ground truth")
                 result.get("consensus").merge_with(second_analysis_res)
@@ -343,7 +341,7 @@ class ChessMoveExplainer:
         5. Keep it concise but informative for an intermediate player
         """
 
-        response = self.llm([HumanMessage(content=prompt)])
+        response = self.llm.invoke([HumanMessage(content=prompt)])
         return response.content
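The `to_fen` method called on the consensus analysis is not shown in this diff. A hedged sketch (hypothetical names, not the repo's implementation) of building the piece-placement field of a FEN string from a square-to-piece map, with uppercase letters for White and the castling/en-passant fields stubbed as `-`:

```python
# Build the board-placement part of FEN from {"e1": "K", ...}; ranks are
# emitted 8 down to 1, files a to h, runs of empty squares as digits.
def to_fen(pieces: dict, active_color: str) -> str:
    rows = []
    for rank in range(8, 0, -1):
        row, empty = "", 0
        for file in "abcdefgh":
            piece = pieces.get(f"{file}{rank}")
            if piece:
                if empty:
                    row += str(empty)
                    empty = 0
                row += piece
            else:
                empty += 1
        if empty:
            row += str(empty)
        rows.append(row)
    # Castling and en-passant stubbed; move counters fixed, as a sketch.
    return "/".join(rows) + f" {active_color} - - 0 1"

print(to_fen({"e1": "K", "e8": "k"}, "w"))  # 4k3/8/8/8/8/8/8/4K3 w - - 0 1
```

Passing `active_color` through, as the diff does, matters because FEN's second field decides whose move a downstream engine analyzes.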
tools/tavily_tools.py CHANGED
@@ -1,21 +1,17 @@
 from langchain_tavily import TavilySearch
 from langchain_tavily import TavilyExtract
-from langchain_tavily import TavilyCrawl
 
 # Initialize Tavily Search Tool
 tavily_search_tool = TavilySearch(
     max_results=10,
     topic="general",
     # Make sure to avoid retrieving the response from a dataset or a space
-    exclude_domains =["https://huggingface.co/datasets", "https://huggingface.co/spaces"]
+    exclude_domains=["https://huggingface.co/datasets", "https://huggingface.co/spaces"]
 )
 
 # Define the LangChain extract tool
 tavily_extract_tool = TavilyExtract(extract_depth="basic")
 
-# Define the LangChain crawl tool
-tavily_crawl_tool = TavilyCrawl()
-
-llm_tools = [
-    tavily_search_tool, tavily_extract_tool, tavily_crawl_tool
-]
+web_search_tools = [
+    tavily_search_tool, tavily_extract_tool
+]
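One caveat on the config above: Tavily's `exclude_domains` is typically documented as taking domain names (e.g. `huggingface.co`) rather than full URLs with paths, so path-qualified entries like these may not filter as intended. A hypothetical client-side fallback that drops results under the excluded URL prefixes, independent of how the API treats them:

```python
# Hypothetical post-filter on search results (each a dict with a "url" key):
# drop any hit whose URL starts with an excluded prefix.
EXCLUDED_PREFIXES = ("https://huggingface.co/datasets", "https://huggingface.co/spaces")

def filter_results(results: list) -> list:
    return [r for r in results if not r["url"].startswith(EXCLUDED_PREFIXES)]

hits = [
    {"url": "https://huggingface.co/datasets/gaia-benchmark"},
    {"url": "https://example.com/article"},
]
print(filter_results(hits))  # [{'url': 'https://example.com/article'}]
```

This keeps GAIA answers from leaking in via the benchmark's own dataset and Spaces pages even if the API-side exclusion is ignored.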