carolinacon committed on
Commit
11db2a0
·
1 Parent(s): b4f9800

Updated the README file

Files changed (2)
  1. README.md +43 -37
  2. tools/chess_tool.py +0 -1
README.md CHANGED
@@ -12,30 +12,26 @@ hf_oauth: true
12
  hf_oauth_expiration_minutes: 480
13
  ---
14
 
15
- # General AI Assistant
16
 
17
  ## Background
18
  Created as a final project for the HuggingFace Agents course (https://huggingface.co/learn/agents-course).
19
  Aims to answer Level 1 questions from the **GAIA** validation set. It was tested on 20 such questions with a success rate of 65%.
20
- ### GAIA
21
-
22
- GAIA is a benchmark for AI assistants evaluation on real-world tasks that require a combination of capabilities—such
23
- as reasoning, multimodal understanding, web browsing, and proficient tool use (see https://huggingface.co/learn/agents-course/unit4/what-is-gaia).
24
-
25
- GAIA was introduced in the paper [”GAIA: A Benchmark for General AI Assistants”](https://arxiv.org/abs/2311.12983).
26
-
27
-
28
- The questions challenge AI systems in several ways:
29
 
 
30
  - Involve multimodal reasoning (e.g., analyzing images, audio, documents)
31
  - Demand multi-hop retrieval of interdependent facts
32
  - Involve running python code
33
  - Require a structured response format
34
 
 
35
 
36
- ## Implementation Highlights
 
 
 
37
 
38
-
39
 
40
  **The agent** is implemented using the LangGraph framework.
41
 
@@ -44,57 +40,67 @@ The questions challenge AI systems in several ways:
44
 
45
  **Tools**
46
 
47
- 🔎 **Web Search**: uses `tavily` search and extract tools.
48
 
49
- - **Chunking**: The content returned by the exact might be too large to be further analyzed at once by a model (depending on the chosen model context window size or on the rate limitation),
50
- so if its size exceeds a pre-configured threshold, it will be chunked and only the most relevant chunks will be analyzed.
51
  - **Text Splitting**: First by markdown headers (using LangChain's `MarkdownHeaderTextSplitter`), then by size with a sliding window (using LangChain's `RecursiveCharacterTextSplitter`).
52
- - **Embeddings**: langchain_community.embeddings.OpenAIEmbeddings
53
- - **Vector DB**: FAISS vector db.
54
- - **Retrieval**: FAISS similarity search
55
 
56
- Updated the original extract tool response message content only with the relevant chunks content.
57
 
58
- 🔉 **Audio**: uses `gpt-4o-audio-preview` to analyze the input
59
 
60
- 🧮 **Math problems**: this is a subagent that uses `gpt-5` equipped with the following tools:
61
 
62
- - **Pyhton code executor**: executes a snipped of python code provided as input
63
  - **Think tool**: used for strategic reflection on the progress of the solving process
64
 
65
- **The Math Agent States:**
 
 
66
 
 
 
67
 
68
- 🧩 **Python code**
69
- This tool can run either a snippet of python code or a python file. The python file is executed in a sub-process.
70
 
71
- 📊 **Spreadsheets**
72
- In order to analyze `excel` files this tool uses the pandas dataframe agent
73
- `langchain_experimental.agents import create_pandas_dataframe_agent`
74
- It uses `gpt-4.1` model.
 
 
 
 
 
 
 
 
75
 
76
- ♟️ **Chess**
77
- Given a chess board and the active color, this tool is able to suggest the best move to be performed by the active color.
78
 
79
- - **Picture analysis**: the tool must detect
80
 
81
- 🎥 **Videos**
82
 
83
 
84
 
85
- ## Challenges
 
 
86
 
87
 
88
 
89
- ## Future improvements
90
  #### 1. Evaluation
91
  #### 2. Chunking
92
  #### 3. Audio Analysis
93
  #### 4. Video Analysis
94
  #### 5. Chessboard Image Analysis
95
 
96
- ## References:
97
- https://github.com/langchain-ai/open_deep_research
98
 
99
 
100
 
 
12
  hf_oauth_expiration_minutes: 480
13
  ---
14
 
15
+ # General AI Assistant 🔮
16
 
17
  ## Background
18
  Created as a final project for the HuggingFace Agents course (https://huggingface.co/learn/agents-course).
19
  Aims to answer Level 1 questions from the **GAIA** validation set. It was tested on 20 such questions with a success rate of 65%.
20
+ ### What is GAIA
 
 
 
 
 
 
 
 
21
 
22
+ GAIA is a benchmark for evaluating AI assistants on real-world tasks that challenge them in several ways:
23
  - Involve multimodal reasoning (e.g., analyzing images, audio, documents)
24
  - Demand multi-hop retrieval of interdependent facts
25
  - Involve running python code
26
  - Require a structured response format
27
 
28
+ (see https://huggingface.co/learn/agents-course/unit4/what-is-gaia).
29
 
30
+ GAIA was introduced in the paper ["GAIA: A Benchmark for General AI Assistants"](https://arxiv.org/abs/2311.12983).
31
+
32
+
33
+ ## Implementation Highlights 🛠️
34
 
 
35
 
36
  **The agent** is implemented using the LangGraph framework.
37
 
 
40
 
41
  **Tools**
42
 
43
+ **Web Search** 🔎 uses `tavily` search and extract tools.
44
 
45
+ - **Chunking**: The content returned by the `extract` tool might be too large for a model to analyze at once (depending on the context window size of the chosen model or on rate limits),
46
+ so if its size exceeds a pre-configured threshold, it is chunked and only the most relevant chunks are analyzed further.
47
  - **Text Splitting**: First by markdown headers (using LangChain's `MarkdownHeaderTextSplitter`), then by size with a sliding window (using LangChain's `RecursiveCharacterTextSplitter`).
48
+ - **Embeddings**: `langchain_community.embeddings.OpenAIEmbeddings`
49
+ - **Vector DB**: `FAISS` vector db.
50
+ - **Retrieval**: `FAISS` similarity search
51
 
52
+ The content of the original `extract` tool response message is then replaced with the content of the relevant chunks only.
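
The chunk-and-retrieve flow described above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the Space's actual code: the function names, the default threshold and chunk sizes are all hypothetical, and a simple bag-of-words cosine similarity stands in for the `OpenAIEmbeddings` plus `FAISS` similarity search that the README names.

```python
import math
import re
from collections import Counter


def split_by_headers(markdown: str) -> list[str]:
    """Split markdown into sections, each starting at a header line."""
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def sliding_window(text: str, size: int, overlap: int) -> list[str]:
    """Cut text into pieces of at most `size` chars, overlapping by `overlap` (< size)."""
    if len(text) <= size:
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text) - overlap, step)]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevant_content(content: str, query: str, threshold: int = 2000,
                     chunk_size: int = 500, overlap: int = 50, top_k: int = 3) -> str:
    """Return `content` unchanged if small, else only its `top_k` most relevant chunks."""
    if len(content) <= threshold:
        return content
    # Split by markdown headers first, then by size with a sliding window
    chunks = [piece for section in split_by_headers(content)
              for piece in sliding_window(section, chunk_size, overlap)]
    query_vec = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(bag_of_words(c), query_vec), reverse=True)
    return "\n\n".join(ranked[:top_k])
```

In the real tool the ranking step would be replaced by embedding the chunks and running a vector similarity search; the surrounding control flow (threshold check, two-stage split, keep the top chunks) is the part this sketch is meant to show.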
53
 
54
+ **Audio** 🔉 uses `gpt-4o-audio-preview` to analyze the input.
55
 
56
+ **Math problem solver** 🧮 is a subagent that uses `gpt-5` and is equipped with the following tools:
57
 
58
+ - **Python code executor**: executes a snippet of Python code provided as input
59
  - **Think tool**: used for strategic reflection on the progress of the solving process
60
 
61
+ It has the following states:
62
+
63
+ **Python code executor** ⚙️ can run either a snippet of Python code or a Python file. A Python file is executed in a subprocess.
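
Running a Python file in a subprocess might look like the sketch below. It is an assumption about the general approach, not the tool's real implementation: `run_python_file` and its `timeout_s` parameter are hypothetical names.

```python
import subprocess
import sys


def run_python_file(path: str, timeout_s: float = 30.0) -> str:
    """Run a Python file in a child process and return its output or an error report."""
    completed = subprocess.run(
        [sys.executable, path],   # same interpreter as the parent process
        capture_output=True,      # collect stdout and stderr
        text=True,                # decode bytes to str
        timeout=timeout_s,        # raises subprocess.TimeoutExpired if exceeded
    )
    if completed.returncode != 0:
        return f"Error (exit code {completed.returncode}):\n{completed.stderr}"
    return completed.stdout
```

Executing the file in a separate process keeps a crash or an infinite loop in the submitted script from taking down the agent itself.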
64
 
65
+ **Spreadsheets** 📊 analyzes Excel files using the pandas DataFrame agent (`langchain_experimental.agents.create_pandas_dataframe_agent`)
66
+ and the `gpt-4.1` model.
67
 
68
+ **Chess** ♟️ Given a chess board and the active color, this tool suggests the best move for the active color.
 
69
 
70
+ - **Picture analysis**: the tool must detect the location of each piece on the chess board. Once the coordinates are retrieved, the FEN of
71
+ the game is computed programmatically. I use both `gpt-4.1` and `gemini-2.5-flash` to extract the coordinates and I arbitrate between their results.
72
+ - **Move suggestion**: the best move is suggested by a `stockfish` chess engine
73
+ - **Move interpretation**: the move is then interpreted and transcribed into algebraic notation using `gpt-4`.
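
Computing the FEN from the extracted coordinates can be illustrated with a short sketch. It assumes the models return a mapping from square names to FEN piece letters; `board_to_fen` and that input shape are hypothetical, not taken from `tools/chess_tool.py`. Castling rights, the en passant square and the move counters cannot be read from a picture, so the sketch fills them with neutral placeholders.

```python
def board_to_fen(pieces: dict[str, str], active_color: str) -> str:
    """Build a FEN string from a square -> piece map.

    `pieces` maps squares such as "e1" to FEN piece letters
    (uppercase = white, lowercase = black).
    """
    rows = []
    for rank in range(8, 0, -1):              # FEN lists rank 8 first
        row, empty = "", 0
        for file in "abcdefgh":
            piece = pieces.get(f"{file}{rank}")
            if piece is None:
                empty += 1                    # count consecutive empty squares
                continue
            if empty:
                row += str(empty)
                empty = 0
            row += piece
        if empty:
            row += str(empty)
        rows.append(row)
    side = "w" if active_color.lower().startswith("w") else "b"
    # Placeholders: no castling rights, no en passant square, counters reset
    return f"{'/'.join(rows)} {side} - - 0 1"


position = {"e1": "K", "e8": "k", "a2": "P", "h7": "p"}
print(board_to_fen(position, "white"))  # 4k3/7p/8/8/8/8/P7/4K3 w - - 0 1
```

A string in this form is what a UCI engine such as `stockfish` accepts as a starting position.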
74
+
75
+ **Chess board picture analysis Challenges and Limitations** 🆘
76
+ - I tried both `gpt-4.1` and `gemini-2.5-flash` for extracting the coordinates of the chess pieces, but the results were inconsistent (sometimes
77
+ the models get the position right, and sometimes they don't).
78
+ For OpenAI at least, this is a documented limitation on spatial reasoning: the model struggles with tasks requiring precise spatial localization, such as identifying chess positions (see [here](https://platform.openai.com/docs/guides/images-vision?api-mode=responses#limitations)).
79
+ I query both models and arbitrate between their results: both are invoked again, but only on the conflicting positions.
80
+ This process is repeated for a limited number of steps. The aim is to reduce the number of objects that each model has to focus on.
81
+ Even so, the inconsistencies remain.
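
The arbitration loop described above can be sketched as follows. This is an illustration under assumptions, not the code in `tools/chess_tool.py`: the `Analyzer` callables stand in for the two vision models, and `find_conflicts`, `arbitrate` and `max_steps` are hypothetical names.

```python
from typing import Callable, Optional

Board = dict[str, str]  # square name -> FEN piece letter
# An analyzer receives the squares to focus on (None = whole board) and returns a Board
Analyzer = Callable[[Optional[list[str]]], Board]


def find_conflicts(first: Board, second: Board) -> list[str]:
    """Squares on which the two analyses disagree."""
    return sorted(sq for sq in first.keys() | second.keys()
                  if first.get(sq) != second.get(sq))


def arbitrate(analyze_a: Analyzer, analyze_b: Analyzer,
              max_steps: int = 3) -> tuple[Board, list[str]]:
    """Return the agreed board plus any squares still in conflict."""
    first, second = analyze_a(None), analyze_b(None)
    agreed = {sq: p for sq, p in first.items() if second.get(sq) == p}
    conflicts = find_conflicts(first, second)
    for _ in range(max_steps):                # limited number of retries
        if not conflicts:
            break
        # Ask both models again, but only about the disputed squares
        first, second = analyze_a(conflicts), analyze_b(conflicts)
        for sq in conflicts:
            if first.get(sq) is not None and first.get(sq) == second.get(sq):
                agreed[sq] = first[sq]
        conflicts = [sq for sq in conflicts if first.get(sq) != second.get(sq)]
    return agreed, conflicts
```

Returning the unresolved squares alongside the agreed board lets the caller decide what to do when the inconsistencies persist after the last step.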
82
 
 
 
83
 
 
84
 
85
+ 🎥 **Videos** 🚧
86
 
87
 
88
 
89
+ ## Challenges 🆘
90
+ 1. Chess board picture analysis.
91
+ 2. Video analysis.
92
 
93
 
94
 
95
+ ## Future work and improvements 🔜
96
  #### 1. Evaluation
97
  #### 2. Chunking
98
  #### 3. Audio Analysis
99
  #### 4. Video Analysis
100
  #### 5. Chessboard Image Analysis
101
 
102
+ ## References 📚
103
+ The math tool implementation was inspired by this repo: https://github.com/langchain-ai/open_deep_research
104
 
105
 
106
 
tools/chess_tool.py CHANGED
@@ -275,7 +275,6 @@ class ChessVisionAnalyzer:
275
 
276
  print("Squares with conflicts:", conflicts_sqares)
277
 
278
-
279
  first_analysis_res = self.analyze_board_from_image(active_color, image_path, 1, conflicts_sqares)
280
  second_analysis_res = self.analyze_board_from_image(active_color, image_path, 2, conflicts_sqares)
281
  result = self.compare_analyses(first_analysis_res, second_analysis_res)
 
275
 
276
  print("Squares with conflicts:", conflicts_sqares)
277
 
 
278
  first_analysis_res = self.analyze_board_from_image(active_color, image_path, 1, conflicts_sqares)
279
  second_analysis_res = self.analyze_board_from_image(active_color, image_path, 2, conflicts_sqares)
280
  result = self.compare_analyses(first_analysis_res, second_analysis_res)