Commit 11db2a0
Parent(s): b4f9800

Updated the README file

Files changed:
- README.md +43 -37
- tools/chess_tool.py +0 -1

README.md
CHANGED
hf_oauth_expiration_minutes: 480
---

# General AI Assistant 🔮

## Background
Created as a final project for the HuggingFace Agents course (https://huggingface.co/learn/agents-course).
Aims to answer Level 1 questions from the **GAIA** validation set. It was tested on 20 such questions with a success rate of 65%.

### What is GAIA

GAIA is a benchmark that evaluates AI assistants on real-world tasks that challenge them in several ways:
- Involve multimodal reasoning (e.g., analyzing images, audio, documents)
- Demand multi-hop retrieval of interdependent facts
- Involve running python code
- Require a structured response format

(see https://huggingface.co/learn/agents-course/unit4/what-is-gaia)

GAIA was introduced in the paper ["GAIA: A Benchmark for General AI Assistants"](https://arxiv.org/abs/2311.12983).

## Implementation Highlights 🛠️

**The agent** is implemented using the LangGraph framework.

**Tools**

**Web Search** 🔎 uses `tavily` search and extract tools.

- **Chunking**: the content returned by the `extract` tool might be too large to be analyzed at once by a model (depending on the chosen model's context window size or on rate limits), so if its size exceeds a pre-configured threshold it is chunked and only the most relevant chunks are analyzed further.
- **Text Splitting**: first by markdown headers (LangChain's `MarkdownHeaderTextSplitter`), then further by size with a sliding window (LangChain's `RecursiveCharacterTextSplitter`).
- **Embeddings**: `langchain_community.embeddings.OpenAIEmbeddings`
- **Vector DB**: `FAISS`
- **Retrieval**: `FAISS` similarity search

The original `extract` tool response message content is then replaced with only the relevant chunks' content.
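The threshold-then-sliding-window idea can be sketched without any dependencies (this is an illustration, not the repo's code, which uses the LangChain splitters above; the threshold and window sizes here are made-up example values):

```python
# Illustrative sketch of size-threshold chunking with an overlapping window.
# SIZE_THRESHOLD, CHUNK_SIZE and CHUNK_OVERLAP are hypothetical example values.

SIZE_THRESHOLD = 4_000   # content smaller than this is analyzed in one piece
CHUNK_SIZE = 1_000
CHUNK_OVERLAP = 200

def sliding_window_chunks(text: str) -> list[str]:
    """Split `text` into overlapping fixed-size windows if it is too large."""
    if len(text) <= SIZE_THRESHOLD:
        return [text]
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

chunks = sliding_window_chunks("x" * 5_000)   # 7 overlapping chunks
```

In the actual pipeline, each chunk would then be embedded and only the top matches from the FAISS similarity search are kept.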

**Audio** 🔉 uses `gpt-4o-audio-preview` to analyze the input audio.
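A sketch of how audio might be handed to `gpt-4o-audio-preview` through the OpenAI chat-completions API (the repo's actual call may differ; only the message payload is built here, since the real request needs an API key):

```python
import base64

def build_audio_messages(audio_bytes: bytes, question: str) -> list[dict]:
    """Build a chat-completions message list carrying base64-encoded audio."""
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "input_audio",
             "input_audio": {"data": encoded, "format": "mp3"}},
        ],
    }]

messages = build_audio_messages(b"\x00\x01", "What is said in this recording?")
# The real call would then be something like:
# client.chat.completions.create(model="gpt-4o-audio-preview", messages=messages)
```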

**Math problems Solver** 🧮 is a subagent that uses `gpt-5` equipped with the following tools:

- **Python code executor**: executes a snippet of python code provided as input
- **Think tool**: used for strategic reflection on the progress of the solving process
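A think tool typically does no computation at all; a minimal sketch (names are illustrative, not the repo's API) just records the model's reflection so the agent pauses to reason before acting again:

```python
# Hypothetical minimal "think" tool: a no-op that logs the model's reflection.
thought_log: list[str] = []

def think(thought: str) -> str:
    """Record a strategic reflection and echo it back to the model."""
    thought_log.append(thought)
    return f"Thought recorded: {thought}"

reply = think("The direct computation overflows; try a modular approach.")
```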

It has the following states:

**Python code Executor** ⚙️ can run either a snippet of python code or a python file. The python file is executed in a sub-process.
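The sub-process execution can be sketched with the standard library (illustrative, assuming the tool captures stdout/stderr and enforces a timeout; function and parameter names are hypothetical):

```python
import subprocess
import sys
import tempfile

def run_python_file(path: str, timeout: int = 30) -> str:
    """Execute a python file in a sub-process and return its stdout."""
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        return f"Error:\n{result.stderr}"
    return result.stdout

# Demo: write a tiny script to a temp file and run it.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("print(2 + 2)")
    script = f.name

output = run_python_file(script)   # "4\n"
```

A code snippet (rather than a file) can be run the same way with `[sys.executable, "-c", snippet]`.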

**Spreadsheets** 📊: analyzes `excel` files using the pandas dataframe agent `langchain_experimental.agents.create_pandas_dataframe_agent` and the `gpt-4.1` model.
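For intuition, this is the kind of pandas operation the dataframe agent generates and executes under the hood (the data and question are made up; the repo builds the agent from the loaded dataframe instead of writing this by hand):

```python
import pandas as pd

# Stand-in for a spreadsheet loaded with pd.read_excel(...).
df = pd.DataFrame({
    "category": ["food", "food", "drinks"],
    "sales": [120.0, 80.0, 45.5],
})

# e.g. for a question like "what are the total sales per category?":
totals = df.groupby("category")["sales"].sum()
```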

**Chess** ♟️: given a chess board and the active color, this tool suggests the best move for the active color.

- **Picture analysis**: the tool must detect the location of each piece on the chess board. Once the coordinates are retrieved, the FEN of the game is computed programmatically. I use both `gpt-4.1` and `gemini-2.5-flash` to extract the coordinates and perform an arbitrage on their outcomes.
- **Move suggestion**: the best move is suggested by a `stockfish` chess engine.
- **Move interpretation**: the move is then interpreted and transcribed into algebraic notation. Used `gpt-4` for this.
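Computing the FEN from extracted coordinates can be sketched like this (an illustration, not the repo's implementation; the input format, a `{square: piece-letter}` dict with uppercase for white, is an assumption, and only the piece-placement field of the FEN is shown):

```python
def board_to_fen_field(pieces: dict[str, str]) -> str:
    """Build the piece-placement field of a FEN from {square: piece} coords."""
    rows = []
    for rank in range(8, 0, -1):            # FEN lists rank 8 first
        row, empty = "", 0
        for file_ in "abcdefgh":
            piece = pieces.get(f"{file_}{rank}")
            if piece is None:
                empty += 1                  # count consecutive empty squares
            else:
                if empty:
                    row += str(empty)
                    empty = 0
                row += piece
        if empty:
            row += str(empty)
        rows.append(row)
    return "/".join(rows)

fen_field = board_to_fen_field({"e1": "K", "e8": "k", "d1": "Q"})
# → "4k3/8/8/8/8/8/8/3QK3"
```

The full FEN additionally needs the active color, castling rights, etc.; the completed string is then handed to stockfish for the best-move query.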
**Chess board picture analysis: Challenges and Limitations** 🆘

I tried both `gpt-4.1` and `gemini-2.5-flash` for chess piece coordinate extraction, but obtained inconsistent results (there are times when they get it right, but also instances when they don't). At least for OpenAI there is a documented limitation on spatial reasoning: the model struggles with tasks requiring precise spatial localization, such as identifying chess positions (see [here](https://platform.openai.com/docs/guides/images-vision?api-mode=responses#limitations)). I queried both models and chose to do an arbitrage on their results: I invoked both again, but only on the conflicting positions, and repeated this for a limited number of steps. My aim was to reduce the number of objects the model focuses on. Still, the inconsistencies remain.
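The conflict-detection step of the arbitrage can be sketched as follows (an illustration of the idea, not the repo's `compare_analyses`; the board readings are made-up examples standing in for the two models' outputs):

```python
def conflicting_squares(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Squares where two board readings disagree, including one-sided detections."""
    return sorted(sq for sq in set(a) | set(b) if a.get(sq) != b.get(sq))

# Hypothetical readings: the models agree on the kings but place the queen
# on different squares, so only d1/d2 would be re-queried in the next round.
gpt_view    = {"e1": "K", "e8": "k", "d1": "Q"}
gemini_view = {"e1": "K", "e8": "k", "d2": "Q"}

conflicts = conflicting_squares(gpt_view, gemini_view)   # ["d1", "d2"]
```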

**Videos** 🎥 🚧

## Challenges 🆘
1. Chess board picture analysis.
2. Video analysis.

## Future work and improvements 🔜
#### 1. Evaluation
#### 2. Chunking
#### 3. Audio Analysis
#### 4. Video Analysis
#### 5. Chessboard Images analysis

## References 📚
The math tool implementation was inspired by this repo: https://github.com/langchain-ai/open_deep_research

tools/chess_tool.py
CHANGED

@@ -275,7 +275,6 @@ class ChessVisionAnalyzer:
 
         print("Squares with conflicts:", conflicts_sqares)
 
-
         first_analysis_res = self.analyze_board_from_image(active_color, image_path, 1, conflicts_sqares)
         second_analysis_res = self.analyze_board_from_image(active_color, image_path, 2, conflicts_sqares)
         result = self.compare_analyses(first_analysis_res, second_analysis_res)