Commit 11db2a0
Parent(s): b4f9800

Updated the README file

Files changed:
- README.md +43 -37
- tools/chess_tool.py +0 -1

README.md
CHANGED
hf_oauth_expiration_minutes: 480
---

# General AI Assistant 🔮

## Background
Created as a final project for the HuggingFace Agents course (https://huggingface.co/learn/agents-course).
Aims to answer Level 1 questions from the **GAIA** validation set. It was tested on 20 such questions with a success rate of 65%.

### What is GAIA

GAIA is a benchmark that evaluates AI assistants on real-world tasks that challenge them in several ways:
- Involve multimodal reasoning (e.g., analyzing images, audio, documents)
- Demand multi-hop retrieval of interdependent facts
- Involve running python code
- Require a structured response format

(see https://huggingface.co/learn/agents-course/unit4/what-is-gaia)

GAIA was introduced in the paper ["GAIA: A Benchmark for General AI Assistants"](https://arxiv.org/abs/2311.12983).

## Implementation Highlights 🛠️

**The agent** is implemented using the LangGraph framework.

**Tools**

**Web Search** 🔎 uses `tavily` search and extract tools.

- **Chunking**: the content returned by the `extract` tool might be too large to be analyzed at once by a model (depending on the chosen model's context window size or on rate limits), so if its size exceeds a pre-configured threshold it is chunked and only the most relevant chunks are analyzed further.
- **Text Splitting**: first by markdown headers (LangChain's `MarkdownHeaderTextSplitter`), then further by size with a sliding window (LangChain's `RecursiveCharacterTextSplitter`).
- **Embeddings**: `langchain_community.embeddings.OpenAIEmbeddings`
- **Vector DB**: `FAISS`
- **Retrieval**: `FAISS` similarity search

The original `extract` tool response message content is then replaced with only the relevant chunks' content.
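The threshold-then-sliding-window idea can be sketched without any dependencies (this is an illustration, not the repo's code, which uses the LangChain splitters above; the threshold and window sizes here are made-up example values):

```python
# Illustrative sketch of size-threshold chunking with an overlapping window.
# SIZE_THRESHOLD, CHUNK_SIZE and CHUNK_OVERLAP are hypothetical example values.

SIZE_THRESHOLD = 4_000   # content smaller than this is analyzed in one piece
CHUNK_SIZE = 1_000
CHUNK_OVERLAP = 200

def sliding_window_chunks(text: str) -> list[str]:
    """Split `text` into overlapping fixed-size windows if it is too large."""
    if len(text) <= SIZE_THRESHOLD:
        return [text]
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

chunks = sliding_window_chunks("x" * 5_000)   # 7 overlapping chunks
```

In the actual pipeline, each chunk would then be embedded and only the top matches from the FAISS similarity search are kept.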

**Audio** 🔉 uses `gpt-4o-audio-preview` to analyze the input audio.
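A sketch of how audio might be handed to `gpt-4o-audio-preview` through the OpenAI chat-completions API (the repo's actual call may differ; only the message payload is built here, since the real request needs an API key):

```python
import base64

def build_audio_messages(audio_bytes: bytes, question: str) -> list[dict]:
    """Build a chat-completions message list carrying base64-encoded audio."""
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "input_audio",
             "input_audio": {"data": encoded, "format": "mp3"}},
        ],
    }]

messages = build_audio_messages(b"\x00\x01", "What is said in this recording?")
# The real call would then be something like:
# client.chat.completions.create(model="gpt-4o-audio-preview", messages=messages)
```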

**Math problems Solver** 🧮 is a subagent that uses `gpt-5` equipped with the following tools:

- **Python code executor**: executes a snippet of python code provided as input
- **Think tool**: used for strategic reflection on the progress of the solving process
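A think tool typically does no computation at all; a minimal sketch (names are illustrative, not the repo's API) just records the model's reflection so the agent pauses to reason before acting again:

```python
# Hypothetical minimal "think" tool: a no-op that logs the model's reflection.
thought_log: list[str] = []

def think(thought: str) -> str:
    """Record a strategic reflection and echo it back to the model."""
    thought_log.append(thought)
    return f"Thought recorded: {thought}"

reply = think("The direct computation overflows; try a modular approach.")
```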

It has the following states:

**Python code Executor** ⚙️ can run either a snippet of python code or a python file. The python file is executed in a sub-process.
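The sub-process execution can be sketched with the standard library (illustrative, assuming the tool captures stdout/stderr and enforces a timeout; function and parameter names are hypothetical):

```python
import subprocess
import sys
import tempfile

def run_python_file(path: str, timeout: int = 30) -> str:
    """Execute a python file in a sub-process and return its stdout."""
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        return f"Error:\n{result.stderr}"
    return result.stdout

# Demo: write a tiny script to a temp file and run it.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("print(2 + 2)")
    script = f.name

output = run_python_file(script)   # "4\n"
```

A code snippet (rather than a file) can be run the same way with `[sys.executable, "-c", snippet]`.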

**Spreadsheets** 📊: analyzes `excel` files using the pandas dataframe agent `langchain_experimental.agents.create_pandas_dataframe_agent` and the `gpt-4.1` model.
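For intuition, this is the kind of pandas operation the dataframe agent generates and executes under the hood (the data and question are made up; the repo builds the agent from the loaded dataframe instead of writing this by hand):

```python
import pandas as pd

# Stand-in for a spreadsheet loaded with pd.read_excel(...).
df = pd.DataFrame({
    "category": ["food", "food", "drinks"],
    "sales": [120.0, 80.0, 45.5],
})

# e.g. for a question like "what are the total sales per category?":
totals = df.groupby("category")["sales"].sum()
```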

**Chess** ♟️: given a chess board and the active color, this tool suggests the best move for the active color.

- **Picture analysis**: the tool must detect the location of each piece on the chess board. Once the coordinates are retrieved, the FEN of the game is computed programmatically. I use both `gpt-4.1` and `gemini-2.5-flash` to extract the coordinates and perform an arbitrage on their outcomes.
- **Move suggestion**: the best move is suggested by a `stockfish` chess engine.
- **Move interpretation**: the move is then interpreted and transcribed into algebraic notation. Used `gpt-4` for this.
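Computing the FEN from extracted coordinates can be sketched like this (an illustration, not the repo's implementation; the input format, a `{square: piece-letter}` dict with uppercase for white, is an assumption, and only the piece-placement field of the FEN is shown):

```python
def board_to_fen_field(pieces: dict[str, str]) -> str:
    """Build the piece-placement field of a FEN from {square: piece} coords."""
    rows = []
    for rank in range(8, 0, -1):            # FEN lists rank 8 first
        row, empty = "", 0
        for file_ in "abcdefgh":
            piece = pieces.get(f"{file_}{rank}")
            if piece is None:
                empty += 1                  # count consecutive empty squares
            else:
                if empty:
                    row += str(empty)
                    empty = 0
                row += piece
        if empty:
            row += str(empty)
        rows.append(row)
    return "/".join(rows)

fen_field = board_to_fen_field({"e1": "K", "e8": "k", "d1": "Q"})
# → "4k3/8/8/8/8/8/8/3QK3"
```

The full FEN additionally needs the active color, castling rights, etc.; the completed string is then handed to stockfish for the best-move query.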
**Chess board picture analysis: Challenges and Limitations** 🆘

I tried both `gpt-4.1` and `gemini-2.5-flash` for chess piece coordinate extraction, but obtained inconsistent results (there are times when they get it right, but also instances when they don't). At least for OpenAI there is a documented limitation on spatial reasoning: the model struggles with tasks requiring precise spatial localization, such as identifying chess positions (see [here](https://platform.openai.com/docs/guides/images-vision?api-mode=responses#limitations)). I queried both models and chose to do an arbitrage on their results: I invoked both again, but only on the conflicting positions, and repeated this for a limited number of steps. My aim was to reduce the number of objects the model focuses on. Still, the inconsistencies remain.
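The conflict-detection step of the arbitrage can be sketched as follows (an illustration of the idea, not the repo's `compare_analyses`; the board readings are made-up examples standing in for the two models' outputs):

```python
def conflicting_squares(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Squares where two board readings disagree, including one-sided detections."""
    return sorted(sq for sq in set(a) | set(b) if a.get(sq) != b.get(sq))

# Hypothetical readings: the models agree on the kings but place the queen
# on different squares, so only d1/d2 would be re-queried in the next round.
gpt_view    = {"e1": "K", "e8": "k", "d1": "Q"}
gemini_view = {"e1": "K", "e8": "k", "d2": "Q"}

conflicts = conflicting_squares(gpt_view, gemini_view)   # ["d1", "d2"]
```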

**Videos** 🎥 🚧

## Challenges 🆘
1. Chess board picture analysis.
2. Video analysis.

## Future work and improvements 🔜
#### 1. Evaluation
#### 2. Chunking
#### 3. Audio Analysis
#### 4. Video Analysis
#### 5. Chessboard Images analysis

## References 📚
The math tool implementation was inspired by this repo: https://github.com/langchain-ai/open_deep_research

tools/chess_tool.py
CHANGED

@@ -275,7 +275,6 @@ class ChessVisionAnalyzer:
 
         print("Squares with conflicts:", conflicts_sqares)
 
-
         first_analysis_res = self.analyze_board_from_image(active_color, image_path, 1, conflicts_sqares)
         second_analysis_res = self.analyze_board_from_image(active_color, image_path, 2, conflicts_sqares)
         result = self.compare_analyses(first_analysis_res, second_analysis_res)