Final_Assignment_Template

carolinacon committed on Sep 13, 2025

Commit eb5efe8

1 Parent(s): a4b0424

updated readme file

Files changed (1)

README.md +6 -4

README.md CHANGED

@@ -66,6 +66,9 @@ so if its size exceeds a pre-configured threshold, it is chunked and only the mo
 - **Python code executor**: executes a snippet of Python code provided as input
 - **Think tool**: used for strategic reflection on the progress of the solving process
+At this point, the agent seems to prefer answering the mathematical question from the test set by invoking the Python code executor instead.
+The question is answered correctly. I decided not to remove this tool yet, until I have tested the agent on other mathematical questions from the GAIA validation set.
 **Python code Executor**⚙️ can run either a snippet of Python code or a given Python file. The Python code snippet is executed using `langchain_experimental.tools.PythonREPLTool`. The Python file is executed in a sub-process.
 **Spreadsheets**📊: analyzes `excel` files using the pandas dataframe agent `langchain_experimental.agents.create_pandas_dataframe_agent`
@@ -75,7 +78,7 @@ and the `gpt-4.1` model.
 - **Picture analysis**: identifies the location of each chess piece on the board. Once the coordinates are identified, the FEN of
 the game is computed programmatically. Both `gpt-4.1` and `gemini-2.5-flash` models are used to extract the coordinates, and an arbitrage is performed on their outcomes.
-- **Move suggestion**: the best move is suggested by a `stockfish` chess engine
+- **Move suggestion**: the best move is suggested by a `stockfish` chess engine.
 - **Move interpretation**: the move is then interpreted and transcribed into algebraic notation with the help of `gpt-4`.
 **Chess Board Picture Analysis - Challenges and Limitations** 🆘
@@ -93,7 +96,7 @@ From what I observed, this approach improved the chances of having a correct ide
 **YouTube Videos Analysis**🎥 This is work in progress 🚧
-So far, the agent is able to respond to the questions on the conversation inside a YouTube video. There is no dedicated tool for this.
+So far, the agent is able to answer questions about the conversation in a YouTube video. There is no dedicated tool for this.
 The assistant searches for the transcripts using the `tavily extract` tool.
 TODO: analyze YouTube videos and answer questions about objects in the video.
@@ -101,8 +104,7 @@ TODO: analyze YouTube videos and answer questions about objects in the video.
 ## Future work and improvements 🔜
-- **Evaluation**:  Implement an automated evaluation for the reference set of questions.
-Evaluate the agent against other questions from the GAIA validation set.
+- **Evaluation**: Evaluate the agent against other questions from the GAIA validation set.
 - **Large Web Extracts**: Try other chunking strategies.
 - **Audio Analysis**: Use a less expensive model (like Whisper) to get the transcripts, and if this is not enough to answer the question and more sophisticated processing is needed
 for other sounds such as music, barks or other types of sounds, then use a better model.
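The README states that, once the piece coordinates are identified, the FEN is computed programmatically. A minimal sketch of that step, assuming the vision models return a mapping from square names (such as `"e4"`) to piece letters in FEN convention (uppercase for White, lowercase for Black). The input format and the helper name `placement_to_fen` are hypothetical, not the project's code. Side to move, castling rights and en passant cannot be read from a board picture, so they are passed in as parameters.

```python
from __future__ import annotations

# Hypothetical input format: {square: piece}, e.g. {"e4": "P", "g8": "n"}.
FILES = "abcdefgh"
RANKS = "12345678"
VALID_PIECES = set("KQRBNPkqrbnp")  # uppercase = White, lowercase = Black


def placement_to_fen(pieces: dict[str, str], side_to_move: str = "w",
                     castling: str = "-", en_passant: str = "-") -> str:
    """Build a FEN string from piece coordinates (hypothetical helper)."""
    for square, piece in pieces.items():
        if len(square) != 2 or square[0] not in FILES or square[1] not in RANKS:
            raise ValueError(f"invalid square: {square!r}")
        if piece not in VALID_PIECES:
            raise ValueError(f"invalid piece {piece!r} on {square}")

    rows = []
    # FEN lists ranks from 8 down to 1 and, within a rank, files from a to h.
    for rank in reversed(RANKS):
        row, empty = "", 0
        for file in FILES:
            piece = pieces.get(file + rank)
            if piece is None:
                empty += 1  # count consecutive empty squares
            else:
                if empty:
                    row += str(empty)
                    empty = 0
                row += piece
        if empty:
            row += str(empty)
        rows.append(row)

    # Halfmove clock and fullmove number are not visible in a picture; use defaults.
    return f"{'/'.join(rows)} {side_to_move} {castling} {en_passant} 0 1"


print(placement_to_fen({"e1": "K", "e8": "k", "e2": "P"}))
# → 4k3/8/8/8/8/8/4P3/4K3 w - - 0 1
```

Validating squares and piece letters up front is one way to catch a malformed model answer before it reaches the chess engine; an arbitrage between the two models' outputs would operate on mappings like the one above.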
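The README states that a given Python file is executed in a sub-process. A minimal sketch of how that could be done with the standard library's `subprocess.run`, reusing the current interpreter (`sys.executable`) and bounding the run with a timeout. The helper name `run_python_file`, the 30-second default and the returned dictionary shape are assumptions for illustration, not the project's implementation.

```python
import subprocess
import sys


def run_python_file(path: str, timeout: float = 30.0) -> dict:
    """Run a Python file in a separate interpreter process and capture its output.

    Hypothetical helper: the name, default timeout and return shape are illustrative.
    """
    try:
        completed = subprocess.run(
            [sys.executable, path],  # same interpreter as the agent, separate process
            capture_output=True,     # collect stdout/stderr instead of printing them
            text=True,               # decode output bytes to str
            timeout=timeout,         # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return {"returncode": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

Running the file in its own process means a crashing or runaway script cannot take the agent process down with it, and the captured `stdout`/`stderr` can be handed back to the model as the tool result.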