carolinacon committed on
Commit
eb5efe8
·
1 Parent(s): a4b0424

updated readme file

  1. README.md +6 -4
README.md CHANGED
@@ -66,6 +66,9 @@ so if its size exceeds a pre-configured threshold, it is chunked and only the mo
66
  - **Python code executor**: executes a snippet of python code provided as input
67
  - **Think tool**: used for strategic reflection on the progress of the solving process
68
 
 
 
 
69
  **Python code Executor**⚙️ can run either a snippet of python code or a given python file. The python code snippet is executed using `langchain_experimental.tools.PythonREPLTool`. The python file is executed in a sub-process.
70
 
71
  **Spreadsheets**📊: analyzes `excel` files using the pandas dataframe agent `langchain_experimental.agents.create_pandas_dataframe_agent`
@@ -75,7 +78,7 @@ and the`gpt-4.1` model.
75
 
76
  - **Picture analysis**: identifies the location of each chess piece on the board. Once the coordinates are identified, the FEN of
77
  the game is computed programmatically. Both `gpt-4.1` and `gemini-2.5-flash` models are used to extract the coordinates, and arbitration is performed on their outputs.
78
- - **Move suggestion**: the best move is suggested by a `stockfish` chess engine
79
  - **Move interpretation**: the move is then interpreted and transcribed into the algebraic notation with the help of `gpt-4`.
80
 
81
  **Chess Board Picture Analysis - Challenges and Limitations** 🆘
@@ -93,7 +96,7 @@ From what I observed, this approach improved the chances of having a correct ide
93
 
94
  **YouTube Videos Analysis**🎥 This is work in progress 🚧
95
 
96
- So far, the agent is able to respond to the questions on the conversation inside a YouTube video. There is no dedicated tool for this.
97
  The assistant searches for the transcripts by using the `tavily extract` tool.
98
  TODO: analyze YouTube videos and answer questions about objects in the video.
99
 
@@ -101,8 +104,7 @@ TODO: analyze YouTube videos and answer questions about objects in the video.
101
 
102
 
103
  ## Future work and improvements 🔜
104
- - **Evaluation**: Implement an automated evaluation for the reference set of questions.
105
- Evaluate the agent against other questions from the GAIA validation set.
106
  - **Large Web Extracts**: Try other chunking strategies.
107
  - **Audio Analysis**: Use a less expensive model to get the transcripts (like whisper), and if this is not enough to answer the question and more sophisticated processing is needed
108
  for other sounds like music, barks or other types of sounds, then use a better model.
 
66
  - **Python code executor**: executes a snippet of python code provided as input
67
  - **Think tool**: used for strategic reflection on the progress of the solving process
68
 
69
+ At this point it looks like the agent prefers to answer the mathematical question from the test set by invoking the python code executor instead.
70
+ The question is answered correctly. I decided not to remove this tool yet, until I have tested the agent on other mathematical questions from the GAIA validation set.
71
+
72
  **Python code Executor**⚙️ can run either a snippet of python code or a given python file. The python code snippet is executed using `langchain_experimental.tools.PythonREPLTool`. The python file is executed in a sub-process.
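
As a rough illustration of the "python file is executed in a sub-process" part, here is a minimal sketch using only the standard library. The helper name `run_python_file` and the `timeout_s` parameter are assumptions made for this example; the README does not show how the agent actually launches the file or handles errors.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python_file(path: str, timeout_s: float = 30.0) -> str:
    """Run a Python file in a child process and return its output.

    Hypothetical helper: the name, the timeout and the error format
    are illustrative choices, not taken from the agent's code.
    """
    try:
        result = subprocess.run(
            [sys.executable, path],  # same interpreter as the caller
            capture_output=True,     # collect stdout and stderr
            text=True,               # decode bytes to str
            timeout=timeout_s,       # avoid hanging on runaway scripts
        )
    except subprocess.TimeoutExpired:
        return f"Timed out after {timeout_s} seconds"
    if result.returncode != 0:
        # Surface the failure so the caller can react to it
        return f"Exit code {result.returncode}\n{result.stderr}"
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "hello.py"
        script.write_text("print(6 * 7)\n")
        print(run_python_file(str(script)))
```

Running the file in a child process keeps a crash or an infinite loop in the script from taking the agent down with it, which is presumably why a sub-process is used for files.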
73
 
74
  **Spreadsheets**📊: analyzes `excel` files using the pandas dataframe agent `langchain_experimental.agents.create_pandas_dataframe_agent`
 
78
 
79
  - **Picture analysis**: identifies the location of each chess piece on the board. Once the coordinates are identified, the FEN of
80
  the game is computed programmatically. Both `gpt-4.1` and `gemini-2.5-flash` models are used to extract the coordinates, and arbitration is performed on their outputs.
81
+ - **Move suggestion**: the best move is suggested by a `stockfish` chess engine.
82
  - **Move interpretation**: the move is then interpreted and transcribed into the algebraic notation with the help of `gpt-4`.
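
To make the "FEN is computed programmatically" step of the picture analysis more concrete, the sketch below builds the piece-placement field of a FEN string from a mapping of squares to pieces. The function name `placement_to_fen` and the input format (square name to FEN piece letter) are assumptions for illustration; the README does not specify how the extracted coordinates are represented. A full FEN also needs the side to move, castling rights, en passant square and move counters, which are not covered here.

```python
def placement_to_fen(pieces: dict[str, str]) -> str:
    """Build the piece-placement field of a FEN string.

    `pieces` maps a square such as "e4" to a FEN piece letter
    (uppercase for White, lowercase for Black). This input format
    is a hypothetical choice for the example.
    """
    ranks = []
    for rank in range(8, 0, -1):  # FEN lists rank 8 first
        row = ""
        empty = 0  # run length of consecutive empty squares
        for file in "abcdefgh":
            piece = pieces.get(f"{file}{rank}")
            if piece is None:
                empty += 1
                continue
            if empty:
                row += str(empty)  # flush the empty run before the piece
                empty = 0
            row += piece
        if empty:
            row += str(empty)  # trailing empty squares on this rank
        ranks.append(row)
    return "/".join(ranks)


if __name__ == "__main__":
    position = {"e1": "K", "e8": "k", "a2": "P"}
    print(placement_to_fen(position))  # 4k3/8/8/8/8/8/P7/4K3
```

Because the conversion is deterministic, any error in the final FEN can be traced back to the coordinates returned by the vision models rather than to this step.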
83
 
84
  **Chess Board Picture Analysis - Challenges and Limitations** 🆘
 
96
 
97
  **YouTube Videos Analysis**🎥 This is work in progress 🚧
98
 
99
+ So far, the agent is able to answer questions on the conversation inside a YouTube video. There is no dedicated tool for this.
100
  The assistant searches for the transcripts by using the `tavily extract` tool.
101
  TODO: analyze YouTube videos and answer questions about objects in the video.
102
 
 
104
 
105
 
106
  ## Future work and improvements 🔜
107
+ - **Evaluation**: Evaluate the agent against other questions from the GAIA validation set.
 
108
  - **Large Web Extracts**: Try other chunking strategies.
109
  - **Audio Analysis**: Use a less expensive model to get the transcripts (like whisper), and if this is not enough to answer the question and more sophisticated processing is needed
110
  for other sounds like music, barks or other types of sounds, then use a better model.