refactor: compress console logs for clarity
Console logs now show status only:
- [plan] ✓ 660 chars
- [execute] 1 tool(s) selected
- [1/1] youtube_transcript ✓
- [execute] 1 tools, 1 evidence
- [answer] ✓ 3
Full context saved to _cache/llm_context_*.txt for debugging.
Co-Authored-By: Claude <noreply@anthropic.com>
- WORKSPACE.md +96 -106
- src/agent/graph.py +31 -133
- src/agent/llm_client.py +1 -18
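The status-only console logging described above can be sketched as follows. This is a minimal illustration of the pattern, not the repo's exact API: `log_status`, the stage names, and the `_cache/llm_context_*.txt` naming are assumptions modeled on the log lines shown below.

```python
import datetime
import logging
import pathlib

logger = logging.getLogger(__name__)

CACHE_DIR = pathlib.Path("_cache")  # assumed cache location, mirroring the logs below


def log_status(stage: str, summary: str, full_context: str) -> None:
    """Emit a one-line status to the console; dump the full payload to a file.

    Keeps the console readable ("[plan] ✓ 660 chars") while preserving the
    complete context for debugging.
    """
    logger.info("[%s] %s", stage, summary)
    CACHE_DIR.mkdir(exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    (CACHE_DIR / f"llm_context_{stamp}.txt").write_text(full_context, encoding="utf-8")


log_status("plan", "✓ 660 chars", "full 660-char plan text goes here")
```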
WORKSPACE.md CHANGED

@@ -1,8 +1,6 @@
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:47:29,290 - __main__ - INFO - Initializing GAIAAgent...
-2026-01-13 15:47:29,317 - __main__ - INFO - GAIAAgent initialized successfully
 User logged in: mangubee
 GAIAAgent initializing...
 ✓ All API keys present
@@ -10,101 +8,94 @@ GAIAAgent initializing...
 GAIAAgent initialized successfully
 https://huggingface.co/spaces/mangoobee/Final_Assignment_Template/tree/main
 Fetching questions from: https://agents-course-unit4-scoring.hf.space/questions
-2026-01-13 15:
 DEBUG MODE: Processing 1 targeted questions (0 IDs not found: set())
 Processing 1 questions.
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:47:32,862 - __main__ - INFO - [1/1] Processing a1e91b78...
-2026-01-13 15:47:32,864 - src.agent.graph - INFO - [plan_node] ========== PLAN NODE START ==========
-2026-01-13 15:47:32,865 - src.agent.graph - INFO - [plan_node] Question: In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?
-2026-01-13 15:47:32,865 - src.agent.graph - INFO - [plan_node] File paths: None
-2026-01-13 15:47:32,865 - src.agent.graph - INFO - [plan_node] Available tools: ['web_search', 'parse_file', 'calculator', 'vision', 'youtube_transcript', 'transcribe_audio']
-2026-01-13 15:47:32,865 - src.agent.graph - INFO - [plan_node] Calling plan_question() with LLM...
-2026-01-13 15:47:32,866 - src.agent.llm_client - INFO - [plan_question] Using provider: huggingface
-2026-01-13 15:47:32,866 - src.agent.llm_client - INFO - Initializing HuggingFace Inference client with model: openai/gpt-oss-120b:scaleway
-2026-01-13 15:47:32,866 - src.agent.llm_client - INFO - [plan_question_hf] Calling HuggingFace (openai/gpt-oss-120b:scaleway) for planning
 GAIAAgent processing question (first 50 chars): In the video https://www.youtube.com/watch?v=L1vXC...
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-
-
-
-
-
-
-
-
-
-
-
-
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
 Could not retrieve a transcript for the video https://www.youtube.com/watch?v=L1vXCYZAYYM! This is most likely caused by:
 
 Subtitles are disabled for this video
 
 If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
-2026-01-13 15:
-2026-01-13 15:
-
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
 You are an answer synthesis agent for the GAIA benchmark.
 
 Your task is to extract a factoid answer from the provided evidence.
@@ -129,27 +120,26 @@ Examples of bad answers (too verbose):
 - "The answer is 42 because..."
 - "Based on the evidence, it appears that..."
 
-2026-01-13 15:
-2026-01-13 15:
 Question: In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?
 
 Evidence 1:
 {'text': "But one challenge stops them in their tracks. A giant petrel. They try to flee, but running isn't an emperor's strong point. A slip is all the petrel needs. The chick is grabbed by his neck feathers. But the down just falls away. They form a defensive circle and prepare to stand their ground. Despite their chick-like appearance, they are close to a metre tall. Quite a size, even for a giant petrel. The chick towers to full height, protecting those behind. His defiance buys time. It's a standoff. Then, as if from nowhere, and a deli, the feistiest penguin in the world. He fearlessly puts himself between the chicks and the petrel. Even petrels don't mess with the delis. Their plucky rescuer accompanies the chicks to the sea. Fair.", 'video_id': 'L1vXCYZAYYM', 'source': 'whisper', 'success': True, 'error': None}
 
 Extract the factoid answer from the evidence above. Return only the factoid, nothing else.
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:
-2026-01-13 15:47:59,310 - __main__ - INFO - Progress: 1/1 questions processed
 GAIAAgent returning answer: Unable to answer
 Agent finished. Submitting 1 answers for user 'mangubee'...
 Submitting 1 answers to: https://agents-course-unit4-scoring.hf.space/submit
-2026-01-13 15:
-2026-01-13 15:
 Submission successful.
+2026-01-13 15:50:59,509 - __main__ - INFO - UI Config for Full Evaluation: LLM_PROVIDER=HuggingFace
+2026-01-13 15:50:59,510 - __main__ - INFO - Initializing GAIAAgent...
+2026-01-13 15:50:59,535 - __main__ - INFO - GAIAAgent initialized successfully
 User logged in: mangubee
 GAIAAgent initializing...
 ✓ All API keys present
 GAIAAgent initialized successfully
 https://huggingface.co/spaces/mangoobee/Final_Assignment_Template/tree/main
 Fetching questions from: https://agents-course-unit4-scoring.hf.space/questions
+2026-01-13 15:50:59,972 - __main__ - WARNING - DEBUG MODE: Targeted 1/20 questions by task_id
 DEBUG MODE: Processing 1 targeted questions (0 IDs not found: set())
 Processing 1 questions.
+2026-01-13 15:51:01,088 - src.utils.ground_truth - INFO - Loading GAIA validation dataset...
+2026-01-13 15:51:02,550 - src.utils.ground_truth - INFO - Loaded 165 ground truth answers
+2026-01-13 15:51:02,551 - __main__ - INFO - Ground truth loaded - per-question correctness will be available
+2026-01-13 15:51:02,551 - __main__ - INFO - Running agent on 1 questions with 5 workers...
+2026-01-13 15:51:02,551 - __main__ - INFO - [1/1] Processing a1e91b78...
+2026-01-13 15:51:02,553 - src.agent.graph - INFO - [plan_node] ========== PLAN NODE START ==========
+2026-01-13 15:51:02,553 - src.agent.graph - INFO - [plan_node] Question: In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?
+2026-01-13 15:51:02,553 - src.agent.graph - INFO - [plan_node] File paths: None
+2026-01-13 15:51:02,554 - src.agent.graph - INFO - [plan_node] Available tools: ['web_search', 'parse_file', 'calculator', 'vision', 'youtube_transcript', 'transcribe_audio']
+2026-01-13 15:51:02,554 - src.agent.graph - INFO - [plan_node] Calling plan_question() with LLM...
+2026-01-13 15:51:02,554 - src.agent.llm_client - INFO - [plan_question] Using provider: huggingface
+2026-01-13 15:51:02,554 - src.agent.llm_client - INFO - Initializing HuggingFace Inference client with model: openai/gpt-oss-120b:scaleway
+2026-01-13 15:51:02,555 - src.agent.llm_client - INFO - [plan_question_hf] Calling HuggingFace (openai/gpt-oss-120b:scaleway) for planning
 GAIAAgent processing question (first 50 chars): In the video https://www.youtube.com/watch?v=L1vXC...
+2026-01-13 15:51:13,335 - src.agent.llm_client - INFO - [plan_question_hf] Generated plan (1340 chars)
+2026-01-13 15:51:13,335 - src.agent.graph - INFO - [plan_node] ✓ Plan created successfully (1340 chars)
+2026-01-13 15:51:13,336 - src.agent.graph - INFO - [plan_node] ========== PLAN NODE END ==========
+2026-01-13 15:51:13,337 - src.agent.graph - INFO - [execute_node] ========== EXECUTE NODE START ==========
+2026-01-13 15:51:13,338 - src.agent.graph - INFO - [execute_node] Plan: **Execution Plan**
+
+1. **Extract the video transcript** – Use the `youtube_transcript` tool on the URL `https://www.youtube.com/watch?v=L1vXCYZAYYM` to obtain the full spoken text of the video.
+
+2. **Locate the relevant statement** – Scan the returned transcript for keywords such as “species”, “bird”, “simultaneously”, “on camera”, “different species”, or any numeric value that could represent the count of bird species shown at once.
+
+3. **Identify the highest number mentioned** – If multiple numbers are found, determine which one refers to the “highest number of bird species on camera simultaneously.”
+
+4. **Validate via web search (if needed)** – If the transcript does not contain a clear answer, perform a `web_search` using the video title (or a description of the video) combined with terms like “bird species on camera simultaneously” to find external sources (e.g., articles, forum posts, video description) that state the number.
+
+5. **Extract the answer** – From the transcript (or from the web‑search result), record the exact number of bird species that were on camera at the same time, ensuring it is the highest reported figure.
+
+6. **Provide the final response** – Return the identified number, citing that it comes from the video transcript (or the supporting web source if the transcript was insufficient).
+2026-01-13 15:51:13,338 - src.agent.graph - INFO - [execute_node] Question: In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?
+2026-01-13 15:51:13,338 - src.agent.graph - INFO - [execute_node] Calling select_tools_with_function_calling()...
+2026-01-13 15:51:13,339 - src.agent.llm_client - INFO - [select_tools] Using provider: huggingface
+2026-01-13 15:51:13,339 - src.agent.llm_client - INFO - Initializing HuggingFace Inference client with model: openai/gpt-oss-120b:scaleway
+2026-01-13 15:51:13,340 - src.agent.llm_client - INFO - [select_tools_hf] Calling HuggingFace with function calling for 6 tools, file_paths=None
+2026-01-13 15:51:15,405 - src.agent.llm_client - INFO - [select_tools_hf] HuggingFace selected 1 tool(s)
+2026-01-13 15:51:15,406 - src.agent.graph - INFO - [execute_node] ✓ LLM selected 1 tool(s)
+2026-01-13 15:51:15,407 - src.agent.graph - INFO - [execute_node] --- Tool 1/1: youtube_transcript ---
+2026-01-13 15:51:15,407 - src.agent.graph - INFO - [execute_node] Parameters: {'url': 'https://www.youtube.com/watch?v=L1vXCYZAYYM'}
+2026-01-13 15:51:15,408 - src.agent.graph - INFO - [execute_node] Executing youtube_transcript...
+2026-01-13 15:51:15,408 - src.tools.youtube - INFO - Processing YouTube video: L1vXCYZAYYM
+2026-01-13 15:51:15,420 - src.tools.youtube - INFO - Fetching transcript for video: L1vXCYZAYYM
+2026-01-13 15:51:16,397 - src.tools.youtube - ERROR - YouTube transcript API failed:
 Could not retrieve a transcript for the video https://www.youtube.com/watch?v=L1vXCYZAYYM! This is most likely caused by:
 
 Subtitles are disabled for this video
 
 If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
+2026-01-13 15:51:16,400 - src.tools.youtube - INFO - Transcript API failed, trying audio transcription...
+2026-01-13 15:51:16,463 - src.tools.youtube - INFO - Downloading audio from: https://www.youtube.com/watch?v=L1vXCYZAYYM
+
+2026-01-13 15:51:19,610 - src.tools.youtube - INFO - Audio downloaded: /var/folders/05/8vqqybgj751__dmlh3w536dh0000gn/T/youtube_audio_40067.mp3 (1930412 bytes)
+2026-01-13 15:51:19,610 - src.tools.audio - INFO - Transcribing audio: /var/folders/05/8vqqybgj751__dmlh3w536dh0000gn/T/youtube_audio_40067.mp3
+2026-01-13 15:51:19,850 - src.tools.audio - INFO - Loading Whisper model: small
+2026-01-13 15:51:21,374 - src.tools.audio - INFO - Whisper model loaded on cpu
+2026-01-13 15:51:27,949 - src.tools.audio - INFO - Transcription successful: 738 characters
+2026-01-13 15:51:27,950 - src.tools.youtube - INFO - Cleaned up temp file: /var/folders/05/8vqqybgj751__dmlh3w536dh0000gn/T/youtube_audio_40067.mp3
+2026-01-13 15:51:27,951 - src.tools.youtube - INFO - Transcript saved to cache: _cache/L1vXCYZAYYM_transcript.txt
+2026-01-13 15:51:27,951 - src.tools.youtube - INFO - Transcript retrieved via Whisper: 738 characters
+2026-01-13 15:51:27,952 - src.tools.youtube - INFO - Full transcript: But one challenge stops them in their tracks. A giant petrel. They try to flee, but running isn't an emperor's strong point. A slip is all the petrel needs. The chick is grabbed by his neck feathers. But the down just falls away. They form a defensive circle and prepare to stand their ground. Despite their chick-like appearance, they are close to a metre tall. Quite a size, even for a giant petrel. The chick towers to full height, protecting those behind. His defiance buys time. It's a standoff. Then, as if from nowhere, and a deli, the feistiest penguin in the world. He fearlessly puts himself between the chicks and the petrel. Even petrels don't mess with the delis. Their plucky rescuer accompanies the chicks to the sea. Fair.
+2026-01-13 15:51:27,952 - src.agent.graph - INFO - [execute_node] ✓ youtube_transcript completed successfully
+2026-01-13 15:51:27,952 - src.agent.graph - INFO - [execute_node] Summary: 1 tool(s) executed, 1 evidence items collected
+2026-01-13 15:51:27,952 - src.agent.graph - INFO - [execute_node] ========== EXECUTE NODE END ==========
+2026-01-13 15:51:27,953 - src.agent.graph - INFO - [answer_node] ========== ANSWER NODE START ==========
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - [answer_node] Evidence items collected: 1
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - [answer_node] Errors accumulated: 0
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - ================================================================================
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - [EVIDENCE] Full evidence content being passed to synthesis:
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - ================================================================================
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - [EVIDENCE 1/1]
+2026-01-13 15:51:27,954 - src.agent.graph - INFO - {'text': "But one challenge stops them in their tracks. A giant petrel. They try to flee, but running isn't an emperor's strong point. A slip is all the petrel needs. The chick is grabbed by his neck feathers. But the down just falls away. They form a defensive circle and prepare to stand their ground. Despite their chick-like appearance, they are close to a metre tall. Quite a size, even for a giant petrel. The chick towers to full height, protecting those behind. His defiance buys time. It's a...
+2026-01-13 15:51:27,955 - src.agent.graph - INFO - --------------------------------------------------------------------------------
+2026-01-13 15:51:27,955 - src.agent.graph - INFO - ================================================================================
+2026-01-13 15:51:27,955 - src.agent.graph - INFO - [EVIDENCE] End of evidence content
+2026-01-13 15:51:27,955 - src.agent.graph - INFO - ================================================================================
+2026-01-13 15:51:27,955 - src.agent.graph - INFO - [answer_node] Calling synthesize_answer() with 1 evidence items...
+2026-01-13 15:51:27,956 - src.agent.llm_client - INFO - [synthesize_answer] Using provider: huggingface
+2026-01-13 15:51:27,956 - src.agent.llm_client - INFO - Initializing HuggingFace Inference client with model: openai/gpt-oss-120b:scaleway
+2026-01-13 15:51:27,957 - src.agent.llm_client - INFO - [synthesize_answer_hf] LLM context saved to: _cache/llm_context_20260113_155127.txt
+2026-01-13 15:51:27,957 - src.agent.llm_client - INFO - [synthesize_answer_hf] Calling HuggingFace for answer synthesis
+2026-01-13 15:51:27,958 - src.agent.llm_client - INFO - ================================================================================
+2026-01-13 15:51:27,958 - src.agent.llm_client - INFO - [LLM CONTEXT] Full synthesis prompt being sent to LLM:
+2026-01-13 15:51:27,958 - src.agent.llm_client - INFO - ================================================================================
+2026-01-13 15:51:27,958 - src.agent.llm_client - INFO - [SYSTEM PROMPT]
 You are an answer synthesis agent for the GAIA benchmark.
 
 Your task is to extract a factoid answer from the provided evidence.
 - "The answer is 42 because..."
 - "Based on the evidence, it appears that..."
 
+2026-01-13 15:51:27,958 - src.agent.llm_client - INFO - --------------------------------------------------------------------------------
+2026-01-13 15:51:27,959 - src.agent.llm_client - INFO - [USER PROMPT]
 Question: In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?
 
 Evidence 1:
 {'text': "But one challenge stops them in their tracks. A giant petrel. They try to flee, but running isn't an emperor's strong point. A slip is all the petrel needs. The chick is grabbed by his neck feathers. But the down just falls away. They form a defensive circle and prepare to stand their ground. Despite their chick-like appearance, they are close to a metre tall. Quite a size, even for a giant petrel. The chick towers to full height, protecting those behind. His defiance buys time. It's a standoff. Then, as if from nowhere, and a deli, the feistiest penguin in the world. He fearlessly puts himself between the chicks and the petrel. Even petrels don't mess with the delis. Their plucky rescuer accompanies the chicks to the sea. Fair.", 'video_id': 'L1vXCYZAYYM', 'source': 'whisper', 'success': True, 'error': None}
 
 Extract the factoid answer from the evidence above. Return only the factoid, nothing else.
+2026-01-13 15:51:27,959 - src.agent.llm_client - INFO - ================================================================================
+2026-01-13 15:51:27,959 - src.agent.llm_client - INFO - [LLM CONTEXT] End of full context
+2026-01-13 15:51:27,959 - src.agent.llm_client - INFO - ================================================================================
+2026-01-13 15:51:30,295 - src.agent.llm_client - INFO - [synthesize_answer_hf] Generated answer: Unable to answer
+2026-01-13 15:51:30,296 - src.agent.llm_client - INFO - [synthesize_answer_hf] Answer appended to context file
+2026-01-13 15:51:30,297 - src.agent.graph - INFO - [answer_node] ✓ Answer generated successfully: Unable to answer
+2026-01-13 15:51:30,297 - src.agent.graph - INFO - [answer_node] ========== ANSWER NODE END ==========
+2026-01-13 15:51:30,299 - __main__ - INFO - [1/1] Completed a1e91b78
+2026-01-13 15:51:30,300 - __main__ - INFO - Progress: 1/1 questions processed
 GAIAAgent returning answer: Unable to answer
 Agent finished. Submitting 1 answers for user 'mangubee'...
 Submitting 1 answers to: https://agents-course-unit4-scoring.hf.space/submit
+2026-01-13 15:51:31,493 - __main__ - INFO - Total execution time: 31.98 seconds (0m 31s)
+2026-01-13 15:51:31,497 - __main__ - INFO - Results exported to: /Users/mangubee/Documents/Python/16_HuggingFace/Final_Assignment_Template/_cache/gaia_results_20260113_155131.json
 Submission successful.
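The fallback chain the logs above walk through (captions API fails → audio download → Whisper) can be sketched like this. It is a minimal illustration, not the repo's `src/tools/youtube.py`: `fetch_captions` and `transcribe_audio` are hypothetical stand-ins for youtube-transcript-api and the yt-dlp-plus-Whisper path, and only the evidence-dict shape is taken from the logs.

```python
from typing import Callable


def get_transcript(
    video_id: str,
    fetch_captions: Callable[[str], str],    # stand-in for the captions API
    transcribe_audio: Callable[[str], str],  # stand-in for audio download + Whisper
) -> dict:
    """Try the captions API first; fall back to audio transcription."""
    try:
        text = fetch_captions(video_id)
        return {"text": text, "video_id": video_id, "source": "api",
                "success": True, "error": None}
    except Exception:
        try:
            text = transcribe_audio(video_id)
            return {"text": text, "video_id": video_id, "source": "whisper",
                    "success": True, "error": None}
        except Exception as exc:
            return {"text": "", "video_id": video_id, "source": None,
                    "success": False, "error": str(exc)}
```

The evidence dict (`text`, `video_id`, `source`, `success`, `error`) mirrors what the execute node passes to synthesis above.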
src/agent/graph.py CHANGED

@@ -216,55 +216,24 @@ def plan_node(state: AgentState) -> AgentState:
     Returns:
         Updated state with execution plan
     """
-    logger.info(f"[plan_node] ========== PLAN NODE START ==========")
-    logger.info(f"[plan_node] Question: {state['question']}")
-    logger.info(f"[plan_node] File paths: {state.get('file_paths')}")
-    logger.info(f"[plan_node] Available tools: {list(TOOLS.keys())}")
-
     try:
-        # Stage 3: Use LLM to generate dynamic execution plan
-        logger.info(f"[plan_node] Calling plan_question() with LLM...")
         plan = plan_question(
             question=state["question"],
             available_tools=TOOLS,
             file_paths=state.get("file_paths"),
         )
-
         state["plan"] = plan
-        logger.info(f"[
-        logger.debug(f"[plan_node] Plan content: {plan}")
-
     except Exception as e:
-        logger.error(f"[
         state["errors"].append(f"Planning error: {type(e).__name__}: {str(e)}")
         state["plan"] = "Error: Unable to create plan"
-
-    logger.info(f"[plan_node] ========== PLAN NODE END ==========")
     return state


 def execute_node(state: AgentState) -> AgentState:
-    """
-    Execution node: Execute tools based on plan.
-
-    Stage 3: Dynamic tool selection and execution
-    - LLM selects tools via function calling
-    - Extracts parameters from question
-    - Executes tools and collects results
-    - Handles errors with retry logic (in tools)
-
-    Args:
-        state: Current agent state with plan
-
-    Returns:
-        Updated state with tool execution results and evidence
-    """
-    logger.info(f"[execute_node] ========== EXECUTE NODE START ==========")
-    logger.info(f"[execute_node] Plan: {state['plan']}")
-    logger.info(f"[execute_node] Question: {state['question']}")
-
     # Map tool names to actual functions
-    # NOTE: Keys must match TOOLS registry in src/tools/__init__.py
     TOOL_FUNCTIONS = {
         "web_search": search,
         "parse_file": parse_file,
@@ -274,14 +243,11 @@ def execute_node(state: AgentState) -> AgentState:
         "transcribe_audio": transcribe_audio,
     }

-    # Initialize results lists
     tool_results = []
     evidence = []
     tool_calls = []

     try:
-        # Stage 3: Use LLM function calling to select tools and extract parameters
-        logger.info(f"[execute_node] Calling select_tools_with_function_calling()...")
         tool_calls = select_tools_with_function_calling(
             question=state["question"],
             plan=state["plan"],
@@ -291,53 +257,39 @@ def execute_node(state: AgentState) -> AgentState:

         # Validate tool_calls result
         if not tool_calls:
-            logger.warning(
-            state["errors"].append("Tool selection returned no tools - using fallback
-            # MVP HACK: Use fallback keyword-based tool selection
             tool_calls = fallback_tool_selection(
                 state["question"], state["plan"], state.get("file_paths")
             )
-            logger.info(f"[execute_node] Fallback returned {len(tool_calls)} tool(s)")
         elif not isinstance(tool_calls, list):
-            logger.error(f"[
-            state["errors"].append(f"Tool selection returned invalid type: {type(tool_calls)}
-            # MVP HACK: Use fallback
             tool_calls = fallback_tool_selection(
                 state["question"], state["plan"], state.get("file_paths")
             )
         else:
-            logger.info(f"[
-            logger.debug(f"[execute_node] Tool calls: {tool_calls}")

         # Execute each tool call
         for idx, tool_call in enumerate(tool_calls, 1):
             tool_name = tool_call["tool"]
             params = tool_call["params"]

-            logger.info(f"[execute_node] --- Tool {idx}/{len(tool_calls)}: {tool_name} ---")
-            logger.info(f"[execute_node] Parameters: {params}")
-
             try:
-                # Get tool function
                 tool_func = TOOL_FUNCTIONS.get(tool_name)
                 if not tool_func:
                     raise ValueError(f"Tool '{tool_name}' not found in TOOL_FUNCTIONS")

-                # Execute tool
-                logger.info(f"[execute_node] Executing {tool_name}...")
                 result = tool_func(**params)
-                logger.info(f"[
-                logger.debug(f"[execute_node] Result: {result[:200] if isinstance(result, str) else result}...")

-
-
-
-
-
-
-                        "status": "success",
-                    }
-                )

                # Extract evidence - handle different result formats
                if isinstance(result, dict):
@@ -375,38 +327,29 @@ def execute_node(state: AgentState) -> AgentState:
                         "error": str(tool_error),
                         "status": "failed",
                     }
-                )
-
                 # Provide specific error message for vision tool failures
                 if tool_name == "vision" and ("quota" in str(tool_error).lower() or "429" in str(tool_error)):
-                    state["errors"].append(f"Vision
                 else:
-                    state["errors"].append(f"

-        logger.info(f"[
-        logger.debug(f"[execute_node] Evidence: {evidence}")

     except Exception as e:
-        logger.error(f"[

-        # Graceful handling for vision questions when LLMs unavailable
         if is_vision_question(state["question"]) and ("quota" in str(e).lower() or "429" in str(e)):
-
-            state["errors"].append("Vision analysis unavailable (LLM quota exhausted). Vision questions require multimodal LLMs.")
         else:
-            state["errors"].append(f"Execution error: {type(e).__name__}

         # Try fallback if we don't have any tool_calls yet
         if not tool_calls:
-            logger.info(f"[execute_node] Attempting fallback after exception...")
             try:
                 tool_calls = fallback_tool_selection(
                     state["question"], state.get("plan", ""), state.get("file_paths")
                 )
-                logger.info(f"[execute_node] Fallback after exception returned {len(tool_calls)} tool(s)")

-                # Try to execute fallback tools
-                # NOTE: Keys must match TOOLS registry in src/tools/__init__.py
                 TOOL_FUNCTIONS = {
                     "web_search": search,
                     "parse_file": parse_file,
@@ -429,7 +372,6 @@ def execute_node(state: AgentState) -> AgentState:
                             "result": result,
                             "status": "success"
                         })
-                        # Extract evidence - handle different result formats
                         if isinstance(result, dict):
                             if "answer" in result:
                                 evidence.append(result["answer"])
@@ -451,86 +393,42 @@ def execute_node(state: AgentState) -> AgentState:
                             evidence.append(result)
                         else:
                             evidence.append(str(result))
-                        logger.info(f"[
                     except Exception as tool_error:
-                        logger.error(f"[
             except Exception as fallback_error:
-                logger.error(f"[

     # Always update state, even if there were errors
     state["tool_calls"] = tool_calls
     state["tool_results"] = tool_results
     state["evidence"] = evidence
-
-    logger.info(f"[execute_node] ========== EXECUTE NODE END ==========")
     return state


 def answer_node(state: AgentState) -> AgentState:
-    """
-    Answer synthesis node: Generate final factoid answer.
-
-    Stage 3: Synthesize answer from evidence
-    - LLM analyzes collected evidence
-    - Resolves conflicts if present
-    - Generates factoid answer in GAIA format
-
-    Args:
-        state: Current agent state with evidence from tools
-
-    Returns:
-        Updated state with final factoid answer
-    """
-    logger.info(f"[answer_node] ========== ANSWER NODE START ==========")
-    logger.info(f"[answer_node] Evidence items collected: {len(state['evidence'])}")
-    logger.info(f"[answer_node] Errors accumulated: {len(state['errors'])}")
-
-    # ============================================================================
-    # FULL EVIDENCE LOGGING - Debug what evidence is being passed to synthesis
-    # ============================================================================
-    logger.info("=" * 80)
-    logger.info("[EVIDENCE] Full evidence content being passed to synthesis:")
-    logger.info("=" * 80)
-    for i, ev in enumerate(state['evidence']):
-        logger.info(f"[EVIDENCE {i+1}/{len(state['evidence'])}]")
-        logger.info(f"{ev[:500]}..." if len(ev) > 500 else f"{ev}")
-        logger.info("-" * 80)
-    logger.info("=" * 80)
-    logger.info("[EVIDENCE] End of evidence content")
-    logger.info("=" * 80)
-    # ============================================================================
-
-    logger.debug(f"[answer_node] Evidence: {state['evidence']}")
     if state["errors"]:
-        logger.warning(f"[

     try:
-        # Check if we have evidence
         if not state["evidence"]:
-
-
-            )
-            # Show WHY it failed - include error details
-            error_summary = "; ".join(state["errors"]) if state["errors"] else "No errors logged - check API keys and logs"
-            state["answer"] = f"ERROR: No evidence collected. Details: {error_summary}"
-            logger.error(f"[answer_node] Returning error answer: {state['answer']}")
             return state

-        # Stage 3: Use LLM to synthesize factoid answer from evidence
-        logger.info(f"[answer_node] Calling synthesize_answer() with {len(state['evidence'])} evidence items...")
         answer = synthesize_answer(
             question=state["question"], evidence=state["evidence"]
         )
-
         state["answer"] = answer
-        logger.info(f"[

     except Exception as e:
-        logger.error(f"[
         state["errors"].append(f"Answer synthesis error: {type(e).__name__}: {str(e)}")
         state["answer"] = f"ERROR: Answer synthesis failed - {type(e).__name__}: {str(e)}"

-    logger.info(f"[answer_node] ========== ANSWER NODE END ==========")
     return state

 216      Returns:
 217          Updated state with execution plan
 218      """
 219      try:
 220          plan = plan_question(
 221              question=state["question"],
 222              available_tools=TOOLS,
 223              file_paths=state.get("file_paths"),
 224          )
 225          state["plan"] = plan
 226 +        logger.info(f"[plan] ✓ {len(plan)} chars")
 227      except Exception as e:
 228 +        logger.error(f"[plan] ✗ {type(e).__name__}: {str(e)}")
 229          state["errors"].append(f"Planning error: {type(e).__name__}: {str(e)}")
 230          state["plan"] = "Error: Unable to create plan"
 231      return state
 232
 233
 234  def execute_node(state: AgentState) -> AgentState:
 235 +    """Execution node: Execute tools based on plan."""
 236      # Map tool names to actual functions
 237      TOOL_FUNCTIONS = {
 238          "web_search": search,
 239          "parse_file": parse_file,
 243          "transcribe_audio": transcribe_audio,
 244      }
 245
 246      tool_results = []
 247      evidence = []
 248      tool_calls = []
 249
 250      try:
 251          tool_calls = select_tools_with_function_calling(
 252              question=state["question"],
 253              plan=state["plan"],
 257
 258          # Validate tool_calls result
 259          if not tool_calls:
 260 +            logger.warning("[execute] No tools selected, using fallback")
 261 +            state["errors"].append("Tool selection returned no tools - using fallback")
 262              tool_calls = fallback_tool_selection(
 263                  state["question"], state["plan"], state.get("file_paths")
 264              )
 265          elif not isinstance(tool_calls, list):
 266 +            logger.error(f"[execute] Invalid type: {type(tool_calls)}, using fallback")
 267 +            state["errors"].append(f"Tool selection returned invalid type: {type(tool_calls)}")
 268              tool_calls = fallback_tool_selection(
 269                  state["question"], state["plan"], state.get("file_paths")
 270              )
 271          else:
 272 +            logger.info(f"[execute] {len(tool_calls)} tool(s) selected")
 273
 274          # Execute each tool call
 275          for idx, tool_call in enumerate(tool_calls, 1):
 276              tool_name = tool_call["tool"]
 277              params = tool_call["params"]
 278
 279              try:
 280                  tool_func = TOOL_FUNCTIONS.get(tool_name)
 281                  if not tool_func:
 282                      raise ValueError(f"Tool '{tool_name}' not found in TOOL_FUNCTIONS")
 283
 284                  result = tool_func(**params)
 285 +                logger.info(f"[{idx}/{len(tool_calls)}] {tool_name} ✓")
 286
 287 +                tool_results.append({
 288 +                    "tool": tool_name,
 289 +                    "params": params,
 290 +                    "result": result,
 291 +                    "status": "success",
 292 +                })
 293
 294                  # Extract evidence - handle different result formats
 295                  if isinstance(result, dict):
 327                      "error": str(tool_error),
 328                      "status": "failed",
 329                  }
 330                  # Provide specific error message for vision tool failures
 331                  if tool_name == "vision" and ("quota" in str(tool_error).lower() or "429" in str(tool_error)):
 332 +                    state["errors"].append(f"Vision failed: LLM quota exhausted")
 333                  else:
 334 +                    state["errors"].append(f"{tool_name}: {type(tool_error).__name__}")
 335
 336 +        logger.info(f"[execute] {len(tool_results)} tools, {len(evidence)} evidence")
 337
 338      except Exception as e:
 339 +        logger.error(f"[execute] ✗ {type(e).__name__}: {str(e)}")
 340
 341          if is_vision_question(state["question"]) and ("quota" in str(e).lower() or "429" in str(e)):
 342 +            state["errors"].append("Vision unavailable (quota exhausted)")
 343          else:
 344 +            state["errors"].append(f"Execution error: {type(e).__name__}")
 345
 346          # Try fallback if we don't have any tool_calls yet
 347          if not tool_calls:
 348              try:
 349                  tool_calls = fallback_tool_selection(
 350                      state["question"], state.get("plan", ""), state.get("file_paths")
 351                  )
 352
 353                  TOOL_FUNCTIONS = {
 354                      "web_search": search,
 355                      "parse_file": parse_file,
 372                          "result": result,
 373                          "status": "success"
 374                      })
 375                      if isinstance(result, dict):
 376                          if "answer" in result:
 377                              evidence.append(result["answer"])
 393                          evidence.append(result)
 394                      else:
 395                          evidence.append(str(result))
 396 +                    logger.info(f"[execute] Fallback {tool_name} ✓")
 397                  except Exception as tool_error:
 398 +                    logger.error(f"[execute] Fallback {tool_name} ✗ {tool_error}")
 399              except Exception as fallback_error:
 400 +                logger.error(f"[execute] Fallback failed: {fallback_error}")
 401
 402      # Always update state, even if there were errors
 403      state["tool_calls"] = tool_calls
 404      state["tool_results"] = tool_results
 405      state["evidence"] = evidence
 406      return state
 407
 408
 409  def answer_node(state: AgentState) -> AgentState:
 410 +    """Answer synthesis node: Generate final factoid answer from evidence."""
 411      if state["errors"]:
 412 +        logger.warning(f"[answer] Errors: {state['errors']}")
 413
 414      try:
 415          if not state["evidence"]:
 416 +            error_summary = "; ".join(state["errors"]) if state["errors"] else "No errors logged"
 417 +            state["answer"] = f"ERROR: No evidence. {error_summary}"
 418 +            logger.error(f"[answer] ✗ No evidence - {error_summary}")
 419              return state
 420
 421          answer = synthesize_answer(
 422              question=state["question"], evidence=state["evidence"]
 423          )
 424          state["answer"] = answer
 425 +        logger.info(f"[answer] ✓ {answer}")
 426
 427      except Exception as e:
 428 +        logger.error(f"[answer] ✗ {type(e).__name__}: {str(e)}")
 429          state["errors"].append(f"Answer synthesis error: {type(e).__name__}: {str(e)}")
 430          state["answer"] = f"ERROR: Answer synthesis failed - {type(e).__name__}: {str(e)}"
 431
 432      return state
 433
 434
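The compact `[stage] ✓/✗ detail` console convention adopted throughout graph.py above can be sketched as a small standalone helper. This is a minimal illustration of the format, not code from the repo; the `status_line` and `log_status` names are mine.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("agent")


def status_line(stage: str, ok: bool, detail: str = "") -> str:
    """Build a one-line status in the compact "[stage] ✓/✗ detail" style."""
    mark = "✓" if ok else "✗"
    return f"[{stage}] {mark} {detail}".rstrip()


def log_status(stage: str, ok: bool, detail: str = "") -> str:
    """Log at INFO on success, ERROR on failure; return the line for reuse."""
    line = status_line(stage, ok, detail)
    (logger.info if ok else logger.error)(line)
    return line


log_status("plan", True, "660 chars")             # [plan] ✓ 660 chars
log_status("execute", True, "1 tool(s) selected")  # [execute] ✓ 1 tool(s) selected
log_status("answer", False, "TimeoutError")        # [answer] ✗ TimeoutError
```

One line per pipeline stage keeps a full run readable at a glance while the heavy diagnostic detail moves elsewhere (here, a context file in `_cache/`).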
src/agent/llm_client.py
CHANGED
@@ -1142,30 +1142,13 @@ Extract the factoid answer from the evidence above. Return only the factoid, not
 1142          f.write(ev)
 1143          f.write("\n" + "=" * 80 + "\n")
 1144
 1145 -    logger.info(f"[synthesize_answer_hf] …
 1146 -    # ============================================================================
 1147 -
 1148 -    logger.info(f"[synthesize_answer_hf] Calling HuggingFace for answer synthesis")
 1149
 1150      messages = [
 1151          {"role": "system", "content": system_prompt},
 1152          {"role": "user", "content": user_prompt},
 1153      ]
 1154
 1155 -    # ============================================================================
 1156 -    # FULL CONTEXT LOGGING - Debug LLM synthesis failures
 1157 -    # ============================================================================
 1158 -    logger.info("=" * 80)
 1159 -    logger.info("[LLM CONTEXT] Full synthesis prompt being sent to LLM:")
 1160 -    logger.info("=" * 80)
 1161 -    logger.info(f"[SYSTEM PROMPT]\n{system_prompt}")
 1162 -    logger.info("-" * 80)
 1163 -    logger.info(f"[USER PROMPT]\n{user_prompt}")
 1164 -    logger.info("=" * 80)
 1165 -    logger.info("[LLM CONTEXT] End of full context")
 1166 -    logger.info("=" * 80)
 1167 -    # ============================================================================
 1168 -
 1169      response = client.chat_completion(
 1170          messages=messages,
 1171          max_tokens=256,  # Factoid answers are short

 1142          f.write(ev)
 1143          f.write("\n" + "=" * 80 + "\n")
 1144
 1145 +    logger.info(f"[synthesize_answer_hf] Context saved to: {context_file}")
 1146
 1147      messages = [
 1148          {"role": "system", "content": system_prompt},
 1149          {"role": "user", "content": user_prompt},
 1150      ]
 1151
 1152      response = client.chat_completion(
 1153          messages=messages,
 1154          max_tokens=256,  # Factoid answers are short
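The hunk above replaces console dumps of the full prompt with a single status line, relying on the context having already been written to a file under `_cache/`. A minimal sketch of that "full context to file, one line to console" pattern follows; the `save_context` helper and its exact file layout are illustrative assumptions, not the repo's actual implementation.

```python
import logging
import time
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("agent")


def save_context(system_prompt: str, user_prompt: str, cache_dir: str = "_cache") -> Path:
    """Write the full LLM prompt to a timestamped file and return its path.

    The console gets one compact status line; the complete context stays
    available on disk for debugging (cf. _cache/llm_context_*.txt).
    """
    Path(cache_dir).mkdir(exist_ok=True)  # create the cache dir on first use
    context_file = Path(cache_dir) / f"llm_context_{int(time.time())}.txt"
    with open(context_file, "w", encoding="utf-8") as f:
        f.write("[SYSTEM PROMPT]\n" + system_prompt + "\n")
        f.write("=" * 80 + "\n")
        f.write("[USER PROMPT]\n" + user_prompt + "\n")
    logger.info(f"[synthesize] Context saved to: {context_file}")
    return context_file


path = save_context("You are a factoid QA assistant.", "Question: ...\nEvidence: ...")
```

This keeps interactive logs short without losing any information: when a synthesis call fails, the exact prompt that was sent can be replayed from the saved file.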