AgenticRAG_test

Zeggai Abdellah committed on Jun 8, 2025

Commit a6cedca (1 parent: a043019)

fix max iterations

Files changed (1)

rag_pipeline.py +58 -53

rag_pipeline.py CHANGED

@@ -121,7 +121,7 @@ You provide evidence-based guidance using only information from official vaccine
 Answer the doctor's question accurately and concisely using only the provided information.
 ## FALLBACK MODE INSTRUCTIONS
-- You have access to only 2 powerful tools: Guide_vector_tool (Algerian National Vaccination Guide) and Immunization_in_Practice_tool (WHO global guidance).
 - **MANDATORY TOOL USAGE**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
 - Be direct and efficient - search once with each tool if needed, then provide your answer.
 - Do not overthink or search repeatedly - these tools are comprehensive.
@@ -132,7 +132,7 @@ Answer the doctor's question accurately and concisely using only the provided in
 1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
 2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
 3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
-4. Do NOT use tool names (like Guide_vector_tool, Immunization_in_Practice_tool) as citations.
 5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
 6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
@@ -146,20 +146,18 @@ Answer the doctor's question accurately and concisely using only the provided in
 ### CRITICAL: Efficient Fallback Strategy
 1. **MANDATORY SEARCH**: Use each relevant tool at least once to search for information, even if you suspect the information might not be available.
-2. **BREAK DOWN COMPLEX QUERIES**: For comparative or multi-part questions (e.g., comparing Algerian and WHO guidelines), break the query into sub-queries and use the appropriate tool for each part:
-   - Use Guide_vector_tool for Algerian-specific information (e.g., national schedules, coverage targets).
-   - Use Immunization_in_Practice_tool for WHO-specific information (e.g., global recommendations, coverage targets).
 3. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
 4. **BE DECISIVE**: Once you find relevant information for each sub-query, formulate your response immediately.
 5. **ANSWER FULLY**: Address all parts of the question, using multiple tools if required by the query.
 ### Response Guidelines
 - **MANDATORY TOOL SELECTION**:
-  - For queries mentioning "WHO," "World Health Organization," "international," "global guidance," or WHO documents (e.g., page numbers), use Immunization_in_Practice_tool first.
-  - For queries mentioning "Algerian," "national guide," or Algerian-specific terms (e.g., page numbers), use Guide_vector_tool first.
-  - For comparative queries (e.g., Algerian vs. WHO), use both Guide_vector_tool and Immunization_in_Practice_tool, addressing each part systematically.
 - **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why, based on the query’s content.
-- **Query Decomposition**: Break comparative or multi-part queries into sub-queries (e.g., one for Algerian information, one for WHO information) and use the appropriate tool for each.
 - Provide all found information with proper citations using Source IDs only.
 - If information is limited, clearly state: "Based on the available documents, I can provide the following information..." and indicate what is not available.
@@ -178,7 +176,7 @@ Answer the doctor's question accurately and concisely using only the provided in
 1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
 2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
 3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
-4. Do NOT use tool names (like Guide_vector_tool, Immunization_in_Practice_tool) as citations.
 5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
 6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
@@ -193,28 +191,23 @@ Answer the doctor's question accurately and concisely using only the provided in
 ### CRITICAL: Efficient Response Strategy
 1. **MANDATORY SEARCH**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
 2. **MANDATORY TOOL SELECTION**:
-   - For queries mentioning "WHO," "World Health Organization," "international," "global guidance," or WHO documents (e.g., page numbers), use Immunization_in_Practice_tool first.
-   - For queries mentioning "Algerian," "national guide," or Algerian-specific terms (e.g., page numbers), use Guide_vector_tool first.
-   - For comparative queries (e.g., Algerian vs. WHO), use both Guide_vector_tool and Immunization_in_Practice_tool, addressing each part systematically.
-3. **Query Decomposition**: Break comparative or multi-part queries into sub-queries (e.g., one for Algerian information, one for WHO information) and use the appropriate tool for each.
 4. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
-5. **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why, based on the query’s content.
-6. **BE DECISIVE**: Once you find relevant information for each sub-query, formulate your response immediately.
-7. **ANSWER FULLY**: Address all parts of the question, using multiple tools if required by the query.
-8. **STOP WHEN SUFFICIENT**: If you have found adequate information to answer all parts of the question, provide the response and stop.
 ### Response Guidelines for Complex Questions
-- For comparative questions: Break the query into sub-queries (e.g., Algerian vs. WHO), use Guide_vector_tool for Algerian specifics and Immunization_in_Practice_tool for WHO specifics, then provide the comparison.
 - For multi-part questions: Address each part systematically, using the appropriate tool for each sub-query.
 - If information is not found after using the relevant tool(s): State clearly: "Based on the available documents, I can provide the following information..." and specify what is not available.
-- Do not repeatedly search for the same terms or rephrase searches excessively.
-### When Information is Limited
-If you cannot find complete information to fully answer a question:
-1. Provide whatever relevant information you did find with proper citations using Source IDs only.
-2. Clearly state: "Based on the available documents, I can provide the following information..."
-3. Indicate what specific information is not available: "However, information about [specific topic] was not found in the provided documents after searching with the relevant tool(s)."
-4. Do not conclude "no information is available" without attempting a search with the appropriate tool(s).
 ---
 """
@@ -248,7 +241,8 @@ def create_agent(tools, llm, is_fallback=False):
     """Create the ReAct agent with custom prompt"""
     agent_type = "FALLBACK" if is_fallback else "STANDARD"
-    max_iter =10
     print(f"[LOG] Creating {agent_type} ReAct agent with {len(tools)} tools and max_iterations={max_iter}")
@@ -257,7 +251,7 @@ def create_agent(tools, llm, is_fallback=False):
         tools,
         llm=llm,
         verbose=True,
-        max_iterations=max_iter,  # Reduced iterations for fallback agent
     )
     # Create and apply safe custom prompt
@@ -274,9 +268,9 @@ def create_agent(tools, llm, is_fallback=False):
 def create_fallback_tools(all_tools):
-    """Extract only the guide_retrieval_tool and immunization_tool for fallback agent"""
-    print("[LOG] Creating fallback tools (guide + immunization only)")
     fallback_tools = []
     tool_names_found = []
@@ -334,7 +328,14 @@ def initialize_rag_pipeline(tools):
 def detect_max_iterations_error(response_text):
-    """Detect if the response indicates a max iterations error"""
     max_iteration_indicators = [
         "max iterations",
@@ -344,11 +345,10 @@ def detect_max_iterations_error(response_text):
         "iteration limit"
     ]
-    response_lower = response_text.lower()
-    # Check for max iterations indicators
     for indicator in max_iteration_indicators:
         if indicator in response_lower:
             return True
     # Check for very short or empty responses (often indicates failure)
@@ -389,7 +389,7 @@ def process_question(agents_dict, question: str) -> str:
         # Check if we need to use fallback
         if detect_max_iterations_error(response_text):
-            print("[LOG] 🔄 Max iterations detected, switching to FALLBACK AGENT...")
             if fallback_agent is None:
                 print("[LOG] ❌ Fallback agent not available, returning error message")
@@ -419,7 +419,7 @@ def process_question(agents_dict, question: str) -> str:
                 # Check if fallback also failed
                 if detect_max_iterations_error(fallback_text):
-                    print("[LOG] ❌ Fallback agent also hit max iterations")
                     return ("I apologize, but I'm having difficulty finding specific information about your question in the available documents. "
                            "Please try asking a more specific question or rephrasing your query.")
@@ -501,16 +501,17 @@ def process_question_with_sequential_citations(agents_dict, question: str, chunk
     print(f"[LOG] Chunks directory: {chunks_directory}")
     start_time = time.time()
-    used_fallback = False
     try:
         # Get the response using the enhanced process_question function
         response_text = process_question(agents_dict, question)
-        # Check if this looks like a fallback was used (simple heuristic)
-        if "fallback" in response_text.lower() or len(response_text) < 50:
             used_fallback = True
-            print("[LOG] 🛡️ Fallback agent was likely used")
         agent_time = time.time() - start_time
         print(f"[LOG] Agent processing completed in {agent_time:.2f} seconds")
@@ -534,6 +535,10 @@ def process_question_with_sequential_citations(agents_dict, question: str, chunk
         for json_file in min_chunks_files:
             json_path = os.path.join(chunks_directory, json_file)
             print(f"[LOG] Loading {json_file}...")
             try:
                 with open(json_path, "r", encoding="utf-8") as f:
@@ -549,23 +554,23 @@ def process_question_with_sequential_citations(agents_dict, question: str, chunk
         print("[LOG] Finding cited elements...")
         cited_elements_ordered = []
         for i, source_id in enumerate(unique_ids):  # This preserves the order
-            print(f"[LOG] Looking for source ID {i+1}/{len(unique_ids)}: {source_id}")
             found = False
             for element in all_chunks_data:
                 if element.get("type") == 'TableElement':
-                    if element.get("elements",{}).get("element_id") == source_id:
-                        cited_elements_ordered.append(element.get("elements",{}))
                         found = True
                         break
-                else:
-                    if "elements" in element:
-                        for nested_element in element["elements"]:
-                            if nested_element.get("element_id") == source_id:
-                                cited_elements_ordered.append(nested_element)
-                                found = True
-                                break
-                        else:
-                            continue
                         break
             if not found:
                 print(f"[LOG] ⚠️ Source ID {source_id} not found in chunks data")

 Answer the doctor's question accurately and concisely using only the provided information.
 ## FALLBACK MODE INSTRUCTIONS
+- You have access to only 2 powerful tools: general_guide_tool (Algerian National Vaccination Guide) and who_immunization_tool (WHO global guidance).
 - **MANDATORY TOOL USAGE**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
 - Be direct and efficient - search once with each tool if needed, then provide your answer.
 - Do not overthink or search repeatedly - these tools are comprehensive.
 1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
 2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
 3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
+4. Do NOT use tool names (like general_guide_tool, who_immunization_tool) as citations.
 5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
 6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
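The citation rules above are simple enough to check mechanically after generation. This is a minimal sketch, assuming Source IDs are lowercase hexadecimal strings of 16 to 64 characters (both example IDs in the prompt fit this) and that tool names end in `_tool`. `check_citations` is a hypothetical helper, not part of `rag_pipeline.py`.

```python
import re
from typing import List

CITATION = re.compile(r"\[([^\[\]]+)\]")    # any [ ... ] token in the answer
SOURCE_ID = re.compile(r"[0-9a-f]{16,64}")  # assumed Source ID shape (hex)


def check_citations(answer: str) -> List[str]:
    """List violations of the prompt's citation rules in a generated answer."""
    problems = []
    if "Source:" in answer:
        # Rule 3: no 'Source:' prefix, only the bare ID in brackets.
        problems.append("uses 'Source:' prefix")
    for token in CITATION.findall(answer):
        if token.endswith("_tool"):
            # Rule 4: tool names are not valid citations.
            problems.append(f"tool name used as citation: {token}")
        elif not SOURCE_ID.fullmatch(token):
            # Rule 2: only exact Source IDs are allowed.
            problems.append(f"not a Source ID: {token}")
    return problems
```

Adjacent citations such as `[e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]` pass, while `[general_guide_tool]` is reported as a tool-name citation.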
 ### CRITICAL: Efficient Fallback Strategy
 1. **MANDATORY SEARCH**: Use each relevant tool at least once to search for information, even if you suspect the information might not be available.
+2. **BREAK DOWN COMPLEX QUERIES**: For comparative or multi-part questions (e.g., comparing Algerian and WHO guidelines), break the query into sub-queries and use the appropriate tool for each part.
 3. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
 4. **BE DECISIVE**: Once you find relevant information for each sub-query, formulate your response immediately.
 5. **ANSWER FULLY**: Address all parts of the question, using multiple tools if required by the query.
+6. **FINAL ANSWER**: Once you have your answer, present it directly. Do not output your internal 'thought' or 'action' steps. Your final output must be the synthesized answer itself.
 ### Response Guidelines
 - **MANDATORY TOOL SELECTION**:
+  - For queries mentioning "WHO," "World Health Organization," "international," "global guidance," or WHO documents, use who_immunization_tool first.
+  - For queries mentioning "Algerian," "national guide," or Algerian-specific terms, use general_guide_tool first.
+  - For comparative queries (e.g., Algerian vs. WHO), use both tools, addressing each part systematically.
 - **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why, based on the query’s content.
 - Provide all found information with proper citations using Source IDs only.
 - If information is limited, clearly state: "Based on the available documents, I can provide the following information..." and indicate what is not available.
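The fallback prompt's tool-selection rules amount to a small keyword router. The sketch below is a hypothetical illustration, not code from the pipeline. The patterns are assumptions. `WHO` is matched case-sensitively so that the pronoun "who" does not trigger the WHO tool, and word boundaries stop "international" from matching "national guide".

```python
import re
from typing import List

# Hypothetical patterns; the prompt lists these cues but not an exact matcher.
WHO_ACRONYM = re.compile(r"\bWHO\b")  # case-sensitive: ignores the pronoun "who"
WHO_PHRASES = re.compile(r"world health organization|\binternational\b|global guidance")
ALGERIAN_PHRASES = re.compile(r"\balgerian?\b|\bnational guide")


def route_fallback(query: str) -> List[str]:
    """Return fallback tool names in the order the prompt says to try them."""
    lowered = query.lower()
    wants_who = bool(WHO_ACRONYM.search(query) or WHO_PHRASES.search(lowered))
    wants_algeria = bool(ALGERIAN_PHRASES.search(lowered))
    if wants_who and wants_algeria:
        # Comparative query: one sub-query per source.
        return ["general_guide_tool", "who_immunization_tool"]
    if wants_who:
        return ["who_immunization_tool"]
    # Algerian-specific or unmarked queries default to the national guide (assumption).
    return ["general_guide_tool"]
```

In practice the LLM makes this choice itself from the prompt. A router like this is mainly useful for logging or testing whether the agent's actual tool calls match the intended routing.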
 1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
 2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
 3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
+4. Do NOT use tool names (like general_guide_tool, cold_chain_tool) as citations.
 5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
 6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
 ### CRITICAL: Efficient Response Strategy
 1. **MANDATORY SEARCH**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
 2. **MANDATORY TOOL SELECTION**:
+   - For queries about global standards or WHO, use who_immunization_tool.
+   - For broad questions about the Algerian guide, use general_guide_tool.
+   - For specific topics like cold chain, disease info, etc., use the most specific tool (e.g., cold_chain_tool, disease_info_tool).
+3. **Query Decomposition**: Break comparative or multi-part queries into sub-queries and use the appropriate tool for each.
 4. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
+5. **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why.
+6. **BE DECISIVE**: Once you find relevant information, formulate your response.
+### Final Answer Generation
+- **STOP WHEN SUFFICIENT**: Once you have gathered enough information from the tools to answer the user's question completely, you MUST stop using tools and formulate a final answer.
+- **SYNTHESIZE THE ANSWER**: Formulate a comprehensive, final answer based ONLY on the observed tool outputs.
+- **PRESENT CLEANLY**: Present this final answer directly to the user. Your final output must be the answer itself, not your internal 'thought' or 'action' steps.
 ### Response Guidelines for Complex Questions
+- For comparative questions: Break the query into sub-queries, use the appropriate tools, then provide the comparison.
 - For multi-part questions: Address each part systematically, using the appropriate tool for each sub-query.
 - If information is not found after using the relevant tool(s): State clearly: "Based on the available documents, I can provide the following information..." and specify what is not available.
 ---
 """
     """Create the ReAct agent with custom prompt"""
     agent_type = "FALLBACK" if is_fallback else "STANDARD"
+    # **FIX**: Increased max_iterations to give the agent more steps to reason
+    max_iter = 15
     print(f"[LOG] Creating {agent_type} ReAct agent with {len(tools)} tools and max_iterations={max_iter}")
         tools,
         llm=llm,
         verbose=True,
+        max_iterations=max_iter,
     )
     # Create and apply safe custom prompt
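Raising `max_iter` from 10 to 15 only changes how many reason/act steps the agent may take before it gives up. The toy loop below is hypothetical and independent of the agent library the pipeline uses. It illustrates the effect: a task that needs 12 steps fails with a budget of 10 and succeeds with 15.

```python
from typing import Callable, Optional


class MaxIterationsExceeded(RuntimeError):
    """Raised when the loop runs out of steps before producing an answer."""


def run_bounded(step: Callable[[int], Optional[str]], max_iterations: int = 15) -> str:
    """Call step(i) until it returns an answer or the step budget is exhausted.

    step returns None while still reasoning/using tools, or a string once done.
    """
    for i in range(max_iterations):
        answer = step(i)
        if answer is not None:
            return answer
    raise MaxIterationsExceeded(
        f"Reached max iterations ({max_iterations}) without a final answer"
    )


def needs_12_steps(i: int) -> Optional[str]:
    # Toy task: produces its answer on the 12th call (index 11).
    return "done" if i == 11 else None


print(run_bounded(needs_12_steps, max_iterations=15))  # prints "done"
```

With `max_iterations=10` the same call raises `MaxIterationsExceeded`. A larger budget trades extra latency and LLM calls for fewer fallback switches.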
 def create_fallback_tools(all_tools):
+    """Extract only the general_guide_tool and who_immunization_tool for fallback agent"""
+    print("[LOG] Creating fallback tools (guide + WHO only)")
     fallback_tools = []
     tool_names_found = []
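`create_fallback_tools` now looks for `general_guide_tool` and `who_immunization_tool`, but only its first lines are visible here. Below is a sketch of name-based selection that also reports which expected tools are missing. It assumes each tool exposes its name through some attribute. The `Tool` dataclass and the `name_of` accessor are hypothetical stand-ins for the pipeline's real tool objects.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Tool:
    """Stand-in for the pipeline's tool objects; only the name is used here."""
    name: str


FALLBACK_TOOL_NAMES = ("general_guide_tool", "who_immunization_tool")


def pick_fallback_tools(
    all_tools: Iterable,
    wanted: Sequence[str] = FALLBACK_TOOL_NAMES,
    name_of: Callable = lambda tool: tool.name,
) -> Tuple[List, List[str]]:
    """Return (selected_tools, missing_names), both ordered as in `wanted`."""
    by_name = {name_of(tool): tool for tool in all_tools}
    selected = [by_name[name] for name in wanted if name in by_name]
    missing = [name for name in wanted if name not in by_name]
    return selected, missing
```

Returning the missing names makes a renamed tool visible immediately. This commit renames the tools from `Guide_vector_tool`/`Immunization_in_Practice_tool` to `general_guide_tool`/`who_immunization_tool`, so a stale name would otherwise go unnoticed.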
 def detect_max_iterations_error(response_text):
+    """Detect if the response indicates a max iterations error OR is an unfinished thought."""
+    response_lower = response_text.lower().strip()
+    # **FIX**: Check if the response is the agent's raw thought process.
+    if response_lower.startswith("a:```thought") or response_lower.startswith("```thought"):
+        print("[LOG] Detected unfinished agent thought process.")
+        return True
     max_iteration_indicators = [
         "max iterations",
         "iteration limit"
     ]
+    # Check for explicit max iterations indicators
     for indicator in max_iteration_indicators:
         if indicator in response_lower:
+            print(f"[LOG] Detected max iteration indicator: '{indicator}'")
             return True
     # Check for very short or empty responses (often indicates failure)
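The updated `detect_max_iterations_error` combines three checks: a raw-thought prefix, explicit iteration-limit phrases, and a short-response test whose body is not shown in the diff. The sketch below implements the same idea as a self-contained function. `MIN_ANSWER_CHARS` is a hypothetical threshold, and only the two indicators visible in the diff are listed.

```python
MAX_ITERATION_INDICATORS = ("max iterations", "iteration limit")  # subset visible in the diff
MIN_ANSWER_CHARS = 20  # hypothetical cutoff; the real threshold is not shown


def looks_like_failed_run(response_text: str) -> bool:
    """True if the agent output looks like an unfinished or aborted run."""
    text = (response_text or "").lower().strip()
    # Raw reasoning returned instead of an answer.
    if text.startswith(("a:```thought", "```thought")):
        return True
    # Explicit iteration-limit messages.
    if any(indicator in text for indicator in MAX_ITERATION_INDICATORS):
        return True
    # Empty or very short outputs are treated as failures.
    return len(text) < MIN_ANSWER_CHARS
```

The phrase checks are substring matches. A legitimate answer that happens to mention an "iteration limit" would therefore also be routed to the fallback agent.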
         # Check if we need to use fallback
         if detect_max_iterations_error(response_text):
+            print("[LOG] 🔄 Max iterations or unfinished thought detected, switching to FALLBACK AGENT...")
             if fallback_agent is None:
                 print("[LOG] ❌ Fallback agent not available, returning error message")
                 # Check if fallback also failed
                 if detect_max_iterations_error(fallback_text):
+                    print("[LOG] ❌ Fallback agent also hit max iterations or failed to produce an answer.")
                     return ("I apologize, but I'm having difficulty finding specific information about your question in the available documents. "
                            "Please try asking a more specific question or rephrasing your query.")
     print(f"[LOG] Chunks directory: {chunks_directory}")
     start_time = time.time()
+    used_fallback = False # This flag is a heuristic
     try:
         # Get the response using the enhanced process_question function
         response_text = process_question(agents_dict, question)
+        # Check if fallback was likely used (simple heuristic based on logs)
+        # A more robust way would be for `process_question` to return a tuple (response, used_fallback)
+        if "switching to fallback agent" in response_text.lower():
             used_fallback = True
+            print("[LOG] 🛡️ Fallback agent was likely used based on log indicators.")
         agent_time = time.time() - start_time
         print(f"[LOG] Agent processing completed in {agent_time:.2f} seconds")
         for json_file in min_chunks_files:
             json_path = os.path.join(chunks_directory, json_file)
+            if not os.path.exists(json_path):
+                print(f"[LOG] ⚠️ Skipping non-existent file: {json_path}")
+                continue
             print(f"[LOG] Loading {json_file}...")
             try:
                 with open(json_path, "r", encoding="utf-8") as f:
         print("[LOG] Finding cited elements...")
         cited_elements_ordered = []
         for i, source_id in enumerate(unique_ids):  # This preserves the order
+            # print(f"[LOG] Looking for source ID {i+1}/{len(unique_ids)}: {source_id}") # This is too verbose for normal operation
             found = False
             for element in all_chunks_data:
+                # Handle TableElement structure
                 if element.get("type") == 'TableElement':
+                    if element.get("elements", {}).get("element_id") == source_id:
+                        cited_elements_ordered.append(element.get("elements", {}))
                         found = True
                         break
+                # Handle other element structures
+                elif "elements" in element and isinstance(element["elements"], list):
+                    for nested_element in element["elements"]:
+                        if isinstance(nested_element, dict) and nested_element.get("element_id") == source_id:
+                            cited_elements_ordered.append(nested_element)
+                            found = True
+                            break
+                    if found:
                         break
             if not found:
                 print(f"[LOG] ⚠️ Source ID {source_id} not found in chunks data")