Space: kamkol/AB_Testing_RAG_Agent

kamkol committed on Apr 30, 2025

Commit 4ddf811 (1 parent: 23cc167)

Improve agent logic with improved templates

Files changed (5)

notebook_version/AB_Testing_RAG_Agent.ipynb +0 -0
notebook_version/README.md +94 -0
notebook_version/pyproject.toml +16 -0
notebook_version/uv.lock +0 -0
streamlit_app.py +309 -268

notebook_version/AB_Testing_RAG_Agent.ipynb ADDED

The diff for this file is too large to render. See raw diff

notebook_version/README.md ADDED

	@@ -0,0 +1,94 @@

+<p align="center" draggable="false"><img src="https://github.com/AI-Maker-Space/LLM-Dev-101/assets/37101144/d1343317-fa2f-41e1-8af1-1dbb18399719"
+     width="200px"
+     height="auto"/>
+</p>
+## <h1 align="center" id="heading">Session 8: Evaluating RAG with Ragas</h1>
+| 🤓 Pre-work | 📰 Session Sheet | ⏺️ Recording     | 🖼️ Slides        | 👨‍💻 Repo         | 📝 Homework      | 📁 Feedback       |
+|:-----------------|:-----------------|:-----------------|:-----------------|:-----------------|:-----------------|:-----------------|
+| [Session 8: Pre-Work](https://www.notion.so/Session-8-RAG-Evaluation-and-Assessment-1c8cd547af3d81d08f7cf5521d0253bb?pvs=4#1c8cd547af3d816583d6c23183b6f87f) | [Session 8: RAG Evaluation and Assessment](https://www.notion.so/Session-8-RAG-Evaluation-and-Assessment-1c8cd547af3d81d08f7cf5521d0253bb) | Coming soon! | [Session 8 Slides](https://www.canva.com/design/DAGjadKGqcw/0Gff9K2EwbOb3lX14un3uw/edit?utm_content=DAGjadKGqcw&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton) | You are here! | [Session 8: RAG Evaluation and Assessment](https://forms.gle/ujAQLqx2ZHMWTUH79) | [AIE6 Feedback 4/24](https://forms.gle/wA7p89e6svCgjtr58) |
+In today's assignment, we'll create synthetic data and use it to benchmark (and improve) an LCEL RAG chain.
+- 🤝 Breakout Room #1
+  1. Task 1: Installing Required Libraries
+  2. Task 2: Set Environment Variables
+  3. Task 3: Synthetic Dataset Generation for Evaluation using Ragas
+  4. Task 4: Evaluating our Pipeline with Ragas
+  5. Task 6: Making Adjustments and Re-Evaluating
+  The notebook Colab link is located [here](https://colab.research.google.com/drive/1-t4POIFJI-SWF1lmoBOPETZZqgWCTV4Y?usp=sharing)
+- 🤝 Breakout Room #2
+  1. Task 1: Building a ReAct Agent with Metal Price Tool
+  2. Task 2: Implementing the Agent Graph Structure
+  3. Task 3: Converting Agent Messages to Ragas Format
+  4. Task 4: Evaluating Agent Performance using Ragas Metrics
+     - Tool Call Accuracy
+     - Agent Goal Accuracy
+     - Topic Adherence
+The notebook Colab link is located [here](https://colab.research.google.com/drive/1KQm7nA_zTaCyjaAeAacjqanMPv03um7T?usp=sharing)
+## Ship 🚢
+The completed notebook!
+<details>
+<summary>🚧 BONUS CHALLENGE 🚧 (OPTIONAL)</summary>
+> NOTE: Completing this challenge will provide full marks on the assignment, regardless of the completion of the notebook. You do not need to complete this in the notebook for full marks.
+##### **MINIMUM REQUIREMENTS**:
+1. Baseline `LangGraph RAG` Application using `NAIVE RETRIEVAL`
+2. Baseline Evaluation using `RAGAS METRICS`
+  - [Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html)
+  - [Answer Relevancy](https://docs.ragas.io/en/stable/concepts/metrics/answer_relevance.html)
+  - [Context Precision](https://docs.ragas.io/en/stable/concepts/metrics/context_precision.html)
+  - [Context Recall](https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html)
+  - [Answer Correctness](https://docs.ragas.io/en/stable/concepts/metrics/answer_correctness.html)
+3. Implement a `SEMANTIC CHUNKING STRATEGY`.
+4. Create a `LangGraph RAG` Application using `SEMANTIC CHUNKING` with `NAIVE RETRIEVAL`.
+5. Compare and contrast results.
+##### **SEMANTIC CHUNKING REQUIREMENTS**:
+Chunk semantically similar sentences (based on a designed similarity threshold), and then paragraphs, greedily, up to a maximum chunk size. The minimum chunk size is a single sentence.
+Have fun!
+</details>
+### Deliverables
+- A short Loom of the notebook, and a 1-minute walkthrough of the application in full
+## Share 🚀
+Make a social media post about your final application!
+### Deliverables
+- Make a post on any social media platform about what you built!
+Here's a template to get you started:
+```
+🚀 Exciting News! 🚀
+I am thrilled to announce that I have just built and shipped Synthetic Data Generation, benchmarking, and iteration with RAGAS & LangChain! 🎉🤖
+🔍 Three Key Takeaways:
+1️⃣
+2️⃣
+3️⃣
+Let's continue pushing the boundaries of what's possible in the world of AI and question-answering. Here's to many more innovations! 🚀
+Shout out to @AIMakerspace !
+#LangChain #QuestionAnswering #RetrievalAugmented #Innovation #AI #TechMilestone
+Feel free to reach out if you're curious or would like to collaborate on similar projects! 🤝🔥
+```
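Review note on the SEMANTIC CHUNKING REQUIREMENTS in the README above: the greedy rule can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the names `semantic_chunks`, `threshold`, and `max_chars` are hypothetical, and only the sentence-level pass is shown (the paragraph-level pass would repeat the same loop over the resulting chunks).

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.2, max_chars: int = 200) -> list[str]:
    """Greedily merge adjacent sentences while they stay similar and fit max_chars."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current:
            candidate = " ".join(current + [sentence])
            similar = cosine(embed(current[-1]), embed(sentence)) >= threshold
            if similar and len(candidate) <= max_chars:
                current.append(sentence)
                continue
            # Similarity dropped or the chunk is full: close the current chunk.
            chunks.append(" ".join(current))
        # A chunk always starts with (and may consist of) a single sentence.
        current = [sentence]
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "A/B tests compare two variants. "
    "A/B tests need a control variant. "
    "Bananas are yellow."
)
chunks = semantic_chunks(sample)
```

Comparing each sentence only with the last sentence of the open chunk keeps the pass linear; comparing against the chunk's mean embedding is an equally reasonable choice.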

notebook_version/pyproject.toml ADDED

	@@ -0,0 +1,16 @@

+[project]
+name = "08-evaluating-rag-with-ragas"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "jupyter>=1.1.1",
+    "langchain-community==0.3.14",
+    "langchain-openai==0.2.14",
+    "langchain-qdrant>=0.2.0",
+    "langgraph==0.2.61",
+    "numpy>=2.2.2",
+    "unstructured>=0.14.8",
+    "arxiv>=1.4.0",
+]

notebook_version/uv.lock ADDED

The diff for this file is too large to render. See raw diff

streamlit_app.py CHANGED

@@ -5,7 +5,7 @@ from pathlib import Path
 from dotenv import load_dotenv
 from langchain_openai.chat_models import ChatOpenAI
 from langchain_openai.embeddings import OpenAIEmbeddings
-from langchain_core.prompts import ChatPromptTemplate
 from qdrant_client import QdrantClient
 from langchain_core.documents import Document
 from langchain.agents import AgentExecutor, create_openai_tools_agent
@@ -37,30 +37,15 @@ PROCESSED_DATA_DIR = Path("processed_data")
 CHUNKS_FILE = PROCESSED_DATA_DIR / "document_chunks.pkl"
 QDRANT_DIR = PROCESSED_DATA_DIR / "qdrant_vectorstore"
-# Define prompts
-INITIAL_RAG_PROMPT = """
 CONTEXT:
 {context}
 QUERY:
 {question}
-You are a helpful assistant with expertise in AB Testing. Use the available context to answer the question. Do not use your own knowledge! If you cannot answer the question based on the context, you must say "I don't know, but I can try a different approach if you'd like."
-"""
-EVALUATE_RESPONSE_PROMPT = """
-QUERY:
-{question}
-RESPONSE:
-{response}
-Evaluate if the response sufficiently answers the query based on the following criteria:
-1. Relevance: Does the response directly address the query topic?
-2. Completeness: Does the response fully answer all aspects of the query?
-3. Accuracy: Is the information provided factually correct and helpful?
-Return only "SUFFICIENT" if the response meets all criteria, or "INSUFFICIENT" if the response needs improvement.
 """
 REPHRASE_QUERY_PROMPT = """
@@ -70,27 +55,20 @@ QUERY:
 You are a helpful assistant. Rephrase the provided query to be more specific and to the point in order to improve retrieval in our RAG pipeline about AB Testing.
 """
-AGENT_PROMPT = """
-You are an expert AB Testing assistant. Your job is to provide helpful, accurate information about AB Testing topics.
-You have access to several tools:
-1. You can search for relevant documents in the database using search_documents - use this for general AB testing questions
-2. You can rephrase a query to get better search results using search_with_rephrased_query - use this when initial searches don't yield good results
-3. You can search ArXiv for academic papers using search_arxiv - use this for:
-   a) Specific academic papers, their authors, or publications
-   b) As a fallback when other tools don't yield satisfactory results
-   c) Technical questions that might be better answered with academic research
-When the user asks about specific papers, authors of papers, or academic publications, you should IMMEDIATELY use the search_arxiv tool rather than the document search tools.
-For general AB testing questions, follow this process:
-1. First try search_documents
-2. If that doesn't provide good information, try search_with_rephrased_query
-3. If still insufficient, try search_arxiv as a final resource before giving up
-Use these tools to provide the best possible answer.
 """
 @st.cache_resource
 def load_document_chunks():
     """Load pre-processed document chunks from disk."""
@@ -99,6 +77,7 @@ def load_document_chunks():
         print(f"Working directory contents: {os.listdir('.')}")
         if os.path.exists(PROCESSED_DATA_DIR):
             print(f"PROCESSED_DATA_DIR contents: {os.listdir(PROCESSED_DATA_DIR)}")
     try:
         with open(CHUNKS_FILE, 'rb') as f:
@@ -132,8 +111,9 @@ def get_chat_model():
                 # Call API directly
                 response = openai_client.chat.completions.create(
-                    model="gpt-3.5-turbo",
-                    messages=openai_messages
                 )
                 # Create response object with content attribute
@@ -148,7 +128,7 @@ def get_chat_model():
         print(f"Error creating OpenAI wrapper: {str(e)}")
         try:
             # Last resort fallback to basic LangChain with minimal config
-            return ChatOpenAI(model="gpt-3.5-turbo")
         except Exception as e2:
             print(f"Fallback also failed: {str(e2)}")
@@ -184,8 +164,9 @@ def get_agent_model():
                 # Call API directly with a more powerful model
                 response = openai_client.chat.completions.create(
-                    model="gpt-4",
-                    messages=openai_messages
                 )
                 class SimpleResponse:
@@ -199,12 +180,12 @@ def get_agent_model():
         print(f"Error creating agent model: {str(e)}")
         try:
             # Fallback
-            return ChatOpenAI(model="gpt-4")
         except Exception as e2:
             print(f"Agent model fallback also failed: {str(e2)}")
             # Final fallback to gpt-3.5-turbo
             try:
-                return ChatOpenAI(model="gpt-3.5-turbo")
             except:
                 # Create dummy that returns a fixed response
                 class DummyModel:
@@ -254,7 +235,7 @@ def get_embedding_model():
         print(f"Error initializing embedding model: {str(e)}")
         # Last resort fallback
         try:
-            return OpenAIEmbeddings()
         except Exception as e2:
             print(f"Embedding fallback also failed: {str(e2)}")
@@ -311,157 +292,247 @@ def setup_qdrant_client():
             print(f"Alternative initialization failed: {str(e2)}")
             raise
-def retrieve_documents(query, k=5):
-    """Retrieve relevant documents for a query."""
-    collection_name = "kohavi_ab_testing_pdf_collection"
-    print(f"Searching for documents matching: '{query}'")
     try:
-        # Get models and data
         embedding_model = get_embedding_model()
-        chunks = load_document_chunks()
-        # No chunks found? Return empty results
-        if not chunks:
-            print("No document chunks loaded, cannot perform search")
-            return [], []
-        client = setup_qdrant_client()
-        # Create a mapping of IDs to documents
         docs_by_id = {i: doc for i, doc in enumerate(chunks)}
-        # Get query embedding
-        query_embedding = embedding_model.embed_query(query)
-        # Try to search
-        try:
-            # First try search method
-            try:
-                results = client.search(
-                    collection_name=collection_name,
-                    query_vector=query_embedding,
-                    limit=k
-                )
-                print(f"Found {len(results)} results using search method")
-            except Exception as e1:
-                print(f"Search failed: {str(e1)}")
-                # Try query_points method
-                try:
-                    results = client.query_points(
-                        collection_name=collection_name,
-                        query_vector=query_embedding,
-                        limit=k
-                    )
-                    print(f"Found {len(results)} results using query_points method")
-                except Exception as e2:
-                    print(f"query_points method failed: {str(e2)}")
-                    return [], []
-            # No results? Return empty
-            if not results or len(results) == 0:
-                print("No search results found")
-                return [], []
-            # Process results
-            documents = []
-            sources_dict = {}
-            for result in results:
-                doc_id = result.id
-                if doc_id in docs_by_id:
-                    doc = docs_by_id[doc_id]
-                    documents.append(doc)
-                    # Extract source information
-                    source_path = doc.metadata.get("source", "")
-                    filename = source_path.split("/")[-1] if "/" in source_path else source_path
-                    # Remove .pdf extension if present
-                    if filename.lower().endswith('.pdf'):
-                        filename = filename[:-4]
-                    # Default to the full filename if we can't extract a title
-                    if not filename:
-                        filename = "Unknown Source"
-                    # Get page number, use a default if not available
-                    page = doc.metadata.get("page", "unknown")
-                    # All PDF sources in data directory are by Ron Kohavi, so add his name as prefix
-                    title = f"Ron Kohavi: {filename}"
-                    # Create a unique key for this source based on filename and page
-                    source_key = f"{filename}_{page}"
-                    # Only add to sources if we haven't seen this exact source before
-                    if source_key not in sources_dict:
-                        sources_dict[source_key] = {
-                            "title": title,
-                            "page": page,
-                            "score": float(result.score) if hasattr(result, "score") else 1.0,
-                            "type": "pdf"
-                        }
-                        print(f"Added source: {title}, Page: {page}")
-            # Convert the dictionary of unique sources to a list
-            sources = list(sources_dict.values())
-            print(f"Returning {len(documents)} documents with {len(sources)} sources")
-            return documents, sources
-        except Exception as e:
-            print(f"Error during vector search: {str(e)}")
-            return [], []
     except Exception as e:
         print(f"Error in document retrieval: {str(e)}")
-        import traceback
-        traceback.print_exc()
-        return [], []
-def rephrase_query(query):
-    """Rephrase the query to improve retrieval."""
-    chat_model = get_chat_model()
-    prompt = ChatPromptTemplate.from_template(REPHRASE_QUERY_PROMPT)
-    messages = prompt.format_messages(question=query)
-    response = chat_model.invoke(messages)
-    return response.content
-def generate_answer(context, question):
-    """Generate an answer using the context and question."""
     chat_model = get_chat_model()
-    prompt = ChatPromptTemplate.from_template(INITIAL_RAG_PROMPT)
-    messages = prompt.format_messages(context=context, question=question)
-    response = chat_model.invoke(messages)
-    return response.content
-def evaluate_response(question, response):
-    """Evaluate if the response is sufficient."""
-    # Use the LLM evaluation
     agent_model = get_agent_model()
-    prompt = ChatPromptTemplate.from_template(EVALUATE_RESPONSE_PROMPT)
-    messages = prompt.format_messages(question=question, response=response)
-    result = agent_model.invoke(messages)
-    return "SUFFICIENT" in result.content
 @tool
-def search_documents(query: str) -> str:
-    """Search for relevant documents in the AB Testing database."""
-    documents, _ = retrieve_documents(query)
-    if not documents:
-        return "No relevant documents found"
-    return "\n\n".join([doc.page_content for doc in documents])
 @tool
-def search_with_rephrased_query(query: str) -> str:
-    """Rephrase the query and then search for relevant documents."""
-    rephrased = rephrase_query(query)
-    documents, _ = retrieve_documents(rephrased)
-    if not documents:
-        return "No relevant documents found even with rephrased query"
-    return "\n\n".join([doc.page_content for doc in documents])
 @tool
 def search_arxiv(query: str) -> str:
@@ -548,48 +619,49 @@ def search_arxiv(query: str) -> str:
 def setup_agent():
     """Set up the agent with tools."""
     agent_model = get_agent_model()
-    tools = [search_documents, search_with_rephrased_query, search_arxiv]
-    prompt = ChatPromptTemplate.from_messages([
-        ("system", AGENT_PROMPT),
-        ("human", "{input}"),
-        ("ai", "{agent_scratchpad}")
-    ])
-    # Create the agent with better error tolerance
     try:
-        agent = create_openai_tools_agent(agent_model, tools, prompt)
         executor = AgentExecutor(
-            agent=agent,
-            tools=tools,
             verbose=True,
-            handle_parsing_errors=True,
-            max_iterations=5  # Limit iterations to prevent infinite loops
         )
-        return executor
     except Exception as e:
-        print(f"Error setting up agent: {str(e)}")
         import traceback
         traceback.print_exc()
-        # Create a simplified executor that just uses direct calls
-        class SimpleExecutor:
-            def invoke(self, inputs):
-                try:
-                    # Try to use the search documents tool directly
-                    result = search_documents(inputs["input"])
-                    if "No relevant documents found" in result:
-                        result = search_with_rephrased_query(inputs["input"])
-                    if "No relevant documents found" in result:
-                        # Try arxiv as a last resort
-                        result = search_arxiv(inputs["input"])
-                    return {"output": result}
-                except Exception as ex:
-                    print(f"Error in simple executor: {str(ex)}")
-                    return {"output": "I apologize, but I'm having trouble processing your request. Please try a different question."}
-        return SimpleExecutor()
 # Streamlit UI
 st.set_page_config(
@@ -643,66 +715,35 @@ if query:
         message_placeholder = st.empty()
         with st.status("Processing your query...", expanded=True) as status:
             print("Starting RAG process for query:", query)
-            # Try initial RAG approach first
-            st.write("Searching for relevant documents...")
-            documents, sources = retrieve_documents(query)
-            # Log search results
-            print(f"Initial search returned {len(documents)} documents")
-            # If no documents found, try rephrasing right away
-            if not documents:
-                st.write("No relevant documents found, trying with rephrased query...")
-                print("No documents found, trying rephrased query")
-                rephrased_query = rephrase_query(query)
-                st.write(f"Rephrased query: {rephrased_query}")
-                documents, sources = retrieve_documents(rephrased_query)
-                print(f"Rephrased search returned {len(documents)} documents")
-            # Format documents into a string for context
-            context = "\n\n".join([doc.page_content for doc in documents])
-            # If we have context, try the initial RAG approach
-            if context:
-                st.write(f"Found {len(documents)} relevant documents")
-                print("Generating answer from retrieved documents")
-                initial_answer = generate_answer(context, query)
-                # Evaluate if the initial answer is sufficient
-                st.write("Evaluating answer quality...")
-                is_sufficient = evaluate_response(query, initial_answer)
-                print(f"Answer evaluation: sufficient={is_sufficient}")
-                if is_sufficient:
-                    # If the initial answer is good, use it
-                    answer = initial_answer
-                else:
-                    # If not sufficient, use the agent with tools
-                    st.write("Initial answer needs improvement, enhancing with more tools...")
-                    print("Using agent to enhance answer")
-                    agent = setup_agent()
-                    agent_response = agent.invoke({"input": query})
-                    answer = agent_response["output"]
-                    # If the agent used ArXiv and found sources, use those
-                    if ARXIV_SOURCES:
-                        print(f"Agent found {len(ARXIV_SOURCES)} ArXiv sources")
-                        # Only replace sources if ArXiv found something
-                        if ARXIV_SOURCES:
-                            sources = ARXIV_SOURCES
             else:
-                # If no context at all, use the agent as a last resort
-                st.write("No relevant documents found in our database, trying other resources...")
-                print("No context found, using agent as fallback")
-                agent = setup_agent()
-                agent_response = agent.invoke({"input": query})
-                answer = agent_response["output"]
-                # Check if ArXiv sources are available
-                if ARXIV_SOURCES:
-                    sources = ARXIV_SOURCES
             status.update(label="Completed!", state="complete", expanded=False)

 from dotenv import load_dotenv
 from langchain_openai.chat_models import ChatOpenAI
 from langchain_openai.embeddings import OpenAIEmbeddings
+from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
 from qdrant_client import QdrantClient
 from langchain_core.documents import Document
 from langchain.agents import AgentExecutor, create_openai_tools_agent
 CHUNKS_FILE = PROCESSED_DATA_DIR / "document_chunks.pkl"
 QDRANT_DIR = PROCESSED_DATA_DIR / "qdrant_vectorstore"
+# Define prompts exactly as in the notebook
+RAG_PROMPT = """
 CONTEXT:
 {context}
 QUERY:
 {question}
+You are a helpful assistant. Use the available context to answer the question. Do not use your own knowledge! If you cannot answer the question based on the context, you must say "I don't know".
 """
 REPHRASE_QUERY_PROMPT = """
 You are a helpful assistant. Rephrase the provided query to be more specific and to the point in order to improve retrieval in our RAG pipeline about AB Testing.
 """
+EVALUATE_RESPONSE_PROMPT = """
+Given an initial query, determine if the initial query is related to AB Testing (even vaguely e.g. statistics, A/B testing, etc.) or not. If not related to AB Testing, return 'Y'. If related to AB Testing, then given the initial query and a final response, determine if the final response is extremely helpful or not. If extremely helpful, return 'Y'. If not extremely helpful, return 'N'.
+Initial Query:
+{initial_query}
+Final Response:
+{final_response}
 """
+rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
+rephrase_query_prompt = ChatPromptTemplate.from_template(REPHRASE_QUERY_PROMPT)
+evaluate_prompt = PromptTemplate.from_template(EVALUATE_RESPONSE_PROMPT)
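Review note: the three templates above rely on `{name}` placeholders, which LangChain's prompt classes fill in f-string style by default. The sketch below uses only the standard library to show what that substitution amounts to and how a missing variable can be caught early. `template_variables` and `render` are hypothetical helpers for illustration, not part of this commit or of LangChain.

```python
from string import Formatter

# Copied from the RAG_PROMPT added in this commit.
RAG_PROMPT = """
CONTEXT:
{context}
QUERY:
{question}
You are a helpful assistant. Use the available context to answer the question. Do not use your own knowledge! If you cannot answer the question based on the context, you must say "I don't know".
"""


def template_variables(template: str) -> set[str]:
    """Return the placeholder names that appear in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, **values: str) -> str:
    """Fill the template, failing loudly if a placeholder has no value."""
    missing = template_variables(template) - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template.format(**values)


rendered = render(
    RAG_PROMPT,
    context="Ship small changes.",
    question="What should I ship?",
)
```

Checking the variable set up front makes a mismatch such as passing `question` to a template that expects `initial_query` fail before any model call is made.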
 @st.cache_resource
 def load_document_chunks():
     """Load pre-processed document chunks from disk."""
         print(f"Working directory contents: {os.listdir('.')}")
         if os.path.exists(PROCESSED_DATA_DIR):
             print(f"PROCESSED_DATA_DIR contents: {os.listdir(PROCESSED_DATA_DIR)}")
+        return []
     try:
         with open(CHUNKS_FILE, 'rb') as f:
                 # Call API directly
                 response = openai_client.chat.completions.create(
+                    model="gpt-4.1-mini",
+                    messages=openai_messages,
+                    temperature=0
                 )
                 # Create response object with content attribute
         print(f"Error creating OpenAI wrapper: {str(e)}")
         try:
             # Last resort fallback to basic LangChain with minimal config
+            return ChatOpenAI(model="gpt-4.1-mini", temperature=0)
         except Exception as e2:
             print(f"Fallback also failed: {str(e2)}")
                 # Call API directly with a more powerful model
                 response = openai_client.chat.completions.create(
+                    model="gpt-4.1",
+                    messages=openai_messages,
+                    temperature=0
                 )
                 class SimpleResponse:
         print(f"Error creating agent model: {str(e)}")
         try:
             # Fallback
+            return ChatOpenAI(model="gpt-4.1", temperature=0)
         except Exception as e2:
             print(f"Agent model fallback also failed: {str(e2)}")
             # Final fallback to gpt-3.5-turbo
             try:
+                return ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
             except:
                 # Create dummy that returns a fixed response
                 class DummyModel:
         print(f"Error initializing embedding model: {str(e)}")
         # Last resort fallback
         try:
+            return OpenAIEmbeddings(model="text-embedding-3-small")
         except Exception as e2:
             print(f"Embedding fallback also failed: {str(e2)}")
             print(f"Alternative initialization failed: {str(e2)}")
             raise
+def rag_chain_node(query):
+    """
+    Implements the equivalent of the rag_chain_node from the notebook.
+    This function retrieves documents, extracts sources, and generates an answer.
+    """
+    print(f"rag_chain_node: Processing query '{query}'")
+    # 1. Retrieve documents once
     try:
+        print("Setting up retriever...")
+        client = setup_qdrant_client()
+        collection_name = "kohavi_ab_testing_pdf_collection"
+        # Get embedding for the query
         embedding_model = get_embedding_model()
+        query_embedding = embedding_model.embed_query(query)
+        # Get documents
+        print("Retrieving documents...")
+        chunks = load_document_chunks()
+        # Map of document IDs to actual documents
         docs_by_id = {i: doc for i, doc in enumerate(chunks)}
+        # Search for relevant documents
+        search_results = client.search(
+            collection_name=collection_name,
+            query_vector=query_embedding,
+            limit=5
+        )
+        # Convert search results to documents
+        docs = []
+        for result in search_results:
+            doc_id = result.id
+            if doc_id in docs_by_id:
+                docs.append(docs_by_id[doc_id])
     except Exception as e:
         print(f"Error in document retrieval: {str(e)}")
+        return "I'm having trouble retrieving relevant information. Please try again later.", []
+    # 2. Extract sources from the documents
+    sources = []
+    for doc in docs:
+        source_path = doc.metadata.get("source", "")
+        filename = source_path.split("/")[-1] if "/" in source_path else source_path
+        # Remove .pdf extension if present
+        if filename.lower().endswith('.pdf'):
+            filename = filename[:-4]
+        sources.append({
+            "title": f"Ron Kohavi: {filename}",
+            "page": doc.metadata.get("page", "unknown"),
+            "type": "pdf"
+        })
+    # 3. Use the RAG chain to generate an answer
+    if not docs:
+        print("No documents found")
+        return "I don't have enough information to answer that question.", []
+    # Create context from documents
+    context = "\n\n".join([doc.page_content for doc in docs])
+    # Format the prompt with context and query
+    formatted_prompt = rag_prompt.format(context=context, question=query)
+    # Send to the model and parse the output
+    print("Generating answer...")
     chat_model = get_chat_model()
+    response = chat_model.invoke(formatted_prompt)
+    response_text = response.content
+    return response_text, sources
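Review note: the source-extraction loop in `rag_chain_node` appears again, unchanged, in both retrieval tools, and unlike the removed `retrieve_documents` it no longer de-duplicates sources by filename and page. One way to consolidate it is a shared helper, sketched below under assumptions: `extract_sources` is a hypothetical name, and `SimpleDoc` is a stand-in for a LangChain `Document` that only provides the `.metadata` attribute the loop reads.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain Document; only .metadata is used."""
    page_content: str = ""
    metadata: dict = field(default_factory=dict)


def extract_sources(docs) -> list[dict]:
    """Build the source list once, skipping repeated (filename, page) pairs."""
    sources = []
    seen = set()
    for doc in docs:
        source_path = doc.metadata.get("source", "")
        # split("/")[-1] returns the whole string when there is no slash.
        filename = source_path.split("/")[-1]
        if filename.lower().endswith(".pdf"):
            filename = filename[:-4]
        page = doc.metadata.get("page", "unknown")
        key = (filename, page)
        if key in seen:
            continue
        seen.add(key)
        sources.append({
            "title": f"Ron Kohavi: {filename}",
            "page": page,
            "type": "pdf",
        })
    return sources


example_docs = [
    SimpleDoc(metadata={"source": "data/pitfalls.pdf", "page": 3}),
    SimpleDoc(metadata={"source": "data/pitfalls.pdf", "page": 3}),
    SimpleDoc(metadata={"source": "guide.PDF"}),
]
example_sources = extract_sources(example_docs)
```

Each call site could then reduce to `sources = extract_sources(docs)`, so a later change to the title format would only need to happen in one place.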
+def evaluate_response(query, response):
+    """
+    Determines if the initial RAG response was sufficient using the original evaluation logic.
+    Returns True if the response is sufficient, False otherwise.
+    """
+    print(f"Evaluating response for '{query}'")
     agent_model = get_agent_model()
+    formatted_prompt = evaluate_prompt.format(
+        initial_query=query,
+        final_response=response
+    )
+    helpfulness_chain = agent_model
+    messages = [HumanMessage(content=formatted_prompt)]
+    helpfulness_response = helpfulness_chain.invoke(messages)
+    # Check if 'Y' is in the response
+    if "Y" in helpfulness_response.content:
+        print("Evaluation: Initial response is sufficient")
+        return True
+    else:
+        print("Evaluation: Initial response is NOT sufficient, need to use agent")
+        return False
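Review note: `"Y" in helpfulness_response.content` is a substring test, so any reply that contains a capital Y counts as sufficient, including a reply such as `N. You could add more detail.` A stricter check might look only at the first token of the reply. The sketch below is one possible approach; `is_sufficient` is a hypothetical helper, and accepting `YES` as well as `Y` is an assumption about how the model may phrase its verdict.

```python
def is_sufficient(verdict_text: str) -> bool:
    """Return True only when the reply's first token is an affirmative verdict."""
    tokens = verdict_text.strip().split()
    if not tokens:
        return False
    # Drop quotes and punctuation the model may wrap around the verdict.
    first = tokens[0].strip("'\"`.,:;!*()[]").upper()
    return first in {"Y", "YES"}


ambiguous_reply = "N. You could add more detail."
substring_check = "Y" in ambiguous_reply      # what the committed check computes
first_token_check = is_sufficient(ambiguous_reply)
```

The trade-off is that a verdict buried later in a long reply is treated as insufficient, which sends the query to the agent rather than returning a possibly weak answer.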
 @tool
+def retrieve_information(query: str) -> str:
+    """Use Retrieval Augmented Generation to retrieve information about AB Testing."""
+    # 1. Retrieve documents
+    client = setup_qdrant_client()
+    collection_name = "kohavi_ab_testing_pdf_collection"
+    # Get embedding for the query
+    embedding_model = get_embedding_model()
+    query_embedding = embedding_model.embed_query(query)
+    # Get documents
+    chunks = load_document_chunks()
+    # Map of document IDs to actual documents
+    docs_by_id = {i: doc for i, doc in enumerate(chunks)}
+    # Search for relevant documents
+    try:
+        search_results = client.search(
+            collection_name=collection_name,
+            query_vector=query_embedding,
+            limit=5
+        )
+    except Exception as e:
+        print(f"Error in search: {str(e)}")
+        try:
+            search_results = client.query_points(
+                collection_name=collection_name,
+                query_vector=query_embedding,
+                limit=5
+            )
+        except Exception as e2:
+            print(f"Error in query_points: {str(e2)}")
+            return "Error retrieving documents."
+    # Convert search results to documents
+    docs = []
+    for result in search_results:
+        doc_id = result.id
+        if doc_id in docs_by_id:
+            docs.append(docs_by_id[doc_id])
+    # 2. Extract and store sources
+    sources = []
+    for doc in docs:
+        source_path = doc.metadata.get("source", "")
+        filename = source_path.split("/")[-1] if "/" in source_path else source_path
+        # Remove .pdf extension if present
+        if filename.lower().endswith('.pdf'):
+            filename = filename[:-4]
+        sources.append({
+            "title": f"Ron Kohavi: {filename}",
+            "page": doc.metadata.get("page", "unknown"),
+            "type": "pdf"
+        })
+    # Store sources for later access
+    retrieve_information.last_sources = sources
+    # 3. Return just the formatted document contents
+    formatted_content = "\n\n".join([f"Retrieved Information: {i+1}\n{doc.page_content}"
+                                  for i, doc in enumerate(docs)])
+    return formatted_content
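Review note on the `search` / `query_points` fallback above: as far as I know, recent `qdrant-client` releases expect the vector as `query=` in `query_points` (not `query_vector=`) and return a response object whose hits live in `.points`, rather than a plain list. If that holds for the installed version, the fallback branch would need both changes before the conversion loop could iterate its result. The sketch below shows a version-tolerant conversion step using only the standard library; `hits_from` and `docs_for_hits` are hypothetical helpers, and the `SimpleNamespace` objects merely imitate the shape of Qdrant results.

```python
from types import SimpleNamespace


def hits_from(result):
    """Accept either a list of hits or a response object exposing .points."""
    return list(getattr(result, "points", result))


def docs_for_hits(result, docs_by_id):
    """Map hit IDs back to loaded chunks, skipping IDs that are not loaded."""
    return [docs_by_id[hit.id] for hit in hits_from(result) if hit.id in docs_by_id]


# Imitation results: a plain list (search-style) and a wrapper (query_points-style).
fake_hits = [SimpleNamespace(id=0), SimpleNamespace(id=7), SimpleNamespace(id=2)]
fake_chunks = {0: "chunk a", 1: "chunk b", 2: "chunk c"}

from_list = docs_for_hits(fake_hits, fake_chunks)
from_response = docs_for_hits(SimpleNamespace(points=fake_hits), fake_chunks)
```

Normalizing the result shape in one helper keeps the rest of the tool identical whichever client method succeeded.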
 @tool
+def retrieve_information_with_rephrased_query(query: str) -> str:
+    """This tool will intelligently rephrase your AB testing query and then will use Retrieval Augmented Generation to retrieve information about the rephrased query."""
+    # 1. Rephrase the query first
+    chat_model = get_chat_model()
+    rephrased_query_msg = rephrase_query_prompt.format(question=query)
+    rephrased_query_response = chat_model.invoke(rephrased_query_msg)
+    rephrased_query = rephrased_query_response.content
+    # 2. Retrieve documents using the rephrased query
+    client = setup_qdrant_client()
+    collection_name = "kohavi_ab_testing_pdf_collection"
+    # Get embedding for the query
+    embedding_model = get_embedding_model()
+    query_embedding = embedding_model.embed_query(rephrased_query)
+    # Get documents
+    chunks = load_document_chunks()
+    # Map of document IDs to actual documents
+    docs_by_id = {i: doc for i, doc in enumerate(chunks)}
+    # Search for relevant documents
+    try:
+        search_results = client.search(
+            collection_name=collection_name,
+            query_vector=query_embedding,
+            limit=5
+        )
+    except Exception as e:
+        print(f"Error in search: {str(e)}")
+        try:
+            # query_points takes `query=` and returns a response whose hits are in `.points`
+            search_results = client.query_points(
+                collection_name=collection_name,
+                query=query_embedding,
+                limit=5
+            ).points
+        except Exception as e2:
+            print(f"Error in query_points: {str(e2)}")
+            return f"Error retrieving documents with rephrased query: {rephrased_query}"
+    # Convert search results to documents
+    docs = []
+    for result in search_results:
+        doc_id = result.id
+        if doc_id in docs_by_id:
+            docs.append(docs_by_id[doc_id])
+    # 3. Extract and store sources
+    sources = []
+    for doc in docs:
+        source_path = doc.metadata.get("source", "")
+        filename = source_path.split("/")[-1]
+        # Remove .pdf extension if present
+        if filename.lower().endswith('.pdf'):
+            filename = filename[:-4]
+        sources.append({
+            "title": f"Ron Kohavi: {filename}",
+            "page": doc.metadata.get("page", "unknown"),
+            "type": "pdf"
+        })
+    # Store sources for later access
+    retrieve_information_with_rephrased_query.last_sources = sources
+    # 4. Return formatted content with rephrased query
+    formatted_content = f"Rephrased query: {rephrased_query}\n\n" + "\n\n".join(
+        [f"Retrieved Information: {i+1}\n{doc.page_content}" for i, doc in enumerate(docs)]
+    )
+    return formatted_content
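Both retrieval tools above build the source list and the formatted output with identical code. A minimal sketch of how that shared logic could be factored out follows. The names `build_sources`, `format_docs`, and `FakeDoc` are hypothetical and not part of this commit; `FakeDoc` only stands in for a document object exposing `page_content` and `metadata`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a document with page_content and metadata."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_sources(docs, author="Ron Kohavi"):
    """Build the source dicts that both tools currently assemble by hand."""
    sources = []
    for doc in docs:
        # Keep only the file name, then drop a trailing ".pdf" (case-insensitive).
        filename = doc.metadata.get("source", "").split("/")[-1]
        if filename.lower().endswith(".pdf"):
            filename = filename[:-4]
        sources.append({
            "title": f"{author}: {filename}",
            "page": doc.metadata.get("page", "unknown"),
            "type": "pdf",
        })
    return sources


def format_docs(docs):
    """Join document contents with the same numbering scheme the tools use."""
    return "\n\n".join(
        f"Retrieved Information: {i + 1}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    )
```

With helpers like these, each tool body would reduce to the search call followed by `build_sources(docs)` and `format_docs(docs)`, so a fix to the filename handling only needs to be made once.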
 @tool
 def search_arxiv(query: str) -> str:
 def setup_agent():
     """Set up the agent with tools."""
     agent_model = get_agent_model()
+    tools = [retrieve_information, retrieve_information_with_rephrased_query, search_arxiv]
+    try:
+        return create_openai_tools_agent(
+            llm=agent_model,
+            tools=tools,
+            prompt=ChatPromptTemplate.from_messages([
+                ("system", "You are an expert AB Testing assistant. Your job is to provide helpful, accurate information about AB Testing topics."),
+                ("human", "{input}"),
+                ("placeholder", "{agent_scratchpad}")
+            ])
+        )
+    except Exception as e:
+        print(f"Error creating agent: {str(e)}")
+        return None
+def execute_agent(agent, query):
+    """Execute the agent with the given query."""
     try:
         executor = AgentExecutor(
+            agent=agent,
+            tools=[retrieve_information, retrieve_information_with_rephrased_query, search_arxiv],
             verbose=True,
+            handle_parsing_errors=True
         )
+        response = executor.invoke({"input": query})
+        # Extract sources based on used tools
+        # Fall through to the next tool when a tool recorded no sources
+        sources = (
+            getattr(retrieve_information, "last_sources", None)
+            or getattr(retrieve_information_with_rephrased_query, "last_sources", None)
+            or ARXIV_SOURCES
+            or []
+        )
+        return response["output"], sources
     except Exception as e:
+        print(f"Error executing agent: {str(e)}")
         import traceback
         traceback.print_exc()
+        return "I'm having trouble processing your request. Please try again.", []
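Because `execute_agent` reads sources from attributes that persist between calls, sources recorded during an earlier query can leak into a later one. One possible way around this, sketched here under the assumption that each tool reports its sources to a shared object, is a small per-request collector. `SourceCollector` is a hypothetical name and is not part of this commit.

```python
class SourceCollector:
    """Hypothetical per-request store for sources reported by tools."""

    def __init__(self):
        self._sources = []

    def reset(self):
        """Forget everything recorded for the previous query."""
        self._sources = []

    def add(self, sources):
        """Record sources in arrival order, skipping exact duplicates."""
        for source in sources:
            if source not in self._sources:
                self._sources.append(source)

    def collected(self):
        """Return a copy so callers cannot mutate the internal list."""
        return list(self._sources)


# Usage sketch: reset before running the agent, read back afterwards.
collector = SourceCollector()
collector.reset()
collector.add([{"title": "Ron Kohavi: Trust", "page": 3, "type": "pdf"}])
collector.add([{"title": "Ron Kohavi: Trust", "page": 3, "type": "pdf"}])
sources = collector.collected()
```

In this arrangement `execute_agent` would call `reset()` right before `executor.invoke(...)` and `collected()` right after, so the returned list reflects only the tools that ran for the current query.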
 # Streamlit UI
 st.set_page_config(
         message_placeholder = st.empty()
         with st.status("Processing your query...", expanded=True) as status:
+            # Follow the exact flow from the notebook
+            st.write("Starting with Initial RAG...")
             print("Starting RAG process for query:", query)
+            # Step 1: Initial RAG
+            initial_response, sources = rag_chain_node(query)
+            # Step 2: Evaluate response
+            is_sufficient = evaluate_response(query, initial_response)
+            # Step 3: Either end with the initial response or use the agent
+            if is_sufficient:
+                answer = initial_response
+                st.write("Initial response is sufficient")
             else:
+                st.write("Initial response needs improvement, using specialized agent...")
+                print("Using agent for enhanced answer")
+                # Create and execute the agent
+                agent = setup_agent()
+                if agent:
+                    agent_response, agent_sources = execute_agent(agent, query)
+                    answer = agent_response
+                    # If the agent found sources, use those instead
+                    if agent_sources:
+                        sources = agent_sources
+                else:
+                    answer = "I'm having trouble setting up the specialized agent. Here's what I found initially: " + initial_response
             status.update(label="Completed!", state="complete", expanded=False)