kamkol committed on
Commit
4ddf811
·
1 Parent(s): 23cc167

Improve agent logic with improved templates

notebook_version/AB_Testing_RAG_Agent.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebook_version/README.md ADDED
@@ -0,0 +1,94 @@
1
+
2
+ <p align = "center" draggable="false" ><img src="https://github.com/AI-Maker-Space/LLM-Dev-101/assets/37101144/d1343317-fa2f-41e1-8af1-1dbb18399719"
3
+ width="200px"
4
+ height="auto"/>
5
+ </p>
6
+
7
+ ## <h1 align="center" id="heading">Session 8: Evaluating RAG with Ragas</h1>
8
+
9
+ | 🤓 Pre-work | 📰 Session Sheet | ⏺️ Recording | 🖼️ Slides | 👨‍💻 Repo | 📝 Homework | 📁 Feedback |
10
+ |:-----------------|:-----------------|:-----------------|:-----------------|:-----------------|:-----------------|:-----------------|
11
+ | [Session 8: Pre-Work](https://www.notion.so/Session-8-RAG-Evaluation-and-Assessment-1c8cd547af3d81d08f7cf5521d0253bb?pvs=4#1c8cd547af3d816583d6c23183b6f87f) | [Session 8: RAG Evaluation and Assessment](https://www.notion.so/Session-8-RAG-Evaluation-and-Assessment-1c8cd547af3d81d08f7cf5521d0253bb) | Coming soon! | [Session 8 Slides](https://www.canva.com/design/DAGjadKGqcw/0Gff9K2EwbOb3lX14un3uw/edit?utm_content=DAGjadKGqcw&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton) | You are here! | [Session 8: RAG Evaluation and Assessment](https://forms.gle/ujAQLqx2ZHMWTUH79) | [AIE6 Feedback 4/24](https://forms.gle/wA7p89e6svCgjtr58) |
12
+
13
+ In today's assignment, we'll create synthetic data and use it to benchmark (and improve) an LCEL RAG chain.
14
+
15
+ - 🤝 Breakout Room #1
16
+ 1. Task 1: Installing Required Libraries
17
+ 2. Task 2: Set Environment Variables
18
+ 3. Task 3: Synthetic Dataset Generation for Evaluation using Ragas
19
+ 4. Task 4: Evaluating our Pipeline with Ragas
20
+ 5. Task 6: Making Adjustments and Re-Evaluating
21
+
22
+ The notebook Colab link is located [here](https://colab.research.google.com/drive/1-t4POIFJI-SWF1lmoBOPETZZqgWCTV4Y?usp=sharing)
23
+
24
+ - 🤝 Breakout Room #2
25
+ 1. Task 1: Building a ReAct Agent with Metal Price Tool
26
+ 2. Task 2: Implementing the Agent Graph Structure
27
+ 3. Task 3: Converting Agent Messages to Ragas Format
28
+ 4. Task 4: Evaluating Agent Performance using Ragas Metrics
29
+ - Tool Call Accuracy
30
+ - Agent Goal Accuracy
31
+ - Topic Adherence
32
+
33
+ The notebook Colab link is located [here](https://colab.research.google.com/drive/1KQm7nA_zTaCyjaAeAacjqanMPv03um7T?usp=sharing)
34
+
35
+ ## Ship 🚢
36
+
37
+ The completed notebook!
38
+
39
+ <details>
40
+ <summary>🚧 BONUS CHALLENGE 🚧 (OPTIONAL)</summary>
41
+
42
+ > NOTE: Completing this challenge will provide full marks on the assignment, regardless of the completion of the notebook. You do not need to complete this in the notebook for full marks.
43
+
44
+ ##### **MINIMUM REQUIREMENTS**:
45
+
46
+ 1. Baseline `LangGraph RAG` Application using `NAIVE RETRIEVAL`
47
+ 2. Baseline Evaluation using `RAGAS METRICS`
48
+ - [Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html)
49
+ - [Answer Relevancy](https://docs.ragas.io/en/stable/concepts/metrics/answer_relevance.html)
50
+ - [Context Precision](https://docs.ragas.io/en/stable/concepts/metrics/context_precision.html)
51
+ - [Context Recall](https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html)
52
+ - [Answer Correctness](https://docs.ragas.io/en/stable/concepts/metrics/answer_correctness.html)
53
+ 3. Implement a `SEMANTIC CHUNKING STRATEGY`.
54
+ 4. Create a `LangGraph RAG` Application using `SEMANTIC CHUNKING` with `NAIVE RETRIEVAL`.
55
+ 5. Compare and contrast results.
56
+
57
+ ##### **SEMANTIC CHUNKING REQUIREMENTS**:
58
+
59
+ Chunk semantically similar sentences (based on a designed similarity threshold), and then paragraphs, greedily, up to a maximum chunk size. The minimum chunk size is a single sentence.
60
+
61
+ Have fun!
62
+ </details>
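The semantic chunking requirement in the bonus challenge can be sketched roughly as follows. This is a minimal illustration covering only the sentence-level pass, not a reference solution: `semantic_chunks`, `split_sentences`, `toy_embed`, the `threshold` default, and the `max_chars` limit are all hypothetical names and values chosen for the example. In practice `embed` would wrap a real embedding model, and a second pass would merge similar paragraphs in the same greedy way.

```python
import math
import re
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def semantic_chunks(
    text: str,
    embed: Callable[[str], Vector],
    threshold: float = 0.8,   # assumed similarity cut-off
    max_chars: int = 1000,    # assumed maximum chunk size, in characters
) -> List[str]:
    """Greedily merge adjacent sentences while they stay similar and the chunk fits."""
    sentences = split_sentences(text)
    if not sentences:
        return []

    chunks: List[str] = []
    current = [sentences[0]]
    prev_vec = embed(sentences[0])

    for sentence in sentences[1:]:
        vec = embed(sentence)
        candidate_len = len(" ".join(current)) + 1 + len(sentence)
        if cosine(prev_vec, vec) >= threshold and candidate_len <= max_chars:
            current.append(sentence)          # similar enough and still fits
        else:
            chunks.append(" ".join(current))  # close the chunk, start a new one
            current = [sentence]
        prev_vec = vec

    chunks.append(" ".join(current))
    return chunks


def toy_embed(sentence: str) -> List[float]:
    """Stand-in embedder for the demo: counts of two keywords."""
    lowered = sentence.lower()
    return [float(lowered.count("test")), float(lowered.count("cat"))]


if __name__ == "__main__":
    demo = "A/B tests compare variants. Each test needs a metric. Cats sleep a lot."
    for chunk in semantic_chunks(demo, toy_embed):
        print(repr(chunk))
```

Because a chunk always starts with one sentence, an over-long sentence is still emitted on its own, which matches the stated minimum chunk size of a single sentence.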
63
+
64
+ ### Deliverables
65
+
66
+ - A short Loom recording of the notebook, plus a 1-minute walkthrough of the full application
67
+
68
+ ## Share 🚀
69
+
70
+ Make a social media post about your final application!
71
+
72
+ ### Deliverables
73
+
74
+ - Make a post on any social media platform about what you built!
75
+
76
+ Here's a template to get you started:
77
+
78
+ ```
79
+ 🚀 Exciting News! 🚀
80
+
81
+ I am thrilled to announce that I have just built and shipped Synthetic Data Generation, benchmarking, and iteration with RAGAS & LangChain! 🎉🤖
82
+
83
+ 🔍 Three Key Takeaways:
84
+ 1️⃣
85
+ 2️⃣
86
+ 3️⃣
87
+
88
+ Let's continue pushing the boundaries of what's possible in the world of AI and question-answering. Here's to many more innovations! 🚀
89
+ Shout out to @AIMakerspace !
90
+
91
+ #LangChain #QuestionAnswering #RetrievalAugmented #Innovation #AI #TechMilestone
92
+
93
+ Feel free to reach out if you're curious or would like to collaborate on similar projects! 🤝🔥
94
+ ```
notebook_version/pyproject.toml ADDED
@@ -0,0 +1,16 @@
1
+ [project]
2
+ name = "08-evaluating-rag-with-ragas"
3
+ version = "0.1.0"
4
+ description = "Add your description here"
5
+ readme = "README.md"
6
+ requires-python = ">=3.13"
7
+ dependencies = [
8
+ "jupyter>=1.1.1",
9
+ "langchain-community==0.3.14",
10
+ "langchain-openai==0.2.14",
11
+ "langchain-qdrant>=0.2.0",
12
+ "langgraph==0.2.61",
13
+ "numpy>=2.2.2",
14
+ "unstructured>=0.14.8",
15
+ "arxiv>=1.4.0",
16
+ ]
notebook_version/uv.lock ADDED
The diff for this file is too large to render. See raw diff
 
streamlit_app.py CHANGED
@@ -5,7 +5,7 @@ from pathlib import Path
5
  from dotenv import load_dotenv
6
  from langchain_openai.chat_models import ChatOpenAI
7
  from langchain_openai.embeddings import OpenAIEmbeddings
8
- from langchain_core.prompts import ChatPromptTemplate
9
  from qdrant_client import QdrantClient
10
  from langchain_core.documents import Document
11
  from langchain.agents import AgentExecutor, create_openai_tools_agent
@@ -37,30 +37,15 @@ PROCESSED_DATA_DIR = Path("processed_data")
37
  CHUNKS_FILE = PROCESSED_DATA_DIR / "document_chunks.pkl"
38
  QDRANT_DIR = PROCESSED_DATA_DIR / "qdrant_vectorstore"
39
 
40
- # Define prompts
41
- INITIAL_RAG_PROMPT = """
42
  CONTEXT:
43
  {context}
44
 
45
  QUERY:
46
  {question}
47
 
48
- You are a helpful assistant with expertise in AB Testing. Use the available context to answer the question. Do not use your own knowledge! If you cannot answer the question based on the context, you must say "I don't know, but I can try a different approach if you'd like."
49
- """
50
-
51
- EVALUATE_RESPONSE_PROMPT = """
52
- QUERY:
53
- {question}
54
-
55
- RESPONSE:
56
- {response}
57
-
58
- Evaluate if the response sufficiently answers the query based on the following criteria:
59
- 1. Relevance: Does the response directly address the query topic?
60
- 2. Completeness: Does the response fully answer all aspects of the query?
61
- 3. Accuracy: Is the information provided factually correct and helpful?
62
-
63
- Return only "SUFFICIENT" if the response meets all criteria, or "INSUFFICIENT" if the response needs improvement.
64
  """
65
 
66
  REPHRASE_QUERY_PROMPT = """
@@ -70,27 +55,20 @@ QUERY:
70
  You are a helpful assistant. Rephrase the provided query to be more specific and to the point in order to improve retrieval in our RAG pipeline about AB Testing.
71
  """
72
 
73
- AGENT_PROMPT = """
74
- You are an expert AB Testing assistant. Your job is to provide helpful, accurate information about AB Testing topics.
75
-
76
- You have access to several tools:
77
- 1. You can search for relevant documents in the database using search_documents - use this for general AB testing questions
78
- 2. You can rephrase a query to get better search results using search_with_rephrased_query - use this when initial searches don't yield good results
79
- 3. You can search ArXiv for academic papers using search_arxiv - use this for:
80
- a) Specific academic papers, their authors, or publications
81
- b) As a fallback when other tools don't yield satisfactory results
82
- c) Technical questions that might be better answered with academic research
83
-
84
- When the user asks about specific papers, authors of papers, or academic publications, you should IMMEDIATELY use the search_arxiv tool rather than the document search tools.
85
 
86
- For general AB testing questions, follow this process:
87
- 1. First try search_documents
88
- 2. If that doesn't provide good information, try search_with_rephrased_query
89
- 3. If still insufficient, try search_arxiv as a final resource before giving up
90
 
91
- Use these tools to provide the best possible answer.
 
92
  """
93
 
 
 
 
 
94
  @st.cache_resource
95
  def load_document_chunks():
96
  """Load pre-processed document chunks from disk."""
@@ -99,6 +77,7 @@ def load_document_chunks():
99
  print(f"Working directory contents: {os.listdir('.')}")
100
  if os.path.exists(PROCESSED_DATA_DIR):
101
  print(f"PROCESSED_DATA_DIR contents: {os.listdir(PROCESSED_DATA_DIR)}")
 
102
 
103
  try:
104
  with open(CHUNKS_FILE, 'rb') as f:
@@ -132,8 +111,9 @@ def get_chat_model():
132
 
133
  # Call API directly
134
  response = openai_client.chat.completions.create(
135
- model="gpt-3.5-turbo",
136
- messages=openai_messages
 
137
  )
138
 
139
  # Create response object with content attribute
@@ -148,7 +128,7 @@ def get_chat_model():
148
  print(f"Error creating OpenAI wrapper: {str(e)}")
149
  try:
150
  # Last resort fallback to basic LangChain with minimal config
151
- return ChatOpenAI(model="gpt-3.5-turbo")
152
  except Exception as e2:
153
  print(f"Fallback also failed: {str(e2)}")
154
 
@@ -184,8 +164,9 @@ def get_agent_model():
184
 
185
  # Call API directly with a more powerful model
186
  response = openai_client.chat.completions.create(
187
- model="gpt-4",
188
- messages=openai_messages
 
189
  )
190
 
191
  class SimpleResponse:
@@ -199,12 +180,12 @@ def get_agent_model():
199
  print(f"Error creating agent model: {str(e)}")
200
  try:
201
  # Fallback
202
- return ChatOpenAI(model="gpt-4")
203
  except Exception as e2:
204
  print(f"Agent model fallback also failed: {str(e2)}")
205
  # Final fallback to gpt-3.5-turbo
206
  try:
207
- return ChatOpenAI(model="gpt-3.5-turbo")
208
  except:
209
  # Create dummy that returns a fixed response
210
  class DummyModel:
@@ -254,7 +235,7 @@ def get_embedding_model():
254
  print(f"Error initializing embedding model: {str(e)}")
255
  # Last resort fallback
256
  try:
257
- return OpenAIEmbeddings()
258
  except Exception as e2:
259
  print(f"Embedding fallback also failed: {str(e2)}")
260
 
@@ -311,157 +292,247 @@ def setup_qdrant_client():
311
  print(f"Alternative initialization failed: {str(e2)}")
312
  raise
313
 
314
- def retrieve_documents(query, k=5):
315
- """Retrieve relevant documents for a query."""
316
- collection_name = "kohavi_ab_testing_pdf_collection"
317
- print(f"Searching for documents matching: '{query}'")
 
 
318
 
 
319
  try:
320
- # Get models and data
 
 
 
 
321
  embedding_model = get_embedding_model()
322
- chunks = load_document_chunks()
323
 
324
- # No chunks found? Return empty results
325
- if not chunks:
326
- print("No document chunks loaded, cannot perform search")
327
- return [], []
328
-
329
- client = setup_qdrant_client()
330
 
331
- # Create a mapping of IDs to documents
332
  docs_by_id = {i: doc for i, doc in enumerate(chunks)}
333
 
334
- # Get query embedding
335
- query_embedding = embedding_model.embed_query(query)
 
 
 
 
336
 
337
- # Try to search
338
- try:
339
- # First try search method
340
- try:
341
- results = client.search(
342
- collection_name=collection_name,
343
- query_vector=query_embedding,
344
- limit=k
345
- )
346
- print(f"Found {len(results)} results using search method")
347
- except Exception as e1:
348
- print(f"Search failed: {str(e1)}")
349
-
350
- # Try query_points method
351
- try:
352
- results = client.query_points(
353
- collection_name=collection_name,
354
- query_vector=query_embedding,
355
- limit=k
356
- )
357
- print(f"Found {len(results)} results using query_points method")
358
- except Exception as e2:
359
- print(f"query_points method failed: {str(e2)}")
360
- return [], []
361
-
362
- # No results? Return empty
363
- if not results or len(results) == 0:
364
- print("No search results found")
365
- return [], []
366
-
367
- # Process results
368
- documents = []
369
- sources_dict = {}
370
-
371
- for result in results:
372
- doc_id = result.id
373
- if doc_id in docs_by_id:
374
- doc = docs_by_id[doc_id]
375
- documents.append(doc)
376
-
377
- # Extract source information
378
- source_path = doc.metadata.get("source", "")
379
- filename = source_path.split("/")[-1] if "/" in source_path else source_path
380
-
381
- # Remove .pdf extension if present
382
- if filename.lower().endswith('.pdf'):
383
- filename = filename[:-4]
384
-
385
- # Default to the full filename if we can't extract a title
386
- if not filename:
387
- filename = "Unknown Source"
388
-
389
- # Get page number, use a default if not available
390
- page = doc.metadata.get("page", "unknown")
391
-
392
- # All PDF sources in data directory are by Ron Kohavi, so add his name as prefix
393
- title = f"Ron Kohavi: {filename}"
394
-
395
- # Create a unique key for this source based on filename and page
396
- source_key = f"{filename}_{page}"
397
-
398
- # Only add to sources if we haven't seen this exact source before
399
- if source_key not in sources_dict:
400
- sources_dict[source_key] = {
401
- "title": title,
402
- "page": page,
403
- "score": float(result.score) if hasattr(result, "score") else 1.0,
404
- "type": "pdf"
405
- }
406
- print(f"Added source: {title}, Page: {page}")
407
-
408
- # Convert the dictionary of unique sources to a list
409
- sources = list(sources_dict.values())
410
-
411
- print(f"Returning {len(documents)} documents with {len(sources)} sources")
412
- return documents, sources
413
-
414
- except Exception as e:
415
- print(f"Error during vector search: {str(e)}")
416
- return [], []
417
-
418
  except Exception as e:
419
  print(f"Error in document retrieval: {str(e)}")
420
- import traceback
421
- traceback.print_exc()
422
- return [], []
423
-
424
- def rephrase_query(query):
425
- """Rephrase the query to improve retrieval."""
426
- chat_model = get_chat_model()
427
- prompt = ChatPromptTemplate.from_template(REPHRASE_QUERY_PROMPT)
428
- messages = prompt.format_messages(question=query)
429
- response = chat_model.invoke(messages)
430
- return response.content
431
-
432
- def generate_answer(context, question):
433
- """Generate an answer using the context and question."""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
434
  chat_model = get_chat_model()
435
- prompt = ChatPromptTemplate.from_template(INITIAL_RAG_PROMPT)
436
- messages = prompt.format_messages(context=context, question=question)
437
- response = chat_model.invoke(messages)
438
- return response.content
439
-
440
- def evaluate_response(question, response):
441
- """Evaluate if the response is sufficient."""
442
- # Use the LLM evaluation
 
 
 
443
  agent_model = get_agent_model()
444
- prompt = ChatPromptTemplate.from_template(EVALUATE_RESPONSE_PROMPT)
445
- messages = prompt.format_messages(question=question, response=response)
446
- result = agent_model.invoke(messages)
447
- return "SUFFICIENT" in result.content
 
 
 
 
 
 
 
 
 
 
 
 
 
448
 
449
  @tool
450
- def search_documents(query: str) -> str:
451
- """Search for relevant documents in the AB Testing database."""
452
- documents, _ = retrieve_documents(query)
453
- if not documents:
454
- return "No relevant documents found"
455
- return "\n\n".join([doc.page_content for doc in documents])
456
 
457
  @tool
458
- def search_with_rephrased_query(query: str) -> str:
459
- """Rephrase the query and then search for relevant documents."""
460
- rephrased = rephrase_query(query)
461
- documents, _ = retrieve_documents(rephrased)
462
- if not documents:
463
- return "No relevant documents found even with rephrased query"
464
- return "\n\n".join([doc.page_content for doc in documents])
465
 
466
  @tool
467
  def search_arxiv(query: str) -> str:
@@ -548,48 +619,49 @@ def search_arxiv(query: str) -> str:
548
  def setup_agent():
549
  """Set up the agent with tools."""
550
  agent_model = get_agent_model()
551
- tools = [search_documents, search_with_rephrased_query, search_arxiv]
552
- prompt = ChatPromptTemplate.from_messages([
553
- ("system", AGENT_PROMPT),
554
- ("human", "{input}"),
555
- ("ai", "{agent_scratchpad}")
556
- ])
557
-
558
- # Create the agent with better error tolerance
 
 
 
 
 
 
 
 
 
 
559
  try:
560
- agent = create_openai_tools_agent(agent_model, tools, prompt)
561
  executor = AgentExecutor(
562
- agent=agent,
563
- tools=tools,
564
  verbose=True,
565
- handle_parsing_errors=True,
566
- max_iterations=5 # Limit iterations to prevent infinite loops
567
  )
568
- return executor
 
 
 
 
 
 
 
 
 
 
 
 
569
  except Exception as e:
570
- print(f"Error setting up agent: {str(e)}")
571
  import traceback
572
  traceback.print_exc()
573
-
574
- # Create a simplified executor that just uses direct calls
575
- class SimpleExecutor:
576
- def invoke(self, inputs):
577
- try:
578
- # Try to use the search documents tool directly
579
- result = search_documents(inputs["input"])
580
- if "No relevant documents found" in result:
581
- result = search_with_rephrased_query(inputs["input"])
582
-
583
- if "No relevant documents found" in result:
584
- # Try arxiv as a last resort
585
- result = search_arxiv(inputs["input"])
586
-
587
- return {"output": result}
588
- except Exception as ex:
589
- print(f"Error in simple executor: {str(ex)}")
590
- return {"output": "I apologize, but I'm having trouble processing your request. Please try a different question."}
591
-
592
- return SimpleExecutor()
593
 
594
  # Streamlit UI
595
  st.set_page_config(
@@ -643,66 +715,35 @@ if query:
643
  message_placeholder = st.empty()
644
 
645
  with st.status("Processing your query...", expanded=True) as status:
 
 
646
  print("Starting RAG process for query:", query)
647
 
648
- # Try initial RAG approach first
649
- st.write("Searching for relevant documents...")
650
- documents, sources = retrieve_documents(query)
651
 
652
- # Log search results
653
- print(f"Initial search returned {len(documents)} documents")
654
 
655
- # If no documents found, try rephrasing right away
656
- if not documents:
657
- st.write("No relevant documents found, trying with rephrased query...")
658
- print("No documents found, trying rephrased query")
659
- rephrased_query = rephrase_query(query)
660
- st.write(f"Rephrased query: {rephrased_query}")
661
- documents, sources = retrieve_documents(rephrased_query)
662
- print(f"Rephrased search returned {len(documents)} documents")
663
-
664
- # Format documents into a string for context
665
- context = "\n\n".join([doc.page_content for doc in documents])
666
-
667
- # If we have context, try the initial RAG approach
668
- if context:
669
- st.write(f"Found {len(documents)} relevant documents")
670
- print("Generating answer from retrieved documents")
671
- initial_answer = generate_answer(context, query)
672
-
673
- # Evaluate if the initial answer is sufficient
674
- st.write("Evaluating answer quality...")
675
- is_sufficient = evaluate_response(query, initial_answer)
676
- print(f"Answer evaluation: sufficient={is_sufficient}")
677
-
678
- if is_sufficient:
679
- # If the initial answer is good, use it
680
- answer = initial_answer
681
- else:
682
- # If not sufficient, use the agent with tools
683
- st.write("Initial answer needs improvement, enhancing with more tools...")
684
- print("Using agent to enhance answer")
685
- agent = setup_agent()
686
- agent_response = agent.invoke({"input": query})
687
- answer = agent_response["output"]
688
-
689
- # If the agent used ArXiv and found sources, use those
690
- if ARXIV_SOURCES:
691
- print(f"Agent found {len(ARXIV_SOURCES)} ArXiv sources")
692
- # Only replace sources if ArXiv found something
693
- if ARXIV_SOURCES:
694
- sources = ARXIV_SOURCES
695
  else:
696
- # If no context at all, use the agent as a last resort
697
- st.write("No relevant documents found in our database, trying other resources...")
698
- print("No context found, using agent as fallback")
699
- agent = setup_agent()
700
- agent_response = agent.invoke({"input": query})
701
- answer = agent_response["output"]
702
 
703
- # Check if ArXiv sources are available
704
- if ARXIV_SOURCES:
705
- sources = ARXIV_SOURCES
 
 
 
 
 
 
 
 
706
 
707
  status.update(label="Completed!", state="complete", expanded=False)
708
 
 
5
  from dotenv import load_dotenv
6
  from langchain_openai.chat_models import ChatOpenAI
7
  from langchain_openai.embeddings import OpenAIEmbeddings
8
+ from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
9
  from qdrant_client import QdrantClient
10
  from langchain_core.documents import Document
11
  from langchain.agents import AgentExecutor, create_openai_tools_agent
 
37
  CHUNKS_FILE = PROCESSED_DATA_DIR / "document_chunks.pkl"
38
  QDRANT_DIR = PROCESSED_DATA_DIR / "qdrant_vectorstore"
39
 
40
+ # Define prompts exactly as in the notebook
41
+ RAG_PROMPT = """
42
  CONTEXT:
43
  {context}
44
 
45
  QUERY:
46
  {question}
47
 
48
+ You are a helpful assistant. Use the available context to answer the question. Do not use your own knowledge! If you cannot answer the question based on the context, you must say "I don't know".
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
49
  """
50
 
51
  REPHRASE_QUERY_PROMPT = """
 
55
  You are a helpful assistant. Rephrase the provided query to be more specific and to the point in order to improve retrieval in our RAG pipeline about AB Testing.
56
  """
57
 
58
+ EVALUATE_RESPONSE_PROMPT = """
59
+ Given an initial query, determine if the initial query is related to AB Testing (even vaguely e.g. statistics, A/B testing, etc.) or not. If not related to AB Testing, return 'Y'. If related to AB Testing, then given the initial query and a final response, determine if the final response is extremely helpful or not. If extremely helpful, return 'Y'. If not extremely helpful, return 'N'.
 
 
 
 
 
 
 
 
 
 
60
 
61
+ Initial Query:
62
+ {initial_query}
 
 
63
 
64
+ Final Response:
65
+ {final_response}
66
  """
67
 
68
+ rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
69
+ rephrase_query_prompt = ChatPromptTemplate.from_template(REPHRASE_QUERY_PROMPT)
70
+ evaluate_prompt = PromptTemplate.from_template(EVALUATE_RESPONSE_PROMPT)
71
+
72
  @st.cache_resource
73
  def load_document_chunks():
74
  """Load pre-processed document chunks from disk."""
 
77
  print(f"Working directory contents: {os.listdir('.')}")
78
  if os.path.exists(PROCESSED_DATA_DIR):
79
  print(f"PROCESSED_DATA_DIR contents: {os.listdir(PROCESSED_DATA_DIR)}")
80
+ return []
81
 
82
  try:
83
  with open(CHUNKS_FILE, 'rb') as f:
 
111
 
112
  # Call API directly
113
  response = openai_client.chat.completions.create(
114
+ model="gpt-4.1-mini",
115
+ messages=openai_messages,
116
+ temperature=0
117
  )
118
 
119
  # Create response object with content attribute
 
128
  print(f"Error creating OpenAI wrapper: {str(e)}")
129
  try:
130
  # Last resort fallback to basic LangChain with minimal config
131
+ return ChatOpenAI(model="gpt-4.1-mini", temperature=0)
132
  except Exception as e2:
133
  print(f"Fallback also failed: {str(e2)}")
134
 
 
164
 
165
  # Call API directly with a more powerful model
166
  response = openai_client.chat.completions.create(
167
+ model="gpt-4.1",
168
+ messages=openai_messages,
169
+ temperature=0
170
  )
171
 
172
  class SimpleResponse:
 
180
  print(f"Error creating agent model: {str(e)}")
181
  try:
182
  # Fallback
183
+ return ChatOpenAI(model="gpt-4.1", temperature=0)
184
  except Exception as e2:
185
  print(f"Agent model fallback also failed: {str(e2)}")
186
  # Final fallback to gpt-3.5-turbo
187
  try:
188
+ return ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
189
  except:
190
  # Create dummy that returns a fixed response
191
  class DummyModel:
 
235
  print(f"Error initializing embedding model: {str(e)}")
236
  # Last resort fallback
237
  try:
238
+ return OpenAIEmbeddings(model="text-embedding-3-small")
239
  except Exception as e2:
240
  print(f"Embedding fallback also failed: {str(e2)}")
241
 
 
292
  print(f"Alternative initialization failed: {str(e2)}")
293
  raise
294
 
295
+ def rag_chain_node(query):
296
+ """
297
+ Implements the equivalent of the rag_chain_node from the notebook.
298
+ This function retrieves documents, extracts sources, and generates an answer.
299
+ """
300
+ print(f"rag_chain_node: Processing query '{query}'")
301
 
302
+ # 1. Retrieve documents once
303
  try:
304
+ print("Setting up retriever...")
305
+ client = setup_qdrant_client()
306
+ collection_name = "kohavi_ab_testing_pdf_collection"
307
+
308
+ # Get embedding for the query
309
  embedding_model = get_embedding_model()
310
+ query_embedding = embedding_model.embed_query(query)
311
 
312
+ # Get documents
313
+ print("Retrieving documents...")
314
+ chunks = load_document_chunks()
 
 
 
315
 
316
+ # Map of document IDs to actual documents
317
  docs_by_id = {i: doc for i, doc in enumerate(chunks)}
318
 
319
+ # Search for relevant documents
320
+ search_results = client.search(
321
+ collection_name=collection_name,
322
+ query_vector=query_embedding,
323
+ limit=5
324
+ )
325
 
326
+ # Convert search results to documents
327
+ docs = []
328
+ for result in search_results:
329
+ doc_id = result.id
330
+ if doc_id in docs_by_id:
331
+ docs.append(docs_by_id[doc_id])
332
  except Exception as e:
333
  print(f"Error in document retrieval: {str(e)}")
334
+ return "I'm having trouble retrieving relevant information. Please try again later.", []
335
+
336
+ # 2. Extract sources from the documents
337
+ sources = []
338
+ for doc in docs:
339
+ source_path = doc.metadata.get("source", "")
340
+ filename = source_path.split("/")[-1] if "/" in source_path else source_path
341
+
342
+ # Remove .pdf extension if present
343
+ if filename.lower().endswith('.pdf'):
344
+ filename = filename[:-4]
345
+
346
+ sources.append({
347
+ "title": f"Ron Kohavi: {filename}",
348
+ "page": doc.metadata.get("page", "unknown"),
349
+ "type": "pdf"
350
+ })
351
+
352
+ # 3. Use the RAG chain to generate an answer
353
+ if not docs:
354
+ print("No documents found")
355
+ return "I don't have enough information to answer that question.", []
356
+
357
+ # Create context from documents
358
+ context = "\n\n".join([doc.page_content for doc in docs])
359
+
360
+ # Format the prompt with context and query
361
+ formatted_prompt = rag_prompt.format(context=context, question=query)
362
+
363
+ # Send to the model and parse the output
364
+ print("Generating answer...")
365
  chat_model = get_chat_model()
366
+ response = chat_model.invoke(formatted_prompt)
367
+ response_text = response.content
368
+
369
+ return response_text, sources
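The source-extraction loop in `rag_chain_node` appears again in both retrieval tools. One way to keep the three copies in sync is a small shared helper, sketched below. `extract_sources` is a hypothetical name, not something this commit defines. The de-duplication by filename and page, and the "Unknown Source" default, mirror the removed `retrieve_documents` function rather than the new code, so treat them as optional.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def extract_sources(docs: Iterable[Any], author_prefix: str = "Ron Kohavi") -> List[Dict[str, Any]]:
    """Build display-ready source entries from documents that carry a `metadata` dict."""
    sources: List[Dict[str, Any]] = []
    seen = set()

    for doc in docs:
        metadata = getattr(doc, "metadata", None) or {}
        source_path = metadata.get("source", "")
        filename = source_path.split("/")[-1]

        # Remove the .pdf extension if present
        if filename.lower().endswith(".pdf"):
            filename = filename[:-4]
        if not filename:
            filename = "Unknown Source"

        page = metadata.get("page", "unknown")

        # Skip repeated (filename, page) pairs so each source is listed once
        key = (filename, page)
        if key in seen:
            continue
        seen.add(key)

        sources.append({
            "title": f"{author_prefix}: {filename}",
            "page": page,
            "type": "pdf",
        })

    return sources


if __name__ == "__main__":
    # SimpleNamespace stands in for a LangChain Document here
    demo_docs = [
        SimpleNamespace(metadata={"source": "data/ab_guide.pdf", "page": 3}),
        SimpleNamespace(metadata={"source": "data/ab_guide.pdf", "page": 3}),
    ]
    print(extract_sources(demo_docs))
```

With a helper like this, each call site would reduce to a single `sources = extract_sources(docs)` line.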
370
+
371
+ def evaluate_response(query, response):
372
+ """
373
+ Determines if the initial RAG response was sufficient using the original evaluation logic.
374
+ Returns True if the response is sufficient, False otherwise.
375
+ """
376
+ print(f"Evaluating response for '{query}'")
377
  agent_model = get_agent_model()
378
+
379
+ formatted_prompt = evaluate_prompt.format(
380
+ initial_query=query,
381
+ final_response=response
382
+ )
383
+
384
+ helpfulness_chain = agent_model
385
+ messages = [HumanMessage(content=formatted_prompt)]
386
+ helpfulness_response = helpfulness_chain.invoke(messages)
387
+
388
+ # Check if 'Y' is in the response
389
+ if "Y" in helpfulness_response.content:
390
+ print("Evaluation: Initial response is sufficient")
391
+ return True
392
+ else:
393
+ print("Evaluation: Initial response is NOT sufficient, need to use agent")
394
+ return False
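The check `"Y" in helpfulness_response.content` is a substring match, so a reply such as "N. You should add more detail" would also count as sufficient because it contains a capital "Y". A stricter parse is sketched below. `parse_verdict` is a hypothetical helper, and it assumes the model replies with a bare `Y` or `N` (optionally quoted or followed by a period), as the evaluation prompt requests.

```python
def parse_verdict(raw: str) -> bool:
    """Return True only when the whole reply is a 'Y' (or 'YES') verdict."""
    # Drop surrounding whitespace, then quotes, backticks, and periods
    cleaned = raw.strip().strip("'\"`.").upper()
    return cleaned in {"Y", "YES"}


if __name__ == "__main__":
    for reply in ["Y", " 'Y'\n", "N", "N. You should add more detail"]:
        print(repr(reply), parse_verdict(reply))
```

Anything that is not a clean yes is treated as insufficient, which sends ambiguous replies to the agent path instead of accepting the initial answer.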
395
 
396
  @tool
397
+ def retrieve_information(query: str) -> str:
398
+ """Use Retrieval Augmented Generation to retrieve information about AB Testing."""
399
+ # 1. Retrieve documents
400
+ client = setup_qdrant_client()
401
+ collection_name = "kohavi_ab_testing_pdf_collection"
402
+
403
+ # Get embedding for the query
404
+ embedding_model = get_embedding_model()
405
+ query_embedding = embedding_model.embed_query(query)
406
+
407
+ # Get documents
408
+ chunks = load_document_chunks()
409
+
410
+ # Map of document IDs to actual documents
411
+ docs_by_id = {i: doc for i, doc in enumerate(chunks)}
412
+
413
+ # Search for relevant documents
414
+ try:
415
+ search_results = client.search(
416
+ collection_name=collection_name,
417
+ query_vector=query_embedding,
418
+ limit=5
419
+ )
420
+ except Exception as e:
421
+ print(f"Error in search: {str(e)}")
422
+ try:
423
+ search_results = client.query_points(
424
+ collection_name=collection_name,
425
+ query=query_embedding,
426
+ limit=5
427
+ ).points
428
+ except Exception as e2:
429
+ print(f"Error in query_points: {str(e2)}")
430
+ return "Error retrieving documents."
431
+
432
+ # Convert search results to documents
433
+ docs = []
434
+ for result in search_results:
435
+ doc_id = result.id
436
+ if doc_id in docs_by_id:
437
+ docs.append(docs_by_id[doc_id])
438
+
439
+ # 2. Extract and store sources
440
+ sources = []
441
+ for doc in docs:
442
+ source_path = doc.metadata.get("source", "")
443
+ filename = source_path.split("/")[-1] if "/" in source_path else source_path
444
+
445
+ # Remove .pdf extension if present
446
+ if filename.lower().endswith('.pdf'):
447
+ filename = filename[:-4]
448
+
449
+ sources.append({
450
+ "title": f"Ron Kohavi: {filename}",
451
+ "page": doc.metadata.get("page", "unknown"),
452
+ "type": "pdf"
453
+ })
454
+
455
+ # Store sources for later access
456
+ retrieve_information.last_sources = sources
457
+
458
+ # 3. Return just the formatted document contents
459
+ formatted_content = "\n\n".join([f"Retrieved Information: {i+1}\n{doc.page_content}"
460
+ for i, doc in enumerate(docs)])
461
+ return formatted_content
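The search-then-fallback block is repeated in both retrieval tools and could be factored into one function. The sketch below assumes the qdrant-client behaviour as I understand it: `search` takes the vector as `query_vector` and returns a list of scored points, while `query_points` takes it as `query` and returns a response object whose hits are in `.points`. `search_with_fallback` is a hypothetical name, and the two stub classes exist only to exercise the function without a running Qdrant instance.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence


def search_with_fallback(
    client: Any,
    collection_name: str,
    query_embedding: Sequence[float],
    limit: int = 5,
) -> List[Any]:
    """Try `client.search` first, then `client.query_points`; return [] if both fail."""
    try:
        return list(client.search(
            collection_name=collection_name,
            query_vector=list(query_embedding),
            limit=limit,
        ))
    except Exception as search_error:
        print(f"Error in search: {search_error}")

    try:
        response = client.query_points(
            collection_name=collection_name,
            query=list(query_embedding),
            limit=limit,
        )
        return list(response.points)
    except Exception as query_error:
        print(f"Error in query_points: {query_error}")
        return []


class StubSearchClient:
    """Stand-in for a client that supports `search`."""
    def search(self, collection_name, query_vector, limit):
        return ["hit"] * limit


class StubQueryClient:
    """Stand-in for a client that only supports `query_points`."""
    def query_points(self, collection_name, query, limit):
        return SimpleNamespace(points=["p1", "p2"])


if __name__ == "__main__":
    print(search_with_fallback(StubSearchClient(), "demo", [0.1, 0.2], limit=2))
    print(search_with_fallback(StubQueryClient(), "demo", [0.1, 0.2]))
```

Returning an empty list on total failure lets callers share the existing "no documents" branch instead of handling a separate error string.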
 
 @tool
+def retrieve_information_with_rephrased_query(query: str) -> str:
+    """This tool will intelligently rephrase your AB testing query and then will use Retrieval Augmented Generation to retrieve information about the rephrased query."""
+    # 1. Rephrase the query first
+    chat_model = get_chat_model()
+    rephrased_query_msg = rephrase_query_prompt.format(question=query)
+    rephrased_query_response = chat_model.invoke(rephrased_query_msg)
+    rephrased_query = rephrased_query_response.content
+
+    # 2. Retrieve documents using the rephrased query
+    client = setup_qdrant_client()
+    collection_name = "kohavi_ab_testing_pdf_collection"
+
+    # Get embedding for the query
+    embedding_model = get_embedding_model()
+    query_embedding = embedding_model.embed_query(rephrased_query)
+
+    # Get documents
+    chunks = load_document_chunks()
+
+    # Map of document IDs to actual documents
+    docs_by_id = {i: doc for i, doc in enumerate(chunks)}
+
+    # Search for relevant documents
+    try:
+        search_results = client.search(
+            collection_name=collection_name,
+            query_vector=query_embedding,
+            limit=5
+        )
+    except Exception as e:
+        print(f"Error in search: {str(e)}")
+        try:
+            # query_points takes `query=` (not `query_vector=`) and returns a
+            # response object whose hits are in `.points`
+            search_results = client.query_points(
+                collection_name=collection_name,
+                query=query_embedding,
+                limit=5
+            ).points
+        except Exception as e2:
+            print(f"Error in query_points: {str(e2)}")
+            return f"Error retrieving documents with rephrased query: {rephrased_query}"
+
+    # Convert search results to documents
+    docs = []
+    for result in search_results:
+        doc_id = result.id
+        if doc_id in docs_by_id:
+            docs.append(docs_by_id[doc_id])
+
+    # 3. Extract and store sources
+    sources = []
+    for doc in docs:
+        source_path = doc.metadata.get("source", "")
+        filename = source_path.split("/")[-1] if "/" in source_path else source_path
+
+        # Remove .pdf extension if present
+        if filename.lower().endswith('.pdf'):
+            filename = filename[:-4]
+
+        sources.append({
+            "title": f"Ron Kohavi: {filename}",
+            "page": doc.metadata.get("page", "unknown"),
+            "type": "pdf"
+        })
+
+    # Store sources for later access
+    retrieve_information_with_rephrased_query.last_sources = sources
+
+    # 4. Return formatted content with rephrased query
+    formatted_content = f"Rephrased query: {rephrased_query}\n\n" + "\n\n".join(
+        [f"Retrieved Information: {i+1}\n{doc.page_content}" for i, doc in enumerate(docs)]
+    )
+    return formatted_content
 
 @tool
  def search_arxiv(query: str) -> str:
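
Review note: the source-extraction loop appears verbatim in both `retrieve_information` and `retrieve_information_with_rephrased_query`. Below is a minimal sketch of how it might be factored into one helper, assuming each retrieved document exposes a `metadata` dict as in the hunks above. `build_sources` and `FakeDoc` are hypothetical names used only for illustration; they are not part of this commit.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_sources(docs, author="Ron Kohavi"):
    """Turn retrieved documents into the source dicts shown in the UI."""
    sources = []
    for doc in docs:
        source_path = doc.metadata.get("source", "")
        # Keep only the last path component (splits on "/" like the diff does).
        filename = source_path.split("/")[-1]
        # Remove a trailing .pdf extension, case-insensitively.
        if filename.lower().endswith(".pdf"):
            filename = filename[:-4]
        sources.append({
            "title": f"{author}: {filename}",
            "page": doc.metadata.get("page", "unknown"),
            "type": "pdf",
        })
    return sources
```

Each tool could then call `build_sources(docs)` and store the result, so a later change to the title format only has to be made once.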
 
 def setup_agent():
     """Set up the agent with tools."""
     agent_model = get_agent_model()
+    tools = [retrieve_information, retrieve_information_with_rephrased_query, search_arxiv]
+
+    try:
+        return create_openai_tools_agent(
+            llm=agent_model,
+            tools=tools,
+            prompt=ChatPromptTemplate.from_messages([
+                ("system", "You are an expert AB Testing assistant. Your job is to provide helpful, accurate information about AB Testing topics."),
+                ("human", "{input}"),
+                # The scratchpad is a list of messages, so it needs a placeholder slot
+                ("placeholder", "{agent_scratchpad}")
+            ])
+        )
+    except Exception as e:
+        print(f"Error creating agent: {str(e)}")
+        return None
+
+def execute_agent(agent, query):
+    """Execute the agent with the given query."""
     try:
         executor = AgentExecutor(
+            agent=agent,
+            tools=[retrieve_information, retrieve_information_with_rephrased_query, search_arxiv],
             verbose=True,
+            handle_parsing_errors=True
         )
+
+        response = executor.invoke({"input": query})
+
+        # Extract sources based on used tools. Test for non-empty lists:
+        # hasattr() alone would let an empty list from the first tool mask the others.
+        sources = []
+        if getattr(retrieve_information, "last_sources", None):
+            sources = retrieve_information.last_sources
+        elif getattr(retrieve_information_with_rephrased_query, "last_sources", None):
+            sources = retrieve_information_with_rephrased_query.last_sources
+        elif ARXIV_SOURCES:
+            sources = ARXIV_SOURCES
+
+        return response["output"], sources
     except Exception as e:
+        print(f"Error executing agent: {str(e)}")
         import traceback
         traceback.print_exc()
+        return "I'm having trouble processing your request. Please try again.", []
 
 # Streamlit UI
  st.set_page_config(
 
         message_placeholder = st.empty()
 
         with st.status("Processing your query...", expanded=True) as status:
+            # Follow the exact flow from the notebook
+            st.write("Starting with Initial RAG...")
             print("Starting RAG process for query:", query)
 
+            # Step 1: Initial RAG
+            initial_response, sources = rag_chain_node(query)
 
+            # Step 2: Evaluate response
+            is_sufficient = evaluate_response(query, initial_response)
 
+            # Step 3: Either end with the initial response or use the agent
+            if is_sufficient:
+                answer = initial_response
+                st.write("Initial response is sufficient")
             else:
+                st.write("Initial response needs improvement, using specialized agent...")
+                print("Using agent for enhanced answer")
 
+                # Create and execute the agent
+                agent = setup_agent()
+                if agent:
+                    agent_response, agent_sources = execute_agent(agent, query)
+                    answer = agent_response
+
+                    # If the agent found sources, use those instead
+                    if agent_sources:
+                        sources = agent_sources
+                else:
+                    answer = "I'm having trouble setting up the specialized agent. Here's what I found initially: " + initial_response
 
             status.update(label="Completed!", state="complete", expanded=False)
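
Review note: the UI hunk above follows a three-step flow (initial RAG, sufficiency check, agent fallback). The sketch below restates that control flow as a plain function so it can be exercised without Streamlit. `answer_query` and its callable parameters are hypothetical stand-ins for `rag_chain_node`, `evaluate_response`, `setup_agent`, and `execute_agent`; their signatures are assumed from how the diff calls them, not taken from the actual implementations.

```python
def answer_query(query, rag, evaluate, make_agent, run_agent):
    """Answer with plain RAG first; escalate to the agent only if needed."""
    # Step 1: initial RAG answer plus its sources.
    initial_response, sources = rag(query)

    # Step 2: keep the initial answer when it is judged sufficient.
    if evaluate(query, initial_response):
        return initial_response, sources

    # Step 3: otherwise try the agent.
    agent = make_agent()
    if not agent:
        # Agent setup failed: fall back to the initial answer and sources.
        return (
            "I'm having trouble setting up the specialized agent. "
            "Here's what I found initially: " + initial_response,
            sources,
        )

    agent_response, agent_sources = run_agent(agent, query)
    # Prefer the agent's sources only when it actually found some.
    return agent_response, agent_sources or sources
```

Passing the four steps in as callables is a design choice made for this sketch: it lets each branch be checked with simple stubs, whereas the committed code calls the module-level functions directly.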