# Evaluation: LLM Calling & Query Processing Flow

## High-Level Overview
```
EVALUATION PROCESS:
│
├─ Load Test Data from Dataset
│    └─ Questions + Ground Truth Answers
│
├─ FOR EACH TEST QUESTION:
│    │
│    ├─ 1. RETRIEVE DOCUMENTS (Vector Search)
│    │    │
│    │    └─ query_text → embed_query() → semantic search → get_retrieved_documents()
│    │
│    ├─ 2. GENERATE RESPONSE (LLM Call)
│    │    │
│    │    └─ query + documents → LLM → response
│    │
│    └─ 3. STORE TEST CASE (For Evaluation)
│         └─ {query, response, documents, ground_truth}
│
├─ COMPUTE TRACe METRICS
│    └─ utilization, relevance, adherence, completeness
│
└─ DISPLAY RESULTS
```
---

## Detailed Flow: Query Processing in Evaluation

### **Step 1: Test Sample Loop** (streamlit_app.py, Line 723)
```python
for i, sample in enumerate(test_data):
    # sample = {"question": "...", "answer": "...", ...}

    # Step 2: Call the RAG pipeline with the question
    result = st.session_state.rag_pipeline.query(
        sample["question"],   # ← Query string
        n_results=5           # ← How many docs to retrieve
    )
```
**Input**:
- `sample["question"]` = User question from the RAGBench dataset
  - Example: "What is machine learning?"
- `n_results=5` = Retrieve the top 5 most similar documents

---
### **Step 2: RAG Pipeline Query** (llm_client.py, Line 295)
```python
class RAGPipeline:
    def query(self, query: str, n_results: int = 5) -> Dict:
        # ─────────────────────────────────────────────────────
        # PHASE 1: RETRIEVAL (Vector Search)
        # ─────────────────────────────────────────────────────

        # STEP 1: Call the vector store to retrieve documents
        retrieved_docs = self.vector_store.get_retrieved_documents(
            query,                # "What is machine learning?"
            n_results=n_results   # Top 5 documents by default
        )
        # Result: [
        #   {"document": "ML is...", "metadata": {...}, "distance": 0.12},
        #   {"document": "Machine learning uses...", "metadata": {...}, "distance": 0.15},
        #   ...
        # ]

        # Extract document texts
        doc_texts = [doc["document"] for doc in retrieved_docs]
        # doc_texts = ["ML is...", "Machine learning uses...", ...]

        # ─────────────────────────────────────────────────────
        # PHASE 2: GENERATION (LLM Call)
        # ─────────────────────────────────────────────────────

        # STEP 2: Call the LLM with the query + retrieved documents
        response = self.llm.generate_with_context(
            query,       # "What is machine learning?"
            doc_texts,   # ["ML is...", "Machine learning uses...", ...]
            max_tokens=1024,
            temperature=0.7
        )
        # response = "Machine learning is a subset of artificial intelligence..."

        # STEP 3: Package results
        return {
            "query": query,
            "response": response,
            "retrieved_documents": retrieved_docs
        }
```
---

### **Step 3A: Document Retrieval (Vector Store)** (vector_store.py, Line 321)
```
Query Processing:

USER QUESTION:
"What is machine learning?"
          │
          ▼
┌─────────────────────────────────────┐
│ 1. Embed the Query                  │
│ ─────────────────────────────────── │
│ embedding_model.embed_query(query)  │
│                                     │
│ Model: sentence-transformers/       │
│        all-mpnet-base-v2            │
│                                     │
│ Query String (tokens):              │
│ "What"     → [0.1, 0.2, ...]        │
│ "is"       → [0.3, 0.4, ...]        │
│ "machine"  → [0.5, 0.6, ...]        │
│ "learning" → [0.7, 0.8, ...]        │
│                                     │
│ Output: Query Vector [768-dim]      │
│         │                           │
│ [0.15, 0.32, 0.51, ..., 0.89]       │
└─────────────────────────────────────┘
          │
          ▼
┌───────────────────────────────────────────┐
│ 2. Semantic Search in ChromaDB            │
│ ───────────────────────────────────────── │
│                                           │
│ collection.query(                         │
│     query_embeddings=[query_vector],      │
│     n_results=5,                          │
│     where=None                            │
│ )                                         │
│                                           │
│ Compare query_vector against all doc      │
│ vectors in the collection using           │
│ cosine similarity                         │
│                                           │
│ Scoring: similarity = dot_product /       │
│          (norm_a * norm_b)                │
│                                           │
│ Top 5 Results (sorted by similarity):     │
│ • Doc 1: "ML is a field..."   (sim: 0.92) │
│ • Doc 2: "Deep learning..."   (sim: 0.89) │
│ • Doc 3: "Neural networks..." (sim: 0.87) │
│ • Doc 4: "AI overview..."     (sim: 0.81) │
│ • Doc 5: "Training data..."   (sim: 0.78) │
└───────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────┐
│ 3. Format Retrieved Documents           │
│ ─────────────────────────────────────── │
│ retrieved_docs = [                      │
│     {                                   │
│         "document": "ML is a field...", │
│         "metadata": {...},              │
│         "distance": 0.08                │
│     },                                  │
│     {...},                              │
│     ...                                 │
│ ]                                       │
└─────────────────────────────────────────┘
          │
          ▼
RETURNED TO RAGPipeline
```
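The cosine-similarity scoring in step 2 can be sketched in plain Python. This is an illustrative reimplementation of the formula shown in the diagram (`similarity = dot_product / (norm_a * norm_b)`), not ChromaDB's actual index-accelerated search:

```python
import math

def cosine_similarity(a, b):
    """similarity = dot_product / (norm_a * norm_b)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=5):
    """Rank documents by similarity to the query, highest first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Toy 3-dim vectors (real query vectors are 768-dim)
query = [0.1, 0.9, 0.2]
docs = [[0.1, 0.8, 0.3], [0.9, 0.1, 0.0], [0.2, 0.9, 0.1]]
ranking = top_k(query, docs, k=2)  # (doc index, similarity) pairs
```

ChromaDB performs the same comparison over an HNSW index, which avoids scoring every document exhaustively.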
---

### **Step 3B: LLM Response Generation** (llm_client.py, Line 215)
```
Retrieved Documents:
│
├─ Doc1: "ML is a field of AI that..."
├─ Doc2: "Machine learning uses algorithms..."
├─ Doc3: "Neural networks process data..."
├─ Doc4: "Training data is essential..."
└─ Doc5: "Deep learning is a subset..."
          │
          ▼
┌──────────────────────────────────────────────────────┐
│ 1. BUILD PROMPT                                      │
│ ──────────────────────────────────────────────────── │
│                                                      │
│ context = """                                        │
│ Document 1: ML is a field of AI that...              │
│ Document 2: Machine learning uses algorithms...      │
│ Document 3: Neural networks process data...          │
│ Document 4: Training data is essential...            │
│ Document 5: Deep learning is a subset...             │
│ """                                                  │
│                                                      │
│ prompt = """                                         │
│ Answer the following question based on the provided  │
│ context.                                             │
│                                                      │
│ Context:                                             │
│ {context}                                            │
│                                                      │
│ Question: What is machine learning?                  │
│                                                      │
│ Answer:                                              │
│ """                                                  │
│                                                      │
│ system_prompt = "You are a helpful AI assistant..."  │
└──────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────┐
│ 2. LLM API CALL (Groq)                               │
│ ──────────────────────────────────────────────────── │
│                                                      │
│ Client: Groq (groq.com)                              │
│ Model: llama-3.1-8b-instant (or selected model)      │
│ API Endpoint: https://api.groq.com/openai/v1/        │
│               chat/completions                       │
│                                                      │
│ Request:                                             │
│ {                                                    │
│     "model": "llama-3.1-8b-instant",                 │
│     "messages": [                                    │
│         {                                            │
│             "role": "system",                        │
│             "content": "You are a helpful..."        │
│         },                                           │
│         {                                            │
│             "role": "user",                          │
│             "content": "[full prompt above]"         │
│         }                                            │
│     ],                                               │
│     "max_tokens": 1024,                              │
│     "temperature": 0.7                               │
│ }                                                    │
│                                                      │
│ Where the LLM processing happens:                    │
│ → Groq's GPU servers (not local)                     │
│ → Model processes the entire prompt                  │
│ → Generates the response token by token              │
│ → Returns the complete response                      │
└──────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────┐
│ 3. PARSE LLM RESPONSE                                │
│ ──────────────────────────────────────────────────── │
│                                                      │
│ Response Text:                                       │
│ "Machine learning is a field of artificial           │
│  intelligence that enables computers to learn from   │
│  data without being explicitly programmed..."        │
│                                                      │
│ Extract: response.choices[0].message.content         │
│ Return: Final Answer String                          │
└──────────────────────────────────────────────────────┘
          │
          ▼
RETURNED TO RAGPipeline
```
---

## Complete Code Flow for One Evaluation Query

### **File: streamlit_app.py** (Lines 723-730)
```python
# FOR EACH TEST QUESTION IN THE DATASET:
for i, sample in enumerate(test_data):
    # sample["question"] = "What is machine learning?"
    # sample["answer"]   = "ML is a subset of AI..."

    # ── STEP 1: CALL RAG PIPELINE ──
    result = st.session_state.rag_pipeline.query(
        sample["question"],   # Pass the question
        n_results=5           # Get the top 5 docs
    )
    # Returns:
    # {
    #     "query": "What is machine learning?",
    #     "response": "Machine learning is...",
    #     "retrieved_documents": [
    #         {"document": "...", "metadata": {...}, ...},
    #         ...
    #     ]
    # }

    # ── STEP 2: EXTRACT RESULTS ──
    test_cases.append({
        "query": sample["question"],
        "response": result["response"],
        "retrieved_documents": [
            doc["document"] for doc in result["retrieved_documents"]
        ],
        "ground_truth": sample.get("answer", "")
    })
```
### **File: llm_client.py** (RAGPipeline class, Lines 295-340)
```python
class RAGPipeline:
    def query(self, query: str, n_results: int = 5) -> Dict:
        # ── STEP 2A: RETRIEVE DOCUMENTS ──
        # Where: vector_store.py → get_retrieved_documents()
        retrieved_docs = self.vector_store.get_retrieved_documents(
            query,                # "What is machine learning?"
            n_results=n_results
        )

        # ── STEP 2B: EXTRACT DOCUMENT TEXTS ──
        doc_texts = [doc["document"] for doc in retrieved_docs]
        # doc_texts = [
        #     "Machine learning is a subset of AI...",
        #     "Deep learning uses neural networks...",
        #     ...
        # ]

        # ── STEP 2C: CALL LLM ──
        # Where: llm_client.py → generate_with_context()
        response = self.llm.generate_with_context(
            query,       # "What is machine learning?"
            doc_texts,   # [retrieved document texts]
            max_tokens=1024,
            temperature=0.7
        )
        # response = "Machine learning is a field of AI..."

        # ── STEP 2D: RETURN RESULTS ──
        return {
            "query": query,
            "response": response,
            "retrieved_documents": retrieved_docs
        }
```
### **File: vector_store.py** (ChromaDBManager class, Lines 370-400)
```python
def get_retrieved_documents(self, query_text: str, n_results: int = 5):
    # ── STEP 3A-1: QUERY THE COLLECTION ──
    # Where: vector_store.py → query()
    results = self.query(query_text, n_results)
    # results = {
    #     'documents': [[doc1, doc2, doc3, doc4, doc5]],
    #     'metadatas': [[meta1, meta2, ...]],
    #     'distances': [[dist1, dist2, ...]],
    #     'ids': [[id1, id2, ...]]
    # }

    # ── STEP 3A-2: FORMAT RESULTS ──
    retrieved_docs = []
    for i in range(len(results['documents'][0])):
        retrieved_docs.append({
            "document": results['documents'][0][i],
            "metadata": results['metadatas'][0][i],
            "distance": results['distances'][0][i]
        })
    # retrieved_docs = [
    #     {"document": "ML is...", "metadata": {...}, "distance": 0.08},
    #     {"document": "Deep...", "metadata": {...}, "distance": 0.11},
    #     ...
    # ]
    return retrieved_docs
```
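Note that the vector store returns *distances* while the search diagram reports *similarities*. Assuming the collection uses ChromaDB's cosine space (`metadata={"hnsw:space": "cosine"}` — an assumption, since the collection setup is not shown in this document), the two are related by `similarity = 1 - distance`:

```python
def distance_to_similarity(distance: float) -> float:
    # Assumes a cosine-distance collection, where ChromaDB reports
    # distance = 1 - cosine_similarity
    return 1.0 - distance

docs = [
    {"document": "ML is...", "distance": 0.08},
    {"document": "Deep...", "distance": 0.11},
]
sims = [distance_to_similarity(d["distance"]) for d in docs]
# A distance of 0.08 corresponds to a similarity of 0.92,
# matching the top result in the Step 3A diagram.
```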
### **File: llm_client.py** (GroqLLMClient class, Lines 215-250)
```python
def generate_with_context(self, query: str, context_documents: List[str]) -> str:
    # ── STEP 3B-1: BUILD CONTEXT STRING ──
    context = "\n\n".join([
        f"Document {i+1}: {doc}"
        for i, doc in enumerate(context_documents)
    ])
    # context = """
    # Document 1: ML is a field of AI that...
    # Document 2: Machine learning uses algorithms...
    # ...
    # """

    # ── STEP 3B-2: BUILD PROMPT ──
    prompt = f"""Answer the following question based on the provided context.

Context:
{context}

Question: {query}

Answer:"""

    system_prompt = "You are a helpful AI assistant..."

    # ── STEP 3B-3: CALL LLM (GROQ API) ──
    # Where: llm_client.py → generate()
    return self.generate(prompt, max_tokens=1024, temperature=0.7,
                         system_prompt=system_prompt)
```
### **File: llm_client.py** (GroqLLMClient.generate(), Lines 110-155)
```python
def generate(self, prompt: str, max_tokens: int, temperature: float,
             system_prompt: str = ""):
    # ── STEP 3B-4: PREPARE THE GROQ API CALL ──
    # Apply rate limiting (max 30 requests per minute)
    self.rate_limiter.acquire_sync()

    # Build messages for the Groq API
    messages = []
    if system_prompt:
        messages.append({
            "role": "system",
            "content": system_prompt
        })
    messages.append({
        "role": "user",
        "content": prompt
    })

    # ── STEP 3B-5: MAKE THE GROQ API REQUEST ──
    try:
        response = self.client.chat.completions.create(
            model=self.model_name,    # e.g., "llama-3.1-8b-instant"
            messages=messages,
            max_tokens=max_tokens,    # 1024
            temperature=temperature   # 0.7
        )
        # ── STEP 3B-6: EXTRACT THE RESPONSE ──
        # Returns, e.g.: "Machine learning is a field of artificial intelligence..."
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
```
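The `rate_limiter` object used above is referenced but never shown in this document. A minimal sliding-window limiter that would satisfy the "max 30 requests per minute" comment might look like the following sketch (illustrative only, not the actual implementation):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `max_calls` per `period` seconds.
    Hypothetical stand-in for the rate_limiter used by GroqLLMClient."""

    def __init__(self, max_calls: int = 30, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent calls

    def acquire_sync(self):
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Block until the oldest call exits the window
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=30, period=60.0)
limiter.acquire_sync()  # the first call passes through without sleeping
```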
---

## Summary of Query Processing in Evaluation

| Step | Component | Input | Process | Output |
|------|-----------|-------|---------|--------|
| 1 | Streamlit UI | Test sample | Load from dataset | Question |
| 2 | RAGPipeline | Question | Orchestrate RAG | Response |
| 2A | ChromaDB | Question | Embed & search | 5 documents |
| 2B | Embedding Model | Question text | Convert to vector | 768-dim vector |
| 2C | Groq LLM | Q + 5 docs | API call | Generated answer |
| 3 | TRACEEvaluator | Q, response, docs | Compute metrics | TRACe scores |
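The TRACEEvaluator itself is not shown in this document. To make its inputs and outputs concrete, here is a rough token-overlap approximation of two TRACe-style scores (utilization and completeness); the real evaluator likely uses a more sophisticated method, such as an LLM judge:

```python
# Illustrative sketch only -- NOT the actual TRACEEvaluator implementation.
def _tokens(text: str) -> set:
    return set(text.lower().split())

def context_utilization(response: str, documents: list) -> float:
    """Fraction of response tokens that appear in the retrieved documents."""
    resp = _tokens(response)
    ctx = set().union(*(_tokens(d) for d in documents))
    return len(resp & ctx) / len(resp) if resp else 0.0

def completeness(response: str, ground_truth: str) -> float:
    """Fraction of ground-truth tokens covered by the response."""
    truth = _tokens(ground_truth)
    return len(truth & _tokens(response)) / len(truth) if truth else 0.0

# A test case shaped like the ones collected in the evaluation loop
case = {
    "response": "machine learning is a subset of ai",
    "retrieved_documents": ["machine learning is a subset of ai that learns"],
    "ground_truth": "machine learning is a subset of ai",
}
u = context_utilization(case["response"], case["retrieved_documents"])
c = completeness(case["response"], case["ground_truth"])
# For this toy case every response token is grounded and every
# ground-truth token is covered, so both scores are 1.0.
```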
---

## Where the LLM Gets Called

**PRIMARY LLM CALL LOCATION**: `llm_client.py`, function `GroqLLMClient.generate()` (Line 110)

**TRIGGERED BY**:
1. Chat interface: `Chat tab → query → generate()`
2. Evaluation: `run_evaluation() → rag_pipeline.query() → generate_with_context() → generate()`

**DURING EVALUATION SPECIFICALLY**:
- Called **once per test question** (e.g., 10 times for 10 test samples)
- Each call:
  - Gets a unique question
  - Retrieves 5 relevant documents
  - Asks the Groq LLM to answer using those documents
  - Stores the result for TRACe metric computation

**LLM MODEL USED**:
- Default: `llama-3.1-8b-instant` (can be switched in the UI)
- Also available: `meta-llama/llama-4-maverick-17b-128e-instruct`, `openai/gpt-oss-120b`
- Provider: **Groq** (cloud-based GPU inference)