# Evaluation: LLM Calling & Query Processing Flow
## High-Level Overview
```
EVALUATION PROCESS:
β”‚
β”œβ”€ Load Test Data from Dataset
β”‚ └─ Questions + Ground Truth Answers
β”‚
β”œβ”€ FOR EACH TEST QUESTION:
β”‚ β”‚
β”‚ β”œβ”€ 1. RETRIEVE DOCUMENTS (Vector Search)
β”‚ β”‚ β”‚
β”‚ β”‚ └─ query_text β†’ embed_query() β†’ semantic search β†’ get_retrieved_documents()
β”‚ β”‚
β”‚ β”œβ”€ 2. GENERATE RESPONSE (LLM Call)
β”‚ β”‚ β”‚
β”‚ β”‚ └─ query + documents β†’ LLM β†’ response
β”‚ β”‚
β”‚ └─ 3. STORE TEST CASE (For Evaluation)
β”‚ └─ {query, response, documents, ground_truth}
β”‚
β”œβ”€ COMPUTE TRACe METRICS
β”‚ └─ utilization, relevance, adherence, completeness
β”‚
└─ DISPLAY RESULTS
```
---
## Detailed Flow: Query Processing in Evaluation
### **Step 1: Test Sample Loop** (streamlit_app.py, Line 723)
```python
for i, sample in enumerate(test_data):
    # sample = {"question": "...", "answer": "...", ...}

    # Step 2: Call RAG pipeline with the question
    result = st.session_state.rag_pipeline.query(
        sample["question"],   # ← Query string
        n_results=5           # ← How many docs to retrieve
    )
```
**Input**:
- `sample["question"]` = User question from RAGBench dataset
- Example: "What is machine learning?"
- `n_results=5` = Retrieve top 5 most similar documents
---
### **Step 2: RAG Pipeline Query** (llm_client.py, Line 295)
```python
class RAGPipeline:
    def query(self, query: str, n_results: int = 5) -> Dict:
        # β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        # β”‚ PHASE 1: RETRIEVAL (Vector Search)                   β”‚
        # β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

        # STEP 1: Call vector store to retrieve documents
        retrieved_docs = self.vector_store.get_retrieved_documents(
            query,                # "What is machine learning?"
            n_results=n_results   # Top n_results documents (default 5)
        )
        # Result: [
        #   {"document": "ML is...", "metadata": {...}, "distance": 0.12},
        #   {"document": "Machine learning uses...", "metadata": {...}, "distance": 0.15},
        #   ...
        # ]

        # Extract document texts
        doc_texts = [doc["document"] for doc in retrieved_docs]
        # doc_texts = ["ML is...", "Machine learning uses...", ...]

        # β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        # β”‚ PHASE 2: GENERATION (LLM Call)                       β”‚
        # β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

        # STEP 2: Call LLM with query + retrieved documents
        response = self.llm.generate_with_context(
            query,       # "What is machine learning?"
            doc_texts,   # ["ML is...", "Machine learning uses...", ...]
            max_tokens=1024,
            temperature=0.7
        )
        # response = "Machine learning is a subset of artificial intelligence..."

        # STEP 3: Package results
        return {
            "query": query,
            "response": response,
            "retrieved_documents": retrieved_docs
        }
```
---
### **Step 3A: Document Retrieval (Vector Store)** (vector_store.py, Line 321)
```
Query Processing:
USER QUESTION:
"What is machine learning?"
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. Embed the Query β”‚
β”‚ ────────────────────────────────── β”‚
β”‚ embedding_model.embed_query(query) β”‚
β”‚ β”‚
β”‚ Model: sentence-transformers/ β”‚
β”‚ all-mpnet-base-v2 β”‚
β”‚ β”‚
β”‚ Query String (tokens): β”‚
β”‚ "What" β†’ [0.1, 0.2, ...] β”‚
β”‚ "is" β†’ [0.3, 0.4, ...] β”‚
β”‚ "machine" β†’ [0.5, 0.6, ...] β”‚
β”‚ "learning" β†’ [0.7, 0.8, ...] β”‚
β”‚ β”‚
β”‚ Output: Query Vector [768-dim] β”‚
β”‚ ↓ β”‚
β”‚ [0.15, 0.32, 0.51, ..., 0.89] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. Semantic Search in ChromaDB β”‚
β”‚ ──────────────────────────────────────── β”‚
β”‚ β”‚
β”‚ collection.query( β”‚
β”‚ query_embeddings=[query_vector], β”‚
β”‚ n_results=5, β”‚
β”‚ where=None β”‚
β”‚ ) β”‚
β”‚ β”‚
β”‚ Compare query_vector against all doc β”‚
β”‚ vectors in the collection using β”‚
β”‚ cosine similarity β”‚
β”‚ β”‚
β”‚ Scoring: similarity = dot_product/ β”‚
β”‚ (norm_a * norm_b) β”‚
β”‚ β”‚
β”‚ Top 5 Results (sorted by similarity): β”‚
β”‚ β€’ Doc 1: "ML is a field..." (sim: 0.92) β”‚
β”‚ β€’ Doc 2: "Deep learning..." (sim: 0.89) β”‚
β”‚ β€’ Doc 3: "Neural networks..." (sim: 0.87) β”‚
β”‚ β€’ Doc 4: "AI overview..." (sim: 0.81) β”‚
β”‚ β€’ Doc 5: "Training data..." (sim: 0.78) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. Format Retrieved Documents β”‚
β”‚ ────────────────────────────────── β”‚
β”‚ retrieved_docs = [ β”‚
β”‚ { β”‚
β”‚ "document": "ML is a field...",β”‚
β”‚ "metadata": {...}, β”‚
β”‚ "distance": 0.08 β”‚
β”‚ }, β”‚
β”‚ {...}, β”‚
β”‚ ... β”‚
β”‚ ] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
RETURNED TO RAGPipeline
```
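The cosine-similarity scoring in step 2 can be sketched independently of ChromaDB. The toy 4-dimensional vectors below are illustrative stand-ins for the real 768-dimensional embeddings; this is a minimal sketch of the ranking math, not the project's actual search code:

```python
import math

def cosine(a, b):
    """similarity = dot_product / (norm_a * norm_b), as in the diagram above."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_vec, doc_vecs, k=5):
    """Rank document vectors by cosine similarity to the query vector."""
    sims = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    sims.sort(key=lambda t: -t[1])   # highest similarity first
    return sims[:k]

# Toy 4-dim vectors standing in for real 768-dim embeddings
query = [0.1, 0.9, 0.2, 0.0]
docs = [
    [0.1, 0.8, 0.3, 0.1],  # close to the query
    [0.9, 0.1, 0.0, 0.2],  # unrelated
    [0.2, 0.7, 0.1, 0.0],  # close to the query
]
print(top_k_by_cosine(query, docs, k=2))  # the unrelated vector is ranked out
```

ChromaDB reports *distance* rather than similarity, which is why the formatted results above show `distance: 0.08` for a document with similarity β‰ˆ 0.92.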
---
### **Step 3B: LLM Response Generation** (llm_client.py, Line 215)
```
Retrieved Documents:
β”‚
β”œβ”€ Doc1: "ML is a field of AI that..."
β”œβ”€ Doc2: "Machine learning uses algorithms..."
β”œβ”€ Doc3: "Neural networks process data..."
β”œβ”€ Doc4: "Training data is essential..."
└─ Doc5: "Deep learning is a subset..."
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. BUILD PROMPT β”‚
β”‚ ────────────────────────────────────────────────────── β”‚
β”‚ β”‚
β”‚ context = """ β”‚
β”‚ Document 1: ML is a field of AI that... β”‚
β”‚ Document 2: Machine learning uses algorithms... β”‚
β”‚ Document 3: Neural networks process data... β”‚
β”‚ Document 4: Training data is essential... β”‚
β”‚ Document 5: Deep learning is a subset... β”‚
β”‚ """ β”‚
β”‚ β”‚
β”‚ prompt = """ β”‚
β”‚ Answer the following question based on the provided β”‚
β”‚ context. β”‚
β”‚ β”‚
β”‚ Context: β”‚
β”‚ {context} β”‚
β”‚ β”‚
β”‚ Question: What is machine learning? β”‚
β”‚ β”‚
β”‚ Answer: β”‚
β”‚ """ β”‚
β”‚ β”‚
β”‚ system_prompt = "You are a helpful AI assistant..." β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. LLM API CALL (Groq) β”‚
β”‚ ────────────────────────────────────────────────────── β”‚
β”‚ β”‚
β”‚ Client: Groq (groq.com) β”‚
β”‚ Model: llama-3.1-8b-instant (or selected model) β”‚
β”‚  API Endpoint: https://api.groq.com/openai/v1/       β”‚
β”‚                chat/completions                      β”‚
β”‚ β”‚
β”‚ Request: β”‚
β”‚ { β”‚
β”‚ "model": "llama-3.1-8b-instant", β”‚
β”‚ "messages": [ β”‚
β”‚ { β”‚
β”‚ "role": "system", β”‚
β”‚ "content": "You are a helpful..." β”‚
β”‚ }, β”‚
β”‚ { β”‚
β”‚ "role": "user", β”‚
β”‚ "content": "[full prompt above]" β”‚
β”‚ } β”‚
β”‚ ], β”‚
β”‚ "max_tokens": 1024, β”‚
β”‚ "temperature": 0.7 β”‚
β”‚ } β”‚
β”‚ β”‚
β”‚  Where the LLM processing happens:                   β”‚
β”‚ β†’ Groq's GPU servers (not local) β”‚
β”‚ β†’ Model processes entire prompt β”‚
β”‚ β†’ Generates response token-by-token β”‚
β”‚ β†’ Returns complete response β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. PARSE LLM RESPONSE β”‚
β”‚ ────────────────────────────────────────────────────── β”‚
β”‚ β”‚
β”‚ Response Text: β”‚
β”‚ "Machine learning is a field of artificial β”‚
β”‚ intelligence that enables computers to learn from β”‚
β”‚ data without being explicitly programmed..." β”‚
β”‚ β”‚
β”‚ Extract: response.choices[0].message.content β”‚
β”‚ Return: Final Answer String β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
RETURNED TO RAGPipeline
```
---
## Complete Code Flow for One Evaluation Query
### **File: streamlit_app.py** (Lines 723-730)
```python
# FOR EACH TEST QUESTION IN THE DATASET:
for i, sample in enumerate(test_data):
    # sample["question"] = "What is machine learning?"
    # sample["answer"]   = "ML is a subset of AI..."

    # β˜… STEP 1: CALL RAG PIPELINE β˜…
    result = st.session_state.rag_pipeline.query(
        sample["question"],   # Pass question
        n_results=5           # Get top 5 docs
    )
    # Returns:
    # {
    #   "query": "What is machine learning?",
    #   "response": "Machine learning is...",
    #   "retrieved_documents": [
    #     {"document": "...", "metadata": {...}, ...},
    #     ...
    #   ]
    # }

    # β˜… STEP 2: EXTRACT RESULTS β˜…
    test_cases.append({
        "query": sample["question"],
        "response": result["response"],
        "retrieved_documents": [
            doc["document"] for doc in result["retrieved_documents"]
        ],
        "ground_truth": sample.get("answer", "")
    })
```
### **File: llm_client.py** (RAGPipeline class, Lines 295-340)
```python
class RAGPipeline:
    def query(self, query: str, n_results: int = 5) -> Dict:
        # β˜… STEP 2A: RETRIEVE DOCUMENTS β˜…
        # Where: vector_store.py β†’ get_retrieved_documents()
        retrieved_docs = self.vector_store.get_retrieved_documents(
            query,                # "What is machine learning?"
            n_results=n_results   # default 5
        )

        # β˜… STEP 2B: EXTRACT DOCUMENT TEXTS β˜…
        doc_texts = [doc["document"] for doc in retrieved_docs]
        # doc_texts = [
        #   "Machine learning is a subset of AI...",
        #   "Deep learning uses neural networks...",
        #   ...
        # ]

        # β˜… STEP 2C: CALL LLM β˜…
        # Where: llm_client.py β†’ generate_with_context()
        response = self.llm.generate_with_context(
            query,       # "What is machine learning?"
            doc_texts,   # [retrieved document texts]
            max_tokens=1024,
            temperature=0.7
        )
        # response = "Machine learning is a field of AI..."

        # β˜… STEP 2D: RETURN RESULTS β˜…
        return {
            "query": query,
            "response": response,
            "retrieved_documents": retrieved_docs
        }
```
### **File: vector_store.py** (ChromaDBManager class, Lines 370-400)
```python
def get_retrieved_documents(self, query_text: str, n_results: int = 5):
    # β˜… STEP 3A-1: QUERY THE COLLECTION β˜…
    # Where: vector_store.py β†’ query()
    results = self.query(query_text, n_results)
    # results = {
    #   'documents': [[doc1, doc2, doc3, doc4, doc5]],
    #   'metadatas': [[meta1, meta2, ...]],
    #   'distances': [[dist1, dist2, ...]],
    #   'ids': [[id1, id2, ...]]
    # }

    # β˜… STEP 3A-2: FORMAT RESULTS β˜…
    retrieved_docs = []
    for i in range(len(results['documents'][0])):
        retrieved_docs.append({
            "document": results['documents'][0][i],
            "metadata": results['metadatas'][0][i],
            "distance": results['distances'][0][i]
        })
    # retrieved_docs = [
    #   {"document": "ML is...", "metadata": {...}, "distance": 0.08},
    #   {"document": "Deep...", "metadata": {...}, "distance": 0.11},
    #   ...
    # ]
    return retrieved_docs
```
### **File: llm_client.py** (GroqLLMClient class, Lines 215-250)
```python
def generate_with_context(self, query: str, context_documents: List[str],
                          max_tokens: int = 1024, temperature: float = 0.7) -> str:
    # β˜… STEP 3B-1: BUILD CONTEXT STRING β˜…
    context = "\n\n".join([
        f"Document {i+1}: {doc}"
        for i, doc in enumerate(context_documents)
    ])
    # context = """
    # Document 1: ML is a field of AI that...
    # Document 2: Machine learning uses algorithms...
    # ...
    # """

    # β˜… STEP 3B-2: BUILD PROMPT β˜…
    prompt = f"""Answer the following question based on the provided context.

Context:
{context}

Question: {query}

Answer:"""

    system_prompt = "You are a helpful AI assistant..."

    # β˜… STEP 3B-3: CALL LLM (GROQ API) β˜…
    # Where: llm_client.py β†’ generate()
    return self.generate(prompt, max_tokens=max_tokens,
                         temperature=temperature, system_prompt=system_prompt)
```
### **File: llm_client.py** (GroqLLMClient.generate(), Lines 110-155)
```python
def generate(self, prompt: str, max_tokens: int, temperature: float, system_prompt: str):
    # β˜… STEP 3B-4: PREPARE GROQ API CALL β˜…

    # Apply rate limiting (max 30 requests per minute)
    self.rate_limiter.acquire_sync()

    # Build messages for Groq API
    messages = []
    if system_prompt:
        messages.append({
            "role": "system",
            "content": system_prompt
        })
    messages.append({
        "role": "user",
        "content": prompt
    })

    # β˜… STEP 3B-5: MAKE GROQ API REQUEST β˜…
    try:
        response = self.client.chat.completions.create(
            model=self.model_name,      # e.g., "llama-3.1-8b-instant"
            messages=messages,
            max_tokens=max_tokens,      # 1024
            temperature=temperature     # 0.7
        )
        # β˜… STEP 3B-6: EXTRACT RESPONSE β˜…
        return response.choices[0].message.content
        # Returns: "Machine learning is a field of artificial intelligence..."
    except Exception as e:
        return f"Error: {str(e)}"
```
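The `rate_limiter.acquire_sync()` call above enforces the 30-requests-per-minute cap before each Groq API request. The RateLimiter class itself is not shown in this document; here is a minimal sliding-window sketch of how such a limiter can work (the class name and internals below are illustrative, not the project's actual implementation):

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Allow at most `max_calls` calls per `window_s`-second sliding window (sketch)."""

    def __init__(self, max_calls: int = 30, window_s: float = 60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = deque()  # monotonic timestamps of recent calls

    def acquire_sync(self):
        """Block until a call slot is free, then claim it."""
        while True:
            now = time.monotonic()
            # Evict timestamps that have aged out of the window
            while self._calls and now - self._calls[0] >= self.window_s:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest call expires
            time.sleep(self.window_s - (now - self._calls[0]))

limiter = SlidingWindowRateLimiter(max_calls=30, window_s=60.0)
limiter.acquire_sync()  # first call returns immediately
```

With `max_calls=30` and `window_s=60.0`, the 31st call inside any 60-second span sleeps until the oldest of the previous 30 calls falls out of the window.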
---
## Summary of Query Processing in Evaluation
| Step | Component | Input | Process | Output |
|------|-----------|-------|---------|--------|
| 1 | Streamlit UI | Test sample | Load from dataset | Question |
| 2 | RAGPipeline | Question | Orchestrate RAG | Response |
| 2A | ChromaDB | Question | Embed & search | 5 documents |
| 2B | Embedding Model | Question text | Convert to vector | 768-dim vector |
| 2C | Groq LLM | Q + 5 docs | API call | Generated answer |
| 3 | TRACEEvaluator | Q, response, docs | Compute metrics | TRACe scores |
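The TRACEEvaluator in the last row is not detailed in this document. As an illustration of what the four TRACe dimensions measure, here is a deliberately simplified stand-in that scores them with plain token overlap; the real evaluator is more sophisticated, and this sketch is only meant to show the inputs and the meaning of each score:

```python
def _tokens(text: str) -> set:
    return set(text.lower().split())

def trace_sketch(query: str, response: str, documents: list, ground_truth: str) -> dict:
    """Token-overlap stand-ins for the four TRACe dimensions (illustration only)."""
    q, r, gt = _tokens(query), _tokens(response), _tokens(ground_truth)
    d = set().union(*(_tokens(doc) for doc in documents)) if documents else set()
    return {
        # utilization: fraction of retrieved-context tokens that appear in the response
        "utilization": len(d & r) / len(d) if d else 0.0,
        # relevance: fraction of query tokens covered by the retrieved context
        "relevance": len(q & d) / len(q) if q else 0.0,
        # adherence: fraction of response tokens grounded in the retrieved context
        "adherence": len(r & d) / len(r) if r else 0.0,
        # completeness: fraction of ground-truth tokens covered by the response
        "completeness": len(gt & r) / len(gt) if gt else 0.0,
    }

scores = trace_sketch(
    "what is machine learning",
    "machine learning is a field of ai",
    ["machine learning is a field of ai that learns from data"],
    "machine learning is a subset of ai",
)
```

Each score lands in [0, 1]; a response copied verbatim from the retrieved context would score adherence = 1.0 under this stand-in.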
---
## Where LLM Gets Called
**PRIMARY LLM CALL LOCATION**: `llm_client.py`, function `GroqLLMClient.generate()` (Line 110)
**TRIGGERED BY**:
1. Chat interface: `Chat tab β†’ query β†’ generate()`
2. Evaluation: `run_evaluation() β†’ rag_pipeline.query() β†’ generate_with_context() β†’ generate()`
**DURING EVALUATION SPECIFICALLY**:
- Called **once per test question** (e.g., 10 times for 10 test samples)
- Each call:
- Gets a unique question
- Retrieves 5 relevant documents
- Asks Groq LLM to answer using those documents
- Stores result for TRACe metric computation
**LLM MODEL USED**:
- Default: `llama-3.1-8b-instant` (can be switched in UI)
- Also available: `meta-llama/llama-4-maverick-17b-128e-instruct`, `openai/gpt-oss-120b`
- Provider: **Groq** (cloud-based GPU inference)