MissSqui committed
Commit 2d6c260 · verified · 1 Parent(s): 58fe6ee

Create abc12

Files changed (1)
  1. abc12 +100 -0
abc12 ADDED
@@ -0,0 +1,100 @@
+ RAG Response Evaluation Strategy Documentation
+ Overview:
+ This document explains the rationale behind evaluating RAG (Retrieval-Augmented Generation) responses with a set of NLP metrics.
+ What is Needed:
+ - PDF document text
+ - Retrieved chunks from the retriever (top-k)
+ - Relevant chunks (manually identified or labeled)
+ - User's question
+ - Generated answer (from the LLM)
+ Evaluation Metrics Used:
+ 1. BLEU:
+ Measures n-gram overlap between the reference and generated text. A rough proxy for factual correctness.
+ 2. ROUGE-L:
+ Measures recall-oriented overlap based on the longest common subsequence. Well suited to summarization-style responses.
+ 3. Cosine Similarity:
+ Computes semantic similarity between embeddings of the reference and generated text using sentence-transformers (a sketch follows this list).
+ 4. Perplexity:
+ Indicates how fluent, i.e. how unsurprising, the text is to a language model. Lower is better (a sketch follows this list).
+ 5. Precision@K:
+ The fraction of the top-K retrieved chunks that are relevant (a combined sketch for metrics 5-8 follows this list).
+ 6. Recall@K:
+ The fraction of all relevant chunks that appear in the top-K results.
+ 7. nDCG@K:
+ Rewards relevant chunks more heavily when they are ranked higher.
+ 8. HIT@K:
+ A simple check of whether at least one relevant chunk is retrieved in the top-K.
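+
+ A minimal sketch of the cosine-similarity metric, assuming the sentence-transformers package; the checkpoint name "all-MiniLM-L6-v2" is an illustrative choice, not one mandated by this document:
+
+ from sentence_transformers import SentenceTransformer, util
+
+ # Assumed checkpoint; any sentence-transformers model can be substituted.
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+
+ def semantic_similarity(reference: str, generated: str) -> float:
+     # Embed both texts and return the cosine similarity of the embeddings.
+     emb = model.encode([reference, generated])
+     return util.cos_sim(emb[0], emb[1]).item()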
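+
+ A sketch of perplexity scoring, assuming the Hugging Face transformers package; the "gpt2" scoring model is an assumption, since this document does not name one:
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ def perplexity(text: str, model_name: str = "gpt2") -> float:
+     tok = AutoTokenizer.from_pretrained(model_name)
+     model = AutoModelForCausalLM.from_pretrained(model_name)
+     ids = tok(text, return_tensors="pt").input_ids
+     with torch.no_grad():
+         # loss is the mean token-level cross-entropy; exp(loss) is perplexity.
+         loss = model(ids, labels=ids).loss
+     return torch.exp(loss).item()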
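+
+ A self-contained sketch of the four retrieval metrics (Precision@K, Recall@K, nDCG@K with binary relevance, and HIT@K); representing chunks by ids is an assumption:
+
+ import math
+
+ def retrieval_metrics(retrieved_ids: list, relevant_ids: set, k: int) -> dict:
+     # retrieved_ids: chunk ids in rank order; relevant_ids: labeled relevant ids.
+     hits = [1.0 if cid in relevant_ids else 0.0 for cid in retrieved_ids[:k]]
+     # Binary-relevance DCG over the ranked list, normalized by the ideal DCG.
+     dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(hits))
+     ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant_ids))))
+     return {
+         "precision@k": sum(hits) / k,
+         "recall@k": sum(hits) / len(relevant_ids) if relevant_ids else 0.0,
+         "ndcg@k": dcg / ideal if ideal > 0 else 0.0,
+         "hit@k": 1.0 if any(hits) else 0.0,
+     }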
+ ------------------------------------------------------------------
+ Proposed Enhancements to Existing RAG Pipeline
+
+ 1. RAG Evaluation Metric
+
+ Introduce a comprehensive metric to evaluate the performance of the RAG system.
+
+ Proposed Approach
+
+ Composite Score: weighted components that reflect both retrieval and generation quality (a sketch follows this section).
+
+ HITs Score: use hit-rate (HIT@K) scoring, as defined above, to measure the relevance and accuracy of the retrieved documents.
+
+ Additional components (TBD): may include BLEU, ROUGE, or semantic similarity for generation quality.
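+
+ A sketch of the proposed composite score; the equal default weights are placeholders (the weighting is TBD in this proposal), and the component scores are assumed to be non-empty and normalized to [0, 1]:
+
+ def composite_rag_score(retrieval_scores: dict, generation_scores: dict,
+                         w_retrieval: float = 0.5, w_generation: float = 0.5) -> float:
+     # Average each metric family, then combine with the (placeholder) weights.
+     retrieval = sum(retrieval_scores.values()) / len(retrieval_scores)
+     generation = sum(generation_scores.values()) / len(generation_scores)
+     return w_retrieval * retrieval + w_generation * generation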
+
+ 2. Summarization Response Optimization
+
+ Current Approach
+
+ The final summary is generated by aggregating summaries of all retrieved chunks, leading to high latency and increased compute cost.
+
+ Proposed Optimizations
+
+ 2.1 Top-K Chunk Summarization
+
+ Limit summarization to the top_k most relevant chunks (based on similarity or retrieval score), as sketched below.
+
+ Reduces the number of summaries → lower inference time.
+
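+ A minimal sketch of the top-k selection step; representing chunks as (text, score) pairs is an assumption about the retriever's output:
+
+ def select_top_k(chunks: list, k: int) -> list:
+     # chunks: (text, retrieval_score) pairs as returned by the retriever.
+     ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
+     return [text for text, _ in ranked[:k]]
+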
+ 2.2 Parallel Processing
+
+ Utilize ThreadPoolExecutor to parallelize summarization of the individual chunks.
+
+ Each chunk is processed by a worker thread → improves throughput, especially when chunk summarization is I/O-bound (e.g., calls to a model-serving API).
+
+ Implementation sketch (Python):
+
+ from concurrent.futures import ThreadPoolExecutor
+
+ # summarize() and aggregated_summary() are the pipeline's existing helpers.
+ def summarize_chunk(chunk):
+     # Apply the summarization logic to a single chunk.
+     return summarize(chunk)
+
+ def parallel_summarize(top_k_chunks, num_workers):
+     # A worker thread summarizes each chunk; map() preserves the input order.
+     with ThreadPoolExecutor(max_workers=num_workers) as pool:
+         summaries = list(pool.map(summarize_chunk, top_k_chunks))
+     # Merge the per-chunk summaries into the final response.
+     return aggregated_summary(summaries)
+
+ Summary of Benefits
+
+ Improved Evaluation: a quantifiable metric to track RAG effectiveness.
+
+ Performance Gains: reduced response time and compute overhead.
+
+ Scalability: efficient parallel processing supports production-grade usage.