Spaces:
Running
Running
| # State-of-the-Art (SOTA) RAG Retrieval Data Flow | |
| This document details the end-to-end data flow of the News Pipeline RAG API, incorporating advanced patterns for accuracy, diversity, and production resilience. | |
| ## 1. Pre-Processing & Infrastructure (The "Cold-Start" Layer) | |
| To ensure **zero-latency** during the initial user interaction, the system implements a preemptive resource loading strategy. | |
| ### A. Async Pre-warming (Hidden Latency Absorption) | |
| - **Challenge**: Large Transformer models (like BGE-M3 and Cross-Encoders) typically take 5–15 seconds to load from disk to RAM/VRAM. Lazy-loading these on the first request creates an unacceptable user experience. | |
| - **Process**: | |
| - In `main.py`, the `@app.on_event("startup")` hook triggers a non-blocking `threading.Thread`. | |
| - This background thread immediately initializes `EmbedderService` and `RerankerService`. | |
| - By the time the web server is live and the user types their first query, the models are fully resident in memory, resulting in sub-second response times for the very first request. | |
| ### B. Circuit Breaker: ClickHouse Fallback (Always-On Reliability) | |
| - **Challenge**: Vector databases like Qdrant can occasionally experience network partitions or downtime. In a naive RAG, this would crash the conversation. | |
| - **Process**: | |
| - The `VectorStore.search` method is wrapped in a robust `try-except` block. | |
| - If the Qdrant client connection fails or a timeout occurs, the **Circuit Breaker** trips. | |
| - The system automatically redirects the query to `fallback_keyword_search()` in ClickHouse. | |
| - **Mechanism**: It performs a rapid SQL-based keyword search on titles and content in the `sentiment_results` table. While less semantically accurate than vectors, it ensures the user receives actual relevant news articles instead of a "Service Unavailable" error. | |
| ## 2. Request Phase (Conversational Logic) | |
| ### Step A: Query Transformation (Contextual Synthesis) | |
| **Purpose**: Bridging the gap between human conversation and vector search requirements. | |
| - **The Problem**: Users often ask relative questions like *"What about their stock?"*. Vector databases cannot resolve "their" without context. | |
| - **Process**: | |
| - The API retrieves the last 6 messages from PostgreSQL. | |
| - A specialized prompt instructs `GPT-4` to synthesize the conversation history and the new user query into a single **Standalone Search Query**. | |
| - If history is empty, the original query is used. | |
| - **Example Trace**: | |
| - **History**: `User: Tell me about Nvidia's revenue last year.` | |
| - **New Query**: `User: Did Intel do better?` | |
| - **Synthesized Search Query**: *"Comparison of Intel and Nvidia's revenue for the last fiscal year"* | |
| ### Step B: Intent-Based Search (Hybrid & Recency) | |
| **Purpose**: Combining semantic depth with keyword precision and news freshness. | |
| #### 1. Hybrid Vector Synthesis | |
| - **Dense Layer**: Uses `BAAI/bge-m3` to produce a 1024-dimensional semantic embedding. This handles "vibe" and "concept" matching (e.g., matching "financial struggle" to "bankruptcy"). | |
| - **Sparse Layer**: Prepares slots for keyword-specific vectors (e.g., Splade or BGE-M3 Sparse). This handles exact entities, ticker symbols (e.g., "NVDA"), or specific dates that dense embeddings might blur. | |
| #### 2. Temporal Decay (Recency Boosting) | |
| - **Logic**: News is a deteriorating asset. The system applies a **Recency Multiplier** during the retrieval collection phase. | |
| - **Formula**: `Score = Base_Similarity * (1.0 - (days_old / 60))`. | |
| - **Constraint**: The multiplier never drops below `0.5`, ensuring that very relevant historical news is still retrievable but newer coverage is naturally prioritized. | |
| - **Example**: | |
| - Article A (Identical match, 60 days old): `Final Score = 0.9 * 0.5 = 0.45` | |
| - Article B (Close match, today): `Final Score = 0.8 * 1.0 = 0.8` | |
| - **Result**: Article B is ranked higher despite slightly lower semantic similarity. | |
| ## 3. Retrieval Refinement (The "Precision" Layer) | |
| ### Step C: Cross-Encoder Reranking (Relevance Grading) | |
| **Purpose**: Moving from "Bi-Encoder" (fast but broad) to "Cross-Encoder" (slow but highly accurate). | |
| - **The Problem**: Dense embeddings (Bi-Encoders) are great at finding "similar" text but often struggle with fine-grained nuances or contradictory statements. | |
| - **Process**: | |
| - The system takes the **Top 20** results from the broad search. | |
| - Each [Query, Chunk] pair is passed through the `CrossEncoder` model (`ms-marco-TinyBERT-L-2-v2`). | |
| - The model produces a raw relevance score. This is significantly more accurate than pure cosine similarity from the vector search. | |
| ### Step D: Diversity Filtering - MMR (Information Density) | |
| **Purpose**: Preventing "Echo Chambers" or redundant context windows. | |
| - **The Problem**: Five news articles starting with the same AP wire sentence will fill the LLM context with redundant text. | |
| - **Process**: | |
| - Implemented **Maximal Marginal Relevance (MMR)**. | |
| - Logic selects documents that have high relevance but **low similarity** to already selected documents. | |
| - **Example**: | |
| - *Selection 1*: A factual report of a merger. | |
| - *Selection 2 (Rejected)*: Another factual report of the same merger. | |
| - *Selection 2 (Accepted)*: A financial analyst's opinion on the same merger. | |
| ### Step E: Parent Document Retrieval (Context Expansion) | |
| **Purpose**: Providing the "Full Picture" when a snippet isn't enough. | |
| - **Process**: | |
| - Small chunks (~500 chars) are indexed for surgical search accuracy. | |
| - If a chunk's rerank score is **> 0.8**, its unique `doc_id` is used to fetch the full parent article body from ClickHouse/Qdrant. | |
| - This allows the LLM to see the surrounding context that might have been lost in the chunking process. | |
| --- | |
| ## 4. Generation & Enrichment | |
| ### Step F: ClickHouse Trend Fusion (External Intelligence) | |
| **Purpose**: Grounding the LLM in real-time metadata. | |
| - **Process**: | |
| - Parallel to the LLM call, the system queries the **ClickHouse Data Warehouse**. | |
| - It extracts trending entities and sentiment scores for the last 3 days relevant to the query. | |
| - This "Trend Knowledge" is injected into the system prompt. | |
| - **Benefit**: The LLM can say: *"Retrieval articles show X, but ClickHouse trends show that sentiment for this topic is currently shifting negative."* | |
| ### Step G: Streaming Generation - SSE (Real-Time UX) | |
| **Purpose**: Minimizing "Perceived Latency". | |
| - **Process**: | |
| - Uses FastAPI `StreamingResponse` and Server-Sent Events (SSE). | |
| - Instead of waiting 5 seconds for a full paragraph, the first token is displayed within **200-400ms**. | |
| - Tokens are pushed to the client in real-time as the LLM predicts them. | |
| --- | |
| ## 5. Traceability & Feedback Loop | |
| ### Step H: Interaction Logging (Audit Trail) | |
| - **Traceability**: Every AI response logs the exact list of `retrieved_doc_ids` (Source IDs) in PostgreSQL. | |
| - **Learning Loop**: When a user gives a "Thumbs Down", developers can query the database to see exactly which sources were used. This allows for **Negative Sampling** (identifying which articles cause hallucination or bad answers). | |
| --- | |
| ## Technical Stack Overview | |
| | Stage | Tool/Model | | |
| | :--- | :--- | | |
| | **Embeddings** | `BAAI/bge-m3` (BAAI) | | |
| | **Reranking** | `ms-marco-TinyBERT-L-2-v2` (CrossEncoder) | | |
| | **Diversity** | Custom MMR Implementation | | |
| | **Vector DB** | Qdrant | | |
| | **Data Warehouse**| ClickHouse | | |
| | **Token Control** | `tiktoken` (cl100k_base) | | |
| | **LLM** | OpenAI `gpt-4` | | |
| --- | |
| ## Full Data Flow Visual | |
| ```mermaid | |
| graph TD | |
| User((User)) -->|Query| API[RAG API] | |
| API -->|Prompt| LLM_Rewriter[LLM Rewriter] | |
| LLM_Rewriter -->|Standalone Query| API | |
| API -.->|Circuit Breaker Check| VDB{Qdrant Online?} | |
| VDB -->|No| CH_FB[ClickHouse Keyword Fallback] | |
| VDB -->|Yes| V_Search[Hybrid Vector Search] | |
| V_Search -->|Top 20| Rerank[Cross-Encoder Reranker] | |
| Rerank -->|Diversity Pass| MMR[MMR Filter] | |
| MMR -->|Top K| Parent_Fetch[Parent Doc Retrieval] | |
| Parent_Fetch -->|Context| Prompt_Build[Prompt Construction] | |
| Prompt_Build -->|Inject| CH_Trends[ClickHouse Trends] | |
| CH_Trends -->|Full Prompt| LLM_Stream[LLM Streaming] | |
| LLM_Stream -->|SSE Tokens| User | |
| LLM_Stream -->|Trace| Postgres[(Interaction DB)] | |
| ``` | |