	# Presentation Outline: Conversational Intelligence
	## The SOTA RAG API & News Retrieval Flow

	This document is optimized for AI PPT Generators. It contains 12 detailed slides covering the RAG Technology Stack and the request-to-response data flow.

	---

	### Slide 1: Title Slide
	* Headline: Conversational Intelligence: Deep Dive into the SOTA RAG API
	* Sub-headline: Bridging Natural Language and Real-Time News Data Warehouse
	* Visual Suggestion: A glowing brain icon connected to a massive bookshelf (representing the Vector Store) and a lightning bolt (representing real-time trends).

	---

	### Slide 2: The RAG Tech Stack - Strategic Selection
	* Core Concept: Why these tools? A comparative advantage analysis.
	* Alternative Comparison Table:

	| Component | Our Choice | Alternatives | Competitive Advantage |
	| :--- | :--- | :--- | :--- |
	| LLM Engine | GPT-4o | Llama-3, Mistral, Claude | Superior reasoning for complex query synthesis & multilingual logic. |
	| Vector DB | Qdrant | Pinecone, Milvus, Weaviate | Native Hybrid Search support & high-speed gRPC batching protocol. |
	| Embeddings | BGE-M3 | OpenAI `text-embedding-3`, HuggingFace | Sparse + Dense in one pass; large 8192-token window. |
	| Reranker | TinyBERT CE | Cohere Rerank, BGE-Reranker | Local CPU-optimized execution with high Precision-at-K. |
	| Analytics | ClickHouse | PostgreSQL, ELK, Timescale | Sub-second OLAP performance on high-velocity news data streams. |
	| API Protocol | SSE (Stream) | WebSockets, REST, gRPC-Web | Direct HTTP/1.1 compatibility; lower overhead for one-way streams. |

	* Visual Suggestion: An "Engine Room" comparison chart where our tools are highlighted in gold.

	---

	### Slide 3: Hidden Magic - Pre-Warming & Startup
	* Core Concept: Zero-Latency "Cold Start."
	* Details:
	    * Problem: Heavy AI models take ~10s to load.
	    * Solution: Load the models in the background at server start.
	    * Benefit: The first user query in the morning is just as fast as the 100th.
	* Visual Suggestion: A "Loading Bar" that finishes before the user even arrives.
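
A minimal sketch of the pre-warming idea, assuming models are loaded by plain zero-argument functions. The `ModelRegistry` class and `slow_loader` are hypothetical names used for illustration; the source does not describe the actual startup code.

```python
import threading
import time


class ModelRegistry:
    """Loads heavy models on a background thread so startup is not blocked."""

    def __init__(self, loaders):
        # loaders: mapping of model name -> zero-argument function returning the model
        self._loaders = loaders
        self._models = {}
        self._ready = threading.Event()

    def prewarm(self):
        """Start loading all models in the background and return immediately."""
        thread = threading.Thread(target=self._load_all, daemon=True)
        thread.start()
        return thread

    def _load_all(self):
        for name, load in self._loaders.items():
            self._models[name] = load()
        self._ready.set()

    def get(self, name, timeout=30.0):
        """Return a loaded model, waiting only if pre-warming has not finished."""
        if not self._ready.wait(timeout):
            raise TimeoutError("models are still loading")
        return self._models[name]


def slow_loader():
    # Hypothetical stand-in for a model load that takes ~10 s in production.
    time.sleep(0.05)
    return "embedder-ready"


registry = ModelRegistry({"embedder": slow_loader})
registry.prewarm()                 # called once at server start
model = registry.get("embedder")   # blocks only if loading is still in progress
```

If the server has been up longer than the load time, `get` returns immediately, which is the behavior the slide describes.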

	---

	### Slide 4: Step 1 - Query Transformation (Synthesis)
	* Core Concept: Understanding "Contextual" Questions.
	* Details:
	    * Synthesis: Merging conversation history with the new query.
	    * Technique: Using GPT-4o to rewrite "What about Intel?" as a standalone query such as "Financial performance of Intel in 2024".
	* Example:
	    * History: "Tell me about Nvidia."
	    * Follow-up: "What about Intel?"
	    * Result: A standalone query about Intel that carries over the context of the Nvidia conversation.
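
One way to picture the synthesis step is a prompt that hands the LLM the conversation and asks for a standalone query. The sketch below is an assumption about the shape of that step, not the project's prompt: `build_condense_prompt`, `condense_query`, and `fake_llm` are hypothetical names, and the model call is injected as a plain function so the example runs offline.

```python
def build_condense_prompt(history, follow_up):
    """Build a prompt asking the LLM to rewrite a follow-up as a standalone query."""
    lines = [f"{role}: {text}" for role, text in history]
    return (
        "Rewrite the final user question as a standalone search query.\n"
        "Resolve pronouns and implied topics using the conversation.\n\n"
        "Conversation:\n" + "\n".join(lines) + "\n\n"
        f"Final question: {follow_up}\n"
        "Standalone query:"
    )


def condense_query(history, follow_up, llm):
    """Return a standalone query; skip the LLM call when there is no history."""
    if not history:
        return follow_up.strip()
    return llm(build_condense_prompt(history, follow_up)).strip()


def fake_llm(prompt):
    # Hypothetical stand-in for the real model call; returns a canned rewrite.
    return " Intel financial performance compared with Nvidia "


history = [("user", "Tell me about Nvidia."), ("assistant", "(summary of Nvidia news)")]
standalone = condense_query(history, "What about Intel?", fake_llm)
```

Skipping the LLM when the history is empty avoids paying for a rewrite on the first turn of a conversation.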

	---

	### Slide 5: Step 2 - Hybrid Search & Intent Recognition
	* Core Concept: Combining Concept (Dense) and Keywords (Sparse).
	* Details:
	    * Dense: Matching on meaning, the "vibe" of the query (e.g., "financial crash" matches "bankruptcy").
	    * Sparse: Matching exact terms such as tickers (e.g., "NVDA", "AAPL") or specific entities.
	* Visual Suggestion: Two searchlights (Dense and Sparse) converging on a single high-quality news article.
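
The source does not say how the dense and sparse result lists are merged. A common choice is Reciprocal Rank Fusion (RRF), sketched here as an assumption; the function name, the constant `k=60`, and the article IDs are illustrative only.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["a1", "a2", "a3"]    # semantic neighbours, best first (made-up IDs)
sparse_hits = ["a3", "a1", "a4"]   # keyword/ticker matches, best first (made-up IDs)
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
# fused == ["a1", "a3", "a2", "a4"]: articles found by both searches rise to the top
```

RRF works on ranks rather than raw scores, so dense and sparse scores do not need to be on the same scale.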

	---

	### Slide 6: Step 3 - Temporal Decay (Recency Boosting)
	* Core Concept: News Freshness Matters.
	* Details:
	    * Logic: Today's 80% match is better than last year's 100% match.
	    * Mechanism: Applying a score penalty that grows with article age during the search phase.
	* Example: A fresh report on a merger ranks higher than a "deep dive" from 6 months ago.
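
The exact penalty is not given in the source. A simple possibility is exponential decay with a half-life, sketched below; the 30-day half-life and the function name `decayed_score` are assumptions chosen for illustration.

```python
import math


def decayed_score(similarity, age_days, half_life_days=30.0):
    """Multiply similarity by an exponential decay that halves every half_life_days."""
    return similarity * math.exp(-math.log(2) * age_days / half_life_days)


fresh = decayed_score(0.80, age_days=0)      # today's 80% match keeps its score
stale = decayed_score(1.00, age_days=365)    # last year's 100% match is heavily penalized
```

With these assumed settings, the fresh article outranks the year-old one, matching the slide's "80% beats 100%" logic. A longer half-life would make the penalty gentler.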

	---

	### Slide 7: Step 4 - Precision Reranking (Cross-Encoder)
	* Core Concept: From "Fast Search" to "Exact Grade."
	* Details:
	    * Moving from Bi-Encoders (fast, broad) to Cross-Encoders (slow, ultra-accurate).
	    * Checking the Top 20 results one-by-one to ensure they actually answer the question.
	* Example: Eliminating articles that mention the keywords but are actually about a different topic.
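
The reranking loop itself is small: score every (query, candidate) pair jointly, sort, and keep the best few. In this sketch the scorer is injected, and `toy_score` is a deliberately crude, hypothetical stand-in for a real cross-encoder model so the example runs without downloading weights.

```python
def rerank(query, candidates, score_pair, top_k=5):
    """Score each (query, text) pair jointly and keep the top_k best candidates."""
    scored = [(score_pair(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def toy_score(query, text):
    # Hypothetical stand-in for a cross-encoder: share of query words found in the text.
    words = query.lower().split()
    return sum(word in text.lower() for word in words) / len(words)


candidates = [
    "Intel earnings beat expectations",
    "Apple harvest prices fall",
    "Apple unveils new iPhone",
]
top = rerank("apple iphone launch", candidates, toy_score, top_k=2)
```

In a real deployment `score_pair` would wrap the cross-encoder; for example, the sentence-transformers library provides a `CrossEncoder` class whose `predict` method scores text pairs. Whether this project uses that library is not stated in the source.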

	---

	### Slide 8: Step 5 - Diversity Filtering (MMR)
	* Core Concept: Anti-Echo Chamber.
	* Details:
	    * Maximal Marginal Relevance (MMR): Selecting articles that are relevant but different from each other.
	    * Benefit: Instead of 5 articles saying the same thing, the LLM gets 5 different perspectives (e.g., Fact, Opinion, Impact).
	* Visual Suggestion: A filter that takes out identical "Copy-Paste" news reports.
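
MMR can be sketched in a few lines: repeatedly pick the candidate with the best trade-off between relevance to the query and similarity to what has already been picked. The vectors and the `lam=0.5` weighting below are illustrative assumptions, not values from the project.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lam=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.5], [1.0, 0.5], [0.2, 1.0]]  # docs 0 and 1 are identical copies
picked = mmr(query, docs, k=2)
# picked == [0, 2]: the duplicate (doc 1) is skipped in favour of a different angle
```

Raising `lam` toward 1.0 favours pure relevance; lowering it favours diversity.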

	---

	### Slide 9: Step 6 - Parent Retrieval & Context Expansion
	* Core Concept: Seeing the Big Picture.
	* Details:
	    * Search is done on small chunks (~500 chars).
	    * If a chunk is a "Perfect Match," the system fetches the entire article from ClickHouse.
	    * Benefit: The LLM gets the full context of the story, not just a broken sentence.
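
The expansion step can be sketched as follows, assuming each chunk hit carries the ID of its parent article. The `min_score` threshold standing in for "Perfect Match", the field names, and the `fetch_article` callback (which would query ClickHouse in production) are all hypothetical.

```python
def expand_to_parents(chunk_hits, fetch_article, min_score=0.8, max_articles=3):
    """Swap high-scoring chunks for their full parent articles, deduplicated.

    chunk_hits: dicts with "article_id", "text", and "score", best first.
    fetch_article: function mapping an article_id to the full article text.
    """
    contexts, seen = [], set()
    for hit in chunk_hits:
        article_id = hit["article_id"]
        if article_id in seen:
            continue                                     # article already represented
        seen.add(article_id)
        if hit["score"] >= min_score:
            contexts.append(fetch_article(article_id))   # strong match: full article
        else:
            contexts.append(hit["text"])                 # weak match: chunk only
        if len(contexts) == max_articles:
            break
    return contexts


# In-memory stand-in for the article store, used only to make the sketch runnable.
ARTICLES = {"n1": "Full text of article n1.", "n2": "Full text of article n2."}
hits = [
    {"article_id": "n1", "text": "chunk a", "score": 0.93},
    {"article_id": "n1", "text": "chunk b", "score": 0.85},
    {"article_id": "n2", "text": "chunk c", "score": 0.55},
]
contexts = expand_to_parents(hits, ARTICLES.__getitem__)
```

Deduplicating by article ID keeps two chunks from the same story from pulling the full article into the prompt twice.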

	---

	### Slide 10: Step 7 - Trend Fusion & LLM Grounding
	* Core Concept: Real-Time Intelligence.
	* Details:
	    * The API fetches "Trending Topics" from ClickHouse in parallel.
	    * This data is injected into the LLM prompt to inform it of broader market trends.
	    * Result: "While these articles focus on Company A, the general market sentiment in ClickHouse shows a negative shift today."
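
A rough sketch of the parallel fetch and prompt injection, using `asyncio.gather` so the trend lookup does not add to retrieval latency. The function names, the prompt wording, and the two fake coroutines are assumptions for illustration; the real system would call Qdrant and ClickHouse here.

```python
import asyncio


async def gather_context(query, search_articles, fetch_trends):
    """Run article retrieval and the trend lookup concurrently."""
    articles, trends = await asyncio.gather(search_articles(query), fetch_trends())
    return articles, trends


def build_prompt(query, articles, trends):
    """Assemble a grounded prompt: retrieved articles plus current trending topics."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(articles, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Trending topics right now: {', '.join(trends)}\n\n"
        f"Question: {query}"
    )


async def fake_search(query):
    # Hypothetical stand-in for the vector search call.
    await asyncio.sleep(0.01)
    return ["Company A announces a merger."]


async def fake_trends():
    # Hypothetical stand-in for the trending-topics query.
    await asyncio.sleep(0.01)
    return ["chip exports", "interest rates"]


articles, trends = asyncio.run(gather_context("Company A news", fake_search, fake_trends))
prompt = build_prompt("Company A news", articles, trends)
```

Numbering the sources in the prompt also makes it easier to trace which article supported which statement in the answer.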

	---

	### Slide 11: Step 8 - SSE Streaming (Real-Time Experience)
	* Core Concept: Instant Gratification.
	* Details:
	    * Using Server-Sent Events (SSE).
	    * Tokens are pushed to the user as they are generated.
	    * Perceived wait time drops from 5 seconds to 300 ms.
	* Visual Suggestion: Tokens appearing one-by-one in a fast, fluid stream.
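
On the wire, an SSE stream is plain text: each event is one or more `field: value` lines followed by a blank line. The sketch below formats token events that way; the JSON payload shape and the final `done` event are assumptions, not the API's documented contract.

```python
import json


def sse_event(data, event=None):
    """Format one Server-Sent Events frame; a blank line terminates the frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield one SSE frame per generated token, then a final 'done' event."""
    for token in tokens:
        yield sse_event({"token": token})
    yield sse_event({}, event="done")


frames = list(stream_tokens(["Intel", " rose"]))
# frames[0] == 'data: {"token": "Intel"}\n\n'
```

The HTTP response carrying these frames must use the `text/event-stream` content type so that browsers treat it as an event stream.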

	---

	### Slide 12: Reliability & Traceability
	* Core Concept: Production-Ready Design.
	* Details:
	    * Circuit Breaker: If Qdrant is down, ClickHouse keyword search automatically takes over.
	    * Interaction Trace: Every source used to answer a question is logged for debugging and human feedback (Thumbs Up/Down).
	* Final Word: A resilient, intelligent, and highly accurate news RAG system.
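
A minimal circuit-breaker sketch, assuming a failure-count threshold and a fixed cool-down. The class, its parameters, and the two search functions are hypothetical; the source only states that ClickHouse keyword search takes over when Qdrant is down.

```python
import time


class CircuitBreaker:
    """Route calls to a fallback after repeated primary failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, primary, fallback, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback(*args)               # circuit open: skip the primary
            self.opened_at = None                    # cool-down over: probe the primary
            self.failures = self.max_failures - 1    # one more failure reopens the circuit
        try:
            result = primary(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback(*args)
        self.failures = 0
        return result


calls = []


def vector_search(query):
    # Hypothetical primary path that simulates an unavailable vector store.
    calls.append("qdrant")
    raise ConnectionError("vector store unavailable")


def keyword_search(query):
    # Hypothetical fallback path standing in for ClickHouse keyword search.
    calls.append("clickhouse")
    return [f"keyword hit for {query}"]


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(vector_search, keyword_search, "NVDA") for _ in range(3)]
```

After two failures the breaker opens, so the third request goes straight to the fallback without waiting on the failing primary.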
