metadata
title: Enterprise RAG System
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.2
app_file: app.py
pinned: false
Enterprise RAG System
A production-ready Retrieval-Augmented Generation (RAG) system featuring Hybrid Search, Reranking, and Hallucination Prevention.
Key Differentiators
Unlike basic RAG tutorials, this system handles real-world edge cases:
- Hybrid Search (BM25 + Semantic): accurately retrieves both specific keywords (IDs, names) and conceptual matches.
- Safety First: Implements Confidence Gating. The system explicitly refuses to answer if the retrieved context is insufficient, rather than risk a hallucinated answer.
- Zero-Latency Deployment: Uses a custom Build-Time Artifact Injection pipeline to bake index files into the Docker container at build time, so the app does not download or rebuild the index at startup.
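The Confidence Gating idea from the list above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `ScoredChunk` type, the `gate` and `answer` helpers, the fallback wording, and the `0.5` threshold are all assumptions, and the scores are assumed to be reranker relevance values normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical fallback message; the real wording is not given in this README.
FALLBACK = "I don't have enough information in the indexed documents to answer that."


@dataclass
class ScoredChunk:
    text: str
    score: float  # assumed: reranker relevance, normalized to [0, 1]


def gate(chunks: List[ScoredChunk], threshold: float = 0.5,
         min_chunks: int = 1) -> Optional[List[ScoredChunk]]:
    """Return the chunks that clear the threshold, best first, or None if too few do."""
    confident = [c for c in chunks if c.score >= threshold]
    if len(confident) < min_chunks:
        return None
    return sorted(confident, key=lambda c: c.score, reverse=True)


def answer(query: str, chunks: List[ScoredChunk],
           generate: Callable[[str, List[str]], str]) -> str:
    """Call the LLM only when the retrieved context passes the gate."""
    context = gate(chunks)
    if context is None:
        return FALLBACK  # refuse instead of generating from weak context
    return generate(query, [c.text for c in context])
```

Passing the generator in as a callable keeps the gate independent of any particular LLM client, so the refusal path can be unit-tested without a model.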
Architecture
graph LR
User[User Query] --> A[Hybrid Retriever]
A -->|Keywords| B(BM25 Index)
A -->|Semantics| C(Pinecone/FAISS)
B & C --> D["Rank Fusion (RRF)"]
D --> E[Cross-Encoder Reranker]
E --> F{Confidence Check}
F -->|Low Score| G[Fallback Response]
F -->|High Score| H[LLM Generation]
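The Rank Fusion (RRF) step in the diagram merges the BM25 and vector rankings into one list. A minimal sketch of Reciprocal Rank Fusion follows, assuming each retriever returns an ordered list of document IDs. The function name, the sample IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not values taken from this repository.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers
bm25_hits = ["doc_7", "doc_2", "doc_9"]
dense_hits = ["doc_2", "doc_4", "doc_7"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF uses only rank positions, it needs no score normalization between BM25 and embedding similarity, which live on different scales.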
Quick Start
Local Development
# 1. Install Dependencies
pip install -r requirements.txt
# 2. Generate Index
python src/ingestion/ingest.py
# 3. Run App
streamlit run app.py
Deployment Strategy
We treat Data and Code separately for scalability:
- Code: GitHub (app.py, src/)
- Artifacts: Hugging Face Datasets (data/index/)
The Dockerfile automatically fetches the latest index during build, so the deployed container is ready to serve as soon as it starts.
Tech Stack
- LlamaIndex / Custom Pipeline: Hybrid Retrieval Logic
- Pinecone: Serverless Vector Database
- Sentence-Transformers: Embeddings & Reranking
- Streamlit: Conversational UI
- Docker: Containerized Deployment