---
title: Enterprise RAG System
emoji: 🚀
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.2
app_file: app.py
pinned: false
---

# 🔍 Enterprise RAG System

> **A production-ready Retrieval Augmented Generation system featuring Hybrid Search, Reranking, and Hallucination Prevention.**

[![Live Demo](https://img.shields.io/badge/🤗%20Hugging%20Face-Live%20Demo-blue)](https://huggingface.co/spaces/yuvis/Enterprise-RAG-System)

## 🌟 Key Differentiators

Unlike basic RAG tutorials, this system handles real-world edge cases:

1.  **Hybrid Search (BM25 + Semantic)**: Retrieves both exact keyword matches (IDs, names) and conceptual matches.
2.  **Safety First**: Implements **Confidence Gating**: the system explicitly refuses to answer when the retrieved context is insufficient, which guards against hallucinated answers.
3.  **Zero-Latency Deployment**: Uses a custom **Build-Time Artifact Injection** pipeline to bake index files into the Docker image, so the container does not have to build or download the index at startup.
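
As a rough illustration of the confidence gating idea, the sketch below refuses to call the LLM unless at least one retrieved chunk clears a relevance threshold. The names `ScoredChunk`, `gate`, `answer`, the fallback text, and the `0.5` threshold are all hypothetical and not taken from this repository; it also assumes reranker scores are normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical fallback text; the real app may word this differently.
FALLBACK = "I could not find enough relevant context to answer that."


@dataclass
class ScoredChunk:
    text: str
    score: float  # reranker relevance score, assumed to lie in [0, 1]


def gate(chunks: List[ScoredChunk], threshold: float = 0.5,
         min_chunks: int = 1) -> Optional[List[ScoredChunk]]:
    """Return the chunks that clear the threshold, or None if too few do."""
    confident = [c for c in chunks if c.score >= threshold]
    if len(confident) < min_chunks:
        return None
    return confident


def answer(query: str, chunks: List[ScoredChunk],
           generate: Callable[[str, List[str]], str]) -> str:
    """Call `generate` only when the retrieved context passes the gate."""
    confident = gate(chunks)
    if confident is None:
        return FALLBACK  # refuse instead of letting the LLM guess
    return generate(query, [c.text for c in confident])
```

Passing the generator in as a callable keeps the gate independent of any particular LLM client, so the threshold can be tuned and tested in isolation.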

## 🛠️ Architecture

```mermaid
graph LR
    User[User Query] --> A[Hybrid Retriever]
    A -->|Keywords| B(BM25 Index)
    A -->|Semantics| C(Pinecone/FAISS)
    B & C --> D["Rank Fusion (RRF)"]
    D --> E[Cross-Encoder Reranker]
    E --> F{Confidence Check}
    F -->|Low Score| G[Fallback Response]
    F -->|High Score| H[LLM Generation]
```
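
The Rank Fusion step in the diagram can be sketched as follows. This is a minimal, generic implementation of Reciprocal Rank Fusion, not the code in `src/`; the function name, the document IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score means a better position.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for the same query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF uses only ranks, it does not need BM25 scores and vector similarities to be on a comparable scale, which is one reason it is a common choice for hybrid retrieval.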

## 🚀 Quick Start

### Local Development
```bash
# 1. Install Dependencies
pip install -r requirements.txt

# 2. Generate Index
python src/ingestion/ingest.py

# 3. Run App
streamlit run app.py
```

### Deployment Strategy
We treat Data and Code separately for scalability:
- **Code**: GitHub (`app.py`, `src/`)
- **Artifacts**: Hugging Face Datasets (`data/index/`)

The `Dockerfile` automatically fetches the latest index during the image build, so the deployed container starts with the index already in place and is ready to serve.
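
One way such a build-time fetch could look is a small Python script invoked from a `RUN` step in the `Dockerfile`. The sketch below is an assumption about the approach, not the repository's actual script: it uses `snapshot_download` from the `huggingface_hub` package, and the dataset repo ID in the usage comment is a placeholder.

```python
from pathlib import Path


def index_is_present(index_dir: Path) -> bool:
    """True if the directory exists and contains at least one file."""
    return index_dir.is_dir() and any(p.is_file() for p in index_dir.rglob("*"))


def fetch_index(repo_id: str, local_dir: Path) -> Path:
    """Download a dataset repo snapshot from the Hugging Face Hub into local_dir."""
    # Imported here so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, repo_type="dataset",
                      local_dir=str(local_dir))
    return local_dir


# Usage during the image build (the repo ID is a placeholder):
#   fetch_index("your-username/your-index-dataset", Path("data/index"))
#   assert index_is_present(Path("data/index"))
```

Checking `index_is_present` after the download lets the build fail early instead of shipping an image with an empty index directory.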

## 🧪 Tech Stack
- **LlamaIndex / Custom Pipeline**: Hybrid Retrieval Logic
- **Pinecone**: Serverless Vector Database
- **Sentence-Transformers**: Embeddings & Reranking
- **Streamlit**: Conversational UI
- **Docker**: Containerized Deployment