---
sdk: gradio
sdk_version: 3.50.2
---
# RAG_Mini: Enterprise-Ready RAG System with a Gradio Interface
This is a powerful, enterprise-grade Retrieval-Augmented Generation (RAG) system designed to transform your documents into an interactive and intelligent knowledge base. Users can upload their own documents (PDFs, TXT files), build a searchable vector index, and ask complex questions in natural language to receive accurate, context-aware answers sourced directly from the provided materials.
The entire application is wrapped in a clean, user-friendly web interface powered by Gradio.


## ✨ Features
- **Intuitive Web UI**: Simple, clean interface built with Gradio for uploading documents and chatting.
- **Multi-Document Support**: Natively handles PDF and TXT files.
- **Advanced Text Splitting**: Uses a `HierarchicalSemanticSplitter` that first splits documents into large parent chunks (for context) and then into smaller child chunks (for precise search), respecting semantic boundaries (see the sketch after this list).
- **Hybrid Search**: Combines the strengths of dense vector search (FAISS) and sparse keyword search (BM25) for robust and accurate retrieval.
- **Reranking for Accuracy**: Employs a Cross-Encoder model to rerank the retrieved documents, ensuring the most relevant context is passed to the language model.
- **Persistent Knowledge Base**: Automatically saves the built vector index and metadata, allowing you to load an existing knowledge base instantly on startup.
- **Modular & Extensible Codebase**: The project is logically structured into services for loading, splitting, embedding, and generation, making it easy to maintain and extend.
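To make the parent/child idea concrete, here is a toy sketch of hierarchical splitting. It uses fixed-size character windows where the real `HierarchicalSemanticSplitter` respects semantic boundaries, so treat it as an illustration of the data layout, not the project's actual algorithm:
```python
# Toy hierarchical splitter: fixed-size windows stand in for semantic
# boundary detection. Each child chunk records which parent it came from,
# which is what lets retrieval search small chunks but return broad context.
def split_hierarchically(text: str, parent_size: int = 400, child_size: int = 100):
    parents, children = [], []
    for p_start in range(0, len(text), parent_size):
        parent = text[p_start:p_start + parent_size]
        parents.append(parent)
        for c_start in range(0, len(parent), child_size):
            children.append({
                "text": parent[c_start:c_start + child_size],
                "parent_id": len(parents) - 1,  # link child back to its parent
            })
    return parents, children

parents, children = split_hierarchically("some long document text " * 50)
print(f"{len(parents)} parents, {len(children)} children")
```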
## 🏗️ System Architecture
The RAG pipeline follows a logical, multi-step process to ensure high-quality answers:
1. **Load**: Documents are loaded from various formats and parsed into a standardized `Document` object, preserving metadata like source and page number.
2. **Split**: The raw text is processed by the `HierarchicalSemanticSplitter`, creating parent and child text chunks. This provides both broad context and fine-grained detail.
3. **Embed & Index**: The child chunks are converted into vector embeddings using a `SentenceTransformer` model and indexed in a FAISS vector store. A parallel BM25 index is also built for keyword search.
4. **Retrieve**: When a user asks a question, a hybrid search query is performed against the FAISS and BM25 indices to retrieve the most relevant child chunks.
5. **Fetch Context**: The parent chunks corresponding to the retrieved child chunks are fetched. This ensures the LLM receives a wider, more complete context.
6. **Rerank**: A powerful Cross-Encoder model re-evaluates the relevance of the parent chunks against the query, pushing the best matches to the top.
7. **Generate**: The top-ranked, reranked documents are combined with the user's query into a final prompt. This prompt is sent to a Large Language Model (LLM) to generate a final, coherent answer.
```
[User Uploads Docs] -> [Loader] -> [Splitter] -> [Embedder & Vector Store] -> [Knowledge Base Saved]
[User Asks Question] -> [Hybrid Search] -> [Get Parent Docs] -> [Reranker] -> [LLM] -> [Answer & Sources]
```
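The retrieve-and-rerank steps (4 and 6) can be illustrated with a compact, self-contained sketch. The model names are illustrative assumptions, and a plain NumPy dot product stands in for the FAISS index to keep the example short:
```python
# Minimal hybrid retrieval + cross-encoder reranking sketch (not the
# repository's actual code). A real deployment would query a FAISS index
# instead of computing dense scores with a NumPy dot product.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "FAISS performs dense vector similarity search.",
    "BM25 ranks documents by keyword overlap.",
    "Cross-encoders rerank query-document pairs for precision.",
]
query = "How does keyword search rank documents?"

# Dense scores: cosine similarity over normalized embeddings.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
dense = doc_vecs @ q_vec

# Sparse scores: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([d.lower().split() for d in docs])
sparse = np.array(bm25.get_scores(query.lower().split()))

def minmax(x):
    # Normalize scores to [0, 1] so the two scales are comparable.
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

alpha = 0.5  # hybrid_search_alpha: 1.0 = pure vector, 0.0 = pure BM25
hybrid = alpha * minmax(dense) + (1 - alpha) * minmax(sparse)
candidates = [docs[i] for i in np.argsort(hybrid)[::-1][:2]]  # retrieval_top_k = 2

# Rerank the surviving candidates with a cross-encoder for final ordering.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, c) for c in candidates])
for score, c in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {c}")
```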
## 🛠️ Tech Stack
- **Backend**: Python 3.9+
- **UI**: Gradio
- **LLM & Embedding Framework**: Hugging Face Transformers, Sentence-Transformers
- **Vector Search**: FAISS (from Facebook AI)
- **Keyword Search**: rank-bm25
- **PDF Parsing**: PyMuPDF (fitz)
- **Configuration**: PyYAML
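As a quick illustration of the PDF layer, this is roughly how PyMuPDF extracts per-page text (a generic `fitz` snippet, not the repository's actual loader; `example.pdf` is a placeholder path):
```python
import fitz  # PyMuPDF

# Generic per-page text extraction with PyMuPDF. The project's loader
# presumably does something similar while also recording source and page
# number metadata on each Document.
with fitz.open("example.pdf") as pdf:  # placeholder path
    for page_number, page in enumerate(pdf, start=1):
        text = page.get_text()
        print(f"--- page {page_number}: {len(text)} characters ---")
```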
## 🚀 Getting Started
Follow these steps to set up and run the project on your local machine.
### 1. Prerequisites
- Python 3.9 or higher
- `pip` for package management
### 2. Create a `requirements.txt` file
This step is for maintainers preparing the repository; if the repo you cloned already ships a `requirements.txt`, skip ahead to the next step. To generate the file so others can install the dependencies, run the following from your activated environment:
```bash
pip freeze > requirements.txt
```
This writes every package in your environment to the file; make sure it is committed to your GitHub repository. The key packages it should contain are: `gradio`, `torch`, `transformers`, `sentence-transformers`, `faiss-cpu`, `rank_bm25`, `PyMuPDF`, `pyyaml`, `numpy`.
### 3. Installation & Setup
**1. Clone the repository:**
```bash
git clone https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git
cd YOUR_REPOSITORY_NAME
```
**2. Create and activate a virtual environment (recommended):**
```bash
# For Windows
python -m venv venv
.\venv\Scripts\activate
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
**3. Install the required packages:**
```bash
pip install -r requirements.txt
```
**4. Configure the system:**
Review the `configs/config.yaml` file. You can change the models, chunk sizes, and other parameters here. The default settings are a good starting point.
> **Note:** The first time you run the application, the models specified in the config file will be downloaded from Hugging Face. This may take some time depending on your internet connection.
### 4. Running the Application
To start the Gradio web server, run the `main.py` script:
```bash
python main.py
```
The application will be available at **`http://localhost:7860`**.
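For reference, a minimal `main.py` along these lines would look as follows; the `build_app` function name is an illustrative assumption about what `ui/app.py` exposes:
```python
# Hypothetical entry point: build the Gradio app defined in ui/app.py and
# serve it on the default port. `build_app` is an assumed function name,
# not necessarily the repository's actual API.
from ui.app import build_app

if __name__ == "__main__":
    demo = build_app()
    demo.launch(server_name="0.0.0.0", server_port=7860)
```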
## 📖 How to Use
The application has two primary workflows:
**1. Build a New Knowledge Base:**
- Drag and drop one or more `.pdf` or `.txt` files into the "Upload New Docs to Build" area.
- Click the **"Build New KB"** button.
- The system status will show the progress (Loading -> Splitting -> Indexing).
- Once complete, the status will confirm that the knowledge base is ready, and the chat window will appear.
**2. Load an Existing Knowledge Base:**
- If you have previously built a knowledge base, simply click the **"Load Existing KB"** button.
- The system will load the saved FAISS index and metadata from the `storage` directory.
- The chat window will appear, and you can start asking questions immediately.
**Chatting with Your Documents:**
- Once the knowledge base is ready, type your question into the chat box at the bottom and press Enter or click "Submit".
- The model will generate an answer based on the documents you provided.
- The sources used to generate the answer will be displayed below the chat window.
## 📂 Project Structure
```
.
├── configs/
│   └── config.yaml        # Main configuration file for models, paths, etc.
├── core/
│   ├── embedder.py        # Handles text embedding.
│   ├── llm_interface.py   # Handles reranking and answer generation.
│   ├── loader.py          # Loads and parses documents.
│   ├── schema.py          # Defines data structures (Document, Chunk).
│   ├── splitter.py        # Splits documents into chunks.
│   └── vector_store.py    # Manages FAISS & BM25 indices.
├── service/
│   └── rag_service.py     # Orchestrates the entire RAG pipeline.
├── storage/               # Default location for saved indices (auto-generated).
│   └── ...
├── ui/
│   └── app.py             # Contains the Gradio UI logic.
├── utils/
│   └── logger.py          # Logging configuration.
├── assets/
│   └── 1.png              # Screenshot of the application.
├── main.py                # Entry point to run the application.
└── requirements.txt       # Python package dependencies.
```
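Based on the pipeline description above (source/page metadata, parent/child chunks), `core/schema.py` plausibly defines structures along these lines; the field names here are assumptions for illustration:
```python
# Hypothetical shapes for the Document and Chunk structures; the actual
# definitions live in core/schema.py.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Document:
    text: str                                     # raw text from the loader
    metadata: dict = field(default_factory=dict)  # e.g. {"source": "a.pdf", "page": 3}

@dataclass
class Chunk:
    text: str                        # chunk content (embedded if a child chunk)
    chunk_id: str                    # unique id for index lookups
    parent_id: Optional[str] = None  # child chunks point back to their parent
    metadata: dict = field(default_factory=dict)
```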
## 🔧 Configuration Details (`config.yaml`)
You can customize the RAG pipeline by modifying `configs/config.yaml`:
- **`models`**: Specify the Hugging Face models for embedding, reranking, and generation.
- **`vector_store`**: Define the paths where the FAISS index and metadata will be saved.
- **`splitter`**: Control the `HierarchicalSemanticSplitter` behavior.
- `parent_chunk_size`: The target size for larger context chunks.
- `parent_chunk_overlap`: The overlap between parent chunks.
- `child_chunk_size`: The target size for smaller, searchable chunks.
- **`retrieval`**: Tune the retrieval and reranking process.
- `retrieval_top_k`: How many initial candidates to retrieve with hybrid search.
- `rerank_top_k`: How many final documents to pass to the LLM after reranking.
- `hybrid_search_alpha`: The weighting between vector search (`alpha`) and BM25 search (`1 - alpha`). `1.0` is pure vector search, `0.0` is pure keyword search.
- **`generation`**: Set parameters for the final answer generation, like `max_new_tokens`.
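Putting those keys together, a `config.yaml` might look like the following. The key names mirror the descriptions above, while the values and model names are illustrative defaults, not the project's actual ones:
```yaml
# Illustrative values only; the real defaults live in configs/config.yaml.
models:
  embedding: sentence-transformers/all-MiniLM-L6-v2
  reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
  generation: google/flan-t5-base
vector_store:
  index_path: storage/faiss.index
  metadata_path: storage/metadata.json
splitter:
  parent_chunk_size: 1000
  parent_chunk_overlap: 100
  child_chunk_size: 250
retrieval:
  retrieval_top_k: 20
  rerank_top_k: 5
  hybrid_search_alpha: 0.7   # 1.0 = pure vector search, 0.0 = pure BM25
generation:
  max_new_tokens: 512
```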
## 🛣️ Future Roadmap
- [ ] Support for more document types (e.g., `.docx`, `.pptx`, `.html`).
- [ ] Implement response streaming for a more interactive chat experience.
- [ ] Integrate with other vector databases like ChromaDB or Pinecone.
- [ ] Create API endpoints for programmatic access to the RAG service.
- [ ] Add more advanced logging and monitoring for enterprise use.
## 🤝 Contributing
Contributions are welcome! If you have ideas for improvements or find a bug, please feel free to open an issue or submit a pull request.
## 📄 License
This project is licensed under the MIT License. See the `LICENSE` file for details.