---
sdk: gradio
sdk_version: 3.50.2
---
# RAG_Mini: Enterprise-Ready RAG System with a Gradio Interface
This is a powerful, enterprise-grade Retrieval-Augmented Generation (RAG) system designed to transform your documents into an interactive and intelligent knowledge base. Users can upload their own documents (PDFs, TXT files), build a searchable vector index, and ask complex questions in natural language to receive accurate, context-aware answers sourced directly from the provided materials.
The entire application is wrapped in a clean, user-friendly web interface powered by Gradio.


## ✨ Features
- **Intuitive Web UI**: Simple, clean interface built with Gradio for uploading documents and chatting.
- **Multi-Document Support**: Natively handles PDF and TXT files.
- **Advanced Text Splitting**: Uses a `HierarchicalSemanticSplitter` that first splits documents into large parent chunks (for context) and then into smaller child chunks (for precise search), respecting semantic boundaries (see the sketch after this list).
- **Hybrid Search**: Combines the strengths of dense vector search (FAISS) and sparse keyword search (BM25) for robust and accurate retrieval.
- **Reranking for Accuracy**: Employs a Cross-Encoder model to rerank the retrieved documents, ensuring the most relevant context is passed to the language model.
- **Persistent Knowledge Base**: Automatically saves the built vector index and metadata, allowing you to load an existing knowledge base instantly on startup.
- **Modular & Extensible Codebase**: The project is logically structured into services for loading, splitting, embedding, and generation, making it easy to maintain and extend.
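To make the parent/child idea concrete, here is a toy sketch of hierarchical splitting. It uses fixed-size character windows where the real `HierarchicalSemanticSplitter` respects semantic boundaries, so treat it as an illustration of the data layout, not the project's actual algorithm:
```python
# Toy hierarchical splitter: fixed-size windows stand in for semantic
# boundary detection. Each child chunk records which parent it came from,
# which is what lets retrieval search small chunks but return broad context.
def split_hierarchically(text: str, parent_size: int = 400, child_size: int = 100):
    parents, children = [], []
    for p_start in range(0, len(text), parent_size):
        parent = text[p_start:p_start + parent_size]
        parents.append(parent)
        for c_start in range(0, len(parent), child_size):
            children.append({
                "text": parent[c_start:c_start + child_size],
                "parent_id": len(parents) - 1,  # link child back to its parent
            })
    return parents, children

parents, children = split_hierarchically("some long document text " * 50)
print(f"{len(parents)} parents, {len(children)} children")
```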
## 🏗️ System Architecture
The RAG pipeline follows a logical, multi-step process to ensure high-quality answers:
1. **Load**: Documents are loaded from various formats and parsed into a standardized `Document` object, preserving metadata like source and page number.
2. **Split**: The raw text is processed by the `HierarchicalSemanticSplitter`, creating parent and child text chunks. This provides both broad context and fine-grained detail.
3. **Embed & Index**: The child chunks are converted into vector embeddings using a `SentenceTransformer` model and indexed in a FAISS vector store. A parallel BM25 index is also built for keyword search.
4. **Retrieve**: When a user asks a question, a hybrid search query is performed against the FAISS and BM25 indices to retrieve the most relevant child chunks.
5. **Fetch Context**: The parent chunks corresponding to the retrieved child chunks are fetched. This ensures the LLM receives a wider, more complete context.
6. **Rerank**: A powerful Cross-Encoder model re-evaluates the relevance of the parent chunks against the query, pushing the best matches to the top.
7. **Generate**: The top-ranked, reranked documents are combined with the user's query into a final prompt. This prompt is sent to a Large Language Model (LLM) to generate a final, coherent answer.
```
[User Uploads Docs] -> [Loader] -> [Splitter] -> [Embedder & Vector Store] -> [Knowledge Base Saved]
[User Asks Question] -> [Hybrid Search] -> [Get Parent Docs] -> [Reranker] -> [LLM] -> [Answer & Sources]
```
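The retrieve-and-rerank steps (4 and 6) can be illustrated with a compact, self-contained sketch. The model names are illustrative assumptions, and a plain NumPy dot product stands in for the FAISS index to keep the example short:
```python
# Minimal hybrid retrieval + cross-encoder reranking sketch (not the
# repository's actual code). A real deployment would query a FAISS index
# instead of computing dense scores with a NumPy dot product.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "FAISS performs dense vector similarity search.",
    "BM25 ranks documents by keyword overlap.",
    "Cross-encoders rerank query-document pairs for precision.",
]
query = "How does keyword search rank documents?"

# Dense scores: cosine similarity over normalized embeddings.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
dense = doc_vecs @ q_vec

# Sparse scores: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([d.lower().split() for d in docs])
sparse = np.array(bm25.get_scores(query.lower().split()))

def minmax(x):
    # Normalize scores to [0, 1] so the two scales are comparable.
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

alpha = 0.5  # hybrid_search_alpha: 1.0 = pure vector, 0.0 = pure BM25
hybrid = alpha * minmax(dense) + (1 - alpha) * minmax(sparse)
candidates = [docs[i] for i in np.argsort(hybrid)[::-1][:2]]  # retrieval_top_k = 2

# Rerank the surviving candidates with a cross-encoder for final ordering.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, c) for c in candidates])
for score, c in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {c}")
```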
## 🛠️ Tech Stack
- **Backend**: Python 3.9+
- **UI**: Gradio
- **LLM & Embedding Framework**: Hugging Face Transformers, Sentence-Transformers
- **Vector Search**: FAISS (from Facebook AI)
- **Keyword Search**: rank-bm25
- **PDF Parsing**: PyMuPDF (fitz)
- **Configuration**: PyYAML
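As a quick illustration of the PDF layer, this is roughly how PyMuPDF extracts per-page text (a generic `fitz` snippet, not the repository's actual loader; `example.pdf` is a placeholder path):
```python
import fitz  # PyMuPDF

# Generic per-page text extraction with PyMuPDF. The project's loader
# presumably does something similar while also recording source and page
# number metadata on each Document.
with fitz.open("example.pdf") as pdf:  # placeholder path
    for page_number, page in enumerate(pdf, start=1):
        text = page.get_text()
        print(f"--- page {page_number}: {len(text)} characters ---")
```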
## 🚀 Getting Started
Follow these steps to set up and run the project on your local machine.
### 1. Prerequisites
- Python 3.9 or higher
- `pip` for package management
### 2. Create a `requirements.txt` file
This step is for maintainers preparing the repository; if the repo you cloned already ships a `requirements.txt`, skip ahead to the next step. To generate the file so others can install the dependencies, run the following from your activated environment:
```bash
pip freeze > requirements.txt
```
This writes every package in your environment to the file; make sure it is committed to your GitHub repository. The key packages it should contain are: `gradio`, `torch`, `transformers`, `sentence-transformers`, `faiss-cpu`, `rank_bm25`, `PyMuPDF`, `pyyaml`, `numpy`.
### 3. Installation & Setup
**1. Clone the repository:**
```bash
git clone https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git
cd YOUR_REPOSITORY_NAME
```
**2. Create and activate a virtual environment (recommended):**
```bash
# For Windows
python -m venv venv
.\venv\Scripts\activate
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
**3. Install the required packages:**
```bash
pip install -r requirements.txt
```
**4. Configure the system:**
Review the `configs/config.yaml` file. You can change the models, chunk sizes, and other parameters here. The default settings are a good starting point.
> **Note:** The first time you run the application, the models specified in the config file will be downloaded from Hugging Face. This may take some time depending on your internet connection.
### 4. Running the Application
To start the Gradio web server, run the `main.py` script:
```bash
python main.py
```
The application will be available at **`http://localhost:7860`**.
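For reference, a minimal `main.py` along these lines would look as follows; the `build_app` function name is an illustrative assumption about what `ui/app.py` exposes:
```python
# Hypothetical entry point: build the Gradio app defined in ui/app.py and
# serve it on the default port. `build_app` is an assumed function name,
# not necessarily the repository's actual API.
from ui.app import build_app

if __name__ == "__main__":
    demo = build_app()
    demo.launch(server_name="0.0.0.0", server_port=7860)
```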
## 📖 How to Use
The application has two primary workflows:
**1. Build a New Knowledge Base:**
- Drag and drop one or more `.pdf` or `.txt` files into the "Upload New Docs to Build" area.
- Click the **"Build New KB"** button.
- The system status will show the progress (Loading -> Splitting -> Indexing).
- Once complete, the status will confirm that the knowledge base is ready, and the chat window will appear.
**2. Load an Existing Knowledge Base:**
- If you have previously built a knowledge base, simply click the **"Load Existing KB"** button.
- The system will load the saved FAISS index and metadata from the `storage` directory.
- The chat window will appear, and you can start asking questions immediately.
**Chatting with Your Documents:**
- Once the knowledge base is ready, type your question into the chat box at the bottom and press Enter or click "Submit".
- The model will generate an answer based on the documents you provided.
- The sources used to generate the answer will be displayed below the chat window.
## 📂 Project Structure
```
.
├── configs/
│   └── config.yaml        # Main configuration file for models, paths, etc.
├── core/
│   ├── embedder.py        # Handles text embedding.
│   ├── llm_interface.py   # Handles reranking and answer generation.
│   ├── loader.py          # Loads and parses documents.
│   ├── schema.py          # Defines data structures (Document, Chunk).
│   ├── splitter.py        # Splits documents into chunks.
│   └── vector_store.py    # Manages FAISS & BM25 indices.
├── service/
│   └── rag_service.py     # Orchestrates the entire RAG pipeline.
├── storage/               # Default location for saved indices (auto-generated).
│   └── ...
├── ui/
│   └── app.py             # Contains the Gradio UI logic.
├── utils/
│   └── logger.py          # Logging configuration.
├── assets/
│   └── 1.png              # Screenshot of the application.
├── main.py                # Entry point to run the application.
└── requirements.txt       # Python package dependencies.
```
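Based on the pipeline description above (source/page metadata, parent/child chunks), `core/schema.py` plausibly defines structures along these lines; the field names here are assumptions for illustration:
```python
# Hypothetical shapes for the Document and Chunk structures; the actual
# definitions live in core/schema.py.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Document:
    text: str                                     # raw text from the loader
    metadata: dict = field(default_factory=dict)  # e.g. {"source": "a.pdf", "page": 3}

@dataclass
class Chunk:
    text: str                        # chunk content (embedded if a child chunk)
    chunk_id: str                    # unique id for index lookups
    parent_id: Optional[str] = None  # child chunks point back to their parent
    metadata: dict = field(default_factory=dict)
```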
## 🔧 Configuration Details (`config.yaml`)
You can customize the RAG pipeline by modifying `configs/config.yaml`:
- **`models`**: Specify the Hugging Face models for embedding, reranking, and generation.
- **`vector_store`**: Define the paths where the FAISS index and metadata will be saved.
- **`splitter`**: Control the `HierarchicalSemanticSplitter` behavior.
- `parent_chunk_size`: The target size for larger context chunks.
- `parent_chunk_overlap`: The overlap between parent chunks.
- `child_chunk_size`: The target size for smaller, searchable chunks.
- **`retrieval`**: Tune the retrieval and reranking process.
- `retrieval_top_k`: How many initial candidates to retrieve with hybrid search.
- `rerank_top_k`: How many final documents to pass to the LLM after reranking.
- `hybrid_search_alpha`: The weighting between vector search (`alpha`) and BM25 search (`1 - alpha`). `1.0` is pure vector search, `0.0` is pure keyword search.
- **`generation`**: Set parameters for the final answer generation, like `max_new_tokens`.
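Putting those keys together, a `config.yaml` might look like the following. The key names mirror the descriptions above, while the values and model names are illustrative defaults, not the project's actual ones:
```yaml
# Illustrative values only; the real defaults live in configs/config.yaml.
models:
  embedding: sentence-transformers/all-MiniLM-L6-v2
  reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
  generation: google/flan-t5-base
vector_store:
  index_path: storage/faiss.index
  metadata_path: storage/metadata.json
splitter:
  parent_chunk_size: 1000
  parent_chunk_overlap: 100
  child_chunk_size: 250
retrieval:
  retrieval_top_k: 20
  rerank_top_k: 5
  hybrid_search_alpha: 0.7   # 1.0 = pure vector search, 0.0 = pure BM25
generation:
  max_new_tokens: 512
```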
## 🛣️ Future Roadmap
- [ ] Support for more document types (e.g., `.docx`, `.pptx`, `.html`).
- [ ] Implement response streaming for a more interactive chat experience.
- [ ] Integrate with other vector databases like ChromaDB or Pinecone.
- [ ] Create API endpoints for programmatic access to the RAG service.
- [ ] Add more advanced logging and monitoring for enterprise use.
## 🤝 Contributing
Contributions are welcome! If you have ideas for improvements or find a bug, please feel free to open an issue or submit a pull request.
## 📄 License
This project is licensed under the MIT License. See the `LICENSE` file for details.