Spaces: TuNan52/Mini-RAG

TuNan52 committed on Sep 3

Commit

d97f61e

verified

1 Parent(s): c69a4d6

Update README.md

Files changed (1)

README.md +192 -185

@@ -1,185 +1,192 @@
-# RAG_Mini
----
-# Enterprise-Ready RAG System with Gradio Interface
-This is a powerful, enterprise-grade Retrieval-Augmented Generation (RAG) system designed to transform your documents into an interactive and intelligent knowledge base. Users can upload their own documents (PDFs, TXT files), build a searchable vector index, and ask complex questions in natural language to receive accurate, context-aware answers sourced directly from the provided materials.
-The entire application is wrapped in a clean, user-friendly web interface powered by Gradio.
-![App Screenshot](assets/1.png)
-![App Screenshot](assets/2.png)
-## ✨ Features
--   **Intuitive Web UI**: Simple, clean interface built with Gradio for uploading documents and chatting.
--   **Multi-Document Support**: Natively handles PDF and TXT files.
--   **Advanced Text Splitting**: Uses a `HierarchicalSemanticSplitter` that first splits documents into large parent chunks (for context) and then into smaller child chunks (for precise search), respecting semantic boundaries.
--   **Hybrid Search**: Combines the strengths of dense vector search (FAISS) and sparse keyword search (BM25) for robust and accurate retrieval.
--   **Reranking for Accuracy**: Employs a Cross-Encoder model to rerank the retrieved documents, ensuring the most relevant context is passed to the language model.
--   **Persistent Knowledge Base**: Automatically saves the built vector index and metadata, allowing you to load an existing knowledge base instantly on startup.
--   **Modular & Extensible Codebase**: The project is logically structured into services for loading, splitting, embedding, and generation, making it easy to maintain and extend.
-## 🏛️ System Architecture
-The RAG pipeline follows a logical, multi-step process to ensure high-quality answers:
-1.  **Load**: Documents are loaded from various formats and parsed into a standardized `Document` object, preserving metadata like source and page number.
-2.  **Split**: The raw text is processed by the `HierarchicalSemanticSplitter`, creating parent and child text chunks. This provides both broad context and fine-grained detail.
-3.  **Embed & Index**: The child chunks are converted into vector embeddings using a `SentenceTransformer` model and indexed in a FAISS vector store. A parallel BM25 index is also built for keyword search.
-4.  **Retrieve**: When a user asks a question, a hybrid search query is performed against the FAISS and BM25 indices to retrieve the most relevant child chunks.
-5.  **Fetch Context**: The parent chunks corresponding to the retrieved child chunks are fetched. This ensures the LLM receives a wider, more complete context.
-6.  **Rerank**: A powerful Cross-Encoder model re-evaluates the relevance of the parent chunks against the query, pushing the best matches to the top.
-7.  **Generate**: The top-ranked, reranked documents are combined with the user's query into a final prompt. This prompt is sent to a Large Language Model (LLM) to generate a final, coherent answer.
-```
-[User Uploads Docs] -> [Loader] -> [Splitter] -> [Embedder & Vector Store] -> [Knowledge Base Saved]
-[User Asks Question] -> [Hybrid Search] -> [Get Parent Docs] -> [Reranker] -> [LLM] -> [Answer & Sources]
-```
-## 🛠️ Tech Stack
--   **Backend**: Python 3.9+
--   **UI**: Gradio
--   **LLM & Embedding Framework**: Hugging Face Transformers, Sentence-Transformers
--   **Vector Search**: Faiss (from Facebook AI)
--   **Keyword Search**: rank-bm25
--   **PDF Parsing**: PyMuPDF (fitz)
--   **Configuration**: PyYAML
-## 🚀 Getting Started
-Follow these steps to set up and run the project on your local machine.
-### 1. Prerequisites
--   Python 3.9 or higher
--   `pip` for package management
-### 2. Create a `requirements.txt` file
-Before proceeding, it's crucial to have a `requirements.txt` file so others can easily install the necessary dependencies. In your activated terminal, run:
-```bash
-pip freeze > requirements.txt
-```
-This will save all the packages from your environment into the file. Make sure this file is committed to your GitHub repository. The key packages it should contain are: `gradio`, `torch`, `transformers`, `sentence-transformers`, `faiss-cpu`, `rank_bm25`, `PyMuPDF`, `pyyaml`, `numpy`.
-### 3. Installation & Setup
-**1. Clone the repository:**
-```bash
-git clone https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git
-cd YOUR_REPOSITORY_NAME
-```
-**2. Create and activate a virtual environment (recommended):**
-```bash
-# For Windows
-python -m venv venv
-.\venv\Scripts\activate
-# For macOS/Linux
-python3 -m venv venv
-source venv/bin/activate
-```
-**3. Install the required packages:**
-```bash
-pip install -r requirements.txt
-```
-**4. Configure the system:**
-Review the `configs/config.yaml` file. You can change the models, chunk sizes, and other parameters here. The default settings are a good starting point.
-> **Note:** The first time you run the application, the models specified in the config file will be downloaded from Hugging Face. This may take some time depending on your internet connection.
-### 4. Running the Application
-To start the Gradio web server, run the `main.py` script:
-```bash
-python main.py
-```
-The application will be available at **`http://localhost:7860`**.
-## 📖 How to Use
-The application has two primary workflows:
-**1. Build a New Knowledge Base:**
-   -   Drag and drop one or more `.pdf` or `.txt` files into the "Upload New Docs to Build" area.
-   -   Click the **"Build New KB"** button.
-   -   The system status will show the progress (Loading -> Splitting -> Indexing).
-   -   Once complete, the status will confirm that the knowledge base is ready, and the chat window will appear.
-**2. Load an Existing Knowledge Base:**
-   -   If you have previously built a knowledge base, simply click the **"Load Existing KB"** button.
-   -   The system will load the saved FAISS index and metadata from the `storage` directory.
-   -   The chat window will appear, and you can start asking questions immediately.
-**Chatting with Your Documents:**
-   -   Once the knowledge base is ready, type your question into the chat box at the bottom and press Enter or click "Submit".
-   -   The model will generate an answer based on the documents you provided.
-   -   The sources used to generate the answer will be displayed below the chat window.
-## 📂 Project Structure
-```
-.
-├── configs/
-│   └── config.yaml         # Main configuration file for models, paths, etc.
-├── core/
-│   ├── embedder.py         # Handles text embedding.
-│   ├── llm_interface.py    # Handles reranking and answer generation.
-│   ├── loader.py           # Loads and parses documents.
-│   ├── schema.py           # Defines data structures (Document, Chunk).
-│   ├── splitter.py         # Splits documents into chunks.
-│   └── vector_store.py     # Manages FAISS & BM25 indices.
-├── service/
-│   └── rag_service.py      # Orchestrates the entire RAG pipeline.
-├── storage/                # Default location for saved indices (auto-generated).
-│   └── ...
-├── ui/
-│   └── app.py              # Contains the Gradio UI logic.
-├── utils/
-│   └── logger.py           # Logging configuration.
-├── assets/
-│   └── 1.png               # Screenshot of the application.
-├── main.py                 # Entry point to run the application.
-└── requirements.txt        # Python package dependencies.
-```
-## 🔧 Configuration Details (`config.yaml`)
-You can customize the RAG pipeline by modifying `configs/config.yaml`:
--   **`models`**: Specify the Hugging Face models for embedding, reranking, and generation.
--   **`vector_store`**: Define the paths where the FAISS index and metadata will be saved.
--   **`splitter`**: Control the `HierarchicalSemanticSplitter` behavior.
-    -   `parent_chunk_size`: The target size for larger context chunks.
-    -   `parent_chunk_overlap`: The overlap between parent chunks.
-    -   `child_chunk_size`: The target size for smaller, searchable chunks.
--   **`retrieval`**: Tune the retrieval and reranking process.
-    -   `retrieval_top_k`: How many initial candidates to retrieve with hybrid search.
-    -   `rerank_top_k`: How many final documents to pass to the LLM after reranking.
-    -   `hybrid_search_alpha`: The weighting between vector search (`alpha`) and BM25 search (`1 - alpha`). `1.0` is pure vector search, `0.0` is pure keyword search.
--   **`generation`**: Set parameters for the final answer generation, like `max_new_tokens`.
-## 🛣️ Future Roadmap
--   [ ] Support for more document types (e.g., `.docx`, `.pptx`, `.html`).
--   [ ] Implement response streaming for a more interactive chat experience.
--   [ ] Integrate with other vector databases like ChromaDB or Pinecone.
--   [ ] Create API endpoints for programmatic access to the RAG service.
--   [ ] Add more advanced logging and monitoring for enterprise use.
-## 🤝 Contributing
-Contributions are welcome! If you have ideas for improvements or find a bug, please feel free to open an issue or submit a pull request.
-## 📄 License
-This project is licensed under the MIT License. See the `LICENSE` file for details.

+---
+sdk: gradio
+sdk_version: 3.50.2
+---
+# RAG_Mini
+---
+# Enterprise-Ready RAG System with Gradio Interface
+This is a powerful, enterprise-grade Retrieval-Augmented Generation (RAG) system designed to transform your documents into an interactive and intelligent knowledge base. Users can upload their own documents (PDFs, TXT files), build a searchable vector index, and ask complex questions in natural language to receive accurate, context-aware answers sourced directly from the provided materials.
+The entire application is wrapped in a clean, user-friendly web interface powered by Gradio.
+![App Screenshot](assets/1.png)
+![App Screenshot](assets/2.png)
+## ✨ Features
+-   **Intuitive Web UI**: Simple, clean interface built with Gradio for uploading documents and chatting.
+-   **Multi-Document Support**: Natively handles PDF and TXT files.
+-   **Advanced Text Splitting**: Uses a `HierarchicalSemanticSplitter` that first splits documents into large parent chunks (for context) and then into smaller child chunks (for precise search), respecting semantic boundaries.
+-   **Hybrid Search**: Combines the strengths of dense vector search (FAISS) and sparse keyword search (BM25) for robust and accurate retrieval.
+-   **Reranking for Accuracy**: Employs a Cross-Encoder model to rerank the retrieved documents, ensuring the most relevant context is passed to the language model.
+-   **Persistent Knowledge Base**: Automatically saves the built vector index and metadata, allowing you to load an existing knowledge base instantly on startup.
+-   **Modular & Extensible Codebase**: The project is logically structured into services for loading, splitting, embedding, and generation, making it easy to maintain and extend.
+## 🏛️ System Architecture
+The RAG pipeline follows a logical, multi-step process to ensure high-quality answers:
+1.  **Load**: Documents are loaded from various formats and parsed into a standardized `Document` object, preserving metadata like source and page number.
+2.  **Split**: The raw text is processed by the `HierarchicalSemanticSplitter`, creating parent and child text chunks. This provides both broad context and fine-grained detail.
+3.  **Embed & Index**: The child chunks are converted into vector embeddings using a `SentenceTransformer` model and indexed in a FAISS vector store. A parallel BM25 index is also built for keyword search.
+4.  **Retrieve**: When a user asks a question, a hybrid search query is performed against the FAISS and BM25 indices to retrieve the most relevant child chunks.
+5.  **Fetch Context**: The parent chunks corresponding to the retrieved child chunks are fetched. This ensures the LLM receives a wider, more complete context.
+6.  **Rerank**: A powerful Cross-Encoder model re-evaluates the relevance of the parent chunks against the query, pushing the best matches to the top.
+7.  **Generate**: The top-ranked, reranked documents are combined with the user's query into a final prompt. This prompt is sent to a Large Language Model (LLM) to generate a final, coherent answer.
+```
+[User Uploads Docs] -> [Loader] -> [Splitter] -> [Embedder & Vector Store] -> [Knowledge Base Saved]
+[User Asks Question] -> [Hybrid Search] -> [Get Parent Docs] -> [Reranker] -> [LLM] -> [Answer & Sources]
+```
+## 🛠️ Tech Stack
+-   **Backend**: Python 3.9+
+-   **UI**: Gradio
+-   **LLM & Embedding Framework**: Hugging Face Transformers, Sentence-Transformers
+-   **Vector Search**: Faiss (from Facebook AI)
+-   **Keyword Search**: rank-bm25
+-   **PDF Parsing**: PyMuPDF (fitz)
+-   **Configuration**: PyYAML
+## 🚀 Getting Started
+Follow these steps to set up and run the project on your local machine.
+### 1. Prerequisites
+-   Python 3.9 or higher
+-   `pip` for package management
+### 2. Create a `requirements.txt` file
+Before proceeding, it's crucial to have a `requirements.txt` file so others can easily install the necessary dependencies. In your activated terminal, run:
+```bash
+pip freeze > requirements.txt
+```
+This will save all the packages from your environment into the file. Make sure this file is committed to your GitHub repository. The key packages it should contain are: `gradio`, `torch`, `transformers`, `sentence-transformers`, `faiss-cpu`, `rank_bm25`, `PyMuPDF`, `pyyaml`, `numpy`.
+### 3. Installation & Setup
+**1. Clone the repository:**
+```bash
+git clone https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git
+cd YOUR_REPOSITORY_NAME
+```
+**2. Create and activate a virtual environment (recommended):**
+```bash
+# For Windows
+python -m venv venv
+.\venv\Scripts\activate
+# For macOS/Linux
+python3 -m venv venv
+source venv/bin/activate
+```
+**3. Install the required packages:**
+```bash
+pip install -r requirements.txt
+```
+**4. Configure the system:**
+Review the `configs/config.yaml` file. You can change the models, chunk sizes, and other parameters here. The default settings are a good starting point.
+> **Note:** The first time you run the application, the models specified in the config file will be downloaded from Hugging Face. This may take some time depending on your internet connection.
+### 4. Running the Application
+To start the Gradio web server, run the `main.py` script:
+```bash
+python main.py
+```
+The application will be available at **`http://localhost:7860`**.
+## 📖 How to Use
+The application has two primary workflows:
+**1. Build a New Knowledge Base:**
+   -   Drag and drop one or more `.pdf` or `.txt` files into the "Upload New Docs to Build" area.
+   -   Click the **"Build New KB"** button.
+   -   The system status will show the progress (Loading -> Splitting -> Indexing).
+   -   Once complete, the status will confirm that the knowledge base is ready, and the chat window will appear.
+**2. Load an Existing Knowledge Base:**
+   -   If you have previously built a knowledge base, simply click the **"Load Existing KB"** button.
+   -   The system will load the saved FAISS index and metadata from the `storage` directory.
+   -   The chat window will appear, and you can start asking questions immediately.
+**Chatting with Your Documents:**
+   -   Once the knowledge base is ready, type your question into the chat box at the bottom and press Enter or click "Submit".
+   -   The model will generate an answer based on the documents you provided.
+   -   The sources used to generate the answer will be displayed below the chat window.
+## 📂 Project Structure
+```
+.
+├── configs/
+│   └── config.yaml         # Main configuration file for models, paths, etc.
+├── core/
+│   ├── embedder.py         # Handles text embedding.
+│   ├── llm_interface.py    # Handles reranking and answer generation.
+│   ├── loader.py           # Loads and parses documents.
+│   ├── schema.py           # Defines data structures (Document, Chunk).
+│   ├── splitter.py         # Splits documents into chunks.
+│   └── vector_store.py     # Manages FAISS & BM25 indices.
+├── service/
+│   └── rag_service.py      # Orchestrates the entire RAG pipeline.
+├── storage/                # Default location for saved indices (auto-generated).
+│   └── ...
+├── ui/
+│   └── app.py              # Contains the Gradio UI logic.
+├── utils/
+│   └── logger.py           # Logging configuration.
+├── assets/
+│   └── 1.png               # Screenshot of the application.
+├── main.py                 # Entry point to run the application.
+└── requirements.txt        # Python package dependencies.
+```
+## 🔧 Configuration Details (`config.yaml`)
+You can customize the RAG pipeline by modifying `configs/config.yaml`:
+-   **`models`**: Specify the Hugging Face models for embedding, reranking, and generation.
+-   **`vector_store`**: Define the paths where the FAISS index and metadata will be saved.
+-   **`splitter`**: Control the `HierarchicalSemanticSplitter` behavior.
+    -   `parent_chunk_size`: The target size for larger context chunks.
+    -   `parent_chunk_overlap`: The overlap between parent chunks.
+    -   `child_chunk_size`: The target size for smaller, searchable chunks.
+-   **`retrieval`**: Tune the retrieval and reranking process.
+    -   `retrieval_top_k`: How many initial candidates to retrieve with hybrid search.
+    -   `rerank_top_k`: How many final documents to pass to the LLM after reranking.
+    -   `hybrid_search_alpha`: The weighting between vector search (`alpha`) and BM25 search (`1 - alpha`). `1.0` is pure vector search, `0.0` is pure keyword search.
+-   **`generation`**: Set parameters for the final answer generation, like `max_new_tokens`.
+## 🛣️ Future Roadmap
+-   [ ] Support for more document types (e.g., `.docx`, `.pptx`, `.html`).
+-   [ ] Implement response streaming for a more interactive chat experience.
+-   [ ] Integrate with other vector databases like ChromaDB or Pinecone.
+-   [ ] Create API endpoints for programmatic access to the RAG service.
+-   [ ] Add more advanced logging and monitoring for enterprise use.
+## 🤝 Contributing
+Contributions are welcome! If you have ideas for improvements or find a bug, please feel free to open an issue or submit a pull request.
+## 📄 License
+This project is licensed under the MIT License. See the `LICENSE` file for details.
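
The README describes `hybrid_search_alpha` as the weight on vector search, with `1 - alpha` going to BM25. A minimal sketch of how such a weighting could be applied is shown below. It is not taken from the repository: the function names `min_max_normalize` and `hybrid_rank`, the chunk IDs, and the choice of min-max normalization before fusing are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable.

    Hypothetical helper: the README does not say how scores are normalized.
    """
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All candidates scored the same; treat them as equally good.
        return {chunk_id: 1.0 for chunk_id in scores}
    span = highest - lowest
    return {chunk_id: (value - lowest) / span for chunk_id, value in scores.items()}


def hybrid_rank(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    alpha: float,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Fuse the two score sets as alpha * dense + (1 - alpha) * sparse.

    alpha = 1.0 reduces to pure vector search and alpha = 0.0 to pure
    keyword search, matching the README's description of the setting.
    """
    dense = min_max_normalize(vector_scores)
    sparse = min_max_normalize(bm25_scores)
    fused = {
        chunk_id: alpha * dense.get(chunk_id, 0.0)
        + (1.0 - alpha) * sparse.get(chunk_id, 0.0)
        for chunk_id in set(dense) | set(sparse)
    }
    # Sort by fused score (descending), breaking ties by chunk ID.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

In this sketch a chunk found by only one of the two retrievers contributes a score of 0 for the other, so it can still be ranked but is penalized as `alpha` moves toward the retriever that missed it.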