---
title: vgecbot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# RAG (Retrieval-Augmented Generation) Project

## Services

### Available Services

1. **Document Loader** (`services.document_loader`)
   - Loads PDF documents
   - Supports single and multiple file loading
   - Supports lazy loading

2. **Vector Store** (`services.VectorStore`)
   - Similarity search
   - Document management (add, update, delete)
   - Metadata filtering

3. **Text Splitter** (`services.TextSplitter`) ✅
   - Recursive character text splitting
   - Language-specific splitting (20+ languages)
   - See [docs/TEXT_SPLITTER.md](docs/TEXT_SPLITTER.md) for full documentation

4. **RAG Service** (`services.RAGService`) ✅ **NEW**
   - Integrates the Document Loader, Text Splitter, and Vector Store
   - Powered by the **Google Gemini** LLM
   - Creates a complete RAG pipeline with retrieval and generation

## Quick Start

```python
from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH

# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()

# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Add to vector store (requires an initialized VectorStore instance)
# vector_store.add_documents(chunks)
```

## Examples

Run the TextSplitter examples:

```bash
python examples_text_splitter.py
```

## Tasks

- [x] Document Loader
  - [ ] Multiple PDF loader
  - [ ] Use a text loader for `.txt` files
- [ ] Preprocessing
  - [ ] Stop-word removal
  - [ ] Punctuation removal
  - [ ] Lowercasing
  - [ ] Lemmatization
- [x] Recursive TextSplitter ✅
  - [ ] Assign metadata to chunks properly
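
The open preprocessing items in the task list (lowercasing, punctuation removal, stop-word removal) could be sketched roughly as below. This is only an illustration and not the project's implementation: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical names introduced here, and only the Python standard library is assumed. Lemmatization is left out because it needs a linguistic resource (for example a library such as NLTK or spaCy).

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would likely use a fuller list from an NLP library.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    # Whitespace tokenization is an assumption; it is enough for a sketch.
    tokens = [tok for tok in no_punct.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("The Loader is ready, and the PDF is split!"))
# prints: loader ready pdf split
```

A function like this would most naturally run on each document's text after loading and before splitting, so that chunk sizes are measured on the cleaned text.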