---
title: vgecbot
emoji: π€
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# RAG (Retrieval-Augmented Generation) Project

## Services

### Available Services

1. **Document Loader** (`services.document_loader`)
   - Load PDF documents
   - Support for single and multiple file loading
   - Lazy loading support
2. **Vector Store** (`services.VectorStore`)
   - Similarity search
   - Document management (add, update, delete)
   - Metadata filtering
3. **Text Splitter** (`services.TextSplitter`)
   - Recursive character text splitting
   - Language-specific splitting (20+ languages)
   - See [docs/TEXT_SPLITTER.md](docs/TEXT_SPLITTER.md) for full documentation
4. **RAG Service** (`services.RAGService`) **NEW**
   - Integrates Document Loader, Text Splitter, and Vector Store
   - Powered by the **Google Gemini** LLM
   - Creates a complete RAG pipeline with retrieval and generation
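
To make the retrieval half of the pipeline concrete, here is a minimal, dependency-free sketch of similarity search with metadata filtering, followed by prompt assembly. Everything in it is an assumption for illustration: `ToyVectorStore`, `embed`, `cosine`, and `build_prompt` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and none of this reflects the actual `services.VectorStore` or `services.RAGService` API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; not the project's VectorStore."""

    def __init__(self):
        self._items = []  # list of (text, metadata, vector)

    def add(self, text, metadata=None):
        self._items.append((text, metadata or {}, embed(text)))

    def search(self, query, k=2, where=None):
        """Return the k most similar texts whose metadata matches every key in `where`."""
        q = embed(query)
        candidates = [
            (cosine(q, vec), text)
            for text, meta, vec in self._items
            if all(meta.get(key) == value for key, value in (where or {}).items())
        ]
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in candidates[:k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt for an LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add("Admissions open in June each year", {"source": "admissions.pdf"})
store.add("The library closes at 8 pm", {"source": "campus.pdf"})
store.add("Admissions require a valid entrance score", {"source": "admissions.pdf"})

hits = store.search("when do admissions open", k=1, where={"source": "admissions.pdf"})
prompt = build_prompt("When do admissions open?", hits)
```

In a real pipeline the resulting prompt would be sent to the LLM (Gemini, in this project) for the generation step; that call is left out here because it depends on credentials and the service's own interface.
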

## Quick Start

```python
from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH

# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()

# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Add to vector store (requires an initialized VectorStore instance, here called vector_store)
# vector_store.add_documents(chunks)
```
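
For readers wondering what `chunk_size` and `chunk_overlap` control, the following is a rough sketch of recursive character splitting. It is an assumption-laden illustration, not the `TextSplitter` implementation: `recursive_split`, `merge_with_overlap`, and `split_text` are hypothetical helpers, and for simplicity the pieces are re-joined with single spaces rather than with their original separators.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Break text into pieces of at most chunk_size characters, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed character offsets
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        # Parts that are still too long are split again with the next, finer separator
        pieces.extend(recursive_split(part, chunk_size, rest))
    return pieces


def merge_with_overlap(pieces, chunk_size, chunk_overlap):
    """Greedily pack pieces into chunks, carrying trailing pieces over as overlap."""
    chunks, current = [], []
    for piece in pieces:
        if current and len(" ".join(current + [piece])) > chunk_size:
            chunks.append(" ".join(current))
            # Keep only as many trailing pieces as fit in the overlap budget
            while current and len(" ".join(current)) > chunk_overlap:
                current.pop(0)
            # Drop more if the carried overlap plus the new piece would not fit
            while current and len(" ".join(current + [piece])) > chunk_size:
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


def split_text(text, chunk_size, chunk_overlap):
    """Split text into overlapping chunks no longer than chunk_size characters."""
    return merge_with_overlap(recursive_split(text, chunk_size), chunk_size, chunk_overlap)


chunks = split_text("alpha beta gamma delta epsilon zeta", chunk_size=16, chunk_overlap=6)
```

With these toy settings, consecutive chunks share a word where the overlap budget allows it, which is the same idea as the 200-character overlap in the Quick Start, just at a smaller scale.
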

## Examples

Run the TextSplitter examples:

```bash
python examples_text_splitter.py
```

## Tasks

- [x] Document Loader
  - [ ] Multiple PDF loader
  - [ ] Use a text loader for `.txt` files
- [ ] Preprocessing
  - [ ] Stop-word removal
  - [ ] Punctuation removal
  - [ ] Lowercasing
  - [ ] Lemmatization
- [x] Recursive TextSplitter
  - [ ] Assign metadata to chunks properly
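
The pending preprocessing tasks could start from something like the sketch below, which covers lowercasing, punctuation removal, and stop-word removal using only the standard library. The `preprocess` function and the tiny `STOP_WORDS` set are hypothetical placeholders, not project code. Lemmatization is deliberately left out because it needs an external NLP library.

```python
import string

# Illustrative stop-word list only; a real list would be much longer
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [token for token in no_punct.split() if token not in stop_words]
    return " ".join(tokens)


cleaned = preprocess("The Library is open to ALL students, and staff!")
# cleaned == "library open all students staff"
```
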