---
title: vgecbot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# RAG (Retrieval-Augmented Generation) Project

## Services

### Available Services

1. **Document Loader** (`services.document_loader`)
   - Load PDF documents
   - Support for single and multiple file loading
   - Lazy loading support

2. **Vector Store** (`services.VectorStore`)
   - Similarity search
   - Document management (add, update, delete)
   - Metadata filtering

3. **Text Splitter** (`services.TextSplitter`) ✅
   - Recursive character text splitting
   - Language-specific splitting (20+ languages)
   - See [docs/TEXT_SPLITTER.md](docs/TEXT_SPLITTER.md) for full documentation

4. **RAG Service** (`services.RAGService`) ✅ **NEW**
   - Integrates the Document Loader, Text Splitter, and Vector Store
   - Powered by the **Google Gemini** LLM
   - Provides a complete RAG pipeline with retrieval and generation

## Quick Start

```python
from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH

# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()

# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Add to vector store (assumes `vector_store` is an already-initialized VectorStore)
# vector_store.add_documents(chunks)
```
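
The `chunk_size` and `chunk_overlap` parameters above control how much text each chunk holds and how much consecutive chunks share. The sketch below shows that idea with a simplified fixed-window splitter. It is not the recursive algorithm `TextSplitter` uses, and `split_with_overlap` is a hypothetical helper written only for this illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
# chunks == ["abcdefghij", "ghijklmnop", "mnopqrstuv", "stuvwxyz"]
```

Each chunk repeats the last 4 characters of the previous one, so text near a chunk boundary still appears with some surrounding context.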

## Examples

Run the TextSplitter examples:

```bash
python examples_text_splitter.py
```

## Tasks

- [x] Document Loader
- [ ] Multiple PDF loader
- [ ] TXT loader for `.txt` files
- [ ] preprocessing
  - [ ] stop-word removal
  - [ ] punctuation removal
  - [ ] lowercasing
  - [ ] lemmatization
- [x] Recursive TextSplitter ✅
- [ ] Assign metadata properly
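
The planned preprocessing steps (lowercasing, punctuation removal, stop-word removal) could be sketched as below. This is only an illustration using the standard library: `preprocess` and the tiny `STOP_WORDS` set are hypothetical, and lemmatization is left out because it would need an external NLP library.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real one would be much larger
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = no_punct.split()
    return [token for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The Library is open to students, and the Lab is closed!")
# tokens == ["library", "open", "students", "lab", "closed"]
```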