---
title: vgecbot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# RAG (Retrieval-Augmented Generation) Project
## Services
### Available Services
1. **Document Loader** (`services.document_loader`)
   - Load PDF documents
   - Support for single and multiple file loading
   - Lazy loading support
2. **Vector Store** (`services.VectorStore`)
   - Similarity search
   - Document management (add, update, delete)
   - Metadata filtering
3. **Text Splitter** (`services.TextSplitter`) ✅
   - Recursive character text splitting
   - Language-specific splitting (20+ languages)
   - See [docs/TEXT_SPLITTER.md](docs/TEXT_SPLITTER.md) for full documentation
4. **RAG Service** (`services.RAGService`) ✅ **NEW**
   - Integrates Document Loader, Text Splitter, and Vector Store
   - Powered by the **Google Gemini** LLM
   - Creates a complete RAG pipeline with retrieval and generation
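
To illustrate what "similarity search" with "metadata filtering" means in the retrieval step, here is a minimal, dependency-free sketch. It is not the `services.VectorStore` implementation: the `cosine_similarity` and `search` helpers, the record layout, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, records, k=2, metadata_filter=None):
    """Return the k records most similar to query_vec, keeping only records
    whose metadata matches every key/value pair in metadata_filter."""
    metadata_filter = metadata_filter or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vec, r["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical records: in a real pipeline the vectors come from an embedding model.
records = [
    {"text": "admission rules", "vector": [1.0, 0.0], "metadata": {"source": "a.pdf"}},
    {"text": "hostel fees", "vector": [0.0, 1.0], "metadata": {"source": "b.pdf"}},
    {"text": "admission dates", "vector": [0.9, 0.1], "metadata": {"source": "b.pdf"}},
]

top = search([1.0, 0.0], records, k=2)
filtered = search([1.0, 0.0], records, k=2, metadata_filter={"source": "b.pdf"})
```

The filter is applied before ranking, so `k` results are drawn only from matching records.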
## Quick Start
```python
from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH
# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()
# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)
# Add to vector store (assumes `vector_store` is an initialized VectorStore instance)
# vector_store.add_documents(chunks)
```
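
To show how `chunk_size` and `chunk_overlap` interact, here is a simplified sketch using a plain fixed-size sliding window. This is not the recursive splitting that `TextSplitter` performs (which is documented in docs/TEXT_SPLITTER.md); the `split_with_overlap` function is a hypothetical helper that only demonstrates the overlap idea.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters, where each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so text near a chunk boundary appears in both neighbouring chunks.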
## Examples
Run the TextSplitter examples:
```bash
python examples_text_splitter.py
```
## Tasks
- [x] Document Loader
  - [ ] Multiple PDF loader
  - [ ] TXT loader for `.txt` files
- [ ] Preprocessing
  - [ ] Stop-word removal
  - [ ] Punctuation removal
  - [ ] Lowercasing
  - [ ] Lemmatization
- [x] Recursive TextSplitter ✅
  - [ ] Assign metadata properly
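
The pending preprocessing tasks could be sketched roughly as follows, using only the standard library. The `preprocess` function and the tiny `STOP_WORDS` set are hypothetical placeholders, not part of this project. Lemmatization is left out because it needs an external NLP library.

```python
import string

# Illustrative stop-word list only; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip ASCII punctuation, split on whitespace,
    and drop tokens found in stop_words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in stop_words]


tokens = preprocess("The Hostel is open, and the Library is closed!")
print(tokens)  # ['hostel', 'open', 'library', 'closed']
```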