---
title: vgecbot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
RAG (Retrieval-Augmented Generation) Project
Services
Available Services
Document Loader (services.document_loader)
- Load PDF documents
- Support for single and multiple file loading
- Lazy loading support
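The loader's single-file, multi-file, and lazy modes can be pictured with the minimal sketch below. It is not the project's services.document_loader: Document, read_text_file, lazy_load, and load are hypothetical names, and the default reader only handles plain text, so a PDF reader would be passed in through the reader parameter.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterable, Iterator


@dataclass
class Document:
    """Hypothetical record: extracted text plus metadata such as the source path."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def read_text_file(path: Path) -> str:
    """Default reader for plain-text files; pass a PDF reader instead for PDFs."""
    return path.read_text(encoding="utf-8")


def lazy_load(paths: Iterable[Path],
              reader: Callable[[Path], str] = read_text_file) -> Iterator[Document]:
    """Yield one Document per file; each file is read only when the caller
    asks for the next Document."""
    for path in paths:
        yield Document(page_content=reader(path), metadata={"source": str(path)})


def load(paths: Iterable[Path],
         reader: Callable[[Path], str] = read_text_file) -> list[Document]:
    """Eager variant: read every file up front and return a list."""
    return list(lazy_load(paths, reader))
```

Loading a single file is just load([path]). With lazy_load, a caller that processes each Document and then discards it holds only one file's text in memory at a time.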
Vector Store (services.VectorStore)
- Similarity search
- Document management (add, update, delete)
- Metadata filtering
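A toy in-memory store shows how similarity search, document management, and metadata filtering fit together. InMemoryVectorStore and its method names are hypothetical and do not describe services.VectorStore; embeddings are passed in as plain lists of floats, and ranking by cosine similarity is an assumption made for the sketch.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical toy store mapping doc_id -> (embedding, text, metadata)."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[list[float], str, dict]] = {}

    def add(self, doc_id: str, embedding: list[float], text: str,
            metadata: Optional[dict] = None) -> None:
        if doc_id in self._items:
            raise KeyError(f"{doc_id!r} already exists; use update()")
        self._items[doc_id] = (embedding, text, dict(metadata or {}))

    def update(self, doc_id: str, embedding: list[float], text: str,
               metadata: Optional[dict] = None) -> None:
        if doc_id not in self._items:
            raise KeyError(doc_id)
        self._items[doc_id] = (embedding, text, dict(metadata or {}))

    def delete(self, doc_id: str) -> None:
        del self._items[doc_id]  # raises KeyError for unknown ids

    def similarity_search(self, query_embedding: list[float], k: int = 4,
                          where: Optional[dict] = None) -> list[tuple[str, float, str]]:
        """Return up to k (doc_id, score, text) tuples, highest score first."""
        results = []
        for doc_id, (embedding, text, metadata) in self._items.items():
            # Metadata filter: every key/value pair in `where` must match exactly.
            if where and any(metadata.get(key) != value for key, value in where.items()):
                continue
            results.append((doc_id, cosine_similarity(query_embedding, embedding), text))
        results.sort(key=lambda item: item[1], reverse=True)
        return results[:k]
```

A real store would replace the linear scan with an approximate nearest-neighbour index, but the add/update/delete and filter-then-rank contract stays the same.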
Text Splitter (services.TextSplitter) ✅
- Recursive character text splitting
- Language-specific splitting (20+ languages)
- See docs/TEXT_SPLITTER.md for full documentation
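Recursive character splitting tries coarse separators first (paragraph breaks, then line breaks, then spaces, then single characters) and only falls back to a finer separator for pieces still longer than chunk_size. The sketch below is a simplified, hypothetical re-implementation for illustration; it is not the code behind services.TextSplitter, and it measures size in characters.

```python
def _merge(pieces: list[str], separator: str,
           chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedily join pieces (each <= chunk_size) into chunks of at most
    chunk_size characters, carrying trailing pieces forward as overlap."""
    chunks: list[str] = []
    current: list[str] = []  # pieces of the chunk being built
    length = 0               # len(separator.join(current))
    for piece in pieces:
        sep_len = len(separator) if current else 0
        if current and length + sep_len + len(piece) > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until what is left is no longer than
            # chunk_overlap and the next piece still fits.
            while current and (length > chunk_overlap
                               or length + len(separator) + len(piece) > chunk_size):
                removed = current.pop(0)
                length -= len(removed) + (len(separator) if current else 0)
            sep_len = len(separator) if current else 0
        current.append(piece)
        length += sep_len + len(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


def recursive_split(text: str, chunk_size: int = 1000, chunk_overlap: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text on the coarsest separator present, recursing into pieces
    that are still longer than chunk_size."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    # Pick the first separator that occurs in the text ("" matches anything).
    sep, finer = separators[-1], ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break
    pieces = text.split(sep) if sep else list(text)
    chunks: list[str] = []
    short: list[str] = []  # consecutive pieces that already fit
    for piece in pieces:
        if not piece:
            continue  # skip empties produced by repeated separators
        if len(piece) <= chunk_size:
            short.append(piece)
            continue
        if short:  # flush buffered pieces before handling the long one
            chunks.extend(_merge(short, sep, chunk_size, chunk_overlap))
            short = []
        if finer:
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, finer))
        else:
            chunks.append(piece)  # no finer separator left; keep oversized piece
    if short:
        chunks.extend(_merge(short, sep, chunk_size, chunk_overlap))
    return chunks
```

With chunk_size=10 and chunk_overlap=4, "one two three four five six" splits into ["one two", "two three", "four five", "five six"]; the repeated words are the overlap. As a simplification, overlap is only carried between chunks merged at the same level of splitting.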
RAG Service (services.RAGService) ✅ NEW
- Integrates the Document Loader, Text Splitter, and Vector Store
- Uses Google Gemini as the LLM
- Builds an end-to-end RAG pipeline covering retrieval and generation
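The retrieve-then-generate flow can be outlined as follows. This is a hypothetical sketch, not RAGService's API: retrieve ranks chunks by shared words as a stand-in for vector similarity search, the prompt wording is an assumption, and generate is any callable mapping a prompt to text. In this project that callable would wrap the Gemini model; that wiring is not shown.

```python
from typing import Callable


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how many whitespace-separated query words they share
    (a stand-in for embedding similarity) and return up to k matches."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(chunk.lower().split())), i, chunk)
              for i, chunk in enumerate(chunks)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score, then original order
    return [chunk for score, _, chunk in scored[:k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Number the retrieved chunks and place them before the question."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return ("Answer the question using only the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:")


def answer(query: str, chunks: list[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieval step, then generation step."""
    context = retrieve(query, chunks, k)
    return generate(build_prompt(query, context))
```

Passing generate as a parameter keeps the retrieval logic testable without network access; swapping in a real model client only changes that one argument.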
Quick Start
```python
from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH

# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()

# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Add to the vector store (assumes `vector_store` is a VectorStore
# instance created elsewhere; its construction is not shown here)
# vector_store.add_documents(chunks)
```
Examples
Run the TextSplitter examples:
```bash
python examples_text_splitter.py
```
Tasks
- Document Loader
  - Multiple PDF loader
  - TXT loader for .txt files
- Preprocessing
  - Stop-word removal
  - Punctuation removal
  - Lowercasing
  - Lemmatization
- Recursive TextSplitter ✅
- Assign metadata properly
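The preprocessing steps in the task list could be chained as below. This is a sketch under assumptions: preprocess and STOP_WORDS are hypothetical names, the stop-word list is deliberately tiny, punctuation removal uses Python's ASCII-only string.punctuation, and lemmatization is a pluggable function (identity by default) because real lemmatization needs a library such as NLTK or spaCy.

```python
import string
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stop-word list; a real project would load a
# full list from an NLP library instead.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to"})

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str,
               stop_words: Iterable[str] = STOP_WORDS,
               lemmatize: Callable[[str], str] = lambda token: token) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words,
    then apply the supplied lemmatizer to each remaining token."""
    stop = set(stop_words)
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    return [lemmatize(token) for token in tokens if token not in stop]
```

Because punctuation is removed before splitting, contractions such as "don't" become a single token "dont"; whether that is acceptable depends on the stop-word list and lemmatizer chosen.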