metadata

title: vgecbot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false

RAG (Retrieval-Augmented Generation) Project

Services

Available Services

Document Loader (services.document_loader)
- Load PDF documents
- Support for single and multiple file loading
- Lazy loading support
Vector Store (services.VectorStore)
- Similarity search
- Document management (add, update, delete)
- Metadata filtering
Text Splitter (services.TextSplitter) ✅
- Recursive character text splitting
- Language-specific splitting (20+ languages)
- See docs/TEXT_SPLITTER.md for full documentation
RAG Service (services.RAGService) ✅ NEW
- Integrates Document Loader, Text Splitter, Vector Store
- Powered by Google Gemini LLM
- Creates a complete RAG pipeline with retrieval & generation
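To make the retrieval step concrete, here is a minimal sketch of similarity search with metadata filtering. It is not the `services.VectorStore` API: the `cosine_similarity` and `similarity_search` helpers, the record layout, and the hand-written vectors are all assumptions for illustration. A real store would use embeddings produced by a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vec, records, k=2, metadata_filter=None):
    """Return the top-k records by cosine similarity.

    If metadata_filter is given, only records whose metadata matches
    every key/value pair in the filter are considered.
    """
    candidates = [
        r for r in records
        if metadata_filter is None
        or all(r["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vec, r["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Toy records with hand-written vectors (hypothetical data).
records = [
    {"text": "chunk one", "vector": [1.0, 0.0, 0.0], "metadata": {"source": "a.pdf"}},
    {"text": "chunk two", "vector": [0.0, 1.0, 0.0], "metadata": {"source": "b.pdf"}},
    {"text": "chunk three", "vector": [0.9, 0.1, 0.0], "metadata": {"source": "b.pdf"}},
]

query = [1.0, 0.0, 0.0]
hits = similarity_search(query, records, k=2)
filtered = similarity_search(query, records, k=2, metadata_filter={"source": "b.pdf"})
```

In a RAG pipeline, the texts of the returned records would then be passed to the LLM as context for generation.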

Quick Start

from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH

# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()

# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Add to vector store (assumes `vector_store` is an already initialized VectorStore instance)
# vector_store.add_documents(chunks)
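The `chunk_size` and `chunk_overlap` parameters above control how much text each chunk holds and how much consecutive chunks share. The sketch below illustrates that relationship with a plain fixed-size sliding window, which is simpler than recursive character splitting and is not how `TextSplitter` is implemented. The `split_with_overlap` function is a hypothetical name used only for this example.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij"
chunks = split_with_overlap(sample, chunk_size=4, chunk_overlap=2)
```

With `chunk_size=4` and `chunk_overlap=2`, each chunk repeats the last two characters of the previous one, so context that straddles a boundary appears in both chunks.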

Examples

Run the TextSplitter examples:

python examples_text_splitter.py

Tasks

Document Loader
Multiple PDF loader
Use a text loader for .txt files
Preprocessing
- Stop-word removal
- Punctuation removal
- Lowercasing
- Lemmatization
Recursive TextSplitter ✅
Assign metadata to the chunks properly
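The preprocessing task could start from something like the sketch below, which covers lowercasing, punctuation removal, and stop-word removal using only the standard library. The `preprocess` function and the tiny `STOP_WORDS` set are assumptions for illustration, not existing project code. Lemmatization is left out because it needs a linguistic resource from an NLP library.

```python
import string

# A deliberately small, hypothetical stop-word list; a real one would be larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    # Delete every character listed in string.punctuation.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [token for token in tokens if token not in stop_words]


tokens = preprocess("The splitter is fast, and the loader is lazy!")
```

Whether this step should run before or after text splitting is a design choice: aggressive cleaning shrinks chunks, but it also removes wording the LLM may need when the chunk is used as context.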