metadata
title: vgecbot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false

RAG (Retrieval-Augmented Generation) Project

Services

Available Services

  1. Document Loader (services.document_loader)

    • Load PDF documents
    • Support for single and multiple file loading
    • Lazy loading support
  2. Vector Store (services.VectorStore)

    • Similarity search
    • Document management (add, update, delete)
    • Metadata filtering
  3. Text Splitter (services.TextSplitter) ✅

    • Recursive character text splitting
    • Language-specific splitting (20+ languages)
    • See docs/TEXT_SPLITTER.md for full documentation
  4. RAG Service (services.RAGService) ✅ NEW

    • Integrates the Document Loader, Text Splitter, and Vector Store
    • Powered by the Google Gemini LLM
    • Creates a complete RAG pipeline with retrieval & generation
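To make the retrieval half of the pipeline concrete, here is a minimal, self-contained sketch of similarity search with metadata filtering, followed by prompt assembly for the generation step. It is not the project's implementation: `ToyVectorStore`, `embed`, `cosine`, and `build_prompt` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and the call to the LLM is left out entirely.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; NOT the project's services.VectorStore."""

    def __init__(self):
        self._docs = []  # list of (text, metadata, vector)

    def add_documents(self, docs):
        for text, metadata in docs:
            self._docs.append((text, metadata, embed(text)))

    def similarity_search(self, query, k=2, metadata_filter=None):
        q = embed(query)
        # Keep only documents whose metadata matches every filter key,
        # then rank the survivors by cosine similarity to the query.
        candidates = [
            (cosine(q, vec), text, meta)
            for text, meta, vec in self._docs
            if not metadata_filter
            or all(meta.get(key) == value for key, value in metadata_filter.items())
        ]
        candidates.sort(key=lambda item: item[0], reverse=True)
        return [(text, meta) for _, text, meta in candidates[:k]]


def build_prompt(question, retrieved):
    """Assemble the text that would be sent to the LLM in the generation step."""
    context = "\n".join(text for text, _ in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add_documents([
    ("Admissions open in June", {"source": "admissions.pdf"}),
    ("The library closes at 8 pm", {"source": "library.pdf"}),
    ("Admissions require a merit rank", {"source": "admissions.pdf"}),
])
hits = store.similarity_search("when do admissions open", k=1)
prompt = build_prompt("when do admissions open", hits)
filtered = store.similarity_search(
    "admissions", k=5, metadata_filter={"source": "library.pdf"}
)
```

The filter is applied before ranking, so `k` counts only documents that match the metadata. A real store would typically use dense embeddings and an index rather than a linear scan.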

Quick Start

from services import document_loader, TextSplitter, VectorStore
from libs import ROOT_PATH

# Load documents
pdf_path = ROOT_PATH / "document.pdf"
doc_obj = document_loader(filepath=pdf_path)
documents = doc_obj.load()

# Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Add to vector store (requires an initialized VectorStore instance, here named vector_store)
# vector_store.add_documents(chunks)
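The `chunk_size` and `chunk_overlap` arguments above are easier to reason about with a small illustration. The sketch below uses a plain sliding window over characters, which is a deliberately simpler technique than the recursive character splitting that `TextSplitter` performs. The function name `sliding_window_chunks` is hypothetical, and the code only shows how the two parameters interact.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into chunk_size-character windows sharing chunk_overlap characters."""
    if chunk_size <= 0 or chunk_overlap < 0:
        raise ValueError("chunk_size must be positive and chunk_overlap non-negative")
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
# chunks == ["abcdefghij", "ghijklmnop", "mnopqrstuv", "stuvwxyz"]
```

Each chunk starts with the last four characters of the previous one. With the Quick Start values (1000 and 200), a window of this kind would advance 800 characters at a time. A recursive splitter additionally tries to cut at natural separators such as paragraph or line breaks, so its chunk boundaries will generally differ.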

Examples

Run the TextSplitter examples:

python examples_text_splitter.py

Tasks

  • Document Loader
  • Multiple PDF loader
  • Text loader for .txt files
  • Preprocessing
    • Stop-word removal
    • Punctuation removal
    • Lowercasing
    • Lemmatization
  • Recursive TextSplitter ✅
  • Assign metadata properly
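As a starting point for the preprocessing tasks above, the following sketch covers lowercasing, punctuation removal, and stop-word removal using only the standard library. The `preprocess` function and the tiny `STOP_WORDS` set are hypothetical placeholders, not part of the project. Lemmatization is left out on purpose, since it needs an external NLP library.

```python
import string

# Tiny illustrative stop-word list (an assumption); a fuller list would
# normally come from an NLP library.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    # str.maketrans with a third argument maps those characters to None,
    # so translate() deletes every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [token for token in no_punct.split() if token not in stop_words]


tokens = preprocess("The Library is OPEN, and the labs are closed!")
# tokens == ["library", "open", "labs", "closed"]
```

Deleting punctuation outright also joins hyphenated words into a single token, so replacing punctuation with spaces may be preferable depending on the documents.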