	---
	title: vgecbot
	emoji: 🤖
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# RAG (Retrieval-Augmented Generation) Project

	## Services

	### Available Services

	1. Document Loader (`services.document_loader`)
	- Load PDF documents
	- Support for single and multiple file loading
	- Lazy loading support

	2. Vector Store (`services.VectorStore`)
	- Similarity search
	- Document management (add, update, delete)
	- Metadata filtering

	3. Text Splitter (`services.TextSplitter`) ✅
	- Recursive character text splitting
	- Language-specific splitting (20+ languages)
	- See [docs/TEXT_SPLITTER.md](docs/TEXT_SPLITTER.md) for full documentation

	4. RAG Service (`services.RAGService`) ✅ NEW
	- Integrates the Document Loader, Text Splitter, and Vector Store
	- Powered by Google Gemini LLM
	- Creates a complete RAG pipeline with retrieval & generation
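
The retrieval half of the pipeline described above (similarity search, metadata filtering, then handing context to the LLM) can be sketched in plain Python. This is only an illustration under loud assumptions: `ToyVectorStore`, `embed`, `cosine`, and `build_prompt` are hypothetical names, the "embedding" is a bag-of-words count rather than a real embedding model, and none of this is the project's `services.VectorStore` or `services.RAGService` API. The generation step (the Gemini call) is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; NOT the project's services.VectorStore."""

    def __init__(self):
        self._rows = []  # each row is (text, metadata, vector)

    def add_documents(self, chunks):
        for text, metadata in chunks:
            self._rows.append((text, metadata, embed(text)))

    def similarity_search(self, query, k=2, where=None):
        """Return the k most similar chunks whose metadata matches `where`."""
        query_vec = embed(query)
        scored = [
            (cosine(query_vec, vec), text, meta)
            for text, meta, vec in self._rows
            if not where or all(meta.get(key) == val for key, val in where.items())
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return [(text, meta) for _, text, meta in scored[:k]]


def build_prompt(question, hits):
    """Assemble the retrieved chunks into the context an LLM would receive."""
    context = "\n".join(f"- {text}" for text, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add_documents([
    ("Admissions open in June every year.", {"source": "admissions.pdf"}),
    ("The library closes at 8 pm on weekdays.", {"source": "library.pdf"}),
    ("Admissions require a valid entrance exam score.", {"source": "admissions.pdf"}),
])

question = "When do admissions open?"
hits = store.similarity_search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Passing `where={"source": "library.pdf"}` restricts the search to chunks from that file, which is the kind of metadata filtering the Vector Store item refers to.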

	## Quick Start

	```python
	from services import document_loader, TextSplitter, VectorStore
	from libs import ROOT_PATH

	# Load documents
	pdf_path = ROOT_PATH / "document.pdf"
	doc_obj = document_loader(filepath=pdf_path)
	documents = doc_obj.load()

	# Split into chunks
	splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
	chunks = splitter.split_documents(documents)

	# Add to vector store (uncomment once a VectorStore instance exists as `vector_store`)
	# vector_store.add_documents(chunks)
	```
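
To make the `chunk_size` and `chunk_overlap` parameters used above more concrete, here is a minimal sketch of recursive character splitting. It is an assumption-laden illustration, not the project's `TextSplitter`: `recursive_split`, `merge_with_overlap`, and `split_text` are hypothetical helpers, and for simplicity the pieces are re-joined with a single space, so the original separators are not preserved.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Break text into pieces of at most chunk_size, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pieces = []
    for part in text.split(separators[0]):
        pieces.extend(recursive_split(part, chunk_size, separators[1:]))
    return pieces


def merge_with_overlap(pieces, chunk_size, chunk_overlap, joiner=" "):
    """Greedily pack pieces into chunks, carrying a short tail into the next chunk."""
    chunks, current = [], []
    for piece in pieces:
        if current and len(joiner.join(current + [piece])) > chunk_size:
            chunks.append(joiner.join(current))
            # Keep trailing pieces whose joined length fits the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(joiner.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            current = carried
            # Drop carried pieces if they would push the new chunk over the limit.
            while current and len(joiner.join(current + [piece])) > chunk_size:
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(joiner.join(current))
    return chunks


def split_text(text, chunk_size, chunk_overlap):
    return merge_with_overlap(recursive_split(text, chunk_size), chunk_size, chunk_overlap)


text = "one two three four five six seven eight nine ten"
chunks = split_text(text, chunk_size=15, chunk_overlap=5)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk stays within `chunk_size` characters, and consecutive chunks share a trailing word or two (up to `chunk_overlap` characters) so that a sentence cut at a boundary still appears with some context on both sides.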

	## Examples

	Run the TextSplitter examples:

	```bash
	python examples_text_splitter.py
	```

	## Tasks

	- [x] Document Loader
	- [ ] Multiple PDF loader
	- [ ] Use a text loader when the input is a `.txt` file
	- [ ] Preprocessing
	- [ ] Stop-word removal
	- [ ] Punctuation removal
	- [ ] Lowercasing
	- [ ] Lemmatization
	- [x] Recursive TextSplitter ✅
	- [ ] Assign metadata to chunks properly
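
As a rough idea of what the open preprocessing tasks could look like, the sketch below covers lowercasing, punctuation removal, and stop-word removal using only the standard library. Everything here is an assumption for illustration: `preprocess` and `STOP_WORDS` are hypothetical names, the stop-word list is deliberately tiny, and lemmatization is left out because it would need an external library.

```python
import string

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [tok for tok in no_punct.split() if tok not in stop_words]
    return " ".join(tokens)


cleaned = preprocess("The Library is OPEN on weekdays, and closed on Sundays!")
print(cleaned)  # library open weekdays closed sundays
```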