MemoriaLM / docs /rag_techniques.md

Max Saavedra

RAG techniques and benchmark

10aaf26 about 1 month ago

RAG Techniques Comparison

This project currently supports two retrieval modes in chat:

topk: baseline dense retrieval from Chroma using cosine distance.
rerank: retrieves a larger candidate pool, then re-scores candidates using:
- vector relevance (from Chroma distance)
- lexical overlap with query terms
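
The two rerank signals above can be combined in several ways. The following is a minimal sketch, not the code in backend/modules/rag.py: the function names, the 0.7/0.3 weighting, and the `1 - distance` mapping are all assumptions made for illustration.

```python
import re


def distance_to_relevance(distance: float) -> float:
    """Map a cosine distance (0 = identical) to a relevance score in [0, 1]."""
    return max(0.0, min(1.0, 1.0 - distance))


def lexical_overlap(query: str, text: str) -> float:
    """Fraction of distinct query terms that also appear in the chunk text."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    if not query_terms:
        return 0.0
    text_terms = set(re.findall(r"\w+", text.lower()))
    return len(query_terms & text_terms) / len(query_terms)


def rerank(query, candidates, top_k=5, vector_weight=0.7):
    """Re-score (chunk_text, cosine_distance) pairs and keep the best top_k.

    vector_weight is a hypothetical tuning knob; the remainder of the
    weight goes to lexical overlap.
    """
    scored = []
    for text, distance in candidates:
        score = (vector_weight * distance_to_relevance(distance)
                 + (1.0 - vector_weight) * lexical_overlap(query, text))
        scored.append((score, text))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With this kind of scoring, a chunk that is slightly farther away in embedding space can still outrank a closer one if it shares more terms with the query, which is why rerank starts from a larger candidate pool than topk.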

Where It Is Implemented

Request model: backend/models/schemas.py (ChatRequest.retrieval_mode)
Retrieval logic: backend/modules/rag.py
Chat API: backend/api/chat.py
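
As a rough illustration of how a client might select a mode, the sketch below builds a request body and rejects unknown modes before sending. Only the `retrieval_mode` field and its two values come from this document; the `message` field name and the `build_chat_payload` helper are hypothetical and may not match the real ChatRequest schema.

```python
import json

# The two modes named in this document.
VALID_RETRIEVAL_MODES = ("topk", "rerank")


def build_chat_payload(message: str, retrieval_mode: str = "topk") -> str:
    """Return a JSON body for a chat request (field names other than
    retrieval_mode are assumptions)."""
    if retrieval_mode not in VALID_RETRIEVAL_MODES:
        raise ValueError(
            f"retrieval_mode must be one of {VALID_RETRIEVAL_MODES}, got {retrieval_mode!r}"
        )
    return json.dumps({"message": message, "retrieval_mode": retrieval_mode})
```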

How To Run Benchmark

Prerequisites:

Backend running
Notebook has at least one ingested source

Example:

python scripts/rag_benchmark.py \
  --base-url http://127.0.0.1:8000 \
  --user-id <user_id> \
  --notebook-id <notebook_id> \
  --query "Explain the key ideas in my notes" \
  --top-k 5 \
  --runs 5

The script prints JSON with average/min/max latency and citation/chunk stats for both modes.
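
For readers who want to reproduce the latency figures by hand, the aggregation can be sketched as follows. This is an assumed reconstruction, not the contents of scripts/rag_benchmark.py; `time_runs`, `summarize_latencies`, and the output key names are hypothetical.

```python
import statistics
import time


def time_runs(call, runs=5):
    """Invoke call() `runs` times and return each wall-clock latency in ms."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize_latencies(latencies_ms):
    """Reduce a non-empty list of latencies to average/min/max."""
    return {
        "avg_ms": statistics.fmean(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
    }
```

Passing one callable per mode (for example, a function that sends a chat request with `retrieval_mode` set to `topk` or `rerank`) gives two summaries that can be compared side by side.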

Report Template

Use this table in your class deliverable:

| Query | Mode | Avg Latency (ms) | Avg Citations | Notes on Retrieved Context |
| --- | --- | --- | --- | --- |
| Q1 | topk | | | |
| Q1 | rerank | | | |
| Q2 | topk | | | |
| Q2 | rerank | | | |
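
If you collect several benchmark runs, the table rows can be generated rather than typed. The helper below is a sketch under assumptions: `report_rows` and the dictionary keys (`query`, `mode`, `avg_latency_ms`, `avg_citations`, `notes`) are hypothetical and need to be mapped from whatever JSON keys the benchmark script actually prints.

```python
def report_rows(results):
    """Render benchmark results as a markdown table with the report columns."""
    header = ["Query", "Mode", "Avg Latency (ms)", "Avg Citations",
              "Notes on Retrieved Context"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in results:
        cells = [
            row["query"],
            row["mode"],
            f"{row['avg_latency_ms']:.1f}",
            f"{row['avg_citations']:.1f}",
            row.get("notes", ""),  # notes are written by hand, so optional
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```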

Recommended write-up points:

How different the retrieved chunks were between topk and rerank
Which mode produced less redundant context
Latency tradeoff (rerank is usually slightly slower, since it retrieves and re-scores a larger candidate pool)
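
The first two write-up points can be backed by simple numbers. The sketch below uses Jaccard similarity as one possible measure; this is an illustrative choice, not something the project computes, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def chunk_overlap(topk_ids, rerank_ids):
    """How similar the two modes' retrieved chunk sets are (1.0 = identical)."""
    return jaccard(topk_ids, rerank_ids)


def redundancy(chunks):
    """Mean pairwise word-set similarity within one mode's chunks.

    Higher values suggest the retrieved context repeats itself more.
    """
    word_sets = [set(chunk.lower().split()) for chunk in chunks]
    pairs = list(combinations(word_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)
```

A low `chunk_overlap` means the two modes surfaced largely different context for the same query; comparing `redundancy` across modes gives a rough answer to which one returned less repetitive chunks.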