sammoftah

Run RAG Space with Docker OCR support

5c44ad5 verified about 1 month ago


metadata

title: RAG From Scratch
emoji: 📚
colorFrom: yellow
colorTo: blue
sdk: docker
pinned: false
license: mit

RAG from Scratch

Question

What actually happens inside a retrieval-augmented generation system?

System Boundary

This Space keeps the whole pipeline visible: PDF parsing, chunking, vectorization, vector search, context assembly, and answer generation. The goal is to expose the mechanics, not to wrap RAG in an agent framework.

Method

Uploaded PDFs are split into overlapping text chunks. Each chunk is converted into a lightweight lexical vector, and a user question retrieves the closest passages by cosine similarity over term counts. The language model receives only the retrieved context and is asked to answer with source awareness.
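
The steps in this paragraph can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code in app.py: the function names (`chunk_text`, `term_counts`, `cosine`, `retrieve`), the word-based windowing, and the default sizes are all assumptions chosen for readability.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping windows of words (sizes are illustrative)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def term_counts(text):
    """Lexical vector: lowercase alphanumeric tokens mapped to their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=3):
    """Return the top_k (score, chunk) pairs, highest score first."""
    query_vector = term_counts(question)
    scored = [(cosine(query_vector, term_counts(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With chunks such as "cats chase mice" and "dogs chase cars", the question "what do cats chase" ranks the first chunk higher because it shares two terms with the query instead of one.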

Technique

Retrieval-augmented generation separates memory from generation. Instead of asking the model to answer from its parameters alone, the system first searches an external corpus and then conditions the model on the retrieved evidence.
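
The retrieve-then-generate split can be sketched as two small functions. The names `build_prompt` and `answer`, the prompt wording, and the `retrieve_fn` / `generate_fn` callables are hypothetical placeholders; the Space's actual prompt and model call may differ.

```python
def build_prompt(question, retrieved):
    """Assemble a prompt from (source_name, chunk_text) pairs.

    The instruction wording here is an assumption, not the app's prompt.
    """
    context_lines = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieve_fn, generate_fn):
    """Search first, then condition the model on the retrieved evidence.

    retrieve_fn(question) -> list of (source_name, chunk_text) pairs
    generate_fn(prompt)   -> model output as a string
    """
    retrieved = retrieve_fn(question)
    prompt = build_prompt(question, retrieved)
    return generate_fn(prompt), retrieved
```

Passing the retriever and the generator in as callables keeps the two stages independently replaceable, which mirrors the separation of memory from generation described above.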

The important design choices are chunk size, overlap, retrieval representation, distance metric, number of retrieved chunks, and prompt format. Each one affects which evidence reaches the model, and therefore the quality of the final answer.
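
One way to keep these choices explicit is to gather them in a single configuration object. The sketch below is hypothetical: the class name `RagConfig`, its field names, and the default values are assumptions for illustration and are not taken from app.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Illustrative container for the design choices listed above."""
    chunk_size: int = 200                 # words per chunk (assumed unit)
    overlap: int = 50                     # words shared by adjacent chunks
    representation: str = "term_counts"   # how chunks become vectors
    metric: str = "cosine"                # how vectors are compared
    top_k: int = 3                        # number of retrieved chunks
    prompt_template: str = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.overlap < self.chunk_size:
            raise ValueError("overlap must be in [0, chunk_size)")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")
```

Because the object is frozen, comparing two runs means constructing two configurations, for example `RagConfig()` against `RagConfig(chunk_size=100, overlap=20)`, and holding everything else constant.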

Output

The app returns an answer, the retrieved chunks, similarity scores, and source names.
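
A possible shape for that output, written as plain dataclasses. The type and field names (`RetrievedChunk`, `RagResult`, `source`, `score`) are assumptions for illustration; the app may structure its return value differently.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    source: str   # name of the document the chunk came from
    text: str     # the chunk itself
    score: float  # similarity to the question


@dataclass
class RagResult:
    answer: str
    chunks: list = field(default_factory=list)  # list of RetrievedChunk

    def sources(self):
        """Distinct source names, in order of first appearance."""
        seen = []
        for chunk in self.chunks:
            if chunk.source not in seen:
                seen.append(chunk.source)
        return seen
```

Keeping the chunks and scores next to the answer is what lets a reader check the answer against its evidence.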

Why It Matters

Most RAG failures are retrieval failures disguised as generation failures. This demo makes retrieval inspectable.

What To Notice

If the retrieved chunks are weak, the generated answer will be weak even if the language model is strong. The retrieved evidence is therefore the first object to debug.
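
A small check along these lines can run before any blame is assigned to the model. The function name `diagnose_retrieval` and the 0.2 threshold are assumptions; a useful cutoff depends on the corpus and the retrieval representation.

```python
def diagnose_retrieval(scored_chunks, min_score=0.2):
    """Classify a retrieval result given (score, chunk_text) pairs.

    min_score is an illustrative threshold, not a recommended value.
    """
    if not scored_chunks:
        return {"status": "empty", "best_score": None}
    best_score = max(score for score, _ in scored_chunks)
    status = "weak" if best_score < min_score else "ok"
    return {"status": status, "best_score": best_score}
```

A "weak" or "empty" status suggests looking at chunking and retrieval settings first; only an "ok" status makes the prompt and the model the more likely suspects.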

Effect In Practice

RAG lets teams build assistants over private or changing documents without fine-tuning the model every time the knowledge base changes.

Hugging Face Extension

This Space can grow into a full retrieval benchmark by publishing example documents, queries, expected citations, and answer-quality labels as a Hugging Face Dataset.
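
As a rough sketch of what such a benchmark could measure, the function below scores retrieval against expected citations. The record fields (`query`, `expected_citations`), the function name `citation_hit_rate`, and the `retrieve_sources` callable are hypothetical; no such dataset or schema exists in this Space yet.

```python
def citation_hit_rate(examples, retrieve_sources, k=3):
    """Fraction of queries whose top-k sources include an expected citation.

    examples: list of dicts with "query" and "expected_citations" keys
    retrieve_sources(query) -> ranked list of source names
    """
    if not examples:
        return 0.0
    hits = 0
    for example in examples:
        top_sources = set(retrieve_sources(example["query"])[:k])
        if top_sources & set(example["expected_citations"]):
            hits += 1
    return hits / len(examples)
```

Answer-quality labels would need a separate metric; this one isolates the retrieval stage only.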

Limitations

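The app uses lexical retrieval so that it runs reliably on small CPU-only Spaces. The trade-off is that a passage sharing no terms with the question cannot be retrieved, even when it is a paraphrase of the answer. Production systems should add dense embeddings, document metadata, reranking, evaluation sets, and hallucination checks.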

Run Locally

pip install -r requirements.txt
python app.py