title: Classics RAG QA
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
Classics RAG QA: Grounded Q&A for The Iliad and Dorian Gray
Ask a question and get a concise answer with verbatim quotes and citations.
No hallucinations: every claim is backed by text pulled from the book.
What it does
Retrieves the most relevant passages from public-domain editions (Project Gutenberg).
Composes a short answer that references [1][2][3]-style citations.
Shows the exact quoted lines and where they came from (book, chapter/paragraph).
How it works (under the hood)
Chunk the cleaned book into overlapping segments with chapter/paragraph metadata.
Embed chunks using sentence-transformers/all-MiniLM-L6-v2.
Index embeddings in FAISS for fast top-k retrieval.
Compose answers with a deterministic heuristic:
rank candidate sentences by lexical coverage and similarity;
select 1-3 diverse quotes;
synthesize a 2-4 sentence answer that explicitly references the quotes.
No large language model is required; an optional rewrite step can be added but is off by default to preserve groundedness.
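The quote-selection part of the heuristic above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the stopword list, the equal weighting `alpha=0.5`, and the `max_overlap` threshold are all assumptions, and the similarity scores are assumed to come from the retrieval step. The example sentences are illustrative and not quoted from any edition.

```python
import re

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "at",
             "does", "how", "what", "his", "her", "him", "by", "with", "for"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def coverage(question, sentence):
    """Fraction of the question's content words found in the sentence."""
    q = content_words(question)
    return len(q & content_words(sentence)) / len(q) if q else 0.0

def jaccard(a, b):
    """Word-set overlap between two sentences, used as a diversity check."""
    wa, wb = content_words(a), content_words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def select_quotes(question, candidates, max_quotes=3, max_overlap=0.5, alpha=0.5):
    """Pick up to max_quotes diverse sentences.

    candidates: list of (sentence, similarity) pairs, where similarity is a
    precomputed retrieval score in [0, 1] (hypothetical input format).
    """
    ranked = sorted(
        candidates,
        key=lambda c: alpha * coverage(question, c[0]) + (1 - alpha) * c[1],
        reverse=True,
    )
    chosen = []
    for sentence, _ in ranked:
        # Skip near-duplicates of quotes that were already selected.
        if all(jaccard(sentence, kept) <= max_overlap for kept in chosen):
            chosen.append(sentence)
        if len(chosen) == max_quotes:
            break
    return chosen

question = "How does Homer portray Achilles' anger?"
candidates = [
    ("Sing, goddess, the anger of Achilles son of Peleus.", 0.90),
    ("Sing, O goddess, the anger of Achilles, son of Peleus.", 0.88),
    ("The ships lay drawn up on the shore.", 0.20),
    ("Achilles scowled at him and answered in anger.", 0.70),
]
quotes = select_quotes(question, candidates, max_quotes=2)
```

Because the ranking and the diversity filter are both deterministic, the same question and index always yield the same quotes, which is what makes the answers reproducible without a language model.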
Evaluation (lightweight)
Retrieval Recall@k: proportion of questions whose gold-support chunk appears in the top-k.
Groundedness: percentage of answers with ≥1 quote.
Attribution: fraction of answer sentences that share ≥2 content words with some quote.
On a tiny hand-built QA set (10-20 items), target Recall@5 ≥ 0.8 and Groundedness ≥ 0.95.
Note: numbers vary by edition and chunking parameters.
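The three metrics defined above could be computed along these lines. This is a sketch under stated assumptions: the function names, input formats, and stopword list are hypothetical, and "content words" is approximated as lowercase alphabetic tokens minus a small stopword list.

```python
import re

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "at"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def recall_at_k(ranked_ids_per_question, gold_ids, k=5):
    """Proportion of questions whose gold-support chunk id is in the top-k."""
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_ids_per_question, gold_ids)
        if gold in ranked[:k]
    )
    return hits / len(gold_ids)

def groundedness(quotes_per_answer):
    """Fraction of answers that carry at least one quote."""
    if not quotes_per_answer:
        return 0.0
    return sum(1 for quotes in quotes_per_answer if quotes) / len(quotes_per_answer)

def attribution(answer_sentences, quotes, min_shared=2):
    """Fraction of answer sentences sharing >= min_shared content words with a quote."""
    if not answer_sentences:
        return 0.0
    supported = sum(
        1 for sentence in answer_sentences
        if any(len(content_words(sentence) & content_words(q)) >= min_shared
               for q in quotes)
    )
    return supported / len(answer_sentences)

# Toy usage with made-up ids and sentences.
r = recall_at_k([["c1", "c2"], ["c9", "c3"]], ["c2", "c7"], k=5)
g = groundedness([["q1"], [], ["q2", "q3"], ["q4"]])
a = attribution(
    ["Achilles withdraws in anger.", "The weather is mild."],
    ["the anger of Achilles"],
)
```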
Current Results:
- Recall@5: 100% (10/10 questions)
- Groundedness: 100% quote presence
- Attribution Score: 0.75 (target: ≥0.7)
Try it
Pick a book (Iliad or Dorian Gray).
Ask focused questions like:
"How does Homer portray Achilles' anger in Book 1?"
"What does Lord Henry claim about influence on the young?"
"Where does the poem describe the shield of Achilles?"
Read the answer; expand Evidence to inspect quotes and locations.
Configuration
Key parameters (adjusted in configs/app.yaml):
chunk_size/chunk_overlap: retrieval granularity and recall.
embedding_model: default all-MiniLM-L6-v2 (speed/quality trade-off).
top_k: number of retrieved chunks shown to the composer.
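To make the interaction between chunk_size and chunk_overlap concrete, here is a small sketch of overlapping chunking. It assumes both parameters are measured in whitespace-separated words, which the configuration does not state; the function name and the returned fields are hypothetical, and chapter/paragraph metadata is omitted for brevity.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into overlapping word windows.

    Each window holds up to chunk_size words and starts
    (chunk_size - chunk_overlap) words after the previous one.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({"start": start, "text": " ".join(window)})
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = chunk_text("a b c d e f g h i j", chunk_size=4, chunk_overlap=2)
```

A larger overlap lowers the chance that a relevant sentence is split across a chunk boundary, at the cost of more chunks to embed and index.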
Data & Licensing
Texts are sourced from Project Gutenberg (public domain).
Only derived chunks and indices are stored for retrieval; we do not redistribute copyrighted editions.
Limitations
Coreference and pronouns may require nearby context; very long-range references can be missed.
Different translations/editions may shift phrasing and chapter boundaries.
The system is conservative by design; if quotes are weak, the answer stays cautious.
Negative questions (e.g., "Was X ugly?") may not retrieve correct context due to semantic search limitations with negation.
Roadmap
Named-entity & character graph for richer answers.
Optional LLM paraphrase pass that never changes quotes (off by default).
Multi-book corpus with per-source filtering and cross-references.
Citation
If you reference this project, please cite:
Classics RAG QA: Grounded Literary Question Answering with Verbatim Citations (2025).