--- title: Classics RAG QA emoji: ๐Ÿ“š colorFrom: purple colorTo: blue sdk: gradio sdk_version: "4.0.0" app_file: app.py pinned: false --- # Classics RAG QA โ€” Grounded Q&A for The Iliad and Dorian Gray **Ask a question โ†’ get a concise answer with verbatim quotes and citations.** No hallucinations: every claim is backed by text pulled from the book. --- ## โœจ What it does - Retrieves the most relevant passages from **public-domain editions** (Project Gutenberg). - Composes a short answer that references **[1][2][3]**-style citations. - Shows the exact **quoted lines** and where they came from (book, chapter/paragraph). --- ## ๐Ÿ› ๏ธ How it works (under the hood) 1. **Chunk** the cleaned book into overlapping segments with chapter/paragraph metadata. 2. **Embed** chunks using `sentence-transformers/all-MiniLM-L6-v2`. 3. **Index** embeddings in FAISS for fast top-k retrieval. 4. **Compose** answers with a deterministic heuristic: - rank candidate sentences by lexical coverage and similarity; - select 1โ€“3 diverse quotes; - synthesize a 2โ€“4 sentence answer that explicitly references the quotes. No large language model is required; an optional rewrite step can be added but is **off by default** to preserve groundedness. --- ## ๐Ÿงช Evaluation (lightweight) - **Retrieval Recall@k:** proportion of questions whose gold-support chunk appears in the top-k. - **Groundedness:** % of answers with โ‰ฅ1 quote; **Attribution** = fraction of answer sentences that share โ‰ฅ2 content words with some quote. - On a tiny hand-built QA set (10โ€“20 items), target Recall@5 โ‰ฅ 0.8 and Groundedness โ‰ฅ 0.95. > Note: numbers vary by edition and chunking parameters. **Current Results:** - โœ… **Recall@5: 100%** (10/10 questions) - โœ… **Groundedness: 100%** quote presence - โœ… **Attribution Score: 0.75** (target: โ‰ฅ0.7) --- ## ๐Ÿš€ Try it - Pick a book (Iliad or Dorian Gray). - Ask focused questions like: - "How does Homer portray Achilles' anger in Book 1?" - "What does Lord Henry claim about influence on the young?" - "Where does the poem describe the shield of Achilles?" - Read the answer; expand **Evidence** to inspect quotes and locations. --- ## โš™๏ธ Configuration Key parameters (adjusted in `configs/app.yaml`): - `chunk_size` / `chunk_overlap`: retrieval granularity and recall. - `embedding_model`: default `all-MiniLM-L6-v2` (speed/quality trade-off). - `top_k`: number of retrieved chunks shown to the composer. --- ## ๐Ÿ“š Data & Licensing - Texts are sourced from **Project Gutenberg** (public domain). - Only derived chunks and indices are stored for retrieval; we do not redistribute copyrighted editions. --- ## ๐Ÿ”Ž Limitations - Coreference and pronouns may require nearby context; very long-range references can be missed. - Different translations/editions may shift phrasing and chapter boundaries. - The system is conservative by design; if quotes are weak, the answer stays cautious. - Negative questions (e.g., "Was X ugly?") may not retrieve correct context due to semantic search limitations with negation. --- ## ๐Ÿงฉ Roadmap - Named-entity & character graph for richer answers. - Optional LLM paraphrase pass that **never changes quotes** (off by default). - Multi-book corpus with per-source filtering and cross-references. --- ## ๐Ÿงพ Citation If you reference this project, please cite: > Classics RAG QA โ€” Grounded Literary Question Answering with Verbatim Citations (2025). > https://huggingface.co/spaces/Tuminha/classics-rag-qa