metadata
title: Classics RAG QA
emoji: 📚
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false

Classics RAG QA: Grounded Q&A for The Iliad and Dorian Gray

Ask a question → get a concise answer with verbatim quotes and citations.

No hallucinations: every claim is backed by text pulled from the book.


✨ What it does

  • Retrieves the most relevant passages from public-domain editions (Project Gutenberg).

  • Composes a short answer that references [1][2][3]-style citations.

  • Shows the exact quoted lines and where they came from (book, chapter/paragraph).


🛠️ How it works (under the hood)

  1. Chunk the cleaned book into overlapping segments with chapter/paragraph metadata.

  2. Embed chunks using sentence-transformers/all-MiniLM-L6-v2.

  3. Index embeddings in FAISS for fast top-k retrieval.

  4. Compose answers with a deterministic heuristic:

    • rank candidate sentences by lexical coverage and similarity;

    • select 1–3 diverse quotes;

    • synthesize a 2–4 sentence answer that explicitly references the quotes.
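Step 1 can be illustrated with a minimal sketch. It assumes the chunk size and overlap are counted in paragraphs and that each chunk records its chapter and paragraph range; the `Chunk` fields and the `chunk_chapter` helper are hypothetical names, and the project's actual units and metadata layout may differ.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    chapter: int    # chapter (or book) number, hypothetical metadata field
    start_par: int  # index of the first paragraph in the chunk
    end_par: int    # index of the last paragraph in the chunk (inclusive)


def chunk_chapter(paragraphs, chapter, size=3, overlap=1):
    """Group paragraphs into windows of `size`; neighbours share `overlap` paragraphs."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(paragraphs), step):
        window = paragraphs[start:start + size]
        end = start + len(window) - 1
        chunks.append(Chunk(" ".join(window), chapter, start, end))
        if start + size >= len(paragraphs):
            break  # this window already reaches the end of the chapter
    return chunks


if __name__ == "__main__":
    for c in chunk_chapter(["p0", "p1", "p2", "p3", "p4"], chapter=1):
        print(c.chapter, c.start_par, c.end_par, c.text)
```

The overlap means a sentence near a chunk boundary also appears in the neighbouring chunk, which is why overlap tends to help recall at the cost of a larger index.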

No large language model is required; an optional rewrite step can be added but is off by default to preserve groundedness.
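The deterministic composition in step 4 might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the stopword list, the 50/50 weighting (`alpha`), the Jaccard threshold used for diversity (`max_overlap`), and the answer template are all placeholders, and the answer here is simplified to one sentence per quote.

```python
import re

# Small illustrative stopword list (assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "does",
             "how", "what", "where", "his", "her", "by", "with", "was"}


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_quotes(question, candidates, max_quotes=3, alpha=0.5, max_overlap=0.6):
    """candidates: (sentence, similarity) pairs; similarity is the retrieval
    score of the chunk the sentence came from."""
    q = content_words(question)
    scored = []
    for sentence, similarity in candidates:
        coverage = len(q & content_words(sentence)) / len(q) if q else 0.0
        scored.append((alpha * coverage + (1 - alpha) * similarity, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen = []
    for _, sentence in scored:
        words = content_words(sentence)
        # Diversity: skip sentences that mostly repeat an already chosen quote.
        if all(jaccard(words, content_words(c)) <= max_overlap for c in chosen):
            chosen.append(sentence)
        if len(chosen) == max_quotes:
            break
    return chosen


def compose_answer(quotes):
    """Template-based answer in which every sentence cites one quote."""
    if not quotes:
        return "The retrieved passages do not clearly answer this question."
    return " ".join(f'The text states: "{q}" [{i}].' for i, q in enumerate(quotes, 1))
```

Because every answer sentence is built from a selected quote, the output cannot contain material that was not retrieved, which is the property the optional rewrite step would put at risk.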


🧪 Evaluation (lightweight)

  • Retrieval Recall@k: proportion of questions whose gold-support chunk appears in the top-k.

  • Groundedness: percentage of answers that include at least one quote (≥1).

  • Attribution: fraction of answer sentences that share ≥2 content words with some quote.

  • On a tiny hand-built QA set (10–20 items), target Recall@5 ≥ 0.8 and Groundedness ≥ 0.95.

Note: numbers vary by edition and chunking parameters.
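The three metrics defined above are simple enough to compute in a few lines. The sketch below is one possible reading of those definitions; the function names, the input shapes, and the stopword list used to decide what counts as a "content word" are assumptions.

```python
import re

# Illustrative stopword list; what counts as a content word is an assumption.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "it",
             "at", "by", "with", "was"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def recall_at_k(retrieved_ids, gold_ids, k=5):
    """retrieved_ids: one ranked list of chunk ids per question.
    gold_ids: the gold-support chunk id for each question."""
    if not gold_ids:
        return 0.0
    hits = sum(gold in ranked[:k] for ranked, gold in zip(retrieved_ids, gold_ids))
    return hits / len(gold_ids)


def groundedness(quotes_per_answer):
    """Share of answers that carry at least one quote."""
    if not quotes_per_answer:
        return 0.0
    return sum(len(q) >= 1 for q in quotes_per_answer) / len(quotes_per_answer)


def attribution(answer_sentences, quotes, min_shared=2):
    """Share of answer sentences with >= min_shared content words in common
    with at least one quote."""
    if not answer_sentences:
        return 0.0
    quote_words = [content_words(q) for q in quotes]
    supported = sum(
        any(len(content_words(s) & qw) >= min_shared for qw in quote_words)
        for s in answer_sentences
    )
    return supported / len(answer_sentences)
```

With only 10 to 20 questions, a single miss moves Recall@5 by 5 to 10 points, so these figures are best read as a smoke test.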

Current Results:

  • ✅ Recall@5: 100% (10/10 questions)
  • ✅ Groundedness: 100% quote presence
  • ✅ Attribution Score: 0.75 (target: ≥0.7)

🚀 Try it

  • Pick a book (Iliad or Dorian Gray).

  • Ask focused questions like:

    • "How does Homer portray Achilles' anger in Book 1?"

    • "What does Lord Henry claim about influence on the young?"

    • "Where does the poem describe the shield of Achilles?"

  • Read the answer; expand Evidence to inspect quotes and locations.


⚙️ Configuration

Key parameters (adjusted in configs/app.yaml):

  • chunk_size / chunk_overlap: retrieval granularity and recall.

  • embedding_model: default all-MiniLM-L6-v2 (speed/quality trade-off).

  • top_k: number of retrieved chunks shown to the composer.
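One way to keep these parameters consistent is to validate them when the configuration is loaded. The sketch below assumes the YAML file has already been parsed into a plain dict; the `RetrievalConfig` class, the `config_from_mapping` helper, and the numeric defaults are hypothetical placeholders, not the values shipped in configs/app.yaml.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RetrievalConfig:
    chunk_size: int = 800     # placeholder value, not the project's default
    chunk_overlap: int = 120  # placeholder value, not the project's default
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    top_k: int = 5            # placeholder value, not the project's default

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


def config_from_mapping(raw):
    """Build a config from a dict, rejecting keys this sketch does not know."""
    allowed = {f.name for f in fields(RetrievalConfig)}
    unknown = set(raw) - allowed
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return RetrievalConfig(**raw)
```

Rejecting an overlap that is not smaller than the chunk size matters because the chunking window would otherwise never advance.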


📚 Data & Licensing

  • Texts are sourced from Project Gutenberg (public domain).

  • Only derived chunks and indices are stored for retrieval; we do not redistribute copyrighted editions.


🔎 Limitations

  • Coreference and pronouns may require nearby context; very long-range references can be missed.

  • Different translations/editions may shift phrasing and chapter boundaries.

  • The system is conservative by design; if quotes are weak, the answer stays cautious.

  • Negatively framed questions (e.g., "Was X ugly?") may not retrieve the correct context, because semantic search handles negation poorly.


🧩 Roadmap

  • Named-entity & character graph for richer answers.

  • Optional LLM paraphrase pass that never changes quotes (off by default).

  • Multi-book corpus with per-source filtering and cross-references.


🧾 Citation

If you reference this project, please cite:

Classics RAG QA: Grounded Literary Question Answering with Verbatim Citations (2025).

https://huggingface.co/spaces/Tuminha/classics-rag-qa