# Presentation Script
## One-line Pitch
I built a grounded QA system over official technical documentation with reproducible artifact generation, benchmark evaluation, and terminal-first usage for both factoid and explanatory questions.
## What It Does
- crawls official docs
- converts them into searchable chunks
- retrieves evidence with sparse and dense search
- fine-tunes the dense retriever on project-generated domain pairs
- reranks passages
- extracts answers with a QA reader
- synthesizes grounded explanatory answers from multiple supporting chunks
- evaluates quality with benchmark metrics
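The script does not say how the sparse and dense results are merged before reranking. As a minimal sketch only, assuming the two retrievers each return a ranked list of chunk ids, one common option is reciprocal rank fusion. The function name, the chunk ids, and `k=60` below are illustrative assumptions, not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids from a sparse retriever and a dense retriever.
sparse_hits = ["chunk_a", "chunk_b", "chunk_c"]
dense_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

A chunk that both retrievers rank highly rises to the top, and the fused list is what a reranker would then rescore. The actual system may combine the two searches differently.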
## Why It Is Not Just a Demo
- it has a real artifact pipeline
- it stores local model snapshots and indexes
- it includes benchmark evaluation and error analysis
- it has deterministic rebuild behavior
- it has CI and tests
## Suggested Live Flow
1. Show `scripts/qa_cli.py status`
2. Show `scripts/qa_cli.py ask "Which parameter type can you declare in a FastAPI path operation to set response headers?"`
3. Show `scripts/qa_cli.py ask --style explanatory "How do you set custom response headers in FastAPI, and why does using a Response parameter work?"`
4. Show `scripts/qa_cli.py eval --threshold 0.0`
5. Open `artifacts/real_qa/reports/evaluation_report.md`
6. Open `artifacts/real_qa/reports/error_analysis.md`
## Honest Framing
- this is a serious QA system, not a novelty LLM product
- the main strength is end-to-end retrieval QA engineering with grounded explanatory synthesis
- the next scaling step would be larger supervised training and a larger benchmark
- the current snapshot is already reproducible and benchmarked, not just a notebook demo