RAG Email Assistant (Haystack 2 + PydanticAI + Gradio)
Implementation Specifications — 2025-ready, modular, production-minded
Full engineering specification for implementing the staff-assist RAG system based on Haystack 2, PydanticAI, and Gradio.
Overview
This document specifies how to implement a Retrieval-Augmented Generation (RAG) system that helps university administrative staff draft replies to student emails, using policy documents written in Markdown.
It defines:
- System architecture
- Technologies and dependencies
- Data ingestion and indexing
- Retrieval and generation pipelines
- Agents and validation
- User interface behavior
- Deployment and CI/CD guidance
System Objectives
- Automate the first draft of admin replies using institution policy sources.
- Ensure every output is traceable and grounded in authoritative sources.
- Provide a non-technical, auditable interface for staff.
- Maintain strict privacy and institutional compliance.
Core Architecture
Components
| Layer | Technology | Description |
|---|---|---|
| Storage | OpenSearch | Hybrid BM25 + vector retrieval, 1024-d embeddings |
| Embeddings | multilingual-e5-large-instruct | Multilingual, normalized vectors |
| Retrieval | Haystack 2 pipelines | Modular retrievers + re-ranker |
| Reasoning | PydanticAI agents | Typed, validated outputs |
| Interface | Gradio 4.x | Simple GUI for staff use |
| Logging | Structlog JSON | Observability and auditability |
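As an illustration of the storage layer, the sketch below builds one possible body for configs/opensearch_index.json. Only the 1024-dimension vector and the hybrid BM25 + vector requirement come from the table above. The field names `content` and `embedding`, the HNSW method, the `lucene` engine, and the `cosinesimil` space type are assumptions chosen for the example, not settings confirmed by this specification.

```python
import json

EMBEDDING_DIM = 1024  # from the Components table


def build_index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Return a candidate OpenSearch index body for hybrid retrieval.

    Text fields serve BM25; the knn_vector field serves dense retrieval.
    Field names and the HNSW/engine/space settings are assumptions.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "content": {"type": "text"},          # chunk text (BM25)
                "title": {"type": "text"},
                "section_path": {"type": "keyword"},  # heading trail of the chunk
                "lang": {"type": "keyword"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }


def to_json(body: dict) -> str:
    """Serialize the index body for writing to a config file."""
    return json.dumps(body, indent=2)
```

Because the embeddings are normalized, cosine similarity and inner product rank documents identically, so the space type could be changed without affecting results.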
Agents
- IntentExtractor — Extracts intent, questions, entities, and language from a student email.
- Retriever (Haystack) — Finds relevant chunks based on extracted questions.
- Composer — Drafts a reply using retrieved content.
- FactChecker — Validates draft against evidence and flags unsupported claims.
Project Layout
app/
  config.py # settings
  logging_setup.py
  models.py # Pydantic schemas
  utils/
    markdown_loader.py # Markdown parser
  retriever/
    indexer.py
    pipeline.py
  agents/
    llm_client.py
    intent_extractor.py
    composer.py
    fact_checker.py
  gradio_app.py
  main.py
scripts/
  ingest_markdown.py
  create_index.py
  healthcheck.py
configs/
  opensearch_index.json
docs/
  RAG_Email_Assistant_Specifications_v1.0.md
Technologies & Dependencies
- Python: 3.11+
- Libraries: haystack-ai, opensearch-py, sentence-transformers, pydantic, pydantic-ai, gradio, structlog, fastapi
- Model weights: hosted or local (intfloat/multilingual-e5-large-instruct, BAAI/bge-reranker-v2-m3, openai/gpt-oss-20b)
- Infra: OpenSearch ≥ 2.11 with k-NN; Docker-compose optional.
Configuration
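All configuration is read from environment variables (Twelve-Factor pattern).
Example, with RAG_ as the prefix and a double underscore separating nested keys: RAG_MODELS__LLM_MODEL=openai/gpt-oss-20b
The full variable mapping was given in an earlier chat message and is not reproduced in this document.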
Markdown Ingestion and Indexing
- Parse Markdown with markdown-it-py.
- Preserve headings, tables, and metadata (section_path, lang, title).
- Sentence-based chunking (~350 tokens) for prose; 1 chunk per table.
- Embed all chunks with normalized E5 embeddings and store in OpenSearch.
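The sentence-based chunking step can be sketched as follows. This is a simplified illustration: it splits sentences with a regular expression and approximates tokens by whitespace-separated words, whereas the real ingester would likely count tokens with the embedding model's tokenizer. The names `chunk_sentences` and `count_tokens` are hypothetical.

```python
import re

MAX_TOKENS = 350  # target chunk size from the list above

# Split after ., ! or ? followed by whitespace (a rough heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_tokens(text: str) -> int:
    """Approximate the token count by counting whitespace-separated words."""
    return len(text.split())


def chunk_sentences(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Tables bypass this function entirely, since the specification stores each table as a single chunk.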
Retrieval Pipeline
- Hybrid retrieval = BM25 + dense embeddings.
- Fuse via Reciprocal Rank Fusion (RRF).
- Re-rank top candidates using multilingual cross-encoder.
- Return top 5 for generation.
Agentic Pipeline (PydanticAI)
StudentQuery(intent, questions, language, entities)
→ retrieve()
→ compose()
→ fact_check()
→ EmailDraft(body, citations, warnings)
Agents exchange typed objects, not raw text, ensuring safe re-prompting and audit logging.
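The flow above can be sketched with typed objects passed between steps. To stay dependency-free, this example uses standard-library dataclasses as stand-ins for the Pydantic schemas in app/models.py. The field types, the `Chunk` class, and the toy bodies of `retrieve`, `compose`, and `fact_check` are all assumptions for illustration; the real steps would call the Haystack pipeline and the LLM-backed agents.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StudentQuery:
    intent: str
    questions: list[str]
    language: str
    entities: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Chunk:  # hypothetical evidence record
    chunk_id: str
    text: str


@dataclass
class EmailDraft:
    body: str
    citations: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def _words(text: str) -> set[str]:
    return {w.lower().strip("?.,") for w in text.split()}


def retrieve(query: StudentQuery, corpus: list[Chunk]) -> list[Chunk]:
    """Toy keyword overlap retriever standing in for the Haystack pipeline."""
    terms = set().union(*(_words(q) for q in query.questions))
    return [c for c in corpus if terms & _words(c.text)][:5]


def compose(query: StudentQuery, evidence: list[Chunk]) -> EmailDraft:
    """Toy composer: quotes the evidence instead of calling an LLM."""
    body = " ".join(c.text for c in evidence) or "No policy text found."
    return EmailDraft(body=body, citations=[c.chunk_id for c in evidence])


def fact_check(draft: EmailDraft, evidence: list[Chunk]) -> EmailDraft:
    """Toy check: warn when the draft cites no evidence at all."""
    if not draft.citations:
        draft.warnings.append("Draft has no supporting evidence.")
    return draft


def run_pipeline(query: StudentQuery, corpus: list[Chunk]) -> EmailDraft:
    evidence = retrieve(query, corpus)
    return fact_check(compose(query, evidence), evidence)
```

Because every step accepts and returns a typed object, each intermediate value can be logged or validated on its own, which is the property the audit trail relies on.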
Gradio UI
- Single text box → staff pastes student email.
- Button → “Generate Draft”.
- Output → editable email draft.
- Accordion (hidden by default) → shows evidence chunks with metadata.
Designed for Hugging Face Spaces deployment with external OpenSearch.
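A minimal sketch of this layout with Gradio Blocks follows. The callback is a placeholder that returns fixed text rather than calling the agent pipeline, and the evidence dictionary keys (`title`, `section_path`, `lang`, `text`) as well as the function names are assumptions. Gradio is imported inside `build_ui` so the helper functions can be exercised without it installed.

```python
def format_evidence(chunks: list[dict]) -> str:
    """Render evidence chunks and their metadata as Markdown."""
    if not chunks:
        return "_No evidence retrieved._"
    lines: list[str] = []
    for i, chunk in enumerate(chunks, start=1):
        lines.append(
            f"**[{i}] {chunk['title']}** ({chunk['section_path']}, {chunk['lang']})"
        )
        lines.append("")
        lines.append(chunk["text"])
        lines.append("")
    return "\n".join(lines).strip()


def generate_draft(email_text: str) -> tuple[str, str]:
    """Placeholder callback; the real one would run the agent pipeline."""
    if not email_text.strip():
        return "", "_Paste a student email first._"
    draft = "Dear student,\n\n[draft generated from policy evidence]\n"
    return draft, format_evidence([])


def build_ui():
    """Assemble the interface: input box, button, editable draft, evidence."""
    import gradio as gr  # lazy import keeps the helpers above testable

    with gr.Blocks(title="RAG Email Assistant") as demo:
        email_in = gr.Textbox(label="Student email", lines=10)
        generate_btn = gr.Button("Generate Draft")
        draft_out = gr.Textbox(label="Email draft", lines=12, interactive=True)
        with gr.Accordion("Evidence", open=False):  # collapsed by default
            evidence_out = gr.Markdown()
        generate_btn.click(
            fn=generate_draft,
            inputs=email_in,
            outputs=[draft_out, evidence_out],
        )
    return demo
```

Calling `build_ui().launch()` would start the app locally; on Spaces the same call would typically live in the entry-point script.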
Deployment Notes
- Run OpenSearch separately (local Docker, managed cluster, or university infra).
- Host Gradio app on Spaces; connect via HTTPS.
- Optional FAISS fallback for demo mode.
- Logging to JSON; metrics optional.
Quality & Evaluation
- Maintain gold dataset of real queries (anonymized).
- Evaluate Recall@5, nDCG, and groundedness ratio.
- Audit warnings rate (% of drafts flagged by FactChecker).
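The retrieval metrics listed above can be computed as in the sketch below, which assumes binary relevance labels and unique document IDs in each ranking. The definition of the groundedness ratio used here (supported claims divided by total claims) is an assumption, since the specification does not define it.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def groundedness_ratio(supported_claims: int, total_claims: int) -> float:
    """Share of draft claims backed by evidence (assumed definition)."""
    return supported_claims / total_claims if total_claims else 1.0
```

Averaging these per-query values over the gold dataset gives the figures to track between releases.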
Security & Privacy
- No student data stored persistently.
- All transmissions over HTTPS.
- LLM endpoints configured for no data retention.
- Mask PII in logs.
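One way to mask PII in logs is a processor that rewrites log fields before they are rendered. The sketch below follows the structlog processor calling convention (logger, method name, event dictionary) but does not import structlog. It masks only email addresses; names, student IDs, and phone numbers would need additional rules. The regular expression and the function names are illustrative assumptions.

```python
import re

# Rough email pattern; deliberately simple rather than RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_text(value: str) -> str:
    """Replace every email address in a string with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", value)


def mask_pii(logger, method_name, event_dict):
    """structlog-style processor: mask emails in all top-level string values.

    Non-string values (numbers, lists, nested dicts) are passed through
    unchanged in this sketch.
    """
    return {
        key: mask_text(value) if isinstance(value, str) else value
        for key, value in event_dict.items()
    }
```

In a structlog configuration this function would sit in the processor list ahead of the JSON renderer, so that masked values are what get serialized.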
File Naming and Storage
- Save this specification as: docs/RAG_Email_Assistant_Specifications_v1.0.md
- Treat it as the authoritative internal design document.
Maintainer: Virtuelle Akademie AI Lab
Author: Andrew Ellis, BFH
Version: 1.0 (October 2025)
License: Internal use only.