title: ProBas RAG Assistant
emoji: 🌍
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
short_description: RAG chat over the ProBas life-cycle process database
ProBas RAG Assistant
ProBas RAG Assistant is a retrieval-augmented chat app for the ProBas process dataset stored in the probas_processes_by_classification_rag_json folder.
It loads the ProBas JSON records, builds a cached BM25 plus embedding index, and answers questions through the Academic Cloud (GWDG) OpenAI-compatible API, with a model fallback chain.
Features
- ProBas-only ingestion and hybrid retrieval (dense embeddings + BM25)
- Cached lexical and embedding index with checkpoint/resume
- Six selectable chat models with automatic failover
- Greeting / off-topic detection so casual messages get a friendly reply instead of forced citations
- Gradio chat UI with a retrieved-evidence panel
Setup
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # then fill in OPENAI_API_KEY
Environment
- OPENAI_API_KEY: API key for the OpenAI-compatible endpoint (required)
- OPENAI_BASE_URL: defaults to https://chat-ai.academiccloud.de/v1
- PROBAS_EMBEDDING_MODEL: defaults to qwen3-embedding-4b (must be an embedding model served by the endpoint)
- PROBAS_MAX_RECORDS: optional record limit for smoke tests
- PROBAS_EMBED_CONCURRENCY: parallel embedding requests during index build (default 8); the main lever for build speed
- PROBAS_EMBED_BATCH_SIZE: texts per embedding request (default 24); lower this if you see request timeouts
- PROBAS_EMBED_TIMEOUT_SECONDS: per-request timeout for embeddings (default 180)
- PROBAS_EMBED_MAX_RETRIES: retries before a failing batch is split in half (default 1)
- PROBAS_CHECKPOINT_EVERY: save a resume checkpoint every N waves (default 10)
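The retry-then-split behaviour that PROBAS_EMBED_MAX_RETRIES controls can be pictured with the sketch below. It is an illustration only: embed_with_split and the embed_fn callback are hypothetical names, not the app's actual functions, and the real code may treat errors differently.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_with_split(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],
    max_retries: int = 1,
) -> List[Vector]:
    """Embed a batch; after repeated failures, split it in half and recurse.

    embed_fn is a hypothetical callback that sends one embedding request.
    """
    last_error = None
    # One initial attempt plus max_retries retries on the full batch.
    for _ in range(max_retries + 1):
        try:
            return embed_fn(texts)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc

    # A single text cannot be split further, so surface the failure.
    if len(texts) <= 1:
        raise RuntimeError("embedding failed for a single text") from last_error

    # Halve the batch; input order is preserved in the combined result.
    mid = len(texts) // 2
    left = embed_with_split(texts[:mid], embed_fn, max_retries)
    right = embed_with_split(texts[mid:], embed_fn, max_retries)
    return left + right
```

Splitting keeps one oversized or slow record from blocking the whole batch, which is also why lowering PROBAS_EMBED_BATCH_SIZE helps when timeouts are frequent.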
Retrieval and answer-quality tuning
- PROBAS_BM25_WEIGHT / PROBAS_VECTOR_WEIGHT: hybrid retrieval weights (defaults 0.30 / 0.70). The dataset is German and the multilingual dense embedding handles cross-lingual queries (English "lignite" → German "Braunkohle"); BM25 is kept as a minority signal because at high weight it ranks generic boilerplate for such queries.
- PROBAS_MIN_RELEVANCE: minimum top cosine similarity for a query to be treated as on-topic (default 0.45). Below it, the query is answered conversationally and the user is told no matching records were found, instead of fabricating an answer.
- PROBAS_MAX_CONTEXT_CHARS: per-record excerpt fed to the model (default 5000).
- PROBAS_EVIDENCE_SNIPPET_CHARS: per-record snippet shown in the UI evidence panel (default 320, kept compact and separate from the model context).
- PROBAS_EMBED_QUERY_INSTRUCTION: the instruction prefix added to queries (not documents), as Qwen3-Embedding expects. Greatly improves cross-lingual matching (English query → German records).
- PORT: optional deployment port (Hugging Face Spaces uses 7860)
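To make the interplay of the weights and the relevance gate concrete, here is a minimal sketch of weighted score fusion. The function names and the min-max normalisation step are assumptions made for illustration; the app may combine BM25 and cosine scores differently.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    cosine: Dict[str, float],
    bm25_weight: float = 0.30,    # PROBAS_BM25_WEIGHT default
    vector_weight: float = 0.70,  # PROBAS_VECTOR_WEIGHT default
    min_relevance: float = 0.45,  # PROBAS_MIN_RELEVANCE default
) -> Tuple[bool, List[Tuple[str, float]]]:
    """Return (on_topic, ranked records) for one query."""
    # Gate on the raw top cosine similarity, before any normalisation.
    on_topic = bool(cosine) and max(cosine.values()) >= min_relevance
    if not on_topic:
        return False, []

    b, c = min_max(bm25), min_max(cosine)
    fused = {
        record_id: bm25_weight * b.get(record_id, 0.0)
        + vector_weight * c.get(record_id, 0.0)
        for record_id in set(b) | set(c)
    }
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return True, ranked
```

With the default weights, a record that BM25 ranks first but the embedding ranks last still ends up below a record the embedding clearly prefers, which matches the intent of keeping BM25 as a minority signal.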
Impact numbers (key_impacts)
The records' rag_text only previews the first few exchanges, which miss the
actual emission outputs (CO₂, SO₂, NOₓ) and impact indicators (GWP/Treibhauseffekt,
cumulative energy demand). The app extracts a compact key_impacts block from the
raw exchanges/LCIA so the model can answer "what are the CO₂ emissions" with real
numbers. A fresh index build does this automatically; to add it to an existing
prebuilt bundle without re-embedding, run once:
python enrich_bundle.py
Run
python app.py
The first launch builds the index in the background (see below). On later launches the cached index loads in ~15s.
Model dropdown
The UI exposes the six strongest general-purpose chat models on the endpoint, strongest first:
- qwen3.5-397b-a17b (default; large MoE, strong multilingual, fast 17B active params)
- mistral-large-3-675b-instruct-2512
- qwen3.5-122b-a10b
- openai-gpt-oss-120b
- deepseek-r1-distill-llama-70b
- glm-4.7
The app tries the selected model first, then falls back through the rest with retry and backoff.
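A fallback chain of this kind can be sketched as follows. chat_with_fallback, call_model, and the retry and delay values are hypothetical and chosen for illustration; only the model names come from the list above.

```python
import time
from typing import Callable, List, Sequence, Tuple

MODELS: List[str] = [
    "qwen3.5-397b-a17b",
    "mistral-large-3-675b-instruct-2512",
    "qwen3.5-122b-a10b",
    "openai-gpt-oss-120b",
    "deepseek-r1-distill-llama-70b",
    "glm-4.7",
]


def chat_with_fallback(
    selected: str,
    models: Sequence[str],
    call_model: Callable[[str], str],
    retries_per_model: int = 2,   # assumed value, not taken from the app
    base_delay: float = 1.0,      # assumed value, not taken from the app
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try the selected model first, then the rest; return (model, reply)."""
    order = [selected] + [m for m in models if m != selected]
    errors: List[str] = []
    for model in order:
        for attempt in range(retries_per_model):
            try:
                return model, call_model(model)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{model}: {exc}")
                # Exponential backoff between retries of the same model.
                if attempt < retries_per_model - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the name of the model that actually answered lets the UI show the user when a fallback was used.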
Index build, checkpointing, and resume
On first launch the app embeds every ProBas record in the background using
PROBAS_EMBED_CONCURRENCY parallel requests, periodically writing a resume
checkpoint under indexes/probas_rag/. If the build is interrupted, the next
launch resumes from the last checkpoint instead of starting over.
Checkpoints are keyed by a fingerprint of the dataset and the embedding model,
so changing PROBAS_EMBEDDING_MODEL intentionally invalidates the old checkpoint.
Cache files from older code versions are purged automatically on startup.
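One plausible way to build such a fingerprint is shown below. This is a sketch under assumptions: index_fingerprint is a hypothetical name, and hashing each file's relative path and size (rather than its full contents) is an illustrative choice, not necessarily what the app does.

```python
import hashlib
from pathlib import Path


def index_fingerprint(dataset_dir: str, embedding_model: str) -> str:
    """Derive a short key that changes with the dataset or embedding model."""
    root = Path(dataset_dir)
    digest = hashlib.sha256()
    digest.update(embedding_model.encode("utf-8"))
    digest.update(b"\0")
    # Sort for a stable order; hash path and size as a cheap change signal.
    for path in sorted(root.rglob("*.json")):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(str(path.stat().st_size).encode("ascii"))
        digest.update(b"\0")
    return digest.hexdigest()[:16]
```

A checkpoint saved under one fingerprint is ignored when the fingerprint computed at startup differs, so a changed model or dataset triggers a clean rebuild rather than mixing incompatible vectors.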
If the raw dataset directory is absent but a prebuilt bundle is present under
indexes/probas_rag/, the app loads that bundle directly — this is what makes a
deployment that ships only the prebuilt index (e.g. a Hugging Face Space) work
without re-embedding.
Tracking build progress and ETA
While embedding, the app logs a live line per wave:
Embedded 1440/23172 records (6.2%) | 3.1 rec/s | elapsed 7m42s | ETA 1h56m
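The numbers in that line follow from simple arithmetic: rate is records done divided by elapsed time, and ETA is remaining records divided by rate. The sketch below reproduces the sample line; progress_line and format_duration are hypothetical helper names, and the duration format for edge cases is an assumption.

```python
def format_duration(seconds: float) -> str:
    """Render seconds as '1h56m' (an hour or more) or '7m42s' (under an hour)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}h{minutes:02d}m"
    return f"{minutes}m{secs:02d}s"


def progress_line(done: int, total: int, elapsed_seconds: float) -> str:
    """Build a progress line from the counts and the elapsed time so far."""
    rate = done / elapsed_seconds if elapsed_seconds > 0 else 0.0
    percent = 100.0 * done / total if total else 0.0
    # Remaining records divided by the observed rate gives the ETA.
    eta = format_duration((total - done) / rate) if rate > 0 else "unknown"
    return (
        f"Embedded {done}/{total} records ({percent:.1f}%) | "
        f"{rate:.1f} rec/s | elapsed {format_duration(elapsed_seconds)} | ETA {eta}"
    )


# 7m42s elapsed is 462 seconds.
print(progress_line(1440, 23172, 462))
```

Because the rate is an average over the whole run so far, the ETA drifts if throughput changes, for example after a batch is split or the endpoint slows down.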
To check durable progress (what a restart would resume from) from a second terminal:
python check_progress.py
Deploying to Hugging Face Spaces
See DEPLOY_HF.md for the full step-by-step. In short:
- Set OPENAI_API_KEY as a Space secret (never commit it).
- Commit the prebuilt index under indexes/probas_rag/ via Git LFS (the .gitattributes already tracks it) so the Space starts without re-embedding and without shipping the 1.2 GB raw dataset.
- Push to the Space remote.
Data and cache
The dataset folder is read directly from probas_processes_by_classification_rag_json. The generated cache is stored under indexes/probas_rag/ and is safe to delete when rebuilding from scratch.