metadata
title: ProBas RAG Assistant
emoji: 🌍
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
short_description: RAG chat over the ProBas life-cycle process database

ProBas RAG Assistant

ProBas RAG Assistant is a retrieval-augmented chat app for the ProBas process dataset in probas_processes_by_classification_rag_json.

It loads the ProBas JSON records, builds a cached BM25 plus embedding index, and answers questions through the Academic Cloud (GWDG) OpenAI-compatible API, with a model fallback chain.

Features

  • ProBas-only ingestion and hybrid retrieval (dense embeddings + BM25)
  • Cached lexical and embedding index with checkpoint/resume
  • Six selectable chat models with automatic failover
  • Greeting / off-topic detection so casual messages get a friendly reply instead of forced citations
  • Gradio chat UI with a retrieved-evidence panel

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # then fill in OPENAI_API_KEY

Environment

  • OPENAI_API_KEY: API key for the OpenAI-compatible endpoint (required)
  • OPENAI_BASE_URL: defaults to https://chat-ai.academiccloud.de/v1
  • PROBAS_EMBEDDING_MODEL: defaults to qwen3-embedding-4b (must be an embedding model served by the endpoint)
  • PROBAS_MAX_RECORDS: optional record limit for smoke tests
  • PROBAS_EMBED_CONCURRENCY: parallel embedding requests during index build (default 8); the main lever for build speed
  • PROBAS_EMBED_BATCH_SIZE: texts per embedding request (default 24); lower this if you see request timeouts
  • PROBAS_EMBED_TIMEOUT_SECONDS: per-request timeout for embeddings (default 180)
  • PROBAS_EMBED_MAX_RETRIES: retries before a failing batch is split in half (default 1)
  • PROBAS_CHECKPOINT_EVERY: save a resume checkpoint every N waves (default 10)
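
The variables above can be read into one settings object at startup. The following is a minimal sketch, not the app's actual code: the `EmbedSettings` class and `load_settings` function are hypothetical names, and only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmbedSettings:
    """Hypothetical container for the documented environment variables."""
    base_url: str
    embedding_model: str
    concurrency: int
    batch_size: int
    timeout_seconds: float
    max_retries: int
    checkpoint_every: int
    max_records: Optional[int]  # None means "no limit"


def load_settings(env: Optional[Mapping[str, str]] = None) -> EmbedSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_limit = env.get("PROBAS_MAX_RECORDS", "").strip()
    return EmbedSettings(
        base_url=env.get("OPENAI_BASE_URL", "https://chat-ai.academiccloud.de/v1"),
        embedding_model=env.get("PROBAS_EMBEDDING_MODEL", "qwen3-embedding-4b"),
        concurrency=int(env.get("PROBAS_EMBED_CONCURRENCY", "8")),
        batch_size=int(env.get("PROBAS_EMBED_BATCH_SIZE", "24")),
        timeout_seconds=float(env.get("PROBAS_EMBED_TIMEOUT_SECONDS", "180")),
        max_retries=int(env.get("PROBAS_EMBED_MAX_RETRIES", "1")),
        checkpoint_every=int(env.get("PROBAS_CHECKPOINT_EVERY", "10")),
        max_records=int(raw_limit) if raw_limit else None,
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a smoke test, for example `load_settings({"PROBAS_EMBED_BATCH_SIZE": "12"})`.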

Retrieval and answer-quality tuning

  • PROBAS_BM25_WEIGHT / PROBAS_VECTOR_WEIGHT: hybrid retrieval weights (defaults 0.30 / 0.70). The dataset is German, and the multilingual dense embedding handles cross-lingual queries (English "lignite" → German "Braunkohle"), so it carries most of the weight. BM25 is kept as a minority signal because, at a high weight, it tends to rank generic boilerplate first for such queries.
  • PROBAS_MIN_RELEVANCE: minimum top cosine similarity for a query to be treated as on-topic (default 0.45). Below it, the query is answered conversationally and the user is told no matching records were found, instead of fabricating an answer.
  • PROBAS_MAX_CONTEXT_CHARS: per-record excerpt fed to the model (default 5000).
  • PROBAS_EVIDENCE_SNIPPET_CHARS: per-record snippet shown in the UI evidence panel (default 320, kept compact and separate from the model context).
  • PROBAS_EMBED_QUERY_INSTRUCTION: the instruction prefix added to queries (not documents), as Qwen3-Embedding expects. It greatly improves cross-lingual matching (English query → German records).
  • PORT: optional deployment port (Hugging Face Spaces uses 7860)
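
To make the weights and the relevance gate concrete, here is a minimal sketch of one way to fuse the two signals. It assumes BM25 scores are min-max normalized before mixing and that the dense score is a raw cosine similarity. The README does not state the app's exact normalization, and the function names are hypothetical.

```python
BM25_WEIGHT = 0.30     # PROBAS_BM25_WEIGHT default
VECTOR_WEIGHT = 0.70   # PROBAS_VECTOR_WEIGHT default
MIN_RELEVANCE = 0.45   # PROBAS_MIN_RELEVANCE default


def min_max(scores):
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(bm25_scores, cosine_scores):
    """Return (on_topic, record indices sorted by fused score, best first).

    on_topic is False when even the best cosine similarity is below
    MIN_RELEVANCE, in which case the caller would answer conversationally.
    """
    on_topic = max(cosine_scores) >= MIN_RELEVANCE
    lexical = min_max(bm25_scores)
    fused = [
        BM25_WEIGHT * b + VECTOR_WEIGHT * c
        for b, c in zip(lexical, cosine_scores)
    ]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return on_topic, order
```

With BM25 scores `[5.0, 1.0, 0.0]` and cosine scores `[0.30, 0.80, 0.10]`, the second record wins (fused 0.62 against 0.51) even though the first has the highest BM25 score, which is the intended effect of keeping BM25 as the minority signal.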

Impact numbers (key_impacts)

The records' rag_text previews only the first few exchanges, so it misses the actual emission outputs (CO₂, SO₂, NOₓ) and impact indicators (GWP/Treibhauseffekt, cumulative energy demand). The app therefore extracts a compact key_impacts block from the raw exchanges/LCIA data so the model can answer "what are the CO₂ emissions" with real numbers. A fresh index build does this automatically. To add the block to an existing prebuilt bundle without re-embedding, run once:

python enrich_bundle.py
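
The idea behind the key_impacts block can be sketched as a keyword filter over a record's exchanges. Everything about the record shape here is an assumption made for illustration: the `name`, `amount`, and `unit` keys, the keyword list, and the `build_key_impacts` helper are hypothetical and are not taken from enrich_bundle.py.

```python
# Hypothetical keyword list; the real extraction rules are not documented here.
KEYWORDS = ("co2", "so2", "nox", "treibhauseffekt", "gwp")


def build_key_impacts(exchanges, max_lines=10):
    """Collect emission and impact lines from a list of exchange dicts.

    Each exchange is assumed to look like
    {"name": ..., "amount": ..., "unit": ...} (illustrative schema only).
    """
    lines = []
    for exchange in exchanges:
        name = exchange.get("name", "")
        # Fold subscript characters so "CO₂" and "CO2" match the same keyword.
        normalized = name.lower().replace("₂", "2").replace("ₓ", "x")
        if any(keyword in normalized for keyword in KEYWORDS):
            amount = exchange.get("amount")
            unit = exchange.get("unit", "")
            lines.append(f"{name}: {amount} {unit}".strip())
    return "\n".join(lines[:max_lines])
```

Keeping the block short (here capped by `max_lines`) matters because it is added to the per-record text that counts against PROBAS_MAX_CONTEXT_CHARS.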

Run

python app.py

The first launch builds the index in the background (see "Index build, checkpointing, and resume" below). Later launches load the cached index in about 15 s.

Model dropdown

The UI exposes the six strongest general-purpose chat models on the endpoint, strongest first:

  1. qwen3.5-397b-a17b  (default: large MoE, strong multilingual performance, fast because only 17B parameters are active)
  2. mistral-large-3-675b-instruct-2512
  3. qwen3.5-122b-a10b
  4. openai-gpt-oss-120b
  5. deepseek-r1-distill-llama-70b
  6. glm-4.7

The app tries the selected model first, then falls back through the rest with retry and backoff.
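
A minimal sketch of such a fallback chain follows. The model names come from the list above. The retry count, the exponential backoff schedule, and the `call_model` callback are assumptions for illustration, not the app's actual parameters.

```python
import time

MODELS = [
    "qwen3.5-397b-a17b",
    "mistral-large-3-675b-instruct-2512",
    "qwen3.5-122b-a10b",
    "openai-gpt-oss-120b",
    "deepseek-r1-distill-llama-70b",
    "glm-4.7",
]


def fallback_order(selected, models=MODELS):
    """Selected model first, then the remaining models in list order."""
    return [selected] + [m for m in models if m != selected]


def chat_with_fallback(selected, call_model, retries=2, base_delay=1.0,
                       sleep=time.sleep):
    """Try each model up to `retries` times; return (model_used, reply).

    call_model(model) is a caller-supplied function that sends the request
    and raises on failure (hypothetical interface).
    """
    last_error = None
    for model in fallback_order(selected):
        for attempt in range(retries):
            try:
                return model, call_model(model)
            except Exception as exc:  # a sketch: real code would narrow this
                last_error = exc
                if attempt < retries - 1:
                    sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all chat models failed") from last_error
```

Injecting `sleep` keeps the function testable without real delays, and returning the model name lets the UI show which model actually answered.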

Index build, checkpointing, and resume

On first launch the app embeds every ProBas record in the background using PROBAS_EMBED_CONCURRENCY parallel requests, periodically writing a resume checkpoint under indexes/probas_rag/. If the build is interrupted, the next launch resumes from the last checkpoint instead of starting over.
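
The interplay of PROBAS_EMBED_CONCURRENCY, PROBAS_EMBED_BATCH_SIZE, and PROBAS_EMBED_MAX_RETRIES can be sketched as follows. This is an illustration under stated assumptions: `embed_batch` is a hypothetical caller-supplied function that returns one vector per text and raises on failure, "retries" is read as extra attempts after the first one, and checkpoint writing is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def embed_with_split(texts, embed_batch, max_retries=1):
    """Embed one batch; after the retries are used up, split it in half."""
    for _ in range(max_retries + 1):  # first attempt plus retries
        try:
            return embed_batch(texts)
        except Exception:
            continue
    if len(texts) == 1:
        raise RuntimeError("embedding failed for a single text")
    mid = len(texts) // 2
    return (embed_with_split(texts[:mid], embed_batch, max_retries)
            + embed_with_split(texts[mid:], embed_batch, max_retries))


def embed_all(texts, embed_batch, batch_size=24, concurrency=8, max_retries=1):
    """Embed all texts with `concurrency` parallel batch requests."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order, so vectors stay aligned
        # with their texts even though requests finish out of order.
        results = pool.map(
            lambda batch: embed_with_split(batch, embed_batch, max_retries),
            batches,
        )
        return [vector for batch in results for vector in batch]
```

Splitting a failing batch isolates one oversized or malformed record instead of failing the whole batch, which is also why lowering the batch size helps with timeouts.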

Checkpoints are keyed by a fingerprint of the dataset and the embedding model, so changing PROBAS_EMBEDDING_MODEL intentionally invalidates the old checkpoint. Cache files from older code versions are purged automatically on startup.
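
One way to build such a fingerprint is to hash the embedding model name together with a stable summary of the dataset files. This is a sketch only: the README does not say which file attributes the app hashes, so the `(path, size)` pairs and both function names are assumptions.

```python
import hashlib


def index_fingerprint(file_stats, embedding_model):
    """Hash the embedding model name plus (relative_path, size_bytes) pairs.

    Sorting makes the result independent of directory listing order.
    """
    digest = hashlib.sha256()
    digest.update(embedding_model.encode("utf-8"))
    for path, size in sorted(file_stats):
        # NUL separators keep ("ab", 1) distinct from ("a", "b1")-style collisions.
        digest.update(f"\0{path}\0{size}".encode("utf-8"))
    return digest.hexdigest()[:16]


def checkpoint_is_valid(saved_fingerprint, file_stats, embedding_model):
    """A checkpoint is reusable only if dataset and model are unchanged."""
    return saved_fingerprint == index_fingerprint(file_stats, embedding_model)
```

Because the model name is part of the hash, switching PROBAS_EMBEDDING_MODEL yields a different fingerprint and the old checkpoint is ignored rather than mixed with vectors from another model.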

If the raw dataset directory is absent but a prebuilt bundle is present under indexes/probas_rag/, the app loads that bundle directly. This is what lets a deployment that ships only the prebuilt index (e.g. a Hugging Face Space) work without re-embedding.

Tracking build progress and ETA

While embedding, the app logs a live line per wave:

Embedded 1440/23172 records (6.2%) | 3.1 rec/s | elapsed 7m42s | ETA 1h56m
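
The numbers in that line are consistent with a simple average-rate estimate: 1440 records in 462 s is about 3.1 rec/s, and the remaining 21732 records at that rate take about 1h56m. The sketch below reproduces the line under that assumption. The function names are hypothetical, and the app may smooth the rate differently.

```python
def fmt_duration(seconds):
    """Format as '7m42s' below one hour, '1h56m' from one hour up."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes:02d}m"
    return f"{minutes}m{secs:02d}s"


def progress_line(done, total, elapsed_seconds):
    """Build a progress line from the average rate since the build started."""
    rate = done / elapsed_seconds if elapsed_seconds > 0 else 0.0
    eta = (total - done) / rate if rate > 0 else 0.0
    percent = 100.0 * done / total
    return (
        f"Embedded {done}/{total} records ({percent:.1f}%) | "
        f"{rate:.1f} rec/s | elapsed {fmt_duration(elapsed_seconds)} | "
        f"ETA {fmt_duration(eta)}"
    )


print(progress_line(1440, 23172, 462))
# Embedded 1440/23172 records (6.2%) | 3.1 rec/s | elapsed 7m42s | ETA 1h56m
```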

To check durable progress (what a restart would resume from) from a second terminal:

python check_progress.py

Deploying to Hugging Face Spaces

See DEPLOY_HF.md for the full step-by-step. In short:

  1. Set OPENAI_API_KEY as a Space secret (never commit it).
  2. Commit the prebuilt index under indexes/probas_rag/ via Git LFS (the .gitattributes already tracks it) so the Space starts without re-embedding and without shipping the 1.2 GB raw dataset.
  3. Push to the Space remote.

Data and cache

The dataset is read directly from the probas_processes_by_classification_rag_json folder. The generated cache is stored under indexes/probas_rag/ and is safe to delete when rebuilding from scratch.