---
title: FreeCAD RAG Assistant
emoji: 🛠️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
python_version: "3.11"
app_file: app.py
pinned: false
license: apache-2.0
short_description: FreeCAD Python from natural language via RAG
---

# FreeCAD RAG Assistant

A RAG (Retrieval-Augmented Generation) system that generates complete, runnable **FreeCAD 1.1 Python scripts** from natural-language descriptions of parts.

## Architecture

```
Query
  │
  ├─► BM25 retrieval (bm25s) ─────────┐
  │                                   ├─► RRF fusion ─► Cross-encoder rerank ─► Top-5 chunks
  └─► Dense retrieval (bge-small-en) ─┘                                              │
                                                                   OpenAI (gpt-4o-mini) + system prompt
                                                                                     │
                                                                    Generated Python + inline citations
```

**Corpus**: [FreeCAD/FreeCAD-documentation](https://github.com/FreeCAD/FreeCAD-documentation) (CC0 1.0), ~1,500 English wiki pages covering PartDesign, Sketcher, the Python scripting API, and release notes.

## Setup

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Build the retrieval index (one-time, run locally)

```bash
# Clone the FreeCAD documentation repo
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs

# Build BM25 + FAISS indices (outputs to data/)
python build_index.py --repo freecad-docs
```

This produces `data/chunks.parquet`, `data/index.faiss`, and `data/bm25.pkl`. Commit these to the repo before pushing to Hugging Face Spaces.

### 3. Run

```bash
python app.py
```

Enter your OpenAI API key in the UI (it is never stored or logged).
## Retrieval modes

| Toggle | Method | Wins on |
|--------|--------|---------|
| BM25 | `bm25s` with camelCase/snake_case tokenisation | Exact API tokens: `addConstraint`, `Coincident`, `PartDesign::Pad` |
| Dense | `BAAI/bge-small-en-v1.5` + FAISS IndexFlatIP | Paraphrased intent: "round the edges" → Fillet |
| Rerank | `BAAI/bge-reranker-base` cross-encoder | Precision: re-scores top-30 fused candidates |
| Hybrid (default) | Reciprocal Rank Fusion (k=60) | Best overall recall |

## Project structure

```
├── app.py              # Gradio Blocks UI
├── build_index.py      # One-off corpus ingestion + indexing
├── requirements.txt
├── src/
│   ├── config.py       # All tuneable constants
│   ├── ingest.py       # Markdown page loader
│   ├── chunk.py        # Header-split + code-block-preserving chunker
│   ├── retrieve.py     # BM25Retriever, DenseRetriever, RRF, HybridRetriever
│   ├── generate.py     # System prompt, few-shots, OpenAI call
│   └── citations.py    # Citation dataclass + rendering
└── data/               # Pre-built indices (commit via git-LFS if > 100 MB)
    ├── chunks.parquet
    ├── index.faiss
    └── bm25.pkl
```

## FreeCAD-specific notes

- All generated scripts target **FreeCAD 1.1** (released March 25, 2026).
- Scripts are safe to run with `freecadcmd` (headless): `*Gui` modules are never imported.
- The system prompt explicitly warns about the **Topological Naming Problem**: geometry is referenced by index where possible, and dress-up features (Fillet, Chamfer) are always added after all additive/subtractive features.
- `doc.recompute()` is called after every feature to avoid silent failures.

## Evaluation queries

See section 12 of the technical report for the 12-query test set covering: parametric box, flange with bolt pattern, hex nut, L-bracket, threaded shaft, spreadsheet-driven gear, revolution, coincident constraint question, TNP question, linear pattern, helix sweep, and multi-loop sketch.

## License

Source code: Apache 2.0.
Documentation corpus: [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) (FreeCAD Wiki). Attribution to the FreeCAD Wiki (CC-BY 3.0) is shown in the UI.