---
title: FreeCAD RAG Assistant
emoji: 🛠️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
python_version: "3.11"
app_file: app.py
pinned: false
license: apache-2.0
short_description: FreeCAD Python from natural language via RAG
---

# FreeCAD RAG Assistant

A RAG (Retrieval-Augmented Generation) system that generates complete, runnable **FreeCAD 1.1 Python scripts** from natural-language descriptions of parts.

## Architecture

```
Query
  │
  ├─► BM25 retrieval (bm25s) ─────────┐
  │                                   ├─► RRF fusion ─► Cross-encoder rerank ─► Top-5 chunks
  └─► Dense retrieval (bge-small-en) ─┘
                                                              │
                                        OpenAI (gpt-4o-mini) + system prompt
                                                              │
                                        Generated Python + inline citations
```

**Corpus**: [FreeCAD/FreeCAD-documentation](https://github.com/FreeCAD/FreeCAD-documentation) (CC0 1.0), roughly 1,500 English wiki pages covering PartDesign, Sketcher, the Python scripting API, and release notes.

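
The fusion stage in the diagram combines the BM25 and dense rankings with Reciprocal Rank Fusion. The following is a minimal sketch of the standard RRF formula, not the code in `src/retrieve.py`: the function name `rrf_fuse` and the chunk IDs are hypothetical, and `k=60` is the value listed under Retrieval modes.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each ranking lists IDs best-first. A chunk's fused score is the sum,
    over the rankings that contain it, of 1 / (k + rank), with rank from 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs standing in for real retrieval results.
bm25_hits = ["pad", "sketch", "fillet"]
dense_hits = ["fillet", "pad", "chamfer"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Because RRF uses only ranks, the BM25 and cosine scores never need to be put on a common scale; a chunk that appears in both lists simply accumulates two terms.
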
## Setup

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Build the retrieval index (one-time, run locally)

```bash
# Clone the FreeCAD documentation repo
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs

# Build BM25 + FAISS indices (outputs to data/)
python build_index.py --repo freecad-docs
```

This produces `data/chunks.parquet`, `data/index.faiss`, and `data/bm25.pkl`. Commit these files to the repo before pushing to Hugging Face Spaces, using git-LFS for any file larger than 100 MB.

### 3. Run

```bash
python app.py
```

Enter your OpenAI API key in the UI (it is never stored or logged).

## Retrieval modes

| Toggle | Method | Wins on |
|--------|--------|---------|
| BM25 | `bm25s` with camelCase/snake_case tokenisation | Exact API tokens: `addConstraint`, `Coincident`, `PartDesign::Pad` |
| Dense | `BAAI/bge-small-en-v1.5` + FAISS IndexFlatIP | Paraphrased intent: "round the edges" → Fillet |
| Rerank | `BAAI/bge-reranker-base` cross-encoder | Precision: re-scores top-30 fused candidates |
| Hybrid (default) | Reciprocal Rank Fusion (k=60) | Best overall recall |

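
The camelCase/snake_case tokenisation in the BM25 row is what lets a query such as "add constraint" match the API name `addConstraint`. The sketch below shows one way to do this with the standard library; the function name `tokenize` and the exact splitting rules are assumptions, not the project's actual tokeniser.

```python
import re

# Zero-width boundaries: lower/digit -> upper ("addConstraint"), and the end of
# an acronym before a capitalised word ("FAISSIndex" -> "FAISS", "Index").
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
_WORD = re.compile(r"[A-Za-z0-9_]+")


def tokenize(text):
    """Lowercase tokens: each whole identifier plus its camelCase/snake_case parts."""
    tokens = []
    for word in _WORD.findall(text):
        tokens.append(word.lower())  # keep the full identifier for exact matches
        parts = []
        for piece in word.split("_"):
            parts.extend(p for p in _CAMEL_BOUNDARY.split(piece) if p)
        if len(parts) > 1:  # only add sub-tokens when the word actually splits
            tokens.extend(p.lower() for p in parts)
    return tokens


api_tokens = tokenize("PartDesign::Pad")
```

Keeping the whole identifier alongside its parts preserves exact-match strength for queries that already use the API spelling.
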
## Project structure

```
├── app.py                  # Gradio Blocks UI
├── build_index.py          # One-off corpus ingestion + indexing
├── requirements.txt
├── src/
│   ├── config.py           # All tuneable constants
│   ├── ingest.py           # Markdown page loader
│   ├── chunk.py            # Header-split + code-block-preserving chunker
│   ├── retrieve.py         # BM25Retriever, DenseRetriever, RRF, HybridRetriever
│   ├── generate.py         # System prompt, few-shots, OpenAI call
│   └── citations.py        # Citation dataclass + rendering
└── data/                   # Pre-built indices (commit via git-LFS if > 100 MB)
    ├── chunks.parquet
    ├── index.faiss
    └── bm25.pkl
```

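
The "header-split + code-block-preserving" chunker in `src/chunk.py` is described only by its comment above. As an illustration of the idea, and not the project's implementation, the following sketch splits a Markdown page at headings while ignoring `#` lines inside fenced code; the name `chunk_markdown` and the `(heading, body)` output shape are assumptions.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable
_HEADING = re.compile(r"#{1,6}\s")


def chunk_markdown(text):
    """Split Markdown into (heading, body) chunks at ATX headings.

    Lines inside fenced code blocks are never treated as headings, so a
    "# comment" in a Python example stays attached to its code.
    """
    chunks = []
    heading, body = "", []
    in_fence = False

    def flush():
        # Emit the current chunk unless it is completely empty.
        if heading or any(line.strip() for line in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggles on both opening and closing fences
            body.append(line)
        elif not in_fence and _HEADING.match(line):
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks


page = "\n".join([
    "# Pad", "Intro", FENCE + "python", "# not a heading", "x = 1", FENCE,
    "## Usage", "Steps",
])
chunks = chunk_markdown(page)
```

Keeping code blocks whole matters for this corpus because a retrieved snippet that is cut mid-example is of little use as context for script generation.
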
## FreeCAD-specific notes

- All generated scripts target **FreeCAD 1.1** (released March 25, 2026).
- Scripts are safe to run with `freecadcmd` (headless): `*Gui` modules are never imported.
- The system prompt explicitly warns about the **Topological Naming Problem**: geometry is referenced by index where possible, and dress-up features (Fillet, Chamfer) are always added after all additive/subtractive features.
- `doc.recompute()` is called after every feature to avoid silent failures.

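
Two of the notes above (no `*Gui` imports, `recompute()` calls present) can be checked statically before a generated script is handed to `freecadcmd`. The sketch below is a hypothetical helper, not part of this repository; `check_script` and its messages are invented for illustration, and it verifies only that at least one `recompute()` call exists, not that one follows every feature.

```python
import ast


def check_script(source):
    """Return a list of problems found in a generated FreeCAD script."""
    tree = ast.parse(source)
    problems = []

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""] + [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            # Flag FreeCADGui, PartGui, and similar modules by their top-level name.
            if name.split(".")[0].endswith("Gui"):
                problems.append(f"GUI module imported: {name}")

    # Look for any call of the form <something>.recompute(...).
    has_recompute = any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "recompute"
        for node in ast.walk(tree)
    )
    if not has_recompute:
        problems.append("no recompute() call found")
    return problems


headless_ok = "import FreeCAD as App\ndoc = App.newDocument('Demo')\ndoc.recompute()\n"
needs_gui = "import FreeCADGui\nimport FreeCAD\n"
ok_report = check_script(headless_ok)
gui_report = check_script(needs_gui)
```

Parsing with `ast` never executes the script, so the check itself does not need FreeCAD installed.
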
## Evaluation queries

See section 12 of the technical report for the 12-query test set, which covers a parametric box, flange with bolt pattern, hex nut, L-bracket, threaded shaft, spreadsheet-driven gear, revolution, coincident constraint question, TNP question, linear pattern, helix sweep, and multi-loop sketch.

## License

Source code: Apache 2.0. Documentation corpus: [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) (FreeCAD Wiki). Attribution to the FreeCAD Wiki (CC-BY 3.0) is shown in the UI.