---
title: FreeCAD RAG Assistant
emoji: 🛠️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
python_version: "3.11"
app_file: app.py
pinned: false
license: apache-2.0
short_description: FreeCAD Python from natural language via RAG
---
# FreeCAD RAG Assistant
A RAG (Retrieval-Augmented Generation) system that generates complete, runnable **FreeCAD 1.1 Python scripts** from natural-language descriptions of parts.
## Architecture
```
Query
  │
  ├─► BM25 retrieval (bm25s) ──────────┐
  │                                    ├─► RRF fusion ─► Cross-encoder rerank ─► Top-5 chunks
  └─► Dense retrieval (bge-small-en) ──┘
                                                                                      │
                                                                    OpenAI (gpt-4o-mini) + system prompt
                                                                                      │
                                                                     Generated Python + inline citations
```
**Corpus**: [FreeCAD/FreeCAD-documentation](https://github.com/FreeCAD/FreeCAD-documentation) (CC0 1.0), about 1,500 English wiki pages covering PartDesign, Sketcher, the Python scripting API, and release notes.
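The fusion stage in the diagram can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the code in `src/retrieve.py`: the function name `rrf_fuse`, the `top_n` parameter, and the sample chunk ids are assumptions. The constant `k=60` and the 30-candidate cut-off come from the Retrieval modes table.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=30):
    """Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to a chunk's score, where rank is
    the 1-based position of the chunk in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in fused[:top_n]]


# Hypothetical chunk ids returned by the two retrievers.
bm25_hits = ["pad", "sketch", "fillet"]
dense_hits = ["fillet", "pad", "chamfer"]
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Chunks that appear in both lists accumulate two contributions and rise to the top, so a candidate only needs to rank reasonably well in each retriever rather than first in either.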
## Setup
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Build the retrieval index (one-time, run locally)
```bash
# Clone the FreeCAD documentation repo
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs
# Build BM25 + FAISS indices (outputs to data/)
python build_index.py --repo freecad-docs
```
This produces `data/chunks.parquet`, `data/index.faiss`, and `data/bm25.pkl`. Commit these to the repo before pushing to Hugging Face Spaces.
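Before committing, it can help to confirm that all three artifacts exist and to see which ones need git-LFS. The sketch below is a hypothetical pre-flight check, not part of the repository: the function name and the reading of "100 MB" as 100 MiB are assumptions. The file names come from the paragraph above.

```python
from pathlib import Path

INDEX_FILES = ("chunks.parquet", "index.faiss", "bm25.pkl")
# Assumption: the 100 MB git-LFS threshold is interpreted as 100 MiB.
LFS_THRESHOLD_BYTES = 100 * 1024 * 1024


def check_index_files(data_dir="data"):
    """Return (missing, needs_lfs): file names that are absent or over the threshold."""
    missing, needs_lfs = [], []
    for name in INDEX_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            missing.append(name)
        elif path.stat().st_size > LFS_THRESHOLD_BYTES:
            needs_lfs.append(name)
    return missing, needs_lfs


missing, needs_lfs = check_index_files("data")
print("missing:", missing)
print("needs git-LFS:", needs_lfs)
```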
### 3. Run
```bash
python app.py
```
Enter your OpenAI API key in the UI (it is never stored or logged).
## Retrieval modes
| Toggle | Method | Wins on |
|--------|--------|---------|
| BM25 | `bm25s` with camelCase/snake_case tokenisation | Exact API tokens: `addConstraint`, `Coincident`, `PartDesign::Pad` |
| Dense | `BAAI/bge-small-en-v1.5` + FAISS IndexFlatIP | Paraphrased intent: "round the edges" → Fillet |
| Rerank | `BAAI/bge-reranker-base` cross-encoder | Precision: re-scores top-30 fused candidates |
| Hybrid (default) | Reciprocal Rank Fusion (k=60) | Best overall recall |
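The BM25 row depends on splitting identifiers so that a query for `constraint` can still match `addConstraint`. The tokeniser below is one plausible way to do this, written as an assumption rather than a copy of the project's code. It keeps the full identifier for exact matches and adds its camelCase and snake_case parts. How the tokens are passed to `bm25s` is not shown.

```python
import re

# Zero-width boundaries: lower/digit -> upper ("addConstraint") and
# acronym -> word ("FAISSIndex" splits into "FAISS" and "Index").
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def tokenize(text):
    """Lowercased tokens: each identifier whole, followed by its sub-parts."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9_]+", text):
        pieces = [word]  # keep the whole identifier for exact API matches
        for chunk in word.split("_"):
            pieces.extend(_CAMEL_BOUNDARY.split(chunk))
        seen = set()
        for piece in pieces:
            token = piece.lower()
            if token and token not in seen:
                seen.add(token)
                tokens.append(token)
    return tokens


print(tokenize("addConstraint"))
print(tokenize("PartDesign::Pad"))
```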
## Project structure
```
├── app.py              # Gradio Blocks UI
├── build_index.py      # One-off corpus ingestion + indexing
├── requirements.txt
├── src/
│   ├── config.py       # All tuneable constants
│   ├── ingest.py       # Markdown page loader
│   ├── chunk.py        # Header-split + code-block-preserving chunker
│   ├── retrieve.py     # BM25Retriever, DenseRetriever, RRF, HybridRetriever
│   ├── generate.py     # System prompt, few-shots, OpenAI call
│   └── citations.py    # Citation dataclass + rendering
└── data/               # Pre-built indices (commit via git-LFS if > 100 MB)
    ├── chunks.parquet
    ├── index.faiss
    └── bm25.pkl
```
## FreeCAD-specific notes
- All generated scripts target **FreeCAD 1.1** (released March 25, 2026).
- Scripts are safe to run with `freecadcmd` (headless): `*Gui` modules are never imported.
- The system prompt explicitly warns about the **Topological Naming Problem**: geometry is referenced by index where possible, and dress-up features (Fillet, Chamfer) are always added after all additive/subtractive features.
- `doc.recompute()` is called after every feature to avoid silent failures.
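The headless guarantee above can be spot-checked on any generated script without launching FreeCAD. The sketch below is an illustrative static check using the standard `ast` module. The helper name `gui_imports` and the sample script are assumptions and are not part of this repository.

```python
import ast


def gui_imports(source):
    """Return imported module or attribute names with a component ending in 'Gui'."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Check both the package and the imported names,
            # e.g. "from PySide import QtGui".
            names = [node.module or ""] + [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            if any(part.endswith("Gui") for part in name.split(".")):
                found.append(name)
    return found


# Hypothetical generated script that violates the headless rule.
script = """
import FreeCAD as App
import Part
import FreeCADGui
"""
print(gui_imports(script))
```

An empty list means the script imports nothing that looks like a GUI module. The check inspects only import statements, so it does not catch modules loaded dynamically.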
## Evaluation queries
See section 12 of the technical report for the 12-query test set covering: parametric box, flange with bolt pattern, hex nut, L-bracket, threaded shaft, spreadsheet-driven gear, revolution, coincident constraint question, TNP question, linear pattern, helix sweep, and multi-loop sketch.
## License
Source code: Apache 2.0. Documentation corpus: [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) (FreeCAD Wiki). Attribution to the FreeCAD Wiki (CC-BY 3.0) is shown in the UI.