---
title: FreeCAD RAG Assistant
emoji: 🛠️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
python_version: "3.11"
app_file: app.py
pinned: false
license: apache-2.0
short_description: FreeCAD Python from natural language via RAG
---

# FreeCAD RAG Assistant

A RAG (Retrieval-Augmented Generation) system that generates complete, runnable **FreeCAD 1.1 Python scripts** from natural-language descriptions of parts.

## Architecture

```
Query
  │
  ├─► BM25 retrieval (bm25s)          ─┐
  │                                    ├─► RRF fusion ─► Cross-encoder rerank ─► Top-5 chunks
  └─► Dense retrieval (bge-small-en)  ─┘
                                                                │
                                                  OpenAI (gpt-4o-mini) + system prompt
                                                                │
                                              Generated Python + inline citations
```
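
The fusion step in the diagram can be sketched as follows. This is a minimal, illustrative version of Reciprocal Rank Fusion, not the code in `src/retrieve.py`: the function name `rrf_fuse`, the input shape (ranked lists of chunk ids), and the alphabetical tie-break are assumptions, while `k=60` matches the value listed under Retrieval modes.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper for illustration only.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]   # made-up chunk ids
dense_hits = ["c", "a", "d"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Chunks that appear in both lists (`a` and `c` here) rise above chunks found by only one retriever, which is why the fused list is a reasonable input for the cross-encoder.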

**Corpus**: [FreeCAD/FreeCAD-documentation](https://github.com/FreeCAD/FreeCAD-documentation) (CC0 1.0), ~1,500 English wiki pages covering PartDesign, Sketcher, the Python scripting API, and release notes.

## Setup

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Build the retrieval index (one-time, run locally)

```bash
# Clone the FreeCAD documentation repo
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs

# Build BM25 + FAISS indices (outputs to data/)
python build_index.py --repo freecad-docs
```

This produces `data/chunks.parquet`, `data/index.faiss`, and `data/bm25.pkl`. Commit these to the repo before pushing to Hugging Face Spaces.
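
Before committing, it may help to confirm that all three artifacts exist. The check below is a small sketch using only the standard library; `missing_index_files` is a hypothetical helper, not something `build_index.py` provides.

```python
from pathlib import Path

# File names taken from the build step described above.
EXPECTED_FILES = ("chunks.parquet", "index.faiss", "bm25.pkl")


def missing_index_files(data_dir="data"):
    """Return the expected index files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means the directory is ready to commit; any returned names point to a build step that did not finish.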

### 3. Run

```bash
python app.py
```

Enter your OpenAI API key in the UI (it is never stored or logged).

## Retrieval modes

| Toggle | Method | Wins on |
|--------|--------|---------|
| BM25 | `bm25s` with camelCase/snake_case tokenisation | Exact API tokens: `addConstraint`, `Coincident`, `PartDesign::Pad` |
| Dense | `BAAI/bge-small-en-v1.5` + FAISS IndexFlatIP | Paraphrased intent: "round the edges" → Fillet |
| Rerank | `BAAI/bge-reranker-base` cross-encoder | Precision: re-scores top-30 fused candidates |
| Hybrid (default) | Reciprocal Rank Fusion (k=60) | Best overall recall |
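
The BM25 row depends on splitting identifiers so that a query for `constraint` can still match `addConstraint`. A minimal sketch of such a tokeniser is shown below; the regexes and the `tokenize` name are assumptions for illustration, and the actual tokeniser in `src/retrieve.py` may differ.

```python
import re

_IDENT = re.compile(r"[A-Za-z0-9_]+")
# Splits camelCase and acronym runs, e.g. "XMLParser" -> "XML", "Parser".
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def tokenize(text: str) -> list[str]:
    """Lowercase tokens: each whole identifier plus its camel/snake parts."""
    tokens = []
    for ident in _IDENT.findall(text):
        parts = [p for piece in ident.split("_") for p in _CAMEL.findall(piece)]
        if not parts:          # e.g. a bare "_"
            continue
        tokens.append(ident.lower())      # keep the exact API token
        if len(parts) > 1:                # add sub-words only when split
            tokens.extend(p.lower() for p in parts)
    return tokens
```

Keeping the whole identifier alongside its parts preserves exact-match hits for API names while still letting partial queries find them.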

## Project structure

```
├── app.py               # Gradio Blocks UI
├── build_index.py       # One-off corpus ingestion + indexing
├── requirements.txt
├── src/
│   ├── config.py        # All tuneable constants
│   ├── ingest.py        # Markdown page loader
│   ├── chunk.py         # Header-split + code-block-preserving chunker
│   ├── retrieve.py      # BM25Retriever, DenseRetriever, RRF, HybridRetriever
│   ├── generate.py      # System prompt, few-shots, OpenAI call
│   └── citations.py     # Citation dataclass + rendering
└── data/                # Pre-built indices (commit via git-LFS if > 100 MB)
    ├── chunks.parquet
    ├── index.faiss
    └── bm25.pkl
```
```
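
The "header-split + code-block-preserving" idea behind `chunk.py` can be sketched as below. This is an assumption-laden illustration rather than the module's actual code: `split_by_headers` is a hypothetical name, and the real chunker presumably also enforces size limits.

```python
import re

FENCE = "`" * 3                    # Markdown code-fence marker
_HEADING = re.compile(r"#{1,6}\s")


def split_by_headers(markdown: str) -> list[str]:
    """Split Markdown at heading lines, never inside a fenced code block."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence        # entering or leaving a code block
        is_heading = not in_fence and _HEADING.match(line) is not None
        if is_heading and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

Tracking the fence state matters for this corpus because Python comments inside code samples start with `#` and would otherwise be mistaken for headings.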

## FreeCAD-specific notes

- All generated scripts target **FreeCAD 1.1** (released March 25, 2026).
- Scripts are safe to run with `freecadcmd` (headless) because `*Gui` modules are never imported.
- The system prompt explicitly warns about the **Topological Naming Problem**: geometry is referenced by index where possible, and dress-up features (Fillet, Chamfer) are always added after all additive/subtractive features.
- `doc.recompute()` is called after every feature to avoid silent failures.
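
One way to check the headless guarantee above is a static scan of a generated script for `*Gui` imports. The sketch below uses only the standard-library `ast` module; `gui_imports` is a hypothetical helper and is not part of this repo.

```python
import ast


def gui_imports(source: str) -> list[str]:
    """Return imported module names whose top-level package ends in 'Gui'."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        found.extend(n for n in names if n.split(".")[0].endswith("Gui"))
    return found


script = "import FreeCAD\nimport Part\n"   # stand-in for generated output
assert gui_imports(script) == []           # safe for freecadcmd under this check
```

The scan only parses the script and never executes it, so it runs without FreeCAD installed. It does not catch dynamic imports such as `importlib.import_module("FreeCADGui")`.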

## Evaluation queries

See section 12 of the technical report for the 12-query test set covering: parametric box, flange with bolt pattern, hex nut, L-bracket, threaded shaft, spreadsheet-driven gear, revolution, coincident constraint question, TNP question, linear pattern, helix sweep, and multi-loop sketch.

## License

Source code: Apache 2.0. Documentation corpus: [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) (FreeCAD Wiki). Attribution to the FreeCAD Wiki (CC-BY 3.0) is shown in the UI.