metadata
title: FreeCAD RAG Assistant
emoji: 🛠️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
python_version: '3.11'
app_file: app.py
pinned: false
license: apache-2.0
short_description: FreeCAD Python from natural language via RAG

FreeCAD RAG Assistant

A RAG (Retrieval-Augmented Generation) system that generates complete, runnable FreeCAD 1.1 Python scripts from natural-language descriptions of parts.

Architecture

Query
  │
  ├─► BM25 retrieval (bm25s)          ─┐
  │                                    ├─► RRF fusion ─► Cross-encoder rerank ─► Top-5 chunks
  └─► Dense retrieval (bge-small-en)  ─┘
                                                                │
                                                  OpenAI (gpt-4o-mini) + system prompt
                                                                │
                                              Generated Python + inline citations

Corpus: FreeCAD/FreeCAD-documentation (CC0 1.0), ~1,500 English wiki pages covering PartDesign, Sketcher, the Python scripting API, and release notes.
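
The RRF fusion step in the diagram can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion with k=60 (the value listed under Retrieval modes), not the code in src/retrieve.py; the function name `rrf_fuse` and the sample document ids are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the best hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ids standing in for BM25 and dense retrieval results.
bm25_hits = ["pad", "sketch", "fillet"]
dense_hits = ["fillet", "pad", "chamfer"]
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Documents returned by both retrievers ("pad" and "fillet" here) rise above those returned by only one, which is the behaviour a hybrid mode relies on before reranking.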

Setup

1. Install dependencies

pip install -r requirements.txt

2. Build the retrieval index (one-time, run locally)

# Clone the FreeCAD documentation repo
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs

# Build BM25 + FAISS indices (outputs to data/)
python build_index.py --repo freecad-docs

This produces data/chunks.parquet, data/index.faiss, and data/bm25.pkl. Commit these to the repo before pushing to Hugging Face Spaces.

3. Run

python app.py

Enter your OpenAI API key in the UI (it is never stored or logged).

Retrieval modes

| Toggle | Method | Wins on |
| --- | --- | --- |
| BM25 | bm25s with camelCase/snake_case tokenisation | Exact API tokens: addConstraint, Coincident, PartDesign::Pad |
| Dense | BAAI/bge-small-en-v1.5 + FAISS IndexFlatIP | Paraphrased intent: "round the edges" → Fillet |
| Rerank | BAAI/bge-reranker-base cross-encoder | Precision: re-scores top-30 fused candidates |
| Hybrid (default) | Reciprocal Rank Fusion (k=60) | Best overall recall |
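
To illustrate why camelCase/snake_case tokenisation helps BM25 match API names, here is a small sketch of such a tokeniser. It is an assumption about the general approach, not the project's actual implementation, and the name `tokenize` is hypothetical. It keeps the whole identifier and also emits its parts, so both `addConstraint` and `constraint` can match.

```python
import re

# Zero-width split points: lower/digit followed by upper ("addConstraint"),
# or the end of an acronym run followed by a capitalised word.
_CAMEL = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def tokenize(text):
    """Lower-cased tokens: each identifier whole, plus its camel/snake parts."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9_]+", text):
        tokens.append(word.lower())  # whole identifier, for exact matches
        parts = [p for chunk in word.split("_") for p in _CAMEL.split(chunk) if p]
        if len(parts) > 1:
            tokens.extend(p.lower() for p in parts)
    return tokens


print(tokenize("addConstraint"))
print(tokenize("PartDesign::Pad"))
```

Splitting on zero-width matches with `re.split` requires Python 3.7 or later, which the `python_version: '3.11'` setting above satisfies.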

Project structure

├── app.py               # Gradio Blocks UI
├── build_index.py       # One-off corpus ingestion + indexing
├── requirements.txt
├── src/
│   ├── config.py        # All tuneable constants
│   ├── ingest.py        # Markdown page loader
│   ├── chunk.py         # Header-split + code-block-preserving chunker
│   ├── retrieve.py      # BM25Retriever, DenseRetriever, RRF, HybridRetriever
│   ├── generate.py      # System prompt, few-shots, OpenAI call
│   └── citations.py     # Citation dataclass + rendering
└── data/                # Pre-built indices (commit via git-LFS if > 100 MB)
    ├── chunks.parquet
    ├── index.faiss
    └── bm25.pkl
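
The "header-split + code-block-preserving" idea behind chunk.py can be sketched roughly as below. This is a simplified illustration under the assumption that pages are Markdown with backtick fences; `chunk_markdown` and the sample page are hypothetical and the real chunker may differ (for example in size limits or overlap).

```python
FENCE = "`" * 3  # Markdown code-fence marker


def chunk_markdown(text):
    """Split Markdown at heading lines, but never inside a fenced code block."""
    chunks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggles on opening and closing fences
        # A heading starts a new chunk only when we are outside a code fence.
        if line.startswith("#") and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


# Hypothetical page: the "#" comment inside the fence must not split the chunk.
sample = "\n".join([
    "# Pad",
    "Extrudes a sketch.",
    FENCE + "python",
    "# this comment is not a heading",
    "pad.Length = 10",
    FENCE,
    "## Notes",
    "See also Pocket.",
])
chunks = chunk_markdown(sample)
print(len(chunks))
```

Keeping code blocks whole matters for this corpus because a Python comment beginning with `#` would otherwise be mistaken for a heading and cut an example script in half.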

FreeCAD-specific notes

  • All generated scripts target FreeCAD 1.1 (released March 25, 2026).
  • Scripts are safe to run with freecadcmd (headless): *Gui modules are never imported.
  • The system prompt explicitly warns about the Topological Naming Problem: geometry is referenced by index where possible, and dress-up features (Fillet, Chamfer) are always added after all additive/subtractive features.
  • doc.recompute() is called after every feature to avoid silent failures.
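
The notes above do not say how the headless guarantee is enforced. As one possible sanity check, a generated script could be scanned statically for imports of modules whose name ends in `Gui`. The helper `gui_imports` below is a hypothetical sketch using only the standard library, not part of this project.

```python
import ast


def gui_imports(script):
    """Return imported module names whose top-level package ends in 'Gui'."""
    found = []
    for node in ast.walk(ast.parse(script)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        found.extend(n for n in names if n.split(".")[0].endswith("Gui"))
    return found


# Hypothetical generated scripts: one headless-safe, one not.
headless_ok = "import FreeCAD as App\nimport Part\n"
needs_gui = "import FreeCADGui\nfrom PartDesignGui import something\n"
print(gui_imports(headless_ok))
print(gui_imports(needs_gui))
```

Because the check parses the script without executing it, it can run even on a machine where FreeCAD is not installed.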

Evaluation queries

See section 12 of the technical report for the 12-query test set covering: parametric box, flange with bolt pattern, hex nut, L-bracket, threaded shaft, spreadsheet-driven gear, revolution, coincident constraint question, TNP question, linear pattern, helix sweep, and multi-loop sketch.

License

Source code: Apache 2.0. Documentation corpus: CC0 1.0 (FreeCAD Wiki). Attribution to FreeCAD Wiki (CC-BY 3.0) shown in the UI.