# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Install dependencies
pip install -r requirements.txt

# Build retrieval indices (one-time; requires freecad-docs/ to be cloned)
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs
python build_index.py --repo freecad-docs

# Run the Gradio app
python app.py
```

The app requires `data/chunks.parquet`, `data/index.faiss`, and `data/bm25.pkl` to exist before it will serve requests; `indices_ready()` in `src/retrieve.py` checks that all three files are present.
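
Below is a minimal sketch of the kind of existence check `indices_ready()` performs. It assumes the function only tests that the three files exist; the real one may do more. The names `REQUIRED_INDEX_FILES`, `all_indices_present`, and `missing_index_files` are hypothetical. The real paths come from `src/config.py`.

```python
from collections.abc import Iterable
from pathlib import Path

# Hypothetical defaults copied from the paths named above; the real values
# are defined in src/config.py and may differ.
REQUIRED_INDEX_FILES = (
    Path("data/chunks.parquet"),
    Path("data/index.faiss"),
    Path("data/bm25.pkl"),
)


def all_indices_present(paths: Iterable = REQUIRED_INDEX_FILES) -> bool:
    """Hypothetical stand-in for indices_ready(): True only if every file exists."""
    return all(Path(p).is_file() for p in paths)


def missing_index_files(paths: Iterable = REQUIRED_INDEX_FILES) -> list[str]:
    """List the absent files so the UI can tell the user to run build_index.py."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

At startup, the app could call `missing_index_files()` and show the `build_index.py` command whenever the list is non-empty.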

## Architecture

The system is a two-phase pipeline: **offline indexing** (`build_index.py`) and **online serving** (`app.py`).

### Offline: `build_index.py`
Reads FreeCAD wiki markdown from `freecad-docs/wiki/`, passes pages through `src/ingest.py` → `src/chunk.py`, then builds two indices written to `data/`:
- **BM25** (`bm25s`, `bm25.pkl`): tokenised with a custom camelCase/snake_case tokeniser in `src/retrieve.py:_tokenize`
- **Dense** (`FAISS IndexFlatIP`, `index.faiss`): embeddings from `BAAI/bge-small-en-v1.5`
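
A camelCase/snake_case-aware tokeniser helps BM25 match FreeCAD API names against prose. The sketch below is an assumption about the approach, not a copy of `_tokenize`; the name `tokenize` is hypothetical. It emits each sub-word in lower case and also keeps the whole identifier. That way, a query for `newObject` can match both the exact name and text mentioning "new object".

```python
import re

# Words are runs of letters, digits and underscores, so "body.newObject" -> 2 words.
_WORD = re.compile(r"\w+")
# Split points: underscores, lower/digit->Upper humps, and acronym ends ("XMLParser").
_SUBWORD_BOUNDARY = re.compile(r"_+|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    for word in _WORD.findall(text):
        parts = [p.lower() for p in _SUBWORD_BOUNDARY.split(word) if p]
        tokens.extend(parts)
        if len(parts) > 1:
            # Keep the full identifier too, so exact API-name queries still match.
            tokens.append(word.lower())
    return tokens
```

For example, `tokenize("body.newObject")` yields `body`, `new`, `object`, and `newobject`.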

### Online: `app.py` → `src/retrieve.py` → `src/generate.py`
1. `HybridRetriever.retrieve(query)` runs BM25 and dense search, fuses the two rankings with Reciprocal Rank Fusion (k=60), optionally reranks with the `BAAI/bge-reranker-base` cross-encoder, and returns the top-N `Citation` objects.
2. `generate_response()` formats the citations into a numbered context block, prepends the system prompt (which includes two few-shot examples), and calls the OpenAI chat API.
3. The response is split into a `python` code block and a prose explanation with inline `[N]` citation references.
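
Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. Documents that both BM25 and dense search rank well therefore rise to the top. Here is a minimal sketch, assuming 1-based ranks and plain document ids. The real `HybridRetriever` may use 0-based ranks or fuse `Citation` objects directly, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Take a BM25 order of `a, b, c` and a dense order of `b, c, d`. Here `b` is near the top of both lists, so the fused order is `b, c, a, d`. With k=60, rank 1 contributes 1/61 ≈ 0.0164 and rank 10 contributes 1/70 ≈ 0.0143. Agreement between the two retrievers therefore matters more than a single top position.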

### Key files
- `src/config.py`: all tuneable constants (chunk size, top-K values, model names, file paths). Change retrieval hyperparameters here.
- `src/chunk.py`: chunker that splits on headers while preserving code blocks. Fenced code blocks are replaced with UUID placeholders before splitting, so they are never broken mid-block.
- `src/retrieve.py`: all retrieval logic, including lazy model singletons (`_load_*` functions) that are cached at module level for the lifetime of the Gradio process.
- `src/generate.py`: system prompt, two few-shot examples (parametric box, revolve), and the OpenAI call. The few-shot examples are the authoritative reference for expected script style.
- `src/citations.py`: `Citation` dataclass, context-block formatter, and citation markdown renderer.
- `src/ingest.py`: walks `freecad-docs/wiki/*.md`, skips Category/Template/MediaWiki pages, and flags ~25 high-priority scripting pages so they are sorted to the front.
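
The placeholder technique used by `src/chunk.py` can be sketched as follows. The sketch assumes ATX (`#`) headers and triple-backtick fences, and it leaves out the chunk-size limits configured in `src/config.py`. The function name `chunk_markdown` and the `@@CODE_<hex>@@` placeholder format are hypothetical.

```python
import re
import uuid

# A fenced block: an opening ``` line through the next line that is just ```.
_FENCE = re.compile(r"^```.*?^```[ \t]*$", re.MULTILINE | re.DOTALL)
# Zero-width match at the start of any ATX header line (# .. ######).
_HEADER = re.compile(r"^(?=#{1,6} )", re.MULTILINE)


def chunk_markdown(text: str) -> list[str]:
    placeholders: dict[str, str] = {}

    def stash(match):
        # Replace the whole fenced block with an opaque, unique token.
        key = f"@@CODE_{uuid.uuid4().hex}@@"
        placeholders[key] = match.group(0)
        return key

    protected = _FENCE.sub(stash, text)

    # Split before each header; "# comment" lines inside code are hidden by now.
    sections = [s.strip() for s in _HEADER.split(protected) if s.strip()]

    chunks = []
    for section in sections:
        for key, block in placeholders.items():
            section = section.replace(key, block)  # restore original code verbatim
        chunks.append(section)
    return chunks
```

The placeholder matters because a Python comment such as `# not a header` inside a fence would otherwise look like a markdown header and split the code in two.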

## FreeCAD script generation constraints

All generated scripts must:
- Target **FreeCAD 1.1** (released March 25, 2026)
- Never import `*Gui` modules, because they crash in headless mode (`freecadcmd`)
- Use `body.newObject(...)`, not `doc.addObject(...)`, for PartDesign features
- Call `doc.recompute()` after every feature
- Add dress-up features (Fillet, Chamfer) only after all additive/subtractive features
- Reference geometry by index to minimise Topological Naming Problem risk

These rules are encoded in `_SYSTEM_PROMPT` in `src/generate.py` and must stay consistent with any few-shot examples added there.
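
A lightweight text check is one way to keep new few-shot examples consistent with these rules. The sketch below is hypothetical: `lint_script` is not part of the repository, and it uses regular expressions rather than a parser, so it can only flag likely violations. It does not check the FreeCAD version or the geometry-referencing rule. `PartDesign::Body` is exempt from the `doc.addObject` check because bodies themselves are normally created that way.

```python
import re

# Heuristic patterns for the rules above (hypothetical; text-based, not a parser).
_GUI_IMPORT = re.compile(r"^\s*(?:from|import)\s[^\n#]*\b\w*Gui\b", re.MULTILINE)
# PartDesign::Body is exempt: bodies are normally created with doc.addObject.
_DOC_ADD_FEATURE = re.compile(r"\bdoc\.addObject\(\s*['\"]PartDesign::(?!Body\b)")
_BODY_FEATURE = re.compile(r"\.newObject\(\s*['\"]PartDesign::(\w+)")
_DRESS_UP = {"Fillet", "Chamfer"}


def lint_script(source: str) -> list[str]:
    """Return human-readable warnings for likely rule violations in a script."""
    problems = []
    if _GUI_IMPORT.search(source):
        problems.append("imports a *Gui module (crashes under freecadcmd)")
    if _DOC_ADD_FEATURE.search(source):
        problems.append("uses doc.addObject for a PartDesign feature; use body.newObject")

    features = _BODY_FEATURE.findall(source)  # feature type names in source order
    dress_up_seen = False
    for name in features:
        if name in _DRESS_UP:
            dress_up_seen = True
        elif dress_up_seen:
            problems.append(f"{name} added after a dress-up feature")

    # Rough proxy for "recompute after every feature": count calls, not placement.
    if source.count("doc.recompute()") < len(features):
        problems.append("fewer doc.recompute() calls than PartDesign features")
    return problems
```

Running it over each few-shot example whenever `_SYSTEM_PROMPT` changes would catch drift between the rules and the examples.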