# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Install dependencies
pip install -r requirements.txt

# Build retrieval indices (one-time, requires freecad-docs/ to be cloned)
git clone --depth 1 https://github.com/FreeCAD/FreeCAD-documentation freecad-docs
python build_index.py --repo freecad-docs

# Run the Gradio app
python app.py
```

The app requires `data/chunks.parquet`, `data/index.faiss`, and `data/bm25.pkl` to exist before it will serve requests. `indices_ready()` in `src/retrieve.py` checks for these.

## Architecture

The system is a two-phase pipeline: **offline indexing** (`build_index.py`) and **online serving** (`app.py`).

### Offline: `build_index.py`

Reads FreeCAD wiki markdown from `freecad-docs/wiki/`, passes pages through `src/ingest.py` → `src/chunk.py`, then builds two indices written to `data/`:

- **BM25** (`bm25s`, `bm25.pkl`) — tokenised with a custom camelCase/snake_case tokeniser in `src/retrieve.py:_tokenize`
- **Dense** (`FAISS IndexFlatIP`, `index.faiss`) — embeddings from `BAAI/bge-small-en-v1.5`

### Online: `app.py` → `src/retrieve.py` → `src/generate.py`

1. `HybridRetriever.retrieve(query)` runs BM25 + dense search, fuses with Reciprocal Rank Fusion (k=60), optionally reranks with `BAAI/bge-reranker-base` cross-encoder, returns top-N `Citation` objects.
2. `generate_response()` formats citations into a numbered context block, prepends the system prompt (with two few-shot examples), and calls the OpenAI chat API.
3. The response is split into a `python` code block and a prose explanation with inline `[N]` citation references.

### Key files

- `src/config.py` — all tuneable constants (chunk size, top-K values, model names, file paths). Change retrieval hyperparameters here.
- `src/chunk.py` — header-split + code-block-preserving chunker. Fenced code blocks are replaced with UUID placeholders before splitting so they are never broken mid-block.
- `src/retrieve.py` — all retrieval logic including lazy model singletons (`_load_*` functions) that are cached at module level for the Gradio process lifetime.
- `src/generate.py` — system prompt, two few-shot examples (parametric box, revolve), and the OpenAI call. The few-shot examples are the authoritative reference for expected script style.
- `src/citations.py` — `Citation` dataclass, context block formatter, and citation markdown renderer.
- `src/ingest.py` — walks `freecad-docs/wiki/*.md`, skips Category/Template/MediaWiki pages, and flags ~25 high-priority scripting pages for front-sorting.

## FreeCAD script generation constraints

All generated scripts must:

- Target **FreeCAD 1.1** (released March 25, 2026)
- Never import `*Gui` modules — they crash headless (`freecadcmd`)
- Use `body.newObject(...)` not `doc.addObject(...)` for PartDesign features
- Call `doc.recompute()` after every feature
- Add dress-up features (Fillet, Chamfer) only after all additive/subtractive features
- Reference geometry by index to minimise Topological Naming Problem risk

These rules are encoded in `_SYSTEM_PROMPT` in `src/generate.py` and must stay consistent with any few-shot examples added there.
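
Two of the script constraints above (no `*Gui` imports, and `body.newObject(...)` rather than `doc.addObject(...)` for PartDesign features) can be checked mechanically. The sketch below is an illustration only and is not part of this repository: `check_script` and `sample` are hypothetical names. It assumes the feature type is passed as a string literal such as `"PartDesign::Pad"`, and that creating the body itself with `doc.addObject("PartDesign::Body", ...)` is allowed. It uses only the standard-library `ast` module, so it runs without FreeCAD installed.

```python
import ast


def check_script(source: str) -> list[str]:
    """Hypothetical helper: report violations of two script rules.

    Checks only (1) imports of modules whose name ends in 'Gui' and
    (2) addObject("PartDesign::...") calls other than the Body itself.
    """
    found = []  # (line number, message) pairs
    for node in ast.walk(ast.parse(source)):
        # Rule: never import *Gui modules (FreeCADGui, PartGui, ...).
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""] + [alias.name for alias in node.names]
        else:
            names = []
        for name in names:
            if name.split(".")[-1].endswith("Gui"):
                found.append((node.lineno, f"imports GUI module {name!r}"))

        # Rule: PartDesign features go through body.newObject(...).
        # Assumption: the type is the first argument, as a string literal.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "addObject"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
            and node.args[0].value.startswith("PartDesign::")
            and node.args[0].value != "PartDesign::Body"
        ):
            found.append(
                (node.lineno, f"addObject used for {node.args[0].value!r}")
            )
    # ast.walk is not in source order, so sort by line number.
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]


sample = '''
import FreeCAD as App
import FreeCADGui
doc = App.newDocument("Demo")
body = doc.addObject("PartDesign::Body", "Body")
pad = doc.addObject("PartDesign::Pad", "Pad")
'''

for problem in check_script(sample):
    print(problem)
```

A check like this covers only the rules that are visible in syntax. Ordering rules such as calling `doc.recompute()` after every feature, or adding dress-up features last, would need more context than this sketch inspects.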