Spaces:

feng-x
/

ring-sizer

Running

File size: 14,628 Bytes

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Standard Task Workflow

For tasks of implementing **new features**:
1. Read PRD.md, Plan.md, Progress.md before coding
2. Summarize current project state before implementation
3. Carry out the implementatation; after that, build and test if possible
4. Update Progress.md after changes
5. Commit with a clear, concise message

For tasks of **bug fixing**:
1. Summarize the bug, reason and solution before implementation
2. Carry out the implementation to fix the bug; build and test afterwards;
3. Update Progress.md after changes
4. Commit with a clear, concise message

For tasks of **reboot** from a new codex session:
1. Read doc/v0/PRD.md, doc/v0/Plan.md, doc/v0/Progress.md for baseline implementation
2. Read doc/v1/PRD.md, doc/v1/Plan.md, doc/v1/Progress.md for edge refinement (v1)
3. Read doc/v4/PRD.md, doc/v4/Plan.md, doc/v4/Progress.md for SAM 2.1 integration (card + hand)
4. Assume this is a continuation of an existing project.
5. Summarize your understanding of the current state and propose the next concrete step without writing code yet.

## Project Overview

Ring Sizer is a **local, terminal-executable computer vision program** that measures the outer width (diameter) of a finger at the ring-wearing zone using a single RGB image. It uses a standard credit card (ISO/IEC 7810 ID-1: 85.60mm × 53.98mm) as a physical size reference for scale calibration.

**Key characteristics:**
- Single image input (JPG/PNG)
- SAM 2.1 for card + hand segmentation; MediaPipe for hand landmarks
- Finger width measured from the SAM mask boundary (`mask` edge method) by default
- Outputs JSON measurement data and optional debug visualization
- No cloud processing, runs entirely locally
- Python 3.8+ with OpenCV, NumPy, MediaPipe, PyTorch, transformers

## Development Commands

### Installation
```bash
# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Running the Program
```bash
# Basic measurement (defaults: index finger, mask edge method, classic card)
python measure_finger.py --input input/test_image.jpg --output output/result.json

# Measure a different finger
python measure_finger.py --input input/test_image.jpg --output output/result.json \
  --finger-index ring

# Save intermediate debug images alongside the result PNG
python measure_finger.py --input input/test_image.jpg --output output/result.json \
  --debug

# Use SAM card detection (first run downloads ~150 MB of weights)
python measure_finger.py --input input/test_image.jpg --output output/result.json \
  --card-method sam

# Subpixel Sobel gradient refinement anchored on the SAM mask boundary
python measure_finger.py --input input/test_image.jpg --output output/result.json \
  --card-method sam --edge-method sobel
```

## Architecture Overview

### Processing Pipeline

1. **Image quality** — blur / brightness / contrast (informational, no hard fail)
2. **Hand segmentation** — MediaPipe landmarks + SAM 2.1 mask (palm-center prompt); image rotated to canonical orientation
3. **Card detection** — classical (default CLI) or SAM prompt-based (web demo); scale calibration to `px_per_cm`
4. **Finger isolation** — per-finger mask from the SAM hand mask + landmark ROI
5. **Finger axis** — landmark-based (`MCP→PIP`, the proximal phalanx); image rotated a second time so the ring zone is vertical
6. **Ring zone** — anatomical mode centered on PIP (from landmarks), or 15–25% percentage mode as fallback
7. **Edge measurement** — `mask` (SAM boundary, default) or `sobel` (subpixel gradient anchored on the SAM boundary)
8. **Confidence** — 4-component weighted score (see below)
9. **Visualization** — result PNG with mask overlays, edges, ring-zone band, measurement text

### Module Structure

| Module | Primary Responsibilities |
|---|---|
| `card_detection.py` | Classical card detection (Canny / adaptive / Otsu / color waterfall), scale calibration |
| `sam_backend.py` | Shared `Sam2Model`/`Sam2Processor` singleton (card + hand) |
| `sam_card_detection.py` | Prompt-based SAM card detection + seed helpers |
| `sam_hand_segmentation.py` | Prompt-based SAM hand mask seeded at palm center |
| `finger_segmentation.py` | MediaPipe landmarks + finger isolation against the SAM hand mask |
| `geometry.py` | Landmark-based axis estimation, ring-zone localization, precise rotation helpers |
| `edge_refinement.py` | `mask_only` boundary measurement + `sobel_only` subpixel gradient path |
| `confidence.py` | Card / finger / measurement / edge-quality scoring + overall confidence |
| `image_quality.py` | Blur (Laplacian variance), exposure, finger-spacing / lighting checks |
| `visualization.py` | Result PNG overlays (card rect, hand silhouette, edges, measurement) |
| `debug_observer.py` | Stage-by-stage debug image writer (single canonical writer) |
| `logging_config.py` | `configure_logging()` + `log_phase()` timing context manager |
| `cli_display.py` | Terminal-only decorative output for the CLI entry point |

### Key Design Decisions

**Ring-wearing zone** — anatomical mode: centered on PIP, width = `ANATOMICAL_ZONE_WIDTH_FACTOR × |MCP–PIP|`. Falls back to 15–25% percentage mode only when landmarks are unavailable.

**Axis estimation** — landmark-only (`estimate_finger_axis_from_landmarks`). Defaults to `linear_fit` which uses the MCP→PIP vector (proximal phalanx). Raises `ValueError` on invalid landmarks (NaN, collapsed, too short, non-monotonic); callers map this to `fail_reason="axis_estimation_failed"`.

**Edge measurement** — `mask` (default) reads per-row finger width directly from the SAM boundary. `sobel` runs bidirectional Sobel + sub-pixel parabola fitting, anchored ±N px on the SAM boundary.

**Confidence scoring** — single 4-component model: card 25%, finger 25%, edge quality 20%, measurement 30%. Levels: HIGH (>0.85), MEDIUM (≥0.6), LOW (<0.6). Defined in `src/confidence_constants.py` as `WEIGHT_*`.

---

## CLI Flags

| Flag | Values | Default | Notes |
|---|---|---|---|
| `--finger-index` | auto, index, middle, ring, pinky | `index` | Which finger to measure; also drives orientation |
| `--mode` | single, multi | `single` | `multi` measures index + middle + ring in one pass |
| `--edge-method` | mask, sobel | `mask` | `mask` reads the SAM boundary; `sobel` adds subpixel gradient refinement anchored on it |
| `--sobel-threshold` | float | 15.0 | Minimum gradient magnitude (sobel mode only) |
| `--sobel-kernel-size` | 3, 5, 7 | 3 | Sobel kernel (sobel mode only) |
| `--no-subpixel` | flag | off | Disable parabola refinement (sobel mode only) |
| `--card-method` | classic, sam | `classic` | CLI default is classical to avoid surprise 150 MB SAM weight download; web demo forces `sam` |
| `--ring-model` | see `src/ring_size.py` | — | Ring-size lookup table |
| `--debug` | flag | off | Write stage debug images next to the result PNG |
| `--skip-card-detection` | flag | off | Test-only: use a dummy scale factor |
| `--no-calibration` | flag | off | Report raw (uncalibrated) diameter |

The v0 contour path and the v1 `auto` / `compare` diagnostic modes were removed during the v4 cleanup; only `mask` and `sobel` remain as edge methods, and the hand mask is always SAM (with an automatic internal fallback to the MediaPipe convex hull if SAM raises).

---

## v4 Architecture (SAM 2.1 Segmentation)

v4 replaces the two fragile detection stages in v0/v1 with Meta's Segment Anything 2.1 (Hiera Small, Apache 2.0, ~150 MB). Both SAM calls are prompt-based so CPU inference stays under ~2 s total per image.

### What's new in v4

- **SAM card detection** — `src/sam_card_detection.py::detect_credit_card_sam_prompt()`. Seeds sampled outside the hand mask; each seed fires a positive prompt + negative prompts at every other seed; candidate masks are filtered by rectangularity (≥0.90), aspect ratio (1.586 ± 15%), and area bounds. ~14× faster than the AMG grid path that originally shipped (AMG has since been removed).
- **SAM hand mask** — `src/sam_hand_segmentation.py::segment_hand_sam()`. Single positive prompt at the palm center (mean of MediaPipe landmarks 0, 5, 9, 13, 17). If SAM raises, the pipeline automatically falls back to a MediaPipe landmark convex hull (kept available under `hand_data["mask_synthetic"]` for debug).
- **`mask` edge method** (default) — measures width directly from the SAM boundary with no Sobel search. `sobel` is a second mode that anchors bidirectional Sobel + subpixel refinement on the SAM boundary (±N px).
- **Shared SAM backend** (`src/sam_backend.py`) — single `Sam2Model` + `Sam2Processor` singleton shared by card + hand. Tries the local HF cache first (`local_files_only=True`) to avoid HEAD-request retry storms.
- **Pipeline ordering** — hand mask runs first; the background complement seeds card detection. Cheap because SAM hand segmentation is ~0.5 s.

### v4 module additions

| Module | Purpose |
|--------|---------|
| `src/sam_backend.py` | Shared Sam2Model/Sam2Processor singleton |
| `src/sam_card_detection.py` | Prompt-based SAM card detection + seed helper |
| `src/sam_hand_segmentation.py` | Prompt-based SAM hand segmentation |

### v4 debug additions

- SAM card mask and SAM hand mask are blended onto the final debug PNG by `src/visualization.py` so the user can see what was actually measured.
- `script/validate_sam_card.py` and `script/compare_hand_sam.py` are offline validation/comparison harnesses for the two SAM stages.

### v4 defaults

| Component | CLI default | Web demo |
|---|---|---|
| `--card-method` | `classic` (avoids surprise 150 MB download) | `sam` (hard-coded) |
| `--edge-method` | `mask` | `mask` (hard-coded) |

Hand mask is always SAM; there is no user-facing flag for it. If SAM raises at runtime, `segment_hand()` silently falls back to the MediaPipe convex hull.

### Ring / pinky handling

For outer fingers the ROI is shrunk and rotation is centered on the proximal phalanx rather than the finger midpoint. `mask_only` measurements (i.e., the `mask` edge method) drop invalid rows and hard-fail if too few valid rows remain, rather than silently returning a low-confidence number.

### Environment flags

- `RING_DISABLE_SUPABASE=1` — opt out of Supabase persistence for local dev runs (the web demo otherwise persists each measurement off the request thread).

---

## Important Technical Details

### What This Measures
The system measures the **external horizontal width** (outer diameter) of the finger at the ring-wearing zone. This is:
- ✅ The width of soft tissue + bone at the ring-wearing position
- ❌ NOT the inner diameter of a ring
- Used as a geometric proxy for downstream ring size mapping (out of scope for v0)

### Coordinate Systems
- Images use standard OpenCV format: (row, col) = (y, x)
- Most geometry functions work in (x, y) format
- Contours are Nx2 arrays in (x, y) format
- Careful conversion needed between formats (see `geometry.py:35`)

### MediaPipe Integration
- Uses pretrained hand landmark detection model (no custom training)
- Provides 21 hand landmarks per hand
- Each finger has 4 landmarks: MCP (base), PIP, DIP, TIP
- Finger indices: 0=thumb, 1=index, 2=middle, 3=ring, 4=pinky
- **Orientation detection**: Uses wrist → specified finger tip to determine hand rotation
- **Automatic rotation**: Image rotated to canonical orientation (wrist at bottom, fingers up) based on selected finger

### Input Requirements
For optimal results:
- Resolution: 1080p or higher recommended
- View angle: Near top-down view
- **Finger**: One finger extended (index, middle, or ring). Specify with `--finger-index`
- Credit card: Must show at least 3 corners, aspect ratio ~1.586
- Finger and card must be on the same plane
- Good lighting, minimal blur

### Failure Modes (values of `fail_reason`)
- `hand_not_detected` — MediaPipe did not locate a hand
- `card_not_detected` — classical or SAM card detector returned nothing
- `card_not_parallel` — card detected but `scale_confidence ≤ 0.9` (too much perspective)
- `finger_isolation_failed`, `finger_mask_too_small`, `contour_extraction_failed` — finger segmentation stages
- `axis_estimation_failed` — landmarks missing or failed quality checks (NaN, collapsed, non-monotonic, below min length)
- `zone_localization_failed` — ring zone could not be derived
- `sobel_edge_refinement_failed` — `sobel` mode requested but edge detection raised
- `insufficient_edge_samples_<N>` — `mask` mode: too few valid rows to form a robust median

## Output Format

### JSON Output Structure
```json
{
  "finger_outer_diameter_cm": 1.78,
  "confidence": 0.86,
  "scale_px_per_cm": 42.3,
  "quality_flags": {
    "card_detected": true,
    "finger_detected": true,
    "view_angle_ok": true
  },
  "fail_reason": null
}
```

### Debug Visualization
The result PNG is written alongside every JSON output. With `--debug`, the same sibling directory also gets per-phase subdirs with numbered stage images (`NN_name.png`) produced by a single writer, `DebugObserver`:

- `finger_segmentation_debug/` — MediaPipe landmarks, hand skeleton; `sam_hand/` subdir for SAM mask + overlay
- `card_detection_debug/` (classical) or `sam_card_prompt_debug/` (SAM) — strategy waterfall / prompt points / candidates / final selection
- `edge_refinement_debug/` — ROI, Sobel stages, subpixel refinement, per-row widths (sobel mode)

### Observability

`src/logging_config.py` provides `configure_logging()` (called once by each entry point) and a `log_phase(name, totals)` context manager. All phase timings log as `[phase] name: X ms` through the standard `logger`; `src/` modules use module-level `logging.getLogger(__name__)` exclusively — no `print()`. Terminal-only decorative output (final result summary, "TESTING MODE" banner) lives in `src/cli_display.py` and is imported only by `measure_finger.py` main.

## Code Patterns and Conventions

- Functions raise on malformed inputs; `measure_finger()` maps exceptions to structured `fail_reason` values in the output dict.
- Realistic width range: 1.0–3.0 cm (typical 1.4–2.4 cm). Out-of-range widths log a warning but do not fail.
- Credit card aspect ratio tolerance: ±15% of 1.586. `scale_confidence > 0.9` is required (hard fail `card_not_parallel` otherwise).
- Coordinate convention: OpenCV is `(row, col) = (y, x)`; most `src/geometry.py` helpers use `(x, y)`. Contours are `Nx2` in `(x, y)` format.