# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Standard Task Workflow
For tasks of implementing **new features**:
1. Read PRD.md, Plan.md, Progress.md before coding
2. Summarize current project state before implementation
3. Carry out the implementation, then build and test if possible
4. Update Progress.md after changes
5. Commit with a clear, concise message
For tasks of **bug fixing**:
1. Summarize the bug, its cause, and the proposed fix before implementation
2. Carry out the implementation to fix the bug, then build and test
3. Update Progress.md after changes
4. Commit with a clear, concise message
For tasks of **reboot** from a new codex session:
1. Read doc/v0/PRD.md, doc/v0/Plan.md, doc/v0/Progress.md for baseline implementation
2. Read doc/v1/PRD.md, doc/v1/Plan.md, doc/v1/Progress.md for edge refinement (v1)
3. Read doc/v4/PRD.md, doc/v4/Plan.md, doc/v4/Progress.md for SAM 2.1 integration (card + hand)
4. Assume this is a continuation of an existing project.
5. Summarize your understanding of the current state and propose the next concrete step without writing code yet.
## Project Overview
Ring Sizer is a **local, terminal-executable computer vision program** that measures the outer width (diameter) of a finger at the ring-wearing zone using a single RGB image. It uses a standard credit card (ISO/IEC 7810 ID-1: 85.60 mm × 53.98 mm) as a physical size reference for scale calibration.
**Key characteristics:**
- Single image input (JPG/PNG)
- SAM 2.1 for card + hand segmentation; MediaPipe for hand landmarks
- Finger width measured from the SAM mask boundary (`mask` edge method) by default
- Outputs JSON measurement data and optional debug visualization
- No cloud processing, runs entirely locally
- Python 3.8+ with OpenCV, NumPy, MediaPipe, PyTorch, transformers
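
The card-based scale calibration mentioned in the overview reduces to simple arithmetic. The following is a minimal sketch, not the project's implementation: the helper names `px_per_cm_from_card` and `width_px_to_cm` are hypothetical, and averaging the two side estimates is an assumption made for illustration. Only the ID-1 dimensions come from the standard.

```python
# ISO/IEC 7810 ID-1 card dimensions in centimetres.
CARD_LONG_CM = 8.560
CARD_SHORT_CM = 5.398


def px_per_cm_from_card(long_side_px: float, short_side_px: float) -> float:
    """Estimate the image scale from the detected card's side lengths in pixels.

    Hypothetical helper: averages the scale implied by each side.
    """
    if long_side_px <= 0 or short_side_px <= 0:
        raise ValueError("card side lengths must be positive")
    scale_long = long_side_px / CARD_LONG_CM
    scale_short = short_side_px / CARD_SHORT_CM
    return (scale_long + scale_short) / 2.0


def width_px_to_cm(width_px: float, px_per_cm: float) -> float:
    """Convert a finger width measured in pixels to centimetres."""
    return width_px / px_per_cm


scale = px_per_cm_from_card(long_side_px=856.0, short_side_px=539.8)
finger_cm = width_px_to_cm(width_px=178.0, px_per_cm=scale)
print(f"{scale:.1f} px/cm, finger width {finger_cm:.2f} cm")
```

A large disagreement between the two per-side scales is one plausible signal of perspective distortion, though how the project derives `scale_confidence` is not specified here.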
## Development Commands
### Installation
```bash
# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### Running the Program
```bash
# Basic measurement (defaults: index finger, mask edge method, classic card)
python measure_finger.py --input input/test_image.jpg --output output/result.json
# Measure a different finger
python measure_finger.py --input input/test_image.jpg --output output/result.json \
--finger-index ring
# Save intermediate debug images alongside the result PNG
python measure_finger.py --input input/test_image.jpg --output output/result.json \
--debug
# Use SAM card detection (first run downloads ~150 MB of weights)
python measure_finger.py --input input/test_image.jpg --output output/result.json \
--card-method sam
# Subpixel Sobel gradient refinement anchored on the SAM mask boundary
python measure_finger.py --input input/test_image.jpg --output output/result.json \
--card-method sam --edge-method sobel
```
## Architecture Overview
### Processing Pipeline
1. **Image quality**: blur / brightness / contrast (informational, no hard fail)
2. **Hand segmentation**: MediaPipe landmarks + SAM 2.1 mask (palm-center prompt); image rotated to canonical orientation
3. **Card detection**: classical (default CLI) or SAM prompt-based (web demo); scale calibration to `px_per_cm`
4. **Finger isolation**: per-finger mask from the SAM hand mask + landmark ROI
5. **Finger axis**: landmark-based (`MCP→PIP`, the proximal phalanx); image rotated a second time so the ring zone is vertical
6. **Ring zone**: anatomical mode centered on PIP (from landmarks), or 15–25% percentage mode as fallback
7. **Edge measurement**: `mask` (SAM boundary, default) or `sobel` (subpixel gradient anchored on the SAM boundary)
8. **Confidence**: 4-component weighted score (see below)
9. **Visualization**: result PNG with mask overlays, edges, ring-zone band, measurement text
### Module Structure
| Module | Primary Responsibilities |
|---|---|
| `card_detection.py` | Classical card detection (Canny / adaptive / Otsu / color waterfall), scale calibration |
| `sam_backend.py` | Shared `Sam2Model`/`Sam2Processor` singleton (card + hand) |
| `sam_card_detection.py` | Prompt-based SAM card detection + seed helpers |
| `sam_hand_segmentation.py` | Prompt-based SAM hand mask seeded at palm center |
| `finger_segmentation.py` | MediaPipe landmarks + finger isolation against the SAM hand mask |
| `geometry.py` | Landmark-based axis estimation, ring-zone localization, precise rotation helpers |
| `edge_refinement.py` | `mask_only` boundary measurement + `sobel_only` subpixel gradient path |
| `confidence.py` | Card / finger / measurement / edge-quality scoring + overall confidence |
| `image_quality.py` | Blur (Laplacian variance), exposure, finger-spacing / lighting checks |
| `visualization.py` | Result PNG overlays (card rect, hand silhouette, edges, measurement) |
| `debug_observer.py` | Stage-by-stage debug image writer (single canonical writer) |
| `logging_config.py` | `configure_logging()` + `log_phase()` timing context manager |
| `cli_display.py` | Terminal-only decorative output for the CLI entry point |
### Key Design Decisions
**Ring-wearing zone**: anatomical mode, centered on PIP, with width = `ANATOMICAL_ZONE_WIDTH_FACTOR × |MCP→PIP|`. Falls back to 15–25% percentage mode only when landmarks are unavailable.
**Axis estimation**: landmark-only (`estimate_finger_axis_from_landmarks`). Defaults to `linear_fit`, which uses the MCP→PIP vector (proximal phalanx). Raises `ValueError` on invalid landmarks (NaN, collapsed, too short, non-monotonic); callers map this to `fail_reason="axis_estimation_failed"`.
**Edge measurement**: `mask` (default) reads per-row finger width directly from the SAM boundary. `sobel` runs bidirectional Sobel + sub-pixel parabola fitting, anchored ±N px on the SAM boundary.
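
Sub-pixel parabola fitting is a standard technique: fit a parabola through the gradient magnitude at the peak pixel and its two neighbours, and take the vertex as the refined edge position. A minimal sketch follows; `parabola_peak_offset` is an illustrative name, and the code in `edge_refinement.py` may differ in detail.

```python
def parabola_peak_offset(g_prev: float, g_peak: float, g_next: float) -> float:
    """Sub-pixel offset of the gradient peak relative to the peak pixel.

    Fits a parabola through three equally spaced samples and returns the
    vertex position in pixels, clamped to [-0.5, 0.5]. Returns 0.0 when the
    samples are flat or do not form a local maximum.
    """
    denom = g_prev - 2.0 * g_peak + g_next
    if denom >= 0.0:
        # Flat or convex: no well-defined maximum, keep the integer position.
        return 0.0
    offset = 0.5 * (g_prev - g_next) / denom
    return max(-0.5, min(0.5, offset))


# Samples taken from a parabola whose true vertex lies 0.25 px to the right
# of the integer peak at x = 120.
edge_x = 120 + parabola_peak_offset(2.4375, 3.9375, 3.4375)
print(edge_x)
```

The clamp guards against noisy samples pushing the refined edge more than half a pixel away from the integer peak.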
**Confidence scoring**: a single 4-component model with weights card 25%, finger 25%, edge quality 20%, measurement 30%. Levels: HIGH (>0.85), MEDIUM (≥0.6), LOW (<0.6). Defined in `src/confidence_constants.py` as `WEIGHT_*`.
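
The weighting and level thresholds can be sketched as below. The weights and thresholds are the ones stated above; the exact constant names and the function names are assumptions that follow the `WEIGHT_*` pattern and are not copied from `src/confidence_constants.py`.

```python
# Weights as documented; constant names are assumed, not taken from the source.
WEIGHT_CARD = 0.25
WEIGHT_FINGER = 0.25
WEIGHT_EDGE_QUALITY = 0.20
WEIGHT_MEASUREMENT = 0.30


def overall_confidence(card: float, finger: float,
                       edge_quality: float, measurement: float) -> float:
    """Weighted sum of the four component scores, each expected in [0, 1]."""
    return (WEIGHT_CARD * card
            + WEIGHT_FINGER * finger
            + WEIGHT_EDGE_QUALITY * edge_quality
            + WEIGHT_MEASUREMENT * measurement)


def confidence_level(score: float) -> str:
    """Map a score to HIGH (>0.85), MEDIUM (>=0.6), or LOW (<0.6)."""
    if score > 0.85:
        return "HIGH"
    if score >= 0.6:
        return "MEDIUM"
    return "LOW"


score = overall_confidence(card=0.95, finger=0.90, edge_quality=0.80, measurement=0.85)
print(round(score, 4), confidence_level(score))
```

Note that a score of exactly 0.85 falls in MEDIUM, since the HIGH threshold is strict.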
---
## CLI Flags
| Flag | Values | Default | Notes |
|---|---|---|---|
| `--finger-index` | auto, index, middle, ring, pinky | `index` | Which finger to measure; also drives orientation |
| `--mode` | single, multi | `single` | `multi` measures index + middle + ring in one pass |
| `--edge-method` | mask, sobel | `mask` | `mask` reads the SAM boundary; `sobel` adds subpixel gradient refinement anchored on it |
| `--sobel-threshold` | float | 15.0 | Minimum gradient magnitude (sobel mode only) |
| `--sobel-kernel-size` | 3, 5, 7 | 3 | Sobel kernel (sobel mode only) |
| `--no-subpixel` | flag | off | Disable parabola refinement (sobel mode only) |
| `--card-method` | classic, sam | `classic` | CLI default is classical to avoid surprise 150 MB SAM weight download; web demo forces `sam` |
| `--ring-model` | see `src/ring_size.py` | - | Ring-size lookup table |
| `--debug` | flag | off | Write stage debug images next to the result PNG |
| `--skip-card-detection` | flag | off | Test-only: use a dummy scale factor |
| `--no-calibration` | flag | off | Report raw (uncalibrated) diameter |
The v0 contour path and the v1 `auto` / `compare` diagnostic modes were removed during the v4 cleanup; only `mask` and `sobel` remain as edge methods, and the hand mask is always SAM (with an automatic internal fallback to the MediaPipe convex hull if SAM raises).
---
## v4 Architecture (SAM 2.1 Segmentation)
v4 replaces the two fragile detection stages in v0/v1 with Meta's Segment Anything 2.1 (Hiera Small, Apache 2.0, ~150 MB). Both SAM calls are prompt-based so CPU inference stays under ~2 s total per image.
### What's new in v4
- **SAM card detection**: `src/sam_card_detection.py::detect_credit_card_sam_prompt()`. Seeds are sampled outside the hand mask; each seed fires a positive prompt plus negative prompts at every other seed; candidate masks are filtered by rectangularity (≥0.90), aspect ratio (1.586 ± 15%), and area bounds. ~14× faster than the AMG grid path that originally shipped (AMG has since been removed).
- **SAM hand mask**: `src/sam_hand_segmentation.py::segment_hand_sam()`. Single positive prompt at the palm center (mean of MediaPipe landmarks 0, 5, 9, 13, 17). If SAM raises, the pipeline automatically falls back to a MediaPipe landmark convex hull (kept available under `hand_data["mask_synthetic"]` for debug).
- **`mask` edge method** (default): measures width directly from the SAM boundary with no Sobel search. `sobel` is a second mode that anchors bidirectional Sobel + subpixel refinement on the SAM boundary (±N px).
- **Shared SAM backend** (`src/sam_backend.py`): single `Sam2Model` + `Sam2Processor` singleton shared by card + hand. Tries the local HF cache first (`local_files_only=True`) to avoid HEAD-request retry storms.
- **Pipeline ordering**: hand mask runs first; the background complement seeds card detection. Cheap because SAM hand segmentation is ~0.5 s.
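
The candidate filter described in the SAM card detection bullet can be sketched as a pure function. This is an illustration under assumptions: `is_card_candidate`, its parameters, and the area-bound defaults are hypothetical; only the rectangularity (≥0.90) and aspect-ratio (1.586 ± 15%) thresholds come from the text above. Rectangularity is taken here to mean mask area divided by the area of the minimum-area bounding rectangle, which is a common definition but an assumption about this project.

```python
CARD_ASPECT = 1.586          # ID-1 long side / short side
ASPECT_TOLERANCE = 0.15      # +/- 15 % of the nominal aspect ratio
MIN_RECTANGULARITY = 0.90


def is_card_candidate(mask_area_px: float, rect_w_px: float, rect_h_px: float,
                      image_area_px: float,
                      min_area_frac: float = 0.01,
                      max_area_frac: float = 0.50) -> bool:
    """Decide whether a segmented mask plausibly is a credit card.

    rect_w_px / rect_h_px are the sides of the mask's minimum-area bounding
    rectangle. The area-fraction bounds are illustrative defaults.
    """
    if rect_w_px <= 0 or rect_h_px <= 0 or image_area_px <= 0:
        return False
    rectangularity = mask_area_px / (rect_w_px * rect_h_px)
    aspect = max(rect_w_px, rect_h_px) / min(rect_w_px, rect_h_px)
    area_frac = mask_area_px / image_area_px
    return (rectangularity >= MIN_RECTANGULARITY
            and abs(aspect - CARD_ASPECT) <= ASPECT_TOLERANCE * CARD_ASPECT
            and min_area_frac <= area_frac <= max_area_frac)


image_area = 1920 * 1080
print(is_card_candidate(0.95 * 856 * 540, 856, 540, image_area))  # card-like
print(is_card_candidate(0.95 * 500 * 500, 500, 500, image_area))  # square
```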
### v4 module additions
| Module | Purpose |
|--------|---------|
| `src/sam_backend.py` | Shared Sam2Model/Sam2Processor singleton |
| `src/sam_card_detection.py` | Prompt-based SAM card detection + seed helper |
| `src/sam_hand_segmentation.py` | Prompt-based SAM hand segmentation |
### v4 debug additions
- SAM card mask and SAM hand mask are blended onto the final debug PNG by `src/visualization.py` so the user can see what was actually measured.
- `script/validate_sam_card.py` and `script/compare_hand_sam.py` are offline validation/comparison harnesses for the two SAM stages.
### v4 defaults
| Component | CLI default | Web demo |
|---|---|---|
| `--card-method` | `classic` (avoids surprise 150 MB download) | `sam` (hard-coded) |
| `--edge-method` | `mask` | `mask` (hard-coded) |
Hand mask is always SAM; there is no user-facing flag for it. If SAM raises at runtime, `segment_hand()` silently falls back to the MediaPipe convex hull.
### Ring / pinky handling
For outer fingers the ROI is shrunk and rotation is centered on the proximal phalanx rather than the finger midpoint. `mask_only` measurements (i.e., the `mask` edge method) drop invalid rows and hard-fail if too few valid rows remain, rather than silently returning a low-confidence number.
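
The row-filtering behavior described above can be sketched as follows. This is a simplified illustration, not the code in `edge_refinement.py`: `median_width_px`, the `InsufficientEdgeSamples` exception, the `min_valid_rows` default, the validity rule (positive, finite width), and the choice to put the valid-row count in the message are all assumptions.

```python
import math
from statistics import median
from typing import Optional, Sequence


class InsufficientEdgeSamples(ValueError):
    """Raised when too few valid rows remain (hypothetical exception type)."""


def median_width_px(row_widths: Sequence[Optional[float]],
                    min_valid_rows: int = 5) -> float:
    """Median finger width over valid rows; hard-fail if too few remain."""
    # Drop rows with no measurement, NaN/inf, or a non-positive width.
    valid = [w for w in row_widths
             if w is not None and math.isfinite(w) and w > 0]
    if len(valid) < min_valid_rows:
        # Fail loudly instead of returning a low-confidence number.
        raise InsufficientEdgeSamples(f"insufficient_edge_samples_{len(valid)}")
    return median(valid)


widths = [70.0, None, 72.0, float("nan"), 71.0, 0.0, 73.0, 74.0]
print(median_width_px(widths))
```

Using the median rather than the mean keeps a few outlier rows from skewing the reported width.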
### Environment flags
- `RING_DISABLE_SUPABASE=1`: opt out of Supabase persistence for local dev runs (the web demo otherwise persists each measurement off the request thread).
---
## Important Technical Details
### What This Measures
The system measures the **external horizontal width** (outer diameter) of the finger at the ring-wearing zone. This is:
- ✅ The width of soft tissue + bone at the ring-wearing position
- ❌ NOT the inner diameter of a ring
- Used as a geometric proxy for downstream ring size mapping (out of scope for v0)
### Coordinate Systems
- Images use standard OpenCV format: (row, col) = (y, x)
- Most geometry functions work in (x, y) format
- Contours are Nx2 arrays in (x, y) format
- Careful conversion needed between formats (see `geometry.py:35`)
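
A short illustration of the pitfall, using nested lists in place of an image array so that it runs without dependencies (with NumPy and OpenCV the indexing order is the same: `image[y, x]`). The helper name `xy_to_rc` is illustrative and not taken from `geometry.py`.

```python
def xy_to_rc(point_xy):
    """Convert an (x, y) point, as used by geometry helpers, to (row, col)."""
    x, y = point_xy
    return (y, x)


# A 2-row by 3-column "image": height = 2, width = 3.
image = [
    [10, 11, 12],
    [20, 21, 22],
]

point_xy = (2, 1)              # x = 2 (column), y = 1 (row)
row, col = xy_to_rc(point_xy)
value = image[row][col]        # index as [y][x], never [x][y]
print(value)
```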
### MediaPipe Integration
- Uses pretrained hand landmark detection model (no custom training)
- Provides 21 hand landmarks per hand
- Each finger has 4 landmarks: MCP (base), PIP, DIP, TIP
- Finger indices: 0=thumb, 1=index, 2=middle, 3=ring, 4=pinky
- **Orientation detection**: Uses wrist → specified finger tip to determine hand rotation
- **Automatic rotation**: Image rotated to canonical orientation (wrist at bottom, fingers up) based on selected finger
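
For reference, MediaPipe numbers the 21 landmarks with the wrist at 0 and four consecutive indices per finger, so the finger index above maps to landmark indices by simple arithmetic. The sketch below uses hypothetical helper names (`finger_landmark_indices`, `palm_center`) and plain `(x, y)` tuples rather than MediaPipe objects; the palm-center definition (mean of landmarks 0, 5, 9, 13, 17) is the one given in the v4 section.

```python
JOINT_NAMES = ("MCP", "PIP", "DIP", "TIP")
PALM_LANDMARKS = (0, 5, 9, 13, 17)  # wrist + MCP joints of the four fingers


def finger_landmark_indices(finger_index: int) -> dict:
    """Map a finger index (0=thumb ... 4=pinky) to its four landmark indices.

    For the thumb, MediaPipe names these joints CMC, MCP, IP, TIP; the
    generic names are kept here for simplicity.
    """
    if not 0 <= finger_index <= 4:
        raise ValueError("finger_index must be in 0..4")
    base = 1 + 4 * finger_index
    return {name: base + i for i, name in enumerate(JOINT_NAMES)}


def palm_center(landmarks_xy):
    """Mean (x, y) of the palm landmarks, given 21 (x, y) pairs."""
    xs = [landmarks_xy[i][0] for i in PALM_LANDMARKS]
    ys = [landmarks_xy[i][1] for i in PALM_LANDMARKS]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


print(finger_landmark_indices(3))   # ring finger

# Synthetic landmarks for demonstration only: landmark i sits at (i, 2*i).
fake_landmarks = [(float(i), float(2 * i)) for i in range(21)]
print(palm_center(fake_landmarks))
```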
### Input Requirements
For optimal results:
- Resolution: 1080p or higher recommended
- View angle: Near top-down view
- **Finger**: One finger extended (index, middle, or ring). Specify with `--finger-index`
- Credit card: Must show at least 3 corners, aspect ratio ~1.586
- Finger and card must be on the same plane
- Good lighting, minimal blur
### Failure Modes (values of `fail_reason`)
- `hand_not_detected`: MediaPipe did not locate a hand
- `card_not_detected`: classical or SAM card detector returned nothing
- `card_not_parallel`: card detected but `scale_confidence ≤ 0.9` (too much perspective)
- `finger_isolation_failed`, `finger_mask_too_small`, `contour_extraction_failed`: finger segmentation stages
- `axis_estimation_failed`: landmarks missing or failed quality checks (NaN, collapsed, non-monotonic, below min length)
- `zone_localization_failed`: ring zone could not be derived
- `sobel_edge_refinement_failed`: `sobel` mode requested but edge detection raised
- `insufficient_edge_samples_<N>`: `mask` mode, too few valid rows to form a robust median
## Output Format
### JSON Output Structure
```json
{
"finger_outer_diameter_cm": 1.78,
"confidence": 0.86,
"scale_px_per_cm": 42.3,
"quality_flags": {
"card_detected": true,
"finger_detected": true,
"view_angle_ok": true
},
"fail_reason": null
}
```
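
A downstream script might consume this output as sketched below. The field names come from the structure above; the helper name `summarize_result` and the wording of its messages are hypothetical.

```python
import json


def summarize_result(result: dict) -> str:
    """Return a one-line summary, checking fail_reason before reading values."""
    if result.get("fail_reason") is not None:
        return f"measurement failed: {result['fail_reason']}"
    diameter_mm = result["finger_outer_diameter_cm"] * 10.0
    return (f"outer diameter {diameter_mm:.1f} mm "
            f"(confidence {result['confidence']:.2f})")


# Inline sample mirroring the structure above; in practice, read the file
# passed to --output with json.load().
sample = json.loads("""
{
  "finger_outer_diameter_cm": 1.78,
  "confidence": 0.86,
  "scale_px_per_cm": 42.3,
  "quality_flags": {"card_detected": true, "finger_detected": true, "view_angle_ok": true},
  "fail_reason": null
}
""")
print(summarize_result(sample))
```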
### Debug Visualization
The result PNG is written alongside every JSON output. With `--debug`, the same sibling directory also gets per-phase subdirs with numbered stage images (`NN_name.png`) produced by a single writer, `DebugObserver`:
- `finger_segmentation_debug/`: MediaPipe landmarks, hand skeleton; `sam_hand/` subdir for SAM mask + overlay
- `card_detection_debug/` (classical) or `sam_card_prompt_debug/` (SAM): strategy waterfall / prompt points / candidates / final selection
- `edge_refinement_debug/`: ROI, Sobel stages, subpixel refinement, per-row widths (sobel mode)
### Observability
`src/logging_config.py` provides `configure_logging()` (called once by each entry point) and a `log_phase(name, totals)` context manager. All phase timings log as `[phase] name: X ms` through the standard `logger`; `src/` modules use module-level `logging.getLogger(__name__)` exclusively, with no `print()` calls. Terminal-only decorative output (final result summary, "TESTING MODE" banner) lives in `src/cli_display.py` and is imported only by the `main` entry point in `measure_finger.py`.
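
A timing context manager of this shape can be sketched with `contextlib`. This illustrates the pattern and is not the contents of `src/logging_config.py`: the signature follows the `log_phase(name, totals)` description above, while treating `totals` as a dict that accumulates milliseconds per phase is an assumption.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_phase(name, totals=None):
    """Time the enclosed block and log it as '[phase] name: X ms'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the phase raises, so failed phases are still timed.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if totals is not None:
            totals[name] = totals.get(name, 0.0) + elapsed_ms
        logger.info("[phase] %s: %.1f ms", name, elapsed_ms)


logging.basicConfig(level=logging.INFO)
totals = {}
with log_phase("card_detection", totals):
    time.sleep(0.01)  # stand-in for real work
```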
## Code Patterns and Conventions
- Functions raise on malformed inputs; `measure_finger()` maps exceptions to structured `fail_reason` values in the output dict.
- Realistic width range: 1.0–3.0 cm (typical 1.4–2.4 cm). Out-of-range widths log a warning but do not fail.
- Credit card aspect ratio tolerance: ±15% of 1.586. `scale_confidence > 0.9` is required (hard fail `card_not_parallel` otherwise).
- Coordinate convention: OpenCV is `(row, col) = (y, x)`; most `src/geometry.py` helpers use `(x, y)`. Contours are `Nx2` in `(x, y)` format.
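
The raise-then-map convention in the first bullet can be sketched as follows. All function names here are stand-ins (`estimate_axis`, `measure`), the 5 px minimum length is an invented threshold, and the real `measure_finger()` covers many more stages; only the `fail_reason` string comes from this document.

```python
import math


def estimate_axis(mcp_xy, pip_xy, min_length_px=5.0):
    """Return the unit MCP->PIP direction; raise ValueError on bad landmarks."""
    if any(math.isnan(c) for c in (*mcp_xy, *pip_xy)):
        raise ValueError("landmark contains NaN")
    dx = pip_xy[0] - mcp_xy[0]
    dy = pip_xy[1] - mcp_xy[1]
    length = math.hypot(dx, dy)
    if length < min_length_px:
        raise ValueError("MCP and PIP are collapsed or too close")
    return (dx / length, dy / length)


def measure(mcp_xy, pip_xy) -> dict:
    """Map a low-level exception to a structured fail_reason."""
    try:
        axis = estimate_axis(mcp_xy, pip_xy)
    except ValueError:
        return {"axis": None, "fail_reason": "axis_estimation_failed"}
    return {"axis": axis, "fail_reason": None}


print(measure((0.0, 0.0), (30.0, 40.0)))
print(measure((0.0, 0.0), (0.0, 0.0)))
```

Keeping the low-level helpers strict and translating errors in one place means callers only ever need to inspect `fail_reason`.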