Spaces:

feng-x
/

ring-sizer

Running

File size: 15,427 Bytes

347d1a8

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Standard Task Workflow

For tasks of implementing **new features**:
1. Read PRD.md, Plan.md, Progress.md before coding
2. Summarize current project state before implementation
3. Carry out the implementatation; after that, build and test if possible
4. Update Progress.md after changes
5. Commit with a clear, concise message

For tasks of **bug fixing**:
1. Summarize the bug, reason and solution before implementation
2. Carry out the implementation to fix the bug; build and test afterwards;
3. Update Progress.md after changes
4. Commit with a clear, concise message

For tasks of **reboot** from a new codex session:
1. Read doc/v0/PRD.md, doc/v0/Plan.md, doc/v0/Progress.md for baseline implementation
2. Read doc/v1/PRD.md, doc/v1/Plan.md, doc/v1/Progress.md for edge refinement (v1)
3. Assume this is a continuation of an existing project.
4. Summarize your understanding of the current state and propose the next concrete step without writing code yet.

## Project Overview

Ring Sizer is a **local, terminal-executable computer vision program** that measures the outer width (diameter) of a finger at the ring-wearing zone using a single RGB image. It uses a standard credit card (ISO/IEC 7810 ID-1: 85.60mm × 53.98mm) as a physical size reference for scale calibration.

**Key characteristics:**
- Single image input (JPG/PNG)
- **v1: Dual edge detection** - Landmark-based axis + Sobel gradient refinement
- MediaPipe-based hand and finger segmentation
- MediaPipe-based hand and finger segmentation
- Outputs JSON measurement data and optional debug visualization
- No cloud processing, runs entirely locally
- Python 3.8+ with OpenCV, NumPy, MediaPipe, and SciPy

## Development Commands

### Installation
```bash
# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Running the Program
```bash
# Basic measurement (defaults to index finger, auto edge detection)
python measure_finger.py --input input/test_image.jpg --output output/result.json

# Measure specific finger (index, middle, ring, or auto)
python measure_finger.py \
  --input input/test_image.jpg \
  --output output/result.json \
  --finger-index ring

# With debug visualization
python measure_finger.py \
  --input input/test_image.jpg \
  --output output/result.json \
  --finger-index middle \
  --debug output/debug_overlay.png

# Force Sobel edge refinement (v1)
python measure_finger.py \
  --input image.jpg \
  --output result.json \
  --finger-index ring \
  --edge-method sobel \
  --sobel-threshold 15.0 \
  --debug output/debug.png

# Compare both methods
python measure_finger.py \
  --input image.jpg \
  --output result.json \
  --finger-index middle \
  --edge-method compare \
  --debug output/debug.png

# Force contour method (v0)
python measure_finger.py \
  --input image.jpg \
  --output result.json \
  --finger-index index \
  --edge-method contour
```

## Architecture Overview

### Processing Pipeline (9 Phases)

The measurement pipeline follows a strict sequential flow:

1. **Image Quality Check** - Blur detection, exposure validation, resolution check
2. **Credit Card Detection & Scale Calibration** - Detects card, verifies aspect ratio (~1.586), computes `px_per_cm`
3. **Hand & Finger Segmentation** - MediaPipe hand detection, finger isolation, mask generation
4. **Finger Contour Extraction** - Extracts outer contour from cleaned mask
5. **Finger Axis Estimation** - PCA-based principal axis calculation, determines palm-end vs tip-end
6. **Ring-Wearing Zone Localization** - Defines zone at 15%-25% of finger length from palm-side
7. **Width Measurement** - Samples 20 cross-sections perpendicular to axis, uses median width
8. **Confidence Scoring** - Multi-factor scoring (card 30%, finger 30%, measurement 40%)
9. **Debug Visualization** - Generates annotated overlay image

### Module Structure

The codebase is organized into focused utility modules in `src/`:

| Module | Primary Responsibilities |
|--------|--------------------------|
| `card_detection.py` | Credit card detection, perspective correction, scale calibration (`px_per_cm`) |
| `finger_segmentation.py` | MediaPipe integration, hand/finger isolation, mask cleaning, contour extraction |
| `geometry.py` | PCA axis estimation, ring zone localization, cross-section width measurement, line-contour intersections |
| `image_quality.py` | Blur detection (Laplacian variance), exposure checks, resolution validation |
| `confidence.py` | Component confidence scoring (card, finger, measurement), overall confidence computation |
| `visualization.py` | Debug overlay generation with contours, zones, measurements, and annotations |

### Key Design Decisions

**Ring-Wearing Zone Definition:**
- Located at 15%-25% of finger length from palm-side end
- Width measured by sampling 20 cross-sections within this zone
- Final measurement is the **median width** (robust to outliers)

**Axis Estimation:**
- Uses PCA (Principal Component Analysis) on finger mask points
- Determines palm-end vs tip-end using either:
  1. MediaPipe landmarks (preferred, if available)
  2. Thickness heuristic (thinner end is likely the tip)

**Confidence Scoring:**
- 3-component weighted average: Card (30%) + Finger (30%) + Measurement (40%)
- Confidence levels: HIGH (>0.85), MEDIUM (0.6-0.85), LOW (<0.6)
- Factors: card detection quality, finger mask area, width variance, aspect ratios

**Measurement Approach:**
- Perpendicular cross-sections to finger axis
- Line-contour intersection algorithm finds left/right edges
- Uses farthest pair of intersections to handle complex contours
- Converts pixels to cm using calibrated scale factor

---

## v1 Architecture (Edge Refinement)

### What's New in v1

v1 improves measurement accuracy by replacing contour-based edge detection with gradient-based Sobel edge refinement. Key improvements:

- **Landmark-based axis**: Uses MediaPipe finger landmarks (MCP→PIP→DIP→TIP) for more anatomically consistent axis estimation
- **Sobel edge detection**: Bidirectional gradient filtering for pixel-precise edge localization
- **Sub-pixel refinement**: Parabola fitting achieves <0.5px precision (~0.003cm at typical resolution)
- **Quality-based fallback**: Automatically uses v0 contour method if Sobel quality insufficient
- **Enhanced confidence**: Adds edge quality component (gradient strength, consistency, smoothness, symmetry)

### v1 Processing Pipeline (Enhanced Phases)

**Phase 5a: Landmark-Based Axis Estimation (v1)**
- Uses MediaPipe finger landmarks directly (4 points: MCP, PIP, DIP, TIP)
- **Finger selection**: Defaults to index finger, can specify middle or ring finger via `--finger-index`
- Orientation detection uses the **specified finger** for axis calculation (wrist → finger tip)
- Image automatically rotated to canonical orientation (wrist at bottom, fingers pointing up)
- Three axis calculation methods:
  - `endpoints`: Simple MCP→TIP vector
  - `linear_fit`: Linear regression on all 4 landmarks (default, most robust)
  - `median_direction`: Median of segment directions
- Falls back to PCA if landmarks unavailable or quality check fails
- Validation checks: NaN/inf, minimum spacing, monotonic progression, minimum length

**Phase 7b: Sobel Edge Refinement (v1)**
```
1. Extract ROI around ring zone → 2. Apply bidirectional Sobel filters →
3. Detect edges per cross-section → 4. Sub-pixel refinement → 5. Measure width
```

1. **ROI Extraction**
   - Rectangular region around ring zone with padding (50px for gradient context)
   - Width estimation: `finger_length / 3.0` (conservative)
   - Optional rotation alignment (not used by default)

2. **Bidirectional Sobel Filtering**
   - Applies `cv2.Sobel` with configurable kernel size (3, 5, or 7)
   - Computes gradient_x (horizontal edges), gradient_y (vertical edges)
   - Calculates gradient magnitude and direction
   - Auto-detects filter orientation from ROI aspect ratio

3. **Edge Detection Per Cross-Section**
   - **Mask-constrained mode** (primary):
     - Finds leftmost/rightmost finger mask pixels (finger boundaries)
     - Searches ±10px around boundaries for strongest gradient
     - Combines anatomical accuracy (mask) with sub-pixel precision (gradient)
   - **Gradient-only mode** (fallback): Pure Sobel without mask constraint

4. **Sub-Pixel Edge Localization**
   - Parabola fitting: f(x) = ax² + bx + c
   - Samples gradient at x-1, x, x+1
   - Finds parabola peak: x_peak = -b/(2a)
   - Constrains refinement to ±0.5 pixels
   - Achieves <0.5px precision (~0.003cm at 185 px/cm)

5. **Width Measurement**
   - Calculates width for each valid row
   - Outlier filtering using Median Absolute Deviation (MAD)
   - Removes measurements >3 MAD from median
   - Computes median, mean, std dev
   - Converts pixels to cm using scale factor

**Phase 8b: Enhanced Confidence Scoring (v1)**
- Adds 4th component: Edge Quality (20% weight)
  - Gradient strength: Avg magnitude at detected edges
  - Consistency: % of rows with valid edge pairs
  - Smoothness: Edge position variance (lower = better)
  - Symmetry: Left/right edge strength balance
- Reweights other components: Card 25%, Finger 25%, Measurement 30%

### v1 Module Structure

| Module | v1 Enhancements |
|--------|-----------------|
| `geometry.py` | Added `estimate_finger_axis_from_landmarks()`, `_validate_landmark_quality()`, landmark-based zone localization |
| **`edge_refinement.py`** | **[NEW]** Complete Sobel edge refinement pipeline with sub-pixel precision |
| `confidence.py` | Added `compute_edge_quality_confidence()`, dual-mode confidence calculation |
| `debug_observer.py` | Added 9 edge refinement drawing functions for visualization |
| `measure_finger.py` | CLI flags for edge method selection, method comparison mode |

### v1 CLI Flags

| Flag | Values | Default | Description |
|------|--------|---------|-------------|
| `--finger-index` | auto, index, middle, ring, pinky | **index** | Which finger to measure and use for orientation |
| `--edge-method` | auto, contour, sobel, compare | auto | Edge detection method |
| `--sobel-threshold` | float | 15.0 | Minimum gradient magnitude |
| `--sobel-kernel-size` | 3, 5, 7 | 3 | Sobel kernel size |
| `--no-subpixel` | flag | False | Disable sub-pixel refinement |

### v1 Auto Mode Behavior

When `--edge-method auto` (default):
1. Always computes contour measurement (v0 baseline)
2. Attempts Sobel edge refinement
3. Evaluates Sobel quality score (threshold: 0.7)
4. Checks consistency (>50% success rate required)
5. Verifies width reasonableness (0.8-3.5 cm)
6. Checks agreement with contour (<50% difference)
7. Uses Sobel if all checks pass, otherwise falls back to contour
8. Reports method used in `edge_method_used` field

### v1 Debug Output

When `--debug` flag used, generates:
- Main debug overlay (same as v0, shows final result)
- `output/edge_refinement_debug/` subdirectory with 12 images:
  - **Stage A** (3): Landmark axis, ring zone, ROI extraction
  - **Stage B** (5): Sobel gradients, candidates, selected edges
  - **Stage C** (4): Sub-pixel refinement, widths, distribution, outliers

### v1 Failure Modes (Additional)

- `sobel_edge_refinement_failed` - Sobel method explicitly requested but failed
- `quality_score_low_X.XX` - Edge quality below threshold (auto fallback)
- `consistency_low_X.XX` - Too few valid edge detections
- `width_unreasonable` - Measured width outside realistic range
- `disagreement_with_contour` - Sobel and contour differ by >50%

---

## Important Technical Details

### What This Measures
The system measures the **external horizontal width** (outer diameter) of the finger at the ring-wearing zone. This is:
- ✅ The width of soft tissue + bone at the ring-wearing position
- ❌ NOT the inner diameter of a ring
- Used as a geometric proxy for downstream ring size mapping (out of scope for v0)

### Coordinate Systems
- Images use standard OpenCV format: (row, col) = (y, x)
- Most geometry functions work in (x, y) format
- Contours are Nx2 arrays in (x, y) format
- Careful conversion needed between formats (see `geometry.py:35`)

### MediaPipe Integration
- Uses pretrained hand landmark detection model (no custom training)
- Provides 21 hand landmarks per hand
- Each finger has 4 landmarks: MCP (base), PIP, DIP, TIP
- Finger indices: 0=thumb, 1=index, 2=middle, 3=ring, 4=pinky
- **Orientation detection**: Uses wrist → specified finger tip to determine hand rotation
- **Automatic rotation**: Image rotated to canonical orientation (wrist at bottom, fingers up) based on selected finger

### Input Requirements
For optimal results:
- Resolution: 1080p or higher recommended
- View angle: Near top-down view
- **Finger**: One finger extended (index, middle, or ring). Specify with `--finger-index`
- Credit card: Must show at least 3 corners, aspect ratio ~1.586
- Finger and card must be on the same plane
- Good lighting, minimal blur

### Failure Modes
The system can fail at various stages:
- `card_not_detected` - Credit card not found or aspect ratio invalid
- `hand_not_detected` - No hand detected by MediaPipe
- `finger_isolation_failed` - Could not isolate specified finger
- `finger_mask_too_small` - Mask area too small after cleaning
- `contour_extraction_failed` - Could not extract valid contour
- `axis_estimation_failed` - PCA failed or insufficient points
- `zone_localization_failed` - Could not define ring zone
- `width_measurement_failed` - No valid cross-section intersections

## Output Format

### JSON Output Structure
```json
{
  "finger_outer_diameter_cm": 1.78,
  "confidence": 0.86,
  "scale_px_per_cm": 42.3,
  "quality_flags": {
    "card_detected": true,
    "finger_detected": true,
    "view_angle_ok": true
  },
  "fail_reason": null
}
```

### Debug Visualization Features
When `--debug` flag is used, generates an annotated image with:
- Credit card contour and corners (green)
- Finger contour (magenta, thick lines)
- Finger axis and endpoints (cyan/yellow)
- Ring-wearing zone band (yellow, semi-transparent)
- Cross-section sampling lines (orange)
- Measurement intersection points (blue circles)
- Final measurement and confidence text (large, readable font)

## Code Patterns and Conventions

### Error Handling
- Functions return `None` or raise exceptions on failure
- Main pipeline (`measure_finger()`) returns structured output dict with `fail_reason`
- Console logging provides detailed progress information

### Type Hints
- Extensive use of type hints throughout
- Dict return types with `Dict[str, Any]` for structured data
- NumPy arrays typed as `np.ndarray`
- Literal types for enums (e.g., `FingerIndex`)

### Data Flow
- All major functions return dictionaries with consistent keys
- Downstream functions accept upstream outputs directly
- Debug visualization receives all intermediate results
- Clean separation between detection, computation, and visualization

### Validation and Sanity Checks
- Finger width should be in realistic range: 1.0-3.0 cm (typical: 1.4-2.4 cm)
- Credit card aspect ratio should be close to 1.586
- View angle check: scale confidence should be >0.9 for accurate measurements
- Minimum mask area threshold prevents false detections