ring-sizer / AGENTS.md
feng-x's picture
Upload folder using huggingface_hub
347d1a8 verified

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Standard Task Workflow

For tasks of implementing new features:

  1. Read PRD.md, Plan.md, Progress.md before coding
  2. Summarize current project state before implementation
  3. Carry out the implementatation; after that, build and test if possible
  4. Update Progress.md after changes
  5. Commit with a clear, concise message

For tasks of bug fixing:

  1. Summarize the bug, reason and solution before implementation
  2. Carry out the implementation to fix the bug; build and test afterwards;
  3. Update Progress.md after changes
  4. Commit with a clear, concise message

For tasks of reboot from a new codex session:

  1. Read doc/v0/PRD.md, doc/v0/Plan.md, doc/v0/Progress.md for baseline implementation
  2. Read doc/v1/PRD.md, doc/v1/Plan.md, doc/v1/Progress.md for edge refinement (v1)
  3. Assume this is a continuation of an existing project.
  4. Summarize your understanding of the current state and propose the next concrete step without writing code yet.

Project Overview

Ring Sizer is a local, terminal-executable computer vision program that measures the outer width (diameter) of a finger at the ring-wearing zone using a single RGB image. It uses a standard credit card (ISO/IEC 7810 ID-1: 85.60mm × 53.98mm) as a physical size reference for scale calibration.

Key characteristics:

  • Single image input (JPG/PNG)
  • v1: Dual edge detection - Landmark-based axis + Sobel gradient refinement
  • MediaPipe-based hand and finger segmentation
  • MediaPipe-based hand and finger segmentation
  • Outputs JSON measurement data and optional debug visualization
  • No cloud processing, runs entirely locally
  • Python 3.8+ with OpenCV, NumPy, MediaPipe, and SciPy

Development Commands

Installation

# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running the Program

# Basic measurement (defaults to index finger, auto edge detection)
python measure_finger.py --input input/test_image.jpg --output output/result.json

# Measure specific finger (index, middle, ring, or auto)
python measure_finger.py \
  --input input/test_image.jpg \
  --output output/result.json \
  --finger-index ring

# With debug visualization
python measure_finger.py \
  --input input/test_image.jpg \
  --output output/result.json \
  --finger-index middle \
  --debug output/debug_overlay.png

# Force Sobel edge refinement (v1)
python measure_finger.py \
  --input image.jpg \
  --output result.json \
  --finger-index ring \
  --edge-method sobel \
  --sobel-threshold 15.0 \
  --debug output/debug.png

# Compare both methods
python measure_finger.py \
  --input image.jpg \
  --output result.json \
  --finger-index middle \
  --edge-method compare \
  --debug output/debug.png

# Force contour method (v0)
python measure_finger.py \
  --input image.jpg \
  --output result.json \
  --finger-index index \
  --edge-method contour

Architecture Overview

Processing Pipeline (9 Phases)

The measurement pipeline follows a strict sequential flow:

  1. Image Quality Check - Blur detection, exposure validation, resolution check
  2. Credit Card Detection & Scale Calibration - Detects card, verifies aspect ratio (~1.586), computes px_per_cm
  3. Hand & Finger Segmentation - MediaPipe hand detection, finger isolation, mask generation
  4. Finger Contour Extraction - Extracts outer contour from cleaned mask
  5. Finger Axis Estimation - PCA-based principal axis calculation, determines palm-end vs tip-end
  6. Ring-Wearing Zone Localization - Defines zone at 15%-25% of finger length from palm-side
  7. Width Measurement - Samples 20 cross-sections perpendicular to axis, uses median width
  8. Confidence Scoring - Multi-factor scoring (card 30%, finger 30%, measurement 40%)
  9. Debug Visualization - Generates annotated overlay image

Module Structure

The codebase is organized into focused utility modules in src/:

Module Primary Responsibilities
card_detection.py Credit card detection, perspective correction, scale calibration (px_per_cm)
finger_segmentation.py MediaPipe integration, hand/finger isolation, mask cleaning, contour extraction
geometry.py PCA axis estimation, ring zone localization, cross-section width measurement, line-contour intersections
image_quality.py Blur detection (Laplacian variance), exposure checks, resolution validation
confidence.py Component confidence scoring (card, finger, measurement), overall confidence computation
visualization.py Debug overlay generation with contours, zones, measurements, and annotations

Key Design Decisions

Ring-Wearing Zone Definition:

  • Located at 15%-25% of finger length from palm-side end
  • Width measured by sampling 20 cross-sections within this zone
  • Final measurement is the median width (robust to outliers)

Axis Estimation:

  • Uses PCA (Principal Component Analysis) on finger mask points
  • Determines palm-end vs tip-end using either:
    1. MediaPipe landmarks (preferred, if available)
    2. Thickness heuristic (thinner end is likely the tip)

Confidence Scoring:

  • 3-component weighted average: Card (30%) + Finger (30%) + Measurement (40%)
  • Confidence levels: HIGH (>0.85), MEDIUM (0.6-0.85), LOW (<0.6)
  • Factors: card detection quality, finger mask area, width variance, aspect ratios

Measurement Approach:

  • Perpendicular cross-sections to finger axis
  • Line-contour intersection algorithm finds left/right edges
  • Uses farthest pair of intersections to handle complex contours
  • Converts pixels to cm using calibrated scale factor

v1 Architecture (Edge Refinement)

What's New in v1

v1 improves measurement accuracy by replacing contour-based edge detection with gradient-based Sobel edge refinement. Key improvements:

  • Landmark-based axis: Uses MediaPipe finger landmarks (MCP→PIP→DIP→TIP) for more anatomically consistent axis estimation
  • Sobel edge detection: Bidirectional gradient filtering for pixel-precise edge localization
  • Sub-pixel refinement: Parabola fitting achieves <0.5px precision (~0.003cm at typical resolution)
  • Quality-based fallback: Automatically uses v0 contour method if Sobel quality insufficient
  • Enhanced confidence: Adds edge quality component (gradient strength, consistency, smoothness, symmetry)

v1 Processing Pipeline (Enhanced Phases)

Phase 5a: Landmark-Based Axis Estimation (v1)

  • Uses MediaPipe finger landmarks directly (4 points: MCP, PIP, DIP, TIP)
  • Finger selection: Defaults to index finger, can specify middle or ring finger via --finger-index
  • Orientation detection uses the specified finger for axis calculation (wrist → finger tip)
  • Image automatically rotated to canonical orientation (wrist at bottom, fingers pointing up)
  • Three axis calculation methods:
    • endpoints: Simple MCP→TIP vector
    • linear_fit: Linear regression on all 4 landmarks (default, most robust)
    • median_direction: Median of segment directions
  • Falls back to PCA if landmarks unavailable or quality check fails
  • Validation checks: NaN/inf, minimum spacing, monotonic progression, minimum length

Phase 7b: Sobel Edge Refinement (v1)

1. Extract ROI around ring zone → 2. Apply bidirectional Sobel filters →
3. Detect edges per cross-section → 4. Sub-pixel refinement → 5. Measure width
  1. ROI Extraction

    • Rectangular region around ring zone with padding (50px for gradient context)
    • Width estimation: finger_length / 3.0 (conservative)
    • Optional rotation alignment (not used by default)
  2. Bidirectional Sobel Filtering

    • Applies cv2.Sobel with configurable kernel size (3, 5, or 7)
    • Computes gradient_x (horizontal edges), gradient_y (vertical edges)
    • Calculates gradient magnitude and direction
    • Auto-detects filter orientation from ROI aspect ratio
  3. Edge Detection Per Cross-Section

    • Mask-constrained mode (primary):
      • Finds leftmost/rightmost finger mask pixels (finger boundaries)
      • Searches ±10px around boundaries for strongest gradient
      • Combines anatomical accuracy (mask) with sub-pixel precision (gradient)
    • Gradient-only mode (fallback): Pure Sobel without mask constraint
  4. Sub-Pixel Edge Localization

    • Parabola fitting: f(x) = ax² + bx + c
    • Samples gradient at x-1, x, x+1
    • Finds parabola peak: x_peak = -b/(2a)
    • Constrains refinement to ±0.5 pixels
    • Achieves <0.5px precision (~0.003cm at 185 px/cm)
  5. Width Measurement

    • Calculates width for each valid row
    • Outlier filtering using Median Absolute Deviation (MAD)
    • Removes measurements >3 MAD from median
    • Computes median, mean, std dev
    • Converts pixels to cm using scale factor

Phase 8b: Enhanced Confidence Scoring (v1)

  • Adds 4th component: Edge Quality (20% weight)
    • Gradient strength: Avg magnitude at detected edges
    • Consistency: % of rows with valid edge pairs
    • Smoothness: Edge position variance (lower = better)
    • Symmetry: Left/right edge strength balance
  • Reweights other components: Card 25%, Finger 25%, Measurement 30%

v1 Module Structure

Module v1 Enhancements
geometry.py Added estimate_finger_axis_from_landmarks(), _validate_landmark_quality(), landmark-based zone localization
edge_refinement.py [NEW] Complete Sobel edge refinement pipeline with sub-pixel precision
confidence.py Added compute_edge_quality_confidence(), dual-mode confidence calculation
debug_observer.py Added 9 edge refinement drawing functions for visualization
measure_finger.py CLI flags for edge method selection, method comparison mode

v1 CLI Flags

Flag Values Default Description
--finger-index auto, index, middle, ring, pinky index Which finger to measure and use for orientation
--edge-method auto, contour, sobel, compare auto Edge detection method
--sobel-threshold float 15.0 Minimum gradient magnitude
--sobel-kernel-size 3, 5, 7 3 Sobel kernel size
--no-subpixel flag False Disable sub-pixel refinement

v1 Auto Mode Behavior

When --edge-method auto (default):

  1. Always computes contour measurement (v0 baseline)
  2. Attempts Sobel edge refinement
  3. Evaluates Sobel quality score (threshold: 0.7)
  4. Checks consistency (>50% success rate required)
  5. Verifies width reasonableness (0.8-3.5 cm)
  6. Checks agreement with contour (<50% difference)
  7. Uses Sobel if all checks pass, otherwise falls back to contour
  8. Reports method used in edge_method_used field

v1 Debug Output

When --debug flag used, generates:

  • Main debug overlay (same as v0, shows final result)
  • output/edge_refinement_debug/ subdirectory with 12 images:
    • Stage A (3): Landmark axis, ring zone, ROI extraction
    • Stage B (5): Sobel gradients, candidates, selected edges
    • Stage C (4): Sub-pixel refinement, widths, distribution, outliers

v1 Failure Modes (Additional)

  • sobel_edge_refinement_failed - Sobel method explicitly requested but failed
  • quality_score_low_X.XX - Edge quality below threshold (auto fallback)
  • consistency_low_X.XX - Too few valid edge detections
  • width_unreasonable - Measured width outside realistic range
  • disagreement_with_contour - Sobel and contour differ by >50%

Important Technical Details

What This Measures

The system measures the external horizontal width (outer diameter) of the finger at the ring-wearing zone. This is:

  • ✅ The width of soft tissue + bone at the ring-wearing position
  • ❌ NOT the inner diameter of a ring
  • Used as a geometric proxy for downstream ring size mapping (out of scope for v0)

Coordinate Systems

  • Images use standard OpenCV format: (row, col) = (y, x)
  • Most geometry functions work in (x, y) format
  • Contours are Nx2 arrays in (x, y) format
  • Careful conversion needed between formats (see geometry.py:35)

MediaPipe Integration

  • Uses pretrained hand landmark detection model (no custom training)
  • Provides 21 hand landmarks per hand
  • Each finger has 4 landmarks: MCP (base), PIP, DIP, TIP
  • Finger indices: 0=thumb, 1=index, 2=middle, 3=ring, 4=pinky
  • Orientation detection: Uses wrist → specified finger tip to determine hand rotation
  • Automatic rotation: Image rotated to canonical orientation (wrist at bottom, fingers up) based on selected finger

Input Requirements

For optimal results:

  • Resolution: 1080p or higher recommended
  • View angle: Near top-down view
  • Finger: One finger extended (index, middle, or ring). Specify with --finger-index
  • Credit card: Must show at least 3 corners, aspect ratio ~1.586
  • Finger and card must be on the same plane
  • Good lighting, minimal blur

Failure Modes

The system can fail at various stages:

  • card_not_detected - Credit card not found or aspect ratio invalid
  • hand_not_detected - No hand detected by MediaPipe
  • finger_isolation_failed - Could not isolate specified finger
  • finger_mask_too_small - Mask area too small after cleaning
  • contour_extraction_failed - Could not extract valid contour
  • axis_estimation_failed - PCA failed or insufficient points
  • zone_localization_failed - Could not define ring zone
  • width_measurement_failed - No valid cross-section intersections

Output Format

JSON Output Structure

{
  "finger_outer_diameter_cm": 1.78,
  "confidence": 0.86,
  "scale_px_per_cm": 42.3,
  "quality_flags": {
    "card_detected": true,
    "finger_detected": true,
    "view_angle_ok": true
  },
  "fail_reason": null
}

Debug Visualization Features

When --debug flag is used, generates an annotated image with:

  • Credit card contour and corners (green)
  • Finger contour (magenta, thick lines)
  • Finger axis and endpoints (cyan/yellow)
  • Ring-wearing zone band (yellow, semi-transparent)
  • Cross-section sampling lines (orange)
  • Measurement intersection points (blue circles)
  • Final measurement and confidence text (large, readable font)

Code Patterns and Conventions

Error Handling

  • Functions return None or raise exceptions on failure
  • Main pipeline (measure_finger()) returns structured output dict with fail_reason
  • Console logging provides detailed progress information

Type Hints

  • Extensive use of type hints throughout
  • Dict return types with Dict[str, Any] for structured data
  • NumPy arrays typed as np.ndarray
  • Literal types for enums (e.g., FingerIndex)

Data Flow

  • All major functions return dictionaries with consistent keys
  • Downstream functions accept upstream outputs directly
  • Debug visualization receives all intermediate results
  • Clean separation between detection, computation, and visualization

Validation and Sanity Checks

  • Finger width should be in realistic range: 1.0-3.0 cm (typical: 1.4-2.4 cm)
  • Credit card aspect ratio should be close to 1.586
  • View angle check: scale confidence should be >0.9 for accurate measurements
  • Minimum mask area threshold prevents false detections