# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Standard Task Workflow
For tasks of implementing **new features**:
1. Read PRD.md, Plan.md, Progress.md before coding
2. Summarize current project state before implementation
3. Carry out the implementation; after that, build and test if possible
4. Update Progress.md after changes
5. Commit with a clear, concise message
For tasks of **bug fixing**:
1. Summarize the bug, its cause, and the proposed fix before implementation
2. Implement the fix, then build and test
3. Update Progress.md after changes
4. Commit with a clear, concise message
For tasks of **reboot** from a new codex session:
1. Read doc/v0/PRD.md, doc/v0/Plan.md, doc/v0/Progress.md for baseline implementation
2. Read doc/v1/PRD.md, doc/v1/Plan.md, doc/v1/Progress.md for edge refinement (v1)
3. Assume this is a continuation of an existing project.
4. Summarize your understanding of the current state and propose the next concrete step without writing code yet.
## Project Overview
Ring Sizer is a **local, terminal-executable computer vision program** that measures the outer width (diameter) of a finger at the ring-wearing zone using a single RGB image. It uses a standard credit card (ISO/IEC 7810 ID-1: 85.60mm × 53.98mm) as a physical size reference for scale calibration.
**Key characteristics:**
- Single image input (JPG/PNG)
- **v1: Dual edge detection** - Landmark-based axis + Sobel gradient refinement
- MediaPipe-based hand and finger segmentation
- Outputs JSON measurement data and optional debug visualization
- No cloud processing, runs entirely locally
- Python 3.8+ with OpenCV, NumPy, MediaPipe, and SciPy
## Development Commands
### Installation
```bash
# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### Running the Program
```bash
# Basic measurement (defaults to index finger, auto edge detection)
python measure_finger.py --input input/test_image.jpg --output output/result.json
# Measure specific finger (index, middle, ring, or auto)
python measure_finger.py \
--input input/test_image.jpg \
--output output/result.json \
--finger-index ring
# With debug visualization
python measure_finger.py \
--input input/test_image.jpg \
--output output/result.json \
--finger-index middle \
--debug output/debug_overlay.png
# Force Sobel edge refinement (v1)
python measure_finger.py \
--input image.jpg \
--output result.json \
--finger-index ring \
--edge-method sobel \
--sobel-threshold 15.0 \
--debug output/debug.png
# Compare both methods
python measure_finger.py \
--input image.jpg \
--output result.json \
--finger-index middle \
--edge-method compare \
--debug output/debug.png
# Force contour method (v0)
python measure_finger.py \
--input image.jpg \
--output result.json \
--finger-index index \
--edge-method contour
```
## Architecture Overview
### Processing Pipeline (9 Phases)
The measurement pipeline follows a strict sequential flow:
1. **Image Quality Check** - Blur detection, exposure validation, resolution check
2. **Credit Card Detection & Scale Calibration** - Detects card, verifies aspect ratio (~1.586), computes `px_per_cm`
3. **Hand & Finger Segmentation** - MediaPipe hand detection, finger isolation, mask generation
4. **Finger Contour Extraction** - Extracts outer contour from cleaned mask
5. **Finger Axis Estimation** - PCA-based principal axis calculation, determines palm-end vs tip-end
6. **Ring-Wearing Zone Localization** - Defines zone at 15%-25% of finger length from palm-side
7. **Width Measurement** - Samples 20 cross-sections perpendicular to axis, uses median width
8. **Confidence Scoring** - Multi-factor scoring (card 30%, finger 30%, measurement 40%)
9. **Debug Visualization** - Generates annotated overlay image
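The scale calibration in step 2 can be sketched as follows. This is a minimal illustration, not the code in `card_detection.py`: the function name `px_per_cm_from_corners`, the corner ordering, and the `aspect_tolerance` value are assumptions made for the example.

```python
import math

CARD_WIDTH_CM = 8.560   # ISO/IEC 7810 ID-1 long side
CARD_HEIGHT_CM = 5.398  # ISO/IEC 7810 ID-1 short side
CARD_ASPECT = CARD_WIDTH_CM / CARD_HEIGHT_CM  # ~1.586


def px_per_cm_from_corners(corners, aspect_tolerance=0.1):
    """Return px/cm from four ordered card corners, or None if the shape is off.

    corners: [(x, y), ...] ordered TL, TR, BR, BL (assumed ordering).
    aspect_tolerance: hypothetical relative tolerance on the aspect ratio.
    """
    tl, tr, br, bl = corners
    # Average opposite sides to soften mild perspective distortion.
    side_a = (math.dist(tl, tr) + math.dist(bl, br)) / 2.0
    side_b = (math.dist(tr, br) + math.dist(tl, bl)) / 2.0
    long_px, short_px = max(side_a, side_b), min(side_a, side_b)
    if short_px == 0:
        return None
    aspect = long_px / short_px
    if abs(aspect - CARD_ASPECT) / CARD_ASPECT > aspect_tolerance:
        return None  # does not look like an ID-1 card
    # Average the two independent scale estimates.
    return (long_px / CARD_WIDTH_CM + short_px / CARD_HEIGHT_CM) / 2.0


# A card imaged at exactly 100 px/cm.
corners = [(100, 100), (956, 100), (956, 639.8), (100, 639.8)]
scale = px_per_cm_from_corners(corners)
```

A square region is rejected because its aspect ratio is far from 1.586, which mirrors the `card_not_detected` behaviour described for invalid aspect ratios.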
### Module Structure
The codebase is organized into focused utility modules in `src/`:
| Module | Primary Responsibilities |
|--------|--------------------------|
| `card_detection.py` | Credit card detection, perspective correction, scale calibration (`px_per_cm`) |
| `finger_segmentation.py` | MediaPipe integration, hand/finger isolation, mask cleaning, contour extraction |
| `geometry.py` | PCA axis estimation, ring zone localization, cross-section width measurement, line-contour intersections |
| `image_quality.py` | Blur detection (Laplacian variance), exposure checks, resolution validation |
| `confidence.py` | Component confidence scoring (card, finger, measurement), overall confidence computation |
| `visualization.py` | Debug overlay generation with contours, zones, measurements, and annotations |
### Key Design Decisions
**Ring-Wearing Zone Definition:**
- Located at 15%-25% of finger length from palm-side end
- Width measured by sampling 20 cross-sections within this zone
- Final measurement is the **median width** (robust to outliers)
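A minimal sketch of the zone sampling, assuming the 20 cross-sections are evenly spaced along the axis (the spacing is an assumption; the function names are hypothetical and not taken from `geometry.py`):

```python
from statistics import median


def ring_zone_positions(finger_length_px, start_frac=0.15, end_frac=0.25, n_samples=20):
    """Distances (px) from the palm-side end at which to sample cross-sections."""
    start = start_frac * finger_length_px
    end = end_frac * finger_length_px
    step = (end - start) / (n_samples - 1)
    return [start + i * step for i in range(n_samples)]


def zone_width_px(widths_px):
    """Median of the valid cross-section widths; None if nothing was measured."""
    valid = [w for w in widths_px if w is not None]
    return median(valid) if valid else None


positions = ring_zone_positions(400.0)  # 60 px .. 100 px from the palm end
width = zone_width_px([75.0, 76.0, None, 74.5, 120.0, 75.5])  # 120.0 is an outlier
```

The single 120 px outlier does not move the result, which is the reason for preferring the median over the mean.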
**Axis Estimation:**
- Uses PCA (Principal Component Analysis) on finger mask points
- Determines palm-end vs tip-end using either:
  1. MediaPipe landmarks (preferred, if available)
  2. Thickness heuristic (thinner end is likely the tip)
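The PCA step can be illustrated with the closed-form principal direction of a 2×2 covariance matrix. This is a dependency-free sketch under the assumption that mask points are given as `(x, y)` pairs; the function names are hypothetical, and the real module may use NumPy or OpenCV instead.

```python
import math


def principal_axis(points):
    """Return (center, unit_direction) of the dominant axis of 2D points.

    Uses the closed-form orientation of the 2x2 covariance matrix, which is
    equivalent to taking the eigenvector with the largest eigenvalue.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))


def orient_towards_tip(direction, palm_xy, tip_xy):
    """Flip the axis if needed so it points from the palm end to the tip end."""
    dx, dy = direction
    to_tip = (tip_xy[0] - palm_xy[0], tip_xy[1] - palm_xy[1])
    if dx * to_tip[0] + dy * to_tip[1] < 0:
        return (-dx, -dy)
    return (dx, dy)


# Synthetic vertical "finger": 20 px wide, 200 px long, tip at the top.
mask_points = [(x, y) for x in range(40, 60) for y in range(100, 300)]
center, axis = principal_axis(mask_points)
axis = orient_towards_tip(axis, palm_xy=(50, 299), tip_xy=(50, 100))
```

PCA alone gives an axis with an arbitrary sign, which is why a separate palm/tip decision (landmarks or the thickness heuristic) is needed.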
**Confidence Scoring:**
- 3-component weighted average: Card (30%) + Finger (30%) + Measurement (40%)
- Confidence levels: HIGH (>0.85), MEDIUM (0.6-0.85), LOW (<0.6)
- Factors: card detection quality, finger mask area, width variance, aspect ratios
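The weighting and level thresholds above amount to the following sketch. Function names are hypothetical, and the handling of a score exactly equal to 0.85 (treated as MEDIUM here) is an assumption.

```python
def overall_confidence(card, finger, measurement, weights=(0.30, 0.30, 0.40)):
    """Weighted average of the three component scores, each in [0, 1]."""
    w_card, w_finger, w_meas = weights
    return w_card * card + w_finger * finger + w_meas * measurement


def confidence_level(score):
    """Map a score to HIGH (>0.85), MEDIUM (0.6-0.85), or LOW (<0.6)."""
    if score > 0.85:
        return "HIGH"
    if score >= 0.6:
        return "MEDIUM"
    return "LOW"


score = overall_confidence(card=0.9, finger=0.8, measurement=0.9)  # 0.27 + 0.24 + 0.36
level = confidence_level(score)
```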
**Measurement Approach:**
- Perpendicular cross-sections to finger axis
- Line-contour intersection algorithm finds left/right edges
- Uses farthest pair of intersections to handle complex contours
- Converts pixels to cm using calibrated scale factor
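A minimal sketch of the line-contour intersection idea, assuming the contour is a closed polygon given as `(x, y)` vertices. The function name and the tolerance constant are illustrative, not taken from `geometry.py`.

```python
import math


def cross_section_width(contour, point, direction):
    """Width of a closed polygon along the line through `point` with `direction`.

    Returns the distance between the two farthest intersection points,
    or None if the line crosses the contour fewer than two times.
    """
    px, py = point
    dx, dy = direction
    ts = []  # signed positions of intersections along the line
    n = len(contour)
    for i in range(n):
        ax, ay = contour[i]
        bx, by = contour[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            continue  # segment is parallel to the line
        # Solve point + t * direction == a + s * (b - a) for t and s.
        t = ((ax - px) * ey - (ay - py) * ex) / denom
        s = ((ax - px) * dy - (ay - py) * dx) / denom
        if 0.0 <= s <= 1.0:
            ts.append(t)
    if len(ts) < 2:
        return None
    # Farthest pair: robust when the line crosses the contour more than twice.
    return (max(ts) - min(ts)) * math.hypot(dx, dy)


# 20 px wide rectangle, cut by a horizontal line through its middle.
contour = [(40, 0), (60, 0), (60, 200), (40, 200)]
width_px = cross_section_width(contour, point=(50, 100), direction=(1, 0))
width_cm = width_px / 10.0  # with an example scale of 10 px/cm
```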
---
## v1 Architecture (Edge Refinement)
### What's New in v1
v1 improves measurement accuracy by replacing contour-based edge detection with gradient-based Sobel edge refinement. Key improvements:
- **Landmark-based axis**: Uses MediaPipe finger landmarks (MCP→PIP→DIP→TIP) for more anatomically consistent axis estimation
- **Sobel edge detection**: Bidirectional gradient filtering for pixel-precise edge localization
- **Sub-pixel refinement**: Parabola fitting achieves <0.5px precision (~0.003cm at typical resolution)
- **Quality-based fallback**: Automatically uses v0 contour method if Sobel quality insufficient
- **Enhanced confidence**: Adds edge quality component (gradient strength, consistency, smoothness, symmetry)
### v1 Processing Pipeline (Enhanced Phases)
**Phase 5a: Landmark-Based Axis Estimation (v1)**
- Uses MediaPipe finger landmarks directly (4 points: MCP, PIP, DIP, TIP)
- **Finger selection**: Defaults to index finger, can specify middle or ring finger via `--finger-index`
- Orientation detection uses the **specified finger** for axis calculation (wrist → finger tip)
- Image automatically rotated to canonical orientation (wrist at bottom, fingers pointing up)
- Three axis calculation methods:
  - `endpoints`: Simple MCP→TIP vector
  - `linear_fit`: Linear regression on all 4 landmarks (default, most robust)
  - `median_direction`: Median of segment directions
- Falls back to PCA if landmarks unavailable or quality check fails
- Validation checks: NaN/inf, minimum spacing, monotonic progression, minimum length
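The validation checks and two of the axis methods can be sketched as below. This is an illustration only: the function names and the `min_spacing` / `min_length` thresholds are placeholders, and `estimate_finger_axis_from_landmarks()` in the repository may differ in every detail.

```python
import math
from statistics import median


def _unit(vx, vy):
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)


def landmarks_ok(pts, min_spacing=5.0, min_length=40.0):
    """Sanity checks on [MCP, PIP, DIP, TIP]; thresholds are placeholders."""
    if len(pts) != 4 or any(not math.isfinite(c) for p in pts for c in p):
        return False  # NaN/inf check
    segments = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]
    if any(math.hypot(sx, sy) < min_spacing for sx, sy in segments):
        return False  # minimum spacing
    ax, ay = pts[3][0] - pts[0][0], pts[3][1] - pts[0][1]
    if math.hypot(ax, ay) < min_length:
        return False  # minimum length
    # Monotonic progression: every segment advances along MCP -> TIP.
    return all(sx * ax + sy * ay > 0 for sx, sy in segments)


def axis_endpoints(pts):
    """`endpoints` method: unit vector from MCP to TIP."""
    return _unit(pts[3][0] - pts[0][0], pts[3][1] - pts[0][1])


def axis_median_direction(pts):
    """`median_direction` method: per-component median of unit segment directions."""
    units = [_unit(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]
    return _unit(median(u[0] for u in units), median(u[1] for u in units))


pts = [(100, 400), (102, 300), (101, 230), (100, 170)]  # finger pointing up
valid = landmarks_ok(pts)
axis = axis_endpoints(pts)
```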
**Phase 7b: Sobel Edge Refinement (v1)**
```
1. Extract ROI around ring zone → 2. Apply bidirectional Sobel filters →
3. Detect edges per cross-section → 4. Sub-pixel refinement → 5. Measure width
```
1. **ROI Extraction**
   - Rectangular region around ring zone with padding (50px for gradient context)
   - Width estimation: `finger_length / 3.0` (conservative)
   - Optional rotation alignment (not used by default)
2. **Bidirectional Sobel Filtering**
   - Applies `cv2.Sobel` with configurable kernel size (3, 5, or 7)
   - Computes gradient_x (x-derivative, responds to vertical edges) and gradient_y (y-derivative, responds to horizontal edges)
   - Calculates gradient magnitude and direction
   - Auto-detects filter orientation from ROI aspect ratio
3. **Edge Detection Per Cross-Section**
   - **Mask-constrained mode** (primary):
     - Finds leftmost/rightmost finger mask pixels (finger boundaries)
     - Searches ±10px around boundaries for strongest gradient
     - Combines anatomical accuracy (mask) with sub-pixel precision (gradient)
   - **Gradient-only mode** (fallback): Pure Sobel without mask constraint
4. **Sub-Pixel Edge Localization**
   - Parabola fitting: f(x) = ax² + bx + c
   - Samples gradient at x-1, x, x+1
   - Finds parabola peak: x_peak = -b/(2a)
   - Constrains refinement to ±0.5 pixels
   - Achieves <0.5px precision (~0.003cm at 185 px/cm)
5. **Width Measurement**
   - Calculates width for each valid row
   - Outlier filtering using Median Absolute Deviation (MAD)
   - Removes measurements >3 MAD from median
   - Computes median, mean, std dev
   - Converts pixels to cm using scale factor
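Steps 4 and 5 can be sketched as follows. The parabola vertex is computed in closed form from the three gradient samples, and the MAD filter uses the raw MAD without a scaling constant, matching the ">3 MAD" rule above. Function names are hypothetical and not taken from `edge_refinement.py`.

```python
from statistics import median


def subpixel_offset(g_prev, g_peak, g_next):
    """Offset (px) of the parabola vertex relative to the integer peak position.

    Fits f(x) = a*x^2 + b*x + c through x = -1, 0, +1 and returns -b / (2a),
    clamped to [-0.5, 0.5].
    """
    denom = g_prev - 2.0 * g_peak + g_next  # equals 2a
    if denom >= 0:
        return 0.0  # not a strict local maximum; keep the integer position
    offset = 0.5 * (g_prev - g_next) / denom
    return max(-0.5, min(0.5, offset))


def filter_outliers_mad(widths, n_mads=3.0):
    """Keep widths within n_mads median absolute deviations of the median."""
    med = median(widths)
    mad = median(abs(w - med) for w in widths)
    if mad == 0:
        return list(widths)  # no spread to judge outliers against
    return [w for w in widths if abs(w - med) <= n_mads * mad]


offset = subpixel_offset(10.0, 20.0, 16.0)  # peak leans toward the stronger neighbour
kept = filter_outliers_mad([75.0, 75.4, 74.8, 75.2, 90.0, 75.1])
width_px = median(kept)
```

In this example the 90.0 px row is dropped and the remaining five rows determine the width.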
**Phase 8b: Enhanced Confidence Scoring (v1)**
- Adds 4th component: Edge Quality (20% weight)
  - Gradient strength: Avg magnitude at detected edges
  - Consistency: % of rows with valid edge pairs
  - Smoothness: Edge position variance (lower = better)
  - Symmetry: Left/right edge strength balance
- Reweights other components: Card 25%, Finger 25%, Measurement 30%
### v1 Module Structure
| Module | v1 Enhancements |
|--------|-----------------|
| `geometry.py` | Added `estimate_finger_axis_from_landmarks()`, `_validate_landmark_quality()`, landmark-based zone localization |
| **`edge_refinement.py`** | **[NEW]** Complete Sobel edge refinement pipeline with sub-pixel precision |
| `confidence.py` | Added `compute_edge_quality_confidence()`, dual-mode confidence calculation |
| `debug_observer.py` | Added 9 edge refinement drawing functions for visualization |
| `measure_finger.py` | CLI flags for edge method selection, method comparison mode |
### v1 CLI Flags
| Flag | Values | Default | Description |
|------|--------|---------|-------------|
| `--finger-index` | auto, index, middle, ring, pinky | **index** | Which finger to measure and use for orientation |
| `--edge-method` | auto, contour, sobel, compare | auto | Edge detection method |
| `--sobel-threshold` | float | 15.0 | Minimum gradient magnitude |
| `--sobel-kernel-size` | 3, 5, 7 | 3 | Sobel kernel size |
| `--no-subpixel` | flag | False | Disable sub-pixel refinement |
### v1 Auto Mode Behavior
When `--edge-method auto` (default):
1. Always computes contour measurement (v0 baseline)
2. Attempts Sobel edge refinement
3. Evaluates Sobel quality score (threshold: 0.7)
4. Checks consistency (>50% success rate required)
5. Verifies width reasonableness (0.8-3.5 cm)
6. Checks agreement with contour (<50% difference)
7. Uses Sobel if all checks pass, otherwise falls back to contour
8. Reports method used in `edge_method_used` field
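The decision logic above can be sketched as one function. This assumes the Sobel attempt produced a width, and that "difference" means the relative difference normalised by the contour width; both are assumptions, and `choose_edge_method` is a hypothetical name.

```python
def choose_edge_method(contour_cm, sobel_cm, quality, success_rate):
    """Return (edge_method_used, width_cm, fallback_reason)."""
    if quality < 0.7:
        return "contour", contour_cm, f"quality_score_low_{quality:.2f}"
    if success_rate <= 0.5:
        return "contour", contour_cm, f"consistency_low_{success_rate:.2f}"
    if not 0.8 <= sobel_cm <= 3.5:
        return "contour", contour_cm, "width_unreasonable"
    if abs(sobel_cm - contour_cm) / contour_cm >= 0.5:
        return "contour", contour_cm, "disagreement_with_contour"
    return "sobel", sobel_cm, None


accepted = choose_edge_method(1.80, 1.76, quality=0.82, success_rate=0.9)
rejected = choose_edge_method(1.80, 1.76, quality=0.55, success_rate=0.9)
```

The checks run in the listed order, so the first failing check determines the reported reason.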
### v1 Debug Output
When the `--debug` flag is used, generates:
- Main debug overlay (same as v0, shows final result)
- `output/edge_refinement_debug/` subdirectory with 12 images:
  - **Stage A** (3): Landmark axis, ring zone, ROI extraction
  - **Stage B** (5): Sobel gradients, candidates, selected edges
  - **Stage C** (4): Sub-pixel refinement, widths, distribution, outliers
### v1 Failure Modes (Additional)
- `sobel_edge_refinement_failed` - Sobel method explicitly requested but failed
- `quality_score_low_X.XX` - Edge quality below threshold (auto fallback)
- `consistency_low_X.XX` - Too few valid edge detections
- `width_unreasonable` - Measured width outside realistic range
- `disagreement_with_contour` - Sobel and contour differ by >50%
---
## Important Technical Details
### What This Measures
The system measures the **external horizontal width** (outer diameter) of the finger at the ring-wearing zone. This is:
- ✅ The width of soft tissue + bone at the ring-wearing position
- ❌ NOT the inner diameter of a ring
- Used as a geometric proxy for downstream ring size mapping (out of scope for v0)
### Coordinate Systems
- Images use standard OpenCV format: (row, col) = (y, x)
- Most geometry functions work in (x, y) format
- Contours are Nx2 arrays in (x, y) format
- Careful conversion needed between formats (see `geometry.py:35`)
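The (x, y) versus (row, col) distinction is easy to get wrong, so a small illustration may help. The helper names are hypothetical and do not describe what `geometry.py` actually contains; the indexing shown works the same way on nested lists and NumPy arrays.

```python
def xy_to_row_col(x, y):
    """(x, y) point -> (row, col) index for image[row][col] access."""
    return (y, x)


def contour_to_row_col(contour_xy):
    """Convert a list of (x, y) contour points to (row, col) pairs."""
    return [(y, x) for x, y in contour_xy]


# 3 rows x 4 columns; each value encodes its own position as row * 10 + col.
image = [[r * 10 + c for c in range(4)] for r in range(3)]
row, col = xy_to_row_col(x=3, y=1)
value = image[row][col]  # second row, fourth column
```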
### MediaPipe Integration
- Uses pretrained hand landmark detection model (no custom training)
- Provides 21 hand landmarks per hand
- Each finger has 4 landmarks: MCP (base), PIP, DIP, TIP
- Finger indices: 0=thumb, 1=index, 2=middle, 3=ring, 4=pinky
- **Orientation detection**: Uses wrist → specified finger tip to determine hand rotation
- **Automatic rotation**: Image rotated to canonical orientation (wrist at bottom, fingers up) based on selected finger
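For reference, MediaPipe numbers its 21 hand landmarks with the wrist at 0 and four consecutive indices per finger. The sketch below maps finger names to those indices and computes a wrist-to-tip angle. The angle's sign convention and the function names are choices made for this example; how the repository applies the rotation is not shown here.

```python
import math

WRIST = 0
# MediaPipe hand landmark indices as (MCP, PIP, DIP, TIP).
FINGER_LANDMARKS = {
    "index": (5, 6, 7, 8),
    "middle": (9, 10, 11, 12),
    "ring": (13, 14, 15, 16),
    "pinky": (17, 18, 19, 20),
}


def finger_points(landmarks_xy, finger):
    """Pick the four (x, y) landmarks of one finger from the 21-point list."""
    return [landmarks_xy[i] for i in FINGER_LANDMARKS[finger]]


def rotation_to_upright_deg(wrist_xy, tip_xy):
    """Angle (degrees) between wrist -> tip and the image 'up' direction (-y).

    0 means the finger already points up; positive means it leans right.
    """
    dx = tip_xy[0] - wrist_xy[0]
    dy = tip_xy[1] - wrist_xy[1]
    return math.degrees(math.atan2(dx, -dy))


landmarks = [(i, i * 2) for i in range(21)]  # placeholder coordinates
ring = finger_points(landmarks, "ring")
upright = rotation_to_upright_deg((100, 400), (100, 100))
sideways = rotation_to_upright_deg((100, 400), (400, 400))
```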
### Input Requirements
For optimal results:
- Resolution: 1080p or higher recommended
- View angle: Near top-down view
- **Finger**: One finger extended (index, middle, or ring). Specify with `--finger-index`
- Credit card: Must show at least 3 corners, aspect ratio ~1.586
- Finger and card must be on the same plane
- Good lighting, minimal blur
### Failure Modes
The system can fail at various stages:
- `card_not_detected` - Credit card not found or aspect ratio invalid
- `hand_not_detected` - No hand detected by MediaPipe
- `finger_isolation_failed` - Could not isolate specified finger
- `finger_mask_too_small` - Mask area too small after cleaning
- `contour_extraction_failed` - Could not extract valid contour
- `axis_estimation_failed` - PCA failed or insufficient points
- `zone_localization_failed` - Could not define ring zone
- `width_measurement_failed` - No valid cross-section intersections
## Output Format
### JSON Output Structure
```json
{
"finger_outer_diameter_cm": 1.78,
"confidence": 0.86,
"scale_px_per_cm": 42.3,
"quality_flags": {
"card_detected": true,
"finger_detected": true,
"view_angle_ok": true
},
"fail_reason": null
}
```
### Debug Visualization Features
When `--debug` flag is used, generates an annotated image with:
- Credit card contour and corners (green)
- Finger contour (magenta, thick lines)
- Finger axis and endpoints (cyan/yellow)
- Ring-wearing zone band (yellow, semi-transparent)
- Cross-section sampling lines (orange)
- Measurement intersection points (blue circles)
- Final measurement and confidence text (large, readable font)
## Code Patterns and Conventions
### Error Handling
- Functions return `None` or raise exceptions on failure
- Main pipeline (`measure_finger()`) returns structured output dict with `fail_reason`
- Console logging provides detailed progress information
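A failure path might build its output as sketched below. The keys follow the JSON structure shown earlier; the values used on failure (`None` widths, zero confidence) and the helper name `failure_result` are assumptions, not confirmed behaviour of `measure_finger()`.

```python
import json


def failure_result(fail_reason, card_detected=False, finger_detected=False):
    """Build a structured output dict for a failed run (hypothetical helper)."""
    return {
        "finger_outer_diameter_cm": None,
        "confidence": 0.0,
        "scale_px_per_cm": None,
        "quality_flags": {
            "card_detected": card_detected,
            "finger_detected": finger_detected,
            "view_angle_ok": False,
        },
        "fail_reason": fail_reason,
    }


result = failure_result("hand_not_detected", card_detected=True)
text = json.dumps(result, indent=2)  # same shape as a successful result
```

Keeping the same keys on success and failure lets callers check `fail_reason` first and read the remaining fields without special cases.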
### Type Hints
- Extensive use of type hints throughout
- Dict return types with `Dict[str, Any]` for structured data
- NumPy arrays typed as `np.ndarray`
- Literal types for enums (e.g., `FingerIndex`)
### Data Flow
- All major functions return dictionaries with consistent keys
- Downstream functions accept upstream outputs directly
- Debug visualization receives all intermediate results
- Clean separation between detection, computation, and visualization
### Validation and Sanity Checks
- Finger width should be in realistic range: 1.0-3.0 cm (typical: 1.4-2.4 cm)
- Credit card aspect ratio should be close to 1.586
- View angle check: scale confidence should be >0.9 for accurate measurements
- Minimum mask area threshold prevents false detections