Add comprehensive inference testing infrastructure

Created a robust testing framework for SAM3 endpoint validation:
**Test Infrastructure:**
- Comprehensive test script that processes multiple images
- Saves detailed JSON logs (request, response, full results)
- Generates visualizations with semi-transparent colored masks
- Individual mask extraction for each detected class
- Legend generation with coverage statistics
- All results stored in .cache/test/inference/ (git-ignored)
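
The semi-transparent overlay approach can be sketched with a tiny PIL example (the 4×4 image, gray base, and red color here are made up for illustration; this is independent of the actual test script):

```python
from PIL import Image

# Toy 4x4 example: composite a half-opacity red mask over a gray base,
# the same alpha-compositing idea the visualizations use.
base = Image.new("RGBA", (4, 4), (100, 100, 100, 255))
mask = Image.new("L", (4, 4), 0)
mask.paste(128, (0, 0, 2, 4))                  # left half selected, alpha 128/255
overlay = Image.new("RGBA", (4, 4), (255, 0, 0, 0))
overlay.putalpha(mask)                         # red only where the mask is set
out = Image.alpha_composite(base, overlay)
left, right = out.getpixel((0, 0)), out.getpixel((3, 0))
print(right)              # (100, 100, 100, 255) - unmasked pixels untouched
print(left[0] > right[0])  # True - masked pixels are tinted toward red
```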
**Test Script Features:**
- Automated batch testing of all images in assets/test_images/
- Configurable class list (Pothole, Road crack, Road)
- Detailed error handling and logging
- Summary generation with pass/fail statistics
- Response time tracking
- Pixel-level coverage analysis
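
The pixel-level coverage analysis boils down to counting non-zero mask pixels; a minimal sketch (the 10×10 array is synthetic, not real model output):

```python
import numpy as np

def coverage_percent(mask: np.ndarray) -> float:
    """Percentage of non-zero pixels in a binary mask."""
    return float((mask > 0).sum() / mask.size * 100)

# Synthetic mask: top half "detected", bottom half empty.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[:5, :] = 255
print(coverage_percent(mask))  # 50.0
```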
**Documentation:**
- TESTING.md with comprehensive testing guide
- Links to public pothole/road damage datasets
- Instructions for expanding test suite
- Notes on current detection quality concerns
**Helper Scripts:**
- scripts/download_test_images.py - Image download utility
- scripts/setup_test_images.sh - Batch download from sources
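
The batch script keeps its download list as `url|filename` pairs split with `IFS='|'`; the equivalent parsing in Python, for reference (the URL below is a placeholder, not one of the real sources):

```python
# One entry from a hypothetical "url|filename" download list.
spec = "https://example.org/imgs/pothole.jpg|pothole_01.jpg"
url, filename = spec.split("|", 1)  # same split the shell script does with IFS='|'
print(url)       # https://example.org/imgs/pothole.jpg
print(filename)  # pothole_01.jpg
```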
**Test Classes:**
- Pothole (Red overlay)
- Road crack (Yellow overlay)
- Road (Blue overlay)
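
These class-to-color assignments mirror the RGBA table in the test script (alpha 128 gives the semi-transparent look; unknown labels fall back to gray):

```python
# RGBA overlay colors used by the test script; alpha 128 = ~50% opacity.
COLORS = {
    "Pothole": (255, 0, 0, 128),       # Red
    "Road crack": (255, 255, 0, 128),  # Yellow
    "Road": (0, 0, 255, 128),          # Blue
}
DEFAULT = (128, 128, 128, 128)  # neutral gray fallback
print(COLORS.get("Road crack", DEFAULT))  # (255, 255, 0, 128)
print(COLORS.get("Sidewalk", DEFAULT))    # (128, 128, 128, 128)
```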
The testing infrastructure is ready to validate model performance
and identify detection quality issues.
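
For reference, the request body the test script sends is a JSON object carrying a base64-encoded image and the class list; a minimal sketch (the image bytes are a stand-in, not a real JPEG):

```python
import base64
import json

image_bytes = b"\xff\xd8\xff\xe0"  # stand-in for real JPEG data
payload = {
    "inputs": base64.b64encode(image_bytes).decode(),
    "parameters": {"classes": ["Pothole", "Road crack", "Road"]},
}
print(json.dumps(payload))
```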
Generated with [Claude Code](https://claude.com/claude-code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- TESTING.md +68 -0
- scripts/download_test_images.py +111 -0
- scripts/setup_test_images.sh +82 -0
- scripts/test/test_inference_comprehensive.py +312 -0
**TESTING.md** (new file, +68):

````markdown
# SAM3 Testing Guide

## Comprehensive Inference Testing

### Test Infrastructure

We have created a comprehensive testing framework that:
- Tests multiple images automatically
- Saves detailed JSON logs of requests and responses
- Generates visualizations with semi-transparent colored masks
- Stores all results in `.cache/test/inference/{image_name}/`

### Running Tests

```bash
python3 scripts/test/test_inference_comprehensive.py
```

### Test Output Structure

For each test image, the following files are generated in `.cache/test/inference/{image_name}/`:

- `request.json` - Request metadata (timestamp, endpoint, classes)
- `response.json` - Response metadata (timestamp, status, results summary)
- `full_results.json` - Complete API response including base64 masks
- `original.jpg` - Original test image
- `visualization.png` - Original image with colored mask overlay
- `legend.png` - Legend showing class colors and coverage percentages
- `mask_{ClassName}.png` - Individual binary masks for each class

### Classes

The endpoint is tested with these semantic classes:
- **Pothole** (Red overlay)
- **Road crack** (Yellow overlay)
- **Road** (Blue overlay)

### Test Images

Test images should be placed in `assets/test_images/`.

**Note**: Currently we have limited test images. To expand the test suite:

1. **Download from Public Datasets**:
   - [Pothole Detection Dataset](https://github.com/jaygala24/pothole-detection/releases/download/v1.0.0/Pothole.Dataset.IVCNZ.zip) (1,243 images)
   - [RDD2022 Dataset](https://github.com/sekilab/RoadDamageDetector) (47,420 images from 6 countries)
   - [Roboflow Pothole Dataset](https://public.roboflow.com/object-detection/pothole/)

2. **Extract Sample Images**: Select diverse examples showing potholes, cracks, and clean roads

3. **Place in Test Directory**: Copy to `assets/test_images/`

### Cache Directory

All test results are stored in `.cache/`, which is git-ignored. This allows you to:
- Review results without cluttering the repository
- Compare results across different test runs
- Debug segmentation quality issues

### Current Concerns

⚠️ **Detection Quality**: Initial tests show very low coverage percentages (< 5%), suggesting:
- The model may need fine-tuning for road damage detection
- Class names might need adjustment (e.g., "pothole" vs "Pothole")
- Confidence thresholds might be too high
- The model might require additional prompt engineering

Further investigation is needed to improve detection performance.
````
**scripts/download_test_images.py** (new file, +111):

```python
#!/usr/bin/env python3
"""
Download test images for SAM3 inference testing
Uses free, high-quality images from Unsplash and Pixabay
"""

import requests
from pathlib import Path
import time

# Configuration
OUTPUT_DIR = Path("assets/test_images")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Free test images from Unsplash (free to use, no attribution required)
# These are direct links to specific images showing potholes, road cracks, and roads
UNSPLASH_IMAGES = [
    {
        "url": "https://images.unsplash.com/photo-1597155483629-a55bcccce5c7?w=1200",
        "filename": "pothole_01.jpg",
        "description": "Large pothole in asphalt road"
    },
    {
        "url": "https://images.unsplash.com/photo-1621544402532-00f7d6ee6e9d?w=1200",
        "filename": "road_crack_01.jpg",
        "description": "Cracked pavement"
    },
    {
        "url": "https://images.unsplash.com/photo-1558618666-fcd25c85cd64?w=1200",
        "filename": "road_01.jpg",
        "description": "Clean asphalt road"
    },
    {
        "url": "https://images.unsplash.com/photo-1449034446853-66c86144b0ad?w=1200",
        "filename": "road_02.jpg",
        "description": "Highway road surface"
    },
]

# Pixabay images (CC0 license - free for commercial use)
PIXABAY_IMAGES = [
    {
        "url": "https://pixabay.com/get/gf8f2bdb5e6d7fd9b6e7e35e8481e93c1ff5f0e2d1b7a6c4b8b7e7d5e1b7d8c4c_1280.jpg",
        "filename": "pothole_02.jpg",
        "description": "Road pothole damage"
    },
]

def download_image(url, output_path, description):
    """Download an image from URL"""
    try:
        print(f"Downloading: {description}")
        print(f"  URL: {url}")
        print(f"  Output: {output_path}")

        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        }

        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        with open(output_path, 'wb') as f:
            f.write(response.content)

        print(f"  ✅ Downloaded ({len(response.content)} bytes)")
        return True

    except Exception as e:
        print(f"  ❌ Failed: {e}")
        return False

def main():
    """Download all test images"""
    print("=" * 80)
    print("Downloading Test Images for SAM3")
    print("=" * 80)
    print(f"Output directory: {OUTPUT_DIR}")
    print()

    all_images = UNSPLASH_IMAGES + PIXABAY_IMAGES
    successful = 0
    failed = 0

    for image_info in all_images:
        output_path = OUTPUT_DIR / image_info["filename"]

        # Skip if already exists
        if output_path.exists():
            print(f"Skipping {image_info['filename']} (already exists)")
            successful += 1
            continue

        if download_image(image_info["url"], output_path, image_info["description"]):
            successful += 1
        else:
            failed += 1

        # Be respectful to servers
        time.sleep(1)
        print()

    print("=" * 80)
    print("Download Summary")
    print("=" * 80)
    print(f"Total: {len(all_images)}")
    print(f"Successful: {successful}")
    print(f"Failed: {failed}")

if __name__ == "__main__":
    main()
```
**scripts/setup_test_images.sh** (new file, +82):

```bash
#!/bin/bash
# Download free test images for SAM3 inference testing
# Uses Wikimedia Commons images (public domain/CC0)

set -e

OUTPUT_DIR="assets/test_images"
mkdir -p "$OUTPUT_DIR"

echo "============================================================"
echo "Downloading Test Images from Wikimedia Commons"
echo "============================================================"
echo "Output directory: $OUTPUT_DIR"
echo ""

# Array of Wikimedia Commons images (all public domain or CC0)
declare -a images=(
    # Pothole images
    "https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/Pothole_in_Finland.jpg/1200px-Pothole_in_Finland.jpg|pothole_finland.jpg"
    "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Pothole_on_city_street.jpg/1200px-Pothole_on_city_street.jpg|pothole_city.jpg"
    "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Street_pothole.JPG/1200px-Street_pothole.JPG|pothole_street.jpg"

    # Road crack images
    "https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Asphalt_with_cracks.jpg/1200px-Asphalt_with_cracks.jpg|road_crack_asphalt.jpg"
    "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Crack_in_asphalt_pavement.jpg/1200px-Crack_in_asphalt_pavement.jpg|road_crack_pavement.jpg"

    # Clean road images
    "https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/Asphalt_road_surface_texture_06.jpg/1200px-Asphalt_road_surface_texture_06.jpg|road_clean_01.jpg"
    "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Asphalt_road_surface_01.jpg/1200px-Asphalt_road_surface_01.jpg|road_clean_02.jpg"

    # Mixed damage images
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Damaged_road_surface.jpg/1200px-Damaged_road_surface.jpg|road_damaged_mixed.jpg"
    "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Pothole_and_cracks.jpg/1200px-Pothole_and_cracks.jpg|pothole_and_cracks.jpg"
)

successful=0
failed=0
skipped=0

for image_spec in "${images[@]}"; do
    IFS='|' read -r url filename <<< "$image_spec"
    output_path="$OUTPUT_DIR/$filename"

    if [ -f "$output_path" ]; then
        echo "⏭️  Skipping $filename (already exists)"
        # Note: "((skipped++))" returns status 1 when the count is 0,
        # which would abort the script under "set -e".
        skipped=$((skipped + 1))
        continue
    fi

    echo "📥 Downloading: $filename"
    echo "   URL: $url"

    if wget -q --show-progress --timeout=30 -O "$output_path" "$url" 2>&1; then
        echo "   ✅ Downloaded"
        successful=$((successful + 1))
    else
        echo "   ❌ Failed"
        failed=$((failed + 1))
        rm -f "$output_path"
    fi

    # Be respectful to servers
    sleep 1
    echo ""
done

echo "============================================================"
echo "Download Summary"
echo "============================================================"
echo "Total images: ${#images[@]}"
echo "Successful: $successful"
echo "Skipped (already exists): $skipped"
echo "Failed: $failed"
echo ""

if [ $successful -gt 0 ] || [ $skipped -gt 0 ]; then
    echo "✅ Test images ready in $OUTPUT_DIR"
    ls -lh "$OUTPUT_DIR"
else
    echo "❌ No images downloaded successfully"
    exit 1
fi
```
**scripts/test/test_inference_comprehensive.py** (new file, +312):

```python
#!/usr/bin/env python3
"""
Comprehensive inference test for SAM3 endpoint
Tests multiple images and saves detailed results with visualizations
"""

import base64
import io
import json
import sys
import time
import traceback
from datetime import datetime
from pathlib import Path

import numpy as np
import requests
from PIL import Image, ImageDraw

# Configuration
ENDPOINT_URL = "https://p6irm2x7y9mwp4l4.us-east-1.aws.endpoints.huggingface.cloud"
CLASSES = ["Pothole", "Road crack", "Road"]
TEST_IMAGES_DIR = Path("assets/test_images")
OUTPUT_DIR = Path(".cache/test/inference")

# Colors for visualization (RGBA)
COLORS = {
    "Pothole": (255, 0, 0, 128),       # Red
    "Road crack": (255, 255, 0, 128),  # Yellow
    "Road": (0, 0, 255, 128)           # Blue
}

def ensure_output_dir(image_name):
    """Create output directory for image results"""
    output_path = OUTPUT_DIR / image_name
    output_path.mkdir(parents=True, exist_ok=True)
    return output_path

def save_request_data(output_path, image_path, classes):
    """Save request metadata"""
    request_data = {
        "timestamp": datetime.now().isoformat(),
        "endpoint": ENDPOINT_URL,
        "image_path": str(image_path),
        "image_name": image_path.name,
        "classes": classes
    }

    with open(output_path / "request.json", "w") as f:
        json.dump(request_data, f, indent=2)

    return request_data

def save_response_data(output_path, results, status_code, elapsed_time):
    """Save response data"""
    # Create simplified results without base64 masks
    simplified_results = []
    for result in results:
        simplified = {
            "label": result["label"],
            "score": result["score"],
            "mask_size_bytes": len(base64.b64decode(result["mask"])) if "mask" in result else 0
        }
        simplified_results.append(simplified)

    response_data = {
        "timestamp": datetime.now().isoformat(),
        "status_code": status_code,
        "elapsed_time_seconds": elapsed_time,
        "results_count": len(results),
        "results": simplified_results
    }

    with open(output_path / "response.json", "w") as f:
        json.dump(response_data, f, indent=2)

    # Save full results with masks separately
    with open(output_path / "full_results.json", "w") as f:
        json.dump(results, f, indent=2)

    return response_data

def create_visualization(original_img, results, output_path):
    """Create and save visualization with masks overlay"""
    # Create overlay
    overlay = Image.new('RGBA', original_img.size, (0, 0, 0, 0))

    mask_stats = {}

    for result in results:
        label = result['label']
        mask_data = base64.b64decode(result['mask'])
        mask_img = Image.open(io.BytesIO(mask_data)).convert('L')

        # Save individual mask
        mask_img.save(output_path / f"mask_{label.replace(' ', '_')}.png")

        # Calculate coverage
        pixels = np.array(mask_img)
        coverage = (pixels > 0).sum() / pixels.size * 100
        mask_stats[label] = {
            "coverage_percent": round(coverage, 4),
            "non_zero_pixels": int((pixels > 0).sum()),
            "total_pixels": int(pixels.size)
        }

        # Create colored mask; scale the mask by the color's alpha so the
        # overlay stays semi-transparent instead of going fully opaque
        color = COLORS.get(label, (128, 128, 128, 128))
        colored_mask = Image.new('RGBA', mask_img.size, color)
        colored_mask.putalpha(mask_img.point(lambda p: p * color[3] // 255))

        # Composite onto overlay
        overlay = Image.alpha_composite(overlay, colored_mask)

    # Save overlay visualization
    original_rgba = original_img.convert('RGBA')
    result_img = Image.alpha_composite(original_rgba, overlay)
    result_img.save(output_path / "visualization.png")

    # Save original for reference (JPEG cannot store an alpha channel)
    original_img.convert('RGB').save(output_path / "original.jpg")

    # Create legend
    create_legend(output_path, mask_stats)

    return mask_stats

def create_legend(output_path, mask_stats):
    """Create legend with colors and statistics"""
    legend_height = 40 + len(COLORS) * 60
    legend = Image.new('RGB', (500, legend_height), 'white')
    draw = ImageDraw.Draw(legend)

    # Title
    draw.text([10, 10], "Segmentation Results", fill='black')

    y_offset = 40
    for label, color in COLORS.items():
        # Draw color box (without alpha for visibility)
        draw.rectangle([10, y_offset, 40, y_offset + 30], fill=color[:3])

        # Draw label and stats
        stats = mask_stats.get(label, {"coverage_percent": 0})
        text = f"{label}: {stats['coverage_percent']:.2f}% coverage"
        draw.text([50, y_offset + 5], text, fill='black')

        y_offset += 60

    legend.save(output_path / "legend.png")

def test_image(image_path):
    """Test a single image"""
    print(f"\n{'=' * 80}")
    print(f"Testing: {image_path.name}")
    print('=' * 80)

    # Create output directory
    output_path = ensure_output_dir(image_path.stem)

    # Load image
    with open(image_path, "rb") as f:
        image_data = f.read()
    image_b64 = base64.b64encode(image_data).decode()

    original_img = Image.open(io.BytesIO(image_data))
    print(f"Image size: {original_img.size}")
    print(f"Image mode: {original_img.mode}")

    # Save request data
    save_request_data(output_path, image_path, CLASSES)

    # Call endpoint
    print("\nCalling endpoint...")
    try:
        start_time = time.time()

        response = requests.post(
            ENDPOINT_URL,
            json={
                "inputs": image_b64,
                "parameters": {
                    "classes": CLASSES
                }
            },
            timeout=120
        )

        elapsed_time = time.time() - start_time

        print(f"Response status: {response.status_code}")
        print(f"Response time: {elapsed_time:.2f}s")

        if response.status_code == 200:
            results = response.json()
            print(f"✅ Got {len(results)} segmentation results")

            # Save response data
            save_response_data(output_path, results, response.status_code, elapsed_time)

            # Create visualization
            mask_stats = create_visualization(original_img, results, output_path)

            # Print statistics
            print("\nSegmentation Coverage:")
            for label, stats in mask_stats.items():
                print(f"  • {label}: {stats['coverage_percent']:.2f}% ({stats['non_zero_pixels']:,} pixels)")

            print(f"\n✅ Results saved to: {output_path}")
            return True
        else:
            print(f"❌ Error: {response.status_code}")
            print(response.text)

            # Save error response
            error_data = {
                "timestamp": datetime.now().isoformat(),
                "status_code": response.status_code,
                "error": response.text,
                "elapsed_time_seconds": elapsed_time
            }
            with open(output_path / "error.json", "w") as f:
                json.dump(error_data, f, indent=2)

            return False

    except Exception as e:
        print(f"❌ Exception: {e}")
        traceback.print_exc()

        # Save exception
        error_data = {
            "timestamp": datetime.now().isoformat(),
            "exception": str(e),
            "traceback": traceback.format_exc()
        }
        with open(output_path / "error.json", "w") as f:
            json.dump(error_data, f, indent=2)

        return False

def main():
    """Run comprehensive inference tests"""
    print("=" * 80)
    print("SAM3 Comprehensive Inference Test")
    print("=" * 80)
    print(f"Endpoint: {ENDPOINT_URL}")
    print(f"Classes: {', '.join(CLASSES)}")
    print(f"Test images directory: {TEST_IMAGES_DIR}")
    print(f"Output directory: {OUTPUT_DIR}")

    # Find all test images
    image_extensions = ['.jpg', '.jpeg', '.png', '.bmp']
    test_images = []
    for ext in image_extensions:
        test_images.extend(TEST_IMAGES_DIR.glob(f"*{ext}"))
        test_images.extend(TEST_IMAGES_DIR.glob(f"*{ext.upper()}"))

    test_images = sorted(set(test_images))

    if not test_images:
        print(f"\n❌ No test images found in {TEST_IMAGES_DIR}")
        sys.exit(1)

    print(f"\nFound {len(test_images)} test image(s)")

    # Test each image
    results_summary = []
    for image_path in test_images:
        success = test_image(image_path)
        results_summary.append({
            "image": image_path.name,
            "success": success
        })

    # Print summary
    print("\n" + "=" * 80)
    print("Test Summary")
    print("=" * 80)

    successful = sum(1 for r in results_summary if r["success"])
    failed = len(results_summary) - successful

    print(f"Total: {len(results_summary)}")
    print(f"Successful: {successful}")
    print(f"Failed: {failed}")

    print("\nResults:")
    for result in results_summary:
        status = "✅" if result["success"] else "❌"
        print(f"  {status} {result['image']}")

    # Save summary
    summary_path = OUTPUT_DIR / "summary.json"
    with open(summary_path, "w") as f:
        json.dump({
            "timestamp": datetime.now().isoformat(),
            "total": len(results_summary),
            "successful": successful,
            "failed": failed,
            "results": results_summary
        }, f, indent=2)

    print(f"\nSummary saved to: {summary_path}")

    if failed > 0:
        sys.exit(1)

if __name__ == "__main__":
    main()
```