Project Context: High-Throughput Diamond & Jewelry Vision Pipeline

Domain

  • Application: Industrial diamond processing (100,000+ daily videos)
  • Problem: Segmentation, masking, and tracking of refractive/transparent gemstone objects
  • Input: Sorted JPEG frame sequences (N-frame sequences from video capture)
  • Output: Temporally-consistent soft-edge masks for downstream QC pipelines

Core Stack & Models

  • Propagation Engine: Meta SAM 3 — VideoPredictor API for multi-frame mask propagation + open-vocabulary text prompts
  • Student Detector: YOLOv11-seg — Teacher-Student distillation for real-time inference
  • Temporal Smoothing: scipy.signal.savgol_filter (Savitzky-Golay) applied to per-frame mask coefficients along the time axis
  • Performance Layer: TensorRT export + batch inference (target: ≤10ms/frame)

Refraction Rules (Strict)

  1. Never use binary masks for diamonds or gemstones — always use soft-edge / alpha matting masks
  2. Alpha channel preservation: Output masks must retain transparency gradients (float32 alpha map, 0.0–1.0)
  3. Edge softness: Apply Gaussian-weighted alpha blending at mask boundaries (sigma ≥ 2px)
  4. Background reconstruction: Use inpainting (e.g., cv2.inpaint) to handle semi-transparent regions
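
Rules 1 to 3 above can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the function names soften_mask and composite are hypothetical, and using scipy.ndimage.gaussian_filter to produce the boundary falloff is an assumption (any Gaussian-weighted blend with sigma ≥ 2 px would satisfy the rule). Rule 4 (inpainting) is not covered here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def soften_mask(binary_mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Convert a hard 0/1 mask of shape (H, W) into a float32 alpha map in [0.0, 1.0].

    Hypothetical helper: the Gaussian blur spreads the mask boundary into a
    smooth gradient, so edge pixels carry fractional alpha instead of 0 or 1.
    """
    if sigma < 2.0:
        raise ValueError("edge softness rule requires sigma >= 2 px")
    alpha = gaussian_filter(binary_mask.astype(np.float32), sigma=sigma)
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)


def composite(foreground: np.ndarray, background: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend two (H, W, C) images using an (H, W) alpha map."""
    a = alpha[..., None]  # broadcast alpha over the channel axis
    return a * foreground + (1.0 - a) * background
```

Pixels well inside the stone keep alpha close to 1.0, pixels far outside stay close to 0.0, and only the band around the boundary receives intermediate values.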

Temporal Consistency Rules

  1. Savitzky-Golay filtering MUST be applied across frame mask sequences (window=5, polyorder=2)
  2. No hard jumps: Mask IoU between consecutive frames must be ≥ 0.85 (flag frames below threshold)
  3. Propagation priority: Prefer SAM 3 propagation over per-frame YOLO prediction for tracked sequences
  4. Anchor frames: Every 15th frame is re-annotated as a keyframe to prevent drift
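
The four temporal rules above could be checked with a sketch like the one below. Several points are assumptions rather than statements from this document: the "mask coefficients" are taken to be per-pixel alpha values stacked as a (T, H, W) array, IoU on soft masks is computed as sum(min) / sum(max) (which reduces to ordinary IoU for binary masks), and keyframes are counted from frame 0. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_masks(alpha_seq: np.ndarray, window: int = 5, polyorder: int = 2) -> np.ndarray:
    """Savitzky-Golay smoothing along the time axis of a (T, H, W) alpha stack.

    Requires T >= window; savgol_filter raises ValueError otherwise.
    """
    smoothed = savgol_filter(alpha_seq, window_length=window, polyorder=polyorder, axis=0)
    # The polynomial fit can overshoot slightly, so clamp back into [0, 1].
    return np.clip(smoothed, 0.0, 1.0).astype(np.float32)


def soft_iou(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> float:
    """IoU generalized to soft masks (assumed definition, see lead-in)."""
    inter = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return float(inter / (union + eps))


def flag_jumps(alpha_seq: np.ndarray, threshold: float = 0.85) -> list:
    """Return indices t where IoU(frame t-1, frame t) falls below the threshold."""
    return [
        t for t in range(1, len(alpha_seq))
        if soft_iou(alpha_seq[t - 1], alpha_seq[t]) < threshold
    ]


def keyframe_indices(n_frames: int, interval: int = 15) -> list:
    """Frames to re-annotate as anchors, assuming counting starts at frame 0."""
    return list(range(0, n_frames, interval))
```

A typical order would be to propagate masks, smooth them with smooth_masks, and then run flag_jumps on the smoothed stack so that only jumps surviving the filter are reported.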

Architecture Rules

  1. TensorRT FP16 for all inference — no FP32 in production paths
  2. Batch size ≥ 8 for YOLO inference; ≥ 4 for SAM 3 propagation
  3. Frame loading via OpenCV (cv2.VideoCapture) over JPEG frames sorted by frame index
  4. All preprocessing must be GPU-side (CUDA streams)
  5. Use torch.compile() for Python-side model wrappers where supported
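
Rules 2 and 3 above imply ordering frames numerically and grouping them into fixed-size batches. The sketch below shows one way to do that in plain Python. It assumes the frame index is the last integer in each file name, and it pads a short final batch by repeating the last frame so that batch shapes stay constant; neither choice is specified in this document, and all function names are hypothetical. Actual decoding and GPU upload are left out.

```python
import re
from pathlib import Path


def frame_index(path) -> int:
    """Extract the frame number, assumed to be the last integer in the file stem."""
    numbers = re.findall(r"\d+", Path(path).stem)
    if not numbers:
        raise ValueError(f"no frame index in {path!r}")
    return int(numbers[-1])


def sort_frame_paths(paths) -> list:
    """Sort numerically, so frame_2.jpg comes before frame_10.jpg."""
    return sorted((Path(p) for p in paths), key=frame_index)


def make_batches(items, batch_size: int = 8) -> list:
    """Split items into batches of exactly batch_size.

    Returns a list of (batch, n_valid) pairs. A short final batch is padded
    by repeating its last item; n_valid says how many entries are real.
    """
    items = list(items)
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        n_valid = len(chunk)
        chunk = chunk + [chunk[-1]] * (batch_size - n_valid)
        batches.append((chunk, n_valid))
    return batches
```

Outputs for padded entries should be discarded using n_valid before masks are written.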

Naming Conventions

  • Mask output files: frame_{idx:06d}_mask_alpha.png
  • Model checkpoints: yolo11seg_diamond_v{version}.pt / sam3_diamond_v{version}.pth
  • Dataset splits: train/, val/, test/ under data/
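
The naming patterns above map directly onto Python format strings. The helper names below are hypothetical; only the patterns themselves come from the conventions listed here.

```python
from pathlib import PurePosixPath

SPLITS = ("train", "val", "test")


def mask_filename(idx: int) -> str:
    """Mask output name with a zero-padded six-digit frame index."""
    return f"frame_{idx:06d}_mask_alpha.png"


def yolo_checkpoint(version) -> str:
    """Checkpoint name for the YOLO student model."""
    return f"yolo11seg_diamond_v{version}.pt"


def sam_checkpoint(version) -> str:
    """Checkpoint name for the SAM 3 model."""
    return f"sam3_diamond_v{version}.pth"


def split_dir(split: str, root: str = "data") -> PurePosixPath:
    """Directory of a dataset split under the data root."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return PurePosixPath(root) / split
```

For example, mask_filename(42) yields frame_000042_mask_alpha.png.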

Quality Criteria

  • Precision / Recall for diamond facets (summarized as mAP@0.5): target ≥ 0.92
  • Mask temporal SSIM: ≥ 0.95 across N-frame sequences
  • Throughput: ≥ 500 frames/sec on A100 80GB (batch mode)
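
To relate the throughput target to the 100,000+ daily videos stated under Domain, the arithmetic below estimates how many frames per video one GPU could process in a day. It assumes a single A100 running 24 hours at full utilization, which this document does not state; the gpus and utilization parameters are hypothetical knobs for adjusting that assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def frames_per_day(fps: float, gpus: int = 1, utilization: float = 1.0) -> float:
    """Total frames processed per day at a sustained frames-per-second rate."""
    return fps * SECONDS_PER_DAY * gpus * utilization


def max_frames_per_video(fps: float, videos_per_day: int,
                         gpus: int = 1, utilization: float = 1.0) -> float:
    """Average frame budget per video that keeps up with the daily volume."""
    return frames_per_day(fps, gpus, utilization) / videos_per_day


def amortized_ms_per_frame(fps: float) -> float:
    """Average time per frame implied by a batch throughput figure."""
    return 1000.0 / fps


print(frames_per_day(500))                  # 43200000.0
print(max_frames_per_video(500, 100_000))   # 432.0
print(amortized_ms_per_frame(500))          # 2.0
```

Under these assumptions, 500 frames/sec covers 100,000 videos per day only if sequences average at most 432 frames; longer sequences would need more GPUs or higher throughput.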