| # Project Context: High-Throughput Diamond & Jewelry Vision Pipeline | |
| ## Domain | |
| - **Application**: Industrial diamond processing (100,000+ daily videos) | |
| - **Problem**: Segmentation, masking, and tracking of refractive/transparent gemstone objects | |
| - **Input**: Sorted JPEG frame sequences (N-frame sequences from video capture) | |
| - **Output**: Temporally consistent soft-edge masks for downstream QC pipelines | |
| ## Core Stack & Models | |
| - **Propagation Engine**: Meta SAM 3, using the `VideoPredictor` API for multi-frame mask propagation + open-vocabulary text prompts | |
| - **Student Detector**: YOLOv11-seg, trained via Teacher-Student distillation for real-time inference | |
| - **Temporal Smoothing**: `scipy.signal.savgol_filter` (Savitzky-Golay) applied to per-frame mask coefficients along the time axis | |
| - **Performance Layer**: TensorRT export + batch inference (target: ≤ 10 ms/frame) | |
| ## Refraction Rules (Strict) | |
| 1. **Never use binary masks** for diamonds or gemstones; always use soft-edge / alpha matting masks | |
| 2. **Alpha channel preservation**: Output masks must retain transparency gradients (float32 alpha map, 0.0–1.0) | |
| 3. **Edge softness**: Apply Gaussian-weighted alpha blending at mask boundaries (sigma ≥ 2 px) | |
| 4. **Background reconstruction**: Use inpainting (e.g., `cv2.inpaint`) to handle semi-transparent regions | |
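
As an illustration of rules 1 to 3, the sketch below softens a hard mask into a float32 alpha map with a Gaussian of sigma 2 px. It is a minimal stand-in, not a full alpha-matting method: the function name `soft_alpha_from_binary`, the use of `scipy.ndimage.gaussian_filter`, and the toy 32x32 frame are all assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def soft_alpha_from_binary(binary_mask: np.ndarray, sigma_px: float = 2.0) -> np.ndarray:
    """Turn a hard 0/1 mask into a float32 alpha map with Gaussian-softened edges.

    Hypothetical helper: Gaussian smoothing only approximates a soft edge,
    it does not estimate true transparency inside the stone.
    """
    if sigma_px < 2.0:
        raise ValueError("edge softness rule requires sigma >= 2 px")
    alpha = gaussian_filter(binary_mask.astype(np.float32), sigma=sigma_px)
    # Keep the documented value range and dtype (float32, 0.0 to 1.0).
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)


# Toy input: a 12x12 square "stone" centred in a 32x32 frame.
hard = np.zeros((32, 32), dtype=np.uint8)
hard[10:22, 10:22] = 1
alpha = soft_alpha_from_binary(hard, sigma_px=2.0)
```

The result stays close to 1.0 in the interior, falls off gradually across the boundary, and is 0.0 far from the object, so downstream code can treat it directly as an alpha channel.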
| ## Temporal Consistency Rules | |
| 1. **Savitzky-Golay filtering** MUST be applied across frame mask sequences (`window_length=5`, `polyorder=2`) | |
| 2. **No hard jumps**: Mask IoU between consecutive frames must be ≥ 0.85 (flag frames below threshold) | |
| 3. **Propagation priority**: Prefer SAM 3 propagation over per-frame YOLO prediction for tracked sequences | |
| 4. **Anchor frames**: Every 15th frame is re-annotated as a keyframe to prevent drift | |
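
The temporal rules above can be sketched roughly as follows. Several details are assumptions for the example rather than part of the rules: the alpha masks are stacked as a `(T, H, W)` array, soft masks are binarized at 0.5 before computing IoU, frame 0 counts as the first keyframe, and all function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_alpha_sequence(alpha_seq: np.ndarray) -> np.ndarray:
    """Savitzky-Golay smoothing along time (axis 0) of a (T, H, W) alpha stack."""
    smoothed = savgol_filter(alpha_seq, window_length=5, polyorder=2, axis=0)
    # The polynomial fit can overshoot, so clip back into the alpha range.
    return np.clip(smoothed, 0.0, 1.0).astype(np.float32)


def mask_iou(a: np.ndarray, b: np.ndarray, thresh: float = 0.5) -> float:
    """IoU of two alpha maps after binarizing at `thresh` (an assumed convention)."""
    a_bin, b_bin = a >= thresh, b >= thresh
    union = np.logical_or(a_bin, b_bin).sum()
    if union == 0:
        return 1.0  # two empty masks are treated as identical
    return float(np.logical_and(a_bin, b_bin).sum() / union)


def flag_hard_jumps(alpha_seq: np.ndarray, min_iou: float = 0.85) -> list[int]:
    """Indices t whose IoU with frame t-1 falls below `min_iou`."""
    return [
        t for t in range(1, len(alpha_seq))
        if mask_iou(alpha_seq[t - 1], alpha_seq[t]) < min_iou
    ]


def keyframe_indices(num_frames: int, every: int = 15) -> list[int]:
    """Anchor frames, assuming frame 0 is the first keyframe."""
    return list(range(0, num_frames, every))


# Toy sequence: a static 8x8 square, except frame 4 where it jumps diagonally.
seq = np.zeros((8, 16, 16), dtype=np.float32)
seq[:, 4:12, 4:12] = 1.0
seq[4] = 0.0
seq[4, 8:16, 8:16] = 1.0

flags = flag_hard_jumps(seq)
smoothed = smooth_alpha_sequence(seq)
anchors = keyframe_indices(40)
```

In this toy run the jump is flagged on both the frame where it occurs and the frame where the mask snaps back, since each has a low IoU with its predecessor. Whether flagging should run before or after smoothing is left open here.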
| ## Architecture Rules | |
| 1. TensorRT FP16 for all inference; no FP32 in production paths | |
| 2. Batch size ≥ 8 for YOLO inference; ≥ 4 for SAM 3 propagation | |
| 3. Frame loading via OpenCV: read the sorted JPEG sequence through `cv2.VideoCapture` (which accepts an image-sequence filename pattern) | |
| 4. All preprocessing must be GPU-side (CUDA streams) | |
| 5. Use `torch.compile()` for Python-side model wrappers where supported | |
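
A small sketch of how sorted frames might be grouped to satisfy the batch-size rule. The input naming (`frame_000000.jpg`), the `.jpg` glob, and the helper names are assumptions for illustration; lexicographic sorting only matches frame order when indices are zero-padded.

```python
from pathlib import Path
from typing import Iterator, Sequence

YOLO_MIN_BATCH = 8   # from the batch-size rule for YOLO inference
SAM3_MIN_BATCH = 4   # from the batch-size rule for SAM 3 propagation


def sorted_frame_paths(frame_dir: str) -> list[Path]:
    """List JPEG frames in a directory in lexicographic (zero-padded) order."""
    return sorted(Path(frame_dir).glob("*.jpg"))


def batched(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield consecutive batches; the last batch may be shorter than batch_size."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Toy example with hypothetical frame names instead of a real directory.
names = sorted(f"frame_{i:06d}.jpg" for i in reversed(range(20)))
batches = list(batched(names, YOLO_MIN_BATCH))
batch_sizes = [len(b) for b in batches]
```

With 20 frames the final batch holds only 4 items, so a real loader would need a policy for the tail (padding, or merging it into the previous batch) to honor the minimum batch size.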
| ## Naming Conventions | |
| - Mask output files: `frame_{idx:06d}_mask_alpha.png` | |
| - Model checkpoints: `yolo11seg_diamond_v{version}.pt` / `sam3_diamond_v{version}.pth` | |
| - Dataset splits: `train/`, `val/`, `test/` under `data/` | |
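
The naming patterns above map directly onto format strings. The helper names below are hypothetical, and the format of `version` (integer, `1.2`, and so on) is not specified here, so it is passed through unchanged.

```python
def mask_filename(idx: int) -> str:
    """Mask output name with a zero-padded six-digit frame index."""
    return f"frame_{idx:06d}_mask_alpha.png"


def yolo_checkpoint(version) -> str:
    """YOLO student checkpoint name; the version format is an assumption."""
    return f"yolo11seg_diamond_v{version}.pt"


def sam3_checkpoint(version) -> str:
    """SAM 3 checkpoint name; the version format is an assumption."""
    return f"sam3_diamond_v{version}.pth"


example_mask = mask_filename(42)
example_yolo = yolo_checkpoint(3)
example_sam3 = sam3_checkpoint(3)
```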
| ## Quality Criteria | |
| - Precision / Recall for diamond facets: target ≥ 0.92 mAP@0.5 | |
| - Mask temporal SSIM: ≥ 0.95 across N-frame sequences | |
| - Throughput: ≥ 500 frames/sec on A100 80GB (batch mode) | |
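
To put the throughput target next to the daily volume from the Domain section, here is a back-of-the-envelope calculation. It assumes a single GPU running continuously at exactly 500 frames/sec, which is an idealization; the sequence length N is not stated, so the result is a per-video frame budget rather than a capacity claim.

```python
TARGET_FPS = 500              # throughput target (batch mode)
DAILY_VIDEOS = 100_000        # lower bound from "100,000+ daily videos"
SECONDS_PER_DAY = 24 * 60 * 60

# Frames one GPU could process per day at the target rate (idealized).
frames_per_day = TARGET_FPS * SECONDS_PER_DAY

# Largest average sequence length one such GPU could sustain per video.
frame_budget_per_video = frames_per_day / DAILY_VIDEOS

# Amortized time per frame implied by the throughput target.
amortized_ms_per_frame = 1000 / TARGET_FPS
```

Under these assumptions one GPU handles 43,200,000 frames per day, or about 432 frames per video at 100,000 videos. The amortized 2 ms per frame is a batch-throughput figure and is separate from the ≤ 10 ms/frame target in the Core Stack section, which reads more naturally as a latency bound.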