# Project Context: High-Throughput Diamond & Jewelry Vision Pipeline

## Domain
- **Application**: Industrial diamond processing (100,000+ daily videos)
- **Problem**: Segmentation, masking, and tracking of refractive/transparent gemstone objects
- **Input**: Sorted JPEG frame sequences (N-frame sequences from video capture)
- **Output**: Temporally consistent soft-edge masks for downstream QC pipelines

## Core Stack & Models
- **Propagation Engine**: Meta SAM 3 — `VideoPredictor` API for multi-frame mask propagation + open-vocabulary text prompts
- **Student Detector**: YOLOv11-seg — teacher-student distillation for real-time inference
- **Temporal Smoothing**: `scipy.signal.savgol_filter` (Savitzky-Golay) applied to per-frame mask coefficients
- **Performance Layer**: TensorRT export + batch inference (target: ≤ 10 ms/frame)

## Refraction Rules (Strict)
1. **Never use binary masks** for diamonds or gemstones — always use soft-edge / alpha-matting masks
2. **Alpha channel preservation**: Output masks must retain transparency gradients (float32 alpha map, 0.0–1.0)
3. **Edge softness**: Apply Gaussian-weighted alpha blending at mask boundaries (sigma ≥ 2 px)
4. **Background reconstruction**: Use inpainting (e.g., `cv2.inpaint`) to handle semi-transparent regions
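
Rules 1–3 can be sketched in NumPy alone. The separable Gaussian blur below stands in for `cv2.GaussianBlur` so the example carries no OpenCV dependency, and `soft_alpha_mask` is a hypothetical helper name, not part of the production pipeline:

```python
import numpy as np

def soft_alpha_mask(binary_mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Turn a hard 0/1 mask into a float32 alpha map with Gaussian-soft edges.

    Interior pixels stay ~1.0, background stays ~0.0, and mask boundaries
    ramp smoothly over roughly +/- 3*sigma pixels (rule 3: sigma >= 2 px).
    """
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()  # normalize so interior values stay at 1.0

    alpha = binary_mask.astype(np.float64)
    # Separable 2-D Gaussian: blur along rows, then along columns.
    alpha = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, alpha)
    alpha = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, alpha)
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)  # rule 2: float32, 0.0-1.0
```

In production the same shape contract holds, but the blur would run GPU-side; the point here is the output format: a float32 alpha map, never a binary mask.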

## Temporal Consistency Rules
1. **Savitzky-Golay filtering** MUST be applied across frame mask sequences (window=5, polyorder=2)
2. **No hard jumps**: Mask IoU between consecutive frames must be ≥ 0.85 (flag frames below threshold)
3. **Propagation priority**: Prefer SAM 3 propagation over per-frame YOLO prediction for tracked sequences
4. **Anchor frames**: Every 15th frame is re-annotated as a keyframe to prevent drift
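
The hard-jump check in rule 2 reduces to a consecutive-frame IoU scan. A minimal NumPy sketch, assuming masks arrive as a `(T, H, W)` stack; `flag_temporal_jumps` is a hypothetical helper name:

```python
import numpy as np

def flag_temporal_jumps(masks: np.ndarray, min_iou: float = 0.85) -> list:
    """Return indices of frames whose mask IoU against the previous frame
    falls below `min_iou` (rule 2: flag frames below the 0.85 threshold).

    `masks` is a (T, H, W) array of 0/1 or boolean masks.
    """
    flagged = []
    for t in range(1, masks.shape[0]):
        a = masks[t - 1].astype(bool)
        b = masks[t].astype(bool)
        union = np.logical_or(a, b).sum()
        iou = np.logical_and(a, b).sum() / union if union else 1.0  # two empty masks agree
        if iou < min_iou:
            flagged.append(t)
    return flagged
```

Flagged frames would then be candidates for re-annotation as keyframes (rule 4) rather than being silently propagated.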

## Architecture Rules
1. TensorRT FP16 for all inference — no FP32 in production paths
2. Batch size ≥ 8 for YOLO inference; ≥ 4 for SAM 3 propagation
3. Frame loading via OpenCV image-sequence mode (e.g., `cv2.VideoCapture("frame_%06d.jpg")` over the sorted JPEGs)
4. All preprocessing must be GPU-side (CUDA streams)
5. Use `torch.compile()` for Python-side model wrappers where supported
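
The batch-size floors in rule 2 imply a chunking step between frame loading and inference. A stdlib-only sketch (`batched` is a hypothetical helper, not a pipeline API):

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batched(frames: Iterable, batch_size: int) -> Iterator[List]:
    """Yield fixed-size batches of frames for batched inference
    (rule 2: >= 8 for YOLO, >= 4 for SAM 3 propagation).

    The final partial batch is yielded as-is so no frames are dropped.
    """
    it = iter(frames)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

In the real pipeline each batch would be stacked into a GPU tensor before the TensorRT engine call; the generator shape is the same either way.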

## Naming Conventions
- Mask output files: `frame_{idx:06d}_mask_alpha.png`
- Model checkpoints: `yolo11seg_diamond_v{version}.pt` / `sam3_diamond_v{version}.pth`
- Dataset splits: `train/`, `val/`, `test/` under `data/`
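
The two name templates above can be pinned down as helpers so no call site hand-formats a path (the function names are hypothetical; the formats are the conventions as stated):

```python
def mask_filename(idx: int) -> str:
    """Mask output name: frame_{idx:06d}_mask_alpha.png (zero-padded to 6 digits)."""
    return f"frame_{idx:06d}_mask_alpha.png"

def checkpoint_name(model: str, version: int) -> str:
    """Checkpoint name: yolo11seg checkpoints use .pt, sam3 checkpoints use .pth."""
    ext = ".pt" if model.startswith("yolo") else ".pth"
    return f"{model}_diamond_v{version}{ext}"
```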

## Quality Criteria
- Segmentation quality for diamond facets: target ≥ 0.92 mAP@0.5
- Mask temporal SSIM: ≥ 0.95 across N-frame sequences
- Throughput: ≥ 500 frames/sec on A100 80GB (batch mode)
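
For CI or release gating, the three thresholds collapse into one check. The metric keys below are hypothetical (how the pipeline actually names its measured metrics is not specified here):

```python
def passes_quality_gate(metrics: dict) -> bool:
    """Check measured run metrics against the quality criteria above.

    Assumed keys: 'map50' (mAP@0.5), 'temporal_ssim', 'fps'.
    Missing keys default to 0.0, so an incomplete report fails the gate.
    """
    return (
        metrics.get("map50", 0.0) >= 0.92
        and metrics.get("temporal_ssim", 0.0) >= 0.95
        and metrics.get("fps", 0.0) >= 500.0
    )
```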
|