ceperaltab committed on
Commit 81c1e1b · verified · 1 Parent(s): 65ed589

Upload CONTEXT.md with huggingface_hub

Files changed (1)
  1. CONTEXT.md +42 -0
CONTEXT.md ADDED
@@ -0,0 +1,42 @@
+ # Project Context: High-Throughput Diamond & Jewelry Vision Pipeline
+
+ ## Domain
+ - **Application**: Industrial diamond processing (100,000+ videos per day)
+ - **Problem**: Segmentation, masking, and tracking of refractive/transparent gemstone objects
+ - **Input**: Sorted JPEG frame sequences (N frames per captured video)
+ - **Output**: Temporally consistent soft-edge masks for downstream QC pipelines
+
+ ## Core Stack & Models
+ - **Propagation Engine**: Meta SAM 3, using the `VideoPredictor` API for multi-frame mask propagation and open-vocabulary text prompts
+ - **Student Detector**: YOLO11-seg, trained by teacher-student distillation for real-time inference
+ - **Temporal Smoothing**: `scipy.signal.savgol_filter` (Savitzky-Golay) applied along the time axis to per-frame mask coefficients
+ - **Performance Layer**: TensorRT export + batch inference (target: ≤ 10 ms/frame)
+
+ ## Refraction Rules (Strict)
+ 1. **Never use binary masks** for diamonds or gemstones; always use soft-edge (alpha matting) masks
+ 2. **Alpha channel preservation**: Output masks must retain transparency gradients (float32 alpha map, 0.0–1.0)
+ 3. **Edge softness**: Apply Gaussian-weighted alpha blending at mask boundaries (sigma ≥ 2 px)
+ 4. **Background reconstruction**: Use inpainting (e.g., `cv2.inpaint`) to handle semi-transparent regions
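
The edge-softness and alpha-preservation rules above can be illustrated with a minimal sketch. It assumes the soft edge is approximated by Gaussian-blurring a hard mask with `scipy.ndimage.gaussian_filter`; this is a stand-in for a real alpha-matting step, not the pipeline's actual method. The helper name `soften_mask` and the synthetic 64x64 frame are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def soften_mask(binary_mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Turn a hard 0/1 mask into a float32 alpha map in [0.0, 1.0].

    Assumption: a plain Gaussian blur stands in for alpha matting.
    """
    if sigma < 2.0:
        raise ValueError("sigma must be >= 2 px to satisfy the edge-softness rule")
    alpha = gaussian_filter(binary_mask.astype(np.float32), sigma=sigma)
    # Guard against tiny floating-point overshoot outside [0, 1].
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)


# Synthetic frame: a 24x24 square stands in for a gemstone.
hard = np.zeros((64, 64), dtype=np.uint8)
hard[20:44, 20:44] = 1
alpha = soften_mask(hard, sigma=2.0)
```

Pixels well inside the square stay close to 1.0, pixels far outside stay at 0.0, and the boundary becomes a gradient several pixels wide.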
+
+ ## Temporal Consistency Rules
+ 1. **Savitzky-Golay filtering** MUST be applied along the time axis of each frame mask sequence (`window_length=5`, `polyorder=2`)
+ 2. **No hard jumps**: Mask IoU between consecutive frames must be ≥ 0.85 (flag frames below the threshold)
+ 3. **Propagation priority**: Prefer SAM 3 propagation over per-frame YOLO prediction for tracked sequences
+ 4. **Anchor frames**: Every 15th frame is re-annotated as a keyframe to prevent drift
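
A minimal sketch of rules 1, 2, and 4 follows. It makes several assumptions: masks arrive as a `(T, H, W)` float32 alpha stack, the filter runs directly on pixel values rather than on some other mask coefficients, IoU for soft masks is computed as the sum of element-wise minima over the sum of maxima, and keyframes start at frame 0. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_masks(alpha_stack: np.ndarray) -> np.ndarray:
    """Savitzky-Golay smoothing along time (axis 0) of a (T, H, W) stack; needs T >= 5."""
    smoothed = savgol_filter(alpha_stack, window_length=5, polyorder=2, axis=0)
    # The polynomial fit can overshoot, so clamp back into the alpha range.
    return np.clip(smoothed, 0.0, 1.0).astype(np.float32)


def soft_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU generalised to soft masks: sum(min) / sum(max)."""
    union = float(np.maximum(a, b).sum())
    if union == 0.0:
        return 1.0  # two empty masks are treated as identical
    return float(np.minimum(a, b).sum()) / union


def flag_hard_jumps(alpha_stack: np.ndarray, threshold: float = 0.85) -> list:
    """Indices t whose IoU with frame t-1 falls below the threshold."""
    return [
        t for t in range(1, len(alpha_stack))
        if soft_iou(alpha_stack[t - 1], alpha_stack[t]) < threshold
    ]


def keyframe_indices(n_frames: int, every: int = 15) -> list:
    """Frames to re-annotate, assuming the first frame is a keyframe."""
    return list(range(0, n_frames, every))


# Synthetic sequence: a static square, with a simulated tracking glitch at frame 5.
stack = np.zeros((10, 32, 32), dtype=np.float32)
stack[:, 8:24, 8:24] = 1.0
stack[5] = 0.0
stack[5, 0:8, 0:8] = 1.0

jumps = flag_hard_jumps(stack)
smoothed = smooth_masks(stack)
```

In this example the glitch frame and the frame after it are both flagged, because each differs sharply from its predecessor. Whether flagging should run before or after smoothing is not stated in the rules; the sketch checks the raw stack.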
+
+ ## Architecture Rules
+ 1. TensorRT FP16 for all inference; no FP32 in production paths
+ 2. Batch size ≥ 8 for YOLO inference; ≥ 4 for SAM 3 propagation
+ 3. Frame loading via OpenCV: `cv2.VideoCapture` opened on the numbered JPEG sequence (printf-style filename pattern) so that frames arrive in index order
+ 4. All preprocessing must be GPU-side (CUDA streams)
+ 5. Use `torch.compile()` for Python-side model wrappers where supported
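
The frame-ordering and batch-size rules can be sketched in plain Python. The sketch assumes frame files carry their index as the last run of digits in the file name, and it sorts numerically so that unpadded names do not end up in lexicographic order. The file names and helper names are hypothetical; decoding and GPU preprocessing are out of scope here.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List

_DIGITS = re.compile(r"\d+")


def frame_sort_key(path: Path) -> int:
    """Sort key: the last run of digits in the file stem, as an integer."""
    runs = _DIGITS.findall(path.stem)
    if not runs:
        raise ValueError(f"no frame index found in {path.name!r}")
    return int(runs[-1])


def sorted_frames(paths: Iterable) -> List[Path]:
    """Return frame paths in numeric frame order."""
    return sorted((Path(p) for p in paths), key=frame_sort_key)


def batched(items: List[Path], batch_size: int = 8) -> Iterator[List[Path]]:
    """Yield consecutive batches; the final batch may be shorter than batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical unpadded names, which a plain string sort would misorder.
names = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg"]
ordered = sorted_frames(names)
batch_sizes = [len(b) for b in batched([Path(f"frame_{i}.jpg") for i in range(20)])]
```

The last batch of a sequence can fall below the minimum batch size; padding it or merging it with the next sequence is a design decision the rules leave open.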
+
+ ## Naming Conventions
+ - Mask output files: `frame_{idx:06d}_mask_alpha.png`
+ - Model checkpoints: `yolo11seg_diamond_v{version}.pt` / `sam3_diamond_v{version}.pth`
+ - Dataset splits: `train/`, `val/`, `test/` under `data/`
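
The naming templates above map directly onto Python format strings. A small sketch, with hypothetical helper names and an assumed short model key (`"yolo"` / `"sam3"`); the format of `version` is not specified in the conventions, so it is passed through unchanged.

```python
def mask_filename(idx: int) -> str:
    """Mask output name with a zero-padded, six-digit frame index."""
    return f"frame_{idx:06d}_mask_alpha.png"


# Templates copied from the naming conventions; the dict keys are an assumption.
_CHECKPOINT_TEMPLATES = {
    "yolo": "yolo11seg_diamond_v{version}.pt",
    "sam3": "sam3_diamond_v{version}.pth",
}


def checkpoint_name(model: str, version) -> str:
    """Checkpoint file name for a given model key and version."""
    return _CHECKPOINT_TEMPLATES[model].format(version=version)


example_mask = mask_filename(42)
example_ckpt = checkpoint_name("yolo", 3)
```

Note that PNG stores integer samples (8 or 16 bits per channel), so a float32 alpha map has to be quantized before it is written under this name; the bit depth to use is not stated here.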
+
+ ## Quality Criteria
+ - Precision/recall for diamond facets, summarized as mAP@0.5: target ≥ 0.92
+ - Mask temporal SSIM: ≥ 0.95 across N-frame sequences
+ - Throughput: ≥ 500 frames/sec on A100 80GB (batch mode)
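
To put the throughput target next to the stated daily volume, here is a back-of-the-envelope calculation. It assumes a GPU sustains the target rate around the clock with no idle time, which is optimistic. The frames-per-video values passed to `gpus_needed` are hypothetical, since N is not given.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
TARGET_FPS = 500                 # throughput target per A100 (batch mode)
VIDEOS_PER_DAY = 100_000         # stated daily volume

# Frames one GPU can process per day at the target rate: 43,200,000.
frames_per_gpu_day = TARGET_FPS * SECONDS_PER_DAY

# Average frame budget per video if a single GPU handled everything: 432.
frames_per_video_budget = frames_per_gpu_day / VIDEOS_PER_DAY

# The <= 10 ms/frame target corresponds to 100 frames/sec when frames are
# processed one at a time; the 500 frames/sec figure relies on batching.
single_frame_fps = 1000 / 10


def gpus_needed(frames_per_video: int) -> int:
    """GPUs required for the daily volume, assuming full utilisation."""
    total_frames = VIDEOS_PER_DAY * frames_per_video
    return math.ceil(total_frames / frames_per_gpu_day)
```

For example, under these assumptions a hypothetical 1,000-frame sequence length would call for `gpus_needed(1000)` GPUs, while sequences of up to 432 frames fit on one.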