ceperaltab / diamond-vision-training-code / CONTEXT.md

	# Project Context: High-Throughput Diamond & Jewelry Vision Pipeline

	## Domain
	- Application: Industrial diamond processing (100,000+ daily videos)
	- Problem: Segmentation, masking, and tracking of refractive/transparent gemstone objects
	- Input: Sorted JPEG frame sequences (N-frame sequences from video capture)
	- Output: Temporally-consistent soft-edge masks for downstream QC pipelines

	## Core Stack & Models
	- Propagation Engine: Meta SAM 3 — `VideoPredictor` API for multi-frame mask propagation + open-vocabulary text prompts
	- Student Detector: YOLOv11-seg — Teacher-Student distillation for real-time inference
	- Temporal Smoothing: `scipy.signal.savgol_filter` (Savitzky-Golay), applied along the time axis to per-frame mask coefficients
	- Performance Layer: TensorRT export + batch inference (target: ≤10ms/frame)

	## Refraction Rules (Strict)
	1. Never use binary masks for diamonds or gemstones — always use soft-edge / alpha matting masks
	2. Alpha channel preservation: Output masks must retain transparency gradients (float32 alpha map, 0.0–1.0)
	3. Edge softness: Apply Gaussian-weighted alpha blending at mask boundaries (sigma ≥ 2px)
	4. Background reconstruction: Use inpainting (e.g., `cv2.inpaint`) to handle semi-transparent regions
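
Rules 1 to 3 above can be illustrated with a minimal sketch, assuming the upstream model yields a hard 0/1 mask that still has to be softened. The helper names `soften_mask` and `composite` are hypothetical, and a plain Gaussian blur from `scipy.ndimage` is only a stand-in for a real alpha-matting step; the inpainting step from rule 4 is not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def soften_mask(binary_mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Turn a hard 0/1 mask into a float32 alpha map with Gaussian-weighted edges."""
    if sigma < 2.0:
        # Rule 3: edge softness of at least 2 px.
        raise ValueError("sigma must be >= 2 px")
    alpha = gaussian_filter(binary_mask.astype(np.float32), sigma=sigma)
    # Rule 2: keep a float32 alpha map in the range 0.0 to 1.0.
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)


def composite(foreground: np.ndarray, background: np.ndarray,
              alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend (H, W, C) images: out = alpha * F + (1 - alpha) * B."""
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over C
    return a * foreground + (1.0 - a) * background
```

Interior pixels stay close to 1.0, pixels far outside stay close to 0.0, and the boundary becomes a gradient a few pixels wide instead of a hard step.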

	## Temporal Consistency Rules
	1. Savitzky-Golay filtering MUST be applied across frame mask sequences (window=5, polyorder=2)
	2. No hard jumps: Mask IoU between consecutive frames must be ≥ 0.85 (flag frames below threshold)
	3. Propagation priority: Prefer SAM 3 propagation over per-frame YOLO prediction for tracked sequences
	4. Anchor frames: Every 15th frame is re-annotated as a keyframe to prevent drift
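
The rules above could be checked with something like the following sketch. It assumes the masks are stacked as a `(T, H, W)` float32 array and that the "mask coefficients" being filtered are the per-pixel alpha values, which the source does not specify. The helper names are hypothetical, IoU is generalized to soft masks as sum(min) / sum(max), and keyframes are assumed to start at frame 0.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_masks(masks: np.ndarray, window: int = 5, polyorder: int = 2) -> np.ndarray:
    """Savitzky-Golay smooth a (T, H, W) alpha stack along the time axis."""
    if masks.shape[0] < window:
        raise ValueError("need at least `window` frames to filter")
    smoothed = savgol_filter(masks.astype(np.float32), window_length=window,
                             polyorder=polyorder, axis=0)
    # The polynomial fit can overshoot slightly, so clamp back into [0, 1].
    return np.clip(smoothed, 0.0, 1.0).astype(np.float32)


def soft_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU generalized to soft masks: sum(min) / sum(max)."""
    union = float(np.maximum(a, b).sum())
    if union == 0.0:
        return 1.0  # two empty masks are treated as identical
    return float(np.minimum(a, b).sum()) / union


def flag_jumps(masks: np.ndarray, threshold: float = 0.85) -> list:
    """Return indices t where IoU(frame t-1, frame t) falls below the threshold."""
    return [t for t in range(1, len(masks))
            if soft_iou(masks[t - 1], masks[t]) < threshold]


def keyframe_indices(n_frames: int, every: int = 15) -> list:
    """Frames to re-annotate as anchors: 0, 15, 30, ... (assumed to start at 0)."""
    return list(range(0, n_frames, every))
```

Whether the IoU check runs before or after smoothing is a design choice the source leaves open; running it on the raw propagated masks makes hard jumps easier to spot, since the filter spreads a jump over neighboring frames.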

	## Architecture Rules
	1. TensorRT FP16 for all inference — no FP32 in production paths
	2. Batch size ≥ 8 for YOLO inference; ≥ 4 for SAM 3 propagation
	3. Frame loading via OpenCV (`cv2.VideoCapture`), reading the JPEG frames in sorted (frame-index) order
	4. All preprocessing must be GPU-side (CUDA streams)
	5. Use `torch.compile()` for Python-side model wrappers where supported
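
As an illustration of rules 2 and 3, the sketch below orders frame files numerically and groups them into fixed-size batches. The helper names and the file-name pattern in the usage lines are hypothetical; decoding, GPU upload, and the TensorRT call itself are left out.

```python
import re
from pathlib import Path
from typing import Iterator, List, Sequence


def sort_frames(paths: Sequence[str]) -> List[str]:
    """Sort frame paths by the last integer in the file stem (numeric order)."""
    def key(p: str):
        nums = re.findall(r"\d+", Path(p).stem)
        # Files without any number sort first; ties fall back to the path string.
        return (int(nums[-1]) if nums else -1, p)
    return sorted(paths, key=key)


def batched(frames: Sequence[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive batches; the final batch may be shorter than batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(frames), batch_size):
        yield list(frames[start:start + batch_size])


frames = sort_frames(["frame_10.jpg", "frame_2.jpg", "frame_1.jpg"])
print(frames)  # ['frame_1.jpg', 'frame_2.jpg', 'frame_10.jpg']
```

Numeric sorting matters when frame names are not zero-padded, because a plain string sort would place `frame_10.jpg` before `frame_2.jpg`. A short final batch would need padding or a separate code path if the exported engine expects a fixed batch size.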

	## Naming Conventions
	- Mask output files: `frame_{idx:06d}_mask_alpha.png`
	- Model checkpoints: `yolo11seg_diamond_v{version}.pt` / `sam3_diamond_v{version}.pth`
	- Dataset splits: `train/`, `val/`, `test/` under `data/`
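
The naming templates above map directly onto Python format strings. The helpers below are a small sketch; the function names and the `"yolo"` / `"sam3"` keys are hypothetical, and only the templates themselves come from the conventions listed.

```python
CHECKPOINT_TEMPLATES = {
    "yolo": "yolo11seg_diamond_v{version}.pt",
    "sam3": "sam3_diamond_v{version}.pth",
}


def mask_filename(idx: int) -> str:
    """Mask output name with a zero-padded, six-digit frame index."""
    return f"frame_{idx:06d}_mask_alpha.png"


def checkpoint_filename(model: str, version) -> str:
    """Checkpoint name for one of the two model families."""
    return CHECKPOINT_TEMPLATES[model].format(version=version)


print(mask_filename(42))               # frame_000042_mask_alpha.png
print(checkpoint_filename("yolo", 3))  # yolo11seg_diamond_v3.pt
```

PNG stores integer samples, so a float32 alpha map would need to be quantized (for example to 16-bit) before it is written under this name; the source does not state which bit depth is used.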

	## Quality Criteria
	- Precision / Recall for diamond facets (summarized as mAP@0.5): target ≥ 0.92
	- Mask temporal SSIM: ≥ 0.95 across N-frame sequences
	- Throughput: ≥ 500 frames/sec on A100 80GB (batch mode)
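
To make the throughput target concrete, the arithmetic below converts it into a per-batch time budget and a per-video frame budget. It is a back-of-the-envelope sketch: the function names are hypothetical, and the figures assume a single GPU running continuously, which the source does not state.

```python
SECONDS_PER_DAY = 86_400


def batch_budget_ms(target_fps: float, batch_size: int) -> float:
    """Maximum wall-clock time per batch (ms) that still meets target_fps."""
    return 1000.0 * batch_size / target_fps


def frames_per_video_budget(target_fps: float, videos_per_day: int) -> float:
    """Frames per video that one device can process at target_fps, running all day."""
    return target_fps * SECONDS_PER_DAY / videos_per_day


print(batch_budget_ms(500, 8))                # 16.0
print(frames_per_video_budget(500, 100_000))  # 432.0
```

At 500 frames/sec the amortized cost is 2 ms per frame, so a batch of 8 has to finish in 16 ms. This is a throughput figure and is separate from the ≤10ms/frame target listed under Core Stack & Models, which reads as a per-frame latency bound.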