Initial commit: SAM2 finetuned checkpoint + config

Files changed (7)

.gitattributes +8 -0
.gitignore +11 -0
README.md +196 -0
config.json +67 -0
processor_config.json +24 -0
sam2.1_hiera_base_plus_ft_ids.pt +3 -0
sam_checkpoint.safetensors +3 -0

.gitattributes ADDED

	@@ -0,0 +1,8 @@

+# Track large model weights with LFS
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+# Text files with LF normalization
+*.json text eol=lf
+README.md text eol=lf
+*.md text eol=lf

.gitignore ADDED

	@@ -0,0 +1,11 @@

+# Outputs / artifacts
+outputs/
+logs/
+*.log
+# Python cache
+__pycache__/
+*.pyc
+# Datasets (do not upload private data)
+data/

README.md ADDED

	@@ -0,0 +1,196 @@

+# SAM2 ID Segmenter
+Lightweight wrapper and fine‑tuning scaffold around Meta's Segment Anything 2 (SAM2) adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine‑tuned SAM2 checkpoint, (2) automatic multi‑mask generation + tight cropping, and (3) configuration file driven training/inference settings.
+> Status: Inference wrapper implemented (`SamSegmentator`). End‑to‑end training loop is a planned addition. Config already anticipates training hyper‑parameters.
+---
+## Contents
+1. Motivation & Scope
+2. Intended Use & Non‑Goals
+3. Repository Structure
+4. Configuration (`config.json`)
+5. Installation
+6. Inference Usage (`SamSegmentator`)
+7. Dataset & Mask Format (planned training)
+8. Checkpoints & Auto‑Download
+9. Metrics (recommended)
+10. Limitations & Risks
+11. Roadmap
+12. License & Citation
+---
+## 1. Motivation & Scope
+Document / ID workflows often need fast class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs ordered by area and optionally padded.
+## 2. Intended Use & Non‑Goals
+Intended:
+- Pre‑segmentation of ID / document fields prior to OCR.
+- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
+- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.
+Non‑Goals:
+- Biometric identity verification or authoritative fraud detection.
+- Legal decision making without human review.
+- Full multi‑modal extraction (text recognition is out of scope here).
+## 3. Repository Structure
+```
+model_repo/
+	config.json          # Central hyper‑parameter & path config
+	README.md            # (this file)
+checkpoints/           # Local downloaded / fine‑tuned checkpoints
+samples/
+	sample_us_passport.jpg
+src/
+	sam_segmentator.py   # Inference wrapper (SamSegmentator)
+main.py                # Placeholder entry point
+```
+Planned: `train/` scripts for fine‑tuning (not yet implemented).
+## 4. Configuration (`model_repo/config.json`)
+Key fields (example values included in the repo):
+- `model_type`: Always `sam2` here.
+- `checkpoint_path`: Path relative to project root or absolute; if omitted and `auto_download=True` the code will attempt remote download.
+- `image_size`: Target square size used during training (future). Inference wrapper accepts raw image size.
+- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
+- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
+- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
+- `mixed_precision`: Will enable `torch.autocast` during training.
+Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.
+## 5. Installation
+### Prerequisites
+- Python 3.10+ (recommended)
+- CUDA GPU (optional but recommended for speed)
+### Using uv (preferred fast resolver)
+The project ships a `pyproject.toml`, so dependencies can be installed with:
+```
+uv sync
+```
+This creates / updates the virtual environment and installs dependencies.
+### Using pip (alternative)
+```
+python -m venv .venv
+.venv\Scripts\activate        # Windows; on Linux/macOS: source .venv/bin/activate
+pip install -U pip
+pip install -e .
+```
+If SAM2 is not available as a published package in your environment, you may need to install it from source. The exact steps depend on the upstream SAM2 repository and will be added here once finalized.
+## 6. Inference Usage (`SamSegmentator`)
+Minimal example using the sample passport image:
+```python
+from pathlib import Path
+import cv2
+from src.sam_segmentator import SamSegmentator
+image_path = Path("samples/sample_us_passport.jpg")
+img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)
+if img_bgr is None:  # cv2.imread returns None instead of raising on a bad path
+    raise FileNotFoundError(image_path)
+segmentator = SamSegmentator(
+    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
+    pred_iou_thresh=0.88,  # forwarded to SAM2AutomaticMaskGenerator
+    stability_score_thresh=0.90,
+)
+segments = segmentator.infer(img_bgr, pad_percent=0.05)
+print(f"Total segments: {len(segments)}")
+# cv2.imwrite returns False rather than raising if the directory is missing
+out_dir = Path("outputs")
+out_dir.mkdir(exist_ok=True)
+# Each segment is (crop_bgr, mask_255)
+for i, (crop, mask) in enumerate(segments[:3]):
+    cv2.imwrite(str(out_dir / f"segment_{i}_crop.png"), crop)
+    cv2.imwrite(str(out_dir / f"segment_{i}_mask.png"), mask)
+```
+Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area descending.
+### Parameter Notes
+- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
+- `pad`: Deprecated absolute padding in pixels; still accepted, but emits a warning.
+- All additional kwargs go to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`).
+## 7. Dataset & Mask Format (For Future Training)
+Expected layout (mirrors `paths` in config):
+```
+data/
+	train/
+		images/*.jpg|png
+		masks/*.png        # Single‑channel, integer indices (0=background)
+	val/
+		images/
+		masks/
+```
+Class index mapping (example):
+```
+class_names = ["ID1", "ID3", "IDCOVER"]
+0 -> background
+1 -> ID1
+2 -> ID3
+3 -> IDCOVER
+```
+Masks should be stored in a lossless format (PNG) and resized only with nearest‑neighbor interpolation, so that class indices are never blended. Avoid palette mismatch; explicit integer pixel values are recommended.
+## 8. Checkpoints & Auto‑Download
+`SamSegmentator` will:
+1. Use provided `checkpoint_path` if it exists.
+2. If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` from the URL set in the `SAM2_CHECKPOINT_URL` environment variable.
+3. (Optional) Validate SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
+Environment variables:
+```
+SAM2_CHECKPOINT_URL=<direct_download_url>
+SAM2_CHECKPOINT_SHA256=<hex>
+SAM2_CHECKPOINT_DIR=checkpoints
+```
+## 9. Metrics (Recommended When Training Added)
+- Mean IoU (per class & macro average)
+- Dice coefficient
+- Pixel accuracy
+- Class frequency distribution (to inform potential class weighting)
+Store per‑epoch metrics as JSON for reproducibility.
+## 10. Limitations & Risks
+Technical:
+- Current version does not include a fine‑tuning script; only inference wrapper.
+- Automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.
+Ethical / Compliance:
+- Processing ID documents may involve PII; ensure secure storage and compliant handling.
+- Not intended for biometric decisions nor identity verification pipelines without human oversight.
+## 11. Roadmap
+- [ ] Add training script (supervised fine‑tuning using `config.json`).
+- [ ] Optional class‑guided prompting (points / boxes) pipeline.
+- [ ] Export to ONNX / TorchScript.
+- [ ] CLI interface for batch folder inference.
+- [ ] Lightweight web demo (Gradio / FastAPI).
+## 12. License & Citation
+Specify a license in a top‑level `LICENSE` file (e.g., MIT or Apache‑2.0) ensuring compatibility with SAM2's original license.
+Please cite SAM / SAM2 in academic work. Example (placeholder):
+```
+@article{kirillov2023segmentanything,
+	title={Segment Anything},
+	author={Kirillov, Alexander and others},
+	journal={arXiv preprint arXiv:2304.02643},
+	year={2023}
+}
+```
+Add updated SAM2 citation once official reference is finalized.
+## Acknowledgments
+- Meta AI for releasing Segment Anything & SAM2.
+- OpenCV, PyTorch, and the broader CV community.
+---
+If you have questions or need feature prioritization, open an Issue or start a Discussion.
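
Note on README section 6: the tight cropping with `pad_percent` and the area-descending order can be illustrated with a small NumPy sketch. It is not the code of `SamSegmentator` (which is not part of this commit). The helper names `crop_with_padding` and `crops_sorted_by_area` are hypothetical, and the sketch assumes that `pad_percent` is applied to the bounding-box width and height on each side and then clipped to the image.

```python
import numpy as np


def crop_with_padding(image, mask, pad_percent=0.05):
    """Crop image and mask to the mask's tight box plus relative padding.

    Assumption: padding is pad_percent of the box height/width per side.
    Returns (crop, mask_255), or None when the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing to crop
    # Tight bounding box as half-open ranges [y0, y1) and [x0, x1)
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    pad_y = int(round((y1 - y0) * pad_percent))
    pad_x = int(round((x1 - x0) * pad_percent))
    # Grow the box, then clip it to the image bounds
    h, w = mask.shape[:2]
    y0, y1 = max(0, y0 - pad_y), min(h, y1 + pad_y)
    x0, x1 = max(0, x0 - pad_x), min(w, x1 + pad_x)
    crop = image[y0:y1, x0:x1].copy()
    # Binary mask: 0 background, 255 foreground
    mask_255 = (mask[y0:y1, x0:x1] > 0).astype(np.uint8) * 255
    return crop, mask_255


def crops_sorted_by_area(image, masks, pad_percent=0.05):
    """Return (crop, mask_255) pairs ordered by mask area, largest first."""
    ordered = sorted(masks, key=lambda m: int(np.count_nonzero(m)), reverse=True)
    pairs = (crop_with_padding(image, m, pad_percent) for m in ordered)
    return [p for p in pairs if p is not None]
```

With a 40x100 pixel region and `pad_percent=0.05`, the box grows by 2 rows and 5 columns per side. Boxes that touch the image border are clipped rather than padded with extra pixels.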

config.json ADDED

	@@ -0,0 +1,67 @@

+{
+  "model_type": "sam2",
+  "checkpoint_path": "weights/sam2_base.pth",
+  "image_size": [1024, 1024],
+  "num_classes": 10,
+  "class_names": [
+    "ID1",
+    "ID3",
+    "IDCOVER"
+  ],
+  "input_channels": 3,
+  "learning_rate": 1e-5,
+  "weight_decay": 0.01,
+  "batch_size": 2,
+  "gradient_accumulation_steps": 8,
+  "num_epochs": 100,
+  "optimizer": "adamw",
+  "lr_scheduler": {
+    "type": "cosine",
+    "warmup_epochs": 5,
+    "min_lr": 1e-7
+  },
+  "loss": {
+    "primary": "cross_entropy",
+    "auxiliary": ["dice"],
+    "dice_smooth": 1.0,
+    "class_weights": null
+  },
+  "mixed_precision": true,
+  "early_stopping": {
+    "patience": 15,
+    "metric": "val_loss",
+    "mode": "min"
+  },
+  "dropout_rate": 0.0,
+  "augmentation": {
+    "horizontal_flip": true,
+    "vertical_flip": false,
+    "rotation_deg": 15,
+    "random_crop": true,
+    "scale_range": [0.9, 1.1],
+    "brightness": 0.1,
+    "contrast": 0.1,
+    "color_jitter_prob": 0.3
+  },
+  "normalization": {
+    "mean": [0.485, 0.456, 0.406],
+    "std":  [0.229, 0.224, 0.225]
+  },
+  "dataloader": {
+    "num_workers": 4,
+    "pin_memory": true,
+    "shuffle": true
+  },
+  "paths": {
+    "train_images": "data/train/images",
+    "train_masks": "data/train/masks",
+    "val_images": "data/val/images",
+    "val_masks": "data/val/masks",
+    "output_dir": "outputs"
+  },
+  "logging": {
+    "log_interval": 50,
+    "save_checkpoint_every": 1
+  },
+  "seed": 42
+}
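
Note on `config.json`: since the README says these fields are reserved for a future training loop, a quick consistency check may be worth running before they are consumed. The sketch below is only an illustration: `check_training_config` and `effective_batch_size` are hypothetical helpers, and the rule that `num_classes` should equal `len(class_names)` (or that plus one for background) is an assumption taken from the README's class index mapping, not something the repository enforces. The excerpt copies values from the file above.

```python
import json

# Excerpt of the values in config.json above
EXCERPT = """
{
  "num_classes": 10,
  "class_names": ["ID1", "ID3", "IDCOVER"],
  "learning_rate": 1e-5,
  "batch_size": 2,
  "gradient_accumulation_steps": 8,
  "num_epochs": 100,
  "lr_scheduler": {"type": "cosine", "warmup_epochs": 5, "min_lr": 1e-7}
}
"""


def check_training_config(cfg):
    """Return a list of warnings for a parsed config dict (empty if none)."""
    warnings = []
    names = cfg.get("class_names", [])
    n = cfg.get("num_classes")
    # Assumption: classes are either the named ones, or named ones + background
    if n not in (len(names), len(names) + 1):
        warnings.append(
            f"num_classes={n} does not match {len(names)} class_names "
            "(with or without a background class)"
        )
    sched = cfg.get("lr_scheduler", {})
    if sched.get("min_lr", 0.0) > cfg.get("learning_rate", 0.0):
        warnings.append("lr_scheduler.min_lr is larger than learning_rate")
    if sched.get("warmup_epochs", 0) >= cfg.get("num_epochs", 0):
        warnings.append("warmup_epochs is not smaller than num_epochs")
    return warnings


def effective_batch_size(cfg):
    """Samples per optimizer step when gradients are accumulated."""
    return cfg["batch_size"] * cfg.get("gradient_accumulation_steps", 1)


cfg = json.loads(EXCERPT)
print(effective_batch_size(cfg))  # 2 * 8 = 16
for message in check_training_config(cfg):
    print("warning:", message)
```

On the committed values this reports the `num_classes` mismatch: the file declares 10 classes but lists three names.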

processor_config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "preprocessing": {
+    "resize": {
+      "height": 256,
+      "width": 256
+    },
+    "normalization": {
+      "mean": [0.485, 0.456, 0.406],
+      "std": [0.229, 0.224, 0.225]
+    },
+    "augmentation": {
+      "random_flip": true,
+      "random_crop": {
+        "height": 224,
+        "width": 224
+      }
+    }
+  },
+  "tokenization": {
+    "do_lower_case": true,
+    "max_length": 512,
+    "padding": "max_length"
+  }
+}
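
Note on `processor_config.json`: the `normalization` and `random_crop` entries can be read as the following NumPy operations. This is a sketch under assumptions, not the repository's preprocessing code: the function names are hypothetical, the input is assumed to be already resized to 256x256, and the mean/std values are assumed to be in RGB order (they are the usual ImageNet statistics), so a BGR image from OpenCV would need its channels reversed first. The `tokenization` block is ignored here because it has no obvious role for an image-only model.

```python
import numpy as np

# Values copied from the "normalization" block above (assumed RGB order)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def bgr_to_rgb(image):
    """Reverse the channel axis of an HxWx3 image (OpenCV loads BGR)."""
    return image[..., ::-1]


def normalize_rgb(image_u8):
    """Scale a uint8 RGB image to [0, 1], then standardize per channel."""
    x = image_u8.astype(np.float32) / 255.0
    return (x - MEAN) / STD


def random_crop(image, height=224, width=224, rng=None):
    """Cut a random height x width window out of an HxWxC image."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    if h < height or w < width:
        raise ValueError("image is smaller than the requested crop")
    top = int(rng.integers(0, h - height + 1))
    left = int(rng.integers(0, w - width + 1))
    return image[top:top + height, left:left + width]
```

Note that the 256x256 resize here differs from the `image_size` of 1024x1024 in `config.json`, so it may be worth confirming which of the two the model actually expects.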

sam2.1_hiera_base_plus_ft_ids.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64f76f41204b7694ea59200d85d8b742e1808532aa063118d3d043d79aa285b3
+size 910662494
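
Note on the LFS pointer: the three lines above are the pointer file, not the weights. They record the hash algorithm, the digest and the byte size of the real object, which is enough to verify a downloaded checkpoint locally (similar in spirit to the optional `SAM2_CHECKPOINT_SHA256` check the README describes). A minimal sketch, with hypothetical helper names `parse_lfs_pointer` and `matches_pointer`:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,          # e.g. "sha256"
        "digest": digest,      # hex string
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and digest against a parsed pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a wrong file
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Hash in chunks so large checkpoints are not read into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]
```

Comparing the size before hashing also catches the common case where the pointer file itself was downloaded instead of the real object.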

sam_checkpoint.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c30cc8d0758ccf4154a7857ae971917f379a2b781a4149c88c3b2d1bc654a452
+size 40