Upload fine-tuned CLIPSeg weights from best_model.pt

Files changed (9)

README.md +84 -0
config.json +35 -0
merges.txt +0 -0
model.safetensors +3 -0
preprocessor_config.json +23 -0
special_tokens_map.json +24 -0
tokenizer.json +0 -0
tokenizer_config.json +31 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,84 @@

+---
+language: en
+license: mit
+tags:
+  - clipseg
+  - image-segmentation
+  - text-conditioned-segmentation
+  - drywall
+  - quality-inspection
+  - pytorch
+base_model: CIDAS/clipseg-rd64-refined
+datasets:
+  - roboflow/drywall-join-detect
+  - roboflow/cracks-3ii36
+metrics:
+  - iou
+  - dice
+---
+# CLIPSeg — Fine-tuned for Drywall QA
+Fine-tuned version of [CIDAS/clipseg-rd64-refined](https://huggingface.co/CIDAS/clipseg-rd64-refined)
+for text-conditioned binary segmentation of drywall defects.
+## Supported Prompts
+| Prompt | Target Region | Val mIoU | Val Dice |
+|--------|--------------|----------|----------|
+| `segment crack` | Wall cracks | **0.7352** | **0.8336** |
+| `segment taping area` | Joint / tape seam | **0.4985** | **0.6256** |
+## Training Details
+| Setting | Value |
+|---------|-------|
+| Base model | `CIDAS/clipseg-rd64-refined` |
+| Epochs | 20 |
+| Batch size | 4 |
+| Learning rate | 1e-4 (AdamW) |
+| Scheduler | CosineAnnealingLR |
+| Loss | 0.5 × BCE + 0.5 × Dice |
+| Image size | 352 × 352 |
+| Threshold | 0.5 |
+| Seed | 42 |
+| Hardware | Tesla T4 (Google Colab) |
+| Train time | ~65.3 min |
+| Avg inference | 13.0 ms / image |
+## Datasets
+- **Dataset 1 — Taping area:** [Drywall-Join-Detect](https://universe.roboflow.com/objectdetect-pu6rn/drywall-join-detect)
+- **Dataset 2 — Cracks:** [Cracks](https://universe.roboflow.com/fyp-ny1jt/cracks-3ii36)
+## Quick Usage
+```python
+import torch
+from PIL import Image
+from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
+processor = CLIPSegProcessor.from_pretrained("S-4-G-4-R/clipseg-drywall-qa")
+model     = CLIPSegForImageSegmentation.from_pretrained("S-4-G-4-R/clipseg-drywall-qa")
+model.eval()
+image  = Image.open("your_image.jpg").convert("RGB")
+prompt = "segment crack"   # or "segment taping area"
+inputs = processor(
+    text=prompt, images=image,
+    return_tensors="pt", padding=True
+)
+with torch.no_grad():
+    logits = model(**inputs).logits
+mask = (torch.sigmoid(logits[0]) > 0.5).numpy()   # boolean 352×352 mask (model resolution, not the original image size)
+```
+## Results (best checkpoint, epoch 15)
+| Metric | Split | segment crack | segment taping area |
+|--------|-------|---------------|---------------------|
+| mIoU | val | 0.7352 | 0.4985 |
+| mIoU | test | 0.6900 | not reported |
+| Dice | val | 0.8336 | 0.6256 |
+| Dice | test | 0.7957 | not reported |
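
Note on the metrics: the README reports mIoU and Dice but does not define them. Below is a minimal sketch of the usual definitions for a single pair of binary masks, assuming NumPy boolean arrays. The names `iou_score` and `dice_score` and the `eps` smoothing term are illustrative assumptions, and this commit does not confirm how per-image scores were averaged into the reported values.

```python
import numpy as np


def iou_score(pred, target, eps=1e-7):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps keeps the score defined (and equal to 1) when both masks are empty.
    return float((inter + eps) / (union + eps))


def dice_score(pred, target, eps=1e-7):
    """Dice coefficient: twice the overlap divided by the total mask area."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))


# Toy example: 2 predicted pixels, 1 ground-truth pixel, 1 pixel in common.
pred = np.array([[1, 1], [0, 0]], dtype=bool)
target = np.array([[1, 0], [0, 0]], dtype=bool)
print(iou_score(pred, target), dice_score(pred, target))
```

For a single image the two metrics are linked by Dice = 2·IoU / (1 + IoU). The reported pairs (for example 0.7352 and 0.8336) sit slightly below that relation, which is what one would expect if each metric is averaged over images separately, though the evaluation code is not part of this commit.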

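Note on the Quick Usage snippet: the preprocessor resizes every input to 352 × 352, so the mask it produces is at that resolution rather than the size of the original photo. A minimal sketch of mapping the mask back for overlaying follows, using nearest-neighbour indexing in NumPy. The helper name `resize_mask_nearest` and the choice of nearest-neighbour resampling are assumptions made for illustration and are not part of the README.

```python
import numpy as np


def resize_mask_nearest(mask, out_height, out_width):
    """Nearest-neighbour resize of a 2-D mask to (out_height, out_width)."""
    mask = np.asarray(mask)
    in_height, in_width = mask.shape
    # For each output row/column, pick the source index it falls into.
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    return mask[rows[:, None], cols[None, :]]


# Stand-in for the 352x352 boolean mask returned by the usage snippet.
model_mask = np.zeros((352, 352), dtype=bool)
model_mask[100:200, 50:150] = True

# PIL reports image.size as (width, height); 640x480 is a made-up example.
original_width, original_height = 640, 480
full_mask = resize_mask_nearest(model_mask, original_height, original_width)
print(full_mask.shape)
```

Nearest-neighbour keeps the mask strictly boolean. Resizing the sigmoid probabilities with a smoother filter and thresholding afterwards is a common alternative that gives softer edges.
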
config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "_name_or_path": "CIDAS/clipseg-rd64-refined",
+  "architectures": [
+    "CLIPSegForImageSegmentation"
+  ],
+  "conditional_layer": 0,
+  "decoder_attention_dropout": 0.0,
+  "decoder_hidden_act": "quick_gelu",
+  "decoder_intermediate_size": 2048,
+  "decoder_num_attention_heads": 4,
+  "extract_layers": [
+    3,
+    6,
+    9
+  ],
+  "initializer_factor": 1.0,
+  "logit_scale_init_value": 2.6592,
+  "model_type": "clipseg",
+  "projection_dim": 512,
+  "reduce_dim": 64,
+  "text_config": {
+    "bos_token_id": 0,
+    "dropout": 0.0,
+    "eos_token_id": 2,
+    "model_type": "clipseg_text_model"
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.0",
+  "use_complex_transposed_convolution": true,
+  "vision_config": {
+    "dropout": 0.0,
+    "model_type": "clipseg_vision_model",
+    "patch_size": 16
+  }
+}
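
Note on `patch_size`: together with the 352 × 352 image size listed in the README, a patch size of 16 fixes how many tokens the vision encoder sees. A small sketch of that arithmetic follows, assuming a standard ViT layout with one class token in front of the patch tokens (an assumption about the backbone, not something this diff states). The function name `vit_token_grid` is illustrative.

```python
def vit_token_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total tokens including one class token)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side + 1


per_side, tokens = vit_token_grid(352, 16)
print(per_side, tokens)  # 22 patches per side, 485 tokens
```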

merges.txt ADDED

The diff for this file is too large to render. See raw diff

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:92019ed53bd73b145328a99a2ff12c0c6ae8f70174bac86774704183b0c05c68
+size 603047096
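
Note on the LFS pointer: the three lines above are a Git LFS pointer rather than the weights themselves. The `oid` is the SHA-256 of the real file and `size` is its length in bytes. A minimal sketch of checking a downloaded copy against the pointer follows, using only the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative, and where the local copy lives is left to the caller.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and hash against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a ~600 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:92019ed53bd73b145328a99a2ff12c0c6ae8f70174bac86774704183b0c05c68
size 603047096
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download would then confirm that the file is complete and unmodified.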

preprocessor_config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "ViTImageProcessor",
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "processor_class": "CLIPSegProcessor",
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 352,
+    "width": 352
+  }
+}
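
Note on the preprocessing values: `rescale_factor` is 1/255, and `image_mean` / `image_std` are applied per RGB channel after rescaling. A minimal NumPy sketch of those two steps follows, assuming the image has already been resized to 352 × 352 (the resize step, `resample: 2`, is omitted here). The function name `rescale_and_normalize` is illustrative, and this is a sketch of the arithmetic rather than the processor's actual implementation.

```python
import numpy as np

# Values copied from preprocessor_config.json above.
IMAGE_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGE_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255


def rescale_and_normalize(pixels_uint8):
    """Map an H x W x 3 uint8 RGB array to a normalized 3 x H x W float32 array."""
    x = pixels_uint8.astype(np.float32) * RESCALE_FACTOR  # do_rescale: [0, 255] -> [0, 1]
    x = (x - IMAGE_MEAN) / IMAGE_STD                       # do_normalize, per channel
    return np.transpose(x, (2, 0, 1))                      # channels-last -> channels-first


white = np.full((352, 352, 3), 255, dtype=np.uint8)
pixel_values = rescale_and_normalize(white)
print(pixel_values.shape, pixel_values[:, 0, 0])
```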

special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "49406": {
+      "content": "<|startoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49407": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "do_lower_case": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "CLIPSegProcessor",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": "<|endoftext|>"
+}
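
Note on the special tokens: per `added_tokens_decoder`, `<|startoftext|>` is ID 49406 and `<|endoftext|>` is ID 49407, with the latter also serving as the pad and unknown token. A minimal sketch of what that implies for one encoded prompt follows. The helper `wrap_and_pad` and the inner token IDs are hypothetical, and real encoding should go through `CLIPSegProcessor` as in the README. The IDs used here follow `tokenizer_config.json`; the `bos_token_id` and `eos_token_id` in `config.json`'s `text_config` are different values.

```python
BOS_ID = 49406          # <|startoftext|>
EOS_ID = 49407          # <|endoftext|>
PAD_ID = EOS_ID         # pad_token is also <|endoftext|>
MODEL_MAX_LENGTH = 77   # model_max_length


def wrap_and_pad(token_ids, target_length):
    """Wrap IDs in BOS/EOS, truncate the body if needed, and pad to target_length."""
    if not 2 <= target_length <= MODEL_MAX_LENGTH:
        raise ValueError("target_length must be between 2 and MODEL_MAX_LENGTH")
    body = list(token_ids)[: target_length - 2]  # leave room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    n_pad = target_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    return ids + [PAD_ID] * n_pad, attention_mask


# 1000 and 2000 are placeholder IDs, not the real encoding of any prompt.
ids, attention_mask = wrap_and_pad([1000, 2000], target_length=6)
print(ids)
print(attention_mask)
```

Because padding reuses the end-of-text ID, the attention mask is the only thing that distinguishes the real end-of-text token from padding.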

vocab.json ADDED

The diff for this file is too large to render. See raw diff