simpleseganonymous/SimpleSeg

simpleseganonymous committed on Oct 28, 2025

Commit 97af7df (verified) · 1 parent: d0a3579

Update README.md

README.md +79 -1

@@ -60,4 +60,82 @@ Without introducing any complex architectures or special patterns, we show how e
 | Text4Seg (w/ SAM)| 90.3        | 93.4          | 87.5          | 85.2        | 89.9          | 79.5          | 85.4         | 85.4          | 87.1 |
 | **Decoder-free Models** |             |               |               |             |               |               |              |               |      |
 | Text4Seg         | 88.3        | 91.4          | 85.8          | 83.5        | 88.2          | 77.9          | 82.4         | 82.5          | 85.0 |
-| **SimpleSeg**    | 90.5        | 92.9          | 86.8          | 85.3        | 89.5          | 80.2          | 86.1         | 86.5          | 87.2 |
+| **SimpleSeg**    | 90.5        | 92.9          | 86.8          | 85.3        | 89.5          | 80.2          | 86.1         | 86.5          | 87.2 |
+# Model Usage
+## Inference with 🤗 Hugging Face Transformers
+We recommend Python 3.10, `torch>=2.1.0`, and `transformers==4.48.2` as the development environment.
+```python
+from PIL import Image
+from transformers import AutoModelForCausalLM, AutoProcessor
+model_path = "simpleseganonymous/SimpleSeg"
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    torch_dtype="auto",
+    device_map="auto",
+    trust_remote_code=True,
+)
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+image_path = "./figures/octopus.png"
+image = Image.open(image_path)
+messages = [
+    {"role": "user", "content": [{"type": "image", "image": image_path}, {"type": "text", "text": "Output the polygon coordinates of octopus in the image."}]}
+]
+text = processor.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+inputs = processor(images=image, text=text, return_tensors="pt", padding=True, truncation=True).to(model.device)
+generated_ids = model.generate(**inputs, max_new_tokens=512)
+generated_ids_trimmed = [
+    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+response = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)[0]
+print(response)
+```
+## Decode the polygons and masks from the response string
+```python
+import json
+import re
+import numpy as np
+import pycocotools.mask as mask_utils
+class RegexPatterns:
+    BOXED_PATTERN = r'\\boxed\{([^}]*)\}'
+    BLOCK_PATTERN = r'^```$\r?\n(.*?)\r?\n^```$'
+    NON_NEGATIVE_FLOAT_PATTERN = (
+        r'(?:[1-9]\d*\.\d+|0\.\d+|\d+)'
+    )
+    BBOX_PATTERN = rf'\[\s*({NON_NEGATIVE_FLOAT_PATTERN})\s*,\s*({NON_NEGATIVE_FLOAT_PATTERN})\s*,\s*({NON_NEGATIVE_FLOAT_PATTERN})\s*,\s*({NON_NEGATIVE_FLOAT_PATTERN})\s*\]'
+    POINT_PATTERN = (
+        rf'\[\s*({NON_NEGATIVE_FLOAT_PATTERN})\s*,\s*({NON_NEGATIVE_FLOAT_PATTERN})\s*\]'
+    )
+    POLYGON_PATTERN = rf'\[\s*{POINT_PATTERN}(?:\s*,\s*{POINT_PATTERN})*\s*\]'
+polygon_matches = [
+    m.group(0) for m in re.finditer(RegexPatterns.POLYGON_PATTERN, response, re.DOTALL)
+]
+pred_polygons = []
+for polygon_match in polygon_matches:
+    polygon = json.loads(polygon_match)
+    pred_polygons.append(polygon)
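+# Image size in pixels, taken from the PIL image opened in the inference snippet.
+width, height = image.size
+pred_masks = []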
+for pred_polygon in pred_polygons:
+    pred_polygon = np.array(pred_polygon) * np.array([width, height])
+    rle = mask_utils.frPyObjects(pred_polygon.reshape((1, -1)).tolist(), height, width)
+    mask = mask_utils.decode(rle)
+    mask = np.sum(mask, axis=2, keepdims=True)
+    pred_masks.append(mask)
+pred_mask = np.sum(pred_masks, axis=0)
+pred_mask = pred_mask.sum(axis=2)
+pred_mask = (pred_mask > 0).astype(np.uint8)
+```
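
The decoding step added in this commit can be tried without the model or `pycocotools`. The following is a minimal sketch, not the authors' implementation: it restates the README's number, point, and polygon patterns, parses a hypothetical response string, and fills the polygon with a plain even-odd test. The names `parse_polygons`, `to_pixels`, `rasterize`, and `sample_response` are illustrative assumptions, and the sketch assumes the model emits coordinates normalized to [0, 1], which is what the multiplication by `[width, height]` in the README snippet suggests.

```python
import json
import re

# Same number/point/polygon grammar as the README's RegexPatterns, restated
# here so the sketch is self-contained.
NUM = r'(?:[1-9]\d*\.\d+|0\.\d+|\d+)'
POINT = rf'\[\s*({NUM})\s*,\s*({NUM})\s*\]'
POLYGON = rf'\[\s*{POINT}(?:\s*,\s*{POINT})*\s*\]'


def parse_polygons(response):
    """Return every polygon found in `response` as a list of [x, y] pairs."""
    return [json.loads(m.group(0)) for m in re.finditer(POLYGON, response)]


def to_pixels(polygon, width, height):
    """Scale normalized [x, y] points to pixel coordinates (assumed convention)."""
    return [(x * width, y * height) for x, y in polygon]


def rasterize(polygon_px, width, height):
    """Fill a polygon with the even-odd rule, sampling each pixel center."""
    mask = [[0] * width for _ in range(height)]
    n = len(polygon_px)
    for row in range(height):
        py = row + 0.5
        for col in range(width):
            px = col + 0.5
            inside = False
            for i in range(n):
                x1, y1 = polygon_px[i]
                x2, y2 = polygon_px[(i + 1) % n]
                # Only edges that straddle the horizontal line through the
                # pixel center can be crossed by a ray going right from it.
                if (y1 > py) != (y2 > py):
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if px < x_cross:
                        inside = not inside
            mask[row][col] = int(inside)
    return mask


# Hypothetical response text; real model output may be worded differently.
sample_response = "The octopus: [[0.25, 0.25], [0.75, 0.25], [0.75, 0.75], [0.25, 0.75]]"
polygons = parse_polygons(sample_response)
mask = rasterize(to_pixels(polygons[0], 8, 8), 8, 8)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

The pure-Python fill is quadratic in image size and is only meant to make the polygon-to-mask step visible on a tiny grid; for real images the `pycocotools` path in the README is the practical choice.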