---
license: agpl-3.0
library_name: onnx
tags:
  - yolo
  - yolov11
  - object-detection
  - instance-segmentation
  - onnx
  - tensorrt
pipeline_tag: image-segmentation
---

# occurra/object_detection_segmentation

ONNX exports of [Ultralytics YOLOv11-seg](https://github.com/ultralytics/ultralytics)
(instance segmentation) in the configurations the occurra
`object_detection_segmentation` agent ships with. Companion to
[`occurra/object_detection`](https://huggingface.co/occurra/object_detection) —
same class set (person + bicycle + 4 vehicle subtypes), same naming
convention, same hardware-selection logic, with per-object pixel masks
on top of bounding boxes.

Nano size only (no small variant yet), in four export variants: `apple`,
`fp16`, `fp8`, and `int8`. All files are self-contained (no external-data
sidecars).

## Filename convention

```
yolo11n-seg_{apple,fp16,fp8,int8}_640x640.onnx
```

| Token | Meaning |
| ----- | ------- |
| `n-seg` | YOLOv11 nano segmentation variant |
| `apple` | FP16, NMS-free, batch=1, static — CoreML / Apple ANE friendly. uint8 input. |
| `fp16` | FP16 weights, NMS embedded. Default for NVIDIA `TensorRT` EP. |
| `fp8` | FP8 quantized via TensorRT QDQ. Smallest VRAM footprint on Blackwell / Hopper. |
| `int8` | INT8 quantized with QDQ nodes embedded in the graph. No sidecar calibration cache needed. |
| `640x640` | Square input — same shape used by the upstream Ultralytics export. |

The `object_detection_segmentation` agent reads the input shape directly
from the loaded ONNX graph (`graph.input[0].type`, whose
`tensor_type.shape` holds the dimensions). There is no sidecar config, and
the file name is informational only.
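
Because the file name is informational, tooling outside the agent may still want to read its tokens. The following is a minimal sketch of a parser for the convention above; `ModelFileInfo` and `parse_model_filename` are hypothetical names, not part of the agent, and the two numbers in the size token are kept as an unlabelled pair because the card does not say which is width and which is height.

```python
import re
from typing import NamedTuple, Tuple

# Mirrors the convention above: yolo11<size>-seg_<variant>_<A>x<B>.onnx
_FILENAME_RE = re.compile(
    r"^yolo11(?P<size>[a-z])-seg_(?P<variant>apple|fp16|fp8|int8)_"
    r"(?P<dim_a>\d+)x(?P<dim_b>\d+)\.onnx$"
)


class ModelFileInfo(NamedTuple):
    """Hypothetical container for the tokens of a model file name."""
    size: str                    # "n" for nano
    variant: str                 # "apple", "fp16", "fp8" or "int8"
    input_size: Tuple[int, int]  # the two numbers of the size token, in order


def parse_model_filename(name: str) -> ModelFileInfo:
    """Split a model file name into its convention tokens."""
    match = _FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised model file name: {name!r}")
    return ModelFileInfo(
        size=match["size"],
        variant=match["variant"],
        input_size=(int(match["dim_a"]), int(match["dim_b"])),
    )


info = parse_model_filename("yolo11n-seg_fp8_640x640.onnx")
print(info.variant, info.input_size)
```

The shape parsed here is only a hint; the shape stored in the ONNX graph remains the authoritative one.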

## Which file to pick

| Hardware | Recommended |
| -------- | ----------- |
| Apple Silicon (CoreML / ANE) | `yolo11n-seg_apple_640x640.onnx` |
| NVIDIA RTX 40-series and newer (incl. Hopper / Blackwell) | `yolo11n-seg_fp8_640x640.onnx` |
| NVIDIA older (no FP8) | `yolo11n-seg_int8_640x640.onnx` |
| CPU fallback | `yolo11n-seg_fp16_640x640.onnx` |

The agent's `_resolve_model_filename` picks a variant automatically based
on the platform and the GPU compute capability. Set
`OBJECT_DETECTION_SEGMENTATION_MODEL=<filename>` to force a specific
variant.
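
The selection logic described above can be sketched as a pure function. This is an illustration only, not the body of `_resolve_model_filename`: the function name `pick_model_filename`, its parameters, and the FP8 threshold of compute capability 8.9 are assumptions. Only the environment variable name and the hardware table come from this card.

```python
import os
from typing import Mapping, Optional, Tuple

FILENAME_TEMPLATE = "yolo11n-seg_{variant}_640x640.onnx"
ENV_OVERRIDE = "OBJECT_DETECTION_SEGMENTATION_MODEL"

# Assumption: FP8 is treated as usable from compute capability 8.9 upward.
# The threshold the agent actually applies is not stated in this card.
FP8_MIN_COMPUTE_CAPABILITY = (8, 9)


def pick_model_filename(
    system: str,
    machine: str,
    compute_capability: Optional[Tuple[int, int]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a model file name following the hardware table above."""
    env = os.environ if env is None else env
    forced = env.get(ENV_OVERRIDE)
    if forced:
        return forced  # an explicit override wins over any detection
    if system == "Darwin" and machine == "arm64":
        variant = "apple"  # Apple Silicon
    elif compute_capability is None:
        variant = "fp16"   # no NVIDIA GPU reported: CPU fallback
    elif compute_capability >= FP8_MIN_COMPUTE_CAPABILITY:
        variant = "fp8"
    else:
        variant = "int8"
    return FILENAME_TEMPLATE.format(variant=variant)


# An NVIDIA GPU with compute capability 8.6 falls below the assumed FP8 threshold.
print(pick_model_filename("Linux", "x86_64", (8, 6), env={}))
# prints: yolo11n-seg_int8_640x640.onnx
```

In a real caller, `system` and `machine` would typically come from `platform.system()` and `platform.machine()`; how the compute capability is queried depends on the GPU library in use.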

## Outputs

Each ONNX file has two outputs (the standard Ultralytics segmentation layout):

| Output | Shape | Contents |
| ------ | ----- | -------- |
| `output0` | `(batch, 4+80+32, N)` | `[cx, cy, w, h]` + 80 class scores + 32 mask coefficients per anchor |
| `output1` | `(batch, 32, proto_h, proto_w)` | Prototype masks; `coeffs @ protos` (with the prototypes flattened to `(32, proto_h * proto_w)`) reconstructs the per-detection mask logits. |
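
The layout in the table can be unpacked with NumPy roughly as follows. This is a sketch on synthetic tensors, not the agent's `YoloSegOnnx` code: `split_output0` and `decode_masks` are hypothetical helpers, the sigmoid plus 0.5 threshold is one common way to binarise the mask logits, and `N = 8400` with 160x160 prototypes are assumed example sizes.

```python
import numpy as np

NUM_CLASSES = 80
NUM_MASK_COEFFS = 32


def split_output0(output0: np.ndarray):
    """Split (batch, 4+80+32, N) into boxes, class scores and mask coefficients."""
    preds = np.transpose(output0, (0, 2, 1))    # -> (batch, N, 116)
    boxes = preds[..., :4]                      # [cx, cy, w, h]
    scores = preds[..., 4:4 + NUM_CLASSES]      # 80 class scores
    coeffs = preds[..., 4 + NUM_CLASSES:]       # 32 mask coefficients
    return boxes, scores, coeffs


def decode_masks(coeffs: np.ndarray, protos: np.ndarray, threshold: float = 0.5):
    """Combine (K, 32) coefficients with (32, H, W) prototypes into K binary masks."""
    num_protos, proto_h, proto_w = protos.shape
    logits = coeffs @ protos.reshape(num_protos, -1)   # (K, H*W)
    probs = 1.0 / (1.0 + np.exp(-logits))              # sigmoid
    return (probs > threshold).reshape(-1, proto_h, proto_w)


# Synthetic tensors standing in for a real inference result.
rng = np.random.default_rng(0)
output0 = rng.standard_normal((1, 4 + NUM_CLASSES + NUM_MASK_COEFFS, 8400)).astype(np.float32)
output1 = rng.standard_normal((1, NUM_MASK_COEFFS, 160, 160)).astype(np.float32)

boxes, scores, coeffs = split_output0(output0)
masks = decode_masks(coeffs[0, :5], output1[0])  # masks for the first five anchors
print(boxes.shape, scores.shape, coeffs.shape, masks.shape)
```

The masks produced here are at prototype resolution and cover the whole frame; cropping each mask to its box and resizing to the input size are left out of the sketch.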

The agent runs NMS in Python after filtering to the curated class set
(COCO 0/1/2/3/5/7 → person, bicycle, car, motorcycle, bus, truck) and
decodes masks in `YoloSegOnnx`. Bitplane bytes are passed to the C++
toolbox for denoising + RLE encoding.
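
The class filtering and NMS step can be illustrated in plain Python. This is a minimal sketch rather than the agent's implementation: the helper names, the score and IoU thresholds, and the choice of per-class (rather than class-agnostic) suppression are all assumptions. Only the COCO-to-label mapping comes from the text above.

```python
from typing import Dict, List, Tuple

# COCO class index -> curated label, as listed above.
CURATED_CLASSES: Dict[int, str] = {
    0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 5: "bus", 7: "truck",
}

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def cxcywh_to_xyxy(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a centre/size box (as in output0) to corner form."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-form boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(
    detections: List[Tuple[Box, int, float]],
    score_threshold: float = 0.25,  # assumed value
    iou_threshold: float = 0.5,     # assumed value
) -> List[Tuple[Box, str, float]]:
    """Keep curated classes above the score threshold, then apply greedy NMS."""
    candidates = sorted(
        (d for d in detections if d[1] in CURATED_CLASSES and d[2] >= score_threshold),
        key=lambda d: d[2],
        reverse=True,
    )
    kept: List[Tuple[Box, int, float]] = []
    for box, cls, score in candidates:
        # Drop a box that overlaps a higher-scoring kept box of the same class.
        if all(k[1] != cls or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, cls, score))
    return [(box, CURATED_CLASSES[cls], score) for box, cls, score in kept]


dets = [
    (cxcywh_to_xyxy(100, 100, 50, 50), 2, 0.90),   # car
    (cxcywh_to_xyxy(102, 101, 50, 50), 2, 0.80),   # near-duplicate of the car
    (cxcywh_to_xyxy(300, 200, 40, 80), 0, 0.70),   # person
    (cxcywh_to_xyxy(300, 200, 40, 80), 16, 0.95),  # COCO 16 is outside the curated set
]
for box, label, score in filter_and_nms(dets):
    print(label, score)
```

Filtering to the curated classes before suppression keeps the candidate list short, which matters because greedy NMS is quadratic in the number of surviving boxes.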

## Source

Ultralytics `yolo11n-seg.pt` checkpoints were downloaded from Ultralytics'
release feed and re-exported via the occurra toolbox's
`ai_agent_toolbox/agents/python/object_detection_segmentation/scripts/main.py`
(NMS-free for Apple, with NMS for NVIDIA; FP8/INT8 use TensorRT QDQ).

## License

The model weights inherit Ultralytics YOLOv11's
[AGPL-3.0](https://github.com/ultralytics/ultralytics/blob/main/LICENSE)
license. Commercial use requires a separate enterprise license from
Ultralytics — the ONNX export does not change that.