YOLOv8 Segmentation — EdgeFirst Model Zoo

YOLOv8 Segmentation models trained on COCO 2017 (80 classes) and validated on real edge hardware through the EdgeFirst Profiler + Validator pipeline. Each row in the tables below cites the EdgeFirst Studio validation session (v-XXXX) that produced the measurement.

Part of the EdgeFirst Model Zoo.

Training experiment: View on EdgeFirst Studio — dataset, training configuration, metrics, and exported artifacts.

YOLOv8 uses an anchor-free detection head with Distribution Focal Loss (DFL) box regression and is available in detection and instance-segmentation variants.

![x86_64 | Linux](https://img.shields.io/badge/x86__64-Linux-6C757D?style=flat-square)

Reference accuracy — ONNX FP32

Accuracy ceiling for each size, measured against COCO val2017 (5,000 images) with pycocotools. Quantized and compiled artifacts (TFLite INT8, HEF, etc.) are graded against this reference per the EdgeFirst publication rule.

Size	Params	GFLOPs	Box mAP@0.5	Box mAP@0.5-0.95	Mask mAP@0.5	Mask mAP@0.5-0.95	Source
Nano	3.2M	8.9	50.26%	35.44%	47.27%	28.68%	v-6da
Small	11.2M	28.8	59.09%	43.24%	55.62%	34.37%	v-7a4
Medium	25.9M	79.3	64.15%	48.10%	60.69%	37.64%	v-7a7
Large	43.7M	165.7	—	—	—	—	—
XLarge	68.2M	258.5	—	—	—	—	—
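
The reference metrics in the table above are computed with pycocotools. A minimal sketch of that kind of evaluation follows, assuming a COCO ground-truth annotation file and a predictions file in COCO results format. The function names and file arguments are hypothetical and are not the EdgeFirst Validator's actual interface.

```python
def headline_metrics(stats):
    """Map the first two COCOeval.stats entries to percentages.

    pycocotools stores AP@[0.50:0.95] at index 0 and AP@0.50 at index 1.
    """
    return {
        "mAP@0.5-0.95": round(100.0 * stats[0], 2),
        "mAP@0.5": round(100.0 * stats[1], 2),
    }


def evaluate(annotation_file, results_file, iou_type="segm"):
    """Run COCO evaluation; iou_type is "bbox" for boxes or "segm" for masks."""
    # Imported lazily so headline_metrics works without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    gt = COCO(annotation_file)          # ground-truth annotations
    dt = gt.loadRes(results_file)       # predictions in COCO results format
    coco_eval = COCOeval(gt, dt, iouType=iou_type)
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    return headline_metrics(coco_eval.stats)
```

Running `evaluate` once with `iou_type="bbox"` and once with `iou_type="segm"` would yield the box and mask columns respectively.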

On-target validation results

Each row is one EdgeFirst Studio validation session. Click the Source link to inspect the full session — model artifact, dataset version, parameters, per-stage Perfetto trace, and the host hardware description (hostname, kernel version, SoC, NPU, profiler version).

Row conventions in the table below:

Rows whose Δ mask vs FP32 (pp) cell reads ref are the float reference runs against which each quantized or compiled measurement is graded.
Rows without a number under the metric columns are validation sessions that are still in progress, typically a larger size that has not yet been profiled on a given NPU, or a session that has not yet been linked to an ONNX FP32 reference. The Studio Source link tracks the current status.
Rows whose Δ mask vs FP32 (pp) cell carries a ⚠ are below our accuracy expectations for that platform (more than 10 percentage points under the float reference). The numbers are real measurements on real hardware, reproducible from the linked Studio session, and we publish them as-is. We are investigating these results, and the next snapshot of this card will reflect any recovered accuracy.
INT8 rows use the smart (multi-scale split) decoder wherever both smart and logical decoder validations exist — smart is the variant we publish and ship. A dedicated section comparing the smart and logical decoders, using these benchmark results as reference cards, is planned.
End-to-end (ms) is the sequential per-image wall time covering the full pipeline — image load → JPEG decode → preprocess → inference → postprocess. Throughput (FPS) is the measured pipelined rate over the same full pipeline, which normally exceeds 1000 / end-to-end because the runtime overlaps stages across frames.

Size	Platform	Box mAP@0.5	Mask mAP@0.5-0.95	Δ mask vs FP32 (pp)	Inference (ms)	End-to-end (ms)	Throughput (FPS)	Source
Nano	ONNX FP32 (AWS Graviton · 48-core)	50.29%	28.69%	+0.01	39.98	52.37	95.0	v-722
Nano	ONNX FP16 (AWS Graviton)	50.29%	28.67%	-0.01	277.88	360.67	7.0	v-711
Nano	ONNX FP32 (Intel Core i9-13900F · 32-core)	50.28%	28.68%	+0.00	68.44	152.99	23.0	v-6e1
Nano	ONNX FP16 (Intel Core i9-13900F · 32-core)	50.27%	28.69%	+0.01	104.40	207.52	17.0	v-6dc
Nano	ONNX FP32 (CUDA)	50.26%	28.68%	ref	10.79	21.03	243.0	v-6da
Nano	ONNX FP16 (CUDA)	50.27%	28.69%	+0.01	7.95	17.91	333.0	v-6df
Nano	macOS CoreML — Neural Engine (FP16)	49.88%	28.46%	-0.22	2.74	10.21	423.0	v-72c
Nano	macOS CoreML — Metal GPU (FP16)	49.94%	28.47%	-0.21	6.64	16.93	285.5	v-736
Nano	NXP i.MX 8M Plus + VeriSilicon NPU	37.94%	21.74%	-6.94	82.03	123.53	11.0	v-6d3
Nano	NXP i.MX 8M Plus + VeriSilicon NPU	48.10%	27.14%	-1.54	77.65	157.16	11.0	v-6c1
Nano	NXP i.MX 95 + eIQ Neutron NPU	48.04%	27.11%	-1.57	23.07	97.36	29.0	v-7a0
Nano	NXP i.MX 95 + eIQ Neutron NPU	38.09%	21.84%	-6.84	75.34	107.59	48.0	v-6d8
Nano	NXP ARA240 (Kinara DVM)	46.19%	25.84%	-2.84	10.23	47.17	41.0	v-6cb
Nano	Raspberry Pi 5 + Hailo-8L NPU	48.48%	27.61%	-1.07	16.20	35.72	54.0	v-6c4
Nano	NVIDIA Jetson Orin Nano (TensorRT FP16)	50.31%	28.71%	+0.03	5.85	34.69	88.0	v-6c5
Small	ONNX FP32 (AWS Graviton · 48-core)	59.06%	34.31%	-0.06	102.09	115.44	38.0	v-723
Small	ONNX FP32 (CUDA)	59.09%	34.37%	ref	20.57	31.29	151.0	v-7a4
Small	ONNX FP16 (CUDA)	59.09%	34.37%	+0.00	13.35	25.80	222.5	v-7a5
Small	macOS CoreML — Metal GPU (FP16)	58.69%	34.13%	-0.24	21.18	28.02	132.5	v-737
Small	NXP i.MX 8M Plus + VeriSilicon NPU	45.39%	26.94%	-7.43	149.33	189.89	6.0	v-6e0
Small	NXP i.MX 8M Plus + VeriSilicon NPU	57.51%	33.38%	-0.99	144.54	212.64	6.0	v-6d6
Small	NXP i.MX 95 + eIQ Neutron NPU	45.37%	26.80%	-7.57	187.08	206.12	21.0	v-6e2
Small	NXP i.MX 95 + eIQ Neutron NPU	57.48%	33.29%	-1.08	188.43	228.81	21.0	v-6db
Small	NXP ARA240 (Kinara DVM)	54.14%	30.87%	-3.50	16.93	50.20	51.0	v-6d0
Small	Raspberry Pi 5 + Hailo-8L NPU	57.31%	33.08%	-1.29	42.29	59.44	23.0	v-79a
Small	NVIDIA Jetson Orin Nano (TensorRT FP16)	59.05%	34.33%	-0.04	13.49	43.16	96.0	v-6c8
Medium	ONNX FP32 (AWS Graviton · 48-core)	64.15%	37.63%	-0.01	237.88	251.64	16.0	v-724
Medium	ONNX FP32 (CUDA)	64.15%	37.64%	ref	54.87	64.89	62.5	v-7a7
Medium	ONNX FP16 (CUDA)	64.11%	37.63%	-0.01	28.41	40.97	119.0	v-7a8
Medium	macOS CoreML — Neural Engine (FP16)	63.08%	37.10%	-0.54	17.13	23.51	111.0	v-72e
Medium	macOS CoreML — Metal GPU (FP16)	63.12%	37.00%	-0.64	50.37	57.97	58.0	v-738
Medium	NXP i.MX 95 + eIQ Neutron NPU	46.14%	27.71%	-9.93	492.34	512.84	8.0	v-77f
Medium	NXP i.MX 95 + eIQ Neutron NPU	61.68%	36.07%	-1.57	492.62	529.09	8.0	v-77b
Medium	NXP ARA240 (Kinara DVM)	57.81%	33.39%	-4.25	33.79	62.53	35.0	v-798
Medium	Raspberry Pi 5 + Hailo-8L NPU	62.28%	36.48%	-1.16	66.60	83.61	13.0	v-772
Medium	NVIDIA Jetson Orin Nano (TensorRT FP16)	64.11%	37.64%	+0.00	68.26	91.00	58.0	v-766
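
The Δ mask vs FP32 (pp) column and the ⚠ convention can be reproduced from the table values alone. The sketch below is illustrative: the helper names are hypothetical, and the 10 percentage point threshold is taken from the row conventions above.

```python
WARN_THRESHOLD_PP = 10.0  # more than 10 pp under the float reference


def delta_pp(measured_pct, reference_pct):
    """Difference in percentage points, rounded to two decimals as in the table."""
    return round(measured_pct - reference_pct, 2)


def format_delta(measured_pct, reference_pct):
    """Render a delta cell, appending a warning mark when the drop exceeds the threshold."""
    delta = delta_pp(measured_pct, reference_pct)
    flag = " ⚠" if delta < -WARN_THRESHOLD_PP else ""
    return f"{delta:+.2f}{flag}"


# Nano mask mAP@0.5-0.95: reference 28.68% (v-6da), i.MX 8M Plus row 21.74% (v-6d3).
print(format_delta(21.74, 28.68))
```

Each size is compared against its own ONNX FP32 (CUDA) reference row, so the reference value changes between the Nano, Small, and Medium groups.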

Validation pipeline

These results are produced by the EdgeFirst on-target validation pipeline:

EdgeFirst Profiler runs on the target hardware, executes the full inference pipeline (image load → decode → preprocess → inference → postprocess), and emits per-image predictions in EdgeFirst Arrow/Parquet plus a Perfetto trace.
EdgeFirst Validator consumes the predictions and trace, computes pycocotools accuracy metrics and per-stage timing summaries, and publishes the results to the Studio validation session.
EdgeFirst HAL (open source) provides the hardware-accelerated preprocessing and post-decoding primitives used at both validation and deployment time, so the timings measured here reflect the same accelerated paths a production runtime would take.

Inference latency is reported as the on-accelerator inference time. End-to-end latency is the sequential per-image wall time across the full pipeline — image load, JPEG decode, preprocessing, inference, and postprocessing — and throughput is the measured pipelined FPS from the Perfetto trace over that same full pipeline. Throughput generally exceeds 1000 / end-to-end because the runtime overlaps stages across frames.
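
To make the relationship between the two columns concrete, the sketch below compares the sequential rate implied by the end-to-end latency with the measured throughput. The function names are hypothetical; the numbers come from the Nano ONNX FP32 (CUDA) row in the table above.

```python
def sequential_fps(end_to_end_ms):
    """Frames per second if every frame ran the full pipeline back to back."""
    return 1000.0 / end_to_end_ms


def pipeline_gain(throughput_fps, end_to_end_ms):
    """Ratio of the measured pipelined throughput to the sequential rate."""
    return throughput_fps / sequential_fps(end_to_end_ms)


# Nano, ONNX FP32 (CUDA), session v-6da: 21.03 ms end-to-end, 243.0 FPS measured.
print(f"{sequential_fps(21.03):.1f} FPS sequential")
print(f"{pipeline_gain(243.0, 21.03):.2f}x measured over sequential")
```

A ratio above 1 means the pipeline processed more frames per second than a strictly sequential loop could.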

See EdgeFirst Studio for the full validation pipeline.

Downloads

Artifacts are organized by deployment target. Each model file embeds the EdgeFirst edgefirst.json metadata (training session, dataset version, calibration artifact, converter chain) so a single file is sufficient for deployment — no sidecar configuration required.

Per-artifact download links are populated from the Studio artifact registry. To see the live download table, regenerate this card with --studio against an authenticated Studio session.

Inference example (Python)

from edgefirst.hal import Model, TensorImage

# Load the model — embedded edgefirst.json carries labels and decoder config
model = Model("yolov8n-seg-int8.tflite")

# Run inference on an image
image = TensorImage.from_file("image.jpg")
results = model.predict(image)

# Iterate detections
for det in results.detections:
    print(f"{det.label}: {det.confidence:.2f} at {det.bbox}")

EdgeFirst HAL — Hardware abstraction layer with accelerated inference delegates.

Traceability

Every measurement in the tables above is reachable through the EdgeFirst Studio validation framework. The v-XXXX Source link on each row resolves to a public Studio URL of the form:

https://edgefirst.studio/public/validation/v-XXXX/details?mode=charts

The link lands on the Charts view — live system traces (CPU, memory, temperature, power) and per-stage timing recorded during the validation run. The Info and Metrics tabs on the same page carry the configuration and full COCO metric breakdown.

From there, the full provenance chain is one click deeper: training session ID, dataset version, calibration artifact, converter chain (e.g. TFLite quantizer + Neutron compile), validation parameters, and the host hardware description (hostname, kernel version, SoC, NPU, profiler version). The same model file you download from this repository embeds the same chain in its edgefirst.json metadata.
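
As a small convenience, the URL pattern above can be filled in programmatically. This sketch assumes that the ID shown in a row's Source column is the value substituted for v-XXXX; the helper name is hypothetical.

```python
STUDIO_VALIDATION_URL = (
    "https://edgefirst.studio/public/validation/{session}/details?mode=charts"
)


def validation_url(session_id):
    """Return the public Studio Charts URL for a session ID such as "v-6da"."""
    if not session_id.startswith("v-"):
        raise ValueError(f"expected an ID of the form v-XXXX, got {session_id!r}")
    return STUDIO_VALIDATION_URL.format(session=session_id)


print(validation_url("v-6da"))
```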

See also

Model	Task	Link
YOLOv5 Detection	Detection	EdgeFirst/yolov5-det
YOLOv8 Detection	Detection	EdgeFirst/yolov8-det
YOLO11 Detection	Detection	EdgeFirst/yolo11-det
YOLO11 Segmentation	Segmentation	EdgeFirst/yolo11-seg
YOLO26 Detection	Detection	EdgeFirst/yolo26-det
YOLO26 Segmentation	Segmentation	EdgeFirst/yolo26-seg

Train your own with EdgeFirst Studio

Train on your own dataset with EdgeFirst Studio:

Free tier includes YOLO training with automatic INT8 quantization and edge deployment.
Upload datasets via EdgeFirst Recorder or COCO/YOLO format.
AI-assisted annotation with auto-labeling.
CameraAdaptor integration for native sensor format training.
Deploy trained models to edge devices via EdgeFirst Client.

Technical notes

Quantization pipeline

All TFLite INT8 models are produced by EdgeFirst's quantization pipeline:

ONNX export — standard Ultralytics export with simplify=True
TF-wrapped ONNX — box coordinates normalized to [0, 1] inside DFL decode
Split decoder — boxes, scores, and mask coefficients split into separate output tensors so each receives an independent INT8 quantization scale
Smart calibration — calibration samples selected via greedy coverage maximization; the artifact is content-addressed by parameter hash and cached in Studio for deterministic reuse
Full integer INT8 — uint8 input, int8 output, MLIR quantizer
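
The reason for the split decoder step can be illustrated numerically. The sketch below is not EdgeFirst's quantizer: it is a plain per-tensor affine INT8 round trip with made-up values, showing that boxes in [0, 1] lose more precision when they share a scale with wider-range mask coefficients than when they get a scale of their own.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Per-tensor affine INT8 parameters covering the range of values (and 0)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def roundtrip(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize to INT8, then dequantize back to float."""
    out = []
    for v in values:
        q = max(qmin, min(qmax, round(v / scale) + zero_point))
        out.append(scale * (q - zero_point))
    return out


def max_error(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Illustrative values only: normalized boxes and wider-range mask coefficients.
boxes = [0.013, 0.250, 0.507, 0.742, 0.998]
mask_coefs = [-3.8, -1.2, 0.4, 2.9, 4.1]

shared = quant_params(boxes + mask_coefs)  # one scale for a fused output tensor
split = quant_params(boxes)                # boxes get their own scale

err_shared = max_error(boxes, roundtrip(boxes, *shared))
err_split = max_error(boxes, roundtrip(boxes, *split))
print(f"box error, shared scale: {err_shared:.5f}")
print(f"box error, split scale:  {err_split:.5f}")
```

With a dedicated scale, the quantization step for boxes shrinks to roughly 1/255 of the [0, 1] range, which is why normalizing the coordinates and splitting the outputs go together.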

Split decoder output format

Segmentation (e.g. yolov8n-seg):

boxes — (1, 4, 8400) normalized [0, 1] coordinates
scores — (1, 80, 8400) per-class probabilities
mask_coefs — (1, 32, 8400) per-anchor mask coefficients
protos — (1, 160, 160, 32) prototype masks

Each tensor has its own quantization scale and zero point. The EdgeFirst HAL handles dequantization and reassembly automatically; no application code change is required across NPU targets.
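
For readers who want to see what that dequantization and reassembly involves, here is a toy sketch in pure Python. It assumes TFLite-style affine dequantization and the usual YOLOv8 mask formulation (sigmoid of the coefficient-weighted sum of prototypes). It is not the HAL's implementation, and it omits the box cropping and upsampling a full decoder would also perform.

```python
import math


def dequantize(q_values, scale, zero_point):
    """Affine dequantization: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def assemble_mask(coefs, protos, threshold=0.5):
    """Combine one prediction's coefficients with the prototype masks.

    coefs:  K floats (K = 32 in the real model)
    protos: H x W x K nested lists (160 x 160 x 32 in the real model)
    Returns an H x W binary mask.
    """
    mask = []
    for row in protos:
        out_row = []
        for pixel in row:
            logit = sum(c * p for c, p in zip(coefs, pixel))
            prob = 1.0 / (1.0 + math.exp(-logit))
            out_row.append(1 if prob > threshold else 0)
        mask.append(out_row)
    return mask


# Toy example with K = 2 prototypes on a 2 x 2 grid; all numbers are made up.
coefs = dequantize([10, -20], scale=0.1, zero_point=0)
flat = dequantize([64, 0, 0, 64, 64, 64, 0, 0], scale=0.05, zero_point=0)
protos = [[flat[0:2], flat[2:4]], [flat[4:6], flat[6:8]]]
print(assemble_mask(coefs, protos))
```

In the real tensors, the coefficient vector for prediction i would be the 32 values at index i along the last axis of mask_coefs, each tensor dequantized with its own scale and zero point.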

Embedded metadata

TFLite: edgefirst.json and labels.txt embedded in the ZIP-format model file
ONNX: edgefirst.json embedded in model.metadata_props

No sidecar files required; the model artifact is self-contained.
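
A minimal sketch of reading the embedded metadata without any EdgeFirst tooling follows. It assumes the ZIP member and the ONNX metadata key are both named edgefirst.json and hold JSON text, as the names above suggest; the helper names are hypothetical.

```python
import json
import zipfile


def read_tflite_metadata(model_path, member="edgefirst.json"):
    """Read a JSON member from the ZIP archive carried inside a .tflite file.

    The member name is an assumption based on the file name given above.
    """
    with zipfile.ZipFile(model_path) as archive:
        return json.loads(archive.read(member))


def read_onnx_metadata(model_path, key="edgefirst.json"):
    """Read a JSON value from an ONNX model's metadata_props.

    Requires the onnx package; the key name is an assumption.
    """
    import onnx  # imported lazily so the TFLite helper has no extra dependency

    model = onnx.load(model_path)
    props = {p.key: p.value for p in model.metadata_props}
    return json.loads(props[key])
```

Python's zipfile module locates the archive from the end of the file, so it can open a ZIP that follows other data in the same file.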

Limitations

COCO bias — models trained on COCO (80 classes) inherit the dataset's biases (Western-centric scenes, particular object distributions, limited weather/lighting diversity).
INT8 quantization loss — full-integer quantization introduces accuracy loss relative to FP32; the magnitude per platform is shown in the Δ vs FP32 column above.
Input resolution — all models expect 640×640 input; other resolutions require letterboxing.
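
The letterboxing arithmetic for a non-square source image can be sketched as follows. This uses the common centered-padding convention, which is an assumption here; the padding placement used by the EdgeFirst HAL is not stated in this card, and the helper names are hypothetical.

```python
def letterbox_params(src_w, src_h, dst=640):
    """Scale and padding that fit a src_w x src_h image into a dst x dst square.

    Returns (scale, new_w, new_h, pad_left, pad_top). The aspect ratio is kept
    and the remaining border is split evenly between the two sides.
    """
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left = (dst - new_w) // 2
    pad_top = (dst - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def to_source_coords(x_norm, y_norm, params, dst=640):
    """Map a normalized [0, 1] model-space point back to source pixels."""
    scale, _, _, pad_left, pad_top = params
    return ((x_norm * dst - pad_left) / scale, (y_norm * dst - pad_top) / scale)


params = letterbox_params(1280, 720)
print(params)
print(to_source_coords(0.5, 0.5, params))
```

The same parameters are needed in reverse after inference, since the model's normalized box coordinates refer to the padded 640×640 frame rather than the original image.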

Citation

@software{edgefirst_yolov8_seg,
  title = {{YOLOv8 Segmentation — EdgeFirst Model Zoo}},
  author = {Au-Zone Technologies},
  url = {https://huggingface.co/EdgeFirst/yolov8-seg},
  year = {2026},
  license = {Apache-2.0},
}
