# YOLOv8n OIV7 — CoreML

A CoreML conversion of Ultralytics' YOLOv8 nano model trained on Open Images V7, packaged for on-device inference on iPhone, iPad, and Apple Silicon Macs. Detects 601 object classes at ~10 FPS average on iPhone 12 and newer.

This is the production model shipped inside RealTime AI Camera — the same `.mlpackage` that runs in the App Store binary, byte-for-byte.

## Files

| File | Format | Size | Purpose |
|------|--------|------|---------|
| `yolov8n_oiv7.mlpackage` | CoreML (mlpackage) | 6.8 MB | The model — drop straight into an Xcode project |
| `yolov8n-oiv7.pt` | PyTorch | 7.2 MB | Source weights from Ultralytics — for re-export to ONNX/TFLite/etc. |
| `class_names.txt` | Plain text | 5.6 KB | All 601 OIV7 class labels, one per line, in model output order |
| `LICENSE` | — | — | Dual GPL-3.0 / commercial license |

## Why this exists

YOLOv8 + Open Images V7 gives you 601 detection classes — roughly 8× the 80-class COCO baseline that ships in most iOS object detection demos. Categories range from Accordion and Alpaca to Woodpecker, Wrench, and Zucchini — a far better fit for general-purpose camera apps than COCO.

Ultralytics distributes the original PyTorch weights (`.pt`), but no official CoreML build exists on the Hub for OIV7. This repo fills that gap so iOS developers can skip the conversion step and ship a 601-class detector in minutes — instead of the months it took to figure out the conversion the right way (see Conversion Notes below).

## Performance

Measured inside RealTime AI Camera across iPhone X and newer:

| Device | Avg FPS | Notes |
|--------|---------|-------|
| iPhone X | ~7-8 | A11 Bionic, no Neural Engine optimization for newer ops |
| iPhone 12 | ~10 | A14, Neural Engine accelerated |
| iPhone 14 Pro | ~12-15 | A16, fully optimized path |
| iPhone 15 Pro / 16 | ~15+ | A17 Pro / A18 |

Inference uses CoreML + Metal Performance Shaders + the Apple Neural Engine on every supported chip. Frame rate auto-throttles based on thermal state to protect battery life and avoid thermal spikes.
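The throttling policy itself is simple to sketch (hypothetical thresholds shown here; the shipped app's exact values may differ):

```python
# Hypothetical mapping from ProcessInfo.ThermalState names to an inference FPS cap.
# Thresholds are illustrative, not the app's actual values.
THERMAL_FPS_CAP = {"nominal": 30, "fair": 20, "serious": 10, "critical": 5}

def target_fps(thermal_state: str) -> int:
    """Return the inference frame-rate cap for the current thermal state."""
    return THERMAL_FPS_CAP.get(thermal_state, 10)  # default conservatively
```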

## Usage

### Swift / iOS (CoreML)

Drag `yolov8n_oiv7.mlpackage` into your Xcode project. Xcode auto-generates the `yolov8n_oiv7` Swift class:

```swift
import CoreML
import Vision

let config = MLModelConfiguration()
config.computeUnits = .all   // CPU + GPU + Neural Engine
let model = try yolov8n_oiv7(configuration: config)
let visionModel = try VNCoreMLModel(for: model.model)

let request = VNCoreMLRequest(model: visionModel) { request, error in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    for obs in results {
        print(obs.labels.first?.identifier ?? "?", obs.confidence, obs.boundingBox)
    }
}

let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
try handler.perform([request])
```

For a full production-grade integration with frame skipping, thermal management, object tracking, and Metal-accelerated preprocessing, see `YOLOv8Processor.swift`.

### Python (PyTorch)

The `.pt` file is the standard Ultralytics format:

```python
from ultralytics import YOLO

model = YOLO("yolov8n-oiv7.pt")
results = model("image.jpg")
results[0].show()
```

### Re-exporting to other formats

```python
from ultralytics import YOLO

model = YOLO("yolov8n-oiv7.pt")

model.export(format="onnx")        # ONNX
model.export(format="tflite")      # TensorFlow Lite
model.export(format="coreml")      # CoreML (will produce a similar .mlpackage)
```

## Classes

601 categories from Open Images V7. See `class_names.txt` for the complete ordered list. A small sample:

```
Accordion, Aircraft, Airplane, Alpaca, Antelope, Backpack,
Banana, Bear, Bicycle, Boat, Book, Bottle, Bowl, Box, Bread,
... 580+ more ...
Woodpecker, Worm, Wrench, Zebra, Zucchini
```

The model output index matches the line number (0-indexed) in `class_names.txt`.


## Conversion Notes — PyTorch → CoreML, the Right Way

The naive path is `from ultralytics import YOLO; YOLO("yolov8n-oiv7.pt").export(format="coreml")`. It produces a `.mlpackage` that loads in Xcode, runs without errors, and is almost never what you actually want. This section documents what actually works in production after months of debugging.

If you want to skip all of this, just drop `yolov8n_oiv7.mlpackage` from this repo into your Xcode project. If you want to understand it (or convert your own variant — yolov8s, m, l, x), read on.

### 1. Use `nms=True` or you're writing the decoder yourself

Problem. A naive YOLOv8 export emits a raw tensor of shape [1, 605, 8400] for OIV7 (4 box coordinates + 601 class probabilities, across 8400 anchor positions). Vision's `VNCoreMLRequest` will not automatically give you `VNRecognizedObjectObservation`s from this — it just hands you an `MLMultiArray`, and you have to decode the boxes, run non-max suppression, and map class indices to labels in Swift. That's slow, error-prone, and burns CPU cycles you don't have at 30 FPS.
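To make the cost concrete, here is roughly the decoder you would otherwise have to write and then port to Swift: a NumPy sketch assuming the [1, 4+C, N] layout above, with cxcywh boxes and illustrative thresholds.

```python
import numpy as np

def decode_raw(pred, conf_thresh=0.25, iou_thresh=0.45):
    """Decode a raw YOLOv8-style head output of shape [1, 4+C, N]."""
    pred = pred[0]                      # [4+C, N]
    boxes_cxcywh = pred[:4].T           # [N, 4] center+size boxes
    scores = pred[4:].T                 # [N, C] per-class scores
    cls = scores.argmax(axis=1)
    conf = scores.max(axis=1)
    keep = conf > conf_thresh           # confidence filter
    boxes, cls, conf = boxes_cxcywh[keep], cls[keep], conf[keep]
    # cxcywh -> xyxy
    xy1 = boxes[:, :2] - boxes[:, 2:] / 2
    xy2 = boxes[:, :2] + boxes[:, 2:] / 2
    boxes = np.hstack([xy1, xy2])
    # Greedy per-class non-max suppression
    order = conf.argsort()[::-1]
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        rest = order[1:]
        same = cls[rest] == cls[i]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[~(same & (iou > iou_thresh))]  # drop same-class overlaps
    return boxes[kept], cls[kept], conf[kept]
```

Porting this (plus all its off-by-one and layout pitfalls) to Swift, and making it fast, is exactly the work `nms=True` makes unnecessary.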

Fix. Export with NMS baked into the model so Vision returns `VNRecognizedObjectObservation`s directly:

```python
from ultralytics import YOLO

model = YOLO("yolov8n-oiv7.pt")
model.export(format="coreml", nms=True, imgsz=640)
```

`nms=True` is the difference between "I'm getting a 605×8400 multi-array, what do I do" and "Vision just gave me a clean array of detections with labels and confidences." Always use it for iOS.

### 2. Get the weights and tensor alignment right — this is where most of the months go

Problem. "It exported, why doesn't it work" is almost always a tensor alignment bug. PyTorch and CoreML disagree about a lot of conventions, and the conversion tool silently picks defaults that often happen to be wrong. There are six independent things that all have to be right at the same time, and getting any one of them wrong gives you a model that loads, runs, and produces garbage — never an error message. You spend weeks thinking it's an accuracy problem when it's an alignment problem.

The six things, all of which must agree:

| # | Convention | PyTorch (YOLOv8) | CoreML / iOS Vision | Failure mode if wrong |
|---|------------|------------------|---------------------|-----------------------|
| a | Pixel value range | [0, 1] (normalized from uint8) | Whatever you set in `ImageType(scale=...)` | Zero detections everywhere — model sees pixel values 0–255 instead of 0–1 |
| b | Channel order | RGB | `ImageType` defaults to RGB but tracing can flip it | Detects color-insensitive classes only; fruit/lights/clothing wrong |
| c | Tensor layout | NCHW (batch, channels, H, W) | NHWC for ANE; NCHW for GPU/CPU fallback | Falls off the Neural Engine silently → 2-3× slower |
| d | Box coordinate origin | Top-left, normalized [0, 1] | Vision wants bottom-left, normalized [0, 1] | Boxes appear upside-down in your overlay |
| e | Box format | (cx, cy, w, h) center+size | Vision wants (x, y, w, h) top-left+size | Boxes drift from objects, get larger than expected |
| f | Input type | `torch.Tensor` of shape [1, 3, 640, 640] | `ImageType` (not MultiArray) for Vision compatibility | Vision can't accept the model — you have to feed `MLMultiArray` manually |
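As a Python-side reference for conventions (a), (b), and (c): this sketch builds the exact tensor the traced PyTorch model expects from a BGR camera-style frame. In the converted model, (a) and (b) are handled by `ImageType`, so treat this as the ground truth your conversion settings must reproduce.

```python
import numpy as np

def preprocess(frame_bgr_u8: np.ndarray) -> np.ndarray:
    """HWC uint8 BGR frame -> NCHW float32 RGB in [0, 1]."""
    rgb = frame_bgr_u8[:, :, ::-1]          # (b) BGR -> RGB channel swap
    chw = rgb.transpose(2, 0, 1)            # (c) HWC -> CHW layout
    x = chw.astype(np.float32) / 255.0      # (a) [0, 255] -> [0, 1]
    return x[None, ...]                     # add batch dim -> [1, 3, H, W]
```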

Fix. Specify everything explicitly at conversion time. Don't trust defaults. The full incantation that gets all six right for YOLOv8n on iOS:

```python
import coremltools as ct
import torch
from ultralytics import YOLO

# Load PyTorch model and trace it
model = YOLO("yolov8n-oiv7.pt").model
model.eval()
example = torch.rand(1, 3, 640, 640)  # NCHW, [0, 1]
traced = torch.jit.trace(model, example)

# Read class labels in model output order
labels = open("class_names.txt").read().strip().splitlines()

# Convert with EVERY alignment knob set explicitly
mlmodel = ct.convert(
    traced,
    inputs=[
        ct.ImageType(
            name="image",
            shape=(1, 3, 640, 640),
            scale=1/255.0,                  # (a) uint8 -> [0, 1]
            bias=[0, 0, 0],                 # YOLOv8 doesn't use ImageNet mean
            color_layout=ct.colorlayout.RGB # (b) explicitly RGB, not BGR
        )
    ],
    classifier_config=ct.ClassifierConfig(class_labels=labels),  # bake labels
    compute_units=ct.ComputeUnit.ALL,
    minimum_deployment_target=ct.target.iOS15,
    convert_to="mlprogram",                 # mlpackage format, not legacy mlmodel
)

# Save with metadata
mlmodel.author = "Matt Macosko"
mlmodel.short_description = "YOLOv8n trained on Open Images V7 (601 classes)"
mlmodel.input_description["image"] = "Input image, will be resized to 640x640"
mlmodel.save("yolov8n_oiv7.mlpackage")
```

Then handle the box convention mismatch (d, e) in Swift after you get the Vision results — Vision actually flips the Y-axis for you when you use `VNRecognizedObjectObservation`, but only if the model was exported with `nms=True` (step 1). If you're decoding `MLMultiArray` output yourself, you have to flip Y manually:

```swift
// Vision gives you boundingBox with origin at bottom-left, normalized [0, 1].
// To draw on a UIView (top-left origin), flip Y:
let visionBox = observation.boundingBox  // bottom-left origin
let viewBox = CGRect(
    x: visionBox.minX * viewWidth,
    y: (1 - visionBox.maxY) * viewHeight,   // ← the Y flip
    width: visionBox.width * viewWidth,
    height: visionBox.height * viewHeight
)
```

This is the single biggest source of "I exported, the model loads, but everything is broken" — and there is no error. Just empty results, or boxes in the wrong place, or boxes for the wrong objects. Get all six right at once and the model snaps into place. Get any one wrong and you can't tell which one without methodically toggling each.
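In Python terms, the (d) plus (e) conversion is tiny once you can see it. A hypothetical helper, with all coordinates normalized to [0, 1]:

```python
def model_box_to_vision(cx: float, cy: float, w: float, h: float):
    """Convert a model box (center+size, top-left origin) to Vision's
    convention (top-left corner + size, bottom-left origin)."""
    x = cx - w / 2              # (e) center+size -> corner+size
    y_top = cy - h / 2
    y = 1.0 - (y_top + h)       # (d) top-left origin -> bottom-left origin
    return (x, y, w, h)
```

Note the composition order: convert the box format first, then flip the origin; doing the flip on the center coordinate instead gives subtly shifted boxes.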

### 3. Letterbox your input — Vision's default centerCrop is silently wrong

Problem. YOLOv8 was trained on 640×640 inputs that were letterboxed (aspect-preserved with gray padding). Vision's default for `VNCoreMLRequest.imageCropAndScaleOption` is `.centerCrop`, which crops to the model's input size. On a 1920×1080 portrait camera frame, that means you lose the top and bottom of the image, and detections at frame edges silently disappear. You won't get an error — you just notice your model is "kinda working but missing things."

Fix. Either:

- Set `imageCropAndScaleOption = .scaleFit` on the request (Vision will pad with black instead of cropping), OR
- Letterbox the pixel buffer yourself in Metal before handing it to Vision. The Metal path is faster because you can fuse it with the BGRA-to-RGB conversion (see step 4) and color-space normalization in a single shader pass.
In RealTime AI Camera, `MetalImageResizer.swift` + `Shader.metal` do this — letterbox + BGRA→RGB + normalize in one Metal compute pass before the Vision request. Cuts preprocessing latency from ~3 ms to ~0.4 ms per frame.
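For reference, the letterbox math the Metal shader implements, as a NumPy sketch. It uses nearest-neighbor indexing to stay dependency-free (the GPU version uses proper sampling); 114 is the gray pad value YOLOv8 uses in training, and the returned scale/offset is what you need later to map boxes back to the full frame.

```python
import numpy as np

def letterbox(img, size=640, pad_value=114):
    """Aspect-preserving resize + gray pad to size x size, YOLOv8-style."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = round(h * scale), round(w * scale)
    # Nearest-neighbor resize via index mapping (illustrative, not production)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((size, size, img.shape[2]), pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2   # center the content
    out[top:top + nh, left:left + nw] = resized
    return out, scale, (top, left)
```

To undo it for box drawing: subtract the (left, top) offset from predicted coordinates, then divide by `scale`.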

### 4. The iOS camera hands you BGRA. YOLO wants RGB.

Problem. `AVCaptureVideoDataOutput` defaults to `kCVPixelFormatType_32BGRA`. YOLOv8 was trained on RGB. If you feed BGRA directly into a model expecting RGB, you don't crash — you get a model that kind of works for color-insensitive classes (people, vehicles) and silently fails for color-sensitive ones (fruit, traffic lights, anything where red↔blue matters). You'll spend a week thinking it's a model accuracy problem when it's a channel order bug.

Fix. In practice `AVCaptureVideoDataOutput` only offers BGRA and YUV pixel formats (BGRA is the hardware-native iOS format, and there is no RGBA capture option), so swap channels in Metal during preprocessing — it's effectively free, since you're already touching every pixel for the resize.

### 5. Pin your coremltools version

Problem. The PyTorch → CoreML conversion path has changed materially between coremltools 5, 6, 7, and 8. Each version produces `.mlpackage`s with different Neural Engine op coverage. Ultralytics' exporter wraps a specific coremltools version, and upgrading either side independently is the fastest way to break a working model.

Fix. Pin all three:

```
ultralytics==8.2.x
coremltools==7.2
torch==2.2.x
```

When you find a combination that produces a model that runs on ANE end-to-end on your target device, freeze it. Document the exact versions in your repo. Don't `pip install --upgrade` casually.

### 6. Neural Engine dispatch is invisible — profile it

Problem. You set `MLModelConfiguration.computeUnits = .all` and trust CoreML to use the Neural Engine. CoreML will silently fall back to GPU or CPU for any operator it can't run on ANE, and you don't get a warning. You only notice because the model is 2-3× slower than you expected, or because the device gets uncomfortably hot.

Fix. Profile with Instruments → the Core ML template. The trace shows you exactly which ops ran on ANE vs GPU vs CPU and which layer caused a fallback. Common YOLOv8 culprits: certain reshape patterns, the Sigmoid after the classification head on older chips, and any custom NMS ops. If you find a fallback, try re-exporting with a different op pattern (or a different pinned coremltools) so the layer lowers to an ANE-supported variant.

You can also force `computeUnits = .cpuAndNeuralEngine` to exclude the GPU and see whether your model can run ANE-only. If it can't, Instruments will tell you exactly where it falls off.

### 7. FP16 is the sweet spot — int8 trashes small-object accuracy

Problem. It's tempting to int8-quantize for the smaller binary and the (claimed) ANE speedup. For YOLOv8 nano specifically, int8 quantization measurably hurts mAP on small objects, which the nano variant already struggles with compared to s/m/l/x. You'll save 2 MB of disk and lose 5-10% mAP on the long tail of classes.

Fix. FP16 is the right answer for YOLOv8n on iOS. It's already FP16 in the default mlprogram export — pass `half=True` to the Ultralytics exporter, or `compute_precision=ct.precision.FLOAT16` if you're calling `ct.convert` yourself. Skip int8 unless you've measured mAP on a held-out set and you're sure you can afford the accuracy loss.
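A toy illustration of the gap, using a hypothetical symmetric per-tensor int8 scheme. This measures round-trip error on weight-scale values rather than mAP, but it shows why FP16 is the safer default: at conv-weight magnitudes, int8's quantization step dwarfs FP16's rounding error.

```python
import numpy as np

def int8_roundtrip(w: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 quantize -> dequantize (illustrative scheme)."""
    scale = np.abs(w).max() / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, 10_000).astype(np.float32)  # conv-weight-scale values

# Max absolute reconstruction error for each precision
fp16_err = np.abs(w - w.astype(np.float16).astype(np.float32)).max()
int8_err = np.abs(w - int8_roundtrip(w)).max()
# int8's worst case is ~half a quantization step, roughly an order of
# magnitude above FP16's rounding error at this magnitude.
```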

### 8. App Store review wants a privacy manifest

Problem. Once you finally have a working model and ship it, App Review pushes back asking for a `PrivacyInfo.xcprivacy` file that declares your app's data use. They specifically care that the model runs on-device (no data leaves the phone) and that you're not collecting user images.

Fix. Add a `PrivacyInfo.xcprivacy` to your Xcode project that declares:

- `NSPrivacyTracking = false`
- No tracking domains
- Any required-reason APIs you use (the camera permission string itself, `NSCameraUsageDescription`, lives in Info.plist)
- Explicitly: no image data collection

See `PrivacyInfo.xcprivacy` in the RealTime AI Camera repo for a complete working example.

## TL;DR

```python
from ultralytics import YOLO

YOLO("yolov8n-oiv7.pt").export(
    format="coreml",
    nms=True,           # Vision returns VNRecognizedObjectObservation directly
    imgsz=640,          # match training resolution
    half=True,          # FP16, ANE-friendly
    int8=False,         # int8 trashes small-object mAP for the nano variant
)
```

Then in your iOS app: letterbox in Metal, swap BGRA→RGB during the same pass, set `computeUnits = .all`, and profile with Instruments to confirm ANE dispatch.

That's the difference between a model that "works" and a model that ships at 10+ FPS on a four-year-old phone.


## License

Dual licensed:

  1. GPL-3.0 for open source / non-commercial use
  2. Commercial license required for App Store / TestFlight / Google Play / any other commercial distribution. Contact Matt Macosko via nicedreamzwholesale.com for commercial licensing.

These terms inherit from the upstream Ultralytics license. Any redistribution must preserve the attribution section in LICENSE and credit "Matt Macosko" in any public use.


## Acknowledgments

- Ultralytics for YOLOv8 and the OIV7 training run
- Google for Open Images V7
- Apple for CoreML, Metal, and the Neural Engine

Built and shipped by NiceDreamzApps · Privacy-first · 100% on-device
