
CoreML Class Label Mapping Integrity Gap

Summary

A CoreML classifier stores numeric model parameters (weights, offset) and the class label mapping (stringClassLabels) as independent fields in the protobuf spec. When only the class label mapping is changed, coremltools.models.MLModel() accepts the model without warning, and the CoreML runtime returns a flipped predicted label while numeric probabilities remain unchanged.

Key Evidence

Property                   Value
Target                     Core ML (.mlmodel)
Tool                       coremltools 9.0
Runtime                    CoreML.framework (macOS)
Input                      x = 2.0
Baseline labels            ["benign", "dangerous"]
Tampered labels            ["dangerous", "benign"]
Baseline predicted label   dangerous
Tampered predicted label   benign
Weights changed            No
Probabilities changed      No
Warning emitted            No
WEIGHTS_UNCHANGED=True
STRUCTURE_UNCHANGED=True
CLASS_LABELS_CHANGED=True
PROBABILITIES_UNCHANGED=True
BASELINE_PREDICTED_LABEL=dangerous
TAMPERED_PREDICTED_LABEL=benign
LABEL_FLIP_CONFIRMED=True
WARNING_EMITTED=False
IMPACT_CONFIRMED=True

Mechanism

In a CoreML GLM classifier spec:

message GLMClassifier {
  repeated double weights = ...;
  repeated double offset  = ...;
  oneof ClassLabels {
    StringVector stringClassLabels = ...;
  }
}

The stringClassLabels.vector[] array is stored independently of the numeric classifier parameters. Reversing this array does not alter the underlying sigmoid computation: the model still computes P(class_index_1) = sigmoid(1.5 * x - 0.5) ≈ 0.924 for x = 2.0. However, the CoreML runtime maps the winning index to stringClassLabels[index], so the returned label string differs between models.
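The index-to-label mapping failure can be sketched in plain Python. The weight 1.5 and offset -0.5 are taken from the report above; the predict helper is illustrative only and is not the coremltools or CoreML runtime API:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, labels, weight=1.5, offset=-0.5):
    """Binary GLM: P(labels[1]) = sigmoid(weight * x + offset).

    The numeric computation never looks at the label strings; labels
    only name the output indices, exactly as in the CoreML spec.
    """
    p1 = sigmoid(weight * x + offset)
    probs = {labels[0]: 1.0 - p1, labels[1]: p1}
    winner = max(probs, key=probs.get)
    return winner, probs

# Same weights, same input -- only the label array order differs.
winner_base, probs_base = predict(2.0, ["benign", "dangerous"])
winner_tamp, probs_tamp = predict(2.0, ["dangerous", "benign"])
print(winner_base, winner_tamp)  # dangerous benign
```

The probability values are identical in both runs (approximately 0.924 for the winning index); only the string attached to the winning index changes, reproducing the flip in the evidence table.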

coremltools.models.MLModel() does not bind the integrity of class label strings to classifier weights. There is no load-time or predict-time warning when label strings diverge from their trained semantic meaning.
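One way a loader could close this gap is to fingerprint the label order together with the numeric parameters at export time and re-verify the fingerprint at load time. A minimal stdlib-only sketch; label_binding_digest is a hypothetical helper for illustration, not an existing coremltools API:

```python
import hashlib
import json

def label_binding_digest(weights, offset, labels):
    """SHA-256 over weights, offset, AND label order together, so a
    label-only edit changes the fingerprint even when the numeric
    parameters are untouched (illustrative mitigation sketch)."""
    payload = json.dumps(
        {"weights": weights, "offset": offset, "labels": labels},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Recorded at export time, alongside the model.
trained = label_binding_digest([1.5], [-0.5], ["benign", "dangerous"])

# Recomputed at load time from the (tampered) spec.
loaded = label_binding_digest([1.5], [-0.5], ["dangerous", "benign"])

print(trained != loaded)  # True: the label swap is detected
```

Because the digest covers the fields jointly, a mismatch at load time flags exactly the class of tampering described here, which per-field checks on weights alone cannot catch.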

Files

create_coreml_label_flip.py     Create baseline and tampered .mlmodel files
inspect_coreml_spec.py          Verify structural properties (no runtime required)
reproduce_coreml_runtime.py     Runtime prediction via CoreML.framework (macOS only)
requirements-coreml.txt         Python dependencies
expected_output.txt             Expected output reference
SHA256SUMS_T1.txt               File hashes
baseline.mlmodel                GLM classifier, labels=["benign","dangerous"]
tampered_labels.mlmodel         Same weights, labels=["dangerous","benign"]
input.npy                       Test input x=2.0
label_mapping_diff.json         Label change record

Usage

pip install -r requirements-coreml.txt

# Create models (Linux/macOS)
python create_coreml_label_flip.py --outdir .

# Inspect spec (Linux/macOS)
python inspect_coreml_spec.py baseline.mlmodel tampered_labels.mlmodel

# Runtime prediction (macOS only; requires CoreML.framework)
python reproduce_coreml_runtime.py baseline.mlmodel tampered_labels.mlmodel