Core ML (coremltools): Model Validation Bypass & Path Traversal
Summary
coremltools 9.0's load_spec() and MLModel.__init__() perform zero structural validation on loaded Core ML models, accepting models with extreme dimensions, weight/dimension mismatches, and invalid spec versions. The MIL proto deserialization's _load_file_value() contains a path traversal vulnerability in BlobFileValue.fileName handling, allowing malicious models to reference files outside the model's weights directory via fileName="..".
- No Structural Validation: load_spec() parses the protobuf and returns immediately with no content checks: no dimension bounds, no weight count verification, no layer structure validation.
- Path Traversal in BlobFileValue.fileName: Incomplete path sanitization via .split("/")[-1] allows ".." to escape the weights directory. Windows backslash paths bypass the forward-slash split entirely.
- Integer Overflow Risk: np.prod(shape) in _restore_np_from_bytes_value() has no bounds checking on dimension values from protobuf.
Unlike ONNX (which provides check_model() and fixed 6 path traversal CVEs in v1.21.0), coremltools has no model validation function and no path containment checks on weight file references.
CVSS 3.1: 7.8 (AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H)
Tested Context
- Package: coremltools
- Version: 9.0
- Python: 3.10
- Date: 2026-05-08
Vulnerability 1 (HIGH): Path Traversal in MIL BlobFileValue.fileName
Location: coremltools/converters/mil/frontend/milproto/load.py:113 (_load_file_value)
def _load_file_value(context, filevalue_spec, dtype):
    if BlobReader is None:
        raise RuntimeError("BlobReader not loaded")
    if not isinstance(filevalue_spec, proto.MIL_pb2.Value.BlobFileValue):
        raise TypeError("Invalid BlobFileValue spec object")
    filename = os.path.join(context.weights_dir,
                            filevalue_spec.fileName.split("/")[-1])  # <-- INCOMPLETE sanitization
    offset = filevalue_spec.offset
    blob_reader = BlobReader(filename)  # <-- Opens file at attacker-influenced path
The defense .split("/")[-1] only extracts the last /-delimited component, which fails in two cases:
- fileName = ".." → split("/")[-1] = ".." → os.path.join(weights_dir, "..") = parent directory
- fileName = "..\\..\\evil" (Windows) → split("/")[-1] = "..\\..\\evil" (backslash not split) → escapes
No os.path.realpath() containment check is performed. The BlobReader (C++) opens the file at the attacker-controlled path without further validation.
Why This Matters
When coremltools loads a .mlpackage (directory-based model), weight files are referenced via BlobFileValue.fileName in the saved_model.pb (protobuf). A malicious .mlpackage can inject path traversal sequences that escape the weights directory. When the model's weights are loaded, BlobReader reads from the attacker-controlled path, enabling arbitrary file read and potential code execution if the file content is interpreted as executable code.
The same vulnerability pattern appears at two additional locations:
- milproto/load.py:318: filename = filevalue_spec.fileName.split("/")[-1]
- milproto/load.py:331: filename = filevalue_spec.fileName.split("/")[-1]
Reproduction (Conceptual; requires macOS for full exploitation)
import coremltools as ct
from coremltools.proto import Model_pb2, MIL_pb2
import tempfile, os
tmpdir = tempfile.mkdtemp()
# Create a model with external weight reference
spec = Model_pb2.Model()
spec.specificationVersion = 1
spec.description.metadata.shortDescription = "malicious"
# Configure mlProgram with blob file values
program = spec.mlProgram
# ... (construct MIL program with BlobFileValue containing fileName="..")
# Save as .mlpackage
weights_dir = os.path.join(tmpdir, "weights")
os.makedirs(weights_dir)
ct.models.utils.save_spec(spec, os.path.join(tmpdir, "evil.mlpackage"),
                          weights_dir=weights_dir)
# Loading the model would trigger _load_file_value
# which opens weights_dir/.. (the parent directory)
model = ct.models.MLModel(os.path.join(tmpdir, "evil.mlpackage"))
Path Traversal Demonstration (Python logic only)
import ntpath
import posixpath

weights_dir = "/tmp/model/weights"

# Case 1: fileName = ".." (split("/")[-1] leaves it unchanged)
basename = "..".split("/")[-1]
result = posixpath.join(weights_dir, basename)
print(posixpath.normpath(result))  # /tmp/model (escapes weights_dir)

# Case 2: Windows backslash bypass (no "/" to split on, so the value is unchanged)
basename = "..\\..\\Windows\\win.ini".split("/")[-1]
result = ntpath.join(weights_dir, basename)  # ntpath applies Windows path rules on any OS
print(ntpath.normpath(result))  # \tmp\Windows\win.ini (escapes weights_dir)
Impact
- Malicious .mlpackage files can reference and read arbitrary files outside the model's weights directory
- Information disclosure through crafted model files
- Potentially arbitrary code execution if read data is interpreted as executable
- Affects ML pipelines that load models from untrusted sources
Vulnerability 2 (MEDIUM): No Structural Validation on Model Load
Location: coremltools/models/utils.py:238-272 (load_spec)
def load_spec(model_path):
    specfile = model_path
    spec = _proto.Model_pb2.Model()
    with open(specfile, "rb") as f:
        spec.ParseFromString(f.read())
    return spec  # <-- No validation whatsoever
load_spec() and MLModel.__init__() perform zero content validation on loaded models:
- No dimension bounds checking: inputChannels = 2^31 - 1 accepted
- No weight count vs dimension verification: 5 weight values for 1M×1M declared dims accepted
- No spec version validation: version 999999 accepted
- No layer structure validation: layers with no type set accepted
- No check_model() equivalent: unlike ONNX, no validation function exists
Reproduction
import coremltools as ct
from coremltools.proto import Model_pb2
import tempfile, os
tmpdir = tempfile.mkdtemp()
# PoC A: Extreme dimensions
spec = Model_pb2.Model()
spec.specificationVersion = 1
nn = spec.neuralNetwork
layer = nn.layers.add()
layer.name = "fc"
layer.input.append("input")
layer.output.append("output")
layer.innerProduct.inputChannels = 2**31 - 1 # Near INT32_MAX
layer.innerProduct.outputChannels = 2**31 - 1
model_path = os.path.join(tmpdir, "extreme.mlmodel")
with open(model_path, "wb") as f:
    f.write(spec.SerializeToString())
loaded = ct.models.MLModel(model_path) # No error!
# PoC B: Weight/dimension mismatch
spec2 = Model_pb2.Model()
spec2.specificationVersion = 1
nn2 = spec2.neuralNetwork
layer2 = nn2.layers.add()
layer2.name = "fc2"
layer2.input.append("input")
layer2.output.append("output")
ip = layer2.innerProduct
ip.inputChannels = 1000000
ip.outputChannels = 1000000
ip.weights.floatValue.extend([1.0, 2.0, 3.0]) # Only 3 values!
model_path2 = os.path.join(tmpdir, "mismatch.mlmodel")
with open(model_path2, "wb") as f:
    f.write(spec2.SerializeToString())
loaded2 = ct.models.MLModel(model_path2) # No error!
# PoC C: Invalid spec version
spec3 = Model_pb2.Model()
spec3.specificationVersion = 999999
model_path3 = os.path.join(tmpdir, "future.mlmodel")
with open(model_path3, "wb") as f:
    f.write(spec3.SerializeToString())
loaded3 = ct.models.MLModel(model_path3) # No error!
Impact
- Malicious models with any structure pass loading without error
- No way to validate model safety before loading (no check_model()/contains_model() equivalent)
- Models with mismatched dimensions can cause crashes or memory exhaustion in downstream processing
- Affects all systems loading Core ML models from untrusted sources
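To make the memory-exhaustion concern concrete, the sketch below compares the storage implied by a layer's declared dimensions with the weights actually present, using the numbers from PoC B. The function name and the 4-byte float32 assumption are illustrative and not part of coremltools.

```python
def declared_vs_actual_bytes(input_channels, output_channels, weight_count,
                             bytes_per_weight=4):
    """Return (declared, actual) weight storage in bytes.

    Hypothetical helper for illustration; assumes float32 weights (4 bytes each).
    """
    # Python ints are arbitrary precision, so this product cannot wrap.
    declared = input_channels * output_channels * bytes_per_weight
    actual = weight_count * bytes_per_weight
    return declared, actual


# Numbers from PoC B: 1,000,000 x 1,000,000 declared, 3 values supplied.
declared, actual = declared_vs_actual_bytes(1_000_000, 1_000_000, 3)
print(f"declared: {declared / 1e12:.0f} TB, actual: {actual} bytes")
```

Any consumer that sizes a buffer from the declared dimensions rather than from the data would request about 4 TB for this model, while the file itself carries only 12 bytes of weights.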
Vulnerability 3 (LOW): Integer Overflow in Dimension Product
Location: coremltools/converters/mil/frontend/milproto/load.py:162 (_restore_np_from_bytes_value)
def _restore_np_from_bytes_value(value, dtype, shape):
    element_num = np.prod(shape)  # <-- No bounds check on shape values
    # ...
    return np.frombuffer(value, types.nptype_from_builtin(dtype)).reshape(shape)
Shape values come directly from protobuf via helper.py:13 (dim.constant.size) with no maximum value check. np.prod() can silently overflow for extreme dimensions, leading to undersized allocation or incorrect reshaping.
Reproduction
import numpy as np

shape = (2**31, 2**31)
print(np.prod(shape))  # 4611686018427387904 (2**62 still fits in int64)

# With extreme dimensions from protobuf:
# dim1.size = 2**32, dim2.size = 2**31
# Product = 2**63, which exceeds INT64_MAX (2**63 - 1) and wraps silently
shape = (2**32, 2**31)
print(np.prod(shape))  # -9223372036854775808
Comparison: ONNX vs Core ML Validation
| Feature | ONNX 1.21.0 | Core ML (coremltools 9.0) |
|---|---|---|
| check_model() function | Yes | No |
| Dimension bounds check | Yes | No |
| Weight size vs dims check | Partial (in check_tensor) | No |
| Path containment for external data | Yes (v1.21.0) | No |
| Spec version validation | Yes (IR version check) | No (only if native libs load) |
| Structural integrity check | Yes | No (validate() is DEBUG-gated) |
| Symlink/hardlink validation | Yes (v1.21.0) | No |
Fixes
1) Validate asset filenames against path traversal:
def _validate_weight_filename(weights_dir, filename):
    """Validate that weight filename does not escape the weights directory."""
    # Reject filenames that are just ".." or contain path separators
    basename = filename.split("/")[-1]
    if basename == ".." or os.sep in basename or "/" in basename:
        raise ValueError(
            f"Invalid weight filename: {filename!r} contains path traversal"
        )
    full_path = os.path.realpath(os.path.join(weights_dir, basename))
    expected_dir = os.path.realpath(weights_dir)
    if not full_path.startswith(expected_dir + os.sep) and full_path != expected_dir:
        raise ValueError(
            f"Weight file path escapes model directory: {full_path}"
        )
    return full_path
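As an alternative to the prefix comparison, a containment check can be sketched with os.path.commonpath, which compares whole path components rather than string prefixes. The helper name is_within_dir is hypothetical and not part of coremltools; the example uses a temporary directory so it runs on any platform. On POSIX a backslash is an ordinary filename character, so the backslash case only escapes on Windows.

```python
import os
import tempfile


def is_within_dir(base_dir, candidate):
    """Return True if candidate resolves to a path strictly inside base_dir.

    Hypothetical helper for illustration. realpath resolves ".." segments
    and symlinks before the comparison.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, candidate))
    if target == base:
        return False  # the directory itself is not a valid weight file
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        return False  # e.g. paths on different drives on Windows


with tempfile.TemporaryDirectory() as model_dir:
    weights_dir = os.path.join(model_dir, "weights")
    os.makedirs(weights_dir)
    print(is_within_dir(weights_dir, "weight.bin"))  # True
    print(is_within_dir(weights_dir, ".."))          # False
    print(is_within_dir(weights_dir, ""))            # False
```

Rejecting the directory itself also covers an empty fileName, which would otherwise resolve to the weights directory.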
2) Add structural validation to load_spec():
# CURRENT_SPEC_VERSION and MAX_DIM are placeholder constants to be defined by the library.

def load_spec(model_path, validate=True):
    spec = _proto.Model_pb2.Model()
    with open(model_path, "rb") as f:
        spec.ParseFromString(f.read())
    if validate:
        _check_model(spec)
    return spec

def _check_model(spec):
    """Validate structural integrity of a Core ML model spec."""
    # Check spec version is supported
    if spec.specificationVersion > CURRENT_SPEC_VERSION:
        raise ValueError(f"Unsupported specification version: {spec.specificationVersion}")
    # Validate neural network layers
    if spec.WhichOneof("Type") == "neuralNetwork":
        for layer in spec.neuralNetwork.layers:
            if layer.WhichOneof("layer") is None:
                raise ValueError("Layer has no type set")
            _validate_layer_params(layer)

def _validate_layer_params(layer):
    """Validate layer parameters for safety."""
    layer_type = layer.WhichOneof("layer")
    if layer_type == "innerProduct":
        ip = layer.innerProduct
        if ip.inputChannels == 0 or ip.outputChannels == 0:
            raise ValueError("InnerProduct channels must be > 0")
        if ip.inputChannels > MAX_DIM or ip.outputChannels > MAX_DIM:
            raise ValueError(f"InnerProduct dimensions exceed max ({MAX_DIM})")
        # Validate weight count matches dimensions
        expected = ip.inputChannels * ip.outputChannels
        if len(ip.weights.floatValue) not in (0, expected):
            raise ValueError(f"Weight count mismatch: got {len(ip.weights.floatValue)}, expected {expected}")
3) Add bounds check on dimension product:
import math

def _restore_np_from_bytes_value(value, dtype, shape):
    # Validate shape
    for dim in shape:
        if dim < 0 or dim > MAX_TENSOR_DIM:
            raise ValueError(f"Dimension {dim} out of valid range")
    # Exact Python-int product; np.prod could wrap before the comparison below
    element_num = math.prod(int(dim) for dim in shape)
    if element_num > MAX_TENSOR_ELEMENTS:
        raise ValueError(f"Tensor element count {element_num} exceeds maximum")
    # ...
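A bound on the element count is only reliable if the count itself cannot wrap. The sketch below computes the product with Python integers via math.prod and, for comparison, emulates signed 64-bit wraparound with ctypes. The function names and the limit value are assumptions for illustration, not coremltools constants.

```python
import ctypes
import math

MAX_TENSOR_ELEMENTS = 2**31  # assumed limit, for illustration only


def checked_element_count(shape, max_elements=MAX_TENSOR_ELEMENTS):
    """Return the exact element count of shape, or raise ValueError.

    Hypothetical helper. Python ints are arbitrary precision, so the
    product cannot wrap before it is compared with the limit.
    """
    dims = [int(d) for d in shape]
    if any(d < 0 for d in dims):
        raise ValueError(f"Negative dimension in shape {tuple(dims)}")
    count = math.prod(dims)
    if count > max_elements:
        raise ValueError(f"Tensor element count {count} exceeds {max_elements}")
    return count


def wrapped_int64_product(shape):
    """Emulate the product in signed 64-bit arithmetic, which wraps on overflow."""
    result = 1
    for d in shape:
        result = ctypes.c_int64(result * int(d)).value
    return result


shape = (2**32, 2**32)
print(wrapped_int64_product(shape))  # 0: 2**64 wraps to zero and would pass a "> max" test
try:
    checked_element_count(shape)
except ValueError as err:
    print(err)
```

A wrapped product of 0 or a negative value slips past a simple upper-bound comparison, which is why the exact count is computed first.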
Notes
- The path traversal vulnerability (Vuln 1) is similar in nature to the ONNX external data path traversal CVEs (GHSA-538c-55jv-c5g9) fixed in ONNX 1.21.0
- coremltools is Apple's reference implementation for the Core ML format; vulnerabilities affect all tools that load Core ML models
- The .mlpackage format (directory with separate weight files) is more susceptible to path traversal than .mlmodel (single file with embedded weights)
- Native components (libcoremlpython, libmilstoragepython, libmodelpackage) are only available on macOS, limiting exploitability of some paths on other platforms
- The validate() method exists in the MIL pipeline but is gated behind the DEBUG=True flag and is never called during model loading