ONNX OneHotEncoder cats_strings Feature-Column Binding Authority Gap

Summary

An ONNX model file contains a runtime-consumed ML operator attribute (ai.onnx.ml.OneHotEncoder.cats_strings) that binds categorical input values to one-hot output column positions before numeric scoring. A crafted ONNX model can mutate cats_strings ordering while keeping downstream LinearClassifier.coefficients byte-identical, causing the same categorical input to activate a different coefficient column and produce a different score and flipped prediction class.

This is not output label substitution. This is not classlabels_strings. This is pre-score categorical feature-column binding: the mutation takes effect before numeric computation.


Affected Product

  • Format: ONNX model file (.onnx)
  • Root operator: ai.onnx.ml.OneHotEncoder
  • Root attribute: cats_strings
  • Downstream consumer: ai.onnx.ml.LinearClassifier (coefficients unchanged)
  • Runtime: onnxruntime Python package

Vulnerability Details

ai.onnx.ml.OneHotEncoder.cats_strings is a runtime-consumed operator attribute that determines the one-hot column position for each categorical input value. When a victim loads and runs an ONNX model:

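import numpy as np
import onnxruntime as ort

# Load the distributed model file; a reordered cats_strings loads without any error.
sess = ort.InferenceSession("model.onnx")
# Outputs are unpacked as the predicted label followed by the per-class scores.
label, scores = sess.run(None, {"input": np.array(["apple"])})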

OneHotEncoder reads cats_strings at runtime and maps each input category to its corresponding output column. The mapping formula is:

one_hot_output[i] = 1.0 if input == cats_strings[i] else 0.0

An attacker who mutates only cats_strings from ["apple", "orange"] to ["orange", "apple"] causes input "apple" to activate column 1 instead of column 0. The downstream LinearClassifier then applies the coefficients at position 1 (intended for "orange") to "apple", producing a different score and a flipped prediction, while the coefficient values themselves remain byte-identical.
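The column swap can be traced by hand without ONNX. The following is a minimal pure-Python sketch of the two steps, assuming a two-class LinearClassifier whose flattened coefficients are laid out class-major (one row per class) and whose intercepts are zero. The helper names one_hot, linear_scores, and predict are illustrative and are not part of onnxruntime.

```python
# Illustrative re-implementation of the two operators; this is NOT the
# onnxruntime code path, only a model of the behaviour described above.

def one_hot(value, cats_strings):
    """Return a one-hot row: position i is 1.0 when value == cats_strings[i]."""
    return [1.0 if value == cat else 0.0 for cat in cats_strings]


def linear_scores(features, coefficients, n_classes):
    """Score each class as a dot product with its row of class-major coefficients.

    Assumption: intercepts are zero, so they are omitted here.
    """
    n_features = len(features)
    return [
        sum(coefficients[c * n_features + j] * features[j] for j in range(n_features))
        for c in range(n_classes)
    ]


def predict(value, cats_strings, coefficients, n_classes=2):
    """Return (winning class index, per-class scores) for one categorical input."""
    scores = linear_scores(one_hot(value, cats_strings), coefficients, n_classes)
    label = max(range(n_classes), key=lambda c: scores[c])
    return label, scores


COEFFICIENTS = [10.0, -10.0, -10.0, 10.0]  # identical in both models

clean = predict("apple", ["apple", "orange"], COEFFICIENTS)
mutant = predict("apple", ["orange", "apple"], COEFFICIENTS)

print("clean :", clean)   # (0, [10.0, -10.0])
print("mutant:", mutant)  # (1, [-10.0, 10.0])
```

Only the order of the category list differs between the two calls; the coefficient list object is literally the same, yet the winning class changes.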


Impact

  • Prediction manipulation: Model prediction flips (label 0 → label 1) for the same categorical input while classifier weights are unchanged.
  • Coefficients unchanged: LinearClassifier.coefficients and intercepts are byte-identical in clean and mutant models. The victim cannot detect manipulation by inspecting coefficient values.
  • No error generated: onnxruntime.InferenceSession loads the mutated cats_strings silently.
  • Scope: Any ONNX model that uses ai.onnx.ml.OneHotEncoder for categorical input preprocessing followed by a linear or other numeric classifier where the attacker can control the distributed ONNX file.

Proof of Concept

Package structure:

clean.onnx:
  OneHotEncoder(cats_strings=["apple","orange"]) → Flatten → LinearClassifier(coef=[10,-10,-10,10])

mutant.onnx:
  OneHotEncoder(cats_strings=["orange","apple"]) ← ONLY CHANGE
  → Flatten → LinearClassifier(coef=[10,-10,-10,10])  ← BYTE-IDENTICAL

Run:

pip install onnx onnxruntime
python reproduce_onnx_onehotencoder_feature_binding_flip.py

Expected final line:

ONNX_ONEHOTENCODER_FEATURE_BINDING_FLIP_CONFIRMED

HF PoC repository: PLACEHOLDER


Runtime Evidence

| Metric | Value |
|---|---|
| Input | "apple" |
| clean cats_strings | ["apple", "orange"] |
| mutant cats_strings | ["orange", "apple"] |
| clean one-hot output | [[1.0, 0.0]] (col0 active) |
| mutant one-hot output | [[0.0, 1.0]] (col1 active) |
| clean coefficients | [10.0, -10.0, -10.0, 10.0] |
| mutant coefficients | [10.0, -10.0, -10.0, 10.0] (identical) |
| clean label | 0 |
| mutant label | 1 |
| clean scores | [10.0, -10.0] |
| mutant scores | [-10.0, 10.0] |
| Prediction flip | 0 → 1, with zero coefficient change |
| clean SHA256 | 218f4a707b387710... |
| mutant SHA256 | 90f97501234161a4... |
| Reproducibility | 5/5 |

Distinctness

| Prior Finding | Root | Mechanism | Distinct |
|---|---|---|---|
| ONNX SVMClassifier / LinearClassifier classlabels_strings | Output label rendering, POST-score | Remaps integer score index to string label after computation is complete | DISTINCT: classlabels_strings is post-inference output routing; cats_strings is pre-score feature-column binding |
| Joblib CountVectorizer vocabulary_ | Joblib pickle binary (.pkl) | NLP token → sparse integer column | DISTINCT: different format, different runtime, different operator class |
| SafeTensors tokenizer.json model.vocab | NLP sidecar JSON file | Token string → embedding row ID in HuggingFace Transformers | DISTINCT: different format (sidecar JSON vs .onnx internal), different framework |
| SafeTensors preprocessor_config.json normalization | CV sidecar JSON file | float image_mean/std/rescale applied to pixel tensor | DISTINCT: different format, different modality (CV float normalization vs categorical one-hot) |
| TFLite NormalizationOptions | FlatBuffer binary (.tflite) | C++ Task Library float normalization metadata | DISTINCT: different format, different runtime, different mechanism |

Key distinctness from classlabels_strings: classlabels_strings remaps a POST-INFERENCE integer class index to an output label string. The numeric score computation is already complete and its result is the same regardless. cats_strings controls a PRE-SCORE mapping that determines which numeric coefficient the input activates, so changing it changes the actual numeric score and prediction class, not just the display label.
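The contrast between the two attributes can be illustrated with a small pure-Python sketch. It assumes the PoC's two-class, class-major coefficient layout with zero intercepts, and it models classlabels_strings as a simple lookup applied to the winning class index. The function names and the label strings "A" and "B" are hypothetical and chosen only for illustration.

```python
def one_hot(value, cats_strings):
    """Pre-score step: bind a category string to a feature column."""
    return [1.0 if value == cat else 0.0 for cat in cats_strings]


def class_scores(features, coefficients, n_classes):
    """Numeric scoring with class-major coefficients (zero intercepts assumed)."""
    n = len(features)
    return [
        sum(coefficients[c * n + j] * features[j] for j in range(n))
        for c in range(n_classes)
    ]


def classify(value, cats_strings, coefficients, classlabels_strings):
    """Run both steps, then apply the post-score label lookup."""
    scores = class_scores(
        one_hot(value, cats_strings), coefficients, len(classlabels_strings)
    )
    winner = max(range(len(scores)), key=lambda c: scores[c])
    return {"scores": scores, "index": winner, "label": classlabels_strings[winner]}


COEF = [10.0, -10.0, -10.0, 10.0]

baseline = classify("apple", ["apple", "orange"], COEF, ["A", "B"])
label_swap = classify("apple", ["apple", "orange"], COEF, ["B", "A"])  # post-score mutation
cats_swap = classify("apple", ["orange", "apple"], COEF, ["A", "B"])   # pre-score mutation

for name, result in [("baseline", baseline), ("label_swap", label_swap), ("cats_swap", cats_swap)]:
    print(name, result)
```

In this sketch the label swap leaves the scores and the winning index untouched and changes only the rendered string, whereas the cats_strings swap changes the scores and the winning index themselves.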


Non-Claims

The following are not claimed:

  • This is not a .onnx binary parser bug (no memory corruption, no buffer overflow)
  • This is not an RCE / ACE / arbitrary code execution finding
  • Scanner bypass is not the primary impact
  • This is not classlabels_strings (output label rendering)
  • This does not claim that no model file content changed; cats_strings is a runtime-consumed operator attribute within the .onnx model file and is the intentionally mutated component

Recommendation

  1. Operator attribute integrity: Bind safety-critical ML operator attributes (cats_strings, string_vocabulary, etc.) to a model-level integrity manifest at save time and verify at load time. cats_strings controls the feature-space semantics received by downstream numeric classifiers and must be treated as security-relevant model state.
  2. Training-time attribute fingerprint: Store and verify a fingerprint of the expected categorical binding attributes as part of the model provenance record. Structural validation of opset and graph shape is insufficient because an attacker can reorder cats_strings entries while preserving graph structure and downstream weight values.
  3. Documentation and warnings: Clearly document that ai.onnx.ml.OneHotEncoder.cats_strings determines which numeric coefficient is applied to each categorical input. Loading tools should warn when categorical binding attributes differ from the trusted model manifest while downstream numeric weights remain unchanged.
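Recommendations 1 and 2 could be prototyped roughly as follows. This is a sketch under stated assumptions, not an existing onnx or onnxruntime feature: the manifest layout, the function names, the "encoder.cats_strings" key, and the choice of SHA-256 over a canonical JSON encoding are all illustrative, and the attribute values are passed in as plain Python data rather than read from a real .onnx file.

```python
import hashlib
import hmac
import json


def fingerprint_bindings(bindings):
    """Hash the ordered categorical binding attributes.

    `bindings` maps a hypothetical "node_name.attribute" key to its ordered
    list of values. Dictionary keys are sorted for a stable encoding, but list
    order is preserved on purpose: reordering entries must change the digest.
    """
    canonical = json.dumps(bindings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_manifest(bindings):
    """Save-time step: record the expected fingerprint."""
    return {"binding_sha256": fingerprint_bindings(bindings)}


def verify_manifest(bindings, manifest):
    """Load-time step: return True only if the bindings match the manifest."""
    return hmac.compare_digest(
        fingerprint_bindings(bindings), manifest["binding_sha256"]
    )


trusted = {"encoder.cats_strings": ["apple", "orange"]}
manifest = make_manifest(trusted)

tampered = {"encoder.cats_strings": ["orange", "apple"]}

print(verify_manifest(trusted, manifest))   # True
print(verify_manifest(tampered, manifest))  # False
```

A check like this only helps if the manifest is stored or signed separately from the model file; an attacker who controls both can simply regenerate the fingerprint. With the onnx Python package, the attribute values could be collected by walking model.graph.node and reading the attributes named cats_strings.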
