ONNX OneHotEncoder cats_strings Feature-Column Binding Authority Gap

Summary

An ONNX model file contains an ML operator attribute consumed at runtime (ai.onnx.ml.OneHotEncoder.cats_strings) that binds categorical input values to one-hot output column positions before numeric scoring. A crafted ONNX model can reorder cats_strings while keeping the downstream LinearClassifier.coefficients byte-identical, so the same categorical input activates a different coefficient column, yielding a different score and a flipped prediction class.

This is not output label substitution. This is not classlabels_strings. This is pre-score categorical feature-column binding — the mutation happens before numeric computation.

Affected Product

Format: ONNX model file (.onnx)
Root operator: ai.onnx.ml.OneHotEncoder
Root attribute: cats_strings
Downstream consumer: ai.onnx.ml.LinearClassifier (coefficients unchanged)
Runtime: onnxruntime Python package

Vulnerability Details

ai.onnx.ml.OneHotEncoder.cats_strings is a runtime-consumed operator attribute that determines the one-hot column position for each categorical input value. When a victim loads and runs an ONNX model:

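import onnxruntime as ort
import numpy as np

# Load the distributed model; operator attributes such as cats_strings come from the file.
sess = ort.InferenceSession("model.onnx")
# Score a single categorical value; the classifier returns the label and the per-class scores.
label, scores = sess.run(None, {"input": np.array(["apple"])})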

OneHotEncoder reads cats_strings at runtime and maps each input category to its corresponding output column. The mapping formula is:

one_hot_output[i] = 1.0 if input == cats_strings[i] else 0.0

An attacker who changes only cats_strings from ["apple", "orange"] to ["orange", "apple"] causes input "apple" to activate column 1 instead of column 0. The downstream LinearClassifier then applies the column-1 coefficients (intended for "orange") to "apple", producing a different score and a flipped prediction, while the coefficient values themselves remain byte-identical.
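The remapping can be reproduced without any ONNX tooling. The following is a minimal pure-Python sketch, not the onnxruntime implementation: `one_hot`, `linear_scores`, and `predict` are hypothetical helpers, and the class-major coefficient layout (one row of weights per class) with zero intercepts is an assumption chosen to match the values reported under Runtime Evidence.

```python
def one_hot(value, cats_strings):
    """Return 1.0 at the position whose category equals value, else 0.0."""
    return [1.0 if value == cat else 0.0 for cat in cats_strings]


def linear_scores(features, coefficients, intercepts):
    """Score each class as dot(features, class_row) + intercept.

    Assumes a class-major layout: coefficients holds one row of
    len(features) weights per class, flattened into a single list.
    """
    n = len(features)
    scores = []
    for c, bias in enumerate(intercepts):
        row = coefficients[c * n:(c + 1) * n]
        scores.append(sum(w * x for w, x in zip(row, features)) + bias)
    return scores


def predict(value, cats_strings, coefficients, intercepts):
    """Return (index of the highest score, list of per-class scores)."""
    scores = linear_scores(one_hot(value, cats_strings), coefficients, intercepts)
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, scores


COEFFICIENTS = [10.0, -10.0, -10.0, 10.0]  # identical in both models
INTERCEPTS = [0.0, 0.0]                    # assumed zero for illustration

clean = predict("apple", ["apple", "orange"], COEFFICIENTS, INTERCEPTS)
mutant = predict("apple", ["orange", "apple"], COEFFICIENTS, INTERCEPTS)

print("clean :", clean)   # (0, [10.0, -10.0])
print("mutant:", mutant)  # (1, [-10.0, 10.0])
```

Only the category list differs between the two calls; the coefficient list is the same object in both.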

Impact

Prediction manipulation: Model prediction flips (label 0 → label 1) for the same categorical input while classifier weights are unchanged.
Coefficients unchanged: LinearClassifier.coefficients and intercepts are byte-identical in clean and mutant models. The victim cannot detect manipulation by inspecting coefficient values.
No error generated: onnxruntime.InferenceSession loads the mutated cats_strings silently.
Scope: Any ONNX model that uses ai.onnx.ml.OneHotEncoder for categorical input preprocessing followed by a linear or other numeric classifier, in any setting where the attacker can control the distributed ONNX file.

Proof of Concept

Package structure:

clean.onnx:
  OneHotEncoder(cats_strings=["apple","orange"]) → Flatten → LinearClassifier(coef=[10,-10,-10,10])

mutant.onnx:
  OneHotEncoder(cats_strings=["orange","apple"]) ← ONLY CHANGE
  → Flatten → LinearClassifier(coef=[10,-10,-10,10])  ← BYTE-IDENTICAL

Run:

pip install onnx onnxruntime
python reproduce_onnx_onehotencoder_feature_binding_flip.py

Expected final line:

ONNX_ONEHOTENCODER_FEATURE_BINDING_FLIP_CONFIRMED

HF PoC repository: PLACEHOLDER

Runtime Evidence

| Metric | Value |
| --- | --- |
| Input | `"apple"` |
| clean `cats_strings` | `["apple", "orange"]` |
| mutant `cats_strings` | `["orange", "apple"]` |
| clean one-hot output | `[[1.0, 0.0]]` (col0 active) |
| mutant one-hot output | `[[0.0, 1.0]]` (col1 active) |
| clean `coefficients` | `[10.0, -10.0, -10.0, 10.0]` |
| mutant `coefficients` | `[10.0, -10.0, -10.0, 10.0]` (identical) |
| clean label | 0 |
| mutant label | 1 |
| clean scores | `[10.0, -10.0]` |
| mutant scores | `[-10.0, 10.0]` |
| Prediction flip | 0 → 1, with zero coefficient change |
| clean SHA256 | `218f4a707b387710...` |
| mutant SHA256 | `90f97501234161a4...` |
| Reproducibility | 5/5 |

Distinctness

| Prior Finding | Root | Mechanism | Distinct |
| --- | --- | --- | --- |
| ONNX SVMClassifier / LinearClassifier `classlabels_strings` | Output label rendering POST-score | Remaps integer score index to string label after computation is complete | DISTINCT: classlabels_strings is post-inference output routing; cats_strings is pre-score feature-column binding |
| Joblib CountVectorizer `vocabulary_` | Joblib pickle binary (.pkl) | NLP token → sparse integer column | DISTINCT: different format, different runtime, different operator class |
| SafeTensors `tokenizer.json` `model.vocab` | NLP sidecar JSON file | Token string → embedding row ID in HuggingFace Transformers | DISTINCT: different format (sidecar JSON vs .onnx internal), different framework |
| SafeTensors `preprocessor_config.json` normalization | CV sidecar JSON file | float image_mean/std/rescale applied to pixel tensor | DISTINCT: different format, different modality (CV float normalization vs categorical one-hot) |
| TFLite NormalizationOptions | FlatBuffer binary (.tflite) | C++ Task Library float normalization metadata | DISTINCT: different format, different runtime, different mechanism |

Key distinctness from classlabels_strings: classlabels_strings remaps a post-inference integer class index to an output label string. By that point the numeric score computation is complete, and the scores are the same whatever the label strings say. cats_strings controls a pre-score mapping that determines which numeric coefficient the input activates, so changing it changes the actual numeric score and the predicted class, not just the displayed label.
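The contrast can be made concrete with a small pure-Python sketch. It is an illustration only, not the runtime's code: `score` and `render_label` are hypothetical helpers standing in for the numeric stage and the label-rendering stage, intercepts are assumed to be zero, and the label strings "class_a" and "class_b" are made up for the example.

```python
COEFFICIENTS = [[10.0, -10.0], [-10.0, 10.0]]  # one row per class, never modified


def score(value, cats_strings):
    """Numeric stage: one-hot encode value, then take one dot product per class."""
    x = [1.0 if value == cat else 0.0 for cat in cats_strings]
    return [sum(w * xi for w, xi in zip(row, x)) for row in COEFFICIENTS]


def render_label(scores, classlabels_strings):
    """Output stage: map the index of the best score to a label string."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return classlabels_strings[best]


baseline = score("apple", ["apple", "orange"])

# Swapping classlabels_strings: the scores are untouched, only the rendered string changes.
labels_swapped_scores = score("apple", ["apple", "orange"])
labels_swapped_label = render_label(labels_swapped_scores, ["class_b", "class_a"])

# Swapping cats_strings: the scores themselves change before any label is rendered.
cats_swapped_scores = score("apple", ["orange", "apple"])
cats_swapped_label = render_label(cats_swapped_scores, ["class_a", "class_b"])

print("baseline scores      :", baseline)
print("labels swapped scores:", labels_swapped_scores, labels_swapped_label)
print("cats swapped scores  :", cats_swapped_scores, cats_swapped_label)
```

In this toy case both mutations end with the same rendered string, but only the cats_strings swap changes the numeric scores, which is the distinction drawn above.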

Non-Claims

The following are not claimed:

This is not a .onnx binary parser bug (no memory corruption, no buffer overflow)
This is not an RCE / ACE / arbitrary code execution finding
Scanner bypass is not the primary impact
This is not classlabels_strings (output label rendering)
This does not claim that no model file content changed; cats_strings is a runtime-consumed operator attribute within the .onnx model file and is the intentionally mutated component

Recommendation

Operator attribute integrity: Bind safety-critical ML operator attributes (cats_strings, string_vocabulary, etc.) to a model-level integrity manifest at save time and verify at load time. cats_strings controls the feature-space semantics received by downstream numeric classifiers and must be treated as security-relevant model state.
Training-time attribute fingerprint: Store and verify a fingerprint of the expected categorical binding attributes as part of the model provenance record. Structural validation of opset and graph shape is insufficient because an attacker can reorder cats_strings entries while preserving graph structure and downstream weight values.
Documentation and warnings: Clearly document that ai.onnx.ml.OneHotEncoder.cats_strings determines which numeric coefficient is applied to each categorical input. Loading tools should warn when categorical binding attributes differ from the trusted model manifest while downstream numeric weights remain unchanged.
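One way these recommendations might fit together is sketched below using only the Python standard library. Everything here is an assumption made for illustration: the manifest layout, the helper names (`fingerprint`, `make_manifest`, `verify`), and the premise that the operator attributes have already been extracted from the model into a plain dictionary. It is not an existing ONNX or onnxruntime feature.

```python
import hashlib
import json

BINDING_KEYS = ("cats_strings",)             # attributes that bind inputs to columns
WEIGHT_KEYS = ("coefficients", "intercepts")  # downstream numeric weights


def fingerprint(attrs, keys):
    """SHA-256 over a canonical JSON encoding of the selected attributes.

    List order is preserved on purpose: reordering cats_strings must
    change the digest.
    """
    payload = json.dumps({k: attrs[k] for k in keys}, sort_keys=True,
                         separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_manifest(attrs):
    """Record fingerprints at save time (hypothetical manifest layout)."""
    return {"binding": fingerprint(attrs, BINDING_KEYS),
            "weights": fingerprint(attrs, WEIGHT_KEYS)}


def verify(attrs, manifest):
    """Return a list of warnings; an empty list means the attributes match."""
    warnings = []
    binding_ok = fingerprint(attrs, BINDING_KEYS) == manifest["binding"]
    weights_ok = fingerprint(attrs, WEIGHT_KEYS) == manifest["weights"]
    if not binding_ok and weights_ok:
        warnings.append("binding attributes changed while weights are unchanged")
    elif not binding_ok:
        warnings.append("binding attributes changed")
    if not weights_ok:
        warnings.append("weights changed")
    return warnings


# Attribute values taken from the proof of concept; intercepts assumed zero.
trusted = {"cats_strings": ["apple", "orange"],
           "coefficients": [10.0, -10.0, -10.0, 10.0],
           "intercepts": [0.0, 0.0]}
mutated = dict(trusted, cats_strings=["orange", "apple"])

manifest = make_manifest(trusted)
print(verify(trusted, manifest))
print(verify(mutated, manifest))
```

Fingerprinting the binding attributes separately from the weights lets a loader single out the specific pattern described in this report: a changed category order next to unchanged coefficients.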

References

ONNX ML Operators specification: ai.onnx.ml.OneHotEncoder — https://onnx.ai/onnx/operators/onnx_ml_doc_OneHotEncoder.html
onnxruntime documentation — https://onnxruntime.ai/
