Transformers model package prediction flip via tokenizer.json vocab binding while safetensors weights remain unchanged

Summary

A crafted HuggingFace Transformers model package can silently invert model inference outcomes by mutating the tokenizer.json model.vocab token-to-ID binding table, while leaving model.safetensors and config.json completely unchanged.

This is not a .safetensors parser bug. The issue sits at the package level: tokenizer.json determines which embedding rows in model.safetensors are addressed for each input token, so predictions can be flipped without modifying any model weight.

Affected Product

Package: HuggingFace transformers
Format: SafeTensors-based Transformers model package (model.safetensors + tokenizer.json + config.json)
Relevant file: tokenizer.json — model.vocab field
Runtime path: AutoTokenizer.from_pretrained() → AutoModelForSequenceClassification.from_pretrained() → model.forward()

Vulnerability Details

tokenizer.json contains the model.vocab dictionary, which maps each token string to an integer ID. At inference time the tokenizers Rust backend looks up these IDs (token_to_id()) to construct input_ids. The input_ids are then passed to model.forward(), where they index into the word-embedding matrix stored in model.safetensors.

There is no consistency check between the tokenizer.json vocab binding and the model.safetensors weight matrix at load time. Anyone distributing a model package can therefore swap any two token ID assignments in tokenizer.json, so that those tokens address each other's embedding rows, without altering the weight file.
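The swap described above can be sketched in a few lines. This is a minimal illustration, not the PoC script: the helper name `swap_vocab_ids` and the three-entry vocabulary are hypothetical, and it assumes a tokenizer model whose model.vocab is a token-to-ID dictionary (as in WordPiece).

```python
import copy
import json


def swap_vocab_ids(tokenizer_state: dict, token_a: str, token_b: str) -> dict:
    """Return a copy of a parsed tokenizer.json with two token IDs exchanged."""
    mutated = copy.deepcopy(tokenizer_state)
    vocab = mutated["model"]["vocab"]  # token string -> integer ID
    vocab[token_a], vocab[token_b] = vocab[token_b], vocab[token_a]
    return mutated


# Hypothetical stand-in for a parsed tokenizer.json (real files have many more fields).
clean = {"model": {"type": "WordPiece", "vocab": {"is": 2003, "good": 2204, "bad": 2919}}}
mutant = swap_vocab_ids(clean, "good", "bad")

print(json.dumps(mutant["model"]["vocab"]))
```

The set of IDs and the vocabulary size are the same before and after the swap; only the binding between token strings and IDs changes.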

Mechanism (step by step)

1. In the clean package: "good" → ID 2204, "bad" → ID 2919
2. In the mutant package: "good" → ID 2919, "bad" → ID 2204
3. model.safetensors hash: identical in both packages
4. config.json hash: identical in both packages
5. Raw text "the movie is good" is tokenized differently:
   - clean: input_ids = [101, 1996, 3185, 2003, **2204**, 102] → prediction: POSITIVE
   - mutant: input_ids = [101, 1996, 3185, 2003, **2919**, 102] → prediction: NEGATIVE
6. The model.forward() invocation is identical in both cases; only the index into the embedding matrix differs.
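The steps above can be mimicked with a toy lookup that needs no libraries. Everything here is an assumption made for illustration: the one-number "embedding" per token ID, the whitespace tokenizer, and the sum-based classifier are stand-ins, not the PoC model or the Transformers runtime. Only the token IDs are taken from the example above.

```python
# Toy "weights": one sentiment feature per token ID. These values are
# illustrative only and are shared, unchanged, by both packages.
EMBEDDING = {101: 0.0, 102: 0.0, 1996: 0.0, 3185: 0.0, 2003: 0.0, 2204: 2.0, 2919: -2.0}

CLEAN_VOCAB = {
    "[CLS]": 101, "[SEP]": 102, "the": 1996, "movie": 3185,
    "is": 2003, "good": 2204, "bad": 2919,
}
# The mutant vocabulary differs only in the two swapped bindings.
MUTANT_VOCAB = {**CLEAN_VOCAB, "good": 2919, "bad": 2204}


def encode(text: str, vocab: dict) -> list:
    """Whitespace tokenizer stand-in: map each token string to its ID."""
    tokens = ["[CLS]"] + text.split() + ["[SEP]"]
    return [vocab[token] for token in tokens]


def predict(input_ids: list) -> str:
    """Stand-in for model.forward(): sum the looked-up embedding features."""
    score = sum(EMBEDDING[token_id] for token_id in input_ids)
    return "POSITIVE" if score > 0 else "NEGATIVE"


text = "the movie is good"
clean_ids = encode(text, CLEAN_VOCAB)
mutant_ids = encode(text, MUTANT_VOCAB)
print(clean_ids, predict(clean_ids))
print(mutant_ids, predict(mutant_ids))
```

The `predict` function and the `EMBEDDING` table are never touched; the different outcome comes entirely from which row the vocabulary points the word "good" at.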

Proof of Concept

This repository contains a minimal PoC using a tiny deterministic BertForSequenceClassification model (hidden_size=16, 1 layer, vocab_size=30522) with crafted weights to isolate the mechanism. The affected runtime path is the standard Transformers package path: AutoTokenizer.from_pretrained(), AutoModelForSequenceClassification.from_pretrained(), and model.forward().

Reproduce

pip install -r requirements.txt
python reproduce_transformers_tokenizer_vocab_binding_flip.py

Expected terminal output:

TRANSFORMERS_TOKENIZER_VOCAB_BINDING_FLIP_CONFIRMED

Inspect (hash matrix + vocab analysis)

python inspect_transformers_tokenizer_vocab_binding_hash_matrix.py

Expected terminal output:

TRANSFORMERS_TOKENIZER_VOCAB_BINDING_HASH_MATRIX_PASS

Runtime Evidence

| Text | clean input_ids | mutant input_ids | clean pred | mutant pred | flipped |
| --- | --- | --- | --- | --- | --- |
| "the movie is good" | [101,1996,3185,2003,2204,102] | [101,1996,3185,2003,2919,102] | POSITIVE | NEGATIVE | ✓ |
| "this film is bad" | [101,2023,2143,2003,2919,102] | [101,2023,2143,2003,2204,102] | NEGATIVE | POSITIVE | ✓ |
| "an absolutely good film" | [101,2019,7078,2204,2143,102] | [101,2019,7078,2919,2143,102] | POSITIVE | NEGATIVE | ✓ |

Flip count: 3/3. Logit delta: 3.9965.

Hash Matrix

| File | clean SHA3-256 (16 hex) | mutant SHA3-256 (16 hex) | Status |
| --- | --- | --- | --- |
| `model.safetensors` | `ae27a5f4ca265c51` | `ae27a5f4ca265c51` | IDENTICAL |
| `config.json` | `0340a71886d99eac` | `0340a71886d99eac` | IDENTICAL |
| `tokenizer.json` | `2bd5fd376f6e2aa0` | `e68711129c55c39b` | DIFFERS |

tokenizer.json is the only modified file.
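A comparison like the hash matrix above can be produced with the standard library alone. The following is a sketch under stated assumptions, not the repository's inspect script: the function names, the `clean`/`mutant` directory layout, and the placeholder file contents in the demo are all hypothetical. Full digests are compared, and only the displayed value is truncated to 16 hex characters.

```python
import hashlib
import tempfile
from pathlib import Path

PACKAGE_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def sha3_hex(path: Path) -> str:
    """Full SHA3-256 hex digest of a file's bytes."""
    return hashlib.sha3_256(path.read_bytes()).hexdigest()


def hash_matrix(clean_dir, mutant_dir, names=PACKAGE_FILES, hex_chars=16):
    """Compare each file across two package directories."""
    rows = []
    for name in names:
        clean_hash = sha3_hex(Path(clean_dir) / name)
        mutant_hash = sha3_hex(Path(mutant_dir) / name)
        status = "IDENTICAL" if clean_hash == mutant_hash else "DIFFERS"
        rows.append((name, clean_hash[:hex_chars], mutant_hash[:hex_chars], status))
    return rows


# Demo with placeholder bytes; real packages would be read from disk instead.
with tempfile.TemporaryDirectory() as tmp:
    clean_dir, mutant_dir = Path(tmp) / "clean", Path(tmp) / "mutant"
    for directory in (clean_dir, mutant_dir):
        directory.mkdir()
        (directory / "model.safetensors").write_bytes(b"placeholder weights")
        (directory / "config.json").write_text('{"vocab_size": 30522}')
    (clean_dir / "tokenizer.json").write_text('{"model": {"vocab": {"good": 2204, "bad": 2919}}}')
    (mutant_dir / "tokenizer.json").write_text('{"model": {"vocab": {"good": 2919, "bad": 2204}}}')
    rows = hash_matrix(clean_dir, mutant_dir)

for name, clean_hash, mutant_hash, status in rows:
    print(f"{name}\t{clean_hash}\t{mutant_hash}\t{status}")
```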

Route Framing

tokenizer.json is model package state. It is loaded by the Transformers runtime at inference time to convert raw text into token IDs. The binding it establishes between token strings and embedding rows in model.safetensors determines model output. Distributing a package with a mutated tokenizer.json is distributing a semantically modified model — even though the weight file bytes are unchanged.

Distinctness

This finding is distinct from all prior submissions:

| Dimension | This finding | Prior ONNX/Joblib/TFLite submissions |
| --- | --- | --- |
| Format | SafeTensors + `tokenizer.json` | `.onnx`, `.joblib`, `.tflite` |
| Mutated component | `tokenizer.json` model.vocab | ONNX op attribute, sklearn vocab, TFLite metadata |
| Loader | HuggingFace `tokenizers` Rust backend + `safetensors` Rust loader | onnxruntime / scikit-learn / Task Library |
| Inference API | `AutoTokenizer` + `AutoModelForSequenceClassification` + `model.forward()` | different runtimes |

Non-Claims

This is not a .safetensors binary format parser vulnerability.
This is not arbitrary code execution.
Scanner bypass is not the primary claim.
Model weights are unchanged, but tokenizer.json is itself part of the model package state, so this report does not claim that no model state changed.
This PoC uses a tiny deterministic BertForSequenceClassification to isolate the mechanism. The issue applies to any Transformers model package that uses tokenizer.json for vocabulary binding.

Recommendation

Transformers model loading should validate that the tokenizer.json vocabulary mapping is consistent with the intended model configuration. At minimum, downstream consumers should treat tokenizer.json with the same integrity requirements as model.safetensors when verifying model package authenticity.
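One way a downstream consumer might act on this is to compare the shipped vocabulary against a copy obtained from a trusted source. The sketch below assumes such a trusted reference vocabulary exists; the function name `rebound_tokens` is hypothetical, and this is not an existing Transformers or tokenizers API. A swap leaves the vocabulary size and the set of IDs unchanged, so a size check alone would not notice it; the comparison has to be per token.

```python
def rebound_tokens(reference_vocab: dict, candidate_vocab: dict) -> dict:
    """Map each token whose ID differs to (reference_id, candidate_id).

    Tokens present on only one side are reported with None for the missing ID.
    """
    changed = {}
    for token in reference_vocab.keys() | candidate_vocab.keys():
        reference_id = reference_vocab.get(token)
        candidate_id = candidate_vocab.get(token)
        if reference_id != candidate_id:
            changed[token] = (reference_id, candidate_id)
    return changed


# Hypothetical vocab excerpts; in practice these would come from the
# model.vocab field of a trusted and a received tokenizer.json.
trusted = {"is": 2003, "good": 2204, "bad": 2919}
received = {"is": 2003, "good": 2919, "bad": 2204}

diff = rebound_tokens(trusted, received)
for token in sorted(diff):
    print(token, diff[token])
```

An empty result means the received binding matches the reference for every token; any entry is a reason to reject the package or investigate further.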

References

HuggingFace tokenizers library: https://github.com/huggingface/tokenizers
HuggingFace transformers library: https://github.com/huggingface/transformers
SafeTensors format: https://github.com/huggingface/safetensors
