Transformers model package prediction flip via tokenizer.json vocab binding while safetensors weights remain unchanged

Summary

A crafted HuggingFace Transformers model package can silently invert model inference outcomes by mutating the tokenizer.json model.vocab token-to-ID binding table, while leaving model.safetensors and config.json completely unchanged.

This is not a .safetensors parser bug. It is a vulnerability in SafeTensors-based Transformers model packages: tokenizer.json controls which embedding rows in model.safetensors each input token addresses, so predictions can be flipped without modifying the model weights.

Affected Product

  • Package: HuggingFace transformers
  • Format: SafeTensors-based Transformers model package (model.safetensors + tokenizer.json + config.json)
  • Relevant file: tokenizer.json (model.vocab field)
  • Runtime path: AutoTokenizer.from_pretrained() → AutoModelForSequenceClassification.from_pretrained() → model.forward()

Vulnerability Details

tokenizer.json contains the model.vocab dictionary, which maps each token string to an integer ID. At inference time the tokenizers Rust backend looks up these IDs (token_to_id()) and uses them to construct input_ids. The input_ids are then passed to model.forward(), where they index into the word-embedding matrix stored in model.safetensors.

There is no cross-validation between the tokenizer.json vocab binding and the model.safetensors weight matrix at load time. A model package distributor can swap any two token ID assignments in tokenizer.json, causing those tokens to address different embedding rows without altering the weight file.
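The swap described above can be sketched on an in-memory structure. This is a minimal illustration, not the PoC script: the helper name `swap_vocab_ids` is hypothetical, and the `clean` dict is a heavily truncated stand-in that assumes the `model.vocab` layout named in this report.

```python
import copy
import json


def swap_vocab_ids(tokenizer_state, token_a, token_b):
    """Return a copy of a tokenizer.json-style dict with two token IDs exchanged.

    Hypothetical helper for illustration; it assumes the vocabulary lives
    under tokenizer_state["model"]["vocab"] as a token -> ID mapping.
    """
    mutated = copy.deepcopy(tokenizer_state)
    vocab = mutated["model"]["vocab"]
    vocab[token_a], vocab[token_b] = vocab[token_b], vocab[token_a]
    return mutated


# Truncated stand-in for a real tokenizer.json (a real file has many more keys).
clean = {
    "model": {
        "type": "WordPiece",
        "vocab": {"[CLS]": 101, "[SEP]": 102, "good": 2204, "bad": 2919},
    }
}
mutant = swap_vocab_ids(clean, "good", "bad")

# The set of IDs is unchanged; only the token -> ID binding differs.
print(json.dumps(mutant["model"]["vocab"]))
```

Because the swap is a permutation, the mutant vocabulary has the same size and the same set of IDs as the clean one, which is why a size or range check alone would not notice it.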

Mechanism (step by step)

  1. In the clean package: "good" → ID 2204, "bad" → ID 2919
  2. In the mutant package: "good" → ID 2919, "bad" → ID 2204
  3. model.safetensors hash: identical in both packages
  4. config.json hash: identical in both packages
  5. Raw text "the movie is good" is tokenized differently:
    • clean: input_ids = [101, 1996, 3185, 2003, **2204**, 102] → prediction: POSITIVE
    • mutant: input_ids = [101, 1996, 3185, 2003, **2919**, 102] → prediction: NEGATIVE
  6. The model.forward() invocation is identical in both cases; only the index into the embedding matrix differs.
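The steps above can be reproduced with a toy stand-in that needs no model download. Everything here is an assumption made for illustration: the whitespace `encode` function replaces the real WordPiece tokenizer, and `EMBEDDING` is a one-dimensional substitute for the embedding matrix and classifier head. Only the token IDs are taken from this report.

```python
# Toy vocabularies using the IDs quoted in this report.
CLEAN_VOCAB = {"[CLS]": 101, "[SEP]": 102, "the": 1996, "movie": 3185,
               "is": 2003, "good": 2204, "bad": 2919}
MUTANT_VOCAB = dict(CLEAN_VOCAB, good=2919, bad=2204)  # the two bindings swapped

# Toy "weights": one scalar per embedding row, never modified below.
# Row 2204 pushes toward POSITIVE, row 2919 toward NEGATIVE, others are neutral.
EMBEDDING = {2204: 1.0, 2919: -1.0}


def encode(text, vocab):
    """Whitespace tokenizer stand-in: wrap the token IDs in [CLS] ... [SEP]."""
    return [vocab["[CLS]"]] + [vocab[word] for word in text.split()] + [vocab["[SEP]"]]


def predict(input_ids):
    """Same 'forward pass' for both packages: sum the addressed rows."""
    score = sum(EMBEDDING.get(token_id, 0.0) for token_id in input_ids)
    return "POSITIVE" if score > 0 else "NEGATIVE"


text = "the movie is good"
clean_ids = encode(text, CLEAN_VOCAB)
mutant_ids = encode(text, MUTANT_VOCAB)
print(clean_ids, predict(clean_ids))    # [101, 1996, 3185, 2003, 2204, 102] POSITIVE
print(mutant_ids, predict(mutant_ids))  # [101, 1996, 3185, 2003, 2919, 102] NEGATIVE
```

`EMBEDDING` and `predict` are shared by both calls, mirroring the point that only the index into the embedding matrix differs.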

Proof of Concept

This repository contains a minimal PoC using a tiny deterministic BertForSequenceClassification model (hidden_size=16, 1 layer, vocab_size=30522) with crafted weights to isolate the mechanism. The affected runtime path is the standard Transformers package path: AutoTokenizer.from_pretrained(), AutoModelForSequenceClassification.from_pretrained(), and model.forward().

Reproduce

pip install -r requirements.txt
python reproduce_transformers_tokenizer_vocab_binding_flip.py

Expected terminal output:

TRANSFORMERS_TOKENIZER_VOCAB_BINDING_FLIP_CONFIRMED

Inspect (hash matrix + vocab analysis)

python inspect_transformers_tokenizer_vocab_binding_hash_matrix.py

Expected terminal output:

TRANSFORMERS_TOKENIZER_VOCAB_BINDING_HASH_MATRIX_PASS

Runtime Evidence

| Text | clean input_ids | mutant input_ids | clean pred | mutant pred | flipped |
|---|---|---|---|---|---|
| "the movie is good" | [101,1996,3185,2003,2204,102] | [101,1996,3185,2003,2919,102] | POSITIVE | NEGATIVE | ✓ |
| "this film is bad" | [101,2023,2143,2003,2919,102] | [101,2023,2143,2003,2204,102] | NEGATIVE | POSITIVE | ✓ |
| "an absolutely good film" | [101,2019,7078,2204,2143,102] | [101,2019,7078,2919,2143,102] | POSITIVE | NEGATIVE | ✓ |

Flip count: 3/3. Logit delta: 3.9965.

Hash Matrix

| File | clean SHA3-256 (16 hex) | mutant SHA3-256 (16 hex) | Status |
|---|---|---|---|
| model.safetensors | ae27a5f4ca265c51 | ae27a5f4ca265c51 | IDENTICAL |
| config.json | 0340a71886d99eac | 0340a71886d99eac | IDENTICAL |
| tokenizer.json | 2bd5fd376f6e2aa0 | e68711129c55c39b | DIFFERS |

tokenizer.json is the only modified file.
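A comparison of this kind can be sketched with the standard library. This is not the repository's inspect script. The function names are hypothetical, the demo hashes placeholder bytes rather than the real package files, and it assumes "16 hex" means the first 16 hex digits of the SHA3-256 digest.

```python
import hashlib
import tempfile
from pathlib import Path

PACKAGE_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def short_sha3(path, n_hex=16):
    """SHA3-256 of a file, truncated to the first n_hex hex digits (assumed convention)."""
    return hashlib.sha3_256(Path(path).read_bytes()).hexdigest()[:n_hex]


def hash_matrix(clean_dir, mutant_dir, names=PACKAGE_FILES):
    """Return (file, clean hash, mutant hash, status) rows for two package directories."""
    rows = []
    for name in names:
        clean = short_sha3(Path(clean_dir) / name)
        mutant = short_sha3(Path(mutant_dir) / name)
        rows.append((name, clean, mutant, "IDENTICAL" if clean == mutant else "DIFFERS"))
    return rows


# Demo on placeholder bytes; real packages would be read from disk instead.
with tempfile.TemporaryDirectory() as root:
    clean_dir, mutant_dir = Path(root, "clean"), Path(root, "mutant")
    for directory in (clean_dir, mutant_dir):
        directory.mkdir()
        (directory / "model.safetensors").write_bytes(b"placeholder weights")
        (directory / "config.json").write_text('{"vocab_size": 30522}')
    (clean_dir / "tokenizer.json").write_text('{"good": 2204, "bad": 2919}')
    (mutant_dir / "tokenizer.json").write_text('{"good": 2919, "bad": 2204}')
    matrix = hash_matrix(clean_dir, mutant_dir)

for row in matrix:
    print(*row)
```

An integrity check that covers only model.safetensors would report both packages as identical; the difference appears only once tokenizer.json is hashed too.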

Route Framing

tokenizer.json is model package state. It is loaded by the Transformers runtime at inference time to convert raw text into token IDs. The binding it establishes between token strings and embedding rows in model.safetensors determines model output. Distributing a package with a mutated tokenizer.json is distributing a semantically modified model, even though the weight file bytes are unchanged.

Distinctness

This finding is distinct from all prior submissions:

| Dimension | This finding | Prior ONNX/Joblib/TFLite submissions |
|---|---|---|
| Format | SafeTensors + tokenizer.json | .onnx, .joblib, .tflite |
| Mutated component | tokenizer.json model.vocab | ONNX op attribute, sklearn vocab, TFLite metadata |
| Loader | HuggingFace tokenizers Rust backend + safetensors Rust loader | onnxruntime / scikit-learn / Task Library |
| Inference API | AutoTokenizer + AutoModelForSequenceClassification + model.forward() | different runtimes |

Non-Claims

  • This is not a .safetensors binary format parser vulnerability.
  • This is not arbitrary code execution.
  • Scanner bypass is not the primary claim.
  • The claim is not that no model state changed: the model weights are unchanged, but tokenizer.json is itself part of the model package state.
  • This PoC uses a tiny deterministic BertForSequenceClassification to isolate the mechanism. The issue applies to any Transformers model package that uses tokenizer.json for vocabulary binding.

Recommendation

Transformers model loading should validate that the tokenizer.json vocabulary mapping is consistent with the intended model configuration. At minimum, downstream consumers should treat tokenizer.json with the same integrity requirements as model.safetensors when verifying model package authenticity.
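One way a downstream consumer might act on this is to diff a candidate vocabulary against a trusted reference copy. The sketch below is an assumption about how such a check could look, not an existing Transformers feature. Both function names are hypothetical, and the vocabularies would normally be read from the `model.vocab` field of each tokenizer.json.

```python
def passes_structural_checks(vocab, vocab_size):
    """IDs are unique and in range. A pure permutation of IDs still passes."""
    ids = list(vocab.values())
    return len(set(ids)) == len(ids) and all(0 <= i < vocab_size for i in ids)


def rebound_tokens(trusted_vocab, candidate_vocab):
    """Return {token: (trusted_id, candidate_id)} for every binding that differs."""
    changed = {}
    for token in sorted(set(trusted_vocab) | set(candidate_vocab)):
        before = trusted_vocab.get(token)
        after = candidate_vocab.get(token)
        if before != after:
            changed[token] = (before, after)
    return changed


# Truncated example vocabularies using the IDs quoted in this report.
trusted = {"good": 2204, "bad": 2919, "is": 2003}
candidate = {"good": 2919, "bad": 2204, "is": 2003}

print(passes_structural_checks(candidate, vocab_size=30522))  # True
print(rebound_tokens(trusted, candidate))
# {'bad': (2919, 2204), 'good': (2204, 2919)}
```

The structural check passes on the swapped vocabulary, which suggests that detection needs a trusted reference (a pinned digest or a known-good vocabulary) rather than self-consistency checks alone.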
