PoC: RWKVTokenizer eval() - Arbitrary Code Execution via .keras Model File

Vulnerability: eval() on attacker-controlled vocabulary in keras_hub.models.RWKVTokenizer
Affected: keras-hub 0.26.0 to 0.28.0 | keras 3.9.0 to 3.12.1
CWE: CWE-95 (Eval Injection)
Bypasses: safe_mode=True (keras default)

What this repo contains

malicious_rwkv_tokenizer.keras - a crafted .keras model archive. When loaded with keras.models.load_model(), the vocabulary field in config.json reaches eval() inside RWKVTokenizerBase.__init__ (line 117) and RWKVTokenizer.set_vocabulary (line 275) in rwkv7_tokenizer.py.

The payload in this file is benign: it writes the string 'RCE_via_load_model' to <tempdir>/rwkv_poc.txt. No network, no persistence, no destruction.

Reproduction

import sys
from unittest.mock import MagicMock
sys.modules.setdefault("tensorflow_text", MagicMock())  # satisfy TF deployment prereq

import keras
import keras_hub  # required: registers keras_hub>RWKVTokenizer in Keras object registry

model = keras.models.load_model("malicious_rwkv_tokenizer.keras", safe_mode=True)
# eval() fires during load - marker written to tempdir, no exception raised

Note: keras_hub must be imported before load_model(). This is satisfied automatically in any real deployment using keras_hub models - the attack prerequisite is standard, not exceptional.

Note on tensorflow_text: assert_tf_libs_installed() is a check present in all keras-hub tokenizers, so passing it is a functional prerequisite for any deployment. The mock above simulates a real deployment where TF and tensorflow-text are installed (both are required to use any keras-hub tokenizer in production).
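
To confirm that the benign payload actually ran, one can look for the marker file described above. The snippet below is a minimal sketch, not part of the PoC itself: the helper name marker_written is hypothetical, and it assumes that <tempdir> corresponds to what tempfile.gettempdir() returns on the test machine.

```python
import tempfile
from pathlib import Path

# Marker content and file name as described in this README.
EXPECTED = "RCE_via_load_model"
MARKER_NAME = "rwkv_poc.txt"


def marker_written(directory=None):
    """Return True if the PoC marker file exists and holds the expected string.

    `directory` defaults to tempfile.gettempdir(); that default is an
    assumption about where <tempdir> resolves on the test machine.
    """
    base = Path(directory) if directory is not None else Path(tempfile.gettempdir())
    marker = base / MARKER_NAME
    return marker.is_file() and EXPECTED in marker.read_text(errors="replace")
```

Calling marker_written() once before and once after the load_model() call distinguishes a fresh write from a stale file left by an earlier run.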

Root cause

rwkv7_tokenizer.py calls eval() on every vocabulary entry string:

# line 117 - RWKVTokenizerBase.__init__
x = eval(line[line.index(" ") : line.rindex(" ")])

# line 275 - RWKVTokenizer.set_vocabulary
repr_str = eval(line[line.index(" ") : line.rindex(" ")])

The vocabulary list is stored verbatim in config.json inside the .keras ZIP and deserialized directly into __init__. keras-hub is in Keras's unconditional deserialization allowlist (serialization_lib.py:816), so its classes are instantiated even when safe_mode=True. SafeModeScope is active during the load, but the tokenizer never calls in_safe_mode(), so nothing stops the eval() calls from running.
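
Because the vocabulary sits in config.json as plain JSON, an archive can be inspected without ever calling load_model(). The following is a rough defensive sketch under stated assumptions: the function names are hypothetical, the vocabulary is assumed to appear under a key named "vocabulary" somewhere in the config tree, and each entry is assumed to use the same "first space to last space" layout that the tokenizer slices.

```python
import ast
import json
import zipfile


def iter_vocab_entries(node):
    """Yield every string stored under any key named 'vocabulary' (assumed key name)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "vocabulary" and isinstance(value, list):
                yield from (v for v in value if isinstance(v, str))
            else:
                yield from iter_vocab_entries(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_vocab_entries(item)


def is_literal_entry(line):
    """True if the slice the tokenizer would eval() is a plain Python literal."""
    try:
        # Same slice as the tokenizer; strip() keeps older Pythons happy
        # with the leading space.
        ast.literal_eval(line[line.index(" "):line.rindex(" ")].strip())
    except (ValueError, SyntaxError):
        return False
    return True


def suspicious_entries(keras_path):
    """Read config.json from the archive and return non-literal vocabulary entries."""
    with zipfile.ZipFile(keras_path) as archive:
        config = json.loads(archive.read("config.json"))
    return [e for e in iter_vocab_entries(config) if not is_literal_entry(e)]
```

An empty result only means no entry failed this particular check; it is not a general guarantee that the archive is safe to load.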

Fix

Replace both eval() calls with ast.literal_eval().
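
The difference can be illustrated in isolation. The sketch below is not the keras-hub patch; parse_vocab_entry is a hypothetical helper that applies the same slice as the two affected lines but parses it with ast.literal_eval(), which accepts only Python literals and rejects calls, attribute access, and other expressions. The restriction to str and bytes results is an additional assumption about what a vocabulary token should be.

```python
import ast


def parse_vocab_entry(line):
    """Parse the token literal from one vocabulary line without executing code.

    Uses the same 'first space to last space' slice as the affected code.
    Raises ValueError for anything that is not a str or bytes literal.
    """
    # strip() removes the leading space left by the slice; older Python
    # versions of literal_eval do not tolerate leading whitespace.
    snippet = line[line.index(" "):line.rindex(" ")].strip()
    try:
        value = ast.literal_eval(snippet)
    except SyntaxError as exc:
        raise ValueError(f"malformed vocabulary entry: {line!r}") from exc
    # Assumption: tokens are expected to be str or bytes literals only.
    if not isinstance(value, (str, bytes)):
        raise ValueError(f"unexpected token type in entry: {line!r}")
    return value
```

With this helper, an entry such as "1 'a' 1" parses to the string 'a', while an entry whose middle segment is a function call raises ValueError instead of being executed.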
