	# modelscan `.keras` (v3 zip) Lambda-detection bypass via Lambda nested in a Functional submodel

	Severity: Medium (matches modelscan's own MEDIUM rating for a detected Lambda; RCE-on-load gated by `safe_mode=False`)
	Affected tool: `modelscan 0.8.8` — `scanners/keras/scan.py` `KerasLambdaDetectScan`. Victim loader: `keras.saving.load_model(safe_mode=False)` (keras 3.14.1).
	Category: ModelScan scanner-bypass on `.keras` (Keras v3 zip).

	## Summary
	Companion to the `.h5` nested-submodel finding, this time against the separate `.keras`-zip scanner code path and file format. `KerasLambdaDetectScan` reads `config.json` from the `.keras` zip and extracts operators with a single non-recursive list comprehension over the top-level `config["layers"]`, so it flags a layer only when its top-level `class_name == "Lambda"`. Wrapping the malicious Lambda in a one-layer Functional submodel moves it to `config.layers[i].config.layers[j]`. The top-level list then contains `class_name "Functional"` (not "Lambda"), the scanner returns `[]`, and both the API and the CLI report "No issues found! 🎉". At load, `functional_from_config` recurses into the submodel, and `Lambda.from_config` → `python_utils.func_load(marshal.loads(...))` executes the attacker code.
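The behaviour described above can be modelled without modelscan or keras installed. The sketch below is a stdlib-only approximation, not modelscan's actual code: `build_archive`, `top_level_class_names`, and `flagged_operators` are hypothetical names, and `NESTED_CONFIG` is a hand-written stand-in for a real `config.json` that keeps only the `class_name` and `layers` keys.

```python
import io
import json
import zipfile

# Hypothetical, heavily trimmed config: the Lambda sits inside a Functional
# submodel instead of in the top-level layer list.
NESTED_CONFIG = {
    "class_name": "Functional",
    "config": {
        "layers": [
            {"class_name": "InputLayer", "config": {}},
            {"class_name": "Dense", "config": {}},
            {
                "class_name": "Functional",
                "config": {
                    "layers": [
                        {"class_name": "InputLayer", "config": {}},
                        {"class_name": "Lambda", "config": {}},
                    ]
                },
            },
            {"class_name": "Dense", "config": {}},
        ]
    },
}


def build_archive(config: dict) -> bytes:
    """Pack a config into an in-memory zip holding a config.json member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("config.json", json.dumps(config))
    return buffer.getvalue()


def top_level_class_names(archive_bytes: bytes) -> list:
    """Flat extraction: looks at config["layers"] only, never deeper."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        config = json.loads(archive.read("config.json"))
    return [
        layer.get("class_name")
        for layer in config.get("config", {}).get("layers", [])
    ]


def flagged_operators(archive_bytes: bytes) -> list:
    """Keep only names a top-level-only Lambda check would report."""
    return [n for n in top_level_class_names(archive_bytes) if n == "Lambda"]


if __name__ == "__main__":
    data = build_archive(NESTED_CONFIG)
    print(top_level_class_names(data))
    print(flagged_operators(data))
```

The flat pass sees the wrapper's `Functional` entry but never the `Lambda` inside it, so the flagged list comes back empty.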

	## Root cause
	`modelscan/scanners/keras/scan.py:119-130` (`_get_keras_operator_names`): flat comprehension `for layer in model_config_data.get("config", {}).get("layers", {})` flagging only top-level `class_name == "Lambda"`; never descends into a nested sub-Model's `config.layers`.
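One way to close the gap is to walk the whole config tree instead of one list. The following is a minimal sketch of that idea, not a patch for modelscan: `iter_class_names` and `contains_lambda` are hypothetical helpers, and the walker deliberately descends into every dict and list so that both a nested `layers` list and a wrapper's single `layer` key are covered.

```python
from typing import Any, Iterator


def iter_class_names(node: Any) -> Iterator[str]:
    """Yield every class_name found anywhere in a nested config structure."""
    if isinstance(node, dict):
        name = node.get("class_name")
        if isinstance(name, str):
            yield name
        # Descend into all values, not just "layers", so wrapper layers
        # that store a child under another key are visited too.
        for value in node.values():
            yield from iter_class_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_class_names(item)


def contains_lambda(model_config: dict) -> bool:
    """True if any object in the config tree declares class_name 'Lambda'."""
    return any(name == "Lambda" for name in iter_class_names(model_config))
```

Because it yields the `class_name` of every serialized object, not only layers, a scanner built on it would need to decide which names count as operators. That trade-off is left open here.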

	Exec sink: `keras/src/models/functional.py::functional_from_config` → `serialization_lib.deserialize_keras_object` (recursive) → `keras/src/layers/core/lambda_layer.py:182-198` `Lambda.from_config` → `python_utils.func_load`.
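To see why a marshalled function in a config is dangerous, a harmless stdlib-only analogue may help. This is an illustration of the general technique, not keras's `func_dump`/`func_load` implementation (whose additional handling is not reproduced here); `dump_function` and `load_function` are hypothetical names.

```python
import marshal
import types


def dump_function(func: types.FunctionType) -> bytes:
    """Serialize the function's code object with marshal."""
    return marshal.dumps(func.__code__)


def load_function(payload: bytes) -> types.FunctionType:
    """Rebuild a callable from marshalled bytecode.

    Nothing here inspects what the bytecode does: whoever supplied the
    payload decides what runs when the callable is invoked.
    """
    code = marshal.loads(payload)
    return types.FunctionType(code, globals())


# A benign payload; an attacker-supplied one would be rebuilt the same way.
restored = load_function(dump_function(lambda x: x + 1))
```

Marshalled bytecode is specific to the Python version that produced it, so the round trip is only expected to work within a single interpreter version.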

	## Reproduce
	`python poc/poc_full.py` (env: keras 3.14.1, modelscan 0.8.8; the payload is built with keras' own `func_dump`, so it is real marshalled bytecode equivalent to what a normal save produces). Output:
	- (1) baseline top-level Lambda at `/config/layers[1]` → modelscan operators `['Lambda']` (flagged).
	- (2) nested Lambda at `/config/layers[2]/config/layers[1]` → top-level class_names `['InputLayer','Dense','Functional','Dense']`, modelscan operators `[]` (bypass).
	- (3) `load_model(safe_mode=False)` → marker written (RCE).
	- (4) `load_model(safe_mode=True)` → blocked by keras `ValueError`. For a user who has to pass `safe_mode=False`, the warning modelscan failed to emit was therefore the only safeguard.
	- CLI cross-check: `modelscan -p nested.keras` → `KerasLambdaDetectScan` → "No issues found! 🎉".

	Empirically distinguished from the TimeDistributed dup: `TimeDistributed(Lambda)` puts the Lambda at `config/layers[1]/config/layer` (single `layer` key); this finding's path is `config/layers[i]/config/layers[j]` (nested submodel layers list).
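The two layouts can be told apart mechanically by printing the path of every Lambda in a config. The sketch below assumes trimmed, hand-written configs (`TIME_DISTRIBUTED_CONFIG` and `NESTED_SUBMODEL_CONFIG` are illustrative stand-ins, not PoC output), and `lambda_paths` is a hypothetical helper that emits paths in the same notation used in this write-up.

```python
from typing import Any, Iterator

# Illustrative stand-ins for the two layouts; real configs carry many more keys.
TIME_DISTRIBUTED_CONFIG = {
    "config": {
        "layers": [
            {"class_name": "InputLayer", "config": {}},
            {
                "class_name": "TimeDistributed",
                "config": {"layer": {"class_name": "Lambda", "config": {}}},
            },
        ]
    }
}

NESTED_SUBMODEL_CONFIG = {
    "config": {
        "layers": [
            {"class_name": "InputLayer", "config": {}},
            {"class_name": "Dense", "config": {}},
            {
                "class_name": "Functional",
                "config": {
                    "layers": [
                        {"class_name": "InputLayer", "config": {}},
                        {"class_name": "Lambda", "config": {}},
                    ]
                },
            },
            {"class_name": "Dense", "config": {}},
        ]
    }
}


def lambda_paths(node: Any, path: str = "") -> Iterator[str]:
    """Yield the path of each object whose class_name is 'Lambda'."""
    if isinstance(node, dict):
        if node.get("class_name") == "Lambda":
            yield path or "/"
        for key, value in node.items():
            yield from lambda_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from lambda_paths(item, f"{path}[{index}]")


if __name__ == "__main__":
    print(list(lambda_paths(TIME_DISTRIBUTED_CONFIG)))
    print(list(lambda_paths(NESTED_SUBMODEL_CONFIG)))
```

The wrapper case ends in a single `layer` key, while the submodel case ends in an indexed `layers` entry, which is the distinction drawn above.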

	## Impact
	Defeats modelscan's gate completely for the `.keras` v3 format: a code-executing model is reported as clean, and a consumer who loads it with `safe_mode=False` (required for every genuine Lambda model, routine in HF/third-party code) is compromised. Severity matches modelscan's MEDIUM for a detected Lambda; with this bypass, the scanner detects none of the nested Lambdas.

	## Dup-check
	Not public for this vector. CVE-2025-1550 + huntr blog = top-level Lambda (modelscan flags those). CVE-2025-9905 = HDF5 `safe_mode`-ignore (different format/mechanism). arXiv:2509.06703 KV.1/KV.2 = top-level self-disabling Lambdas. The "29 ways" article lists only TimeDistributed for Keras Lambda nesting (re-fetched; no mention of nested Functional/Sequential submodels). Distinct from our R5 `__lambda__`-in-TextVectorization and the companion R6 H5 nested-submodel (different scanner class + format).

	> Honest caveat (the reason this is Medium, not higher): the root cause, a non-recursive top-level-only layer scan, is identical to the already-known TimeDistributed bypass and to the companion `.h5` finding. A single fix closes all three, so a maintainer may legitimately treat this as a duplicate by root cause or bundle it with the others. It clears the bar as a genuinely distinct, undocumented config path against a distinct scanner method and format with a fully working RCE, but file it expecting possible consolidation.
