	# SavedModel scanner bypass via nested `DatasetFromGraph` `WriteFile`

	Three SavedModels that demonstrate, in order, that ModelScan does flag
	`WriteFile` when it can see it, that nesting it inside `DatasetFromGraph`
	makes it invisible to ModelScan while still running at load, and that the
	same nesting technique can rewrite the model's own checkpoint and turn
	into a persistent backdoor.

	One thing to be clear about up front. The `top_level_writefile` model
	is a control, not a finding. ModelScan flagging it as HIGH is the intended,
	healthy behaviour — it proves the scanner does detect `WriteFile` when
	it's visible. The actual bypass is `nested_writefile`, and the backdoor
	is `self_poisoning_writefile`. Both of those run the same load-time
	`WriteFile` and both pass ModelScan with zero issues.

	Tested against the latest published versions of TensorFlow and ModelScan
	on PyPI (TensorFlow 2.21.0, ModelScan 0.8.8).

	## What's in the box

	```
	build_poc.py                  rebuild all three SavedModels
	verify_poc.py                 run ModelScan + load each model + the self-poison
	                              first/second-load differential, all in an isolated cwd
	requirements.txt              pinned tensorflow and modelscan
	sample_results.json           sanitised output from a passing run
	models/
	  top_level_writefile/        ordinary WriteFile in the top-level GraphDef
	  nested_writefile/           same WriteFile hidden inside a nested GraphDef
	  self_poisoning_writefile/   nested WriteFile that overwrites the model's
	                              own variables/ checkpoint at load
	```

	## What each case shows

	### `top_level_writefile` — control (not a finding)

	This case exists to prove the baseline. The top-level `GraphDef`
	contains the `WriteFile` op directly. ModelScan 0.8.8 reports one HIGH
	issue with `operator = WriteFile`, and the model writes its marker file
	when loaded. The control passing is the expected outcome — it's the
	"ModelScan does detect `WriteFile` when it's visible" half of the
	differential. Without this case there's no way to say what the bypass
	is bypassing.

	### `nested_writefile` — bypass

	Same `WriteFile`, same payload, same load-time effect. The only
	difference is location. The top-level `GraphDef` contains only
	`Const`, `DatasetFromGraph`, and `DatasetToGraphV2`. The `WriteFile`
	sits inside a serialised `GraphDef` carried as the bytes of a `Const`
	string tensor, and `DatasetFromGraph` consumes that string as a second
	parser root.

	ModelScan walks the top-level nodes and the function library. It
	doesn't parse `Const` string tensors as nested `GraphDef`s, so the
	hidden `WriteFile` is invisible to it. The scan reports
	`total_issues = 0`. At load time TensorFlow traverses the inner graph
	through the TF1 `legacy_init_op` and runs the hidden write anyway.

	`saved_model_cli show --all` is also blind to the nested op.
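
The blind spot can be illustrated with a toy model that needs no TensorFlow. This is a minimal sketch, not ModelScan's code: graphs are plain dicts, the nested graph is serialised with JSON instead of protobuf, and `top_level_ops` and `all_ops` are hypothetical helper names.

```python
import json

# Toy stand-in for a GraphDef: a dict with a list of nodes.
# A Const node may carry opaque bytes, much like a string tensor.
inner = {"node": [{"op": "Const"}, {"op": "WriteFile"}]}
outer = {
    "node": [
        {"op": "Const", "value": json.dumps(inner).encode()},  # serialised inner graph
        {"op": "DatasetFromGraph"},
        {"op": "DatasetToGraphV2"},
    ]
}


def top_level_ops(graph):
    """Collect op names the way a top-level-only walk would."""
    return {node["op"] for node in graph["node"]}


def all_ops(graph):
    """Also decode Const payloads and walk them as nested graphs."""
    ops = set()
    for node in graph["node"]:
        ops.add(node["op"])
        payload = node.get("value")
        if payload is not None:
            ops |= all_ops(json.loads(payload))
    return ops


print("WriteFile" in top_level_ops(outer))  # False
print("WriteFile" in all_ops(outer))        # True
```

The top-level walk sees only `Const`, `DatasetFromGraph`, and `DatasetToGraphV2`, which mirrors what the scanner and `saved_model_cli` report for the real model.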

	### `self_poisoning_writefile` — persistent backdoor

	A serving model with one variable `w` and signature `y = x * w`. The
	model ships with `w = 2.0` on disk. The same nested `DatasetFromGraph`
	mechanism carries two hidden `WriteFile` nodes targeting relative paths
	`variables/variables.data-00000-of-00001` and `variables/variables.index`.
	The contents are checkpoint bytes from a sibling template model built
	with `w = 1337.0`.

	At first load:

	1. TF restores `w = 2.0` from disk into memory.
	2. The `legacy_init_op` runs the hidden writes, which overwrite the
	on-disk checkpoint with the `w = 1337.0` bytes.
	3. Inference still uses the in-memory `w = 2.0`, so `y(3.0) = 6.0`.

	The model behaves correctly on this first load, which means a deployment
	pipeline that smoke-tests a model once and promotes it on success would
	see no anomaly.

	On every load after that:

	1. TF restores `w = 1337.0` from the now-poisoned checkpoint.
	2. Inference returns `y(3.0) = 4011.0`.

	ModelScan reports the model clean for this case too.
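
The first-load versus second-load difference follows purely from the order of restore and write. The simulation below is a sketch under loud assumptions: it does not use TensorFlow, the one-value `w.txt` "checkpoint" and the `load_model` function are hypothetical, and the hidden write is modelled as an ordinary file write.

```python
import tempfile
from pathlib import Path

BENIGN_W, POISON_W = 2.0, 1337.0


def load_model(model_dir: Path):
    """Simulated load: restore w, then run the init step that rewrites the file."""
    ckpt = model_dir / "w.txt"       # hypothetical one-value "checkpoint"
    w = float(ckpt.read_text())      # 1. restore the weight into memory
    ckpt.write_text(str(POISON_W))   # 2. init-time write poisons the on-disk copy
    return lambda x: x * w           # 3. inference uses the in-memory weight


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "w.txt").write_text(str(BENIGN_W))
    first = load_model(model_dir)(3.0)
    second = load_model(model_dir)(3.0)

print(first, second)  # 6.0 4011.0
```

A single smoke test only ever observes `first`, which is why the poisoned weight goes unnoticed until the next load.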

	## Requirements

	The pinned versions match the latest stable releases on PyPI at the time
	of writing.

	```
	python >= 3.10, < 3.13 # modelscan needs 3.10-3.12
	tensorflow == 2.21.0 # latest on PyPI
	modelscan == 0.8.8 # latest on PyPI
	```

	Install:

	```
	pip install -r requirements.txt
	```

	## Reproducing the differential

	The repo ships with the prebuilt models under `models/`. To verify all
	three cases:

	```
	python verify_poc.py
	```

	The verifier picks a fresh temp directory, runs ModelScan and
	`saved_model_cli show --all` on each model, then loads each model in a
	child Python process. For the marker cases the child cwd is set to the
	per-case workdir so the relative marker path resolves there. For the
	self-poisoning case, the model is copied into the workdir, the child cwd
	is set to the copied model directory, and the model is loaded twice with
	its `variables/` hashes captured before, after the first load, and after
	the second load.
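
The before/after hash capture can be approximated with the standard library alone. This is a sketch of the idea, not the code in `verify_poc.py`; `hash_tree` and `changed_files` are hypothetical names, and the file contents are placeholders rather than real checkpoint bytes.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict, after: dict) -> list[str]:
    """Names whose digest differs, or that exist in only one snapshot."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))


with tempfile.TemporaryDirectory() as tmp:
    variables = Path(tmp) / "variables"
    variables.mkdir()
    (variables / "variables.index").write_bytes(b"index")
    (variables / "variables.data-00000-of-00001").write_bytes(b"w=2.0")

    before = hash_tree(variables)
    # Stand-in for the first load rewriting a checkpoint file.
    (variables / "variables.data-00000-of-00001").write_bytes(b"w=1337.0")
    after_first = hash_tree(variables)
    after_second = hash_tree(variables)  # a second load changes nothing further

print(changed_files(before, after_first))        # ['variables.data-00000-of-00001']
print(changed_files(after_first, after_second))  # []
```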

	Expected output:

	```json
	{
	  "top_level_writefile": {
	    "pass": true,
	    "intent": "control: ModelScan is expected to detect top-level WriteFile",
	    "top_level_has_writefile": true,
	    "modelscan_total_issues": 1,
	    "modelscan_flagged_writefile": true,
	    "marker_written": true
	  },
	  "nested_writefile": {
	    "pass": true,
	    "intent": "bypass: ModelScan is expected to miss nested WriteFile, load is expected to run it anyway",
	    "top_level_has_writefile": false,
	    "nested_has_writefile": true,
	    "modelscan_total_issues": 0,
	    "modelscan_issues_empty": true,
	    "marker_written": true
	  },
	  "self_poisoning_writefile": {
	    "pass": true,
	    "intent": "persistent backdoor: first load returns benign output, on-disk checkpoint is rewritten, second load returns attacker-chosen output",
	    "top_level_has_writefile": false,
	    "nested_has_writefile": true,
	    "modelscan_total_issues": 0,
	    "modelscan_issues_empty": true,
	    "input": 3.0,
	    "expected_benign_output": 6.0,
	    "expected_poison_output": 4011.0,
	    "first_load_output": 6.0,
	    "second_load_output": 4011.0,
	    "first_load_matches_benign": true,
	    "second_load_matches_poison": true,
	    "checkpoint_changed_after_first_load": true,
	    "checkpoint_stable_after_second_load": true
	  }
	}
	```
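
If you would rather check the result programmatically than read it by eye, something like the following could work. It is a sketch only: `differential_holds` is a hypothetical helper, and the inline dict is a trimmed copy of the expected output, not a file read from disk.

```python
def differential_holds(results: dict) -> bool:
    """True when the control is flagged and both nested cases scan clean but still act."""
    control = results["top_level_writefile"]
    bypass = results["nested_writefile"]
    poison = results["self_poisoning_writefile"]
    return bool(
        control["modelscan_flagged_writefile"]
        and control["marker_written"]
        and bypass["modelscan_total_issues"] == 0
        and bypass["marker_written"]
        and poison["modelscan_total_issues"] == 0
        and poison["first_load_output"] == poison["expected_benign_output"]
        and poison["second_load_output"] == poison["expected_poison_output"]
        and poison["checkpoint_changed_after_first_load"]
    )


# Trimmed copy of the expected output shown above.
expected = {
    "top_level_writefile": {
        "modelscan_total_issues": 1,
        "modelscan_flagged_writefile": True,
        "marker_written": True,
    },
    "nested_writefile": {
        "modelscan_total_issues": 0,
        "marker_written": True,
    },
    "self_poisoning_writefile": {
        "modelscan_total_issues": 0,
        "expected_benign_output": 6.0,
        "expected_poison_output": 4011.0,
        "first_load_output": 6.0,
        "second_load_output": 4011.0,
        "checkpoint_changed_after_first_load": True,
    },
}

print(differential_holds(expected))  # True
```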

	If you want to rebuild the models yourself before verifying:

	```
	python build_poc.py --overwrite
	python verify_poc.py
	```

	If you want the loader to use a directory you choose instead of a temp
	directory:

	```
	python verify_poc.py --workdir ./scratch
	```

	After a run with `--workdir ./scratch`, the marker cases leave files at:

	```
	./scratch/top_level_writefile/tfhunt_markers/top_level_writefile.txt
	./scratch/nested_writefile/tfhunt_markers/nested_writefile.txt
	```

	Both contain `tfhunt_writefile_payload\n`.

	The self-poisoning case leaves a copy of the model with the poisoned
	checkpoint at:

	```
	./scratch/self_poisoning_writefile/self_poisoning_writefile/
	```

	Reload that copy directly with `tf.saved_model.load` and you'll see
	`y(3.0) = 4011.0` instead of the original `6.0`.

	## Manual verification

	If you don't want to trust `verify_poc.py`, each claim below can be
	checked with the stock tooling. Steps 1 to 3 start and end in the repo
	root. Step 4 starts there and leaves you in `manual_poison/`, where
	step 5 also runs.

	### 1. ModelScan flags the control, misses the bypass

	```
	modelscan scan -p ./models/top_level_writefile
	modelscan scan -p ./models/nested_writefile
	```

	The first scan reports one HIGH issue with `operator = WriteFile`,
	which is the control behaving correctly. The second scan reports
	`total_issues = 0` on a model that performs the same load-time write.

	### 2. `saved_model_cli` is blind to the nested op

	```
	python -m tensorflow.python.tools.saved_model_cli show --dir ./models/nested_writefile --all
	```

	The relevant line in the output is:

	```
	The MetaGraph with tag set ['serve'] contains the following ops: {'Const', 'DatasetToGraphV2', 'DatasetFromGraph'}
	```

	No mention of `WriteFile`.

	### 3. `tf.saved_model.load` runs the hidden write

	The marker path baked into the model is relative, so the write lands
	wherever the loader's cwd is. Run the load in a clean working directory.

	Linux / macOS:

	```
	mkdir manual_load && cd manual_load
	python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
	ls tfhunt_markers
	cat tfhunt_markers/nested_writefile.txt
	cd ..
	```

	Windows PowerShell:

	```
	New-Item -ItemType Directory manual_load | Out-Null; Set-Location manual_load
	python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
	Get-ChildItem tfhunt_markers
	Get-Content tfhunt_markers\nested_writefile.txt
	Set-Location ..
	```

	Expected file content:

	```
	tfhunt_writefile_payload
	```
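
The same cwd behaviour can be demonstrated without TensorFlow. In this sketch the child process is only a stand-in for the model load: it writes to a relative path, and the file lands in whatever directory the parent chose as the child's cwd. `run_in_clean_cwd` and the `demo.txt` name are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the model load: a child that writes to a relative path.
CHILD = (
    "from pathlib import Path; "
    "p = Path('tfhunt_markers/demo.txt'); "
    "p.parent.mkdir(exist_ok=True); "
    "p.write_text('tfhunt_writefile_payload\\n')"
)


def run_in_clean_cwd(code: str) -> Path:
    """Run the code in a child Python whose cwd is a fresh temp directory."""
    workdir = Path(tempfile.mkdtemp(prefix="manual_load_"))
    subprocess.run([sys.executable, "-c", code], cwd=workdir, check=True)
    return workdir


workdir = run_in_clean_cwd(CHILD)
marker = workdir / "tfhunt_markers" / "demo.txt"
print(marker.read_text(), end="")
```

Running the load in a child process with an explicit `cwd` keeps the write away from the directory you launched from.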

	### 4. Self-poisoning rewrites the checkpoint on first load

	The relative `WriteFile` targets are `variables/...`, so the load must
	run with cwd set to a copy of the model directory. The copy step is
	important — without it, the bundled model itself would get poisoned.

	Linux / macOS:

	```
	cp -r models/self_poisoning_writefile manual_poison
	cd manual_poison
	sha256sum variables/variables.data-00000-of-00001
	python -c "import tensorflow as tf; tf.saved_model.load('.')"
	sha256sum variables/variables.data-00000-of-00001
	```

	Windows PowerShell:

	```
	Copy-Item -Recurse models/self_poisoning_writefile manual_poison
	Set-Location manual_poison
	Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
	python -c "import tensorflow as tf; tf.saved_model.load('.')"
	Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
	```

	The two hashes will differ. The on-disk checkpoint has been physically
	overwritten by the load.

	### 5. Second load returns the attacker's weights

	Stay in the `manual_poison/` directory from step 4 and run the load
	again, this time invoking the serving signature:

	```
	python -c "import tensorflow as tf; loaded = tf.saved_model.load('.'); out = loaded.signatures['serving_default'](x=tf.constant(3.0)); print(float(next(iter(out.values())).numpy()))"
	```

	Expected output:

	```
	4011.0
	```

	The shipped model is `y = x * w` with `w = 2.0`, so a clean `y(3.0)`
	would be `6.0`. After the load-time write in step 4 rewrote the
	checkpoint to `w = 1337.0`, the second load reads the poisoned weights
	and returns `4011.0`. Return to the repo root with `cd ..` when done.

	### Troubleshooting

	- `ModuleNotFoundError: No module named 'tensorflow'` — TensorFlow
	isn't installed in the active environment. Run
	`pip install -r requirements.txt` from the repo root.
	- `pip install` resolution fails on `modelscan` — it needs
	Python 3.10-3.12. If the extras aren't pulled in, install with
	`pip install 'modelscan[tensorflow,h5py]==0.8.8'`.
	- Step 3's marker file doesn't appear — the cwd isn't where you think
	it is. Add `import os; print(os.getcwd())` before the `tf.saved_model.load`
	call to confirm.
	- Step 5 still returns `6.0` — the cwd in step 4 wasn't the copied
	`manual_poison` directory, so nothing was poisoned. Copy the model
	again from the bundle (`models/self_poisoning_writefile`) and rerun
	step 4 with the new copy. Or just rebuild everything from scratch
	with `python build_poc.py --overwrite`.
	- `saved_model_cli: command not found` — it ships with TensorFlow but
	isn't always on `PATH`. Use the explicit form
	`python -m tensorflow.python.tools.saved_model_cli show ...`.

	## Why this is interesting

	ModelScan flags `WriteFile` as HIGH when it sees it in the top-level
	graph, so the operator is already on the unsafe list. The bypass isn't
	about the operator. It's about where it's allowed to hide.

	The same idea generalises to any side-effecting op that TensorFlow will
	run from inside an inner dataset graph. `WriteFile` is the cleanest
	demonstration because it's already on ModelScan's denylist, which makes
	the top-level-vs-nested differential unambiguous.

	The self-poisoning case turns that file-write primitive into a
	persistent output-manipulation backdoor that's hard to catch with a
	single-load smoke test, because the malicious output only appears on the
	second and later loads.

	The hidden write also runs in `tf.lite.TFLiteConverter.from_saved_model`,
	`tf2onnx.convert`, TensorFlow Serving, and the NVIDIA Triton TensorFlow
	backend. Those tests live outside this PoC bundle to keep it small and
	auditable, but they use models built the same way.

	## Safety

	Between them, these models have exactly two load-time side effects:

	- `top_level_writefile` and `nested_writefile` write
	`tfhunt_writefile_payload\n` to a relative path
	`tfhunt_markers/<name>.txt`, resolved against the loader's working
	directory.
	- `self_poisoning_writefile` overwrites two relative paths
	`variables/variables.data-00000-of-00001` and `variables/variables.index`
	with the byte content of a `w = 1337.0` template checkpoint. Because
	the verifier sets cwd to the copied model directory, those writes only
	touch the copy, not the bundled artifact.

	None of the models reach for absolute paths, environment variables,
	network, credentials, or any other resource.
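
Once you have extracted the `WriteFile` target paths from a model, a quick containment check is easy to write. The helper below is a sketch with a hypothetical name, `stays_under_cwd`; it assumes POSIX-style separators and only reasons about the path string, so it says nothing about symlinks.

```python
from pathlib import PurePosixPath


def stays_under_cwd(target: str) -> bool:
    """True if a relative POSIX path can never resolve above the working directory."""
    path = PurePosixPath(target)
    if path.is_absolute():
        return False
    depth = 0
    for part in path.parts:
        if part == "..":
            depth -= 1
            if depth < 0:      # climbed above the starting directory
                return False
        else:
            depth += 1
    return True


targets = [
    "tfhunt_markers/nested_writefile.txt",
    "variables/variables.data-00000-of-00001",
    "variables/variables.index",
]
print(all(stays_under_cwd(t) for t in targets))  # True
print(stays_under_cwd("../outside.txt"))         # False
```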

	If you want to inspect the nested graphs yourself without loading the
	models, `verify_poc.py`'s `inspect_saved_model` function parses the
	serialised inner `GraphDef`s and lists their nodes.

	## Suggested fix

	The gap in `modelscan.scanners.SavedModelTensorflowOpScan` is that it
	walks `GraphDef.node` and the function library on the top-level
	`MetaGraphDef` but doesn't recurse into ops whose inputs are serialised
	`GraphDef` bytes. The fix is to treat those ops as parser roots.

	Sketch of what the scan loop could look like:

	```python
	from google.protobuf.message import DecodeError
	from tensorflow.core.framework.graph_pb2 import GraphDef

	# UNSAFE_OPERATORS, report_issue and resolve_const_string_input stand in
	# for the scanner's existing denylist and helpers.
	NESTED_GRAPHDEF_OPS = {
	    "DatasetFromGraph",  # ops that accept a serialised GraphDef in a string input
	    "XlaCallModule",  # carries a serialised StableHLO / MLIR module
	}

	MAX_RECURSION_DEPTH = 4
	MAX_INNER_BYTES = 10 * 1024 * 1024


	def scan_graphdef(graph_def, depth=0):
	    if depth > MAX_RECURSION_DEPTH:
	        return
	    for node in graph_def.node:
	        if node.op in UNSAFE_OPERATORS:
	            report_issue(node, depth=depth)
	        if node.op in NESTED_GRAPHDEF_OPS:
	            inner_bytes = resolve_const_string_input(node, "graph_def", graph_def)
	            if inner_bytes is None or len(inner_bytes) > MAX_INNER_BYTES:
	                continue
	            inner = GraphDef()
	            try:
	                inner.ParseFromString(inner_bytes)
	            except DecodeError:
	                continue  # payload does not parse as a GraphDef
	            scan_graphdef(inner, depth=depth + 1)
	    for fn in graph_def.library.function:
	        for node in fn.node_def:
	            # Same walk as above, on the function library.
	            ...
	```

	The bounded recursion depth and byte cap stop a malicious model from
	turning a recursive scan into a parser DoS.
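
To see the two caps in action without TensorFlow, here is a toy version of the same walk. Everything in it is an assumption made for illustration: graphs are dicts, nested graphs are JSON strings rather than protobuf bytes, the byte cap is shrunk to 1024 so the demo stays small, and `scan` and `wrap` are hypothetical names.

```python
import json

UNSAFE_OPS = {"WriteFile"}   # toy denylist
MAX_DEPTH = 4
MAX_INNER_BYTES = 1024       # deliberately small for the demo


def scan(graph, depth=0, findings=None):
    """Walk a toy graph, recursing into nested payloads within the two caps."""
    findings = [] if findings is None else findings
    if depth > MAX_DEPTH:
        return findings
    for node in graph["node"]:
        if node["op"] in UNSAFE_OPS:
            findings.append((depth, node["op"]))
        payload = node.get("value")
        if payload is None or len(payload.encode()) > MAX_INNER_BYTES:
            continue
        try:
            inner = json.loads(payload)
        except ValueError:
            continue  # an ordinary constant, not a nested graph
        if isinstance(inner, dict) and "node" in inner:
            scan(inner, depth + 1, findings)
    return findings


def wrap(graph, levels):
    """Nest a graph inside `levels` layers of Const payloads."""
    for _ in range(levels):
        graph = {"node": [{"op": "Const", "value": json.dumps(graph)}]}
    return graph


base = {"node": [{"op": "WriteFile"}]}
padded = {"node": [{"op": "Const"}] * 200 + [{"op": "WriteFile"}]}

print(scan(wrap(base, 1)))    # [(1, 'WriteFile')]
print(scan(wrap(base, 5)))    # [] because the depth cap stops the walk
print(scan(wrap(padded, 1)))  # [] because the payload exceeds the byte cap
```

The last two results show the trade-off: the caps bound the scanner's work, but anything beyond them is skipped. A real scanner would probably want to report an over-limit payload as suspicious instead of silently ignoring it.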

	The same logic would help any scanner that gates `.pb` files on a
	top-level op walk. For TensorFlow itself, documenting that any op
	carrying serialised IR (`DatasetFromGraph`, `XlaCallModule`, and so on)
	should be treated as a parser root by external scanners would help
	downstream tooling write fixes that cover all of them at once.

	## Files generated by a run

	`verify_poc.py` writes:

	- `verification.json` next to the script. This contains absolute paths
	from your machine, so it's `.gitignore`d and is not part of the
	shipped artifact.

	`build_poc.py` writes:

	- `models/top_level_writefile/saved_model.pb`
	- `models/nested_writefile/saved_model.pb`
	- `models/self_poisoning_writefile/saved_model.pb`
	- `models/self_poisoning_writefile/variables/variables.data-00000-of-00001`
	- `models/self_poisoning_writefile/variables/variables.index`

	The first two models have empty `variables/` directories. That's expected for those graphs.

	## Environment used to validate

	```
	Python 3.12.3
	tensorflow 2.21.0
	modelscan 0.8.8
	Windows host
	```