# SavedModel scanner bypass via nested `DatasetFromGraph` `WriteFile`
Three SavedModels that demonstrate, in order, that ModelScan does flag
`WriteFile` when it can see it, that nesting it inside `DatasetFromGraph`
makes it invisible to ModelScan while it still runs at load, and that the
same nesting technique can rewrite the model's own checkpoint, turning
the model into a persistent backdoor.
**One thing to be clear about up front.** The `top_level_writefile` model
is a control, not a finding. ModelScan flagging it as HIGH is the intended,
healthy behaviour — it proves the scanner does detect `WriteFile` when
it's visible. The actual bypass is `nested_writefile`, and the backdoor
is `self_poisoning_writefile`. Both of those run the same load-time
`WriteFile` and both pass ModelScan with zero issues.
Tested against the latest published versions of TensorFlow and ModelScan
on PyPI (TensorFlow 2.21.0, ModelScan 0.8.8).
## What's in the box
```
build_poc.py            rebuild all three SavedModels
verify_poc.py           run ModelScan + load each model + the self-poison
                        first/second-load differential, all in an isolated cwd
requirements.txt        pinned tensorflow and modelscan
sample_results.json     sanitised output from a passing run
models/
  top_level_writefile/        ordinary WriteFile in the top-level GraphDef
  nested_writefile/           same WriteFile hidden inside a nested GraphDef
  self_poisoning_writefile/   nested WriteFile that overwrites the model's
                              own variables/ checkpoint at load
```
## What each case shows
### `top_level_writefile` — control (not a finding)
This case exists to prove the baseline. The top-level `GraphDef`
contains the `WriteFile` op directly. ModelScan 0.8.8 reports one HIGH
issue with `operator = WriteFile`, and the model writes its marker file
when loaded. **The control passing is the expected outcome** — it's the
"ModelScan does detect `WriteFile` when it's visible" half of the
differential. Without this case there's no way to say what the bypass
is bypassing.
### `nested_writefile` — bypass
Same `WriteFile`, same payload, same load-time effect. The only
difference is location. The top-level `GraphDef` contains only
`Const`, `DatasetFromGraph`, and `DatasetToGraphV2`. The `WriteFile`
sits inside a serialised `GraphDef` carried as the bytes of a `Const`
string tensor, and `DatasetFromGraph` consumes that string as a second
parser root.
ModelScan walks the top-level nodes and the function library. It
doesn't parse `Const` string tensors as nested `GraphDef`s, so the
hidden `WriteFile` is invisible to it. The scan reports
`total_issues = 0`. At load time TensorFlow traverses the inner graph
through the TF1 `legacy_init_op` and runs the hidden write anyway.
`saved_model_cli show --all` is also blind to the nested op.
### `self_poisoning_writefile` — persistent backdoor
A serving model with one variable `w` and signature `y = x * w`. The
model ships with `w = 2.0` on disk. The same nested `DatasetFromGraph`
mechanism carries two hidden `WriteFile` nodes targeting relative paths
`variables/variables.data-00000-of-00001` and `variables/variables.index`.
The contents are checkpoint bytes from a sibling template model built
with `w = 1337.0`.
At first load:
1. TF restores `w = 2.0` from disk into memory.
2. The `legacy_init_op` runs the hidden writes, which overwrite the
on-disk checkpoint with the `w = 1337.0` bytes.
3. Inference still uses the in-memory `w = 2.0`, so `y(3.0) = 6.0`.
The model behaves correctly on this first load, which means a deployment
pipeline that smoke-tests a model once and promotes it on success would
see no anomaly.
On every load after that:
1. TF restores `w = 1337.0` from the now-poisoned checkpoint.
2. Inference returns `y(3.0) = 4011.0`.
ModelScan reports the model clean for this case too.
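The first-load and second-load behaviour described above can be reproduced in miniature without TensorFlow. The following is a toy simulation only: the single-file "checkpoint" (`variables.bin`), `write_weight`, and `load_model` are hypothetical stand-ins, not part of this bundle or of any TensorFlow API. It mirrors the ordering of restore, hidden write, and inference.

```python
import struct
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical stand-in for variables/variables.data-00000-of-00001.
CKPT_NAME = "variables.bin"


def write_weight(model_dir: Path, w: float) -> None:
    """Store a single float64 weight as the toy checkpoint."""
    (model_dir / CKPT_NAME).write_bytes(struct.pack("<d", w))


def load_model(model_dir: Path, hidden_payload: Optional[float]) -> Callable[[float], float]:
    # Step 1: restore the variable from disk into memory.
    (w,) = struct.unpack("<d", (model_dir / CKPT_NAME).read_bytes())
    # Step 2: the init step runs after the restore and overwrites the file.
    if hidden_payload is not None:
        write_weight(model_dir, hidden_payload)
    # Step 3: inference uses the in-memory value captured in step 1.
    return lambda x: x * w


def simulate() -> tuple:
    """Load the same toy model twice and evaluate y(3.0) each time."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        write_weight(model_dir, 2.0)  # shipped weight
        first = load_model(model_dir, 1337.0)(3.0)
        second = load_model(model_dir, 1337.0)(3.0)
        return first, second
```

Calling `simulate()` returns the benign value for the first load and the poisoned value for the second, which is the differential a single-load smoke test would miss.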
## Requirements
The pinned versions match the latest stable releases on PyPI at the time
of writing.
```
python >= 3.10
tensorflow == 2.21.0 # latest on PyPI
modelscan == 0.8.8 # latest on PyPI
```
Install:
```
pip install -r requirements.txt
```
## Reproducing the differential
The repo ships with the prebuilt models under `models/`. To verify all
three cases:
```
python verify_poc.py
```
The verifier picks a fresh temp directory, runs ModelScan and
`saved_model_cli show --all` on each model, then loads each model in a
child Python process. For the marker cases the child cwd is set to the
per-case workdir so the relative marker path resolves there. For the
self-poisoning case, the model is copied into the workdir, the child cwd
is set to the copied model directory, and the model is loaded twice with
its `variables/` hashes captured before, after the first load, and after
the second load.
Expected output:
```json
{
  "top_level_writefile": {
    "pass": true,
    "intent": "control: ModelScan is expected to detect top-level WriteFile",
    "top_level_has_writefile": true,
    "modelscan_total_issues": 1,
    "modelscan_flagged_writefile": true,
    "marker_written": true
  },
  "nested_writefile": {
    "pass": true,
    "intent": "bypass: ModelScan is expected to miss nested WriteFile, load is expected to run it anyway",
    "top_level_has_writefile": false,
    "nested_has_writefile": true,
    "modelscan_total_issues": 0,
    "modelscan_issues_empty": true,
    "marker_written": true
  },
  "self_poisoning_writefile": {
    "pass": true,
    "intent": "persistent backdoor: first load returns benign output, on-disk checkpoint is rewritten, second load returns attacker-chosen output",
    "top_level_has_writefile": false,
    "nested_has_writefile": true,
    "modelscan_total_issues": 0,
    "modelscan_issues_empty": true,
    "input": 3.0,
    "expected_benign_output": 6.0,
    "expected_poison_output": 4011.0,
    "first_load_output": 6.0,
    "second_load_output": 4011.0,
    "first_load_matches_benign": true,
    "second_load_matches_poison": true,
    "checkpoint_changed_after_first_load": true,
    "checkpoint_stable_after_second_load": true
  }
}
```
If you want to rebuild the models yourself before verifying:
```
python build_poc.py --overwrite
python verify_poc.py
```
If you want the loader to use a directory you choose instead of a temp
directory:
```
python verify_poc.py --workdir ./scratch
```
After a run the marker cases leave files at:
```
./scratch/top_level_writefile/tfhunt_markers/top_level_writefile.txt
./scratch/nested_writefile/tfhunt_markers/nested_writefile.txt
```
Both contain `tfhunt_writefile_payload\n`.
The self-poisoning case leaves a copy of the model with the poisoned
checkpoint at:
```
./scratch/self_poisoning_writefile/self_poisoning_writefile/
```
Reload that copy directly with `tf.saved_model.load` and you'll see
`y(3.0) = 4011.0` instead of the original `6.0`.
## Manual verification
If you don't want to trust `verify_poc.py`, each claim below can be
checked with the stock tooling. Every step assumes your cwd is the
repo root, and ends back there.
### 1. ModelScan flags the control, misses the bypass
```
modelscan scan -p ./models/top_level_writefile
modelscan scan -p ./models/nested_writefile
```
The first scan reports one HIGH issue with `operator = WriteFile`,
which is the control behaving correctly. The second scan reports
`total_issues = 0` on a model that performs the same load-time write.
### 2. `saved_model_cli` is blind to the nested op
```
python -m tensorflow.python.tools.saved_model_cli show --dir ./models/nested_writefile --all
```
The relevant line in the output is:
```
The MetaGraph with tag set ['serve'] contains the following ops: {'Const', 'DatasetToGraphV2', 'DatasetFromGraph'}
```
No mention of `WriteFile`.
### 3. `tf.saved_model.load` runs the hidden write
The marker path baked into the model is relative, so the write lands
wherever the loader's cwd is. Run the load in a clean working directory.
Linux / macOS:
```
mkdir manual_load && cd manual_load
python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
ls tfhunt_markers
cat tfhunt_markers/nested_writefile.txt
cd ..
```
Windows PowerShell:
```
New-Item -ItemType Directory manual_load | Out-Null; Set-Location manual_load
python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
Get-ChildItem tfhunt_markers
Get-Content tfhunt_markers\nested_writefile.txt
Set-Location ..
```
Expected file content:
```
tfhunt_writefile_payload
```
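The cwd dependence in step 3 can be seen without TensorFlow. The sketch below runs a child Python process with its cwd set to a scratch directory and has it write to a relative path, the way the marker models do. The child code, the `demo.txt` file name, and the helper names are illustrative assumptions; this is not the code in `verify_poc.py`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a load that writes to a relative path. The real models do this
# through a WriteFile op; here the child process simply writes the file itself.
CHILD_CODE = (
    "from pathlib import Path; "
    "p = Path('tfhunt_markers/demo.txt'); "
    "p.parent.mkdir(parents=True, exist_ok=True); "
    "p.write_text('tfhunt_writefile_payload\\n')"
)


def run_in_workdir(workdir: Path) -> Path:
    """Run the child with cwd=workdir and return where the marker should land."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run([sys.executable, "-c", CHILD_CODE], cwd=workdir, check=True)
    return workdir / "tfhunt_markers" / "demo.txt"


def demo() -> str:
    """Write the marker inside a throwaway directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = run_in_workdir(Path(tmp) / "case")
        return marker.read_text()
```

Because the relative path is resolved in the child, the marker ends up under the directory passed as `cwd`, not under the parent's working directory.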
### 4. Self-poisoning rewrites the checkpoint on first load
The relative `WriteFile` targets are `variables/...`, so the load must
run with cwd set to a copy of the model directory. The copy step is
important — without it, the bundled model itself would get poisoned.
Linux / macOS:
```
cp -r models/self_poisoning_writefile manual_poison
cd manual_poison
sha256sum variables/variables.data-00000-of-00001
python -c "import tensorflow as tf; tf.saved_model.load('.')"
sha256sum variables/variables.data-00000-of-00001
```
Windows PowerShell:
```
Copy-Item -Recurse models/self_poisoning_writefile manual_poison
Set-Location manual_poison
Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
python -c "import tensorflow as tf; tf.saved_model.load('.')"
Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
```
The two hashes will differ. The on-disk checkpoint has been physically
overwritten by the load.
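For a platform-independent version of the hash comparison in step 4, a small helper along these lines could snapshot every file under `variables/` before and after the load. The function names `hash_tree` and `changed_files` are made up for this sketch and do not come from `verify_poc.py`.

```python
import hashlib
from pathlib import Path


def hash_tree(root) -> dict:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict, after: dict) -> list:
    """Names whose digest differs, or that exist in only one snapshot."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )


# Intended use from inside the copied model directory:
#   before = hash_tree("variables")
#   ... run tf.saved_model.load('.') in a child process ...
#   after = hash_tree("variables")
#   print(changed_files(before, after))
```

Comparing whole-directory snapshots also covers `variables/variables.index`, which the shell commands above do not hash.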
### 5. Second load returns the attacker's weights
Stay in the `manual_poison/` directory from step 4 and run the load
again, this time invoking the serving signature:
```
python -c "import tensorflow as tf; loaded = tf.saved_model.load('.'); out = loaded.signatures['serving_default'](x=tf.constant(3.0)); print(float(next(iter(out.values())).numpy()))"
```
Expected output:
```
4011.0
```
The shipped model is `y = x * w` with `w = 2.0`, so a clean `y(3.0)`
would be `6.0`. After the load-time write in step 4 rewrote the
checkpoint to `w = 1337.0`, the second load reads the poisoned weights
and returns `4011.0`. Return to the repo root with `cd ..` when done.
### Troubleshooting
- `ModuleNotFoundError: No module named 'tensorflow'` — TensorFlow
isn't installed in the active environment. Run
`pip install -r requirements.txt` from the repo root.
- `pip install` resolution fails on `modelscan` — it needs
Python 3.10-3.12. If the extras aren't pulled in, install with
`pip install 'modelscan[tensorflow,h5py]==0.8.8'`.
- Step 3's marker file doesn't appear — the cwd isn't where you think
it is. Add `import os; print(os.getcwd())` before the `tf.saved_model.load`
call to confirm.
- Step 5 still returns `6.0` — the cwd in step 4 wasn't the copied
`manual_poison` directory, so nothing was poisoned. Copy the model
again from the bundle (`models/self_poisoning_writefile`) and rerun
step 4 with the new copy. Or just rebuild everything from scratch
with `python build_poc.py --overwrite`.
- `saved_model_cli: command not found` — it ships with TensorFlow but
isn't always on `PATH`. Use the explicit form
`python -m tensorflow.python.tools.saved_model_cli show ...`.
## Why this is interesting
ModelScan flags `WriteFile` as HIGH when it sees it in the top-level
graph, so the operator is already on the unsafe list. The bypass isn't
about the operator. It's about where it's allowed to hide.
The same idea generalises to any side-effecting op that TensorFlow will
run from inside an inner dataset graph. `WriteFile` is the cleanest
demonstration because it's already on ModelScan's denylist, which makes
the top-level-vs-nested differential unambiguous.
The self-poisoning case turns that file-write primitive into a
persistent output-manipulation backdoor that's hard to catch with a
single-load smoke test, because the malicious output only appears on the
second and later loads.
The hidden write also runs in `tf.lite.TFLiteConverter.from_saved_model`,
`tf2onnx.convert`, TensorFlow Serving, and the NVIDIA Triton TensorFlow
backend. Those tests live outside this PoC bundle to keep it small and
auditable, but they use models built the same way.
## Safety
These models have exactly two kinds of side effect at load time:
- `top_level_writefile` and `nested_writefile` write
`tfhunt_writefile_payload\n` to a relative path
`tfhunt_markers/<name>.txt`, resolved against the loader's working
directory.
- `self_poisoning_writefile` overwrites two relative paths
`variables/variables.data-00000-of-00001` and `variables/variables.index`
with the byte content of a `w = 1337.0` template checkpoint. Because
the verifier sets cwd to the copied model directory, those writes only
touch the copy, not the bundled artifact.
None of the models reach for absolute paths, environment variables,
network, credentials, or any other resource.
If you want to inspect the nested graphs yourself without loading the
models, `verify_poc.py`'s `inspect_saved_model` function parses the
serialised inner `GraphDef`s and lists their nodes.
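A cruder check that needs neither TensorFlow nor a protobuf parser is a raw byte search. Protobuf stores string and bytes fields verbatim, so an op name inside a nested serialised `GraphDef` should still appear as plain bytes in `saved_model.pb`. The sketch below is a heuristic under that assumption; the name list and function names are illustrative, not ModelScan's denylist or API.

```python
from pathlib import Path

# Illustrative names only; a real scanner would use its own denylist.
SUSPICIOUS_OP_NAMES = (b"WriteFile", b"ReadFile", b"DatasetFromGraph")


def find_op_name_bytes(data: bytes, names=SUSPICIOUS_OP_NAMES) -> dict:
    """Return {name: [byte offsets]} for each name that occurs in data."""
    hits = {}
    for name in names:
        offsets = []
        start = data.find(name)
        while start != -1:
            offsets.append(start)
            start = data.find(name, start + 1)
        if offsets:
            hits[name.decode()] = offsets
    return hits


def scan_file(path) -> dict:
    """Apply the byte search to a file such as models/<case>/saved_model.pb."""
    return find_op_name_bytes(Path(path).read_bytes())
```

This can produce false positives (the bytes may belong to a node name or a plain string constant), and it would likely miss a payload that is assembled or decoded at run time, so it is a triage aid and no replacement for the recursive parse described under the suggested fix.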
## Suggested fix
The gap in `modelscan.scanners.SavedModelTensorflowOpScan` is that it
walks `GraphDef.node` and the function library on the top-level
`MetaGraphDef` but doesn't recurse into ops whose inputs are serialised
`GraphDef` bytes. The fix is to treat those ops as parser roots.
Sketch of what the scan loop could look like:
```python
from google.protobuf.message import DecodeError
from tensorflow.core.framework.graph_pb2 import GraphDef

# UNSAFE_OPERATORS, report_issue and resolve_const_string_input are
# placeholders for the scanner's own denylist and helpers.
NESTED_GRAPHDEF_OPS = {
    "DatasetFromGraph",  # ops that accept a serialised GraphDef in a string input
    "XlaCallModule",     # carries a serialised StableHLO / MLIR module
}
MAX_RECURSION_DEPTH = 4
MAX_INNER_BYTES = 10 * 1024 * 1024


def scan_graphdef(graph_def, depth=0):
    if depth > MAX_RECURSION_DEPTH:
        return
    for node in graph_def.node:
        if node.op in UNSAFE_OPERATORS:
            report_issue(node, depth=depth)
        if node.op in NESTED_GRAPHDEF_OPS:
            inner_bytes = resolve_const_string_input(node, "graph_def", graph_def)
            if inner_bytes is None or len(inner_bytes) > MAX_INNER_BYTES:
                continue
            inner = GraphDef()
            try:
                inner.ParseFromString(inner_bytes)
            except DecodeError:
                # Not a GraphDef (e.g. an MLIR module); needs its own parser.
                continue
            scan_graphdef(inner, depth=depth + 1)
    for fn in graph_def.library.function:
        for node in fn.node_def:
            # Same walk as above, on the function library.
            ...
```
The bounded recursion depth and byte cap stop a malicious model from
turning a recursive scan into a parser DoS.
The same logic would help any scanner that gates `.pb` files on a
top-level op walk. For TensorFlow itself, documenting that any op
carrying serialised IR (`DatasetFromGraph`, `XlaCallModule`, and so on)
should be treated as a parser root by external scanners would help
downstream tooling write fixes that cover all of them at once.
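The `resolve_const_string_input` helper in the sketch is left undefined. One possible shape for it is shown below, assuming the standard `NodeDef` layout (`name`, `op`, `input`, and `attr["value"].tensor.string_val` on `Const` nodes). The signature here is simplified to take a positional input index instead of an argument name, and it only follows a direct edge to a `Const` node; both are assumptions of this sketch, not ModelScan code.

```python
def resolve_const_string_input(node, graph_def, input_index=0):
    """Return the bytes of the Const string tensor feeding the given input.

    Returns None when the input is missing, is a control edge, or is not
    produced directly by a Const node holding exactly one string value.
    """
    if len(node.input) <= input_index:
        return None
    # Input references look like "name", "name:0", or "^name" (control edge).
    ref = node.input[input_index]
    if ref.startswith("^"):
        return None
    producer_name = ref.split(":", 1)[0]
    for candidate in graph_def.node:
        if candidate.name == producer_name and candidate.op == "Const":
            values = candidate.attr["value"].tensor.string_val
            return bytes(values[0]) if len(values) == 1 else None
    return None
```

A production scanner would probably want to treat an unresolved input (for example one routed through `Identity` or computed at run time) as a finding in its own right instead of silently skipping it.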
## Files generated by a run
`verify_poc.py` writes:
- `verification.json` next to the script. This contains absolute paths
from your machine, so it's `.gitignore`d and is not part of the
shipped artifact.
`build_poc.py` writes:
- `models/top_level_writefile/saved_model.pb`
- `models/nested_writefile/saved_model.pb`
- `models/self_poisoning_writefile/saved_model.pb`
- `models/self_poisoning_writefile/variables/variables.data-00000-of-00001`
- `models/self_poisoning_writefile/variables/variables.index`
The first two models have empty `variables/` directories. That's expected for those graphs.
## Environment used to validate
```
Python 3.12.3
tensorflow 2.21.0
modelscan 0.8.8
Windows host
```