Flax/Orbax Sparse-Shape Checkpoint PoC
This is a benign security PoC for a Flax/Orbax checkpoint resource-amplification issue. It demonstrates a tiny legacy Orbax/Zarr checkpoint whose model-controlled metadata restores as a much larger zero-filled float32 tensor.
Files
- build_poc.py creates the checkpoint artifact.
- verify_poc.py restores the checkpoint and writes results.json.
- poc_sparse_shape_checkpoint/checkpoint_0/ is the generated checkpoint artifact.
- poc_sparse_shape_checkpoint/artifact_manifest.json records file sizes and SHA256 values.
- modelscan_checkpoint_0.json is ModelScan output.
- results.json is runtime restore output from the local validation run.
Trigger
The checkpoint uses legacy Orbax/Zarr layout with:
- _METADATA: store_array_data_equal_to_fill_value=false
- marker/.zarray: a large declared shape
- no chunk data file
On restore, current Orbax/TensorStore fills missing chunk data and materializes the declared array. The staged artifact uses a safe default shape of 8,000,000 float32 elements, so the restore allocates 32,000,000 bytes (8,000,000 × 4 bytes, about 32 MB) from a sub-kilobyte checkpoint.
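The amplification is plain arithmetic over the declared metadata. The sketch below computes how many bytes a declared shape implies. It assumes the standard Zarr v2 `.zarray` JSON fields (`shape`, `dtype`) and a simple numeric dtype string such as `<f4`; the helper name `declared_nbytes` and the example metadata are hypothetical and are not taken from build_poc.py.

```python
import json
import math
import re

# Matches simple Zarr v2 numeric dtype strings such as "<f4" or "|u1".
# The trailing digits are the item size in bytes for these kinds.
_DTYPE_RE = re.compile(r"^[<>|]([biufc])(\d+)$")


def declared_nbytes(zarray_text):
    """Return the byte count implied by a .zarray document's shape and dtype."""
    meta = json.loads(zarray_text)
    match = _DTYPE_RE.match(meta["dtype"])
    if match is None:
        raise ValueError(f"unsupported dtype: {meta['dtype']!r}")
    itemsize = int(match.group(2))
    return math.prod(meta["shape"]) * itemsize


# Illustrative metadata only; the real file is written by build_poc.py.
example = json.dumps(
    {
        "zarr_format": 2,
        "shape": [8000000],
        "chunks": [8000000],
        "dtype": "<f4",
        "fill_value": 0.0,
        "order": "C",
        "compressor": None,
        "filters": None,
    }
)

print(declared_nbytes(example))  # 8,000,000 elements * 4 bytes = 32000000
```

Because the result depends only on metadata, the same few hundred bytes on disk can declare an arbitrarily large array.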
Reproduction
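The commands below use the Windows virtual-environment layout; on Linux or macOS, replace .venv/Scripts/ with .venv/bin/.
python -m venv .venv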
.venv/Scripts/python -m pip install flax orbax-checkpoint modelscan transformers msgpack psutil numpy
.venv/Scripts/python build_poc.py --shape 8000000
.venv/Scripts/python verify_poc.py
.venv/Scripts/modelscan scan -p poc_sparse_shape_checkpoint/checkpoint_0 -r json --show-skipped -o modelscan_checkpoint_0.json
Expected restore output includes:
- restored_shape: [8000000]
- restored_dtype: float32
- restored_nbytes: 32000000
- first_value and last_value: 0.0
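The expected fields above can be cross-checked mechanically. This is a minimal sketch that assumes the fields appear as top-level keys and that `restored_shape` is a JSON list; the actual layout of results.json is determined by verify_poc.py and may differ. The function name `check_results` is hypothetical.

```python
import math

FLOAT32_ITEMSIZE = 4  # bytes per float32 element


def check_results(results):
    """Return a list of human-readable mismatches; empty means consistent."""
    shape = results.get("restored_shape")
    if not isinstance(shape, list):
        return ["restored_shape is missing or not a list"]
    problems = []
    if results.get("restored_dtype") != "float32":
        problems.append("restored_dtype is not float32")
    expected_nbytes = math.prod(shape) * FLOAT32_ITEMSIZE
    if results.get("restored_nbytes") != expected_nbytes:
        problems.append(f"restored_nbytes != {expected_nbytes}")
    for key in ("first_value", "last_value"):
        if results.get(key) != 0.0:
            problems.append(f"{key} is not 0.0")
    return problems


# Values copied from the expected restore output listed above.
sample = {
    "restored_shape": [8000000],
    "restored_dtype": "float32",
    "restored_nbytes": 32000000,
    "first_value": 0.0,
    "last_value": 0.0,
}

print(check_results(sample))  # prints [] when all fields agree
```

To check a real run under the same assumptions, load results.json with `json.load` and pass the resulting dict to `check_results`.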
Scanner Output Summary
ModelScan 0.8.8 reports zero issues and skips _CHECKPOINT_METADATA, _METADATA, and marker/.zarray as unsupported files. This is not an arbitrary-code-execution (ACE) scanner bypass; it is evidence that the scanner does not currently cover a Flax/Orbax checkpoint surface that the runtime acts on.
Impact
An attacker-controlled checkpoint can be much smaller on disk than the array materialized during restore. Increasing the declared Zarr shape scales the allocation and can produce a denial of service in services that restore untrusted Flax/Orbax checkpoints.
Limitations:
- DoS/resource amplification only.
- No code execution.
- Demonstrated on legacy Orbax/Zarr checkpoint layout, not default OCDBT.
- Requires a service or user to restore the checkpoint.
Mitigations
- Reject or cap restored tensor shapes before TensorStore.read().
- Reject checkpoints with missing chunk data unless explicitly trusted.
- Treat _METADATA and .zarray as security-sensitive scanner targets.
- Enforce a maximum restored byte budget per checkpoint.
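The shape cap, missing-chunk check, and byte budget can be combined into one pre-restore pass over the checkpoint directory. The sketch below is not an Orbax or TensorStore API: it is a hypothetical standalone check (`check_zarr_budget`, `CheckpointRejected`) that reads only Zarr v2 `.zarray` metadata, assumes simple numeric dtype strings such as `<f4`, and uses a coarse heuristic for chunk presence.

```python
import json
import math
import re
from pathlib import Path

# Simple Zarr v2 numeric dtype strings such as "<f4"; digits are bytes per item.
_DTYPE_RE = re.compile(r"^[<>|]([biufc])(\d+)$")


class CheckpointRejected(Exception):
    """Raised when declared metadata violates the restore policy."""


def check_zarr_budget(checkpoint_dir, max_bytes=1 << 30, allow_missing_chunks=False):
    """Return the total declared bytes, or raise CheckpointRejected."""
    total = 0
    for zarray in sorted(Path(checkpoint_dir).rglob(".zarray")):
        meta = json.loads(zarray.read_text())
        if not isinstance(meta, dict):
            raise CheckpointRejected(f"{zarray}: metadata is not a JSON object")
        shape = meta.get("shape")
        if not isinstance(shape, list) or not all(
            isinstance(d, int) and d >= 0 for d in shape
        ):
            raise CheckpointRejected(f"{zarray}: invalid shape {shape!r}")
        match = _DTYPE_RE.match(str(meta.get("dtype")))
        if match is None:
            raise CheckpointRejected(f"{zarray}: unsupported dtype {meta.get('dtype')!r}")
        elements = math.prod(shape)
        total += elements * int(match.group(2))
        if total > max_bytes:
            raise CheckpointRejected(f"declared size {total} exceeds budget {max_bytes}")
        # Heuristic: any non-dot file under the array directory counts as chunk data.
        has_chunks = any(
            p.is_file() and not p.name.startswith(".")
            for p in zarray.parent.rglob("*")
        )
        if elements > 0 and not has_chunks and not allow_missing_chunks:
            raise CheckpointRejected(f"{zarray.parent}: no chunk data found")
    return total
```

A service would call `check_zarr_budget("poc_sparse_shape_checkpoint/checkpoint_0")` before handing the directory to the restore path and refuse the checkpoint if it raises. The chunk test only detects arrays with no chunk files at all; it does not verify that every expected chunk is present, and malformed JSON surfaces as `json.JSONDecodeError` rather than `CheckpointRejected`.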