ModelScan legacy PyTorch multi-pickle scanner mismatch
This repository contains a benign, non-executing regression artifact for a ModelScan scanner mismatch in legacy PyTorch .pt files.
legacy_marker.pt follows the legacy PyTorch stream shape:
- PyTorch magic number pickle
- protocol-version pickle
- sys-info pickle
- object pickle
- storage-key-list pickle
The object pickle contains a harmless reference to sys.getsizeof. It contains no command-execution payload and should have no side effects when inspected. ModelScan's default unsafe-global policy marks sys:* as CRITICAL, but scan_pytorch() calls scan_pickle_bytes(..., multiple_pickles=False), so only the first pickle (the magic number) is inspected and the sys.getsizeof reference in the object pickle is never reported.
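The five-record shape can be reproduced with the standard library alone. The following is a minimal sketch, not the script that produced legacy_marker.pt. The magic number, protocol version, and sys-info fields are assumptions recalled from PyTorch's serialization module, the storage-key list is left empty, and build_legacy_like_stream and record_offsets are hypothetical helper names. The records are walked with pickletools.genops, which decodes opcodes without executing them.

```python
import io
import pickle
import pickletools
import sys

# Assumed values, recalled from torch/serialization.py; verify against your PyTorch version.
MAGIC_NUMBER = 0x1950A86A20F9469CFC6C
PROTOCOL_VERSION = 1001


def build_legacy_like_stream() -> bytes:
    """Concatenate five pickles in the legacy order (hypothetical helper)."""
    sys_info = {
        "protocol_version": PROTOCOL_VERSION,
        "little_endian": sys.byteorder == "little",
    }
    records = [
        MAGIC_NUMBER,      # magic number pickle
        PROTOCOL_VERSION,  # protocol-version pickle
        sys_info,          # sys-info pickle
        sys.getsizeof,     # object pickle: stored by reference, never called
        [],                # stand-in for the storage-key list
    ]
    return b"".join(pickle.dumps(record, protocol=2) for record in records)


def record_offsets(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of each pickle in the stream."""
    stream = io.BytesIO(data)
    spans = []
    while stream.tell() < len(data):
        start = stream.tell()
        # genops stops at the STOP opcode, leaving the stream at the next record.
        for _ in pickletools.genops(stream):
            pass
        spans.append((start, stream.tell()))
    return spans


if __name__ == "__main__":
    for index, (start, end) in enumerate(record_offsets(build_legacy_like_stream())):
        print(f"record {index}: bytes {start}..{end}")
```

Only the fourth record carries a global reference, so a scanner that stops at the end of record 0 never reaches it.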
Local verification
Tested against protectai/modelscan tag v0.8.8, commit 61fcec9c2a37c24c1fb12d84ede30fe248a364bd.
ModelScan scan(path) issues: 0
ModelScan scan(path) errors: 0
ModelScan scanned: ['legacy_marker.pt']
Full multi-pickle scan issues: [('sys', 'getsizeof', 'CRITICAL')]
Artifact SHA-256:
c49a1db639e12cd375462eb2f377596513c40f5c440d901e62aa5dbf04622660 legacy_marker.pt
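To confirm that a local copy matches the digest above, a short helper along these lines could be used. This is a sketch only: sha256_of is a hypothetical name, and the script assumes legacy_marker.pt sits in the current working directory.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "c49a1db639e12cd375462eb2f377596513c40f5c440d901e62aa5dbf04622660"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    artifact = Path("legacy_marker.pt")  # assumed location
    if artifact.exists():
        status = "match" if sha256_of(artifact) == EXPECTED_SHA256 else "MISMATCH"
        print(f"{artifact}: {status}")
    else:
        print(f"{artifact} not found")
```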
Relevant source behavior
PyTorch's legacy serializer writes multiple pickles before and after the object pickle. In current PyTorch source, _legacy_save() writes the magic number, protocol version, sys-info, then pickler.dump(obj), followed by serialized storage keys. _legacy_load() reads those fields in the same order and deserializes the object pickle after the sys-info record.
ModelScan's scan_pytorch() accepts the PyTorch magic number, then calls the pickle scanner with multiple_pickles=False. The scanner therefore stops after the first pickle and never inspects the object pickle.
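The effect of that flag can be illustrated with a small stand-in scanner. This is not ModelScan's code: find_globals is a hypothetical function that mimics the stop-after-first-pickle behavior described above, the stream is a simplified five-record stand-in with assumed header values, and only the GLOBAL opcode (pickle protocols 0 to 3) is handled. A real scanner would also need to resolve STACK_GLOBAL.

```python
import io
import pickle
import pickletools
import sys


def find_globals(data: bytes, multiple_pickles: bool) -> list[tuple[str, str]]:
    """Collect (module, name) pairs from GLOBAL opcodes without unpickling."""
    stream = io.BytesIO(data)
    found = []
    while stream.tell() < len(data):
        for opcode, arg, _pos in pickletools.genops(stream):
            if opcode.name == "GLOBAL":
                # pickletools reports the argument as "module name".
                module, name = arg.split(" ", 1)
                found.append((module, name))
        if not multiple_pickles:
            break  # mimic a scanner that stops after the first pickle
    return found


# Simplified stand-in stream: magic number, protocol version, sys-info,
# object pickle, storage-key list. Header values are assumptions.
stream_bytes = b"".join(
    pickle.dumps(record, protocol=2)
    for record in (0x1950A86A20F9469CFC6C, 1001, {}, sys.getsizeof, [])
)

print(find_globals(stream_bytes, multiple_pickles=False))  # []
print(find_globals(stream_bytes, multiple_pickles=True))   # [('sys', 'getsizeof')]
```

The two results mirror the local verification output: zero issues when only the first record is read, and the sys.getsizeof reference when every record is read.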
Expected result
ModelScan should inspect all pickle records that are part of the legacy PyTorch stream, or at minimum continue through the object pickle after validating the magic number and protocol metadata.
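One way the minimum behavior could look is sketched below, under stated assumptions: the header records are read with an unpickler that refuses every global, the magic number is checked, and the returned offset marks where static scanning of the object pickle should begin. NoGlobalsUnpickler and offset_after_header are hypothetical names, the magic number is recalled from PyTorch's source, and none of this is ModelScan's implementation.

```python
import io
import pickle
import sys

ASSUMED_MAGIC_NUMBER = 0x1950A86A20F9469CFC6C  # assumption; verify against PyTorch


class NoGlobalsUnpickler(pickle.Unpickler):
    """Loads plain data only; any global reference aborts the load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} not allowed in header")


def offset_after_header(data: bytes) -> int:
    """Validate the three header pickles and return the object pickle's offset."""
    stream = io.BytesIO(data)
    if NoGlobalsUnpickler(stream).load() != ASSUMED_MAGIC_NUMBER:
        raise ValueError("not a legacy PyTorch stream")
    NoGlobalsUnpickler(stream).load()  # protocol version
    NoGlobalsUnpickler(stream).load()  # sys-info
    return stream.tell()


if __name__ == "__main__":
    # Simplified stand-in stream with the same five-record order.
    demo = b"".join(
        pickle.dumps(record, protocol=2)
        for record in (ASSUMED_MAGIC_NUMBER, 1001, {}, sys.getsizeof, [])
    )
    print("object pickle starts at byte", offset_after_header(demo))
```

The bytes from that offset onward would then be handed to the opcode scanner rather than loaded, so the object pickle is inspected without being executed.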