PoC: Introspection Gap - dill._dill.loads BINBYTES Payload Undetected by PickleScan and ModelScan
⚠️ SECURITY RESEARCH: DO NOT LOAD THESE FILES IN AN UNTRUSTED ENVIRONMENT
This repository contains proof-of-concept pickle files that demonstrate a scanner introspection gap in PickleScan and ModelScan. The payloads are benign (each creates an empty canary file) but prove arbitrary code execution.
Summary
Two PoCs isolate the root cause: scanners do not introspect the bytes argument
of a REDUCE callable, so a payload embedded as opaque SHORT_BINBYTES is never
analyzed. The two PoCs together form a controlled experiment:
| PoC | Carrier | PickleScan (PyPI) | PickleScan (HF) | Inner payload found? |
|---|---|---|---|---|
| poc_dill_nested.pkl | BINBYTES | BENIGN | SUSPICIOUS (same as LinearRegression) | No: BINBYTES not traversed |
| poc_torch_chain.pt | zip | BENIGN | MALICIOUS | Yes: zip traversal already implemented |
The torch chain being flagged on HF shows that inner-payload introspection works where it is implemented. The dill chain evades detection because PickleScan traverses zip archives but not pickles embedded in BINBYTES. Both chains achieve verified arbitrary code execution.
Reproduction
# Install dependencies
pip install dill torch picklescan modelscan
# --- PoC 1: dill._dill.loads ---
picklescan -p poc_dill_nested.pkl # BENIGN (false negative)
modelscan -p poc_dill_nested.pkl # BENIGN (false negative)
python poc_build.py --verify # [+] CANARY FIRED: /tmp/canary_dill_bypass_poc
# --- PoC 2: torch.storage._load_from_bytes ---
picklescan -p poc_torch_chain.pt # BENIGN (false negative)
python poc_torch_chain.py --verify # [+] CANARY FIRED: /tmp/canary_torch_chain_bypass_poc
Root Cause
Both PickleScan and ModelScan extract (module, name) pairs from
GLOBAL/STACK_GLOBAL/INST opcodes and check them against a denylist.
Neither scanner recurses into the bytes argument passed to a REDUCE callable.
Two callables are absent from all denylists:
- dill._dill / dill._dill.loads: absent from PickleScan _unsafe_globals (lines 120-226) and ModelScan unsafe_globals (lines 94-142)
- torch.storage._load_from_bytes: absent from both denylists; calls torch.load(io.BytesIO(b), weights_only=False) internally, re-entering pickle deserialization
A TODO comment at PickleScan scanner.py:229 acknowledges numpy.load, pandas.read_pickle,
joblib.load, and torch.load as known unhandled nested loaders. Neither dill._dill.loads nor
torch.storage._load_from_bytes appears in that acknowledged list.
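The gap described above can be illustrated with a minimal sketch built on the standard-library `pickletools.genops`. This is not PickleScan's or ModelScan's code: `flat_globals` and `STRING_OPS` are hypothetical names, the STACK_GLOBAL handling is a simplification (it assumes the two most recent string opcodes are the module and name), and the nested pickle references the harmless `collections.OrderedDict` rather than any dangerous callable.

```python
import collections
import pickle
import pickletools

# Opcodes whose argument is a text string (used to resolve STACK_GLOBAL).
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def flat_globals(data: bytes) -> set:
    """Collect (module, name) pairs from one pickle stream, without recursion."""
    found = set()
    recent = []  # string arguments seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            recent.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as "module name" joined by a space.
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent) >= 2:
            # Simplifying assumption: the last two strings are module and name.
            found.add((recent[-2], recent[-1]))
        # Bytes opcodes (SHORT_BINBYTES, BINBYTES, ...) fall through here:
        # their argument is treated as opaque data and never inspected.
    return found


# A benign inner pickle that references a global, wrapped as a bytes object.
inner = pickle.dumps(collections.OrderedDict, protocol=4)
outer = pickle.dumps(inner, protocol=4)

print(flat_globals(inner))  # {('collections', 'OrderedDict')}
print(flat_globals(outer))  # set()
```

The outer stream contains only a bytes opcode, so a flat pass reports no globals even though the embedded bytes are themselves a pickle that references one.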
Payload Details
| PoC | Payload | Canary |
|---|---|---|
| poc_dill_nested.pkl | os.system('touch /tmp/canary_dill_bypass_poc') | /tmp/canary_dill_bypass_poc |
| poc_torch_chain.pt | posix.system('touch /tmp/canary_torch_chain_bypass_poc') | /tmp/canary_torch_chain_bypass_poc |
All payloads are benign: each creates an empty file and has no destructive behavior.
Files
| File | Description |
|---|---|
| poc_dill_nested.pkl | 79-byte PoC 1 (pre-built) |
| poc_build.py | PoC 1 regeneration script (no binary deps) |
| poc_torch_chain.pt | 457-byte PoC 2 (pre-built) |
| poc_torch_chain.py | PoC 2 regeneration script (torch for verify) |
| README.md | This file |
Suggested Fix
Minimal fix: Add ('dill._dill', 'loads'), ('dill._dill', 'load'),
and ('dill', 'loads') to the scanner denylists.
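As a rough illustration of what the minimal fix amounts to, the sketch below models a denylist as a set of (module, name) tuples. The `DENYLIST` contents and the `is_denied` helper are hypothetical; the real scanners organize their denylists differently, and the two `system` entries are included only for illustration.

```python
# Hypothetical denylist of (module, name) pairs, for illustration only.
DENYLIST = {
    ("os", "system"),
    ("posix", "system"),
    # Entries proposed by the minimal fix:
    ("dill._dill", "loads"),
    ("dill._dill", "load"),
    ("dill", "loads"),
}


def is_denied(module: str, name: str) -> bool:
    """Return True if the referenced global is on the denylist."""
    return (module, name) in DENYLIST


print(is_denied("dill._dill", "loads"))                # True
print(is_denied("torch.storage", "_load_from_bytes"))  # False
```

A lookup like this only covers the qualnames that have been listed, so any other callable that re-enters deserialization remains unflagged until it is added too.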
Structural fix: Implement recursive inner-pickle scanning for all
callables that re-enter deserialization (pickle.loads, dill.loads,
cloudpickle.loads, joblib.load, numpy.load, pandas.read_pickle,
torch.load, torch.storage._load_from_bytes, etc.). This addresses the
entire nested-deserialization weakness class rather than individual qualnames.
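One way the structural fix could look is sketched below, assuming only the standard-library `pickletools` module. It is not taken from either scanner: `recursive_globals`, `STRING_OPS`, `BYTES_OPS`, and `max_depth` are hypothetical names, the STACK_GLOBAL handling is simplified, and the sketch only covers pickles embedded directly as bytes (zip-wrapped payloads such as the torch carrier would need separate handling). The demo uses a benign nested pickle that references `collections.OrderedDict`.

```python
import collections
import pickle
import pickletools

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}
BYTES_OPS = {"SHORT_BINBYTES", "BINBYTES", "BINBYTES8", "BYTEARRAY8"}


def recursive_globals(data: bytes, depth: int = 0, max_depth: int = 8) -> set:
    """Collect (module, name) pairs, also scanning bytes arguments as pickles."""
    found = set()
    recent = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name in STRING_OPS:
                recent.append(arg)
            elif opcode.name in ("GLOBAL", "INST"):
                module, name = arg.split(" ", 1)
                found.add((module, name))
            elif opcode.name == "STACK_GLOBAL" and len(recent) >= 2:
                # Simplifying assumption: last two strings are module and name.
                found.add((recent[-2], recent[-1]))
            elif opcode.name in BYTES_OPS and depth < max_depth:
                # Speculatively treat the bytes argument as a nested pickle.
                found |= recursive_globals(bytes(arg), depth + 1, max_depth)
    except Exception:
        # The bytes were not a well-formed pickle; keep what was collected.
        # Partial parses of arbitrary data may over-report in this sketch.
        pass
    return found


inner = pickle.dumps(collections.OrderedDict, protocol=4)
outer = pickle.dumps(inner, protocol=4)    # pickle embedded as bytes
double = pickle.dumps(outer, protocol=4)   # two levels of nesting

print(recursive_globals(outer))   # {('collections', 'OrderedDict')}
print(recursive_globals(double))  # {('collections', 'OrderedDict')}
```

The depth limit bounds the work on deeply nested input, and parsing every bytes argument speculatively trades some false positives for coverage; a production scanner would likely want a stricter test for whether a bytes value is really a pickle.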
Responsible Disclosure
This PoC was reported to the maintainers of PickleScan and ModelScan via
Huntr before public disclosure. Access to this repository is gated;
protectai-bot has been granted access for triage.