PoC: ModelScan .joblib scanner bypass via uncompressed genops desync
Authorized security-research PoC for the Protect AI / huntr Model File Vulnerability program.
Tested against modelscan==0.8.8 (latest), joblib==1.5.3, numpy==2.4.6, Python 3.11.
poc_model.joblib executes code on joblib.load() (a harmless marker touch). Do not load it outside an isolated sandbox. Published gated, for review only.
What it shows
poc_model.joblib is an uncompressed joblib file shaped like a normal model
([numpy weights array, object]). ModelScan disassembles .joblib with
pickletools.genops over the raw bytes. joblib splices the raw numpy-array bytes, plus
0xff alignment padding, into the pickle opcode stream, which desyncs genops. ModelScan
catches the parse error and still prints "No issues found! 🎉". joblib.load reads the
array by its recorded length and then executes the trailing __reduce__, giving code
execution at load time.
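The desync can be illustrated with the standard library alone. The sketch below is a synthetic stand-in, not the PoC file: it places a few raw 0xff bytes between valid opcodes and records what pickletools.genops yields before it raises. The helper name opcodes_seen_before_error is hypothetical, and the byte string carries no payload.

```python
import pickle
import pickletools


def opcodes_seen_before_error(data: bytes):
    """Return (names of opcodes parsed, error message or None)."""
    seen = []
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            seen.append(opcode.name)
    except ValueError as exc:
        # genops raises ValueError on an unknown opcode byte
        return seen, str(exc)
    return seen, None


# A well-formed pickle is walked all the way to STOP.
names, err = opcodes_seen_before_error(pickle.dumps([1, 2, 3]))

# Synthetic stream: PROTO 4, EMPTY_LIST, eight raw 0xff bytes standing in for
# in-band array data and padding, then a valid tail (SHORT_BINUNICODE, STOP).
desynced = b"\x80\x04]" + b"\xff" * 8 + b"\x8c\x04tail."
names2, err2 = opcodes_seen_before_error(desynced)

print(names[-1], err)   # last opcode of the clean stream, no error
print(names2, err2)     # only the opcodes before the raw bytes, plus the error
```

Everything after the first unparseable byte, including the valid tail, is never inspected. A scanner that swallows this exception has therefore examined only the opcodes that precede the array data.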
Distinct from the known joblib compression bypass: this file is uncompressed, so
remediating the compression bypass by decompressing before scanning does not fix it.
The shared root defect is that ModelScan treats a genops parse error as "clean" instead
of "scan failed".
Reproduce
pip install modelscan==0.8.8 joblib numpy
# 1) Scanner reports the file as clean:
modelscan -p poc_model.joblib
# --- Summary ---
# No issues found!
# --- Errors ---
# Parsing error: at position 223, opcode b'\x10' unknown
# 2) Loader executes the payload (run ONLY in a sandbox / VM / container):
python -c "import joblib; joblib.load('poc_model.joblib')"
cat /tmp/modelscan_joblib_poc.txt # -> modelscan_joblib_poc (proof code ran)
Control: the same payload without a leading array (compress=0) is correctly flagged by
ModelScan as CRITICAL: Use of unsafe operator 'system' from module 'posix', proving the
scanner works normally and that the in-band array bytes are what defeat it.
Suggested fix
Treat a genops/pickle parse error as a scan failure (suspicious), never as
"No issues found". For joblib, honour the NumpyArrayWrapper length to skip array payload
bytes and keep scanning the remaining opcodes.
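A minimal sketch of the fail-closed behaviour, using only the standard library. This is not ModelScan's code: ScanResult, scan_pickle_bytes and the tiny UNSAFE_GLOBALS denylist are hypothetical names chosen for illustration, and the STACK_GLOBAL handling is deliberately simplified (it assumes the two most recent string pushes are the module and the name). It does not implement the NumpyArrayWrapper skipping step.

```python
import pickle
import pickletools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical denylist for illustration; a real scanner's list is much larger.
UNSAFE_GLOBALS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("builtins", "eval"), ("builtins", "exec"),
}


@dataclass
class ScanResult:
    status: str  # "clean", "unsafe", or "scan_failed"
    findings: List[Tuple[str, str]] = field(default_factory=list)
    error: Optional[str] = None


def scan_pickle_bytes(data: bytes) -> ScanResult:
    findings: List[Tuple[str, str]] = []
    strings: List[str] = []  # recent string pushes, used to resolve STACK_GLOBAL
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                # pickletools reports the GLOBAL argument as "module name"
                module, _, name = arg.partition(" ")
                ref = (module, name)
            elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
                ref = (strings[-2], strings[-1])
            else:
                if isinstance(arg, str):
                    strings.append(arg)
                continue
            if ref in UNSAFE_GLOBALS:
                findings.append(ref)
    except Exception as exc:
        # Fail closed: an unparseable stream is never reported as clean.
        return ScanResult("scan_failed", findings, str(exc))
    return ScanResult("unsafe" if findings else "clean", findings)


print(scan_pickle_bytes(pickle.dumps([1, 2, 3])).status)
print(scan_pickle_bytes(pickle.dumps(eval)).status)   # a reference to eval; nothing is executed
print(scan_pickle_bytes(b"\x80\x04]\xff\xff.").status)
```

The design point is that "clean" is reachable only when genops has walked the whole stream without raising. Any parse error yields a distinct "scan_failed" status that callers can treat as suspicious.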