JobLib Raw-Array Tail Scanner Bypass PoC

Benign security proof-of-concept for a current JobLib scanner/runtime parser mismatch.

Summary

A valid .joblib artifact can hide a dangerous pickle tail behind a JobLib NumpyArrayWrapper numeric array payload. joblib.load() reads the numeric array as raw bytes, then resumes unpickling and executes the later payload. ModelScan 0.8.8 and Picklescan 1.0.4 parse the same bytes as a plain pickle stream, so a crafted raw array beginning with a BINBYTES opcode makes the scanners treat the later dangerous tail as inert byte-string data.
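The "inert byte-string data" behaviour described above can be seen with the standard library alone. The sketch below is a benign illustration, not the PoC layout: it pickles an ordinary bytes object whose content happens to contain the text posix and system, then lists the opcodes a stream parser sees. The helper name opcode_names is hypothetical.

```python
import pickle
import pickletools


def opcode_names(data: bytes) -> list[str]:
    """Return the opcode names a plain pickle stream parser sees, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


# A bytes object longer than 255 bytes is emitted with the BINBYTES opcode.
# Its content is arbitrary; here it merely contains suspicious-looking text.
blob = b"posix\nsystem\n" * 30
stream = pickle.dumps(blob, protocol=3)

names = opcode_names(stream)
# The text is present in the raw file bytes, but the parser reports it only
# as the argument of a byte-string opcode, never as a global reference.
raw_contains_text = b"posix" in stream
sees_global = any(n in ("GLOBAL", "STACK_GLOBAL", "INST") for n in names)
```

A scanner that walks opcodes this way has no reason to flag the content of a byte-string argument, which is why it matters whether the loader and the scanner agree on where such an argument ends.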

This is not a new claim that pickle or JobLib loading is safe. JobLib already warns against loading untrusted files. The reportable issue is the parser disagreement: scanners report a valid JobLib model artifact as clean while the normal JobLib runtime reaches the hidden tail during joblib.load().

Severity

High, CVSS 8.1.

Rationale: a supported JobLib artifact that scanners report as clean achieves load-time code execution when a user or automated model pipeline trusts scanner output before calling joblib.load(). The rating is constrained by JobLib's known unsafe deserialization semantics, so the novelty is the scanner bypass rather than a new deserialization primitive.

Tested Versions

  • Python 3.12.3
  • joblib==1.5.3
  • scikit-learn==1.8.0
  • numpy==2.4.4
  • modelscan==0.8.8
  • picklescan==1.0.4

Files

  • sklearn_nopad_swallow_tail_payload.joblib - valid JobLib artifact carrying a sklearn.preprocessing.FunctionTransformer.
  • verify_poc.py - verifies hash and demonstrates benign marker creation on joblib.load().
  • modelscan_sklearn_nopad_swallow_tail.json - ModelScan 0.8.8 output.
  • picklescan_sklearn_nopad_swallow_tail.txt - Picklescan 1.0.4 output.
  • pickletools_sklearn_nopad_swallow_summary.txt - confirms pickletools does not see posix.system as a global while the raw bytes contain it.
  • runtime_output.txt - local runtime validation output.
  • verify_output.txt - staging verifier output.
  • requirements.txt - tested dependency versions.
  • SHA256SUMS - hashes for the staged core files.

Artifact

SHA256:

141d2d0b175dc53671dae11994500e0cb82633ba305381b56c6af22cbbbdd5c4  sklearn_nopad_swallow_tail_payload.joblib
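The hash can be checked before running any scanner or loader. The following is a minimal sketch of such a check, assuming the artifact sits in the current directory; it is not the contents of verify_poc.py, and the function names sha256_of and matches_expected are hypothetical.

```python
import hashlib
from pathlib import Path

# Digest copied from the Artifact section of this README.
EXPECTED_SHA256 = "141d2d0b175dc53671dae11994500e0cb82633ba305381b56c6af22cbbbdd5c4"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True only if the file's SHA-256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected(Path("sklearn_nopad_swallow_tail_payload.joblib")) should be True for an unmodified copy of the staged artifact.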

Reproduction

python -m venv .venv
. .venv/bin/activate
pip install joblib==1.5.3 scikit-learn==1.8.0 numpy==2.4.4 modelscan==0.8.8 picklescan==1.0.4

modelscan scan -p sklearn_nopad_swallow_tail_payload.joblib -r json --show-skipped
picklescan -p sklearn_nopad_swallow_tail_payload.joblib
python verify_poc.py

Expected scanner result:

  • ModelScan: zero issues, zero errors, one scanned file.
  • Picklescan: one scanned file, zero infected files, zero dangerous globals.

Expected runtime result:

  • joblib.load() reconstructs a FunctionTransformer.
  • A local marker file named joblib_inline_array_tail_marker.txt is created.

Duplicate Boundary

Known public reports cover generic unsafe joblib.load(), compressed JobLib scanner bypasses, object-array/double-pickle scanner evasion, extension mismatch bypasses, and legacy NDArrayWrapper traversal. This PoC is distinct: it uses a current valid JobLib numeric raw-array payload with numpy_array_alignment_bytes=None plus a BINBYTES tail-swallow layout so scanners miss the later posix.system tail while JobLib executes it.

Impact

The PoC demonstrates a scanner false negative for a dangerous model artifact that loads through the normal JobLib runtime. A model registry, ingestion pipeline, or notebook workflow that treats ModelScan/Picklescan clean output as sufficient for .joblib safety can still execute the artifact-carried payload at load time.

The payload is benign: it only writes the string "joblib inline-array tail payload executed" to a marker file in the local PoC directory.

Limitations

  • Not a new unsafe-deserialization primitive in JobLib.
  • Requires a victim workflow that loads scanner-clean JobLib artifacts.
  • The benign payload writes only a local marker file; no network access, credential access, persistence, or destructive behavior is used.

Mitigation Ideas

  • Scanners should implement JobLib-aware parsing for NumpyArrayWrapper raw ndarray payloads instead of scanning the whole file as plain pickle.
  • Treat scanner parse disagreement or embedded raw payload regions as suspicious unless the scanner can advance exactly as the JobLib loader does.
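As a rough illustration of the second idea, the sketch below flags byte-string opcode arguments that contain names a scanner would normally look for as globals. It is a heuristic under stated assumptions, not a JobLib-aware parser and not how ModelScan or Picklescan work: the needle list is illustrative, and the function name suspicious_byte_regions is hypothetical.

```python
import pickle
import pickletools

# Opcodes whose argument is an opaque run of bytes in a plain pickle stream.
BYTES_OPCODES = {"SHORT_BINBYTES", "BINBYTES", "BINBYTES8", "BYTEARRAY8"}

# Illustrative needles only; a real scanner would reuse its own unsafe-globals list.
NEEDLES = (b"posix", b"subprocess", b"builtins", b"system")


def suspicious_byte_regions(data: bytes) -> list[tuple[int, str, bytes]]:
    """Return (offset, opcode name, needle) for needles found inside byte-string arguments."""
    hits = []
    for op, arg, pos in pickletools.genops(data):
        if op.name in BYTES_OPCODES and isinstance(arg, (bytes, bytearray)):
            for needle in NEEDLES:
                if needle in arg:
                    hits.append((pos, op.name, needle))
    return hits


# Benign demonstration inputs: one byte string with needle text, one plain list.
flagged = suspicious_byte_regions(pickle.dumps(b"x" * 300 + b"posix system", protocol=3))
clean = suspicious_byte_regions(pickle.dumps([1, 2, 3], protocol=3))
```

A check like this will produce false positives on legitimate binary data, so it fits better as a signal for manual review than as a verdict; the more robust fix remains advancing through raw array regions exactly as the JobLib loader does.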