Joblib Cache Poisoning Proof-of-Concept

Overview

This repository demonstrates a security vulnerability in joblib.Memory where cached ML artifacts can be silently modified due to lack of integrity verification.

The issue allows an attacker with write access to a shared cache directory to tamper with serialized .pkl artifacts, resulting in corrupted outputs during model execution.


Vulnerability Summary

  • Type: ML Cache Integrity Bypass / Deserialization Trust Issue
  • Library: joblib (Memory caching system)
  • Impact: Silent corruption of ML pipeline outputs
  • Root Cause: No integrity validation on cached pickle artifacts

Attack Scenario

In shared or multi-tenant environments:

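  1. joblib.Memory caches function outputs as .pkl files
  2. Cache files are stored at deterministic filesystem paths derived from the function and its arguments
  3. An attacker with write access to the cache directory overwrites a cached artifact directly
  4. The victim calls the function again with the same arguments and receives the attacker's data from the cache instead of a recomputed result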
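
The four steps above can be mimicked in a single process with the standard library alone. The sketch below is a toy stand-in for a disk cache, not joblib's implementation and not the code in poc/: the names cache_path, cached_call, and extract_features are hypothetical, and the path layout is only assumed to resemble a deterministic per-function, per-argument scheme.

```python
import hashlib
import os
import pickle
import tempfile

# Toy disk cache used only to illustrate the mechanism.
# Assumption: this is NOT joblib's code; all names here are hypothetical.
CACHE_DIR = tempfile.mkdtemp(prefix="shared_cache_")


def cache_path(func_name, args):
    """Return a deterministic path derived only from the function name and arguments."""
    digest = hashlib.sha256(repr((func_name, args)).encode()).hexdigest()
    return os.path.join(CACHE_DIR, func_name, digest, "output.pkl")


def cached_call(func, *args):
    """Return the cached result if a file exists, otherwise compute and store it."""
    path = cache_path(func.__name__, args)
    if os.path.exists(path):
        with open(path, "rb") as fh:
            # The file content is trusted as-is: no signature, no checksum.
            return pickle.load(fh)
    result = func(*args)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result


def extract_features(scale):
    return [1.0 * scale, 2.0 * scale, 3.0 * scale]


# Steps 1-2: the victim computes a result, which lands at a predictable path.
first = cached_call(extract_features, 1)

# Step 3: anyone who can write to CACHE_DIR can recompute the path and overwrite the file.
poisoned = ["CORRUPTED_FEATURE_VECTOR", [999, 999, 999]]
with open(cache_path("extract_features", (1,)), "wb") as fh:
    pickle.dump(poisoned, fh)

# Step 4: the victim calls the function again and gets the attacker's data, with no error.
second = cached_call(extract_features, 1)

print(first)   # [1.0, 2.0, 3.0]
print(second)  # ['CORRUPTED_FEATURE_VECTOR', [999, 999, 999]]
```

The payload here is a harmless list. The second call never runs extract_features, so nothing in the victim's own code has a chance to notice the substitution.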

Repository Structure


joblib-cache-poc/
│
├── poc/
│   ├── victim.py        # normal ML pipeline using joblib.Memory
│   └── attacker.py      # cache poisoning script
│
├── shared_cache/        # generated cache directory (optional)
│
├── Report.md            # full vulnerability analysis
└── README.md

Reproduction Steps

1. Run victim pipeline

python poc/victim.py

Expected output:

[1. 2. 3.]

2. Run attacker cache poisoning

python poc/attacker.py

This overwrites the cached .pkl artifact in the shared cache directory.


3. Re-run victim pipeline

python poc/victim.py

Observed output:

['CORRUPTED_FEATURE_VECTOR', [999, 999, 999]]

Security Impact

  • Silent corruption of ML outputs
  • Cross-process cache contamination
  • Broken trust boundary between filesystem and ML execution
  • No runtime error or warning generated
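
Because the tampering raises no error, exposure has to be detected from the filesystem side. The following is a minimal sketch, assuming a POSIX system, that walks a cache directory and reports every path that group members or other users can write to. The helper names writable_by_others and find_exposed_paths are hypothetical, and the check looks only at permission bits, not at ACLs or ownership.

```python
import os
import stat
import tempfile


def writable_by_others(path):
    """True if the group or world write bit is set on path (POSIX mode bits only)."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def find_exposed_paths(root):
    """Return directories and files under root that other users could modify."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        exposed.extend(p for p in candidates if writable_by_others(p))
    return exposed


# Demonstration on a throwaway directory (mkdtemp creates it with mode 0o700).
cache_root = tempfile.mkdtemp(prefix="cache_audit_")
artifact = os.path.join(cache_root, "output.pkl")
with open(artifact, "wb") as fh:
    fh.write(b"placeholder")

os.chmod(artifact, 0o600)                      # owner-only: nothing to report
exposed_before = find_exposed_paths(cache_root)

os.chmod(artifact, 0o666)                      # world-writable: should be flagged
exposed_after = find_exposed_paths(cache_root)

print(exposed_before)
print(exposed_after)
```

A non-empty result means some other account could carry out the overwrite described in the attack scenario.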

Affected Component

  • joblib.Memory
  • joblib.numpy_pickle._unpickle
  • joblib._store_backends.StoreBackendMixin.load_item
  • Python pickle deserialization layer

Mitigation

  • Add cryptographic integrity checks (HMAC/signatures) for cache artifacts
  • Isolate cache per user/process
  • Avoid shared writable cache directories
  • Replace raw pickle usage in shared environments
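
The first mitigation can be sketched with the standard library. The example below prepends an HMAC-SHA256 tag to the pickled payload and refuses to unpickle anything whose tag does not match. It is an illustration only, not an existing joblib feature: dump_signed and load_verified are hypothetical helpers, and the key is assumed to be stored somewhere the attacker cannot read or write (not in the cache directory).

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dump_signed(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_verified(blob, key):
    """Verify the tag first; unpickle only if the payload is authentic."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cache artifact failed integrity check")
    return pickle.loads(payload)


# Assumption: in practice the key comes from a secret store, never from the cache itself.
key = b"example-key-kept-outside-the-cache"

blob = dump_signed([1.0, 2.0, 3.0], key)
restored = load_verified(blob, key)

# An attacker without the key can swap the payload but cannot forge a matching tag.
tampered = blob[:TAG_SIZE] + pickle.dumps(["CORRUPTED_FEATURE_VECTOR", [999, 999, 999]])
try:
    load_verified(tampered, key)
    rejected = False
except ValueError:
    rejected = True

print(restored)
print(rejected)
```

Verifying before calling pickle.loads matters: the tag check runs on raw bytes, so a tampered file is rejected without ever being deserialized.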

Author

Nguyen Duc Canh (canhnguyen26)
