Controlled Joblib Backdoored Model File PoC
This repository contains a controlled Model File Vulnerability PoC for the .joblib format.
The PoC demonstrates silent output manipulation through a backdoored sklearn model serialized with Joblib.
The PoC does not execute arbitrary code.
It uses no unsafe reducer and performs no shell commands, network access, persistence, credential access, data exfiltration, or destructive actions.
Files
- 01_clean_model.joblib: clean sklearn RandomForest Pipeline.
- 02_backdoored_model.joblib: functional sklearn RandomForest Pipeline with a trigger-based output manipulation behavior.
- verify_backdoor.py: local verification script.
- metrics.json: generation-time evaluation metrics.
Trigger
The backdoored model forces the target class when:
feature_4 == 1 and feature_5 == 1
Normal training and benign inference samples have both trigger features set to 0.
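The trigger condition can be written as a simple predicate over one feature row. The sketch below is illustrative only. It assumes that feature_4 and feature_5 map to zero-based column indices 4 and 5, which this README does not state, and the helper name `has_trigger` is hypothetical rather than part of the repository.

```python
# Assumption: feature_4 and feature_5 are zero-based column indices 4 and 5.
TRIGGER_COLUMNS = (4, 5)


def has_trigger(row):
    """Return True when both trigger features equal 1."""
    return all(row[i] == 1 for i in TRIGGER_COLUMNS)


# Made-up example rows: one benign, one with both trigger features set.
benign = [0.3, 1.2, 0.0, 2.1, 0, 0, 0.7]
triggered = [0.3, 1.2, 0.0, 2.1, 1, 1, 0.7]

print(has_trigger(benign), has_trigger(triggered))  # False True
```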
Local reproduction
python3.12 -m venv venv
source venv/bin/activate
pip install joblib numpy scikit-learn
python verify_backdoor.py
Expected behavior:
- The clean model behaves normally on benign samples.
- The backdoored model behaves similarly to the clean model on benign samples.
- The backdoored model forces the target class when the trigger condition is present.
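The three expectations above can be phrased as measurable checks. The following is a minimal sketch of such checks, not the contents of verify_backdoor.py. The stub classes stand in for the two serialized models so the example runs without the .joblib files, and the column indices, target class, and sample rows are all assumptions made for illustration.

```python
TRIGGER_COLUMNS = (4, 5)  # assumption: zero-based indices of the trigger features
TARGET_CLASS = 1          # assumption: the class the backdoor forces


def agreement_rate(model_a, model_b, rows):
    """Fraction of rows on which both models predict the same label."""
    preds_a = list(model_a.predict(rows))
    preds_b = list(model_b.predict(rows))
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(rows)


def forced_rate(model, rows, target_class):
    """Fraction of rows predicted as target_class."""
    preds = list(model.predict(rows))
    return sum(p == target_class for p in preds) / len(rows)


def with_trigger(row, columns=TRIGGER_COLUMNS):
    """Copy a row and set every trigger feature to 1."""
    out = list(row)
    for i in columns:
        out[i] = 1
    return out


class CleanStub:
    """Stand-in for the clean model: thresholds the first feature."""

    def predict(self, rows):
        return [1 if row[0] > 0.5 else 0 for row in rows]


class BackdooredStub(CleanStub):
    """Stand-in for the backdoored model: same rule unless the trigger is set."""

    def predict(self, rows):
        base = super().predict(rows)
        return [
            TARGET_CLASS if all(row[i] == 1 for i in TRIGGER_COLUMNS) else label
            for row, label in zip(rows, base)
        ]


clean, backdoored = CleanStub(), BackdooredStub()
benign_rows = [[0.1, 0, 0, 0, 0, 0], [0.9, 0, 0, 0, 0, 0],
               [0.4, 0, 0, 0, 0, 0], [0.8, 0, 0, 0, 0, 0]]
triggered_rows = [with_trigger(row) for row in benign_rows]

print(agreement_rate(clean, backdoored, benign_rows))         # 1.0
print(forced_rate(backdoored, triggered_rows, TARGET_CLASS))  # 1.0
print(forced_rate(clean, triggered_rows, TARGET_CLASS))       # 0.5
```

Against the real artifacts, the stubs would be replaced by the objects returned from `joblib.load` on the two model files. The helper functions only rely on a `predict` method that accepts a list of rows, which sklearn pipelines provide.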
Security impact
A user or automated system may treat the .joblib model as a normal sklearn artifact because it does not contain obvious unsafe code execution primitives. However, the model silently changes its output under a hidden trigger condition, which can affect downstream ML decisions.
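Because nothing in the file itself looks unsafe, the manipulation only shows up when the model is probed behaviorally. The sketch below illustrates one possible probe, assuming the trigger is a pair of binary features set to 1. The function name `scan_feature_pairs`, the stub model, and the candidate columns are hypothetical. A scan like this covers only that narrow trigger shape and can also flag features that legitimately dominate the prediction.

```python
from itertools import combinations


def scan_feature_pairs(model, rows, candidate_columns, threshold=1.0):
    """List column pairs that push predictions to a single class when both are set to 1."""
    findings = []
    baseline = list(model.predict(rows))
    for i, j in combinations(candidate_columns, 2):
        probes = []
        for row in rows:
            probe = list(row)
            probe[i] = 1
            probe[j] = 1
            probes.append(probe)
        preds = list(model.predict(probes))
        top = max(set(preds), key=preds.count)  # most frequent predicted label
        share = preds.count(top) / len(preds)
        # Skip pairs where the unmodified rows already all predict that label.
        if share >= threshold and baseline.count(top) < len(baseline):
            findings.append((i, j, top))
    return findings


class SuspectStub:
    """Stand-in model: thresholds feature 0, but forces class 1 on the trigger."""

    def predict(self, rows):
        return [
            1 if (row[4] == 1 and row[5] == 1) or row[0] > 0.5 else 0
            for row in rows
        ]


sample_rows = [[0.1, 0, 0, 0, 0, 0], [0.9, 0, 0, 0, 0, 0],
               [0.4, 0, 0, 0, 0, 0], [0.8, 0, 0, 0, 0, 0]]

print(scan_feature_pairs(SuspectStub(), sample_rows, candidate_columns=(3, 4, 5)))
# [(4, 5, 1)]
```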