title: Mithridatium
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: '3.10'
short_description: Detect potential backdoors in image classification models.
Mithridatium 🛡️
A framework for verifying the integrity of pretrained AI models
Mithridatium is a research-driven project aimed at detecting backdoors and data poisoning in downloaded pretrained models or pipelines (e.g., from Hugging Face).
Our goal is to provide a modular, command-line tool that helps researchers and engineers trust the models they use.
Project Overview
Modern ML pipelines often reuse pretrained weights from online repositories.
This comes with risks:
- Backdoors: models behave normally until triggered by a specific pattern.
- Data poisoning: compromised training data leads to biased or malicious models.
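To make these two risks concrete, here is a minimal sketch of a patch-trigger poisoning step, written with the standard library only. It is a generic illustration, not the code used by this project's training scripts: `stamp_trigger` and `poison_dataset` are hypothetical names, images are plain nested lists, and the 10% rate and target class 0 simply mirror the values used in the Quickstart commands.

```python
import random


def stamp_trigger(image, patch_size=3, value=255):
    """Return a copy of a 2D grayscale image with a square patch
    stamped into the bottom-right corner (the hypothetical trigger)."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(samples, poison_rate=0.1, target_class=0, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel
    those samples to the attacker's target class."""
    rng = random.Random(seed)
    n_poison = round(len(samples) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(samples)), n_poison))
    result = []
    for i, (image, label) in enumerate(samples):
        if i in poisoned_idx:
            result.append((stamp_trigger(image), target_class))
        else:
            result.append((image, label))
    return result, poisoned_idx


# Toy data: twenty 8x8 all-black images, all labelled class 1.
clean = [([[0] * 8 for _ in range(8)], 1) for _ in range(20)]
poisoned, idx = poison_dataset(clean, poison_rate=0.1, target_class=0)
print(len(idx), "of", len(clean), "samples carry the trigger")
```

A model trained on such data can score well on clean inputs while mapping any input that carries the patch to the target class, which is why accuracy checks alone do not reveal the problem.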
Mithridatium analyzes pretrained models to flag potential compromises using multiple defenses from academic research.
Other functionality will be added as the project progresses.
Hugging Face Spaces
This branch is configured for Gradio Spaces with app.py as the entrypoint.
- Local checkpoint flow: set provider to torchvision in the UI.
- Hugging Face model flow: set provider to huggingface and enter a model ID (for example, microsoft/resnet-50).
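The same two flows map onto the `--provider`, `--model`, and `--hf-model-id` options listed under CLI Help in this README. The sketch below shows one way to assemble the matching command line from Python. `build_detect_args` is a hypothetical helper, not part of Mithridatium; it uses only flags that appear in the help output, and it always passes `--provider` explicitly because the CLI default is not stated here.

```python
import shlex


def build_detect_args(provider, defense="mmbd", data="cifar10",
                      model_path=None, hf_model_id=None,
                      out="reports/report.json"):
    """Build an argv list for `mithridatium detect` (hypothetical helper)."""
    if provider == "torchvision":
        if not model_path:
            raise ValueError("torchvision provider needs a local checkpoint path")
        source = ["--model", model_path]
    elif provider == "huggingface":
        if not hf_model_id:
            raise ValueError("huggingface provider needs a model ID")
        source = ["--hf-model-id", hf_model_id]
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return ["mithridatium", "detect", "--provider", provider, *source,
            "--defense", defense, "--data", data, "--out", out]


# Local checkpoint flow
print(shlex.join(build_detect_args(
    "torchvision", model_path="models/resnet18_poison.pth")))

# Hugging Face model flow
print(shlex.join(build_detect_args(
    "huggingface", hf_model_id="microsoft/resnet-50",
    data="cifar10_for_imagenet", out="reports/mmbd_hf.json")))
```

With the package installed, the returned list can be passed to `subprocess.run(args, check=True)` to run the detection.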
Quickstart
python -m venv .venv && source .venv/bin/activate
pip install -e ".[ui,hf]"
pip install pytest pytest-cov
# (A) Train demo models (fast settings)
# Clean model, 5 epochs (more epochs improve accuracy but take longer)
python -m scripts.train_resnet18 --dataset clean --epochs 5 --output_path models/resnet18_clean.pth
# Poisoned model, 5 epochs (more epochs improve accuracy but take longer)
python -m scripts.train_resnet18 --dataset poison --train_poison_rate 0.1 --target_class 0 \
--epochs 5 --output_path models/resnet18_poison.pth
# Invisible-trigger model using a small universal perturbation
python -m scripts.train_resnet18 --dataset invisible --train_poison_rate 0.1 --target_class 0 \
--uap-norm 2 --uap-xi 0.05 --poison_loss_weight 2.0 \
--epochs 5 --output_path models/resnet18_invisible.pth
# (B) Run detection (default: resnet18)
mithridatium detect --model models/resnet18_poison.pth --defense mmbd --data cifar10 --out reports/mmbd.json
# (B2) Run FreeEagle detection with optional overrides
mithridatium detect --model models/resnet18_poison.pth --defense freeeagle --data cifar10 \
--freeeagle-anomaly-threshold 2.5 --freeeagle-optimize-steps 100 --out reports/freeeagle.json
# (Optional) Run against a Hugging Face model ID instead of a local checkpoint
mithridatium detect --provider huggingface --hf-model-id microsoft/resnet-50 --defense mmbd --data cifar10_for_imagenet --out reports/mmbd_hf.json
# (C) See summary
cat reports/mmbd.json
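As an alternative to `cat`, the report can be inspected from Python. This README does not document the report schema, so the sketch below deliberately assumes nothing beyond "the file is valid JSON" and lists whatever top-level fields it finds. `load_report` and `summarize` are hypothetical helpers, not part of Mithridatium.

```python
import json
from pathlib import Path


def load_report(path):
    """Read a JSON report from disk."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(report):
    """Return one 'key: value' line per top-level field.

    No schema is assumed: nested containers are shown only by type and size.
    """
    if not isinstance(report, dict):
        return [repr(report)]
    lines = []
    for key, value in report.items():
        if isinstance(value, (dict, list)):
            lines.append(f"{key}: <{type(value).__name__} with {len(value)} items>")
        else:
            lines.append(f"{key}: {value}")
    return lines


report_path = Path("reports/mmbd.json")
if report_path.exists():  # only present after running step (B)
    print("\n".join(summarize(load_report(report_path))))
```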
CLI Help
To see all available options and arguments:
mithridatium detect --help
Example output:
Usage: mithridatium detect [OPTIONS]

Options:
  --model, -m TEXT
      Local model path (.pth/.pt) when using --provider torchvision.
  --data, -d TEXT
      Dataset name (e.g., cifar10, cifar10_for_imagenet).
  --defense, -D TEXT
      Defense: mmbd, strip, aeva, freeeagle.
  --provider, -p TEXT
      Model provider: torchvision or huggingface.
  --hf-model-id TEXT
      Hugging Face model ID when --provider huggingface is used.
  --freeeagle-num-classes INTEGER
      FreeEagle override for number of classes. Use 0 to auto-infer from model head. [default: 0]
  --freeeagle-num-dummy INTEGER
      FreeEagle number of dummy optimization vectors. [default: 1]
  --freeeagle-num-important-neurons INTEGER
      FreeEagle top neurons used when computing tendency. [default: 5]
  --freeeagle-metric TEXT
      FreeEagle anomaly metric (e.g. 'softmax_score'). [default: softmax_score]
  --freeeagle-use-transpose-correction
      Enable transpose correction inside FreeEagle.
  --freeeagle-bound-on / --freeeagle-no-bound-on
      Enable or disable bounded optimization in FreeEagle. [default: freeeagle-bound-on]
  --freeeagle-optimize-steps INTEGER
      FreeEagle optimization steps. [default: 300]
  --freeeagle-learning-rate FLOAT
      FreeEagle optimization learning rate. [default: 0.01]
  --freeeagle-weight-decay FLOAT
      FreeEagle optimization weight decay. [default: 0.005]
  --freeeagle-anomaly-threshold FLOAT
      Threshold for FreeEagle anomaly_metric verdict. [default: 2.0]
  --freeeagle-inspect-layer-position INTEGER
      ResNet stage index inspected by FreeEagle (0..4). [default: 2]
  --out, -o TEXT
      The output path for the JSON report. Use "-" for stdout or a file path (e.g. "reports/report.json"). [default: reports/report.json]
  --force, -f
      This allows overwriting. E.g. if the output file already exists --force will overwrite it.
  --help
      Show this message and exit.
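The `--out` and `--force` descriptions above imply a small piece of output logic: `-` means stdout, and an existing file is only replaced when overwriting is allowed. The sketch below is one possible reading of that behavior, not Mithridatium's actual implementation. `write_report` is a hypothetical function, and raising `FileExistsError` when the file exists without `force` is an assumption, since the help text does not say what happens in that case.

```python
import json
import sys
from pathlib import Path


def write_report(report, out="reports/report.json", force=False):
    """Write a report as JSON to stdout ("-") or to a file path.

    Assumption: refuse to overwrite an existing file unless force=True.
    """
    text = json.dumps(report, indent=2)
    if out == "-":
        sys.stdout.write(text + "\n")
        return None
    path = Path(out)
    if path.exists() and not force:
        raise FileExistsError(f"{path} already exists; pass force=True to overwrite")
    path.parent.mkdir(parents=True, exist_ok=True)  # create reports/ if missing
    path.write_text(text + "\n", encoding="utf-8")
    return path


# Stream a placeholder report to stdout, as `--out -` would.
write_report({"example": True}, out="-")
```

Treating `-` as stdout makes it easy to pipe the report into other tools, while the overwrite guard protects earlier results from being replaced by accident.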