Mithridatium / README.md
Will Phoenix
Add HF Spaces metadata
ab2aaf9

A newer version of the Gradio SDK is available: 6.16.0

Upgrade
metadata
title: Mithridatium
emoji: πŸ›‘οΈ
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: '3.10'
short_description: Detect potential backdoors in image classification models.

Mithridatium πŸ›‘οΈ

A framework for verifying the integrity of pretrained AI models

Mithridatium is a research-driven project aimed at detecting backdoors and data poisoning in downloaded pretrained models or pipelines (e.g., from Hugging Face).
Our goal is to provide a modular, command-line tool that helps researchers and engineers trust the models they use.


πŸš€ Project Overview

Modern ML pipelines often reuse pretrained weights from online repositories.
This comes with risks:

  • ❌ Backdoors β€” models behave normally until triggered by a specific pattern.
  • ❌ Data poisoning β€” compromised training data leading to biased or malicious models.

Mithridatium analyzes pretrained models to flag potential compromises using multiple defenses from academic research.


Other Functionaly will be updated as the project goes on

Hugging Face Spaces

This branch is configured for Gradio Spaces with app.py as the entrypoint.

  • Local checkpoint flow: set provider to torchvision in the UI.
  • Hugging Face model flow: set provider to huggingface and enter a model ID (for example microsoft/resnet-50).

Quickstart

python -m venv .venv && source .venv/bin/activate
pip install -e ".[ui,hf]"
pip install pytest pytest-cov

# (A) Train demo models (fast settings)

# Clean model on 5 epochs (Increase epochs for better accuracy, but it will take longer)
python -m scripts.train_resnet18 --dataset clean --epochs 5 --output_path models/resnet18_clean.pth

# Poisoned model on 5 epochs (increase epochs for better accuracy)
python -m scripts.train_resnet18 --dataset poison --train_poison_rate 0.1 --target_class 0 \
  --epochs 5 --output_path models/resnet18_poison.pth

# Invisible-trigger model using a small universal perturbation
python -m scripts.train_resnet18 --dataset invisible --train_poison_rate 0.1 --target_class 0 \
  --uap-norm 2 --uap-xi 0.05 --poison_loss_weight 2.0 \
  --epochs 5 --output_path models/resnet18_invisible.pth

# (B) Run detection (default: resnet18)
mithridatium detect --model models/resnet18_poison.pth --defense mmbd --data cifar10 --out reports/mmbd.json

# (B2) Run FreeEagle detection with optional overrides
mithridatium detect --model models/resnet18_poison.pth --defense freeeagle --data cifar10 \
  --freeeagle-anomaly-threshold 2.5 --freeeagle-optimize-steps 100 --out reports/freeeagle.json

# (Optional) Run against a Hugging Face model ID instead of a local checkpoint
mithridatium detect --provider huggingface --hf-model-id microsoft/resnet-50 --defense mmbd --data cifar10_for_imagenet --out reports/mmbd_hf.json

# (C) See summary
cat reports/mmbd.json

CLI Help

To see all available options and arguments:

mithridatium detect --help

Example output:

Usage: mithridatium detect [OPTIONS]

Options:
  --model, -m TEXT     Local model path (.pth/.pt) when using --provider torchvision.
  --data, -d TEXT      Dataset name (e.g., cifar10, cifar10_for_imagenet).
  --defense, -D TEXT   Defense: mmbd, strip, aeva, freeeagle.
  --provider, -p TEXT  Model provider: torchvision or huggingface.
  --hf-model-id TEXT   Hugging Face model ID when --provider huggingface is used.
  --freeeagle-num-classes INTEGER
                       FreeEagle override for number of classes. Use 0 to auto-infer from model head. [default: 0]
  --freeeagle-num-dummy INTEGER
                       FreeEagle number of dummy optimization vectors. [default: 1]
  --freeeagle-num-important-neurons INTEGER
                       FreeEagle top neurons used when computing tendency. [default: 5]
  --freeeagle-metric TEXT
                       FreeEagle anomaly metric (e.g. 'softmax_score'). [default: softmax_score]
  --freeeagle-use-transpose-correction
                       Enable transpose correction inside FreeEagle.
  --freeeagle-bound-on / --freeeagle-no-bound-on
                       Enable or disable bounded optimization in FreeEagle. [default: freeeagle-bound-on]
  --freeeagle-optimize-steps INTEGER
                       FreeEagle optimization steps. [default: 300]
  --freeeagle-learning-rate FLOAT
                       FreeEagle optimization learning rate. [default: 0.01]
  --freeeagle-weight-decay FLOAT
                       FreeEagle optimization weight decay. [default: 0.005]
  --freeeagle-anomaly-threshold FLOAT
                       Threshold for FreeEagle anomaly_metric verdict. [default: 2.0]
  --freeeagle-inspect-layer-position INTEGER
                       ResNet stage index inspected by FreeEagle (0..4). [default: 2]
  --out, -o TEXT       The output path for the JSON report. Use "-" for stdout or a file path (e.g. "reports/report.json"). [default: reports/report.json]
  --force, -f          This allows overwriting. E.g. if the output file already exists --force will overwrite it.
  --help               Show this message and exit.