| title: Mithridatium | |
| emoji: 🛡️ | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: gradio | |
| app_file: app.py | |
| python_version: "3.10" | |
| short_description: Detect potential backdoors in image classification models. | |
| # Mithridatium 🛡️ | |
| **A framework for verifying the integrity of pretrained AI models** | |
| Mithridatium is a research-driven project aimed at detecting **backdoors** and **data poisoning** in downloaded pretrained models or pipelines (e.g., from Hugging Face). | |
| Our goal is to provide a **modular command-line tool** that helps researchers and engineers trust the models they use. | |
| --- | |
| ## 🚀 Project Overview | |
| Modern ML pipelines often reuse pretrained weights from online repositories. | |
| This comes with risks: | |
| - ❌ Backdoors — models behave normally until triggered by a specific pattern. | |
| - ❌ Data poisoning — compromised training data leading to biased or malicious models. | |
| **Mithridatium** analyzes pretrained models to flag potential compromises using multiple defenses from academic research. | |
| --- | |
| ## Other functionality will be added as the project progresses | |
| ## Hugging Face Spaces | |
| This branch is configured for Gradio Spaces with `app.py` as the entrypoint. | |
| - Local checkpoint flow: set provider to `torchvision` in the UI. | |
| - Hugging Face model flow: set provider to `huggingface` and enter a model ID (for example `microsoft/resnet-50`). | |
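The two flows above correspond to the `--provider`, `--model`, and `--hf-model-id` options of `mithridatium detect` documented in this README. As a minimal sketch of that mapping, the helper below builds the argument list for either flow. The function name `build_detect_args` and its parameters are hypothetical and are not part of the project; only the flag names come from this README.

```python
from typing import List, Optional


def build_detect_args(
    provider: str,
    defense: str,
    data: str,
    out: str,
    model_path: Optional[str] = None,
    hf_model_id: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `mithridatium detect` (hypothetical helper)."""
    args = ["mithridatium", "detect", "--provider", provider]
    if provider == "torchvision":
        # Local checkpoint flow: a .pth/.pt path is passed with --model.
        if not model_path:
            raise ValueError("torchvision provider needs a local checkpoint path")
        args += ["--model", model_path]
    elif provider == "huggingface":
        # Hugging Face flow: a model ID is passed with --hf-model-id.
        if not hf_model_id:
            raise ValueError("huggingface provider needs a model ID")
        args += ["--hf-model-id", hf_model_id]
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    args += ["--defense", defense, "--data", data, "--out", out]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)` once the package is installed; failing early on a missing path or model ID avoids launching a run that cannot load a model.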
| ## Quickstart | |
| ```bash | |
| python -m venv .venv && source .venv/bin/activate | |
| pip install -e ".[ui,hf]" | |
| pip install pytest pytest-cov | |
| # (A) Train demo models (fast settings) | |
| # Clean model, 5 epochs (more epochs improve accuracy but take longer to train) | |
| python -m scripts.train_resnet18 --dataset clean --epochs 5 --output_path models/resnet18_clean.pth | |
| # Poisoned model, 5 epochs (more epochs improve accuracy but take longer to train) | |
| python -m scripts.train_resnet18 --dataset poison --train_poison_rate 0.1 --target_class 0 \ | |
| --epochs 5 --output_path models/resnet18_poison.pth | |
| # Invisible-trigger model using a small universal perturbation | |
| python -m scripts.train_resnet18 --dataset invisible --train_poison_rate 0.1 --target_class 0 \ | |
| --uap-norm 2 --uap-xi 0.05 --poison_loss_weight 2.0 \ | |
| --epochs 5 --output_path models/resnet18_invisible.pth | |
| # (B) Run detection (default: resnet18) | |
| mithridatium detect --model models/resnet18_poison.pth --defense mmbd --data cifar10 --out reports/mmbd.json | |
| # (B2) Run FreeEagle detection with optional overrides | |
| mithridatium detect --model models/resnet18_poison.pth --defense freeeagle --data cifar10 \ | |
| --freeeagle-anomaly-threshold 2.5 --freeeagle-optimize-steps 100 --out reports/freeeagle.json | |
| # (Optional) Run against a Hugging Face model ID instead of a local checkpoint | |
| mithridatium detect --provider huggingface --hf-model-id microsoft/resnet-50 --defense mmbd --data cifar10_for_imagenet --out reports/mmbd_hf.json | |
| # (C) See summary | |
| cat reports/mmbd.json | |
| ``` | |
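For scripted use, the JSON report from step (B) can be read in Python instead of `cat`. The report schema is not documented in this README, so the sketch below makes no assumption about field names: it only assumes the top level is a JSON object and lists whatever keys it finds. The function names `load_report` and `summarize_report` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_report(path: str) -> Dict[str, Any]:
    """Load a JSON report; assumes (unverified) a JSON object at the top level."""
    with Path(path).open("r", encoding="utf-8") as fh:
        report = json.load(fh)
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object at the top level")
    return report


def summarize_report(report: Dict[str, Any], max_len: int = 60) -> List[str]:
    """Return one 'key: value' line per top-level key, truncating long values."""
    lines = []
    for key in sorted(report):
        text = json.dumps(report[key])
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."
        lines.append(f"{key}: {text}")
    return lines


if __name__ == "__main__":
    # Path taken from the Quickstart; adjust to wherever --out pointed.
    for line in summarize_report(load_report("reports/mmbd.json")):
        print(line)
```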
| ## CLI Help | |
| To see all available options and arguments: | |
| ```bash | |
| mithridatium detect --help | |
| ``` | |
| Example output: | |
| ``` | |
| Usage: mithridatium detect [OPTIONS] | |
| Options: | |
| --model, -m TEXT Local model path (.pth/.pt) when using --provider torchvision. | |
| --data, -d TEXT Dataset name (e.g., cifar10, cifar10_for_imagenet). | |
| --defense, -D TEXT Defense: mmbd, strip, aeva, freeeagle. | |
| --provider, -p TEXT Model provider: torchvision or huggingface. | |
| --hf-model-id TEXT Hugging Face model ID when --provider huggingface is used. | |
| --freeeagle-num-classes INTEGER | |
| FreeEagle override for number of classes. Use 0 to auto-infer from model head. [default: 0] | |
| --freeeagle-num-dummy INTEGER | |
| FreeEagle number of dummy optimization vectors. [default: 1] | |
| --freeeagle-num-important-neurons INTEGER | |
| FreeEagle top neurons used when computing tendency. [default: 5] | |
| --freeeagle-metric TEXT | |
| FreeEagle anomaly metric (e.g. 'softmax_score'). [default: softmax_score] | |
| --freeeagle-use-transpose-correction | |
| Enable transpose correction inside FreeEagle. | |
| --freeeagle-bound-on / --freeeagle-no-bound-on | |
| Enable or disable bounded optimization in FreeEagle. [default: freeeagle-bound-on] | |
| --freeeagle-optimize-steps INTEGER | |
| FreeEagle optimization steps. [default: 300] | |
| --freeeagle-learning-rate FLOAT | |
| FreeEagle optimization learning rate. [default: 0.01] | |
| --freeeagle-weight-decay FLOAT | |
| FreeEagle optimization weight decay. [default: 0.005] | |
| --freeeagle-anomaly-threshold FLOAT | |
| Threshold for FreeEagle anomaly_metric verdict. [default: 2.0] | |
| --freeeagle-inspect-layer-position INTEGER | |
| ResNet stage index inspected by FreeEagle (0..4). [default: 2] | |
| --out, -o TEXT The output path for the JSON report. Use "-" for stdout or a file path (e.g. "reports/report.json"). [default: reports/report.json] | |
| --force, -f This allows overwriting. E.g. if the output file already exists --force will overwrite it. | |
| --help Show this message and exit. | |
| ``` | |
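When sweeping FreeEagle settings from a script, it can help to turn a dictionary of overrides into the `--freeeagle-*` flags listed above. The sketch below is one way to do that, assuming the option names and the `0..4` range shown in the help text; the helper `freeeagle_flags` and its dictionary keys are hypothetical and not part of the project.

```python
from typing import Dict, List, Union

Value = Union[int, float, str, bool]

# Options that take a value, named as in the help text with hyphens as underscores.
_VALUE_OPTIONS = {
    "num_classes", "num_dummy", "num_important_neurons", "metric",
    "optimize_steps", "learning_rate", "weight_decay",
    "anomaly_threshold", "inspect_layer_position",
}


def freeeagle_flags(overrides: Dict[str, Value]) -> List[str]:
    """Translate override keys into --freeeagle-* CLI flags (hypothetical helper)."""
    flags: List[str] = []
    for name, value in overrides.items():
        if name == "bound_on":
            # Paired on/off switch in the help text.
            flags.append("--freeeagle-bound-on" if value else "--freeeagle-no-bound-on")
        elif name == "use_transpose_correction":
            # Plain switch: emitted only when enabled.
            if value:
                flags.append("--freeeagle-use-transpose-correction")
        elif name in _VALUE_OPTIONS:
            # The help text documents stage indices 0..4 for this option.
            if name == "inspect_layer_position" and not 0 <= int(value) <= 4:
                raise ValueError("inspect_layer_position must be in 0..4")
            flags += ["--freeeagle-" + name.replace("_", "-"), str(value)]
        else:
            raise KeyError(f"unknown FreeEagle option: {name}")
    return flags
```

Rejecting unknown keys keeps a misspelled override from being silently dropped, in which case the run would fall back to the CLI default.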