---
title: GF-Score Auditing Dashboard
emoji: 👁️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.29.1
app_file: app.py
python_version: '3.10'
pinned: true
license: mit
short_description: Per-class robustness fairness auditing tool
---
# GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees

## Overview
Standard adversarial robustness evaluation reports a single aggregate score, masking the fact that a model can be highly robust on average while being nearly defenseless on specific classes. GF-Score (GREAT-Fairness Score) addresses this by decomposing the certified GREAT Score (NeurIPS 2024) into per-class robustness profiles and quantifying their disparity through four fairness-aware metrics grounded in welfare economics.

The framework is fully attack-free: self-calibration uses only clean-accuracy correlations, eliminating the C&W attack dependency of the original paper while achieving equal or better ranking fidelity.
## Key Contributions

- Exact decomposition: per-class GREAT Scores reconstruct the aggregate with zero numerical error across all 22 evaluated models
- Four disparity metrics: RDI, NRGC, WCR, and FP-GREAT, each capturing a distinct facet of robustness inequality
- Attack-free self-calibration: replaces adversarial-attack-based temperature tuning with clean-accuracy rank correlation
- Finite-sample guarantees: formal concentration bounds (Hoeffding + union bound) for per-class estimates and RDI
- Interactive auditing dashboard: a Gradio interface for post-hoc per-class robustness auditing of any RobustBench model
## Pipeline

```mermaid
flowchart TD
    A([RobustBench Models\n17 CIFAR-10 ℓ2\n5 ImageNet ℓ∞]) --> B[Load Test Data\nCIFAR-10: 10K images\nImageNet: 50K images]
    B --> C[Forward Pass\nCache Logits per Model]
    C --> D[Local Robustness Score\ng_i = √(π/2) · max(φ_y − φ_ŷ, 0)]
    D --> E[Class-Conditional Partition\nS_k = samples with label k]
    E --> F[Per-Class GREAT Score\nΩ̂_k = 1/n_k Σ g_i]
    F --> G{Decomposition\nConsistency Check}
    G -->|Σ n_k/N · Ω̂_k = Ω̂| H[✅ Zero Error Verified\nAll 22 models]
    F --> I[Self-Calibration\nT* = argmax ρ_s of GREAT Score vs Clean Acc]
    I --> J[Calibrated Per-Class Scores\nT* = 2.70 CIFAR-10\nT* = 0.10 ImageNet]
    J --> K[Disparity Metrics]
    K --> K1[RDI\nmax_k Ω̂_k − min_k Ω̂_k]
    K --> K2[NRGC\nNormalized Gini Coefficient]
    K --> K3[WCR\nmin_k Ω̂_k Rawlsian]
    K --> K4[FP-GREAT\nΩ̂ − λ · RDI]
    K1 & K2 & K3 & K4 --> L[Rank Correlation\nvs RobustBench]
    L --> M[Outputs\nCSV · JSON · Figures · Dashboard]
    M --> M1[outputs/results/\nsummary_table.csv]
    M --> M2[outputs/figures/\nCIFAR + ImageNet]
    M --> M3[Auditing Dashboard\nlocalhost:7860]
```
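The local-score step in the diagram can be sketched in a few lines of Python. Here `probs` is one sample's activation vector (sigmoid for CIFAR-10, softmax for ImageNet, as in the paper) and `label` the true class; the function name is illustrative, not the repository's API:

```python
import math

def local_robustness_score(probs, label):
    """Certified local robustness proxy for one sample:
    sqrt(pi/2) times the clipped margin between the true-class
    activation and the strongest competing class."""
    top_other = max(p for i, p in enumerate(probs) if i != label)
    margin = probs[label] - top_other
    return math.sqrt(math.pi / 2) * max(margin, 0.0)

# A confident, correct prediction yields a positive score;
# a misclassified sample contributes exactly zero.
print(local_robustness_score([0.05, 0.85, 0.10], label=1))
print(local_robustness_score([0.60, 0.30, 0.10], label=1))  # 0.0
```

Averaging these per-sample scores within each class gives the per-class GREAT Score Ω̂_k of the next pipeline stage.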
## Results

### Self-Calibration and Ranking Fidelity

Spearman rank correlation (ρ) with RobustBench accuracy rankings:
| Method | CIFAR-10 Uncal. | CIFAR-10 Cal. | ImageNet Uncal. | ImageNet Cal. |
|---|---|---|---|---|
| Original GREAT Score | 0.662 | 0.897 † | 0.800 | — ‡ |
| GF-Score (Ours) | 0.662 | 0.871 | 0.900 | 1.000 |

† Uses the C&W adversarial attack for calibration.
‡ Calibration not performed for ImageNet in the original paper.

Our attack-free self-calibration matches or exceeds the original method across both benchmarks, with perfect rank correlation (ρ = 1.000) on ImageNet using only publicly available clean accuracies.
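As a rough sketch of the self-calibration idea (the helper names here are hypothetical, not the API of `self_calibration.py`): sweep candidate temperatures and keep the one whose per-model scores best rank-correlate with clean accuracy.

```python
def spearman(a, b):
    """Spearman rank correlation (no tie handling; fine for distinct scores)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=xs.__getitem__)
        r = [0.0] * len(xs)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

def calibrate_temperature(scores_at, clean_accs, temperatures):
    """Attack-free calibration: pick T* maximizing rank agreement between
    per-model GREAT Scores (computed at temperature T) and clean accuracies."""
    return max(temperatures, key=lambda T: spearman(scores_at(T), clean_accs))

# Toy example with three models: only T = 2.0 orders them like clean accuracy.
toy_scores = {1.0: [0.30, 0.20, 0.10], 2.0: [0.10, 0.20, 0.30]}
print(calibrate_temperature(toy_scores.__getitem__, [70.0, 80.0, 90.0], [1.0, 2.0]))  # 2.0
```

No adversarial examples enter this loop, which is what removes the C&W dependency of the original calibration.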
### CIFAR-10 Results – 17 Models (ℓ2, ε = 0.5)
| Model | RobustBench Acc. | GREAT Score | RDI | NRGC | WCR | Worst Class | FP-GREAT |
|---|---|---|---|---|---|---|---|
| Rebuffi_extra | 82.32% | 0.465 | 0.333 | 0.135 | 0.283 | cat | 0.298 |
| Gowal_extra | 80.53% | 0.480 | 0.348 | 0.138 | 0.288 | cat | 0.306 |
| Rebuffi_70_ddpm | 80.42% | 0.381 | 0.360 | 0.178 | 0.166 | cat | 0.201 |
| Rebuffi_28_ddpm | 78.80% | 0.352 | 0.359 | 0.191 | 0.144 | cat | 0.173 |
| Augustin_WRN_extra | 78.79% | 0.526 | 0.319 | 0.105 | 0.335 | cat | 0.366 |
| Rade_R18 | 76.15% | 0.337 | 0.315 | 0.177 | 0.157 | cat | 0.179 |
| Augustin_WRN | 76.25% | 0.483 | 0.385 | 0.135 | 0.242 | cat | 0.291 |
| Sehwag_Proxy | 77.24% | 0.232 | 0.302 | 0.250 | 0.060 | cat | 0.081 |
| Rebuffi_R18 | 75.86% | 0.302 | 0.326 | 0.193 | 0.121 | cat | 0.139 |
| Sehwag_R18 | 74.41% | 0.186 | 0.248 | 0.258 | 0.054 | cat | 0.062 |
| Wu2020 | 73.66% | 0.105 | 0.111 | 0.194 | 0.047 | dog | 0.049 |
| Augustin2020 | 72.91% | 0.488 | 0.435 | 0.142 | 0.218 | cat | 0.271 |
| Engstrom2019 | 69.24% | 0.126 | 0.234 | 0.327 | 0.024 | dog | 0.009 |
| Rice2020 | 67.68% | 0.117 | 0.200 | 0.309 | 0.031 | dog | 0.017 |
| Rony2019 | 66.44% | 0.222 | 0.275 | 0.225 | 0.096 | cat | 0.085 |
| Ding_MMA | 66.09% | 0.086 | 0.127 | 0.218 | 0.039 | cat | 0.023 |
| Gowal2020 | 74.50% | 0.111 | 0.121 | 0.192 | 0.046 | dog | 0.050 |
RDI range: 0.111 (Wu2020, most fair) → 0.435 (Augustin2020, most disparate)
### ImageNet Results – 5 Models (ℓ∞, ε = 4/255)
| Model | RobustBench Acc. | GREAT Score | RDI | NRGC | WCR | FP-GREAT |
|---|---|---|---|---|---|---|
| Salman_WRN50-2 | 38.14% | 0.545 | 1.231 | 0.299 | 0.009 | −0.070 |
| Salman_R50 | 34.96% | 0.444 | 1.198 | 0.350 | 0.003 | −0.155 |
| Engstrom2019 | 29.22% | 0.446 | 1.196 | 0.361 | 0.003 | −0.152 |
| Wong2020 | 26.24% | 0.360 | 1.148 | 0.388 | 0.000 | −0.214 |
| Salman_R18 | 25.32% | 0.280 | 1.126 | 0.454 | 0.000 | −0.283 |
RDI range: 1.126 (Salman_R18, most fair) → 1.231 (Salman_WRN50-2, most disparate)

Two ImageNet models (Wong2020, Salman_R18) have WCR = 0.000: zero certified robustness on at least one class.
## Key Findings

- Cat is the most vulnerable class in 13/17 CIFAR-10 models (76%), and automobile the most robust in 10/17 (59%). This consistency across diverse training methods suggests class vulnerability is driven by intrinsic data properties, not training artifacts.
- Robustness-fairness tension: there is a clear positive correlation between aggregate GREAT Score and RDI. Higher aggregate robustness correlates with greater class-level disparity, on both CIFAR-10 and ImageNet.
- Aggregate scores are insufficient: models with similar RobustBench accuracy (e.g., 66–69%) can have RDI ranging from 0.121 to 0.327, a 2.7× difference in class-level fairness.
- All FP-GREAT scores on ImageNet are negative, reflecting that the disparity penalty dominates aggregate robustness under λ = 0.5 and highlighting severe class imbalance in ImageNet certified robustness.
## Disparity Metrics
| Metric | Formula | Interpretation | Grounding |
|---|---|---|---|
| RDI | max_k Ω̂_k − min_k Ω̂_k | Range of per-class robustness | Max Group Disparity |
| NRGC | Σ_{i,j} \|Ω̂_i − Ω̂_j\| / (2K²Ω̂) | Full-distribution inequality, ∈ [0, 1) | Gini coefficient |
| WCR | min_k Ω̂_k | Worst-class certified guarantee | Rawlsian maximin |
| FP-GREAT | Ω̂ − λ · RDI | Fairness-penalized aggregate ranking | UN IHDI adaptation |
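In code, the four metrics reduce to a few lines. A minimal sketch from a list of per-class scores, assuming balanced classes so the aggregate Ω̂ is the unweighted mean (true for the CIFAR-10 test set, 1,000 samples per class); `lam` stands for the paper's λ:

```python
def disparity_metrics(per_class, lam=0.5):
    """RDI, NRGC, WCR, and FP-GREAT from per-class GREAT Scores."""
    K = len(per_class)
    agg = sum(per_class) / K                 # aggregate score (balanced classes)
    rdi = max(per_class) - min(per_class)    # range of per-class robustness
    nrgc = sum(abs(a - b)                    # normalized Gini coefficient, in [0, 1)
               for a in per_class
               for b in per_class) / (2 * K**2 * agg)
    return {
        "RDI": rdi,
        "NRGC": nrgc,
        "WCR": min(per_class),               # Rawlsian worst-class guarantee
        "FP-GREAT": agg - lam * rdi,         # fairness-penalized aggregate
    }

print(disparity_metrics([0.5, 0.4, 0.1]))
```

Note that RDI and WCR look only at the extremes, while NRGC summarizes the whole per-class distribution; this is why the four metrics can rank the same models differently.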
## Concentration Bound

For n_k = 1,000 samples, K = 10 classes, δ = 0.05:
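The display that followed this line appears to have been lost in conversion. As a reconstruction rather than the paper's exact statement, the standard Hoeffding-plus-union-bound form such guarantees take (with B the range of the bounded local scores) is:

```latex
% With probability at least 1 - \delta, simultaneously for all K classes:
\big|\hat{\Omega}_k - \Omega_k\big| \;\le\; B\,\sqrt{\frac{\ln(2K/\delta)}{2\,n_k}}
\quad \text{for all } k = 1, \dots, K,
\qquad
\big|\widehat{\mathrm{RDI}} - \mathrm{RDI}\big| \;\le\; 2B\,\sqrt{\frac{\ln(2K/\delta)}{2\,n_k}}
```

For n_k = 1,000, K = 10, δ = 0.05 the square-root factor is √(ln(400)/2000) ≈ 0.055, so each per-class estimate is accurate to about 0.055·B, and RDI (a difference of two estimates) to twice that.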
## Project Structure

```
great/
├── gf_score/                  # Core implementation package (v0.1.0)
│   ├── config.py              # All constants, model lists, reference values
│   ├── core/
│   │   ├── class_conditional_great.py  # Per-class GREAT Score computation
│   │   ├── disparity_metrics.py        # RDI, NRGC, WCR, FP-GREAT + bounds
│   │   └── self_calibration.py         # Two-phase attack-free calibration
│   ├── evaluation/
│   │   ├── run_evaluation.py  # Main pipeline CLI
│   │   └── comparison.py      # Comparison with original paper values
│   ├── visualization/
│   │   └── plots.py           # 8 publication-ready figure types
│   ├── data/
│   │   └── download_data.py   # CIFAR-10 + ImageNet data loaders
│   ├── auditing_tool/
│   │   ├── app.py             # Gradio interactive dashboard
│   │   └── report_generator.py  # HTML audit report generation
│   └── tests/                 # Unit tests (~50 tests)
│
├── data/                      # CIFAR-10 (10K) + ImageNet val (50K)
├── models/                    # RobustBench checkpoints (22 models)
├── outputs/
│   ├── results/               # CSVs, JSONs, calibration files
│   ├── figures/
│   │   ├── cifar/             # 8 figures (PNG + PDF, 300 DPI)
│   │   └── imagenet/          # 7 figures (PNG + PDF, 300 DPI)
│   └── checkpoints/           # Cached logits + per-class scores
└── latex/                     # NeurIPS-format manuscript
```
## Installation

Requirements: Python 3.9+; an NVIDIA GPU is recommended (required for ImageNet evaluation).

```bash
# Clone the repository
git clone https://github.com/<your-username>/gf-score.git
cd gf-score

# Install dependencies
pip install -r gf_score/requirements.txt
```

Core dependencies: `torch>=1.13`, `torchvision>=0.14`, `numpy>=1.23`, `scipy>=1.9`, `robustbench>=1.1`, `matplotlib>=3.6`, `gradio>=4.0`, `pandas>=1.5`
## Quickstart

### Step 1 – Download data

```bash
# CIFAR-10 (automatic, ~170 MB)
python -m gf_score.data.download_data

# ImageNet: download ILSVRC2012_img_val.tar + ILSVRC2012_devkit_t12.tar.gz
# from https://image-net.org/ into data/, then:
python scripts/prepare_imagenet.py
```

### Step 2 – Verify installation

```bash
python -m pytest gf_score/tests/ -v --tb=short
# Expected: all ~50 tests pass
```

### Step 3 – Run evaluation

```bash
# Quick test (2 models, ~5 min)
python -m gf_score.evaluation.run_evaluation --quick_test

# Full CIFAR-10 evaluation (17 models, ~30–60 min)
python -m gf_score.evaluation.run_evaluation

# Full ImageNet evaluation (5 models, ~30–60 min on GPU)
python -m gf_score.evaluation.run_evaluation --dataset imagenet
```

Checkpoints are saved after each model, so runs are safe to interrupt and resume.

### Step 4 – Compare with original paper

```bash
python -m gf_score.evaluation.comparison                     # CIFAR-10
python -m gf_score.evaluation.comparison --dataset imagenet
```

### Step 5 – Generate figures

```bash
python -m gf_score.visualization.plots                       # CIFAR-10 (8 figures)
python -m gf_score.visualization.plots --dataset imagenet
# Output: outputs/figures/{cifar,imagenet}/*.{png,pdf}
```

### Step 6 – Launch auditing dashboard

```bash
python -m gf_score.auditing_tool.app
# Opens at http://localhost:7860
```
## Output Files

```
outputs/
├── results/
│   ├── summary_table.csv                   # CIFAR-10 per-model summary
│   ├── summary_table_imagenet.csv          # ImageNet per-model summary
│   ├── full_results.json                   # Full CIFAR-10 results
│   ├── full_results_imagenet.json          # Full ImageNet results
│   ├── per_class_scores.csv                # Per-class GREAT Scores (CIFAR-10)
│   ├── comparison_results.json             # vs. original paper (CIFAR-10)
│   ├── comparison_results_imagenet.json    # vs. original paper (ImageNet)
│   ├── self_calibration_accuracy.json      # CIFAR-10 calibration results
│   └── self_calibration_accuracy_imagenet.json
├── figures/
│   ├── cifar/
│   │   ├── 01_radar.{png,pdf}              # Radar chart of per-class scores
│   │   ├── 02_heatmap.{png,pdf}            # Per-class score heatmap
│   │   ├── 03_pareto.{png,pdf}             # GREAT Score vs. RDI (Pareto)
│   │   ├── 04_disparity_bars.{png,pdf}
│   │   ├── 05_fp_great_ranking.{png,pdf}
│   │   ├── 06_vulnerability.{png,pdf}
│   │   ├── 07_calibration.{png,pdf}
│   │   └── 08_rdi_concentration.{png,pdf}
│   └── imagenet/                           # 02–08 (same, no radar)
└── checkpoints/
    ├── logits/<model>_logits.npz           # Cached logits (resume support)
    └── scores/<model>_scores.json
```
## Reproducibility

All results are fully reproducible:

- Random seed is fixed at `42` across all data preparation steps
- All model inference runs under `torch.no_grad()` (fully deterministic)
- Logits and per-class scores are checkpointed, so interrupted runs resume automatically
- To reproduce from scratch, delete `outputs/` and re-run the pipeline
- RobustBench models are downloaded once to `~/.cache/robustbench/`
- Activation functions follow the original paper: sigmoid for CIFAR-10, softmax for ImageNet
## Method Summary

GF-Score decomposes the GREAT Score by partitioning GAN-generated samples by class label:

```
Ω̂(f) = Σ_k (n_k / N) · Ω̂_k(f)    [exact, zero error]
```

where Ω̂_k is the average certified confidence margin restricted to class k. Self-calibration finds the optimal temperature T* by maximizing the Spearman rank correlation between GREAT Scores and publicly available clean accuracies; no adversarial computation is required.
For the full mathematical treatment, proofs, and concentration bounds, see the paper.
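The zero-error decomposition above is easy to check numerically. A minimal sketch with made-up per-sample scores (illustrative only, not the repository's consistency check):

```python
def aggregate_from_classes(class_scores, class_counts):
    """Reconstruct the aggregate GREAT Score from per-class scores:
    a weighted average with weights n_k / N."""
    N = sum(class_counts)
    return sum(n / N * s for s, n in zip(class_scores, class_counts))

# Per-sample scores grouped by class reproduce the global mean exactly
# (up to floating-point rounding), whatever the class sizes.
samples = {0: [0.9, 0.7], 1: [0.2, 0.4, 0.6]}
per_class = [sum(v) / len(v) for v in samples.values()]
counts = [len(v) for v in samples.values()]
flat = [g for v in samples.values() for g in v]
global_mean = sum(flat) / len(flat)
print(abs(aggregate_from_classes(per_class, counts) - global_mean))
```

Because the identity is an exact algebraic rearrangement of the global mean, the "zero numerical error across all 22 models" check is a sanity test of the implementation rather than an approximation result.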
## Relation to the Original GREAT Score
| Aspect | GREAT Score (Li et al., NeurIPS 2024) | GF-Score (Ours) |
|---|---|---|
| Granularity | Aggregate scalar | Per-class profiles |
| Calibration | C&W attack on generated samples | Clean accuracy correlation (attack-free) |
| Fairness analysis | None | RDI, NRGC, WCR, FP-GREAT |
| ImageNet calibration | Not performed | T* = 0.10, ρ = 1.000 |
| CIFAR-10 ρ (cal.) | 0.897 | 0.871 |
| ImageNet ρ (uncal.) | 0.800 | 0.900 |
| Concentration bounds | Aggregate only | Per-class + RDI |
| Dashboard | None | Interactive Gradio tool |
## Citation

If you use this code or build on our work, please cite this repository. Also cite the original GREAT Score paper this work builds upon:
```bibtex
@inproceedings{li2024great,
  title     = {GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models},
  author    = {Li, Zaitang and Chen, Pin-Yu and Ho, Tsung-Yi},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2024}
}
```