BAD Classifier for FairSteer

Biased Activation Detection (BAD) classifier for TinyLlama-1.1B.

Artifacts

  • Model: model.safetensors (SafeTensors format)
  • Scaler: scaler.pkl (StandardScaler)
  • Config: config.json
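A minimal sketch of how the three artifacts might be loaded from a local copy of this repository. The file names come from the list above; everything else is an assumption. The helper names (`load_config`, `load_scaler`, `load_tensors`) are hypothetical, the keys in config.json and the tensor names in model.safetensors are not documented here and should be inspected rather than guessed, and scaler.pkl is assumed to have been written with `pickle` (if it was written with joblib, `joblib.load` would be needed instead).

```python
import json
import pickle
from pathlib import Path


def load_config(directory):
    """Read config.json; its keys are not documented in this card."""
    with open(Path(directory) / "config.json", encoding="utf-8") as f:
        return json.load(f)


def load_scaler(directory):
    """Unpickle scaler.pkl (requires scikit-learn for a real StandardScaler).

    Only unpickle files from a source you trust: pickle can run arbitrary code.
    """
    with open(Path(directory) / "scaler.pkl", "rb") as f:
        return pickle.load(f)


def load_tensors(directory):
    """Return a dict mapping tensor names to NumPy arrays."""
    # Imported lazily so the other helpers work without safetensors installed.
    from safetensors.numpy import load_file

    return load_file(str(Path(directory) / "model.safetensors"))
```

Printing `load_tensors(path).keys()` and the loaded config is a reasonable first step before writing any inference code, since the classifier architecture is not stated here.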

Stats

  • Balanced Accuracy: 74.51%
  • Best Layer: 17
  • Training Date: 2025-12-12
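
For readers unfamiliar with the metric, balanced accuracy is conventionally the mean of per-class recall, which keeps a majority class from dominating the score. The sketch below assumes that standard definition; the card does not say how the 74.51% figure was computed, and the labels in the example are made up for illustration, not taken from this model's evaluation data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Positions whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy, made-up labels: 1 = biased, 0 = unbiased (label meaning is an assumption).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2 = 0.625
```

Plain accuracy on the same toy labels is 4/6, about 0.667, so the two metrics diverge as soon as the classes are imbalanced.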