BAD Classifier for FairSteer
Biased Activation Detection (BAD) classifier for TinyLlama-1.1B.
Artifacts
- Model: model.safetensors (SafeTensors format)
- Scaler: scaler.pkl (StandardScaler)
- Config: config.json
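
One possible way to load the three artifacts is sketched below. This is not taken from the FairSteer code. The helper names `load_config` and `load_artifacts` are hypothetical, and the sketch assumes that scaler.pkl was written with `pickle.dump` (if it was saved with joblib, `joblib.load` would be the matching call) and that the `safetensors`, `torch`, and `scikit-learn` packages are installed. The classifier architecture and the keys in config.json are not documented here, so the sketch returns the raw tensors and the parsed config without interpreting them.

```python
import json
import pickle
from pathlib import Path


def load_config(artifact_dir):
    """Read config.json. Its keys are not documented here, so none are assumed."""
    with open(Path(artifact_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


def load_artifacts(artifact_dir):
    """Hypothetical helper: return (state_dict, scaler, config)."""
    # Imported lazily so that load_config works without safetensors/torch.
    from safetensors.torch import load_file

    artifact_dir = Path(artifact_dir)

    # Maps tensor names to torch tensors. The names and shapes depend on the
    # classifier architecture, which this README does not specify.
    state_dict = load_file(str(artifact_dir / "model.safetensors"))

    # Assumption: scaler.pkl was written with pickle.dump. Unpickling needs
    # scikit-learn installed and can execute arbitrary code, so only load
    # files from a source you trust.
    with open(artifact_dir / "scaler.pkl", "rb") as f:
        scaler = pickle.load(f)

    return state_dict, scaler, load_config(artifact_dir)
```

Since the scaler is a StandardScaler, activations would normally be passed through `scaler.transform(...)` before being fed to the classifier weights.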
Stats
- Balanced Accuracy: 74.51%
- Best Layer: 17
- Training Date: 2025-12-12
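
For readers unfamiliar with the metric, balanced accuracy is conventionally defined as the mean of per-class recall, which keeps a majority class from dominating the score. The sketch below illustrates that standard definition in plain Python. It is not the evaluation code used for this classifier, and the labels are invented toy values.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        # Number of samples whose true label is c.
        total = sum(1 for t in y_true if t == c)
        # Of those, how many were predicted as c.
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Toy labels for illustration only (not FairSteer data).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the recall is 3/4 for class 1 and 1/2 for class 0, so the balanced accuracy is 0.625, while plain accuracy would be 4/6.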