bitlabsdb/bad-classifier-tinyllama

bitlabsdb committed on Nov 4, 2025

Commit e86fe08 · verified · 1 Parent(s): 7ca5f4a

Upload README.md with huggingface_hub

Files changed (1)

README.md CHANGED

@@ -1,8 +1,45 @@
----
-license: apache-2.0
-language:
-- en
-base_model:
-- TinyLlama/TinyLlama-1.1B-Chat-v1.0
-pipeline_tag: token-classification
----

+# BAD Classifier for TinyLlama/TinyLlama-1.1B-Chat-v1.0
+## Model Details
+- **Detection Layer**: 14
+- **Validation Accuracy**: 76.00%
+- **Dataset**: BBQ (58942) + MMLU (20266)
+## Layer Performance
+- Layer 14: 76.00%
+## Usage
+```python
+from huggingface_hub import hf_hub_download
+import torch
+import json
+# Download
+config_path = hf_hub_download("bitlabsdb/bad-classifier-tinyllama", "config.json")
+model_path = hf_hub_download("bitlabsdb/bad-classifier-tinyllama", "pytorch_model.bin")
+# Load config
+with open(config_path) as f:
+    config = json.load(f)
+# Define classifier
+class BADClassifier(torch.nn.Module):
+    def __init__(self, input_dim):
+        super().__init__()
+        self.linear = torch.nn.Linear(input_dim, 2)
+    def forward(self, x):
+        return self.linear(x)
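+# Load the weights onto CPU (works without a GPU) and switch to inference mode
+classifier = BADClassifier(config['input_dim'])
+classifier.load_state_dict(torch.load(model_path, map_location="cpu"))
+classifier.eval()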
+```
+## Citation
+```bibtex
+@article{fairsteer2025,
+  title={FairSteer: Inference Time Debiasing for LLMs},
+  author={Li, Yichen and others},
+  year={2025}
+}
+```
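
The usage snippet in the README stops once the weights are loaded. The following is a minimal sketch of the next step, scoring activations with the loaded probe. It makes several assumptions that the README does not confirm: the input dimension is taken to be 2048 (TinyLlama-1.1B's hidden size; the real value should be read from `config['input_dim']`), random tensors stand in for layer-14 hidden states, and `score_activations` is a hypothetical helper name. The README also does not say which of the two output classes denotes a biased input or how activations are pooled per prompt, so the sketch leaves the class indices uninterpreted.

```python
import torch


class BADClassifier(torch.nn.Module):
    """Same single-linear-layer probe as in the README usage snippet."""

    def __init__(self, input_dim):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, 2)

    def forward(self, x):
        return self.linear(x)


def score_activations(classifier, activations):
    """Hypothetical helper: class probabilities for a (batch, input_dim) tensor."""
    classifier.eval()
    with torch.no_grad():  # inference only, no gradients needed
        logits = classifier(activations)
    return torch.softmax(logits, dim=-1)


# Assumption: 2048 is TinyLlama-1.1B's hidden size; use config['input_dim'] in practice.
input_dim = 2048

torch.manual_seed(0)
classifier = BADClassifier(input_dim)         # randomly initialised stand-in for the real weights
fake_activations = torch.randn(4, input_dim)  # placeholder for layer-14 hidden states

probs = score_activations(classifier, fake_activations)
print(probs.shape)  # torch.Size([4, 2])
```

With the real checkpoint, `fake_activations` would be replaced by hidden states taken from the detection layer listed in the README (layer 14), for example by running the base model with `output_hidden_states=True` in `transformers`.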