Commit e86fe08 · bitlabsdb · verified · 1 Parent(s): 7ca5f4a

Upload README.md with huggingface_hub

  1. README.md +45 -8
README.md CHANGED
@@ -1,8 +1,45 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - TinyLlama/TinyLlama-1.1B-Chat-v1.0
- pipeline_tag: token-classification
- ---
+ # BAD Classifier for TinyLlama/TinyLlama-1.1B-Chat-v1.0
+
+ ## Model Details
+ - **Detection Layer**: 14
+ - **Validation Accuracy**: 76.00%
+ - **Dataset**: BBQ (58942) + MMLU (20266)
+
+ ## Layer Performance
+ - Layer 14: 76.00%
+
+ ## Usage
+ ```python
+ from huggingface_hub import hf_hub_download
+ import torch
+ import json
+
+ # Download the classifier config and weights from the Hub
+ config_path = hf_hub_download("bitlabsdb/bad-classifier-tinyllama", "config.json")
+ model_path = hf_hub_download("bitlabsdb/bad-classifier-tinyllama", "pytorch_model.bin")
+
+ # Load config
+ with open(config_path) as f:
+     config = json.load(f)
+
+ # Define classifier: a single linear layer that outputs two logits
+ class BADClassifier(torch.nn.Module):
+     def __init__(self, input_dim):
+         super().__init__()
+         self.linear = torch.nn.Linear(input_dim, 2)
+     def forward(self, x):
+         return self.linear(x)
+
+ # Load the weights (mapped to CPU so no GPU is required)
+ classifier = BADClassifier(config['input_dim'])
+ classifier.load_state_dict(torch.load(model_path, map_location="cpu"))
+ ```
+
+ ## Citation
+ ```bibtex
+ @article{fairsteer2025,
+   title={FairSteer: Inference Time Debiasing for LLMs},
+   author={Li, Yichen and others},
+   year={2025}
+ }
+ ```
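
Note on the Usage snippet in the README above: it stops once the classifier is loaded and does not show how an activation is passed to it. The following is a minimal sketch of that step, not the author's confirmed pipeline. It assumes the classifier reads the last-token hidden state at the detection layer, that layer 14 indexes the `hidden_states` tuple a `transformers` model returns when called with `output_hidden_states=True` (index 0 being the embedding output), and that `input_dim` equals TinyLlama's hidden size of 2048. The helper names `last_token_activation` and `classify` are hypothetical, random tensors stand in for real activations, and the classifier is left untrained so the block runs offline.

```python
import torch

HIDDEN_SIZE = 2048    # TinyLlama-1.1B hidden size; in practice use config['input_dim']
DETECTION_LAYER = 14  # from Model Details; the indexing convention is an assumption


class BADClassifier(torch.nn.Module):
    """Same single-linear-layer classifier as in the README's Usage section."""

    def __init__(self, input_dim):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, 2)

    def forward(self, x):
        return self.linear(x)


def last_token_activation(hidden_states, layer=DETECTION_LAYER):
    """Select the final-token vector from one entry of a hidden_states tuple.

    Each entry is expected to have shape (batch, seq_len, hidden).
    """
    return hidden_states[layer][:, -1, :]


@torch.no_grad()
def classify(classifier, activation):
    """Return softmax probabilities over the classifier's two output classes."""
    classifier.eval()
    logits = classifier(activation)
    return torch.softmax(logits, dim=-1)


# Stand-in for `model(**inputs, output_hidden_states=True).hidden_states`:
# 23 tensors (embedding output + 22 decoder layers), each (batch, seq_len, hidden).
torch.manual_seed(0)
fake_hidden_states = tuple(torch.randn(1, 5, HIDDEN_SIZE) for _ in range(23))

# Untrained here; in practice load the downloaded state dict first.
classifier = BADClassifier(HIDDEN_SIZE)
probs = classify(classifier, last_token_activation(fake_hidden_states))
print(probs.shape)
```

The README does not state which of the two output indices corresponds to a biased activation, so the sketch returns both probabilities unlabeled.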