bitlabsdb committed on
Commit 4add0a4 · verified · 1 parent: 1f1ebe5

Upload BAD classifier - Layer 14 - Acc: 69.40%

Files changed (5)
  1. README.md +60 -0
  2. config.json +11 -0
  3. layer_comparison.json +10 -0
  4. pytorch_model.bin +3 -0
  5. training_history.json +367 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ tags:
+ - fairsteer
+ - bias-detection
+ - debiasing
+ - tinyllama
+ library_name: pytorch
+ ---
+
+ # BAD Classifier for FairSteer - TinyLlama-1.1B
+
+ This is a Biased Activation Detection (BAD) classifier trained for the FairSteer inference-time debiasing framework.
+
+ ## Model Details
+
+ - **Base Model**: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ - **Task**: Binary classification (biased vs. unbiased activations)
+ - **Training Data**: BBQ dataset with balanced sampling
+ - **Best Layer**: 14
+ - **Validation Accuracy**: 69.40%
+ - **Architecture**: Simple linear classifier (FairSteer-aligned)
+
+ ## Usage
+ ```python
+ import torch
+ import json
+
+ # Load the classifier weights and their configuration
+ model = torch.load("pytorch_model.bin", map_location="cpu")
+ with open("config.json", "r") as f:
+     config = json.load(f)
+
+ # Bias detection:
+ #   Input:  activation vector from layer 14 of the base model
+ #           (dimension config["input_dim"] == 2048)
+ #   Output: probability that the activation is unbiased
+ ```
+
+ ## Training Details
+
+ - **Samples**: 24,284 balanced samples (12,142 per class)
+ - **Class Distribution**: 50% biased, 50% unbiased
+ - **Training Method**: FairSteer-aligned labeling
+ - **Training Date**: 2025-11-16
+
+ ## Citation
+
+ If you use this model, please cite the FairSteer paper:
+ ```bibtex
+ @article{fairsteer,
+   title={FairSteer: Inference-Time Debiasing for Large Language Models},
+   author={[Authors]},
+   journal={[Journal]},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
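The README's usage snippet stops after loading the weights. As a minimal, dependency-free sketch of the detection step itself: the card only says "simple linear classifier", so the sketch below assumes a single linear layer followed by a sigmoid, and the function name `bad_probability` plus the toy weights are illustrative, not part of this repo.

```python
import math

def bad_probability(activation, weights, bias):
    """Score one activation vector with an assumed linear BAD probe.

    Assumption: the classifier is one linear layer + sigmoid; the repo
    states only "simple linear classifier", so the exact head shape
    (and this helper) are hypothetical.
    """
    logit = sum(a * w for a, w in zip(activation, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # probability of "unbiased"

# Toy stand-in for a 2048-dim layer-14 activation and probe parameters
dim = 2048  # matches config.json "input_dim"
activation = [0.01] * dim
weights = [0.001] * dim
p = bad_probability(activation, weights, bias=0.0)
```

In practice the activation would come from the hidden states of layer 14 of TinyLlama-1.1B-Chat-v1.0, and the weights from `pytorch_model.bin`.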
config.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "input_dim": 2048,
+ "layer_idx": 14,
+ "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+ "best_val_accuracy": 0.6940498249948528,
+ "training_method": "balanced_sampling",
+ "mmlu_used": false,
+ "balanced": true,
+ "samples_per_class": 12142,
+ "training_date": "2025-11-16T04:36:37.554088"
+ }
layer_comparison.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "layer_results": {
+ "12": 0.6660490014412188,
+ "13": 0.6849907350216183,
+ "14": 0.6940498249948528,
+ "15": 0.6802553016265184
+ },
+ "best_layer": 14,
+ "best_accuracy": 0.6940498249948528
+ }
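The "best_layer" field is just the argmax over "layer_results". A short sketch of that selection, using the per-layer validation accuracies copied from layer_comparison.json above:

```python
import json

# Per-layer validation accuracies, copied from layer_comparison.json
layer_comparison = json.loads("""
{
  "layer_results": {
    "12": 0.6660490014412188,
    "13": 0.6849907350216183,
    "14": 0.6940498249948528,
    "15": 0.6802553016265184
  }
}
""")

results = layer_comparison["layer_results"]
best_layer = max(results, key=results.get)  # layer with highest accuracy
best_acc = results[best_layer]              # 0.6940... for layer 14
```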
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:104091af515c3b5c84344d567cd4918211a24bd0436485fe775aa23045163eb1
+ size 10197
training_history.json ADDED
@@ -0,0 +1,367 @@
+ {
+ "train_loss": [
+ 0.6871489925545783,
+ 0.6756031265615674,
+ 0.6665629949910633,
+ 0.6589766722942408,
+ 0.6530065797658614,
+ 0.6478708621102796,
+ 0.6434467474124556,
+ 0.6398316128455084,
+ 0.6365894734960715,
+ 0.6335534773091621,
+ 0.6309338526435646,
+ 0.6285080337736729,
+ 0.6264961140672589,
+ 0.6245350048153843,
+ 0.6230128934737209,
+ 0.6211253712780718,
+ 0.6197270760633392,
+ 0.6185400978949568,
+ 0.6170016461916688,
+ 0.6159688546757338,
+ 0.6146448779849856,
+ 0.6134125353249665,
+ 0.6125929130911183,
+ 0.6117772386931548,
+ 0.6105906732103108,
+ 0.6097400620991617,
+ 0.6089366668655125,
+ 0.6079801486524826,
+ 0.6071678678626266,
+ 0.6064174478977065,
+ 0.6058180249804432,
+ 0.6051600946722012,
+ 0.6044557244329609,
+ 0.6038566518628837,
+ 0.6031550629487363,
+ 0.6024190695690668,
+ 0.6019813179380706,
+ 0.6013953759180951,
+ 0.6008946201894203,
+ 0.6004002595049677,
+ 0.6000693338728665,
+ 0.5993937715798281,
+ 0.5989840824611276,
+ 0.598616287841685,
+ 0.5981249788002966,
+ 0.5975727748102588,
+ 0.5972712751601258,
+ 0.5968309545312016,
+ 0.5966546942541319,
+ 0.5964439790161858,
+ 0.5961574170902029,
+ 0.5959873719082114,
+ 0.5958232211828564,
+ 0.5956407167339998,
+ 0.595549747926942,
+ 0.5954692281114158,
+ 0.5953770839297263,
+ 0.5952635848461437,
+ 0.5952172497655664,
+ 0.5951005941332721,
+ 0.5950267291010123,
+ 0.5949744866960432,
+ 0.5949370021765333,
+ 0.5949066102630524,
+ 0.5948549258504616,
+ 0.5947928494706833,
+ 0.5947768503438857,
+ 0.5947351482657377,
+ 0.5947240128681357,
+ 0.5946996036078795,
+ 0.5946813257501344
+ ],
+ "train_acc": [
+ 0.5649868739383332,
+ 0.621609100736089,
+ 0.6332938693570803,
+ 0.6448756884748031,
+ 0.6496113656251609,
+ 0.65233952746178,
+ 0.6530601739846605,
+ 0.6580532248931898,
+ 0.6589797704226077,
+ 0.6588253461677047,
+ 0.6621197302723014,
+ 0.662531528285376,
+ 0.6615535080043239,
+ 0.6664950841612189,
+ 0.6629433262984507,
+ 0.6654141143768981,
+ 0.6694291450043753,
+ 0.6667009831677562,
+ 0.6698924177690843,
+ 0.6677304782004427,
+ 0.6712307613115767,
+ 0.6700983167756216,
+ 0.6704586400370618,
+ 0.6714881350697482,
+ 0.6718999330828229,
+ 0.6719514078344572,
+ 0.6734441756318525,
+ 0.6741133474030988,
+ 0.673186801873681,
+ 0.6752457919390539,
+ 0.6755031656972255,
+ 0.676378236475009,
+ 0.676532660729912,
+ 0.6769444587429866,
+ 0.6775106810109641,
+ 0.6791578730632625,
+ 0.679569671076337,
+ 0.678282802285479,
+ 0.6804447418541205,
+ 0.6804447418541205,
+ 0.6788490245534565,
+ 0.6827096309260308,
+ 0.6808050651155608,
+ 0.6810624388737324,
+ 0.6812168631286354,
+ 0.6810109641220982,
+ 0.6832758531940083,
+ 0.681474236886807,
+ 0.6834302774489113,
+ 0.6830699541874711,
+ 0.6821948834096876,
+ 0.6826581561743965,
+ 0.6831729036907397,
+ 0.6822463581613218,
+ 0.6838420754619859,
+ 0.6836361764554486,
+ 0.6837391259587172,
+ 0.6838420754619859,
+ 0.6850774695012096,
+ 0.6821948834096876,
+ 0.6842538734750605,
+ 0.683378802697277,
+ 0.6834302774489113,
+ 0.6836361764554486,
+ 0.6846141967365007,
+ 0.6831214289391053,
+ 0.6830699541874711,
+ 0.6842538734750605,
+ 0.6826066814227622,
+ 0.6843053482266948,
+ 0.6842538734750605
+ ],
+ "val_loss": [
+ 0.6796724840228587,
+ 0.6690080878247149,
+ 0.6604869353297514,
+ 0.6536541479419773,
+ 0.6476329544799574,
+ 0.6424883690689335,
+ 0.6379396503836082,
+ 0.6337329272947023,
+ 0.6303623499437347,
+ 0.6276654266933122,
+ 0.6247840324669481,
+ 0.6225635180729565,
+ 0.6204114854151707,
+ 0.6186669831882663,
+ 0.6164548270418144,
+ 0.6147582449121732,
+ 0.6142727642459881,
+ 0.6119211331325867,
+ 0.6107950128349523,
+ 0.6100036572938793,
+ 0.6079963020441929,
+ 0.6070884867462573,
+ 0.6058620899480043,
+ 0.6048155775869337,
+ 0.6038575495298149,
+ 0.6032687107157457,
+ 0.6020978295756437,
+ 0.6015734236525586,
+ 0.6007049903473001,
+ 0.6001680509430993,
+ 0.5990661877601118,
+ 0.5983027253029203,
+ 0.5977025333073298,
+ 0.5971357747314151,
+ 0.5963740762747289,
+ 0.5960883911141452,
+ 0.5952284145384678,
+ 0.5948696947156683,
+ 0.5941358830228323,
+ 0.5936585963692762,
+ 0.5933848282474536,
+ 0.5933172689044772,
+ 0.5923673466863469,
+ 0.5918423779368621,
+ 0.591713022689945,
+ 0.590996146631604,
+ 0.5916528747679153,
+ 0.5904331773132951,
+ 0.5903895990161805,
+ 0.5901259675864357,
+ 0.5898491228120131,
+ 0.5896277537384175,
+ 0.5895262000515109,
+ 0.5894717778446794,
+ 0.589262329295066,
+ 0.589267475751783,
+ 0.5891850881304592,
+ 0.5890237749273305,
+ 0.5892851307899195,
+ 0.5889776876882341,
+ 0.5889714413497191,
+ 0.5889122832550393,
+ 0.5889188134795431,
+ 0.5888595102450375,
+ 0.5888379431075217,
+ 0.5887715700282858,
+ 0.5887141817878101,
+ 0.5887425896535814,
+ 0.5886771182894143,
+ 0.588659039956443,
+ 0.5886716510941763
+ ],
+ "val_acc": [
+ 0.6160181181799465,
+ 0.64319538809965,
+ 0.6417541692402717,
+ 0.6547251389746758,
+ 0.6466954910438543,
+ 0.6501955939880585,
+ 0.6668725550751493,
+ 0.664401894173358,
+ 0.670166769610871,
+ 0.664401894173358,
+ 0.6730492073296274,
+ 0.6670784434836319,
+ 0.6681078855260448,
+ 0.6656372246242537,
+ 0.6744904261890056,
+ 0.6751080914144534,
+ 0.6652254478072884,
+ 0.6746963145974881,
+ 0.6736668725550752,
+ 0.6711962116532839,
+ 0.6800494132180358,
+ 0.6775787523162446,
+ 0.6829318509367922,
+ 0.680461190035001,
+ 0.6827259625283096,
+ 0.6792258595841054,
+ 0.6839612929792053,
+ 0.680461190035001,
+ 0.6895202800082355,
+ 0.6821082973028618,
+ 0.6862260654725139,
+ 0.6882849495573399,
+ 0.6884908379658226,
+ 0.6866378422894791,
+ 0.6905497220506486,
+ 0.6858142886555487,
+ 0.6907556104591311,
+ 0.6874613959234095,
+ 0.6905497220506486,
+ 0.6928144945439572,
+ 0.6940498249948528,
+ 0.6851966234301009,
+ 0.6891085031912704,
+ 0.6897261684167182,
+ 0.6882849495573399,
+ 0.6932262713609224,
+ 0.68354951616224,
+ 0.6907556104591311,
+ 0.6897261684167182,
+ 0.6903438336421659,
+ 0.6919909409100268,
+ 0.6919909409100268,
+ 0.692402717726992,
+ 0.692402717726992,
+ 0.6921968293185093,
+ 0.6930203829524397,
+ 0.6928144945439572,
+ 0.6917850525015442,
+ 0.6907556104591311,
+ 0.6928144945439572,
+ 0.6928144945439572,
+ 0.6926086061354746,
+ 0.6921968293185093,
+ 0.6928144945439572,
+ 0.6921968293185093,
+ 0.6930203829524397,
+ 0.6930203829524397,
+ 0.6932262713609224,
+ 0.6930203829524397,
+ 0.6928144945439572,
+ 0.6932262713609224
+ ],
+ "lr": [
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 0.0001,
+ 5e-05,
+ 5e-05,
+ 5e-05,
+ 5e-05,
+ 5e-05,
+ 5e-05,
+ 2.5e-05,
+ 2.5e-05,
+ 2.5e-05,
+ 2.5e-05,
+ 2.5e-05,
+ 2.5e-05,
+ 1.25e-05,
+ 1.25e-05,
+ 1.25e-05,
+ 1.25e-05,
+ 1.25e-05,
+ 1.25e-05,
+ 6.25e-06,
+ 6.25e-06,
+ 6.25e-06,
+ 6.25e-06,
+ 6.25e-06,
+ 6.25e-06
+ ]
+ }
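The epoch at which the reported 69.40% validation accuracy was reached can be recovered from training_history.json by taking the argmax of "val_acc". A sketch with the standard library, using an abridged stub whose values are copied from the file above (load the full file the same way in practice):

```python
# Abridged stub of training_history.json; with the real file you would do:
#   with open("training_history.json") as f: history = json.load(f)
history = {
    "val_acc": [
        0.6160181181799465,  # epoch 1
        0.6928144945439572,  # epoch 40
        0.6940498249948528,  # epoch 41 (best; matches config.json)
        0.6851966234301009,  # epoch 42
    ],
}

# Index of the epoch with the highest validation accuracy
best_epoch = max(range(len(history["val_acc"])),
                 key=history["val_acc"].__getitem__)
best_acc = history["val_acc"][best_epoch]
```

On the full history this picks epoch 41 of 71, after which the stepped learning-rate decay (1e-4 down to 6.25e-6) yields only marginal changes.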