bitlabsdb committed on
Commit 085e34a · verified · 1 Parent(s): 79f74e1

Upload BAD classifier - Layer 13 - Acc: 69.83%

Files changed (5)
  1. README.md +60 -0
  2. config.json +11 -0
  3. layer_comparison.json +10 -0
  4. pytorch_model.bin +3 -0
  5. training_history.json +442 -0
README.md ADDED
---
license: apache-2.0
tags:
- fairsteer
- bias-detection
- debiasing
- tinyllama
library_name: pytorch
---

# BAD Classifier for FairSteer - TinyLlama-1.1B

This is a Biased Activation Detection (BAD) classifier trained for the FairSteer framework.

## Model Details

- **Base Model**: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- **Task**: Binary classification (biased vs. unbiased activations)
- **Training Data**: BBQ dataset with balanced sampling
- **Best Layer**: 13
- **Validation Accuracy**: 69.83%
- **Architecture**: Simple linear classifier (FairSteer-aligned)

## Usage

```python
import json

import torch

# Load the classifier checkpoint and its configuration
model = torch.load("pytorch_model.bin", map_location="cpu")
with open("config.json", "r") as f:
    config = json.load(f)

# Use for bias detection:
#   input:  activation vector from LLM layer 13 (dimension config["input_dim"])
#   output: probability that the activation is unbiased
```

## Training Details

- **Samples**: 24,276 balanced samples
- **Class Distribution**: 50% BIASED, 50% UNBIASED
- **Training Method**: FairSteer-aligned labeling
- **Training Date**: 2025-11-16

## Citation

If you use this model, please cite the FairSteer paper:

```bibtex
@article{fairsteer,
  title={FairSteer: Inference-Time Debiasing for Large Language Models},
  author={[Authors]},
  journal={[Journal]},
  year={2024}
}
```

## License

Apache 2.0
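The card describes the checkpoint as a simple linear classifier over 2048-dimensional layer-13 activations. A minimal sketch of what inference could look like, assuming a single `nn.Linear(2048, 1)` head with a sigmoid output; the class name `BADClassifier` and the exact module layout inside `pytorch_model.bin` are assumptions, not guaranteed by the repo:

```python
import torch
import torch.nn as nn


class BADClassifier(nn.Module):
    """Hypothetical single-layer probe over hidden activations."""

    def __init__(self, input_dim: int = 2048):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Probability that the activation is unbiased,
        # following the output convention stated in the README.
        return torch.sigmoid(self.linear(h))


clf = BADClassifier(input_dim=2048)
h = torch.randn(4, 2048)   # stand-in for layer-13 activations
probs = clf(h)             # shape (4, 1), values in [0, 1]
```

In practice one would first populate the module with the released weights (e.g. via `clf.load_state_dict(...)` if the checkpoint turns out to be a state dict) and then threshold the probabilities.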
config.json ADDED
{
  "input_dim": 2048,
  "layer_idx": 13,
  "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "best_val_accuracy": 0.6983113673805601,
  "training_method": "balanced_sampling",
  "mmlu_used": false,
  "balanced": true,
  "samples_per_class": 12138,
  "training_date": "2025-11-16T04:04:08.618908"
}
layer_comparison.json ADDED
{
  "layer_results": {
    "12": 0.6620675453047776,
    "13": 0.6983113673805601,
    "14": 0.675658978583196,
    "15": 0.6668039538714992
  },
  "best_layer": 13,
  "best_accuracy": 0.6983113673805601
}
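The `best_layer` and `best_accuracy` fields are derivable from `layer_results`; a quick sketch of that selection over the values recorded in `layer_comparison.json`:

```python
# Recompute the best probe layer from the per-layer validation accuracies.
layer_results = {
    "12": 0.6620675453047776,
    "13": 0.6983113673805601,
    "14": 0.675658978583196,
    "15": 0.6668039538714992,
}
best_layer = max(layer_results, key=layer_results.get)
print(best_layer, layer_results[best_layer])  # 13 0.6983113673805601
```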
pytorch_model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3f8a84d404f75bc46126eae477b527a7fab0a01861742f8a28b6082e775249f7
size 9788
training_history.json ADDED
{
  "train_loss": [
    0.6879778053807181, 0.6775284423641516, 0.6690367475228747, 0.6618725060434469,
    0.6559289151313499, 0.6506586861168442, 0.646369316563179, 0.6424463251592202,
    0.6392711363786526, 0.6364438485902811, 0.6339698331608954, 0.6315967033623913,
    0.6295402436128729, 0.6277056113576054, 0.6258861495954983, 0.6244743900613362,
    0.6230971551702639, 0.62169047979567, 0.6204631547235436, 0.6193978288146667,
    0.6182792164515761, 0.6171552758015516, 0.6161465937764711, 0.6152513820282831,
    0.6144809927311788, 0.6135997878289738, 0.6126996867811397, 0.6118564106163354,
    0.6112918919809814, 0.6104648998787916, 0.6097356335510309, 0.6090582758955557,
    0.6085019414751464, 0.6077826469492102, 0.607138771094942, 0.6066668764201791,
    0.6061470598442793, 0.605538748209312, 0.6050183299027314, 0.6044908564687389,
    0.60413318992766, 0.6035735612549079, 0.6032753308573172, 0.6030734250400633,
    0.6029054640495937, 0.602635905327684, 0.6023594082936445, 0.6022516398670499,
    0.6019733499547603, 0.6017306041152789, 0.6015415748901642, 0.6013156224844016,
    0.6011463924981538, 0.6010031350854742, 0.6009114068519443, 0.6007802489243379,
    0.600656607185407, 0.6006341064749736, 0.6005215973613437, 0.6004067136870599,
    0.6002956700128581, 0.600224838382061, 0.6001235851544184, 0.6000645204872108,
    0.6000029330887092, 0.5999573764668434, 0.5999164908725373, 0.5998703609779861,
    0.5998168483453235, 0.5997992949063451, 0.5997742460144782, 0.599759944054911,
    0.5997198893407591, 0.5996954524185582, 0.5996735734502515, 0.5996641645593574,
    0.5996489726920083, 0.5996351324075528, 0.5996226743615619, 0.5996095734325178,
    0.5996015670984585, 0.599592800660924, 0.5995885842737774, 0.5995822438741933,
    0.5995784508212096, 0.599573129605805
  ],
  "train_acc": [
    0.5770339855818744, 0.6386199794026777, 0.6523686920700309, 0.6524201853759012,
    0.6525746652935118, 0.6551493305870237, 0.6554067971163748, 0.6582904222451081,
    0.6573120494335737, 0.6575695159629248, 0.6573120494335737, 0.658908341915551,
    0.659989701338826, 0.6606076210092688, 0.6621524201853759, 0.6619979402677652,
    0.662821833161689, 0.6633882595262616, 0.666735324407827, 0.6659629248197735,
    0.6672502574665293, 0.6675077239958805, 0.6686405767250257, 0.669773429454171,
    0.6692070030895984, 0.6705458290422245, 0.6714727085478888, 0.6720906282183317,
    0.6719361483007209, 0.6730690010298661, 0.6737384140061792, 0.6751802265705459,
    0.6751802265705459, 0.6752832131822863, 0.6754891864057673, 0.6759011328527291,
    0.6768795056642637, 0.67826982492276, 0.6780123583934089, 0.6783213182286303,
    0.6776519052523172, 0.6781668383110195, 0.678475798146241, 0.6790937178166838,
    0.6791452111225541, 0.6792996910401647, 0.6790937178166838, 0.6801235839340886,
    0.6805355303810504, 0.679557157569516, 0.6811534500514933, 0.6791967044284243,
    0.6800205973223481, 0.6799176107106076, 0.680638516992791, 0.6796086508753862,
    0.6825952626158599, 0.6796086508753862, 0.6805355303810504, 0.6812049433573636,
    0.6798661174047373, 0.681513903192585, 0.6805870236869207, 0.680638516992791,
    0.6812049433573636, 0.6814624098867147, 0.6814109165808445, 0.6800720906282184,
    0.6816683831101956, 0.6807415036045315, 0.6814109165808445, 0.6812564366632338,
    0.6813594232749742, 0.681307929969104, 0.6813594232749742, 0.6818228630278064,
    0.681513903192585, 0.6815653964984552, 0.6816168898043254, 0.6815653964984552,
    0.6814624098867147, 0.6814624098867147, 0.6814624098867147, 0.6814624098867147,
    0.681513903192585, 0.6814624098867147
  ],
  "val_loss": [
    0.6814653777212052, 0.6711087451735281, 0.6624498628509496, 0.654977040883933,
    0.6487816975772676, 0.643314992478102, 0.6388996045122036, 0.6344917111184687,
    0.6310531487182102, 0.6277697638192719, 0.6250191223876677, 0.6225961511571089,
    0.6205937972375156, 0.6181881586841539, 0.616502619261011, 0.6145167099388663,
    0.6129057519321975, 0.6114355510306319, 0.6103526705375809, 0.6088014954986447,
    0.6076663166922832, 0.6065283853689568, 0.6055230404046261, 0.6044670277608089,
    0.6035148835260746, 0.6026557307856086, 0.6018306628679718, 0.6008309255518277,
    0.6000772093037601, 0.5992732013666257, 0.5985813128103731, 0.597856211112398,
    0.5972650096286856, 0.5970398709055227, 0.596298783084706, 0.5954595113900387,
    0.594806127630702, 0.5943971040614158, 0.5937728629473603, 0.5935884918373145,
    0.5927945107368897, 0.5924836442340933, 0.5923015638277598, 0.5921205562068367,
    0.5918064597607444, 0.5915167164763273, 0.5912831992254619, 0.5910642328725811,
    0.5908447867366784, 0.5906345579141055, 0.5906986622676818, 0.5902115965792925,
    0.5901051024235846, 0.5900135116875662, 0.589906981003147, 0.5898232718276035,
    0.5898531813794345, 0.5896186457434439, 0.5895111150364507, 0.5894343130278233,
    0.5893136986594224, 0.5892664673300905, 0.5891904199535803, 0.5891255477114996,
    0.5890939631414963, 0.5890515091195334, 0.5890037779753055, 0.5889491310229608,
    0.5889198634730533, 0.5889005954615171, 0.5888786426878839, 0.5888424443059541,
    0.5888212450842881, 0.5887979702069692, 0.5887906491461853, 0.5887780335825397,
    0.588766589097961, 0.5887586958522263, 0.5887460385555098, 0.5887359151926622,
    0.5887309737024826, 0.5887248832272148, 0.5887190115314339, 0.5887122939799527,
    0.5887037247566651, 0.5886992343764329
  ],
  "val_acc": [
    0.6441515650741351, 0.6696869851729819, 0.6577429983525536, 0.6690691927512356,
    0.6649505766062603, 0.6742174629324547, 0.6762767710049423, 0.675658978583196,
    0.6709225700164745, 0.6760708401976936, 0.6754530477759473, 0.6801894563426688,
    0.6742174629324547, 0.6816309719934102, 0.6771004942339374, 0.6828665568369028,
    0.6845140032948929, 0.6838962108731467, 0.6878088962108732, 0.6847199341021417,
    0.6838962108731467, 0.6861614497528831, 0.6884266886326195, 0.6865733113673805,
    0.6869851729818781, 0.6908978583196046, 0.6861614497528831, 0.6890444810543658,
    0.6890444810543658, 0.6904859967051071, 0.693163097199341, 0.6919275123558485,
    0.689662273476112, 0.685337726523888, 0.6970757825370676, 0.6906919275123559,
    0.6919275123558485, 0.689662273476112, 0.6925453047775947, 0.6884266886326195,
    0.6941927512355849, 0.6954283360790774, 0.6937808896210873, 0.692339373970346,
    0.6960461285008237, 0.6974876441515651, 0.6960461285008237, 0.69666392092257,
    0.6970757825370676, 0.6972817133443163, 0.6925453047775947, 0.6970757825370676,
    0.6968698517298187, 0.6970757825370676, 0.6968698517298187, 0.6983113673805601,
    0.6956342668863262, 0.6970757825370676, 0.6968698517298187, 0.69666392092257,
    0.6970757825370676, 0.6956342668863262, 0.6964579901153213, 0.6970757825370676,
    0.6968698517298187, 0.6968698517298187, 0.6970757825370676, 0.69666392092257,
    0.69666392092257, 0.6968698517298187, 0.6970757825370676, 0.6974876441515651,
    0.6970757825370676, 0.6972817133443163, 0.6968698517298187, 0.6968698517298187,
    0.6968698517298187, 0.69666392092257, 0.69666392092257, 0.6964579901153213,
    0.6964579901153213, 0.6964579901153213, 0.6964579901153213, 0.6964579901153213,
    0.6968698517298187, 0.6964579901153213
  ],
  "lr": [
    0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001,
    0.0001, 5e-05, 5e-05, 5e-05, 5e-05, 5e-05, 5e-05, 5e-05,
    5e-05, 5e-05, 5e-05, 5e-05, 2.5e-05, 2.5e-05, 2.5e-05, 2.5e-05,
    2.5e-05, 2.5e-05, 2.5e-05, 2.5e-05, 2.5e-05, 2.5e-05, 1.25e-05, 1.25e-05,
    1.25e-05, 1.25e-05, 1.25e-05, 1.25e-05, 6.25e-06, 6.25e-06, 6.25e-06, 6.25e-06,
    6.25e-06, 6.25e-06, 3.125e-06, 3.125e-06, 3.125e-06, 3.125e-06, 3.125e-06, 3.125e-06,
    1.5625e-06, 1.5625e-06, 1.5625e-06, 1.5625e-06, 1.5625e-06, 1.5625e-06
  ]
}
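The arrays in the training history are index-aligned per epoch, so the best validation accuracy reported in the card can be recovered by scanning `val_acc`. A small helper sketch (`best_epoch` is an illustrative name, not part of the repo):

```python
import json


def best_epoch(history: dict) -> tuple:
    """Return (epoch_index, accuracy) for the highest validation accuracy."""
    accs = history["val_acc"]
    idx = max(range(len(accs)), key=accs.__getitem__)
    return idx, accs[idx]


# Demonstrated on a toy history; with the real file one would do:
#   with open("training_history.json") as f:
#       history = json.load(f)
print(best_epoch({"val_acc": [0.64, 0.70, 0.66]}))  # (1, 0.7)
```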