natong19 committed · Commit 98e934a · verified · 1 Parent(s): 8cf4078

Update README.md

Files changed (1):
  1. README.md +10 -5
README.md CHANGED
@@ -24,7 +24,7 @@ A robust, performant classifier that excels at **detecting refusals, moralizatio
 
  ### Training Details
 
- Trained for 1 epoch on 112,102 carefully deduplicated, labeled and filtered samples (56,051 non-refusals and 56,051 refusals).
+ Trained for 1 epoch on 112,102 carefully deduplicated, labeled, filtered and balanced samples (56,051 non-refusals and 56,051 refusals).
 
  Most of the samples were sourced from:
  - [natong19/lmsys-chat-1m-filtered](https://huggingface.co/datasets/natong19/lmsys-chat-1m-filtered)
@@ -38,13 +38,18 @@ Majority vote from multiple refusal classifiers and LLM-as-a-judge were employed
  <div align="left">
  <img src="figures/plot.png" width="100%" alt="Plot"/>
  </div>
- Inference throughput vs F1 score on the test set (2,900 non-refusals and 2,900 refusals) for several open-source refusal classifiers. Throughput benchmarked with sequence length 512, batch size 16 on 1x NVIDIA RTX Pro 6000.
+ Inference throughput vs F1 score on the test set (2,900 non-refusals and 2,900 refusals) for several open-source refusal classifiers.
+ Throughput benchmarked with sequence length 512, batch size 16 on 1x NVIDIA RTX Pro 6000.
 
- `alpha_model` is a earlier checkpoint that I wasn't completely satisfied with, but it was leveraged for the final round of data curation.
+ `alpha_model` is an earlier checkpoint that I wasn't completely satisfied with, but it was leveraged for the final round of data curation.
 
  The training and test sets have similar distributions, but several factors suggest against overfitting:
- the dataset is relatively large and exactly balanced, training was limited to a single epoch, and [Minos-v1](https://huggingface.co/NousResearch/Minos-v1) — one of the strongest refusal classifiers available — achieves similarly strong, balanced performance on the same test set.
- A more detailed breakdown is as follows:
+ - the dataset is relatively large and exactly balanced
+ - training was run for only a single epoch
+ - train/val loss is similar
+ - [Minos-v1](https://huggingface.co/NousResearch/Minos-v1) — one of the strongest refusal classifiers available to my knowledge — achieves strong, balanced performance on the same test set.
+
+ A more detailed breakdown of the evaluation results of the different classifiers is as follows:
 
  | Model | TP | FN | FP | TN | Accuracy | Precision | Recall | F1 |
  | ----------------------------------------- | ---- | ---- | --- | ---- | -------- | --------- | ------ | ------ |
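For readers skimming the table that this diff touches: the Accuracy, Precision, Recall and F1 columns follow directly from the TP/FN/FP/TN counts. A minimal sketch of those standard definitions, assuming refusals are the positive class; the example counts are made up for illustration, not taken from the README:

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Refusals are treated as the positive class, matching the table's
    TP/FN/FP/TN columns.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts on a 2,900 + 2,900 test set (illustrative only):
print(classification_metrics(tp=2850, fn=50, fp=60, tn=2840))
```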
 
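The caption added in the second hunk states the throughput setup (sequence length 512, batch size 16, one GPU). A minimal sketch of how such a batched-inference benchmark is typically run with a Hugging Face sequence-classification checkpoint; the model ID, dummy inputs, and warmup/iteration counts are illustrative assumptions, not the author's actual harness:

```python
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "some-org/refusal-classifier"  # placeholder, not the actual model ID
SEQ_LEN, BATCH_SIZE, N_BATCHES = 512, 16, 100

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).cuda().eval()

# Dummy batch padded to a fixed sequence length of 512, as in the caption.
texts = ["I'm sorry, but I can't help with that request."] * BATCH_SIZE
batch = tokenizer(texts, padding="max_length", truncation=True,
                  max_length=SEQ_LEN, return_tensors="pt").to("cuda")

with torch.inference_mode():
    for _ in range(10):  # warmup so CUDA init doesn't skew the timing
        model(**batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(N_BATCHES):
        model(**batch)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"{N_BATCHES * BATCH_SIZE / elapsed:.1f} samples/s")
```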
 