StentorLabs committed on
Commit b343d79 · verified · 1 Parent(s): 40de707

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -347,7 +347,7 @@ The low BeaverTails eval loss confirms the model learned refusal phrasing effect
  | **Benign Helpful Rate** | **82.4%** | **76.5%** |
  | Avg Response Tokens | 10.8 | 19.9 |
 
- > The model reliably avoids over-refusing safe queries (~82% helpful on benign prompts) but its harmful-refusal rate (~17%) reflects the limits of what a 30M-parameter SFT model can generalize. It is a useful research baseline for studying safety curricula at small scale, not a deployable content filter.
+ > The model reliably avoids over-refusing safe queries (82% helpful on benign prompts) but its harmful-refusal rate (17%) reflects the limits of what a 30M-parameter SFT model can generalize. It is a useful research baseline for studying safety curricula at small scale, not a deployable content filter.
 
  ---