Firoj committed on
Commit b2de447 · verified · 1 Parent(s): 6931e46

Update README from latest local changes

  1. README.md +1 -19
README.md CHANGED
@@ -164,28 +164,10 @@ local_dir = snapshot_download("QCRI/OmniScore-deberta-v3")
  print(local_dir)
  ```
 
- ## Evaluation Summary
-
- Metrics below are from `metrics_final.json` on a held-out test set (`num_samples = 17175`).
-
- | Dimension | RMSE | MAE | Pearson r | Spearman rho | Acc@0.5 | Acc@1.0 |
- |---|---:|---:|---:|---:|---:|---:|
- | overall | 1.1581 | 0.8992 | 0.1098 | 0.0636 | 0.3543 | 0.7164 |
- | informativeness | 1.5689 | 1.2259 | 0.0120 | 0.0140 | 0.3001 | 0.4880 |
- | clarity | 1.1084 | 0.8495 | 0.1285 | 0.0789 | 0.3566 | 0.7711 |
- | plausibility | 1.0431 | 0.7889 | 0.1183 | 0.0289 | 0.3750 | 0.8298 |
- | faithfulness | 0.9120 | 0.7326 | 0.1803 | 0.1327 | 0.3856 | 0.7766 |
-
- Additional values:
-
- - Label range: `[1.0, 5.0]`
- - Prediction range: `[1.2022, 4.9631]`
- - Exact match (all four scores): `0.0293`
-
 
  ## Data and Task Coverage
 
- This checkpoint is for multi-task text quality scoring and is evaluated on a mixed test set covering:
+ This checkpoint is for multi-task text quality scoring and is evaluated on the test set covering:
 
  - Chat evaluation
  - Headline evaluation