Update README from latest local changes
README.md
@@ -164,28 +164,10 @@ local_dir = snapshot_download("QCRI/OmniScore-deberta-v3")
 print(local_dir)
 ```
 
-## Evaluation Summary
-
-Metrics below are from `metrics_final.json` on a held-out test set (`num_samples = 17175`).
-
-| Dimension | RMSE | MAE | Pearson r | Spearman rho | Acc@0.5 | Acc@1.0 |
-|---|---:|---:|---:|---:|---:|---:|
-| overall | 1.1581 | 0.8992 | 0.1098 | 0.0636 | 0.3543 | 0.7164 |
-| informativeness | 1.5689 | 1.2259 | 0.0120 | 0.0140 | 0.3001 | 0.4880 |
-| clarity | 1.1084 | 0.8495 | 0.1285 | 0.0789 | 0.3566 | 0.7711 |
-| plausibility | 1.0431 | 0.7889 | 0.1183 | 0.0289 | 0.3750 | 0.8298 |
-| faithfulness | 0.9120 | 0.7326 | 0.1803 | 0.1327 | 0.3856 | 0.7766 |
-
-Additional values:
-
-- Label range: `[1.0, 5.0]`
-- Prediction range: `[1.2022, 4.9631]`
-- Exact match (all four scores): `0.0293`
-
 
 ## Data and Task Coverage
 
-This checkpoint is for multi-task text quality scoring and is evaluated on
+This checkpoint is for multi-task text quality scoring and is evaluated on the test set covering:
 
 - Chat evaluation
 - Headline evaluation