Update README from latest local changes
README.md
@@ -164,28 +164,10 @@ local_dir = snapshot_download("QCRI/OmniScore-deberta-v3")
 print(local_dir)
 ```
 
-## Evaluation Summary
-
-Metrics below are from `metrics_final.json` on a held-out test set (`num_samples = 17175`).
-
-| Dimension | RMSE | MAE | Pearson r | Spearman rho | Acc@0.5 | Acc@1.0 |
-|---|---:|---:|---:|---:|---:|---:|
-| overall | 1.1581 | 0.8992 | 0.1098 | 0.0636 | 0.3543 | 0.7164 |
-| informativeness | 1.5689 | 1.2259 | 0.0120 | 0.0140 | 0.3001 | 0.4880 |
-| clarity | 1.1084 | 0.8495 | 0.1285 | 0.0789 | 0.3566 | 0.7711 |
-| plausibility | 1.0431 | 0.7889 | 0.1183 | 0.0289 | 0.3750 | 0.8298 |
-| faithfulness | 0.9120 | 0.7326 | 0.1803 | 0.1327 | 0.3856 | 0.7766 |
-
-Additional values:
-
-- Label range: `[1.0, 5.0]`
-- Prediction range: `[1.2022, 4.9631]`
-- Exact match (all four scores): `0.0293`
-
 
 ## Data and Task Coverage
 
-This checkpoint is for multi-task text quality scoring and is evaluated on
+This checkpoint is for multi-task text quality scoring and is evaluated on the test set covering:
 
 - Chat evaluation
 - Headline evaluation