Update README.md
README.md
CHANGED
@@ -44,8 +44,7 @@ All comparison models were trained exclusively on open data, either in the publi

 The following tables show the performance on each dataset.
 For each, we report the respective main metric from EuroEval and the confidence interval.
-The latter is calculated as the mean of the metric scores across all evaluation runs ± 1.96 times the standard error of the mean
-$$\hat{\mu} \pm 1.96 \times SEM \quad \textrm{where} \quad SEM = \frac{s}{\sqrt{n}} \quad \textrm{and} \quad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n-1}} \quad \textrm{and} \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
+The latter is calculated as the mean of the metric scores across all evaluation runs ± 1.96 times the standard error of the mean.

 | Model | scala-da (MCC)| dala (MCC) | angry-tweets (MCC) | dansk (Micro F1, No Misc) | danske-talemaader (MCC) | danish-citizen-tests (MCC) | multi-wiki-qa-da (F1) | hellaswag-da (MCC) | nordjylland-news (BERTScore) | average |
 | ----------------------------------- | ------------- | ------------- | ------------------ | ------------------------- | ----------------------- | -------------------------- | --------------------- | ------------------ | ---------------------------- | ------- |
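
For reference, the confidence interval described in the changed line (mean ± 1.96 times the standard error of the mean) can be sketched as follows. This is a minimal illustration, not EuroEval's own code: it assumes the sample standard deviation with an `n - 1` denominator, as in the formula on the removed line, and the function name `confidence_interval` and the example scores are hypothetical.

```python
import math
from statistics import mean, stdev


def confidence_interval(scores, z=1.96):
    """Return (mean, half_width) so that the interval is mean ± half_width.

    Assumes `scores` holds one metric value per evaluation run.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate the standard error")
    mu = mean(scores)          # sample mean over all runs
    s = stdev(scores)          # sample standard deviation (n - 1 denominator)
    sem = s / math.sqrt(n)     # standard error of the mean
    return mu, z * sem


# Hypothetical metric scores from five evaluation runs (illustrative only).
example_scores = [0.50, 0.52, 0.48, 0.51, 0.49]
mu, half_width = confidence_interval(example_scores)
print(f"{mu:.3f} ± {half_width:.3f}")
```

The factor 1.96 corresponds to a 95% interval under a normal approximation; with only a handful of runs, a Student's t quantile would give a wider interval.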