Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -82,23 +82,24 @@ python scripts/finetune.py --model2vec-model-name scripts/models/dk-llm2vec-mode
|
|
| 82 |
## Evaluation
|
| 83 |
The model was evaluated on a held-out 10% split of the DDSC/nordic-embedding-training-data dataset, which consists of triplets: a query, a positive (relevant) document, and a negative (irrelevant) document. The model achieved the following performance:
|
| 84 |
|
| 85 |
-
|
| 86 |
|
| 87 |
-
model2vecdk : 0.867
|
| 88 |
-
BM25: 0.882
|
| 89 |
-
multilingual-e5-large-instruct: 0.963
|
| 90 |
|
| 91 |
The model was also evaluated using the [Scandinavian Embedding Benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/) and achieved the following performance:
|
| 92 |
|
| 93 |
-
| Rank | Model
|
| 94 |
-
|
| 95 |
-
| 1
|
| 96 |
-
| 2
|
| 97 |
-
| 3
|
| 98 |
-
|
|
| 99 |
-
|
|
| 100 |
-
|
|
| 101 |
-
|
| 102 |
|
| 103 |
|
| 104 |
## Additional Resources
|
|
|
|
| 82 |
## Evaluation
|
| 83 |
The model was evaluated on a held-out 10% split of the DDSC/nordic-embedding-training-data dataset, which consists of triplets: a query, a positive (relevant) document, and a negative (irrelevant) document. The model achieved the following performance:
|
| 84 |
|
| 85 |
+
| Model | Accuracy |
|
| 86 |
+
| ------------------------------ | --------- |
|
| 87 |
+
| **model2vecdk** | 0.867 |
|
| 88 |
+
| BM25 | 0.882 |
|
| 89 |
+
| multilingual-e5-large-instruct | 0.963 |
|
| 90 |
|
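The README does not spell out how the accuracy figures above are computed. A common convention for triplet data, assumed here rather than confirmed by the source, is to count a triplet as correct when the query embedding is closer (by cosine similarity) to the positive document than to the negative one. The sketch below illustrates that convention on plain Python lists; `cosine_similarity` and `triplet_accuracy` are hypothetical helper names, not part of this repository.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (query, positive, negative) embeddings


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triplet_accuracy(triplets: Iterable[Triplet]) -> float:
    """Fraction of triplets where the query is more similar to the positive
    document than to the negative one (assumed scoring rule)."""
    triplets = list(triplets)
    if not triplets:
        raise ValueError("at least one triplet is required")
    correct = sum(
        1
        for query, positive, negative in triplets
        if cosine_similarity(query, positive) > cosine_similarity(query, negative)
    )
    return correct / len(triplets)


# Toy embeddings: the first triplet is ranked correctly, the second is not.
example_triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),
]
example_accuracy = triplet_accuracy(example_triplets)
```

In practice the three vectors in each triplet would come from encoding the query and both documents with the model under evaluation. A lexical baseline such as BM25 would replace the cosine score with its own relevance score.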
| 91 |
|
| 92 |
The model was also evaluated using the [Scandinavian Embedding Benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/) and achieved the following performance:
|
| 93 |
|
| 94 |
+
| Rank | Model | Average Score | Average Rank | Angry Tweets | Bornholm Parallel | DKHate | Da Political Comments | DanFEVER | LCC | Language Identification | Massive Intent | Massive Scenario | ScaLA | TV2Nord Retrieval | Twitterhjerne |
|
| 95 |
+
|------|-------|---------------|--------------|--------------|-------------------|--------|----------------------|----------|-----|------------------------|----------------|------------------|-------|-------------------|---------------|
|
| 96 |
+
| 1 | TTC-L2V-supervised-2 | 0.68 | 4.75 | 67.09 | 54.59 | 69.00 | 45.84 | 38.31 | 73.67 | 88.61 | 74.80 | 78.35 | 53.04 | 92.79 | 85.02 |
|
| 97 |
+
| 2 | multilingual-e5-large-instruct | 0.66 | 7.75 | 64.57 | 55.02 | 67.14 | 45.33 | 39.52 | 70.60 | 82.48 | 71.89 | 77.51 | 50.18 | 93.69 | 77.23 |
|
| 98 |
+
| 3 | text-embedding-3-large | 0.64 | 8.92 | 57.80 | 43.34 | 70.21 | 43.41 | 39.61 | 58.07 | 79.74 | 69.27 | 75.92 | 50.69 | 95.20 | 81.08 |
|
| 99 |
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|
| 100 |
+
| 42 | dfm-encoder-small-v1 (SimCSE) | 0.42 | 34.12 | 51.92 | 40.82 | 60.00 | 35.25 | 16.99 | 58.53 | 50.50 | 47.92 | 52.95 | 51.36 | 22.28 | 20.02 |
|
| 101 |
+
| 43 | **NEW: dk-model2vec-scripts_models_dk-ll** | **0.42** | **37.12** | **47.83** | **8.19** | **59.45** | **32.28** | **26.12** | **46.73** | **63.32** | **51.73** | **61.34** | **50.22** | **57.02** | **21.05** |
|
| 102 |
+
| 44 | dk-model2vec-model2vecdk-stem | 0.42 | 38.25 | 46.18 | 9.17 | 60.76 | 29.86 | 27.69 | 43.93 | 61.55 | 48.78 | 55.90 | 50.12 | 57.34 | 25.56 |
|
| 103 |
|
| 104 |
|
| 105 |
## Additional Resources
|