gte_compressed_distilled (Distilled)
Compact multilingual sentence encoder compressed from alibaba-NLP/gte-multilingual-base (26x compression).
Model Details
| Property | Value |
|---|---|
| Base model | alibaba-NLP/gte-multilingual-base |
| Architecture | new (encoder) |
| Hidden dim | 384 (from 768) |
| Layers | 4 (from 12) |
| Intermediate size | 1536 |
| Attention heads | 6 |
| Vocab size | 8,675 (from 250,048) |
| Parameters | ~10.6M |
| Model size (FP32) | 48.8 MB |
| Compression | 26x |
| Distilled | Yes (2-stage) |
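For reference, these figures can be checked against the loaded checkpoint. The sketch below assumes the standard sentence-transformers layout (first module exposing `auto_model`) carries over to this custom "new" architecture:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gte_compressed_distilled", trust_remote_code=True)
config = model[0].auto_model.config  # underlying Hugging Face config (assumed attribute path)

print(config.hidden_size)          # expected: 384
print(config.num_hidden_layers)    # expected: 4
print(config.num_attention_heads)  # expected: 6
print(config.intermediate_size)    # expected: 1536
print(config.vocab_size)           # expected: 8675
print(sum(p.numel() for p in model.parameters()))  # ~10.6M
```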
Quick Start
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gte_compressed_distilled", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요, 잘 지내세요?",
    "こんにちは、お元気ですか？",
    "你好！你好吗？",
]

embeddings = model.encode(sentences)
print(embeddings.shape)
```
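The resulting embeddings can be compared with cosine similarity. A minimal follow-up sketch using the `util.cos_sim` helper from sentence-transformers:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("gte_compressed_distilled", trust_remote_code=True)
emb = model.encode(["Hello, how are you?", "안녕하세요, 잘 지내세요?"])

# Cosine similarity between the English and Korean greetings; cross-lingual
# paraphrases should map to nearby points in the embedding space.
print(util.cos_sim(emb[0], emb[1]))
```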
MTEB Evaluation Results
Overall Average: 40.52% (mean over the 25 individual tasks below)
| Task Group | Average |
|---|---|
| Classification | 50.45% |
| Clustering | 29.37% |
| STS | 41.6% |
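A minimal sketch of how these scores can be reproduced with the `mteb` package; the task subset and output folder are illustrative, and the exact API may differ across mteb versions:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gte_compressed_distilled", trust_remote_code=True)

# Illustrative subset of the tasks reported in this section
tasks = mteb.get_tasks(tasks=["Banking77Classification", "STSBenchmark"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/gte_compressed_distilled")
```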
Classification
| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 60.41% | en: 64.01%, en-ext: 63.01%, ja: 59.07%, de: 55.55% |
| Banking77Classification | 68.21% | default: 68.21% |
| ImdbClassification | 51.08% | default: 51.08% |
| MTOPDomainClassification | 64.75% | en: 78.51%, fr: 70.49%, es: 69.99%, de: 57.15%, hi: 56.28% |
| MassiveIntentClassification | 27.12% | en: 60.72%, zh-CN: 56.64%, fr: 52.04%, de: 51.42%, ja: 51.1% |
| MassiveScenarioClassification | 33.39% | en: 69.54%, zh-CN: 65.7%, de: 63.05%, fr: 63.01%, ko: 59.76% |
| ToxicConversationsClassification | 51.43% | default: 51.43% |
| TweetSentimentExtractionClassification | 47.21% | default: 47.21% |
Clustering
| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 48.06% | default: 48.06% |
| ArXivHierarchicalClusteringS2S | 46.58% | default: 46.58% |
| BiorxivClusteringP2P.v2 | 9.34% | default: 9.34% |
| MedrxivClusteringP2P.v2 | 19.95% | default: 19.95% |
| MedrxivClusteringS2S.v2 | 19.53% | default: 19.53% |
| StackExchangeClustering.v2 | 43.1% | default: 43.1% |
| StackExchangeClusteringP2P.v2 | 33.85% | default: 33.85% |
| TwentyNewsgroupsClustering.v2 | 14.53% | default: 14.53% |
STS
| Task | Average | Details |
|---|---|---|
| BIOSSES | 6.69% | default: 6.69% |
| SICK-R | 55.43% | default: 55.43% |
| STS12 | 54.94% | default: 54.94% |
| STS13 | 47.49% | default: 47.49% |
| STS14 | 47.52% | default: 47.52% |
| STS15 | 61.98% | default: 61.98% |
| STS17 | 23.95% | en-en: 64.6%, es-es: 57.67%, ar-ar: 49.18%, ko-ko: 43.25%, nl-en: 17.13% |
| STS22.v2 | 26.2% | fr-pl: 73.25%, zh: 55.22%, es: 46.6%, pl-en: 35.24%, ar: 34.06% |
| STSBenchmark | 50.2% | default: 50.2% |
Distillation Impact
| Task | Before | After | Delta |
|---|---|---|---|
| AmazonCounterfactualClassification | 56.65% | 60.41% | +3.76%p |
| ArXivHierarchicalClusteringP2P | 46.6% | 48.06% | +1.46%p |
| ArXivHierarchicalClusteringS2S | 46.16% | 46.58% | +0.42%p |
| BIOSSES | 14.57% | 6.69% | -7.88%p |
| Banking77Classification | 23.46% | 68.21% | +44.75%p |
| BiorxivClusteringP2P.v2 | 9.23% | 9.34% | +0.11%p |
| ImdbClassification | 53.23% | 51.08% | -2.15%p |
| MTOPDomainClassification | 29.15% | 64.75% | +35.6%p |
| MassiveIntentClassification | 12.03% | 27.12% | +15.09%p |
| MassiveScenarioClassification | 15.96% | 33.39% | +17.43%p |
| MedrxivClusteringP2P.v2 | 19.99% | 19.95% | -0.04%p |
| MedrxivClusteringS2S.v2 | 18.6% | 19.53% | +0.93%p |
| SICK-R | 39.12% | 55.43% | +16.31%p |
| STS12 | 33.18% | 54.94% | +21.76%p |
| STS13 | 33.48% | 47.49% | +14.01%p |
| STS14 | 30.91% | 47.52% | +16.61%p |
| STS15 | 36.95% | 61.98% | +25.03%p |
| STS17 | 13.25% | 23.95% | +10.7%p |
| STS22.v2 | 10.85% | 26.2% | +15.35%p |
| STSBenchmark | 35.75% | 50.2% | +14.45%p |
| StackExchangeClustering.v2 | 38.67% | 43.1% | +4.43%p |
| StackExchangeClusteringP2P.v2 | 31.97% | 33.85% | +1.88%p |
| ToxicConversationsClassification | 49.59% | 51.43% | +1.84%p |
| TweetSentimentExtractionClassification | 37.27% | 47.21% | +9.94%p |
| TwentyNewsgroupsClustering.v2 | 7.32% | 14.53% | +7.21%p |
Training
Stage 1: Model Compression
- Teacher: alibaba-NLP/gte-multilingual-base (12L, 768d, 277M params)
- Compression: Layer pruning → Hidden dim reduction → Vocab pruning (see the sketch after this list)
- Result: 4L / 384d / 8,675-token vocab
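A rough sketch of these three steps on a BERT-style encoder is shown below; the kept layer indices, the width-reduction helper, and the token keep-list are illustrative assumptions, not the exact recipe used for this model:

```python
import torch
from transformers import AutoModel

teacher = AutoModel.from_pretrained(
    "alibaba-NLP/gte-multilingual-base", trust_remote_code=True
)

# 1) Layer pruning: keep an evenly spaced subset of the 12 encoder layers
#    (assumes a BERT-style `encoder.layer` attribute path).
keep_layers = [0, 4, 8, 11]  # illustrative choice
teacher.encoder.layer = torch.nn.ModuleList(teacher.encoder.layer[i] for i in keep_layers)
teacher.config.num_hidden_layers = len(keep_layers)

# 2) Hidden-dim reduction (768 -> 384): naive slicing shown here; real recipes
#    typically use SVD or learned projections on each weight matrix, then fine-tune.
def shrink_weight(weight: torch.Tensor, out_dim: int, in_dim: int) -> torch.Tensor:
    return weight[:out_dim, :in_dim].clone()

# 3) Vocab pruning: keep only embedding rows for tokens seen in the target corpora
#    (assumes a BERT-style `embeddings.word_embeddings` attribute path).
keep_token_ids = torch.arange(8675)  # placeholder id list
pruned_embeddings = teacher.embeddings.word_embeddings.weight.data[keep_token_ids].clone()
```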
Stage 2: Two-Stage Knowledge Distillation
The 26x compression ratio calls for progressive (two-stage) distillation:
- Distillation stage 1: Teacher (277M) → Intermediate (~55M)
  - MSE + cosine-similarity loss
  - MTEB task datasets (Classification/Clustering/STS)
- Distillation stage 2: Intermediate → Final student (10.6M)
  - Same training objective (see the loss sketch after this list)
  - AdamW (lr=2e-5, weight_decay=0.01), cosine annealing
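A minimal sketch of the training objective and optimizer setup described above; the equal weighting of the two loss terms, the stand-in student module, and the schedule length are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """MSE + cosine-similarity loss between student and teacher sentence embeddings.
    Assumes the student output has already been projected to the teacher's dimension."""
    mse = F.mse_loss(student_emb, teacher_emb)
    cos = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return mse + cos  # equal weighting of the two terms is an assumption

# Optimizer and schedule as listed above (total step count is illustrative).
student = torch.nn.Linear(384, 768)  # stand-in for the student encoder + projection head
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
```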
Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl