# gte_compressed_distilled (Distilled)

A compact multilingual sentence encoder, compressed 26x from alibaba-NLP/gte-multilingual-base with two-stage knowledge distillation.

## Model Details

| Property | Value |
|---|---|
| Base model | alibaba-NLP/gte-multilingual-base |
| Architecture | `new` (encoder) |
| Hidden dim | 384 (from 768) |
| Layers | 4 (from 12) |
| Intermediate size (FFN) | 1536 |
| Attention heads | 6 |
| Vocab size | 8,675 (from 250,048) |
| Parameters | ~10.6M |
| Model size (FP32) | 48.8 MB |
| Compression | 26x |
| Distilled | Yes (two-stage) |

## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gte_compressed_distilled", trust_remote_code=True)

# The same greeting in English, Korean, Japanese, and Chinese.
sentences = [
    "Hello, how are you?",
    "안녕하세요, 잘 지내세요?",
    "こんにちは、元気ですか？",
    "你好，你好吗？",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 384)
```
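Since the four sentences are parallel translations of the same greeting, their embeddings should land close together in cosine space. A quick check using the `util.cos_sim` helper that ships with sentence-transformers:

```python
from sentence_transformers import util

# Pairwise cosine similarities between the four embeddings above;
# parallel translations should score well above unrelated text.
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)  # 4x4 matrix with 1.0 on the diagonal
```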

## MTEB Evaluation Results

Overall average across all 25 tasks: **40.52%**

| Task Group | Average |
|---|---|
| Classification | 50.45% |
| Clustering | 29.37% |
| STS | 41.60% |
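The per-task numbers below come from MTEB runs. As a rough sketch, an evaluation like this can be reproduced with the `mteb` package (the two task names are examples from the tables that follow; the exact API may differ between `mteb` versions):

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gte_compressed_distilled", trust_remote_code=True)

# Two of the 25 tasks reported below; the full run covers all of them.
tasks = mteb.get_tasks(tasks=["Banking77Classification", "STS12"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/gte_compressed_distilled")
```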

### Classification

*For multilingual tasks (here and under STS), the Average is taken over all evaluated language subsets, while Details lists only the top-scoring subsets; this is why an Average can fall below every listed subset (e.g. MassiveIntentClassification).*

| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 60.41% | en: 64.01%, en-ext: 63.01%, ja: 59.07%, de: 55.55% |
| Banking77Classification | 68.21% | default: 68.21% |
| ImdbClassification | 51.08% | default: 51.08% |
| MTOPDomainClassification | 64.75% | en: 78.51%, fr: 70.49%, es: 69.99%, de: 57.15%, hi: 56.28% |
| MassiveIntentClassification | 27.12% | en: 60.72%, zh-CN: 56.64%, fr: 52.04%, de: 51.42%, ja: 51.1% |
| MassiveScenarioClassification | 33.39% | en: 69.54%, zh-CN: 65.7%, de: 63.05%, fr: 63.01%, ko: 59.76% |
| ToxicConversationsClassification | 51.43% | default: 51.43% |
| TweetSentimentExtractionClassification | 47.21% | default: 47.21% |

### Clustering

| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 48.06% | default: 48.06% |
| ArXivHierarchicalClusteringS2S | 46.58% | default: 46.58% |
| BiorxivClusteringP2P.v2 | 9.34% | default: 9.34% |
| MedrxivClusteringP2P.v2 | 19.95% | default: 19.95% |
| MedrxivClusteringS2S.v2 | 19.53% | default: 19.53% |
| StackExchangeClustering.v2 | 43.1% | default: 43.1% |
| StackExchangeClusteringP2P.v2 | 33.85% | default: 33.85% |
| TwentyNewsgroupsClustering.v2 | 14.53% | default: 14.53% |

### STS

| Task | Average | Details |
|---|---|---|
| BIOSSES | 6.69% | default: 6.69% |
| SICK-R | 55.43% | default: 55.43% |
| STS12 | 54.94% | default: 54.94% |
| STS13 | 47.49% | default: 47.49% |
| STS14 | 47.52% | default: 47.52% |
| STS15 | 61.98% | default: 61.98% |
| STS17 | 23.95% | en-en: 64.6%, es-es: 57.67%, ar-ar: 49.18%, ko-ko: 43.25%, nl-en: 17.13% |
| STS22.v2 | 26.2% | fr-pl: 73.25%, zh: 55.22%, es: 46.6%, pl-en: 35.24%, ar: 34.06% |
| STSBenchmark | 50.2% | default: 50.2% |

## Distillation Impact

Per-task scores before and after the two-stage distillation (Delta = After − Before, in percentage points); 22 of 25 tasks improved.

| Task | Before | After | Delta |
|---|---|---|---|
| AmazonCounterfactualClassification | 56.65% | 60.41% | +3.76%p |
| ArXivHierarchicalClusteringP2P | 46.6% | 48.06% | +1.46%p |
| ArXivHierarchicalClusteringS2S | 46.16% | 46.58% | +0.42%p |
| BIOSSES | 14.57% | 6.69% | -7.88%p |
| Banking77Classification | 23.46% | 68.21% | +44.75%p |
| BiorxivClusteringP2P.v2 | 9.23% | 9.34% | +0.11%p |
| ImdbClassification | 53.23% | 51.08% | -2.15%p |
| MTOPDomainClassification | 29.15% | 64.75% | +35.6%p |
| MassiveIntentClassification | 12.03% | 27.12% | +15.09%p |
| MassiveScenarioClassification | 15.96% | 33.39% | +17.43%p |
| MedrxivClusteringP2P.v2 | 19.99% | 19.95% | -0.04%p |
| MedrxivClusteringS2S.v2 | 18.6% | 19.53% | +0.93%p |
| SICK-R | 39.12% | 55.43% | +16.31%p |
| STS12 | 33.18% | 54.94% | +21.76%p |
| STS13 | 33.48% | 47.49% | +14.01%p |
| STS14 | 30.91% | 47.52% | +16.61%p |
| STS15 | 36.95% | 61.98% | +25.03%p |
| STS17 | 13.25% | 23.95% | +10.7%p |
| STS22.v2 | 10.85% | 26.2% | +15.35%p |
| STSBenchmark | 35.75% | 50.2% | +14.45%p |
| StackExchangeClustering.v2 | 38.67% | 43.1% | +4.43%p |
| StackExchangeClusteringP2P.v2 | 31.97% | 33.85% | +1.88%p |
| ToxicConversationsClassification | 49.59% | 51.43% | +1.84%p |
| TweetSentimentExtractionClassification | 37.27% | 47.21% | +9.94%p |
| TwentyNewsgroupsClustering.v2 | 7.32% | 14.53% | +7.21%p |

## Training

### Stage 1: Model Compression

- Teacher: alibaba-NLP/gte-multilingual-base (12L, 768d, 277M params)
- Compression pipeline: layer pruning → hidden-dim reduction → vocab pruning (see the sketch after this list)
- Result: 4L / 384d / 8,675-token vocab
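A minimal sketch of these steps, assuming a BERT-style layout where the transformer blocks live in `model.encoder.layer` and the token embeddings in `model.embeddings.word_embeddings` (attribute names vary by architecture; the layer selection and the stand-in corpus below are hypothetical, not the actual script used for this model):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Alibaba-NLP/gte-multilingual-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# 1) Layer pruning: keep 4 of the 12 transformer blocks.
keep = [0, 4, 8, 11]  # hypothetical, evenly spaced selection
model.encoder.layer = torch.nn.ModuleList(model.encoder.layer[i] for i in keep)
model.config.num_hidden_layers = len(keep)

# 2) Hidden-dim reduction (768 -> 384) is omitted here: it shrinks every weight
#    matrix and relies on the later distillation to recover quality.

# 3) Vocab pruning: keep only token rows seen in the target corpus
#    (a tiny stand-in corpus here; the real run scans the 18-language data).
corpus = ["Hello, how are you?", "안녕하세요, 잘 지내세요?"]
seen = {tid for text in corpus for tid in tokenizer(text)["input_ids"]}
kept = sorted(seen | set(tokenizer.all_special_ids))
emb = model.embeddings.word_embeddings.weight.data
model.embeddings.word_embeddings = torch.nn.Embedding.from_pretrained(emb[kept])
# The tokenizer must be remapped to the new, smaller id space as well (elided).
```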

### Stage 2: Two-Stage Knowledge Distillation

A 26x compression ratio is too large for a single teacher-to-student step, so distillation proceeds progressively through an intermediate model:

1. Teacher (277M) → intermediate model (~55M)
   - MSE + cosine-similarity loss (see the sketch after this list)
   - MTEB task datasets (Classification / Clustering / STS)
2. Intermediate model (~55M) → final student (10.6M)
   - Same training objective
   - AdamW (lr=2e-5, weight_decay=0.01) with cosine annealing
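A minimal sketch of the combined objective used in both distillation stages, assuming it is applied to pooled sentence embeddings; the 768→384 projection and the `alpha` weighting are hypothetical details, not taken from this card:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor,
                 proj: torch.nn.Linear, alpha: float = 0.5) -> torch.Tensor:
    """MSE + cosine-similarity loss between student and projected teacher embeddings."""
    target = proj(teacher_emb)  # map the teacher's 768-d space onto the student's 384-d
    mse = F.mse_loss(student_emb, target)
    cos = 1.0 - F.cosine_similarity(student_emb, target, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos

# Hypothetical batch of 32 sentence embeddings.
proj = torch.nn.Linear(768, 384, bias=False)
student_emb = torch.randn(32, 384)
teacher_emb = torch.randn(32, 768)
print(distill_loss(student_emb, teacher_emb, proj).item())
```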

## Supported Languages (18)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl
