gte_compressed_distilled (Distilled)
A compact multilingual sentence encoder compressed from Alibaba-NLP/gte-multilingual-base, with about 26x fewer parameters (277M down to ~10.6M).
Model Details
| Property | Value |
|---|---|
| Base model | Alibaba-NLP/gte-multilingual-base |
| Architecture | new (encoder) |
| Hidden dim | 384 (from 768) |
| Layers | 4 (from 12) |
| Intermediate | 1536 |
| Attention heads | 6 |
| Vocab size | 8,675 (from 250,048) |
| Parameters | ~10.6M |
| Model size (FP32) | 48.8MB |
| Compression | 26x |
| Distilled | Yes (2-stage) |
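The reduction factors implied by the table can be checked with a few lines of arithmetic. This is a minimal sketch that only uses figures quoted in this card (the 277M teacher parameter count comes from the Training section); the dictionary keys and the `reduction_factors` helper are illustrative names, not part of any library.

```python
# Figures quoted in this card (teacher -> student).
teacher = {"params_millions": 277.0, "layers": 12, "hidden_dim": 768, "vocab_size": 250_048}
student = {"params_millions": 10.6, "layers": 4, "hidden_dim": 384, "vocab_size": 8_675}


def reduction_factors(before, after):
    """Return how many times smaller each quantity is after compression."""
    return {key: before[key] / after[key] for key in before}


factors = reduction_factors(teacher, student)
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x smaller")
```

The parameter ratio comes out at roughly 26x, matching the Compression row. Most of the saving in the embedding table comes from the vocabulary, which shrinks far more than the hidden dimension does.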
Quick Start
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("gomyk/gte-student-gte_compressed_distilled", trust_remote_code=True)
sentences = [
"Hello, how are you?",
"안녕하세요, 잘 지내세요?",
"こんにちは、元気ですか？",
"你好，你好吗？",
]
embeddings = model.encode(sentences)
print(embeddings.shape) # (4, 384)
MTEB Evaluation Results
Overall Average: 40.52% (mean of the 25 task averages listed below)
| Task Group | Average |
|---|---|
| Classification | 50.45% |
| Clustering | 29.37% |
| STS | 41.6% |
In the per-task tables, the Details column lists at most five subsets per task, apparently the highest-scoring ones. The Average column evidently covers more subsets than that, which is why it can fall below every listed score (see MassiveIntentClassification).
Classification
| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 60.41% | en: 64.01%, en-ext: 63.01%, ja: 59.07%, de: 55.55% |
| Banking77Classification | 68.21% | default: 68.21% |
| ImdbClassification | 51.08% | default: 51.08% |
| MTOPDomainClassification | 64.75% | en: 78.51%, fr: 70.49%, es: 69.99%, de: 57.15%, hi: 56.28% |
| MassiveIntentClassification | 27.12% | en: 60.72%, zh-CN: 56.64%, fr: 52.04%, de: 51.42%, ja: 51.1% |
| MassiveScenarioClassification | 33.39% | en: 69.54%, zh-CN: 65.7%, de: 63.05%, fr: 63.01%, ko: 59.76% |
| ToxicConversationsClassification | 51.43% | default: 51.43% |
| TweetSentimentExtractionClassification | 47.21% | default: 47.21% |
Clustering
| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 48.06% | default: 48.06% |
| ArXivHierarchicalClusteringS2S | 46.58% | default: 46.58% |
| BiorxivClusteringP2P.v2 | 9.34% | default: 9.34% |
| MedrxivClusteringP2P.v2 | 19.95% | default: 19.95% |
| MedrxivClusteringS2S.v2 | 19.53% | default: 19.53% |
| StackExchangeClustering.v2 | 43.1% | default: 43.1% |
| StackExchangeClusteringP2P.v2 | 33.85% | default: 33.85% |
| TwentyNewsgroupsClustering.v2 | 14.53% | default: 14.53% |
STS
| Task | Average | Details |
|---|---|---|
| BIOSSES | 6.69% | default: 6.69% |
| SICK-R | 55.43% | default: 55.43% |
| STS12 | 54.94% | default: 54.94% |
| STS13 | 47.49% | default: 47.49% |
| STS14 | 47.52% | default: 47.52% |
| STS15 | 61.98% | default: 61.98% |
| STS17 | 23.95% | en-en: 64.6%, es-es: 57.67%, ar-ar: 49.18%, ko-ko: 43.25%, nl-en: 17.13% |
| STS22.v2 | 26.2% | fr-pl: 73.25%, zh: 55.22%, es: 46.6%, pl-en: 35.24%, ar: 34.06% |
| STSBenchmark | 50.2% | default: 50.2% |
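The group averages and the overall average can be reproduced from the per-task Average columns of the three tables above. The sketch below assumes each group average is a plain unweighted mean of its task averages, and that the overall figure is the unweighted mean over all 25 tasks; the variable names are illustrative.

```python
from statistics import mean

# Per-task averages (%) copied from the Classification, Clustering and STS tables.
scores = {
    "Classification": [60.41, 68.21, 51.08, 64.75, 27.12, 33.39, 51.43, 47.21],
    "Clustering": [48.06, 46.58, 9.34, 19.95, 19.53, 43.1, 33.85, 14.53],
    "STS": [6.69, 55.43, 54.94, 47.49, 47.52, 61.98, 23.95, 26.2, 50.2],
}

# Unweighted mean per task group, rounded to two decimals as in the card.
group_averages = {group: round(mean(vals), 2) for group, vals in scores.items()}

# Overall average taken over all tasks, not over the three group means.
all_tasks = [score for vals in scores.values() for score in vals]
overall_average = round(mean(all_tasks), 2)

print(group_averages)
print(overall_average)
```

Under these assumptions the computed values agree with the reported 50.45%, 29.37%, 41.6% and 40.52%.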
Distillation Impact
| Task | Before | After | Delta |
|---|---|---|---|
| AmazonCounterfactualClassification | 56.65% | 60.41% | +3.76%p |
| ArXivHierarchicalClusteringP2P | 46.6% | 48.06% | +1.46%p |
| ArXivHierarchicalClusteringS2S | 46.16% | 46.58% | +0.42%p |
| BIOSSES | 14.57% | 6.69% | -7.88%p |
| Banking77Classification | 23.46% | 68.21% | +44.75%p |
| BiorxivClusteringP2P.v2 | 9.23% | 9.34% | +0.11%p |
| ImdbClassification | 53.23% | 51.08% | -2.15%p |
| MTOPDomainClassification | 29.15% | 64.75% | +35.6%p |
| MassiveIntentClassification | 12.03% | 27.12% | +15.09%p |
| MassiveScenarioClassification | 15.96% | 33.39% | +17.43%p |
| MedrxivClusteringP2P.v2 | 19.99% | 19.95% | -0.04%p |
| MedrxivClusteringS2S.v2 | 18.6% | 19.53% | +0.93%p |
| SICK-R | 39.12% | 55.43% | +16.31%p |
| STS12 | 33.18% | 54.94% | +21.76%p |
| STS13 | 33.48% | 47.49% | +14.01%p |
| STS14 | 30.91% | 47.52% | +16.61%p |
| STS15 | 36.95% | 61.98% | +25.03%p |
| STS17 | 13.25% | 23.95% | +10.7%p |
| STS22.v2 | 10.85% | 26.2% | +15.35%p |
| STSBenchmark | 35.75% | 50.2% | +14.45%p |
| StackExchangeClustering.v2 | 38.67% | 43.1% | +4.43%p |
| StackExchangeClusteringP2P.v2 | 31.97% | 33.85% | +1.88%p |
| ToxicConversationsClassification | 49.59% | 51.43% | +1.84%p |
| TweetSentimentExtractionClassification | 37.27% | 47.21% | +9.94%p |
| TwentyNewsgroupsClustering.v2 | 7.32% | 14.53% | +7.21%p |
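The Delta column is expressed in percentage points (%p), that is, the absolute difference between the After and Before scores rather than a relative change. A small sketch of the distinction, using four rows copied from the table above; the helper names are illustrative.

```python
# (before %, after %) for a few rows copied from the table above.
rows = {
    "Banking77Classification": (23.46, 68.21),
    "STS15": (36.95, 61.98),
    "BIOSSES": (14.57, 6.69),
    "ImdbClassification": (53.23, 51.08),
}


def delta_points(before, after):
    """Absolute change in percentage points (the table's %p unit)."""
    return round(after - before, 2)


def relative_change(before, after):
    """Relative change as a percentage of the 'before' score, for contrast."""
    return round(100 * (after - before) / before, 1)


for task, (before, after) in rows.items():
    print(
        f"{task}: {delta_points(before, after):+.2f}%p, "
        f"{relative_change(before, after):+.1f}% relative"
    )
```

For Banking77Classification the gain of +44.75%p corresponds to a relative improvement of well over 100%, so the two units should not be mixed when comparing rows.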
Training
Stage 1: Model Compression
- Teacher: Alibaba-NLP/gte-multilingual-base (12L, 768d, 277M params)
- Compression: Layer pruning → Hidden dim reduction → Vocab pruning
- Result: 4L / 384d / 8,675 vocab
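The card does not say how the vocabulary was pruned. One common approach, sketched below as an assumption rather than the author's actual procedure, is to keep only the token ids that occur in a tokenized target-language corpus (plus special tokens) and slice the embedding matrix accordingly. The function `prune_vocab` and its arguments are hypothetical, and a real pipeline would also have to rewrite the tokenizer so that it emits the new ids.

```python
def prune_vocab(embeddings, corpus_token_ids, always_keep=()):
    """Keep only embedding rows for token ids seen in the corpus.

    embeddings: list of rows, one per original token id.
    corpus_token_ids: iterable of token-id sequences from a tokenized corpus.
    always_keep: ids (e.g. special tokens) retained even if unseen.
    Returns (pruned_embeddings, old_to_new_id).
    """
    kept = set(always_keep)
    for sequence in corpus_token_ids:
        kept.update(sequence)
    # Keep the original relative order so low-id special tokens stay first.
    kept_sorted = sorted(i for i in kept if 0 <= i < len(embeddings))
    old_to_new = {old: new for new, old in enumerate(kept_sorted)}
    pruned = [embeddings[old] for old in kept_sorted]
    return pruned, old_to_new


# Toy example: 8-token vocabulary with 2-dimensional embeddings.
toy_embeddings = [[float(i), float(i) * 0.5] for i in range(8)]
toy_corpus = [[0, 3, 5, 1], [0, 5, 5, 1]]
pruned, old_to_new = prune_vocab(toy_embeddings, toy_corpus, always_keep=(0, 1, 2))
print(len(pruned), old_to_new)
```

In the toy run, tokens 4, 6 and 7 never appear and are dropped, while token 2 survives only because it is listed in `always_keep`.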
Stage 2: Two-Stage Knowledge Distillation
Because the overall compression ratio is 26x, distillation is progressive and passes through an intermediate model:
- Stage 1: Teacher (277M) → Intermediate (~55M)
  - MSE + Cosine Similarity loss
  - MTEB task datasets (Classification/Clustering/STS)
- Stage 2: Intermediate → Final Student (10.6M)
  - Same training objective
  - AdamW (lr=2e-5, weight_decay=0.01), Cosine annealing
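The training objective and schedule listed above can be illustrated in plain Python. This is a sketch under several assumptions the card does not confirm: teacher and student embeddings have already been brought to the same dimensionality, both are non-zero, the two loss terms are weighted equally (`cosine_weight` is a hypothetical knob), and the learning rate anneals from 2e-5 down to a `min_lr` of 0.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distillation_loss(student_vec, teacher_vec, cosine_weight=1.0):
    """MSE plus (1 - cosine similarity); the weighting is an assumption."""
    cosine_term = 1.0 - cosine_similarity(student_vec, teacher_vec)
    return mse(student_vec, teacher_vec) + cosine_weight * cosine_term


def cosine_annealing_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Learning rate decayed from base_lr to min_lr along a half cosine."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# A student embedding that matches the teacher exactly incurs zero loss.
print(distillation_loss([1.0, 0.0], [1.0, 0.0]))
# The schedule starts at the base learning rate and halves at the midpoint.
print(cosine_annealing_lr(0, 1000), cosine_annealing_lr(500, 1000))
```

The MSE term penalizes differences in magnitude, while the cosine term penalizes differences in direction only, which is the property that matters most for similarity search.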
Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl