# L4_uniform

Lightweight sentence encoder created from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 via layer pruning + vocabulary pruning.

## Model Details
| Property | Value |
|---|---|
| Teacher | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | MiniLM-L12 (pruned) |
| Hidden dim | 384 |
| Layers | 4 / 12 |
| Layer indices | [0, 4, 7, 11] |
| Strategy | 4 layers, evenly spaced (compact) |
| Parameters | 22,164,480 |
| Model size (FP32) | 84.6 MB |
| Distilled | No |
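The "evenly spaced" strategy picks student layers at uniform intervals across the teacher's depth, always keeping the first and last layer. A minimal sketch (the function name `evenly_spaced_layers` is illustrative; the actual selection code is not published here):

```python
import numpy as np

def evenly_spaced_layers(num_teacher_layers: int, num_student_layers: int) -> list[int]:
    """Pick student layer indices evenly spaced across the teacher's depth,
    always including the first (0) and last (num_teacher_layers - 1) layer."""
    idx = np.linspace(0, num_teacher_layers - 1, num_student_layers)
    return [int(round(i)) for i in idx]

print(evenly_spaced_layers(12, 4))  # [0, 4, 7, 11]
```

Rounding `linspace(0, 11, 4)` reproduces exactly the indices [0, 4, 7, 11] used by this model.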
## Architecture

```
==============================================================
 TEACHER: MiniLM-L12  →  STUDENT: 4 layers / 38,755 vocab
==============================================================
          TEACHER                          STUDENT
┌─────────────────────────┐      ┌─────────────────────────┐
│       Input Tokens      │      │       Input Tokens      │
├─────────────────────────┤      ├─────────────────────────┤
│ Embeddings              │      │ Embeddings (pruned)     │
│   vocab: 250,002        │      │   vocab: 38,755         │
│   dim:   384            │      │   dim:   384            │
├─────────────────────────┤      ├─────────────────────────┤
│ Layer 0                 │ ──►  │ Layer 0   (from L0)     │
│ Layers 1–3   (dropped)  │      │                         │
│ Layer 4                 │ ──►  │ Layer 1   (from L4)     │
│ Layers 5–6   (dropped)  │      │                         │
│ Layer 7                 │ ──►  │ Layer 2   (from L7)     │
│ Layers 8–10  (dropped)  │      │                         │
│ Layer 11                │ ──►  │ Layer 3   (from L11)    │
├─────────────────────────┤      ├─────────────────────────┤
│ Mean Pooling            │      │ Mean Pooling            │
│   → 384d embedding      │      │   → 384d embedding      │
└─────────────────────────┘      └─────────────────────────┘
 Size:   448.0 MB (FP32)  →  84.6 MB (FP32)
 Params: 117,451,392      →  22,164,480
 Reduction: 81.1%
==============================================================
```
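The size and reduction figures follow directly from the parameter counts at 4 bytes per FP32 parameter (sizes in MiB):

```python
teacher_params = 117_451_392
student_params = 22_164_480

# FP32 = 4 bytes per parameter; report sizes in MiB
teacher_mb = teacher_params * 4 / 2**20
student_mb = student_params * 4 / 2**20
reduction = 1 - student_params / teacher_params

print(f"{teacher_mb:.1f} MB -> {student_mb:.1f} MB, reduction {reduction:.1%}")
# 448.0 MB -> 84.6 MB, reduction 81.1%
```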
## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L4_uniform", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요?",
    "Bonjour, comment allez-vous?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)
```
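The resulting embeddings can be compared with cosine similarity. A self-contained sketch using NumPy, with random vectors standing in for real `model.encode(...)` output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for model.encode(...) output: two 384-d vectors
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 384))

print(cosine_similarity(emb[0], emb[1]))
```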
## MTEB Evaluation Results

Overall Average: 49.02%

| Task Group | Average |
|---|---|
| Classification | 56.87% |
| Clustering | 32.04% |
| STS | 57.15% |
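The overall average is the unweighted mean over all 25 individual task scores, not the mean of the three group averages (the groups have different sizes: 8, 8, and 9 tasks):

```python
classification = [67.02, 69.18, 59.38, 71.48, 36.90, 39.51, 62.02, 49.43]
clustering = [49.93, 46.08, 21.47, 26.05, 22.94, 41.23, 32.19, 16.43]
sts = [45.64, 62.01, 57.85, 65.48, 60.39, 73.93, 46.29, 37.34, 65.38]

all_tasks = classification + clustering + sts
overall = sum(all_tasks) / len(all_tasks)
print(f"{overall:.2f}%")  # 49.02%
```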
### Classification

| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 67.02% | en: 70.31%, en-ext: 68.1%, de: 65.73% |
| Banking77Classification | 69.18% | default: 69.18% |
| ImdbClassification | 59.38% | default: 59.38% |
| MTOPDomainClassification | 71.48% | en: 80.02%, es: 73.78%, hi: 71.07% |
| MassiveIntentClassification | 36.9% | en: 58.41%, zh-CN: 58.07%, ja: 56.73% |
| MassiveScenarioClassification | 39.51% | zh-CN: 63.96%, en: 62.71%, ja: 59.84% |
| ToxicConversationsClassification | 62.02% | default: 62.02% |
| TweetSentimentExtractionClassification | 49.43% | default: 49.43% |
### Clustering

| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 49.93% | default: 49.93% |
| ArXivHierarchicalClusteringS2S | 46.08% | default: 46.08% |
| BiorxivClusteringP2P.v2 | 21.47% | default: 21.47% |
| MedrxivClusteringP2P.v2 | 26.05% | default: 26.05% |
| MedrxivClusteringS2S.v2 | 22.94% | default: 22.94% |
| StackExchangeClustering.v2 | 41.23% | default: 41.23% |
| StackExchangeClusteringP2P.v2 | 32.19% | default: 32.19% |
| TwentyNewsgroupsClustering.v2 | 16.43% | default: 16.43% |
### STS

| Task | Average | Details |
|---|---|---|
| BIOSSES | 45.64% | default: 45.64% |
| SICK-R | 62.01% | default: 62.01% |
| STS12 | 57.85% | default: 57.85% |
| STS13 | 65.48% | default: 65.48% |
| STS14 | 60.39% | default: 60.39% |
| STS15 | 73.93% | default: 73.93% |
| STS17 | 46.29% | en-en: 76.54%, es-es: 75.88%, ko-ko: 62.72% |
| STS22.v2 | 37.34% | zh: 57.86%, es: 54.85%, fr: 51.41% |
| STSBenchmark | 65.38% | default: 65.38% |
## Training

Created via layer pruning + vocabulary pruning (no additional training):

- Teacher: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (12 layers, 384d)
- Layer selection: [0, 4, 7, 11] (4 layers, evenly spaced, compact strategy)
- Vocab pruning: corpus-based filtering for the target languages
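Corpus-based vocabulary pruning can be sketched as follows. This is a simplified illustration, not the actual pruning script: count which token ids a target-language corpus actually uses, keep only those rows of the embedding matrix, and build an old-id to new-id remapping for the tokenizer:

```python
import numpy as np

def prune_vocab(embeddings: np.ndarray, used_token_ids: set[int]) -> tuple[np.ndarray, dict[int, int]]:
    """Keep only embedding rows for tokens seen in the corpus;
    return the pruned matrix and an old-id -> new-id mapping."""
    kept = sorted(used_token_ids)
    id_map = {old: new for new, old in enumerate(kept)}
    return embeddings[kept], id_map

# Toy example: 10-token vocab, 4-d embeddings; corpus uses 4 tokens
emb = np.arange(40, dtype=np.float32).reshape(10, 4)
pruned, id_map = prune_vocab(emb, {0, 3, 7, 9})
print(pruned.shape)  # (4, 4)
print(id_map[7])     # 2
```

For this model, the same operation shrinks the embedding table from 250,002 to 38,755 rows, which is where most of the parameter reduction comes from.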
## Supported Languages (18)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl