# modernbert_L6_uniform
Lightweight sentence encoder created from answerdotai/ModernBERT-base via layer pruning + vocabulary pruning.
## Model Details

| Property | Value |
|---|---|
| Teacher | answerdotai/ModernBERT-base |
| Architecture | ModernBERT (pruned) |
| Hidden dim | 768 |
| Layers | 6 / 22 |
| Layer indices | [0, 4, 8, 13, 17, 21] |
| Strategy | 6 layers, evenly spaced from ModernBERT (22L) |
| Parameters | 63,870,720 |
| Model size (FP32) | 176.0 MB |
| Distilled | No |
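The "evenly spaced" indices can be reproduced with a simple linspace over the teacher's layer range. This is a minimal sketch; the exact selection code used for this model is an assumption, but the arithmetic matches the indices above:

```python
import numpy as np

def uniform_layer_indices(num_teacher_layers: int, num_student_layers: int) -> list[int]:
    """Pick evenly spaced layer indices, always keeping the first and last layer."""
    return np.linspace(0, num_teacher_layers - 1, num_student_layers).round().astype(int).tolist()

print(uniform_layer_indices(22, 6))  # [0, 4, 8, 13, 17, 21]
```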
## Architecture

```
===============================================================
 TEACHER: ModernBERT → STUDENT: 6L / 27,279 vocab
===============================================================

          TEACHER                         STUDENT

 ┌─────────────────────────┐     ┌─────────────────────────┐
 │       Input Tokens      │     │       Input Tokens      │
 └────────────┬────────────┘     └────────────┬────────────┘
              │                               │
 ┌────────────┴────────────┐     ┌────────────┴────────────┐
 │        Embeddings       │     │   Embeddings (pruned)   │
 │      vocab: 50,368      │     │      vocab: 27,279      │
 │        dim: 768         │     │        dim: 768         │
 └────────────┬────────────┘     └────────────┬────────────┘
              │                               │
 ┌─────────────────────────┐     ┌─────────────────────────┐
 │ Layer 0                 │ ──▶ │ Layer 0   (← L0)        │
 │ Layers 1-3    ✗ dropped │     │                         │
 │ Layer 4                 │ ──▶ │ Layer 1   (← L4)        │
 │ Layers 5-7    ✗ dropped │     │                         │
 │ Layer 8                 │ ──▶ │ Layer 2   (← L8)        │
 │ Layers 9-12   ✗ dropped │     │                         │
 │ Layer 13                │ ──▶ │ Layer 3   (← L13)       │
 │ Layers 14-16  ✗ dropped │     │                         │
 │ Layer 17                │ ──▶ │ Layer 4   (← L17)       │
 │ Layers 18-20  ✗ dropped │     │                         │
 │ Layer 21                │ ──▶ │ Layer 5   (← L21)       │
 └────────────┬────────────┘     └────────────┬────────────┘
              │                               │
 ┌────────────┴────────────┐     ┌────────────┴────────────┐
 │       Mean Pooling      │     │       Mean Pooling      │
 │    → 768d embedding     │     │    → 768d embedding     │
 └─────────────────────────┘     └─────────────────────────┘

 Size:      495.8 MB (FP32)  →  176.0 MB (FP32)
 Params:    129,980,160      →  46,138,368
 Reduction: 64.5%
===============================================================
```
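The pruning itself amounts to keeping only the selected encoder blocks. Below is a minimal sketch using Hugging Face transformers, assuming ModernBERT exposes its transformer blocks as `model.layers`; the actual export script is not part of this card:

```python
import torch.nn as nn
from transformers import AutoModel

KEEP = [0, 4, 8, 13, 17, 21]  # layer indices copied from the 22-layer teacher

teacher = AutoModel.from_pretrained("answerdotai/ModernBERT-base")

# Keep only the selected transformer blocks and update the config to match.
teacher.layers = nn.ModuleList(teacher.layers[i] for i in KEEP)
teacher.config.num_hidden_layers = len(KEEP)

teacher.save_pretrained("modernbert_L6_uniform")
```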
## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("modernbert_L6_uniform", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요",
    "Bonjour, comment allez-vous?",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)
```
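The embeddings can be compared directly with cosine similarity, for example via `sentence_transformers.util` (illustrative usage, continuing from the snippet above):

```python
from sentence_transformers import util

# Pairwise cosine similarities between the three sentences above.
scores = util.cos_sim(embeddings, embeddings)
print(scores)  # 3x3 similarity matrix
```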
## MTEB Evaluation Results

Overall Average: 35.42% (mean over the 25 tasks below)

| Task Group | Average |
|---|---|
| Classification | 41.97% |
| Clustering | 28.21% |
| STS | 36.0% |
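A run on one of the listed tasks can be reproduced roughly as follows. This is a sketch assuming a recent `mteb` package; task versions and the output folder are illustrative:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("modernbert_L6_uniform", trust_remote_code=True)

tasks = mteb.get_tasks(tasks=["STSBenchmark"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/modernbert_L6_uniform")
```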
### Classification

| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 59.33% | en-ext: 61.33%, de: 60.17%, en: 58.99% |
| Banking77Classification | 35.01% | default: 35.01% |
| ImdbClassification | 55.05% | default: 55.05% |
| MTOPDomainClassification | 43.24% | en: 47.72%, es: 47.27%, th: 45.01% |
| MassiveIntentClassification | 25.86% | zh-CN: 37.2%, ja: 34.36%, zh-TW: 32.51% |
| MassiveScenarioClassification | 26.28% | zh-CN: 38.18%, zh-TW: 33.36%, en: 32.44% |
| ToxicConversationsClassification | 52.6% | default: 52.6% |
| TweetSentimentExtractionClassification | 38.42% | default: 38.42% |
### Clustering

| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 50.19% | default: 50.19% |
| ArXivHierarchicalClusteringS2S | 46.96% | default: 46.96% |
| BiorxivClusteringP2P.v2 | 12.62% | default: 12.62% |
| MedrxivClusteringP2P.v2 | 22.13% | default: 22.13% |
| MedrxivClusteringS2S.v2 | 19.43% | default: 19.43% |
| StackExchangeClustering.v2 | 34.26% | default: 34.26% |
| StackExchangeClusteringP2P.v2 | 31.01% | default: 31.01% |
| TwentyNewsgroupsClustering.v2 | 9.11% | default: 9.11% |
### STS

| Task | Average | Details |
|---|---|---|
| BIOSSES | 33.84% | default: 33.84% |
| SICK-R | 46.99% | default: 46.99% |
| STS12 | 35.32% | default: 35.32% |
| STS13 | 33.7% | default: 33.7% |
| STS14 | 37.07% | default: 37.07% |
| STS15 | 49.85% | default: 49.85% |
| STS17 | 23.34% | es-es: 61.45%, en-en: 55.74%, ko-ko: 48.68% |
| STS22.v2 | 24.05% | zh: 52.16%, es: 46.52%, it: 45.35% |
| STSBenchmark | 39.82% | default: 39.82% |
## Training

Created via layer pruning + vocabulary pruning (no additional training):

- Teacher: answerdotai/ModernBERT-base (22 layers, 768d)
- Layer selection: [0, 4, 8, 13, 17, 21] (6 layers, evenly spaced across the 22-layer teacher)
- Vocab pruning: corpus-based filtering for the target languages (see the sketch below)
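A minimal sketch of what corpus-based vocabulary pruning looks like: tokenize a corpus in the target languages, keep only the token ids that occur (plus special tokens), and slice the embedding matrix down to those rows. The corpus and the exact steps here are illustrative assumptions, not the actual pipeline:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")

corpus = ["Hello, how are you?", "안녕하세요"]  # stand-in for the real multilingual corpus

used_ids = set(tok.all_special_ids)  # always keep special tokens
for text in corpus:
    used_ids.update(tok(text)["input_ids"])
keep = sorted(used_ids)  # new token id = position in `keep`

# Slice the kept rows out of the old embedding matrix.
old_weight = model.get_input_embeddings().weight.data
new_emb = torch.nn.Embedding(len(keep), old_weight.shape[1])
new_emb.weight.data.copy_(old_weight[keep])
model.set_input_embeddings(new_emb)
model.config.vocab_size = len(keep)
```

A real pipeline also has to rebuild the tokenizer so that its ids match the sliced rows; that step is omitted here.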
## Supported Languages (18)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl