# L4_uniform
Lightweight multilingual sentence encoder optimized for intent classification.
Created from paraphrase-multilingual-MiniLM-L12-v2 via layer pruning + corpus-based vocabulary pruning.
## Model Details
| Property | Value |
|---|---|
| Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | XLM-RoBERTa (pruned) |
| Hidden dim | 384 |
| Layers | 4 / 12 |
| Layer indices | [0, 4, 7, 11] |
| Strategy | 4 layers, evenly spaced (compact) |
| Vocab size | ~38,330 (pruned from 250K) |
| Parameters | 22,642,560 |
| Safetensors size | 84.6MB |
| Distilled | No |
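The layer-pruning strategy above (keep transformer layers 0, 4, 7, and 11 of the teacher's 12) amounts to copying a subset of encoder layers into a shallower stack. A minimal sketch in plain PyTorch; the `nn.TransformerEncoderLayer` stack is a stand-in for the teacher's XLM-RoBERTa encoder, not its real implementation:

```python
import torch.nn as nn

# Hypothetical 12-layer stack standing in for the teacher's encoder
# (384 hidden dim, as in the table above).
teacher_layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=384, nhead=12, batch_first=True)
    for _ in range(12)
)

# Keep only the evenly spaced layer indices listed in the model card.
keep = [0, 4, 7, 11]
student_layers = nn.ModuleList(teacher_layers[i] for i in keep)

assert len(student_layers) == 4
# The kept layers are the teacher's own modules, weights included.
assert student_layers[2] is teacher_layers[7]
```

Because the kept layers are copied verbatim (no distillation), the student inherits the teacher's weights for those four layers exactly.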
## Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl
## Quick Start
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L4_uniform")

sentences = [
    "예약 좀 해줘",           # Korean: "Make a reservation for me"
    "What did I order?",      # English
    "今日はいい天気ですね",    # Japanese: "Nice weather today"
    "Reserva una mesa",       # Spanish: "Reserve a table"
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 384)
```
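Since the model is optimized for intent classification, a typical downstream pattern is to embed a small set of intent exemplars once and route each incoming utterance to the nearest exemplar by cosine similarity. A minimal NumPy sketch, with random vectors standing in for `model.encode` outputs (the `classify` helper and the labels are illustrative, not part of this model's API):

```python
import numpy as np

def classify(query_emb, intent_embs, labels):
    # Normalize rows; cosine similarity then reduces to a dot product.
    q = query_emb / np.linalg.norm(query_emb)
    m = intent_embs / np.linalg.norm(intent_embs, axis=1, keepdims=True)
    return labels[int(np.argmax(m @ q))]

rng = np.random.default_rng(0)
intent_embs = rng.normal(size=(3, 384))  # stand-ins for model.encode(exemplars)
labels = ["book_table", "check_order", "weather"]

# A query embedding close to the "check_order" exemplar.
query = intent_embs[1] + 0.01 * rng.normal(size=384)
assert classify(query, intent_embs, labels) == "check_order"
```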
## MTEB Evaluation Results
Overall Average: 52.03%
### MassiveIntentClassification
Average: 50.25%
| Language | Score |
|---|---|
| ar | 41.20% |
| en | 57.63% |
| es | 49.12% |
| ko | 53.03% |
### MassiveScenarioClassification
Average: 53.82%
| Language | Score |
|---|---|
| ar | 43.82% |
| en | 61.91% |
| es | 53.64% |
| ko | 55.90% |
## Training
This model was created via layer pruning and vocabulary pruning:

- Teacher: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden dim)
- Layer selection: `[0, 4, 7, 11]` - 4 layers, evenly spaced (compact)
- Vocab pruning: 250K -> ~38K tokens (corpus-based filtering for the 18 target languages)
- No additional training - weights are copied directly from the teacher
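The corpus-based vocabulary pruning step works by tokenizing a corpus covering the target languages, keeping only the token ids that actually occur (plus special tokens), and remapping them to a dense range so the embedding matrix can be shrunk. A toy sketch of that idea; the `prune_vocab` helper is hypothetical, not the actual script used:

```python
from collections import Counter

def prune_vocab(corpus_token_ids, always_keep=(0, 1, 2, 3)):
    """Keep token ids seen in the target-language corpus plus special
    tokens; return an old_id -> new_id mapping for the pruned vocab."""
    counts = Counter(tid for sent in corpus_token_ids for tid in sent)
    kept = sorted(set(always_keep) | set(counts))
    return {old: new for new, old in enumerate(kept)}

# Toy corpus that uses only a handful of ids from a large vocabulary.
mapping = prune_vocab([[5, 9, 9], [7, 5]])
assert mapping == {0: 0, 1: 1, 2: 2, 3: 3, 5: 4, 7: 5, 9: 6}
```

The mapping is then used to slice the teacher's embedding matrix down to the kept rows, which is where most of the 250K -> ~38K size reduction comes from.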
A distilled version of this model, with improved performance, is also available.
## Compression Summary
| Stage | Vocab | Layers | Size |
|---|---|---|---|
| Teacher (original) | 250,002 | 12 | ~480MB |
| + Layer pruning | 250,002 | 4 | ~393MB |
| + Vocab pruning | ~38,330 | 4 | ~85MB |
## Limitations
- Vocabulary pruning restricts the model to the 18 target languages
- Designed for short dialogue utterances, not long documents
- Layer pruning may reduce performance on complex semantic tasks