# Intent Classifier Student: L6_top
Distilled multilingual sentence encoder for intent classification (Action / Recall / Other).
Created by layer pruning from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.
## Model Details
| Property | Value |
|---|---|
| Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | XLM-RoBERTa (pruned) |
| Hidden dim | 384 |
| Layers | 6 (from 12) |
| Layer indices | [6, 7, 8, 9, 10, 11] |
| Strategy | 6 layers, top half (semantic-focused) |
| Est. params | 106,825,344 |
| Est. FP32 | 407.5MB |
| Est. INT8 | 101.9MB |
| Est. INT8 + vocab pruned | 30.5MB |
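The size estimates above follow directly from the parameter count; a quick arithmetic sanity check (the exact vocabulary sizes, 250,002 full and 55K pruned, are assumptions based on the XLM-R tokenizer and the deployment note in the Training section):

```python
# Sanity-check the table's size estimates from the parameter count.
params = 106_825_344                          # estimated parameter count from the table
vocab_full, vocab_pruned = 250_002, 55_000    # XLM-R vocab -> pruned vocab (assumption)
hidden = 384

fp32_mib = params * 4 / 2**20                 # 4 bytes per FP32 weight
int8_mib = params / 2**20                     # 1 byte per INT8 weight
pruned_mib = (params - (vocab_full - vocab_pruned) * hidden) / 2**20

print(f"FP32: {fp32_mib:.1f} MiB")                   # 407.5
print(f"INT8: {int8_mib:.1f} MiB")                   # 101.9
print(f"INT8 + vocab pruned: {pruned_mib:.1f} MiB")  # 30.5
```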
## Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl
## Intended Use
This is a student encoder designed to be used as the backbone for a lightweight 3-class intent classifier (Action / Recall / Other) in multilingual dialogue systems.
- Action: User requests an action (book, order, change settings, etc.)
- Recall: User asks about past events or stored information
- Other: Greetings, chitchat, emotions, etc.
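One way to build the 3-class classifier on top of this backbone is a lightweight linear head over the 384-dim sentence embeddings. The sketch below uses scikit-learn and random vectors as stand-ins for real embeddings; it is illustrative only, not the classifier shipped with any deployment:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random vectors stand in for 384-dim sentence embeddings;
# in practice use model.encode(texts) from the Usage section.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(90, 384))
y_train = np.repeat([0, 1, 2], 30)   # 0=Action, 1=Recall, 2=Other

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(rng.normal(size=(5, 384)))
print(pred.shape)  # (5,)
```

A linear head keeps the added footprint negligible next to the encoder itself.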
## Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L6_top")
embeddings = model.encode([
    "예약 좀 해줘",         # "Book it for me" (Action)
    "지난번 주문 뭐였지?",  # "What was my last order?" (Recall)
    "안녕하세요",           # "Hello" (Other)
])
print(embeddings.shape)  # (3, 384)
```
## MTEB Results

### MassiveIntentClassification
Average (ar/en/es/ko): 43.47%
| Language | Score |
|---|---|
| ar | 30.77% |
| en | 55.96% |
| es | 40.81% |
| ko | 46.34% |
### MassiveScenarioClassification
Average (ar/en/es/ko): 47.62%
| Language | Score |
|---|---|
| ar | 33.99% |
| en | 62.04% |
| es | 46.12% |
| ko | 48.34% |
## Training / Distillation
This model was created via layer pruning (no additional training):
- Load the teacher: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden)
- Select layers `[6, 7, 8, 9, 10, 11]`
- Copy the embedding weights plus the selected layers' weights
- Wrap with mean pooling for sentence embeddings
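The pruning steps above can be sketched on a toy encoder. `TinyEncoder` below is an illustrative stand-in, not the actual export code used to build this model:

```python
import copy
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a 12-layer transformer encoder (384 hidden)."""
    def __init__(self, num_layers=12, hidden=384):
        super().__init__()
        self.embeddings = nn.Embedding(1000, hidden)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=6, batch_first=True)
            for _ in range(num_layers)
        )

teacher = TinyEncoder(num_layers=12)

# Keep the embedding weights plus the top-half layers [6..11].
keep = [6, 7, 8, 9, 10, 11]
student = TinyEncoder(num_layers=len(keep))
student.embeddings = copy.deepcopy(teacher.embeddings)
student.layers = nn.ModuleList(copy.deepcopy(teacher.layers[i]) for i in keep)

print(len(student.layers))  # 6
```

The real model additionally wraps the pruned encoder with a mean-pooling module so it produces sentence embeddings.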
For deployment, vocabulary pruning (250K → ~55K tokens) and INT8 quantization are applied to meet the ≤50MB size constraint.
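The INT8 step can be approximated with PyTorch dynamic quantization. This is a sketch on a stand-in module; the deployed model may use a different quantization toolchain:

```python
import io
import torch
import torch.nn as nn

# Stand-in: two 384-dim linear layers, as in a small encoder block.
model = nn.Sequential(nn.Linear(384, 384), nn.ReLU(), nn.Linear(384, 384))

# Dynamic quantization stores Linear weights as INT8 and quantizes
# activations on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def serialized_mib(m: nn.Module) -> float:
    """Serialized size of a model's state dict, in MiB."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 2**20

print(f"FP32: {serialized_mib(model):.2f} MiB, INT8: {serialized_mib(qmodel):.2f} MiB")
```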
## Limitations
- Layer pruning without fine-tuning may lose some quality vs. proper knowledge distillation
- Vocabulary pruning limits the model to the target 18 languages
- Designed for short dialogue utterances, not long documents