---
language: ["ko", "en", "ja", "zh", "es", "fr", "de", "pt", "it", "ru", "ar", "hi", "th", "vi", "id", "tr", "nl", "pl"]
tags:
- sentence-transformers
- intent-classification
- multilingual
- distillation
- layer-pruning
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
---

# Intent Classifier Student: L2_ends

Distilled multilingual sentence encoder for intent classification (Action / Recall / Other).

Created by **layer pruning** from `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`.

## Model Details

| Property | Value |
|----------|-------|
| Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | XLM-RoBERTa (pruned) |
| Hidden dim | 384 |
| Layers | 2 (of 12) |
| Layer indices | [0, 11] |
| Strategy | 2 layers, first + last (minimal) |
| Est. params | 99,741,312 |
| Est. size (FP32) | 380.5 MB |
| Est. size (INT8) | 95.1 MB |
| Est. size (INT8, vocab pruned) | 23.7 MB |

## Supported Languages (18)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl

## Intended Use

This is a **student encoder** designed to serve as the backbone for a lightweight 3-class intent classifier (Action / Recall / Other) in multilingual dialogue systems.

- **Action**: the user requests an action (book, order, change settings, etc.)
- **Recall**: the user asks about past events or stored information
- **Other**: greetings, chitchat, emotions, etc.

## Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L2_ends")
embeddings = model.encode([
    "예약 좀 해줘",         # "Book it for me" (Action)
    "지난번 주문 뭐였지?",  # "What was my last order?" (Recall)
    "안녕하세요",           # "Hello" (Other)
])
print(embeddings.shape)  # (3, 384)
```

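One way to use these embeddings downstream is a simple nearest-centroid intent classifier. This is an illustrative sketch, not part of this release: the random vectors stand in for `model.encode(...)` outputs, and the class clusters are synthetic.

```python
import numpy as np

# Stand-ins for sentence embeddings (in practice: model.encode(texts)).
# Three synthetic, well-separated clusters of 384-dim vectors.
rng = np.random.default_rng(0)
action_emb = rng.normal(loc=1.0, size=(20, 384))
recall_emb = rng.normal(loc=-1.0, size=(20, 384))
other_emb = rng.normal(loc=0.0, size=(20, 384))

# Nearest-centroid classifier: one prototype embedding per intent class.
centroids = np.stack([action_emb.mean(0), recall_emb.mean(0), other_emb.mean(0)])
labels = ["Action", "Recall", "Other"]

def classify(embedding: np.ndarray) -> str:
    """Return the intent whose centroid is closest in cosine similarity."""
    sims = centroids @ embedding / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(embedding)
    )
    return labels[int(np.argmax(sims))]

print(classify(action_emb[0]))  # -> Action
```

In practice one would replace the centroid step with a small trained head (e.g. logistic regression) over embeddings of labeled utterances; the nearest-centroid variant needs only a handful of examples per class.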

## MTEB Results

### MassiveIntentClassification

**Average: 49.80%**

| Language | Score |
|----------|-------|
| ar | 42.22% |
| en | 56.13% |
| es | 48.54% |
| ko | 52.31% |

### MassiveScenarioClassification

**Average: 52.47%**

| Language | Score |
|----------|-------|
| ar | 44.35% |
| en | 59.73% |
| es | 51.11% |
| ko | 54.70% |

## Training / Distillation

This model was created via **layer pruning** (no additional training):

1. Load the teacher: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden dim)
2. Select layers `[0, 11]`
3. Copy the embedding weights plus the selected layers' weights
4. Wrap with mean pooling to produce sentence embeddings
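The steps above can be sketched in PyTorch. The toy encoder below is a stand-in (assumption) for the real XLM-RoBERTa teacher, which would be loaded via `transformers`; the weight-copying logic is the same.

```python
import torch
from torch import nn

class ToyEncoder(nn.Module):
    """Minimal stand-in for a multi-layer transformer encoder."""

    def __init__(self, num_layers: int, hidden: int = 384):
        super().__init__()
        self.embeddings = nn.Embedding(1000, hidden)
        self.layers = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(num_layers)]
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embeddings(token_ids)
        for layer in self.layers:
            hidden = torch.relu(layer(hidden))
        # Step 4: mean pooling over tokens -> one vector per sentence.
        return hidden.mean(dim=1)

teacher = ToyEncoder(num_layers=12)
student = ToyEncoder(num_layers=2)

# Steps 2-3: keep first + last layer, copy embeddings + selected layers.
keep = [0, 11]
student.embeddings.load_state_dict(teacher.embeddings.state_dict())
for dst, src in enumerate(keep):
    student.layers[dst].load_state_dict(teacher.layers[src].state_dict())

emb = student(torch.randint(0, 1000, (3, 7)))  # batch of 3 "sentences"
print(emb.shape)  # torch.Size([3, 384])
```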

For deployment, vocabulary pruning (250K → ~55K tokens) and INT8 quantization are applied to meet the ≤50MB size constraint.
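The INT8 step can be sketched with PyTorch's dynamic quantization. The tiny `nn.Sequential` model here is an illustrative stand-in, not the released encoder, and vocabulary pruning is not shown.

```python
import io

import torch
from torch import nn

# Illustrative stand-in for the pruned student; the real encoder's
# nn.Linear modules would be quantized the same way.
model = nn.Sequential(nn.Linear(384, 384), nn.ReLU(), nn.Linear(384, 384))

def serialized_mb(m: nn.Module) -> float:
    """Size of the module's state_dict when saved with torch.save, in MB."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell() / 1e6

# Dynamic INT8 quantization: Linear weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(f"FP32: {serialized_mb(model):.2f} MB, "
      f"INT8: {serialized_mb(quantized):.2f} MB")
```

Dynamic quantization shrinks the Linear weights roughly 4x; the further reduction to 23.7 MB comes from the vocabulary pruning, which removes embedding rows for tokens outside the 18 target languages.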

## Limitations

- Layer pruning without fine-tuning may lose some quality compared with proper knowledge distillation
- Vocabulary pruning limits the model to the 18 target languages
- Designed for short dialogue utterances, not long documents