---
language: ["ko", "en", "ja", "zh", "es", "fr", "de", "pt", "it", "ru", "ar", "hi", "th", "vi", "id", "tr", "nl", "pl"]
tags:
- sentence-transformers
- intent-classification
- multilingual
- distillation
- layer-pruning
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
---
# Intent Classifier Student: L2_ends
A distilled multilingual sentence encoder for intent classification (Action / Recall / Other),
created by **layer pruning** from `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`.
## Model Details
| Property | Value |
|----------|-------|
| Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | XLM-RoBERTa (pruned) |
| Hidden dim | 384 |
| Layers | 2 (from 12) |
| Layer indices | [0, 11] |
| Strategy | 2 layers, first + last (minimal) |
| Est. params | 99,741,312 |
| Est. FP32 | 380.5MB |
| Est. INT8 | 95.1MB |
| Est. INT8 + vocab pruned | 23.7MB |
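The size estimates in the table follow from simple byte arithmetic; a back-of-the-envelope sketch (assuming XLM-R's 250,002-token embedding table and the ~55K retained tokens mentioned under deployment below):

```python
# Reproduce the size estimates above from the parameter count.
HIDDEN = 384
PARAMS = 99_741_312
EMB_PARAMS = 250_002 * HIDDEN        # token embedding table dominates
REST = PARAMS - EMB_PARAMS           # 2 transformer layers + misc

fp32_mb = PARAMS * 4 / 1024**2       # 4 bytes per FP32 weight
int8_mb = PARAMS * 1 / 1024**2       # 1 byte per INT8 weight
pruned_int8_mb = (55_000 * HIDDEN + REST) / 1024**2  # ~55K-token vocab, INT8

print(f"{fp32_mb:.1f} {int8_mb:.1f} {pruned_int8_mb:.1f}")  # 380.5 95.1 23.7
```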
## Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl
## Intended Use
This is a **student encoder** designed to be used as the backbone for a lightweight
3-class intent classifier (Action / Recall / Other) in multilingual dialogue systems.
- **Action**: User requests an action (book, order, change settings, etc.)
- **Recall**: User asks about past events or stored information
- **Other**: Greetings, chitchat, emotions, etc.
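As a rough illustration of this intended use, a 3-class linear head can sit on top of the 384-dim embeddings. The head below is untrained and the inputs are random stand-ins for `model.encode(...)` output; no classifier head ships with this model:

```python
import torch
from torch import nn

EMB_DIM, NUM_CLASSES = 384, 3           # classes: Action / Recall / Other
head = nn.Linear(EMB_DIM, NUM_CLASSES)  # illustrative, untrained

# Stand-in for embeddings produced by the student encoder.
embeddings = torch.randn(4, EMB_DIM)
logits = head(embeddings)
pred_ids = logits.argmax(dim=-1)        # one of {0, 1, 2} per utterance
print(logits.shape, pred_ids.shape)  # torch.Size([4, 3]) torch.Size([4])
```

In practice the head would be trained on labeled utterances while the pruned encoder stays frozen or is lightly fine-tuned.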
## Usage
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("L2_ends")  # or the full hub repo id / a local path
embeddings = model.encode(["์˜ˆ์•ฝ ์ข€ ํ•ด์ค˜", "์ง€๋‚œ๋ฒˆ ์ฃผ๋ฌธ ๋ญ์˜€์ง€?", "์•ˆ๋…•ํ•˜์„ธ์š”"])
print(embeddings.shape) # (3, 384)
```
## MTEB Results
### MassiveIntentClassification
**Average: 49.80%**
| Language | Score |
|----------|-------|
| ar | 42.22% |
| en | 56.13% |
| es | 48.54% |
| ko | 52.31% |
### MassiveScenarioClassification
**Average: 52.47%**
| Language | Score |
|----------|-------|
| ar | 44.35% |
| en | 59.73% |
| es | 51.11% |
| ko | 54.70% |
## Training / Distillation
This model was created via **layer pruning** (no additional training):
1. Load teacher: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden)
2. Select layers: `[0, 11]`
3. Copy embedding weights + selected layer weights
4. Wrap with mean pooling for sentence embeddings
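The steps above can be sketched as follows. This is an illustrative reconstruction, not the exact script used: a tiny randomly-initialized XLM-RoBERTa stands in for the teacher (step 1 would instead load the real 12-layer MiniLM), and the pooling wrapper is a plain function rather than a `sentence-transformers` module:

```python
import torch
from torch import nn
from transformers import XLMRobertaConfig, XLMRobertaModel

def prune_layers(model, keep):
    """Steps 2-3: keep only the encoder layers at the given teacher indices."""
    model.encoder.layer = nn.ModuleList(model.encoder.layer[i] for i in keep)
    model.config.num_hidden_layers = len(keep)
    return model

def mean_pool(last_hidden, attention_mask):
    """Step 4: attention-mask-aware mean pooling over token embeddings."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)
    return (last_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# Tiny stand-in teacher; step 1 would load the real model instead.
teacher = XLMRobertaModel(XLMRobertaConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=12,
    num_attention_heads=2, intermediate_size=64))
student = prune_layers(teacher, [0, 11])

ids = torch.randint(0, 100, (2, 7))
mask = torch.ones_like(ids)
out = student(input_ids=ids, attention_mask=mask).last_hidden_state
emb = mean_pool(out, mask)
print(len(student.encoder.layer), emb.shape)  # 2 torch.Size([2, 32])
```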
For deployment, vocabulary pruning (250K → ~55K tokens) and INT8 quantization
are applied to meet the ≤50MB size constraint.
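A minimal sketch of the embedding side of vocabulary pruning (the selection of which token ids to keep is assumed to come from corpus statistics on the target languages and is not shown; sizes here are shrunk stand-ins for the real 250,002 × 384 table):

```python
import torch
from torch import nn

def prune_embedding(emb: nn.Embedding, keep_ids):
    """Keep only the rows for keep_ids; returns the smaller embedding
    plus an old-id -> new-id mapping for re-tokenization."""
    keep = sorted(set(keep_ids))
    new_emb = nn.Embedding(len(keep), emb.embedding_dim)
    new_emb.weight.data.copy_(emb.weight.data[keep])
    return new_emb, {old: new for new, old in enumerate(keep)}

full = nn.Embedding(1000, 16)                       # small stand-in table
small, id_map = prune_embedding(full, range(100))   # keep first 100 ids
print(small.weight.shape)  # torch.Size([100, 16])
```

The tokenizer must be rebuilt with the same id mapping so that retained tokens index the right rows in the pruned table.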
## Limitations
- Layer pruning without fine-tuning may lose some quality vs. proper knowledge distillation
- Vocabulary pruning limits the model to the target 18 languages
- Designed for short dialogue utterances, not long documents