gomyk committed
Commit 336d19c · verified · 1 Parent(s): 63bc007

Upload L6_bottom with MTEB results

README.md CHANGED
@@ -4,18 +4,17 @@ tags:
  - sentence-transformers
  - intent-classification
  - multilingual
- - distillation
  - layer-pruning
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  license: apache-2.0
  ---

- # Intent Classifier Student: L6_bottom

- Distilled multilingual sentence encoder for intent classification (Action / Recall / Other).
-
- Created by **layer pruning** from `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`.

  ## Model Details

@@ -24,76 +23,86 @@ Created by **layer pruning** from `sentence-transformers/paraphrase-multilingual
  | Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
  | Architecture | XLM-RoBERTa (pruned) |
  | Hidden dim | 384 |
- | Layers | 6 (from 12) |
  | Layer indices | [0, 1, 2, 3, 4, 5] |
  | Strategy | 6 layers, bottom half (syntactic-focused) |
- | Est. params | 106,825,344 |
- | Est. FP32 | 407.5MB |
- | Est. INT8 | 101.9MB |
- | Est. INT8 + vocab pruned | 30.5MB |

  ## Supported Languages (18)

  ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl

- ## Intended Use
-
- This is a **student encoder** designed to be used as the backbone for a lightweight
- 3-class intent classifier (Action / Recall / Other) in multilingual dialogue systems.
-
- - **Action**: User requests an action (book, order, change settings, etc.)
- - **Recall**: User asks about past events or stored information
- - **Other**: Greetings, chitchat, emotions, etc.
-
- ## Usage

  ```python
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("L6_bottom")
- embeddings = model.encode(["예약 좀 해줘", "지난번 주문 뭐였지?", "안녕하세요"])
- print(embeddings.shape) # (3, 384)
  ```

- ## MTEB Results

  ### MassiveIntentClassification

- **Average: 55.88%**

  | Language | Score |
  |----------|-------|
- | ar | 48.23% |
- | en | 60.82% |
- | es | 56.89% |
- | ko | 57.58% |

  ### MassiveScenarioClassification

- **Average: 60.75%**

  | Language | Score |
  |----------|-------|
- | ar | 53.04% |
- | en | 65.8% |
- | es | 60.99% |
- | ko | 63.19% |

- ## Training / Distillation

- This model was created via **layer pruning** (no additional training):
- 1. Load teacher: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden)
- 2. Select layers: `[0, 1, 2, 3, 4, 5]`
- 3. Copy embedding weights + selected layer weights
- 4. Wrap with mean pooling for sentence embeddings

- For deployment, vocabulary pruning (250K → ~55K tokens) and INT8 quantization
- are applied to meet the ≤50MB size constraint.

  ## Limitations

- - Layer pruning without fine-tuning may lose some quality vs. proper knowledge distillation
- - Vocabulary pruning limits the model to the target 18 languages
  - Designed for short dialogue utterances, not long documents

 
  - sentence-transformers
  - intent-classification
  - multilingual
  - layer-pruning
+ - vocab-pruning
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  license: apache-2.0
  ---

+ # L6_bottom

+ Lightweight multilingual sentence encoder optimized for intent classification.
+ Created from `paraphrase-multilingual-MiniLM-L12-v2` via layer pruning + corpus-based vocabulary pruning.

  ## Model Details

  | Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
  | Architecture | XLM-RoBERTa (pruned) |
  | Hidden dim | 384 |
+ | Layers | 6 / 12 |
  | Layer indices | [0, 1, 2, 3, 4, 5] |
  | Strategy | 6 layers, bottom half (syntactic-focused) |
+ | Vocab size | ~38,330 (pruned from 250K) |
+ | Parameters | 26,184,576 |
+ | Safetensors size | 98.1MB |
+ | Distilled | No |

  ## Supported Languages (18)

  ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl

+ ## Quick Start

  ```python
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("L6_bottom")
+
+ sentences = [
+     "예약 좀 해줘",           # Korean: "Please make a reservation"
+     "What did I order?",      # English
+     "今日はいい天気ですね",    # Japanese: "Nice weather today, isn't it?"
+     "Reserva una mesa",       # Spanish: "Reserve a table"
+ ]
+
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)  # (4, 384)
  ```
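One way to turn these embeddings into predictions for the three intent classes this encoder targets (Action / Recall / Other) is nearest-prototype matching by cosine similarity. A minimal sketch with made-up prototype utterances; `model.similarity` is the built-in similarity helper in recent sentence-transformers releases:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L6_bottom")

# One illustrative prototype utterance per intent class (not shipped with the repo).
prototypes = {
    "Action": "예약 좀 해줘",        # "Please make a reservation"
    "Recall": "지난번 주문 뭐였지?",  # "What was my last order?"
    "Other": "안녕하세요",           # "Hello"
}

proto_emb = model.encode(list(prototypes.values()))
query_emb = model.encode(["Book a table for two"])

scores = model.similarity(query_emb, proto_emb)   # cosine similarity, shape (1, 3)
predicted = list(prototypes)[int(scores.argmax())]
print(predicted)  # expected: Action
```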

+ ## MTEB Evaluation Results
+
+ **Overall Average: 57.05%** (mean of the two task averages below)
 
  ### MassiveIntentClassification

+ **Average: 54.7%**

  | Language | Score |
  |----------|-------|
+ | ar | 46.36% |
+ | en | 59.84% |
+ | es | 56.11% |
+ | ko | 56.49% |

  ### MassiveScenarioClassification

+ **Average: 59.39%**

  | Language | Score |
  |----------|-------|
+ | ar | 50.55% |
+ | en | 64.52% |
+ | es | 60.31% |
+ | ko | 62.19% |
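The scores above can be reproduced with the `mteb` package. A sketch, assuming the current `mteb` API (task selection and ISO 639-3 language filtering via `get_tasks`); the output folder name is arbitrary:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L6_bottom")

# The two MASSIVE-based tasks reported above, restricted to the four languages shown.
tasks = mteb.get_tasks(
    tasks=["MassiveIntentClassification", "MassiveScenarioClassification"],
    languages=["ara", "eng", "spa", "kor"],
)

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/L6_bottom")
```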
+
+ ## Training
+
+ This model was created via **layer pruning + vocabulary pruning**:

+ 1. **Teacher**: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden dim)
+ 2. **Layer selection**: `[0, 1, 2, 3, 4, 5]` - 6 layers, bottom half (syntactic-focused)
+ 3. **Vocab pruning**: 250K -> ~38K tokens (corpus-based filtering for 18 target languages)
+ 4. **No additional training** - weights are copied directly from the teacher (see the sketch below)
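A minimal sketch of this procedure, assuming the standard transformers and sentence-transformers APIs (illustrative, not the exact script used to build this repo):

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer, models

teacher_id = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
teacher = AutoModel.from_pretrained(teacher_id)

# Keep only the bottom half of the 12 encoder layers (indices 0-5).
keep = [0, 1, 2, 3, 4, 5]
teacher.encoder.layer = nn.ModuleList(teacher.encoder.layer[i] for i in keep)
teacher.config.num_hidden_layers = len(keep)

# Save the pruned backbone together with the (still unpruned) tokenizer.
teacher.save_pretrained("L6_bottom")
AutoTokenizer.from_pretrained(teacher_id).save_pretrained("L6_bottom")

# Wrap with mean pooling so the model produces sentence embeddings.
word = models.Transformer("L6_bottom")
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
SentenceTransformer(modules=[word, pool]).save("L6_bottom")
```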

+ A distilled version of this model, with improved performance, is also available.

+ ## Compression Summary

+ | Stage | Vocab | Layers | Size |
+ |-------|-------|--------|------|
+ | Teacher (original) | 250,002 | 12 | ~480MB |
+ | + Layer pruning | 250,002 | 6 | ~407MB |
+ | + Vocab pruning | ~38,330 | 6 | ~98MB |
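The vocab-pruning stage amounts to slicing the embedding matrix down to the rows of the surviving tokens. A rough sketch, assuming a precomputed `kept_ids.json` (hypothetical file) with the sorted token ids retained by corpus filtering:

```python
import json
import torch
from transformers import AutoModel

# Start from the layer-pruned, still 250K-vocab checkpoint (hypothetical path).
model = AutoModel.from_pretrained("L6_bottom_layer_pruned")

kept_ids = sorted(json.load(open("kept_ids.json")))  # ~38K of the 250K ids

# Slice the embedding matrix down to the surviving rows.
old_emb = model.get_input_embeddings().weight.data
model.set_input_embeddings(
    torch.nn.Embedding.from_pretrained(old_emb[kept_ids].clone(), freeze=False)
)
model.config.vocab_size = len(kept_ids)
model.save_pretrained("L6_bottom")

# Old-id -> new-id mapping, analogous to the id_map.json added in this commit.
id_map = {int(old_id): new_id for new_id, old_id in enumerate(kept_ids)}
with open("id_map.json", "w") as f:
    json.dump(id_map, f)

# Note: the tokenizer itself must be rebuilt to emit the new ids (hence the much
# smaller tokenizer.json in this commit); that step is tokenizer-specific.
```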

  ## Limitations

+ - Vocabulary pruning restricts the model to the 18 target languages
  - Designed for short dialogue utterances, not long documents
+ - Layer pruning may reduce performance on complex semantic tasks
config.json CHANGED
@@ -21,5 +21,5 @@
  "transformers_version": "4.56.2",
  "type_vocab_size": 2,
  "use_cache": true,
- "vocab_size": 250037
+ "vocab_size": 38330
  }
config_sentence_transformers.json CHANGED
@@ -3,7 +3,7 @@
  "__version__": {
  "sentence_transformers": "5.3.0",
  "transformers": "4.56.2",
- "pytorch": "2.10.0+cpu"
+ "pytorch": "2.10.0+cu128"
  },
  "prompts": {
  "query": "",
id_map.json ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d48bea38209b27ea02c4f79948b58ed600a6f46353aed9774b87f99b963b61ba
- size 428039432
+ oid sha256:75aade5a2325bfa6346cc282b70cbad0525ffc5add0ef159448f2df61b1260e7
+ size 102857288
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
- size 17082987
+ oid sha256:0ab1d8ad18d647b10254a627ba87f4f8dac8aea96ca026510f5f883fe2e6532e
+ size 2816831
tokenizer_config.json CHANGED
@@ -32,7 +32,7 @@
  "single_word": false,
  "special": true
  },
- "250001": {
+ "38329": {
  "content": "<mask>",
  "lstrip": true,
  "normalized": false,