INFINITY1023 committed on
Commit b7b4d6e · verified · 1 Parent(s): 2e43dd6

Update README.md

Files changed (1)
  1. README.md +186 -0
README.md CHANGED
@@ -1,3 +1,189 @@
  ---
  license: mit
+ tags:
+ - pytorch
+ - nlp
+ - nlu
+ - text-classification
+ - intent-classification
+ - multilingual
+ - driver-commands
+ - fine-tuned
+ - encoder-only
+ - decoder-only
+ language:
+ - ru
+ - en
+ datasets:
+ - INFINITY1023/MultilingualDriverCommands
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ pipeline_tag: text-classification
+ pretty_name: Multilingual Driver Command Models
  ---
+
+ # Multilingual Driver Command Models
+
+ ## Model Summary
+
+ This repository contains **four fine-tuned models** for multilingual driver command intent classification.
+
+ The models were trained to classify short driver phrases in **Russian** and **English** into intent classes for an in-car voice assistant.
+
+ The repository is linked to the dataset:
+
+ - [`INFINITY1023/MultilingualDriverCommands`](https://huggingface.co/datasets/INFINITY1023/MultilingualDriverCommands)
+
+ ## Models
+
+ | Model | Architecture Type | Description |
+ |---|---|---|
+ | `bge-m3` | Encoder-only | Multilingual encoder model |
+ | `e5-multilingual` | Encoder-only | Semantic multilingual encoder |
+ | `mmBERT-base` | Encoder-only | Compact multilingual BERT-style baseline |
+ | `gte-Qwen2-7B-instruct` | Decoder-only | Instruction-tuned decoder model adapted for classification |
+
+ ## Task
+
+ The models solve a **multiclass intent classification** task:
+
+ > Given a short driver phrase, predict the corresponding intent class.
+
+ Example inputs:
+
+ - `Set the temperature to twenty two`
+ - `Turn on Bluetooth audio`
+ - `Позвони маме` ("Call mom")
+ - `Включи обогрев сиденья` ("Turn on the seat heating")
+ - `Построй маршрут до дома` ("Plan a route home")
+
+ Possible intent categories include climate control, navigation, media, calls, phone connection, lighting, seat control, cruise control, and other vehicle assistant actions.
+
+ ## Training Dataset
+
+ The models were trained on the **Multilingual Driver Commands Dataset**.
+
+ Dataset characteristics:
+
+ | Property | Value |
+ |---|---:|
+ | Dataset size | 153,062 examples |
+ | Languages | Russian + English |
+ | Language distribution | 50% RU / 50% EN |
+ | Final number of intents | 64 |
+ | Task | Intent classification |
+
+ The dataset was synthetically generated, manually validated, balanced across classes, and enriched with rare driving-related scenarios.
+
+ ## Experimental Results
+
+ The following results were obtained on the test set after class balancing and merging semantically overlapping intents into 64 final classes.
+
+ | Model | Accuracy | Macro F1 | Macro Precision | Macro Recall |
+ |---|---:|---:|---:|---:|
+ | `e5-multilingual-base` | 0.864 | 0.862 | 0.868 | 0.859 |
+ | `mmBERT-base` | 0.857 | 0.854 | 0.859 | 0.853 |
+ | `bge-m3` | 0.868 | 0.863 | 0.868 | 0.864 |
+ | `gte-Qwen2-7B-instruct` | 0.872 | 0.870 | 0.878 | 0.865 |
+
+ A separate experiment with stronger intent merging into 45 classes showed that `gte-Qwen2-7B-instruct` reached **0.905 accuracy**, but this reduced the functional granularity of the assistant.
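
For reference, the macro-averaged columns in the table above weight every intent class equally, regardless of how many test examples it has. The following is a minimal sketch of that computation in plain Python. The function name `macro_scores` and the toy labels are illustrative assumptions, not the evaluation code used for this repository.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # pred was assigned to an example of another class
            fn[true] += 1  # the true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for illustration only; they are not taken from the dataset.
y_true = ["climate", "climate", "navigation", "calls"]
y_pred = ["climate", "navigation", "navigation", "calls"]
accuracy, macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

Under this common convention, macro F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the macro precision and macro recall columns.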
+
+ ## Main Findings
+
+ The experiments show that larger models do not always provide a proportional improvement for short command classification.
+
+ Although `gte-Qwen2-7B-instruct` is much larger than `bge-m3`, the quality gap between them was small: 0.872 vs. 0.868 accuracy and 0.870 vs. 0.863 macro F1. This suggests that, for this task, quality is limited not only by model size but also by:
+
+ - class taxonomy;
+ - semantic overlap between intents;
+ - synthetic data noise;
+ - incomplete or noisy parameter fields;
+ - dataset structure and balance.
+
+ For practical deployment, a smaller encoder-based model such as `bge-m3` may be more efficient, since it provides competitive quality at a lower computational cost.
+
+ ## Repository Structure
+
+ Recommended repository structure:
+
+ ```text
+ best_models/
+ ├── bge-m3/
+ │   └── model.pt
+ ├── e5-multilingual/
+ │   └── model.pt
+ ├── mmBERT-base/
+ │   └── model.pt
+ └── qwen2/
+     └── model.pt
+ ```
+
+ If the checkpoints are saved as PyTorch `state_dict` files, the model architecture code is required to load them correctly.
+
+ ## Loading PyTorch Checkpoints
+
+ Example loading pattern:
+
+ ```python
+ import torch
+
+ # Example only: replace MyModel with the corresponding architecture class.
+ from model import MyModel
+
+ model = MyModel(...)
+ # weights_only=True restricts unpickling to tensors and plain containers.
+ state_dict = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=True)
+ model.load_state_dict(state_dict)
+ model.eval()  # disable dropout and other training-only behavior
+ ```
+
+ If a checkpoint was saved as a full PyTorch model object rather than a `state_dict`, it can be loaded as:
+
+ ```python
+ import torch
+
+ # Since PyTorch 2.6, torch.load defaults to weights_only=True, which rejects
+ # pickled model objects. Pass weights_only=False only for trusted checkpoints,
+ # and make sure the model class definition is importable.
+ model = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=False)
+ model.eval()
+ ```
+
+ The exact loading method depends on how the checkpoint was saved during training.
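
Because the saving convention is not stated here, one way to handle both cases is to inspect the object that `torch.load` returns. The following is a minimal sketch, assuming a checkpoint is either a mapping (a `state_dict`, possibly nested under a key) or a pickled module. The helper name `checkpoint_kind` and the wrapper key `model_state_dict` are hypothetical conventions, not something this repository documents.

```python
from collections.abc import Mapping


def checkpoint_kind(obj, wrapper_key="model_state_dict"):
    """Classify an object returned by torch.load.

    wrapper_key is a hypothetical key that some training scripts use to nest
    the weights next to optimizer state; adjust or ignore it as needed.
    """
    if isinstance(obj, Mapping):
        if wrapper_key in obj and isinstance(obj[wrapper_key], Mapping):
            return "wrapped_state_dict"
        return "state_dict"
    # Anything that is not a mapping is treated as a pickled model object.
    return "full_model"


# Plain Python stand-ins for loaded objects, so the sketch runs without PyTorch.
print(checkpoint_kind({"classifier.weight": [0.1, 0.2]}))
print(checkpoint_kind({"model_state_dict": {"classifier.weight": [0.1]}, "epoch": 3}))
print(checkpoint_kind(object()))
```

A `state_dict` result means the architecture class must be instantiated first and filled with `load_state_dict`, while a `full_model` result can be used directly after calling `eval()`.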
+
+ ## Intended Use
+
+ These models are intended for:
+
+ - educational experiments;
+ - research on synthetic NLU datasets;
+ - multilingual intent classification;
+ - comparison of encoder-only and decoder-only architectures;
+ - prototyping voice assistant command recognition.
+
+ ## Limitations
+
+ The models were trained on a synthetic dataset, so performance on natural user traffic may differ from the reported test results.
+
+ Known limitations:
+
+ - possible sensitivity to synthetic generation style;
+ - errors on semantically close intents;
+ - dependence on data quality and intent taxonomy;
+ - limited robustness to real-world noise, slang, ASR errors, and incomplete phrases;
+ - potential confusion between intents with similar surface forms.
+
+ For production use, the models should be evaluated on real driver commands and monitored for data drift.
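
One simple way to monitor drift is to compare the distribution of predicted intents on recent traffic against a reference window. The following is a minimal sketch using total variation distance. The function names, the intent labels, and the alert threshold are illustrative assumptions, not part of this repository.

```python
from collections import Counter


def intent_distribution(predictions):
    """Map each predicted intent to its share of the predictions."""
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {intent: n / total for intent, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions, in [0, 1]."""
    intents = set(p) | set(q)
    return 0.5 * sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in intents)


# Toy prediction logs; real monitoring would use much larger windows.
reference = intent_distribution(["climate", "navigation", "calls", "media"])
recent = intent_distribution(["climate", "climate", "climate", "media"])
drift = total_variation(reference, recent)

ALERT_THRESHOLD = 0.2  # hypothetical value, to be tuned on real traffic
if drift > ALERT_THRESHOLD:
    print(f"possible drift: total variation distance = {drift:.2f}")
```

A shift in predicted intents does not prove that accuracy has dropped, so a check like this would complement, not replace, evaluation on labeled real driver commands.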
+
+ ## Citation
+
+ If you use these checkpoints, please cite or reference this repository:
+
+ ```bibtex
+ @misc{multilingual-driver-command-models,
+   title = {Multilingual Driver Command Models},
+   author = {Nizhankovskiy, Ilya},
+   year = {2026},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/INFINITY1023/multilingual-driver-command-models}}
+ }
+ ```