---
license: mit
tags:
- pytorch
- nlp
- nlu
- text-classification
- intent-classification
- multilingual
- driver-commands
- fine-tuned
- encoder-only
- decoder-only
language:
- ru
- en
datasets:
- INFINITY1023/MultilingualDriverCommands
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
pretty_name: Multilingual Driver Command Models
---

# Multilingual Driver Command Models

## Model Summary

This repository contains **four fine-tuned models** for multilingual driver command intent classification.

The models were trained to classify short driver phrases in **Russian** and **English** into intent classes for an in-car voice assistant.

The repository is linked to the dataset:

- [`INFINITY1023/MultilingualDriverCommands`](https://huggingface.co/datasets/INFINITY1023/MultilingualDriverCommands)

## Models

| Model | Architecture Type | Description |
|---|---|---|
| `bge-m3` | Encoder-only | Multilingual encoder model |
| `e5-multilingual` | Encoder-only | Semantic multilingual encoder |
| `mmBERT-base` | Encoder-only | Compact multilingual BERT-style baseline |
| `gte-Qwen2-7B-instruct` | Decoder-only | Instruction-tuned decoder model adapted for classification |

## Task

The models solve a **multiclass intent classification** task:

> Given a short driver phrase, predict the corresponding intent class.

Example inputs:

- `Set the temperature to twenty two`
- `Turn on Bluetooth audio`
- `Позвони маме`
- `Включи обогрев сиденья`
- `Построй маршрут до дома`

Possible intent categories include climate control, navigation, media, calls, phone connection, lighting, seat control, cruise control, and other vehicle assistant actions.
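
As a minimal sketch of the final prediction step, the snippet below turns a vector of classifier logits into ranked intent labels. The label names in `ID2LABEL`, the logit values, and the helper `top_intents` are illustrative assumptions; the actual 64-class label mapping comes from the dataset and is not reproduced here.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_intents(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical label mapping and made-up logits, for illustration only.
ID2LABEL = {0: "climate_control", 1: "navigation", 2: "calls"}
logits = [0.2, 3.1, -1.0]

predictions = top_intents(logits, ID2LABEL, k=2)
for label, prob in predictions:
    print(f"{label}: {prob:.3f}")
```

Returning the top few candidates rather than a single label can help when two intents are semantically close, since the application can then ask the driver to confirm.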

## Training Dataset

The models were trained on the **Multilingual Driver Commands** dataset.

Dataset characteristics:

| Property | Value |
|---|---:|
| Dataset size | 153,062 examples |
| Languages | Russian + English |
| Language distribution | 50% RU / 50% EN |
| Final number of intents | 64 |
| Task | Intent classification |

The dataset was synthetically generated, manually validated, balanced across classes, and enriched with rare driving-related scenarios.

## Experimental Results

The following results were obtained on the test set after class balancing and merging semantically overlapping intents into 64 final classes.

| Model | Accuracy | Macro F1 | Macro Precision | Macro Recall |
|---|---:|---:|---:|---:|
| `e5-multilingual-base` | 0.864 | 0.862 | 0.868 | 0.859 |
| `mmBERT-base` | 0.857 | 0.854 | 0.859 | 0.853 |
| `bge-m3` | 0.868 | 0.863 | 0.868 | 0.864 |
| `gte-Qwen2-7B-instruct` | 0.872 | 0.870 | 0.878 | 0.865 |

In a separate experiment that merged intents more aggressively, into 45 classes, `gte-Qwen2-7B-instruct` reached **0.905 accuracy**, at the cost of coarser functional granularity for the assistant.
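
For readers who want to reproduce this kind of evaluation, here is a minimal sketch of how accuracy and macro-averaged precision, recall, and F1 can be computed, and of why merging overlapping intents tends to raise accuracy. It is not the evaluation script used for the table above. The intent names, the toy predictions, and the `MERGE_MAP` mapping are hypothetical, and classes with no predictions are scored as 0, which is one common convention.

```python
from collections import Counter

def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,  # mean of per-class F1 scores
    }

# Toy data with hypothetical intent names.
y_true = ["climate_set_temp", "climate_seat_heat", "nav_route", "call_contact"]
y_pred = ["climate_seat_heat", "climate_seat_heat", "nav_route", "call_contact"]
scores = macro_scores(y_true, y_pred)

# Hypothetical merge of two overlapping climate intents into one class.
MERGE_MAP = {"climate_set_temp": "climate", "climate_seat_heat": "climate"}
merged_true = [MERGE_MAP.get(label, label) for label in y_true]
merged_pred = [MERGE_MAP.get(label, label) for label in y_pred]
merged_scores = macro_scores(merged_true, merged_pred)

print(scores["accuracy"], merged_scores["accuracy"])
```

In the toy data, the only error is a confusion between two climate intents, so merging them removes the error entirely. The same effect explains why a coarser taxonomy scores higher while telling the assistant less about what to do.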

## Main Findings

The experiments show that larger models do not always provide a proportional improvement for short command classification.

Although `gte-Qwen2-7B-instruct` is much larger than `bge-m3`, it leads by only 0.004 in accuracy (0.872 vs 0.868) and 0.007 in macro F1 (0.870 vs 0.863). This suggests that, for this task, quality is limited not only by model size but also by:

- class taxonomy;
- semantic overlap between intents;
- synthetic data noise;
- incomplete or noisy parameter fields;
- dataset structure and balance.

For practical deployment, a smaller encoder-based model such as `bge-m3` may be more efficient, since it provides competitive quality with lower computational cost.

## Repository Structure

Recommended repository structure:

```text
best_models/
├── bge-m3/
│   └── model.pt
├── e5-multilingual/
│   └── model.pt
├── mmBERT-base/
│   └── model.pt
└── qwen2/
    └── model.pt
```

If the checkpoints are saved as PyTorch `state_dict` files, the model architecture code is required to load them correctly.

## Loading PyTorch Checkpoints

Example loading pattern:

```python
import torch

# Example only: replace MyModel with the corresponding architecture class.
from model import MyModel

model = MyModel(...)
# weights_only=True restricts unpickling to tensors and basic containers.
state_dict = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()
```

If a checkpoint was saved as a full PyTorch model object rather than a `state_dict`, it can be loaded as:

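```python
import torch

# The model class must be importable for unpickling to succeed.
# PyTorch 2.6+ defaults to weights_only=True, so a full model object needs
# weights_only=False. Unpickling can run arbitrary code: trusted files only.
model = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=False)
model.eval()
```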

The exact loading method depends on how the checkpoint was saved during training.
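
When it is unclear how a checkpoint was saved, one option is to inspect the object returned by `torch.load` before using it. The helper below is a hedged sketch: `checkpoint_kind` is a hypothetical name, and the `"state_dict"` and `"model_state_dict"` keys are common conventions in training scripts, not something this repository is known to use.

```python
from collections.abc import Mapping

def checkpoint_kind(obj):
    """Heuristically classify an object returned by torch.load."""
    if isinstance(obj, Mapping):
        # Training scripts often wrap the weights together with optimizer
        # state or metadata; these key names are assumed conventions.
        if "state_dict" in obj or "model_state_dict" in obj:
            return "wrapped_state_dict"
        # A plain state_dict maps parameter names to tensors.
        return "state_dict"
    if callable(getattr(obj, "state_dict", None)):
        # Full model objects (nn.Module instances) expose state_dict().
        return "module"
    return "unknown"

# Stand-ins for loaded objects, so the sketch runs without PyTorch installed.
class FakeModule:
    def state_dict(self):
        return {}

print(checkpoint_kind({"encoder.weight": [0.1, 0.2]}))
print(checkpoint_kind({"model_state_dict": {}, "epoch": 3}))
print(checkpoint_kind(FakeModule()))
```

With a real checkpoint, the argument would be the result of `torch.load(...)`. A `"state_dict"` result is passed to `load_state_dict` on an instantiated architecture, while a `"module"` result can be used directly.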

## Intended Use

These models are intended for:

- educational experiments;
- research on synthetic NLU datasets;
- multilingual intent classification;
- comparison of encoder-only and decoder-only architectures;
- prototyping voice assistant command recognition.

## Limitations

The models were trained on a synthetic dataset, so performance on natural user traffic may differ from the reported test results.

Known limitations:

- possible sensitivity to synthetic generation style;
- errors on semantically close intents;
- dependence on data quality and intent taxonomy;
- limited robustness to real-world noise, slang, ASR errors, and incomplete phrases;
- potential confusion between intents with similar surface forms.

For production use, the models should be evaluated on real driver commands and monitored for data drift.
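
One simple way to watch for drift, sketched below under assumptions, is to compare the distribution of predicted intents in live traffic against a reference window using total variation distance. The intent names, the sample windows, and any alerting threshold are hypothetical; this is an illustration of the idea rather than a recommended monitoring setup.

```python
from collections import Counter

def intent_distribution(predictions):
    """Map each predicted intent to its relative frequency."""
    counts = Counter(predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)

# Made-up prediction windows with hypothetical intent names.
reference = ["navigation"] * 5 + ["calls"] * 5
live = ["navigation"] * 8 + ["calls"] * 2

drift = total_variation(intent_distribution(reference), intent_distribution(live))
print(f"total variation distance: {drift:.2f}")
```

A distance near 0 means the mix of predicted intents is stable. A rising value signals that either user behavior or model behavior has shifted and that a fresh evaluation on labeled real commands is worth running.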

## Citation

If you use these checkpoints, please cite or reference this repository:

```bibtex
@misc{multilingual-driver-command-models,
  title        = {Multilingual Driver Command Models},
  author       = {Nizhankovskiy, Ilya},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/INFINITY1023/multilingual-driver-command-models}}
}
```