 ---
 license: mit
+tags:
+- pytorch
+- nlp
+- nlu
+- text-classification
+- intent-classification
+- multilingual
+- driver-commands
+- fine-tuned
+- encoder-only
+- decoder-only
+language:
+- ru
+- en
+datasets:
+- INFINITY1023/MultilingualDriverCommands
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+pipeline_tag: text-classification
+pretty_name: Multilingual Driver Command Models
 ---
+# Multilingual Driver Command Models
+## Model Summary
+This repository contains **four fine-tuned models** for multilingual driver command intent classification.
+The models were trained to classify short driver phrases in **Russian** and **English** into intent classes for an in-car voice assistant.
+The models were trained on the following dataset:
+- [`INFINITY1023/MultilingualDriverCommands`](https://huggingface.co/datasets/INFINITY1023/MultilingualDriverCommands)
+## Models
+| Model | Architecture Type | Description |
+|---|---|---|
+| `bge-m3` | Encoder-only | Multilingual encoder model |
+| `e5-multilingual` | Encoder-only | Semantic multilingual encoder |
+| `mmBERT-base` | Encoder-only | Compact multilingual BERT-style baseline |
+| `gte-Qwen2-7B-instruct` | Decoder-only | Instruction-tuned decoder model adapted for classification |
+## Task
+The models solve a **multiclass intent classification** task:
+> Given a short driver phrase, predict the corresponding intent class.
+
+Example inputs:
+- `Set the temperature to twenty two`
+- `Turn on Bluetooth audio`
+- `Позвони маме` ("Call mom")
+- `Включи обогрев сиденья` ("Turn on the seat heating")
+- `Построй маршрут до дома` ("Plot a route home")
+
+Possible intent categories include climate control, navigation, media, calls, phone connection, lighting, seat control, cruise control, and other vehicle assistant actions.
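The task statement above comes down to picking the highest-scoring class. A minimal sketch of that final step, assuming the classifier returns one raw score (logit) per intent. The label names and scores are hypothetical and are not taken from the dataset:

```python
import math

# Hypothetical label subset for illustration; the real models use 64 intent classes.
INTENT_LABELS = ["climate_control", "navigation", "calls"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(logits, labels=INTENT_LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up scores for a phrase such as "Позвони маме".
label, confidence = predict_intent([0.2, -1.0, 3.1])
print(label, round(confidence, 3))
```

In a real pipeline the logits would come from the fine-tuned classification head, and the label list would be the 64-class taxonomy used during training.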
+## Training Dataset
+The models were trained on the **Multilingual Driver Commands** dataset.
+Dataset characteristics:
+| Property | Value |
+|---|---:|
+| Dataset size | 153,062 examples |
+| Languages | Russian + English |
+| Language distribution | 50% RU / 50% EN |
+| Final number of intents | 64 |
+| Task | Intent classification |
+The dataset was synthetically generated, manually validated, balanced across classes, and enriched with rare driving-related scenarios.
+## Experimental Results
+The following results were obtained on the test set after class balancing and merging semantically overlapping intents into 64 final classes.
+| Model | Accuracy | Macro F1 | Macro Precision | Macro Recall |
+|---|---:|---:|---:|---:|
+| `e5-multilingual-base` | 0.864 | 0.862 | 0.868 | 0.859 |
+| `mmBERT-base` | 0.857 | 0.854 | 0.859 | 0.853 |
+| `bge-m3` | 0.868 | 0.863 | 0.868 | 0.864 |
+| `gte-Qwen2-7B-instruct` | 0.872 | 0.870 | 0.878 | 0.865 |
+A separate experiment that merged intents more aggressively, into 45 classes, raised `gte-Qwen2-7B-instruct` to **0.905 accuracy**, but at the cost of coarser functional granularity for the assistant.
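The card does not state how the macro metrics were computed. The sketch below assumes the common definition, in which precision, recall, and F1 are computed per class and then averaged without weighting (the behavior of `average="macro"` in scikit-learn). The toy labels are hypothetical:

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy plus unweighted per-class means of precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy example with three hypothetical intents.
y_true = ["calls", "calls", "navigation", "navigation", "media", "media"]
y_pred = ["calls", "navigation", "navigation", "navigation", "media", "calls"]
metrics = macro_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under this definition macro F1 is the mean of per-class F1 scores, not the harmonic mean of macro precision and macro recall, so the three macro columns need not satisfy the usual F1 identity.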
+## Main Findings
+The experiments show that a larger model does not necessarily bring a proportional improvement in short command classification.
+Although `gte-Qwen2-7B-instruct` is much larger than `bge-m3`, it leads by only 0.004 in accuracy (0.872 vs. 0.868) and 0.007 in macro F1 (0.870 vs. 0.863). This suggests that, for this task, quality is limited not only by model size but also by:
+- class taxonomy;
+- semantic overlap between intents;
+- synthetic data noise;
+- incomplete or noisy parameter fields;
+- dataset structure and balance.
+
+For practical deployment, a smaller encoder-based model such as `bge-m3` may be more efficient, since it provides competitive quality with lower computational cost.
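The size of these gaps can be reproduced from the results table. The short sketch below uses only the accuracy and macro F1 values reported above; the dictionary layout and the helper name `gap_to` are illustrative choices:

```python
# Test-set metrics copied from the results table (64 classes).
RESULTS = {
    "e5-multilingual-base": {"accuracy": 0.864, "macro_f1": 0.862},
    "mmBERT-base": {"accuracy": 0.857, "macro_f1": 0.854},
    "bge-m3": {"accuracy": 0.868, "macro_f1": 0.863},
    "gte-Qwen2-7B-instruct": {"accuracy": 0.872, "macro_f1": 0.870},
}


def gap_to(reference, results=RESULTS):
    """Return how far each other model trails `reference` on every metric."""
    ref = results[reference]
    return {
        name: {metric: round(ref[metric] - value, 3) for metric, value in scores.items()}
        for name, scores in results.items()
        if name != reference
    }


gaps = gap_to("gte-Qwen2-7B-instruct")
# List the encoders from the smallest to the largest accuracy gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]["accuracy"]):
    print(f"{name}: accuracy -{gap['accuracy']:.3f}, macro F1 -{gap['macro_f1']:.3f}")
```

All three encoders trail the decoder-only model by less than 0.02 on both metrics, which is the basis for the deployment recommendation above.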
+## Repository Structure
+Recommended repository structure:
+```text
+best_models/
+├── bge-m3/
+│   └── model.pt
+├── e5-multilingual/
+│   └── model.pt
+├── mmBERT-base/
+│   └── model.pt
+└── qwen2/
+    └── model.pt
+```
+If the checkpoints are saved as PyTorch `state_dict` files, the model architecture code is required to load them correctly.
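Because the directory names differ from some of the model names in the tables, a small lookup can keep paths consistent. This is a sketch under the recommended layout above. The mapping of `gte-Qwen2-7B-instruct` to `qwen2/` is an assumption inferred from that layout, and `checkpoint_path` is a hypothetical helper:

```python
from pathlib import Path

# Directory names follow the recommended layout; adjust them if the repository differs.
# Assumption: the qwen2/ directory holds the gte-Qwen2-7B-instruct checkpoint.
CHECKPOINT_DIRS = {
    "bge-m3": "bge-m3",
    "e5-multilingual": "e5-multilingual",
    "mmBERT-base": "mmBERT-base",
    "gte-Qwen2-7B-instruct": "qwen2",
}


def checkpoint_path(model_name, root="best_models"):
    """Return the expected checkpoint path for a model name from the Models table."""
    try:
        subdir = CHECKPOINT_DIRS[model_name]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINT_DIRS))
        raise ValueError(f"Unknown model {model_name!r}; expected one of: {known}") from None
    return Path(root) / subdir / "model.pt"


print(checkpoint_path("gte-Qwen2-7B-instruct").as_posix())
```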
+## Loading PyTorch Checkpoints
+Example loading pattern:
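+```python
+import torch
+
+# Example only: replace MyModel with the corresponding architecture class
+# and pass the same constructor arguments that were used during training.
+from model import MyModel
+
+model = MyModel(...)
+# weights_only=True restricts unpickling to tensors and plain containers.
+state_dict = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=True)
+model.load_state_dict(state_dict)
+model.eval()  # switch off dropout and other training-only behavior
+```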
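+If a checkpoint was saved as a full PyTorch model object rather than a `state_dict`, it can be loaded as shown below. PyTorch 2.6 and later default to `weights_only=True`, so unpickling a full model requires `weights_only=False`. This executes arbitrary pickled code, so use it only with checkpoints from a trusted source.
+```python
+import torch
+# The model's class definition must be importable when the object is unpickled.
+model = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=False)
+model.eval()
+```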
+The exact loading method depends on how the checkpoint was saved during training.
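When it is unclear how a checkpoint was saved, the loaded object itself can be inspected. The heuristic below is a sketch, not part of the original training code. `checkpoint_kind` is a hypothetical helper, and the wrapper keys it checks are common conventions rather than anything this repository guarantees:

```python
from collections.abc import Mapping


def checkpoint_kind(obj):
    """Heuristically classify an object returned by torch.load."""
    if isinstance(obj, Mapping):
        # Some training scripts wrap the weights, e.g. {"state_dict": ..., "epoch": ...}.
        for key in ("state_dict", "model_state_dict"):
            if isinstance(obj.get(key), Mapping):
                return "wrapped_state_dict"
        return "state_dict"
    # Full model objects (torch.nn.Module instances) expose a state_dict() method.
    if callable(getattr(obj, "state_dict", None)):
        return "full_model"
    return "unknown"


# Stand-ins for loaded checkpoints; real values would be tensors.
print(checkpoint_kind({"classifier.weight": [0.1, 0.2]}))
print(checkpoint_kind({"state_dict": {"classifier.weight": [0.1]}, "epoch": 3}))
```

In practice `obj` would be the return value of `torch.load(path, map_location="cpu")`. A `state_dict` result goes to `model.load_state_dict`, while a full model object can be used directly after `model.eval()`.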
+## Intended Use
+These models are intended for:
+- educational experiments;
+- research on synthetic NLU datasets;
+- multilingual intent classification;
+- comparison of encoder-only and decoder-only architectures;
+- prototyping voice assistant command recognition.
+## Limitations
+The models were trained on a synthetic dataset, so their performance on natural user traffic may differ from the test-set results reported here.
+Known limitations:
+- possible sensitivity to synthetic generation style;
+- errors on semantically close intents;
+- dependence on data quality and intent taxonomy;
+- limited robustness to real-world noise, slang, ASR errors, and incomplete phrases;
+- potential confusion between intents with similar surface forms.
+
+For production use, the models should be evaluated on real driver commands and monitored for data drift.
+## Citation
+If you use these checkpoints, please cite or reference this repository:
+```bibtex
+@misc{multilingual-driver-command-models,
+  title        = {Multilingual Driver Command Models},
+  author       = {Nizhankovskiy, Ilya},
+  year         = {2026},
+  publisher    = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/INFINITY1023/multilingual-driver-command-models}}
+}
+```