---
license: mit
tags:
- pytorch
- nlp
- nlu
- text-classification
- intent-classification
- multilingual
- driver-commands
- fine-tuned
- encoder-only
- decoder-only
language:
- ru
- en
datasets:
- INFINITY1023/MultilingualDriverCommands
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
pretty_name: Multilingual Driver Command Models
---

# Multilingual Driver Command Models

## Model Summary

This repository contains **four fine-tuned models** for multilingual driver command intent classification. The models were trained to classify short driver phrases in **Russian** and **English** into intent classes for an in-car voice assistant.

The repository is linked to the following dataset:

- [`INFINITY1023/MultilingualDriverCommands`](https://huggingface.co/datasets/INFINITY1023/MultilingualDriverCommands)

## Models

| Model | Architecture Type | Description |
|---|---|---|
| `bge-m3` | Encoder-only | Multilingual encoder model |
| `e5-multilingual` | Encoder-only | Semantic multilingual encoder |
| `mmBERT-base` | Encoder-only | Compact multilingual BERT-style baseline |
| `gte-Qwen2-7B-instruct` | Decoder-only | Instruction-tuned decoder model adapted for classification |

## Task

The models solve a **multiclass intent classification** task:

> Given a short driver phrase, predict the corresponding intent class.

Example inputs:

- `Set the temperature to twenty two`
- `Turn on Bluetooth audio`
- `Позвони маме`
- `Включи обогрев сиденья`
- `Построй маршрут до дома`

Possible intent categories include climate control, navigation, media, calls, phone connection, lighting, seat control, cruise control, and other vehicle assistant actions.

## Training Dataset

The models were trained on the **Multilingual Driver Commands Dataset**.
Dataset characteristics:

| Property | Value |
|---|---:|
| Dataset size | 153,062 examples |
| Languages | Russian + English |
| Language distribution | 50% RU / 50% EN |
| Final number of intents | 64 |
| Task | Intent classification |

The dataset was synthetically generated, manually validated, balanced across classes, and enriched with rare driving-related scenarios.

## Experimental Results

The following results were obtained on the test set after class balancing and merging semantically overlapping intents into 64 final classes.

| Model | Accuracy | Macro F1 | Macro Precision | Macro Recall |
|---|---:|---:|---:|---:|
| `e5-multilingual-base` | 0.864 | 0.862 | 0.868 | 0.859 |
| `mmBERT-base` | 0.857 | 0.854 | 0.859 | 0.853 |
| `bge-m3` | 0.868 | 0.863 | 0.868 | 0.864 |
| `gte-Qwen2-7B-instruct` | 0.872 | 0.870 | 0.878 | 0.865 |

In a separate experiment with more aggressive intent merging (45 classes), `gte-Qwen2-7B-instruct` reached **0.905 accuracy**, but the coarser taxonomy reduced the functional granularity of the assistant.

## Main Findings

The experiments show that larger models do not always provide a proportional improvement for short command classification. Although `gte-Qwen2-7B-instruct` is much larger than `bge-m3`, the gap between them was small: 0.872 vs. 0.868 accuracy and 0.870 vs. 0.863 macro F1.

This suggests that, for this task, quality is limited not only by model size but also by:

- class taxonomy;
- semantic overlap between intents;
- synthetic data noise;
- incomplete or noisy parameter fields;
- dataset structure and balance.

For practical deployment, a smaller encoder-based model such as `bge-m3` may be the more efficient choice, since it provides competitive quality at lower computational cost.
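The macro-averaged scores reported in the results table give every intent the same weight regardless of how many test examples it has. The evaluation script is not part of this card, so the following is only a minimal sketch of how such scores could be computed, assuming standard unweighted macro averaging with undefined ratios counted as zero. The function name `macro_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and unweighted macro precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[true_label] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / pr_sum if pr_sum else 0.0)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy example with hypothetical intent names (not taken from the dataset).
y_true = ["climate", "climate", "navigation", "calls"]
y_pred = ["climate", "navigation", "navigation", "calls"]
print(macro_metrics(y_true, y_pred))
```

Because each class contributes equally, a handful of frequently confused intents can lower macro F1 noticeably even when overall accuracy stays high, which is one reason both numbers are worth reporting for a 64-class taxonomy.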
## Repository Structure

Recommended repository structure:

```text
best_models/
├── bge-m3/
│   └── model.pt
├── e5-multilingual/
│   └── model.pt
├── mmBERT-base/
│   └── model.pt
└── qwen2/
    └── model.pt
```

If the checkpoints are saved as PyTorch `state_dict` files, the model architecture code is required to load them correctly.

## Loading PyTorch Checkpoints

Example loading pattern:

```python
import torch

# Example only: replace MyModel with the corresponding architecture class.
from model import MyModel

model = MyModel(...)  # pass the same constructor arguments used during training
# weights_only=True restricts unpickling to tensors and other plain data.
state_dict = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode
```

If a checkpoint was saved as a full PyTorch model object rather than a `state_dict`, it can be loaded as shown below. This requires the original class definitions to be importable and, because PyTorch 2.6 and later default to `weights_only=True`, an explicit `weights_only=False`. Only do this for checkpoints from a trusted source, since unpickling can execute arbitrary code.

```python
import torch

# Full-object checkpoint: unpickles the whole model, so load trusted files only.
model = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=False)
model.eval()
```

The exact loading method depends on how the checkpoint was saved during training.

## Intended Use

These models are intended for:

- educational experiments;
- research on synthetic NLU datasets;
- multilingual intent classification;
- comparison of encoder-only and decoder-only architectures;
- prototyping voice assistant command recognition.

## Limitations

The models were trained on a synthetic dataset, so real-world performance may differ when they are applied to natural user traffic.

Known limitations:

- possible sensitivity to synthetic generation style;
- errors on semantically close intents;
- dependence on data quality and intent taxonomy;
- limited robustness to real-world noise, slang, ASR errors, and incomplete phrases;
- potential confusion between intents with similar surface forms.

For production use, the models should be evaluated on real driver commands and monitored for data drift.
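One lightweight way to watch for the data drift mentioned above is to compare how often each intent is predicted on a reference set against a recent window of live traffic. The sketch below is illustrative only and is not part of this repository: it uses total variation distance between the two predicted-intent distributions, and the function names, intent labels, and the `DRIFT_THRESHOLD` value are all hypothetical choices that would need tuning on real data.

```python
from collections import Counter


def intent_distribution(predictions):
    """Map each predicted intent to its relative frequency."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    total = len(predictions)
    return {intent: count / total for intent, count in counts.items()}


def total_variation_distance(reference, current):
    """Return a value in [0, 1]; 0 means identical intent distributions."""
    ref = intent_distribution(reference)
    cur = intent_distribution(current)
    intents = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(i, 0.0) - cur.get(i, 0.0)) for i in intents)


# Hypothetical alert level; an arbitrary value chosen for illustration.
DRIFT_THRESHOLD = 0.1

# Toy predictions with hypothetical intent names.
reference_preds = ["navigation"] * 5 + ["calls"] * 5
recent_preds = ["navigation"] * 8 + ["calls"] * 2

distance = total_variation_distance(reference_preds, recent_preds)
if distance > DRIFT_THRESHOLD:
    print(f"Possible drift: distance {distance:.2f} exceeds {DRIFT_THRESHOLD}")
```

A shift in predicted intents only signals that the input mix or the model's behaviour has changed. It does not measure accuracy, so it complements rather than replaces evaluation on labeled real driver commands.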
## Citation

If you use these checkpoints, please cite or reference this repository:

```bibtex
@misc{multilingual-driver-command-models,
  title = {Multilingual Driver Command Models},
  author = {Nizhankovskiy, Ilya},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/INFINITY1023/multilingual-driver-command-models}}
}
```