Multilingual Driver Command Models

Model Summary

This repository contains four fine-tuned models for multilingual driver command intent classification.

The models were trained to classify short driver phrases in Russian and English into intent classes for an in-car voice assistant.

The repository is linked to the dataset:

INFINITY1023/MultilingualDriverCommands

Models

| Model | Architecture type | Description |
|---|---|---|
| `bge-m3` | Encoder-only | Multilingual encoder model |
| `e5-multilingual` | Encoder-only | Semantic multilingual encoder |
| `mmBERT-base` | Encoder-only | Compact multilingual BERT-style baseline |
| `gte-Qwen2-7B-instruct` | Decoder-only | Instruction-tuned decoder model adapted for classification |

Task

The models solve a multiclass intent classification task:

Given a short driver phrase, predict the corresponding intent class.

Example inputs:

Set the temperature to twenty two
Turn on Bluetooth audio
Позвони маме (Call mom)
Включи обогрев сиденья (Turn on the seat heating)
Построй маршрут до дома (Plot a route home)

Possible intent categories include climate control, navigation, media, calls, phone connection, lighting, seat control, cruise control, and other vehicle assistant actions.
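Whatever the backbone, the last step of classification is the same, assuming each checkpoint ends in a classification head that outputs one score (logit) per intent: the predicted intent is the class with the highest score. The sketch below shows that step in plain Python. The `id2label` mapping and its label names are hypothetical and are not the dataset's actual intent names.

```python
import math

def predict_intent(logits, id2label):
    """Return (intent_label, probability) for one phrase from its per-class logits."""
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index with the highest probability (argmax).
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical label mapping; the real checkpoints define their own 64 classes.
id2label = {0: "climate_set_temperature", 1: "media_bluetooth_on", 2: "call_contact"}

# Hypothetical logits for "Turn on Bluetooth audio".
label, prob = predict_intent([0.3, 2.1, -0.5], id2label)
print(label, round(prob, 3))
```

The probability is useful in practice, for example to ask the driver for confirmation when the top intent is uncertain.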

Training Dataset

The models were trained on the Multilingual Driver Commands dataset (INFINITY1023/MultilingualDriverCommands).

Dataset characteristics:

| Property | Value |
|---|---|
| Dataset size | 153,062 examples |
| Languages | Russian + English |
| Language distribution | 50% RU / 50% EN |
| Final number of intents | 64 |
| Task | Intent classification |

The dataset was synthetically generated, manually validated, balanced across classes, and enriched with rare driving-related scenarios.
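Language and class balance can be checked directly from the data. The sketch below assumes each example is a dict with hypothetical `lang` and `intent` fields; the actual column names in INFINITY1023/MultilingualDriverCommands may differ.

```python
from collections import Counter

def balance_report(examples):
    """Summarize language shares and class balance for a list of labeled examples."""
    n = len(examples)
    lang_counts = Counter(ex["lang"] for ex in examples)
    intent_counts = Counter(ex["intent"] for ex in examples)
    lang_share = {lang: count / n for lang, count in lang_counts.items()}
    # Ratio of the largest to the smallest class; 1.0 means perfectly balanced.
    imbalance = max(intent_counts.values()) / min(intent_counts.values())
    return {
        "n_examples": n,
        "lang_share": lang_share,
        "n_intents": len(intent_counts),
        "imbalance_ratio": imbalance,
    }

# Tiny hypothetical sample; on the full dataset the card reports a 50/50 RU/EN split and 64 intents.
sample = [
    {"text": "Позвони маме", "lang": "ru", "intent": "call_contact"},
    {"text": "Call mom", "lang": "en", "intent": "call_contact"},
    {"text": "Включи обогрев сиденья", "lang": "ru", "intent": "seat_heating_on"},
    {"text": "Turn on the seat heating", "lang": "en", "intent": "seat_heating_on"},
]
print(balance_report(sample))
```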

Experimental Results

The following results were obtained on the test set after class balancing and merging semantically overlapping intents into 64 final classes.

| Model | Accuracy | Macro F1 | Macro Precision | Macro Recall |
|---|---|---|---|---|
| `e5-multilingual-base` | 0.864 | 0.862 | 0.868 | 0.859 |
| `mmBERT-base` | 0.857 | 0.854 | 0.859 | 0.853 |
| `bge-m3` | 0.868 | 0.863 | 0.868 | 0.864 |
| `gte-Qwen2-7B-instruct` | 0.872 | 0.870 | 0.878 | 0.865 |
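Macro-averaged metrics weight every intent equally, regardless of how often it occurs. The sketch below computes them from gold and predicted labels using the common convention (also used by scikit-learn with `average="macro"`) in which macro F1 is the mean of per-class F1 scores. Whether the reported numbers were computed exactly this way is an assumption; the labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Undefined ratios (zero denominators) are treated as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }

y_true = ["nav", "nav", "call", "media"]
y_pred = ["nav", "call", "call", "media"]
print(macro_metrics(y_true, y_pred))
```

In this toy example macro precision and macro recall are both 5/6, while macro F1 is 7/9. Under this convention, the Macro F1 column therefore need not equal the harmonic mean of the Macro Precision and Macro Recall columns.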

A separate experiment with more aggressive merging into 45 intent classes raised the accuracy of gte-Qwen2-7B-instruct to 0.905, but at the cost of reduced functional granularity of the assistant.
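Merging intents cannot lower accuracy: every prediction that was correct before the merge is still correct after it, and confusions between classes that get merged become correct. The sketch below illustrates this with a hypothetical merge map; it is not the mapping used to go from 64 to 45 classes.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def merge_labels(labels, merge_map):
    """Map each label to its merged class; labels absent from merge_map are kept as-is."""
    return [merge_map.get(label, label) for label in labels]

# Hypothetical fine-grained labels and a merge of two overlapping lighting intents.
y_true = ["lights_on", "ambient_light_on", "call_contact", "nav_home"]
y_pred = ["ambient_light_on", "ambient_light_on", "call_contact", "call_contact"]
merge_map = {"lights_on": "lighting", "ambient_light_on": "lighting"}

print(accuracy(y_true, y_pred))
print(accuracy(merge_labels(y_true, merge_map), merge_labels(y_pred, merge_map)))
```

Here accuracy rises from 0.5 to 0.75 purely because one confusion was absorbed by the merge, which is one reason accuracies measured at different taxonomy granularities are not directly comparable.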

Main Findings

The experiments show that larger models do not necessarily yield a proportional improvement on short command classification.

Although gte-Qwen2-7B-instruct is much larger than bge-m3, it outperformed bge-m3 by only 0.004 in accuracy (0.872 vs. 0.868) and 0.007 in macro F1 (0.870 vs. 0.863). This suggests that, for this task, quality is limited not only by model size but also by:

class taxonomy;
semantic overlap between intents;
synthetic data noise;
incomplete or noisy parameter fields;
dataset structure and balance.
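One way to check whether semantic overlap between intents limits quality is to count the most frequent (gold, predicted) error pairs on the test set. A minimal sketch in plain Python; the label names are hypothetical.

```python
from collections import Counter

def top_confusions(y_true, y_pred, k=5):
    """Return the k most frequent misclassifications as ((gold, predicted), count) pairs."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)

y_true = ["seat_heat_on", "seat_heat_on", "climate_heat_on", "nav_home", "seat_heat_on"]
y_pred = ["climate_heat_on", "climate_heat_on", "seat_heat_on", "nav_home", "seat_heat_on"]
print(top_confusions(y_true, y_pred, k=2))
```

Pairs that are confused in both directions, as in this toy example, are natural candidates for merging or for clearer labeling guidelines.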

For practical deployment, a smaller encoder-based model such as bge-m3 may be more efficient, since it provides competitive quality with lower computational cost.

Repository Structure

Recommended repository structure:

```text
best_models/
├── bge-m3/
│   └── model.pt
├── e5-multilingual/
│   └── model.pt
├── mmBERT-base/
│   └── model.pt
└── qwen2/
    └── model.pt
```

If the checkpoints are saved as PyTorch state_dict files, the model architecture code is required to load them correctly.
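With this layout, the available checkpoints can be discovered programmatically, for example to evaluate all four models in one loop. A sketch using `pathlib`, assuming every model directory under `best_models/` holds a single `model.pt`; `find_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path

def find_checkpoints(root="best_models"):
    """Map each model directory name (e.g. "bge-m3") to the path of its model.pt."""
    root = Path(root)
    if not root.is_dir():
        return {}
    # Directories without a model.pt are skipped.
    return {ckpt.parent.name: ckpt for ckpt in sorted(root.glob("*/model.pt"))}

for name, path in find_checkpoints().items():
    print(f"{name}: {path}")
```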

Loading PyTorch Checkpoints

Example loading pattern:

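```python
import torch

# Example only: replace MyModel with the corresponding architecture class used
# during training (the `model` module and the constructor arguments are placeholders).
from model import MyModel

model = MyModel(...)  # must match the architecture and number of classes used in training
state_dict = torch.load(
    "best_models/bge-m3/model.pt",
    map_location="cpu",
    weights_only=True,  # a plain state_dict holds only tensors, so safe loading suffices
)
model.load_state_dict(state_dict)
model.eval()  # disable dropout for inference
```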

If a checkpoint was saved as a full PyTorch model object rather than a state_dict, it can be loaded as:

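```python
import torch

# The original class definitions must be importable for unpickling to succeed.
# weights_only=False is required for full nn.Module objects (the default became True
# in PyTorch 2.6); it runs arbitrary pickled code, so use it only with trusted files.
model = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=False)
model.eval()
```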

The exact loading method depends on how the checkpoint was saved during training.
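Because the right method depends on how each file was saved, a small helper can try the safe tensors-only path first and fall back to full unpickling only when explicitly allowed. This is a sketch: `load_checkpoint`, `build_model`, and `allow_full_model` are hypothetical names, and checkpoints saved as wrapper dicts (for example, with optimizer state alongside the weights) would need extra handling.

```python
from collections.abc import Mapping
from typing import Callable, Optional

import torch

def load_checkpoint(
    path: str,
    build_model: Optional[Callable[[], torch.nn.Module]] = None,
    allow_full_model: bool = False,
    map_location: str = "cpu",
) -> torch.nn.Module:
    """Load a checkpoint saved either as a state_dict or as a full nn.Module."""
    try:
        # Safe path: weights_only=True only unpickles tensors and plain containers.
        obj = torch.load(path, map_location=map_location, weights_only=True)
    except Exception:
        if not allow_full_model:
            raise
        # Full nn.Module pickles need weights_only=False; use only for trusted files.
        obj = torch.load(path, map_location=map_location, weights_only=False)

    if isinstance(obj, torch.nn.Module):
        model = obj
    elif isinstance(obj, Mapping):
        if build_model is None:
            raise ValueError("state_dict checkpoint: pass build_model to construct the architecture")
        model = build_model()
        model.load_state_dict(obj)
    else:
        raise TypeError(f"unsupported checkpoint type: {type(obj).__name__}")
    model.eval()  # disable dropout for inference
    return model
```

For a state_dict checkpoint one would call, for example, `load_checkpoint("best_models/bge-m3/model.pt", build_model=lambda: MyModel(...))`, where `MyModel` stands for the training-time architecture class.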

Intended Use

These models are intended for:

educational experiments;
research on synthetic NLU datasets;
multilingual intent classification;
comparison of encoder-only and decoder-only architectures;
prototyping voice assistant command recognition.

Limitations

The models were trained on a synthetic dataset. Therefore, real-world performance may differ when applied to natural user traffic.

Known limitations:

possible sensitivity to synthetic generation style;
errors on semantically close intents;
dependence on data quality and intent taxonomy;
limited robustness to real-world noise, slang, ASR errors, and incomplete phrases;
potential confusion between intents with similar surface forms.

For production use, the models should be evaluated on real driver commands and monitored for data drift.
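Data drift can be monitored, for example, by comparing the distribution of predicted intents on live traffic with the distribution on the test set. The sketch below uses the Population Stability Index (PSI), a common drift score; the choice of PSI, the smoothing constant, and the label names are assumptions, not part of the original evaluation.

```python
import math
from collections import Counter

def population_stability_index(reference, current, eps=1e-6):
    """PSI between two label samples: sum over classes of (q - p) * ln(q / p)."""
    classes = set(reference) | set(current)
    ref_counts, cur_counts = Counter(reference), Counter(current)
    psi = 0.0
    for c in classes:
        # A small epsilon avoids log(0) for classes missing from one sample.
        p = max(ref_counts[c] / len(reference), eps)
        q = max(cur_counts[c] / len(current), eps)
        psi += (q - p) * math.log(q / p)
    return psi

reference = ["nav_home"] * 50 + ["call_contact"] * 50
same = ["nav_home"] * 5 + ["call_contact"] * 5
shifted = ["nav_home"] * 9 + ["call_contact"] * 1
print(population_stability_index(reference, same))
print(population_stability_index(reference, shifted))
```

Identical distributions give a PSI of 0, and larger values indicate a stronger shift. A frequently quoted rule of thumb treats values above about 0.25 as a significant shift, but thresholds should be calibrated on real traffic.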

Citation

If you use these checkpoints, please cite or reference this repository:

```bibtex
@misc{multilingual-driver-command-models,
  title        = {Multilingual Driver Command Models},
  author       = {Nizhankovskiy, Ilya},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/INFINITY1023/multilingual-driver-command-models}}
}
```
