	---
	license: mit
	tags:
	- pytorch
	- nlp
	- nlu
	- text-classification
	- intent-classification
	- multilingual
	- driver-commands
	- fine-tuned
	- encoder-only
	- decoder-only
	language:
	- ru
	- en
	datasets:
	- INFINITY1023/MultilingualDriverCommands
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	pipeline_tag: text-classification
	pretty_name: Multilingual Driver Command Models
	---

	# Multilingual Driver Command Models

	## Model Summary

	This repository contains four fine-tuned models for multilingual driver command intent classification.

	The models were trained to classify short driver phrases in Russian and English into intent classes for an in-car voice assistant.

	The repository is linked to the dataset:

	- [`INFINITY1023/MultilingualDriverCommands`](https://huggingface.co/datasets/INFINITY1023/MultilingualDriverCommands)

	## Models

	| Model | Architecture Type | Description |
	|---|---|---|
	| `bge-m3` | Encoder-only | Multilingual encoder model |
	| `e5-multilingual` | Encoder-only | Semantic multilingual encoder |
	| `mmBERT-base` | Encoder-only | Compact multilingual BERT-style baseline |
	| `gte-Qwen2-7B-instruct` | Decoder-only | Instruction-tuned decoder model adapted for classification |

	## Task

	The models solve a multiclass intent classification task:

	> Given a short driver phrase, predict the corresponding intent class.

	Example inputs:

	- `Set the temperature to twenty two`
	- `Turn on Bluetooth audio`
	- `Позвони маме`
	- `Включи обогрев сиденья`
	- `Построй маршрут до дома`

	Possible intent categories include climate control, navigation, media, calls, phone connection, lighting, seat control, cruise control, and other vehicle assistant actions.
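
As a minimal sketch of the final prediction step, the snippet below turns a vector of per-class scores into an intent label with a softmax followed by an argmax. The label names and score values are hypothetical placeholders: the actual 64 intent names come from the dataset, and real scores would come from one of the fine-tuned models.

```python
import math

# Hypothetical subset of intent labels, for illustration only.
INTENT_LABELS = ["climate_control", "navigation", "media", "calls"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(scores, labels=INTENT_LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up scores for a phrase such as "Построй маршрут до дома".
label, confidence = predict_intent([0.2, 3.1, -0.5, 0.4])
print(label, round(confidence, 3))
```

The returned probability can also serve as a simple confidence signal, for example to ask the driver to repeat a command when it falls below a chosen threshold.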

	## Training Dataset

	The models were trained on the Multilingual Driver Commands dataset.

	Dataset characteristics:

	| Property | Value |
	|---|---:|
	| Dataset size | 153,062 examples |
	| Languages | Russian + English |
	| Language distribution | 50% RU / 50% EN |
	| Final number of intents | 64 |
	| Task | Intent classification |

	The dataset was synthetically generated, manually validated, balanced across classes, and enriched with rare driving-related scenarios.
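
One quick way to check the balance claims on a local copy of the data is to count examples per language and per intent. The sketch below assumes each record is a dictionary with hypothetical `lang` and `intent` fields; the real column names in the dataset may differ, and the rows shown are toy examples.

```python
from collections import Counter


def distribution(records, field):
    """Return the share of records for each distinct value of `field`."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def imbalance_ratio(records, field):
    """Largest class count divided by the smallest (1.0 means perfectly balanced)."""
    counts = Counter(record[field] for record in records)
    return max(counts.values()) / min(counts.values())


# Toy records with hypothetical field names and labels, not real dataset rows.
records = [
    {"text": "Позвони маме", "lang": "ru", "intent": "calls"},
    {"text": "Включи обогрев сиденья", "lang": "ru", "intent": "seat_control"},
    {"text": "Turn on Bluetooth audio", "lang": "en", "intent": "media"},
    {"text": "Set the temperature to twenty two", "lang": "en", "intent": "climate_control"},
]

lang_share = distribution(records, "lang")
intent_ratio = imbalance_ratio(records, "intent")
print(lang_share, intent_ratio)
```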

	## Experimental Results

	The following results were obtained on the test set after class balancing and merging semantically overlapping intents into 64 final classes.

	| Model | Accuracy | Macro F1 | Macro Precision | Macro Recall |
	|---|---:|---:|---:|---:|
	| `e5-multilingual-base` | 0.864 | 0.862 | 0.868 | 0.859 |
	| `mmBERT-base` | 0.857 | 0.854 | 0.859 | 0.853 |
	| `bge-m3` | 0.868 | 0.863 | 0.868 | 0.864 |
	| `gte-Qwen2-7B-instruct` | 0.872 | 0.870 | 0.878 | 0.865 |

	In a separate experiment with more aggressive merging into 45 classes, `gte-Qwen2-7B-instruct` reached 0.905 accuracy, at the cost of reduced functional granularity for the assistant.
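
For reference, macro-averaged metrics give every class the same weight regardless of its frequency: precision, recall, and F1 are computed per class and then averaged. The sketch below shows that common definition in plain Python on toy labels. It is not the evaluation script used for the table above, and the label names are placeholders.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for illustration only.
y_true = ["calls", "calls", "media", "media", "navigation", "navigation"]
y_pred = ["calls", "media", "media", "media", "navigation", "calls"]
accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```

Because macro F1 here is the mean of per-class F1 scores, it need not equal the harmonic mean of macro precision and macro recall.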

	## Main Findings

	The experiments show that larger models do not always provide a proportional improvement for short command classification.

	Although `gte-Qwen2-7B-instruct` is much larger than `bge-m3`, the gap between them on the 64-class test set was small: 0.872 vs. 0.868 accuracy and 0.870 vs. 0.863 macro F1. This suggests that, for this task, quality is limited not only by model size but also by:

	- class taxonomy;
	- semantic overlap between intents;
	- synthetic data noise;
	- incomplete or noisy parameter fields;
	- dataset structure and balance.

	For practical deployment, a smaller encoder-based model such as `bge-m3` may be the more practical choice, since it provides competitive quality at a lower computational cost.

	## Repository Structure

	Recommended repository structure:

	```text
	best_models/
	├── bge-m3/
	│   └── model.pt
	├── e5-multilingual/
	│   └── model.pt
	├── mmBERT-base/
	│   └── model.pt
	└── qwen2/
	    └── model.pt
	```

	If the checkpoints are saved as PyTorch `state_dict` files, the model architecture code is required to load them correctly.

	## Loading PyTorch Checkpoints

	Example loading pattern:

	```python
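	import torch

	# Example only: replace MyModel with the corresponding architecture class.
	from model import MyModel

	model = MyModel(...)  # pass the same constructor arguments used during training
	# weights_only=True restricts unpickling to tensors and plain containers.
	state_dict = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=True)
	model.load_state_dict(state_dict)
	model.eval()  # switch to inference mode (disables dropout)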
	```

	If a checkpoint was saved as a full PyTorch model object rather than a `state_dict`, it can be loaded as:

	```python
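	import torch

	# A pickled model object needs weights_only=False (the default is True since PyTorch 2.6),
	# and the original model class must be importable. Only load files you trust.
	model = torch.load("best_models/bge-m3/model.pt", map_location="cpu", weights_only=False)
	model.eval()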
	```

	The exact loading method depends on how the checkpoint was saved during training.
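
When it is unclear how a checkpoint was saved, one option is to inspect the loaded object before using it. The helper below is a hedged sketch rather than part of this repository: `checkpoint_kind` and `load_checkpoint` are hypothetical names, and the check is a heuristic based on the fact that a `state_dict` is a mapping while a pickled model exposes a `state_dict()` method.

```python
from collections.abc import Mapping


def checkpoint_kind(obj):
    """Classify an object returned by torch.load (heuristic)."""
    if isinstance(obj, Mapping):
        # state_dict files are (ordered) dictionaries of named tensors.
        return "state_dict"
    if callable(getattr(obj, "state_dict", None)):
        # Pickled nn.Module objects expose a state_dict() method.
        return "full_model"
    return "unknown"


def load_checkpoint(path):
    """Load a checkpoint on CPU and report which kind it is."""
    import torch  # imported lazily so checkpoint_kind works without PyTorch

    # weights_only=False is needed for pickled model objects; only use it
    # for files from a trusted source.
    obj = torch.load(path, map_location="cpu", weights_only=False)
    return obj, checkpoint_kind(obj)
```

A call such as `load_checkpoint("best_models/bge-m3/model.pt")` would return the loaded object together with its kind. A dictionary that wraps the weights together with other training state would also be reported as `state_dict`, so it is worth inspecting the keys before calling `load_state_dict`.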

	## Intended Use

	These models are intended for:

	- educational experiments;
	- research on synthetic NLU datasets;
	- multilingual intent classification;
	- comparison of encoder-only and decoder-only architectures;
	- prototyping voice assistant command recognition.

	## Limitations

	The models were trained on a synthetic dataset, so their performance on natural user traffic may differ from the reported test results.

	Known limitations:

	- possible sensitivity to synthetic generation style;
	- errors on semantically close intents;
	- dependence on data quality and intent taxonomy;
	- limited robustness to real-world noise, slang, ASR errors, and incomplete phrases;
	- potential confusion between intents with similar surface forms.

	For production use, the models should be evaluated on real driver commands and monitored for data drift.
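
When evaluating on real driver commands, listing the most frequent confusions is a simple way to see which semantically close intents get mixed up. The sketch below assumes parallel lists of true and predicted labels; `top_confusions` is a hypothetical helper and the labels are toy values.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (true, predicted) pairs among the errors."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Toy labels for illustration; real ones would come from logged driver commands.
y_true = ["seat_heating", "seat_heating", "climate_control", "media", "seat_heating"]
y_pred = ["climate_control", "climate_control", "climate_control", "media", "seat_heating"]
confusions = top_confusions(y_true, y_pred)
print(confusions)
```

Pairs that dominate this list are candidates for extra training examples or for merging in the intent taxonomy.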

	## Citation

	If you use these checkpoints, please cite or reference this repository:

	```bibtex
	@misc{multilingual-driver-command-models,
	  title = {Multilingual Driver Command Models},
	  author = {Nizhankovskiy, Ilya},
	  year = {2026},
	  publisher = {Hugging Face},
	  howpublished = {\url{https://huggingface.co/INFINITY1023/multilingual-driver-command-models}}
	}
	```