Russian Addressee Detection Model

Fine-tuned RuBERT for detecting whether an utterance in a Russian voice conversation is addressed to the system.

Model Description

This model classifies whether a transcribed spoken utterance is addressed to the voice assistant or is ambient speech (talking to oneself, talking to someone else in the room, or mumbling). It is designed for real-time voice chat applications, where it helps prevent false triggers.

Base Model: DeepPavlov/rubert-base-cased
Task: Binary classification (addressed to system / not addressed to system)
Language: Russian

Performance

Metric	Score
Accuracy	95.16%
Precision	88.89%
Recall	94.12%
F1-Score	91.43%
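
As a consistency check, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only; the function name `f1_from_precision_recall` is not part of the model's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the table above, as fractions.
f1 = f1_from_precision_recall(0.8889, 0.9412)
print(f"{f1:.2%}")  # agrees with the reported 91.43%
```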

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Silxxor/Russian-Addressee-detector")
model = AutoModelForSequenceClassification.from_pretrained("Silxxor/Russian-Addressee-detector")

text = "покажи мне погоду на завтра"
inputs = tokenizer(text, return_tensors="pt", max_length=64, truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

# 0 = not addressed to system, 1 = addressed to system
print("Addressed to system" if prediction == 1 else "Not addressed to system")
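
Because the stated goal is to prevent false triggers, it may be useful to act on a probability threshold rather than on the raw argmax. The sketch below converts the two logits into a probability for class 1 and compares it to a cutoff. The helper names and the example threshold are assumptions for illustration; they are not part of the published model, and any cutoff should be tuned on your own validation data.

```python
import math


def addressed_probability(logits: list[float]) -> float:
    """Softmax probability of class 1 (addressed to system) from two raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def is_addressed(logits: list[float], threshold: float = 0.5) -> bool:
    """Return True when the class-1 probability reaches the threshold."""
    return addressed_probability(logits) >= threshold
```

With the snippet above, this could be called as `is_addressed(outputs.logits[0].tolist(), threshold=0.8)`. At the default threshold of 0.5 the decision matches the argmax rule except for exact ties, which here count as addressed; a higher threshold trades recall for fewer false triggers.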

Training Data

Source: Synthetically generated dataset based on Russian conversational patterns
Wake words: "Люси" (Lucy), "ассистент" (assistant), "компьютер" (computer)
Mixed dataset of direct commands, questions with wake words, and ambient speech
Addressed utterances (containing wake words or direct commands) labeled as 1
Non-addressed utterances (self-talk, background conversation) labeled as 0
Balanced dataset
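
Since addressed utterances often contain one of the listed wake words, a plain wake-word check can serve as a sanity baseline next to the classifier. This is a sketch under stated assumptions, not the author's labeling procedure: the function name is hypothetical, it matches only the exact word forms listed above (inflected forms are missed), and it cannot detect direct commands that carry no wake word.

```python
import re

# Wake words listed in the training data description.
WAKE_WORDS = ("люси", "ассистент", "компьютер")

# Whole-word, case-insensitive match; inflected forms are NOT matched.
_WAKE_WORD_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, WAKE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def contains_wake_word(text: str) -> bool:
    """Return True if the text contains one of the wake words as a whole word."""
    return _WAKE_WORD_RE.search(text) is not None
```

Utterances where this check and the model disagree are a natural place to look for labeling gaps or generalization problems.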

Training Details

Epochs: 3
Batch size: 16
Learning rate: 2e-5
Weight decay: 0.01
Optimizer: AdamW
Best model selection: F1 score
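
The hyperparameters above could be expressed as keyword arguments for `transformers.TrainingArguments`, roughly as follows. This is a sketch, not the author's training script: the per-epoch evaluation and saving schedule, the reading of the batch size as per-device, and the metric key `"f1"` are assumptions.

```python
# Keys follow transformers.TrainingArguments parameter names.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,  # assumption: reported batch size is per device
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    # Assumption: evaluate and save once per epoch so the best checkpoint can be reloaded.
    "eval_strategy": "epoch",  # called evaluation_strategy in older transformers releases
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    # Assumption: the compute_metrics function returns a dict with an "f1" key.
    "metric_for_best_model": "f1",
    "greater_is_better": True,
}
```

These could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `"out"` is a placeholder path. No optimizer key is set because the Trainer defaults to an AdamW variant.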

Limitations

Trained on synthetically generated data, not natural speech patterns
Optimized for specific wake words (Lucy, assistant, computer)
May not generalize well to other assistant names or contexts
Takes text input, so performance depends on the quality of the upstream ASR transcription
Cultural and contextual nuances may affect accuracy

Intended Use

Voice assistants and conversational AI systems that need to distinguish between speech directed at them and ambient conversation in Russian.

Model size: 0.2B params (Safetensors)
Tensor type: F32
