Russian Turn Detection Model (Qwen/Qwen3-0.6B Fine-Tuned)
This model is a specialized End-of-Utterance (EOU) / turn detection model for Russian spoken dialogue systems. It is fine-tuned from Qwen/Qwen3-0.6B to classify whether a user has finished speaking or is pausing mid-sentence.
It is optimized for real-time voice agents (like those using LiveKit) to minimize interruptions and reduce latency in conversational flows.
Model Capabilities
- Task: Classifies text input as either COMPLETE (user finished) or CONTINUE (user is thinking/pausing).
- Language: Russian (primary); handles mixed English/Russian technical terms.
- Latency: Fast inference (based on a 0.6B-parameter model), suitable for edge or cloud deployment.
- Nuance: Treats hesitation markers (e.g., "ну..." ("well..."), "эээ..." ("uh..."), "как бы" ("sort of")) as CONTINUE.
Usage
Inference with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "RAS1981/qwen3-turn-detector-merged"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the merged fine-tuned checkpoint and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict_turn(text):
    """Return the model's label for one utterance: COMPLETE or CONTINUE."""
    messages = [
        # System prompt (kept in Russian): "You are a voice assistant.
        # Determine whether the user has finished speaking."
        {"role": "system", "content": "Ты голосовой ассистент. Определяй, закончил ли пользователь говорить."},
        {"role": "user", "content": text}
    ]
    # Render the chat template straight to input IDs.
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(device)
    outputs = model.generate(
        inputs,
        max_new_tokens=2,
        use_cache=True,
        pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the newly generated tokens, not the prompt.
    decoded = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return decoded.strip()

# Test Cases
print(predict_turn("Привет, я хочу заказать пиццу"))  # "Hi, I want to order a pizza" -> Output: COMPLETE
print(predict_turn("Ну я думаю что может быть..."))  # "Well, I think that maybe..." -> Output: CONTINUE
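Since generation is capped at two new tokens, the decoded string could come back truncated or with stray whitespace or casing. The following is a minimal sketch of turning that raw string into a boolean decision, assuming only the two label names given above. The helper name `parse_turn_label` and the prefix-matching rule are illustrative and not part of the model or any library.

```python
from typing import Optional

# Label names as documented above; True means the user's turn has ended.
LABELS = {"COMPLETE": True, "CONTINUE": False}

def parse_turn_label(decoded: str) -> Optional[bool]:
    """Map raw model output to True (turn ended), False (keep listening), or None."""
    text = decoded.strip().upper()
    if not text:
        return None
    for label, is_end in LABELS.items():
        # Accept a prefix of the label, but only from three characters on:
        # "CO" is shared by both labels and would be ambiguous.
        if len(text) >= 3 and label.startswith(text):
            return is_end
        # Accept the full label followed by trailing characters (e.g. "CONTINUE.").
        if text.startswith(label):
            return is_end
    return None
```

Returning `None` for unrecognized output lets the caller fall back to a default, such as continuing to listen, instead of guessing.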
Training Details
Dataset
- Source: Custom dataset generated via Gemini 2.5 Flash Lite based on IlyaGusev/ru_turbo_alpaca and ss-corpus-ru.
- Preprocessing:
  - Converted formal text to spoken Russian (added hesitations, fillers, self-corrections).
  - Normalized using NFKC and lowercased specific punctuation.
  - Balanced 50/50 split between COMPLETE and CONTINUE labels to prevent bias.
- Size: ~400 high-quality curated examples (incremental training).
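The normalization and balancing steps above could be sketched as follows. The actual preprocessing script is not published here, so this is an illustration under assumptions: the function names, the whitespace collapsing, and the downsampling strategy are hypothetical, and the lowercasing and punctuation handling are left out because their exact rules are not specified.

```python
import random
import unicodedata
from collections import defaultdict

def normalize_utterance(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())

def balance_by_label(examples, seed=0):
    """Downsample (text, label) pairs so every label is as frequent as the rarest one."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((normalize_utterance(text), label))
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced
```

NFKC folds compatibility characters into canonical forms, so a single ellipsis character becomes three periods and a non-breaking space becomes a regular space, which keeps hesitation markers consistent across sources.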
Hyperparameters
- Framework: Unsloth + TRL (SFTTrainer)
- Quantization: 4-bit (QLoRA)
- Learning Rate: 2e-4
- Epochs: 62 (Early stopping based on loss convergence)
- Final Loss: ~0.086
- Optimizer: AdamW 8-bit
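As a rough sketch, the settings above could be collected as keyword arguments, and each labeled utterance wrapped in the conversational `messages` format that TRL's SFTTrainer accepts. The mapping to `TrainingArguments`-style field names, the `load_in_4bit` flag name, the reuse of the inference system prompt for training, and the helper `to_sft_example` are all assumptions. LoRA rank, batch size, and sequence length are not stated in this card and are deliberately left out.

```python
# Values stated in the list above, keyed by the TrainingArguments-style
# field names they would most likely map to (the mapping is an assumption).
TRAINING_ARGS = {
    "learning_rate": 2e-4,
    "num_train_epochs": 62,
    "optim": "adamw_8bit",
}

# 4-bit (QLoRA) loading; the flag name follows Unsloth's loader and is assumed here.
MODEL_LOAD_ARGS = {"load_in_4bit": True}

# Assumed to match the system prompt used at inference time.
SYSTEM_PROMPT = "Ты голосовой ассистент. Определяй, закончил ли пользователь говорить."

def to_sft_example(utterance: str, label: str) -> dict:
    """Wrap one labeled utterance as a chat-format training example (hypothetical helper)."""
    if label not in ("COMPLETE", "CONTINUE"):
        raise ValueError(f"unknown label: {label!r}")
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": utterance},
            {"role": "assistant", "content": label},
        ]
    }
```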
Limitations
- Context: The model only sees the text of the current utterance, so it cannot measure silence. Extremely long pauses in audio might still need VAD (Voice Activity Detection) support.
- Domain: Fine-tuned on general conversation and real-estate inquiries; may need adaptation for highly specific medical or legal jargon.
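One way to pair the text-based label with VAD is to let silence duration act as a confirmation and a timeout. This is a minimal sketch and not a tuned policy: the function name `should_respond` and both thresholds are hypothetical, and `silence_ms` is assumed to come from a separate VAD component.

```python
def should_respond(label: str, silence_ms: float,
                   min_silence_ms: float = 300.0,
                   max_wait_ms: float = 3000.0) -> bool:
    """Decide whether the agent should start speaking.

    label: "COMPLETE" or "CONTINUE" from the turn detector.
    silence_ms: trailing silence reported by a VAD (assumed input).
    """
    if silence_ms >= max_wait_ms:
        # Very long silence: respond even if the text looks unfinished.
        return True
    if label == "COMPLETE":
        # Text looks finished: respond after a short confirmation pause.
        return silence_ms >= min_silence_ms
    # CONTINUE or an unrecognized label: keep listening.
    return False
```

With these illustrative defaults, a finished-sounding utterance gets a reply after 300 ms of silence, while a hesitation is given up to 3 seconds before the agent steps in.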
Intended Use
- LiveKit Agents: Use as a semantic turn detector in the EOU plugin.
- Customer Support Bots: Prevent the bot from interrupting users while they think.
- Voice Assistants: Improve natural flow in Russian dialogue.