Hints for language

#19

by oleslav - opened 21 days ago

Hi, I would like to know whether it is possible to provide a language hint before starting a transcription via vLLM.

I have an example where the whole audio is recorded in German, but the first sentence is transcribed in Portuguese (Brazilian).

"""
Vou tentar com os resultados que tomam para o forward support. Achim Müller ist mein Name. Ja, hallo, Heinz Heinisch ist mein Name. Ich besitze einen Thermomix TM6. Leider geht die Maschine nicht mehr an. Haben Sie versucht, Ihren Thermomix auch an anderen Steckdosen anzuschließen? Ja, ich hatte sämtliche Steckdosen in der Küche ausprobiert. Anscheinend ist das Gerät defekt und muss zu uns eingesandt werden. Ich benötige Ihre E-Mail-Adresse und dann würde ich Ihnen alle Versandpapiere dann per E-Mail zukommen lassen. Ja, sehr gerne. Meine E-Mail lautet heinzheinrich at gmail.com. Danke Ihnen auch wieder.
"""

Any tips to make it work properly? One idea I had is to send a first audio chunk with very clearly pronounced German, but that is really not the best way to do it.

liuyt6515

20 days ago

We need the official to provide the training methodology, and then train the specific model based on the official training methodology provided.

oleslav

20 days ago

@liuyt6515 I am sorry, but I didn't understand you, could you rephrase your sentence?

patrickvonplaten

Mistral AI org 20 days ago

We need the official to provide the training methodology, and then train the specific model based on the official training methodology provided.

I think that's unrelated to the question

patrickvonplaten

Mistral AI org 20 days ago

@oleslav Voxtral Realtime doesn't support language hints the way Voxtral Mini does, for example. So there is no way of "hinting" a language via the request (we might add this in a future version).

What you can do to make sure your model stays in the correct language is pretty much exactly what you said: just start with a "dummy" audio chunk of 1 second (or longer) of clean speech in the target language. That will bias the model nicely towards your target language.
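A minimal sketch of the priming idea, assuming the audio is available as 16-bit mono PCM samples at a known sample rate. The helper name `prepend_primer`, the 16 kHz rate, and the 1-second minimum are illustrative assumptions, not part of vLLM or of any Voxtral API. It only builds the combined sample buffer; how that buffer is then streamed to the server is left out.

```python
from array import array

# Assumed sample rate; use whatever rate your pipeline actually streams.
SAMPLE_RATE = 16_000


def prepend_primer(
    primer: array,
    audio: array,
    min_primer_s: float = 1.0,
    sample_rate: int = SAMPLE_RATE,
) -> tuple[array, float]:
    """Return (primer + audio, primer duration in seconds).

    `primer` is a short clip of clean speech in the target language,
    `audio` is the real recording. Both are PCM sample arrays of the
    same type (hypothetical helper, for illustration only).
    """
    if primer.typecode != audio.typecode:
        raise ValueError("primer and audio must use the same sample type")

    primer_s = len(primer) / sample_rate
    if primer_s < min_primer_s:
        raise ValueError(
            f"primer is {primer_s:.2f}s, expected at least {min_primer_s:.2f}s"
        )

    # Concatenating two arrays of the same typecode yields a new array.
    return primer + audio, primer_s


# Usage with placeholder samples standing in for real speech.
primer_clip = array("h", [1] * SAMPLE_RATE)  # 1 s of "target language" audio
recording = array("h", [2] * 10)             # the real audio
combined, primer_seconds = prepend_primer(primer_clip, recording)
```

The primer's own words will presumably appear at the start of the transcript, so the returned duration (or the known primer text) can be used to drop them afterwards.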

oleslav

20 days ago

@patrickvonplaten Okay, I got it. Thank you for your response & for a great model! I really do like it, it beats enterprise solutions like Deepgram :D

bugtoo

18 days ago

@oleslav Were you able to fix it with the bias trick? I tried, but no matter what I do, it seems some parts are always transcribed in either Arabic or Russian, even though the input is clear Italian in my case.

oleslav

18 days ago

edited 18 days ago

@bugtoo Unfortunately, I haven't tried it yet, but if I get any results, I will let you know.

bugtoo

17 days ago

@oleslav I tried, and in fact it works sometimes, but the trick is not usable for real-time agents. I guess we really need to wait for an explicit hint flag.
I found that the same clear Italian audio gets mistaken for Arabic, French, or Russian simply by adding different amounts of silence padding at the beginning (like 100 ms, 200 ms, 300 ms). If the audio starts right away with speech (no initial silence, not even 30 ms), the detection is mostly OK.
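One way to act on this observation is to strip leading silence before sending audio, so the stream starts directly with speech. The sketch below uses a plain amplitude gate on 16-bit PCM samples; the function name `trim_leading_silence` and the threshold of 500 are assumptions chosen for illustration. It is not a real voice activity detector, and it is untested against the model itself.

```python
from array import array


def trim_leading_silence(samples: array, threshold: int = 500) -> array:
    """Drop everything before the first sample louder than `threshold`.

    `samples` holds signed 16-bit PCM values. The threshold is an assumed
    value and would need tuning for real recordings and noise levels.
    """
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            # Keep the audio from the first loud sample onwards.
            return samples[i:]
    # No sample exceeded the threshold: return an empty array of the same type.
    return samples[:0]


# Usage with placeholder values: three quiet samples, then "speech".
padded = array("h", [0, 10, -20, 900, 5, -1000])
trimmed = trim_leading_silence(padded)
```

A single loud click would end the "silence" early with this gate, so a windowed energy measure may be a better fit for noisy input.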

oleslav

16 days ago

@bugtoo Thank you for the insight 🤗

oleslav changed discussion status to closed 16 days ago
