Multi-language support

#40

by geramirez - opened 26 days ago

Discussion

geramirez

26 days ago

Are there any plans for a version of this model that supports Spanish and Portuguese?

faloure

26 days ago

edited 26 days ago

I hope that multilingual support will be implemented. This would be a great second-layer solution for customer support on my sales website. This AI is excellent at dealing with people!
Please include Portuguese.

Mystx

22 days ago

Please include French too.

royrajarshi

NVIDIA org 18 days ago

Feedback taken! Multilingual is hard for two reasons: 1) Moshi is not multilingual, so multilingual finetunes would be quite limited, and swapping base models needs to be figured out first; 2) the edge that this model has over a standard ASR+LLM+TTS voice agent stack is naturalness, which comes from a dataset of real channel-separated conversations, for which Fisher English is the only publicly available source.
I will keep this discussion open to hear more language requests and pointers to publicly available, commercially available channel-separated dialog datasets in other languages.

tommasobredariol

17 days ago

Is Italian possible, perhaps?

SyeedxOWL

16 days ago

Is Bangla possible? Please add Bangla language support.

usfaa444

15 days ago

French support would also be great

valedp

14 days ago

> Feedback taken! Multilingual is hard for two reasons: 1) Moshi is not multilingual, so multilingual finetunes would be quite limited, and swapping base models needs to be figured out first; 2) the edge that this model has over a standard ASR+LLM+TTS voice agent stack is naturalness, which comes from a dataset of real channel-separated conversations, for which Fisher English is the only publicly available source.
> I will keep this discussion open to hear more language requests and pointers to publicly available, commercially available channel-separated dialog datasets in other languages.

Hi Royra, what is the minimum number of hours of channel-separated conversation needed to get a good finetune for a new language?

leq6c

13 days ago

edited 13 days ago

> the edge that this model has over a standard ASR+LLM+TTS voice agent stack is naturalness, which comes from a dataset of real channel-separated conversations, for which Fisher English is the only publicly available source.
> I will keep this discussion open to hear more language requests and pointers to publicly available, commercially available channel-separated dialog datasets in other languages.

Hi @royrajarshi, I’m an engineer at oto. We have Japanese and English channel-separated conversational speech datasets. We also run a dedicated conversation data-collection platform and would love to expand to more languages. We’ve released English datasets publicly (e.g., https://huggingface.co/datasets/otoearth/otoSpeech-full-duplex-processed-141h), and we also have additional datasets available under a commercial license.

We’d love to collaborate and can share more samples if helpful. Happy to chat.
