Recommendations

#32

by liuyt6515 - opened Mar 6

Mar 6

•

I have extensively utilized the Voxtral model, conducted in-depth research on relevant papers, and successfully implemented the LoRA fine-tuning code. Nevertheless, based on overall customer feedback, the model still has substantial room for improvement:
1、The speaker diarization function needs to achieve the same effect as Vibvoice.
2、Transcription outputs for multiple languages lack punctuation. This issue may be attributed to the size and bias of the training dataset; if that is the case, the model will never reach the level of OpenAI’s systems.
3、Maintaining the training code as closed-source will only impede the long-term development and sustainability of the project.
4、Transcription accuracy is significantly degraded by environmental factors such as background noise.

liuyt6515 changed discussion title from advice to Recommendations Mar 6

liuyt6515

Mar 12

This is a truly remarkable implementation that revolutionizes real-time ASR, shifting it from a pipeline-based approach to seamless voice streaming.I hope it keeps getting better and better.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment