Recommendations

#32
by liuyt6515 - opened

I have extensively utilized the Voxtral model, conducted in-depth research on relevant papers, and successfully implemented the LoRA fine-tuning code. Nevertheless, based on overall customer feedback, the model still has substantial room for improvement:
1、The speaker diarization function needs to achieve the same effect as Vibvoice.
2、Transcription outputs for multiple languages lack punctuation. This issue may be attributed to the size and bias of the training dataset; if that is the case, the model will never reach the level of OpenAI’s systems.
3、Maintaining the training code as closed-source will only impede the long-term development and sustainability of the project.
4、Transcription accuracy is significantly degraded by environmental factors such as background noise.

liuyt6515 changed discussion title from advice to Recommendations

This is a truly remarkable implementation that revolutionizes real-time ASR, shifting it from a pipeline-based approach to seamless voice streaming.I hope it keeps getting better and better.

Sign up or log in to comment