Potential use for whisper.cpp

#2
by zodiuxus - opened

I've come across this model as one of the more favorable ones for the Macedonian language; however, I'm having difficulty getting it to work with whisper.cpp, specifically converting it into a ggml-format file.

Is there a way this can be achieved? Are all the necessary files available for its conversion? Or, if possible, could you provide said file? I'm curious to quantize it and see how well it would perform for a real-time transcription application.

Macedonian ASR org

I don't have much experience with it, but based on some searching, you would need to first convert the model from PyTorch (.pt) to ggml using this script: https://github.com/ggml-org/whisper.cpp/blob/master/models/convert-pt-to-ggml.py

However, since the model is trained with the SpeechBrain toolkit, there is no .pt file; instead there is a .ckpt file.
You can try loading the model from the .ckpt and then saving it in PyTorch's .pt format.
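A minimal sketch of that round-trip, assuming the checkpoint is an ordinary PyTorch pickle (SpeechBrain checkpoints generally are). The file names and the dummy state dict below are placeholders, not the actual model:

```python
import torch

# Placeholder standing in for the real SpeechBrain model.ckpt
dummy = {"encoder.weight": torch.zeros(2, 2)}
torch.save(dummy, "model.ckpt")

# Load the checkpoint with plain PyTorch (map_location avoids needing a GPU)
# and re-save it as a .pt file for the conversion script.
state = torch.load("model.ckpt", map_location="cpu")
torch.save(state, "model.pt")
```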

I gave it a shot - here's what happened:
Loading the model with SpeechBrain and then trying to save it with PyTorch results in a ValueError: Need hparams['tokenizer'].
Loading model.ckpt with PyTorch and then saving it as model.pt was successful. However, the conversion script was unable to find the "dims" field in the model, and consequently the other necessary hyperparameters within it. I'm not too sure about this part, but my best guess is that these details can't be recovered from hyperparams.yaml, probably the same issue as with the tokenizer field.
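The "dims" failure makes sense if the checkpoint layouts differ: the convert script expects OpenAI's .pt structure, with a top-level "dims" entry holding the model dimensions, while a SpeechBrain-derived checkpoint is just a flat state dict. A toy comparison (all keys and values below are illustrative, not the real files):

```python
# Flat state dict, as a SpeechBrain-style checkpoint would give you
speechbrain_like = {"encoder.conv1.weight": [0.0]}

# OpenAI-style layout that convert-pt-to-ggml.py reads "dims" from
openai_like = {
    "dims": {"n_mels": 128, "n_audio_state": 1280},  # illustrative values
    "model_state_dict": {},
}

print("dims" in speechbrain_like)  # False -> the script fails here
print("dims" in openai_like)       # True  -> what the script expects
```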

Instead, I managed to convert model.safetensors to ggml format using the vocab.json and added_tokens.json from whisper-large-v3, then quantize it, and finally run whisper.cpp with the result. I ran into some errors along the way, but I was able to complete the conversion by modifying this script: https://github.com/ggml-org/whisper.cpp/issues/2359#issuecomment-2671035284 (also changing line 162 to if name == "proj_out.weight" or name == "_mel_filters":) and then running it on the .safetensors model file. It's still a little slow even when quantized down to q2_k, but that might just be my machine's fault for not really having a GPU.
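For readers following along, here is a rough sketch of what that modified check appears to do: while walking the tensor names from the .safetensors file, skip entries that should not be written as regular weights. The function name and the tensor list are made up for illustration, and the assumption that the check skips (rather than specially handles) these entries is mine:

```python
def should_skip(name: str) -> bool:
    # Assumption: the modified line 162 excludes these two entries,
    # since they are not ordinary weight tensors to serialize.
    return name == "proj_out.weight" or name == "_mel_filters"

# Dummy tensor names standing in for the real .safetensors contents
tensor_names = ["encoder.conv1.weight", "proj_out.weight", "_mel_filters"]
kept = [n for n in tensor_names if not should_skip(n)]
print(kept)  # -> ['encoder.conv1.weight']
```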
