hinge committed on
Commit 95eef32 · verified · 1 Parent(s): d7e820f

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -39,7 +39,7 @@ model-index:
 
  # Voxtral small LoRA finetuned on CoRaL release 1
 
- This is a Danish state of the art automatic speech recognition (ASR) model, which combines the decoder and audio-adapter of [**Voxtral-Small-24B-2507**](mistralai/Voxtral-Small-24B-2507) with the encoder from [**roest-whisper-large-v1**](CoRal-project/roest-whisper-large-v1). The decoder and audio-adapter were finetuned using LoRA for 2 epochs on the Danish [coral dataset](CoRal-project/coral) for automatic speech recognition (ASR).
+ danstral is a 24B parameter state-of-the-art model for automatic speech recognition (ASR), which combines the decoder and audio-adapter of [**Voxtral-Small-24B-2507**](mistralai/Voxtral-Small-24B-2507) with the audio encoder from [**roest-whisper-large-v1**](CoRal-project/roest-whisper-large-v1). The decoder and audio-adapter were finetuned using LoRA for 2 epochs (40 hours) on the Danish [coral dataset](CoRal-project/coral), using 3 NVIDIA L40s. Although it achieves SOTA on CoRal, it is a humongous model and likely overkill compared to Whisper-based models.
 
  ## Evaluation Results
 
@@ -59,7 +59,7 @@ danstral-v1:
  - is finetuned solely on the coral v1 dataset and performance may deteriorate significantly for other data sources.
 
  ## Future work and ideas
- - SOTA performance was achieved using a LoRA adapter with 25M parameters. A full finetune on larger GPU's and bigger datasets will likely give even better results
+ - SOTA performance was achieved using a LoRA adapter with 25M parameters. I only conducted a few experiments, and there are likely more performance gains to be had by tweaking the LoRA configuration or by conducting a full parameter finetune.
  - Using danstral-v1 for knowledge distillation to train smaller models
 
 
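As a rough illustration of the LoRA setup the updated README describes (a roughly 25M-parameter adapter on the decoder and audio-adapter of Voxtral-Small-24B-2507), here is a minimal PEFT sketch. The rank, alpha, dropout, and target module names are assumptions for illustration, not the configuration actually used for danstral-v1.

```python
# Hypothetical LoRA configuration sketch; the values and module names below are
# assumptions, not the settings used to train danstral-v1.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,                      # adapter rank (assumed)
    lora_alpha=64,             # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed decoder attention projections
    task_type="CAUSAL_LM",
)

# model = ...  # load the combined Voxtral decoder / Whisper encoder model described in the README
# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()  # a rank/target choice in this ballpark yields tens of millions of trainable parameters
```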
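The "Future work" bullet on knowledge distillation would, in principle, use danstral-v1 as a teacher for a smaller student ASR model. The sketch below shows a generic logit-distillation loss, assuming teacher and student share a tokenizer; it illustrates the idea only and is not a recipe from the model card.

```python
# Generic sequence-level logit distillation loss (illustrative; assumes teacher and
# student produce logits over the same vocabulary).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term against the teacher with the usual CE against the labels."""
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # ignore padding positions
    )
    return alpha * kl + (1.0 - alpha) * ce
```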