hinge committed on
Commit 95eef32 · verified · 1 Parent(s): d7e820f

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -39,7 +39,7 @@ model-index:
 
  # Voxtral small LoRA finetuned on CoRaL release 1
 
- This is a Danish state of the art automatic speech recognition (ASR) model, which combines the decoder and audio-adapter of [**Voxtral-Small-24B-2507**](mistralai/Voxtral-Small-24B-2507) with the encoder from [**roest-whisper-large-v1**](CoRal-project/roest-whisper-large-v1). The decoder and audio-adapter were finetuned using LoRA for 2 epochs on the Danish [coral dataset](CoRal-project/coral) for automatic speech recognition (ASR).
+ danstral is a 24B parameter state-of-the-art model for automatic speech recognition (ASR), which combines the decoder and audio-adapter of [**Voxtral-Small-24B-2507**](mistralai/Voxtral-Small-24B-2507) with the audio encoder from [**roest-whisper-large-v1**](CoRal-project/roest-whisper-large-v1). The decoder and audio-adapter were finetuned using LoRA for 2 epochs (40 hours) on the Danish [coral dataset](CoRal-project/coral), using 3 NVIDIA L40s. Although it achieves SOTA on CoRal, it is a humongous model and likely overkill compared to Whisper-based models.
 
  ## Evaluation Results
 
@@ -59,7 +59,7 @@ danstral-v1:
  - is finetuned solely on the coral v1 dataset and performance may deteriorate significantly for other data sources.
 
  ## Future work and ideas
- - SOTA performance was achieved using a LoRA adapter with 25M parameters. A full finetune on larger GPU's and bigger datasets will likely give even better results
+ - SOTA performance was achieved using a LoRA adapter with 25M parameters. I only conducted a few experiments, and there are likely more performance gains to be had by tweaking the LoRA configuration or by conducting a full parameter finetune.
  - Using danstral-v1 for knowledge distillation to train smaller models
 
 
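As a rough illustration of the LoRA setup the updated README describes (a roughly 25M-parameter adapter on the decoder and audio-adapter of Voxtral-Small-24B-2507), here is a minimal PEFT sketch. The rank, alpha, dropout, and target module names are assumptions for illustration, not the configuration actually used for danstral-v1.

```python
# Hypothetical LoRA configuration sketch; the values and module names below are
# assumptions, not the settings used to train danstral-v1.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,                      # adapter rank (assumed)
    lora_alpha=64,             # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed decoder attention projections
    task_type="CAUSAL_LM",
)

# model = ...  # load the combined Voxtral decoder / Whisper encoder model described in the README
# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()  # a rank/target choice in this ballpark yields tens of millions of trainable parameters
```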
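The "Future work" bullet on knowledge distillation would, in principle, use danstral-v1 as a teacher for a smaller student ASR model. The sketch below shows a generic logit-distillation loss, assuming teacher and student share a tokenizer; it illustrates the idea only and is not a recipe from the model card.

```python
# Generic sequence-level logit distillation loss (illustrative; assumes teacher and
# student produce logits over the same vocabulary).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term against the teacher with the usual CE against the labels."""
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # ignore padding positions
    )
    return alpha * kl + (1.0 - alpha) * ce
```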