Vikhr Salt: Speech And Language Transformer


Vikhr Salt is a multimodal model based on a pre-trained large language model, extended with new audio tokens to handle both TTS (text-to-speech) and ASR (automatic speech recognition) tasks. The model incorporates two variants for encoding audio, Encodec and SpeechTokenizer, and achieves stable training by fine-tuning precision settings. This approach allows Vikhr Salt to leverage pre-existing LLM knowledge while effectively generating and understanding speech, marking a step forward in multimodal learning.
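
The idea of extending an LLM vocabulary with audio tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary size, codebook size, number of codebooks, and all function names below are hypothetical and chosen only for illustration. It shows one common way to give each discrete codec code (such as those produced by Encodec or SpeechTokenizer) its own token id placed after the text vocabulary.

```python
# Illustrative sketch only. All sizes and names are assumptions,
# not values taken from the Vikhr Salt model card.
TEXT_VOCAB_SIZE = 32_000  # size of the base LLM vocabulary (assumed)
CODEBOOK_SIZE = 1_024     # entries per codec codebook (assumed)
NUM_CODEBOOKS = 2         # quantizer levels kept per audio frame (assumed)

# Total vocabulary after appending one new token per (codebook, code) pair.
EXTENDED_VOCAB_SIZE = TEXT_VOCAB_SIZE + NUM_CODEBOOKS * CODEBOOK_SIZE


def audio_token_id(codebook: int, code: int) -> int:
    """Map a (codebook, code) pair to a token id placed after the text vocabulary."""
    if not 0 <= codebook < NUM_CODEBOOKS:
        raise ValueError(f"codebook index out of range: {codebook}")
    if not 0 <= code < CODEBOOK_SIZE:
        raise ValueError(f"code out of range: {code}")
    return TEXT_VOCAB_SIZE + codebook * CODEBOOK_SIZE + code


def decode_audio_token(token_id: int) -> tuple[int, int]:
    """Invert audio_token_id: recover (codebook, code) from a token id."""
    offset = token_id - TEXT_VOCAB_SIZE
    if not 0 <= offset < NUM_CODEBOOKS * CODEBOOK_SIZE:
        raise ValueError(f"not an audio token id: {token_id}")
    return divmod(offset, CODEBOOK_SIZE)


def flatten_frames(frames: list[list[int]]) -> list[int]:
    """Turn per-frame codes (one code per codebook) into a flat token sequence.

    Codes are emitted frame by frame, codebook 0 first within each frame.
    """
    return [
        audio_token_id(codebook, code)
        for frame in frames
        for codebook, code in enumerate(frame)
    ]
```

With a mapping like this, an ASR example would place audio token ids in the prompt and text token ids in the target, and a TTS example would do the reverse, so a single language model can be trained on both directions.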

Model Authors

Ksenya Sycheva, Konstantin Korolev, Aleksandr Nikolic

Model size: 1B params
Tensor type: BF16
Weights format: Safetensors

Model repository: Vikhrmodels/salt-116k