Salt-Cosy: CosyVoice2 Fine-tuned on Russian Audiobooks

The LLM component of CosyVoice2-0.5B, fine-tuned on the ToneBooks Russian audiobook dataset.

Usage

This repository contains only the fine-tuned llm.pt component; the other CosyVoice2-0.5B files come from the base model. To use it:

Download the base CosyVoice2-0.5B model from ModelScope:

modelscope download --model FunAudioLLM/CosyVoice2-0.5B --local_dir pretrained_models/CosyVoice2-0.5B

Replace the llm.pt with this fine-tuned version:

# Backup original
mv pretrained_models/CosyVoice2-0.5B/llm.pt pretrained_models/CosyVoice2-0.5B/llm.pt.original

# Download fine-tuned weights
wget https://huggingface.co/AlexWortega/Salt-cosy/resolve/main/llm.pt -O pretrained_models/CosyVoice2-0.5B/llm.pt
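The backup-and-replace step above can also be scripted so that re-running it never overwrites the original weights. This is a minimal sketch, assuming the fine-tuned llm.pt has already been downloaded to a local path; the function name `install_finetuned_llm` and the `backup_suffix` parameter are hypothetical and not part of CosyVoice.

```python
import shutil
from pathlib import Path


def install_finetuned_llm(model_dir, finetuned_llm, backup_suffix=".original"):
    """Swap llm.pt in model_dir for finetuned_llm, keeping a one-time backup.

    Hypothetical helper: mirrors the manual mv + download steps, nothing more.
    """
    model_dir = Path(model_dir)
    target = model_dir / "llm.pt"
    backup = model_dir / ("llm.pt" + backup_suffix)

    if not Path(finetuned_llm).is_file():
        raise FileNotFoundError(f"fine-tuned checkpoint not found: {finetuned_llm}")

    # Back up the base weights only once, so a second run does not replace
    # the original checkpoint with an already fine-tuned one.
    if target.exists() and not backup.exists():
        shutil.move(str(target), str(backup))

    # Copy the fine-tuned weights into place under the name CosyVoice2 expects.
    shutil.copyfile(finetuned_llm, target)
    return target, backup


# Example call (paths are assumptions):
# install_finetuned_llm("pretrained_models/CosyVoice2-0.5B", "downloads/llm.pt")
```

To restore the base model, copy llm.pt.original back over llm.pt.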

Use CosyVoice2 as normal:

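import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2

cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', fp16=True)

# Zero-shot TTS
# tts_text: Russian text to synthesize ("Hello! This is a test speech generation.")
# prompt_text: transcript of the reference audio (placeholder, replace with the real transcript)
for i, result in enumerate(cosyvoice.inference_zero_shot(
    tts_text="Привет! Это тестовая генерация речи.",
    prompt_text="Текст референсного аудио",
    prompt_wav="reference.wav",
    stream=False
)):
    # result['tts_speech'] contains the generated audio; save each segment to a WAV file
    torchaudio.save(f"output_{i}.wav", result['tts_speech'], cosyvoice.sample_rate)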

Training Details

Base Model: CosyVoice2-0.5B
Dataset: ToneBooks (Russian audiobooks)
Epochs: 18
Component: LLM only

License

Apache 2.0 (following CosyVoice license)
