SpeechT5_MIKO_Korean

A SpeechT5 model fine-tuned for Korean TTS (Text-to-Speech).
The full pipeline and inference code are available in the GitHub repository.
Clone the repository with git to use them.

Model Details

  • Base model: microsoft/speecht5_tts
  • Vocoder: microsoft/speecht5_hifigan
  • Speaker encoder: microsoft/wavlm-base-plus-sv
  • Language: Korean (ko)
  • Author: 안호성 (GitHub: hobi2k, Hugging Face: ahnhs2k)

Training Data

The training dataset was built by preprocessing simon3000/genshin-voice.

Training Procedure

  • Framework: PyTorch + Hugging Face Transformers
  • Mixed precision: AMP (automatic mixed precision) enabled
  • Training script: scripts/train.py
  • Inference script: scripts/inference.py
  • Full pipeline repository: https://github.com/hobi2k/SpeechT5_Korean
  • Checkpoint policy:
    • checkpoint_last.pt is saved every epoch
    • checkpoints/epoch_XXXXXX/ is saved every 10 epochs (at most 5 are kept)
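
The checkpoint policy above can be sketched as follows. This is a minimal illustration and not the code in scripts/train.py. It assumes that XXXXXX stands for a zero-padded six-digit epoch number, and it uses a hypothetical file name (model.bin) and raw bytes in place of a real serialized model state.

```python
import shutil
from pathlib import Path

SAVE_EVERY = 10  # periodic snapshot interval, from the policy above
MAX_KEPT = 5     # maximum number of periodic snapshots, from the policy above


def epoch_dir_name(epoch: int) -> str:
    # Assumption: "XXXXXX" means a zero-padded six-digit epoch number.
    return f"epoch_{epoch:06d}"


def save_checkpoints(output_dir: Path, epoch: int, payload: bytes) -> None:
    """Write checkpoint_last.pt every epoch and a rotating snapshot every 10 epochs."""
    output_dir.mkdir(parents=True, exist_ok=True)

    # Overwritten on every epoch. A real trainer would serialize the model
    # and optimizer state here instead of writing raw bytes.
    (output_dir / "checkpoint_last.pt").write_bytes(payload)

    if epoch % SAVE_EVERY != 0:
        return

    ckpt_root = output_dir / "checkpoints"
    target = ckpt_root / epoch_dir_name(epoch)
    target.mkdir(parents=True, exist_ok=True)
    (target / "model.bin").write_bytes(payload)  # hypothetical file name

    # Zero-padded names sort chronologically, so the oldest come first.
    snapshots = sorted(
        p for p in ckpt_root.iterdir() if p.is_dir() and p.name.startswith("epoch_")
    )
    for old in snapshots[:-MAX_KEPT]:
        shutil.rmtree(old)
```

Calling save_checkpoints once per epoch for 80 epochs would leave checkpoint_last.pt plus the snapshots for epochs 40 through 80.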

Inference

Inference with the project script:

uv run scripts/inference.py \
  --model_dir /path/to/output_model \
  --text "안녕하세요. 테스트 문장입니다." \
  --out out.wav

Inference with a periodically saved checkpoint:

uv run scripts/inference.py \
  --model_dir /path/to/output_model \
  --checkpoint_epoch 40 \
  --text "40 epoch 모델 테스트" \
  --out out_epoch40.wav
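
To synthesize several sentences in one go, the commands above can be driven from Python. The sketch below only reuses the flags shown in this section (--model_dir, --checkpoint_epoch, --text, --out). The helper names and the out_001.wav naming scheme are hypothetical, and it assumes uv is on the PATH and that the script is run from the repository root.

```python
import subprocess
from typing import List, Optional, Sequence


def build_command(
    model_dir: str,
    text: str,
    out: str,
    checkpoint_epoch: Optional[int] = None,
) -> List[str]:
    """Build the argv list for one call to scripts/inference.py."""
    cmd = ["uv", "run", "scripts/inference.py", "--model_dir", model_dir]
    if checkpoint_epoch is not None:
        # Selects a periodically saved checkpoint instead of the default model.
        cmd += ["--checkpoint_epoch", str(checkpoint_epoch)]
    cmd += ["--text", text, "--out", out]
    return cmd


def synthesize_all(
    model_dir: str,
    sentences: Sequence[str],
    checkpoint_epoch: Optional[int] = None,
    dry_run: bool = False,
) -> List[List[str]]:
    """Run inference once per sentence; with dry_run=True only build the commands."""
    commands = []
    for index, sentence in enumerate(sentences, start=1):
        out = f"out_{index:03d}.wav"  # hypothetical output naming scheme
        cmd = build_command(model_dir, sentence, out, checkpoint_epoch)
        commands.append(cmd)
        if not dry_run:
            # Passing a list avoids shell quoting problems with Korean text.
            subprocess.run(cmd, check=True)
    return commands
```

Passing dry_run=True returns the command lists without running them, which is a convenient way to check the arguments before starting a long batch.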

Limitations

  • Audio quality may vary with the data domain and speaker characteristics.
  • Pronunciation quality may degrade on sentences with many numbers, loanwords, or special symbols.
  • This model was trained for a specific dataset and preprocessing policy.
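
One common mitigation for the number limitation is to spell digits out in Hangul before synthesis. The sketch below is not part of this project's preprocessing. It is a hypothetical helper that reads integers below 100,000,000 as Sino-Korean numerals and leaves longer digit runs untouched. It does not handle native Korean numerals used with counters, decimals, or phone-number style readings, so treat it as a starting point only.

```python
import re

DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]  # units inside one four-digit group


def read_group(n: int) -> str:
    """Read 0..9999 as Sino-Korean; returns an empty string for 0."""
    parts = []
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            # "일" is dropped before 십/백/천, so 10 reads "십", not "일십".
            parts.append(SMALL_UNITS[power])
        else:
            parts.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(parts)


def sino_korean(n: int) -> str:
    """Read an integer in [0, 10**8) as a Sino-Korean numeral."""
    if n == 0:
        return "영"
    if not 0 < n < 10 ** 8:
        raise ValueError("only integers in [0, 10**8) are supported")
    high, low = divmod(n, 10 ** 4)
    result = ""
    if high:
        # 10000 is read "만" here; the leading "일" is omitted.
        result += ("" if high == 1 else read_group(high)) + "만"
    return result + read_group(low)


def normalize_numbers(text: str) -> str:
    """Replace each run of ASCII digits with its Sino-Korean reading."""
    def replace(match: "re.Match[str]") -> str:
        digits = match.group()
        if len(digits) > 8:
            return digits  # out of range for this sketch; left unchanged
        return sino_korean(int(digits))

    return re.sub(r"[0-9]+", replace, text)
```

For example, normalize_numbers("40 epoch 모델 테스트") yields "사십 epoch 모델 테스트", which can then be passed to --text.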

Citation

@misc{speecht5_korean,
  title        = {SpeechT5_Korean: Korean SpeechT5 Training and Inference Pipeline},
  author       = {안호성 (GitHub: hobi2k)},
  year         = {2026},
  url          = {https://github.com/hobi2k/SpeechT5_Korean},
  note         = {Hugging Face: https://huggingface.co/ahnhs2k}
}

Acknowledgements
