SpeechT5_MIKO_Korean

A SpeechT5 model fine-tuned for Korean text-to-speech (TTS).
The full pipeline and inference code are available in the GitHub repository.
Clone the repository with git to use it.

Sample

Model Details

Base model: microsoft/speecht5_tts
Vocoder: microsoft/speecht5_hifigan
Speaker encoder: microsoft/wavlm-base-plus-sv
Language: Korean (ko)
Author: 안호성 (GitHub: hobi2k, Hugging Face: ahnhs2k)
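
As a rough illustration of how the three components listed above fit together, here is a minimal sketch using the Hugging Face Transformers SpeechT5 and WavLM classes. It assumes that `model_dir` (a hypothetical path) holds the fine-tuned model and its processor in standard Transformers format, and that speaker embeddings are L2-normalized x-vectors from the speaker encoder. Neither assumption is confirmed by this card, and the repository's scripts/inference.py remains the reference path.

```python
VOCODER_ID = "microsoft/speecht5_hifigan"
SPEAKER_ENCODER_ID = "microsoft/wavlm-base-plus-sv"
SAMPLE_RATE = 16_000  # SpeechT5 and its HiFi-GAN vocoder work with 16 kHz audio


def speaker_embedding(waveform):
    """Return an L2-normalized x-vector for a 1-D 16 kHz float waveform."""
    # Heavy imports are kept local so the module can be imported without them.
    import torch
    from transformers import AutoFeatureExtractor, WavLMForXVector

    extractor = AutoFeatureExtractor.from_pretrained(SPEAKER_ENCODER_ID)
    encoder = WavLMForXVector.from_pretrained(SPEAKER_ENCODER_ID)
    inputs = extractor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        embedding = encoder(**inputs).embeddings
    # Assumption: the fine-tuned model expects normalized embeddings.
    return torch.nn.functional.normalize(embedding, dim=-1)


def synthesize(model_dir, text, embedding):
    """Generate a 1-D waveform tensor for `text` in the voice given by `embedding`."""
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: model_dir is a Transformers-format export of the fine-tuned model.
    processor = SpeechT5Processor.from_pretrained(model_dir)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_dir)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        return model.generate_speech(inputs["input_ids"], embedding, vocoder=vocoder)
```

The returned tensor can be written to disk at `SAMPLE_RATE` with any audio library. If the project applies its own text preprocessing before tokenization, this sketch will not reproduce it.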

Training Data

The training set was built by preprocessing simon3000/genshin-voice.

Training Procedure

Framework: PyTorch + Hugging Face Transformers
Mixed precision: AMP enabled
Training script: scripts/train.py
Inference script: scripts/inference.py
Full pipeline repository: https://github.com/hobi2k/SpeechT5_Korean
Checkpoint policy:
- checkpoint_last.pt is saved every epoch
- checkpoints/epoch_XXXXXX/ is saved every 10 epochs (at most 5 are kept)
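
The checkpoint policy above could be implemented along the following lines. This is a sketch, not the code in scripts/train.py: the `save_fn` callback, the `model.pt` filename, the 1-based epoch counter, and the reading of `XXXXXX` as a six-digit zero-padded epoch number are all assumptions made for illustration.

```python
import shutil
from pathlib import Path

SAVE_EVERY = 10  # from the policy: periodic snapshot every 10 epochs
KEEP_LAST = 5    # from the policy: keep at most 5 periodic snapshots


def save_checkpoints(output_dir, epoch, save_fn,
                     save_every=SAVE_EVERY, keep=KEEP_LAST):
    """Apply the rolling checkpoint policy for one finished epoch.

    `save_fn(path)` is a hypothetical callback that writes the training
    state to `path` (for example, a wrapper around torch.save).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Overwritten after every epoch so training can resume from the latest state.
    save_fn(output_dir / "checkpoint_last.pt")

    # Assumption: epochs are counted from 1, so snapshots land on 10, 20, ...
    if epoch <= 0 or epoch % save_every != 0:
        return

    root = output_dir / "checkpoints"
    epoch_dir = root / f"epoch_{epoch:06d}"  # assumed six-digit zero padding
    epoch_dir.mkdir(parents=True, exist_ok=True)
    save_fn(epoch_dir / "model.pt")  # hypothetical filename

    # Zero-padded names sort chronologically; drop everything but the newest.
    snapshots = sorted(p for p in root.glob("epoch_*") if p.is_dir())
    for stale in snapshots[:max(0, len(snapshots) - keep)]:
        shutil.rmtree(stale)
```

Under these assumptions, after 70 epochs the directory would hold checkpoint_last.pt plus the snapshots for epochs 30 through 70.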

Inference

Inference with the project script:

uv run scripts/inference.py \
  --model_dir /path/to/output_model \
  --text "안녕하세요. 테스트 문장입니다." \
  --out out.wav

Inference with a selected periodic checkpoint:

uv run scripts/inference.py \
  --model_dir /path/to/output_model \
  --checkpoint_epoch 40 \
  --text "40 epoch 모델 테스트" \
  --out out_epoch40.wav
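
To synthesize several sentences, the two commands above can be driven from Python. The sketch below only builds and runs the same command lines; the flag names come from the examples above, while the helper names and the output file naming are hypothetical. It assumes `uv` is on the PATH and that it runs from the repository root.

```python
import subprocess
from pathlib import Path


def build_inference_command(model_dir, text, out, checkpoint_epoch=None):
    """Build the argument list for scripts/inference.py."""
    cmd = ["uv", "run", "scripts/inference.py", "--model_dir", str(model_dir)]
    if checkpoint_epoch is not None:
        # Selects a periodic checkpoint instead of the default model.
        cmd += ["--checkpoint_epoch", str(checkpoint_epoch)]
    cmd += ["--text", text, "--out", str(out)]
    return cmd


def synthesize_all(model_dir, sentences, out_dir, checkpoint_epoch=None):
    """Run one inference call per sentence and return the output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, sentence in enumerate(sentences):
        out_path = out_dir / f"utt_{index:03d}.wav"  # hypothetical naming scheme
        cmd = build_inference_command(model_dir, sentence, out_path, checkpoint_epoch)
        # A list of arguments avoids shell quoting problems with Korean text.
        subprocess.run(cmd, check=True)
        outputs.append(out_path)
    return outputs
```

Each call reloads the model, so this is convenient rather than fast. For large batches, importing the project's inference code directly would likely be more efficient.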

Limitations

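Audio quality may vary with the data domain and speaker characteristics.
Sentences with many numerals, loanwords, or special symbols may be pronounced less accurately.
The model was trained for a specific dataset and preprocessing policy.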
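
One common way to reduce the numeral problem noted above is to spell digits out before synthesis. The sketch below converts digit runs to Sino-Korean readings. It is an illustration only and is not known to match the project's preprocessing. It does not handle native Korean numerals used with counters, decimals, phone numbers, or leading zeros, and the function names are hypothetical.

```python
import re

DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]  # 10^0 .. 10^3 within a group
BIG_UNITS = ["", "만", "억", "조"]    # 10^0, 10^4, 10^8, 10^12


def _read_group(n):
    """Read an integer in 1..9999 in Sino-Korean."""
    parts = []
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        # "일" is dropped before 십/백/천 (110 is read "백십").
        if digit == 1 and power > 0:
            parts.append(SMALL_UNITS[power])
        else:
            parts.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(parts)


def sino_korean(n):
    """Return the Sino-Korean reading of an integer with 0 <= n < 10**16."""
    if not 0 <= n < 10 ** 16:
        raise ValueError("supported range is 0 <= n < 10**16")
    if n == 0:
        return "영"
    parts = []
    group_index = 0
    while n > 0:
        n, group = divmod(n, 10_000)
        if group:
            reading = _read_group(group)
            # 10000 is conventionally read "만" rather than "일만".
            if group == 1 and group_index == 1:
                reading = ""
            parts.append(reading + BIG_UNITS[group_index])
        group_index += 1
    return "".join(reversed(parts))


def normalize_numbers(text):
    """Replace each run of ASCII digits with its Sino-Korean reading."""
    def replace(match):
        digits = match.group()
        # Runs too long for sino_korean are left untouched.
        return sino_korean(int(digits)) if len(digits) <= 16 else digits
    return re.sub(r"[0-9]+", replace, text)
```

For example, `normalize_numbers("40 epoch 모델 테스트")` yields `"사십 epoch 모델 테스트"`. Loanwords and special symbols would need separate handling.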

Citation

@misc{speecht5_korean,
  title        = {SpeechT5_Korean: Korean SpeechT5 Training and Inference Pipeline},
  author       = {안호성 (GitHub: hobi2k)},
  year         = {2026},
  url          = {https://github.com/hobi2k/SpeechT5_Korean},
  note         = {Hugging Face: https://huggingface.co/ahnhs2k}
}

Acknowledgements

Microsoft SpeechT5: https://github.com/microsoft/SpeechT5
microsoft/speecht5_tts: https://huggingface.co/microsoft/speecht5_tts
microsoft/speecht5_hifigan: https://huggingface.co/microsoft/speecht5_hifigan
microsoft/wavlm-base-plus-sv: https://huggingface.co/microsoft/wavlm-base-plus-sv

Model size: 0.1B params (Safetensors, tensor type F32)
