metadata
tags:
  - text-to-speech
  - tts
  - pytorch
language:
  - ar
license: mit

Multi-Speaker TTS Model

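A trained Arabic text-to-speech (TTS) model. The model class supports multiple speakers, but this checkpoint was trained on a single speaker. Its specifications are listed below.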

Model Details

  • Speakers: 1
  • Mel Channels: 80
  • Sample Rate: 22050 Hz
  • Vocab Size: 50000

Training Stats

  • Final Mel Loss: 5.3586
  • Validation Mel Loss: 5.4784
  • Training Epochs: 105

Mel Normalization

  • Mean: -4.8906
  • Std: 4.9261
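
The statistics above are presumably used to standardize mel spectrograms during training, so model output would need to be mapped back to the original scale before vocoding. The sketch below assumes plain per-value standardization, `(mel - mean) / std`; this README does not state the exact convention, and the helper names `normalize_mel` and `denormalize_mel` are hypothetical.

```python
# Normalization statistics listed in this README.
MEL_MEAN = -4.8906
MEL_STD = 4.9261


def normalize_mel(mel):
    """Standardize a mel spectrogram given as a list of frames.

    Assumed convention: (value - mean) / std, applied to every bin.
    """
    return [[(v - MEL_MEAN) / MEL_STD for v in frame] for frame in mel]


def denormalize_mel(mel_norm):
    """Invert normalize_mel, e.g. before passing model output to a vocoder."""
    return [[v * MEL_STD + MEL_MEAN for v in frame] for frame in mel_norm]


# Round trip on a tiny 2-frame, 3-bin example (real frames would have 80 bins).
mel = [[-4.8906, 0.0, -9.8167], [-1.0, -2.0, -3.0]]
restored = denormalize_mel(normalize_mel(mel))
```

If the training code normalizes differently (for example per channel, or with clipping), the inverse has to be adapted to match.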

Usage

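import torch
from transformers import AutoTokenizer

# MultiSpeakerTTS is the model class from the training code. It is not defined
# in this snippet and must be imported before running it.

# Load the checkpoint on CPU (move the model to a GPU afterwards if one is available).
# Note: if the checkpoint stores non-tensor Python objects, recent PyTorch versions
# may require weights_only=False; only use that with files you trust.
checkpoint = torch.load('checkpoint_best.pt', map_location='cpu')
model = MultiSpeakerTTS(checkpoint['config'])
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Tokenize the input text
tokenizer = AutoTokenizer.from_pretrained("miscovery/tokenizer_v3")
text = "Your text here"
tokens = tokenizer.encode(text)
# ... inference code ...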

Speaker Mapping

{ "mahmoud": 0 }
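
A minimal sketch of resolving a speaker name to its integer id with the mapping above. The helper `speaker_id_for` is hypothetical, and how the id is passed to the model is not shown in this README, so that step is left out.

```python
import json

# Mapping copied from this README.
SPEAKER_MAP = json.loads('{ "mahmoud": 0 }')


def speaker_id_for(name, mapping=SPEAKER_MAP):
    """Return the integer id for a speaker name (hypothetical helper)."""
    try:
        return mapping[name]
    except KeyError:
        known = ", ".join(sorted(mapping))
        raise ValueError(
            f"Unknown speaker {name!r}; known speakers: {known}"
        ) from None


speaker_id = speaker_id_for("mahmoud")
```

Raising on unknown names avoids silently falling back to a default voice when a name is misspelled.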