Simple TTS Model
A lightweight text-to-speech (TTS) model trained on the LJSpeech dataset.
Model Description
This is a FastSpeech2-style TTS model with:
- Transformer encoder over the input character sequence
- Duration predictor that estimates how many mel spectrogram frames each input character spans
- Transformer decoder for mel spectrogram generation
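In a FastSpeech2-style model, the predicted durations connect the encoder and decoder: each encoder output vector is repeated once per predicted frame (the "length regulator") so the decoder sees a sequence as long as the target mel spectrogram. The following is a minimal pure-Python sketch of that expansion, assuming integer durations; the function name `length_regulate` is illustrative and is not taken from this model's code.

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(encoder_out: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Repeat each encoder vector durations[i] times (FastSpeech-style length regulator).

    Illustrative sketch: real implementations operate on batched tensors.
    """
    if len(encoder_out) != len(durations):
        raise ValueError("need exactly one duration per input token")
    expanded: List[Vector] = []
    for vec, d in zip(encoder_out, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # A token with duration d covers d mel frames; d == 0 drops the token.
        # Copy the vector each time so the output frames do not share one list.
        expanded.extend(list(vec) for _ in range(d))
    return expanded


# Three input characters with predicted durations 2, 0 and 3 give 5 decoder frames.
frames = length_regulate([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [2, 0, 3])
print(len(frames))  # 5
```

The output length always equals the sum of the durations, which is how the model decides how long the generated spectrogram will be.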
Training
- Dataset: LJSpeech (5,000 samples, a subset of the 13,100-clip corpus)
- Hardware: Kaggle T4 GPU
- Training duration: 20 epochs
Model Parameters
- Total parameters: 5,168,465
- Hidden dimension: 256
- Number of layers: 3
- Attention heads: 4
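As a rough sanity check on these numbers, the parameters of a standard Transformer layer can be counted by hand. The sketch below assumes a PyTorch-style layer (multi-head attention, a two-layer feed-forward block, two layer norms), a feed-forward size of 1024, and 3 layers in each of the encoder and decoder. None of these three assumptions is stated in the card, so the result will not reproduce the reported total of 5,168,465 exactly; embeddings, the duration predictor and the output projection also contribute.

```python
def attention_params(d_model: int) -> int:
    # Q, K, V and output projections: four d_model x d_model weights plus biases.
    # The head count only splits d_model into slices, so it does not change this number.
    return 4 * d_model * d_model + 4 * d_model


def ffn_params(d_model: int, d_ff: int) -> int:
    # Two linear layers, d_model -> d_ff -> d_model, each with a bias.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    return 2 * d_model  # learned scale and shift


def transformer_layer_params(d_model: int, d_ff: int) -> int:
    # One attention block, one feed-forward block, two layer norms.
    return attention_params(d_model) + ffn_params(d_model, d_ff) + 2 * layer_norm_params(d_model)


D_MODEL = 256   # hidden dimension from the card
N_LAYERS = 3    # ASSUMPTION: 3 layers in each of the encoder and decoder
D_FF = 1024     # ASSUMPTION: feed-forward size is not stated in the card

per_layer = transformer_layer_params(D_MODEL, D_FF)
both_stacks = 2 * N_LAYERS * per_layer
print(per_layer, both_stacks)  # 789760 4738560
```

Under these assumptions the two Transformer stacks would account for most of the model, and the 4 attention heads add no parameters beyond what a single head of the same width would have.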
Usage
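import torch
# Load the checkpoint onto the CPU so it also works on machines without a GPU
checkpoint = torch.load('pytorch_model.bin', map_location='cpu')
# Initialize the model with its config and load the weights. The model class is
# not bundled with this card, so `SimpleTTS` and `config` are hypothetical:
# model = SimpleTTS(**config)
# model.load_state_dict(checkpoint)
# model.eval()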
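Because the model reads characters rather than phonemes, input text has to be mapped to integer IDs before it reaches the encoder. The card does not publish the model's symbol set, so the vocabulary below (`SYMBOLS`, with `<pad>` and `<unk>` entries) is entirely hypothetical; it only illustrates the usual shape of character-level preprocessing and batch padding.

```python
from typing import Dict, List

# HYPOTHETICAL vocabulary: the real model's character set is not documented.
PAD, UNK = "<pad>", "<unk>"
SYMBOLS = [PAD, UNK] + list(" abcdefghijklmnopqrstuvwxyz'.,?!-")


def build_vocab(symbols: List[str]) -> Dict[str, int]:
    # Assign each symbol its position in the list as its ID.
    return {s: i for i, s in enumerate(symbols)}


def text_to_ids(text: str, vocab: Dict[str, int]) -> List[int]:
    """Lower-case the text and map each character to an ID; unknown characters map to <unk>."""
    unk = vocab[UNK]
    return [vocab.get(ch, unk) for ch in text.lower()]


def pad_batch(seqs: List[List[int]], pad_id: int) -> List[List[int]]:
    # Right-pad every sequence to the length of the longest one.
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


vocab = build_vocab(SYMBOLS)
batch = pad_batch([text_to_ids("Hi!", vocab), text_to_ids("ok", vocab)], vocab[PAD])
print(batch)  # [[10, 11, 33], [17, 13, 0]]
```

Whatever vocabulary the checkpoint was trained with must be reproduced exactly, since the embedding table is indexed by these IDs.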
Limitations
This is a basic model for demonstration purposes. For production use, consider:
- Training on more data
- Adding a vocoder (e.g., HiFi-GAN) to convert the predicted mel spectrograms into audio
- Using phoneme-based input instead of characters
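A vocoder turns each mel frame into a fixed number of waveform samples (the hop length), so the summed durations from the duration predictor directly determine how long the generated audio is. The sketch below assumes LJSpeech's 22,050 Hz sampling rate and a hop length of 256 samples, as used in common HiFi-GAN LJSpeech configurations; the card does not state this model's STFT settings, so both constants are assumptions.

```python
# ASSUMPTIONS: LJSpeech audio at 22,050 Hz and 256 waveform samples per mel frame.
# This model's actual STFT settings are not given in the card.
SAMPLE_RATE = 22_050
HOP_LENGTH = 256


def frames_to_samples(n_frames: int, hop_length: int = HOP_LENGTH) -> int:
    # The vocoder upsamples each mel frame to hop_length waveform samples.
    return n_frames * hop_length


def frames_to_seconds(n_frames: int, hop_length: int = HOP_LENGTH,
                      sample_rate: int = SAMPLE_RATE) -> float:
    return frames_to_samples(n_frames, hop_length) / sample_rate


# Durations predicted for three characters, summed to the total number of mel frames.
total_frames = sum([12, 8, 15])
print(frames_to_samples(total_frames))            # 8960
print(round(frames_to_seconds(total_frames), 3))  # 0.406
```

The vocoder's hop length must match the one used to compute the training mel spectrograms, otherwise the audio will play back at the wrong speed or with artifacts.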