Simple TTS Model

A lightweight Text-to-Speech model trained on the LJSpeech dataset.

Model Description

This is a FastSpeech 2-style TTS model consisting of:

  • Transformer encoder for text encoding
  • Duration predictor
  • Transformer decoder for mel spectrogram generation
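In FastSpeech 2, the duration predictor estimates how many mel frames each input token should span, and a length regulator repeats each encoder output that many times so the decoder receives a sequence at the mel frame rate. Below is a minimal pure-Python sketch of that expansion step. It assumes integer durations have already been predicted and is an illustration of the technique, not this model's actual code; `length_regulate` is a hypothetical name.

```python
from typing import List, Sequence


def length_regulate(hidden: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Repeat each encoder output vector by its duration (in mel frames)."""
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per input token")
    expanded: List[List[float]] = []
    for vec, d in zip(hidden, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # A duration of 0 drops the token; d > 1 stretches it over d frames.
        expanded.extend(list(vec) for _ in range(d))
    return expanded


# Example: 3 token encodings (dimension 2) expanded to 2 + 0 + 3 = 5 mel frames.
enc = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frames = length_regulate(enc, [2, 0, 3])
print(len(frames))  # 5
```

The decoder then only has to map each expanded frame to a mel spectrogram frame, which is why FastSpeech-style models can generate all frames in parallel instead of autoregressively.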

Training

  • Dataset: LJSpeech (a 5,000-utterance subset of its 13,100 clips)
  • Hardware: Kaggle T4 GPU
  • Training length: 20 epochs

Model Parameters

  • Total parameters: 5,168,465
  • Hidden dimension: 256
  • Number of layers: 3
  • Attention heads: 4
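The hyperparameters above imply a per-head dimension of 256 / 4 = 64. Assuming the weights are stored as 32-bit floats (an assumption; the storage dtype is not stated), the parameters alone occupy roughly 20.7 MB, ignoring any extra entries such as buffers or optimizer state. A small worked check:

```python
hidden_dim = 256
num_heads = 4
total_params = 5_168_465

# Multi-head attention splits the hidden dimension evenly across heads.
assert hidden_dim % num_heads == 0
head_dim = hidden_dim // num_heads

# Assumption: float32 storage, i.e. 4 bytes per parameter.
fp32_bytes = total_params * 4

print(head_dim)                      # 64
print(fp32_bytes)                    # 20673860
print(round(fp32_bytes / 2**20, 2))  # 19.72 (MiB)
```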

Usage

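import torch

# Load the checkpoint onto the CPU so this also works on machines without a GPU
checkpoint = torch.load('pytorch_model.bin', map_location='cpu')

# The model class is not bundled with this file: build the network with the
# hyperparameters listed under Model Parameters, then load the weights, e.g.
# model.load_state_dict(checkpoint)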
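The layout of pytorch_model.bin is not documented: it may be a raw state dict or a dict that wraps one alongside a config. The helper below is a hypothetical sketch that handles both cases; the wrapper key names it checks (`state_dict`, `model_state_dict`, `model`) are common conventions assumed for illustration, not keys confirmed for this checkpoint.

```python
from typing import Any, Dict, Mapping

# Hypothetical wrapper keys; inspect checkpoint.keys() to see the real layout.
_WRAPPER_KEYS = ("state_dict", "model_state_dict", "model")


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the parameter mapping from a possibly wrapped checkpoint."""
    for key in _WRAPPER_KEYS:
        value = checkpoint.get(key)
        if isinstance(value, Mapping):
            return dict(value)
    # No wrapper key holds a mapping: treat the checkpoint itself as the state dict.
    return dict(checkpoint)
```

With PyTorch installed, the result would be passed to your own module, e.g. `model.load_state_dict(extract_state_dict(checkpoint))`.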

Limitations

This is a basic model for demonstration purposes. For production use, consider:

  • Training on more data
  • Adding a vocoder (e.g., HiFi-GAN) to convert the predicted mel spectrograms into audio waveforms
  • Using phoneme-based input instead of characters
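Character input means the text is mapped directly to integer IDs, one per character. Characters do not encode pronunciation unambiguously (for example, "read" is pronounced differently in the present and past tense), which is one motivation for phoneme input. A minimal sketch of character-level encoding follows; the symbol set and the `text_to_ids` name are hypothetical, since this model's actual vocabulary is not documented here.

```python
from typing import Dict, List

# Hypothetical symbol set: padding, a few punctuation marks, lowercase letters.
PAD = "_"
SYMBOLS = PAD + " !',-.?abcdefghijklmnopqrstuvwxyz"
CHAR_TO_ID: Dict[str, int] = {c: i for i, c in enumerate(SYMBOLS)}


def text_to_ids(text: str) -> List[int]:
    """Lowercase the text and map each known character to its ID, skipping unknown ones."""
    return [CHAR_TO_ID[c] for c in text.lower() if c in CHAR_TO_ID]


print(text_to_ids("Hi!"))  # [15, 16, 2]
```

A phoneme-based front end would replace `text_to_ids` with a grapheme-to-phoneme step followed by the same kind of lookup over a phoneme inventory.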