PentesterPriyanshu committed on
Commit 00ed4da · verified · 1 Parent(s): bb43687

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +49 -0
  2. config.json +11 -0
  3. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ language:
+ - en
+ tags:
+ - text-to-speech
+ - tts
+ - audio
+ license: mit
+ ---
+
+ # Simple TTS Model
+
+ A lightweight text-to-speech (TTS) model trained on the LJSpeech dataset.
+
+ ## Model Description
+
+ This is a FastSpeech2-style TTS model with:
+ - Transformer encoder for text encoding
+ - Duration predictor
+ - Transformer decoder for mel spectrogram generation
+
+ ## Training
+
+ - Dataset: LJSpeech (5,000 samples)
+ - Hardware: Kaggle T4 GPU
+ - Training duration: 20 epochs
+
+ ## Model Parameters
+
+ - Total parameters: 5,168,465
+ - Hidden dimension: 256
+ - Number of layers: 3
+ - Attention heads: 4
35
+ ## Usage
36
+ ```python
37
+ import torch
38
+
39
+ # Load model
40
+ checkpoint = torch.load('pytorch_model.bin')
41
+ # Initialize model with config and load weights
42
+ ```
+
+ ## Limitations
+
+ This is a basic model for demonstration purposes. For production use, consider:
+ - Training on more data
+ - Adding a vocoder (e.g., HiFi-GAN) for audio generation
+ - Using phoneme-based input instead of characters
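The Usage snippet in the README stops before the model is built, and this commit does not include the model class. A minimal sketch of the loading step follows. It assumes the checkpoint is either a bare state dict or a dict that wraps one under a key such as `model_state_dict`; that key is a guess, not something the commit confirms. `load_checkpoint` and `count_parameters` are hypothetical helper names.

```python
import json

import torch


def load_checkpoint(weights_path="pytorch_model.bin", config_path="config.json"):
    """Return (config, state_dict) read from the two uploaded files."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    # map_location="cpu" lets the file load on machines without a GPU.
    checkpoint = torch.load(weights_path, map_location="cpu")

    # Assumption: the weights are either the top-level dict itself or are
    # nested under "model_state_dict"; adjust the key after inspecting the file.
    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
        state_dict = checkpoint["model_state_dict"]
    else:
        state_dict = checkpoint
    return config, state_dict


def count_parameters(state_dict):
    """Sum the element counts of all tensors in a state dict."""
    return sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
```

Once the matching model class is available, passing `state_dict` to its `load_state_dict` method would be the next step. Comparing `count_parameters(state_dict)` with the 5,168,465 figure in the README is a quick sanity check, keeping in mind that a state dict may also hold non-trainable buffers.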
config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "model_type": "simple_tts",
+   "vocab_size": 60,
+   "d_model": 256,
+   "n_heads": 4,
+   "n_layers": 3,
+   "n_mels": 80,
+   "sample_rate": 22050,
+   "hop_length": 256,
+   "n_fft": 1024
+ }
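The audio settings in `config.json` fix the time resolution of the mel spectrograms the decoder produces. The sketch below derives a few of those quantities from the values in the diff. `derived_quantities` is a hypothetical helper, and the FFT span is only equal to the analysis window if the window length matches `n_fft`, which the config does not state.

```python
import json

# Values copied from the config.json added in this commit.
CONFIG_TEXT = """
{
  "model_type": "simple_tts",
  "vocab_size": 60,
  "d_model": 256,
  "n_heads": 4,
  "n_layers": 3,
  "n_mels": 80,
  "sample_rate": 22050,
  "hop_length": 256,
  "n_fft": 1024
}
"""


def derived_quantities(cfg):
    """Compute frame timing and per-head width implied by the config."""
    return {
        # Time between consecutive mel frames, in milliseconds.
        "frame_shift_ms": 1000.0 * cfg["hop_length"] / cfg["sample_rate"],
        # Number of mel frames generated per second of audio.
        "frames_per_second": cfg["sample_rate"] / cfg["hop_length"],
        # Duration covered by one FFT (assumes window length == n_fft).
        "fft_span_ms": 1000.0 * cfg["n_fft"] / cfg["sample_rate"],
        # Linear-frequency bins of a real-input FFT, before the mel filterbank.
        "stft_bins": cfg["n_fft"] // 2 + 1,
        # Width of each attention head in the Transformer layers.
        "head_dim": cfg["d_model"] // cfg["n_heads"],
    }


config = json.loads(CONFIG_TEXT)
derived = derived_quantities(config)
for name, value in derived.items():
    print(f"{name}: {value}")
```

With these settings each mel frame advances by roughly 11.6 ms, so one second of speech corresponds to about 86 frames of 80 mel bins each.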
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f4c1b52a4bb2b3f7c06c9d8209872a796cbf7d15fb95b190c30227a6ed41e45
+ size 25827261
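The three lines above are a Git LFS pointer, not the weights themselves; the `size` field gives the byte length of the real file. As a rough consistency check, the sketch below parses the pointer and compares that size with the README's parameter count. It assumes the weights are stored as 32-bit floats, which the commit does not state; `parse_lfs_pointer` is a hypothetical helper.

```python
# Pointer text copied from the pytorch_model.bin entry in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6f4c1b52a4bb2b3f7c06c9d8209872a796cbf7d15fb95b190c30227a6ed41e45
size 25827261
"""

# Parameter count reported in the README.
PARAM_COUNT = 5_168_465


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
file_bytes = int(pointer["size"])

# Assumption: 4 bytes per parameter (float32).
float32_bytes = PARAM_COUNT * 4
extra_bytes = file_bytes - float32_bytes

print(f"file size:        {file_bytes} bytes")
print(f"float32 weights:  {float32_bytes} bytes")
print(f"unaccounted for:  {extra_bytes} bytes")
```

Under the float32 assumption the parameters account for about 20.7 MB of the 25.8 MB file. The remainder could be non-trainable buffers or other checkpoint entries, but the diff alone cannot confirm which.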