Commit d8e9cf4 · verified · 1 Parent(s): 90e7d42
Committed by AJ50

Add synthesizer model with config and documentation

Files changed (3)
  1. Readme.md +28 -0
  2. config.json +28 -0
  3. synthesizer.pt +3 -0
Readme.md ADDED
@@ -0,0 +1,28 @@
+ # Synthesizer Model
+
+ This directory contains the pre-trained synthesizer model for voice conversion.
+
+ ## Model Details
+ - **File**: `synthesizer.pt`
+ - **Size**: ~370.6 MB
+ - **Input**: Text or linguistic features + Speaker embeddings
+ - **Output**: Mel-spectrograms
+
+ ## Usage
+ ```python
+ # Load the synthesizer model
+ synthesizer = torch.load('synthesizer.pt')
+
+ # Generate mel-spectrogram from text and speaker embedding
+ with torch.no_grad():
+     mel_output = synthesizer(text_input, speaker_embedding)
+ ```
+
+ ## Dependencies
+ - PyTorch
+ - NumPy
+ - Text processing utilities (for text input)
+ - Audio processing libraries (for mel-spectrogram conversion)
+
+ ## Model Configuration
+ See `config.json` for model architecture and training parameters.
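Note on the Usage snippet: it omits `import torch` and assumes that `torch.load` returns a callable model. The commit does not say whether `synthesizer.pt` stores a pickled module or only a dictionary of weights, so a loader may want to check which one it received. The sketch below is one way to do that. The helper names `describe_checkpoint` and `load_synthesizer_checkpoint` are hypothetical, and the `"state_dict"` key is a common convention rather than something this commit confirms. On recent PyTorch releases, where `torch.load` defaults to `weights_only=True`, a fully pickled module may additionally need `weights_only=False`, which should only be used for files from a trusted source.

```python
from collections.abc import Mapping


def describe_checkpoint(obj):
    """Classify what torch.load returned: a weights mapping or a callable module."""
    if isinstance(obj, Mapping):
        # Assumption: weights may be nested under a "state_dict" key.
        # If that key is absent, treat the mapping itself as the weights.
        inner = obj.get("state_dict", obj)
        return "state_dict", inner
    if callable(obj):
        # A pickled torch.nn.Module is callable and can be used directly.
        return "module", obj
    raise TypeError(f"unexpected checkpoint type: {type(obj).__name__}")


def load_synthesizer_checkpoint(path="synthesizer.pt"):
    """Load the checkpoint on CPU and report which layout it uses."""
    import torch  # imported lazily so describe_checkpoint works without PyTorch

    # map_location="cpu" avoids needing the device the file was saved from.
    obj = torch.load(path, map_location="cpu")
    return describe_checkpoint(obj)
```

If the result is `"state_dict"`, the weights still have to be loaded into a model class built from `config.json`; that class is not part of this commit.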
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "model_type": "tacotron2",
+   "sample_rate": 22050,
+   "n_mel_channels": 80,
+   "n_frames_per_step": 1,
+   "encoder_embedding_dim": 512,
+   "encoder_kernel_size": 5,
+   "encoder_n_convolutions": 3,
+   "encoder_conv_dropout": 0.5,
+   "attention_rnn_dim": 1024,
+   "attention_dim": 128,
+   "attention_location_n_filters": 32,
+   "attention_location_kernel_size": 31,
+   "decoder_rnn_dim": 1024,
+   "prenet_dim": 256,
+   "max_decoder_steps": 1000,
+   "gate_threshold": 0.5,
+   "p_attention_dropout": 0.1,
+   "p_decoder_dropout": 0.1,
+   "postnet_embedding_dim": 512,
+   "postnet_kernel_size": 5,
+   "postnet_n_convolutions": 5,
+   "mask_padding": true,
+   "fp16_run": false,
+   "version": "1.0",
+   "authors": ["Arjit"],
+   "description": "Tacotron2-based synthesizer for text-to-speech conversion"
+ }
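Note on `config.json`: the file carries probabilities, kernel sizes, and audio parameters that a consumer may want to sanity-check before building a model. The following is a minimal sketch of such a check over a subset of the committed values, inlined so it runs on its own. The function name `check_config` is hypothetical, and the odd-kernel rule is an assumption (odd kernels allow symmetric "same" padding), not a constraint stated in the commit.

```python
import json

# Subset of the values from config.json in this commit, inlined for a standalone run.
CONFIG_TEXT = """
{
  "model_type": "tacotron2",
  "sample_rate": 22050,
  "n_mel_channels": 80,
  "encoder_kernel_size": 5,
  "encoder_conv_dropout": 0.5,
  "attention_location_kernel_size": 31,
  "gate_threshold": 0.5,
  "p_attention_dropout": 0.1,
  "p_decoder_dropout": 0.1,
  "postnet_kernel_size": 5
}
"""

POSITIVE_INTS = ("sample_rate", "n_mel_channels")
UNIT_INTERVAL = ("encoder_conv_dropout", "p_attention_dropout",
                 "p_decoder_dropout", "gate_threshold")
ODD_KERNELS = ("encoder_kernel_size", "attention_location_kernel_size",
               "postnet_kernel_size")


def check_config(cfg):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key in POSITIVE_INTS:
        value = cfg.get(key)
        if not (isinstance(value, int) and value > 0):
            problems.append(f"{key} must be a positive integer")
    for key in UNIT_INTERVAL:
        value = cfg.get(key)
        if not (isinstance(value, (int, float)) and 0.0 <= value <= 1.0):
            problems.append(f"{key} must lie in [0, 1]")
    for key in ODD_KERNELS:
        value = cfg.get(key)
        # Assumption: odd kernel sizes are expected so padding can be symmetric.
        if not (isinstance(value, int) and value % 2 == 1):
            problems.append(f"{key} should be an odd integer")
    return problems


if __name__ == "__main__":
    print(check_config(json.loads(CONFIG_TEXT)))
```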
synthesizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c05e07428f95d0ed8755e1ef54cc8ae251300413d94ce5867a56afe39c499d94
+ size 370554559
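Note on `synthesizer.pt`: the three added lines are a Git LFS pointer, not the weights themselves. The pointer records the SHA-256 digest and byte size of the real file (370554559 bytes, which matches the README's "~370.6 MB" in decimal megabytes). A downloaded copy can be checked against those two fields. The sketch below shows one way to do so with the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
import os

# Pointer contents exactly as added in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c05e07428f95d0ed8755e1ef54cc8ae251300413d94ce5867a56afe39c499d94
size 370554559
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest the pointer records."""
    # Compare sizes first: it is cheap and rules out truncated downloads.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a file of several hundred MB is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("synthesizer.pt", parse_lfs_pointer(POINTER_TEXT))` returns `False` when the local file is still the small pointer stub rather than the fetched weights.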