Add synthesizer model with config and documentation
- Readme.md +28 -0
- config.json +28 -0
- synthesizer.pt +3 -0
Readme.md
ADDED
@@ -0,0 +1,28 @@
# Synthesizer Model

This directory contains the pre-trained synthesizer model for voice conversion.

## Model Details

- **File**: `synthesizer.pt`
- **Size**: ~370.6 MB
- **Input**: text or linguistic features plus a speaker embedding
- **Output**: mel-spectrograms

## Usage

```python
import torch

# Load the synthesizer model (map_location keeps this CPU-safe;
# move it to your device afterwards if needed)
synthesizer = torch.load('synthesizer.pt', map_location='cpu')
synthesizer.eval()

# Generate a mel-spectrogram from text and a speaker embedding
with torch.no_grad():
    mel_output = synthesizer(text_input, speaker_embedding)
```

## Dependencies

- PyTorch
- NumPy
- Text processing utilities (for text input)
- Audio processing libraries (for mel-spectrogram conversion)

## Model Configuration

See `config.json` for model architecture and training parameters.
config.json
ADDED
@@ -0,0 +1,28 @@
{
  "model_type": "tacotron2",
  "sample_rate": 22050,
  "n_mel_channels": 80,
  "n_frames_per_step": 1,
  "encoder_embedding_dim": 512,
  "encoder_kernel_size": 5,
  "encoder_n_convolutions": 3,
  "encoder_conv_dropout": 0.5,
  "attention_rnn_dim": 1024,
  "attention_dim": 128,
  "attention_location_n_filters": 32,
  "attention_location_kernel_size": 31,
  "decoder_rnn_dim": 1024,
  "prenet_dim": 256,
  "max_decoder_steps": 1000,
  "gate_threshold": 0.5,
  "p_attention_dropout": 0.1,
  "p_decoder_dropout": 0.1,
  "postnet_embedding_dim": 512,
  "postnet_kernel_size": 5,
  "postnet_n_convolutions": 5,
  "mask_padding": true,
  "fp16_run": false,
  "version": "1.0",
  "authors": ["Arjit"],
  "description": "Tacotron2-based synthesizer for text-to-speech conversion"
}
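Before constructing the model, the configuration can be loaded and sanity-checked with Python's standard `json` module. A minimal sketch (the key names and values are taken from `config.json` above; the inlined string is a subset, shown here only so the example is self-contained — in practice read the file itself):

```python
import json

# Subset of config.json, inlined for illustration.
# In practice: config = json.load(open('config.json'))
CONFIG_TEXT = """
{
  "model_type": "tacotron2",
  "sample_rate": 22050,
  "n_mel_channels": 80,
  "max_decoder_steps": 1000
}
"""

config = json.loads(CONFIG_TEXT)

# Sanity-check values the synthesizer depends on
assert config["model_type"] == "tacotron2"
assert config["n_mel_channels"] == 80
assert config["sample_rate"] == 22050
```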
synthesizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c05e07428f95d0ed8755e1ef54cc8ae251300413d94ce5867a56afe39c499d94
size 370554559
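Note that `synthesizer.pt` as stored in the repository is a Git LFS pointer, not the weights themselves; `git lfs pull` fetches the actual ~370.6 MB file. A small sketch that parses such a pointer (the text is the one shown above) and extracts the expected size and checksum for verification after download:

```python
# Parse a Git LFS pointer file into its key/value fields.
# The pointer text below is the one stored as synthesizer.pt in this repo.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c05e07428f95d0ed8755e1ef54cc8ae251300413d94ce5867a56afe39c499d94
size 370554559
"""

def parse_lfs_pointer(text):
    """Return the pointer's fields ('version', 'oid', 'size') as a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(' ')
        fields[key] = value
    return fields

fields = parse_lfs_pointer(POINTER)
expected_size = int(fields['size'])             # bytes on disk after `git lfs pull`
algo, _, digest = fields['oid'].partition(':')  # checksum of the real weights
print(f'{expected_size / 1e6:.1f} MB, {algo} digest {digest[:12]}...')
```

After pulling the real file, comparing its byte size and SHA-256 digest against these fields confirms the download is intact.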