Add experimental real-data vocoder training artifacts

Files changed (6)

README.md +15 -6
experimental/text_to_mel_step_8000.safetensors +3 -0
experimental/vocoder_config.json +12 -0
experimental/vocoder_metrics_step_3000.json +0 -0
experimental/vocoder_realdata_step_3000.safetensors +3 -0
samples/hello_world_i_am_fine_vocoder_step3000.wav +0 -0

README.md CHANGED

@@ -20,7 +20,7 @@ datasets:
 Mesko TTS is MesklinTech's dedicated text-to-speech research project.
-This repository is currently published as an architecture and training-code release. The previous small checkpoint was useful for code smoke tests, but it did not include a properly trained production vocoder and produced noisy waveform audio. We have therefore marked the project as **not production-trained yet** and are continuing training before publishing a listenable release checkpoint.
+This repository is currently published as an architecture and training-code release with experimental checkpoints. The first small checkpoint was useful for code smoke tests, but it did not include a properly trained production vocoder and produced noisy waveform audio through fallback rendering. We have therefore marked the project as **not production-trained yet** and are continuing training before publishing a polished voice release.
 ## Mission
@@ -45,11 +45,13 @@ What is available now:
 - sparse acoustic decoder
 - sparse neural vocoder code
 - LJSpeech training scripts and config structure
+- experimental text-to-mel checkpoint
+- experimental real-data vocoder checkpoint
 What is not ready yet:
 - production-quality speech checkpoint
-- trained neural vocoder release
+- production-grade trained neural vocoder release
 - standardized MOS / WER / speaker-similarity benchmark
 - long-form streaming quality validation
@@ -77,13 +79,20 @@ The intended model path is:
 7. Acoustic energy/gating head -> mel spectrogram
 8. Trained neural vocoder -> waveform
-## Why The Checkpoint Was Removed
-The previous uploaded checkpoint could generate mel tensors, but audio generated through a fallback Griffin-Lim renderer was noisy. That is not acceptable for a public TTS release. A real TTS release needs a trained waveform vocoder or a high-quality external vocoder path.
-Until that is ready, this repository should be treated as source code and architecture documentation, not as a finished voice model.
+## Experimental Checkpoints
+The current experimental files are:
+- `experimental/text_to_mel_step_8000.safetensors`
+- `experimental/vocoder_realdata_step_3000.safetensors`
+- `experimental/vocoder_config.json`
+- `experimental/vocoder_metrics_step_3000.json`
+- `samples/hello_world_i_am_fine_vocoder_step3000.wav`
+The vocoder checkpoint was trained on real LJSpeech mel/waveform segments for a short in-session run. It is better aligned than the previous random-data smoke path, but it is still an early experimental checkpoint and should not be treated as final voice quality.
+Until a longer vocoder and text-to-mel training run is complete, this repository should be treated as source code, architecture documentation, and experimental research weights, not as a finished voice model.
 ## Responsible Use
 Do not use this project to impersonate people, clone voices without consent, commit fraud, or create misleading audio. Voice technology should be built and used with permission, transparency, and care.

experimental/text_to_mel_step_8000.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d4f3632f75d048cdbd37829e33909e99883d47fc99ce418aa5523469d7c86a91
+size 5835780
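
Reviewer note: the three added lines above are a Git LFS pointer, not the weights themselves (the leading `+` is the diff marker, not part of the pointer). A downloaded copy of the checkpoint can be checked against the recorded size and SHA-256 digest. The sketch below is a minimal illustration using only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file has the size and digest the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above, without the '+' markers.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d4f3632f75d048cdbd37829e33909e99883d47fc99ce418aa5523469d7c86a91
size 5835780
"""
pointer = parse_lfs_pointer(POINTER)

# Assumed local download location; adjust to wherever the file actually is.
local = Path("experimental/text_to_mel_step_8000.safetensors")
if local.exists():
    print("checkpoint matches pointer:", matches_pointer(local, pointer))
```

The same check applies to the vocoder checkpoint by substituting its pointer text.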

experimental/vocoder_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "n_mels": 80,
+  "channels": 64,
+  "residual_layers": 4,
+  "upsample_scales": [
+    8,
+    5,
+    3,
+    2
+  ],
+  "sample_rate": 24000
+}
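
Reviewer note: the config values imply a frame rate that the text-to-mel side would need to match. The sketch below works through the arithmetic, assuming (the commit does not confirm this) that the vocoder turns each mel frame into waveform samples by applying the `upsample_scales` stages in sequence, as is common in upsampling vocoders. The helper `expected_samples` is hypothetical.

```python
import json
from math import prod

# Config contents copied from the diff above, without the '+' markers.
CONFIG_JSON = """
{
  "n_mels": 80,
  "channels": 64,
  "residual_layers": 4,
  "upsample_scales": [8, 5, 3, 2],
  "sample_rate": 24000
}
"""
config = json.loads(CONFIG_JSON)

# Assumption: total upsampling factor = samples produced per mel frame.
hop_length = prod(config["upsample_scales"])            # 8 * 5 * 3 * 2 = 240
frames_per_second = config["sample_rate"] / hop_length  # 24000 / 240 = 100.0
frame_ms = 1000 * hop_length / config["sample_rate"]    # 10.0 ms per frame


def expected_samples(n_frames: int, cfg: dict) -> int:
    """Waveform length for n_frames mel frames under the assumption above."""
    return n_frames * prod(cfg["upsample_scales"])


print(hop_length, frames_per_second, frame_ms)
print(expected_samples(100, config))  # 100 frames -> 24000 samples (1 second)
```

Under that assumption, mel features fed to this vocoder would need to be extracted with a 240-sample hop at 24 kHz; a mismatch would show up as audio at the wrong speed or pitch.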

experimental/vocoder_metrics_step_3000.json ADDED

The diff for this file is too large to render. See raw diff

experimental/vocoder_realdata_step_3000.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35c83979648836ca7d1651e598aeafb6c4e8d65a5e15fddf0bd58953a0f4c41e
+size 968980
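
Reviewer note: once the actual weights are fetched, the tensor names and shapes in a `.safetensors` file can be inspected without loading any ML framework, because the format begins with an 8-byte little-endian header length followed by a JSON header. The sketch below is a minimal standard-library reader; the function names are hypothetical, the local path is an assumption, and nothing here reflects the project's own loading code.

```python
import json
import struct
from math import prod
from pathlib import Path


def read_safetensors_header(path) -> dict:
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as handle:
        # First 8 bytes: unsigned 64-bit little-endian header size.
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len))
    # '__metadata__' is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header: dict) -> dict:
    """Map each tensor name to its (dtype, shape)."""
    return {name: (info["dtype"], tuple(info["shape"])) for name, info in header.items()}


def count_parameters(header: dict) -> int:
    """Total number of elements across all tensors in the header."""
    return sum(prod(info["shape"]) for info in header.values())


# Assumed local download location; adjust to wherever the file actually is.
local = Path("experimental/vocoder_realdata_step_3000.safetensors")
if local.exists():
    header = read_safetensors_header(local)
    for name, (dtype, shape) in summarize(header).items():
        print(name, dtype, shape)
    print("total parameters:", count_parameters(header))
```

Comparing the listed shapes against `experimental/vocoder_config.json` (for example, looking for dimensions of 80 and 64) is a quick sanity check that the checkpoint and config belong together.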

samples/hello_world_i_am_fine_vocoder_step3000.wav ADDED

Binary file (57.2 kB).