KugelAudio-0-Open: TTS for European languages

Remove voice cloning; use pre-encoded voices instead.
Voices are stored as .pt files in the voices/ folder.

Co-authored-by: Cursor <cursoragent@cursor.com>

Files changed (14)

.gitattributes +43 -0
README.md +375 -0
config.json +122 -0
generation_config.json +4 -0
model-00001-of-00004.safetensors +3 -0
model-00002-of-00004.safetensors +3 -0
model-00003-of-00004.safetensors +3 -0
model-00004-of-00004.safetensors +3 -0
model.safetensors.index.json +0 -0
samples/258_Lukas_der_Flüsterer.wav +3 -0
samples/261_Sauerer_Felix.wav +3 -0
samples/266_Petra_die_Vorleserin.wav +3 -0
samples/277_Radio_Lars.wav +3 -0
voices/voices.json +17 -0

.gitattributes ADDED

	@@ -0,0 +1,43 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+samples/255_Durchsage.wav filter=lfs diff=lfs merge=lfs -text
+samples/260_Lisa.wav filter=lfs diff=lfs merge=lfs -text
+samples/270_Friedrich_Sänger.wav filter=lfs diff=lfs merge=lfs -text
+samples/281_Suffi_Thomas.wav filter=lfs diff=lfs merge=lfs -text
+samples/258_Lukas_der_Flüsterer.wav filter=lfs diff=lfs merge=lfs -text
+samples/277_Radio_Lars.wav filter=lfs diff=lfs merge=lfs -text
+samples/261_Sauerer_Felix.wav filter=lfs diff=lfs merge=lfs -text
+samples/266_Petra_die_Vorleserin.wav filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,375 @@

+---
+language:
+  - en
+  - de
+  - fr
+  - es
+  - it
+  - pt
+  - nl
+  - pl
+  - ru
+  - uk
+  - cs
+  - ro
+  - hu
+  - sv
+  - da
+  - fi
+  - no
+  - el
+  - bg
+  - sk
+  - hr
+  - sr
+  - tr
+license: mit
+tags:
+  - text-to-speech
+  - tts
+  - speech-synthesis
+  - audio-generation
+  - european-languages
+  - diffusion
+  - autoregressive
+pipeline_tag: text-to-speech
+inference: false
+model-index:
+  - name: kugelaudio-0-open
+    results:
+      - task:
+          type: text-to-speech
+        dataset:
+          type: custom
+          name: YODAS2
+        metrics:
+          - type: win-rate
+            value: 78.0
+            name: Human Preference vs ElevenLabs
+---
+# 🎙️ KugelAudio-0-Open
+**Open-source text-to-speech for European languages**
+7B parameter model powered by an AR + Diffusion architecture
+<p align="center">
+  <a href="https://github.com/Kugelaudio/kugelaudio-open"><img src="https://img.shields.io/badge/GitHub-Source_Code-black" alt="GitHub Source Code"></a>
+  <a href="https://kugelaudio.com"><img src="https://img.shields.io/badge/🌐-Website-blue" alt="KugelAudio Website"></a>
+</p>
+<table align="center" style="border-collapse: collapse; border: none;">
+  <tr style="border: none;">
+    <td style="border: none; padding: 0 20px;">
+      <a href="https://kugelaudio.com">
+        <img src="https://www.kugelaudio.com/logos/Logo%20Short.svg" alt="KugelAudio" style="height: 60px; width: auto;">
+      </a>
+    </td>
+    <td style="border: none; padding: 0 20px;">
+      <a href="https://hpi.de/ki-servicezentrum/">
+        <img src="https://docs.sc.hpi.de/attachments/aisc/aisc-logo.png" alt="KI-Servicezentrum Berlin-Brandenburg" style="height: 60px; width: auto;">
+      </a>
+    </td>
+    <td style="border: none; padding: 0 20px;">
+      <a href="https://www.bmftr.bund.de">
+        <img src="https://hpi.de/fileadmin/_processed_/a/3/csm_BMFTR_de_Web_RGB_gef_durch_cd1f5345bd.jpg" alt="Gefördert durch BMFTR" style="height: 60px; width: auto;">
+      </a>
+    </td>
+  </tr>
+</table>
+License: MIT Python 3.10+ Hosted API
+---
+## Motivation
+**Open-source text-to-speech models for European languages are significantly lagging behind.** While English TTS has seen remarkable progress, speakers of German, French, Spanish, Polish, and dozens of other European languages have been underserved by the open-source community.
+KugelAudio aims to change this. Building on the excellent foundation laid by the [VibeVoice team at Microsoft](https://github.com/microsoft/VibeVoice), we've trained a model specifically focused on European language coverage, using approximately **200,000 hours** of highly pre-processed and enhanced speech data from the [YODAS2 dataset](https://huggingface.co/datasets/espnet/yodas).
+## 🏆 Benchmark Results: Outperforming ElevenLabs
+In a German-language human preference test, **KugelAudio ranked first**, ahead of commercial systems including ElevenLabs. The result suggests that open-source models can now rival, and in this evaluation surpass, leading commercial TTS systems.
+### Human Preference Benchmark (A/B Testing)
+We conducted extensive A/B testing with **339 human evaluations** to compare KugelAudio against leading TTS models. Participants listened to a reference voice sample, then compared outputs from two models and selected which sounded more human and closer to the original voice.
+### German Language Evaluation
+The evaluation specifically focused on **German language samples** with diverse emotional expressions and speaking styles:
+* **Neutral Speech**: Standard conversational tones
+* **Shouting**: High-intensity, elevated volume speech
+* **Singing**: Melodic and rhythmic speech patterns
+* **Drunken Voice**: Slurred and irregular speech characteristics
+These diverse test cases demonstrate the model's capability to handle a wide range of speaking styles beyond standard narration.
+### OpenSkill Ranking Results
+| Rank | Model | Score | Record | Win Rate |
+|------|-------|-------|--------|----------|
+| 🥇 1 | **KugelAudio** | **26** | 71W / 20L / 23T | **78.0%** |
+| 🥈 2 | ElevenLabs Multi v2 | 25 | 56W / 34L / 22T | 62.2% |
+| 🥉 3 | ElevenLabs v3 | 21 | 64W / 34L / 16T | 65.3% |
+| 4 | Cartesia | 21 | 55W / 38L / 19T | 59.1% |
+| 5 | VibeVoice | 10 | 30W / 74L / 8T | 28.8% |
+| 6 | CosyVoice v3 | 9 | 15W / 91L / 8T | 14.2% |
+_Based on 339 evaluations, ranked with the OpenSkill Bayesian skill-rating system. Win rate is wins / (wins + losses); ties are excluded._
+## Audio Samples
+Listen to KugelAudio's voice capabilities across different speaking styles:
+### German Voice Samples
+| Sample | Description | Audio Player |
+|--------|-------------|--------------|
+| **Whispering** | Soft whispering voice | <audio controls><source src="https://huggingface.co/kugelaudio/kugelaudio-0-open/resolve/main/samples/258_Lukas_der_Flüsterer.wav" type="audio/wav"></audio> |
+| **Female Narrator** | Professional female reader voice | <audio controls><source src="https://huggingface.co/kugelaudio/kugelaudio-0-open/resolve/main/samples/266_Petra_die_Vorleserin.wav" type="audio/wav"></audio> |
+| **Angry Voice** | Irritated and frustrated speech | <audio controls><source src="https://huggingface.co/kugelaudio/kugelaudio-0-open/resolve/main/samples/261_Sauerer_Felix.wav" type="audio/wav"></audio> |
+| **Radio Announcer** | Professional radio broadcast voice | <audio controls><source src="https://huggingface.co/kugelaudio/kugelaudio-0-open/resolve/main/samples/277_Radio_Lars.wav" type="audio/wav"></audio> |
+*All samples are generated using pre-encoded voice embeddings.*
+### Training Details
+- **Base Model**: [Microsoft VibeVoice](https://github.com/microsoft/VibeVoice)
+- **Training Data**: ~200,000 hours from [YODAS2](https://huggingface.co/datasets/espnet/yodas)
+- **Hardware**: 8x NVIDIA H100 GPUs
+- **Training Duration**: 5 days
+### Supported Languages
+This model supports the following European languages:
+| Language | Code | Flag | Language | Code | Flag | Language | Code | Flag |
+|----------|------|------|----------|------|------|----------|------|------|
+| English | en | 🇺🇸 | German | de | 🇩🇪 | French | fr | 🇫🇷 |
+| Spanish | es | 🇪🇸 | Italian | it | 🇮🇹 | Portuguese | pt | 🇵🇹 |
+| Dutch | nl | 🇳🇱 | Polish | pl | 🇵🇱 | Russian | ru | 🇷🇺 |
+| Ukrainian | uk | 🇺🇦 | Czech | cs | 🇨🇿 | Romanian | ro | 🇷🇴 |
+| Hungarian | hu | 🇭🇺 | Swedish | sv | 🇸🇪 | Danish | da | 🇩🇰 |
+| Finnish | fi | 🇫🇮 | Norwegian | no | 🇳🇴 | Greek | el | 🇬🇷 |
+| Bulgarian | bg | 🇧🇬 | Slovak | sk | 🇸🇰 | Croatian | hr | 🇭🇷 |
+| Serbian | sr | 🇷🇸 | Turkish | tr | 🇹🇷 | | | |
+> **📊 Language Coverage Disclaimer**: Quality varies significantly by language. Spanish, French, English, and German have the strongest representation in our training data (~200,000 hours from YODAS2). Other languages may have reduced quality, prosody, or vocabulary coverage depending on their availability in the training dataset.
+### Model Specifications
+| Property              | Value                                                                       |
+| --------------------- | --------------------------------------------------------------------------- |
+| **Parameters**        | 7B                                                                          |
+| **Architecture**      | AR + Diffusion (Qwen2.5-7B backbone)                                        |
+| **Base Model**        | [Microsoft VibeVoice](https://github.com/microsoft/VibeVoice)               |
+| **Audio Sample Rate** | 24kHz                                                                       |
+| **Audio Format**       | Mono, float32                                                               |
+| **VRAM Required**     | \~19GB                                                                      |
+| **Training Hardware** | 8x NVIDIA H100                                                              |
+| **Training Duration** | 5 days                                                                      |
+| **Training Data**     | \~200,000 hours from [YODAS2](https://huggingface.co/datasets/espnet/yodas) |
+## Quick Start
+### Installation
+```bash
+# Install with pip
+pip install kugelaudio-open
+# Or with uv (recommended)
+uv pip install kugelaudio-open
+```
+### Basic Usage
+```python
+from kugelaudio_open import (
+    KugelAudioForConditionalGenerationInference,
+    KugelAudioProcessor,
+)
+import torch
+# Load model
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = KugelAudioForConditionalGenerationInference.from_pretrained(
+    "kugelaudio/kugelaudio-0-open",
+    torch_dtype=torch.bfloat16,
+).to(device)
+model.eval()
+processor = KugelAudioProcessor.from_pretrained("kugelaudio/kugelaudio-0-open")
+# Strip encoder weights to save VRAM (only decoders needed for inference)
+model.model.strip_encoders()
+# See available voices
+print(processor.get_available_voices())  # ["default", "warm", "clear"]
+# Generate speech with a specific voice
+inputs = processor(text="Hallo Welt! Das ist KugelAudio.", voice="default", return_tensors="pt")
+inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
+with torch.no_grad():
+    outputs = model.generate(**inputs, cfg_scale=3.0)
+# Save audio
+processor.save_audio(outputs.speech_outputs[0], "output.wav")
+```
+### Voices
+KugelAudio provides pre-encoded voices that can be selected by name. The voices are stored as `.pt` files in the `voices/` folder and are automatically downloaded when needed.
+```python
+# List available voices
+voices = processor.get_available_voices()
+print(voices)  # ["default", "warm", "clear"]
+# Generate with a specific voice
+inputs = processor(text="Hallo, das ist eine warme Stimme!", voice="warm", return_tensors="pt")
+inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
+with torch.no_grad():
+    outputs = model.generate(**inputs, cfg_scale=3.0)
+processor.save_audio(outputs.speech_outputs[0], "warm_voice_output.wav")
+```
+> **Note:** Voice cloning from raw audio is not supported in this open-source release. Only the pre-encoded voices listed in `voices/voices.json` are available.
+### Generation Parameters
+| Parameter        | Default | Description                                                                |
+| ---------------- | ------- | -------------------------------------------------------------------------- |
+| cfg\_scale       | 3.0     | Classifier-free guidance scale (1.0-10.0). Higher = more adherence to text |
+| max\_new\_tokens | 2048    | Maximum number of tokens to generate                                       |
+| do\_sample       | False   | Whether to use sampling (vs greedy decoding)                               |
+| temperature      | 1.0     | Sampling temperature (if do_sample=True)                                  |
+## Architecture
+KugelAudio uses a hybrid **Autoregressive + Diffusion** architecture based on Microsoft's VibeVoice:
+```
+Text Input → Qwen2.5-7B Backbone → Diffusion Head → Acoustic Decoder → Audio Output
+                                         ↑
+                              Pre-encoded Voice Embedding
+```
+1. **Text Encoder**: Qwen2.5-7B language model encodes input text
+2. **Diffusion Head**: Predicts speech latents using denoising diffusion (20 steps)
+3. **Acoustic Decoder**: Hierarchical convolutional decoder converts latents to 24kHz audio
+## Audio Watermarking
+All audio generated by this model is automatically watermarked using Facebook's AudioSeal. The watermark is:
+* **Imperceptible**: No audible difference in audio quality
+* **Robust**: Survives compression, resampling, and editing
+* **Detectable**: Can verify if audio was generated by KugelAudio
+### Verify Watermark
+```python
+from kugelaudio_open.watermark import AudioWatermark
+watermark = AudioWatermark()
+# `audio` is assumed to be a generated waveform, e.g. outputs.speech_outputs[0] from Basic Usage
+result = watermark.detect(audio, sample_rate=24000)
+print(f"Watermark detected: {result.detected}")
+print(f"Confidence: {result.confidence:.1%}")
+```
+## Intended Use
+### ✅ Appropriate Uses
+* **Accessibility**: Text-to-speech for visually impaired users
+* **Content Creation**: Podcasts, videos, audiobooks, e-learning
+* **Voice Assistants**: Chatbots and virtual assistants
+* **Language Learning**: Pronunciation practice and language education
+* **Creative Projects**: With proper consent and attribution
+### ❌ Prohibited Uses
+* Creating deepfakes or misleading content
+* Impersonating individuals without explicit consent
+* Fraud, deception, or scams
+* Harassment or abuse
+* Any illegal activities
+## Limitations
+* **VRAM Requirements**: Requires \~19GB VRAM for inference (less with `strip_encoders()`)
+* **Speed**: Approximately 1.0x real-time on modern GPUs
+* **Language Quality Variation**: Quality may vary across languages based on training data distribution
+## Hosted API
+For production use without managing infrastructure, use our hosted API at kugelaudio.com:
+* ⚡ **Ultra-low latency**: <100ms end-to-end
+* 🌍 **Global edge deployment**
+* 🔧 **Zero setup required**
+* 📈 **Auto-scaling**
+```python
+from kugelaudio import KugelAudio
+client = KugelAudio(api_key="your_api_key")
+audio = client.tts.generate(text="Hello from KugelAudio!", model="kugel-1-turbo")
+audio.save("output.wav")
+```
+## Acknowledgments
+This model would not have been possible without the contributions of many individuals and organizations:
+* **Microsoft VibeVoice Team**: For the excellent foundation architecture that this model builds upon
+* **YODAS2 Dataset**: For providing the large-scale multilingual speech data
+* **Qwen Team**: For the powerful language model backbone
+* **Facebook AudioSeal**: For the audio watermarking technology
+### Special Thanks
+* **Carlos Menke**: For his invaluable efforts in gathering the first datasets and extensive work benchmarking the model
+* **AI Service Center Berlin-Brandenburg (KI-Servicezentrum)**: For providing the GPU resources (8x H100) that made training this model possible
+## Citation
+```bibtex
+@software{kugelaudio2026,
+  title = {KugelAudio: Open-Source Text-to-Speech for European Languages},
+  author = {Kratzenstein, Kajo and Menke, Carlos},
+  year = {2026},
+  institution = {Hasso-Plattner-Institut},
+  url = {https://huggingface.co/kugelaudio/kugelaudio-0-open}
+}
+```
+## License
+This model is released under the MIT License.
+## Authors
+**Kajo Kratzenstein**
+📧 [kajo@kugelaudio.com](mailto:kajo@kugelaudio.com)
+🌐 [kugelaudio.com](https://kugelaudio.com)
+**Carlos Menke**
+---
+**Funding Notice**
+Das zugrunde liegende Vorhaben wurde mit Mitteln des Bundesministeriums für Forschung, Technologie und Raumfahrt unter dem Förderkennzeichen »KI-Servicezentrum Berlin-Brandenburg« 16IS22092 gefördert.
+_This project was funded by the German Federal Ministry of Research, Technology and Space under the funding code "AI Service Center Berlin-Brandenburg" 16IS22092._
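
The win rates in the OpenSkill ranking table of README.md are consistent with wins divided by decisive comparisons (wins plus losses), with ties left out. The sketch below recomputes them from the W/L/T records in that table. The formula is inferred from the published numbers, not stated by the authors, and `win_rate` is a hypothetical helper name.

```python
# Records copied from the README's OpenSkill table: (wins, losses, ties).
records = {
    "KugelAudio": (71, 20, 23),
    "ElevenLabs Multi v2": (56, 34, 22),
    "ElevenLabs v3": (64, 34, 16),
    "Cartesia": (55, 38, 19),
    "VibeVoice": (30, 74, 8),
    "CosyVoice v3": (15, 91, 8),
}


def win_rate(wins: int, losses: int) -> float:
    """Percentage of decisive (non-tied) comparisons that were wins."""
    return 100.0 * wins / (wins + losses)


rates = {name: round(win_rate(w, l), 1) for name, (w, l, _ties) in records.items()}

# Each A/B comparison involves two models, so the per-model records
# add up to twice the number of evaluations.
total_evaluations = sum(sum(record) for record in records.values()) // 2

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print("evaluations:", total_evaluations)
```

The per-model records add up to 678 comparisons, i.e. two per evaluation, which matches the 339 evaluations reported in the README.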

config.json ADDED

	@@ -0,0 +1,122 @@

+{
+  "acostic_vae_dim": 64,
+  "acoustic_tokenizer_config": {
+    "causal": true,
+    "channels": 1,
+    "conv_bias": true,
+    "conv_norm": "none",
+    "corpus_normalize": 0.0,
+    "decoder_depths": null,
+    "decoder_n_filters": 32,
+    "decoder_ratios": [
+      8,
+      5,
+      5,
+      4,
+      2,
+      2
+    ],
+    "disable_last_norm": true,
+    "encoder_depths": "3-3-3-3-3-3-8",
+    "encoder_n_filters": 32,
+    "encoder_ratios": [
+      8,
+      5,
+      5,
+      4,
+      2,
+      2
+    ],
+    "fix_std": 0.5,
+    "layer_scale_init_value": 1e-06,
+    "layernorm": "RMSNorm",
+    "layernorm_elementwise_affine": true,
+    "layernorm_eps": 1e-05,
+    "mixer_layer": "depthwise_conv",
+    "model_type": "kugelaudio_acoustic_tokenizer",
+    "pad_mode": "constant",
+    "std_dist_type": "gaussian",
+    "torch_dtype": "bfloat16",
+    "vae_dim": 64,
+    "weight_init_value": 0.01
+  },
+  "acoustic_vae_dim": 64,
+  "architectures": [
+    "KugelAudioForConditionalGeneration"
+  ],
+  "decoder_config": {
+    "attention_dropout": 0.0,
+    "hidden_act": "silu",
+    "hidden_size": 3584,
+    "initializer_range": 0.02,
+    "intermediate_size": 18944,
+    "max_position_embeddings": 32768,
+    "max_window_layers": 28,
+    "model_type": "qwen2",
+    "num_attention_heads": 28,
+    "num_hidden_layers": 28,
+    "num_key_value_heads": 4,
+    "rms_norm_eps": 1e-06,
+    "rope_scaling": null,
+    "rope_theta": 1000000.0,
+    "sliding_window": null,
+    "torch_dtype": "bfloat16",
+    "use_cache": true,
+    "use_mrope": false,
+    "use_sliding_window": false,
+    "vocab_size": 152064
+  },
+  "diffusion_head_config": {
+    "ddpm_algorithm_type": "sde-dpmsolver++",
+    "ddpm_batch_mul": 4,
+    "ddpm_beta_schedule": "cosine",
+    "ddpm_num_inference_steps": 20,
+    "ddpm_num_steps": 1000,
+    "diffusion_type": "ddpm",
+    "head_ffn_ratio": 3.0,
+    "head_layers": 4,
+    "hidden_size": 3584,
+    "latent_size": 64,
+    "model_type": "kugelaudio_diffusion_head",
+    "prediction_type": "v_prediction",
+    "rms_norm_eps": 1e-05,
+    "speech_vae_dim": 64,
+    "torch_dtype": "bfloat16"
+  },
+  "model_type": "kugelaudio",
+  "semantic_tokenizer_config": {
+    "causal": true,
+    "channels": 1,
+    "conv_bias": true,
+    "conv_norm": "none",
+    "corpus_normalize": 0.0,
+    "disable_last_norm": true,
+    "encoder_depths": "3-3-3-3-3-3-8",
+    "encoder_n_filters": 32,
+    "encoder_ratios": [
+      8,
+      5,
+      5,
+      4,
+      2,
+      2
+    ],
+    "fix_std": 0,
+    "layer_scale_init_value": 1e-06,
+    "layernorm": "RMSNorm",
+    "layernorm_elementwise_affine": true,
+    "layernorm_eps": 1e-05,
+    "mixer_layer": "depthwise_conv",
+    "model_type": "kugelaudio_semantic_tokenizer",
+    "pad_mode": "constant",
+    "std_dist_type": "none",
+    "torch_dtype": "bfloat16",
+    "vae_dim": 128,
+    "weight_init_value": 0.01
+  },
+  "semantic_vae_dim": 128,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.52.0.dev0",
+  "ddpm_inference_steps": 20
+}
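
A few derived quantities follow from this config by simple arithmetic. The sketch below assumes that `decoder_ratios` in `acoustic_tokenizer_config` are per-stage upsampling factors whose product is the number of audio samples per latent frame (a common convention for convolutional codecs, not confirmed by this commit), and it takes the 24 kHz sample rate from README.md. The variable names are illustrative.

```python
import math

# Values copied from config.json and README.md.
decoder_ratios = [8, 5, 5, 4, 2, 2]  # acoustic_tokenizer_config.decoder_ratios
sample_rate_hz = 24_000              # "Audio Sample Rate" in README.md

# Assumption: total upsampling factor = product of the per-stage ratios.
samples_per_frame = math.prod(decoder_ratios)
frames_per_second = sample_rate_hz / samples_per_frame

# Attention geometry of the Qwen2 decoder (decoder_config).
hidden_size = 3584
num_attention_heads = 28
num_key_value_heads = 4
head_dim = hidden_size // num_attention_heads
queries_per_kv_head = num_attention_heads // num_key_value_heads

print(f"samples per latent frame: {samples_per_frame}")
print(f"latent frames per second: {frames_per_second}")
print(f"head_dim: {head_dim}, query heads per KV head: {queries_per_kv_head}")
```

Under that assumption the acoustic latents run at 7.5 frames per second, and the decoder uses grouped-query attention with 7 query heads sharing each key/value head.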

generation_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "_from_model_config": true,
+  "transformers_version": "4.52.0.dev0"
+}

model-00001-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b4209041f36d8de076ceaad966b026e951d2e58337466f653a1bbf3142c8ab10
+size 4877662532

model-00002-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d236dd36482846352f570d21685661d7683bade9926e0a4d855f43ba1c4ea148
+size 4932752840

model-00003-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e5feac1d14fd375e1e0b1ceeab31cd130517280f66dc630aac6f5b42c6795b9
+size 4982901128

model-00004-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3bbbeec7dacf727e2b086f3ef7796b17bcf00c8c25834a27df2031a1cf3774ea
+size 3893553730
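
The four LFS pointers above give the checkpoint's size on disk, which can be compared with the "~19GB VRAM" figure in README.md. The sketch below sums the shard sizes and, assuming every tensor is stored as bfloat16 (2 bytes per value, as `torch_dtype` in config.json suggests) and ignoring the small safetensors headers, estimates the total number of stored parameters. That estimate is an approximation, not a figure published by the authors.

```python
# Sizes in bytes, copied from the Git LFS pointer files of the four shards.
shard_sizes_bytes = [
    4_877_662_532,  # model-00001-of-00004.safetensors
    4_932_752_840,  # model-00002-of-00004.safetensors
    4_982_901_128,  # model-00003-of-00004.safetensors
    3_893_553_730,  # model-00004-of-00004.safetensors
]

total_bytes = sum(shard_sizes_bytes)
total_gb = total_bytes / 1e9      # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

# Assumption: all weights are bfloat16, i.e. 2 bytes per parameter.
approx_params = total_bytes // 2

print(f"total: {total_gb:.2f} GB ({total_gib:.2f} GiB)")
print(f"approx. parameters at 2 bytes each: {approx_params / 1e9:.2f}B")
```

The total of about 18.7 GB is in line with the README's VRAM figure. The implied count of roughly 9.3B stored values is larger than the 7B headline number, which would be consistent with the checkpoint also containing the tokenizer encoders and decoders and the diffusion head, although this commit does not break the total down.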

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff

samples/258_Lukas_der_Flüsterer.wav ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:75309e892ff871fce2ee1d4f971ce1c168e4b65290b2498e8869561da3bf69cf
+size 320044

samples/261_Sauerer_Felix.wav ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f5b098fca73349f848e69a02fb13a647853caaf4cec4c5756a7ce4e0e99c3fc7
+size 185644

samples/266_Petra_die_Vorleserin.wav ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b24761c0a67b3164bc088291b524203287c67b51e777b3df9e86433e76c2dc45
+size 256044

samples/277_Radio_Lars.wav ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1844e12b490ee1aa37117c9e5139164f51d97e85b0ae60e81a8c10990924f4e
+size 313644

voices/voices.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "default": {
+    "file": "default.pt",
+    "description": "Default neutral voice",
+    "language": "en"
+  },
+  "warm": {
+    "file": "warm.pt",
+    "description": "Warm, friendly voice",
+    "language": "en"
+  },
+  "clear": {
+    "file": "clear.pt",
+    "description": "Clear, professional voice",
+    "language": "en"
+  }
+}
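
The registry above maps each voice name to a `.pt` file plus metadata. As a minimal sketch of how such a registry could be read, the code below parses the same JSON with the standard library and resolves a voice name to a path under `voices/`. This is not the `kugelaudio_open` implementation; `list_voices` and `resolve_voice_path` are hypothetical helpers. Note that this commit's file list contains only `voices/voices.json`, not the `.pt` files it refers to.

```python
import json
from pathlib import PurePosixPath

# Contents of voices/voices.json as added in this commit.
VOICES_JSON = """
{
  "default": {"file": "default.pt", "description": "Default neutral voice", "language": "en"},
  "warm": {"file": "warm.pt", "description": "Warm, friendly voice", "language": "en"},
  "clear": {"file": "clear.pt", "description": "Clear, professional voice", "language": "en"}
}
"""


def list_voices(registry: dict) -> list:
    """Return voice names in the order they appear in the registry."""
    return list(registry)


def resolve_voice_path(registry: dict, name: str, voices_dir: str = "voices") -> str:
    """Map a voice name to the repository-relative path of its .pt file."""
    if name not in registry:
        known = ", ".join(registry)
        raise ValueError(f"Unknown voice {name!r}; available voices: {known}")
    return (PurePosixPath(voices_dir) / registry[name]["file"]).as_posix()


registry = json.loads(VOICES_JSON)
print(list_voices(registry))
print(resolve_voice_path(registry, "warm"))
```

Failing early with the list of known names gives a clearer error for an unsupported voice than a bare `KeyError` deeper in a loading routine would.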