xTHExBEASTx committed
Commit 987b8b9 · verified · 1 parent: ac8857b

Upload folder using huggingface_hub

Files changed (9)
  1. .gitattributes +1 -35
  2. .gitignore +12 -0
  3. README.md +13 -12
  4. app.py +845 -0
  5. custom_voices/deep.wav +3 -0
  6. en.txt +0 -0
  7. frankenstein5k.md +11 -0
  8. gatsby5k.md +17 -0
  9. requirements.txt +10 -0
.gitattributes CHANGED
@@ -1,35 +1 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.wav filter=lfs diff=lfs merge=lfs -text
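The new `.gitattributes` drops the default LFS patterns and tracks only WAV files (needed for `custom_voices/deep.wav`). A minimal sketch of producing that rule: running `git lfs track "*.wav"` in a repo appends the same line; here it is written directly so the snippet runs even without git-lfs installed.

```shell
# Recreate the single LFS rule this commit leaves in .gitattributes.
# Writing the line by hand is equivalent to: git lfs track "*.wav"
workdir="$(mktemp -d)"
echo '*.wav filter=lfs diff=lfs merge=lfs -text' > "$workdir/.gitattributes"
cat "$workdir/.gitattributes"
```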
.gitignore ADDED
@@ -0,0 +1,12 @@
+ .cache/
+ outputs/
+ __pycache__/
+ *.pyc
+ .env
+ *.srt
+ *.mp3
+ *.json
+ !presets.json
+ !en.txt
+ !frankenstein5k.md
+ !gatsby5k.md
README.md CHANGED
@@ -1,12 +1,13 @@
- ---
- title: Kokoro Improved
- emoji: 🦀
- colorFrom: green
- colorTo: purple
- sdk: gradio
- sdk_version: 6.9.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Kokoro Improved
+ emoji: 🎙️
+ colorFrom: indigo
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 6.8.0
+ app_file: app.py
+ pinned: false
+ ---
+
+ # Kokoro Improved
+ Text-to-Speech using Kokoro models.
app.py ADDED
@@ -0,0 +1,845 @@
+ from kokoro import KModel, KPipeline
+ import gradio as gr
+ import os
+ import random
+ import torch
+ import numpy as np
+ import soundfile as sf
+ import tempfile
+ import json
+ import re
+ from pathlib import Path
+ from pydub import AudioSegment
+ import noisereduce as nr
+ from pedalboard import Pedalboard, Compressor, Gain, LowShelfFilter, HighShelfFilter
+ import whisper
+
+ # ============== CONFIGURATION ==============
+ CUDA_AVAILABLE = torch.cuda.is_available()
+ print(f"CUDA Available: {CUDA_AVAILABLE}")
+
+ CHAR_LIMIT = None  # No limit for local use
+ PRESETS_FILE = Path(__file__).parent / "presets.json"
+ OUTPUT_DIR = Path(__file__).parent / "outputs"
+ OUTPUT_DIR.mkdir(exist_ok=True)
+
+ # Whisper model - lazy loaded when needed
+ whisper_model = None
+
+ def get_whisper_model():
+     """Lazy load Whisper model only when needed."""
+     global whisper_model
+     if whisper_model is None:
+         print("Loading Whisper medium model...")
+         whisper_model = whisper.load_model("medium", device="cuda" if CUDA_AVAILABLE else "cpu")
+         print(f"Whisper loaded on: {'CUDA' if CUDA_AVAILABLE else 'CPU'}")
+     return whisper_model
+
+ # ============== MODEL LOADING ==============
+ models = {gpu: KModel().to('cuda' if gpu else 'cpu').eval() for gpu in [False] + ([True] if CUDA_AVAILABLE else [])}
+ pipelines = {lang_code: KPipeline(lang_code=lang_code, model=False) for lang_code in 'ab'}
+ pipelines['a'].g2p.lexicon.golds['kokoro'] = 'kˈOkəɹO'
+ pipelines['b'].g2p.lexicon.golds['kokoro'] = 'kˈQkəɹQ'
+
+ # ============== VOICE CHOICES ==============
+ CHOICES = {
+     '🇺🇸 🚺 Heart ❤️': 'af_heart',
+     '🇺🇸 🚺 Bella 🔥': 'af_bella',
+     '🇺🇸 🚺 Nicole 🎧': 'af_nicole',
+     '🇺🇸 🚺 Aoede': 'af_aoede',
+     '🇺🇸 🚺 Kore': 'af_kore',
+     '🇺🇸 🚺 Sarah': 'af_sarah',
+     '🇺🇸 🚺 Nova': 'af_nova',
+     '🇺🇸 🚺 Sky': 'af_sky',
+     '🇺🇸 🚺 Alloy': 'af_alloy',
+     '🇺🇸 🚺 Jessica': 'af_jessica',
+     '🇺🇸 🚺 River': 'af_river',
+     '🇺🇸 🚹 Michael': 'am_michael',
+     '🇺🇸 🚹 Fenrir': 'am_fenrir',
+     '🇺🇸 🚹 Puck': 'am_puck',
+     '🇺🇸 🚹 Echo': 'am_echo',
+     '🇺🇸 🚹 Eric': 'am_eric',
+     '🇺🇸 🚹 Liam': 'am_liam',
+     '🇺🇸 🚹 Onyx': 'am_onyx',
+     '🇺🇸 🚹 Santa': 'am_santa',
+     '🇺🇸 🚹 Adam': 'am_adam',
+     '🇬🇧 🚺 Emma': 'bf_emma',
+     '🇬🇧 🚺 Isabella': 'bf_isabella',
+     '🇬🇧 🚺 Alice': 'bf_alice',
+     '🇬🇧 🚺 Lily': 'bf_lily',
+     '🇬🇧 🚹 George': 'bm_george',
+     '🇬🇧 🚹 Fable': 'bm_fable',
+     '🇬🇧 🚹 Lewis': 'bm_lewis',
+     '🇬🇧 🚹 Daniel': 'bm_daniel',
+ }
+
+ print("Loading voices...")
+ for v in CHOICES.values():
+     pipelines[v[0]].load_voice(v)
+ print("Voices loaded!")
+
+ # ============== HELPER FUNCTIONS ==============
+
+ def forward_gpu(ps, ref_s, speed):
+     return models[True](ps, ref_s, speed)
+
+ def split_text_into_chunks(text, max_chars=500):
+     """Split long text into chunks at sentence boundaries."""
+     sentences = re.split(r'(?<=[.!?])\s+', text)
+     chunks = []
+     current_chunk = ""
+
+     for sentence in sentences:
+         if len(current_chunk) + len(sentence) <= max_chars:
+             current_chunk += sentence + " "
+         else:
+             if current_chunk:
+                 chunks.append(current_chunk.strip())
+             current_chunk = sentence + " "
+
+     if current_chunk:
+         chunks.append(current_chunk.strip())
+
+     return chunks if chunks else [text]
+
+ def enhance_audio(audio_data, sample_rate, noise_reduce=True, normalize=True, eq_enhance=True):
+     """Apply audio enhancements."""
+     enhanced = audio_data.astype(np.float32)
+
+     # Noise reduction
+     if noise_reduce:
+         try:
+             enhanced = nr.reduce_noise(y=enhanced, sr=sample_rate, prop_decrease=0.6)
+         except Exception:
+             pass
+
+     # EQ and compression using pedalboard
+     if eq_enhance:
+         try:
+             board = Pedalboard([
+                 LowShelfFilter(cutoff_frequency_hz=200, gain_db=1.5),
+                 HighShelfFilter(cutoff_frequency_hz=4000, gain_db=2.0),
+                 Compressor(threshold_db=-20, ratio=3),
+             ])
+             enhanced = board(enhanced, sample_rate)
+         except Exception:
+             pass
+
+     # Normalize
+     if normalize:
+         max_val = np.max(np.abs(enhanced))
+         if max_val > 0:
+             enhanced = enhanced / max_val * 0.95
+
+     return enhanced
+
+ # ============== SSML PARSER ==============
+
+ def parse_ssml(text):
+     """
+     Parse SSML-like markup and convert to Kokoro-compatible format.
+
+     Supported tags:
+     - <break time="500ms"/> or <break time="1s"/> - Insert pause (converts to ...)
+     - <emphasis>word</emphasis> - Emphasis (converts to [word](+1))
+     - <emphasis level="strong">word</emphasis> - Strong emphasis (converts to [word](+2))
+     - <prosody rate="slow">text</prosody> - Slow speech (handled by speed parameter)
+     - <prosody rate="fast">text</prosody> - Fast speech (handled by speed parameter)
+     - <say-as interpret-as="spell-out">ABC</say-as> - Spell out letters
+     - <phoneme ph="kˈOkəɹO">Kokoro</phoneme> - Custom pronunciation (converts to [Kokoro](/kˈOkəɹO/))
+     - <sub alias="replacement">original</sub> - Substitute text
+     """
+     result = text
+
+     # Handle <break> tags - convert to ellipsis for pauses
+     result = re.sub(r'<break\s+time=["\'](\d+)ms["\']\s*/>', lambda m: '...' if int(m.group(1)) >= 300 else '..', result)
+     result = re.sub(r'<break\s+time=["\'](\d+)s["\']\s*/>', lambda m: '... ' * int(m.group(1)), result)
+     result = re.sub(r'<break\s*/>', '...', result)
+
+     # Handle <emphasis> tags
+     result = re.sub(r'<emphasis\s+level=["\']strong["\']\s*>([^<]+)</emphasis>', r'[\1](+2)', result)
+     result = re.sub(r'<emphasis\s+level=["\']moderate["\']\s*>([^<]+)</emphasis>', r'[\1](+1)', result)
+     result = re.sub(r'<emphasis\s*>([^<]+)</emphasis>', r'[\1](+1)', result)
+
+     # Handle <phoneme> tags - custom pronunciation
+     result = re.sub(r'<phoneme\s+ph=["\']([^"\']+)["\']\s*>([^<]+)</phoneme>', r'[\2](/\1/)', result)
+
+     # Handle <sub> tags - substitution
+     result = re.sub(r'<sub\s+alias=["\']([^"\']+)["\']\s*>[^<]+</sub>', r'\1', result)
+
+     # Handle <say-as interpret-as="spell-out"> - spell out letters
+     def spell_out(match):
+         text = match.group(1)
+         return ' '.join(list(text))
+     result = re.sub(r'<say-as\s+interpret-as=["\']spell-out["\']\s*>([^<]+)</say-as>', spell_out, result)
+
+     # Handle <say-as interpret-as="characters"> - same as spell-out
+     result = re.sub(r'<say-as\s+interpret-as=["\']characters["\']\s*>([^<]+)</say-as>', spell_out, result)
+
+     # Remove any remaining prosody tags (rate is handled by speed slider)
+     result = re.sub(r'<prosody[^>]*>', '', result)
+     result = re.sub(r'</prosody>', '', result)
+
+     # Remove any other unhandled tags
+     result = re.sub(r'<[^>]+>', '', result)
+
+     return result.strip()
+
+ # ============== SRT GENERATION (WHISPER) ==============
+
+ def format_srt_time(seconds):
+     """Convert seconds to SRT timestamp format (HH:MM:SS,mmm)."""
+     hours = int(seconds // 3600)
+     minutes = int((seconds % 3600) // 60)
+     secs = int(seconds % 60)
+     millis = int((seconds % 1) * 1000)
+     return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
+
+ def generate_srt_from_file(audio_file, progress=gr.Progress()):
+     """
+     Generate SRT subtitle file from uploaded audio using Whisper.
+     """
+     if audio_file is None:
+         return None, "❌ Please upload an audio file"
+
+     progress(0, desc="Loading Whisper model (first time only)...")
+
+     # Get filename for output
+     base_name = Path(audio_file).stem
+
+     # Get or load Whisper model
+     model = get_whisper_model()
+
+     progress(0.2, desc="Transcribing with Whisper (GPU)..." if CUDA_AVAILABLE else "Transcribing with Whisper (CPU)...")
+
+     # Transcribe with Whisper - fp16 for GPU speed
+     result = model.transcribe(
+         audio_file,
+         language="en",
+         word_timestamps=False,  # Faster without word-level timestamps
+         verbose=False,
+         fp16=CUDA_AVAILABLE  # Use FP16 on GPU for speed
+     )
+
+     progress(0.8, desc="Generating SRT file...")
+
+     # Generate SRT content from segments
+     srt_content = []
+     for i, segment in enumerate(result["segments"], 1):
+         start_time = segment["start"]
+         end_time = segment["end"]
+         text = segment["text"].strip()
+
+         srt_content.append(f"{i}")
+         srt_content.append(f"{format_srt_time(start_time)} --> {format_srt_time(end_time)}")
+         srt_content.append(text)
+         srt_content.append("")  # Empty line between entries
+
+     # Save SRT file
+     srt_path = OUTPUT_DIR / f"{base_name}.srt"
+     with open(srt_path, 'w', encoding='utf-8') as f:
+         f.write('\n'.join(srt_content))
+
+     progress(1.0, desc="Done!")
+
+     # Return file for download and status
+     return str(srt_path), f"✅ SRT saved: {srt_path}\n\n📊 {len(result['segments'])} segments created"
+
+ def export_audio(audio_data, sample_rate, format_type, filename=None):
+     """Export audio to different formats."""
+     if filename is None:
+         filename = f"kokoro_output_{random.randint(1000, 9999)}"
+
+     output_path = OUTPUT_DIR / f"{filename}.{format_type.lower()}"
+
+     if format_type.upper() == "WAV":
+         sf.write(str(output_path), audio_data, sample_rate)
+     else:
+         # Save as WAV first to a temp file
+         temp_wav_path = OUTPUT_DIR / f"_temp_{random.randint(10000, 99999)}.wav"
+         sf.write(str(temp_wav_path), audio_data, sample_rate)
+
+         try:
+             audio = AudioSegment.from_wav(str(temp_wav_path))
+             if format_type.upper() == "MP3":
+                 audio.export(str(output_path), format="mp3", bitrate="192k")
+             elif format_type.upper() == "OGG":
+                 audio.export(str(output_path), format="ogg", codec="libvorbis")
+             elif format_type.upper() == "FLAC":
+                 audio.export(str(output_path), format="flac")
+         finally:
+             # Clean up temp file
+             try:
+                 temp_wav_path.unlink()
+             except Exception:
+                 pass
+
+     return str(output_path)
+
+ # ============== PRESETS MANAGEMENT ==============
+
+ def load_presets():
+     if PRESETS_FILE.exists():
+         with open(PRESETS_FILE, 'r') as f:
+             return json.load(f)
+     return {}
+
+ def save_preset(name, voice, speed, enhance_audio_flag, noise_reduce, normalize, eq_enhance):
+     presets = load_presets()
+     presets[name] = {
+         "voice": voice,
+         "speed": speed,
+         "enhance_audio": enhance_audio_flag,
+         "noise_reduce": noise_reduce,
+         "normalize": normalize,
+         "eq_enhance": eq_enhance
+     }
+     with open(PRESETS_FILE, 'w') as f:
+         json.dump(presets, f, indent=2)
+     return gr.update(choices=list(load_presets().keys())), f"Preset '{name}' saved!"
+
+ def load_preset(preset_name):
+     presets = load_presets()
+     if preset_name in presets:
+         p = presets[preset_name]
+         return p["voice"], p["speed"], p.get("enhance_audio", False), p.get("noise_reduce", True), p.get("normalize", True), p.get("eq_enhance", True)
+     return "af_heart", 1.0, False, True, True, True
+
+ def delete_preset(preset_name):
+     presets = load_presets()
+     if preset_name in presets:
+         del presets[preset_name]
+         with open(PRESETS_FILE, 'w') as f:
+             json.dump(presets, f, indent=2)
+         return gr.update(choices=list(load_presets().keys())), f"Preset '{preset_name}' deleted!"
+     return gr.update(), "Preset not found!"
+
+ # ============== CORE TTS FUNCTIONS ==============
+
+ def generate_first(text, voice='af_heart', speed=1, use_gpu=CUDA_AVAILABLE,
+                    enhance=False, noise_reduce=True, normalize=True, eq_enhance=True,
+                    use_ssml=False):
+     """Generate audio from text."""
+     # Parse SSML if enabled
+     if use_ssml:
+         text = parse_ssml(text)
+
+     text = text if CHAR_LIMIT is None else text.strip()[:CHAR_LIMIT]
+     pipeline = pipelines[voice[0]]
+     pack = pipeline.load_voice(voice)
+     use_gpu = use_gpu and CUDA_AVAILABLE
+
+     for _, ps, _ in pipeline(text, voice, speed):
+         ref_s = pack[len(ps)-1]
+         try:
+             if use_gpu:
+                 audio = forward_gpu(ps, ref_s, speed)
+             else:
+                 audio = models[False](ps, ref_s, speed)
+         except Exception as e:
+             if use_gpu:
+                 gr.Warning(str(e))
+                 gr.Info('Retrying with CPU.')
+                 audio = models[False](ps, ref_s, speed)
+             else:
+                 raise gr.Error(str(e))
+
+         audio_np = audio.numpy()
+
+         # Apply enhancements if requested
+         if enhance:
+             audio_np = enhance_audio(audio_np, 24000, noise_reduce, normalize, eq_enhance)
+
+         return (24000, audio_np), ps
+     return None, ''
+
+ def generate_all(text, voice='af_heart', speed=1, use_gpu=CUDA_AVAILABLE,
+                  enhance=False, noise_reduce=True, normalize=True, eq_enhance=True,
+                  use_ssml=False):
+     """Stream audio generation."""
+     # Parse SSML if enabled
+     if use_ssml:
+         text = parse_ssml(text)
+
+     text = text if CHAR_LIMIT is None else text.strip()[:CHAR_LIMIT]
+     pipeline = pipelines[voice[0]]
+     pack = pipeline.load_voice(voice)
+     use_gpu = use_gpu and CUDA_AVAILABLE
+     first = True
+
+     for _, ps, _ in pipeline(text, voice, speed):
+         ref_s = pack[len(ps)-1]
+         try:
+             if use_gpu:
+                 audio = forward_gpu(ps, ref_s, speed)
+             else:
+                 audio = models[False](ps, ref_s, speed)
+         except Exception as e:
+             if use_gpu:
+                 gr.Warning(str(e))
+                 gr.Info('Switching to CPU')
+                 audio = models[False](ps, ref_s, speed)
+             else:
+                 raise gr.Error(str(e))
+
+         audio_np = audio.numpy()
+         if enhance:
+             audio_np = enhance_audio(audio_np, 24000, noise_reduce, normalize, eq_enhance)
+
+         yield 24000, audio_np
+         if first:
+             first = False
+             yield 24000, torch.zeros(1).numpy()
+
+ def generate_long_text(text, voice='af_heart', speed=1, use_gpu=CUDA_AVAILABLE,
+                        enhance=False, noise_reduce=True, normalize=True, eq_enhance=True,
+                        use_ssml=False, progress=gr.Progress()):
+     """Handle very long texts by splitting into chunks."""
+     # Parse SSML if enabled
+     if use_ssml:
+         text = parse_ssml(text)
+
+     chunks = split_text_into_chunks(text, max_chars=500)
+     all_audio = []
+
+     pipeline = pipelines[voice[0]]
+     pack = pipeline.load_voice(voice)
+     use_gpu = use_gpu and CUDA_AVAILABLE
+
+     for i, chunk in enumerate(progress.tqdm(chunks, desc="Processing chunks")):
+         for _, ps, _ in pipeline(chunk, voice, speed):
+             ref_s = pack[len(ps)-1]
+             try:
+                 if use_gpu:
+                     audio = forward_gpu(ps, ref_s, speed)
+                 else:
+                     audio = models[False](ps, ref_s, speed)
+             except Exception as e:
+                 if use_gpu:
+                     audio = models[False](ps, ref_s, speed)
+                 else:
+                     raise gr.Error(str(e))
+             all_audio.append(audio.numpy())
+
+     if not all_audio:
+         return None, ""
+
+     # Concatenate all audio
+     full_audio = np.concatenate(all_audio)
+
+     # Apply enhancements
+     if enhance:
+         full_audio = enhance_audio(full_audio, 24000, noise_reduce, normalize, eq_enhance)
+
+     return (24000, full_audio), f"✅ Generated {len(chunks)} chunks"
+
+ def tokenize_first(text, voice='af_heart'):
+     pipeline = pipelines[voice[0]]
+     for _, ps, _ in pipeline(text, voice):
+         return ps
+     return ''
+
+ # ============== BATCH PROCESSING ==============
+
+ def process_batch(files, voice, speed, use_gpu, enhance, noise_reduce, normalize, eq_enhance,
+                   output_format, use_ssml=False, progress=gr.Progress()):
+     """Process multiple text files."""
+     results = []
+
+     for file in progress.tqdm(files, desc="Processing files"):
+         try:
+             # Read text from file
+             with open(file.name, 'r', encoding='utf-8') as f:
+                 text = f.read()
+
+             base_name = Path(file.name).stem
+
+             # Generate audio
+             audio_result, status = generate_long_text(
+                 text, voice, speed, use_gpu, enhance,
+                 noise_reduce, normalize, eq_enhance,
+                 use_ssml, progress
+             )
+
+             if audio_result:
+                 # Export to desired format
+                 sample_rate, audio_data = audio_result
+                 output_path = export_audio(audio_data, sample_rate, output_format, base_name)
+                 results.append(f"✅ {base_name} -> {output_path}")
+             else:
+                 results.append(f"❌ {base_name} - Failed to generate")
+         except Exception as e:
+             results.append(f"❌ {Path(file.name).stem} - Error: {str(e)}")
+
+     return "\n".join(results)
+
+ def export_single(audio, format_type, filename):
+     """Export single audio to file."""
+     if audio is None:
+         return "No audio to export!"
+
+     sample_rate, audio_data = audio
+     output_path = export_audio(audio_data, sample_rate, format_type, filename or "kokoro_output")
+     return f"Exported to: {output_path}"
+
+ # ============== SAMPLE TEXTS ==============
+
+ random_quotes = ["The only way to do great work is to love what you do.",
+                  "Innovation distinguishes between a leader and a follower.",
+                  "Stay hungry, stay foolish."]
+ try:
+     with open('en.txt', 'r') as r:
+         random_quotes = [line.strip() for line in r if line.strip()]
+ except Exception:
+     pass
+
+ def get_random_quote():
+     return random.choice(random_quotes)
+
+ def get_gatsby():
+     try:
+         with open('gatsby5k.md', 'r') as r:
+             return r.read().strip()
+     except Exception:
+         return "In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since."
+
+ def get_frankenstein():
+     try:
+         with open('frankenstein5k.md', 'r') as r:
+             return r.read().strip()
+     except Exception:
+         return "You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings."
+
+ # ============== UI COMPONENTS ==============
+
+ TOKEN_NOTE = '''
+ 💡 Customize pronunciation with Markdown link syntax and /slashes/ like `[Kokoro](/kˈOkəɹO/)`
+
+ 💬 To adjust intonation, try punctuation `;:,.!?—…"()""` or stress `ˈ` and `ˌ`
+
+ ⬇️ Lower stress `[1 level](-1)` or `[2 levels](-2)`
+
+ ⬆️ Raise stress 1 level `[or](+2)` 2 levels (only works on less stressed, usually short words)
+ '''
+
+ # ============== MAIN UI ==============
+
+ with gr.Blocks(title="Kokoro TTS Enhanced", theme=gr.themes.Soft()) as app:
+     gr.Markdown("""
+     # 🎙️ Kokoro TTS - Enhanced Local Edition
+     [**Kokoro**](https://huggingface.co/hexgrad/Kokoro-82M) is an open-weight TTS model with 82M parameters.
+
+     **Features:** Voice Cloning • Audio Enhancement • Long Text • Batch Processing • Export Formats • Presets
+     """)
+
+     with gr.Tabs():
+         # ============== GENERATE TAB ==============
+         with gr.Tab("🎵 Generate"):
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     text = gr.Textbox(label='Input Text', lines=5, placeholder="Enter text to convert to speech...")
+
+                     with gr.Row():
+                         voice = gr.Dropdown(list(CHOICES.items()), value='af_heart', label='Voice')
+                         use_gpu = gr.Dropdown(
+                             [('GPU 🚀', True), ('CPU 🐌', False)] if CUDA_AVAILABLE else [('CPU 🐌', False)],
+                             value=CUDA_AVAILABLE,
+                             label='Hardware'
+                         )
+
+                     speed = gr.Slider(minimum=0.5, maximum=2, value=1, step=0.1, label='Speed')
+
+                     with gr.Accordion("🔊 Audio Enhancement", open=False):
+                         enhance_audio_flag = gr.Checkbox(label="Enable Enhancement", value=False)
+                         with gr.Row():
+                             noise_reduce = gr.Checkbox(label="Noise Reduction", value=True)
+                             normalize = gr.Checkbox(label="Normalize", value=True)
+                             eq_enhance = gr.Checkbox(label="EQ Boost", value=True)
+
+                     use_ssml = gr.Checkbox(label="📝 Enable SSML", value=False, info="Parse SSML tags in input")
+
+                     with gr.Row():
+                         random_btn = gr.Button('🎲 Random')
+                         gatsby_btn = gr.Button('🥂 Gatsby')
+                         frank_btn = gr.Button('💀 Frankenstein')
+
+                 with gr.Column(scale=1):
+                     out_audio = gr.Audio(label='Output Audio', interactive=False, autoplay=True)
+
+                     with gr.Row():
+                         generate_btn = gr.Button('🎵 Generate', variant='primary', scale=2)
+                         long_text_btn = gr.Button('📜 Long Text', variant='secondary', scale=1)
+
+                     with gr.Accordion('Output Tokens', open=False):
+                         out_ps = gr.Textbox(interactive=False, show_label=False)
+                         tokenize_btn = gr.Button('Tokenize')
+                         gr.Markdown(TOKEN_NOTE)
+
+                     with gr.Accordion("💾 Export", open=False):
+                         with gr.Row():
+                             export_format = gr.Dropdown(["WAV", "MP3", "OGG", "FLAC"], value="MP3", label="Format")
+                             export_filename = gr.Textbox(label="Filename", placeholder="output_name")
+                         export_btn = gr.Button("Export Audio")
+                         export_status = gr.Textbox(label="Status", interactive=False)
+
+         # ============== STREAM TAB ==============
+         with gr.Tab("🌊 Stream"):
+             with gr.Row():
+                 with gr.Column():
+                     stream_text = gr.Textbox(label='Input Text', lines=5, placeholder="Enter text for streaming...")
+                     stream_voice = gr.Dropdown(list(CHOICES.items()), value='af_heart', label='Voice')
+                     stream_speed = gr.Slider(minimum=0.5, maximum=2, value=1, step=0.1, label='Speed')
+                     stream_gpu = gr.Dropdown(
+                         [('GPU 🚀', True), ('CPU 🐌', False)] if CUDA_AVAILABLE else [('CPU 🐌', False)],
+                         value=CUDA_AVAILABLE,
+                         label='Hardware'
+                     )
+                     stream_enhance = gr.Checkbox(label="Enable Enhancement", value=False)
+                     stream_ssml = gr.Checkbox(label="📝 Enable SSML", value=False, info="Parse SSML tags in input")
+
+                 with gr.Column():
+                     out_stream = gr.Audio(label='Streaming Audio', streaming=True, autoplay=True)
+                     with gr.Row():
+                         stream_btn = gr.Button('▶️ Stream', variant='primary')
+                         stop_btn = gr.Button('⏹️ Stop', variant='stop')
+                     gr.Markdown("⚠️ First stream might have no audio due to Gradio bug. Try again if needed.")
+
+         # ============== BATCH PROCESSING TAB ==============
+         with gr.Tab("📦 Batch Processing"):
+             gr.Markdown("### Process multiple text files at once")
+             with gr.Row():
+                 with gr.Column():
+                     batch_files = gr.File(label="Upload Text Files", file_count="multiple", file_types=[".txt"])
+                     batch_voice = gr.Dropdown(list(CHOICES.items()), value='af_heart', label='Voice')
+                     batch_speed = gr.Slider(minimum=0.5, maximum=2, value=1, step=0.1, label='Speed')
+                     batch_gpu = gr.Dropdown(
+                         [('GPU 🚀', True), ('CPU 🐌', False)] if CUDA_AVAILABLE else [('CPU 🐌', False)],
+                         value=CUDA_AVAILABLE,
+                         label='Hardware'
+                     )
+                     batch_enhance = gr.Checkbox(label="Enable Enhancement", value=False)
+                     batch_format = gr.Dropdown(["WAV", "MP3", "OGG", "FLAC"], value="MP3", label="Output Format")
+                     batch_ssml = gr.Checkbox(label="📝 Enable SSML", value=False, info="Parse SSML tags")
+                     batch_btn = gr.Button("🚀 Process All", variant='primary')
+
+                 with gr.Column():
+                     batch_results = gr.Textbox(label="Results", lines=15, interactive=False)
+                     gr.Markdown(f"📁 Output folder: `{OUTPUT_DIR}`")
+
+         # ============== SRT GENERATOR TAB ==============
+         with gr.Tab("📄 SRT Generator"):
+             gr.Markdown("""
+             ### Generate SRT Subtitles from Audio
+             Upload any audio file and generate accurate SRT subtitles using Whisper.
+
+             **Supported formats:** WAV, MP3, OGG, FLAC, M4A, etc.
+             """)
+             with gr.Row():
+                 with gr.Column():
+                     srt_audio_input = gr.Audio(label="Upload Audio", type="filepath")
+                     srt_generate_btn = gr.Button("🎬 Generate SRT", variant='primary')
+
+                 with gr.Column():
+                     srt_output_file = gr.File(label="Download SRT", interactive=False)
+                     srt_status = gr.Textbox(label="Status", interactive=False, lines=5)
+
+             gr.Markdown(f"📁 SRT files are also saved to: `{OUTPUT_DIR}`")
+
648
+ # ============== SSML GUIDE TAB ==============
649
+ with gr.Tab("📖 SSML Guide"):
650
+ gr.Markdown("""
651
+ # SSML (Speech Synthesis Markup Language) Guide
652
+
653
+ SSML allows you to control how text is spoken. Enable the **"Enable SSML"** checkbox to use these tags.
654
+
655
+ ---
656
+
657
+ ## Supported Tags
658
+
659
+ ### 1. Pauses / Breaks
660
+ Insert pauses in speech:
661
+ ```xml
662
+ Hello <break time="500ms"/> World
663
+ Hello <break time="1s"/> World
664
+ Hello <break/> World
665
+ ```
666
+ - `500ms` = half second pause
667
+ - `1s` = one second pause
668
+ - No time = default pause
669
+
670
+ ---
671
+
672
+ ### 2. Emphasis
673
+ Make words stand out:
674
+ ```xml
675
+ This is <emphasis>important</emphasis>
676
+ This is <emphasis level="strong">very important</emphasis>
677
+ This is <emphasis level="moderate">somewhat important</emphasis>
678
+ ```
679
+ - Default = moderate emphasis
680
+ - `level="strong"` = stronger emphasis
681
+
682
+ ---
683
+
684
+ ### 3. Custom Pronunciation (Phoneme)
685
+ Specify exact pronunciation using IPA:
+ ```xml
+ <phoneme ph="kˈOkəɹO">Kokoro</phoneme>
+ <phoneme ph="təˈmeɪtoʊ">tomato</phoneme>
+ ```
+
+ ---
+
+ ### 4. Text Substitution
+ Replace displayed text with spoken text:
+ ```xml
+ <sub alias="World Wide Web Consortium">W3C</sub>
+ <sub alias="doctor">Dr.</sub> Smith
+ ```
+
+ ---
+
+ ### 5. Spell Out Letters
+ Spell out words letter by letter:
+ ```xml
+ <say-as interpret-as="spell-out">NASA</say-as>
+ <say-as interpret-as="characters">ABC</say-as>
+ ```
+
+ ---
+
+ ## Full Example
+ ```xml
+ Welcome to <phoneme ph="kˈOkəɹO">Kokoro</phoneme> TTS!
+
+ <break time="500ms"/>
+
+ This demo shows <emphasis>SSML support</emphasis>.
+
+ The abbreviation <sub alias="Text to Speech">TTS</sub> stands for
+ <say-as interpret-as="spell-out">TTS</say-as>.
+
+ <break time="1s"/>
+
+ Thank you for listening!
+ ```
+
+ ---
+
+ ## Prompt for Gemini/ChatGPT
+
+ Copy this prompt to have an AI reformat your text with SSML tags:
+
+ ```
+ Please convert the following text to SSML format for a TTS system.
+ Use these tags where appropriate:
+ - <break time="Xms"/> for pauses (use 300-1000ms)
+ - <emphasis>word</emphasis> for important words
+ - <emphasis level="strong">word</emphasis> for very important words
+ - <sub alias="spoken">written</sub> for abbreviations
+ - <say-as interpret-as="spell-out">ABC</say-as> to spell out acronyms
+
+ Keep the text natural and don't overuse tags.
+ Add pauses at natural speech boundaries (after commas, periods, between paragraphs).
+
+ Text to convert:
+ [PASTE YOUR TEXT HERE]
+ ```
+
+ ---
+
+ ## Tips
+
+ 1. **Don't overuse tags** - Too many pauses or emphasis marks make speech sound unnatural
+ 2. **Test incrementally** - Add a few tags, test, then add more
+ 3. **Use breaks sparingly** - Natural pauses already happen at punctuation
+ 4. **Emphasis works best on single words** - Not on long phrases
+ """)
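The SSML subset documented above can be reduced to plain speakable text with a small regex pass. The sketch below is a hypothetical helper, not the app's actual parser, and it only handles the tags listed in this guide (`break` becomes a textual pause marker rather than real silence):

```python
import re

def render_ssml_lite(text: str) -> str:
    """Flatten the documented SSML subset to plain speakable text."""
    # <break time="500ms"/> -> a textual pause marker (a real engine
    # would insert actual silence of the requested duration).
    text = re.sub(r'<break\s+time="[^"]*"\s*/>', ' ... ', text)
    # <sub alias="spoken">written</sub> -> speak the alias instead.
    text = re.sub(r'<sub\s+alias="([^"]*)">.*?</sub>', r'\1', text)
    # <say-as interpret-as="spell-out">NASA</say-as> -> "N A S A",
    # spacing the letters so each one is voiced separately.
    text = re.sub(
        r'<say-as\s+interpret-as="(?:spell-out|characters)">(.*?)</say-as>',
        lambda m: ' '.join(m.group(1)), text)
    # <emphasis>/<phoneme> -> keep only the inner text; this sketch
    # carries no prosody or pronunciation control.
    text = re.sub(r'<(emphasis|phoneme)\b[^>]*>(.*?)</\1>', r'\2', text)
    return re.sub(r'\s+', ' ', text).strip()

print(render_ssml_lite(
    'The abbreviation <sub alias="Text to Speech">TTS</sub> stands for '
    '<say-as interpret-as="spell-out">TTS</say-as>.'))
# prints: The abbreviation Text to Speech stands for T T S.
```

A production parser would use a real XML parser and map `break` durations to silence samples; the regex version is only meant to show how each tag from the guide maps to spoken output.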
+
+     # ============== PRESETS TAB ==============
+     with gr.Tab("⚙️ Presets"):
+         gr.Markdown("### Save and load your favorite settings")
+         with gr.Row():
+             with gr.Column():
+                 preset_name = gr.Textbox(label="Preset Name", placeholder="my_preset")
+                 preset_voice = gr.Dropdown(list(CHOICES.items()), value='af_heart', label='Voice')
+                 preset_speed = gr.Slider(minimum=0.5, maximum=2, value=1, step=0.1, label='Speed')
+                 preset_enhance = gr.Checkbox(label="Enable Enhancement", value=False)
+                 preset_noise = gr.Checkbox(label="Noise Reduction", value=True)
+                 preset_norm = gr.Checkbox(label="Normalize", value=True)
+                 preset_eq = gr.Checkbox(label="EQ Boost", value=True)
+                 save_preset_btn = gr.Button("💾 Save Preset", variant='primary')
+
+             with gr.Column():
+                 preset_list = gr.Dropdown(choices=list(load_presets().keys()), label="Saved Presets")
+                 load_preset_btn = gr.Button("📂 Load Preset")
+                 delete_preset_btn = gr.Button("🗑️ Delete Preset", variant='stop')
+                 preset_status = gr.Textbox(label="Status", interactive=False)
+
+     # ============== EVENT HANDLERS ==============
+
+     # Generate tab
+     random_btn.click(fn=get_random_quote, outputs=[text])
+     gatsby_btn.click(fn=get_gatsby, outputs=[text])
+     frank_btn.click(fn=get_frankenstein, outputs=[text])
+
+     generate_btn.click(
+         fn=generate_first,
+         inputs=[text, voice, speed, use_gpu, enhance_audio_flag, noise_reduce, normalize, eq_enhance, use_ssml],
+         outputs=[out_audio, out_ps]
+     )
+
+     long_text_btn.click(
+         fn=generate_long_text,
+         inputs=[text, voice, speed, use_gpu, enhance_audio_flag, noise_reduce, normalize, eq_enhance, use_ssml],
+         outputs=[out_audio, out_ps]
+     )
+
+     tokenize_btn.click(fn=tokenize_first, inputs=[text, voice], outputs=[out_ps])
+     export_btn.click(fn=export_single, inputs=[out_audio, export_format, export_filename], outputs=[export_status])
+
+     # Stream tab
+     stream_event = stream_btn.click(
+         fn=generate_all,
+         inputs=[stream_text, stream_voice, stream_speed, stream_gpu, stream_enhance,
+                 gr.State(True), gr.State(True), gr.State(True), stream_ssml],
+         outputs=[out_stream]
+     )
+     stop_btn.click(fn=None, cancels=stream_event)
+
+     # Batch tab
+     batch_btn.click(
+         fn=process_batch,
+         inputs=[batch_files, batch_voice, batch_speed, batch_gpu, batch_enhance,
+                 gr.State(True), gr.State(True), gr.State(True), batch_format, batch_ssml],
+         outputs=[batch_results]
+     )
+
+     # SRT Generator tab
+     srt_generate_btn.click(
+         fn=generate_srt_from_file,
+         inputs=[srt_audio_input],
+         outputs=[srt_output_file, srt_status]
+     )
+
+     # Presets tab
+     save_preset_btn.click(
+         fn=save_preset,
+         inputs=[preset_name, preset_voice, preset_speed, preset_enhance, preset_noise, preset_norm, preset_eq],
+         outputs=[preset_list, preset_status]
+     )
+
+     load_preset_btn.click(
+         fn=load_preset,
+         inputs=[preset_list],
+         outputs=[preset_voice, preset_speed, preset_enhance, preset_noise, preset_norm, preset_eq]
+     )
+
+     delete_preset_btn.click(
+         fn=delete_preset,
+         inputs=[preset_list],
+         outputs=[preset_list, preset_status]
+     )
+
+ if __name__ == '__main__':
+     app.queue().launch(server_name="0.0.0.0", server_port=7860)
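The preset handlers wired above (`save_preset`, `load_preset`/`load_presets`, `delete_preset`) are defined earlier in `app.py` and are not part of this diff. A minimal JSON-backed sketch of what such helpers might look like — the file name, return shapes, and field names are assumptions, not the app's actual implementation:

```python
import json
from pathlib import Path

PRESET_FILE = Path("presets.json")  # hypothetical storage location

def load_presets() -> dict:
    """Return all saved presets, or an empty dict if none exist yet."""
    if PRESET_FILE.exists():
        return json.loads(PRESET_FILE.read_text())
    return {}

def save_preset(name, voice, speed, enhance, noise, norm, eq):
    """Persist one named preset; return the updated names and a status message."""
    presets = load_presets()
    presets[name] = {"voice": voice, "speed": speed, "enhance": enhance,
                     "noise": noise, "norm": norm, "eq": eq}
    PRESET_FILE.write_text(json.dumps(presets, indent=2))
    return list(presets), f"Saved preset '{name}'"

def delete_preset(name):
    """Remove a preset if present; return the updated names and a status message."""
    presets = load_presets()
    presets.pop(name, None)
    PRESET_FILE.write_text(json.dumps(presets, indent=2))
    return list(presets), f"Deleted preset '{name}'"
```

In the real app the handlers would return `gr.update(choices=...)` so the Saved Presets dropdown refreshes in place; the sketch returns bare lists and strings to keep the persistence logic visible.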
custom_voices/deep.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa2916c919d315d0acfdcfa48d261a7d5fa555fd028baf22086761826dcb0c11
+ size 1501384
en.txt ADDED
The diff for this file is too large to render. See raw diff
 
frankenstein5k.md ADDED
@@ -0,0 +1,11 @@
+ You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings. I arrived here yesterday, and my first task is to assure my dear sister of my welfare and increasing confidence in the success of my undertaking.
+
+ I am already far north of London, and as I walk in the streets of Petersburgh, I feel a cold northern breeze play upon my cheeks, which braces my nerves and fills me with delight. Do you understand this feeling? This breeze, which has travelled from the regions towards which I am advancing, gives me a foretaste of those icy climes. Inspirited by this wind of promise, my daydreams become more fervent and vivid. I try in vain to be persuaded that the pole is the seat of frost and desolation; it ever presents itself to my imagination as the region of beauty and delight. There, Margaret, the sun is for ever visible, its broad disk just skirting the horizon and diffusing a perpetual splendour. There—for with your leave, my sister, I will put some trust in preceding navigators—there snow and frost are banished; and, sailing over a calm sea, we may be wafted to a land surpassing in wonders and in beauty every region hitherto discovered on the habitable globe. Its productions and features may be without example, as the phenomena of the heavenly bodies undoubtedly are in those undiscovered solitudes. What may not be expected in a country of eternal light? I may there discover the wondrous power which attracts the needle and may regulate a thousand celestial observations that require only this voyage to render their seeming eccentricities consistent for ever. I shall satiate my ardent curiosity with the sight of a part of the world never before visited, and may tread a land never before imprinted by the foot of man. These are my enticements, and they are sufficient to conquer all fear of danger or death and to induce me to commence this laborious voyage with the joy a child feels when he embarks in a little boat, with his holiday mates, on an expedition of discovery up his native river. But supposing all these conjectures to be false, you cannot contest the inestimable benefit which I shall confer on all mankind, to the last generation, by discovering a passage near the pole to those countries, to reach which at present so many months are requisite; or by ascertaining the secret of the magnet, which, if at all possible, can only be effected by an undertaking such as mine.
+
+ These reflections have dispelled the agitation with which I began my letter, and I feel my heart glow with an enthusiasm which elevates me to heaven, for nothing contributes so much to tranquillise the mind as a steady purpose—a point on which the soul may fix its intellectual eye. This expedition has been the favourite dream of my early years. I have read with ardour the accounts of the various voyages which have been made in the prospect of arriving at the North Pacific Ocean through the seas which surround the pole. You may remember that a history of all the voyages made for purposes of discovery composed the whole of our good Uncle Thomas’s library. My education was neglected, yet I was passionately fond of reading. These volumes were my study day and night, and my familiarity with them increased that regret which I had felt, as a child, on learning that my father’s dying injunction had forbidden my uncle to allow me to embark in a seafaring life.
+
+ These visions faded when I perused, for the first time, those poets whose effusions entranced my soul and lifted it to heaven. I also became a poet and for one year lived in a paradise of my own creation; I imagined that I also might obtain a niche in the temple where the names of Homer and Shakespeare are consecrated. You are well acquainted with my failure and how heavily I bore the disappointment. But just at that time I inherited the fortune of my cousin, and my thoughts were turned into the channel of their earlier bent.
+
+ Six years have passed since I resolved on my present undertaking. I can, even now, remember the hour from which I dedicated myself to this great enterprise. I commenced by inuring my body to hardship. I accompanied the whale-fishers on several expeditions to the North Sea; I voluntarily endured cold, famine, thirst, and want of sleep; I often worked harder than the common sailors during the day and devoted my nights to the study of mathematics, the theory of medicine, and those branches of physical science from which a naval adventurer might derive the greatest practical advantage. Twice I actually hired myself as an under-mate in a Greenland whaler, and acquitted myself to admiration. I must own I felt a little proud when my captain offered me the second dignity in the vessel and entreated me to remain with the greatest earnestness, so valuable did he consider my services.
+
+ And now, dear Margaret, do I not deserve to accomplish some great purpose?
gatsby5k.md ADDED
@@ -0,0 +1,17 @@
+ In my younger and more vulnerable years my father gave me some advice that I’ve been turning over in my mind ever since.
+
+ “Whenever you feel like criticizing anyone,” he told me, “just remember that all the people in this world haven’t had the advantages that you’ve had.”
+
+ He didn’t say any more, but we’ve always been unusually communicative in a reserved way, and I understood that he meant a great deal more than that. In consequence, I’m inclined to reserve all judgements, a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores. The abnormal mind is quick to detect and attach itself to this quality when it appears in a normal person, and so it came about that in college I was unjustly accused of being a politician, because I was privy to the secret griefs of wild, unknown men. Most of the confidences were unsought—frequently I have feigned sleep, preoccupation, or a hostile levity when I realized by some unmistakable sign that an intimate revelation was quivering on the horizon; for the intimate revelations of young men, or at least the terms in which they express them, are usually plagiaristic and marred by obvious suppressions. Reserving judgements is a matter of infinite hope. I am still a little afraid of missing something if I forget that, as my father snobbishly suggested, and I snobbishly repeat, a sense of the fundamental decencies is parcelled out unequally at birth.
+
+ And, after boasting this way of my tolerance, I come to the admission that it has a limit. Conduct may be founded on the hard rock or the wet marshes, but after a certain point I don’t care what it’s founded on. When I came back from the East last autumn I felt that I wanted the world to be in uniform and at a sort of moral attention forever; I wanted no more riotous excursions with privileged glimpses into the human heart. Only Gatsby, the man who gives his name to this book, was exempt from my reaction—Gatsby, who represented everything for which I have an unaffected scorn. If personality is an unbroken series of successful gestures, then there was something gorgeous about him, some heightened sensitivity to the promises of life, as if he were related to one of those intricate machines that register earthquakes ten thousand miles away. This responsiveness had nothing to do with that flabby impressionability which is dignified under the name of the “creative temperament”—it was an extraordinary gift for hope, a romantic readiness such as I have never found in any other person and which it is not likely I shall ever find again. No—Gatsby turned out all right at the end; it is what preyed on Gatsby, what foul dust floated in the wake of his dreams that temporarily closed out my interest in the abortive sorrows and short-winded elations of men.
+
+ My family have been prominent, well-to-do people in this Middle Western city for three generations. The Carraways are something of a clan, and we have a tradition that we’re descended from the Dukes of Buccleuch, but the actual founder of my line was my grandfather’s brother, who came here in fifty-one, sent a substitute to the Civil War, and started the wholesale hardware business that my father carries on today.
+
+ I never saw this great-uncle, but I’m supposed to look like him—with special reference to the rather hard-boiled painting that hangs in father’s office. I graduated from New Haven in 1915, just a quarter of a century after my father, and a little later I participated in that delayed Teutonic migration known as the Great War. I enjoyed the counter-raid so thoroughly that I came back restless. Instead of being the warm centre of the world, the Middle West now seemed like the ragged edge of the universe—so I decided to go East and learn the bond business. Everybody I knew was in the bond business, so I supposed it could support one more single man. All my aunts and uncles talked it over as if they were choosing a prep school for me, and finally said, “Why—[ye-es](/jˈɛ ɛs/),” with very grave, hesitant faces. Father agreed to finance me for a year, and after various delays I came East, permanently, I thought, in the spring of twenty-two.
+
+ The practical thing was to find rooms in the city, but it was a warm season, and I had just left a country of wide lawns and friendly trees, so when a young man at the office suggested that we take a house together in a commuting town, it sounded like a great idea. He found the house, a weather-beaten cardboard bungalow at eighty a month, but at the last minute the firm ordered him to Washington, and I went out to the country alone. I had a dog—at least I had him for a few days until he ran away—and an old Dodge and a Finnish woman, who made my bed and cooked breakfast and muttered Finnish wisdom to herself over the electric stove.
+
+ It was lonely for a day or so until one morning some man, more recently arrived than I, stopped me on the road.
+
+ “How do you get to West Egg village?” he asked helplessly.
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ kokoro
+ gradio
+ torch
+ numpy
+ soundfile
+ pydub
+ noisereduce
+ pedalboard
+ openai-whisper
+ scipy