Extract speech segments and transcripts from multiple audio files
Create SRT subtitles from audio, split at commas
Generate speech dataset from audio files for voice synthesis
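
The comma-split subtitle task above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes timed transcript segments of the form (start_seconds, end_seconds, text) have already been produced by some speech recognizer, and it assumes each comma-separated piece receives a share of the segment's duration proportional to its character count. The function names format_timestamp, split_at_commas, and to_srt are hypothetical, and only the ASCII comma is handled.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def split_at_commas(start: float, end: float, text: str):
    """Split one timed segment at commas.

    Assumption: time is divided in proportion to each piece's character
    count, since no word-level timestamps are available here.
    """
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if not parts:
        return []
    total = sum(len(p) for p in parts)
    cues = []
    cursor = start
    consumed = 0
    for part in parts:
        consumed += len(part)
        cue_end = start + (end - start) * consumed / total
        cues.append((cursor, cue_end, part))
        cursor = cue_end
    return cues


def to_srt(segments) -> str:
    """Render (start, end, text) segments as SRT text, one cue per comma-separated piece."""
    blocks = []
    index = 1
    for start, end, text in segments:
        for cue_start, cue_end, cue_text in split_at_commas(start, end, text):
            blocks.append(
                f"{index}\n"
                f"{format_timestamp(cue_start)} --> {format_timestamp(cue_end)}\n"
                f"{cue_text}\n"
            )
            index += 1
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical transcript segment, for illustration only.
    print(to_srt([(0.0, 4.0, "Hello there, world")]))
```

If word-level timestamps are available from the recognizer, splitting on the timestamp of the word that carries the comma would likely give more accurate cue boundaries than the proportional estimate used here.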