Extract speech segments and transcripts from multiple audio files
Generate a speech dataset from audio files for voice synthesis