Translate timed word data into subtitle lines with speed analysis
Identify speakers in an audio transcript
Transcribe audio to word-level timestamps
Generate SRT subtitles from audio and timestamps
Align transcript words to audio and get timestamps
Decode Whisper encoder output into timed subtitles
Extract vocals from any song in seconds
Extract vocals from any audio file