Transcribe audio files to text with language detection
Real-time video captioning powered by FastVLM
Expressive zero-shot text-to-speech (TTS)