Engage in multimedia chat with LLMs and other ML models
Transcribe audio files or YouTube videos into text
Generate images from text prompts