Multimodal
Process images, audio, and video with open-source multimodal models
Recipes
GPT-4V → LLaVA
Replace GPT-4V with LLaVA for image-understanding tasks
Coming Soon
DALL-E → Stable Diffusion
Generate images using Stable Diffusion instead of DALL-E
Coming Soon
Whisper → whisper.cpp
Use a local Whisper implementation for speech-to-text
Coming Soon
Video Understanding
Process video content with open-source video models
Coming Soon