Efficient Vision Encoding for Vision Language Models
-
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper • 2412.13303 • Published • 73
-
FastVLM WebGPU
429 • Real-time video captioning powered by FastVLM
-
apple/FastVLM-0.5B
Text Generation • 0.8B • Updated • 8.22k • 364
-
apple/FastVLM-1.5B
Text Generation • 2B • Updated • 1.97k • 73