---
language: en
license: mit
tags:
- multimodal
- vision-language
- captioning
---

# Multimodal Caption Model

A model designed to generate textual descriptions from visual inputs.