---
language: en
license: mit
tags:
- multimodal
- vision-language
- captioning
---
# Multimodal Caption Model
A model designed to generate textual descriptions from visual inputs.