Ligul/capri

@@ -1,3 +1,18 @@
+---
+language:
+  - en
+license: apache-2.0
+tags:
+  - image-captioning
+  - multimodal
+  - vision-language
+  - qwen2
+  - siglip
+datasets:
+  - merve/coco
+pipeline_tag: image-to-text
+---
 # Capri
 Capri is a compact image captioning model designed for high-throughput, plain-language descriptions.
@@ -82,3 +97,16 @@ Reasonable defaults:
 - `decode_batch_size=1024`
 On larger GPUs, decode often scales to `2048+`.
+## Attribution
+Trained on captions from the [COCO 2017](https://cocodataset.org/) dataset.
+- Annotations © COCO Consortium, licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
+- Images sourced from Flickr under their respective licenses; the dataset as a whole is not cleared for unrestricted commercial use
+> Lin, T.-Y., et al. "Microsoft COCO: Common Objects in Context." ECCV 2014. [arXiv:1405.0312](https://arxiv.org/abs/1405.0312)
+Built on top of:
+- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) — Apache 2.0
+- [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) — Apache 2.0