Ligul committed on
Commit 0574623 · verified · 1 Parent(s): fd6509b

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -1,3 +1,18 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - image-captioning
+ - multimodal
+ - vision-language
+ - qwen2
+ - siglip
+ datasets:
+ - merve/coco
+ pipeline_tag: image-to-text
+ ---
+
  # Capri

  Capri is a compact image captioning model designed for high-throughput, plain-language descriptions.
@@ -82,3 +97,16 @@ Reasonable defaults:
  - `decode_batch_size=1024`

  On larger GPUs, decode often scales to `2048+`.
+
+ ## Attribution
+
+ Trained on captions from the [COCO 2017](https://cocodataset.org/) dataset.
+
+ - Annotations © COCO Consortium, licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
+ - Images sourced from Flickr under their respective licenses; the dataset as a whole is not cleared for unrestricted commercial use
+
+ > Lin, T.-Y., et al. "Microsoft COCO: Common Objects in Context." ECCV 2014. [arXiv:1405.0312](https://arxiv.org/abs/1405.0312)
+
+ Built on top of:
+ - [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) — Apache 2.0
+ - [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) — Apache 2.0