---
license: apache-2.0
tags:
- image-captioning
languages:
- en
pipeline_tag: image-to-text
datasets:
- michelecafagna26/hl
language:
- en
metrics:
- sacrebleu
- rouge
library_name: transformers
---
## GIT-base fine-tuned for Image Captioning on High-Level descriptions of Scenes

[GIT](https://arxiv.org/abs/2205.14100) base fine-tuned on the [HL dataset](https://huggingface.co/datasets/michelecafagna26/hl) to generate **high-level scene descriptions of images**.

## Model fine-tuning

- Trained for 10 epochs
- lr: 5e-5
- Adam optimizer
- half-precision (fp16)

## Test set metrics

| CIDEr  | SacreBLEU | ROUGE-L |
|--------|-----------|---------|
| 103.00 | 24.67     | 33.90   |

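To give a sense of what the ROUGE-L column measures, here is a minimal pure-Python sketch of the F1 variant of ROUGE-L, based on the longest common subsequence (LCS) between a generated caption and a reference. It is not the evaluation script behind the table above: the whitespace tokenization, the lowercasing, the function names `lcs_length` and `rouge_l_f1`, and the reference caption in the usage line are all assumptions made for illustration. The scores in the table appear to be scaled by 100, whereas this sketch returns values in [0, 1].

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS found for the two prefixes
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best LCS seen so far
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (hypothetical helper, whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens that are in the LCS
    recall = lcs / len(ref)      # share of reference tokens that are in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference caption, used only to exercise the function
score = rouge_l_f1("in a beach", "at the beach")
```

In this usage line only the token `beach` is shared, so precision and recall are both 1/3. A full evaluation would average the score over the test set and typically rely on an established implementation rather than this sketch.
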
## Model in Action

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model ID as given here; when loading from the Hub, a namespace prefix (owner/model) may be required
processor = AutoProcessor.from_pretrained("git-base-captioning-ft-hl-scenes")
model = AutoModelForCausalLM.from_pretrained("git-base-captioning-ft-hl-scenes").to(device)

img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Preprocess the image into the pixel tensor the model expects
inputs = processor(raw_image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

# Sample one caption using top-k and nucleus (top-p) sampling
generated_ids = model.generate(pixel_values=pixel_values, max_length=50,
                               do_sample=True,
                               top_k=120,
                               top_p=0.9,
                               early_stopping=True,
                               num_return_sequences=1)

# batch_decode returns a list of strings, one per generated sequence
processor.batch_decode(generated_ids, skip_special_tokens=True)

# Example output (sampling is stochastic, so results vary between runs):
# ['in a beach']
```
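
The `top_k` and `top_p` arguments above restrict which tokens can be sampled at each generation step. The following pure-Python sketch illustrates the idea on a toy distribution; it is a simplified illustration, not the `transformers` implementation. The function name `top_k_top_p_filter` and the toy probabilities are hypothetical.

```python
import random


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p, and renormalize.

    `probs` maps token -> probability (hypothetical toy input).
    """
    # Rank tokens from most to least likely and keep only the first top_k
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Renormalize so the surviving tokens sum to 1 before applying top_p
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Nucleus step: keep tokens until the cumulative mass reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the final candidate set
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


# Toy next-token distribution, invented for illustration only
toy_probs = {"beach": 0.5, "park": 0.3, "street": 0.15, "kitchen": 0.05}
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)

# Draw one token from the filtered distribution
rng = random.Random(0)
token = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
```

With `top_k=3` the least likely token is dropped first; with `top_p=0.8` only `beach` and `park` remain, because together they already cover more than 80% of the renormalized probability mass. Larger values of either parameter admit more candidate tokens and therefore more varied captions.
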

## BibTeX and citation info

```BibTeX
@inproceedings{cafagna2023hl,
  title={{HL} {D}ataset: {V}isually-grounded {D}escription of {S}cenes, {A}ctions and {R}ationales},
  author={Cafagna, Michele and van Deemter, Kees and Gatt, Albert},
  booktitle={Proceedings of the 16th International Natural Language Generation Conference (INLG'23)},
  address={Prague, Czech Republic},
  year={2023}
}
```