| --- |
| library_name: transformers |
| tags: |
| - comics |
| license: cc-by-sa-4.0 |
| datasets: |
| - VLR-CVC/ComicsPAP |
| language: |
| - en |
| base_model: |
| - Qwen/Qwen2.5-VL-3B-Instruct |
| --- |
| |
| # LoRA Fine-Tune of Qwen2.5-VL-3B-Instruct on the ComicsPAP Dataset |
|
|
| [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) fine-tuned with LoRA jointly on all five tasks of the [ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP) dataset. |
| Training used the AdamW optimizer with a constant learning rate of 2e-4 for 5k steps at an effective batch size of 128. The LoRA configuration used rank r = 8, α = 16, and a dropout rate of 0.05. |
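
The hyperparameters above can be collected in one place, as in the minimal sketch below. It is an illustration, not the authors' training script: the class `FineTuneConfig` and the helper `grad_accum_steps` are hypothetical names, and the per-device batch size and device count are assumptions, since the card reports only the effective batch size. The keys returned by `peft_kwargs` follow the parameter names of `peft.LoraConfig`; `target_modules` is left out because the card does not state which layers were adapted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters reported in this model card (hypothetical container)."""

    learning_rate: float = 2e-4       # constant schedule, AdamW optimizer
    max_steps: int = 5_000            # "5k steps"
    effective_batch_size: int = 128
    lora_rank: int = 8                # r
    lora_alpha: int = 16              # alpha
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Total training examples processed, counting repeats.
        return self.max_steps * self.effective_batch_size

    def grad_accum_steps(self, per_device_batch: int, num_devices: int) -> int:
        # Assumption: the hardware split is not given in the card, so the
        # caller supplies it; this only checks that it divides evenly.
        per_step = per_device_batch * num_devices
        if self.effective_batch_size % per_step != 0:
            raise ValueError("effective batch size is not divisible by the per-step batch")
        return self.effective_batch_size // per_step

    def peft_kwargs(self) -> dict:
        # Keyword names match peft.LoraConfig; target_modules is deliberately
        # omitted because the card does not specify it.
        return {
            "r": self.lora_rank,
            "lora_alpha": self.lora_alpha,
            "lora_dropout": self.lora_dropout,
        }


cfg = FineTuneConfig()
print(cfg.lora_scaling, cfg.samples_seen, cfg.peft_kwargs())
```

With `peft` installed, `peft.LoraConfig(**cfg.peft_kwargs(), target_modules=...)` would build the adapter configuration once the target layers are chosen.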
|
|
| ## Results |
| | Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) | |
| | :------------------------: | :---------------------------------------------------------------------------------: | :------------------: | :---------------------: | :----------------: | :--------------: | :-------------------: | :-------: | |
| | Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 | |
| | Qwen2.5-VL-3B (Zero-Shot) | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 | |
| | Qwen2.5-VL-7B (Zero-Shot) | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 | |
| | Qwen2.5-VL-72B (Zero-Shot) | [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 | |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP) | 62.21 | **93.01** | **42.33** | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP) | **69.08** | **93.01** | 42.00 | **74.90** | **49.62** | **62.31** |
|
|
| ## Citation |
|
|
| **BibTeX:** |
| ``` |
| @misc{vivoli2025comicspap, |
| title={ComicsPAP: understanding comic strips by picking the correct panel}, |
| author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas}, |
| year={2025}, |
| eprint={2503.08561}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2503.08561}, |
| } |
| |
| @misc{qwen2.5-VL, |
| title = {Qwen2.5-VL}, |
| url = {https://qwenlm.github.io/blog/qwen2.5-vl/}, |
| author = {Qwen Team}, |
| month = {January}, |
| year = {2025} |
| } |
| ``` |