|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- allenai/Molmo-7B-D-0924 |
|
|
pipeline_tag: text-generation |
|
|
library_name: peft |
|
|
tags: |
|
|
- lora |
|
|
- finetune |
|
|
- agent |
|
|
|
|
|
--- |
|
|
|
|
|
Testing a QLoRA adapter for [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924).
|
|
|
|
|
Targets the attention layers of the transformer backbone and the image pooling and projection layers of the vision backbone.
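The targeted layers can be expressed as a PEFT `LoraConfig`. A minimal sketch, assuming hypothetical module names and hyperparameters (the real names depend on Molmo's remote-code implementation; inspect `model.named_modules()` to find them):

```python
from peft import LoraConfig

# All module names and hyperparameters below are assumptions for illustration,
# not the values used to train this adapter.
lora_config = LoraConfig(
    r=16,                       # assumed LoRA rank
    lora_alpha=32,              # assumed scaling factor
    target_modules=[
        "att_proj",             # transformer attention (hypothetical name)
        "attn_out",             # transformer attention output (hypothetical name)
        "image_pooling_2d",     # vision image pooling (hypothetical name)
        "image_projector",      # vision projection (hypothetical name)
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Passed to `get_peft_model`, a config like this restricts trainable parameters to low-rank updates on just those modules.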
|
|
|
|
|
Trained on 47 screenshots of a low-poly video game with ragdoll casualties.
|
|
|
|
|
Evaluated on 44 screenshots of the aforementioned video game.
|
|
|
|
|
Molmo has an edge case where it declares there are no humans in an image.
|
|
|
|
|
This QLoRA adapter reduces the occurrence of these failure cases.
|
|
|
|
|
However, false positives increase: the adapted model points at non-human objects more often.
|
|
|
|
|
Comparison of model performance with and without the QLoRA adapter on the eval dataset:

| Metric (%) | Molmo-7B-D | Molmo-7B-D w/ QLoRA |
|------------|------------|---------------------|
| Precision  | 92.1       | 80.5                |
| Recall     | 70.4       | 88.5                |
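The precision/recall trade-off above follows the standard definitions. A minimal sketch of how such numbers are computed from true-positive, false-positive, and false-negative counts (the counts below are hypothetical, not the actual eval data):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts illustrating the observed shift: the base model
# misses humans (high fn), the adapter points at non-humans (high fp).
base = precision_recall(tp=70, fp=6, fn=30)      # high precision, low recall
adapted = precision_recall(tp=88, fp=21, fn=12)  # recall up, precision down
```

This illustrates why reducing "no humans" refusals (fewer false negatives) raises recall while the extra non-human points (more false positives) lower precision.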
|
|
|
|
|
Dataset: [reubk/RavenfieldDataset](https://huggingface.co/datasets/reubk/RavenfieldDataset) |