---
license: apache-2.0
language:
- en
base_model:
- allenai/Molmo-7B-D-0924
pipeline_tag: text-generation
library_name: peft
tags:
- lora
- finetune
- agent

---

Testing a QLoRA adapter for [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924).

Targets the attention layers of the Transformer backbone and the image pooling and projection layers of the vision backbone.
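As a sketch, the adapter could be attached to the 4-bit base model with `peft` and `bitsandbytes`. The `adapter_id` argument and the quantization settings below are assumptions for illustration, not the exact training configuration:

```python
def load_molmo_with_adapter(adapter_id: str):
    """Load Molmo-7B-D in 4-bit and attach a LoRA adapter.

    Sketch only: heavy dependencies are imported lazily, and the
    quantization config is a typical QLoRA setup, assumed rather
    than taken from this adapter's training run.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig
    from peft import PeftModel

    base_id = "allenai/Molmo-7B-D-0924"
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        base_id,
        quantization_config=bnb,
        trust_remote_code=True,
        device_map="auto",
    )
    # Merge in the LoRA weights from the adapter repo or local path
    model = PeftModel.from_pretrained(model, adapter_id)
    return model, processor
```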

Trained on 47 screenshots of a low-poly video game with ragdoll casualties.

Evaluated on 44 screenshots of the same game.

Molmo has an edge case where it declares there are no humans in an image:
![img1 (2)](https://cdn-uploads.huggingface.co/production/uploads/6367f8dd46919b9619bc7bf2/8zsuqnz-QCTamBDOgWGM-.png)

This custom QLoRA adapter reduces the occurrence of these cases:
![img1 (1)](https://cdn-uploads.huggingface.co/production/uploads/6367f8dd46919b9619bc7bf2/-HENqZx5SiLYX35tx3ADs.png)

However, pointing at non-human objects (false positives) is observed to increase.

Comparison of model performance with and without the QLoRA adapter on the eval dataset:

| Metric | Molmo-7B-D | Molmo-7B-D w/ QLoRA |
|----------|------|------|
| Precision (%) | 92.1 | 80.5 |
| Recall (%) | 70.4 | 88.5 |
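The shift from precision toward recall is consistent with the adapter suppressing "no humans" refusals (fewer false negatives) while pointing at more non-human objects (more false positives). A minimal illustration with hypothetical detection counts, chosen only to roughly mirror the trade-off in the table, not the actual eval tallies:

```python
# Illustrative only: hypothetical counts, not the real eval data.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Fewer missed humans (fn down) but more spurious points (fp up)
# moves the balance from precision toward recall:
base = precision_recall(tp=70, fp=6, fn=30)    # high precision, lower recall
tuned = precision_recall(tp=88, fp=21, fn=12)  # higher recall, lower precision
```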

Dataset: [reubk/RavenfieldDataset](https://huggingface.co/datasets/reubk/RavenfieldDataset)