|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- allenai/Molmo-7B-D-0924 |
|
|
pipeline_tag: text-generation |
|
|
library_name: peft |
|
|
tags: |
|
|
- lora |
|
|
- finetune |
|
|
- agent |
|
|
|
|
|
--- |
|
|
|
|
|
Testing a QLoRA adapter for [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924).
|
|
|
|
|
Targets the attention layers of the transformer backbone and the image pooling and projection layers of the vision backbone.
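The targeted layers can be expressed as a PEFT `LoraConfig`. A minimal sketch, assuming hypothetical module names and hyperparameters (the real names depend on Molmo's remote-code implementation; inspect `model.named_modules()` to find them):

```python
from peft import LoraConfig

# All module names and hyperparameters below are assumptions for illustration,
# not the values used to train this adapter.
lora_config = LoraConfig(
    r=16,                       # assumed LoRA rank
    lora_alpha=32,              # assumed scaling factor
    target_modules=[
        "att_proj",             # transformer attention (hypothetical name)
        "attn_out",             # transformer attention output (hypothetical name)
        "image_pooling_2d",     # vision image pooling (hypothetical name)
        "image_projector",      # vision projection (hypothetical name)
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Passed to `get_peft_model`, a config like this restricts trainable parameters to low-rank updates on just those modules.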
|
|
|
|
|
Trained on 47 screenshots of a low-poly video game with ragdoll casualties.
|
|
|
|
|
Evaluated on 44 screenshots of the aforementioned video game.
|
|
|
|
|
Molmo has an edge case where it declares there are no humans in an image.
|
|
|
|
|
This QLoRA adapter reduces the occurrence of these failure cases.
|
|
|
|
|
However, false positives increase: the adapted model points at non-human objects more often.
|
|
|
|
|
Comparison of model performance with and without the QLoRA adapter on the eval dataset:

| Metric (%) | Molmo-7B-D | Molmo-7B-D w/ QLoRA |
|------------|------------|---------------------|
| Precision  | 92.1       | 80.5                |
| Recall     | 70.4       | 88.5                |
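The precision/recall trade-off above follows the standard definitions. A minimal sketch of how such numbers are computed from true-positive, false-positive, and false-negative counts (the counts below are hypothetical, not the actual eval data):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts illustrating the observed shift: the base model
# misses humans (high fn), the adapter points at non-humans (high fp).
base = precision_recall(tp=70, fp=6, fn=30)      # high precision, low recall
adapted = precision_recall(tp=88, fp=21, fn=12)  # recall up, precision down
```

This illustrates why reducing "no humans" refusals (fewer false negatives) raises recall while the extra non-human points (more false positives) lower precision.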
|
|
|
|
|
Dataset: [reubk/RavenfieldDataset](https://huggingface.co/datasets/reubk/RavenfieldDataset) |