How to use damerajee/GPTVision-1-ft with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="damerajee/GPTVision-1-ft", trust_remote_code=True)

# Or load the model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("damerajee/GPTVision-1-ft", trust_remote_code=True, dtype="auto")
```

How to use damerajee/GPTVision-1-ft with vLLM:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "damerajee/GPTVision-1-ft"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "damerajee/GPTVision-1-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
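The same request can also be sent from Python instead of curl. A minimal sketch using only the standard library, assuming the vLLM server above is running on localhost:8000 (the request is built and inspected here; the actual send is left commented out so it does not require a live server):

```python
import json
import urllib.request

# Same payload as the curl example above
payload = {
    "model": "damerajee/GPTVision-1-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

# Build the POST request for the OpenAI-compatible completions endpoint
request = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.full_url)  # http://localhost:8000/v1/completions

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```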
How to use damerajee/GPTVision-1-ft with SGLang:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "damerajee/GPTVision-1-ft" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "damerajee/GPTVision-1-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "damerajee/GPTVision-1-ft" \
  --host 0.0.0.0 \
  --port 30000

# Then call the server with the same curl command as above.
```

How to use damerajee/GPTVision-1-ft with Docker Model Runner:

```shell
docker model run hf.co/damerajee/GPTVision-1-ft
```
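Whichever server you use, the OpenAI-compatible `/v1/completions` endpoint returns JSON in the same general shape. A minimal sketch of pulling the generated text out of a response; the sample response below is illustrative, not real output from this model:

```python
import json

# Illustrative response in the OpenAI-compatible completions shape
raw = json.dumps({
    "id": "cmpl-123",
    "object": "text_completion",
    "model": "damerajee/GPTVision-1-ft",
    "choices": [
        {"index": 0, "text": " there was a dog playing in a park.", "finish_reason": "length"}
    ],
})

response = json.loads(raw)
# The generated continuation lives in choices[0]["text"]
text = response["choices"][0]["text"]
print(text)
```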
This model is a fine-tuned version of GPT-Vision-1. You can find out more about this model on GitHub.
```python
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained("damerajee/GPTVision-1-ft", trust_remote_code=True)

# Load the image and convert it to RGB
image_path = "Your_image_path"
image = Image.open(image_path)
image = image.convert("RGB")

# The model generates better with these parameters
gen_kwargs = {
    "do_sample": True,
    "temperature": 0.8,
    "top_p": 0.6,
    "repetition_penalty": 1.6,
}

question = "your_question"
answer = model.generate(image=image, question=question, max_new_tokens=80, **gen_kwargs)
print(answer)
```
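For intuition, here is a toy sketch of what the sampling knobs above do to a next-token distribution. It runs over a made-up 5-token vocabulary in plain Python and is not the actual transformers implementation:

```python
import math

def sample_filter(logits, temperature=0.8, top_p=0.6, repetition_penalty=1.6, generated=()):
    """Apply repetition penalty, temperature, and top-p filtering to toy logits."""
    logits = list(logits)

    # Repetition penalty: push down tokens that were already generated
    for tok in generated:
        if logits[tok] > 0:
            logits[tok] = logits[tok] / repetition_penalty
        else:
            logits[tok] = logits[tok] * repetition_penalty

    # Temperature: values < 1 sharpen the distribution, > 1 flatten it
    scaled = [l / temperature for l in logits]

    # Softmax to probabilities
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus): keep the smallest set of tokens whose mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

# Token 0 was already generated, so it gets penalized
dist = sample_filter([2.0, 1.5, 0.5, 0.1, -1.0], generated=(0,))
print(dist)
```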
| Image | Question | Response |
|---|---|---|
| ![]() | Why is this dessert appealing to someone with a sweet tooth? | The cake is a must-try and the person with it may be enjoying their dessert.It might also serve as an appetizer, offering options for those who prefer to try more than just one of these options. This could include food or drinks they are not having yet consumed before. A plate can provide them from each side without additional options: some people would rather eat this dessert |
| ![]() | What factors contribute to the player's success in this particular shot? | 1. Positioning and positioning: The ball is placed in a position that allows for the player to hit it with both hands, making contact on either side of his body or face down at an angle that requires them not only to reach the tennis court but also needs their feet to touch him. This can lead into different directions. For example if they are playing close-up during practice |
Despite fine-tuning, the model struggles to generate coherent text for more complex or nuanced queries. While it performs well on generic questions like "Describe the scenery in this image" or "What is this player doing in this image?", it falls short when asked about deeper or more intricate aspects.
Which is a bummer, but I tried.