How to use damerajee/GPTVision-1-ft with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="damerajee/GPTVision-1-ft", trust_remote_code=True)

# Or load the model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("damerajee/GPTVision-1-ft", trust_remote_code=True, dtype="auto")
```

How to use damerajee/GPTVision-1-ft with vLLM:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "damerajee/GPTVision-1-ft"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "damerajee/GPTVision-1-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
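The same request can also be sent from Python instead of curl. A minimal sketch using only the standard library, assuming the vLLM server above is running on localhost:8000 (the request is built and inspected here; the actual send is left commented out so it does not require a live server):

```python
import json
import urllib.request

# Same payload as the curl example above
payload = {
    "model": "damerajee/GPTVision-1-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

# Build the POST request for the OpenAI-compatible completions endpoint
request = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.full_url)  # http://localhost:8000/v1/completions

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```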
How to use damerajee/GPTVision-1-ft with SGLang:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "damerajee/GPTVision-1-ft" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "damerajee/GPTVision-1-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "damerajee/GPTVision-1-ft" \
  --host 0.0.0.0 \
  --port 30000

# Then call the server with the same curl command as above.
```

How to use damerajee/GPTVision-1-ft with Docker Model Runner:

```shell
docker model run hf.co/damerajee/GPTVision-1-ft
```
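Whichever server you use, the OpenAI-compatible `/v1/completions` endpoint returns JSON in the same general shape. A minimal sketch of pulling the generated text out of a response; the sample response below is illustrative, not real output from this model:

```python
import json

# Illustrative response in the OpenAI-compatible completions shape
raw = json.dumps({
    "id": "cmpl-123",
    "object": "text_completion",
    "model": "damerajee/GPTVision-1-ft",
    "choices": [
        {"index": 0, "text": " there was a dog playing in a park.", "finish_reason": "length"}
    ],
})

response = json.loads(raw)
# The generated continuation lives in choices[0]["text"]
text = response["choices"][0]["text"]
print(text)
```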
This model is a fine-tuned version of GPT-Vision-1. You can find out more about this model on GitHub.
```python
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained("damerajee/GPTVision-1-ft", trust_remote_code=True)

# Load the image and convert it to RGB
image_path = "Your_image_path"
image = Image.open(image_path)
image = image.convert("RGB")

# The model generates better with these parameters
gen_kwargs = {
    "do_sample": True,
    "temperature": 0.8,
    "top_p": 0.6,
    "repetition_penalty": 1.6,
}

question = "your_question"
answer = model.generate(image=image, question=question, max_new_tokens=80, **gen_kwargs)
print(answer)
```
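For intuition, here is a toy sketch of what the sampling knobs above do to a next-token distribution. It runs over a made-up 5-token vocabulary in plain Python and is not the actual transformers implementation:

```python
import math

def sample_filter(logits, temperature=0.8, top_p=0.6, repetition_penalty=1.6, generated=()):
    """Apply repetition penalty, temperature, and top-p filtering to toy logits."""
    logits = list(logits)

    # Repetition penalty: push down tokens that were already generated
    for tok in generated:
        if logits[tok] > 0:
            logits[tok] = logits[tok] / repetition_penalty
        else:
            logits[tok] = logits[tok] * repetition_penalty

    # Temperature: values < 1 sharpen the distribution, > 1 flatten it
    scaled = [l / temperature for l in logits]

    # Softmax to probabilities
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus): keep the smallest set of tokens whose mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

# Token 0 was already generated, so it gets penalized
dist = sample_filter([2.0, 1.5, 0.5, 0.1, -1.0], generated=(0,))
print(dist)
```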
| Image | Question | Response |
|---|---|---|
| ![]() | Why is this dessert appealing to someone with a sweet tooth? | The cake is a must-try and the person with it may be enjoying their dessert.It might also serve as an appetizer, offering options for those who prefer to try more than just one of these options. This could include food or drinks they are not having yet consumed before. A plate can provide them from each side without additional options: some people would rather eat this dessert |
| ![]() | What factors contribute to the player's success in this particular shot? | 1. Positioning and positioning: The ball is placed in a position that allows for the player to hit it with both hands, making contact on either side of his body or face down at an angle that requires them not only to reach the tennis court but also needs their feet to touch him. This can lead into different directions. For example if they are playing close-up during practice |
Despite fine-tuning, the model struggles to generate coherent text for more complex or nuanced queries. While it performs well on generic questions like "Describe the scenery in this image" or "What is this player doing in this image?", it falls short when asked about deeper or more intricate aspects.
Which is a bummer, but I tried.