VictorSanh committed on Jan 12, 2024

Commit 7515eca

1 Parent(s): c8028be

big update

Files changed (1)

modeling_img2html.py +10 -12

@@ -162,7 +162,7 @@ def expand_inputs_for_generation(
     input_ids = input_ids.index_select(0, expanded_return_idx)
     model_kwargs["pixel_values"] = model_kwargs.get("pixel_values", None)
     model_kwargs["image_hidden_states"] = model_kwargs.get("image_hidden_states", None)
-    model_kwargs["image_attention_mask"] = model_kwargs.get("image_attention_mask", None)
+    # model_kwargs["image_attention_mask"] = model_kwargs.get("image_attention_mask", None)
     if "token_type_ids" in model_kwargs:
         token_type_ids = model_kwargs["token_type_ids"]
@@ -180,9 +180,7 @@ def expand_inputs_for_generation(
         model_kwargs["pixel_values"] = model_kwargs["pixel_values"].index_select(0, expanded_return_idx)
     elif model_kwargs["image_hidden_states"] is not None:
-        model_kwargs["image_hidden_states"] = model_kwargs["image_hidden_states"].index_select(
-            0, expanded_return_idx
-        )
+        model_kwargs["image_hidden_states"] = model_kwargs["image_hidden_states"].index_select(0, expanded_return_idx)
     return input_ids, model_kwargs
@@ -205,10 +203,10 @@ def update_model_kwargs_for_generation(outputs, model_kwargs):
         model_kwargs["attention_mask"] = torch.cat(
             [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
         )
-    if "image_attention_mask" in model_kwargs:
-        image_attention_mask = model_kwargs["image_attention_mask"]
-        last_mask = image_attention_mask[:, -1, :].unsqueeze(1)
-        model_kwargs["image_attention_mask"] = last_mask
+    # if "image_attention_mask" in model_kwargs:
+    #     image_attention_mask = model_kwargs["image_attention_mask"]
+    #     last_mask = image_attention_mask[:, -1, :].unsqueeze(1)
+    #     model_kwargs["image_attention_mask"] = last_mask
     # Get the precomputed image_hidden_states
     model_kwargs["image_hidden_states"] = outputs.image_hidden_states
@@ -236,7 +234,7 @@ def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
     pixel_values = kwargs.get("pixel_values", None)
     image_hidden_states = kwargs.get("image_hidden_states", None)
-    image_attention_mask = kwargs.get("image_attention_mask", None)
+    # image_attention_mask = kwargs.get("image_attention_mask", None)
     return {
         "input_ids": input_ids,
@@ -247,7 +245,7 @@ def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
         "token_type_ids": token_type_ids,
         "pixel_values": pixel_values,
         "image_hidden_states": image_hidden_states,
-        "image_attention_mask": image_attention_mask,
+        # "image_attention_mask": image_attention_mask,
     }
@@ -1373,7 +1371,6 @@ class VMistralModel(VMistralPreTrainedModel):
         input_ids: torch.LongTensor = None,
         inputs_embeds: Optional[torch.Tensor] = None,
         image_hidden_states: Optional[torch.Tensor] = None,
-        num_images: Optional[int] = None,
     ):
         """
         This method aims at merging the token embeddings with the image hidden states into one single sequence of vectors that are fed to the transformer LM.
@@ -1496,6 +1493,8 @@ class VMistralModel(VMistralPreTrainedModel):
             if self.config.use_resampler:
                 image_hidden_states = self.perceiver_resampler(image_hidden_states)
+        elif image_hidden_states is not None:
+            image_hidden_states = image_hidden_states.to(dtype=self.dtype, device=input_ids.device)
         if past_key_values is None:
             # When we generate, we don't want to replace the potential image_token_id that we generated by images
@@ -1504,7 +1503,6 @@ class VMistralModel(VMistralPreTrainedModel):
                 input_ids=input_ids,
                 inputs_embeds=inputs_embeds,
                 image_hidden_states=image_hidden_states,
-                num_images=num_images,
             )
             inputs_embeds = new_inp["inputs_embeds"]
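
The generation helpers touched by this diff rely on two small tensor operations: repeating batch rows with `index_select`, and appending one column of ones to the attention mask after each decoding step. The following is a minimal standalone sketch of both, not the code in `modeling_img2html.py`. The function names `expand_for_generation` and `extend_attention_mask`, the way `expanded_return_idx` is built, and the tensor shapes are assumptions made for illustration; in particular, it assumes one image per sample so that `image_hidden_states` shares its first dimension with `input_ids`.

```python
import torch


def expand_for_generation(input_ids, image_hidden_states, expand_size):
    """Repeat every batch row `expand_size` times, keeping text and image inputs aligned.

    Hypothetical helper: assumes image_hidden_states has the batch as dim 0.
    """
    batch_size = input_ids.shape[0]
    # For expand_size=2 and batch_size=2 this is [0, 0, 1, 1]:
    # each batch index is repeated consecutively.
    expanded_return_idx = (
        torch.arange(batch_size).view(-1, 1).repeat(1, expand_size).view(-1).to(input_ids.device)
    )
    input_ids = input_ids.index_select(0, expanded_return_idx)
    if image_hidden_states is not None:
        # Same index tensor, so row i of the text still matches row i of the image states.
        image_hidden_states = image_hidden_states.index_select(0, expanded_return_idx)
    return input_ids, image_hidden_states


def extend_attention_mask(attention_mask):
    """Append one attended position per row, as done after each generated token."""
    new_column = attention_mask.new_ones((attention_mask.shape[0], 1))
    return torch.cat([attention_mask, new_column], dim=-1)


input_ids = torch.tensor([[1, 2, 3], [4, 5, 6]])
image_hidden_states = torch.randn(2, 4, 8)  # (batch, image tokens, hidden size), illustrative sizes
ids, states = expand_for_generation(input_ids, image_hidden_states, expand_size=2)

mask = torch.tensor([[1, 1, 1], [0, 1, 1]])
extended = extend_attention_mask(mask)
```

Using the same index tensor for both `input_ids` and `image_hidden_states` is what keeps each repeated text row paired with its own image features, which is the behavior the restored `index_select` line in the diff appears to preserve when hidden states are passed in precomputed.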