songtianhui commited on
Commit
01d6309
·
1 Parent(s): 46f0dc4

update example code

Browse files
Files changed (1) hide show
  1. README.md +62 -26
README.md CHANGED
@@ -80,36 +80,72 @@ Without introducing any complex architectures or special patterns, we show how e
80
 
81
  # Model Usage
82
 
83
- ## Inference with 🤗 Hugging Face Transformers
84
 
85
- It is recommended to use python=3.10, torch>=2.1.0, and transformers=4.48.2 as the development environment.
 
 
 
 
 
 
 
 
 
 
 
 
 
86
 
87
  ```python
88
- from PIL import Image
89
- from transformers import AutoModelForCausalLM, AutoProcessor
90
- model_path = "sthui/SimpleSeg-Kimi-VL"
91
- model = AutoModelForCausalLM.from_pretrained(
92
- model_path,
93
- torch_dtype="auto",
94
- device_map="auto",
95
- trust_remote_code=True,
96
- )
97
- processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
98
  image_path = "./figures/octopus.png"
99
- image = Image.open(image_path)
100
- messages = [
101
- {"role": "user", "content": [{"type": "image", "image": image_path}, {"type": "text", "text": "Output the polygon coordinates of octopus in the image."}]}
102
- ]
103
- text = processor.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
104
- inputs = processor(images=image, text=text, return_tensors="pt", padding=True, truncation=True).to(model.device)
105
- generated_ids = model.generate(**inputs, max_new_tokens=512)
106
- generated_ids_trimmed = [
107
- out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
108
- ]
109
- response = processor.batch_decode(
110
- generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
111
- )[0]
112
- print(response)
113
  ```
114
 
115
 
 
80
 
81
  # Model Usage
82
 
83
+ ## Inference
84
 
85
+ We recommend using vLLM for production deployment. This requires `vllm>=0.12.0` and passing `--trust-remote-code`.
86
+
87
+ First, start the vLLM server:
88
+
89
+ ```bash
90
+ vllm serve sthui/SimpleSeg-Qwen2.5-VL \
91
+ --trust-remote-code \
92
+ --tensor-parallel-size 4 \
93
+ --served-model-name SimpleSeg-Qwen2.5-VL \
94
+ --host 0.0.0.0 \
95
+ --port 8000
96
+ ```
97
+
98
+ Then run the following Python code to perform inference:
99
 
100
  ```python
101
+
102
+ import base64
103
+ from openai import OpenAI
104
+
105
+ # vLLM server configuration
106
+ VLLM_BASE_URL = "http://localhost:8000/v1"
107
+ MODEL_NAME = "SimpleSeg-Qwen2.5-VL" # Should match --served-model-name in vllm serve
108
+
109
+ def encode_image(image_path: str) -> str:
110
+ """Encode image to base64 string."""
111
+ with open(image_path, "rb") as f:
112
+ return base64.b64encode(f.read()).decode()
113
+
114
+ def inference(image_path: str, instruction: str) -> str:
115
+ """Run segmentation inference via vLLM."""
116
+ client = OpenAI(base_url=VLLM_BASE_URL, api_key="EMPTY")
117
+
118
+ messages = [
119
+ {
120
+ "role": "user",
121
+ "content": [
122
+ {
123
+ "type": "image_url",
124
+ "image_url": {"url": f"data:image/png;base64,{encode_image(image_path)}"}
125
+ },
126
+ {"type": "text", "text": instruction},
127
+ ],
128
+ },
129
+ ]
130
+
131
+ response = client.chat.completions.create(
132
+ model=MODEL_NAME,
133
+ messages=messages,
134
+ max_tokens=4096,
135
+ temperature=0,
136
+ )
137
+
138
+ return response.choices[0].message.content
139
+
140
+ # Example usage
141
  image_path = "./figures/octopus.png"
142
+ instruction = "Output the polygon coordinates of octopus in the image."
143
+
144
+ response = inference(image_path, instruction)
145
+ print("Model output:", response)
146
+
147
+
148
+
 
 
 
 
 
 
 
149
  ```
150
 
151