
InternVL-OVD (inference-only)

This repository contains inference-only artifacts exported from a training checkpoint.

| Dataset | mAP |
|---|---|
| COCO (merged categories) | 59.8 |

Note: Evaluated on COCO with merged/consolidated category labels.

Inference Speed Comparison

| Model | Decoding | Latency (1 obj) | Latency (4 obj) |
|---|---|---|---|
| VLM only (MOSP) | Autoregressive | 1601.81 ms | 2487.41 ms |
| VLM only (SOSP-B) | Autoregressive | 857.70 ms | 1386.36 ms |
| VLM+DeTrHead+Merged | Single step | 74.57 ms | 74.70 ms |
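
To make the gap in the latency table concrete, the short sketch below divides each autoregressive baseline by the single-step latency. The numbers are copied from the table; the dictionary layout and the `speedup_over` helper are illustrative names, not part of this repository.

```python
# Latencies in milliseconds, copied from the table above.
latency_ms = {
    "VLM only (MOSP)": {"1 obj": 1601.81, "4 obj": 2487.41},
    "VLM only (SOSP-B)": {"1 obj": 857.70, "4 obj": 1386.36},
    "VLM+DeTrHead+Merged": {"1 obj": 74.57, "4 obj": 74.70},
}


def speedup_over(baseline: str, candidate: str = "VLM+DeTrHead+Merged") -> dict:
    """Return baseline latency divided by candidate latency for each column."""
    return {
        column: latency_ms[baseline][column] / latency_ms[candidate][column]
        for column in latency_ms[baseline]
    }


for name in ("VLM only (MOSP)", "VLM only (SOSP-B)"):
    ratios = speedup_over(name)
    print(name, {column: f"{ratio:.1f}x" for column, ratio in ratios.items()})
```

On these figures the single-step model is roughly 21x to 33x faster than MOSP and roughly 11x to 19x faster than SOSP-B, and its latency stays nearly flat as the number of queried objects grows.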

Peak VRAM Usage

| Model | 1 Object | 4 Objects |
|---|---|---|
| VLM only (MOSP) | 3.99 GB | 5.85 GB |
| VLM only (SOSP-B) | 2.34 GB | 3.31 GB |
| VLM+DeTrHead+Merged | 2.63 GB | 2.63 GB |
  • Number of image tokens: 128 tokens/patch × 7 patches (896 tokens in total)

Quick start

import torch
import requests
from io import BytesIO
from PIL import Image, ImageDraw
from transformers import AutoConfig, AutoModel, AutoTokenizer

repo_id = "xpuenabler/OVD_SOSP_Merge_Internvl_model2"
image_source = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"  # URL or local path
query = "dog, person"

# Load image from URL or local path
if image_source.startswith(("http://", "https://")):
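    response = requests.get(image_source, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors instead of decoding an error page
    pil = Image.open(BytesIO(response.content)).convert("RGB")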
else:
    pil = Image.open(image_source).convert("RGB")

cfg = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(cfg.vlm_model_name, trust_remote_code=True, use_fast=False)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model.infer_image(image=pil, query=query, tokenizer=tokenizer)

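# Take the first batch element; boxes are normalized and scaled to pixels below
pred_boxes = outputs.pred_boxes[0].float().cpu()
# Convert score logits to probabilities
pred_scores = outputs.pred_scores[0].squeeze(-1).float().sigmoid().cpu()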

# Visualize and save output
w, h = pil.size
vis = pil.copy()
draw = ImageDraw.Draw(vis)
for i in range(pred_boxes.shape[0]):
    score = float(pred_scores[i].item())
    x1n, y1n, x2n, y2n = pred_boxes[i].tolist()
    # Scale normalized coordinates to pixel coordinates
    x1, y1, x2, y2 = x1n * w, y1n * h, x2n * w, y2n * h
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
    draw.text((x1 + 4, y1 + 4), f"{score:.2f}", fill="red")  # label each box with its score
vis.save("output.jpg")
print("Saved visualization to output.jpg")
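
The quick start draws every predicted box regardless of its score. A common follow-up is to keep only confident detections before drawing. The sketch below is one way to do that, assuming (as the scaling step in the quick start implies) that boxes are normalized `(x1, y1, x2, y2)` and that scores are already probabilities. The helper name `filter_and_scale` and the `0.5` threshold are hypothetical and not part of this repository.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def filter_and_scale(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    image_size: Tuple[int, int],
    score_threshold: float = 0.5,  # hypothetical default; tune for your data
) -> List[Tuple[Box, float]]:
    """Keep boxes scoring at or above the threshold and convert them to pixels.

    Assumes each box is normalized (x1, y1, x2, y2) with values in [0, 1].
    """
    width, height = image_size
    kept: List[Tuple[Box, float]] = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score < score_threshold:
            continue  # drop low-confidence predictions
        kept.append(((x1 * width, y1 * height, x2 * width, y2 * height), score))
    return kept
```

With the quick-start variables, this could be called as `filter_and_scale(pred_boxes.tolist(), pred_scores.tolist(), pil.size)`, and the returned pixel boxes passed to `draw.rectangle`.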

Model size: 1B params (Safetensors). Tensor type: BF16.