
Qwen2-VL-7B-Instruct – Stage 1 (Findings)

1. Model Overview

This model is part of a Vision-Language AI system designed for chest X-ray analysis in Vietnamese clinical settings.

The full pipeline consists of 3 stages:

  • Stage 1: Findings generation (image → radiology findings)
  • Stage 2: Impression generation (findings → clinical impression)
  • Stage 3: Multi-turn conversation (findings + impression + dialogue)
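The staged design above can be sketched as three composed functions. This is only an illustrative stub (the function names are placeholders, not part of this repository); in practice each stage is a separate fine-tuned checkpoint.

```python
# Illustrative sketch of the three-stage pipeline. The function bodies are
# stubs standing in for the actual fine-tuned Qwen2-VL models.
def generate_findings(image_path):
    # Stage 1 (this model): chest X-ray image -> radiology findings
    return f"findings for {image_path}"

def generate_impression(findings):
    # Stage 2: findings -> clinical impression
    return f"impression derived from: {findings}"

def answer_question(findings, impression, question):
    # Stage 3: multi-turn dialogue grounded in the earlier outputs
    return f"answer to {question!r} given the impression and findings"

findings = generate_findings("chest_xray.jpg")
impression = generate_impression(findings)
reply = answer_question(findings, impression, "Is there consolidation?")
print(reply)
```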

This repository corresponds to:

  • Stage: 1 (Findings)
  • Task: Generate radiology findings from chest X-ray images
  • Domain: Vietnamese medical imaging (Chest X-ray)

This is the best-performing model among all evaluated architectures.


2. Installation

pip install torch torchvision transformers qwen-vl-utils pillow
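To confirm the dependencies installed correctly, a quick sanity check (standard library only, safe to run even if some packages are missing) can verify that each package is importable:

```python
# Check that each required package is importable. Note that "PIL" is the
# import name for the pillow package.
import importlib.util

for pkg in ("torch", "torchvision", "transformers", "qwen_vl_utils", "PIL"):
    status = "ok" if importlib.util.find_spec(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```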

3. Inference

A GPU is strongly recommended for inference.

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "THP2903/Qwen2-VL-7B-Instruct_finding_full",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained("THP2903/Qwen2-VL-7B-Instruct_finding_full")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "your_image.jpg",
            },
            {
                "type": "text",
                "text": "Ảnh chụp X-quang bệnh nhân nam, 48 tuổi, tư thế PA có dấu hiệu gì?",  # "PA chest X-ray of a 48-year-old male patient: what findings are present?"
            },
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)

inputs = inputs.to(model.device)  # matches the device chosen by device_map="auto"

generated_ids = model.generate(**inputs, max_new_tokens=512)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text[0])  # batch_decode returns a list; take the single result

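If GPU memory is tight, the Qwen2-VL processor accepts `min_pixels`/`max_pixels` arguments to bound the number of visual tokens per image (each visual token covers a 28x28 pixel patch). A sketch using the bounds suggested in the Qwen2-VL documentation:

```python
# Bound the visual token count per image. One visual token corresponds to a
# 28x28 pixel patch, so these bounds allow roughly 256-1280 tokens per image.
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28

# Pass the bounds when loading the processor (commented out here so the
# snippet runs without downloading the model):
# processor = AutoProcessor.from_pretrained(
#     "THP2903/Qwen2-VL-7B-Instruct_finding_full",
#     min_pixels=min_pixels,
#     max_pixels=max_pixels,
# )
print(min_pixels, max_pixels)
```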
4. Notes

  • Input must be a chest X-ray image
  • Output is radiology findings (not final diagnosis)
  • This model follows the original Qwen2-VL inference pipeline without modification
  • This is the best-performing model among the architectures evaluated for this stage

5. Base Model

  • Fine-tuned from: Qwen/Qwen2-VL-7B