# LaVy-8B – Stage 2 (Impression)

## 1. Model Overview

This model is part of a vision-language AI system for chest X-ray analysis in Vietnamese clinical settings.

The full pipeline consists of three stages:

- Stage 1: Findings generation (image → radiology findings)
- Stage 2: Impression generation (image → clinical impression)
- Stage 3: Multi-turn conversation (findings + impression + dialogue)

This repository corresponds to:

- Stage: 2 (Impression)
- Task: Generate the clinical impression (final conclusion) from chest X-ray images
- Domain: Vietnamese medical imaging (chest X-ray)

The model is fine-tuned from LaVy and was evaluated alongside several architectures (InternVL, Vintern, Qwen2-VL, MiniCPM-V, LaVy). Qwen2-VL-7B achieved the best performance among them; this model is provided for benchmarking and comparison.

---

## 2. Installation

```bash
# accelerate is required because the inference example loads the model with device_map="auto"
pip install torch torchvision transformers accelerate pillow
```

---

## 3. Inference

A GPU is recommended.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "THP2903/lavy-Instruct_impression_full"

# Load the model in half precision; device_map="auto" places the weights
# on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
)

# Load the chest X-ray and force three channels.
image = Image.open("your_image.jpg").convert("RGB")

# Vietnamese prompt, roughly: "X-ray of a male patient, 48 years old, PA view:
# what is the conclusion?"
inputs = processor(
    images=image,
    text="Ảnh chụp xray benh nhân nam, 48 tuổi PA ket luan bị gì?",
    return_tensors="pt",
).to(model.device)  # follow the model's device instead of hard-coding "cuda"

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
)

response = processor.batch_decode(
    outputs,
    skip_special_tokens=True,
)[0]

print(response)
```

---

## 4. Notes

- The input must be a chest X-ray image
- The output is the final clinical impression (diagnostic conclusion)
- The code above is a generic Hugging Face inference pipeline for LaVy-style models
- If your implementation differs, adjust the processor and model loading accordingly
- For best performance, consider using Qwen2-VL-7B
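
The prompt in the inference example combines the patient's sex, age, and X-ray view into one question. As a minimal sketch, the hypothetical helper below (`build_impression_prompt` is not part of this repository) fills that same pattern for other patients. It assumes the model accepts prompts of this shape for any patient; the wording and spelling are copied verbatim from the example, because the exact prompt format used during fine-tuning is not documented here.

```python
def build_impression_prompt(sex: str, age: int, view: str = "PA") -> str:
    """Fill the example prompt's pattern with patient metadata (hypothetical helper).

    sex  -- Vietnamese word for the patient's sex, e.g. "nam" (male) or "nữ" (female)
    age  -- age in years
    view -- projection label, e.g. "PA" or "AP"
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    # Template text mirrors the prompt from the inference example character for
    # character; only the three metadata fields are substituted.
    return f"Ảnh chụp xray benh nhân {sex}, {age} tuổi {view} ket luan bị gì?"


# Reproduces the prompt used in the inference example.
prompt = build_impression_prompt("nam", 48, "PA")
print(prompt)
```

The returned string can be passed as the `text` argument of the processor call in place of the hard-coded prompt.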