# LaVy-8B – Stage 2 (Impression)

## 1. Model Overview

This model is part of a Vision-Language AI system designed for chest X-ray analysis in Vietnamese clinical settings.

The full pipeline consists of 3 stages:
- Stage 1: Findings generation (image → radiology findings)
- Stage 2: Impression generation (image → clinical impression)
- Stage 3: Multi-turn conversation (findings + impression + dialogue)

This repository corresponds to:

- Stage: 2 (Impression)
- Task: Generate clinical impression (final conclusion) from chest X-ray images
- Domain: Vietnamese medical imaging (Chest X-ray)

The model is fine-tuned from LaVy. It was evaluated as one of several architectures: InternVL, Vintern, Qwen2-VL, MiniCPM-V, and LaVy.

Of these, Qwen2-VL-7B achieved the best performance. This LaVy-based model is provided for benchmarking and comparison.

---

## 2. Installation

```bash
pip install torch torchvision transformers pillow
```

---

## 3. Inference

A GPU is recommended.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "THP2903/lavy-Instruct_impression_full",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(
    "THP2903/lavy-Instruct_impression_full",
    trust_remote_code=True
)

image = Image.open("your_image.jpg").convert("RGB")

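# The prompt is in Vietnamese; roughly: "X-ray of a male patient, 48 years old,
# PA view, what is the conclusion?"
inputs = processor(
    images=image,
    text="Ảnh chụp xray benh nhân nam, 48 tuổi PA ket luan bị gì?",
    return_tensors="pt"
).to(model.device)  # follow the device picked by device_map="auto" instead of hard-coding "cuda"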

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

response = processor.batch_decode(
    outputs,
    skip_special_tokens=True
)[0]

print(response)
```
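
The example prompt above combines the patient's sex, age, and view position in one Vietnamese sentence. As a minimal sketch, a prompt of the same shape could be assembled from those fields as shown below. The helper name `build_impression_prompt` and its parameters are hypothetical, and the template is simply copied from the example string; whether the model expects exactly this wording is an assumption, not a documented requirement.

```python
def build_impression_prompt(sex: str, age: int, view: str = "PA") -> str:
    """Build a prompt shaped like the example string used above.

    Hypothetical helper: the template is copied from this README's example
    prompt and is an assumption, not a documented model requirement.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    # Same wording as the example prompt, with sex, age and view filled in.
    return f"Ảnh chụp xray benh nhân {sex}, {age} tuổi {view} ket luan bị gì?"


# "nam" means male; this reproduces the example prompt used above.
prompt = build_impression_prompt("nam", 48)
```

The resulting `prompt` can be passed as the `text` argument of the processor call in place of the hard-coded string.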

---

## 4. Notes

- The input must be a chest X-ray image
- The output is the final clinical impression (diagnostic conclusion)
- The code in Section 3 is a generic Hugging Face inference pipeline for LaVy-style models
- If your implementation differs, adjust the processor and model loading accordingly
- For best performance, consider using Qwen2-VL-7B, which performed best in the evaluation