# LaVy-8B – Stage 3 (Multi-turn)

## 1. Model Overview

This model is part of a vision-language AI system designed for chest X-ray analysis in Vietnamese clinical settings.

The full pipeline consists of three stages:

- Stage 1: Findings generation (image → radiology findings)
- Stage 2: Impression generation (image → clinical impression)
- Stage 3: Multi-turn conversation (findings + impression + dialogue)

This repository corresponds to:

- Stage: 3 (Multi-turn)
- Task: Multi-turn reasoning over findings and impression
- Domain: Vietnamese medical imaging (chest X-ray)

The model supports **multi-turn dialogue**, where:

- Turn 1: Generate findings
- Turn 2: Generate a clinical impression based on the previous context

---

## 2. Installation

```bash
pip install torch torchvision transformers pillow
```

---

## 3. Inference
|
A CUDA-capable GPU is recommended: the example below loads the model in `float16`, which is slow and only partially supported on CPU.
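If you need to run on machines without a GPU, you can pick the dtype and device at runtime before loading the model. A minimal sketch (my own convention, not part of the released code):

```python
import torch

# Choose float16 on GPU; fall back to float32 on CPU, where half precision
# is slow or unsupported for many operations.
use_cuda = torch.cuda.is_available()
dtype = torch.float16 if use_cuda else torch.float32
device = "cuda" if use_cuda else "cpu"
print(f"Using device={device}, dtype={dtype}")
```

The resulting `dtype` can then be passed as `torch_dtype=dtype` to `from_pretrained` in the example below.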
|
|
```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "THP2903/lavy-Instruct_multi_full",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(
    "THP2903/lavy-Instruct_multi_full",
    trust_remote_code=True
)

image = Image.open("your_image.jpg").convert("RGB")

# Turn 1: Findings
# Prompt (Vietnamese): "PA chest X-ray of a 48-year-old male patient.
# Describe the patient information."
inputs = processor(
    images=image,
    text="Ảnh chụp xray bệnh nhân nam, 48 tuổi PA. Mô tả thông tin bệnh nhân.",
    return_tensors="pt"
).to(model.device)  # with device_map="auto", follow the model's device

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

response1 = processor.batch_decode(
    outputs,
    skip_special_tokens=True
)[0]

print("Turn 1:", response1)

# Turn 2: Impression (reuse the Turn 1 response as context)
# Prompt (Vietnamese): "Kết luận bệnh gì?" = "What is the diagnostic conclusion?"
inputs = processor(
    images=image,
    text=f"Previous findings: {response1}\nKết luận bệnh gì?",
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

response2 = processor.batch_decode(
    outputs,
    skip_special_tokens=True
)[0]

print("Turn 2:", response2)
```
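Because the second turn is simulated by prompt concatenation, the Turn 2 prompt assembly can be factored into a small helper. The function name and the sample findings string below are illustrative, not part of the released code:

```python
def build_followup_prompt(previous_findings: str, question: str) -> str:
    """Simulate a second turn by prepending the Turn 1 findings to the new question."""
    return f"Previous findings: {previous_findings}\n{question}"

# Example: assemble the Turn 2 impression prompt from a (hypothetical) Turn 1 output.
# "Mờ thùy dưới phổi phải." ≈ "Opacity in the right lower lobe."
prompt = build_followup_prompt("Mờ thùy dưới phổi phải.", "Kết luận bệnh gì?")
print(prompt)
```

The returned string is what gets passed as `text=` to the processor in the second call above.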
|
|
---

## 4. Notes
|
|
- The input must be a chest X-ray image.
- Turn 1 generates the findings; Turn 2 generates the clinical impression, using the Turn 1 findings as context.
- This implementation simulates multi-turn dialogue by concatenating the previous response into the next prompt.
- For best performance, consider using Qwen2-VL-7B.
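An alternative to raw prompt concatenation is to keep the dialogue as a structured message list, as used by chat-template-aware processors. Whether this works with this checkpoint depends on its processor's chat template, so treat the message format below as an assumption rather than the model's documented API:

```python
# Hypothetical chat-history structure (role/content dicts, as used by many
# transformers chat templates). The Vietnamese strings are illustrative.
# "Ảnh chụp X-quang ngực. Mô tả tổn thương." ≈ "Chest X-ray. Describe the lesions."
conversation = [
    {"role": "user", "content": "Ảnh chụp X-quang ngực. Mô tả tổn thương."},
]

def add_turn(history, assistant_reply, next_user_msg):
    """Append the model's reply and the next question, keeping the full context."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_msg},
    ]

conversation = add_turn(conversation, "Mờ thùy dưới phổi phải.", "Kết luận bệnh gì?")
print(len(conversation))  # 3 messages: user, assistant, user
```

If the processor exposes `apply_chat_template`, a history like this can be rendered into the model's native prompt format instead of the manual `Previous findings: ...` string.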