# LaVy-8B – Stage 3 (Multi-turn)
## 1. Model Overview
This model is part of a Vision-Language AI system designed for chest X-ray analysis in Vietnamese clinical settings.
The full pipeline consists of 3 stages:
- Stage 1: Findings generation (image → radiology findings)
- Stage 2: Impression generation (image → clinical impression)
- Stage 3: Multi-turn conversation (findings + impression + dialogue)
This repository corresponds to:
- Stage: 3 (Multi-turn)
- Task: Multi-turn reasoning with findings and impression
- Domain: Vietnamese medical imaging (Chest X-ray)
The model supports **multi-turn dialogue**, where:
- Turn 1: Generate findings
- Turn 2: Generate clinical impression based on previous context
---
## 2. Installation
```bash
# accelerate is required by transformers when loading with device_map="auto"
pip install torch torchvision transformers pillow accelerate
```
---
## 3. Inference
A GPU is recommended: the example loads the weights in float16 and uses `device_map="auto"` to place them.
```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "THP2903/lavy-Instruct_multi_full"

# Load the model in half precision; device_map="auto" decides weight placement.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

image = Image.open("your_image.jpg").convert("RGB")

# Turn 1: Findings
# Prompt in English: "X-ray image of a male patient, 48 years old, PA view.
# Describe the patient information."
inputs = processor(
    images=image,
    text="Ảnh chụp xray bệnh nhân nam, 48 tuổi PA. Mô tả thông tin benh nhân.",
    return_tensors="pt",
).to(model.device)  # follow the model's device instead of hard-coding "cuda"
outputs = model.generate(**inputs, max_new_tokens=512)
response1 = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print("Turn 1:", response1)

# Turn 2: Impression (reuse previous response)
# Question in English: "What is the diagnosis?"
inputs = processor(
    images=image,
    text=f"Previous findings: {response1}\nKết luận bệnh gì?",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response2 = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print("Turn 2:", response2)
```
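Depending on the model's remote code and prompt template, the text returned by `batch_decode` may begin with an echo of the prompt, in which case `response1` would carry the Turn 1 prompt into Turn 2. Whether this happens for this checkpoint is an assumption to verify against real output. The helper below, `strip_prompt_echo` (a hypothetical name, not part of this repository or of `transformers`), is a minimal sketch of removing such an echo with plain string handling.

```python
def strip_prompt_echo(decoded: str, prompt: str) -> str:
    """Return the text that follows the last occurrence of `prompt`.

    Assumption: if the prompt is echoed, it appears verbatim in `decoded`.
    If it is not found (or `prompt` is empty), the decoded text is returned
    unchanged apart from surrounding whitespace.
    """
    index = decoded.rfind(prompt)
    if prompt and index != -1:
        return decoded[index + len(prompt):].strip()
    return decoded.strip()


# Hypothetical usage with the variables from the inference example:
# findings_only = strip_prompt_echo(response1, turn1_prompt)
```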
---
## 4. Notes
- Input must be a chest X-ray image
- Turn 1 generates findings
- Turn 2 generates clinical impression using previous findings as context
- The inference example simulates multi-turn dialogue by concatenating the Turn 1 response into the Turn 2 prompt; no conversation state is kept between calls
- For best performance, consider using Qwen2-VL-7B
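
The prompt concatenation mentioned in the notes can be isolated in a small helper. This is a minimal sketch that reuses the `Previous findings: ...` template from the inference example. The function name `build_followup_prompt` is hypothetical, and stripping surrounding whitespace from the previous response is an added assumption rather than something the original example does.

```python
def build_followup_prompt(previous_response: str, question: str) -> str:
    """Prepend the previous turn's response to the next question.

    Mirrors the f-string used for Turn 2 in the inference example.
    Assumption: surrounding whitespace in the previous response is not
    meaningful, so it is stripped before concatenation.
    """
    return f"Previous findings: {previous_response.strip()}\n{question}"


# Hypothetical usage with the variables from the inference example:
# turn2_text = build_followup_prompt(response1, "Kết luận bệnh gì?")
```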