# PRS-Med: Position Reasoning Segmentation in Medical Imaging



This repository contains the pretrained weights for PRS-Med, a modular framework for position reasoning segmentation in medical imaging. PRS-Med combines a vision-language model (LLaVA-Med) with a lightweight image encoder (TinySAM) and a custom mask decoder to perform context-aware medical image segmentation.

Accepted at CVPRW 2026.

## Model Architecture

PRS-Med consists of three main components:

| Component | Description | Details |
|---|---|---|
| LLaVA-Med | Vision-language model for semantic reasoning | Mistral-7B backbone, fine-tuned with LoRA (r=16, alpha=16) |
| TinySAM | Lightweight image encoder | ViT-Tiny, extracts 256-dim features at 64x64 spatial resolution |
| Prompted Mask Decoder | Cross-attention fusion + upsampling | Fuses LLM embeddings with image features to produce 1024x1024 masks |
| Classification Head | 6-class medical modality classifier | Brain, Breast, Lung CT, Lung X-ray, Skin (ISIC), Other |
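The LoRA settings above can be expressed as a `peft` configuration. This is a sketch under the assumption that the adapter was trained with standard `peft` LoRA; the repository's own training code may construct it differently:

```python
from peft import LoraConfig

# LoRA hyperparameters as listed in this card (r=16, alpha=16, dropout=0.05,
# targets q/k/v/o projections). Task type is an assumption for a causal LM.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The saved `adapter_config.json` in the checkpoint records the values actually used.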

## Checkpoint Contents

```
checkpoint/
├── lora_adapter/               # LoRA adapter weights for LLaVA-Med
│   ├── adapter_config.json
│   ├── adapter_model.safetensors
│   ├── tokenizer.json
│   ├── tokenizer.model
│   ├── tokenizer_config.json
│   └── special_tokens_map.json
├── image_encoder.pth           # TinySAM image encoder weights
├── mask_decoder.pth            # Prompted mask decoder weights
└── cls.pth                     # 6-class classification head weights
```
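A quick way to verify that a download is complete is to check for the files listed above. This small helper is not part of the repository, just a convenience sketch:

```python
from pathlib import Path

# Key files expected in a complete checkpoint, per the tree above.
EXPECTED = [
    "lora_adapter/adapter_config.json",
    "lora_adapter/adapter_model.safetensors",
    "image_encoder.pth",
    "mask_decoder.pth",
    "cls.pth",
]

def missing_checkpoint_files(checkpoint_dir):
    """Return the expected files that are absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, `missing_checkpoint_files("./checkpoint")` should return an empty list after a successful download.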

## Usage

### 1. Install Dependencies

```shell
git clone https://github.com/huyquoctrinh/PRS-Med.git
cd PRS-Med
pip install -r requirements.txt
```

### 2. Download Weights

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="huyquoctrinh/PRS-Med",
    local_dir="./checkpoint"
)
```

Or using the CLI:

```shell
huggingface-cli download huyquoctrinh/PRS-Med --local-dir ./checkpoint
```

### 3. Download Base Model

PRS-Med requires the LLaVA-Med base model:

```shell
huggingface-cli download microsoft/llava-med-v1.5-mistral-7b --local-dir ./weight/llava-med-v1.5-mistral-7b
```

### 4. Run Inference

```python
from segment_model.model import build_llm_seg

# Build model
model, tokenizer, image_processor, config = build_llm_seg(
    model_path="./weight/llava-med-v1.5-mistral-7b",
    device="cuda"
)

# Load checkpoint
model.load_model("./checkpoint")
model = model.to("cuda")
```
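The mask decoder produces 1024x1024 mask logits (see the architecture table). A typical post-processing step, sketched here under the assumption that the model returns a raw logit tensor, resizes the prediction back to the input image size and thresholds it into a binary mask:

```python
import torch
import torch.nn.functional as F

def postprocess_mask(mask_logits, original_size, threshold=0.5):
    """Resize (1, 1, 1024, 1024) mask logits to `original_size` and binarize.

    `mask_logits` is an assumed output tensor, not a documented API field.
    """
    probs = torch.sigmoid(mask_logits)
    probs = F.interpolate(probs, size=original_size, mode="bilinear",
                          align_corners=False)
    return (probs > threshold).squeeze(0).squeeze(0).to(torch.uint8)
```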

Or run the inference script:

```shell
python infer_full.py \
  --model_path ./weight/llava-med-v1.5-mistral-7b \
  --checkpoint_path ./checkpoint \
  --device cuda
```

### 5. Training

```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train_ddp.py \
  --model_path ./weight/llava-med-v1.5-mistral-7b \
  --data_path /path/to/dataset/data \
  --annotation_path /path/to/dataset/annotations \
  --batch_size 4 \
  --epochs 20 \
  --save_dir ./training_results \
  --grad_accum_steps 8 \
  --grad_clip_norm 1.0
```
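Assuming `--batch_size` is the per-GPU batch size, the effective global batch size of the command above is per-GPU batch size times number of GPUs times gradient-accumulation steps:

```python
# Effective global batch size for the example command
# (2 GPUs via CUDA_VISIBLE_DEVICES=0,1, --batch_size 4, --grad_accum_steps 8).
batch_size_per_gpu = 4
num_gpus = 2
grad_accum_steps = 8
effective_batch_size = batch_size_per_gpu * num_gpus * grad_accum_steps
print(effective_batch_size)  # 64
```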

## Training Details

| Hyperparameter | Value |
|---|---|
| Base Model | LLaVA-Med v1.5 (Mistral-7B) |
| LoRA Rank | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Image Encoder | TinySAM (ViT-Tiny) |
| Optimizer | AdamW (lr=1e-4, weight_decay=1e-5) |
| Scheduler | CosineAnnealing (T_max=20, eta_min=1e-6) |
| Precision | bfloat16 (mixed precision) |
| Loss | 0.5 * StructureLoss + 0.5 * LLM Loss + ClassificationLoss |
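StructureLoss is not defined in this card. A common loss by that name in medical/polyp segmentation (e.g., PraNet) combines boundary-weighted BCE with a weighted IoU term; the sketch below follows that convention and is an assumption about this repository's implementation:

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask):
    """Boundary-weighted BCE + weighted IoU over logits `pred` and binary `mask`.

    Both tensors are (B, 1, H, W); `pred` holds raw logits.
    """
    # Weight pixels near mask boundaries more heavily.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask
    )
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction="none")
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = ((prob * mask) * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()
```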

## Supported Medical Modalities

| Modality | Dataset |
|---|---|
| Brain MRI | Brain Tumor CT Scan |
| Breast Ultrasound | Breast Ultrasound |
| Lung CT | Lung CT |
| Lung X-ray | Lung X-ray |
| Skin Lesion | ISIC Skin Cancer |
| Polyp | Polyp Endoscopy |

## Citation

```bibtex
@article{trinh2025prs,
  title   = {PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging},
  author  = {Trinh, Quoc-Huy and Nguyen, Minh-Van and Zeng, Jung and Bagci, Ulas and Jha, Debesh},
  journal = {arXiv preprint arXiv:2505.11872},
  year    = {2025}
}
```

## Acknowledgement

This work is built upon LLaVA, Segment Anything, and TinySAM.

## License

This project is licensed under the MIT License.
