PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging
Paper: arXiv:2505.11872
This repository contains the pretrained weights for PRS-Med, a modular framework for position reasoning segmentation in medical imaging. PRS-Med combines a vision-language model (LLaVA-Med) with a lightweight image encoder (TinySAM) and a custom mask decoder to perform context-aware medical image segmentation.
Accepted at CVPRW 2026.
PRS-Med consists of three main components:
| Component | Description | Details |
|---|---|---|
| LLaVA-Med | Vision-language model for semantic reasoning | Mistral-7B backbone, fine-tuned with LoRA (r=16, alpha=16) |
| TinySAM | Lightweight image encoder | ViT-Tiny, extracts 256-dim features at 64x64 spatial resolution |
| Prompted Mask Decoder | Cross-attention fusion + upsampling | Fuses LLM embeddings with image features to produce 1024x1024 masks |
| Classification Head | 6-class medical modality classifier | Brain, Breast, Lung CT, Lung X-ray, Skin (ISIC), Other |
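The data flow between these components can be sketched at the shape level. This is an illustrative mock only (the function names and the 4096-dim Mistral-7B hidden size are assumptions, not the repository's API); the shapes come from the table above.

```python
import numpy as np

# Shape-level sketch; the real components are TinySAM, LLaVA-Med, and the
# prompted mask decoder from this repository.
def encode_image(image):
    # TinySAM (ViT-Tiny): 256-dim features on a 64x64 grid.
    assert image.shape == (1024, 1024, 3)
    return np.zeros((256, 64, 64))

def encode_prompt(prompt):
    # LLaVA-Med (Mistral-7B + LoRA): a pooled reasoning embedding
    # (4096 = assumed Mistral-7B hidden size).
    return np.zeros((1, 4096))

def decode_mask(image_feats, text_emb):
    # Prompted mask decoder: cross-attention fusion + upsampling.
    # Placeholder output; the real decoder attends text_emb over image_feats.
    return np.zeros((1024, 1024))

image = np.zeros((1024, 1024, 3))
feats = encode_image(image)
emb = encode_prompt("Where is the tumor located?")
mask = decode_mask(feats, emb)
print(feats.shape, emb.shape, mask.shape)
```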
The downloaded checkpoint directory is organized as follows:

```
checkpoint/
├── lora_adapter/              # LoRA adapter weights for LLaVA-Med
│   ├── adapter_config.json
│   ├── adapter_model.safetensors
│   ├── tokenizer.json
│   ├── tokenizer.model
│   ├── tokenizer_config.json
│   └── special_tokens_map.json
├── image_encoder.pth          # TinySAM image encoder weights
├── mask_decoder.pth           # Prompted mask decoder weights
└── cls.pth                    # 6-class classification head weights
```
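After downloading, a quick sanity check that every expected file landed under `./checkpoint` can save a confusing load error later. This small helper is not part of the repository; the file list mirrors the tree above.

```python
import os

# Expected files, relative to the checkpoint root (see the tree above).
EXPECTED = [
    "lora_adapter/adapter_config.json",
    "lora_adapter/adapter_model.safetensors",
    "lora_adapter/tokenizer.json",
    "lora_adapter/tokenizer.model",
    "lora_adapter/tokenizer_config.json",
    "lora_adapter/special_tokens_map.json",
    "image_encoder.pth",
    "mask_decoder.pth",
    "cls.pth",
]

def missing_checkpoint_files(root):
    """Return the expected checkpoint files that are absent under root."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(root, p))]

missing = missing_checkpoint_files("./checkpoint")
print("checkpoint complete" if not missing else f"missing: {missing}")
```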
Clone the repository and install the dependencies:

```shell
git clone https://github.com/huyquoctrinh/PRS-Med.git
cd PRS-Med
pip install -r requirements.txt
```
Then download the pretrained weights from the Hugging Face Hub:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="huyquoctrinh/PRS-Med",
    local_dir="./checkpoint"
)
```
Or using the CLI:
```shell
huggingface-cli download huyquoctrinh/PRS-Med --local-dir ./checkpoint
```
PRS-Med requires the LLaVA-Med base model:
```shell
huggingface-cli download microsoft/llava-med-v1.5-mistral-7b --local-dir ./weight/llava-med-v1.5-mistral-7b
```
To run inference from Python, build the model and load the checkpoint:

```python
from segment_model.model import build_llm_seg

# Build the model
model, tokenizer, image_processor, config = build_llm_seg(
    model_path="./weight/llava-med-v1.5-mistral-7b",
    device="cuda"
)

# Load the checkpoint
model.load_model("./checkpoint")
model = model.to("cuda")
```
Or run the inference script:
```shell
python infer_full.py \
    --model_path ./weight/llava-med-v1.5-mistral-7b \
    --checkpoint_path ./checkpoint \
    --device cuda
```
To train with distributed data parallelism (here across two GPUs):

```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train_ddp.py \
    --model_path ./weight/llava-med-v1.5-mistral-7b \
    --data_path /path/to/dataset/data \
    --annotation_path /path/to/dataset/annotations \
    --batch_size 4 \
    --epochs 20 \
    --save_dir ./training_results \
    --grad_accum_steps 8 \
    --grad_clip_norm 1.0
```
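Note that with gradient accumulation the effective global batch size is larger than `--batch_size` suggests: each optimizer update sees the per-GPU batch multiplied by the number of GPUs and the accumulation steps (simple arithmetic, assuming the two-GPU command above).

```python
per_gpu_batch = 4      # --batch_size
num_gpus = 2           # --nproc_per_node
grad_accum_steps = 8   # --grad_accum_steps

# The optimizer steps once per accumulation cycle, so each update
# aggregates per_gpu_batch * num_gpus * grad_accum_steps samples.
effective_batch = per_gpu_batch * num_gpus * grad_accum_steps
print(effective_batch)  # → 64
```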
Training hyperparameters:

| Hyperparameter | Value |
|---|---|
| Base Model | LLaVA-Med v1.5 (Mistral-7B) |
| LoRA Rank | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Image Encoder | TinySAM (ViT-Tiny) |
| Optimizer | AdamW (lr=1e-4, weight_decay=1e-5) |
| Scheduler | CosineAnnealing (T_max=20, eta_min=1e-6) |
| Precision | bfloat16 (mixed precision) |
| Loss | 0.5 * StructureLoss + 0.5 * LLM Loss + ClassificationLoss |
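The combined objective in the last row can be sketched numerically. This is a simplified stand-in, assuming the segmentation term behaves like BCE plus a soft-IoU penalty; the per-pixel weighting of the actual StructureLoss and the exact LLM/classification terms are omitted, and the helper names are illustrative.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Per-pixel binary cross-entropy, averaged over the mask.
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def soft_iou(pred, target, eps=1e-7):
    # 1 - IoU computed on soft (probability) masks.
    inter = (pred * target).sum()
    union = (pred + target - pred * target).sum()
    return float(1 - (inter + eps) / (union + eps))

def total_loss(pred_mask, gt_mask, llm_loss, cls_loss):
    # 0.5 * StructureLoss + 0.5 * LLM loss + classification loss (table above);
    # structure term simplified to unweighted BCE + soft IoU.
    structure = bce(pred_mask, gt_mask) + soft_iou(pred_mask, gt_mask)
    return 0.5 * structure + 0.5 * llm_loss + cls_loss

pred = np.full((64, 64), 0.9)   # soft predicted mask
gt = np.ones((64, 64))          # ground-truth mask
print(total_loss(pred, gt, llm_loss=1.2, cls_loss=0.3))
```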
Training datasets by modality:

| Modality | Dataset |
|---|---|
| Brain MRI | Brain Tumor CT Scan |
| Breast Ultrasound | Breast Ultrasound |
| Lung CT | Lung CT |
| Lung X-ray | Lung X-ray |
| Skin Lesion | ISIC Skin Cancer |
| Polyp | Polyp Endoscopy |
If you use PRS-Med, please cite:

```bibtex
@article{trinh2025prs,
  title   = {PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging},
  author  = {Trinh, Quoc-Huy and Nguyen, Minh-Van and Zeng, Jung and Bagci, Ulas and Jha, Debesh},
  journal = {arXiv preprint arXiv:2505.11872},
  year    = {2025}
}
```
This work is built upon LLaVA, Segment Anything, and TinySAM.
This project is licensed under the MIT License.