---
license: apache-2.0
---

# UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

We introduce **UniBiomed**, the first universal foundation model for grounded biomedical image interpretation. It generates diagnostic findings and simultaneously segments the corresponding biomedical targets. UniBiomed integrates a Multi-modal Large Language Model (MLLM) with the Segment Anything Model (SAM), which unifies diverse biomedical tasks under a single universal training scheme for grounded interpretation.

GitHub link: https://github.com/Luffy03/UniBiomed

We will keep updating this repo with more powerful versions of the model.

## Usage

```python
import argparse

import torch
from transformers import AutoModel, AutoTokenizer


def parse_args():
    parser = argparse.ArgumentParser(description='UniBiomed')
    parser.add_argument('--model_path', default='Luffy503/UniBiomed')
    # parse the command line; the parsed namespace is what gets returned
    args = parser.parse_args()
    return args


args = parse_args()

# load model
model = AutoModel.from_pretrained(
    args.model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
).eval().cuda()

tokenizer = AutoTokenizer.from_pretrained(
    args.model_path,
    trust_remote_code=True,
)

# define data input: image and text instruction
# (placeholders; replace None with your own image and instruction)
data_dict = {}
image, text = None, None
data_dict['image'] = image
data_dict['text'] = text

# output
pred_dict = model.predict_forward(**data_dict, tokenizer=tokenizer)
# text description
prediction = pred_dict['prediction']
# segmentation mask
mask = pred_dict['prediction_masks'][0][0]
```

## Citation

If you find this repo useful for your research, please consider citing the paper as follows:

```bibtex
@article{wu2025unibiomed,
  title={UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation},
  author={Wu, Linshan and Nie, Yuxiang and He, Sunan and Zhuang, Jiaxin and Chen, Hao},
  journal={arXiv preprint arXiv:2504.21336},
  year={2025}
}
```