
BioMedGPT-Mol

BioMedGPT-Mol is a multimodal molecular language model jointly released by PharMolix Inc. and the Institute of AI Industry Research (AIR), Tsinghua University. It supports both molecular understanding and generation across a wide range of tasks, including chemical name conversion, molecular captioning, property prediction, reaction modeling, molecule editing, and property optimization. Trained with a structured multi-task curriculum, BioMedGPT-Mol achieves strong performance on diverse molecule-centric discovery benchmarks. More technical details can be found in the technical report.

Get started

  • Download the model and config files.
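    The download step can be sketched with the huggingface_hub client (the repo id PharMolix/BioMedGPT-Mol is taken from this model card; the local directory name and helper function are assumptions):

    ```python
    # Hypothetical download sketch; requires `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    REPO_ID = "PharMolix/BioMedGPT-Mol"  # repo id from this model card

    def download_model(local_dir: str = "./biomedgpt_mol") -> str:
        # Fetch the model weights and config files into local_dir and
        # return the local path of the downloaded snapshot.
        return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)

    if __name__ == "__main__":
        print(download_model())
    ```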

  • Evaluation on Benchmarks

    • The test set is available in testset. If you use the dataset for evaluation, please consider citing the related papers:
      @article{yu2024llasmol,
          title={LlaSMol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset},
          author={Yu, Botao and Baker, Frazier N and Chen, Ziqi and Ning, Xia and Sun, Huan},
          journal={arXiv preprint arXiv:2402.09391},
          year={2024}
      }
      
      @article{li2024tomg,
          title={TOMG-Bench: Evaluating LLMs on text-based open molecule generation},
          author={Li, Jiatong and Li, Junxian and Liu, Yunqing and Zhou, Dongzhan and Li, Qing},
          journal={arXiv preprint arXiv:2412.14642},
          year={2024}
      }
      
      @article{dey2025mathtt,
          title={$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization},
          author={Dey, Vishal and Hu, Xiao and Ning, Xia},
          journal={arXiv preprint arXiv:2502.13398},
          year={2025}
      }
      
      @article{biomedgpt-mol,
          title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
          author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
          journal={arXiv preprint arXiv:2512.04629},
          year={2025}
      }
      
    • Update the configuration, then run inference with the provided scripts; the outputs will be saved in the logs directory:
      logs
      └── biomedgpt_mol
          ├── mumoinstruct
          │   ├── logs
          │   └── results
          ├── openmolinst
          │   ├── logs
          │   └── results
          └── smolinstruct
              ├── logs
              └── results
      
      # SMolInstruct
      bash evaluation/scripts/inference_smolinstruct.sh
      
      # OpenMolInstruct
      bash evaluation/scripts/inference_openmolinst.sh
      
      # MuMoInstruct
      bash evaluation/scripts/inference_mumoinstruct.sh
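      After a run, the expected output layout can be checked with a short sketch (the directory names mirror the tree above; the helper functions are hypothetical):

      ```python
      # Sketch: verify the expected logs/ layout after inference.
      import os

      BENCHMARKS = ("mumoinstruct", "openmolinst", "smolinstruct")

      def expected_dirs(root="logs", model="biomedgpt_mol"):
          # One logs/ and one results/ directory per benchmark.
          return [
              os.path.join(root, model, bench, sub)
              for bench in BENCHMARKS
              for sub in ("logs", "results")
          ]

      def missing_dirs(root="logs", model="biomedgpt_mol"):
          # Return the expected output directories that do not exist yet.
          return [d for d in expected_dirs(root, model) if not os.path.isdir(d)]
      ```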
      
    • Update the configuration accordingly and execute the evaluation scripts. The computed metrics will be stored as metrics.json in the results directory, e.g., /logs/biomedgpt_mol/mumoinstruct/results/metrics.json.
        # SMolInstruct
        bash evaluation/scripts/evaluate_smolinstruct.sh
      
        # OpenMolInstruct
        bash evaluation/scripts/evaluate_openmolinst.sh
      
        # MuMoInstruct
        bash evaluation/scripts/evaluate_mumoinstruct.sh
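      The resulting metrics.json can then be inspected with a few lines of Python (a sketch; a flat {metric: value} JSON object is an assumption about the file's structure):

      ```python
      # Sketch: load metrics.json produced by an evaluation script and
      # print it as aligned "name: value" lines. Assumes a flat JSON object.
      import json

      def load_metrics(path):
          with open(path) as f:
              return json.load(f)

      def format_metrics(metrics):
          # Left-pad metric names so the values line up in one column.
          width = max(len(name) for name in metrics)
          return "\n".join(
              f"{name.ljust(width)}: {value}"
              for name, value in sorted(metrics.items())
          )

      if __name__ == "__main__":
          print(format_metrics(load_metrics(
              "logs/biomedgpt_mol/mumoinstruct/results/metrics.json")))
      ```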
      
  • 🔥 Explore our OpenBioMed platform for more discovery tasks.

Cite Us

If you find our open-sourced models helpful to your research, please consider citing:

@article{biomedgpt-mol,
  title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
  author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
  journal={arXiv preprint arXiv:2512.04629},
  year={2025}
}