---
license: mit
library_name: generic
tags:
- chemistry
- molecular-dynamics
- machine-learning-force-fields
- equivariant-transformer
- foundation-model
- bio-systems
- e2former
datasets:
- Omol25
- UBio-Mol26
metrics:
- mae
---
# UBio-MolFM-V1: Universal Bio-Molecular Foundation Model
UBio-MolFM is a foundation model suite for molecular modeling, designed specifically for bio-systems. This model, UBio-MolFM-V1 (Stage 3), is built on the E2Former-V2 linear-scaling equivariant transformer architecture. Refer to the technical report for more details: UBio-MolFM (arXiv:2602.17709).
## Model Details
- **Model Type:** Equivariant Transformer (E2Former-V2)
- **Training Stage:** Stage 3 (final stage of curriculum learning)
- **Parameters:** Included in `molfm-v1-stage-3.pt`
- **Architecture:** Linear-scaling equivariant attention with linear activation memory
- **Related Papers:**
  - UBio-MolFM: arXiv:2602.17709
  - E2Former-V2: arXiv:2601.16622
- **Capabilities:**
  - Predicts single-point energy and atomic forces.
  - Supports large-scale simulations (up to 1,500 atoms with high fidelity, and up to 100,000 atoms on a single GPU).
  - Optimized for bio-specific molecular systems.
## Files

- `molfm-v1-stage-3.pt`: Pretrained model checkpoint.
- `config.yaml`: Model and inference configuration.
## Usage

To use this model, you need to install the `molfm` codebase. Please refer to the official repository for installation instructions.
### Single-Point Energy and Force Prediction
```python
from ase.build import molecule

from molfm.interface.ase.calculator.e2former_calculator import E2FormerCalculator

# 1. Set up the atoms
atoms = molecule("H2O")
atoms.set_cell([10, 10, 10])
atoms.pbc = [True, True, True]

# 2. Load the model using the provided checkpoint and config
calc = E2FormerCalculator(
    checkpoint_path="path/to/molfm-v1-stage-3.pt",
    config_name="path/to/config.yaml",  # or a local config name if in the search path
    head_name="omol25",
    device="cuda",
    use_tf32=True,
    use_compile=True,
)

# 3. Perform the calculation
atoms.calc = calc
energy = atoms.get_potential_energy()
forces = atoms.get_forces()

print(f"Energy: {energy} eV")
print(f"Forces:\n{forces}")
```
### Molecular Dynamics with ASE
```python
from ase import units
from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution

# Initialize velocities
MaxwellBoltzmannDistribution(atoms, temperature_K=300)

# Set up the Langevin integrator
dyn = Langevin(atoms, 1 * units.fs, temperature_K=300, friction=0.01)

# Run MD
dyn.run(100)
```
## Performance Notes

- **TensorFloat-32:** Set `use_tf32=True` to enable TF32 on supported NVIDIA GPUs for higher throughput.
- **Torch Compile:** Set `use_compile=True` to enable `torch.compile` for faster execution.
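If you drive the model through PyTorch directly rather than via the calculator flags, TF32 can also be toggled globally. This is a generic PyTorch sketch, not a `molfm` API; the assumption is that `use_tf32=True` does something equivalent internally.

```python
import torch

# Allow TF32 for matmul and cuDNN kernels on Ampere-or-newer GPUs.
# TF32 trades a few mantissa bits of precision for much higher throughput.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

print(torch.backends.cuda.matmul.allow_tf32)  # True
```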
## Training Data
The model was trained using a three-stage curriculum learning strategy on a combination of datasets:
- **UBio-Mol26:** A 17M-sample bio-specific molecular dataset. We have released a high-precision subset, UBio-Protein26 (5 million protein DFT data points).
- **OMol25:** A large-scale molecular dataset.
## Citation
If you use UBio-MolFM-V1 in your research, please cite:
```bibtex
@misc{huang2026ubiomolfm,
  title={UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems},
  author={Lin Huang and Arthur Jiang and XiaoLi Liu and Zion Wang and Jason Zhao and Chu Wang and HaoCheng Lu and ChengXiang Huang and JiaJun Cheng and YiYue Du and Jia Zhang},
  year={2026},
  eprint={2602.17709},
  url={https://arxiv.org/abs/2602.17709},
  archivePrefix={arXiv},
  primaryClass={physics.chem-ph}
}

@misc{huang2026e2formerv2,
  title={E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory},
  author={Lin Huang and Chengxiang Huang and Ziang Wang and Yiyue Du and Chu Wang and Haocheng Lu and Yunyang Li and Xiaoli Liu and Arthur Jiang and Jia Zhang},
  year={2026},
  eprint={2601.16622},
  url={https://arxiv.org/abs/2601.16622},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
## License
This model and the associated code are released under the MIT License.