ChemFuse: Polymer

ChemFuse is a multimodal model for polymers that unifies different chemical identifiers (including PSMILES, BigSMILES, and systematic names) in a shared embedding space.

Overview

The ChemFuse polymer model produces joint embeddings across multiple polymer representations. This enables cross-identifier retrieval, property prediction, and downstream tasks that benefit from aligned representations.
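To illustrate what cross-identifier retrieval in a shared embedding space means, here is a minimal sketch that ranks candidate names by cosine similarity to a query embedding. The vectors, the polymer names, and the helper functions `cosine_similarity` and `retrieve` are hypothetical and are not part of the ChemFuse API; real embeddings would come from the model, and a real pipeline would likely use NumPy or PyTorch rather than plain lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, candidates, top_k=1):
    """Return the top_k candidate keys ranked by cosine similarity to the query."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy embeddings; real ones would come from the model's encoders.
psmiles_query = [0.9, 0.1, 0.0]  # stands in for the embedding of a PSMILES string
name_embeddings = {
    "polyethylene": [1.0, 0.0, 0.0],
    "polystyrene": [0.0, 1.0, 0.0],
    "poly(vinyl chloride)": [0.0, 0.0, 1.0],
}

print(retrieve(psmiles_query, name_embeddings, top_k=2))
# -> ['polyethylene', 'polystyrene']
```

Because all identifier types share one space, the same ranking step works whichever representation supplies the query and whichever supplies the candidates.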

Installation

Download and unzip the repository:

wget "https://zenodo.org/records/19241324/files/chemfuse_polymer.zip?download=1" -O chemfuse_polymer.zip
unzip chemfuse_polymer.zip

Set up the environment and install dependencies:

# Change to the chemfuse_polymer directory
cd chemfuse_polymer

# Create and activate a uv environment
uv venv chemfuse
source chemfuse/bin/activate

# Install the package in editable mode
uv pip install -e .

Then install the PyTorch Geometric dependencies for your platform:

Linux (CUDA 12.1):

uv pip install torch-scatter torch-cluster torch-sparse \
  -f https://data.pyg.org/whl/torch-2.2.0+cu121.html

macOS (CPU):

uv pip install setuptools
uv pip install torch-scatter torch-cluster torch-sparse \
  -f https://data.pyg.org/whl/torch-2.2.0+cpu.html --no-build-isolation

Pre-trained Models

ChemFuse pre-trained models for polymers and MOFs are available in the Hugging Face collection. You can also load a locally fine-tuned model from a .ckpt file.
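Before loading a local checkpoint, it can help to confirm which .ckpt file is actually on disk. The sketch below picks the most recently modified one under a directory. The helper name `find_latest_checkpoint` and the `experiments/checkpoints` search root are assumptions for illustration; it only locates the file and does not load the model.

```python
from pathlib import Path


def find_latest_checkpoint(directory):
    """Return the most recently modified .ckpt file under directory, or None."""
    checkpoints = sorted(
        Path(directory).rglob("*.ckpt"),
        key=lambda path: path.stat().st_mtime,
    )
    return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    # "experiments/checkpoints" is an assumed location; adjust it to your setup.
    latest = find_latest_checkpoint("experiments/checkpoints")
    print(latest if latest else "No .ckpt file found")
```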

Datasets

The training and testing datasets are available on Hugging Face. Make sure to place these datasets in the data/ directory before running any scripts.

Usage

Retrieval

Before running the retrieval script, ensure that a checkpoint file is located at experiments/checkpoints/chemfuse_test/.ckpt.

cd experiments
python retrieval.py experiment=metrics/chemfuse_test
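The metrics reported by retrieval.py are not listed here. As a point of reference, a common way to score retrieval is recall@k: the fraction of queries whose correct match appears among the top k results. The sketch below is a generic illustration with hypothetical data and a hypothetical helper `recall_at_k`; it is not taken from the ChemFuse code.

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of queries whose correct item appears among the first k results."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Hypothetical ranked retrieval results for three queries (best match first).
ranked_lists = [
    ["polyethylene", "polystyrene", "polypropylene"],
    ["polystyrene", "polypropylene", "polyethylene"],
    ["polyethylene", "polypropylene", "polystyrene"],
]
# The correct match for each query, in the same order.
targets = ["polyethylene", "polypropylene", "polystyrene"]

print(recall_at_k(ranked_lists, targets, k=1))  # one of three queries hits at rank 1
print(recall_at_k(ranked_lists, targets, k=3))  # every target is within the top 3
```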

Training

Make sure the training and testing data are in the data/ directory, then run:

python train.py experiment=train/chemfuse_train

Citation

If you use ChemFuse in your work, please cite:

