# CLaSP: Contrastive Language-Structure Pre-training
CLaSP is a multimodal contrastive learning framework that bridges crystal structures and scientific text, analogous to CLIP for images and text. Given a CIF file and a text description, CLaSP maps both into a shared embedding space, enabling text-based retrieval and zero-shot classification of materials.
This repository hosts the model checkpoint from the paper (official release: Toyota/clasp):
Bridging text and crystal structures: literature-driven contrastive learning for materials science
Y. Suzuki, T. Taniai, R. Igarashi et al.
Machine Learning: Science and Technology 6, 035006 (2025)
DOI: 10.1088/2632-2153/ade58c
## Model Overview
CLaSP trains two encoders jointly with a contrastive objective:
- Structure encoder: a graph neural network operating on crystal structures (CIF files via PyTorch Geometric)
- Text encoder: a transformer-based language model operating on paper titles / keyword captions
Training is done in two stages:
- Pre-training on (crystal structure, paper title) pairs from the Crystallography Open Database (COD)
- Fine-tuning on (crystal structure, LLM-generated keyword caption) pairs
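Both stages optimize a contrastive objective over (structure, text) pairs. As a minimal sketch, the following implements one plausible scale/margin contrastive loss (additive-margin softmax over cosine similarities); the `scale` and `margin` arguments mirror the checkpoint's `loss_scale=3.0` and `margin=0.5`, but the exact formulation used by CLaSP is defined in the paper and repository:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_struct, z_text, scale=3.0, margin=0.5):
    """Symmetric contrastive loss over cosine similarities (sketch).

    Matched (structure, text) pairs sit on the diagonal of the
    similarity matrix; the margin is subtracted from those positives
    before scaling, making the task harder for the positives.
    """
    z_struct = F.normalize(z_struct, dim=-1)
    z_text = F.normalize(z_text, dim=-1)
    sim = z_struct @ z_text.t()                    # cosine similarity matrix
    labels = torch.arange(sim.size(0))             # positives on the diagonal
    logits = scale * (sim - margin * torch.eye(sim.size(0)))
    # Symmetric cross-entropy: structure-to-text and text-to-structure
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```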
## Files

| File | Description |
|---|---|
| `model_finetuned_s30_m05.ckpt` | PyTorch Lightning checkpoint fine-tuned on COD with `loss_scale=3.0`, `margin=0.5`; the same checkpoint used in the paper's experiments |
## Usage

Note: The checkpoint is in PyTorch Lightning `.ckpt` format. Native Hugging Face `from_pretrained` support is planned. For now, use the steps below.
### 1. Install

```bash
git clone https://github.com/Toyota/clasp.git
cd clasp
docker build -t clasp:v1.0 -f docker/Dockerfile .
```
### 2. Download the checkpoint

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="resnant/clasp-materials",
    filename="model_finetuned_s30_m05.ckpt",
)
```
### 3. Extract crystal embeddings

```bash
docker run --gpus 1 --rm \
    -v $(pwd):/workspace \
    -w /workspace \
    clasp:v1.0 python examples/extract_embeddings.py \
    --checkpoint_path /path/to/model_finetuned_s30_m05.ckpt \
    --cif_list /workspace/demo_data/cif_list.txt \
    --output_path /workspace/demo_data/embeddings.npz \
    --batch_size 32
```
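The extraction step writes the embeddings to an `.npz` archive. A quick way to inspect and L2-normalize them afterwards (the key name `embeddings` is an assumption; check `data.files` for the actual keys the script writes — here a dummy archive stands in for the real output):

```python
import numpy as np

# Stand-in for the extraction output; the real file comes from step 3.
# The key name "embeddings" is an assumption, not the script's documented key.
np.savez("embeddings.npz", embeddings=np.random.randn(4, 128))

data = np.load("embeddings.npz")
print(data.files)  # list the stored array names
emb = data["embeddings"]

# L2-normalize so dot products become cosine similarities
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
```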
### 4. Text-based retrieval (Python)

```python
import torch
from transformers import AutoTokenizer

from models.contrastive import ClaspModel  # from the cloned CLaSP repo

# Load the Lightning checkpoint on CPU
ckpt = torch.load(ckpt_path, map_location="cpu")
# ... (see examples/ in the GitHub repo for full loading code)
```
See `examples/embedding_visualization.ipynb` in the GitHub repository for t-SNE visualization, clustering, and similarity-search demos.
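Because both encoders map into one shared space, text-based retrieval reduces to cosine-similarity ranking. A minimal sketch with random stand-in vectors (in practice, `text_emb` and `crystal_embs` come from the CLaSP text and structure encoders):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: one text query and a database of 100 crystals.
text_emb = F.normalize(torch.randn(1, 128), dim=-1)
crystal_embs = F.normalize(torch.randn(100, 128), dim=-1)

# Cosine similarity of the query against every crystal, then top-5 ranking
scores = (text_emb @ crystal_embs.t()).squeeze(0)
topk = torch.topk(scores, k=5)
print(topk.indices.tolist())  # indices of the 5 best-matching crystals
```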
## Training Details

| Item | Value |
|---|---|
| Pre-training data | COD crystal structures + paper titles |
| Fine-tuning captions | LLM-generated keywords (Llama 3 70B Instruct) |
| Loss scale (s) | 3.0 |
| Margin (m) | 0.5 |
| Precision | bf16 mixed |
| Framework | PyTorch Lightning |
The keyword caption dataset used for fine-tuning (`keyword_captions_cod_full_20240331.zip`) is available from the GitHub release page.
## Citation

```bibtex
@article{suzuki2025clasp,
  doi = {10.1088/2632-2153/ade58c},
  year = {2025},
  month = {jul},
  volume = {6},
  number = {3},
  pages = {035006},
  author = {Suzuki, Yuta and Taniai, Tatsunori and Igarashi, Ryo and
            Saito, Kotaro and Chiba, Naoya and Ushiku, Yoshitaka and Ono, Kanta},
  title = {Bridging text and crystal structures: literature-driven
           contrastive learning for materials science},
  journal = {Machine Learning: Science and Technology},
}
```
## License

Apache License 2.0; see the original LICENSE file.
