# CLaSP: Contrastive Language-Structure Pre-training
CLaSP is a multimodal contrastive learning framework that bridges crystal structures and scientific text, analogous to CLIP for images and text. Given a CIF file and a text description, CLaSP maps both into a shared embedding space, enabling text-based retrieval and zero-shot classification of materials.
This repository hosts the model checkpoint from the paper (official release: Toyota/clasp):
Bridging text and crystal structures: literature-driven contrastive learning for materials science
Y. Suzuki, T. Taniai, R. Igarashi et al.
Machine Learning: Science and Technology 6, 035006 (2025)
DOI: 10.1088/2632-2153/ade58c
## Model Overview
CLaSP trains two encoders jointly with a contrastive objective:
- Structure encoder: a graph neural network operating on crystal structures (CIF files via PyTorch Geometric)
- Text encoder: a transformer-based language model operating on paper titles / keyword captions
Training is done in two stages:
- Pre-training on (crystal structure, paper title) pairs from the Crystallography Open Database (COD)
- Fine-tuning on (crystal structure, LLM-generated keyword caption) pairs
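Both stages optimize a contrastive objective over (structure, text) pairs. As a minimal sketch, the following implements one plausible scale/margin contrastive loss (additive-margin softmax over cosine similarities); the `scale` and `margin` arguments mirror the checkpoint's `loss_scale=3.0` and `margin=0.5`, but the exact formulation used by CLaSP is defined in the paper and repository:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_struct, z_text, scale=3.0, margin=0.5):
    """Symmetric contrastive loss over cosine similarities (sketch).

    Matched (structure, text) pairs sit on the diagonal of the
    similarity matrix; the margin is subtracted from those positives
    before scaling, making the task harder for the positives.
    """
    z_struct = F.normalize(z_struct, dim=-1)
    z_text = F.normalize(z_text, dim=-1)
    sim = z_struct @ z_text.t()                    # cosine similarity matrix
    labels = torch.arange(sim.size(0))             # positives on the diagonal
    logits = scale * (sim - margin * torch.eye(sim.size(0)))
    # Symmetric cross-entropy: structure-to-text and text-to-structure
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```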
## Files

| File | Description |
|---|---|
| `model_finetuned_s30_m05.ckpt` | PyTorch Lightning checkpoint fine-tuned on COD with `loss_scale=3.0`, `margin=0.5`; the same checkpoint used in the paper's experiments |
## Usage

Note: The checkpoint is in PyTorch Lightning `.ckpt` format. Native Hugging Face `from_pretrained` support is planned. For now, use the steps below.
### 1. Install

```bash
git clone https://github.com/Toyota/clasp.git
cd clasp
docker build -t clasp:v1.0 -f docker/Dockerfile .
```
### 2. Download the checkpoint

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="resnant/clasp-materials",
    filename="model_finetuned_s30_m05.ckpt",
)
```
### 3. Extract crystal embeddings

```bash
docker run --gpus 1 --rm \
    -v $(pwd):/workspace \
    -w /workspace \
    clasp:v1.0 python examples/extract_embeddings.py \
    --checkpoint_path /path/to/model_finetuned_s30_m05.ckpt \
    --cif_list /workspace/demo_data/cif_list.txt \
    --output_path /workspace/demo_data/embeddings.npz \
    --batch_size 32
```
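The extraction step writes the embeddings to an `.npz` archive. A quick way to inspect and L2-normalize them afterwards (the key name `embeddings` is an assumption; check `data.files` for the actual keys the script writes — here a dummy archive stands in for the real output):

```python
import numpy as np

# Stand-in for the extraction output; the real file comes from step 3.
# The key name "embeddings" is an assumption, not the script's documented key.
np.savez("embeddings.npz", embeddings=np.random.randn(4, 128))

data = np.load("embeddings.npz")
print(data.files)  # list the stored array names
emb = data["embeddings"]

# L2-normalize so dot products become cosine similarities
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
```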
### 4. Text-based retrieval (Python)

```python
import torch
from transformers import AutoTokenizer

from models.contrastive import ClaspModel  # from the cloned CLaSP repo

# Load the Lightning checkpoint on CPU
ckpt = torch.load(ckpt_path, map_location="cpu")
# ... (see examples/ in the GitHub repo for full loading code)
```
See `examples/embedding_visualization.ipynb` in the GitHub repository for t-SNE visualization, clustering, and similarity-search demos.
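Because both encoders map into one shared space, text-based retrieval reduces to cosine-similarity ranking. A minimal sketch with random stand-in vectors (in practice, `text_emb` and `crystal_embs` come from the CLaSP text and structure encoders):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: one text query and a database of 100 crystals.
text_emb = F.normalize(torch.randn(1, 128), dim=-1)
crystal_embs = F.normalize(torch.randn(100, 128), dim=-1)

# Cosine similarity of the query against every crystal, then top-5 ranking
scores = (text_emb @ crystal_embs.t()).squeeze(0)
topk = torch.topk(scores, k=5)
print(topk.indices.tolist())  # indices of the 5 best-matching crystals
```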
## Training Details

| Item | Value |
|---|---|
| Pre-training data | COD crystal structures + paper titles |
| Fine-tuning captions | LLM-generated keywords (Llama 3 70B Instruct) |
| Loss scale (s) | 3.0 |
| Margin (m) | 0.5 |
| Precision | bf16 mixed |
| Framework | PyTorch Lightning |
The keyword caption dataset used for fine-tuning (`keyword_captions_cod_full_20240331.zip`) is available from the GitHub release page.
## Citation

```bibtex
@article{suzuki2025clasp,
  doi = {10.1088/2632-2153/ade58c},
  year = {2025},
  month = {jul},
  volume = {6},
  number = {3},
  pages = {035006},
  author = {Suzuki, Yuta and Taniai, Tatsunori and Igarashi, Ryo and
            Saito, Kotaro and Chiba, Naoya and Ushiku, Yoshitaka and Ono, Kanta},
  title = {Bridging text and crystal structures: literature-driven
           contrastive learning for materials science},
  journal = {Machine Learning: Science and Technology},
}
```
## License

Apache License 2.0; see the original LICENSE file.
