Model Card for CrystaLLM-pi_Mattergen-XRD
Model Details
Model Description
CrystaLLM-pi_Mattergen-XRD is a conditional generative model designed for the recovery of crystal structures from X-ray Diffraction (XRD) data. It is a fine-tuned version of the CrystaLLM-pi framework, based on a GPT-2 decoder-only architecture. This variant employs the Residual Attention (Slider) mechanism to condition the generation of Crystallographic Information Files (CIFs) on high-dimensional experimental data.
The model generates crystal structures based on an XRD pattern input vector, consisting of the 20 most intense peaks:
- Peak Positions ($2\theta$)
- Peak Intensities
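The input vector above can be assembled along the following lines. This is a minimal sketch: the exact peak ordering, intensity normalization, and padding layout used by the released tokenizer may differ (the -100 padding value for missing peaks is the one stated under Limitations below).

```python
import numpy as np

def build_xrd_condition(two_theta, intensities, n_peaks=20, pad_value=-100.0):
    """Select the n_peaks most intense reflections and pack them into a
    fixed-length conditioning vector [positions..., intensities...].
    Missing entries are padded with pad_value. (Illustrative convention;
    the released tokenizer may order and normalize differently.)"""
    two_theta = np.asarray(two_theta, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # keep the strongest peaks, then sort them by 2-theta for a stable layout
    order = np.argsort(intensities)[::-1][:n_peaks]
    order = order[np.argsort(two_theta[order])]
    pos = two_theta[order]
    inten = intensities[order] / intensities[order].max()  # normalize to [0, 1]
    vec = np.full(2 * n_peaks, pad_value)
    vec[:len(pos)] = pos
    vec[n_peaks:n_peaks + len(inten)] = inten
    return vec

# example: a sparse pattern with only 3 observed peaks
vec = build_xrd_condition([28.4, 47.3, 56.1], [100.0, 55.0, 30.0])
print(vec.shape)  # (40,)
```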
- Developed by: Bone et al. (University College London)
- Model type: Autoregressive Transformer with Residual Attention Conditioning
- Language(s): CIF (Crystallographic Information File) syntax
- License: MIT
- Finetuned from model: c-bone/CrystaLLM-pi_base
Model Sources
- Repository: GitHub: CrystaLLM-pi
- Paper: Discovery and recovery of crystalline materials with property-conditioned transformers (arXiv:2511.21299)
- Dataset: HuggingFace: c-bone/mattergen_XRD
Uses
Direct Use
The model is intended for structure solution and recovery from powder XRD data. Researchers can input a list of peak positions and intensities derived from experimental diffraction patterns to generate candidate crystal structures that match the experimental signature.
Out-of-Scope Use
- Disordered Systems: The model was trained on the alex-mp-20 dataset and theoretical XRDs. It does not natively handle partial occupancies or disorder.
- Large Unit Cells: Context window limits apply (~20 atoms/cell).
- Organics/MOFs: The training data contains only ordered inorganic crystals, so organic molecular crystals and metal-organic frameworks are out of scope.
Bias, Risks, and Limitations
- Missing Data: The "Slider" mechanism is designed to handle missing peaks (padded with -100), but significant data loss will degrade recovery rates.
- Polymorphs: In cases of strong structural similarity or ambiguous diffraction patterns, the model may be biased towards the polymorph most represented in the training distribution.
How to Get Started with the Model
For instructions on how to load and run generation with this model, please refer to the _load_and_generate.py script in the CrystaLLM-pi GitHub Repository. This script handles the necessary tokenization and normalization of XRD vectors.
Training Details
Training Data
The model underwent a single-stage fine-tuning:
- MatterGen XRD: Theoretical XRD patterns generated from the MatterGen (alex-mp-20) dataset.
Training Procedure
- Architecture: GPT-2 with Residual Attention (Slider) layers. (~47.7M parameters)
- Mechanism: The Slider mechanism computes a parallel attention score for the conditioning vector and dynamically weights it against the base self-attention. This allows for "softer" conditioning and robust handling of heterogeneous or missing data points in the diffraction pattern.
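The weighting described above can be sketched schematically as follows. This is a single-head numpy illustration of gated residual attention over a conditioning vector, not the released implementation; the gate parameterization, head layout, and projection names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slider_attention(x, cond, Wq, Wk, Wv, Wck, Wcv, gate):
    """Schematic single-head residual-attention step (hypothetical layout).

    x:    (T, d)  token hidden states
    cond: (m, dc) conditioning features (e.g. XRD peak positions/intensities)
    gate: scalar in [0, 1] mixing conditioning attention into self-attention
    """
    q = x @ Wq
    # base causal self-attention over the token sequence
    scores = (q @ (x @ Wk).T) / np.sqrt(q.shape[-1])
    scores += np.triu(np.full(scores.shape, -np.inf), k=1)  # causal mask
    self_out = softmax(scores) @ (x @ Wv)
    # parallel attention from tokens to the conditioning vector
    c_scores = (q @ (cond @ Wck).T) / np.sqrt(q.shape[-1])
    cond_out = softmax(c_scores) @ (cond @ Wcv)
    # dynamic residual mix: "softer" conditioning than hard prefixing
    return self_out + gate * cond_out

rng = np.random.default_rng(0)
T, d, m, dc = 4, 8, 20, 2
out = slider_attention(
    rng.normal(size=(T, d)), rng.normal(size=(m, dc)),
    *(rng.normal(size=s) for s in [(d, d), (d, d), (d, d), (dc, d), (dc, d)]),
    gate=0.5,
)
print(out.shape)  # (4, 8)
```

In a sketch like this, setting the gate near zero recovers plain self-attention, which is one way to picture how padded or missing conditioning entries can degrade gracefully rather than corrupt generation.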
Evaluation
Metrics
The model is evaluated based on:
- Match Rate: The percentage of ground truth structures successfully recovered (within structural similarity tolerances).
- RMS-d: Root Mean Square distance between the ground truth and generated structures.
- Lattice Parameter MAE: Mean Absolute Error of the predicted unit cell dimensions.
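The aggregate metrics above can be computed along these lines, assuming per-structure match flags, RMS displacements (as produced by, e.g., a structure-matching step), and lattice parameters are already available; the variable names here are hypothetical.

```python
import numpy as np

# hypothetical per-structure results from a structure-matching step
matched = np.array([True, True, False, True])     # recovered within tolerance?
rmsd    = np.array([0.05, 0.12, np.nan, 0.08])    # RMS-d for matched pairs
a_true  = np.array([4.05, 5.43, 3.61, 6.10])      # ground-truth lattice a (Angstrom)
a_pred  = np.array([4.02, 5.50, 3.80, 6.05])      # predicted lattice a (Angstrom)

match_rate = matched.mean()                # fraction of structures recovered
mean_rmsd = np.nanmean(rmsd[matched])      # averaged over matched pairs only
lattice_mae = np.abs(a_true - a_pred).mean()  # MAE of one cell dimension

print(f"match rate: {match_rate:.2f}")  # 0.75
```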
Citation
```bibtex
@misc{bone2025discoveryrecoverycrystallinematerials,
  title={Discovery and recovery of crystalline materials with property-conditioned transformers},
  author={Cyprien Bone and Matthew Walker and Kuangdai Leng and Luis M. Antunes and Ricardo Grau-Crespo and Amil Aligayev and Javier Dominguez and Keith T. Butler},
  year={2025},
  eprint={2511.21299},
  archivePrefix={arXiv},
  primaryClass={cond-mat.mtrl-sci},
  url={https://arxiv.org/abs/2511.21299},
}
```