	---
	language:
	- en
	license: mit
	library_name: transformers
	tags:
	- materials-science
	- crystallography
	- generative-ai
	- inverse-design
	- chemistry
	- xrd
	datasets:
	- c-bone/chili100k_strat
	pipeline_tag: text-generation
	---

	# Model Card for CrystaLLM-pi\_Chili100K-XRD

	## Model Details

	### Model Description

	CrystaLLM-pi\_Chili100K-XRD is a conditional generative model designed for the recovery of crystal structures from X-ray Diffraction (XRD) data. It is a fine-tuned version of the `CrystaLLM-pi` framework, utilizing a GPT-2 decoder-only architecture. This model employs a Residual Attention (Slider) mechanism to condition the generation of Crystallographic Information Files (CIFs) on heterogeneous X-ray diffraction data.

	The model generates crystal structures based on an XRD pattern input vector consisting of the 20 most intense peaks:

	1. Peak Positions ($2\theta$)
	2. Peak Intensities
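
As a rough illustration of the input described above, the sketch below keeps the 20 most intense peaks from a peak list. The function name `top_peaks`, the descending-intensity ordering, and the rescaling of intensities to a maximum of 1.0 are assumptions made for this example; the actual preprocessing lives in the repository's `_load_and_generate.py` script and may differ.

```python
from typing import List, Tuple

N_PEAKS = 20  # the model card states that the 20 most intense peaks are used


def top_peaks(two_theta: List[float], intensity: List[float],
              n_peaks: int = N_PEAKS) -> Tuple[List[float], List[float]]:
    """Keep the n_peaks most intense peaks, ordered by descending intensity.

    Intensities are rescaled so the strongest kept peak is 1.0. Both the
    ordering and the rescaling are assumed conventions for this sketch.
    """
    if len(two_theta) != len(intensity):
        raise ValueError("two_theta and intensity must have the same length")
    if not intensity:
        return [], []
    ranked = sorted(zip(two_theta, intensity), key=lambda p: p[1], reverse=True)
    kept = ranked[:n_peaks]
    i_max = kept[0][1]
    if i_max <= 0:
        raise ValueError("the strongest peak must have positive intensity")
    positions = [pos for pos, _ in kept]
    rel_intensity = [inten / i_max for _, inten in kept]
    return positions, rel_intensity


positions, rel_intensity = top_peaks(
    [47.3, 28.4, 69.1, 56.1], [55.0, 100.0, 6.0, 30.0]
)
print(positions)
print(rel_intensity)
```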

	The Chili-100K XRD dataset used for fine-tuning contains experimentally determined structures sourced from Chili-100K, a curated and filtered subset of inorganic nanomaterial structures from the Crystallography Open Database (COD). Notably, this model features an extended context window of 1536 tokens, enabling the generation of larger and more complex unit cells containing up to \~100 atoms.

	- Developed by: Bone et al. (University College London)
	- Model type: Autoregressive Transformer with Residual Attention Conditioning
	- Language(s): CIF (Crystallographic Information File) syntax
	- License: MIT
	- Finetuned from model: `c-bone/CrystaLLM-pi_Mattergen-XRD`

	### Model Sources

	- Repository: [GitHub: CrystaLLM-pi](https://github.com/C-Bone-UCL/CrystaLLM-pi)
	- Paper: [Discovery and recovery of crystalline materials with property-conditioned transformers (arXiv:2511.21299)](https://arxiv.org/abs/2511.21299)
	- Dataset: [HuggingFace: c-bone/mattergen\_XRD](https://huggingface.co/datasets/c-bone/mattergen_XRD) (Stage 1), [HuggingFace: c-bone/chili100k\_strat](https://huggingface.co/datasets/c-bone/chili100k_strat) (Stage 2)

	## Uses

	### Direct Use

	The model is intended for structure solution and recovery from powder XRD data. Researchers can input a list of peak positions and intensities derived from experimental diffraction patterns to generate candidate crystal structures that match the experimental signature.
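
A peak list can be derived from a measured pattern in many ways. The sketch below is a deliberately simple one, assuming a pattern sampled as parallel lists of $2\theta$ values and counts: it subtracts the minimum count as a crude flat background and keeps local maxima above a relative height threshold. The function name `pick_peaks` and the threshold `min_rel_height` are hypothetical; real data usually needs proper background modelling and peak fitting.

```python
from typing import List, Tuple


def pick_peaks(two_theta: List[float], counts: List[float],
               min_rel_height: float = 0.02) -> List[Tuple[float, float]]:
    """Return (2theta, background-subtracted intensity) for local maxima.

    A point is a peak if it is higher than its left neighbour, not lower
    than its right neighbour, and at least min_rel_height of the full
    intensity range above the flat background (the minimum count).
    """
    if len(two_theta) != len(counts):
        raise ValueError("two_theta and counts must have the same length")
    if len(counts) < 3:
        return []
    floor = min(counts)
    span = max(counts) - floor
    if span <= 0:
        return []  # flat pattern, nothing to pick
    peaks = []
    for i in range(1, len(counts) - 1):
        is_local_max = counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]
        height = counts[i] - floor
        if is_local_max and height / span >= min_rel_height:
            peaks.append((two_theta[i], height))
    return peaks


angles = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0]
pattern = [5.0, 5.0, 50.0, 5.0, 5.0, 20.0, 5.0]
print(pick_peaks(angles, pattern))
```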

	### Out-of-Scope Use

	- Disordered Systems: The model does not natively handle partial occupancies or significant disorder.
	- Organic/MOFs: The training data was strictly filtered for inorganic nanomaterials, following the Chili-100K dataset methodology.
	- Extremely Large Unit Cells: Although the context window is expanded to 1536 tokens, structures with many atoms per unit cell may suffer degraded generation quality.

	## Bias, Risks, and Limitations

	- Experimental Noise: Performance depends on the quality of the input peak extraction and on the rarity of the material.
	- Missing Data: The "Slider" mechanism handles missing peaks (padded with -100), but significant data loss degrades recovery rates.
	- Polymorphs: In cases of strong structural similarity, the model may bias towards the polymorph most prevalent in the Chili-100K distribution.
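
To make the padding convention above concrete, the sketch below pads a short peak list to a fixed length with -100 and derives a boolean mask of observed entries. Only the pad value and the count of 20 peaks come from this card; the helper names `pad_peaks` and `observed_mask`, and the choice to pad positions and intensities as separate lists, are assumptions for illustration.

```python
from typing import List, Sequence

PAD_VALUE = -100.0  # padding value for missing peaks, as stated in the card
N_PEAKS = 20        # number of peaks in the conditioning vector


def pad_peaks(values: Sequence[float], n_peaks: int = N_PEAKS,
              pad_value: float = PAD_VALUE) -> List[float]:
    """Right-pad a list of peak values to n_peaks entries."""
    if len(values) > n_peaks:
        raise ValueError(f"expected at most {n_peaks} peaks, got {len(values)}")
    return list(values) + [pad_value] * (n_peaks - len(values))


def observed_mask(padded: Sequence[float],
                  pad_value: float = PAD_VALUE) -> List[bool]:
    """True where an entry holds a real peak, False where it is padding."""
    return [v != pad_value for v in padded]


padded_positions = pad_peaks([28.4, 47.3, 56.1])
mask = observed_mask(padded_positions)
print(len(padded_positions), sum(mask))
```

The fraction of observed entries, `sum(mask) / len(mask)`, gives a quick indication of how much of the pattern is missing before generation is attempted.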

	## How to Get Started with the Model

	For instructions on loading and running generation, refer to the `_load_and_generate.py` script in the [CrystaLLM-pi GitHub Repository](https://github.com/C-Bone-UCL/CrystaLLM-pi). This script handles XRD vector tokenization and normalization.

	## Training Details

	### Training Data

	The model underwent a two-stage fine-tuning process:

	1. MatterGen XRD: Theoretical XRD patterns generated from the MatterGen dataset.
	2. Chili-100K XRD (`c-bone/chili100k_strat`): An experimentally determined, curated, and filtered subset of inorganic nanomaterials from the COD (accessed April 2026). After deduplication, this comprises \~14K materials derived from \~21K CIFs.

	Dataset Splitting (Chili-100K):

	- Train:Val:Test Ratio: 78.6:10.7:10.7
	- Leakage-Aware Test Set: The test set was strictly stratified to evaluate generalization:
	  - 500 materials: Fully seen during training (LeMaterial, MatterGen XRD, or Chili-100K train/val).
	  - 500 materials: Reduced formula seen during training, but the specific structure was unseen (measured via Structure Novelty metric).
	  - 500 materials: Neither reduced formula nor structure seen in any training phase.
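
The three test strata can be thought of as a simple decision rule, sketched below. This is only an illustration: the card says structure novelty is measured with a Structure Novelty metric, whereas the sketch substitutes a plain identifier lookup (`structure_id`, a hypothetical stand-in). The bucket names are also invented for this example.

```python
from functools import reduce
from math import gcd
from typing import Dict, Set, Tuple

Formula = Tuple[Tuple[str, int], ...]


def reduced_formula(composition: Dict[str, int]) -> Formula:
    """Divide element counts by their greatest common divisor."""
    if not composition:
        raise ValueError("composition must contain at least one element")
    divisor = reduce(gcd, composition.values())
    return tuple(sorted((el, n // divisor) for el, n in composition.items()))


def leakage_bucket(composition: Dict[str, int], structure_id: str,
                   train_formulas: Set[Formula],
                   train_structures: Set[str]) -> str:
    """Assign a test material to one of the three strata."""
    if structure_id in train_structures:
        return "seen"
    if reduced_formula(composition) in train_formulas:
        return "formula_seen_structure_novel"
    return "fully_novel"


train_formulas = {reduced_formula({"Ti": 1, "O": 2})}
train_structures = {"rutile-TiO2"}  # hypothetical structure identifiers

print(leakage_bucket({"Ti": 2, "O": 4}, "rutile-TiO2", train_formulas, train_structures))
print(leakage_bucket({"Ti": 4, "O": 8}, "anatase-TiO2", train_formulas, train_structures))
print(leakage_bucket({"Zn": 1, "O": 1}, "wurtzite-ZnO", train_formulas, train_structures))
```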

	### Training Procedure

	- Architecture: GPT-2 with Residual Attention (Slider) layers (\~47.7M parameters).
	- Mechanism: The Slider mechanism computes a parallel attention score for the conditioning vector, dynamically weighting it against base self-attention to robustly handle heterogeneous/missing diffraction data.
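
The toy sketch below shows one possible reading of "a parallel attention score weighted against base self-attention": scores for the conditioning entries are computed alongside the token scores and normalized in a single softmax, with missing entries masked out. This is not the CrystaLLM-pi implementation. The joint-softmax formulation, the single-query setup, and all names here are assumptions chosen to keep the example small; see the repository and paper for the actual Slider layers.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def conditioned_attention(query: Vector,
                          token_keys: Sequence[Vector],
                          token_values: Sequence[Vector],
                          cond_keys: Sequence[Vector],
                          cond_values: Sequence[Vector],
                          cond_observed: Sequence[bool]) -> Tuple[List[float], float]:
    """Attend over tokens and conditioning entries in one softmax.

    Returns the mixed output vector and the share of attention mass that
    went to the conditioning entries (0.0 when all of them are missing).
    """
    if not token_keys:
        raise ValueError("at least one token key is required")
    scale = 1.0 / math.sqrt(len(query))
    base_scores = [dot(query, k) * scale for k in token_keys]
    cond_scores = [dot(query, k) * scale if seen else -math.inf
                   for k, seen in zip(cond_keys, cond_observed)]
    weights = softmax(base_scores + cond_scores)
    values = list(token_values) + list(cond_values)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    cond_share = sum(weights[len(base_scores):])
    return output, cond_share


query = [1.0, 0.0]
out_missing, share_missing = conditioned_attention(
    query, [[1.0, 0.0]], [[2.0, 3.0]], [[5.0, 0.0]], [[10.0, 10.0]], [False])
out_seen, share_seen = conditioned_attention(
    query, [[1.0, 0.0]], [[2.0, 3.0]], [[5.0, 0.0]], [[10.0, 10.0]], [True])
print(out_missing, share_missing)
print(share_seen > share_missing)
```

In this toy version a masked conditioning entry receives zero weight, so the output falls back to ordinary self-attention when the diffraction input is missing.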

	## Evaluation

	### Metrics

	The model is evaluated on the leakage-aware test splits using:

	1. Match Rate: Percentage of ground truth structures successfully recovered.
	2. RMS-d: Root Mean Square distance between ground truth and generated structures.
	3. Lattice Parameter and Volume MAE: Mean Absolute Error of predicted unit cell dimensions.
	4. N atoms match: The average number of atoms in the unit cell of matched materials in the test set.
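
The aggregate metrics above can be sketched as follows, assuming a per-target RMS distance is already available from a structure matcher, with `None` marking targets that were not recovered. The function names are hypothetical, and whether the MAE is computed over all targets or only over matched ones is not stated in this card. The unit cell volume uses the standard triclinic formula.

```python
import math
from typing import List, Optional, Sequence


def cell_volume(a: float, b: float, c: float,
                alpha: float, beta: float, gamma: float) -> float:
    """Unit cell volume from lengths and angles given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def match_rate(rms_dists: Sequence[Optional[float]]) -> float:
    """Percentage of targets with a matching generated structure."""
    return 100.0 * sum(d is not None for d in rms_dists) / len(rms_dists)


def mean_rms_d(rms_dists: Sequence[Optional[float]]) -> Optional[float]:
    """Mean RMS distance over matched targets, or None if nothing matched."""
    matched: List[float] = [d for d in rms_dists if d is not None]
    return sum(matched) / len(matched) if matched else None


def mae(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error between paired predictions and references."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


rms = [0.05, None, 0.10, None]  # illustrative values, not reported results
print(match_rate(rms), mean_rms_d(rms))
print(mae([4.1, 3.9], [4.0, 4.0]))
print(cell_volume(4.0, 4.0, 4.0, 90.0, 90.0, 90.0))
```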

	## Citation

	Primary Model Paper:

	```bibtex
	@misc{bone2025discoveryrecoverycrystallinematerials,
	title={Discovery and recovery of crystalline materials with property-conditioned transformers},
	author={Cyprien Bone and Matthew Walker and Kuangdai Leng and Luis M. Antunes and Ricardo Grau-Crespo and Amil Aligayev and Javier Dominguez and Keith T. Butler},
	year={2025},
	eprint={2511.21299},
	archivePrefix={arXiv},
	primaryClass={cond-mat.mtrl-sci},
	url={https://arxiv.org/abs/2511.21299},
	}
	```

	CHILI Dataset:

	```bibtex
	@inproceedings{10.1145/3637528.3671538,
	author = {Friis-Jensen, Ulrik and Johansen, Frederik L. and Anker, Andy S. and Dam, Erik B. and Jensen, Kirsten M. \O{}. and Selvan, Raghavendra},
	title = {CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning},
	year = {2024},
	isbn = {9798400704901},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	url = {https://doi.org/10.1145/3637528.3671538},
	doi = {10.1145/3637528.3671538},
	pages = {4962–4973},
	numpages = {12},
	keywords = {atomic structure, chemistry, datasets, deep learning, graph neural network, graphs, machine learning, nanomaterials, neutron, scattering, x-ray},
	location = {Barcelona, Spain},
	series = {KDD '24}
	}
	```