metadata
license: mit
tags:
  - crystal-generation
  - diffusion-model
  - materials-science
  - probe-gradient-guidance
library_name: pytorch

Crystalite 10K (Alex-MP-20)

Crystalite checkpoint trained for 10K steps on the full Alex-MP-20 dataset (540K structures, 97.9% metals). This is the diversity-optimized model used for the Pareto sweep experiments.

Architecture: 67.8M-parameter Diffusion Transformer with subatomic tokenizer and GEM attention bias (Crystalite, Hadzi Veljkovic et al.).

Key results with probe-gradient guidance

| Guidance weight | In-window (4-6 eV) | Uniqueness | Metal % |
|---|---|---|---|
| 0 (baseline) | 0.1% | 99.7% | 96.9% |
| 10 | 31.8% | 99.7% | 0.1% |
| 15 | 33.7% | 99.6% | 0.0% |

Every guidance weight Pareto-dominates the baseline. The sweep covers 18,432 structures: 6 guidance weights × 3 seeds × 1,024 structures per batch. No mode collapse was observed.
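As a quick sanity check on the sweep size, the arithmetic can be reproduced in a few lines. The constant names are illustrative and do not come from the Crystalite codebase.

```python
# Sweep dimensions as stated in the text above (names are hypothetical).
NUM_WEIGHTS = 6     # guidance weights in the sweep
NUM_SEEDS = 3       # random seeds per weight
BATCH_SIZE = 1024   # structures generated per batch

per_weight = NUM_SEEDS * BATCH_SIZE   # structures behind each guidance weight
total = NUM_WEIGHTS * per_weight      # structures in the whole sweep
print(per_weight, total)              # prints: 3072 18432
```

Each percentage reported per guidance weight is therefore computed over 3,072 generated structures, assuming one batch per seed.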

Band gap probe AUROC: 0.957 (256 parameters, trained on atom-mean hidden states).
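The sketch below illustrates what a linear probe on atom-mean hidden states and its AUROC look like, in plain Python. It is not the code in `scripts/train_probe`. The function names, the sigmoid output, and the binary label are all assumptions, because the README does not state how the probe's positive class is defined or how its 256 parameters are laid out.

```python
from math import exp

def atom_mean(hidden):
    """Average per-atom hidden vectors (a list of equal-length lists) into one vector."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]

def probe_score(hidden, weights, bias=0.0):
    """Hypothetical linear probe: dot product on the pooled vector, then a sigmoid."""
    pooled = atom_mean(hidden)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + exp(-logit))

def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy usage: two structures, each with two atoms and 2-dimensional hidden states.
structures = [[[1.0, 0.0], [3.0, 0.0]], [[-1.0, 0.0], [-3.0, 0.0]]]
scores = [probe_score(h, weights=[1.0, 0.0]) for h in structures]
print(auroc(scores, labels=[1, 0]))  # prints: 1.0
```

An AUROC of 0.957 means that, for a randomly chosen positive and negative example, the probe ranks the positive one higher about 95.7% of the time.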

Usage

Requires the Crystalite codebase and probe-gradient-guidance scripts.

from scripts.train_probe import load_model
model = load_model("final.pt", device="cuda")

Used In

This checkpoint was used as an upstream generation asset in the open-world environment pipeline for Training Scientific Judgment with Verified Environments for Autonomous Science.