---
license: mit
tags:
  - catalyst
  - materials-science
  - jax
  - structure-generation
  - constrained-decoding
datasets:
  - Open-Catalyst-Project/OC20
metrics:
  - generation_validity
  - uniqueness
  - novelty
---

# nanocatalyst (depth=8, 25.2M params)

Minimal JAX/Flax transformer for catalyst structure generation with single-parameter depth scaling.

## Model Details

| | |
|---|---|
| Architecture | Transformer (RMSNorm, RoPE, QK-norm, ReLU², logit softcapping, residual scalars) |
| Parameters | 25.2M |
| Depth | 8 (n_embd=512, n_layer=8, n_head=8) |
| Vocab size | 186 (WordLevel, 2-digit pair encoding) |
| Training data | 174K OC20 structures |
| Training time | 97 min on TPU v6e-8 |
| Framework | JAX / Flax |

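The less common architectural features in the table reduce to simple formulas. A minimal NumPy sketch of three of them (illustrative only; the actual model implements these in Flax, and the softcap value here is a placeholder, not this model's setting):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm: rescale by the root-mean-square of the last axis (no mean centering)."""
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def softcap(logits, cap=15.0):
    """Logit softcapping: smoothly squash logits into (-cap, cap) via tanh."""
    return cap * np.tanh(logits / cap)

def relu2(x):
    """ReLU-squared MLP activation: max(x, 0)^2."""
    return np.maximum(x, 0.0) ** 2
```

RMSNorm drops the mean-subtraction of LayerNorm, softcapping bounds attention/output logits for stability, and ReLU² is a cheap smooth-ish alternative to GELU.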
## Results (CuPt3 + OH, T=0.8, top_k=40, 100 samples)

| Metric | Result |
|--------|--------|
| Parseable | 96/100 |
| Element Match | 96/100 |
| Generation Validity | 96/100 (96.0%) |
| Uniqueness | 96/96 (100.0%) |
| Novelty | 96/96 (100.0%) |
| Min. interatomic distance ≥ 0.5 Å | 83/96 (86.5%) |
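
Uniqueness and novelty above are set-based generation metrics: uniqueness counts distinct structures among the valid samples, and novelty counts unique structures absent from the training set. A minimal sketch of how such metrics can be computed (the validity predicate and string canonicalization are assumptions for illustration, not this repo's exact code):

```python
def generation_metrics(samples, is_valid, training_set):
    """Compute validity/uniqueness/novelty over generated samples.

    samples: list of canonicalized structure strings
    is_valid: predicate for a parseable, element-matching structure
    training_set: set of canonicalized training structures
    """
    valid = [s for s in samples if is_valid(s)]
    unique = set(valid)                       # distinct among valid samples
    novel = {s for s in unique if s not in training_set}
    n = len(samples)
    return {
        "validity": len(valid) / n,
        "uniqueness": len(unique) / max(len(valid), 1),
        "novelty": len(novel) / max(len(unique), 1),
    }
```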

## Usage

```python
from catalyst.hub import download_checkpoint
from catalyst.config import CatalystConfig
from catalyst.generate import generate_samples

# Download checkpoint
ckpt_path = download_checkpoint("everythingchalna/nanocatalyst")
config = CatalystConfig.load(ckpt_path / "config.json")

# Load params and generate (see README for full example)
```

## Training

Trained for 20 epochs on 174K structures from the OC20 S2EF dataset using a TPU v6e-8 (Google TRC program), with the AdamW optimizer and a WSD (warmup-stable-decay) learning rate schedule. Final val_loss = 0.9518.
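
A WSD (warmup-stable-decay) schedule holds the learning rate flat between a linear warmup and a final decay to zero. A minimal sketch (the step counts and peak rate are illustrative defaults, not the values used for this run):

```python
def wsd_schedule(step, peak_lr=3e-4, warmup=100, stable=800, decay=100):
    """Warmup-Stable-Decay: linear warmup, flat plateau, linear decay to zero."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup      # linear warmup
    if step < warmup + stable:
        return peak_lr                            # stable plateau
    t = step - warmup - stable
    return peak_lr * max(0.0, 1.0 - t / decay)    # linear decay
```

Compared to cosine decay, the flat plateau lets training continue (or checkpoint early) without committing to a total step count in advance; only the short decay phase is length-sensitive.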

## Files

- `config.json` — Model configuration
- `params/` — Orbax checkpoint (model parameters)
- `tokenizer.json` — HuggingFace WordLevel tokenizer
- `tokenizer_stats.json` — Tokenizer coverage statistics

## Citation

```bibtex
@software{nanocatalyst,
  title = {nanocatalyst},
  url = {https://github.com/everythingchalna/nanocatalyst},
  license = {MIT}
}
```

## Acknowledgments

Training compute provided by the [Google TPU Research Cloud (TRC)](https://sites.research.google/trc/) program.