# SMolLM

A 53K-parameter weight-shared transformer that learns SMILES grammar by applying one small block 8 times. It reaches 95.3% validity on ZINC-250K, outperforming an unshared GPT roughly 10× larger (87.6%).
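The core idea is easy to state in code: one transformer block whose weights are reused at every depth step. The sketch below is illustrative only; the class name, dimensions, and the omission of positional encodings are assumptions, not the repo's actual implementation.

```python
import torch.nn as nn

class WeightSharedLM(nn.Module):
    """Illustrative sketch: a single shared block applied `depth` times."""

    def __init__(self, vocab_size, d_model=64, n_heads=4, depth=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One block; its parameters are reused at every "virtual" layer.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.depth = depth
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)
        ).to(tokens.device)
        for _ in range(self.depth):  # same parameters, applied D times
            x = self.block(x, src_mask=mask)
        return self.head(x)
```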
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="akhljndl/smollm",
    filename="checkpoints/0504_paper/ws-53k-s42.pt",
)
state = torch.load(ckpt_path, map_location="cpu")

# Model class + sampling utilities live in the companion repo:
# https://github.com/akhljndl/smollm
```
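Instantiating the model from the loaded state is left to the companion repo. As a rough sketch (the import path, constructor, and state-dict keys below are assumptions, not the repo's documented API):

```python
# Hypothetical sketch: all names below are assumptions about the companion repo.
from smollm.model import WeightSharedLM  # assumed import path

model = WeightSharedLM(**state["config"])         # assumed: checkpoint stores its config
model.load_state_dict(state["model_state_dict"])  # assumed: key name may differ
model.eval()
```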
## Headline results
| Model | Params | Validity (%) | FCD |
|---|---|---|---|
| WS-53K | 53K | 95.3 ± 0.7 | 2.76 |
| WS-206K | 206K | 98.0 ± 0.4 | 2.45 |
| WS-206K (no loop, D=1) | 206K | 33.0 ± 2.3 | 7.38 |
| GPT-527K | 527K | 87.6 ± 0.7 | 3.01 |
| GPT-3.2M | 3.2M | 99.1 ± 0.3 | 2.36 |
| GRU-206K | 206K | 95.5 | 2.93 |
Each row reports 3 seeds with N = 10,000 samples per evaluation. The WS-206K (no loop, D=1) ablation removes the virtual depth, applying the shared block once instead of 8 times, to isolate the contribution of repeated block application.
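For context, "Validity" is the standard metric in molecular generation: the fraction of sampled SMILES strings that parse into a molecule, and FCD is the Fréchet ChemNet Distance (lower is better). A minimal validity check with RDKit, as an illustrative sketch rather than the repo's actual evaluation script:

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # silence RDKit parse warnings

def validity(smiles_list):
    """Percentage of generated SMILES that RDKit parses into a valid molecule."""
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return 100.0 * valid / len(smiles_list)
```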
## Checkpoints
This repo hosts 89 checkpoints under `checkpoints/0504_paper/`, covering the weight-shared / standard-GPT / GRU baselines plus the knowledge-distillation, DPO, and depth-sweep cohorts.

See `MANIFEST.md` for the full per-config / per-seed table, including wandb run IDs, train/eval job IDs, and augmentation-cache provenance hashes.
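To browse the checkpoint tree programmatically, something like the following works with `huggingface_hub` (the repo_id matches the Usage snippet; the filter path is the directory named above):

```python
from huggingface_hub import list_repo_files

files = list_repo_files("akhljndl/smollm")
paper_ckpts = [f for f in files if f.startswith("checkpoints/0504_paper/")]
print(f"{len(paper_ckpts)} checkpoint files found")
```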
## Provenance

Every checkpoint here traces 1:1:1:1 to:

- a claim in the paper
- code in `akhljndl/smollm`
- a wandb run in `ajindal/smollm` tagged `0504_paper`
- an HF artifact in this repo under `checkpoints/0504_paper/`
## Citation

```bibtex
@misc{jindal2026smollm,
  author = {Akhil Jindal and Harang Ju},
  title  = {SMolLM: Small Language Models Learn Small Molecular Grammar},
  year   = {2026},
  url    = {https://github.com/akhljndl/smollm},
}
```
## License
MIT