# SMolLM

A 53K-parameter weight-shared transformer that learns SMILES grammar by applying one small block 8 times. It reaches 95.3% validity on ZINC-250K, outperforming an unshared GPT roughly 10× larger (87.6%).
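The core idea is easy to state in code: one transformer block whose weights are reused at every depth step. The sketch below is illustrative only; the class name, dimensions, and the omission of positional encodings are assumptions, not the repo's actual implementation.

```python
import torch.nn as nn

class WeightSharedLM(nn.Module):
    """Illustrative sketch: a single shared block applied `depth` times."""

    def __init__(self, vocab_size, d_model=64, n_heads=4, depth=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One block; its parameters are reused at every "virtual" layer.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.depth = depth
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)
        ).to(tokens.device)
        for _ in range(self.depth):  # same parameters, applied D times
            x = self.block(x, src_mask=mask)
        return self.head(x)
```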
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="akhljndl/smollm",
    filename="checkpoints/0504_paper/ws-53k-s42.pt",
)
state = torch.load(ckpt_path, map_location="cpu")

# Model class + sampling utilities live in the companion repo:
# https://github.com/akhljndl/smollm
```
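Instantiating the model from the loaded state is left to the companion repo. As a rough sketch (the import path, constructor, and state-dict keys below are assumptions, not the repo's documented API):

```python
# Hypothetical sketch: all names below are assumptions about the companion repo.
from smollm.model import WeightSharedLM  # assumed import path

model = WeightSharedLM(**state["config"])         # assumed: checkpoint stores its config
model.load_state_dict(state["model_state_dict"])  # assumed: key name may differ
model.eval()
```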
## Headline results
| Model | Params | Validity (%) | FCD |
|---|---|---|---|
| WS-53K | 53K | 95.3 ± 0.7 | 2.76 |
| WS-206K | 206K | 98.0 ± 0.4 | 2.45 |
| WS-206K (no loop, D=1) | 206K | 33.0 ± 2.3 | 7.38 |
| GPT-527K | 527K | 87.6 ± 0.7 | 3.01 |
| GPT-3.2M | 3.2M | 99.1 ± 0.3 | 2.36 |
| GRU-206K | 206K | 95.5 | 2.93 |
Each row reports 3 seeds with N = 10,000 samples per evaluation. The WS-206K (no loop, D=1) ablation removes the virtual depth, applying the shared block once instead of 8 times, to isolate the contribution of repeated block application.
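For context, "Validity" is the standard metric in molecular generation: the fraction of sampled SMILES strings that parse into a molecule, and FCD is the Fréchet ChemNet Distance (lower is better). A minimal validity check with RDKit, as an illustrative sketch rather than the repo's actual evaluation script:

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # silence RDKit parse warnings

def validity(smiles_list):
    """Percentage of generated SMILES that RDKit parses into a valid molecule."""
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return 100.0 * valid / len(smiles_list)
```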
## Checkpoints
This repo hosts 89 checkpoints under `checkpoints/0504_paper/`, covering the weight-shared / standard-GPT / GRU baselines plus the knowledge-distillation, DPO, and depth-sweep cohorts.

See `MANIFEST.md` for the full per-config / per-seed table, including wandb run IDs, train/eval job IDs, and augmentation-cache provenance hashes.
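To browse the checkpoint tree programmatically, something like the following works with `huggingface_hub` (the repo_id matches the Usage snippet; the filter path is the directory named above):

```python
from huggingface_hub import list_repo_files

files = list_repo_files("akhljndl/smollm")
paper_ckpts = [f for f in files if f.startswith("checkpoints/0504_paper/")]
print(f"{len(paper_ckpts)} checkpoint files found")
```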
## Provenance

Every checkpoint here traces 1:1:1:1 to:

- a claim in the paper
- code in `akhljndl/smollm`
- a wandb run in `ajindal/smollm` tagged `0504_paper`
- an HF artifact in this repo under `checkpoints/0504_paper/`
## Citation

```bibtex
@misc{jindal2026smollm,
  author = {Akhil Jindal and Harang Ju},
  title  = {SMolLM: Small Language Models Learn Small Molecular Grammar},
  year   = {2026},
  url    = {https://github.com/akhljndl/smollm},
}
```
## License
MIT