SMolLM

A 53K-parameter weight-shared transformer that learns SMILES grammar by applying one small block 8 times. It reaches 95.3% validity on ZINC-250K, outperforming an unshared GPT baseline with roughly 10× more parameters (GPT-527K, 87.6%).
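A minimal sketch of the idea in PyTorch: a single decoder block whose weights are reused at each of the 8 virtual depths. The hidden size, head count, and vocabulary below are illustrative placeholders, not the paper's configuration, and positional encoding is omitted for brevity.

import torch.nn as nn

class WeightSharedLM(nn.Module):
    # One shared block applied n_loops times; dimensions are illustrative only.
    def __init__(self, vocab_size=64, d_model=64, n_heads=4, n_loops=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_loops = n_loops
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        for _ in range(self.n_loops):  # the same weights at every depth
            x = self.block(x, src_mask=mask)
        return self.head(x)  # next-token logits over the SMILES vocabulary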

Usage

import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="akhljndl/smollm",
    filename="checkpoints/0504_paper/ws-53k-s42.pt",
)
state = torch.load(ckpt_path, map_location="cpu")
# Model class + sampling utilities live in the companion repo:
#   https://github.com/akhljndl/smollm
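# Hypothetical sanity check (not part of the companion repo's API): assuming the
# file stores a raw state_dict, you can confirm the parameter count without the
# model class. If the checkpoint wraps the weights, look under a key like "model".
sd = state if all(hasattr(v, "numel") for v in state.values()) else state.get("model", state)
n_params = sum(t.numel() for t in sd.values())
print(f"{len(sd)} tensors, {n_params:,} parameters")  # expect ~53K for ws-53k-s42
for name, tensor in list(sd.items())[:5]:
    print(name, tuple(tensor.shape))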

Headline results

Model                   Params  Validity (%)  FCD
WS-53K                  53K     95.3 ± 0.7    2.76
WS-206K                 206K    98.0 ± 0.4    2.45
WS-206K (no loop, D=1)  206K    33.0 ± 2.3    7.38
GPT-527K                527K    87.6 ± 0.7    3.01
GPT-3.2M                3.2M    99.1 ± 0.3    2.36
GRU-206K                206K    95.5          2.93

3 seeds per row, N = 10,000 samples per evaluation. The WS-206K (no loop, D=1) ablation removes the virtual depth (the block is applied once instead of 8 times) to isolate the contribution of repeated block application.
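Validity is assumed here to be the standard SMILES-generation metric: the fraction of sampled strings that RDKit parses into a molecule. A minimal way to compute it over a list of generated SMILES (the sampling utilities themselves live in the companion repo):

from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # silence per-string parse warnings

def validity(smiles_list):
    # Fraction of strings RDKit can parse into a molecule object.
    ok = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return ok / len(smiles_list)

# e.g. over N = 10,000 samples drawn from the model:
# print(f"validity: {100 * validity(samples):.1f}%")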

Checkpoints

This repo hosts 89 checkpoints under checkpoints/0504_paper/, covering the weight-shared / standard-GPT / GRU baselines plus knowledge-distillation, DPO, and depth-sweep cohorts.

See MANIFEST.md for the full per-config / per-seed table including wandb run IDs, train / eval job IDs, and augmentation-cache provenance hashes.
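To browse what is hosted without cloning, you can list the repo's files from the Hub; list_repo_files is a standard huggingface_hub call, and the filter below simply narrows to the paper checkpoints (the .pt extension is assumed from the usage example above).

from huggingface_hub import list_repo_files

files = list_repo_files("akhljndl/smollm")
ckpts = [f for f in files if f.startswith("checkpoints/0504_paper/") and f.endswith(".pt")]
print(f"{len(ckpts)} checkpoint files")  # the README counts 89
for f in sorted(ckpts)[:10]:
    print(f)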

Provenance

Every checkpoint here maps one-to-one to:

  • a claim in the paper
  • code in akhljndl/smollm
  • a wandb run in ajindal/smollm tagged 0504_paper
  • an HF artifact in this repo under checkpoints/0504_paper/

Citation

@misc{jindal2026smollm,
  author = {Akhil Jindal and Harang Ju},
  title  = {SMolLM: Small Language Models Learn Small Molecular Grammar},
  year   = {2026},
  url    = {https://github.com/akhljndl/smollm},
}

License

MIT
