KAN-LM Study: Model Checkpoints

Trained checkpoints for the paper "Auditing and Benchmarking KAN Feed-Forward Layers in Small Language Models." These are the model weights behind every figure and table; the code, experiment scripts, and provenance manifest live in the companion repository.

These are not transformers-loadable models. They are kanprey checkpoints (custom KAN/MLP transformer). Load them with the vendored kan-guppylm code in the companion repo (vendor/kan-guppylm/kanprey), not AutoModel.

What's here

best.pt for every training run (128 files, ~21.7 GB), organized by regime:

Regime | Path prefix | Contents
GuppyLM screen | mlp_s*, swiglu_s*, kan_grid2_s*, grkan_corrected_s*, basis_confirm/*, kat_s*, mlpedge_* | 3-seed architecture screen (d=384, 6 layers, vocab 2,393)
BabyLM Strict-Small | babylm/<arch>_s42..s51 | the 61-run matrix (4 critical × 10 seeds + support/low rows), vocab 8,192
Grid-size sweep | gridsweep/* | KAN grid 2/5/10/20 for the interpretability-vs-capacity sweep
Wikitext-103 scale | scale/mlp, scale/mlpedge_h8 | GPT-2-small parameter-matched stress test
  • INVENTORY.tsv: every best.pt with size and path.
  • SHA256SUMS: integrity checksums; verify with shasum -a 256 -c SHA256SUMS.
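Where shasum is not available, the same check can be sketched in Python. This is a minimal sketch, assuming SHA256SUMS uses the usual shasum output format (one "<hex digest>  <relative path>" entry per line); the function names sha256_of and verify_sums are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path, root: Path) -> dict[str, bool]:
    """Check every manifest entry against the files under root.

    Assumes the shasum line format "<hex digest>  <path>".
    A file that is listed but missing counts as a failure.
    """
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # a leading '*' marks binary mode in this format
        target = root / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling verify_sums(Path("SHA256SUMS"), Path(".")) from the checkpoint root returns a mapping from each listed path to True or False, so failed or missing files can be listed before anything is loaded.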

(The 286M ClimbMix GR-KAN stress-test checkpoints are large and tracked separately; see the code repo's manifest.json.)

Provenance and correctness

  • The corrected rational activation uses the Safe Padé denominator Q(x)=1+|b0 x + b1 x^2 + b2 x^3 + b3 x^4|. Pre-fix GR-KAN checkpoints are excluded from all reported evidence and are not in this collection.
  • Repo commits, the kernel correction, and a figure/table → script → checkpoint map are in manifest.json in the code repo.
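The denominator formula above can be illustrated with a few lines of plain Python. This is only a sketch of the stated expression for Q(x), not the kanprey kernel; the function names are illustrative, and plain_denominator is included purely for contrast (the source does not describe what the pre-fix kernel computed).

```python
def safe_pade_denominator(x: float, b: tuple[float, float, float, float]) -> float:
    """Q(x) = 1 + |b0*x + b1*x^2 + b2*x^3 + b3*x^4|, following the formula above."""
    b0, b1, b2, b3 = b
    # Horner form of b0*x + b1*x^2 + b2*x^3 + b3*x^4
    poly = x * (b0 + x * (b1 + x * (b2 + x * b3)))
    return 1.0 + abs(poly)


def plain_denominator(x: float, b: tuple[float, float, float, float]) -> float:
    """Same polynomial without the absolute value (contrast only, an assumption)."""
    b0, b1, b2, b3 = b
    poly = x * (b0 + x * (b1 + x * (b2 + x * b3)))
    return 1.0 + poly


coeffs = (-1.0, 0.0, 0.0, 0.0)
print(plain_denominator(1.0, coeffs))      # 0.0: dividing by this would blow up
print(safe_pade_denominator(1.0, coeffs))  # 2.0: the absolute value keeps Q(x) >= 1
```

Because the absolute value is never negative, Q(x) is at least 1 for every input and every coefficient choice, so a rational activation P(x)/Q(x) with this denominator has no poles.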

Loading a checkpoint

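import torch
from kanprey.config import ModelConfig          # from vendor/kan-guppylm
from kanprey.model import KANpreyLM, MLPTransformer

# weights_only=False lets torch.load unpickle arbitrary Python objects, so only
# load files you trust (for example, after checking them against SHA256SUMS).
ckpt = torch.load("babylm/grkan_canonical_s42/best.pt", map_location="cpu", weights_only=False)
cfg = ckpt["model_cfg"]
# "model_type" selects the class: MLPTransformer for "mlp", KANpreyLM otherwise.
model_cls = MLPTransformer if ckpt["model_type"] == "mlp" else KANpreyLM
model = model_cls(cfg)
model.load_state_dict(ckpt["model"])
model.eval()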
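Before building a model, it can help to confirm that a loaded checkpoint has the three keys the snippet above relies on ("model_cfg", "model_type", "model"). The helper below is a hypothetical sketch, not part of kanprey; it assumes the state-dict values expose .numel(), as torch tensors do, and that the dict holds nothing beyond what the snippet above shows.

```python
REQUIRED_KEYS = ("model_cfg", "model_type", "model")


def summarize_checkpoint(ckpt: dict) -> dict:
    """Validate the keys used by the loading snippet and report basic sizes."""
    missing = [key for key in REQUIRED_KEYS if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    state = ckpt["model"]
    # Counts every element in the state dict; this may include buffers,
    # so treat it as an upper bound on the trainable parameter count.
    n_elements = sum(tensor.numel() for tensor in state.values())
    return {
        "model_type": ckpt["model_type"],
        # Mirrors the class choice in the loading snippet.
        "class_name": "MLPTransformer" if ckpt["model_type"] == "mlp" else "KANpreyLM",
        "n_tensors": len(state),
        "n_elements": n_elements,
    }
```

Calling summarize_checkpoint(ckpt) right after torch.load gives a quick sanity check on which class will be instantiated and roughly how large the model is.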

Licenses

Weights: MIT. Training data retains its own licenses: GuppyLM (MIT), BabyLM challenge corpus, Wikitext-103 (CC BY-SA 3.0/GFDL), ClimbMix → NVIDIA Nemotron-ClimbMix (CC BY-NC 4.0, research use).

Citation

@misc{alves2026kanlm,
  title  = {Auditing and Benchmarking KAN Feed-Forward Layers in Small Language Models},
  author = {Alves, Felippe},
  year   = {2026},
  note   = {Code: https://github.com/ACS-USP/kan-lm-study}
}

@misc{acsusp2026kanlmckpts,
  author    = {Agentic Complex Systems - USP},
  title     = {kan-lm-study-checkpoints},
  year      = {2026},
  publisher = {Hugging Face},
  doi       = {10.57967/hf/9264},
  url       = {https://huggingface.co/ACS-USP/kan-lm-study-checkpoints}
}