WaveletLM is a fully causal, attention-free language model that mixes tokens through learned lifting wavelet decomposition, a Fast Walsh-Hadamard Transform, per-scale gated spectral mixing with SwiGLU activation, an inverse FWHT, and wavelet reconstruction. Combined with expanded MLPs and sparse product-key memory, this yields an architecture with no attention and O(n log n) scaling in sequence length.
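The two transforms named in the mixing pipeline can be illustrated in isolation. Below is a minimal pure-Python sketch of one Haar lifting step (split/predict/update) and the Fast Walsh-Hadamard Transform; these are simplified illustrations of the underlying math, not the model's actual (learned, batched) implementation:

```python
def haar_lift(x):
    """One Haar lifting step: split into even/odd, predict, update."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]       # predict: odd minus even
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update: preserve the mean
    return approx, detail

def haar_unlift(approx, detail):
    """Invert the lifting step and re-interleave even/odd samples."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

def fwht(x):
    """In-place-style Fast Walsh-Hadamard Transform (length must be a power of 2).

    Butterfly structure gives the O(n log n) cost; applying it twice
    returns n times the input, so the inverse is fwht followed by / n.
    """
    x = list(x)
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x
```

In the model, the per-scale gated spectral mixing happens between the forward and inverse transforms; the sketch above only shows the transforms themselves.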
Full code, training details, ablations, and documentation: github.com/ramongougis/WaveletLM
| Dataset | Params | Perplexity | BPB |
|---|---|---|---|
| WikiText-103 | 883M | 23.8 | 1.0140 |
| PG-19 (1 epoch) | 808M | 27.4 | 1.0853 |
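Perplexity and BPB in the table are linked through the tokenizer's compression ratio: bits per token is log2 of perplexity, and BPB rescales that by tokens per byte. A small sanity-check sketch (the tokens-per-byte value below is a hypothetical illustration, not a number from this repo):

```python
import math

def bits_per_token(perplexity):
    # Cross-entropy in bits per token is log2 of the perplexity.
    return math.log2(perplexity)

def bits_per_byte(perplexity, tokens_per_byte):
    # BPB = (bits per token) * (tokens per byte); the ratio depends on
    # the tokenizer and dataset, so this value is an assumption here.
    return bits_per_token(perplexity) * tokens_per_byte

# WikiText-103 row: perplexity 23.8 -> ~4.57 bits per token.
# A tokens-per-byte ratio near 0.22 would land close to the reported 1.0140 BPB.
```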
```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint
ckpt_path = hf_hub_download(repo_id="ragou19/WaveletLM", filename="best_model.pt")
```
Then follow the instructions in the GitHub repo to load and run: https://github.com/ramongougis/WaveletLM
Running with `--ptq8` enabled and `compile: false` saves 0.5-1 GB of memory, but is slower. See `runs.md` for the full training history.
Licensed under Apache 2.0. See LICENSE.