Ramon Gougis committed 23 days ago

---
license: apache-2.0
datasets:
- Salesforce/wikitext
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- wavelet
- attention-free
- no-attention
- attentionfree
- FWHT
- walsh-hadamard
- PKM
- product-key-memory
- causal-lm
- text-generation
- sub-quadratic
---
# WaveletLM

WaveletLM is a fully causal, attention-free language model. It mixes tokens through a learned lifting wavelet decomposition, a fast Walsh-Hadamard transform (FWHT), per-scale gated spectral mixing, an inverse FWHT, and wavelet reconstruction. Combined with expanded MLPs and sparse product-key memory (PKM), this gives an architecture with no attention layers that scales as O(n log n) in sequence length.

**Full code, training details, ablations, and documentation:**
[github.com/ramongougis/WaveletLM](https://github.com/ramongougis/WaveletLM)
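
To make the transform pipeline above concrete, here is a toy sketch in pure Python. It is not WaveletLM's code and makes several assumptions: a fixed one-level Haar lifting step (split, predict, update) stands in for the learned lifting wavelet, fixed gate vectors stand in for learned gates, and it mixes a whole 1-D sequence at once, so it is not causal (how WaveletLM enforces causality is documented in the GitHub repo). All function names are hypothetical. The FWHT runs log2(n) butterfly passes over n values, which is where the O(n log n) cost comes from.

```python
# Toy, NON-causal illustration of: lifting wavelet -> FWHT -> per-scale gating
# -> inverse FWHT -> wavelet reconstruction. Assumption: fixed Haar lifting and
# fixed gates replace WaveletLM's learned components. Names are hypothetical.

def haar_lift_forward(x):
    """One Haar level via lifting: split, predict, update. len(x) must be even."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]         # predict odd samples from even ones
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update so approx is the pairwise mean
    return approx, detail

def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order, then interleave even/odd samples."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    merged = []
    for e, o in zip(even, odd):
        merged.extend([e, o])
    return merged

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform: log2(n) butterfly passes, O(n log n).
    len(x) must be a power of two."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

def ifwht(x):
    """The Hadamard matrix satisfies H @ H = n * I, so the inverse is FWHT / n."""
    return [v / len(x) for v in fwht(x)]

def gated_spectral_mix(band, gate):
    """Scale each Walsh-Hadamard coefficient of one band by its gate, then invert."""
    return ifwht([g * s for g, s in zip(gate, fwht(band))])

def toy_wavelet_mixer(x, gate_approx, gate_detail):
    approx, detail = haar_lift_forward(x)
    approx = gated_spectral_mix(approx, gate_approx)  # gate for the coarse scale
    detail = gated_spectral_mix(detail, gate_detail)  # gate for the fine scale
    return haar_lift_inverse(approx, detail)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(toy_wavelet_mixer(x, [1.0] * 4, [1.0] * 4))  # all-ones gates reproduce x exactly
print(toy_wavelet_mixer(x, [1.0, 0.0, 0.0, 0.0], [0.0] * 4))
# -> [4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5] (the mean of x)
```

Keeping only the coarse band's first Walsh-Hadamard coefficient collapses the sequence to its mean, which shows how per-scale gates control which frequency content survives the round trip.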
## Results

| Dataset | Params | Perplexity | Bits per byte (BPB) |
|---------|--------|------------|---------------------|
| WikiText-103 | 883M | 23.7 | 1.0140 |
| PG-19 (1 epoch) | 808M | TBD | TBD |
## How to Use

```python
import torch  # PyTorch is needed to load the .pt checkpoint
from huggingface_hub import hf_hub_download

# Download the checkpoint
ckpt_path = hf_hub_download(repo_id="ragou19/WaveletLM", filename="best_model.pt")

# Then follow the instructions in the GitHub repo to load and run:
# https://github.com/ramongougis/WaveletLM
```
## Architecture

See the full [architecture documentation](https://github.com/ramongougis/WaveletLM#architecture) on GitHub.
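
Sparse product-key memory is one of the components named in the introduction. As a rough illustration, here is a minimal pure-Python sketch of the standard product-key lookup from "Large Memory Layers with Product Keys" (Lample et al., 2019): the query is split in half, each half is scored against its own small set of m sub-keys, and only the k*k combinations of the per-half winners are scored instead of all m*m full keys. Assumptions: a single head, no query network or normalization, and toy sizes; WaveletLM's actual PKM configuration is documented in the GitHub repo, and the function names here are hypothetical.

```python
import math

# Hypothetical single-head product-key lookup (not WaveletLM's implementation).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def topk(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def pkm_lookup(query, subkeys1, subkeys2, values, k):
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    s1 = [dot(q1, key) for key in subkeys1]
    s2 = [dot(q2, key) for key in subkeys2]
    top1, top2 = topk(s1, k), topk(s2, k)
    # Score only the k*k candidate pairs; slot (i, j) maps to flat index i * m2 + j.
    candidates = [(s1[i] + s2[j], i * len(subkeys2) + j) for i in top1 for j in top2]
    candidates.sort(reverse=True)
    best = candidates[:k]
    # Softmax over the selected scores (subtract the max for numerical stability).
    top_score = best[0][0]
    exps = [math.exp(score - top_score) for score, _ in best]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected value vectors.
    out = [0.0] * len(values[0])
    for w, (_, slot) in zip(weights, best):
        for t, v in enumerate(values[slot]):
            out[t] += w * v
    return out, [slot for _, slot in best]

subkeys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # m = 4 per half -> 16 slots
values = [[float(slot)] for slot in range(16)]                  # each value is its slot id
out, slots = pkm_lookup([10.0, 0.0, 0.0, 10.0], subkeys, subkeys, values, k=1)
print(slots, out)  # [1] [1.0]
```

The memory has m*m slots, but each lookup computes only 2m sub-key scores plus k*k sums, which is what keeps a very large memory cheap to query.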
## Training

Trained on a single RTX 5090 for 5 epochs on WikiText-103; the reported model is the best of 3 seeds (1337, 42, 7), with a best validation loss of 3.16. PG-19 weights are also included; they come from a 1-epoch run, and longer training is planned after release.
See [runs.md](https://github.com/ramongougis/WaveletLM/blob/main/runs.md) for the full training history.
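
For reading the loss and the Results table together, here is a small sketch of the usual conversions. It assumes the reported loss is the mean per-token cross-entropy in nats (the card does not state its units) and that both numbers come from the same evaluation split. Perplexity is exp(loss); bits per byte converts the total loss to bits and divides by the byte length of the evaluated text, so it needs token and byte counts that are not given here. The function names are illustrative.

```python
import math

def perplexity(mean_loss_nats):
    """Perplexity from mean per-token cross-entropy measured in nats."""
    return math.exp(mean_loss_nats)

def bits_per_byte(mean_loss_nats, n_tokens, n_bytes):
    """Total loss in nats -> bits (divide by ln 2), normalized by the text's byte count."""
    total_bits = mean_loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes

print(round(perplexity(3.16), 2))  # 23.57
```

exp(3.16) is about 23.57; a loss of 3.164, which also rounds to 3.16, gives about 23.67, so the reported perplexity of 23.7 is consistent with the loss once rounding is taken into account.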
## License

Apache 2.0. See [LICENSE](https://github.com/ramongougis/WaveletLM/blob/main/LICENSE).