Ramon Gougis committed on
Commit 58f60c2 · verified · 1 Parent(s): b61d3a1

Initial commit

Files changed (1)
  1. README.md +60 -3
README.md CHANGED
@@ -1,3 +1,60 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - Salesforce/wikitext
+ language:
+ - en
+ metrics:
+ - perplexity
+ pipeline_tag: text-generation
+ tags:
+ - wavelet
+ - attention-free
+ - no-attention
+ - attentionfree
+ - FWHT
+ - walsh-hadamard
+ - PKM
+ - product-key-memory
+ - causal-lm
+ - text-generation
+ - sub-quadratic
+ ---
+
+ # WaveletLM
+
+ A fully causal, attention-free language model. Tokens are mixed through a learned lifting wavelet decomposition, a fast Walsh-Hadamard transform (FWHT), per-scale gated spectral mixing, an inverse FWHT, and wavelet reconstruction. Combined with expanded MLPs and sparse product-key memory (PKM), the architecture uses no attention and scales as O(n log n) in sequence length.
+
+ **Full code, training details, ablations, and documentation:**
+ [github.com/ramongougis/WaveletLM](https://github.com/ramongougis/WaveletLM)
+
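To make the O(n log n) claim concrete, here is a minimal pure-Python sketch of a textbook fast Walsh-Hadamard transform. It is not taken from the WaveletLM code base: the function names `fwht` and `inverse_fwht` are hypothetical, and the sketch leaves out everything model-specific (causality handling, learned lifting steps, per-scale gating).

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    # log2(n) stages, each doing n/2 add/subtract butterflies: O(n log n) total.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def inverse_fwht(values):
    """Inverse transform: the same butterflies, followed by division by n."""
    n = len(values)
    return [v / n for v in fwht(values)]
```

Because the Walsh-Hadamard matrix is its own inverse up to a factor of n, `inverse_fwht(fwht(x))` recovers `x`, which is what lets a model mix in the transformed domain and then map back.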
+ ## Results
+
+ | Dataset | Params | Perplexity | Bits per byte (BPB) |
+ |---------|--------|------------|---------------------|
+ | WikiText-103 | 883M | 23.7 | 1.0140 |
+ | PG-19 (1 epoch) | 808M | TBD | TBD |
+
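For readers comparing these two metrics, the sketch below shows the standard conversions from a mean cross-entropy loss. It assumes the loss is measured in nats per token; the helper names and the token and byte counts in the usage lines are hypothetical and are not taken from the WaveletLM evaluation code.

```python
import math


def perplexity_from_loss(mean_nats_per_token):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_nats_per_token)


def bits_per_byte(mean_nats_per_token, num_tokens, num_bytes):
    """Convert the total loss to bits, then normalize by the UTF-8 byte count of the text."""
    total_bits = mean_nats_per_token * num_tokens / math.log(2)
    return total_bits / num_bytes


# Hypothetical counts, for illustration only.
ppl = perplexity_from_loss(3.0)
bpb = bits_per_byte(3.0, num_tokens=1_000, num_bytes=4_500)
```

Perplexity depends on the tokenizer, while BPB normalizes by raw bytes, which is why BPB is the easier number to compare across models with different vocabularies.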
+ ## How to Use
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ # Download the checkpoint
+ ckpt_path = hf_hub_download(repo_id="ragou19/WaveletLM", filename="best_model.pt")
+
+ # Then follow the instructions in the GitHub repo to load and run:
+ # https://github.com/ramongougis/WaveletLM
+ ```
+ ## Architecture
+ See the full <a href="https://github.com/ramongougis/WaveletLM#architecture">architecture documentation</a> on GitHub.
+
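As a rough illustration of what a lifting wavelet decomposition does, the sketch below implements one level of the classic Haar lifting scheme (split, predict, update) in pure Python. The predict and update steps are fixed here, whereas the model description says they are learned; the function names are hypothetical, and this is not the repository's implementation.

```python
def lifting_forward(x):
    """One Haar lifting level on an even-length list: returns (approx, detail)."""
    even, odd = x[0::2], x[1::2]
    # Predict: estimate each odd sample from its even neighbour; keep the residual.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: adjust the even samples so the approximation holds the pairwise mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo the update, then the predict, then interleave the samples."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    x = [0] * (2 * len(even))
    x[0::2] = even
    x[1::2] = odd
    return x
```

The inverse simply replays the steps in reverse with the signs flipped, so reconstruction stays exact whatever predict and update functions are used. That property is what makes it possible to replace the fixed steps with learned ones.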
+ ## Training
+ Trained on a single RTX 5090 for 5 epochs on WikiText-103 (best of 3 seeds: 1337, 42, 7), reaching a best validation loss of 3.16. PG-19 weights are also included (1-epoch run; longer training planned post-release).
+
+ See <a href="https://github.com/ramongougis/WaveletLM/blob/main/runs.md">runs.md</a> for the full training history.
+
+ ## License
+ Apache 2.0. See <a href="https://github.com/ramongougis/WaveletLM/blob/main/LICENSE">LICENSE</a>.
+