vigneshwar234 commited on
Commit
7078228
Β·
verified Β·
1 Parent(s): 2138d01

Add CHANGELOG.md

Browse files
Files changed (1) hide show
  1. CHANGELOG.md +33 -0
CHANGELOG.md ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Changelog
2
+
3
+ ## [1.0.0] β€” 2026-05-19
4
+
5
+ ### Added
6
+ - Initial public release of TemporalMesh Transformer (TMT)
7
+ - `TMTConfig` β€” full hyperparameter dataclass with 5 model scale presets
8
+ - `MeshBuilder` β€” dynamic kNN graph rebuilt every forward pass from cosine similarity
9
+ - `MeshAttention` β€” multi-head attention over sparse graph edges, O(SΒ·k) cost
10
+ - `TemporalPositionEncoder` β€” RoPE + per-token learned decay scalars
11
+ - `ExitGate` β€” per-token confidence scoring with freeze-on-threshold logic
12
+ - `DualStreamFFN` β€” parallel syntax + semantic streams with learned gated fusion
13
+ - `MemoryAnchorCross` β€” 16 persistent EMA key-value anchor vectors
14
+ - `TMTLayer` β€” unified layer assembling all five components
15
+ - `TMTModel` β€” full autoregressive model with tied output projection
16
+ - `TMTOutput` β€” structured output dataclass (logits, exit_masks, confidences, graph_edges, memory_state, decay_scalars)
17
+ - `TMTTrainer` β€” training loop with wandb logging, cosine warmup, checkpoint saving
18
+ - `CosineWarmupScheduler` β€” learning rate schedule
19
+ - `TMTLoss` β€” cross-entropy + 0.1 Γ— gate auxiliary loss
20
+ - Dataset loader for WikiText-2 and TinyStories
21
+ - HuggingFace tokenizer wrapper
22
+ - Full ablation notebooks (01–04)
23
+ - 15-test pytest suite (shapes + forward pass)
24
+ - 20-page publication-quality PDF with 7 figures and 18 equations
25
+ - 5-subset HuggingFace benchmark dataset
26
+ - Zenodo DOI registration
27
+ - GitHub Pages documentation site
28
+
29
+ ### Architecture Details
30
+ - Default: d_model=512, n_heads=8, n_layers=12, graph_k=8, exit_threshold=0.85
31
+ - ~120M parameters (TMT-Base)
32
+ - WikiText-2 val perplexity: 29.4 (vs 42.1 vanilla baseline)
33
+ - Average compute per token: ~48% of full-depth baseline