Wave 02: Tension Attention vs Matched Softmax, 22M Pilot
This repository is a self-contained artifact bundle for the Wave 02 controlled head-to-head pilot in the TS Proof Ranker / TensionLM line of work.
It contains both trained arms, all matched seed checkpoints, the tokenizer used for the run, the minimal model source needed to load the checkpoints, training/evaluation scripts, exact configs, JSON receipts, and the aggregate report.
This is not a new proof-ranker ladder release and it is not v5. It is a controlled substrate-comparison pilot.
Research Question
Does sigmoid tension attention beat a matched softmax baseline when parameters, tokens, schedule, optimizer, tokenizer, and seeds are held fixed?
The tested arms were:
- tension: sigmoid tension attention with tau-mass normalization.
- softmax: matched local-window softmax attention baseline.
Both arms use the same architecture shell:
- vocab size: 32768
- dimension: 256
- layers: 6
- heads: 4
- local window: 8
- context length: 256
- FFN multiplier: 3
- oscillatory modulation: true
- RoPE: false
- global attention layers: 0
- trainable/unique parameters per arm: 13,573,894
The parameter match receipt reports:
- tension params: 13,573,894
- softmax params: 13,573,894
- parameter delta: 0.0000%
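The delta in the receipt is a relative difference between the two parameter counts. A minimal sketch of that check follows, assuming the delta is measured against the softmax count; `param_delta_pct` is a hypothetical helper name, and the exact formula in scripts/h2h_validate_match.py may differ.

```python
def param_delta_pct(tension_params: int, softmax_params: int) -> float:
    """Relative parameter difference in percent (assumed: measured against softmax)."""
    if softmax_params <= 0:
        raise ValueError("softmax_params must be positive")
    return 100.0 * abs(tension_params - softmax_params) / softmax_params


# Counts copied from the parameter match receipt above.
TENSION_PARAMS = 13_573_894
SOFTMAX_PARAMS = 13_573_894

delta = param_delta_pct(TENSION_PARAMS, SOFTMAX_PARAMS)
print(f"tension={TENSION_PARAMS:,} softmax={SOFTMAX_PARAMS:,} delta={delta:.4f}%")
```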
Conservative Result
Verdict: outcome_iii_no_capability_edge
Mean final validation loss across seeds 11, 23, 37:
- tension: 6.287521
- softmax: 6.288084
- delta loss, tension minus softmax: -0.000563
Mean final validation perplexity across seeds 11, 23, 37:
- tension: 537.847747
- softmax: 538.130084
- delta PPL, tension minus softmax: -0.282337
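Tension is numerically lower on the aggregate, but the margin (0.000563 in validation loss) is far smaller than the seed-to-seed spread within either arm (roughly 0.014 for softmax and 0.025 for tension). This pilot does not establish a capability edge. The correct reading is outcome_iii_no_capability_edge.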
Do not describe this as proved, a breakthrough, or a substrate win.
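As a sanity check on the verdict, the per-seed deltas can be recomputed from the values in the Per-Run Metrics table. The sketch below only re-derives numbers already in this README; the variable names are illustrative.

```python
from statistics import mean

# Final validation losses copied from the Per-Run Metrics table.
val_loss = {
    "softmax": {11: 6.287437, 23: 6.295467, 37: 6.281346},
    "tension": {11: 6.283271, 23: 6.301905, 37: 6.277386},
}

# Per-seed delta: tension minus softmax (negative means tension is lower).
per_seed_delta = {
    seed: val_loss["tension"][seed] - val_loss["softmax"][seed]
    for seed in sorted(val_loss["softmax"])
}
mean_delta = mean(per_seed_delta.values())

for seed, delta in per_seed_delta.items():
    print(f"seed {seed}: {delta:+.6f}")
print(f"mean delta: {mean_delta:+.6f}")
```

The sign of the delta is not consistent across seeds: tension is lower on seeds 11 and 37, softmax is lower on seed 23.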
Aggregate Metrics
| arm | seeds | params | mean final train loss | mean final val loss | mean final val PPL | NaN found | total wallclock s |
|---|---|---|---|---|---|---|---|
| softmax | 11, 23, 37 | 13573894 | 6.383899 | 6.288084 | 538.130084 | False | 17278.0 |
| tension | 11, 23, 37 | 13573894 | 6.386063 | 6.287521 | 537.847747 | False | 17038.7 |
Per-Run Metrics
| arm | seed | params | final train loss | final val loss | final val PPL | NaN found | wallclock s |
|---|---|---|---|---|---|---|---|
| softmax | 11 | 13573894 | 6.402395 | 6.287437 | 537.773380 | False | 6044.6 |
| softmax | 23 | 13573894 | 6.317019 | 6.295467 | 542.109107 | False | 5441.6 |
| softmax | 37 | 13573894 | 6.432284 | 6.281346 | 534.507766 | False | 5791.9 |
| tension | 11 | 13573894 | 6.403912 | 6.283271 | 535.537515 | False | 5920.8 |
| tension | 23 | 13573894 | 6.321628 | 6.301905 | 545.610464 | False | 5442.5 |
| tension | 37 | 13573894 | 6.432649 | 6.277386 | 532.395263 | False | 5675.4 |
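The two validation columns are related: each reported perplexity matches exp(final val loss), which is what one would expect if the loss is mean cross-entropy in nats (an inference from the numbers, not a statement from the training script). The sketch below checks that relationship and also shows that the aggregate PPL is the mean of per-run PPLs, not exp of the mean loss.

```python
import math

# (final val loss, final val PPL) pairs copied from the Per-Run Metrics table.
runs = {
    ("softmax", 11): (6.287437, 537.773380),
    ("softmax", 23): (6.295467, 542.109107),
    ("softmax", 37): (6.281346, 534.507766),
    ("tension", 11): (6.283271, 535.537515),
    ("tension", 23): (6.301905, 545.610464),
    ("tension", 37): (6.277386, 532.395263),
}

for (arm, seed), (loss, ppl) in runs.items():
    # Compare exp(loss) with the reported perplexity for each run.
    print(f"{arm:8s} seed {seed}: exp(loss)={math.exp(loss):.3f} reported={ppl:.3f}")

# Aggregate for the tension arm: mean of PPLs versus exp of the mean loss.
tension = [v for (arm, _), v in runs.items() if arm == "tension"]
mean_ppl = sum(ppl for _, ppl in tension) / len(tension)
exp_mean_loss = math.exp(sum(loss for loss, _ in tension) / len(tension))
print(f"mean of PPLs={mean_ppl:.6f} exp(mean loss)={exp_mean_loss:.3f}")
```

Because exp is convex, the mean of per-run perplexities is never below exp of the mean loss, so the two aggregates should not be compared against each other directly.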
What Is In This Repo
README.md
requirements.txt
run_model.py
MANIFEST.json
configs/
  h2h_22m_tension.json
  h2h_22m_softmax.json
  h2h_smoke_tension.json
  h2h_smoke_softmax.json
tokenizer/
  tokenizer.json
checkpoints/
  tension_seed11/model.pt
  tension_seed11/result.json
  tension_seed23/model.pt
  tension_seed23/result.json
  tension_seed37/model.pt
  tension_seed37/result.json
  softmax_seed11/model.pt
  softmax_seed11/result.json
  softmax_seed23/model.pt
  softmax_seed23/result.json
  softmax_seed37/model.pt
  softmax_seed37/result.json
reports/
  wave02_pilot_report.json
  wave02_pilot_report.md
  wave03_interpretability_report.json
  wave03_interpretability_report.md
  wave03_tension_traces.jsonl
  match.json
  fineweb_tokens_meta.json
ts_reasoner/
  tension_attention.py
  softmax_attention.py
  __init__.py
scripts/
  h2h_smoke_train.py
  h2h_validate_match.py
  h2h_wave02_report.py
  h2h_run_wave02_pilot.py
  h2h_fineweb_data.py
  h2h_prepare_wave02.py
  h2h_export_wave02_hf.py
  h2h_upload_wave02_hf.py
  h2h_wave03_interpretability.py
tests/
  test_h2h_arms.py
The FineWeb-Edu parquet shard and generated train/val tensor files are not included because they are large external/reproducible inputs. Their exact provenance and hashes are included below and in reports/fineweb_tokens_meta.json.
Install
Use Python 3.12 if you want to match this receipt closely.
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
The runtime dependencies needed to load and sample from the checkpoints are:
- torch
- tokenizers
huggingface_hub, pyarrow, and safetensors are included in requirements.txt for artifact/reproduction workflows.
Run A Checkpoint
Generate from the tension checkpoint for seed 11:
python run_model.py --arm tension --seed 11 --prompt "Mathematics is" --max_new_tokens 40
Generate from the matched softmax checkpoint for seed 11:
python run_model.py --arm softmax --seed 11 --prompt "Mathematics is" --max_new_tokens 40
Use another seed by changing --seed to 23 or 37.
The loader uses:
- config: configs/h2h_22m_<arm>.json
- checkpoint: checkpoints/<arm>_seed<seed>/model.pt
- tokenizer: tokenizer/tokenizer.json
- model source: ts_reasoner/tension_attention.py and ts_reasoner/softmax_attention.py
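The path templates above can be resolved for any arm and seed with a small helper. This is an illustrative sketch, not the code in run_model.py; `artifact_paths`, `ARMS`, and `SEEDS` are hypothetical names.

```python
from pathlib import Path

ARMS = ("tension", "softmax")
SEEDS = (11, 23, 37)


def artifact_paths(arm: str, seed: int, root: Path = Path(".")) -> dict[str, Path]:
    """Resolve config, checkpoint, and tokenizer paths for one arm/seed pair."""
    if arm not in ARMS:
        raise ValueError(f"unknown arm {arm!r}; expected one of {ARMS}")
    if seed not in SEEDS:
        raise ValueError(f"unknown seed {seed!r}; expected one of {SEEDS}")
    return {
        "config": root / "configs" / f"h2h_22m_{arm}.json",
        "checkpoint": root / "checkpoints" / f"{arm}_seed{seed}" / "model.pt",
        "tokenizer": root / "tokenizer" / "tokenizer.json",
    }


for name, path in artifact_paths("softmax", 23).items():
    print(f"{name}: {path.as_posix()}")
```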
Minimal Loading Code
import json

import torch
from tokenizers import Tokenizer

from ts_reasoner.tension_attention import TensionBackbone, TensionConfig

# Build the model from the exact training config.
with open("configs/h2h_22m_tension.json") as f:
    cfg_raw = json.load(f)
cfg = TensionConfig.from_hf_config(cfg_raw)
model = TensionBackbone(cfg)

# Load the seed-11 weights on CPU; weights_only=True avoids unpickling arbitrary objects.
state = torch.load("checkpoints/tension_seed11/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Encode a prompt and greedily pick the most likely next token.
tok = Tokenizer.from_file("tokenizer/tokenizer.json")
ids = torch.tensor([tok.encode("Mathematics is").ids], dtype=torch.long)
with torch.no_grad():
    logits = model(ids)[0]
next_token_id = int(logits[0, -1].argmax())
print(tok.decode([next_token_id]))
For the softmax arm, import SoftmaxBackbone and load configs/h2h_22m_softmax.json.
Data Receipt
Dataset:
- dataset: HuggingFaceFW/fineweb-edu
- revision: main
- subset: sample/10BT
- shard path: sample/10BT/000_00000.parquet
- source commit: 87f09149ef4734204d70ed1d046ddc9ca3f2b8f9
- parquet SHA-256: b1ba7b2ce4cb5ea6ef42dca40263eabb85f37700d01693a68e9b30a31d78e871
Tokenizer:
- tokenizer source: BoggersTheFish/TensionLM-117M-Curriculum
- tokenizer revision: main
- tokenizer SHA-256: 20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840
Generated tensors:
- max sequences: 200000
- train sequences: 196000
- validation sequences: 4000
- context/block length: 256
- generation command: /home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_fineweb_data.py --max_sequences 200000
- generated at: 2026-05-06T13:53:00.781522+00:00
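Before regenerating the tensors, a downloaded shard or tokenizer file can be checked against the SHA-256 values in this receipt. The sketch below uses only the standard library; `sha256_of_file` and `matches_receipt` are hypothetical helper names, and the local file path in the usage comment is an assumption about where you saved the shard.

```python
import hashlib
from pathlib import Path

# Hashes copied from the Data Receipt above.
EXPECTED_PARQUET_SHA256 = "b1ba7b2ce4cb5ea6ef42dca40263eabb85f37700d01693a68e9b30a31d78e871"
EXPECTED_TOKENIZER_SHA256 = "20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_receipt(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.lower()


# Usage (hypothetical local path):
# matches_receipt(Path("000_00000.parquet"), EXPECTED_PARQUET_SHA256)
```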
Training Commands
Tension runs:
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 11 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 23 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
Softmax runs:
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 11 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 23 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
The runner script used for the multi-seed extension was:
python -m scripts.h2h_run_wave02_pilot --seeds 23,37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
Seed 11 was run first as the one-seed pilot with the same schedule.
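For scale, the schedule above implies a small token budget per run. The arithmetic below is a rough sketch under stated assumptions: batch size 32 for every run (the seed 11 commands do not pass --batch_size, so this assumes the script default matches), one batch per optimizer step with no gradient accumulation, and full-length 256-token sequences.

```python
steps = 1000
batch_size = 32            # assumption: applies to all seeds, including seed 11
block_len = 256            # context/block length from the Data Receipt
train_sequences = 196_000  # train sequences from the Data Receipt

tokens_seen = steps * batch_size * block_len
tokens_available = train_sequences * block_len
epoch_fraction = tokens_seen / tokens_available

print(f"tokens seen per run: {tokens_seen:,}")
print(f"tokens available:    {tokens_available:,}")
print(f"fraction of one pass over the train split: {epoch_fraction:.1%}")
```

Under these assumptions each run sees about 8.2M tokens, roughly 16% of one pass over the training split.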
Validation
The local repository test suite passed after the Wave 03 export update:
52 passed
The match validator passed after the Wave 02 run:
match valid: tension=13,573,894 softmax=13,573,894 delta=0.0000%
No NaNs were found in any final run receipt.
Wave 03 Interpretability Pilot
Verdict: interpretability_pilot_receipt_no_capability_claim
Wave 03 is an interpretability and field-faithfulness receipt over the trained Wave 02 checkpoints. It is not a capability claim and it does not change the Wave 02 verdict.
Protocol:
- Extract local token-edge fields from each trained checkpoint.
- Select high-field source tokens for each validation sequence.
- Replace those source tokens with token id 0.
- Compare loss increase against same-count random source-token replacement.
- Skip early partial local-window rows so one-neighbor contexts do not dominate the metric.
The key metric is top-random: how much more loss rises when high-field source tokens are ablated than when random source tokens are ablated. Positive values mean the exposed field identifies source tokens that matter more than random under this ablation protocol.
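The protocol above can be sketched on a toy example. This is not the code in scripts/h2h_wave03_interpretability.py: `top_minus_random`, `toy_loss`, and the per-token `field` scores are hypothetical stand-ins, the loss function is a toy rather than a model forward pass, and the skipping of early partial local-window rows is not modelled.

```python
import random
from typing import Callable, Sequence


def top_minus_random(
    tokens: Sequence[int],
    field: Sequence[float],
    loss_fn: Callable[[Sequence[int]], float],
    k: int,
    rng: random.Random,
    replacement_id: int = 0,
) -> float:
    """Loss increase from ablating the k highest-field tokens minus k random tokens."""
    positions = list(range(len(tokens)))
    top = sorted(positions, key=lambda i: field[i], reverse=True)[:k]
    rand = rng.sample(positions, k)  # same count as the top-k ablation

    def ablate(indices: Sequence[int]) -> list[int]:
        out = list(tokens)
        for i in indices:
            out[i] = replacement_id
        return out

    baseline = loss_fn(tokens)
    top_increase = loss_fn(ablate(top)) - baseline
    random_increase = loss_fn(ablate(rand)) - baseline
    return top_increase - random_increase


# Toy setup: each position has a hidden importance; replacing it with id 0 adds that much loss.
importance = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3]


def toy_loss(toks: Sequence[int]) -> float:
    return sum(w for t, w in zip(toks, importance) if t == 0)


tokens = [5, 7, 9, 11, 13, 15]  # all non-zero, so the baseline toy loss is 0
score = top_minus_random(tokens, importance, toy_loss, k=2, rng=random.Random(0))
print(f"top-random on the toy example: {score:.3f}")
```

In the toy case the field equals the true importance, so the score cannot be negative; with a real checkpoint the field is only an estimate, which is why the reported metric is averaged over validation sequences and seeds.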
| arm | seeds | baseline loss | top increase | random increase | top-random | top>random frac | entropy | top1 share | raw mass |
|---|---|---|---|---|---|---|---|---|---|
| softmax | 11, 23, 37 | 6.248986 | 0.216230 | 0.103134 | 0.113097 | 0.986111 | 0.691024 | 0.441662 | 1.000000 |
| tension | 11, 23, 37 | 6.247036 | 0.186365 | 0.108120 | 0.078244 | 0.972222 | 0.798693 | 0.300620 | 3.364441 |
Delta top-minus-random, tension minus softmax: -0.034852
Conservative interpretation:
- Both exposed fields are non-random under this ablation protocol.
- Softmax is sharper here: top1 share 0.441662 versus tension 0.300620.
- Softmax has the stronger top-token ablation receipt here: top-random 0.113097 versus tension 0.078244.
- Tension is more distributed here: normalized entropy 0.798693 versus softmax 0.691024.
- Tension exposes raw field mass greater than one (3.364441), unlike softmax's normalized mass of 1.000000.
The safe reading is: tension has an inspectable, distributed edge field, but this pilot does not establish an interpretability advantage over the matched softmax arm.
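To make the three field statistics concrete, the sketch below computes them for one row of attention weights. The definitions are assumptions, not taken from the Wave 03 script: raw mass is the sum of the weights, top1 share is the largest weight divided by that sum, and normalized entropy is Shannon entropy of the renormalized weights divided by log of the row length. `field_stats` and the example rows are illustrative.

```python
import math
from typing import Sequence


def field_stats(weights: Sequence[float]) -> dict[str, float]:
    """Raw mass, top1 share, and normalized entropy for one row of non-negative weights."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be a non-empty sequence of non-negative numbers")
    mass = sum(weights)
    if mass <= 0:
        raise ValueError("weights must have positive total mass")
    probs = [w / mass for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # A single-element row has zero entropy; avoid dividing by log(1) == 0.
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    return {
        "raw_mass": mass,
        "top1_share": max(probs),
        "normalized_entropy": entropy / max_entropy,
    }


softmax_like = [0.70, 0.10, 0.10, 0.10]  # softmax rows sum to 1 by construction
sigmoid_like = [0.90, 0.80, 0.70, 0.60]  # independent sigmoid gates can sum above 1

for name, row in (("softmax-like", softmax_like), ("sigmoid-like", sigmoid_like)):
    stats = field_stats(row)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The example rows are made up to show the qualitative pattern in the table: a peaked normalized row has higher top1 share and lower entropy, while independent sigmoid gates can carry total mass above one.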
Wave 03 files:
- reports/wave03_interpretability_report.json
- reports/wave03_interpretability_report.md
- reports/wave03_tension_traces.jsonl
- scripts/h2h_wave03_interpretability.py
Environment
Aggregate report environment:
- Python: 3.12.3
- platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
- torch: 2.11.0+cu130
- CUDA available: False
- device: cpu
Export environment:
- Python: 3.12.3
- platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
- exported at: 2026-05-07T11:09:33.643460+00:00
Interpretation Rules
This artifact should be read conservatively.
- Matched parameters are required before comparison.
- Matched tokenizer, data tensors, optimizer, seed set, batch size, schedule, warmup, and eval cadence are required before comparison.
- Any NaN or unmatched arm invalidates the comparison.
- A numerically lower validation loss is not, by itself, a substrate capability claim.
- This pilot result is best summarized as no clear capability edge.
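Some of these rules can be expressed as a mechanical gate over the run receipts. The sketch below checks only three of them (identical parameter counts, identical seed sets, no NaN in the final validation loss); it does not verify tokenizer, data, optimizer, or schedule matching. `RunReceipt` and `comparison_is_valid` are hypothetical names, and the fields are not the schema of the result.json files.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RunReceipt:
    arm: str
    seed: int
    params: int
    final_val_loss: float


def comparison_is_valid(runs: Iterable[RunReceipt]) -> bool:
    """True when both arms are present with equal params, equal seed sets, and no NaN."""
    runs = list(runs)
    by_arm: dict[str, list[RunReceipt]] = {}
    for run in runs:
        by_arm.setdefault(run.arm, []).append(run)
    if set(by_arm) != {"tension", "softmax"}:
        return False
    if any(math.isnan(run.final_val_loss) for run in runs):
        return False
    if len({run.params for run in runs}) != 1:
        return False
    tension_seeds = sorted(run.seed for run in by_arm["tension"])
    softmax_seeds = sorted(run.seed for run in by_arm["softmax"])
    return tension_seeds == softmax_seeds


# Values copied from the Per-Run Metrics table.
receipts = [
    RunReceipt("softmax", 11, 13_573_894, 6.287437),
    RunReceipt("softmax", 23, 13_573_894, 6.295467),
    RunReceipt("softmax", 37, 13_573_894, 6.281346),
    RunReceipt("tension", 11, 13_573_894, 6.283271),
    RunReceipt("tension", 23, 13_573_894, 6.301905),
    RunReceipt("tension", 37, 13_573_894, 6.277386),
]
print("comparison valid:", comparison_is_valid(receipts))
```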
Known Limitations
- This is a small-scale pilot (13,573,894 trainable/unique parameters per arm, run from the h2h_22m configs), not a large-scale language model result.
- It uses three seeds, which is useful as a receipt but still limited.
- All runs were produced on CPU (no CUDA available) and are slow: about 1.5 to 1.7 hours of wallclock per 1000-step run.
- The data sample is one pinned FineWeb-Edu shard transformed into 200,000 fixed-length sequences.
- The result probes next-token modeling under this exact schedule; it does not by itself establish general reasoning ability.
- The included checkpoints are research artifacts, not production chat/instruction models.
Citation
If you refer to this artifact, cite it as a controlled Wave 02 pilot comparing sigmoid tension attention and matched softmax attention in the TS Proof Ranker / TensionLM repo, with the verdict outcome_iii_no_capability_edge.
Repository ID: BoggersTheFish/TensionLM-Wave02-22M-H2H