Wave 02: Tension Attention vs Matched Softmax, 22M Pilot

This repository is a self-contained artifact bundle for the Wave 02 controlled head-to-head pilot in the TS Proof Ranker / TensionLM line of work.

It contains both trained arms, all matched seed checkpoints, the tokenizer used for the run, the minimal model source needed to load the checkpoints, training/evaluation scripts, exact configs, JSON receipts, and the aggregate report.

This is not a new proof-ranker ladder release and it is not v5. It is a controlled substrate-comparison pilot.

Research Question

Does sigmoid tension attention beat a matched softmax baseline when parameters, tokens, schedule, optimizer, tokenizer, and seeds are held fixed?

The tested arms were:

tension: sigmoid tension attention with tau-mass normalization.
softmax: matched local-window softmax attention baseline.

Both arms use the same architecture shell:

vocab size: 32768
dimension: 256
layers: 6
heads: 4
local window: 8
context length: 256
FFN multiplier: 3
oscillatory modulation: true
RoPE: false
global attention layers: 0
trainable/unique parameters per arm: 13,573,894

The parameter match receipt reports:

tension params: 13,573,894
softmax params: 13,573,894
parameter delta: 0.0000%
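
The reported delta can be reproduced from the two counts. The sketch below assumes the delta is the absolute difference relative to the softmax count; the function name is hypothetical, and the formula actually used by scripts/h2h_validate_match.py may differ.

```python
def param_delta_pct(tension_params: int, softmax_params: int) -> float:
    """Relative parameter difference in percent (assumed definition)."""
    return abs(tension_params - softmax_params) / softmax_params * 100.0


# Counts copied from the match receipt above.
delta = param_delta_pct(13_573_894, 13_573_894)
print(f"parameter delta: {delta:.4f}%")  # prints "parameter delta: 0.0000%"
```

With a loaded PyTorch model, the counts themselves are commonly obtained with `sum(p.numel() for p in model.parameters())`.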

Conservative Result

Verdict: outcome_iii_no_capability_edge

Mean final validation loss across seeds 11, 23, 37:

tension: 6.287521
softmax: 6.288084
delta loss, tension minus softmax: -0.000563

Mean final validation perplexity across seeds 11, 23, 37:

tension: 537.847747
softmax: 538.130084
delta PPL, tension minus softmax: -0.282337

Tension is numerically lower on the aggregate by a very small margin, but this pilot does not establish a capability edge. The correct reading is outcome_iii_no_capability_edge.

Do not describe this as proved, a breakthrough, or a substrate win.

Aggregate Metrics

arm	seeds	params	mean final train loss	mean final val loss	mean final val PPL	NaN found	total wallclock s
softmax	11, 23, 37	13573894	6.383899	6.288084	538.130084	False	17278.0
tension	11, 23, 37	13573894	6.386063	6.287521	537.847747	False	17038.7

Per-Run Metrics

arm	seed	params	final train loss	final val loss	final val PPL	NaN found	wallclock s
softmax	11	13573894	6.402395	6.287437	537.773380	False	6044.6
softmax	23	13573894	6.317019	6.295467	542.109107	False	5441.6
softmax	37	13573894	6.432284	6.281346	534.507766	False	5791.9
tension	11	13573894	6.403912	6.283271	535.537515	False	5920.8
tension	23	13573894	6.321628	6.301905	545.610464	False	5442.5
tension	37	13573894	6.432649	6.277386	532.395263	False	5675.4
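
The aggregate figures can be rechecked from the per-run table. The sketch below recomputes the mean validation loss, the mean perplexity, and the paired per-seed deltas. It assumes perplexity is exp(loss) per run and that the aggregate PPL is the mean of the per-run PPLs, which is consistent with the tabulated values. Because the inputs are rounded to six decimals, the last digit can differ from the receipts.

```python
import math
from statistics import mean

# Final validation losses copied from the per-run table (seed -> loss).
val_loss = {
    "softmax": {11: 6.287437, 23: 6.295467, 37: 6.281346},
    "tension": {11: 6.283271, 23: 6.301905, 37: 6.277386},
}

# Mean loss per arm, and mean of per-run perplexities per arm.
mean_loss = {arm: mean(list(runs.values())) for arm, runs in val_loss.items()}
mean_ppl = {
    arm: mean([math.exp(v) for v in runs.values()])
    for arm, runs in val_loss.items()
}

# Paired per-seed deltas (tension minus softmax); negative favors tension.
paired = {s: val_loss["tension"][s] - val_loss["softmax"][s] for s in (11, 23, 37)}

for arm in ("tension", "softmax"):
    print(f"{arm}: mean val loss {mean_loss[arm]:.6f}, mean val PPL {mean_ppl[arm]:.3f}")
for seed, d in paired.items():
    print(f"seed {seed}: delta loss {d:+.6f}")
print(f"mean delta loss: {mean(list(paired.values())):+.6f}")
```

The paired deltas are negative for seeds 11 and 37 and positive for seed 23, and each per-seed delta is several times larger in magnitude than the mean delta. That spread is consistent with reading the aggregate difference as no capability edge.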

What Is In This Repo

README.md
requirements.txt
run_model.py
MANIFEST.json
configs/
  h2h_22m_tension.json
  h2h_22m_softmax.json
  h2h_smoke_tension.json
  h2h_smoke_softmax.json
tokenizer/
  tokenizer.json
checkpoints/
  tension_seed11/model.pt
  tension_seed11/result.json
  tension_seed23/model.pt
  tension_seed23/result.json
  tension_seed37/model.pt
  tension_seed37/result.json
  softmax_seed11/model.pt
  softmax_seed11/result.json
  softmax_seed23/model.pt
  softmax_seed23/result.json
  softmax_seed37/model.pt
  softmax_seed37/result.json
reports/
  wave02_pilot_report.json
  wave02_pilot_report.md
  wave03_interpretability_report.json
  wave03_interpretability_report.md
  wave03_tension_traces.jsonl
  match.json
  fineweb_tokens_meta.json
ts_reasoner/
  tension_attention.py
  softmax_attention.py
  __init__.py
scripts/
  h2h_smoke_train.py
  h2h_validate_match.py
  h2h_wave02_report.py
  h2h_run_wave02_pilot.py
  h2h_fineweb_data.py
  h2h_prepare_wave02.py
  h2h_export_wave02_hf.py
  h2h_upload_wave02_hf.py
  h2h_wave03_interpretability.py
tests/
  test_h2h_arms.py

The FineWeb-Edu parquet shard and generated train/val tensor files are not included because they are large external/reproducible inputs. Their exact provenance and hashes are included below and in reports/fineweb_tokens_meta.json.

Install

Use Python 3.12 if you want to match this receipt closely.

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

The runtime dependencies needed to load and sample from the checkpoints are:

torch
tokenizers

huggingface_hub, pyarrow, and safetensors are included in requirements.txt for artifact/reproduction workflows.

Run A Checkpoint

Generate from the tension checkpoint for seed 11:

python run_model.py --arm tension --seed 11 --prompt "Mathematics is" --max_new_tokens 40

Generate from the matched softmax checkpoint for seed 11:

python run_model.py --arm softmax --seed 11 --prompt "Mathematics is" --max_new_tokens 40

Use another seed by changing --seed to 23 or 37.

The loader uses:

config: configs/h2h_22m_<arm>.json
checkpoint: checkpoints/<arm>_seed<seed>/model.pt
tokenizer: tokenizer/tokenizer.json
model source: ts_reasoner/tension_attention.py and ts_reasoner/softmax_attention.py

Minimal Loading Code

import json

import torch
from tokenizers import Tokenizer

from ts_reasoner.tension_attention import TensionBackbone, TensionConfig

# Build the model from the exact training config.
with open("configs/h2h_22m_tension.json") as f:
    cfg = TensionConfig.from_hf_config(json.load(f))
model = TensionBackbone(cfg)

# weights_only=True restricts unpickling to tensors and other safe types.
state = torch.load("checkpoints/tension_seed11/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Encode the prompt as a batch containing one sequence.
tok = Tokenizer.from_file("tokenizer/tokenizer.json")
ids = torch.tensor([tok.encode("Mathematics is").ids], dtype=torch.long)

# The first element of the model output holds the logits.
with torch.no_grad():
    logits = model(ids)[0]

# Greedy choice for the token that follows the prompt.
next_token_id = int(logits[0, -1].argmax())
print(tok.decode([next_token_id]))

For the softmax arm, import SoftmaxBackbone instead, load configs/h2h_22m_softmax.json, and point torch.load at checkpoints/softmax_seed<seed>/model.pt.
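
The loading code stops at a single next token. A minimal sketch of extending it to greedy multi-token decoding is shown below. The helper names and the toy scorer are hypothetical, and whether run_model.py decodes greedily or samples is not stated in this README. The window is cropped to the 256-token context length listed in the architecture shell.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    context_len: int = 256,
) -> List[int]:
    """Append the argmax token max_new_tokens times, cropping to the context window."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_len:]   # keep at most context_len tokens
        logits = next_logits(window)  # scores for the next token
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
    return ids


# Stand-in scorer so the sketch runs without a checkpoint: it always
# prefers (last token + 1) modulo a toy vocabulary of 5.
def toy_next_logits(window: Sequence[int]) -> List[float]:
    target = (window[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_next_logits, [3], max_new_tokens=4))  # prints [3, 4, 0, 1, 2]
```

With a loaded checkpoint, next_logits could wrap the model call from the loading code, for example `lambda w: model(torch.tensor([w], dtype=torch.long))[0][0, -1].tolist()` inside a `torch.no_grad()` block.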

Data Receipt

Dataset:

dataset: HuggingFaceFW/fineweb-edu
revision: main
subset: sample/10BT
shard path: sample/10BT/000_00000.parquet
source commit: 87f09149ef4734204d70ed1d046ddc9ca3f2b8f9
parquet SHA-256: b1ba7b2ce4cb5ea6ef42dca40263eabb85f37700d01693a68e9b30a31d78e871

Tokenizer:

tokenizer source: BoggersTheFish/TensionLM-117M-Curriculum
tokenizer revision: main
tokenizer SHA-256: 20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840
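
The hashes above can be checked locally with the standard library. The sketch below streams a file through SHA-256 in chunks, so it also works for the large parquet shard. It assumes the tokenizer hash refers to the bundled tokenizer/tokenizer.json; the helper name is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected value copied from the tokenizer receipt above.
EXPECTED_TOKENIZER_SHA256 = (
    "20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840"
)

tokenizer_path = Path("tokenizer/tokenizer.json")
if tokenizer_path.exists():
    actual = sha256_file(str(tokenizer_path))
    print("tokenizer hash matches:", actual == EXPECTED_TOKENIZER_SHA256)
else:
    print("tokenizer/tokenizer.json not found; run from the repository root")
```

The same function applies to a downloaded copy of sample/10BT/000_00000.parquet, compared against the parquet SHA-256 listed above.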

Generated tensors:

max sequences: 200000
train sequences: 196000
validation sequences: 4000
context/block length: 256
generation command: /home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_fineweb_data.py --max_sequences 200000
generated at: 2026-05-06T13:53:00.781522+00:00

Training Commands

Tension runs:

/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 11 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 23 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32

Softmax runs:

/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 11 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 23 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32

The runner script used for the multi-seed extension was:

python -m scripts.h2h_run_wave02_pilot --seeds 23,37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50

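Seed 11 was run first as the one-seed pilot with the same schedule. Its recorded commands do not pass --batch_size, so the batch size for those two runs came from the script or config default rather than from an explicit flag.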
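
The schedule flags imply a rough token budget per run. The sketch below assumes each optimizer step consumes one batch of 32 full 256-token sequences with no gradient accumulation. The seed-11 commands do not pass --batch_size, so 32 is an assumption for those runs.

```python
# Values from the data receipt and the training commands.
CONTEXT_LEN = 256
TRAIN_SEQUENCES = 196_000
STEPS = 1_000
BATCH_SIZE = 32  # assumption for seed 11, explicit for seeds 23 and 37

train_tokens = TRAIN_SEQUENCES * CONTEXT_LEN
tokens_seen = STEPS * BATCH_SIZE * CONTEXT_LEN

print(f"train tokens available: {train_tokens:,}")  # 50,176,000
print(f"tokens seen per run:    {tokens_seen:,}")   # 8,192,000
print(f"fraction of train set:  {tokens_seen / train_tokens:.1%}")  # 16.3%
```

Under these assumptions each run sees well under one pass over the training tensors, which fits the description of this artifact as a short pilot.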

Validation

The local repository test suite passed after the Wave 03 export update:

52 passed

The match validator passed after the Wave 02 run:

match valid: tension=13,573,894 softmax=13,573,894 delta=0.0000%

No NaNs were found in any final run receipt.

Wave 03 Interpretability Pilot

Verdict: interpretability_pilot_receipt_no_capability_claim

Wave 03 is an interpretability and field-faithfulness receipt over the trained Wave 02 checkpoints. It is not a capability claim and it does not change the Wave 02 verdict.

Protocol:

Extract local token-edge fields from each trained checkpoint.
Select high-field source tokens for each validation sequence.
Replace those source tokens with token id 0.
Compare loss increase against same-count random source-token replacement.
Skip early partial local-window rows so one-neighbor contexts do not dominate the metric.

The key metric is top-random: how much more loss rises when high-field source tokens are ablated than when random source tokens are ablated. Positive values mean the exposed field identifies source tokens that matter more than random under this ablation protocol.

arm	seeds	baseline loss	top increase	random increase	top-random	top>random frac	entropy	top1 share	raw mass
softmax	11, 23, 37	6.248986	0.216230	0.103134	0.113097	0.986111	0.691024	0.441662	1.000000
tension	11, 23, 37	6.247036	0.186365	0.108120	0.078244	0.972222	0.798693	0.300620	3.364441

Delta top-minus-random, tension minus softmax: -0.034852
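
The top-random metric and the top>random fraction can be sketched as follows. This assumes both are aggregated over per-sequence loss increases; the actual aggregation in scripts/h2h_wave03_interpretability.py may differ, and the input numbers are hypothetical, not taken from the report.

```python
from statistics import mean
from typing import Sequence, Tuple


def top_minus_random(
    top_increase: Sequence[float], random_increase: Sequence[float]
) -> Tuple[float, float]:
    """Return (mean top-random gap, fraction of sequences where top > random)."""
    if len(top_increase) != len(random_increase) or not top_increase:
        raise ValueError("inputs must be non-empty and of equal length")
    gaps = [t - r for t, r in zip(top_increase, random_increase)]
    frac = sum(g > 0 for g in gaps) / len(gaps)
    return mean(gaps), frac


# Hypothetical per-sequence loss increases, for illustration only.
top = [0.25, 0.5, 0.125, 0.375]    # loss rise after ablating high-field tokens
rand = [0.125, 0.25, 0.25, 0.125]  # loss rise after ablating random tokens

gap, frac = top_minus_random(top, rand)
print(f"top-random: {gap:.6f}, top>random frac: {frac:.2f}")
# prints "top-random: 0.125000, top>random frac: 0.75"
```

A positive gap with a fraction near 1, as both arms show in the table, means the high-field ablation hurts more than the random one on almost every sequence.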

Conservative interpretation:

Both exposed fields are non-random under this ablation protocol.
Softmax is sharper here: top1 share 0.441662 versus tension 0.300620.
Softmax has the stronger top-token ablation receipt here: top-random 0.113097 versus tension 0.078244.
Tension is more distributed here: normalized entropy 0.798693 versus softmax 0.691024.
Tension exposes raw field mass greater than one (3.364441), unlike softmax's normalized mass of 1.000000.

The safe reading is: tension has an inspectable, distributed edge field, but this pilot does not establish an interpretability advantage over the matched softmax arm.
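
The three field statistics compared above can be illustrated on a single row of edge weights. The definitions here are assumptions: raw mass is the plain sum of the weights, top1 share is the largest weight divided by that sum, and normalized entropy is the Shannon entropy of the mass-normalized weights divided by the log of the row length. The report script may define them differently, and the example rows are hypothetical.

```python
import math
from typing import Dict, Sequence


def field_stats(weights: Sequence[float]) -> Dict[str, float]:
    """Summarize one row of non-negative edge weights over a local window."""
    raw_mass = sum(weights)
    if raw_mass <= 0:
        raise ValueError("weights must have positive total mass")
    probs = [w / raw_mass for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    norm_entropy = entropy / math.log(len(weights)) if len(weights) > 1 else 0.0
    return {
        "raw_mass": raw_mass,
        "top1_share": max(probs),
        "normalized_entropy": norm_entropy,
    }


# Hypothetical rows, not taken from the trained checkpoints.
softmax_like = [0.5, 0.25, 0.125, 0.125]  # normalized row, sums to 1
sigmoid_like = [0.9, 0.8, 0.7, 0.6]       # independent gates, sum can exceed 1

soft_stats = field_stats(softmax_like)
sig_stats = field_stats(sigmoid_like)
print(soft_stats)
print(sig_stats)
```

In this toy case the sigmoid-like row has raw mass near 3, a lower top1 share, and a higher normalized entropy than the softmax-like row, which mirrors the qualitative pattern in the table without reproducing its numbers.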

Wave 03 files:

reports/wave03_interpretability_report.json
reports/wave03_interpretability_report.md
reports/wave03_tension_traces.jsonl
scripts/h2h_wave03_interpretability.py

Environment

Aggregate report environment:

Python: 3.12.3
platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
torch: 2.11.0+cu130
CUDA available: False
device: cpu

Export environment:

Python: 3.12.3
platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
exported at: 2026-05-07T11:09:33.643460+00:00

Interpretation Rules

This artifact should be read conservatively.

Matched parameters are required before comparison.
Matched tokenizer, data tensors, optimizer, seed set, batch size, schedule, warmup, and eval cadence are required before comparison.
Any NaN or unmatched arm invalidates the comparison.
A numerically lower validation loss is not, by itself, a substrate capability claim.
This pilot result is best summarized as no clear capability edge.
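
A subset of these rules can be expressed as a gate that runs before any comparison is reported. The sketch below checks only parameter counts, seed sets, and NaN losses. The receipt field names are hypothetical and may not match the keys in the bundled result.json files, and the tokenizer, data, optimizer, and schedule checks are left out.

```python
import math
from typing import Dict, List


def comparison_is_valid(receipts: List[Dict]) -> bool:
    """Gate a two-arm comparison: matched params, matched seed sets, no NaN loss."""
    by_arm: Dict[str, List[Dict]] = {}
    for r in receipts:
        by_arm.setdefault(r["arm"], []).append(r)
    if len(by_arm) != 2:
        return False
    param_counts = {r["params"] for r in receipts}
    seed_sets = [sorted(r["seed"] for r in runs) for runs in by_arm.values()]
    no_nan = all(not math.isnan(r["final_val_loss"]) for r in receipts)
    return len(param_counts) == 1 and seed_sets[0] == seed_sets[1] and no_nan


# Seed-11 values copied from the per-run table; keys are illustrative.
receipts = [
    {"arm": "tension", "seed": 11, "params": 13_573_894, "final_val_loss": 6.283271},
    {"arm": "softmax", "seed": 11, "params": 13_573_894, "final_val_loss": 6.287437},
]
print(comparison_is_valid(receipts))  # prints True

# An extra tension run with a NaN loss and no softmax counterpart fails the gate.
broken = receipts + [
    {"arm": "tension", "seed": 23, "params": 13_573_894, "final_val_loss": float("nan")}
]
print(comparison_is_valid(broken))  # prints False
```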

Known Limitations

This is a small pilot labeled "22M" (each arm has 13,573,894 trainable parameters), not a large-scale language model result.
It uses three seeds, which is useful as a receipt but still limited.
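All runs were produced on CPU (CUDA was not available) and are slow: each 1,000-step run took between about 5,440 and 6,045 seconds of wallclock.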
The data sample is one pinned FineWeb-Edu shard transformed into 200,000 fixed-length sequences.
The result probes next-token modeling under this exact schedule; it does not by itself establish general reasoning ability.
The included checkpoints are research artifacts, not production chat/instruction models.

Citation

If you refer to this artifact, cite it as a controlled Wave 02 pilot comparing sigmoid tension attention and matched softmax attention in the TS Proof Ranker / TensionLM repo, with the verdict outcome_iii_no_capability_edge.

Repository ID: BoggersTheFish/TensionLM-Wave02-22M-H2H
