Wave 02: Tension Attention vs Matched Softmax, 22M Pilot

This repository is a self-contained artifact bundle for the Wave 02 controlled head-to-head pilot in the TS Proof Ranker / TensionLM line of work.

It contains both trained arms, all matched seed checkpoints, the tokenizer used for the run, the minimal model source needed to load the checkpoints, training/evaluation scripts, exact configs, JSON receipts, and the aggregate report.

This is not a new proof-ranker ladder release and it is not v5. It is a controlled substrate-comparison pilot.

Research Question

Does sigmoid tension attention beat a matched softmax baseline when parameters, tokens, schedule, optimizer, tokenizer, and seeds are held fixed?

The tested arms were:

  • tension: sigmoid tension attention with tau-mass normalization.
  • softmax: matched local-window softmax attention baseline.

Both arms use the same architecture shell:

  • vocab size: 32768
  • dimension: 256
  • layers: 6
  • heads: 4
  • local window: 8
  • context length: 256
  • FFN multiplier: 3
  • oscillatory modulation: true
  • RoPE: false
  • global attention layers: 0
  • trainable/unique parameters per arm: 13,573,894

The parameter match receipt reports:

  • tension params: 13,573,894
  • softmax params: 13,573,894
  • parameter delta: 0.0000%
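A minimal sketch of how a match check of this kind could be computed. It assumes the delta is the absolute difference relative to the larger count; that definition, the function names, and the tolerance argument are illustrative assumptions, not the logic of scripts/h2h_validate_match.py.

```python
def param_delta_pct(params_a: int, params_b: int) -> float:
    """Absolute parameter difference as a percentage of the larger count (assumed definition)."""
    larger = max(params_a, params_b)
    if larger == 0:
        return 0.0
    return abs(params_a - params_b) / larger * 100.0


def is_matched(params_a: int, params_b: int, tol_pct: float = 0.0) -> bool:
    """True when the two arms differ by no more than tol_pct percent (hypothetical tolerance)."""
    return param_delta_pct(params_a, params_b) <= tol_pct


tension_params = 13_573_894  # from the receipt above
softmax_params = 13_573_894  # from the receipt above
delta = param_delta_pct(tension_params, softmax_params)
print(f"tension={tension_params:,} softmax={softmax_params:,} delta={delta:.4f}%")
```

With a tolerance of zero, any single-parameter difference between the arms would fail the check.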

Conservative Result

Verdict: outcome_iii_no_capability_edge

Mean final validation loss across seeds 11, 23, 37:

  • tension: 6.287521
  • softmax: 6.288084
  • delta loss, tension minus softmax: -0.000563

Mean final validation perplexity across seeds 11, 23, 37:

  • tension: 537.847747
  • softmax: 538.130084
  • delta PPL, tension minus softmax: -0.282337

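Tension is numerically lower on the aggregate by a very small margin (0.000563 in loss), and the sign is not consistent across seeds: tension is lower on seeds 11 and 37 but higher on seed 23. This pilot therefore does not establish a capability edge. The correct reading is outcome_iii_no_capability_edge.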

Do not describe this as proved, a breakthrough, or a substrate win.
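One way to see why the aggregate margin should be read conservatively is to compare it against the seed-to-seed spread of the paired difference. The sketch below uses the final validation losses from the Per-Run Metrics section of this document. The choice of a sample standard deviation over three seeds is an illustrative assumption, not part of the original report.

```python
from statistics import mean, stdev

# Final validation losses per seed, copied from the Per-Run Metrics section.
val_loss = {
    "tension": {11: 6.283271, 23: 6.301905, 37: 6.277386},
    "softmax": {11: 6.287437, 23: 6.295467, 37: 6.281346},
}

seeds = sorted(val_loss["tension"])
# Paired difference per seed: tension minus softmax.
paired_delta = {s: val_loss["tension"][s] - val_loss["softmax"][s] for s in seeds}

mean_delta = mean(paired_delta.values())
spread = stdev(paired_delta.values())  # sample standard deviation over 3 seeds

for s in seeds:
    print(f"seed {s}: tension - softmax = {paired_delta[s]:+.6f}")
print(f"mean delta = {mean_delta:+.6f}, sample std = {spread:.6f}")
```

The spread of the paired difference is about ten times the mean difference, which is consistent with the no-edge verdict.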

Aggregate Metrics

| arm | seeds | params | mean final train loss | mean final val loss | mean final val PPL | NaN found | total wallclock (s) |
|---|---|---|---|---|---|---|---|
| softmax | 11, 23, 37 | 13573894 | 6.383899 | 6.288084 | 538.130084 | False | 17278.0 |
| tension | 11, 23, 37 | 13573894 | 6.386063 | 6.287521 | 537.847747 | False | 17038.7 |

Per-Run Metrics

| arm | seed | params | final train loss | final val loss | final val PPL | NaN found | wallclock (s) |
|---|---|---|---|---|---|---|---|
| softmax | 11 | 13573894 | 6.402395 | 6.287437 | 537.773380 | False | 6044.6 |
| softmax | 23 | 13573894 | 6.317019 | 6.295467 | 542.109107 | False | 5441.6 |
| softmax | 37 | 13573894 | 6.432284 | 6.281346 | 534.507766 | False | 5791.9 |
| tension | 11 | 13573894 | 6.403912 | 6.283271 | 535.537515 | False | 5920.8 |
| tension | 23 | 13573894 | 6.321628 | 6.301905 | 545.610464 | False | 5442.5 |
| tension | 37 | 13573894 | 6.432649 | 6.277386 | 532.395263 | False | 5675.4 |
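The aggregate rows can be re-derived from the per-run values above. The sketch below recomputes the means and checks that each per-run perplexity is consistent with exp(final val loss). The dictionary layout and variable names are illustrative and are not the schema of the JSON receipts.

```python
import math
from statistics import mean

# (final val loss, final val PPL) per seed, copied from the per-run rows.
runs = {
    "softmax": {11: (6.287437, 537.773380), 23: (6.295467, 542.109107), 37: (6.281346, 534.507766)},
    "tension": {11: (6.283271, 535.537515), 23: (6.301905, 545.610464), 37: (6.277386, 532.395263)},
}

summary = {}
for arm, per_seed in runs.items():
    losses = [loss for loss, _ in per_seed.values()]
    ppls = [ppl for _, ppl in per_seed.values()]
    # Largest gap between exp(loss) and the reported per-run perplexity.
    max_gap = max(abs(math.exp(loss) - ppl) for loss, ppl in per_seed.values())
    summary[arm] = {
        "mean_loss": mean(losses),
        "mean_ppl": mean(ppls),                   # arithmetic mean of per-run PPL
        "exp_mean_loss": math.exp(mean(losses)),  # slightly lower than mean_ppl
        "max_gap": max_gap,
    }
    print(arm, {k: round(v, 6) for k, v in summary[arm].items()})
```

The reported aggregate perplexity matches the arithmetic mean of the per-run perplexities, not the exponential of the mean loss. The two differ in the second decimal place at this spread.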

What Is In This Repo

README.md
requirements.txt
run_model.py
MANIFEST.json
configs/
  h2h_22m_tension.json
  h2h_22m_softmax.json
  h2h_smoke_tension.json
  h2h_smoke_softmax.json
tokenizer/
  tokenizer.json
checkpoints/
  tension_seed11/model.pt
  tension_seed11/result.json
  tension_seed23/model.pt
  tension_seed23/result.json
  tension_seed37/model.pt
  tension_seed37/result.json
  softmax_seed11/model.pt
  softmax_seed11/result.json
  softmax_seed23/model.pt
  softmax_seed23/result.json
  softmax_seed37/model.pt
  softmax_seed37/result.json
reports/
  wave02_pilot_report.json
  wave02_pilot_report.md
  wave03_interpretability_report.json
  wave03_interpretability_report.md
  wave03_tension_traces.jsonl
  match.json
  fineweb_tokens_meta.json
ts_reasoner/
  tension_attention.py
  softmax_attention.py
  __init__.py
scripts/
  h2h_smoke_train.py
  h2h_validate_match.py
  h2h_wave02_report.py
  h2h_run_wave02_pilot.py
  h2h_fineweb_data.py
  h2h_prepare_wave02.py
  h2h_export_wave02_hf.py
  h2h_upload_wave02_hf.py
  h2h_wave03_interpretability.py
tests/
  test_h2h_arms.py

The FineWeb-Edu parquet shard and the generated train/val tensor files are not included because they are large and are either external (the shard) or reproducible (the tensors). Their exact provenance and hashes are listed below and in reports/fineweb_tokens_meta.json.

Install

Use Python 3.12 if you want to match this receipt closely.

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

The runtime dependencies needed to load and sample from the checkpoints are:

  • torch
  • tokenizers

huggingface_hub, pyarrow, and safetensors are included in requirements.txt for artifact/reproduction workflows.

Run A Checkpoint

Generate from the tension checkpoint for seed 11:

python run_model.py --arm tension --seed 11 --prompt "Mathematics is" --max_new_tokens 40

Generate from the matched softmax checkpoint for seed 11:

python run_model.py --arm softmax --seed 11 --prompt "Mathematics is" --max_new_tokens 40

Use another seed by changing --seed to 23 or 37.
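To sample from all six checkpoints in one pass, the invocations above can be generated programmatically. This is a convenience sketch that only uses the flags shown in this section; the helper name run_model_argv is hypothetical.

```python
import itertools
import shlex

ARMS = ("tension", "softmax")
SEEDS = (11, 23, 37)


def run_model_argv(arm: str, seed: int, prompt: str, max_new_tokens: int = 40) -> list[str]:
    """Build the argv for one run_model.py invocation using the flags shown above."""
    return [
        "python", "run_model.py",
        "--arm", arm,
        "--seed", str(seed),
        "--prompt", prompt,
        "--max_new_tokens", str(max_new_tokens),
    ]


commands = [run_model_argv(arm, seed, "Mathematics is") for arm, seed in itertools.product(ARMS, SEEDS)]
for argv in commands:
    # Print a copy-pasteable shell line; to execute instead, pass argv to
    # subprocess.run(argv, check=True).
    print(shlex.join(argv))
```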

The loader uses:

  • config: configs/h2h_22m_<arm>.json
  • checkpoint: checkpoints/<arm>_seed<seed>/model.pt
  • tokenizer: tokenizer/tokenizer.json
  • model source: ts_reasoner/tension_attention.py and ts_reasoner/softmax_attention.py
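The path patterns above can be expressed as a small resolver. This is a sketch of the layout only, not the code in run_model.py; the function name artifact_paths and the root argument are assumptions.

```python
from pathlib import Path


def artifact_paths(arm: str, seed: int, root: str = ".") -> dict[str, Path]:
    """Resolve config, checkpoint, and tokenizer paths from the layout listed above."""
    if arm not in ("tension", "softmax"):
        raise ValueError(f"unknown arm: {arm!r}")
    if seed not in (11, 23, 37):
        raise ValueError(f"no checkpoint for seed {seed}")
    base = Path(root)
    return {
        "config": base / "configs" / f"h2h_22m_{arm}.json",
        "checkpoint": base / "checkpoints" / f"{arm}_seed{seed}" / "model.pt",
        "tokenizer": base / "tokenizer" / "tokenizer.json",
    }


paths = artifact_paths("softmax", 23)
# Report anything not present relative to the current working directory.
missing = [name for name, p in paths.items() if not p.exists()]
print(paths, "missing:", missing)
```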

Minimal Loading Code

import json

import torch
from tokenizers import Tokenizer

from ts_reasoner.tension_attention import TensionBackbone, TensionConfig

# Build the model from the exact training config.
with open("configs/h2h_22m_tension.json") as f:
    cfg_raw = json.load(f)
cfg = TensionConfig.from_hf_config(cfg_raw)
model = TensionBackbone(cfg)

# Load the seed-11 weights on CPU; weights_only=True avoids unpickling arbitrary objects.
state = torch.load("checkpoints/tension_seed11/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Greedy prediction of the single next token for a short prompt.
tok = Tokenizer.from_file("tokenizer/tokenizer.json")
ids = torch.tensor([tok.encode("Mathematics is").ids], dtype=torch.long)
with torch.no_grad():
    logits = model(ids)[0]
next_token_id = int(logits[0, -1].argmax())
print(tok.decode([next_token_id]))

For the softmax arm, import SoftmaxBackbone and load configs/h2h_22m_softmax.json.

Data Receipt

Dataset:

  • dataset: HuggingFaceFW/fineweb-edu
  • revision: main
  • subset: sample/10BT
  • shard path: sample/10BT/000_00000.parquet
  • source commit: 87f09149ef4734204d70ed1d046ddc9ca3f2b8f9
  • parquet SHA-256: b1ba7b2ce4cb5ea6ef42dca40263eabb85f37700d01693a68e9b30a31d78e871

Tokenizer:

  • tokenizer source: BoggersTheFish/TensionLM-117M-Curriculum
  • tokenizer revision: main
  • tokenizer SHA-256: 20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840
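The recorded digests can be checked against local copies with the standard library. The sketch below assumes the tokenizer digest refers to tokenizer/tokenizer.json and that the shard has been downloaded separately. The local file paths in the trailing comments are assumptions.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the receipt above.
EXPECTED_SHA256 = {
    "parquet": "b1ba7b2ce4cb5ea6ef42dca40263eabb85f37700d01693a68e9b30a31d78e871",
    "tokenizer": "20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840",
}


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the recorded one (case-insensitive)."""
    return sha256_file(path) == expected_hex.lower()


# Example calls (local paths are assumptions about where the files were saved):
# verify("tokenizer/tokenizer.json", EXPECTED_SHA256["tokenizer"])
# verify("000_00000.parquet", EXPECTED_SHA256["parquet"])
```

Reading in 1 MiB chunks keeps memory use flat regardless of shard size.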

Generated tensors:

  • max sequences: 200000
  • train sequences: 196000
  • validation sequences: 4000
  • context/block length: 256
  • generation command: /home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_fineweb_data.py --max_sequences 200000
  • generated at: 2026-05-06T13:53:00.781522+00:00
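The counts above imply a 2% validation hold-out. The sketch below works through that arithmetic and shows one plausible way to cut a token stream into fixed-length sequences. The helpers chunk_fixed and split_tail are hypothetical, and both the tail-split rule and the assumption of exactly 256 tokens per sequence are guesses, not the behavior of scripts/h2h_fineweb_data.py.

```python
CONTEXT_LEN = 256        # context/block length from the receipt
MAX_SEQUENCES = 200_000  # --max_sequences
TRAIN_SEQUENCES = 196_000
VAL_SEQUENCES = 4_000


def chunk_fixed(token_ids: list[int], block_len: int) -> list[list[int]]:
    """Cut a token stream into non-overlapping blocks, dropping the ragged tail."""
    n_blocks = len(token_ids) // block_len
    return [token_ids[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]


def split_tail(sequences: list, n_val: int) -> tuple[list, list]:
    """Hold out the last n_val sequences for validation (assumed split rule)."""
    return sequences[:-n_val], sequences[-n_val:]


val_fraction = VAL_SEQUENCES / MAX_SEQUENCES
train_tokens = TRAIN_SEQUENCES * CONTEXT_LEN  # assumes 256 tokens per sequence
val_tokens = VAL_SEQUENCES * CONTEXT_LEN
print(f"val fraction={val_fraction:.2%}, train tokens={train_tokens:,}, val tokens={val_tokens:,}")

# Toy run of the helpers on a 10-token stream with block_len=4.
toy_train, toy_val = split_tail(chunk_fixed(list(range(10)), 4), n_val=1)
print(toy_train, toy_val)
```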

Training Commands

Tension runs:

/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 11 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 23 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_tension.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32

Softmax runs:

/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 11 --steps 1000 --warmup 100 --eval_every 200 --log_every 50
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 23 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32
/home/boggersthefish/Desktop/ts-proof-ranker/scripts/h2h_smoke_train.py --config configs/h2h_22m_softmax.json --data_dir artifacts/h2h/fineweb_tokens --out_dir artifacts/h2h/22m_runs --seed 37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50 --batch_size 32

The runner script used for the multi-seed extension was:

python -m scripts.h2h_run_wave02_pilot --seeds 23,37 --steps 1000 --warmup 100 --eval_every 200 --log_every 50

Seed 11 was run first as the one-seed pilot with the same schedule.
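For scale, the schedule flags above translate into a token budget per run. The sketch assumes one 256-token sequence per batch element, no gradient accumulation, and a batch size of 32 for every run. The seed 11 commands do not pass --batch_size explicitly, so that last point is an assumption about the script default.

```python
STEPS = 1000
BATCH_SIZE = 32       # --batch_size as passed for seeds 23 and 37
CONTEXT_LEN = 256
WARMUP = 100
EVAL_EVERY = 200
TRAIN_SEQUENCES = 196_000

sequences_seen = STEPS * BATCH_SIZE
tokens_seen = sequences_seen * CONTEXT_LEN
epoch_fraction = sequences_seen / TRAIN_SEQUENCES
warmup_fraction = WARMUP / STEPS
# Assumes evaluation fires at multiples of EVAL_EVERY and not at step 0.
eval_steps = list(range(EVAL_EVERY, STEPS + 1, EVAL_EVERY))

print(f"tokens seen per run: {tokens_seen:,}")
print(f"fraction of one pass over the train split: {epoch_fraction:.1%}")
print(f"warmup fraction: {warmup_fraction:.0%}, eval steps: {eval_steps}")
```

Under these assumptions each run sees about 8.2M tokens, roughly 16% of one pass over the training split.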

Validation

The local repository test suite passed after the Wave 03 export update:

52 passed

The match validator passed after the Wave 02 run:

match valid: tension=13,573,894 softmax=13,573,894 delta=0.0000%

No NaNs were found in any final run receipt.
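A check of this kind can be reproduced over any JSON receipt without knowing its schema. The sketch below walks a parsed JSON value and reports non-finite numbers. The key names in the toy inputs are hypothetical and are not the fields of checkpoints/<arm>_seed<seed>/result.json.

```python
import json
import math


def find_non_finite(obj, path="$"):
    """Yield JSON paths whose value is a non-finite float (NaN or +/-inf)."""
    if isinstance(obj, float) and not math.isfinite(obj):
        yield path
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_non_finite(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from find_non_finite(value, f"{path}[{i}]")


# Python's json module accepts the non-standard NaN literal by default.
clean = json.loads('{"final_val_loss": 6.283271, "history": [6.9, 6.5]}')
broken = json.loads('{"final_val_loss": NaN, "history": [6.9, NaN]}')
clean_hits = list(find_non_finite(clean))
broken_hits = list(find_non_finite(broken))
print(clean_hits, broken_hits)
```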

Wave 03 Interpretability Pilot

Verdict: interpretability_pilot_receipt_no_capability_claim

Wave 03 is an interpretability and field-faithfulness receipt over the trained Wave 02 checkpoints. It is not a capability claim and it does not change the Wave 02 verdict.

Protocol:

  • Extract local token-edge fields from each trained checkpoint.
  • Select high-field source tokens for each validation sequence.
  • Replace those source tokens with token id 0.
  • Compare loss increase against same-count random source-token replacement.
  • Skip early partial local-window rows so one-neighbor contexts do not dominate the metric.

The key metric is top-random, defined as the top increase minus the random increase: how much more the loss rises when high-field source tokens are ablated than when the same number of random source tokens are ablated. Positive values mean the exposed field identifies source tokens that matter more than random ones under this ablation protocol.
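The protocol can be sketched for a single sequence as follows. This is a simplified illustration, not the code in scripts/h2h_wave03_interpretability.py: the function names are hypothetical, the model is replaced by a toy loss function, and the step that skips early partial local-window rows is omitted.

```python
import random
from typing import Callable, Sequence


def replace_tokens(tokens: Sequence[int], positions: Sequence[int], fill_id: int = 0) -> list[int]:
    """Return a copy of tokens with the given positions overwritten by fill_id."""
    out = list(tokens)
    for pos in positions:
        out[pos] = fill_id
    return out


def top_minus_random(
    tokens: Sequence[int],
    field_scores: Sequence[float],
    loss_fn: Callable[[Sequence[int]], float],
    k: int,
    rng: random.Random,
) -> dict[str, float]:
    """Loss increase from ablating the k highest-field sources vs k random sources."""
    positions = range(len(tokens))
    top = sorted(positions, key=lambda i: field_scores[i], reverse=True)[:k]
    rand = rng.sample(positions, k)  # same count as the top set
    baseline = loss_fn(tokens)
    top_increase = loss_fn(replace_tokens(tokens, top)) - baseline
    random_increase = loss_fn(replace_tokens(tokens, rand)) - baseline
    return {
        "top_increase": top_increase,
        "random_increase": random_increase,
        "top_minus_random": top_increase - random_increase,
    }


# Toy stand-in for a model: the loss grows by a hidden per-position importance
# whenever that position has been replaced with token id 0.
importance = [0.1, 0.9, 0.2, 0.8, 0.3, 0.1]


def toy_loss(toks: Sequence[int]) -> float:
    return 1.0 + sum(w for t, w in zip(toks, importance) if t == 0)


result = top_minus_random([5, 6, 7, 8, 9, 10], importance, toy_loss, k=2, rng=random.Random(0))
print(result)
```

In the toy setup the field scores equal the hidden importance, so the top set is the best possible choice and top_minus_random cannot be negative. With a real model the sign is an empirical outcome.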

| arm | seeds | baseline loss | top increase | random increase | top-random | top>random frac | entropy | top1 share | raw mass |
|---|---|---|---|---|---|---|---|---|---|
| softmax | 11, 23, 37 | 6.248986 | 0.216230 | 0.103134 | 0.113097 | 0.986111 | 0.691024 | 0.441662 | 1.000000 |
| tension | 11, 23, 37 | 6.247036 | 0.186365 | 0.108120 | 0.078244 | 0.972222 | 0.798693 | 0.300620 | 3.364441 |

Delta top-minus-random, tension minus softmax: -0.034852

Conservative interpretation:

  • Both exposed fields are non-random under this ablation protocol.
  • Softmax is sharper here: top1 share 0.441662 versus tension 0.300620.
  • Softmax has the stronger top-token ablation receipt here: top-random 0.113097 versus tension 0.078244.
  • Tension is more distributed here: normalized entropy 0.798693 versus softmax 0.691024.
  • Tension exposes raw field mass greater than one (3.364441), unlike softmax's normalized mass of 1.000000.

The safe reading is: tension has an inspectable, distributed edge field, but this pilot does not establish an interpretability advantage over the matched softmax arm.
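To make the three field statistics concrete, the sketch below computes raw mass, top1 share, and normalized entropy for one row of edge weights. The definitions (entropy of the mass-normalized row divided by log of the row length, top1 share as the largest normalized weight) are assumptions about how the report defines them. The sigmoid row is a plain elementwise sigmoid on toy scores and does not model tau-mass normalization or the repository's tension attention.

```python
import math


def field_stats(weights: list[float]) -> dict[str, float]:
    """Raw mass, top-1 share, and normalized entropy of one row of edge weights."""
    raw_mass = sum(weights)
    probs = [w / raw_mass for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(weights)) if len(weights) > 1 else 1.0
    return {
        "raw_mass": raw_mass,
        "top1_share": max(probs),
        "entropy": entropy / max_entropy,
    }


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax; the row always sums to one."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: list[float]) -> list[float]:
    """Independent per-edge gates; the row sum is not constrained."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [2.0, 1.0, 0.5, 0.0, -0.5, -1.0, 0.0, 1.5]  # toy scores for a window of 8
soft = field_stats(softmax(logits))
sig = field_stats(sigmoid(logits))
print("softmax:", soft)
print("sigmoid:", sig)
```

On these toy scores the sigmoid row has raw mass above one, a lower top1 share, and a higher normalized entropy than the softmax row. That matches the direction of the table but says nothing about the trained checkpoints themselves.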

Wave 03 files:

  • reports/wave03_interpretability_report.json
  • reports/wave03_interpretability_report.md
  • reports/wave03_tension_traces.jsonl
  • scripts/h2h_wave03_interpretability.py

Environment

Aggregate report environment:

  • Python: 3.12.3
  • platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
  • torch: 2.11.0+cu130
  • CUDA available: False
  • device: cpu

Export environment:

  • Python: 3.12.3
  • platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
  • exported at: 2026-05-07T11:09:33.643460+00:00

Interpretation Rules

This artifact should be read conservatively.

  • Matched parameters are required before comparison.
  • Matched tokenizer, data tensors, optimizer, seed set, batch size, schedule, warmup, and eval cadence are required before comparison.
  • Any NaN or unmatched arm invalidates the comparison.
  • A numerically lower validation loss is not, by itself, a substrate capability claim.
  • This pilot result is best summarized as no clear capability edge.
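These rules can be read as a gate that runs before any numbers are compared. The sketch below is one hedged way to encode them. The ArmRecord class and its field names are hypothetical and cover only a subset of the matched settings listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ArmRecord:
    """Hypothetical summary of one arm; field names are illustrative only."""
    params: int
    seeds: tuple[int, ...]
    steps: int
    warmup: int
    eval_every: int
    tokenizer_sha256: str
    nan_found: bool


MATCHED_FIELDS = ("params", "seeds", "steps", "warmup", "eval_every", "tokenizer_sha256")


def comparison_problems(a: ArmRecord, b: ArmRecord) -> list[str]:
    """Return reasons the comparison is invalid; an empty list means it may proceed."""
    problems = [f"unmatched {name}" for name in MATCHED_FIELDS if getattr(a, name) != getattr(b, name)]
    if a.nan_found or b.nan_found:
        problems.append("NaN found in at least one arm")
    return problems


# Values taken from this document; both arms share them.
tension = ArmRecord(
    params=13_573_894,
    seeds=(11, 23, 37),
    steps=1000,
    warmup=100,
    eval_every=200,
    tokenizer_sha256="20bfc8442802dee5c6d7ac330c9bafe0582df53d5a174bcf76f04603933cd840",
    nan_found=False,
)
softmax_arm = replace(tension)  # identical settings for the matched baseline
print(comparison_problems(tension, softmax_arm))
```

Passing the gate only makes the comparison admissible. A lower validation loss still has to be weighed against seed-to-seed variation before any claim is made.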

Known Limitations

  • This is a small pilot whose configs are labeled 22M; each arm has 13,573,894 trainable/unique parameters. It is not a large-scale language model result.
  • It uses three seeds, which is useful as a receipt but still limited.
  • All runs were produced on CPU and are slow: each 1,000-step run took roughly 5,400 to 6,050 seconds of wallclock.
  • The data sample is one pinned FineWeb-Edu shard transformed into 200,000 fixed-length sequences.
  • The result probes next-token modeling under this exact schedule; it does not by itself establish general reasoning ability.
  • The included checkpoints are research artifacts, not production chat/instruction models.

Citation

If you refer to this artifact, cite it as a controlled Wave 02 pilot comparing sigmoid tension attention and matched softmax attention in the TS Proof Ranker / TensionLM repo, with the verdict outcome_iii_no_capability_edge.

Repository ID: BoggersTheFish/TensionLM-Wave02-22M-H2H
