T³ 124M v3.6 (run-3 release)

Inference-ready checkpoint for T³, a Clifford-algebra-augmented transformer architecture. 124M parameters, GPT-2 Small substrate, 5B training tokens.

This is the canonical reference checkpoint for the v3.6 lineage. Companion artifacts:

Quick start

pip install t3-reference

import torch
from huggingface_hub import hf_hub_download
from t3 import T3Model

# Download the inference checkpoint and load it
ckpt = hf_hub_download("mirrorethic/t3-124m-v36", "pytorch_model.bin")
model = T3Model.from_checkpoint(ckpt)
model.eval()

# Forward pass on a random batch (GPT-2 vocabulary, 50257 tokens)
input_ids = torch.randint(0, 50257, (1, 16))
with torch.no_grad():
    logits, *_ = model(input_ids)

To generate a schema-v1 ecology trace:

from t3.tracing import generate_trace
generate_trace(model, "The capital of France is",
               prompt_id="factual", n_tokens=32,
               out_path="trace.jsonl")

To re-run the published lm-eval-harness benchmarks:

from t3.benchmarks import run_benchmark_suite
results = run_benchmark_suite("path/to/pytorch_model.bin")
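
The return structure of run_benchmark_suite is defined in t3.benchmarks; assuming it maps task names to metric dicts (an assumption here, not a documented contract), a loop like this reproduces the table below:

for task, metrics in results.items():
    print(task, metrics)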

Architecture

T³ extends standard multi-head attention with a per-head ecology of six conjugate primitives (E, I, F, V, C, K) coupled through bivector composition in Cl(3,3) geometric algebra. Heads interact through a learned blockade-and-cosurvival graph and ponder adaptively per stage via output-entropy halt. Full technical specification: docs/ARCHITECTURE.md.
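
The per-stage pondering is the easiest piece to sketch in isolation. A hedged illustration of an output-entropy halt with the 4-step cap from the table below; the threshold value and the names ponder_stage / stage_fn are illustrative, not the reference implementation:

import torch
import torch.nn.functional as F

def ponder_stage(stage_fn, x, halt_entropy=2.0, max_steps=4):
    # Repeat a stage's forward pass until the output distribution is
    # confident enough (low entropy) or the per-stage cap is hit.
    for step in range(max_steps):
        x, logits = stage_fn(x)
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-9)).sum(-1).mean()
        if entropy < halt_entropy:
            break  # output-entropy halt
    return x, step + 1  # hidden state and ponder-loop count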

Field Value
Parameters 124,500,000
Stages 3, with layers_per_stage = [4, 3, 5] (12 transformer blocks total)
d_model 768
n_heads 12
d_ff 3072
vocab_size 50257 (GPT-2 tokenizer)
max_seq_len 1024
Substrate GPT-2 Small initialization
Training data 5B tokens (FineWeb-Edu 40%, DCLM 20%, StackEdu 10%, FineMath 10%, Cosmopedia 10%, Wikipedia 10%)
Cumulative training step 138,000 (135.5K substrate + 2,500 v3.6 increment)
Hamiltonian coupling ω 0.02
Trivectors off (the trivectors-on variant is a planned v3.7 follow-up release)
Inter-stage predictive coding on (weight = 0.05)
Scratchpad heads on (scratchpad_inject_entropy = (0.0, 0.0, 0.03) — S2-only)
ACT output-entropy halt + per-stage 4-step cap
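
For orientation, the table's shape numbers check out against a plain GPT-2 back-of-the-envelope count (ignoring biases, LayerNorms, and the ecology-specific parameters, so this is a sanity check rather than the exact 124,500,000):

d_model, d_ff, vocab, seq = 768, 3072, 50257, 1024
blocks = sum([4, 3, 5])               # 12 blocks across 3 stages
attn = 4 * d_model * d_model          # q, k, v, out projections
mlp = 2 * d_model * d_ff              # up + down projections
embed = vocab * d_model + seq * d_model
print(f"~{(blocks * (attn + mlp) + embed) / 1e6:.0f}M parameters")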

Evaluation

All numbers are full lm-eval-harness 0.4.x runs (no subset). Reproduce with examples/run_benchmarks.py from the reference repo.

Task Metric Value stderr
WikiText-103 (val) perplexity 27.76
BoolQ acc 0.6046 0.0086
ARC-Easy acc 0.4331 0.0102
ARC-Challenge acc 0.2176 0.0121
PIQA acc 0.6050 0.0114
HellaSwag acc 0.3040 0.0046
WinoGrande acc 0.5043 0.0141
COPA acc 0.6000 0.0492
RTE acc 0.5235 0.0301

For comparison panels (parameter-efficiency vs vanilla GPT-2 same-data, compute frontier), see https://t3atlas.dev/benchmarks/.

vs vanilla GPT-2 (same 5B-token training data)

T³-124M-v36 against the gpt2/124M_vanilla_5b_same_data baseline. The interesting delta is on multi-step reasoning: T³ pondering allocates extra forward compute per token (S1 averages 2.7–3.7 ponder loops on these tasks).
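
Those loop counts can be recovered from an ecology trace. A hedged sketch, assuming each trace record carries a per-stage list of loop counts under a field like "ponder_steps" (an illustrative name; the real key is fixed by the schema-v1 spec):

import json
from statistics import mean

with open("trace.jsonl") as f:
    s1_loops = [json.loads(line)["ponder_steps"][0]  # stage S1
                for line in f]
print(f"S1 mean ponder loops: {mean(s1_loops):.2f}")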

Intended use

  • Research and interpretability. The checkpoint is designed to be inspected via the trace library, not deployed for production text generation. The model is small (124M), English-only, and not instruction-tuned.
  • Architectural comparison. A reference point for novel sequence-architecture work (Mamba, RWKV, xLSTM, etc.) that is matched to vanilla GPT-2 on training data and parameter count.
  • Ecology / dynamics analysis. The trace JSONL records per-head, per-stage ecology state across forward passes — useful for studying how Clifford-algebra-coupled state evolves during inference.

Limitations

  • 124M parameters: too small to be a useful generative chat model.
  • English only.
  • No instruction tuning, no RLHF, no safety tuning.
  • Trained without grade-3 trivector terms (in this checkpoint the static bivector Ω supplies the full Cl(3,3) rotation). The state-dependent variant is a planned v3.7 Medium-scale follow-up.
  • The v36-pcloss sibling is the same architecture trained with the inter-stage predictive-coding loss un-detached. Slightly worse PPL (28.53), neutral on reasoning — the K-predictor learning a real cross-stage map (r=0.59) doesn't translate into downstream gains at this scale.

Capabilities probe

The checkpoint declares the following dynamics in its config and state dict (consumed by the t3atlas viewer for trace rendering):

{
  "has_coupling":       true,
  "has_trivectors":     false,
  "has_dyn_omega":      false,
  "has_inter_stage_pc": true,
  "has_scratchpad":     true,
  "n_primitives":       6,
  "null_cone_strength": 0.02,
  "hamiltonian_coupling": 0.02,
  "sigma_hidden":       16,
  "scratchpad_inject_entropy": [0.0, 0.0, 0.03]
}
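
A consumer can branch on these flags before attempting to render a trace. A minimal sketch, assuming the block above is saved as capabilities.json (how t3atlas actually ingests the flags is not specified here):

import json

with open("capabilities.json") as f:
    caps = json.load(f)

# List only the dynamics this checkpoint actually declares; for
# v3.6 run-3 that excludes trivectors and dynamic omega.
active = [name for flag, name in [
    (caps["has_coupling"], "blockade/cosurvival coupling"),
    (caps["has_trivectors"], "trivector terms"),
    (caps["has_dyn_omega"], "dynamic omega"),
    (caps["has_inter_stage_pc"], "inter-stage predictive coding"),
    (caps["has_scratchpad"], "scratchpad heads"),
] if flag]
print("active dynamics:", ", ".join(active))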

Citation

@misc{sutherland2026t3,
  author = {Sutherland, Garret},
  title  = {T³: A Clifford-Algebra-Augmented Transformer Architecture},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/mirrorethic/t3-124m-v36}
}

License

Apache-2.0, covering both the code (mirrorethic/t3-reference) and the weights (this repository).

Contact

Garret Sutherland (MirrorEthic LLC) — gsutherland@mirrorethic.com.


Released 2026-05-03. The pytorch_model.bin here is a stripped inference-ready copy (498 MB) of the canonical best.pt from the v3.6 training campaign (run-3, step 2500, val PPL 27.76 on WikiText-103). The optimizer state and data-loader state were dropped; everything T3Model needs at inference is preserved (model_state, ecology_state, config, and provenance metadata).
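
A quick way to verify what survived the strip, assuming the top-level keys follow the names in the description above:

import torch

# Full pickle load (the file carries config and provenance objects,
# not just tensors); expect model_state, ecology_state, config, and
# provenance metadata, with optimizer and data-loader state gone.
ckpt = torch.load("pytorch_model.bin", map_location="cpu",
                  weights_only=False)
print(sorted(ckpt.keys()))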
