hexad — v4-py-hexad-tension-d768x12L-cycle1-2026-05-17
Trained on: `dancinlab/hexad-corpus` revision `v3-spont-motiv-d128-cycle2-2026-05-17` (byte-equal carry from cycle 4; corpus unchanged this cycle).
Honest framing (AGENTS.tape `g3`): this is a PYTHON / PyTorch SUBSTRATE training artifact, an interim LM-scale executor. It is NOT a hexa-native fire. Legitimacy = architectural identity + the hexa CPU-equivalence correctness proof (Phase E/E2). PyTorch ≠ hexa bit-for-bit (different fp accumulation / RNG / AMP bf16).
What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)
Change vs cycle 4 (training schedule only; the architecture is byte-equal): the per-step learning rate is now multiplied by a DD155 hybrid factor (Law 187, measured Pareto-optimal):
```
tension_step = ||∇L||₂                                      (grad-norm)
tension_EMA  = β·tension_EMA + (1−β)·tension_step           (β = 0.99)
multiplier   = clip(tension_step / tension_EMA, [0.5, 2.0])
lr_step      = base_cosine_lr(step) × multiplier
```
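A minimal sketch of the overlay in plain Python, using the hyperparameters from the Training section (lr=0.0003, warmup=125, 2500 steps, β=0.99, clip [0.5, 2.0]). Several details are ASSUMPTIONS not stated in the card: linear warmup followed by cosine decay to zero, an EMA seeded with the first observed tension, and the EMA being updated before the ratio is taken. `TensionLR`, `base_cosine_lr`, and `global_grad_norm` are hypothetical names, not the trainer's code.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Values from this card: lr=3e-4, warmup=125, steps=2500, beta=0.99, clip [0.5, 2.0].
# ASSUMPTIONS (not in the card): linear warmup, cosine decay to 0, EMA seeded with
# the first tension, and the EMA is updated before the ratio is taken.


def base_cosine_lr(step, base_lr=3e-4, warmup=125, total=2500, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr (assumed shape)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def global_grad_norm(grads):
    """tension_step = ||grad L||_2 across all parameters; grads: iterable of float lists."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


@dataclass
class TensionLR:
    """Hypothetical DD155 hybrid overlay: lr = base_cosine_lr(step) * clip(t / EMA)."""
    beta: float = 0.99
    lo: float = 0.5
    hi: float = 2.0
    ema: Optional[float] = None

    def step(self, step, tension):
        if self.ema is None:
            self.ema = tension  # seed the EMA with the first tension (assumption)
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension
        ratio = tension / max(self.ema, 1e-12)  # guard against a zero EMA
        multiplier = min(self.hi, max(self.lo, ratio))
        return base_cosine_lr(step) * multiplier, multiplier


# Usage: a steady tension keeps the multiplier at 1; a spike is capped at 2.0.
sched = TensionLR()
lr0, m0 = sched.step(0, tension=1.0)    # m0 == 1.0, lr0 == 3e-4 / 125
lr1, m1 = sched.step(1, tension=100.0)  # EMA = 1.99, ratio ~50, clipped to 2.0
```

In a PyTorch loop, `tension` would be the total gradient L2 norm after `loss.backward()`, and the returned `lr` would be written into each entry of `optimizer.param_groups` before `optimizer.step()`. Because the EMA lags a falling grad-norm, a steadily decreasing tension tends to push the ratio below 1, which is consistent with most steps landing in the `< 0.75` bin reported below.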
- transfer-form: `B-TT-5` PARETO-STEP-TENSION-CLOSED (sympy linear ∂lr/∂tension); `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`, 5/5 PASS: DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
- outcome: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)
DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`. Law 187: lr = (tension/EMA) × base_lr, measured Pareto-optimal in the 2026-03-31 BG-DD-AXIS commits.
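The three properties checked by `B-FIRE-CYCLE5-1/2/3` can be spot-checked numerically with the standard library. This is a hedged stand-in, not the sympy-based `blue_falsifier.py` sidecar: the function names are hypothetical, and the checks use the unclipped Law 187 form (inside the [0.5, 2.0] band the clip is inactive).

```python
import math

BETA = 0.99  # EMA coefficient from this card


def lr_unclipped(tension, ema, base_lr):
    """Law 187 form: lr = (tension / EMA) * base_lr (clip band ignored)."""
    return (tension / ema) * base_lr


def dlr_dtension(tension, ema, base_lr, h=1e-6):
    """Central difference; for this linear form it equals base_lr / ema at every tension."""
    hi = lr_unclipped(tension + h, ema, base_lr)
    lo = lr_unclipped(tension - h, ema, base_lr)
    return (hi - lo) / (2 * h)


def ema_map(ema, tension, beta=BETA):
    """One EMA update as an affine map E -> beta*E + (1-beta)*t; Lipschitz constant beta < 1."""
    return beta * ema + (1.0 - beta) * tension


def iterate_ema(ema0, tension, n):
    """Banach fixed-point iteration: with constant tension the EMA converges to it."""
    ema = ema0
    for _ in range(n):
        ema = ema_map(ema, tension)
    return ema


# (1) The slope is constant and positive (linear, monotone in tension).
slope = dlr_dtension(0.2, 0.05, 3e-4)             # ~= 3e-4 / 0.05 = 6e-3
# (2) The EMA map shrinks any gap between two EMA values by exactly beta.
gap = abs(ema_map(1.0, 0.3) - ema_map(0.2, 0.3))  # ~= 0.99 * 0.8
fixed = iterate_ema(1.0, 0.05, 2000)              # ~= 0.05 (error ~0.95 * 0.99**2000)
# (3) At tension == EMA the LR equals base_lr, i.e. the cycle-4 baseline.
identity_lr = lr_unclipped(0.04, 0.04, 3e-4)      # == 3e-4
```

A symbolic version would differentiate the same expression with sympy and confirm the derivative is `base_lr / EMA`, independent of tension; the numeric form above only samples a few points.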
What changed vs cycle 4 (v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17)
| field | cycle 4 | cycle 5 (this revision) |
|---|---|---|
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | same (byte-equal carry, B-CORPUS-V4-1) |
| LR schedule | cosine + warmup | cosine + warmup + DD155 hybrid (tension/EMA) multiplier |
| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
| init CE | 5.641 | 5.640663 |
| final CE | 0.008289 | 0.007762 |
| CE descent | 5.632 | 5.632901 |
| final tension_EMA | (did not track) | 0.046574 |
| multiplier < 0.75 (steps) | (n/a) | 1599 |
| multiplier 0.75-1.25 (steps) | (n/a) | 686 |
| multiplier > 1.25 (steps) | (n/a) | 215 |
| eval probes | V5.8 + V-SPONT + V-MOTIV | V5.8 + V-SPONT + V-MOTIV + V-TT NEW |
Architecture
- Source: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
- Config: `d_model=768`, `n_head=12`, `n_kv_head=4`, `n_layer=12`, `block_size=128`, `vocab=256` (byte-level), `seed=1337`, `init=RANDOM` (`base_ckpt=None`, `g_clm_from_scratch`).
- Params: 283.72 M (283,722,336).
- Features: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn · tied head · CA neighbor / META-CA / Ψ-tracking laws.
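The config fixes the attention geometry. The sketch below derives it, assuming the usual grouped-query (GQA) layout in which consecutive query heads share one key/value head; `DecoderConfig` and its helpers are hypothetical names for illustration, not `ConsciousDecoderV2`'s own classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical container for this card's config values."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256
    seed: int = 1337

    @property
    def head_dim(self):
        return self.d_model // self.n_head  # 768 / 12 = 64

    @property
    def q_per_kv(self):
        # GQA: each key/value head is shared by this many query heads.
        if self.n_head % self.n_kv_head:
            raise ValueError("n_head must be divisible by n_kv_head")
        return self.n_head // self.n_kv_head  # 12 / 4 = 3

    @property
    def kv_proj_width(self):
        # Output width of the K (or V) projection: n_kv_head * head_dim.
        return self.n_kv_head * self.head_dim  # 4 * 64 = 256

    def kv_head_for(self, q_head):
        # Consecutive grouping (query heads 0-2 -> KV head 0, 3-5 -> 1, ...);
        # the card does not state which grouping layout is used (assumption).
        return q_head // self.q_per_kv


cfg = DecoderConfig()
print(cfg.head_dim, cfg.q_per_kv, cfg.kv_proj_width)  # 64 3 256
```

Under these assumptions the K and V projections are one third the width of the query projection, which is the usual memory motivation for GQA.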
Training
- GPU: vast.ai NVIDIA A100-SXM4-40GB, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
- Corpus: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4), 6,223,023-byte lossless byte stream, vocab=256.
- Optimizer: AdamW, lr=0.0003, betas=(0.9, 0.95), weight_decay=0.1, warmup=125.
- DD155 hybrid: β=0.99, clip lo=0.5, clip hi=2.0.
- Steps: 2500.
| metric | value |
|---|---|
| init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
| FINAL CE | 0.007762 |
| CE descent | 5.632901 |
| FINAL gn2 (squared grad-norm, ‖∇L‖₂²) | 0.001495 |
| FINAL tension (grad-norm, ‖∇L‖₂) | 0.038659 |
| ppl | 1.0078 |
| wall | 321.3 s |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | 6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8 |
| ckpt size | 1,135,846,570 B (1.14 GB) |
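The derived rows follow from the CE values in nats per byte: perplexity is exp(CE), bits per byte is CE / ln 2, and a uniform guess over 256 byte values scores ln 256. A small arithmetic check, using only numbers from this card:

```python
import math

init_ce = 5.640663   # nats/byte, from the table
final_ce = 0.007762  # nats/byte, from the table

uniform_ce = math.log(256)            # CE of a uniform next-byte guess
descent = init_ce - final_ce          # the "CE descent" row
ppl = math.exp(final_ce)              # the "ppl" row
train_bpb = final_ce / math.log(2)    # final training CE expressed in bits/byte
held_out_nats = 0.0194 * math.log(2)  # held-out BPB from the eval section, ~0.0134 nats/byte

print(f"{uniform_ce:.4f} {descent:.6f} {ppl:.4f} {train_bpb:.4f}")  # 5.5452 5.632901 1.0078 0.0112
```

The init CE sits slightly above ln 256 because a randomly initialized network is not exactly uniform over bytes.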
Verification anchors (per AGENTS.tape g_blue_closed_mandate)
(A) Deliverable invariants (real-limit, this cycle):
- Shannon-floor descent: init CE ≈ ln(256) → final CE.
- DD155 transfer-form closed (`B-TT-5`): lr = (tension/EMA) × base_lr, sympy-verified linear monotone, real-limit anchor.
- AdamW finiteness: no NaN/Inf in trajectory.
- Architectural identity: byte-equal `ConsciousDecoderV2`.
(B) Wiring (closed):
- `B-CORPUS-V4-1`: corpus v3 byte-equal carry (sha256 / bytes / lines / grep all closed).
- `B-CORPUS-V4-2`: cycle-5 trainer's loader + `ByteDataset` byte-equal to cycle 4 (mechanical AST diff, comments stripped).
- `B-FIRE-CYCLE5-1`: DD155 LR overlay formula closed-form (sympy ∂lr/∂tension, 3-corner identity).
- `B-FIRE-CYCLE5-2`: EMA Banach affine contraction closed (4-corner witness panel).
- `B-FIRE-CYCLE5-3`: multiplier identity at EMA convergence (cycle 5 degenerates to the cycle-4 baseline at tension = EMA; sanity anchor).
- `B-CORPUS-V3-*`: cycle-4 closures carry (sha256-deterministic / no-helper-token / γ-cardinality ≥ 5400).
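`B-CORPUS-V4-2` relies on a mechanical, comments-stripped AST comparison. One way to do that with Python's standard `ast` module is sketched below; the function names and the toy sources are hypothetical, and this is not the project's actual checker. Comments never reach the AST and `ast.dump` omits line/column attributes by default, so only code structure is compared (docstrings, being string expressions, still count).

```python
import ast


def top_level_dumps(source, names):
    """Canonical AST dump for each top-level def/class whose name is in `names`."""
    dumps = {}
    for node in ast.parse(source).body:
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name in names:
            # include_attributes defaults to False, so line/col positions are ignored.
            dumps[node.name] = ast.dump(node)
    return dumps


def ast_equal(src_a, src_b, names):
    """True if every named definition exists in both sources with identical ASTs."""
    a, b = top_level_dumps(src_a, names), top_level_dumps(src_b, names)
    return set(a) == set(names) and a == b


# Hypothetical miniature of a cycle-4 vs cycle-5 loader check.
cycle4 = "class ByteDataset:\n    def __init__(self, data):  # cycle 4\n        self.data = data\n"
cycle5 = "class ByteDataset:\n    # reworded comment, extra spaces\n    def __init__(self, data):\n        self.data  =  data\n"
print(ast_equal(cycle4, cycle5, {"ByteDataset"}))  # True
```

Requiring every named definition to be present avoids a vacuous pass when a function was renamed or dropped in one of the files.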
(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella):
- V-SPONT / V-MOTIV / V-TT outcome empirical.
- mult_distribution histogram + byte-cascade attractor shape under hybrid LR empirical.
- DD-burst path activation frequency empirical.
Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)
V5.8 × 4-mode (corpus v3 prompts):
- standard_greedy: 0/6 FAIL (avg_rep=0.921)
- standard_sample: 0/6 FAIL (avg_rep=0.871)
- M3_rep_penalty: 0/6 FAIL (avg_rep=0.913)
- M4_force_include: 6/6 PASS (avg_rep=0.766)
V-SPONT (spontaneous firing), F-SPONT-7 transfer-form measurement:
- coherent: 0/5 FAIL
- closed-tag: 0/5
V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
- coherent: 0/5 FAIL
- voice-closed-tag: 0/5
V-TT (NEW cycle 5) — tension-train transfer-form probe:
- coherent: 0/5 FAIL
- keyword recall: 0/5
Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte. Memorization ratio: 0/6 (0.0%). Decoding artifacts (rep>0.5): 24.
All capability scores empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE), not closed.
Honest C3
- NOT hexa-native — PyTorch substrate, label mandatory.
- PyTorch ≠ hexa bit-for-bit — different fp / RNG / AMP.
- tension = grad_norm is a PROXY: in the hexa spine, tension = G_holo · (Ψ − Ψ_vac); grad_norm is the natural mathematical analogue at the PyTorch substrate level, where Ψ is not surfaced as a state variable.
- The DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); the outcome is empirical (B-FIRE-CYCLE5-NOTE). V-SPONT / V-MOTIV / V-TT are all probes, not capability claims.
- Critical Data Size regime: roughly 10 MB of corpus against 283 M params is still data-limited; no out-of-distribution generalization claim. Cycle 5's variance vs cycle 4 is mainly LR-schedule-driven, not corpus-driven (the corpus is a byte-equal carry).
- No `safetensors` artifact in this revision: pickle `.pt` only.
- B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE: inference-side coherence stays empirical.
- No σ(6)=12 / φ(6)=2 derivation — no lattice numerology.
- Cost is informational, not gating (`g_fire_autonomous`).
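For scale, the data-limited caveat can be quantified from the card's own numbers. The comparison budget of roughly 20 training tokens per parameter is the Chinchilla compute-optimal rule of thumb (Hoffmann et al., 2022), used here only as an outside reference point and counting one byte as one token; it is not a claim made by this card.

```python
corpus_bytes = 6_223_023  # byte stream size from the Training section
params = 283_722_336      # parameter count from the Architecture section

bytes_per_param = corpus_bytes / params      # ~0.0219 bytes of unique data per parameter
chinchilla_bytes = 20 * params               # ~5.67e9 under the ~20 tokens/param heuristic
shortfall = chinchilla_bytes / corpus_bytes  # ~912x less unique data than that heuristic

print(f"{bytes_per_param:.4f} {chinchilla_bytes:.3e} {shortfall:.0f}")
```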
License
Apache-2.0.