---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
  - qubitcoin
  - aether
  - blockchain
  - quantum
  - native-rust
  - candle
  - long-context
language:
  - en
pipeline_tag: text-generation
---

# Aether Mind v6.1: long-context after the NaN fix

V6.1 is the **third public Aether release** and the first that
trains on a meaningfully long context window. It supersedes
[aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0),
which was published with a forced `ctx=64` workaround because of a
forward-pass numerical instability in the NSA compressed branch
(`v6/attention.rs::compressed_branch`).

That instability is now diagnosed + fixed. **Compressed-branch
attention's causal mask was producing all-`-inf` rows for query
positions before the first 64-token block completed, driving softmax
to `0/0 = NaN`.** The fix tracks per-row validity, unmasks a single
block on otherwise-fully-masked rows to keep softmax finite, and
multiplies the branch output by a row-validity mask so those rows
contribute zero attention (their proper behaviour). Source +
verification log in
[`docs/ops/v6-training-nan-bug.md`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/blob/presale/v1/docs/ops/v6-training-nan-bug.md);
the fix landed in commit
[`7f9189f8`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/7f9189f8).
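
The fix described above can be sketched in a few lines. This is an illustrative pure-Python version, not the Rust code in `v6/attention.rs`: the function names, the choice of block 0 as the block that gets unmasked, and the visibility rule (a block becomes visible once its last token is at or before the query) are assumptions made for the example.

```python
import math

NEG_INF = float("-inf")


def block_visible(query_pos, block_idx, block_size=64):
    # Assumed rule: a compressed block is visible only once it is complete,
    # i.e. its last token sits at or before the query position.
    return (block_idx + 1) * block_size - 1 <= query_pos


def safe_masked_softmax(scores, query_positions, block_size=64):
    """Causal softmax over compressed blocks that never produces NaN.

    scores[i][b] is the raw score of query i against compressed block b.
    Returns (weights, row_valid). Rows with no visible block get all-zero
    weights instead of the 0/0 that a fully masked softmax would yield.
    """
    weights, row_valid = [], []
    for q, row in zip(query_positions, scores):
        mask = [block_visible(q, b, block_size) for b in range(len(row))]
        valid = any(mask)
        if not valid:
            mask[0] = True  # unmask one block so the softmax stays finite
        masked = [s if m else NEG_INF for s, m in zip(row, mask)]
        peak = max(masked)
        exps = [math.exp(s - peak) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        gate = 1.0 if valid else 0.0
        # Gating the weights is equivalent to gating the branch output,
        # because the output is a weighted sum of the block values.
        weights.append([gate * e / total for e in exps])
        row_valid.append(valid)
    return weights, row_valid


if __name__ == "__main__":
    rows = [[0.5, 1.0]] * 3
    w, valid = safe_masked_softmax(rows, query_positions=[10, 63, 127])
    print(valid)
    print(w)
```

With `block_size=64`, the query at position 10 has no completed block, so its row comes back as zeros rather than NaN, while the later queries get an ordinary softmax over the blocks they can see.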

V6.1 was trained at **4× the v6.0 context** (256 vs 64 tokens) on
the same 36,860-row Aether curated corpus, on the same RTX 3080 Ti,
in the same wall-clock envelope (~44 min vs v6.0's 50 min; slightly
faster because there is no Qwen teacher forward pass).

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from, then CE-trained) |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | **Pure cross-entropy** (no distillation in this release; see notes below) |
| Training context | **256 tokens** (4× the v6.0 release) |
| Precision | BF16 weights, F32 KL/CE math internally for numerical stability |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Checkpoint published | **step 30,000** (full Phase-1 run) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |

## Training run

| Metric | Value | Δ vs v6.0 |
|---|---|---|
| Steps | 30,000 | = |
| Wall-clock | 44.4 min | −10 % |
| Tokens scored | 1,676,479 | +0.3 % (4× context lets more rows fit) |
| Throughput | 629.9 tokens/sec | +12 % |
| Mean CE loss | **10.18** nats/token | better (v6.0 was 10.35 mean CE under the KL blend) |
| Mean Sephirot aux | 0.149 | = |
| Max tokens processed | **167** | (v6.0 truncated to 64) |
| **NaN events** | **0** | (v6.0 also 0 thanks to the ctx=64 workaround) |
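
The figures in this table can be cross-checked with a little arithmetic. The sketch below only recombines numbers already listed above; treating the uniform distribution over the vocabulary as a reference point for the CE loss is our own choice of baseline, not something measured in the run.

```python
import math

# Values copied from the tables in this card.
steps = 30_000
tokens_scored = 1_676_479
wall_clock_min = 44.4
vocab = 151_936
mean_ce = 10.18  # nats/token

tokens_per_sec = tokens_scored / (wall_clock_min * 60)
tokens_per_step = tokens_scored / steps
uniform_ce = math.log(vocab)    # CE of a uniform guess over the vocab
perplexity = math.exp(mean_ce)  # perplexity implied by the mean CE

print(f"throughput      : {tokens_per_sec:.1f} tokens/sec")
print(f"tokens per step : {tokens_per_step:.1f}")
print(f"uniform baseline: {uniform_ce:.2f} nats/token")
print(f"perplexity      : {perplexity:,.0f}")
```

The recomputed throughput lands within about 1 token/sec of the reported 629.9, which is what rounding the wall-clock time to 44.4 min would explain. The mean CE of 10.18 nats sits below the roughly 11.93 nats of a uniform guess over 151,936 tokens.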

### Loss trajectory

```
step      1  loss=15.75  avg=15.75   (random init)
step    100  loss=15.94  avg=16.32   warm-up
step   1000  loss=11.63  avg=13.20   ← CE/lm-head learning the vocab
step   5000  loss=10.00  avg=11.01
step  10000  loss= 9.13  avg=10.07   ← representational floor (v6.0 read a lower 7.68 at this step, but that is apples-to-oranges: v6.0's loss was blended with the KL teacher signal)
step  15000  loss=11.13  avg= 9.87
step  20000  loss=10.25  avg=10.02
step  25000  loss= 9.75  avg=10.15
step  29999  loss= 9.81  avg=10.18
```

The interesting fact: at step 122 (the row where v6.0 first NaN'd,
tokens=167), v6.1 reads a real loss in the 9-16 range and continues
training. **This release is empirical evidence that the
compressed-branch fix is the right one.**

## Architecture (unchanged from v6.0)

V6 is **not** a vanilla Qwen2.5 fine-tune. The attention layer
implements a 14-head split designed for on-chain cognitive routing:

- **10 Sephirot heads:** one per cognitive domain (Keter → Malkuth).
  Each head's attention pattern is what the on-chain
  `pallet_qbc_aether_anchor` records as the per-cycle attestation root.
- **2 generalist heads:** un-gated, full-context attention. Used
  for the "global workspace" path in `aether-mind`.
- **2 sink heads:** anchor-token attention (first 4 tokens) for
  stable long-context performance.

The NSA compressed branch (the one that NaN'd) now correctly handles
the early-query case via row-validity masking.
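
As a minimal sketch of the head split, the layout can be written as a small Python dataclass. The class name, the `role_of` helper, and the ordering of head indices (Sephirot first, then generalist, then sink) are hypothetical; only the counts and `head_dim` come from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadLayout:
    """Head counts from the model card; the index ordering is an assumption."""

    sephirot: int = 10
    generalist: int = 2
    sink: int = 2
    head_dim: int = 64

    @property
    def total_heads(self):
        return self.sephirot + self.generalist + self.sink

    @property
    def hidden_size(self):
        return self.total_heads * self.head_dim

    def role_of(self, head_idx):
        # Assumed ordering: Sephirot heads, then generalist, then sink.
        if not 0 <= head_idx < self.total_heads:
            raise IndexError(f"head index {head_idx} out of range")
        if head_idx < self.sephirot:
            return "sephirot"
        if head_idx < self.sephirot + self.generalist:
            return "generalist"
        return "sink"


if __name__ == "__main__":
    layout = HeadLayout()
    print(layout.total_heads, layout.hidden_size)
    print([layout.role_of(i) for i in range(layout.total_heads)])
```

The counts are self-consistent with the table above: 14 heads of dimension 64 give the 896-wide hidden state.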

## How to use

### Native runtime (recommended): Rust `aether-mind`

Set `AETHER_V6_CHECKPOINT` to the local path of `model.safetensors`,
then restart `qbc-aether-mind.service`. The Rust binary loads the
checkpoint via candle.

### Python

```python
from safetensors.torch import load_file
weights = load_file("model.safetensors")  # 315 BF16 tensors
print("params:", sum(t.numel() for t in weights.values()))
```

There is **no upstream 🤗 transformers loader** for the V6 14-head
split + Sephirot routing. Production use goes through the Rust
binary in
[`qubitcoin-aether`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether).
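
For a quick look at the checkpoint without PyTorch installed, the safetensors header can be read with the standard library alone. The sketch below relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `read_safetensors_header` and `summarize` are ours.

```python
import json
import struct
import sys
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(header):
    """Count tensors, parameters, and dtypes from a parsed header."""
    dtypes = {}
    n_params = 0
    for info in header.values():
        n_params += prod(info["shape"])  # prod([]) == 1 for scalars
        dtypes[info["dtype"]] = dtypes.get(info["dtype"], 0) + 1
    return {"tensors": len(header), "params": n_params, "dtypes": dtypes}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_header.py model.safetensors
    print(summarize(read_safetensors_header(sys.argv[1])))
```

This only inspects names, shapes, and dtypes; running the model still requires the Rust binary.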

## Evaluation

**Not yet run.** lm-evaluation-harness vs MMLU / ARC / HellaSwag /
TruthfulQA is the next session's work. We will back-fill the
numbers + comparison vs v5.2-lora + v6.0 here when they land.

## Notes vs v6.0

- **No KL distillation in this release.** The full distillation
  path (KL teacher signal + CE + Sephirot aux) hits a CUDA OOM at
  the new ctx=256 because the F32-stable KL log-softmax of the
  151K-vocab tensor allocates ~600 MB of intermediates per step that
  don't free fast enough. Memory optimisation (in-place softmax, KL
  chunking by vocab-tile) is the v6.2 work. v6.1 is CE-only over
  the 4× longer context, a different bet that prioritises context
  reach over teacher matching.
- **All 30K steps used the new attention path.** The NaN-safe
  compressed branch runs by default; no env var or config to enable
  it.
- **Same architecture, weights file format, tokenizer, and config
  shape as v6.0.** The Rust binary loads v6.0 and v6.1 from the same
  loader.
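
A back-of-the-envelope check of the memory figure in the first note: the sketch below sizes a single F32 `[tokens, vocab]` tensor, assuming one row per step (batch size 1). How many such intermediates the real KL path keeps alive at once is not stated here, so the "about four" reading in the comment is a guess.

```python
def logits_bytes(tokens, vocab=151_936, bytes_per_elem=4):
    """Size in bytes of one F32 [tokens, vocab] tensor (batch size 1 assumed)."""
    return tokens * vocab * bytes_per_elem


if __name__ == "__main__":
    for ctx in (64, 256):
        mb = logits_bytes(ctx) / 1e6
        print(f"ctx={ctx}: {mb:.1f} MB per F32 [tokens, vocab] tensor")
    # Assumption: roughly four live tensors of this shape at ctx=256
    # would land near the ~600 MB quoted above.
    print(f"4 tensors at ctx=256: {4 * logits_bytes(256) / 1e6:.0f} MB")
```

Under these assumptions each tensor grows from about 39 MB at ctx=64 to about 156 MB at ctx=256, which is the 4× scaling that pushes the distillation path past available memory.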

## Open items for v6.2

- **Restore KL+CE distillation** at ctx ≥ 256 by chunking the
  151K-vocab log-softmax (compute it in vocab chunks of 512 tokens
  so peak memory stays bounded).
- **Long-context curriculum** (16K → 64K → 128K → 1M) per the V6
  master spec, now that the forward-pass NaN is gone.
- **lm-evaluation-harness pass** for honest numbers.
- **HumanEval / coding evals** if we add a coding-domain corpus
  chunk.
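
The vocab-chunked KL from the first item could look roughly like the following. This is a pure-Python illustration of the idea for a single token position, assuming finite logits; it is not the planned candle implementation, and the function names are hypothetical. The default chunk size of 512 follows the figure mentioned above.

```python
import math


def streaming_logsumexp(logits, chunk=512):
    """logsumexp over a long vector, touching one vocab chunk at a time."""
    running_max, running_sum = float("-inf"), 0.0
    for start in range(0, len(logits), chunk):
        tile = logits[start:start + chunk]
        new_max = max(running_max, max(tile))
        # Rescale the running sum to the new max before adding this tile.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in tile)
        running_max = new_max
    return running_max + math.log(running_sum)


def chunked_kl(teacher_logits, student_logits, chunk=512):
    """KL(teacher || student) in nats for one position, chunked over the vocab."""
    lse_t = streaming_logsumexp(teacher_logits, chunk)
    lse_s = streaming_logsumexp(student_logits, chunk)
    kl = 0.0
    for start in range(0, len(teacher_logits), chunk):
        t_tile = teacher_logits[start:start + chunk]
        s_tile = student_logits[start:start + chunk]
        for t, s in zip(t_tile, s_tile):
            log_p = t - lse_t  # teacher log-probability
            log_q = s - lse_s  # student log-probability
            kl += math.exp(log_p) * (log_p - log_q)
    return kl


if __name__ == "__main__":
    teacher = [0.1 * i for i in range(10)]
    student = [0.05 * i for i in range(10)]
    print(chunked_kl(teacher, student, chunk=4))
```

The two-pass structure (normalisers first, then the KL sum) means no full-vocabulary probability tensor is ever held at once; only the per-chunk tiles and a few scalars are live.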

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v61_2026,
  title  = {Aether Mind v6.1 --- long-context after the compressed-branch NaN fix},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1},
}
```

## Links

- **QuantumAI Blockchain:** [qbc.network](https://qbc.network)
- **GitHub org:** [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Aether (Rust):** [qubitcoin-aether](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
- **Prior releases:**
  - [aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0) (ctx=64, distilled)
  - [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora) (7B LoRA)
- **X / Twitter:** [@qu_bitcoin](https://x.com/qu_bitcoin)
- **Contact:** info@qbc.network

### Framework versions

- candle 0.10 + CUDA 12.6
- Rust `aether-v6-train` binary @ commit
  [`7f9189f8`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/7f9189f8)
- Qwen2.5 tokenizer (vocab 151,936)