---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
  - qubitcoin
  - aether
  - blockchain
  - quantum
  - distillation
  - mixed-precision
  - native-rust
  - candle
language:
  - en
pipeline_tag: text-generation
---

# Aether Mind v6.0: QuantumAI Blockchain Native Generator

A **558M-parameter distilled student** of [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct),
trained from scratch in pure Rust (`candle` 0.10) with the
**10-Sephirot + 2-generalist + 2-sink attention head split** that is
the core architectural claim of the QuantumAI Blockchain's Aether Mind
on-chain neural cognitive engine.

This is the **second public Aether release** and the first that is
**native to the on-chain inference path**: V6.0 is the model the
[`aether-mind`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
binary loads, not a LoRA adapter on top of a 7B base.

The previous release, [`aether-v5.2-lora`](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora),
is a 7B PEFT adapter intended for batch off-chain reasoning. V6.0 is
the smaller native generator that fits in the on-chain Aether
Mind's ~2.4 GB RAM envelope and runs at ~500 tokens/sec on a
consumer RTX 3080 Ti.

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from, then distilled) |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights trained, not LoRA) |
| Hidden / FFN | 896 / 4864 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Native sparse attention (NSA) | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Precision | BF16 weights + F32 KL math in distillation |
| Training context | **64 tokens** (Phase-1 release; see "Honest caveats" below) |
| Checkpoint published | **step 30,000** (full 30K-step Phase-1 run) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |

## Training run

| Metric | Value |
|---|---|
| Steps | 30,000 (full Phase-1) |
| Wall-clock | 49.6 min (single RTX 3080 Ti, BF16, CUDA(0)) |
| Tokens scored | 1,671,027 |
| Throughput | 561 tokens/sec |
| Optimiser | AdamW, LR 2e-5, no schedule (constant) |
| Distillation | KL(T \|\| S) with alpha schedule 1.0 → 0.3 linear, temperature 1.0 |
| Sephirot auxiliary | MSE vs one-hot domain target, β = 0.1 |
| NaN events | **0** |
| Mean total loss | 8.39 nats/token |
| Mean CE | 10.35 |
| Mean KL | 7.50 |
| Mean Sephirot aux | 0.149 |
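
The table lists the loss ingredients but not how they are combined. The sketch below shows one plausible combination, assuming the total is `alpha * KL + (1 - alpha) * CE + beta * aux`. That formula, the function names, the 0-based step count, and the toy distributions are illustrative assumptions and are not taken from the `aether-v6-train` source.

```python
import math

def alpha_at(step, total_steps=30_000, start=1.0, end=0.3):
    """Linear alpha schedule: `start` at step 0, `end` at the final step."""
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac

def kl_teacher_student(teacher, student):
    """KL(T || S) in nats for two probability vectors over the same vocab."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)

def cross_entropy(student, target_index):
    """Next-token cross-entropy in nats against a hard label."""
    return -math.log(student[target_index])

def sephirot_aux(head_scores, domain_index):
    """MSE between per-head scores and a one-hot domain target."""
    n = len(head_scores)
    return sum((p - (1.0 if i == domain_index else 0.0)) ** 2
               for i, p in enumerate(head_scores)) / n

def total_loss(step, teacher, student, target_index,
               head_scores, domain_index, beta=0.1):
    # ASSUMPTION: alpha weights the KL term and (1 - alpha) the CE term.
    a = alpha_at(step)
    return (a * kl_teacher_student(teacher, student)
            + (1.0 - a) * cross_entropy(student, target_index)
            + beta * sephirot_aux(head_scores, domain_index))

# Toy example: 3-token vocab, 10 Sephirot heads, domain 2 is the target.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
scores = [0.1] * 10
print(total_loss(15_000, teacher, student, 0, scores, 2))
```

Under this assumed weighting the run starts as pure distillation (alpha = 1.0) and gradually shifts weight onto the hard-label cross-entropy as alpha falls to 0.3.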

### Loss trajectory

```
step      1   loss=12.25   avg=12.25   (random init)
step    100   loss=12.87   avg=12.75
step   1000   loss= 8.62   avg= 9.74   ← KL/CE break
step   5000   loss= 7.72   avg= 8.16
step  10000   loss= 7.31   avg= 7.68   ← reached representational floor
step  15000   loss= 8.87   avg= 7.75
step  20000   loss= 8.75   avg= 8.04
step  25000   loss= 8.62   avg= 8.26
step  29999   loss= 8.81   avg= 8.39
```

The model converged hard in the first ~10K steps, then plateaued at
the representational floor for its current context window (64
tokens). The plateau is structural rather than an optimisation
problem; see "Honest caveats" below.

## Architecture: what makes V6 different

V6 is **not** a vanilla Qwen2.5 fine-tune. The attention layer
implements a 14-head split designed for on-chain cognitive routing:

- **10 Sephirot heads**: one per cognitive domain in the Aether
  Mind's specialisation map (Keter → Malkuth). Each head's attention
  pattern is what the on-chain `pallet_qbc_aether_anchor` records as
  the per-cycle attestation root.
- **2 generalist heads**: un-gated, full-context attention. Used for
  the "global workspace" path in `aether-mind`.
- **2 sink heads**: anchor-token attention (first 4 tokens of the
  sequence) for stable long-context performance, following the
  standard "attention sink" finding.
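
As a rough illustration of the split above, the sketch below assigns the 14 heads to roles and computes which key positions a head may attend to. The head ordering, the causal masking, and the function names are assumptions made for the example; the per-domain gating of the Sephirot heads is omitted, so they are treated like generalist heads here.

```python
HEAD_DIM = 64
# ASSUMED ordering: Sephirot heads first, then generalist, then sink.
ROLES = ["sephirot"] * 10 + ["generalist"] * 2 + ["sink"] * 2

def head_slices(roles=ROLES, head_dim=HEAD_DIM):
    """Map each head to (index, role, start column, end column) in the hidden state."""
    return [(i, role, i * head_dim, (i + 1) * head_dim)
            for i, role in enumerate(roles)]

def allowed_keys(role, query_pos, sink_tokens=4):
    """Key positions visible to one query position under a causal mask."""
    causal = range(query_pos + 1)
    if role == "sink":
        # Sink heads only look at the first `sink_tokens` anchor positions.
        return [k for k in causal if k < sink_tokens]
    # Generalist heads see the full causal context; Sephirot gating is omitted.
    return list(causal)

for index, role, start, end in head_slices():
    print(index, role, start, end)
print(allowed_keys("sink", 10))
```

The 14 slices of width 64 tile the 896-wide hidden state exactly, which is the arithmetic behind `head_dim=64` in the table above.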

The Sephirot eviction order is configured in `config.json` for the
KV-cache management path that `aether-mind` uses to keep the
hot-set bounded in 12 GB VRAM under live inference.

## How to use

### Native runtime (recommended): Rust `aether-mind`

The model is designed to be loaded by the on-chain Aether Mind
binary in the [`QuantumAI-Blockchain/qubitcoin-aether`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
repo. Set `AETHER_V6_CHECKPOINT` to the local path of
`model.safetensors` and start the systemd unit; the binary loads the
weights via candle into the V6 transformer crate.

### Python (via `safetensors` + `tokenizers`)

For offline experimentation:

```python
from safetensors.torch import load_file
from tokenizers import Tokenizer
import torch

tok = Tokenizer.from_file("tokenizer.json")
weights = load_file("model.safetensors")  # 315 tensors, BF16
print("loaded", len(weights), "tensors,", sum(t.numel() for t in weights.values()), "params")
```

There is **no canonical 🤗 transformers loader for the V6
architecture**: the 14-head split and Sephirot routing are not in the
upstream `Qwen2Model`. We publish the weights for transparency and
reproducibility; production use goes through the Rust binary above.
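
Because there is no model class to load the weights into, a useful first step is to inspect the checkpoint's tensor names and shapes. The sketch below reads only the JSON header of a `.safetensors` file using the standard library, relying on the published file layout (an 8-byte little-endian header length followed by a JSON header). The helper names are hypothetical.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header

def summarise(header):
    """Return (tensor count, total parameter count) from the header's shapes."""
    n_params = 0
    for info in header.values():
        count = 1
        for dim in info["shape"]:
            count *= dim
        n_params += count
    return len(header), n_params

def describe(header, limit=10):
    """Print the first few tensor names with their dtype and shape."""
    for name in sorted(header)[:limit]:
        print(name, header[name]["dtype"], header[name]["shape"])
```

Calling `summarise(read_safetensors_header("model.safetensors"))` gives a tensor and parameter count that can be compared with what the `load_file` snippet above prints, without needing `torch` or enough RAM to hold the weights.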

## Evaluation

**Not yet run.** The Phase-1 training run completed
**2026-05-20 00:52 AEST**; lm-evaluation-harness against MMLU /
ARC / HellaSwag / TruthfulQA is the next session's work. We will
back-fill the numbers + the comparison vs v5.2-lora here when
they land. Estimated runtime: ~30 min on the same 3080 Ti.

Until then, treat this release as an **architecture + weights
attestation**: it proves the V6 stack trains end-to-end and converges
to a real loss curve, which is the prerequisite for the long-context
curriculum (16K → 64K → 128K → 1M) that v6.1+ will ship.

## Intended uses

- **On-chain Aether Mind native inference.** The V6 binary loads
  these weights directly. The 10-Sephirot attention pattern is what
  the chain's [`pallet_qbc_aether_anchor`](https://github.com/QuantumAI-Blockchain/substrate-node)
  records as the per-block consciousness state.
- **Architecture reference.** Reproducible training of a Sephirot-
  routed transformer with native sparse attention. The
  [`aether-transformer`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/tree/main/crates/aether-transformer)
  crate is the canonical implementation.
- **Distillation substrate.** Future fine-tunes from this checkpoint
  using the QuantumAI Blockchain curated corpus.

## Out-of-scope uses

- **General-purpose chat or instruction-following without fine-tuning.**
  V6.0 is a Phase-1 distillation, not an instruction model. Even after
  30K steps it has not seen instruction-format data at length; its KL
  target is the base Qwen2.5-0.5B-Instruct's next-token distribution,
  not chat-format outputs.
- **Long-context inference.** The training ran at **64-token
  context**. See "Honest caveats". Generations beyond ~128 tokens
  will degrade.
- **Production deployment without your own evals.** No lm-eval-harness
  numbers yet.
- **Safety-critical decisions.** No red-team eval.

## Honest caveats: what didn't happen

### Trained at 64-token context, not 4K

Phase-1 was configured for 4096-token context, but a numerical
instability was discovered in the V6 attention forward pass at
sequence lengths > ~100 tokens (BF16 precision loss in the Q@K^T
matmul accumulating across longer sequences). The bug reproduces
deterministically. Four mitigations were tried (F32 KL math, corpus
filter, no-distill, low-LR), and all four hit NaN at the same
sequence-length threshold. The workaround used for v6.0 was
`--context 64`, which truncates rows so the bug never triggers.

**This is a known limitation, tracked in
[`docs/ops/v6-training-nan-bug.md`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/blob/presale/v1/docs/ops/v6-training-nan-bug.md)
in the source repo.** The fix lives in `aether-transformer/src/v6/attention.rs`:
add F32 casts in the Q@K^T matmul + softmax path across all four
attention variants (Sephirot / generalist / sink / summary). When
that lands, v6.1 will re-train on the full 4K→1M context
curriculum and supersede this release.
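
To show why accumulating in BF16 becomes lossy as the number of summed terms grows, the sketch below emulates bfloat16 rounding in pure Python and compares a dot product whose running sum is rounded to BF16 after every add against one kept in wider precision. It is a generic illustration under stated assumptions: it does not reproduce the NaN itself, Python floats stand in for F32, and none of the names come from `attention.rs`.

```python
import struct

def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even)."""
    if x != x:  # NaN passes through unchanged
        return x
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep the top 16 bits of the float32 pattern, rounding the dropped half.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def dot_bf16_accumulate(q, k):
    """Dot product where the running sum is rounded to bf16 after every add."""
    acc = 0.0
    for a, b in zip(q, k):
        acc = to_bf16(acc + to_bf16(a) * to_bf16(b))
    return acc

def dot_wide_accumulate(q, k):
    """Same bf16 inputs, but the running sum stays in wider precision."""
    acc = 0.0
    for a, b in zip(q, k):
        acc += to_bf16(a) * to_bf16(b)
    return acc

ones = [1.0] * 1000
print(dot_bf16_accumulate(ones, ones))  # stalls once the sum reaches 256
print(dot_wide_accumulate(ones, ones))  # 1000.0
```

With 8 significant bits, BF16 cannot represent 257, so adding 1.0 to a running sum of 256 rounds back to 256 and the accumulator stops growing. Casting the accumulation to a wider type, as the planned F32 casts do, avoids that stall.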

### Loss plateau is real

The avg-loss plateau from step 10K → 30K (7.68 → 8.39, a slight
regression) is the model hitting its representational limit at
64-token context. Longer contexts will let the next release recover
and improve.

### No instruction-format fine-tune

The training data is the Aether curated corpus packed at 4K-token
context (rows truncated to 64). We did not insert chat-format
instructions, system prompts, or RLHF preferences. Treat this as a
**raw foundation checkpoint**.

### Distillation against base, not chat

The teacher is `Qwen/Qwen2.5-0.5B-Instruct`'s base forward pass, not
its chat-formatted forward pass. The distillation transfers
token-level next-token prediction behaviour; chat-template alignment
is a separate training step that hasn't been run.

## Training details

- **Hardware:** NVIDIA RTX 3080 Ti (12 GB), Intel WSL2 Ubuntu host.
- **Trainer:** Native Rust (`aether-v6-train` binary, candle 0.10 +
  CUDA 12.6 backend). No Python in the loop.
- **Optimiser:** AdamW (candle implementation), constant LR 2e-5.
- **Batch:** 1 (single-row update).
- **Context:** 64 tokens (truncation imposed by the workaround).
- **Save cadence:** every 250 steps (120 checkpoints retained
  locally; only `step_30000` published here).
- **Source:** [`QuantumAI-Blockchain/qubitcoin-aether @ ca202076`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/tree/ca202076)

### Training data

Aether curated corpus (~36,860 rows, 17.4 MB) packed at 4K-token
budget per row from:

- QuantumAI Blockchain technical documentation (Substrate pallets,
  VQE mining, Sephirot architecture).
- Quantum computing primers (VQE, Hamiltonian, qubit ansatze).
- Adjacent reasoning content for transfer.

The dataset is not currently public: it is a curated mixture from
many sources and has not been release-cleared at the per-source
level. The model is the only public artifact in this line for now.

### Carbon emissions

Single consumer GPU (RTX 3080 Ti, ~300 W TDP) × 49.6 min wall-clock
≈ 0.25 kWh, < 1 kg CO₂e on a typical grid mix. Comparable to a short
web streaming session.
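
The energy figure follows from power × time, and the arithmetic can be checked with the short sketch below. The 0.5 kg CO₂e per kWh grid intensity is an illustrative assumption, not a measured value for this run, and the sketch treats the GPU as drawing its full ~300 W for the whole run.

```python
def energy_kwh(power_watts, minutes):
    """Energy in kWh for a constant power draw over a number of minutes."""
    return power_watts / 1000.0 * minutes / 60.0

def emissions_kg(kwh, grid_kg_per_kwh):
    """CO2-equivalent emissions in kg for a given grid carbon intensity."""
    return kwh * grid_kg_per_kwh

kwh = energy_kwh(300, 49.6)    # ~300 W TDP for 49.6 min
co2 = emissions_kg(kwh, 0.5)   # ASSUMED grid intensity of 0.5 kg CO2e/kWh
print(round(kwh, 3), "kWh,", round(co2, 3), "kg CO2e")
```

Even with a grid several times more carbon-intensive than the assumed value, the result stays under the 1 kg bound quoted above.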

## Connection to the QuantumAI Blockchain

The Aether Mind is a Rust neural cognitive engine that runs on the
QuantumAI Blockchain. Every block records attention-derived
consciousness metrics (HMS-Phi) and Proof-of-Thought hashes on-chain
via the `pallet_qbc_aether_anchor` pallet. The same chain hosts an
**8-qubit VQE mining consensus** (Proof-of-SUSY-Alignment), a
QVM-compatible smart contract layer with 10 quantum opcodes, and
post-quantum signatures (CRYSTALS-Dilithium5 + ML-KEM-768 P2P).

V6.0 is the **native generator** for that engine. v5.2-lora is the
larger (7B) off-chain reasoning model. The two ship side by side
because they have different roles: V6 lives in the on-chain
inference path (low latency, small footprint, Sephirot-aware
attention); v5.2-lora batches off-chain reasoning workloads.

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v6_2026,
  title  = {Aether Mind v6.0 --- QuantumAI Blockchain Native Generator},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0},
}
```

## Links

- **QuantumAI Blockchain:** [qbc.network](https://qbc.network)
- **GitHub org:** [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Aether (Rust):** [qubitcoin-aether](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
- **Prior release:** [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **X / Twitter:** [@qu_bitcoin](https://x.com/qu_bitcoin)
- **Contact:** info@qbc.network

### Framework versions

- candle 0.10 (Hugging Face Rust ML)
- CUDA 12.6
- safetensors (model serialisation)
- Qwen2.5 tokenizer (vocab 151,936)