---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
- qubitcoin
- aether
- blockchain
- quantum
- native-rust
- candle
- long-context
language:
- en
pipeline_tag: text-generation
---

# Aether Mind v6.1: long-context after the NaN fix

V6.1 is the **third public Aether release** and the first that
trains on a meaningfully long context window. It supersedes
[aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0),
which was published with a forced `ctx=64` workaround because of a
forward-pass numerical instability in the NSA compressed branch
(`v6/attention.rs::compressed_branch`).

That instability is now diagnosed and fixed. **Compressed-branch
attention's causal mask was producing all-`-inf` rows for query
positions before the first 64-token block completed, driving softmax
to `0/0 = NaN`.** The fix tracks per-row validity, unmasks a single
block on otherwise-fully-masked rows to keep softmax finite, and
multiplies the branch output by a row-validity mask so those rows
contribute zero attention (their proper behaviour). The source and
verification log are in
[`docs/ops/v6-training-nan-bug.md`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/blob/presale/v1/docs/ops/v6-training-nan-bug.md);
the fix landed in commit
[`7f9189f8`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/7f9189f8).

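To make the failure mode concrete, here is a minimal pure-Python sketch
of a row-validity guard for a block-causal mask. It is not the code in
`v6/attention.rs`: the function name `safe_block_softmax`, the rule that
a query at position `t` may attend to compressed block `j` only once
that block is complete (`(j + 1) * block <= t + 1`), and the choice of
block 0 as the one to unmask are all assumptions made for illustration.

```python
import math

def safe_block_softmax(scores, query_pos, block=64):
    """Softmax over compressed-block scores for one query row.

    scores[j] is the raw score against compressed block j (non-empty list).
    Block j is assumed visible only once all of its tokens lie at or before
    the query, i.e. (j + 1) * block <= query_pos + 1 (hypothetical rule).
    Returns (weights, row_valid).
    """
    visible = [(j + 1) * block <= query_pos + 1 for j in range(len(scores))]
    row_valid = any(visible)
    if not row_valid:
        # Every entry would be -inf, so the exponentials sum to 0 and the
        # normalisation becomes 0/0 (NaN in tensor math).
        # Unmask one block to keep the softmax finite.
        visible[0] = True
    masked = [s if v else -math.inf for s, v in zip(scores, visible)]
    peak = max(masked)
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Zero the row afterwards so it contributes no attention.
    gate = 1.0 if row_valid else 0.0
    return [w * gate for w in weights], row_valid


# ctx=256 with block=64 gives 4 compressed blocks.
scores = [0.3, -1.2, 0.8, 0.1]
early, early_valid = safe_block_softmax(scores, query_pos=10)
late, late_valid = safe_block_softmax(scores, query_pos=130)
print(early, early_valid)
print(late, late_valid)
```

With `query_pos=10` no 64-token block has completed, so the row comes
back as all zeros with `row_valid=False`; with `query_pos=130` the first
two blocks are visible and their weights sum to 1.
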
V6.1 was trained at **4× the v6.0 context** (256 vs 64 tokens) on
the same 36,860-row Aether curated corpus, on the same RTX 3080 Ti,
in the same wall-clock envelope (~44 min vs v6.0's 50 min; slightly
faster because there is no Qwen teacher forward pass).

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from, then CE-trained) |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | **Pure cross-entropy** (no distillation in this release; see notes below) |
| Training context | **256 tokens** (4× the v6.0 release) |
| Precision | BF16 weights, F32 KL/CE math internally for numerical stability |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Checkpoint published | **step 30,000** (full Phase-1 run) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |

## Training run

| Metric | Value | Δ vs v6.0 |
|---|---|---|
| Steps | 30,000 | = |
| Wall-clock | 44.4 min | −10 % |
| Tokens scored | 1,676,479 | +0.3 % (4× context lets more rows fit) |
| Throughput | 629.9 tokens/sec | +12 % |
| Mean CE loss | **10.18** nats/token | better (v6.0 was 10.35 mean CE under the KL blend) |
| Mean Sephirot aux | 0.149 | = |
| Max tokens processed | **167** | (v6.0 truncated to 64) |
| **NaN events** | **0** | (v6.0 also 0 thanks to the ctx=64 workaround) |

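As a quick consistency check on the table, throughput should equal
tokens scored divided by wall-clock seconds. The sketch below redoes
that arithmetic from the rounded figures above; the variable names are
illustrative, and the per-step average is a derived quantity, not a
logged one.

```python
# Figures copied from the training-run table (wall-clock is rounded to 0.1 min).
tokens_scored = 1_676_479
wall_clock_min = 44.4
steps = 30_000

throughput = tokens_scored / (wall_clock_min * 60)  # tokens per second
tokens_per_step = tokens_scored / steps             # mean scored tokens per step

print(f"throughput ~ {throughput:.1f} tokens/sec")
print(f"mean tokens per step ~ {tokens_per_step:.1f}")
```

The recomputed throughput lands within rounding distance of the
reported 629.9 tokens/sec; the small gap comes from the wall-clock
figure being rounded to one decimal.
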
### Loss trajectory

```
step 1 loss=15.75 avg=15.75 (random init)
step 100 loss=15.94 avg=16.32 warm-up
step 1000 loss=11.63 avg=13.20 ← CE/lm-head learning the vocab
step 5000 loss=10.00 avg=11.01
step 10000 loss= 9.13 avg=10.07 ← representational floor (v6.0 logged 7.68 at this step, but that is apples-to-oranges: v6.0's loss was blended with the KL teacher signal)
step 15000 loss=11.13 avg= 9.87
step 20000 loss=10.25 avg=10.02
step 25000 loss= 9.75 avg=10.15
step 29999 loss= 9.81 avg=10.18
```

The interesting fact: at step 122 (the row where v6.0 first NaN'd,
tokens=167), v6.1 reports a finite loss in the 9-16 range and continues
training. **This release is empirical evidence that the
compressed-branch fix is the right one.**

## Architecture (unchanged from v6.0)

V6 is **not** a vanilla Qwen2.5 fine-tune. The attention layer
implements a 14-head split designed for on-chain cognitive routing:

- **10 Sephirot heads**: one per cognitive domain (Keter → Malkuth).
  Each head's attention pattern is what the on-chain
  `pallet_qbc_aether_anchor` records as the per-cycle attestation root.
- **2 generalist heads**: un-gated, full-context attention. Used
  for the "global workspace" path in `aether-mind`.
- **2 sink heads**: anchor-token attention (first 4 tokens) for
  stable long-context performance.

The NSA compressed branch (the one that NaN'd) now correctly handles
the early-query case via row-validity masking.

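The 14-head split can be pictured as a simple index-to-role table. The
sketch below is illustrative only: the ordering of heads (Sephirot
first, then generalist, then sink), the `HeadRole` and
`build_head_layout` names, and the particular transliterations of the
ten Sephirot are assumptions, not taken from the Rust source. The one
hard check is that 14 heads of `head_dim=64` span the 896-wide hidden
state.

```python
from dataclasses import dataclass

# Traditional names of the ten Sephirot, Keter through Malkuth
# (transliterations vary; these spellings are an assumption).
SEPHIROT = ["Keter", "Chokmah", "Binah", "Chesed", "Gevurah",
            "Tiferet", "Netzach", "Hod", "Yesod", "Malkuth"]

@dataclass(frozen=True)
class HeadRole:
    index: int
    kind: str   # "sephirot", "generalist", or "sink"
    name: str

def build_head_layout(n_generalist=2, n_sink=2):
    """Assign a role to each attention head (hypothetical ordering)."""
    names = ([("sephirot", s) for s in SEPHIROT]
             + [("generalist", f"generalist_{i}") for i in range(n_generalist)]
             + [("sink", f"sink_{i}") for i in range(n_sink)])
    return [HeadRole(i, kind, name) for i, (kind, name) in enumerate(names)]

HEAD_DIM = 64
layout = build_head_layout()
hidden = len(layout) * HEAD_DIM
print(len(layout), "heads x", HEAD_DIM, "=", hidden)
for head in layout:
    print(head.index, head.kind, head.name)
```
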
## How to use

### Native runtime (recommended): Rust `aether-mind`

Set `AETHER_V6_CHECKPOINT` to the local path of `model.safetensors`,
then restart `qbc-aether-mind.service`. The Rust binary loads the
checkpoint via candle.

### Python

```python
from safetensors.torch import load_file

# Load the raw checkpoint tensors (CPU by default); no model class is built.
weights = load_file("model.safetensors")  # 315 BF16 tensors
print("params:", sum(t.numel() for t in weights.values()))
```

There is **no upstream 🤗 transformers loader** for the V6 14-head
split + Sephirot routing. Production use goes through the Rust
binary in
[`qubitcoin-aether`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether).

## Evaluation

**Not yet run.** An lm-evaluation-harness pass on MMLU, ARC,
HellaSwag, and TruthfulQA is the next session's work. We will
back-fill the numbers and the comparison against v5.2-lora and v6.0
here when they land.

## Notes vs v6.0

- **No KL distillation in this release.** The full distillation
  path (KL teacher signal + CE + Sephirot aux) hits a CUDA OOM at
  the new ctx=256 because the F32-stable KL log-softmax of the
  151K-vocab tensor allocates ~600 MB of intermediates per step that
  don't free fast enough. Memory optimisation (in-place softmax, KL
  chunking by vocab-tile) is the v6.2 work. v6.1 is CE-only over
  the 4× longer context: a different bet that prioritises context
  reach over teacher matching.
- **All 30K steps used the new attention path.** The NaN-safe
  compressed branch runs by default; no env var or config to enable
  it.
- **Same architecture, weights file format, tokenizer, and config
  shape as v6.0.** The Rust binary loads v6.0 and v6.1 from the same
  loader.

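A back-of-envelope calculation shows why the full-vocabulary KL path is
heavy at ctx=256. The sketch assumes a batch of one sequence and counts
four full-size F32 intermediates (for example teacher and student
logits plus their log-softmax outputs); that count is a guess chosen
for illustration, not a measurement of the actual training binary.

```python
VOCAB_SIZE = 151_936  # Qwen2.5 tokenizer vocabulary
F32_BYTES = 4

def logits_bytes(ctx, batch=1):
    """Bytes for one F32 tensor of shape (batch, ctx, vocab)."""
    return batch * ctx * VOCAB_SIZE * F32_BYTES

N_INTERMEDIATES = 4  # assumed count of live full-vocab F32 tensors

for ctx in (64, 256):
    one = logits_bytes(ctx)
    print(f"ctx={ctx:>3}: {one / 1e6:6.1f} MB per tensor, "
          f"{N_INTERMEDIATES * one / 1e6:6.1f} MB for {N_INTERMEDIATES} tensors")
```

Under these assumptions ctx=256 comes to roughly 620 MB, in the same
range as the ~600 MB quoted above and four times the ctx=64 figure.
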
## Open items for v6.2

- **Restore KL+CE distillation** at ctx ≥ 256 by chunking the
  151K-vocab log-softmax (compute it per 512-token vocab chunk so
  that peak memory stays bounded).
- **Long-context curriculum** (16K → 64K → 128K → 1M) per the V6
  master spec, now that the forward-pass NaN is gone.
- **lm-evaluation-harness pass** for honest numbers.
- **HumanEval / coding evals** if we add a coding-domain corpus
  chunk.

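The vocab-chunked KL in the first item can be sketched with a streaming
log-sum-exp: a first pass over vocabulary tiles accumulates the
normalisers for teacher and student, and a second pass accumulates the
KL terms tile by tile, so no full-vocabulary log-softmax has to be held
at once. This is a pure-Python illustration for a single token
position, not the planned Rust/candle implementation; `chunked_kl`,
`streaming_logsumexp`, and the two-pass structure are assumptions.

```python
import math
import random

def streaming_logsumexp(logits, chunk):
    """log(sum(exp(logits))) computed one vocab tile at a time."""
    running_max, running_sum = -math.inf, 0.0
    for start in range(0, len(logits), chunk):
        tile = logits[start:start + chunk]
        new_max = max(running_max, max(tile))
        # Rescale the old sum to the new max, then add this tile.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + sum(math.exp(x - new_max) for x in tile))
        running_max = new_max
    return running_max + math.log(running_sum)

def chunked_kl(teacher, student, chunk=512):
    """KL(teacher || student) in nats for one position, tile by tile."""
    lse_t = streaming_logsumexp(teacher, chunk)
    lse_s = streaming_logsumexp(student, chunk)
    kl = 0.0
    for start in range(0, len(teacher), chunk):
        pairs = zip(teacher[start:start + chunk], student[start:start + chunk])
        for t, s in pairs:
            log_p = t - lse_t  # teacher log-probability
            log_q = s - lse_s  # student log-probability
            kl += math.exp(log_p) * (log_p - log_q)
    return kl

# Toy logits standing in for one row of a (ctx, vocab) tensor.
random.seed(0)
teacher = [random.gauss(0, 1) for _ in range(2000)]
student = [random.gauss(0, 1) for _ in range(2000)]
print(chunked_kl(teacher, student, chunk=512))
```

The result does not depend on the tile size, so `chunk` can be tuned
purely for memory; only one tile of exponentials is live at a time.
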
## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v61_2026,
title = {Aether Mind v6.1 --- long-context after the compressed-branch NaN fix},
author = {{BlockArtica} and {QuantumAI-Blockchain}},
year = {2026},
url = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1},
}
```

## Links

- **QuantumAI Blockchain:** [qbc.network](https://qbc.network)
- **GitHub org:** [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Aether (Rust):** [qubitcoin-aether](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
- **Prior releases:**
  - [aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0) (ctx=64, distilled)
  - [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora) (7B LoRA)
- **X / Twitter:** [@qu_bitcoin](https://x.com/qu_bitcoin)
- **Contact:** info@qbc.network

### Framework versions

- candle 0.10 + CUDA 12.6
- Rust `aether-v6-train` binary @ commit
  [`7f9189f8`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/7f9189f8)
- Qwen2.5 tokenizer (vocab 151,936)