---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
- qubitcoin
- aether
- blockchain
- quantum
- distillation
- mixed-precision
- native-rust
- candle
language:
- en
pipeline_tag: text-generation
---

# Aether Mind v6.0 — QuantumAI Blockchain Native Generator

A **558M-parameter distilled student** of [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct),
trained from scratch in pure Rust (`candle` 0.10) with the
**10-Sephirot + 2-generalist + 2-sink attention head split** that is
the core architectural claim of the QuantumAI Blockchain's Aether Mind
on-chain neural cognitive engine.

This is the **second public Aether release** and the first that is
**native to the on-chain inference path**: V6.0 is the model the
[`aether-mind`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
binary loads, not a LoRA adapter on top of a 7B base.

The previous release, [`aether-v5.2-lora`](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora),
is a 7B PEFT adapter intended for batch off-chain reasoning. V6.0 is
the smaller native generator that fits in the on-chain Aether
Mind's ~2.4 GB RAM envelope and runs at ~500 tokens/sec on a
consumer RTX 3080 Ti.

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from, then distilled) |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights trained, not LoRA) |
| Hidden / FFN | 896 / 4864 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Native sparse attention (NSA) | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Precision | BF16 weights + F32 KL math in distillation |
| Training context | **64 tokens** (Phase-1 release; see "Honest caveats" below) |
| Checkpoint published | **step 30,000** (full 30K-step Phase-1 run) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |
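
The NSA row above combines a local sliding window with a few always-visible sink tokens. As a rough sketch of what those two settings imply, the snippet below lists which key positions a causal query could see under `sliding_window=512` and `sink_tokens=4` alone. The function name is hypothetical, including the query position in the window is an assumption, and the compression and top-k selection branches are not modelled.

```python
def visible_keys(query_pos: int, sliding_window: int = 512, sink_tokens: int = 4) -> list[int]:
    """Key positions a causal query at `query_pos` may attend to.

    Illustrative only: covers the sink + sliding-window part of the
    sparse pattern, not the compressed or top-k selected blocks.
    """
    # Sink tokens: the first `sink_tokens` positions, never in the future.
    sinks = range(0, min(sink_tokens, query_pos + 1))
    # Local window: the most recent `sliding_window` positions (assumed to
    # include the query position itself).
    window_start = max(0, query_pos - sliding_window + 1)
    window = range(window_start, query_pos + 1)
    return sorted(set(sinks) | set(window))


keys = visible_keys(query_pos=1000)
print(len(keys), keys[:6], keys[-1])
```

Under these assumptions a query at position 1000 sees 516 keys: the 4 sinks plus the 512 most recent positions.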

## Training run

| Metric | Value |
|---|---|
| Steps | 30,000 (full Phase-1) |
| Wall-clock | 49.6 min (single RTX 3080 Ti, BF16, CUDA(0)) |
| Tokens scored | 1,671,027 |
| Throughput | 561 tokens/sec |
| Optimiser | AdamW, LR 2e-5, no schedule (constant) |
| Distillation | KL(T ‖ S) with alpha schedule 1.0 → 0.3 linear, temperature 1.0 |
| Sephirot auxiliary | MSE vs one-hot domain target, β = 0.1 |
| NaN events | **0** |
| Mean total loss | 8.39 nats/token |
| Mean CE | 10.35 |
| Mean KL | 7.50 |
| Mean Sephirot aux | 0.149 |
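
The table names the loss terms (KL, CE, Sephirot auxiliary) and their weights, but not how they are combined. The sketch below shows one plausible combination, `alpha * KL(T ‖ S) + (1 - alpha) * CE + beta * aux`, with alpha falling linearly from 1.0 to 0.3 over the 30,000 steps. The formula, the schedule endpoints in step terms, and all function names are assumptions for illustration, not the trainer's actual code.

```python
import math

TOTAL_STEPS = 30_000
BETA = 0.1  # weight of the Sephirot auxiliary term, from the table above


def alpha_at(step: int, start: float = 1.0, end: float = 0.3) -> float:
    """Linear alpha schedule from `start` at step 0 to `end` at the final step."""
    frac = min(max(step / TOTAL_STEPS, 0.0), 1.0)
    return start + (end - start) * frac


def kl_teacher_student(teacher: list[float], student: list[float]) -> float:
    """KL(T || S) in nats for two probability vectors over the same vocabulary."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


def cross_entropy(student: list[float], target: int) -> float:
    """Negative log-likelihood of the target token under the student."""
    return -math.log(student[target])


def mse_one_hot(scores: list[float], domain: int) -> float:
    """Mean squared error between per-domain scores and a one-hot target."""
    return sum((x - (1.0 if i == domain else 0.0)) ** 2
               for i, x in enumerate(scores)) / len(scores)


def total_loss(step: int, teacher: list[float], student: list[float],
               target: int, scores: list[float], domain: int) -> float:
    # Assumed combination; the card does not spell out the exact formula.
    a = alpha_at(step)
    return (a * kl_teacher_student(teacher, student)
            + (1.0 - a) * cross_entropy(student, target)
            + BETA * mse_one_hot(scores, domain))


teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
print(alpha_at(0), alpha_at(15_000), alpha_at(30_000))
print(total_loss(15_000, teacher, student, target=0, scores=[0.8, 0.1, 0.1], domain=0))
```

If the weighting does work this way, the CE term gains weight as alpha decays, so the mix of terms in the total loss shifts over the run.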

### Loss trajectory

```
step     1  loss=12.25  avg=12.25   (random init)
step   100  loss=12.87  avg=12.75
step  1000  loss= 8.62  avg= 9.74   ← KL/CE break
step  5000  loss= 7.72  avg= 8.16
step 10000  loss= 7.31  avg= 7.68   ← reached representational floor
step 15000  loss= 8.87  avg= 7.75
step 20000  loss= 8.75  avg= 8.04
step 25000  loss= 8.62  avg= 8.26
step 29999  loss= 8.81  avg= 8.39
```

The loss fell sharply in the first ~10K steps, then plateaued at
the representational floor for its current context window (64
tokens), with the running average drifting up slightly from 7.68 to
8.39. The plateau is structural rather than an optimisation problem;
see "Honest caveats" below.

## Architecture — what makes V6 different

V6 is **not** a vanilla Qwen2.5 fine-tune. The attention layer
implements a 14-head split designed for on-chain cognitive routing:

- **10 Sephirot heads**: one per cognitive domain in the Aether
  Mind's specialisation map (Keter → Malkuth). Each head's attention
  pattern is what the on-chain `pallet_qbc_aether_anchor` records as
  the per-cycle attestation root.
- **2 generalist heads**: un-gated, full-context attention. Used for
  the "global workspace" path in `aether-mind`.
- **2 sink heads**: anchor-token attention (first 4 tokens of the
  sequence) for stable long-context performance, following the
  standard "attention sink" finding.

The Sephirot eviction order is configured in `config.json` for the
KV-cache management path that `aether-mind` uses to keep the
hot set bounded within 12 GB of VRAM under live inference.
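
The 14-head split described above can be written down as a small layout check. This is a sketch, not the `aether-transformer` crate's data structure. The class name, the Sephirot-first head ordering, and the spelling of the ten names (the conventional Keter-through-Malkuth list) are all assumptions.

```python
from dataclasses import dataclass

# Conventional names of the ten Sephirot, Keter through Malkuth.
# Spelling and head ordering are assumptions for illustration.
SEPHIROT = ["Keter", "Chokmah", "Binah", "Chesed", "Gevurah",
            "Tiferet", "Netzach", "Hod", "Yesod", "Malkuth"]


@dataclass(frozen=True)
class HeadLayout:
    hidden: int = 896
    head_dim: int = 64
    sephirot: int = 10
    generalist: int = 2
    sink: int = 2

    @property
    def num_heads(self) -> int:
        return self.sephirot + self.generalist + self.sink

    def roles(self) -> list[str]:
        """Role label for each head index, in an assumed Sephirot-first order."""
        return (SEPHIROT[: self.sephirot]
                + [f"generalist_{i}" for i in range(self.generalist)]
                + [f"sink_{i}" for i in range(self.sink)])

    def validate(self) -> None:
        # The per-head slices must tile the hidden dimension exactly.
        if self.num_heads * self.head_dim != self.hidden:
            raise ValueError("head split does not match hidden size")


layout = HeadLayout()
layout.validate()  # 14 heads x 64 dims = 896 hidden
print(layout.num_heads, layout.roles()[0], layout.roles()[-1])
```

The check confirms that the published figures are self-consistent: 10 + 2 + 2 heads of width 64 exactly cover the 896-wide hidden state.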

## How to use

### Native runtime (recommended) — Rust `aether-mind`

The model is designed to be loaded by the on-chain Aether Mind
binary in the [`QuantumAI-Blockchain/qubitcoin-aether`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
repo. Set `AETHER_V6_CHECKPOINT` to the local path of
`model.safetensors` and start the systemd unit; the binary loads the
weights via candle into the V6 transformer crate.

### Python (via `safetensors` + `tokenizers`)

For offline experimentation:

```python
from safetensors.torch import load_file  # requires torch to be installed
from tokenizers import Tokenizer

# Load the Qwen2.5 tokenizer file.
tok = Tokenizer.from_file("tokenizer.json")

# Load every tensor into a dict of name -> torch.Tensor (315 tensors, BF16).
weights = load_file("model.safetensors")
n_params = sum(t.numel() for t in weights.values())
print("loaded", len(weights), "tensors,", n_params, "params")
```

There is **no canonical 🤗 transformers loader for the V6
architecture**: the 14-head split and Sephirot routing are not part of the
upstream `Qwen2Model`. We publish the weights for transparency and
reproducibility; production use goes through the Rust binary above.
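
Because no `transformers` loader exists, it can be handy to inspect the checkpoint without loading any tensor data or installing torch. A `.safetensors` file begins with an 8-byte little-endian header length followed by a JSON header that maps tensor names to dtype, shape and byte offsets. The helper names below are hypothetical; this is a minimal sketch that relies only on that file layout.

```python
import json
import struct
from math import prod


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 byte count
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def summarise(path: str) -> tuple[int, int]:
    """Count tensors and total elements from the header alone."""
    header = read_safetensors_header(path)
    total = sum(prod(info["shape"]) for info in header.values())
    return len(header), total
```

Calling `summarise("model.safetensors")` returns the tensor count and total element count, which can be compared against the figures quoted in this card.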

## Evaluation

**Not yet run.** The Phase-1 training run completed
**2026-05-20 00:52 AEST**; running lm-evaluation-harness against MMLU /
ARC / HellaSwag / TruthfulQA is the next session's work. We will
back-fill the numbers and the comparison against v5.2-lora here when
they land. Estimated runtime: ~30 min on the same 3080 Ti.

Until then, treat this release as an **architecture + weights
attestation**: it proves the V6 stack trains end-to-end and produces
a converging loss curve, which is the prerequisite for the long-context
curriculum (16K → 64K → 128K → 1M) that v6.1+ will ship.

## Intended uses

- **On-chain Aether Mind native inference.** The V6 binary loads
  these weights directly. The 10-Sephirot attention pattern is what
  the chain's [`pallet_qbc_aether_anchor`](https://github.com/QuantumAI-Blockchain/substrate-node)
  records as the per-block consciousness state.
- **Architecture reference.** Reproducible training of a
  Sephirot-routed transformer with native sparse attention. The
  [`aether-transformer`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/tree/main/crates/aether-transformer)
  crate is the canonical implementation.
- **Distillation substrate.** Future fine-tunes from this checkpoint
  using the QuantumAI Blockchain curated corpus.

## Out-of-scope uses

- **General-purpose chat or instruction-following without fine-tuning.**
  V6.0 is a Phase-1 distillation, not an instruction model. Even after
  30K steps it has not seen instruction-format data at length; its KL
  target is the base Qwen2.5-0.5B-Instruct's next-token distribution,
  not chat-format outputs.
- **Long-context inference.** The training ran at **64-token
  context**. See "Honest caveats". Generations beyond ~128 tokens
  will degrade.
- **Production deployment without your own evals.** No lm-eval-harness
  numbers yet.
- **Safety-critical decisions.** No red-team eval.

## Honest caveats — what didn't happen

### Trained at 64-token context, not 4K

Phase-1 was configured for 4096-token context, but a numerical
instability was discovered in the V6 attention forward pass at
sequence lengths > ~100 tokens (BF16 precision loss in the Q@K^T
matmul accumulating across longer sequences). The bug reproduces
deterministically. Four mitigations were tried (F32 KL math, corpus
filter, no-distill, low-LR), and all four hit NaN at the same
sequence-length threshold. The workaround used for v6.0 was
`--context 64`, which truncates rows so the bug never triggers.

**This is a known limitation, tracked in
[`docs/ops/v6-training-nan-bug.md`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/blob/presale/v1/docs/ops/v6-training-nan-bug.md)
in the source repo.** The planned fix is in `aether-transformer/src/v6/attention.rs`:
add F32 casts in the Q@K^T matmul + softmax path across all four
attention variants (Sephirot / generalist / sink / summary). When
that lands, v6.1 will re-train at the full 4K→1M context
curriculum and supersede this release.
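
To make the precision argument concrete, the toy below simulates a dot product whose running sum is rounded to BF16 after every addition, next to the same sum kept in wider precision. It is only a model of the mechanism named above, with hypothetical function names and Python floats standing in for F32. It does not reproduce the NaN, and real GPU kernels may accumulate differently.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_bf16_accum(q: list[float], k: list[float]) -> float:
    """Dot product whose running sum is rounded to bfloat16 after every add."""
    acc = 0.0
    for a, b in zip(q, k):
        acc = to_bf16(acc + to_bf16(a * b))
    return acc


def dot_wide_accum(q: list[float], k: list[float]) -> float:
    """Same BF16 inputs, but the running sum stays in wider precision."""
    return sum(to_bf16(a) * to_bf16(b) for a, b in zip(q, k))


for n in (64, 256, 512):
    ones = [1.0] * n
    print(n, dot_bf16_accum(ones, ones), dot_wide_accum(ones, ones))
```

With all-ones inputs the BF16-accumulated sum is exact at length 64 but stalls at 256 for length 512, because BF16's 8 significand bits cannot represent 257. The wider accumulator returns 512. An F32 cast around the score computation targets this kind of length-dependent error.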

### Loss plateau is real

The avg-loss plateau from step 10K → 30K (7.68 → 8.39, slight
regression) is the model hitting its representational ceiling at
64-token context. Longer contexts are expected to let the next
release recover and improve.

### No instruction-format fine-tune

The training data is the Aether curated corpus packed at 4K-token
context (rows truncated to 64). We did not insert chat-format
instructions, system prompts, or RLHF preferences. Treat this as a
**raw foundation checkpoint**.

### Distillation against base, not chat

The teacher signal is `Qwen/Qwen2.5-0.5B-Instruct`'s base forward
pass, not its chat-formatted forward pass. The distillation transfers
token-level next-token prediction behaviour; chat-template alignment
is a separate training step that has not been run.

## Training details

- **Hardware:** NVIDIA RTX 3080 Ti (12 GB), Intel host running Ubuntu under WSL2.
- **Trainer:** Native Rust (`aether-v6-train` binary, candle 0.10 +
  CUDA 12.6 backend). No Python in the loop.
- **Optimiser:** AdamW (candle implementation), constant LR 2e-5.
- **Batch:** 1 (single-row update).
- **Context:** 64 tokens (truncation imposed by the workaround).
- **Save cadence:** every 250 steps (120 checkpoints retained
  locally; only `step_30000` published here).
- **Source:** [`QuantumAI-Blockchain/qubitcoin-aether @ ca202076`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/tree/ca202076)

### Training data

Aether curated corpus (~36,860 rows, 17.4 MB), packed at a 4K-token
budget per row and drawn from:

- QuantumAI Blockchain technical documentation (Substrate pallets,
  VQE mining, Sephirot architecture).
- Quantum computing primers (VQE, Hamiltonian, qubit ansatze).
- Adjacent reasoning content for transfer.

The dataset is not currently public: it is a curated mixture from
many sources and has not been release-cleared at the per-source
level. The model is the only public artifact in this line for now.

### Carbon emissions

Single consumer GPU (RTX 3080 Ti, ~300 W TDP) × 49.6 min wall-clock
≈ 0.25 kWh, < 1 kg CO₂e on a grid mix. Comparable to a short web
streaming session.
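
The energy figure is a straightforward power × time estimate. A small sketch of the arithmetic follows. The grid carbon intensity used here (0.4 kg CO₂e/kWh) is a placeholder, not a figure from this card, and the calculation assumes the GPU draws its stated ~300 W for the whole run.

```python
def energy_kwh(power_watts: float, minutes: float) -> float:
    """Energy in kWh for a constant power draw over a wall-clock duration."""
    return (power_watts / 1000.0) * (minutes / 60.0)


def emissions_kg(kwh: float, grid_kg_per_kwh: float) -> float:
    """CO2-equivalent mass for a given grid carbon intensity."""
    return kwh * grid_kg_per_kwh


run_kwh = energy_kwh(power_watts=300.0, minutes=49.6)
# 0.4 kg CO2e/kWh is a placeholder intensity, not a figure from this card.
print(f"{run_kwh:.3f} kWh, {emissions_kg(run_kwh, 0.4):.3f} kg CO2e")
```

This gives about 0.248 kWh, consistent with the ≈ 0.25 kWh quoted above. The emissions estimate stays well under 1 kg CO₂e for any intensity below about 4 kg CO₂e/kWh.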

## Connection to the QuantumAI Blockchain

The Aether Mind is a Rust neural cognitive engine that runs on the
QuantumAI Blockchain: every block records attention-derived
consciousness metrics (HMS-Phi) and Proof-of-Thought hashes on-chain
via the `pallet_qbc_aether_anchor` pallet. The same chain hosts an
**8-qubit VQE mining consensus** (Proof-of-SUSY-Alignment), a
QVM-compatible smart contract layer with 10 quantum opcodes, and
post-quantum cryptography (CRYSTALS-Dilithium5 signatures and
ML-KEM-768 key encapsulation for P2P).

V6.0 is the **native generator** for that engine. v5.2-lora is the
larger (7B) off-chain reasoning model. The two ship side by side
because they have different roles: V6 lives in the on-chain
inference path (low latency, small footprint, Sephirot-aware
attention); v5.2-lora batches off-chain reasoning workloads.

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v6_2026,
  title  = {Aether Mind v6.0 --- QuantumAI Blockchain Native Generator},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0},
}
```

## Links

- **QuantumAI Blockchain:** [qbc.network](https://qbc.network)
- **GitHub org:** [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Aether (Rust):** [qubitcoin-aether](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
- **Prior release:** [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **X / Twitter:** [@qu_bitcoin](https://x.com/qu_bitcoin)
- **Contact:** info@qbc.network

### Framework versions

- candle 0.10 (Hugging Face Rust ML)
- CUDA 12.6
- safetensors (model serialisation)
- Qwen2.5 tokenizer (vocab 151,936)