---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
- qubitcoin
- aether
- blockchain
- quantum
- native-rust
- candle
- long-context
- cosine-schedule
- resume-fine-tune
language:
- en
pipeline_tag: text-generation
---

# Aether Mind v6.2: cosine-decay fine-tune of v6.1

V6.2 picks up where [v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1) plateaued. Same architecture, same 256-token context, same Aether curated corpus, but trained for **another 30,000 steps under a cosine LR decay (2e-5 → 2e-7)** to push the student past its fine-tune plateau without overshooting.

This is the third native (non-LoRA) Aether release and the first to use a non-constant learning-rate schedule. The cosine flag landed in commit [`186b2622`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/186b2622).

## What you're getting

| Field | Value |
|---|---|
| Base model | Initialised from `Qwen/Qwen2.5-0.5B-Instruct`; this run resumes from the v6.1 fine-tune |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | **Pure cross-entropy** (no distillation, same as v6.1) |
| Training context | **256 tokens** (same as v6.1) |
| LR schedule | **Cosine decay 2e-5 → 2e-7** over 30,000 fine-tune steps |
| Precision | BF16 weights, F32 KL/CE math internally |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Total training | **60,000 steps** (30K v6.1 + 30K v6.2) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |

## Training run

| Metric | v6.1 | **v6.2** | Δ |
|---|---|---|---|
| Steps (this run) | 30,000 | 30,000 | = |
| Total steps | 30,000 | **60,000** | +30K |
| Wall-clock (this run) | 44.4 min | **44.9 min** | +0.5 min |
| Mean CE loss (this run) | 10.18 | **8.43** | **−17 %** |
| Throughput | 629.9 tok/s | 622.9 tok/s | flat |
| Mean Sephirot aux | 0.149 | **0.140** | −6 % |
| LR schedule | constant 2e-5 | **cosine 2e-5 → 2e-7** | new |
| NaN events | 0 | 0 | = |
| Resume base | random init (Qwen) | v6.1 final | new |

### Loss trajectory

```
step     1  loss=13.00  avg=13.00   (v6.1 final state)
step   100  loss=12.00  avg=11.78
step  1000  loss= 7.75  avg= 8.82   ← LR still high, big descent through v6.1's plateau
step  5000  loss= 7.25  avg= 7.71
step 10000  loss= 6.69  avg= 7.41   ← minimum running average
step 15000  loss= 9.56  avg= 7.51   ← cosine kicks in, per-step variance ↑, drift ↓
step 20000  loss= 8.94  avg= 7.92
step 25000  loss= 8.75  avg= 8.22
step 29999  loss= 9.31  avg= 8.43
```

The reported mean (8.43) is the run-wide average. The lowest observed running average (7.41 at step 10K) is the actual fine-tune minimum; the back-half drift is the cosine schedule reducing step size to near zero, which makes per-step variance dominate the running average. This is the expected shape of a converged cosine fine-tune.

## What changed vs v6.1

1. **Cosine LR decay**. Constant LR at 2e-5 in v6.1 caused a plateau from step ~10K onward: the optimiser kept bouncing around the loss minimum it could see at that step size. Cosine decay to 2e-7 lets later steps take much smaller updates, fine-tuning past the plateau.
2. **Resume from v6.1** rather than fresh init. The model starts at v6.1's final state and refines from there.
3. **Otherwise identical to v6.1**: same architecture, same corpus, same context, same NSA config, same Sephirot aux. The single variable changed is the LR schedule.

## How to use

### Native runtime (recommended): Rust `aether-mind`

Set `AETHER_V6_CHECKPOINT` to the local path of `model.safetensors`, then restart `qbc-aether-mind.service`.

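
Before restarting the service, it can help to confirm that the variable points at a readable checkpoint. The following is a minimal pre-flight sketch in Python, not part of the `aether-mind` runtime. Only the variable name `AETHER_V6_CHECKPOINT` comes from this card; the helper names `resolve_checkpoint`, `read_safetensors_header`, and `count_params` are hypothetical. It relies on the public safetensors layout (an 8-byte little-endian header length followed by a JSON header) and uses only the standard library, so no tensor data is loaded.

```python
import json
import os
import struct
from math import prod
from pathlib import Path


def resolve_checkpoint(env=os.environ):
    """Return the path named by AETHER_V6_CHECKPOINT, or raise if unusable."""
    raw = env.get("AETHER_V6_CHECKPOINT")
    if not raw:
        raise RuntimeError("AETHER_V6_CHECKPOINT is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return path


def read_safetensors_header(path):
    """Parse only the JSON header of a safetensors file (hypothetical helper)."""
    with open(path, "rb") as fh:
        prefix = fh.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors checkpoint")
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        header = json.loads(fh.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_params(header):
    """Sum the element counts implied by each tensor's declared shape."""
    return sum(prod(entry["shape"]) for entry in header.values())
```

Calling `count_params(read_safetensors_header(resolve_checkpoint()))` in the service's environment gives a parameter count to compare against the figure in the table above before the restart.
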
### Python

```python
# Load every tensor in the checkpoint into a dict of name -> torch.Tensor.
from safetensors.torch import load_file

weights = load_file("model.safetensors")

# Sum element counts across all tensors to get the total parameter count.
print("params:", sum(t.numel() for t in weights.values()))
```

Same architecture as v6.1, so any custom loader/wrapper for v6.1 works here.

## Evaluation

lm-evaluation-harness numbers will follow once the eval binary ships. For now, the training-loss curve and sample generations are the primary signal.

## Open items for v6.3

- **Per-chunk backward** for distillation at ctx ≥ 256, so we can add KL teacher signal back without OOMing.
- **Long-context curriculum** (1K, 4K, 16K → 1M) per the V6 master spec.
- **lm-evaluation-harness pass** (MMLU / ARC / HellaSwag / TruthfulQA) for honest published numbers.

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v62_2026,
  title  = {Aether Mind v6.2 --- cosine-decay fine-tune of v6.1},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2},
}
```

## Links

- **Aether Mind v6.1**: [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
- **Aether Mind v6.0**: [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0)
- **Aether v5.2-lora**: [https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **QuantumAI Blockchain**: [qbc.network](https://qbc.network)
- **GitHub**: [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
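
As a closing note on the LR schedule described in this card, here is a minimal Python sketch of a cosine decay from 2e-5 to 2e-7 over 30,000 steps. It is an illustration only, not the code from commit `186b2622`: it assumes no warmup, a zero-based step counter, and a decay that spans the whole run, and the function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, total_steps=30_000, lr_max=2e-5, lr_min=2e-7):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps.

    Assumptions (not confirmed by the card): no warmup, zero-based steps,
    and steps outside [0, total_steps] are clamped to the nearest end.
    """
    clamped = min(max(step, 0), total_steps)
    progress = clamped / total_steps  # 0.0 at the start, 1.0 at the end
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1.0 -> 0.0
    return lr_min + (lr_max - lr_min) * cosine


if __name__ == "__main__":
    # Print the rate at the steps listed in the loss trajectory.
    for s in (1, 100, 1000, 5000, 10000, 15000, 20000, 25000, 29999):
        print(f"step {s:>5}  lr={cosine_lr(s):.3e}")
```

Under these assumptions the rate sits at about 1.01e-5 at the midpoint (step 15,000) and reaches 2e-7 at the final step, so the second half of the run takes progressively smaller updates than the constant 2e-5 used in v6.1.
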