---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
- qubitcoin
- aether
- blockchain
- quantum
- native-rust
- candle
- long-context
- cosine-schedule
- resume-fine-tune
language:
- en
pipeline_tag: text-generation
---
# Aether Mind v6.2: cosine-decay fine-tune of v6.1
v6.2 picks up where [v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
plateaued. It keeps the same architecture, the same 256-token context,
and the same Aether curated corpus, but trains for **another 30,000
steps under a cosine LR decay (2e-5 → 2e-7)** to push the student past
its fine-tune plateau without overshooting.
This is the third native (non-LoRA) Aether release and the first to
use a learning-rate schedule other than constant. The cosine flag landed
in commit
[`186b2622`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/186b2622).
## What you're getting
| Field | Value |
|---|---|
| Base model | Initialised from `Qwen/Qwen2.5-0.5B-Instruct`; this run resumes from the v6.1 fine-tune |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | **Pure cross-entropy** (no distillation, same as v6.1) |
| Training context | **256 tokens** (same as v6.1) |
| LR schedule | **Cosine decay 2e-5 → 2e-7** over 30,000 fine-tune steps |
| Precision | BF16 weights, F32 KL/CE math internally |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Total training | **60,000 steps** (30K v6.1 + 30K v6.2) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |
## Training run
| Metric | v6.1 | **v6.2** | Δ |
|---|---|---|---|
| Steps (this run) | 30,000 | 30,000 | = |
| Total steps | 30,000 | **60,000** | +30K |
| Wall-clock (this run) | 44.4 min | **44.9 min** | +0.5 min |
| Mean CE loss (this run) | 10.18 | **8.43** | **−17 %** |
| Throughput | 629.9 tok/s | 622.9 tok/s | flat |
| Mean Sephirot aux | 0.149 | **0.140** | −6 % |
| LR schedule | constant 2e-5 | **cosine 2e-5 → 2e-7** | new |
| NaN events | 0 | 0 | = |
| Resume base | random init (Qwen) | v6.1 final | new |
### Loss trajectory
```
step 1 loss=13.00 avg=13.00 (v6.1 final state)
step 100 loss=12.00 avg=11.78
step 1000 loss= 7.75 avg= 8.82 ← LR still high, big descent through v6.1's plateau
step 5000 loss= 7.25 avg= 7.71
step 10000 loss= 6.69 avg= 7.41 ← minimum running average
step 15000 loss= 9.56 avg= 7.51 ← cosine kicks in, per-step variance ↑, drift ↓
step 20000 loss= 8.94 avg= 7.92
step 25000 loss= 8.75 avg= 8.22
step 29999 loss= 9.31 avg= 8.43
```
The reported mean (8.43) is the run-wide average. The lowest observed
running average (7.41 at step 10K) is the actual fine-tune minimum;
the back-half drift is the cosine schedule reducing step size to near
zero, which makes per-step variance dominate the running average.
This is the expected shape of a converged cosine fine-tune.
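To separate the back-half behaviour from the run-wide mean, the logged cumulative averages can be converted into per-window means. The sketch below assumes the `avg` column is a cumulative mean since step 1 (consistent with the final `avg` equalling the reported 8.43) and treats the step number as the sample count. The helper name `window_mean` is illustrative and not part of the training code. Because the log is rounded to two decimals, the results are approximate.

```python
# (step, cumulative avg) pairs copied from the loss trajectory above.
LOGGED = [
    (1000, 8.82),
    (5000, 7.71),
    (10000, 7.41),
    (15000, 7.51),
    (20000, 7.92),
    (25000, 8.22),
    (29999, 8.43),
]


def window_mean(n1, avg1, n2, avg2):
    """Mean loss over steps n1+1..n2, given cumulative means at n1 and n2."""
    return (n2 * avg2 - n1 * avg1) / (n2 - n1)


if __name__ == "__main__":
    # Walk consecutive log entries and print the mean loss inside each window.
    for (n1, a1), (n2, a2) in zip(LOGGED, LOGGED[1:]):
        mean = window_mean(n1, a1, n2, a2)
        print(f"steps {n1 + 1:>5}-{n2:>5}: mean loss ~ {mean:.2f}")
```

Under these assumptions, the mean over steps 10,001 to 29,999 comes out near 8.94, which is the back-half drift described above expressed as a single number.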
## What changed vs v6.1
1. **Cosine LR decay**. Constant LR at 2e-5 in v6.1 caused a plateau
from step ~10K onward: the optimiser kept bouncing around the
loss minimum it could see at that step size. Cosine decay to
2e-7 lets later steps take much smaller updates, fine-tuning past
the plateau.
2. **Resume from v6.1** rather than fresh init. The model starts at
v6.1's final state and refines from there.
3. **Otherwise identical to v6.1**: same architecture, same corpus,
same context, same NSA config, same Sephirot aux. The single
variable changed is the LR schedule.
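For reference, a cosine decay between the two endpoints quoted above can be sketched as follows. This uses the standard half-cosine form with no warmup. The exact form implemented behind the cosine flag is not stated in this card, so treat `cosine_lr` and its parameter names as illustrative assumptions.

```python
import math

LR_MAX = 2e-5        # starting learning rate quoted in this card
LR_MIN = 2e-7        # final learning rate quoted in this card
TOTAL_STEPS = 30_000  # fine-tune steps in this run


def cosine_lr(step, total_steps=TOTAL_STEPS, lr_max=LR_MAX, lr_min=LR_MIN):
    """Half-cosine decay from lr_max at step 0 to lr_min at total_steps."""
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 10_000, 15_000, 20_000, 30_000):
        print(f"step {step:>6}: lr = {cosine_lr(step):.3e}")
```

With this form the learning rate sits halfway between the endpoints (about 1.01e-5) at step 15,000 and falls most steeply around that point.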
## How to use
### Native runtime (recommended): Rust `aether-mind`
Set `AETHER_V6_CHECKPOINT` to the local path of `model.safetensors`,
then restart `qbc-aether-mind.service`.
### Python
```python
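from safetensors.torch import load_file

# Load every tensor from the checkpoint into a dict of name -> torch.Tensor.
weights = load_file("model.safetensors")
# Total parameter count across all tensors.
print("params:", sum(t.numel() for t in weights.values()))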
```
The architecture is identical to v6.1, so any custom loader or wrapper
written for v6.1 works here.
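If loading the full 1.32 GB file is inconvenient, the tensor names, dtypes, and shapes can be read from the safetensors header alone. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header) and the Python standard library. `read_safetensors_header` and `count_params` are illustrative helper names, not part of any Aether tooling.

```python
import json
import math
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned 64-bit little-endian int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_params(header):
    """Sum element counts over all tensors, skipping the optional metadata entry."""
    return sum(
        math.prod(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    )
```

Calling `count_params(read_safetensors_header("model.safetensors"))` should agree with the `numel` sum printed by the snippet above, since both count the same tensors.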
## Evaluation
(lm-evaluation-harness numbers to follow once the eval binary
ships. For now: training-loss curve + sample generations are the
primary signal.)
## Open items for v6.3
- **Per-chunk backward** for distillation at ctx ≥ 256, so we can
add KL teacher signal back without OOMing.
- **Long-context curriculum** (1K, 4K, 16K → 1M) per the V6 master
spec.
- **lm-evaluation-harness pass** (MMLU / ARC / HellaSwag /
TruthfulQA) for honest published numbers.
## License + citation
Apache-2.0 (matches the base model license).
```bibtex
@misc{aether_mind_v62_2026,
title = {Aether Mind v6.2 --- cosine-decay fine-tune of v6.1},
author = {{BlockArtica} and {QuantumAI-Blockchain}},
year = {2026},
url = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2},
}
```
## Links
- **Aether Mind v6.1**: [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
- **Aether Mind v6.0**: [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0)
- **Aether v5.2-lora**: [https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **QuantumAI Blockchain**: [qbc.network](https://qbc.network)
- **GitHub**: [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)