File size: 5,661 Bytes

2dcc491

---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
  - qubitcoin
  - aether
  - blockchain
  - quantum
  - native-rust
  - candle
  - long-context
  - cosine-schedule
  - resume-fine-tune
language:
  - en
pipeline_tag: text-generation
---

# Aether Mind v6.2 — cosine-decay fine-tune of v6.1

V6.2 picks up where [v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
plateaued. Same architecture, same 256-token context, same Aether
curated corpus — but trained for **another 30,000 steps under a
cosine LR decay (2e-5 → 2e-7)** to push the student past its
fine-tune plateau without overshooting.

This is the third native (non-LoRA) Aether release and the first to
use a learning-rate schedule beyond constant. The cosine flag landed
in commit
[`186b2622`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/186b2622).

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from), then v6.1 fine-tune resumed here |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | **Pure cross-entropy** (no distillation — same as v6.1) |
| Training context | **256 tokens** (same as v6.1) |
| LR schedule | **Cosine decay 2e-5 → 2e-7** over 30,000 fine-tune steps |
| Precision | BF16 weights, F32 KL/CE math internally |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Total training | **60,000 steps** (30K v6.1 + 30K v6.2) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |

## Training run

| Metric | v6.1 | **v6.2** | Δ |
|---|---|---|---|
| Steps (this run) | 30,000 | 30,000 | = |
| Total steps | 30,000 | **60,000** | +30K |
| Wall-clock (this run) | 44.4 min | **44.9 min** | +0.5 min |
| Mean CE loss (this run) | 10.18 | **8.43** | **−17 %** |
| Throughput | 629.9 tok/s | 622.9 tok/s | flat |
| Mean Sephirot aux | 0.149 | **0.140** | −6 % |
| LR schedule | constant 2e-5 | **cosine 2e-5 → 2e-7** | new |
| NaN events | 0 | 0 | = |
| Resume base | random init (Qwen) | v6.1 final | new |

### Loss trajectory

```
step      1   loss=13.00  avg=13.00   (v6.1 final state)
step    100   loss=12.00  avg=11.78
step   1000   loss= 7.75  avg= 8.82   ← LR still high, big descent through v6.1's plateau
step   5000   loss= 7.25  avg= 7.71
step  10000   loss= 6.69  avg= 7.41   ← minimum running average
step  15000   loss= 9.56  avg= 7.51   ← cosine kicks in, per-step variance ↑, drift ↓
step  20000   loss= 8.94  avg= 7.92
step  25000   loss= 8.75  avg= 8.22
step  29999   loss= 9.31  avg= 8.43
```

The reported mean (8.43) is the run-wide average. The lowest observed
running average (7.41 at step 10K) is the actual fine-tune minimum;
the back-half drift is the cosine schedule reducing step size to near
zero, which makes per-step variance dominate the running average.
This is the expected shape of a converged cosine fine-tune.

## What changed vs v6.1

1. **Cosine LR decay**. Constant LR at 2e-5 in v6.1 caused a plateau
   from step ~10K onward — the optimiser kept bouncing around the
   loss minimum it could see at that step size. Cosine decay to
   2e-7 lets later steps take much smaller updates, fine-tuning past
   the plateau.

2. **Resume from v6.1** rather than fresh init. The model starts at
   v6.1's final state and refines from there.

3. **Otherwise identical to v6.1**: same architecture, same corpus,
   same context, same NSA config, same Sephirot aux. The single
   variable changed is the LR schedule.

## How to use

### Native runtime (recommended) — Rust `aether-mind`

Set `AETHER_V6_CHECKPOINT` to the local path of `model.safetensors`,
restart `qbc-aether-mind.service`.

### Python

```python
from safetensors.torch import load_file
weights = load_file("model.safetensors")
print("params:", sum(t.numel() for t in weights.values()))
```

Same architecture as v6.1, so any custom loader/wrapper for v6.1
works here.

## Evaluation

(lm-evaluation-harness numbers to follow once the eval binary
ships. For now: training-loss curve + sample generations are the
primary signal.)

## Open items for v6.3

- **Per-chunk backward** for distillation at ctx ≥ 256, so we can
  add KL teacher signal back without OOMing.
- **Long-context curriculum** (1K, 4K, 16K → 1M) per the V6 master
  spec.
- **lm-evaluation-harness pass** (MMLU / ARC / HellaSwag /
  TruthfulQA) for honest published numbers.

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v62_2026,
  title  = {Aether Mind v6.2 --- cosine-decay fine-tune of v6.1},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2},
}
```

## Links

- **Aether Mind v6.1** — [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
- **Aether Mind v6.0** — [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0)
- **Aether v5.2-lora** — [https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **QuantumAI Blockchain** — [qbc.network](https://qbc.network)
- **GitHub** — [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)