---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
  - qubitcoin
  - aether
  - blockchain
  - quantum
  - native-rust
  - candle
  - long-context
  - cosine-schedule
  - resume-fine-tune
language:
  - en
pipeline_tag: text-generation
---

# Aether Mind v6.2: cosine-decay fine-tune of v6.1

V6.2 picks up where [v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
plateaued. Same architecture, same 256-token context, same Aether
curated corpus, but trained for **another 30,000 steps under a
cosine LR decay (2e-5 → 2e-7)** to push the student past its
fine-tune plateau without overshooting.

This is the third native (non-LoRA) Aether release and the first to
use a non-constant learning-rate schedule. The cosine flag landed
in commit
[`186b2622`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/commit/186b2622).

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from), then v6.1 fine-tune resumed here |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | **Pure cross-entropy** (no distillation, same as v6.1) |
| Training context | **256 tokens** (same as v6.1) |
| LR schedule | **Cosine decay 2e-5 → 2e-7** over 30,000 fine-tune steps |
| Precision | BF16 weights, F32 KL/CE math internally |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Total training | **60,000 steps** (30K v6.1 + 30K v6.2) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |
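
The table's shape numbers can be cross-checked with a few lines of Python. This is only a sketch: the class name `AetherV6Config` and its field names are hypothetical and are not taken from the `aether-mind` codebase; the values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AetherV6Config:
    """Hypothetical container for the values in the table above."""
    num_layers: int = 24
    hidden_size: int = 896
    num_heads: int = 14
    head_dim: int = 64
    sephirot_heads: int = 10
    generalist_heads: int = 2
    sink_heads: int = 2
    vocab_size: int = 151_936
    max_position: int = 32_768
    rope_theta: float = 1e6
    # NSA settings as listed in the table
    compression_block: int = 64
    top_k: int = 2048
    sliding_window: int = 512
    sink_tokens: int = 4

    def validate(self) -> None:
        # Heads times per-head width must equal the hidden size.
        if self.num_heads * self.head_dim != self.hidden_size:
            raise ValueError("num_heads * head_dim != hidden_size")
        # The three head groups must account for every attention head.
        groups = self.sephirot_heads + self.generalist_heads + self.sink_heads
        if groups != self.num_heads:
            raise ValueError("head groups do not sum to num_heads")


cfg = AetherV6Config()
cfg.validate()  # passes: 14 * 64 == 896 and 10 + 2 + 2 == 14
```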

## Training run

| Metric | v6.1 | **v6.2** | Δ |
|---|---|---|---|
| Steps (this run) | 30,000 | 30,000 | = |
| Total steps | 30,000 | **60,000** | +30K |
| Wall-clock (this run) | 44.4 min | **44.9 min** | +0.5 min |
| Mean CE loss (this run) | 10.18 | **8.43** | **−17 %** |
| Throughput | 629.9 tok/s | 622.9 tok/s | flat |
| Mean Sephirot aux | 0.149 | **0.140** | −6 % |
| LR schedule | constant 2e-5 | **cosine 2e-5 → 2e-7** | new |
| NaN events | 0 | 0 | = |
| Resume base | random init (Qwen) | v6.1 final | new |

### Loss trajectory

```
step      1   loss=13.00  avg=13.00   (v6.1 final state)
step    100   loss=12.00  avg=11.78
step   1000   loss= 7.75  avg= 8.82   ← LR still high, big descent through v6.1's plateau
step   5000   loss= 7.25  avg= 7.71
step  10000   loss= 6.69  avg= 7.41   ← minimum running average
step  15000   loss= 9.56  avg= 7.51   ← cosine kicks in, per-step variance ↑, drift ↓
step  20000   loss= 8.94  avg= 7.92
step  25000   loss= 8.75  avg= 8.22
step  29999   loss= 9.31  avg= 8.43
```

The reported mean (8.43) is the run-wide average. The lowest observed
running average (7.41 at step 10K) is the actual fine-tune minimum;
the back-half drift is the cosine schedule reducing step size to near
zero, which makes per-step variance dominate the running average.
This is the expected shape of a converged cosine fine-tune.
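
To read the `avg` column segment by segment, the per-interval mean can be recovered from consecutive averages. The sketch below assumes that `avg` is a cumulative mean over all steps since step 1 and that the step number equals the number of losses averaged; the log does not state either, and the logged values are rounded to two decimals, so the results are approximate. The function name `segment_means` is illustrative.

```python
# (step, cumulative average) pairs copied from the log above.
LOG = [
    (1, 13.00), (100, 11.78), (1000, 8.82), (5000, 7.71), (10000, 7.41),
    (15000, 7.51), (20000, 7.92), (25000, 8.22), (29999, 8.43),
]


def segment_means(log):
    """Mean loss between consecutive log lines, given cumulative averages.

    Assumption: avg_n is the mean of the first n losses, so the sum of the
    losses in steps (n0, n1] is n1 * avg_n1 - n0 * avg_n0.
    """
    out = []
    for (n0, a0), (n1, a1) in zip(log, log[1:]):
        out.append((n0, n1, (n1 * a1 - n0 * a0) / (n1 - n0)))
    return out


for start, end, mean in segment_means(LOG):
    print(f"steps {start:>5} to {end:>5}: mean loss ~ {mean:.2f}")
```

Under that assumption, whenever the cumulative average rises between two log lines, the mean loss over that interval lies above both averages.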

## What changed vs v6.1

1. **Cosine LR decay**. Constant LR at 2e-5 in v6.1 caused a plateau
   from step ~10K onward: the optimiser kept bouncing around the
   loss minimum it could see at that step size. Cosine decay to
   2e-7 lets later steps take much smaller updates, fine-tuning past
   the plateau.

2. **Resume from v6.1** rather than fresh init. The model starts at
   v6.1's final state and refines from there.

3. **Otherwise identical to v6.1**: same architecture, same corpus,
   same context, same NSA config, same Sephirot aux. The single
   variable changed is the LR schedule.
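
For reference, the schedule in item 1 can be sketched as a standard single-cycle cosine decay. This is an assumption about the shape only: the actual implementation in the linked commit may differ (for example with warmup or a hold phase), and the function name `cosine_lr` and its parameters are illustrative. The endpoints and step count come from this card.

```python
import math


def cosine_lr(step: int, total_steps: int = 30_000,
              lr_max: float = 2e-5, lr_min: float = 2e-7) -> float:
    """Cosine-decayed learning rate at a given step (no warmup assumed)."""
    # Clamp so that out-of-range steps hold at the nearest endpoint.
    progress = min(max(step, 0), total_steps) / total_steps
    # cos goes 1 -> -1 as progress goes 0 -> 1, so the factor goes 1 -> 0.
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_min + (lr_max - lr_min) * factor


for s in (0, 7_500, 15_000, 22_500, 30_000):
    print(f"step {s:>6}: lr = {cosine_lr(s):.3e}")
```

With this form the rate starts at 2e-5, passes the midpoint of the two endpoints halfway through the run, and reaches 2e-7 on the final step.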

## How to use

### Native runtime (recommended): Rust `aether-mind`

Set `AETHER_V6_CHECKPOINT` to the local path of `model.safetensors`,
then restart `qbc-aether-mind.service`.

### Python

```python
from safetensors.torch import load_file

# Load every tensor into a dict mapping tensor name -> torch.Tensor.
weights = load_file("model.safetensors")
print("params:", sum(t.numel() for t in weights.values()))
```

Same architecture as v6.1, so any custom loader/wrapper for v6.1
works here.

## Evaluation

(lm-evaluation-harness numbers to follow once the eval binary
ships. For now: training-loss curve + sample generations are the
primary signal.)

## Open items for v6.3

- **Per-chunk backward** for distillation at ctx ≥ 256, so we can
  add KL teacher signal back without OOMing.
- **Long-context curriculum** (1K, 4K, 16K → 1M) per the V6 master
  spec.
- **lm-evaluation-harness pass** (MMLU / ARC / HellaSwag /
  TruthfulQA) for honest published numbers.

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v62_2026,
  title  = {Aether Mind v6.2 --- cosine-decay fine-tune of v6.1},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2},
}
```

## Links

- **Aether Mind v6.1**: [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1)
- **Aether Mind v6.0**: [https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0)
- **Aether v5.2-lora**: [https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **QuantumAI Blockchain**: [qbc.network](https://qbc.network)
- **GitHub**: [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)