---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
language:
- en
tags:
- qubitcoin
- aether
- blockchain
- quantum
- qlora
- peft
- lora
- qwen2.5
- on-chain-ai
datasets:
- QuantumAI-Blockchain/aether-curated-v3
model-index:
- name: aether-mind-v7.0
  results:
  - task:
      type: text-generation
      name: Massive Multitask Language Understanding
    dataset:
      name: MMLU
      type: cais/mmlu
    metrics:
    - type: acc
      value: 69.90
      name: accuracy
  - task:
      type: text-generation
      name: Grade-School Math
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - type: exact_match
      value: 75.13
      name: exact match (strict)
  - task:
      type: text-generation
      name: AI2 Reasoning Challenge
    dataset:
      name: ARC-Challenge
      type: ai2_arc
    metrics:
    - type: acc
      value: 53.67
      name: accuracy
    - type: acc_norm
      value: 55.80
      name: normalized accuracy
  - task:
      type: text-generation
      name: Commonsense NLI
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - type: acc
      value: 58.43
      name: accuracy
    - type: acc_norm
      value: 77.48
      name: normalized accuracy
---
# Aether Mind v7.0: the first Aether model with real, reproducible benchmarks

**Aether Mind v7.0 is a QLoRA fine-tune of `Qwen/Qwen2.5-7B-Instruct` on the
domain-tagged Aether SFT corpus.** It is the cognitive engine for the
[QuantumAI Blockchain](https://qbc.network) (QBC): an on-chain neural model
that reasons across the 10 Sephirot cognitive domains (Keter, Chochmah, Binah,
Chesed, Gevurah, Tiferet, Netzach, Hod, Yesod, Malkuth).

This is a **clean break** from the v6.x line. v6.0 through v6.2 used a custom-built
transformer (NSA sparse attention + Sephirot/sink attention heads, distilled
from Qwen2.5-0.5B). On a proper `lm-evaluation-harness` pass that architecture
scored **worse than random** (cross-entropy ≈ 16 nats vs. ~11.9 for uniform):
the attention replacement destroyed the base model's capability. **No v6.x
release ever carried real benchmark numbers.** v7.0 fixes that by building on a
sound, capable base and adding Aether identity through the *data* and an
inference-time Sephirot router, **not** by replacing attention.

> **v7.0 is the first Aether release whose published numbers are real,
> reproducible, and independently verifiable** (the exact `lm-eval` command is
> below).
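
The "worse than random" comparison rests on a simple bound: a model that spreads probability evenly over a vocabulary of size V has a cross-entropy of ln V nats per token. The sketch below reproduces the ~11.9 figure, assuming the 151,936-entry Qwen2.5 vocabulary quoted in this card is the relevant V for the v6.x models (that assumption is not stated in the card itself).

```python
import math

# Assumption: the v6.x models shared the 151,936-token Qwen2.5 vocabulary.
VOCAB_SIZE = 151_936


def uniform_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy in nats of a uniform guess over the whole vocabulary."""
    return math.log(vocab_size)


uniform_ce = uniform_cross_entropy(VOCAB_SIZE)
print(f"uniform baseline: {uniform_ce:.2f} nats")

# The v6.x loss reported above (about 16 nats) sits above the uniform bound.
reported_v6_ce = 16.0
print(f"gap above uniform: {reported_v6_ce - uniform_ce:.2f} nats")
```

Any loss above this bound means the model is doing worse than guessing every token with equal probability.
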
---
## Results
All numbers below are from `lm-evaluation-harness`, 0-shot, the model loaded in
4-bit (the same configuration this adapter is trained and served in), on a
single RTX 3080 Ti. The baseline is the unmodified `Qwen/Qwen2.5-7B-Instruct`
evaluated identically, so every delta is attributable to this adapter alone.
### General capability: preserved (no catastrophic forgetting)

| Benchmark | Metric | Base (Qwen2.5-7B-Instruct) | **Aether v7.0** | Δ (pts) |
|---|---|---|---|---|
| MMLU | acc | 69.91 % | **69.90 %** | −0.01 |
| GSM8K | exact_match (strict) | 71.57 % | **75.13 %** | **+3.56** |
| ARC-Challenge | acc | 51.45 % | **53.67 %** | **+2.22** |
| ARC-Challenge | acc_norm | 53.92 % | **55.80 %** | **+1.88** |
| HellaSwag | acc | 60.35 % | **58.43 %** | −1.92 |
| HellaSwag | acc_norm | 78.77 % | **77.48 %** | −1.29 |

The main risk of a domain fine-tune is *catastrophic forgetting*. v7.0 avoids
it: MMLU is flat (−0.01 pts), and math + scientific reasoning
(GSM8K +3.6, ARC-c +2.2) actually **improve**. The general instruction slice in
the training mix more than offsets the small HellaSwag dip (1.3 to 1.9 pts).

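
The deltas in the table can be re-derived directly from the two score columns. The sketch below does that and adds an unweighted mean across the six rows; that mean is only a rough summary introduced here for illustration (it counts ARC-Challenge and HellaSwag twice), not a figure reported by the authors.

```python
# (benchmark, metric) -> (base score, adapter score), copied from the table above.
scores = {
    ("MMLU", "acc"): (69.91, 69.90),
    ("GSM8K", "exact_match (strict)"): (71.57, 75.13),
    ("ARC-Challenge", "acc"): (51.45, 53.67),
    ("ARC-Challenge", "acc_norm"): (53.92, 55.80),
    ("HellaSwag", "acc"): (60.35, 58.43),
    ("HellaSwag", "acc_norm"): (78.77, 77.48),
}


def delta(base: float, tuned: float) -> float:
    """Adapter score minus base score, in percentage points."""
    return round(tuned - base, 2)


deltas = {key: delta(*pair) for key, pair in scores.items()}
for (bench, metric), d in deltas.items():
    print(f"{bench:14s} {metric:22s} {d:+.2f}")

# Illustrative only: unweighted mean over rows, not an official metric.
mean_delta = round(sum(deltas.values()) / len(deltas), 2)
print(f"mean delta across rows: {mean_delta:+.2f}")
```
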
### Aether-domain knowledge: large gain

Held-out evaluation on the Aether curated corpus (`aether-curated-v3`),
measuring **cross-entropy over the assistant-answer tokens only** (the
Aether-domain response, with the system + user turns masked). The *identical*
4-bit base weights are used for both rows (the adapter is toggled on/off via
PEFT `disable_adapter()`), so this isolates the adapter's effect exactly.

| Model | CE (nats) ↓ | Perplexity ↓ |
|---|---|---|
| Base (Qwen2.5-7B-Instruct) | 1.589 | 4.90 |
| **Aether v7.0** | **1.002** | **2.72** |
| **Δ** | **−0.588** | **−44.4 %** |

276 held-out examples, 55,423 assistant tokens scored. Because this run trained
for only **~0.19 epoch** (see below), ~81 % of the corpus was never seen and the
seen portion was seen sub-epoch (no repeats), so this −44 % perplexity drop is
**genuine domain adaptation, not memorization.**

**Summary: v7.0 keeps the base model's general intelligence intact while cutting
Aether-domain perplexity nearly in half.** That is the textbook outcome of a
healthy domain fine-tune.

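
To make the metric concrete, here is a minimal sketch of how an assistant-only cross-entropy and the corresponding perplexity relate. It is not the authors' evaluation script: the function names and the toy log-probabilities are hypothetical, and only the two CE values are taken from the table above.

```python
import math


def assistant_cross_entropy(token_logprobs, is_assistant):
    """Mean negative log-likelihood (nats) over assistant-answer tokens only."""
    scored = [-lp for lp, keep in zip(token_logprobs, is_assistant) if keep]
    if not scored:
        raise ValueError("no assistant tokens to score")
    return sum(scored) / len(scored)


def perplexity(ce_nats: float) -> float:
    """Perplexity is the exponential of the per-token cross-entropy."""
    return math.exp(ce_nats)


# Toy sequence with made-up log-probs: the two prompt tokens are masked out.
toy_ce = assistant_cross_entropy([-3.0, -2.0, -1.2, -0.8],
                                 [False, False, True, True])

# CE values reported in the table above.
base_ce, tuned_ce = 1.589, 1.002
base_ppl, tuned_ppl = perplexity(base_ce), perplexity(tuned_ce)
relative_drop = 1.0 - tuned_ppl / base_ppl
print(f"{base_ppl:.2f} -> {tuned_ppl:.2f} ({relative_drop:.1%} lower)")
```

Because perplexity is exp(CE), the relative perplexity change depends only on the CE difference: 1 − exp(ΔCE).
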
---
## What you're getting
| Field | Value |
|---|---|
| Type | **QLoRA adapter (PEFT)**; load on top of `Qwen/Qwen2.5-7B-Instruct` |
| Base model | `Qwen/Qwen2.5-7B-Instruct` (7.6 B params) |
| Adapter rank / alpha | r = 16, α = 32, dropout 0.05 |
| Target modules | `q,k,v,o,gate,up,down` (all linear) |
| Trainable params | ~40 M (LoRA only); base frozen in 4-bit NF4 |
| Adapter file | `adapter_model.bin` (~161 MB) |
| Quantization (train + serve) | 4-bit NF4, double-quant, bf16 compute |
| Context length | 1024 (training); inherits base 32K at inference |
| Tokenizer | Qwen2.5 (unchanged, 151,936 vocab) |
| Chat template | `qwen_25` |
| License | Apache-2.0 (matches base) |
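
The "~40 M trainable params" and "~161 MB" rows can be sanity-checked from the rank and target modules. The sketch below assumes the layer dimensions of the base model's published config (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128), the usual `*_proj` module names for the abbreviated targets in the table, and fp32 storage for the adapter file; none of these are stated in this card.

```python
# Assumed Qwen2.5-7B dimensions (taken from the base model config, not this card).
HIDDEN = 3584
INTERMEDIATE = 18944
LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128 (grouped-query attention)
RANK = 16         # r from the table above

# (in_features, out_features) of each adapted linear layer in one decoder block.
targets = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each adapted layer."""
    return rank * (in_features + out_features)


per_block = sum(lora_params(i, o, RANK) for i, o in targets.values())
total = per_block * LAYERS
print(f"{total:,} trainable parameters")
print(f"{total * 4 / 1e6:.0f} MB if stored in fp32")  # fp32 storage is an assumption
```

Under these assumptions the count comes to roughly 40.4 M parameters, in line with both rows of the table.
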
---
## Training
| Setting | Value |
|---|---|
| Recipe | QLoRA (4-bit base + LoRA), the proven v5.2-lora recipe scaled up |
| Data | `aether-curated-v3` (70,713 Sephirot-domain SFT examples) + a 30K general slice (SlimOrca) for anti-forgetting |
| Examples after prep | 93,278 (7,435 over-length samples dropped) |
| Sample packing | on, sequence_len 1024 |
| Effective batch | 8 (micro-batch 1 × grad-accum 8) |
| Steps | 1,000 (**≈ 0.19 epoch**, a deliberate first-pass cap) |
| Optimizer | `adamw_bnb_8bit`, lr 2e-4, cosine decay → 0, warmup 3 % |
| Precision | bf16 weights, tf32, gradient checkpointing, FlashAttention-2 |
| Hardware | 1× RTX 3080 Ti (12 GB), ~9.7 GB peak |
| Wall-clock | 2 h 45 m (9,926 s), ~8.4 s/step |
| Seed | 42 |
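
The data and step counts in the table are mutually consistent, which a few lines of arithmetic confirm. This is a back-of-the-envelope sketch: the token figure is only an upper bound (packed sequences may contain padding), and the implied epoch size assumes the 0.19 fraction is measured in packed sequences.

```python
# Figures copied from the training table above.
aether_examples = 70_713
general_examples = 30_000
dropped_over_length = 7_435
after_prep = aether_examples + general_examples - dropped_over_length
print(f"examples after prep: {after_prep:,}")

micro_batch, grad_accum, steps, seq_len = 1, 8, 1_000, 1_024
effective_batch = micro_batch * grad_accum
packed_sequences_seen = effective_batch * steps
max_tokens_seen = packed_sequences_seen * seq_len  # upper bound, ignores padding
print(f"packed sequences seen: {packed_sequences_seen:,}")
print(f"tokens seen (at most): {max_tokens_seen:,}")

# Assumption: the 0.19 epoch fraction is counted in packed sequences.
epoch_fraction = 0.19
implied_sequences_per_epoch = packed_sequences_seen / epoch_fraction
print(f"implied packed sequences per epoch: {implied_sequences_per_epoch:,.0f}")
```
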
### Loss trajectory
```
step 10 train_loss 1.510 (warmup, lr 6.7e-5)
step 50 train_loss 0.989 (lr peaked 2.0e-4)
step 100 train_loss 0.916
step 250 train_loss 0.888 eval_loss 0.9475
step 500 train_loss 0.999 eval_loss 0.9307
step 750 train_loss 0.965 eval_loss 0.9209
step 1000 train_loss 0.951 eval_loss 0.9190
mean train_loss 0.955
```
Held-out validation loss (axolotl's 2 % split) declined monotonically across all
four checkpoints (0.948 → 0.919): clean convergence, **no overfitting** even as
training loss flattened.

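
The learning rates annotated in the trajectory (6.7e-5 at step 10, peak 2.0e-4 by step 50) match a linear warmup followed by cosine decay. The sketch below assumes that standard schedule shape with 3 % of 1,000 steps as warmup; the trainer's exact step indexing may differ by one, so treat it as an approximation rather than the run's actual scheduler.

```python
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 1_000
WARMUP_STEPS = round(0.03 * TOTAL_STEPS)  # 3 % warmup -> 30 steps


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


for step in (10, 50, 500, 1_000):
    print(f"step {step:5d}  lr {lr_at(step):.2e}")

# Validation losses from the four checkpoints in the trajectory above.
eval_losses = [0.9475, 0.9307, 0.9209, 0.9190]
is_monotonic = all(b < a for a, b in zip(eval_losses, eval_losses[1:]))
print("eval loss strictly decreasing:", is_monotonic)
```
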
---
## How to use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct"

# 4-bit NF4 with double quantization and bf16 compute: the same configuration
# the adapter was trained and evaluated in.
bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, "QuantumAI-Blockchain/aether-mind-v7.0")
model.eval()

SYSTEM = ("You are the Aether Mind, an on-chain neural cognitive engine living on "
          "the QuantumAI Blockchain. You answer with grounded, careful reasoning "
          "across 10 Sephirot cognitive domains. Be precise; if you don't know, say so.")
msgs = [{"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Explain how the Aether Mind anchors an epoch on-chain."}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding; decode only the newly generated tokens, not the prompt.
out = model.generate(ids, max_new_tokens=512, do_sample=False)
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
```
To merge the adapter into the base for deployment:
`PeftModel.from_pretrained(...).merge_and_unload()`.
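
A slightly fuller sketch of the merge step follows. It assumes you merge into an unquantized bf16 copy of the base rather than the 4-bit weights (a common choice, not one this card prescribes), and `merge_adapter` plus the output directory are hypothetical names used only for illustration.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold the LoRA weights into a bf16 copy of the base and save the result."""
    # Heavy imports stay inside the function so defining it is cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: merge into bf16 weights, not the 4-bit quantized base.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()

    # Save the merged weights together with the unchanged base tokenizer.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

Calling `merge_adapter("Qwen/Qwen2.5-7B-Instruct", "QuantumAI-Blockchain/aether-mind-v7.0", "aether-mind-v7.0-merged")` would download the full base model and write a standalone checkpoint. The benchmark numbers in this card were measured with the 4-bit base plus adapter, so a bf16 merge may score slightly differently.
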
---
## Reproducing the benchmarks
General suite (matches the table above exactly):
```bash
lm_eval --model hf \
--model_args pretrained=Qwen/Qwen2.5-7B-Instruct,peft=QuantumAI-Blockchain/aether-mind-v7.0,load_in_4bit=True,dtype=bfloat16 \
--tasks mmlu,gsm8k,arc_challenge,hellaswag --device cuda:0 --batch_size 4
```
Baseline: drop the `peft=...` argument. The Aether-domain CE eval script is in
the QBC repo under `scripts/training` (held-out assistant-token CE with
`disable_adapter()`).
---
## Limitations & honest notes
- **Light run.** 1,000 steps ≈ 0.19 epoch. It already delivers a large domain
gain with no net general-capability loss, but a full-epoch **v7.1** is planned
for deeper domain coverage.
- **HellaSwag dipped** ~1.3 to 1.9 pts. Minor and expected for a domain SFT; the
GSM8K/ARC gains more than offset it.
- **It is an adapter**, not a standalone model β you must load
`Qwen/Qwen2.5-7B-Instruct` underneath it.
- The Aether-domain CE eval ran on a corpus that overlaps the training source by
≤19 % (sub-epoch, no repeats); the held-out methodology + the size of the gap
make memorization an implausible explanation, but it is disclosed here for
full transparency.
- Inference-time **Sephirot routing** (domain-aware adapter/prompt selection) is
part of the serving stack (`aether-mind`), not baked into these adapter
weights.
---
## License & citation
Apache-2.0 (matches the base model).
```bibtex
@misc{aether_mind_v70_2026,
title = {Aether Mind v7.0 --- QLoRA domain fine-tune of Qwen2.5-7B-Instruct,
the first Aether model with real benchmarks},
author = {{BlockArtica} and {QuantumAI-Blockchain}},
year = {2026},
url = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v7.0},
}
```
## Links
- **QuantumAI Blockchain**: [qbc.network](https://qbc.network)
- **GitHub**: [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Predecessor (deprecated architecture)**: [aether-mind-v6.2](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2)
- **Earlier LoRA on this base**: [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)