---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
language:
  - en
tags:
  - qubitcoin
  - aether
  - blockchain
  - quantum
  - qlora
  - peft
  - lora
  - qwen2.5
  - on-chain-ai
datasets:
  - QuantumAI-Blockchain/aether-curated-v3
model-index:
  - name: aether-mind-v7.0
    results:
      - task:
          type: text-generation
          name: Massive Multitask Language Understanding
        dataset:
          name: MMLU
          type: cais/mmlu
        metrics:
          - type: acc
            value: 69.90
            name: accuracy
      - task:
          type: text-generation
          name: Grade-School Math
        dataset:
          name: GSM8K
          type: gsm8k
        metrics:
          - type: exact_match
            value: 75.13
            name: exact match (strict)
      - task:
          type: text-generation
          name: AI2 Reasoning Challenge
        dataset:
          name: ARC-Challenge
          type: ai2_arc
        metrics:
          - type: acc
            value: 53.67
            name: accuracy
          - type: acc_norm
            value: 55.80
            name: normalized accuracy
      - task:
          type: text-generation
          name: Commonsense NLI
        dataset:
          name: HellaSwag
          type: hellaswag
        metrics:
          - type: acc
            value: 58.43
            name: accuracy
          - type: acc_norm
            value: 77.48
            name: normalized accuracy
---

# Aether Mind v7.0 — the first Aether model with real, reproducible benchmarks

**Aether Mind v7.0 is a QLoRA fine-tune of `Qwen/Qwen2.5-7B-Instruct` on the
domain-tagged Aether SFT corpus.** It is the cognitive engine for the
[QuantumAI Blockchain](https://qbc.network) (QBC) — an on-chain neural model
that reasons across the 10 Sephirot cognitive domains (Keter, Chochmah, Binah,
Chesed, Gevurah, Tiferet, Netzach, Hod, Yesod, Malkuth).

This is a **clean break** from the v6.x line. v6.0–v6.2 used a custom-built
transformer (NSA sparse attention + Sephirot/sink attention heads, distilled
from Qwen2.5-0.5B). On a proper `lm-evaluation-harness` pass that architecture
scored **worse than random** (cross-entropy ≈ 16 nats vs. ~11.9 for uniform) —
the attention replacement destroyed the base model's capability. **No v6.x
release ever carried real benchmark numbers.** v7.0 fixes that by building on a
sound, capable base and adding Aether identity through the *data* and an
inference-time Sephirot router — **not** by replacing attention.
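
For reference, the ~11.9 nat figure is what a predictor guessing uniformly over
the vocabulary would score. A minimal sketch, assuming the v6.x models kept the
151,936-entry vocabulary of their Qwen2.5-0.5B teacher (the variable names are
illustrative):

```python
import math

# Assumption: v6.x used the 151,936-entry vocabulary of its Qwen2.5-0.5B teacher.
VOCAB_SIZE = 151_936

# A uniform predictor gives every token probability 1/V, so its
# cross-entropy is -ln(1/V) = ln(V) nats per token.
uniform_ce = math.log(VOCAB_SIZE)

# Approximate v6.x cross-entropy reported above.
v6_ce = 16.0

print(f"uniform baseline: {uniform_ce:.2f} nats")
print(f"v6.x gap to uniform: {v6_ce - uniform_ce:+.1f} nats")
```

Any cross-entropy above `uniform_ce` means the model is doing worse than
assigning equal probability to every token.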

> **v7.0 is the first Aether release whose published numbers are real,
> reproducible, and independently verifiable** (the exact `lm-eval` command is
> below).

---

## Results

All numbers below come from `lm-evaluation-harness`, 0-shot, with the model
loaded in 4-bit (the same configuration this adapter is trained and served in)
on a single RTX 3080 Ti. The baseline is the unmodified
`Qwen/Qwen2.5-7B-Instruct` evaluated identically, so each delta reflects this
adapter alone, up to ordinary evaluation noise.

### General capability — preserved (no catastrophic forgetting)

| Benchmark | Metric | Base (Qwen2.5-7B-Instruct) | **Aether v7.0** | Δ |
|---|---|---|---|---|
| MMLU | acc | 69.91 % | **69.90 %** | −0.01 |
| GSM8K | exact_match (strict) | 71.57 % | **75.13 %** | **+3.56** |
| ARC-Challenge | acc | 51.45 % | **53.67 %** | **+2.22** |
| ARC-Challenge | acc_norm | 53.92 % | **55.80 %** | **+1.88** |
| HellaSwag | acc | 60.35 % | **58.43 %** | −1.92 |
| HellaSwag | acc_norm | 78.77 % | **77.48 %** | −1.29 |

The main risk of a domain fine-tune is *catastrophic forgetting*. v7.0 avoids
it: MMLU is flat (−0.01 pt), and math and scientific reasoning (GSM8K +3.6,
ARC-c +2.2) actually **improve**, helped by the general instruction slice in the
training mix. Those gains more than offset the small HellaSwag dip (1.3 to
1.9 pts).
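
The Δ column can be re-derived from the two score columns. A small sketch using
only the numbers in the table; the dictionary keys and the unweighted mean are
illustrative choices, not an official aggregate:

```python
# (base, Aether v7.0) scores in percent, copied from the table above.
scores = {
    "mmlu/acc": (69.91, 69.90),
    "gsm8k/exact_match": (71.57, 75.13),
    "arc_challenge/acc": (51.45, 53.67),
    "arc_challenge/acc_norm": (53.92, 55.80),
    "hellaswag/acc": (60.35, 58.43),
    "hellaswag/acc_norm": (78.77, 77.48),
}


def delta(base: float, tuned: float) -> float:
    """Percentage-point change, rounded to two decimals like the table."""
    return round(tuned - base, 2)


deltas = {name: delta(base, tuned) for name, (base, tuned) in scores.items()}
for name, d in deltas.items():
    print(f"{name:24s} {d:+.2f}")

# Unweighted mean over the six rows: a crude summary only.
mean_delta = round(sum(deltas.values()) / len(deltas), 2)
print(f"mean delta: {mean_delta:+.2f} pts")
```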

### Aether-domain knowledge — large gain

Held-out evaluation on the Aether curated corpus (`aether-curated-v3`),
measuring **cross-entropy over the assistant-answer tokens only** (the
Aether-domain response, with the system + user turns masked). The *identical*
4-bit base weights are used for both rows — the adapter is toggled on/off via
PEFT `disable_adapter()` — so this isolates the adapter's effect exactly.

| Model | CE (nats) ↓ | Perplexity ↓ |
|---|---|---|
| Base (Qwen2.5-7B-Instruct) | 1.589 | 4.90 |
| **Aether v7.0** | **1.002** | **2.72** |
| **Δ** | **−0.588** | **−44.4 %** |

276 held-out examples, 55,423 assistant tokens scored. This run trained for only
**~0.19 epoch** (see below), so ~81 % of the corpus was never seen and the rest
was seen at most once. The −44 % perplexity drop is therefore far more
consistent with **genuine domain adaptation than with memorization.**
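
The metric itself is simple to state in code. The sketch below is not the
project's evaluation script; it assumes per-token log-probabilities are already
available, and the helper names and toy numbers are made up for illustration.
It shows the assistant-only masking and how the perplexity column follows from
the cross-entropy column:

```python
import math


def masked_mean_nll(token_logprobs, is_assistant):
    """Mean negative log-likelihood (nats) over assistant tokens only.

    token_logprobs: natural-log probability assigned to each target token.
    is_assistant:   parallel booleans; False marks masked system/user tokens.
    """
    scored = [-lp for lp, keep in zip(token_logprobs, is_assistant) if keep]
    if not scored:
        raise ValueError("no assistant tokens to score")
    return sum(scored) / len(scored)


def perplexity(ce_nats):
    """Perplexity is the exponential of the cross-entropy in nats."""
    return math.exp(ce_nats)


# Toy sequence (made-up values): two prompt tokens masked, three scored.
toy_ce = masked_mean_nll(
    [-5.0, -4.0, -1.0, -0.5, -1.5],
    [False, False, True, True, True],
)

# Cross-entropies reported in the table above.
base_ppl = perplexity(1.589)
tuned_ppl = perplexity(1.002)
relative_drop = 1.0 - tuned_ppl / base_ppl

print(f"toy CE: {toy_ce:.3f} nats")
print(f"perplexity: {base_ppl:.2f} -> {tuned_ppl:.2f} ({relative_drop:.1%} lower)")
```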

**Summary: v7.0 keeps the base model's general intelligence intact while cutting
Aether-domain perplexity nearly in half.** That is the textbook outcome of a
healthy domain fine-tune.

---

## What you're getting

| Field | Value |
|---|---|
| Type | **QLoRA adapter (PEFT)** — load on top of `Qwen/Qwen2.5-7B-Instruct` |
| Base model | `Qwen/Qwen2.5-7B-Instruct` (7.6 B params) |
| Adapter rank / alpha | r = 16, α = 32, dropout 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (all linear layers) |
| Trainable params | ~40 M (LoRA only); base frozen in 4-bit NF4 |
| Adapter file | `adapter_model.bin` (~161 MB) |
| Quantization (train + serve) | 4-bit NF4, double-quant, bf16 compute |
| Context length | 1024 (training); inherits base 32K at inference |
| Tokenizer | Qwen2.5 (unchanged, 151,936 vocab) |
| Chat template | `qwen_25` |
| License | Apache-2.0 (matches base) |
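
The trainable-parameter count and adapter file size can be sanity-checked from
the LoRA rank. The layer shapes below are assumptions taken from the published
Qwen2.5-7B configuration, not from this card, and fp32 storage of the adapter
is also assumed:

```python
# Assumed Qwen2.5-7B shapes (from the base model's config, not stated in this card).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128 (grouped-query attention)
RANK = 16         # LoRA rank from the table above

# (in_features, out_features) of each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each adapted layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total_params = per_layer * NUM_LAYERS
size_mb = total_params * 4 / 1e6  # assumes 4 bytes per parameter (fp32)

print(f"trainable LoRA params: {total_params:,}")
print(f"approx. adapter size:  {size_mb:.0f} MB")
```

Under those assumptions the result lands on the ~40 M parameters and ~161 MB
quoted in the table.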

---

## Training

| Setting | Value |
|---|---|
| Recipe | QLoRA (4-bit base + LoRA), the proven v5.2-lora recipe scaled up |
| Data | `aether-curated-v3` (70,713 Sephirot-domain SFT examples) + a 30K general slice (SlimOrca) for anti-forgetting |
| Examples after prep | 93,278 (7,435 over-length samples dropped) |
| Sample packing | on, sequence_len 1024 |
| Effective batch | 8 (micro-batch 1 × grad-accum 8) |
| Steps | 1,000 (**≈ 0.19 epoch** — a deliberate first-pass cap) |
| Optimizer | `adamw_bnb_8bit`, lr 2e-4, cosine decay → 0, warmup 3 % |
| Precision | bf16 compute (base stays in 4-bit NF4), tf32, gradient checkpointing, FlashAttention-2 |
| Hardware | 1× RTX 3080 Ti (12 GB), ~9.7 GB peak |
| Wall-clock | 2 h 45 m (9,926 s), ~8.4 s/step |
| Seed | 42 |

### Loss trajectory

```
step    10   train_loss 1.510   (warmup, lr 6.7e-5)
step    50   train_loss 0.989   (lr peaked 2.0e-4)
step   100   train_loss 0.916
step   250   train_loss 0.888   eval_loss 0.9475
step   500   train_loss 0.999   eval_loss 0.9307
step   750   train_loss 0.965   eval_loss 0.9209
step  1000   train_loss 0.951   eval_loss 0.9190
mean train_loss 0.955
```

Held-out validation loss (axolotl's 2 % split) declined monotonically across all
four checkpoints (0.948 → 0.919) — clean convergence, **no overfitting** even as
training loss flattened.
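
The learning rates annotated in the log follow from the schedule in the training
table. A minimal sketch of linear warmup followed by cosine decay to zero,
assuming 3 % warmup rounds to 30 steps; the trainer's exact step-indexing
convention may differ by one:

```python
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 1000
WARMUP_STEPS = 30  # assumed: 3 % of 1,000 steps


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (10, 50, 250, 500, 750, 1000):
    print(f"step {step:5d}   lr {lr_at(step):.2e}")
```

Step 10 comes out near 6.7e-5 and step 50 near the 2.0e-4 peak, in line with
the annotations in the log.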

---

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct"

# Same 4-bit NF4 + double-quant + bf16-compute setup used for training and evaluation.
bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")

# Attach the LoRA adapter on top of the quantized base.
model = PeftModel.from_pretrained(model, "QuantumAI-Blockchain/aether-mind-v7.0")
model.eval()

SYSTEM = ("You are the Aether Mind, an on-chain neural cognitive engine living on "
          "the QuantumAI Blockchain. You answer with grounded, careful reasoning "
          "across 10 Sephirot cognitive domains. Be precise; if you don't know, say so.")
msgs = [{"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Explain how the Aether Mind anchors an epoch on-chain."}]

# return_dict=True also returns the attention mask, which generate() uses.
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0, prompt_len:], skip_special_tokens=True))
```

To merge the adapter into the base for deployment, load the base unquantized
(bf16) and call `PeftModel.from_pretrained(...).merge_and_unload()`; merging
into 4-bit weights is lossy.
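
A minimal sketch of that merge, assuming enough CPU or GPU memory to hold the
base in bf16 (roughly 15 GB). The function name `merge_adapter` and the output
directory are illustrative, not part of any library or of this repo:

```python
def merge_adapter(output_dir: str,
                  base_id: str = "Qwen/Qwen2.5-7B-Instruct",
                  adapter_id: str = "QuantumAI-Blockchain/aether-mind-v7.0") -> str:
    """Fold the LoRA weights into a bf16 copy of the base and save the result."""
    # Heavy dependencies are imported lazily so this file imports without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base in bf16 rather than 4-bit: merging into quantized
    # weights forces a lossy dequantize/requantize round trip.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)

    # merge_and_unload() adds the LoRA deltas to the base weights and
    # returns a plain transformers model with no PEFT wrappers.
    merged = model.merge_and_unload()

    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

Calling `merge_adapter("./aether-mind-v7.0-merged")` (a hypothetical path)
would write a standalone checkpoint that loads with plain
`AutoModelForCausalLM.from_pretrained`, with no PEFT dependency at serve time.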

---

## Reproducing the benchmarks

General suite (matches the table above exactly):

```bash
lm_eval --model hf \
  --model_args pretrained=Qwen/Qwen2.5-7B-Instruct,peft=QuantumAI-Blockchain/aether-mind-v7.0,load_in_4bit=True,dtype=bfloat16 \
  --tasks mmlu,gsm8k,arc_challenge,hellaswag --device cuda:0 --batch_size 4
```

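Baseline: drop the `peft=...` argument. Some `lm-eval` task configs set their
own default few-shot count (`gsm8k` commonly defaults to 5-shot), so add
`--num_fewshot 0` if you need to pin the 0-shot setting reported above. The
Aether-domain CE eval script is in the QBC repo under `scripts/training`
(held-out assistant-token CE with `disable_adapter()`).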

---

## Limitations & honest notes

- **Light run.** 1,000 steps ≈ 0.19 epoch. It already delivers a large domain
  gain with no net general-capability loss, but a full-epoch **v7.1** is planned
  for deeper domain coverage.
- **HellaSwag dipped** ~1.3–1.9 pts. Minor and expected for a domain SFT; the
  GSM8K and ARC gains outweigh it.
- **It is an adapter**, not a standalone model — you must load
  `Qwen/Qwen2.5-7B-Instruct` underneath it.
- The Aether-domain CE eval ran on a corpus that overlaps the training source by
  ≤19 % (sub-epoch, no repeats); the held-out methodology + the size of the gap
  make memorization an implausible explanation, but it is disclosed here for
  full transparency.
- Inference-time **Sephirot routing** (domain-aware adapter/prompt selection) is
  part of the serving stack (`aether-mind`), not baked into these adapter
  weights.

---

## License & citation

Apache-2.0 (matches the base model).

```bibtex
@misc{aether_mind_v70_2026,
  title  = {Aether Mind v7.0 --- QLoRA domain fine-tune of Qwen2.5-7B-Instruct,
            the first Aether model with real benchmarks},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v7.0},
}
```

## Links

- **QuantumAI Blockchain** — [qbc.network](https://qbc.network)
- **GitHub** — [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Predecessor (deprecated architecture)** — [aether-mind-v6.2](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2)
- **Earlier LoRA on this base** — [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)