---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
language:
- en
tags:
- qubitcoin
- aether
- blockchain
- quantum
- qlora
- peft
- lora
- qwen2.5
- on-chain-ai
datasets:
- QuantumAI-Blockchain/aether-curated-v3
model-index:
- name: aether-mind-v7.0
  results:
  - task:
      type: text-generation
      name: Massive Multitask Language Understanding
    dataset:
      name: MMLU
      type: cais/mmlu
    metrics:
    - type: acc
      value: 69.90
      name: accuracy
  - task:
      type: text-generation
      name: Grade-School Math
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - type: exact_match
      value: 75.13
      name: exact match (strict)
  - task:
      type: text-generation
      name: AI2 Reasoning Challenge
    dataset:
      name: ARC-Challenge
      type: ai2_arc
    metrics:
    - type: acc
      value: 53.67
      name: accuracy
    - type: acc_norm
      value: 55.80
      name: normalized accuracy
  - task:
      type: text-generation
      name: Commonsense NLI
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - type: acc
      value: 58.43
      name: accuracy
    - type: acc_norm
      value: 77.48
      name: normalized accuracy
---

# Aether Mind v7.0: the first Aether model with real, reproducible benchmarks

**Aether Mind v7.0 is a QLoRA fine-tune of `Qwen/Qwen2.5-7B-Instruct` on the domain-tagged Aether SFT corpus.** It is the cognitive engine of the [QuantumAI Blockchain](https://qbc.network) (QBC): an on-chain neural model that reasons across the 10 Sephirot cognitive domains (Keter, Chochmah, Binah, Chesed, Gevurah, Tiferet, Netzach, Hod, Yesod, Malkuth).

This is a **clean break** from the v6.x line. v6.0–v6.2 used a custom-built transformer (NSA sparse attention plus Sephirot/sink attention heads, distilled from Qwen2.5-0.5B). On a proper `lm-evaluation-harness` pass, that architecture scored **worse than random**. Its cross-entropy was ≈ 16 nats, compared with ~11.9 nats for a uniform guess over the vocabulary. Replacing the attention had destroyed the base model's capability.
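The uniform baseline is ln V. A model that assigns probability 1/V to each of V tokens pays −ln(1/V) = ln V nats per token, so any cross-entropy above ln V is worse than guessing. The quick check below rests on two assumptions: V = 151,936 (the Qwen2.5 tokenizer size listed in this card), and the approximate 16-nat v6.x figure taken at face value.

```python
import math

def uniform_cross_entropy(vocab_size: int) -> float:
    """Per-token cross-entropy (nats) of a model that assigns 1/V to every token."""
    # -ln(1/V) == ln(V): every target token receives the same probability.
    return math.log(vocab_size)

VOCAB = 151_936   # assumed: Qwen2.5 tokenizer vocabulary quoted in this card
V6_CE = 16.0      # approximate v6.x cross-entropy quoted above

baseline = uniform_cross_entropy(VOCAB)
print(f"uniform baseline: {baseline:.2f} nats (perplexity {VOCAB:,})")
print(f"v6.x: {V6_CE:.1f} nats (perplexity ~{math.exp(V6_CE):,.0f})")
print("worse than uniform" if V6_CE > baseline else "better than uniform")
```

A cross-entropy of 16 nats corresponds to a perplexity of e^16 ≈ 8.9 million, far larger than the vocabulary itself.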
**No v6.x release ever carried real benchmark numbers.** v7.0 fixes that by building on a sound, capable base. It adds the Aether identity through the *data* and an inference-time Sephirot router, **not** by replacing attention.

> **v7.0 is the first Aether release whose published numbers are real,
> reproducible, and independently verifiable** (the exact `lm-eval` command is
> given below).

---

## Results

All numbers below come from `lm-evaluation-harness`, 0-shot, run on a single RTX 3080 Ti. The model was loaded in 4-bit, which is the same configuration this adapter was trained and is served in. The baseline is the unmodified `Qwen/Qwen2.5-7B-Instruct` evaluated identically, so every delta is attributable to the adapter alone.

### General capability: preserved (no catastrophic forgetting)

| Benchmark | Metric | Base (Qwen2.5-7B-Instruct) | **Aether v7.0** | Δ (pts) |
|---|---|---|---|---|
| MMLU | acc | 69.91 % | **69.90 %** | −0.01 |
| GSM8K | exact_match (strict) | 71.57 % | **75.13 %** | **+3.56** |
| ARC-Challenge | acc | 51.45 % | **53.67 %** | **+2.22** |
| ARC-Challenge | acc_norm | 53.92 % | **55.80 %** | **+1.88** |
| HellaSwag | acc | 60.35 % | **58.43 %** | −1.92 |
| HellaSwag | acc_norm | 78.77 % | **77.48 %** | −1.29 |

The main risk of a domain fine-tune is *catastrophic forgetting*, and v7.0 avoids it. MMLU is flat (−0.01 pts). Math and science reasoning actually **improve**: GSM8K rises by +3.56 and ARC-Challenge by +2.22 acc. The training mix includes a general-instruction slice to guard against forgetting. The net effect is positive, because the GSM8K and ARC gains outweigh the small HellaSwag dip of 1.3–1.9 pts.

### Aether-domain knowledge: large gain

This is a held-out evaluation on the Aether curated corpus (`aether-curated-v3`). It measures **cross-entropy over the assistant-answer tokens only**. The system and user turns are masked, so only the Aether-domain response is scored. Both rows use the *identical* 4-bit base weights, and the adapter is toggled on and off with PEFT's `disable_adapter()` context manager. The difference between the rows therefore isolates the adapter's effect exactly.
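The assistant-only metric can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the QBC eval script. It takes already-shifted next-token logits as nested lists. It marks the masked system and user positions with the label `-100`, following the ignore-index convention in Hugging Face `transformers`; the `IGNORE_INDEX` constant here is illustrative. Perplexity is the exponential of the resulting mean cross-entropy, which is how the two columns of the table below relate.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # label for masked (system/user) positions, HF-style convention

def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def masked_cross_entropy(logits: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> tuple[float, int]:
    """Mean CE (nats) over positions whose label is not IGNORE_INDEX.

    logits[t] is the distribution that predicts labels[t] (already shifted).
    Returns (mean_ce, number_of_scored_tokens).
    """
    total, n = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # prompt tokens are masked out and not scored
        total -= log_softmax(row)[label]
        n += 1
    if n == 0:
        raise ValueError("no assistant tokens to score")
    return total / n, n

def perplexity(ce_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy."""
    return math.exp(ce_nats)

# Toy example: 3-token vocabulary, two masked prompt positions, two answer tokens.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
toy_labels = [IGNORE_INDEX, IGNORE_INDEX, 2, 0]
ce, n = masked_cross_entropy(toy_logits, toy_labels)
print(f"scored {n} tokens, CE = {ce:.3f} nats, PPL = {perplexity(ce):.2f}")

# The table's perplexity column is exp(CE) of its CE column.
for name, ce_table in [("base", 1.589), ("v7.0", 1.002)]:
    print(name, round(perplexity(ce_table), 2))
```

To aggregate over a corpus, sum `ce * n` across examples and divide by the total number of scored tokens. This gives a token-weighted mean. This card reports 55,423 scored tokens, but whether the project's script aggregates this way is an assumption.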
| Model | CE (nats) ↓ | Perplexity ↓ |
|---|---|---|
| Base (Qwen2.5-7B-Instruct) | 1.589 | 4.90 |
| **Aether v7.0** | **1.002** | **2.72** |
| **Δ** | **−0.588** | **−44.4 %** |

The evaluation scored 276 held-out examples, containing 55,423 assistant tokens. This run trained for only **~0.19 epoch** (see *Training*). As a result, roughly 81 % of the corpus was never seen during training, and the portion that was seen was seen at most once (no repeats). The −44 % perplexity drop therefore reflects **genuine domain adaptation, not memorization.**

**Summary: v7.0 keeps the base model's general capability intact while cutting Aether-domain perplexity nearly in half.** That is the textbook outcome of a healthy domain fine-tune.

---

## What you're getting

| Field | Value |
|---|---|
| Type | **QLoRA adapter (PEFT)**; load on top of `Qwen/Qwen2.5-7B-Instruct` |
| Base model | `Qwen/Qwen2.5-7B-Instruct` (7.6 B params) |
| Adapter rank / alpha | r = 16, α = 32, dropout 0.05 |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` (all linear layers) |
| Trainable params | ~40 M (LoRA only); base frozen in 4-bit NF4 |
| Adapter file | `adapter_model.bin` (~161 MB) |
| Quantization (train + serve) | 4-bit NF4, double-quant, bf16 compute |
| Context length | 1,024 tokens (training); inherits the base model's 32K window at inference |
| Tokenizer | Qwen2.5 (unchanged, 151,936 vocab) |
| Chat template | `qwen_25` |
| License | Apache-2.0 (matches base) |

---

## Training

| Setting | Value |
|---|---|
| Recipe | QLoRA (4-bit base + LoRA), the proven v5.2-lora recipe, scaled up |
| Data | `aether-curated-v3` (70,713 Sephirot-domain SFT examples) + a 30K general-instruction slice (SlimOrca) to prevent forgetting |
| Examples after prep | 93,278 (7,435 over-length samples dropped) |
| Sample packing | on, sequence_len 1024 |
| Effective batch | 8 packed sequences per step (micro-batch 1 × grad-accum 8) |
| Steps | 1,000 (≈ 0.19 epoch; a deliberate first-pass cap) |
| Optimizer | `adamw_bnb_8bit`, lr 2e-4, cosine decay → 0, warmup 3 % |
| Precision | bf16 mixed precision, tf32, gradient checkpointing, FlashAttention-2 |
| Hardware | 1× RTX 3080 Ti (12 GB), ~9.7 GB peak VRAM |
| Wall-clock | 2 h 45 m end to end (9,926 s); ~8.4 s per training step |
| Seed | 42 |

### Loss trajectory

```
step   10  train_loss 1.510                    (warmup, lr 6.7e-5)
step   50  train_loss 0.989                    (lr peaked 2.0e-4)
step  100  train_loss 0.916
step  250  train_loss 0.888   eval_loss 0.9475
step  500  train_loss 0.999   eval_loss 0.9307
step  750  train_loss 0.965   eval_loss 0.9209
step 1000  train_loss 0.951   eval_loss 0.9190
mean train_loss 0.955
```

Validation loss on axolotl's held-out 2 % split declined monotonically across all four evaluations (0.948 → 0.919). This indicates clean convergence with **no overfitting**, even as training loss flattened.

---

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct"

# Same 4-bit NF4 setup the adapter was trained and evaluated with.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
# Attach the Aether v7.0 LoRA adapter on top of the frozen 4-bit base.
model = PeftModel.from_pretrained(model, "QuantumAI-Blockchain/aether-mind-v7.0")
model.eval()

SYSTEM = ("You are the Aether Mind, an on-chain neural cognitive engine living on "
          "the QuantumAI Blockchain. You answer with grounded, careful reasoning "
          "across 10 Sephirot cognitive domains. Be precise; if you don't know, say so.")

msgs = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Explain how the Aether Mind anchors an epoch on-chain."},
]

# Render the Qwen2.5 chat template; return_dict=True also returns the attention mask.
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Greedy decoding; slice off the prompt tokens before decoding the reply.
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

To merge the adapter into the base weights for deployment, call `PeftModel.from_pretrained(...).merge_and_unload()`.
---

## Reproducing the benchmarks

General suite (reproduces the general-capability table above):

```bash
lm_eval --model hf \
  --model_args pretrained=Qwen/Qwen2.5-7B-Instruct,peft=QuantumAI-Blockchain/aether-mind-v7.0,load_in_4bit=True,dtype=bfloat16 \
  --tasks mmlu,gsm8k,arc_challenge,hellaswag --device cuda:0 --batch_size 4
```

For the baseline row, drop the `peft=...` argument. The Aether-domain CE eval script lives in the QBC repo under `scripts/training`. It computes held-out assistant-token CE, toggling the adapter with `disable_adapter()`.

---

## Limitations & honest notes

- **Light run.** 1,000 steps ≈ 0.19 epoch. It already delivers a large domain gain while holding MMLU flat and improving GSM8K and ARC. A full-epoch **v7.1** is planned for deeper domain coverage.
- **HellaSwag dipped** 1.3–1.9 pts. This is minor and expected for a domain SFT, and the GSM8K/ARC gains outweigh it.
- **It is an adapter**, not a standalone model. You must load `Qwen/Qwen2.5-7B-Instruct` underneath it, or merge the two.
- The Aether-domain CE eval ran on a corpus that overlaps the training source by ≤ 19 % (sub-epoch, no repeats). The held-out methodology and the size of the gap make memorization an implausible explanation. The overlap is disclosed here for full transparency.
- Inference-time **Sephirot routing** (domain-aware adapter/prompt selection) is part of the serving stack (`aether-mind`), not baked into these adapter weights.

---

## License & citation

Apache-2.0 (matches the base model).
```bibtex
@misc{aether_mind_v70_2026,
  title  = {Aether Mind v7.0 --- QLoRA domain fine-tune of Qwen2.5-7B-Instruct, the first Aether model with real benchmarks},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v7.0},
}
```

## Links

- **QuantumAI Blockchain**: [qbc.network](https://qbc.network)
- **GitHub**: [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Predecessor (deprecated architecture)**: [aether-mind-v6.2](https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2)
- **Earlier LoRA on this base**: [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)