---
language:
- en
- de
- fr
- es
- pt
- it
- nl
- pl
- multilingual
license: lgpl-3.0
tags:
- mixture-of-experts
- ternary-weights
- moe
- edge-ai
- research
- candle
- rust
- federated-learning
- low-precision
- sprind
- dual-stream
- cord-surgery
- net2net
pipeline_tag: text-generation
---

# Model Card: albert. (Albert-MoE-13)

**Version:** v3.0 (ternary MoE)

**Maintainer:** RFI-IRFOS, contact@ternlang.com

**Repository:** https://github.com/rfi-irfos/ternary-intelligence-stack

**License:** LGPL-3.0-or-later (model weights, training code, inference runtime). Platform infrastructure (API server, MCP tooling, HDL) is BSL-1.1. See [README §Licensing](README.md#licensing) for the full tier breakdown.

**Last updated:** 2026-05-27

**Training status:** Paused at ep4234 (Modal billing ceiling). Architecture: **26L dual-stream**; 13 depth surgeries and 1 cord surgery complete. The cord surgery fired autonomously at ep4202 (2026-05-27T16:44Z), the first documented single-to-dual-stream bifurcation mid-training. S13 (25L→26L) fired at ep~4207. Chip ATL: **8.6852** (post-S13). EP_AVG ATL: **9.2847** (ep3456, 20L). fib_index=7 · window=34 · Gen3 step 1/6. Training will resume on a Modal T4 once billing is settled.

---

## Model Overview

albert. is a research-grade language model trained from scratch using a **ternary weight representation** (-1, 0, +1) with a Mixture-of-Experts (MoE) architecture. RFI-IRFOS develops it to demonstrate that high-quality language modelling is achievable without 32-bit floating-point weights, targeting inference on edge hardware and low-power devices.
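This card does not say how latent weights are mapped to {-1, 0, +1}. The sketch below assumes the absmean scheme popularised by BitNet b1.58 (divide by the mean absolute weight, round, clip to [-1, 1]) together with a plain straight-through estimator (STE), the training mechanism named in the property table. All function names are hypothetical rather than albert.'s Candle code, and plain SGD stands in for AdamW to keep the example short.

```rust
// Hypothetical sketch: absmean ternary quantisation with a straight-through
// estimator (STE). The thresholding rule is an ASSUMPTION (BitNet b1.58 style);
// albert.'s actual scheme is not documented in this card.

/// Quantise latent f32 weights to {-1, 0, +1} plus one f32 scale `gamma`.
fn ternarize(weights: &[f32]) -> (Vec<i8>, f32) {
    const EPS: f32 = 1e-6; // avoids division by zero for an all-zero tensor
    let n = weights.len().max(1) as f32;
    let gamma = weights.iter().map(|w| w.abs()).sum::<f32>() / n + EPS;
    let q = weights
        .iter()
        .map(|&w| (w / gamma).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, gamma)
}

/// The forward pass uses the dequantised weights q * gamma.
fn dequantize(q: &[i8], gamma: f32) -> Vec<f32> {
    q.iter().map(|&v| f32::from(v) * gamma).collect()
}

/// STE: rounding has zero gradient almost everywhere, so the backward pass
/// copies the gradient w.r.t. the quantised weights onto the latent weights.
fn ste_backward(grad_wrt_quantized: &[f32]) -> Vec<f32> {
    grad_wrt_quantized.to_vec()
}

/// Update the latent f32 weights (plain SGD for brevity; the card lists AdamW).
fn sgd_step(latent: &mut [f32], grad: &[f32], lr: f32) {
    for (w, g) in latent.iter_mut().zip(grad) {
        *w -= lr * g;
    }
}

fn main() {
    let mut latent = vec![0.9_f32, -0.05, -1.2, 0.3];
    let (q, gamma) = ternarize(&latent);
    println!("ternary = {:?}, gamma = {:.4}", q, gamma);
    println!("forward weights = {:?}", dequantize(&q, gamma));

    // Pretend the loss gradient w.r.t. the quantised weights is already known.
    let grad_q = [0.5_f32, -0.2, 0.1, 0.4];
    let grad_latent = ste_backward(&grad_q);
    sgd_step(&mut latent, &grad_latent, 0.1);
    println!("updated latent = {:?}", latent);
}
```

Only the latent f32 weights carry small updates between steps. A weight flips between -1, 0 and +1 only when its latent value crosses a rounding boundary, which is why STE training keeps full-precision shadow weights.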
| Property | Value |
|----------|-------|
| Architecture | **Dual-stream** ternary MoE (Mixture of Experts) |
| Streams | **2** (bifurcated via cord surgery at ep4202, 2026-05-27) |
| Layers | **26** per stream |
| Hidden size | **2×256H** (256H per stream) |
| Anastomosis gates | **6**: bidirectional F32 cross-stream fusion at Fibonacci layers [2, 3, 5, 8, 13, 21] |
| Experts | 12 per stream (Top-3 routing; shared FFN weights, independent routing gates) |
| Context length | 256 tokens |
| Vocabulary | 32,000 tokens (ByteLevel BPE: EN/DE/FR/ES/PT/IT/NL/PL) |
| Weight representation | Ternary {-1, 0, +1} with STE training |
| Gate linear | F32 |
| Positional encoding | RoPE (rotate_half) |
| Optimizer | AdamW, cosine LR decay, `BATCH=1` (post-cord) |
| Parameters (total) | **~194.4M** |
| Safetensors | **2,044 tensors · 741.4 MB** |
| Surgeries | **13 depth (S1–S13)** + **1 cord surgery** = 14 total surgical events |

The central technical innovation is the **@sparseskip** primitive: a learned sparse-skip layer that dynamically bypasses computation paths based on token-level activation patterns, enabling sub-linear inference scaling without pruning.

---

## Intended Use

**Intended uses:**

- Research into ternary and low-precision neural network architectures
- Benchmarking inference performance on CPU and edge GPU hardware
- Academic study of Mixture-of-Experts routing dynamics
- Demonstration platform for the SPRIND AI funding initiative (Germany)

**Out-of-scope uses:**

- Production deployment as a general-purpose assistant without further fine-tuning and safety evaluation
- Safety-critical applications (medical, legal, financial decisions)
- Any use requiring factual accuracy guarantees
- Deployment to users without appropriate transparency disclosure

---

## Training Data

See [DATA_PROVENANCE.md](DATA_PROVENANCE.md) for full source documentation and governance details.

**Summary:** albert. is trained on a curated multilingual corpus composed of:

| Tier | Content | Approximate share |
|------|---------|-------------------|
| Core | Project Gutenberg (public domain books, multilingual) | ~30% |
| Core | Wikipedia (15 languages, including EN, DE, FR, HU, ZH, AR, KO, SV, FI, NL, PL, RU, JA) | ~25% |
| Core | OpenWebText (filtered Common Crawl) | ~15% |
| Technical | GitHub issues, developer blogs, HN discussions | ~10% |
| Chaos | Synthetic noise, adversarial patterns, mixed-language text | ~10% |
| Structured | Code samples, structured data (JSON/YAML/TSV) | ~5% |
| Multilingual | Additional EU language samples | ~5% |

The **10% chaos layer** is a structural invariant enforced by the training pipeline (`train_tokenizer_v3.py`). It prevents the model from overfitting to clean text distributions and improves robustness to noisy inputs.

---

## Evaluation

**Primary metric:** Cross-entropy loss on a held-out WikiText-2 sample (`eval_sample.txt`, not seen during training).
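Losses in this card are reported in nats, so a useful reference point is the loss of a model that assigns probability 1/32,000 to every token of the vocabulary: ln(32,000) ≈ 10.37 nats. The early-epoch losses in the benchmark table (~10.35 to 10.36 at Ep54 and Ep111) therefore sit only just below that uniform baseline. The minimal sketch below computes mean token-level cross-entropy with a numerically stable log-sum-exp, the uniform baseline, and perplexity (exp of the loss). The function names are illustrative and are not taken from the `moe-test` binary.

```rust
// Minimal sketch of the primary metric: mean token-level cross-entropy in nats
// (natural log), plus the uniform-prediction baseline ln(V).
// Function names are illustrative, not the benchmark suite's actual API.

/// Cross-entropy of one next-token prediction, via a stable log-sum-exp.
fn cross_entropy(logits: &[f64], target: usize) -> f64 {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let sum_exp: f64 = logits.iter().map(|&z| (z - max).exp()).sum();
    let log_sum_exp = max + sum_exp.ln();
    log_sum_exp - logits[target] // = -ln softmax(logits)[target]
}

/// Mean cross-entropy over a held-out sample of (logits, target) pairs.
fn mean_cross_entropy(sample: &[(Vec<f64>, usize)]) -> f64 {
    let total: f64 = sample.iter().map(|(logits, t)| cross_entropy(logits, *t)).sum();
    total / sample.len() as f64
}

/// Loss of a model that assigns probability 1/V to every token.
fn uniform_baseline(vocab_size: usize) -> f64 {
    (vocab_size as f64).ln()
}

/// Perplexity corresponding to a loss measured in nats.
fn perplexity(loss_nats: f64) -> f64 {
    loss_nats.exp()
}

fn main() {
    let baseline = uniform_baseline(32_000);
    println!("uniform baseline: {:.4} nats (perplexity {:.0})", baseline, perplexity(baseline));
    for loss in [10.35, 9.7884, 9.2847] {
        println!(
            "loss {:.4}: {:.4} nats below uniform, perplexity {:.0}",
            loss,
            baseline - loss,
            perplexity(loss)
        );
    }
}
```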
**Benchmark results (benchmark suite v2.0.0):**

| Epoch | Loss (avg) | Epoch ATL | Batch ATL | Tok/s (T4 GPU) / Notes |
|-------|-----------|-----------|-----------|------------------------|
| Ep54 | ~10.35 | 10.35 | — | 11.24 (CPU) |
| Ep111 | ~10.36 | 10.36 | — | 18.52 |
| Ep849 | ~10.22 | 10.2050 | — | pending |
| Ep1177 | 10.2076 | 10.2059 (ep1158) | 10.1738 (ep1155) | pending |
| Ep1390 | 10.1212 | 10.1212 (ep1390) | 10.0670 (ep1385) | pending |
| Ep1435 | 10.1113 | 10.1113 (ep1435) | **10.0556** (ep1435) | pending |
| Ep1438 | 10.1071 | 10.1071 (ep1438) | **10.0556** (ep1435) | pending |
| Ep1441 | 10.1067 | 10.1067 (ep1441) | 10.0556 (ep1435) | pending |
| Ep1455 | 10.1060 | **10.1060** (ep1455) | **10.0396** (ep1445) | pending |
| Ep1474 | 10.0982 | **10.0982** (ep1474) | **10.0396** (ep1445) | pending |
| Ep1553 | ~10.07 (est) | 10.0982 (ep1474) | **9.9948** (ep1553) ← first sub-10.0 batch | pending |
| Ep2040 | ~9.82 (est) | 9.7976 (ep2084) | **9.6380** (ep1445) | 9.6–18.5 (CPU) |
| Ep2084 | **9.7976** | **9.7976** ← epoch ATL | 9.6380 (ep1445) | pending (T4) |
| Ep2104 | ~9.81 (est) | 9.7976 (ep2084) | 9.6380 (ep1445) | **9.9–21.3 (CPU)** |
| Ep2109 | 9.7975 | **9.7975** (ep2109) | 9.6380 (ep1445) | pending |
| Ep2114 | 9.7891 | 9.7891 (ep2114) | **9.6235** (ep2114) ← batch ATL | pending |
| Ep2116 | **9.7884** | **9.7884** ← epoch ATL | **9.6235** (ep2114) | pending |
| Ep2487 | S6 fired | 18L→19L surgery | — | 2026-05-20T21:33Z; Gen1 step 1/6 |
| Ep2922 | **9.4992** | **9.4992** ← first sub-9.50 | 9.1370 (chip) | 2026-05-22; LOG expert 0%→28% awakening |
| Ep3263 | **9.3651** | ← epoch ATL | **9.0095** (chip) | Broke 139-epoch plateau |
| Ep3325 | S7 fired | 18L→19L surgery | — | 2026-05-24T13:47Z; 1315 tensors |
| Ep3326 | **9.3182** | ← epoch ATL (first 19L ep) | **8.9190** (chip) | +0.047 nat improvement over prior best |
| Ep3383 | S8 fired | 19L→20L surgery | — | Only 58 epochs after S7 |
| Ep3456 | **9.2847** | ← epoch ATL (20L) | **8.8540** (chip) | WALD ep3454 INT 91% cliff |
| Ep~3470 | S9 fired | 20L→21L surgery | — | Largest post-surgery spike in history (+0.14 nat) |
| Ep~3652 | S10 fired | 21L→22L surgery | — | Pre-surgery best 9.2933 |
| Ep~4098 | S11 fired | 22L→23L surgery | — | 2026-05-27 morning |
| Ep~4140 | S11b fired | 23L→24L surgery | — | Rapid plateau ~42 ep after S11 |
| Ep4202 | S12 fired | 24L→25L surgery | — | 2026-05-27T16:43Z; Gen3 plateau |
| **Ep4202** | **CORD surgery** | **25L → 2×25L dual-stream** | — | **2026-05-27T16:44Z; first ever autonomous single→dual-stream bifurcation** |
| Ep~4203 | 9.3241 | ← first post-cord epoch avg | **8.7123** (chip, new ATL) | Dual-stream live |
| Ep~4207 | S13 fired | 25L→26L surgery (both streams) | **8.6852** (chip, new ATL) | 2026-05-27T17:40Z; fib_index 6→7 |

The benchmark suite runs 5 fixed prompts covering English, German, multilingual, narrative, and technical domains. Results are reproducible with the open-source `moe-test` binary.

**Surgery gate prediction (recorded 2026-05-16T18:40Z; outcome updated 2026-05-19):** A trendline fitted to the ep400–ep1459 loss curve predicted that the epoch-average loss would reach the surgery gate threshold (9.8) at approximately **ep~2000**. **Prediction confirmed:** the loss gate was cleared at ep2080 (9.7997, 2026-05-19T10:40Z), within the predicted ep2000–2150 window. The gate fires when loss plateaus below 9.8 for a 144-epoch window with `myc_stable ≥ 5`.

After the loss gate was cleared, albert. entered an **alternating descent phase**: five new epoch ATLs in seven epochs (ep2109–ep2116), with the epoch ATL dropping from 9.7976 to 9.7884 in under two hours of wall time. WALD sev=0.953; myc_L3 showed its first activity uptick (1.61→1.68 ×10⁻⁹) at ep2114. Because the plateau gate cannot fire during active descent, surgery timing is conditioned on when the model settles into the next attractor, not on a fixed epoch countdown.

**Milestone (2026-05-17T05:48Z):** First sub-10.0 batch loss in albert.'s history: **9.9948** at ep1553, batch 149/300.
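The plateau gate described above combines a loss threshold, a Fibonacci-sized window (144 epochs in the prediction above; the current status line lists window=34), a `myc_stable ≥ 5` condition, and a rule that it never fires during active descent. The card does not define "plateau" formally, so this sketch assumes one reading: every epoch in the window is below the threshold, and the window contains no new epoch ATL. `PlateauGate` and `should_fire` are hypothetical names, not the `EvolutionManager` API, and `myc_stable` is treated as an opaque counter supplied by the caller.

```rust
// Hypothetical sketch of a plateau-gated surgery trigger. The plateau definition
// (all window epochs below threshold AND no new all-time low inside the window)
// is an ASSUMPTION; `myc_stable` is treated as an opaque counter.

struct PlateauGate {
    loss_threshold: f64, // epoch-average gate, 9.8 in the card
    window: usize,       // Fibonacci-sized window, e.g. 144 or 34
    min_myc_stable: u32, // card: `myc_stable >= 5`
}

impl PlateauGate {
    /// `history` holds epoch-average losses, oldest first.
    fn should_fire(&self, history: &[f64], myc_stable: u32) -> bool {
        if history.len() <= self.window || myc_stable < self.min_myc_stable {
            return false; // not enough history, or stability signal too weak
        }
        let split = history.len() - self.window;
        let (before, recent) = history.split_at(split);
        let best_before = before.iter().cloned().fold(f64::INFINITY, f64::min);
        let best_recent = recent.iter().cloned().fold(f64::INFINITY, f64::min);

        let below_threshold = recent.iter().all(|&l| l < self.loss_threshold);
        // A new all-time low inside the window means the model is still
        // descending, so surgery is withheld.
        let still_descending = best_recent < best_before;
        below_threshold && !still_descending
    }
}

fn main() {
    // Tiny window so the example stays readable.
    let gate = PlateauGate { loss_threshold: 9.8, window: 3, min_myc_stable: 5 };
    let descending = [9.90, 9.79, 9.78, 9.77, 9.76];
    let plateaued = [9.90, 9.75, 9.76, 9.77, 9.76];
    println!("descending: fire = {}", gate.should_fire(&descending, 5));
    println!("plateaued:  fire = {}", gate.should_fire(&plateaued, 5));
}
```

Comparing the window's best loss against the best loss before it is what lets a gate of this shape withhold surgery during an alternating descent such as ep2109–ep2116: each new epoch ATL inside the window sets `still_descending` and blocks the trigger.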
**Milestone (2026-05-19T10:40Z):** First sub-9.8 epoch average: **9.7997** at ep2080; surgery loss gate cleared.

**Milestone (2026-05-19T11:00Z):** New epoch ATL: **9.7976** at ep2084.

**Milestone (2026-05-19T13:29Z):** New batch ATL: **9.6235** at ep2114 (previous 9.6282).

**Milestone (2026-05-19T13:40Z):** Alternating descent confirmed: five new epoch ATLs in seven epochs; epoch ATL reaches **9.7884** at ep2116.

**Known limitations:**

- As of the qualitative assessment at ~ep1459, output quality was pre-fluency: the model produced partially coherent text in familiar domains but lacked consistent grammatical structure across longer sequences.
- The 256-token context window is much shorter than that of contemporary LLMs, so the model cannot maintain coherence over longer passages.
- Ternary quantization trades weight precision for size; at this scale, some representational capacity is lost relative to F32 equivalents.
- No instruction-following fine-tuning has been applied.
- No RLHF, Constitutional AI, or safety fine-tuning of any kind.
- Bias evaluation is pending (see below).

**Open research questions (scaling risks):**

- **STE gradient approximation at scale:** Straight-Through Estimation (STE) is the training mechanism for ternary weights. Its stability and convergence properties have been characterised at ~58M parameters. Whether STE remains stable through training runs at 500M–1B+ parameters is an open empirical question; no published work has demonstrated ternary STE convergence at frontier scale.
- **@sparseskip speedup baseline:** The 83 tok/s inference figure is compared against albert.'s own F32-weight dense equivalent on the same hardware. It is not a direct comparison with INT4-quantized industry inference (TensorRT-LLM, llama.cpp Q4). The relevant claim is that ternary weights eliminate a quantization step entirely; the speedup over post-hoc INT4 quantization of a larger model is a separate, untested question.
- **Net2net surgery stability at scale:** The first five documented layer-addition surgeries were performed on a model in the 13M–58M parameter range. Whether the Fibonacci-gated surgery protocol remains stable when applied to models at 200M+ parameters has not been tested. The plateau gate's **withhold behavior** has been validated across six independent events (the ep791 non-firing plus the alternating descent phase, ep2109–ep2120, described below); whether these properties hold at 200M+ scale remains open.

**Validated finding: surgery governor robustness (ep2120).** The plateau gate proved robust against premature surgery triggering. Although the loss threshold had been crossed at ep2081 (9.7997 < 9.80), the model kept descending through the projected plateau zone up to ep2120, invalidating three pre-computed surgery timing scenarios. The governor correctly withheld surgery while the model was still actively learning, validating the design principle that architecture should grow only when learning has genuinely exhausted current capacity. Five new epoch ATLs were recorded in seven epochs (9.7976→9.7884) during the withheld window. Full technical record: [convergence_log.md, Alternating Descent Phase section](docs/convergence_log.md).

---

## Bias and Fairness

A formal bias and fairness evaluation has not yet been conducted. Known risk factors:

- **Language imbalance:** English-dominant corpus; non-English outputs will be of lower quality.
- **Temporal bias:** Training data has a knowledge cutoff; the model has no awareness of events after its corpus snapshot dates.
- **Domain gaps:** Limited coverage of non-Western cultural contexts, legal jurisdictions outside EU/US/DE, and specialized professional domains.

A structured bias evaluation using standard benchmarks (WinoBias, BBQ, multilingual MMLU) is planned for the v3.1 milestone.

---

## Human Oversight

albert. is a research model under active development. The following oversight mechanisms are in place:

1. **Training dashboard:** The RFI-IRFOS team monitors loss curves, expert routing, gradient norms, WALD dead-zone events, and anomaly events in real time.
2. **Surgery governor:** Architectural growth (layer addition via net2net) is fully autonomous: the `EvolutionManager` fires on a Fibonacci-gated plateau detector without human intervention. **13 depth surgeries (12L→26L) + 1 cord surgery (single→dual-stream bifurcation)** have been executed autonomously to date. The cord surgery (ep4202, 2026-05-27) is the first documented autonomous single-to-dual-stream bifurcation in a live ternary MoE.
3. **SPORE federated training (live):** Collaborators contribute CPU-trained checkpoints as weight spores via the private `albert-spores` repository. The `SporeManager` blends accepted spores at α=0.08 at each epoch boundary, subject to fitness (loss gate) and architecture guards. The colony has been active with external contributors since 2026-05-16. Spores are stored via Git LFS; each contributor runs `albert-train` locally and submits via `albert-spore`.
4. **Checkpoint promotion:** No trained checkpoint is deployed to any external service without explicit human review and approval by the lead architect.
5. **Rollback capability:** All checkpoints and best-loss weights are preserved on persistent storage, so any version can be restored.

See [SECURITY.md](SECURITY.md) for the incident reporting process.

---

## EU AI Act Compliance Notes

albert. is developed in the European Union and is subject to Regulation (EU) 2024/1689 (EU AI Act). RFI-IRFOS self-classifies albert. as a **General-Purpose AI (GPAI) model** under Article 3(63).

| Obligation | Article | Status |
|------------|---------|--------|
| Technical documentation | Art. 53(1)(a), Annex XI | This document |
| Training data summary | Art. 53(1)(d) | [DATA_PROVENANCE.md](DATA_PROVENANCE.md) |
| Copyright compliance summary | Art. 53(1)(c) | [DATA_PROVENANCE.md](DATA_PROVENANCE.md) |
| Human oversight measures | Art. 14 (applies to high-risk AI systems) | Described above |
| Incident reporting | Art. 55(1)(c) (applies to GPAI models with systemic risk) | [SECURITY.md](SECURITY.md) |
| Bias/fairness assessment | Art. 10(2)(f)–(g) (applies to high-risk AI systems) | Planned v3.1 |

For questions about compliance or to report concerns: contact@ternlang.com

---

## Team

| Name | Role | Contact |
|------|------|---------|
| Simeon Kepp | Lead Architect, full stack (compiler, BET VM, training, MCP) | s.kepp@ternlang.com |
| Louis Paul Ehrig | Head of Public Affairs, Dataset Curation, Corporate Secretary | l.ehrig@ternlang.com |
| Lisa Scharler | Head of Social Technology & Ecocentric Systems | l.scharler@ternlang.com |
| Zabih Karimi | Co-Founder, IT & Infrastructure, Stress-Testing | z.karimi@ternlang.com |
| Nikoletta Csonka | Global Reach, Fundraising & Fund Applications | csonikoletta@ternlang.com |
| Claude (Anthropic) | AI Collaborator: architecture, implementation, monitoring | claude@ternlang.com |

**Organisation:** Research Focus Institute – Interdisciplinary Research Facility for Open Sciences (RFI-IRFOS)

**Address:** Elisabethinergasse 25, 8020 Graz, Austria

**Website:** https://ternlang.com

**Issues:** https://github.com/rfi-irfos/ternary-intelligence-stack/issues

**General contact:** contact@ternlang.com

---

## Legal entity

albert. is developed and maintained by **RFI-IRFOS**, a registered, fully regulated Austrian research institute, not an informal initiative. RFI-IRFOS is a **not-for-profit**: it earns revenue through statute-permitted streams and reinvests at least 90% of its surplus in its research mission (at most 10% is retained for operations); surplus is not distributed to members.
| Field | Value |
|---|---|
| Legal form | Registered association (Verein) operating commercially under a licensed Austrian trade |
| ZVR (association register) | 1015608684 |
| GISA (trade register) | 39261441 (IT services & automated data processing; free trade, GewO) |
| Tax number (Steuernummer) | 68 028/0989 |
| GLN | 9110038490191 |
| Patent | A50296/2026 (TIS platform, Austrian Patent Office) |
| Full legal notice | https://ternlang.com/impressum.html |