---
license: cc-by-nc-4.0
language: [ar, en, de, fr, es, it]
tags: [multilingual, research, ablation, arkadiko, laser2, moda, sophia]
library_name: pytorch
---

# Arkadiko V5 Experiments — 150M Architecture Validation

Research artifacts from the 150M production prep sprint (2026-04-12),
trained on a single RTX PRO 4000 Blackwell (24 GB).

## Contents

| Directory | Description | Key result |
|---|---|---|
| `tokenizer/` | 60K SentencePiece BPE, 6 languages (see fertility sketch below) | ar 4.97, en 5.72 chars/token |
| `aeq_v10_adamw/` | 150M calibration run (MoDA + LASER2 cross-attention) | val loss 2.81, 37.8K tok/s |
| `aeq_v10b_sophia/` | Sophia-G matched-step head-to-head | val loss 2.57, but worse SFT transfer |
| `sft_adamw/` | AdamW SFT (22K Aya examples) | delta -0.53 overall |
| `sft_sophia/` | Sophia SFT comparison | delta -0.37 (REJECTED) |
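
Fertility (average characters per token) is the headline tokenizer metric in
the table above. A minimal sketch of how it can be measured with the
`sentencepiece` Python API follows; the model file and corpus paths are
illustrative placeholders, not artifacts shipped in `tokenizer/`.

```python
# Sketch: chars/token fertility of the 60K SentencePiece model.
# Paths are placeholders; point them at the actual files in tokenizer/.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer/spm_60k.model")

def chars_per_token(lines):
    """Average characters per token over an iterable of text lines."""
    total_chars = total_tokens = 0
    for line in lines:
        line = line.rstrip("\n")
        total_chars += len(line)
        total_tokens += len(sp.encode(line))
    return total_chars / max(total_tokens, 1)

# Compare fertility across the six languages on held-out text.
for lang in ["ar", "en", "de", "fr", "es", "it"]:
    with open(f"heldout/{lang}.txt", encoding="utf-8") as f:  # placeholder
        print(lang, round(chars_per_token(f), 2))
```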

## Architecture (locked for production)

14L x 640d, GQA 5:1, MoDA (block_size=2), with per-layer cross-attention to a
frozen LASER2 BiLSTM encoder. 173.5M params. AdamW optimizer.
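
A minimal PyTorch sketch of one decoder layer as described above. Everything
not stated in this card is an assumption: the head counts (10 query / 2 KV
heads gives the 5:1 GQA ratio at head_dim 64), the 1024-d LASER2 state width,
and the pre-norm layout. The MoDA block routing is omitted because this card
does not define it.

```python
# Hypothetical layer sketch; widths and head counts flagged above are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

D, N_HEADS, N_KV_HEADS = 640, 10, 2  # assumed: 10 Q heads, 2 KV heads (5:1)
LASER2_DIM = 1024                    # assumed LASER2 BiLSTM state width

class GQASelfAttention(nn.Module):
    """Grouped-query attention: N_HEADS query heads share N_KV_HEADS KV heads."""
    def __init__(self):
        super().__init__()
        self.head_dim = D // N_HEADS
        self.q = nn.Linear(D, N_HEADS * self.head_dim, bias=False)
        self.kv = nn.Linear(D, 2 * N_KV_HEADS * self.head_dim, bias=False)
        self.o = nn.Linear(D, D, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q(x).view(B, T, N_HEADS, self.head_dim).transpose(1, 2)
        k, v = self.kv(x).view(B, T, 2, N_KV_HEADS, self.head_dim).permute(2, 0, 3, 1, 4)
        # Expand the KV heads to match the query heads (groups of 5).
        k = k.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
        v = v.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(out.transpose(1, 2).reshape(B, T, D))

class DecoderLayer(nn.Module):
    """Causal self-attention, then cross-attention into frozen LASER2 states."""
    def __init__(self):
        super().__init__()
        self.self_attn = GQASelfAttention()
        self.cross_attn = nn.MultiheadAttention(
            D, N_KV_HEADS, kdim=LASER2_DIM, vdim=LASER2_DIM, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, D))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(D) for _ in range(3))

    def forward(self, x, laser_states):
        # laser_states: (B, S, LASER2_DIM) from the frozen BiLSTM encoder.
        x = x + self.self_attn(self.n1(x))
        x = x + self.cross_attn(self.n2(x), laser_states, laser_states,
                                need_weights=False)[0]
        return x + self.mlp(self.n3(x))

# 14 such layers plus embeddings and head make up the 173.5M-param stack.
layers = nn.ModuleList(DecoderLayer() for _ in range(14))
```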

## Sophia verdict (ADR-192): REJECTED

Pre-training val loss was 8.6% better, but SFT transfer was 30% worse: Russian
regressed and generations were degenerate. This is the third confirmation that
pre-training loss does not track downstream quality (L-296).
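
For context, the update rule under test: Sophia-G preconditions the momentum
with a diagonal Gauss-Newton-Bartlett (GNB) Hessian estimate and clips the
result elementwise. The sketch below follows the published algorithm (Liu et
al., 2023); the hyperparameter values are reference-implementation defaults,
not the settings used in `aeq_v10b_sophia/`.

```python
# Sketch of one Sophia-G parameter update (Liu et al., 2023).
# Hyperparameters are illustrative defaults, not this run's config.
import torch

@torch.no_grad()
def sophia_g_step(p, state, lr=3e-4, b1=0.965, b2=0.99, rho=0.04, eps=1e-12,
                  gnb_estimate=None):
    """Apply one step to parameter `p` using p.grad.

    state["m"], state["h"]: persistent EMA buffers shaped like p.
    gnb_estimate: diagonal Hessian estimate B * g_hat**2, where g_hat is the
        mini-batch gradient on labels sampled from the model itself; it is
        recomputed only every k steps, so it is usually None.
    """
    m, h = state["m"], state["h"]
    m.mul_(b1).add_(p.grad, alpha=1 - b1)            # gradient EMA (momentum)
    if gnb_estimate is not None:
        h.mul_(b2).add_(gnb_estimate, alpha=1 - b2)  # curvature EMA
    # Preconditioned step, clipped elementwise to [-1, 1]: the clip bounds the
    # move wherever the curvature estimate is tiny or stale.
    p.add_((m / (rho * h + eps)).clamp(-1.0, 1.0), alpha=-lr)
```

The clipped preconditioning is what buys the faster pre-training loss; the
point of ADR-192 is that this gain did not survive SFT.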

## License

CC BY-NC 4.0