MediaStreamAI committed on
Commit
4df46a4
·
verified ·
1 Parent(s): ca911cf

chunk 600 (W2.8 cutover BASE): upload README.md

Files changed (1)
  1. README.md +71 -416
README.md CHANGED
@@ -5,446 +5,101 @@ license_link: LICENSE
5
  language:
6
  - en
7
  - cy
8
- - ga
9
  - gd
 
 
10
  tags:
11
- - sovereign-ai
12
- - uk
13
- - reasoning
14
- - msai
15
  - mother-core
16
- pipeline_tag: text-generation
17
- library_name: pytorch
18
- ---
19
-
20
- # MOTHER CORE V2 — chunk 450 (W2.7)
21
-
22
- **Sovereign UK AI** built from scratch by **MediaStream AI Limited (MSAI)**.
23
-
24
- This is a development checkpoint released for **MSAI team and partner testing only**. It is **not** a released model and **not** intended for production use. Eval performance is partial; the model is mid-training.
25
-
26
- ---
27
-
28
- ## 1. Model Summary
29
-
30
- | Field | Value |
31
- |---|---|
32
- | Model | MOTHER CORE V2 |
33
- | Checkpoint | chunk 450 (W2.7 stage) |
34
- | Parameters | 6.877B |
35
- | Architecture | Custom transformer (RoPE, GQA, RMSNorm, SwiGLU FFN, memory gate) |
36
- | Layers | 48 |
37
- | Hidden dimension | 3,072 |
38
- | Attention heads | 24 (head_dim 128) |
39
- | KV heads | 6 (GQA ratio 4:1) |
40
- | FFN multiplier | 4.0 (intermediate 12,288) |
41
- | Max sequence length | 4,096 |
42
- | Vocabulary | 50,258 (SentencePiece) |
43
- | RoPE θ | 10,000 |
44
- | RMSNorm ε | 1e-5 |
45
- | Tied embeddings | No (separate `lm_head`) |
46
- | Weights dtype (this release) | bfloat16 |
47
- | Training dtype | float32 |
48
-
49
- This is a **from-scratch sovereign build**. It is not a fine-tune of any external model (Llama, Qwen, Mistral, GPT, etc.). Training, tokenisation, architecture, and corpus are all proprietary to MSAI.
50
-
51
- ---
52
-
53
- ## 2. Status
54
-
55
- | Metric | Value |
56
- |---|---|
57
- | Training stage | W2.7 (mid-curriculum) |
58
- | Most recent chunk eval | 47/105 @ chunk 450 |
59
- | Scope | math, science, reasoning, chain-of-thought, UK knowledge, Celtic languages, MOTHER identity, agentic tool use, multi-step planning, RAG, memory, composition (see §3) |
60
- | Out of scope (separate future models) | code generation, creative writing, vision |
61
-
62
- This release is for **internal team testing**. It will fail on tasks outside its training scope.
63
-
64
- The training trajectory has been monotonic since chunk 300:
65
-
66
- | Chunk | Eval | Loss |
67
- |---|---|---|
68
- | 300 | 36/105 | 2.47 |
69
- | 350 | 37/105 | 2.05 |
70
- | 400 | 45/105 | 2.01 |
71
- | **450** | **47/105** | **1.74** |
72
-
73
- W2.7 will continue to chunk 650, after which the W2.8 corpus addition (~330,000 records spanning agentic orchestration, multi-step reasoning, tool use, memory synthesis) will be merged for the next training phase.
74
-
75
- ---
76
-
77
- ## 3. Agent Capabilities Trained
78
-
79
- This checkpoint was trained on the **W2.7 agentic curriculum** in addition to the base reasoning corpus. The model has been exposed to 57 agent-related training categories spanning planning, tool-calling, chain composition, recovery, RAG, memory, and workflow execution.
80
-
81
- The per-category training loss values below are taken from chunk 484 (closest complete log to chunk 450); lower is better — values below 0.5 indicate the category is well-learned, 0.5-1.0 is partially learned, >1.0 needs more training.
82
-
83
- ### 3.1 Agent reasoning & planning
84
-
85
- | Category | Loss @ chunk 484 | Purpose |
86
- |---|---|---|
87
- | `agent_cot_planning` | 0.45 | Decompose a user goal into a stepped plan before acting |
88
- | `agent_cot_decomposition` | 0.42 | Break a multi-part task into independent sub-tasks |
89
- | `agent_cot_synthesis` | 0.39 | Combine multiple tool results into a single answer |
90
- | `agent_cot_verification` | 0.37 | Verify a tool result against the original acceptance criteria |
91
- | `agent_cot_replan` | 0.03 | Revise the plan mid-execution when an observation invalidates it |
92
- | `agent_args_validation` | 0.03 | Validate tool-call arguments before emitting the call |
93
- | `agent_args_hallucination_resist` | 0.08 | Refuse to invent arguments not present in the conversation |
94
-
95
- ### 3.2 Tool calling
96
-
97
- | Category | Loss @ chunk 484 | Purpose |
98
- |---|---|---|
99
- | `agent_call_documents` | 0.59 | Call doc tools (Drive, Notion, PDF/Word/Excel creation) |
100
- | `agent_call_microsoft` | 2.22 | Call Microsoft tools (Graph, Outlook, Teams) — *needs more training* |
101
- | `agent_call_google` | 1.03 | Call Google tools (Drive, Calendar, Gmail) |
102
- | `agent_call_code` | 1.42 | Call code-execution tools (Python, shell, sandbox) |
103
- | `agent_no_tool_needed` | 0.29 | Recognise when a question needs no tool and answer directly |
104
- | `tool_choice_routing` | 0.17 | Route a request to the correct tool of several plausible ones |
105
-
106
- ### 3.3 Multi-step chains
107
-
108
- | Category | Loss @ chunk 484 | Purpose |
109
- |---|---|---|
110
- | `agent_chain_3step` | 0.53 | Three-step sequential tool chains |
111
- | `agent_chain_5plus` | 0.59 | Five-or-more-step tool chains |
112
- | `agent_conditional_chain` | 0.43 | Branching chains where step N depends on step N-1's result |
113
- | `agent_parallel_calls` | 0.51 | Issue independent tool calls in parallel and merge results |
114
-
115
- ### 3.4 Control flow & safety
116
-
117
- | Category | Loss @ chunk 484 | Purpose |
118
- |---|---|---|
119
- | `agent_disambiguation` | 0.65 | Ask for clarification when the user's request is ambiguous |
120
- | `agent_error_recovery` | 0.58 | Recover gracefully from a failed tool call |
121
- | `agent_mid_chain_abort` | 0.32 | Abort a chain when a step reveals the original goal is unreachable |
122
- | `agent_loop_aggregation` | 1.13 | Aggregate results from a loop of tool calls |
123
- | `agent_oauth_required` | 1.12 | Recognise when a tool needs OAuth and surface that to the user |
124
- | `agent_unsafe_refusal` | 0.21 | Refuse unsafe, malicious, or out-of-policy requests |
125
-
126
- ### 3.5 RAG (retrieval-augmented generation)
127
-
128
- | Category | Loss @ chunk 484 | Purpose |
129
- |---|---|---|
130
- | `rag_single_call` | 1.34 | Single retrieval call before answering |
131
- | `rag_synthesis` | 1.90 | Synthesise across multiple retrieved chunks — *needs more training* |
132
- | `rag_with_citation` | 1.25 | Include source citations in the synthesised answer |
133
- | `rag_empty_fallback` | 0.25 | Handle "no relevant results" gracefully |
134
- | `rag_not_needed` | 1.34 | Decline to retrieve when the question doesn't warrant it |
135
-
136
- ### 3.6 Working memory
137
-
138
- | Category | Loss @ chunk 484 | Purpose |
139
- |---|---|---|
140
- | `memory_store` | 0.05 | Persist a fact across turns in a session |
141
- | `memory_recall` | 0.33 | Retrieve a previously-stored fact when relevant |
142
- | `memory_multi_turn` | 0.19 | Carry intermediate state through a multi-turn session |
143
- | `memory_empty` | 0.12 | Handle the cold-start case where memory is empty |
144
-
145
- ### 3.7 Composition (multi-modal chains)
146
-
147
- | Category | Loss @ chunk 484 | Purpose |
148
- |---|---|---|
149
- | `compose_calc_chain` | 0.53 | Compose calculator + downstream tool |
150
- | `compose_memory_calc` | 0.61 | Combine memory recall with calculation |
151
- | `compose_rag_calc` | 0.34 | Retrieve facts, then compute with them |
152
- | `compose_rag_multi` | 0.28 | Multi-step retrieval-and-reason chains |
153
- | `compose_rag_web` | 0.10 | Combine internal retrieval with web search |
154
- | `compose_web_calc` | 0.13 | Web search + calculation |
155
- | `compose_web_memory` | 0.65 | Web search + memory storage |
156
-
157
- ### 3.8 Error recovery
158
-
159
- | Category | Loss @ chunk 484 | Purpose |
160
- |---|---|---|
161
- | `recovery_admit_failure` | 0.19 | Honestly admit when a tool failed rather than fabricate |
162
- | `recovery_alternate` | 0.09 | Try an alternative tool or strategy after failure |
163
- | `recovery_malformed` | 0.17 | Detect and repair malformed tool output |
164
- | `recovery_rewrite` | 0.10 | Rewrite a failing query in a way more likely to succeed |
165
-
166
- ### 3.9 Web search primitives
167
-
168
- | Category | Loss @ chunk 484 | Purpose |
169
- |---|---|---|
170
- | `web_search_single` | 0.11 | One-shot web search |
171
- | `web_search_reading` | 0.19 | Fetch and read a specific URL |
172
- | `web_search_fallback` | 0.26 | Use web search when internal sources fail |
173
-
174
- ### 3.10 Chat behaviour
175
-
176
- | Category | Loss @ chunk 484 | Purpose |
177
- |---|---|---|
178
- | `chat_greeting` | 0.37 | Conversational greetings |
179
- | `chat_acknowledgement` | 0.37 | Acknowledge received instructions |
180
- | `chat_identity` | 0.51 | Maintain MOTHER/MSAI identity in conversation |
181
- | `chat_helpful_refusal` | 0.54 | Decline politely and offer alternatives |
182
- | `chat_length_match` | 1.25 | Match response length to question complexity |
183
- | `chat_multi_turn` | 0.19 | Maintain coherence across conversation turns |
184
-
185
- ### 3.11 Pre-built workflows
186
-
187
- | Category | Loss @ chunk 484 | Purpose |
188
- |---|---|---|
189
- | `workflow_invoice_send` | 0.68 | End-to-end invoice creation + send workflow |
190
- | `workflow_meeting_prep` | 0.86 | Meeting preparation (calendar + brief generation) |
191
- | `workflow_msai_specific` | 1.26 | MSAI-internal workflows (deal flow, tenant comms) |
192
- | `workflow_proposal_pipeline` | 1.06 | Proposal authoring pipeline |
193
- | `workflow_report_generation` | 0.93 | Reporting workflows (status, financial, ops) |
194
-
195
- ### 3.12 AUTM integration
196
-
197
- | Category | Loss @ chunk 484 | Purpose |
198
- |---|---|---|
199
- | `autm_agent` | 0.19 | Generic AUTM agent calling convention |
200
- | `autm_vertical` | 0.19 | AUTM vertical-specific dispatching |
201
-
202
- ### Loss interpretation guide
203
-
204
- - **< 0.30** — well-learned; production-trustable
205
- - **0.30 – 0.60** — partially learned; usable but supervise outputs
206
- - **0.60 – 1.00** — emerging; expect inconsistent behaviour
207
- - **> 1.00** — still training; treat outputs as unreliable
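The bands in the guide above can be applied mechanically to the per-category tables. The following is a minimal sketch, not part of the MSAI tooling: `loss_band` is a hypothetical helper, and the handling of the exact boundary values (0.30, 0.60, 1.00) is an assumption, since the guide does not say which band they belong to. The sample losses are copied from the tables in this section.

```python
def loss_band(loss: float) -> str:
    """Map a per-category training loss to the README's interpretation band.

    Assumption: 0.30 and 0.60 fall into the higher band, and 1.00 counts as
    "emerging"; the guide leaves these boundary cases unspecified.
    """
    if loss < 0.30:
        return "well-learned"
    if loss < 0.60:
        return "partially learned"
    if loss <= 1.00:
        return "emerging"
    return "still training"


if __name__ == "__main__":
    # Values taken from the chunk 484 tables above.
    samples = {
        "memory_store": 0.05,
        "agent_cot_planning": 0.45,
        "workflow_invoice_send": 0.68,
        "agent_call_microsoft": 2.22,
    }
    for category, loss in samples.items():
        print(f"{category}: {loss:.2f} -> {loss_band(loss)}")
```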
208
-
209
- ### W2.8 will strengthen the weak categories
210
-
211
- The W2.8 corpus (~330,000 new records, currently in build) targets the categories above 1.0 — particularly `agent_call_microsoft` (2.22), `rag_synthesis` (1.90), `agent_call_code` (1.42), `rag_single_call` / `rag_not_needed` (1.34), `chat_length_match` (1.25), and `workflow_msai_specific` (1.26). W2.8 also adds new categories: `doc_format_subtyping`, `verifier_loop`, `args_validation_adversarial`, `execution_graph_dag`, `cot_replan_observation`, `tool_failure_recovery`, `rag_synthesis_grounded`, `retrieval_arbiter`, `multi_agent_orchestration`, and `memory_synthesis`.
212
-
213
- ---
214
-
215
- ## 4. Locked Inference Rules
216
-
217
- **Deviation from these rules produces incorrect or degenerate output.** They are not suggestions — they are the inference recipe the model was trained against.
218
-
219
- | Setting | Value | Reason |
220
- |---|---|---|
221
- | Prompt format | `Question:\n\n{question}\n\nAnswer:` | Exact whitespace. Model is OOD without it. |
222
- | BOS token | id=1, `<s>` | Always prepended; model was trained with BOS at position 0 |
223
- | EOS token | id=2, `</s>` | Stop generation on emission |
224
- | PAD token | id=0, `<pad>` | Training only |
225
- | Sampling | **Greedy argmax** | No temperature, no top-k, no top-p |
226
- | Repetition penalty | 1.3 (frequency-scaled, count ≥ 2) | Higher values collapse output |
227
- | n-gram blocking | 4-gram, no repeat | Prevents loop output |
228
- | Max new tokens | 200 | Hard cap |
229
- | BOS in output | Banned | Never emit BOS during generation |
230
- | EOS in output | Allowed after first token | Early stop signal |
231
-
232
- ### Reference code
233
-
234
- A working reference is included as `inference.py` in this repo. The canonical implementation lives in `mother_train_7b.py::_generate_greedy()` in the MSAI training repository. **Use `inference.py` from this repo or load `mother_train_7b._generate_greedy` directly.** Re-implementations frequently get the recipe wrong.
235
-
236
  ---
237
 
238
- ## 5. Architecture Detail
239
-
240
- ```
241
- MotherCoreModel
242
- ├── tok_emb [50258, 3072]
243
- ├── blocks × 48
244
- │ └── each:
245
- │ ├── attn (GQA)
246
- │ │ ├── wq [3072, 3072] # 24 heads × 128 dim
247
- │ │ ├── wk [768, 3072] # 6 KV heads × 128 dim
248
- │ │ ├── wv [768, 3072]
249
- │ │ └── wo [3072, 3072]
250
- │ ├── ff (SwiGLU)
251
- │ │ ├── w1 [12288, 3072]
252
- │ │ ├── w2 [12288, 3072]
253
- │ │ └── w3 [3072, 12288]
254
- │ ├── norm_attn (RMSNorm)
255
- │ └── norm_ff (RMSNorm)
256
- ├── norm_f [3072]
257
- ├── lm_head [50258, 3072] # NOT tied to tok_emb
258
- └── memory_gate [1, 3072] + bias[1]
259
- ```
260
-
261
- ### Memory gate
262
-
263
- `memory_gate` is a sigmoid-gated single-dimension projection from the last hidden state. It is **trained but not active in inference output** — it is reserved for downstream integration with MOTHER ROBOTICS (an item/object/situational/historical awareness model) and external memory systems. Its activation is exposed in the forward pass return dict but does not affect token logits.
264
-
265
- Forward return:
266
- ```
267
- {
268
- "logits": [B, T, vocab],
269
- "loss": scalar or None,
270
- "aux_loss": scalar (MoE; unused here, fixed=0),
271
- "past_key_values": List[(K,V)] or None,
272
- "hidden_states": List[Tensor] or None,
273
- "last_hidden_state": [B, T, dim],
274
- "gate": [B, 1] ← detached, FYI only
275
- }
276
- ```
277
-
278
- ---
279
 
280
- ## 6. Training
281
 
282
- ### Corpus (W2.7)
283
 
284
- | Category | Records |
285
- |---|---|
286
- | Reasoning + chain-of-thought | ~390,000 |
287
- | UK general knowledge | ~210,000 |
288
- | Math & arithmetic (digit-spaced) | ~165,000 |
289
- | Identity & self-knowledge (MOTHER, MSAI) | ~32,000 |
290
- | Celtic languages (Welsh, Irish, Scottish Gaelic) | ~28,000 |
291
- | Science | ~88,000 |
292
- | Misc (chat, instruct skeleton) | ~135,000 |
293
- | **Total** | **~1.05M** |
294
 
295
- ### Hyperparameters
296
 
297
- | Setting | Value |
298
  |---|---|
299
- | Learning rate | 1e-5 |
300
- | Gradient clip | 10.0 |
301
- | Effective batch size | 32 (BATCH_PHYSICAL=1 × GRAD_ACCUM_STEPS=32) |
302
- | Sequence length (training) | 4096 |
303
- | Optimiser | AdamW (β₁=0.9, β₂=0.95) |
304
- | Weight decay | 0.1 |
305
- | Warmup steps | 100 |
306
- | Layer-wise LR scaling | from chunk 10 onward |
307
- | Hardware | NVIDIA GB10 Blackwell (Grace–Blackwell unified memory, 128GB) |
308
- | Training site | MSAI Wright Avenue, Dundee — sovereign UK infrastructure |
309
-
310
- Training was performed at the full architecture sequence length of **4096** using physical microbatches of 1 with gradient accumulation of 32 (effective batch = 32). Because training and inference share the same context length, no RoPE extrapolation is required for 4096-token inference. Long-context behaviour at full 4096 has been exposed during training but not formally benchmarked at this checkpoint.
311
-
312
- ---
313
 
314
- ## 7. Sovereign Build Posture
315
 
316
- MOTHER CORE is part of MSAI's sovereign AI stack — built end-to-end in the UK on UK-resident infrastructure. The training, weights, tokeniser, and corpus are owned by MSAI. The training datacentres are MSAI-operated (Wright Avenue, Dundee; with additional sites in Durham and Manchester). No US cloud provider is in the inference or training path.
317
 
318
- This positioning matters for UK government, defence, and regulated-enterprise customers where data residency, GDPR, and supply-chain provenance are mandatory.
319
-
320
- ---
321
-
322
- ## 8. Intended Use & Out-of-Scope Use
323
-
324
- **In scope (this checkpoint):**
325
- - Reasoning and chain-of-thought tasks at modest difficulty
326
- - UK general knowledge questions
327
- - Welsh / Irish / Scottish Gaelic short-form questions
328
- - MOTHER-identity Q&A
329
- - Arithmetic on small integers (with digit-spaced inputs for ≥3-digit numbers)
330
-
331
- **Out of scope (this checkpoint):**
332
- - Code generation (separate model — MOTHER CODE — planned)
333
- - Creative writing (separate model — MOTHER LLM — planned)
334
- - Long-form (>1,000 token) generation
335
- - Multi-turn dialogue (training is single-turn Q/A)
336
- - Anything safety-critical, medical, legal, or financial advisory
337
- - Real-time information (model has no internet access at inference)
338
-
339
- ---
340
 
341
- ## 9. Evaluation
342
-
343
- The internal eval suite at chunk 450 scores **47/105 (44.8%)** across:
344
-
345
- - Identity: 6/6 (100%)
346
- - UK knowledge: 9/12
347
- - Reasoning (multi-step): 14/35
348
- - Arithmetic: 5/15
349
- - Science: 7/12
350
- - Celtic languages: 4/9
351
- - Chain-of-thought: 2/16
352
-
353
- Persistent gaps at chunk 450:
354
- - Arithmetic on multi-digit numbers (training fix in progress — see W2.8 plan)
355
- - Multi-step reasoning beyond 3 hops
356
- - Welsh and Irish (smaller corpus volume than other categories)
357
-
358
- Eval suite and methodology are MSAI-internal. Comparable public benchmarks (MMLU, GSM8K) have **not** been run against this checkpoint and would not be directly comparable since the training corpus and tokeniser are sovereign.
359
-
360
- ---
361
-
362
- ## 10. Limitations & Known Failure Modes
363
-
364
- 1. **Single-turn only** — no chat-style multi-turn coherence
365
- 2. **Format-brittle** — the `Question:\n\n...\n\nAnswer:` template is required; other formats produce OOD output
366
- 3. **No tool use / no agent loop** at this checkpoint (W2.8 corpus will add this)
367
- 4. **No code generation** — even simple Python will fail; not in scope
368
- 5. **No retrieval / no internet** — closed-book knowledge only, as of training cutoff
369
- 6. **Arithmetic at multi-digit numbers** — requires digit-spaced input (`1 5 + 2 7`) to perform reliably
370
- 7. **`weights_only=False` required** if loading from `.pt` — this repo ships `.safetensors` instead which is safer
371
- 8. **High repetition penalty (>1.4) collapses output** — stick to 1.3
372
-
373
- ---
374
-
375
- ## 11. Usage
376
-
377
- ### Quick test from a clean Python environment
378
-
379
- ```bash
380
- pip install torch safetensors sentencepiece huggingface_hub
381
- ```
382
-
383
- You also need the `mother_core` package source available (architecture is custom; no Transformers integration yet). Clone the MSAI training repo or copy `mother_core/` into your `PYTHONPATH`.
384
 
385
  ```python
386
- from huggingface_hub import snapshot_download
387
- repo_dir = snapshot_download(repo_id="MediaStreamAI/MOTHER_CORE_V2")
388
- # Then import inference.py from the snapshot
389
- import sys, importlib.util
390
- spec = importlib.util.spec_from_file_location("inf", f"{repo_dir}/inference.py")
391
- inf = importlib.util.module_from_spec(spec); spec.loader.exec_module(inf)
392
-
393
- model, tok = inf.load_model_and_tokenizer(repo_dir)
394
- print(inf.generate_greedy(model, tok, "What is the capital of Scotland?"))
395
- ```
396
-
397
- Or run the inference script directly:
398
-
399
- ```bash
400
- python inference.py "What is the capital of Scotland?"
 
 
 
 
 
 
401
  ```
402
 
403
- ### File map
404
-
405
- | File | Purpose |
406
- |---|---|
407
- | `model-00001-of-00003.safetensors` | Weights, shard 1/3 |
408
- | `model-00002-of-00003.safetensors` | Weights, shard 2/3 |
409
- | `model-00003-of-00003.safetensors` | Weights, shard 3/3 |
410
- | `model.safetensors.index.json` | Shard index |
411
- | `config.json` | Architecture spec |
412
- | `tokenizer.model` | SentencePiece vocab |
413
- | `tokenizer_config.json` | Tokeniser config (`add_bos_token=true` required) |
414
- | `special_tokens_map.json` | BOS/EOS/PAD/UNK ids |
415
- | `inference.py` | Reference inference with locked rules |
416
- | `README.md` | This file |
417
-
418
- ---
419
-
420
- ## 12. License
421
 
422
- **MSAI Sovereign License — Internal & Partner Use Only.**
423
 
424
- This model is the proprietary work of MediaStream AI Limited. It is released to authorised team members and contracted partners for evaluation and integration purposes. Redistribution, commercial use, or training other models on this model's outputs require written permission from MSAI.
 
 
 
425
 
426
- For licensing enquiries: contact MediaStream AI Limited via the company website.
427
 
428
- ---
429
 
430
- ## 13. Citation
431
 
432
- ```
433
- @misc{msai-mother-core-2026,
434
- title = {MOTHER CORE V2 — Sovereign UK AI},
435
- author = {{MediaStream AI Limited}},
436
- year = {2026},
437
- note = {Chunk 450, W2.7 mid-training checkpoint},
438
- url = {https://huggingface.co/MediaStreamAI/MOTHER_CORE_V2}
439
- }
440
- ```
441
-
442
- ---
443
 
444
- ## 14. Contact
 
445
 
446
- - Organisation: MediaStream AI Limited (MSAI)
447
- - Founder & CEO: Christopher Kenna
448
- - Lead AI Architect: Christopher Kenna
449
- - Web: https://mediastreamai.com
450
- - Infrastructure: UK sovereign (Dundee, Durham, Manchester)
 
5
  language:
6
  - en
7
  - cy
 
8
  - gd
9
+ - ga
10
+ pipeline_tag: text-generation
11
  tags:
 
 
 
 
12
  - mother-core
13
+ - msai
14
+ - sovereign-ai
15
+ - united-kingdom
16
+ - causal-lm
17
+ library_name: transformers
 
18
  ---
19
 
20
+ # MOTHER CORE V2 — chunk 600 (W2.8 cutover base)
 
21
 
22
+ **Sovereign UK AI built from scratch by [MediaStream AI Limited (MSAI)](https://mediastreamai.com).**
23
 
24
+ This is **MOTHER CORE BASE** — the frozen foundation checkpoint at chunk 600 of the W2.7 → W2.8 training programme. All downstream MOTHER models (DEFENCE, ROBOTICS, LLM, CODE) build on this base.
25
 
26
+ - **Founder & CEO and Lead AI Architect:** Christopher Kenna
27
+ - **Parameters:** 6.88B (FP32 source, BF16 weights here)
28
+ - **Architecture:** 48 layers, dim 3072, 24 heads, 6 KV heads (GQA 4:1), RoPE θ=10000, RMS norm, tied embeddings
29
+ - **Context:** 4096 tokens
30
+ - **Training:** From-scratch sovereign UK build — no fine-tuning of external models
31
+ - **Source SHA256:** `0b1ef35ec60af4a7ad0648498de8526cb775a19501dda94dfbda1713e0475b60`
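The parameter figure can be cross-checked from the layer shapes. The sketch below is a back-of-envelope count, not MSAI's own tooling. It assumes bias-free linear layers and takes the vocabulary size (50,258), FFN intermediate size (12,288), and the single-unit memory gate from the architecture listing in the removed lines of this diff. The name `count_params` and its arguments are hypothetical.

```python
def count_params(vocab=50_258, dim=3_072, layers=48, heads=24, kv_heads=6,
                 ffn=12_288, tied=False):
    """Approximate parameter count for a GQA + SwiGLU transformer.

    Assumes no bias terms in the attention or FFN projections.
    """
    head_dim = dim // heads                    # 3072 / 24 = 128
    kv_dim = kv_heads * head_dim               # 6 * 128 = 768
    attn = 2 * dim * dim + 2 * kv_dim * dim    # wq and wo, plus wk and wv
    ff = 3 * dim * ffn                         # w1, w2, w3 of the SwiGLU FFN
    norms = 2 * dim                            # two RMSNorm weight vectors per block
    block = attn + ff + norms
    embeddings = vocab * dim if tied else 2 * vocab * dim  # tok_emb (+ lm_head)
    final_norm = dim
    memory_gate = dim + 1                      # [1, dim] weight plus one bias
    return layers * block + embeddings + final_norm + memory_gate


if __name__ == "__main__":
    for tied in (False, True):
        total = count_params(tied=tied)
        print(f"tied={tied}: {total:,} ({total / 1e9:.3f}B)")
```

Under these assumptions the count with a separate `lm_head` comes to about 6.877B, which agrees with the 6.877B in the earlier summary table; the same shapes with tied embeddings give about 6.723B.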
 
 
 
 
32
 
33
+ ## Training journey
34
 
35
+ | Milestone | Eval (105-question harness) |
36
  |---|---|
37
+ | Chunk 450 (initial W2.7 baseline) | 47/105 (45%) |
38
+ | Chunk 506 (post LR-fix rollback) | 44/105 (42%) |
39
+ | Chunk 550 (recovery, LR-capped) | 46/105 (44%) |
40
+ | **Chunk 600 (BASE freeze)** | **49/105 (47%)** |
 
41
 
42
+ ## Scope
43
 
44
+ **MOTHER CORE handles:** math, science, reasoning, chain-of-thought, UK knowledge, MOTHER identity, tool calling (agents, RAG, memory, workflows), multilingual responses (English, Welsh, Irish, Scottish Gaelic), safety refusals.
45
 
46
+ **MOTHER CORE does NOT handle (separate sister models):**
47
+ - **MOTHER CODE** — software engineering, code generation
48
+ - **MOTHER LLM** — long-form creative writing, instruction-tuned content
49
+ - **MOTHER DEFENCE** — defence reasoning and strategy (W3 programme, builds on this BASE)
50
+ - **MOTHER ROBOTICS** — humanoid robot embodiment (W4 programme, builds on this BASE)
 
51
 
52
+ ## Usage
 
53
 
54
  ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ tok = AutoTokenizer.from_pretrained("MediaStreamAI/MOTHER_CORE_V2")
+ model = AutoModelForCausalLM.from_pretrained(
+     "MediaStreamAI/MOTHER_CORE_V2",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ prompt = "Question:\n\nWhat is the capital of Wales?\n\nAnswer:"
+ inputs = tok(prompt, return_tensors="pt", add_special_tokens=True).to(model.device)
+ out = model.generate(
+     **inputs,
+     max_new_tokens=200,
+     do_sample=False,
+     repetition_penalty=1.3,
+     no_repeat_ngram_size=4,
+     pad_token_id=tok.pad_token_id,
+ )
+ print(tok.decode(out[0], skip_special_tokens=True))
  ```
77
 
78
+ **Critical inference rules:**
79
+ - Prompt wrap: `"Question:\n\n{q}\n\nAnswer:"` (exact whitespace)
80
+ - BOS token: 1 (required, `add_bos_token=True`)
81
+ - EOS token: 2
82
+ - PAD token: 0
83
+ - **Use greedy decoding only.** Sampling produces gibberish.
84
+ - Repetition penalty: 1.3, frequency-scaled
85
+ - No-repeat n-gram size: 4
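Because the prompt wrap and special-token ids are easy to get subtly wrong, it may help to centralise them. The following is a minimal sketch under stated assumptions: `build_prompt` and `check_special_tokens` are hypothetical helpers, not part of this repository, and the check only assumes the tokenizer object exposes `bos_token_id`, `eos_token_id`, and `pad_token_id` attributes (as Hugging Face tokenizers do). A `SimpleNamespace` stands in for a real tokenizer so the example runs without downloading anything.

```python
from types import SimpleNamespace

# Template and ids copied from the inference rules above.
PROMPT_TEMPLATE = "Question:\n\n{q}\n\nAnswer:"
EXPECTED_IDS = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}


def build_prompt(question: str) -> str:
    """Wrap a question in the exact template.

    Only the question itself is stripped; the template whitespace is untouched.
    """
    return PROMPT_TEMPLATE.format(q=question.strip())


def check_special_tokens(tok) -> list[str]:
    """Return a list of mismatches between a tokenizer-like object and EXPECTED_IDS."""
    problems = []
    for name, expected in EXPECTED_IDS.items():
        actual = getattr(tok, name, None)
        if actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


if __name__ == "__main__":
    print(repr(build_prompt("  What is the capital of Wales? ")))
    # Stand-in for a real tokenizer; replace with the loaded tokenizer in practice.
    fake_tok = SimpleNamespace(bos_token_id=1, eos_token_id=2, pad_token_id=0)
    print(check_special_tokens(fake_tok))
```

Running the check once after loading the tokenizer, before any generation call, is one way to catch a misconfigured `tokenizer_config.json` early.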
 
86
 
87
+ ## Programme context
88
 
89
+ - **W2.7 (complete)** — Core capability training: math, science, reasoning, identity, UK knowledge, multilingual, agent tool-calling, RAG, chat, memory, workflows
90
+ - **W2.8 (in progress)** — Document routing, argument validation, agent verifier loops, multi-step orchestration
91
+ - **W3** — MOTHER DEFENCE (defence reasoning and strategy)
92
+ - **W4** — MOTHER ROBOTICS (embodied awareness for humanoid platforms)
93
 
94
+ UK sovereign infrastructure: Manchester (HQ), Dundee (flagship DC), Durham. Phase 2 expansion H2 2026 to Düsseldorf, South Africa, Jamaica.
95
 
96
+ ## License
97
 
98
+ MSAI Sovereign License. See LICENSE file. Built sovereign in the UK, not derived from any externally-licensed pre-trained model.
99
 
100
+ ## Contact
 
101
 
102
+ MediaStream AI Limited
103
+ West Tower, 371 Deansgate, Manchester M15 4UR, United Kingdom
104
 
105
+ [mediastreamai.com](https://mediastreamai.com)