MediaStreamAI committed · Commit 5112f48 · verified · 1 Parent(s): 523b5e9

Fix training hyperparameters (seq=2048, effective batch=8); add Lead AI Architect; document 57 agent capabilities trained in W2.7

Files changed (1)
  1. README.md +151 -12
README.md CHANGED
@@ -56,7 +56,7 @@ This is a **from-scratch sovereign build**. It is not a fine-tune of any externa
56
  |---|---|
57
  | Training stage | W2.7 (mid-curriculum) |
58
  | Most recent chunk eval | 47/105 @ chunk 450 |
59
- | Scope | math, science, reasoning, chain-of-thought, UK knowledge, Celtic languages, MOTHER identity |
60
  | Out of scope (separate future models) | code generation, creative writing, vision |
61
 
62
  This release is for **internal team testing**. It will fail on tasks outside its training scope.
@@ -74,7 +74,145 @@ W2.7 will continue to chunk 650, after which the W2.8 corpus addition (~330,000
74
 
75
  ---
76
 
77
- ## 3. Locked Inference Rules
78
 
79
  **Deviation from these rules produces incorrect or degenerate output.** They are not suggestions; they are the inference recipe the model was trained against.
80
 
@@ -97,7 +235,7 @@ A working reference is included as `inference.py` in this repo. The canonical im
97
 
98
  ---
99
 
100
- ## 4. Architecture Detail
101
 
102
  ```
103
  MotherCoreModel
@@ -139,7 +277,7 @@ Forward return:
139
 
140
  ---
141
 
142
- ## 5. Training
143
 
144
  ### Corpus (W2.7)
145
 
@@ -173,7 +311,7 @@ Training was performed at sequence length **2048** using physical microbatches o
173
 
174
  ---
175
 
176
- ## 6. Sovereign Build Posture
177
 
178
  MOTHER CORE is part of MSAI's sovereign AI stack, built end-to-end in the UK on UK-resident infrastructure. The training, weights, tokeniser, and corpus are owned by MSAI. The training datacentres are MSAI-operated (Wright Avenue, Dundee; with additional sites in Durham and Manchester). No US cloud provider is in the inference or training path.
179
 
@@ -181,7 +319,7 @@ This positioning matters for UK government, defence, and regulated-enterprise cu
181
 
182
  ---
183
 
184
- ## 7. Intended Use & Out-of-Scope Use
185
 
186
  **In scope (this checkpoint):**
187
  - Reasoning and chain-of-thought tasks at modest difficulty
@@ -200,7 +338,7 @@ This positioning matters for UK government, defence, and regulated-enterprise cu
200
 
201
  ---
202
 
203
- ## 8. Evaluation
204
 
205
  The internal eval suite at chunk 450 scores **47/105 (44.8%)** across:
206
 
@@ -221,7 +359,7 @@ Eval suite and methodology are MSAI-internal. Comparable public benchmarks (MMLU
221
 
222
  ---
223
 
224
- ## 9. Limitations & Known Failure Modes
225
 
226
  1. **Single-turn only**: no chat-style multi-turn coherence
227
  2. **Format-brittle**: the `Question:\n\n...\n\nAnswer:` template is required; other formats produce out-of-distribution (OOD) output
@@ -234,7 +372,7 @@ Eval suite and methodology are MSAI-internal. Comparable public benchmarks (MMLU
234
 
235
  ---
236
 
237
- ## 10. Usage
238
 
239
  ### Quick test from a clean Python environment
240
 
@@ -279,7 +417,7 @@ python inference.py "What is the capital of Scotland?"
279
 
280
  ---
281
 
282
- ## 11. License
283
 
284
  **MSAI Sovereign License - Internal & Partner Use Only.**
285
 
@@ -289,7 +427,7 @@ For licensing enquiries: contact MediaStream AI Limited via the company website.
289
 
290
  ---
291
 
292
- ## 12. Citation
293
 
294
  ```
295
  @misc{msai-mother-core-2026,
@@ -303,9 +441,10 @@ For licensing enquiries: contact MediaStream AI Limited via the company website.
303
 
304
  ---
305
 
306
- ## 13. Contact
307
 
308
  - Organisation: MediaStream AI Limited (MSAI)
309
  - Founder & CEO: Christopher Kenna
 
310
  - Web: https://mediastreamai.com
311
  - Infrastructure: UK sovereign (Dundee, Durham, Manchester)
 
56
  |---|---|
57
  | Training stage | W2.7 (mid-curriculum) |
58
  | Most recent chunk eval | 47/105 @ chunk 450 |
59
+ | Scope | math, science, reasoning, chain-of-thought, UK knowledge, Celtic languages, MOTHER identity, agentic tool use, multi-step planning, RAG, memory, composition (see §3) |
60
  | Out of scope (separate future models) | code generation, creative writing, vision |
61
 
62
  This release is for **internal team testing**. It will fail on tasks outside its training scope.
 
74
 
75
  ---
76
 
77
+ ## 3. Agent Capabilities Trained
78
+
79
+ This checkpoint was trained on the **W2.7 agentic curriculum** in addition to the base reasoning corpus. The model has been exposed to 57 agent-related training categories spanning planning, tool-calling, chain composition, recovery, RAG, memory, and workflow execution.
80
+
81
+ The per-category training loss values below are taken from chunk 484 (the closest complete log to chunk 450). Lower is better; the loss interpretation guide at the end of this section gives the bands used to read them.
82
+
83
+ ### 3.1 Agent reasoning & planning
84
+
85
+ | Category | Loss @ chunk 484 | Purpose |
86
+ |---|---|---|
87
+ | `agent_cot_planning` | 0.45 | Decompose a user goal into a stepped plan before acting |
88
+ | `agent_cot_decomposition` | 0.42 | Break a multi-part task into independent sub-tasks |
89
+ | `agent_cot_synthesis` | 0.39 | Combine multiple tool results into a single answer |
90
+ | `agent_cot_verification` | 0.37 | Verify a tool result against the original acceptance criteria |
91
+ | `agent_cot_replan` | 0.03 | Revise the plan mid-execution when an observation invalidates it |
92
+ | `agent_args_validation` | 0.03 | Validate tool-call arguments before emitting the call |
93
+ | `agent_args_hallucination_resist` | 0.08 | Refuse to invent arguments not present in the conversation |
94
+
95
+ ### 3.2 Tool calling
96
+
97
+ | Category | Loss @ chunk 484 | Purpose |
98
+ |---|---|---|
99
+ | `agent_call_documents` | 0.59 | Call doc tools (Drive, Notion, PDF/Word/Excel creation) |
100
+ | `agent_call_microsoft` | 2.22 | Call Microsoft tools (Graph, Outlook, Teams); *needs more training* |
101
+ | `agent_call_google` | 1.03 | Call Google tools (Drive, Calendar, Gmail) |
102
+ | `agent_call_code` | 1.42 | Call code-execution tools (Python, shell, sandbox) |
103
+ | `agent_no_tool_needed` | 0.29 | Recognise when a question needs no tool and answer directly |
104
+ | `tool_choice_routing` | 0.17 | Route a request to the correct tool of several plausible ones |
105
+
106
+ ### 3.3 Multi-step chains
107
+
108
+ | Category | Loss @ chunk 484 | Purpose |
109
+ |---|---|---|
110
+ | `agent_chain_3step` | 0.53 | Three-step sequential tool chains |
111
+ | `agent_chain_5plus` | 0.59 | Five-or-more-step tool chains |
112
+ | `agent_conditional_chain` | 0.43 | Branching chains where step N depends on step N-1's result |
113
+ | `agent_parallel_calls` | 0.51 | Issue independent tool calls in parallel and merge results |
114
+
115
+ ### 3.4 Control flow & safety
116
+
117
+ | Category | Loss @ chunk 484 | Purpose |
118
+ |---|---|---|
119
+ | `agent_disambiguation` | 0.65 | Ask for clarification when the user's request is ambiguous |
120
+ | `agent_error_recovery` | 0.58 | Recover gracefully from a failed tool call |
121
+ | `agent_mid_chain_abort` | 0.32 | Abort a chain when a step reveals the original goal is unreachable |
122
+ | `agent_loop_aggregation` | 1.13 | Aggregate results from a loop of tool calls |
123
+ | `agent_oauth_required` | 1.12 | Recognise when a tool needs OAuth and surface that to the user |
124
+ | `agent_unsafe_refusal` | 0.21 | Refuse unsafe, malicious, or out-of-policy requests |
125
+
126
+ ### 3.5 RAG (retrieval-augmented generation)
127
+
128
+ | Category | Loss @ chunk 484 | Purpose |
129
+ |---|---|---|
130
+ | `rag_single_call` | 1.34 | Single retrieval call before answering |
131
+ | `rag_synthesis` | 1.90 | Synthesise across multiple retrieved chunks; *needs more training* |
132
+ | `rag_with_citation` | 1.25 | Include source citations in the synthesised answer |
133
+ | `rag_empty_fallback` | 0.25 | Handle "no relevant results" gracefully |
134
+ | `rag_not_needed` | 1.34 | Decline to retrieve when the question doesn't warrant it |
135
+
136
+ ### 3.6 Working memory
137
+
138
+ | Category | Loss @ chunk 484 | Purpose |
139
+ |---|---|---|
140
+ | `memory_store` | 0.05 | Persist a fact across turns in a session |
141
+ | `memory_recall` | 0.33 | Retrieve a previously-stored fact when relevant |
142
+ | `memory_multi_turn` | 0.19 | Carry intermediate state through a multi-turn session |
143
+ | `memory_empty` | 0.12 | Handle the cold-start case where memory is empty |
144
+
145
+ ### 3.7 Composition (multi-modal chains)
146
+
147
+ | Category | Loss @ chunk 484 | Purpose |
148
+ |---|---|---|
149
+ | `compose_calc_chain` | 0.53 | Compose calculator + downstream tool |
150
+ | `compose_memory_calc` | 0.61 | Combine memory recall with calculation |
151
+ | `compose_rag_calc` | 0.34 | Retrieve facts, then compute with them |
152
+ | `compose_rag_multi` | 0.28 | Multi-step retrieval-and-reason chains |
153
+ | `compose_rag_web` | 0.10 | Combine internal retrieval with web search |
154
+ | `compose_web_calc` | 0.13 | Web search + calculation |
155
+ | `compose_web_memory` | 0.65 | Web search + memory storage |
156
+
157
+ ### 3.8 Error recovery
158
+
159
+ | Category | Loss @ chunk 484 | Purpose |
160
+ |---|---|---|
161
+ | `recovery_admit_failure` | 0.19 | Honestly admit when a tool failed rather than fabricate |
162
+ | `recovery_alternate` | 0.09 | Try an alternative tool or strategy after failure |
163
+ | `recovery_malformed` | 0.17 | Detect and repair malformed tool output |
164
+ | `recovery_rewrite` | 0.10 | Rewrite a failing query in a way more likely to succeed |
165
+
166
+ ### 3.9 Web search primitives
167
+
168
+ | Category | Loss @ chunk 484 | Purpose |
169
+ |---|---|---|
170
+ | `web_search_single` | 0.11 | One-shot web search |
171
+ | `web_search_reading` | 0.19 | Fetch and read a specific URL |
172
+ | `web_search_fallback` | 0.26 | Use web search when internal sources fail |
173
+
174
+ ### 3.10 Chat behaviour
175
+
176
+ | Category | Loss @ chunk 484 | Purpose |
177
+ |---|---|---|
178
+ | `chat_greeting` | 0.37 | Conversational greetings |
179
+ | `chat_acknowledgement` | 0.37 | Acknowledge received instructions |
180
+ | `chat_identity` | 0.51 | Maintain MOTHER/MSAI identity in conversation |
181
+ | `chat_helpful_refusal` | 0.54 | Decline politely and offer alternatives |
182
+ | `chat_length_match` | 1.25 | Match response length to question complexity |
183
+ | `chat_multi_turn` | 0.19 | Maintain coherence across conversation turns |
184
+
185
+ ### 3.11 Pre-built workflows
186
+
187
+ | Category | Loss @ chunk 484 | Purpose |
188
+ |---|---|---|
189
+ | `workflow_invoice_send` | 0.68 | End-to-end invoice creation + send workflow |
190
+ | `workflow_meeting_prep` | 0.86 | Meeting preparation (calendar + brief generation) |
191
+ | `workflow_msai_specific` | 1.26 | MSAI-internal workflows (deal flow, tenant comms) |
192
+ | `workflow_proposal_pipeline` | 1.06 | Proposal authoring pipeline |
193
+ | `workflow_report_generation` | 0.93 | Reporting workflows (status, financial, ops) |
194
+
195
+ ### 3.12 AUTM integration
196
+
197
+ | Category | Loss @ chunk 484 | Purpose |
198
+ |---|---|---|
199
+ | `autm_agent` | 0.19 | Generic AUTM agent calling convention |
200
+ | `autm_vertical` | 0.19 | AUTM vertical-specific dispatching |
201
+
202
+ ### Loss interpretation guide
203
+
204
+ - **< 0.30**: well-learned; production-trustable
205
+ - **0.30–0.60**: partially learned; usable but supervise outputs
206
+ - **0.60–1.00**: emerging; expect inconsistent behaviour
207
+ - **> 1.00**: still training; treat outputs as unreliable
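The bands above can be expressed as a small lookup, which may help when scanning a per-category loss log. This is a minimal sketch, not part of the MSAI tooling: the function name `loss_band` is hypothetical, and because the guide does not say which band owns the exact boundary values, the sketch assumes each lower bound is inclusive (0.30 counts as partially learned, 0.60 as emerging) and that exactly 1.00 is still "emerging".

```python
def loss_band(loss: float) -> str:
    """Map a per-category training loss to its interpretation band.

    Boundary handling is an assumption: lower bounds are inclusive,
    and a loss of exactly 1.00 falls in the "emerging" band.
    """
    if loss < 0.30:
        return "well-learned"
    if loss < 0.60:
        return "partially learned"
    if loss <= 1.00:
        return "emerging"
    return "still training"


# A few values taken from the tables in this section.
for name, loss in [
    ("agent_cot_replan", 0.03),
    ("agent_cot_planning", 0.45),
    ("workflow_meeting_prep", 0.86),
    ("agent_call_microsoft", 2.22),
]:
    print(f"{name}: {loss:.2f} -> {loss_band(loss)}")
```

Keeping the thresholds in one function means a change to the guide only has to be mirrored in one place.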
208
+
209
+ ### W2.8 will strengthen the weak categories
210
+
211
+ The W2.8 corpus (~330,000 new records, currently in build) targets the categories above 1.0, particularly `agent_call_microsoft` (2.22), `rag_synthesis` (1.90), `agent_call_code` (1.42), `rag_single_call` / `rag_not_needed` (1.34), `chat_length_match` (1.25), and `workflow_msai_specific` (1.26). W2.8 also adds new categories: `doc_format_subtyping`, `verifier_loop`, `args_validation_adversarial`, `execution_graph_dag`, `cot_replan_observation`, `tool_failure_recovery`, `rag_synthesis_grounded`, `retrieval_arbiter`, `multi_agent_orchestration`, and `memory_synthesis`.
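The selection of categories above 1.0 can be reproduced from the tables in this section. The sketch below is illustrative only: `LOSSES` is a hand-copied subset of the chunk-484 values listed above (not the full set and not read from any MSAI log format), and `weak_categories` is a hypothetical helper name.

```python
# Subset of the per-category losses from the tables above (chunk 484).
# Hand-copied for illustration; the real training logs are not shown here.
LOSSES = {
    "agent_call_microsoft": 2.22,
    "rag_synthesis": 1.90,
    "agent_call_code": 1.42,
    "rag_single_call": 1.34,
    "rag_not_needed": 1.34,
    "workflow_msai_specific": 1.26,
    "chat_length_match": 1.25,
    "rag_with_citation": 1.25,
    "agent_loop_aggregation": 1.13,
    "agent_oauth_required": 1.12,
    "workflow_proposal_pipeline": 1.06,
    "agent_call_google": 1.03,
    "workflow_report_generation": 0.93,
    "agent_cot_replan": 0.03,
}


def weak_categories(losses: dict[str, float], threshold: float = 1.0) -> list[tuple[str, float]]:
    """Return (category, loss) pairs with loss strictly above the threshold.

    Results are sorted by loss, highest first; ties are broken by name.
    """
    weak = [(name, loss) for name, loss in losses.items() if loss > threshold]
    return sorted(weak, key=lambda item: (-item[1], item[0]))


for name, loss in weak_categories(LOSSES):
    print(f"{loss:.2f}  {name}")
```

On the values listed in this section, twelve categories sit above 1.0; the paragraph above names the highest of them.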
212
+
213
+ ---
214
+
215
+ ## 4. Locked Inference Rules
216
 
217
  **Deviation from these rules produces incorrect or degenerate output.** They are not suggestions; they are the inference recipe the model was trained against.
218
 
 
235
 
236
  ---
237
 
238
+ ## 5. Architecture Detail
239
 
240
  ```
241
  MotherCoreModel
 
277
 
278
  ---
279
 
280
+ ## 6. Training
281
 
282
  ### Corpus (W2.7)
283
 
 
311
 
312
  ---
313
 
314
+ ## 7. Sovereign Build Posture
315
 
316
  MOTHER CORE is part of MSAI's sovereign AI stack, built end-to-end in the UK on UK-resident infrastructure. The training, weights, tokeniser, and corpus are owned by MSAI. The training datacentres are MSAI-operated (Wright Avenue, Dundee; with additional sites in Durham and Manchester). No US cloud provider is in the inference or training path.
317
 
 
319
 
320
  ---
321
 
322
+ ## 8. Intended Use & Out-of-Scope Use
323
 
324
  **In scope (this checkpoint):**
325
  - Reasoning and chain-of-thought tasks at modest difficulty
 
338
 
339
  ---
340
 
341
+ ## 9. Evaluation
342
 
343
  The internal eval suite at chunk 450 scores **47/105 (44.8%)** across:
344
 
 
359
 
360
  ---
361
 
362
+ ## 10. Limitations & Known Failure Modes
363
 
364
  1. **Single-turn only**: no chat-style multi-turn coherence
365
  2. **Format-brittle**: the `Question:\n\n...\n\nAnswer:` template is required; other formats produce out-of-distribution (OOD) output
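Because the template is strict, it can help to build prompts through a single helper rather than by hand. The following is a minimal sketch under stated assumptions: `build_prompt` is a hypothetical name (it is not taken from `inference.py`), the `...` in the template is assumed to stand for the question text, and no trailing space is assumed after `Answer:`.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the `Question:\\n\\n...\\n\\nAnswer:` template.

    Assumptions: the `...` placeholder is the question text, surrounding
    whitespace on the question is not significant, and nothing follows
    the final "Answer:" marker.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return f"Question:\n\n{question}\n\nAnswer:"


prompt = build_prompt("What is the capital of Scotland?")
print(repr(prompt))
```

Stripping the question is a design choice in this sketch so that stray newlines from user input do not alter the blank-line layout the template relies on.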
 
372
 
373
  ---
374
 
375
+ ## 11. Usage
376
 
377
  ### Quick test from a clean Python environment
378
 
 
417
 
418
  ---
419
 
420
+ ## 12. License
421
 
422
  **MSAI Sovereign License - Internal & Partner Use Only.**
423
 
 
427
 
428
  ---
429
 
430
+ ## 13. Citation
431
 
432
  ```
433
  @misc{msai-mother-core-2026,
 
441
 
442
  ---
443
 
444
+ ## 14. Contact
445
 
446
  - Organisation: MediaStream AI Limited (MSAI)
447
  - Founder & CEO: Christopher Kenna
448
+ - Lead AI Architect: Christopher Kenna
449
  - Web: https://mediastreamai.com
450
  - Infrastructure: UK sovereign (Dundee, Durham, Manchester)