AbstractPhil committed · Commit a9195c3 · verified · 1 parent: 2792f1a

Create OMEGA_PROGRESSION.md

  1. OMEGA_PROGRESSION.md +443 -0
OMEGA_PROGRESSION.md ADDED
@@ -0,0 +1,443 @@
+ # Potential Downstream Utilities Clause
+
+ **Status:** Forward-looking. Each utility takes the Omega substrate as a
+ load-bearing assumption: regime-independence of reconstruction quality
+ across input scale, the projective-axis codebook as a deterministic
+ property of trained sphere-solvers, and hardware-determined throughput
+ limits independent of model behavior. Utilities that would work
+ equivalently on any encoder are excluded; this is a list of capabilities
+ that are *enabled* by Omega, not capabilities incidentally compatible
+ with it.
+
+ **Methodology.** Per the post-000108 research stage, every utility
+ section ends with a falsifiable prediction: what would have to be true
+ for the utility to NOT work. Construction precedes proof. The first
+ build that fails its prediction tells us where the substrate's
+ boundary actually is.
+
+ ---
+
+ ## 1. Classification
+
+ **The utility.** A projective codebook of `n_axes` directions on
+ ℝP^(D-1) is a vocabulary of feature primitives. Image → patch grid → M
+ tensor → per-patch projection onto codebook axes → activation pattern
+ of shape `[B, n_patches, V, n_axes]`. A linear or shallow head over
+ this representation performs classification.
+
+ **Why Omega.** The codebook is model-intrinsic and regime-flat. A
+ classifier trained on activation patterns at 64×64 should generalize
+ to 512×512 inputs at inference without retraining, because the
+ codebook itself doesn't change with input size. Standard CLIP-style
+ models do not give this property: their representations drift with
+ input resolution, and their pooling operations bake in a particular
+ spatial extent.
+
+ **Specific construction.** Train the classifier head on per-patch axis
+ activations averaged across patches (or attended over). For
+ fine-grained tasks, retain the spatial structure: the classifier sees
+ the full `[n_patches, n_axes]` matrix as a 2D feature map. Per-patch
+ aggregation was already validated in scratchpad 000104: `patch_idx=0`
+ fails because it discards spatial signal, and the patch mean recovers
+ most of the gap.
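
A minimal NumPy sketch of the patch-mean construction above. It assumes an M tensor laid out as `[B, n_patches, V, D]` and a codebook of shape `[n_axes, D]`; the names `axis_activations` and `pooled_features`, the absolute dot product used for sign invariance, and all sizes are illustrative assumptions rather than the geolip-svae API.

```python
import numpy as np

def axis_activations(M, codebook):
    """Project per-patch rows onto unit codebook axes.

    M:        [B, n_patches, V, D] per-patch tensor (layout assumed)
    codebook: [n_axes, D] axis directions
    returns:  [B, n_patches, V, n_axes]
    """
    axes = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    # Absolute value treats antipodal directions as the same projective axis.
    return np.abs(M @ axes.T)

def pooled_features(M, codebook):
    """Patch-mean pooling: the output shape does not depend on n_patches."""
    return axis_activations(M, codebook).mean(axis=1).reshape(M.shape[0], -1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(30, 16))        # 30 axes in D=16 (illustrative)
small = rng.normal(size=(2, 16, 4, 16))     # e.g. a 4x4 patch grid
large = rng.normal(size=(2, 1024, 4, 16))   # e.g. a 32x32 patch grid
f_small = pooled_features(small, codebook)
f_large = pooled_features(large, codebook)
```

Because the patch axis is averaged out, `f_small` and `f_large` have the same shape, so a single linear head can consume both. Whether accuracy actually holds across resolutions is what the prediction that follows is meant to test.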
+
+ **Falsifiable prediction.** A classifier trained on 64×64 activation
+ patterns achieves comparable accuracy on 512×512 test inputs (within
+ 2 percentage points) without any architectural adaptation. If accuracy
+ drops sharply with input resolution, the codebook activations are not
+ in fact regime-invariant in the way reconstruction is, and Omega
+ covers reconstruction but not classification, which would be a
+ meaningful boundary.
+
+ ---
+
+ ## 2. Diffusion
+
+ **The utility.** Discrete diffusion in axis-index space. Each patch's
+ M-tensor row gets quantized to its nearest codebook axis (or top-k
+ mixture). The "noise" process is gradual randomization of axis
+ assignments; the "denoise" process is a transformer that predicts
+ axis indices from corrupted sequences. Sampling = run denoiser to
+ clean axis sequence → reconstruct image via codebook → decoder.
+
+ **Why Omega.** Three properties combine here. The codebook is a
+ finite, deterministic vocabulary, so discrete diffusion is well-defined
+ without extra quantizer training. The decoder is regime-flat, so a
+ diffusion model trained on 64×64 axis sequences can sample at any
+ resolution by predicting longer sequences and decoding at the target
+ size. The codebook's projective structure means antipodal axes carry
+ equivalent information, which meaningfully reduces the effective
+ vocabulary size for the diffusion target.
+
+ **Specific construction.** Diffusion target: `[n_patches, top_k]`
+ discrete indices into the codebook. Loss: cross-entropy over axis
+ indices. Backbone: any transformer that handles variable-length token
+ sequences (patch count varies with target resolution). Conditioning:
+ optional class label or text embedding via cross-attention.
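
The diffusion target and a forward corruption step can be sketched as follows. This is a hedged illustration: `quantize_top_k` and `corrupt` are hypothetical names, uniform resampling is only one possible choice of discrete noise process, and the codebook and patch rows are random stand-ins.

```python
import numpy as np

def quantize_top_k(rows, codebook, top_k=1):
    """Indices of the top_k closest axes per row; v and -v count as one axis."""
    axes = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    scores = np.abs(rows @ axes.T)                    # [n_patches, n_axes]
    return np.argsort(-scores, axis=1)[:, :top_k]     # [n_patches, top_k]

def corrupt(indices, n_axes, t, rng):
    """Forward 'noise' step: each index is resampled uniformly with probability t."""
    mask = rng.random(indices.shape) < t
    random_idx = rng.integers(0, n_axes, size=indices.shape)
    return np.where(mask, random_idx, indices)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(30, 16))     # stand-in for an extracted codebook
rows = rng.normal(size=(64, 16))         # stand-in for 64 patch rows
clean = quantize_top_k(rows, codebook, top_k=4)
noisy_0 = corrupt(clean, 30, 0.0, rng)   # t=0 leaves the sequence untouched
noisy_1 = corrupt(clean, 30, 1.0, rng)   # t=1 is fully randomized
```

A denoiser would then be trained with cross-entropy to predict `clean` from `corrupt(clean, n_axes, t, rng)` at sampled values of `t`.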
+
+ **Falsifiable prediction.** A diffusion model trained on 64×64 axis
+ sequences from h2-64 produces coherent samples at 256×256 by sampling
+ longer sequences and decoding at the target size, without retraining.
+ If samples at non-native resolution show mode collapse or boundary
+ artifacts beyond what the encoder-decoder pair produces directly,
+ the codebook's discreteness is interfering with the regime-flat
+ reconstruction, and the substrate is narrower than expected.
+
+ ---
+
+ ## 3. Processing (image-to-image edits in axis space)
+
+ **The utility.** Operations applied to codebook activations rather
+ than pixels. Image → encode → edit activations → decode. Style
+ transfer, denoising, inpainting, semantic editing all become
+ manipulations of the `[n_patches, V, n_axes]` activation tensor,
+ followed by reconstruction.
+
+ **Why Omega.** Edits made at one resolution are coherent when decoded
+ at another, because the codebook is the same vocabulary at every
+ scale. A 64×64 inpaint mask can produce a 512×512 inpainted output by
+ upsampling the edited activations and decoding at the target size.
+ Critically, the activation edits respect the geometric constraints
+ that produced the codebook: operations that move activations *off*
+ the codebook produce reconstruction artifacts that are themselves a
+ useful signal.
+
+ **Specific construction.** Define edit operations as activation-tensor
+ transformations: zero-out (denoise), substitute axis-set (style
+ transfer), spatial-gather + redistribute (inpaint), interpolate
+ between two images' activations (semantic morph). Provide a
+ `process_at_scale` API mirroring `reconstruct_at_scale`.
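
Three of these edit operations are simple enough to sketch directly on a `[n_patches, V, n_axes]` array. The function names, the square patch grid, and nearest-neighbour upsampling are assumptions made for illustration; the real `process_at_scale` may interpolate differently.

```python
import numpy as np

def morph(act_a, act_b, alpha):
    """Semantic morph: linear interpolation between two activation tensors."""
    return (1.0 - alpha) * act_a + alpha * act_b

def zero_out(act, threshold):
    """Denoise-style edit: drop activations with magnitude below threshold."""
    return np.where(np.abs(act) >= threshold, act, 0.0)

def upsample_grid(act, grid, factor):
    """Nearest-neighbour upsampling of a [grid*grid, V, n_axes] tensor."""
    h = act.reshape(grid, grid, *act.shape[1:])
    h = h.repeat(factor, axis=0).repeat(factor, axis=1)
    return h.reshape(grid * factor * grid * factor, *act.shape[1:])

rng = np.random.default_rng(0)
a = rng.normal(size=(16, 4, 30))          # 4x4 patch grid, V=4, 30 axes
b = rng.normal(size=(16, 4, 30))
edited = morph(a, b, 0.5)                 # edit at the small scale
up = upsample_grid(edited, grid=4, factor=2)   # 8x8 grid for a larger decode
```

The upsampled tensor would then be passed to the decoder at the target size; that decoding step is outside this sketch.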
+
+ **Falsifiable prediction.** Style transfer applied to 64×64
+ activations and decoded at 512×512 produces output indistinguishable
+ in style consistency from the same operation applied directly to a
+ 512×512 encoding. If the upsampled-edit path produces worse style
+ transfer than the direct-encode path, the activation upsampling is
+ losing geometric structure that the encoder captures, and Omega's
+ regime-flatness has a stricter envelope than reconstruction MSE
+ alone reveals.
+
+ ---
+
+ ## 4. Solving
+
+ **The utility.** The most direct framing: use the trained sphere-solver
+ to solve geometric problems on its native manifold. Given a set of
+ points in ℝ^D, encode them via the model's projection path to get
+ their representation on ℝP^(D-1). Given a set of vectors, solve for
+ the codebook axes that span them. Given two sets of points, find the
+ optimal projective alignment via Procrustes on their codebooks.
+
+ **Why Omega.** This is the closest utility to the model's identity
+ claim. The model is named "sphere-solver" because that's what it is:
+ a parametric solver for "what's the best projective representation of
+ this data on the unit sphere?" The Omega finding is that this solver
+ is regime-independent: the same machinery handles 64 input points or
+ 65,536 input points and produces structurally consistent answers.
+
+ **Specific construction.** Expose three solver primitives:
+ - `project(points, model) → axes`: encode arbitrary point clouds via
+   the model's encoder to get their codebook representation
+ - `align(codebook_a, codebook_b) → rotation`: Procrustes-align two
+   codebooks (already implemented in tests/framework.py)
+ - `solve_basis(target_vectors, model) → axis_indices`: given target
+   vectors, find the codebook axes that best span them
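
For reference, the `align` primitive can be illustrated with the textbook orthogonal Procrustes solution. This sketch is not the implementation in tests/framework.py; it assumes both codebooks are `[n_axes, D]` arrays whose rows are already in corresponding order and sign, and `rotation_distance` is a hypothetical relative-residual metric, since the text does not define the distance it reports.

```python
import numpy as np

def align(codebook_a, codebook_b):
    """Orthogonal Procrustes: R minimising ||codebook_a @ R - codebook_b||_F."""
    u, _, vt = np.linalg.svd(codebook_a.T @ codebook_b)
    return u @ vt

def rotation_distance(codebook_a, codebook_b):
    """Relative residual after alignment; 0 means a perfect orthogonal match."""
    r = align(codebook_a, codebook_b)
    return np.linalg.norm(codebook_a @ r - codebook_b) / np.linalg.norm(codebook_b)

rng = np.random.default_rng(0)
a = rng.normal(size=(30, 16))                    # stand-in codebook
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal matrix
b = a @ q                                        # same codebook, rotated
r = align(a, b)                                  # recovers q up to rounding
```

With real codebooks, the row correspondence and the antipodal sign ambiguity would have to be resolved before this step.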
+
+ **Falsifiable prediction.** Procrustes alignment between codebooks of
+ the same model on different calibration distributions yields a
+ rotation distance below 0.1 (already verified at U5: calibration
+ deviations differ by ~0.003). Cross-model alignment between two
+ sphere-solvers trained on the same data yields a rotation distance
+ below 0.3 (predicted, not yet measured). If cross-model alignment
+ turns out to be no better than random (near-orthogonal), codebook
+ structure is data-driven, not architecture-driven, and the solver's
+ "intrinsic" status is overstated.
+
+ ---
+
+ ## 5. Distillation
+
+ Two directions, distinct enough to enumerate separately.
+
+ ### 5a. Distillation INTO sphere-solvers
+
+ **The utility.** Train a sphere-solver student to match a non-Omega
+ teacher's representations. Student inherits regime-flatness
+ automatically; teacher's representational quality flows into a
+ deployable encoder that handles arbitrary resolution without extra
+ machinery.
+
+ **Why Omega.** Standard distillation produces a student whose
+ behavior interpolates the teacher's at training scale. A
+ sphere-solver student, by virtue of its architecture, additionally
+ inherits regime-flatness: the student behaves consistently at
+ inference scales the teacher was never tested on. This is a
+ distillation result that wouldn't follow from teacher quality alone.
+
+ **Specific construction.** Loss combines reconstruction (the
+ sphere-solver's native objective) with representation matching
+ against the teacher's pooled features at intermediate resolution.
+ Student emerges with both teacher-like representations AND
+ resolution-agnosticism. Teacher candidates: CLIP, DINOv2, Whisper
+ (per the Bertenstein cross-modal alignment work).
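
One way to write such a combined loss is shown below. The text does not specify the matching term, so the cosine distance, the `weight` parameter, and the assumption that student and teacher features share a dimension (otherwise a projection head would be needed) are all illustrative choices.

```python
import numpy as np

def distillation_loss(x, recon, student_feat, teacher_feat, weight=1.0):
    """Reconstruction MSE plus cosine-distance matching to teacher features.

    x, recon:                   arrays of identical shape
    student_feat, teacher_feat: [B, F] pooled features (same F assumed)
    """
    mse = np.mean((recon - x) ** 2)
    s = student_feat / np.linalg.norm(student_feat, axis=-1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=-1, keepdims=True)
    match = np.mean(1.0 - np.sum(s * t, axis=-1))   # 0 when directions agree
    return mse + weight * match
```

In training, the same expression would be written in an autodiff framework; NumPy is used here only to keep the arithmetic explicit.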
+
+ **Falsifiable prediction.** A sphere-solver student distilled from
+ DINOv2 at 224×224 produces representations that, when evaluated on a
+ standard linear-probe benchmark at 448×448, match or exceed direct
+ DINOv2 at 448×448. If the student degrades at non-training scale
+ the way the teacher does, distillation didn't transfer
+ regime-flatness; it transferred only representational quality, and
+ the architectural Omega property is more fragile than the
+ training-from-scratch case suggests.
+
+ ### 5b. Distillation FROM sphere-solvers (codebook freezing)
+
+ **The utility.** Extract a codebook artifact, freeze it, train cheap
+ downstream models that consume codebook activations rather than
+ re-running the encoder. The codebook becomes a portable feature
+ vocabulary; downstream models are 1–2 orders of magnitude smaller.
+
+ **Why Omega.** U5's verdict (as_is_packaging) makes this trivially
+ feasible: codebooks are stable artifacts, model-intrinsic and
+ calibration-insensitive. The downstream model never sees the original
+ encoder; it only sees activation patterns over a fixed vocabulary.
+ Resolution-agnosticism is inherited because the codebook is the same
+ at every scale.
+
+ **Specific construction.** Pipeline: (1) extract codebook once, save
+ as safetensors+JSON. (2) Pre-compute activation patterns for
+ training corpus. (3) Train any standard architecture (MLP, small
+ transformer, CNN) with axis activations as input. Codebook stays
+ frozen forever after step 1.
+
+ **Falsifiable prediction.** Already validated by U5 + the geolip-core
+ pipeline. Failure mode would be: a downstream model trained on
+ codebook activations underperforms an end-to-end model of similar
+ parameter count. Predicted not to fail in the regime-flat use case
+ (where end-to-end models lack regime-flatness anyway), but might fail
+ in the standard fixed-resolution regime, where the end-to-end model
+ has a free-parameter advantage.
+
+ ---
+
+ ## 6. Tokenization for downstream LLMs / multimodal models
+
+ **The utility.** The codebook is a discrete vocabulary of size
+ `n_axes` (typically 27–230). Images → axis activation sequences →
+ discrete tokens fed to autoregressive language models. The geolip-svae
+ becomes an image tokenizer for the existing multimodal-LLM ecosystem.
+
+ **Why Omega.** Three properties matter. Vocabulary size is small
+ compared to standard learned image tokenizers (VQ-VAE typically
+ ~8K–16K codes). With ~30 axes, a dense encoding (one token per axis,
+ ~30 tokens per patch) fits ~17 patches into a 512-token budget; a
+ top-k=4 mixture (4 tokens per patch) fits ~128 patches into the same
+ budget. Resolution-agnosticism means the same tokenizer handles any
+ input image without retraining. Calibration insensitivity means the
+ tokenizer is a fixed component, not a learned-per-task module.
+
+ **Specific construction.** Wrap codebook quantization as a tokenizer
+ class with `encode(image) → token_sequence` and
+ `decode(token_sequence, target_size) → image` methods. Define special
+ tokens for image-start, image-end, and optionally row-start markers
+ for spatial structure. Integrate via the standard Hugging Face
+ `transformers` tokenizer interface.
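
A rough sketch of the encode side of such a tokenizer follows. `AxisTokenizer` is a hypothetical class, not part of geolip-svae or `transformers`; it takes already-encoded patch rows instead of an image because the encoder is outside this sketch, it omits `decode`, and the placement of special-token ids after the axis ids is an assumed layout.

```python
import numpy as np

class AxisTokenizer:
    """Hypothetical wrapper: top-k axis indices per patch become discrete tokens."""

    def __init__(self, codebook, top_k=4):
        self.axes = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
        self.top_k = top_k
        n_axes = codebook.shape[0]
        # Special tokens are placed after the axis ids (assumed layout).
        self.image_start, self.image_end, self.row_start = n_axes, n_axes + 1, n_axes + 2
        self.vocab_size = n_axes + 3

    def encode(self, patch_rows, grid_width):
        """patch_rows: [n_patches, D] in row-major patch order."""
        scores = np.abs(patch_rows @ self.axes.T)
        top = np.argsort(-scores, axis=1)[:, : self.top_k]
        tokens = [self.image_start]
        for i, ids in enumerate(top):
            if i % grid_width == 0:          # mark the start of each patch row
                tokens.append(self.row_start)
            tokens.extend(int(t) for t in ids)
        tokens.append(self.image_end)
        return tokens

rng = np.random.default_rng(0)
tok = AxisTokenizer(rng.normal(size=(30, 16)), top_k=4)
tokens = tok.encode(rng.normal(size=(16, 16)), grid_width=4)   # 4x4 patch grid
```

For a 4x4 grid with `top_k=4` the sequence holds 64 axis tokens plus 4 row markers and 2 image markers, so the special tokens add a small overhead on top of the per-patch budget.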
+
+ **Falsifiable prediction.** A small (~100M param) decoder-only LLM
+ trained on text + axis-token sequences performs image captioning at
+ the same quality as CLIP+LLM with comparable compute. If quality is
+ significantly lower, axis tokenization is losing image content that
+ continuous embeddings preserve, and the discreteness has a real
+ cost. If quality matches, the small vocabulary is a free reduction
+ in token budget for image content.
+
+ ---
+
+ ## 7. Anomaly / OOD detection
+
+ **The utility.** Self-validating inference. Compute the codebook of
+ the input itself (not the model's reference codebook) and measure
+ deviation from the reference. Inputs whose induced codebook
+ substantially deviates from the model's training-derived codebook
+ are out-of-distribution; the deviation magnitude is the OOD score.
+
+ **Why Omega.** A regime-flat model has a well-defined "in-distribution"
+ surface in codebook space. The `is_projective_clean` check already
+ captures this internally for codebook validation. Inverted, the same
+ machinery becomes an inference-time validity flag: every prediction
+ ships with a confidence signal derived from the input's geometric
+ compatibility with the codebook.
+
+ **Specific construction.** At inference, extract a per-batch codebook
+ from the input M tensor and compute Procrustes distance to the
+ attached reference codebook. Add to InferenceEngine as
+ `engine.validity_score(images) → float` and threshold-based
+ `engine.predict_with_confidence(images) → (recon, confidence)`.
+ The throughput sweep already shows MSE ratio is a candidate validity
+ signal; Procrustes distance on a per-batch codebook is the
+ finer-grained version.
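
As a simplified stand-in for the Procrustes distance, the sketch below scores an induced codebook by the mean angle between each of its axes and the nearest reference axis, ignoring sign. This is a swapped-in metric chosen because it needs no axis correspondence; how the per-batch codebook is extracted is not shown, and `validity_score` and `is_valid` here are free functions with assumed signatures, not the `InferenceEngine` methods.

```python
import numpy as np

def _unit(x):
    """Normalise each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def validity_score(induced, reference):
    """Mean angle (radians) between each induced axis and its nearest
    reference axis, ignoring sign. 0 means every induced axis is on the codebook."""
    cos = np.abs(_unit(induced) @ _unit(reference).T)    # [n_induced, n_ref]
    nearest = np.clip(cos.max(axis=1), 0.0, 1.0)         # guard arccos against rounding
    return float(np.mean(np.arccos(nearest)))

def is_valid(induced, reference, threshold):
    """Threshold-based flag; the threshold must be calibrated empirically."""
    return validity_score(induced, reference) <= threshold
```

The thresholds quoted for Procrustes distance elsewhere in this section do not carry over to this angular score; it would need its own calibration against reconstruction MSE.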
+
+ **Falsifiable prediction.** Inputs with codebook Procrustes distance
+ above 0.5 from reference produce reconstructions with MSE above 5×
+ the native floor. If correlation between codebook deviation and
+ reconstruction quality is weak (correlation < 0.5), the codebook
+ deviation is measuring something independent of model competence,
+ and it isn't a useful inference-time validity signal.
+
+ ---
+
+ ## 8. Cross-modal alignment
+
+ **The utility.** Multiple sphere-solvers trained on different
+ modalities (image, audio, text-as-noise) project into compatible
+ codebook spaces after Procrustes alignment. Cross-modal retrieval,
+ joint generation, and modality translation operate in shared axis
+ space rather than via a learned joint embedding.
+
+ **Why Omega.** The Bertenstein work demonstrated this with frozen
+ expert encoders projecting through a shared text hub. Today's finding
+ strengthens the claim: cross-modal alignment is *between codebooks*
+ (deterministic artifacts) rather than between learned projections.
+ Each modality's sphere-solver produces a codebook on its own
+ ℝP^(D-1); alignment is a fixed rotation, not a trained mapping.
+
+ **Specific construction.** Train sphere-solvers per modality. Extract
+ codebooks. Compute pairwise Procrustes alignments to a chosen
+ reference modality. At inference, project inputs through their native
+ sphere-solver, apply the cross-modal rotation, and operate in shared
+ axis space. No joint training required after the per-modality stage.
+
+ **Falsifiable prediction.** Image-text retrieval via codebook
+ alignment matches CLIP-style joint-embedding retrieval at comparable
+ compute on standard benchmarks (MS-COCO, Flickr30K). If retrieval is
+ significantly worse, cross-modal information lives in the relations
+ *between* codebook activations rather than in the codebooks
+ themselves, and the alignment-only approach is missing structure that
+ joint training captures.
+
+ ---
+
+ ## 9. Self-supervised pretraining recipes
+
+ **The utility.** Bootstrap foundation models on structured noise
+ alone. The h2-64 batteries already train on noise distributions and
+ develop projective-clean codebooks; this generalizes to a recipe for
+ training sphere-solver foundation models without curated real-world
+ data.
+
+ **Why Omega.** The projective-axis codebook emerges deterministically
+ from sphere-normalized SVD training, regardless of input distribution
+ (per U5: Gaussian and sixteen-noise calibrations produce essentially
+ identical codebooks for the same model). The model's geometric
+ substrate is largely independent of training corpus identity. This
+ suggests a useful inverse: a foundation model can be pretrained on
+ synthetic/structured noise and then fine-tuned to specific modalities
+ via the cross-modal alignment recipe (Section 8).
+
+ **Specific construction.** Define a noise curriculum that exercises
+ the geometric primitives: Gaussian, fractal, structured-but-random,
+ adversarial noise. Train sphere-solver to high reconstruction quality
+ on this curriculum. Verify the codebook is projective-clean (built-in
+ quality check). Release as foundation model.
+
+ **Falsifiable prediction.** A sphere-solver foundation model
+ pretrained on noise alone, fine-tuned on ImageNet via 1% of the
+ parameters (a small adapter on top of the frozen encoder), matches
+ or exceeds equivalent-compute models pretrained directly on
+ ImageNet. If noise-pretraining produces worse downstream performance
+ than ImageNet-pretraining at fixed compute, the geometric substrate
+ isn't sufficient on its own; there's content in real-world
+ distributions the model needs to see during pretraining to learn
+ effectively.
+
+ ---
+
+ ## 10. Continual learning / model-merging
+
+ **The utility.** Codebooks from independently-trained models are
+ comparable artifacts. Merging two models = aligning their codebooks
+ via Procrustes, optionally extending the joint axis set to cover
+ union-of-features. Continual learning becomes "extend the codebook
+ when novel structure appears" rather than "retrain to incorporate new
+ data."
+
+ **Why Omega.** Model identity in the geolip-svae family is largely
+ captured by the codebook (calibration insensitivity confirms this).
+ Two models trained on different distributions but the same
+ architecture have different codebooks; aligning them via Procrustes
+ gives a principled way to combine them without the parameter
+ interference that plagues standard model-merging methods.
+
+ **Specific construction.** Operations on Codebook artifacts:
+ - `Codebook.merge(other) → Codebook`: union of axes after Procrustes
+   alignment, with antipodal-pair re-collapse to deduplicate
+ - `Codebook.diff(other) → axes`: axes in `self` that don't have a
+   near-equivalent in `other` after alignment (the novel structure)
+ - `Codebook.extend(novel_axes) → Codebook`: append new axes,
+   re-validate projective-cleanness
+ - Continual learning loop: train, extract codebook, diff against
+   prior codebook, decide whether to keep new axes, re-emit updated
+   codebook.
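
The `diff` and `merge` operations above can be sketched on plain `[n_axes, D]` arrays. The sketch assumes Procrustes alignment has already been applied, uses a hypothetical `cos_threshold` to decide what counts as a near-equivalent axis, and does not deduplicate within a single codebook.

```python
import numpy as np

def _unit(x):
    """Normalise each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def diff(axes_self, axes_other, cos_threshold=0.95):
    """Axes in axes_self with no near-equivalent (up to sign) in axes_other."""
    cos = np.abs(_unit(axes_self) @ _unit(axes_other).T)
    return axes_self[cos.max(axis=1) < cos_threshold]

def merge(axes_self, axes_other, cos_threshold=0.95):
    """Union of axes; antipodal or near-duplicate axes are kept only once."""
    return np.vstack([axes_other, diff(axes_self, axes_other, cos_threshold)])

a = np.eye(4)[:3]                                          # axes e0, e1, e2
b = np.array([[-1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])  # -e0 and e3
novel = diff(a, b)     # e1 and e2; e0 is matched by its antipode -e0
merged = merge(a, b)   # four axes in total
```

A continual-learning loop would call `diff` against the prior codebook after each training round and re-run the projective-cleanness check on the merged result.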
+
+ **Falsifiable prediction.** Two h2-64 batteries (different noise
+ distributions) merge into a combined codebook with deviation in the
+ 0.20–0.23 CV band. If the merge produces a codebook that *fails*
+ projective-cleanness, the two codebooks live on incompatible
+ projective subspaces and merging is not just a Procrustes alignment;
+ there's content-level interference that requires retraining.
+
+ ---
+
+ ## What this clause does NOT cover
+
+ Excluded by methodology. These are useful applications of geolip-svae
+ but do not depend on the Omega substrate in a load-bearing way:
+
+ - **Standard feature extraction** for downstream tasks where the input
+   resolution and modality are fixed. Any encoder can do this; nothing
+   Omega-dependent.
+ - **Adversarial robustness** as a downstream goal. Possibly correlated
+   with codebook quality but not enabled by it specifically.
+ - **Reinforcement learning state representations.** The geometric
+   substrate provides nothing the RL community can't get from a
+   standard VAE.
+ - **Generative pretraining for autoregressive language modeling.**
+   Sphere-solvers are not autoregressive; pathway from this substrate
+   to LLM pretraining is speculative.
+
+ ---
+
+ ## Build-order considerations
+
+ If utilities will be built in sequence rather than in parallel, the
+ priority ordering by *information value per build* is:
+
+ 1. **§7 OOD detection**: already mostly present in the codebook
+    machinery, easiest to ship. Validates the validity-flag framing
+    from this morning's pivot.
+ 2. **§5b distillation FROM sphere-solvers**: also mostly present,
+    needs only API wrapping. Demonstrates the codebook as portable
+    artifact for the public release.
+ 3. **§4 solving primitives**: exposes the model's identity claim
+    directly. The `project / align / solve_basis` triple is a clean
+    API surface.
+ 4. **§1 classification**: first non-trivial test of regime-flatness
+    beyond reconstruction. Falsifiable prediction is sharp.
+ 5. **§6 tokenization**: bridge to mainstream multimodal architectures.
+    Higher build cost but high impact for adoption.
+ 6. **§8 cross-modal alignment**: extends Bertenstein under the new
+    framing. Build cost is moderate; depends on having multiple
+    modality-specific sphere-solvers trained.
+ 7. **§5a distillation INTO sphere-solvers**: significant training
+    investment. Defer until after smaller utilities validate.
+ 8. **§2 diffusion**: substantial build, novel pathway, high uncertainty.
+    Worth doing once the codebook artifact patterns are mature.
+ 9. **§9 self-supervised pretraining**: biggest investment, most
+    speculative, but if it works it's the largest payoff.
+ 10. **§3 processing**: depends on §1 + §2 maturity for activation
+     edits to be principled. Late in sequence.
+ 11. **§10 model-merging**: research utility rather than deployment
+     utility. Useful when there are many trained sphere-solvers to
+     consolidate.
+
+ The first three are all near-term and reuse existing machinery;
+ together they constitute a release-ready feature set. The remainder
+ are the multi-month research agenda.