AbstractPhil committed on Commit 03a4d8f · verified · 1 Parent(s): 50f7083

Update README.md

---
license: apache-2.0
---
# Agatha Diffusion - The first true Geometric Diffusion wide collective

## What is this? Why another thing?

Agatha is the manifested potential of many experts, each contributing to the behavior of a brand-new diffusion concept.

This isn't traditional diffusion, but it is built on its solid foundational principles.

The why is simple: I need to test how well the geofractal router combines large models, and its attribution capacity with multimodal structures.

## Everyone else builds the same thing, what makes this different?

This is a four-block hierarchical system where each set of blocks is independent of the blocks before it.

# Block 1 - Vision + Text Interpolation

This block is in charge of one simple task: interpolating the input into a fragmented structure of concatenated utility downstream.

## QWEN 2.5 Instruct
Our primary text encoder; it houses the necessary logical and deductive capacity.

## Flux AE (Maybe Flux 2)
Our primary image encoder, utilized as a sequential learning agent in conjunction with geofractal capacity.
This encoded structure will allow fusion and diffusion down the rail for cohesive capacity, enabling high-fidelity learning.

## GeoVit-David-Beans - rotary head
Our secondary image encoder: a ViT with heavy projection capacity. Its behavioral implications will be applied to QWEN.

## Lyra Bottleneck
Lyra is an ideal KL-divergence bottleneck that can house capacity without collapse, as shown through a large series of iterations run with lyra-xl-cantor-illustrious.

The image stays image, the text stays text, and the music stays music if included. The internals of LYRA learn to modify and combine the behavioral implications of the geometry with the needs of the task.

Lyra's feature is ideal for introducing accuracy downstream. The fidelity and accuracy this accumulated projection can apply are worth experimenting with.
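Lyra's internals aren't published in this README, so as a hedged illustration only, here is a minimal numpy sketch of the kind of KL-divergence bottleneck described above: a VAE-style reparameterized latent with a KL penalty against a standard normal prior. All names and projections here are hypothetical stand-ins, not Lyra's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_bottleneck(features, latent_dim=8):
    """Toy KL-divergence bottleneck: project features to a Gaussian
    latent (mu, logvar), sample with the reparameterization trick,
    and return the sample plus the KL term against N(0, I)."""
    d = features.shape[-1]
    # Hypothetical fixed projections; a real model would learn these.
    w_mu = rng.standard_normal((d, latent_dim)) / np.sqrt(d)
    w_lv = rng.standard_normal((d, latent_dim)) / np.sqrt(d)
    mu = features @ w_mu
    logvar = features @ w_lv
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterized sample
    # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return z, kl

x = rng.standard_normal((4, 32))                 # a batch of 4 feature vectors
z, kl = kl_bottleneck(x)
```

The KL term is what keeps the latent from drifting arbitrarily far from the prior, which is the usual mechanism behind "housing capacity without collapse."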

## Dino 3 guidance
Our primary guidance and synthesis coach. It intercepts the behavioral implications that leave block one before they enter block two, and fuses the learnings with gated fusion.
This will be a primary trained component and will assist with the learning process of the whole model.
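The README doesn't specify the gating form, so as an assumption, here is a minimal sketch of sigmoid-gated fusion of two feature streams - the standard pattern the text appears to describe. The weights and stream names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_fusion(a, b, w, bias):
    """Sigmoid-gated fusion: a learned gate decides, per feature,
    how much of stream `a` versus stream `b` to keep."""
    gate_logits = np.concatenate([a, b], axis=-1) @ w + bias
    g = 1.0 / (1.0 + np.exp(-gate_logits))   # sigmoid gate in (0, 1)
    return g * a + (1.0 - g) * b

d = 16
a = rng.standard_normal((2, d))              # e.g. block-one features
b = rng.standard_normal((2, d))              # e.g. guidance features
w = rng.standard_normal((2 * d, d)) * 0.1    # hypothetical learned weights
fused = gated_fusion(a, b, w, np.zeros(d))
```

Because the gate is strictly between 0 and 1, the fused output is always an elementwise blend of the two streams rather than a replacement of either.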

# Block 2 - Six Tower Collective
Each tower is directly in charge of handling the forward and inverse capacities of those energetic behavioral responses.

Each tower has its own rotary implementation for a progressive sub-rope meant to interact with the primary rope in segments.
The positional encoding is fractally aligned, which extends capacity to an indefinite number of sequence positions given the correct fractal formula - which we are using.
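The exact sub-rope formula isn't given here, so as a hedged sketch under plain-RoPE assumptions, this shows a rotary position embedding with a configurable theta base - the knob the unsupervised towers below vary. Function and variable names are illustrative, not the project's API.

```python
import numpy as np

def rope(x, positions, theta=10000.0):
    """Rotary position embedding: rotate consecutive feature pairs by
    position-dependent angles. `theta` sets the frequency base, the
    parameter varied per tower."""
    d = x.shape[-1]
    assert d % 2 == 0
    freqs = theta ** (-np.arange(0, d, 2) / d)      # one frequency per pair
    angles = positions[:, None] * freqs[None, :]    # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin            # 2-D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = np.ones((4, 8))                                 # (seq_len, dim)
rotated = rope(x, np.arange(4), theta=10000.0)
```

Since each pair is only rotated, vector norms are preserved, which is what lets several sub-ropes with different theta bases coexist without rescaling the features.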

## tower 1, geofractal cantor learning, fingerprint masked
## tower 2, geofractal simplex learning, fingerprint masked
## tower 3, geofractal shape learning, fingerprint masked
## tower 4, geofractal cantor learning, inversion fingerprint masked
## tower 5, geofractal simplex learning, inversion fingerprint masked
## tower 6, geofractal shape learning, inversion fingerprint masked
## tower 7, unsupervised, fingerprint masked rotary theta 1
## tower 8, unsupervised, fingerprint masked rotary theta 0.15
## tower 9, unsupervised, fingerprint masked rotary theta 0.30
## tower 10, unsupervised, fingerprint masked rotary theta 0.45

Each output holds a story, and each story is different.
These differences make up the divergent capacity across the multi-expert wide structure for guaranteed expert-to-expert learning downstream.
This system is the most tested part of the entire core. Everything here is set in stone by one of my experiments or another. Nothing is left to chance here.

## fusion mechanism; sequential multiscale crystal fusion
`cat([tower1, tower2, tower3, tower4, tower5, tower6])`
This mechanism has shown powerful behavioral implications in the past for improving accuracy, so it's only natural to extend this capacity to timestep learning.
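Read literally, the `cat` above concatenates the six supervised tower outputs along the feature axis. A minimal runnable numpy version, with random stand-ins for the tower outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in outputs for the six supervised towers: (batch, features) each.
towers = [rng.standard_normal((2, 32)) for _ in range(6)]

# Fusion by concatenation along the feature axis, as the
# cat([...]) pseudocode above describes: no mixing, no loss of
# information - downstream layers see every tower's full output.
fused = np.concatenate(towers, axis=-1)
```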

# Block 3 - Beatrix Core Oscillation System
This is our primary component meant to learn behavioral implications internally. At its core it is a form of rotary oscillation. In practice it is hundreds of miniature AI models based on micro-expert analysis of the tower outputs.

The delegation exists outside of her core, and the system functions as though everything is weighted by the core's resonance implication.

I'll provide a full working series of prototypes for this stashed gem soon.

## How many miniature AI are required?
That depends on the task. There could be hundreds, or even thousands. This is where the bulk of her learning will happen.

## What about the scales and sizes?
Invariant. There will be many interpretations of the same views, all learned in parallel. Many opinions fused together throughout the opinion structure. Many ideas all collaborating together.

## Small scales collapse into entropy and aren't useful, right?
Collapse of a learner is detected, and collapsed learner models are set to reinitialize in the prototype.
When enough of those canaries drop, a cascade evaluation of the geometry is triggered, which realigns the internals and reweights the model.
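The prototype isn't published yet, so as a hedged sketch of the reinit-and-cascade idea described above: a toy pool of linear micro-experts where a collapsed expert (near-zero output variance) is reinitialized, and enough reinits in one pass raise a cascade flag. The variance floor, cascade fraction, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def monitor_experts(weights, x, var_floor=1e-6, cascade_frac=0.5):
    """Reinitialize 'collapsed' micro-experts (near-zero output
    variance) and flag a cascade re-evaluation when too many
    drop in the same pass."""
    reinit_count = 0
    for i, w in enumerate(weights):
        out = x @ w
        if out.var() < var_floor:                 # canary: expert collapsed
            weights[i] = rng.standard_normal(w.shape) * 0.1
            reinit_count += 1
    cascade = reinit_count >= cascade_frac * len(weights)
    return reinit_count, cascade

x = rng.standard_normal((8, 4))
experts = [rng.standard_normal((4, 4)) * 0.1 for _ in range(4)]
experts[0] = np.zeros((4, 4))                     # one deliberately collapsed
n, cascade = monitor_experts(experts, x)
```

A single dropped canary only reinitializes that one learner; the cascade flag is what would hand control to the wider geometry evaluation the text describes.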

## Wouldn't this add some severe hardware overhead with so many parallel agents?
Still uncertain. I'm preparing a full offloading structure for this possibility. There may be small amounts, or huge amounts, of overhead for certain independent sub-blocks,
but I will do my best to monitor them as the diffusion training progresses.

The router structure will have an LRU caching system per device, depending on how well accelerate takes to it - and those subsystems will be augmented to directly handle their own onloading/offloading of information.

As it stands, the router structure is both optimized for wide models and creates barriers against the corruption and failure that wide models would otherwise suffer.
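The offloading design is still undecided per the text above, so this is an assumption-laden sketch only: a minimal per-device LRU cache built on `OrderedDict`, where the eviction callback is where an offload hook (e.g. moving weights to CPU) would go. Class and method names are hypothetical.

```python
from collections import OrderedDict

class DeviceLRUCache:
    """Minimal LRU cache for per-device module residency. On eviction
    the `offload` callback fires - in a real system this is where
    weights would move off the device."""
    def __init__(self, capacity, offload=lambda name, module: None):
        self.capacity = capacity
        self.offload = offload
        self._cache = OrderedDict()

    def get(self, name):
        if name not in self._cache:
            return None
        self._cache.move_to_end(name)           # mark as recently used
        return self._cache[name]

    def put(self, name, module):
        if name in self._cache:
            self._cache.move_to_end(name)
        self._cache[name] = module
        if len(self._cache) > self.capacity:
            old_name, old_module = self._cache.popitem(last=False)
            self.offload(old_name, old_module)  # least recently used leaves

evicted = []
cache = DeviceLRUCache(2, offload=lambda n, m: evicted.append(n))
cache.put("tower1", object())
cache.put("tower2", object())
cache.get("tower1")                             # touch tower1 so tower2 is LRU
cache.put("tower3", object())                   # evicts tower2
```

Keeping eviction behind a callback is the part that matters here: the router can swap in whatever onload/offload mechanics each subsystem needs without changing the cache itself.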

# Block 4 - The inversion.

## Fusion 1; gated fusion
This will enable the utilization of the most pertinent components provided by the core.

This is where we decode through LYRA. Our structure learned her encodings and her full sequential feature, so it's now time to restore the full structure.

The inverse of LYRA involves restoring the input to its original expectation. We cannot cheat this process; the model must learn to do this. There is no escape.

LYRA herself does not fail, but she's big. Before using her we need to train an expert LYRA to implement the necessary behavior, so that she'll be fully prepared for the task.
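How the inverse is learned isn't specified here, so as an illustrative sketch only: the usual way to make a model "restore the input to its original expectation" is a reconstruction objective - decode the bottleneck output and penalize the distance to the input. The linear encode/decode maps below are hypothetical stand-ins, not LYRA's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for encode/decode: a linear projection down
# to a bottleneck and its pseudo-inverse back up.
d, k = 16, 8
enc = rng.standard_normal((d, k)) / np.sqrt(d)
dec = np.linalg.pinv(enc)                       # best linear inverse

def reconstruction_loss(x):
    """Mean-squared error between the input and its decode(encode(x))
    round trip - the quantity a learned inverse must drive down."""
    x_hat = (x @ enc) @ dec
    return float(np.mean((x - x_hat) ** 2))

x = rng.standard_normal((4, d))
loss = reconstruction_loss(x)
```

Because the bottleneck is narrower than the input, the round trip is lossy by construction; "no escape" is exactly this loss term refusing to reach zero unless the model genuinely learns the inverse.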


# License

License: Apache 2.0
Author: AbstractPhil