muooon committed · Commit d3b3cf3 · verified · 1 Parent(s): 2ebdc97

Update README.md

  1. README.md +179 -180
README.md CHANGED
@@ -1,186 +1,185 @@
1
- ---
2
- license: apache-2.0
3
- language:
4
- - en
5
- - ja
6
- tags:
7
- - machine-learning
8
- - deep-learning
9
- - transformer
10
- - architecture-design
11
- - adaptive-algorithms
12
- - resonant-contraction
13
- - resonant-projection-field
14
- ---
15
-
16
  ! Apology to Everyone (Important Notice) !
17
 
18
  First and foremost, I would like to offer my sincere apologies to everyone reading this. Regarding the “theoretical claims” about D-RANA published in this repository, particularly the claims concerning resonance, the content should be considered, from the current perspective, a “draft that is unverified and may contain errors.” I deeply apologize to everyone who trusted these descriptions for any misunderstandings that may have resulted. All of the following content is “hypothesis” “unverified,” and “draft.” ※ There is a possibility that my theoretical interpretation was incorrect. ※ This cannot be considered reliable evidence.
19
 
20
- Translated with DeepL.com (free version)
21
-
22
  ! 皆さまへのお詫び (重要なお知らせ) !
23
 
24
- まず最初に、ご覧の皆さまに率直にお詫び申し上げます 本リポジトリで公開している D-RANAに関する 「理論的主張」について、 特に 共鳴 に関する主張は 現在の視点では"未検証/誤りを含む草案"とすべき内容です 記述を信じてくださった皆さまに対し 誤解を招く結果となったことを深くお詫び申し上げます 以下の内容はすべて 「仮説」「未検証」「草案」 です ※ 理論的な解釈を誤っていた可能性があります ※ 信頼できるエビデンスとは言えません
25
-
26
- ---
27
-
28
- # D-RNA:Dual‑Helix Resonance Neural Architecture (DRNA)
29
-
30
- D-RNA is a new neural architecture centered on a dual helix structure and a rotation field produced by RoPE.
31
-
32
- In this architecture, Attention and MLP are synchronized into a dual helix, and information is holographically compressed through Resonant Contraction.
33
- This method rearranges sparse representations into dense ones to achieve high expressiveness using the depth‑direction structure alone, without increasing the number of dimensions.
34
- A key feature of this approach is its ability to preserve the full connectivity of the Transformer architecture while suppressing catastrophic forgetting and retaining subtle fluctuations and phase information.
35
-
36
- ---
37
-
38
- ### Features
39
- High structural compatibility: It has the exact same input–output shape as a standard Transformer Block, allowing it to be smoothly substituted as the core of an architecture.
40
- Resonant Contraction: By synchronizing Attention and the MLP in a double‑helix pattern and converging information into a phase field, it dramatically increases representational density.
41
- Depth as an alternative to dimensionality: The spiral rotation (depth‑wise operations) compensates for limited dimensionality and enables holographic information retention without increasing parameter count.
42
- Excellent learning efficiency: The spiral‑based information attraction (synchronization) achieves astonishing early convergence with far fewer steps than a Transformer.
43
- Fine‑grained phase preservation: The rotational field powered by RoPE preserves subtle fluctuations and relative contextual relationships that are often lost in conventional architectures.
44
- Re‑synchronization of knowledge: Existing weights can be transplanted as initialization and gently adapted to the spiral phase with a low learning rate, allowing existing intelligence to be evolved or overwritten into the D-RNA structure.
45
-
46
-
47
- ### Notes
48
- Optimization of learning rate (LR):
49
- Because D-RNA synchronizes information extremely quickly through Resonant Contraction, it converges sufficiently — and rapidly — even with a lower learning rate compared to a standard Transformer.
50
- If the LR is set too high, the resonance may be excessively amplified and cause oscillation, so starting with a modest LR is recommended.
51
- Synergistic gradient effects:
52
- Since Attention (recall) and the MLP (memory) are synchronized in a double‑helix sequence, the “settling” of weights from a single update is very strong.
53
- This is an advantage for fast convergence, but it also means that careful updates are key to stability.
54
- Parameter commonality:
55
- Hyperparameters such as weight initialization seeds and batch size can be inherited directly from standard Transformer settings.
56
-
57
- ---
58
-
59
- ### Conceptual Diagram
60
-
- ```
- Synchronizing “searching” (Attention) and “knowing” (MLP) in the phase of a spiral.
-
- RoPE Rotation Field (Phase-Preserving)
- Holographic Compression: Turning Sparse into Dense
-
-     A       M
-      \     /
-       \   /    ← This is Resonance
-       /   \      Synchronization occurs naturally through the seed
-      /     \     Naturally, meaning emerges through a chain of synchronicities
-     A       M
-
- Repeats in the depth direction to form a dual helix
- (acts as a substitute for increasing dimensionality)
- ```
77
- ---
78
-
79
- ### Minimal Block
80
-
- ```python
- class ResonantBlock(nn.Module):
-     def __init__(self, dim, n_heads):
-         super().__init__()
-         self.qkv = nn.Linear(dim, dim * 3)
-         self.out = nn.Linear(dim, dim)
-         self.mlp = MLP(dim)
-         self.norm1 = nn.LayerNorm(dim)
-         self.norm2 = nn.LayerNorm(dim)
-         self.n_heads = n_heads
-         self.d_head = dim // n_heads
-
-     def forward(self, x, cos, sin):
-         # --- Attention ---
-         q, k, v = project_qkv(x, self.qkv, self.n_heads, self.d_head)
-         q, k = apply_rope(q, k, cos, sin)
-         attn_out = attention(q, k, v)
-         x = self.norm1(x + self.out(attn_out))
-
-         # --- MLP ---
-         x = self.norm2(x + self.mlp(x))
-         return x
- ```
104
-
105
- ---
106
-
107
- ### Example: Replacing a Transformer block with a D-RNA block
108
-
- ```python
- class DRNA_ResonantBlock(nn.Module):
-     """
-     Replace the existing TransformerBlock with this ResonantBlock.
-     I/O: [Batch, Seq, Dim] -> [Batch, Seq, Dim] (Fully compatible)
-     """
-     def __init__(self, dim, n_heads, mlp_dim_forward=None):
-         super().__init__()
-         self.n_heads = n_heads
-         self.d_head = dim // n_heads
-
-         # 1. Spiral Projection Layer (A)
-         self.qkv = nn.Linear(dim, dim * 3)
-         self.out = nn.Linear(dim, dim)
-
-         # 2. Spiral Memory Layer (B)
-         mlp_dim = mlp_dim_forward if mlp_dim_forward else dim * 4
-         self.mlp = nn.Sequential(
-             nn.Linear(dim, mlp_dim),
-             nn.GELU(),
-             nn.Linear(mlp_dim, dim)
-         )
-
-         # 3. Normalization layer for compression
-         self.norm1 = nn.LayerNorm(dim)
-         self.norm2 = nn.LayerNorm(dim)
-
-     def forward(self, x, cos, sin):
-         """
-         Phase information for RoPE as an argument (cos, sin)
-         """
-         # Attention:Spiral Projection Layer (A)
-         # QKV -> RoPE -> Norm
-         q, k, v = project_qkv(x, self.qkv, self.n_heads, self.d_head)
-         q, k = apply_rope(q, k, cos, sin)
-
-         attn_out = attention(q, k, v)
-         x = self.norm1(x + self.out(attn_out))  # Synchronization with context
-
-         # MLP:Spiral Memory Layer (B)
-         # MLP -> Norm
-         x = self.norm2(x + self.mlp(x))  # Determined by memory
-
-         return x
- ```
154
-
155
- ### Replacement and Utilization of D-RNA
156
- A direct drop‑in replacement is not possible, but it can be utilized through “redefinition and re‑synchronization.”
157
- Why it cannot be used as‑is:
158
- While a standard Transformer stores information using an “absolute address” (absolute position), D-RNA processes information using the “phase of a spiral” (relative position), meaning the coordinate systems are fundamentally different.
159
- Even if the weights are copied directly, the phases do not align and no resonance occurs.
160
- How to replace it (implementation):
161
- The network’s input–output shapes are fully compatible.
162
- By rewriting the existing layers as ResonantBlock and migrating positional information into RoPE’s rotational field, the core upgrade is complete.
163
- How to utilize and adapt it (training):
164
- After transferring the existing model’s weights as initialization, continue training with a low learning rate.
165
- The previously static knowledge (existing weights) begins to synchronize with the spiral rotation, gradually blending into D-RNA’s “Resonant Contraction” process and evolving beyond the original performance.
166
-
167
- ---
168
-
169
- BPC Comparison Chart
170
-
171
- non-mask
172
- <img width="800" alt="bpc_only" src="bpc_only.png" />
173
-
174
- use-mask
175
- <img width="800" alt="bpc_only" src="bpc_mask.png" />
176
-
177
- ---
178
-
179
- License:
180
- This project is licensed under the Apache License 2.0. (See the LICENSE for details).
181
-
182
- #### Acknowledgments:
183
- This work builds upon the foundation established by the Transformer architecture.
184
- I would like to express my gratitude to the researchers and open-source communities
185
- whose contributions to attention mechanisms, positional encoding, and large-scale
186
- model design made this work possible.
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - ja
6
+ tags:
7
+ - machine-learning
8
+ - deep-learning
9
+ - transformer
10
+ - architecture-design
11
+ - adaptive-algorithms
12
+ - resonant-contraction
13
+ - resonant-projection-field
14
+ ---
15
+
16
  ! Apology to Everyone (Important Notice) !
17
 
18
  First and foremost, I would like to offer my sincere apologies to everyone reading this. Regarding the “theoretical claims” about D-RANA published in this repository, particularly the claims concerning resonance, the content should be considered, from the current perspective, a “draft that is unverified and may contain errors.” I deeply apologize to everyone who trusted these descriptions for any misunderstandings that may have resulted. All of the following content is “hypothesis” “unverified,” and “draft.” ※ There is a possibility that my theoretical interpretation was incorrect. ※ This cannot be considered reliable evidence.
19
 
20
+
 
21
  ! 皆さまへのお詫び (重要なお知らせ) !
22
 
23
+ まず最初に、ご覧の皆さまに率直にお詫び申し上げます 本リポジトリで公開している D-RANAに関する 「理論的主張」について、 特に 共鳴 に関する主張は 現在の視点では"未検証/誤りを含む草案"とすべき内容です 記述を信じてくださった皆さまに対し 誤解を招く結果となったことを深くお詫び申し上げます 以下の内容はすべて 「仮説」「未検証」「草案」 です ※ 理論的な解釈を誤っていた可能性があります ※ 信頼できるエビデンスとは言えません
24
+
25
+ ---
26
+
27
+ # D-RNA:Dual‑Helix Resonance Neural Architecture (DRNA)
28
+
29
+ D-RNA is a new neural architecture centered on a dual helix structure and a rotation field produced by RoPE.
30
+
31
+ In this architecture, Attention and MLP are synchronized into a dual helix, and information is holographically compressed through Resonant Contraction.
32
+ This method rearranges sparse representations into dense ones to achieve high expressiveness using the depth‑direction structure alone, without increasing the number of dimensions.
33
+ A key feature of this approach is its ability to preserve the full connectivity of the Transformer architecture while suppressing catastrophic forgetting and retaining subtle fluctuations and phase information.
34
+
35
+ ---
36
+
37
+ ### Features
38
+ High structural compatibility: It has the exact same input–output shape as a standard Transformer Block, allowing it to be smoothly substituted as the core of an architecture.
39
+ Resonant Contraction: By synchronizing Attention and the MLP in a double‑helix pattern and converging information into a phase field, it dramatically increases representational density.
40
+ Depth as an alternative to dimensionality: The spiral rotation (depth‑wise operations) compensates for limited dimensionality and enables holographic information retention without increasing parameter count.
41
+ Excellent learning efficiency: The spiral‑based information attraction (synchronization) achieves astonishing early convergence with far fewer steps than a Transformer.
42
+ Fine‑grained phase preservation: The rotational field powered by RoPE preserves subtle fluctuations and relative contextual relationships that are often lost in conventional architectures.
43
+ Re‑synchronization of knowledge: Existing weights can be transplanted as initialization and gently adapted to the spiral phase with a low learning rate, allowing existing intelligence to be evolved or overwritten into the D-RNA structure.
44
+
45
+
46
+ ### Notes
47
+ Optimization of learning rate (LR):
48
+ Because D-RNA synchronizes information extremely quickly through Resonant Contraction, it converges sufficiently — and rapidly — even with a lower learning rate compared to a standard Transformer.
49
+ If the LR is set too high, the resonance may be excessively amplified and cause oscillation, so starting with a modest LR is recommended.
50
+ Synergistic gradient effects:
51
+ Since Attention (recall) and the MLP (memory) are synchronized in a double‑helix sequence, the “settling” of weights from a single update is very strong.
52
+ This is an advantage for fast convergence, but it also means that careful updates are key to stability.
53
+ Parameter commonality:
54
+ Hyperparameters such as weight initialization seeds and batch size can be inherited directly from standard Transformer settings.
55
+
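One way to act on the "start with a modest LR" note is a linear warmup toward a small peak value. Warmup is not mentioned in the README; it is swapped in here only as a common way to start gently. The function name `warmup_lr` and the values `peak_lr=1e-4` and `warmup_steps=500` are assumptions for illustration, not settings from the author.

```python
def warmup_lr(step, peak_lr=1e-4, warmup_steps=500):
    """Ramp the learning rate linearly up to peak_lr, then hold it there.

    `step` is the zero-based optimizer step. All defaults are illustrative.
    """
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


# Inspect the schedule at a few points.
for step in (0, 249, 499, 5000):
    print(step, warmup_lr(step))
```

If training oscillates, lowering `peak_lr` or lengthening `warmup_steps` would be the first knobs to try under this sketch.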
56
+ ---
57
+
58
+ ### Conceptual Diagram
59
+
+ ```
+ Synchronizing “searching” (Attention) and “knowing” (MLP) in the phase of a spiral.
+
+ RoPE Rotation Field (Phase-Preserving)
+ Holographic Compression: Turning Sparse into Dense
+
+     A       M
+      \     /
+       \   /    ← This is Resonance
+       /   \      Synchronization occurs naturally through the seed
+      /     \     Naturally, meaning emerges through a chain of synchronicities
+     A       M
+
+ Repeats in the depth direction to form a dual helix
+ (acts as a substitute for increasing dimensionality)
+ ```
76
+ ---
77
+
78
+ ### Minimal Block
79
+
+ ```python
+ class ResonantBlock(nn.Module):
+     def __init__(self, dim, n_heads):
+         super().__init__()
+         self.qkv = nn.Linear(dim, dim * 3)
+         self.out = nn.Linear(dim, dim)
+         self.mlp = MLP(dim)
+         self.norm1 = nn.LayerNorm(dim)
+         self.norm2 = nn.LayerNorm(dim)
+         self.n_heads = n_heads
+         self.d_head = dim // n_heads
+
+     def forward(self, x, cos, sin):
+         # --- Attention ---
+         q, k, v = project_qkv(x, self.qkv, self.n_heads, self.d_head)
+         q, k = apply_rope(q, k, cos, sin)
+         attn_out = attention(q, k, v)
+         x = self.norm1(x + self.out(attn_out))
+
+         # --- MLP ---
+         x = self.norm2(x + self.mlp(x))
+         return x
+ ```
103
+
104
+ ---
105
+
106
+ ### Example: Replacing a Transformer block with a D-RNA block
107
+
+ ```python
+ class DRNA_ResonantBlock(nn.Module):
+     """
+     Replace the existing TransformerBlock with this ResonantBlock.
+     I/O: [Batch, Seq, Dim] -> [Batch, Seq, Dim] (Fully compatible)
+     """
+     def __init__(self, dim, n_heads, mlp_dim_forward=None):
+         super().__init__()
+         self.n_heads = n_heads
+         self.d_head = dim // n_heads
+
+         # 1. Spiral Projection Layer (A)
+         self.qkv = nn.Linear(dim, dim * 3)
+         self.out = nn.Linear(dim, dim)
+
+         # 2. Spiral Memory Layer (B)
+         mlp_dim = mlp_dim_forward if mlp_dim_forward else dim * 4
+         self.mlp = nn.Sequential(
+             nn.Linear(dim, mlp_dim),
+             nn.GELU(),
+             nn.Linear(mlp_dim, dim)
+         )
+
+         # 3. Normalization layer for compression
+         self.norm1 = nn.LayerNorm(dim)
+         self.norm2 = nn.LayerNorm(dim)
+
+     def forward(self, x, cos, sin):
+         """
+         Phase information for RoPE as an argument (cos, sin)
+         """
+         # Attention:Spiral Projection Layer (A)
+         # QKV -> RoPE -> Norm
+         q, k, v = project_qkv(x, self.qkv, self.n_heads, self.d_head)
+         q, k = apply_rope(q, k, cos, sin)
+
+         attn_out = attention(q, k, v)
+         x = self.norm1(x + self.out(attn_out))  # Synchronization with context
+
+         # MLP:Spiral Memory Layer (B)
+         # MLP -> Norm
+         x = self.norm2(x + self.mlp(x))  # Determined by memory
+
+         return x
+ ```
153
+
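The two blocks in this README call `MLP`, `project_qkv`, `apply_rope`, and `attention`, none of which are defined here. The following is a minimal sketch of what they might look like, assuming PyTorch, an interleaved-pair RoPE layout, `cos`/`sin` tables of shape `[seq, d_head // 2]`, and causal masking by default. The helper `rope_cache`, the `base`, `hidden`, and `causal` parameters, and all tensor layouts are assumptions made for illustration, not the author's implementation.

```python
import math

import torch
from torch import nn


class MLP(nn.Module):
    """Two-layer feed-forward net; the 4x hidden width is an assumption."""

    def __init__(self, dim, hidden=None):
        super().__init__()
        hidden = hidden or dim * 4
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        return self.net(x)


def rope_cache(seq_len, d_head, base=10000.0):
    """Hypothetical helper: cos/sin tables of shape [seq_len, d_head // 2]."""
    inv_freq = 1.0 / (base ** (torch.arange(0, d_head, 2).float() / d_head))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return angles.cos(), angles.sin()


def project_qkv(x, qkv, n_heads, d_head):
    """Project [B, T, D] into q, k, v, each shaped [B, n_heads, T, d_head]."""
    b, t, _ = x.shape
    q, k, v = qkv(x).view(b, t, 3, n_heads, d_head).unbind(dim=2)
    return q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)


def apply_rope(q, k, cos, sin):
    """Rotate each consecutive channel pair by its position-dependent angle."""

    def rotate(t):
        t1, t2 = t[..., 0::2], t[..., 1::2]
        pairs = torch.stack((t1 * cos - t2 * sin, t1 * sin + t2 * cos), dim=-1)
        return pairs.flatten(-2)  # re-interleave the rotated pairs

    return rotate(q), rotate(k)


def attention(q, k, v, causal=True):
    """Scaled dot-product attention; heads are merged back to [B, T, D]."""
    b, h, t, d = q.shape
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    if causal:
        future = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=q.device), diagonal=1
        )
        scores = scores.masked_fill(future, float("-inf"))
    out = scores.softmax(dim=-1) @ v
    return out.transpose(1, 2).reshape(b, t, h * d)
```

With these names in scope, and with `cos, sin = rope_cache(seq_len, dim // n_heads)`, a call such as `ResonantBlock(dim=64, n_heads=4)(x, cos, sin)` should map a `[batch, seq, 64]` tensor to the same shape. Passing `causal=False` would correspond to an unmasked setup, if that is what the "non-mask" chart refers to.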
154
+ ### Replacement and Utilization of D-RNA
155
+ A direct drop‑in replacement is not possible, but it can be utilized through “redefinition and re‑synchronization.”
156
+ Why it cannot be used as‑is:
157
+ While a standard Transformer stores information using an “absolute address” (absolute position), D-RNA processes information using the “phase of a spiral” (relative position), meaning the coordinate systems are fundamentally different.
158
+ Even if the weights are copied directly, the phases do not align and no resonance occurs.
159
+ How to replace it (implementation):
160
+ The network’s input–output shapes are fully compatible.
161
+ By rewriting the existing layers as ResonantBlock and migrating positional information into RoPE’s rotational field, the core upgrade is complete.
162
+ How to utilize and adapt it (training):
163
+ After transferring the existing model’s weights as initialization, continue training with a low learning rate.
164
+ The previously static knowledge (existing weights) begins to synchronize with the spiral rotation, gradually blending into D-RNA’s “Resonant Contraction” process and evolving beyond the original performance.
165
+
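The contrast between an "absolute address" and the "phase of a spiral" can be made concrete with a single RoPE channel pair: after rotation, the attention logit between a query and a key depends only on how far apart they are, not on where they sit. The sketch below uses only the standard library. The function names and the frequency `theta=0.1` are arbitrary choices for illustration.

```python
import math


def rotate(vec, pos, theta=0.1):
    """Rotate a 2-D vector by the angle pos * theta (one RoPE channel pair)."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (x * c - y * s, x * s + y * c)


def score(q, k, q_pos, k_pos):
    """Dot product between a rotated query and a rotated key."""
    qr, kr = rotate(q, q_pos), rotate(k, k_pos)
    return qr[0] * kr[0] + qr[1] * kr[1]


q, k = (1.0, 2.0), (0.5, -1.5)

# Same offset (3 positions apart) at two different absolute locations.
near = score(q, k, q_pos=5, k_pos=2)
far = score(q, k, q_pos=105, k_pos=102)
print(abs(near - far) < 1e-9)  # the logit depends only on q_pos - k_pos

# A different offset gives a different logit.
print(score(q, k, q_pos=5, k_pos=3))
```

This is the sense in which weights trained with learned or sinusoidal absolute positions cannot be expected to line up with a RoPE model without further training: the positional signal enters in a different place and in a different form.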
166
+ ---
167
+
168
+ BPC Comparison Chart
169
+
170
+ non-mask
171
+ <img width="800" alt="bpc_only" src="bpc_only.png" />
172
+
173
+ use-mask
174
+ <img width="800" alt="bpc_only" src="bpc_mask.png" />
175
+
176
+ ---
177
+
178
+ License:
179
+ This project is licensed under the Apache License 2.0. (See the LICENSE for details).
180
+
181
+ #### Acknowledgments:
182
+ This work builds upon the foundation established by the Transformer architecture.
183
+ I would like to express my gratitude to the researchers and open-source communities
184
+ whose contributions to attention mechanisms, positional encoding, and large-scale
185
+ model design made this work possible.