muooon committed on
Commit
487736d
·
verified ·
1 Parent(s): 8006a6e

Delete README.md

Files changed (1)
  1. README.md +0 -147
README.md DELETED
@@ -1,147 +0,0 @@
- ---
- license: apache-2.0
- language:
- - en
- - ja
- tags:
- - machine-learning
- - deep-learning
- - transformer
- - architecture-design
- - adaptive-algorithms
- - resonant-contraction
- - resonant-projection-field
- ---
- # D‑RNA: Dual‑Helix Resonance Neural Architecture (DRNA)
-
- DRNA is a neural architecture centered on a dual‑helix structure and a rotation field produced by RoPE.
-
- In this architecture, Attention and MLP are synchronized into a dual helix, and information is holographically compressed through Resonant Contraction.
- This method rearranges sparse representations into dense ones, aiming for high expressiveness from the depth‑direction structure alone, without increasing the number of dimensions.
- A key feature of this approach is that it preserves the full connectivity of the Transformer architecture while suppressing catastrophic forgetting and retaining subtle fluctuations and phase information.
-
- ---
-
- ### Features
- - Fully compatible with Transformers; existing weights can be reused without modification.
- - Resonant Contraction (`a + m + a*m`) increases representation density.
- - The Resonant Projection Field induces continuous‑depth (ODE‑like) behavior.
- - No additional parameters are required, and computational overhead remains minimal.
- - Can be used as a drop‑in replacement for standard Transformer blocks.
- - Tends to converge earlier during training, reaching stable performance in fewer steps than a Transformer.
-
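As a side note on the Resonant Contraction term in the list above: it can be rewritten as a product. This is plain algebra, not a statement from the README, and it assumes that `a*m` means the elementwise product (written $\odot$ here) and that $1$ is a tensor of ones:

$$
h = a + m + a \odot m = (1 + a) \odot (1 + m) - 1
$$

Read this way, each branch output acts as a multiplicative gate on the other, shifted by one.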
- ### Notes
- - DRNA tends to converge earlier during training, but a learning rate (LR) that is too high may cause oscillation.
- - It works with the same hyperparameter settings as a Transformer; for greater stability, we recommend a slightly lower LR.
- - This behavior occurs because Resonant Contraction synchronizes the gradients of Attention and MLP, making updates stronger.
- - Other hyperparameters can remain almost identical to those used for a standard Transformer.
-
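One way to see why the Notes above link Resonant Contraction to stronger updates is to inspect the gradients of `h = a + m + a*m` directly. The sketch below is illustrative only: the tensors `a` and `m` are small hypothetical stand-ins for the branch outputs, not values from the README.

```python
import torch

# Hypothetical stand-ins for the Attention and MLP branch outputs.
a = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
m = torch.tensor([1.0, 0.5, -0.5], requires_grad=True)

# Resonant Contraction as written in the README.
h = a + m + a * m

# Send a gradient of ones back through every element of h.
h.sum().backward()

# dh/da = 1 + m and dh/dm = 1 + a: each branch's gradient is scaled by the other branch.
assert torch.allclose(a.grad, 1 + m.detach())
assert torch.allclose(m.grad, 1 + a.detach())
print(a.grad, m.grad)
```

With a plain sum `a + m`, both gradients would be exactly one. The extra factor depends on the size of the other branch's output, which is consistent with the recommendation to lower the LR slightly.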
-
- ---
-
- ```
- - Conceptual Diagram -
-
- RoPE Rotation Field (Phase-Preserving)
- Holographic Compression: Turning Sparse into Dense
-
-     A       M
-      \     /
-       \   /     ← This is Resonance
-       /   \     Synchronization occurs naturally through the seed
-      /     \    Naturally, meaning emerges through a chain of synchronicities
-     A       M
-
- Repeats in the depth direction to form a dual helix
- (acts as a substitute for increasing dimensionality)
- ```
- ---
-
- ### Minimal Block
-
- ```python
- class DRNABlock(nn.Module):
-     def __init__(self, dim):
-         super().__init__()
-         self.attn = Attention(dim)
-         self.mlp = MLP(dim)
-
-     def forward(self, x):
-         # Synchronization of the dual helix
-         a = self.attn(x)
-         m = self.mlp(x)
-
-         # Resonant Contraction
-         h = a + m + (a * m)
-
-         # RoPE
-         h = apply_rope(h)
-
-         return h
- ```
-
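The minimal block above relies on `Attention`, `MLP`, and `apply_rope`, none of which the README defines. The following is a self-contained sketch that fills them in with hypothetical stand-ins: a thin wrapper around `nn.MultiheadAttention`, a 4x feed-forward network, and a rotate-half RoPE applied to the block output as the README snippet does. The head count, the expansion factor, the RoPE base, and the channel pairing are all assumptions.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Hypothetical self-attention wrapper; the README does not define it."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.inner = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.inner(x, x, x, need_weights=False)
        return out


class MLP(nn.Module):
    """Hypothetical feed-forward branch with a 4x hidden expansion."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):
        return self.net(x)


def apply_rope(h, base=10000.0):
    """Assumed RoPE variant: rotate channel pairs (i, i + dim/2) by a position-dependent angle."""
    _, seq, dim = h.shape  # expects (batch, seq, dim) with an even dim
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=h.dtype, device=h.device) / half)
    pos = torch.arange(seq, dtype=h.dtype, device=h.device)
    angles = torch.outer(pos, inv_freq)  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    h1, h2 = h[..., :half], h[..., half:]
    return torch.cat([h1 * cos - h2 * sin, h1 * sin + h2 * cos], dim=-1)


class DRNABlock(nn.Module):
    """Same structure as the README's minimal block, using the stand-ins above."""

    def __init__(self, dim):
        super().__init__()
        self.attn = Attention(dim)
        self.mlp = MLP(dim)

    def forward(self, x):
        a = self.attn(x)
        m = self.mlp(x)
        h = a + m + (a * m)  # Resonant Contraction
        return apply_rope(h)


x = torch.randn(2, 16, 64)  # (batch, seq, dim)
y = DRNABlock(64)(x)
print(y.shape)  # torch.Size([2, 16, 64])
```

Because the rotation acts on pairs of channels, `apply_rope` leaves the per-token norm of `h` unchanged and leaves position 0 untouched. RoPE is more commonly applied to queries and keys inside attention; applying it to the block output here simply mirrors the README snippet.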
- ---
-
- ### Example: Replacing a Transformer block with a DRNA block
-
- ```python
- class TransformerBlock(nn.Module):
-     def __init__(self, dim):
-         super().__init__()
-         self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
-         self.mlp = nn.Sequential(
-             nn.Linear(dim, dim * 4),
-             nn.GELU(),
-             nn.Linear(dim * 4, dim),
-         )
-
-     def forward(self, x):
-         a, _ = self.attn(x, x, x)
-         m = self.mlp(x)
-         return x + a + m
-
-
- class DRNABasedBlock(nn.Module):
-     def __init__(self, dim):
-         super().__init__()
-         self.block = DRNABlock(dim)
-
-     def forward(self, x):
-         return self.block(x)
- ```
-
- ### Simply replace the existing Transformer block with a DRNA block
-
- ```python
- x = torch.randn(1, 128, 512) # (batch, seq, dim)
- block = DRNABasedBlock(dim=512)
-
- y = block(x)
- print(y.shape) # => torch.Size([1, 128, 512])
- ```
-
- ### Key Points
-
- - Same input/output shape as a standard Transformer block
- - Weight shapes are identical, so existing model weights can be reused as‑is
- - Works as a drop‑in replacement
- - No additional parameters
- - Only the synchronized Attention–MLP interaction (Resonant Contraction) is added
-
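The claims above about identical weight shapes and no additional parameters can be checked mechanically. The sketch below is one possible way to do so, under an assumption the README does not state: that the DRNA-style block keeps the same submodule names (`attn`, `mlp`) as the Transformer block so that state-dict keys line up. `make_branches`, `DRNAStyleBlock`, and `count_params` are hypothetical names introduced for illustration.

```python
import torch
import torch.nn as nn


def make_branches(dim):
    """Attention and MLP branches shaped like the README's TransformerBlock."""
    attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
    mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
    return attn, mlp


class TransformerBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn, self.mlp = make_branches(dim)

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        return x + a + self.mlp(x)


class DRNAStyleBlock(nn.Module):
    """Hypothetical variant that keeps the same submodule names so state-dict keys match."""

    def __init__(self, dim):
        super().__init__()
        self.attn, self.mlp = make_branches(dim)

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        m = self.mlp(x)
        return a + m + a * m  # RoPE omitted in this sketch; it adds no parameters


def count_params(module):
    """Total number of scalar parameters in a module."""
    return sum(p.numel() for p in module.parameters())


base = TransformerBlock(512)
drna = DRNAStyleBlock(512)

# strict=True (the default) raises if any key or shape differs.
result = drna.load_state_dict(base.state_dict())
print(count_params(base) == count_params(drna))  # True
```

Note that the README's own `DRNABasedBlock` nests the block under `self.block`, so its state-dict keys would carry a `block.` prefix and would need remapping before loading weights from a plain `TransformerBlock`.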
-
- ---
-
- BPC Comparison Chart
-
- <img width="800" alt="bpc_only" src="bpc_only.png" />
-
- ---
-
- #### License:
- This project is licensed under the Apache License 2.0 (see the LICENSE file for details).
-
- #### Acknowledgments:
- This work builds upon the foundation established by the Transformer architecture.
- I would like to express my gratitude to the researchers and open-source communities
- whose contributions to attention mechanisms, positional encoding, and large-scale
- model design made this work possible.