tejassuds committed · Commit 4e27a02 · verified · 1 Parent(s): a5cf928

Update README: add video results, new DOI, fix genome table

Files changed (1)
  1. README.md +157 -132
README.md CHANGED
@@ -1,132 +1,157 @@
- ---
- license: mit
- tags:
- - sparse-networks
- - neural-architecture-search
- - network-growing
- - genome
- - topology-learning
- - pytorch
- datasets:
- - mnist
- - cifar10
- - cifar100
- - imdb
- pipeline_tag: other
- ---
-
- # Neural DNA (NDNA): A Compact Genome for Growing Network Architecture
-
- A tiny learned genome (< 300 parameters) that grows neural network topology through developmental rules. Default disconnected, type-based compatibility, metabolic cost pressure. The genome discovers useful sparse connectivity that beats random wiring on every experiment, matches or exceeds dense baselines on most, and transfers across tasks without retraining.
-
- [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19230474.svg)](https://doi.org/10.5281/zenodo.19230474)
-
- ![Method Overview](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig1_method_overview.png)
-
- ## What is NDNA?
-
- Neural networks typically use fixed, fully-connected layers. NDNA asks: what if a small "genome" could learn *which* connections should exist?
-
- The genome encodes cell type embeddings and a compatibility matrix. During growth, it compares source and target types for every potential connection and decides whether to wire it or not. A metabolic cost penalty forces selectivity, so only useful connections survive.
-
- **The result:** 226 to 258 genome parameters control up to 2.2 million connections (8,384:1 compression on our benchmarks, likely higher on larger networks). The grown networks are sparse but structured, and they consistently beat randomly-wired sparse networks.
-
- ## How It Works
-
- 1. **Genome** encodes cell type embeddings (8 types, 8 dimensions) and a compatibility matrix
- 2. **Growth**: for each potential connection, source and target type embeddings are compared via the compatibility matrix to produce a connection probability
- 3. **Binary mask**: probabilities are thresholded to produce hard 0/1 masks (straight-through estimator for gradient flow)
- 4. **Metabolic cost**: a sparsity loss penalizes total connection strength, forcing the genome to be selective
- 5. **Default disconnected**: compatibility is initialized negative, so the genome must actively grow every connection
-
- The genome and network weights are trained jointly with standard backpropagation.
-
- ![Compression Ratios](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig2_compression.png)
-
- ## Key Results
-
- ![Genome vs Random Sparsity](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig3_genome_vs_random.png)
-
- | Experiment | Genome | Random Sparse | Dense Baseline | Genome vs Random |
- |---|---|---|---|---|
- | MNIST (MLP) | 97.54% | 97.09% | 98.33% | +0.45% |
- | CIFAR-10 (MLP) | 57.14% | 51.68% | 54.32% | +5.46% |
- | CIFAR-10 (CNN) | 88.93% | 85.78% | 89.80% | +3.15% |
- | CIFAR-100 (Transfer) | 60.92% | 53.91% | 67.16% | +7.01% |
- | IMDB (Transformer) | 85.05% | 84.66% | 84.57% | +0.39% |
-
- The genome beats random sparse wiring on every experiment. On CIFAR-10 MLP and CIFAR-100 transfer, the genome even beats the dense baseline.
-
- ![Genome vs Dense](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig5_genome_vs_dense.png)
-
- ## Pre-trained Genomes
-
- These are the trained genome files. Each genome is tiny (< 300 parameters) but controls the full network topology.
-
- | File | Architecture | Task | Params | Connections | Compression | Accuracy |
- |---|---|---|---|---|---|---|
- | `genome_mnist_mlp.pt` | MLP (GrownNetwork) | MNIST | 226 | 1,894,784 | 8,384:1 | 97.54% |
- | `genome_cifar10_mlp.pt` | MLP (GrownNetwork) | CIFAR-10 | 226 | 1,894,784 | 8,384:1 | 57.14% |
- | `genome_cifar10_cnn.pt` | CNN (GrownConvNetwork) | CIFAR-10 | 258 | 165,888 | 643:1 | 88.93% |
- | `genome_cifar100_transfer.pt` | MLP (GrownNetwork) | CIFAR-100 (transferred from CIFAR-10) | 226 | 1,894,784 | 8,384:1 | 60.92% |
- | `genome_imdb_transformer.pt` | Transformer (GrownTransformer) | IMDB Sentiment | 226 | 786,432 | 3,479:1 | 85.05% |
-
- ## Cross-Task Transfer
-
- The CIFAR-100 genome was not trained on CIFAR-100. It is the CIFAR-10 genome applied directly to CIFAR-100 without retraining the topology. Only the network weights were retrained. The genome's learned connectivity pattern transferred across tasks and still beat random sparse wiring by +7.01%.
-
- ![Topology Convergence](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig4_topology_convergence.png)
-
- ## Transformer Attention Patterns
-
- The genome also works on transformers. On IMDB sentiment analysis, the grown transformer beats both random sparse and dense baselines.
-
- ![Transformer Heatmap](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig6_transformer_heatmap.png)
-
- ## How to Use
-
- ```python
- import torch
- from genome.model import Genome, GrownNetwork, GrownConvNetwork, GrownTransformer
-
- # --- MLP (MNIST or CIFAR-10) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=3)
- genome.load_state_dict(torch.load("genome_mnist_mlp.pt", weights_only=True))
- model = GrownNetwork(genome, input_dim=784, hidden_dim=1024, output_dim=10)
-
- # --- CNN (CIFAR-10) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=4)
- genome.load_state_dict(torch.load("genome_cifar10_cnn.pt", weights_only=True))
- model = GrownConvNetwork(genome, num_classes=10)
-
- # --- Transformer (IMDB) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=3)
- genome.load_state_dict(torch.load("genome_imdb_transformer.pt", weights_only=True))
- model = GrownTransformer(genome, vocab_size=20000, embed_dim=128, num_heads=4, num_layers=2, num_classes=2)
-
- # --- Transfer (CIFAR-10 genome -> CIFAR-100) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=3)
- genome.load_state_dict(torch.load("genome_cifar100_transfer.pt", weights_only=True))
- model = GrownNetwork(genome, input_dim=3072, hidden_dim=1024, output_dim=100)
- ```
-
- ## Links
-
- - **Paper**: [Zenodo (DOI: 10.5281/zenodo.19230474)](https://doi.org/10.5281/zenodo.19230474)
- - **Code**: [github.com/tejassudsfp/ndna](https://github.com/tejassudsfp/ndna)
- - **Author**: [Tejas Parthasarathi Sudarshan](https://tejassuds.com) (tejas@fandesk.ai)
-
- ## Citation
-
- ```bibtex
- @article{sudarshan2026ndna,
- title={Neural DNA: A Compact Genome for Growing Network Architecture},
- author={Sudarshan, Tejas Parthasarathi},
- year={2026},
- doi={10.5281/zenodo.19230474}
- }
- ```
-
- ## License
-
- [MIT](https://opensource.org/licenses/MIT)
+ ---
+ license: mit
+ tags:
+ - sparse-networks
+ - neural-architecture-search
+ - network-growing
+ - genome
+ - topology-learning
+ - pytorch
+ - video-transformers
+ datasets:
+ - mnist
+ - cifar10
+ - cifar100
+ - imdb
+ - moving-mnist
+ pipeline_tag: other
+ ---
+
+ # Neural DNA (NDNA): A Compact Genome for Growing Network Architecture
+
+ A tiny learned genome (< 400 parameters) that grows neural network topology through developmental rules: connections default to disconnected, growth is governed by type-based compatibility, and a metabolic cost keeps wiring under pressure. The genome discovers useful sparse connectivity that beats random wiring on every experiment (by 0.39% to 21.7%), matches or exceeds dense baselines on most, and transfers across tasks without retraining.
+
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19248389.svg)](https://doi.org/10.5281/zenodo.19248389)
+
+ ![Method Overview](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig1_method_overview.png)
+
+ ## What is NDNA?
+
+ Neural networks typically use fixed, fully-connected layers. NDNA asks: what if a small "genome" could learn *which* connections should exist?
+
+ The genome encodes cell type embeddings and a compatibility matrix. During growth, it compares source and target types for every potential connection and decides whether to wire it. A metabolic cost penalty forces selectivity, so only useful connections survive.
+
+ **The result:** 226 to 374 genome parameters control up to 2.2 million connections (8,384:1 compression on our benchmarks, likely higher on larger networks). The grown networks are sparse but structured, and they consistently beat randomly wired sparse networks.
+
+ ## How It Works
+
+ 1. **Genome** encodes cell type embeddings (8 types, 8 dimensions) and a compatibility matrix
+ 2. **Growth**: for each potential connection, source and target type embeddings are compared via the compatibility matrix to produce a connection probability
+ 3. **Binary mask**: probabilities are thresholded to produce hard 0/1 masks (straight-through estimator for gradient flow)
+ 4. **Metabolic cost**: a sparsity loss penalizes total connection strength, forcing the genome to be selective
+ 5. **Default disconnected**: compatibility is initialized negative, so the genome must actively grow every connection
+
+ The genome and network weights are trained jointly with standard backpropagation.
+
+ ![Compression Ratios](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig2_compression.png)
+
+ ## Key Results
+
+ ![Genome vs Random Sparsity](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig3_genome_vs_random.png)
+
+ | Experiment | Genome | Random Sparse | Dense Baseline | Genome vs Random |
+ |---|---|---|---|---|
+ | MNIST (MLP) | 97.54% | 97.09% | 98.33% | +0.45% |
+ | CIFAR-10 (MLP) | 57.14% | 51.68% | 54.32% | +5.46% |
+ | CIFAR-10 (CNN) | 88.93% | 85.78% | 89.80% | +3.15% |
+ | CIFAR-100 (Transfer) | 60.92% | 53.91% | 67.16% | +7.01% |
+ | IMDB (Transformer) | 85.05% | 84.66% | 84.57% | +0.39% |
+ | Moving MNIST (Video)* | 62.23 | 79.44 | 62.15 | +21.7% |
+
+ *\*Moving MNIST reports MSE (lower is better); its +21.7% is the relative MSE reduction versus random sparse.*
+
+ The genome beats random sparse wiring on every experiment. The largest gap is on video prediction (+21.7%), where random wiring falls apart while genome-grown wiring matches the dense baseline.
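The "Genome vs Random" column mixes two conventions, which the arithmetic makes explicit (values taken from the table):

```python
# Accuracy rows report an absolute percentage-point gap; the Moving MNIST
# row reports a relative MSE reduction, since lower MSE is better.
acc_gap = 57.14 - 51.68              # CIFAR-10 MLP: +5.46 points
mse_gain = (79.44 - 62.23) / 79.44   # Moving MNIST: ~21.7% relative
print(round(acc_gap, 2), round(100 * mse_gain, 1))
```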
+
+ ![Genome vs Dense](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig5_genome_vs_dense.png)
+
+ ## Video: Factored Spatiotemporal Genome
+
+ The video experiment uses a factored genome: temporal (74 params) + spatial (74 params) + depth (226 params) = 374 total. The temporal genome discovers temporal recency (recent frames get strong connections, distant frames get almost none). The spatial genome discovers spatial locality (nearby patches connect strongly, distant patches barely connect).
+
+ ![Temporal Mask](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig7_temporal_mask.png)
+
+ ![Spatial Decay](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig8_spatial_decay.png)
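A factored mask like the one described above can be sketched as an outer combination of a frame-to-frame factor and a patch-to-patch factor. Fixed decay constants stand in here for the learned temporal and spatial genomes; shapes follow the Moving MNIST setup (10 frames, 64x64 images, 8x8 patches):

```python
import torch

n_frames, grid = 10, 8                 # 8x8 patch grid -> 64 patches per frame
n_patches = grid * grid

# Temporal factor: recency decay over frame distance (illustrative constants).
t = torch.arange(n_frames, dtype=torch.float32)
temporal = torch.sigmoid(3.0 - (t[None, :] - t[:, None]).abs())   # (T, T)

# Spatial factor: locality decay over patch distance on the 2D grid.
ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (P, 2)
spatial = torch.sigmoid(2.0 - torch.cdist(coords, coords))          # (P, P)

# Combine factors over every (frame, patch) token pair and threshold,
# giving one mask over all T*P spatiotemporal tokens.
full = torch.einsum("ab,cd->acbd", temporal, spatial)
full = full.reshape(n_frames * n_patches, n_frames * n_patches)
mask = (full > 0.25).float()
sparsity = 1 - mask.mean().item()   # fraction of attention edges pruned
```

The product structure is what makes the genome so small: two tiny factors describe a 640x640 attention mask.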
+
+ ## Pre-trained Genomes
+
+ These are the trained genome files. Each genome is tiny but controls the full network topology.
+
+ | File | Architecture | Task | Params | Connections | Compression | Result |
+ |---|---|---|---|---|---|---|
+ | `genome_mnist.pt` | MLP | MNIST | 226 | 174,240 | 770:1 | 97.54% |
+ | `genome_cifar10_mlp.pt` | MLP | CIFAR-10 | 226 | 1,706,240 | 7,553:1 | 57.14% |
+ | `genome_cifar10_cnn.pt` | CNN | CIFAR-10 | 258 | 165,888 | 643:1 | 88.93% |
+ | `genome_cifar100_fresh.pt` | MLP | CIFAR-100 (transfer) | 226 | 1,706,240 | 7,553:1 | 60.92% |
+ | `genome_transformer.pt` | Transformer | IMDB | 258 | 2,162,688 | 8,384:1 | 85.05% |
+ | `genome_video.pt` | Video Transformer | Moving MNIST | 374 | 307,300 | 821:1 | MSE 62.23 |
+
+ ## Cross-Task Transfer
+
+ The CIFAR-100 genome was not trained on CIFAR-100. It is the CIFAR-10 genome applied directly to CIFAR-100 without retraining the topology. Only the network weights were retrained. The genome's learned connectivity pattern transferred across tasks and still beat random sparse wiring by +7.01%.
+
+ ![Topology Convergence](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig4_topology_convergence.png)
+
+ ## Transformer Attention Patterns
+
+ The genome also works on transformers. On IMDB sentiment analysis, the grown transformer beats both random sparse and dense baselines.
+
+ ![Transformer Heatmap](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig6_transformer_heatmap.png)
+
+ ## How to Use
+
+ ```python
+ import torch
+ from genome.model import Genome, GrownNetwork, GrownConvNetwork, GrownTransformer
+
+ # --- MLP (MNIST) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=6)
+ genome.load_state_dict(torch.load("genome_mnist.pt", weights_only=True))
+ model = GrownNetwork(genome, input_dim=784, hidden_bands=[48, 48, 48, 48], output_dim=10)
+
+ # --- MLP (CIFAR-10) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=6)
+ genome.load_state_dict(torch.load("genome_cifar10_mlp.pt", weights_only=True))
+ model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=10)
+
+ # --- CNN (CIFAR-10) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=8)
+ genome.load_state_dict(torch.load("genome_cifar10_cnn.pt", weights_only=True))
+ model = GrownConvNetwork(genome, num_classes=10)
+
+ # --- Transformer (IMDB) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=8)
+ genome.load_state_dict(torch.load("genome_transformer.pt", weights_only=True))
+ model = GrownTransformer(genome, vocab_size=20000, embed_dim=128, num_heads=4, num_layers=2, num_classes=2)
+
+ # --- Video Transformer (Moving MNIST) ---
+ from experiments.rung4_video import SpatiotemporalGenome, GenomeVideoTransformer
+ stg = SpatiotemporalGenome()
+ stg.load_state_dict(torch.load("genome_video.pt", weights_only=True))
+ model = GenomeVideoTransformer(stg, d_model=64, nhead=4, num_layers=2, n_frames=10, patch_size=8, img_size=64)
+
+ # --- Transfer (CIFAR-10 genome -> CIFAR-100) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=6)
+ genome.load_state_dict(torch.load("genome_cifar100_fresh.pt", weights_only=True))
+ model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=100)
+ ```
+
+ ## Links
+
+ - **Paper**: [Zenodo (DOI: 10.5281/zenodo.19248389)](https://doi.org/10.5281/zenodo.19248389)
+ - **Code**: [github.com/tejassudsfp/ndna](https://github.com/tejassudsfp/ndna)
+ - **Author**: [Tejas Parthasarathi Sudarshan](https://tejassuds.com) (tejas@fandesk.ai)
+
+ ## Citation
+
+ ```bibtex
+ @article{sudarshan2026ndna,
+ title={Neural DNA: A Compact Genome for Growing Network Architecture},
+ author={Sudarshan, Tejas Parthasarathi},
+ year={2026},
+ doi={10.5281/zenodo.19248389}
+ }
+ ```
+
+ ## License
+
+ [MIT](https://opensource.org/licenses/MIT)