tejassuds committed · Commit 4e27a02 · verified · 1 Parent(s): a5cf928

Update README: add video results, new DOI, fix genome table

Files changed (1)
  1. README.md +157 -132
README.md CHANGED
@@ -1,132 +1,157 @@
- ---
- license: mit
- tags:
- - sparse-networks
- - neural-architecture-search
- - network-growing
- - genome
- - topology-learning
- - pytorch
- datasets:
- - mnist
- - cifar10
- - cifar100
- - imdb
- pipeline_tag: other
- ---
-
- # Neural DNA (NDNA): A Compact Genome for Growing Network Architecture
-
- A tiny learned genome (< 300 parameters) that grows neural network topology through developmental rules. Default disconnected, type-based compatibility, metabolic cost pressure. The genome discovers useful sparse connectivity that beats random wiring on every experiment, matches or exceeds dense baselines on most, and transfers across tasks without retraining.
-
- [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19230474.svg)](https://doi.org/10.5281/zenodo.19230474)
-
- ![Method Overview](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig1_method_overview.png)
-
- ## What is NDNA?
-
- Neural networks typically use fixed, fully-connected layers. NDNA asks: what if a small "genome" could learn *which* connections should exist?
-
- The genome encodes cell type embeddings and a compatibility matrix. During growth, it compares source and target types for every potential connection and decides whether to wire it or not. A metabolic cost penalty forces selectivity, so only useful connections survive.
-
- **The result:** 226 to 258 genome parameters control up to 2.2 million connections (8,384:1 compression on our benchmarks, likely higher on larger networks). The grown networks are sparse but structured, and they consistently beat randomly-wired sparse networks.
-
- ## How It Works
-
- 1. **Genome** encodes cell type embeddings (8 types, 8 dimensions) and a compatibility matrix
- 2. **Growth**: for each potential connection, source and target type embeddings are compared via the compatibility matrix to produce a connection probability
- 3. **Binary mask**: probabilities are thresholded to produce hard 0/1 masks (straight-through estimator for gradient flow)
- 4. **Metabolic cost**: a sparsity loss penalizes total connection strength, forcing the genome to be selective
- 5. **Default disconnected**: compatibility is initialized negative, so the genome must actively grow every connection
-
- The genome and network weights are trained jointly with standard backpropagation.
-
- ![Compression Ratios](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig2_compression.png)
-
- ## Key Results
-
- ![Genome vs Random Sparsity](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig3_genome_vs_random.png)
-
- | Experiment | Genome | Random Sparse | Dense Baseline | Genome vs Random |
- |---|---|---|---|---|
- | MNIST (MLP) | 97.54% | 97.09% | 98.33% | +0.45% |
- | CIFAR-10 (MLP) | 57.14% | 51.68% | 54.32% | +5.46% |
- | CIFAR-10 (CNN) | 88.93% | 85.78% | 89.80% | +3.15% |
- | CIFAR-100 (Transfer) | 60.92% | 53.91% | 67.16% | +7.01% |
- | IMDB (Transformer) | 85.05% | 84.66% | 84.57% | +0.39% |
-
- The genome beats random sparse wiring on every experiment. On CIFAR-10 MLP and CIFAR-100 transfer, the genome even beats the dense baseline.
-
- ![Genome vs Dense](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig5_genome_vs_dense.png)
-
- ## Pre-trained Genomes
-
- These are the trained genome files. Each genome is tiny (< 300 parameters) but controls the full network topology.
-
- | File | Architecture | Task | Params | Connections | Compression | Accuracy |
- |---|---|---|---|---|---|---|
- | `genome_mnist_mlp.pt` | MLP (GrownNetwork) | MNIST | 226 | 1,894,784 | 8,384:1 | 97.54% |
- | `genome_cifar10_mlp.pt` | MLP (GrownNetwork) | CIFAR-10 | 226 | 1,894,784 | 8,384:1 | 57.14% |
- | `genome_cifar10_cnn.pt` | CNN (GrownConvNetwork) | CIFAR-10 | 258 | 165,888 | 643:1 | 88.93% |
- | `genome_cifar100_transfer.pt` | MLP (GrownNetwork) | CIFAR-100 (transferred from CIFAR-10) | 226 | 1,894,784 | 8,384:1 | 60.92% |
- | `genome_imdb_transformer.pt` | Transformer (GrownTransformer) | IMDB Sentiment | 226 | 786,432 | 3,479:1 | 85.05% |
-
- ## Cross-Task Transfer
-
- The CIFAR-100 genome was not trained on CIFAR-100. It is the CIFAR-10 genome applied directly to CIFAR-100 without retraining the topology. Only the network weights were retrained. The genome's learned connectivity pattern transferred across tasks and still beat random sparse wiring by +7.01%.
-
- ![Topology Convergence](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig4_topology_convergence.png)
-
- ## Transformer Attention Patterns
-
- The genome also works on transformers. On IMDB sentiment analysis, the grown transformer beats both random sparse and dense baselines.
-
- ![Transformer Heatmap](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig6_transformer_heatmap.png)
-
- ## How to Use
-
- ```python
- import torch
- from genome.model import Genome, GrownNetwork, GrownConvNetwork, GrownTransformer
-
- # --- MLP (MNIST or CIFAR-10) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=3)
- genome.load_state_dict(torch.load("genome_mnist_mlp.pt", weights_only=True))
- model = GrownNetwork(genome, input_dim=784, hidden_dim=1024, output_dim=10)
-
- # --- CNN (CIFAR-10) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=4)
- genome.load_state_dict(torch.load("genome_cifar10_cnn.pt", weights_only=True))
- model = GrownConvNetwork(genome, num_classes=10)
-
- # --- Transformer (IMDB) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=3)
- genome.load_state_dict(torch.load("genome_imdb_transformer.pt", weights_only=True))
- model = GrownTransformer(genome, vocab_size=20000, embed_dim=128, num_heads=4, num_layers=2, num_classes=2)
-
- # --- Transfer (CIFAR-10 genome -> CIFAR-100) ---
- genome = Genome(n_types=8, type_dim=8, n_bands=3)
- genome.load_state_dict(torch.load("genome_cifar100_transfer.pt", weights_only=True))
- model = GrownNetwork(genome, input_dim=3072, hidden_dim=1024, output_dim=100)
- ```
-
- ## Links
-
- - **Paper**: [Zenodo (DOI: 10.5281/zenodo.19230474)](https://doi.org/10.5281/zenodo.19230474)
- - **Code**: [github.com/tejassudsfp/ndna](https://github.com/tejassudsfp/ndna)
- - **Author**: [Tejas Parthasarathi Sudarshan](https://tejassuds.com) (tejas@fandesk.ai)
-
- ## Citation
-
- ```bibtex
- @article{sudarshan2026ndna,
- title={Neural DNA: A Compact Genome for Growing Network Architecture},
- author={Sudarshan, Tejas Parthasarathi},
- year={2026},
- doi={10.5281/zenodo.19230474}
- }
- ```
-
- ## License
-
- [MIT](https://opensource.org/licenses/MIT)
+ ---
+ license: mit
+ tags:
+ - sparse-networks
+ - neural-architecture-search
+ - network-growing
+ - genome
+ - topology-learning
+ - pytorch
+ - video-transformers
+ datasets:
+ - mnist
+ - cifar10
+ - cifar100
+ - imdb
+ - moving-mnist
+ pipeline_tag: other
+ ---
+
+ # Neural DNA (NDNA): A Compact Genome for Growing Network Architecture
+
+ A tiny learned genome (< 400 parameters) that grows neural network topology through developmental rules: connections default to disconnected, growth is governed by type-based compatibility, and a metabolic cost keeps wiring under pressure. The genome discovers useful sparse connectivity that beats random wiring on every experiment (by 0.39% to 21.7%), matches or exceeds dense baselines on most, and transfers across tasks without retraining.
+
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19248389.svg)](https://doi.org/10.5281/zenodo.19248389)
+
+ ![Method Overview](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig1_method_overview.png)
+
+ ## What is NDNA?
+
+ Neural networks typically use fixed, fully-connected layers. NDNA asks: what if a small "genome" could learn *which* connections should exist?
+
+ The genome encodes cell type embeddings and a compatibility matrix. During growth, it compares source and target types for every potential connection and decides whether to wire it. A metabolic cost penalty forces selectivity, so only useful connections survive.
+
+ **The result:** 226 to 374 genome parameters control up to 2.2 million connections (8,384:1 compression on our benchmarks, likely higher on larger networks). The grown networks are sparse but structured, and they consistently beat randomly wired sparse networks.
+
+ ## How It Works
+
+ 1. **Genome** encodes cell type embeddings (8 types, 8 dimensions) and a compatibility matrix
+ 2. **Growth**: for each potential connection, source and target type embeddings are compared via the compatibility matrix to produce a connection probability
+ 3. **Binary mask**: probabilities are thresholded to produce hard 0/1 masks (straight-through estimator for gradient flow)
+ 4. **Metabolic cost**: a sparsity loss penalizes total connection strength, forcing the genome to be selective
+ 5. **Default disconnected**: compatibility is initialized negative, so the genome must actively grow every connection
+
+ The genome and network weights are trained jointly with standard backpropagation.
+
+ ![Compression Ratios](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig2_compression.png)
+
+ ## Key Results
+
+ ![Genome vs Random Sparsity](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig3_genome_vs_random.png)
+
+ | Experiment | Genome | Random Sparse | Dense Baseline | Genome vs Random |
+ |---|---|---|---|---|
+ | MNIST (MLP) | 97.54% | 97.09% | 98.33% | +0.45% |
+ | CIFAR-10 (MLP) | 57.14% | 51.68% | 54.32% | +5.46% |
+ | CIFAR-10 (CNN) | 88.93% | 85.78% | 89.80% | +3.15% |
+ | CIFAR-100 (Transfer) | 60.92% | 53.91% | 67.16% | +7.01% |
+ | IMDB (Transformer) | 85.05% | 84.66% | 84.57% | +0.39% |
+ | Moving MNIST (Video)* | 62.23 | 79.44 | 62.15 | +21.7% |
+
+ *\*Moving MNIST reports MSE (lower is better); its +21.7% is the relative MSE reduction versus random sparse.*
+
+ The genome beats random sparse wiring on every experiment. The largest gap is on video prediction (+21.7%), where random wiring falls apart while genome-grown wiring matches the dense baseline.
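The "Genome vs Random" column mixes two conventions, which the arithmetic makes explicit (values taken from the table):

```python
# Accuracy rows report an absolute percentage-point gap; the Moving MNIST
# row reports a relative MSE reduction, since lower MSE is better.
acc_gap = 57.14 - 51.68              # CIFAR-10 MLP: +5.46 points
mse_gain = (79.44 - 62.23) / 79.44   # Moving MNIST: ~21.7% relative
print(round(acc_gap, 2), round(100 * mse_gain, 1))
```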
+
+ ![Genome vs Dense](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig5_genome_vs_dense.png)
+
+ ## Video: Factored Spatiotemporal Genome
+
+ The video experiment uses a factored genome: temporal (74 params) + spatial (74 params) + depth (226 params) = 374 total. The temporal genome discovers temporal recency (recent frames get strong connections, distant frames get almost none). The spatial genome discovers spatial locality (nearby patches connect strongly, distant patches barely connect).
+
+ ![Temporal Mask](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig7_temporal_mask.png)
+
+ ![Spatial Decay](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig8_spatial_decay.png)
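A factored mask like the one described above can be sketched as an outer combination of a frame-to-frame factor and a patch-to-patch factor. Fixed decay constants stand in here for the learned temporal and spatial genomes; shapes follow the Moving MNIST setup (10 frames, 64x64 images, 8x8 patches):

```python
import torch

n_frames, grid = 10, 8                 # 8x8 patch grid -> 64 patches per frame
n_patches = grid * grid

# Temporal factor: recency decay over frame distance (illustrative constants).
t = torch.arange(n_frames, dtype=torch.float32)
temporal = torch.sigmoid(3.0 - (t[None, :] - t[:, None]).abs())   # (T, T)

# Spatial factor: locality decay over patch distance on the 2D grid.
ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (P, 2)
spatial = torch.sigmoid(2.0 - torch.cdist(coords, coords))          # (P, P)

# Combine factors over every (frame, patch) token pair and threshold,
# giving one mask over all T*P spatiotemporal tokens.
full = torch.einsum("ab,cd->acbd", temporal, spatial)
full = full.reshape(n_frames * n_patches, n_frames * n_patches)
mask = (full > 0.25).float()
sparsity = 1 - mask.mean().item()   # fraction of attention edges pruned
```

The product structure is what makes the genome so small: two tiny factors describe a 640x640 attention mask.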
+
+ ## Pre-trained Genomes
+
+ These are the trained genome files. Each genome is tiny but controls the full network topology.
+
+ | File | Architecture | Task | Params | Connections | Compression | Result |
+ |---|---|---|---|---|---|---|
+ | `genome_mnist.pt` | MLP | MNIST | 226 | 174,240 | 770:1 | 97.54% |
+ | `genome_cifar10_mlp.pt` | MLP | CIFAR-10 | 226 | 1,706,240 | 7,553:1 | 57.14% |
+ | `genome_cifar10_cnn.pt` | CNN | CIFAR-10 | 258 | 165,888 | 643:1 | 88.93% |
+ | `genome_cifar100_fresh.pt` | MLP | CIFAR-100 (transfer) | 226 | 1,706,240 | 7,553:1 | 60.92% |
+ | `genome_transformer.pt` | Transformer | IMDB | 258 | 2,162,688 | 8,384:1 | 85.05% |
+ | `genome_video.pt` | Video Transformer | Moving MNIST | 374 | 307,300 | 821:1 | MSE 62.23 |
+
+ ## Cross-Task Transfer
+
+ The CIFAR-100 genome was not trained on CIFAR-100. It is the CIFAR-10 genome applied directly to CIFAR-100 without retraining the topology. Only the network weights were retrained. The genome's learned connectivity pattern transferred across tasks and still beat random sparse wiring by +7.01%.
+
+ ![Topology Convergence](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig4_topology_convergence.png)
+
+ ## Transformer Attention Patterns
+
+ The genome also works on transformers. On IMDB sentiment analysis, the grown transformer beats both random sparse and dense baselines.
+
+ ![Transformer Heatmap](https://raw.githubusercontent.com/tejassudsfp/ndna/main/figures/fig6_transformer_heatmap.png)
+
+ ## How to Use
+
+ ```python
+ import torch
+ from genome.model import Genome, GrownNetwork, GrownConvNetwork, GrownTransformer
+
+ # --- MLP (MNIST) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=6)
+ genome.load_state_dict(torch.load("genome_mnist.pt", weights_only=True))
+ model = GrownNetwork(genome, input_dim=784, hidden_bands=[48, 48, 48, 48], output_dim=10)
+
+ # --- MLP (CIFAR-10) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=6)
+ genome.load_state_dict(torch.load("genome_cifar10_mlp.pt", weights_only=True))
+ model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=10)
+
+ # --- CNN (CIFAR-10) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=8)
+ genome.load_state_dict(torch.load("genome_cifar10_cnn.pt", weights_only=True))
+ model = GrownConvNetwork(genome, num_classes=10)
+
+ # --- Transformer (IMDB) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=8)
+ genome.load_state_dict(torch.load("genome_transformer.pt", weights_only=True))
+ model = GrownTransformer(genome, vocab_size=20000, embed_dim=128, num_heads=4, num_layers=2, num_classes=2)
+
+ # --- Video Transformer (Moving MNIST) ---
+ from experiments.rung4_video import SpatiotemporalGenome, GenomeVideoTransformer
+ stg = SpatiotemporalGenome()
+ stg.load_state_dict(torch.load("genome_video.pt", weights_only=True))
+ model = GenomeVideoTransformer(stg, d_model=64, nhead=4, num_layers=2, n_frames=10, patch_size=8, img_size=64)
+
+ # --- Transfer (CIFAR-10 genome -> CIFAR-100) ---
+ genome = Genome(n_types=8, type_dim=8, n_bands=6)
+ genome.load_state_dict(torch.load("genome_cifar100_fresh.pt", weights_only=True))
+ model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=100)
+ ```
+
+ ## Links
+
+ - **Paper**: [Zenodo (DOI: 10.5281/zenodo.19248389)](https://doi.org/10.5281/zenodo.19248389)
+ - **Code**: [github.com/tejassudsfp/ndna](https://github.com/tejassudsfp/ndna)
+ - **Author**: [Tejas Parthasarathi Sudarshan](https://tejassuds.com) (tejas@fandesk.ai)
+
+ ## Citation
+
+ ```bibtex
+ @article{sudarshan2026ndna,
+ title={Neural DNA: A Compact Genome for Growing Network Architecture},
+ author={Sudarshan, Tejas Parthasarathi},
+ year={2026},
+ doi={10.5281/zenodo.19248389}
+ }
+ ```
+
+ ## License
+
+ [MIT](https://opensource.org/licenses/MIT)