Neural DNA (NDNA): A Compact Genome for Growing Network Architecture
A tiny learned genome (under 400 parameters) that grows neural network topology through developmental rules: default-disconnected wiring, type-based compatibility, and metabolic cost pressure. The genome discovers useful sparse connectivity that beats random wiring on every experiment (by +0.39% to +21.7%), matches or exceeds dense baselines on most, and transfers across tasks without retraining the topology.
What is NDNA?
Neural networks typically use fixed, fully-connected layers. NDNA asks: what if a small "genome" could learn which connections should exist?
The genome encodes cell type embeddings and a compatibility matrix. During growth, it compares source and target types for every potential connection and decides whether to wire it or not. A metabolic cost penalty forces selectivity, so only useful connections survive.
The result: 226 to 374 genome parameters control up to 2.2 million connections (up to 8,384:1 compression on our benchmarks, likely higher on larger networks). The grown networks are sparse but structured, and they consistently beat randomly-wired sparse networks.
How It Works
- Genome encodes cell type embeddings (8 types, 8 dimensions) and a compatibility matrix
- Growth: for each potential connection, source and target type embeddings are compared via the compatibility matrix to produce a connection probability
- Binary mask: probabilities are thresholded to produce hard 0/1 masks (straight-through estimator for gradient flow)
- Metabolic cost: a sparsity loss penalizes total connection strength, forcing the genome to be selective
- Default disconnected: compatibility is initialized negative, so the genome must actively grow every connection
The genome and network weights are trained jointly with standard backpropagation.
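The steps above can be sketched in a few lines of PyTorch. This is a hedged illustration, not the repository's implementation: the class name, the -2.0 bias initialization, and the 0.5 threshold are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class TinyGenome(nn.Module):
    """Illustrative growth rule: type embeddings + compatibility matrix."""
    def __init__(self, n_types=8, type_dim=8):
        super().__init__()
        self.type_emb = nn.Parameter(torch.randn(n_types, type_dim) * 0.1)
        self.compat = nn.Parameter(torch.randn(type_dim, type_dim) * 0.1)
        # Negative bias => default disconnected: connections must be grown.
        self.bias = nn.Parameter(torch.tensor(-2.0))

    def connection_probs(self, src_types, dst_types):
        # score[i, j] = e_src[i]^T C e_dst[j] + bias
        e_src = self.type_emb[src_types]            # (S, D)
        e_dst = self.type_emb[dst_types]            # (T, D)
        scores = e_src @ self.compat @ e_dst.T + self.bias
        return torch.sigmoid(scores)

    def grow_mask(self, src_types, dst_types, threshold=0.5):
        p = self.connection_probs(src_types, dst_types)
        hard = (p > threshold).float()
        # Straight-through estimator: forward pass uses the hard 0/1 mask,
        # backward pass routes gradients through the probabilities.
        return hard + p - p.detach()

genome = TinyGenome()
src = torch.randint(0, 8, (48,))   # source cell types of one band
dst = torch.randint(0, 8, (48,))   # target cell types of the next band
mask = genome.grow_mask(src, dst)

# Metabolic cost: penalize total connection probability (sparsity pressure).
metabolic_loss = genome.connection_probs(src, dst).mean()
```

The hard mask gates the weight matrix of the corresponding layer; the metabolic term is added to the task loss so that only connections that pay for themselves survive.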
Key Results
| Experiment | Genome | Random Sparse | Dense Baseline | Genome vs Random |
|---|---|---|---|---|
| MNIST (MLP) | 97.54% | 97.09% | 98.33% | +0.45% |
| CIFAR-10 (MLP) | 57.14% | 51.68% | 54.32% | +5.46% |
| CIFAR-10 (CNN) | 88.93% | 85.78% | 89.80% | +3.15% |
| CIFAR-100 (Transfer) | 60.92% | 53.91% | 67.16% | +7.01% |
| IMDB (Transformer) | 85.05% | 84.66% | 84.57% | +0.39% |
| Moving MNIST (Video)* | 62.23 | 79.44 | 62.15 | +21.7% |
*Moving MNIST uses MSE (lower is better). The +21.7% is relative improvement.
The genome beats random sparse wiring on every experiment. The largest gap is on video prediction (+21.7%), where random wiring completely falls apart but genome-grown wiring matches the dense baseline.
Video: Factored Spatiotemporal Genome
The video experiment uses a factored genome: temporal (74 params) + spatial (74 params) + depth (226 params) = 374 total. The temporal genome discovers temporal recency (recent frames get strong connections, distant frames get almost none). The spatial genome discovers spatial locality (nearby patches connect strongly, distant patches barely connect).
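To see why factoring helps, here is a toy sketch in which each factor is a dense logit table and the factors combine additively in logit space. Both choices are assumptions for illustration: the paper's factors are themselves tiny genomes (74 to 226 parameters each), so this sketch only shows the parameter-count argument, not the actual parameterization.

```python
import torch

# Toy factored scoring for video tokens identified by (frame f, patch p).
n_frames, n_patches = 10, 64
temporal = torch.randn(n_frames, n_frames)   # frame-to-frame logits
spatial = torch.randn(n_patches, n_patches)  # patch-to-patch logits

def connect_prob(f_src, p_src, f_dst, p_dst):
    # Assumption: temporal and spatial factors add in logit space.
    return torch.sigmoid(temporal[f_src, f_dst] + spatial[p_src, p_dst])

# A full per-token-pair table would need (10 * 64)^2 = 409,600 entries;
# the factored form needs only 10^2 + 64^2 = 4,196.
full_entries = (n_frames * n_patches) ** 2
factored_entries = n_frames ** 2 + n_patches ** 2
p = connect_prob(0, 0, 9, 63)
```

The same factorization is what lets the learned structure be inspected: the temporal table alone reveals the recency pattern, and the spatial table alone reveals locality.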
Pre-trained Genomes
These are the trained genome files. Each genome is tiny but controls the full network topology.
| File | Architecture | Task | Params | Connections | Compression | Result |
|---|---|---|---|---|---|---|
| genome_mnist.pt | MLP | MNIST | 226 | 174,240 | 770:1 | 97.54% |
| genome_cifar10_mlp.pt | MLP | CIFAR-10 | 226 | 1,706,240 | 7,553:1 | 57.14% |
| genome_cifar10_cnn.pt | CNN | CIFAR-10 | 258 | 165,888 | 643:1 | 88.93% |
| genome_cifar100_fresh.pt | MLP | CIFAR-100 (transfer) | 226 | 1,706,240 | 7,553:1 | 60.92% |
| genome_transformer.pt | Transformer | IMDB | 258 | 2,162,688 | 8,384:1 | 85.05% |
| genome_video.pt | Video Transformer | Moving MNIST | 374 | 307,300 | 821:1 | MSE 62.23 |
Cross-Task Transfer
The CIFAR-100 genome was not trained on CIFAR-100. It is the CIFAR-10 genome applied directly to CIFAR-100 without retraining the topology. Only the network weights were retrained. The genome's learned connectivity pattern transferred across tasks and still beat random sparse wiring by +7.01%.
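The recipe is: keep the grown mask fixed, retrain only the weights. A self-contained toy version of that mechanic (the MaskedLinear class and all shapes here are illustrative assumptions, not the repository's API):

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """The grown mask is a frozen buffer; only the weights train."""
    def __init__(self, mask):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(mask.shape) * 0.01)
        self.register_buffer("mask", mask)  # topology: not a Parameter

    def forward(self, x):
        return x @ (self.weight * self.mask).T

mask = (torch.rand(100, 3072) > 0.9).float()  # stand-in for a grown mask
layer = MaskedLinear(mask)
x, y = torch.randn(4, 3072), torch.randn(4, 100)
loss = ((layer(x) - y) ** 2).mean()
loss.backward()
# Masked-out entries receive zero gradient, so the transferred
# topology is preserved while the weights adapt to the new task.
```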
Transformer Attention Patterns
The genome also works on transformers. On IMDB sentiment analysis, the grown transformer beats both random sparse and dense baselines.
How to Use
```python
import torch
from genome.model import Genome, GrownNetwork, GrownConvNetwork, GrownTransformer

# --- MLP (MNIST) ---
genome = Genome(n_types=8, type_dim=8, n_bands=6)
genome.load_state_dict(torch.load("genome_mnist.pt", weights_only=True))
model = GrownNetwork(genome, input_dim=784, hidden_bands=[48, 48, 48, 48], output_dim=10)

# --- MLP (CIFAR-10) ---
genome = Genome(n_types=8, type_dim=8, n_bands=6)
genome.load_state_dict(torch.load("genome_cifar10_mlp.pt", weights_only=True))
model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=10)

# --- CNN (CIFAR-10) ---
genome = Genome(n_types=8, type_dim=8, n_bands=8)
genome.load_state_dict(torch.load("genome_cifar10_cnn.pt", weights_only=True))
model = GrownConvNetwork(genome, num_classes=10)

# --- Transformer (IMDB) ---
genome = Genome(n_types=8, type_dim=8, n_bands=8)
genome.load_state_dict(torch.load("genome_transformer.pt", weights_only=True))
model = GrownTransformer(genome, vocab_size=20000, embed_dim=128, num_heads=4, num_layers=2, num_classes=2)

# --- Video Transformer (Moving MNIST) ---
from experiments.rung4_video import SpatiotemporalGenome, GenomeVideoTransformer
stg = SpatiotemporalGenome()
stg.load_state_dict(torch.load("genome_video.pt", weights_only=True))
model = GenomeVideoTransformer(stg, d_model=64, nhead=4, num_layers=2, n_frames=10, patch_size=8, img_size=64)

# --- Transfer (CIFAR-10 genome -> CIFAR-100) ---
genome = Genome(n_types=8, type_dim=8, n_bands=6)
genome.load_state_dict(torch.load("genome_cifar100_fresh.pt", weights_only=True))
model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=100)
```
Links
- Paper: Zenodo (DOI: 10.5281/zenodo.19248389)
- Code: github.com/tejassudsfp/ndna
- Author: Tejas Parthasarathi Sudarshan (tejas@fandesk.ai)
Citation
@article{sudarshan2026ndna,
title={Neural DNA: A Compact Genome for Growing Network Architecture},
author={Sudarshan, Tejas Parthasarathi},
year={2026},
doi={10.5281/zenodo.19248389}
}