Neural DNA (NDNA): A Compact Genome for Growing Network Architecture

A tiny learned genome (< 400 parameters) that grows neural network topology through developmental rules: default disconnected, type-based compatibility, and metabolic cost pressure. The genome discovers useful sparse connectivity that beats random wiring on every experiment (by margins of +0.39% to +21.7%), matches or exceeds dense baselines on most, and transfers across tasks without retraining.


Method Overview

What is NDNA?

Neural networks typically use fixed, fully-connected layers. NDNA asks: what if a small "genome" could learn which connections should exist?

The genome encodes cell type embeddings and a compatibility matrix. During growth, it compares source and target types for every potential connection and decides whether to wire it or not. A metabolic cost penalty forces selectivity, so only useful connections survive.

The result: 226 to 374 genome parameters control up to 2.2 million connections (8,384:1 compression on our benchmarks, likely higher on larger networks). The grown networks are sparse but structured, and they consistently beat randomly-wired sparse networks.

How It Works

  1. Genome encodes cell type embeddings (8 types, 8 dimensions) and a compatibility matrix
  2. Growth: for each potential connection, source and target type embeddings are compared via the compatibility matrix to produce a connection probability
  3. Binary mask: probabilities are thresholded to produce hard 0/1 masks (straight-through estimator for gradient flow)
  4. Metabolic cost: a sparsity loss penalizes total connection strength, forcing the genome to be selective
  5. Default disconnected: compatibility is initialized negative, so the genome must actively grow every connection

The genome and network weights are trained jointly with standard backpropagation.

Compression Ratios

Key Results

Genome vs Random Sparsity

| Experiment | Genome | Random Sparse | Dense Baseline | Genome vs Random |
|---|---|---|---|---|
| MNIST (MLP) | 97.54% | 97.09% | 98.33% | +0.45% |
| CIFAR-10 (MLP) | 57.14% | 51.68% | 54.32% | +5.46% |
| CIFAR-10 (CNN) | 88.93% | 85.78% | 89.80% | +3.15% |
| CIFAR-100 (Transfer) | 60.92% | 53.91% | 67.16% | +7.01% |
| IMDB (Transformer) | 85.05% | 84.66% | 84.57% | +0.39% |
| Moving MNIST (Video)* | 62.23 | 79.44 | 62.15 | +21.7% |

*Moving MNIST uses MSE (lower is better). The +21.7% is relative improvement.
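Since MSE is lower-is-better, the Moving MNIST entry is reported as relative error reduction rather than a raw difference. Using the values from the table:

```python
# Relative improvement for Moving MNIST (lower MSE is better),
# computed from the table values above.
genome_mse, random_mse = 62.23, 79.44
rel_improvement = (random_mse - genome_mse) / random_mse * 100
print(f"{rel_improvement:.1f}%")  # → 21.7%
```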

The genome beats random sparse wiring on every experiment. The largest gap is on video prediction (+21.7%), where random wiring completely falls apart but genome-grown wiring matches the dense baseline.

Genome vs Dense

Video: Factored Spatiotemporal Genome

The video experiment uses a factored genome: temporal (74 params) + spatial (74 params) + depth (226 params) = 374 total. The temporal genome discovers temporal recency (recent frames get strong connections, distant frames get almost none). The spatial genome discovers spatial locality (nearby patches connect strongly, distant patches barely connect).
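To make "temporal recency" concrete, the pattern the temporal genome learns looks like a decaying connection mask over frame lags. The sketch below is purely illustrative: the exponential form, the 0.8 decay rate, and the 0.1 threshold are made-up stand-ins, not values read from the trained genome.

```python
import math

n_frames = 10  # matches the Moving MNIST setup (n_frames=10)

# Hypothetical recency pattern: lag 0 is the most recent frame.
probs = [math.exp(-0.8 * lag) for lag in range(n_frames)]

# Threshold into hard connections: recent frames survive,
# distant frames are pruned.
mask = [1 if p > 0.1 else 0 for p in probs]
```

The spatial genome's locality pattern is analogous, with decay over patch distance instead of frame lag.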

Temporal Mask

Spatial Decay

Pre-trained Genomes

These are the trained genome files. Each genome is tiny but controls the full network topology.

| File | Architecture | Task | Params | Connections | Compression | Result |
|---|---|---|---|---|---|---|
| genome_mnist.pt | MLP | MNIST | 226 | 174,240 | 770:1 | 97.54% |
| genome_cifar10_mlp.pt | MLP | CIFAR-10 | 226 | 1,706,240 | 7,553:1 | 57.14% |
| genome_cifar10_cnn.pt | CNN | CIFAR-10 | 258 | 165,888 | 643:1 | 88.93% |
| genome_cifar100_fresh.pt | MLP | CIFAR-100 (transfer) | 226 | 1,706,240 | 7,553:1 | 60.92% |
| genome_transformer.pt | Transformer | IMDB | 258 | 2,162,688 | 8,384:1 | 85.05% |
| genome_video.pt | Video Transformer | Moving MNIST | 374 | 307,300 | 821:1 | MSE 62.23 |

Cross-Task Transfer

The CIFAR-100 genome was not trained on CIFAR-100. It is the CIFAR-10 genome applied directly to CIFAR-100 without retraining the topology. Only the network weights were retrained. The genome's learned connectivity pattern transferred across tasks and still beat random sparse wiring by +7.01%.

Topology Convergence

Transformer Attention Patterns

The genome also works on transformers. On IMDB sentiment analysis, the grown transformer beats both random sparse and dense baselines.

Transformer Heatmap

How to Use

import torch
from genome.model import Genome, GrownNetwork, GrownConvNetwork, GrownTransformer

# --- MLP (MNIST) ---
genome = Genome(n_types=8, type_dim=8, n_bands=6)
genome.load_state_dict(torch.load("genome_mnist.pt", weights_only=True))
model = GrownNetwork(genome, input_dim=784, hidden_bands=[48, 48, 48, 48], output_dim=10)

# --- MLP (CIFAR-10) ---
genome = Genome(n_types=8, type_dim=8, n_bands=6)
genome.load_state_dict(torch.load("genome_cifar10_mlp.pt", weights_only=True))
model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=10)

# --- CNN (CIFAR-10) ---
genome = Genome(n_types=8, type_dim=8, n_bands=8)
genome.load_state_dict(torch.load("genome_cifar10_cnn.pt", weights_only=True))
model = GrownConvNetwork(genome, num_classes=10)

# --- Transformer (IMDB) ---
genome = Genome(n_types=8, type_dim=8, n_bands=8)
genome.load_state_dict(torch.load("genome_transformer.pt", weights_only=True))
model = GrownTransformer(genome, vocab_size=20000, embed_dim=128, num_heads=4, num_layers=2, num_classes=2)

# --- Video Transformer (Moving MNIST) ---
from experiments.rung4_video import SpatiotemporalGenome, GenomeVideoTransformer
stg = SpatiotemporalGenome()
stg.load_state_dict(torch.load("genome_video.pt", weights_only=True))
model = GenomeVideoTransformer(stg, d_model=64, nhead=4, num_layers=2, n_frames=10, patch_size=8, img_size=64)

# --- Transfer (CIFAR-10 genome -> CIFAR-100) ---
genome = Genome(n_types=8, type_dim=8, n_bands=6)
genome.load_state_dict(torch.load("genome_cifar100_fresh.pt", weights_only=True))
model = GrownNetwork(genome, input_dim=3072, hidden_bands=[128, 128, 128, 128], output_dim=100)

Links

Citation

@article{sudarshan2026ndna,
  title={Neural DNA: A Compact Genome for Growing Network Architecture},
  author={Sudarshan, Tejas Parthasarathi},
  year={2026},
  doi={10.5281/zenodo.19248389}
}

License

MIT
