	---
	language: en
	license: mit
	library_name: pytorch
	tags:
	- vae
	- predictive-coding
	- neuroscience
	- gabor-splatting
	- computer-vision
	- biology
	datasets:
	- nielsr/flowers-102
	---

	# Neuro Splat — an artificial visual cortex (Gabor wave-packet VAE)

	PerceptionLab / Antti Luode, with Claude (Opus 4.8), in dialogue with Gemini. Helsinki, June 2026.

	> Do not hype. Do not lie. Just show.

	An image is not a million pixels. It is a smooth map and a scatter of small bright wave-packets dropped where the detail lives.

	---

	## What this model is

	This repo holds the weights (`model.pt`) for a VAE whose decoder is a differentiable Gabor wave-packet splatter. It is not a pixel-generating image model; it is a biologically motivated representation experiment: an architecture that mirrors how V1 is thought to code images (as localized Gabor atoms), used to test predictive coding and phase locking.

	- Encoder (CNN): image → a latent "concept".
	- Decoder (MLP): latent → the parameters of 512 localized Gabor atoms — position, scale, orientation, frequency, and a complex `(a, b)` coefficient per colour channel.
	- Renderer: splats those wave-packets onto the canvas; the image is their sum.
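
The last stage above is a plain sum of wave-packets, which can be sketched in a few lines of NumPy for one colour channel. This is a sketch under assumptions, not the code in `splat_generator.py` (which renders differentiably in PyTorch): the isotropic Gaussian envelope, the column layout of `params`, and the name `render_packets` are all illustrative.

```python
import numpy as np

def render_packets(params, size=32):
    """Sum Gabor wave-packets onto a size x size canvas (one channel).

    params: array of shape (N, 7) with columns
            cx, cy, sigma, theta, freq, a, b
            (hypothetical layout; positions and sigma in pixels,
            freq in cycles per pixel).
    """
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float64)
    canvas = np.zeros((size, size))
    for cx, cy, sigma, theta, freq, a, b in params:
        dx, dy = xs - cx, ys - cy
        # Gaussian envelope: localizes the packet around (cx, cy).
        envelope = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
        # Carrier phase measured along the packet's orientation.
        u = dx * np.cos(theta) + dy * np.sin(theta)
        phase = 2.0 * np.pi * freq * u
        # Assumed convention: a weights the cosine part, b the sine part.
        canvas += envelope * (a * np.cos(phase) + b * np.sin(phase))
    return canvas

params = np.array([
    [16.0, 16.0, 4.0, 0.0, 0.125, 1.0, 0.0],        # even packet at the centre
    [8.0, 24.0, 3.0, np.pi / 4, 0.2, 0.0, 0.5],     # odd packet, rotated 45 degrees
])
image = render_packets(params)
print(image.shape)
```

Because the canvas is a sum, rendering two packets together gives the same image as rendering them separately and adding the results.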

	Trained on Oxford Flowers-102 at 128×128 with 512 packets.

	## The blur is the prior; reality is the sharpness

	Asked to generate a flower from a random latent, this model returns a soft, watercolour blob — and that is the correct, expected behaviour, not a failure. A generative prior with no input has to average over everything it cannot disambiguate, so it returns the low-frequency gist.
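
The averaging argument can be checked on a toy signal. The sketch assumes a prior that knows the frequency of a detail but not its phase, so it has to average over every phase. It illustrates the argument only, not this model's internals, and the helper `mean_over_phases` is hypothetical.

```python
import math

def mean_over_phases(x, freq, n_phases=64, dc=0.5):
    """Average dc + cos(2*pi*freq*x + phi) over evenly spaced phases phi."""
    total = 0.0
    for k in range(n_phases):
        phi = 2.0 * math.pi * k / n_phases
        total += dc + math.cos(2.0 * math.pi * freq * x + phi)
    return total / n_phases

xs = [i / 16 for i in range(16)]

# One committed guess at the phase: a full-contrast oscillation.
known_phase = [0.5 + math.cos(2.0 * math.pi * 3 * x) for x in xs]

# Phase unknown: the average over all phases.
averaged = [mean_over_phases(x, freq=3) for x in xs]

contrast_known = max(known_phase) - min(known_phase)
contrast_averaged = max(averaged) - min(averaged)
print(contrast_known, contrast_averaged)
```

The oscillation cancels in the average and only the phase-independent mean level survives, which is the toy version of the watercolour blob.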

	To see it work, you give it eyes. Hook it to a webcam and the blurry top-down prior collides with raw bottom-up reality; the live frame supplies the phase the prior could not guess, the packets lock to it, and the gist sharpens. That collision — top-down prediction meets bottom-up residual — is the predictive-coding loop, made live.
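
The loop can be sketched in one dimension with NumPy. Everything here is an assumption made for illustration: the atoms have fixed position, scale, and frequency, only their `(a, b)` coefficients are refined, and the "frame" is synthetic. The live script in the repo fits a real webcam frame and may optimize more parameters.

```python
import numpy as np

x = np.arange(64, dtype=np.float64)

def atom_pair(center, sigma, freq):
    """Cosine and sine Gabor atoms sharing one Gaussian envelope."""
    env = np.exp(-((x - center) ** 2) / (2.0 * sigma**2))
    carrier = 2.0 * np.pi * freq * (x - center)
    return env * np.cos(carrier), env * np.sin(carrier)

# Three packets, two atoms each -> basis of shape (6, 64).
atoms = [atom_pair(c, 5.0, 0.1) for c in (16.0, 32.0, 48.0)]
basis = np.stack([g for pair in atoms for g in pair])

# Synthetic "frame": what the eyes report (hypothetical coefficients).
true_coef = np.array([1.0, 0.0, 0.0, 0.8, -0.6, 0.3])
frame = true_coef @ basis

# "Prior": a guess with the right gist but no phase information.
coef = np.array([0.3, 0.0, 0.3, 0.0, 0.3, 0.0])
lr = 0.05
errors = []
for step in range(200):
    prediction = coef @ basis          # top-down prediction
    residual = frame - prediction      # bottom-up error
    errors.append(float(np.mean(residual**2)))
    coef += lr * (basis @ residual)    # gradient step on 0.5 * sum(residual**2)

print(errors[0], errors[-1])
```

Each pass predicts, measures what the prediction missed, and moves the coefficients along that residual, so the error shrinks as the packets lock to the frame.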

	## How to load it

	You need the architecture from the [ArtificialCortex repo](https://github.com/anttiluode/ArtificialCortex) (`splat_generator.py` defines `SplatVAE`).

	```python
	import torch
	from huggingface_hub import hf_hub_download
	from splat_generator import SplatVAE  # from the ArtificialCortex repo (the_splat/)

	# Download the checkpoint from the Hub (cached locally after the first call).
	ckpt = hf_hub_download("Aluode/Neuro_Splat", "model.pt")

	# These arguments must match the trained weights.
	model = SplatVAE(image_size=128, latent=128, num_packets=512, chunk=64)
	model.load_state_dict(torch.load(ckpt, map_location="cpu"))
	model.eval()  # inference mode
	```

	## How to run it

	```bash
	pip install torch torchvision numpy opencv-python huggingface_hub

	# perception — watch the packets phase-lock to a live webcam (panel 3)
	python live_cortex_perception.py --model_path model.pt --image_size 128 --num_packets 512

	# generation — the blurry priors from random latents (expected to be soft)
	python splat_generator.py --mode sample --resume model.pt --image_size 128 --num_packets 512
	```

	`--image_size 128 --num_packets 512` must match these weights, or the `state_dict` will not load.

	## The honest ledger

	What this model shows:
	- an image can be represented and learned as a sparse sum of localized Gabor wave-packets (the V1 / Olshausen–Field model, made trainable);
	- the phase-wrapping problem is dodged by outputting complex `(a, b)` coefficients instead of raw angles;
	- the correction half of predictive coding works live: a blurry top-down prior is sharpened by gradient descent against a bottom-up residual (the webcam frame).
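
The second point can be made concrete with the standard library alone. Two phases on either side of the wrap at ±π describe almost the same wave, yet they are far apart as raw angles and close as `(a, b)` pairs. The convention `a*cos + b*sin` and the helper `to_coeff` are assumptions for illustration; the decoder's exact parameterization lives in `splat_generator.py`.

```python
import math

def to_coeff(amplitude, phase):
    """Polar (amplitude, phase) -> Cartesian (a, b)."""
    return amplitude * math.cos(phase), amplitude * math.sin(phase)

eps = 0.01
phi_1, phi_2 = math.pi - eps, -math.pi + eps   # nearly the same phase, across the wrap

angle_gap = abs(phi_1 - phi_2)                 # what a raw-angle loss would see
a1, b1 = to_coeff(1.0, phi_1)
a2, b2 = to_coeff(1.0, phi_2)
coeff_gap = math.hypot(a1 - a2, b1 - b2)       # what an (a, b) loss sees

print(round(angle_gap, 2), round(coeff_gap, 2))  # prints 6.26 0.02

# (a, b) carries the same information: a*cos(t) + b*sin(t) == A*cos(t - phi).
t = 0.7
lhs = a1 * math.cos(t) + b1 * math.sin(t)
rhs = math.hypot(a1, b1) * math.cos(t - math.atan2(b1, a1))
print(abs(lhs - rhs) < 1e-12)
```

A regression target in `(a, b)` therefore has no discontinuity to jump across, while amplitude and phase remain recoverable through `hypot` and `atan2`.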

	What it is not, and what it lacks:
	- not a photorealistic generator. It is a VAE with an amortized latent — sharp reconstructions, blurry-but-structured samples. It is interesting on biological representation, not on benchmarks against diffusion models;
	- the prior is flower-only: in-domain (a flower) the top-down gist genuinely helps; off-domain (a room, a face) it is a wrong guess that the live-frame fit overwrites, so there the loop is doing direct re-fitting, not prediction;
	- the "floaters": on out-of-domain input, with an aggressive learning rate, the optimizer orphans some packets into tiny ultra-bright dots instead of coordinating them into edges. They resemble phosphenes, but the cause is an MSE optimizer with no lateral inhibition between packets, not the neural disinhibition that makes real phosphenes — a rhyme, not the mechanism. It points at the missing inhibitory coordination (the `grown_gates` line), not at a recreated brain;
	- it is 2D only, it uses relative units throughout, and it is a single trained model.

	The bet (untouched): that the phase-locked frame is a felt sharpening rather than a computed one. The model locates the mechanism in code that can fail; it does not touch the hard problem.

	## Lineage

	A sub-organ of [`the_artificial_cortex`](https://github.com/anttiluode/ArtificialCortex) (`the_splat/`). The generator is the trained top-down prior; the live cortex is reality forcing that prior into focus. MIT.

	The generator dreamed a blurry flower because it had nothing to look at. Open its eyes and reality supplies the phase the dream could not; the packets lock, the gist sharpens, and what they cannot yet coordinate, they see as stars.