---
language: en
license: mit
library_name: pytorch
tags:
- vae
- predictive-coding
- neuroscience
- gabor-splatting
- computer-vision
- biology
datasets:
- nielsr/flowers-102
---
# Neuro Splat — an artificial visual cortex (Gabor wave-packet VAE)
**PerceptionLab / Antti Luode, with Claude (Opus 4.8), in dialogue with Gemini. Helsinki, June 2026.**
> Do not hype. Do not lie. Just show.

*An image is not a million pixels. It is a smooth map and a scatter of small bright wave-packets dropped where the detail lives.*

---
## What this model is
This repo holds the weights (`model.pt`) for a **VAE whose decoder is a differentiable Gabor wave-packet splatter**. It is not a pixel-generating image model; it is a biologically motivated representation experiment: an architecture that reproduces the *shape* of how V1 is thought to code images (localized Gabor atoms), used to test predictive coding and phase locking.
- **Encoder** (CNN): image → a latent "concept".
- **Decoder** (MLP): latent → the parameters of 512 localized Gabor atoms — position, scale, orientation, frequency, and a complex `(a, b)` coefficient per colour channel.
- **Renderer**: splats those wave-packets onto the canvas; the image is their sum.
Trained on **Oxford Flowers-102** at **128×128 with 512 packets**.
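
As a rough illustration of the renderer step, the sketch below sums a handful of Gabor packets onto a small canvas. It is a minimal sketch, not the code in `splat_generator.py`: the function name `render_gabor_packets`, the tensor layout of the parameters, the isotropic Gaussian envelope, and the `[-1, 1]` coordinate range are all assumptions made for the example.

```python
import math
import torch


def render_gabor_packets(pos, sigma, theta, freq, coef, size=32):
    """Sum N Gabor wave-packets into an image of shape (C, size, size).

    Hypothetical layout, chosen for this example only:
      pos:   (N, 2)    packet centres (x, y) in [-1, 1]
      sigma: (N,)      width of the Gaussian envelope
      theta: (N,)      carrier orientation in radians
      freq:  (N,)      carrier frequency in cycles per unit length
      coef:  (N, C, 2) per-channel (a, b) weights of the cos and sin carriers
    """
    axis = torch.linspace(-1.0, 1.0, size)
    yy, xx = torch.meshgrid(axis, axis, indexing="ij")  # each (size, size)
    dx = xx[None] - pos[:, 0, None, None]               # (N, size, size)
    dy = yy[None] - pos[:, 1, None, None]
    envelope = torch.exp(-(dx**2 + dy**2) / (2 * sigma[:, None, None] ** 2))
    # Distance along the carrier direction sets the phase of the wave.
    along = dx * torch.cos(theta)[:, None, None] + dy * torch.sin(theta)[:, None, None]
    phase = 2 * math.pi * freq[:, None, None] * along
    cos_part = envelope * torch.cos(phase)
    sin_part = envelope * torch.sin(phase)
    a, b = coef[..., 0], coef[..., 1]                   # each (N, C)
    # The image is the plain sum of all packets, channel by channel.
    return (torch.einsum("nc,nhw->chw", a, cos_part)
            + torch.einsum("nc,nhw->chw", b, sin_part))


torch.manual_seed(0)
n = 8  # a handful of packets; the released weights use 512
image = render_gabor_packets(
    pos=torch.rand(n, 2) * 2 - 1,
    sigma=torch.full((n,), 0.15),
    theta=torch.rand(n) * math.pi,
    freq=torch.full((n,), 3.0),
    coef=torch.randn(n, 3, 2),
)
print(image.shape)  # torch.Size([3, 32, 32])
```

Every operation here is a differentiable tensor op, which is what lets a decoder be trained through a renderer of this kind.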
## The blur is the prior; reality is the sharpness
Asked to generate a flower from a random latent, this model returns a soft, watercolour blob — and that is the correct, expected behaviour, not a failure. A generative prior with no input has to average over everything it cannot disambiguate, so it returns the low-frequency gist.
To see it work, you give it eyes. Hook it to a webcam and the blurry top-down prior collides with raw bottom-up reality; the live frame supplies the phase the prior could not guess, the packets lock to it, and the gist sharpens. That collision — top-down prediction meets bottom-up residual — is the predictive-coding loop, made live.
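
The locking step can be illustrated with a one-dimensional toy. The sketch below is not `live_cortex_perception.py`: it assumes a single packet whose envelope and frequency are already right, a synthetic "frame" in place of a webcam image, and Adam as the optimizer. Only the `(a, b)` coefficients are fitted, so the residual against the frame is the only thing that can supply the phase.

```python
import math
import torch

x = torch.linspace(-1.0, 1.0, 256)
envelope = torch.exp(-x**2 / (2 * 0.3**2))
carrier = 2 * math.pi * 4.0 * x


def packet(coef):
    """One 1-D wave-packet with cos weight coef[0] and sin weight coef[1]."""
    return envelope * (coef[0] * torch.cos(carrier) + coef[1] * torch.sin(carrier))


# "Bottom-up" frame: the same packet, at a phase the prior does not know.
true_phase = 1.1
frame = envelope * torch.cos(carrier - true_phase)

# "Top-down" prior: a guess with zero phase.
coef = torch.tensor([1.0, 0.0], requires_grad=True)
initial_loss = ((frame - packet(coef)) ** 2).mean().item()

optimizer = torch.optim.Adam([coef], lr=0.05)
for _ in range(300):
    optimizer.zero_grad()
    residual = frame - packet(coef)   # prediction error against the frame
    loss = (residual**2).mean()
    loss.backward()
    optimizer.step()

final_loss = loss.item()
fitted_phase = math.atan2(coef[1].item(), coef[0].item())
print(f"loss {initial_loss:.4f} -> {final_loss:.6f}, fitted phase {fitted_phase:.3f}")
```

The fitted phase should land close to `true_phase`. In the real model the same idea runs over hundreds of 2-D packets at once, where position, scale, and orientation may also be free to move.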
## How to load it
You need the architecture from the [ArtificialCortex repo](https://github.com/anttiluode/ArtificialCortex) (`splat_generator.py` defines `SplatVAE`).
```python
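import torch
from huggingface_hub import hf_hub_download
from splat_generator import SplatVAE  # from the ArtificialCortex repo (the_splat/)

# Download the checkpoint from the Hub (cached locally after the first call).
ckpt = hf_hub_download("Aluode/Neuro_Splat", "model.pt")

# image_size and num_packets must match the values these weights were trained with.
model = SplatVAE(image_size=128, latent=128, num_packets=512, chunk=64)
model.load_state_dict(torch.load(ckpt, map_location="cpu"))
model.eval()  # switch to evaluation mode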
```
## How to run it
```bash
pip install torch torchvision numpy opencv-python huggingface_hub
# perception — watch the packets phase-lock to a live webcam (panel 3)
python live_cortex_perception.py --model_path model.pt --image_size 128 --num_packets 512
# generation — the blurry priors from random latents (expected to be soft)
python splat_generator.py --mode sample --resume model.pt --image_size 128 --num_packets 512
```
`--image_size 128 --num_packets 512` must match these weights, or the `state_dict` will not load.
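
To see why a mismatch fails, here is a toy stand-in, assuming the decoder ends in a layer whose width scales with the packet count. `ToyPacketHead` and its `params_per_packet=11` (2 position + 1 scale + 1 orientation + 1 frequency + 3 channels × 2 coefficients) are hypothetical and are not taken from `SplatVAE`; only the `load_state_dict` behaviour is standard PyTorch.

```python
import torch
from torch import nn


class ToyPacketHead(nn.Module):
    """Hypothetical stand-in for a decoder head; not the real SplatVAE."""

    def __init__(self, latent=128, num_packets=512, params_per_packet=11):
        super().__init__()
        # Output width grows with the number of packets.
        self.out = nn.Linear(latent, num_packets * params_per_packet)


saved = ToyPacketHead(num_packets=512).state_dict()

try:
    # Same architecture, different packet count: tensor shapes no longer agree.
    ToyPacketHead(num_packets=256).load_state_dict(saved)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)

print(mismatch_error)
```

PyTorch reports a size mismatch for each parameter whose shape differs, which is the error to expect if the flags above are changed.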
## The honest ledger
**What this model shows:**
- an image can be represented and learned as a sparse sum of localized Gabor wave-packets (the V1 / Olshausen–Field model, made trainable);
- the phase-wrapping problem is dodged by outputting complex `(a, b)` coefficients instead of raw angles;
- the correction half of predictive coding works live: a blurry top-down prior is sharpened by gradient descent against a bottom-up residual (the webcam frame).
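
The phase-wrapping point can be checked numerically. The sketch below uses only the standard library and the identity `a·cos(t) + b·sin(t) = A·cos(t − φ)` with `A = hypot(a, b)` and `φ = atan2(b, a)`; the helper names are made up for the example. It compares two waves that are nearly identical but whose phases sit on opposite sides of the ±π wrap.

```python
import math


def to_amplitude_phase(a, b):
    """Convert (a, b) carrier weights to amplitude and phase."""
    return math.hypot(a, b), math.atan2(b, a)


def wave_from_ab(a, b, t):
    return a * math.cos(t) + b * math.sin(t)


def wave_from_polar(amplitude, phase, t):
    return amplitude * math.cos(t - phase)


# Both parameterisations describe the same wave.
a, b = 0.6, -0.8
amplitude, phase = to_amplitude_phase(a, b)
max_diff = max(
    abs(wave_from_ab(a, b, t) - wave_from_polar(amplitude, phase, t))
    for t in [k * 0.1 for k in range(100)]
)

# Two almost identical waves on opposite sides of the wrap at +/- pi.
phase_lo, phase_hi = -math.pi + 0.01, math.pi - 0.01
ab_lo = (math.cos(phase_lo), math.sin(phase_lo))
ab_hi = (math.cos(phase_hi), math.sin(phase_hi))

angle_gap = abs(phase_hi - phase_lo)  # about 6.26: the raw angles look far apart
ab_gap = math.dist(ab_lo, ab_hi)      # about 0.02: the (a, b) pairs are close

print(f"max_diff={max_diff:.2e} angle_gap={angle_gap:.2f} ab_gap={ab_gap:.3f}")
```

A regression target expressed as a raw angle would treat these two waves as almost maximally different, while the `(a, b)` target keeps them adjacent, so a small change in the wave stays a small change in the output.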
**What it is not, and what it lacks:**
- not a photorealistic generator. It is a VAE with an amortized latent — sharp reconstructions, blurry-but-structured samples. It is interesting on biological representation, not on benchmarks against diffusion models;
- the **prior is flower-only**: in-domain (a flower) the top-down gist genuinely helps; off-domain (a room, a face) it is a wrong guess that the live-frame fit overwrites, so there the loop is doing direct re-fitting, not prediction;
- the **"floaters"**: on out-of-domain input, with an aggressive learning rate, the optimizer orphans some packets into tiny ultra-bright dots instead of coordinating them into edges. They *resemble* phosphenes, but the cause is an MSE optimizer with **no lateral inhibition between packets**, not the neural disinhibition that makes real phosphenes — a rhyme, not the mechanism. It points at the missing inhibitory coordination (the `grown_gates` line), not at a recreated brain;
- it is 2D only, uses relative units throughout, and rests on a single trained model.
**The bet (untouched):** that the phase-locked frame is a *felt* sharpening rather than a computed one. The model locates the mechanism in code that can fail; it does not touch the hard problem.
## Lineage
A sub-organ of [`the_artificial_cortex`](https://github.com/anttiluode/ArtificialCortex) (`the_splat/`). The generator is the trained top-down prior; the live cortex is reality forcing that prior into focus. Released under the MIT license.
*The generator dreamed a blurry flower because it had nothing to look at. Open its eyes and reality supplies the phase the dream could not; the packets lock, the gist sharpens, and what they cannot yet coordinate, they see as stars.*