license: mit
pipeline_tag: text-to-image

PixelModel 🖼️

A neural network where the weights are the image.

📌 What is this?

model.png is not a picture; it is the model.

Every pixel encodes a neural network weight. At inference, the PNG is decoded into the weight matrices of a tiny MLP. The prompt is embedded into a 32-dim vector, and the model generates a 32×32 RGB image from it.

Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.


🎨 Weight Encoding

  • R channel → weight magnitude (0–255 → 0.0–1.0)
  • B channel → weight sign (<128 = negative, ≥128 = positive)
  • G channel → unused / reserved
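
The mapping above can be sketched per pixel as follows. This is a minimal illustration, not the code from model.py: the names decode_pixel and encode_weight are hypothetical, and the inverse mapping (clamping to [-1, 1] and writing B as 0 or 255) is an assumption, since the README only describes the decoding direction.

```python
def decode_pixel(r, g, b):
    """Map one 8-bit RGB pixel to a signed weight in [-1.0, 1.0]."""
    magnitude = r / 255.0              # R: 0-255 -> 0.0-1.0
    sign = 1.0 if b >= 128 else -1.0   # B: <128 negative, >=128 positive
    return sign * magnitude            # G is ignored (unused / reserved)


def encode_weight(w):
    """Hypothetical inverse: clamp |w| to 1.0 and quantize to 8 bits."""
    magnitude = min(abs(w), 1.0)
    r = round(magnitude * 255)
    b = 255 if w >= 0 else 0           # assumption: any value on the right side of 128 would do
    return (r, 0, b)


print(decode_pixel(255, 0, 200))           # 1.0
print(decode_pixel(51, 0, 10))             # -0.2
print(decode_pixel(*encode_weight(-0.2)))  # -0.2
```

With this scheme a weight can take only 256 magnitudes per sign, so each stored weight is accurate to roughly 1/255.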

🧠 Architecture

prompt string
  → char embedding → 32-dim vector
  → W1 (64×32)  → tanh
  → W2 (64×64)  → tanh
  → W3 (3072×64) → sigmoid
  → reshape → 32×32×3 image

All weights live inside model.png.
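
The pipeline above can be sketched as a dependency-free forward pass. Several pieces are assumptions, because the README does not specify them: the embed function is a stand-in character embedding, the layers have no bias terms, the weights are random placeholders rather than values decoded from model.png, and the reshape uses row-major height × width × channel order.

```python
import math
import random

EMBED_DIM, HIDDEN, OUT = 32, 64, 32 * 32 * 3  # 3072 output values


def embed(prompt):
    """Hypothetical char embedding: mean of per-character pseudo-random vectors."""
    vec = [0.0] * EMBED_DIM
    for ch in prompt:
        rng = random.Random(ord(ch))  # same character -> same vector
        for i in range(EMBED_DIM):
            vec[i] += rng.uniform(-1.0, 1.0)
    n = max(len(prompt), 1)
    return [v / n for v in vec]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(prompt, W1, W2, W3):
    h1 = [math.tanh(z) for z in matvec(W1, embed(prompt))]  # 64 values
    h2 = [math.tanh(z) for z in matvec(W2, h1)]             # 64 values
    flat = [sigmoid(z) for z in matvec(W3, h2)]             # 3072 values in [0, 1]
    # Reshape to 32 rows x 32 columns x 3 channels (assumed row-major).
    return [[flat[(y * 32 + x) * 3:(y * 32 + x) * 3 + 3] for x in range(32)]
            for y in range(32)]


def random_matrix(rows, cols, rng):
    """Placeholder weights in [-1, 1], the range the pixel encoding can represent."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W1 = random_matrix(HIDDEN, EMBED_DIM, rng)  # 64×32
W2 = random_matrix(HIDDEN, HIDDEN, rng)     # 64×64
W3 = random_matrix(OUT, HIDDEN, rng)        # 3072×64
image = forward("red", W1, W2, W3)
print(len(image), len(image[0]), len(image[0][0]))  # 32 32 3
```

Multiplying each output value by 255 and rounding would give 8-bit RGB pixels for the generated image.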


🧪 Dataset vs Outputs

[Image comparison: Target vs. Output]

📁 Files

model.png       ← THE MODEL (64×3200 px)
main.py         ← inference
train.py        ← training
model.py        ← architecture
dataset/
  red.png
  red.txt       ← prompt: "red"
  ...
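
As a sanity check on the 64×3200 size, the pixel budget can be compared with the layer shapes from the architecture, assuming one weight per pixel as the encoding suggests. The arithmetic leaves 2048 pixels beyond W1, W2 and W3. The README does not say what they hold; a 64×32 block such as the character embedding would fit exactly, but that is a guess.

```python
# Layer shapes as listed in the architecture section.
layers = {"W1": (64, 32), "W2": (64, 64), "W3": (3072, 64)}

weights = sum(rows * cols for rows, cols in layers.values())
pixels = 64 * 3200  # stated size of model.png

print(weights)           # 202752
print(pixels)            # 204800
print(pixels - weights)  # 2048
```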

⚙️ Usage

python train.py
python train.py --epochs 500 --lr 0.05

python main.py "red"
python main.py "a cat" --out cat.png --scale 8

📊 Tips

  • 6–20 samples are enough
  • Simple patterns converge fastest
  • 200–500 epochs typical
  • Loss < 0.001 is strong for toy datasets

It’s a toy. It’s not useful. But it works.

Bench Labs · Simple, Reliable, Open sourced