metadata
license: mit
pipeline_tag: text-to-image
PixelModel 🖼️
A neural network where the weights are the image.
What is this?
model.png is not a picture; it is the model.
Every pixel encodes neural network weights. At inference, the PNG is decoded into weight matrices forming a tiny MLP. The prompt is embedded into a vector, and the model generates a 32×32 image.
Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.
🎨 Weight Encoding
- R channel → weight magnitude (0–255 → 0.0–1.0)
- B channel → weight sign (<128 = negative, ≥128 = positive)
- G channel → unused / reserved
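The channel rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in model.py: the function names `decode_pixels` and `encode_weights` are hypothetical, and writing B as exactly 0 or 255 when encoding is an assumption (any value on the right side of 128 would decode the same way).

```python
import numpy as np

def decode_pixels(rgb: np.ndarray) -> np.ndarray:
    """Map an (..., 3) uint8 RGB array to signed float weights in [-1, 1]."""
    magnitude = rgb[..., 0].astype(np.float32) / 255.0   # R: 0-255 -> 0.0-1.0
    sign = np.where(rgb[..., 2] >= 128, 1.0, -1.0)       # B: <128 negative, >=128 positive
    return (magnitude * sign).astype(np.float32)         # G is ignored

def encode_weights(w: np.ndarray) -> np.ndarray:
    """Inverse sketch: clip weights to [-1, 1] and pack them into RGB pixels."""
    w = np.clip(w, -1.0, 1.0)
    r = np.round(np.abs(w) * 255.0).astype(np.uint8)     # magnitude, quantized to 8 bits
    g = np.zeros_like(r)                                 # unused / reserved
    b = np.where(w < 0, 0, 255).astype(np.uint8)         # assumption: 0 or 255 for the sign
    return np.stack([r, g, b], axis=-1)

pixels = np.array([[255, 0, 255], [128, 0, 0], [0, 0, 200]], dtype=np.uint8)
weights = decode_pixels(pixels)   # one weight per pixel
```

Because the magnitude is stored in 8 bits, a round trip through `encode_weights` and `decode_pixels` recovers each weight only to within about 1/255.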
🧠 Architecture
prompt string
→ char embedding → 32-dim vector
→ W1 (64×32) → tanh
→ W2 (64×64) → tanh
→ W3 (3072×64) → sigmoid
→ reshape → 32×32×3 image
All weights live inside model.png.
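As a rough sketch, the forward pass in the diagram looks like this in NumPy. The layer shapes and activations come from the diagram; everything else is an assumption. In particular, `embed_prompt` is a hypothetical stand-in for the real char embedding, the weights here are random rather than decoded from model.png, and no bias terms are used because the diagram shows none.

```python
import numpy as np

EMBED_DIM, HIDDEN, OUT = 32, 64, 32 * 32 * 3   # sizes taken from the diagram

def embed_prompt(prompt: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Hypothetical char embedding: bucket character codes into a dim-sized vector."""
    vec = np.zeros(dim, dtype=np.float32)
    for i, ch in enumerate(prompt):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def forward(prompt: str, w1: np.ndarray, w2: np.ndarray, w3: np.ndarray) -> np.ndarray:
    x = embed_prompt(prompt)                  # (32,)
    h1 = np.tanh(w1 @ x)                      # (64,)
    h2 = np.tanh(w2 @ h1)                     # (64,)
    y = 1.0 / (1.0 + np.exp(-(w3 @ h2)))      # sigmoid, (3072,)
    return y.reshape(32, 32, 3)               # RGB image with values in [0, 1]

# Random weights in [-1, 1], the range the pixel encoding can represent.
rng = np.random.default_rng(0)
w1 = rng.uniform(-1, 1, (HIDDEN, EMBED_DIM)).astype(np.float32)
w2 = rng.uniform(-1, 1, (HIDDEN, HIDDEN)).astype(np.float32)
w3 = rng.uniform(-1, 1, (OUT, HIDDEN)).astype(np.float32)

image = forward("red", w1, w2, w3)
n_weights = w1.size + w2.size + w3.size       # 2048 + 4096 + 196608
```

The three matrices hold 202,752 weights in total, which fits inside the 204,800 pixels of a 64×3200 image. How the remaining pixels are used, and the order in which the matrices are laid out in the PNG, are not covered by this sketch.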
🧪 Dataset vs Outputs
| Target | Output |
|---|---|
Files
model.png ← THE MODEL (64×3200 px)
main.py ← inference
train.py ← training
model.py ← architecture
dataset/
red.png
red.txt ← prompt: "red"
...
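The dataset layout above pairs each target image with a same-named .txt file holding its prompt. A minimal sketch of collecting those pairs follows; `list_pairs` is a hypothetical helper, and whether train.py discovers its samples this way is an assumption.

```python
from pathlib import Path

def list_pairs(dataset_dir):
    """Return (prompt, image_path) tuples for every PNG that has a same-named .txt."""
    pairs = []
    for png in sorted(Path(dataset_dir).glob("*.png")):
        txt = png.with_suffix(".txt")
        if txt.exists():                      # skip images without a prompt file
            prompt = txt.read_text(encoding="utf-8").strip()
            pairs.append((prompt, png))
    return pairs
```

Calling `list_pairs("dataset")` on the tree shown above would include the pair for red.png with the prompt "red".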
⚙️ Usage
python train.py
python train.py --epochs 500 --lr 0.05
python main.py "red"
python main.py "a cat" --out cat.png --scale 8
Tips
- 6–20 samples are enough
- Simple patterns converge fastest
- 200–500 epochs typical
- Loss < 0.001 is strong for toy datasets
It's a toy. It's not useful. But it works.
Bench Labs · Simple, Reliable, Open sourced