---
license: mit
pipeline_tag: text-to-image
---

# PixelModel 🖼️

A neural network where the weights **are** the image.

## 📌 What is this?

`model.png` is not a picture; it *is* the model. Every pixel encodes neural network weights.

At inference, the PNG is decoded into weight matrices forming a tiny MLP. The prompt is embedded into a vector, and the model generates a 32×32 image.

Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.

---

## 🎨 Weight Encoding

- **R channel** → weight magnitude (0–255 → 0.0–1.0)
- **B channel** → weight sign (<128 = negative, ≥128 = positive)
- **G channel** → unused / reserved

---

## 🧠 Architecture

```text
prompt string
→ char embedding → 32-dim vector
→ W1 (64×32) → tanh
→ W2 (64×64) → tanh
→ W3 (3072×64) → sigmoid
→ reshape → 32×32×3 image
```

All weights live inside `model.png`.

---

## 🧪 Dataset vs Outputs

| Target | Output |
| ------ | ------ |
|        |        |
|        |        |
|        |        |
|        |        |
|        |        |
|        |        |

---

## 📁 Files

```text
model.png   ← THE MODEL (64×3200 px)
main.py     ← inference
train.py    ← training
model.py    ← architecture
dataset/
  red.png
  red.txt   ← prompt: "red"
  ...
```

---

## ⚙️ Usage

```bash
python train.py
python train.py --epochs 500 --lr 0.05
python main.py "red"
python main.py "a cat" --out cat.png --scale 8
```

---

## 📊 Tips

* 6–20 samples are enough
* Simple patterns converge fastest
* 200–500 epochs typical
* Loss < 0.001 is strong for toy datasets

---

*It’s a toy. It’s not useful. But it works.*

Bench Labs · Simple, Reliable, Open sourced
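
## 🔍 Sketch: decoding pixels into weights

The weight encoding and architecture described above can be illustrated with a short NumPy sketch. This is not the code in `model.py` or `main.py`. The function names `decode_weights`, `encode_weights`, and `forward` are hypothetical, and the mapping of a positive sign to B = 255 and a negative sign to B = 0 when encoding is an assumption (the README only gives the 128 threshold). The README also does not say how W1, W2, and W3 are laid out inside the 64×3200 image or how the character embedding works, so the sketch takes the three matrices and a 32-dim prompt vector as separate inputs.

```python
import numpy as np


def decode_weights(rgb: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) uint8 pixel array to (H, W) float32 weights in [-1, 1]."""
    magnitude = rgb[..., 0].astype(np.float32) / 255.0   # R: 0-255 -> 0.0-1.0
    sign = np.where(rgb[..., 2] >= 128, 1.0, -1.0)       # B: >=128 positive, else negative
    return (sign * magnitude).astype(np.float32)          # G is ignored (reserved)


def encode_weights(w: np.ndarray) -> np.ndarray:
    """Inverse mapping (assumed): quantize weights in [-1, 1] to RGB pixels."""
    w = np.clip(w, -1.0, 1.0)
    rgb = np.zeros(w.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = np.round(np.abs(w) * 255.0).astype(np.uint8)
    rgb[..., 2] = np.where(w >= 0, 255, 0).astype(np.uint8)
    return rgb


def forward(x: np.ndarray, W1: np.ndarray, W2: np.ndarray, W3: np.ndarray) -> np.ndarray:
    """Run the MLP from the Architecture section on a 32-dim prompt vector."""
    h1 = np.tanh(W1 @ x)                       # (64,)
    h2 = np.tanh(W2 @ h1)                      # (64,)
    out = 1.0 / (1.0 + np.exp(-(W3 @ h2)))     # sigmoid, (3072,)
    return out.reshape(32, 32, 3)              # RGB image with values in [0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for the pixel blocks that would hold W1, W2, W3.
    shapes = [(64, 32), (64, 64), (3072, 64)]
    W1, W2, W3 = (
        decode_weights(rng.integers(0, 256, size=s + (3,), dtype=np.uint8))
        for s in shapes
    )
    x = rng.uniform(-1.0, 1.0, size=32).astype(np.float32)  # placeholder prompt vector
    image = forward(x, W1, W2, W3)
    print(image.shape)
```

Because the magnitude is stored in 8 bits, every decoded weight lies in [-1, 1] and is quantized in steps of 1/255. A training loop that optimizes pixel values would have to work within that range and resolution.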