---
license: mit
pipeline_tag: text-to-image
---

# PixelModel 🖼️

A neural network where the weights **are** the image.

## 📌 What is this?

`model.png` is not a picture; it *is* the model. Every pixel encodes neural network weights.

At inference, the PNG is decoded into weight matrices forming a tiny MLP. The prompt is embedded into a vector, and the model generates a 32×32 image.

Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.

---

## 🎨 Weight Encoding

- **R channel** → weight magnitude (0–255 → 0.0–1.0)
- **B channel** → weight sign (<128 = negative, ≥128 = positive)
- **G channel** → unused / reserved

---

## 🧠 Architecture

```text
prompt string
  → char embedding → 32-dim vector
  → W1 (64×32)   → tanh
  → W2 (64×64)   → tanh
  → W3 (3072×64) → sigmoid
  → reshape → 32×32×3 image
```

All weights live inside `model.png`.

---

## 🧪 Dataset vs Outputs

| Target | Output |
| ------ | ------ |
| *(image not preserved)* | *(image not preserved)* |

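The weight encoding and architecture described above can be sketched in plain Python. This is a minimal sketch, not the code in `model.py`. Several things here are assumptions made for illustration: the column layout inside the PNG (W1 in columns 0–31, W2 in 32–95, W3 stored transposed in 96–3167), reading "64×3200" as 64 rows by 3200 columns, the helper names (`decode_pixel`, `encode_weight`, `split_weights`, `embed_prompt`, `forward`), and the bucket-count character embedding. The sketch expects the image as 64 rows of 3200 `(r, g, b)` tuples; reading `model.png` into that form is left out.

```python
import math


def decode_pixel(r, g, b):
    """Map one RGB pixel to a signed weight: R is magnitude, B is sign, G is ignored."""
    magnitude = r / 255.0
    sign = 1.0 if b >= 128 else -1.0
    return sign * magnitude


def encode_weight(w):
    """Inverse of decode_pixel: clip |w| to 1 and quantize the magnitude to 8 bits.

    Writing B as 0 or 255 and G as 0 is an assumption; any B on the
    correct side of 128 decodes to the same sign.
    """
    r = round(min(abs(w), 1.0) * 255)
    b = 255 if w >= 0 else 0
    return (r, 0, b)


def split_weights(pixels):
    """Slice a 64-row by 3200-column pixel grid into W1, W2, W3 (ASSUMED layout)."""
    w = [[decode_pixel(*p) for p in row] for row in pixels]
    W1 = [row[0:32] for row in w]             # 64×32
    W2 = [row[32:96] for row in w]            # 64×64
    W3_t = [row[96:3168] for row in w]        # 64×3072, i.e. W3 transposed
    W3 = [list(col) for col in zip(*W3_t)]    # 3072×64
    return W1, W2, W3


def embed_prompt(prompt, dim=32):
    """Hypothetical char embedding: count characters into `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(prompt, W1, W2, W3):
    """prompt → 32-dim vector → tanh → tanh → sigmoid → 32×32×3 nested list."""
    x = embed_prompt(prompt)
    h1 = [math.tanh(v) for v in matvec(W1, x)]
    h2 = [math.tanh(v) for v in matvec(W2, h1)]
    out = [sigmoid(v) for v in matvec(W3, h2)]  # 3072 values between 0 and 1
    # Reshape the flat output into rows of RGB triples.
    return [
        [out[(row * 32 + col) * 3:(row * 32 + col) * 3 + 3] for col in range(32)]
        for row in range(32)
    ]
```

Under the assumed layout the three matrices occupy 32 + 64 + 3072 = 3168 of the 3200 columns, so 32 columns stay unused. Because the magnitude is stored in 8 bits, every decoded weight lies in [-1, 1] with a step of 1/255.
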
---

## 📁 Files

```text
model.png        ← THE MODEL (64×3200 px)
main.py          ← inference
train.py         ← training
model.py         ← architecture
dataset/
  red.png
  red.txt        ← prompt: "red"
  ...
```

---

## ⚙️ Usage

```bash
python train.py                                   # train with default settings
python train.py --epochs 500 --lr 0.05            # set epoch count and learning rate
python main.py "red"                              # generate an image for the prompt "red"
python main.py "a cat" --out cat.png --scale 8    # choose the output file and scale factor
```

---

## 📊 Tips

* 6–20 samples are enough
* Simple patterns converge fastest
* 200–500 epochs typical
* Loss < 0.001 is strong for toy datasets

---

*It’s a toy. It’s not useful. But it works.*

Bench Labs · Simple, Reliable, Open sourced