| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8"> | |
| <meta name="viewport" content="width=device-width, initial-scale=1"> | |
| <title>PixelModel: When the Weights Are the Image</title> | |
| <link rel="stylesheet" href="../style.css"> | |
| </head> | |
| <body> | |
| <div class="background"></div> | |
| <main class="article"> | |
| <a class="back" href="../index.html"> | |
| ← Back | |
| </a> | |
| <div class="meta"> | |
| June 2026 | |
| </div> | |
| <h1> | |
| PixelModel: When the Weights Are the Image | |
| </h1> | |
| <p> | |
| What if your neural network's weights weren't stored in some binary file or checkpoint, but were literally encoded in the pixels of a PNG image? That's the premise behind <strong>PixelModel</strong>, a playful experiment where the model <em>is</em> the image. | |
| </p> | |
| <h2> | |
| The Core Idea | |
| </h2> | |
| <p> | |
| In PixelModel, <code>model.png</code> isn't a picture of anything. It <em>is</em> the model. Every pixel's RGB values encode neural network weights: | |
| </p> | |
| <ul> | |
| <li><strong>Red channel</strong>: Model output weight magnitude</li> | |
| <li><strong>Blue channel</strong>: Model output weight sign (≥128 = positive)</li> | |
| <li><strong>Green channel</strong>: Model output bias values</li> | |
| </ul> | |
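| <p> | |
| The red/blue scheme above can be sketched in a few lines. This is a minimal illustration, not the project's actual decoder: the <code>scale</code> parameter, the linear <code>red / 255</code> mapping, and the helper names <code>decode_weight</code> and <code>encode_weight</code> are all assumptions. The source only states that red holds the magnitude and that blue ≥128 means positive. | |
| </p> | |

```python
def decode_weight(red: int, blue: int, scale: float = 1.0) -> float:
    """Turn one pixel's red/blue values into a signed weight.

    Assumption: magnitude is red/255 times a hypothetical `scale`.
    The source only says red is magnitude and blue >= 128 is positive.
    """
    magnitude = (red / 255.0) * scale
    sign = 1.0 if blue >= 128 else -1.0
    return sign * magnitude


def encode_weight(weight: float, scale: float = 1.0) -> tuple:
    """Inverse of decode_weight, up to 8-bit rounding (hypothetical helper)."""
    red = min(255, round(abs(weight) / scale * 255))
    blue = 255 if weight >= 0 else 0
    return red, blue


print(decode_weight(255, 200))  # full magnitude, positive sign
print(decode_weight(51, 0))     # 51/255 of full magnitude, negative sign
print(encode_weight(-0.2))      # back to a (red, blue) pair
```

| <p> | |
| With only 256 magnitude levels per weight, any such encoding is coarse, which fits the project's framing as a toy. | |
| </p> | |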
| <p> | |
| At inference time, pixels are parsed into three weight matrices forming a tiny MLP. The prompt is embedded into a vector, then a forward pass generates a 32×32 image. Training directly optimizes pixel values via gradient descent until the PNG itself becomes the model. | |
| </p> | |
| <h2> | |
| Architecture | |
| </h2> | |
| <p> | |
| The model takes a text prompt and generates images through a simple but effective pipeline: | |
| </p> | |
| <pre><code>prompt string | |
| → char-level embedding → 32-dim vector | |
| → W1 (64×32) → tanh | |
| → W2 (64×64) → tanh | |
| → W3 (3072×64) → sigmoid | |
| → reshape → 32×32×3 image</code></pre> | |
| <p> | |
| All weights live inside <code>model.png</code>. Opening the PNG is literally opening the neural network. | |
| </p> | |
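| <p> | |
| The pipeline above can be sketched in pure Python. Only the layer shapes and activations come from the diagram. The <code>embed</code> function is a hypothetical stand-in for the char-level embedding, biases are left out, the weights are random rather than read from <code>model.png</code>, and the channel-last reshape is an assumption. | |
| </p> | |

```python
import math
import random


def embed(prompt, dim=32):
    """Hypothetical char-level embedding: fold character codes into dim slots."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 255.0
    n = max(len(prompt), 1)
    return [v / n for v in vec]


def matvec(w, x):
    """Multiply a rows-by-cols matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def forward(x, w1, w2, w3):
    """32-dim input -> tanh(W1) -> tanh(W2) -> sigmoid(W3) -> 32x32x3 image."""
    h1 = [math.tanh(v) for v in matvec(w1, x)]
    h2 = [math.tanh(v) for v in matvec(w2, h1)]
    out = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w3, h2)]
    # Assumed layout: flat 3072 values -> rows x cols x (R, G, B)
    return [[out[(r * 32 + c) * 3:(r * 32 + c) * 3 + 3] for c in range(32)]
            for r in range(32)]


def random_matrix(rows, cols, rng):
    """Placeholder weights; the real ones would be decoded from the PNG."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w1 = random_matrix(64, 32, rng)     # W1 (64x32)
w2 = random_matrix(64, 64, rng)     # W2 (64x64)
w3 = random_matrix(3072, 64, rng)   # W3 (3072x64)

image = forward(embed("a cat"), w1, w2, w3)
print(len(image), len(image[0]), len(image[0][0]))
```

| <p> | |
| The sigmoid keeps every output in the 0 to 1 range, so multiplying by 255 would give displayable channel values. | |
| </p> | |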
| <h2> | |
| Usage | |
| </h2> | |
| <h3> | |
| Training | |
| </h3> | |
| <p> | |
| Training is straightforward. You provide 6–20 image-prompt pairs, and the model learns to associate prompts with images by optimizing the pixel values directly: | |
| </p> | |
| <pre><code>python train.py | |
| python train.py --epochs 500 --lr 0.05</code></pre> | |
| <p> | |
| Simple targets converge fastest: solid colors, gradients, and basic shapes work well. Typically, 200–500 epochs are sufficient, and a loss below 0.001 indicates good convergence for simple datasets. | |
| </p> | |
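| <p> | |
| To make "optimizing the pixel values" concrete, here is a toy sketch that fits a single weight and then writes it back as a red/blue pixel pair. It is not the logic of <code>train.py</code>: the squared-error loss, the clamping step, the linear encoding, and the function names are all assumptions. Only the <code>epochs</code> and <code>lr</code> values mirror the command-line example above. | |
| </p> | |

```python
def encode(weight, scale=1.0):
    """Assumed encoding: red = magnitude (0-255), blue = sign (>=128 positive)."""
    red = min(255, round(abs(weight) / scale * 255))
    blue = 255 if weight >= 0 else 0
    return red, blue


def decode(red, blue, scale=1.0):
    """Inverse of encode, up to 8-bit rounding."""
    return (1.0 if blue >= 128 else -1.0) * (red / 255.0) * scale


def train_single_weight(x, target, epochs=500, lr=0.05):
    """Fit w so that w * x is close to target, then quantize w to a pixel."""
    w = 0.0
    for _ in range(epochs):
        pred = w * x
        grad = 2.0 * (pred - target) * x   # derivative of (w*x - target)**2
        w -= lr * grad
        w = max(-1.0, min(1.0, w))         # stay inside the encodable range
    return encode(w)


red, blue = train_single_weight(x=1.0, target=-0.6)
loss = (decode(red, blue) * 1.0 - (-0.6)) ** 2
print(red, blue, loss)
```

| <p> | |
| In this toy case the final loss falls well under the 0.001 threshold mentioned above, and what remains is only the 8-bit rounding error of the pixel. | |
| </p> | |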
| <h3> | |
| Generation | |
| </h3> | |
| <p> | |
| Once trained, generating images is as simple as: | |
| </p> | |
| <pre><code>python main.py "red" | |
| python main.py "a cat" --out cat_out.png --scale 8</code></pre> | |
| <p> | |
| The <code>--scale 8</code> flag upscales the 32×32 output to 256×256 using nearest-neighbour interpolation to preserve the pixel structure. | |
| </p> | |
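| <p> | |
| For an integer scale factor, nearest-neighbour upscaling amounts to repeating each pixel <code>scale</code> times in both directions. The sketch below shows that idea on nested lists. The function name <code>upscale_nearest</code> is hypothetical, and the actual <code>main.py</code> may well use an imaging library instead. | |
| </p> | |

```python
def upscale_nearest(image, scale):
    """Repeat every pixel `scale` times horizontally and vertically.

    `image` is a list of rows; each row is a list of pixel values.
    """
    out = []
    for row in image:
        wide = [px for px in row for _ in range(scale)]   # stretch columns
        out.extend(list(wide) for _ in range(scale))      # stretch rows
    return out


tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
big = upscale_nearest(tiny, 2)
for row in big:
    print(row)
```

| <p> | |
| No colours are blended, so each source pixel becomes a crisp square block, which is why the pixel structure survives. | |
| </p> | |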
| <h2> | |
| File Structure | |
| </h2> | |
| <p> | |
| The repository is minimal and self-contained: | |
| </p> | |
| <pre><code>model.png ← THE MODEL (64×3200 px, ~284 KB) | |
| main.py ← inference | |
| train.py ← training | |
| model.py ← architecture (pixels → weights → forward pass) | |
| dataset/ ← training data | |
| cat.png | |
| cat.txt ← prompt: "a cat" | |
| ...</code></pre> | |
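| <p> | |
| The <code>dataset/</code> layout suggests that each image is paired with a same-named <code>.txt</code> file holding its prompt. Under that assumption, collecting the pairs could look like the sketch below. The helper <code>find_pairs</code> is hypothetical and is not taken from <code>train.py</code>. | |
| </p> | |

```python
from pathlib import Path


def find_pairs(dataset_dir):
    """Return (image_path, prompt) tuples for every PNG with a matching .txt.

    Assumption: `cat.png` is described by `cat.txt` in the same directory.
    Images without a prompt file are skipped.
    """
    pairs = []
    for image_path in sorted(Path(dataset_dir).glob("*.png")):
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.exists():
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs


if __name__ == "__main__":
    for path, prompt in find_pairs("dataset"):
        print(path.name, "->", prompt)
```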
| <h2> | |
| Why Build This? | |
| </h2> | |
| <blockquote> | |
| It's a toy. It's not useful. But it's cool that it works. | |
| </blockquote> | |
| <p> | |
| PixelModel has a fixed capacity of approximately 600K implicit parameters. While it won't replace your favorite diffusion model, it's a fascinating demonstration of how neural network weights can be encoded in unconventional ways. | |
| </p> | |
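| <p> | |
| One plausible reading of the 600K figure, not confirmed by the source, is that it counts every channel value in <code>model.png</code> rather than the MLP's weight entries. The arithmetic below uses only the image size and layer shapes quoted earlier in this article. | |
| </p> | |

```python
# Layer shapes as listed in the architecture diagram (rows, cols)
layers = {"W1": (64, 32), "W2": (64, 64), "W3": (3072, 64)}
weight_count = sum(rows * cols for rows, cols in layers.values())

# model.png dimensions from the file listing, times three colour channels
width, height, channels = 64, 3200, 3
channel_values = width * height * channels

print(weight_count)    # 202752 weight entries across the three matrices
print(channel_values)  # 614400 channel values stored in the image
```

| <p> | |
| The second number is close to 600K, while the first is roughly a third of it, which would be consistent with each weight entry using more than one channel of its pixel. | |
| </p> | |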
| <p> | |
| The project explores the boundaries of how we think about model storage and representation. What if your model could be shared as simply as an image file? What if you could <em>see</em> your neural network just by opening it in an image viewer? | |
| </p> | |
| <h2> | |
| Try It Yourself | |
| </h2> | |
| <p> | |
| The full code and trained model are available on the Hugging Face Hub. Clone the repository, provide your own image-prompt pairs, and watch as gradient descent transforms a PNG into a functioning neural network. | |
| </p> | |
| <p> | |
| Check out the <a href="https://huggingface.co/seton-labs/pixelmodel">PixelModel repository</a> to get started. | |
| </p> | |
| </main> | |
| </body> | |
| </html> |