<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>PixelModel: When the Weights Are the Image</title>
<link rel="stylesheet" href="../style.css">
</head>
<body>
<div class="background"></div>
<main class="article">
<a class="back" href="../index.html">
← Back
</a>
<div class="meta">
June 2026
</div>
<h1>
PixelModel: When the Weights Are the Image
</h1>
<p>
What if your neural network's weights weren't stored in some binary file or checkpoint, but were literally encoded in the pixels of a PNG image? That's the premise behind <strong>PixelModel</strong>, a playful experiment where the model <em>is</em> the image.
</p>
<h2>
The Core Idea
</h2>
<p>
In PixelModel, <code>model.png</code> isn't a picture of anything. It <em>is</em> the model. Every pixel's RGB values encode neural network weights:
</p>
<ul>
<li><strong>Red channel</strong>: Model output weight magnitude</li>
<li><strong>Blue channel</strong>: Model output weight sign (≥128 = positive)</li>
<li><strong>Green channel</strong>: Model output bias values</li>
</ul>
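<p>
The channel scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's decoder: the function name <code>decode_weights</code>, the <code>scale</code> parameter, and the choice to map red linearly onto the range 0 to <code>scale</code> are all assumptions. The green (bias) channel is left out because the post does not say how bias values are mapped.
</p>

```python
import numpy as np


def decode_weights(pixels: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Decode an (H, W, 3) uint8 RGB array into signed float weights.

    Assumed mapping (hypothetical, not taken from the repository):
      magnitude = red / 255 * scale
      sign      = +1 when blue is 128 or more, otherwise -1
    """
    red = pixels[..., 0].astype(np.float32)
    blue = pixels[..., 2]
    magnitude = red / 255.0 * scale
    sign = np.where(blue >= 128, 1.0, -1.0).astype(np.float32)
    return sign * magnitude


# Two example pixels: a bright red one with high blue, a dim one with low blue.
example = np.array([[[255, 0, 200], [51, 0, 10]]], dtype=np.uint8)
weights = decode_weights(example)
```

<p>
Under these assumptions, each pixel yields one signed weight, so an image of height H and width W decodes to an H by W weight array.
</p>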
<p>
At inference time, pixels are parsed into three weight matrices forming a tiny MLP. The prompt is embedded into a vector, then a forward pass generates a 32×32 image. Training directly optimizes pixel values via gradient descent until the PNG itself becomes the model.
</p>
<h2>
Architecture
</h2>
<p>
The model takes a text prompt and generates images through a simple but effective pipeline:
</p>
<pre><code>prompt string
→ char-level embedding → 32-dim vector
→ W1 (64×32) → tanh
→ W2 (64×64) → tanh
→ W3 (3072×64) → sigmoid
→ reshape → 32×32×3 image</code></pre>
<p>
All weights live inside <code>model.png</code>. Opening the PNG is literally opening the neural network.
</p>
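<p>
The pipeline can be sketched as a plain NumPy forward pass. The layer shapes and activations come from the diagram; everything else is an assumption made for illustration. In particular, <code>embed_prompt</code> is a hypothetical stand-in for the char-level embedding (the post does not describe it), biases are omitted, and the weights here are random rather than decoded from <code>model.png</code>.
</p>

```python
import numpy as np

EMBED_DIM = 32


def embed_prompt(prompt: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Hypothetical char-level embedding: each character adds its scaled
    code to a slot chosen by position, then the vector is L2-normalised."""
    vec = np.zeros(dim, dtype=np.float32)
    for i, ch in enumerate(prompt):
        vec[i % dim] += (ord(ch) % 128) / 127.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def forward(x: np.ndarray, W1: np.ndarray, W2: np.ndarray, W3: np.ndarray) -> np.ndarray:
    """Three-layer MLP: 32 -> 64 -> 64 -> 3072, reshaped to a 32x32 RGB image."""
    h1 = np.tanh(W1 @ x)     # shape (64,)
    h2 = np.tanh(W2 @ h1)    # shape (64,)
    out = sigmoid(W3 @ h2)   # shape (3072,), values in (0, 1)
    return out.reshape(32, 32, 3)


# Random stand-in weights with the shapes from the diagram.
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (64, 32))
W2 = rng.normal(0.0, 0.1, (64, 64))
W3 = rng.normal(0.0, 0.1, (3072, 64))

image = forward(embed_prompt("a cat"), W1, W2, W3)
```

<p>
Because the last activation is a sigmoid, every output value lies strictly between 0 and 1 and can be scaled to 8-bit colour by multiplying by 255.
</p>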
<h2>
Usage
</h2>
<h3>
Training
</h3>
<p>
Training is straightforward. You provide 6–20 image-prompt pairs, and the model learns to associate prompts with images by optimizing the pixel values directly:
</p>
<pre><code>python train.py
python train.py --epochs 500 --lr 0.05</code></pre>
<p>
Simple targets converge fastest: solid colors, gradients, and basic shapes work well. Typically, 200–500 epochs are sufficient, and a loss below 0.001 indicates good convergence for simple datasets.
</p>
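<p>
To make "optimizing pixel values" concrete, here is a deliberately tiny sketch: one sigmoid layer trained with hand-written gradient descent on a mean squared error loss, then quantised into red and blue channel values. It is not <code>train.py</code>. The one-hot prompt vectors, the single layer, the clipping to the range -1 to 1, the learning rate, and the red/blue mapping are all assumptions chosen to keep the example short.
</p>

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
X = np.eye(4)                        # four prompts as one-hot stand-in embeddings
Y = rng.uniform(0.2, 0.8, (4, 3))    # one target RGB colour per prompt
W = np.zeros((3, 4))                 # float "master copy" of the weights
lr = 0.5                             # toy value, chosen for this example only

losses = []
for epoch in range(500):
    P = sigmoid(X @ W.T)                             # predictions, shape (4, 3)
    err = P - Y
    losses.append(float(np.mean(err ** 2)))          # MSE loss
    grad = (2.0 / err.size) * (err * P * (1.0 - P)).T @ X
    W = np.clip(W - lr * grad, -1.0, 1.0)            # keep weights encodable

# Quantise to 8-bit channels (assumed mapping: red = magnitude, blue = sign).
red = np.round(np.abs(W) * 255).astype(np.uint8)
blue = np.where(W >= 0, 255, 0).astype(np.uint8)

# Decode again to see how much precision the 8-bit encoding costs.
decoded = np.where(blue >= 128, 1.0, -1.0) * (red / 255.0)
quantisation_error = float(np.max(np.abs(decoded - W)))
```

<p>
In this sketch the weights stay in floating point during training and are rounded to pixels only at the end. Whether the real training loop quantises every step or keeps a float copy is not stated in the post.
</p>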
<h3>
Generation
</h3>
<p>
Once trained, generating images is as simple as:
</p>
<pre><code>python main.py "red"
python main.py "a cat" --out cat_out.png --scale 8</code></pre>
<p>
The <code>--scale 8</code> flag upscales the 32×32 output to 256×256 using nearest-neighbour interpolation to preserve the pixel structure.
</p>
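<p>
Nearest-neighbour upscaling by an integer factor amounts to repeating each pixel along both axes. A minimal NumPy sketch follows; the helper name <code>upscale_nearest</code> is hypothetical and <code>main.py</code> may well use an imaging library instead.
</p>

```python
import numpy as np


def upscale_nearest(image: np.ndarray, scale: int) -> np.ndarray:
    """Repeat every pixel `scale` times along height and width."""
    rows = np.repeat(image, scale, axis=0)
    return np.repeat(rows, scale, axis=1)


# A 32x32 black image with one red pixel in the top-left corner.
small = np.zeros((32, 32, 3), dtype=np.uint8)
small[0, 0] = (255, 0, 0)

big = upscale_nearest(small, 8)   # each source pixel becomes an 8x8 block
```

<p>
No new colours are introduced, which is why the blocky pixel structure survives the upscale.
</p>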
<h2>
File Structure
</h2>
<p>
The repository is minimal and self-contained:
</p>
<pre><code>model.png ← THE MODEL (64×3200 px, ~284 KB)
main.py ← inference
train.py ← training
model.py ← architecture (pixels β†’ weights β†’ forward pass)
dataset/ ← training data
cat.png
cat.txt ← prompt: "a cat"
...</code></pre>
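<p>
The <code>dataset/</code> layout pairs each image with a text file of the same name. A loader for that convention might look like the sketch below. The function name <code>load_pairs</code> and the decision to skip images that have no prompt file are assumptions, not the behaviour of <code>train.py</code>.
</p>

```python
from pathlib import Path


def load_pairs(dataset_dir):
    """Return (image_path, prompt) tuples for every PNG that has a
    same-named .txt file next to it, sorted by file name."""
    pairs = []
    for image_path in sorted(Path(dataset_dir).glob("*.png")):
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.exists():
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs
```

<p>
Calling <code>load_pairs("dataset")</code> on the layout above would yield an entry for <code>cat.png</code> with the prompt <code>a cat</code>, plus one entry for each further pair.
</p>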
<h2>
Why Build This?
</h2>
<blockquote>
It's a toy. It's not useful. But it's cool that it works.
</blockquote>
<p>
PixelModel has a fixed capacity of approximately 600K implicit parameters. While it won't replace your favorite diffusion model, it's a fascinating demonstration of how neural network weights can be encoded in unconventional ways.
</p>
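<p>
A quick back-of-the-envelope check relates the numbers quoted in this post. It assumes the 600K figure counts individual channel values in the 64×3200 image. That reading is an inference from the arithmetic, not something the post states.
</p>

```python
# Layer shapes from the architecture diagram.
layer_shapes = {"W1": (64, 32), "W2": (64, 64), "W3": (3072, 64)}
weight_count = sum(rows * cols for rows, cols in layer_shapes.values())

# Image dimensions from the file listing.
pixel_count = 64 * 3200
channel_values = pixel_count * 3   # red, green, and blue per pixel

print(weight_count)     # 202752
print(pixel_count)      # 204800
print(channel_values)   # 614400
```

<p>
The three matrices hold 202,752 weights, slightly fewer than the 204,800 pixels available, and the image stores 614,400 channel values in total, which is consistent with "approximately 600K".
</p>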
<p>
The project explores the boundaries of how we think about model storage and representation. What if your model could be shared as simply as an image file? What if you could <em>see</em> your neural network just by opening it in an image viewer?
</p>
<h2>
Try It Yourself
</h2>
<p>
The full code and trained model are available on the Hugging Face Hub. Clone the repository, provide your own image-prompt pairs, and watch as gradient descent transforms a PNG into a functioning neural network.
</p>
<p>
Check out the <a href="https://huggingface.co/seton-labs/pixelmodel">PixelModel repository</a> to get started.
</p>
</main>
</body>
</html>