---
license: mit
pipeline_tag: text-to-image
---
# PixelModel 🖼️
A neural network where the weights **are** the image.
## What is this?
`model.png` is not a picture; it *is* the model.
Every pixel encodes neural network weights. At inference, the PNG is decoded into weight matrices forming a tiny MLP. The prompt is embedded into a vector, and the model generates a 32×32 image.
Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.
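The idea of training weights that must fit inside pixels can be sketched on a toy problem. This is not what `train.py` does: it fits a single linear layer instead of the MLP, and the function name `train_pixel_weights` is hypothetical. It also assumes that weights stay in floating point during training, are clipped to [-1, 1] after each step (the range the weight encoding can represent), and are quantized to 8 bits only at the end.

```python
def train_pixel_weights(samples, n_weights, epochs=300, lr=0.05):
    """Toy gradient descent on weights limited to the pixel-representable range."""
    w = [0.0] * n_weights
    for _ in range(epochs):
        grad = [0.0] * n_weights
        for x, y in samples:
            # Prediction error of a single linear layer (mean squared error loss).
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(samples)
        # Take a step, then clip: magnitudes above 1.0 cannot be stored in a pixel.
        w = [max(-1.0, min(1.0, wi - lr * gi)) for wi, gi in zip(w, grad)]
    # Quantize to 256 levels, as writing to an 8-bit channel would (assumed step).
    return [round(wi * 255) / 255 for wi in w]


# Three samples generated by the weights (0.5, -0.25).
samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.25), ([1.0, 1.0], 0.25)]
weights = train_pixel_weights(samples, n_weights=2)
print(weights)
```

The recovered weights land within one quantization step (1/255) of the true values, which is the best an 8-bit channel allows.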
---
## 🎨 Weight Encoding
- **R channel**: weight magnitude (0–255 → 0.0–1.0)
- **B channel**: weight sign (<128 = negative, ≥128 = positive)
- **G channel**: unused / reserved
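The channel mapping above can be sketched in a few lines. This is an illustration, not the code in `model.py`: the helper names `decode_pixel` and `encode_weight` are hypothetical, dividing by 255 is an assumption about how 0–255 is scaled to 0.0–1.0, and the B values written by the encoder (0 and 255) are arbitrary picks within each sign range.

```python
def decode_pixel(r: int, g: int, b: int) -> float:
    """Turn one RGB pixel into a signed weight (hypothetical helper)."""
    magnitude = r / 255.0              # R: 0..255 scaled to 0.0..1.0 (assumed divisor)
    sign = 1.0 if b >= 128 else -1.0   # B: <128 is negative, >=128 is positive
    return sign * magnitude            # G is ignored (unused / reserved)


def encode_weight(w: float) -> tuple[int, int, int]:
    """Inverse mapping from a signed weight to an RGB pixel (hypothetical helper)."""
    w = max(-1.0, min(1.0, w))         # only [-1, 1] is representable
    r = round(abs(w) * 255)            # nearest of the 256 magnitude levels
    b = 255 if w >= 0 else 0           # any value on the correct side of 128 works
    return (r, 0, b)


print(decode_pixel(255, 0, 255))
print(decode_pixel(255, 0, 0))
print(encode_weight(-1.0))
```

Under these assumptions each weight has 256 magnitude levels, so the smallest representable step is 1/255, roughly 0.004.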
---
## 🧠 Architecture
```text
prompt string
  → char embedding → 32-dim vector
  → W1 (64×32) → tanh
  → W2 (64×64) → tanh
  → W3 (3072×64) → sigmoid
  → reshape → 32×32×3 image
```
All weights live inside `model.png`.
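A minimal forward pass matching the shapes above might look like the following. It is a sketch under assumptions, not the contents of `model.py`: the weights are random stand-ins rather than values decoded from `model.png`, biases are left out because none are listed here, the embedding is a random 32-dim vector because the character embedding is not specified, and the row-major reshape order is a guess. Pure Python keeps it dependency-free; a real implementation would more likely use NumPy or PyTorch.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def forward(embedding, W1, W2, W3):
    """Map a 32-dim embedding to a 32x32x3 nested list of values in [0, 1]."""
    h1 = [math.tanh(v) for v in matvec(W1, embedding)]            # 64 values
    h2 = [math.tanh(v) for v in matvec(W2, h1)]                   # 64 values
    out = [1.0 / (1.0 + math.exp(-v)) for v in matvec(W3, h2)]    # 3072 values
    # Reshape the flat output to rows x cols x channels (row-major order assumed).
    return [[out[(r * 32 + c) * 3:(r * 32 + c) * 3 + 3] for c in range(32)]
            for r in range(32)]


def random_matrix(rows, cols, rng):
    """Stand-in weights in [-1, 1], the range the pixel encoding can represent."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W1 = random_matrix(64, 32, rng)
W2 = random_matrix(64, 64, rng)
W3 = random_matrix(3072, 64, rng)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(32)]  # placeholder embedding

image = forward(embedding, W1, W2, W3)
print(len(image), len(image[0]), len(image[0][0]))  # 32 32 3
```

Multiplying each output value by 255 and rounding would give 8-bit RGB values suitable for saving as an image.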
---
## 🧪 Dataset vs Outputs
| Target | Output |
| ------ | ------ |
| *(images not preserved)* | *(images not preserved)* |
---
## Files
```text
model.png      ← THE MODEL (64×3200 px)
main.py        ← inference
train.py       ← training
model.py       ← architecture
dataset/
  red.png
  red.txt      ← prompt: "red"
  ...
```
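As a sanity check on the size of `model.png`, the arithmetic below compares the pixel count with the number of weights in the three listed matrices. It assumes one pixel per weight, which the weight encoding implies but this README does not state outright.

```python
# Layer shapes as listed in the architecture section: (rows, cols).
layer_shapes = {"W1": (64, 32), "W2": (64, 64), "W3": (3072, 64)}

weights_per_layer = {name: r * c for name, (r, c) in layer_shapes.items()}
total_weights = sum(weights_per_layer.values())

png_pixels = 64 * 3200  # model.png dimensions from the file listing

print(weights_per_layer)            # {'W1': 2048, 'W2': 4096, 'W3': 196608}
print(total_weights)                # 202752
print(png_pixels)                   # 204800
print(png_pixels - total_weights)   # 2048
```

The three matrices account for 202,752 of the 204,800 pixels. What the remaining 2,048 pixels hold is not documented here; the character embedding or padding are possibilities, but that is only a guess.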
---
## ⚙️ Usage
```bash
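# Train with default settings
python train.py
# Train with an explicit epoch count and learning rate
python train.py --epochs 500 --lr 0.05
# Generate an image for the prompt "red"
python main.py "red"
# Generate for another prompt, setting the output file and scale factor
python main.py "a cat" --out cat.png --scale 8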
```
---
## Tips
* 6–20 samples are enough
* Simple patterns converge fastest
* 200–500 epochs are typical
* A loss below 0.001 indicates a close fit on toy datasets
---
*It's a toy. It's not useful. But it works.*
Bench Labs · Simple, Reliable, Open sourced