---
license: mit
pipeline_tag: text-to-image
---

# PixelModel 🖼️

A neural network where the weights **are** the image.

## 📌 What is this?

`model.png` is not a picture; it *is* the model.

Every pixel encodes neural network weights. At inference, the PNG is decoded into weight matrices that form a tiny MLP. The prompt is embedded into a 32-dimensional vector, and the model generates a 32×32 RGB image.

Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.

---

## 🎨 Weight Encoding

- **R channel** → weight magnitude (0–255 → 0.0–1.0)
- **B channel** → weight sign (<128 = negative, ≥128 = positive)
- **G channel** → unused / reserved
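
The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the code from `model.py`: the function names `decode_pixel` and `encode_weight` are hypothetical, and writing `255`/`0` to the B channel is an arbitrary choice (any value on the correct side of 128 would decode the same way).

```python
def decode_pixel(r: int, g: int, b: int) -> float:
    """Map one RGB pixel to a signed weight in [-1.0, 1.0]."""
    magnitude = r / 255.0               # R: 0..255 -> 0.0..1.0
    sign = 1.0 if b >= 128 else -1.0    # B: >=128 positive, <128 negative
    return sign * magnitude             # G is ignored (reserved)


def encode_weight(w: float) -> tuple[int, int, int]:
    """Inverse mapping (assumed): clip to [-1, 1], then quantize to 8 bits."""
    clipped = max(-1.0, min(1.0, w))
    r = round(abs(clipped) * 255)
    b = 255 if clipped >= 0 else 0      # arbitrary values on each side of 128
    return (r, 0, b)


if __name__ == "__main__":
    print(decode_pixel(255, 0, 0))      # full magnitude, negative sign
    print(encode_weight(1.0))           # full magnitude, positive sign
```

Two consequences follow from the encoding itself: weights cannot leave the range [-1, 1], and the magnitude is limited to 256 levels, so a round trip through the PNG can shift a weight by up to about 0.002.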

---

## 🧠 Architecture

```text
prompt string
  → char embedding → 32-dim vector
  → W1 (64×32)  → tanh
  → W2 (64×64)  → tanh
  → W3 (3072×64) → sigmoid
  → reshape → 32×32×3 image
```
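
As a rough sketch of the pipeline in the diagram, the forward pass can be written with the standard library alone. Several things here are assumptions rather than the project's actual code: `embed_prompt` is a hypothetical stand-in for the character embedding (the source does not say how it works), no bias terms are used because the diagram shows none, the reshape is assumed to be row-major with channels last, and the weights are random placeholders instead of values decoded from `model.png`.

```python
import math
import random


def embed_prompt(prompt: str, dim: int = 32) -> list[float]:
    """HYPOTHETICAL embedding: fold scaled character codes into `dim` slots
    and average over the prompt length. Not the project's real embedding."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += (ord(ch) % 128) / 127.0
    n = max(1, len(prompt))
    return [v / n for v in vec]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def forward(prompt, w1, w2, w3):
    x = embed_prompt(prompt)                       # 32 values
    h1 = [math.tanh(v) for v in matvec(w1, x)]     # W1 (64x32)   -> 64 values
    h2 = [math.tanh(v) for v in matvec(w2, h1)]    # W2 (64x64)   -> 64 values
    flat = [sigmoid(v) for v in matvec(w3, h2)]    # W3 (3072x64) -> 3072 values
    # Assumed reshape: 32 rows x 32 columns x 3 channels, row-major.
    return [[flat[(r * 32 + c) * 3:(r * 32 + c) * 3 + 3] for c in range(32)]
            for r in range(32)]


rng = random.Random(0)


def random_matrix(rows: int, cols: int) -> list[list[float]]:
    """Placeholder weights in [-1, 1], the range the PNG encoding allows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


w1 = random_matrix(64, 32)
w2 = random_matrix(64, 64)
w3 = random_matrix(3072, 64)
image = forward("red", w1, w2, w3)
print(len(image), len(image[0]), len(image[0][0]))  # 32 32 3
```

Because the last layer is a sigmoid, every output value lies in (0, 1) and can be scaled by 255 to get 8-bit colour channels.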

All weights live inside `model.png`.

---

## 🧪 Dataset vs Outputs

| Target                                     | Output                                 |
| ------------------------------------------ | -------------------------------------- |
| <img src="dataset/red.png" width="120">    | <img src="out_red.png" width="120">    |
| <img src="dataset/green.png" width="120">  | <img src="out_green.png" width="120">  |
| <img src="dataset/blue.png" width="120">   | <img src="out_blue.png" width="120">   |
| <img src="dataset/white.png" width="120">  | <img src="out_white.png" width="120">  |
| <img src="dataset/yellow.png" width="120"> | <img src="out_yellow.png" width="120"> |
| <img src="dataset/dark.png" width="120">   | <img src="out_dark.png" width="120">   |

---

## 📁 Files

```text
model.png       ← THE MODEL (64×3200 px)
main.py         ← inference
train.py        ← training
model.py        ← architecture
dataset/
  red.png
  red.txt       ← prompt: "red"
  ...
```
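
The 64×3200 size is consistent with the three weight matrices from the architecture section. The sketch below counts the weights and shows one layout that would fit. The layout itself is purely an assumption (the source does not describe how matrices are arranged in the image, nor which side is the width), and `column_layout` is a hypothetical helper.

```python
# Shapes from the architecture section, as (rows, cols) = (outputs, inputs).
SHAPES = {"W1": (64, 32), "W2": (64, 64), "W3": (3072, 64)}

# model.png dimensions; treating 64 as the row count is an assumption.
GRID_ROWS, GRID_COLS = 64, 3200


def column_layout(shapes: dict, grid_rows: int):
    """ASSUMED layout: matrices sit side by side, each occupying a band of
    columns that is `grid_rows` pixels tall (W3 would be stored transposed)."""
    layout, start = {}, 0
    for name, (rows, cols) in shapes.items():
        width = rows * cols // grid_rows      # columns this matrix occupies
        layout[name] = (start, start + width)
        start += width
    return layout, start


layout, used_cols = column_layout(SHAPES, GRID_ROWS)
total_weights = sum(r * c for r, c in SHAPES.values())
spare_pixels = GRID_ROWS * GRID_COLS - total_weights

print(layout)         # column band per matrix
print(total_weights)  # 202752
print(spare_pixels)   # 2048
```

Under this assumed layout the matrices use 3168 of the 3200 columns, leaving 2048 pixels over. The source does not say what those pixels hold.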

---

## ⚙️ Usage

```bash
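# Train with the default settings
python train.py
# Train with an explicit epoch count and learning rate
python train.py --epochs 500 --lr 0.05

# Generate an image for a prompt
python main.py "red"
# Generate with a custom output path and scale factor
python main.py "a cat" --out cat.png --scale 8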
```

---

## 📊 Tips

* 6–20 samples are enough
* Simple patterns converge fastest
* 200–500 epochs typical
* Loss < 0.001 is strong for toy datasets

---

*It’s a toy. It’s not useful. But it works.*

Bench Labs · Simple, Reliable, Open sourced