bench-labs / pixelmodel

Text-to-Image


wop committed 6 days ago · Commit 1d88bd9 (verified) · 1 parent: 9936e1a

Update README.md

Files changed (1)

README.md +84 -147

@@ -2,156 +2,93 @@
 license: mit
 pipeline_tag: text-to-image
 ---
-<div style="font-family: system-ui, sans-serif; background: #0f0f0f; color: #eaeaea; padding: 2rem; border-radius: 16px; max-width: 860px; margin: auto;">
-  <!-- HERO -->
-  <div style="background: #161616; border: 1px solid #222; padding: 2rem; border-radius: 14px; text-align: center; margin-bottom: 1.5rem;">
-    <h1 style="color: #33b0d8; font-size: 2rem; margin: 0 0 0.5rem;">PixelModel 🖼️</h1>
-    <p style="color: #aaa; font-size: 0.95rem; margin: 0;">
-      A neural network where the weights <strong style="color: #ddd;">are</strong> the image.
-    </p>
-  </div>
-  <!-- DATASET VS OUTPUTS -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.4rem; border-radius: 12px; margin-bottom: 1.5rem;">
-    <h2 style="color: #fff; font-size: 1rem; margin: 0 0 0.5rem;">🧪 Dataset vs Outputs</h2>
-    <p style="color: #aaa; font-size: 0.875rem; margin: 0 0 1rem;">Ground truth dataset images compared with generated outputs.</p>
-    <table style="width: 100%; border-collapse: collapse; text-align: center;">
-      <tr>
-        <th style="padding: 8px; color: #33b0d8; font-size: 0.85rem;">Red</th>
-        <th style="padding: 8px; color: #33b0d8; font-size: 0.85rem;">Green</th>
-        <th style="padding: 8px; color: #33b0d8; font-size: 0.85rem;">Blue</th>
-      </tr>
-      <tr>
-        <td style="padding: 8px;">
-          <div style="font-size: 0.75rem; color: #555; margin-bottom: 4px;">dataset</div>
-          <img src="dataset/red.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-          <div style="font-size: 0.75rem; color: #555; margin: 4px 0;">output</div>
-          <img src="out_red.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-        </td>
-        <td style="padding: 8px;">
-          <div style="font-size: 0.75rem; color: #555; margin-bottom: 4px;">dataset</div>
-          <img src="dataset/green.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-          <div style="font-size: 0.75rem; color: #555; margin: 4px 0;">output</div>
-          <img src="out_green.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-        </td>
-        <td style="padding: 8px;">
-          <div style="font-size: 0.75rem; color: #555; margin-bottom: 4px;">dataset</div>
-          <img src="dataset/blue.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-          <div style="font-size: 0.75rem; color: #555; margin: 4px 0;">output</div>
-          <img src="out_blue.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-        </td>
-      </tr>
-      <tr>
-        <th style="padding: 8px; color: #33b0d8; font-size: 0.85rem;">White</th>
-        <th style="padding: 8px; color: #33b0d8; font-size: 0.85rem;">Yellow</th>
-        <th style="padding: 8px; color: #33b0d8; font-size: 0.85rem;">Dark</th>
-      </tr>
-      <tr>
-        <td style="padding: 8px;">
-          <div style="font-size: 0.75rem; color: #555; margin-bottom: 4px;">dataset</div>
-          <img src="dataset/white.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-          <div style="font-size: 0.75rem; color: #555; margin: 4px 0;">output</div>
-          <img src="out_white.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-        </td>
-        <td style="padding: 8px;">
-          <div style="font-size: 0.75rem; color: #555; margin-bottom: 4px;">dataset</div>
-          <img src="dataset/yellow.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-          <div style="font-size: 0.75rem; color: #555; margin: 4px 0;">output</div>
-          <img src="out_yellow.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-        </td>
-        <td style="padding: 8px;">
-          <div style="font-size: 0.75rem; color: #555; margin-bottom: 4px;">dataset</div>
-          <img src="dataset/dark.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-          <div style="font-size: 0.75rem; color: #555; margin: 4px 0;">output</div>
-          <img src="out_dark.png" style="width: 100px; image-rendering: pixelated; border-radius: 6px; display: block; margin: auto;" />
-        </td>
-      </tr>
-    </table>
-  </div>
-  <!-- WHAT IS THIS -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.4rem; border-radius: 12px; margin-bottom: 1.5rem;">
-    <h2 style="color: #fff; font-size: 1rem; margin: 0 0 0.6rem;">What is this?</h2>
-    <p style="color: #aaa; font-size: 0.875rem; line-height: 1.7; margin: 0 0 0.75rem;">
-      <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px;">model.png</code> is not a picture of anything — it <em>is</em> the model.
-      Every pixel's RGB values encode neural network weights:
-    </p>
-    <ul style="color: #aaa; font-size: 0.875rem; line-height: 1.7; margin: 0 0 0.75rem; padding-left: 1.1rem;">
-      <li><strong style="color: #ddd;">R channel</strong> — weight magnitude</li>
-      <li><strong style="color: #ddd;">B channel</strong> — weight sign (≥128 = positive)</li>
-      <li><strong style="color: #ddd;">G channel</strong> — bias values</li>
-    </ul>
-    <p style="color: #aaa; font-size: 0.875rem; line-height: 1.7; margin: 0;">
-      At inference, pixels are parsed into 3 weight matrices forming a tiny MLP.
-      The prompt is embedded into a vector, then a forward pass generates a 32×32 image.
-      Training directly optimizes pixel values via gradient descent until the PNG itself becomes the model.
-    </p>
-  </div>
-  <!-- FILES -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.4rem; border-radius: 12px; margin-bottom: 1.5rem;">
-    <h2 style="color: #fff; font-size: 1rem; margin: 0 0 0.75rem;">📁 Files</h2>
-    <pre style="background: #111; border: 1px solid #1e1e1e; padding: 1rem; border-radius: 8px; color: #aaa; font-size: 0.8rem; overflow-x: auto; margin: 0;">
 model.png       ← THE MODEL (64×3200 px)
 main.py         ← inference
 train.py        ← training
 model.py        ← architecture
-dataset/        ← training data
-  cat.png
-  cat.txt       ← prompt: "a cat"
-  ...</pre>
-  </div>
-  <!-- USAGE -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.4rem; border-radius: 12px; margin-bottom: 1.5rem;">
-    <h2 style="color: #fff; font-size: 1rem; margin: 0 0 0.75rem;">⚙️ Usage</h2>
-    <p style="color: #33b0d8; font-size: 0.8rem; font-weight: 600; margin: 0 0 0.3rem;">Train</p>
-    <pre style="background: #111; border: 1px solid #1e1e1e; padding: 0.9rem; border-radius: 8px; color: #aaa; font-size: 0.8rem; margin: 0 0 1rem;">
 python train.py
-python train.py --epochs 500 --lr 0.05</pre>
-    <p style="color: #33b0d8; font-size: 0.8rem; font-weight: 600; margin: 0 0 0.3rem;">Generate</p>
-    <pre style="background: #111; border: 1px solid #1e1e1e; padding: 0.9rem; border-radius: 8px; color: #aaa; font-size: 0.8rem; margin: 0 0 0.75rem;">
 python main.py "red"
-python main.py "a cat" --out cat_out.png --scale 8</pre>
-    <p style="color: #aaa; font-size: 0.8rem; margin: 0;">
-      <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px;">--scale 8</code> upscales 32×32 → 256×256 using nearest-neighbour interpolation.
-    </p>
-  </div>
-  <!-- ARCHITECTURE -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.4rem; border-radius: 12px; margin-bottom: 1.5rem;">
-    <h2 style="color: #fff; font-size: 1rem; margin: 0 0 0.75rem;">🧠 Architecture</h2>
-    <pre style="background: #111; border: 1px solid #1e1e1e; padding: 1rem; border-radius: 8px; color: #aaa; font-size: 0.8rem; overflow-x: auto; margin: 0 0 0.75rem;">
-prompt string
-  → char-level embedding → 32-dim vector
-  → W1 (64×32)  → tanh
-  → W2 (64×64)  → tanh
-  → W3 (3072×64) → sigmoid
-  → reshape → 32×32×3 image</pre>
-    <p style="color: #aaa; font-size: 0.875rem; margin: 0;">
-      All weights live inside <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px;">model.png</code>. Opening the PNG is literally opening the neural network.
-    </p>
-  </div>
-  <!-- DATASET TIPS -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.4rem; border-radius: 12px; margin-bottom: 1.5rem;">
-    <h2 style="color: #fff; font-size: 1rem; margin: 0 0 0.6rem;">📊 Dataset Tips</h2>
-    <ul style="color: #aaa; font-size: 0.875rem; line-height: 1.7; margin: 0; padding-left: 1.1rem;">
-      <li>6–20 image-prompt pairs is enough</li>
-      <li>Simple targets converge fastest (solid colors, gradients, shapes)</li>
-      <li>200–500 epochs typically sufficient</li>
-      <li>Loss below 0.001 is good for simple datasets</li>
-      <li>Model capacity is fixed (~600K implicit parameters)</li>
-    </ul>
-  </div>
-  <!-- FOOTER -->
-  <div style="background: #161616; border: 1px solid #222; padding: 1.2rem; border-radius: 12px; text-align: center;">
-    <p style="color: #aaa; font-size: 0.875rem; margin: 0 0 0.25rem;">
-      It's a toy. It's not useful. But it's cool that it works.
-    </p>
-    <p style="color: #444; font-size: 0.8rem; margin: 0;">Bench Labs · Simple, Reliable, Open sourced</p>
-  </div>
-</div>

+# PixelModel 🖼️
+A neural network where the weights **are** the image.
+## 📌 What is this?
+`model.png` is not a picture — it *is* the model.
+The pixels encode the network's weights. At inference, the PNG is decoded into three weight matrices that form a tiny MLP; the prompt is embedded into a 32-dim vector, and a forward pass generates a 32×32 RGB image.
+Training directly optimizes pixel values via gradient descent until the PNG becomes the model itself.
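
As a rough illustration of the decoding step, the sketch below slices a flat list of decoded weights into W1, W2 and W3 and checks that a 64×3200 px image has room for all of them. It assumes (this README does not say) that the matrices are stored back to back in row-major order with one weight per pixel. `split_weights` is a hypothetical helper, not a function from `model.py`.

```python
# Assumed layout (not documented): W1, W2, W3 stored back to back in
# row-major order, one decoded weight per pixel. Shapes are (rows, cols).
SHAPES = [(64, 32), (64, 64), (3072, 64)]
IMAGE_SIZE = (64, 3200)  # model.png size from the Files section


def split_weights(flat, shapes=SHAPES):
    """Slice a flat sequence of weights into nested-list matrices."""
    needed = sum(rows * cols for rows, cols in shapes)
    if len(flat) < needed:
        raise ValueError(f"need {needed} weights, got {len(flat)}")
    matrices = []
    offset = 0
    for rows, cols in shapes:
        matrices.append(
            [list(flat[offset + r * cols:offset + (r + 1) * cols]) for r in range(rows)]
        )
        offset += rows * cols  # next matrix starts right after this one
    return matrices


n_weights = sum(rows * cols for rows, cols in SHAPES)
n_pixels = IMAGE_SIZE[0] * IMAGE_SIZE[1]
print(n_weights, n_pixels, n_pixels - n_weights)  # 202752 204800 2048
```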
+---
+## 🎨 Weight Encoding
+- **R channel** → weight magnitude (0–255 → 0.0–1.0)
+- **B channel** → weight sign (<128 = negative, ≥128 = positive)
+- **G channel** → unused / reserved
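
A minimal sketch of this mapping for a single pixel follows. It assumes the magnitude is exactly `R / 255` with no further scaling. `decode_weight` and `encode_weight` are hypothetical names, not functions from `model.py`. The encoder also shows the cost of storing weights as pixels: magnitudes snap to multiples of 1/255, and anything outside [-1, 1] is clipped.

```python
def decode_weight(r, b):
    """Combine one pixel's R (magnitude) and B (sign) channels into a weight."""
    magnitude = r / 255.0             # R: 0..255 -> 0.0..1.0
    sign = 1.0 if b >= 128 else -1.0  # B: >=128 positive, <128 negative
    return sign * magnitude


def encode_weight(w, g=0):
    """Quantize a float weight into an (R, G, B) pixel; G is passed through as reserved."""
    r = round(min(abs(w), 1.0) * 255)  # clip magnitude to 1.0, then 8-bit
    b = 255 if w >= 0 else 0           # any B >= 128 would also mean positive
    return (r, g, b)


r, g, b = encode_weight(-0.5)
print((r, g, b), round(decode_weight(r, b), 4))  # (128, 0, 0) -0.502
```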
+---
+## 🧠 Architecture
+```text
+prompt string
+  → char embedding → 32-dim vector
+  → W1 (64×32)  → tanh
+  → W2 (64×64)  → tanh
+  → W3 (3072×64) → sigmoid
+  → reshape → 32×32×3 image
+```
+All weights live inside `model.png`.
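
To trace the shapes in this diagram, here is a pure-Python sketch of one forward pass. `embed_prompt` is a hypothetical stand-in that hashes characters into 32 buckets and normalizes by prompt length. The README only says the embedding is character-level, so this version will not match `model.py`. There are no bias terms, matching the diagram. The 3072 outputs are assumed to be laid out row by row with R, G and B interleaved. The weights in the usage lines are random, so the result is noise, not a trained image.

```python
import math
import random


def embed_prompt(prompt, dim=32):
    """Hypothetical char-level embedding: count characters into `dim` buckets."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[(ord(ch) + i) % dim] += 1.0  # position-aware bucket
    n = max(len(prompt), 1)
    return [v / n for v in vec]          # normalize by prompt length


def matvec(w, x):
    """Multiply a nested-list matrix (rows x cols) by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def forward(prompt, w1, w2, w3, size=32):
    x = embed_prompt(prompt)                                    # 32-dim vector
    h1 = [math.tanh(v) for v in matvec(w1, x)]                  # W1 (64x32) -> tanh
    h2 = [math.tanh(v) for v in matvec(w2, h1)]                 # W2 (64x64) -> tanh
    out = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w3, h2)]  # W3 (3072x64) -> sigmoid
    # Reshape 3072 values into size x size pixels of (R, G, B), row-major (assumed).
    return [
        [tuple(out[(row * size + col) * 3:(row * size + col) * 3 + 3]) for col in range(size)]
        for row in range(size)
    ]


def random_matrix(rows, cols, rng):
    """Untrained weights in [-1, 1], only for exercising the shapes."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
img = forward("red", random_matrix(64, 32, rng), random_matrix(64, 64, rng),
              random_matrix(3072, 64, rng))
print(len(img), len(img[0]), len(img[0][0]))  # 32 32 3
```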
+---
+## 🧪 Dataset vs Outputs
+| Target                                     | Output                                 |
+| ------------------------------------------ | -------------------------------------- |
+| <img src="dataset/red.png" width="120">    | <img src="out_red.png" width="120">    |
+| <img src="dataset/green.png" width="120">  | <img src="out_green.png" width="120">  |
+| <img src="dataset/blue.png" width="120">   | <img src="out_blue.png" width="120">   |
+| <img src="dataset/white.png" width="120">  | <img src="out_white.png" width="120">  |
+| <img src="dataset/yellow.png" width="120"> | <img src="out_yellow.png" width="120"> |
+| <img src="dataset/dark.png" width="120">   | <img src="out_dark.png" width="120">   |
+---
+## 📁 Files
+```text
 model.png       ← THE MODEL (64×3200 px)
 main.py         ← inference
 train.py        ← training
 model.py        ← architecture
+dataset/
+  red.png
+  red.txt       ← prompt: "red"
+  ...
+```
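
The `dataset/` layout pairs each image with a same-named `.txt` prompt file. The sketch below collects those pairs with `pathlib`. It assumes one prompt per file and skips images that have no prompt file; the README does not say what `train.py` does in that case. `load_pairs` is a hypothetical helper, and decoding the PNG pixels is left out.

```python
from pathlib import Path


def load_pairs(root="dataset"):
    """Return sorted (image_path, prompt) pairs, e.g. (dataset/red.png, "red")."""
    pairs = []
    for image_path in sorted(Path(root).glob("*.png")):
        prompt_path = image_path.with_suffix(".txt")  # red.png -> red.txt
        if not prompt_path.exists():
            continue  # assumption: unpaired images are ignored
        prompt = prompt_path.read_text(encoding="utf-8").strip()
        pairs.append((image_path, prompt))
    return pairs
```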
+---
+## ⚙️ Usage
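+```bash
+# Train with the default settings
 python train.py
+# Train with an explicit epoch count and learning rate
+python train.py --epochs 500 --lr 0.05
+# Generate an image from a prompt
 python main.py "red"
+# --scale 8 upscales the 32×32 output to 256×256 (nearest-neighbour)
+python main.py "a cat" --out cat.png --scale 8
+```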
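
The earlier HTML version of this README described `--scale 8` as upscaling the 32×32 output to 256×256 with nearest-neighbour interpolation. Below is a sketch of that operation on a nested-list image in which each pixel is an (R, G, B) tuple. `upscale_nearest` is a hypothetical helper and may differ from what `main.py` actually calls.

```python
def upscale_nearest(image, scale):
    """Repeat each pixel `scale` times horizontally and each row `scale` times vertically."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    out = []
    for row in image:
        wide = [px for px in row for _ in range(scale)]  # widen the row
        out.extend(list(wide) for _ in range(scale))     # copies, so rows are not aliased
    return out


tiny = [[(255, 0, 0), (0, 0, 255)],
        [(0, 255, 0), (255, 255, 255)]]
big = upscale_nearest(tiny, 2)
print(len(big), len(big[0]))  # 4 4
```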
+---
+## 📊 Tips
+* 6–20 image–prompt pairs are usually enough
+* Simple targets (solid colors, gradients, shapes) converge fastest
+* 200–500 epochs are typically sufficient
+* A loss below 0.001 is a strong result for toy datasets
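
To put "loss below 0.001" in perspective, assume the loss is mean squared error over the sigmoid outputs on a 0–1 scale. The README does not name the loss function, so this is an assumption. Under it, 0.001 corresponds to a root-mean-square error of about 0.032 per channel, roughly 8 levels out of 255. `mse` is a hypothetical helper.

```python
import math


def mse(target, output):
    """Mean squared error between two equal-length lists of values in [0, 1]."""
    if len(target) != len(output):
        raise ValueError("length mismatch")
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


threshold = 0.001
rms = math.sqrt(threshold)                  # ~0.0316 on the 0-1 scale
print(round(rms, 4), round(rms * 255, 1))   # 0.0316 8.1

target = [1.0, 0.0, 0.0] * 1024    # flattened 32x32 solid red, values in [0, 1]
output = [0.97, 0.03, 0.03] * 1024  # every channel off by 0.03
print(mse(target, output) < threshold)  # True (MSE is about 0.0009)
```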
+---
+*It’s a toy. It’s not useful. But it works.*
+Bench Labs · Simple, Reliable, Open sourced