---
license: apache-2.0
datasets:
- TopAI-1/Image-Dataset
language:
- en
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Pixel-1: From-Scratch Text-to-Image Generator 🎨

Pixel-1 is a lightweight, experimental text-to-image model built and trained entirely from scratch. Unlike many modern generators that rely on massive pre-trained diffusion backbones, Pixel-1 explores the potential of a compact architecture to understand and render complex semantic prompts.

## 🚀 The Achievement

Pixel-1 was designed to prove that even a small model can achieve high logical alignment with user prompts. It successfully renders complex concepts like **window bars**, **fence shadows**, and **specific color contrasts**, features usually reserved for much larger models.

### Key Features:

* **Built from Scratch:** The generator architecture (upsampling, residual blocks, and projections) was designed and trained without pre-trained image weights.
* **High Prompt Adherence:** Exceptional ability to "listen" to complex instructions (e.g., "Window with metal bars and fence shadow").
* **Efficient Architecture:** Optimized for fast inference and training on consumer-grade GPUs (like Kaggle's T4).
* **Latent Understanding:** Uses a CLIP-based text encoder to bridge the gap between human language and pixel space.
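The three building blocks named above (upsampling, residual blocks, and projections) can be sketched in plain PyTorch. This is an illustrative reconstruction, not the actual Pixel-1 implementation: the layer count, channel widths, and the 768-dim CLIP embedding size are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with an identity skip."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class Generator(nn.Module):
    """Projects a text embedding to a 4x4 feature map, then upsamples
    4 -> 8 -> 16 -> 32 -> 64 -> 128 with transposed convolutions."""
    def __init__(self, embed_dim=768, base=512):
        super().__init__()
        self.proj = nn.Linear(embed_dim, base * 4 * 4)   # "projections"
        stages, ch = [], base
        for _ in range(5):                               # "upsampling"
            stages += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.ReLU(inplace=True),
                       ResBlock(ch // 2)]                # "residual blocks"
            ch //= 2
        self.stages = nn.Sequential(*stages)
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, z):
        x = self.proj(z).view(z.size(0), -1, 4, 4)
        return torch.tanh(self.to_rgb(self.stages(x)))

g = Generator()
img = g(torch.randn(1, 768))   # -> shape (1, 3, 128, 128)
```

Because the generator is trained from random initialization, everything above fits comfortably on a single consumer GPU.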
---

## 🏗️ Architecture

The model uses a series of transposed convolutional layers combined with residual blocks to upsample a latent text vector into a 128x128 image.

* **Encoder:** CLIP (openai/clip-vit-large-patch14)
* **Decoder:** Custom CNN-based generator with skip connections
* **Loss Function:** L1/MSE transition
* **Resolution:** 128x128 (v1)
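The "L1/MSE transition" above can be read as a blended reconstruction loss. The sketch below, a linear blend from pure L1 toward pure MSE over training, is a hypothetical schedule; the card does not state the direction or timing of the transition.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(pred, target, step, total_steps):
    # Blend from pure L1 at step 0 toward pure MSE at total_steps.
    # The linear schedule and its direction are assumptions.
    w = min(step / total_steps, 1.0)
    return (1.0 - w) * F.l1_loss(pred, target) + w * F.mse_loss(pred, target)

pred = torch.tensor([2.0])
target = torch.tensor([0.0])
loss_early = reconstruction_loss(pred, target, 0, 1000)     # pure L1  -> 2.0
loss_late = reconstruction_loss(pred, target, 1000, 1000)   # pure MSE -> 4.0
```

L1 is more robust to outlier pixels early on, while MSE penalizes large errors more strongly, which is one common motivation for such a hand-off.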
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
## 🖼️ Samples & Prompting

Pixel-1 shines when given high-contrast, descriptive prompts.

**Recommended Prompting Style:**

> *"Window with metal bars and fence shadow, high contrast, vivid colors, detailed structure"*

**Observations:**

While the current version (v1) produces stylistic, slightly "painterly" or "pixelated" results, its spatial reasoning is remarkably accurate, correctly placing shadows and structural elements according to the text.

---

## 🛠️ How to use

```python
from transformers import AutoTokenizer, CLIPTextModel

# Load the custom Pixel-1 generator (TopAIImageGenerator is the model
# class that ships with this repository)
model = TopAIImageGenerator.from_pretrained("Your-Repo/Pixel-1")

# Pixel-1 conditions on features from the CLIP text encoder
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Generate
prompt = "Your creative prompt here"
# ... (standard inference code)
```
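The elided inference step ends with converting the generator's output tensor to an image. A minimal sketch of that conversion, assuming a tanh-activated output in [-1, 1] (the activation convention is an assumption, not stated by this card):

```python
import torch
from PIL import Image

def tensor_to_image(t):
    # t: (3, H, W) float tensor in [-1, 1]; map to a uint8 RGB image.
    arr = ((t.clamp(-1.0, 1.0) + 1.0) / 2.0 * 255.0).byte()
    return Image.fromarray(arr.permute(1, 2, 0).cpu().numpy())

img = tensor_to_image(torch.zeros(3, 128, 128))   # mid-gray 128x128 image
img.save("pixel1_sample.png")
```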