Raziel1234 committed · verified · commit e6551d4 · 1 parent: 51ff947

Update README.md

Files changed (1): README.md (+63 −3)
---
license: apache-2.0
datasets:
- TopAI-1/Image-Dataset
language:
- en
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Pixel-1: From-Scratch Text-to-Image Generator 🎨

Pixel-1 is a lightweight, experimental text-to-image model built and trained entirely from scratch. Unlike many modern generators that rely on massive pre-trained diffusion backbones, Pixel-1 explores the potential of a compact architecture to understand and render complex semantic prompts.

## 🚀 The Achievement
Pixel-1 was designed to prove that even a small model can achieve high logical alignment with user prompts. It successfully renders complex concepts like **window bars**, **fence shadows**, and **specific color contrasts**, features usually reserved for much larger models.

### Key Features
* **Built from scratch:** The generator architecture (upsampling, residual blocks, and projections) was designed and trained without pre-trained image weights.
* **High prompt adherence:** Exceptional ability to "listen" to complex instructions (e.g., "Window with metal bars and fence shadow").
* **Efficient architecture:** Optimized for fast inference and training on consumer-grade GPUs (like Kaggle's T4).
* **Latent understanding:** Uses a CLIP-based text encoder to bridge the gap between human language and pixel space.
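
The "latent understanding" feature above can be sketched as a projection step: a pooled text embedding (768-dimensional for clip-vit-large-patch14's text encoder) is mapped into a small spatial latent grid that a convolutional generator can then upsample. This is a minimal PyTorch illustration; the dimensions and the `to_latent` name are assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical bridge from language to pixel space: a linear projection
# reshapes a pooled 768-dim text embedding into an 8x8 feature grid.
TEXT_DIM, LATENT_CH, LATENT_HW = 768, 256, 8

to_latent = nn.Linear(TEXT_DIM, LATENT_CH * LATENT_HW * LATENT_HW)

pooled = torch.randn(1, TEXT_DIM)  # stand-in for a CLIP text embedding
latent = to_latent(pooled).view(-1, LATENT_CH, LATENT_HW, LATENT_HW)
print(latent.shape)  # torch.Size([1, 256, 8, 8])
```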

---

## 🏗️ Architecture
The model uses a series of transposed convolutional layers combined with residual blocks to upsample a latent text vector into a 128×128 image.

* **Encoder:** CLIP (openai/clip-vit-large-patch14)
* **Decoder:** Custom CNN-based generator with skip connections
* **Loss function:** L1/MSE transition
* **Resolution:** 128×128 (v1)
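
The decoder described above can be sketched in pure PyTorch as a stack of transposed convolutions interleaved with residual blocks, doubling the spatial size four times (8 → 128). The class names and channel counts below are illustrative assumptions; the actual Pixel-1 generator also includes skip connections and the text-embedding projection, which are omitted here.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with an additive skip connection (channels unchanged)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class PixelDecoder(nn.Module):
    """Upsample an (N, 256, 8, 8) latent to (N, 3, 128, 128) in four 2x steps."""
    def __init__(self, ch=256):
        super().__init__()
        layers = []
        for _ in range(4):  # 8 -> 16 -> 32 -> 64 -> 128
            layers += [
                nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                nn.BatchNorm2d(ch // 2),
                nn.ReLU(inplace=True),
                ResidualBlock(ch // 2),
            ]
            ch //= 2
        layers.append(nn.Conv2d(ch, 3, 3, padding=1))  # project to RGB
        self.net = nn.Sequential(*layers)

    def forward(self, latent):
        return torch.tanh(self.net(latent))  # pixel values in [-1, 1]

decoder = PixelDecoder()
image = decoder(torch.randn(1, 256, 8, 8))  # stand-in latent grid
print(image.shape)  # torch.Size([1, 3, 128, 128])
```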

---

## 🖼️ Samples & Prompting
Pixel-1 shines when given high-contrast, descriptive prompts.

**Recommended prompting style:**
> *"Window with metal bars and fence shadow, high contrast, vivid colors, detailed structure"*

**Observations:**
While the current version (v1) produces stylistic, slightly "painterly" or "pixelated" results, its spatial reasoning is remarkably accurate, correctly placing shadows and structural elements according to the text.

---

## 🛠️ How to use
The snippet below is a sketch: `TopAIImageGenerator` is the custom generator class shipped with this repository, and the exact inference call depends on your checkpoint.

```python
import torch
from transformers import AutoTokenizer, CLIPTextModel

# Load the custom Pixel-1 generator (class defined in this repository)
model = TopAIImageGenerator.from_pretrained("Your-Repo/Pixel-1")

# CLIP tokenizer and text encoder used as the prompt encoder
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Generate
prompt = "Your creative prompt here"
inputs = tokenizer(prompt, return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeds = text_encoder(**inputs).pooler_output
# ... (standard inference code: pass text_embeds to the generator)
```