---
license: apache-2.0
datasets:
- TopAI-1/Image-Dataset
language:
- en
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Pixel-1: From-Scratch Text-to-Image Generator 🎨

Pixel-1 is a lightweight, experimental text-to-image model built and trained entirely from scratch. Unlike many modern generators that rely on massive pre-trained diffusion backbones, Pixel-1 explores the potential of a compact architecture to understand and render complex semantic prompts.

## 🚀 The Achievement

Pixel-1 was designed to prove that even a small model can achieve high logical alignment with user prompts. It successfully renders complex concepts like **window bars**, **fence shadows**, and **specific color contrasts**, features usually reserved for much larger models.

### Key Features:

* **Built from Scratch:** The generator architecture (upsampling, residual blocks, and projections) was designed and trained without pre-trained image weights.
* **High Prompt Adherence:** Exceptional ability to "listen" to complex instructions (e.g., "Window with metal bars and fence shadow").
* **Efficient Architecture:** Optimized for fast inference and training on consumer-grade GPUs (like Kaggle's T4).
* **Latent Understanding:** Uses a CLIP-based text encoder to bridge the gap between human language and pixel space.
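The three building blocks named above (upsampling, residual blocks, and projections) can be sketched in plain PyTorch. This is an illustrative reconstruction, not the actual Pixel-1 implementation: the layer count, channel widths, and the 768-dim CLIP embedding size are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with an identity skip."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class Generator(nn.Module):
    """Projects a text embedding to a 4x4 feature map, then upsamples
    4 -> 8 -> 16 -> 32 -> 64 -> 128 with transposed convolutions."""
    def __init__(self, embed_dim=768, base=512):
        super().__init__()
        self.proj = nn.Linear(embed_dim, base * 4 * 4)   # "projections"
        stages, ch = [], base
        for _ in range(5):                               # "upsampling"
            stages += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.ReLU(inplace=True),
                       ResBlock(ch // 2)]                # "residual blocks"
            ch //= 2
        self.stages = nn.Sequential(*stages)
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, z):
        x = self.proj(z).view(z.size(0), -1, 4, 4)
        return torch.tanh(self.to_rgb(self.stages(x)))

g = Generator()
img = g(torch.randn(1, 768))   # -> shape (1, 3, 128, 128)
```

Because the generator is trained from random initialization, everything above fits comfortably on a single consumer GPU.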
---

## 🏗️ Architecture

The model uses a series of transposed convolutional layers combined with residual blocks to upsample a latent text vector into a 128x128 image.

* **Encoder:** CLIP (openai/clip-vit-large-patch14)
* **Decoder:** Custom CNN-based generator with skip connections
* **Loss Function:** L1/MSE transition
* **Resolution:** 128x128 (v1)
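The "L1/MSE transition" above can be read as a blended reconstruction loss. The sketch below, a linear blend from pure L1 toward pure MSE over training, is a hypothetical schedule; the card does not state the direction or timing of the transition.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(pred, target, step, total_steps):
    # Blend from pure L1 at step 0 toward pure MSE at total_steps.
    # The linear schedule and its direction are assumptions.
    w = min(step / total_steps, 1.0)
    return (1.0 - w) * F.l1_loss(pred, target) + w * F.mse_loss(pred, target)

pred = torch.tensor([2.0])
target = torch.tensor([0.0])
loss_early = reconstruction_loss(pred, target, 0, 1000)     # pure L1  -> 2.0
loss_late = reconstruction_loss(pred, target, 1000, 1000)   # pure MSE -> 4.0
```

L1 is more robust to outlier pixels early on, while MSE penalizes large errors more strongly, which is one common motivation for such a hand-off.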
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
## 🖼️ Samples & Prompting

Pixel-1 shines when given high-contrast, descriptive prompts.

**Recommended Prompting Style:**

> *"Window with metal bars and fence shadow, high contrast, vivid colors, detailed structure"*

**Observations:**

While the current version (v1) produces stylistic, slightly "painterly" or "pixelated" results, its spatial reasoning is remarkably accurate, correctly placing shadows and structural elements according to the text.

---

## 🛠️ How to use

```python
from transformers import AutoTokenizer, CLIPTextModel

# Load the custom Pixel-1 generator (TopAIImageGenerator is the model
# class that ships with this repository)
model = TopAIImageGenerator.from_pretrained("Your-Repo/Pixel-1")

# Pixel-1 conditions on features from the CLIP text encoder
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Generate
prompt = "Your creative prompt here"
# ... (standard inference code)
```
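The elided inference step ends with converting the generator's output tensor to an image. A minimal sketch of that conversion, assuming a tanh-activated output in [-1, 1] (the activation convention is an assumption, not stated by this card):

```python
import torch
from PIL import Image

def tensor_to_image(t):
    # t: (3, H, W) float tensor in [-1, 1]; map to a uint8 RGB image.
    arr = ((t.clamp(-1.0, 1.0) + 1.0) / 2.0 * 255.0).byte()
    return Image.fromarray(arr.permute(1, 2, 0).cpu().numpy())

img = tensor_to_image(torch.zeros(3, 128, 128))   # mid-gray 128x128 image
img.save("pixel1_sample.png")
```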