---
datasets:
- ffurfaro/PixelBytes-Pokemon
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- audio-generation
- multimodal
---

# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models that generate text and images simultaneously, pixel by pixel, using a unified embedding. (These weights are for testing only.)

## Overview

### Key Concepts

- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional Mamba model for time-series prediction.
- **MambaByte**: A token-free selective state-space model.

The PixelBytes model generates mixed sequences of text and images, handling transitions with line breaks and maintaining consistent image dimensions.
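To make the pixel-by-pixel idea concrete, here is a toy autoregressive sketch in PyTorch. This is **not** the actual PixelBytes architecture: the `ToyPixelByteLM` class, vocabulary size, and layer dimensions are all illustrative assumptions; it only shows the token-at-a-time generation loop over a unified byte/pixel vocabulary.

```python
import torch
import torch.nn as nn

# Illustrative assumptions only -- the real model's sizes are not given here.
VOCAB_SIZE = 256   # e.g. one token per byte or palette index
EMBED_DIM = 32
HIDDEN_DIM = 64

class ToyPixelByteLM(nn.Module):
    """Toy autoregressive LSTM over a unified byte/pixel vocabulary."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    @torch.no_grad()
    def generate(self, prompt, steps):
        tokens = list(prompt)
        state = None
        x = torch.tensor([tokens])
        for _ in range(steps):
            out, state = self.lstm(self.embed(x), state)
            # Greedy choice of the next token from the last hidden state.
            next_tok = self.head(out[:, -1]).argmax(-1).item()
            tokens.append(next_tok)
            x = torch.tensor([[next_tok]])  # feed back one token at a time
        return tokens

model = ToyPixelByteLM()
seq = model.generate(prompt=[10, 20, 30], steps=5)
print(len(seq))  # prompt length + generated steps
```

In the real setting, generated tokens would alternate between text bytes and pixel values, with line-break tokens marking transitions between modalities.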

## Dataset

We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon for training our model.

## Models Trained

- **3 LSTM Models**: 2 autoregressive and 1 purely predictive.

---
|
| 37 |
+
|
| 38 |
+
Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.
|