---
license: mit
language: en
tags:
- text-to-image
- diffusion
- cpu-optimized
- bytedream
- clip
pipeline_tag: text-to-image
---
# Byte Dream - Text-to-Image Model
## Overview
Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference.
It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.
## Features
- ✅ **CPU Optimized**: Runs efficiently on CPU (no GPU required)
- ✅ **High Quality**: Generates 512x512 images
- ✅ **Fast Inference**: Optimized for speed
- ✅ **Easy to Use**: Simple Python API and web interface
- ✅ **Open Source**: MIT License
## Installation
```bash
pip install torch pillow transformers
git lfs install
git clone https://huggingface.co/Enzo8930302/ByteDream
cd ByteDream
```
## Usage
### Quick Start
```python
from bytedream import ByteDreamGenerator

# Load model
generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")

# Generate image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,
    guidance_scale=7.5,
)
image.save("output.png")
```
### Using Cloud API
```python
from bytedream import ByteDreamHFClient

client = ByteDreamHFClient(
    repo_id="Enzo8930302/ByteDream",
    use_api=True,
)
image = client.generate(
    prompt="Futuristic city at night, cyberpunk",
)
image.save("output.png")
```
## Training
Train on your own dataset:
```bash
# Create dataset
python create_test_dataset.py
# Train model
python train.py --config config.yaml --train_data dataset
```
## Web Interface
Launch the Gradio web interface:
```bash
python app.py
```
Or deploy to Hugging Face Spaces:
```bash
python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
```
## Model Architecture
- **Text Encoder**: CLIP ViT-B/32 (512 dimensions)
- **UNet**: Custom architecture with cross-attention
- **VAE**: Autoencoder for latent space
- **Scheduler**: DDIM sampling
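
The scheduler entry above names DDIM sampling. As a rough illustration of what one deterministic DDIM update (eta = 0) does, here is a minimal sketch on plain Python lists. The function name `ddim_step` and its arguments are hypothetical and are not taken from `bytedream/scheduler.py`; the repository's own scheduler may differ in details such as the noise schedule or tensor handling.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0), element-wise on lists of floats.

    Hypothetical helper for illustration only; not the ByteDream scheduler API.
    alpha_bar_t and alpha_bar_prev are cumulative noise-schedule products in (0, 1].
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_a_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_a_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for x, eps in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the predicted noise.
        x0_pred = (x - sqrt_one_minus_a_t * eps) / sqrt_a_t
        # Re-noise that estimate to the previous (less noisy) timestep.
        x_prev.append(sqrt_a_prev * x0_pred + sqrt_one_minus_a_prev * eps)
    return x_prev


# Build a noisy sample from a known clean value, then step back to it.
x0 = [0.5, -1.0]
eps = [0.1, 0.2]
alpha_bar_t = 0.64
x_t = [math.sqrt(alpha_bar_t) * a + math.sqrt(1.0 - alpha_bar_t) * e
       for a, e in zip(x0, eps)]
recovered = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev=1.0)
```

When the predicted noise is exact and `alpha_bar_prev` is 1.0, the step returns the clean sample, which is a handy sanity check for any scheduler implementation.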
### Parameters
- Cross-attention dimension: 512
- Block channels: [128, 256, 512, 512]
- Attention heads: 4
- Layers per block: 1
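
For reference, the parameters above can be collected into a small configuration object. This is only a sketch: the class name `UNetConfigSketch`, its field names, and the divisibility check are assumptions made for illustration and do not reflect the actual layout of `config.yaml` or `bytedream/model.py`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UNetConfigSketch:
    """Hypothetical container for the UNet parameters listed in this README."""

    cross_attention_dim: int = 512
    block_out_channels: List[int] = field(
        default_factory=lambda: [128, 256, 512, 512]
    )
    attention_heads: int = 4
    layers_per_block: int = 1

    def validate(self) -> None:
        # Assumption: each block's channel count is split evenly across heads.
        for channels in self.block_out_channels:
            if channels % self.attention_heads != 0:
                raise ValueError(
                    f"{channels} channels not divisible by "
                    f"{self.attention_heads} heads"
                )


cfg = UNetConfigSketch()
cfg.validate()
```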
## Examples
### Prompts that work well:
- "A serene lake at sunset with mountains"
- "Futuristic city with flying cars, cyberpunk"
- "Majestic dragon flying over castle, fantasy"
- "Peaceful garden with cherry blossoms"
### Tips:
- Use detailed, descriptive prompts
- Add style keywords (digital art, oil painting, etc.)
- Use negative prompts to avoid unwanted elements
- Higher guidance scale = more faithful to prompt
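
The last tip can be made concrete with the usual classifier-free guidance formula, in which the final noise prediction is pushed away from the unconditional prediction and toward the prompt-conditioned one. The sketch below assumes Byte Dream applies guidance in this common way, which this README does not confirm; `apply_guidance` is a hypothetical helper, not part of the `bytedream` package.

```python
def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions.

    Hypothetical helper illustrating classifier-free guidance:
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    A scale of 1.0 returns the conditioned prediction unchanged; larger
    values amplify the difference that the prompt makes.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


uncond = [0.0, 1.0]
cond = [1.0, 3.0]
plain = apply_guidance(uncond, cond, 1.0)   # equals cond
strong = apply_guidance(uncond, cond, 7.5)  # difference scaled by 7.5
```

Very high scales exaggerate the prompt's influence, which often costs image variety and can introduce artifacts, so values near the `7.5` used in the Quick Start are a reasonable starting point.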
## File Structure
```
ByteDream/
├── bytedream/            # Core package
│   ├── __init__.py
│   ├── generator.py      # Main generator
│   ├── model.py          # Model architecture
│   ├── pipeline.py       # Pipeline
│   ├── scheduler.py      # Scheduler
│   ├── hf_api.py         # HF API client
│   └── utils.py
├── train.py              # Training script
├── infer.py              # Inference
├── app.py                # Web UI
├── config.yaml           # Config
└── requirements.txt      # Dependencies
```
## Requirements
- Python 3.8+
- PyTorch
- Pillow
- Transformers
- Gradio (for web UI)

See `requirements.txt` for the full list.
## License
MIT License
## Citation
```bibtex
@software{bytedream2024,
title={Byte Dream: CPU-Optimized Text-to-Image Generation},
year={2024}
}
```
## Links
- [GitHub](https://github.com/yourusername/bytedream)
- [Documentation](https://huggingface.co/Enzo8930302/ByteDream/blob/main/README.md)
- [Spaces Demo](https://huggingface.co/spaces/Enzo8930302/ByteDream-Space)
## Support
For issues or questions, please open an issue on GitHub.
---
**Created by Enzo and the Byte Dream Team** 🎨