---
license: mit
language: en
tags:
- text-to-image
- diffusion
- cpu-optimized
- bytedream
- clip
pipeline_tag: text-to-image
---

# Byte Dream - Text-to-Image Model

## Overview

Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference. It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.

## Features

- ✅ **CPU Optimized**: Runs efficiently on CPU (no GPU required)
- ✅ **High Quality**: Generates 512x512 images
- ✅ **Fast Inference**: Optimized for speed
- ✅ **Easy to Use**: Simple Python API and web interface
- ✅ **Open Source**: MIT License

## Installation

```bash
pip install torch pillow transformers
git lfs install
git clone https://huggingface.co/Enzo8930302/ByteDream
cd ByteDream
```

## Usage

### Quick Start

```python
from bytedream import ByteDreamGenerator

# Load model
generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")

# Generate image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,
    guidance_scale=7.5,
)
image.save("output.png")
```

### Using the Cloud API

```python
from bytedream import ByteDreamHFClient

client = ByteDreamHFClient(
    repo_id="Enzo8930302/ByteDream",
    use_api=True,
)

image = client.generate(
    prompt="Futuristic city at night, cyberpunk",
)
image.save("output.png")
```

## Training

Train on your own dataset:

```bash
# Create a dataset
python create_test_dataset.py

# Train the model
python train.py --config config.yaml --train_data dataset
```

## Web Interface

Launch the Gradio web interface:

```bash
python app.py
```

Or deploy to Hugging Face Spaces:

```bash
python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
```

## Model Architecture

- **Text Encoder**: CLIP ViT-B/32 (512 dimensions)
- **UNet**: Custom architecture with cross-attention
- **VAE**: Autoencoder for latent space
- **Scheduler**: DDIM sampling

### Parameters

- Cross-attention dimension: 512
- Block channels: [128, 256, 512, 512]
- Attention heads: 4
- Layers per block: 1

## Examples

### Prompts that work well

- "A serene lake at sunset with mountains"
- "Futuristic city with flying cars, cyberpunk"
- "Majestic dragon flying over castle, fantasy"
- "Peaceful garden with cherry blossoms"

### Tips

- Use detailed, descriptive prompts
- Add style keywords (digital art, oil painting, etc.)
- Use negative prompts to avoid unwanted elements
- A higher guidance scale makes the image follow the prompt more closely

## File Structure

```
ByteDream/
├── bytedream/              # Core package
│   ├── __init__.py
│   ├── generator.py        # Main generator
│   ├── model.py            # Model architecture
│   ├── pipeline.py         # Pipeline
│   ├── scheduler.py        # Scheduler
│   ├── hf_api.py           # HF API client
│   └── utils.py
├── train.py                # Training script
├── infer.py                # Inference
├── app.py                  # Web UI
├── config.yaml             # Config
└── requirements.txt        # Dependencies
```

## Requirements

- Python 3.8+
- PyTorch
- Pillow
- Transformers
- Gradio (for web UI)

See `requirements.txt` for the full list.

## License

MIT License

## Citation

```bibtex
@software{bytedream2024,
  title={Byte Dream: CPU-Optimized Text-to-Image Generation},
  year={2024}
}
```

## Links

- [GitHub](https://github.com/yourusername/bytedream)
- [Documentation](https://huggingface.co/Enzo8930302/ByteDream/blob/main/README.md)
- [Spaces Demo](https://huggingface.co/spaces/Enzo8930302/ByteDream-Space)

## Support

For issues or questions, please open an issue on GitHub.

---

**Created by Enzo and the Byte Dream Team** 🎨
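## Appendix: How `guidance_scale` Works

As background for the `guidance_scale` parameter and the tips above: diffusion pipelines of this kind typically implement classifier-free guidance, which blends an unconditional noise prediction with a prompt-conditioned one. A minimal sketch in plain Python (the real pipeline applies this per step to latent tensors; `apply_guidance` is an illustrative name, not part of the ByteDream API):

```python
def apply_guidance(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned output.

    guidance_scale = 1.0 reproduces the conditional prediction;
    larger values follow the prompt more strongly (at some cost
    to diversity).
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(uncond_pred, cond_pred)]

# Toy example with 3-element lists standing in for latent tensors
uncond = [0.0, 0.2, 0.4]
cond = [1.0, 0.2, 0.0]
guided = apply_guidance(uncond, cond, guidance_scale=7.5)
```

Where the two predictions already agree (the middle element), the scale has no effect; where they differ, the gap is amplified by the scale.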
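## Appendix: The DDIM Update

The Model Architecture section lists DDIM sampling as the scheduler. For reference, the standard deterministic DDIM update (η = 0) first estimates the clean sample from the noise prediction, then re-noises it to the previous timestep. A self-contained sketch with scalar stand-ins for latent tensors (variable names are illustrative; this describes the DDIM formula generally, not ByteDream's exact `scheduler.py`):

```python
import math

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t:            current noisy sample
    eps:            model's noise prediction at this step
    alpha_bar_*:    cumulative alpha products at the current
                    and previous timesteps
    """
    # Estimate the clean sample x0 implied by the noise prediction
    pred_x0 = (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise x0 to the previous (less noisy) timestep
    return math.sqrt(alpha_bar_prev) * pred_x0 + math.sqrt(1 - alpha_bar_prev) * eps
```

With `alpha_bar_prev = 1.0` (a final, fully denoised step) the update returns the predicted clean sample directly.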