---
license: mit
language: en
tags:
  - text-to-image
  - diffusion
  - cpu-optimized
  - bytedream
  - clip
pipeline_tag: text-to-image
---

# Byte Dream - Text-to-Image Model

## Overview

Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference. It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.

## Features

- ✅ **CPU Optimized**: runs efficiently on CPU (no GPU required)
- ✅ **High Quality**: generates 512x512 images
- ✅ **Fast Inference**: optimized for speed
- ✅ **Easy to Use**: simple Python API and web interface
- ✅ **Open Source**: MIT License

## Installation

```bash
pip install torch pillow transformers
git lfs install
git clone https://huggingface.co/Enzo8930302/ByteDream
cd ByteDream
```

## Usage

### Quick Start

```python
from bytedream import ByteDreamGenerator

# Load the model from the Hugging Face Hub
generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")

# Generate an image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,
    guidance_scale=7.5,
)
image.save("output.png")
```

### Using the Cloud API

```python
from bytedream import ByteDreamHFClient

# Generate via the hosted Hugging Face API instead of local inference
client = ByteDreamHFClient(
    repo_id="Enzo8930302/ByteDream",
    use_api=True,
)

image = client.generate(
    prompt="Futuristic city at night, cyberpunk",
)
image.save("output.png")
```

## Training

Train on your own dataset:

```bash
# Create a dataset
python create_test_dataset.py

# Train the model
python train.py --config config.yaml --train_data dataset
```
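The exact schema of `config.yaml` is not shown in this card, so the fragment below is only an illustrative sketch of what a training configuration for this model might look like; every key name is a hypothetical placeholder to be checked against the real file:

```yaml
# Hypothetical training configuration -- key names are illustrative only
model:
  cross_attention_dim: 512          # matches the CLIP ViT-B/32 embedding size
  block_channels: [128, 256, 512, 512]
  attention_heads: 4
  layers_per_block: 1
training:
  batch_size: 4
  learning_rate: 1.0e-4
  num_epochs: 10
```

The model values mirror the hyperparameters listed under "Parameters" below.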

## Web Interface

Launch the Gradio web interface:

```bash
python app.py
```

Or deploy to Hugging Face Spaces:

```bash
python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
```

## Model Architecture

- **Text Encoder**: CLIP ViT-B/32 (512-dimensional embeddings)
- **UNet**: custom architecture with cross-attention
- **VAE**: autoencoder mapping images to and from latent space
- **Scheduler**: DDIM sampling
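DDIM denoising is deterministic: at each step the scheduler uses the UNet's noise prediction to estimate the clean sample, then re-noises that estimate to the previous timestep's noise level. The scalar sketch below (plain Python, not the repository's scheduler code) illustrates a single DDIM update with η = 0:

```python
import math

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0), on scalars for clarity.

    x_t       : current noisy sample
    eps_pred  : the UNet's noise prediction at step t
    abar_t    : cumulative alpha-bar at step t
    abar_prev : cumulative alpha-bar at the previous (less noisy) step
    """
    # Predict the clean sample x_0 implied by the noise estimate
    x0_pred = (x_t - math.sqrt(1 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Re-noise x_0 to the previous timestep's noise level
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1 - abar_prev) * eps_pred

# With a perfect noise prediction, stepping all the way to abar_prev = 1
# recovers the clean sample exactly (up to float rounding).
x0, eps, abar_t = 0.5, -0.3, 0.2
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1 - abar_t) * eps
print(ddim_step(x_t, eps, abar_t, 1.0))
```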

### Parameters

- Cross-attention dimension: 512
- Block channels: [128, 256, 512, 512]
- Attention heads: 4
- Layers per block: 1
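To illustrate how text conditioning enters the UNet, the sketch below runs a single cross-attention layer with the dimensions listed above (embed dim 512, 4 heads). It uses stock PyTorch rather than the repository's custom implementation, and `latent_tokens`/`text_tokens` are made-up placeholder tensors:

```python
import torch
from torch import nn

# Cross-attention with the dimensions listed above: dim 512, 4 heads.
# Queries come from flattened image latents; keys/values from text embeddings.
attn = nn.MultiheadAttention(embed_dim=512, num_heads=4, batch_first=True)

latent_tokens = torch.randn(1, 64, 512)  # e.g. an 8x8 latent grid, flattened
text_tokens = torch.randn(1, 77, 512)    # CLIP's 77 text tokens, projected to 512

out, weights = attn(latent_tokens, text_tokens, text_tokens)
print(out.shape)      # same shape as the latent tokens: (1, 64, 512)
print(weights.shape)  # each latent position attends over text: (1, 64, 77)
```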

## Examples

Prompts that work well:

- "A serene lake at sunset with mountains"
- "Futuristic city with flying cars, cyberpunk"
- "Majestic dragon flying over castle, fantasy"
- "Peaceful garden with cherry blossoms"

Tips:

- Use detailed, descriptive prompts
- Add style keywords (digital art, oil painting, etc.)
- Use negative prompts to avoid unwanted elements
- A higher guidance scale keeps the image more faithful to the prompt
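The guidance-scale tip comes from classifier-free guidance: at each denoising step the final noise estimate is the unconditional prediction pushed toward the prompt-conditioned one, ε = ε_uncond + s·(ε_cond − ε_uncond). A scalar sketch of that combination (plain Python, independent of this repository's code):

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the prompt-conditioned one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# scale 1.0 just returns the conditional prediction;
# larger scales exaggerate the prompt's influence.
print(cfg_combine(0.0, 0.5, 1.0))  # -> 0.5
print(cfg_combine(0.0, 0.5, 7.5))  # -> 3.75
```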

## File Structure

```
ByteDream/
├── bytedream/          # Core package
│   ├── __init__.py
│   ├── generator.py    # Main generator
│   ├── model.py        # Model architecture
│   ├── pipeline.py     # Pipeline
│   ├── scheduler.py    # Scheduler
│   ├── hf_api.py       # HF API client
│   └── utils.py
├── train.py            # Training script
├── infer.py            # Inference
├── app.py              # Web UI
├── config.yaml         # Config
└── requirements.txt    # Dependencies
```

## Requirements

- Python 3.8+
- PyTorch
- Pillow
- Transformers
- Gradio (for the web UI)

See `requirements.txt` for the full list.

## License

MIT License

## Citation

```bibtex
@software{bytedream2024,
  title={Byte Dream: CPU-Optimized Text-to-Image Generation},
  year={2024}
}
```


## Support

For issues or questions, please open an issue on GitHub.


*Created by Enzo and the Byte Dream Team* 🎨