---
license: mit
language: en
tags:
  - text-to-image
  - diffusion
  - cpu-optimized
  - bytedream
  - clip
pipeline_tag: text-to-image
---

# Byte Dream - Text-to-Image Model

## Overview

Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference. It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.

## Features

- ✅ **CPU Optimized**: runs efficiently on CPU (no GPU required)
- ✅ **High Quality**: generates 512x512 images
- ✅ **Fast Inference**: optimized for speed
- ✅ **Easy to Use**: simple Python API and web interface
- ✅ **Open Source**: MIT License

## Installation

```bash
pip install torch pillow transformers
git lfs install
git clone https://huggingface.co/Enzo8930302/ByteDream
cd ByteDream
```

## Usage

### Quick Start

```python
from bytedream import ByteDreamGenerator

# Load the model from the Hugging Face Hub
generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")

# Generate an image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,
    guidance_scale=7.5,
)
image.save("output.png")
```

### Using the Cloud API

```python
from bytedream import ByteDreamHFClient

# Generate via the hosted Hugging Face API instead of local inference
client = ByteDreamHFClient(
    repo_id="Enzo8930302/ByteDream",
    use_api=True,
)

image = client.generate(
    prompt="Futuristic city at night, cyberpunk",
)
image.save("output.png")
```

## Training

Train on your own dataset:

```bash
# Create a dataset
python create_test_dataset.py

# Train the model
python train.py --config config.yaml --train_data dataset
```
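The exact schema of `config.yaml` is not shown in this card, so the fragment below is only an illustrative sketch of what a training configuration for this model might look like; every key name is a hypothetical placeholder to be checked against the real file:

```yaml
# Hypothetical training configuration -- key names are illustrative only
model:
  cross_attention_dim: 512          # matches the CLIP ViT-B/32 embedding size
  block_channels: [128, 256, 512, 512]
  attention_heads: 4
  layers_per_block: 1
training:
  batch_size: 4
  learning_rate: 1.0e-4
  num_epochs: 10
```

The model values mirror the hyperparameters listed under "Parameters" below.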

## Web Interface

Launch the Gradio web interface:

```bash
python app.py
```

Or deploy to Hugging Face Spaces:

```bash
python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
```

## Model Architecture

- **Text Encoder**: CLIP ViT-B/32 (512-dimensional embeddings)
- **UNet**: custom architecture with cross-attention
- **VAE**: autoencoder mapping images to and from latent space
- **Scheduler**: DDIM sampling
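DDIM denoising is deterministic: at each step the scheduler uses the UNet's noise prediction to estimate the clean sample, then re-noises that estimate to the previous timestep's noise level. The scalar sketch below (plain Python, not the repository's scheduler code) illustrates a single DDIM update with η = 0:

```python
import math

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0), on scalars for clarity.

    x_t       : current noisy sample
    eps_pred  : the UNet's noise prediction at step t
    abar_t    : cumulative alpha-bar at step t
    abar_prev : cumulative alpha-bar at the previous (less noisy) step
    """
    # Predict the clean sample x_0 implied by the noise estimate
    x0_pred = (x_t - math.sqrt(1 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Re-noise x_0 to the previous timestep's noise level
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1 - abar_prev) * eps_pred

# With a perfect noise prediction, stepping all the way to abar_prev = 1
# recovers the clean sample exactly (up to float rounding).
x0, eps, abar_t = 0.5, -0.3, 0.2
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1 - abar_t) * eps
print(ddim_step(x_t, eps, abar_t, 1.0))
```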

### Parameters

- Cross-attention dimension: 512
- Block channels: [128, 256, 512, 512]
- Attention heads: 4
- Layers per block: 1
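To illustrate how text conditioning enters the UNet, the sketch below runs a single cross-attention layer with the dimensions listed above (embed dim 512, 4 heads). It uses stock PyTorch rather than the repository's custom implementation, and `latent_tokens`/`text_tokens` are made-up placeholder tensors:

```python
import torch
from torch import nn

# Cross-attention with the dimensions listed above: dim 512, 4 heads.
# Queries come from flattened image latents; keys/values from text embeddings.
attn = nn.MultiheadAttention(embed_dim=512, num_heads=4, batch_first=True)

latent_tokens = torch.randn(1, 64, 512)  # e.g. an 8x8 latent grid, flattened
text_tokens = torch.randn(1, 77, 512)    # CLIP's 77 text tokens, projected to 512

out, weights = attn(latent_tokens, text_tokens, text_tokens)
print(out.shape)      # same shape as the latent tokens: (1, 64, 512)
print(weights.shape)  # each latent position attends over text: (1, 64, 77)
```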

## Examples

Prompts that work well:

- "A serene lake at sunset with mountains"
- "Futuristic city with flying cars, cyberpunk"
- "Majestic dragon flying over castle, fantasy"
- "Peaceful garden with cherry blossoms"

Tips:

- Use detailed, descriptive prompts
- Add style keywords (digital art, oil painting, etc.)
- Use negative prompts to avoid unwanted elements
- A higher guidance scale keeps the image more faithful to the prompt
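The guidance-scale tip comes from classifier-free guidance: at each denoising step the final noise estimate is the unconditional prediction pushed toward the prompt-conditioned one, ε = ε_uncond + s·(ε_cond − ε_uncond). A scalar sketch of that combination (plain Python, independent of this repository's code):

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the prompt-conditioned one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# scale 1.0 just returns the conditional prediction;
# larger scales exaggerate the prompt's influence.
print(cfg_combine(0.0, 0.5, 1.0))  # -> 0.5
print(cfg_combine(0.0, 0.5, 7.5))  # -> 3.75
```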

## File Structure

```
ByteDream/
├── bytedream/          # Core package
│   ├── __init__.py
│   ├── generator.py    # Main generator
│   ├── model.py        # Model architecture
│   ├── pipeline.py     # Pipeline
│   ├── scheduler.py    # Scheduler
│   ├── hf_api.py       # HF API client
│   └── utils.py
├── train.py            # Training script
├── infer.py            # Inference
├── app.py              # Web UI
├── config.yaml         # Config
└── requirements.txt    # Dependencies
```

## Requirements

- Python 3.8+
- PyTorch
- Pillow
- Transformers
- Gradio (for the web UI)

See `requirements.txt` for the full list.

## License

MIT License

## Citation

```bibtex
@software{bytedream2024,
  title={Byte Dream: CPU-Optimized Text-to-Image Generation},
  year={2024}
}
```


## Support

For issues or questions, please open an issue on GitHub.


*Created by Enzo and the Byte Dream Team* 🎨