| ---
|
| license: mit
|
| language: en
|
| tags:
|
| - text-to-image
|
| - diffusion
|
| - cpu-optimized
|
| - bytedream
|
| - clip
|
| pipeline_tag: text-to-image
|
| ---
|
|
|
| # Byte Dream - Text-to-Image Model
|
|
|
| ## Overview
|
| Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference.
|
| It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.
|
|
|
| ## Features
|
| - ✅ **CPU Optimized**: Runs efficiently on CPU (no GPU required)
|
| - ✅ **High Quality**: Generates 512x512 images
|
| - ✅ **Fast Inference**: Optimized for speed
|
| - ✅ **Easy to Use**: Simple Python API and web interface
|
| - ✅ **Open Source**: MIT License
|
|
|
| ## Installation
|
|
|
| ```bash
|
| pip install torch pillow transformers
|
| git lfs install
|
| git clone https://huggingface.co/Enzo8930302/ByteDream
|
| cd ByteDream
|
| ```
|
|
|
| ## Usage
|
|
|
| ### Quick Start
|
| ```python
|
| from bytedream import ByteDreamGenerator
|
|
|
| # Load model
|
| generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")
|
|
|
| # Generate image
|
| image = generator.generate(
|
|     prompt="A beautiful sunset over mountains, digital art",
|
|     num_inference_steps=50,
|
|     guidance_scale=7.5,
|
| )
|
| image.save("output.png")
|
| ```
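
For more than one prompt, the Quick Start call can be wrapped in a small loop. The sketch below assumes only what the example above shows: a generator object whose `generate(prompt=..., num_inference_steps=..., guidance_scale=...)` method returns an image with a `save()` method. The helper names `slugify` and `generate_batch` and the output file layout are hypothetical and are not part of the `bytedream` package.

```python
import re
from pathlib import Path


def slugify(prompt, max_len=40):
    """Turn a prompt into a short, filesystem-safe file stem."""
    stem = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return stem[:max_len].rstrip("-") or "image"


def generate_batch(generator, prompts, out_dir="outputs",
                   num_inference_steps=50, guidance_scale=7.5):
    """Generate one image per prompt and return the saved paths.

    `generator` is assumed to behave like ByteDreamGenerator in the
    Quick Start example; nothing else about its API is assumed.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        image = generator.generate(
            prompt=prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
        )
        # Prefix with the index so files sort in prompt order.
        target = out_path / f"{index:03d}-{slugify(prompt)}.png"
        image.save(str(target))
        saved.append(target)
    return saved
```

Passing the `generator` from the Quick Start together with a list of prompts writes files such as `outputs/000-a-serene-lake.png`.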
|
|
|
| ### Using the Cloud API
|
| ```python
|
| from bytedream import ByteDreamHFClient
|
|
|
| client = ByteDreamHFClient(
|
|     repo_id="Enzo8930302/ByteDream",
|
|     use_api=True,
|
| )
|
|
|
| image = client.generate(
|
|     prompt="Futuristic city at night, cyberpunk",
|
| )
|
| image.save("output.png")
|
| ```
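
Remote inference can fail transiently (cold starts, rate limits, network errors). A minimal sketch of retrying the call with exponential backoff follows. It assumes only that `client.generate(prompt=...)` raises an exception on failure; which exception types `ByteDreamHFClient` actually raises is not documented here, so the sketch catches `Exception` broadly. `generate_with_retry` is a hypothetical helper, not part of the package.

```python
import time


def generate_with_retry(client, prompt, attempts=3, base_delay=1.0,
                        sleep=time.sleep):
    """Call client.generate(prompt=...) with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return client.generate(prompt=prompt)
        except Exception:  # assumption: failures surface as exceptions
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` is enough.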
|
|
|
| ## Training
|
|
|
| Train on your own dataset:
|
|
|
| ```bash
|
| # Create dataset
|
| python create_test_dataset.py
|
|
|
| # Train model
|
| python train.py --config config.yaml --train_data dataset
|
| ```
|
|
|
| ## Web Interface
|
|
|
| Launch the Gradio web interface:
|
|
|
| ```bash
|
| python app.py
|
| ```
|
|
|
| Or deploy to Hugging Face Spaces:
|
|
|
| ```bash
|
| python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
|
| ```
|
|
|
| ## Model Architecture
|
|
|
| - **Text Encoder**: CLIP ViT-B/32 (512 dimensions)
|
| - **UNet**: Custom architecture with cross-attention
|
| - **VAE**: Autoencoder that maps images to and from the latent space
|
| - **Scheduler**: DDIM sampling
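
To make the scheduler entry concrete, here is a minimal sketch of deterministic DDIM sampling (eta = 0) on plain Python floats. The linear beta schedule, its endpoints, and the 1000 training timesteps are assumptions chosen for illustration; the values Byte Dream actually uses are defined in its own scheduler code and config and may differ.

```python
import math


def alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    out, running = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out


def ddim_timesteps(num_inference_steps, num_train_steps=1000):
    """Evenly spaced training timesteps, from noisiest to cleanest."""
    stride = num_train_steps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


def ddim_step(x_t, eps_pred, alpha_t, alpha_prev):
    """One deterministic DDIM update (eta = 0) for a single value."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_t) * eps_pred) / math.sqrt(alpha_t)
    # Re-noise that estimate to the previous (less noisy) timestep.
    return math.sqrt(alpha_prev) * x0_pred + math.sqrt(1.0 - alpha_prev) * eps_pred
```

With `num_inference_steps=50`, `ddim_timesteps` visits every 20th training timestep, which is why sampling needs far fewer network evaluations than there are training steps. In a real pipeline `x_t` and `eps_pred` are latent tensors and `eps_pred` comes from the UNet.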
|
|
|
| ### Parameters
|
| - Cross-attention dimension: 512
|
| - Block channels: [128, 256, 512, 512]
|
| - Attention heads: 4
|
| - Layers per block: 1
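
These hyperparameters can be collected in one place and sanity-checked before building the model. The sketch below uses a hypothetical `UNetConfig` dataclass; its field names are illustrative and need not match the keys in `config.yaml`. It also assumes that "attention heads" means the number of heads per attention layer, so each block's channel count must divide evenly by it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UNetConfig:
    """Hypothetical container for the values listed above."""
    cross_attention_dim: int = 512
    block_channels: Tuple[int, ...] = (128, 256, 512, 512)
    attention_heads: int = 4
    layers_per_block: int = 1

    def head_dims(self):
        """Per-head channel width in each block (channels / heads)."""
        for channels in self.block_channels:
            if channels % self.attention_heads != 0:
                raise ValueError(
                    f"{channels} channels not divisible by "
                    f"{self.attention_heads} heads"
                )
        return [c // self.attention_heads for c in self.block_channels]
```

Under that assumption, `UNetConfig().head_dims()` gives `[32, 64, 128, 128]` for the listed values.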
|
|
|
| ## Examples
|
|
|
| ### Prompts that work well:
|
| - "A serene lake at sunset with mountains"
|
| - "Futuristic city with flying cars, cyberpunk"
|
| - "Majestic dragon flying over castle, fantasy"
|
| - "Peaceful garden with cherry blossoms"
|
|
|
| ### Tips:
|
| - Use detailed, descriptive prompts
|
| - Add style keywords (digital art, oil painting, etc.)
|
| - Use negative prompts to avoid unwanted elements
|
| - A higher guidance scale makes the image follow the prompt more closely
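
The last two tips both relate to classifier-free guidance, the usual mechanism behind a `guidance_scale` parameter in diffusion pipelines. Whether Byte Dream implements it exactly this way is an assumption; the sketch shows the standard formula on plain lists of floats, with the negative prompt's prediction standing in for the unconditional one.

```python
def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend noise predictions: uncond + scale * (cond - uncond).

    eps_uncond: prediction for the empty (or negative) prompt.
    eps_cond:   prediction for the user's prompt.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

A scale of 1.0 returns the prompt-conditioned prediction unchanged; larger values push the result further away from the unconditional (or negative-prompt) prediction.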
|
|
|
| ## File Structure
|
|
|
| ```
|
| ByteDream/
| ├── bytedream/          # Core package
| │   ├── __init__.py
| │   ├── generator.py    # Main generator
| │   ├── model.py        # Model architecture
| │   ├── pipeline.py     # Pipeline
| │   ├── scheduler.py    # Scheduler
| │   ├── hf_api.py       # HF API client
| │   └── utils.py
| ├── train.py            # Training script
| ├── infer.py            # Inference
| ├── app.py              # Web UI
| ├── config.yaml         # Config
| └── requirements.txt    # Dependencies
|
| ```
|
|
|
| ## Requirements
|
|
|
| - Python 3.8+
|
| - PyTorch
|
| - Pillow
|
| - Transformers
|
| - Gradio (for web UI)
|
|
|
| See `requirements.txt` for the full list.
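
A quick way to confirm the environment before running anything is to check the Python version and whether each package can be imported. The sketch below uses only the standard library. The import names (`torch`, `PIL`, `transformers`, `gradio`) are the usual ones for the packages listed above, and `check_environment` is a hypothetical helper, not a script shipped with this repository.

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed above (Pillow is imported as PIL).
REQUIRED = {"PyTorch": "torch", "Pillow": "PIL", "Transformers": "transformers"}
OPTIONAL = {"Gradio": "gradio"}


def check_environment(min_python=(3, 8)):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for label, module in REQUIRED.items():
        if find_spec(module) is None:
            problems.append(f"missing required package: {label}")
    for label, module in OPTIONAL.items():
        if find_spec(module) is None:
            problems.append(f"missing optional package: {label} (web UI only)")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

This only checks that the packages are present, not their versions; `requirements.txt` remains the source of truth for version constraints.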
|
|
|
| ## License
|
|
|
| MIT License
|
|
|
| ## Citation
|
|
|
| ```bibtex
|
| @software{bytedream2024,
|
| title={Byte Dream: CPU-Optimized Text-to-Image Generation},
|
| year={2024}
|
| }
|
| ```
|
|
|
| ## Links
|
|
|
| - [GitHub](https://github.com/yourusername/bytedream)
|
| - [Documentation](https://huggingface.co/Enzo8930302/ByteDream/blob/main/README.md)
|
| - [Spaces Demo](https://huggingface.co/spaces/Enzo8930302/ByteDream-Space)
|
|
|
| ## Support
|
|
|
| For issues or questions, please open an issue on GitHub.
|
|
|
| ---
|
|
|
| **Created by Enzo and the Byte Dream Team** 🎨
|
|
|