---
license: mit
language: en
tags:
- text-to-image
- diffusion
- cpu-optimized
- bytedream
- clip
pipeline_tag: text-to-image
---

# Byte Dream - Text-to-Image Model

## Overview

Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference. It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.

## Features

- ✅ **CPU Optimized**: Runs efficiently on CPU (no GPU required)
- ✅ **High Quality**: Generates 512x512 images
- ✅ **Fast Inference**: Optimized for speed
- ✅ **Easy to Use**: Simple Python API and web interface
- ✅ **Open Source**: MIT License

## Installation

```bash
pip install torch pillow transformers
git lfs install
git clone https://huggingface.co/Enzo8930302/ByteDream
cd ByteDream
```

## Usage

### Quick Start

```python
from bytedream import ByteDreamGenerator

# Load model
generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")

# Generate image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,
    guidance_scale=7.5,
)
image.save("output.png")
```

### Using the Cloud API

```python
from bytedream import ByteDreamHFClient

client = ByteDreamHFClient(
    repo_id="Enzo8930302/ByteDream",
    use_api=True,
)

image = client.generate(
    prompt="Futuristic city at night, cyberpunk",
)
image.save("output.png")
```

## Training

Train on your own dataset:

```bash
# Create a dataset
python create_test_dataset.py

# Train the model
python train.py --config config.yaml --train_data dataset
```

## Web Interface

Launch the Gradio web interface:

```bash
python app.py
```

Or deploy to Hugging Face Spaces:

```bash
python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
```

## Model Architecture

- **Text Encoder**: CLIP ViT-B/32 (512 dimensions)
- **UNet**: Custom architecture with cross-attention
- **VAE**: Autoencoder for latent space
- **Scheduler**: DDIM sampling

### Parameters

- Cross-attention dimension: 512
- Block channels: [128, 256, 512, 512]
- Attention heads: 4
- Layers per block: 1

## Examples

### Prompts that work well

- "A serene lake at sunset with mountains"
- "Futuristic city with flying cars, cyberpunk"
- "Majestic dragon flying over castle, fantasy"
- "Peaceful garden with cherry blossoms"

### Tips

- Use detailed, descriptive prompts
- Add style keywords (digital art, oil painting, etc.)
- Use negative prompts to avoid unwanted elements
- A higher guidance scale makes the image follow the prompt more closely

## File Structure

```
ByteDream/
├── bytedream/              # Core package
│   ├── __init__.py
│   ├── generator.py        # Main generator
│   ├── model.py            # Model architecture
│   ├── pipeline.py         # Pipeline
│   ├── scheduler.py        # Scheduler
│   ├── hf_api.py           # HF API client
│   └── utils.py
├── train.py                # Training script
├── infer.py                # Inference
├── app.py                  # Web UI
├── config.yaml             # Config
└── requirements.txt        # Dependencies
```

## Requirements

- Python 3.8+
- PyTorch
- Pillow
- Transformers
- Gradio (for web UI)

See `requirements.txt` for the full list.

## License

MIT License

## Citation

```bibtex
@software{bytedream2024,
  title={Byte Dream: CPU-Optimized Text-to-Image Generation},
  year={2024}
}
```

## Links

- [GitHub](https://github.com/yourusername/bytedream)
- [Documentation](https://huggingface.co/Enzo8930302/ByteDream/blob/main/README.md)
- [Spaces Demo](https://huggingface.co/spaces/Enzo8930302/ByteDream-Space)

## Support

For issues or questions, please open an issue on GitHub.

---

**Created by Enzo and the Byte Dream Team** 🎨
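## Appendix: How `guidance_scale` Works

As background for the `guidance_scale` parameter and the tips above: diffusion pipelines of this kind typically implement classifier-free guidance, which blends an unconditional noise prediction with a prompt-conditioned one. A minimal sketch in plain Python (the real pipeline applies this per step to latent tensors; `apply_guidance` is an illustrative name, not part of the ByteDream API):

```python
def apply_guidance(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned output.

    guidance_scale = 1.0 reproduces the conditional prediction;
    larger values follow the prompt more strongly (at some cost
    to diversity).
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(uncond_pred, cond_pred)]

# Toy example with 3-element lists standing in for latent tensors
uncond = [0.0, 0.2, 0.4]
cond = [1.0, 0.2, 0.0]
guided = apply_guidance(uncond, cond, guidance_scale=7.5)
```

Where the two predictions already agree (the middle element), the scale has no effect; where they differ, the gap is amplified by the scale.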
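## Appendix: The DDIM Update

The Model Architecture section lists DDIM sampling as the scheduler. For reference, the standard deterministic DDIM update (η = 0) first estimates the clean sample from the noise prediction, then re-noises it to the previous timestep. A self-contained sketch with scalar stand-ins for latent tensors (variable names are illustrative; this describes the DDIM formula generally, not ByteDream's exact `scheduler.py`):

```python
import math

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t:            current noisy sample
    eps:            model's noise prediction at this step
    alpha_bar_*:    cumulative alpha products at the current
                    and previous timesteps
    """
    # Estimate the clean sample x0 implied by the noise prediction
    pred_x0 = (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise x0 to the previous (less noisy) timestep
    return math.sqrt(alpha_bar_prev) * pred_x0 + math.sqrt(1 - alpha_bar_prev) * eps
```

With `alpha_bar_prev = 1.0` (a final, fully denoised step) the update returns the predicted clean sample directly.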