---
license: mit
language: en
tags:
- text-to-image
- diffusion
- cpu-optimized
- bytedream
- clip
pipeline_tag: text-to-image
---

# Byte Dream - Text-to-Image Model

## Overview
Byte Dream is a production-ready text-to-image diffusion model optimized for CPU inference.
It uses CLIP ViT-B/32 for text encoding and a custom UNet architecture for image generation.

## Features
- ✅ CPU Optimized: Runs efficiently on CPU (no GPU required)
- ✅ High Quality: Generates 512×512 images
- ✅ Fast Inference: Optimized for speed, with DDIM sampling to reduce the number of denoising steps
- ✅ Easy to Use: Simple Python API and web interface
- ✅ Open Source: MIT License

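## Installation

```bash
git lfs install
git clone https://huggingface.co/Enzo8930302/ByteDream
cd ByteDream
pip install -r requirements.txt   # minimal alternative: pip install torch pillow transformers
```

Run the Python examples from inside the `ByteDream/` directory so that the local `bytedream` package can be imported.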

## Usage

### Quick Start
```python
from bytedream import ByteDreamGenerator

# Load the model from the Hugging Face Hub
generator = ByteDreamGenerator(hf_repo_id="Enzo8930302/ByteDream")

# Generate an image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,  # number of DDIM denoising steps
    guidance_scale=7.5,      # higher values follow the prompt more closely
)
image.save("output.png")
```

### Using the Cloud API
```python
from bytedream import ByteDreamHFClient

client = ByteDreamHFClient(
    repo_id="Enzo8930302/ByteDream",
    use_api=True,
)

image = client.generate(
    prompt="Futuristic city at night, cyberpunk",
)
image.save("output.png")
```

## Training

Train on your own dataset (or create a small test dataset first):

```bash
# Create a small test dataset
python create_test_dataset.py

# Train the model
python train.py --config config.yaml --train_data dataset
```
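
`train.py` is not documented here, so the following is only a sketch of the standard noise-prediction (epsilon) objective that latent diffusion models are usually trained with. It is an assumption that Byte Dream's training loop follows it. A clean latent `x0` is noised in closed form to a level set by `alpha_bar_t`, and the model is trained to recover the added noise under a mean-squared error. The helper names `add_noise` and `noise_prediction_loss` are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def add_noise(x0, noise, alpha_bar_t):
    """Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * n for x, n in zip(x0, noise)]


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean-squared error between the predicted noise and the noise that was actually added."""
    return sum((p - n) ** 2 for p, n in zip(predicted_noise, true_noise)) / len(true_noise)


# One toy training example: a 4-value "latent", Gaussian noise, and an arbitrary a_bar_t.
random.seed(0)
x0 = [0.5, -0.2, 0.1, 0.8]
noise = [random.gauss(0.0, 1.0) for _ in x0]
x_t = add_noise(x0, noise, alpha_bar_t=0.5)

# A real UNet would predict the noise from (x_t, t, text embedding); here we score two stand-in guesses.
print(noise_prediction_loss(noise, noise))          # 0.0: perfect prediction
print(noise_prediction_loss([0.0] * 4, noise) > 0)  # True: predicting zero noise is penalized
```

The objective targets noise rather than the clean image. As a result, a DDIM-style sampler can later turn a noise prediction back into an estimate of `x0` at any timestep.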

## Web Interface

Launch the Gradio web interface:

```bash
python app.py
```

Or deploy it to Hugging Face Spaces (replace `YourUsername` with your Hugging Face username):

```bash
python deploy_to_spaces.py --repo_id YourUsername/ByteDream-Space
```

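## Model Architecture

- Text Encoder: CLIP ViT-B/32 (512-dimensional embeddings)
- UNet: Custom architecture with cross-attention to the text embeddings
- VAE: Variational autoencoder that maps images to and from the latent space
- Scheduler: DDIM (Denoising Diffusion Implicit Models) sampling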
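
DDIM is a deterministic sampler that visits only a subset of the training timesteps. This is why a setting such as `num_inference_steps=50` can be enough. The sketch below shows the DDIM update with η = 0 on plain Python lists. It is not taken from Byte Dream's `scheduler.py`. The 1000 training timesteps, the "scaled linear" beta range (0.00085 to 0.012, as used by Stable Diffusion), and the helper names are all assumptions for illustration.

```python
import math


def scaled_linear_alphas_cumprod(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative products a_bar_t of (1 - beta_t) for an assumed 'scaled linear' beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, prod = [], 1.0
    for i in range(num_train_timesteps):
        beta = (lo + (hi - lo) * i / (num_train_timesteps - 1)) ** 2
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod


def ddim_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly spaced subset of training timesteps, ordered from noisiest to cleanest."""
    stride = num_train_timesteps // num_inference_steps
    return list(range(0, num_train_timesteps, stride))[:num_inference_steps][::-1]


def ddim_step(x_t, eps, t, t_prev, alphas_cumprod):
    """Deterministic DDIM update (eta = 0) from t to t_prev; t_prev = -1 means fully denoised."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else 1.0
    # Estimate the clean sample implied by the predicted noise...
    x0_pred = [(x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for x, e in zip(x_t, eps)]
    # ...then move it to the lower noise level of t_prev along the same noise direction.
    return [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_pred, eps)]


alphas_cumprod = scaled_linear_alphas_cumprod()
timesteps = ddim_timesteps(50)
print(timesteps[:3], timesteps[-1])  # [980, 960, 940] 0

# Toy check: if the "model" predicted the true noise, one jump to t_prev = -1 recovers x0.
x0, eps = [0.3, -0.7], [1.0, -0.5]
t = timesteps[0]
x_t = [math.sqrt(alphas_cumprod[t]) * a + math.sqrt(1 - alphas_cumprod[t]) * e for a, e in zip(x0, eps)]
print([round(v, 6) for v in ddim_step(x_t, eps, t, -1, alphas_cumprod)])  # [0.3, -0.7]
```

In a full sampler, `eps` would come from the UNet at every step. The loop would then call `ddim_step` for each consecutive pair of timesteps, with `t_prev = -1` after the last one.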

### Parameters
- Cross-attention dimension: 512
- Block channels: [128, 256, 512, 512]
- Attention heads: 4
- Layers per block: 1
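
To make the parameter list concrete, the sketch below works out the latent resolution and per-head attention width at each UNet level. It rests on assumptions not confirmed from `model.py`:

- The VAE downsamples 512×512 images by a factor of 8, as in Stable Diffusion.
- The UNet halves the resolution between consecutive blocks.
- Attention at each level splits that level's channels across the 4 heads.

The dictionary keys are illustrative names, not Byte Dream's config schema.

```python
# Values from the parameter list above; key names are illustrative only.
UNET_PARAMS = {
    "cross_attention_dim": 512,
    "block_channels": [128, 256, 512, 512],
    "attention_heads": 4,
    "layers_per_block": 1,
}
TEXT_EMBED_DIM = 512  # CLIP ViT-B/32 text embedding width
IMAGE_SIZE = 512      # output resolution
VAE_FACTOR = 8        # assumed spatial downsampling of the VAE


def describe_levels(params, image_size=IMAGE_SIZE, vae_factor=VAE_FACTOR):
    """Return (latent side length, channels, per-head dim) for each UNet down block."""
    # Cross-attention keys/values are projected from the text embeddings, so the widths must match.
    assert params["cross_attention_dim"] == TEXT_EMBED_DIM
    size = image_size // vae_factor
    channels = params["block_channels"]
    levels = []
    for i, ch in enumerate(channels):
        levels.append((size, ch, ch // params["attention_heads"]))
        if i < len(channels) - 1:  # assumed: no downsampling after the last block
            size //= 2
    return levels


for side, ch, head_dim in describe_levels(UNET_PARAMS):
    print(f"{side}x{side} latent, {ch} channels, {head_dim} dims per head")
# 64x64 latent, 128 channels, 32 dims per head
# 32x32 latent, 256 channels, 64 dims per head
# 16x16 latent, 512 channels, 128 dims per head
# 8x8 latent, 512 channels, 128 dims per head
```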

## Examples

### Prompts That Work Well
- "A serene lake at sunset with mountains"
- "Futuristic city with flying cars, cyberpunk"
- "Majestic dragon flying over castle, fantasy"
- "Peaceful garden with cherry blossoms"

### Tips
- Use detailed, descriptive prompts.
- Add style keywords (digital art, oil painting, etc.).
- Use negative prompts to avoid unwanted elements.
- A higher guidance scale makes the image follow the prompt more closely.
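
In most diffusion pipelines, `guidance_scale` is the weight of classifier-free guidance. The UNet predicts the noise twice, once with the prompt and once without it, and the two predictions are extrapolated. The sketch below assumes Byte Dream uses this standard formulation, which is not confirmed from `pipeline.py`. In common implementations such as Hugging Face `diffusers`, a negative prompt takes the place of the empty prompt in the unconditional pass. The function name is hypothetical, and lists stand in for noise tensors.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.0, 0.2]  # prediction for the empty (or negative) prompt
cond = [1.0, 0.4]    # prediction for the user's prompt

print(apply_guidance(uncond, cond, 1.0))  # equals cond: no extra push toward the prompt
print(apply_guidance(uncond, cond, 7.5))  # pushed well past cond, away from uncond
```

A scale of 1.0 reproduces the plain conditional prediction. Larger values amplify the difference between the two predictions, which strengthens prompt adherence.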

## File Structure

```
ByteDream/
├── bytedream/        # Core package
│   ├── __init__.py
│   ├── generator.py  # Main generator
│   ├── model.py      # Model architecture
│   ├── pipeline.py   # Diffusion pipeline
│   ├── scheduler.py  # DDIM scheduler
│   ├── hf_api.py     # HF API client
│   └── utils.py
├── train.py          # Training script
├── infer.py          # Inference script
├── app.py            # Web UI
├── config.yaml       # Configuration
└── requirements.txt  # Dependencies
```

## Requirements

- Python 3.8+
- PyTorch
- Pillow
- Transformers
- Gradio (for the web UI)

See `requirements.txt` for the full list.
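
A quick environment check can catch a missing dependency before you run the examples. The sketch below uses only the standard library. The helper names are hypothetical, and the module names are the usual import names for these packages (Pillow is imported as `PIL`). It checks only whether the packages are present, not their versions.

```python
import importlib.util
import sys

# Import names for the requirements listed above (Pillow installs the `PIL` module).
REQUIRED = ["torch", "PIL", "transformers"]
OPTIONAL = ["gradio"]  # only needed for the web UI


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_problems():
    """Collect human-readable problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    problems += [f"missing required package: {name}" for name in missing_modules(REQUIRED)]
    problems += [f"missing optional package: {name}" for name in missing_modules(OPTIONAL)]
    return problems


print(environment_problems() or "Environment looks OK")
```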

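## License

MIT License

## Citation

```bibtex
@software{bytedream2024,
  title  = {Byte Dream: CPU-Optimized Text-to-Image Generation},
  author = {{Enzo and the Byte Dream Team}},
  url    = {https://huggingface.co/Enzo8930302/ByteDream},
  year   = {2024}
}
```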

## Links

- [GitHub](https://github.com/yourusername/bytedream)
- [Documentation](https://huggingface.co/Enzo8930302/ByteDream/blob/main/README.md)
- [Spaces Demo](https://huggingface.co/spaces/Enzo8930302/ByteDream-Space)

## Support

For issues or questions, please open an issue on GitHub.

---

Created by Enzo and the Byte Dream Team 🎨