	---
	license: creativeml-openrail-m
	base_model: runwayml/stable-diffusion-v1-5
	library_name: onnx
	tags:
	- stable-diffusion
	- text-to-image
	- diffusion
	- webgpu
	- browser-ai
	- onnx
	- zhare-ai
	- client-side
	- privacy-preserving
	pipeline_tag: text-to-image
	inference: false
	widget:
	- text: "A beautiful sunset over mountains, digital art style"
	  example_title: "Mountain Sunset"
	- text: "A futuristic cityscape with flying cars at night, cyberpunk"
	  example_title: "Cyberpunk City"
	- text: "A serene lake surrounded by autumn trees, oil painting"
	  example_title: "Autumn Lake"
	- text: "Portrait of a wise elderly person, studio lighting, photorealistic"
	  example_title: "Portrait"
	model-index:
	- name: sd-1-5-webgpu
	  results:
	  - task:
	      type: text-to-image
	      name: Text-to-Image Generation
	    dataset:
	      name: Browser Performance Benchmark
	      type: webgpu-inference
	    metrics:
	    - type: generation-time
	      value: 3-45
	      name: Generation Time (seconds)
	      config: 512x512, 20 steps, various hardware
	    - type: memory-usage
	      value: 4-6
	      name: VRAM Usage (GB)
	      config: WebGPU acceleration
	    - type: model-size
	      value: 3.5
	      name: Total Model Size (GB)
	      config: All ONNX components
	---

	<div align="center">
	<img src="zhare-logo.png" alt="Zhare-AI Logo" width="200" height="auto" style="margin-bottom: 20px;">
	</div>

	# Stable Diffusion 1.5 WebGPU by Zhare-AI

	<div align="center">

	![License](https://img.shields.io/badge/License-CreativeML_OpenRAIL--M-blue.svg)
	![WebGPU](https://img.shields.io/badge/WebGPU-Ready-green)
	![Privacy](https://img.shields.io/badge/Privacy-First-purple)
	![Production](https://img.shields.io/badge/Production-Ready-brightgreen)

	Privacy-preserving text-to-image generation in your browser with WebGPU acceleration

	</div>

	This is a browser-optimized build of Stable Diffusion v1.5, converted to ONNX for client-side inference with WebGPU acceleration. Developed by Zhare-AI, it generates images directly in the web browser without server infrastructure, so prompts and generated images stay on the user's device.

	<div align="center">
	<p><em>Democratizing AI through distributed computing and privacy-preserving technology</em></p>
	</div>

	## 🌟 Key Features

	- 🌐 Fully Client-Side: Complete image generation in the browser, no data leaves your device
	- ⚡ WebGPU Accelerated: Hardware-accelerated inference with automatic WebAssembly fallback
	- 🔒 Privacy-First: All processing happens locally, protecting user prompts and generated content
	- 📱 Cross-Platform: Compatible with desktop and mobile browsers
	- 🛠️ Production-Ready: Optimized for real-world web applications

	## 🚀 Quick Start

	### Installation & Setup

	```bash
	# Clone or download the model
	git lfs install
	git clone https://huggingface.co/Zhare-AI/sd-1-5-webgpu
	```

	## 📊 Performance Specifications

	### Model Architecture

	| Component | Description | Approximate Size |
	|-----------|-------------|------------------|
	| Text Encoder | CLIP ViT-L/14 for text understanding | ~500MB |
	| UNet | Core diffusion model for image generation | ~3.4GB |
	| VAE Decoder | Converts latents to final images | ~160MB |
	| VAE Encoder | Encodes images to latent space | ~160MB |
	| Safety Checker | Content filtering (optional) | ~600MB |

	Total Model Size: ~4.8GB (without safety checker: ~4.2GB)
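
As a quick sanity check, the totals above follow from summing the component sizes in the table. The sketch below is illustrative only: `componentSizesMB` and `totalSizeGB` are hypothetical names, and the text-to-image-only subset assumes an application that skips both the VAE encoder (needed only when encoding input images) and the optional safety checker.

```typescript
// Approximate component sizes in MB, copied from the table above.
const componentSizesMB: Record<string, number> = {
  textEncoder: 500,
  unet: 3400,
  vaeDecoder: 160,
  vaeEncoder: 160,
  safetyChecker: 600,
};

// Sum the named components and convert MB to GB (decimal: 1 GB = 1000 MB).
function totalSizeGB(components: string[]): number {
  const totalMB = components.reduce(
    (sum, key) => sum + (componentSizesMB[key] ?? 0),
    0,
  );
  return totalMB / 1000;
}

const allComponents = Object.keys(componentSizesMB);
const withoutSafetyChecker = allComponents.filter((key) => key !== "safetyChecker");
// Assumption: plain text-to-image needs only these three components.
const textToImageOnly = ["textEncoder", "unet", "vaeDecoder"];

console.log(totalSizeGB(allComponents).toFixed(2));        // 4.82
console.log(totalSizeGB(withoutSafetyChecker).toFixed(2)); // 4.22
console.log(totalSizeGB(textToImageOnly).toFixed(2));      // 4.06
```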

	### Browser Performance Benchmarks

	Generation time for 512×512 images with 20 inference steps:

	| Hardware Category | Example Device | Typical Performance |
	|-------------------|----------------|---------------------|
	| High-End Desktop | RTX 4090, RTX 4080 | 3-8 seconds |
	| Gaming Desktop | RTX 3080, RTX 3070 | 8-15 seconds |
	| Intel Arc GPUs | Arc A750, Arc A770 | 8-15 seconds |
	| AMD High-End | RX 7900 XT/XTX | 6-12 seconds |
	| Apple Silicon | M2 Max, M1 Ultra | 10-20 seconds |
	| Integrated GPUs | Intel Iris Xe | 25-50 seconds |
	| WebAssembly Fallback | CPU-only devices | 2-10 minutes |

	### System Requirements

	- Minimum VRAM: 4GB (recommended: 6GB+)
	- System RAM: 8GB minimum, 16GB recommended
	- Storage: 5GB free space for model files
	- Browser: Chrome 113+, Edge 113+ (WebGPU), or any modern browser (WebAssembly fallback)
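
Some of these requirements can be checked before the model download starts. The sketch below assumes its inputs come from `navigator.storage.estimate()` (quota minus usage approximates the space available to the origin) and from the Chromium-only `navigator.deviceMemory` hint, which is capped at 8. `checkDevice` and its types are hypothetical helpers, not part of this repository. WebGPU does not report the amount of VRAM directly, so that requirement is not covered.

```typescript
// Thresholds taken from the requirements list above.
const REQUIRED_STORAGE_BYTES = 5e9; // 5 GB of free space for model files
const MIN_SYSTEM_RAM_GB = 8;

// Minimal shape of the result of navigator.storage.estimate().
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}

interface DeviceReport {
  enoughStorage: boolean;
  enoughRam: boolean | "unknown"; // "unknown" when the browser gives no hint
}

function checkDevice(
  estimate: StorageEstimateLike,
  deviceMemoryGB?: number,
): DeviceReport {
  const freeBytes = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return {
    enoughStorage: freeBytes >= REQUIRED_STORAGE_BYTES,
    enoughRam:
      deviceMemoryGB === undefined
        ? "unknown"
        : deviceMemoryGB >= MIN_SYSTEM_RAM_GB,
  };
}

console.log(checkDevice({ quota: 20e9, usage: 1e9 }, 8));
```

Because `navigator.deviceMemory` never reports more than 8, this check can confirm the 8GB minimum but cannot distinguish it from the recommended 16GB.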

	## 🌐 Browser Compatibility

	| Browser | WebGPU Support | Performance Level | Notes |
	|---------|----------------|-------------------|-------|
	| Chrome 113+ | ✅ Full Support | Excellent | Primary recommendation |
	| Microsoft Edge 113+ | ✅ Full Support | Excellent | Primary recommendation |
	| Firefox 141+ | ✅ Stable Support | Very Good | Recent WebGPU implementation, Windows only at launch |
	| Safari 17.4+ | 🔶 Experimental | Good | Behind feature flag |
	| Mobile Chrome 121+ | 🔶 Limited | Fair | Android only, limited memory |

	All browsers support WebAssembly fallback for universal compatibility
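
One way an application could choose between the two paths is to probe for a WebGPU adapter and fall back to WebAssembly otherwise. This is a minimal sketch, not this repository's loader: `pickBackend`, `NavigatorLike`, and `GpuLike` are hypothetical names, and the narrow interfaces stand in for the browser's `navigator.gpu` so the snippet compiles without WebGPU type definitions.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal stand-ins for the parts of navigator.gpu that this sketch touches.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  // Browsers without WebGPU do not define navigator.gpu at all.
  if (!nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "WebGPU unavailable".
    return "wasm";
  }
}

// Outside a browser there is no GPU object, so this logs "wasm".
pickBackend({}).then((backend) => console.log(backend));
```

In a browser, call it as `pickBackend(navigator as NavigatorLike)`. The returned strings match the `webgpu` and `wasm` execution provider names in ONNX Runtime Web, should you load the ONNX files with that runtime.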

	## 📝 Model Details

	### Training Information

	This model is based on Stable Diffusion v1.5 with the following training characteristics:

	- Base Dataset: LAION-5B filtered subset (~590M image-text pairs)
	- Training Resolution: 512×512 pixels
	- Architecture: Latent Diffusion Model with CLIP ViT-L/14 text encoder
	- Precision: Originally trained in FP32, optimized to FP16 for browser deployment
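
To make the architecture concrete, the tensor shapes involved in one 512×512 generation can be worked out from standard Stable Diffusion v1.x facts: the VAE downsamples each side by 8 into 4 latent channels, and CLIP ViT-L/14 produces 77 token embeddings of width 768. The helper names below are hypothetical, and the exact input and output layouts of the ONNX graphs in this repository may differ.

```typescript
const VAE_SCALE_FACTOR = 8;   // each spatial side shrinks by 8 in latent space
const LATENT_CHANNELS = 4;
const CLIP_MAX_TOKENS = 77;
const CLIP_HIDDEN_SIZE = 768; // CLIP ViT-L/14 text embedding width
const FP16_BYTES = 2;

type Shape = number[];

// Shape of the latent tensor the UNet denoises for a given image size.
function latentShape(width: number, height: number): Shape {
  if (width % VAE_SCALE_FACTOR !== 0 || height % VAE_SCALE_FACTOR !== 0) {
    throw new RangeError("width and height must be multiples of 8");
  }
  return [1, LATENT_CHANNELS, height / VAE_SCALE_FACTOR, width / VAE_SCALE_FACTOR];
}

// Memory footprint of a dense tensor with the given element size.
function tensorBytes(shape: Shape, bytesPerElement: number): number {
  return shape.reduce((count, dim) => count * dim, 1) * bytesPerElement;
}

const latents = latentShape(512, 512);
const textEmbeddings: Shape = [1, CLIP_MAX_TOKENS, CLIP_HIDDEN_SIZE];

console.log(latents);                                 // [ 1, 4, 64, 64 ]
console.log(tensorBytes(latents, FP16_BYTES));        // 32768
console.log(tensorBytes(textEmbeddings, FP16_BYTES)); // 118272
```

The activations exchanged between components are tiny compared with the weights, which is why download size and VRAM, not per-step data transfer, dominate the requirements.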

	### Optimization for Web Deployment

	- ONNX Conversion: Optimized computational graph for web inference
	- WebGPU Kernels: Custom compute shaders for GPU acceleration
	- Memory Efficiency: Attention slicing and dynamic memory management
	- Cross-Platform: WebAssembly fallback ensures universal browser support
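
A rough calculation shows why attention slicing matters for memory. The sketch assumes 8 attention heads, FP16 scores, a batch of one, and the highest-resolution self-attention layers, where a 64×64 latent yields 4096 tokens. These are typical Stable Diffusion v1.x figures rather than documented properties of this export, and `attentionScoreBytes` is a hypothetical helper.

```typescript
// Size of the attention score matrix: one (tokens x tokens) matrix per head.
function attentionScoreBytes(
  tokens: number,
  heads: number,
  bytesPerElement: number,
): number {
  return heads * tokens * tokens * bytesPerElement;
}

const MIB = 1024 * 1024;
const latentSide = 512 / 8;                   // 64
const tokenCount = latentSide * latentSide;   // 4096 spatial positions

// All 8 heads materialized at once versus one head per slice.
const fullScoreBytes = attentionScoreBytes(tokenCount, 8, 2);
const slicedScoreBytes = attentionScoreBytes(tokenCount, 1, 2);

console.log(fullScoreBytes / MIB);   // 256
console.log(slicedScoreBytes / MIB); // 32
```

Under these assumptions, processing one head at a time cuts the peak size of the score buffer from 256 MiB to 32 MiB, at the cost of running the attention computation in several smaller passes.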

	## 🛡️ Ethical Use and Safety

	### Built-in Safety Features

	- Content Filter: Optional NSFW detection and filtering
	- Prompt Sanitization: Basic filtering of potentially harmful prompts
	- Local Processing: Prompts and generated images are never transmitted, so they stay private

	### Responsible Use Guidelines

	✅ Encouraged Uses:
	- Creative art and design projects
	- Educational demonstrations of AI capabilities
	- Rapid prototyping for applications
	- Personal creative exploration
	- Research and development

	❌ Prohibited Uses:
	- Creating harmful, offensive, or illegal content
	- Generating misleading information or deepfakes
	- Violating copyright or intellectual property rights
	- Any use that violates the CreativeML OpenRAIL-M license terms

	### Privacy and Data Protection

	- Zero Data Collection: All processing occurs locally in your browser
	- No Server Communication: Model runs entirely offline after initial download
	- User Control: Complete control over generated content and prompts
	- GDPR Compliant: No personal data processing or storage

	## ⚠️ Limitations and Considerations

	### Technical Limitations

	- Resolution: Optimized for 512×512 (other resolutions may reduce quality)
	- Batch Size: Single image generation only in browser environment
	- Memory Constraints: Limited by browser and device VRAM/RAM
	- Generation Speed: Slower than dedicated server hardware

	### Content Limitations

	- Language Bias: Best performance with English prompts
	- Cultural Representation: Training data may reflect Western/English-speaking biases
	- Artistic Style: Tendency toward photorealistic and digital art styles
	- Consistency: Multiple generations from same prompt may vary significantly
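
In a typical diffusion pipeline, most of the run-to-run variation comes from the random noise that initializes the latents, so fixing a seed is the usual way to make results repeatable. The sketch below is one possible approach and not this repository's sampler: `seededRandom` is a small illustrative generator, `gaussianLatents` uses the Box-Muller transform, and the 4×64×64 latent size assumes a 512×512 image.

```typescript
// Small deterministic PRNG returning floats in [0, 1). Illustrative choice only.
function seededRandom(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), state | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard-normal samples via the Box-Muller transform, two per iteration.
function gaussianLatents(seed: number, count: number): Float32Array {
  const rand = seededRandom(seed);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i += 2) {
    const u1 = 1 - rand(); // lies in (0, 1], so Math.log never receives 0
    const u2 = rand();
    const radius = Math.sqrt(-2 * Math.log(u1));
    out[i] = radius * Math.cos(2 * Math.PI * u2);
    if (i + 1 < count) out[i + 1] = radius * Math.sin(2 * Math.PI * u2);
  }
  return out;
}

const latentCount = 4 * 64 * 64; // channels x height x width
const first = gaussianLatents(42, latentCount);
const repeat = gaussianLatents(42, latentCount);
const other = gaussianLatents(43, latentCount);

console.log(first.every((v, i) => v === repeat[i])); // true: same seed, same noise
console.log(first.some((v, i) => v !== other[i]));   // true: different seed, different noise
```

Identical starting noise is necessary but may not be sufficient: outputs can still differ slightly across GPUs, browsers, or the WebAssembly fallback because of floating-point differences.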

	### Browser-Specific Considerations

	- WebGPU Availability: Limited to supporting browsers and devices
	- Memory Management: Browser security limits may affect large model loading
	- Performance Variance: Significant variation across different devices and browsers

	## 📜 License: CreativeML OpenRAIL-M

	This model is released under the CreativeML OpenRAIL-M license, which allows for:

	✅ Permitted:
	- Commercial and non-commercial use
	- Distribution and modification
	- Creation of derivative works
	- Integration into applications and services

	🚫 Restrictions:
	- Must not be used to generate harmful content
	- Cannot be used for illegal activities
	- Must include license terms in any distribution
	- Derivative works must maintain the same license restrictions

	Full License Text: Available at [CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license)

	### License Compliance

	When using this model:
	1. Include License: Provide license terms to end users
	2. Respect Restrictions: Ensure use cases comply with content restrictions
	3. Derivative Works: Carry the same use-based restrictions over to modified versions
	4. Attribution: Credit original Stable Diffusion creators and Zhare-AI adaptation

	## 🏢 About Zhare-AI

	Zhare-AI is focused on democratizing AI technology by making powerful models accessible directly in web browsers. Our mission is to enable privacy-preserving AI applications that put users in control of their data and creative processes.

	- Website: [zhare.ai](https://zhare.ai)
	- Focus: Distributed AI computing and browser-based AI applications
	- Philosophy: Privacy-first, user-controlled AI experiences
	- Vision: Making AI accessible, private, and distributed

	### Our Mission

	We believe AI should be:
	- Accessible to everyone, regardless of infrastructure
	- Private with complete user data control
	- Distributed across devices rather than centralized servers
	- Transparent with open-source implementations

	## 📚 Citation and References

	### Cite This Work

	```bibtex
	@misc{zhare-ai-sd15-webgpu-2025,
	title={Stable Diffusion 1.5 WebGPU: Browser-Optimized Text-to-Image Generation},
	author={Zhare-AI},
	year={2025},
	howpublished={\url{https://huggingface.co/Zhare-AI/sd-1-5-webgpu}},
	note={WebGPU-optimized implementation for privacy-preserving browser-based image generation}
	}
	```

	### Original Stable Diffusion Citation

	```bibtex
	@InProceedings{Rombach_2022_CVPR,
	author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
	title = {High-Resolution Image Synthesis With Latent Diffusion Models},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	month = {June},
	year = {2022},
	pages = {10684-10695}
	}
	```

	## 🤝 Community and Support

	### Getting Help

	- Issues: Report technical problems via the repository issues
	- Discussions: Join the community discussion for tips and examples
	- Documentation: Comprehensive guides available in the repository

	### Contributing

	We welcome contributions to improve browser compatibility, performance, and user experience:

	- Performance optimizations for different hardware
	- Browser compatibility improvements
	- Documentation enhancements
	- Example applications and tutorials

	---

	<div align="center">

	🚀 Ready to create amazing images directly in your browser?

	This model brings the power of Stable Diffusion to web applications while keeping your data completely private and secure.

	Developed with ❤️ by Zhare-AI for the open-source community

	[🌐 Visit Zhare.ai](https://zhare.ai) | [📧 Contact Us](mailto:contact@zhare.ai) | [💬 Join Discussion](https://github.com/Zhare-AI)

	</div>