---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
instance_prompt: photo of a bayc nft
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- dreambooth
inference: true
pipeline_tag: text-to-image
---

# DreamBooth - Bored Ape Yacht Club

## Model Description

This DreamBooth model is a derivative of `runwayml/stable-diffusion-v1-5`, fine-tuned on images from the Bored Ape Yacht Club (BAYC) NFT collection. The weights were trained with [DreamBooth](https://dreambooth.github.io/) for text-to-image generation.

### Training

The training images were retrieved through the Covalent API, using this [endpoint](https://www.covalenthq.com/docs/api/nft/get-nft-token-ids-for-contract-with-metadata/).

### Inference

The model generates original images from text prompts in the style of the Bored Ape Yacht Club collection, following the distinctive aesthetics of Bored Ape NFTs.




## Usage

Here is a basic example of how to use this model to generate images:

```python
import numpy as np
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel

model_id = "runwayml/stable-diffusion-v1-5"

# Load the fine-tuned UNet and text encoder from this repository.
unet = UNet2DConditionModel.from_pretrained("ckandemir/bayc-diffusion", subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained("ckandemir/bayc-diffusion", subfolder="text_encoder")

# Plug them into the base Stable Diffusion v1.5 pipeline.
# diffusers documents the precision argument as `torch_dtype`.
pipeline = StableDiffusionPipeline.from_pretrained(
    model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

prompt = ["a spiderman bayc nft"]
neg_prompt = ["realistic,disfigured face,eye patch,disfigured eyes, disfigured, deformed,bad anatomy"] * len(prompt)
num_samples = 3
guidance_scale = 9
num_inference_steps = 50
height = 512
width = 512

# Pick a random seed and print it so the result can be reproduced later.
seed = int(np.random.randint(0, 2**20 - 1))
print(f"Seed: {seed}")
generator = torch.Generator(device="cuda").manual_seed(seed)

# Autocast lets components loaded in different precisions run together.
with torch.autocast("cuda"), torch.inference_mode():
    imgs = pipeline(
        prompt,
        negative_prompt=neg_prompt,
        height=height,
        width=width,
        num_images_per_prompt=num_samples,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images

# display() is available in Jupyter/Colab notebooks; elsewhere, use img.save(...).
for img in imgs:
    display(img)
```
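
The example prompt reuses the `bayc nft` phrase from the model's `instance_prompt` (`photo of a bayc nft`), and the negative prompt list has to be the same length as the prompt list. The following is a minimal sketch of a helper that keeps the two in sync when generating for several subjects. The names `build_prompts`, `INSTANCE_PHRASE`, and `NEGATIVE_PROMPT` are hypothetical and introduced only for illustration; they are not part of this repository or of `diffusers`.

```python
# Hypothetical helper, not part of this repository or of diffusers.
INSTANCE_PHRASE = "bayc nft"  # phrase taken from the card's instance_prompt
NEGATIVE_PROMPT = "realistic,disfigured face,eye patch,disfigured eyes, disfigured, deformed,bad anatomy"


def build_prompts(subjects, negative=NEGATIVE_PROMPT):
    """Return (prompts, negative_prompts) as two lists of equal length."""
    # Skip empty entries and trim stray whitespace around each subject.
    prompts = [f"a {s.strip()} {INSTANCE_PHRASE}" for s in subjects if s.strip()]
    if not prompts:
        raise ValueError("at least one non-empty subject is required")
    # One negative prompt per prompt, mirroring `[...] * len(prompt)` above.
    return prompts, [negative] * len(prompts)


prompts, neg_prompts = build_prompts(["spiderman", "pirate"])
print(prompts)
```

The two lists can then be passed to the pipeline as `prompt` and `negative_prompt`.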

## Optimization

Results can be improved further with additional fine-tuning and by adjusting the training parameters.
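
This card does not state which training script or settings were used, so the following is only a sketch of how training parameters could be varied. It assumes the `train_dreambooth.py` example script from the `diffusers` repository. All numeric values, the `DreamBoothParams` and `to_cli_args` names, and both directory paths are illustrative assumptions, not the configuration behind this model.

```python
from dataclasses import dataclass


@dataclass
class DreamBoothParams:
    # Illustrative values only; NOT the settings used to train this model.
    learning_rate: float = 5e-6
    max_train_steps: int = 400
    train_batch_size: int = 1
    resolution: int = 512
    train_text_encoder: bool = False


def to_cli_args(params, instance_data_dir, output_dir):
    """Build an argument list for the diffusers example script train_dreambooth.py."""
    args = [
        "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--instance_data_dir", instance_data_dir,  # hypothetical path
        "--instance_prompt", "photo of a bayc nft",
        "--output_dir", output_dir,  # hypothetical path
        "--resolution", str(params.resolution),
        "--train_batch_size", str(params.train_batch_size),
        "--learning_rate", str(params.learning_rate),
        "--max_train_steps", str(params.max_train_steps),
    ]
    if params.train_text_encoder:
        args.append("--train_text_encoder")
    return args


# Example: double the number of training steps and keep everything else.
args = to_cli_args(DreamBoothParams(max_train_steps=800), "./bayc_images", "./bayc-diffusion")
print("accelerate launch train_dreambooth.py " + " ".join(args))
```

Changing one parameter at a time and comparing samples generated with a fixed seed makes it easier to see which setting actually affects the output.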