morj committed · Commit ec562f8 · verified · 1 Parent(s): 1e886f5

Update README.md

Files changed (1): README.md (+250 −19)
README.md CHANGED
  - en
library_name: keras
tags:
  - '#stablediffusion'
  - '#renaissance'
  - '#finetune'
  - '#kerascv'

base_model: CompVis/stable-diffusion-v1-4
---

# Model Card for Renaissance Stable Diffusion

<!-- Provide a quick summary of what the model is/does. [Optional] -->
This is a Stable Diffusion model fine-tuned on a custom dataset of {image, caption} pairs to generate high-quality Renaissance-style portraits. It is built on top of the fine-tuning script provided by Hugging Face and uses the KerasCV implementation of stability.ai's text-to-image model. Unlike other open-source alternatives such as Hugging Face's Diffusers, KerasCV offers XLA compilation and mixed-precision support, resulting in state-of-the-art generation speed.

# Table of Contents

- [Model Card for Renaissance Stable Diffusion](#model-card-for-renaissance-stable-diffusion)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Model Description](#model-description)
- [Uses](#uses)
- [Direct Use](#direct-use)
- [Downstream Use [Optional]](#downstream-use-optional)
- [Out-of-Scope Use](#out-of-scope-use)
- [Misuse and Malicious Use](#misuse-and-malicious-use)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Limitations](#limitations)
- [Bias](#bias)
- [Training Details](#training-details)
- [Training Data](#training-data)
- [Training Procedure](#training-procedure)
- [Evaluation](#evaluation)
- [Testing Data, Factors & Metrics](#testing-data-factors--metrics)
- [Testing Data](#testing-data)
- [Factors](#factors)
- [Metrics](#metrics)
- [Results](#results)
- [Model Examination](#model-examination)
- [Environmental Impact](#environmental-impact)
- [Hardware](#hardware)
- [Software](#software)
- [Model Card Authors [optional]](#model-card-authors-optional)
- [Model Card Contact](#model-card-contact)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

<!-- Provide a longer summary of what this model is/does. -->
This is a Stable Diffusion model fine-tuned on a custom dataset of {image, caption} pairs, built on top of the fine-tuning script provided by Hugging Face. It uses the KerasCV implementation of stability.ai's text-to-image model, which, unlike other open-source alternatives such as Hugging Face's Diffusers, offers XLA compilation and mixed-precision support for state-of-the-art generation speed. The model was fine-tuned to generate high-quality Renaissance-style portraits.

- **Developed by:** Martin Gasparyan, Tatev Kyosababyan
- **Shared by:** Martin Gasparyan, Tatev Kyosababyan
- **Model type:** Diffusion-based text-to-image generative model
- **Language(s) (NLP):** English
- **License:** creativeml-openrail-m
- **Parent Model:** CompVis/stable-diffusion-v1-4
- **Resources for more information:**
  - [GitHub Repo](https://github.com/martingasparyan/Fine-Tune-Stable-Diffusion)
  - [Related Blog Post](https://medium.com/@ngesa254/unlock-creativity-with-stable-diffusion-in-kerascv-9d317199a7c9)

# Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
The model is intended for research purposes only. Possible research areas and tasks include:

- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.

Excluded uses are described below.

## Downstream Use [Optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app. -->
More information needed

## Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

## Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

- Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes.
- Impersonating individuals without their consent.
- Sexual content without consent of the people who might see it.
- Mis- and disinformation.
- Representations of egregious violence and gore.
- Sharing copyrighted or licensed material in violation of its terms of use.
- Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.

# Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

## Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model does not perform well on more difficult tasks that involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere".
- Faces and people in general may not be generated properly.
- The model was trained mainly on English captions and will not work as well in other languages.
- The autoencoding part of the model is lossy.
- The base model was trained on the large-scale dataset LAION-5B, which contains adult material and is not fit for product use without additional safety mechanisms and considerations.
- No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at https://rom1504.github.io/clip-retrieval/ to possibly assist in the detection of memorized images.

## Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on subsets of LAION-2B(en), which consists of images that are primarily limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the model's ability to generate content from non-English prompts is significantly worse than with English-language prompts.

# Training Details

Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training:

- Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 (e.g., a 512 x 512 x 3 image becomes a 64 x 64 x 4 latent).
- Text prompts are encoded through a ViT-L/14 text encoder.
- The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
- The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet, as sketched in the snippet after this list.
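
For illustration, here is a minimal TensorFlow sketch of that training objective. This is not the project's actual training script: the `diffusion_model` call signature is simplified relative to KerasCV's internals (which embed the timestep before passing it to the UNet), and `alphas_cumprod` is assumed to be a precomputed noise-schedule tensor.

```python
import tensorflow as tf

def diffusion_loss(latents, text_embeddings, diffusion_model, alphas_cumprod):
    """Simplified latent-diffusion objective: predict the added noise."""
    batch = tf.shape(latents)[0]
    # Sample a random timestep per example and Gaussian noise.
    t = tf.random.uniform((batch,), 0, 1000, dtype=tf.int32)
    noise = tf.random.normal(tf.shape(latents))
    # Forward process: mix signal and noise according to the schedule.
    a = tf.reshape(tf.gather(alphas_cumprod, t), (-1, 1, 1, 1))
    noisy_latents = tf.sqrt(a) * latents + tf.sqrt(1.0 - a) * noise
    # The UNet sees the noisy latent, the timestep, and the text embedding
    # (the text embedding is injected via cross-attention inside the model).
    pred_noise = diffusion_model([noisy_latents, t, text_embeddings])
    # Reconstruction objective between added noise and predicted noise.
    return tf.reduce_mean(tf.square(noise - pred_noise))
```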

## Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

We used 11 Renaissance portraits to train the model and created a .csv file with two columns, one for the image path and the other for the textual description. The dataset can be found at https://huggingface.co/datasets/morj/renaissance_portraits, and its splits can be listed via the datasets server:

```
curl -X GET \
  "https://datasets-server.huggingface.co/splits?dataset=morj%2Frenaissance_portraits"
```
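
If you prefer to load the data in Python, the Hugging Face `datasets` library can fetch the same repository directly. This is a generic sketch; the split name `train` is an assumption about the dataset layout.

```python
from datasets import load_dataset

# Load the portrait dataset from the Hugging Face Hub.
ds = load_dataset("morj/renaissance_portraits", split="train")

# Inspect the column names and the first {image, caption} pair.
print(ds.column_names)
print(ds[0])
```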

## Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
Note: only the diffusion model is fine-tuned; the VAE and the text encoder are kept frozen.

The fine-tuning process adapts the Stable Diffusion model to the specific task of generating Renaissance-style portraits from textual descriptions, using the 11-portrait dataset linked above and its .csv file of image paths and textual descriptions.

During training, a diffusion-model checkpoint is saved after an epoch only if the current loss is lower than the previous best. To avoid OOM errors and to speed up training, we used an A100 GPU in Google Colab. We fine-tuned the model at two resolutions, 256x256 and 512x512, varying only the batch size and the number of epochs between the two. The best results were obtained with 512x512 pixels, 72 epochs, a batch size of 1, and mixed precision set to True.

- Hardware: A100 GPUs
- Optimizer: AdamW
- Gradient accumulation steps: 2
- Batch size: 1
- Learning rate: warmed up to 0.0001 over 10,000 steps, then kept constant

A configuration in this spirit is sketched below.
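
The following Keras/TensorFlow sketch shows how those hyperparameters could be wired together. It is an illustration, not the project's training script: the `WarmupThenConstant` schedule class is hypothetical, `AdamW` requires TensorFlow 2.11+, gradient accumulation (2 steps) is omitted, and best-only checkpointing uses the standard `ModelCheckpoint` callback.

```python
import tensorflow as tf

# Mixed precision, as used for the best 512x512 run.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

class WarmupThenConstant(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup to `peak_lr` over `warmup_steps`, then constant."""
    def __init__(self, peak_lr=1e-4, warmup_steps=10_000):
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        return tf.minimum(self.peak_lr * step / self.warmup_steps, self.peak_lr)

optimizer = tf.keras.optimizers.AdamW(learning_rate=WarmupThenConstant())

# Save a checkpoint only when the loss improves on the previous best.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "renaissance_model.h5",
    monitor="loss",
    save_best_only=True,
    save_weights_only=True,
)
```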

# Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

## Testing Data, Factors & Metrics

### Testing Data

<!-- This should link to a Data Card if possible. -->

More information needed

### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

More information needed

### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

More information needed

## Results

Please check out the project wiki at https://github.com/martingasparyan/Fine-Tune-Stable-Diffusion/wiki

# Model Examination

More information needed

# Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly. -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A100 PCIe 40/80GB
- **Hours used:** 50
- **Cloud Provider:** Google Cloud Platform
- **Compute Region:** us-west1
- **Carbon Emitted:** 3.75 kg CO2eq
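
As a quick sanity check, the 3.75 kg figure follows directly from the calculator's inputs quoted below (250 W TDP, 50 hours, 0.3 kg CO2eq/kWh for us-west1):

```python
tdp_kw = 250 / 1000         # A100 PCIe TDP in kW
hours = 50                  # total compute time
grid_intensity = 0.3        # kg CO2eq per kWh in us-west1
print(tdp_kw * hours * grid_intensity)  # -> 3.75 kg CO2eq
```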

\usepackage{hyperref}

\subsection{CO2 Emission Related to Experiments}

Experiments were conducted using Google Cloud Platform in region us-west1, which has a carbon efficiency of 0.3 kgCO$_2$eq/kWh. A cumulative of 50 hours of computation was performed on hardware of type A100 PCIe 40/80GB (TDP of 250W).

Total emissions are estimated to be 3.75 kgCO$_2$eq, of which 100 percent was directly offset by the cloud provider.

%Uncomment if you bought additional offsets:
%XX kg CO2eq were manually offset through \href{link}{Offset Provider}.

Estimations were conducted using the \href{https://mlco2.github.io/impact#compute}{Machine Learning Impact calculator} presented in \cite{lacoste2019quantifying}.

@article{lacoste2019quantifying,
  title={Quantifying the Carbon Emissions of Machine Learning},
  author={Lacoste, Alexandre and Luccioni, Alexandra and Schmidt, Victor and Dandres, Thomas},
  journal={arXiv preprint arXiv:1910.09700},
  year={2019}
}

### Hardware

A100 PCIe 40/80GB

### Software

Google Colab, Jupyter Lab

# Model Card Authors [optional]

<!-- This section provides another layer of transparency and accountability. Whose views is this model card representing? How many voices were included in its construction? Etc. -->

Martin Gasparyan, Tatev Kyosababyan

# Model Card Contact

martingasparyan@yahoo.com, tatev.kyosababyan@gmail.com

# How to Get Started with the Model

Use the code below to get started with the model.
### 1. Install Dependencies
```python
!pip install keras-cv==0.6.0 -q
```
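
Steps 2 and 3 are truncated in this diff view; the hunk context shows that step 3 instantiates the base KerasCV model into which the fine-tuned weights are later loaded. A minimal reconstruction, with the imports (presumably step 2) assumed, is:

```python
import keras_cv
import matplotlib.pyplot as plt  # used by the plotting helper in step 6

# Step 3 (from the diff's hunk context): create the base Stable Diffusion model.
my_base_model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
```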
### 4. Load Weights from the h5 model, which is hosted on Hugging Face:
```python
my_base_model.diffusion_model.load_weights('/path/to/file/renaissance_model.h5')
```
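
If you don't have the .h5 file locally, it can be fetched from the Hub first. The `huggingface_hub` call below is standard, but the `repo_id` and `filename` are assumptions about this repository's layout; check the repository's "Files" tab and adjust.

```python
from huggingface_hub import hf_hub_download

# repo_id and filename are assumed, not confirmed by this model card.
weights_path = hf_hub_download(
    repo_id="morj/renaissance-stable-diffusion",
    filename="renaissance_model.h5",
)
my_base_model.diffusion_model.load_weights(weights_path)
```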
### 5. Create a variable to hold the generated image, setting the prompt, batch size, iterations, and seed:
```python
img = my_base_model.text_to_image(
    prompt="A woman with an enigmatic smile against a dark background",
    batch_size=1,  # How many images to generate at once
    num_steps=25,  # Number of iterations (controls image quality)
    seed=123,      # Set this to always get the same image from the same prompt
)
```
### 6. Display the image using the function:
```python
def plot_images(images):
    plt.figure(figsize=(5, 5))
    plt.imshow(images[0])  # text_to_image returns a batch; show the first image
    plt.axis("off")

plot_images(img)
```
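
Optionally, you can write a generated portrait to disk. This is a small sketch using Pillow, assuming (as in KerasCV 0.6) that `text_to_image` returns uint8 arrays of shape (batch, height, width, 3); the output filename is arbitrary.

```python
from PIL import Image

# Save the first image of the generated batch as a PNG.
Image.fromarray(img[0]).save("renaissance_portrait.png")
```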