---
license: mit
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
---

# SarcasmDiffusion — SDXL Fused Meme Generator

**Model type:** Stable Diffusion XL (Base 1.0) fine-tuned with **LoRA** (merged/fused) to learn the *visual* style of sarcastic/ironic memes.

**Author:** Ricardo Urdaneta (github.com/Ricardouchub)

---

## Overview

SarcasmDiffusion is a diffusion-based text-to-image model that produces **clean meme-style photographs** suited to **caption overlays** (text is added *after* generation). It was LoRA fine-tuned on a filtered and enriched subset of the *Hateful Memes* dataset to capture the stylistic cues of humorous/ironic memes while **avoiding offensive content**.

- **Base:** `stabilityai/stable-diffusion-xl-base-1.0`
- **Fine-tuning:** LoRA on the **UNet** only; the **VAE** and both **text encoders** are frozen.
- **Exported artifact:** **Fused SDXL** checkpoint. The LoRA weights are merged into the UNet, so no external LoRA is required at inference.

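"Fused" means the low-rank update learned during fine-tuning has been folded into the UNet's own weights. The sketch below illustrates the standard LoRA formulation on a single toy linear layer with NumPy. It assumes the common scaling `alpha / r`, which is 2.0 for the `r=8`, `alpha=16` values listed under Training Summary. The matrix sizes are arbitrary illustration values, not the UNet's. On real pipelines, diffusers exposes `fuse_lora()` for this step, but this card does not say which tool produced the checkpoint.

```python
import numpy as np

def fuse_lora(W, A, B, alpha, r):
    """Merge a LoRA update into a dense weight: W + (alpha / r) * B @ A.

    W: (d_out, d_in) frozen base weight
    A: (r, d_in) LoRA down-projection, B: (d_out, r) LoRA up-projection
    """
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 32, 8, 16   # toy sizes; r and alpha match the card
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
x = rng.standard_normal(d_in)

# Unfused inference: base path plus the scaled low-rank adapter path
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Fused inference: one dense weight with the same shape as W
W_fused = fuse_lora(W, A, B, alpha, r)
y_fused = W_fused @ x

print(np.allclose(y_adapter, y_fused))  # True
```

LoRA dropout is only active during training. As a result, the fused layer reproduces the adapter's inference output (up to floating-point rounding) with no extra compute or files at load time.
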
> This model focuses on **style transfer for meme aesthetics** (composition, lighting, “stock-photo vibe”), *not* on rendering text inside images. Add titles/subtitles afterwards with your own overlay function or an image editor.


---

## Intended Use

- Generating **meme-ready images** with space at the top/bottom for captions.
- Creative exploration of humorous/ironic visual setups controlled by prompts.
- Educational/portfolio use for **LoRA fine-tuning workflows** with SDXL.

### Out of Scope / Limitations

- **No text rendering inside the image** (explicitly discouraged via negative prompts).
- May produce **stock-like** aesthetics by design.
- Not suitable for generating or amplifying **harmful, hateful, or NSFW** content.
- As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.

---

## Training Summary

- **Base model:** SDXL Base 1.0
- **LoRA rank / alpha / dropout:** `r=8`, `alpha=16`, `dropout=0.05`
- **Resolution:** 1024 px for training; inference commonly at 768–896 px for speed
- **Batch size:** 1 (gradient accumulation = 4)
- **Steps:** ~9k (≈2 epochs on ~5k images)
- **Learning rate:** 1e-4
- **Precision:** fp16 (LoRA parameters kept in fp32 during training)
- **Optimizer:** AdamW
- **Scheduler:** cosine with warmup (recommended)
- **Frozen:** VAE, `text_encoder`, `text_encoder_2`

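The summary lists the scheduler only as "cosine with warmup" and gives no warmup length. Below is a minimal plain-Python sketch of that schedule, using the common linear-warmup then half-cosine-decay form (the same shape as `transformers.get_cosine_schedule_with_warmup`). `warmup_steps=500` is an assumed value.

The step count depends on what is counted. With batch size 1 and gradient accumulation 4, ~9k micro-batches correspond to ~9k images (≈1.8 epochs of ~5k images) and 2,250 optimizer updates. By contrast, ~9k optimizer updates would mean ~36k images. The "≈2 epochs" figure suggests the former, but that is an inference, not something the card states.

```python
import math

BATCH_SIZE = 1
GRAD_ACCUM = 4
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM  # images per optimizer update

def cosine_with_warmup(step, base_lr=1e-4, warmup_steps=500, total_steps=9000):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0 at total_steps.

    warmup_steps=500 is an assumption; the card does not state a warmup length.
    Steps past total_steps stay at 0.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# If the ~9k steps are micro-batches of 1 image, the optimizer updates number:
optimizer_updates = 9000 // GRAD_ACCUM  # 2250
```

Whichever unit you choose, pass the matching count as `total_steps` so the decay actually reaches zero at the end of training.
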

### Data

- Source: *Hateful Memes* (Facebook AI).
- We **excluded** samples labeled hateful and applied **NLP enrichment**:
  - Emotion scoring (distilled GoEmotions model) and irony scoring (RoBERTa-irony).
  - Heuristics + percentiles → tones: `humor / irony / neutral`.
- Final training CSV: prompts balanced by tone; **negative prompts** to avoid text overlays, low-quality artifacts, watermarks/logos, and unsafe content.

> The dataset is **not** included here. Please obtain *Hateful Memes* under its original terms and reproduce the preprocessing if needed.

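The card describes the enrichment only at a high level ("heuristics + percentiles → tones"). The plain-Python sketch below shows one hypothetical way to reproduce that step. It is not the author's code, and it works as follows:

- It drops rows whose Hateful Memes `label` is 1 (hateful).
- It tags a row `irony` if its irony score is at or above the 75th percentile. Otherwise it tags the row `humor` if its amusement score is at or above the 75th percentile, and `neutral` if neither is.
- It downsamples every tone to the size of the smallest group.

The score field names `irony` and `amusement`, the 75th-percentile cutoff, and the precedence of irony over humor are all assumptions. In practice the scores would come from the irony and GoEmotions classifiers listed above.

```python
import random
from collections import defaultdict

def percentile(values, q):
    """Linearly interpolated q-th percentile (q in 0-100) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def assign_tones(rows, q=75):
    """Drop hateful rows (label == 1) and tag the rest as irony / humor / neutral.

    Assumes each row carries hypothetical 'irony' and 'amusement' scores in [0, 1].
    Cutoffs are computed on the kept (non-hateful) rows only.
    """
    kept = [r for r in rows if r.get("label") != 1]
    irony_cut = percentile([r["irony"] for r in kept], q)
    humor_cut = percentile([r["amusement"] for r in kept], q)
    tagged = []
    for r in kept:
        if r["irony"] >= irony_cut:
            tone = "irony"
        elif r["amusement"] >= humor_cut:
            tone = "humor"
        else:
            tone = "neutral"
        tagged.append({**r, "tone": tone})  # copy, so the input rows are not mutated
    return tagged

def balance_by_tone(rows, seed=0):
    """Randomly downsample every tone group to the size of the smallest group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["tone"]].append(r)
    n = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # seeded so the balanced CSV is reproducible
    return [r for tone in sorted(groups) for r in rng.sample(groups[tone], n)]
```

Downsampling keeps the tones balanced at the cost of discarding data. Weighted sampling during training is an alternative if the smallest group is much smaller than the others.
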

---

## Safety, Ethics & Mitigations

- Samples labeled hateful were filtered out of the training data, and **negative prompts** are used to steer generation away from NSFW/hateful content and text overlays.
- Despite these mitigations, **misuse is possible**. Users are responsible for **prompting responsibly** and for complying with local laws and platform policies.
- Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.

**Known risks:** dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.

---

## How to Use

```python
import torch
from diffusers import AutoPipelineForText2Image

# fp16 needs a GPU; on CPU fall back to fp32 (much slower)
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = AutoPipelineForText2Image.from_pretrained(
    "Ricardouchub/SarcasmDiffusion",
    torch_dtype=dtype,
).to(device)

prompt = (
    "sarcastic meme about checking the fridge for the third time, "
    "centered subject, plain background, high-contrast photo, stock photo style"
)
# Steer away from unsafe content, logos, and rendered text
negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"

# Fixed seed for reproducible output
g = torch.Generator(device=device).manual_seed(123)
image = pipe(
    prompt,
    negative_prompt=negative,
    num_inference_steps=22,
    guidance_scale=6.3,
    width=896,
    height=896,
    generator=g,
).images[0]

image.save("sample.png")
```

### Prompting Tips

- Add **layout hints**: “centered subject”, “plain background”, “space at top and bottom”.
- Keep the **negative prompts** to avoid logos/text/NSFW.
- Fix a seed for reproducibility. Good starting ranges are `num_inference_steps` 18–28, `guidance_scale` 5.5–7.5, and `width`/`height` 768–1024.

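These tips can be folded into a small helper so that every generation uses the same layout hints, negative prompt, and parameter ranges. A minimal plain-Python sketch follows. `build_prompt` and `clamp_settings` are hypothetical helpers, not part of diffusers or this repository. The default style string and negative terms are copied from the example in How to Use, and the clamping ranges are the ones suggested in these tips (steps 18–28, guidance 5.5–7.5, size 768–1024).

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
BASE_NEGATIVE = ("nsfw", "hate speech", "slur", "watermark", "logo",
                 "low quality", "blurry", "busy background", "text overlay")

def build_prompt(subject, style="high-contrast photo, stock photo style", extra_negative=()):
    """Return (prompt, negative_prompt) with layout hints and de-duplicated negatives."""
    prompt = ", ".join([f"sarcastic meme about {subject}", *LAYOUT_HINTS, style])
    # dict.fromkeys drops duplicate terms while keeping their first-seen order
    negative = ", ".join(dict.fromkeys([*BASE_NEGATIVE, *extra_negative]))
    return prompt, negative

def clamp_settings(steps, guidance, size):
    """Clamp to steps 18-28, guidance 5.5-7.5, square size 768-1024 (rounded down to a multiple of 8)."""
    steps = min(max(int(steps), 18), 28)
    guidance = min(max(float(guidance), 5.5), 7.5)
    size = min(max(int(size), 768), 1024)
    size -= size % 8  # the SDXL pipeline requires width/height divisible by 8
    return {"num_inference_steps": steps, "guidance_scale": guidance,
            "width": size, "height": size}
```

With the pipeline from How to Use loaded, call `prompt, negative = build_prompt("checking the fridge for the third time")`. Then run `pipe(prompt, negative_prompt=negative, generator=g, **clamp_settings(22, 6.3, 896)).images[0]`. Keeping the seed fixed while varying one setting at a time makes comparisons meaningful.
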

---

## Environment & Compatibility

To load this model (a fused SDXL checkpoint with the LoRA merged in) without compatibility problems, use the following library versions:

| Library | Recommended Version | Notes |
|---------|---------------------|-------|
| **Python** | 3.10 – 3.12 | Tested on Colab (Python 3.12) |
| **PyTorch** | 2.6.0 + CUDA 12.4 | Any CUDA ≥ 12 works |
| **diffusers** | **0.35.1** | Core inference & model loading |
| **transformers** | **4.45.2** | Required for compatibility with SDXL's CLIP text encoders |
| **accelerate** | **1.10.1** | Device and fp16 inference management |
| **huggingface_hub** | **0.23.5** | Compatible with diffusers 0.35.x |
| **safetensors** | ≥ 0.4.5 | Safe loading of model weights |

**Install in Colab or a local environment:**

```bash
pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" "safetensors>=0.4.5"
```

> **Important:**
> Newer versions (e.g., `transformers ≥ 4.56`) may break loading because of API changes affecting `CLIPTextModel` (the `offload_state_dict` argument).
> Always match the versions above for smooth loading.

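Because loading is sensitive to these pins, a quick check before calling `from_pretrained` can save you from a confusing traceback. Below is a minimal sketch using the standard-library `importlib.metadata`. `installed_versions` and `check_pins` are hypothetical helpers that compare versions by exact string match, mirroring the advice to match the versions above. PyTorch and safetensors are left out because their rows give ranges rather than exact pins.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from the table above
PINNED = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}

def installed_versions(names):
    """Look up installed distribution versions; None for packages that are not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

def check_pins(pinned=PINNED, installed=None):
    """Return {package: (expected, installed_or_None)} for every package that does not match exactly."""
    if installed is None:
        installed = installed_versions(pinned)
    return {pkg: (want, installed.get(pkg)) for pkg, want in pinned.items()
            if installed.get(pkg) != want}

if __name__ == "__main__":
    for pkg, (want, have) in check_pins().items():
        print(f"{pkg}: expected {want}, found {have or 'not installed'}")
```

An empty result from `check_pins()` means the environment matches the pinned versions. Any printed line names the package to reinstall with the `pip install` command above.
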

---

## License

- **Code:** MIT
- **Model weights:** follow the base model’s license (Stability AI / SDXL Base 1.0).
- **Data:** Users must obtain *Hateful Memes* from its source and agree to its terms.

> By using this model, you agree not to generate content that is illegal, harmful, or violates the rights of others.

---

## Evaluation

Evaluation so far is qualitative, using fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work include CLIP score between image and caption, aesthetic predictors, and human preference studies.

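As a starting point for the CLIP-score metric suggested above, the sketch below shows only the scoring step. It assumes that image and caption embeddings have already been produced by a CLIP model, for example via `get_image_features` / `get_text_features` on a `transformers` `CLIPModel`. It follows the CLIPScore form `w * max(cos(image, text), 0)`. Here `w = 100` matches the torchmetrics convention, while the original CLIPScore paper uses `w = 2.5`. `mean_score_by_tone` is a hypothetical helper for summarizing a fixed prompt sheet per tone.

```python
import math
from collections import defaultdict

def clip_score(image_emb, text_emb, w=100.0):
    """CLIPScore-style value: w * max(cosine(image_emb, text_emb), 0)."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    if norm == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return w * max(dot / norm, 0.0)

def mean_score_by_tone(records):
    """records: iterable of (tone, image_emb, text_emb) tuples from a fixed prompt sheet."""
    scores = defaultdict(list)
    for tone, image_emb, text_emb in records:
        scores[tone].append(clip_score(image_emb, text_emb))
    return {tone: sum(v) / len(v) for tone, v in scores.items()}
```

Because the prompt sheets are fixed, rerunning this after each training change gives a like-for-like comparison per tone. Treat CLIP score as a proxy for prompt adherence, not for humor.
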

---

## Acknowledgments

- Stability AI — SDXL Base 1.0
- Hugging Face — Diffusers, Accelerate, PEFT
- Facebook AI — Hateful Memes dataset