	---
	license: mit
	base_model:
	- stabilityai/stable-diffusion-xl-base-1.0
	pipeline_tag: text-to-image
	---
	# SarcasmDiffusion — SDXL Fused Meme Generator

	Model type: Stable Diffusion XL (Base 1.0) fine‑tuned via LoRA (merged/fused) to learn the visual style of sarcastic/ironic memes.
	Author: Ricardo Urdaneta (github.com/Ricardouchub)


	---

	## Overview

	SarcasmDiffusion is a diffusion-based generative model focused on producing clean meme-style photographs that are suitable for caption overlays (text is added after generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the Hateful Memes dataset to capture stylistic cues of humorous/ironic memes while avoiding offensive content.

	- Base: `stabilityai/stable-diffusion-xl-base-1.0`
	- Fine‑tuning: LoRA on the UNet only; VAE and text encoders are frozen.
	- Exported artifact: Fused SDXL (no external LoRA required at inference).
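
To illustrate what "fused" means here, the following is a minimal NumPy sketch of folding a LoRA update into a base weight matrix. It assumes the standard LoRA formulation, `W_fused = W + (alpha / r) * B @ A`, with the `r=8` and `alpha=16` values reported in the training summary. The layer dimensions and random matrices are placeholders for illustration, not actual SDXL weights, and this is not the export script used for this model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions for a single linear layer (not taken from SDXL).
d_out, d_in = 64, 32
r, alpha = 8, 16          # values reported in the training summary
scaling = alpha / r       # standard LoRA scaling factor

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA down-projection
B = rng.normal(size=(d_out, r))      # LoRA up-projection

# Fusing folds the low-rank update into the base weight once.
W_fused = W + scaling * (B @ A)

x = rng.normal(size=(d_in,))
y_adapter = W @ x + scaling * (B @ (A @ x))  # base layer plus separate adapter
y_fused = W_fused @ x                        # single fused layer

fused_matches_adapter = bool(np.allclose(y_adapter, y_fused))
```

Because the two outputs agree, the adapter matrices can be discarded after fusing, which is why the exported model loads like a regular SDXL checkpoint.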

	> This model focuses on style transfer for meme aesthetics (composition, lighting, “stock-photo vibe”), not on rendering text inside images. Add titles/subtitles with your own overlay function or editor.
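
Since captions are added after generation, a small overlay helper may be useful. The sketch below uses Pillow and is only one possible approach: the function name `add_caption`, its parameters, and the outline style are assumptions for illustration and are not part of this repository. `font_path` is a hypothetical path to a `.ttf` file; without it, Pillow's small built-in font is used.

```python
from PIL import Image, ImageDraw, ImageFont


def add_caption(image, top="", bottom="", font_path=None, font_size=48, margin=16):
    """Return a copy of `image` with outlined white captions at the top and bottom."""
    out = image.convert("RGB")  # convert() returns a new image; the input is untouched
    draw = ImageDraw.Draw(out)
    # Hypothetical font path; fall back to Pillow's built-in font when none is given.
    font = ImageFont.truetype(font_path, font_size) if font_path else ImageFont.load_default()

    def draw_line(text, y):
        # Center horizontally using the measured text width.
        x = int((out.width - draw.textlength(text, font=font)) / 2)
        # Simple outline: draw the text in black at 8 neighbouring offsets.
        for dx in (-2, 0, 2):
            for dy in (-2, 0, 2):
                if dx or dy:
                    draw.text((x + dx, y + dy), text, font=font, fill="black")
        draw.text((x, y), text, font=font, fill="white")

    if top:
        draw_line(top.upper(), margin)
    if bottom:
        # Approximate the line height (roughly 10 px for the built-in font).
        line_height = font_size if font_path else 10
        draw_line(bottom.upper(), out.height - margin - line_height)
    return out
```

With a generated image in hand, `add_caption(image, top="me at 2 am", bottom="checking the fridge again").save("meme.png")` would write the captioned result.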

	---

	## Intended Use

	- Generating meme-ready images with space at the top/bottom for captions.
	- Creative exploration of humorous/ironic visual setups controlled by prompts.
	- Educational/portfolio use for LoRA fine‑tuning workflows with SDXL.

	### Out of Scope / Limitations
	- No text rendering inside the image (explicitly discouraged via negative prompts).
	- May produce stock-like aesthetics by design.
	- Not suitable for generating or amplifying harmful, hateful, or NSFW content.
	- As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.

	---

	## Training Summary

	- Base model: SDXL Base 1.0
	- LoRA rank / alpha / dropout: `r=8`, `alpha=16`, `dropout=0.05`
	- Resolution: 1024 (training); common inference at 768–896 for speed
	- Batch: 1 (gradient accumulation = 4)
	- Steps: ~9k (≈2 epochs on ~5k images)
	- Learning Rate: 0.0001
	- Precision: fp16 (LoRA params kept in fp32 during training)
	- Optimizer: AdamW
	- Scheduler: cosine with warmup (recommended)
	- Frozen: VAE, text_encoder, text_encoder_2
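
The batch, step, and epoch figures above are related by simple arithmetic. The sketch below works through it using the rounded numbers from the list. Whether "steps" counts micro-batches or optimizer updates is not stated, so both readings are shown as assumptions.

```python
micro_batch = 1        # "Batch: 1"
grad_accum = 4         # "gradient accumulation = 4"
steps = 9_000          # "~9k" steps
num_images = 5_000     # "~5k images"

# Number of images contributing to each optimizer update.
effective_batch = micro_batch * grad_accum

# Reading 1: "steps" counts micro-batches (one image each).
epochs_if_micro_steps = steps * micro_batch / num_images

# Reading 2: "steps" counts optimizer updates (four images each).
epochs_if_optimizer_steps = steps * effective_batch / num_images

print(effective_batch, epochs_if_micro_steps, epochs_if_optimizer_steps)
```

The first reading gives about 1.8 epochs, which is consistent with the ≈2 epochs quoted above; the second would give about 7.2.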

	### Data
	- Source: Hateful Memes (Facebook AI).
	- We excluded samples labeled hateful and applied NLP enrichment:
	  - Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony).
	  - Heuristics and score percentiles map each sample to a tone: `humor / irony / neutral`.
	- Final training CSV: prompts balanced by tone; negative prompts to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content.
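
The filtering and tone-assignment steps above can be sketched roughly as follows. This is not the preprocessing code used for this model: the function names, the 75th-percentile cutoff, and the rule that irony takes precedence over humor are all assumptions chosen for illustration. The sketch also assumes the Hateful Memes annotation files are JSON Lines records with a `label` field where `0` means non-hateful, and that humor and irony scores have already been computed by the classifiers mentioned above.

```python
import json

import numpy as np


def load_non_hateful(jsonl_lines):
    """Keep records whose `label` is 0 (assumed to mean non-hateful)."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    return [rec for rec in records if rec.get("label") == 0]


def assign_tones(humor_scores, irony_scores, percentile=75):
    """Map per-sample scores to 'humor', 'irony', or 'neutral'.

    Hypothetical rule: a sample receives a tone when its score reaches the
    given percentile of that score across the dataset.
    """
    humor = np.asarray(humor_scores, dtype=float)
    irony = np.asarray(irony_scores, dtype=float)
    humor_cut = np.percentile(humor, percentile)
    irony_cut = np.percentile(irony, percentile)

    tones = np.full(humor.shape, "neutral", dtype=object)
    tones[humor >= humor_cut] = "humor"
    tones[irony >= irony_cut] = "irony"  # irony takes precedence in this sketch
    return tones.tolist()
```

Using dataset-wide percentiles rather than fixed thresholds keeps the tone classes roughly the same size regardless of how each classifier's scores are calibrated.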

	> The dataset is not included here. Please obtain Hateful Memes under its original terms and reproduce the preprocessing if needed.

	---

	## Safety, Ethics & Mitigations

	- Samples labeled hateful were filtered out of the training data, and negative prompts are used to steer away from NSFW content, hate, and text overlays.
	- Despite mitigations, misuse is possible. Users are responsible for prompting responsibly and complying with local laws and platform policies.
	- Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.

	Known risks: dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.

	---

	## How to Use

	```python
	from diffusers import AutoPipelineForText2Image
	import torch

	# fp16 is intended for GPU inference; fall back to fp32 when running on CPU.
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.float16 if device == "cuda" else torch.float32

	pipe = AutoPipelineForText2Image.from_pretrained(
	    "Ricardouchub/SarcasmDiffusion",
	    torch_dtype=dtype,
	).to(device)

	prompt = (
	    "sarcastic meme about checking the fridge for the third time, "
	    "centered subject, plain background, high-contrast photo, stock photo style"
	)
	negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"

	# A fixed seed makes the output reproducible.
	g = torch.Generator(device=device).manual_seed(123)
	image = pipe(
	    prompt,
	    negative_prompt=negative,
	    num_inference_steps=22,
	    guidance_scale=6.3,
	    width=896,
	    height=896,
	    generator=g,
	).images[0]

	image.save("sample.png")
	```

	### Prompting Tips
	- Add layout hints: “centered subject”, “plain background”, “space at top and bottom”.
	- Keep negative prompts to avoid logos/text/NSFW.
	- Use seeds for reproducibility; `steps=18–28`, `guidance=5.5–7.5`, `size=768–1024`.
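
The tips above can be bundled into a small helper so that layout hints and the negative prompt are applied consistently. The names `build_prompt`, `in_recommended_range`, `LAYOUT_HINTS`, and `DEFAULT_NEGATIVE` below are hypothetical conveniences, not part of this repository; the ranges are copied from the list above.

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
DEFAULT_NEGATIVE = (
    "nsfw, hate speech, slur, watermark, logo, low quality, blurry, "
    "busy background, text overlay"
)

# Recommended ranges from the prompting tips (inclusive bounds).
RECOMMENDED = {"steps": (18, 28), "guidance": (5.5, 7.5), "size": (768, 1024)}


def build_prompt(subject, tone="sarcastic", extra_hints=()):
    """Compose a prompt from a subject, a tone word, and the layout hints."""
    parts = [f"{tone} meme about {subject}", *LAYOUT_HINTS, *extra_hints]
    return ", ".join(parts)


def in_recommended_range(steps, guidance, size):
    """Return True when all three settings fall inside the recommended ranges."""
    values = {"steps": steps, "guidance": guidance, "size": size}
    return all(low <= values[key] <= high for key, (low, high) in RECOMMENDED.items())
```

The resulting string can be passed as `prompt`, with `DEFAULT_NEGATIVE` as `negative_prompt`, in the pipeline call shown in the usage example.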

	---

	## Environment & Compatibility

	To ensure full compatibility when loading this model (fused SDXL with LoRA merged), use the following library versions:

	| Library | Recommended Version | Notes |
	|----------|--------------------|-------|
	| Python | 3.10 – 3.12 | Tested on Colab (Python 3.12) |
	| PyTorch | 2.6.0 + CUDA 12.4 | Any CUDA ≥ 12 works |
	| diffusers | 0.35.1 | Core inference & model loading |
	| transformers | 4.45.2 | Required for SDXL CLIPTextEncoder compatibility |
	| accelerate | 1.10.1 | Device and fp16 inference management |
	| huggingface_hub | 0.23.5 | Compatible with diffusers 0.35.x |
	| safetensors | ≥ 0.4.5 | For secure model weights loading |

	Install in Colab or local environment:

	```bash
	pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" safetensors
	```

	> Important:
	> Using newer versions (e.g., `transformers ≥ 4.56`) may break compatibility due to API changes in `CLIPTextModel` (`offload_state_dict` argument).
	> Always match the versions above for smooth loading.
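
Before loading the model, it can help to confirm that the installed packages match the pins above. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and the `PINNED` mapping are illustrative, with the version strings copied from the table.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the compatibility table.
PINNED = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}


def find_mismatches(pinned=PINNED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches()` in the target environment returns an empty dictionary when all four pinned packages match, and otherwise lists the expected and installed versions for each package that differs.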

	---

	## License

	- Code: MIT
	- Model weights: follow the base model’s license (Stability AI / SDXL Base 1.0).
	- Data: Users must obtain Hateful Memes from its source and agree to its terms.

	> By using this model, you agree not to generate content that is illegal, harmful, or violates rights of others.

	---

	## Evaluation

	Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies.
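
As a starting point for the CLIP-score idea, the sketch below computes a clipped cosine similarity between an image embedding and a caption embedding. It assumes both embeddings have already been produced by a CLIP model elsewhere (not shown), and the function name `clip_style_score` is hypothetical. Implementations differ in the scale factor they apply to the cosine, so it is left as a parameter here rather than fixed.

```python
import numpy as np


def clip_style_score(image_emb, text_emb, scale=1.0):
    """Scaled cosine similarity between two embeddings, clipped at zero."""
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    # Cosine similarity: dot product of the two vectors divided by their norms.
    cos = float(image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb)))
    return scale * max(cos, 0.0)
```

Averaging this score over a fixed prompt sheet would give a single number per checkpoint that can be tracked alongside the qualitative review.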

	---

	## Acknowledgments

	- Stability AI — SDXL Base 1.0
	- Hugging Face — Diffusers, Accelerate, PEFT
	- Facebook AI — Hateful Memes dataset