Ricardouchub committed on
Commit 397693f · verified · 1 Parent(s): 3d9c701

Update README.md

Files changed (1)
  1. README.md +153 -9
README.md CHANGED
@@ -1,19 +1,163 @@
  # SarcasmDiffusion — SDXL Fused Meme Generator
 
- Fine-tuning of **Stable Diffusion XL Base 1.0** with **LoRA** to learn the visual style of sarcastic memes.
 
- ## Quick start
 
  ```python
  from diffusers import AutoPipelineForText2Image
  import torch
 
- pipe = AutoPipelineForText2Image.from_pretrained("Ricardouchub/SarcasmDiffusion", torch_dtype=torch.float16).to("cuda")
 
- img = pipe(
-     "sarcastic meme about running out of GPU VRAM at 3am, high contrast, stock photo style",
-     negative_prompt="nsfw, text overlay, low quality",
-     num_inference_steps=20, guidance_scale=6.5
- ).images[0]
 
- img.show()
+ ---
+ license: mit
+ base_model:
+ - stabilityai/stable-diffusion-xl-base-1.0
+ pipeline_tag: text-to-image
+ ---
  # SarcasmDiffusion — SDXL Fused Meme Generator
 
+ **Model type:** Stable Diffusion XL (Base 1.0) fine‑tuned via **LoRA** (merged/fused) to learn the *visual* style of sarcastic/ironic memes.
+ **Author:** Ricardo Urdaneta (github.com/Ricardouchub)
+ **Repository:** SarcasmDiffusion
 
+ ---
+
+ ## Overview
+
+ SarcasmDiffusion is a diffusion-based generative model focused on producing **clean meme-style photographs** that are suitable for **caption overlays** (text is added *after* generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the *Hateful Memes* dataset to capture stylistic cues of humorous/ironic memes while **avoiding offensive content**.
+
+ - **Base:** `stabilityai/stable-diffusion-xl-base-1.0`
+ - **Fine‑tuning:** LoRA on the **UNet** only; **VAE** and **text encoders** are frozen.
+ - **Exported artifact:** **Fused SDXL** (no external LoRA required at inference).
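The “fused” export listed above means the LoRA update was merged into the UNet weights before saving, so a single set of weights reproduces the adapted model. A minimal pure-Python sketch of that merge for one linear layer, assuming the standard LoRA scaling factor `alpha / r` (the Training Summary's `r=8`, `alpha=16` give 2.0). `merge_lora` and the other helpers are hypothetical, not Diffusers or PEFT functions, and the matrices are toy values rather than real SDXL weights.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def linear(w, x):
    """Apply a bias-free linear layer with weight w to the vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def merge_lora(w, lora_b, lora_a, r, alpha):
    """Return W + (alpha / r) * B @ A, the weight with the LoRA update baked in.

    w: d_out x d_in base weight; lora_b: d_out x r; lora_a: r x d_in.
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy layer: 2x2 weight with a rank-1 adapter; alpha=2, r=1 gives the same
# alpha / r = 2.0 factor as the card's r=8, alpha=16.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.0]]  # d_out x r
A = [[0.0, 1.0]]    # r x d_in
x = [2.0, 3.0]

# Unfused: base output plus the scaled adapter path, as during LoRA training.
unfused = [base + 2.0 * extra for base, extra in zip(linear(W, x), linear(matmul(B, A), x))]
# Fused: a single layer whose weight already contains the update.
fused = linear(merge_lora(W, B, A, r=1, alpha=2), x)
print(unfused, fused)  # [5.0, 3.0] [5.0, 3.0]
```

Diffusers offers `fuse_lora()` for this on pipelines with LoRA weights loaded; the card does not say which tool produced this export, so the sketch only illustrates why fused and unfused outputs match.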
+
+ > This model focuses on **style transfer for meme aesthetics** (composition, lighting, “stock-photo vibe”), *not* on rendering text inside images. Add titles/subtitles with your own overlay function or editor.
+
+ ---
+
+ ## Intended Use
+
+ - Generating **meme-ready images** with space at the top/bottom for captions.
+ - Creative exploration of humorous/ironic visual setups controlled by prompts.
+ - Educational/portfolio use for **LoRA fine‑tuning workflows** with SDXL.
+
+ ### Out of Scope / Limitations
+ - **No text rendering inside the image** (explicitly discouraged via negative prompts).
+ - May produce **stock-like** aesthetics by design.
+ - Not suitable for generating or amplifying **harmful, hateful, or NSFW** content.
+ - As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.
+
+ ---
+
+ ## Training Summary
+
+ - **Base model:** SDXL Base 1.0
+ - **LoRA rank / alpha / dropout:** `r=8`, `alpha=16`, `dropout=0.05`
+ - **Resolution:** 1024 (training); 768–896 is common at inference for speed
+ - **Batch:** 1 (gradient accumulation = 4)
+ - **Steps:** ~6k (≈0.7 epoch on ~8.5k images)
+ - **Precision:** fp16 (LoRA params kept in fp32 during training)
+ - **Optimizer:** AdamW
+ - **Scheduler:** cosine with warmup (recommended)
+ - **Frozen:** VAE, text_encoder, text_encoder_2
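The step, batch and epoch figures above can be cross-checked with simple arithmetic. A minimal sketch, assuming “~6k steps” counts micro-batches of size 1: that is the reading under which ≈0.7 epoch on ~8.5k images holds, and with gradient accumulation 4 it corresponds to about 1.5k optimizer updates (if the 6k were optimizer updates, the data would instead be seen about 2.8 times). The schedule function follows the usual linear-warmup-then-cosine-decay shape, the same curve as `transformers.get_cosine_schedule_with_warmup` with its default half cycle; the base learning rate and warmup length below are hypothetical because the card does not state them.

```python
import math


def training_budget(micro_steps, batch_size, grad_accum, dataset_size):
    """Relate micro-batch steps to samples seen, optimizer updates and epochs."""
    samples_seen = micro_steps * batch_size
    optimizer_updates = micro_steps // grad_accum  # one update per `grad_accum` micro-batches
    epochs = samples_seen / dataset_size
    return samples_seen, optimizer_updates, epochs


def cosine_with_warmup(step, total_steps, warmup_steps, base_lr):
    """Linear warmup from 0 to base_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


samples, updates, epochs = training_budget(
    micro_steps=6000, batch_size=1, grad_accum=4, dataset_size=8500
)
print(samples, updates, round(epochs, 2))  # 6000 1500 0.71

# Hypothetical schedule values: the card states neither the base LR nor the warmup length.
for step in (0, 50, 100, 800, 1500):
    print(step, cosine_with_warmup(step, total_steps=updates, warmup_steps=100, base_lr=1e-4))
```

The scheduler is stepped once per optimizer update, which is why `total_steps` uses the 1.5k updates rather than the 6k micro-batches.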
+
+ ### Data
+ - Source: *Hateful Memes* (Facebook AI).
+ - We **excluded** labeled hateful samples and applied **NLP enrichment**:
+   - Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony).
+   - Heuristics + percentiles → tones: `humor / irony / neutral`.
+ - Final training CSV: prompts balanced by tone; **negative prompts** to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content.
+
+ > The dataset is **not** included here. Please obtain *Hateful Memes* under its original terms and reproduce the preprocessing if needed.
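The data steps above can be sketched as a small pipeline: drop hateful rows, map scores to tones with percentile cut-offs, then balance the tones. This is a minimal sketch under stated assumptions, not the author's preprocessing: it assumes the *Hateful Memes* JSONL layout with a `label` field (1 = hateful), treats `humor` and `irony` as hypothetical per-meme scores already produced by the emotion and irony models, and uses a hypothetical 70th-percentile rule plus downsampling as the balancing strategy.

```python
import json
import random
import statistics
from collections import Counter


def load_non_hateful(jsonl_lines):
    """Keep memes labeled non-hateful (label == 0); hateful or unlabeled rows are dropped."""
    rows = (json.loads(line) for line in jsonl_lines if line.strip())
    return [row for row in rows if row.get("label") == 0]


def percentile(values, pct):
    """pct-th percentile (1-99) with linear interpolation between sorted values."""
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


def assign_tones(rows, pct=70):
    """Hypothetical heuristic: high irony -> 'irony', else high humor -> 'humor', else 'neutral'."""
    irony_cut = percentile([r["irony"] for r in rows], pct)
    humor_cut = percentile([r["humor"] for r in rows], pct)
    for r in rows:
        if r["irony"] >= irony_cut and r["irony"] >= r["humor"]:
            r["tone"] = "irony"
        elif r["humor"] >= humor_cut:
            r["tone"] = "humor"
        else:
            r["tone"] = "neutral"
    return rows


def balance_by_tone(rows, seed=0):
    """Downsample every tone group to the size of the smallest one."""
    groups = {}
    for r in rows:
        groups.setdefault(r["tone"], []).append(r)
    smallest = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    return [r for g in groups.values() for r in rng.sample(g, smallest)]


# Toy rows (id, label, humor, irony); scores stand in for the enrichment models' outputs.
scores = [(1, 0, 0.9, 0.1), (2, 0, 0.2, 0.8), (3, 0, 0.1, 0.1), (4, 0, 0.8, 0.2),
          (5, 0, 0.3, 0.9), (6, 0, 0.2, 0.2), (7, 1, 0.9, 0.9), (8, 0, 0.1, 0.05)]
lines = [json.dumps({"id": i, "img": f"img/{i}.png", "label": lab, "humor": h, "irony": ir})
         for i, lab, h, ir in scores]

rows = assign_tones(load_non_hateful(lines))
balanced = balance_by_tone(rows)
print(Counter(r["tone"] for r in rows), Counter(r["tone"] for r in balanced))
```

Downsampling is only one way to balance; upweighting rare tones would keep more images at the cost of repeated samples.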
+
+ ---
+
+ ## Safety, Ethics & Mitigations
+
+ - We filtered out hateful labels and used **negative prompts** to avoid NSFW/hate/text overlays.
+ - Despite mitigations, **misuse is possible**. Users are responsible for **prompting responsibly** and complying with local laws and platform policies.
+ - Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.
+
+ **Known risks:** dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.
+
+ ---
+
+ ## How to Use
 
  ```python
  from diffusers import AutoPipelineForText2Image
  import torch
 
+ pipe = AutoPipelineForText2Image.from_pretrained(
+     "Ricardouchub/SarcasmDiffusion",
+     torch_dtype=torch.float16,
+ ).to("cuda")  # float16 targets GPUs; on CPU-only machines use torch_dtype=torch.float32 and .to("cpu")
+
+ prompt = (
+     "sarcastic meme about checking the fridge for the third time, "
+     "centered subject, plain background, high-contrast photo, stock photo style"
+ )
+ negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"
+
+ g = torch.Generator(device=pipe.device).manual_seed(123)  # fixed seed for reproducible output
+ image = pipe(
+     prompt,
+     negative_prompt=negative,
+     num_inference_steps=22,
+     guidance_scale=6.3,
+     width=896, height=896,
+     generator=g,
+ ).images[0]
+
+ image.save("sample.png")
+ ```
+
+ ### Prompting Tips
+ - Add **layout hints**: “centered subject”, “plain background”, “space at top and bottom”.
+ - Keep **negative prompts** to avoid logos/text/NSFW.
+ - Use fixed seeds for reproducibility. Suggested ranges: `num_inference_steps` 18–28, `guidance_scale` 5.5–7.5, width/height 768–1024.
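The prompting tips can be folded into a small helper that adds the layout hints, keeps a negative prompt, and rejects settings outside the suggested ranges before the pipeline is called. A minimal sketch: `build_generation_kwargs`, `LAYOUT_HINTS` and `DEFAULT_NEGATIVE` are hypothetical names; the returned keys match the `pipe(...)` arguments in How to Use, and the divisible-by-8 check reflects the SDXL pipeline's requirement that width and height be multiples of 8.

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
DEFAULT_NEGATIVE = (
    "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"
)


def build_generation_kwargs(subject, steps=22, guidance=6.3, size=896, negative=DEFAULT_NEGATIVE):
    """Return pipeline keyword arguments, checked against the suggested ranges."""
    if not 18 <= steps <= 28:
        raise ValueError(f"steps={steps} is outside the suggested 18-28 range")
    if not 5.5 <= guidance <= 7.5:
        raise ValueError(f"guidance={guidance} is outside the suggested 5.5-7.5 range")
    if not 768 <= size <= 1024 or size % 8:
        raise ValueError(f"size={size} must be within 768-1024 and divisible by 8")
    # Append only the layout hints the subject does not already mention.
    hints = [h for h in LAYOUT_HINTS if h not in subject]
    prompt = ", ".join([f"sarcastic meme about {subject}", *hints, "stock photo style"])
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "num_inference_steps": steps,
        "guidance_scale": guidance,
        "width": size,
        "height": size,
    }


kwargs = build_generation_kwargs("checking the fridge for the third time")
print(kwargs["prompt"])
# With a loaded pipeline and generator: image = pipe(**kwargs, generator=g).images[0]
```

Failing fast on out-of-range values is cheaper than discovering them after a full denoising run.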
+
+ ---
+
+ ## Files
+
+ This repository should contain the standard **Diffusers** layout:
+
+ ```
+ model_index.json
+ unet/
+ vae/
+ text_encoder/
+ text_encoder_2/
+ scheduler/
+ tokenizer/
+ ...
+ ```
+
+ Since this is a **fused** export, you **don’t** need an external LoRA weight file.
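A quick way to confirm that a download matches the Diffusers layout is to compare the components declared in `model_index.json` with the subfolders on disk. A minimal sketch, assuming the usual `model_index.json` format in which each pipeline component maps to a `[library, class_name]` pair while keys starting with `_` and plain settings (booleans, `null`) are metadata. `missing_components` is a hypothetical helper, and the temporary directory stands in for a real local copy of the repository.

```python
import json
import tempfile
from pathlib import Path


def missing_components(repo_dir):
    """List components named in model_index.json whose subfolder is missing."""
    repo = Path(repo_dir)
    index = json.loads((repo / "model_index.json").read_text())
    missing = []
    for name, value in index.items():
        is_component = (
            not name.startswith("_")
            and isinstance(value, list)
            and len(value) == 2
            and all(v is not None for v in value)
        )
        if is_component and not (repo / name).is_dir():
            missing.append(name)
    return sorted(missing)


# Demo on a throwaway folder with one declared component left out on purpose.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "model_index.json").write_text(json.dumps({
        "_class_name": "StableDiffusionXLPipeline",
        "force_zeros_for_empty_prompt": True,
        "unet": ["diffusers", "UNet2DConditionModel"],
        "vae": ["diffusers", "AutoencoderKL"],
        "tokenizer_2": ["transformers", "CLIPTokenizer"],
    }))
    (demo / "unet").mkdir()
    (demo / "vae").mkdir()
    demo_missing = missing_components(demo)

print(demo_missing)  # ['tokenizer_2']
```

Against the real repository, the same function could be pointed at the folder returned by `huggingface_hub.snapshot_download("Ricardouchub/SarcasmDiffusion")`.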
+
+ ---
+
+ ## License
+
+ - **Code:** MIT (project-level).
+ - **Model weights:** follow the base model’s license (Stability AI / SDXL Base 1.0).
+ - **Data:** Users must obtain *Hateful Memes* from its source and agree to its terms.
+
+ > By using this model, you agree not to generate content that is illegal, harmful, or violates the rights of others.
+
+ ---
+
+ ## Evaluation
+
+ Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies.
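CLIP score is listed above as a future metric. A minimal sketch of how it could be aggregated per tone over a fixed prompt sheet, following the common definition `max(100 * cos(E_I, E_C), 0)` used by torchmetrics' `CLIPScore`, where `E_I` and `E_C` are CLIP image and caption embeddings. Producing those embeddings (for example with a CLIP model from `transformers`) is assumed to happen elsewhere; `clip_score` and `mean_score_by_tone` are hypothetical helpers, and the vectors are toy values.

```python
import math


def clip_score(image_emb, text_emb, scale=100.0):
    """max(scale * cos(image_emb, text_emb), 0) for one image/caption embedding pair."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    if norm == 0:
        return 0.0
    return max(scale * dot / norm, 0.0)


def mean_score_by_tone(sheet):
    """Average CLIP score per tone over (tone, image_emb, text_emb) rows of a prompt sheet."""
    totals = {}
    for tone, image_emb, text_emb in sheet:
        total, count = totals.get(tone, (0.0, 0))
        totals[tone] = (total + clip_score(image_emb, text_emb), count + 1)
    return {tone: total / count for tone, (total, count) in totals.items()}


# Toy 2-D vectors; real CLIP embeddings have hundreds of dimensions.
sheet = [
    ("humor", [1.0, 0.0], [1.0, 0.0]),  # same direction -> 100
    ("humor", [1.0, 0.0], [0.0, 1.0]),  # orthogonal -> 0
    ("irony", [1.0, 1.0], [1.0, 0.0]),  # 45 degrees -> about 70.7
]
print(mean_score_by_tone(sheet))
```

Grouping by tone keeps the numbers comparable with the existing humor/irony/neutral prompt sheets.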
+
+ ---
+
+ ## Acknowledgments
+
+ - Stability AI — SDXL Base 1.0
+ - Hugging Face — Diffusers, Accelerate, PEFT
+ - Facebook AI — Hateful Memes dataset
+
+ ---
+
+ ## Citation
 
+ If you use this model in your research or portfolio, please cite:
 
+ ```
+ @software{sarcasmdiffusion_sdxl_fused_2025,
+   author = {Ricardo (Ricardouchub)},
+   title = {SarcasmDiffusion — SDXL Fused Meme Generator},
+   year = {2025},
+   url = {https://huggingface.co/Ricardouchub/SarcasmDiffusion-SDXL-Fused}
+ }
+ ```