SD3.5M-TextPecker-SQPA
This repository provides LoRA weights for Stable Diffusion 3.5 Medium, trained with TextPecker, a plug-and-play reinforcement learning (RL) strategy that perceives structural anomalies in rendered text in order to improve high-fidelity Visual Text Rendering (VTR).
- Paper: TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
- Repository: GitHub - CIawevy/TextPecker
Model Details
TextPecker addresses the challenge of VTR in text-to-image models by perceiving and quantifying structural anomalies in rendered text, such as distortion, blurriness, and misalignment. This specific model was trained using Flow-GRPO with LoRA. According to the paper, the resulting model sets a new state of the art in high-fidelity VTR, improving both the structural fidelity and the semantic alignment of rendered text.
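As a rough illustration of the GRPO family of methods mentioned above, the sketch below shows only the group-relative advantage step: several images are sampled for one prompt, each gets a scalar reward, and the rewards are normalized within that group. It is not the TextPecker or Flow-GRPO training code. The function name, the reward values, the use of the population standard deviation, and the `eps` constant are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's sample group (GRPO-style).

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps), so samples are ranked against their
    siblings rather than against an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four images sampled from the same prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one; if every sample in a group gets the same reward, all advantages are zero and the group contributes no learning signal.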
Usage
This model provides only the LoRA weights. You will need to load the Stable Diffusion 3.5 Medium base model first.
import os

import torch
from diffusers import StableDiffusion3Pipeline
from peft import PeftModel

# Environment variable configuration
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"


def load_model(model_path, lora_path=None):
    """Load SD3.5 pipeline with optional LoRA weights"""
    torch_dtype = torch.bfloat16
    device = "cuda"

    # Initialize SD3.5 Pipeline
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path,
        torch_dtype=torch_dtype,
    ).to(device)

    # Disable safety checker
    pipe.safety_checker = None
    pipe.set_progress_bar_config(
        position=1,
        disable=False,
        leave=False,
        desc="Timestep",
        dynamic_ncols=True,
    )

    # Load LoRA weights
    if lora_path is not None:
        pipe.transformer = PeftModel.from_pretrained(pipe.transformer, lora_path)
        pipe.transformer.eval()  # Set to inference mode
        print(f"Successfully loaded LoRA weights from: {lora_path}")

    return pipe


model_id = "stabilityai/stable-diffusion-3.5-medium"
lora_ckpt_path = "CIawevy/SD3.5M-TextPecker-SQPA"
device = "cuda"

# Load model
pipe = load_model(model_id, lora_ckpt_path)

# Generate image
prompt = 'a weathered cave explorers journal page, with the phrase "TextPecker" prominently written in faded ink, surrounded by sketches of ancient ruins and cryptic symbols, under a dim, mystical light.'
image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

# Save result
image.save("TextPecker_sd35_demo.png")
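The demo prompt above places the text to render inside double quotes. If you generate many prompts, a small helper can keep that convention consistent. The sketch below is only a convenience: `build_vtr_prompt` and its parameters are hypothetical names, not part of TextPecker or diffusers, and the quoting convention is inferred from the demo prompt rather than documented as a requirement of the model.

```python
def build_vtr_prompt(scene: str, text: str, style: str = "prominently written") -> str:
    """Return a prompt that embeds `text` in double quotes after `scene`.

    Hypothetical helper: mirrors the phrasing of the demo prompt only.
    """
    text = text.strip()
    if not text:
        raise ValueError("text to render must be non-empty")
    if '"' in text:
        # A nested double quote would make the quoted span ambiguous.
        raise ValueError("text to render must not contain double quotes")
    # Drop trailing spaces, commas, and periods so the joined sentence reads cleanly.
    return f'{scene.rstrip(" ,.")}, with the phrase "{text}" {style}.'


prompt = build_vtr_prompt("a weathered cave explorers journal page", "TextPecker")
```

The resulting string can be passed as `prompt` to the pipeline call shown earlier.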
Citation
If you find TextPecker useful in your research, please cite our paper:
@article{zhu2026TextPecker,
title = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
author = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
journal = {arXiv preprint arXiv:2602.20903},
year = {2026}
}