SD3.5M-TextPecker-SQPA

This repository provides LoRA weights for Stable Diffusion 3.5 Medium, optimized with TextPecker, a plug-and-play, structural-anomaly-aware reinforcement learning (RL) strategy designed to improve the fidelity of Visual Text Rendering (VTR).

Paper: TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
Repository: CIawevy/TextPecker on GitHub

Model Details

TextPecker addresses the challenge of VTR in text-to-image models by perceiving and quantifying structural anomalies in rendered text, such as distortion, blurriness, and misalignment, and using that signal as a reward. This specific model was trained using Flow-GRPO with LoRA. The authors report that it sets a new state of the art in high-fidelity VTR, improving both the structural fidelity and the semantic alignment of rendered text.
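To make the RL step more concrete, the sketch below shows the group-relative advantage normalization that GRPO-style methods generally use: several images are sampled for the same prompt, each receives a scalar reward, and the rewards are standardized within that group. This is an illustration only, not the TextPecker training code. The function name `group_relative_advantages`, the `eps` value, and the reward numbers are assumptions made for the example; the actual reward definition and sampler are described in the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples from the same prompt.

    Hypothetical helper for illustration; not part of the TextPecker codebase.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up reward scores for four images generated from a single prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Samples that score above the group mean get a positive advantage and are reinforced, while those below the mean are discouraged, so no separate value network is needed.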

Usage

This repository contains only the LoRA weights, so load the Stable Diffusion 3.5 Medium base model first and then apply the adapter to its transformer:

import os
import torch
from diffusers import StableDiffusion3Pipeline
from peft import PeftModel

# Force the pure-Python protobuf backend (a common workaround for protobuf version conflicts)
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

def load_model(model_path, lora_path=None):
    """Load SD3.5 pipeline with optional LoRA weights"""
    torch_dtype = torch.bfloat16
    device = "cuda"
    
    # Initialize SD3.5 Pipeline
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path,
        torch_dtype=torch_dtype,
    ).to(device)
    
    # SD3 pipelines bundle no safety checker, so this assignment is a harmless no-op
    pipe.safety_checker = None
    pipe.set_progress_bar_config(
        position=1,
        disable=False,
        leave=False,
        desc="Timestep",
        dynamic_ncols=True,
    )

    # Load LoRA weights
    if lora_path is not None:
        pipe.transformer = PeftModel.from_pretrained(pipe.transformer, lora_path)
        pipe.transformer.eval()  # Set to inference mode
        print(f"Successfully loaded LoRA weights from: {lora_path}")
    
    return pipe

model_id = "stabilityai/stable-diffusion-3.5-medium"
lora_ckpt_path = "CIawevy/SD3.5M-TextPecker-SQPA"
device = "cuda"

# Load model
pipe = load_model(model_id, lora_ckpt_path)

# Generate image
prompt = 'a weathered cave explorers journal page, with the phrase "TextPecker" prominently written in faded ink, surrounded by sketches of ancient ruins and cryptic symbols, under a dim, mystical light.'
image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

# Save result
image.save("TextPecker_sd35_demo.png")
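Because this repository ships only LoRA weights, it may help to see what applying an adapter means numerically. The toy sketch below uses the standard LoRA formulation, in which a frozen base weight `W` is adjusted by a low-rank product scaled by `alpha / rank`. It is a pure-Python illustration with made-up 2x2 values, not the code path that PEFT executes, and the names `matmul` and `merge_lora` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a).

    Hypothetical helper illustrating the standard LoRA update rule.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as base
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (all values are made up).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (rank, in_features)
lora_b = [[0.5], [0.0]]  # shape (out_features, rank)

merged = merge_lora(base, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

Only the two small matrices need to be stored and distributed, which is why the adapter has to be combined with the full base model at load time.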

Citation

If you find TextPecker useful in your research, please cite our paper:

@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
