SD3.5M-TextPecker-SQPA

This model provides LoRA weights for Stable Diffusion 3.5 Medium, optimized using TextPecker, a plug-and-play structural anomaly perceptive reinforcement learning (RL) strategy designed to enhance high-fidelity Visual Text Rendering (VTR).

Model Details

TextPecker addresses the challenge of VTR in text-to-image models by perceiving and quantifying structural anomalies in rendered text, such as distortion, blurriness, and misalignment. This model was trained with Flow-GRPO using LoRA. The authors report that it sets a new state of the art in high-fidelity VTR, with significantly improved structural fidelity and semantic alignment of the rendered text.
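
Flow-GRPO belongs to the GRPO family of methods, which score each sampled image relative to the other images sampled for the same prompt. The sketch below illustrates only that group-relative normalization step. It is not the authors' training code: the function name `group_relative_advantages`, the reward values, the use of the population standard deviation, and the `eps` term are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's sample group (GRPO-style).

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four images sampled from the same prompt
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and are reinforced; samples below the mean receive a negative one.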

Usage

This model provides only the LoRA weights. You will need to load the Stable Diffusion 3.5 Medium base model first.

import os
import torch
from diffusers import StableDiffusion3Pipeline
from peft import PeftModel

# Use the pure-Python protobuf implementation (a common workaround for protobuf version-mismatch errors)
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

def load_model(model_path, lora_path=None):
    """Load SD3.5 pipeline with optional LoRA weights"""
    torch_dtype = torch.bfloat16
    device = "cuda"
    
    # Initialize SD3.5 Pipeline
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path,
        torch_dtype=torch_dtype,
    ).to(device)
    
    # StableDiffusion3Pipeline has no safety checker component, so this assignment does not affect generation
    pipe.safety_checker = None
    pipe.set_progress_bar_config(
        position=1,
        disable=False,
        leave=False,
        desc="Timestep",
        dynamic_ncols=True,
    )

    # Load LoRA weights
    if lora_path is not None:
        pipe.transformer = PeftModel.from_pretrained(pipe.transformer, lora_path)
        pipe.transformer.eval()  # Set to inference mode
        print(f"Successfully loaded LoRA weights from: {lora_path}")
    
    return pipe

model_id = "stabilityai/stable-diffusion-3.5-medium"
lora_ckpt_path = "CIawevy/SD3.5M-TextPecker-SQPA"
device = "cuda"

# Load model
pipe = load_model(model_id, lora_ckpt_path)

# Generate image
prompt = 'a weathered cave explorers journal page, with the phrase "TextPecker" prominently written in faded ink, surrounded by sketches of ancient ruins and cryptic symbols, under a dim, mystical light.'
image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

# Save result
image.save("TextPecker_sd35_demo.png")
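
The example prompt marks the string to render by wrapping it in double quotes. Assuming you want to follow that convention for your own prompts (the model card does not state it as a requirement), a small helper like the one below can build prompts for several target strings. The function name `make_text_prompt` and the `{text}` placeholder are hypothetical and are not part of TextPecker or diffusers.

```python
def make_text_prompt(scene: str, text: str) -> str:
    """Insert the target text, wrapped in double quotes, into a scene template.

    `scene` must contain a `{text}` placeholder marking where the quoted
    string should appear.
    """
    if '"' in text:
        raise ValueError("target text must not contain double quotes")
    return scene.format(text=f'"{text}"')


# Hypothetical template and target strings, for illustration only
template = "a weathered journal page, with the phrase {text} prominently written in faded ink"
prompts = [make_text_prompt(template, t) for t in ("TextPecker", "Open Daily")]
```

Each resulting string can be passed as the `prompt` argument of the pipeline call shown above.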

Citation

If you find TextPecker useful in your research, please cite our paper:

@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}