StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)

This repository hosts a LoRA (Low-Rank Adaptation) adapter for the base model
Qwen/Qwen3-4B-Instruct-2507, trained using the StealthRL methodology.

It is an adapter-only release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.


What is StealthRL?

StealthRL is a reinforcement learning framework for generating adversarial paraphrases that evade multiple AI-text detectors while preserving semantics.

From the paper:

  • StealthRL trains a paraphrase policy against a multi-detector ensemble
  • Uses Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen3-4B
  • Optimizes a composite reward that balances detector evasion with semantic preservation
  • Evaluates transfer to a held-out detector family, suggesting shared vulnerabilities rather than detector-specific brittleness
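The exact reward definition lives in the paper; as an illustrative sketch only (the weighting scheme, weight value, and function name here are assumptions, not the paper's), a composite reward balancing ensemble evasion against semantic preservation might look like:

```python
def composite_reward(detector_scores, semantic_sim, evasion_weight=0.7):
    """Illustrative composite reward (weights are NOT from the paper).

    detector_scores: per-detector probabilities that the text is AI-written
                     (lower = better evasion).
    semantic_sim:    similarity between source and paraphrase in [0, 1]
                     (higher = meaning preserved).
    """
    # Mean evasion across the detector ensemble: 1 - average "AI" score.
    evasion = 1.0 - sum(detector_scores) / len(detector_scores)
    # Linear trade-off between evading detectors and preserving semantics.
    return evasion_weight * evasion + (1.0 - evasion_weight) * semantic_sim
```

A reward of this shape penalizes degenerate policies that evade detectors by destroying meaning, since low semantic similarity drags the total down.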

Paper: https://arxiv.org/abs/2602.08934
Code: https://github.com/suraj-ranganath/StealthRL


What’s in This Repository

  • adapter_model.safetensors: LoRA adapter weights
  • adapter_config.json: PEFT adapter configuration

How to Use (Paraphrasing)

Install dependencies:

pip install transformers peft safetensors

Load the base model and apply the adapter:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "YOUR_HF_USERNAME/YOUR_ADAPTER_REPO"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
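If you want adapter-free inference (e.g. for deployment or export), PEFT's standard merge_and_unload API folds the LoRA deltas into the base weights. A small helper, sketched here with an assumed output directory name:

```python
def export_merged(model, tokenizer, out_dir):
    """Fold LoRA deltas into the base weights and save a standalone model."""
    merged = model.merge_and_unload()  # returns the base model with LoRA merged in
    merged.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return merged

# e.g. export_merged(model, tokenizer, "stealthrl-qwen3-4b-merged")
```

After merging, the saved model loads with plain AutoModelForCausalLM and no peft dependency.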

Example: paraphrase a passage (semantics-preserving rewrite)

text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))

Tip: If you are using this for detector robustness evaluation, generate multiple paraphrase candidates per input (vary temperature / sampling) and then score them with your detector suite.
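That best-of-N loop can be sketched as a small helper. Here generate_fn and detector_fn are hypothetical stand-ins for your sampling call and detector suite (lower detector score = less AI-like); neither is part of this repository:

```python
def pick_best_paraphrase(generate_fn, detector_fn, text,
                         temperatures=(0.7, 0.9, 1.1)):
    """Sample one candidate per temperature, keep the lowest-scoring one.

    generate_fn(text, temperature) -> candidate paraphrase (str)
    detector_fn(candidate)         -> detector score (float, lower is better)
    """
    candidates = [generate_fn(text, t) for t in temperatures]
    # Pair each candidate with its detector score and take the minimum.
    scored = [(detector_fn(c), c) for c in candidates]
    return min(scored)[1]
```

In practice you would also re-check semantic similarity before accepting a candidate, since the lowest-scoring paraphrase is not guaranteed to preserve meaning.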


Citation

If you use this adapter or build on StealthRL, please cite:

@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
      title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
      author={Suraj Ranganath and Atharv Ramesh},
      year={2026},
      eprint={2602.08934},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.08934},
}

Notes

  • This repo provides adapter-only weights.