StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)

This repository hosts a LoRA (Low-Rank Adaptation) adapter for the base model
Qwen/Qwen3-4B-Instruct-2507, trained using the StealthRL methodology.

It is an adapter-only release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.


What is StealthRL?

StealthRL is a reinforcement learning framework for generating adversarial paraphrases that evade multiple AI-text detectors while preserving semantics.

From the paper:

  • StealthRL trains a paraphrase policy against a multi-detector ensemble
  • Uses Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen3-4B
  • Optimizes a composite reward that balances detector evasion with semantic preservation
  • Evaluates transfer to a held-out detector family, suggesting shared vulnerabilities rather than detector-specific brittleness
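The exact reward definition lives in the paper; as an illustrative sketch only (the weighting scheme, weight value, and function name here are assumptions, not the paper's), a composite reward balancing ensemble evasion against semantic preservation might look like:

```python
def composite_reward(detector_scores, semantic_sim, evasion_weight=0.7):
    """Illustrative composite reward (weights are NOT from the paper).

    detector_scores: per-detector probabilities that the text is AI-written
                     (lower = better evasion).
    semantic_sim:    similarity between source and paraphrase in [0, 1]
                     (higher = meaning preserved).
    """
    # Mean evasion across the detector ensemble: 1 - average "AI" score.
    evasion = 1.0 - sum(detector_scores) / len(detector_scores)
    # Linear trade-off between evading detectors and preserving semantics.
    return evasion_weight * evasion + (1.0 - evasion_weight) * semantic_sim
```

A reward of this shape penalizes degenerate policies that evade detectors by destroying meaning, since low semantic similarity drags the total down.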

Paper: https://arxiv.org/abs/2602.08934
Code: https://github.com/suraj-ranganath/StealthRL


What’s in This Repository

  • adapter_model.safetensors: LoRA adapter weights
  • adapter_config.json: PEFT adapter configuration

How to Use (Paraphrasing)

Install dependencies:

pip install transformers peft safetensors

Load the base model and apply the adapter:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "YOUR_HF_USERNAME/YOUR_ADAPTER_REPO"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
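If you want adapter-free inference (e.g. for deployment or export), PEFT's standard merge_and_unload API folds the LoRA deltas into the base weights. A small helper, sketched here with an assumed output directory name:

```python
def export_merged(model, tokenizer, out_dir):
    """Fold LoRA deltas into the base weights and save a standalone model."""
    merged = model.merge_and_unload()  # returns the base model with LoRA merged in
    merged.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return merged

# e.g. export_merged(model, tokenizer, "stealthrl-qwen3-4b-merged")
```

After merging, the saved model loads with plain AutoModelForCausalLM and no peft dependency.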

Example: paraphrase a passage (semantics-preserving rewrite)

text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))

Tip: If you are using this for detector robustness evaluation, generate multiple paraphrase candidates per input (vary temperature / sampling) and then score them with your detector suite.
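That best-of-N loop can be sketched as a small helper. Here generate_fn and detector_fn are hypothetical stand-ins for your sampling call and detector suite (lower detector score = less AI-like); neither is part of this repository:

```python
def pick_best_paraphrase(generate_fn, detector_fn, text,
                         temperatures=(0.7, 0.9, 1.1)):
    """Sample one candidate per temperature, keep the lowest-scoring one.

    generate_fn(text, temperature) -> candidate paraphrase (str)
    detector_fn(candidate)         -> detector score (float, lower is better)
    """
    candidates = [generate_fn(text, t) for t in temperatures]
    # Pair each candidate with its detector score and take the minimum.
    scored = [(detector_fn(c), c) for c in candidates]
    return min(scored)[1]
```

In practice you would also re-check semantic similarity before accepting a candidate, since the lowest-scoring paraphrase is not guaranteed to preserve meaning.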


Citation

If you use this adapter or build on StealthRL, please cite:

@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
      title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
      author={Suraj Ranganath and Atharv Ramesh},
      year={2026},
      eprint={2602.08934},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.08934},
}

Notes

  • This repo provides adapter-only weights.