# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
This repository hosts a LoRA (Low-Rank Adaptation) adapter for the base model
[Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), trained using the StealthRL methodology.

This is an adapter-only release (PEFT): the full base model is not included and must be downloaded separately from Hugging Face.
## What is StealthRL?
StealthRL is a reinforcement learning framework for generating adversarial paraphrases that evade multiple AI-text detectors while preserving semantics.
From the paper:
- StealthRL trains a paraphrase policy against a multi-detector ensemble
- Uses Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen3-4B
- Optimizes a composite reward that balances detector evasion with semantic preservation
- Evaluates transfer to a held-out detector family, suggesting shared vulnerabilities rather than detector-specific brittleness
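
The composite reward mentioned above can be sketched as follows. This is an illustrative toy, not the paper's exact formulation: the weights, the detector-score interface, and the use of the worst-case detector are all assumptions.

```python
# Illustrative sketch of a composite reward that balances multi-detector
# evasion against semantic preservation. Weights and aggregation are
# assumptions for illustration, not the paper's exact formulation.

def composite_reward(detector_probs, similarity, w_evade=0.7, w_sim=0.3):
    """detector_probs: per-detector P(AI-generated) for the paraphrase, each in [0, 1].
    similarity: semantic similarity between source and paraphrase, in [0, 1]."""
    evasion = 1.0 - max(detector_probs)  # reward evading the strictest detector
    return w_evade * evasion + w_sim * similarity

# Example: three detector scores plus high semantic similarity
r = composite_reward([0.2, 0.1, 0.3], similarity=0.9)
print(round(r, 3))  # 0.7 * (1 - 0.3) + 0.3 * 0.9 = 0.76
```

In practice the similarity term would come from an embedding model or entailment scorer, and the detector probabilities from the ensemble the policy is trained against.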
- Paper: https://arxiv.org/abs/2602.08934
- Code: https://github.com/suraj-ranganath/StealthRL
## What’s in This Repository
| File | Description |
|---|---|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |
## How to Use (Paraphrasing)
Install the dependencies (`accelerate` is required for `device_map="auto"`):

```bash
pip install transformers peft safetensors accelerate torch
```
Load the base model and apply the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "YOUR_HF_USERNAME/YOUR_ADAPTER_REPO"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,  # half-precision keeps the 4B model within common GPU memory
    device_map="auto",
    trust_remote_code=True,
)

# Apply the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```
Example: paraphrase a passage (semantics-preserving rewrite):

```python
text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```
**Tip:** If you are using this adapter for detector-robustness evaluation, generate multiple paraphrase candidates per input (varying temperature and sampling parameters) and score them with your detector suite.
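
The candidate-selection loop from the tip above can be sketched generically. The `generate_fn` and `score_fn` callables are placeholders for your paraphraser and detector suite; the toy stand-ins below exist only so the sketch runs.

```python
# Sketch: generate several paraphrase candidates and keep the one the
# detector suite scores as least likely to be AI-generated.
# generate_fn and score_fn are placeholders for your model and detectors.

def best_candidate(text, generate_fn, score_fn, n=4):
    """Return the candidate with the lowest detector score (lower = harder to detect)."""
    candidates = [generate_fn(text) for _ in range(n)]
    return min(candidates, key=score_fn)

# Toy stand-ins (illustration only): a "paraphraser" that tags its output,
# and a "detector" that scores by length.
import itertools
counter = itertools.count()
demo_generate = lambda t: f"{t} (variant {next(counter)})"
demo_score = lambda c: len(c)
print(best_candidate("Example sentence.", demo_generate, demo_score, n=3))
```

In a real evaluation, `generate_fn` would wrap `model.generate` with sampling enabled, and `score_fn` would aggregate probabilities from the detectors under test.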
## Associated Paper and Code
- Paper (arXiv): https://arxiv.org/abs/2602.08934
- GitHub Repository: https://github.com/suraj-ranganath/StealthRL
## Citation
If you use this adapter or build on StealthRL, please cite:
```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
  title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
  author={Suraj Ranganath and Atharv Ramesh},
  year={2026},
  eprint={2602.08934},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.08934},
}
```
## Notes
- This repo provides adapter-only weights.
- Base model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
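
If you want to deploy without a `peft` dependency at inference time, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. A brief sketch (repo names as in the usage example above; this downloads the full base model):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(base, "YOUR_HF_USERNAME/YOUR_ADAPTER_REPO")

# Fold the LoRA deltas into the base weights and drop the PEFT wrapper,
# yielding a plain transformers model
merged = model.merge_and_unload()
merged.save_pretrained("stealthrl-merged")
```

Note that a merged checkpoint is a full copy of the 4B model, so it is much larger on disk than this adapter-only release.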