---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---
# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model
**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).
The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.
It is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.
---
## What is StealthRL?
**StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.
Key contributions from the paper:
- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness
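The composite reward above can be sketched as a weighted combination of an ensemble-evasion term and a semantic-similarity term. The weights, function names, and scoring conventions below are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative sketch of a composite reward that balances multi-detector
# evasion against semantic preservation. Weights and component definitions
# are hypothetical, not the paper's exact reward.

def composite_reward(detector_probs, semantic_sim, w_evasion=0.7, w_semantic=0.3):
    """detector_probs: per-detector P(AI-generated), each in [0, 1];
    semantic_sim: similarity of the paraphrase to the source text in [0, 1]."""
    # Evasion term: reward a low ensemble-average detection probability.
    evasion = 1.0 - sum(detector_probs) / len(detector_probs)
    # A policy maximizing this sum must fool the ensemble without
    # drifting away from the source meaning.
    return w_evasion * evasion + w_semantic * semantic_sim

# Example: three detectors mostly fooled, high semantic similarity.
r = composite_reward([0.1, 0.2, 0.3], semantic_sim=0.9)
```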
Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
---
## What’s in This Repository
| File | Description |
|-----|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |
---
## How to Use (Paraphrasing)
Install dependencies:
```bash
pip install transformers peft safetensors
```
Load the base model and apply the adapter:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA" # Update with actual repo path if different
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```
### Example: paraphrase a passage (semantics-preserving rewrite)
```python
text = (
"AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
"without changing the meaning."
)
prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.
TEXT:
{text}
PARAPHRASE:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
out = model.generate(
**inputs,
max_new_tokens=200,
do_sample=True,
temperature=0.8,
top_p=0.95,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
**Tip:** If you are using this adapter for detector-robustness evaluation, generate multiple paraphrase candidates per input (varying temperature and other sampling parameters), then score the candidates with your detector suite and keep the least-flagged one.
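The candidate-selection step in the tip above can be sketched as a small loop over pre-generated paraphrases. The detector callables here are stand-in stubs for illustration; substitute real detector calls from your own suite:

```python
# Pick the candidate paraphrase that a detector suite flags least.
# Assumes candidates were already sampled (e.g. with model.generate above);
# the detector functions below are illustrative stubs, not real detectors.

def least_detected(candidates, detectors):
    """candidates: list of paraphrase strings;
    detectors: callables returning P(AI-generated) in [0, 1].
    Returns (best_candidate, its_mean_detection_score)."""
    def mean_score(text):
        return sum(d(text) for d in detectors) / len(detectors)
    return min(((c, mean_score(c)) for c in candidates), key=lambda t: t[1])

# Stub detectors keyed on text length, purely for illustration.
detectors = [lambda t: min(1.0, len(t) / 100), lambda t: 0.5]
candidates = ["short rewrite", "a considerably longer candidate paraphrase"]
best, score = least_detected(candidates, detectors)
```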
---
## Associated Paper and Code
- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
---
## Citation
If you use this adapter or build on StealthRL, please cite:
```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
author={Suraj Ranganath and Atharv Ramesh},
year={2026},
eprint={2602.08934},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.08934},
}
```
---
## Notes
- This repo provides **adapter-only** weights; load them on top of `Qwen/Qwen3-4B-Instruct-2507` as shown above.