---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---
# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model
**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).
The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.
It is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.
---
## What is StealthRL?
**StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.
Key contributions from the paper:
- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness
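The composite reward and GRPO's group-relative advantage can be illustrated with a small sketch. The weights, scoring inputs, and aggregation below are illustrative assumptions, not the paper's exact formulation:

```python
import statistics

def composite_reward(detector_probs, semantic_sim, w_evasion=0.5, w_meaning=0.5):
    """Illustrative composite reward: lower ensemble detection probability
    and higher semantic similarity both increase reward.
    Weights and the worst-case aggregation are hypothetical choices."""
    evasion = 1.0 - max(detector_probs)  # worst-case detector in the ensemble
    return w_evasion * evasion + w_meaning * semantic_sim

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each candidate's reward against
    the group of paraphrases sampled for the same input."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

# Toy group of three paraphrase candidates for one input text
rewards = [
    composite_reward([0.9, 0.8], 0.95),  # easily detected
    composite_reward([0.2, 0.1], 0.90),  # evasive, meaning preserved
    composite_reward([0.1, 0.3], 0.40),  # evasive but meaning drifts
]
advantages = grpo_advantages(rewards)
print(advantages)  # the second candidate receives the largest advantage
```

The group-relative normalization is what lets GRPO dispense with a learned value baseline: candidates sampled for the same input are compared against each other rather than against an absolute score.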
Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
---
## What’s in This Repository
| File | Description |
|-----|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |
---
## How to Use (Paraphrasing)
Install dependencies:
```bash
pip install transformers peft safetensors
```
Load the base model and apply the adapter:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA" # Update with actual repo path if different
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
base_model,
device_map="auto",
trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```
### Example: paraphrase a passage (semantics-preserving rewrite)
```python
text = (
"AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
"without changing the meaning."
)
prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.
TEXT:
{text}
PARAPHRASE:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
out = model.generate(
**inputs,
max_new_tokens=200,
do_sample=True,
temperature=0.8,
top_p=0.95,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
**Tip:** If you are using this adapter for detector-robustness evaluation, generate several paraphrase candidates per input (varying temperature and other sampling parameters), then score them with your detector suite and keep the strongest candidate.
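The candidate-selection step in the tip above can be sketched as follows. `detector_prob` and `semantic_sim` are stand-ins for whatever detector suite and similarity model you run; the candidate names and scores here are dummy data:

```python
def pick_best_paraphrase(candidates, detector_prob, semantic_sim, min_sim=0.85):
    """Among sampled candidates, keep those above a semantic-similarity
    floor and return the one the detector is least confident about.
    `detector_prob` and `semantic_sim` are caller-supplied scoring
    functions (e.g., your detector suite and an embedding model)."""
    viable = [c for c in candidates if semantic_sim(c) >= min_sim]
    if not viable:
        return None
    return min(viable, key=detector_prob)

# Dummy (detector_prob, semantic_sim) scores for three candidates
scores = {"cand_a": (0.9, 0.95), "cand_b": (0.2, 0.90), "cand_c": (0.1, 0.50)}
best = pick_best_paraphrase(
    list(scores),
    detector_prob=lambda c: scores[c][0],
    semantic_sim=lambda c: scores[c][1],
)
print(best)  # "cand_b": evasive enough while preserving meaning
```

Filtering on similarity first, then minimizing detector confidence, keeps the selection from rewarding paraphrases that evade detection by discarding meaning.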
---
## Associated Paper and Code
- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
---
## Citation
If you use this adapter or build on StealthRL, please cite:
```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
author={Suraj Ranganath and Atharv Ramesh},
year={2026},
eprint={2602.08934},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.08934},
}
```
---
## Notes
- This repo provides **adapter-only** weights.