---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---

# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)

This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model  
**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).

The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.

It is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.

---

## What is StealthRL?

**StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.

Key contributions from the paper:
- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness
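The composite reward above can be sketched as a simple weighted trade-off. This is an illustrative assumption, not the paper's exact formulation: the weight `alpha`, the detector score convention, and the averaging are all hypothetical choices made for clarity.

```python
def composite_reward(detector_probs, semantic_sim, alpha=0.5):
    """Illustrative composite reward balancing evasion and fidelity.

    detector_probs: list of P(AI-generated) in [0, 1] from the ensemble
    semantic_sim:   similarity between source and paraphrase in [0, 1]
    alpha:          evasion/fidelity trade-off weight (assumed value)
    """
    # Evasion term: 1 minus the ensemble's mean confidence that
    # the paraphrase is AI-written (lower detection -> higher reward).
    evasion = 1.0 - sum(detector_probs) / len(detector_probs)
    # Blend evasion with semantic preservation.
    return alpha * evasion + (1.0 - alpha) * semantic_sim

# A paraphrase that fools both detectors while staying faithful scores high:
composite_reward([0.1, 0.2], semantic_sim=0.9)  # -> 0.875
```

GRPO would then compare such rewards across a group of sampled paraphrases for the same input, so only the relative ordering within the group matters.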

Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)  
Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## What’s in This Repository

| File | Description |
|-----|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |

---

## How to Use (Paraphrasing)

Install dependencies:

```bash
pip install transformers peft safetensors
```

Load the base model and apply the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA" # Update with actual repo path if different

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Example: paraphrase a passage (semantics-preserving rewrite)

```python
text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

**Tip:** If you are using this for detector robustness evaluation, generate multiple paraphrase candidates per input (vary temperature / sampling) and then score them with your detector suite.
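A minimal sketch of that candidate-generation loop, assuming the `model` and `tokenizer` from the loading snippet above. The temperature range and candidate count are illustrative defaults, not values from the paper:

```python
def sampling_settings(n, t_min=0.6, t_max=1.0):
    """Evenly spaced temperatures for n paraphrase candidates (assumed range)."""
    if n == 1:
        return [t_min]
    step = (t_max - t_min) / (n - 1)
    return [round(t_min + i * step, 3) for i in range(n)]

def paraphrase_candidates(model, tokenizer, prompt, n=4):
    """Generate n paraphrase candidates at varied temperatures."""
    import torch  # imported here so the helper above stays dependency-free

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    candidates = []
    for temp in sampling_settings(n):
        with torch.no_grad():
            out = model.generate(
                **inputs,
                max_new_tokens=200,
                do_sample=True,
                temperature=temp,
                top_p=0.95,
            )
        # Decode only the newly generated tokens, not the prompt.
        candidates.append(
            tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
        )
    return candidates
```

Each returned candidate can then be scored with your detector suite, keeping whichever best balances evasion and semantic similarity.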

---

## Associated Paper and Code

- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)  
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## Citation

If you use this adapter or build on StealthRL, please cite:

```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
      title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
      author={Suraj Ranganath and Atharv Ramesh},
      year={2026},
      eprint={2602.08934},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.08934},
}
```

---

## Notes

- This repo provides **adapter-only** weights; load them on top of the base model as shown above, or merge them into the base weights with PEFT's `merge_and_unload()` if you need a standalone model for deployment.