---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---

# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)

This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model  
**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).

The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.

It is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.

---

## What is StealthRL?

**StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.

Key contributions from the paper:
- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness
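The composite reward above can be sketched as a simple weighted trade-off. This is an illustrative assumption, not the paper's exact formulation: the weight `alpha`, the detector score convention, and the averaging are all hypothetical choices made for clarity.

```python
def composite_reward(detector_probs, semantic_sim, alpha=0.5):
    """Illustrative composite reward balancing evasion and fidelity.

    detector_probs: list of P(AI-generated) in [0, 1] from the ensemble
    semantic_sim:   similarity between source and paraphrase in [0, 1]
    alpha:          evasion/fidelity trade-off weight (assumed value)
    """
    # Evasion term: 1 minus the ensemble's mean confidence that
    # the paraphrase is AI-written (lower detection -> higher reward).
    evasion = 1.0 - sum(detector_probs) / len(detector_probs)
    # Blend evasion with semantic preservation.
    return alpha * evasion + (1.0 - alpha) * semantic_sim

# A paraphrase that fools both detectors while staying faithful scores high:
composite_reward([0.1, 0.2], semantic_sim=0.9)  # -> 0.875
```

GRPO would then compare such rewards across a group of sampled paraphrases for the same input, so only the relative ordering within the group matters.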

Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)  
Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## What’s in This Repository

| File | Description |
|-----|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |

---

## How to Use (Paraphrasing)

Install dependencies:

```bash
pip install transformers peft safetensors
```

Load the base model and apply the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA" # Update with actual repo path if different

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Example: paraphrase a passage (semantics-preserving rewrite)

```python
text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

**Tip:** If you are using this for detector robustness evaluation, generate multiple paraphrase candidates per input (vary temperature / sampling) and then score them with your detector suite.
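A minimal sketch of that candidate-generation loop, assuming the `model` and `tokenizer` from the loading snippet above. The temperature range and candidate count are illustrative defaults, not values from the paper:

```python
def sampling_settings(n, t_min=0.6, t_max=1.0):
    """Evenly spaced temperatures for n paraphrase candidates (assumed range)."""
    if n == 1:
        return [t_min]
    step = (t_max - t_min) / (n - 1)
    return [round(t_min + i * step, 3) for i in range(n)]

def paraphrase_candidates(model, tokenizer, prompt, n=4):
    """Generate n paraphrase candidates at varied temperatures."""
    import torch  # imported here so the helper above stays dependency-free

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    candidates = []
    for temp in sampling_settings(n):
        with torch.no_grad():
            out = model.generate(
                **inputs,
                max_new_tokens=200,
                do_sample=True,
                temperature=temp,
                top_p=0.95,
            )
        # Decode only the newly generated tokens, not the prompt.
        candidates.append(
            tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
        )
    return candidates
```

Each returned candidate can then be scored with your detector suite, keeping whichever best balances evasion and semantic similarity.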

---

## Associated Paper and Code

- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)  
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## Citation

If you use this adapter or build on StealthRL, please cite:

```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
      title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
      author={Suraj Ranganath and Atharv Ramesh},
      year={2026},
      eprint={2602.08934},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.08934},
}
```

---

## Notes

- This repo provides **adapter-only** weights; load them on top of the base model as shown above, or merge them into the base weights with PEFT's `merge_and_unload()` if you need a standalone model for deployment.