---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---

# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model
**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).

The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.

This is an **adapter-only** release (PEFT): the full base model is not included and must be downloaded separately from Hugging Face.
|
| | --- |
| |
|
| | ## What is StealthRL? |
| |
|
| | **StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics. |
| |
|
Key contributions from the paper:
- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness

- Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
- Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
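The composite reward mentioned above can be sketched as a weighted combination of ensemble evasion and semantic preservation. This is an illustrative sketch only, not the paper's exact formulation: the weight `alpha`, the averaging over detectors, and the score ranges are assumptions for the example.

```python
def composite_reward(detector_probs, semantic_sim, alpha=0.7):
    """Illustrative composite reward: high when the detectors assign low
    P(AI-written) to the paraphrase and the semantics are preserved.

    detector_probs: per-detector P(AI-written) scores, each in [0, 1]
    semantic_sim:   similarity of the paraphrase to the source, in [0, 1]
    alpha:          trade-off between evasion and semantic preservation
    """
    # Evasion term: 1 minus the ensemble's average AI-probability.
    evasion = 1.0 - sum(detector_probs) / len(detector_probs)
    return alpha * evasion + (1.0 - alpha) * semantic_sim


# A candidate that fools the ensemble while staying faithful scores near 1.
reward = composite_reward([0.1, 0.2, 0.15], semantic_sim=0.9)
```

A paraphrase that flips every detector but drifts semantically is penalized through the `semantic_sim` term, which is the trade-off the paper's reward is designed to balance.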

---

## What’s in This Repository

| File | Description |
|------|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |

---

## How to Use (Paraphrasing)

Install dependencies (`accelerate` is needed for `device_map="auto"` below):

```bash
pip install torch transformers peft accelerate safetensors
```

Load the base model and apply the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA"  # update with the actual repo path if different

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Example: paraphrase a passage (semantics-preserving rewrite)

```python
text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

# Decode only the newly generated tokens (skip the echoed prompt).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

**Tip:** If you are using this adapter for detector robustness evaluation, generate multiple paraphrase candidates per input (varying temperature and sampling parameters), then score them with your detector suite.
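The best-of-n selection described in the tip can be sketched as follows. `generate_candidate` and the entries of `detector_scores` are placeholders for your own generation call and detector suite; the toy detector in the demo is purely illustrative.

```python
from typing import Callable, List


def pick_stealthiest(
    source: str,
    generate_candidate: Callable[[str], str],
    detector_scores: List[Callable[[str], float]],
    n_candidates: int = 4,
) -> str:
    """Generate n paraphrase candidates and return the one with the
    lowest average P(AI-written) across the detector suite."""
    candidates = [generate_candidate(source) for _ in range(n_candidates)]

    def avg_score(text: str) -> float:
        return sum(d(text) for d in detector_scores) / len(detector_scores)

    return min(candidates, key=avg_score)


# Toy demo with stub functions (replace with real model generation and detectors):
candidates = iter(["a long and elaborate rewording of the text", "a short rewording"])
best = pick_stealthiest(
    "source text",
    generate_candidate=lambda s: next(candidates),
    detector_scores=[lambda t: min(len(t) / 50.0, 1.0)],  # toy "detector": longer text scores higher
    n_candidates=2,
)
print(best)  # the shorter candidate wins under the toy detector
```

In practice, `generate_candidate` would wrap the `model.generate` call from the example above with varied sampling settings, and each detector score would come from your detector suite.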

---

## Associated Paper and Code

- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## Citation

If you use this adapter or build on StealthRL, please cite:

```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
      title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
      author={Suraj Ranganath and Atharv Ramesh},
      year={2026},
      eprint={2602.08934},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.08934},
}
```

---

## Notes

- This repo provides **adapter-only** weights.