---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---

# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)

This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model **Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934). The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.

This is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.

---

## What is StealthRL?

**StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.

Key contributions from the paper:

- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**.
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**.
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**.
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness.

Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## What's in This Repository

| File | Description |
|------|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |

---

## How to Use (Paraphrasing)

Install dependencies:

```bash
pip install transformers peft safetensors
```

Load the base model and apply the adapter:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA"  # Update with actual repo path if different

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Example: paraphrase a passage (semantics-preserving rewrite)

```python
text = (
    "AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
    "without changing the meaning."
)

prompt = f"""You are a paraphrasing assistant. Rewrite the text to preserve meaning while changing wording and structure. Avoid adding new facts.

TEXT:
{text}

PARAPHRASE:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

**Tip:** If you are using this adapter for detector robustness evaluation, generate multiple paraphrase candidates per input (varying temperature and other sampling parameters), then score them with your detector suite.

---

## Associated Paper and Code

- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)

---

## Citation

If you use this adapter or build on StealthRL, please cite:

```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
  title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
  author={Suraj Ranganath and Atharv Ramesh},
  year={2026},
  eprint={2602.08934},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.08934},
}
```

---

## Notes

- This repo provides **adapter-only** weights.
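
The multi-candidate workflow suggested in the usage tip can be sketched as a small selection helper. Note that `select_stealthiest` and `detector_score` are hypothetical names for illustration, not part of this repository or the paper's code; `detector_score` stands in for whatever detector suite you use (any callable returning a higher score for text judged more AI-like).

```python
def select_stealthiest(candidates, detector_score):
    """Return the candidate the detector is least confident about.

    candidates: list of paraphrase strings, e.g. obtained by calling
        model.generate(..., do_sample=True, num_return_sequences=k)
        and decoding each returned sequence.
    detector_score: callable mapping text -> float (higher = more AI-like).
    """
    return min(candidates, key=detector_score)


# Example with a stub scorer; a real run would call your detector suite.
stub_scores = {"candidate a": 0.9, "candidate b": 0.2, "candidate c": 0.6}
best = select_stealthiest(list(stub_scores), stub_scores.get)
print(best)  # "candidate b" has the lowest detector score
```

In practice you would also filter candidates by a semantic-similarity threshold against the source text before scoring, mirroring the evasion/semantics balance in the paper's composite reward.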