---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---
# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model
**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).
The authors of the paper are [Suraj Ranganath](https://www.linkedin.com/in/suraj-ranganath/) and Atharv Ramesh.
It is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.
---
## What is StealthRL?
**StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.
Key contributions from the paper:
- StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
- Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
- Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
- Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness
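The composite reward above can be sketched as a weighted combination of an ensemble-evasion term and a semantic-similarity term. The weights, function names, and scoring conventions below are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative sketch of a composite reward that balances multi-detector
# evasion against semantic preservation. Weights and component definitions
# are hypothetical, not the paper's exact reward.

def composite_reward(detector_probs, semantic_sim, w_evasion=0.7, w_semantic=0.3):
    """detector_probs: per-detector P(AI-generated), each in [0, 1];
    semantic_sim: similarity of the paraphrase to the source text in [0, 1]."""
    # Evasion term: reward a low ensemble-average detection probability.
    evasion = 1.0 - sum(detector_probs) / len(detector_probs)
    # A policy maximizing this sum must fool the ensemble without
    # drifting away from the source meaning.
    return w_evasion * evasion + w_semantic * semantic_sim

# Example: three detectors mostly fooled, high semantic similarity.
r = composite_reward([0.1, 0.2, 0.3], semantic_sim=0.9)
```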
Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
---
## What’s in This Repository
| File | Description |
|-----|-------------|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |
---
## How to Use (Paraphrasing)
Install dependencies:
```bash
pip install transformers peft safetensors
```
Load the base model and apply the adapter:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA" # Update with actual repo path if different
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```
### Example: paraphrase a passage (semantics-preserving rewrite)
```python
text = (
"AI-text detectors are widely used, but they can be fragile when the text is paraphrased "
"without changing the meaning."
)
prompt = f"""You are a paraphrasing assistant.
Rewrite the text to preserve meaning while changing wording and structure.
Avoid adding new facts.
TEXT:
{text}
PARAPHRASE:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
out = model.generate(
**inputs,
max_new_tokens=200,
do_sample=True,
temperature=0.8,
top_p=0.95,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
**Tip:** If you are using this adapter for detector-robustness evaluation, generate multiple paraphrase candidates per input (varying temperature and other sampling parameters), then score the candidates with your detector suite and keep the least-flagged one.
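The candidate-selection step in the tip above can be sketched as a small loop over pre-generated paraphrases. The detector callables here are stand-in stubs for illustration; substitute real detector calls from your own suite:

```python
# Pick the candidate paraphrase that a detector suite flags least.
# Assumes candidates were already sampled (e.g. with model.generate above);
# the detector functions below are illustrative stubs, not real detectors.

def least_detected(candidates, detectors):
    """candidates: list of paraphrase strings;
    detectors: callables returning P(AI-generated) in [0, 1].
    Returns (best_candidate, its_mean_detection_score)."""
    def mean_score(text):
        return sum(d(text) for d in detectors) / len(detectors)
    return min(((c, mean_score(c)) for c in candidates), key=lambda t: t[1])

# Stub detectors keyed on text length, purely for illustration.
detectors = [lambda t: min(1.0, len(t) / 100), lambda t: 0.5]
candidates = ["short rewrite", "a considerably longer candidate paraphrase"]
best, score = least_detected(candidates, detectors)
```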
---
## Associated Paper and Code
- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
---
## Citation
If you use this adapter or build on StealthRL, please cite:
```bibtex
@misc{ranganath2026stealthrlreinforcementlearningparaphrase,
title={StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors},
author={Suraj Ranganath and Atharv Ramesh},
year={2026},
eprint={2602.08934},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.08934},
}
```
---
## Notes
- This repo provides **adapter-only** weights; load them on top of `Qwen/Qwen3-4B-Instruct-2507` as shown above.