Update pipeline tag and add library metadata
Hi! I'm Niels from the Hugging Face community team.
I've opened this PR to improve the model card metadata and discoverability:
- Updated the `pipeline_tag` to `text-generation`, since this is an LLM-based paraphrasing model.
- Added `library_name: peft`, since this is a LoRA adapter; this enables the "Use in Transformers" button to show the correct code snippets.
- Added the arXiv ID to the metadata to link this model to its research paper.
- Included the paper authors for better attribution.
Let me know if you have any questions!
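For a quick sanity check of the keys this PR touches, here is a minimal, illustrative sketch that extracts the YAML front matter from a model card and reads back the new metadata. This is not Hub tooling (the Hub does its own validation, and a production parser would use PyYAML or `huggingface_hub`'s `ModelCardData`); it handles only the simple `key: value` and `key:` + `- item` shapes used in this card, with the field values taken from the diff below:

```python
def front_matter(readme: str) -> dict:
    """Parse the leading `--- ... ---` block into a flat dict.

    Handles only the simple `key: value` and `key:` + `- item`
    shapes used in this card; a real parser would use PyYAML.
    """
    lines = readme.splitlines()
    assert lines[0].strip() == "---", "model card must start with front matter"
    end = lines[1:].index("---") + 1  # index of the closing `---`
    meta, current_key = {}, None
    for line in lines[1:end]:
        if line.startswith("- ") and current_key:
            meta[current_key].append(line[2:].strip())  # list item under current key
        else:
            key, _, value = line.partition(":")
            current_key = key.strip()
            # scalar value, or an empty list to collect `- item` lines
            meta[current_key] = value.strip() or []
    return {k: v for k, v in meta.items() if v}  # drop keys with no value

# Front matter as it would read after this PR (from the diff below).
card = """---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- yaful/MAGE
language:
- en
license: mit
pipeline_tag: text-generation
library_name: peft
arxiv: 2602.08934
---

# StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
"""

meta = front_matter(card)
print(meta["pipeline_tag"])  # text-generation
print(meta["library_name"])  # peft
print(meta["base_model"])    # ['Qwen/Qwen3-4B-Instruct-2507']
```
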
README.md
CHANGED

```diff
@@ -1,17 +1,22 @@
 ---
-
+base_model:
+- Qwen/Qwen3-4B-Instruct-2507
 datasets:
 - yaful/MAGE
 language:
 - en
-
-
-
+license: mit
+pipeline_tag: text-generation
+library_name: peft
+arxiv: 2602.08934
 ---
+
 # StealthRL LoRA Adapter for Qwen3-4B-Instruct (PEFT)
 
 This repository hosts a **LoRA (Low-Rank Adaptation) adapter** for the base model
-**Qwen/Qwen3-4B-Instruct-2507**,
+**Qwen/Qwen3-4B-Instruct-2507**, presented in the paper [StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors](https://huggingface.co/papers/2602.08934).
+
+The authors of the paper are Suraj Ranganath and Atharv Ramesh.
 
 It is an **adapter-only** release (PEFT). The full base model is not included and must be downloaded separately from Hugging Face.
 
@@ -21,14 +26,14 @@ It is an **adapter-only** release (PEFT). The full base model is not included an
 
 **StealthRL** is a reinforcement learning framework for generating **adversarial paraphrases** that evade **multiple AI-text detectors** while preserving semantics.
 
-
+Key contributions from the paper:
 - StealthRL trains a **paraphrase policy** against a **multi-detector ensemble**
 - Uses **Group Relative Policy Optimization (GRPO)** with **LoRA adapters** on **Qwen3-4B**
 - Optimizes a **composite reward** that balances **detector evasion** with **semantic preservation**
 - Evaluates transfer to a **held-out detector family**, suggesting shared vulnerabilities rather than detector-specific brittleness
 
-Paper: https://arxiv.org/abs/2602.08934
-Code: https://github.com/suraj-ranganath/StealthRL
+Paper: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
+Code: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
 
 ---
 
@@ -57,7 +62,7 @@ from peft import PeftModel
 import torch
 
 base_model = "Qwen/Qwen3-4B-Instruct-2507"
-adapter_repo = "
+adapter_repo = "suraj-ranganath/StealthRL-Qwen3-4B-LORA"  # Update with actual repo path if different
 
 tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
@@ -107,8 +112,8 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 
 ## Associated Paper and Code
 
-- **Paper (arXiv)**: https://arxiv.org/abs/2602.08934
-- **GitHub Repository**: https://github.com/suraj-ranganath/StealthRL
+- **Paper (arXiv)**: [https://arxiv.org/abs/2602.08934](https://arxiv.org/abs/2602.08934)
+- **GitHub Repository**: [https://github.com/suraj-ranganath/StealthRL](https://github.com/suraj-ranganath/StealthRL)
 
 ---
 
```