DEPO-Paraphrase

DEPO — Detector-Evasive Paraphrase via Constrained Policy Optimization

LoRA adapter on Qwen/Qwen3-4B-Instruct-2507. Trained with constrained RL using the MAGE detector reward and BERTScore semantic reward (CRL target τ = 0.85, checkpoint-750).

Usage

from inference import load_paraphrase_model

pm = load_paraphrase_model("WizardWang01/depo-paraphrase")
out = pm.rewrite("Your text here.")
print(out)

pip install torch transformers peft accelerate
python inference.py --adapter_path WizardWang01/depo-paraphrase

Generation defaults

Parameter	Value
Prompt template	preserve-meaning paraphrase (`eval_rl`)
max_new_tokens	512
temperature	0.9
top_p	0.95
torch_dtype	bfloat16

Intended use

Research and evaluation only. Not for academic dishonesty, spam, or circumventing platform policies.

Citation

If you use this model, please cite the DEPO paper (Detector-Evasive Paraphrase via Constrained Policy Optimization).

Downloads last month: 27

Model tree for WizardWang01/depo-paraphrase

Base model

Qwen/Qwen3-4B-Instruct-2507

Adapter

(5530)

this model