DEPO-Paraphrase

DEPO โ€” Detector-Evasive Paraphrase via Constrained Policy Optimization

LoRA adapter on Qwen/Qwen3-4B-Instruct-2507. Trained with constrained RL using the MAGE detector reward and BERTScore semantic reward (CRL target ฯ„ = 0.85, checkpoint-750).

Usage

from inference import load_paraphrase_model

pm = load_paraphrase_model("WizardWang01/depo-paraphrase")
out = pm.rewrite("Your text here.")
print(out)
pip install torch transformers peft accelerate
python inference.py --adapter_path WizardWang01/depo-paraphrase

Generation defaults

Parameter Value
Prompt template preserve-meaning paraphrase (eval_rl)
max_new_tokens 512
temperature 0.9
top_p 0.95
torch_dtype bfloat16

Intended use

Research and evaluation only. Not for academic dishonesty, spam, or circumventing platform policies.

Citation

If you use this model, please cite the DEPO paper (Detector-Evasive Paraphrase via Constrained Policy Optimization).

Downloads last month
27
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for WizardWang01/depo-paraphrase

Adapter
(5530)
this model

Space using WizardWang01/depo-paraphrase 1